Generating Intelligent Numerical Answers
in a Question-Answering System
Véronique Moriceau
Institut de Recherche en Informatique de Toulouse
118, route de Narbonne, 31062 Toulouse, France
[email protected]
Abstract
In this paper, we present a question-answering system on the Web which aims at generating intelligent answers to numerical questions. These answers are generated in a cooperative way: besides a direct answer, comments are generated to explain to the user the variation of numerical data extracted from the Web. We present the content determination and realisation tasks. We also present some elements of evaluation with respect to end-users.
1 Introduction
Search engines on the Web and most existing
question-answering (QA) systems provide the user
with a set of hyperlinks and/or Web page extracts
containing answer(s) to a question. These answers
may be incoherent to a certain degree: they may be
equivalent, complementary, contradictory, at different levels of precision or specificity, etc. It is
then quite difficult for the user to know which answer is the correct one. Thus, an analysis of relevance and coherence of candidate answers is essential.
1.1 Related work
Search engines on the Web produce a set of answers to a question in the form of hyperlinks or
page extracts, ranked according to content or popularity criteria (Salton, 1989; Page et al., 1998).
Some QA systems on the Web use other techniques: candidate answers are ranked according
to a score which takes into account lexical relations between questions and answers, semantic
categories of concepts, distance between words,
etc. (Moldovan et al., 2003), (Narayanan and
Harabagiu, 2004), (Radev and McKeown, 1998).
Recently, advanced QA systems defined relationships (equivalence, contradiction, ...) between
Web page extracts or texts containing possible answers in order to combine them and to produce
a single answer (Radev and McKeown, 1998),
(Harabagiu and Lacatusu, 2004), (Webber et al.,
2002).
Most systems provide the user with either a set
of potential answers (ranked or not), or the "best"
answer according to some relevance criteria. They
do not provide answers which take into account
information from a set of candidate answers or
answer inconsistencies. As for logical approaches used for database querying, they are based on a majority approach or on source reliability. But, contrary
to the assumption of (Motro et al., 2004), we noted
that reliability information (information about the
author, date of Web pages, ...) is rather difficult
to obtain, so we assume that all Web pages are
equally reliable.
1.2 Motivations and goals
Our framework is that of advanced QA systems over open
domains. Our main goals are to model and to evaluate a system which, from a factoid question in
natural language (in French), selects a set of candidate answers on the Web and generates cooperative answers in natural language. Our challenge
is (1) to generate a synthetic answer instead of a
list of potential answers (in order to avoid providing the user with too much information), and (2) to
generate relevant comments which explain the variety of answers extracted from the Web (in order
to avoid misleading the user) (Grice, 1975). In a
cooperative perspective, we propose an approach
for answer generation which uses answer integration. When several possible answers are extracted
from the Web, the goal is to define a coherent core
from candidate answers and to generate a cooperative answer, i.e. an answer with explanations.
In this paper, we focus on the integration of numerical data in order to generate natural language
cooperative answers to numerical questions. We
first present some motivational problems for the
generation of numerical answers in a QA system.
Then, we present the content determination and
realization processes. Finally, we give some elements of evaluation of our system outputs, with
respect to end-users.
2 On numerical data
We focus on the integration of numerical data
for the generation of natural language cooperative numerical answers. We first present some related work on generation from numerical data sets.
Then we propose a model for the generation of cooperative numerical answers.
2.1 Related work
The generation of summaries from numerical data
has been developed in some NLG systems. For example, the system ANA (Kukich, 1983) generates
stock market reports by computing fluctuations
for a day. FoG (Goldberg et al., 1994) produces
weather forecasts from forecast data. More recently, StockReporter (Dale, 2003) was developed
to generate summaries describing how a stock performs over a period. Yu et al. (2005) propose a
system which generates summaries of sensor data
from gas turbines.
Those systems have input data analysis components which are more or less efficient and describe
numerical time-series data. In the framework of
QA systems, there are other major problems that
the previous systems do not deal with. When a
numerical question is submitted to a QA system,
a set of numerical data is extracted from the Web.
Then, the goal is not to describe the whole data set
but to find an appropriate answer, dealing with the
user expectations (for example, constraints in the
question) or data inconsistencies. Another important point is the analysis of numerical input data in
order to identify causes (besides time) of variation.
2.2 A typology of numerical answers
Our challenge is to develop a formal framework
for the integration of numerical data extracted
from Web pages in order to produce cooperative
numerical answers.
To define the different types of numerical
answers, we collected a set of 80 question-answer
pairs about prices, quantities, age, time, weight,
temperature, speed and distance. The goal is
to identify for each question-answer pair why
extracted numerical values are different (is this an
inconsistency? an evolution?).
A numerical question may accept several
answers when numerical values vary according
to some criteria. Let us consider the following
examples.
Example 1:
How many inhabitants are there in France?
- Population census in France (1999): 60184186.
- 61.7: number of inhabitants in France in 2004.
Example 2:
What is the average age of marriage of women in
2004?
- In Iran, the average age of marriage of women
was 21 years in 2004.
- In 2004, Moroccan women get married at the
age of 27.
Example 3:
At what temperature should I serve wine?
- Red wine must be served at room temperature.
- Champagne: between 8 and 10°C.
- White wine: between 8 and 11°C.
The corpus analysis allows us to identify 3 main
variation criteria, namely time (ex.1), place (ex.2)
and restriction (ex.3: restriction on the focus, for
example: Champagne/wine). These criteria can be
combined: some numerical values vary according
to time and place, to time and restrictions, etc. (for example, the average age of marriage varies according to time, place and restrictions on men/women).
2.3 A model for cooperative numerical
answer generation
The system has to generate an answer from a set
of numerical data. In order to identify the different
problems, let us consider the following example:
What is the average age of marriage in France?
- In 1972, the average age of marriage was 24.5
for men and 22.4 for women. In 2005, it is 30 for
men and 28 for women.
- The average age of marriage in France increased
from 24.5 to 26.9 for women and from 26.5 to 29
for men between 1986 and 1995.
This set of potential answers may seem incoherent but their internal coherence can be made apparent once a variation criterion is identified. In a
cooperative perspective, an answer can be for example:
In 2005, the average age of marriage in France
was 30 for men and 28 for women.
It increased by about 5.5 years between 1972 and
2005.
This answer is composed of:
1. a direct answer to the question,
2. an explanation characterizing the variation
mode of the numerical value.
To generate this kind of answer, it is necessary (1)
to integrate candidate answers in order to elaborate a direct answer (for example by solving inconsistencies), and (2) to integrate the candidate answers' characteristics in order to generate an explanation.
Figure 1 presents the general architecture of our
system which allows us to generate answers and
explanations from several different numerical answers. Questions are submitted in natural language to QRISTAL1 which analyses them and selects potential answers from the Web. Then, a
grammar is applied to extract information needed
for the generation of an appropriate cooperative
answer. This information is mainly:
- the searched numerical value (val),
- the unit of measure,
- the question focus,
- the date and place of the information,
- the restriction(s) on the question focus,
- the precision of the numerical value (for example
adverbs or prepositions such as in about 700, ...),
- linguistic clues indicating a variation of the value
(temporal adverbs, verbs of change/movement as
in the price increased to 200 euro).
For the extraction of restrictions, a set of basic
properties is defined (colors, form, material, etc.).
Ontologies are also necessary. For example, for
the question how many inhabitants are there
in France?, population of overseas regions and
metropolitan population are restrictions of France
because they are daughters of the concept France
in the ontology. On the contrary, prison population of France is not a restriction because prison is
not a daughter of France. Several ontologies are
available² but the lack of available knowledge for some domains obviously influences the quality of answers.

¹ www.qristal.fr, Synapse Développement
² http://www.daml.org/ontologies/

Figure 1: Architecture
We define the set A = {a1, ..., aN}, with ai a frame which gathers all this information for a numerical value. Figure 2 shows an extraction result.
Figure 2: Extraction results
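As an illustration, such a frame can be sketched as a simple record. The field names below are ours and merely mirror the information listed above; they are not the system's actual data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AnswerFrame:
    """One candidate numerical answer extracted from a Web page (illustrative fields)."""
    value: float                          # the searched numerical value (val)
    unit: Optional[str] = None            # unit of measure, e.g. "years", "meters"
    focus: Optional[str] = None           # question focus, e.g. "average age of marriage"
    date: Optional[int] = None            # date of the information (year)
    place: Optional[str] = None           # place of the information
    restrictions: List[str] = field(default_factory=list)  # e.g. ["women"], ["men"]
    precision: Optional[str] = None       # adverb/preposition such as "about"
    variation_clue: Optional[str] = None  # linguistic clue, e.g. "increased"

# A = {a1, ..., aN}: the set of frames extracted for one question
A = [
    AnswerFrame(value=24.5, unit="years", date=1972, place="France", restrictions=["men"]),
    AnswerFrame(value=30.0, unit="years", date=2005, place="France", restrictions=["men"]),
]
```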
From the frame set, the variation criteria and
mode of the searched numerical value are identified: these components perform content determination. Finally, a natural language answer is
generated explaining those characteristics. Each
of these stages is presented in the next sections.
3 Content determination for
explanations
In order to produce explanations for data variation,
the system must have a data analysis component
which can infer, from extracted information, the
variation phenomena, criteria and mode.
3.1 Variation criteria

Once we have the frames representing the different numerical values, the goal is to determine if there is a variation and to identify the variation criteria of the value. We assume that there is a variation if there are at least k different numerical values with different criteria (time, place, restriction) among the N frames (for the moment, we arbitrarily set k = N/4, but this has to be evaluated). Thus, a numerical value varies according to:

1. time if $T = \{a_i(Val) \mid \exists a_j \in A,\ a_i(Val) \neq a_j(Val) \wedge a_i(Unit) = a_j(Unit) \wedge a_i(Date) \neq a_j(Date)\}$ and $card(T) \geq k$

2. place if $P = \{a_i(Val) \mid \exists a_j \in A,\ a_i(Val) \neq a_j(Val) \wedge a_i(Unit) = a_j(Unit) \wedge a_i(Place) \neq a_j(Place)\}$ and $card(P) \geq k$

3. restriction if $Rt = \{a_i(Val) \mid \exists a_j \in A,\ a_i(Val) \neq a_j(Val) \wedge a_i(Unit) = a_j(Unit) \wedge a_i(Restriction) \neq a_j(Restriction)\}$ and $card(Rt) \geq k$

4. time and place if (1) ∧ (2)

5. time and restriction if (1) ∧ (3)

6. place and restriction if (2) ∧ (3)

7. time, place and restriction if (1) ∧ (2) ∧ (3)
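As a rough illustration, the criteria above can be checked directly over the extracted frames. The sketch below assumes the AnswerFrame record sketched in section 2.3 and the arbitrary threshold k = N/4; the function names are ours.

```python
def varies_by(frames, attr, k=None):
    """True if at least k frames differ in value while differing on `attr`
    (with the same unit), following the criteria above."""
    if k is None:
        k = len(frames) / 4            # arbitrary threshold k = N/4
    differing = {
        a.value
        for a in frames
        for b in frames
        if a.value != b.value
        and a.unit == b.unit
        and getattr(a, attr) != getattr(b, attr)
    }
    return len(differing) >= k

def variation_criteria(frames):
    """Identify the combination of variation criteria (time, place, restriction)."""
    criteria = []
    if varies_by(frames, "date"):
        criteria.append("time")
    if varies_by(frames, "place"):
        criteria.append("place")
    if varies_by(frames, "restrictions"):
        criteria.append("restriction")
    return criteria                    # e.g. ["time", "restriction"]
```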
Numerical values can be compared only if they have the same unit of measure. If not, they have to be converted. More details about the comparison rules are presented in (Moriceau, 2006).

3.2 Variation mode

In the case of numerical values varying over time, it is possible to characterize the variation more precisely. The idea is to identify a trend (increase, decrease, ...) of the variation over time so that a precise explanation can be generated. For this purpose, we draw a regression line which determines the relationship between the two extracted variables, value and date. In particular, Pearson's correlation coefficient (r), related to the line slope, reflects the degree of linear relationship between two variables. It ranges from +1 to −1. For example, Figure 3 shows that a positive Pearson's correlation implies a general increase of values whereas a negative Pearson's correlation implies a general decrease. If r is low (−0.6 < r < 0.6), we consider that the variation is random (Fisher, 1925).

Figure 3: Variation mode

Figure 4 shows the results for the question How many inhabitants are there in France? Different numerical values and associated dates are extracted from Web pages. The Pearson's correlation is 0.694, meaning that the number of inhabitants increases over time (between 1999 and 2005).

Figure 4: Variation mode: How many inhabitants are there in France?
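A minimal sketch of this variation-mode detection, assuming the frames carry a numerical date (e.g. a year) and using the ±0.6 threshold mentioned above; the actual system fits a regression line, which is only approximated here by the sign and strength of r.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = pstdev(xs), pstdev(ys)
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def variation_mode(frames):
    """Characterize a variation over time as 'increase', 'decrease' or 'random'."""
    dated = sorted((a.date, a.value) for a in frames if a.date is not None)
    dates, values = zip(*dated)
    r = pearson_r(dates, values)
    if r >= 0.6:
        return "increase"
    if r <= -0.6:
        return "decrease"
    return "random"   # -0.6 < r < 0.6: no clear linear trend
```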
4 Answer generation
Once the searched numerical values have been extracted and characterized by their variation criteria and mode, a cooperative answer is generated in
natural language. It is composed of two parts:
- a direct answer if available,
- an explanation of the value variation.
4.1 Direct answer generation

4.1.1 Question constraints
The content determination process for the direct answer generation is mainly guided by constraints which may be explicit or implicit in the
question. For example, in the question how many
inhabitants are there in France in 2006?, there
are explicit constraints on time and place. On
the contrary, in how many inhabitants are there in
France?, there is no constraint on time. Let C be
the set of question constraints: $C = \{C_t, C_p, C_r\}$ with:
- $C_t$: constraint on time ($C_t \in \{exp\_time, \emptyset\}$),
- $C_p$: constraint on place ($C_p \in \{exp\_place, \emptyset\}$),
- $C_r$: constraint on restrictions ($C_r \in \{exp\_restr, \emptyset\}$).
For example, in the question what is the average
age of marriage in France?: Ct = ∅, Cp = France
and Cr = ∅.
When there is no explicit constraint in the question, we distinguish several cases:
- if there is no explicit constraint on time in the
question and if a numerical variation over time has
been inferred from the data set, then we assume that the user wants to have the most recent information: $C_t = max(\{a_i(date), a_i \in A\})$,
- if there is no explicit constraint on place in the
question and if a numerical variation according to
place has been inferred from the data set, then we
assume that the user wants to have the information
for the closest place to him (the system can have
this information for example via a user model),
- if there is no explicit constraint on restrictions in
the question and if a numerical variation according to restrictions has been inferred from the data
set, then we assume that the user wants to have the
information for any restrictions.
For example, on figure 5: Ct = 2000 (the most
recent information), Cp = France and Cr = ∅.
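The default interpretation of missing constraints can be sketched as below. The dictionary-based representation and the user_place parameter (standing in for the user model mentioned above) are assumptions of this illustration.

```python
def resolve_constraints(question_constraints, frames, criteria, user_place=None):
    """Complete the constraint set C = {Ct, Cp, Cr} with default values
    when the question gives no explicit constraint (illustrative only)."""
    c = dict(question_constraints)   # e.g. {"time": None, "place": "France", "restriction": None}
    if c.get("time") is None and "time" in criteria:
        # no explicit time constraint: take the most recent information
        c["time"] = max(a.date for a in frames if a.date is not None)
    if c.get("place") is None and "place" in criteria and user_place is not None:
        # no explicit place constraint: prefer the place closest to the user (user model)
        c["place"] = user_place
    # no explicit restriction constraint: keep all restrictions
    # (one direct answer will be generated per restriction, see section 4.1.3)
    return c
```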
4.1.2 Candidate answers
Candidate frames for direct answers are those
which satisfy the set of constraints C. Let AC be
the set of frames which satisfy C (via subsumption):
$AC = \{a_i \in A \mid a_i(date) = (C_t \vee \emptyset)\ \wedge\ a_i(place) = (C_p \vee \emptyset)\ \wedge\ a_i(restriction) = \begin{cases} C_r \vee \emptyset & \text{if } C_r \neq \emptyset \\ exp\_rest \vee \emptyset & \text{if } C_r = \emptyset \end{cases}\}$
For Figure 5: $AC = \{a_1, a_2, a_3, a_4, a_5, a_6\}$.
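A possible reading of this filtering step, reusing the frame and constraint representations assumed above; a frame with an empty field is treated as compatible with any constraint (subsumption).

```python
def satisfies(frame, constraints):
    """A frame satisfies C if each field matches the constraint or is empty (subsumption)."""
    ct, cp, cr = constraints.get("time"), constraints.get("place"), constraints.get("restriction")
    time_ok = ct is None or frame.date is None or frame.date == ct
    place_ok = cp is None or frame.place is None or frame.place == cp
    restr_ok = cr is None or not frame.restrictions or cr in frame.restrictions
    return time_ok and place_ok and restr_ok

def candidate_frames(frames, constraints):
    """AC: the subset of A whose frames satisfy the question constraints."""
    return [a for a in frames if satisfies(a, constraints)]
```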
4.1.3 Choosing a direct answer
A direct answer has to be generated from
the set AC. We define subsets of AC which
contain frames having the same restrictions: a
direct answer will be generated for each relevant
restriction. Let $A'$ be the set of subsets of frames satisfying the question constraints and having the same restrictions: $A' = \{AC_1, ..., AC_M\}$ with:

$AC_i = \{a_j \mid \forall a_j, a_k \in AC,\ a_j(restriction) = a_k(restriction)\ \vee\ a_j(restriction) = \emptyset\}$,

and $AC_1, ..., AC_M$ are disjoint.
For Figure 5: $A' = \{AC_1, AC_2\}$ with:
$AC_1 = \{a_1, a_3, a_5\}$, the subset for restriction women,
$AC_2 = \{a_2, a_4, a_6\}$, the subset for restriction men.
Then, for each element in $A'$, an answer is generated:
$\forall AC_i \in A',\ answer = generate\_answer(AC_i)$.
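The grouping into the subsets $AC_1, ..., AC_M$ can be sketched as follows; letting frames without any restriction join every subset is our reading of the $a_j(restriction) = \emptyset$ clause.

```python
from collections import defaultdict

def group_by_restriction(ac_frames):
    """Partition AC into subsets AC_1..AC_M sharing the same restriction;
    frames carrying no restriction are compatible with every subset."""
    groups = defaultdict(list)
    unrestricted = [a for a in ac_frames if not a.restrictions]
    for a in ac_frames:
        for r in a.restrictions:
            groups[r].append(a)
    if not groups:                      # no restriction at all: a single subset
        return {"": unrestricted}
    for r in groups:
        groups[r].extend(unrestricted)  # a_j(restriction) = ∅ frames join each subset
    return dict(groups)

# one direct answer is then generated per subset:
# for restriction, subset in group_by_restriction(AC).items():
#     answer = generate_answer(subset)
```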
Each element of $A'$ may contain one or several frames, i.e. one or several numerical values. Some of these values may be aberrant (for example, How high is the Eiffel Tower? 300m, 324m, 18cm): they are filtered out via classical statistical methods (using the standard deviation). Among the remaining frames, values may be equal or may differ to various degrees (rounded values, for example). Those values have to be integrated so that a synthetic answer can be generated.
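The outlier filtering is not specified beyond the use of the standard deviation; a simple deviation cut-off like the one below is one plausible reading (the thresholds are ours).

```python
from statistics import mean, pstdev

def filter_aberrant(values, n_sigma=2.0):
    """Discard values lying more than n_sigma standard deviations from the mean
    (one plausible reading of the 'classical statistical methods')."""
    if len(values) < 3:
        return list(values)
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - m) <= n_sigma * sd]

# With very few values a tight threshold is needed, e.g.:
# filter_aberrant([300, 324, 0.18], n_sigma=1.0)  ->  [300, 324]
```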
There are many operators used in logical approaches for fusion: conjunction, disjunction, average, etc. But they may produce an answer which is not cooperative: a conjunction or disjunction of all candidates may mislead users; the average of the candidates is an "artificial" answer since it has been computed and not extracted from Web pages.
Our approach allows the system to choose a value among the set of possible values, dealing with the problem of rounded or approximate data. Candidate values are represented by an oriented graph whose arcs are weighted with the cost between the two linked values and the weight (w) of the departure value (its number of occurrences). A graph G of numerical values is defined by $N$, the set of nodes (the set of values), and $Arc$, the set of arcs. The cost $c(x, y)$ of $arc(x, y)$ is:

$c(x, y) = \frac{|x - y|}{y} \times \left( w(x) + \sum_{i=1}^{n} w(x_i) \right) + \sum_{i=1}^{n} c(x_i, x)$

with $(x_1, ..., x_n, x)$ a path from $x_1$ to $x$.
Finally, we define a fusion operator which
selects the value which is used for the direct
answer. This value is the one which maximizes
the difference (cost(x)) between the cost to leave this value and the cost to arrive at this value:

$answer = y \in N$ such that $cost(y) = max(\{cost(n), \forall n \in N\})$, where $cost(n) = cost\_leave(n) - cost\_arrive(n)$,

with $cost\_leave(x) = \sum_i c(x, x_i)$ and $cost\_arrive(x) = \sum_i c(x_i, x)$.

Figure 5: Data set for What is the average age of marriage in France?
Let us consider an example. The following values are candidates for the direct answer to the question How high is the Mont-Blanc?: 4800, 4807
(2 occurrences), 4808 (2 occurrences), 4808.75,
4810 (8 occurrences) and 4813. Figure 6 shows
the graph of values: in this example, the value
which maximizes the costs is 4810.
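A much simplified sketch of this fusion operator on the Mont-Blanc values: weights are occurrence counts, but the recursive path term of $c(x, y)$ is dropped and only the pairwise term $|x - y|/y \times w(x)$ is kept, so this is an approximation of the operator described above, not a faithful implementation.

```python
from collections import Counter

def fuse(values):
    """Choose a direct answer among candidate values with a simplified version of
    the cost-based fusion operator: cost(y) = cost_leave(y) - cost_arrive(y),
    keeping only the pairwise term |x - y| / y * w(x) of c(x, y) (the recursive
    path term is omitted). Returns the value maximizing the cost difference."""
    w = Counter(values)                # weight = number of occurrences
    nodes = sorted(w)

    def c(x, y):                       # simplified cost of the arc x -> y
        return abs(x - y) / y * w[x]

    def cost(y):
        leave = sum(c(y, x) for x in nodes if x != y)
        arrive = sum(c(x, y) for x in nodes if x != y)
        return leave - arrive

    return max(nodes, key=cost)

# Candidate heights for "How high is the Mont-Blanc?" (occurrences as in the example)
heights = [4800, 4807, 4807, 4808, 4808, 4808.75] + [4810] * 8 + [4813]
print(fuse(heights))                   # -> 4810, the dominant value
```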
From the selected value, the system generates
a direct answer in natural language in the form
of Focus Verb (Precision) Value. For example,
the generated answer for How high is the Mont-Blanc? is The Mont-Blanc is about 4810 meters
high. Here the preposition about indicates to the
user that the given value is an approximation.
For the question what is the average age of marriage in France?, a direct answer has to be generated for each restriction. For the restriction men
(AC2), there are 3 candidate values: 29.8, 30 and 30.6, the value which maximizes the cost being 30. For the restriction women (AC1), there are also 3 candidate values: 27.7, 28 and 28.5, the value which maximizes the cost being 28. After the aggregation process, the generated direct answer is: In 2000, the average age of marriage in
France was about 30 years for men and 28 years
for women.
4.2 Explanation generation
The generation of the cooperative part of the answer is complex because it requires lexical knowledge. This part of the answer has to explain to
the user the variation phenomena of the searched values: when a variation of values is identified and characterised, an explanation is generated in the form
of X varies according to Criteria. In the case of
variation according to restrictions or properties of
the focus, a generalizer is generated. For example, the average age of marriage varies for men and
women: the explanation is in the form the average
age of marriage varies according to sex. The generalizer is the mother concept in the ontology or a
property of the mother concept (Benamara, 2004).
For numerical values varying over time, if the variation mode (increase or decrease) is identified,
a more precise explanation is generated: X increased/decreased between... and... instead of X
varies over time.
Here, verbs are used to express precisely numerical variations. The lexicalisation process needs
deep lexical descriptions. We use for that purpose a classification of French verbs (Saint-Dizier,
1999) based on the main classes defined by WordNet. The classes we are interested in for our
task are mainly those of verbs of state (have,
be, weigh, etc.), verbs of change (increase, decrease, etc.) and verbs of movement (climb,
move forward/backward, etc.) used metaphorically (Moriceau et al., 2003). From these classes,
we selected a set of about 100 verbs which can be
applied to numerical values.
From these classes, we characterized sub-classes
of growth, decrease, etc., so that the lexicalisation
task is constrained by the type of verbs which has
to be used according to the variation mode.
A deep semantics of verbs is necessary to generate an answer which takes into account the characteristics of numerical variation as well as possible: for example, the variation mode but also
the speed and range of the variation. Thus, for
each sub-class of verbs and its associated variation mode, we need a refined description of ontological domains and selectional restrictions so that
Figure 6: Graph of candidate values for How high is the Mont-Blanc?
an appropriate verb lexicalisation can be chosen:
which verb can be applied to prices, to age, etc.?
(Moriceau et al., 2003). We propose to use proportional series representing verb sub-classes according to the speed and amplitude of variation. For
example, the use of climb (resp. drop) indicates
a faster growth (resp. decrease) than go up (resp.
go down): the verb climb is preferred for the generation of Gas prices climb 20.3% in October 2005, whereas go up is preferred in Gas prices went up 7.2% in September 2005.
Verbs can possibly be associated with a preposition that refines the information (The average age
of marriage increased by about 5.5 years between
1972 and 2005).
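As an illustration of this constrained lexicalisation, a toy verb choice could look as follows; the 10% threshold and the reduced verb inventory are our assumptions, the actual system relies on the verb classification of (Saint-Dizier, 1999).

```python
def choose_variation_verb(mode, relative_change):
    """Pick a verb of change according to the variation mode and its amplitude
    (relative change over the period). Threshold and verb inventory are illustrative."""
    fast = abs(relative_change) >= 0.10
    if mode == "increase":
        return "climb" if fast else "go up"
    if mode == "decrease":
        return "drop" if fast else "go down"
    return "vary"

# choose_variation_verb("increase", 0.203)  ->  "climb"   (Gas prices climb 20.3% ...)
# choose_variation_verb("increase", 0.072)  ->  "go up"   (Gas prices went up 7.2% ...)
```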
4.3 Answer justification
Our system generates a cooperative answer composed of a direct answer to the question and an explanation for the possible variation of the searched
numerical value. But the answer may be uncertain when the number of candidate values for the direct answer is too high or too low. In this case, it may be
useful to add some additional information for the
user in order to justify or complete the generated
answer.
We propose to add a know-how component to
our system, which provides the user with one or
two relevant Web page extracts besides the generated answer whenever it is necessary. These extracts must contain information about the searched
numerical values, and for example some explanations of the causes of numerical variation. Some
linguistic clues can be used to select page extracts:
number of numerical values concerning the question focus, causal marks (because of, due to, ...),
etc. Figure 7 shows an output example of our system.
Figure 7: An output example
5 Evaluation
In this section, we present some elements of evaluation of our system with respect to 15 end-users (the subjects are between 20 and 35 years old and are accustomed to using search engines).
We first evaluated how users behave when they
are faced with different candidate answers to a
question. To each user, we presented 5 numerical questions and their candidate answers which
vary according to time or restrictions and asked them
to produce their own answer from candidate answers. For numerical answers varying according
to restrictions, 93% of subjects produce answers
explaining the different numerical values for each
restriction. For numerical answers varying over
time, 80% of subjects produce answers giving the
most recent information (20% of subjects produce an answer which is a summary of all candidate values). This validates our hypothesis presented in
section 4.1.1.
The second point we evaluated is the answer order. Our system produces answers in the form of
a direct answer, then an explanation and a justification (page extract) if necessary. We proposed
to users answers with these three parts arranged
randomly. Contrary to Yu et al. (2005), who propose first an overview and then a zoom on interesting phenomena, 73% of subjects preferred the
order proposed by our system, perhaps because, in
QA systems, users want to have a direct answer
to their question before having explanations.
The last point we evaluated is the quality of the
system answers. For this purpose, we asked subjects to choose, for 5 questions, which answer they
prefer among: the system answer, an average, an
interval and a disjunction of all candidate answers.
91% of subjects preferred the system answer. 75%
of subjects found that the explanation produced is
useful and only 31% of subjects consulted the Web
page extract (28% of these found it useful).
6 Conclusion
We proposed a question-answering system which
generates intelligent answers to numerical questions. Candidate answers are first extracted from
the Web. Generated answers are composed of
three parts: (1) a direct answer: the content
determination process "chooses" a direct answer among candidates, dealing with data inconsistencies and approximations, (2) an explanation: the content determination process makes it possible to identify,
from data sets, the possible value variations and
to infer their variation criteria (time, place or restrictions on the question focus), and (3) a possible Web page extract. This work has several future
directions among which we plan:
- to define precisely in which cases it is useful to
propose a Web page extract as a justification and,
- to measure the relevance of restrictions on the
question focus to avoid generating an enumeration
of values corresponding to irrelevant restrictions.
References
F. Benamara. 2004. Generating Intensional Answers in Intelligent Question Answering Systems. LNAI Series, volume 3123, Springer.

R. Dale. 2003. StockReporter. http://www.ics.mq.edu.au/lgtdemo/StockReporter/.

R. A. Fisher. 1925. Statistical Methods for Research Workers. Oliver and Boyd, London.

E. Goldberg, N. Driedger, R. Kittredge. 1994. Using natural language processing to produce weather forecasts. IEEE Expert, 9(2).

H.P. Grice. 1975. Logic and conversation. In P. Cole and J.L. Morgan (eds.): Syntax and Semantics, Vol. 3, Speech Acts, New York, Academic Press.

S. Harabagiu and F. Lacatusu. 2004. Strategies for Advanced Question Answering. Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004.

K. Kukich. 1983. Knowledge-based report generation: a knowledge engineering approach to natural language report generation. Ph.D. Thesis, Information Science Department, University of Pittsburgh.

D. Moldovan, C. Clark, S. Harabagiu and S. Maiorano. 2003. COGEX: A Logic Prover for Question Answering. Proceedings of HLT-NAACL 2003.

V. Moriceau and P. Saint-Dizier. 2003. A Conceptual Treatment of Metaphors for NLP. Proceedings of ICON, Mysore, India.

V. Moriceau. 2006. Numerical Data Integration for Question-Answering. Proceedings of EACL-KRAQ'06, Trento, Italy.

A. Motro and P. Anokhin. 2004. Fusionplex: resolution of data inconsistencies in the integration of heterogeneous information sources. Information Fusion, Elsevier.

S. Narayanan and S. Harabagiu. 2004. Answering Questions Using Advanced Semantics and Probabilistic Inference. Proceedings of the Workshop on Pragmatics of Question Answering, HLT-NAACL, Boston, USA.

L. Page, S. Brin, R. Motwani, T. Winograd. 1998. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Computer Science Department, Stanford University.

D.R. Radev and K.R. McKeown. 1998. Generating Natural Language Summaries from Multiple On-Line Sources. Computational Linguistics, vol. 24, issue 3 - Natural Language Generation.

P. Saint-Dizier. 1999. Alternations and Verb Semantic Classes for French. Predicative Forms for NL and LKB, Kluwer Academic.

P. Saint-Dizier. 2005. PrepNet: a Framework for Describing Prepositions: preliminary investigation results. Proceedings of IWCS'05, Tilburg, The Netherlands.

G. Salton. 1989. Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Addison-Wesley.

B. Webber, C. Gardent and J. Bos. 2002. Position statement: Inference in Question Answering. Proceedings of LREC, Las Palmas, Spain.

J. Yu, E. Reiter, J. Hunter, C. Mellish. 2005. Choosing the content of textual summaries of large time-series data sets. Natural Language Engineering, 11.