COATIS, an NLP System to Locate Expressions
of Actions Connected by Causality Links
Daniela Garcia
Université de Paris-Sorbonne, Cams-Lalic
96, boulevard Raspail, 75006 Paris, France
and
EDF-DER, IMA-TIEM
1, avenue du Général-de-Gaulle, 92141 Clamart Cedex, France
Abstract. COATIS is an automatic tool designed to locate certain actions expressed in texts. Rules of contextual exploration, activated by the
presence of linguistic indicators of causality in sentences, enable COATIS
to locate expressions that denote field actions and that are linked by
causal relations. COATIS processes technical texts of any domain, in the
French language. It is therefore particularly suitable for use in causal
knowledge acquisition from texts.
1 Introduction
The notions of transfer, entity and movement, as expressed by natural languages,
have been extensively studied, for instance by Talmy [18, 19], Langacker [14],
Jackendoff [11] and Pustejovsky [16], but the systematic study of how natural
languages encode causality is still in its early stages. Several systematic
descriptions of vocabulary have been conducted with the aim of extracting
knowledge from texts without any information about the field described in the
processed text: M. Abraham analyzes verbs of movement by means of schemas in [1];
J.-P. Desclés studied semantic transitivity, aspectuality schemas and diathesis
schemas in [5]; and particular semantic domains such as relations of localisation
and whole-part relations were recently studied and presented in [13] and [12].
In this paper, we set out the results concerning the notions of action and
causal relations between actions as expressed by verbs of the French language.
The model we built is coupled with the Strategy of the Contextual Exploration to
obtain the COATIS computer system. This system aims to index the processed
text by the actions that are expressed within it and organized by causal links. We
start by explaining (Section 2) how we organized the French verbs that express
causal links between actions. We then describe (Section 3) the COATIS system.
2 Semantic Organization of Causality as it is Expressed in French
A classic distinction between efficient causality and the causality that can
be described by formal representations has long been established, and we take
it into account. We distinguish between efficient causality, where one action
provokes a different action that comes later in time ("Massive deforestation of
the planet leads to global cooling"), and the causality that replaces the notions
of cause and effect with regularities observed between actions ("Energy
is proportional to mass"). We extend this distinction with original work on
the organization of efficient causal relations.
2.1 Efficient Causality as Expressed by French Verbs
The idea of efficient causality as an oriented relation between actions can be
expressed by French verbs. French verbs such as provoquer (to provoke), gêner
(to disturb), résulter (to result), or conduire à (to lead to) are called indicator
verbs of causality (or indicators for short). The indicators that express efficient
causality relations can (i) clarify the nature of the produced effect (-disturbing-¹,
-letting-, -modification-, -creation-, etc.), or (ii) clarify the intervention of the
causal action (-contribution-, -collaboration-). Figure 1 presents an extract of the
model comprising twenty-three specific relations of causality (nineteen relations
of efficient causality and four relations of formal causality).
Fig. 1. Semantic organization of the relations of efficient causality (extract).
The model presented below comes primarily from the manual classification
of indicator verbs found in technical texts. Some of the classes we describe were
first brought forward by the American linguist Leonard Talmy [20].
2.2 Efficient Causalities Clarifying the Produced Effect
One causal action can block, impede, influence, create or keep another action. The
two relations -blocking- and -keeping- respectively extend the relations -impediment-
and -creation-, which account for two extreme notions. Between the two
notions, -creation- and -impediment-, non-categorical influences from one action
on another are possible.
¹ We note each specific causal relation between two hyphens.
One causal action can facilitate, let or disturb another action. These three
relations can be ordered according to a qualitative feature which is associated
with the relation of -influence-: the relation of -facilitation- is a modification that
is more or less positive; the relation of -letting- is a nil modification; the relation
of -disturbing- is a modification that is more or less negative.
As described above, the organization of efficient causalities clarifying the
nature of the produced effect justifies the existence of a continuum between
-blocking- and -keeping-. Indeed, the -facilitation- relation, taken to extremes,
becomes a -creation-. And the relation of -disturbing-, varying in its intensity,
merges into the relation of -impediment- when it goes beyond the extreme limit.
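The continuum described above can be sketched as an ordered enumeration. This is only an illustration: the integer ranks and the `polarity` helper are assumptions introduced here, and only the relative order of the relations is taken from the text. Assigning a negative sign to -impediment- and -blocking-, and a positive sign to -creation- and -keeping-, extrapolates from the statement that -disturbing- and -facilitation- merge into them at the extremes.

```python
from enum import IntEnum


class Effect(IntEnum):
    """Relations clarifying the produced effect, ordered along the continuum.

    The numeric ranks are hypothetical; only their order reflects the text.
    """
    BLOCKING = -3
    IMPEDIMENT = -2
    DISTURBING = -1
    LETTING = 0
    FACILITATION = 1
    CREATION = 2
    KEEPING = 3


def polarity(relation: Effect) -> str:
    """Return the qualitative sign of the influence carried by a relation."""
    if relation < Effect.LETTING:
        return "negative"
    if relation > Effect.LETTING:
        return "positive"
    return "nil"


# Walk the continuum from -blocking- to -keeping-.
for relation in sorted(Effect):
    print(f"-{relation.name.lower()}-: {polarity(relation)}")
```

An ordered type makes it possible to compare two relations directly, for instance to check that -disturbing- lies between -impediment- and -letting-.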
Other authors are currently engaged in research on verb classification. Note
in particular the work of Patrick Saint-Dizier described in [17], as well as the
results presented by Beth Levin in [15]. Jacques François in [8] is interested in the
notion of causality as a criterion for classifying the causation verbs of change.
Our model is based on an analysis of texts with frequent occurrences of 253
verbs carrying the notion of efficient causality.
3 COATIS
COATIS applies the Strategy of the Contextual Exploration [7] to detect expressions of action. The method is applied sentence by sentence to the whole
text. The presence of an indicator (e.g., one of the verbs in Fig. 1) activates a
contextual exploration of the sentence that seeks to detect the clues necessary
to:
- decide if the indicator is likely to express a causal relation in the text or, on
the contrary, to show that an interpretation of causality is impossible;
- identify the arguments of the relation.
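The two goals listed above can be sketched as a single function. This is a minimal illustration, not the COATIS rule base: the mini-lexicon `INDICATORS`, the relation assigned to "provoque", the bracket notation for noun phrases, and the rule "nearest bracketed unit on each side" are all assumptions made here. The example sentence is an assumed French rendering of a sentence meaning "Putting different kinds of equipment in parallel generally hinders the operation of the network". The sketch also assumes that the cause precedes the indicator, which would not hold for a verb such as résulter.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical mini-lexicon: inflected indicator form -> causal relation.
INDICATORS = {"gêne": "disturbing", "provoque": "creation"}

# Noun phrases are assumed to be pre-marked with square brackets.
UNIT = re.compile(r"\[([^\]]+)\]")


class CausalLink(NamedTuple):
    cause: str
    relation: str
    effect: str


def explore(sentence: str) -> Optional[CausalLink]:
    """Return a causal link if an indicator has an argument on each side."""
    for form, relation in INDICATORS.items():
        match = re.search(rf"\b{re.escape(form)}\b", sentence)
        if match is None:
            continue  # indicator absent: no exploration is triggered
        left = UNIT.findall(sentence[:match.start()])
        right = UNIT.findall(sentence[match.end():])
        if left and right:
            # Take the units closest to the indicator as its arguments.
            return CausalLink(left[-1], relation, right[0])
    return None  # no indicator, or the clues do not support causality


sentence = ("La [mise en parallèle de différents types d'ouvrages] gêne "
            "la plupart du temps le [fonctionnement du réseau]")
print(explore(sentence))
```

Returning `None` when an argument is missing mirrors the first goal: the presence of an indicator alone does not establish a causal reading.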
This is the general organization of a system that processes texts by the
Strategy of the Contextual Exploration. This method is still implemented and
used to resolve problems such as knowledge modelling from texts (the SEEK system,
presented in [13]), automatic abstracting [2], and tense and aspect analysis
[6]. These systems are implemented as knowledge-based systems in which the
only knowledge used is linguistic. Figure 2 illustrates an example of a
sentence processed by COATIS.
COATIS is implemented in SMECI². The system applies to texts that have
first been processed by LEXTER [3], a terminology extraction tool that
performs a morpho-syntactic analysis of a corpus of French texts in any technical
domain and yields a network of noun phrases.
² SMECI™, from ILOG™, is a Knowledge-Based System generator.
[Figure 2 diagram: the input sentence, with the noun phrases detected by LEXTER in brackets, and the output, in which the two expressions of action are linked by the relation -disturbing-.]
Fig. 2. An example of a sentence processed by COATIS. The noun phrases between brackets are the noun phrases detected by LEXTER. The sentence processed
is "Putting different kinds of equipment in parallel generally hinders the operation of
the network". The result of the process is that the two expressions of action "Putting
different kinds of equipment in parallel" and "operation of the network" are in causal
relation of -disturbing-.
Linguistic indicators alone are not sufficient to detect the presence of a causal
relation in a sentence: the context in which they occur must be taken into account.
In order to keep only those utterances that express causality, it is necessary to
examine the context of the indicators detected and locate relevant clues. The rules
of contextual exploration analyze different kinds of information, e.g. morpho-syntactic
(for example, detecting the occurrence of an infinitive verb preceding
or following the occurrence of the indicator) or morphological (for example, the
presence, in a part of the indicator context to be determined by the system, of a
French linguistic unit ending in -ment, -ion, -ure, -ise, -age, -aison, -yse, etc.).
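The morphological clue can be illustrated with a simple suffix test. This is a sketch under assumptions: the function name `action_noun_candidates` is hypothetical, the suffix list is the one quoted above (which ends with "etc." and is therefore incomplete), and plural forms are not handled.

```python
import re
from typing import List

# Suffixes quoted in the text; the original list is open-ended ("etc.").
ACTION_SUFFIXES = ("ment", "ion", "ure", "ise", "age", "aison", "yse")


def action_noun_candidates(context: str) -> List[str]:
    """Return the words of a context that end in one of the listed suffixes."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", context.lower())
    return [word for word in words if word.endswith(ACTION_SUFFIXES)]


print(action_noun_candidates("le fonctionnement du réseau"))
print(action_noun_candidates("généralement"))
```

The second call shows why the clue cannot be used on its own: an adverb such as "généralement" also ends in -ment, so the test is only meaningful when restricted to a specific part of the indicator context.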
COATIS criteria for selecting LEXTER units are essentially connected to
the relative position of different units in the sentence: LEXTER units,
prepositions selected according to the indicator, verbs with a morphological feature
(infinitive, past or present participle, conjugated) and punctuation. The noun phrase
selected (one LEXTER unit) is then explored to search for additional clues that
would confirm the first causal interpretation of the indicator (e.g. the presence of
a French noun ending in -ment, -ure, -age, etc.). This second exploration is necessary
because the morphological information is only relevant when it is present
in a certain part of the sentence.
The organization of indicator verbs also makes it possible to specify the causal
value concerned (-creation-, -disturbing-, -collaboration-, etc.).
One of the main features of COATIS is its independence with regard to the
subject field processed. Indicators and clues are independent of any specific field.
COATIS does not need a particular domain dictionary to operate. It is therefore
a useful tool for causal knowledge acquisition from texts, since no preliminary
domain knowledge is required.
At present, there are few computerized systems with the same operational
design, and it seems difficult to compare COATIS with other systems. We can
cite the work of Gary C. Borchardt [4], which concerns a system that locates
the causal relations in written texts according to the directives provided by the
constructor of the system. In contrast, COATIS processes any technical text, even
one that is not specifically written for its use. For the time being, COATIS produces
little noise (about 15% for the "Guide for Regional Networks Planning");
however, it is much more difficult to quantify what the system does not find,
so we have to process many other texts before we can give a verdict. Nevertheless, the
results already obtained from technical texts are encouraging, as is the feedback
from the knowledge engineers using the results.
4 Using Results of COATIS
COATIS can be used for different applications. COATIS results are helpful in
constituting action terminology for a given domain. They can also be helpful in
building a causal model of a domain: COATIS provides (i) a number of expressions of action and (ii) a basic organizational structure based on causal relations.
Moreover, we have listed a number of additional contributions to the modelling
process:
- When resolving problems of inference, if we have detected that "A -letting- B"
and if we moreover know that A is carried out, then B becomes possible;
if we have detected that "A -cause- B" and that A is carried out, then B is
carried out or going to be carried out at a certain point.
- When building a causal network, if "A -cause- B" and "B -cause- C", we
should ask the expert if we can add the relation "A -cause- C".
- When testing the coherence of the knowledge gathered, if "A -cause- B" and
"B -hinder- C", and if the relation "A -cause- C" exists in the network, we
have to resolve this apparent paradox by considering it as a particular case
or an exception, or even by leaving it out altogether.
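The three uses listed above can be sketched over a causal network stored as a set of triples. This is an illustration under assumptions: the triple representation, the function names and the status strings are introduced here and are not taken from COATIS. The inference step is applied once and does not chain through intermediate actions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (source action, relation, target action)


def infer_status(network: Set[Triple], carried_out: Set[str]) -> Dict[str, str]:
    """Apply the -cause- and -letting- rules to actions known to be carried out."""
    status: Dict[str, str] = {}
    for source, relation, target in network:
        if source not in carried_out:
            continue
        if relation == "cause":
            status[target] = "carried out, now or later"
        elif relation == "letting":
            # Do not weaken a status already established by a -cause- link.
            status.setdefault(target, "possible")
    return status


def transitive_candidates(network: Set[Triple]) -> List[Tuple[str, str]]:
    """List pairs (A, C) to submit to the expert as possible 'A -cause- C'."""
    causes = {(a, b) for a, rel, b in network if rel == "cause"}
    return sorted((a, c) for a, b in causes for b2, c in causes
                  if b == b2 and a != c and (a, c) not in causes)


def apparent_paradoxes(network: Set[Triple]) -> List[Tuple[str, str, str]]:
    """List (A, B, C) where A causes B, B hinders C, yet A also causes C."""
    causes = {(a, b) for a, rel, b in network if rel == "cause"}
    hinders = {(b, c) for b, rel, c in network if rel == "hinder"}
    return sorted((a, b, c) for a, b in causes for b2, c in hinders
                  if b == b2 and (a, c) in causes)


network = {("A", "cause", "B"), ("B", "cause", "C"), ("B", "hinder", "D"),
           ("A", "cause", "D"), ("A", "letting", "E")}
print(infer_status(network, {"A"}))
print(transitive_candidates(network))
print(apparent_paradoxes(network))
```

The last two functions only report candidates; deciding whether to add a transitive link or how to treat a paradox is left to the expert, as in the text.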
The results provided by COATIS are being used in the HYPERPLAN [10]
project, which concerns the building of a Technical Documentation Consulting System (TDCS) for the "Guide for Regional Networks Planning" (150,000
words): part of the text indexing takes into account the causal information identified by COATIS [9].
5 Conclusion
The approach adopted in building COATIS aims to find an operational method
for constructing representations from a text, taking into account the notion
of causality. While our university research team is engaged in theoretical research,
we are implementing a number of automatic tools to process texts. The independence
of these systems with regard to any particular field and their ability to
adapt to many different texts were made possible by the support of a strong
theoretical base: a linguistic model under construction that explores and seeks
to model the general notions expressed by natural languages, e.g. membership,
localisation, whole-part relation, movement, transfer and... causality.
References
1. Abraham, M.: Analyse sémantico-cognitive des verbes de mouvement et d'activité : Contribution méthodologique à la constitution d'un dictionnaire informatique des verbes. PhD EHESS Paris (1995)
2. Berri, J., Le Roux, D., Malrieu, D., Minel, J.-L.: SERAPHIN main sentences automatic extraction system. Proceedings of the Second Language Engineering Convention London October (1995)
3. Bourigault, D.: LEXTER, a Natural Language Processing Tool for Terminology Extraction. Proceedings of the 7th EURALEX International Congress Göteborg (1996)
4. Borchardt, G.-C.: Thinking between the Lines, Computers and Comprehension of Causal Descriptions. MIT Press Cambridge Massachusetts (1994)
5. Desclés, J.-P.: Langages applicatifs, Langues naturelles et Cognition. Hermès Paris (1990)
6. Desclés, J.-P., Jouis, C., Oh, H.-G., Reppert, D.: Exploration contextuelle et sémantique : Un système expert qui trouve les valeurs sémantiques des temps de l'indicatif dans un texte. Knowledge Modelling and expertise transfer. D. Herin-Aime, R. Dieng, J.-P. Regourd, J.-P. Angoujard (eds) IOS Press Amsterdam Washington DC Tokyo (1991) 371-400
7. Desclés, J.-P., Minel, J.-L.: L'exploration contextuelle. In Le résumé par exploration contextuelle. Communications to the Cogniscience-Est Meeting, Nancy, November 1994. Technical Report CAMS 95(1) (1995) 3-17
8. François, J.: Changement, causation, action. Librairie Droz Genève-Paris (1989)
9. Garcia, D., Aussenac-Gilles, N., Courcelle, A.: Exploitation, pour la modélisation, des connaissances causales détectées par COATIS dans les textes. Proceedings of 7th Journées d'Acquisition des Connaissances. Sète France (1996)
10. Gros, C., Assadi, H., Aussenac-Gilles, N., Courcelle, A.: Task Models for Technical Documentation Accessing. Proceedings of the 10th European Knowledge Acquisition Workshop. Nottingham (UK) (1996)
11. Jackendoff, R.: Semantics and Cognition. Cambridge (Mass.) MIT Press (1983)
12. Jackiewicz, A.: Expression lexicale de la relation d'ingrédience. Faits de Langue 7 (1996)
13. Jouis, C., Mustafa-Elhadi, W.: Conceptual Modeling of Database-Schema using linguistic knowledge. Application to Terminological databases. Proceedings of the First Workshop on Application of Natural Language to Databases (NLDB'95). AFCET Versailles France (1995) 103-118
14. Langacker, R.: Foundations of Cognitive Grammar. Stanford Univ. Press 1 (1987)
15. Levin, B.: English Verb classes and Alternations, Preliminary investigations. University of Chicago Press (1993)
16. Pustejovsky, J.: Generative Lexicon. MIT Press (1995)
17. Saint-Dizier, P.: Verb semantic Classes for French: Construction and Semantic Representation. Proceedings of IFIP, Conference on verb semantic classes. Univ. of Pennsylvania (1995)
18. Talmy, L.: Semantics and Syntax of Motion. Syntax and Semantics 4 NY Academic Press (1975) 181-238
19. Talmy, L.: How Language Structures Space, Spatial Orientation: Theory, Research and Application. H. Pick, L. Acredolo (eds.) Plenum Press (1983)
20. Talmy, L.: Force dynamics in language and cognition. Cognitive Science 12 (1988) 49-100