
Chapter 12
An Experiment in Temporal Language Learning
Kateryna Gerasymova1 and Michael Spranger2,3
This paper is the author’s draft and has now been officially published as:
Kateryna Gerasymova and Michael Spranger (2012). An Experiment in Temporal Language Learning. In Luc Steels and Manfred Hild (Eds.), Language Grounding in Robots, 237–254. New York:
Springer.
Abstract
Russian requires speakers of the language to conceptualize events using temporal language devices such as Aktionsarten and aspect. These devices relate to particular profiles and characteristics of events, such as whether the event has just started, whether it is ongoing or whether it is a repeated event. This chapter explores how such temporal features of events can be processed and learned by robots through grounded, situated interactions. We use a whole-systems approach that tightly integrates perception, conceptualization, grammatical processing and learning, and we demonstrate how a system of Aktionsarten can be acquired.
Key words: temporal language, aspect, cognitive semantics, fluid construction
grammar, language acquisition
12.1 Introduction
Temporal language concerns the conceptualization and expression of relations between events (such as past, present, future) and of the internal structure of events (such as perfective versus imperfective). Here we focus only on aspect. We use Russian as the target language because it is known to have a complex aspectual system.
Throughout this chapter, we use the following Russian example sentence:
1 Institute of Biomedical Engineering (IBME), University of Oxford, Oxford, United Kingdom
2 Sony Computer Science Laboratory Paris, 6 rue Amyot, 75005 Paris, France
3 Systems Technology Laboratory, Sony Corporation, Minato-ku 108-0075, Tokyo, Japan
K. Gerasymova and M. Spranger
Example 1. Миша зашагал, когда Маша стояла.
Miša zašagal, kogda Maša stojala.
‘Misha began to step, while Masha was standing.’
Aktionsarten describe the lexical temporal semantics of verbs. Here we approach Aktionsarten in terms of event boundaries (Bickel, 1996), as Stoll (1998) also does (see Figure 12.1). The event itself is bounded in time: it has a starting point and an ending point, and it lasts for some duration. A curly brace marks the boundaries and phases of the event that are the focal point of the Aktionsart in question.
[Figure 12.1: an event on a time axis t; the curly brace spans the inner portion of the event.]
Fig. 12.1 Duratives focus on the inner portion of the event and do not highlight any boundaries.
The durative Aktionsart¹ comprises verbs that describe events without referring to any boundaries, e.g. мечтатьⁱ (mečtat’, ‘dream’), as illustrated in Figure 12.1. Here and below, the superscripts ⁱ and ᵖ mark imperfective and perfective verbs. Duratives are imperfective and are often used to describe habitual facts, simple activities and states, as in the following example:
Example 2. Когда Катя говорилаⁱ, она очень мило улыбаласьⁱ...
Kogda Katja govorila, ona očen’ milo ulybalas’.
‘When Katya spoke, she smiled charmingly.’
(I. S. Turgenev. Fathers and Sons)
Ingressives express the beginning of an event, e.g. зашагатьᵖ (zašagat’, ‘start stepping’). They are most commonly associated with the prefix за- (za-), which introduces an initial boundary to the event described by the unprefixed verb (Figure 12.2). The prefixed form becomes perfective.
[Figure 12.2: an event on a time axis t; the curly brace marks its initial boundary.]
Fig. 12.2 Ingressive Aktionsart concentrates on the initial boundary of an event.
In the following sentence, the ingressive заплакатьᵖ (zaplakat’, ‘burst into
tears’) denotes the beginning of a corresponding event of crying:
Example 3. Тётя Катя не хотела ехать, да как заплачетᵖ...
Tëtja Katja ne chotela echat’, da kak zaplačet.
‘Aunt Katya didn’t want to go and suddenly burst into tears.’
(L. N. Tolstoy. A gloomy morning)
¹ The terminology is comparable to that of Forsyth (1970), who refers to Aktionsarten as procedurals and uses slightly different terms for each of the different Aktionsarten.
12 An Experiment in Temporal Language Learning
[Figure 12.3: IRL network of operations (get-context, filter-set-event, profile-event-aktionsart, filter-by-allen-relation-to-now, filter-event-by-agent, filter-by-individual, unique-entity, filter-by-allen-relation) connected through variables and bind statements; the same primitives are listed in Figure 12.6.]
Fig. 12.3 IRL network underlying example sentence 1, configured autonomously by a speaker to discriminate a particular event in the context.
Another class of temporal Aktionsarten are: delimitatives, which denote the development of an action to a limited extent and are therefore characterized by both
boundaries, the initial and final; the telic Aktionsart incorporates the notion of result or goal of the corresponding action as a part of the verb’s semantics, focusing on
the final boundary; the semelfactives express the notion of doing something once;
they are punctual, so initial and final boundary coincide. All these Aktionsarten
are derived from durative verbs by attaching one of the nineteen different prefixes
(Krongauz, 1998), after which the resulting form becomes perfective. The semelfactives can also be derived by suffixation. An account of the Aktionsarten that are not
purely temporal falls beyond the scope of this paper. Moreover we focus the discussion here only on those Aktionsarten which occur in the example sentence: durative
and ingressive.
12.2 Grounded Temporal Semantics
We use Incremental Recruitment Language, or IRL (see Spranger et al, 2012, this volume), to represent and compute the semantic structures that underlie the meaning of utterances². Figure 12.3 depicts a possible network underlying example sentence 1. It encodes a set of operations that lead to the identification of Misha’s stepping by relating its beginning to the event of Masha’s standing.
To understand the operations in the network from Figure 12.3, we first need to consider how events are grounded in sensorimotor data streams, since this grounding underlies the operations of the semantic structure. Earlier work (Steels and Baillie, 2003) focused on complex event structures; here we use a similar, albeit simplified, approach. Most importantly, and unlike earlier approaches, events are grounded as part of conceptualization. Here is how this is achieved.
The vision system provides streams of raw data, which encode spatial and color
properties of objects in the vicinity of the robots. Special subsystems of the vision
system recognize robots or other known entities (see Figure 12.4). The system has been specifically tuned to recognize robots and particular parts of robots, such as the feet. The raw data streams perceived by a robot are available to the network via the operation get-context. Event identification operates on top of the raw data streams and is realized by the operation filter-set-event. In the subnetwork of Figure 12.3 (bottom left), filter-set-event filters for stepping. The operation scans the sensorimotor stream provided by ?ctx-3 and identifies trajectories in space and time that resemble stepping. Finally, it packages the events it has found, marking their type (here, step), their start and end times, and the participating agents.
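The following minimal Python sketch illustrates the idea behind filter-set-event under heavily simplified assumptions. The context is a hypothetical list of (time, agent, foot_height) samples, and a frame counts as stepping whenever the foot height exceeds a hypothetical threshold. The actual operation works on tracked foot trajectories relative to the robot (Figure 12.4). The data format, the threshold and the function signature are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    type: str     # event type, e.g. "step" or "stand"
    start: float  # time of the first frame of the run
    end: float    # time of the last frame of the run
    agent: str    # participating agent (the object anchor)


def filter_set_event(context, event_type, lift_threshold=0.02):
    """Toy filter-set-event (hypothetical): split each agent's foot-height
    stream into maximal runs of 'step' (foot lifted) or 'stand' (foot down)
    frames and return the runs of the requested type.

    context: iterable of (time, agent, foot_height) samples (assumed format).
    lift_threshold: assumed height above which the foot counts as lifted.
    """
    frames_by_agent = defaultdict(list)
    for time, agent, height in sorted(context):
        label = "step" if height > lift_threshold else "stand"
        frames_by_agent[agent].append((time, label))

    events = []
    for agent, frames in frames_by_agent.items():
        run_start, run_type = frames[0]
        prev_time = run_start
        for time, label in frames[1:]:
            if label != run_type:  # the current run ended at the previous frame
                events.append(Event(run_type, run_start, prev_time, agent))
                run_start, run_type = time, label
            prev_time = time
        events.append(Event(run_type, run_start, prev_time, agent))
    return [e for e in events if e.type == event_type]


# Misha lifts his foot at t=1..2; Masha keeps both feet down.
ctx = [(0, "misha", 0.0), (1, "misha", 0.05), (2, "misha", 0.06), (3, "misha", 0.0),
       (0, "masha", 0.0), (1, "masha", 0.0), (2, "masha", 0.0), (3, "masha", 0.0)]
steps = filter_set_event(ctx, "step")
```

The returned objects carry only the information the text lists: type, start and end times, and the agent.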
The processing of Aktionsarten in IRL relies on a type hierarchy in the computer-science sense of the term (see Figure 12.5). Events are intervals that consist of three parts:
a start and end time,
an event type, e.g. step or stand, which encodes what kind of event it is,
an object anchor, which points to the agent of the event.
The primitive profile-event-aktionsart extends this representation. It profiles a given event as either a bounded or an unbounded subtype, depending on the particular Aktionsart in question.
For the durative Aktionsart, profiling yields an unbounded event. This event keeps all the information of the input event (start and end time, as well as type) and adds the information that it is unbounded. For Aktionsarten that describe bounded events, profiling highlights the boundaries that the Aktionsart requires. For instance, profiling by ingressive highlights the initial boundary. Importantly, the agent imposes these distinctions: they are part of conceptualizing a scene in a certain way, not part of the scene itself.
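The type hierarchy and the profiling step can be sketched with Python dataclasses. This sketch renders Figure 12.5 for illustration and is not the IRL implementation. The HIGHLIGHTS table is an assumption of the sketch. It records which boundaries each Aktionsart highlights, following the descriptions in Section 12.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float
    object_anchor: str  # the agent of the event


@dataclass(frozen=True)
class Event(Interval):
    type: str  # e.g. "step" or "stand"


@dataclass(frozen=True)
class UnboundedEvent(Event):
    """Event conceptualized without boundaries (durative)."""


@dataclass(frozen=True)
class BoundedEvent(Event):
    boundaries: tuple = ()  # highlighted boundaries, e.g. ("initial",)


# Assumed table: boundaries each Aktionsart highlights; None = unbounded.
HIGHLIGHTS = {
    "durative": None,
    "ingressive": ("initial",),
    "telic": ("final",),
    "delimitative": ("initial", "final"),
    "semelfactive": ("initial", "final"),  # punctual: the two coincide
}


def profile_event_aktionsart(event: Event, aktionsart: str) -> Event:
    """Return a copy of `event` profiled as a bounded or unbounded subtype."""
    fields = (event.start, event.end, event.object_anchor, event.type)
    highlighted = HIGHLIGHTS[aktionsart]
    if highlighted is None:
        return UnboundedEvent(*fields)
    return BoundedEvent(*fields, boundaries=highlighted)


step = Event(1.0, 2.0, "misha", "step")
ingressive_step = profile_event_aktionsart(step, "ingressive")
durative_step = profile_event_aktionsart(step, "durative")
```

Profiling never changes the start time, end time, type or anchor. It only selects the subtype and the highlighted boundaries, which matches the point that these distinctions are imposed by the agent.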
Processing events inevitably raises the issue of temporal reference and tense. We use the Allen interval calculus (Allen, 1983) to represent temporal relations, which form the basis of tense distinctions. The corresponding operation is called filter-by-allen-relation. A special version of this primitive uses now as the deictic center. It is called filter-by-allen-relation-to-now, and combined with the precedes temporal relation it implements the past tense. A close look at the IRL network in Figure 12.3 also reveals other events that serve as reference points for temporal relations. The two previously profiled events, stepping and standing, are related through contained-in, which realizes temporal subordination.

² A demo of the processing of the example sentence can be found at www.fcgnet.org/reviews/aktionsarten-demo/. Additional information on our modeling of temporal language can be found in Gerasymova et al (2009); Gerasymova (2010); Gerasymova and Spranger (2010). Early work in the temporal domain can be found in De Beule (2006).

Fig. 12.4 Vision system tracking robots. Both the robot and its feet are detected and tracked. The vision system encodes the trajectory of the foot in relation to the robot, allowing the event recognition operations to classify particular trajectories as step events or stand events.

[Figure 12.5: type hierarchy
  interval (start: real, end: real, object-anchor: symbol)
    event (type: symbol)
      bounded event (boundaries: list)
      unbounded event]
Fig. 12.5 Event hierarchy for modeling temporal Aktionsarten. Events are intervals consisting of a start and end time as well as an event type, e.g. walk or read, and an object anchor, the agent of the event.
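The following Python sketch illustrates a few Allen relations and the two filtering operations. The chapter does not spell out their definitions. The sketch therefore assumes Allen's standard strict definitions and maps contained-in to Allen's during relation, so IRL's exact semantics may differ. The time values are invented for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float


# A few of Allen's (1983) relations, in their standard strict form.
def precedes(a, b):
    return a.end < b.start


def meets(a, b):
    return a.end == b.start


def starts(a, b):
    return a.start == b.start and a.end < b.end


def contained_in(a, b):  # assumed to correspond to Allen's "during"
    return b.start < a.start and a.end < b.end


RELATIONS = {"precedes": precedes, "meets": meets,
             "starts": starts, "contained-in": contained_in}


def filter_by_allen_relation(events, reference, relation):
    """Keep the events that stand in `relation` to the reference interval."""
    test = RELATIONS[relation]
    return [e for e in events if test(e, reference)]


def filter_by_allen_relation_to_now(events, relation, now):
    """Same filter, with the moment of speech as a point-like deictic center."""
    return filter_by_allen_relation(events, Interval(now, now), relation)


stand = Interval(1.0, 8.0)  # Masha's standing
step = Interval(3.0, 4.0)   # Misha's stepping
past = filter_by_allen_relation_to_now([stand, step], "precedes", now=10.0)
subordinated = filter_by_allen_relation([step], stand, "contained-in")
```

With these values, both events lie in the past, and the stepping falls inside the standing, as in example sentence 1.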
The described representations and type hierarchies derive their descriptive power from the role they play in the complete network. For instance, it now becomes possible to specialize how temporal relations apply to particular event types. In Russian, not every Aktionsart can be combined with every temporal relation. Some relations make no sense for an unbounded event, i.e. an event conceptualized as durative. One example is meets, where one event stops exactly when another starts. Because meets works on boundaries and a durative event has neither an initial nor a final boundary, it does not make sense to relate a durative event to another event with meets. It is plausible, however, to relate a durative event to another one with the contained-in relation, and this is what happens in our example structure. Here, the durative standing contains the ingressive stepping.
For ingressives, the set of possible relations changes in a different way. Another event can reasonably be conceptualized as meeting an ingressive event, i.e. finishing exactly when the ingressive starts. The ingressive event itself, however, cannot meet another event. The final boundary of an ingressive is underspecified. For this reason, the following example, in which an ingressive stepping meets a delimitative crying, is ungrammatical:
Example 4. (*) Как только Миша зашагал, Маша поплакала.
Kak tol’ko Miša zašagal, Maša poplakala.
‘As Misha started stepping, Masha cried for a while.’
The temporal conjunction как только (kak tol’ko) normally means ‘immediately after’, which corresponds to meets in temporal logic. When it occurs with two ingressives, however, it is interpreted as saying that the two events started together:
Example 5. Как только Миша зашагал, Маша заплакала.
Kak tol’ko Miša zašagal, Maša zaplakala.
‘As Misha started stepping, Masha started crying.’
Hence, the two ingressives can be related with the starts temporal relation, indicating that both events start at the same time, which is in line with our model because
the necessary initial boundaries of both events are activated.
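The constraints in this section can be summarized as a compatibility check between Aktionsarten and temporal relations. The Python sketch below is a hypothetical encoding built from the prose. Each Aktionsart provides the boundaries it highlights, and each relation requires certain boundaries on its first and second argument:
meets needs a final boundary on the first event and an initial boundary on the second;
starts needs initial boundaries on both;
contained-in is assumed to need none.

```python
# Boundaries highlighted by each Aktionsart, following Section 12.1.
BOUNDARIES = {
    "durative": frozenset(),
    "ingressive": frozenset({"initial"}),
    "telic": frozenset({"final"}),
    "delimitative": frozenset({"initial", "final"}),
    "semelfactive": frozenset({"initial", "final"}),
}

# Hypothetical encoding: boundaries a relation needs on (first, second) event.
REQUIRES = {
    "meets": (frozenset({"final"}), frozenset({"initial"})),
    "starts": (frozenset({"initial"}), frozenset({"initial"})),
    "contained-in": (frozenset(), frozenset()),
}


def compatible(relation, first, second):
    """True if `relation(first, second)` can be conceptualized, given the
    boundaries that the two Aktionsarten make available."""
    need_first, need_second = REQUIRES[relation]
    return need_first <= BOUNDARIES[first] and need_second <= BOUNDARIES[second]


durative_meets = compatible("meets", "durative", "ingressive")        # False
example_4 = compatible("meets", "ingressive", "delimitative")         # False
meets_ingressive = compatible("meets", "delimitative", "ingressive")  # True
example_5 = compatible("starts", "ingressive", "ingressive")          # True
sentence_1 = compatible("contained-in", "ingressive", "durative")     # True
```

The check reproduces the judgements in the text: a durative cannot take part in meets, an ingressive cannot meet another event, another event can meet an ingressive, and two ingressives can be related by starts.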
Networks such as the one in Figure 12.3 can be built in two different ways. First, when planning an utterance that should fulfill a particular communicative goal, agents search the space of possible networks for the one that promises to best achieve the goal. Second, when parsing an utterance, the language system recovers parts of the network. The hearer must then fill in the missing parts of the network from the information encoded in the utterance and from the context.
12.3 Syntactic Processing of Temporal Language
Language processing is implemented in the Fluid Construction Grammar (FCG)
formalism (Steels et al, 2012, this volume). IRL networks constitute the semantic input for the grammar engine, which uses constructions to handle both production and parsing. During production, the semantic pole of the initial transient structure contains the meaning (the IRL network), and the syntactic pole is empty (Figure 12.6). Constructions are applied through a search process that looks for the next applicable construction. In total, producing the example sentence applies 12 constructions of the implemented grammar. The resulting linguistic structure, depicted in Figure 12.7, is rendered into the utterance “Misha za- shaga -l, kogda Masha stoja -la”, which was our target example.
[Figure 12.6, semantic pole (sem), unit top:]
meaning ((get-context ctx-1-idv)
         (bind event-type stand-event-type-idv stand-event)
         (filter-set-event stand-events-idv ctx-1-idv stand-event-type-idv)
         (filter-by-allen-relation-to-now stand-dur-past-idv stand-dur-idv stand-time-idv)
         (bind allen-relation stand-time-idv precedes)
         (profile-event-aktionsart stand-dur-idv stand-events-idv durative-idv)
         (bind aktionsart durative-idv durative)
         (filter-event-by-agent masha-stand-dur-past-idv stand-dur-past-idv idv-idv)
         (get-context ctx-2-idv)
         (filter-by-individual idv-idv ctx-2-idv individual-idv)
         (bind individual individual-idv masha)
         (unique-entity stand-event-idv masha-stand-dur-past-idv)
         (get-context ctx-3-idv)
         (bind event-type step-event-type-idv step-event)
         (filter-set-event step-events-idv ctx-3-idv step-event-type-idv)
         (filter-by-allen-relation-to-now step-ingr-past-idv step-ingr-idv step-time-idv)
         (bind allen-relation step-time-idv precedes)
         (profile-event-aktionsart step-ingr-idv step-events-idv aktionsart-idv)
         (bind aktionsart aktionsart-idv ingressive)
         (filter-event-by-agent misha-step-ingr-past-idv step-ingr-past-idv idv-2-idv)
         (filter-by-individual idv-2-idv ctx-4-idv individual-2-idv)
         (bind individual individual-2-idv misha)
         (get-context ctx-4-idv)
         (filter-by-allen-relation target-idv misha-step-ingr-past-idv stand-event-idv allen-relation-idv)
         (bind allen-relation allen-relation-idv contained-in)
         (unique-entity topic-idv target-idv))
[Syntactic pole (syn): unit top, still empty.]
Fig. 12.6 Initial transient structure at the beginning of the production process. The semantic (left) pole contains the list of primitives representing the meaning that has to be expressed, i.e. the output of the IRL network (Figure 12.3). During the production process, the syntactic (right) pole of this structure is gradually built up as constructions are applied.
The interplay between Aktionsarten and aspect is encoded in the grammar through specialized constructions. To see how, consider the part of the final transient structure (Figure 12.7) whose syntactic pole expresses “Misha started stepping”, depicted in Figure 12.8. This part contains six units:
subject-verb-unit-3, which holds the information about word order;
its daughters, the subject misha-unit-2 and the verb profiled-verb-3;
the daughters of the verb, namely prefix-unit-2, step-event-unit-2 and ending-unit-3.

This structure is created in the following steps.
1. The lexical constructions apply first. The construction responsible for step creates the unit step-event-unit-2, which does not yet contain any grammatical information.
2. The profile-verb construction triggers on the meanings profile-event-aktionsart and filter-by-allen-relation-to-now. It creates a new unit, profiled-verb-3, which will hold the linguistic information about the corresponding semantic and syntactic categories.
3. The structure is rearranged. The profiled-verb-3 unit moves to the top of the hierarchy, and the dependent subunits are attached to it, for example step-event-unit-2 for the verb stem itself.

The profiled-verb-3 unit collects the information about grammatical categories. Constructions responsible for morphology then operate on this information, attaching markers, order constraints, or even prosody information. For example, a special construction for the ingressive Aktionsart triggers on a structure only if the structure has the semantic category (aktionsart ?profiled-event-1 ingressive), which indicates the meaning (bind aktionsart aktionsart-idv ingressive). The ingressive construction then sets the syntactic category Aktionsart to ingressive and the aspect to perfective. Only under this condition could the prefix за- (za-) attach to the profiled verb. The prefix thus realizes the grammatical expression of the ingressive Aktionsart, which leads to perfective aspect. Similarly, the ending -л (-l) was assigned to the verb only as the surface expression of past tense, masculine gender and singular number.

[Figure 12.7: syntactic pole of the final transient structure.
Units under top:
  subject-verb-unit-3: misha-unit-2 “Misha” and profiled-verb-3. The daughters of profiled-verb-3 are prefix-unit-2 “za-”, step-event-unit-2 “shaga” and ending-unit-3 “-l”. Its syn-cat is aspect perfective, aktionsart ingressive, gender male, person 3rd, tense past, number singular.
  contained-in-unit-2: “kogda”, a conjunction.
  subject-verb-unit-4: masha-unit-2 “Masha” and profiled-verb-4. The daughters of profiled-verb-4 are stand-event-unit-2 “stoja” and ending-unit-4 “-la”. Its syn-cat is aktionsart durative, aspect imperfective, gender female, person 3rd, tense past, number singular.
Form constraints in top:
  (meets subject-verb-unit-3 ?comma-1)
  (string ?comma-1 ", ")
  (meets contained-in-unit-2 subject-verb-unit-4)
  (precedes subject-verb-unit-3 subject-verb-unit-4)]
Fig. 12.7 Syntactic pole of the final linguistic structure created by the FCG system for expressing the IRL network from Figure 12.3. Gathering all the strings and meets constraints from the syntactic pole (this figure) yields the utterance “Misha za- shaga -l, kogda Masha stoja -la” (‘Misha began to step, while Masha was standing’), which was our targeted example sentence 1.

[Figure 12.8: detail of Figure 12.7 showing subject-verb-unit-3, with misha-unit-2 “Misha” and profiled-verb-3. The daughters of profiled-verb-3 are prefix-unit-2 “za-”, step-event-unit-2 “shaga” and ending-unit-3 “-l”. Its syn-cat is aspect perfective, aktionsart ingressive, gender male, person 3rd, tense past, number singular.]
Fig. 12.8 Profiling of verbs (detail of the final linguistic structure in Figure 12.7).
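The caption of Figure 12.7 states that gathering the strings and meets constraints yields the utterance. The Python sketch below illustrates this rendering step on a simplified encoding of the figure. Each unit lists its daughters, leaf units carry strings, and a topological sort (graphlib, Python 3.9+) resolves the ordering constraints between sister units. The sketch is not FCG's renderer, and it makes two assumptions. First, meets is treated as plain ordering, without an adjacency check. Second, it adds one constraint that places the comma before kogda, because the figure does not state where the comma goes.

```python
from graphlib import TopologicalSorter

# Simplified encoding of the syntactic pole in Figure 12.7: daughters per unit.
DAUGHTERS = {
    "top": ["subject-verb-unit-3", "comma-1", "contained-in-unit-2", "subject-verb-unit-4"],
    "subject-verb-unit-3": ["misha-unit-2", "profiled-verb-3"],
    "profiled-verb-3": ["prefix-unit-2", "step-event-unit-2", "ending-unit-3"],
    "subject-verb-unit-4": ["masha-unit-2", "profiled-verb-4"],
    "profiled-verb-4": ["stand-event-unit-2", "ending-unit-4"],
}

# String constraints of the leaf units.
STRINGS = {
    "misha-unit-2": "Misha", "prefix-unit-2": "za-", "step-event-unit-2": "shaga",
    "ending-unit-3": "-l", "comma-1": ",", "contained-in-unit-2": "kogda",
    "masha-unit-2": "Masha", "stand-event-unit-2": "stoja", "ending-unit-4": "-la",
}

# Ordering constraints between sister units, as (earlier, later) pairs.
ORDER = {
    "top": [("subject-verb-unit-3", "comma-1"),               # meets
            ("comma-1", "contained-in-unit-2"),               # ASSUMED, not in Fig. 12.7
            ("contained-in-unit-2", "subject-verb-unit-4"),   # meets
            ("subject-verb-unit-3", "subject-verb-unit-4")],  # precedes
    "subject-verb-unit-3": [("misha-unit-2", "profiled-verb-3")],
    "profiled-verb-3": [("prefix-unit-2", "step-event-unit-2"),
                        ("step-event-unit-2", "ending-unit-3")],
    "subject-verb-unit-4": [("masha-unit-2", "profiled-verb-4")],
    "profiled-verb-4": [("stand-event-unit-2", "ending-unit-4")],
}


def render(unit):
    """Return the strings of `unit` in order, sorting sister units by ORDER."""
    if unit in STRINGS:
        return [STRINGS[unit]]
    sorter = TopologicalSorter({daughter: set() for daughter in DAUGHTERS[unit]})
    for earlier, later in ORDER.get(unit, []):
        sorter.add(later, earlier)  # `later` has `earlier` as a predecessor
    words = []
    for daughter in sorter.static_order():
        words.extend(render(daughter))
    return words


utterance = " ".join(render("top")).replace(" ,", ",")
```

Because every level of this toy structure is fully ordered, the topological sort has exactly one solution. The result is the target string of Figure 12.7.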
So far we have mostly discussed production. Because FCG is reversible, however, exactly the same set of constructions also works for parsing. In parsing, the agent aims to reconstruct the syntactic structure and extract an IRL network that serves as the meaning of the utterance. The hearer executes the operations specified in the network in the present context to derive the topic of the conversation, which is the event the speaker wanted the hearer to attend to. During this process, the FCG parser builds both syntactic and semantic structures, gradually adding more operations, entities and links between them to the network. Once the FCG parser has decoded the network as far as possible, IRL tries out the network and actively expands it by adding links and missing entities. How much reconstruction is required depends on how ambiguous the utterance is. Our example contains no ambiguity, so FCG parses the utterance and the resulting network, identical to the one in Figure 12.3, can be executed directly. Now consider parsing the temporal conjunction как только (kak tol’ko, ‘as soon as’) in Example 5. Here the composer relies on the lexical meaning of the conjunction, which corresponds to the meets relation, and also takes the semantic constraints of the Aktionsarten into account. Together, these lead to the correct interpretation in the given context, namely that the two events started together.
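The reinterpretation of как только can be illustrated with a hypothetical encoding of which boundaries each Aktionsart highlights. In this Python sketch, the lexical meaning meets is tried first. The fallback rule then reads two ingressives as starts. This rule is an assumption that covers only the case discussed around Examples 4 and 5, and it does not reproduce the composer's actual search procedure.

```python
# Hypothetical boundary encoding (only the Aktionsarten needed here).
BOUNDARIES = {
    "durative": frozenset(),
    "ingressive": frozenset({"initial"}),
    "delimitative": frozenset({"initial", "final"}),
}

# Boundaries a relation needs on its (first, second) argument.
REQUIRES = {
    "meets": (frozenset({"final"}), frozenset({"initial"})),
    "starts": (frozenset({"initial"}), frozenset({"initial"})),
}


def compatible(relation, first, second):
    need_first, need_second = REQUIRES[relation]
    return need_first <= BOUNDARIES[first] and need_second <= BOUNDARIES[second]


def interpret_kak_tolko(first, second):
    """Relation for 'как только <first-clause>, <second-clause>', where the
    arguments are the Aktionsarten of the two verbs. Returns None if no
    reading is available, i.e. the sentence is ungrammatical."""
    if compatible("meets", first, second):
        return "meets"  # the lexical meaning applies directly
    if first == second == "ingressive" and compatible("starts", first, second):
        return "starts"  # assumed repair: both events begin together
    return None
```

For example, interpret_kak_tolko("ingressive", "ingressive") yields the starts reading of Example 5, and interpret_kak_tolko("ingressive", "delimitative") finds no reading, as in Example 4.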
12.4 Acquisition of Temporal Language
We now turn to the question of how artificial agents can acquire temporal language. A central part of acquiring aspectual language is acquiring the grammatical category of aspect. Over the last few decades, numerous psycholinguistic studies have investigated child language acquisition in general (Tomasello, 2000; Lieven et al, 2003) and the acquisition of temporal language in particular (Stoll, 1998). The most promising approaches are usage-based: they hypothesize that children's linguistic skills result from their accumulated experience with language. For instance, Tomasello (2000) proposes concrete stages that children go through when acquiring language. Children begin speaking with holophrastic units, later learn item-based constructions, and only at the final stage master adult-like abstract constructions.
For aspectual language in particular, psycholinguists have hypothesized similar stages. Russian children start by acquiring predominantly simple verbs (imperfectives; see Gagarina, 2000). This is likely because imperfectives have minimal stem complexity (simple verbs), contain no aspectual operators and are easier to use than perfectives. Overall, children acquire aspect piecemeal, and learning can continue until the age of six or seven (Stoll, 1998). At first, children master verbal aspect as part of the lexical meaning of a verb within the general process of cognitive development. In other words, they learn to recognize different situations and to ‘name’ them with different verb forms. Children therefore do not learn aspect as a separate verbal category per se and have no general semantic representation of grammatical aspect. Instead, they rely on the lexical classes of individual Aktionsarten (Gagarina, 2000). These Aktionsarten are in turn learned independently of each other and in a context-specific way (Stoll, 1998). Only later do children abstract the grammatical category of aspect away from this tight contextual connection by unifying several Aktionsarten into the abstract category of perfective. At this stage, aspect is finally recognized as a category separate from the lexical meaning of verbs (Gagarina, 2000).
The following sections focus on the acquisition of aspectual grammar. We describe a learning experiment in which artificial agents acquire the constructions of the target grammar by participating in situated verbal interactions. We build on extensive previous research on lexicon formation (Steels, 1995) and assume that the learning agents are equipped with a fully developed lexicon. We also scaffold the grounding of semantics and use IRL as the bottom layer. Tense is not used in these experiments.
12.5 The Aspectual Language Game
The setup of the experiment is inspired by the comprehension experiments of Stoll
(1998), who investigated how children develop their understanding of aspectual
forms. Preschool children were interviewed after watching pairs of short movies,
each illustrating what would be described by a different aspectual form of the same
verb stem. Similarly, in our experiment artificial agents observe pairs of events differing in temporal semantics and consequently best described by different aspectual
forms. Some agents in the population are tutors and possess a fully developed aspectual system. Further, there are learning agents which have to autonomously acquire
the aspectual system. Agents of both types engage in dialogues, and the learning
agents subsequently pick up the aspectual grammar, so that at the end of the experiment all learner agents converge on a set of grammatical constructions similar to
those of the tutors.
Language games are routinized communicative interactions between pairs of
agents. Here is the script for such an interaction:
1. Two agents are randomly selected from the population. One agent acts as the
speaker, the other as the hearer. Learners as well as tutors can be both speaker and
hearer. Both agents perceive a shared joint-attentional frame (Tomasello, 1995)
called context. The context consists of two events of the same kind but with different temporal semantics, e.g., ongoing reading versus reading for a while. The two events also differ in their protagonists, either Миша (Michael) or Маша (Masha). Two example events are thus Michael reading for a while and Masha reading the whole time.
2. The speaker starts the interaction by choosing one event from the context as a
topic, for example, the event where Michael was reading for a while. The speaker's communicative goal is to ask a question about the protagonist of the topic-event (in our case Michael) that discriminates this protagonist. The question therefore has to incorporate the event’s temporal structure. For example, Кто почитал? (Kto počital?, ‘Who read for a while?’) discriminates Michael because only he was involved in the action for a short period of time (Masha was reading the whole time). Once the speaker has found such a question, he transmits it to the hearer.
3. The hearer perceives the utterance, parses it, and interprets it using the context.
The task of the hearer is to identify the protagonist of the topic-event unambiguously; guessing is not allowed.
4. If the hearer is able to unambiguously answer the question, she verbalizes her
answer by saying Michael. Otherwise, she gives up.
5. The speaker signals whether the answer is correct, i.e., whether it corresponds to the protagonist of the topic-event. A correct answer means the interaction was a success; no answer or a wrong answer counts as a communicative failure.
6. If the answer is incorrect or absent, the speaker reveals the desired answer.
7. Based on the outcome of the interaction, the learner consolidates his grammatical knowledge. He increases or decreases the scores of grammatical constructions, creates new constructions and deletes old ones.

[Figure 12.9: по-читал, glossed as read for-a-while.]
Fig. 12.9 Schema of the holophrasis почитал (počital, ‘read-for-a-while’). This holophrastic construction maps the form of the observed utterance почитал to its meaning read-for-a-while. The learner treats it as a single unit without knowing its composition.
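The script above can be sketched as a single Python function covering steps 2 to 7 for one round. Everything here is a toy assumption. Events carry only an agent, a verb and a temporal-semantics label. The hearer parses an utterance into a possibly partial meaning. The learner in the usage example naively stores the topic's meaning after a failure. Step 1 (pairing speaker and hearer) and the score updates of step 7 are left out, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    agent: str      # protagonist, e.g. "Michael"
    verb: str       # e.g. "read"
    semantics: str  # temporal semantics, e.g. "for-a-while" or "ongoing"


def matches(event, meaning):
    """True if the event fits a (possibly partial) parsed meaning."""
    return all(getattr(event, key) == value for key, value in meaning.items())


def play_round(topic, context, speak, parse, learn):
    """One aspectual language game (steps 2-7); returns True on success."""
    utterance = speak(topic)                                        # step 2
    meaning = parse(utterance)                                      # step 3
    candidates = [e for e in context if matches(e, meaning)]
    answer = candidates[0].agent if len(candidates) == 1 else None  # step 4: no guessing
    success = answer == topic.agent                                 # step 5
    if not success:                                                 # steps 6-7
        learn(utterance, topic, context)
    return success


# Usage with a toy tutor (speaker) and a toy learner (hearer).
context = [Event("Michael", "read", "for-a-while"), Event("Masha", "read", "ongoing")]
tutor_forms = {("read", "for-a-while"): "kto počital", ("read", "ongoing"): "kto čital"}
lexicon = {"čital": {"verb": "read"}}  # assumed: the stem is known, the prefix is not
memory = {}                            # the learner's stored samples


def speak(topic):
    return tutor_forms[(topic.verb, topic.semantics)]


def parse(utterance):
    word = utterance.split()[-1]
    if word in memory:
        return memory[word]
    for stem, meaning in lexicon.items():
        if word.endswith(stem):  # partial parse: prefix left unprocessed
            return dict(meaning)
    return {}


def learn(utterance, topic, context):
    memory[utterance.split()[-1]] = {"verb": topic.verb, "semantics": topic.semantics}


first = play_round(context[0], context, speak, parse, learn)   # ambiguous, fails
second = play_round(context[0], context, speak, parse, learn)  # succeeds
```

The first round fails because the partial parse fits both readers. After the failure, the learner stores the form, so the same question succeeds the second time.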
12.6 Stages in the Acquisition of Temporal Language
The key to the agents’ ability to learn lies in cognitive mechanisms for detecting and solving problems that arise during interactions, e.g., the inability to parse an utterance or an ambiguous interpretation. The whole learning process rests on the successful application of these problem-solving tactics. With respect to the learning mechanisms employed, it can be divided into (at least) three successive stages: acquisition of holophrases, of item-based constructions and of abstract constructions.
Holophrases are the first type of children’s early constructions, where children
use a single linguistic symbol to communicate their intentions about a specific scene
(Tomasello, 2000). By analogy, learning artificial agents acquire holophrastic constructions during their first phase of learning.
Learning happens when a learning hearer cannot completely parse the question the speaker posed, as in the example interaction Кто почитал? (Kto počital?, ‘Who read for a while?’).
1. The hearer can process the parts kto and čital, but the prefix po- remains unprocessed.
2. Since both events are about reading, the question becomes ambiguous, and the agent finds two hypotheses about the protagonist involved in the event.
3. The hearer is not allowed to guess, so she gives up and the interaction fails.
4. The speaker then reveals the right answer: Michael.
5. The hearer tries to learn from her shortcoming. She first stores the complete perceived utterance as a sample.
6. Since questions are assumed to be discriminative, she then searches her context for a semantic factor that differentiates Michael from Masha. The distinctive feature of Michael is the temporal structure of his reading, which lasts for a while, in contrast to Masha's ongoing reading.
7. She adds this deduced information to the stored sample (shown schematically in Figure 12.9).
The holophrasis is implemented as an FCG construction, i.e. a mapping between meaning and form. Kto is not stored in the sample construction because the agent is assumed to know it already.

[Figure 12.10: curves for communicative success, holophrastic constructions and total number of constructions; y-axis: number of constructions.]
Fig. 12.10 Learning holophrastic constructions. The learner is equipped with only one repair strategy: internalizing observed utterances. Communicative success is reached, but the inventory contains 98 holophrastic constructions (14 verbs × 7 different temporal semantics).
The intuition behind holophrases is that the learning agent treats почитал as a single constituent after encountering it for the very first time. In this way, the learning agent stores perceived samples as undifferentiated holophrastic constructions, e.g., поиграл (poigral, ‘played-for-a-while’) and порисовал (porisoval, ‘drew-for-a-while’). These holophrastic constructions are fully operational: the second time the agent hears the same question, she is able to parse it entirely and can even produce this question when in the role of the speaker (but only the exact same question).
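The holophrase strategy can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the FCG implementation used in the experiments: events are hypothetical (protagonist, verb, temporal feature) triples, the verb is taken as already recovered from the known stem, and the class HolophraseLearner and its methods are invented for illustration. After a failed game, the learner stores the whole form together with the temporal feature that distinguishes the topic from other events with the same verb, and afterwards can parse only that exact form.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event representation: (protagonist, verb, temporal feature).
Event = Tuple[str, str, str]


class HolophraseLearner:
    """Stores whole verb forms as unanalysed form-meaning pairs (a sketch)."""

    def __init__(self) -> None:
        # Maps a complete form such as "почитал" to (verb, temporal feature).
        self.holophrases: Dict[str, Tuple[str, str]] = {}

    def learn(self, form: str, verb: str, topic: Event, context: List[Event]) -> bool:
        """After a failed game, store `form` with the feature that singles out `topic`.

        Questions are assumed to be discriminative, so the learner keeps the
        topic's temporal feature only if no other event with the same verb
        in the context shares it.
        """
        feature = topic[2]
        rivals = [e for e in context if e != topic and e[1] == verb]
        if any(e[2] == feature for e in rivals):
            return False  # the feature does not discriminate; nothing is stored
        self.holophrases[form] = (verb, feature)
        return True

    def parse(self, form: str) -> Optional[Tuple[str, str]]:
        # Only forms stored verbatim can be parsed; there is no generalization.
        return self.holophrases.get(form)


context = [("Michael", "read", "for-a-while"), ("Masha", "read", "ongoing")]
learner = HolophraseLearner()
print(learner.parse("почитал"))  # None: the prefix cannot be processed yet
learner.learn("почитал", "read", topic=context[0], context=context)
print(learner.parse("почитал"))  # ('read', 'for-a-while')
print(learner.parse("поиграл"))  # None: a new combination needs its own holophrase
```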
When learners are equipped with such a strategy, they are able to communicate successfully once they have memorized all the prefix+verb combinations that occur in the population. Figure 12.10 shows communicative success converging as holophrases are acquired. However, such an organization of the language inventory is unsatisfactory. With every additional verb circulating in the population, the number of required constructions increases by the number of temporal semantic features, and with every additional semantic feature it increases by the number of verbs. Furthermore, such an inventory lacks any notion of grammar, which contradicts the known ability of adult native speakers of Russian to recognize two distinct aspects.
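The growth described above is multiplicative. A short arithmetic check using the figures given for Figure 12.10 (14 verbs, 7 temporal semantics); the function name is illustrative only:

```python
def holophrase_inventory_size(n_verbs: int, n_features: int) -> int:
    """Holophrases needed when every verb form is stored as one whole construction."""
    return n_verbs * n_features


base = holophrase_inventory_size(14, 7)
print(base)                                     # 98, as in Figure 12.10
print(holophrase_inventory_size(15, 7) - base)  # 7: one more verb adds one form per feature
print(holophrase_inventory_size(14, 8) - base)  # 14: one more feature adds one form per verb
```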
Item-based Constructions are created in a second stage (Tomasello, 2000).
These constructions are more general and based on repeatedly encountered samples
of a similar kind. For example, the hearer again faces the problem of ambiguous interpretation because she cannot parse пописал (popisal, ‘wrote-for-a-while’). But now, instead of giving up, she searches through her stored samples for a means of parsing the utterance and eventually notices that the holophrastic constructions for почитал, поиграл and порисовал differ only in the verb stem. Hence, the agent is able to create a more general construction for the usage pattern по+verb (po+verb, with a slot for a verb), as shown in Figure 12.11, and to successfully parse the utterance involving пописал. The discovery of this usage pattern corresponds to the acquisition of the delimitative Aktionsart. More precisely, the agent has learned that the presence of the prefix по- (po-) in front of a verb indicates that the temporal semantic feature for a while has been added to its meaning.

[Diagram: по-читал ‘read for-a-while’, по-играл ‘play for-a-while’, по-рисовал ‘draw for-a-while’ generalized to по-verb ‘verb for-a-while’]
Fig. 12.11 Learning of the item-based construction по+verb (po+verb, ‘verb+for-a-while’). Above the arrow: undifferentiated holophrases are stored in memory when encountered. Under the arrow: holophrases with a particular prefix become generalized to an item-based construction based on this prefix, enabling parsing of prefixed verbs.
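The generalization step can be illustrated with a simple string-based sketch. It is an assumption-laden stand-in for the FCG operator, not the chapter's implementation: stored samples are hypothetical (stem, meaning, feature) records, a prefix is induced when at least `min_samples` distinct stems share it with the same temporal feature, and the names generalize_prefixes and parse_prefixed are ours.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stored sample: form -> (verb stem, verb meaning, temporal feature).
Sample = Tuple[str, str, str]


def generalize_prefixes(samples: Dict[str, Sample], min_samples: int = 2) -> Dict[str, str]:
    """Induce item-based constructions prefix+verb -> temporal feature.

    Holophrases that share the same leftover prefix and the same temporal
    feature, and differ only in the verb stem, are generalized; the stem
    becomes an open slot. (If one prefix were attested with two features,
    this sketch would simply keep the last one.)
    """
    evidence: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for form, (stem, _meaning, feature) in samples.items():
        if form.endswith(stem) and len(form) > len(stem):
            evidence[(form[: -len(stem)], feature)].append(stem)
    return {prefix: feature
            for (prefix, feature), stems in evidence.items()
            if len(set(stems)) >= min_samples}


def parse_prefixed(form: str, stems: Dict[str, str],
                   prefixes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Parse a possibly unseen prefixed form with the item-based constructions."""
    for prefix, feature in prefixes.items():
        stem = form[len(prefix):]
        if form.startswith(prefix) and stem in stems:
            return stems[stem], feature
    return None


stems = {"читал": "read", "играл": "play", "рисовал": "draw", "писал": "write"}
samples = {
    "почитал": ("читал", "read", "for-a-while"),
    "поиграл": ("играл", "play", "for-a-while"),
    "порисовал": ("рисовал", "draw", "for-a-while"),
}
prefixes = generalize_prefixes(samples)
print(prefixes)                                    # {'по': 'for-a-while'}
print(parse_prefixed("пописал", stems, prefixes))  # ('write', 'for-a-while')
```

Note that this learner can now parse пописал without ever having stored it, but nothing here yet lets her build a prefixed form in production.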
After this stage, the learning agent can correctly interpret any (known) verb prefixed by по- (po-), even if she has not encountered this particular combination before. However, the agent has only acquired the ability to comprehend the pattern по+verb and still lacks the knowledge needed to use this device in language generation. What is missing is an understanding of the general principle that new Aktionsarten are derived by prefixation, which is required to actively build a prefixed form in production.
Nevertheless, the process of generalization described here for the prefix по- (po-) works in exactly the same way for other prefixes, given enough generalizable material in an agent’s linguistic inventory. The learned constructions are item-based, the item being the particular prefix. The independent emergence of such item-based constructions for other prefixes mirrors the independent acquisition of Aktionsarten (hypothesized in Section 12.4).
Finally, Abstract Constructions are created. The final phase of the acquisition
process in children is characterized by generalization over item-based constructions
and formation of abstract constructions, in which children express their communicative intentions through utterances that instantiate relatively abstract, adult-like
linguistic constructions (Tomasello, 2000).
[Diagram: item-based constructions по-verb ‘verb for-a-while’, на-verb ‘verb complete’ and за-verb ‘verb begin’ generalized to the abstract construction prefix-verb ‘verb temporal-sem-cat’]
Fig. 12.12 Learning the principle of prefixation: emergence of an abstract construction for perfective. Above the arrow: item-based constructions based on particular prefixes. Under the arrow:
abstract construction expressing the general principle of prefixation for derivation of new Aktionsarten is learned through generalization over item-based constructions.
Although the two previous learning strategies solved the parsing problem for artificial learners, production remains troublesome. When learners need to generate a question in a dialog, they are still unable to construct the complete utterance. In particular, they are unable to express the temporal semantics of events needed for discrimination. The learner detects this failure by re-entering the output of production into her own language system, parsing it, and noticing that the constructed utterance is insufficient to single out the topic. The idea behind re-entrance is to predict the effect of the utterance before actually passing it to the hearer.
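Re-entrance can be pictured as the speaker running her own parser on her planned utterance and checking whether the result picks out exactly the topic. The sketch below is hypothetical throughout: events are (protagonist, verb, temporal feature) triples, and a toy lexicon maps a form to (verb, feature), with feature set to None when the learner's form leaves the temporal semantics unexpressed.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical event representation: (protagonist, verb, temporal feature).
Event = Tuple[str, str, str]
# A parse yields (verb, feature); feature is None when the form leaves it unexpressed.
Meaning = Tuple[str, Optional[str]]


def reentrance_check(utterance: str,
                     parse: Callable[[str], Optional[Meaning]],
                     topic: Event,
                     context: Sequence[Event]) -> bool:
    """Return True if the speaker's own parse of `utterance` singles out `topic`."""
    meaning = parse(utterance)
    if meaning is None:
        return False
    verb, feature = meaning
    candidates = [e for e in context
                  if e[1] == verb and (feature is None or e[2] == feature)]
    return candidates == [topic]


context = [("Michael", "read", "for-a-while"), ("Masha", "read", "ongoing")]
# Hypothetical learner lexicon: the bare stem leaves the temporal semantics open.
lexicon = {"читал": ("read", None), "почитал": ("read", "for-a-while")}
print(reentrance_check("читал", lexicon.get, context[0], context))    # False: two readers match
print(reentrance_check("почитал", lexicon.get, context[0], context))  # True
```

A failed check is the signal that triggers the repair described next.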
To repair her communicative problem, the learner examines the inventory of her
linguistic experiences. There, accumulated item-based constructions reveal a general principle that the temporal semantics of verbs (Aktionsart) can be expressed
by means of prefixation. This discovery can be captured by a novel abstract construction, where a prefixed verb, regardless of the actual form of the prefix and
corresponding Aktionsart, becomes marked for the perfective aspect. The new construction operates only on the abstract semantic and syntactic categories of Aktionsart (temporal-sem-cat) and aspect and generates an abstract unit for a prefix without
any concrete linguistic material (Figure 12.12). Only after this stage is the agent able
to generate the perfective derivation of any (known) verb without having heard the
resulting form before. This process resembles the way children acquire the grammatical category of aspect late in development, by unifying several Aktionsarten
into the abstract category of perfective.
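The effect of the abstract construction on production can be sketched as follows, assuming a hypothetical inventory of item-based prefix constructions that follows Figure 12.12 (по- for a while, на- complete, за- begin). The function produce is ours, not the FCG construction itself: whatever prefix expresses the requested feature is attached to the stem and the result is marked perfective, so forms the learner has never heard can be generated.

```python
from typing import Dict, Optional, Tuple


def produce(meaning: str, feature: str,
            stems: Dict[str, str],
            prefixes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Produce (form, aspect) for a verb meaning and a temporal feature.

    Sketch of the abstract prefixation construction: whatever prefix expresses
    the feature is attached to the stem, and the result is marked perfective.
    The ongoing (durative) reading needs no prefix and stays imperfective.
    """
    stem_for = {m: s for s, m in stems.items()}
    prefix_for = {f: p for p, f in prefixes.items()}
    if meaning not in stem_for:
        return None  # unknown verb
    stem = stem_for[meaning]
    if feature == "ongoing":
        return stem, "imperfective"
    if feature not in prefix_for:
        return None  # no item-based construction for this feature yet
    return prefix_for[feature] + stem, "perfective"


# Item-based constructions following Figure 12.12, and a few known stems.
prefixes = {"по": "for-a-while", "на": "complete", "за": "begin"}
stems = {"читал": "read", "писал": "write", "шагал": "step"}
print(produce("step", "begin", stems, prefixes))      # ('зашагал', 'perfective')
print(produce("write", "complete", stems, prefixes))  # ('написал', 'perfective')
print(produce("read", "ongoing", stems, prefixes))    # ('читал', 'imperfective')
```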
12.7 Experimental Results
Through repeated rounds of the aspectual language game presented above, artificial learners are able to acquire the aspectual grammar. Figure 12.13 (upper graph) displays the development of the grammar of one learning agent. In the world in which the learner is situated, events can exhibit 7 different temporal semantics: ongoing, begin, for a while, finish, complete, exhaustion, alteration. Therefore, the target grammar should contain 20 constructions in total.³

[Plot, two panels: communicative success (0 to 1) and number of constructions (0 to 40; total, holophrastic, semantic, mapping and morphological constructions) over the number of interactions (0 to 3000)]
Fig. 12.13 Development of aspectual grammar: communicative success and number of grammatical constructions of one learner during the acquisition process (above: population of one learner and one tutor, avg. of 10 parallel runs of the experiment; bottom: population of 10 agents with 5 learners).
³ This number results from the particular realization of the target grammar in FCG and is assembled from 7 semantic and 7 abstract mapping constructions (one for each temporal semantic facet) and 6 morphological constructions. The durative Aktionsart, which codes the ongoing temporal semantics, does not require a prefix and therefore lacks a morphological construction.
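The count of 20 can be reproduced from the list of temporal semantics; the variable names below are illustrative only:

```python
FEATURES = ["ongoing", "begin", "for a while", "finish",
            "complete", "exhaustion", "alteration"]

semantic = len(FEATURES)  # one semantic construction per temporal feature
mapping = len(FEATURES)   # one abstract mapping construction per feature
# The durative Aktionsart (ongoing) has no prefix, hence no morphological construction.
morphological = sum(1 for f in FEATURES if f != "ongoing")
total = semantic + mapping + morphological
print(semantic, mapping, morphological, total)  # 7 7 6 20
```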
In the beginning, the only grammatical constructions the learning agent creates are holophrases (red line); their number coincides with the total number of grammatical constructions the agent acquires (yellow line). After a few dozen interactions, the learner starts to generalize, noticing the system behind the stored samples: other types of grammatical constructions are generated (semantic and morphological item-based constructions and abstract mapping constructions, indicated by the green, gray and blue lines, respectively). Communicative success converges to its maximum value after approximately 300 interactions (cyan line); every subsequent game is a success.
All constructions in the agent’s inventory have a score in the range [0, 1] at any given time during the game. When a new construction enters the inventory, it is assigned an initial score of 0.5. In the course of the game, the scores of constructions are updated depending on their success in communication (unsuccessful constructions are punished). After the target grammar is acquired (20 constructions in total), the very specific holophrastic constructions become redundant: they compete with the more general item-based and abstract constructions. Eventually, the holophrases lose this competition and disappear after about 2000 interactions. The bottom graph in Figure 12.13 displays similar dynamics for the scaled-up case of 5 learning agents in a population of 10 agents.
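The text fixes only the score range [0, 1], the initial score of 0.5 and the fact that unsuccessful constructions are punished. The sketch below fills in the rest with assumptions of our own that may differ from the original implementation: a hypothetical step size DELTA = 0.1, lateral inhibition of competing constructions after a successful game, and removal of constructions whose score reaches zero. Under these assumptions, a general item-based construction that keeps winning drives a redundant holophrase out of the inventory.

```python
from typing import Dict, Iterable

INITIAL_SCORE = 0.5  # initial score of a new construction, as stated in the text
DELTA = 0.1          # hypothetical step size
EPSILON = 1e-9       # tolerance so floating-point residue counts as zero


def clamp(x: float) -> float:
    """Keep scores in the range [0, 1]."""
    return max(0.0, min(1.0, x))


class Inventory:
    """Construction inventory with scores (a sketch, not the original update rule)."""

    def __init__(self) -> None:
        self.scores: Dict[str, float] = {}

    def add(self, name: str) -> None:
        self.scores.setdefault(name, INITIAL_SCORE)

    def update(self, used: str, success: bool, competitors: Iterable[str] = ()) -> None:
        if success:
            self.scores[used] = clamp(self.scores[used] + DELTA)
            # Assumed lateral inhibition: competing constructions are weakened.
            for rival in competitors:
                if rival in self.scores:
                    self.scores[rival] = clamp(self.scores[rival] - DELTA)
        else:
            # Unsuccessful constructions are punished.
            self.scores[used] = clamp(self.scores[used] - DELTA)
        # Constructions whose score reaches zero disappear from the inventory.
        self.scores = {n: s for n, s in self.scores.items() if s > EPSILON}


inventory = Inventory()
inventory.add("holophrase:почитал")
inventory.add("item-based:по+verb")
for _ in range(5):
    inventory.update("item-based:по+verb", True, competitors=["holophrase:почитал"])
print(sorted(inventory.scores))  # ['item-based:по+verb']
```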
12.8 Conclusion
This chapter investigated how temporal language can be operationalized in artificial agents and how artificial learners can acquire aspectual grammar through communicative interactions. We examined mechanisms for the grounding as well as the semantic and syntactic processing of Russian temporal language and highlighted how information flows through the system so that artificial speakers and listeners can communicate.
For the acquisition study, we equipped artificial tutors with subsets of Russian aspectual grammar and had them interact with learning agents, giving the learners the opportunity to infer and adopt constructions for talking about aspectually marked events in their environment. We introduced three different learning operators reminiscent of findings in developmental psychology. The results show that the proposed learning operators, together with the machinery needed for routine conceptualization and language processing, lead to the successful acquisition of aspectual structures found in human grammars.
Acknowledgements
This research has been carried out at Sony CSL Paris with support from the EU FP7
project ALEAR. We are indebted to Masahiro Fujita, Hideki Shimomura and their
team from Sony Corporation Japan for making the robots available to us.
References
Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM
26(11):832–843
Bickel B (1996) Aspect, mood, and time in Belhare: studies in the semantics –
pragmatics interface of a Himalayan language. Zürich: ASAS-Verlag
De Beule J (2006) Simulating the syntax and semantics of linguistic constructions about time. In: Gontier N, van Bendegem JP, Aerts D (eds) Evolutionary Epistemology, Language and Culture: A Non-adaptationist, Systems Theoretical Approach, Springer
Forsyth J (1970) A Grammar of Aspect: Usage and Meaning in the Russian Verb.
Cambridge: Cambridge University Press
Gagarina N (2000) The acquisition of aspectuality by Russian children: the early
stages. ZAS Papers in Linguistics 15:232–246
Gerasymova K (2010) Emergence of Aktionsarten: The first step towards aspect. In: Smith A, Schouwstra M, de Boer B, Smith K (eds) The Evolution of Language (Evolang8), World Scientific, Singapore, pp 145–152
Gerasymova K, Spranger M (2010) Acquisition of grammar in autonomous artificial systems. In: Coelho H, Studer R, Wooldridge M (eds) Proceedings of the 19th European Conference on Artificial Intelligence (ECAI-2010), IOS Press, pp 923–928
Gerasymova K, Steels L, van Trijp R (2009) Aspectual morphology of Russian verbs in fluid construction grammar. In: Taatgen N, van Rijn H (eds) Proceedings of the 31st Annual Conference of the Cognitive Science Society, Cognitive Science Society, pp 1370–1375
Krongauz MA (1998) Pristavki i glagoly v russkom jazyke: semantičeskaja grammatika. Moscow: Jazyki russkoj kul’tury
Lieven E, Behrens H, Speares J, Tomasello M (2003) Early syntactic creativity: A usage-based approach. Journal of Child Language 30(2):333–370
Spranger M, Pauw S, Loetzsch M, Steels L (2012) Open-ended Procedural Semantics. In: Steels L, Hild M (eds) Language Grounding in Robots, Springer, New
York
Steels L (1995) A self-organizing spatial vocabulary. Artificial Life 2(3):319–332
Steels L, Baillie JC (2003) Shared grounding of event descriptions by autonomous
robots. Robotics and Autonomous Systems 43(2-3):163–173
Steels L, De Beule J, Wellens P (2012) Fluid Construction Grammar on Real
Robots. In: Steels L, Hild M (eds) Language Grounding in Robots, Springer,
New York
Stoll S (1998) The role of Aktionsart in the acquisition of Russian aspect. First Language 18(54):351–376
Tomasello M (1995) Joint attention as social cognition. In: Moore C, Dunham PJ
(eds) Joint attention: Its origins and role in development, Lawrence Erlbaum Associates, Hillsdale, NJ, pp 103–130
Tomasello M (2000) First steps toward a usage-based theory of language acquisition. Cognitive Linguistics 11(1/2):61–82