Chapter 12
An Experiment in Temporal Language Learning

Kateryna Gerasymova¹ and Michael Spranger²,³

¹ Institute of Biomedical Engineering (IBME), University of Oxford, Oxford, United Kingdom, email: [email protected]
² Sony Computer Science Laboratory Paris, 6 rue Amyot, 75005 Paris, France
³ Systems Technology Laboratory, Sony Corporation, Minato-ku 108-0075, Tokyo, Japan

This paper is the authors’ draft and has now been officially published as: Kateryna Gerasymova and Michael Spranger (2012). An Experiment in Temporal Language Learning. In Luc Steels and Manfred Hild (Eds.), Language Grounding in Robots, 237–254. New York: Springer.

Abstract
Russian requires speakers of the language to conceptualize events using temporal language devices such as Aktionsarten and aspect, which relate to particular profiles and characteristics of events, such as whether the event has just started, is ongoing, or is repeated. This chapter explores how such temporal features of events can be processed and learned by robots through grounded situated interactions. We use a whole systems approach, tightly integrating perception, conceptualization, grammatical processing and learning, and demonstrate how a system of Aktionsarten can be acquired.

Key words: temporal language, aspect, cognitive semantics, fluid construction grammar, language acquisition

12.1 Introduction

Temporal language concerns the conceptualization and expression of relations between events (such as past, present, future) and the internal structure of events (such as perfective versus imperfective). We focus here only on aspect and use Russian as the target language because this language is known to have a complex aspectual system. To illustrate, we consider the following example sentence from Russian:

Example 1.
Миша зашагал, когда Маша стояла.
Miša zašagal, kogda Maša stojala.
‘Misha began to step, while Masha was standing.’

Aktionsarten describe the lexical temporal semantics of verbs. Here we approach Aktionsarten in terms of event boundaries (Bickel, 1996), as also done by Stoll (1998); see Figure 12.1. The event itself is bound in time: it has a starting and an ending point and is of some duration. The boundaries and phases of this event that are the focal point of the Aktionsart in question are indicated by a curly brace.

[Diagram: an event on a time axis t, with a curly brace marking the focused part.]
Fig. 12.1 Duratives focus on the inner portion of the event and do not highlight any boundaries.

The durative Aktionsart¹ comprises verbs that describe events without referring to any boundaries, e.g. мечтатьⁱ (mečtat’, ‘dream’), as illustrated in Figure 12.1. Duratives are imperfective and are often used to describe usual facts, simple activities and states, as in the example:

Example 2.
Когда Катя говорилаⁱ, она очень мило улыбаласьⁱ ...
Kogda Katja govorila, ona očen’ milo ulybalas’.
‘When Katya spoke, she smiled charmingly.’ (I. S. Turgenev. Fathers and Sons)

Ingressives express the beginning of an event, e.g. зашагатьᵖ (zašagat’, ‘start stepping’). They are most commonly associated with the prefix за- (za-), which introduces an initial boundary to the event described by the unprefixed verb, as shown in Figure 12.2. The new form becomes perfective.

[Diagram: an event on a time axis t, with a curly brace marking the focused part.]
Fig. 12.2 Ingressive Aktionsart concentrates on the initial boundary of an event.

In the following sentence, the ingressive заплакатьᵖ (zaplakat’, ‘burst into tears’) denotes the beginning of a corresponding event of crying:

Example 3.
Тётя Катя не хотела ехать, да как заплачетᵖ ...
Tëtja Katja ne chotela echat’, da kak zaplačet.
‘Aunt Katya didn’t want to go and suddenly burst into tears.’ (L. N. Tolstoy. A gloomy morning)

¹ The terminology is comparable to that of Forsyth (1970), who refers to Aktionsarten as procedurals and uses slightly different terms for each of the different Aktionsarten.
[Figure 12.3: IRL network drawn as a graph of linked operations. The same primitives are listed in linear form in Figure 12.6.]
Fig. 12.3 IRL network underlying example sentence 1, configured autonomously by a speaker to discriminate a particular event in the context.

Other temporal Aktionsarten include: the delimitatives, which denote the development of an action to a limited extent and are therefore characterized by both boundaries, the initial and the final; the telic Aktionsart, which incorporates the notion of a result or goal of the corresponding action as part of the verb’s semantics, focusing on the final boundary; and the semelfactives, which express the notion of doing something once; they are punctual, so the initial and final boundaries coincide.
All these Aktionsarten are derived from durative verbs by attaching one of the nineteen different prefixes (Krongauz, 1998), after which the resulting form becomes perfective. The semelfactives can also be derived by suffixation. An account of the Aktionsarten that are not purely temporal falls beyond the scope of this paper. Moreover, we focus the discussion here only on those Aktionsarten which occur in the example sentence: durative and ingressive.

12.2 Grounded Temporal Semantics

We use Incremental Recruitment Language or IRL (see Spranger et al, 2012, this volume) to represent and compute semantic structures that underlie the meaning of utterances². Figure 12.3 depicts a potential network underlying example sentence 1. It encodes a set of operations that will lead to the identification of Misha’s stepping by relating its beginning to the event of Masha’s standing.

In order to understand the operations in the network from Figure 12.3, we first need to consider the grounding of events in sensorimotor data streams, which underlies the operations of the semantic structure. Earlier work (Steels and Baillie, 2003) has focussed on complex event structures; here we use a similar, albeit simplified, approach. Most importantly, and in contrast to earlier approaches, the grounding of events happens as part of conceptualization. Here is how that is achieved. The vision system provides streams of raw data, which encode spatial and color properties of objects in the vicinity of the robots. Special subsystems of the vision system recognize robots or other known entities (see Figure 12.4). The system has been specifically tuned to recognizing robots and particular parts of robots such as the feet. The raw data streams perceived by a robot are available to the network via the operation get-context. The identification of events works on top of the raw data streams and is realized through the operation filter-set-event.
In the subnetwork of Figure 12.3 (bottom left), filter-set-event performs filtering for stepping. The operation goes through the sensorimotor stream provided by ctx-3 and identifies trajectories in space and time that resemble stepping. Finally, it packages the events it has found and marks their type (here, step), their start and end times, and the participating agents.

The processing of Aktionsarten in IRL relies on a type hierarchy in the computer science sense of the term (see Figure 12.5). Events are intervals, consisting of a start and end time as well as an event-type, e.g. step or stand (which encodes what kind of event it is), and an object anchor, which points to the agent of the event. The primitive profile-event-aktionsart (see Figure 12.3) extends the representation of events: it profiles a given event to either a bound or an unbound subtype, depending on the particular Aktionsart in question. For the durative Aktionsart the result of profiling is an unbound event: an event that still contains the same information as the input event, i.e. start and end time as well as type, but with the additional information of being unbound. For Aktionsarten that are described as bound events, profiling adds highlights on boundaries as required by the Aktionsart. For instance, profiling by ingressive highlights the initial boundary. It is important that these distinctions are imposed by the agent: they are part of conceptualizing a scene in a certain way; they are not part of the scene itself.

Fig. 12.4 Vision system tracking robots. Both the robot and its feet are detected and tracked. The vision system encodes the trajectory of the foot in relation to the robot, allowing the event recognition operations to classify particular trajectories as step events or stand events.

[Figure 12.5: type hierarchy with the types interval, event, bounded event and unbounded event, and the slots start : real, end : real, object-anchor : symbol, type : symbol, and boundaries : list (on bounded event).]
Fig. 12.5 Event hierarchy for modeling temporal Aktionsarten. Events are intervals consisting of a start and end time as well as an event-type, e.g. walk or read, and an object anchor, the agent of the event.

² The demo of the processing of the example sentence can be found on www.fcgnet.org/reviews/aktionsarten-demo/. Additional information on our modeling efforts on temporal language can be found in Gerasymova et al (2009); Gerasymova (2010); Gerasymova and Spranger (2010). Early work in the temporal domain can be found in De Beule (2006).

Processing of events inevitably raises the issue of reference in time and tense. We use the Allen interval calculus (Allen, 1983) to represent temporal relations, which are the basis of tense distinctions. The corresponding operation is called filter-by-allen-relation. The special version of this primitive, which employs now as a deictic center, is called filter-by-allen-relation-to-now and is used in the implementation of the past tense, combined with the precedes temporal relation. Moreover, when looking closely at the IRL-network from Figure 12.3, one encounters other events serving as reference points for temporal relations. The two previously profiled events, stepping and standing, are related through contained-in, realizing temporal subordination.

The described representations and type hierarchies derive their descriptive power from the role they play in the complete network. For instance, it now becomes possible to specialize the way temporal relations apply to particular event types. In Russian it is not possible to combine all Aktionsarten with all temporal relations. For an unbound event, i.e. an event conceptualized as being durative, certain relations like meets (one event stops exactly when another starts) are incomprehensible. Since a durative event possesses neither an initial nor a final boundary, it does not make sense to relate a durative event using a meets temporal relation, which works on boundaries.
However, it is plausible to relate a durative event to another one using, for instance, a contained-in relation, which is what happens in our example structure. Here, the durative standing contains the ingressive stepping.

For ingressives the set of possible relations changes in a different way. While it seems perfectly reasonable to conceptualize some event as meeting an ingressive event, i.e. finishing exactly when the other starts, the ingressive event itself cannot meet another event. Because of the underspecified final boundary of ingressives, the following example, where an ingressive stepping meets delimitative crying, is ungrammatical:

Example 4.
(*) Как только Миша зашагал, Маша поплакала.
Kak tol’ko Miša zašagal, Maša poplakala.
‘As Misha started stepping, Masha cried for a while.’

However, when the same temporal conjunction как только, which normally means immediately after or, in terms of the temporal logic, meets, occurs with two ingressives, the interpretation is that the two events have started together:

Example 5.
Как только Миша зашагал, Маша заплакала.
Kak tol’ko Miša zašagal, Maša zaplakala.
‘As Misha started stepping, Masha started crying.’

Hence, the two ingressives can be related with the starts temporal relation, indicating that both events start at the same time. This is in line with our model because the necessary initial boundaries of both events are activated.

Networks such as the one in Figure 12.3 can be built in two different ways. First, when trying to plan an utterance in order to fulfill a particular communicative goal, agents search the space of possible networks for the one that promises to best achieve the goal. Second, when confronted with an utterance in parsing, the language system will recover parts of the network. It is then up to the hearer to fill in the missing parts of the network given the information encoded in the utterance and the context.
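The interaction between profiling and temporal relations described in this section can be illustrated with a small Python sketch. It is not the IRL implementation: the names Event, ProfiledEvent, profile and admissible are hypothetical, the type hierarchy of Figure 12.5 is flattened into a single boundaries field, and the table of which boundaries each relation needs is our reading of the prose (meets uses the end of the first event and the start of the second, starts uses both starts, contained-in uses no boundary).

```python
from dataclasses import dataclass

# Boundaries highlighted by each Aktionsart, read off the prose of
# Section 12.1. The table layout and the names are assumptions.
HIGHLIGHTED = {
    "durative": frozenset(),                          # unbound event
    "ingressive": frozenset({"initial"}),
    "telic": frozenset({"final"}),
    "delimitative": frozenset({"initial", "final"}),
    "semelfactive": frozenset({"initial", "final"}),  # punctual
}

# Boundaries a relation needs on its (first, second) argument (assumed).
REQUIRED = {
    "meets": (frozenset({"final"}), frozenset({"initial"})),
    "starts": (frozenset({"initial"}), frozenset({"initial"})),
    "contained-in": (frozenset(), frozenset()),
}


@dataclass(frozen=True)
class Event:
    start: float
    end: float
    event_type: str  # e.g. "step" or "stand"
    agent: str       # the object anchor


@dataclass(frozen=True)
class ProfiledEvent:
    event: Event
    aktionsart: str
    boundaries: frozenset  # empty for an unbound event


def profile(event: Event, aktionsart: str) -> ProfiledEvent:
    """Keep the event's data and add the boundaries the Aktionsart highlights."""
    return ProfiledEvent(event, aktionsart, HIGHLIGHTED[aktionsart])


def admissible(relation: str, first: ProfiledEvent, second: ProfiledEvent) -> bool:
    """True if both events highlight the boundaries the relation operates on."""
    need_first, need_second = REQUIRED[relation]
    return need_first <= first.boundaries and need_second <= second.boundaries


# Illustrative events; the time stamps are made up.
step = profile(Event(2.0, 6.0, "step", "misha"), "ingressive")
stand = profile(Event(0.0, 9.0, "stand", "masha"), "durative")
cry_for_a_while = profile(Event(6.0, 8.0, "cry", "masha"), "delimitative")
start_crying = profile(Event(2.0, 5.0, "cry", "masha"), "ingressive")

print(admissible("contained-in", step, stand))     # example sentence 1
print(admissible("meets", step, cry_for_a_while))  # Example 4
print(admissible("starts", step, start_crying))    # Example 5
```

Under these assumptions the three calls print True, False and True, matching the judgments given for example sentence 1, Example 4 and Example 5. The point of the design is that admissibility depends only on the profile chosen by the agent, not on the raw interval.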
12.3 Syntactic Processing of Temporal Language

Language processing is implemented in the Fluid Construction Grammar (FCG) formalism (Steels et al, 2012, this volume). IRL-networks constitute the semantic input for the grammar engine, which utilizes constructions to handle production and parsing. During production, the initial transient structure contains the meaning (IRL-network) on its semantic pole; the syntactic pole is empty (Figure 12.6). The application of constructions is realized as a search process, which tries to find the next possible construction to apply. Overall, in the course of production of the example sentence, 12 constructions of the implemented grammar are applied. The resulting linguistic structure, depicted in Figure 12.7, is rendered into the utterance “Misha za- shaga -l, kogda Masha stoja -la”, which was our targeted example.

  top (semantic pole)
    meaning
      ((get-context ctx-1-idv)
       (bind event-type stand-event-type-idv stand-event)
       (filter-set-event stand-events-idv ctx-1-idv stand-event-type-idv)
       (filter-by-allen-relation-to-now stand-dur-past-idv stand-dur-idv stand-time-idv)
       (bind allen-relation stand-time-idv precedes)
       (profile-event-aktionsart stand-dur-idv stand-events-idv durative-idv)
       (bind aktionsart durative-idv durative)
       (filter-event-by-agent masha-stand-dur-past-idv stand-dur-past-idv idv-idv)
       (get-context ctx-2-idv)
       (filter-by-individual idv-idv ctx-2-idv individual-idv)
       (bind individual individual-idv masha)
       (unique-entity stand-event-idv masha-stand-dur-past-idv)
       (get-context ctx-3-idv)
       (bind event-type step-event-type-idv step-event)
       (filter-set-event step-events-idv ctx-3-idv step-event-type-idv)
       (filter-by-allen-relation-to-now step-ingr-past-idv step-ingr-idv step-time-idv)
       (bind allen-relation step-time-idv precedes)
       (profile-event-aktionsart step-ingr-idv step-events-idv aktionsart-idv)
       (bind aktionsart aktionsart-idv ingressive)
       (filter-event-by-agent misha-step-ingr-past-idv step-ingr-past-idv idv-2-idv)
       (filter-by-individual idv-2-idv ctx-4-idv individual-2-idv)
       (bind individual individual-2-idv misha)
       (get-context ctx-4-idv)
       (filter-by-allen-relation target-idv misha-step-ingr-past-idv stand-event-idv allen-relation-idv)
       (bind allen-relation allen-relation-idv contained-in)
       (unique-entity topic-idv target-idv))
  top (syntactic pole)
    (empty)

Fig. 12.6 Initial transient structure at the beginning of the production process. The semantic (left) pole contains the list of primitives representing the meaning that has to be expressed, i.e. the output of the IRL network (Figure 12.3). During the production process, the syntactic (right) pole of this structure will gradually be built up as a result of the application of constructions.

  top (syntactic pole)
  conjunction-unit-2
    form ((meets contained-in-unit-2 subject-verb-unit-4) (meets subject-verb-unit-3 ?comma-1) (string ?comma-1 ", ") (precedes subject-verb-unit-3 subject-verb-unit-4))
  subject-verb-unit-3
    form ((meets misha-unit-2 profiled-verb-3))
    syn-cat ((pos intransitive-verb-phrase))
  misha-unit-2
    form ((string misha-unit-2 "Misha"))
    syn-cat ((pos proper-noun) (gender idv-2-idv male) (number idv-2-idv singular) (person idv-2-idv 3rd) (case idv-2-idv nominativ))
  profiled-verb-3
    form ((meets prefix-unit-2 step-event-unit-2) (meets step-event-unit-2 ending-unit-3))
    syn-cat ((aspect perfective) (aktionsart ingressive) (gender idv-2-idv male) (person idv-2-idv 3rd) (tense past) (number idv-2-idv singular) (pos verb))
  prefix-unit-2
    form ((string prefix-unit-2 "za-"))
    syn-cat ((pos prefix))
  step-event-unit-2
    form ((string step-event-unit-2 "shaga"))
    syn-cat ((pos verb-stem))
  ending-unit-3
    form ((string ending-unit-3 "-l"))
    syn-cat ((pos ending))
  contained-in-unit-2
    form ((string contained-in-unit-2 "kogda"))
    syn-cat ((pos conjunction))
  subject-verb-unit-4
    form ((meets masha-unit-2 profiled-verb-4))
    syn-cat ((pos intransitive-verb-phrase))
  masha-unit-2
    form ((string masha-unit-2 "Masha"))
    syn-cat ((pos proper-noun) (gender idv-idv female) (number idv-idv singular) (person idv-idv 3rd) (case idv-idv nominativ))
  profiled-verb-4
    form ((meets stand-event-unit-2 ending-unit-4))
    syn-cat ((aktionsart durative) (aspect imperfective) (gender idv-idv female) (person idv-idv 3rd) (tense past) (number idv-idv singular) (pos verb))
  stand-event-unit-2
    form ((string stand-event-unit-2 "stoja"))
    syn-cat ((pos verb-stem))
  ending-unit-4
    form ((string ending-unit-4 "-la"))
    syn-cat ((pos ending))

Fig. 12.7 Syntactic pole of the final linguistic structure created by the FCG system for expressing the IRL network from Figure 12.3. Gathering all the strings and meets constraints from the syntactic pole (this figure) yields the utterance “Misha za- shaga -l, kogda Masha stoja -la” (‘Misha began to step, while Masha was standing’), which was our targeted example sentence 1.

The interplay between Aktionsarten and aspect is encoded in the grammar via specialized constructions. Let us zoom into the final transient structure from Figure 12.7 and focus, in the syntactic pole, on the part “Misha started stepping”, depicted in Figure 12.8. This part contains six units: the unit subject-verb-unit-3, which holds the information about the word order, and its ‘daughters’, which correspond to the subject misha-unit-2 and the verb profiled-verb-3.

[Figure 12.8 repeats six units of Figure 12.7: subject-verb-unit-3, misha-unit-2, profiled-verb-3, prefix-unit-2, step-event-unit-2 and ending-unit-3.]
Fig. 12.8 Profiling of verbs (detail of the final linguistic structure in Figure 12.7).

How was this structure created? First the lexical constructions are applied; the one responsible for step creates the unit named step-event-unit-2, which contains no grammatical information yet. After that, the profile-verb construction triggers on the meaning profile-event-aktionsart and filter-by-allen-relation-to-now and creates a new profiled-verb-3 unit, where the linguistic information about the corresponding semantic and syntactic categories will be allocated. Further, the structure is rearranged: the profiled-verb-3 unit is put on top of the hierarchy, and the dependent subunits are attached to it, for example step-event-unit-2 for the verb stem itself. The profiled-verb-3 unit gathers information about the grammatical categories, on which constructions responsible for morphology can operate, attaching markers, order constraints, or even prosody information. For example, a special construction for the ingressive Aktionsart triggers on a structure only if the latter features the semantic category (aktionsart ?profiled-event-1 ingressive), which indicates the meaning (bind aktionsart aktionsart-idv ingressive). Then the ingressive-construction fills in the syntactic category Aktionsart with the value ingressive, but also the aspect with perfective. Only in this case was it possible for the prefix за- (za-) to attach to the profiled verb, thereby realizing the grammatical expression of the ingressive Aktionsart, which leads to perfective aspect.
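How string and meets constraints determine a surface form, as described in the caption of Figure 12.7, can be sketched in a few lines of Python. This is only an illustration, not FCG's rendering procedure: the function render is hypothetical, it handles leaf units only, and the constraint (meets misha-unit-2 profiled-verb-3) has been flattened by hand to the leftmost leaf of profiled-verb-3, namely prefix-unit-2.

```python
def render(strings, meets):
    """Linearize leaf units into an utterance.

    strings: dict mapping a unit name to its string
    meets:   iterable of (left, right) pairs, read as "left is
             immediately followed by right"
    """
    successor = dict(meets)
    followers = set(successor.values())
    # The first unit is the only one that follows no other unit.
    first = [unit for unit in strings if unit not in followers]
    if len(first) != 1:
        raise ValueError("expected exactly one initial unit")
    order, unit = [], first[0]
    while unit is not None:
        if unit in order or unit not in strings:
            raise ValueError("constraints do not form a single chain")
        order.append(unit)
        unit = successor.get(unit)
    if len(order) != len(strings):
        raise ValueError("some units are not ordered by the constraints")
    return " ".join(strings[unit] for unit in order)


# Leaf units of the part "Misha started stepping" (Figure 12.8).
strings = {
    "ending-unit-3": "-l",
    "step-event-unit-2": "shaga",
    "misha-unit-2": "Misha",
    "prefix-unit-2": "za-",
}
meets = [
    ("prefix-unit-2", "step-event-unit-2"),
    ("step-event-unit-2", "ending-unit-3"),
    ("misha-unit-2", "prefix-unit-2"),  # flattened by hand (assumption)
]
utterance = render(strings, meets)
print(utterance)
```

With these inputs the sketch yields "Misha za- shaga -l", the first clause of the rendered utterance. The order in which the units are stored does not matter; only the constraints fix the word order.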
Similarly, the ending -л (-l) was assigned to the verb only as a surface expression of past tense, masculine gender and singular number.

So far we have mostly discussed production, but thanks to the reversibility of FCG, exactly the same set of constructions also works for parsing. In parsing, the goal of the agent is to reconstruct the syntactic structure and extract an IRL-network acting as the meaning of the utterance. The hearer will execute the operations specified in the network for the present context to derive the topic of the conversation, which is the event the speaker wanted the hearer to pay attention to. During this process, the FCG parser will build both syntactic and semantic structures, gradually introducing more and more operations, entities, and links between them into the network. Once the FCG parser has decoded the network as well as possible, IRL will try out and actively expand the network by adding links and missing entities. The level of reconstruction required depends on the degree of ambiguity in the communicated utterance. In our example, FCG will parse the utterance, and because there is no ambiguity, it can execute the network, which looks exactly like the one in Figure 12.3. But, for instance, when parsing the temporal conjunction как только (kak tol’ko, ‘as soon as’) in Example 5, the composer relies not only on its lexical meaning (which corresponds to the meets relation) but also takes into consideration the semantic constraints of the Aktionsarten, which leads to the correct interpretation, namely that the two events start together in the given context.

12.4 Acquisition of Temporal Language

We now turn to the question of how temporal language can be acquired by artificial agents. An important part of this question for aspectual language is how the grammatical category of aspect is acquired.
Over the last few decades numerous exciting psycholinguistic studies have been concerned with the process of child language acquisition in general (Tomasello, 2000; Lieven et al, 2003) and with the acquisition of temporal language in particular (Stoll, 1998). The most promising approaches are usage-based, which means they hypothesize that children’s linguistic skills result from their accumulated experience with language. For instance, Tomasello (2000) proposes concrete stages children go through when they acquire language: children begin speaking using holophrastic units, later they learn item-based constructions, and only at the final stage do they manage adult-like abstract constructions.

For aspectual language in particular, psycholinguists have hypothesized similar stages. For Russian, children start by acquiring predominantly simple verbs (imperfectives, see Gagarina, 2000). This is likely because imperfectives have a minimal stem complexity (simple verbs), do not contain aspectual operators and are easier to use than perfectives. Overall, children acquire aspect in a piecemeal fashion, and learning can take up to the age of six or seven (Stoll, 1998). At the beginning, verbal aspect is mastered by children as a part of the lexical meaning of a verb within the general process of cognitive development, i.e., children learn to recognize and to ‘name’ different situations by means of different forms of verbs. This means that children do not learn aspect as a separate verbal category per se, and that they have no general semantic representation of grammatical aspect, relying instead on the lexical classes of individual Aktionsarten (Gagarina, 2000). In turn, different Aktionsarten are learned independently from each other in a context-specific way (Stoll, 1998). Only later is the grammatical category of aspect abstracted away from this tight contextual connection by unifying several Aktionsarten into the abstract category of perfective.
At this stage aspect finally becomes recognized as a category separate from the lexical meaning of verbs (Gagarina, 2000).

The following sections focus on the acquisition of aspectual grammar. We describe a learning experiment in which artificial agents acquire constructions of the target grammar by participating in situated verbal interactions. We build on extensive previous research on lexicon formation (Steels, 1995) and assume that the learning agents are equipped with a fully developed lexicon. Additionally, we also scaffold the grounding of semantics and use IRL as the bottom layer. We are not using tense in these experiments.

12.5 The Aspectual Language Game

The setup of the experiment is inspired by the comprehension experiments of Stoll (1998), who investigated how children develop their understanding of aspectual forms. Preschool children were interviewed after watching pairs of short movies, each illustrating what would be described by a different aspectual form of the same verb stem. Similarly, in our experiment artificial agents observe pairs of events that differ in temporal semantics and are consequently best described by different aspectual forms. Some agents in the population are tutors and possess a fully developed aspectual system. Further, there are learning agents, which have to acquire the aspectual system autonomously. Agents of both types engage in dialogues, and the learning agents gradually pick up the aspectual grammar, so that at the end of the experiment all learner agents converge on a set of grammatical constructions similar to those of the tutors.

Language games are routinized communicative interactions between pairs of agents. Here is the script for such an interaction:

1. Two agents are randomly selected from the population. One agent acts as the speaker, the other as the hearer. Learners as well as tutors can be both speaker and hearer. Both agents perceive a shared joint-attentional frame (Tomasello, 1995) called the context. The context consists of two events of the same kind but with different temporal semantics, e.g., ongoing reading versus reading for a while. The two events also differ in their protagonists (either Миша, Michael, or Маша, Masha); thus, two example events are Michael reading for a while versus Masha reading the whole time.
2. The speaker starts the interaction by choosing one event from the context as a topic, for example, the event where Michael was reading for a while. The communicative goal of the speaker is to ask a question about the protagonist of the topic-event (in our case Michael) which discriminates the protagonist. Therefore, the event’s temporal structure has to be incorporated. For example, Кто почитал? (Kto počital?, ‘Who read for a while?’) discriminates Michael because only he was involved in the action for a short period of time (Masha was reading the whole time). Once the speaker has found such a question, he transmits it to the hearer.
3. The hearer perceives the utterance, parses it, and interprets it using the context. The task of the hearer is to identify the protagonist of the topic-event unambiguously; guessing is not allowed.
4. If the hearer is able to unambiguously answer the question, she verbalizes her answer (here, by saying Michael). Otherwise, she gives up.
5. The speaker signals whether the answer is correct, i.e., whether the answer corresponds to the protagonist of the topic-event. The right answer means the interaction was a success; no answer or a wrong answer is considered to be a communicative failure.
6. In the case of either an incorrect or an absent answer, the speaker reveals the desired answer.
7. Based on the outcome of the interaction, the learner consolidates his grammatical knowledge by increasing or decreasing the scores of grammatical constructions, as well as creating new constructions or deleting old ones.

[Diagram pairing the form по-читал with the meaning read for-a-while.]
Fig. 12.9 Schema of the holophrasis почитал (počital, ‘read-for-a-while’). This holophrastic construction maps the form of the observed utterance почитал to its meaning read-for-a-while. The learner treats it as a single unit without knowing its composition.

12.6 Stages in the Acquisition of Temporal Language

The key to the agents’ ability to learn is a set of cognitive mechanisms for detecting and solving problems that may be encountered during interactions, e.g., inability to parse an utterance or ambiguity in interpretation. Successful application of these problem-solving tactics underlies the whole learning process, which can be divided into (at least) three successive stages with respect to the learning mechanisms employed: acquisition of holophrases, of item-based constructions and of abstract constructions.

Holophrases are the first type of children’s early constructions, where children use a single linguistic symbol to communicate their intentions about a specific scene (Tomasello, 2000). By analogy, learning artificial agents acquire holophrastic constructions during their first phase of learning. Learning happens when a learning hearer cannot completely parse a question the speaker posed, as in the example interaction Кто почитал? (Kto počital?, ‘Who read for a while?’). The linguistic parts that can be processed are kto and čital. The prefix po- is left unprocessed. This leads to ambiguity in the interpretation of the question, since both events are about reading. Consequently, the agent finds two hypotheses about the protagonist involved in the event. Since the hearer is not allowed to guess, she gives up, and the interaction is a failure. At the end, the speaker reveals the right answer: Michael. The hearer tries to learn from her shortcoming and first stores the complete perceived utterance as a sample.
Additionally, she searches her context for a semantic factor that could differentiate Michael from Masha, since questions are assumed to be discriminative. The distinctive feature for Michael is the temporal structure of his reading, which is for a while, in contrast to the ongoing reading of Masha. The stored sample is supplemented with this deduced information (schematically shown in Figure 12.9). The holophrasis is implemented as an FCG construction, i.e. a mapping between meaning and form. Kto is not stored in the sample construction because it is assumed to be known by the agent. The intuition behind holophrases is that the learning agent assumes that почитал is a single constituent after encountering it for the very first time. This way, the learning agent stores perceived samples, creating undifferentiated holophrastic constructions, e.g., поиграл (poigral, ‘played-for-a-while’), порисовал (porisoval, ‘drew-for-a-while’). These holophrastic constructions are fully operational, which means that the second time the agent hears the same question, she will be able to parse it entirely and, moreover, to generate this question when in the role of the speaker (but only the exact same question).

[Plot showing communicative success, the number of holophrastic constructions and the total number of constructions.]
Fig. 12.10 Learning holophrastic constructions. The learner is equipped with only one repair strategy: internalizing observed utterances. Communicative success is reached, but the inventory contains 98 holophrastic constructions (14 verbs × 7 different temporal semantics).

When learners are equipped with such a strategy, they are able to communicate successfully after memorizing all possible prefix+verb combinations they have encountered. Figure 12.10 depicts the convergence of communicative success accompanying the gradual acquisition of holophrases. However, such organization of the language inventory is unsatisfactory.
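The holophrase stage can be made concrete with a minimal Python sketch of the hearer's side of the game. It is an illustration under stated assumptions, not the FCG implementation: the class Learner, the function play, the placeholder verb names (verb-0, ...), feature names (sem-0, ...) and form strings are all hypothetical, and construction scores are left out.

```python
class Learner:
    """Hearer that can only memorize whole forms (holophrases)."""

    def __init__(self):
        # form of the prefixed verb -> (verb meaning, temporal feature)
        self.holophrases = {}

    def interpret(self, form, context):
        """Return the protagonist the question discriminates, or None.

        context maps each protagonist to (verb meaning, temporal feature).
        Guessing is not allowed: without a stored holophrase no event
        matches and the hearer gives up.
        """
        meaning = self.holophrases.get(form)
        candidates = [who for who, event in context.items() if event == meaning]
        return candidates[0] if len(candidates) == 1 else None

    def store(self, form, context, answer):
        """Repair: keep the whole form with the meaning of the revealed topic."""
        self.holophrases[form] = context[answer]


def play(hearer, form, context, topic):
    """One interaction; returns True on communicative success."""
    answer = hearer.interpret(form, context)
    if answer == topic:
        return True
    hearer.store(form, context, topic)  # the speaker reveals the answer
    return False


# 14 verbs and 7 temporal semantics, as in Figure 12.10 (names made up).
verbs = [f"verb-{i}" for i in range(14)]
features = [f"sem-{j}" for j in range(7)]

learner = Learner()
first_round, second_round = [], []
for verb in verbs:
    for j, feature in enumerate(features):
        other = features[(j + 1) % len(features)]
        context = {"Misha": (verb, feature), "Masha": (verb, other)}
        form = f"{feature}+{verb}"  # stands in for the prefixed verb form
        first_round.append(play(learner, form, context, "Misha"))
        second_round.append(play(learner, form, context, "Misha"))

print(len(learner.holophrases))
```

Each question fails on first exposure and succeeds on the second, and the inventory ends up with 14 × 7 = 98 entries, one per verb and temporal feature.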
With every additional verb floating in the population, the number of needed constructions increases by the number of temporal semantic features, and with every additional semantic feature it increases by the number of verbs. Furthermore, such an inventory organization lacks any notion of grammar, which contradicts the known ability of adult native speakers of Russian to recognize two distinct aspects.

Fig. 12.11 Learning of the item-based construction по+verb (po+verb, ‘verb+for-a-while’). Above the arrow: undifferentiated holophrases (по-читал ‘read for-a-while’, по-играл ‘play for-a-while’, по-рисовал ‘draw for-a-while’) are stored in memory when encountered. Under the arrow: holophrases with a particular prefix become generalized to an item-based construction based on this prefix (по-verb ‘verb for-a-while’), enabling parsing of prefixed verbs.

Item-based Constructions are created in a second stage (Tomasello, 2000). These constructions are more general and are based on repeatedly encountered samples of a similar kind. For example, the hearer again faces the problem of ambiguous interpretation because of the inability to parse пописал (popisal, ‘wrote-for-a-while’). But now, instead of giving up, she searches through her stored samples for a means of parsing the utterance, eventually noticing that the holophrastic constructions for почитал, поиграл and порисовал (počital, poigral, porisoval) differ only in the actual verb stem. Hence, the agent is able to create a more general construction for the usage pattern по+verb (with a slot for a verb), as shown in Figure 12.11, and successfully parse the utterance involving пописал. The discovery of this usage pattern corresponds to the acquisition of the delimitative Aktionsart. More precisely, the agent has learned that the presence of the prefix по- (po-) in front of a verb indicates that the temporal semantic feature for a while has been added to its meaning.
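The generalization step just described can be sketched in a few lines of Python. This is a hedged illustration rather than the chapter's FCG machinery: the transliterated lexicon, the function names and the `min_samples` threshold (how many holophrases must share a prefix before generalizing) are assumptions introduced here.

```python
# Minimal sketch of inducing an item-based construction (prefix + verb slot)
# from stored holophrases. Names and the min_samples threshold are hypothetical.

VERBS = {"čital": "read", "igral": "play", "risoval": "draw", "pisal": "write"}

HOLOPHRASES = {
    "počital": ("read", "for-a-while"),
    "poigral": ("play", "for-a-while"),
    "porisoval": ("draw", "for-a-while"),
}

def generalize(holophrases, verbs, min_samples=2):
    """Map each recurring prefix to the temporal semantics it contributes."""
    evidence = {}
    for form, (event, temporal) in holophrases.items():
        for stem, meaning in verbs.items():
            # A holophrase supports a prefix if the rest is a known verb
            # whose meaning matches the event stored with the holophrase.
            if form.endswith(stem) and meaning == event and len(form) > len(stem):
                prefix = form[: -len(stem)]
                evidence.setdefault((prefix, temporal), set()).add(stem)
    return {prefix: temporal
            for (prefix, temporal), stems in evidence.items()
            if len(stems) >= min_samples}

def parse_prefixed(form, item_based, verbs):
    """Parse prefix+verb with an item-based construction, or return None."""
    for prefix, temporal in item_based.items():
        stem = form[len(prefix):]
        if form.startswith(prefix) and stem in verbs:
            return verbs[stem], temporal
    return None

item_based = generalize(HOLOPHRASES, VERBS)
```

With this inventory, `parse_prefixed("popisal", item_based, VERBS)` succeeds although popisal was never stored as a holophrase, while a form with an unseen prefix such as zapisal still cannot be parsed. The sketch covers comprehension only, in line with the limitation on production noted in the text.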
After this stage, the learning agent can correctly interpret any (known) verb prefixed by по- (po-), even if she has not encountered this particular combination before. However, the agent has only acquired the ability to comprehend the pattern по+verb and still lacks the additional knowledge needed to use this device in language generation. What is missing is an understanding of the general principle of deriving new Aktionsarten by prefixation, which is required to actively create a prefix structure in production.

Nevertheless, the process of generalization described here for the prefix по- (po-) works in exactly the same way for other prefixes, given enough generalizable material in an agent’s linguistic inventory. The learned constructions are item-based, the item being the particular prefix. The independent emergence of such item-based constructions for other prefixes mirrors the independent acquisition of Aktionsarten (hypothesized in Section 12.4).

Finally, Abstract Constructions are created. The final phase of the acquisition process in children is characterized by generalization over item-based constructions and the formation of abstract constructions, in which children express their communicative intentions through utterances that instantiate relatively abstract, adult-like linguistic constructions (Tomasello, 2000).

Fig. 12.12 Learning the principle of prefixation: emergence of an abstract construction for perfective. Above the arrow: item-based constructions based on particular prefixes (по-verb ‘verb for-a-while’, на-verb ‘verb complete’, за-verb ‘verb begin’). Under the arrow: an abstract construction (prefix-verb ‘verb temporal-sem-cat’) expressing the general principle of prefixation for the derivation of new Aktionsarten is learned through generalization over item-based constructions.

Although the two previous learning strategies solved the parsing problem for artificial learners, production remains troublesome.
When faced with the need to generate a question in a dialog, learners are still unable to construct the complete utterance. In particular, they are unable to express the temporal semantics of events needed for discrimination. This failure is detected by the learner after re-entering the outcome of production into her own language system for parsing and noticing that the constructed utterance is insufficient to single out the topic. The idea behind re-entrance is to predict the effect of the utterance before actually passing it to the hearer. To repair her communicative problem, the learner examines the inventory of her linguistic experiences. There, the accumulated item-based constructions reveal a general principle: the temporal semantics of verbs (Aktionsart) can be expressed by means of prefixation. This discovery can be captured by a novel abstract construction, in which a prefixed verb, regardless of the actual form of the prefix and the corresponding Aktionsart, becomes marked for the perfective aspect. The new construction operates only on the abstract semantic and syntactic categories of Aktionsart (temporal-sem-cat) and aspect, and generates an abstract unit for a prefix without any concrete linguistic material (Figure 12.12). Only after this stage is the agent able to generate the perfective derivation of any (known) verb without having heard the resulting form before. This process resembles the way children acquire the grammatical category of aspect late in development, by unifying several Aktionsarten into the abstract category of perfective.

12.7 Experimental Results

Through repeated interactions of the presented aspectual language game, artificial learners are able to acquire the aspectual grammar. Figure 12.13 (upper graph) displays the development of the grammar of one learning agent. In the world in which the learner is situated, events can exhibit 7 different temporal semantics: ongoing, begin, for a while, finish, complete, exhaustion, alteration. Therefore, the target grammar should contain 20 constructions in total.3

Fig. 12.13 Development of aspectual grammar: communicative success and number of grammatical constructions of one learner during the acquisition process, plotted against the number of interactions (curves: communicative success, total number of constructions, holophrastic constructions, semantic constructions, mapping constructions, morphological constructions; above: population of one learner and one tutor, avg. of 10 parallel runs of the experiment; bottom: population of 10 agents with 5 learners).

3 This number results from the particular realization of the target grammar in FCG and is assembled from 7 semantic and 7 abstract mapping constructions (one for each temporal semantic facet) and 6 morphological constructions. The durative Aktionsart coding the ongoing temporal semantics does not require a prefix and, therefore, lacks a morphological construction.

In the beginning, the only kind of grammatical constructions the learning agent creates are holophrases (red line); their number coincides with the total number of grammatical constructions the agent acquires (yellow line). After a couple of dozen interactions, the learner starts to generalize, noticing the system behind the stored samples: other types of grammatical constructions are generated (semantic and morphological item-based constructions and abstract mapping constructions, indicated by the green, gray and blue lines, respectively).
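The composition of the target grammar given in footnote 3 can be tallied explicitly. The short Python sketch below is only bookkeeping under the assumptions of that footnote; the tuple labels and variable names are hypothetical and do not correspond to actual FCG construction names.

```python
# Tally of the target grammar described in footnote 3.
# Labels are illustrative, not actual FCG construction names.

TEMPORAL_SEMANTICS = [
    "ongoing", "begin", "for a while", "finish",
    "complete", "exhaustion", "alteration",
]

# The durative Aktionsart (ongoing) is unprefixed, so it needs no
# morphological construction.
UNPREFIXED = {"ongoing"}

semantic = [("semantic", s) for s in TEMPORAL_SEMANTICS]
mapping = [("mapping", s) for s in TEMPORAL_SEMANTICS]
morphological = [("morphological", s)
                 for s in TEMPORAL_SEMANTICS if s not in UNPREFIXED]

target_grammar = semantic + mapping + morphological
```

The list contains 7 + 7 + 6 = 20 entries, and its size does not depend on the number of verbs, unlike an inventory made up only of holophrases.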
The communicative success converges to the maximum value after approximately 300 interactions (cyan line); every subsequent game is a success. All constructions in the agent’s inventory have a score in the range [0, 1] at any given time during the game. When a new construction comes into play, it is assigned an initial score of 0.5. In the course of the game, the scores of constructions are updated depending on their success in communication (unsuccessful constructions are punished). After the target grammar is acquired (20 constructions in total), the very specific holophrastic constructions become redundant: they are in competition with the more general item-based and abstract constructions. Eventually, the holophrases lose and disappear after about 2000 interactions. The bottom graph in Figure 12.13 displays a similar dynamic for the scaled-up case of 5 learning agents in a population of 10 agents.

12.8 Conclusion

This chapter investigated how temporal language can be operationalized in artificial agents and how artificial learners can acquire aspectual grammar through communicative interactions. We examined mechanisms for the grounding and the semantic and syntactic processing of Russian temporal language and highlighted how information flows through the system so that artificial speakers and listeners can communicate. For the acquisition study, we equipped artificial tutors with subsets of Russian aspectual grammar and had them interact with learning agents, giving the learners the opportunity to infer and adopt constructions for talking about aspectually marked events in their environment. We introduced three different learning operators reminiscent of findings in developmental psychology. The results show that the proposed learning operators, together with the machinery needed for routine conceptualization and language processing, lead to the successful acquisition of aspectual structures found in human grammars.
Acknowledgements

This research has been carried out at Sony CSL Paris with support from the EU FP7 project ALEAR. We are indebted to Masahiro Fujita, Hideki Shimomura and their team from Sony Corporation Japan for making the robots available to us.

References

Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843

Bickel B (1996) Aspect, mood, and time in Belhare: studies in the semantics–pragmatics interface of a Himalayan language. Zürich: ASAS-Verlag

De Beule J (2006) Simulating the syntax and semantics of linguistic constructions about time. In: Gontier N, van Bendegem JP, Aerts D (eds) Evolutionary Epistemology, Language and Culture - A non-adaptationist, systems theoretical approach, Springer

Forsyth J (1970) A Grammar of Aspect: Usage and Meaning in the Russian Verb. Cambridge: Cambridge University Press

Gagarina N (2000) The acquisition of aspectuality by Russian children: the early stages. ZAS Papers in Linguistics 15:232–246

Gerasymova K (2010) Emergence of aktionsarten: The first step towards aspect. In: Smith A, Schouwstra M, de Boer B, Smith K (eds) The Evolution of Language (Evolang8), World Scientific, Singapore, pp 145–152

Gerasymova K, Spranger M (2010) Acquisition of grammar in autonomous artificial systems. In: Coelho H, Studer R, Wooldridge M (eds) Proceedings of the 19th European Conference on Artificial Intelligence (ECAI-2010), IOS Press, pp 923–928

Gerasymova K, Steels L, van Trijp R (2009) Aspectual morphology of Russian verbs in fluid construction grammar. In: Taatgen N, van Rijn H (eds) Proceedings of the 31st Annual Conference of the Cognitive Science Society, Cognitive Science Society, pp 1370–1375

Krongauz MA (1998) Pristavki i glagoly v russkom jazyke: semantičeskaja grammatika. Moscow: Jazyki russkoj kul’tury

Lieven E, Behrens H, Speares J, Tomasello M (2003) Early syntactic creativity: A usage-based approach. Journal of Child Language 30(02):333–370

Spranger M, Pauw S, Loetzsch M, Steels L (2012) Open-ended Procedural Semantics. In: Steels L, Hild M (eds) Language Grounding in Robots, Springer, New York

Steels L (1995) A self-organizing spatial vocabulary. Artificial Life 2(3):319–332

Steels L, Baillie JC (2003) Shared grounding of event descriptions by autonomous robots. Robotics and Autonomous Systems 43(2-3):163–173

Steels L, De Beule J, Wellens P (2012) Fluid Construction Grammar on Real Robots. In: Steels L, Hild M (eds) Language Grounding in Robots, Springer, New York

Stoll S (1998) The role of aktionsart in the acquisition of Russian aspect. First Language 18(54):351–376

Tomasello M (1995) Joint attention as social cognition. In: Moore C, Dunham PJ (eds) Joint attention: Its origins and role in development, Lawrence Erlbaum Associates, Hillsdale, NJ, pp 103–130

Tomasello M (2000) First steps toward a usage-based theory of language acquisition. Cognitive Linguistics 11-1/2:61–82