The Inferential Transmission of Language

Andrew D. M. Smith
Language Evolution and Computation Research Unit, University of Edinburgh, UK

Language is a symbolic, culturally transmitted system of communication, which is learnt through the inference of meaning. In this paper, I describe the importance of meaning inference, not only in language acquisition, but also in developing a unified explanation for language change and evolution. Using an agent-based computational model of meaning creation and communication, I show how the meanings of words can be inferred through disambiguation across multiple contexts, using cross-situational statistical learning. I demonstrate that the uncertainty inherent in the process of meaning inference, moreover, leads to stable variation in both conceptual and lexical structure, providing evidence which helps to explain how language changes rapidly without losing communicability. Finally, I describe how an inferential model of communication may provide important theoretical insights into plausible explanations of the bootstrapping of, and the subsequent progressive complexification of, cultural communication systems.

Keywords: language acquisition · language change · language evolution · meaning inference · cultural transmission · cross-situational learning

1 Introduction

Language is a symbolic, culturally transmitted system of communication, which is learnt through the inference of meaning. Language is culturally transmitted, because although our genetically specified cognitive capacity equips us with the ability to learn and use language, the specific languages we acquire are clearly not determined genetically. Rather, the languages children learn are those which they hear spoken by the people in their community, not necessarily those of their parents.
Secondly, the meaning of a linguistic utterance is not transmitted directly, but is inferred indirectly by the hearer, through pragmatic insights and the social context in which the utterance is received. In this paper, I present a computational model of language based on cultural interactions, individual adaptations, and a model of word learning based on the inference of meaning through cross-situational statistical learning. I use this experimental framework to explore a unified account of language change on the three different timescales described by Kirby and Hurford (2002): An individual's acquisition of language on an ontogenetic timescale; the historical development of language on a glossogenetic timescale; and the emergence and complexification of language on a phylogenetic timescale.

The remainder of this article is divided into six sections. In Section 2, I explore the philosophical problem of identifying the meaning of an unfamiliar word, and discuss some of the psycholinguistic theories which have been proposed to explain how children overcome this problem. In Section 3, I describe the details of the computational model of language use based on cultural transmission and meaning inference through disambiguation across multiple contexts.

Correspondence to: Andrew D. M. Smith, Language Evolution and Computation Research Unit, University of Edinburgh, Adam Ferguson Building, 40 George Square, Edinburgh EH8 9LL, Scotland, UK. E-mail: [email protected] Tel.: +44 (0)131 651 1837, Fax: +44 (0)131 650 3962.
Copyright © 2005 International Society for Adaptive Behavior (2005), Vol 13(4): 311–324. [1059–7123(200512) 13:4; 311–324; 059321]
Figures 1, 2, 4, 5 appear in color online: http://adb.sagepub.com
In Section 4, I review previous experiments carried out using this model, with a particular emphasis on the implementation of many of the psychologically motivated constraints discussed in Section 2, and on comparing the results with attested evidence from studies of child language. Section 5 is concerned with historical linguistic change, and focuses on two different kinds of linguistic variation which are demonstrated in the model. The basic model is embedded within a generational model, and I present further experimental results which demonstrate how these variations can themselves explain the dynamic nature of language across generations of language users: How it changes rapidly over time while maintaining its utility as a shared communication system. In Section 6, I sketch a theoretical scenario of qualitative language change on a phylogenetic timescale, through the same processes of inferential cultural transmission. Finally, in Section 7, I describe how this opens the way for the inferential transmission of language to provide a unified account of language change on all three timescales.

2 Uncertainty in Lexical Acquisition

One of the most distinctive features of human language is that words and their meanings are related not iconically, through perceptual similarity, but symbolically, by the very fact of the association alone. This linkage of words and meanings in the Saussurean sign (Saussure, 1916) is a priori arbitrary and determined by convention. For instance, there is nothing in the sound of the word chair which suggests any aspect of its meaning, nor anything which suggests that its meaning is similar to that of the phonetically dissimilar word seat. Historical cultural tradition alone determines that Swahili speakers will use the word kiti, and Hungarian speakers the word szék, to express the same meaning of CHAIR. But how do we associate words with their correct, arbitrary meanings?
Carey and Bartlett (1978) describe the phenomenon of fast mapping, demonstrating that preschool children can learn the meaning of an unfamiliar word, contrasted with a familiar one, after just one exposure. Lexical acquisition is not only extremely rapid, but also extensive: Children learn the meanings of about 40,000 words by the age of ten (Anglin, 1993). The child's instinctive, prodigious talent for lexical acquisition is all the more remarkable given the logical problem of inducing the meaning of an unfamiliar word. This problem was famously illustrated by Quine (1960), who presented an imaginary anthropologist observing a speaker uttering the word "gavagai" while pointing towards a rabbit. Quine explains that, logically, gavagai could mean any one of RABBIT, ANIMAL, DINNER, UNDETACHED RABBIT PARTS, or indeed an infinite number of alternatives. Even worse, Quine shows that, however much additional information the anthropologist collects, an infinite number of semantic hypotheses which are logically consistent with the data will remain. Theoretically, then, the meaning of an unfamiliar word can never be completely ascertained, yet in practice children succeed effortlessly, time and again. There is little consensus, however, on how this success is achieved. In the following subsections, I briefly review various suggestions from psychologists about how children overcome this problem of meaning uncertainty. These can be broadly grouped into three types: Representational constraints on what the child will consider; interpretational constraints, which are context-dependent strategies that depend on novelty and on what has already been learnt; and more general, social constraints. In Section 4, I will explore the implementation and effects of such constraints in a model of lexical acquisition.
2.1 Representational Constraints

Macnamara (1982) suggests that, just as children most naturally represent their environment in terms of the objects within it, so they also assume that novel words refer to whole objects, rather than to parts or properties of those objects. Markman (1989) claims, further, that such a whole-object bias is specifically tailored to word learning. Bloom (2001), however, points out that a similar bias appears to be used in many other cognitive domains, such as tracking, categorization, addition and subtraction. Although the whole-object bias is therefore useful in explaining the bootstrapping of lexical acquisition, it is not a sufficient explanation of the whole problem. Words are clearly not used solely to name objects, so others have sought additional constraints to account for more complex facets of word learning. Landau, Smith, and Jones (1988) presented children with an unfamiliar object which they explicitly named as a "dax", then referred to a number of different test objects, asking the children in each case: "Is this a dax?" Children generalized the new name to objects with the same shape as the original object, but ignored other properties such as size and texture, even when these appeared much more salient, leading Landau et al. to propose an innate shape bias. Soja, Carey, and Spelke (1991), however, show that although children generalize to objects of the same shape as a rigid object, they generalize to objects of the same material if the original object was not rigid. L. B. Smith (2001), moreover, reports children paying special attention to the texture of an object, but only if it has eyes or shoes! These studies show that children are not only good at making generalizations based on the properties of objects, but also at learning which properties are useful to attend to.
Although domain-specific learning biases such as the proposed shape bias may be used in lexical acquisition, it seems probable that the biases themselves are shaped by general developmental processes. In Section 4, I simulate representational constraints as innate biases which make learners more likely to use particular properties of objects in the categorization process.

2.2 Interpretational Constraints

A common problem with all representational biases, however, is that they must eventually be overcome during the course of learning. The child, for instance, must be able to find words for parts as well as wholes, and for textures as well as shapes. More general constraints on the interpretation process itself have therefore been proposed, which depend on what the child has already learnt. Markman (1989) puts forward one of the most basic of these principles, the assumption of mutual exclusivity, or the assumption that words do not share referents. The use of mutual exclusivity to disambiguate reference has often been shown experimentally, most notably by Markman and Wachtel (1988). They assume that mutual exclusivity applies particularly in the early stages of lexical acquisition, when most vocabulary items are basic-level words, but weakens over time, allowing the child to construct words with overlapping extensions, and thus semantic hierarchies. Clark (1987) proposes a similar theory, focused on the more general notion of contrast, which assumes that any difference in form marks a difference in meaning. From this, she predicts that children use contrast both to assign novel words to gaps in their lexicons, and to coin new words to fill such gaps when necessary. In Section 4, I show how inferential models of language learning which include an implementation of mutual exclusivity allow agents with very different conceptual structures to communicate much more successfully than those without mutual exclusivity.
2.3 Social Constraints

Many other principles of social and cognitive development are also used in lexical acquisition (Tomasello, 1999). These are often grouped together under the rubric of theory of mind, which is widely held to be the distinguishing factor between the social cognition of humans and that of other animals. Tomasello and Rakoczy (2003) argue that there are two crucial stages in the development of this specialized social cognition. Firstly, at around age 1, children begin to understand adults as intentional agents, following their gaze and their pointing gestures. They understand that adults have control over their perceptions, and can choose to attend to particular objects or aspects of a situation. Because children learn to understand the attentional intentions of others, they also realize that they can attract another's attention to something by pointing at it or holding it up for inspection (Liszkowski, Carpenter, Henning, Striano, & Tomasello, 2004). This joint attention of child and adult to the same situation allows the child to greatly reduce the number of interpretations it will consider for a signal. In order to be more specific about the particular experience or property we want to draw someone else's attention to, linguistic symbols become necessary (Tomasello, 1999). At this stage, the representational and interpretational constraints described above allow the child to enter into a virtuous circle of increasing lexical acquisition and increasing cognitive development. In Section 4, I present findings which show that the relative degree of joint attention in the communicative process is very important in determining how long it will take a learner to learn a lexicon.
3 Modeling the Inference of Meaning

3.1 Cultural Transmission

The evolution of language has, until recently, been primarily viewed in terms of explaining the biological evolution of a language instinct (Pinker, 1994), which has been characterized as a specialized cognitive organ within the brain. This language organ is held to contain both a formal coding of Universal Grammar, which limits the set of possible human languages, and a Language Acquisition Device, which directs the course of grammar construction based on the observed primary linguistic data (Chomsky, 1965). Accounts within this framework seek to demonstrate how this innate, domain-specific cognitive organ could have evolved incrementally through a standard adaptationist process of natural selection (Jackendoff, 2002). An alternative view, however, focuses on how linguistic structures themselves adapt to be learnt by humans. The two distinct manifestations of language that we recognize from the Chomskyan account are reconfigured as distinct, qualitatively different phases in the life cycle of a language: Individuals express external linguistic behavior based on their internal linguistic representations, and induce internal linguistic representations, or grammars, in response to the external linguistic behavior (or primary linguistic data) they encounter. Language is culturally transmitted, because the linguistic input used by one individual to construct its grammar is itself the output of other individuals, as shown diagrammatically in Figure 1.

Figure 1 The expression/induction model of language as a dynamic, culturally transmitted system. Individuals express linguistic behavior based on their internal representations; their internal representations, in turn, are induced in response to encountered linguistic behavior. Language persists in two qualitatively different states: internal knowledge and external behavior.
Language learners attempt to acquire the language of the other members of their community, and differences between their internal grammars and those of other individuals occur as a result of the dynamic cultural evolution of the language itself. Models which represent the transmission of language in this way have been termed expression/induction (E/I) models (Hurford, 2002) or iterated learning models (Smith, et al.). Such models have been used to demonstrate the cultural emergence of structural properties of language, such as compositionality (Batali, 2002; Brighton, 2002) and recursion (Kirby, 2002). The key feature of these models is a transmission bottleneck, which restricts the amount of linguistic data a learner is exposed to. Under these conditions, parts of language which are regular, or can be generalized from other examples, are much more likely to persist through a repeated cycle of expression and induction than idiosyncratic parts of language which cannot be generalized.

3.2 Meaning Inference

The majority of these models of language evolution, however, do not address the problem of learning what words mean; rather, they simply assume that meanings are pre-defined entities, and that the transmission of language consists of the simultaneous and explicit transfer of signals and meanings. This assumption of explicit meaning transfer leads to two major conceptual difficulties, namely signal redundancy and lack of semantic variation. Firstly, as I have argued previously (Smith, 2003a, 2005), if meanings are transferred telepathically, then any signals which are used by the individuals cannot be said to actually convey meaning at all. If the signals have no meaningful content, then their very existence poses the problem of signal redundancy: What is the motivation for language users to spend time and energy in learning a symbolic system of signals which provides them with no additional information?
Secondly, languages are actual historical entities, which are constantly changing as a result of variation at many levels of analysis (Trask, 1996). Indeed, it has been argued by many, including Bybee, Perkins, and Pagliuca (1994) and Croft (2000), that variability is one of the most fundamental features of language, which must be taken into account in a realistic model of language learning and use. In the inferential model presented in this article, therefore, agents initially have neither lexical nor conceptual structures, but merely the ability to develop individual conceptual representations and to learn from their own experiences. This model differs from others which make use of the E/I framework, because it does not contain a pre-defined, structured meaning system, but instead assumes that individuals infer meaning through experience, and that the meanings so inferred can vary between individuals: See also Vogt (2003) for a similar approach to using E/I models without explicit meaning transfer. Crucial to this model is the existence of an external world, as the source from which meaning can be inferred; without it, meanings must be predefined, can only be communicated directly through some kind of telepathy, and cannot vary naturally. The model therefore contains three different levels of representation:
A. an external environment, which provides the motivation and source for meaning creation;
B. an agent-specific internal representation of meaning, which is not accessible by others;
C. a set of signals which can be transmitted between agents.
Figure 2 shows these three levels of representation in a model of communication which both avoids the signal redundancy paradox and grounds its symbols (Harnad, 1990).
The demarcation of the representation into an external domain, containing things which can potentially be accessed and manipulated by all individuals, and an internal, private domain, containing items only accessible by the individual itself, is vital to the validity of the model. The mere existence of an external world, as in Hutchins and Hazlehurst's (1995) model of shared vocabulary development, is not sufficient to avoid the signal redundancy problem: Not only must an agent's semantic representation be private, but so must the mappings which map its meanings both to signals and to objects in the world.

Figure 2 A model of communication which avoids the problem of signal redundancy. The model has three levels of representation: an external environment (A); an internal semantic representation (B); and a public set of signals (C). The mappings between A and B and between B and C, represented by the arrows, also fall into the internal, private domain, whose boundary is shown by the dotted line.

3.3 Description of the Model

In the model of inferential learning described in this paper, the world contains objects, which can be objectively described in terms of their feature values, real numbers pseudo-randomly generated within the range [0, 1]. Agents can use their dedicated sensory channels to sense whether a particular feature value falls between two bounds, can create meanings which allow them to distinguish objects from each other, and can create words to express these meanings. In the experiments presented here, the world contains 20 different objects and each agent has 5 sensory channels. The model is based on the language games described by Steels (1996), but is extended in a number of ways. The initial source of an agent's interaction with the environment is through discrimination games (Steels, 1996).
A subset of the objects, called the context, is chosen from the world and presented to one of the agents; one of these objects is chosen at random to be the target of the discrimination episode, and the agent aims to distinguish this object from all the others in the context. The agent searches its sensory channels for a distinctive category, namely a semantic representation which both correctly describes the target and does not accurately describe any other object in the context. In the experiments presented here, distinctive categories are restricted to single nodes on one sensory channel, rather than logical combinations of nodes from different sensory channels. Failure to find a distinctive category triggers meaning creation, by splitting the sensitivity range of a sensory channel into two discrete, equally sized segments, each of which is therefore sensitive to half the range of the previous segment. An agent searches through its sensory channels until it finds a split which would have produced a successful distinctive category in the current discrimination game, had the category already existed. Repeated splitting results in a hierarchical, tree-like conceptual structure, whose nodes represent semantic categories. Nodes nearer the root of the tree represent more general meanings, with wider sensitivity ranges which cover a greater proportion of the semantic space, and nodes nearer the leaves represent more specific meanings. There is no pre-definition of which categories should be created, and meaning creation is carried out by each agent individually, so agents create different, but typically equally successful, semantic representations of the world. Having developed meanings which can effectively describe the objects in the world, agents communicate about the objects using the distinctive category chosen in the discrimination process.
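The discrimination game and the meaning-creation step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the names `Category` and `Agent` are hypothetical; only the leaf node on a channel that contains the target's feature value is considered for splitting; when several distinctive categories exist, one is picked at random; and, because the text does not say what happens when no single split would have helped, the sketch simply refines a randomly chosen channel in that case. The context size of five in the usage lines is also an assumption.

```python
import random
from dataclasses import dataclass

NUM_OBJECTS = 20   # world size in the experiments reported here
NUM_CHANNELS = 5   # sensory channels per agent


@dataclass(frozen=True)
class Category:
    """A conceptual node on one sensory channel, sensitive to values in [low, high)."""
    channel: int
    low: float
    high: float

    def contains(self, value):
        return self.low <= value < self.high

    def describes(self, obj):
        return self.contains(obj[self.channel])


class Agent:
    def __init__(self, num_channels=NUM_CHANNELS):
        self.num_channels = num_channels
        # Each channel starts as a single root node covering the whole range [0, 1).
        self.nodes = {Category(c, 0.0, 1.0) for c in range(num_channels)}
        self.children = {}  # parent node -> (lower half, upper half)

    def leaf_containing(self, channel, value):
        """Walk down one channel's tree to the most specific node containing value."""
        node = Category(channel, 0.0, 1.0)
        while node in self.children:
            lower, upper = self.children[node]
            node = lower if lower.contains(value) else upper
        return node

    def split(self, node):
        """Meaning creation: divide a node's sensitivity range into two equal halves."""
        mid = (node.low + node.high) / 2
        halves = (Category(node.channel, node.low, mid),
                  Category(node.channel, mid, node.high))
        self.children[node] = halves
        self.nodes.update(halves)

    def discriminate(self, context, target_index):
        """Play one discrimination game; return a distinctive category, or None on failure."""
        target = context[target_index]
        others = [o for i, o in enumerate(context) if i != target_index]
        distinctive = [n for n in self.nodes
                       if n.describes(target) and not any(n.describes(o) for o in others)]
        if distinctive:
            return random.choice(distinctive)
        self.create_meaning(target, others)
        return None

    def create_meaning(self, target, others):
        channels = list(range(self.num_channels))
        random.shuffle(channels)
        for c in channels:
            leaf = self.leaf_containing(c, target[c])
            mid = (leaf.low + leaf.high) / 2
            half = (Category(c, leaf.low, mid) if target[c] < mid
                    else Category(c, mid, leaf.high))
            # Would this split have given a distinctive category in the current game?
            if not any(half.describes(o) for o in others):
                self.split(leaf)
                return
        # Assumption (not specified in the text): refine a random channel anyway.
        c = random.randrange(self.num_channels)
        self.split(self.leaf_containing(c, target[c]))


random.seed(0)
world = [tuple(random.random() for _ in range(NUM_CHANNELS)) for _ in range(NUM_OBJECTS)]
agent = Agent()
outcomes = []
for _ in range(1000):
    context = random.sample(world, 5)  # hypothetical context size
    outcomes.append(agent.discriminate(context, random.randrange(len(context))) is not None)
print(sum(outcomes[-100:]), "successful discrimination games out of the last 100")
```

Because every split halves a leaf, a node at depth d on a channel covers a range of width 2^-d, which is what yields the hierarchy of general (near the root) and specific (near the leaves) meanings described above.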
Hurford (1989) introduced the idea of using dynamic communication matrices to model the evolution of communication strategies, and showed that bidirectional, Saussurean mappings between signals and meanings are essential for the development of viable communication systems. Oliphant and Batali (1997) extended this model to show that the best way of ensuring continuing increases in communicative accuracy is to choose signals based on how they are interpreted by the rest of the population. Oliphant and Batali's algorithm, however, requires that agents can directly access the internal semantic representations of other agents. In order to avoid this mind-reading, I use a modified version of their algorithm, called introspective obverter (Smith, 2003b), in which the speaker puts itself into the hearer's shoes, and chooses the signal which it would itself be most likely to interpret correctly as the hearer, given the current context and its own semantic representations. Signal choice is therefore based on the speaker's own interpretative behavior. The speaker uses this algorithm to choose a signal for the distinctive category it found, and transmits this signal to the hearer, together with the context of the discrimination game. Neither the meaning itself nor the target object to which the meaning refers is explicitly identified to the hearer. The hearer interprets the signal, and learns its meaning, solely from the information in the current context and from its previous experience of the signal in other contexts. In order to infer the meaning of an utterance, the hearer first uses the conceptual structures it has developed to play a separate discrimination game for each object in the context (i.e., with each object in turn serving as the target object), thereby creating a list of semantic hypotheses. This list consists of every meaning in its current conceptual structure which could serve as a distinctive category for any single object in the context.
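The hearer's construction of its list of semantic hypotheses can be illustrated with the symbolic features used in the Figure 3 walk-through, rather than the model's real-valued sensory channels. In this hypothetical sketch, each object is a dictionary from feature name to value, a meaning is simply a feature value, and a meaning is distinctive for an object if no other object in the context shares it. The order of the objects in `game_1` is an assumption that is consistent with the text's description of the first game (the first tree is light, the second wheel is dark, and the rest are intermediate).

```python
def distinctive_meanings(target, others):
    """Meanings (feature values) which are true of the target and of no other object."""
    return {value for feature, value in target.items()
            if all(other[feature] != value for other in others)}


def semantic_hypotheses(context):
    """Play one discrimination game per object in the context and pool the results."""
    hypotheses = set()
    for i, target in enumerate(context):
        others = context[:i] + context[i + 1:]
        hypotheses |= distinctive_meanings(target, others)
    return hypotheses


# One ordering of the first game in Figure 3: two wheels, two trees and a star.
game_1 = [
    {"shape": "WHEEL", "brightness": "INTERMEDIATE"},
    {"shape": "WHEEL", "brightness": "DARK"},
    {"shape": "TREE", "brightness": "LIGHT"},
    {"shape": "TREE", "brightness": "INTERMEDIATE"},
    {"shape": "STAR", "brightness": "INTERMEDIATE"},
]
print(sorted(semantic_hypotheses(game_1)))  # ['DARK', 'LIGHT', 'STAR']
```

Note that the hearer does not know which object was the target, so it pools the distinctive meanings of every object; this is exactly what makes a single exposure ambiguous.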
In principle, without any constraints such as those discussed in Section 2, each of these possible meanings is equally plausible, so the hearer considers all of them, and stores them individually in its internal lexicon in association with the signal. This lexicon contains a count of the co-occurrence of each signal-meaning pair <s, m>, which is used to calculate the conditional probability P(m|s) that, given the signal s, the meaning m is associated with s, according to the formula

$$P(m \mid s) = \frac{f(s, m)}{\sum_{i=1}^{n} f(s, i)}$$

where f(s, m) is the number of times s has been associated with m and n is the number of items in the lexicon (Smith, 2003b). The hearer uses its lexicon to choose a preferred meaning for the signal, namely the one which has the highest conditional probability for the received signal. If two or more meanings have equal conditional probability, then one of them is chosen at random. The success of the communicative episode is measured by referent identity: The hearer's inferred meaning is a distinctive category which picks out one of the objects in the context, and if this object is the same as the speaker's initial target, then the episode succeeds. Importantly, therefore, there is no requirement for agents to use (or even to have) the same internally specified meaning, only that they both identify the same external referent. The measurement of communicative success, indeed, takes place solely for the benefit of the experimenter: Neither agent receives any information at all about the result of the communicative episode. The learning mechanism, therefore, does not rely on any corrective feedback, in contrast to the guessing game (Steels & Kaplan, 2002; Vogt, 2002; Steels & Belpaeme, in press), but instead relies on the co-occurrence of words and their inferred meanings. The status of such feedback in lexical acquisition is the source of much current debate in psycholinguistics.
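A minimal sketch of such a lexicon follows: it stores the co-occurrence counts f(s, m), computes P(m|s) with the formula for the conditional probability, and picks a preferred meaning with random tie-breaking. The class name `Lexicon` is hypothetical. The `choose_signal` method is a deliberately simplified, hypothetical reading of introspective obverter: the speaker picks the signal for which its own P(m|s) is highest, ignoring the current context, and invents a random five-letter signal (an assumption) when it has no word associated with the meaning.

```python
import random
import string
from collections import defaultdict


class Lexicon:
    """Co-occurrence counts f(s, m) between signals and meanings."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # s -> m -> f(s, m)

    def observe(self, signal, hypotheses):
        # With no constraints, every semantic hypothesis is stored with the signal.
        for meaning in hypotheses:
            self.counts[signal][meaning] += 1

    def probability(self, meaning, signal):
        """P(m | s) = f(s, m) / sum over i of f(s, i)."""
        row = self.counts.get(signal)
        if not row:
            return 0.0
        return row.get(meaning, 0) / sum(row.values())

    def interpret(self, signal):
        """Preferred meaning: highest P(m | s), with ties broken at random."""
        row = self.counts.get(signal)
        if not row:
            return None
        best = max(row.values())
        return random.choice([m for m, f in row.items() if f == best])

    def choose_signal(self, meaning):
        """Simplified obverter: the signal the speaker itself would most likely read as meaning."""
        scored = [(self.probability(meaning, s), s) for s in self.counts]
        scored = [(p, s) for p, s in scored if p > 0]
        if scored:
            return max(scored)[1]
        # Assumption: invent a new random signal for a meaning with no word yet.
        return "".join(random.choices(string.ascii_lowercase, k=5))


lex = Lexicon()
lex.observe("hhd", {"RED", "BIG"})
lex.observe("hhd", {"RED", "ROUND"})
print(lex.probability("RED", "hhd"))  # 0.5
print(lex.interpret("hhd"))           # RED
print(lex.choose_signal("RED"))       # hhd
```

Because P(m|s) for a fixed signal is proportional to f(s, m), `interpret` can compare raw counts directly; the division only matters when comparing across signals, as `choose_signal` does.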
It is widely accepted that children receive little, if any, direct corrective feedback while learning words (Bloom, 2000). Lieven (1994), indeed, describes cultures in which children are not even addressed in the early stages of acquisition. On the other hand, Chouinard and Clark (2003) have shown that adults often reformulate what they think children have said, and that such reformulations can act as an important source of implicit feedback. The model described here, however, shows that successful lexical acquisition can take place without explicit feedback, using cross-situational statistical learning (see also Vogt and Smith, in press).

3.4 Cross-situational Statistical Learning

Cross-situational statistical learning is based on the statistical co-occurrence of words and their inferred meanings, and is similar to the technique proposed by Siskind (1996). It differs from Siskind's model most fundamentally in that his learners are provided with a hypothesis set which already contains all possible meanings in the world, from which they eliminate those which are incoherent in the current situation. By contrast, in the model presented here, the hypothesis set is in principle infinite, as learners create new meanings through experience with the world, and choose the meanings which are most plausible, given both the current situation and their interaction history. In order to fully understand the process of cross-situational statistical learning within the language game model, let us go through a number of simplified example games, shown in Figure 3, to see how the meaning of an unfamiliar word is disambiguated through its repeated occurrence in different contexts. Suppose that objects are described only in terms of two features, shape and brightness, and that the hearer encounters the contexts and utterances shown in columns A and B.
Figure 3 Cross-situational statistical learning across three language games. Each game shows the context of objects (A); the signal uttered (B); the current set of semantic hypotheses constructed by the hearer (C); and the relevant part of the hearer's lexicon (D), with meanings (m) and the frequency of their co-occurrence with the signal (f).

For the purposes of this exposition, there are five different shapes shown in Figure 3, and three different categories of brightness, namely LIGHT, INTERMEDIATE and DARK. The hearer, moreover, assumes that each utterance serves to discriminate one (and only one) object from all the others in the context. In the first game, the hearer encounters two wheels, two trees and a star: In terms of shape, only STAR describes a single object, so it is a possible meaning. In terms of brightness, there is one light object (the first tree), one dark object (the second wheel), and the rest are intermediate: Both LIGHT and DARK could therefore also be possible meanings, and both, together with STAR, are shown in column C, which represents the set of semantic hypotheses in the current game. Column D shows all the meanings which have ever been associated by the hearer with the current signal ikwob; as this is the first time the word has been encountered, it contains the three possible meanings in column C, each with a co-occurrence frequency of one. As all the co-occurrence frequencies are equal, the hearer must choose one of these meanings at random. In the second game, the hearer encounters ikwob again, this time in a completely different context. Using the same process as before, the hearer creates another (different) set of semantic hypotheses for the current context, namely DARK, MAN, STAR and TREE, shown in column C.
After adding these meanings to the lexicon and updating the co-occurrence frequencies, we can see in column D that, after two games, the hearer now has five possible meanings, whose co-occurrence frequencies are no longer equal. Both DARK and STAR have occurred twice in conjunction with ikwob, but the others only once. The ambiguity has therefore been reduced, but not yet eliminated, and so the hearer would choose a meaning at random from those with the highest co-occurrence frequency. The third game proceeds in the same fashion, and provides the hearer with a further set of possible meanings: LIGHT and STAR. When these are added to the lexicon, STAR has a higher co-occurrence frequency (three) than any other meaning, and so the hearer is confident that the meaning of ikwob is STAR.

4 Inferential Acquisition

Recent empirical research, indeed, shows that a cross-situational model of learning provides a robust account of lexical acquisition. Houston-Price, Plunkett, Harris, and Duffy (2003) show that children use cross-situational learning to disambiguate word reference, even though their experiments were designed with attentional cues. Akhtar and Montague (1999), and Klibanoff and Waxman (2000), have separately demonstrated that novel adjectival categories are learnt cross-situationally, within the context of basic-level categories. Previous experiments using this and similar computational models have also shown that large lexicons can be learnt, and that agents with different conceptual structures can successfully communicate with each other. The parameters of these models can also be varied, to implement the constraints on lexical learning discussed in Section 2, and to explore their effects on learning and communication. Such experiments have found that communicative success is closely related to the level of meaning similarity between agents (Smith, 2003b). Moreover, if agents have the same representational biases, then they are more likely to develop similar meanings.
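The three games of Figure 3 can be replayed in a few lines of Python, using the sets of semantic hypotheses given in the text (column C) and accumulating their co-occurrence with ikwob in a `collections.Counter` (column D). This is an illustrative sketch of the counting step only: the function name `cross_situational` is hypothetical, and instead of breaking ties at random it reports every meaning that currently shares the highest frequency.

```python
from collections import Counter

# Column C of Figure 3: the hearer's semantic hypotheses each time it hears "ikwob".
games = [
    {"STAR", "LIGHT", "DARK"},
    {"DARK", "MAN", "STAR", "TREE"},
    {"LIGHT", "STAR"},
]


def cross_situational(hypothesis_sets):
    """Yield the lexicon (column D) and the best-supported meanings after each game."""
    lexicon = Counter()  # meaning -> co-occurrence frequency with the signal
    for hypotheses in hypothesis_sets:
        lexicon.update(hypotheses)
        best = max(lexicon.values())
        candidates = sorted(m for m, f in lexicon.items() if f == best)
        yield dict(lexicon), candidates


for n, (lexicon, candidates) in enumerate(cross_situational(games), start=1):
    print(f"after game {n}: {candidates}")
# after game 1: ['DARK', 'LIGHT', 'STAR']
# after game 2: ['DARK', 'STAR']
# after game 3: ['STAR']
```

The candidate list shrinks from three meanings to two to one, mirroring the gradual reduction of ambiguity described in the walk-through.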
This relationship can be diluted, however, if agents use an interpretational constraint like mutual exclusivity, reducing the number of semantic hypotheses under consideration by ignoring those objects for which they already know an appropriate word. Under these circumstances, high levels of communicative success occur even among agents with very dissimilar conceptual structures (Smith, 2005). Social constraints such as joint attention, moreover, can be implemented by altering the size of the context in a communicative episode. As the size of the context increases, so does the time taken for the learner to learn the lexicon (Smith, 2003a; Smith & Vogt, 2004). The inferential model of communication and cross-situational statistical learning presented here is, therefore, a plausible model of lexical acquisition, whose results correspond well with recent attested evidence from child language studies.

5 Inferential Variation and Change

It is well recognized that language change is driven by variation in language communities (Trask, 1996). In the inferential model described in this paper, there are two important sources of variation, which I call conceptual and lexical. In Section 5.1, I describe the source and effects of these variations, and present methods of measuring them. In Section 5.2, I show how the inferential paradigm can be used to explain aspects of historical linguistic change. Examples of both types of variation can be seen in Figure 4. Taken from a representative simulation, this diagram shows extracts from the conceptual structures of an adult and a child. Each agent has five sensory channels on which conceptual structures are built, but for ease of exposition, only one of these is shown here.
5.1 Conceptual and Lexical Variation

The independent creation of conceptual structure leads inevitably to variation in agents’ semantic representations, both because an agent’s response to a particular experience is not deterministic, and because agents’ experiences themselves differ (Smith, 2003a). In the upper part of Figure 4, we first consider only the agents’ conceptual structures, shown by the hierarchical tree structures. Nodes with no equivalent in the other agent’s conceptual structure are marked with dotted lines. Although the two agents in Figure 4 have developed similar structures, it is clear that in three different places the child has developed additional conceptual structure.

Figure 4 Extract from the internal structures of two agents, showing variation in both conceptual and lexical structures. The conceptual structures are shown by hierarchical tree structures, each node of which represents a different meaning. Conceptual variation, where meanings have no corresponding equivalent in the other agent’s conceptual structure, is marked with dotted lines. Lexical structures are represented by the words attached to the nodes, which signify the agent’s preferred word for the meaning; empty nodes have no preferred word. Lexical variations, where the agents disagree on the meaning of a word, are circled in the right-hand structure.

Such conceptual variation can be quantified by considering the nodes which the trees have in common. If k(t, u) is the number of nodes which two trees t and u have in common, and n(t) is the total number of nodes on tree t, then the similarity τ(t, u) between trees t and u is

$$\tau(t, u) = \frac{2\,k(t, u)}{n(t) + n(u)}$$

Averaging τ across all their sensory channels, we can produce a measure of conceptual, or meaning, similarity between two agents (Smith, 2003a).
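As a minimal sketch of the similarity measure τ, the Python below treats each tree as a set of nodes, and each node as the path of branch choices leading to it from the root. This representation is an assumption: it supposes binary trees which halve a channel’s range at each split, in the style of Steels’ discrimination trees, so that two nodes count as shared exactly when they have the same path. The names `tau` and `meaning_similarity`, and the toy trees, are hypothetical. Written this way, τ is the Dice coefficient of the two node sets.

```python
from statistics import mean


def tau(t, u):
    """Tree similarity tau(t, u) = 2 k(t, u) / (n(t) + n(u))."""
    if not t and not u:
        return 1.0  # assumption: two empty trees are treated as identical
    k = len(t & u)  # k(t, u): number of nodes the two trees share
    return 2 * k / (len(t) + len(u))


def meaning_similarity(channels_a, channels_b):
    """Average tau over corresponding sensory channels of two agents."""
    if len(channels_a) != len(channels_b):
        raise ValueError("agents must have the same number of sensory channels")
    return mean(tau(t, u) for t, u in zip(channels_a, channels_b))


# Each node is the path of branch choices from the root (0 = lower half,
# 1 = upper half of the parent's range); the root is the empty tuple.
adult = {(), (0,), (1,)}
child = {(), (0,), (1,), (1, 0), (1, 1)}
print(tau(adult, child))  # 0.75
print(meaning_similarity([adult, adult], [child, adult]))  # 0.875
```

Here the child’s tree adds two nodes below the adult’s upper branch, mimicking the additional conceptual structure in Figure 4: τ falls to 0.75 on that channel (k = 3, n(t) + n(u) = 8), while the identical second channel scores 1, giving an average of 0.875.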
Secondly, the uncertainty inherent in cross-situational learning produces inevitable variation in the agents’ lexical associations. The inferred meanings depend on the particular conceptual structures which the agent has created, and the associations themselves depend on the particular contexts in which words are heard. Lexical variation can be measured by considering whether agents have the same preferred word for each meaning. An agent’s preferred word for meaning m is the word in its lexicon which has the highest conditional probability in association with m, and which does not have a higher conditional probability in association with a different meaning. Preferred words are represented in the lower part of Figure 4 by the words attached to the nodes; empty nodes have no preferred word, and circles highlight lexical variations, where the agents have different preferred words. The words wm and hhd, for instance, have not been learnt correctly by the child, although the relevant nodes on the adult’s conceptual structure do exist in the child’s conceptual structure. The child has attached both wm and hhd to nodes nearer the root of the tree; because these nodes cover a larger region of semantic space than the words’ meanings for the adult, this kind of change can be considered generalization. In Section 5.2, I describe why generalization of this kind occurs frequently in this model. Lexical items are said to persist if they are successfully learnt; lexical persistence across the whole of an agent’s lexicon is a useful measure of linguistic change, and can be measured both within and between generations.
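The preferred-word rule can be sketched as follows, starting from the contexts in which words are heard. This is an illustrative reading, not the author’s implementation: it assumes that the conditional probability is P(meaning | word), estimated by normalizing each word’s co-occurrence counts, and it reads the definition as first discarding any pairing where the word is more strongly associated with a different meaning, then choosing the strongest remaining word. The three `ikwob` episodes reuse the meanings from the walkthrough at the end of Section 3, with `OTHER_1` to `OTHER_3` as hypothetical placeholders for meanings the text does not name; the word `zul` and its episodes, and all function names, are likewise hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(episodes):
    """Count how often each (word, candidate meaning) pair co-occurs across episodes."""
    counts = Counter()
    for word, context_meanings in episodes:
        for meaning in context_meanings:
            counts[(word, meaning)] += 1
    return counts


def conditional_probs(counts):
    """Estimate P(meaning | word) by normalizing each word's counts (an assumed estimator)."""
    totals = defaultdict(int)
    for (word, _), c in counts.items():
        totals[word] += c
    return {(w, m): c / totals[w] for (w, m), c in counts.items()}


def preferred_words(probs):
    """Map each meaning to its preferred word; meanings with none (empty nodes) are omitted."""
    strongest = defaultdict(float)  # each word's highest probability over all its meanings
    for (w, _), p in probs.items():
        strongest[w] = max(strongest[w], p)
    preferred = {}
    # Sorted iteration means ties between words resolve to the alphabetically first word.
    for (w, m), p in sorted(probs.items()):
        if p < strongest[w]:
            continue  # w is more strongly associated with a different meaning
        if m not in preferred or p > probs[(preferred[m], m)]:
            preferred[m] = w
    return preferred


# Each episode pairs a heard word with the candidate meanings its context offers.
episodes = [
    ("ikwob", {"DARK", "STAR", "OTHER_1"}),
    ("ikwob", {"DARK", "STAR", "OTHER_2", "OTHER_3"}),
    ("ikwob", {"LIGHT", "STAR"}),
    ("zul", {"DARK", "LIGHT"}),  # hypothetical second word
    ("zul", {"DARK"}),
]
print(preferred_words(conditional_probs(cooccurrence_counts(episodes))))
# {'STAR': 'ikwob', 'DARK': 'zul'}
```

STAR and DARK receive preferred words, while LIGHT and the placeholder meanings are left as empty nodes, because every word associated with them is more strongly associated with something else.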
Intra-generational lexical persistence is the proportion of the adult’s lexicon learnt correctly by the child, while inter-generational lexical persistence is the proportion of the original language, developed by the adult in the first generation of the simulation, which is still intact in the language of the child at the end of the nth generation. At the end of each generation, each agent has approximately 50 preferred words in its lexicon.

5.2 Experimental Results

To investigate semantic change across multiple generations of cultural transmission, the basic inferential model is extended vertically into a traditional iterated learning model with generational turnover (Smith et al., 2003). Each generation has two phases: A set of 100 orientation episodes, followed by a number of communication episodes. In the orientation phase, the agents explore the world individually, and create meanings to represent what they encounter, through discrimination games. Both agents take part in this phase, though it is largely redundant for the adult, who has already developed a rich conceptual structure, except in the initial generation, when the adult has none. In the communication phase, the adult attempts to communicate with the child as described in Section 3.3; in each communicative episode there are five objects in the context. Communicative success occurs when the object identified by the hearer’s chosen meaning is the same as the speaker’s initial target object. There is no requirement for the agents to use identical internal meanings, only that they identify the same external referent. Neither agent receives any feedback about the communicative success of the episode. At the end of a generation, the adult is removed, the child becomes the adult, and a new child is introduced. The language inferred by the child in the previous generation becomes the source of its output in the subsequent generation, as described in Section 3.
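The two persistence measures defined at the end of Section 5.1 can be computed from the lexicons of successive generations, as in the following sketch. It assumes that each lexicon is represented as a mapping from each preferred word to its meaning, and that a word counts as learnt correctly only when the learner attaches it to exactly the same meaning. The function name `persistence` and the toy lexicons are hypothetical, and the numbers they produce illustrate the definitions only, not the results reported in Figure 5.

```python
def persistence(reference, learner):
    """Proportion of reference's word -> meaning pairs that the learner shares exactly."""
    if not reference:
        return 0.0  # convention assumed here for an empty reference lexicon
    kept = sum(1 for word, meaning in reference.items() if learner.get(word) == meaning)
    return kept / len(reference)


# Toy lexicons (preferred word -> meaning); all entries are hypothetical.
original = {"ikwob": "STAR", "zul": "DARK", "fep": "DEEP_DARK", "mo": "LIGHT"}
child_1 = {"ikwob": "STAR", "zul": "DARK", "fep": "DARK", "mo": "LIGHT"}  # fep generalized
child_2 = {"ikwob": "STAR", "zul": "DARK", "fep": "DARK"}  # mo not learnt
generations = [original, child_1, child_2]

for n in range(1, len(generations)):
    intra = persistence(generations[n - 1], generations[n])  # adult vs. child
    inter = persistence(generations[0], generations[n])  # first adult vs. child
    print(f"generation {n}: intra={intra:.2f}, inter={inter:.2f}")
# generation 1: intra=0.75, inter=0.75
# generation 2: intra=0.75, inter=0.50
```

In this toy run, fep is generalized to a broader meaning in the first generation and mo is lost in the second, so intra-generational persistence stays at 0.75 while inter-generational persistence falls to 0.50, because each generation’s losses accumulate.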
Figure 5 shows results from a typical simulation run over ten generations, each made up of 5,000 episodes. Meaning similarity, communicative success over the previous 100 episodes, and inter- and intra-generational lexical persistence were measured throughout. Previous work has shown that levels of communicative success are closely correlated with levels of meaning similarity in mono-generational models of acquisition (Smith, 2003b). The left-hand graph shows that levels of meaning similarity and communicative success are again very closely correlated in a multi-generational model. In each generation, the communicative success rate rises rapidly at first, as the child successfully learns the meanings of many words; then the increase slows, as the child tries to infer the meanings of the remaining words. These words stand for meanings which the adult seldom uses, so they occur relatively infrequently in communicative episodes, and are therefore learnt much more slowly. In the right-hand graph, we see that the rate of inter-generational lexical persistence shows a considerable cumulative decline after only a few generations, although the intra-generational rate remains stable across generations. There are two separate pressures on the language which enforce this relentless erosion over successive generations of inferential cultural transmission, and which can be regarded as twin bottlenecks on the language’s transmission. Conceptual variation restricts the number of words which can potentially persist into the next generation: Only words which refer to meanings which are shared by the agents can be learnt. Lexical variation, through imperfect learning, then restricts the number of words which actually persist into the next generation. The pressures from these two bottlenecks naturally result in a steady cumulative decline in inter-generational lexical persistence. Although the language changes rapidly, such that very little of the original adult’s language remains after only a few generations, we can see from the left-hand graph that communicative success between adults and children within a single generation is not affected, and remains very high.

Figure 5 An iterated inferential model, with generations of 5,000 episodes. Communicative success and meaning similarity (left); intra-generational and inter-generational lexical persistence (right).

The language change described in these experiments also has a distinct qualitative pattern: Words which refer to more specific meanings tend to disappear first, and only more general words tend to survive across multiple generations. This occurs because the Steelsian method of hierarchical conceptual construction imposes an order on the meanings created: There is no way, for instance, to create a meaning in the depths of a tree without first creating the relevant meanings further up the hierarchical structure. This means that more general meanings are more likely to be shared by the agents, and therefore more likely to pass through the conceptual variation bottleneck. Secondly, agents use a communicative model which follows Grice’s (1975) maxim of quantity, in that distinctive categories provide sufficient information to identify the target, but are not unnecessarily specific. General meanings are therefore more likely to be used by the adult and inferred by the child, and so pass through the second bottleneck on learning.

6 Inferential Evolution

Meaning inference is important not only in explaining how language can change so rapidly without becoming incomprehensible to its users, but also in theoretical explanations of how communication could have begun in the first instance, and how it could have become increasingly complex without losing its utility.
Although communication is commonly characterized as the passing of information from speaker to hearer, Burling (2000) points out that the initial communicative episode was not triggered by a speaker making an intentional signal, but rather by a hearer interpreting some behavior as a signal. The existence of communication is indeed defined by interpretative intent: No matter how many signals are sent, communication does not happen until someone tries to interpret them. Even involuntary behavior can be interpreted as a signal, and the act of interpretation is indeed performative, rendering the original behavior a signal and the whole episode communicative. Premack (1975) gives the example of an individual who always gives a cry of excitement on finding a strawberry. Even without an intention to provide information to others, the call becomes functionally referential when it is associated by a hearer with the presence of strawberries, and communication is founded. Exactly the same constraints, however, apply not only to the instantiation of communication, but also to its progressive complexification. Any viable development of a communication system is constrained by the development of the hearer’s interpretative capabilities, because utterances must be interpretable in order to survive. Origgi and Sperber (2000) point out why the inferential nature of human communication is central to its evolution, by contrasting it with an alternative view of communication, as a code, where meanings are encoded into signals and decoded back into meanings. Coded communication systems work very well, but only when interlocutors share the same set of signals and meanings; mismatches in either set lead to communication failure.
The complexification of language in such a system is a puzzle, because any kind of modification in one individual’s internal linguistic representation, even one which could result in the acquisition of a more complex, potentially more beneficial language, would cause a mismatch between that individual and the others, and thus communication failure. As we have seen, however, an inferential model can allow for divergent and dynamic conceptual structures, and yet still be used in successful communication. The inference by the hearer of a richer, more complex semantic structure than was intended by the speaker does not necessarily result in a catastrophic breakdown in communication. Instead, the inference of additional semantic structure may lead the hearer to search for additional information from the context to satisfy the new inferred structure. Such hearers will then use the same signals as other individuals, but associate them with more detailed meanings. Although this extra detail will be ignored by most interlocutors, if it is in any way beneficial to those who can understand it, then the capacity to infer more complex structure may become stabilized in the population. The inference of meaning and repeated form–function reanalysis therefore provide an important theoretical insight into how communication systems like language might have evolved initially ex nihilo, and how they might have become progressively more complex. Future research with more complex models of semantic inference is planned, to explore this hypothesis in detail.

7 Conclusions

It is important to acknowledge not only that language is a culturally-transmitted system of communication, but also that this transmission is based on the inference of meaning.
Inferential communication provides a straightforward explanation for the existence of otherwise redundant signals, and the simulations presented here show how the same process may underlie the development of language on three different timescales: Acquisition in the child; change in the language; and evolution in the species. I have shown how the basic model of cross-situational learning, attested in lexical acquisition, can be enhanced by psychologically plausible representational constraints which allow individuals to build similar conceptual structures, interpretational constraints which allow successful communication between agents with divergent conceptual structures, and social constraints which allow more rapid learning. I have explained how the uncertainty inherent in meaning inference leads to variation in both conceptual and lexical structure, and presented experiments which show how language can change rapidly over generations while maintaining its communicative utility in the language community. Finally, I have sketched a scenario in which the inference of meaning may explain the development and complexification of a communication system driven by interpretative capabilities.

Acknowledgments

This research was supported by ESRC postdoctoral fellowship PTA-026027-0094. I am grateful to three anonymous reviewers for their constructive comments on an earlier draft of this paper.

References

Akhtar, N., & Montague, L. (1999). Early lexical acquisition: The role of cross-situational learning. First Language, 19, 347–358.
Anglin, J. M. (1993). Vocabulary development: A morphological analysis. Monographs of the Society for Research in Child Development, 58(10), 1–166.
Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 111–172). Cambridge, UK: Cambridge University Press.
Bloom, P. (2000). How children learn the meanings of words. Cambridge, MA: MIT Press.
Bloom, P. (2001). Roots of word learning. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 159–181). Cambridge, UK: Cambridge University Press.
Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8(1), 25–54.
Burling, R. (2000). Comprehension, production and conventionalisation in the origins of language. In C. Knight, M. Studdert-Kennedy, & J. R. Hurford (Eds.), The evolutionary emergence of language (pp. 27–39). Cambridge, UK: Cambridge University Press.
Bybee, J., Perkins, R., & Pagliuca, W. (1994). The evolution of grammar: Tense, aspect and modality in the languages of the world. Chicago: University of Chicago Press.
Carey, S., & Bartlett, E. (1978). Acquiring a single new word. Papers and Reports on Child Language Development, 15, 17–29.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chouinard, M. M., & Clark, E. V. (2003). Adult reformulations of child errors as negative evidence. Journal of Child Language, 30, 637–669.
Clark, E. V. (1987). The principle of contrast: A constraint on language acquisition. In B. MacWhinney (Ed.), Mechanisms of language acquisition. London: Erlbaum.
Croft, W. (2000). Explaining language change: An evolutionary approach. Harlow, UK: Pearson.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics (Vol. 3, pp. 41–58). New York: Academic Press.
Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335–346.
Houston-Price, C., Plunkett, K., Harris, P., & Duffy, H. (2003). Developmental change in infants’ use of word meaning. (Paper presented to XIth European Conference on Developmental Psychology, Catholic University of Milan.)
Hurford, J. R. (1989). Biological evolution of the Saussurean sign as a component of the language acquisition device. Lingua, 77, 187–222.
Hurford, J. R. (2002). Expression/induction models of language evolution: Dimensions and issues. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 301–344). Cambridge, UK: Cambridge University Press.
Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon: The development of shared symbols in interaction. In N. Gilbert & R. Conte (Eds.), Artificial societies: The computer simulation of social life. London: UCL Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 173–203). Cambridge, UK: Cambridge University Press.
Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic structure: An overview of the iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 121–148). London: Springer.
Klibanoff, R. S., & Waxman, S. R. (2000). Basic level object categories support the acquisition of novel adjectives: Evidence from pre-school aged children. Child Development, 71(3), 649–659.
Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3, 299–321.
Lieven, E. V. M. (1994). Crosslinguistic and crosscultural aspects of language addressed to children. In C. Gallaway & B. J. Richards (Eds.), Input and interaction in language acquisition (pp. 56–73). Cambridge, UK: Cambridge University Press.
Liszkowski, U., Carpenter, M., Henning, A., Striano, T., & Tomasello, M. (2004). Twelve-month-olds point to share attention and interest. Developmental Science, 7(3), 297–307.
Macnamara, J. (1982). Names for things: A study of human learning. Cambridge, MA: MIT Press.
Markman, E. M. (1989). Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press.
Markman, E. M., & Wachtel, G. F. (1988). Children’s use of mutual exclusivity to constrain the meaning of words. Cognitive Psychology, 20, 121–157.
Oliphant, M., & Batali, J. (1997). Learning and the emergence of coordinated communication. Center for Research on Language Newsletter, 11(1).
Origgi, G., & Sperber, D. (2000). Evolution, communication and the proper function of language. In P. Carruthers & A. Chamberlain (Eds.), Evolution and the human mind: Modularity, language and meta-cognition (pp. 140–169). Cambridge, UK: Cambridge University Press.
Pinker, S. (1994). The language instinct. London: Penguin.
Premack, D. (1975). On the origins of language. In M. S. Gazzaniga & C. B. Blakemore (Eds.), Handbook of psychobiology (pp. 591–605). New York: Academic Press.
Quine, W. v. O. (1960). Word and object. Cambridge, MA: MIT Press.
Saussure, F. d. (1916). Cours de linguistique générale. Paris: Payot.
Siskind, J. M. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61, 39–91.
Smith, A. D. M. (2003a). Evolving communication through the inference of meaning. PhD thesis, Philosophy, Psychology and Language Sciences, University of Edinburgh.
Smith, A. D. M. (2003b). Intelligent meaning creation in a clumpy world helps communication. Artificial Life, 9(2), 175–190.
Smith, A. D. M. (2005). Mutual exclusivity: Communicative success despite conceptual divergence. In M. Tallerman (Ed.), Language origins: Perspectives on evolution (pp. 372–388). Oxford: Oxford University Press.
Smith, A. D. M., & Vogt, P. (2004). Lexicon acquisition in an uncertain world. (Paper given at the 5th International Conference on the Evolution of Language, Leipzig.)
Smith, K., Brighton, H., & Kirby, S. (2003). Complex systems in language evolution: The cultural emergence of compositional structure. Advances in Complex Systems, 6(4), 537–558.
Smith, L. B. (2001). How domain-general processes may create domain-specific biases. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 101–131). Cambridge, UK: Cambridge University Press.
Soja, N. N., Carey, S., & Spelke, E. S. (1991). Ontological categories guide young children’s inductions of word meanings: Object terms and substance terms. Cognition, 38, 179–211.
Steels, L. (1996). Perceptually grounded meaning creation. In M. Tokoro (Ed.), Proceedings of the International Conference on Multi-agent Systems. Cambridge, MA: MIT Press.
Steels, L., & Belpaeme, T. (in press). Coordinating perceptually grounded categories through language: A case study for colour. Behavioral and Brain Sciences.
Steels, L., & Kaplan, F. (2002). Bootstrapping grounded word semantics. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 53–73). Cambridge, UK: Cambridge University Press.
Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.
Tomasello, M., & Rakoczy, H. (2003). What makes human cognition unique? From individual to shared to collective intentionality. Mind and Language, 18(2), 121–147.
Trask, R. L. (1996). Historical linguistics. London: Arnold.
Vogt, P. (2002). The physical symbol grounding problem. Cognitive Systems Research Journal, 3(3), 429–457.
Vogt, P. (2003). Grounded lexicon formation without explicit reference transfer. In W. Banzhaf, T. Christaller, J. Ziegler, P. Dittrich, & J. T. Kim (Eds.), Advances in artificial life: Proceedings of the 7th European Conference on Artificial Life (pp. 545–552). Heidelberg: Springer.
Vogt, P., & Smith, A. D. M. (in press). Learning color words is slow: A cross-situational learning account. Behavioral and Brain Sciences.
About the Author Andrew Smith is a research fellow in the Language Evolution and Computation Research Unit at the University of Edinburgh. He received a BA in languages and linguistic science from the University of York, an MSc in computing from the University of Bradford, and his PhD in linguistics from the University of Edinburgh. His current research uses computational simulations to explore processes of language acquisition, change and evolution.