The Inferential Transmission of Language

Andrew D. M. Smith
Language Evolution and Computation Research Unit, University of Edinburgh, UK
Language is a symbolic, culturally transmitted system of communication, which is learnt through the
inference of meaning. In this paper, I describe the importance of meaning inference, not only in language acquisition, but also in developing a unified explanation for language change and evolution.
Using an agent-based computational model of meaning creation and communication, I show how the
meanings of words can be inferred through disambiguation across multiple contexts, using cross-situational statistical learning. I demonstrate that the uncertainty inherent in the process of meaning inference, moreover, leads to stable variation in both conceptual and lexical structure, providing evidence
which helps to explain how language changes rapidly without losing communicability. Finally, I
describe how an inferential model of communication may provide important theoretical insights into
plausible explanations of the bootstrapping of, and the subsequent progressive complexification of,
cultural communication systems.
Keywords language acquisition · language change · language evolution · meaning inference · cultural transmission · cross-situational learning
1 Introduction
Language is a symbolic, culturally transmitted system
of communication, which is learnt through the inference of meaning. Firstly, language is culturally transmitted,
because although our genetically-specified cognitive
capacity equips us with the ability to learn and use language, the specific languages we acquire are clearly
not determined genetically. Rather, the languages
children learn are those which they hear spoken by the
people in their community, not necessarily those of their
parents. Secondly, the meaning of a linguistic utterance is not transmitted directly, but is inferred indirectly by the hearer, through pragmatic insights and
the social context in which the utterance is received. In
this paper, I present a computational model of language based on cultural interactions, individual adaptations,
and a model of word learning based on the
inference of meaning through cross-situational statistical learning, and use this experimental framework to
explore a unified account of language change on the
three different timescales described by Kirby and Hurford (2002): An individual’s acquisition of language
on an ontogenetic timescale; the historical development of language on a glossogenetic timescale; and the
emergence and complexification of language on a phylogenetic timescale.
The remainder of this article is divided into six
sections. In Section 2, I explore the philosophical problem of identifying the meaning of an unfamiliar word,
and discuss some of the psycholinguistic theories which
have been proposed to explain how children overcome
this problem. In Section 3, I describe the details of the
computational model of language use based on cultural
transmission and meaning inference through disambiguation across multiple contexts. In Section 4, I
review previous experiments carried out using this
model, with a particular emphasis on the implementation of many of the psychologically motivated constraints discussed in Section 2, and on comparing the
results with attested evidence from studies of child
language. Section 5 is concerned with historical linguistic change, and focuses on two different kinds of
linguistic variation which are demonstrated in the
model. The basic model is embedded within a generational model, and I present further experimental results
which demonstrate how these variations can themselves
explain the dynamic nature of language across generations of language users: How it changes rapidly over
time while maintaining its utility as a shared communication system. In Section 6, I sketch a theoretical
scenario of qualitative language change on a phylogenetic timescale, through the same processes of inferential
cultural transmission. Finally, in Section 7, I describe
how this opens the way for the inferential transmission
of language to provide a unified account of language
change on all three timescales.
Correspondence to: Andrew D. M. Smith, Language Evolution and
Computation Research Unit, University of Edinburgh, Adam Ferguson
Building, 40 George Square, Edinburgh EH8 9LL, Scotland, UK.
E-mail: [email protected]
Tel.: +44 (0)131 651 1837, Fax: +44 (0)131 650 3962.
Copyright © 2005 International Society for Adaptive Behavior
Adaptive Behavior (2005), Vol 13(4): 311–324.
[1059–7123(200512) 13:4; 311–324; 059321]
Figures 1, 2, 4, 5 appear in color online: http://adb.sagepub.com
2 Uncertainty in Lexical Acquisition
One of the most distinctive features of human language is the fact that words and their meanings are
related not iconically, through perceptual similarity, but
symbolically, by the very fact of the association alone.
This linkage of words and meanings together in the
Saussurean sign (Saussure, 1916) is a priori arbitrary
and determined by convention. For instance, there is
nothing in the sound of the word chair which suggests
any aspect of its meaning, nor that this is similar to that
of the phonetically dissimilar word seat. Historical
cultural tradition alone determines that Swahili speakers will use the word kiti, and Hungarian speakers the
word szék, to express the same meaning of CHAIR.
But how do we associate words with their correct
arbitrary meaning? Carey and Bartlett (1978) describe the
phenomenon of fast mapping, demonstrating that preschool infants can learn the meaning of an unfamiliar
word, contrasted with a familiar one, after just one
exposure. Lexical acquisition is not only extremely
rapid, but also extensive: Children learn the meanings
of about 40,000 words by the age of ten (Anglin, 1993).
The child’s instinctive, prodigious talent for lexical
acquisition is even more remarkable, given the logical
problem of inducing the meaning of an unfamiliar
word. This was famously illustrated by Quine (1960),
who presented an imaginary anthropologist observing a
speaker uttering the word “gavagai” while pointing
towards a rabbit. Quine explains that, logically, gavagai could mean any one of RABBIT, ANIMAL, DINNER,
UNDETACHED RABBIT PARTS, or indeed an infinite
number of alternatives. Even worse, Quine shows that,
regardless of how much additional information the
anthropologist collects, an infinite number of semantic
hypotheses which are logically consistent with the data
will remain. Theoretically, then, the meaning of an unfamiliar word can never be completely ascertained, yet
in practice, children effortlessly succeed time and again.
There is little consensus, however, on how this success
is achieved. In the following sections, I briefly review
various suggestions from psychologists of how children overcome this problem of meaning uncertainty.
These can be broadly grouped into three different types:
Representational constraints on what the child will
consider; interpretational constraints, context-dependent
strategies which depend on novelty and what has already
been learnt; and more general, social constraints. In
Section 4, I will explore the implementation and effects
of such constraints in a model of lexical acquisition.
2.1 Representational Constraints
Macnamara (1982) suggests that, just as children most
naturally represent their environment in terms of the
objects within it, so they also assume that novel words
refer to whole objects, rather than parts or properties
thereof. Markman (1989) claims, further, that such a
whole-object bias is specifically tailored to word learning. Bloom (2001), however, points out that a similar
bias appears to be used in many other cognitive domains,
such as tracking, categorization, addition and subtraction. Although the whole-object bias is therefore useful
in explaining the bootstrapping of lexical acquisition,
it is clearly not a sufficient explanation of the whole
problem. Words are clearly not used solely to name
objects, so additional constraints have been sought by
others to account for more complex facets of word
learning. Landau, Smith, and Jones (1988) presented
children with an unfamiliar object which they explicitly named as a “dax”, then referred to a number of different test objects, asking the children in each case: “Is
this a dax?” Children generalized the new name to
objects with the same shape as the original object, but
ignored other properties such as size and texture, even
if these appeared much more salient, leading Landau et al.
to propose an innate shape bias. Soja, Carey, and Spelke
(1991), however, show that although children generalize to objects of the same shape as a rigid object, they
generalize to objects of the same material if the original object was not rigid. L. B. Smith (2001), moreover,
reports children paying special attention to the texture
of an object, but only if it has eyes or shoes! These
studies show that children are not only good at making
generalizations based on the properties of objects, but
also at learning which properties are useful to attend
to. Although domain-specific learning biases such as
the proposed shape bias may be used in lexical acquisition, it seems probable that the biases themselves are
shaped by general development processes. In Section 4,
I simulate the existence of representational constraints
in terms of innate biases which make learners more
likely to use particular properties of objects in the categorization process.
2.2 Interpretational Constraints
A common problem with all representational biases,
however, is that they must, during the course of learning, eventually be overcome. The child, for instance,
must be able to find words for parts as well as wholes,
textures as well as shapes. More general constraints on
the interpretation process itself have therefore been
proposed, which depend on what the child has already
learnt. Markman (1989) puts forward one of the most
basic of these principles, the assumption of mutual
exclusivity, or the assumption that words do not share
referents. The use of mutual exclusivity to disambiguate
reference has often been shown experimentally, most
notably by Markman and Wachtel (1988). They assume
that mutual exclusivity applies particularly in the early
stages of lexical acquisition, when most vocabulary
items are basic level words, but weakens over time,
allowing the child to construct words with overlapping
extensions, and thus semantic hierarchies. Clark (1987)
proposes a similar theory focused on the more general
notion of contrast, which assumes that any difference
in form marks a difference in meaning. From this, she
predicts that children use contrast both to assign novel
words to gaps in their lexicons, and to coin new words
to fill such gaps when necessary. In Section 4, I show
how inferential models of language learning which
include an implementation of mutual exclusivity allow
agents with very different conceptual structures to communicate much more successfully than those without
mutual exclusivity.
2.3 Social Constraints
Many other principles of social and cognitive development are also used in lexical acquisition (Tomasello,
1999), often grouped together under the rubric of theory of mind, which is widely held to be the distinguishing factor between the social cognition of humans and
that of other animals. Tomasello and Rakoczy (2003)
argue that there are two crucial stages in the development of this specialized social cognition. Firstly, at
around age 1, children begin to understand adults as
intentional agents, following their gaze and their
pointing gestures. They understand that adults have control
over their perceptions, and can choose to attend to particular objects or aspects of a situation. Because children learn to understand the attentional intentions of
others, they also realize that they can attract another’s
attention to something by pointing at it or holding it up
for inspection (Liszkowski, Carpenter, Henning, Striano, & Tomasello, 2004). This joint attention of both
child and adult to the same situation allows the child to
greatly reduce the number of interpretations it will
consider for a signal. In order to be more specific about
the particular experience or property we want to draw
someone else’s attention to, linguistic symbols become
necessary (Tomasello, 1999). At this stage, the representational and interpretational constraints described
above allow the child to enter into a virtuous circle of
increasing lexical acquisition and increasing cognitive
development. In Section 4, I present findings which
show that the relative degree of joint attention in the
communicative process is very important in determining
how long it will take for a learner to learn a lexicon.
3 Modeling the Inference of Meaning
3.1 Cultural Transmission
The evolution of language has, until recently, been primarily viewed in terms of explaining the biological
evolution of a language instinct (Pinker, 1994), which
has been characterized as a specialized cognitive organ
within the brain. This language organ is held to contain
both a formal coding of Universal Grammar,
which limits the set of possible human languages, and
a Language Acquisition Device, which directs the
course of grammar construction based on the observed
primary linguistic data (Chomsky, 1965). Accounts
within this framework seek to demonstrate how this
innate, domain-specific cognitive organ could have
evolved incrementally through a standard adaptationist
process of natural selection (Jackendoff, 2002).
Figure 1 The expression/induction model of language as a dynamic, culturally transmitted system. Individuals express linguistic behavior based on their internal representations; their internal representations, in turn, are induced in response to encountered linguistic behavior. Language persists in two qualitatively different states: internal knowledge and external behavior.
An alternative view, however, focuses on how
linguistic structures themselves adapt to be learnt by
humans. The two distinct manifestations of language
that we recognize from the Chomskyan account are
reconfigured as distinct, qualitatively different, phases
in the life cycle of a language: Individuals express external linguistic behavior based on their internal linguistic
representations, and induce internal linguistic representations, or grammars, in response to the external
linguistic behavior (or primary linguistic data) they
encounter. Language is culturally transmitted, because
the linguistic input used by one individual to construct
its grammar is itself the output of other individuals, as
shown diagrammatically in Figure 1. Language learners attempt to acquire the language of the other members of their community, and differences between their
internal grammars and those of other individuals occur
as a result of the dynamic cultural evolution of the language itself. Models which represent the transmission
of language in this way have been termed expression/
induction (E/I) models (Hurford, 2002) or iterated
learning models (Smith et al.). Such models have been
used to demonstrate the cultural emergence of structural properties of language, such as compositionality
(Batali, 2002; Brighton, 2002) and recursion (Kirby,
2002). The key feature of these models is a transmission
bottleneck, which restricts the amount of linguistic data
a learner is exposed to. Under these conditions, parts
of language which are regular, or can be generalized
from other examples, are much more likely to persist
through a repeated cycle of expression and induction
than idiosyncratic parts of language which cannot be
generalized.
3.2 Meaning Inference
The majority of these models of language evolution,
however, do not address the problem of learning what
words mean, but rather they simply assume that meanings are pre-defined entities, and that the transmission
of language consists of the simultaneous and explicit
transfer of signals and meanings. This assumption of
explicit meaning transfer leads to two major conceptual difficulties, namely signal redundancy and lack of
semantic variation. Firstly, as I have argued previously (Smith, 2003a, 2005), if meanings are transferred telepathically, then any signals which are used
by the individuals cannot be said to actually convey
meaning at all. If they have no meaningful content,
then their very existence poses the problem of signal
redundancy: What is the motivation for language users
to spend time and energy in learning a symbolic system of signals which provides them with no additional
information? Secondly, languages are actual historical
entities, which are constantly changing, as a result of
variation at many levels of analysis (Trask, 1996).
Indeed it has been argued by many, including Bybee,
Perkins, and Pagliuca (1994) and Croft (2000), that variability is one of the most fundamental features of language, which must be taken into account in a realistic
model of language learning and use.
In the inferential model presented in this article,
therefore, agents initially have neither lexical nor conceptual structures, but merely the ability to develop
individual conceptual representations and to learn from
their own experiences. This model differs from others
which make use of the E/I framework, because it does
not contain a pre-defined, structured meaning system,
but instead assumes that individuals infer meaning
through experience, and that the meanings so inferred
can vary between individuals; see also Vogt (2003) for
a similar approach to using E/I models without explicit
meaning transfer. Crucial to this model is the existence
of an external world, as the source from which meaning can be inferred; without it, meanings must be predefined, can only be communicated directly through
some kind of telepathy, and cannot vary naturally. The
model therefore contains three different levels of representation:
A. an external environment, which provides the motivation and source for meaning creation;
B. an agent-specific internal representation of meaning, which is not accessible by others;
C. a set of signals which can be transmitted between
agents.
Figure 2 shows these three levels of representation
in a model of communication which both avoids the
signal redundancy paradox and grounds the symbols
(Harnad, 1990). The demarcation of the representation
into an external domain, containing things which can
potentially be accessed and manipulated by all individuals, and an internal, private domain, containing items
only accessible by the individual itself, is vital to the
validity of the model. The mere existence of an external world, as in Hutchins and Hazlehurst (1995)’s
model of shared vocabulary development, is not sufficient
to avoid the signal redundancy problem: Not only
must an agent’s semantic representation be private, but
so must the mappings which map their meanings both
to signals and to objects in the world.
Figure 2 A model of communication which avoids the problem of signal redundancy. The model has three levels of representation: an external environment (A); an internal semantic representation (B); and a public set of signals (C). The mappings between A and B and between B and C, represented by the arrows, also fall into the internal, private domain, whose boundary is shown by the dotted line.
3.3 Description of the Model
In the model of inferential learning described in this
paper, the world contains objects, which can be objectively described in terms of their feature values, real
numbers pseudo-randomly generated within the range
[0, 1]. Agents can use their dedicated sensory channels
to sense whether a particular feature value falls between
two bounds, can create meanings which allow them to
distinguish objects from each other, and can create
words to express these meanings. In the experiments
presented here, the world contains 20 different objects
and each agent has 5 sensory channels. The model is
based on the language games described by Steels (1996),
but is extended in a number of ways.
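The world described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `make_world` and `senses` are hypothetical, the seed parameter is added for reproducibility, and the interval test is assumed to be half-open. Note also that Python's `random.random()` draws from [0, 1), a slight narrowing of the range [0, 1] given in the text.

```python
import random

NUM_OBJECTS = 20   # number of objects in the world, as stated in the text
NUM_CHANNELS = 5   # one sensory channel per feature, as stated in the text


def make_world(num_objects=NUM_OBJECTS, num_features=NUM_CHANNELS, seed=None):
    """Return a list of objects, each a tuple of pseudo-random feature values."""
    rng = random.Random(seed)
    return [
        tuple(rng.random() for _ in range(num_features))
        for _ in range(num_objects)
    ]


def senses(value, lower, upper):
    """True if a feature value falls between two bounds.

    Assumption: the interval is half-open, [lower, upper), so that the two
    halves of a split range never both respond to the same value.
    """
    return lower <= value < upper


world = make_world(seed=1)
first_object = world[0]
# Does the first feature of the first object lie in the lower half of the range?
in_lower_half = senses(first_object[0], 0.0, 0.5)
```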
The initial source of an agent’s interaction with the
environment is through discrimination games (Steels,
1996). A subset of the objects, called the context, is
chosen from the world and presented to one of the
agents; one of these objects is chosen, at random, to
be the target of the discrimination episode, and the
agent aims to distinguish this object from all the others
in the context. The agent searches its sensory channels
for a distinctive category, namely a semantic representation which both correctly describes the target, and
does not accurately describe any other object in the
context. In the experiments presented here, distinctive
categories are restricted to single categories, rather
than logical combinations of nodes from different sensory channels. Failure triggers meaning creation by
splitting the sensitivity range of a sensory channel into
two discrete, equally sized segments, each of which is
therefore sensitive to half the range of the previous
segment. An agent searches through its sensory channels until it finds a split which would have produced a
successful distinctive category in the current discrimination game, had the category already existed. Repeated
splitting results in a hierarchical, tree-like conceptual
structure, whose nodes represent semantic categories.
Nodes nearer the root of the tree represent more general meanings, with wider sensitivity ranges which
cover a greater proportion of the semantic space, and
nodes nearer the leaves represent more specific meanings. There is no pre-definition of which categories
should be created, and meaning creation is carried out
by each agent individually, so agents create different,
but typically equally successful, semantic representations of the world.
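A discrimination game with meaning creation by range splitting might be sketched as follows. The representation of a category as a `(channel, lower, upper)` tuple, the class and function names, the random choice among several distinctive categories, and the fallback used when no single split would have succeeded are all assumptions made for illustration; the text does not specify them. Feature values are assumed to lie in [0, 1).

```python
import random


def covers(category, obj):
    """A category (channel, lower, upper) covers obj if the object's value on
    that channel lies in the half-open interval [lower, upper)."""
    channel, lower, upper = category
    return lower <= obj[channel] < upper


def is_distinctive(category, target, context):
    """True if the category describes the target and no other context object."""
    return covers(category, target) and not any(
        covers(category, other) for other in context if other is not target
    )


def halves(category):
    """Split a category into two equally sized segments."""
    channel, lower, upper = category
    middle = (lower + upper) / 2
    return [(channel, lower, middle), (channel, middle, upper)]


class Agent:
    def __init__(self, num_channels, rng=None):
        self.rng = rng or random.Random()
        # Each channel starts as a single node sensitive to the whole range.
        self.categories = [(c, 0.0, 1.0) for c in range(num_channels)]
        self.split = set()  # nodes that already have children

    def discriminate(self, target, context):
        """Return a distinctive category, or None after creating a meaning."""
        found = [c for c in self.categories
                 if is_distinctive(c, target, context)]
        if found:
            return self.rng.choice(found)  # assumption: ties broken at random
        self.create_meaning(target, context)
        return None

    def create_meaning(self, target, context):
        """Split one leaf node, preferring a split that would have succeeded."""
        leaves = [c for c in self.categories
                  if c not in self.split and covers(c, target)]
        self.rng.shuffle(leaves)
        useful = [leaf for leaf in leaves
                  if any(is_distinctive(h, target, context)
                         for h in halves(leaf))]
        chosen = (useful or leaves)[0]  # assumption: otherwise split any leaf
        self.split.add(chosen)
        self.categories.extend(halves(chosen))


agent = Agent(num_channels=2, rng=random.Random(0))
target, other = (0.1, 0.9), (0.8, 0.9)
first = agent.discriminate(target, [target, other])   # fails, triggers a split
second = agent.discriminate(target, [target, other])  # succeeds on channel 0
```

In this usage example the two objects differ only on the first channel, so the first game fails and splits that channel, and the second game then finds the lower half of it to be distinctive.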
Having developed meanings which can effectively
describe the objects in the world, agents communicate
about the objects using the distinctive category chosen in
the discrimination process. Hurford (1989) introduced
the idea of using dynamic communication matrices to
model the evolution of communication strategies, and
showed that bidirectional, Saussurean mappings between
signals and meanings are essential for the development
of viable communication systems. Oliphant and Batali
(1997) extended this model to show that the best way
of ensuring continuing increases in communicative
accuracy is to choose signals based on how they are
interpreted by the rest of the population. Oliphant and
Batali’s algorithm, however, requires that agents can
directly access the internal semantic representations of
other agents. In order to avoid this mind-reading, I use
a modified version of their algorithm, called introspective obverter (Smith, 2003b), where the speaker puts
itself into the hearer’s shoes, and chooses the signal
which it would be most likely to interpret correctly as
the hearer, given the current context and its own semantic representations. Signal choice is therefore based on
the speaker’s own interpretative behavior. The speaker
uses this algorithm to choose a signal for the distinctive category it found, and transmits this to the hearer,
together with the context of the discrimination game.
Neither the meaning itself, nor the target object to which
the meaning refers, are explicitly identified to the hearer.
The hearer interprets the signal, and learns its meaning, solely from the information in the current context
and from its previous experience of the signal in other
contexts.
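The idea of choosing a signal by consulting one's own interpretative behavior can be illustrated with a small sketch. This is not the introspective obverter algorithm of Smith (2003b) itself, whose details are not given here; it is a simplified reading under stated assumptions: the speaker's lexicon is a nested dictionary of co-occurrence counts, a signal is acceptable if the intended meaning is the speaker's own best interpretation of it among the meanings available in the context, and a five-letter random word is invented when no known signal qualifies. All names are hypothetical.

```python
import random
import string


def conditional_probability(counts, signal, meaning):
    """P(meaning | signal) from a nested dict counts[signal][meaning]."""
    row = counts.get(signal, {})
    total = sum(row.values())
    return row.get(meaning, 0) / total if total else 0.0


def choose_signal(counts, meaning, hypotheses, rng):
    """Pick the signal the speaker itself would interpret as `meaning`.

    `hypotheses` are the meanings the speaker could construct for the
    current context, i.e. the rivals it would consider as a hearer.
    """
    best_signal, best_p = None, 0.0
    for signal in sorted(counts):  # sorted only to make the sketch deterministic
        p = conditional_probability(counts, signal, meaning)
        strongest = max(
            (conditional_probability(counts, signal, h) for h in hypotheses),
            default=0.0,
        )
        # Keep the signal only if `meaning` would win the interpretation.
        if p > 0 and p >= strongest and p > best_p:
            best_signal, best_p = signal, p
    if best_signal is None:
        # Assumption: invent a new random word when no known signal fits.
        best_signal = "".join(rng.choice(string.ascii_lowercase)
                              for _ in range(5))
    return best_signal


speaker_counts = {
    "ikwob": {"STAR": 3, "DARK": 2},
    "tela": {"DARK": 4, "STAR": 1},
}
signal = choose_signal(speaker_counts, "STAR", ["STAR", "DARK"],
                       random.Random(0))
```

Because the choice depends only on the speaker's own counts and the shared context, no access to the hearer's internal representations is needed.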
In order to infer the meaning of an utterance, the
hearer first uses the conceptual structures it has developed to play a separate discrimination game for each
object in the context (i.e., with each object in turn serving
as the target object), thereby creating a list of semantic
hypotheses. This list consists of every meaning in its
current conceptual structure which could serve as a
distinctive category for any single object in the context. In principle, without any constraints such as those
discussed in Section 2, each of these possible meanings is equally plausible, so the hearer considers all of
them, and stores them individually in its internal lexicon in association with the signal. This lexicon contains
a count of the co-occurrence of each signal-meaning
pair <s, m>, which is used to calculate the conditional
probability P(m|s) that, given the signal s, the meaning
m is associated with s, according to the formula
$$P(m \mid s) = \frac{f(s, m)}{\sum_{i=1}^{n} f(s, i)}$$
where f(s, m) is the number of times s has been associated with m and n is the number of items in the lexicon
(Smith, 2003b). The hearer uses its lexicon to choose a
preferred meaning for the signal, namely the one which
has the highest conditional probability for the received
signal. If two or more meanings have equal conditional
probability, then one of them is chosen at random.
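The lexicon update and the choice of a preferred meaning can be sketched as below. The class name `Lexicon` and its method names are hypothetical, and the sketch assumes that each semantic hypothesis in the current list has its count incremented once per episode. Since every meaning of a given signal shares the same denominator, comparing raw counts selects the same meaning as comparing conditional probabilities.

```python
import random
from collections import defaultdict


class Lexicon:
    def __init__(self, rng=None):
        self.f = defaultdict(int)  # (signal, meaning) -> co-occurrence count
        self.rng = rng or random.Random()

    def observe(self, signal, hypotheses):
        """Associate the signal with every current semantic hypothesis."""
        for meaning in set(hypotheses):
            self.f[(signal, meaning)] += 1

    def probability(self, meaning, signal):
        """P(m | s) = f(s, m) divided by the sum of f(s, i) over meanings i."""
        total = sum(c for (s, _), c in self.f.items() if s == signal)
        return self.f.get((signal, meaning), 0) / total if total else 0.0

    def preferred_meaning(self, signal):
        """Meaning with the highest P(m | s); ties are broken at random."""
        scores = {m: c for (s, m), c in self.f.items() if s == signal}
        if not scores:
            return None
        best = max(scores.values())
        return self.rng.choice(sorted(m for m, c in scores.items()
                                      if c == best))


lexicon = Lexicon(random.Random(0))
lexicon.observe("wepo", ["SMALL", "ROUND"])
lexicon.observe("wepo", ["ROUND", "RED"])
guess = lexicon.preferred_meaning("wepo")  # "ROUND" has the highest count
```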
The success of the communicative episode is measured by referent identity: The hearer’s inferred meaning is a distinctive category which picks out one of the
objects in the context, and if this object is the same as
the speaker’s initial target, then the episode succeeds.
Importantly, therefore, there is no requirement for agents
to use (or even to have) the same internally specified
meaning, only that they both identify the same external
referent. The measurement of communicative success,
indeed, takes place solely for the benefit of the experimenter: Neither agent receives any information at all
about the result of the communicative episode. The
learning mechanism, therefore, does not rely on any
corrective feedback, in contrast to the guessing game
(Steels & Kaplan, 2002; Vogt, 2002; Steels & Belpaeme,
in press), but instead relies on the co-occurrence of
words and their inferred meanings. The status of such
feedback in lexical acquisition is the source of much
current debate in psycholinguistics. It is widely accepted
that children receive little, if any, direct corrective
feedback while learning words (Bloom, 2000). Lieven
(1994), indeed, describes cultures in which children are
not even addressed in the early stages of acquisition. On
the other hand, Chouinard and Clark (2003) have shown
that adults often reformulate what they think children
have said, and that such reformulations can act as an
important source of implicit feedback. The model
described here, however, serves to show that successful lexical acquisition can take place without explicit
feedback, using cross-situational statistical learning
(see also Vogt and Smith, in press).
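Measuring success by referent identity, rather than by identity of meaning, can be illustrated with a minimal sketch. It assumes categories are `(channel, lower, upper)` tuples over half-open ranges and that objects are tuples of feature values; the function names are hypothetical. The result is computed by the experimenter only and is never passed back to either agent.

```python
def covers(category, obj):
    """True if the object's value on the category's channel lies in range."""
    channel, lower, upper = category
    return lower <= obj[channel] < upper


def referent(category, context):
    """Return the single context object a category picks out, or None."""
    matches = [obj for obj in context if covers(category, obj)]
    return matches[0] if len(matches) == 1 else None


def episode_succeeds(speaker_target, hearer_meaning, context):
    """Success means the hearer's meaning identifies the speaker's target."""
    picked = referent(hearer_meaning, context)
    return picked is not None and picked is speaker_target


context = [(0.2, 0.7), (0.8, 0.1)]
target = context[0]
speaker_meaning = (0, 0.0, 0.5)  # lower half of channel 0
hearer_meaning = (1, 0.5, 1.0)   # upper half of channel 1, a different meaning
success = episode_succeeds(target, hearer_meaning, context)
```

Here the speaker and hearer use meanings on different sensory channels, yet the episode counts as successful because both meanings single out the same object.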
3.4 Cross-situational Statistical Learning
Cross-situational statistical learning is based on the
statistical co-occurrence of words and their inferred
meanings, and is similar to the technique proposed by
Siskind (1996). It differs from Siskind’s model most
fundamentally in that his learners are provided with a
hypothesis set which already contains all possible
meanings in the world, from which they eliminate
those which are incoherent in the current situation.
By contrast, in the model presented here, the hypothesis set is in principle infinite, as learners create
new meanings through experience with the world,
and choose the meanings which are most plausible,
given both the current situation and their interaction
history.
In order to fully understand the process of cross-situational statistical learning within the language game
model, let us go through a number of simplified example games shown in Figure 3, to see how the meaning
of an unfamiliar word is disambiguated through its
repeated occurrence in different contexts. Suppose that
objects are described only in terms of two features,
shape and brightness, and that the hearer encounters
the contexts and utterances shown in columns A and
B. For the purposes of this exposition, there are five
different shapes shown in Figure 3, and three different
categories of brightness, namely LIGHT, INTERMEDIATE
and DARK. The hearer, moreover, assumes that each
utterance serves to discriminate one (and only one)
object from all the others in the context.
In the first game, the hearer encounters two wheels,
two trees and a star: In terms of shape, therefore, only
STAR can be used to describe a single object and is therefore a possible meaning. In terms of brightness, there
is one light object (the first tree), one dark object (the
second wheel), and the rest are intermediate: Both LIGHT
and DARK could also be possible meanings, and therefore both, together with STAR, are shown in column C,
which represents the set of semantic hypotheses in the
current game. Column D shows all the meanings which
have ever been associated by the hearer with the current signal ikwob; as this is the first time the word has
been encountered, it contains the three possible meanings in column C, and a co-occurrence frequency of
one for each. As all the co-occurrence frequencies are
equal, the hearer must choose one of these meanings at
random.
Figure 3 Cross-situational statistical learning across three language games. Each game shows the context of objects (A); the signal uttered (B); the current set of semantic hypotheses constructed by the hearer (C); and the relevant part of the hearer’s lexicon (D), with meanings (m) and the frequency of their co-occurrence with the signal (f).
In the second game, the hearer encounters ikwob
again, this time in a completely different context. Using
the same process as before, the hearer creates another
(different) set of semantic hypotheses for the current
context, namely DARK, MAN, STAR and TREE, shown in
column C. After adding these meanings to the lexicon,
and updating the co-occurrence frequencies, we can
see in column D that after two games, the hearer now
has five possible meanings, whose co-occurrence frequencies are no longer equal. Both DARK and STAR have
occurred twice in conjunction with ikwob, but the others
only once. The ambiguity has therefore been reduced,
but not yet eliminated, and so the hearer would choose
a meaning at random from those with the highest co-occurrence. The third game proceeds in the same fashion, and provides the hearer with a further set of possible meanings: LIGHT and STAR. When these are added
to the lexicon, one meaning has a higher co-occurrence
frequency than all the others, and so the hearer is confident that the meaning of ikwob is STAR.
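The three games just described can be replayed with a short script. The hypothesis sets are those given in the text for the signal ikwob; the function name `run_games` and the use of `Counter` and `Fraction` are choices made for this sketch.

```python
from collections import Counter
from fractions import Fraction

# Semantic hypotheses constructed by the hearer in each of the three games.
games = [
    ["STAR", "LIGHT", "DARK"],
    ["DARK", "MAN", "STAR", "TREE"],
    ["LIGHT", "STAR"],
]


def run_games(hypothesis_sets):
    """Return the final counts and, per game, the meanings tied for the lead."""
    counts = Counter()
    leaders_per_game = []
    for hypotheses in hypothesis_sets:
        counts.update(hypotheses)  # increment f(s, m) for each hypothesis
        top = max(counts.values())
        leaders_per_game.append(sorted(m for m, c in counts.items()
                                       if c == top))
    return counts, leaders_per_game


counts, leaders = run_games(games)
p_star = Fraction(counts["STAR"], sum(counts.values()))

for number, tied in enumerate(leaders, start=1):
    print(f"after game {number}: {tied}")
print(f"P(STAR | ikwob) = {p_star}")
```

After the first game all three meanings are tied, after the second only DARK and STAR remain tied, and after the third STAR leads alone with a count of 3 out of a total of 9, so P(STAR | ikwob) = 1/3.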
4 Inferential Acquisition
Recent empirical research, indeed, shows that a cross-situational model of learning provides a robust account
of lexical acquisition. Houston-Price, Plunkett, Harris,
and Duffy (2003) show that children use cross-situational learning to disambiguate word reference, even
though their experiments were designed with attentional cues. Akhtar and Montague (1999), and Klibanoff
and Waxman (2000), have separately demonstrated that
novel adjectival categories are learnt cross-situationally, within the context of basic level categories.
Previous experiments using this and similar computational models have also shown that large lexicons
can be learnt, and that agents with different conceptual
structures can successfully communicate with each other.
The parameters of these models can also be varied, to
implement the constraints on lexical learning discussed in Section 2, and to explore their effects on learning and communication. Such experiments have found
that communicative success is closely related to the
level of meaning similarity between agents (Smith,
2003b). Moreover, if agents have the same representational biases, then they are more likely to develop similar meanings. This relationship can be diluted, however,
if agents use an interpretational constraint like mutual
exclusivity, reducing the number of semantic hypotheses under consideration by ignoring those objects for
which they already know an appropriate word. Under
these circumstances, high levels of communicative success occur even among agents with very dissimilar conceptual structures (Smith, 2005). Social constraints such
as joint attention, moreover, can be implemented by
altering the size of the context in a communicative episode. As the size of the context increases, so the time
taken for the learner to learn the lexicon increases
(Smith, 2003a; Smith & Vogt, 2004). The inferential
model of communication and cross-situational statistical learning presented here is, therefore, a plausible
model of lexical acquisition, whose results correspond
well with recent attested evidence from child language
studies.
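The mutual exclusivity constraint mentioned above can be illustrated with a small filter over the objects in a context. The function name, the object labels and the fallback behaviour when every object can already be named are all assumptions of this sketch, not details taken from the cited experiments.

```python
def apply_mutual_exclusivity(context, named_objects):
    """Drop context objects for which the hearer already knows a word.

    If every object can already be named, fall back to the full context
    (an assumption of this sketch).
    """
    remaining = [obj for obj in context if obj not in named_objects]
    return remaining or list(context)


# Five hypothetical objects in the context; three can already be named.
context = ["ball", "cup", "dog", "shoe", "spoon"]
named = {"ball", "dog", "shoe"}
hypotheses = apply_mutual_exclusivity(context, named)
```

Here the hearer considers only the cup and the spoon as possible referents of a novel word, which is how the constraint reduces the number of semantic hypotheses.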
5
Inferential Variation and Change
It is well recognized that language change is driven by
variation in language communities (Trask, 1996). In
the inferential model described in this paper, there are
two important sources of variation, which I call conceptual and lexical. In Section 5.1, I will describe the
source and effects of these variations, and present methods of measuring them. In Section 5.2, I show how the
inferential paradigm can be used to explain aspects of
historical linguistic change. Examples of both types of
variation can be seen in Figure 4. Taken from a representative simulation, this diagram shows extracts from
the conceptual structure of an adult and a child. Each
agent has five sensory channels on which conceptual
structures are built, but for ease of exposition, only one
of these is shown here.
5.1 Conceptual and Lexical Variation
The independent creation of conceptual structure leads
inevitably to variation in agents’ semantic representations, both because an agent’s response to a particular
experience is not deterministic, and because agents’
experiences themselves differ (Smith, 2003a).
Figure 4 Extract from the internal structures of two agents, showing variation in both conceptual and lexical structures.
The conceptual structures are shown by hierarchical tree structures, each node of which represents a different meaning.
Conceptual variation, where meanings have no corresponding equivalent in the other agent’s conceptual structure, is
marked with dotted lines. Lexical structures are represented by the words attached to the nodes, which signify the
agent’s preferred word for the meaning; empty nodes have no preferred word. Lexical variations, where the agents disagree on the meaning of a word, are circled in the right-hand structure.
In the upper part of Figure 4, we first consider only the agents’
conceptual structures, shown by the hierarchical tree
structures. Nodes with no equivalent in the other agent’s
conceptual structure are marked with dotted lines.
Although the two agents in Figure 4 have developed
similar structures, it is clear that in three different
places, the child has developed additional conceptual
structure. Such conceptual variation can be quantified
by considering the nodes which the trees have in common. If k(t, u) is the number of nodes which two trees
t and u have in common, and n(t) is the total number of
nodes on tree t, then the similarity τ(t, u) between trees
t and u is
$$\tau(t, u) = \frac{2k(t, u)}{n(t) + n(u)}$$
Averaging τ across all their sensory channels, we can
produce a measure of conceptual, or meaning, similarity between two agents (Smith, 2003a).
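As a rough sketch of this measure, each tree can be represented as a set of node identifiers, so that the nodes in common are simply the set intersection. The representation of nodes as path strings ("" for the root, "0" and "1" for its children, and so on), the function names, and the treatment of two empty trees are assumptions made for illustration.

```python
def tree_similarity(t, u):
    """tau(t, u) = 2k(t, u) / (n(t) + n(u)), with trees as sets of node ids."""
    total = len(t) + len(u)
    if total == 0:
        return 1.0  # assumption: two empty trees count as identical
    return 2 * len(t & u) / total


def meaning_similarity(agent_a, agent_b):
    """Average tau over the sensory channels shared by two agents."""
    channels = list(agent_a)
    return sum(tree_similarity(agent_a[c], agent_b[c]) for c in channels) / len(channels)


# Hypothetical agents with two channels; the child has extra structure on channel 0.
adult = {0: {"", "0", "1", "00", "01"}, 1: {"", "0", "1"}}
child = {0: {"", "0", "1", "00", "01", "10", "11"}, 1: {"", "0", "1"}}

tau_channel_0 = tree_similarity(adult[0], child[0])  # 2 * 5 / (5 + 7)
similarity = meaning_similarity(adult, child)
```

On channel 0 the trees share five nodes out of five and seven respectively, giving τ = 10/12; channel 1 is identical, so the average across channels is 11/12.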
Secondly, the uncertainty inherent in cross-situational learning produces inevitable variation in the
agents’ lexical associations. The inferred meanings are
dependent on the particular conceptual structures which
the agent has created, and the associations themselves
depend on the particular contexts in which words are
heard. Lexical variation can be measured by considering whether agents have the same preferred word for
each meaning. An agent’s preferred word for meaning
m is the word in its lexicon which has the highest conditional probability in association with m, and which
does not have a higher conditional probability in association with a different meaning. Preferred words are
represented in the lower part of Figure 4 by the words
attached to the nodes; empty nodes have no preferred
word, and circles are used to highlight lexical variations, where the agents have different preferred words.
The words wm and hhd, for instance, have not been
learnt correctly by the child, although the relevant
nodes on the adult’s conceptual structure do exist in
the child’s conceptual structure. The child has attached
both wm and hhd to nodes nearer the root of the tree;
because these nodes cover a larger degree of semantic
space than their meanings for the adult, this kind of
change can be considered generalization. In Section 5.2,
I describe why generalization of this kind occurs frequently in this model.
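The definition of a preferred word given above can be sketched as follows. The sketch assumes that the conditional probability of a meaning given a word is the word's co-occurrence count with that meaning divided by its total count; the text does not spell out the formula, so this is an assumption. The meaning labels X, Y and Z are placeholders, and ties are broken alphabetically by word.

```python
def conditional_probability(lexicon, word, meaning):
    """Assumed estimate: count(word, meaning) / total count for the word."""
    counts = lexicon[word]
    total = sum(counts.values())
    return counts.get(meaning, 0) / total if total else 0.0


def preferred_word(lexicon, meaning):
    """Return the agent's preferred word for a meaning, or None if it has none."""
    best_word, best_p = None, 0.0
    for word in sorted(lexicon):  # alphabetical order makes tie-breaking stable
        p = conditional_probability(lexicon, word, meaning)
        if p > best_p:
            best_word, best_p = word, p
    if best_word is None:
        return None
    # Reject the word if it is more strongly associated with another meaning.
    for other in lexicon[best_word]:
        if other != meaning and conditional_probability(lexicon, best_word, other) > best_p:
            return None
    return best_word


# Hypothetical co-occurrence counts for two words over placeholder meanings.
lexicon = {
    "wm": {"X": 3, "Y": 1},
    "hhd": {"Y": 1, "Z": 4},
}
preferred = {m: preferred_word(lexicon, m) for m in ("X", "Y", "Z")}
```

Meaning Y ends up as an empty node: its strongest word, wm, is more strongly associated with X, so Y has no preferred word.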
Lexical items are said to persist if they are successfully learnt; lexical persistence across the whole of
an agent’s lexicon is a very useful measure of linguistic
change, and can be measured both within and between
generations. Intra-generational lexical persistence is
the proportion of the adult’s lexicon learnt correctly by
the child, while inter-generational lexical persistence
is the proportion of the original language developed by
the adult in the first generation of the simulation,
which is still intact in the language of the child at the
end of the nth generation. At the end of each generation, each agent has approximately 50 preferred words
in their lexicon.
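Both persistence measures can be computed with the same function, applied to different pairs of lexicons. In this sketch a lexicon is a mapping from meanings to preferred words, and a word counts as learnt correctly when the learner has the same preferred word for the same meaning; that criterion, and the example words and meanings, are assumptions made for illustration.

```python
def lexical_persistence(source, learner):
    """Proportion of the source's (meaning, word) pairs preserved by the learner."""
    if not source:
        return 0.0
    kept = sum(1 for meaning, word in source.items() if learner.get(meaning) == word)
    return kept / len(source)


# Hypothetical preferred-word lexicons (meaning -> word).
first_adult = {"A": "wm", "B": "hhd", "C": "ikwob", "D": "tz"}
adult_n = {"A": "wm", "B": "hhd", "C": "qo"}
child_n = {"A": "wm", "B": "hhd", "C": "zz"}

intra = lexical_persistence(adult_n, child_n)      # current adult vs. its child
inter = lexical_persistence(first_adult, child_n)  # first-generation adult vs. child n
```

With these toy lexicons the child has learnt two of the current adult's three words, while only two of the four words of the original language survive.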
5.2 Experimental Results
To investigate semantic change across multiple generations of cultural transmission, the basic inferential
model is extended vertically into a traditional iterated
learning model with generational turnover (Smith et al.,
2003). Each generation has two phases: a set of 100
orientation episodes, followed by a number of communication episodes. In the orientation phase, the agents
explore the world individually, and create meanings to
represent what they encounter, through discrimination
games. Both agents take part in this phase, though it is
almost redundant for the adults who have already
developed a rich conceptual structure, except in the
initial generation when they have none. In the communication phase, the adult attempts to communicate to
the child as described in Section 3.3; in each communicative episode there are five objects in the context.
Communicative success occurs when the object identified by the hearer’s chosen meaning is the same as the
speaker’s initial target object. There is no requirement
for the agents to use identical internal meanings, only
that they identify the same external referent. Neither
agent receives any feedback about the communicative
success of the episode. At the end of a generation, the
adult is removed, the child becomes adult, and a new
child is introduced. The language inferred in the previous generation by the child becomes the source of its
output in the subsequent generation, as described in
Section 3. Figure 5 shows results from a typical simulation run over ten generations, each made up of 5,000
episodes. Analyses of meaning similarity, of communicative success over the previous 100 episodes, and of
inter- and intra-generational lexical persistence were
calculated.
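The success criterion and the moving window over the previous 100 episodes might be tracked as in the sketch below. The class name and interface are hypothetical; the only details taken from the text are that success depends on the external referent rather than on the internal meanings, and that the rate is calculated over a window of recent episodes.

```python
from collections import deque


class SuccessMonitor:
    """Track communicative success over the most recent episodes."""

    def __init__(self, window=100):
        self.outcomes = deque(maxlen=window)  # older outcomes drop off automatically

    def record(self, target_object, hearer_object):
        """An episode succeeds when both agents pick out the same object."""
        success = target_object == hearer_object
        self.outcomes.append(success)
        return success

    def rate(self):
        """Success rate over the window, or 0.0 before any episode is recorded."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


# Small window for illustration; the experiments use the previous 100 episodes.
monitor = SuccessMonitor(window=3)
for target, chosen in [(1, 1), (1, 2), (3, 3), (4, 4)]:
    monitor.record(target, chosen)
```

Because the monitor compares objects rather than meanings, two agents with different conceptual structures can still register a success, and neither agent needs to be told the outcome.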
Previous work has shown that levels of communicative success are closely correlated with levels of
meaning similarity in mono-generational models of
acquisition (Smith, 2003b). In the left-hand graph of Figure 5, we
can see that levels of meaning similarity
and communicative success are again very closely
correlated in a multi-generational model. In each generation, the communicative success rate rises rapidly
at first, as the child successfully learns the meanings
of many words, then the increase slows, as the child
tries to infer the meanings of the remaining words.
These words stand for meanings which are seldom
used by the adult and so occur relatively infrequently
in communicative episodes, and are therefore learnt
much more slowly.
In the right-hand graph, we see that the rate of
inter-generational lexical persistence shows a considerable cumulative decline after only a few generations, although the intra-generational rate remains
stable across generations. There are two separate
pressures on the language which enforce this relentless erosion over successive generations of inferential cultural transmission, which can be regarded
as twin bottlenecks on the language’s transmission.
Conceptual variation restricts the number of words
which can potentially persist into the next generation: Only words which refer to meanings which are
shared by the agents can be learnt. Lexical variation,
through imperfect learning, then restricts the number of words which actually persist into the next generation.
Figure 5 An iterated inferential model, with generations of 5,000 episodes. Communicative success and meaning similarity (left); intra-generational and inter-generational lexical persistence (right).
The pressures from these two bottlenecks naturally result in a steady cumulative decline in inter-generational lexical persistence. Although the language changes rapidly, such that very little of the
original adult’s language remains after only a few
generations, we can see from the left-hand graph that
communicative success between adults and children
within a single generation is not affected, and remains
very high.
The language change described in these experiments also has a distinct qualitative pattern, in that words
which refer to more specific meanings tend to disappear first, and only more general words tend to survive
across multiple generations. This occurs because the
Steelsian method of hierarchical conceptual construction forces some order on the meanings created: There
is no way, for instance, to create a meaning in the depths
of a tree without first creating the relevant meanings
further up the hierarchical structure. This means that
more general meanings are more likely to be shared by
the agents, and therefore more likely to pass through
the conceptual variation bottleneck. Secondly, agents
use a communicative model which follows Grice’s
(1975) maxim of quantity, in that distinctive categories provide sufficient information to identify the target,
but are not unnecessarily specific. General meanings
are more likely to be used by the adult and inferred by
the child, and therefore pass through the second bottleneck on learning.
6
Inferential Evolution
Meaning inference is important not only in explaining
how language can change so rapidly without becoming
incomprehensible to its users, but also in theoretical
explanations of how communication could have begun
in the first instance, and how it could have become
increasingly complex without losing its utility. Although
communication is commonly characterized as the
passing of information from speaker to hearer, Burling
(2000) points out that the initial communicative episode was not triggered by a speaker making an intentional signal, but rather by a hearer interpreting some
behavior as a signal. The existence of communication
is indeed defined by interpretative intent: No matter
how many signals are sent, communication does not
happen until someone tries to interpret them. Even
involuntary behavior can be interpreted as a signal,
and the act of interpretation is indeed performative,
rendering the original behavior a signal and the whole
episode communicative. Premack (1975) gives the
example of an individual who always gives a cry of
excitement on finding a strawberry. Even without an
intention to provide information to others, the call
becomes functionally referential when it is associated
by a hearer with the presence of strawberries, and
communication is founded.
Exactly the same constraints, however, apply not
only to the instantiation of communication, but to its
progressive complexification. Any viable development of a communication system is constrained by the
development of the hearer’s interpretative capabilities, because utterances must be able to be interpreted
in order to survive. Origgi and Sperber (2000) explain
why the inferential nature of human communication is
central to its evolution, by contrasting it with an alternative view of communication, as a code, where meanings are encoded into signals and decoded back into
meanings. Coded communication systems work very
well, but only when interlocutors share the same set of
signals and meanings; mismatches in either set lead to
communication failure. The complexification of language in such a system is a puzzle, because any kind of
modification in one individual’s internal linguistic
representation, even one which could result in the
acquisition of a more complex, potentially more beneficial language, would cause a mismatch between that
individual and the others, and thus communication
failure. As we have seen, however, an inferential model
can allow for divergent and dynamic conceptual structures, and yet still be used in successful communication. The inference by the hearer of a richer, more
complex semantic structure than was intended by the
speaker does not necessarily result in a catastrophic
breakdown in communication. Instead, the inference
of additional semantic structure may lead the hearer to
search for additional information from the context to
satisfy the new inferred structure. They will then use
the same signals as other individuals, but associate
them with more detailed meanings. Although this
extra detail will be ignored by most interlocutors, if it
is in any way beneficial to those who can understand it,
then the capacity to infer more complex structure may
become stabilized in the population.
The inference of meaning and repeated form–function reanalysis therefore provides an important theoretical insight into how communication systems like
language might have evolved initially ex nihilo, and how
they might have become progressively more complex.
Future research is planned with complex models of
semantic inference to explore this hypothesis more
closely.
7
Conclusions
It is important to acknowledge not only that language
is a culturally-transmitted system of communication,
but also that this transmission is based on the inference of meaning. Inferential communication provides
a straightforward explanation for the existence of otherwise redundant signals, and the simulations presented here show how the same process may underlie
the development of language on three different timescales: acquisition in the child, change in the language, and evolution in the species.
I have shown how the basic model of cross-situational learning, attested in lexical acquisition, can be
enhanced by psychologically plausible representational
constraints which allow individuals to build similar
conceptual structures, interpretational constraints which
allow successful communication between agents with
divergent conceptual structures, and social constraints
which allow more rapid learning. I have explained
how the uncertainty inherent in meaning inference
leads to variation in both conceptual and lexical structure, and presented experiments which show how language can change rapidly over generations while
maintaining its communicative utility in the language
community. Finally, I have sketched a scenario in which
the inference of meaning may explain the development
and complexification of a communication system driven
by interpretative capabilities.
Acknowledgments
This research was supported by ESRC postdoctoral fellowship
PTA-026027-0094. I am grateful to three anonymous reviewers
for their constructive comments on an earlier draft of this paper.
References
Akhtar, N., & Montague, L. (1999). Early lexical acquisition:
The role of cross-situational learning. First Language, 19,
347–358.
Anglin, J. M. (1993). Vocabulary development: A morphological analysis. Monographs of the Society for Research in
Child Development, 58(10), 1–166.
Batali, J. (2002). The negotiation and acquisition of recursive
grammars as a result of competition among exemplars. In
E. Briscoe (Ed.), Linguistic evolution through language
acquisition: Formal and computational models (pp. 111–
172). Cambridge, UK: Cambridge University Press.
Bloom, P. (2000). How children learn the meanings of words.
Cambridge, MA: MIT Press.
Bloom, P. (2001). Roots of word learning. In M. Bowerman &
S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 159–181). Cambridge, UK: Cambridge University Press.
Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8(1), 25–54.
Burling, R. (2000). Comprehension, production and conventionalisation in the origins of language. In C. Knight, M.
Studdert-Kennedy, & J. R. Hurford (Eds.), The evolutionary emergence of language (pp. 27–39). Cambridge, UK:
Cambridge University Press.
Bybee, J., Perkins, R., & Pagliuca, W. (1994). The evolution of
grammar: Tense, aspect and modality in the languages of
the world. Chicago: University of Chicago Press.
Carey, S., & Bartlett, E. (1978). Acquiring a single new word.
Papers and Reports on Child Language Development, 15,
17–29.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chouinard, M. M., & Clark, E. V. (2003). Adult reformulations
of child errors as negative evidence. Journal of Child Language, 30, 637–669.
Clark, E. V. (1987). The principle of contrast: A constraint on
language acquisition. In B. MacWhinney (Ed.), Mechanisms of language acquisition. London: Erlbaum.
Croft, W. (2000). Explaining language change: An evolutionary approach. Harlow, UK: Pearson.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L.
Morgan (Eds.), Syntax and semantics (Vol. 3, pp. 41–58).
New York: Academic Press.
Harnad, S. (1990). The symbol grounding problem. Physica D,
42, 335–346.
Houston-Price, C., Plunkett, K., Harris, P., & Duffy, H. (2003).
Developmental change in infants’ use of word meaning.
(Paper presented to XIth European Conference on Developmental Psychology, Catholic University of Milan)
Hurford, J. R. (1989). Biological evolution of the Saussurean
sign as a component of the language acquisition device.
Lingua, 77, 187–222.
Hurford, J. R. (2002). Expression/induction models of language evolution: Dimensions and issues. In E. Briscoe
(Ed.), Linguistic evolution through language acquisition:
Formal and computational models (pp. 301–344). Cambridge, UK: Cambridge University Press.
Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon:
The development of shared symbols in interaction. In N.
Gilbert & R. Conte (Eds.), Artificial societies: The computer simulation of social life. London: UCL Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Kirby, S. (2002). Learning, bottlenecks and the evolution of
recursive syntax. In E. Briscoe (Ed.), Linguistic evolution
through language acquisition: Formal and computational
models (pp. 173–203). Cambridge, UK: Cambridge University Press.
Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic
structure: An overview of the iterated learning model. In
A. Cangelosi & D. Parisi (Eds.), Simulating the evolution
of language (pp. 121–148). London: Springer.
Klibanoff, R. S., & Waxman, S. R. (2000). Basic level object
categories support the acquisition of novel adjectives: Evidence from pre-school aged children. Child Development,
71(3), 649–659.
Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3, 299–321.
Lieven, E. V. M. (1994). Crosslinguistic and crosscultural
aspects of language addressed to children. In C. Gallaway
& B. J. Richards (Eds.), Input and interaction in language
acquisition (pp. 56–73). Cambridge, UK: Cambridge University Press.
Liszkowski, U., Carpenter, M., Henning, A., Striano, T., &
Tomasello, M. (2004). Twelve-month-olds point to share
attention and interest. Developmental Science, 7(3), 297–307.
Macnamara, J. (1982). Names for things: A study of human
learning. Cambridge, MA: MIT Press.
Markman, E. M. (1989). Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press.
Markman, E. M., & Wachtel, G. F. (1988). Children’s use of
mutual exclusivity to constrain the meaning of words.
Cognitive Psychology, 20, 121–157.
Oliphant, M., & Batali, J. (1997). Learning and the emergence
of coordinated communication. Center for Research on
Language Newsletter, 11(1).
Origgi, G., & Sperber, D. (2000). Evolution, communication
and the proper function of language. In P. Carruthers & A.
Chamberlain (Eds.), Evolution and the human mind: Modularity, language and meta-cognition (pp. 140–169).
Cambridge, UK: Cambridge University Press.
Pinker, S. (1994). The language instinct. London: Penguin.
Premack, D. (1975). On the origins of language. In M. S. Gazzaniga & C. B. Blakemore (Eds.), Handbook of psychobiology (pp. 591–605). New York: Academic Press.
Quine, W. v. O. (1960). Word and object. Cambridge, MA:
MIT Press.
Saussure, F. d. (1916). Cours de linguistique générale. Paris:
Payot.
Siskind, J. M. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings.
Cognition, 61, 39–91.
Smith, A. D. M. (2003a). Evolving communication through the
inference of meaning. PhD thesis, Philosophy, Psychology
and Language Sciences, University of Edinburgh.
Smith, A. D. M. (2003b). Intelligent meaning creation in a
clumpy world helps communication. Artificial Life, 9(2),
175–190.
Smith, A. D. M. (2005). Mutual exclusivity: Communicative
success despite conceptual divergence. In M. Tallerman
(Ed.), Language origins: Perspectives on evolution (pp.
372–388). Oxford: Oxford University Press.
Smith, A. D. M., & Vogt, P. (2004). Lexicon acquisition in an
uncertain world. (Paper given at the 5th International Conference on the Evolution of Language, Leipzig.)
Smith, K., Brighton, H., & Kirby, S. (2003). Complex systems
in language evolution: The cultural emergence of compositional structure. Advances in Complex Systems, 6(4),
537–558.
Smith, L. B. (2001). How domain-general processes may create
domain-specific biases. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 101–131). Cambridge, UK: Cambridge University
Press.
Soja, N. N., Carey, S., & Spelke, E. S. (1991). Ontological categories guide young children’s inductions of word meanings: Object terms and substance terms. Cognition, 38,
179–211.
Steels, L. (1996). Perceptually grounded meaning creation. In
M. Tokoro (Ed.), Proceedings of the International Conference on Multi-agent Systems. Cambridge, MA: MIT
Press.
Steels, L., & Belpaeme, T. (in press). Coordinating perceptually grounded categories through language: A case study
for colour. Behavioral and Brain Sciences.
Steels, L., & Kaplan, F. (2002). Bootstrapping grounded word
semantics. In E. Briscoe (Ed.), Linguistic evolution through
language acquisition: Formal and computational models
(pp. 53–73). Cambridge, UK: Cambridge University Press.
Tomasello, M. (1999). The cultural origins of human cognition.
Harvard: Harvard University Press.
Tomasello, M., & Rakoczy, H. (2003). What makes human
cognition unique?: From individual to shared to collective
intentionality. Mind and Language, 18(2), 121–147.
Trask, R. L. (1996). Historical linguistics. London: Arnold.
Vogt, P. (2002). The physical symbol grounding problem. Cognitive Systems Research Journal, 3(3), 429–457.
Vogt, P. (2003). Grounded lexicon formation without explicit
reference transfer. In W. Banzhaf, T. Christaller, J. Ziegler, P. Dittrich, & J. T. Kim (Eds.), Advances in artificial
life: Proceedings of the 7th European Conference on Artificial Life (pp. 545–552). Heidelberg: Springer.
Vogt, P., & Smith, A. D. M. (in press). Learning color words is
slow: A cross-situational learning account. Behavioral and
Brain Sciences.
About the Author
Andrew Smith is a research fellow in the Language Evolution and Computation
Research Unit at the University of Edinburgh. He received a BA in languages and linguistic science from the University of York, an MSc in computing from the University of Bradford, and his PhD in linguistics from the University of Edinburgh. His current research
uses computational simulations to explore processes of language acquisition, change
and evolution.