University of California
Los Angeles
The effect of structure in population and
grammar space on language divergence
A dissertation submitted in partial satisfaction
of the requirements for the degree
Doctor of Philosophy in Biology
by
Yoosook Lee
2006
© Copyright by
Yoosook Lee
2006
The dissertation of Yoosook Lee is approved.
Richard Zimmer
Peter Narins
Edward P. Stabler
Charles E. Taylor, Committee Chair
University of California, Los Angeles
2006
To my parents
Table of Contents

1  Studies on language evolution
   1.1  Language organ
   1.2  Parallel between molecular evolution and language evolution
   1.3  Objectives
   1.4  Hypotheses
        1.4.1  Population Network Model
        1.4.2  Parameterized Grammar Model
        1.4.3  Grammar Network Model

2  The role of population structure in language evolution
   2.1  Introduction
   2.2  Methods
   2.3  Analytic Model
   2.4  Results
   2.5  Discussion
   2.6  Summary

3  The role of language structure in language evolution
   3.1  Introduction
   3.2  Methods
   3.3  Analytic Model
   3.4  Results
   3.5  Discussion

4  Evolutionary dynamics in grammar networks
   4.1  Introduction
   4.2  Evolutionary Dynamics of a Population of Language Learners
   4.3  Grammar Network (GN) Model
        4.3.1  The Complete Graph
        4.3.2  Regular Ring Lattice
        4.3.3  Random Network
        4.3.4  Small-World Network
   4.4  Parameter settings and Methods
   4.5  Result 1: three equilibrium states
   4.6  Result 2: the relationship between grammar network density and language convergence
   4.7  Result 3: rate of language convergence in grammar networks
   4.8  Discussion

5  Summary

References
List of Figures

1.1  Illustration of the human brain highlighting Broca's and Wernicke's areas, thought to be involved with language processing [sta97].
1.2  As an organism is determined in part by its genome, language can be viewed as a collection of linguistic properties which form a hierarchy of some kind.
1.3  A map of Martha's Vineyard (image from [Kro05]). The low and high frequency mnemonics of sonagrams point to what one might expect to hear from chickadees throughout the island.
2.1  Topologies: (A) Fully-connected, denoted FC. The number of connections for each individual, nc, is N − 1. (B) Linear, nc = 2. (C) A von Neumann lattice with r = 1, denoted VN, nc = 4. (D) Bridge, which has multiple fully-connected subpopulations and a fixed number of connections between subpopulations.
2.2  The dominant (◦) and average (×) grammar frequency at the last time step of a set of fully-connected runs, overlaid with symmetric (horizontal line) and asymmetric (curved line) analytic solutions for a = 0.5, n = 10, f0 = 0.
2.3  Time series of fully-connected single runs. The dashed line (--) is X+ for q = 0.9, the upper dot-dash line (·-) is X+ for q = 0.85, and the lower dot-dash line is X0. When q = 0.8, only X0 is stable.
2.4  Linguistic coherence. The solid line (–) for φ0, the dashed line (--) for φ1, and the dot-dashed line (·-) for global coherence φ∞.
3.1  The dominant (×) language frequency after 100,000 time steps, overlaid with symmetric (horizontal line) and asymmetric (curved line) solutions for a (or ā), n = 64, f0 = 0.001. Each point is an independent replica. dl is shown at the top left corner of each graph.
3.2  The number of languages (×) and the average effective number of languages (—).
4.1  Different types of graphs with 12 nodes: (a) the complete graph, (b) a ring lattice with k = 2, (c) a random graph with k = 1 and r = 12, (d) a small-world graph with k = 2 and r = 3. The examples shown here have different densities. The experiments, however, are designed in such a way that all graphs except the complete graph have the same density.
4.2  The dominant grammar frequency xmax vs. learning fidelity q for a ring lattice with n = 51: (a) k = 25 (a complete graph), (b) k = 15, (c) k = 10. The q interval is 10−4.
4.3  The effective number of grammars ne vs. q: (a) k = 25 (complete graph), (b) k = 15, (c) k = 10.
4.4  Frequency of all grammars, sorted by index number, of a single run at the steady state for a ring lattice with n = 51, k = 15, and a = 0.5: (a) q = 0, (b) q = 0.82, and (c) q = 0.83.
4.5  Distribution of grammar frequencies at the steady state with n = 51 and a = 0.5 over the range 0 ≤ q ≤ 1. For each q value, the grammar frequencies are sorted. (a) complete graph, (b) ring lattice with k = 15, (c) random network with k = 1 and r = 714, and (d) small world with k = 13 and r = 102.
4.6  Non-linear behavior of language convergence in response to network density. Various measures such as (a) grammar diversity ne, (b) dominant frequency xmax, and (c) minimum frequency xmin are plotted for varying k = [1, 100] given a ring lattice with n = 200, a = 0.5, and a learning fidelity q = 0.8.
4.7  Grammar diversity, dominant frequency, and minimum frequency in response to network density of a random graph with k = 1 and r = 714 at a learning fidelity of q = 0.6.
4.8  Time (or the number of replication cycles) taken to reach an equilibrium state for each topology: (a) complete graph, (b) regular graph with k = 15, (c) random network with k = 1 and r = 714, and (d) small-world network with k = 13 and r = 102. A numerical summary of this graph is presented in Table 4.3.
List of Tables

1.1  Comparison of genetic inheritance and language
2.1  Parameters. Note that the 10 connections of the bridge topology are randomly selected from each subpopulation.
3.1  Parameters used in the simulations.
3.2  System settings, average language similarity (ā), q1 and q2. When l = 1, we use a = 0.5. A: one component with 64 options, B: two components with 8 options, C: three components with 4 options, D: six components with 2 options. Each setup has exactly the same number of possible languages.
4.1  Parameters chosen for comparing the language convergence behavior of four grammar networks in response to learning fidelity q. All networks are set to the same density except the complete graph.
4.2  Parameters chosen for comparing language convergence behavior in response to graph density.
4.3  A brief summary of convergence time for various grammar networks. A population reaches the steady state quickest in the complete graph and slowest in a regular ring lattice. Convergence time is longer when the learning fidelity falls near transition points.
Acknowledgments
I thank Prof. Charles Taylor, Prof. Edward Stabler, Prof. Peter Narins and
Prof. Richard Zimmer for their enthusiasm and support. Special thanks to Travis
Collier for his help in many ways, including writing the modeling programs; Gregory Kobele
for his insightful comments and discussions; Alexander Kirschel, Yuan Yao and
Prof. Martin Cody for sharing their amazing knowledge and stories of birds.
Vita
1979        Born, Jeju, Korea.

1999-2000   Research participant, Structural Biology Laboratory, Pohang University of Science and Technology, Korea. Worked on X-ray crystallography for identifying the 3-D structure of various proteins.

1997-2000   B.S. Life Science, Pohang University of Science and Technology.

2002-2004   Teaching Assistant, Department of Ecology and Evolutionary Biology, UCLA. Organized and taught OBEE 199 (the substitute for the Computational Biology course). Also led Population Genetics lab sections.

2003        Teacher, Malaria Research and Training Center, Mali. Taught an HTML course to aid self-promotion and information sharing via the Internet among malaria researchers.

2001-2006   Graduate Student Researcher, Department of Ecology and Evolutionary Biology, UCLA. Since 2004, design and management of the Anopheles gambiae Population Genomics Database.
Publications
1. Stabler, E.P., Collier, T.C., Kobele, G.M., Lee, Y., Lin, Y., Riggle, J., Yao, Y., and Taylor, C.E. (2003). The learning and emergence of mildly context sensitive languages. European Conference on Artificial Life.
2. Kobele, G.M., Riggle, J., Collier, T.C., Lee, Y., Lin, Y., Yao, Y., Taylor, C.E.,
and Stabler, E.P. (2003). Grounding as Learning. Proceedings of the Workshop
on Language Evolution.
3. Lee, Y., Riggle, J., Collier, T.C., Stabler, E.P., and Taylor, C.E. (2004).
Adaptive Communication among Collaborative Agents: Preliminary Results with
Symbol Grounding. Artificial Life and Robotics 8 : 127–132 (Also published in
the Proceedings of 8th International Symposium on Artificial Life and Robotics).
4. Manoukis, N.C., Lee, Y., Vallejo, E., and Taylor, C.E. (2004). Detecting
recurrent extinction in a meta-population of Anopheles gambiae: preliminary
results using simulation. Proceedings of the 6th WSEAS International Conference on Algorithms, Scientific Computing, Modeling and Simulation.
5. Lee, Y., Collier, T.C., Stabler, E.P., and Taylor, C.E. (2005). The role of
population structure in language evolution. Proceedings of the 10th International
Symposium on Artificial Life and Robotics.
6. Lee, Y., Collier, T.C., Kobele, G.M., Stabler, E.P., and Taylor, C.E. (2005).
Grammar Structure and the Dynamics of Language Evolution. European Conference on Artificial Life.
7. Slotman, M.A., Tripet, F., Cornel, A.J., Meneses, C.R., Lee, Y., Reimer,
L.J., Thienmann, T.C., Fondjo, E., Fofana, A., Traoré, S.F., and Lanzaro, G.C.
(2006). Evidence for subdivision within the M molecular form of Anopheles gambiae. Molecular Ecology [in press].
Abstract of the Dissertation
The effect of structure in population and
grammar space on language divergence
by
Yoosook Lee
Doctor of Philosophy in Biology
University of California, Los Angeles, 2006
Professor Charles E. Taylor, Chair
This dissertation explores parallels between the selection and evolution of Mendelian populations and that of language systems. The population dynamics of language learners are addressed as an instantiation of the general case of agents learning from one another. I found that population structure, language structure, and grammar network structure affect the evolutionary dynamics and influence
the degree of linguistic coherence. Further, under some types of grammar networks, populations exhibit “weak cohesion” where the distribution of grammar
frequencies forms a roughly Gaussian shape, similar to a quasi-species in models
of molecular evolution. Weak cohesion is also characterized by a large amount
of standing variation within the population which implies continued evolutionary potential. Positive frequency-dependent selection (FDS) has been of limited
interest due to its obvious dynamics leading to monomorphism. However, investigations in the field of language evolution reveal that a population may exhibit
standing variation even under positive FDS if population or grammar space is structured. Recently, researchers have been considering the applicability of adaptive language to adaptive sensor networks in heterogeneous environments. An evolutionary strategy where nodes adopt successful schemes from their neighbors
with a fitness bonus for agreement is a general option with great promise. Such
a system maps directly onto the systems I present.
The organization of this thesis is as follows: Chapter 1 provides the background of related research and the objectives of my thesis. The effect of population structure on the evolutionary dynamics of language is presented in Chapter 2, which is published in Artificial Life and Robotics 8. Chapter 3 deals with language structure and its impact on language evolution; this material is published in the Proceedings of the VIIIth European Conference on Artificial Life. Chapter 4 investigates the effects of grammar network structure on the evolutionary dynamics
of language. Parts of this chapter have been submitted to Adaptive Behavior.
CHAPTER 1
Studies on language evolution
“Language is a whore, a mistress, a wife, a pen-friend, a check-out
girl, a complimentary moist lemon-scented cleansing square or handy
freshen-up wipette. Language is the breath of God, the dew on a
fresh apple, it’s the soft rain of dust that falls into a shaft of morning
sun when you pull from an old bookshelf a forgotten volume of erotic
memoirs; language is the faint scent of urine on a pair of boxer shorts,
it’s a half-remembered childhood birthday party, a creak on the stair,
a spluttering match held to a frosted pane, the warm wet, trusting
touch of a leaking nappy, the hulk of a charred Panzer, the underside
of a granite boulder, the first downy growth on the upper lip of a
Mediterranean girl, cobwebs long since overrun by an old Wellington
boot.”

— Stephen Fry, “A Bit of Fry and Laurie”

1.1 Language organ
The human ability to think and communicate in expressive symbolic languages
distinguishes us from any other species. Language enables a new mode of evolution by allowing much broader and more flexible storage and transmission of
information. Thus the view that the emergence of language is one of the major transitions in evolutionary history has been shared by many [SH95, SS99, JL05, Smi61, Cho80, PB90, Dea97].

Figure 1.1: Illustration of the human brain highlighting Broca's and Wernicke's areas, thought to be involved with language processing [sta97].
In the 1950s, Chomsky put forth the idea that the language acquisition process resembles a machine that selects an appropriate subset of rules to generate sentences consistent with the input, opposing the behaviorists who viewed language as a behavior acquired through experience. Chomsky questioned the behaviorists' point of view by pointing out that children learn the correct grammars of their native language without being exposed to sufficient stimuli. His poverty of the stimulus argument led to the conclusion that children possess an innate “faculty of language” that enables them to master and use natural language [Cho59, Cho65, Cho80, Cho02]. Given the computational complexity of natural language grammars and the poverty of stimuli in language learning, Chomsky argues that there must be a common set of constraints guiding language acquisition.
Since then, much progress has been made not just in linguistics but also in archaeology, neuroscience and genetics, enriching our understanding of
the origin of language. Many archaeologists and anthropologists suspect that language may help human survival directly by facilitating organized hunting which is
distinguished from opportunistic hunting [WL68, Lau68, Aie98, HdM01, dHL03].
Others have suggested that language evolved less for its ability to communicate information to others than for its ability to manipulate others
[Daw89]. Hauser, Chomsky and Fitch state more recently that certain perceptual
and articulatory abilities may have been selected for, although how the most fundamental aspects of human language emerged remains unclear [HCF02, FHC05].
From various neurological studies of language disorders, researchers have found that the language organ, enabling communication using language, is not only highly localized but also compartmentalized. Two well-known cases are Wernicke's and Broca's aphasia (see Fig. 1.1). Furthermore, a gene called FOXP2 was identified recently by Vargha-Khadem et al. from studies of language impairment among KE family members [VWA95, VWP98, LFH01, SCJ05].
Overwhelming evidence indicating that the language acquisition device (LAD) is innate and hard-wired in the genome has led many researchers to place language evolution in the realm of biology [SH95, SS99, Dea97, PB90, Pin94].
1.2 Parallel between molecular evolution and language evolution
As an organism is determined in part by its genome, language is determined
in part by a lexicon of generators which determine its phonology, semantics,
morphology and syntax; these properties may evolve (Fig. 1.2) [Jos00, LM91].
As genetic information is transmitted from a parent to an offspring, linguistic
information is transmitted from a teacher to a learner. Table 1.1 summarizes
some of the analogies between genetic inheritance and language.
Figure 1.2: As an organism is determined in part by its genome, language can be viewed as a collection of linguistic properties which form a hierarchy of some kind.
Information      Species          Language
Transmission     Reproduction     Learning
Error            Mutation         Incomplete or incorrect learning
Storage          Genes            LAD, syntactic and lexical rules

Table 1.1: Comparison of genetic inheritance and language
As the genome is reproduced with variation by the process of mutation, language is reproduced with variation introduced by incorrect or incomplete learning. Although a language may not be preserved in its original form [Bic81, KSC99, SC01, Pin94], it provides a bedrock upon which new languages can emerge. Just as humans have dialects, one can find similar variation among songbirds such as chickadees, which learn their songs from their parents and/or their neighbors [Kro05, SR93, KBH99].
One core question of speciation is how gene flow is blocked between one species and another. Physical factors, such as mountains and oceans, are examples of extrinsic barriers blocking gene flow. Genetic traits such as mate preference, courtship, and genetic incompatibility may all contribute to intrinsic barriers.
Information flow is analogous to gene flow. Both extrinsic and intrinsic barriers to language flow may exist.

Figure 1.3: A map of Martha's Vineyard (image from [Kro05]). The low and high frequency mnemonics of sonagrams point to what one might expect to hear from chickadees throughout the island.

Geographic variations in signals have been
found among songbirds such as the black-capped chickadee as shown in Fig. 1.3
[KBH99]. Geographic separation often correlates with the distribution of languages or dialects [Cav97, CMP94]. Evidence of a new sign language among Nicaraguan children [KSC99, SC01] and of Black English Vernacular, shared mostly among the black speech community [Lab72], also shows that geographic separation alone cannot explain language diversity.
All in all, these similarities between biological evolution and language change
have led many researchers to adopt evolutionary theory for studying language evolution. Some researchers, such as Komarova and Nowak, adopt a quasi-species model based on Eigen and Schuster's molecular evolution work [ES79, EMS89] for describing the dynamics of language evolution [NKN01, KNN01, NKN02].
With the notion of reproduction, mutation, fitness, and information flow in
language change, I will endeavor to build upon their studies and to explore further
parallels between the evolutionary dynamics of language and biological species.
1.3 Objectives
The question of signal convergence and divergence is of interest to linguistics
[Kir98, Smi02, Jm03, NKD99, GHK00], to engineering communicating networks
[CT04, LF99, SK98, AT96] as well as to biology [NSH01, Sut03, SS02b, SS02a,
SSP03, LS04].
For linguists, questions include: “What causes languages to
change [Fox95, HJ96, McM94], and why do humans have so many different
languages?[Rue94, CMP94, Cav97]”. From an engineering point of view, how
to achieve convergence to a single language in a distributed adaptive system
[Ste01, YS93] is an important issue. For biologists, especially in the field of ornithology, how songbird signals diverge is of interest, tied to the question of
speciation.
The evolutionary dynamics of language takes on special importance for robotics
and artificial life because it provides a superb platform for studying the emergence of united behavior from distributed, separate agents. For the last decade,
great progress has been made in the field of robotics to investigate the social
and/or cognitive factors contributing to the emergence of coherent communication
among autonomous agents [AT96, Ste01, MCN03, LRC04, CT04, MN06, Ste06].
How to achieve and maintain mutual intelligibility in a distributed adaptive
system is an important issue, as in adversarial conditions, where it is advantageous to maintain high coherency among “friendlies” with minimal understanding from the adversary [YS93, LRC04, CAA06]. This is especially important in
the context of heterogeneous sensors or robots collectively learning about their
environment, where different sensing modalities make adaptive symbolic communication vital. More generally, the dynamics of language evolution provides
insight into convergence to a common understanding where distributed learning
is a goal. At a theoretical level, these issues are fundamentally similar.
Although there has been debate about the degree to which the underlying representation of languages is inherited or learned and how language impacts fitness,
many researchers agree that linguistically relevant properties are to some extent
learned through cultural transmission and change through time [Cho65, Cho80,
PB90, Pin94, HCF02, FHC05, SKB03]. How this might occur has been the subject of many analytic and simulation studies [HI95, HI96, Ste96, NKN01, Kir01,
SBK03]. Mechanisms such as positive frequency-dependent selection and iterated learning are leading explanations for how a coherent language
emerges [HI95, HI96, NKN01, Kir01].
The most influential studies that have applied molecular evolution dynamics to language evolution have been based on the convergence dynamics of a
population learning languages from a set where each language is equally related
to every other language. This is, of course, a great oversimplification that may
have important consequences for subsequent analogies. Moreover, these studies
have focused mostly on a fully-connected population where all individuals have
an equal probability of learning from each other and the fitness contribution of
language is evaluated using the speaker frequency of a language among the entire
population [KNN01, NKN01].
In such studies, two extreme convergence states are identified: no-convergence
state (“Tower of Babel”), also known as the symmetric state, where all languages
are represented in roughly equal frequencies, and global convergence (“Lingua
Franca”), where a single language and its close variants predominate. Intermediate convergence states are typically neither identified nor attained.
The current diversity of natural language, however, is distant from the two
extreme states produced by these models. While the exact definition of what
distinguishes two different languages from two different dialects is difficult to
enumerate, there is general agreement that, worldwide, over 5,000 languages
are spoken today, and about 1,400 languages are spoken in Africa. There are
studies that report geographic variation of acoustic signals in animal communication [KBH99, TSH93]. Songbirds such as the black-capped chickadees are
of particular interest since they learn their songs from their parents or neighbors. The consequences for sexual selection on gene flow have contributed to
sustained interactions in such species. These observations suggest the possibility
of a multi-language state, where the average individual belongs to a neighborhood
predominated by a single language, but no single language dominates across the
entire population. One ultimate goal of my research is to find conditions under
which such multi-language states are possible or not.
Adopting the language evolution model proposed by Nowak et al., referred
to as the NKN model, and relaxing some of the assumptions they have made,
we have explored the evolutionary dynamics of language. The results show that
population network, language structure, and grammar network structure influence the language evolutionary dynamics [LCS05a, LCK05, LCT06].
1.4 Hypotheses

1.4.1 Population Network Model
Cavalli-Sforza et al. have studied the correlation between genes and languages and found a strong relationship between genetic history and language class [Cav97]. Allopatry not only reduces genetic mixing but also the communicative connectivity between the divided populations. Such physical separations are likely responsible for both genetic divergence and linguistic diversification. Cavalli-Sforza et al. assert that the connectivity between groups of people affects the divergence of languages [CMM92, Cav97].
In the NKN model, which uses a fully connected population network, we
expect two types of equilibrium states: the symmetric state and asymmetric
states. In the symmetric state, there is no dominant language and the proportion
of all languages is equal. In an asymmetric state, a large proportion of the
population speaks one dominant language. If the mutation rate is sufficiently
low, a population will converge to an asymmetric state.
Individual-based models have proven helpful as an experimental tool for studying language evolution in complicated situations such as social networks [Ste01,
Ste06, MCN03, MN06, AT96, CT04, LRC04, LCS05a]. In this study I have explored four types of population networks.
• Fully-connected network
This is a situation where everyone is a neighbor of everyone else: an idealized situation corresponding to the panmictic assumption, which most theorists, including Komarova et al., have dealt with.
• Random network
A situation where the number of connections is limited but whom each individual is connected to is arbitrary. Each individual only affects a limited
fraction of the population.
• Von Neumann network
A situation where the number of connections is limited and connected individuals are determined by their position. In a 4-way lattice structure, the 4 individuals to the North, South, West, and East are the effective neighbors.
• Bridged network
A situation where a population is divided into multiple groups with limited connections between groups. Each group is a fully-connected network.
Instead of determining the fitness of each individual by calculating the probability of successful communication with every other individual in a population,
only individuals connected through a communication channel contribute to the
calculation of fitness; that is to say, individuals who cannot communicate with others will have a fitness of zero.
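To make these four structures concrete, the sketch below builds each as a plain adjacency list. It is a minimal illustration only: the exact wiring used in the simulations (population sizes, how random and bridge links are drawn) is given in Chapter 2 and Table 2.1, so the construction details and function names here are assumptions.

    import random

    def fully_connected(n):
        """Everyone is a neighbor of everyone else."""
        return {i: [j for j in range(n) if j != i] for i in range(n)}

    def random_network(n, nc):
        """Each individual is wired to (roughly) nc arbitrary others, undirected."""
        neighbors = {i: set() for i in range(n)}
        for i in range(n):
            while len(neighbors[i]) < nc:
                j = random.randrange(n)
                if j != i:
                    neighbors[i].add(j)
                    neighbors[j].add(i)
        return {i: sorted(v) for i, v in neighbors.items()}

    def von_neumann(side):
        """Toroidal lattice; the effective neighbors are North, South, West and East."""
        def idx(r, c):
            return (r % side) * side + (c % side)
        return {idx(r, c): [idx(r - 1, c), idx(r + 1, c), idx(r, c - 1), idx(r, c + 1)]
                for r in range(side) for c in range(side)}

    def bridged(subpop_size, n_subpops, n_links):
        """Fully-connected subpopulations joined by a few random links."""
        n = subpop_size * n_subpops
        neighbors = {i: set() for i in range(n)}
        for s in range(n_subpops):
            members = range(s * subpop_size, (s + 1) * subpop_size)
            for i in members:
                neighbors[i].update(j for j in members if j != i)
        for s in range(n_subpops - 1):              # links between adjacent groups
            for _ in range(n_links):
                i = random.randrange(s * subpop_size, (s + 1) * subpop_size)
                j = random.randrange((s + 1) * subpop_size, (s + 2) * subpop_size)
                neighbors[i].add(j)
                neighbors[j].add(i)
        return {i: sorted(v) for i, v in neighbors.items()}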
In chapter 2, simulations of the population network model are described and
discussed. Among our conclusions is that local convergence has important implications for developing systems such as sensor networks where adaptive communication between agents in a heterogeneous environment is desirable. This work has been published in Artificial Life and Robotics [LCS05a].
1.4.2 Parameterized Grammar Model
In the NKN model each grammar, labeled Gi, has uniform similarity to every other grammar. This similarity setting allows a grammar to mutate into any other grammar in a single step. In this model, language convergence can occur only when the learning fidelity exceeds a critical threshold [KNN01, NKN01].
Natural language, however, is structured in such a way that some pairs of languages are more closely related than others. For example, English and French are more similar to each other than either is to Swahili. This is potentially important for language convergence. To model this, I introduced a regularity in the language space
by viewing the locus of language transmission as a sequence of learned parameters like AAAABABCBCBB. This representation is in the spirit of Chomsky’s
principles and parameters approach to language.
In the “Principles and Parameters” approach, principles are common and
invariant features that all natural languages share. Parameters are specific points
of variation between languages. In this framework, principles and parameters are part of an innate universal grammar which all humans possess. Principles and parameters themselves need not be learned; exposure to language triggers each parameter to adjust its setting to be consistent with the input stimuli [CL93, Cho95].
Parameters, as I use them here, correspond to whatever the relevant differences are between languages at the level of description relevant to transmission. This will correspond to parameters in the Chomskian sense just in case these latter parameters are appropriately relevant to linguistic transmission.
With the setup of parameterized grammars, similarity between any two individuals is determined by how many rules or characteristics of language they
agree on. Similarities between languages, therefore, are not assumed to be uniform as in Komarova et al. Naturally some grammars are more similar than others; therefore, the probability of transition is higher between closely related grammars. Mutation creates variants of a teacher's language.
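As a minimal sketch of this idea, the snippet below stores a language as a string of parameter values and mis-sets a single parameter on a learning error, so that transmission errors yield nearby variants rather than arbitrary grammars. The toy alphabet and the one-parameter-per-error rule are illustrative assumptions, not the exact mutation scheme used in Chapter 3.

    import random

    VALUES = "ABC"          # a toy alphabet: d = 3 possible values per parameter

    def mutate(language):
        """Imperfect transmission: mis-set one randomly chosen parameter, so that
        errors produce nearby variants rather than arbitrary grammars."""
        pos = random.randrange(len(language))
        new_value = random.choice([v for v in VALUES if v != language[pos]])
        return language[:pos] + new_value + language[pos + 1:]

    print(mutate("AAAABABCBCBB"))   # e.g. AAAABABCBCBC: differs in a single parameter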
The results showed that language structure led to substantial levels of linguistic coherence even in learning fidelity regions where a single dominant language cannot emerge if there is no language structure. The qualitatively different dynamics with respect to the critical learning fidelity suggest that convergence to
a set of closely related languages is significantly easier than previously thought.
This chapter has been published in the Proceedings of the 8th European Conference on Artificial Life [LCK05].
1.4.3 Grammar Network Model
Most studies on language evolution have been based on the convergence dynamics of a population learning languages from a set where each language is equally
related to every other language. However, natural languages are organized hierarchically, with language families, subgroups, and even dialects. The similarity
of this language hierarchy with species trees is one of the inspirations for an
evolutionary approach in the first place.
We define a grammar network to be a graph describing the similarity between
grammars. Preliminary results have shown that different grammar networks alone
can qualitatively change the dynamics of language convergence [Olf05, OM04,
LCK05]. Independently, Matsen and Nowak recently have explored language
convergence conditions on a “nearly-regular” language network [MN04].
In chapter 4, I explore the population dynamics of grammar acquisition using
four structurally distinct grammar networks. I assume that the grammar network
determines the similarity of grammars from each other in mutational space as
well as their contribution to the individual’s fitness. Two equilibrium states
have previously been characterized in terms of language convergence: one is the
symmetric state where all grammars exist in equal frequencies, and the other
is a strong cohesion state where the symmetry breaks down and one grammar
predominates in a population [NKN01, KNN01].
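One way to picture a grammar network is as an explicit edge set over grammar indices. The sketch below builds a ring lattice (each grammar linked to its k nearest neighbors) and a small-world variant (the same lattice plus r random shortcut edges), using the k = 13, r = 102 setting that appears in the Chapter 4 figures. Whether Chapter 4 adds or rewires edges, and how similarity values are assigned to linked grammars, are assumptions here.

    import random

    def ring_lattice(n, k):
        """Each grammar is linked to its k nearest neighbors on either side (k < n/2)."""
        edges = set()
        for i in range(n):
            for step in range(1, k + 1):
                edges.add(frozenset({i, (i + step) % n}))
        return edges

    def small_world(n, k, r):
        """Ring lattice plus r random shortcut edges between grammars."""
        edges = ring_lattice(n, k)
        while len(edges) < n * k + r:
            i, j = random.sample(range(n), 2)
            edges.add(frozenset({i, j}))
        return edges

    def neighbor_lists(edges, n):
        """Adjacency view; a similarity a_ij could then be assigned to linked pairs."""
        nbrs = {i: set() for i in range(n)}
        for e in edges:
            i, j = tuple(e)
            nbrs[i].add(j)
            nbrs[j].add(i)
        return nbrs

    grammar_net = neighbor_lists(small_world(n=51, k=13, r=102), n=51)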
Our results identify another state we label weak cohesion, where a group of
structurally similar grammars predominates in the population. The high degree
of standing variation characteristic of the weak cohesion state is more suggestive of real-world observations than the other two equilibria previously described. The relative convergence time indicates that a population reaches an equilibrium state more slowly when the learning fidelity lies within a transition region, regardless of the topology of the grammar network. The graph density threshold required to achieve significant linguistic coherence adds another dimension to consider for a coherent language to emerge. Parts of this chapter have been submitted to Adaptive Behavior [LCT06].
CHAPTER 2
The role of population structure in language
evolution
Abstract
The question of language evolution is of interest to linguistics, biology and, recently, the engineering of communicating networks. Previous work on these problems
has focused mostly on a fully-connected population. We are extending this study
to structured populations, which are generally more realistic and offer rich opportunities for linguistic diversification. Our work focuses on the convergence
properties of a spatially structured population of learners acquiring a language
from one another. We investigate several metrics, including mean language coherence and the critical learning fidelity threshold. This work is published in
the Proceedings of the 8th International Symposium on Artificial Life and Robotics and reprinted in the journal Artificial Life and Robotics.
2.1 Introduction
The question of linguistic divergence is of interest to linguistics [Kir98, Smi02, Jm03, NKD99, GHK00], biology [NSH01, Sut03], and engineering communicating networks [CT04, LF99, SK98, AT96]. For linguists the question is: “What causes languages to change [Fox95, HJ96, McM94], and why do humans have so
many different languages?[Rue94, CMP94, Cav97]”. From an engineering point
of view, how to achieve convergence to a single language in a distributed adaptive
system[Ste01, YS93] is an important issue, as in adversarial conditions, where we
would like to maintain high coherency among “friendlies” with minimal understanding from the adversary.
More generally, the dynamics of language evolution provides insight into convergence to a common understanding where distributed learning is a goal. At
a theoretical level, these issues are fundamentally similar. The evolution of
language takes on special importance for robotics and artificial life because it
provides a superb platform for studying the emergence of united behavior from
distributed, separate agents.
Previous work on these problems has focused mostly on a fully-connected
population where all individuals have an equal probability of learning from each
other and the fitness contribution of language is evaluated using the frequency
among the entire population[KNN01, NKN01]. We are extending this study to
structured populations, which are generally more realistic and offer rich opportunities for diversification. Our work focuses on the convergence properties of
a population of learners acquiring a language from one another under different
connectivity conditions, called topologies. This approach is motivated in part by studies indicating that whom a person learns language from can
heavily influence one’s language [Pin95, SC01, KSC99, Ste03].
Breaking the symmetry that a fully-connected population provides makes finding analytical solutions much more difficult, though perhaps not impossible. Therefore, we are using simulations to explore the convergence properties of a variety of distinct topologies. We compare the topologies on several metrics, including mean language coherence and the critical error threshold. Our results show
that topology has a large effect on overall convergence and can create stable
multi-language solutions.
The multi-language solutions are a third distinct phase of local convergence
between no-convergence (“Tower of Babel”) on the one hand, where all languages
are represented in roughly equal frequencies, and global convergence (“Lingua
Franca”), where a single language and its close variants predominate. In a multi-language solution, the average individual belongs to a neighborhood predominated by a single language, but no single language dominates across the entire
population.
In our paper these simulations are described and discussed. Among our conclusions is that local convergence has important implications for developing systems such as sensor networks where adaptive communication between agents in
a heterogeneous environment is desirable.
2.2 Methods
Our system is constructed with each individual possessing a parameterized grammar, in the principles and parameters tradition, which can be encoded as a sequence of symbols. However, for the baseline tests we report here, each grammar
consists of a single symbol and all grammars have the same expressive power and
equal distance from each other. This is a necessary simplification to make our
results comparable to the analytic results from Komarova et al. [KNN01].
Each individual exists within a topology defining a set of neighboring individuals. We explore four different topologies: fully-connected (FC), linear, von
Neumann lattice (VN), and bridge, illustrated in Figure 2.1.
Figure 2.1: Topologies: (A) Fully-connected, denoted FC. The number of connections for each individual, nc, is N − 1. (B) Linear, nc = 2. (C) A von Neumann lattice with r = 1, denoted VN, nc = 4. (D) Bridge, which has multiple fully-connected subpopulations and a fixed number of connections between subpopulations.

The fitness of an individual has two parts: the base fitness, denoted as f0,
and a linguistic merit proportional to the probability that the individual could
successfully communicate with its neighbors. In the simplified system, linguistic
merit is proportional to the number of neighbors which share the same grammar.
In the fully-connected topology, each individual of a given grammar will have the
same fitness, but this does not hold for other topologies.
Specifically, the fitness of individual i, fi , is f0 plus the sum over each neighbor
j of the similarity between i’s grammar and j’s grammar.
\[ f_i = f_0 + \frac{1}{2}\sum_{j=1}^{n_c} (a_{ij} + a_{ji}) \tag{2.1} \]
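Eq. (2.1) translates directly into code: an individual's fitness is the base fitness plus the averaged pairwise similarity with each of its neighbors. The data structures below (a dict of neighbor lists and a similarity function) are assumptions for illustration.

    def fitness(i, neighbors, similarity, f0=0.0):
        """Eq. (2.1): base fitness plus averaged pairwise similarity with each neighbor.
        In the baseline runs, similarity(i, j) is 1 when i and j hold the same grammar
        and a = 0.5 otherwise."""
        return f0 + sum(0.5 * (similarity(i, j) + similarity(j, i))
                        for j in neighbors[i])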
Each time step, an individual is chosen proportional to its fitness to reproduce.
Reproduction can be thought of as the chosen individual producing an offspring
which inherits the parent’s grammar and replaces one of the parent’s neighbors.
The offspring learns the parent’s grammar with a certain learning fidelity, q. This
learning fidelity is properly a function of the specifics of the learning method the
child uses and the complexity of the grammar, but in the simplified system the
learning fidelity is reducible to a transition probability function between grammar
Gi and grammar Gj equal to q for i = j, and (1 − q)/(n − 1) for i ≠ j.
The algorithm of our program is as follows:

    for each individual i in a population P
        set a random language Li of i
    end for
    for each individual i ∈ P
        compute fitness fi of i
    end for
    do until the number of updates is met
        select an individual k ∈ P with probability proportional to fk
        select a random neighbor j of individual k
        replace the neighbor j with an offspring of individual k
        the offspring becomes individual j
        if the offspring is a mutant (mutation rate = µ)
            assign a random language to Lj
        else
            Lj = Lk
        end if
        update the fitness of individual j
    end do
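One possible reading of this update loop as runnable code is sketched below: a teacher is drawn with probability proportional to fitness, one of its neighbors is replaced by an offspring, and the offspring copies the teacher's grammar with fidelity q or otherwise acquires one of the remaining n − 1 grammars uniformly, following the transition probabilities given above. This is an illustrative reconstruction, not the program actually used for the experiments.

    import random

    def step(grammars, neighbors, a=0.5, f0=0.0, q=0.9, n_grammars=10):
        """One update: pick a teacher in proportion to fitness, then replace a random
        neighbor with an offspring that learns the teacher's grammar with fidelity q."""
        def fitness(i):
            # Eq. (2.1) with uniform off-diagonal similarity a
            return f0 + sum(1.0 if grammars[i] == grammars[j] else a
                            for j in neighbors[i])

        individuals = list(neighbors)
        weights = [fitness(i) for i in individuals]
        teacher = random.choices(individuals, weights=weights, k=1)[0]
        learner = random.choice(neighbors[teacher])
        if random.random() < q:
            grammars[learner] = grammars[teacher]            # faithful learning
        else:                                                # learning error
            others = [g for g in range(n_grammars) if g != grammars[teacher]]
            grammars[learner] = random.choice(others)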
One important metric is the dominant grammar frequency. We measure this
directly each time step by counting the abundance of each grammar. Which
grammar is the dominant one may change each time it is measured; in other
words, the dominant grammar is whichever grammar happens to be at the highest
frequency at the time.
The linguistic coherence, denoted as φ, is measured using the following equation:

\[ \phi = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}\sum_{j=1}^{n_c} (a_{ij} + a_{ji}) \tag{2.2} \]
Various “levels” of coherence exist, defined by the set of individuals over which the second summation is taken. Local coherence, φ0, sums only over the neighbors of each individual and is proportional to mean fitness (equal if f0 = 0). φ1 is the coherence measured over the set of neighbors' neighbors, and generally, φi is measured using the set of (neighbors')i neighbors. Global coherence, φ∞, corresponds to summation over the entire population. In the fully-connected topology, all of these coherence levels reduce to the same value.
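A sketch of how these coherence levels can be computed, assuming a dict of neighbor lists and a pairwise similarity function; the exact way the level-i neighborhood is expanded (and whether the individual itself is excluded) is one reading of the definition above.

    def coherence(level, neighbors, similarity):
        """phi_level as in Eq. (2.2): population average of the summed pairwise
        similarity with every individual in the level-th neighborhood."""
        total = 0.0
        for i in neighbors:
            reached = set(neighbors[i])            # level 0: direct neighbors
            for _ in range(level):                 # expand to neighbors of neighbors, etc.
                reached |= {k for j in reached for k in neighbors[j]}
            reached.discard(i)
            total += sum(0.5 * (similarity(i, j) + similarity(j, i)) for j in reached)
        return total / len(neighbors)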
Parameter         FC       Linear    VN         Bridge
a                 0.5      0.5       0.5        0.5
f0                0        0         0          0
n (# grammars)    10       10        10         10
N (pop size)      500      500       484        500
# subpops         –        –         –          2
subpop size       –        –         –          250
# connections     –        –         –          10
# time steps      10^5     10^6      5 × 10^5   10^5

Table 2.1: Parameters. Note that the 10 connections of the bridge topology are randomly selected from each subpopulation.
For the experiments, we used a population size N of 500, except for the von
Neumann lattice which was a 22×22 torus giving a population size of 484. The
similarity between languages, a, was set at 0.5, the base fitness f0 was 0, and
the number of different possible grammars n was 10. All relevant parameters are
summarized in Table 2.1.
The experiments, or runs, are done for a set number of time steps that varies
with topology. The goal is to make each run long enough that the system will
very probably reach an equilibrium. A set of 5 replica runs, varying only the
random number generator seed, was done at each q value between 0.65 and 1 at
0.01 intervals.
2.3 Analytic Model
For the fully-connected topology, given a uniform similarity a between n different grammars and a learning fidelity q, three equilibrium solutions for the grammar frequency were derived by Komarova et al. [KNN01]:
\[ X_0 = 1/n \tag{2.3} \]

\[ X_\pm = \frac{(a-1)\,(1 + (n-2)q) \mp \sqrt{D}}{2(a-1)(n-1)} \tag{2.4} \]

where \( D = 4[1 + a(n-2) + f_0(n-1)](1-q)(n-1)(a-1) + (1-a)^2[1 + (n-2)q]^2 \).
Below a specific learning fidelity q1 , D is negative and there is no real solution for X± . Therefore, for q < q1 , only the symmetric solution X0 exists and
no grammar dominates. Solving for q when D = 0 determines the critical learning fidelity threshold q1, which corresponds to the error threshold in molecular
evolution.
\[ q_1 = \frac{4 - 2f_0(n-1)^2 - 3n - a(2n^2 - 7n + 6) + 2(n-1)^{3/2}\sqrt{(1+f_0)\,[1 + a(n-2) + f_0(n-1)]}}{(1-a)(n-2)^2} \tag{2.5} \]
Figure 2.2: The dominant (◦) and average (×) grammar frequency at the last time step of a set of fully-connected runs, overlaid with symmetric (horizontal line) and asymmetric (curved line) analytic solutions for a = 0.5, n = 10, f0 = 0. The critical values are q1 = 0.836 and q2 = 0.925.

When q1 < q < q2, for a specific value q2, both the symmetric solution X0 and the asymmetric solutions X± exist and are stable. For q > q2, however, only the asymmetric
solution where one grammar dominates the population is stable. This q2 value is
the point where X0 = X− , giving:
\[ q_2 = \frac{n^2(f_0 + a) + (n + 1)(1 - a)}{n^2(f_0 + a) + 2n(1 - a)} \tag{2.6} \]
Komarova et al. provide much more detail and proofs [KNN01]. We plot these
solutions and compare them to experimental results in Figure 2.2.
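Plugging the baseline parameters a = 0.5, n = 10, f0 = 0 into Eqs. (2.4)–(2.6) reproduces the thresholds marked in Figure 2.2; the short check below is a sketch of that calculation.

    from math import sqrt

    a, n, f0, q = 0.5, 10, 0.0, 0.9

    # Eq. (2.4): the asymmetric solutions are real only when D >= 0
    D = (4 * (1 + a * (n - 2) + f0 * (n - 1)) * (1 - q) * (n - 1) * (a - 1)
         + (1 - a) ** 2 * (1 + (n - 2) * q) ** 2)
    X_plus = ((a - 1) * (1 + (n - 2) * q) - sqrt(D)) / (2 * (a - 1) * (n - 1))

    # Eq. (2.5): critical learning fidelity q1 (onset of the asymmetric solutions)
    q1 = (4 - 2 * f0 * (n - 1) ** 2 - 3 * n - a * (2 * n ** 2 - 7 * n + 6)
          + 2 * (n - 1) ** 1.5 * sqrt((1 + f0) * (1 + a * (n - 2) + f0 * (n - 1)))) \
         / ((1 - a) * (n - 2) ** 2)

    # Eq. (2.6): upper end of the bi-stability region
    q2 = (n ** 2 * (f0 + a) + (n + 1) * (1 - a)) / (n ** 2 * (f0 + a) + 2 * n * (1 - a))

    print(round(q1, 3), round(q2, 3))   # 0.836 0.925, as marked in Figure 2.2
    print(round(X_plus, 3))             # dominant-grammar frequency X+ at q = 0.9 (about 0.77)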
2.4 Results
Figure 2.3: Time series of fully-connected single runs. The dashed line (--) is X+ for q = 0.9, the upper dot-dash line (·-) is X+ for q = 0.85, and the lower dot-dash line is X0. When q = 0.8, only X0 is stable.

The empirical results for the fully-connected topology closely match the expectation from the analytic results arrived at by Komarova et al. [KNN01], as shown in
Figure 2.2. In the region where only the symmetric solution is stable (q < q1 ), the
average grammar frequency is 1/n. The dominant grammar frequency appears
high because it is the upper end of a distribution of grammar frequencies which
has a non-zero variance due to the finite population size.
In the bi-stability region (q1 < q < q2 ), a discrepancy between the analytic and
empirical results presumably derives from a lack of runs settling at the symmetric
solution. With a finite population, the basin of attraction of the symmetric
solution in this region is very weak. Choosing which individual reproduces each
time step is stochastic. This, combined with stochastic learning errors, appears to be sufficient perturbation to make the symmetric solution unstable empirically
in this region.
The time series of single runs with three different learning fidelities in the fully-connected topology are shown in Figure 2.3. These learning fidelities correspond
to the three different regions of stable solutions. The run in the region where
symmetric and asymmetric solutions are possible shows the very weak attraction
of the symmetric solution. Even starting with every individual in the population
being initialized to the same grammar, a dominant frequency of 1, runs at this
learning fidelity settle into a similar pattern (data not shown).
For topologies other than fully-connected, convergence provides a clearer
picture of system dynamics than dominant frequency. Global coherence φ∞ correlates very closely with dominant frequency, but local coherence φ0 corresponds
directly to the linguistic contribution to fitness and is directly operated on by the
evolutionary process.
Figure 2.4 shows the coherence values by taking the average values at the
end-points of the 5 replica runs at each q value. The learning fidelity threshold,
or error threshold, for the emergence of a dominant grammar is indicated
by the inflection point in the coherence curve. The emergence of a dominant universal grammar among the entire population is reflected in the global coherence
curve.
The bridge topology with 2 subpopulations of size 250 and 10 random connections between subpopulations (Figure 2.4 B) is indistinguishable from two
isolated fully connected populations. It demonstrates a very similar learning
fidelity threshold, q1 ≈ 0.84, as the fully-connected run shown in panel A. However, global convergence is never achieved reliably in excess of the probability
that both subpopulations individually converge to the same grammar by chance.
The up-tick in the φ∞ line at q = 1 is not statistically significant due to this
effect. Additionally, φ1 is extremely close to φ0 while φ∞ only rises to approx-
imately 0.5, indicating that there is a large degree of separation between many
individuals.
Figure 2.4: Linguistic coherence for (A) the fully-connected, (B) bridge, (C) linear, and (D) von Neumann lattice topologies, as a function of learning fidelity. The solid line (–) shows φ0, the dashed line (--) φ1, and the dot-dashed line (·-) the global coherence φ∞.
For the linear topology shown in Figure 2.4 C, φ0 and φ1 slowly rise over
the entire range shown, and the trend extends all the way from q ≈ 0.2 where
learning error makes each offspring essentially random (data not shown). Since
φ1 trends upward along with φ0 , we can conclude that extended “patches” of
individuals with the same grammar form. Near q ≈ 0.99, φ1 begins to approach
φ0 , and φ∞ shows a slight up-tick. However, even with perfect learning fidelity,
φ∞ is only slightly different from the symmetric solution 1/n. There appears to
be no possible global convergence learning fidelity threshold for this topology.
In contrast to the linear topology, the toroidal von Neumann lattice topology
(VN) shows a clear learning fidelity threshold for all three coherence metrics at
q ≈ 0.96. Below this threshold, the VN topology behaves similarly to the linear
topology with an expected decrease in φ0 and φ1 due to the doubling of neighbors.
2.5 Discussion
Empirical results using agent-based simulations closely match the analytic results
produced by Komarova, Nowak, and others for the fully-connected topology.
However, a relatively small population size combined with stochastic scheduling
and learning errors leads to sufficient perturbation that the empirical results show
less stability than the pure math would suggest.
The instability of the fully-connected model at learning fidelities just above
the critical threshold q1 tempts the conclusion that human languages exist in this
“edge of chaos” region. Humans exist in complex social connectivity networks,
which are probably closer to the bridge topology. The instability of human language is more probably related to changing connectivity and topology than a
specific learning fidelity.
Topologies other than fully-connected can behave quite differently. The bridge
topology for the parameters we have tested quickly and stably converges to two
independent grammars, one for each subpopulation, above a critical learning
fidelity. A linear topology fails to converge to a single dominant grammar, but
does converge to many “patches” that increase in size as q increases. Both of
these cases correspond to stable multi-language solutions which do not exist in
the fully-connected topology.
Additionally, the bridge topology has parameters that quite likely change its
dynamics. With a higher number of connections between subpopulations and/or a
higher number of smaller subpopulations, there is probably a global coherence
threshold.
The lack of an apparent learning fidelity threshold for the linear topology, and its similarity to the behavior of the VN topology below its threshold, suggests that there is a critical connectivity value that a regular lattice must exceed before global convergence is possible. This result, while only hinted at here, would fit very well with percolation theory. Percolation theory may also provide insight into which parameterized random graph topologies possess a learning threshold.
The fully-connected topology provides the scenario with the lowest critical
learning fidelity, but it also requires the most intensive communication. A topology with much more limited connectivity such as a lattice or clustered graph
may still globally converge with much more limited communication. For many
engineering situations such as adaptive sensor networks, this is an important
consideration.
Language in this study is sufficiently abstract that these results apply to
many situations where agents adapt by learning from one another and convergence is desirable. In an adaptive sensor network setting, it may be beneficial
for sensor nodes to adapt their communicative coding and recognition/detection
systems based on the specific topology of deployment and the actual inputs to the
network. An evolutionary strategy where nodes adopt successful schemes from
their neighbors with a fitness bonus for agreement is a general option with great
promise. Such a system maps directly onto the linguistic systems we present.
2.6 Summary
We demonstrated that topology plays a critical role in determining the degree of
linguistic coherence and the learning fidelity threshold through empirical studies informed by the theoretical results for an idealized population. The reality
of complex population structure makes evident the importance of topology in
studying the dynamics of language acquisition and language evolution. Further
investigation of various topologies with different parameter settings may provide a more in-depth understanding of language evolution and diversification.
Acknowledgments
This work was supported by the UCLA Center for Embedded Network Sensors, the
Defense Advanced Research Projects Agency (DARPA), administered by the Army Research Office under Emergent Surveillance Plexus MURI Award No. DAAD19-01-10504, and a DARPA MURI award administered by the US Air Force, No. F49620-01-10361. Any opinions, findings, and conclusions or recommendations expressed in this
publication are those of the authors and do not necessarily reflect the views of the
sponsoring agencies.
CHAPTER 3
The role of language structure in language
evolution
Abstract
The complexity, variation, and change of languages make evident the importance
of representation and learning in the acquisition and evolution of language. For
example, analytic studies of simple languages in unstructured populations have shown complex dynamics, depending on the fidelity of language transmission. In this study we extend these analyses of evolutionary dynamics to include grammars
inspired by the principles and parameters paradigm. In particular, the space of
languages is structured so that some pairs of languages are more similar than
others, and mutations tend to change languages to nearby variants. We found
that coherence emerges with lower learning fidelity than predicted by earlier work
with an unstructured language space. This work is published in Proceedings of
the VIIIth European Conference on Artificial Life.
3.1 Introduction
The evolutionary dynamics of language provides insight into the factors allowing
subpopulations to converge on common or similar languages. The problem has
a more general significance for robotics and artificial life as a clear and empirically supported platform for the study of how coherent behavior can emerge in a
population of distributed adaptive agents.
Of particular interest from the perspective of evolutionary dynamics are insights into the means and value of conserving linguistic diversity. The practical
importance of linguistic diversity has attracted some attention [Sut03, Sta05],
though perhaps not as much as biological diversity. Recent studies that have
applied a biological perspective to the evolution of linguistic convergence and diversity have shown promising results [HI95, HI96, Ste96, NKN01, Kir01, SBK03,
Niy06]. Most such studies that apply a biological perspective to language evolution have been based on very simple languages arbitrarily related to one another. We believe these studies may be enriched by a more realistic description
of language.
Language models based on the Chomskian paradigm [Cho65, Cho80] view
language as an aspect of individual psychology. There has been some debate
about the extent to which the underlying representation of languages is inherited or learned and how language impacts fitness. Pinker and Bloom, for
example, suggest that a language instinct constrained by universal grammar sets
the stage for language acquisition which then contributes to individual fitness
[PB90, Pin94]. Hauser, Chomsky and Fitch argue more recently that while
certain perceptual and articulatory abilities may have been selected for, it remains unclear how the most fundamental aspects of human language emerged
[HCF02, FHC05]. All parties agree that linguistically relevant properties are to
some extent learned through cultural transmission and change through time. How
this might occur has been the subject of many analytic and simulation studies
[Niy06, NKN02, SBK03, Kir01, Ste96].
As an organism is determined in part by its genome, language is determined
in part by a lexicon of generators which in turn determine its phonology, semantics, morphology and syntax; these properties may evolve [Jos00, LM91]. Both
the genome and at least part of language are inherited with variation, and are therefore potentially targets for natural selection. These similarities have led some
researchers to adopt a quasi-species model [ES79, EMS89] for describing the dynamics of language evolution [NKN01, KNN01]. In their model, grammars are
mutationally equidistant from each other with arbitrarily assigned similarity. It
seems, however, that the changes language actually undergoes are much smaller than what this model seems to predict – the language of a child is, more
or less, the same as that of its linguistic community. This suggests an approach
where the similarity between languages is correlated with their distance from each
other in mutational space.
In this paper, we study how certain properties of the space of possible languages and learning mechanisms impact language change. We introduce a regularity in the language space by viewing the locus of language transmission as
a series of learned parameters and calculating the similarity between languages
as the proportion of parameters that agree. We explore the effect of this simple
regularity on the dynamics of language evolution primarily through simulations.
These simulations go beyond previous analytic studies of simple models, and we
find that structure has a significant impact on stability results.
3.2 Methods
Consider a fully-connected finite population of N individuals, each of whom possesses a language which is encoded as a sequence of l linguistic ‘components’ or
‘parameters’. Each parameter can take only a limited number d of values. For
example, a language L with 10 parameters each taking 3 values (A,B,C) can be
represented by a linear sequence like AABABCBABA. (This string is not an example of a statement from the language, but rather a representation of the language itself.)

Symbol   Parameter                          Value(s)
N        population size                    500
f0       base fitness                       10^-3
l        number of language parameters      1, 2, 3, or 6
d        number of values per parameter     64, 8, 4, or 2
n        number of possible grammars        64 (= d^l)
         number of time steps               100,000

Table 3.1: Parameters used in the simulations.

Such a representation is in the spirit of Chomsky’s “principles and parameters” approach to language [Niy06]. To allay a potential source of confusion:
parameters, as we use them here, correspond to whatever the relevant differences
are between languages at the level of description relevant to transmission. This
will correspond to parameters in the Chomskian sense just in case these latter
parameters are appropriately relevant to linguistic transmission. We are throughout assuming that whatever is being transmitted can be usefully viewed (at least
to a first approximation) as a finite sequence of finite-valued parameters.
Representing a language as a sequence, we define the language similarity between individual i and j, denoted aij , as the proportion of parameters on which
the two individuals agree. For example, the language similarity between an individual i whose language is represented as AAA and an individual j whose
language is represented as ABA is 2/3 and aij = aji .
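For concreteness, the pairwise similarity of the worked example above can be computed as follows (a hypothetical Python helper, not part of the original simulation code):

def similarity(lang_i, lang_j):
    # proportion of parameters on which two languages agree, so a_ij = a_ji
    return sum(x == y for x, y in zip(lang_i, lang_j)) / len(lang_i)

similarity("AAA", "ABA")   # -> 0.666..., i.e. the 2/3 of the example above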
The fitness of an individual has two parts: the base fitness, denoted f0 , and
a linguistic merit proportional to the probability that the individual is able to
successfully communicate with another individual selected at random from its neighbors. The linguistic merit of an individual is proportional to the sum of language similarities between the individual and the others it is in linguistic contact with (which is the entire population for this model). The overall fitness of an individual, f_i, is, as in [NKN02]:
f_i = f_0 + \frac{1}{2}\sum_{j=1}^{N}(a_{ij} + a_{ji}) = f_0 + \sum_{j=1}^{N} a_{ij}    (3.1)
noting that aij = aji according to our definition of similarity.
At each time step, an individual is chosen to reproduce, randomly and independently, with probability proportional to its relative fitness. Reproduction can
be thought of either as the individual producing an offspring which inherits the
parent’s language and replaces another in the population, or another individual
changing its language to match the “teacher’s” language. We will use the former
terminology.
The offspring learns the parent’s language with a certain learning fidelity,
q. This learning fidelity is properly a function of the specifics of the learning
method the child uses and the complexity of the language, often modeled with
a probability distribution over the possible transitions from each language L_i to each other (possibly different) L_j. But in the present setting we use the first order
approximation that the only incorrect/imperfect learning is a single parameter
change per reproductive event. We refer to this constraint as gradual learning.
The rationale behind this approach is that learning errors do not typically result
in the learner acquiring a radically different language. This single parameter
change constraint on incorrect/incomplete learning is analogous to only allowing
single point mutations to the linear sequence representation of the language. As
such, it defines the “sequence space”[EMS89] through which the population moves
during the evolutionary process.
We study language change in an ideal population with a simulation, using the following algorithm. Initially each individual in the population P starts with a randomly chosen language from the set of all possible languages.
for each individual i ∈ P
    compute fitness f_i of i
end for
do until the number of updates is met
    select an individual i ∈ P with a probability proportional to fitness
    select a second random individual j from the population
    replace individual j with an offspring k of individual i
    if the offspring is a mutant (mutation rate µ = 1 − q)
        change a random parameter of L_k
    else
        L_k = L_i
    end if
    update fitness of individual j
end do
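For concreteness, the following is a minimal Python sketch of this update loop under the assumptions stated above. The function names and default arguments are illustrative; this is not the code used to produce the reported runs, and, as in the pseudocode, only the replaced individual's fitness is refreshed after each event.

import random

def similarity(gi, gj):
    # proportion of parameters on which two languages agree
    return sum(a == b for a, b in zip(gi, gj)) / len(gi)

def fitness(i, pop, f0):
    # eq. (3.1): base fitness plus summed similarity to all other individuals
    return f0 + sum(similarity(pop[i], pop[j]) for j in range(len(pop)) if j != i)

def simulate(N=500, l=6, d=2, q=0.9, f0=1e-3, steps=100_000, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(d) for _ in range(l)) for _ in range(N)]
    fit = [fitness(i, pop, f0) for i in range(N)]
    for _ in range(steps):
        parent = rng.choices(range(N), weights=fit)[0]   # fitness-proportional selection
        child = rng.randrange(N)                         # individual to be replaced
        lang = list(pop[parent])
        if rng.random() > q:                             # imperfect learning: one parameter changes
            pos = rng.randrange(l)
            lang[pos] = rng.choice([v for v in range(d) if v != lang[pos]])
        pop[child] = tuple(lang)
        fit[child] = fitness(child, pop, f0)             # only j's fitness is updated, as above
    return pop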
We measure the dominant language frequency directly at each time step by
counting the number of individuals speaking each language. The dominant language at any given time is simply the language that is most frequent at that time,
and will typically change over time unless the population has strongly converged.
The linguistic coherence of the population, denoted φ, is defined as follows:
\phi = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}    (3.2)
Counting the actual number of languages that exist in the population may
disguise the degree of variation when some of the languages disproportionately
dominate. Consequently, we used an analogue to the effective number of alleles
in a population, which we will refer to as the effective number of languages in the
population, ne [CK70]:
n_e = \left( \sum_{i} p_i^2 \right)^{-1}    (3.3)
where pi is the frequency of each language.
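As an illustrative sketch, n_e can be computed directly from the list of spoken languages:

from collections import Counter

def effective_number_of_languages(pop):
    # eq. (3.3): inverse of the sum of squared language frequencies
    N = len(pop)
    return 1.0 / sum((count / N) ** 2 for count in Counter(pop).values())

# a population of 90 speakers of one language and 10 of another gives
# 1 / (0.9**2 + 0.1**2), roughly 1.22, well below the raw count of 2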
Table 3.1 shows the parameter settings for the experimental setup. We used
a population size N of 500, a base fitness f0 of 0.001, and we let the number of
different possible languages n be 64. Each language in a set can be represented
as a linear sequence of length l with elements drawn from a set of d possible
values. For set A, the similarity between languages that are not the same is set
to a constant value a equal to 0.5. For all other sets, a_ij is one minus the Hamming distance divided by sequence length, as described above. The reproduction cycle was repeated 100,000 times to make each run long enough to reach an equilibrium. Twenty
replica runs, varying only the random number generator seed, were done at each
q between 0.5 and 1 at 0.02 intervals.
3.3 Analytic Model
Given a uniform similarity a between n different languages, and the learning
fidelity of q, three equilibrium solutions, X0 and X± , for language frequency were
derived by Komarova et al. [KNN01] for a family of single-component languages:
X_0 = 1/n    (3.4)

X_\pm = \left[ (a-1)(1 + (n-2)q) \mp \sqrt{D} \right]\, \left[ 2(a-1)(n-1) \right]^{-1}    (3.5)

where

D = 4[1 + a(n-2) + f_0(n-1)](1-q)(n-1)(a-1) + (1-a)^2[1 + (n-2)q]^2
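Read off numerically, equations (3.4)-(3.5) translate into the following Python sketch (the function name and default base fitness are illustrative):

import math

def equilibria(a, q, n, f0=1e-3):
    # eqs. (3.4)-(3.5): symmetric and asymmetric equilibrium frequencies
    D = (4 * (1 + a*(n - 2) + f0*(n - 1)) * (1 - q) * (n - 1) * (a - 1)
         + (1 - a)**2 * (1 + (n - 2)*q)**2)
    X0 = 1.0 / n
    if D < 0:
        return X0, None, None               # only the symmetric solution exists
    Xplus  = ((a - 1)*(1 + (n - 2)*q) - math.sqrt(D)) / (2*(a - 1)*(n - 1))
    Xminus = ((a - 1)*(1 + (n - 2)*q) + math.sqrt(D)) / (2*(a - 1)*(n - 1))
    return X0, Xplus, Xminus

# e.g. equilibria(0.5, 0.99, 64) gives X0 = 1/64 with X+ near 0.98 and X- near 0.01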
Setting   l   d    n (= d^l)   ā       q1      q2
A         1   64   64          0.500   0.830   0.985
B         2   8    64          0.111   0.516   0.902
C         3   4    64          0.238   0.662   0.955
D         6   2    64          0.492   0.826   0.985

Table 3.2: System settings, average language similarity (ā), q1 and q2. When l = 1, we use a = 0.5. A: one component with 64 options, B: two components with 8 options, C: three components with 4 options, D: six components with 2 options. Each setup has exactly the same number of possible languages.
Below a certain learning fidelity of q1 , only the symmetric solution X0 exists
and no single language dominates. Solving for q when D = 0 determines the
critical learning fidelity threshold q1, which corresponds to the error threshold in
molecular evolution.
q_1 = \frac{2(n-1)^{3/2}\sqrt{(1+f_0)\left[1 + a(n-2) + f_0(n-1)\right]} - 2f_0(n-1)^2 - a(2n^2 - 7n + 6) - 3n + 4}{(1-a)(n-2)^2}    (3.6)
When q1 < q < q2 for a specific q2, both the symmetric X_0 and asymmetric X_\pm solutions exist and are stable. For q > q2 however, only the asymmetric
solution where one language dominates the population is stable. This q2 value is
the point where X0 = X− , giving:
q_2 = \left[ n^2(f_0 + a) + (n+1)(1-a) \right] \left[ n^2(f_0 + a) + 2n(1-a) \right]^{-1}    (3.7)

Komarova et al. provide much more detail and proofs [KNN01].
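Both thresholds are easy to evaluate numerically; the sketch below (hypothetical helper names) reproduces the q1 and q2 values reported for setting A in Table 3.2:

import math

def q1_threshold(a, n, f0=1e-3):
    # eq. (3.6): the learning fidelity at which D in eq. (3.5) vanishes
    A1 = 1 + a*(n - 2) + f0*(n - 1)
    num = (2 * (n - 1)**1.5 * math.sqrt((1 + f0) * A1)
           - 2*f0*(n - 1)**2 - a*(2*n*n - 7*n + 6) - 3*n + 4)
    return num / ((1 - a) * (n - 2)**2)

def q2_threshold(a, n, f0=1e-3):
    # eq. (3.7): the point where X0 = X-
    return (n*n*(f0 + a) + (n + 1)*(1 - a)) / (n*n*(f0 + a) + 2*n*(1 - a))

# q1_threshold(0.5, 64) is about 0.830 and q2_threshold(0.5, 64) about 0.985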
By introducing a regularity in language, we effectively change the transition
matrix of aij . To compare our findings with the analytical result, we use the
average language similarity ā for calculating q1 and q2 , where ā is calculated
using the equation below:
\bar{a} = \frac{1}{n-1} \left( \sum_{k=1}^{l-1} \frac{l-k}{l} \binom{l}{k} (d-1)^k \right)    (3.8)
We consider 4 settings A-D, varying in the “amount of structure.” The four
cases are listed in Table 3.2 together with the calculated ā for each case.
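Equation (3.8), evaluated directly, reproduces the ā column of Table 3.2; a short sketch (the function name is illustrative):

from math import comb

def mean_similarity(l, d):
    # eq. (3.8): average pairwise similarity over all distinct language pairs
    n = d ** l
    total = sum((l - k) / l * comb(l, k) * (d - 1)**k for k in range(1, l))
    return total / (n - 1)

# (l, d) = (2, 8), (3, 4), (6, 2) give 0.111, 0.238, 0.492 (settings B-D);
# for l = 1 the sum is empty, and the text uses a = 0.5 instead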
3.4 Results
We plot the experimental and analytic results for comparison in Figure 3.1. The
empirical results for the uniform similarity of a = 0.5 between two different languages closely follow the expectation from the analytic results arrived at by Komarova et al. [KNN01], as shown in Figure 3.1 A, which we have previously described in detail [LCS05b].
The results of the multi-component languages (Figure 3.1 B, C and D) do
not show the clear transition from symmetric to asymmetric solution. The trend
is considerably smoother, with nothing but an increased variance in results at
the point of the phase transition for parameter sets C and D. Parameter set B
shows a region where both symmetric and asymmetric solutions appear stable
for q values between 0.6 and 0.7, but it is notable that the empirical asymmetric
dominant abundance is significantly below the analytical expectation for this set
as well as C and D.
Figure 3.1: The dominant (×) language frequency after 100,000 time steps, plotted against learning fidelity and overlaid with the symmetric (horizontal line) and asymmetric (curved line) solutions for a (or ā), n = 64, f0 = 0.001. Each point is an independent replica; d^l is shown at the top left corner of each panel (A: 64^1, B: 8^2, C: 4^3, D: 2^6).
Figure 3.2: The number of languages (×) and the average effective number of languages (—) as a function of learning fidelity, for settings A (64^1) and D (2^6).
Since setups A and D have similar ā values (ā_A ≈ ā_D), they provide a good illustration of the difference the multi-parameter language brings to the language evolution scenario. Figure 3.2 compares the number of languages and the effective number of languages (n_e), calculated using equation (3.3). In the single-parameter language case A, all the possible languages exist in the population in the region where q < q_1^A. On the other hand, the 6-parameter case D retains only half of all possible languages at q = q_1^D.
Figure 3.1 A shows that if the learning fidelity is greater than 0.9, one language
dominates in the population. That trend is illustrated clearly by the average effective number of languages in Figure 3.2 A. There are still over half of all possible
languages remaining in the population at q = 0.9. This number overestimates the
true variation in the population when some languages disproportionately dominate while most are at very low frequency. Incomplete/incorrect learning provides
a constant influx of variants, but these variants do not propagate to any appreciable frequency due to their inferior fitness. The effective number of languages n_e for set A at q = 0.9 is close to 1 (n_e = 1.68), which indicates that the population has converged to one language and that the rest of the languages exist at very low frequency.
In contrast, Figure 3.2 D shows a gradual decline in number of languages as
learning fidelity increases. For this set, the number of languages in the population
starts decreasing noticeably for q values above 0.55, and the effective number of
languages n_e decreases over the entire range. However, at q values above 0.9, set D shows a higher n_e value (3.75 at q = 0.9) than set A, indicating that more languages persist at appreciable abundance in set D even though the total number of languages is lower.
In set A, all possible languages are a single step away in sequence space; in
other words, all possible languages are reachable by a single incorrect/incomplete
learning event. In set D, however, only a small subset of possible languages are
producible as single step variants from the dominant language. These single-step
variants of the dominant account for the majority of non-dominant languages in
the population. Additionally, these variants have a high fitness relative to ā, and a higher equilibrium frequency in mutation-selection balance.
3.5 Discussion
For the set of single-component languages, our empirical results closely match
the analytic results produced by Komarova et al. In an unstructured language
space, high fidelity learner-driven change, such as the sort exhibited by human
languages, can only occur just above the critical error threshold q1 , near the
bifurcation point.
These simulations show that substantial levels of linguistic coherence can be
achieved with lower learning fidelity if structure is introduced. All four settings
explored here have language spaces of exactly the same size, and yet the structured language sets allow fairly stable asymmetric solutions even with quite low
learning fidelity and show a much more gradual approach to coherence.
We conclude that a simple regularity combined with gradual learning can
dramatically reduce the number of languages that exist in the population, even
in regions where analytic results indicate that only symmetric solutions will be
stable. The gradual learning used in this experiment seems a more realistic approximation than the “memoryless” learning used in previous work. The qualitatively different dynamics with respect to the critical learning fidelity suggest that convergence to a set of closely related languages is significantly easier than previously thought.
These results are in keeping with the expectations of a quasi-species interpretation. Gradual learning maps the grammars into a sequence space where some
grammars are fewer mutational (incomplete/incorrect learning) steps away from others. Calculating the similarity between grammars, which determines fitness, as one minus the Hamming distance divided by sequence length ensures that grammars that are close in the sequence space have similar fitness values. This produces a
smooth fitness landscape.
The upshot of this smooth fitness landscape is that selection operates on the
quasi-species formed by the dominant grammar and its close variants. At learning fidelity values below q1, the population converges neither to a single dominant grammar nor to the symmetric state in which all grammars are equally represented, but instead to a family of
similar grammars. The fidelity of this family is higher than the nominal learning fidelity because a sizable proportion of incomplete/incorrect learning events
among members of the quasi-species result in other members of the quasi-species.
At still lower q values, that family of grammars (the quasi-species) spreads farther out in sequence space, until at some point it includes all possible grammars
and is identical to the symmetric analytical solution provided by Nowak and
Komarova’s model [NKN01, KNN01].
For learning fidelity values higher than q1 , we note that a structured grammar
space weakens the selection against the minor variants of the dominant grammar
in comparison to unstructured or single component grammar models. This effect
causes the population to display a dominant abundance below the analytical
model’s expectations because the close variants of the dominant have a higher
equilibrium abundance in mutation-selection balance.
We conjecture that natural languages can be viewed as belonging to a highly structured set at some level of description relevant to a theory of learning and cultural
transmission, even if this structure is not reducible to a simple sequence representation. As such, the qualitatively different dynamics explored here are important
to understanding how human language evolves through time. Additionally, in
technological applications where agents learn from each other and it is desirable
for the overall system to converge, these results may provide a guide to designing
properties of the language or state representation depending on the degree of convergence desired. If it is sufficient that agents of the system just mostly agree, i.e.
converge to close variants of a dominant grammar, then a structured state space
may provide a way to achieve faster convergence at higher mutation values. However, if absolute convergence is required, the state space must be designed such
that minor variants are strongly selected against, producing a sharp fitness peak.
This constraint also implies that a critical mutation/learning fidelity threshold
exists.
Acknowledgments
This work was supported by NIH, the UCLA Center for Embedded Network Sensors, the Defense Advanced Research Projects Agency (DARPA), administered by the Army Research Office under Emergent Surveillance Plexus MURI Award No. DAAD19-01-1-0504, and a DARPA MURI award administered by the US Air Force, No. F49620-01-1-0361. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agencies.
CHAPTER 4
Evolutionary dynamics in grammar networks
Abstract
Recent studies of evolutionary language dynamics have shown promising results
regarding linguistic convergence and diversity. Many studies assume a language
set where each language is equally related to every other language. In this chapter, I deviate from this assumption and assume that some languages are more
closely related than others. I specify the conditions for the emergence of linguistic
coherence in terms of learning fidelity and network density. In agreement with
previous studies, a bifurcation from the symmetric state, where all grammars are
in equal frequency, to an asymmetric state occurs in a complete graph network.
More sparse grammar networks exhibit different patterns of convergence, often including a weak cohesion phase where the distribution of grammar frequencies forms a roughly Gaussian shape centered on the most frequent grammar. Weak cohesion is of particular interest from an evolutionary standpoint since there is both linguistic coherence and standing heritable variation. Different networks not only exhibit different patterns of language convergence, but also reach an equilibrium state at different speeds. The relative convergence time indicates that a population reaches a steady state more slowly when the learning fidelity is close to a transition point, regardless of the topology of the grammar network. Part of this work has been submitted to Adaptive Behavior.
4.1 Introduction
The human ability to think and communicate in expressive symbolic languages
distinguishes us from any other species. The emergence of language is one of
the truly major events in evolutionary history. Language enables a new mode of
evolution by allowing much broader and more flexible transmission of heritable
information [SS99, JL05].
Among the many aspects of language, grammar has been most appreciated
as crucial since it provides effectively infinite expressivity. There has been a
great deal of debate about the degree to which the underlying representation
of language is inherited or learned and how language may affect organisms’
biological fitness.
Language models based on the Chomskian paradigm [Cho65, Cho80] view language as an aspect of individual psychology. Due to the computational complexity
of natural language grammars and the poverty of stimuli available in language
learning, there must be a common set of constraints guiding language acquisition.
Pinker and Bloom, for example, suggest that a language instinct constrained by a
universal grammar sets the stage for language acquisition which then contributes
to individual fitness [PB90, Pin94]. Hauser, Chomsky and Fitch argue more
recently that while certain perceptual and articulatory abilities may have been
selected for, it remains unclear how the most fundamental aspects of human language emerged [HCF02, FHC05]. However, all parties agree that linguistically
relevant properties are to some extent learned through cultural transmission and
change over time.
Many researchers have subsequently adopted evolutionary theory for studying language evolution. Recent studies applying a biological perspective to the
evolution of linguistic convergence and diversity have shown promising results
[HI95, HI96, Ste96, KNN01, NKN01, Kir01]. Particularly, Komarova, Nowak and
Niyogi have adopted a model based on Eigen and Schuster’s molecular evolution
work [ES79, EMS89, FS87] for describing the dynamics of language evolution
[NKN01, KN01, KNN01].
Most studies on language evolution have been based on the convergence dynamics of a population learning languages from a set where each language is
equally related to every other language. However, natural languages are organized hierarchically, with language families, subgroups, and even dialects. The
similarity of this language hierarchy with species trees is one of the inspirations
for an evolutionary approach in the first place.
In this chapter, we define a grammar network to be a graph describing the similarity between grammars. Preliminary results have shown that different grammar
networks alone can qualitatively change the dynamics of language convergence
[Olf05, OM04, LCK05]. Independently, Matsen and Nowak recently have explored language convergence conditions on a “nearly-regular” language network
[MN04].
I explore the population dynamics of grammar acquisition using four structurally distinct grammar networks. I assume that the grammar network determines both the similarity of grammars to each other in mutational space and their contribution to the individual’s fitness. Two equilibrium states have previously been characterized in terms of language convergence: one is the symmetric state where all grammars exist in equal frequencies, and the other is a strong
cohesion state where the symmetry breaks and one grammar predominates in a
population [NKN01, KNN01].
Our results identify another state we label weak cohesion, where a group of
structurally similar grammars predominates in the population. The high degree
of standing variation characteristic of the weak cohesion state is more suggestive
of real-world observations than the other two equilibria previously described.
The relative convergence time indicates that a population reaches an equilibrium state more slowly when the learning fidelity lies within a transition region, regardless of the topology of the grammar network. The graph density threshold required to achieve significant linguistic coherence adds another dimension to consider for a coherent language to emerge.
4.2 Evolutionary Dynamics of a Population of Language Learners
Consider a network of grammars with nodes U = {G1 , G2 , . . . , Gn }. Let sij
denote the similarity of Gi to Gj . Then, we define the adjacency weights of
the grammar network as aij = (sij + sji )/2. The matrix of mutual similarities
A = [aij ] specifies the set of edges E of a graph Γ = (U, E) according to the
relation aij > 0
⇐⇒
(i, j) ∈ E. Matrix A represents the interconnection
topology of the grammar network. In our setting, A is defined as:
aij =
1
a
0
if i = j
if (i, j) ∈ E
if (i, j) ∈
/E
Each grammar represents a language hypothesis, consisting of a set of rules
that an individual deduced from its input. It is possible that two grammars
can have a completely different set of rules and yet generate sentences that are
somewhat mutually understandable. In order to prevent potential confusion,
let us distinguish between two types of similarities, structural similarity and
expressive similarity. Structural similarity is how many grammatical rules or
lexical items two grammars share. Expressive similarity relates to the probability
that a sentence generated from one grammar is similar to a sentence generated
from another grammar. Structural similarity is analogous to genotype similarity,
and expressive similarity is analogous to phenotype similarity.
In our setting, all grammars in a network are positioned on a polygon where
their positions are indicative of structural similarity. If two grammars are positioned side by side, they share many common rules for generating sentences.
The relationship between grammars is represented as edges in a grammar
network specifying which learner grammars each teacher grammar may produce.
In a molecular evolution framework, this is the graph defining possible mutational
transitions. For our model, the grammar network is completely specified by the
matrix of mutual similarities A. Note that a mutual similarity aij is equivalent to
the expressive similarity of two language hypotheses. As long as two grammars
are connected ((i, j) ∈ E), they have some degree of mutual intelligibility (a_{ij} > 0).
Within this setting, each mutation or incorrect language learning step does not necessarily yield a grammar that is structurally similar to its teacher’s grammar.
Sometimes the learner can deduce a set of rules that are completely different from
its parent’s and yet generate sentences that are very close to its input. Thus, the
grammar network defines the space which an individual explores while learning
language.
Let x_i denote the proportion of a population of constant size speaking grammar G_i, with n possible grammars. I assume that each individual uses only one grammar, thus we have \sum_{j=1}^{n} x_j = 1.
The fitness of individuals with grammar G_i is f_i = f_0 + \sum_j a_{ij} x_j, where f_0 is the base fitness, which does not depend on the language, and \sum_j a_{ij} x_j is the language contribution to fitness. Note that this fitness equation is frequency dependent.
The evolutionary dynamics of this population is of the form

\dot{x}_i = \sum_{j=1}^{n} x_j f_j q_{ji} - \phi x_i, \qquad 1 \le i \le n,    (4.1)

where \phi = \sum_i x_i f_i is the average fitness, and Q = [q_{ij}] is the learning fidelity matrix. The term -\phi x_i maintains the constant population size.
This dynamic system can be thought of either as having individuals that produce offspring that replace a randomly chosen individual, or as having individuals
that change their grammars by learning a teacher’s language. I will use the latter
terminology.
The learning model is a relationship between the matrix of mutual similarities A and the matrix Q, which is defined by

q_{ii} = q, \qquad q_{ij} = (1-q)\,\frac{a_{ij}}{\sum_{k \neq i} a_{ik}} \quad \text{for all } i \neq j.    (4.2)

The learning fidelity q is the probability that a learner acquires the same grammar as its teacher. Q satisfies the condition \sum_j q_{ij} = 1 for all i. The special case of this transition matrix where a_{ij} = a for all i \neq j was analyzed by Komarova et al. [KNN01, KN01].
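A minimal sketch of how the learning fidelity matrix of equation (4.2) can be built from a mutual similarity matrix A, assuming every grammar has at least one neighbor (true for the connected graphs used here):

import numpy as np

def learning_matrix(A, q):
    # eq. (4.2): keep the teacher's grammar with probability q and spread the
    # remaining 1 - q over connected grammars in proportion to a_ij
    A = np.asarray(A, dtype=float)
    off = A.copy()
    np.fill_diagonal(off, 0.0)
    Q = (1.0 - q) * off / off.sum(axis=1, keepdims=True)
    np.fill_diagonal(Q, q)
    return Q                                # every row sums to 1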
4.3 Grammar Network (GN) Model
Graph theory provides a large number of metrics for describing the topology of grammar networks [OM04, Olf05]. The three we will focus on are density, the clustering coefficient, and the mean path length. Density D is the number of links in the graph divided by the number of possible links [AB02]. The clustering coefficient for node i, C_i, is the number of links between nodes that are both connected to node i divided by the number of such links in the complete graph [WS98]. \bar{C} is simply the mean C_i over all i. Mean path length is the mean number of links that must be traversed to connect a pair of nodes, averaged over all pairs.
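For reference, density and the mean clustering coefficient can be computed directly from an adjacency pattern; a small sketch follows (mean path length would additionally require a shortest-path search):

import numpy as np

def density_and_clustering(A):
    # density: links present divided by links possible;
    # clustering of node i: links among i's neighbors divided by pairs of neighbors
    B = (np.asarray(A) > 0).astype(int)
    np.fill_diagonal(B, 0)
    n = len(B)
    density = B.sum() / (n * (n - 1))
    cs = []
    for i in range(n):
        nbrs = np.flatnonzero(B[i])
        k = len(nbrs)
        cs.append(0.0 if k < 2 else B[np.ix_(nbrs, nbrs)].sum() / (k * (k - 1)))
    return density, float(np.mean(cs))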
Four canonical types of graphs with substantially different structural properties considered in this experiment are the complete graph, a regular ring lattice, a
random graph, and a small-world graph, depicted in Fig. 4.1. For each graph, we expect the dynamics to be different. Less-than-complete graphs make sense from a biological or sociological standpoint, since learning a language similar to the one a learner already has seems much easier than learning a drastically different
language. In the following, we describe in more detail each type of graph that
will be used in our experiments.
4.3.1 The Complete Graph
In the complete graph, the mutual intelligibility matrix A is defined by:
a_{ij} = \begin{cases} 1 & \text{if } i = j \\ a & \text{if } i \neq j \end{cases}

where a > 0. The density, clustering coefficient, and mean path length of a complete graph are all 1, and the total number of links is n(n − 1)/2. The complete grammar network is one where every grammar has equal mutual intelligibility with any other grammar. A grammar can mutate to any other grammar in a single learning step, as is evident from the mean path length of 1.
Figure 4.1: Different types of graphs with 12 nodes: (a) the complete graph, (b) a ring lattice with k = 2, (c) a random graph with k = 1 and r = 12, (d) a small-world graph with k = 2 and r = 3. The examples shown here have different densities; the experiments, however, are designed in such a way that all the graphs except the complete graph have the same density.
4.3.2 Regular Ring Lattice
A regular ring lattice is a cyclic graph with n nodes evenly spaced on a ring
and each node linked to its 2k nearest neighbors on the ring (example shown in
Fig.4.1(b)). The mutual intelligibility matrix of this graph with nonzero weights
equal to a > 0 is denoted by
A = In + aC(n, k)
where I_n denotes an n × n identity matrix and C(n, k) is the set of links satisfying the constraints that i ≠ j and that nodes i and j are separated by at most k positions on the polygon defining the ring.
The density of this class of graphs is 2nk/n(n − 1) = 2k/(n − 1). For the
special choice of n = 2k + 1, one obtains a complete graph with the maximum
density of 1. Ring lattice graphs are highly clustered, where each node has the
same clustering coefficient of:
\bar{C} = \frac{3k - 3}{4k - 2} \qquad \text{for } 1 < k < n/3
In this network, mutual intelligibility implies some structural similarity. In order for a learner to acquire a grammar that is very different from its teacher’s, the learner has to undergo many mutational steps. This trend is reflected by the large mean path length, which is proportional to n/k on a ring lattice [AB02].
4.3.3 Random Network
In contrast to the regularity of the ring lattice graph, a pure random graph is composed of randomly chosen links. I consider a slight variant of the purely random
graph described as
A = In + aC(n, 1) + aR(n, r).
This random graph consists of r random links with weights of a as well as a regular
ring lattice with k = 1 (example shown in Fig.4.1(c)). The ring lattice of k = 1
guarantees connectivity of the graph. The r random links exclude duplicates and
self-connections so the total number of links is n + r. I choose r = n(k − 1) to
make the density the same as a ring lattice with the same n and k values.
In this network, each mutational step involves arbitrary changes in the rules
and the mutual similarity is irrelevant to the structural similarity. The mean path length on a random graph is proportional to log n [AB02], leading to much lower values than a ring lattice with k ≪ n. It turns out that this low mean path length is a good indication that the evolutionary dynamics in a random graph is similar to that in a complete graph.
4.3.4 Small-World Network
A small-world graph, proposed by Watts and Strogatz [AB02, WS98], is characterized by a low density and a relatively high mean clustering coefficient, much like
the ring lattice. However, small-world graphs have a relatively short mean path
length which is roughly proportional to ln n, like a random graph.
A standard method to generate a small-world graph starts with a ring lattice with a low k value, then adds r additional random links, excluding self-connections and duplicates:

A = I_n + aC(n, k) + aR(n, r)

(example shown in Figure 4.1(d)) [NW99a, NW99b, Olf05, OM04]. By choosing r = n(k_0 − k), we generate small-world graphs that have a density identical to that of a ring lattice with the same n and k equal to k_0.
In this graph, most mutational steps involve minor structural change, but a
small portion of mutation events involves drastic change in the grammar structure. In a small-world grammar space most expressively similar grammars are
also structurally similar, but a small number of grammars are structurally very
different yet mutually intelligible.
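The four adjacency matrices can be generated along the following lines, a sketch assuming the parameter choices of Table 4.1 (function names are illustrative):

import numpy as np

def ring_lattice(n, k, a=0.5):
    # A = I_n + a C(n, k): each node linked to its 2k nearest ring neighbors
    A = np.eye(n)
    for i in range(n):
        for step in range(1, k + 1):
            A[i, (i + step) % n] = A[i, (i - step) % n] = a
    return A

def add_random_links(A, r, a=0.5, seed=0):
    # a R(n, r): r extra links of weight a, excluding self-links and duplicates
    rng = np.random.default_rng(seed)
    A = A.copy()
    n = len(A)
    added = 0
    while added < r:
        i, j = rng.integers(0, n, size=2)
        if i != j and A[i, j] == 0:
            A[i, j] = A[j, i] = a
            added += 1
    return A

# complete     = ring_lattice(51, 25)                         # density 1
# lattice      = ring_lattice(51, 15)                         # density 0.6
# random_graph = add_random_links(ring_lattice(51, 1), 714)   # density 0.6
# small_world  = add_random_links(ring_lattice(51, 13), 102)  # density 0.6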
Network Type     n    k    r     Graph Density   q
Complete graph   51   25   -     1               [0.94, 1]
Ring lattice     51   15   -     0.6             [0, 1]
Random graph     51   1    714   0.6             [0, 1]
Small world      51   13   102   0.6             [0, 1]

Table 4.1: Parameters chosen for comparing the language convergence behavior of four grammar networks in response to learning fidelity q. All networks are set to the same density except the complete graph.
Network Type    n     k          r               q
Ring lattice    200   [1, 100]   -               0.8
Random graph    200   1          [200, 19800]    0.6

Table 4.2: Parameters chosen for comparing language convergence behavior in response to graph density.
4.4 Parameter settings and Methods
I choose a random network with k = 1 and r = 714 and a small-world network with k = 13 and r = 102 for comparing equilibrium states in response to learning fidelity q. The density of all networks is the same except for the complete graph. The choices of graph parameters for this comparison are summarized in Table 4.1.

For testing the effect of graph density on language convergence, I choose a ring lattice and a random graph with varying k and r, respectively. Learning fidelity values are chosen so that strong cohesion does not occur at the graph density used for Experiment 1. The choices of graph parameters are summarized in Table 4.2.
Results are obtained from a numerical evaluation of the system described by equation (4.1), referred to as runs, using the fourth-order Runge-Kutta method [AS72]. I assume the population is approximately in an equilibrium state when the first and second derivatives of all grammar frequencies are below a predefined threshold.
I obtain the approximate steady-state frequencies of grammars for a given adjacency matrix A associated with a graph G and a random choice of initial state x(0), which satisfies \sum_i x_i(0) = 1. From the grammar frequencies we calculated the effective number of grammars n_e as a measure of grammar diversity, which is defined as [CK70]:

n_e = \left( \sum_{i} x_i^2 \right)^{-1}
The convergence time T_c is measured as the number of replication cycles taken to reach an approximate steady state. For each grammar network, the mean convergence time \bar{T}_c is calculated as the average number of replication cycles taken to reach equilibrium states over the range q = [0, 1].
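A self-contained sketch of a single run as described above: build Q from A (equation 4.2), integrate equation (4.1) with classical fourth-order Runge-Kutta from a random initial state, and stop once the state stops changing appreciably. The step size, tolerance, and stopping rule are illustrative stand-ins for the thresholds used in the actual experiments.

import numpy as np

def run(A, q, f0=1e-3, dt=0.1, tol=1e-9, max_steps=1_000_000, seed=0):
    A = np.asarray(A, dtype=float)
    n = len(A)
    off = A.copy()
    np.fill_diagonal(off, 0.0)
    Q = (1.0 - q) * off / off.sum(axis=1, keepdims=True)   # eq. (4.2)
    np.fill_diagonal(Q, q)

    def xdot(x):
        f = f0 + A @ x              # frequency-dependent fitness f_i = f0 + sum_j a_ij x_j
        phi = f @ x                 # average fitness; -phi*x keeps sum(x) constant
        return (x * f) @ Q - phi * x

    rng = np.random.default_rng(seed)
    x = rng.random(n)
    x /= x.sum()                    # random initial state with sum(x) = 1
    for step in range(1, max_steps + 1):
        k1 = xdot(x)
        k2 = xdot(x + 0.5 * dt * k1)
        k3 = xdot(x + 0.5 * dt * k2)
        k4 = xdot(x + dt * k3)
        dx = dt / 6.0 * (k1 + 2*k2 + 2*k3 + k4)
        x = x + dx
        if np.max(np.abs(dx)) < tol:
            break
    n_e = 1.0 / np.sum(x**2)        # effective number of grammars
    return x, n_e, step             # step plays the role of the convergence time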
4.5 Result 1: three equilibrium states
In Fig.4.2 we show the equilibrium frequency of the dominant grammar over a
range of q values for a complete graph and regular lattices with fixed n = 51
and a = 0.5. Each point is the result from a single run, but the q interval
(= 10−4 ) is small enough that the points appear as a line in places. The symmetric
solution where all grammars exist in equal frequencies of 1/n is a stable attractor
for q ≤ q_s. The asymmetric state where a single predominant grammar emerges becomes a stable attractor above a critical q value of q_a. Fig. 4.2(a) shows the bi-stability region, where both symmetric and asymmetric solutions are stable, in agreement with the results analyzed by Komarova et al. [KNN01, NKN01].
Figure 4.2: The dominant grammar frequency, x_max, vs learning fidelity q for a ring lattice with n = 51: (a) k = 25 (a complete graph), (b) k = 15, (c) k = 10. The q interval is 10^-4. The marked thresholds are (a) q_a = 0.965, q_s = 0.972; (b) q_s = 0.3522, q_a = 0.828; (c) q_a = 0.7956.
The ring lattice graph with k = 15 also has symmetric and asymmetric solutions, as shown in Fig. 4.2(b). In a complete graph, q_s is greater than q_a, allowing a bi-stability region where both symmetric and asymmetric solutions can be stable depending on the initial frequency distribution. For the ring lattice network, however, the symmetric state breaks at q_s well before the asymmetric solution can be stable. The symmetric state is not observed for the ring lattice network with
k = 10 as shown in Fig.4.2(c).
The learning fidelity threshold for the asymmetric solution, q_a, is highest for the complete graph and lower in networks with smaller k values (q_{a,k=25} = 0.9720 > q_{a,k=15} = 0.8280 > q_{a,k=10} = 0.7956).

Figure 4.3: The effective number of grammars, n_e, vs q: (a) k = 25 (complete graph), (b) k = 15, (c) k = 10.
Fig.4.3 shows the effective number of grammars for precisely the same runs
as Fig.4.2. In agreement with the results shown in Fig.4.2(a), Fig.4.3(a) shows
two classes of equilibrium states as well as a bi-stability region over a range of learning fidelities q_a ≤ q ≤ q_s for the complete graph. The grammar diversity n_e is at its maximum (= n) when a population is in the symmetric state, while n_e is at its minimum (≈ 1) when a single dominant grammar emerges. The value of n_e is
indicative of the degree of linguistic coherence.
Fig.4.3(b) suggests a third class of solutions which occurs at q values between
qs and qa for ring lattice networks. This class of stable attractors is characterized
by a nearly linear decrease in ne when qs ≤ q ≤ qa , as shown in Fig.4.3(b).
For the regular ring lattice with k = 10, the symmetric state does not exist for
q ≥ 0 as shown in Fig.4.3(c). The symmetric solution can still be obtained using
negative q values, but the interpretation of such values is not obvious. Fig.4.3(c)
also shows that some level of coherence can be achieved even with the learning
fidelity of 0 when k = 10.
In the regular ring lattice with k = 15, the frequencies of each grammar xi at
approximate equilibrium state for different learning fidelities are distributed as
shown in Fig.4.4. I cut the ring at the point opposite to the dominant grammar
and spread it along the x axis, so that the dominant grammar is always at the
center. If the grammar is positioned close to the dominant, the grammar index is
close to the index of the dominant, indicating that they are structurally similar.
Fig.4.4(a) shows the grammar frequency distribution when q = 0 as an example of a symmetric state. If the learning fidelity q is greater than qa , only one
grammar dominates as shown in Fig.4.4(c). We call this phase strong cohesion.
When the learning fidelity is between qs and qa , the grammar frequencies form
a smooth curve reminiscent of a Gaussian as shown in Fig.4.4(b). We call this
phase weak cohesion. In this phase the learning fidelity is too low for a single grammar to dominate by faithfully reproducing itself; however, structure in grammar space allows a collection of structurally similar grammars to rise in frequency. Since learning errors produce grammars similar to the teacher’s grammar, the effective learning fidelity for the group of grammars is higher. This is
analogous to the formation of a quasi-species in molecular evolution [ES79].
The grammar frequency distribution over the range of q for various topologies
is shown in Fig. 4.5. For each q value the grammar frequencies are sorted.

Figure 4.4: Frequency of all grammars, sorted by index number, of a single run at the steady state for a ring lattice with n = 51, k = 15, and a = 0.5: (a) q = 0, (b) q = 0.82, and (c) q = 0.83.
For all types of grammar network, a population is in the strong cohesion state if the learning fidelity exceeds the error threshold, as shown in Fig. 4.5. The observed error threshold is highest for the complete graph and lowest for the random graph. For the same network density, a small-world graph has a lower error threshold than a ring lattice but a higher threshold than a random graph.
Figure 4.5: Distribution of grammar frequencies at the steady state with n = 51 and a = 0.5 over the range 0 ≤ q ≤ 1. For each q value, the grammar frequencies are sorted. (a) complete graph, (b) ring lattice with k = 15, (c) random network with k = 1 and r = 714, and (d) small world with k = 13 and r = 102.
For a complete graph, a population is in the symmetric state if the learning fidelity is below the error threshold (Fig. 4.5(a)). In the regular ring lattice with k = 15, a weak cohesion phase is observed at mid-range learning fidelities. A population is in the symmetric state if the learning fidelity is sufficiently low for a ring lattice
(Fig.4.5(b)). I was unable to detect any obvious weak cohesion for a random
graph as shown in Fig.4.5(c).
4.6 Result 2: the relationship between grammar network density and language convergence
Fig.4.6 shows the effect of graph density on the level of coherence for a regular
ring lattice. I plotted (a) grammar diversity, (b) dominant grammar frequency,
and (c) minimum grammar frequency for a ring lattice with n = 200 and a = 0.5
given a fixed learning fidelity. Notice that the learning fidelity we used (q = 0.8)
is smaller than qa = 0.829 for a complete graph. The grammar diversity and
dominant frequency change in a non-linear fashion in response to network density. When a grammar network forms a complete graph (k = 100), the population is in the symmetric state (n_e = 200) as expected. When the density of a ring lattice is sufficiently high (k ≥ 83), the population is also in the symmetric state, and both x_max and x_min are equal to 1/n = 5 × 10^-3 as expected.

As the density of the graph decreases, grammar diversity n_e decreases, reflecting an increased level of linguistic coherence and indicating a weak cohesion state. When the network density is sufficiently small (k ≤ 10), the grammar diversity is close to 1 and the dominant frequency is over 0.5, which is indicative of strong cohesion.
Figure 4.6: Non-linear behavior of language convergence in response to network density: (a) grammar diversity n_e, (b) dominant frequency x_max, and (c) minimum frequency x_min, plotted for varying k = [1, 100] given a ring lattice with n = 200, a = 0.5, and a learning fidelity q = 0.8. The densities corresponding to k = 10 and k = 83 are marked in each panel.
Figure 4.7: Grammar diversity, dominant frequency, and minimum frequency in response to the network density of a random graph with k = 1 and r = 714 at a learning fidelity of q = 0.6. The densities d_a = 0.26935 and d_b = 0.29849 are marked in each panel.
Fig.4.7 shows the effect of graph density on the convergence for a random
network. I was unable to find weak cohesion conditions for a random graph. I
label da as the maximum graph density where only strong cohesion is a stable attractor. db is the graph density where strong cohesion becomes a stable attractor
but another equilibrium state is also stable.

Given a fixed learning fidelity value of q = 0.6, a random graph exhibits an approximately symmetric state if the graph density is sufficiently high. In other words, if the network density is sufficiently small, strong cohesion can be observed. In the grammar network context, if a grammar has a limited number of grammars that it can mutate into, and its number of transitions is relatively small compared to the number of overall possible transitions, the population may reach a common consensus.

Unlike a ring lattice, a random graph does not appear to have a perfectly symmetric state; x_max remains small but x_min does not remain at 1/n = 5 × 10^-3.
4.7 Result 3: rate of language convergence in grammar networks
Fig. 4.8 shows the convergence time, measured as the elapsed time for each grammar network to reach the approximate equilibrium state. The q regions where the convergence time is relatively long closely match the transition points between phases seen from monitoring x_max and n_e.
(You may compare the convergence time results for a complete graph and a ring lattice with the earlier results presented in Fig. 4.2 and Fig. 4.3.) Although the actual elapsed time may vary depending on the choice of integration step size or integration method, the relative time clearly shows that it takes more time to reach a steady state when the learning fidelity resides near a transition point.

Figure 4.8: Time (the number of replication cycles) taken to reach an equilibrium state for each topology: (a) complete graph, (b) regular ring lattice with k = 15, (c) random network with k = 1 and r = 714, and (d) small-world network with k = 13 and r = 102. A numerical summary of this figure is presented in Table 4.3.
Fig.4.8(b) shows a sharp peak in convergence time for a ring lattice with k =
15 around q = 0.35, where transition from symmetric to weak cohesion occurs,
and around q = 0.82, where transition from weak cohesion to strong cohesion
occurs. In contrast, a random network shows only one peak, a transition to strong
cohesion, as shown in Fig.4.8(c). For a small world network, the transition from
symmetric to weak cohesion and the transition from weak cohesion to strong
cohesion overlap over a broad region of q values as shown in Fig. 4.8(d), yet it is evident that two distinct peaks correspond to the transition point from the symmetric state to weak cohesion and the transition point from weak cohesion to strong cohesion.

Network Type           Network Density   Mean Convergence Time ± δ
Complete graph         1                 145.0 ± 200.5
Regular ring lattice   0.6               548.3 ± 1414.1
Random graph           0.6               205.6 ± 269.2
Small-world            0.6               885.0 ± 1154.1

Table 4.3: A brief summary of convergence times for the various grammar networks. A population reaches the steady state fastest in the complete graph and slowest in the small-world network. Convergence time is longer if the learning fidelity falls near a transition point.
Mean convergence times over the range q = [0, 1] for each grammar network are presented in Table 4.3. Overall, the mean convergence time is shortest for the complete graph and longest for the small-world network. Among the three graphs that have the same density, the random graph reaches equilibrium fastest.
4.8 Discussion
The dynamics of the evolutionary system defined by equation 4.1 are characterized by three possible equilibrium states: (1) the symmetric state (q ≤ qs ) where
xi = 1/n for all i, (2) a weak cohesion state where the symmetry in grammar
frequencies breaks and the distribution of grammar frequencies forms a roughly Gaussian shape centered around the most frequent grammar, and (3) a strong
cohesion state (q ≥ qa ) where a single predominant grammar emerges.
All grammar networks show distinct patterns of language convergence, with varying compositions of the three equilibrium states, namely the symmetric state,
weak cohesion states, and strong cohesion states. For the same density, a random
network will generally have a much smaller mean path length than a regular
graph. Thus the evolutionary dynamics of relatively dense random graphs much
more closely resemble the complete graph than a regular graph with the same
density.
For a grammar space defined by a mid to low density regular ring lattice, a
weak cohesion phase of equilibria can be identified at learning fidelities between
qs and qa . This region is below the error threshold for a complete graph where
no cohesion or evolution can take place. The existence of a weak cohesion phase
is dependent on structure in grammar space allowing the formation of a quasi-species of related grammars. While the learning fidelity in this region is too low
to allow a single grammar to faithfully reproduce itself well enough to maintain a
higher frequency than other grammars, the effective learning fidelity of the quasi-species as a whole is sufficiently high for the collection of grammars within it to
dominate.
The existence of the weak cohesion phase suggests that a group of related
grammars can emerge with lower learning fidelities than is required to establish
a single consensus. Weak cohesion is also characterized by a large amount of
standing heritable variation within the population, which is particularly intriguing from an evolutionary perspective.
Our results from varying graph density indicate that there are threshold density values for convergence. In general, as the density of the grammar network decreases, the level of linguistic convergence increases for a fixed learning fidelity.
We expect that this result holds generally for other structured graphs, although
the mean path length or clustering coefficient may be more indicative metrics of
structure in grammar space.
Recognizing the importance of social networks in language learning, individual based models have attracted attention as an experimental tool for language
evolution. In the course of this investigation, we found that linguistic relatedness
alone can affect the evolutionary dynamics.
The regular ring lattice is only an example of many different network structures. However, the qualitatively different dynamics with respect to the network
density suggest that convergence to a set of closely related languages is significantly easier for a structured grammar space, in contrast to conclusions based
only on the fully-connected model.
If a grammar space is structured as a complete graph, high fidelity learner-driven change, such as the sort exhibited by human languages, can only occur just above a critical error threshold where the bifurcation of strong cohesion and the symmetric state begins. The empirical results from other grammar networks show that this threshold is highest for the complete grammar network. Strong cohesion can be achieved in lower learning fidelity regions if the grammar space is structured, and weak cohesion may occur with still lower learning fidelity. Weak cohesion is of particular interest since a population in the weak cohesion state holds a large amount of standing heritable variation.
Different networks not only exhibit different patterns of language convergence, but also reach an equilibrium state at different speeds. The relative convergence time indicates that a population reaches a steady state more slowly when the learning fidelity is close to a transition point, regardless of the topology of the grammar network. The convergence time results suggest that there may be another critical error threshold where weak cohesion becomes a stable attractor but the symmetric solution is also stable. The study is far from predicting the state or course of
evolution of natural languages.
In technological applications where agents learn from each other and it is desirable for the overall system to converge, these results may provide a guide to
designing properties of the language or state representation depending on the degree of convergence desired. If it is sufficient that agents of the system just mostly
agree, i.e. converge to close variants of a dominant grammar, then a structured
state space may provide a way to achieve faster convergence at higher mutation values. However, if absolute convergence is required, the state space must
be designed such that minor variants are strongly selected against, producing a
sharp fitness peak. This constraint also implies that a critical mutation/learning
fidelity threshold exists.
Acknowledgments
We thank Gregory M. Kobele for insightful linguistic interpretation of our model. Special thanks to Prof. Emilio Frazzoli, Prof. Jeff S. Shamma, Prof. Edward P. Stabler,
and Prof. Charles E. Taylor. This work was supported by a MURI award administered by the US Air Force, No. F49620-01-1-0361, NSF (CNS 0513058, EF-0410438), and the
UCLA Center for Embedded Network Sensors. Any opinions, findings, and conclusions
or recommendations expressed in this publication are those of the authors and do not
necessarily reflect the views of the sponsoring agencies.
CHAPTER 5
Summary
In this thesis, I explored parallels between genetic inheritance and the evolutionary aspects of language systems. The population dynamics of language learners are addressed as a less abstract instantiation of the general case of agents learning from one another. I found that the topology of the population network and the language structure affect the evolutionary dynamics and influence the degree of linguistic
coherence and linguistic diversity.
My study shows that a structured grammar space, where a grammar can
mutate into only a limited set of related grammars, can allow coherence to emerge
under lower learning fidelity conditions than indicated by previous work. Further,
under some types of grammar networks, populations exhibit “weak cohesion”
where the distribution of grammar frequencies forms a roughly Gaussian shape,
reminiscent of a quasi-species in models of molecular evolution. Weak cohesion
is also characterized by a large amount of standing heritable variation within the
population, which is particularly intriguing from an evolutionary perspective. In a grammar network model, not only the error rate but also the network density affects the degree of coherence and diversity of language.
Frequency-dependent selection (FDS), often treated as synonymous with negative FDS, has been regarded as a mechanism for maintaining polymorphism in a population. On the contrary, the population dynamics under positive FDS has been of limited interest, partly due to its obvious dynamics leading to monomorphism. Under this type of selection, the fitness of an organism increases as its proportion in a population increases. Positive FDS, however, has gathered a fair amount of attention in the field of language evolution. Recent studies applying a biological perspective to the evolution of linguistic convergence and diversity have shown promising results. Investigations regarding the possibility of polymorphism under positive FDS reveal that a population may exhibit standing variation even under positive FDS if the population or grammar space is structured.
As part of a research project building adaptive sensor networks in a heterogeneous environment, researchers have considered the applicability of adaptive
language. Designing a machine language that is learnable and adaptive to a temporally and spatially varying environment is a big challenge. An evolutionary
strategy where nodes adopt successful schemes from their neighbors with a fitness bonus for agreement is a general option with great promise. Such a system
maps directly onto the linguistic systems I present.
References
[AB02]
R. Albert and A.-L. Barabási. “Statistical mechanics of complex networks.” Reviews of Modern Physics, 74:47, 2002.
[Aie98]
L. Aiello. “The foundations of human language.” In N. G. Jablonski and L. Aiello, editors, The origin and diversification of language.
University of California Press, 1998.
[AS72]
M. Abramowitz and I. A. Stegun, editors. Handbook of Mathematical
Functions with Formulas, Graphs, and Mathematical Tables. Dover,
New York, 9 edition, 1972.
[AT96]
T. Arita and C. E. Taylor. “A Simple Model for the Evolution of
Communication.” In L.J. Fogel, P. J. Angeline, and T. Bäck, editors,
The Fifth Annual Conference On Evolutionary Programming, pp. 405–
410, Cambridge, MA, 1996. The MIT Press.
[Bic81]
D. Bickerton. Roots of language. Karoma, Ann Arbor, MI, 1981.
[CAA06] C. E. Chen, A. Ali, W. Asgari, H Park, R. E. Hudson, K. Yao, and
Charles E. Taylor. “Design and testing of robust acoustic arrays for
localization and enhancement of several bird sources.” In Fifth International Conference on Information Processing in Sensor Networks.
ICIPSN, 2006.
[Cav97]
L. L. Cavalli-Sforza. “Genes, peoples and languages.” Proc. Natl. Acad. Sci. USA, 94:7719–7724, 1997.
[Cho59]
N. Chomsky. “A review of B. F. Skinner’s Verbal Behavior.” Language,
35:26–58, 1959.
[Cho65]
N. Chomsky. Aspects of the Theory of Syntax. The MIT Press, Cambridge, MA, 1965.
[Cho80]
N. Chomsky. Rules and Representations. Basil Blackwell, London,
1980.
[Cho95]
N. Chomsky. The minimalist program. MIT Press, Cambridge, MA,
1995.
[Cho02]
N. Chomsky. On Nature and Language. Cambridge University Press, 2002.
[CK70]
J. F. Crow and M. Kimura. An Introduction to Population Genetics
Theory. Harper & Row Publishers, New York, Evanston and London,
1970.
[CL93]
N. Chomsky and H. Lasnik. “The theory of principles and parameters.” In Syntax: An international handbook of contemporary research, pp. 1–32. Walter de Gruyter, Berlin, 1993.
[CMM92] L. L. Cavalli-Sforza, E. Minch, and J. L. Mountain. “Coevolution of
genes and languages revisited.” Proc. Natl. Acad. Sci. USA, 89:5620–
5624, 1992.
[CMP94] L. L. Cavalli-Sforza, P. Menozzi, and A. Piazza. The History and
Geography of Human Genes. Princeton University Press, 1994.
[CT04]
T. C. Collier and C. E. Taylor. “Self-organization in sensor networks.”
Journal of Parallel and Distributed Computing, 64(7), 2004.
[Daw89]
R. Dawkins. The Selfish Gene. Oxford University Press, 1989.
[Dea97]
T. W. Deacon. The Symbolic Species: The Co-evolution of Language
and the Brain. W.W. Norton, 1997.
[dHL03]
F. d’Errico, C. Henshilwood, G. Lawson, M. Vanhaeren, A.-M. Tillier,
M. Soressi, F. Bresson, B. Maureille, A. Nowell, J. Lakarra, L. Backwell, and M. Julien. “Archaeological Evidence for the Emergence of
Language, Symbolism, and Music — An Alternative Multidisciplinary
Perspective.” Journal of World Prehistory, 17(1):1–70, 2003.
[EMS89] M. Eigen, J. McCaskill, and P. Schuster. “The molecular quasi-species.” Adv. Chem. Phys., 75:149–263, 1989.
[ES79]
M. Eigen and P. Schuster. The hypercycle: A principle of natural
self-organization. Springer Verlag: Berlin, 1979.
[FHC05] W. T. Fitch, M. D. Hauser, and N. Chomsky. “The evolution of the
language faculty: Clarifications and implications.” Cognition, Forthcoming, 2005.
[Fox95]
A. Fox. Linguistic Reconstruction. Oxford University Press, Oxford,
UK, 1995.
[FS87]
W. Fontana and P. Schuster. “A computer model of evolutionary
optimization.” Biophysical Chemistry, 26:123–147, 1987.
[GHK00] N. Grassly, A. von Haeseler, and D. C. Krakauer. “Error, population
structure and the origin of diverse sign systems.” J. Theor. Biol.,
206:369–378, 2000.
[HCF02] M. D. Hauser, N. Chomsky, and W. T. Fitch. “The faculty of language:
what it is, who has it, and how did it evolve?” Science, 298:1569–
1579, 2002.
[HdM01] C. S. Henshilwood, F. d’Errico, C. W. Marean, R. G. Milo, and
R. Yates. “An early bone tool industry from the Middle Stone Age
at Blombos Cave, South Africa: implications for the origin of modern human behaviour, symbolism and language.” Journal of Human
Evolution, 41:631–678, 2001.
[HI95]
T. Hashimoto and T. Ikegami. “Evaluation of symbolic grammar
systems.” In Advances in Artificial Life, pp. 812–823. Springer-Verlag, Berlin, 1995.
[HI96]
T. Hashimoto and T. Ikegami. “Emergence of net-grammar in communicating agents.” BioSystems, 38:1–14, 1996.
[HJ96]
Hans H. Hock and Brian D. Joseph. Language History, Language
Change, and Language Relationship: An Introduction to Historical and
Comparative Linguistics. Mouton de Gruyter, Berlin, 1996.
[JL05]
E. Jablonka and M. J. Lamb. Evolution in Four Dimensions : Genetic,
Epigenetic, Behavioral, and Symbolic Variation in the History of Life.
The MIT Press, Cambridge, MA, 2005.
[Jm03]
G. Jäger. “Evolutionary Game Theory and Linguistic Typology: a
Case Study.” In Proceedings of the 14th Amsterdam Colloquium. ILLC,
2003.
[Jos00]
B. Joseph. “Historical linguistics.” In Mark Aronoff and Janie Rees-Miller, editors, The Handbook of Linguistics. Blackwell, Oxford, 2000.
[KBH99] Donald E. Kroodsma, Bruce E. Byers, Sylvia L. Halkin, Christopher
Hill, Dolly Minis, Jeffrey R. Bolsinger, Jo-Anne Dawson, Elizabeth
Donelan, Jeffrey Farrington, Frank B. Gill, Peter Houlihan, Doug
Innes, Geoff Keller, Linda Macaulay, Curtis A. Marantz, Jan Ortiz,
Philip K. Stoddard, and Krista Wilda. “Geographic variation in black-capped chickadee songs and singing behavior.” The Auk, 116(2):387–
402, 1999.
[Kir98]
S. Kirby. “Fitness and the selective adaptation of language.” In J. R.
Hurford, M. Studdert-Kennedy, and C. Knight, editors, Approaches to
the Evolution of Language: Social and Cognitive Bases, Cambridge,
UK, 1998. Cambridge University Press.
[Kir01]
S. Kirby. “Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity.”
IEEE Transactions on Evolutionary Computation, 5(2):102–110, 2001.
http://www.ling.ed.ac.uk/anonftp/pub/staff/kirby/skirbyieee.ps.
[KN01]
N. L. Komarova and M. A. Nowak. “The evolutionary dynamics of
the lexical matrix.” Bulletin of Mathematical Biology, 63(3):451–485,
2001.
[KNN01] N. L. Komarova, P. Niyogi, and M. A. Nowak. “Evolutionary dynamics
of grammar acquisition.” Journal of Theoretical Biology, 209(1):43–
59, 2001.
[Kro05]
D. E. Kroodsma. The singing life of birds: the art and science of
listening to birdsong. Houghton Mifflin Company, New York, 2005.
[KSC99] J. Kegl, A. Senghas, and M. Coppola. “Creation through Contact: Sign Language Emergence and Sign Language Change in
Nicaragua.” In Michel DeGraff, editor, Language Creation and Language Change: Creolization, Diachrony, and Development, pp. 179–237,
Cambridge, MA, 1999. The MIT Press.
[Lab72]
W. Labov. Language in the inner city: studies in the Black English
Vernacular. University of Pennsylvania Press, Philadelphia, Pennsylvania, 1972.
[Lau68]
W. S. Laughlin. “Hunting: an integrating biobehavior system and its
evolutionary importance.” In R. B. Lee and I. DeVore, editors, Man
the hunter. Aldine, Chicago, 1968.
[LCK05] Y. Lee, T. C. Collier, G. M. Kobele, E. P. Stabler, and C. E. Taylor.
“Grammar Structure and the Dynamics of Language Evolution.” In
European Conference on Artificial Life, 2005.
[LCS05a] Y. Lee, T. C. Collier, E. P. Stabler, and C. E. Taylor. “The role
of population structure in language evolution.” Artificial Life and
Robotics, 2005.
[LCS05b] Y. Lee, T. C. Collier, E. P. Stabler, and C. E. Taylor. “The role of
population structure in language evolution.” In The 10th International
Conference on Artificial Life and Robotics, Beppu, Oita, Japan, 2005.
[LCT06] Y. Lee, T. C. Collier, C. E. Taylor, and R. Olfati-Saber. “Emergence of Cohesion in Evolutionary Dynamics of Grammar Network.” Adaptive Behavior, 2006. Submitted.
[LF99]
D. Livingstone and C. Fyfe. “Modelling the Evolution of Linguistic Diversity.” In D. Floreano, J. Nicoud, and F. Mondada, editors, ECAL99,
pp. 704–708, Berlin, 1999. Springer-Verlag.
[LFH01]
C. S. Lai, S. E. Fisher, J. A. Hurst, F. Vargha-Khadem, and A. P.
Monaco. “A forkhead-domain gene is mutated in a severe speech and
language disorder.” Nature, 413(6855):519–523, 2001.
[LM91]
S. M. Lamb and E. Douglass Mitchell. Sprung from a Common Source:
Investigations into the Prehistory of Languages. Stanford University
Press, Stanford, California, 1991.
[LRC04] Yoosook Lee, Jason Riggle, Travis C. Collier, Edward P. Stabler, and
Charles E. Taylor. “Adaptive Communication among Collaborative
Agents: Preliminary Results with Symbol Grounding.” Artificial Life
and Robotics, 8:127–132, 2004.
[LS04]
R. F. Lachlan and M. R. Servedio. “Song Learning Accelerates Allopatric Speciation.” Evolution, 58(9):2049–2063, 2004.
[McM94] April M. S. McMahon. Understanding Language Change. Cambridge
University Press, Cambridge, UK, 1st edition, 1994.
[MCN03] Davide Marocco, Angelo Cangelosi, and Stefano Nolfi. “The emergence of communication in evolutionary robots.” Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 361(1811):2397–2421, 2003.
[MN04]
F. A. Matsen and M. A. Nowak. “Win-stay, lose-shift in language
learning from peers.” Proc. Natl. Acad. Sci. USA, 101(52):18053–
18057, December 2004.
[MN06]
Davide Marocco and Stefano Nolfi. “Emergence of communication in
teams of embodied and situated agents.” In Proceedings of the 6th
evolution of language conference, 2006.
[Niy06]
P. Niyogi. The Computational Nature of Language Learning and Evolution (Current Studies in Linguistics). The MIT Press, Cambridge,
MA, 2006.
[NKD99] M. A. Nowak, D. C. Krakauer, and A. Dress. “An error limit for the
evolution of language.” Proc. Roy. Soc. Lond. B., pp. 2131–2136, 1999.
[NKN01] M. A. Nowak, N. L. Komarova, and P. Niyogi. “Evolution of Universal
Grammar.” Science, 291:114–118, 2001.
[NKN02] M. A. Nowak, N. Komarova, and P. Niyogi. “Computational and
evolutionary aspects of language.” Nature, 417:611–617, 2002.
[NSH01] S. Nowicki, W. A. Searcy, M. Hughes, and J. Podos. “The evolution
of birdsong: male and female response to song innovation in swamp
sparrows.” Animal Behaviour, 62:1189–1195, 2001.
[NW99a] M. E. J. Newman and D. J. Watts. “Renormalization group analysis
of the small-world network model.” Physics Letters A, 263:341–346,
1999.
[NW99b] M. E. J. Newman and D. J. Watts. “Scaling and percolation in the
small-world network model.” Physical Review E, 60:7332–7342, 1999.
[Olf05]
R. Olfati-Saber. “Ultrafast Consensus in Small-World Networks.”
Proc. of American Control Conference, pp. 2371–2378, 2005.
[OM04]
R. Olfati-Saber and R. M. Murray. “Consensus problems in networks
of agents with switching topology and time-delays.” IEEE Trans. Automatic Control, 49(9):1520–1533, Sep. 2004.
[PB90]
S. Pinker and P. Bloom. “Natural language and natural selection.”
Behavioral and Brain Sciences, 13:707–784, 1990.
[Pin94]
S. Pinker. The Language Instinct. Penguin, London, 1994.
[Pin95]
S. Pinker. The Language Instinct. HarperPerennial, New York, NY,
1995.
[Rue94]
M. Ruhlen. The Origin of Language. John Wiley & Sons Inc., New
York, 1994.
[SBK03] K. Smith, H. Brighton, and S. Kirby. “Complex Systems in Language
Evolution: the cultural emergence of compositional structure.” Advances in Complex Systems, 6(4):537–558, 2003.
[SC01]
A. Senghas and M. Coppola. “Children creating language: How
Nicaraguan Sign Language acquired a spatial grammar.” Psychological
Science, 12:323–328, 2001.
[SCJ05]
Weiguo Shu, Julie Y. Cho, Yuhui Jiang, Minhua Zhang, Donald
Weisz, Gregory A. Elder, James Schmeidler, Rita De Gasperi, Miguel
A. Gama Sosa, Donald Rabidou, Anthony C. Santucci, Daniel Peri,
Edward Morrisey, and Joseph D. Buxbaum. “Altered ultrasonic vocalization in mice with a disruption in the Foxp2 gene.” Proc. Natl.
Acad. Sci. USA, 102(27):9643–9648, 2005.
[SH95]
J. Maynard Smith and D. G. C. Harper. “Animal Signals: Models and
Terminology.” Journal of Theoretical Biology, 177:305–311, 1995.
[SK98]
L. Steels and F. Kaplan. “Stochasticity as a Source of Innovation in
Language Games.” In C. Adami, R. Belew, H. Kitano, and C. Taylor,
editors, Proceedings of Artificial Life VI, pp. 368–376, Cambridge, MA,
June 1998. The MIT Press.
[SKB03] K. Smith, S. Kirby, and H. Brighton. “Iterated Learning: a framework
for the emergence of language.” Artificial Life, 9(4):371–386, 2003.
[Smi61]
A. Smith. “Considerations concerning the first formation of languages,
and the different Genius of original and compounded languages.” In
James R. Otteson, editor, Adam Smith: Selected Philosophical Writings. Imprint Academic, 1761.
[Smi02]
K. Smith. “Natural selection and cultural selection in the evolution of communication.” Adaptive Behavior, 10(1):25–44, 2002.
http://www.ling.ed.ac.uk/~kenny/publications/ab_paper.ps.gz.
[SR93]
S. A. Shackleton and L. Ratcliffe. “Development of song in hand-reared
black-capped chickadees.” Wilson Bulletin, 105(4):637–644, 1993.
[SS99]
J. Maynard Smith and E. Szathmary. The origins of life: From the
Birth of Life to the Origin of Language. Oxford University Press, 1999.
[SS02a]
H. Slabbekoorn and T. B. Smith. “Bird song, ecology and speciation.”
Phil. Trans. R. Soc. Lond. B, 357:493–503, 2002.
[SS02b]
H. Slabbekoorn and T. B. Smith. “Habitat-dependent song divergence
in the little greenbul: an analysis of environmental selection pressures
on acoustic signals.” Evolution, 56(9):1849–1858, 2002.
[SSP03]
Michael D. Sorenson, Kristina M. Sefc, and Robert B. Payne. “Speciation by host switch in brood parasitic indigobirds.” Nature, 424:928–
931, 2003.
[sta97]
NIH staff. “NIH Pub. No. 97-4257: Aphasia.”, 1997.
http://www.nidcd.nih.gov/health/voice/aphasia.asp.
[Sta05]
Economist Staff. “Endangered Languages: Babel runs backwards.”
Economist, 374(8407), January 2005.
[Ste96]
L. Steels. “Self-organizing vocabularies.” In C. Langton and K. Shimohara, editors, Artificial Life V, pp. 179–184, Nara, Japan, 1996.
http://arti.vub.ac.be/steels/alife96.ps.
[Ste01]
L. Steels. “Language games for autonomous robots.” IEEE Intelligent
systems, pp. 17–22, October 2001.
[Ste03]
L. Steels. “Social Language learning.” In M. Tokoro and L. Steels,
editors, The Future of Learning, pp. 133–162, Amsterdam, 2003. IOS
Press.
[Ste06]
L. Steels. “Semiotic Dynamics for Embodied Agents.” IEEE Intelligent Systems, 21(3):32–38, Jan–Feb 2006.
[Sut03]
W. J. Sutherland. “Parallel extinction risk and global distribution of
languages and species.” Nature, 423:276–279, 2003.
[TSH93]
P. L. Tubaro, E. T. Segura, and P. Handford. “Geographic variation
in the song of the Rufous-collared sparrow in eastern Argentina.” The
Condor, 93:588–593, 1993.
[VWA95] F. Vargha-Khadem, K. Watkins, K. Alcock, P. Fletcher, and R. Passingham. “Praxic and nonverbal cognitive deficits in a large family with
a genetically transmitted speech and language disorder.” Proc. Natl.
Acad. Sci. USA, 92:930–933, 1995.
[VWP98] F. Vargha-Khadem, K. E. Watkins, C. J. Price, J. Ashburner, K. J.
Alcock, A. Connelly, R. S. J. Frackowiak, K. J. Friston, M. E. Pembrey,
M. Mishkin, D. G. Gadian, and R. E. Passingham. “Neural basis of an
inherited speech and language disorder.” Proc. Natl. Acad. Sci. USA,
95:12695–12700, 1998.
[WL68]
S. L. Washburn and C. S. Lancaster. “The evolution of hunting.” In
R. B. Lee and I. DeVore, editors, Man the hunter. Aldine, Chicago,
1968.
[WS98]
D. J. Watts and S. H. Strogatz. “Collective dynamics of ’small-world’
networks.” Nature, 393:440–442, June 1998.
[YS93]
H. Yanco and L. A. Stein. “An Adaptive Communication Protocol for
Cooperating Mobile Robots.” In J.-A. Meyer, H.L. Roitblat, and S.W.
Wilson, editors, From Animals to Animats 2: Proceedings of the Second
International Conference on the Simulation of Adaptive Behavior, pp.
478–485, Cambridge, MA, 1993. The MIT Press/Bradford Books.