Genetic and linguistic diversity: Global distribution and implications for
prehistory
Daniel Nettle
Centre for Behaviour & Evolution
Newcastle University
Abstract
Whilst some have claimed that languages and genes evolve in tandem within the
human population, data on genetic diversity show that this is not generally
the case. Human genetic diversity is greatest within, and reduces with distance
from, Africa. This pattern arose from serial founder effects as an African source
population colonised the rest of the globe. Diversity of language families is rather
low in Africa and Eurasia, and highest in Oceania and the Americas. I suggest
that this is because language-family diversity is most heavily conditioned by
homogenisation associated with agricultural expansions in the Holocene. Such
expansions affected more of the land masses of Africa and Eurasia than of
Oceania or the Americas. I argue that the different patterns of diversity found in
genes and languages are to be expected, since their mechanisms of transmission
are so different, with language fast to mutate but potentially slow to diffuse, and
genes slow to mutate but fast to diffuse.
1. Genetic and linguistic diversity: The ‘intrinsic relation’?
In 1988, Cavalli-Sforza and colleagues published a paper purporting to show
‘considerable parallelism between genetic and linguistic evolution’ in the human
population (Cavalli-Sforza et al. 1988: 6002). The centre-piece of the paper was a
figure showing two trees, facing each other at their branch tips. That on the left
represented the genetic affiliations of major human populations (using classical
autosomal markers), whilst that on the right was a diagram of the relationships of
the languages that those populations speak. This idea proved highly influential: at
time of writing, the paper has been cited 446 times on Web of Science. Other
authors picked up the idea that genetic and linguistic diversification in humans go
in lock-step, suggesting that ‘there is an intrinsic relation between genetics and
language’ (Chen, Sokal & Ruhlen 1995: 607), and that this discovery might
herald a ‘new synthesis’ of genetics and linguistics with archaeology to give a
unified account of human population history (Renfrew 1992). The idea is also of
interest to those who study cultural evolution more generally, since the thrust of
this idea was that cultural and genetic transmission mechanisms operated in
tandem over the generations.
In the twenty years since the publication of Cavalli-Sforza et al.’s paper,
molecular genetics has indeed provided a wealth of new inferences about human
population history, inferences that have been quite successfully married with the
archaeological evidence. However, the parallelism of linguistic and genetic
diversity has not generally survived closer examination. In fact, as I will argue in
this paper, human genetic and linguistic diversity have an almost diametrically
opposite distribution at the global scale (though there are local instances of
parallelism). This finding is actually not surprising given what we know about the
mechanisms generating and maintaining diversity in the two cases; as I will
argue, intra-population linguistic differences are quick to arise and slow to be
abolished, whereas genetic differences are slow to arise and quick to be
abolished. This means that genetic and linguistic diversity are usually informative
about population events at different time depths. They are thus both useful
markers of population history, but they do not always tell us the same thing.
In section 2, I describe what the Cavalli-Sforza et al. (1988) diagram actually
meant, and why it was not evidence of parallel genetic and linguistic evolution.
Section 3 examines the patterns of continental-scale diversity in genetic and
linguistic systems, and looks at what these may be telling us about history.
Section 4 considers the mechanisms of linguistic and genetic transmission more
generally, and considers why we might expect the two types of evolution to
become decoupled.
2. What the Cavalli-Sforza diagram actually means
What does the famous diagram from Cavalli-Sforza et al. (1988) actually
indicate? We will concentrate on the left (genetic) side and the right (linguistic)
side in turn.
The left side shows a diagram of the inter-population genetic distances of 42
world populations. The African populations are genetically closest to each other.
All the non-African populations are closer to each other than they are to any
African population, and within the non-African populations, there are relationships
which seem easily interpretable: populations cluster by land-mass, and the
cluster of native American populations is close to the cluster of East Asian
populations, which makes sense given other evidence of a trans-Bering human
entry into the Americas.
The most striking signal in this pattern, then, is that of geography. Populations
are similar to each other more or less in proportion to their geographical proximity, and
this is true at all scales, both within and between major landmass groups.
Subsequent work has amply confirmed that geography is the best predictor of
genetic affiliations between human groups. For example, using the modern
molecular data on around 1000 individuals from the Human Genome Diversity
Project, the geographical distance (by navigable routes) between the residences
of 2 individuals accounts for a staggering 75% or more of the variance of the
genetic distance between them (Ramachandran et al. 2005, Liu et al. 2006,
Handley et al. 2007). Thus, human genetic diversity is largely clinally distributed
along the axes of geography. There is some debate about whether this cline is
smooth, or shows abrupt discontinuities which would allow clusters of the
population to be identified (Rosenberg et al. 2005), but it is clear that these
cluster boundaries account for at most an extra ~2% of the variation (Handley et
al. 2007) above and beyond the continuum of distance.
This geographical ordering could have arisen in two ways. First, assume that
human beings arose in one location and spread outwards from there by repeated
episodes of local colonisation, in each of which, a small subset of the source
population moved to the adjacent empty location. The colonising subset would
not be a representative sample of the source, and so there would be some
measurable allele-frequency distance between source and colony. Some time
later, a subset of this subset colonises out to the next adjacent location. If this
process is repeated many times, a geographical-genetic distance association is
created, because the further apart two populations are, the greater the number of
colonisation events separating them, and thus the greater the opportunity for
founder effects and drift during colonisation to generate genetic differences
between them. This process is called the ‘serial founder effect’ (Ramachandran et
al. 2005).
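The way repeated founding events erode diversity can be illustrated with a minimal sketch. It assumes, purely for illustration, that each colonisation event acts like a single generation of random sampling of a fixed number of diploid founders, so that expected heterozygosity falls by the standard drift factor (1 - 1/(2N)) per event. The function name, the source heterozygosity of 0.80 and the figure of 50 founders are hypothetical choices, not estimates from the studies cited.

```python
def heterozygosity_along_chain(h0, founders, n_colonies):
    """Expected heterozygosity in the source and in each successive colony.

    Each founding event is treated as one generation of random sampling of
    `founders` diploid individuals, which multiplies expected heterozygosity
    by (1 - 1 / (2 * founders)). Mutation and migration are ignored.
    """
    loss = 1.0 - 1.0 / (2.0 * founders)
    return [h0 * loss ** k for k in range(n_colonies + 1)]


# Hypothetical values: source heterozygosity 0.80, 50 founders per event.
chain = heterozygosity_along_chain(h0=0.80, founders=50, n_colonies=100)
for k in (0, 25, 50, 100):
    print(f"colony {k:3d}: expected heterozygosity = {chain[k]:.3f}")
```

Under these assumptions, expected diversity falls geometrically with the number of founding steps away from the source, so populations many steps apart differ more than neighbouring ones.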
The second reason for a relationship between genetics and geography is gene
flow once populations are established. Genetic distance between populations does
not only increase with time when populations are isolated; it also decreases with
time when there is inter-marriage. Given that the likelihood of inter-marriage
declines with distance, this is a powerful mechanism for creating genetic
affiliations that are proportional to distance. We thus have two possible
mechanisms for the creation of geographical clines: serial founder effects and
subsequent admixture. Though both probably occur, there are good reasons for
believing serial founder effects to be overwhelmingly important in accounting for
the pattern of human genetic diversity (see section 3).
Turning to the linguistic side of the Cavalli-Sforza et al. (1988) diagram, the
languages of the 42 populations are organised into 16 higher-scale phyla, which
in turn converge to a single node. The positions of the 16 linguistic phyla relative
to each other mirror the genetic distances of the populations to each other
(African phyla closest to other African phyla, ‘Amerind’ closest to East Asian
languages etc.).
To appreciate why this is problematic, some background is needed. Linguists
agree that many of the world’s languages can be grouped into families, in which
there is such systematic correspondence across the basic lexicon that the only
plausible interpretation is that the two languages have sprung from a common
source, and evolved by cultural descent with modification. Note that not all
linguistic affinities imply phylogenetic relatedness of this kind. There is also plenty
of change which arises through language contact and the consequent
transmission of lexical or grammatical items piecemeal across linguistic
boundaries (Dixon 1997, Campbell 1998).
Note also that not all linguistic phylogenetic relationships can be resolved.
Ongoing processes of change within language erode the characteristic traces of
common descent, such that within a period often assumed to be less than 10,000
years (Nichols 1992), it becomes impossible to detect, from three languages,
which two are more closely related than the third. Thus, beyond a certain point,
there is no accepted evidence that would justify preferring any linguistic
phylogeny over any other. There is some debate about the point at which this
impenetrable fog sets in, but an influential historical linguist represents the
consensus when she argues that there are around 150 linguistic phyla in the
world of such a depth that although their internal branching structure can be
resolved, no higher phylogenetic arrangement of them can be justifiably preferred to
any other (Nichols 1992). These phyla (henceforth, stocks) contain anything from
1 to 1000 or more languages.
This is important, because most of the 16 main linguistic phyla in the
Cavalli-Sforza et al. diagram are stocks of this type. In the diagram, they are ordered
with the four African stocks next to each other, then all the Eurasian ones next to
one another, and so on for each land mass. However, as mentioned, there is no
linguistic reason for preferring this ordering to any other of the 2 x 10^13 possible
orderings. The only reason that this ordering is the most sensible is
geographical; this particular ordering maintains the relative geographic position of
the major stocks. However, since we know that genetic distances primarily reflect
geographical adjacency, then ordering the linguistic stocks geographically
pretty well guarantees a high degree of congruence between the two sides of the
diagram, without there really being any deep parallel between linguistic and
genetic diversity.
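The count of possible orderings quoted above is consistent with the number of left-to-right arrangements of 16 items, that is, 16 factorial. The short check below assumes that this is how the figure was obtained; the variable names are illustrative.

```python
import math

# Number of linguistic phyla at the tips of the diagram, as stated in the text.
n_phyla = 16

# Every left-to-right arrangement of the phyla is one permutation.
orderings = math.factorial(n_phyla)
print(f"{orderings:,} orderings, i.e. about {orderings:.1e}")
```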
This problem is exacerbated by the fact that three of the 16 linguistic phyla on
the diagram are not accepted as unitary by many historical linguists. ‘Amerind’
represents a proposal by Greenberg (1987) to unify many dozens of distinct
language families in the Americas into one deep stock on the basis of fragmentary
lexical evidence, but is not accepted as convincing by historical linguists (Nichols
1992, Dixon 1997, Mithun 1999). ‘Indopacific’ and ‘Australian’ also represent
lumpings together of what are more usually regarded as numerous independent
families, whose higher level phylogeny has not been demonstrated (Foley 1986,
Nichols 1992, Dixon 1997). Indeed, doubts have been expressed about the utility
of phylogenies of languages in Australia, where there is evidence of very ancient
and regular diffusion of lexical items amongst small, fluid social groups (Dixon
1997). Thus, the Cavalli-Sforza diagram further exaggerates the congruence of
linguistic and genetic diversity by assigning key genetically homogenous
populations to a single linguistic stock, when the consensus in linguistics is that
these populations actually contain rather great linguistic diversity.
In short, the apparent congruence of linguistic and genetic diversity at the global
scale in Cavalli-Sforza et al. (1988) appears to be illusory. At the more regional
scale, correspondences between linguistic and genetic affiliation, even when
controlling for geographical distance, have sometimes been found and sometimes
not (Nettle & Harriss 2003). Thus, tandem genetic and linguistic evolution can
certainly occur. What is less clear is that it is the global norm.
This is a difficult question to address, since although useful techniques are
emerging for placing human populations into phylogenetic orderings
based on genetic evidence, even from the recombining portion of the genome
(Hellenthal, Auton & Falush 2008), there is no generally accepted way of drawing
a phylogeny of the world’s languages that goes deeper than the 150 independent
stocks. This chapter therefore takes a different approach, concentrating instead
on characterising the degree of genetic and linguistic diversity within each of the
major continents, to examine the extent to which genetically diverse continents
are also linguistically diverse ones.
3. Genetic and linguistic diversity at the continental level
In this section, I ask what the extent of diversity within the major continents is,
in linguistic and in genetic terms. For the genetics, this is tantamount to asking
how much two individuals chosen at random from the population are likely to
differ from one another in the genetic system under study. For language, it is not
easy to produce a continuous measure of how different the languages of two
randomly chosen individuals would be. However, we can ask, for two randomly
chosen locations on the continent, what is the probability that the same language
will be spoken there (i.e. how many languages are there relative to the land
area?), and what is the probability that the languages spoken there will belong to
the same stock (i.e. how many linguistic stocks are there relative to the land
area?).
Genetic data
Continent-by-continent diversity data are available from published studies for a
number of genetic systems: mitochondrial DNA, Y chromosome haplotypes,
autosomal microsatellites and haplotypes from a non-recombining section of
chromosome 21 (table 1). For details of the individuals sampled and the
measures, the reader is referred to the original references. All of the data sources
sample a substantial number of individuals from multiple locations within each
continent, and in each case the data presented here are in the form of some kind
of average genetic distance between individuals (in terms of pairwise site
differences for the mtDNA molecule, and FST or some formally similar genetic
distance for other systems). There are two similar data sources for the Y
chromosome, and both have been included for comparison. Note that due to the
different measures (and also the different mutation and recombination rates), we
cannot compare the levels of diversity across the genetic systems. However, we
can look at the rank order of continents relative to each other for each system.
The continental breakdown differs slightly from study to study (e.g. in separating
South from East Asia, or Australia from New Guinea), and so I have adopted the
broadest common denominator and taken the mean value of component regions
where necessary. No major conclusion is affected by this procedure.
Linguistic data
Total number of languages for each continent was extracted from Grimes (2000),
and of linguistic stocks from Nichols (1992), to which the reader is referred for
details of what constitutes a stock. As well as the absolute counts, I present the
numbers per million km2.
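The per-area figures are simple densities: a count divided by land area in millions of km2. As a rough consistency check, and assuming that the same land area underlies both density columns of table 2, the area implied by the language density can be used to recompute the stock density. The values are copied from table 2; the names are illustrative only.

```python
# Values as reported in table 2:
# (languages, languages per Mkm2, stocks, stocks per Mkm2)
LANGUAGE_DATA = {
    "Africa":   (2011, 66.90, 20, 0.67),
    "Asia":     (2165, 48.58, 22, 0.49),
    "Europe":   (225, 22.64, 6, 0.60),
    "Oceania":  (1302, 169.31, 46, 5.98),
    "Americas": (1000, 23.76, 157, 3.73),
}


def implied_area(count, density):
    """Land area (millions of km2) implied by a count and its density."""
    return count / density


recomputed = {}
for continent, (langs, lang_density, stocks, stock_density) in LANGUAGE_DATA.items():
    area = implied_area(langs, lang_density)
    recomputed[continent] = stocks / area
    print(f"{continent:<9} area ~ {area:5.1f} Mkm2, "
          f"stocks/Mkm2 = {recomputed[continent]:.2f} "
          f"(reported {stock_density:.2f})")
```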
Results and discussion
The data are shown in table 2. For all the genetic systems without exception,
there is more diversity within Africa than in any other continent. This is a
well-known finding, and relates to Africa’s role as the oldest population and the source population for
humankind (Cann, Stoneking & Wilson 1987, Bowcock et al. 1994, Ramachandran
et al. 2005). The other continents are not in a consistent order across all
systems, but there is a tendency for the Americas to be the most homogenous
population. Thus, the findings shown here agree with the broad consensus in
human genetics that patterns of intra-population diversity fit well with the model
of a source population in Africa from which there was serial founding of colonies
spreading over the rest of the globe and reaching the Americas last (Hellenthal et
al. 2008). The serial founder model specifically predicts that internal genetic
diversity within populations will decline the further those populations are from the
African origin, and genetic data generally accord with this prediction. Indeed,
studying data from the Human Genome Diversity Project at a finer spatial scale
than those reported here, Ramachandran et al. (2005) show that 76% of the
variation in intra-population genetic diversity in humans is explained by land
distance from Addis Ababa, with East Africans the most diverse and native South
Americans the most homogenous.
The language data uncorrected for land area are not especially revealing. Asia has
the most languages, and Europe the fewest, but then Asia is the largest continent
and Europe the second smallest. Correcting for land area, a different pattern
emerges. Africa and especially Oceania are relatively diverse for their size, whilst
Europe and the Americas are less so. I have argued at length elsewhere that
language diversity reflects the scale of organisation of the subsistence economy
in recent times (Nettle 1998a, 1999a). Africa and Oceania (especially New
Guinea) have many small languages because they have many small subsistence
economies, facilitated by the low level of modern economic development, the
recency of state formation, and equatorial climates whose lack of seasonality
minimises the need for exchange and whose disease burdens encourage limited
dispersal (Fincher & Thornhill 2008). Europe is at the opposite extreme, with
seasonal production and large-scale market exchange over many hundreds of
years, and early state formation. Asia is a mixture of a more New Guinea-like
situation in Southeast Asia and Indonesia, with a more Europe-like situation in
East Asia. The relatively low language diversity of the Americas probably reflects
post-contact extinction; many languages were lost in the population collapses
after European arrival.
The stock diversity, both uncorrected and corrected for land area, shows a
different pattern again. Eurasia and Africa are rather poor in stocks, whereas
Oceania and the Americas are around an order of magnitude more diverse. To
understand why this might be the case, we need to consider that the best
available evidence suggests that the large stocks with which we are familiar, such
as Niger-Congo (Bantu), Indo-European, and Austronesian, appear to have been
spread by large-scale demographic expansions within the last ten thousand years,
often driven by the expansion of agricultural production systems (Renfrew 1987,
Diamond 1994, Nettle 1999a). More often than not, these expansions were into
already-inhabited areas, whose foraging populations were incorporated
biologically but whose culture, including language, left no trace. Thus, when
we observe the relatively low linguistic diversity of Africa and Eurasia, we are
observing the homogenising events of the Holocene, with its high-density,
fast-growing food-producing populations expanding and overlaying the previous
pattern of diversity. These homogenising events had much greater impact in
Africa and Eurasia, where a few centres of food production led to expansions
across much of those continents’ East-West axis, and significant cultural
homogenisation. In the Americas, there were transitions to food production, but
they did not easily spread along the predominantly North-South axis of the
continent, and a mixture of separated farming and foraging populations, with all
the cultural diversity they represent, remained. In Oceania, there were only
limited transitions to food production (none at all in Australia), which, coupled
with the challenging island and mountain geography, has allowed many
populations to persist unhomogenised.
Thus, far from working in tandem, genetic and linguistic diversity show quite
different patterns and reflect processes working at different temporal depths.
Lowered genetic diversity tells us about founder effects stemming from
colonisation events that may have happened forty thousand years ago. Lowered
diversity of linguistic stocks tells us that there have been homogenising
processes, such as large scale demographic expansions associated with
agriculture, within the past ten thousand years. Lowered diversity of languages
tells us about the pattern of economic organisation probably within the last five
hundred years (that small groups have been incorporated into a wider regional
system, for example).
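One informal way to look at the continental rank orders is to compute a rank correlation between each genetic measure and stock density. The sketch below does this with Spearman's coefficient on the table 2 values; it is an illustration only, not an analysis reported in the sources, and with just five continents the coefficients carry very little statistical weight. The helper names are hypothetical.

```python
# Continent order: Africa, Asia, Europe, Oceania, Americas (values from table 2).
GENETIC = {
    "mtDNA":      [76.7, 36.7, 24.7, 41.8, 36.2],
    "Y chr. 1":   [0.88, 0.78, 0.74, 0.72, 0.56],
    "Y chr. 2":   [61, 37, 39, 41, 18],
    "Chr. 21":    [73, 50, 48, 71, 41],
    "Microsats.": [0.81, 0.69, 0.73, 0.64, 0.59],
}
STOCKS_PER_MKM2 = [0.67, 0.49, 0.60, 5.98, 3.73]


def ranks(values):
    """Rank values from 1 (lowest) to n (highest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho using the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


rho = {name: spearman(vals, STOCKS_PER_MKM2) for name, vals in GENETIC.items()}
for name, value in rho.items():
    print(f"{name:>10} vs stocks/Mkm2: rho = {value:+.2f}")
```

On these figures none of the genetic systems shows the strongly positive rank correlation with stock density that tandem diversification would lead one to expect.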
4. Transmission mechanisms and decoupling of language and genes
The previous section showed that genetic traits and a cultural trait – language –
can become radically decoupled during evolution, such that the greatest diversity
in language is found where there is the least diversity in genes. This section
considers in a little more detail how this decoupling can happen, given that both
language and genes pass from generation to generation through local
interactions.
Both genetic and cultural change are characterised by some kind of innovation
and some kind of diffusion. In the genetic case, the source of innovation is
random mutation, whereas in the linguistic case, the source is idiosyncrasies,
random or otherwise, in the processes of language acquisition and use. Rates of
new genetic mutation are generally fairly low (although they differ markedly
across the genome), and rates of cultural innovation are probably much higher as
a rule.
For genetics, the mechanism of diffusion is sexual reproduction. If an individual
mates, then because of fair meiosis, any mutation that individual is carrying has a
more or less equal chance of appearing in the offspring, and (assuming it has no
dramatic impact on fitness either way), the same chance as other alleles of being
diffused into further generations. This ready diffusion means that, in classical
population genetic models, even a single migrant per generation between two
genetic populations is enough to make the gene pools of those two populations
converge. The process of cultural diffusion is likely to be quite different. The
learner does not sample from just two individuals (its parents), but is potentially
exposed to a wide range of cultural models. Exactly how these models are
sampled and their input incorporated is not understood in detail, and probably
varies. Proposals include a conformist bias (adopt the most frequent cultural
variant in the surroundings), or a prestige bias (adopt the variant of the
individuals with the highest status locally; Boyd & Richerson 1985; for simulations
of the effects of different transmission rules in different social networks see Nettle
1999a, b). Both conformist and prestige-biased transmission produce dynamics in
the cultural case that are quite different from those in the genetic case.
For example, consider a sub-group of 10 individuals with variant A which is
incorporated into a larger population of 90 individuals that carry variant a. There
is panmixis and no difference in fitness between the variants. In the genetic case,
where A and a are alleles of a gene, then after a generation or two, by simple
Hardy-Weinberg calculations, the allele frequencies in the new population will be
1:9 A:a. Now consider the cultural case. If the cultural trait is transmitted by
conformist learning, then all learners, even if they are children of A parents, will
encounter more instances of a than A, and thus the frequencies after a generation
or two will be 0:10. On the other hand, if learning is prestige-biased, then the
frequencies might be either 0:10 or 10:0, depending on whether the 10
individuals with variant A were coming into the group with high or with low local
status.
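The three outcomes in this example can be written down as a small sketch. It treats each transmission rule in its most idealised, deterministic form (every learner follows the rule without error), which is a simplifying assumption made here for illustration; the function and parameter names are hypothetical.

```python
def next_generation(n_A, n_a, rule, high_status="a"):
    """Expected frequency of variant A among the next generation of learners.

    rule = "genetic":    variants are inherited from randomly mating parents
                         with no selection, so the frequency is unchanged.
    rule = "conformist": every learner adopts the locally commonest variant.
    rule = "prestige":   every learner adopts the variant carried by the
                         high-status group, named by `high_status`.
    """
    freq_A = n_A / (n_A + n_a)
    if rule == "genetic":
        return freq_A
    if rule == "conformist":
        if freq_A == 0.5:  # no majority to conform to
            return freq_A
        return 1.0 if freq_A > 0.5 else 0.0
    if rule == "prestige":
        return 1.0 if high_status == "A" else 0.0
    raise ValueError(f"unknown rule: {rule}")


# The example from the text: 10 carriers of A join 90 carriers of a.
print("genetic:           ", next_generation(10, 90, "genetic"))
print("conformist:        ", next_generation(10, 90, "conformist"))
print("prestige (A high): ", next_generation(10, 90, "prestige", high_status="A"))
print("prestige (a high): ", next_generation(10, 90, "prestige", high_status="a"))
```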
The implication of the foregoing considerations is the following. Genetic
innovations arise only slowly, because the rate of mutation is low. This is why
non-African populations still show reduced diversity from serial founder effects
tens of thousands of years after their origins. On the other hand, genetic innovations
that are not too deleterious to fitness diffuse readily. This is why the great Bantu
expansions in the African Holocene did not manage to reduce Africa’s internal
genetic diversity. As long as a few individuals from the non-Bantu populations
swallowed up in the expansion managed to have children, their genetic legacy has
a good chance of remaining in the continental mix. In the cultural/linguistic case,
innovations can arise rapidly, explaining why the Americas can have produced
dozens of quite different language families in what may be around 15 thousand
years of habitation. However, linguistic innovations do not diffuse so readily. If
speakers of non-Bantu languages were subsumed by Bantu populations
substantially larger, or of higher status, than they were, then their cultural traits
would disappear utterly, and their biological children would be speaking Bantu
languages.
These different transmission mechanisms help explain why the patterns of
linguistic and genetic diversity on the landscape are so different. Genetic diversity
is generally arranged along smooth clines, with only a tiny amount of abrupt
discontinuity, and even this corresponds to physical barriers rather than ethnic
boundaries (Rosenberg et al. 2005, Handley et al. 2007). By contrast, a glance at
a linguistic atlas will show that in most rural areas of the world, people speak one
language exactly to the point where they suddenly speak another. There may of
course be multilingualism, but in these cases the several codes are nonetheless
kept largely distinct in the speakers’ minds. Why the marked difference? In the
genetic case, because of genes’ ready diffusion, a few local migrants per
generation are enough to smooth the discontinuities away. In the linguistic case,
people generally belong to, and accord status within, one particular core social
network or another, and a point on the landscape is reached where we have
crossed from the sphere of influence of one to that of the next. These social
structures are the conduits of transmission for language, and incomers must
usually conform to incumbent prestige and weight of numbers. Genetically, a few
Englishmen migrating to France every generation renders those two populations
indistinguishable. Linguistically, a few Englishmen migrating to France every
generation is just a few Englishmen who have to learn French. French is in no
sense a more Germanic language as a consequence. Thus, ethnolinguistic
boundaries can survive many generations of chronic cross-boundary exogamy
without becoming any less exact or marked (Barth 1969, Sorenson 1971, Nettle
1998b).
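The claim that a few migrants per generation suffice to smooth genetic discontinuities can be made concrete with Wright's textbook island-model approximation, in which equilibrium differentiation is F_ST ≈ 1 / (1 + 4Nm), where Nm is the number of migrants per generation. This formula is not taken from the sources cited here, and it rests on idealised assumptions (neutral markers, many equally sized populations, equilibrium); the function name is illustrative.

```python
def island_model_fst(migrants_per_generation):
    """Equilibrium F_ST under Wright's island model: 1 / (1 + 4 * N * m)."""
    return 1.0 / (1.0 + 4.0 * migrants_per_generation)


# Differentiation drops quickly as the number of migrants per generation rises.
for nm in (0.1, 1, 5, 10):
    print(f"N*m = {nm:>4}: expected F_ST = {island_model_fst(nm):.3f}")
```

Under this approximation even one migrant per generation holds differentiation to about 0.2, and ten migrants to a few per cent, whereas no comparable formula links the number of incoming speakers to convergence between two languages.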
The exception to this picture is what linguists call loanwords: isolated lexical
items that move between languages, including languages of different families.
Loans do not change the phylogenetic affiliation of the two languages, though in
extreme cases, they may make the phylogenetic affiliations completely obscure
(Thomason & Kaufman 1988, Dixon 1997). There are scattered observations in
the literature that the frequency of loanwords between small communities tracks
the amount of demographic admixture (Nettle 1998b, Lansing et al. 2007). The
mechanism appears to be that the in-marrying parent talks to their children in
their native tongue, and certain words manage to diffuse into general use, even
though it is the local majority language that is being transmitted overall. Thus it
may be that loanwords are more tightly coupled to genes than whole languages
are.
Thus, we have an account of how linguistic and genetic diversity become so
decoupled (figure 2). An ancient continent such as Africa would have produced a
great wealth of linguistic diversity in the Pleistocene, but most of that failed to
diffuse into the present through Holocene processes of homogenisation. By
contrast, Africa’s relatively great genetic diversity did survive these events.
Conversely, a young continent like the Americas has had enough time to
produce a great flowering of linguistic diversity – and has had no pre-European
expansions able to erase that diversity – but not enough time for the genetic
bottleneck of its founding to be erased, leaving it relatively homogenous in
genetic terms.
References Cited
Barth, F. (1969). Ethnic Groups and Boundaries. London: Allen & Unwin.
Bowcock, A.M., A. Ruiz-Linares, A. Tomfohrde, E. Minch, J.R. Kidd & L.L.
Cavalli-Sforza (1994). High resolution of human evolutionary trees with
polymorphic microsatellites. Nature 368: 455-7.
Boyd, R. & P. Richerson (1985). Culture and the Evolutionary Process. Chicago:
University of Chicago Press.
Campbell, L. (1998). Historical Linguistics. Edinburgh: Edinburgh University Press.
Cann, R.L., M. Stoneking & A.C. Wilson (1987). Mitochondrial DNA and human
evolution. Nature 325: 31-6.
Cavalli-Sforza, L.L., A. Piazza, P. Menozzi & J. Mountain (1988). Reconstruction of
human evolution: Bringing together genetic, archaeological and linguistic
data. Proceedings of the National Academy of Sciences of the USA 85:
6002-6.
Chen, J., R.R. Sokal & M. Ruhlen (1995). Worldwide analysis of genetic and
linguistic relationships of human populations. Human Biology 67: 595-612.
Diamond, J. (1994). Guns, Germs and Steel: The Fate of Human Societies.
London: Jonathan Cape.
Dixon, R.M.W. (1997). The Rise and Fall of Languages. Cambridge: Cambridge
University Press.
Fincher, C.L. & R. Thornhill (2008). A parasite-driven wedge: Infectious diseases
may explain language and other biodiversity. Oikos 117: 1289-97.
Foley, W. A. (1986). The Papuan Languages of New Guinea. Cambridge:
Cambridge University Press.
Greenberg, J.H. (1987). Language in the Americas. Stanford: Stanford University
Press.
Grimes, B.F. (2000). Ethnologue: The World’s Languages. Norman, OK: Summer
Institute of Linguistics. 13th edition.
Hammer, M.F., A.B. Spurdle, T. Karafet, M.R. Bonner, E.T. Wood et al. (1997).
The geographic distribution of Y chromosome variation. Genetics 145:
787-805.
Handley, L.J.L., A. Manica, J. Goudet & F. Balloux (2007). Going the distance:
Human population genetics in a clinal world. Trends in Genetics 23: 432-9.
Hellenthal, G., A. Auton & D. Falush (2008). Inferring human colonization history
using a copying model. PLoS Genetics 4: e1000078.
Ingman, M., H. Kaessmann, S. Pääbo & U. Gyllensten (2000). Mitochondrial
genome variation and the origin of modern humans. Nature 408: 708-13.
Jin, L., P.A. Underhill, V. Doctor, R.W. Davis, P.D. Shen, L.L. Cavalli-Sforza & P.J.
Oefner (1999). Distribution of haplotypes from a chromosome 21 region
distinguishes multiple prehistoric human migrations. Proceedings of the
National Academy of Sciences of the USA 96: 3796-800.
Lansing, J.S., M.P. Cox, S.S. Downey, B.M. Gabler, B. Hallmark et al. (2007).
Coevolution of languages and genes on the island of Sumba, eastern
Indonesia. Proceedings of the National Academy of Sciences of the USA
104: 16022-6.
Liu, H., F. Prugnolle, A. Manica & F. Balloux (2006). A geographically explicit
genetic model of worldwide human settlement. American Journal of
Human Genetics 79: 230-7.
Mithun, M. (1999). The Languages of Native North America. Cambridge:
Cambridge University Press.
Nettle, D. (1998a). Explaining global patterns of language diversity. Journal of
Anthropological Archaeology 17: 354-74.
Nettle, D. (1998b). The Fyem Language of Northern Nigeria. Munich: Lincom
Europa.
Nettle, D. (1999a). Linguistic Diversity. Oxford: Oxford University Press.
Nettle, D. (1999b). Using social impact theory to simulate language change.
Lingua 108: 95-117.
Nettle, D. & L. Harriss (2003). Genetic and linguistic affinities between human
populations in Eurasia and West Africa. Human Biology 75: 331-44.
Nichols, J. (1992). Linguistic Diversity in Space and Time. Chicago: University of
Chicago Press.
Ramachandran, S., O. Deshpande, C.C. Roseman, N.A. Rosenberg, M.W. Feldman
& L.L. Cavalli-Sforza (2005). Support from the relationship of genetic and
geographic distance in human populations for a serial founder effect
originating in Africa. Proceedings of the National Academy of Sciences of
the USA 102: 15942-7.
Renfrew, C. (1987). Archaeology and Language. London: Jonathan Cape.
Renfrew, C. (1992). Archaeology, genetics and linguistic diversity. Man 27: 445-78.
Rosenberg, N.A., S. Mahajan, S. Ramachandran, C. Zhao, J.K. Pritchard & M.W.
Feldman (2005). Clines, clusters, and the effect of study design on the
inference of human population structure. PLoS Genetics 1: 660-71.
Sorenson, A.P. (1971). Multilingualism in the Northwest Amazon. American
Anthropologist 69: 670-84.
Thomason, S. G. & T. Kaufman (1988). Language Contact, Creolization and
Genetic Linguistics. Berkeley: University of California Press.
Table 1. Types and sources of data on continental genetic and linguistic
diversity.

System                                      Data type                   Diversity measure                       Source
Genetic
  mtDNA                                     Complete sequence           Pairwise sequence differences (mean)    Ingman et al. (2000)
  Y chromosome (1)                          Haplotype frequencies       Nei diversity measure                   Hammer et al. (1997)
  Y chromosome (2)                          Haplotype frequencies       Entropy measure                         Jin et al. (1999)
  Non-recombining segment of chromosome 21  Haplotype frequencies       Entropy measure                         Jin et al. (1999)
  Autosomal microsatellites                 Microsatellite frequencies  Mean heterozygosity                     Bowcock et al. (1994)
Linguistic
  Languages                                 Count                       Absolute number and number/million km2  Grimes (2000)
  Linguistic stocks                         Count                       Absolute number and number/million km2  Nichols (1992)

Table 2. Continental comparisons for genetic and linguistic diversity (data as described in table 1). Bold: Most diverse continent for the
system. Italic: Least diverse continent for the system.

Continent  mtDNA  Y chr. 1  Y chr. 2  Chr. 21  Microsats.  Languages  Languages/Mkm2  Stocks  Stocks/Mkm2
Africa     76.7   0.88      61        73       0.81        2011       66.90           20      0.67
Asia       36.7   0.78      37        50       0.69        2165       48.58           22      0.49
Europe     24.7   0.74      39        48       0.73        225        22.64           6       0.60
Oceania    41.8   0.72      41        71       0.64        1302       169.31          46      5.98
Americas   36.2   0.56      18        41       0.59        1000       23.76           157     3.73

Table 3. Transmission mechanisms for languages and genes: Summary

           Innovation rate  Diffusability
Languages  High             Low
Genes      Low              High