Genetic and linguistic diversity: Global distribution and implications for prehistory

Daniel Nettle
Centre for Behaviour & Evolution
Newcastle University

Abstract

Whilst some have claimed that languages and genes evolve in tandem within the human population, data on genetic diversity show that this is not generally the case. Human genetic diversity is greatest within, and reduces with distance from, Africa. This pattern arose from serial founder effects as an African source population colonised the rest of the globe. Diversity of language families is rather low in Africa and Eurasia, and highest in Oceania and the Americas. I suggest that this is because language-family diversity is most heavily conditioned by homogenisation associated with agricultural expansions in the Holocene. Such expansions affected more of the land masses of Africa and Eurasia than of Oceania or the Americas. I argue that the different patterns of diversity found in genes and languages are to be expected, since their mechanisms of transmission are so different, with language fast to mutate but potentially slow to diffuse, and genes slow to mutate but fast to diffuse.

1. Genetic and linguistic diversity: The ‘intrinsic relation’?

In 1988, Cavalli-Sforza and colleagues published a paper purporting to show ‘considerable parallelism between genetic and linguistic evolution’ in the human population (Cavalli-Sforza et al. 1988: 6002). The centre-piece of the paper was a figure showing two trees, facing each other at their branch tips. That on the left represented the genetic affiliations of major human populations (using classical autosomal markers), whilst that on the right was a diagram of the relationships of the languages that those populations speak. This idea proved highly influential: at the time of writing, the paper has been cited 446 times on Web of Science.
Other authors picked up the idea that genetic and linguistic diversification in humans go in lock-step, suggesting that ‘there is an intrinsic relation between genetics and language’ (Chen, Sokal & Ruhlen 1995: 607), and that this discovery might herald a ‘new synthesis’ of genetics and linguistics with archaeology to give a unified account of human population history (Renfrew 1992). The idea is also of interest to those who study cultural evolution more generally, since the thrust of this idea was that cultural and genetic transmission mechanisms operated in tandem over the generations. In the twenty years since the publication of Cavalli-Sforza et al.’s paper, molecular genetics has indeed provided a wealth of new inferences about human population history, inferences that have been quite successfully married with the archaeological evidence. However, the parallelism of linguistic and genetic diversity has not generally survived closer examination. In fact, as I will argue in this paper, human genetic and linguistic diversity have an almost diametrically opposite distribution at the global scale (though there are local instances of parallelism). This finding is actually not surprising given what we know about the mechanisms generating and maintaining diversity in the two cases; as I will argue, intra-population linguistic differences are quick to arise and slow to be abolished, whereas genetic differences are slow to arise and quick to be abolished. This means that genetic and linguistic diversity are usually informative about population events at different time depths. They are thus both useful markers of population history, but they do not always tell us the same thing. In section 2, I describe what the Cavalli-Sforza et al. (1988) diagram actually meant, and why it was not evidence of parallel genetic and linguistic evolution. 
Section 3 examines the patterns of continental-scale diversity in genetic and linguistic systems, and looks at what these may be telling us about history. Section 4 considers the mechanisms of linguistic and genetic transmission more generally, and considers why we might expect the two types of evolution to become decoupled.

2. What the Cavalli-Sforza diagram actually means

What does the famous diagram from Cavalli-Sforza et al. (1988) actually indicate? We will concentrate on the left (genetic) side and the right (linguistic) side in turn. The left side shows a diagram of the inter-population genetic distances of 42 world populations. The African populations are genetically closest to each other. All the non-African populations are closer to each other than they are to any African population, and within the non-African populations, there are relationships which seem easily interpretable: populations cluster by land-mass, and the cluster of native American populations is close to the cluster of East Asian populations, which makes sense given other evidence of a trans-Bering human entry into the Americas. The most striking signal in this pattern, then, is that of geography. Populations differ from each other more or less in proportion to the geographical distance between them, and this is true at all scales, both within and between major landmass groups. Subsequent work has amply confirmed that geography is the best predictor of genetic affiliations between human groups. For example, using the modern molecular data on around 1000 individuals from the Human Genome Diversity Project, the geographical distance (by navigable routes) between the residences of two individuals accounts for a staggering 75% or more of the variance of the genetic distance between them (Ramachandran et al. 2005, Liu et al. 2006, Handley et al. 2007). Thus, human genetic diversity is largely clinally distributed along the axes of geography.
There is some debate about whether this cline is smooth, or shows abrupt discontinuities which would allow clusters of the population to be identified (Rosenberg et al. 2005), but it is clear that these cluster boundaries account for at most an extra ~2% of the variation (Handley et al. 2007) above and beyond the continuum of distance. This geographical ordering could have arisen in two ways. First, assume that human beings arose in one location and spread outwards from there by repeated episodes of local colonisation, in each of which a small subset of the source population moved to the adjacent empty location. The colonising subset would not be a representative sample of the source, and so there would be some measurable allele-frequency distance between source and colony. Some time later, a subset of this subset colonises out to the next adjacent location. If this process is repeated many times, a geographical-genetic distance association is created, because the further apart two populations are, the greater the number of colonisation events that separate them, and thus the greater the opportunity for founder effects and drift during colonisation to generate genetic differences between them. This process is called the ‘serial founder effect’ (Ramachandran et al. 2005). The second reason for a relationship between genetics and geography is gene flow once populations are established. Genetic distance between populations does not only increase with time when populations are isolated; it also decreases with time when there is inter-marriage. Given that the likelihood of inter-marriage declines with distance, this is a powerful mechanism for creating genetic affiliations that are proportional to distance. We thus have two possible mechanisms for the creation of geographical clines: serial founder effects and subsequent admixture.
Though both probably occur, there are good reasons for believing serial founder effects to be overwhelmingly important in accounting for the pattern of human genetic diversity (see section 3). Turning to the linguistic side of the Cavalli-Sforza et al. (1988) diagram, the languages of the 42 populations are organised into 16 higher-scale phyla, which in turn converge to a single node. The positions of the 16 linguistic phyla relative to each other mirror the genetic distances of the populations to each other (African phyla closest to other African phyla, ‘Amerind’ closest to East Asian languages etc.). To appreciate why this is problematic, some background is needed. Linguists agree that many of the world’s languages can be grouped into families, in which there is such systematic correspondence across the basic lexicon that the only plausible interpretation is that the languages have sprung from a common source, and evolved by cultural descent with modification. Note that not all linguistic affinities imply phylogenetic relatedness of this kind. There is also plenty of change which arises through language contact and the consequent transmission of lexical or grammatical items piecemeal across linguistic boundaries (Dixon 1997, Campbell 1998). Note also that not all linguistic phylogenetic relationships can be resolved. Ongoing processes of change within language erode the characteristic traces of common descent, such that within a period often assumed to be less than 10,000 years (Nichols 1992), it becomes impossible to detect, from three languages, which two are more closely related than the third. Thus, beyond a certain point, there is no accepted evidence that would justify preferring any linguistic phylogeny over any other.
There is some debate about the point at which this impenetrable fog sets in, but an influential historical linguist represents the consensus when she argues that there are around 150 linguistic phyla in the world of such a depth that although their internal branching structure can be resolved, no higher phylogenetic arrangement of them can be justifiably preferred to any other (Nichols 1992). These phyla (henceforth, stocks) contain anything from 1 to 1000 or more languages. This is important, because most of the 16 main linguistic phyla in the Cavalli-Sforza et al. diagram are stocks of this type. In the diagram, they are ordered with the four African stocks next to each other, then all the Eurasian ones next to one another, and so on for each land mass. However, as mentioned, there is no linguistic reason for preferring this ordering to any other of the 2 x 10^13 (that is, 16!) possible orderings. The only reason that this ordering is the most sensible is geographical; this particular ordering maintains the relative geographic position of the major stocks. However, since we know that genetic distances primarily reflect geographical adjacency, ordering the linguistic stocks geographically pretty well guarantees a high degree of congruence between the two sides of the diagram, without there really being any deep parallel between linguistic and genetic diversity. This problem is exacerbated by the fact that three of the 16 linguistic phyla on the diagram are not accepted as unitary by many historical linguists. ‘Amerind’ represents a proposal by Greenberg (1987) to unify many dozens of distinct language families in the Americas into one deep stock on the basis of fragmentary lexical evidence, but is not accepted as convincing by historical linguists (Nichols 1992, Dixon 1997, Mithun 1999).
‘Indopacific’ and ‘Australian’ also represent lumpings together of what are more usually regarded as numerous independent families, whose higher level phylogeny has not been demonstrated (Foley 1986, Nichols 1992, Dixon 1997). Indeed, doubts have been expressed about the utility of phylogenies of languages in Australia, where there is evidence of very ancient and regular diffusion of lexical items amongst small, fluid social groups (Dixon 1997). Thus, the Cavalli-Sforza diagram further exaggerates the congruence of linguistic and genetic diversity by assigning key genetically homogenous populations to a single linguistic stock, when the consensus in linguistics is that these populations actually contain rather great linguistic diversity. In short, the apparent congruence of linguistic and genetic diversity at the global scale in Cavalli-Sforza et al. (1988) appears to be illusory. At the more regional scale, correspondences between linguistic and genetic affiliation, even when controlling for geographical distance, have sometimes been found and sometimes not (Nettle & Harriss 2003). Thus, tandem genetic and linguistic evolution can certainly occur. What is less clear is that it is the global norm. This is a difficult question to address, since although there are beginning to be useful techniques for placing human populations into phylogenetic orderings based on genetic evidence, even from the recombining portion of the genome (Hellenthal, Auton & Falush 2008), there is no generally accepted way of drawing a phylogeny of the world’s languages that goes deeper than the 150 independent stocks. This chapter therefore takes a different approach, concentrating instead on characterising the degree of genetic and linguistic diversity within each of the major continents, to examine the extent to which genetically diverse continents are also linguistically diverse ones.

3. Genetic and linguistic diversity at the continental level

In this section, I ask what the extent of diversity within the major continents is, in linguistic and in genetic terms. For the genetics, this is tantamount to asking how much two individuals chosen at random from the population are likely to differ from one another in the genetic system under study. For language, it is not easy to produce a continuous measure of how different the languages of two randomly chosen individuals would be. However, we can ask, for two randomly chosen locations on the continent, what is the probability that the same language will be spoken there (i.e. how many languages are there relative to the land area?), and what is the probability that the languages spoken there will belong to the same stock (i.e. how many linguistic stocks are there relative to the land area?).

Genetic data

Continent-by-continent diversity data are available from published studies for a number of genetic systems: mitochondrial DNA, Y chromosome haplotypes, autosomal microsatellites and haplotypes from a non-recombining section of chromosome 21 (table 1). For details of the individuals sampled and the measures, the reader is referred to the original references. All of the data sources sample a substantial number of individuals from multiple locations within each continent, and in each case the data presented here are in the form of some kind of average genetic distance between individuals (in terms of pairwise site differences for the mtDNA molecule, and FST or some formally similar genetic distance for other systems). There are two similar data sources for the Y chromosome, and both have been included for comparison. Note that due to the different measures (and also the different mutation and recombination rates), we cannot compare the levels of diversity across the genetic systems. However, we can look at the rank order of continents relative to each other for each system.
The continental breakdown differs slightly from study to study (e.g. in separating South from East Asia, or Australia from New Guinea), and so I have adopted the broadest common denominator and taken the mean value of component regions where necessary. No major conclusion is affected by this procedure.

Linguistic data

The total number of languages for each continent was extracted from Grimes (2000), and of linguistic stocks from Nichols (1992), to which the reader is referred for details of what constitutes a stock. As well as the absolute counts, I present the numbers per million km².

Results and discussion

The data are shown in table 2. For all the genetic systems without exception, there is more diversity within Africa than in any other continent. This is a well-known finding, and relates to Africa’s role as the oldest, source population for humankind (Cann, Stoneking & Wilson 1987, Bowcock et al. 1994, Ramachandran et al. 2005). The other continents are not in a consistent order across all systems, but there is a tendency for the Americas to be the most homogenous population. Thus, the findings shown here agree with the broad consensus in human genetics that patterns of intra-population diversity fit well with the model of a source population in Africa from which there was serial founding of colonies spreading over the rest of the globe and reaching the Americas last (Hellenthal et al. 2008). The serial founder model specifically predicts that internal genetic diversity within populations will decline the further those populations are from the African origin, and genetic data generally accord with this prediction. Indeed, studying data from the Human Genome Diversity Project at a finer spatial scale than those reported here, Ramachandran et al. (2005) show that 76% of the variation in intra-population genetic diversity in humans is explained by land distance from Addis Ababa, with East Africans the most diverse and native South Americans the most homogenous.
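The decline in within-population diversity predicted by the serial founder model can be illustrated with a minimal sketch. It assumes the textbook result that a founding event involving N diploid individuals reduces expected heterozygosity by a factor of 1 - 1/(2N), and it ignores later mutation and gene flow. The function name, the starting heterozygosity, the number of founders and the number of colonisation steps are hypothetical values chosen for illustration, not estimates taken from the studies cited.

```python
def expected_heterozygosity(h0, founders, steps):
    """Expected heterozygosity after 0..steps successive founding events.

    Assumes each colony is started by `founders` diploid individuals drawn
    at random from the previous colony, with no later mutation or gene flow.
    """
    retained = 1.0 - 1.0 / (2.0 * founders)  # fraction kept per founding event
    return [h0 * retained ** k for k in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical values: heterozygosity 0.80 at the origin, 50 founders per step.
    for step, h in enumerate(expected_heterozygosity(0.80, 50, 10)):
        print(f"founding events: {step:2d}  expected heterozygosity: {h:.3f}")
```

Under these assumptions diversity falls geometrically with the number of founding events, which is the qualitative pattern (more colonisation steps from the origin, less internal diversity) that the model predicts.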
The language data uncorrected for land area are not especially revealing. Asia has the most languages, and Europe the fewest, but then Asia is the largest continent and Europe the second smallest. Correcting for land area, a different pattern emerges. Africa and especially Oceania are relatively diverse for their size, whilst Europe and the Americas are less so. I have argued at length elsewhere that language diversity reflects the scale of organisation of the subsistence economy in recent times (Nettle 1998a, 1999a). Africa and Oceania (especially New Guinea) have many small languages because they have many small subsistence economies, facilitated by the low level of modern economic development, the recency of state formation, and equatorial climates whose lack of seasonality minimises the need for exchange and whose disease burdens encourage limited dispersal (Fincher & Thornhill 2008). Europe is at the opposite extreme, with seasonal production and large-scale market exchange over many hundreds of years, and early state formation. Asia is a mixture of a more New Guinea-like situation in Southeast Asia and Indonesia, with a more Europe-like situation in East Asia. The relatively low language diversity of the Americas probably reflects post-contact extinction; many languages were lost in the population collapses after European arrival. The stock diversity, both uncorrected and corrected for land area, shows a different pattern again. Eurasia and Africa are rather poor in stocks, whereas Oceania and the Americas are around an order of magnitude more diverse.
To understand why this might be the case, we need to consider that the best available evidence suggests that the large stocks with which we are familiar, such as Niger-Congo (Bantu), Indo-European, and Austronesian, appear to have been spread by large-scale demographic expansions within the last ten thousand years, often driven by the expansion of agricultural production systems (Renfrew 1987, Diamond 1994, Nettle 1999a). More often than not, these expansions were into already-inhabited areas, whose foraging populations were incorporated biologically but whose culture, including language, leaves no trace. Thus, when we observe the relatively low linguistic diversity of Africa and Eurasia, we are observing the homogenising events of the Holocene, with its high-density, fast-growing food-producing populations expanding and overlaying the previous pattern of diversity. These homogenising events had much greater impact in Africa and Eurasia, where a few centres of food production led to expansions across much of those continents’ East-West axis, and significant cultural homogenisation. In the Americas, there were transitions to food production, but they did not easily spread along the predominantly North-South axis of the continent, and a mixture of separated farming and foraging populations, with all the cultural diversity they represent, remained. In Oceania, there were only limited transitions to food production (none at all in Australia), which, coupled with the challenging island and mountain geography, has allowed many populations to persist unhomogenised. Thus, far from working in tandem, genetic and linguistic diversity show quite different patterns and reflect processes working at different temporal depths. Lowered genetic diversity tells us about founder effects stemming from colonisation events that may have happened forty thousand years ago.
Lowered diversity of linguistic stocks tells us that there have been homogenising processes, such as large scale demographic expansions associated with agriculture, within the past ten thousand years. Lowered diversity of languages tells us about the pattern of economic organisation probably within the last five hundred years (that small groups have been incorporated into a wider regional system, for example).

4. Transmission mechanisms and decoupling of language and genes

The previous section showed that genetic traits and a cultural trait (language) can become radically decoupled during evolution, such that the greatest diversity in language is found where there is the least diversity in genes. This section considers in a little more detail how this decoupling can happen, given that both language and genes pass from generation to generation through local interactions. Both genetic and cultural change are characterised by some kind of innovation and some kind of diffusion. In the genetic case, the source of innovation is random mutation, whereas in the linguistic case, the source is idiosyncrasies, random or otherwise, in the processes of language acquisition and use. Rates of new genetic mutation are generally fairly low (although they differ markedly across the genome), and rates of cultural innovation are probably much higher as a rule. For genetics, the mechanism of diffusion is sexual reproduction. If an individual mates, then because of fair meiosis, any mutation that individual is carrying has a more or less equal chance of appearing in the offspring, and (assuming it has no dramatic impact on fitness either way) the same chance as other alleles of being diffused into further generations. This ready diffusion means that, in classical population genetic models, even a single migrant per generation between two genetic populations is enough to make the gene pool of those two populations converge.
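The homogenising effect of migration can be sketched numerically. The first function below uses Wright's island-model approximation, F_ST ≈ 1/(4Nm + 1), a standard textbook result that is not stated in the text; the second iterates a deterministic two-population recursion with symmetric migration and no drift. Both function names and all parameter values are assumptions made for illustration only.

```python
def fst_island_model(migrants_per_generation):
    """Wright's equilibrium approximation: F_ST ~ 1 / (4*N*m + 1).

    `migrants_per_generation` is the product N*m (effective number of migrants).
    """
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


def allele_frequency_gap(p1, p2, m, generations):
    """Gap |p1 - p2| per generation under symmetric migration at rate m, ignoring drift."""
    gaps = []
    for _ in range(generations + 1):
        gaps.append(abs(p1 - p2))
        # Each population keeps a share (1 - m) of its own alleles and
        # receives a share m from the other population.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
    return gaps


if __name__ == "__main__":
    for nm in (0.0, 0.1, 1.0, 10.0):
        print(f"migrants per generation: {nm:5.1f}  F_ST: {fst_island_model(nm):.3f}")
    print(allele_frequency_gap(1.0, 0.0, 0.01, 5))
```

In the recursion the gap shrinks by a factor of (1 - 2m) each generation, so even a small, steady exchange of migrants pulls two gene pools together over time.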
The process of cultural diffusion is likely to be quite different. The learner does not sample from just two individuals (its parents), but is potentially exposed to a wide range of cultural models. Exactly how these models are sampled and their input incorporated is not understood in detail, and probably varies. Proposals include a conformist bias (adopt the most frequent cultural variant in the surroundings), or a prestige bias (adopt the variant of the individuals with the highest status locally; Boyd & Richerson 1985; for simulations of the effects of different transmission rules in different social networks see Nettle 1999a, b). Both conformist and prestige-biased transmission produce quite different dynamics in the cultural case than in the genetic case. For example, consider a sub-group of 10 individuals with variant A which is incorporated into a larger population of 90 individuals that carry variant a. There is panmixis and no difference in fitness between the variants. In the genetic case, where A and a are alleles of a gene, then after a generation or two, by simple Hardy-Weinberg calculations, the allele frequencies in the new population will be 1:9 A:a. Now consider the cultural case. If the cultural trait is transmitted by conformist learning, then all learners, even if they are children of A parents, will encounter more instances of a than A, and thus the frequencies after a generation or two will be 0:10. On the other hand, if learning is prestige-biased, then the frequencies might be either 0:10 or 10:0, depending on whether the 10 individuals with variant A were coming into the group with high or with low local status. The implication of the foregoing considerations is the following. Genetic innovations arise only slowly, because the rate of mutation is low. This is why non-African populations still show reduced diversity from serial founder effects tens of thousands of years after their origins.
On the other hand, genetic innovations that are not too deleterious to fitness diffuse readily. This is why the great Bantu expansions in the African Holocene did not manage to reduce Africa’s internal genetic diversity. As long as a few individuals from the non-Bantu populations swallowed up in the expansion managed to have children, their genetic legacy has a good chance of remaining in the continental mix. In the cultural/linguistic case, innovations can arise rapidly, explaining why the Americas can have produced dozens of quite different language families in what may be around 15 thousand years of habitation. However, linguistic innovations do not diffuse so readily. If speakers of non-Bantu languages were subsumed by Bantu populations substantially larger, or of higher status, than they were, then their cultural traits would disappear utterly, and their biological children would be speaking Bantu languages. These different transmission mechanisms help explain why the patterns of linguistic and genetic diversity on the landscape are so different. Genetic diversity is generally arranged along smooth clines, with only a tiny amount of abrupt discontinuity, and even this corresponds to physical barriers rather than ethnic boundaries (Rosenberg et al. 2005, Handley et al. 2007). By contrast, a glance at a linguistic atlas will show that in most rural areas of the world, people speak one language exactly to the point where they suddenly speak another. There may of course be multilingualism, but in these cases the several codes are nonetheless kept largely distinct in the speakers’ minds. Why the marked difference? In the genetic case, because of genes’ ready diffusion, a few local migrants per generation are enough to smooth the discontinuities away. 
In the linguistic case, people generally belong to, and accord status within, one particular core social network or another, and a point on the landscape is reached where we have crossed from the sphere of influence of one to that of the next. These social structures are the conduits of transmission for language, and incomers must usually conform to incumbent prestige and weight of numbers. Genetically, a few Englishmen migrating to France every generation renders those two populations indistinguishable. Linguistically, a few Englishmen migrating to France every generation is just a few Englishmen who have to learn French. French is in no sense a more Germanic language as a consequence. Thus, ethnolinguistic boundaries can survive many generations of chronic cross-boundary exogamy without becoming any less exact or marked (Barth 1969, Sorenson 1971, Nettle 1998b). The exception to this picture is what linguists call loanwords: isolated lexical items that move between languages, including languages of different families. Loans do not change the phylogenetic affiliation of the two languages, though in extreme cases, they may make the phylogenetic affiliations completely obscure (Thomason & Kaufman 1988, Dixon 1997). There are scattered observations in the literature that the frequency of loanwords between small communities tracks the amount of demographic admixture (Nettle 1998b, Lansing et al. 2007). The mechanism appears to be that the in-marrying parent talks to their children in their native tongue, and certain words manage to diffuse into general use, even though it is the local majority language that is being transmitted overall. Thus it may be that loanwords are more tightly coupled to genes than whole languages are. Thus, we have an account of how linguistic and genetic diversity become so decoupled (figure 2).
An ancient continent such as Africa would have produced a great wealth of linguistic diversity in the Pleistocene, but most of that failed to diffuse into the present, owing to Holocene processes of homogenisation. By contrast, Africa’s relatively great genetic diversity did survive these events. Conversely, a young continent like the Americas has had enough time to produce a great flowering of linguistic diversity (with no pre-European expansions able to erase that diversity) but not enough time for the genetic bottleneck of its founding to be erased, leaving it relatively homogenous in genetic terms.

References Cited

Barth, F. (1969). Ethnic Groups and Boundaries. London: Allen & Unwin.
Bowcock, A.M., A. Ruiz-Linares, A. Tomfohrde, E. Minch, J.R. Kidd & L.L. Cavalli-Sforza (1994). High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368: 455-7.
Boyd, R. & P. Richerson (1985). Culture and the Evolutionary Process. Chicago: University of Chicago Press.
Campbell, L. (1998). Historical Linguistics. Edinburgh: Edinburgh University Press.
Cann, R.L., M. Stoneking & A.C. Wilson (1987). Mitochondrial DNA and human evolution. Nature 325: 31-6.
Cavalli-Sforza, L.L., A. Piazza, P. Menozzi & J. Mountain (1988). Reconstruction of human evolution: Bringing together genetic, archaeological and linguistic data. Proceedings of the National Academy of Sciences of the USA 85: 6002-6.
Chen, J., R.R. Sokal & M. Ruhlen (1995). Worldwide analysis of genetic and linguistic relationships of human populations. Human Biology 67: 595-612.
Diamond, J. (1994). Guns, Germs and Steel: The Fate of Human Societies. London: Jonathan Cape.
Dixon, R.M.W. (1997). The Rise and Fall of Languages. Cambridge: Cambridge University Press.
Fincher, C.L. & R. Thornhill (2008). A parasite-driven wedge: Infectious diseases may explain language and other biodiversity. Oikos 117: 1289-1297.
Foley, W.A. (1986). The Papuan Languages of New Guinea. Cambridge: Cambridge University Press.
Greenberg, J.H. (1987). Language in the Americas. Stanford: Stanford University Press.
Grimes, B.F. (2000). Ethnologue: The World’s Languages. Norman, OK: Summer Institute of Linguistics. 13th edition.
Hammer, M.F., A.B. Spurdle, T. Karafet, M.R. Bonner, E.T. Wood et al. (1997). The geographic distribution of Y chromosome variation. Genetics 145: 787-805.
Handley, L.J.L., A. Manica, J. Goudet & F. Balloux (2007). Going the distance: Human population genetics in a clinal world. Trends in Genetics 23: 432-9.
Hellenthal, G., A. Auton & D. Falush (2008). Inferring human colonization history using a copying model. PLoS Genetics 4: e1000078.
Ingman, M., H. Kaessmann, S. Paabo & U. Gyllensten (2000). Mitochondrial genome variation and the origin of modern humans. Nature 408: 708-13.
Jin, L., P.A. Underhill, V. Doctor, R.W. Davis, P.D. Shen, L.L. Cavalli-Sforza & P.J. Oefner (1999). Distribution of haplotypes from a chromosome 21 region distinguishes multiple prehistoric human migrations. Proceedings of the National Academy of Sciences of the USA 96: 3796-800.
Lansing, J.S., M.P. Cox, S.S. Downey, B.M. Gabler, B. Hallmark et al. (2007). Coevolution of languages and genes on the island of Sumba, eastern Indonesia. Proceedings of the National Academy of Sciences of the USA 104: 16022-6.
Liu, H., F. Prugnolle, A. Manica & F. Balloux (2006). A geographically explicit genetic model of worldwide human settlement. American Journal of Human Genetics 79: 230-7.
Mithun, M. (1999). The Languages of Native North America. Cambridge: Cambridge University Press.
Nettle, D. (1998a). Explaining global patterns of language diversity. Journal of Anthropological Archaeology 17: 354-74.
Nettle, D. (1998b). The Fyem Language of Northern Nigeria. Munich: Lincom Europa.
Nettle, D. (1999a). Linguistic Diversity. Oxford: Oxford University Press.
Nettle, D. (1999b). Using social impact theory to simulate language change. Lingua 108: 95-117.
Nettle, D. & L. Harriss (2003). Genetic and linguistic affinities between human populations in Eurasia and West Africa. Human Biology 75: 331-44.
Nichols, J. (1992). Linguistic Diversity in Space and Time. Chicago: University of Chicago Press.
Ramachandran, S., O. Deshpande, C.C. Roseman, N.A. Rosenberg, M.W. Feldman & L.L. Cavalli-Sforza (2005). Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proceedings of the National Academy of Sciences of the USA 102: 15942-7.
Renfrew, C. (1987). Archaeology and Language. London: Jonathan Cape.
Renfrew, C. (1992). Archaeology, genetics and linguistic diversity. Man 27: 445-78.
Rosenberg, N.A., S. Mahajan, S. Ramachandran, C. Zhao, J.K. Pritchard & M.W. Feldman (2005). Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genetics 1: 660-671.
Sorenson, A.P. (1971). Multilingualism in the Northwest Amazon. American Anthropologist 69: 670-84.
Thomason, S.G. & T. Kaufman (1988). Language Contact, Creolization and Genetic Linguistics. Berkeley: University of California Press.

Table 1. Types and sources of data on continental genetic and linguistic diversity.

System                                      Data type                    Diversity measure                        Source
Genetic
  mtDNA                                     Complete sequence            Pairwise sequence differences (mean)     Ingman et al. (2000)
  Y chromosome (1)                          Haplotype frequencies        Nei diversity measure                    Hammer et al. (1997)
  Y chromosome (2)                          Haplotype frequencies        Entropy measure                          Jin et al. (1999)
  Non-recombining segment of chromosome 21  Haplotype frequencies        Entropy measure                          Jin et al. (1999)
  Autosomal microsatellites                 Microsatellite frequencies   Mean heterozygosity                      Bowcock et al. (1994)
Linguistic
  Languages                                 Count                        Absolute number and number/million km²   Grimes (2000)
  Linguistic stocks                         Count                        Absolute number and number/million km²   Nichols (1992)

Table 2. Continental comparisons for genetic and linguistic diversity (data as described in table 1).
Bold: Most diverse continent for the system. Italic: Least diverse continent for the system.

Continent   mtDNA   Y chr. 1   Y chr. 2   Chr. 21   Microsats.   Languages   Languages/Mkm²   Stocks   Stocks/Mkm²
Africa      76.7    0.88       61         73        0.81         2011        66.90            20       0.67
Asia        36.7    0.78       37         50        0.69         2165        48.58            22       0.49
Europe      24.7    0.74       39         48        0.73         225         22.64            6        0.60
Oceania     41.8    0.72       41         71        0.64         1302        169.31           46       5.98
Americas    36.2    0.56       18         41        0.59         1000        23.76            157      3.73

Table 3. Transmission mechanisms for languages and genes: Summary

            Innovation rate   Diffusability
Languages   High              Low
Genes       Low               High
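The contrast summarised in Table 3 can be made concrete with a minimal sketch of the worked example in section 4, in which 10 carriers of variant A join 90 carriers of variant a. The function name, the rule labels and the `high_status` parameter are hypothetical; the three rules are deliberately idealised (unbiased inheritance leaves expected frequencies unchanged, while every learner follows the majority or the high-status group), so the sketch shows expected outcomes rather than simulating drift or mixed learning strategies.

```python
def next_generation(count_A, count_a, rule, high_status="a"):
    """Return expected (count_A, count_a) among learners after one generation.

    rule = "genetic":    unbiased inheritance, expected frequencies unchanged
    rule = "conformist": every learner adopts the more common variant
    rule = "prestige":   every learner adopts the variant of the high-status group
    """
    total = count_A + count_a
    if rule == "genetic":
        return count_A, count_a
    if rule == "conformist":
        # Ties are resolved in favour of a; an arbitrary choice for this sketch.
        return (total, 0) if count_A > count_a else (0, total)
    if rule == "prestige":
        return (total, 0) if high_status == "A" else (0, total)
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    print("genetic:             ", next_generation(10, 90, "genetic"))
    print("conformist:          ", next_generation(10, 90, "conformist"))
    print("prestige (A is high):", next_generation(10, 90, "prestige", high_status="A"))
    print("prestige (a is high):", next_generation(10, 90, "prestige", high_status="a"))
```

The genetic rule preserves the 1:9 ratio, whereas either cultural rule drives one variant to fixation within a generation, which is the sense in which language is low in diffusability for minority variants.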