AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 131:000–000 (2007)

Worldwide Analysis of Multiple Microsatellites: Language Diversity has a Detectable Influence on DNA Diversity

Elise M.S.
Belle and Guido Barbujani*

Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Via Borsari, 46, 44100 Ferrara, Italy

KEY WORDS microsatellite loci; population genetics; genetic diversity; linguistics

ABSTRACT Previous studies of the correlations between the languages spoken by human populations and the genes carried by the members of those populations have been limited by the small number of genetic markers available and by approximations in the treatment of linguistic data. In this study we analyzed a large collection of polymorphic microsatellite loci (377), distributed on all autosomes, and used Ruhlen’s linguistic classification to investigate the relative roles of geography and language in shaping the distribution of human DNA diversity at a worldwide scale. For this purpose, we performed three different kinds of analysis: (i) we partitioned genetic variances at three hierarchical levels of population subdivision according to language group by means of an analysis of molecular variance (AMOVA); (ii) we quantified by a series of Mantel’s tests the correlation between measures of genetic and linguistic differentiation; and (iii) we tested whether linguistic differences are increased across known zones of increased genetic change between populations. Genetic differences appear to reflect geographic differentiation more closely than linguistic differentiation. However, our analyses show that language differences also have a detectable effect on DNA diversity at the genomic level, above and beyond the effects of geographic distance. Am J Phys Anthropol 131:000–000, 2007. © 2007 Wiley-Liss, Inc.

The patterns of genetic diversity in geographic space reflect the interaction between evolutionary factors leading populations to diverge (mutation, drift, and diversifying selection) and factors causing genetic convergence (gene flow and other forms of selection).
In humans, the spatial distance between potential mates is probably the most important factor preventing gene flow between populations (Relethford, 2004), but nonspatial reproductive barriers, such as cultural, religious, and language boundaries, also play an important role (Barbujani, 1991). In particular, language boundaries have been shown to represent a major obstacle to interbreeding, which accounts for the generally good correlation between allele frequencies and language groups (Sokal, 1988; Cavalli-Sforza et al., 1988; Barbujani and Sokal, 1990; Cavalli-Sforza et al., 1992). Any obstacles to gene flow, including geographic distance, tend to decrease both genetic and linguistic similarity between populations. Languages spoken by distant or isolated groups will undergo independent changes and slowly diverge from the source language, whereas adjacent groups—or groups not separated by reproductive barriers—will have easier contacts and will tend to speak related languages (Nichols, 1997). Under the same conditions, parallel processes occur at the genomic level. However, the phenomena producing genetic and linguistic change through time are not identical, and changes might occur at different time scales. Cavalli-Sforza et al. (1988) first suggested that linguistic change might be more rapid than genetic change. Chen et al. (1995) remarked that genetic transmission is only vertical whereas languages can also spread horizontally and suggested that linguistic and genetic change are likely to proceed at different rates. In human history many examples of phenomena disrupting the parallelism between genetic and linguistic change are indeed known. These include selection affecting specific genetic loci, episodes of language replacement associated with the migration of a numerically small elite, creolization, and admixture accompanied by the establishment of a common language (Renfrew, 1989). 
Therefore, a relationship between language and genes is not a safe general assumption, but rather a hypothesis which needs to be tested.

Around 6,000 languages are currently spoken on earth, but the great majority are threatened with extinction. Indeed, about 97% of the world’s population speak 4% of the languages, whereas 10% of languages have fewer than 100 speakers (Wurm, 2001). About half of the languages are currently losing their speakers, which could lead to the replacement of between 50% and 90% of the minority languages before the end of the century. In this context, it is urgent to assess to what extent language differences have played a role in shaping human genetic diversity before many of the relevant data vanish.

Grant sponsor: European Science Foundation (Eurocores Programme: The Origin of Man, Language, and Languages) through the Italian CNR-University of Ferrara.

*Correspondence to: Guido Barbujani, Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Via Borsari, 46, 44100 Ferrara, Italy. E-mail: [email protected]

Received 27 June 2006; accepted 28 February 2007

DOI 10.1002/ajpa.20622

Published online in Wiley InterScience (www.interscience.wiley.com).

Several studies have shown that there is a significant correlation between genetic and linguistic diversity in Europe (Sokal, 1988; Sokal et al., 1989, 1989; Harding and Sokal, 1988; Sajantila et al., 1995) and that language barriers probably play an important role in maintaining genetic differences by acting as reproductive barriers (Barbujani and Sokal, 1990). In sub-Saharan Africa, Excoffier et al. (1991) found that language family relationships are a better predictor of genetic structure than geographical relatedness.
Based on classical markers (20 alleles in total), Cavalli-Sforza et al. (1988) first produced evidence suggesting that the main linguistic groups of the world broadly correspond to genetic clusters. Later, Chen et al. (1995) and Poloni et al. (1997) also found a significant correlation between genetic and linguistic diversity based, respectively, on 11 autosomal loci and on Y-chromosome restriction polymorphisms. On the other hand, the results are not so clear-cut in the Americas and at the worldwide scale. Some authors have indeed failed to detect association between genetic and linguistic diversity among native Americans and Asians (Monsalve et al., 1999) and within North America (Hunley and Long, 2005). Finally, Nettle and Harris (2003) found a significant influence of languages on X chromosome genetic differences in Europe and East and Central Asia, but not in the Near East, Southeast Asia, and West Africa, a result indicating that this correlation emerges only under specific conditions. All these studies relied on a limited number of genetic polymorphisms, which may not exhaustively reflect diversity at the genomic level. The number of loci analyzed is not a secondary factor. Indeed, particular selection regimes—affecting specific loci—may generate genetic outliers, i.e. loci with unusual spatial patterns of diversity. As a consequence, as has been suggested for many decades now (Cavalli-Sforza, 1966), to minimize the effect of such outliers, an extensive sampling of different genome regions is indispensable. Therefore, in this study we analyzed a large sample of polymorphic microsatellite loci (377) spread across the genome, the largest genetic dataset used so far to investigate the influence of languages on the distribution of human genetic diversity. Rosenberg et al. (2002) briefly discussed the relationship between genes and languages in these data, but without a quantitative approach. 
Here, thanks to the large number of markers considered, we are seeking a more accurate estimate of the correlation between genetics and linguistics. These autosomal microsatellite markers have a higher rate of mutation than other markers, and could therefore parallel linguistic change more closely (Barbujani, 1997). In addition, estimates of variation at microsatellite loci do not seem to suffer from ascertainment bias, a factor known to lead to erroneous conclusions in analyses based on single nucleotide or insertion–deletion polymorphisms (Clark et al., 2005). We addressed three questions in particular, namely (i) Do increasing levels of linguistic differentiation lead to greater genetic distances? (ii) What is the relative weight of geography and languages in shaping human genetic diversity? (iii) Do previously identified genetic barriers across the world also tend to represent zones of increased linguistic differentiation?

MATERIALS AND METHODS

Dataset

We analyzed the published genetic dataset comprising 377 autosomal microsatellite loci distributed on all 22 autosomes in 52 world populations (Rosenberg et al., 2002). The samples were part of a study of the cell-line diversity panel of the CEPH (Centre pour l’Etude des Polymorphismes Humains at the Institut Jean Dausset in Paris) (Cann et al., 2002). In the CEPH panel, the individuals defined as “Bantus” came from several different areas; we excluded from the analysis all those who were not from Kenya because of their small sample sizes.

Languages were attributed to the samples according to two sources, namely Ruhlen’s (1991) classification and The Ethnologue website (Gordon, 2005). There is so far no universally accepted global taxonomy of languages; however, Ruhlen’s classification, albeit somewhat controversial, is widely used in population genetic studies (Poloni et al., 1997; Chen et al., 1995; Cavalli-Sforza et al., 1992, 1988; Excoffier et al., 1991) because it is probably the most comprehensive attempt to group all the world’s 6,000 languages into 17 linguistic phyla. The languages of all populations of this study were defined consistently in the two sources, with the exception of a few populations indicated in Table 1. For instance, The Ethnologue classifies the American languages into four distinct phyla, and not one as Ruhlen (1991) does.

The CEPH dataset includes two pygmy samples, Biaka and Mbuti. The pygmies’ linguistic affiliation is difficult to define, as in most cases they seem to have borrowed their neighbors’ language (Cavalli-Sforza, 1986). We attributed the Biaka to the Niger-Kordofanian phylum, which includes Bantu languages, because, besides speaking a language of this phylum, the Biaka are thought to have undergone extensive admixture with Bantu speakers (Cavalli-Sforza, 1986). By contrast, Mbuti pygmies include speakers of both Nilo-Saharan and Niger-Kordofanian languages (ALFRED database: Cheung et al., 2000), whom it proved impossible for us to separate. As a consequence, the Mbuti sample was not considered in the analysis. In this way, our results refer to 50 populations: 6 from Africa, 3 from the Middle East, 26 from Asia, 2 from Oceania, 8 from Europe, and 5 from the Americas (Fig. 1), which we classified into linguistic families, branches, and phyla (Table 1).

Analysis of molecular variance

Measures of genetic variance were hierarchically partitioned by an analysis of molecular variance (AMOVA) (Excoffier et al., 1992) implemented in the Arlequin ver. 2.0 software (Schneider et al., 2000). For each locus, we compared each individual genotype with (i) the genotypes of the individuals of the same linguistic family, (ii) the genotypes of the individuals from different families of the same linguistic phylum and (iii) the genotypes of individuals belonging to different linguistic phyla. Two measures of genetic distance, comparable to FST and RST values, were estimated from the data. The number of alleles differing between two haplotypes was calculated as:

$$\hat{d}_{xy} = \sum_{i=1}^{L} \delta_{xy}(i)$$

where $\delta_{xy}(i)$ is the Kronecker function, equal to 1 if the alleles of the ith locus are identical for both haplotypes, and equal to 0 otherwise, and summation is over all loci. This index is analogous to a weighted FST statistic over all loci (see Arlequin manual, Schneider et al., 2000).

TABLE I. Language name, family, branch and phylum for each population considered, according to Ruhlen’s classification of languages

Population name Language (dLAN=1) Family (dLAN=2) Biaka Yaka Bantoid Niger-Congo Mandenka Mandinka Mande Niger-Congo Yoruba Yoruba Yoruba-North.
Akoko Niger-Congo San Kenya San Bantu Hai.n//um Bantoid – Niger-Congo Mozabite Bedouin Druze Palestinian Brahui Mozabite Arabic Arabic Arabic Brahui Berber Arabo-Canaanite Arabo-Canaanite Arabo-Canaanite Dravidian – Semitic Semitic Semitic North West Balochi Baluchi Iranian Indo-Iranian Hazara Persian Iranian Indo-Iranian Makrani Baluchi Iranian Indo-Iranian Sindhi Sindhi Indic Indo-Iranian Pathan Newari Kalash Kalasha Tibeto-Karen Eth: Tibeto-Burman Indo-Iranian Burusho Han Tujia Burushaski Mandarin/Cantonese Tujia Tibetic Eth: Himalayish Indic Eth: Dardic – Chinese Tai Yi Yi Burmic Miao Miao Miao Oroqen Daur Mongolian Hezhen Xibo Uyghur Dai Oroqen Daur Mongolian Hezhen Xibo Uyghur Dai Lahu Lahu Tungus Mongolian Mongolian Tungus Tungus Turkic Austronesian Eth: Austro-Tai Burmic She She Yao Naxi Naxi Burmic Tu Yakut Japanese Cambodian Tu Yakut Japanese Khmer Mongolian Turkic Japanese Mon-Khmer Papuan – Papuan Melanesian – Melanesian Branch (dLAN¼3) – Sinitic Austro-Tai Eth: Tibeto-Burman Tibeto-Karen Eth: Tibeto-Burman Miao-Yao Mongolian-Tungus Mongolian-Tungus Mongolian-Tungus Mongolian-Tungus Mongolian-Tungus – Austro-Tai Eth: Babar Tibeto-Karen Eth: Tibeto-Burman Miao-Yao Tibeto-Karen Eth: Tibeto-Burman Mongolian-Tungus – Korean-Japanese Austro-Asiatic French Basque Sardinian French Basque Sardinian Romance continental – – – Eth: Oceanic – Eth: Oceanic italic – Italic Bergamo Italian Romance continental Italic Tuscan Italian Romance continental Italic Orcadian English – Germanic Adygei Adyghe – Circassian Phylum (dLAN¼4) Niger-Kordofanian Eth: Niger-Congo Niger-Kordofanian Eth: Niger-Congo Niger-Kordofanian Eth: Niger-Congo Khoisan Niger-Kordofanian Eth: Niger-Congo Afro-Asiatic Afro-Asiatic Afro-Asiatic Afro-Asiatic Elamo-Dravidian Eth: Dravidian Indo-Hittite Eth: Indo-European Indo-Hittite Eth: Indo-European Indo-Hittite Eth: Indo-European Indo-Hittite Eth: Indo-European Sino-Tibetan Indo-Hittite Eth: Indo-European Language isolate Sino-Tibetan Austric Eth: 
Sino-Tibetan Sino-Tibetan Austric Eth: Miao-Yao Altaic Altaic Altaic Altaic Altaic Altaic Austric Eth: Austronesian Sino-Tibetan Austric Eth: Miao-Yao Sino-Tibetan Altaic Altaic Altaic Austric Eth: Austro-Asiatic Indo-Pacific Eth: Austronesian Indo-Pacific Eth: Austronesian Indo-Hittite Language Isolate Indo-Hittite Eth: Indo-European Indo-Hittite Eth: Indo-European Indo-Hittite Eth: Indo-European Indo-Hittite Eth: Indo-European Northern Caucasian (continued)

BELLE AND BARBUJANI

TABLE 1. (Continued)

Population name   Language (dLAN=1)   Family (dLAN=2)   Branch (dLAN=3)                         Phylum (dLAN=4)
Russian           Russian             Slavic            Balto-Slavic                            Indo-Hittite (Eth: Indo-European)
Pima              Pima                Pimic             Uto-Aztecan Sonoran                     Amerind (Eth: Aztecan)
Maya              Maya                Mexican           Penutian (Eth: Yucatecan)               Amerind (Eth: Mayan)
Piapoco           Piapoco             Macro-Arawakan    Equatorial-Tucanoan (Eth: Maipuran)     Amerind (Eth: Arawakan)
Karitiana         Karitiana           Kariti-Tupi       Equatorial-Tucanoan (Eth: Arikem)       Amerind (Eth: Tupi)
Surui             Surui               Kariti-Tupi       Equatorial-Tucanoan (Eth: Monde)        Amerind (Eth: Tupi)

When the classification of The Ethnologue differs significantly from Ruhlen’s, the different classification is indicated as ‘Eth:’.

Fig. 1. Distribution of the sampling localities. The significant genomic boundaries found based on RST values (Barbujani and Belle, 2006) are represented by thick black lines. The Delaunay connections are represented by thin lines (solid between language phyla, dashed between language families, and dotted within families). Different colors represent samples belonging to different language phyla and different symbols (triangles, circles, squares) represent samples belonging to different language families within the same phylum.
Slatkin’s (1995) RST is a distance measure specific to microsatellites, based on both allele frequency differences between populations and repeat number differences between alleles. This measure was calculated as the sum of the squared differences in repeat number between two haplotypes, that is:

$$\hat{d}_{xy} = \sum_{i=1}^{L} (a_{xi} - a_{yi})^2$$

where $a_{xi}$ is the number of repeats at the ith locus of haplotype $x$ (see Arlequin manual, Schneider et al., 2000).

The significance of the estimated variances was tested by means of a nonparametric permutational procedure (Excoffier et al., 1992). In three independent tests, (i) individuals were assigned to random language families of the same phylum, (ii) individuals were assigned to random language families regardless of the phylum, and (iii) language families were randomly assigned to language phyla, each time recalculating the relevant variance. Each randomization process was iterated 10,000 times. The empirical distributions thus obtained were then compared with the observed variances, and an empirical level of significance was thus estimated for each variance.

Mantel tests

A matrix of great circle geographic distances (dGEO matrix) between populations was constructed using a method first proposed by Ramachandran et al. (2005), i.e., assuming five obligatory waypoints on the world’s map so as to make distances between continents more reflective of the likely human dispersal routes from Africa. These points were Anadyr, Russia (64°N, 177°E); Cairo, Egypt (30°N, 31°E); Istanbul, Turkey (41°N, 28°E); Phnom Penh, Cambodia (11°N, 104°E); and Prince Rupert, Canada (54°N, 130°W).
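The waypoint-constrained distances described above can be illustrated with a short Python sketch. It is not the code used in the study: the haversine formula, the Earth radius of 6,371 km, the function names, the two sampling coordinates, and the choice of Cairo as the waypoint for this pair are all assumptions made for illustration; only the five waypoint coordinates come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not stated in the text

# Waypoints listed in the text, as (latitude, longitude) in decimal degrees.
WAYPOINTS = {
    "Anadyr": (64.0, 177.0),
    "Cairo": (30.0, 31.0),
    "Istanbul": (41.0, 28.0),
    "Phnom Penh": (11.0, 104.0),
    "Prince Rupert": (54.0, -130.0),
}


def great_circle_km(a, b):
    """Haversine great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def route_km(start, end, via=()):
    """Length of the path start -> via[0] -> ... -> end, summing great-circle legs."""
    stops = [start, *(WAYPOINTS[name] for name in via), end]
    return sum(great_circle_km(p, q) for p, q in zip(stops, stops[1:]))


# Hypothetical sampling locations (illustrative coordinates only).
east_africa = (0.0, 37.0)
siberia = (62.0, 130.0)

direct = route_km(east_africa, siberia)
constrained = route_km(east_africa, siberia, via=("Cairo",))
print(f"direct: {direct:.0f} km, via Cairo: {constrained:.0f} km")
```

Which waypoints apply to a given pair of populations depends on the regions involved; a full dGEO matrix would need that routing rule spelled out, which this sketch leaves to the caller through the `via` argument.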
Linguistic distances (in the dLAN matrix) were estimated as simple dissimilarity indexes ranging from 0 to 4, according to the method described by Excoffier et al. (1991). Two dLAN matrices were constructed, based either on Ruhlen’s (1991) language classification or on The Ethnologue. Populations speaking languages belonging to different phyla were assigned dLAN = 4, languages of different branches dLAN = 3, languages of different families dLAN = 2, different languages dLAN = 1, and the same language dLAN = 0. Finally, matrices of genetic distances between populations (dGEN matrix) were computed, using either the standard FST measure or Slatkin’s RST.

Correlation between genetic boundaries and language barriers

The correlations among the three matrices were estimated by pairwise (Mantel, 1967) and multiple (Smouse et al., 1986) Mantel tests. First we calculated three pairwise correlations (between dGEO and dGEN, between dGEO and dLAN, and between dGEN and dLAN). We then separated the effects of geography and language on the genetic distances by calculating the partial correlations between dGEN and dGEO (with dLAN held constant) and between dGEN and dLAN (with dGEO held constant). The significance of the observed coefficients was calculated by randomly permuting rows and columns of one matrix while keeping the other matrix constant, thus obtaining an empirical null distribution of correlation coefficients. Doubts have been raised about the accuracy of the P-values associated with partial Mantel tests (Raufaste and Rousset, 2001). We nevertheless provide those values for the sake of comparison with previous studies. All the above procedures are implemented in the Arlequin ver 2.0 software (Schneider et al., 2000).
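The permutation logic of the pairwise Mantel test described above can be sketched in a few lines of Python. This is a minimal illustration, not the Arlequin implementation: the function names, the one-tailed P-value convention (counting the observed value as one permutation), and the toy 4 × 4 matrices are assumptions made for the example.

```python
import random
from math import sqrt


def upper_triangle(m, order=None):
    """Off-diagonal upper-triangle entries of a square matrix.

    If `order` is given, rows and columns are relabelled jointly by it.
    """
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=0):
    """Return (r, p): matrix correlation and one-tailed permutation P-value."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    order = list(range(len(d2)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)  # permute rows and columns of d2 together
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)


# Toy data: four populations on a line, genetic distance proportional to geography.
d_geo = [[abs(i - j) for j in range(4)] for i in range(4)]
d_gen = [[0.1 * abs(i - j) for j in range(4)] for i in range(4)]

r_obs, p_value = mantel(d_geo, d_gen)
print(f"r = {r_obs:.3f}, P = {p_value:.3f}")
```

Permuting row and column labels jointly, rather than shuffling individual cells, preserves the dependence among distances that share a population, which is the reason the Mantel procedure is used for distance matrices in the first place.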
Correlation between genetic barriers and linguistic groups

In a previous analysis of the same dataset, six significant genetic boundaries were identified, namely zones where the rate of genetic change is locally increased with respect to random locations (Barbujani and Belle, 2006; see also Rosenberg et al., 2005 for an alternative approach). In this study, we investigated whether at these genetic boundaries linguistic change is increased with respect to other zones on the map. Adjacent populations were connected by a Delaunay network (Manni et al., 2004) and each edge of the network was associated with a measure of linguistic differentiation (Figure 1). For this purpose, we pooled the indexes of language dissimilarity used for the Mantel tests in two different ways: we considered either dLAN = 4 versus dLAN = 0, 1, 2, 3 or dLAN = 3, 4 versus dLAN = 0, 1, 2. We then compared the average dLAN along edges of the network that were crossed by a significant genetic boundary with the average dLAN measured along the other edges of the network, under the null hypothesis of no difference between averages.

TABLE 2. Hierarchical AMOVA analysis, showing the percentage of variation at each of three levels of population hierarchy, considering either allele frequencies only (FST) or both allele frequencies and molecular distances between alleles (RST)

Source of variation                                           FST     RST
Among language phyla                                           2.9     6.7
Among populations between language families, within phyla     2.4     2.9
Within populations                                            94.7    90.4

All values are significant at the P < 0.0001 level.

RESULTS

AMOVA

Considering allele-frequency differences (FST), the analysis of variance reveals that only 2.9% of the genetic variation is explained by differences between linguistic phyla and 2.4% is due to differences between language families (Table 2). The average worldwide FST value estimated from this dataset is equal to 0.053.
When molecular differences between alleles are considered (RST), the proportion of genetic diversity increases to 6.7% of the total among language phyla and to 2.9% among families. For both analyses, the diversity indexes among individuals from the same language family, individuals from different language families of the same phylum, and individuals from different language phyla are all significantly greater than 0 at P < 0.05. Considering the classification of languages of The Ethnologue instead of Ruhlen’s, we found very similar results, with differences in the percentages below 0.5% for each component of variation (data not given). In what follows, unless otherwise specified, we shall refer to dLAN matrices estimated according to Ruhlen’s classification. We did not perform analyses at a lower hierarchical level, that is to say within language phyla or families, because of the highly variable number of populations sampled within each linguistic group.

Mantel tests

The correlation coefficients between geographic, linguistic and genetic distances, using either RST or FST distances, are given in Table 3 and Figure 2. Considering FST values, and hence not taking into account molecular differences between alleles, both variables, geography and language, appear significantly associated with genetic variation. The correlation between genetic diversity and linguistic distances was highly significant (r = 0.226, P < 0.0001), meaning that 5.1% (r²) of the genetic variation is accounted for by language distances. The proportion of genetic variation accounted for by geography was much higher (r² = 65.3%).
However, geographic distances and linguistic classification are also correlated, and predictably so, because populations speaking related languages form clusters in the geographical space. As a consequence, the observed correlation between linguistic and genetic distances may conceivably reflect the fact that both are correlated with geographical distances. When geographic distances are held constant, the correlation between linguistic and genetic distances is no longer significant if genetic distances are measured by FST. By contrast, using RST values (Table 3), which are more specific for microsatellites, we found that 55.7% of the genetic variance is explained by geography, while only 9.7% is accounted for by linguistics (without keeping any variable constant). In this case, the correlation between genetic diversity and linguistic distances remains highly significant even at constant geographic distances (r = 0.172, P < 0.0001), with linguistic distances alone accounting for less than 3% (r²) of genetic variation. Therefore, the geographic location of populations seems to be a better predictor of genetic distances than their linguistic affiliation, even though both correlations are highly significant. However, our results also suggest that between 35% (with FST values) and 43% (with RST values) of the overall genetic variance reflects the action of other factors. Once again, considering the classification of languages of The Ethnologue instead of Ruhlen’s did not substantially alter our results (data not given).

Figure 2a is a plot of the RST values versus pairwise geographic distances between populations. The correlation appears to be mostly due to comparisons of American and Oceanian populations, where both the RST values and geographic distances are greatest.

Fig. 2. (a) Correlation between genetic distances calculated as RST values, and geographic distances constrained by five obligatory waypoints (dGEO). Following Ramachandran et al. (2005), red squares represent comparisons within regions, green triangles comparisons between populations in Africa and Eurasia, and blue diamonds comparisons with America and Oceania. (b) Box plot showing the correlation between genetic distances (RST) and linguistic distances (dLAN) calculated on the basis of Ruhlen’s classification of languages. The upper and lower limits of the box represent, respectively, the 5% and 95% confidence intervals, and the long horizontal bar in the box represents the median.

TABLE 3. Mantel correlation and partial correlation coefficients (r) between genetic (dGEN), geographic (dGEO) and linguistic (dLAN) matrices, based on Ruhlen’s classification of languages, using either FST or RST measures of genetic diversity

                                      FST                   RST
Matrices considered                   r           r²        r           r²
dGEN and dGEO                         0.808***    0.653     0.746***    0.557
dGEN and dLAN                         0.226***    0.051     0.311***    0.097
dGEO and dLAN                         0.268***    0.072     0.269***    0.072
dGEN, dGEO, and dLAN                              0.653                 0.570
dGEN and dGEO, dLAN constant (a)      0.796***    0.633     0.723***    0.523
dGEN and dLAN, dGEO constant (a)      0.018 NS    0.0003    0.172***    0.030

(a) Partial correlation coefficients. *** P < 0.0001; NS: not significant.
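The partial correlations reported for RST in Table 3 can be recovered from the three pairwise coefficients with the standard first-order partial correlation formula. The sketch below assumes that this is the formula underlying the partial Mantel coefficients; the function name `partial_r` is illustrative, and the inputs are the rounded pairwise values, so agreement is only to about three decimals.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Pairwise Mantel correlations for RST, as reported in Table 3.
r_gen_geo = 0.746
r_gen_lan = 0.311
r_geo_lan = 0.269

# Genetics vs. language with geography held constant (reported: 0.172).
lan_given_geo = partial_r(r_gen_lan, r_gen_geo, r_geo_lan)

# Genetics vs. geography with language held constant (reported: 0.723).
geo_given_lan = partial_r(r_gen_geo, r_gen_lan, r_geo_lan)

print(f"dGEN-dLAN | dGEO: r = {lan_given_geo:.3f}, r2 = {lan_given_geo ** 2:.3f}")
print(f"dGEN-dGEO | dLAN: r = {geo_given_lan:.3f}, r2 = {geo_given_lan ** 2:.3f}")
```

The formula makes the attenuation explicit: the raw genetics-language correlation (0.311) drops to about 0.17 once the part shared with geography (0.746 × 0.269) is subtracted and the remainder is rescaled.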
However, increasing genetic distances tend to correspond to increasing language distances also when populations of the same region are compared. This is confirmed by Figure 2b, where the category corresponding to dLAN = 4 has a large confidence interval for the RST value.

Correlation between genetic barriers and linguistic groups

In our 50 population samples, 48 different languages are spoken, belonging to 11 linguistic phyla and 2 language isolates, according to Ruhlen’s (1991) classification. If language differences represent an important barrier to gene flow, we expect the linguistic difference between populations on different sides of a genetic barrier to be on average greater than the one on the same side of that barrier. In a previous analysis of this dataset, six significant genetic boundaries were identified using Slatkin’s RST measure of genetic distances (Barbujani and Belle, 2006). We found that out of the 97 Delaunay connections of the map, 23 are crossed by one of those genetic boundaries (Fig. 1 and Table 4). A fraction equal to 32.7% (18/55) of the Delaunay connections between linguistic phyla (dLAN = 4) is crossed by a significant genetic barrier, as is the case for 13.9% (6/43) of the connections within linguistic phyla (dLAN = 1, 2, or 3). The difference between those proportions is statistically significant (χ² = 4.6, P < 0.05), showing that increased genetic change tends to be observed with higher probability where linguistic change is greatest.

Among the most controversial issues in linguistics is the classification of native American languages. Although Ruhlen clusters them into a single phylum, many studies have suggested that they could belong to several distinct phyla (see for instance Sims-Williams, 1998). We reran the analysis considering the classification of The Ethnologue (Table 1). In this way, 21 out of 23 Delaunay connections showing increased genetic change occur between different language phyla, thus making the difference between linguistic categories even more significant (χ² = 9.9, P < 0.01) than in the previous test.

To understand whether the correlation is due to a specific degree of linguistic differentiation, we also investigated whether populations speaking languages belonging to different phyla or branches (dLAN = 3 or 4) occur more often on different sides of a genetic boundary than populations speaking languages belonging to the same branch (dLAN = 0, 1, or 2). The difference between those linguistic categories was not significant (χ² = 2.13, NS).

DISCUSSION

This study suggests that linguistic differences between populations have a small, but nonnegligible, effect on patterns of DNA variation at the world scale. The large number of markers considered, and the fact that ascertainment bias is unlikely to have affected the statistics estimated from DNA data, suggest that this conclusion is robust and can be generalized at the genome level, at least for autosomes.

TABLE 4. Proportion of Delaunay connections in the world’s map crossed by a significant genetic barrier as defined in Barbujani and Belle (2006)

Linguistic distance    Crossed by a barrier    Not crossed by a barrier    Total
dLAN = 0                        0                         5                   5
dLAN = 1                        1                         8                   9
dLAN = 2                        2                         7                   9
dLAN = 3                        3                        17                  20
dLAN = 4                       18                        37                  55
Total                          23                        74                  97

Two different ways of grouping the coding of the linguistic distances were tested: (i) at the level dLAN = 4 versus dLAN = 0, 1, 2, 3 and (ii) at the level dLAN = 3, 4 versus dLAN = 0, 1, 2 (see main text).
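The two chi-square comparisons reported in the Results for Ruhlen’s classification can be checked directly from the per-level cell counts printed in Table 4. The sketch below assumes a Pearson chi-square on a 2 × 2 table without continuity correction, which is the variant that reproduces the reported values; the names `chi_square_2x2`, `table4`, and `pooled` are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Per-level counts from Table 4: dLAN -> (crossed by a barrier, not crossed).
table4 = {0: (0, 5), 1: (1, 8), 2: (2, 7), 3: (3, 17), 4: (18, 37)}


def pooled(levels):
    """Sum the (crossed, not crossed) counts over the given dLAN levels."""
    crossed = sum(table4[k][0] for k in levels)
    not_crossed = sum(table4[k][1] for k in levels)
    return crossed, not_crossed


# (i) between phyla (dLAN = 4) versus within phyla (dLAN = 0, 1, 2, 3)
chi2_phyla = chi_square_2x2(*pooled([4]), *pooled([0, 1, 2, 3]))

# (ii) between branches or phyla (dLAN = 3, 4) versus within branches (dLAN = 0, 1, 2)
chi2_branches = chi_square_2x2(*pooled([3, 4]), *pooled([0, 1, 2]))

print(f"phyla grouping:    chi2 = {chi2_phyla:.2f}")
print(f"branches grouping: chi2 = {chi2_branches:.2f}")
```

With one degree of freedom the 5% critical value is 3.84, so the first grouping exceeds it and the second does not, in line with the significance levels given in the text.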
By contrast, uniparentally transmitted markers (mtDNA and Y chromosome) have a lower effective population size and hence are affected more strongly by genetic drift, often showing peculiar geographical patterns (see for instance Jorde et al., 2000; Wilson et al., 2001; Romualdi et al., 2002). As a consequence, for further generalizations our conclusions will have to be tested against suitable mitochondrial DNA and Y-chromosome datasets.

Here we found that 5.3% of the global variance in our autosomal dataset can be explained by differences between linguistic groups (families or phyla) using FST measures of genetic distances, and 9.6% using the RST statistic. These values are of the same order of magnitude as the proportion of genetic variation that can be attributed to differences among seven geographic regions using the same dataset, that is to say 3.6% (Rosenberg et al., 2002) and 9.2% (Excoffier and Hamilton, 2003), using respectively FST and RST measures of genetic diversity. These low values could be interpreted as meaning that language affiliations are as informative, or uninformative, as geographic locations to predict the genetic relationships between populations. However, the Mantel tests indicate that in fact geographic distances are a good predictor of genetic distances, and that linguistic distances also affect DNA differences. The levels of genetic differentiation inferred from microsatellites are lower than those inferred from other DNA polymorphisms (Jorde et al., 2000; Romualdi et al., 2002).
This is thought to be due, at least in part, to the higher microsatellite mutation rates, which tend to increase variation within groups compared to variation among groups (Jin and Chakraborty, 1995; Jorde et al., 2000). When molecular distances between alleles were also considered (RST distances), the differences between language groups became almost twice as large (9.6%). This confirms that evolutionary processes affecting DNA sequences and allele frequencies occur at different time scales. It is known that DNA sequences evolve relatively slowly by the gradual accumulation of mutations, whereas allele frequencies change more rapidly under the action of genetic drift. Many polymorphisms may even predate the geographical differentiation of modern humans and differ between human groups only in relative frequencies (Klein et al., 1993; O’hUigin et al. 2002). Accordingly, we would expect linguistic change to be more closely associated with changes in allele frequencies, and hence with FST. However, microsatellites tend to evolve more rapidly than other polymorphisms, and hence RST measures seem to more closely parallel linguistic change than FST (for a similar result, see Excoffier and Hamilton, 2003). Globally speaking, the genetic differences between the main language phyla probably reflect relatively ancient demographic subdivision, whose effects may be easier to identify at the time scale of microsatellite evolution. Mantel tests allowed us to quantify the relative importance of geography and language in determining the current genetic diversity. As already observed, genetic distances are more closely related with geographic than with linguistic distances. The values of the correlation coefficients between geographic and genetic distances are consistent with the ones obtained by Ramachandran et al. (2005) based on the same populations but with 406 additional loci. 
Our values are lower, but we also observe a stronger correlation with FST measures of genetic distances (r = 0.808) than with RST measures (r = 0.746). In the partial correlations, using RST measures of genetic diversity, geography accounts for more than 50% of the genetic variance when languages are kept constant, whereas linguistic differences account for only 3% of the genetic variance when geography is controlled for. Nevertheless, both correlations are significant. This suggests that (i), as already known, populations that are distant in the geographical space tend to be genetically differentiated, but also that (ii) pairs of populations separated by the same geographic distance tend to be more genetically differentiated if they speak languages belonging to different phyla. This significant correlation between linguistic and genetic distances based on RST values is consistent with previous analyses of European diversity based on nuclear allele frequencies (Sokal, 1988) and on Y-chromosome polymorphisms (Poloni et al., 1997; Rosser et al., 2000), and with Chen et al.’s (1995) results at the world level, the latter study being based on 11 loci only. By contrast, when considering genetic distances calculated as FST values, the partial correlation between genetics and linguistics holding geography constant is not significant. However, it is important to note that because there is presently no consensus on how to quantify linguistic relatedness, we were forced to adopt a rather arbitrary and approximate scale of values. Therefore the fact that we did find significant correlations suggests that the relationship between linguistic and genetic differentiation may in fact be stronger than estimated. In agreement with this view is another result of this study.
We examined the effect of using dLAN = 8 (instead of 4) for languages belonging to different phyla, implicitly assuming that only relatively close linguistic relationships can be safely established, whereas the relationships between language phyla are minimal and largely undefined (see e.g. Trask, 1996). This rescaling had the effect of further increasing the partial correlation between genetics and languages (r = 0.206, P < 0.0001 with RST measures), although the association between genetics and geography remains stronger.

Geography and languages together explain between 57% (RST) and 65% (FST) of the DNA variance among the populations of this study, two values lower than the one reported for the same populations using different statistics, namely 78% (Ramachandran et al., 2005). At any rate, this means that a substantial part of microsatellite diversity must be accounted for by other demographic or evolutionary factors, and presumably that forces acting locally, such as genetic drift and founder effects, have substantially contributed to shaping human DNA diversity. Some aspects of this phenomenon have already been pointed out, including the fact that the DNA similarities between the Hazara and the Uyghurs of this study would not be expected based on either their linguistic or geographic relationships (Rosenberg et al., 2002).

Finally, we investigated to what extent large linguistic differences are localized at the main genetic boundaries. Our analysis showed that the greatest proportion of genetic barriers falls between populations speaking languages of different phyla. In other words, genetic boundaries, or zones of sharp genetic change, tend to occur where well-distinct linguistic groups are in contact. We showed in a previous study (Barbujani and Belle, 2006) that genetic boundaries do not necessarily correspond to obvious physical barriers.
Significant genetic boundaries have been observed, for instance, within South American populations separated by only a few kilometers. The very small size of the American hunting–gathering populations, and their documented tendency to split along family lines (Crawford, 1998), have certainly enhanced the local drift effects. Obviously, when population splits are followed by isolation, both language and genetic divergence will ensue, and language differences will reinforce evolutionary independence between populations, even in the absence of other reproductive barriers.

To understand whether the significant difference found between genetic barriers separating populations speaking languages belonging to the same phylum or different phyla was due to linguistic differences at the phylum level, we repeated the same analysis using a different grouping, that is to say between branches (or phyla) versus within branches. We found that no significant difference was left, suggesting that most of the correlations previously found are due to the effects of the major linguistic boundaries, such as those separating languages of different phyla.

Our results may have been biased by a number of factors. First, it has been suggested that the uneven distribution of the populations sampled across the world could influence the inferred patterns of genetic diversity (Serre and Paabo, 2004), and hence of linguistic diversity. However, this issue is still debated, with Rosenberg et al. (2005) maintaining that this geographic representation was actually suitable for population structure studies. Second, the classification of languages at the world level is still controversial.
To better assess the patterns of correlation between genetic and linguistic diversity, we need a clearer consensus on the relationships among languages. Despite those potential drawbacks, the results of our analysis, the largest comparison so far of genetic and linguistic diversity in terms of the number of loci considered, confirm the existence, at a worldwide scale, of a small but detectable effect of linguistic differences on human DNA diversity at the genomic level. This result is also of practical anthropological significance, because most languages spoken on earth are currently threatened of extinction. Our finding that the distribution of languages has a detectable influence on human genetic diversity suggests that it would be essential to constitute a collection of additional linguistic and genetic data from the populations which are the most endangered, before it is too late to study them. ACKNOWLEDGMENTS We thank Noah Rosenberg and another anonymous reviewer for insightful comments and suggestions. LITERATURE CITED Barbujani G. 1991. What do languages tell us about human microevolution? Trends Ecol Evol 5:151–156. Barbujani G. 1997. DNA variation and language affinities. Am J Hum Genet 61:1011–1014. Barbujani G, Belle E. 2006. Genomic boundaries between human populations. Hum Hered 61:15–21. Barbujani G, Sokal RR. 1990. Zones of sharp genetic change in Europe are also linguistic boundaries. Proc Natl Acad Sci USA 87:1816–1819. Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A, Chen Z, Chu J, Carcassi C, Contu L, Du R, Excoffier L, Ferrara GB, Friedlaender JS, Groot H, Gurwitz D, Jenkins T, Herrera RJ, Huang X, Kidd J, Kidd KK, Langaney A, Lin AA, Mehdi SQ, Parham P, Piazza A, Pistillo MP, Qian Y, Shu Q, Xu J, Zhu S, Weber JL, Greely HT, Feldman MW, Thomas G, Dausset J Cavalli-Sforza LL. 2002. A human genome diversity cell line panel. Science 296:261–262. Cavalli-Sforza LL. 1966. 
Population structure and human evolution. Proc Royal Soc B 164:362–379. Cavalli-Sforza LL. 1986. African Pygmies. Academic Press, Orlando, FL. Cavalli-Sforza LL, Minch E Mountain JL. 1992. Coevolution of genes and languages revisited. Proc Natl Acad Sci USA 89:5620–5624. Cavalli-Sforza LL, Piazza A, Menozzi P Mountain J. 1988. Reconstruction of human evolution: Bringing together genetic, archaeological and linguistic data. Proc Natl Acad Sci USA 85:6002–6006. Chen J, Sokal RR, Ruhlen M. 1995. Worldwide analysis of genetic and linguistic relationships of human populations. Hum Biol 67:595–612. Cheung KH, Osier MV, Kidd JR, Pakstis AJ, Miller PL, Kidd KK. 2000. ALFRED: an allele frequency database for diverse populations and DNA polymorphisms. Nucleic Acids Res 28:361–363. Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. 2005. Ascertainment bias in studies of human genomewide polymorphism. Genome Res 15:1496–1502. 9 Crawford M. 1998. The origin of native Americans: evidence from archeological genetics. Cambridge University Press, Cambridge. Excoffier L, Hamilton G. 2002. Comment on \Genetic structure of human populations". Science 298:2381–2385. Excoffier L, Harding RM, Sokal RR, Pellegrini B, SanchezMazas A. 1991. Spatial differentiation of RH and GM haplotype frequencies in sub-Saharan Africa and its relation to linguistic affinities. Hum Biol 63:273–307. Excoffier L, Smouse PE, Quattro JM. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491. Gordon RG Jr. 2005. Ethnologue: languages of the world, 15th ed. Dallas, TX: SIL International. Online version: http:// www.ethnologue.com/. Harding RM, Sokal RR. 1988. Classification of the European language families by genetic distances. Proc Natl Acad Sci USA 85:9370–9372. Hunley K, Long JC. 2005. Gene flow across linguistic boundaries in native North American populations. 
Proc Natl Acad Sci USA 102:1312–1317.
Jin L, Chakraborty R. 1995. Population structure, stepwise mutations, heterozygote deficiency and their implications in DNA forensics. Heredity 74:274–285.
Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT, Batzer MA. 2000. The distribution of human genetic diversity: a comparison of mitochondrial, autosomal and Y-chromosome data. Am J Hum Genet 66:979–988.
Klein J, Satta Y, O’hUigin C, Takahata N. 1993. The molecular descent of the major histocompatibility complex. Annu Rev Immunol 11:269–295.
Manni F, Guerard E, Heyer E. 2004. Geographic patterns of (genetic, morphologic, linguistic) variation: how barriers can be detected by using Monmonier’s algorithm. Hum Biol 76:173–190.
Mantel NA. 1967. The detection of disease clustering and a generalized regression approach. Cancer Res 27:209–220.
Monsalve MV, Helgason A, Devine DV. 1999. Languages, geography and HLA haplotypes in native American and Asian populations. Proc Royal Soc B 266:2209–2216.
Nettle D, Harriss L. 2003. Genetic and linguistic affinities between human populations in Eurasia and West Africa. Hum Biol 75:331–344.
Nichols J. 1997. Modeling ancient population structures and movement in linguistics. Annu Rev Anthropol 26:359–384.
O’hUigin C, Satta Y, Takahata N, Klein J. 2002. Contribution of homoplasy and of ancestral polymorphism to the evolution of genes in anthropoid primates. Mol Biol Evol 19:1501–1513.
Poloni ES, Semino O, Passarino G, Santachiara-Benerecetti AS, Dupanloup I, Langaney A, Excoffier L. 1997. Human genetic affinities for Y-chromosome P49a,f/TaqI haplotypes show strong correspondence with linguistics. Am J Hum Genet 61:1015–1035.
Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA 102:15942–15947.
Raufaste N, Rousset F. 2001.
Are partial Mantel tests adequate? Evolution 55:1703–1705.
Relethford JH. 2004. Global patterns of isolation by distance based on genetic and morphological data. Hum Biol 76:499–513.
Renfrew C. 1989. Models of change in languages and archaeology. Trans Philos Soc 87:103–155.
Romualdi C, Balding D, Nasidze IS, Risch G, Robichaux M, Sherry ST, Stoneking M, Batzer MA, Barbujani G. 2002. Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms. Genome Res 12:602–612.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. 2002. Genetic structure of human populations. Science 298:2381–2385.
Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. 2005. Clines, clusters, and the effects of study design on the inference of human population structure. PLoS Genetics 6:e70.
Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W, Armenteros M, Arroyo E, Barbujani G, Beckman G, Beckman L, Bertranpetit J, Bosch E, Bradley DG, Brede G, Cooper G, Corte-Real HB, de Knijff P, Decorte R, Dubrova YE, Evgrafov O, Gilissen A, Glisic S, Golge M, Hill EW, Jeziorowska A, Kalaydjieva L, Kayser M, Kivisild T, Kravchenko SA, Krumina A, Kucinskas V, Lavinha J, Livshits LA, Malaspina P, Maria S, McElreavey K, Meitinger TA, Mikelsaar AV, Mitchell RJ, Nafa K, Nicholson J, Norby S, Pandya A, Parik J, Patsalis PC, Pereira L, Peterlin B, Pielberg G, Prata MJ, Previdere C, Roewer L, Rootsi S, Rubinsztein DC, Saillard J, Santos FR, Stefanescu G, Sykes BC, Tolun A, Villems R, Tyler-Smith C, Jobling MA. 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than language.
Am J Hum Genet 67:1526–1543.
Ruhlen M. 1991. A guide to the world’s languages, Vol 1: Classification. Stanford, CA: Stanford University Press.
Sajantila A, Lahermo P, Anttinen T, Lukka M, Sistonen P, Savontaus M-L, Aula P, Beckman L, Tranebjaerg L, Gedde-Dahl T, Issel-Tarver L, DiRienzo A, Paabo S. 1995. Genes and languages in Europe: an analysis of mitochondrial lineages. Genome Res 5:42–52.
Schneider S, Roessli D, Excoffier L. 2000. Arlequin ver. 2.000: A software for population genetics data analysis. Genetics and Biometry Laboratory, University of Geneva, Switzerland.
Serre D, Paabo S. 2004. Evidence for gradients of human genetic diversity within and among continents. Genome Res 14:1679–1685.
Sims-Williams P. 1998. Genetics, linguistics, and prehistory: thinking big and thinking straight. Antiquity 72:505–527.
Slatkin M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457–462.
Smouse PE, Long JC, Sokal RR. 1986. Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35:627–632.
Sokal RR. 1988. Genetic, geographic and linguistic distances in Europe. Proc Natl Acad Sci USA 85:1722–1726.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Vaudor A. 1989. Genetic differences among language families in Europe. Am J Phys Anthropol 79:489–502.
Trask RL. 1996. Historical linguistics. London: Arnold.
Wilson JE, Weale ME, Smith AC, Gratrix F, Fletcher B, Thomas MG, Bradman N, Goldstein DB. 2001. Population genetic structure of variable drug response. Nat Genet 29:265–269.
Wurm SA. 2001. Atlas of the world’s languages in danger of disappearing, 2nd ed. UNESCO Publishing.
American Journal of Physical Anthropology—DOI 10.1002/ajpa ID: vasanss Date: 20/3/07 Time: 15:44 Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044 J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07 Stage: I Page: 11

AQ1: The title has been modified slightly so as to eliminate the "statement/sentence effect" of the original. Kindly confirm whether the changes made are appropriate.
AQ2: Kindly note that as Rosenberg et al. (2002) was listed twice in the reference list, the second entry has been deleted.