American Journal of Physical Anthropology

American Journal of Physical Anthropology
Copy of e-mail Notification
Your article ( 2006-00153.R2 ) from American Journal of Physical Anthropology is available for download
=====
American Journal of Physical Anthropology Published by John Wiley & Sons, Inc.
Dear Author,
Your article page proofs for American Journal of Physical Anthropology are ready for review. John Wiley
& Sons has made this article available to you online for faster, more efficient editing. Please follow the
instructions below and you will be able to access a PDF version of your article as well as relevant
accompanying paperwork.
First, make sure you have a copy of Adobe Acrobat Reader software to read these files. This is free
software and is available for user downloading at
http://www.adobe.com/products/acrobat/readstep.html.
Open your web browser, and enter the following web address:
http://kwglobal.co.in/jw/retrieval.aspx
You will be prompted to log in, and asked for a password. Your login name will be your email address, and
your password will be ---Example:
Login: your e-mail address
Password: ---The site contains one file, containing:
-
Author Instructions Checklist
Adobe Acrobat Users - NOTES tool sheet
Reprint Order form
A copy of your page proofs for your article
Print out this file, and fill out the forms by hand. (If you do not wish to order reprints, please mark a "0" on
the reprint order form.) Read your page proofs carefully and:
-
indicate changes or corrections in the margin of the page proofs
answer all queries (footnotes A,B,C, etc.) on the last page of the PDF proof
proofread any tables and equations carefully
check your figure legends for accuracy
Within 48 hours, please return via fax or express mail all materials to the address given below. This will
include:
1) Page proofs with corrections
2) Reprint Order form
Return to:
Doug Frank
Production Editor
American Journal of Physical Anthropology
Copy of e-mail Notification
Cadmus Communications
300 W. Chestnut St.
Ephrata, PA 17522
[email protected]
Phone: 800-238-3814 x615
Fax: 717-738-9360
Technical problems? If you experience technical problems downloading your file or any other problem
with the website listed above, please contact Malathi Shekar (e-mail: [email protected],
phone: +91 (44) 42058888 (x310).
Questions regarding your article? Please don’t hesitate to contact me with any questions about the article
itself, or if you have trouble interpreting any of the questions listed at the end of your file. REMEMBER
TO INCLUDE YOUR ARTICLE NO. ( 2006-00153.R2 ) WITH ALL CORRESPONDENCE.
This will help both of us address your query most efficiently.
As this e-proofing system was designed to make the publishing process easier for everyone, we welcome
any and all feedback. Thanks for participating in our e-proofing system!
This e-proof is to be used only for the purpose of returning corrections to the publisher.
Sincerely,
Doug Frank
Production Editor
Cadmus Communications
300 W. Chestnut St.
Ephrata, PA 17522
e-mail: [email protected]
Phone: 800-238-3814 x615
Fax: 717-738-9360
111 RIVER STREET, HOBOKEN, NJ
07030
***IMMEDIATE RESPONSE REQUIRED***
Your article will be published online via Wiley's EarlyView® service (www.interscience.wiley.com) shortly after receipt of
corrections. EarlyView® is Wiley's online publication of individual articles in full text HTML and/or pdf format before release of
the compiled print issue of the journal. Articles posted online in EarlyView® are peer-reviewed, copyedited, author corrected,
and fully citable via the article DOI (for further information, visit www.doi.org). EarlyView® means you benefit from the best of
two worlds--fast online availability as well as traditional, issue-based archiving.
Please follow these instructions to avoid delay of publication.
READ PROOFS CAREFULLY
• This will be your only chance to review these proofs. Please note that once your corrected article is posted
online, it is considered legally published, and cannot be removed from the Web site for further corrections.
• Please note that the volume and page numbers shown on the proofs are for position only.
ANSWER ALL QUERIES ON PROOFS (Queries for you to answer are attached as the last page of your proof.)
• Mark all corrections directly on the proofs. Note that excessive author alterations may ultimately result in delay of
publication and extra costs may be charged to you.
CHECK FIGURES AND TABLES CAREFULLY (Color figure proofs will be sent under separate cover.)
• Check size, numbering, and orientation of figures.
• All images in the PDF are downsampled (reduced to lower resolution and file size) to facilitate Internet delivery.
These images will appear at higher resolution and sharpness in the printed article.
• Review figure legends to ensure that they are complete.
• Check all tables. Review layout, title, and footnotes.
COMPLETE REPRINT ORDER FORM
• Fill out the attached reprint order form. It is important to return the form even if you are not ordering reprints. You
may, if you wish, pay for the reprints with a credit card. Reprints will be mailed only after your article appears in
print. This is the most opportune time to order reprints. If you wait until after your article comes off press, the
reprints will be considerably more expensive.
RETURN
PROOFS
REPRINT ORDER FORM
CTA (If you have not already signed one)
RETURN IMMEDIATELY AS YOUR ARTICLE WILL BE POSTED ONLINE SHORTLY AFTER RECEIPT;
FAX PROOFS TO 717-738-9360.
QUESTIONS?
Doug Frank, Production Editor
Phone: 800-238-3814 x615
E-mail: [email protected]
Refer to journal acronym and article production number
(i.e., AJPA 02-3399 for American Journal of Physical Anthropology ms 02-3399).
Softproofing for advanced Adobe Acrobat Users
- NOTES tool
NOTE: ACROBAT READER FROM THE INTERNET DOES NOT CONTAIN THE NOTES TOOL USED IN THIS PROCEDURE.
Acrobat annotation tools can be very useful for indicating changes to the PDF proof of your article. By
using Acrobat annotation tools, a full digital pathway can be maintained for your page proofs.
The NOTES annotation tool can be used with either Adobe Acrobat 6.0 or Adobe Acrobat 7.0. Other
annotation tools are also available in Acrobat 6.0, but this instruction sheet will concentrate on how to
use the NOTES tool. Acrobat Reader, the free Internet download software from Adobe, DOES NOT
contain the NOTES tool. In order to softproof using the NOTES tool you must have
the full software suite Adobe Acrobat Exchange 6.0 or Adobe Acrobat 7.0 installed on your computer.
Steps for Softproofing using Adobe Acrobat NOTES tool:
1. Open the PDF page proof of your article using either Adobe Acrobat Exchange 6.0 or Adobe
Acrobat 7.0. Proof your article on-screen or print a copy for markup of changes.
2. Go to Edit/Preferences/Commenting (in Acrobat 6.0) or Edit/Preferences/Commenting (in Acrobat
7.0) check “Always use login name for author name” option. Also, set the font size at 9 or 10 point.
3. When you have decided on the corrections to your article, select the NOTES tool from the Acrobat
toolbox (Acrobat 6.0) and click to display note text to be changed, or Comments/Add Note (in Acrobat
7.0).
4. Enter your corrections into the NOTES text box window. Be sure to clearly indicate where the
correction is to be placed and what text it will effect. If necessary to avoid confusion, you can use your
TEXT SELECTION tool to copy the text to be corrected and paste it into the NOTES text box window.
At this point, you can type the corrections directly into the NOTES text box window. DO NOT correct
the text by typing directly on the PDF page.
5. Go through your entire article using the NOTES tool as described in Step 4.
6. When you have completed the corrections to your article, go to Document/Export Comments (in
Acrobat 6.0) or Comments/Export Comments (in Acrobat 7.0). Save your NOTES file to a place on
your harddrive where you can easily locate it. Name your NOTES file with the article number
assigned to your article in the original softproofing e-mail message.
7. When closing your article PDF be sure NOT to save changes to original file.
8. To make changes to a NOTES file you have exported, simply re-open the original PDF
proof file, go to Document/Import Comments and import the NOTES file you saved. Make changes
and reexport NOTES file keeping the same file name.
9. When complete, attach your NOTES file to a reply e-mail message. Be sure to include your name,
the date, and the title of the journal your article will be printed in.
C1
REPRINT BILLING DEPARTMENT x 111 RIVER STREET x HOBOKEN, NJ 07030
PHONE: (201) 748-8789; FAX: (201) 748-6326
E-MAIL: [email protected]
PREPUBLICATION REPRINT ORDER FORM
Please complete this form even if you are not ordering reprints. This form MUST be returned with your corrected proofs and original manuscript. Your
reprints will be shipped approximately 4 weeks after publication. Reprints ordered after printing will be substantially more expensive.
JOURNAL
American Journal of Physical Anthropology
VOLUME
ISSUE
TITLE OF
MANUSCRIPT
MS. NO.
AJPA--
NO. OF
AUTHOR(S)
PAGES
No. of Pages
100 Reprints
200 Reprints
300 Reprints
400 Reprints
$
$
$
$
500 Reprints
$
1-4
336
501
694
890
1052
5-8
469
703
987
1251
1477
9-12
594
923
1234
1565
1850
13-16
714
1156
1527
1901
2273
17-20
794
1340
1775
2212
2648
21-24
911
1529
2031
2536
3037
25-28
1004
1707
2267
2828
3388
29-32
1108
1894
2515
3135
3755
33-36
1219
2092
2773
3456
4143
37-40
1329
2290
3033
3776
4528
**REPRINTS ARE ONLY AVAILABLE IN LOTS OF 100. IF YOU WISH TO ORDER MORE THAN 500 REPRINTS, PLEASE CONTACT OUR
REPRINTS DEPARTMENT AT (201) 748-8789 FOR A PRICE QUOTE.
Please send me
_____________________
Please add appropriate State and Local Tax
reprints of the above article at
$
(Tax Exempt No.____________________)
$
Please add 5% Postage and Handling
$
TOTAL AMOUNT OF ORDER**
$
for United States orders only.
**International orders must be paid in currency and drawn on a U.S. bank
Please check one:
Check enclosed
Bill me
Credit Card
If credit card order, charge to:
Visa
MasterCard
Credit Card No
American Express
Signature
Exp. Date
BILL TO:
SHIP TO:
Name
Name
Institution
Institution
Address
Address
Purchase Order No.
Phone
E-mail
(Please, no P.O. Box numbers)
Fax
COPYRIGHT TRANSFER AGREEMENT
Date:
To:
111 River Street
Hoboken, NJ 07030
201.748.6000
FAX 201.748.6052
Production/Contribution
ID# _______________
Publisher/Editorial office use only
Re: Manuscript entitled________________________________________________________________________________
____________________________________________________________________________________(the “Contribution”)
for publication in ______________________________________________________________________(the “Journal”)
published by Wiley Periodicals, Inc. (“Wiley”).
Dear Contributor(s):
Thank you for submitting your Contribution for publication. In order to expedite the editing and publishing process and enable
Wiley to disseminate your work to the fullest extent, we need to have this Copyright Transfer Agreement signed and returned to
us as soon as possible. If the Contribution is not accepted for publication this Agreement shall be null and void.
A. COPYRIGHT
1. The Contributor assigns to Wiley, during the full term of copyright and any extensions or renewals of that term, all
copyright in and to the Contribution, including but not limited to the right to publish, republish, transmit, sell, distribute
and otherwise use the Contribution and the material contained therein in electronic and print editions of the Journal and
in derivative works throughout the world, in all languages and in all media of expression now known or later
developed, and to license or permit others to do so.
2. Reproduction, posting, transmission or other distribution or use of the Contribution or any material contained
therein, in any medium as permitted hereunder, requires a citation to the Journal and an appropriate credit to Wiley
as Publisher, suitable in form and content as follows: (Title of Article, Author, Journal Title and Volume/Issue
Copyright  [year] Wiley Periodicals, Inc. or copyright owner as specified in the Journal.)
B. RETAINED RIGHTS
Notwithstanding the above, the Contributor or, if applicable, the Contributor’s Employer, retains all proprietary rights
other than copyright, such as patent rights, in any process, procedure or article of manufacture described in the
Contribution, and the right to make oral presentations of material from the Contribution.
C. OTHER RIGHTS OF CONTRIBUTOR
Wiley grants back to the Contributor the following:
1. The right to share with colleagues print or electronic “preprints” of the unpublished Contribution, in form and
content as accepted by Wiley for publication in the Journal. Such preprints may be posted as electronic files on the
Contributor’s own website for personal or professional use, or on the Contributor’s internal university or corporate
networks/intranet, or secure external website at the Contributor’s institution, but not for commercial sale or for any
systematic external distribution by a third party (e.g., a listserve or database connected to a public access server).
Prior to publication, the Contributor must include the following notice on the preprint: “This is a preprint of an
article accepted for publication in [Journal title]  copyright (year) (copyright owner as specified in the Journal)”.
After publication of the Contribution by Wiley, the preprint notice should be amended to read as follows: “This is a
preprint of an article published in [include the complete citation information for the final version of the Contribution
as published in the print edition of the Journal]”, and should provide an electronic link to the Journal’s WWW site,
located at the following Wiley URL: http://www.interscience.Wiley.com/. The Contributor agrees not to update the
preprint or replace it with the published version of the Contribution.
2. The right, without charge, to photocopy or to transmit online or to download, print out and distribute to a colleague a
copy of the published Contribution in whole or in part, for the colleague’s personal or professional use, for the
advancement of scholarly or scientific research or study, or for corporate informational purposes in accordance with
Paragraph D.2 below.
3. The right to republish, without charge, in print format, all or part of the material from the published Contribution in
a book written or edited by the Contributor.
4. The right to use selected figures and tables, and selected text (up to 250 words, exclusive of the abstract) from the
Contribution, for the Contributor’s own teaching purposes, or for incorporation within another work by the
Contributor that is made part of an edited work published (in print or electronic format) by a third party, or for
presentation in electronic format on an internal computer network or external website of the Contributor or the
Contributor’s employer.
5. The right to include the Contribution in a compilation for classroom use (course packs) to be distributed to students
at the Contributor’s institution free of charge or to be stored in electronic format in datarooms for access by students
at the Contributor’s institution as part of their course work (sometimes called “electronic reserve rooms”) and for inhouse training programs at the Contributor’s employer.
D. CONTRIBUTIONS OWNED BY EMPLOYER
1. If the Contribution was written by the Contributor in the course of the Contributor’s employment (as a “work-madefor-hire” in the course of employment), the Contribution is owned by the company/employer which must sign this
Agreement (in addition to the Contributor’s signature), in the space provided below. In such case, the
company/employer hereby assigns to Wiley, during the full term of copyright, all copyright in and to the
Contribution for the full term of copyright throughout the world as specified in paragraph A above.
2. In addition to the rights specified as retained in paragraph B above and the rights granted back to the Contributor
pursuant to paragraph C above, Wiley hereby grants back, without charge, to such company/employer, its
subsidiaries and divisions, the right to make copies of and distribute the published Contribution internally in print
format or electronically on the Company’s internal network. Upon payment of Wiley’s reprint fee, the institution may
distribute (but not resell) print copies of the published Contribution externally. Although copies so made shall not be
available for individual re-sale, they may be included by the company/employer as part of an information package
included with software or other products offered for sale or license. Posting of the published Contribution by the
institution on a public access website may only be done with Wiley’s written permission, and payment of any
applicable fee(s).
E. GOVERNMENT CONTRACTS
In the case of a Contribution prepared under U.S. Government contract or grant, the U.S. Government may reproduce,
without charge, all or portions of the Contribution and may authorize others to do so, for official U.S. Government
purposes only, if the U.S. Government contract or grant so requires. (U.S. Government Employees: see note at end.)
F. COPYRIGHT NOTICE
The Contributor and the company/employer agree that any and all copies of the Contribution or any part thereof
distributed or posted by them in print or electronic format as permitted herein will include the notice of copyright as
stipulated in the Journal and a full citation to the Journal as published by Wiley.
G. CONTRIBUTOR’S REPRESENTATIONS
The Contributor represents that the Contribution is the Contributor’s original work. If the Contribution was prepared
jointly, the Contributor agrees to inform the co-Contributors of the terms of this Agreement and to obtain their signature
to this Agreement or their written permission to sign on their behalf. The Contribution is submitted only to this Journal
and has not been published before, except for “preprints” as permitted above. (If excerpts from copyrighted works owned
by third parties are included, the Contributor will obtain written permission from the copyright owners for all uses as set
forth in Wiley’s permissions form or in the Journal’s Instructions for Contributors, and show credit to the sources in the
Contribution.) The Contributor also warrants that the Contribution contains no libelous or unlawful statements, does not
infringe upon the rights (including without limitation the copyright, patent or trademark rights) or the privacy of others, or
contain material or instructions that might cause harm or injury.
CHECK ONE:
[____] Contributor-owned work
______________________________________ _________________
Contributor’s signature
Date
________________________________________________________
Type or print name and title
______________________________________ _________________
Co-contributor’s signature
Date
________________________________________________________
Type or print name and title
ATTACHED ADDITIONAL SIGNATURE PAGE AS NECESSARY
[____] Company/Institution-owned work
(made-for-hire in the
course of employment)
______________________________________
Company or Institution (Employer-for-Hire)
_________________
Date
______________________________________
Authorized signature of Employer
_________________
Date
[____] U.S. Government work
Note to U.S. Government Employees
A contribution prepared by a U.S. federal government employee as part of the employee’s official duties, or which is an
official U.S. Government publication is called a “U.S. Government work,” and is in the public domain in the United States.
In such case, the employee may cross out Paragraph A.1 but must sign and return this Agreement. If the Contribution was not
prepared as part of the employee’s duties or is not an official U.S. Government publication, it is not a U.S. Government work.
[____] U.K. Government work (Crown Copyright)
Note to U.K. Government Employees
The rights in a Contribution prepared by an employee of a U.K. government department, agency or other Crown body as part
of his/her official duties, or which is an official government publication, belong to the Crown. In such case, Wiley will
forward the relevant form to the Employee for signature.
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
Stage: I
Page: 1
AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 131:000–000 (2007)
AQ1
Worldwide Analysis of Multiple Microsatellites:
Language Diversity has a Detectable Influence
on DNA Diversity
Elise M.S. Belle and Guido Barbujani*
Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Via Borsari, 46, 44100 Ferrara, Italy
KEY WORDS
microsatellite loci; population genetics; genetic diversity; linguistics
ABSTRACT
Previous studies of the correlations between the languages spoken by human populations and
the genes carried by the members of those populations
have been limited by the small amount of genetic
markers available and by approximations in the treatment of linguistic data. In this study we analyzed a large
collection of polymorphic microsatellite loci (377), distributed on all autosomes, and used Ruhlen’s linguistic classification, to investigate the relative roles of geography
and language in shaping the distribution of human DNA
diversity at a worldwide scale. For this purpose, we performed three different kinds of analysis: (i) we partitioned genetic variances at three hierarchical levels of
population subdivision according to language group by
means of a molecular analysis of variance (AMOVA); (ii)
we quantified by a series of Mantel’s tests the correlation
between measures of genetic and linguistic differentiation; and (iii) we tested whether linguistic differences
are increased across known zones of increased genetic
change between populations. Genetic differences appear
to more closely reflect geographic than linguistic differentiation. However, our analyses show that language differences also have a detectable effect on DNA diversity at
the genomic level, above and beyond the effects of geographic distance. Am J Phys Anthropol 131:000–000,
2007. V 2007 Wiley-Liss, Inc.
The patterns of genetic diversity in the geographic
space reflect the interaction between evolutionary factors
leading populations to diverge (mutation, drift, and
diversifying selection) and factors causing genetic convergence (gene flow and other forms of selection). In
humans, the spatial distance between potential mates is
probably the most important factor preventing gene flow
between populations (Relethford, 2004), but nonspatial
reproductive barriers, such as cultural, religious, and
language boundaries, also play an important role (Barbujani, 1991). In particular, language boundaries have
been shown to represent a major obstacle to interbreeding, which accounts for the generally good correlation
between allele frequencies and language groups (Sokal,
1988; Cavalli-Sforza et al., 1988; Barbujani and Sokal,
1990; Cavalli-Sforza et al., 1992).
Any obstacles to gene flow, including geographic distance, tend to decrease both genetic and linguistic similarity between populations. Languages spoken by distant
or isolated groups will undergo independent changes and
slowly diverge from the source language, whereas adjacent groups—or groups not separated by reproductive
barriers—will have easier contacts and will tend to
speak related languages (Nichols, 1997). Under the same
conditions, parallel processes occur at the genomic level.
However, the phenomena producing genetic and linguistic change through time are not identical, and changes
might occur at different time scales. Cavalli-Sforza et al.
(1988) first suggested that linguistic change might be
more rapid than genetic change. Chen et al. (1995)
remarked that genetic transmission is only vertical
whereas languages can also spread horizontally and suggested that linguistic and genetic change are likely to
proceed at different rates. In human history many examples of phenomena disrupting the parallelism between
genetic and linguistic change are indeed known. These
include selection affecting specific genetic loci, episodes
of language replacement associated with the migration
of a numerically small elite, creolization, and admixture
accompanied by the establishment of a common language (Renfrew, 1989). Therefore, a relationship between
language and genes is not a safe general assumption,
but rather a hypothesis which needs to be tested.
Around 6,000 languages are currently spoken on
earth, but the great majority is threatened of extinction.
Indeed about 97% of the world’s population speak 4% of
the languages, whereas 10% of languages have less than
100 speakers (Wurm, 2001). About half of the languages
are currently losing their speakers, which could lead to
the replacement of between 50% and 90% of the minority
languages before the end of the century. In this context,
it is urgent to assess to which extent languages differences have played a role in shaping human genetic diversity before many of the relevant data vanish.
Several studies have shown that there is a significant
correlation between genetic and linguistic diversity in
Europe (Sokal, 1988; Sokal et al., 1989, 1989; Harding
and Sokal, 1988; Sajantila et al., 1995) and that language barriers probably play an important role in main-
C 2007
V
C
Grant sponsor: European Science Foundation (Eurocores Programme: The Origin of Man, Language, and Languages) through
the Italian CNRUniversity of Ferrara.
*Correspondence to: Guido Barbujani, Dipartimento di Biologia ed
Evoluzione, Università di Ferrara, Via Borsari, 46, 44100 Ferrara,
Italy. E-mail: [email protected]
Received 27 June 2006; accepted 28 February 2007
DOI 10.1002/ajpa.20622
Published online in Wiley InterScience (www.interscience.wiley.com).
WILEY-LISS, INC.
ID: vasanss
Date: 20/3/07
Time: 15:43
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
2
Stage: I
Page: 2
BELLE AND BARBUJANI
taining genetic differences by acting as reproductive barriers (Barbujani and Sokal, 1990). In sub-Saharan
Africa, Excoffier et al. (1991) found that the language
family relationships are a better predictor of the genetic
structure than geographical relatedness. Based on classical markers (20 alleles in total), Cavalli-Sforza et al.
(1988) first produced evidence suggesting that the main
linguistic groups of the world broadly correspond to
genetic clusters. Later, Chen et al. (1995) and Poloni
et al. (1997) also found a significant correlation between
genetic and linguistic diversity based, respectively, on 11
autosomal loci and on Y-chromosome restriction polymorphisms.
On the other hand, the results are not so clear-cut in the
Americas and at the worldwide scale. Some authors have
indeed failed to detect association between genetic and linguistic diversity among native Americans and Asians
(Monsalve et al., 1999) and within North America (Hunley
and Long, 2005). Finally, Nettle and Harris (2003) found a
significant influence of languages on X chromosome
genetic differences in Europe and East and Central Asia,
but not in the Near East, Southeast Asia, and West Africa,
a result indicating that this correlation emerges only
under specific conditions. All these studies relied on a limited number of genetic polymorphisms, which may not
exhaustively reflect diversity at the genomic level.
The number of loci analyzed is not a secondary factor.
Indeed, particular selection regimes—affecting specific
loci—may generate genetic outliers, i.e. loci with unusual
spatial patterns of diversity. As a consequence, as has been
suggested for many decades now (Cavalli-Sforza, 1966), to
minimize the effect of such outliers, an extensive sampling
of different genome regions is indispensable.
Therefore, in this study we analyzed a large sample of
polymorphic microsatellite loci (377) spread across the
genome, the largest genetic dataset used so far to investigate the influence of languages on the distribution of
human genetic diversity. Rosenberg et al. (2002) briefly
discussed the relationship between genes and languages
in these data, but without a quantitative approach.
Here, thanks to the large number of markers considered,
we are seeking a more accurate estimate of the correlation between genetics and linguistics. Those autosomal
microsatellite markers have a higher rate of mutation
than other markers, and could therefore parallel more
closely the linguistic change (Barbujani, 1997). In addition, estimates of variation at microsatellite loci do not
seem to suffer from ascertainment bias, a factor known
to lead to erroneous conclusions in analyses based on
single nucleotide or insertion–deletion polymorphisms
(Clark et al., 2005).
We addressed three questions in particular, namely (i)
Do increasing levels of linguistic differentiation lead to
greater genetic distances? (ii) What is the relative
weight of geography and languages in shaping human
genetic diversity? (iii) Do previously identified genetic
barriers across the world also tend to represent zones of
increased linguistic differentiation?
diversity panel of the CEPH (Centre pour l’Etude des
Polymorphismes Humains at the Institut Jean Dausset
in Paris) (Cann et al., 2002). In the CEPH panel, the
individuals defined as \Bantus" came from several different areas; we excluded from the analysis all those
who were not from Kenya because of their small sample
sizes.
Languages were attributed to the samples according to
two sources, namely Ruhlen’s (1991) classification and
The Ethnologue website (Gordon, 2005). There is so far
no universally accepted global taxonomy of languages;
however, Ruhlen’s classification, albeit somewhat controversial, is widely used in population genetic studies
(Poloni et al., 1997; Chen et al., 1995; Cavalli-Sforza
et al., 1992, 1988; Excoffier et al., 1991) because it is
probably the most comprehensive attempt to group all
the world’s 6,000 languages into linguistic phyla, 17. The
languages of all populations of this study were defined
consistently in the two sources, with the exception of a
few populations indicated in Table 1. For instance, The
Ethnologue classifies the American languages into to
four distinct phyla, and not one as Ruhlen (1991) does.
The CEPH dataset includes two pygmy samples, Biaka
and Mbuti. The pygmies’ linguistic affiliation is difficult
to define, as in most cases they seem to have borrowed
their neighbors’ language (Cavalli-Sforza, 1986). We
attributed the Biaka to the Niger-Kordofanian phylum,
which includes Bantu languages, because, besides speaking a language of this phylum, the Biaka are thought to
have undergone extensive admixture with Bantu speakers (Cavalli-Sforza, 1986). On the contrary, Mbuti pygmies include both speakers of Nilosaharan and NigerKordofanian languages (ALFRED database: Cheung
et al., 2000), whom proved impossible for us to separate.
As a consequence, the Mbuti sample was not considered
in the analysis. In this way, our results refer to 50 populations: 6 from Africa, 3 from the Middle East, 26 from
Asia, 2 from Oceania, 8 from Europe, and 5 from the
Americas (Fig. 1), which we classified into linguistic families, branches, and phyla (Table 1).
Analysis of molecular variance
Measures of genetic variance were hierarchically partitioned by an analysis of molecular variance (AMOVA)
(Excoffier et al., 1992) implemented in the Arlequin ver
2.0 software (Schneider et al., 2000). For each locus, we
compared each individual genotype with (i) the genotype
of the individuals of the same linguistic family, (ii) the
genotypes of the individuals from different families of
the same linguistic phylum and (iii) the genotypes of
individuals belonging to different linguistic phyla.
Two measures of genetic distance were estimated from
the data which can be assimilated to FST and RST values.
The number of alleles differing between two haplotypes
was calculated as:
^ xy ¼
d
L
X
dxy ðiÞ
i¼1
MATERIALS AND METHODS
Dataset
We analyzed the published genetic dataset comprising
377 autosomal microsatellite loci distributed on all 22
autosomes in 52 world populations (Rosenberg et al.,
2002). The samples were part of a study of the cell-line
where dxy(i) is the Kronecker function, equal to 1 if the
alleles of the ith locus are identical for both haplotypes,
and equal to 0 otherwise, and summation is over all loci.
This index is analogous to a weighted FST statistics over
all loci (see Arlequin manual, Schneider et al., 2000).
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:43
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
T1
F1
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
Stage: I
Page: 3
3
HUMAN DNA DIVERSITY AND LANGUAGES
TABLE I. Language name, family, branch and phylum for each population considered,
according to Ruhlen’s classification of languages
Population name
Language (dLAN¼1)
Family (dLAN¼2)
Biaka
Yaka
Bantoid
Niger-Congo
Mandenka
Mandinka
Mande
Niger-Congo
Yoruba
Yoruba
Yoruba-North. Akoko
Niger-Congo
San
Kenya
San
Bantu
Hai.n//um
Bantoid
–
Niger-Congo
Mozabite
Bedouin
Druze
Palestinian
Brahui
Mozabite
Arabic
Arabic
Arabic
Brahui
Berber
Arabo-Canaanite
Arabo-Canaanite
Arabo-Canaanite
Dravidian
–
Semitic
Semitic
Semitic
North West
Balochi
Baluchi
Iranian
Indo-Iranian
Hazara
Persian
Iranian
Indo-Iranian
Makrani
Baluchi
Iranian
Indo-Iranian
Sindhi
Sindhi
Indic
Indo-Iranian
Pathan
Newari
Kalash
Kalasha
Tibeto-Karen
Eth: Tibeto-Burman
Indo-Iranian
Burusho
Han
Tujia
Burushaski
Mandarin/Cantonese
Tujia
Tibetic
Eth: Himalayish
Indic
Eth: Dardic
–
Chinese
Tai
Yi
Yi
Burmic
Miao
Miao
Miao
Oroqen
Daur
Mongolian
Hezhen
Xibo
Uyghur
Dai
Oroqen
Daur
Mongolian
Hezhen
Xibo
Uyghur
Dai
Lahu
Lahu
Tungus
Mongolian
Mongolian
Tungus
Tungus
Turkic
Austronesian
Eth: Austro-Tai
Burmic
She
She
Yao
Naxi
Naxi
Burmic
Tu
Yakut
Japanese
Cambodian
Tu
Yakut
Japanese
Khmer
Mongolian
Turkic
Japanese
Mon-Khmer
Papuan
–
Papuan
Melanesian
–
Melanesian
Branch (dLAN¼3)
–
Sinitic
Austro-Tai
Eth: Tibeto-Burman
Tibeto-Karen
Eth: Tibeto-Burman
Miao-Yao
Mongolian-Tungus
Mongolian-Tungus
Mongolian-Tungus
Mongolian-Tungus
Mongolian-Tungus
–
Austro-Tai
Eth: Babar
Tibeto-Karen
Eth: Tibeto-Burman
Miao-Yao
Tibeto-Karen
Eth: Tibeto-Burman
Mongolian-Tungus
–
Korean-Japanese
Austro-Asiatic
French
Basque
Sardinian
French
Basque
Sardinian
Romance continental
–
–
–
Eth: Oceanic
–
Eth: Oceanic
italic
–
Italic
Bergamo
Italian
Romance continental
Italic
Tuscan
Italian
Romance continental
Italic
Orcadian
English
–
Germanic
Adygei
Adyghe
–
Circassian
Phylum (dLAN¼4)
Niger-Kordofanian
Eth: Niger-Congo
Niger-Kordofanian
Eth: Niger-Congo
Niger-Kordofanian
Eth: Niger-Congo
Khoisan
Niger-Kordofanian
Eth: Niger-Congo
Afro-Asiatic
Afro-Asiatic
Afro-Asiatic
Afro-Asiatic
Elamo-Dravidian
Eth: Dravidian
Indo-Hittite
Eth: Indo-European
Indo-Hittite
Eth: Indo-European
Indo-Hittite
Eth: Indo-European
Indo-Hittite
Eth: Indo-European
Sino-Tibetan
Indo-Hittite
Eth: Indo-European
Language isolate
Sino-Tibetan
Austric
Eth: Sino-Tibetan
Sino-Tibetan
Austric
Eth: Miao-Yao
Altaic
Altaic
Altaic
Altaic
Altaic
Altaic
Austric
Eth: Austronesian
Sino-Tibetan
Austric
Eth: Miao-Yao
Sino-Tibetan
Altaic
Altaic
Altaic
Austric
Eth: Austro-Asiatic
Indo-Pacific
Eth: Austronesian
Indo-Pacific
Eth: Austronesian
Indo-Hittite
Language Isolate
Indo-Hittite
Eth: Indo-European
Indo-Hittite
Eth: Indo-European
Indo-Hittite
Eth: Indo-European
Indo-Hittite
Eth: Indo-European
Northern Caucasian
(continued)
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:43
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
4
Stage: I
Page: 4
BELLE AND BARBUJANI
TABLE 1. (Continued)
Population name
Language (dLAN¼1)
Family (dLAN¼2)
Branch (dLAN¼3)
Russian
Russian
Slavic
Balto-Slavic
Pima
Pima
Pimic
Maya
Maya
Mexican
Piapoco
Piapoco
Macro-Arawakan
Karitiana
Karitiana
Kariti-Tupi
Surui
Surui
Kariti-Tupi
Uto-Aztecan
Sonoran
Penutian
Eth: Yucatecan
Equatorial-Tucanoan
Eth: Maipuran
Equatorial-Tucanoan
Eth: Arikem
Equatorial-Tucanoan
Eth: Monde
Phylum (dLAN¼4)
Indo-Hittite
Eth: Indo-European
Amerind
Eth: Aztecan
Amerind
Eth: Mayan
Amerind
Eth: Arawakan
Amerind
Eth: Tupi
Amerind
Eth: Tupi
When the classification of The Ethnologue differs significantly from Ruhlen’s, the different classification is indicated as ‘Eth:’ on a
second line.
C
O
L
O
R
Fig. 1. Distribution of the sampling localities. The significant genomic boundaries found based on RST values (Barbujani and
Belle, 2006) are represented by thick black lines. The Delaunay connections are represented by thin lines (solid between language
phyla, dashed between language families, and dotted within families). Different colors represent samples belonging to different language phyla and different symbols (triangles, circles, squares) represent samples belonging to different language families within
the same phylum.
Slatkin’s (1995) RST is a specific distance measure for
microsatellites, based on both allele frequency differences between populations and repeat number differences
between alleles. This measure was calculated as the sum
of the squared numbers of repeat differences between
two haplotypes, that is to say as:
^ xy ¼
d
L X
axi ayi
2
i¼1
where axi is the number of repeats at the ith locus (see
Arlequin manual, Schneider et al., 2000).
The significance of the estimated variances was tested
by means of a nonparametric permutational procedure
(Excoffier et al., 1992). In three independent tests, (i)
individuals were assigned to random language families
of the same phylum, (ii) individuals were assigned to
random language families regardless of the phylum, and
(iii) language families were randomly assigned to language phyla, each time recalculating the relevant variance. Each randomization process was iterated 10,000
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:43
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
Stage: I
Page: 5
5
HUMAN DNA DIVERSITY AND LANGUAGES
times. The empirical distributions thus obtained were
then compared with the observed variances, and an empirical level of significance was thus estimated for each
variance.
Mantel tests
A matrix of great circle geographic distances (dGEO
matrix) between populations was constructed using a
method first proposed by Ramachandran et al. (2005),
i.e. assuming five obligatory waypoints on the world’s
map so as to make distances between continents more
reflective of the likely human dispersal routes from
Africa. These points were Anadyr, Russia (64N, 177E);
Cairo, Egypt (30N, 31E); Istanbul, Turkey (41N, 28E);
Phnom Penh, Cambodia (11N, 104E); and Prince Rupert,
Canada (54N, 130W).
Linguistic distances (in the dLAN matrix) were estimated as simple dissimilarity indexes ranging from 0 to
4, according to the method described by Excoffier et al.
(1991). In fact, two dLAN matrices were constructed,
based either on Ruhlen’s (1991) language classification,
or on The Ethnologue. Population speaking languages
belonging to different phyla were assigned dLAN ¼ 4,
languages of different branches dLAN ¼ 3, languages of
different families dLAN ¼ 2, different languages dLAN ¼
1, and the same language dLAN ¼ 0.
Finally, matrices of genetic distances between populations (dGEN matrix) were computed, using either the
standard FST measure or Slatkin’s RST.
Correlation between genetic boundaries and
language barriers
The correlations among the three matrices were estimated by pairwise (Mantel, 1967) and multiple (Smouse
et al., 1986) Mantel tests. First we calculated three pairwise correlations (between dGEO and dGEN, between
dGEO and dLAN, and between dGEN and dLAN). We then
separated the effects of geography and language on the
genetic distances by calculating the partial correlations
between dGEN and dGEO (with dLAN held constant) and
between dGEN and dLAN (with dGEO held constant). The
significance of the observed coefficients was calculated
by randomly permuting rows and columns of one matrix
while keeping the other matrix constant, thus obtaining
an empirical null distribution of correlation coefficients.
Doubts have been raised on the accuracy of the P-values
associated with the partial Mantel tests (Raufaste and
Rousset, 2001). Be that as it may, we provide those values for the sake of comparison with previous studies. All
the above procedures are implemented in the Arlequin
ver 2.0 software (Schneider et al., 2000).
Correlation between genetic barriers and
linguistic groups
In a previous analysis of the same dataset, six significant genetic boundaries were identified, namely zones
where the rate of genetic change is locally increased
with respect to random locations (Barbujani and Belle,
2006; see also Rosenberg et al., 2005 for an alternative
approach). In this study, we investigated whether at
these genetic boundaries linguistic change is increased
with respect to other zones on the map. Adjacent populations were connected by a Delaunay network (Manni
et al., 2004) and each edge of the network was associated
with a measure of linguistic differentiation (Figure 1).
TABLE 2. Hierarchical AMOVA analysis, showing the percentage of variation at each of three levels of population hierarchy,
considering either allele frequencies only (FST) or both allele
frequencies and molecular distances between alleles (RST)
Measure of genetic distance
FST
RST
Among language phyla
Among populations between language
families, within phyla
Within populations
2.9
2.4
6.7
2.9
94.7
90.4
All values are significant at the P < 0.0001 level.
For this purpose, we pooled the indexes of language dissimilarity used for the Mantel tests in two different
ways: we either considered dLAN ¼ 4 versus dLAN ¼ 0, 1,
2, 3 or dLAN ¼ 3, 4 versus dLAN ¼ 0, 1, 2. We then compared the average dLAN along edges of the network that
were crossed by a significant genetic boundary with the
average dLAN measured along the other edges of the network, under the null hypothesis of no difference between
averages.
RESULTS
AMOVA
Considering allele-frequency differences (FST), the
analysis of variance reveals that only 2.9% of the genetic
variation is explained by differences between linguistic
phyla and 2.4% is due to differences between language
families (Table 2). The average worldwide FST value estimated from this dataset is equal to 0.053.
When molecular differences between alleles are considered (RST), genetic diversity between phyla and families
increases to 6.7% of the total for differences among language phyla and to 2.9% among families.
For both analyses, the diversity indexes among individuals from the same language family, individuals from
different language families of the same phylum, and
individuals from different language phyla are all significantly greater than 0 at P < 0.05. Considering the classification of languages of The Ethnologue instead of Ruhlen’s, we found very similar results, with the difference
in the percentages <0.5% for each component of variation (data not given). In what follows, unless otherwise
specified, we shall refer to dLAN matrices estimated
according to Ruhlen’s classification.
We did not perform analyses at a lower hierarchical
level, that is to say within language phyla or families,
because of the highly variable number of populations
sampled within each linguistic group.
Mantel tests
The correlation coefficients between geographic, linguistic and genetic distances, using either RST or FST
distances, are given in Table 3 and Figure 2.
Considering FST values, and hence not taking into
account molecular differences between alleles, both variables, geography and language, appear significantly
associated with genetic variation. The correlation
between genetic diversity and linguistic distances was
highly significant (r ¼ 0.226, P < 0.0001), meaning that
5.1% (r2) of the genetic variation is accounted for by language distances. The proportion of genetic variation
accounted for by geography was much higher (r2 ¼
65.3%). However, geographic distances and linguistic
classification are also correlated, and predictably so,
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:44
T2
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
T3, F2
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
6
Stage: I
Page: 6
BELLE AND BARBUJANI
TABLE 3. Mantel correlation and partial correlation coefficients (r), between genetic (dGEN), geographic (dGEO) and linguistic matrices (dLAN) based on Ruhlen’s classification of languages, either using FST or RST measures of genetic diversity
Genetic measure
RST
FST
Correlation
coefficient (r)
Matrices considered
DGEN and dGEO
dGEN and dlan
dGEO and dLAN
dGEN, dGEO, and dLAN
dGEN and dLAN, dLAN constanta
dGEN and dLAN, dLAN constanta
Proportion of
variance
explained (r2)
0.808***
0.226***
0.268***
0.653
0.051
0.072
0.653
0.633
0.0003
0.796***
0.018N.S.
Correlation
coefficient (r)
Proportion of
variance
explained (r2)
0.746***
0.311***
0.269***
0.723***
0.172***
0.557
0.097
0.072
0.570
0.523
0.030
a
Partial correlation coefficients.
*** P 0.0001; NS: non significants.
C
O
L
O
R
Fig. 2. (a) Correlation between genetic distances calculated as RST values, and geographic distances constrained by five obligatory waypoints (dGEO). Following Ramachandran et al. (2005), red squares represent comparisons within regions, green triangles
comparisons between populations in Africa and Eurasia, and blue diamonds comparisons with America and Oceania. (b) Box plot
showing the correlation between genetic distances (RST) and linguistic distances (dLAN) calculated on the basis of Ruhlen’s classification of languages. The upper and lower limits of the box represent, respectively, the 5% and 95% confidence intervals, and the
long horizontal bar in the box represents the median.
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:44
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
Stage: I
Page: 7
7
HUMAN DNA DIVERSITY AND LANGUAGES
because populations speaking related languages form
clusters in the geographical space. As a consequence, the
observed correlation between linguistic and genetic distances may conceivably reflect the fact that both are correlated with geographical distances. When geographic
distances are held constant, the correlation does not
remain significant when genetic distances are measured
by FST.
By contrast, using RST values (Table 3), which are
more specific for microsatellites, we found that 55.7% of
the genetic variance is explained by geography, while
only 9.7% is accounted for by linguistics (without keeping any variable constant). In this case, the correlation
between genetic diversity and linguistic distances
remains significant, and highly so, even at constant geographic distances (r ¼ 0.172, P < 0.0001), and in this
case linguistic distances alone account for less than 3%
(r2) of genetic variation.
Therefore, the geographic location of populations
seems to be a better predictor of genetic distances than
their linguistic affiliation, even though both correlations
are highly significant. However, our results also suggest
that between 43% (with RST values) and 45% (with FST
values) of the overall genetic variance reflects the action
of other factors. Once again, considering the classification of languages of The Ethnologue instead of Ruhlen’s
did not alter substantially our results (data not given).
Figure 2a is a plot of the RST values versus pairwise
geographic distances between populations. The correlation appears to be mostly due to comparisons of American and Oceanian populations, where both the RST values and geographic distances are greatest. However,
increasing genetic distances tend to correspond to
increasing language distances also when populations of
the same region are compared. This is confirmed by Figure 2b, where the category corresponding to dLAN ¼ 4
has a large confidence interval for the RST value.
Correlation between genetic barriers and
linguistic groups
T4
In our 50 population samples, 48 different languages
are spoken belonging to 11 linguistic phyla and 2 language isolates, according to Ruhlen’s (1991) classification. If language differences represent an important barrier to gene flow, we expect the linguistic difference
between populations on different sides of a genetic barrier to be on average greater than the one on the same
side of that barrier.
In a previous analysis of this dataset, six significant
genetic boundaries were identified using Slatkin’s RST
measure of genetic distances (Barbujani and Belle,
2006). We found that out of the 97 Delaunay connections
of the map, 23 are crossed by one of those genetic boundaries (Fig. 1 and Table 4). A fraction equal to 32.7%
(18/55) of the Delaunay connections between linguistic
phyla (dLAN ¼ 4) is crossed by a significant genetic barrier, as is the case for 13.9% (6/43) of the connections
within linguistic phyla (dLAN ¼ 1, 2, or 3). The difference
between those proportions is statistically significant
(v2 ¼ 4.6, P < 0.05), showing that increased genetic
change tends to be observed with higher probability
where linguistic change is greatest.
Among the most controversial issues in linguistics is
the classification of native American languages.
Although Ruhlen clusters them into a single phylum,
many studies have suggested that they could belong to
TABLE 4. Proportion of Delaunay connections in the world’s
map crossed by a significant genetic
barrier as defined in Barbujani and Belle (2006)
Delaunay connections
Linguistic
distance
dLAN¼0
dLAN¼1
dLAN¼2
dLAN¼3
dLAN¼4
Total
Crossed by
a barrier
Not crossed
by a barrier
Total
0
1
2
3
18
23
5
8
7
17
37
74
5
9
9
20
55
97
Two different ways of grouping the coding of the linguistic distances were tested: (i) at the level dLAN¼4 versus dLAN¼0,l,2,3
and (ii) at the level dLAN¼3,4 versus dLAN¼0,1,2 (see main
text).
several distinct phyla (see for instance Sims-Williams,
1998). We reran the analysis considering the classification of The Ethnologue (Table 1). In this way, 21 out of
23 Delaunay connections showing increased genetic
change occur between different language phyla, thus
making the difference between linguistic categories even
more significant (v2 ¼ 9.9, P < 0.01) than in the previous
test.
To understand if the correlation is due to a specific
degree of linguistic differentiation, we also investigated
whether populations speaking languages belonging to
different phyla or branches (dLAN ¼ 3 or 4) occur more
often on different sides of a genetic boundary than populations speaking languages belonging to the same branch
(dLAN ¼ 0, 1, or 2). The difference between those linguistic categories was not significant (v2 ¼ 2.13, NS).
DISCUSSION
This study suggests that linguistic differences between
populations have a small, but nonnegligible, effect on
patterns of DNA variation at the world scale. The large
number of markers considered, and the fact that ascertainment bias is unlikely to have affected the statistics
estimated from DNA data, suggests that this conclusion
is robust and can be generalized at the genome level, at
least for autosomes. By contrast, uniparentally-transmitted markers (mtDNA and Y chromosome) have a lower
effective population size and hence are affected more
strongly by genetic drift, often showing peculiar geographical patterns (see for instance Jorde et al., 2000,
Wilson et al., 2001; Romualdi et al., 2002). As a consequence, for further generalizations our conclusions will
have to be tested against suitable mitochondrial DNA
and Y-chromosome datasets.
Here we found that 5.3% of the global variance in our
autosomal dataset can be explained by differences
between linguistic groups (families or phyla) using FST
measures of genetic distances, and 9.6% using the RST
statistic. These values are of the same order of magnitude as the proportion of genetic variation that can be
attributed to differences among seven geographic regions
using the same dataset, that is to say 3.6% (Rosenberg
et al., 2002) and 9.2% (Excoffier and Hamilton, 2003),
using respectively FST and RST measures of genetic diversity.
These low values could be interpreted as meaning that
language affiliations are as informative—or uninformative—as geographic locations to predict the genetic rela-
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:44
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
8
Stage: I
Page: 8
BELLE AND BARBUJANI
tionships between populations. However, the Mantel
tests indicate that in fact geographic distances are a
good predictor of genetic distances, and that linguistic
distances also affect DNA differences.
The levels of genetic differentiation inferred from
microsatellites are lower than that from other DNA polymorphisms (Jorde et al., 2000; Romualdi et al., 2002).
This is thought to be due, at least in part, to the higher
microsatellite mutation rates, which tend to increase
variation within groups compared to variation among
groups (Jin and Chakraborty, 1995; Jorde et al., 2000).
When molecular distances between alleles were also
considered (RST distances), the differences between language groups became almost twice as large (9.6%). This
confirms that evolutionary processes affecting DNA
sequences and allele frequencies occur at different time
scales. It is known that DNA sequences evolve relatively
slowly by the gradual accumulation of mutations,
whereas allele frequencies change more rapidly under
the action of genetic drift. Many polymorphisms may
even predate the geographical differentiation of modern
humans and differ between human groups only in relative frequencies (Klein et al., 1993; O’hUigin et al. 2002).
Accordingly, we would expect linguistic change to be
more closely associated with changes in allele frequencies, and hence with FST. However, microsatellites tend
to evolve more rapidly than other polymorphisms, and
hence RST measures seem to more closely parallel linguistic change than FST (for a similar result, see Excoffier and Hamilton, 2003). Globally speaking, the genetic
differences between the main language phyla probably
reflect relatively ancient demographic subdivision, whose
effects may be easier to identify at the time scale of
microsatellite evolution.
Mantel tests allowed us to quantify the relative importance of geography and language in determining the current genetic diversity. As already observed, genetic distances are more closely related with geographic than
with linguistic distances. The values of the correlation
coefficients between geographic and genetic distances
are consistent with the ones obtained by Ramachandran
et al. (2005) based on the same populations but with 406
additional loci. Our values are lower, but we also observe
a stronger correlation with RST measures of genetic distances (r ¼ 0.808) than with FST measures (r ¼ 0.746).
In the partial correlations, using RST measures of
genetic diversity, geography accounts for more than 50%
of the genetic variance when languages are kept constant, whereas linguistic differences account for only 3%
of the genetic variance when geography is controlled for.
Nevertheless, both correlations are significant. This suggests that (i), as already known, populations that are
distant in the geographical space tend to be genetically
differentiated, but also that (ii) pairs of populations separated by the same geographic distance tend to be more
genetically differentiated if they speak languages belonging to different phyla. This significant correlation
between linguistic and genetic distances based on RST
values is consistent with previous analyses of European
diversity based on nuclear allele frequencies (Sokal,
1988) and on Y-chromosome polymorphisms (Poloni
et al., 1997; Rosser et al., 2000), and with Chen et al.’s
(1995) results at the world level, the latter study based
on 11 loci only. By contrast, when considering genetic
distances calculated as FST values, the partial correlation between genetics and linguistics holding geography
constant is not significant. However, it is important to
note that because there is presently no consensus on
how to quantify linguistic relatedness, we were forced to
adopt a rather arbitrary and approximate scale of values. Therefore the fact that we did find significant correlations suggests that the relationship between linguistic
and genetic differentiation may in fact be stronger than
estimated.
In agreement with this view is another result of this
study. We examined the effect of using a dLAN ¼ 8
(instead of 4) for languages belonging to different phyla,
implicitly assuming that only relatively close linguistic
relationships can be safely established, whereas the relationships between language phyla are minimal and
largely undefined (see e.g. Trask, 1996). This rescaling
had the effect of further increasing the partial correlation between genetics and languages (r ¼ 0.206, P <
0.0001 with RST measures), although the association
between genetics and geography remains stronger.
Geography and languages together explain between
57% (RST) and 65% (FST) of the DNA variance among the
populations of this study, two values lower than those
reported for the same populations using different statistics, namely 78% (Ramachandran et al., 2005). At any
rate, this means that a substantial part of microsatellite
diversity must be accounted for by other demographic or
evolutionary factors, and presumably that forces acting
locally, such as genetic drift and founder effects, have substantially contributed to shaping human DNA diversity.
Some aspects of this phenomenon have already been
pointed out, including the fact that the DNA similarities
between the Hazara and the Uyghurs of this study would
not be expected based on either their linguistic or geographic relationships (Rosenberg et al., 2002).
Finally, we investigated to which extent large linguistic differences are localized at the main genetic boundaries. Our analysis showed that the greatest proportion of
genetic barriers falls between populations speaking languages of different phyla. In other words, genetic boundaries, or zones or sharp genetic change, tend to occur
where well-distinct linguistic groups are at contact. We
showed in a previous study (Belle and Barbujani, 2006)
that genetic boundaries do not necessarily correspond to
obvious physical barriers. Significant genetic boundaries
have been observed, for instance, within South American
populations separated by but a few kilometers. The very
small size of the American hunting–gathering populations, and their documented tendency to split along family lines (Crawford, 1998) have certainly enhanced the
local drift effects. Obviously, when population splits are
followed by isolation, both language and genetic divergence will ensue, and language differences will reinforce
evolutionary independence between populations, even in
the absence of other reproductive barriers.
To understand if the significant difference found
between genetic barriers separating populations speaking languages belonging to the same phylum or different
phyla was due to linguistic differences at the phylum
level, we repeated the same analysis using a different
grouping, that is to say between branches (or phyla) versus within branches. We found that no significant difference was left, suggesting that most of the correlations
previously found are due to the effects of the major linguistic boundaries, such as those separating languages
of different phyla.
Our results may have been biased by a number of factors. First, it has been suggested that the uneven distribution of the populations sampled across the world could
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:44
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
Stage: I
Page: 9
HUMAN DNA DIVERSITY AND LANGUAGES
influence the inferred patterns of genetic diversity (Serre
and Paabo, 2004), and hence of linguistic diversity. However, this issue is still debated, with Rosenberg et al.
(2005) maintaining that this geographic representation
was actually suitable for population structure studies.
Second, the classification of languages at the world level
is still controversial. To better assess the patterns of correlation between genetic and linguistic diversity, we
need a clearer consensus on the relationships among languages.
Despite those potential drawbacks, the results of our
analysis, the largest comparison so far of genetic and linguistic diversity in terms of the number of loci considered, confirm the existence, at a worldwide scale, of a
small but detectable effect of linguistic differences on
human DNA diversity at the genomic level.
This result is also of practical anthropological significance, because most languages spoken on earth are currently threatened of extinction. Our finding that the distribution of languages has a detectable influence on
human genetic diversity suggests that it would be essential to constitute a collection of additional linguistic and
genetic data from the populations which are the most
endangered, before it is too late to study them.
ACKNOWLEDGMENTS
We thank Noah Rosenberg and another anonymous
reviewer for insightful comments and suggestions.
LITERATURE CITED
Barbujani G. 1991. What do languages tell us about human
microevolution? Trends Ecol Evol 5:151–156.
Barbujani G. 1997. DNA variation and language affinities. Am J
Hum Genet 61:1011–1014.
Barbujani G, Belle E. 2006. Genomic boundaries between
human populations. Hum Hered 61:15–21.
Barbujani G, Sokal RR. 1990. Zones of sharp genetic change in
Europe are also linguistic boundaries. Proc Natl Acad Sci
USA 87:1816–1819.
Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre
L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen
A, Chen Z, Chu J, Carcassi C, Contu L, Du R, Excoffier L,
Ferrara GB, Friedlaender JS, Groot H, Gurwitz D, Jenkins T,
Herrera RJ, Huang X, Kidd J, Kidd KK, Langaney A, Lin AA,
Mehdi SQ, Parham P, Piazza A, Pistillo MP, Qian Y, Shu Q,
Xu J, Zhu S, Weber JL, Greely HT, Feldman MW, Thomas G,
Dausset J Cavalli-Sforza LL. 2002. A human genome diversity
cell line panel. Science 296:261–262.
Cavalli-Sforza LL. 1966. Population structure and human evolution. Proc Royal Soc B 164:362–379.
Cavalli-Sforza LL. 1986. African Pygmies. Academic Press, Orlando, FL.
Cavalli-Sforza LL, Minch E Mountain JL. 1992. Coevolution of
genes and languages revisited. Proc Natl Acad Sci USA
89:5620–5624.
Cavalli-Sforza LL, Piazza A, Menozzi P Mountain J. 1988.
Reconstruction of human evolution: Bringing together genetic,
archaeological and linguistic data. Proc Natl Acad Sci USA
85:6002–6006.
Chen J, Sokal RR, Ruhlen M. 1995. Worldwide analysis of
genetic and linguistic relationships of human populations.
Hum Biol 67:595–612.
Cheung KH, Osier MV, Kidd JR, Pakstis AJ, Miller PL, Kidd
KK. 2000. ALFRED: an allele frequency database for diverse
populations and DNA polymorphisms. Nucleic Acids Res
28:361–363.
Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen
R. 2005. Ascertainment bias in studies of human genomewide polymorphism. Genome Res 15:1496–1502.
9
Crawford M. 1998. The origin of native Americans: evidence
from archeological genetics. Cambridge University Press,
Cambridge.
Excoffier L, Hamilton G. 2002. Comment on \Genetic structure
of human populations". Science 298:2381–2385.
Excoffier L, Harding RM, Sokal RR, Pellegrini B, SanchezMazas A. 1991. Spatial differentiation of RH and GM haplotype frequencies in sub-Saharan Africa and its relation to linguistic affinities. Hum Biol 63:273–307.
Excoffier L, Smouse PE, Quattro JM. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction
data. Genetics 131:479–491.
Gordon RG Jr. 2005. Ethnologue: languages of the world, 15th
ed. Dallas, TX: SIL International. Online version: http://
www.ethnologue.com/.
Harding RM, Sokal RR. 1988. Classification of the European
language families by genetic distances. Proc Natl Acad Sci
USA 85:9370–9372.
Hunley K, Long JC. 2005. Gene flow across linguistic boundaries in native North American populations. Proc Natl Acad
Sci USA 102:1312–1317.
Jin L, Chakraborty R. 1995. Population structure, stepwise
mutations, heterozygote deficiency and their implications in
DNA forensics. Heredity 74:274–285.
Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE,
Seielstad MT, Batzer MA. 2000. The distribution of human
genetic diversity: a comparison of mitochondrial, autosomal
and Y-chromosome data. Am J Hum Genet 66:979–988.
Klein J, Satta Y, O’hUigin C, Takahata N. 1993. The molecular
descent of the major histocompatibility complex. Annu Rev
Immunol 11:269–295.
Manni F, Guerard E, Heyer E. 2004. Geographic patterns of
(genetic, morphologic, linguistic) variation: how barriers can
be detected by using Monmonier’s algorithm. Hum Biol
76:173–190.
Mantel NA. 1967. The detection of disease clustering and a generalized regression approach. Cancer Res 27:209–220.
Monsalve MV, Helgason A, Devine DV. 1999. Languages, geography and HLA haplotypes in native American and Asian
populations. Proc Royal Soc B 266:2209–2216.
Nettle D, Harriss L. 2003. Genetic and linguistic affinities
between human populations in Eurasia and West Africa. Hum
Biol 75:331–344.
Nichols J. 1997. Modeling ancient population structures and
movement in linguistics. Annu Rev Anthropol 26:359–384.
O’hUigin C, Satta Y, Takahata N, Klein J. 2002. Contribution of
homoplasy and of ancestral polymorphism to the evolution of
genes in anthropoid primates. Mol Biol Evol 19:1501–1513.
Poloni ES, Semino O, Passarino G, Santachiara-Benerecetti AS,
Dupanloup I, Langaney A, Excoffier L. 1997. Human genetic
affinities for Y-chromosome P49a,f/TaqI haplotypes show
strong correspondence with linguistics. Am J Hum Genet
61:1015–1035.
Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA,
Feldman MW, Cavalli-Sforza LL. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc
Natl Acad Sci USA 102:15942–15947.
Raufaste N, Rousset F. 2001. Are partial Mantel tests adequate?
Evolution 55:1703–1705.
Relethford JH. 2004. Global patterns of isolation by distance
based on genetic and morphological data. Hum Biol 76:499–
513.
Renfrew C. 1989. Models of change in languages and archaeology. Trans Philos Soc 87:103–155.
Romualdi C, Balding D, Nasidze IS, Risch G, Robichaux M,
Sherry ST, Stoneking M, Batzer MA, Barbujani G. 2002. Patterns of human diversity, within and among continents,
inferred from biallelic DNA polymorphisms. Genome Res
12:602–612.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK,
Zhivotovsky LA, Feldman MW. 2002. Genetic structure of
human populations. Science 298:2381–2385.
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:44
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
10
Stage: I
Page: 10
BELLE AND BARBUJANI
Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard
JK, Feldman MW. 2005. Clines, clusters, and the effects of
study design on the inference of human population structure.
PloS Genetics 6:e70.
Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D,
Amorim A, Amos W, Armenteros M, Arroyo E, Barbujani G,
Beckman G, Beckman L, Bertranpetit J, Bosch E, Bradley
DG, Brede G, Cooper G, Corte-Real HB, de Knijff P, Decorte
R, Dubrova YE, Evgrafov O, Gilissen A, Glisic S, Golge M,
Hill EW, Jeziorowska A, Kalaydjieva L, Kayser M, Kivisild T,
Kravchenko SA, Krumina A, Kucinskas V, Lavinha J, Livshits
LA, Malaspina P, Maria S, McElreavey K, Meitinger TA,
Mikelsaar AV, Mitchell RJ, Nafa K, Nicholson J, Norby S,
Pandya A, Parik J, Patsalis PC, Pereira L, Peterlin B, Pielberg G, Prata MJ, Previdere C, Roewer L, Rootsi S, Rubinsztein DC, Saillard J, Santos FR, Stefanescu G, Sykes BC,
Tolun A, Villems R, Tyler-Smith C, Jobling MA. 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than language. Am J Hum Genet
67:1526–1543.
Ruhlen M. 1991. A guide to the world’s languages, Vol 1: Classification. Stanford, CA: Stanford University Press.
Sajantila A, Lahermo P, Anttinen T, Lukka M, Sistonen P,
Savontaus M-L, Aula P, Beckman L, Tranebjaerg L, GeddeDahl T, Issel-Tarver L, DiRienzo A, Paabo S. 1995. Genes and
languages in Europe: an analysis of mitochondrial lineages.
Genome Res 5:42–52.
Schneider S, Roessli D, Excoffier L. 2000. Arlequin ver. 2.000: A
software for population genetics data analysis. Genetics and
Biometry Laboratory, University of Geneva, Switzerland.
Serre D, Paabo S. 2004. Evidence for gradients of human genetic diversity within and among continents. Genome Res 14:1679–1685.
Sims-Williams P. 1998. Genetics, linguistics, and prehistory:
thinking big and thinking straight. Antiquity 72:505–527.
Slatkin M. 1995. A measure of population subdivision based on
microsatellite allele frequencies. Genetics 139:457–462.
Smouse PE, Long JC, Sokal RR. 1986. Multiple regression and
correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35:627–632.
Sokal RR. 1988. Genetic, geographic and linguistic distances in
Europe. Proc Natl Acad Sci USA 85:1722–1726.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Vaudor A.
1989. Genetic differences among language families in Europe.
Am J Phys Anthropol 79:489–502.
Trask RL. 1996. Historical linguistics. London: Arnold.
Wilson JE, Weale ME, Smith AC, Gratrix F, Fletcher B, Thomas
MG, Bradman N, Goldstein DB. 2001. Population genetic
structure of variable drug response. Nat Genet 29:265–269.
Wurm SA. 2001. Atlas of the world’s languages in danger of disappearing, 2nd ed. UNESCO Publishing.
American Journal of Physical Anthropology—DOI 10.1002/ajpa
ID: vasanss
Date: 20/3/07
Time: 15:44
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044
J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07
Stage: I
Page: 11
AQ1: The title has been modified slightly so as to eliminate the \the statement/sentence effect" of the
original. Kindly confirm whether the changes made are appropriate.
AQ2: Kindly note that as Roosenberg et al (2002) is repeated twice in the reference list, the latter has
been deleted.
ID: vasanss
Date: 20/3/07
Time: 15:44
Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044