Genomic Drift and Evolution of Microsatellite DNAs in Human Populations
Naoko Takezaki* and Masatoshi Nei *Life Science Research Center, Kagawa University, Japan; and Department of Biology, Institute of Molecular Evolutionary
Genetics, Pennsylvania State University
In recent years, copy number variation (CNV) of DNA segments has become a hot topic in the study of genetic variation,
and a large number of CNVs have been uncovered in human populations. The CNVs involving the smallest units of DNA
segments are microsatellite DNAs, and the evolutionary change of microsatellite DNAs is believed to occur mostly by
the increase or decrease of one repeat unit at a time in a more or less neutral fashion. If we note that eukaryotic genomes
contain millions of microsatellite loci, this pattern of nucleotide change is expected to generate random changes of
genome size, that is, genomic drift, and will provide a neutral model of CNV evolution. We therefore investigated the
amount of variation in the total number of repeats (TNR) per individual across 145 microsatellite loci in three
human populations, Africans, Europeans, and Asians. It was shown that the TNR follows the normal distribution in all
three populations and that the extent of variation of TNR is more than 50% greater in Africans than in Europeans and
Asians as expected from the hypothesis of African origin of modern humans. If we consider all microsatellite loci in the
human genome and compute the variation of the total number of nucleotides involved (TNN), it is possible to study the
contribution of microsatellite loci to the genome size variation. This study has shown that the genome sizes of human
individuals are affected considerably by genomic drift of microsatellite DNA alone. This pattern of evolution is similar to
that of olfactory receptor (OR) genes previously studied in human populations and supports the idea that the number of
OR genes has evolved in a more or less neutral fashion. However, this conclusion does not necessarily apply to the
genomewide CNVs of various DNA segments, and it appears that long variant DNA fragments are deleterious and under
purifying selection.
Introduction
Microsatellite DNAs are tandem repeats of short nucleotide motifs (1–6 bp) and are abundant in eukaryotic genomes.
In microsatellite loci, the number of repeats changes rapidly, and the extent of polymorphism in populations is very
high. The change of repeat number in microsatellite loci is
generally caused by replication slippage, and the repeat
number mostly increases or decreases by one repeat at
a time. In practice, however, irregular changes such as multistep changes or point mutations are known to occur, and
the number of repeats seems to show a directional change
under certain circumstances (Ellegren 2000, 2004). Therefore, the actual mutational pattern of microsatellite loci is
quite complicated and heterogeneous. In general, however,
the repeat number is believed to change in a random fashion, roughly following the stepwise mutation model (SMM)
(Di Rienzo et al. 1994; Ellegren 2000, 2004; Schlötterer
2000; Xu et al. 2000; Estoup et al. 2002), which was originally developed for electrophoretic loci (Ohta and Kimura
1973).
Recently, many studies have shown that human populations harbor a substantial amount of copy number variation (CNV) of DNA segments caused by duplication,
deletion, insertion, etc. (e.g., Sebat et al. 2004; Tuzun
et al. 2005; Redon et al. 2006) and at least 10% of the human genome contains CNVs (Redon et al. 2006; Scherer
et al. 2007; Wong et al. 2007). CNVs are also known to
generate some complex medical disorders such as cancer
and autoimmune diseases (e.g., Fanciulli et al. 2007; Hollox
et al. 2008) and often occur in multigene families (e.g.,
olfactory receptor [OR] genes, innate immune system
genes, and β-defensin antimicrobial gene clusters). Therefore, CNVs seem to be an important source of genetic and
phenotypic variation (Feuk et al. 2006; Freeman et al. 2006;
Beckmann et al. 2007; Bailey et al. 2008).
Key words: microsatellite DNA, genomic drift, copy number variation, genome evolution, human evolution.
E-mail: [email protected].
Mol. Biol. Evol. 26(8):1835–1840. 2009
doi:10.1093/molbev/msp091
Advance Access publication April 30, 2009
© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
By analyzing Redon et al.’s (2006) data on DNA hybridization, Nozawa et al. (2007) found that the gene copy
number of OR genes varies extensively within and between
human populations and the number is normally distributed
in each of the three populations examined (Africans,
Europeans, and Asians). They then proposed that this
variation is caused by duplication and deletion (or pseudogenization) of genes that occur in a random fashion. Nei
(2007) called this type of change genomic drift. This
proposition may be supported if the total number of repeats
(TNR) involved in microsatellite loci also shows the normal
distribution, because in these loci, the number of repeats is
known to change in a more or less random fashion.
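The reasoning above can be illustrated with a deliberately simplified sketch. It is not a population-genetic simulation and is not taken from this study: it assumes that every locus in every individual undergoes an independent series of single-repeat gains and losses, and the function name `simulate_tnr` and all parameter values are hypothetical. Under these assumptions the summed repeat change across many loci should be roughly bell-shaped, with an SD close to the square root of (number of loci × number of steps).

```python
import random
import statistics


def simulate_tnr(num_loci, num_steps, num_individuals, seed=0):
    """Toy stepwise-mutation sketch (illustrative assumption only).

    Each locus of each individual takes `num_steps` independent +1/-1
    repeat changes; the changes are summed over all loci per individual.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(num_individuals):
        total = 0
        for _ in range(num_loci):
            total += sum(rng.choice((-1, 1)) for _ in range(num_steps))
        totals.append(total)
    return totals


# Hypothetical parameter values chosen only for illustration.
totals = simulate_tnr(num_loci=145, num_steps=10, num_individuals=200)
print("mean:", statistics.mean(totals))
print("SD:", statistics.stdev(totals))
```

Because shared ancestry among individuals is ignored, the sketch only shows why a sum over many independently changing loci tends toward a normal distribution; it says nothing about the actual magnitude of variation in real populations.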
For this reason, we have decided to study the distribution of TNR of microsatellite loci in the African, European,
and Asian populations. We also examined the relative contributions of microsatellite DNAs and OR genes to the
variation of human genome size.
Materials and Methods
Microsatellite DNA data were taken from Ramachandran
et al. (2005) (http://rosenberglab.bioinformatics.med.umich.edu/data/ramachandranEtAl2005/combinedmicrosats-1027.stru). On this website, genotypic data of 783 autosomal
microsatellite loci (45 dinucleotide, 175 trinucleotide,
555 tetranucleotide, and 8 pentanucleotide repeat loci)
are available for 53 human populations (see also Takezaki
and Nei 2008). From this data set, we chose 145 loci
(9 dinucleotide, 23 trinucleotide, 112 tetranucleotide,
and 1 pentanucleotide repeat loci) from Biaka Pygmy
(N = 32) in Africa, French (N = 29) in Europe, and
Han-Chinese (N = 34) in Asia after elimination of loci
with missing (null) allele data. The 145 loci used in this
study appear to be good representatives of all 783 loci
in the original data set of Ramachandran et al. (2005)
because the average heterozygosities and the average
numbers of alleles per locus for the three populations
are nearly the same for the data sets of 145 loci and
783 loci (table 1). The allele frequency data of the 145
loci for the three populations are available in Supplementary Material online.
Table 1
Average Heterozygosities (H) and Average Number of Alleles per Locus (Na) for the Three Human Populations
                 145 Loci           783 Loci
Populations      H       Na         H       Na
African          0.75    7.2        0.76    7.5
European         0.70    6.3        0.70    6.4
Asian            0.73    6.4        0.73    6.5
In the data set used here, microsatellite alleles were
represented by the fragment size of the amplified DNA
at each locus. The fragment for an allele included the flanking regions as well as the repeat region of the locus, and
the actual number of repeats was not known. We computed
the TNR for each individual from the total fragment size
(nucleotides) of the two alleles divided by the repeat unit
length of the locus for all 145 loci and then obtained the
difference of this number from that of the individual
who had the smallest number. This was done to eliminate
the DNA segments in the flanking regions, and therefore,
the TNR for the individual can be obtained approximately.
In this paper, we call this difference the TNR and use it
to study the variation of TNR among different individuals
in populations.
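The procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the study; the function name `total_repeat_numbers`, the input layout, and the example fragment sizes are all assumptions made for the example.

```python
def total_repeat_numbers(fragment_sizes, unit_lengths):
    """Relative TNR per individual (illustrative sketch).

    fragment_sizes: {individual: [(allele1_bp, allele2_bp), ...]}, one
        pair of fragment sizes per locus, in the same locus order as
        unit_lengths (hypothetical input layout).
    unit_lengths: repeat unit length (bp) of each locus.
    """
    raw = {}
    for individual, genotypes in fragment_sizes.items():
        # Sum over loci of (total fragment size of two alleles) / unit length.
        raw[individual] = sum(
            (a1 + a2) / unit
            for (a1, a2), unit in zip(genotypes, unit_lengths)
        )
    # Subtract the smallest value to remove the flanking-region contribution.
    smallest = min(raw.values())
    return {individual: value - smallest for individual, value in raw.items()}


# Made-up fragment sizes for one dinucleotide and one tetranucleotide locus.
example = {
    "A": [(100, 104), (200, 208)],
    "B": [(102, 106), (204, 212)],
    "C": [(100, 100), (200, 200)],
}
tnr = total_repeat_numbers(example, unit_lengths=[2, 4])
print(tnr)
```

Subtracting the smallest individual's value removes the flanking-region contribution only approximately, as noted in the text, because it assumes the flanking lengths are the same for all alleles at a locus.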
It should be noted that Nozawa et al. (2007) used CNV
data for OR genes from Africans (Yoruba), Europeans
(European descendants in Utah), and Asians (Han-Chinese
and Japanese) obtained by Redon et al. (2006), but these
samples are completely different from the data used in this
study.
The heterozygosity for a locus was estimated by
$h = \frac{2n}{2n - 1}\left(1 - \sum_i x_i^2\right)$, where $x_i$ is the frequency of
the ith allele in the sample, and n is the number of diploid individuals examined at the locus (Nei 1987). The average
heterozygosity (H) for r loci was estimated by $H = \sum_{j=1}^{r} h_j / r$,
where $h_j$ is the value of h at the jth locus.
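A minimal sketch of these two estimators is given below. The function names and the example allele frequencies are hypothetical, and the sketch assumes the same number of diploid individuals n at every locus.

```python
def heterozygosity(allele_freqs, n):
    """Unbiased single-locus estimate h = 2n/(2n - 1) * (1 - sum of x_i^2)."""
    return 2 * n / (2 * n - 1) * (1 - sum(x * x for x in allele_freqs))


def average_heterozygosity(loci, n):
    """Average of h over loci; `loci` is a list of allele-frequency lists."""
    values = [heterozygosity(freqs, n) for freqs in loci]
    return sum(values) / len(values)


# Made-up example: one locus with two equally frequent alleles and one
# monomorphic locus, each scored in n = 10 diploid individuals.
h_example = heterozygosity([0.5, 0.5], n=10)
H_example = average_heterozygosity([[0.5, 0.5], [1.0]], n=10)
print(h_example, H_example)
```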
The genetic distances between the populations were
computed by two different distance measures. The first
one was Nei’s (1972) standard genetic distance ($D_S$), which
is given by
$$D_S = -\ln \frac{J_{XY}}{\sqrt{J_X J_Y}},$$
where $J_X = \sum_{j}^{r} \sum_{i}^{m_j} x_{ij}^2 / r$, $J_Y = \sum_{j}^{r} \sum_{i}^{m_j} y_{ij}^2 / r$, and
$J_{XY} = \sum_{j}^{r} \sum_{i}^{m_j} x_{ij} y_{ij} / r$; $x_{ij}$ and $y_{ij}$ are the frequencies of the
ith allele at the jth locus in populations X and Y, respectively; r is the number of loci; and $m_j$ is the number of alleles
at the jth locus. Under the infinite allele model (IAM)
(Kimura and Crow 1964) with mutation–drift balance,
the expectation of $D_S$, $E[D_S]$, is given by 2vt, where v is
the mutation rate per locus per generation and t is the time
measured in generations after the two populations diverged.
The second one was the $(\delta\mu)^2$ distance (Goldstein et al.
1995), which is defined by
$$(\delta\mu)^2 = \sum_{j}^{r} (\mu_{Xj} - \mu_{Yj})^2 / r,$$
where $\mu_{Xj} = \sum_i i x_{ij}$ and $\mu_{Yj} = \sum_i i y_{ij}$ are the average repeat
numbers of alleles at the jth locus, and $x_{ij}$ and $y_{ij}$ are the frequencies of the allele with repeat number (allele size) i at the
jth locus in populations X and Y, respectively. Under the
SMM with mutation–drift balance, $E[(\delta\mu)^2] = 2vt$ and
the value of $(\delta\mu)^2$ is expected to increase linearly with time.
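Both distance measures can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the program used in the study: the function names are hypothetical, each population is represented as a list over loci of dictionaries mapping allele size (in repeat units) to frequency, and the example frequencies are made up.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard distance: -ln(J_XY / sqrt(J_X * J_Y))."""
    r = len(freqs_x)
    j_x = sum(sum(p * p for p in fx.values()) for fx in freqs_x) / r
    j_y = sum(sum(p * p for p in fy.values()) for fy in freqs_y) / r
    j_xy = sum(
        sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
        for fx, fy in zip(freqs_x, freqs_y)
    ) / r
    # Undefined (log of zero) if the populations share no alleles at any locus.
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def delta_mu_squared(freqs_x, freqs_y):
    """Average over loci of the squared difference in mean repeat number."""
    r = len(freqs_x)
    total = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        mu_x = sum(size * p for size, p in fx.items())
        mu_y = sum(size * p for size, p in fy.items())
        total += (mu_x - mu_y) ** 2
    return total / r


# Made-up single-locus example: the two populations share one allele (11).
pop_x = [{10: 0.5, 11: 0.5}]
pop_y = [{11: 0.5, 12: 0.5}]
d_s = nei_standard_distance(pop_x, pop_y)
d_mu2 = delta_mu_squared(pop_x, pop_y)
print(d_s, d_mu2)
```

Note that the first measure uses only allele identity, whereas the second uses allele size, which is why their expectations are linear in time under different mutation models.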
Search of Microsatellite Loci in Genomic Sequence
The numbers of different types of microsatellite loci in
the human genome were estimated by comparing the human
and chimpanzee chromosome 21 sequences. The alignment of
human and chimpanzee genomic sequences is available at the
UCSC genome web site (http://hgdownload.cse.ucsc.edu/
downloads.html) (Karolchik et al. 2003). Two kinds of aligned
sequences of human and chimpanzee chromosome 21
(chr21.hg18.panTro2.net.axt and chr21.panTro2.hg18.
net.axt) were downloaded. The two different alignments of
the genome sequences were BLASTZ hits (Schwartz et al.
2003) from human to chimpanzee sequences and from
chimpanzee to human sequences. We used the regions of
the aligned sequences in chr21.hg18.panTro2.net.axt that
appeared only once in each of the two alignments. These
regions add up to 28 Mb of the human–chimpanzee alignment, about 1% of the entire human genome.
Uninterrupted microsatellite DNA repeat loci with
repeat unit size 1–5 bp were searched by setting the minimum number of repeats of a locus to three or six. If locations of human and chimpanzee loci with the same repeat
motif overlapped, the loci were regarded as orthologous.
Otherwise, the loci were regarded as nonorthologous. Variable loci are loci for which the numbers of repeats are different for human and chimpanzee, and invariable loci are
those with the same number of repeats for human and chimpanzee. Microsatellite loci were searched only in intergenic
and intronic regions by using the information at the UCSC
website.
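A search for uninterrupted repeats of this kind can be sketched as below. This is only an illustration of the general idea, not the program used in the study: the function names, the left-to-right scanning strategy, and the rule that skips motifs that are themselves repeats of a shorter unit (for example, AA as a dinucleotide) are assumptions, and the orthology and intergenic/intronic filters are not included.

```python
def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(
        k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k)
    )


def find_microsatellites(seq, min_repeats=6, max_unit=5):
    """Return (start, motif, copies) for uninterrupted tandem repeats.

    Scans left to right separately for each unit size 1..max_unit
    (illustrative sketch; positions are 0-based).
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                copies = 1
                while seq[i + copies * k:i + (copies + 1) * k] == motif:
                    copies += 1
                if copies >= min_repeats:
                    hits.append((i, motif, copies))
                    i += copies * k  # jump past the repeat just recorded
                    continue
            i += 1
    return hits


# Made-up sequence with a run of seven A's and six copies of CA.
example_seq = "GG" + "A" * 7 + "T" + "CA" * 6 + "G"
hits = find_microsatellites(example_seq, min_repeats=6)
print(hits)
```

Lowering `min_repeats` to three would report many more short repeats, which is consistent with the dependence of locus counts on this threshold noted in the text.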
Results
Variation of Microsatellite Loci in Human Populations
Figure 1 shows the frequency distribution of TNR in
the three human populations. The abscissa of this figure
stands for the TNR summed up for all 145 microsatellite
loci examined. The distribution approximately follows
the normal distribution in each population (P = 0.96,
0.53, and 0.92 for Africans, Europeans, and Asians, respectively, by the Kolmogorov–Smirnov test).
The mean TNR is smaller in Africans than in
Europeans and Asians (fig. 1). The mean difference in
TNR between Africans and Europeans is statistically
significant (P = 0.03 by t-test), but the difference between
Africans and Asians is not. Under the assumption of random changes of TNR, these differences are caused
primarily by stochastic errors and do not necessarily
indicate the effect of natural selection. The standard deviation (SD) of TNR in Africans is about 50–80% greater
than that in non-Africans, and the difference is statistically significant (P = 0.002–0.032 by the F-test). This
observation is consistent with the theory of African origin
of modern humans (see Nei and Roychoudhury 1982;
Cann et al. 1987), which maintains that modern humans
originated in Africa before 116,000 years ago and subsequently spread through the rest of the world. In this
theory, it is likely that the founding populations of non-Africans were derived from a subset of the African
population and therefore the current non-African populations are expected to have a smaller genetic variation than
the African population.
[Figure 1: three histograms, one per population. Abscissa: total repeat number (0–200); ordinate: number of individuals. African: N = 32, mean = 75.1, SD = 42.2. European: N = 29, mean = 95.3, SD = 23.0. Asian: N = 34, mean = 81.9, SD = 28.7.]
FIG. 1. Distributions of total repeat number of 145 microsatellite
loci for individuals of African, European, and Asian populations. The
TNR refers to the difference of the sum of repeat numbers for two alleles
of all loci in an individual and the smallest number observed.
Table 2
MAD of TNR between Two Randomly Chosen Individuals within and between Populations
Population     African    European    Asian
African        47.0       41.1        40.0
European                  26.8        30.4
Asian                                 32.3
To measure the extent of variation of TNR caused by
stochastic changes, we computed the mean absolute difference (MAD) of TNR between two randomly chosen individuals within and between populations following Nozawa
et al. (2007). Like the SD values, the MAD for Africans
is about 50% greater than that for Europeans and Asians
(table 2). The interpopulational MAD between Africans
and non-Africans is also about 50% greater than that between Europeans and Asians.
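The MAD can be sketched as follows. The function names and the example values are hypothetical, and the sketch assumes that "two randomly chosen individuals" means two distinct individuals within a population, or one individual from each population for the interpopulational value.

```python
from itertools import combinations, product


def mad_within(values):
    """Mean absolute difference over all pairs of distinct individuals."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def mad_between(values_x, values_y):
    """Mean absolute difference over all cross-population pairs."""
    pairs = list(product(values_x, values_y))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Made-up TNR values for illustration.
within_example = mad_within([0, 4, 8])
between_example = mad_between([0, 4], [10])
print(within_example, between_example)
```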
The estimates of Nei’s genetic distance ($D_S$) and the $(\delta\mu)^2$
distance between Africans and non-Africans are about
twice as large as the estimates between Europeans
and Asians (table 3). This observation is consistent with
the previous results obtained by classical markers and other
microsatellite DNA data (Nei and Roychoudhury 1982;
Goldstein et al. 1995; Cavalli-Sforza and Feldman 2003).
The expected value of $D_S$ distance increases linearly with
the time after divergence of populations in mutation–drift
equilibrium in the IAM, whereas the same linear increase
with time holds for $(\delta\mu)^2$ in the SMM (see Materials and
Methods). Therefore, this finding suggests that the evolutionary change of microsatellite loci does not occur exactly
following the IAM or SMM but lies somewhere between the
two models (Li 1976; Ellegren 2000, 2004).
The MAD of TNR between Africans and non-Africans
is about 1.5 times as large as the MAD between
Europeans and Asians. This ratio is somewhat lower than
that of the genetic distance values, but MAD appears to
increase with time, as expected theoretically (Chakraborty
and Nei 1982).
Comparison of Variation of the TNN Between
Microsatellite Loci and OR Genes
It is interesting to study the relative contributions of
microsatellite loci and OR genes to the genomic variation.
However, this cannot be done by comparison of the distributions of TNR for the two groups of genes, because the
number of nucleotides involved in a gene or repeat unit
is not the same. In the case of OR genes, one gene is composed of about 930 nt (310 amino acids per protein), and
Nozawa et al. (2007) showed that the SD of TNR is 10 in
the African population. Therefore, it is possible to obtain
the SD of the total number of nucleotides (TNN)
(SD[TNN]) by multiplying SD by 930. (This is because
the abscissa of the distribution of gene number has to be
converted into the TNN involved.) Therefore, we have
SD(TNN) = 930 × 10 = 9,300. This indicates that the genomic variation caused by OR genes is substantial.
Table 3
Genetic Distances between Populations
Genetic Distance             $D_S$    $(\delta\mu)^2$
African versus European      0.23     1.29
African versus Asian         0.25     1.18
European versus Asian        0.10     0.60
Estimation of SD(TNN) for microsatellite loci in the
entire genome is more complicated because the loci we
used are not a random sample of all types of microsatellite
loci. In our search of the microsatellite loci in the human
chromosome 21, we found 70,103 mononucleotide,
2,989 dinucleotide, 188 trinucleotide, 287 tetranucleotide,
and 26 pentanucleotide loci, which are orthologous between human and chimpanzee (table 4). Among these loci,
there were 16,767, 1,974, 118, 217, and 21 variable loci
(loci with different number of repeats between human
and chimpanzee). Note that the estimate of the number
of microsatellite loci varies depending on the minimum
number of repeats used for each locus and some other factors in the search (International Human Genome Sequencing Consortium 2001; Takezaki N, unpublished data).
Furthermore, the variability of microsatellite loci is positively correlated with the number of repeats (Ellegren
2000, 2004). We tried to include highly variable loci by
setting the minimum number of repeats in the search to
six. Although the variability of microsatellite loci seems
to be affected by many other factors, we assume that all
types of microsatellite loci are equally polymorphic to obtain
a crude estimate of the effect of variation of TNN in microsatellite loci on the genome size. We also assume that the
variable loci on the entire genome are 100 times the numbers of loci found on chromosome 21 because the genomic
region examined was about 1% of the entire human genome
(Materials and Methods). Then, the approximate value of
SD(TNN) for the entire genome in the African population
may be computed in the following way: First, the SD value
of TNR for the 145 microsatellite loci was 42.2 in the
African population (fig. 1). Here we note that if the number
of repeats evolves independently at each of the r loci following the SMM model, the variance of total repeat number
for r loci is given by rV, where V is the variance of repeat
number for a locus (Chakraborty and Nei 1982). Therefore,
the SD of the TNR for r loci is $\sqrt{r}\,SD_1$, where $SD_1 = \sqrt{V}$.
From the SD value of the 145 loci, we obtain
$SD_1 = 42.2/\sqrt{145} = 3.5$. For the r loci with repeat unit
length b, the variance of the total nucleotide number
$[SD(TNN)]^2$ is given by $b^2 r V$, where $V = (SD_1)^2$. If we
use these results, the variance for the TNN of microsatellite
loci on the genome can be computed as
$$[SD(TNN)]^2 = (1^2 \times 16{,}767 \times 100 + 2^2 \times 1{,}974 \times 100 + 3^2 \times 118 \times 100 + 4^2 \times 217 \times 100 + 5^2 \times 21 \times 100) \times 3.5^2 = 36{,}409{,}450 \approx (6{,}034)^2.$$
Therefore, SD(TNN) is 6,034. This estimate is
slightly smaller than that of SD(TNN) for OR genes.
Yet, this number is likely to be an overestimate because experimentalists tend to study highly polymorphic loci rather
than low-polymorphic loci. For this reason, it is interesting
to know a minimum estimate of SD(TNN) considering only
the loci with tri-, tetra-, and pentanucleotide repeats. The
variance of total nucleotide number $[SD(TNN)]^2$ for these
loci is $(3^2 \times 118 \times 100 + 4^2 \times 217 \times 100 + 5^2 \times 21 \times 100) \times 3.5^2 = 6{,}197{,}275 \approx (2{,}489)^2$. Therefore, SD(TNN)
is 2,489. The number of loci for mono- and dinucleotide
repeat loci is nearly 50 times greater than the number of
tri-, tetra-, and pentanucleotide loci (table 4). In addition,
there are some nonorthologous loci (table 4) and a large
number of variable loci with a small repeat number, which
may also contribute to genome size variation (see the numbers of microsatellite loci with the minimum repeat number
set to three in supplementary table S1, Supplementary Material online). Therefore, our estimate of SD(TNN) 5 2,489 is
almost certainly a minimum estimate, though it is very crude.
If we use this minimum estimate, the expected difference [4 × SD(TNN)] between the numbers of the upper and
lower 5% levels of individuals in the African population is
about 9,956 nt. This estimate indicates that the contribution
of microsatellite loci to the genome size variation is substantial mainly because of the large number of loci in
the genome. Yet, it appears to be smaller than that of
OR genes, where the large repeat (gene) size contributes
to the genomic variation.
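The arithmetic behind these estimates can be retraced with a short script. The numbers are those quoted in the text and table 4; the variable and function names are hypothetical, and rounding $SD_1$ to 3.5 before squaring follows the computation shown in the text.

```python
import math

SD_TNR_145 = 42.2   # SD of TNR for the 145 loci in Africans (fig. 1)
N_LOCI = 145
GENOME_SCALE = 100  # chromosome 21 region taken as about 1% of the genome
# Variable orthologous loci on chromosome 21 by repeat unit length (table 4).
VARIABLE_LOCI = {1: 16767, 2: 1974, 3: 118, 4: 217, 5: 21}

# Per-locus SD, rounded to one decimal place as in the text (3.5).
sd1 = round(SD_TNR_145 / math.sqrt(N_LOCI), 1)


def sd_tnn(unit_lengths):
    """SD of total nucleotide number for the given repeat unit lengths."""
    variance = sum(
        b * b * VARIABLE_LOCI[b] * GENOME_SCALE for b in unit_lengths
    ) * sd1 ** 2
    return math.sqrt(variance)


sd_all = sd_tnn([1, 2, 3, 4, 5])   # all repeat types
sd_min = sd_tnn([3, 4, 5])         # tri-, tetra-, and pentanucleotide only
sd_or = 930 * 10                   # OR genes: 930 nt per gene, SD of 10
spread_min = 4 * round(sd_min)     # 4 x SD(TNN) for the minimum estimate

print(round(sd_all), round(sd_min), spread_min, sd_or)
```

The comparison makes the point of the text explicit: even the minimum microsatellite estimate is of the same order of magnitude as the OR gene value, although the latter is larger because each unit is about 930 nt long.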
In the above computation, we considered only the
African population. In the European and Asian populations
the SD of the total repeat number is about 55–68% of that
of the African. Therefore, the extent of variation of TNN
is expected to be smaller in these populations, yet the variation is substantial. This is true with the SD(TNN) of OR
genes as well (Nozawa et al. 2007).
Discussion
We have seen that the number of repeats involved in
the microsatellite loci in the human genome approximately
follows the normal distribution as expected from the
mutational pattern of these loci, and this observation suggests that a similar normal distribution observed with the
number of OR genes is also caused by random events of
genomic drift.
Table 4
Numbers of Microsatellite Loci in Human Populations Identified from the Comparison of Genome Sequences of Human and Chimpanzee Chromosome 21
                    Orthologous Loci                    Nonorthologous Loci
Repeat Unit         Variable    Invariable    All       Human     Chimpanzee
Mononucleotide      16,767      53,336        70,103    5,087     4,871
Dinucleotide        1,974       1,015         2,989     790       727
Trinucleotide       118         70            188       128       129
Tetranucleotide     217         70            287       408       264
Pentanucleotide     21          5             26        88        38
Total               19,097      54,496        73,593    6,501     6,029
The minimum number of repeats used in the search of microsatellite loci was set to six.
Variable: The loci in which the repeat numbers are different for human and chimpanzee.
Invariable: The loci in which the repeat numbers are the same for human and chimpanzee.
However, even if CNV is generated by genomic drift,
the gene copy number may not necessarily show a normal
distribution if the number of loci involved is small. For example, the number of gene copies responsible for taste receptors in mammalian species is of the order of 20–30 per
genome (Nei et al. 2008), and consequently the CNV of this
gene family shows a skewed distribution in human populations apparently because of sampling errors.
By contrast, if we consider the CNVs for the entire
genome, the CNVs are apparently generated by insertions,
deletions, and inversions of various sizes of DNA fragments, some fragments being as large as tens of thousands
of nucleotides (Kidd et al. 2008), and the fragment size apparently shows a lognormal distribution (Perry et al. 2008).
This means that the large variant DNA fragments are more
frequent than expected under the normal distribution. In
fact, the most frequent class of variant fragments was that
of about 10 kb, but there were fragments of the order of
megabases at low frequency (Perry et al. 2008). If this distribution is stationary, it suggests either that long fragments
are generated with a lower frequency than short fragments
or that the former are generally deleterious and eventually
eliminated from the population. In practice, the latter hypothesis is more likely to be true than the former, and neutral CNVs seem to be limited to certain groups of gene
families. At the genome level, many new large DNA insertions and deletions are probably eliminated because of their
association with complex medical disorders.
In general, the CNVs in the human genome are heterogeneous, and some are apparently deleterious, whereas
others are more or less neutral or even advantageous. It
would be important to know the evolutionary significance
of these CNVs in the future. In this case, microsatellite loci
may be used as a neutral model of CNVs.
Supplementary Material
The allele frequency data of 145 loci for Africans,
Europeans, and Asians and supplementary table S1 are
available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).
Literature Cited
Bailey JA, Kidd JM, Eichler EE. 2008. Human copy number
polymorphic genes. Cytogenet Genome Res. 123:234–243.
Beckmann JS, Estivill X, Antonarakis SE. 2007. Copy number
variants and genetic traits: closer to the resolution of phenotypic
to genotypic variability. Nat Rev Genet. 8:639–646.
Cann RL, Stoneking M, Wilson AC. 1987. Mitochondrial DNA
and human evolution. Nature. 325:31–36.
Cavalli-Sforza L, Feldman MW. 2003. The application of
molecular genetic approaches to the study of human evolution.
Nat Genet. 33:266–275.
Chakraborty R, Nei M. 1982. Genetic differentiation of
quantitative characters between populations or species. I.
Mutation and random genetic drift. Genet Res. 39:303–314.
Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M,
Freimer NB. 1994. Mutational processes of simple-sequence
repeat loci in human populations. Proc Natl Acad Sci USA.
91:3166–3170.
Ellegren H. 2000. Microsatellite mutations in the germline:
implications for evolutionary inference. Trends Genet. 16:
551–558.
Ellegren H. 2004. Microsatellites: simple sequences with
complex evolution. Nat Rev Genet. 5:435–445.
Estoup A, Jarne P, Cornuet J. 2002. Homoplasy and
mutation model at microsatellite loci and their consequences for population genetics analysis. Mol Ecol. 11:
1591–1604.
Fanciulli M, Norsworthy PJ, Petretto E, et al. (20 co-authors).
2007. FCGR3B copy number variation is associated with
susceptibility to systemic, but not organ-specific, autoimmunity. Nat Genet. 39:721–723.
Feuk L, Carson AR, Scherer SW. 2006. Structural variation in the
human genome. Nat Rev Genet. 7:85–97.
Freeman JL, Perry GH, Feuk L, et al. (13 co-authors). 2006.
Copy number variation: new insights in genome diversity.
Genome Res. 16:949–961.
Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW.
1995. Genetic absolute dating based on microsatellites and the
origin of modern humans. Proc Natl Acad Sci USA. 92:
6723–6727.
Hollox E, Huffmeier U, Zeeuwen PL, et al. (13 co-authors).
2008. Psoriasis is associated with increased beta-defensin
genomic copy number. Nat Genet. 40:23–25.
International Human Genome Sequencing Consortium. 2001.
Initial sequencing and analysis of the human genome. Nature.
409:860–921.
Karolchik D, Baertsch R, Diekhans M, et al. (13 co-authors).
2003. The UCSC Genome Browser Database. Nucl Acids
Res. 31:51–54.
Kidd JM, Cooper GM, Donahue WF, et al. (46 co-authors). 2008.
Mapping and sequencing of structural variation from eight
human genomes. Nature. 453:56–64.
Kimura M, Crow JF. 1964. The number of alleles that can
be maintained in a finite population. Genetics. 49:
523–538.
Li WH. 1976. Mixed model of mutation for electrophoretic
identity of proteins within and between populations. Genetics.
83:423–432.
Nei M. 1972. Genetic distance between populations. Am Nat. 106:
283–292.
Nei M. 1987. Molecular evolutionary genetics. New York:
Columbia University Press.
Nei M. 2007. The new mutation theory of phenotypic evolution.
Proc Natl Acad Sci USA. 104:12235–12242.
Nei M, Niimura Y, Nozawa M. 2008. The evolution of animal
chemosensory receptor gene repertoires: roles of chance and
necessity. Nat Rev Genet. 9:951–963.
Nei M, Roychoudhury AK. 1982. Genetic relationship and
evolution of human races. Evol Biol. 14:1–59.
Nozawa M, Kawahara Y, Nei M. 2007. Genomic drift and copy
number variation of sensory receptor genes in humans. Proc
Natl Acad Sci USA. 104:20421–20426.
Ohta T, Kimura M. 1973. A model of mutation appropriate to
estimate the number of electrophoretically detectable alleles in
a finite population. Genet Res. 22:201–204.
Perry GH, Ben-Dor A, Tsalenko A, et al. (17 co-authors). 2008.
The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet. 82:685–695.
Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA,
Feldman MW, Cavalli-Sforza LL. 2005. Support from the
relationship of genetic and geographic distance in human
populations for a serial founder effect originating in Africa.
Proc Natl Acad Sci USA. 102:15942–15947.
Redon R, Ishikawa S, Fitch KR, et al. (43 co-authors). 2006.
Global variation in copy number in the human genome.
Nature. 444:444–454.
Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE,
Carter NP, Hurles ME, Feuk L. 2007. Challenges and
standards in integrating surveys of structural variation. Nat
Genet. 39:S7–S15.
Schlötterer C. 2000. Evolutionary dynamics of microsatellite
DNA. Chromosoma. 109:365–371.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R,
Hardison RC, Haussler D, Miller W. 2003. Human-mouse
alignments with BLASTZ. Genome Res. 13:103–107.
Sebat J, Lakshmi B, Troge J, et al. (21 co-authors). 2004.
Large-scale copy number polymorphism. Science. 305:
525–528.
Takezaki N, Nei M. 2008. Empirical tests of the reliability of
phylogenetic trees constructed with microsatellite DNA.
Genetics. 178:385–392.
Tuzun E, Sharp AJ, Bailey JA, et al. (12 co-authors). 2005. Fine-scale structural variation of the human genome. Nat Genet. 37:
727–732.
Wong KK, deLeeuw RJ, Dosanjh NS, et al. (11 co-authors). 2007.
A comprehensive analysis of common copy-number variations
in the human genome. Am J Hum Genet. 80:91–104.
Xu X, Peng M, Fang Z, Xu X. 2000. The direction of microsatellite mutations is dependent upon allele length. Nat Genet.
24:396–399.
Takashi Gojobori, Associate Editor
Accepted April 22, 2009