Recent acceleration of human adaptive evolution

John Hawks ∗ , Eric T. Wang † , Gregory M. Cochran ‡ , Henry C. Harpending§‡ , and Robert K. Moyzis ¶
∗Department of Anthropology, University of Wisconsin–Madison, Madison, WI 53706; †Advanced Development, Affymetrix, Inc., Santa Clara, CA 95051; ‡Department of Anthropology, University of Utah, Salt Lake City, UT 84112; and ¶Department of Biological Chemistry and Institute of Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697
Submitted to Proceedings of the National Academy of Sciences of the United States of America
Genomic surveys in humans identify a large amount of recent positive selection. Using the 3.9M HapMap SNP dataset, we found that
selection has accelerated greatly during the last 40,000 years. We
tested the null hypothesis that the observed age distribution of
recent positively selected linkage blocks is consistent with a constant rate of adaptive substitution during human evolution. We
show that a constant rate high enough to explain the number of
recently selected variants would predict (1) site heterozygosity at
least tenfold lower than is observed in humans, (2) a strong relationship of heterozygosity and local recombination rate, which
is not observed in humans, (3) an implausibly high number of
adaptive substitutions between humans and chimpanzees, and
(4) nearly 100 times the observed number of high-frequency LD
blocks. Larger populations generate more new selected mutations, and we show the consistency of the observed data with the
historical pattern of human population growth. We consider human demographic growth to be linked with past changes in human
cultures and ecologies. Both processes have contributed to the
extraordinarily rapid recent genetic evolution of our species.
linkage disequilibrium | positive selection | HapMap | Neolithic
Human populations have vastly increased in numbers during the
past 50,000 years or more [1]. In theory, more people means
more new adaptive mutations [2]. Hence, population growth should
cause an increase in the rate of adaptive substitutions: an acceleration
of new positively selected alleles. In humans, this effect may have
been augmented by vast changes in cultures and ecology during the
Late Pleistocene and Holocene, creating new opportunities for adaptation. Such acceleration does not require any change in the per-genome
rate of adaptive mutations; it is a simple effect of changing demography, possibly increased by changing ecology. The best analogy may
be the rapid recent evolution of domesticates such as maize [3, 4].
Human genetic variation appears consistent with a recent acceleration of positive selection. A new advantageous mutation that escapes
genetic drift will rapidly increase in frequency, more quickly than recombination can shuffle it with other genetic variants [5]. As a result,
selection generates long-range blocks of linkage disequilibrium (LD)
across tens or hundreds of kilobases, depending on the age of the selected variant and the local recombination rate. The expected decay
of LD with distance surrounding a recently selected allele provides a
powerful means of discriminating selection from other demographic
causes of extended LD, such as bottlenecks and admixture [3, 6].
Previously, we applied the LD decay (LDD) test to SNP data from
Perlegen and the HapMap [7], finding evidence for recent selection
on approximately 1800 human genes. We refer to these as ascertained selected variants (ASVs). This number encompasses some
7% of human genes, and is consistent with the proportion found in
another survey using a related approach [6]. Because LD decays
quickly over time, most ASVs are quite recent [8], in comparison to
other approaches that detect selection over longer evolutionary time
scales [9, 10]. Many human genes are now known to have strongly selected
alleles in recent historical times, such as lactase [11, 12], CCR5
[13, 14], and FY [15]. These surveys show that such genes are very
common. This observation is surprising: in theory, such strongly selected variants should be rare [2, 16]. The observed distribution seems
to reflect an exceptionally rapid rate of adaptive evolution.
But the hypothesis that genomic data show a high recent rate of
selection must overcome three principal objections: (1) Some proportion of ASVs might be neutral loci that exhibit high LD because of
recent population expansion or population structure; (2) The LDD test
might exhibit an ascertainment bias that misses older selection; and
(3) A high constant rate of adaptive substitutions might also explain
the large number of ASVs. The first two objections may be effectively
tested by restricting our comparisons to a set of frequencies and ages
for which neutral explanations or ascertainment biases are most unlikely. We test the third objection by considering a constant long-term
rate as a null hypothesis, and by deriving the corresponding high rate
of adaptive substitutions from the observed data. This required estimates of allele ages and a theoretical derivation of substitution rate,
as described below.
Rejecting neutrality. The LDD test is weaker as applied to rare alleles, which may exhibit significant LD because of recent population
growth. However, as in [3], we have entirely excluded rare alleles
from analysis. By using a very stringent frequency cutoff of 22%, we
have included age estimates for only those alleles that provide very
strong evidence of selection [3, 7, 14]. Furthermore, the candidate selected genes occur predominantly in genic regions, and preferentially
include genes in functional classes that are plausible targets for recent adaptive changes. No neutral explanation, including population
structure, can account for these features; only selection can.
Finding old alleles. The original Perlegen and HapMap datasets
were relatively small (1.6M and 1.0M SNPs, respectively). The low
SNP density limited the power of LD methods to detect older selection events, particularly in high-recombination areas of the genome
[3]. Therefore, we have now recomputed the LDD test on the newly
released 3.9 million HapMap genotype dataset [7]. By varying the
LDD test search parameters, we can now statistically detect alleles
with more rapid LD decay (and hence older inferred ages) [3]. For all
parameters used, the detection threshold was set at an ALnLH greater
Abbreviations: ASV, ascertained selected variant; FRC, fraction of recombinant chromosomes;
LD, linkage disequilibrium; SNP, single nucleotide polymorphism
E.T.W. and R.K.M. invented analytical tests for selection and performed analyses of empirical
data; J.H. and G.M.C. formulated demographic hypotheses; J.H., G.M.C. and H.C.H. performed
simulations and analysis of demographic models; G.M.C. and H.C.H. organized meetings and
correspondence; J.H. prepared the figures; J.H. and R.K.M. wrote the paper. The first three
authors contributed equally to this work.
§To whom correspondence should be addressed. E-mail: [email protected]
© 2007 by The National Academy of Sciences of the USA
than 2.6 SD (≥ 99.5th percentile) from the genome average. Again,
this LDD threshold is a stringent cutoff for the detection of genomic
outliers, because the high number of selective events is included
in the genome average [3]. The probabilistic LDD test does not require the calculation of inferred haplotypes [3], so it is not a daunting
computational task to calculate ALnLH values for the HapMap 3.9M
SNPs genotyped in 270 individuals: 90 European ancestry (CEU),
90 African (Yoruba) ancestry (YRI), 45 Han Chinese (CHB) and 45
Japanese (JPT).
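As a rough check on the stated cutoff, the following sketch converts a threshold expressed in standard deviations into a one-sided percentile. It assumes that ALnLH scores are approximately normally distributed around the genome average, which the text implies but does not state; the function name is illustrative only.

```python
import math


def percentile_at_z(z: float) -> float:
    """One-sided percentile of a standard normal distribution at z SD."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Cutoff quoted in the text: 2.6 SD from the genome average.
cutoff_percentile = percentile_at_z(2.6)
print(f"2.6 SD corresponds to the {cutoff_percentile:.2f}th percentile")
```

Under the normality assumption, the result falls just above the 99.5th percentile, in line with the threshold quoted in the text.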
This new analysis uncovered only 12 new SNPs (in 6 clusters)
not originally detected in the CEU population [3] and 466 new SNPs
representing 206 independent clusters in the YRI population. A total
of 2803 (CEU), 2367 (CHB), 2783 (JPT), and 3486 (YRI) selection
events were found. As noted previously [3], many inferred selected
sites have faster LD decay in YRI samples (with older coalescence
times), resulting in lower background LD and more previously unobserved variants. The denser HapMap dataset provided better resolution of LD decay (i.e., rapid decay can be reliably distinguished from
background LD only with high density). The 3.9M HapMap dataset
yielded more ASVs, but the increase was only incremental in the CEU
sample and approximately 7% in the YRI sample. This indicates that most events
(defined by the LDD test) coalescing to ages up to 80,000 years ago
have been detected, and any ascertainment bias against older selection
is very slight within the given frequency range.
Ancient selected alleles are also more likely to be near or at fixation than recent alleles. Just as we excluded rare alleles, we also
excluded high frequency alleles (i.e., > 78%) in our age distribution.
But the number of such high frequency alleles provides another test of
the hypothesis that the LDD test has missed older events. We modified the LDD test to find these high-frequency “near-fixed” alleles, and
found only 50 candidates. Other studies have likewise found few near-fixed alleles [17, 18]. These studies also show that very few ASVs
are shared between HapMap samples; most are population-specific
[3, 6]. In our data, only 509 clusters are shared between CEU and
YRI samples; many of these are likely to have been under balancing
selection (Supplementary material). The small number of near-fixed
events and the small number of shared events are strong evidence that
the LDD test has not missed a large number of ancient selected alleles.
Allele ages. We used a modification of previously described methods [19, 20, 21] to estimate an allele age (coalescence time) for each
selected cluster. We focused on the HapMap populations with the
largest sample sizes, which were the African ancestry (YRI) and European ancestry (CEU) samples. Similar results were obtained for the
Chinese (CHB) and Japanese (JPT) populations (data not shown).
Fig. 1 presents histograms of these age estimates. The YRI
sample shows a modal (peak) age of approximately 8,000 years ago,
assuming 25-year generations; the CEU sample shows a peak age
of approximately 5,250 years ago, both values consistent with earlier
work [3, 6]. The difference in peak age likely explains why weaker
tests have found stronger evidence of selection in European ancestry
samples [22, 23], unlike the current study.
Predictions of constant rate. We can derive four predictions from
the rate of adaptive substitution, each of which refutes the null hypothesis of constant rate:
1. The null hypothesis predicts that the average nucleotide diversity across the genome should be vastly lower than observed. Recurrent selected substitutions greatly reduce the diversity of linked neutral
alleles by hitchhiking or pseudohitchhiking [25, 26]. Using an approximation for site heterozygosity under pseudohitchhiking [25, 27] we
estimated the expected site heterozygosity under the null hypothesis
as 3.5 × 10⁻⁵ (Supplementary material). This value is less than one
tenth the observed site heterozygosity, which is between 4.0 × 10⁻⁴ and 6.0 × 10⁻⁴
in human populations [7, 28, 29].
2. Hitchhiking is more important in regions of low recombination, so the null hypothesis predicts a strong relationship between
nucleotide diversity and local recombination rate. The null hypothesis predicts a tenfold increase in diversity across the range of local
recombination rates represented by human gene regions. Empirically,
diversity is slightly correlated with local recombination rate, but the
relationship is weak, and may be partly explained by mutation rate
[7, 30].
3. The annual rate of 0.53 adaptive substitutions consistent with
the YRI data predicts an implausible 6.4 million adaptive substitutions between humans and chimpanzees. In contrast, there are only
around 40,000 amino acid substitutions separating these species, and
only around 18 million total substitutions [31]. Selection on this scale, more than 1/3 of all substitutions and over 100 times
the observed number of amino acid substitutions, is implausible.
4. The null hypothesis predicts that many selected alleles should
be found between 78% and 100% frequency. Positively selected
alleles follow a logistic growth curve, which proceeds very rapidly
through intermediate frequencies. Because selected alleles spend relatively little time in the ascertainment range, the ascertained blocks
should be the “tip of the iceberg” of a larger number of recently selected blocks at or near fixation. For example, the ASVs in the YRI
dataset have a modal age of ≈ 8,000 years ago. Based on the diffusion model for selection on an additive gene, ascertained variants
should only account for 18% of the total number of selected variants
still segregating. In contrast, 41% of segregating variants should be
above 78%. Dominant alleles (which have a higher fixation probability) progress even more slowly above 78%, so that additivity is
the more conservative assumption. Empirically, few such near-fixed
variants with high LD scores have been found in the human genome
[7]. Modifying the LDD algorithm to specifically search for high frequency “fixed” alleles found only 50 potential sites, in contrast to the
greater than 5000 predicted by the constant rate model. While it is
possible that the rapid LD decay expected for older selected alleles
near fixation may not be detected as efficiently by the LDD test, two
other surveys have also found small numbers of such events [17, 18].
This difference of two orders of magnitude is a strong refutation of
the null hypothesis.
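The ratios quoted in the four predictions can be checked with simple arithmetic. The sketch below uses only figures given in the text; the variable names are illustrative. The quantity `implied_years` is the quotient of the predicted substitution count and the annual rate, and is shown only because the text does not state the divergence time it used.

```python
# Figures quoted in the four predictions above.
null_heterozygosity = 3.5e-5          # expected under the constant-rate null
observed_heterozygosity_low = 4.0e-4  # lower end of the observed range
rate_per_year = 0.53                  # adaptive substitutions per year (YRI)
predicted_adaptive_subs = 6.4e6       # human-chimpanzee, under the null
amino_acid_subs = 40_000              # observed amino acid substitutions
total_subs = 18e6                     # observed total substitutions
predicted_near_fixed = 5000           # "greater than 5000" under the null
observed_near_fixed = 50              # candidates actually found

# Prediction 1: observed diversity relative to the null expectation.
heterozygosity_ratio = observed_heterozygosity_low / null_heterozygosity

# Prediction 3: share of all substitutions, and multiple of amino acid changes.
fraction_of_all_subs = predicted_adaptive_subs / total_subs
multiple_of_amino_acid_subs = predicted_adaptive_subs / amino_acid_subs
implied_years = predicted_adaptive_subs / rate_per_year

# Prediction 4: predicted versus observed near-fixed events.
near_fixed_ratio = predicted_near_fixed / observed_near_fixed

print(heterozygosity_ratio, fraction_of_all_subs,
      multiple_of_amino_acid_subs, implied_years, near_fixed_ratio)
```

Each ratio is at least as large as the corresponding figure stated in the text (tenfold, one third, 100 times, and two orders of magnitude).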
Rate estimation. Using the diffusion model of positive selection
[24], we estimated the adaptive substitution rate consistent with the
observed age distribution of ASVs. For the YRI data, this estimate
is 0.53 substitutions per year. For the CEU data, this estimate is 0.59
substitutions per year. The average fitness advantage of new variants
(assuming dominant effects) is estimated as 0.022 for the Yoruba age
distribution, and 0.034 for the European distribution. Curves obtained
using these estimated values fit the observed data well (Fig. 1). The
higher estimated rate for Europeans emerges from the more recent
modal age of variants. For further analyses, we used the lower rate estimated from the YRI sample as a conservative value.
Population growth. The rate of adaptive evolution in human populations has indeed accelerated within the past 80,000 years. The
results above demonstrate the extent of acceleration: the recent rate
must be 1–2 orders of magnitude higher than the long-term rate to
explain the genome-wide pattern.
Population growth itself predicts an acceleration effect, because
the number of new mutations increases as a linear product of the number of individuals [2], and exponential growth increases the fixation
probability of new adaptive mutations [32]. We considered the hypothesis that the magnitude of human population growth might explain
a large fraction of the recent acceleration of new adaptive alleles. To
test this hypothesis, we constructed a simplified model of historic and
prehistoric population growth, based on historical and archaeological
estimates of population size [1, 33, 34].
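The two effects of growth named above can be illustrated with a minimal sketch. It assumes 2Nν new adaptive mutations per generation in a diploid population, and it assumes that each escapes drift with probability of roughly 2(s + r) in a population growing at rate r per generation. That approximation is an assumption made for illustration: the text cites [32] for the effect but gives no formula. All parameter values are hypothetical.

```python
def establishing_mutations(n_individuals: float, nu: float,
                           s_mean: float, growth_rate: float = 0.0) -> float:
    """Expected new adaptive mutations per generation that escape drift.

    Assumes 2 * N * nu new adaptive mutations per generation and an
    establishment probability of about 2 * (s + r); both the formula for
    growing populations and the inputs are illustrative assumptions.
    """
    new_mutations = 2.0 * n_individuals * nu
    establishment_prob = min(1.0, 2.0 * (s_mean + growth_rate))
    return new_mutations * establishment_prob


# Hypothetical parameter values, chosen only to show the scaling.
NU = 1e-3
S_MEAN = 0.02

small_static = establishing_mutations(10_000, NU, S_MEAN)
large_static = establishing_mutations(1_000_000, NU, S_MEAN)
large_growing = establishing_mutations(1_000_000, NU, S_MEAN, growth_rate=0.01)

print(small_static, large_static, large_growing)
```

In this sketch a hundredfold larger population yields a hundredfold more establishing mutations, and a positive growth rate raises the count further.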
Population growth in the Upper Paleolithic and late Middle Stone Age (MSA) began by
50,000 years ago. Several archaeological indicators show long-term
increases in population density, including more small-game exploitation, greater pressure on easily-collected prey species like tortoises
and shellfish, more intense hunting of dangerous prey species, and occupation of previously uninhabited islands and circumarctic regions
[35]. Demographic growth intensified during the Holocene, as domestication centers in the Near East, Egypt and China underwent expansions commencing by 10,000–8,000 years ago [36, 37]. From these
centers, population growth spread into Europe, North Africa, South
and Southeast Asia, and Australasia during the succeeding 6000 years
[37, 38]. Subsaharan Africa bears special consideration, because of
its initial large population size and influence on earlier human dispersals [39]. Despite the possible early appearance of annual cereal
collection and cattle husbandry in North Africa, subsaharan Africa
has no archaeological evidence for agriculture before 4000 years ago
[37]. West Asian agricultural plants like wheat did poorly in tropical
sun and rainfall regimes, while animals faced a series of diseases that
posed barriers to entry [40]. As a consequence, some 2500 years ago
the population of Subsaharan Africa was likely fewer than 7 million
people, compared to European, West Asian, East Asian, and South
Asian populations approaching or in excess of 30 million each [1].
At this time, the Subsaharan population grew at a high rate, with the
dispersal of Bantu populations from West Africa and the spread of pastoralism and agriculture southward through East Africa [41, 42]. Our
model based on archaeological and historical evidence includes large
long-term African population size, gradual Late Pleistocene population growth, an early Neolithic transition in West Asia and Europe,
and a later rise in the rate of growth in Subsaharan Africa coincident
with agricultural dispersal (Fig. 2).
As shown in Fig. 3, the demographic model predicts the recent
peak ages of the African and European distributions of selected variants, at a much lower average selection intensity than the constant
population size model. In particular, the demographic model readily
explains the difference in age distributions between YRI and CEU
samples: the YRI sample has more variants dating to earlier times
when African populations were large compared to West Asia and Europe, while earlier Neolithic growth in West Asia and Europe led to
a pulse of recent variants in those regions. The data that falsify the
constant rate model, such as the observed genome-wide heterozygosity value and the probable number of human-chimpanzee adaptive
substitutions, are fully consistent with the demographic model.
Discussion
Our simple demographic model explains much of the recent pattern, but some aspects remain unexplained. Although the small number of high-frequency variants (between 78% and 100%) is much more consistent
with the demographic model than a constant rate of change, it is still
relatively low even considering the rapid acceleration predicted by
demography. Demographic change may be the major driver of new
adaptive evolution, but the detailed pattern must involve gene functions and gene-environment interactions.
Cultural and ecological changes in human populations may explain many details of the pattern. Human migrations into Eurasia
created new selective pressures on features such as skin pigmentation, adaptation to cold, and diet [20, 21, 23]. Over this time span,
humans both inside and outside Africa underwent rapid skeletal evolution
[43, 44]. Some of the most radical new selective pressures have
been associated with the transition to agriculture [45]. For example,
genes related to disease resistance are among the inferred functional
classes most likely to show evidence of recent positive selection [3].
Virulent epidemic diseases, including smallpox, malaria, yellow fever,
typhus and cholera, became important causes of mortality after the origin and spread of agriculture [46]. Likewise, subsistence and dietary
changes have led to selection on genes such as lactase [12].
It is sometimes claimed that the pace of human evolution should
have slowed as cultural adaptation supplanted genetic adaptation. The
high empirical number of recent adaptive variants would seem sufficient to refute this claim [3, 6]. It is important to note that the peak ages
of new selected variants in our data do not reflect the highest intensity
of selection, but merely our ability to detect selection. Due to the
recent acceleration, many more new adaptive mutations should exist
than have yet been ascertained, occurring at a faster and faster rate
during historic times. Adaptive alleles with frequencies under 22%
should then greatly outnumber those at higher frequencies. To the extent that new adaptive alleles continued to reflect demographic growth,
the Neolithic and later periods would have experienced a rate of adaptive evolution more than 100 times higher than the rate that characterized most of
human evolution. Cultural changes have reduced mortality rates, but
variance in reproduction has continued to fuel genetic change [47]. In
our view, the rapid cultural evolution during the Late Pleistocene created vastly more opportunities for further genetic change, not fewer,
as new avenues emerged for communication, social interactions, and
creativity.
Materials and Methods
The 3.9 M HapMap release was obtained from the International
HapMap Project website (http://www.hapmap.org). The Linkage Disequilibrium Decay (LDD) test [3] was applied to all four HapMap
population datasets. Briefly, by examining individuals homozygous
for a given SNP, the fraction of inferred recombinant chromosomes
(FRC) at adjacent polymorphisms can be directly computed without
the need to infer haplotype, a computationally daunting task on such
large datasets. The test uses the expected increase with distance in
FRC surrounding a selected allele to identify such alleles. Importantly, the method is insensitive to local recombination rate, because
local rate will influence the extent of LD surrounding all alleles, while
the method looks for LD differences between alleles. By using a large
sliding window (ranging from 0.25 to 1.0 Mb in the current study),
and by explicitly acknowledging the expected LD structure of selected
alleles, the LDD test can distinguish selection from other population
genetic/demographic mechanisms resulting in large LD blocks [3].
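One plausible reading of the FRC computation is sketched below; the definitive procedure is given in ref. [3] and may differ in detail. Individuals homozygous at the target SNP carry the target allele on both chromosomes, so their genotype at a neighboring SNP gives the neighboring alleles on those chromosomes directly, with no haplotype inference. The genotype coding (0, 1, or 2 copies of an arbitrary reference allele) and the function name are assumptions made for illustration.

```python
from typing import Sequence


def frc_from_homozygotes(target: Sequence[int], neighbor: Sequence[int],
                         target_homozygote: int = 2) -> float:
    """Fraction of target-bearing chromosomes carrying the less common
    neighboring allele, using only individuals homozygous at the target SNP.

    Genotypes are coded as the count (0, 1, 2) of a reference allele.
    """
    chromosomes = 0
    reference_copies = 0
    for g_target, g_neighbor in zip(target, neighbor):
        if g_target != target_homozygote:
            continue  # phase is ambiguous unless the target is homozygous
        chromosomes += 2
        reference_copies += g_neighbor
    if chromosomes == 0:
        raise ValueError("no individuals homozygous at the target SNP")
    minor_copies = min(reference_copies, chromosomes - reference_copies)
    return minor_copies / chromosomes


# Hypothetical genotypes for six individuals.
target_genotypes = [2, 2, 2, 1, 0, 2]
neighbor_genotypes = [2, 2, 1, 0, 0, 2]
frc = frc_from_homozygotes(target_genotypes, neighbor_genotypes)
print(frc)
```

Repeating this calculation at neighbors of increasing distance would give the FRC profile whose rise with distance the test examines.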
It has been suggested that assessments of LD decay may be more
appropriate to rank candidate regions for selection, instead of to identify regions definitely under selection. This reflects two related concerns: false negative results (missing true selected events) and failing
to identify selection on standing variants. Selection on standing variants may have been important in recent human evolution, but it does
not respond closely to demographic changes. We have minimized the
proportion of false negatives by limiting the frequency range for ascertainment to those with maximum power, as well as by using the denser
3.9M HapMap release. Missing events should bias the age distribution
of ASVs to be older than the true distribution; our observations reflect the surprisingly young age distribution, so our analysis of ASVs
is conservative.
A modification of the LDD test was conducted on the CEU and
YRI datasets, to find selected alleles near fixation. Unlike the normal
LDD test, all SNPs greater than 78% frequency (the cutoff used for
primary analysis of these data) were queried, using the same sliding
windows as the normal test. Unlike the standard test, however, the
requirement that the alternative allele be no more than 1 SD from the
genome average was not implemented [3]. Ninety-three clusters were
identified in the CEU population and 85 in the YRI population (with 65
overlaps), a total of 113 “fixed” events. Unlike normal LDD screens
[3], half of these observed fixed events determined by long range
LD were in extreme centromeric or telomeric regions, which have no
recombination or high recombination, respectively [7, 48]. The interpretation of extended LD in these regions is ambiguous, therefore,
since low recombination maintains large LD blocks (centromeres)
and well-documented high telomere-telomere exchange homogenizes
these regions [48]. Removing these centromeric and telomeric regions,
in which LD is likely to result from mechanisms other than
selection, yields approximately 50 regions of potential fixation.
Clustering. The LD Decay (LDD) test produces “clusters” of SNPs
with the signature of selection, due to the extensive LD surrounding these alleles [3]. Each cluster is likely to represent a single selection event, and hence we have attempted to minimize potential
over-counting by cluster analysis. Using a simple nearest-neighbor
technique, we assign a 10 kb radius to each selected SNP. Each pass
through the data produces a new set of centroids, and cluster membership is reassigned to the nearest centroid. A SNP that lies more than
20 kb from the nearest centroid starts a new cluster of which
it is the sole member. Using larger window sizes (up to 100 kb)
reduces the number of independent clusters by approximately half,
but at the cost of “fusing” likely independent events (data not
shown). We believe the 10 kb window, therefore, is a conservative
first-pass clustering of the observed selection events.
Each selected SNP identified via the LDD test was sorted and
mapped to its physical location on human chromosomes (UCSC Human Genome 17). We iterate through the SNP list, starting with the
most distal, and a SNP and its closest neighbor (within 10 kb radius)
are clustered together with a new centroid (average) i computed. To
be included as part of the ith cluster, the next SNP on the sorted SNP
list must fall within 20 kb of the ith cluster. If it is within 20 kb of both
an upstream and downstream cluster, to be integrated in the ith cluster
it must have a distance to the ith centroid closer than the next closest
centroid (i + 1). Otherwise, a new centroid and cluster is initiated.
This task is repeated for all SNPs identified by the LDD test.
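The clustering pass can be sketched as a greedy sweep over sorted positions. This is a simplified illustration, not the authors' implementation: it applies only the 20 kb centroid rule on a single chromosome. It omits the initial 10 kb pairing radius and the tie-break between upstream and downstream centroids, which does not arise in a single sorted pass. Function and parameter names are hypothetical.

```python
from typing import List


def cluster_snps(positions_bp: List[int],
                 max_dist_bp: int = 20_000) -> List[List[int]]:
    """Greedy single-pass clustering of SNP positions on one chromosome.

    A SNP joins the most recent cluster if it lies within max_dist_bp of
    that cluster's centroid (the mean position of its members);
    otherwise it starts a new cluster of which it is the sole member.
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions_bp):
        if clusters:
            current = clusters[-1]
            centroid = sum(current) / len(current)
            if abs(pos - centroid) <= max_dist_bp:
                current.append(pos)
                continue
        clusters.append([pos])
    return clusters


# Hypothetical positions (bp) of SNPs flagged by the LDD test.
flagged = [200_000, 1_000, 60_000, 6_000, 65_000, 15_000]
clusters = cluster_snps(flagged)
print(len(clusters), clusters)
```

Raising `max_dist_bp` merges neighboring clusters, which mirrors the trade-off described above between over-counting and fusing independent events.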
Allele age calculations. Coalescence times (commonly referred
to as allele ages) were calculated by methods described previously
[19, 20, 21]. Briefly, information contained in neighboring SNPs and
the local recombination frequency is used to infer age. The genotyped
population is binned (at the SNP under inferred selection, the target
SNP) into the major and minor alleles [3]. While every neighboring
SNP gives information on the age of the target SNP, a single recombination event carries all the downstream neighbors to an equal or
higher fraction of recombinant chromosomes (FRC). Hence, our algorithm moves away (positively and negatively) from the target SNP,
and computes allele age only when a higher FRC level is reached
in a neighboring SNP. A single neighboring SNP with no neighbors
within 20 kb is not used for computation. This method is consistent
with the theoretical and experimental expectations of LDD surrounding selected alleles [3].
For neighboring SNPs, allele age is computed using:
$$ t = \frac{1}{\ln(1 - c)} \ln\left( \frac{x_t - y}{1 - y} \right) \qquad [1] $$
where t = allele age (in generations), c = recombination rate
(calculated at the distance to the neighboring SNP), x_t = frequency
in generation t, and y = frequency on ancestral chromosomes. This
method is a method-of-moments estimator [19], because the estimate
results from equating the observed proportion of non-recombinant
chromosomes with the proportion expected if the true value of t is the
estimated value. It requires no population genetic or demographic assumptions, only the exponential decay of initially perfect LD because
of recombination. Estimates are obtained until FRC reaches 0.3, to
avoid allele age calculations of lower reliability. We assume the ancestral allele is always the allele with neutral or genome average LDD
ALnLH scores [3]. Average regional recombination rates were obtained by querying data from ref. [49] in the UCSC database. Regions
with less than 0.1 cM/Mb average recombination rate were excluded.
All allele age estimates are averages of the individual calculations at
the target SNP [21].
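Equation 1 can be written as a short function. The sketch below implements the estimator for a single neighboring SNP and checks it by a round trip: it generates the frequency expected after a known number of generations under exponential decay of initially perfect LD, then recovers that age. The helper `expected_frequency` and all input values are illustrative assumptions; the 25-year generation length is the one used in the text.

```python
import math

YEARS_PER_GENERATION = 25  # generation length assumed in the text


def allele_age_generations(x_t: float, y: float, c: float) -> float:
    """Eq. 1: t = ln((x_t - y) / (1 - y)) / ln(1 - c)."""
    if not 0.0 < c < 1.0:
        raise ValueError("recombination rate c must lie in (0, 1)")
    if not y < x_t <= 1.0:
        raise ValueError("x_t must exceed y and be at most 1")
    return math.log((x_t - y) / (1.0 - y)) / math.log(1.0 - c)


def expected_frequency(t: float, y: float, c: float) -> float:
    """Frequency after t generations of decay from perfect LD toward y."""
    return y + (1.0 - y) * (1.0 - c) ** t


# Hypothetical inputs: ancestral frequency 0.1, recombination rate 0.001.
x_observed = expected_frequency(320, y=0.1, c=0.001)
age_generations = allele_age_generations(x_observed, y=0.1, c=0.001)
age_years = age_generations * YEARS_PER_GENERATION
print(age_generations, age_years)
```

In the procedure described above, such single-neighbor estimates would be averaged over the informative neighbors of each target SNP.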
Estimating the rate of adaptive substitutions. Under the null
hypothesis of a constant rate of adaptive substitution, the age distribution of ascertained selected variants can estimate the mean fitness
advantage (s̄) of new selected variants. The empirical distribution of
fitness effects of adaptive substitutions is not known. On theoretical
grounds, this distribution is expected to approximate a negative exponential [16]. Other studies have assumed this distribution or a gamma
distribution with similar shape [50, 51, 52], and selected mutations
in laboratory organisms appear to fit this theoretical model [53, 54].
In these expressions, s is the selection coefficient favoring a new mutation, and s̄ is the mean selection coefficient among the set of all
advantageous mutations. We assume that adaptive alleles are dominant in effect; this allows the highest fixation probability [55], the
most rapid increase in frequencies, and is therefore conservative —
less dominance requires a higher substitution rate to explain the observed distribution. The value of s̄ is not known, and we are concerned
with finding the single value that creates the best fit of the population
size prediction to the observed data. We assumed a negative exponential distribution of s, in which $\Pr[s] = e^{-s/\bar{s}}$. The number of
ascertained new adaptive variants originating in any single generation
t is given by the equation:
$$ n_{t,\mathrm{asc}} = 4 N_t \nu \int_a^b s \, e^{-s/\bar{s}} \, ds \qquad [2] $$
Here, ν is the rate of adaptive mutations per genome per generation
and Nt is the effective population size in generation t. This integral
derives from the expectation of adaptive mutations in a diploid population (here, 2Nν) multiplied by the fixation probability 2s for each,
again assuming dominant fitness effect. Under the null hypothesis, the
population size Nt is constant across all generations, so the expected
number of new adaptive mutations (ascertained and nonascertained)
is likewise constant.
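The integral in Eq. 2 has a closed form, which the following sketch evaluates and checks against a numerical midpoint rule. It takes the weighting exactly as written in the text, with no normalizing constant on the exponential. The population size, mutation rate, and limits of integration are hypothetical values chosen only to exercise the functions.

```python
import math


def weighted_integral(a: float, b: float, s_mean: float) -> float:
    """Closed form of the integral of s * exp(-s / s_mean) from a to b."""
    def antiderivative(s: float) -> float:
        return -s_mean * (s + s_mean) * math.exp(-s / s_mean)
    return antiderivative(b) - antiderivative(a)


def ascertained_per_generation(n_t: float, nu: float, a: float, b: float,
                               s_mean: float) -> float:
    """Eq. 2: n_asc = 4 * N_t * nu * integral_a^b s * exp(-s / s_mean) ds."""
    return 4.0 * n_t * nu * weighted_integral(a, b, s_mean)


def midpoint_integral(a: float, b: float, s_mean: float,
                      steps: int = 100_000) -> float:
    """Numerical check of the same integral by the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        s = a + (i + 0.5) * h
        total += s * math.exp(-s / s_mean)
    return total * h


S_MEAN = 0.022  # mean advantage estimated in the text for the YRI data
closed = weighted_integral(0.01, 0.1, S_MEAN)
numeric = midpoint_integral(0.01, 0.1, S_MEAN)
whole_range = weighted_integral(0.0, 100.0, S_MEAN)  # approaches s_mean ** 2
# Hypothetical N_t and nu, for illustration only.
example_count = ascertained_per_generation(10_000, 0.5, 0.01, 0.1, S_MEAN)
print(closed, numeric, whole_range, example_count)
```

Taken over the whole range of s, the integral as written equals the square of the mean selection coefficient, so under the null the per-generation total depends only on N, ν, and s̄.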
We considered the range of s between value a yielding a current
mean frequency of 0.22, and b yielding a current mean frequency of
0.78, as derived from the diffusion approximation for dominant advantageous alleles [56]. The parameter ν is constant in effect across
all generations, while the number of ascertained variants originating
in each generation varies with the range of s placing new alleles in the
ascertainment range. We applied a hill-climbing algorithm to find the
best-fit value of s̄ for the empirical distribution of block ages, allowing
ν to vary freely. With an estimate for s̄, the rate of adaptive mutations, ν, can be estimated as the value that satisfies equation 2. This
is also sufficient to estimate the expected number of substitutions per
generation, which is the value of the integral in Eq. 2 over the range
0 to infinity (in our analyses, the vast majority had 0.01 ≤ s ≤ 0.1).
For the YRI data, assuming dominant fitness effects, the resulting estimate of adaptive substitution rate is 13.25 per generation, or 0.53
per year.
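How the limits a and b depend on allele age can be illustrated with a deterministic stand-in for the diffusion approximation. The sketch below assumes simple logistic growth, logit(p_t) = logit(p_0) + s t, starting from a single copy in a population of hypothetical size N. It is not the dominant-allele diffusion result of ref. [56], so the values it returns are illustrative only. The final lines reproduce the conversion from substitutions per generation to substitutions per year.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def logistic_frequency(s: float, generations: float, p0: float) -> float:
    """Frequency after deterministic logistic growth from p0."""
    return 1.0 / (1.0 + math.exp(-(logit(p0) + s * generations)))


def selection_for_frequency(target: float, generations: float,
                            p0: float) -> float:
    """Selection coefficient that carries p0 to target in the given time."""
    return (logit(target) - logit(p0)) / generations


N_HYPOTHETICAL = 10_000             # assumed population size, not from the text
p0 = 1.0 / (2 * N_HYPOTHETICAL)     # a single new copy in a diploid population
age = 320                           # generations (8,000 years at 25 years each)

# Limits of the ascertainment window (22% to 78%) for alleles of this age.
a = selection_for_frequency(0.22, age, p0)
b = selection_for_frequency(0.78, age, p0)

# Conversion quoted in the text: 13.25 substitutions per generation.
YEARS_PER_GENERATION = 25
rate_per_year = 13.25 / YEARS_PER_GENERATION
print(a, b, rate_per_year)
```

Older alleles require smaller coefficients to sit in the same frequency window, which is why the range of s that places an allele in the ascertainment range varies with the generation of origin.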
This work was supported by grants from the U.S. Department of Energy, the
National Institute of Mental Health and the National Institute of Aging (to
1. Biraben, J.-N. (2003) Population et Sociétés 394, 1–4.
2. Fisher, R. A. (1930) The Genetical Theory of Natural Selection (Clarendon, Oxford).
3. Wang, E. T., Kodama, G., Baldi, P., & Moyzis, R. K. (2006) Proc. Natl. Acad. Sci. U. S. A. 103,
135–140.
4. Wright, S. I., Bi, I. V., Schroeder, S. G., Yamasaki, M., Doebley, J. F., McMullen, M. D., & Gaut,
B. S. (2006) Science 308, 1310–1314.
5. Kim, Y. & Nielsen, R. (2004) Genetics 167, 1513–1524.
6. Voight, B. F., Kudaravalli, S., Wen, X., & Pritchard, J. K. (2006) PLoS Biol. 4, e72.
7. The International HapMap Consortium (2005) Nature 437, 1299–1320.
R.K.M.), the Unz Foundation (to G.M.C.), the University of Utah (to H.C.H.),
and the Graduate School of the University of Wisconsin (to J.H.). The work
benefited from comments and discussions with Alan Fix, Dennis O’Rourke,
Kristen Hawkes, Alan Rogers, Chad Huff, Milford Wolpoff, Balaji Srinivasan.
8. Przeworski, M. (2002) Genetics 160, 1179–1189.
9. Bustamante, C. D., Fledel-Alon, A., Williamson, S., Nielsen, R., Hubisz, M. T., Glanowski, S., Tanenbaum, D. M., White, T. J., Sninsky, J. J., Hernandez, R. D., et al. (2005) Nature 437, 1153–1157.
10. Pollard, K. S., Salama, S. R., King, B., Kern, A. D., Dreszer, T., Katzman, S., Siepel, A., Pedersen, J. S., Bejerano, G., Baertsch, R., et al. (2006) PLoS Genet. 2, e168.
11. Hollox, E. J., Poulter, M., Zvarek, M., Ferak, V., Krause, A., Jenkins, T., Saha, N., Kozlov, A. I., & Swallow, D. M. (2001) Am. J. Hum. Genet. 68, 160–172.
12. Bersaglieri, T., Sabeti, P. C., Patterson, N., Vanderploeg, T., Schaffner, S. F., Drake, J. A., Rhodes, M., Reich, D. E., & Hirschhorn, J. N. (2004) Am. J. Hum. Genet. 74, 1111–1120.
13. Novembre, J., Galvani, A. P., & Slatkin, M. (2005) PLoS Biol. 3, e339.
14. Sabeti, P. C., Walsh, E., Schaffner, S. F., Varilly, P., Fry, B., Hutcheson, H. B., Cullen, M., Mikkelsen, T. S., Roy, J., Patterson, N., et al. (2005) PLoS Biol. 3, e378.
15. Hamblin, M. T., Thompson, E. E., & Di Rienzo, A. (2002) Am. J. Hum. Genet. 70.
16. Orr, H. A. (2003) Genetics 163, 1519–1526.
17. Williamson, S., Hubisz, M. J., Clark, A. G., Payseur, B. A., Bustamante, C. D., & Nielsen, R. (2007) PLoS Genet. in press.
18. Kimura, R., Fujimoto, A., Tokunaga, K., & Ohashi, J. (2007) PLoS One 2, e286.
19. Slatkin, M. & Rannala, B. (2000) Annu. Rev. Genom. Hum. Genet. 1, 225–249.
20. Ding, Y.-C., Chi, H.-C., Grady, D. L., Morishima, A., Kidd, J. R., Kidd, K. K., Flodman, P., Spence, M. A., Schuck, S., Swanson, J. M., et al. (2002) Proc. Natl. Acad. Sci. U. S. A. 99, 309–314.
21. Wang, E., Ding, Y.-C., Flodman, P., Kidd, J. R., Kidd, K. K., Grady, D. L., Ryder, O. A., Spence, M. A., Swanson, J. M., & Moyzis, R. K. (2004) Am. J. Hum. Genet. 74, 931–944.
22. Kayser, M., Brauer, S., & Stoneking, M. (2003) Mol. Biol. Evol. 20, 893–900.
23. Akey, J. M., Eberle, M. A., Rieder, M. J., Carlson, C. S., Shriver, M. D., Nickerson, D. A., & Kruglyak, L. (2004) PLoS Biol. 2, e286.
24. Wright, S. (1969) The Theory of Gene Frequencies, Evolution and the Genetics of Populations, vol. 2 (University of Chicago Press, Chicago).
25. Gillespie, J. H. (2000) Genetics 155, 909–919.
26. Kim, Y. (2006) Genetics 172, 1967–1978.
27. Betancourt, A. J., Kim, Y., & Orr, H. A. (2004) Genetics 168, 2261–2269.
28. Wang, D., Fan, J., Siao, C., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., et al. (1998) Science 280, 1077–1081.
29. Stephens, J. C., Schneider, J. A., Tanguay, D. A., Choi, J., Acharya, T., Stanley, S. E., Jiang, R., Messer, C. J., Chew, A., Han, J.-H., et al. (2001) Science 293, 489–493.
30. Hellmann, I., Ebersberger, I., Ptak, S. E., Pääbo, S., & Przeworski, M. (2003) Am. J. Hum. Genet. 72, 1527–1535.
31. The Chimpanzee Sequencing and Analysis Consortium (2005) Nature 437, 69–87.
32. Otto, S. P. & Whitlock, M. C. (1997) Genetics 146, 723–733.
33. Coale, A. J. (1974) Sci. Am. 231, 40–52.
34. Weiss, K. (1984) Hum. Biol. 56, 637–649.
35. Stiner, M. C., Munro, N. D., & Surovell, T. A. (2000) Curr. Anthropol. 41, 39–73.
36. Bar-Yosef, O. & Belfer-Cohen, A. (1992) In Transitions to Agriculture in Prehistory, eds. Gebauer, A. B. & Price, T. D. (Prehistory Press, Madison, WI), pp. 21–48.
37. Bellwood, P. (2005) First Farmers: The Origins of Agricultural Societies (Blackwell Publishing, Oxford, UK).
38. Price, T. D., ed. (2000) Europe’s First Farmers (Cambridge University Press, Cambridge, UK).
39. Relethford, J. H. (1999) Evol. Anthropol. 8, 7–10.
40. Gifford-Gonzalez, D. (2000) African Archaeological Review 17, 95–139.
41. Hanotte, O., Bradley, D. G., Ochieng, J. W., Verjee, Y., Hill, E. W., & Rege, J. E. O. (2002) Science 296, 336–339.
42. Diamond, J. & Bellwood, P. (2003) Science 300, 597–603.
43. Frayer, D. W. (1977) Am. J. Phys. Anthropol. 46, 109–120.
44. Larsen, C. S. (1995) Annu. Rev. Anthropol. 24, 185–213.
45. Armelagos, G. J. & Harper, K. N. (2005) Evol. Anthropol. 14, 68–77.
46. McNeill, W. (1976) Plagues and Peoples (Doubleday, Garden City, NY).
47. Crow, J. F. (1966) BioScience 16, 863–867.
48. Riethman, H. C., Xiang, Z., Paul, S., Morse, E., Hu, X.-L., Flint, J., Chi, H.-C., Grady, D. L., & Moyzis, R. K. (2001) Nature 409, 948–951.
49. Kong, A., Gudbjartsson, D. F., Sainz, J., Jonsdottir, G. M., Gudjonsson, S. A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. (2002) Nature Genet. 31, 241–247.
50. Keightley, P. D. & Lynch, M. (2003) Evolution 57, 683–685.
51. Shaw, F. H., Geyer, C. J., & Shaw, R. G. (2002) Evolution 56, 453–463.
52. Elena, S. F., Ekunwe, L., Hajela, N., Oden, S. A., & Lenski, R. E. (1998) Genetica 102/103, 349–358.
53. Imhof, M. & Schlötterer, C. (2001) Proc. Natl. Acad. Sci. U. S. A. 98, 1113–1117.
54. Kassen, R. & Bataillon, T. (2006) Nature Genet. in press.
55. Haldane, J. B. S. (1927) Transactions of the Cambridge Philosophical Society 23, 19–41.
56. Ewens, W. J. (2004) Mathematical Population Genetics (Cambridge University Press, Cambridge, UK).