Recent acceleration of human adaptive evolution

John Hawks ∗, Eric T. Wang †, Gregory M. Cochran ‡, Henry C. Harpending §‡, and Robert K. Moyzis ¶

∗ Department of Anthropology, University of Wisconsin–Madison, Madison, WI 53706, † Advanced Development, Affymetrix, Inc., Santa Clara, CA 95051, ‡ Department of Anthropology, University of Utah, Salt Lake City, UT 84112, and ¶ Department of Biological Chemistry and Institute of Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697

Submitted to Proceedings of the National Academy of Sciences of the United States of America

Genomic surveys in humans identify a large amount of recent positive selection. Using the 3.9M HapMap SNP dataset, we found that selection has accelerated greatly during the last 40,000 years. We tested the null hypothesis that the observed age distribution of recent positively selected linkage blocks is consistent with a constant rate of adaptive substitution during human evolution. We show that a constant rate high enough to explain the number of recently selected variants would predict (1) site heterozygosity at least tenfold lower than is observed in humans, (2) a strong relationship of heterozygosity and local recombination rate, which is not observed in humans, (3) an implausibly high number of adaptive substitutions between humans and chimpanzees, and (4) nearly 100 times the observed number of high-frequency LD blocks. Larger populations generate more new selected mutations, and we show the consistency of the observed data with the historical pattern of human population growth. We consider human demographic growth to be linked with past changes in human cultures and ecologies. Both processes have contributed to the extraordinarily rapid recent genetic evolution of our species.

linkage disequilibrium | positive selection | HapMap | Neolithic

Human populations have vastly increased in numbers during the past 50,000 years or more [1].
Under theory, more people means more new adaptive mutations [2]. Hence, population growth should cause an increase in the rate of adaptive substitutions: an acceleration of new positively selected alleles. In humans, this effect may have been augmented by vast changes in cultures and ecology during the Late Pleistocene and Holocene, creating new opportunities for adaptation. Such acceleration does not require any change in the per-genome rate of adaptive mutations; it is a simple effect of changing demography, possibly increased by changing ecology. The best analogy may be the rapid recent evolution of domesticates such as maize [3, 4]. Human genetic variation appears consistent with a recent acceleration of positive selection. A new advantageous mutation that escapes genetic drift will rapidly increase in frequency, more quickly than recombination can shuffle it with other genetic variants [5]. As a result, selection generates long-range blocks of linkage disequilibrium (LD) across tens or hundreds of kilobases, depending on the age of the selected variant and the local recombination rate. The expected decay of LD with distance surrounding a recently selected allele provides a powerful means of discriminating selection from other demographic causes of extended LD, such as bottlenecks and admixture [3, 6]. Previously, we applied the LD decay (LDD) test to SNP data from Perlegen and the HapMap [7], finding evidence for recent selection on approximately 1800 human genes. We refer to these as ascertained selected variants (ASVs). This number encompasses some 7% of human genes, and is consistent with the proportion found in another survey using a related approach [6]. Because LD decays quickly over time, most ASVs are quite recent [8], in comparison to other approaches that detect selection over longer evolutionary time scales [9, 10]. 
Many human genes are now known to have strongly selected alleles in recent historical times, such as lactase [11, 12], CCR5 [13, 14], and FY [15]. These surveys show that such genes are very common. This observation is surprising: in theory, such strongly selected variants should be rare [2, 16]. The observed distribution seems to reflect an exceptionally rapid rate of adaptive evolution.

But the hypothesis that genomic data show a high recent rate of selection must overcome three principal objections: (1) Some proportion of ASVs might be neutral loci that exhibit high LD because of recent population expansion or population structure; (2) The LDD test might exhibit an ascertainment bias that misses older selection; and (3) A high constant rate of adaptive substitutions might also explain the large number of ASVs. The first two objections may be effectively tested by restricting our comparisons to a set of frequencies and ages for which neutral explanations or ascertainment biases are most unlikely. We test the third objection by considering a constant long-term rate as a null hypothesis, and by deriving the corresponding high rate of adaptive substitutions from the observed data. This required estimates of allele ages and a theoretical derivation of substitution rate, as described below.

Rejecting neutrality. The LDD test is weaker as applied to rare alleles, which may exhibit significant LD because of recent population growth. However, as in [3], we have entirely excluded rare alleles from analysis. By using a very stringent frequency cutoff of 22%, we have included age estimates for only those alleles that provide very strong evidence of selection [3, 7, 14]. Furthermore, the candidate selected genes occur predominantly in genic regions, and preferentially include genes in functional classes that are plausible targets for recent adaptive changes. No neutral explanation, including population structure, can account for these features; only selection can.
Abbreviations: ASV, ascertained selected variant; FRC, fraction of recombinant chromosomes; LD, linkage disequilibrium; SNP, single nucleotide polymorphism.

E.T.W. and R.K.M. invented analytical tests for selection and performed analyses of empirical data; J.H. and G.M.C. formulated demographic hypotheses; J.H., G.M.C. and H.C.H. performed simulations and analysis of demographic models; G.M.C. and H.C.H. organized meetings and correspondence; J.H. prepared the figures; J.H. and R.K.M. wrote the paper. The first three authors contributed equally to this work.

§ To whom correspondence should be addressed. E-mail: [email protected]

© 2007 by The National Academy of Sciences of the USA

Finding old alleles. The original Perlegen and HapMap datasets were relatively small (1.6M and 1.0M SNPs, respectively). The low SNP density limited the power of LD methods to detect older selection events, particularly in high-recombination areas of the genome [3]. Therefore, we have now recomputed the LDD test on the newly released 3.9 million HapMap genotype dataset [7]. By varying the LDD test search parameters, we can now statistically detect alleles with more rapid LD decay (and hence older inferred ages) [3]. For all parameters used, the detection threshold was set at an ALnLH greater than 2.6 SD (≥ 99.5th percentile) from the genome average. Again, this LDD threshold is a stringent cutoff for the detection of genomic outliers, because the large number of selective events is itself included in the genome average [3].
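As a rough check on the threshold quoted above, the sketch below converts a one-sided cutoff of 2.6 SD into a percentile. It assumes, purely for illustration, that ALnLH scores are approximately normally distributed across the genome; the text does not state this, and the function name `normal_percentile` is hypothetical.

```python
import math

def normal_percentile(z):
    """Percentile (0-100) of a standard normal distribution at z SDs above the mean.

    Assumption for illustration only: scores are treated as normally distributed.
    """
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    # One-sided cutoff used for the LDD test in the text.
    print(f"2.6 SD corresponds to the {normal_percentile(2.6):.2f}th percentile")
```

Under that assumption, a 2.6 SD cutoff falls just above the 99.5th percentile, in line with the figure given in the text.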
The probabilistic LDD test does not require the calculation of inferred haplotypes [3], so it is not a daunting computational task to calculate ALnLH values for the 3.9M HapMap SNPs genotyped in 270 individuals: 90 of European ancestry (CEU), 90 of African (Yoruba) ancestry (YRI), 45 Han Chinese (CHB), and 45 Japanese (JPT). This new analysis uncovered only 12 new SNPs (in 6 clusters) not originally detected in the CEU population [3] and 466 new SNPs representing 206 independent clusters in the YRI population. A total of 2803 (CEU), 2367 (CHB), 2783 (JPT), and 3486 (YRI) selection events were found. As noted previously [3], many inferred selected sites have faster LD decay in YRI samples (with older coalescence times), resulting in lower background LD and more previously unobserved variants. The denser HapMap dataset provided better resolution of LD decay (i.e., rapid decay can be reliably distinguished from background LD only with high SNP density). The 3.9M HapMap dataset yielded more ASVs, but only an incremental increase in the CEU and an approximately 7% increase in the YRI values. This indicates that most events (defined by the LDD test) coalescing to ages up to 80,000 years ago have been detected, and any ascertainment bias against older selection is very slight within the given frequency range.

Ancient selected alleles are also more likely to be near or at fixation than recent alleles. Just as we excluded rare alleles, we also excluded high-frequency alleles (i.e., > 78%) in our age distribution. But the number of such high-frequency alleles provides another test of the hypothesis that the LDD test has missed older events. We modified the LDD test to find these high-frequency “near-fixed” alleles, and found only 50 candidates. Other studies have likewise found few near-fixed alleles [17, 18]. These studies also show that very few ASVs are shared between HapMap samples; most are population-specific [3, 6].
In our data, only 509 clusters are shared between CEU and YRI samples; many of these are likely to have been under balancing selection (Supplementary material). The small number of near-fixed events and the small number of shared events are strong evidence that the LDD test has not missed a large number of ancient selected alleles.

Allele ages. We used a modification of previously described methods [19, 20, 21] to estimate an allele age (coalescence time) for each selected cluster. We focused on the HapMap populations with the largest sample sizes, which were the African ancestry (YRI) and European ancestry (CEU) samples. Similar results were obtained for the Chinese (CHB) and Japanese (JPT) populations (data not shown). Fig. 1 presents histograms of these age estimates. The YRI sample shows a modal (peak) age of approximately 8,000 years ago, assuming 25-year generations; the CEU sample shows a peak age of approximately 5,250 years ago, both values consistent with earlier work [3, 6]. The difference in peak age likely explains why weaker tests have found stronger evidence of selection in European ancestry samples [22, 23], unlike the current study.

Predictions of constant rate. We can derive four predictions from the rate of adaptive substitution, each of which refutes the null hypothesis of constant rate:
1. The null hypothesis predicts that the average nucleotide diversity across the genome should be vastly lower than observed. Recurrent selected substitutions greatly reduce the diversity of linked neutral alleles by hitchhiking or pseudohitchhiking [25, 26]. Using an approximation for site heterozygosity under pseudohitchhiking [25, 27] we estimated the expected site heterozygosity under the null hypothesis as 3.5 × 10⁻⁵ (Supplementary material). This value is less than one tenth the observed site heterozygosity, which is between 4.0 × 10⁻⁴ and 6.0 × 10⁻⁴ in human populations [7, 28, 29].
2.
Hitchhiking is more important in regions of low recombination, so the null hypothesis predicts a strong relationship between nucleotide diversity and local recombination rate. The null hypothesis predicts a tenfold increase in diversity across the range of local recombination rates represented by human gene regions. Empirically, diversity is slightly correlated with local recombination rate, but the relationship is weak, and may be partly explained by mutation rate [7, 30].
3. The annual rate of 0.53 adaptive substitutions consistent with the YRI data predicts an implausible 6.4 million adaptive substitutions between humans and chimpanzees. In contrast, there are only around 40,000 amino acid substitutions separating these species, and only around 18 million total substitutions [31]. This amount of selection, amounting to more than 1/3 of all substitutions, or 100 times the observed number of amino acid substitutions, is implausible.
4. The null hypothesis predicts that many selected alleles should be found between 78% and 100% frequency. Positively selected alleles follow a logistic growth curve, which proceeds very rapidly through intermediate frequencies. Because selected alleles spend relatively little time in the ascertainment range, the ascertained blocks should be the “tip of the iceberg” of a larger number of recently selected blocks at or near fixation. For example, the ASVs in the YRI dataset have a modal age of ≈ 8,000 years ago. Based on the diffusion model for selection on an additive gene, ascertained variants should only account for 18% of the total number of selected variants still segregating. In contrast, 41% of segregating variants should be above 78%. Dominant alleles (which have a higher fixation probability) progress even more slowly above 78%, so that additivity is the more conservative assumption. Empirically, few such near-fixed variants with high LD scores have been found in the human genome [7].
Modifying the LDD algorithm to specifically search for high-frequency “fixed” alleles found only 50 potential sites, in contrast to the greater than 5000 predicted by the constant rate model. While it is possible that the rapid LD decay expected for older selected alleles near fixation may not be detected as efficiently by the LDD test, two other surveys have also found small numbers of such events [17, 18]. This difference of two orders of magnitude is a strong refutation of the null hypothesis.

Rate estimation. Using the diffusion model of positive selection [24], we estimated the adaptive substitution rate consistent with the observed age distribution of ASVs. For the YRI data, this estimate is 0.53 substitutions per year. For the CEU data, this estimate is 0.59 substitutions per year. The average fitness advantage of new variants (assuming dominant effects) is estimated as 0.022 for the Yoruba age distribution, and 0.034 for the European distribution. Curves obtained using these estimated values fit the observed data well (Fig. 1). The higher estimated rate for Europeans emerges from the more recent modal age of variants. For further analyses, we used the lower rate estimated from the YRI sample as a conservative value.

Population growth. The rate of adaptive evolution in human populations has indeed accelerated within the past 80,000 years. The results above demonstrate the extent of acceleration: the recent rate must be 1–2 orders of magnitude higher than the long-term rate to explain the genome-wide pattern. Population growth itself predicts an acceleration effect, because the number of new mutations increases as a linear product of the number of individuals [2], and exponential growth increases the fixation probability of new adaptive mutations [32]. We considered the hypothesis that the magnitude of human population growth might explain a large fraction of the recent acceleration of new adaptive alleles.
To test this hypothesis, we constructed a simplified model of historic and prehistoric population growth, based on historical and archaeological estimates of population size [1, 33, 34]. Population growth in the Upper Paleolithic and late Middle Stone Age (MSA) began by 50,000 years ago. Several archaeological indicators show long-term increases in population density, including more small-game exploitation, greater pressure on easily collected prey species like tortoises and shellfish, more intense hunting of dangerous prey species, and occupation of previously uninhabited islands and circumarctic regions [35]. Demographic growth intensified during the Holocene, as domestication centers in the Near East, Egypt and China underwent expansions commencing by 10,000–8000 years ago [36, 37]. From these centers, population growth spread into Europe, North Africa, South and Southeast Asia, and Australasia during the succeeding 6000 years [37, 38].

Subsaharan Africa bears special consideration, because of its initial large population size and influence on earlier human dispersals [39]. Despite the possible early appearance of annual cereal collection and cattle husbandry in North Africa, Subsaharan Africa has no archaeological evidence for agriculture before 4000 years ago [37]. West Asian agricultural plants like wheat did poorly in tropical sun and rainfall regimes, while animals faced a series of diseases that posed barriers to entry [40]. As a consequence, some 2500 years ago the population of Subsaharan Africa was likely fewer than 7 million people, compared to European, West Asian, East Asian, and South Asian populations approaching or in excess of 30 million each [1]. At this time, the Subsaharan population grew at a high rate, with the dispersal of Bantu populations from West Africa and the spread of pastoralism and agriculture southward through East Africa [41, 42].
Our model based on archaeological and historical evidence includes large long-term African population size, gradual Late Pleistocene population growth, an early Neolithic transition in West Asia and Europe, and a later rise in the rate of growth in Subsaharan Africa coincident with agricultural dispersal (Fig. 2). As shown in Fig. 3, the demographic model predicts the recent peak ages of the African and European distributions of selected variants, at a much lower average selection intensity than the constant population size model. In particular, the demographic model readily explains the difference in age distributions between YRI and CEU samples: the YRI sample has more variants dating to earlier times when African populations were large compared to West Asia and Europe, while earlier Neolithic growth in West Asia and Europe led to a pulse of recent variants in those regions. The data that falsify the constant rate model, such as the observed genome-wide heterozygosity value and the probable number of human-chimpanzee adaptive substitutions, are fully consistent with the demographic model.

Discussion

Our simple demographic model explains much of the recent pattern, but some aspects remain unexplained. Although the small number of high-frequency variants (between 78% and 100%) is much more consistent with the demographic model than with a constant rate of change, it is still relatively low even considering the rapid acceleration predicted by demography. Demographic change may be the major driver of new adaptive evolution, but the detailed pattern must involve gene functions and gene-environment interactions.

Cultural and ecological changes in human populations may explain many details of the pattern. Human migrations into Eurasia created new selective pressures on features such as skin pigmentation, adaptation to cold, and diet [20, 21, 23]. Over this time span, humans both inside and outside Africa underwent rapid skeletal evolution [43, 44].
Some of the most radical new selective pressures have been associated with the transition to agriculture [45]. For example, genes related to disease resistance are among the inferred functional classes most likely to show evidence of recent positive selection [3]. Virulent epidemic diseases, including smallpox, malaria, yellow fever, typhus and cholera, became important causes of mortality after the origin and spread of agriculture [46]. Likewise, subsistence and dietary changes have led to selection on genes such as lactase [12].

It is sometimes claimed that the pace of human evolution should have slowed as cultural adaptation supplanted genetic adaptation. The high empirical number of recent adaptive variants would seem sufficient to refute this claim [3, 6]. It is important to note that the peak ages of new selected variants in our data do not reflect the highest intensity of selection, but merely our ability to detect selection. Due to the recent acceleration, many more new adaptive mutations should exist than have yet been ascertained, occurring at a faster and faster rate during historic times. Adaptive alleles with frequencies under 22% should then greatly outnumber those at higher frequencies. To the extent that new adaptive alleles continued to reflect demographic growth, the Neolithic and later periods would have experienced a rate of adaptive evolution more than 100 times higher than the rate that characterized most of human evolution. Cultural changes have reduced mortality rates, but variance in reproduction has continued to fuel genetic change [47]. In our view, the rapid cultural evolution during the Late Pleistocene created vastly more opportunities for further genetic change, not fewer, as new avenues emerged for communication, social interactions, and creativity.

Materials and Methods

The 3.9M HapMap release was obtained from the International HapMap Project website (http://www.hapmap.org).
The Linkage Disequilibrium Decay (LDD) test [3] was applied to all four HapMap population datasets. Briefly, by examining individuals homozygous for a given SNP, the fraction of inferred recombinant chromosomes (FRC) at adjacent polymorphisms can be directly computed without the need to infer haplotype, a computationally daunting task on such large datasets. The test uses the expected increase with distance in FRC surrounding a selected allele to identify such alleles. Importantly, the method is insensitive to local recombination rate, because local rate will influence the extent of LD surrounding all alleles, while the method looks for LD differences between alleles. By using a large sliding window (ranging from 0.25 to 1.0 Mb in the current study), and by explicitly acknowledging the expected LD structure of selected alleles, the LDD test can distinguish selection from other population genetic/demographic mechanisms resulting in large LD blocks [3]. It has been suggested that assessments of LD decay may be more appropriate to rank candidate regions for selection, instead of to identify regions definitely under selection. This reflects two related concerns: false negative results (missing true selected events) and failing to identify selection on standing variants. Selection on standing variants may have been important in recent human evolution, but it does not respond closely to demographic changes. We have minimized the proportion of false negatives by limiting the frequency range for ascertainment to those with maximum power, as well as by using the denser 3.9 HapMap release. Missing events should bias the age distribution of ASVs to be older than the true distribution; our observations reflect the surprisingly young age distribution, so our analysis of ASVs is conservative. A modification of the LDD test was conducted on the CEU and YRI datasets, to find selected alleles near fixation. 
Unlike the normal LDD test, all SNPs greater than 78% frequency (the cutoff used for primary analysis of this data) were queried, using the same sliding windows as the normal test. Unlike the standard test, however, the requirement that the alternative allele be no more than 1 SD from the genome average was not implemented [3]. Ninety-three clusters were identified in the CEU population and 85 in the YRI population (with 65 overlaps), a total of 113 “fixed” events. Unlike normal LDD screens [3], half of these observed fixed events determined by long range LD were in extreme centromeric or telomeric regions, which have no recombination or high recombination, respectively [7, 48]. The interpretation of extended LD in these regions is ambiguous, therefore, since low recombination maintains large LD blocks (centromeres) and well-documented high telomere-telomere exchange homogenizes these regions [48]. Removing these centromeric and telomeric regions in which LD is likely to be the result of mechanisms different than selection yields approximately 50 regions of potential fixation.

Clustering. The LD Decay (LDD) test produces “clusters” of SNPs with the signature of selection, due to the extensive LD surrounding these alleles [3]. Each cluster is likely to represent a single selection event, and hence we have attempted to minimize potential over-counting by cluster analysis. Using a simple nearest-neighbor technique, we assign a 10 kb radius to each selected SNP. Each pass through the data produces a new set of centroids, and cluster membership is reassigned to the nearest centroid. A SNP that lies more than 20 kb away from the nearest centroid is considered a new cluster, with it being the sole member. Using larger window sizes (up to 100 kb) reduces the number of independent clusters (by approximately half), however at the cost of “fusing” likely independent events (data not shown).
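The centroid rule described above can be sketched as a single pass over sorted positions. This is a deliberately simplified variant and not the authors' implementation: it applies only the 20 kb centroid threshold, handles one chromosome at a time, and omits the repeated reassignment passes. The function name `cluster_snps` and the example coordinates are hypothetical.

```python
def cluster_snps(positions, max_dist=20_000):
    """Greedy one-pass clustering of SNP positions (in bp) on one chromosome.

    A SNP joins the most recent cluster if it lies within max_dist of that
    cluster's centroid (mean position); otherwise it starts a new cluster.
    Because positions are processed in sorted order, the most recent cluster
    always has the nearest centroid.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters:
            current = clusters[-1]
            centroid = sum(current) / len(current)
            if abs(pos - centroid) <= max_dist:
                current.append(pos)
                continue
        clusters.append([pos])  # sole member of a new cluster
    return clusters

if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    example = [100_000, 105_000, 112_000, 150_000, 400_000]
    for members in cluster_snps(example):
        print(members)
```

A fuller version would iterate until cluster membership stops changing, as the text describes; the single pass is kept here only to show how the distance threshold separates events.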
We believe the 10 kb window, therefore, is a conservative first-pass clustering of the observed selection events. Each selected SNP identified via the LDD test was sorted and mapped to its physical location on human chromosomes (UCSC Human Genome 17). We iterate through the SNP list, starting with the most distal, and a SNP and its closest neighbor (within 10 kb radius) are clustered together with a new centroid (average) i computed. To be included as part of the ith cluster, the next SNP on the sorted SNP list must fall within 20 kb of the ith cluster. If it is within 20 kb of both an upstream and downstream cluster, to be integrated in the ith cluster it must have a distance to the ith centroid closer than the next closest centroid (i + 1). Otherwise, a new centroid and cluster is initiated. This task is repeated for all SNPs identified by the LDD test. Allele age calculations. Coalescence times (commonly referred to as allele ages) were calculated by methods described previously [19, 20, 21]. Briefly, information contained in neighboring SNPs and the local recombination frequency is used to infer age. The genotyped population is binned (at the SNP under inferred selection, the target SNP) into the major and minor alleles [3]. While every neighboring SNP gives information on the age of the target SNP, a single recombination event carries all the downstream neighbors to an equal or higher fraction of recombinant chromosomes (FRC). Hence, our algorithm moves away (positively and negatively) from the target SNP, and computes allele age only when a higher FRC level is reached in a neighboring SNP. A single neighboring SNP with no neighbors within 20 kb is not used for computation. This method is consistent with the theoretical and experimental expectations of LDD surrounding selected alleles [3]. 
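The outward walk from the target SNP can be illustrated with a short sketch. It assumes the neighbors on one side of the target are already ordered by increasing distance and paired with their FRC values; the function name `informative_neighbors` and the sample numbers are hypothetical, and the 20 kb isolation rule mentioned above is not implemented.

```python
def informative_neighbors(neighbors):
    """Keep only neighbors at which FRC reaches a new high.

    neighbors: list of (distance_bp, frc) tuples on one side of the target
    SNP, ordered by increasing distance. A neighbor whose FRC does not exceed
    the highest value seen so far reflects no new recombination event and is
    skipped.
    """
    selected = []
    highest = 0.0
    for distance, frc in neighbors:
        if frc > highest:
            selected.append((distance, frc))
            highest = frc
    return selected

if __name__ == "__main__":
    # Hypothetical FRC profile moving away from a target SNP.
    side = [(5_000, 0.0), (12_000, 0.05), (20_000, 0.05), (31_000, 0.12), (40_000, 0.10)]
    print(informative_neighbors(side))
```

The same function would be applied separately to the upstream and downstream sides, with an age estimate computed at each retained neighbor.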
For neighboring SNPs, allele age is computed using:

$$t = \frac{1}{\ln(1 - c)} \ln\left(\frac{x_t - y}{1 - y}\right) \qquad [1]$$

where $t$ = allele age (in generations), $c$ = recombination rate (calculated at the distance to the neighboring SNP), $x_t$ = frequency in generation $t$, and $y$ = frequency on ancestral chromosomes. This method is a method-of-moments estimator [19], because the estimate results from equating the observed proportion of non-recombinant chromosomes with the proportion expected if the true value of $t$ is the estimated value. It requires no population genetic or demographic assumptions, only the exponential decay of initially perfect LD because of recombination. Estimates are obtained until FRC reaches 0.3, to avoid allele age calculations of lower reliability. We assume the ancestral allele is always the allele with neutral or genome average LDD ALnLH scores [3]. Average regional recombination rates were obtained by querying data from ref. [49] in the UCSC database. Regions with less than 0.1 cM/Mb average recombination rate were excluded. All allele age estimates are averages of the individual calculations at the target SNP [21].

Estimating the rate of adaptive substitutions. Under the null hypothesis of a constant rate of adaptive substitution, the age distribution of ascertained selected variants can estimate the mean fitness advantage (s̄) of new selected variants. The empirical distribution of fitness effects of adaptive substitutions is not known. On theoretical grounds, this distribution is expected to approximate a negative exponential [16]. Other studies have assumed this distribution or a gamma distribution with similar shape [50, 51, 52], and selected mutations in laboratory organisms appear to fit this theoretical model [53, 54]. In these expressions, s is the selection coefficient favoring a new mutation, and s̄ is the mean selection coefficient among the set of all advantageous mutations.
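For reference, the allele age estimator of Eq. 1 can be evaluated directly. The sketch below is a minimal illustration with hypothetical input values; the function name `allele_age_generations` is an assumption, and the conversion to years uses the 25-year generation time stated in the text.

```python
import math

def allele_age_generations(c, x_t, y):
    """Allele age in generations from Eq. 1.

    c   : recombination rate between the target and neighboring SNP (0 < c < 1)
    x_t : frequency in generation t (must exceed y and be at most 1)
    y   : frequency on ancestral chromosomes (must be below 1)
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    if not (y < x_t <= 1.0 and y < 1.0):
        raise ValueError("require y < x_t <= 1 and y < 1")
    return math.log((x_t - y) / (1.0 - y)) / math.log(1.0 - c)

if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the calculation.
    t = allele_age_generations(c=0.001, x_t=0.8, y=0.2)
    print(f"{t:.1f} generations, about {25 * t:.0f} years at 25 years per generation")
```

Both logarithms are negative whenever recombination has eroded the association, so the estimate is positive, and it increases as x_t falls toward y.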
We assume that adaptive alleles are dominant in effect; this allows the highest fixation probability [55], the most rapid increase in frequencies, and is therefore conservative: less dominance requires a higher substitution rate to explain the observed distribution. The value of s̄ is not known, and we are concerned with finding the single value that creates the best fit of the population size prediction to the observed data. We assumed a negative exponential distribution of s, in which $\Pr[s] = e^{-s/\bar{s}}$. The number of ascertained new adaptive variants originating in any single generation $t$ is given by the equation:

$$n_{t,\mathrm{asc}} = 4 N_t \nu \int_a^b s \, e^{-s/\bar{s}} \, ds \qquad [2]$$

Here, $\nu$ is the rate of adaptive mutations per genome per generation and $N_t$ is the effective population size in generation $t$. This integral derives from the expectation of adaptive mutations in a diploid population (here, $2N\nu$) multiplied by the fixation probability $2s$ for each, again assuming dominant fitness effect. Under the null hypothesis, the population size $N_t$ is constant across all generations, so the expected number of new adaptive mutations (ascertained and nonascertained) is likewise constant. We considered the range of s between value a yielding a current mean frequency of 0.22, and b yielding a current mean frequency of 0.78, as derived from the diffusion approximation for dominant advantageous alleles [56]. The parameter ν is constant in effect across all generations, while the number of ascertained variants originating in each generation varies with the range of s placing new alleles in the ascertainment range. We applied a hill-climbing algorithm to find the best-fit value of s̄ for the empirical distribution of block ages, allowing ν to vary freely. With an estimate for s̄, the rate of adaptive mutations, ν, can be estimated as the value that satisfies equation 2. This is also sufficient to estimate the expected number of substitutions per generation, which is the value of the integral in Eq.
2 over the range 0 to infinity (in our analyses, the vast majority had 0.01 ≤ s ≤ 0.1). For the YRI data, assuming dominant fitness effects, the resulting estimate of adaptive substitution rate is 13.25 per generation, or 0.53 per year.

This work was supported by grants from the U.S. Department of Energy, the National Institute of Mental Health and the National Institute on Aging (to R.K.M.), the Unz Foundation (to G.M.C.), the University of Utah (to H.C.H.), and the Graduate School of the University of Wisconsin (to J.H.). The work benefited from comments and discussions with Alan Fix, Dennis O’Rourke, Kristen Hawkes, Alan Rogers, Chad Huff, Milford Wolpoff, and Balaji Srinivasan.

1. Biraben, J.-N. (2003) Population et Sociétés 394, 1–4.
2. Fisher, R. A. (1930) The Genetical Theory of Natural Selection (Clarendon, Oxford).
3. Wang, E. T., Kodama, G., Baldi, P., & Moyzis, R. K. (2006) Proc. Natl. Acad. Sci. U. S. A. 103, 135–140.
4. Wright, S. I., Bi, I. V., Schroeder, S. G., Yamasaki, M., Doebley, J. F., McMullen, M. D., & Gaut, B. S. (2006) Science 308, 1310–1314.
5. Kim, Y. & Nielsen, R. (2004) Genetics 167, 1513–1524.
6. Voight, B. F., Kudaravalli, S., Wen, X., & Pritchard, J. K. (2006) PLoS Biol. 4, e72.
7. The International HapMap Consortium (2005) Nature 437, 1299–1320.
8. Przeworski, M. (2001) Genetics 160, 1179–1189.
9. Bustamante, C. D., Fledel-Alon, A., Williamson, S., Nielsen, R., Hubisz, M. T., Glanowski, S., Tanenbaum, D. M., White, T. J., Sninsky, J. J., Hernandez, R. D., et al. (2005) Nature 437, 1153–1157.
10. Pollard, K. S., Salama, S. R., King, B., Kern, A. D., Dreszer, T., Katzman, S., Siepel, A., Pedersen, J. S., Bejerano, G., Baertsch, R., et al. (2006) PLoS Genet. 2, e168.
11. Hollox, E. J., Poulter, M., Zvarek, M., Ferak, V., Krause, A., Jenkins, T., Saha, N., Kozlov, A. I., & Swallow, D. M. (2001) Am. J. Hum. Genet. 68, 160–172.
12. Bersaglieri, T., Sabeti, P. C., Patterson, N., Vanderploeg, T., Schaffner, S. F., Drake, J. A., Rhodes, M., Reich, D. E., & Hirschhorn, J. N. (2004) Am. J. Hum. Genet. 74, 1111–1120.
13. Novembre, J., Galvani, A. P., & Slatkin, M. (2005) PLoS Biol. 3, e339.
14. Sabeti, P. C., Walsh, E., Schaffner, S. F., Varilly, P., Fry, B., Hutcheson, H. B., Cullen, M., Mikkelsen, T. S., Roy, J., Patterson, N., et al. (2005) PLoS Biol. 3, e378.
15. Hamblin, M. T., Thompson, E. E., & Di Rienzo, A. (2002) Am. J. Hum. Genet. 70.
16. Orr, H. A. (2003) Genetics 163, 1519–1526.
17. Williamson, S., Hubisz, M. J., Clark, A. G., Payseur, B. A., Bustamante, C. D., & Nielsen, R. (2007) PLoS Genet. in press.
18. Kimura, R., Fujimoto, A., Tokunaga, K., & Ohashi, J. (2007) PLoS One 2, e286.
19. Slatkin, M. & Rannala, B. (2000) Annu. Rev. Genom. Hum. Genet. 1, 225–249.
20. Ding, Y.-C., Chi, H.-C., Grady, D. L., Morishima, A., Kidd, J. R., Kidd, K. K., Flodman, P., Spence, M. A., Schuck, S., Swanson, J. M., et al. (2002) Proc. Natl. Acad. Sci. U. S. A. 99, 309–314.
21. Wang, E., Ding, Y.-C., Flodman, P., Kidd, J. R., Kidd, K. K., Grady, D. L., Ryder, O. A., Spence, M. A., Swanson, J. M., & Moyzis, R. K. (2004) Am. J. Hum. Genet. 74, 931–944.
22. Kayser, M., Brauer, S., & Stoneking, M. (2003) Mol. Biol. Evol. 20, 893–900.
23. Akey, J. M., Eberle, M. A., Rieder, M. J., Carlson, C. S., Shriver, M. D., Nickerson, D. A., & Kruglyak, L. (2004) PLoS Biol. 2, e286.
24. Wright, S. (1969) The Theory of Gene Frequencies, Evolution and the Genetics of Populations, vol. 2 (University of Chicago Press, Chicago).
25. Gillespie, J. H. (2000) Genetics 155, 909–919.
26. Kim, Y. (2006) Genetics 172, 1967–1978.
27. Betancourt, A. J., Kim, Y., & Orr, H. A. (2004) Genetics 168, 2261–2269.
28. Wang, D., Fan, J., Siao, C., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., et al. (1998) Science 280, 1077–1081.
29. Stephens, J. C., Schneider, J. A., Tanguay, D. A., Choi, J., Acharya, T., Stanley, S. E., Jiang, R., Messer, C. J., Chew, A., Han, J.-H., et al. (2001) Science 293, 489–493.
30. Hellmann, I., Ebersberger, I., Ptak, S. E., Pääbo, S., & Przeworski, M. (2003) Am. J. Hum. Genet. 72, 1527–1535.
31. The Chimpanzee Sequencing and Analysis Consortium (2005) Nature 437, 69–87.
32. Otto, S. P. & Whitlock, M. C. (1997) Genetics 146, 723–733.
33. Coale, A. J. (1974) Sci. Am. 231, 40–52.
34. Weiss, K. (1984) Hum. Biol. 56, 637–649.
35. Stiner, M. C., Munro, N. D., & Surovell, T. A. (2000) Curr. Anthropol. 41, 39–73.
36. Bar-Yosef, O. & Belfer-Cohen, A. (1992) In Transitions to Agriculture in Prehistory, eds. Gebauer, A. B. & Price, T. D. (Prehistory Press, Madison, WI), pp. 21–48.
37. Bellwood, P. (2005) First Farmers: The Origins of Agricultural Societies (Blackwell Publishing, Oxford, UK).
38. Price, T. D., ed. (2000) Europe’s First Farmers (Cambridge University Press, Cambridge, UK).
39. Relethford, J. H. (1999) Evol. Anthropol. 8, 7–10.
40. Gifford-Gonzalez, D. (2000) African Archaeological Review 17, 95–139.
41. Hanotte, O., Bradley, D. G., Ochieng, J. W., Verjee, Y., Hill, E. W., & Rege, J. E. O. (2002) Science 296, 336–339.
42. Diamond, J. & Bellwood, P. (2003) Science 300, 597–603.
43. Frayer, D. W. (1977) Am. J. Phys. Anthropol. 46, 109–120.
44. Larsen, C. S. (1995) Annu. Rev. Anthropol. 24, 185–213.
45. Armelagos, G. J. & Harper, K. N. (2005) Evol. Anthropol. 14, 68–77.
46. McNeill, W. (1976) Plagues and Peoples (Doubleday, Garden City, NY).
47. Crow, J. F. (1966) BioScience 16, 863–867.
48. Riethman, H. C., Xiang, Z., Paul, S., Morse, E., Hu, X.-L., Flint, J., Chi, H.-C., Grady, D. L., & Moyzis, R. K. (2001) Nature 409, 948–951.
49. Kong, A., Gudbjartsson, D. F., Sainz, J., Jonsdottir, G. M., Gudjonsson, S. A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. (2002) Nature Genet. 31, 241–247.
50. Keightley, P. D. & Lynch, M. (2003) Evolution 57, 683–685.
51. Shaw, F. H., Geyer, C. J., & Shaw, R. G. (2002) Evolution 56, 453–463.
52. Elena, S. F., Ekunwe, L., Hajela, N., Oden, S. A., & Lenski, R. E. (1998) Genetica 102/103, 349–358.
53. Imhof, M. & Schlötterer, C. (2001) Proc. Natl. Acad. Sci. U. S. A. 98, 1113–1117.
54. Kassen, R. & Bataillon, T. (2006) Nature Genet. in press.
55. Haldane, J. B. S. (1927) Transactions of the Cambridge Philosophical Society 23, 19–41.
56. Ewens, W. J. (2004) Mathematical Population Genetics (Cambridge University Press, Cambridge, UK).
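
As a numerical illustration of Eq. 2 in the methods above, the following minimal Python sketch evaluates the integral of s·exp(−s/s̄) in closed form and cross-checks it against a simple trapezoid rule. It is not the authors' code. The function names and all parameter values (population size, mutation rate, mean selection coefficient, and the bounds a and b) are hypothetical and were chosen only to exercise the functions; the density exp(−s/s̄) is taken as written in the text, without a normalizing constant. Under that reading, the integral over the range 0 to infinity reduces to s̄², so the expected number of substitutions per generation would be 4·N·ν·s̄².

```python
import math


def selection_integral(a, b, s_bar):
    """Closed form of the integral of s * exp(-s / s_bar) from a to b.

    Uses the antiderivative -s_bar * (s + s_bar) * exp(-s / s_bar).
    """
    lower = (a + s_bar) * math.exp(-a / s_bar)
    # The upper term vanishes as b goes to infinity.
    upper = 0.0 if math.isinf(b) else (b + s_bar) * math.exp(-b / s_bar)
    return s_bar * (lower - upper)


def ascertained_variants(n_t, nu, a, b, s_bar):
    """Eq. 2: expected adaptive variants with a <= s <= b arising in one generation."""
    return 4.0 * n_t * nu * selection_integral(a, b, s_bar)


def trapezoid_integral(a, b, s_bar, steps=100_000):
    """Composite trapezoid rule, used only to check the closed form."""
    def integrand(s):
        return s * math.exp(-s / s_bar)

    h = (b - a) / steps
    total = 0.5 * (integrand(a) + integrand(b))
    total += sum(integrand(a + i * h) for i in range(1, steps))
    return total * h


# Hypothetical inputs for illustration only; none are estimates from the paper.
N_T, NU, S_BAR, A, B = 10_000, 0.05, 0.02, 0.01, 0.1

closed = selection_integral(A, B, S_BAR)
numeric = trapezoid_integral(A, B, S_BAR)
n_asc = ascertained_variants(N_T, NU, A, B, S_BAR)
n_all = ascertained_variants(N_T, NU, 0.0, math.inf, S_BAR)

print(f"closed form: {closed:.6e}, trapezoid: {numeric:.6e}")
print(f"ascertained per generation: {n_asc:.4f}, all substitutions: {n_all:.4f}")
```

Because the closed form is inexpensive to evaluate, a fitting routine over s̄ (such as the hill-climbing search described in the text) could call `ascertained_variants` once per generation without numerical quadrature.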