Apparent Trends of Amino Acid Gain and Loss in Protein Evolution Due to Nearly Neutral Variation John H. McDonald Department of Biological Sciences, University of Delaware It has recently been claimed that certain amino acids have been increasing in frequency in all living organisms for most of the history of life on earth, while other amino acids have been decreasing in frequency. Three lines of evidence have been offered for this assertion, but each has a more plausible alternative interpretation. Here I show that unequal patterns of gains and losses for particular pairs of amino acids (such as more leucine / phenylalanine than phenylalanine / leucine substitutions in humans and chimpanzees since they split from a common ancestor) are consistent with a simple neutral model at equilibrium amino acid frequencies. Unequal numbers of gains and losses for particular amino acids (such as more gains than losses of cysteine) are shown by simulations to be consistent with a model of nearly neutral evolution. Unequal numbers of gains and losses for particular amino acids in human polymorphism data are shown by simulations to be explainable by the nearly neutral model as well. In a comparison of protein sequences from four strains of Escherichia coli, polarized by one outgroup strain of Salmonella, the disparity in number of gains and losses for particular amino acids is strong in terminal branches but weaker or nonexistent in internal branches, which is inconsistent with the universal trend model but as expected under the nearly neutral model. Introduction Jordan et al. (2005) compared protein sequences from pairs of closely related bacteria, yeast, and mammals, inferring the ancestral states by comparison with a more distantly related outgroup for each pair of species. They also examined human protein polymorphism data, with the ancestral allele at each polymorphic site inferred from comparison with chimpanzee sequences. Their surprising conclusion was that certain amino acids, such as cysteine, methionine, histidine, serine, and phenylalanine, are more likely to be gained than lost in protein evolution, while other amino acids, such as proline, alanine, glutamic acid, and glycine, are more likely to be lost than gained. Furthermore, they concluded that these trends are universal in all living organisms and have been ongoing for billions of years. They speculated that these trends result because some amino acids, having been added to the genetic code relatively recently, are still increasing toward their equilibrium frequency. Earlier, Zuckerkandl, Derancourt, and Vogel (1971) performed a similar analysis of a much smaller data set (globins and cytochrome c in several vertebrates) and reached similar conclusions. Here I examine the evidence provided by Jordan et al. (2005) and suggest alternate explanations that do not involve universal, directional changes in amino acid composition. Substitutional Asymmetry of Pairs of Amino Acids Jordan et al. (2005) offered three lines of evidence for universal trends of change in amino acid frequencies. The first is that for many pairs of amino acids, there are unequal numbers of substitutions in one direction compared with those in the opposite direction. For example, in the comparison of humans with chimpanzees, using mice (Mus musculus) as the outgroup, there are 102 sites where phenylalanine has been gained and leucine lost, while only 57 Key words: protein evolution, Markov chain, nearly neutral model, Escherichia coli. E-mail: [email protected]. Mol. Biol. Evol. 23(2):240–244. 2006 doi:10.1093/molbev/msj026 Advance Access publication September 29, 2005 Ó The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] sites have the opposite pattern (leucine gained and phenylalanine lost). Jordan et al. (2005) conclude that amino acid frequencies are not stationary but are instead changing in frequency. In a two-allele system at equilibrium, there must be equal number of substitutions in each direction; a significant difference in the number of substitutions would indeed indicate that the allele frequencies were changing. However, there are 20 possible amino acids at any site. For a site with more than two possible alleles, equal numbers of substitutions in each direction would only be expected if the evolutionary process were a time-reversible Markov process with stationary amino acid frequencies (Liò and Goldman 1998). In a reversible process, by definition the number of A / B substitutions per unit time is equal to the number of B / A substitutions, so that the process would look the same whether time was running forward or backward. Reversibility is usually assumed when inferring a matrix of substitution rates from pairwise comparisons of sequences (Dayhoff, Schwartz, and Orcutt 1978; Jones, Taylor, and Thornton 1992; S. Henikoff and J. G. Henikoff 2000; Müller and Vingron 2000; Goldman and Whelan 2002; Veerassamy, Smith, and Tillier 2003; Kosiol and Goldman 2005). However, this assumption is made purely for mathematical convenience; there is no biological evidence for reversibility (Liò and Goldman 1998). With more than two alleles, it is easy to create neutral models of protein evolution that are not reversible. For example, consider a three-allele model in which the substitution rate from B to C (PBC) is 4 3 10ÿ5 substitutions per generation and PAB 5 PAC 5 PBA 5 PCA 5 PCB 5 1 3 10ÿ5. Using this rate matrix in a Markov chain model until the amino acid frequencies become stationary, the frequencies of A, B, and C are fA 5 0.333, fB 5 0.167, and fC 5 0.500. Because there are twice as many A as B amino acids and PAB 5 PBA, there will be twice as many A / B as B / A substitutions per unit time. Clearly, differences in the numbers of forward and reverse substitutions for pairs of amino acids are consistent with a simple neutral model at equilibrium and do not necessarily indicate changing amino acid frequencies. Nearly Neutral Protein Evolution 241 FIG. 1.—The nine possible fates of a mildly deleterious B allele with a lifetime shorter than any branch on the phylogeny. Phylogenies A–D have one gain and one loss of B, phylogeny E has two losses, phylogeny F has one gain and two losses, and phylogenies G–I have one gain and no losses. There are thus eight gains and eight losses in total, although only the two gains in H and I would be inferred from modern sequence data. Unequal Numbers of Gains and Losses of Amino Acids The second line of evidence offered by Jordan et al. (2005) is that certain amino acids are gained more than they are lost, while others are lost more than they are gained. For example, in the comparison of humans with chimpanzees there are 137 sites where cysteine has been substituted for a different ancestral amino acid (‘‘gains’’ of cysteine) and only 72 sites where cysteine has been lost; similar excess gains of cysteine are seen in rodents, yeast, and 12 taxa of bacteria. This could be explained by a trend of increasing frequencies of cysteine, but there are other explanations that do not involve changing amino acid composition. One problem with using parsimony to infer gains and losses of amino acids is that the inferences are sometimes incorrect; when an outgroup and one ingroup have amino acid B and the other ingroup has A, there may have been two A / B substitutions, not one B / A. This is particularly likely if A is more common than B; there will be more inferred common / rare than rare / common substitutions, even when the actual number of substitutions in each direction is equal (Collins, Wimberger, and Naylor 1994; Perna and Kocher 1995). Jordan et al. (2005) attempted to avoid this problem by using closely related species, but there can be substantially more inferred common / rare substitutions despite fairly small amounts of divergence (Eyre-Walker 1998). Sites which fit the nearly neutral model of protein evolution (Ohta 1992) may be particularly likely to show more inferred substitutions in one direction than the other. Consider a site at which amino acid A is favored by natural selection. When a mutation to amino acid B occurs, it is selected against. If the selection against B is weak enough, B may remain present for a while before being replaced by A again. Under this model, there is a chance that when comparing sequences from an outgroup and two ingroup species, one of the ingroups will have the mildly deleterious B, while the other two species have the preferred A. This would be interpreted as a gain of B. On the other hand, an apparent loss of B, in which the outgroup and one of the ingroup species would both have the deleterious B while only one of the ingroup species has the preferred A, would be quite unlikely; it would require either two independent substitutions of the deleterious B, one in the outgroup and FIG. 2.—Initial (1) and final (3) frequencies of the mildly deleterious allele B after 3,000 generations of simulated evolution. one in an ingroup, or a deleterious B that survived in two lineages since the common ancestor of all three species. Under this nearly neutral model, there could be many more apparent gains of B than losses, even if the ancestral state at sites that differ among the taxa is always inferred correctly. This may seem paradoxical, but the ‘‘missing’’ losses of B would occur at sites where a common ancestor of two species happened to have the mildly deleterious B, which was replaced by the favored A in both daughter lineages (fig. 1). To illustrate how nearly neutral evolution could produce a difference in the number of gains and losses, I wrote a computer program to simulate mutation, drift, and selection for a two-allele locus, with allele A being preferred and allele B being deleterious. The population size was 25 diploid individuals, and the fitnesses of genotypes AA, AB, and BB were 1, 1 ÿ s, and 1 ÿ 2s, respectively. The A / B and B / A mutation rates, lAB and lBA, were 0.0001 per generation. For each replicate, a single population was started with the initial state either fixed for A or fixed for B. The initial probability of being fixed for B, PB, was determined by setting the number of A / B substitutions equal to the number of B / A substitutions, PAlABuB 5 PBlBAuA, solving for PB, PB 5 uB/(uA 1 uB), and then using equation (10) of Kimura (1962) for uA and uB, the probabilities of fixation of A and B. Evolution in this initial population was first simulated for 1,000 generations to produce equilibrium levels of polymorphism. The population was then duplicated into an outgroup and an ingroup lineage, and evolution was simulated for 1,000 more generations. The ingroup was then duplicated, and evolution was simulated in the outgroup and the two ingroups for 1,000 more generations. One allele was then sampled at random from each of the three species. If the two ingroup alleles were different, the allele that was in the outgroup was counted as lost in one of the ingroups, while the allele that was not in the outgroup was counted as gained. The simulations were replicated 10,000 times for each selection coefficient. When 2Ns 5 0 (the neutral model), the equilibrium frequency of B is 0.5; as the selection coefficient against B increases, the equilibrium frequency of B declines (fig. 2). The final frequency is the same as the initial frequency, indicating that there is no trend of changing allele frequencies in 242 McDonald FIG. 3.—Number of replicate simulations of a three-species model with inferred gains (gray bars) or losses (black bars) of a mildly deleterious allele in one of the two ingroup species. FIG. 4.—Number of replicate simulations of a two-species model with inferred gains (gray bars) or losses (black bars) of a mildly deleterious allele as a polymorphism in the ingroup species. these simulations. The number of gains of B initially increases and then declines as the selection coefficient against B increases (fig. 3). The increase is presumably due to the increasing frequency of A (and thus increasing frequency of sites where B can be gained). The number of losses of B declines more rapidly; as selection against B intensifies and the average frequency of B decreases, it becomes unlikely that both the outgroup and one of the ingroups would have B at the end of the simulated generations (the pattern that is interpreted as a loss of B). As a result, nearly neutral evolution produces many more apparent gains than losses of a mildly deleterious allele. Although this simple two-allele model could be made more elaborate and realistic, the results seem sufficient to demonstrate that unequal numbers of gains and losses do not necessarily indicate changing allele frequencies. ses in polymorphism data (fig. 4). The results demonstrate that the unequal numbers of gains and losses in human polymorphism data found by Jordan et al. (2005) do not necessarily indicate changing allele frequencies but may merely add to the evidence that many protein polymorphisms in humans are mildly deleterious, an interpretation that is amply supported by evidence from mitochondrial (Nachman et al. 1996; Hasegawa, Cao, and Yang 1998; Moilanen and Majamaa 2003; Elson, Turnbull, and Howell 2004; Ho et al. 2005) and nuclear genes (Cargill et al. 1999; Fay, Wyckoff, and Wu 2001; Sunyaev et al. 2001, 2003; Hughes et al. 2003; Williamson et al. 2005). Nearly Neutral Human Polymorphisms The third line of evidence for a universal trend offered by Jordan et al. (2005) is that for human protein polymorphisms (with chimpanzee as the outgroup), the ratio of gains to losses for most amino acids is significantly different from one. However, the nearly neutral model could also explain this result. Consider a site where A is favored by selection and B is mildly deleterious. If the selection against B is weak, it is possible for B to appear as a polymorphism for a while. There will thus be some polymorphisms where A is the ancestral allele and B has been gained. But because B is mildly deleterious, it is unlikely to occur in both the outgroup and as a polymorphism in the ingroup. Therefore, there will be fewer sites where B is the ancestral allele and A is a derived polymorphism. To test whether nearly neutral evolution could produce more gains than losses for polymorphism data, I simulated evolution using the same model described above. The original population evolved for 1,000 generations, then it was duplicated and the outgroup and ingroup species evolved for 1,000 more generations. If the ingroup was polymorphic, one allele was sampled at random from the outgroup to infer which allele was gained and which was lost. The simulations were replicated 10,000 times for each selection coefficient. The simulations show that for a broad range of selection coefficients, nearly neutral evolution results in a large number of differences between the number of gains and los- Nearly Neutral Polymorphisms in Escherichia coli One way to distinguish between the universal trend model and the nearly neutral model is to examine the pattern of gains and losses on a phylogeny with more than three taxa. If the long-term directional trend postulated by Jordan et al. (2005) is correct, the ratio of gains to losses should be similar in all parts of a phylogeny. Under the nearly neutral model, however, the apparent excess gains will be confined mostly to the tips of a phylogeny, as mildly deleterious substitutions have a relatively short ‘‘lifetime’’ and are unlikely to survive long enough to appear in more than one taxon. To test this, I used the program Blastall (Altschul et al. 1997) to identify matching protein sequences in one outgroup, Salmonella enterica Typhi CT18 (Parkhill et al. 2001) and four completely sequenced strains of Escherichia coli: E. coli K12 (Blattner et al. 1997), E. coli O157:H7 (Perna et al. 2001), E. coli CFT073 (Welch et al. 2002), and ‘‘Shigella flexneri’’ 2a301 (Jin et al. 2002). Sequences with the matching portion reported by Blastall shorter than 50 amino acids, sets of sequences for which not all E. coli strains had a match in S. enterica, and sets of sequences in which the proportion of pairwise differences between the S. enterica and E. coli sequences varied significantly (2 3 4 G-test, P , 0.05) were deleted. The protein sequences were then aligned using ClustalW (Thompson, Higgins, and Gibson 1994). Ambiguously aligned sites adjacent to gaps were omitted, with the omitted sites extending from the gap to the nearest pair of adjacent sites that were both identical in all five sequences. Different genes had different estimated phylogenies, presumably because of recombination (Guttman and Dykhuizen 1994; McGraw et al. 1999), so no attempt was made to estimate an evolutionary Nearly Neutral Protein Evolution 243 Table 1 Gains and Losses of Amino Acids in Escherichia coli Unique Gains Losses Cys Ile Asn Ser His Met Tyr Phe Thr Lys Val Leu Glu Asp Gln Arg Gly Trp Ala Pro 177 534 326 638 272 194 114 144 504 202 551 360 262 268 181 230 184 13 347 90 31 203 137 285 132 103 65 95 376 162 510 349 326 395 285 398 365 31 1033 310 Shared D 0.70* 0.45* 0.41* 0.38* 0.35* 0.31* 0.27* 0.21* 0.15* 0.11* 0.04 0.02 ÿ0.11* ÿ0.19* ÿ0.22* ÿ0.27* ÿ0.33* ÿ0.41* ÿ0.50* ÿ0.55* Gains Losses 16 184 109 179 64 65 30 35 169 64 220 96 148 155 84 65 56 4 210 26 11 175 81 159 41 27 30 31 159 89 191 128 157 149 83 88 54 3 269 54 Jordan et al. (2005) D 0.19 0.03 0.15* 0.06 0.22* 0.41* 0 0.06 0.03 ÿ0.16* 0.07 ÿ0.14* ÿ0.03 0.02 0.01 ÿ0.15* 0.02 0.14 ÿ0.12* ÿ0.35* Gains Losses 143 638 491 738 284 238 126 151 638 286 686 436 311 391 360 298 239 19 531 138 57 364 262 549 194 155 80 122 564 257 643 394 367 493 476 374 334 34 1129 294 D 0.43* 0.27* 0.30* 0.15* 0.19* 0.21* 0.22* 0.11* 0.06* 0.05 0.03 0.05 ÿ0.08* ÿ0.12* ÿ0.14* ÿ0.11* ÿ0.17* ÿ0.28* ÿ0.36* ÿ0.36* NOTE.—For each amino acid, the numbers of gains and losses are shown for unique substitutions (those seen in only one strain of E. coli) and shared substitutions (those seen in two or three strains of E. coli). Data on E. coli from Jordan et al. (2005) are shown for comparison. D, the number of gains minus losses divided by the total number of substitutions. * Values of D that are significantly different from 0 (exact binomial test, P , 0.05). tree and assign substitutions to particular branches of the tree. Instead, gains and losses were simply divided into two classes, ‘‘unique’’ (the derived amino acid present in only one strain of E. coli) or ‘‘shared’’ (the derived amino acid present in more than one strain). The data of Jordan et al. (2005) and the unique substitutions show similar patterns; all differences between gains and losses are in the same direction, and 19 of the 20 amino acids show a larger difference for unique substitutions than for the data of Jordan et al. (2005) (table 1). However, there is a marked difference between unique substitutions and shared substitutions in the patterns of gain and loss. For 18 amino acids, the shared substitutions have a normalized difference between gains and losses that is either smaller than or in the opposite direction to those seen at unique sites. Only 8 amino acids have a significant difference between gains and losses for shared substitutions, while 19 amino acids have a significant difference for unique substitutions. The stronger bias seen for unique gains and losses is inconsistent with the universal trend postulated by Jordan et al. (2005), which would predict the same amount of bias on all branches of a phylogeny, but it is expected under the nearly neutral model. This adds to the evidence that many protein polymorphisms in E. coli are mildly deleterious (Sawyer, Dykhuizen, and Hartl 1987; Hughes 2005). The bias for shared gains and losses seen for some amino acids may result from universal trends in amino acid composition, but they may also represent mildly deleterious alleles that are so nearly neutral that they have survived in more than one strain. The bias in shared gains and losses may also result from directional changes in amino acid composition due to positive selection, as has been re- peatedly shown for related species of prokaryotes living in different environments (Haney et al. 1999; McDonald, Grasso, and Rejto 1999; McDonald 2001; Nishio et al. 2003; Di Giulio 2005; Methe et al. 2005). Acknowledgments I thank H. Akashi, B. C. Verrelli, and B. J. Wolpert for helpful comments on the manuscript. Literature Cited Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402. Blattner, F. R., G. Plunkett, C. A. Bloch et al. (16 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474. Cargill, M., D. Altshuler, J. Ireland et al. (17 co-authors). 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231–238. Collins, T. M., P. H. Wimberger, and G. J. P. Naylor. 1994. Compositional bias, character-state bias, and character-state reconstruction using parsimony. Syst. Biol. 47:482–496. Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345–352 in M. O. Dayhoff, ed. Atlas of protein sequence and structure. Volume 5, Supplement 2. National Biomedical Research Foundation, Washington, D.C. Di Giulio, M. 2005. A comparison of proteins from Pyrococcus furiosus and Pyrococcus abyssi: barophily in the physicochemical properties of amino acids and in the genetic code. Gene 346:1–6. Elson, J. L., D. M. Turnbull, and N. Howell. 2004. Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection. Am. J. Hum. Genet. 74: 229–238. Eyre-Walker, A. 1998. Problems with parsimony in sequences of biased base composition. J. Mol. Evol. 47:686–690. Fay, J. C., G. J. Wyckoff, and C.-I. Wu. 2001. Positive and negative selection on the human genome. Genetics 158: 1227–1234. Goldman, N., and S. Whelan. 2002. A novel use of equilibrium frequencies in models of sequence evolution. Mol. Biol. Evol. 19:1821–1831. Guttman, D. S., and D. E. Dykhuizen. 1994. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266:1380–1383. Haney, P. J., J. H. Badger, G. L. Buldak, C. I. Reich, C. R. Woese, and G. J. Olsen. 1999. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl. Acad. Sci. USA 96:3578–3583. Hasegawa, M., Y. Cao, and Z. H. Yang. 1998. Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol. Biol. Evol. 15:1499–1505. Henikoff, S., and J. G. Henikoff. 2000. Amino acid substitution matrices. Adv. Protein Chem. 54:73–97. Ho, S. Y. W., M. J. Phillips, A. Cooper, and A. J. Drummond. 2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22:1561–1568. Hughes, A. L. 2005. Evidence for abundant slightly deleterious polymorphisms in bacterial populations. Genetics 169:533–538. 244 McDonald Hughes, A. L., B. Packer, R. Welch, A. W. Bergen, S. J. Chanock, and M. Yeager. 2003. Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc. Natl. Acad. Sci. USA 100:15754–15757. Jin, Q., Z. H. Yuan, J. G. Xu et al. (32 co-authors). 2002. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 30:4432–4441. Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282. Jordan, I. K., F. A. Kondrashov, I. A. Adzhubei, Y. I. Wolf, E. V. Koonin, A. S. Kondrashov, and S. Sunyaev. 2005. A universal trend of amino acid gain and loss in protein evolution. Nature 433:633–638. Kimura, M. 1962. On the probability of fixation of mutant genes in a population. Genetics 47:713–719. Kosiol, C., and N. Goldman. 2005. Different versions of the Dayhoff rate matrix. Mol. Biol. Evol. 22:193–199. Liò, P., and N. Goldman. 1998. Models of molecular evolution and phylogeny. Genome Res. 8:1233–1244. McDonald, J. H. 2001. Patterns of temperature adaptation in proteins from the bacteria Deinococcus radiodurans and Thermus thermophilus. Mol. Biol. Evol. 18:741–749. McDonald, J. H., A. M. Grasso, and L. K. Rejto. 1999. Patterns of temperature adaptation in proteins from Methanococcus and Bacillus. Mol. Biol. Evol. 16:1785–1790. McGraw, E. A., J. Li, R. K. Selander, and T. S. Whittam. 1999. Molecular evolution and mosaic structure of a, b, and k intimins of pathogenic Escherichia coli. Mol. Biol. Evol. 16:12–22. Methe, B. A., K. E. Nelson, J. W. Deming et al. (24 co-authors). 2005. The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc. Natl. Acad. Sci. USA 102: 10913–10918. Moilanen, J. S., and K. Majamaa. 2003. Phylogenetic network and physicochemical properties of nonsynonymous mutations in the protein-coding genes of human mitochondrial DNA. Mol. Biol. Evol. 20:1195–1210. Müller, T., and M. Vingron. 2000. Modeling amino acid replacement. J. Comput. Biol. 7:761–776. Nachman, M. W., W. M. Brown, M. Stoneking, and C. F. Aquadro. 1996. Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142:953–963. Nishio, Y., Y. Nakamura, Y. Kawarabayasi et al. (11 co-authors). 2003. Comparative complete genome sequence analysis of the amino acid replacements responsible for the thermo- stability of Corynebacterium efficiens. Genome Res. 13: 1572–1579. Ohta, T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263–286. Parkhill, J., G. Dougan, K. D. James et al. (40 co-authors). 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848–852. Perna, N. T., and T. D. Kocher. 1995. Unequal base frequencies and estimation of substitution rates. Mol. Biol. Evol. 12: 359–361. Perna, N. T., G. Plunkett, V. Burland et al. (27 co-authors). 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529–533. Sawyer, S. A., D. E. Dykhuizen, and D. L. Hartl. 1987. Confidence interval for the number of selectively neutral amino acid polymorphisms. Proc. Natl. Acad. Sci. USA 84: 6225–6228. Sunyaev, S., V. Ramensky, I. Koch, W. Lathe, A. S. Kondashov, and P. Bork. 2001. Prediction of deleterious human alleles. Hum. Mol. Genet. 10:591–597. Sunyaev, S., F. A. Kondrashov, P. Bork, and V. Ramensky. 2003. Impact of selection, mutation rate and genetic drift on human genetic variation. Hum. Mol. Genet. 12:3325–3330. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL-W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680. Veerassamy, S., A. Smith, and E. R. M. Tillier. 2003. A transition probability model for amino acid substitutions from blocks. J. Comput. Biol. 10:997–1010. Welch, R. A., V. Burland, G. Plunkett et al. (18 co-authors). 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. USA 99:17020–17024. Williamson, S. H., R. Hernandez, A. Fledel-Alon, L. Zhu, R. Nielsen, and C. D. Bustamente. 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. USA 102:7882–7887. Zuckerkandl, E., J. Derancourt, and H. Vogel. 1971. Mutational trends and random processes in the evolution of informational macromolecules. J. Mol. Biol. 59:473–490. William Martin, Associate Editor Accepted September 19, 2005
© Copyright 2026 Paperzz