Apparent Trends of Amino Acid Gain and Loss in

Apparent Trends of Amino Acid Gain and Loss in Protein Evolution
Due to Nearly Neutral Variation
John H. McDonald
Department of Biological Sciences, University of Delaware
It has recently been claimed that certain amino acids have been increasing in frequency in all living organisms for most of
the history of life on earth, while other amino acids have been decreasing in frequency. Three lines of evidence have been
offered for this assertion, but each has a more plausible alternative interpretation. Here I show that unequal patterns of gains
and losses for particular pairs of amino acids (such as more leucine / phenylalanine than phenylalanine / leucine
substitutions in humans and chimpanzees since they split from a common ancestor) are consistent with a simple neutral
model at equilibrium amino acid frequencies. Unequal numbers of gains and losses for particular amino acids (such as more
gains than losses of cysteine) are shown by simulations to be consistent with a model of nearly neutral evolution. Unequal
numbers of gains and losses for particular amino acids in human polymorphism data are shown by simulations to be
explainable by the nearly neutral model as well. In a comparison of protein sequences from four strains of Escherichia
coli, polarized by one outgroup strain of Salmonella, the disparity in number of gains and losses for particular amino acids
is strong in terminal branches but weaker or nonexistent in internal branches, which is inconsistent with the universal trend
model but as expected under the nearly neutral model.
Introduction
Jordan et al. (2005) compared protein sequences from
pairs of closely related bacteria, yeast, and mammals, inferring the ancestral states by comparison with a more distantly
related outgroup for each pair of species. They also examined human protein polymorphism data, with the ancestral
allele at each polymorphic site inferred from comparison
with chimpanzee sequences. Their surprising conclusion
was that certain amino acids, such as cysteine, methionine,
histidine, serine, and phenylalanine, are more likely to be
gained than lost in protein evolution, while other amino
acids, such as proline, alanine, glutamic acid, and glycine,
are more likely to be lost than gained. Furthermore, they
concluded that these trends are universal in all living organisms and have been ongoing for billions of years. They speculated that these trends result because some amino acids,
having been added to the genetic code relatively recently,
are still increasing toward their equilibrium frequency.
Earlier, Zuckerkandl, Derancourt, and Vogel (1971) performed a similar analysis of a much smaller data set (globins
and cytochrome c in several vertebrates) and reached similar
conclusions. Here I examine the evidence provided by
Jordan et al. (2005) and suggest alternate explanations that
do not involve universal, directional changes in amino acid
composition.
Substitutional Asymmetry of Pairs of Amino Acids
Jordan et al. (2005) offered three lines of evidence for
universal trends of change in amino acid frequencies. The
first is that for many pairs of amino acids, there are unequal
numbers of substitutions in one direction compared with
those in the opposite direction. For example, in the comparison of humans with chimpanzees, using mice (Mus
musculus) as the outgroup, there are 102 sites where phenylalanine has been gained and leucine lost, while only 57
Key words: protein evolution, Markov chain, nearly neutral model,
Escherichia coli.
E-mail: [email protected].
Mol. Biol. Evol. 23(2):240–244. 2006
doi:10.1093/molbev/msj026
Advance Access publication September 29, 2005
Ó The Author 2005. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
sites have the opposite pattern (leucine gained and phenylalanine lost). Jordan et al. (2005) conclude that amino acid
frequencies are not stationary but are instead changing in
frequency.
In a two-allele system at equilibrium, there must be
equal number of substitutions in each direction; a significant
difference in the number of substitutions would indeed indicate that the allele frequencies were changing. However,
there are 20 possible amino acids at any site. For a site with
more than two possible alleles, equal numbers of substitutions in each direction would only be expected if the evolutionary process were a time-reversible Markov process
with stationary amino acid frequencies (Liò and Goldman
1998). In a reversible process, by definition the number of
A / B substitutions per unit time is equal to the number of
B / A substitutions, so that the process would look the
same whether time was running forward or backward. Reversibility is usually assumed when inferring a matrix of
substitution rates from pairwise comparisons of sequences
(Dayhoff, Schwartz, and Orcutt 1978; Jones, Taylor, and
Thornton 1992; S. Henikoff and J. G. Henikoff 2000;
Müller and Vingron 2000; Goldman and Whelan 2002;
Veerassamy, Smith, and Tillier 2003; Kosiol and Goldman
2005). However, this assumption is made purely for mathematical convenience; there is no biological evidence for
reversibility (Liò and Goldman 1998).
With more than two alleles, it is easy to create neutral
models of protein evolution that are not reversible. For example, consider a three-allele model in which the substitution rate from B to C (PBC) is 4 3 10ÿ5 substitutions per
generation and PAB 5 PAC 5 PBA 5 PCA 5 PCB 5 1 3
10ÿ5. Using this rate matrix in a Markov chain model until
the amino acid frequencies become stationary, the frequencies of A, B, and C are fA 5 0.333, fB 5 0.167, and fC 5
0.500. Because there are twice as many A as B amino acids
and PAB 5 PBA, there will be twice as many A / B as
B / A substitutions per unit time. Clearly, differences
in the numbers of forward and reverse substitutions for pairs
of amino acids are consistent with a simple neutral model at
equilibrium and do not necessarily indicate changing amino
acid frequencies.
Nearly Neutral Protein Evolution 241
FIG. 1.—The nine possible fates of a mildly deleterious B allele with
a lifetime shorter than any branch on the phylogeny. Phylogenies A–D
have one gain and one loss of B, phylogeny E has two losses, phylogeny
F has one gain and two losses, and phylogenies G–I have one gain and no
losses. There are thus eight gains and eight losses in total, although only the
two gains in H and I would be inferred from modern sequence data.
Unequal Numbers of Gains and Losses of
Amino Acids
The second line of evidence offered by Jordan et al.
(2005) is that certain amino acids are gained more than they
are lost, while others are lost more than they are gained. For
example, in the comparison of humans with chimpanzees
there are 137 sites where cysteine has been substituted
for a different ancestral amino acid (‘‘gains’’ of cysteine)
and only 72 sites where cysteine has been lost; similar excess gains of cysteine are seen in rodents, yeast, and 12 taxa
of bacteria. This could be explained by a trend of increasing
frequencies of cysteine, but there are other explanations that
do not involve changing amino acid composition.
One problem with using parsimony to infer gains and
losses of amino acids is that the inferences are sometimes
incorrect; when an outgroup and one ingroup have amino
acid B and the other ingroup has A, there may have been
two A / B substitutions, not one B / A. This is particularly likely if A is more common than B; there will be more
inferred common / rare than rare / common substitutions, even when the actual number of substitutions in each
direction is equal (Collins, Wimberger, and Naylor 1994;
Perna and Kocher 1995). Jordan et al. (2005) attempted
to avoid this problem by using closely related species,
but there can be substantially more inferred common /
rare substitutions despite fairly small amounts of divergence (Eyre-Walker 1998).
Sites which fit the nearly neutral model of protein evolution (Ohta 1992) may be particularly likely to show more
inferred substitutions in one direction than the other. Consider a site at which amino acid A is favored by natural selection. When a mutation to amino acid B occurs, it is
selected against. If the selection against B is weak enough,
B may remain present for a while before being replaced by
A again. Under this model, there is a chance that when comparing sequences from an outgroup and two ingroup species, one of the ingroups will have the mildly deleterious
B, while the other two species have the preferred A. This
would be interpreted as a gain of B. On the other hand, an
apparent loss of B, in which the outgroup and one of the
ingroup species would both have the deleterious B while
only one of the ingroup species has the preferred A, would
be quite unlikely; it would require either two independent
substitutions of the deleterious B, one in the outgroup and
FIG. 2.—Initial (1) and final (3) frequencies of the mildly deleterious
allele B after 3,000 generations of simulated evolution.
one in an ingroup, or a deleterious B that survived in two
lineages since the common ancestor of all three species. Under this nearly neutral model, there could be many more
apparent gains of B than losses, even if the ancestral state
at sites that differ among the taxa is always inferred correctly. This may seem paradoxical, but the ‘‘missing’’ losses
of B would occur at sites where a common ancestor of two
species happened to have the mildly deleterious B, which
was replaced by the favored A in both daughter lineages
(fig. 1).
To illustrate how nearly neutral evolution could produce a difference in the number of gains and losses, I wrote a
computer program to simulate mutation, drift, and selection
for a two-allele locus, with allele A being preferred and allele B being deleterious. The population size was 25 diploid
individuals, and the fitnesses of genotypes AA, AB, and BB
were 1, 1 ÿ s, and 1 ÿ 2s, respectively. The A / B and
B / A mutation rates, lAB and lBA, were 0.0001 per generation. For each replicate, a single population was started
with the initial state either fixed for A or fixed for B. The
initial probability of being fixed for B, PB, was determined
by setting the number of A / B substitutions equal to the
number of B / A substitutions, PAlABuB 5 PBlBAuA,
solving for PB, PB 5 uB/(uA 1 uB), and then using equation
(10) of Kimura (1962) for uA and uB, the probabilities of
fixation of A and B. Evolution in this initial population
was first simulated for 1,000 generations to produce equilibrium levels of polymorphism. The population was then
duplicated into an outgroup and an ingroup lineage, and
evolution was simulated for 1,000 more generations. The
ingroup was then duplicated, and evolution was simulated
in the outgroup and the two ingroups for 1,000 more generations. One allele was then sampled at random from each
of the three species. If the two ingroup alleles were different, the allele that was in the outgroup was counted as lost in
one of the ingroups, while the allele that was not in the outgroup was counted as gained. The simulations were replicated 10,000 times for each selection coefficient.
When 2Ns 5 0 (the neutral model), the equilibrium frequency of B is 0.5; as the selection coefficient against B increases, the equilibrium frequency of B declines (fig. 2). The
final frequency is the same as the initial frequency, indicating that there is no trend of changing allele frequencies in
242 McDonald
FIG. 3.—Number of replicate simulations of a three-species model
with inferred gains (gray bars) or losses (black bars) of a mildly deleterious
allele in one of the two ingroup species.
FIG. 4.—Number of replicate simulations of a two-species model with
inferred gains (gray bars) or losses (black bars) of a mildly deleterious
allele as a polymorphism in the ingroup species.
these simulations. The number of gains of B initially increases and then declines as the selection coefficient against
B increases (fig. 3). The increase is presumably due to the
increasing frequency of A (and thus increasing frequency of
sites where B can be gained). The number of losses of B
declines more rapidly; as selection against B intensifies
and the average frequency of B decreases, it becomes unlikely that both the outgroup and one of the ingroups would
have B at the end of the simulated generations (the pattern
that is interpreted as a loss of B). As a result, nearly neutral
evolution produces many more apparent gains than losses of
a mildly deleterious allele. Although this simple two-allele
model could be made more elaborate and realistic, the results
seem sufficient to demonstrate that unequal numbers of
gains and losses do not necessarily indicate changing allele
frequencies.
ses in polymorphism data (fig. 4). The results demonstrate
that the unequal numbers of gains and losses in human polymorphism data found by Jordan et al. (2005) do not necessarily indicate changing allele frequencies but may merely
add to the evidence that many protein polymorphisms in
humans are mildly deleterious, an interpretation that is amply supported by evidence from mitochondrial (Nachman
et al. 1996; Hasegawa, Cao, and Yang 1998; Moilanen
and Majamaa 2003; Elson, Turnbull, and Howell 2004;
Ho et al. 2005) and nuclear genes (Cargill et al. 1999;
Fay, Wyckoff, and Wu 2001; Sunyaev et al. 2001, 2003;
Hughes et al. 2003; Williamson et al. 2005).
Nearly Neutral Human Polymorphisms
The third line of evidence for a universal trend offered
by Jordan et al. (2005) is that for human protein polymorphisms (with chimpanzee as the outgroup), the ratio of
gains to losses for most amino acids is significantly different from one. However, the nearly neutral model could also
explain this result. Consider a site where A is favored by
selection and B is mildly deleterious. If the selection against
B is weak, it is possible for B to appear as a polymorphism
for a while. There will thus be some polymorphisms where
A is the ancestral allele and B has been gained. But because
B is mildly deleterious, it is unlikely to occur in both the
outgroup and as a polymorphism in the ingroup. Therefore,
there will be fewer sites where B is the ancestral allele and
A is a derived polymorphism.
To test whether nearly neutral evolution could produce
more gains than losses for polymorphism data, I simulated
evolution using the same model described above. The original population evolved for 1,000 generations, then it was
duplicated and the outgroup and ingroup species evolved
for 1,000 more generations. If the ingroup was polymorphic, one allele was sampled at random from the outgroup
to infer which allele was gained and which was lost. The
simulations were replicated 10,000 times for each selection
coefficient.
The simulations show that for a broad range of selection coefficients, nearly neutral evolution results in a large
number of differences between the number of gains and los-
Nearly Neutral Polymorphisms in Escherichia coli
One way to distinguish between the universal trend
model and the nearly neutral model is to examine the pattern of gains and losses on a phylogeny with more than
three taxa. If the long-term directional trend postulated
by Jordan et al. (2005) is correct, the ratio of gains to losses
should be similar in all parts of a phylogeny. Under the
nearly neutral model, however, the apparent excess gains
will be confined mostly to the tips of a phylogeny, as mildly
deleterious substitutions have a relatively short ‘‘lifetime’’
and are unlikely to survive long enough to appear in more
than one taxon. To test this, I used the program Blastall
(Altschul et al. 1997) to identify matching protein sequences in one outgroup, Salmonella enterica Typhi CT18
(Parkhill et al. 2001) and four completely sequenced strains
of Escherichia coli: E. coli K12 (Blattner et al. 1997),
E. coli O157:H7 (Perna et al. 2001), E. coli CFT073 (Welch
et al. 2002), and ‘‘Shigella flexneri’’ 2a301 (Jin et al. 2002).
Sequences with the matching portion reported by Blastall
shorter than 50 amino acids, sets of sequences for which
not all E. coli strains had a match in S. enterica, and sets
of sequences in which the proportion of pairwise differences
between the S. enterica and E. coli sequences varied significantly (2 3 4 G-test, P , 0.05) were deleted. The protein
sequences were then aligned using ClustalW (Thompson,
Higgins, and Gibson 1994). Ambiguously aligned sites adjacent to gaps were omitted, with the omitted sites extending
from the gap to the nearest pair of adjacent sites that were
both identical in all five sequences. Different genes had different estimated phylogenies, presumably because of recombination (Guttman and Dykhuizen 1994; McGraw et al.
1999), so no attempt was made to estimate an evolutionary
Nearly Neutral Protein Evolution 243
Table 1
Gains and Losses of Amino Acids in Escherichia coli
Unique
Gains Losses
Cys
Ile
Asn
Ser
His
Met
Tyr
Phe
Thr
Lys
Val
Leu
Glu
Asp
Gln
Arg
Gly
Trp
Ala
Pro
177
534
326
638
272
194
114
144
504
202
551
360
262
268
181
230
184
13
347
90
31
203
137
285
132
103
65
95
376
162
510
349
326
395
285
398
365
31
1033
310
Shared
D
0.70*
0.45*
0.41*
0.38*
0.35*
0.31*
0.27*
0.21*
0.15*
0.11*
0.04
0.02
ÿ0.11*
ÿ0.19*
ÿ0.22*
ÿ0.27*
ÿ0.33*
ÿ0.41*
ÿ0.50*
ÿ0.55*
Gains Losses
16
184
109
179
64
65
30
35
169
64
220
96
148
155
84
65
56
4
210
26
11
175
81
159
41
27
30
31
159
89
191
128
157
149
83
88
54
3
269
54
Jordan et al. (2005)
D
0.19
0.03
0.15*
0.06
0.22*
0.41*
0
0.06
0.03
ÿ0.16*
0.07
ÿ0.14*
ÿ0.03
0.02
0.01
ÿ0.15*
0.02
0.14
ÿ0.12*
ÿ0.35*
Gains Losses
143
638
491
738
284
238
126
151
638
286
686
436
311
391
360
298
239
19
531
138
57
364
262
549
194
155
80
122
564
257
643
394
367
493
476
374
334
34
1129
294
D
0.43*
0.27*
0.30*
0.15*
0.19*
0.21*
0.22*
0.11*
0.06*
0.05
0.03
0.05
ÿ0.08*
ÿ0.12*
ÿ0.14*
ÿ0.11*
ÿ0.17*
ÿ0.28*
ÿ0.36*
ÿ0.36*
NOTE.—For each amino acid, the numbers of gains and losses are shown for
unique substitutions (those seen in only one strain of E. coli) and shared substitutions
(those seen in two or three strains of E. coli). Data on E. coli from Jordan et al. (2005)
are shown for comparison. D, the number of gains minus losses divided by the total
number of substitutions.
* Values of D that are significantly different from 0 (exact binomial test,
P , 0.05).
tree and assign substitutions to particular branches of the
tree. Instead, gains and losses were simply divided into
two classes, ‘‘unique’’ (the derived amino acid present in only one strain of E. coli) or ‘‘shared’’ (the derived amino acid
present in more than one strain).
The data of Jordan et al. (2005) and the unique substitutions show similar patterns; all differences between
gains and losses are in the same direction, and 19 of the
20 amino acids show a larger difference for unique substitutions than for the data of Jordan et al. (2005) (table 1).
However, there is a marked difference between unique substitutions and shared substitutions in the patterns of gain
and loss. For 18 amino acids, the shared substitutions have
a normalized difference between gains and losses that is either smaller than or in the opposite direction to those seen at
unique sites. Only 8 amino acids have a significant difference between gains and losses for shared substitutions,
while 19 amino acids have a significant difference for
unique substitutions. The stronger bias seen for unique
gains and losses is inconsistent with the universal trend postulated by Jordan et al. (2005), which would predict the
same amount of bias on all branches of a phylogeny, but
it is expected under the nearly neutral model. This adds
to the evidence that many protein polymorphisms in E. coli
are mildly deleterious (Sawyer, Dykhuizen, and Hartl 1987;
Hughes 2005). The bias for shared gains and losses seen for
some amino acids may result from universal trends in
amino acid composition, but they may also represent mildly
deleterious alleles that are so nearly neutral that they have
survived in more than one strain. The bias in shared gains
and losses may also result from directional changes in amino
acid composition due to positive selection, as has been re-
peatedly shown for related species of prokaryotes living in
different environments (Haney et al. 1999; McDonald,
Grasso, and Rejto 1999; McDonald 2001; Nishio et al.
2003; Di Giulio 2005; Methe et al. 2005).
Acknowledgments
I thank H. Akashi, B. C. Verrelli, and B. J. Wolpert
for helpful comments on the manuscript.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang,
W. Miller, and D. J. Lipman. 1997. Gapped BLAST and
PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25:3389–3402.
Blattner, F. R., G. Plunkett, C. A. Bloch et al. (16 co-authors).
1997. The complete genome sequence of Escherichia coli
K-12. Science 277:1453–1474.
Cargill, M., D. Altshuler, J. Ireland et al. (17 co-authors). 1999.
Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231–238.
Collins, T. M., P. H. Wimberger, and G. J. P. Naylor. 1994. Compositional bias, character-state bias, and character-state reconstruction using parsimony. Syst. Biol. 47:482–496.
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model
of evolutionary change in proteins. Pp. 345–352 in M. O.
Dayhoff, ed. Atlas of protein sequence and structure. Volume
5, Supplement 2. National Biomedical Research Foundation,
Washington, D.C.
Di Giulio, M. 2005. A comparison of proteins from Pyrococcus
furiosus and Pyrococcus abyssi: barophily in the physicochemical properties of amino acids and in the genetic code. Gene
346:1–6.
Elson, J. L., D. M. Turnbull, and N. Howell. 2004. Comparative
genomics and the evolution of human mitochondrial DNA:
assessing the effects of selection. Am. J. Hum. Genet. 74:
229–238.
Eyre-Walker, A. 1998. Problems with parsimony in sequences of
biased base composition. J. Mol. Evol. 47:686–690.
Fay, J. C., G. J. Wyckoff, and C.-I. Wu. 2001. Positive and
negative selection on the human genome. Genetics 158:
1227–1234.
Goldman, N., and S. Whelan. 2002. A novel use of equilibrium
frequencies in models of sequence evolution. Mol. Biol. Evol.
19:1821–1831.
Guttman, D. S., and D. E. Dykhuizen. 1994. Clonal divergence in
Escherichia coli as a result of recombination, not mutation.
Science 266:1380–1383.
Haney, P. J., J. H. Badger, G. L. Buldak, C. I. Reich, C. R. Woese,
and G. J. Olsen. 1999. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely
thermophilic Methanococcus species. Proc. Natl. Acad. Sci.
USA 96:3578–3583.
Hasegawa, M., Y. Cao, and Z. H. Yang. 1998. Preponderance of
slightly deleterious polymorphism in mitochondrial DNA:
nonsynonymous/synonymous rate ratio is much higher within
species than between species. Mol. Biol. Evol. 15:1499–1505.
Henikoff, S., and J. G. Henikoff. 2000. Amino acid substitution
matrices. Adv. Protein Chem. 54:73–97.
Ho, S. Y. W., M. J. Phillips, A. Cooper, and A. J. Drummond.
2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol.
Evol. 22:1561–1568.
Hughes, A. L. 2005. Evidence for abundant slightly deleterious
polymorphisms in bacterial populations. Genetics 169:533–538.
244 McDonald
Hughes, A. L., B. Packer, R. Welch, A. W. Bergen, S. J. Chanock,
and M. Yeager. 2003. Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc. Natl. Acad.
Sci. USA 100:15754–15757.
Jin, Q., Z. H. Yuan, J. G. Xu et al. (32 co-authors). 2002. Genome
sequence of Shigella flexneri 2a: insights into pathogenicity
through comparison with genomes of Escherichia coli K12
and O157. Nucleic Acids Res. 30:4432–4441.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid
generation of mutation data matrices from protein sequences.
Comput. Appl. Biosci. 8:275–282.
Jordan, I. K., F. A. Kondrashov, I. A. Adzhubei, Y. I. Wolf,
E. V. Koonin, A. S. Kondrashov, and S. Sunyaev. 2005.
A universal trend of amino acid gain and loss in protein
evolution. Nature 433:633–638.
Kimura, M. 1962. On the probability of fixation of mutant genes
in a population. Genetics 47:713–719.
Kosiol, C., and N. Goldman. 2005. Different versions of the
Dayhoff rate matrix. Mol. Biol. Evol. 22:193–199.
Liò, P., and N. Goldman. 1998. Models of molecular evolution
and phylogeny. Genome Res. 8:1233–1244.
McDonald, J. H. 2001. Patterns of temperature adaptation in proteins from the bacteria Deinococcus radiodurans and Thermus
thermophilus. Mol. Biol. Evol. 18:741–749.
McDonald, J. H., A. M. Grasso, and L. K. Rejto. 1999. Patterns of
temperature adaptation in proteins from Methanococcus and
Bacillus. Mol. Biol. Evol. 16:1785–1790.
McGraw, E. A., J. Li, R. K. Selander, and T. S. Whittam. 1999.
Molecular evolution and mosaic structure of a, b, and k
intimins of pathogenic Escherichia coli. Mol. Biol. Evol.
16:12–22.
Methe, B. A., K. E. Nelson, J. W. Deming et al. (24 co-authors).
2005. The psychrophilic lifestyle as revealed by the genome
sequence of Colwellia psychrerythraea 34H through genomic
and proteomic analyses. Proc. Natl. Acad. Sci. USA 102:
10913–10918.
Moilanen, J. S., and K. Majamaa. 2003. Phylogenetic network and
physicochemical properties of nonsynonymous mutations in
the protein-coding genes of human mitochondrial DNA.
Mol. Biol. Evol. 20:1195–1210.
Müller, T., and M. Vingron. 2000. Modeling amino acid replacement. J. Comput. Biol. 7:761–776.
Nachman, M. W., W. M. Brown, M. Stoneking, and C. F. Aquadro.
1996. Nonneutral mitochondrial DNA variation in humans and
chimpanzees. Genetics 142:953–963.
Nishio, Y., Y. Nakamura, Y. Kawarabayasi et al. (11 co-authors).
2003. Comparative complete genome sequence analysis of
the amino acid replacements responsible for the thermo-
stability of Corynebacterium efficiens. Genome Res. 13:
1572–1579.
Ohta, T. 1992. The nearly neutral theory of molecular evolution.
Annu. Rev. Ecol. Syst. 23:263–286.
Parkhill, J., G. Dougan, K. D. James et al. (40 co-authors). 2001.
Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848–852.
Perna, N. T., and T. D. Kocher. 1995. Unequal base frequencies
and estimation of substitution rates. Mol. Biol. Evol. 12:
359–361.
Perna, N. T., G. Plunkett, V. Burland et al. (27 co-authors). 2001.
Genome sequence of enterohaemorrhagic Escherichia coli
O157:H7. Nature 409:529–533.
Sawyer, S. A., D. E. Dykhuizen, and D. L. Hartl. 1987. Confidence interval for the number of selectively neutral amino
acid polymorphisms. Proc. Natl. Acad. Sci. USA 84:
6225–6228.
Sunyaev, S., V. Ramensky, I. Koch, W. Lathe, A. S. Kondashov,
and P. Bork. 2001. Prediction of deleterious human alleles.
Hum. Mol. Genet. 10:591–597.
Sunyaev, S., F. A. Kondrashov, P. Bork, and V. Ramensky. 2003.
Impact of selection, mutation rate and genetic drift on human
genetic variation. Hum. Mol. Genet. 12:3325–3330.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994.
CLUSTAL-W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673–4680.
Veerassamy, S., A. Smith, and E. R. M. Tillier. 2003. A transition
probability model for amino acid substitutions from blocks.
J. Comput. Biol. 10:997–1010.
Welch, R. A., V. Burland, G. Plunkett et al. (18 co-authors). 2002.
Extensive mosaic structure revealed by the complete genome
sequence of uropathogenic Escherichia coli. Proc. Natl. Acad.
Sci. USA 99:17020–17024.
Williamson, S. H., R. Hernandez, A. Fledel-Alon, L. Zhu,
R. Nielsen, and C. D. Bustamente. 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. USA
102:7882–7887.
Zuckerkandl, E., J. Derancourt, and H. Vogel. 1971. Mutational
trends and random processes in the evolution of informational
macromolecules. J. Mol. Biol. 59:473–490.
William Martin, Associate Editor
Accepted September 19, 2005