site-by-site estimation of the rate of substitution and the correlation of

Syst. Biol. 46(2):346-353, 1997
SITE-BY-SITE ESTIMATION OF THE RATE OF SUBSTITUTION AND
THE CORRELATION OF RATES IN MITOCHONDRIAL DNA
RASMUS NIELSEN
Department of Integrative Biology, University of California, Berkeley, California 94720, USA;
E-mail: [email protected]. berkeley.edu
Abstract.—A maximum likelihood method for independently estimating the relative rate of substitution at different nucleotide sites is presented. With this method, the evolution of DNA sequences can be analyzed without assuming a specific distribution of rates among sites. To investigate the pattern of correlation of rates among sites, the method was applied to a data set
consisting of the protein-coding regions of the mitochondrial genome from 10 vertebrate species.
Rates appear to be strongly correlated at distances up to 40 codons apart. Furthermore, there
appears to be some higher order correlation of sites approximately 75 codons apart. The method
of site-by-site estimation of the rate of substitution may also be applied to examine other aspects
of rate variation along a DNA sequence and to assess the difference in the support of a tree along
the sequence. [Correlation among sites; maximum likelihood; mtDNA; rate variation; site-by-site
estimation.]
The number of data used in phylogenetic studies has increased dramatically in recent years. Commonly, several different
genes are sequenced for the same set of
species. For example, the mitochondrial genome has been fully sequenced in many
vertebrate species (Anderson et al., 1981,
1982; Bibb et al., 1981; Roe et al., 1985;
Gadaleta et al., 1989; Desjardins and Morais, 1990; Arnason et al., 1991; Arnason
and Johnson, 1992; Tzeng et al., 1992;
Chang et al., 1994). Such large data sets
probably will become more common in the
future. Therefore, new methods that take
advantage of the estimates of topology and
branch lengths obtained from large data
sets should be developed. In this paper, I
describe a maximum likelihood method
for independently estimating the relative
rate of substitution at each site when such
estimates have been obtained. The method
was used to estimate the degree of correlation of rates in mitochondrial DNA
(mtDNA).
A decade ago, most models of DNA sequence evolution (e.g., Jukes and Cantor,
1969; Kimura, 1980; Msenstein, 1981; Hasegawa et al., 1985) assumed constancy of
rates among sites. This assumption is
clearly unrealistic. For example, the observation that coding sequences contain sites
with predominantly replacement substi-
tutions and sites with predominantly silent
substitutions shows that a model assuming constancy in rate among sites is not
satisfactory. Other causes of rate variation
may include differences in the biological
mutation rate along the sequence and varying degrees of selection in different regions
or domains.
Several models of sequence evolution
have recently been proposed that allow for
among-site rate variation (Yang, 1993; Kelly and Rice, 1996). Particularly popular are
models based on gamma-distributed rates
(Uzzell and Corbin, 1971; Jin and Nei,
1990; Yang, 1993, 1994; see Yang, 1996, for
a review). Yang (1995) devised a method
for estimating the rate at each site and the
degree of correlation of rates among sites
assuming gamma-distributed rates and a
stationary autocorrelation function where
only adjacent sites are correlated. The aim
of the present study was to devise an estimator of the rate at each site and the degree of correlation that does not rely on
these assumptions. The disadvantage is
that the topology and branch lengths of the
underlying tree must be known prior to
the analysis. The relative rates can then be
estimated independently for each site. Because such estimates are independent, further analysis such as estimation of the autocorrelation function may be performed
directly on the obtained estimates.
346
1997
347
NIELSEN—ESTIMATION OF M T D N A SUBSTITUTION RATES
Estimates of the rate of substitution at
individual sites have already been obtained in various cases by the parsimony
method (see Horovitz and Meyer, 1995).
This parsimony-based method may be reliable if all branches are short. However,
when some branches are long, parsimony
may provide unreliable estimates of the
rate of substitution because several substitutions may occur on the same branch. In
such cases, the maximum likelihood method should provide better estimates.
In this paper, I explore the properties of
the maximum likelihood estimator of the
rate of substitution. The maximum likelihood method assumes, as does the parsimony method, that the underlying topology is known with great certainty. In
addition, the maximum likelihood estimator requires that reasonable estimates of
the branch lengths have been obtained.
ESTIMATING THE SUBSTITUTION RATE
For the maximum likelihood estimator
of the relative substitution rate (w) at individual sites of a DNA sequence, u refers
to the composite parameter \t, where A. is
the rate of change at each site and t is the
total tree length. Reliable estimates of u at
each site cannot be obtained while estimating the topology and branch lengths of
a tree. However, if the topology and relative branch lengths of a given tree are already known, an estimate can easily be obtained for each site by maximizing the
likelihood of observing the data for different values of the substitution rate. That is,
an estimate (ft) of the relative rate of substitution at a site is obtained by maximizing
L{u) = P(site pattern|w).
(1)
The likelihood for a given u can be calculated using standard methods (Felsenstein,
1981). For example, for three sequences
L(u) = 2
TT,-P,B1(W, O-P1B2(W, t2)
i=a,c,t,g
•Pmfr. 'a),
(2)
where irr is the stationary frequency of
base i and PiBl(u, tj is the probability of
observing base B1 at time t1 (the relative
r
a = {G, A, A}
0.02-
0.01-
• * ^
—
* ^
C}
00
0.5
1.0
u
1.5
2.0
FIGURE 1. The likelihood functions (L) of u for two
different site patterns (a and b). An unrooted threetaxon tree with relative branch lengths 2,1, and 1 was
assumed. For site a with site pattern {G, A, A}, the
likelihood function has a maximum at 0.5. However,
for site b with site pattern {G, A, C}, the likelihood is
a strictly increasing function of u. In that case, fl = <».
length of the particular branch) when base
i is observed at time 0. The function P/;(«,
t) is specified by the particular model of
evolution assumed. Any desired substitution matrix can thereby in principle be
used. The actual maximization procedure
is trivial because, given the topology and
relative branch lengths, only one parameter (u) is unknown.
Clearly, estimates of ft = 0 will be obtained when the particular site considered
is not variable. However, when sites are
highly variable, estimates for which ft = °°
may be obtained. To understand this phenomenon, consider a tree with three taxa.
In such a tree, any site pattern {a, b, c},
where a ¥^ b, a ¥=• c, and b ¥^ c, will be
more likely to occur as the substitution
rate increases. Therefore, the estimate of
the relative rate goes to infinity for sites
with such patterns (Fig. 1). Although the
probability of observing site patterns with
this property in examples with moderate
or low substitution rates decreases rapidly
as more taxa are included, these site patterns may still occur. If they are ignored in
the analysis, the estimator ft will be biased
toward smaller values.
There may be three solutions to this
problem: (1) sites with estimates of ft = °°
348
VOL. 46
SYSTEMATIC BIOLOGY
(3)
L(um) = P(site pattern|u)-P(u),
which can be maximized for u. The function P(u) can be specified by, for example,
a gamma or exponential distribution. Using this approach is not equivalent to forcing the distribution to follow some predefined function. The distribution of rates
obtained by this approach may be very different from the prior distribution. However, this approach solves the problem of obtaining estimates of ft = <».
SIMULATIONS
10.
Likelihood >^^
1E
8.
6.
c
jctatio
may simply be excluded, (2) an arbitrary
maximum value of u may be predefined,
and (3) a modified likelihood approach
may be applied. The aim of the modified
likelihood approach method is in this case
to force the likelihood function to be centered around smaller values, which can be
done by using any decreasing function as
a prior distribution of the rate of substitution. The modified likelihood function
becomes
Q.
X
LU
/,'
y^Likelihood
/ ' y^
+ prior
4.
2.
y^T^
Parsimony
0
0
2
4
6
8
True rate (u)
10
FIGURE 2. The mean of the estimators U (Likelihood), Um (Likelihood + prior), and flpars (Parsimony)
in 10,000 simulations under a Jukes and Cantor (1969)
model of evolution. The tree estimated from the
mtDNA of 10 vertebrate species was assumed. Notice
that u (the true relative substitution rate) is equivalent
to the expected number of substitutions on the tree in
one site.
are ignored, ft will be biased (Fig. 2). The
Simulations were performed to evaluate downward bias increases as the rate inthe performance of ft. These simulations creases simply because more sites have eswere performed using the topology and timates of ft = oo. The result is that ft conbranch lengths estimated from a data set verges to a finite value (Fig. 2). This
consisting of the mitochondrial genome asymptotic behavior is expected for any esfrom 10 different vertebrates. This data set, timator because there is only a finite numand the topology of the tree, was discussed ber of possible different site patterns. The
in more detail by Cummings et al. (1995). estimates of the rate must converge to a
One of the goals of the simulations was to finite value equal to or less than max{tf}.
compare the maximum likelihood estimator
Another source of bias is small sample
with the commonly applied parsimony es- size (i.e., few taxa). This bias results in
timator. Because the parsimony method can- larger values. Figure 2 illustrates that the
not easily take mutational biases or unequal source of bias is important when u is small
base frequencies into account, the Jukes and or moderate. To illustrate the effect of samCantor (1969) model was chosen for a fair ple size, simulations were performed in
comparison. The evolution in one site was which the number of taxa was varied. Taxa
simulated under a given mutation rate, and were added to the sample by simply joinestimates using the two different estima- ing together identical 10-taxon phylogenies
tors (ft and the parsimony estimator) were at the root (Fig. 3). The bias is greatly reobtained. For each value of the true sub- duced for larger sample sizes, suggesting
stitution rate (u), 10,000 replicates were that the estimator is consistent (i.e., the exgenerated. These 10,000 replicates were pectation of the estimate will converge to
then used to estimate the expectation of the true value as more taxa are added) and
the estimators under a given value of the that the bias observed is a small sample
true mutation rate. Observations for which size effect. The expectation of a modified
u = oo were ignored.
likelihood estimator with a gamma distriBecause site patterns for which ft = o° bution with mean 10.0 and shape param-
1997
NIELSEN—ESTIMATION OF M T D N A SUBSTITUTION RATES
20
40 60 80
Number of taxa
100 120
FIGURE 3. The bias in ft as a function of the number
of taxa for u - 6.0 under the Jukes and Cantor model
of evolution. The bias is defined as the true value minus the expectation of the estimator.
eter 0.5 as a prior distribution was also explored (Fig. 2). The use of such a prior
distribution implies that the estimator will
underestimate the true rate. However, the
bias is not substantially larger than the
bias for the normal likelihood approach.
In the 10-taxon case, tf increases approximately linearly with u except for extremely high values of u. Therefore, although ti
cannot be applied as an unbiased estimate
in all cases, it may still be applied to distinguish between rates at different sites
even when the sequences apparently are
very saturated.
The maximum likelihood estimators u
and Um were compared with the commonly
applied parsimony estimator (tfpars). Simulation results for tfpars (Fig. 2) indicate that
tfpars will strongly underestimate the true
mutation rate. Furthermore, tipais converges
quickly to its asymptotic value and therefore does not behave approximately linearly for nearly as long as do U and tim.
A standard method for comparing different estimators is by the mean squared
error (MSE) of the estimators (see Rice,
1995). The MSE of ti. and tfpars are compared in Figure 4. At low levels of divergence, the parsimony estimator performs
better or as well as the maximum likelihood estimators. The parsimony estimator
performs well for the low levels of diver-
349
4
6
8
True rate (u)
FIGURE 4. The mean squared error (MSE) of the
estimators ft (Likelihood), ftm (w/ prior), and #pars (Parsimony) as determined by simulations of the Jukes and
Cantor (1969) model of evolution on the 10-taxon tree;
10,000 simulations were performed. For tim, a gamma
distribution with a mean of 10.0 and shape parameter
equal to 0.5 was chosen as the prior distribution.
gence because the probability of two or
more substitutions occurring in the same
lineage is very low. However, the methods
differ substantially for high levels of divergence. In these cases, the maximum
likelihood estimator (u) performs substantially better than the parsimony estimator.
This result is also expected because the
parsimony estimator of the rate of substitution by definition will perform poorly
when the probability of several substitutions occurring on the same branch is high.
The main difference between the two
methods is that the maximum likelihood
estimator can distinguish between different rates even when the overall level of divergence is very high, whereas the parsimony estimator cannot.
Also, tim (w/ prior in Fig. 4) has a much
lower MSE over most of the parameter
space than does ti. In fact, the use of a prior distribution has in this case a clear variance-reducing effect. It therefore appears
that tim should be preferred over ti and tfpars
in most cases.
CORRELATION OF RATES IN MTDNA
With this method, the correlation of
rates among sites can be directly examined
by estimating the autocorrelation function
350
VOL. 46
SYSTEMATIC BIOLOGY
along the sequence. The autocorrelation
function provides information about the
degree to which the rate in one site can be
predicted from the rate in other sites a certain distance away. If the autocorrelation
function is estimated along the sequence
without taking the position in the codon
into account, a strong periodicity at distances of 3 bp will naturally occur in the
autocorrelation function because of the nature of the genetic code. Instead, the autocorrelation for each codon position (first,
second, third) should be examined separately. In this manner the correlation
among codons can be explored.
The estimator um was used to analyze
the protein-coding regions of the mitochondrial genome for 10 vertebrate taxa
(Cummings et al, 1995). The maximum
likelihood estimator is particularly appropriate for this use because it provides an
estimate of the rate at each site and is able
to deal with the very high rates of evolution encountered in vertebrate mtDNA. Estimates of topology and branch lengths of
the phylogeny were obtained by the maximum likelihood method used on a data
set of 14,416 protein-coding sites of mtDNA (Cummings et al., 1995). The HKY
model of DNA substitution with gammadistributed rates (Hasegawa et al., 1985;
Yang, 1993) was used to estimate the topology and branch lengths. The estimates
of these parameters were assumed to be
accurate under the model. This assumption is justified by the large number sites
included in the analysis and is also supported by the fact that other methods of tree
reconstruction provide identical topologies
(see Cummings et al., 1995). Subsequently,
the transition/transversion (Ts/Tv) bias, the
mean number of substitutions, and the
shape parameter of the gamma distribution
were estimated independently for the first,
second, and third codon positions (Table
1). Applying these estimated parameter
values, um was obtained for each site using
the HKY model (Hasegawa et al., 1985)
with a prior distribution given by a gamma distribution with shape parameter
identical to the empirically estimated value. The ratio of transitions to transversions
TABLE 1. Estimated parameter values for correlation of substitution rates in mtDNA.
Codon
position
Ts/Tv bias
1.38
1.19
2.47
Gamma
shape
parameter
Mean no.
substitutions/
site
0.45
0.27
1.90
0.80
6.16
was assumed to be the same for each site
within each codon position. After estimation of the relevant parameters of the mutation model, the autocorrelation function
was estimated using standard methods
(see Box and Jenkins, 1976) separately for
each codon position (Fig. 5).
There is a strong correlation of rates for
distances up to 40 bp apart in the first and
second positions and a peak in the autocorrelation function at distances of 75 codons. However, in the third codon position
there is little or no correlation of rates
among sites. None of the same data were
used when estimating the autocorrelation
function in the first and in the second position. The close similarity of the two functions therefore strongly suggests that the
pattern observed (Fig. 5) is not related to
sampling.
DISCUSSION
There may be several explanations for
the correlation in rates in second and first
positions among codons close to each other: the rate of mutation may vary among
regions, mutations may stretch over several codons (e.g., if they arise as gene conversions), or selection may act differently
in different regions. The existence of separate functional domains predicts such differences in the intensity of selection between
different regions. If the lack of correlation in
rates between codons in the third position is
caused by constancy and /or independence
in the rate among third positions, the third
explanation probably is correct. However,
because third positions are strongly saturated (Table 1), it is difficult to distinguish
among these three explanations. A high
degree of saturation will increase the variance in the estimate and will tend to ac-
1997
351
NIELSEN—ESTIMATION OF M T D N A SUBSTITUTION RATES
0.25
First position
Second position
Third position
50
100
150
Distance in codons
200
FIGURE 5. The autocorrelation function along the sequence of mtDNA for the first 200 codons estimated
applying the HKY model with a gamma distribution as the prior distribution. The values of the Ts/Tv bias,
the gamma shape parameter, the mean number of substitutions, and the base frequencies were first estimated
for each codon position separately. These values were then used when the rate was estimated at each site. The
curves were obtained by binomial smoothing.
centuate any deficiencies of the model of
DNA evolution. As more sequences of the
mitochondrial genome become available,
this problem may be alleviated by including less divergent sequences in the analysis.
The pattern observed in the first 40 codons at the first and second codon positions is not surprising. This type of firstorder correlation is expected if the rate at
one site is positively correlated with the
rate at neighboring sites (this pattern was
also observed by Yang, 1995). The general
observation that most proteins are organized into different functional domains
predicts such correlation. Nucleotides
specifying amino acids within functional
domains are expected to have lower rates
than nucleotides located between domains
because these supposedly are more free to
vary. Therefore, one does not need to invoke correlated changes, such as expected
under the covarion hypothesis (Fitch, 1971)
or when gene conversions are frequent, to
explain this pattern of correlated rates.
However, the second peak in the autocorrelation function (at a distance of 75 codons) cannot in itself be explained by the
observation that nucleotides close to each
other have similar rates. Instead, it may be
an effect of the organization of the secondary structure of the mitochondrially encoded proteins or it may be caused by variation in the GC content in the
mitochondrial genome.
There are several important consequences of the observation that rates are
strongly spatially correlated. First, the variance in the estimate of parameters obtained using phylogenetic methods may be
greater than that originally reported. Second, tests aimed at detecting gene conversion, recombination, or conflicting phylogenetic signal should be interpreted with
caution. Such tests typically examine
whether the distribution of certain site patterns along the sequence is clustered. For
example, some tests for gene conversion
search for runs of certain site patterns (see
Takahata, 1994, for a review). However,
352
VOL. 46
SYSTEMATIC BIOLOGY
some degree of clustering is expected simply because rates are spatially correlated.
If such tests implicitly or explicitly assume
that the rates at different sites are independent, the tests may not be valid when rates
are correlated.
Maximum likelihood estimate of the rate
at each site in a DNA sequence is an obvious method for analyzing rate variation
and the factors underlying molecular evolution. This method performs much better
than the commonly applied parsimony estimator when the levels of divergence are
high. However, the method assumes that
certain parameters, such as topology and
branch lengths of the underlying phylogenetic tree, are known. There may be
three situations where this assumption is
reasonable. First, when the number of sites
is very high one may trust the large sample properties of the maximum likelihood
estimator and conclude that under the
model the estimates obtained are equal to
the true values. Second, one may perform
robustness analyses (e.g., by simulations)
to assess to what degree the conclusions of
the study are affected by the assumption
of knowledge about the true parameter
values. Third, in some cases it may be established a priori that the conclusions are
not sensitive to any bias introduced by assuming a wrong topology or wrong
branch lengths. For example, in the present
case, assuming a wrong set of branch
lengths could not possibly in itself introduce any spurious correlation where no
correlation existed. However, if the method
is heuristically applied to data sets containing many fewer sites than in the present example or to examine other problems,
the robustness of the method should be
carefully analyzed. In such cases, it may
often be more appropriate to apply the
method of Yang (1995).
The maximum likelihood approach described here is applicable for purposes other than the study of the correlation of rates.
For example, the likelihood value at each
site provides a measure of the support of
the tree and the model at that site given
the estimated mutation rate. In this way, il
and the corresponding likelihood value
may be used jointly for assessing the difference in the support of a tree along the
sequence.
The potential dependence of the site-bysite rate estimates on the assumed branch
lengths and topology should not prevent
researchers from using this method. As
more sequence data become available, estimates of branch lengths will gradually
improve. Methods such as the one presented here, which take advantage of the
information obtained in estimates of the
topology and branch lengths of phylogenetic trees, are another means of examining many interesting evolutionary questions.
ACKNOWLEDGMENTS
I thank M. Slatkin, J. Huelsenbeck, Z. Yang, M. Hasegawa, C. Simon, T. Speed, and an anonymous reviewer for comments and discussion. I am grateful to Mike
Cummings for providing the aligned nucleotide sequences. This work was supported in part by NIH
grant GM40282 to M. Slatkin and by personal grants
to R.N. from the Danish Research Academy and the
Danish Research Council.
REFERENCES
ANDERSON, S., A. T. BANKIER, B. G. BARRELL, M. H.
DE BRUIJN, A. R. COULSON, J. DROUIN, I. C. EPERON,
D. P. NIERLICH, B. A. ROE, AND F. SANGER. 1981.
Sequence and organization of the human mitochondrial genome. Nature 290:457-465.
ANDERSON, S V M. H. L. DE BRUIJNA, A. R. COULSON,
I. C. EPERON, F. SANGER, AND I. G. YOUNG. 1982.
Complete sequence of bovine mitochondrial DNA.
Conserved features of the mammalian mitochondrial genome. J. Mol. Biol. 156:683-717.
ARNASON, U, A. GULLBERG, AND B. WIDEGREN. 1991.
The complete nucleotide sequence of the fin whale,
Balaenoptera physalus. J. Mol. Evol. 33:556-568.
ARNASON, U, AND E. JOHNSON. 1992. The complete
nucleotide sequence of the mitochondrial DNA of
the harbor seal, Phoca vitulina. J. Mol. Evol. 34:493505.
BIBB, M. J., R. A. VAN ETTEN, C. T. WRIGHT, M. W.
WALBERG, AND D. A. CLAYTON. 1981. Sequence and
gene organization of mouse mitochondrial DNA.
Cell 26:167-180.
Box, G. E. P., AND G. M. JENKINS. 1976. Time series
analysis: forecasting and control. Holden-Day, San
Francisco.
CHANG, Y.-S., F.-L. HUANG, AND T.-B. LO. 1994. The
complete nucleotide sequence and gene organization of carp (Cyprinus carpio) mitochondrial genome.
J. Mol. Evol. 38:138-155.
CUMMINGS, M., J. WAKELEY, AND S. P. OTTO. 1995.
Sampling DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.
1997
NIELSEN—ESTIMATION OF MTDNA SUBSTITUTION RATES
353
evolutionary rates of base substitutions through
comparative studies of nucleotide sequences. J. Mol.
gene organization of the chicken mitochondrial geEvol. 16:111-120.
nome, a novel gene order in higher vertebrates. J.
Mol. Biol. 121:599-634.
RICE, J. A. 1995. Mathematical statistics and data analysis. Duxbury Press, Belmont, California.
FELSENSTEIN, J. 1981. Evolutionary trees from DNA
sequences: A maximum likelihood approach. J. Mol. ROE, B. A., D.-P. M. WILSON, AND J. F.-H. WONG. 1985.
The complete nucleotide sequence of the Xenopus
Evol. 17:368-376.
laevis mitochondrial genome. J. Biol. Chem. 260:
FITCH, W. M. 1971. Rate of change of concomitantly
9759-9774.
variable codons. J. Mol. Evol. 1:84-96.
TAKAHATA, N. 1994. Comments on the detection of
GADALETA, G., G. PEPE, C. DE CANDIA, C. QUAGLIreciprocal recombination or gene conversion. ImARIELLO, E. SBIS, AND C. SACCONE. 1989. The complete nucleotide sequence of the Rattits norvegicus munogenetics 39:146-149.
mitochondrial genome: Signals revealed by com- TZENG , C. S., C.-F. Hu, S.-C. SHEN, AND P. C. HUANG.
parative analysis between vertebrates. J. Mol. Evol.
1992. The complete nucleotide sequence of the Crossostoma lacustre mitochondrial genome: Conserva28:497-516.
tion and variations among vertebrates. Nucleic AcHASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Datids Res. 20:4853-4858.
ing of the human-ape splitting by a molecular clock
UZZELL, T., AND K. W. CORBIN. 1971. Fitting discrete
of mitochondrial DNA. J. Mol. Evol. 22:160-174.
probability distributions to evolutionary events. SciHOROVITZ, I., AND A. MEYER. 1995. Systematics of
ence 172:1089-1096.
New World monkeys (Platyrrhini, Primates) based
on 16S mitochondrial DNA sequences: A compara- YANG, Z. 1993. Maximum-likelihood estimation of
phylogeny from DNA sequences when substitution
tive analysis of different weighting methods in clarates differ over sites. Mol. Biol. Evol. 39:105-111.
distic analysis. Mol. Phylogenet. Evol. 4:448-456.
JIN, L., AND M. NEI. 1990. Limitations of the evolu- YANG, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates
tionary parsimony of phylogenetic analysis. Mol.
over sites: Approximate methods. J. Mol. Evol. 39:
Biol. Evol. 7:82-102.
306-314.
JUKES, T. H., AND C. R. CANTOR. 1969. Evolution of
protein molecules. Pages 21-132 in Mammalian YANG, Z. 1995. A space-time process model for the
evolution of DNA sequences. Genetics 139:993-1005.
protein metabolism III (H. N. Munro, ed.). AcademYANG, Z. 1996. Among-site rate variation and its imic Press, New York.
pact on phylogenetic analyses. Trends Ecol. Evol.
KELLY, C, AND J. RICE. 1996. Modeling nucleotide
11:367-372.
evolution: A heterogeneous rate analysis. Math.
Biosci. 133:85-109.
Received 2 April 1996; accepted 15 January 1997
KIMURA, M. 1980. A simple method for estimating Associate Editor: Chris Simon
DESJARDINS, P., AND R. MORAIS. 1990. Sequence and