DNA microenvironments and the molecular clock

J Mol Evol (1989) 29:407-411
Journal of Molecular Evolution
© Springer-Verlag New York Inc. 1989
C. Saccone,1 G. Pesole,1 and G. Preparata2
1 Centro Studi sui Mitocondri e Metabolismo Energetico, CNR, presso Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
2 Dipartimento di Fisica, Università di Milano, Italy
Summary. A few years ago we presented a stationary Markov model of gene evolution according to which only homologous genes from not too divergent species, obeying the condition of being stationary, may behave as reliable molecular clocks. A compartmentalized model of the nuclear genome, in which the genes are distributed in compartments, the isochores, defined by their G+C content, has been proposed recently. We have found that only homologous gene pairs that are stationary, and belong to the same isochore, can be used consistently for the determination of phylogeny and base substitution rate. In particular, for the rodent-human couple, only about half of the homologous gene pairs are stationary. Stationary genes evolve at the third silent codon position with the same velocity, independent of the genes and base composition. By contrast, nonstationary genes display apparent rate values (pseudovelocities) that are significantly higher. Our results cast doubt upon recent claims of a large acceleration in the rate of molecular evolution in rodents.
Key words: Stationary Markov process -- Silent substitution rates -- Pseudovelocities -- Base composition -- Isochore -- Nuclear genes -- Rodents -- Human -- Artiodactyls
Offprint requests to: C. Saccone

Introduction
The problem of the existence of a molecular clock that marks the evolution of biological macromolecules is just a quarter of a century old (Zuckerkandl and Pauling 1962), and it now appears to be rather well supported experimentally. However, granted that such a clock is at work, the question of why and how it works remains controversial. Is it ac-
Table 1. Nuclear genes used in our analysis together with their G+C content at the third position and the resulting stationarity check for each pair of genes compared at this position

                        Third silent position % G+C and stationarity check
Genes                   H vs A       H vs R       A vs R
Interleukin-2           37/38  Y     37/55  N     38/55  N
Parathyroid             42/41  Y     42/50  N     41/50  N
Relaxin                 36/38  Y     36/52  N     38/52  N
Thyrotropin β           41/42  Y     41/55  N     42/55  N
Proglucagon             50/54  Y     50/56  Y     54/56  Y
Na,K-ATPase β           53/58  Y     53/65  N     58/65  N
ANF                     70/68  Y     70/68  Y     68/68  Y
β-globin                71/69  Y     71/71  Y     69/71  Y
~-globin                66/67  Y     66/62  Y     67/62  Y
UPA                     66/64  Y     66/59  Y     64/59  Y
α-cardiac actin                      65/59  Y
Prolactin               66/62  Y     66/52  N     62/52  N
Interferon α            66/77  N     66/66  Y     77/66  N
CCK                     77/75  Y     77/75  Y     75/75  Y
Metallothionein I       78/84  Y     78/85  Y     84/85  Y
POMC                    91/90  Y     91/85  Y     90/85  Y
α-globin                88/87  Y     88/71  N     87/71  N
Luteinizing β           82/83  Y     82/70  N     83/70  N
Insulin                              83/73  N
α-skeletal actin                     89/78  N

Each entry gives the % G+C of the first/second species of the pair (H = human, A = artiodactyls, R = rodents), followed by the stationarity check.
Data sources: All sequences were extracted through our software ACNUC (Gouy et al. 1985) from the GenBank (1987, release 50) and the EMBL (1988, release 14) data libraries. ANF (atrial natriuretic factor); UPA (urokinase-type plasminogen activator); CCK (cholecystokinin); POMC (pro-opiomelanocortin). The stationarity check was performed according to a chi-square test: Y = stationary, N = nonstationary
Table 2. Velocities of silent base substitution at the third position of codons in nuclear genes for pairwise comparisons between human (H), artiodactyls (A), and rodents (Rd)
Genes    H vs A    H vs Rd    A vs Rd    M vs R
2.3
2.3
3.7
2.1
2.7
2.0
2.0
± 1.3
± 0.9
± 2.8
± 1.7
± 2.2
± 0.8
± 0.7
3.1 ±
2.0 ±
2.1 ±
2.0 ±
2.7 ±
2.8 ±
2.0 ± 1.8
2.4 ± 2.9
1.8 ± 1.4
1.7
3.2
2.4
2.2
± 1.2
± 3.2
± 2.9
± 1.9
2.2 ± 2.6
3.4 ± 2.8
2.3 ± 1.7
4.1 ± 5.0
1.9 ± 1.3
1.6 ± 0.2
2.1 ± 0.3
2.1 ± 0.3
1.9 ± 0.5
3.5
2.5
3.2
3.5
3.8
± 1.5
± 2.1
± 2.4
± 2.2
± 2.5
4.6
2.7
3.8
3.8
± 4.0
± 2.2
± 3.5
± 2.3
4.0 ± 3.3
3.1 ± 2.5
2.7 ± 2.3
2.6 ± 1.7
3.2 ± 1.9
2.8 ± 2.1
3.4 ± 2.8
4.1 ± 3.7
Velocities for stationary couples
Interleukin-2
Parathyroid
Relaxin
Thyrotropin β
Proglucagon
Na,K-ATPase β
ANF
β-globin
~-globin
UPA
α-cardiac actin
Prolactin
Interferon α
CCK
Metallothionein I
POMC
Supergene
1.4
1.5
2.1
2.1
1.4
1.2
1.9
1.9
2.3
1.5
± 1.0
± 1.1
± 1.5
± 1.2
± 0.7
± 0.5
± 1.3
± 1.3
± 1.6
± 0.6
3.3 ± 2.1
1.7
0.8
1.5
1.4
1.9
1.1
2.2 ± 1.2
2.2 ± 1.4
2.6 ± 1.5
Pseudovelocities for nonstationary couples
Interleukin-2
Parathyroid
Relaxin
Thyrotropin β
Prolactin
Interferon α
α-globin
Luteinizing β
Insulin
α-skeletal actin
Supergene
2.1 ± 1.5
2.1 ± 1.5
3.0 ± 0.4
2.9 ± 0.4
If a gene is sequenced in both mouse (M) and rat (R), we compared only the mouse gene to nonrodent genes. The velocities are in substitutions per site per billion years. The assumed divergence times are 75 Myr for humans, artiodactyls, and rodents, and 30 Myr for rat and mouse (Wilson et al. 1977). This latter estimate, which stems from molecular comparisons of proteins encoded by nuclear genes (Wilson et al. 1977) and of mitochondrial genes (Lanave et al. 1985), contrasts with that used by Li et al. (1987), namely 15 Myr. Wilson et al. (1987) have pointed out that the rodent fossil record is actually consistent with an ancient separation time for rat and mouse
cording to the generation time or rather to the physical time? Is it regular? Or rather, does it accelerate
and decelerate along particular lineages? All these
questions appear in the long literature on the subject
(see Kimura 1987; Wilson et al. 1987; Zuckerkandl
1987 for reviews). One way to answer such questions
is to have at our disposal (1) an exhaustive set of
homologous gene sequences, and (2) a well-defined
and consistent mathematical model for gene evolution. Although the desirability of the first ingredient is obvious to everybody, it seems to us that
the second has been underrated by most researchers
in the field.
Fig. 1. Synonymous rate for stationary (velocities) and nonstationary genes (pseudovelocities) between rodents and humans according to the Li method (Wu and Li 1985).

For more than two decades, journals have published mathematical papers on molecular evolution that rest on an unstated assumption, namely that the sequences compared are alike in base composition. Therefore, we proposed as a model for molecular evolution the simplest of all stochastic processes: the stationary Markov process, whose mathematical structure is simple enough to allow us to draw reliable conclusions (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987). The crucial feature of our stationary Markov model is that the base populations (the fractions of the four bases, denoted qi, where i = A,C,G,T) of homologous genes to be compared are, within the calculable statistical fluctuations, equal (stationary). In such cases one can, in a well-defined way, reconstruct, from the observed types of base differences, the dynamical characteristics of the evolutionary process and draw, given a reliable time of divergence between any two species, quantitative phylogenetic trees. Our analysis of several mitochondrial and nuclear genes for a number of different species (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987) has shown that many homologous pairs of genes, even from closely related species (such as mammals), display different qi values, thus violating the condition of stationarity (sometimes called base compositional equilibrium).
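The phrase "within the calculable statistical fluctuations" can be made concrete with a simple binomial estimate. What follows is an illustrative sketch only, not the test used in the paper (which follows Lanave et al. 1984): the symbols n and sigma, the treatment of the two sequences as independent samples, and the numerical values are all assumptions introduced here.

```latex
% Sampling error of a base fraction q_i estimated from n sites (binomial approximation)
\sigma(\hat{q}_i) = \sqrt{\frac{q_i\,(1 - q_i)}{n}}

% Expected spread of the difference between two sequences of n sites each
% that share the same underlying q_i (independence assumed)
\sigma\!\left(\hat{q}_i^{(1)} - \hat{q}_i^{(2)}\right) = \sqrt{\frac{2\,q_i\,(1 - q_i)}{n}}

% Worked example with assumed values q_i = 0.25 and n = 150 third codon positions
\sqrt{\frac{2 \times 0.25 \times 0.75}{150}} = \sqrt{0.0025} = 0.05
```

Under these assumptions, two genes of this length can differ by several percentage points in a base fraction through sampling alone, which is why the size of the fluctuation has to be calculated before a pair is declared stationary or not.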
The existence of nonrandom mutation pressure, usually called directional mutation pressure, toward a higher or lower G+C content of DNA, is a well-recognized phenomenon (Muto and Osawa 1987; Sueoka 1988). However, so far nobody has studied the implication of this important process upon the quantitative estimation of sequence divergence. Recently, Bernardi and coworkers (Bernardi and Bernardi 1985, 1986; Bernardi et al. 1985) presented evidence for the existence of well-defined microenvironments in the nuclear genome of warm-blooded vertebrates. They have shown that in these animals the genes are distributed in compartments, the isochores, defined by different G+C contents. This concept is supported by the recent data of Korenberg and Rykowski (1988), who have demonstrated a correlation between chromosome bands and genomic clustering. Our paper examines the bearing of the notion of isochores, i.e., of well-defined microenvironments, on the characterization of gene pairs suited for evolutionary studies.
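For a coding sequence, the quantity used throughout this paper to characterize the compositional microenvironment is the G+C content at the third codon position. The sketch below shows one way to compute it. It is not the authors' ACNUC-based procedure: the function name `gc3_percent` and the toy sequence are hypothetical, and the sketch counts every third position rather than only the silent ones.

```python
def gc3_percent(cds: str) -> float:
    """Percent G+C at the third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    thirds = cds[2:usable:3]              # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    gc = sum(base in "GC" for base in thirds)
    return 100.0 * gc / len(thirds)


# Hypothetical six-codon sequence: ATG GCC AAG TTC GGA TAA
toy_cds = "ATGGCCAAGTTCGGATAA"
print(round(gc3_percent(toy_cds), 1))
```

Applied to two homologous genes, the pair of values obtained this way corresponds to the kind of entry reported in Table 1.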
Methods and Results
Stationarity Tests. Using the method of Lanave et al. (1984), we
emphasize the determination of the size of statistical fluctuations,
which are generally neglected in the majority of studies, even
though the existence of large variations among different estimates
is well recognized. A list of the nuclear genes used in our analysis
for human, artiodactyls, and rodents (rat and mouse) together
with their G + C content at the third silent codon position and a
check on whether they are stationary is given in Table 1. Clearly,
homologous genes may or may not be stationary. This probably
corresponds to different gene compartmentalization in the nuclear genome of various species. In particular the compartmentalizations in human and artiodactyls are seen to possess a higher
similarity than in rodents. Stationary and nonstationary pairs of
genes presumably belong to the same or different isochores, respectively. In several cases this has been demonstrated experimentally (Bernardi et al. 1985).
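The stationarity check can be illustrated with a generic chi-square test of homogeneity on the base counts of two sequences. This is a minimal sketch under stated assumptions, not the procedure of Lanave et al. (1984): it treats the two sequences as independent samples, uses a 5% significance level, and all function names and counts are hypothetical.

```python
# Upper 5% point of the chi-square distribution with 3 degrees of freedom
# (2 sequences x 4 bases gives (2 - 1) * (4 - 1) = 3 degrees of freedom).
CHI2_CRIT_DF3_P05 = 7.815


def base_counts(seq):
    """Counts of A, C, G, T (in that order) in a sequence."""
    seq = seq.upper()
    return [seq.count(b) for b in "ACGT"]


def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic comparing the base compositions of two sequences."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # a base absent from both sequences contributes nothing
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def is_stationary(counts_a, counts_b, critical=CHI2_CRIT_DF3_P05):
    """True when the compositions do not differ significantly (sketch only)."""
    return chi_square_homogeneity(counts_a, counts_b) <= critical


# Hypothetical third-position base counts (A, C, G, T) for two homologous genes
gene_x = [30, 20, 20, 30]
gene_y = [10, 40, 40, 10]
print(is_stationary(gene_x, gene_x), is_stationary(gene_x, gene_y))
```

With short genes the statistic has little power, so a pair that passes the check is "not shown to be nonstationary" rather than proven stationary.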
Velocities and Pseudovelocities. Table 2 displays the calculated velocities of base substitution obtained by comparing, for each gene at the third silent codon position, the different species pairs (human vs artiodactyls, human vs rodents, artiodactyls vs rodents, rat vs mouse). In order to reduce the statistical fluctuation, as in our previous papers (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987), we have linked together all the stationary genes and, separately, all the nonstationary genes, creating putative "supergenes," whose third silent codon positions were analyzed together in different couples (the larger the sequence, the smaller the error). The base substitution rates for these supergenes also appear in Table 2.
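The supergene construction and the remark that a larger sequence gives a smaller error can be sketched as follows. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, all third positions are taken rather than only the silent ones, and the error is approximated by the binomial standard error of a proportion.

```python
import math


def third_positions(cds):
    """Third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3
    return cds[2:usable:3]


def supergene(cds_list):
    """Link the third positions of several genes into one long sequence."""
    return "".join(third_positions(cds) for cds in cds_list)


def proportion_std_error(p, n):
    """Binomial standard error of an observed proportion p over n sites."""
    return math.sqrt(p * (1.0 - p) / n)


# Two hypothetical two-codon genes linked into one supergene
print(supergene(["ATGGCC", "AAGTTC"]))

# The same observed fraction of differing sites, measured on a single gene
# (100 sites) and on a supergene sixteen times longer (1600 sites)
se_gene = proportion_std_error(0.3, 100)
se_supergene = proportion_std_error(0.3, 1600)
print(se_gene / se_supergene)
```

Because the error scales as one over the square root of the number of sites, a sixteenfold longer sequence reduces it fourfold. For a pairwise comparison the genes must be linked in the same order in both species so that homologous positions stay aligned.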
In the case of a nonstationary comparison, to which our model cannot be applied (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987), the numbers refer to values denoted "pseudovelocities," obtained by arbitrarily applying symmetry to the counting matrices and by applying to them our stationary Markov model. [Any Markov method that calculates rates without checking whether the process is stationary in fact produces pseudovelocities (e.g., the analyses performed by Wu and Li (1985) and Li et al. (1987)).]
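To see what forcing symmetry on a counting matrix does, consider the sketch below. The text says only that symmetry is applied "arbitrarily"; averaging the matrix with its transpose is an assumption made here for illustration, and the function names and toy sequences are hypothetical.

```python
BASES = "ACGT"


def counting_matrix(seq_a, seq_b):
    """4x4 matrix: entry [i][j] counts aligned sites with base i in seq_a and base j in seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    index = {b: i for i, b in enumerate(BASES)}
    counts = [[0.0] * 4 for _ in range(4)]
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in index and y in index:  # skip gaps and ambiguity codes
            counts[index[x]][index[y]] += 1.0
    return counts


def symmetrize(counts):
    """Average the matrix with its transpose (assumed form of the symmetrization)."""
    return [[(counts[i][j] + counts[j][i]) / 2.0 for j in range(4)] for i in range(4)]


def marginals(counts):
    """Row sums (composition of seq_a) and column sums (composition of seq_b)."""
    rows = [sum(row) for row in counts]
    cols = [sum(counts[i][j] for i in range(4)) for j in range(4)]
    return rows, cols


# Hypothetical aligned third positions with different base compositions
raw = counting_matrix("AAAACCGT", "GGAACCGT")
print(marginals(raw))
print(marginals(symmetrize(raw)))
```

In the raw matrix the row and column sums differ, reflecting the different compositions of the two sequences. After symmetrization they are forced to be equal, so the compositional difference is hidden rather than removed, which is why a rate computed from such a matrix is only a pseudovelocity.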
Not unexpectedly, pseudovelocities are on the average higher
than the velocities of stationary genes, which exhibit, within rather large statistical fluctuations (due to the short length of the
sequence compared), a definite degree of universality. This is
more evident in the supergene comparison where the reduced
statistical fluctuations so obtained allow us to see that all evolutionary velocities are significantly the same, and definitely lower than pseudovelocities (Table 2).
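Because Table 2 expresses velocities in substitutions per site per billion years, each value depends directly on the divergence time assumed for the pair. The sketch below uses the common convention v = K / (2T), where K is the estimated number of substitutions per site separating two sequences and the factor 2 accounts for change along both lineages. This convention, the function name, and the value of K are assumptions for illustration; the estimator actually used here is that of Lanave et al. (1984).

```python
def velocity_per_site_per_gyr(k_subst_per_site, divergence_myr):
    """Rate in substitutions per site per billion years, assuming v = K / (2T)."""
    if divergence_myr <= 0:
        raise ValueError("divergence time must be positive")
    years = divergence_myr * 1e6
    return k_subst_per_site / (2.0 * years) * 1e9


# Hypothetical rat-mouse divergence estimate of 0.12 substitutions per site
k_rat_mouse = 0.12
v_30 = velocity_per_site_per_gyr(k_rat_mouse, 30)  # divergence time used in this paper
v_15 = velocity_per_site_per_gyr(k_rat_mouse, 15)  # divergence time used by Li et al. (1987)
print(v_30, v_15, v_15 / v_30)
```

Halving the assumed divergence time doubles the computed rate for the same sequence data, so part of any apparent rodent acceleration can come from the choice of 15 rather than 30 Myr alone.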
Finally, the evolutionary rates of stationary and nonstationary
genes between rodents and human calculated according to the Li
method (Wu and Li 1985) are reported in Fig. 1. It is clear that
the great variability in synonymous substitution rates found by
Wu and Li (1985) and Li et al. (1987) and emphasized by Mouchiroud and Gautier (1988) can be explained easily by the lack
of a stationary phase between homologous pairs of genes. The
data in Fig. 1 show indeed that among the stationary genes the
synonymous rate is rather universal.
Discussion
Our analyses indicate that once the microenvironment has been taken into account adequately, molecular evolution within the boundaries of that microenvironment ticks at a well-defined and rather universal rate. Furthermore, the important role of gene compartments in the structure of the Markov process describing their molecular evolution gives us a new criterion for gene "orthology," the equivalence class of evolutionary dynamics: it is only for homologous genes belonging to the same compartments that a simple Markov process can hope to describe their evolution. This fact casts serious doubts upon some recent determinations of evolutionary rates by Li et al. (1987), who claim a large acceleration of molecular evolution of rodents. These claims are based on two unreliable assumptions: the use of pseudovelocities and the rat-mouse divergence time of 15 million years (Myr). By confining attention to true velocities and taking for rat-mouse the divergence time of 30 Myr, we obtain an evolutionary rate consistent with all other mammals (see Table 2).
Our results demonstrating the effect of base compositional constraints on estimates of sequence divergence also may explain some unexpected results in molecular phylogeny, such as the closer relationship of birds to mammals than to reptiles or amphibians (Bishop and Friday 1987). Students of molecular evolution in warm-blooded vertebrates need to be particularly aware of this factor because compartmentalization is such a notable feature of these genomes (Bernardi and Bernardi 1986).
In addition, our results lead one to ask: how can the same rate be maintained between genes if the gene is localized in different compartments? Is the passage from one isochore to another gradual or sudden? And what are the functional consequences of such shifts? As demonstrated by Bernardi and Bernardi (1986), the compositional constraint of isochores is reflected in all three of the codon positions and in the regions surrounding the gene with the following decreasing order: third > first > second > flanking regions. This in turn could imply that not only the structural part of the gene but also its regulation can be changed by changing the microenvironment. Moreover, a shift from one isochore to another could be the basis for mechanisms by which homologous genes acquire a new function (paralogous genes), whereas the localization of a gene family in the same isochore could help to explain concerted evolution and some aspects of molecular drive (Dover 1987).
Acknowledgments. This work was partially financed by the M.P.I. (40%), by Progetto Finalizzato Ingegneria Genetica e Basi Molecolari delle Malattie Ereditarie (CNR), and by the Progetto Strategico Genoma Umano (CNR), Italy.
References
Bernardi G, Bernardi G (1985) Codon usage and genome composition. J Mol Evol 22:363-365
Bernardi G, Bernardi G (1986) Compositional constraints and
genome evolution. J Mol Evol 24:1-11
Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G,
Meunier-Rotival M, Rodier F (1985) The mosaic genome
of warm-blooded vertebrates. Science 228:953-958
Bishop MJ, Friday AE (1987) In: Patterson C (ed) Molecules
and morphology in evolution: conflict or compromise? Cambridge University Press
Dover GA (1987) DNA turnover and molecular clock. J Mol
Evol 26:47-58
EMBL (1988) Release 14.0. European Molecular Biology Laboratory, Heidelberg
GenBank (1987) Release 50.0. Bolt, Beranek and Newman,
Cambridge MA
Gouy M, Gautier C, Attimonelli M, Lanave C, Di Paola G
(1985) ACNUC-- a portable retrieval system for nucleic acid
sequence database: logical and physical designs and usage.
CABIOS 1:167-172
Kimura M (1987) Molecular evolutionary clock and the neutral
theory. J Mol Evol 26:24-33
Korenberg JR, Rykowski MC (1988) Human genome organization: Alu, Lines, and the molecular structure of metaphase
chromosome bands. Cell 53:391-400
Lanave C, Preparata G, Saccone C, Serio G (1984) A new
method for calculating evolutionary substitution rates. J Mol
Evol 20:86-93
Lanave C, Preparata G, Saccone C (1985) Mammalian genes
as molecular clock? J Mol Evol 21:346-350
Li W-H, Tanimura M, Sharp PM (1987) An evaluation of the
molecular clock hypothesis using mammalian DNA sequences. J Mol Evol 25:330-342
Mouchiroud D, Gautier C (1988) High codon usage changes in
mammalian genes. Mol Biol Evol 5:192-194
Muto A, Osawa S (1987) The guanine and cytosine content of
genomic DNA and bacterial evolution. Proc Natl Acad Sci
USA 84:166-169
Preparata G, Saccone C (1987) A simple quantitative model of
the molecular clock. J Mol Evol 26:7-15
Saccone C, Preparata G, Lanave C (1987) Chance, stochasticity
and evolution: the Markov clock. In: Quagliariello E, Bernardi
G, Ullmann A (eds) Enzyme adaptation to natural philosophy: heritage from Jacques Monod. Elsevier Science Publishers B.V. (Biomedical Division), pp 159-172
Sueoka N (1988) Directional mutation pressure and neutral
molecular evolution. Proc Natl Acad Sci USA 85:2653-2657
Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Annu Rev Biochem 46:573-639
Wilson AC, Ochman H, Prager EM (1987) Molecular time scale
for evolution. Trends Genet 3:241-247
Wu C, Li W-H (1985) Evidence for higher rates of nucleotide
substitution in rodents than in man. Proc Natl Acad Sci USA
82:1741-1745
Zuckerkandl E (1987) On the molecular evolutionary clock. J
Mol Evol 26:34-46
Zuckerkandl E, Pauling L (1962) Molecular disease, evolution
and genetic heterogeneity. In: Kasha M, Pullman B (eds) Horizons in biochemistry. Academic Press, New York, pp 189-225
Received November 7, 1988/Revised and accepted March 10,
1989