J Mol Evol (1989) 29:407-411
Journal of Molecular Evolution
© Springer-Verlag New York Inc. 1989

DNA Microenvironments and the Molecular Clock

C. Saccone,1 G. Pesole,1 and G. Preparata2

1 Centro Studi sui Mitocondri e Metabolismo Energetico, CNR, presso Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
2 Dipartimento di Fisica, Università di Milano, Italy

Summary. A few years ago we presented a stationary Markov model of gene evolution according to which only homologous genes from not too divergent species obeying the condition of being stationary may behave as reliable molecular clocks. A compartmentalized model of the nuclear genome in which the genes are distributed in compartments, the isochores, defined by their G+C content has been proposed recently. We have found that only homologous gene pairs that are stationary, and belong to the same isochore, can be used consistently for the determination of phylogeny and base substitution rate. In particular, for the rodent-human couple, only about half of the homologous gene pairs are stationary. Stationary genes evolve at the third silent codon position with the same velocity independent of the genes and base composition. By contrast, nonstationary genes display apparent rate values (pseudovelocities) that are significantly higher.
Our results cast doubt upon recent claims of a large acceleration in the rate of molecular evolution in rodents.

Key words: Stationary Markov process - Silent substitution rates - Pseudovelocities - Base composition - Isochore - Nuclear genes - Rodents - Human - Artiodactyls

Offprint requests to: C. Saccone

Introduction

The problem of the existence of a molecular clock that marks the evolution of biological macromolecules is just a quarter of a century old (Zuckerkandl and Pauling 1962), and it now appears to be rather well supported experimentally. However, granted that such a clock is at work, the question of why and how it works remains controversial. Is it according to the generation time or rather to the physical time? Is it regular? Or rather, does it accelerate and decelerate along particular lineages? All these questions appear in the long literature on the subject (see Kimura 1987; Wilson et al. 1987; Zuckerkandl 1987 for reviews).

Table 1. Nuclear genes used in our analysis together with their G+C content at the third position and the resulting stationarity check for each pair of genes compared at this position

Third silent position % G+C and stationarity check

Genes                H vs A       H vs R       A vs R
Interleukin-2        37/38  Y     37/55  N     38/55  N
Parathyroid          42/41  Y     42/50  N     41/50  N
Relaxin              36/38  Y     36/52  N     38/52  N
Thyrotropin β        41/42  Y     41/55  N     42/55  N
Proglucagon          50/54  Y     50/56  Y     54/56  Y
Na,K-ATPase β        53/58  Y     53/65  N     58/65  N
ANF                  70/68  Y     70/68  Y     68/68  Y
β-globin             71/69  Y     71/71  Y     69/71  Y
~-globin             66/67  Y     66/62  Y     67/62  Y
UPA                  66/64  Y     66/59  Y     64/59  Y
α-cardiac actin      -            65/59  Y     -
Prolactin            66/62  Y     66/52  N     62/52  N
Interferon α         66/77  N     66/66  Y     77/66  N
CCK                  77/75  Y     77/75  Y     75/75  Y
Metallothionein I    78/84  Y     78/85  Y     84/85  Y
POMC                 91/90  Y     91/85  Y     90/85  Y
α-globin             88/87  Y     88/71  N     87/71  N
Luteinizing β        82/83  Y     82/70  N     83/70  N
Insulin              -            83/73  N     -
α-skeletal actin     -            89/78  N     -

Data sources: All sequences were extracted through our software ACNUC (Gouy et al. 1985) from the GenBank (1987, release 50) and the EMBL (1988, release 14) data libraries. ANF (atrial natriuretic factor); UPA (urokinase-type plasminogen activator); CCK (cholecystokinin); POMC (pro-opiomelanocortin). The stationarity check was performed according to a chi-square test: Y = stationary, N = nonstationary

Table 2. Velocities of silent base substitution at the third position of codons in nuclear genes for pairwise comparisons between human (H), artiodactyls (A), and rodents (Rd)

Columns: Genes, H vs A, H vs Rd, A vs Rd, M vs R

Velocities for stationary couples (rows): Interleukin-2, Parathyroid, Relaxin, Thyrotropin β, Proglucagon, Na,K-ATPase β, ANF, β-globin, ~-globin, UPA, α-cardiac actin, Prolactin, Interferon α, CCK, Metallothionein I, POMC, Supergene

Pseudovelocities for nonstationary couples (rows): Interleukin-2, Parathyroid, Relaxin, Thyrotropin β, Prolactin, Interferon α, α-globin, Luteinizing β, Insulin, α-skeletal actin, Supergene

Entries, in the order in which they survive in the source (the assignment of values to rows and columns is not recoverable):

2.3 2.3 3.7 2.1 2.7 2.0 2.0 ± 1.3 ± 0.9 ± 2.8 ± 1.7 ± 2.2 ± 0.8 ± 0.7 3.1 ± 2.0 ± 2.1 ± 2.0 ± 2.7 ± 2.8 ± 2.0 ± 1.8 2.4 ± 2.9 1.8 ± 1.4 1.7 3.2 2.4 2.2 ± 1.2 ± 3.2 ± 2.9 ± 1.9 2.2 ± 2.6 3.4 ± 2.8 2.3 ± 1.7 4.1 ± 5.0 1.9 ± 1.3 1.6 ± 0.2 2.1 ± 0.3 2.1 ± 0.3 1.9 ± 0.5 3.5 2.5 3.2 3.5 3.8 ± 1.5 ± 2.1 ± 2.4 ± 2.2 ± 2.5 4.6 2.7 3.8 3.8 ± 4.0 ± 2.2 ± 3.5 ± 2.3 4.0 ± 3.3 3.1 ± 2.5 2.7 ± 2.3 2.6 ± 1.7 3.2 ± 1.9 2.8 ± 2.1 3.4 ± 2.8 4.1 ± 3.7

1.4 1.5 2.1 2.1 1.4 1.2 1.9 1.9 2.3 1.5 ± 1.0 ± 1.1 ± 1.5 ± 1.2 ± 0.7 ± 0.5 ± 1.3 ± 1.3 ± 1.6 ± 0.6 3.3 ± 2.1 1.7 0.8 1.5 1.4 1.9 1.1 2.2 ± 1.2 2.2 ± 1.4 2.6 ± 1.5

2.1 ± 1.5 2.1 ± 1.5 3.0 ± 0.4 2.9 ± 0.4

If a gene is sequenced in both mouse (M) and rat (R), we compared only the mouse gene to nonrodent genes. The velocities are substitutions per site per billion years. The assumed divergence times are 75 Myr for humans, artiodactyls, and rodents, and 30 Myr for rat and mouse (Wilson et al. 1977). This latter estimate, which stems from molecular comparisons of proteins encoded by nuclear genes (Wilson et al. 1977) and of mitochondrial genes (Lanave et al. 1985), contrasts with that used by Li et al. (1987), namely 15 Myr. Wilson et al. (1987) have pointed out that the rodent fossil record is actually consistent with an ancient separation time for rat and mouse

One way to answer such questions is to have at our disposal (1) an exhaustive set of homologous gene sequences, and (2) a well-defined and consistent mathematical model for gene evolution. Although the desirability of the first ingredient is obvious to everybody, it seems to us that the second has been underrated by most researchers in the field. For more than two decades, journals have published mathematical papers on molecular evolution that rest on an unstated assumption, namely that the sequences compared are alike in base composition. Therefore, we proposed as a model for molecular evolution the simplest of all stochastic processes: the stationary Markov process, whose mathematical structure is simple enough to allow us to draw reliable conclusions (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987).

The crucial feature of our stationary Markov model is that the base populations (the fractions of the four bases, denoted qi, where i = A, C, G, T) of homologous genes to be compared are, within the calculable statistical fluctuations, equal (stationary). In such cases one can, in a well-defined way, reconstruct, from the observed types of base differences, the dynamical characteristics of the evolutionary process and draw, given a reliable time of divergence between any two species, quantitative phylogenetic trees.
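As an illustration of the stationarity condition just described, the following is a minimal sketch, not the procedure of Lanave et al. (1984). It assumes two in-frame coding sequences, counts the bases at every third codon position (the analysis in this paper is restricted to silent third positions, which the sketch does not attempt), and compares the two compositions with an ordinary Pearson chi-square test of homogeneity at the 5% level with 3 degrees of freedom. All function names are hypothetical, and the choice of statistic and threshold is an assumption.

```python
from collections import Counter

BASES = "ACGT"
# 5% critical value of the chi-square distribution with 3 degrees of freedom
# (assumed threshold; the paper does not state the level it used).
CHI2_CRIT_DF3_5PCT = 7.815


def third_position_counts(coding_seq):
    """Count A, C, G, T at every third codon position of an in-frame sequence."""
    thirds = coding_seq.upper()[2::3]
    counts = Counter(b for b in thirds if b in BASES)
    return [counts[b] for b in BASES]


def base_fractions(counts):
    """Convert base counts to the fractions q_i (i = A, C, G, T)."""
    total = sum(counts)
    return [c / total for c in counts]


def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic for the 2 x 4 table of base counts."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    stat = 0.0
    for obs_a, obs_b in zip(counts_a, counts_b):
        col = obs_a + obs_b
        if col == 0:
            continue  # base absent from both sequences: no contribution
        exp_a = n_a * col / n  # expected count if both share one composition
        exp_b = n_b * col / n
        stat += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return stat


def is_stationary(counts_a, counts_b, critical=CHI2_CRIT_DF3_5PCT):
    """True when the two compositions do not differ at the chosen level."""
    return chi_square_homogeneity(counts_a, counts_b) < critical
```

With made-up counts, `is_stationary([25, 25, 25, 25], [24, 26, 25, 25])` holds, whereas `is_stationary([25, 25, 25, 25], [10, 40, 40, 10])` does not, because the second pair differs markedly in G+C content. A pair that fails such a check is the kind of comparison the text calls nonstationary.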
Our analysis of several mitochondrial and nuclear genes for a number of different species (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987) has shown that many homologous pairs of genes, even from closely related species (such as mammals), display different qi values, thus violating the condition of stationarity (sometimes called base compositional equilibrium).

The existence of nonrandom mutation pressure, usually called directional mutation pressure, toward a higher or lower G+C content of DNA, is a well-recognized phenomenon (Muto and Osawa 1987; Sueoka 1988). However, so far nobody has studied the implication of this important process upon the quantitative estimation of sequence divergence. Recently, Bernardi and coworkers (Bernardi and Bernardi 1985, 1986; Bernardi et al. 1985) presented evidence for the existence of well-defined microenvironments in the nuclear genome of warm-blooded vertebrates. They have shown that in these animals the genes are distributed in compartments, the isochores, defined by different G+C contents. This concept is supported by the recent data of Koremberg and Rykowski (1988) who have demonstrated a correlation between chromosome bands and genomic clustering. Our paper examines the bearing of the notion of isochores, i.e., of well-defined microenvironments, on the characterization of gene pairs suited for evolutionary studies.

Fig. 1. Synonymous rate for stationary (velocities) and nonstationary genes (pseudovelocities) between rodents and humans according to the Li method (Wu and Li 1985).

Methods and Results

Stationarity Tests. Using the method of Lanave et al.
(1984), we emphasize the determination of the size of statistical fluctuations, which are generally neglected in the majority of studies, even though the existence of large variations among different estimates is well recognized. A list of the nuclear genes used in our analysis for human, artiodactyls, and rodents (rat and mouse) together with their G+C content at the third silent codon position and a check on whether they are stationary is given in Table 1. Clearly, homologous genes may or may not be stationary. This probably corresponds to different gene compartmentalization in the nuclear genome of various species. In particular, the compartmentalizations in human and artiodactyls are seen to resemble each other more closely than either resembles that in rodents. Stationary and nonstationary pairs of genes presumably belong to the same or different isochores, respectively. In several cases this has been demonstrated experimentally (Bernardi et al. 1985).

Velocities and Pseudovelocities. Table 2 displays the calculated velocities of base substitution obtained by comparing, for each gene at the third silent codon position, the different species (human vs artiodactyls, human vs rodents, artiodactyls vs rodents, rat vs mouse). In order to reduce the statistical fluctuation, as in our previous papers (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al. 1987), we have linked together all the stationary genes and, separately, all the nonstationary genes, creating putative "supergenes," whose third silent codon positions were analyzed together in different couples (the larger the sequence, the smaller the error). The base substitution rates for these supergenes also appear in Table 2. In the case of a nonstationary comparison, to which our model cannot be applied (Lanave et al. 1984, 1985; Preparata and Saccone 1987; Saccone et al.
1987), the numbers refer to values denoted "pseudovelocities," obtained by arbitrarily applying symmetry to the counting matrices and by applying to them our stationary Markov model. [Any Markov method that calculates rates without checking how stationary the process is produces in fact pseudovelocities (e.g., in the analysis performed by Wu and Li (1985) and Li et al. (1987)).] Not unexpectedly, pseudovelocities are on the average higher than the velocities of stationary genes, which exhibit, within rather large statistical fluctuations (due to the short length of the sequences compared), a definite degree of universality. This is more evident in the supergene comparison, where the reduced statistical fluctuations allow us to see that all evolutionary velocities agree within the statistical errors and are definitely lower than the pseudovelocities (Table 2).

Finally, the evolutionary rates of stationary and nonstationary genes between rodents and human calculated according to the Li method (Wu and Li 1985) are reported in Fig. 1. It is clear that the great variability in synonymous substitution rates found by Wu and Li (1985) and Li et al. (1987) and emphasized by Mouchiroud and Gautier (1988) can be explained easily by the lack of stationarity between homologous pairs of genes. The data in Fig. 1 show indeed that among the stationary genes the synonymous rate is rather universal.

Discussion

Our analyses indicate that once the microenvironment has been taken into account adequately, molecular evolution within the boundaries of that microenvironment ticks at a well-defined and rather universal rate. Furthermore, the important role of gene compartments in the structure of the Markov process describing their molecular evolution gives us a new criterion for gene "orthology," the equivalence class of evolutionary dynamics: it is only for homologous genes belonging to the same compartments that a simple Markov process can hope to describe their evolution. This fact casts serious doubts upon some recent determinations of evolutionary rates by Li et al. (1987), who claim a large acceleration of molecular evolution of rodents. These claims are based on two unreliable assumptions: the use of pseudovelocities and the rat-mouse divergence time of 15 million years (Myr). By confining attention to true velocities and taking for rat-mouse the divergence time of 30 Myr, we obtain an evolutionary rate consistent with all other mammals (see Table 2).

Our results demonstrating the effect of base compositional constraints on estimates of sequence divergence also may explain some unexpected results in molecular phylogeny such as the closer relationship between birds and mammals than with reptiles or amphibians (Bishop and Friday 1987). Students of molecular evolution in warm-blooded vertebrates need to be particularly aware of this factor because compartmentalization is such a notable feature of these genomes (Bernardi and Bernardi 1986).

In addition, our results lead one to ask: how can the same rate be maintained between genes if the gene is localized in different compartments? Is the passage from one isochore to another gradual or sudden? And what are the functional consequences of such shifts? As demonstrated by Bernardi and Bernardi (1986), the compositional constraint of isochores is reflected in all three of the codon positions and in the regions surrounding the gene with the following decreasing order: third > first > second > flanking regions. This in turn could imply that not only the structural part of the gene but also its regulation can be changed by changing the microenvironment. Moreover, a shift from one isochore to another could be the basis for mechanisms by which homologous genes acquire a new function (paralogous genes), whereas the localization of a gene family in the same isochore could help to explain concerted evolution and some aspects of molecular drive (Dover 1987).

Acknowledgments. This work was partially financed by the M.P.I. (40%), by Progetto Finalizzato Ingegneria Genetica e Basi Molecolari delle Malattie Ereditarie (CNR), and by the Progetto Strategico Genoma Umano (CNR), Italy.

References

Bernardi G, Bernardi G (1985) Codon usage and genome composition. J Mol Evol 22:363-365
Bernardi G, Bernardi G (1986) Compositional constraints and genome evolution. J Mol Evol 24:1-11
Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F (1985) The mosaic genome of warm-blooded vertebrates. Science 228:953-958
Bishop MJ, Friday AE (1987) In: Patterson C (ed) Molecules and morphology in evolution: conflict or compromise? Cambridge University Press
Dover GA (1987) DNA turnover and molecular clock. J Mol Evol 26:47-58
EMBL (1988) Release 14.0. European Molecular Biology Laboratory, Heidelberg
GenBank (1987) Release 50.0.
Bolt, Beranek and Newman, Cambridge MA
Gouy M, Gautier C, Attimonelli M, Lanave C, Di Paola G (1985) ACNUC - a portable retrieval system for nucleic acid sequence database: logical and physical designs and usage. CABIOS 1:167-172
Kimura M (1987) Molecular evolutionary clock and the neutral theory. J Mol Evol 26:24-33
Koremberg JR, Rykowski MC (1988) Human genome organization: Alu, Lines, and the molecular structure of metaphase chromosome bands. Cell 53:391-400
Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86-93
Lanave C, Preparata G, Saccone C (1985) Mammalian genes as molecular clock? J Mol Evol 21:346-350
Li W-H, Tanimura M, Sharp PM (1987) An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J Mol Evol 25:330-342
Mouchiroud D, Gautier C (1988) High codon usage changes in mammalian genes. Mol Biol Evol 5:192-194
Muto A, Osawa S (1987) The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci USA 84:166-169
Preparata G, Saccone C (1987) A simple quantitative model of the molecular clock. J Mol Evol 26:7-15
Saccone C, Preparata G, Lanave C (1987) Chance, stochasticity and evolution: the Markov clock. In: Quagliariello E, Bernardi G, Ullmann A (eds) Enzyme adaptation to natural philosophy: heritage from Jacques Monod. Elsevier Science Publishers B.V. (Biomedical Division), pp 159-172
Sueoka N (1988) Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci USA 85:2653-2657
Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Annu Rev Biochem 46:573-639
Wilson AC, Ochman H, Prager EM (1987) Molecular time scale for evolution. Trends Genet 3:241-247
Wu C, Li W-H (1985) Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci USA 82:1741-1745
Zuckerkandl E (1987) On the molecular evolutionary clock. J Mol Evol 26:34-46
Zuckerkandl E, Pauling L (1962) Molecular disease, evolution and genetic heterogeneity. In: Kasha M, Pullman B (eds) Horizons in biochemistry. Academic Press, New York, pp 189-225

Received November 7, 1988/Revised and accepted March 10, 1989