Gene 261 (2000) 189-195
www.elsevier.com/locate/gene

The late stage of genetic code structuring took place at a high temperature

Massimo Di Giulio*

International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, 80125 Naples, Italy

Accepted 30 October 2000. Received by G. Bernardi

Abstract

The correlation between the optimal growth temperature of organisms and a thermophily index, based on the propensity of amino acids to enter more frequently into (hyper)thermophile proteins, is used to establish whether genetic code structuring took place at a low or a high temperature. If the number of codons attributed to the various amino acids in the genetic code constitutes an estimate of the mean amino acid composition of the proteins produced when the genetic code was definitively structured, then a thermophily index can also be associated with the genetic code. Comparing this value with the thermophily index sampled from alignments of protein sequences from mesophile, thermophile and hyperthermophile species makes it possible to establish, with extremely high statistical confidence, that the late stage of genetic code structuring took place in a hyperthermophile (or thermophile) ‘organism’. Moreover, the 95% confidence interval of the temperature at which the genetic code was fixed turned out to be 91 ± 24°C. These observations seem to support the hypothesis that the origin of life might have taken place at a high temperature.

© 2000 Elsevier Science B.V. All rights reserved.

Keywords: Origin of life; Genetic code origin; Thermostability; Thermophily index; Optimal growth temperature

1. Introduction

There is a lively open debate on whether the origin of life took place at a high or a low temperature (Achenbach-Richter et al., 1987; Pace, 1991; Holm, 1992; Miller and Lazcano, 1995; Forterre, 1995, 1996; Forterre et al., 1995; Russell et al., 1998; Wachtershauser, 1998).
A hot origin of life is favoured, though not proven, by (i) phylogenetic evidence showing that the node of the last universal common ancestor (LUCA) in the tree derived from 16S ribosomal RNA is surrounded by hyperthermophile species, which moreover have short branches (Woese, 1987; Wachtershauser, 1988a,b; Woese et al., 1990; Pace, 1991; Stetter, 1995), and (ii) the presumed conditions of the primordial Earth (Baross and Hoffman, 1985; Nisbet, 1985; Shock, 1996), which seem to resemble those of the biotopes of hyperthermophiles. A hot origin of life is also supported by some of the theories put forward to explain such an origin (Russell et al., 1998; Wachtershauser, 1988a,b, 1998). A low-temperature origin is mainly favoured by supporters of the heterotroph theory of the origin of life (Miller and Lazcano, 1995), who maintain that at a high temperature the majority of the building blocks of life would decompose and that a high temperature would be incompatible with the expectations of the RNA world (Miller and Lazcano, 1995; Forterre, 1995, 1996; Forterre et al., 1995). The phylogenetic data favouring a hot origin of life (Woese, 1987; Wachtershauser, 1988a,b; Woese et al., 1990; Pace, 1991; Stetter, 1995) have been weakened by the observation that the guanine plus cytosine content of the ancestral rRNA sequences of the LUCA does not seem to be compatible with a hyperthermophile nature (Galtier et al., 1999), although this conclusion has been questioned (Di Giulio, 2000). These analyses (Galtier et al., 1999; Di Giulio, 2000) are subject to a number of limitations: the intrinsic uncertainty in reconstructing the ancestral rRNA sequences of the LUCA, and the uncertainty deriving from the topology of the phylogenetic tree used in such a reconstruction, i.e. the uncertainty of the position of the root in the tree of life (Philippe and Forterre, 1999; Forterre and Philippe, 1999) and of the relative positions of the various species on this tree (Philippe et al., 2000; Stiller and Hall, 1999). Furthermore, the observation that the LUCA was a mesophile ‘organism’ (Galtier et al., 1999) might imply, though only weakly, that life originated at a low temperature, because the physical environment in which life originated might have changed between the origin of life and the evolution of the LUCA (Arrhenius et al., 1999); the force of this argument, however, seems to depend above all on the unclear nature (Doolittle, 2000) of the LUCA (Di Giulio, 2000).

In this paper, a method that seems to remove most of these limitations is used. Moreover, the conclusion reached refers to an evolutionary time earlier than the evolution of the LUCA, namely the late stage of genetic code organisation. Because this stage is closer to the origin of life, the conclusion can more readily be ascribed to the time at which life originated on our planet.

* Tel.: +39-081-7257313; fax: +39-081-5936123. E-mail address: [email protected] (M. Di Giulio).

0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00522-9

2. Materials and methods

The majority of the sequences used in the analysis were taken from the NCBI database using the BLASTP program (Altschul et al., 1990). In particular, the name of a specific protein belonging to a randomly chosen organism was used to obtain its sequence, which was then used as a BLASTP probe (Altschul et al., 1990) to retrieve all the orthologous proteins present in the database. In many cases, to ensure that all the orthologous proteins had been identified, amino acid sequences from the Archaea, Bacteria and Eukarya domains were used as probes for a given protein.
The different proteins in the analysis were chosen at random and were not paralogous to one another. The amino acid sequences were aligned using the CLUSTALX program (Thompson et al., 1997) with its default parameters. Only the alignment zones between highly conserved regions were retained in the analysis, and all amino acid sites containing at least one gap were eliminated. All the alignments used in the analysis are available upon request from the author. The values of the optimal growth temperatures (Topt) of the various organisms were taken from Jacobs and Gerstein (1960) and from Staley et al. (1984). In a number of cases, especially for the eukaryotes, these values were found by consulting the specialised literature.

3. Results

3.1. The construction of a thermophily index

Two works (Haney et al., 1999; McDonald et al., 1999) have compared protein sequences from mesophile and thermophile-hyperthermophile species. McDonald et al.'s Fig. 1 and Haney et al.'s Table 1 make it possible to derive three amino acid rankings, two for the Methanococcus species (Haney et al., 1999; McDonald et al., 1999) and one for the Bacillus species (McDonald et al., 1999). I attributed the highest rank to the amino acids most frequently found in the (hyper)thermophiles. The mean rank of each amino acid was then calculated from the two Methanococcus rankings (Haney et al., 1999; McDonald et al., 1999). Finally, the mean rank of each amino acid was calculated between the Bacillus ranking (McDonald et al., 1999) and the Methanococcus mean ranks obtained in the previous step. These values are reported in Table 1. They can be considered as thermophily ranks in the sense that a high value indicates a greater propensity for that amino acid to enter the proteins of the (hyper)thermophiles.

Table 1
Thermophily ranks (see text for their definition)

Amino acid   Rank
Arg          19.50
Trp          18.25
Pro          17.25
Ile          15.50
Tyr          14.75
Cys          13.75
Leu          13.75
Val          13.00
Glu          11.25
Ala          11.00
Phe          10.25
Lys          10.00
His           9.25
Met           7.00
Gly           6.00
Asp           6.00
Gln           5.25
Thr           5.00
Asn           2.25
Ser           1.00

The thermophily index (TI) that can be associated with any one protein is defined as

$$\mathrm{TI} = \frac{1}{N}\sum_{j=1}^{N} R_j$$

where R_j is the thermophily rank (Table 1) of the j-th amino acid and N is the total number of amino acids in the protein considered. To calculate this index, a simple program reading a FASTA-format input file was written (available upon request from the author).

3.2. The existence of a correlation between the optimal growth temperature of organisms and the thermophily index

It was found that there is a strong correlation between the optimal growth temperature of the various organisms and the thermophily index (Figs. 1 and 2). The regression in Fig. 1 was calculated starting from the multiple alignment of the signal recognition particle (54 kDa) as obtainable at the web site http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html. After removing some isoenzyme sequences and some partial sequences and eliminating some regions of dubious alignment, a multiple alignment of 46 sequences was obtained, each one 356 amino acids long. The regression line (Topt = −444.433 + 48.082 TI) (Fig. 1) between the optimal growth temperature of the various organisms and the thermophily index is highly significant (F = 51.8, d.f. = 45, P ≪ 10⁻³). This result was confirmed for many other proteins (Table 2), with a significance of the regression lines that in several cases is equivalent to that shown in Fig. 1; in other cases the probability was lower than or equal to 10⁻³, three proteins gave a probability of about 2%, and two proteins showed no significance (P ≈ 0.40); the latter were not included in Table 2.

Fig. 1. The correlation between the optimal growth temperature (Topt) of the various organisms and the thermophily index. This correlation refers to 46 amino acid sequences of the signal recognition particle (54 kDa). See text for further information.

Fig. 2. The correlation between the optimal growth temperature (Topt) of the various organisms and the thermophily index (TI) calculated for 25 amino acid sequences of S-adenosyl-L-homocysteine hydrolase. The vertical line refers to the value that the TI assumes in the genetic code. See text for further information.

3.3. The genetic code originated in a thermophile or hyperthermophile ‘organism’

There has been a long debate on the existence of a positive correlation between the number of codons specifying amino acids in the genetic code and the frequency with which amino acids appear in proteins (King and Jukes, 1969; Jukes et al., 1975; Dufton, 1997). We must therefore believe, at least in part, that the number of codons coding for the various amino acids in the genetic code was a variable subject to the influence of natural selection at the time of genetic code origin. This viewpoint seems to be supported by the observation that the number of codons attributed to amino acids in the genetic code correlates with the molecular weight of the corresponding amino acids, or more generally with their ‘size’ (Hasegawa and Miyata, 1980; Di Giulio, 1989; Taylor and Coates, 1989).
This correlation becomes highly significant if we replace the number of codons that code for each amino acid with the frequency with which amino acids appear in proteins (Di Giulio, 1989). This indicates that the number of codons was a variable subject to selection (Di Giulio, 1989), because the ‘size’ of amino acids, with which the number of codons correlates, is important for protein structure (Epstein, 1967; Grantham, 1974). All this seems to justify the assumption that the genetic code structure, in terms of the number of codons attributed to the various amino acids, supplies the mean amino acid composition of the proteins produced during the evolutionary stage in which the genetic code was definitively structured. If this is true, we can calculate the value that the thermophily index (TI) assumes in the genetic code; this value would indicate the mean thermophily status of the average protein produced by the genetic code at that time. The TI of the genetic code is equal to 10.684. It was calculated by multiplying the thermophily rank (Table 1) of each amino acid by the number of codons that specify that amino acid in the genetic code, summing these 20 products and dividing the result by 61:

$$\mathrm{TI}_{\mathrm{code}} = \frac{1}{61}\sum_{i=1}^{20} c_i R_i$$

where R_i is the thermophily rank of the i-th amino acid and c_i is the number of codons assigned to it.

To establish whether the TI = 10.684 associated with the mean protein produced at the time of genetic code structuring belongs to a mesophile, thermophile or hyperthermophile ‘organism’, the TI variable was sampled (Table 2). In other words, for each protein multiple alignment (Table 2), the mean TI was calculated over all the mesophile sequences (Low T) and over all the thermophile-hyperthermophile sequences (High T), together with the relative standard deviations (Table 2).

Table 2
The mean values of the thermophily index (TI) for the proteins used in the analysis, both for mesophile species (Low T) and for thermophile-hyperthermophile species (High T), with their relative standard deviations (a)

Length = alignment length (number of amino acids); TI = mean thermophily index; SD = standard deviation; n = number of sequences; Low = mesophiles (Low T); High = thermophiles-hyperthermophiles (High T).

Protein                                     Length    TI Low   TI High  SD Low SD High  n Low n High
Carbamoyl-phosphate synthetase                 501    10.315    10.533   0.117   0.154     26     11
Signal recognition particle 54 kDa             356    10.002    10.654   0.250   0.276     35     11
Threonyl-tRNA synthetase                       374    10.729    11.280   0.198   0.231     26      8
Aspartate carbamoyltransferase                 189    10.358    10.784   0.297   0.249     24      6
Acetylornithine aminotransferase               226    10.044    10.317   0.270   0.214     14      7
Inosine-5'-monophosphate dehydrogenase         338    10.059    10.657   0.266   0.099     18      7
Beta subunit of tryptophan synthase            354    10.289    10.576   0.160   0.147     17      9
S-adenosyl-L-homocysteine hydrolase            343    10.023    10.586   0.082   0.228     16      9
Cell division protein (ftsZ)                   296     9.848    10.260   0.154   0.198     20      8
Glutamate-1-semialdehyde aminotransferase      321    10.260    10.656   0.299   0.209     20      5
Glutamine-fructose-6-phosphate transferase     339    10.458    10.771   0.232   0.188     22      6
Phosphoribosylamine-glycine ligase             243    10.439    10.685   0.261   0.185     22      8
Anthranilate synthetase alpha-subunit          268    10.714    10.964   0.139   0.198     14     11
Threonine synthase                             179     9.937    10.299   0.310   0.136     15      8
Aspartate aminotransferase                     302    10.317    10.635   0.267   0.274     10     11
Succinyl-CoA synthetase beta subunit           288    10.347    10.989   0.154   0.185     22      6
Adenylosuccinate synthase                      240    10.217    10.444   0.190   0.138     20      8
Acetyl-CoA synthetase                          257    10.503    10.931   0.140   0.159     16      5
Valyl-tRNA synthetase                          434    10.703    11.267   0.218   0.173     28      9
Glutamate dehydrogenase (GDH)                  293    10.282    10.754   0.110   0.193     14     12
CTP synthetase                                 356    10.497    10.601   0.112   0.132     20      8

(a) The table also indicates the number of protein sequences used for each protein, again for the mesophile group (Low T) and for the thermophile-hyperthermophile group (High T), as well as the length of the multiple alignment of the amino acid sequences after the regions of dubious alignment were removed.
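A minimal Python sketch of the calculations in this subsection follows. It assumes the standard genetic code with 61 sense codons and, as an assumption of ours (the author's own FASTA program is not described further), skips non-standard residue symbols. `thermophily_index` applies the TI definition to one sequence, `genetic_code_ti` gives the codon-weighted TI, and the Table 2 summaries are reduced to the general means, standard deviations and one-sample t statistics that the text goes on to describe. The reported 0.203 and 0.192 are reproduced by averaging the per-protein standard deviations with weights n − 1, which is our reading of "weighted for the corresponding degrees of freedom". All function names are illustrative.

```python
import math

# Thermophily ranks of Table 1, keyed by one-letter amino acid code.
RANKS = {
    "R": 19.50, "W": 18.25, "P": 17.25, "I": 15.50, "Y": 14.75,
    "C": 13.75, "L": 13.75, "V": 13.00, "E": 11.25, "A": 11.00,
    "F": 10.25, "K": 10.00, "H": 9.25, "M": 7.00, "G": 6.00,
    "D": 6.00, "Q": 5.25, "T": 5.00, "N": 2.25, "S": 1.00,
}

# Sense codons per amino acid in the standard genetic code (61 in total).
CODONS = {
    "R": 6, "W": 1, "P": 4, "I": 3, "Y": 2, "C": 2, "L": 6, "V": 4, "E": 2, "A": 4,
    "F": 2, "K": 2, "H": 2, "M": 1, "G": 4, "D": 2, "Q": 2, "T": 4, "N": 2, "S": 6,
}

# Table 2 summaries in table order: (mean TI, SD, number of sequences).
LOW_T = [
    (10.315, 0.117, 26), (10.002, 0.250, 35), (10.729, 0.198, 26),
    (10.358, 0.297, 24), (10.044, 0.270, 14), (10.059, 0.266, 18),
    (10.289, 0.160, 17), (10.023, 0.082, 16), (9.848, 0.154, 20),
    (10.260, 0.299, 20), (10.458, 0.232, 22), (10.439, 0.261, 22),
    (10.714, 0.139, 14), (9.937, 0.310, 15), (10.317, 0.267, 10),
    (10.347, 0.154, 22), (10.217, 0.190, 20), (10.503, 0.140, 16),
    (10.703, 0.218, 28), (10.282, 0.110, 14), (10.497, 0.112, 20),
]
HIGH_T = [
    (10.533, 0.154, 11), (10.654, 0.276, 11), (11.280, 0.231, 8),
    (10.784, 0.249, 6), (10.317, 0.214, 7), (10.657, 0.099, 7),
    (10.576, 0.147, 9), (10.586, 0.228, 9), (10.260, 0.198, 8),
    (10.656, 0.209, 5), (10.771, 0.188, 6), (10.685, 0.185, 8),
    (10.964, 0.198, 11), (10.299, 0.136, 8), (10.635, 0.274, 11),
    (10.989, 0.185, 6), (10.444, 0.138, 8), (10.931, 0.159, 5),
    (11.267, 0.173, 9), (10.754, 0.193, 12), (10.601, 0.132, 8),
]


def thermophily_index(sequence):
    """TI = (sum of R_j) / N over the standard residues of one protein.

    Assumption: gaps, 'X', '*' and other non-standard symbols are skipped
    and do not count towards N.
    """
    ranks = [RANKS[aa] for aa in sequence.upper() if aa in RANKS]
    if not ranks:
        raise ValueError("no standard amino acids in sequence")
    return sum(ranks) / len(ranks)


def genetic_code_ti():
    """Codon-weighted TI: sum of rank * codons over amino acids, divided by 61."""
    weighted = sum(RANKS[aa] * n for aa, n in CODONS.items())
    return weighted / sum(CODONS.values())


def general_summary(rows):
    """General mean of the per-protein means, and the per-protein SDs
    averaged with weights n - 1 (their degrees of freedom)."""
    general_mean = sum(mean for mean, _, _ in rows) / len(rows)
    weights = [n - 1 for _, _, n in rows]
    weighted_sd = sum(w * sd for w, (_, sd, _) in zip(weights, rows)) / sum(weights)
    return general_mean, weighted_sd


def one_sample_t(sample_mean, sd, k, mu):
    """t = (sample_mean - mu) / (sd / sqrt(k)), with k - 1 degrees of freedom."""
    return (sample_mean - mu) / (sd / math.sqrt(k))


if __name__ == "__main__":
    mu = genetic_code_ti()
    print(f"TI of the genetic code: {mu:.3f}")
    for label, rows in (("Low T", LOW_T), ("High T", HIGH_T)):
        mean, sd = general_summary(rows)
        t = one_sample_t(mean, sd, len(rows), mu)
        print(f"{label}: mean {mean:.3f}, SD {sd:.3f}, t = {t:+.3f}, d.f. = {len(rows) - 1}")
```

Run as a script, this reproduces the code TI of 10.684, the general means 10.302 and 10.697 and the standard deviations 0.203 and 0.192. The t values (about −8.62 and +0.31) differ from the reported −8.623 and 0.310 only in the third decimal, because the text rounds the means and standard deviations before dividing.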
The general mean of all these means (Table 2) was then calculated, giving 10.302 for the mean general TI of the mesophiles and 10.697 for that of the thermophiles-hyperthermophiles. The respective standard deviations, calculated from those reported in Table 2 and weighted by the corresponding degrees of freedom, turned out to be 0.203 for the mesophiles and 0.192 for the thermophiles-hyperthermophiles. Finally, taking the population mean to be equal to the TI of the genetic code, i.e. under the null hypothesis μ = 10.684, a t-test (Balaam, 1972) was conducted to establish whether the mesophile sample (mean TI = 10.302) and the thermophile-hyperthermophile sample (mean TI = 10.697) can be considered as drawn from the population defined by the genetic code (μ = 10.684). The following results were obtained. For the mesophiles,

$$t = \frac{10.302 - 10.684}{0.203/\sqrt{21}} = -8.623 \qquad (\mathrm{d.f.} = 20,\ P \ll 10^{-3}),$$

while for the thermophiles-hyperthermophiles,

$$t = \frac{10.697 - 10.684}{0.192/\sqrt{21}} = 0.310 \qquad (\mathrm{d.f.} = 20,\ 0.70 < P < 0.80).$$

This clearly shows that the genetic code originated in a thermophile or hyperthermophile ‘organism’, because the TI of the genetic code is indistinguishable from the mean TI of the proteins belonging to the thermophile-hyperthermophile group. This conclusion is also supported by the observation that the mean TI of the mesophiles is statistically very different from that of the genetic code.

3.4. Estimation of the temperature at which the genetic code originated

Having established that the late stage of genetic code structuring took place in a thermophile or hyperthermophile ‘organism’, we can attempt to establish the temperature at which this took place.
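The estimate in this subsection rests on a regression line relating Topt to TI, evaluated at the TI of the genetic code. A minimal Python sketch of that procedure follows, assuming ordinary least squares (the text does not name the fitting method) and using the S-adenosyl-L-homocysteine hydrolase coefficients reported for Fig. 2 (intercept −885.315, slope 91.372) together with the code TI of 10.684. The per-species (TI, Topt) pairs behind Fig. 2 are not listed in the paper, so `fit_line` (a hypothetical helper) is shown only as the fitting step, and the confidence interval is not reproduced.

```python
def fit_line(ti_values, topt_values):
    """Ordinary least-squares fit of Topt = intercept + slope * TI."""
    n = len(ti_values)
    if n < 2 or n != len(topt_values):
        raise ValueError("need at least two paired (TI, Topt) observations")
    mean_ti = sum(ti_values) / n
    mean_t = sum(topt_values) / n
    sxx = sum((x - mean_ti) ** 2 for x in ti_values)
    sxy = sum((x - mean_ti) * (y - mean_t) for x, y in zip(ti_values, topt_values))
    slope = sxy / sxx
    return mean_t - slope * mean_ti, slope


def predict_topt(ti, intercept, slope):
    """Evaluate the fitted line at a given thermophily index (result in °C)."""
    return intercept + slope * ti


# Values reported in the text: Fig. 2 regression line and TI of the genetic code.
SAHH_INTERCEPT = -885.315
SAHH_SLOPE = 91.372
CODE_TI = 10.684

if __name__ == "__main__":
    estimate = predict_topt(CODE_TI, SAHH_INTERCEPT, SAHH_SLOPE)
    print(f"Estimated temperature of code fixation: {estimate:.1f} °C")
```

With these values the script prints 90.9 °C, matching the reported estimate; with the 25 data points of Fig. 2, `fit_line` would supply the intercept and slope instead of the hard-coded constants.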
To do this, all 21 regression lines that can be calculated from the proteins in Table 2 were examined, and the one with a mean TI for the thermophile-hyperthermophile group close to the TI of the genetic code and a low dispersion was chosen. The choice fell on S-adenosyl-L-homocysteine hydrolase (Table 2). The regression line of this protein (Topt = −885.315 + 91.372 TI; Fig. 2), which is highly significant (F = 158.5, d.f. = 24, P ≪ 10⁻³), gives an estimated mean temperature of 90.9°C for the fixation of the genetic code. The 95% confidence interval (Wonnacott and Wonnacott, 1982) for this value is 90.9 ± 24.2°C. An interval of this kind is to be expected, because the thermophile and hyperthermophile temperatures used in the analysis (Table 2) derive mostly from hyperthermophile species: the sample (Table 2) is enriched in protein sequences from organisms with an optimal growth temperature of 80°C or more, and the calculated confidence interval seems to reflect this feature. The regression lines of other proteins in Table 2 were also used; aspartate aminotransferase, for instance, gave a 95% confidence interval of 80.8 ± 35.5°C, but only after four points belonging to mesophiles had been eliminated from the regression. None of these other lines supplied a confidence interval for the temperature at which the genetic code was fixed that was fully consistent with the general test establishing that the genetic code originated in a thermophile or hyperthermophile ‘organism’, and therefore better than the interval derived from the data in Fig. 2.

4. Discussion

There is an apparent conflict between assigning the thermophily index of the genetic code to a hyperthermophile (or thermophile) ‘organism’ and the distribution of the thermophily ranks (Table 1) within the code itself.
The thermophily ranks seem to correlate positively with the ‘size’ of amino acids. Indeed, the correlation coefficient between the thermophily ranks (Table 1) and the molecular volume of amino acids (Grantham, 1974) is significant (r = +0.518, d.f. = 18, Z = +2.364, P = 0.018), whereas the correlation with the molecular weight of amino acids is only marginally significant (r = +0.434, d.f. = 18, Z = +1.918, P = 0.055). This seems to indicate that an increase in the molecular volume of the amino acids used in a protein should correspond, on average, to an increase in the thermostability of the protein itself. Therefore, if the thermostability of proteins was an important selective factor in the structuring of the genetic code, we should expect a positive correlation between the number of codons specifying amino acids in the genetic code and the ‘size’ of amino acids, whereas a negative correlation can even be observed, for instance between the number of codons and the molecular weight of amino acids (Hasegawa and Miyata, 1980; Di Giulio, 1989; Taylor and Coates, 1989). The relationships among these three variables thus explain why there is no correlation between the number of codons specifying amino acids in the genetic code and the thermophily ranks (r = +0.070, d.f. = 18, Z = +0.289, P = 0.77). It would therefore appear that the number of codons attributed to amino acids in the genetic code reflects a ‘mesophile’ behaviour, because the code attributes, on average, a larger number of codons to the smaller amino acids (Di Giulio, 1989). This prevents the positive correlation between the number of codons and the thermophily ranks that would agree with the main result of the present manuscript; such a correlation is not observed.
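The Z values quoted in this paragraph are consistent with Fisher's z-transformation, Z = atanh(r)·√(n − 3) with n = 20 amino acids, followed by a two-sided normal test; this is our reading, since the text does not name the test. A minimal Python sketch under that assumption recomputes the correlation between the Table 1 ranks and the number of codons per amino acid in the standard genetic code, and tests a given r. The helper names are illustrative.

```python
import math

# Thermophily ranks of Table 1 and sense codons per amino acid (standard code).
RANKS = {
    "R": 19.50, "W": 18.25, "P": 17.25, "I": 15.50, "Y": 14.75,
    "C": 13.75, "L": 13.75, "V": 13.00, "E": 11.25, "A": 11.00,
    "F": 10.25, "K": 10.00, "H": 9.25, "M": 7.00, "G": 6.00,
    "D": 6.00, "Q": 5.25, "T": 5.00, "N": 2.25, "S": 1.00,
}
CODONS = {
    "R": 6, "W": 1, "P": 4, "I": 3, "Y": 2, "C": 2, "L": 6, "V": 4, "E": 2, "A": 4,
    "F": 2, "K": 2, "H": 2, "M": 1, "G": 4, "D": 2, "Q": 2, "T": 4, "N": 2, "S": 6,
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_test(r, n):
    """Two-sided test of zero correlation via Fisher's z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    aas = sorted(RANKS)
    r = pearson_r([RANKS[a] for a in aas], [CODONS[a] for a in aas])
    z, p = fisher_z_test(r, len(aas))
    print(f"codons vs ranks: r = {r:+.3f}, Z = {z:+.3f}, P = {p:.2f}")
    z, p = fisher_z_test(0.518, 20)
    print(f"ranks vs molecular volume: Z = {z:+.3f}, P = {p:.3f}")
```

Run as a script, it gives r ≈ +0.070, Z ≈ +0.289 and P ≈ 0.77 for codons versus ranks, and Z ≈ 2.36 with P ≈ 0.018 for r = 0.518, in line with the values reported above.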
A possible explanation is that amino acid ‘size’ is important for protein structure (Epstein, 1967; Grantham, 1974) and may reflect more general aspects of these structures rather than only their thermostability. Moreover, because protein thermostability is a complex phenomenon, it cannot easily be described through any single amino acid property. However, the anomalous behaviour of arginine (Jukes, 1978), to which the genetic code assigns six codons but which is under-represented in mesophile proteins (Jukes et al., 1975; Jukes, 1978) and has the highest thermophily rank (Table 1), can be interpreted as reflecting the ‘thermophile’ behaviour of the genetic code (Di Giulio, 2000). (For serine, to which the genetic code also assigns six codons but which has the lowest thermophily rank (Table 1), no anomalous behaviour is observed regarding the discrepancy between the number of its codons in the genetic code and the frequency with which serine appears in mesophile proteins (Jukes et al., 1975; Jukes, 1978). In this case, therefore, there is agreement with the tendency of the genetic code to assign many codons to the smaller amino acids (Di Giulio, 1989).) In conclusion, although a ‘contradictory’ behaviour can be seen in the genetic code between the number of codons, the thermophily ranks and the ‘size’ of amino acids, which in a certain sense places some limitations on the use made here of the thermophily index associated with the genetic code, the conclusion that the late phase of genetic code structuring took place in a hyperthermophile (or thermophile) ‘organism’ nevertheless emerges, on the whole, strengthened. This is because the conclusion is supported by aspects of the genetic code that, in a certain sense, tend to favour the opposite conclusion.
It seems, therefore, that the distribution of thermophily ranks in the genetic code, although not significantly correlated with the number of codons, nevertheless ensured a sufficient basis for protein thermostability. The observation that the late phase of genetic code structuring took place in a hyperthermophile (or thermophile) ‘organism’ contrasts sharply with a study (Galtier et al., 1999) that uses the correlation between the optimal growth temperature of prokaryotes and the G+C content of rRNAs, estimates the G+C percentages of the ancestral rRNA sequences of the LUCA through a complex Markov model, and concludes that the LUCA did not live at a high temperature. Given the simplicity of the analysis reported in the present manuscript, it is believed that the conclusion of Galtier et al. (1999) is mistaken and that the G+C content estimated by their sequence evolution model is incorrect. This seems to be further supported by a study that uses the parsimony method to reconstruct the ancestral rRNA sequences and concludes that the LUCA was a thermophile or hyperthermophile ‘organism’ (Di Giulio, 2000). But what implications does evidence that the late phase of genetic code structuring took place at a high temperature have for the origin of life? Does this observation imply that the origin of life took place at a high temperature? The formal answer to the latter question is clearly ‘no’, because it can always be supposed that the early phase of the origin of life, for instance before a genetic system of the kind we now know existed, took place at a low temperature, and that only later, with the origin of the genetic code, did the system shift to a high temperature. Nevertheless, it is believed that the more reasonable answer to this question is ‘yes’.
In fact, the structuring of the genetic code marks the end of the origin of life and not its beginning, because the multitude of molecules involved in its expression implies a level of complexity far removed from the origin of life in the strict sense of the term. Strictly speaking, therefore, evidence that code structuring took place at a high temperature does not in turn imply that the origin of life took place at a high temperature. Nevertheless, it is reasonable to believe that not only the phase that led to the structuring of the genetic code but also many of the preceding and subsequent phases took place at the same temperature. For the sake of argument, let us assume that there was a stage in genetic code origin in which only 15 amino acids were coded. The temperature characterising this stage must have been the same as that of the final stage of code structuring, because a temperature change between these two stages would have required changes in the thermostability of a large number of macromolecules, and such a change would therefore have been highly improbable. If we follow the coevolution theory of genetic code origin (Wong, 1975), we must believe that there was a stage during code origin in which only five or six amino acids were coded. At this stage the temperature must reasonably have been the same as that of the fully developed code, for the reason given above. In other words, if the coevolution theory (Wong, 1975) is true, there was a stage in which proteins that were already fairly complex, though formed from only five or six types of amino acid, were produced by a protein synthesis machinery substantially equivalent to the current one.
This implies that this stage must have had the same temperature as the final stage of code structuring, because a temperature change would likewise have required many simultaneous changes in the thermostability of macromolecules, and such a change therefore seems unlikely. However, at this stage the genetic code was still evolving, and it therefore seems possible, although improbable, that such a temperature change took place. We have thus reasonably shifted the high temperature that, in light of the evidence provided here, characterised the final phase of genetic code structuring back to the early phase of its origin. Clearly, this retrodating cannot be carried any further at this point. However, if the temperature was high in the early phases of genetic code origin, then the whole system in which this phase originated must at least have tolerated a high temperature. Therefore, the phase of the origin of life that triggered genetic code origin must have taken place at a high temperature. Although the phases prior to this might have taken place at a low temperature, it seems much more reasonable to assume that the origin of life itself took place at a high temperature. A high temperature as the environment in which life originated is contemplated by some theories (Russell et al., 1998; Wachtershauser, 1988a,b, 1998). Finally, we must bear in mind that the coevolution theory of genetic code origin postulates that this origin took place through an imprinting of the biosynthetic relationships between amino acids on the organisation of the genetic code (Wong, 1975). This origin therefore seems to be connected with a phase of the origin of metabolism (Wong, 1975; Wachtershauser, 1988a; Danchin, 1989; Di Giulio, 1993), and the origin of metabolism in some senses amounts to the origin of life.
In this sense, the origin of the genetic code might be closer to the origin of life than other arguments suggest. If, on the other hand, the genetic code originated as envisaged by the stereochemical theory (Woese, 1967; Shimizu, 1982; Yarus, 1998), which generally seems to refer to an earlier period of code origin than that contemplated by the coevolution theory (Di Giulio, 1998), the conclusion is thought to remain substantially the same, because the considerations made for the coevolution theory must likewise hold here. According to the expectations of the stereochemical theory, the system in which the genetic code originated through stereochemical interactions between anticodons (or codons) and amino acids (Woese, 1967; Shimizu, 1982; Yarus, 1998) must at least have tolerated a high temperature, and it is therefore more likely that this system evolved at a high temperature. Consequently, this environment probably also housed the origin of life itself. In conclusion, although the evidence in favour of high-temperature code structuring does not allow us to be certain that life originated at such a temperature, it nevertheless seems very reasonable to believe that this could actually have been the case.

Acknowledgements

Thanks to Prof. P. Cammarano for our discussions and for providing the CPS alignment. Thanks also to M. Valenzi, S. Cossu and M. Petrillo for their kind help.

References

Achenbach-Richter, L., Gupta, R., Stetter, K.O., Woese, C.R., 1987. Were the original eubacteria thermophiles? Syst. Appl. Microbiol. 9, 34-39.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
Arrhenius, G., Bada, J.L., Joyce, G.F., Lazcano, A., Miller, S., Orgel, L.E., 1999. Science 283, 792.
Balaam, L.N., 1972. Fundamentals of Biometry. G. Allen and Unwin, London, pp. 120-142.
Baross, J.A., Hoffman, S.E., 1985. Submarine hydrothermal vents and associated gradient environments as sites for the origin and evolution of life. Orig. Life 15, 327-345.
Danchin, A., 1989. Homeotopic transformation and the origin of translation. Prog. Biophys. Mol. Biol. 54, 81-86.
Di Giulio, M., 1989. Some aspects of the organization and evolution of the genetic code. J. Mol. Evol. 29, 191-201.
Di Giulio, M., 1993. Origin of glutaminyl-tRNA synthetase: an example of palimpsest? J. Mol. Evol. 37, 5-10.
Di Giulio, M., 1998. Reflections on the origin of the genetic code: a hypothesis. J. Theor. Biol. 191, 191-196.
Di Giulio, M., 2000. The universal ancestor lived in a thermophilic or hyperthermophilic environment. J. Theor. Biol. 203, 203-213.
Doolittle, W.F., 2000. The nature of the universal ancestor and the evolution of the proteome. Curr. Opin. Struct. Biol. 10, 355-358.
Dufton, M.J., 1997. Genetic code synonym quotas and amino acid complexity: cutting the cost of proteins. J. Theor. Biol. 187, 165-173.
Epstein, C.J., 1967. Non-randomness of amino-acid changes in the evolution of homologous proteins. Nature 215, 355-359.
Forterre, P., 1995. Thermoreduction, a hypothesis for the origin of prokaryotes. C. R. Acad. Sci. Paris 318, 415-422.
Forterre, P., 1996. A hot topic: the origin of hyperthermophiles. Cell 85, 789-792.
Forterre, P., Confalonieri, F., Charbonnier, F., Duguet, M., 1995. Speculations on the origin of life and thermophily: review of available information on reverse gyrase suggests that hyperthermophilic procaryotes are not so primitive. Orig. Life Evol. Biosphere 25, 235-249.
Forterre, P., Philippe, H., 1999. Where is the root of the universal tree of life? BioEssays 21, 871-879.
Galtier, N., Tourasse, N., Gouy, M., 1999. A nonhyperthermophilic common ancestor to extant life forms. Science 283, 220-221.
Grantham, R., 1974. Amino acid difference formula to help explain protein evolution. Science 185, 862-864.
Haney, P.J., Badger, J.H., Buldak, G.L., Reich, C.I., Woese, C.R., Olsen, G.J., 1999. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl. Acad. Sci. USA 96, 3578–3583.
Hasegawa, M., Miyata, T., 1980. On the antisymmetry of the amino acid code table. Orig. Life 10, 265–270.
Holm, N.G., 1992. Marine hydrothermal systems and the origin of life. Orig. Life Evol. Biosphere 22, 1–241.
Jacobs, M.B., Gerstein, M.J., 1960. Handbook of Microbiology. D. van Nostrand, London.
Jukes, T.H., 1978. The genetic code. Adv. Enzymol. 47, 375–432.
Jukes, T.H., Holmquist, R., Moise, H., 1975. Amino acid composition of proteins: selection against the genetic code. Science 189, 50–51.
King, J.L., Jukes, T.H., 1969. Non-Darwinian evolution. Science 164, 788–798.
McDonald, J.H., Grasso, A.M., Rejto, L.K., 1999. Patterns of temperature adaptation in proteins from Methanococcus and Bacillus. Mol. Biol. Evol. 16, 1785–1790.
Miller, S.L., Lazcano, A., 1995. The origin of life: did it occur at high temperatures? J. Mol. Evol. 41, 689–692.
Nisbet, E., 1985. The geological setting of the earliest life forms. J. Mol. Evol. 21, 289.
Pace, N.R., 1991. Origin of life: facing up to the physical setting. Cell 65, 531–533.
Philippe, H., Forterre, P., 1999. The rooting of the universal tree of life is not reliable. J. Mol. Evol. 49, 509–523.
Philippe, H., Lopez, P., Brinkmann, H., Budin, K., Germot, A., Laurent, J., Moreira, D., Muller, M., Le Guyader, H., 2000. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. London B Biol. Sci. 267, 1213–1221.
Russell, J., Daia, D.E., Hall, A.J., 1998. The emergence of life from FeS bubbles at alkaline hot springs in an acid ocean. In: Wiegel, J., Adams, W.W. (Eds.), Thermophiles: The Keys to Molecular Evolution and the Origin of Life. Taylor & Francis, London, pp. 77–110.
Shimizu, M., 1982. Molecular basis for the genetic code. J. Mol. Evol. 18, 297–303.
Shock, E.L., 1996. Evolution of Hydrothermal Systems on Earth (and Mars?), Ciba Foundation Symposium 202. Wiley, Chichester, pp. 40–60.
Staley, J.T., Bryant, M.P., Pfennig, N., Holt, J.G., 1984. In: Hensyl, W.R. (Ed.), Bergey's Manual of Systematic Bacteriology, Vol. 3. Lippincott Williams & Wilkins, Philadelphia, PA, pp. 177–187.
Stetter, K.O., 1995. Microbial life in hyperthermal environments. ASM News 61, 285–290.
Stiller, J.W., Hall, B.D., 1999. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol. Biol. Evol. 16, 1270–1279.
Taylor, F.J.R., Coates, D., 1989. The code within the codons. BioSystems 22, 177–187.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Wachtershauser, G., 1988a. Before enzymes and templates: theory of surface metabolism. Microbiol. Rev. 52, 452–484.
Wachtershauser, G., 1988b. Pyrite formation, the first energy source for life: a hypothesis. Syst. Appl. Microbiol. 10, 207–210.
Wachtershauser, G., 1998. The case for a hyperthermophilic, chemolithoautotrophic origin of life in an iron-sulfur world. In: Wiegel, J., Adams, W.W. (Eds.), Thermophiles: The Keys to Molecular Evolution and the Origin of Life. Taylor & Francis, London, pp. 47–57.
Woese, C.R., 1967. The Genetic Code. Harper & Row, New York.
Woese, C.R., 1987. Bacterial evolution. Microbiol. Rev. 51, 221–271.
Woese, C.R., Kandler, O., Wheelis, M.L., 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA 87, 4576–4579.
Wong, J.T., 1975. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. USA 72, 1909–1912.
Wonnacott, T.H., Wonnacott, R.J., 1982. Introductory Statistics, Chapter 12. Wiley, New York, pp. 281–304.
Yarus, M., 1998. Amino acids as RNA ligands: a direct-RNA-template theory for the code's origin. J. Mol. Evol. 47, 109–117.