Gene 261 (2000) 189–195
www.elsevier.com/locate/gene
PII: S0378-1119(00)00522-9
The late stage of genetic code structuring took place at a high temperature
Massimo Di Giulio*
International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, 80125 Naples, Napoli, Italy
Accepted 30 October 2000
Received by G. Bernardi
Abstract
The correlation between the optimal growth temperature of organisms and a thermophily index based on the propensity of amino acids to
enter more frequently into (hyper)thermophile proteins is used to conduct an analysis aiming to establish whether genetic code structuring
took place at a low or a high temperature. If the number of codons attributed to the various amino acids in the genetic code constitutes an
estimate of the mean amino acid composition of proteins produced when the genetic code was definitively structured, then the thermophily
index can also be associated to the genetic code. This value and the sampling of the variable thermophily index of different alignments of
protein sequences from mesophile, thermophile and hyperthermophile species make it possible to establish, with an extremely high statistical
confidence, that the late stage of genetic code structuring took place in a hyperthermophile (or thermophile) `organism'. Moreover the 95%
confidence interval of the temperature at which the genetic code was fixed turned out to be 91 ± 24°C. These observations seem to support
the hypothesis that the origin of life might have taken place at a high temperature. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Origin of life; Genetic code origin; Thermostability; Thermophily index; Optimal growth temperature
1. Introduction
There is a lively open debate on whether the origin of life
took place at a high or a low temperature (Achenbach-Richter et al., 1987; Pace, 1991; Holm, 1992; Miller and
Lazcano, 1995; Forterre, 1995, 1996; Forterre et al., 1995;
Russell et al., 1998; Wachtershauser, 1998). A hot origin of
life is favoured, though not proven, for instance by (i)
evidence of a phylogenetic nature, which shows that the
node of the last universal common ancestor (LUCA) in
the tree derived from 16S ribosomal RNA is surrounded
by hyperthermophile species which moreover have short
branches (Woese, 1987; Wachtershauser, 1988a,b; Woese
et al., 1990; Pace, 1991; Stetter, 1995), and (ii) the presumed
conditions of the primordial Earth (Baross and Hoffman,
1985; Nisbet, 1985; Shock, 1996) which seem to resemble
those of the biotopes of hyperthermophiles. A hot origin of
life is also supported by some theories put forward to
explain such an origin (Russell et al., 1998; Wachtershauser,
1988a,b; 1998).
A low-temperature origin is mainly favoured by those
who support the heterotroph theory of the origin of life
(Miller and Lazcano, 1995), maintaining that at a high
temperature the majority of the building blocks of life
would decompose and that this high temperature would be
incompatible with the expectations of the RNA world
(Miller and Lazcano, 1995; Forterre, 1995, 1996; Forterre
et al., 1995).
* Tel.: +39-081-7257313; fax: +39-081-5936123.
E-mail address: [email protected] (M. Di Giulio).
Phylogenetic data (Woese, 1987; Wachtershauser,
1988a,b; Woese et al., 1990; Pace, 1991; Stetter, 1995)
favouring a hot origin of life have been weakened by the
observation that the guanine plus cytosine content of the
ancestral rRNA sequences of the LUCA does not seem to
be compatible with its hyperthermophile nature (Galtier et
al., 1999) although there is some doubt on the truth of such a
conclusion (Di Giulio, 2000). These analyses (Galtier et al.,
1999; Di Giulio, 2000) are subject to a number of limitations, such as the intrinsic uncertainty in reconstructing the
ancestral rRNA sequences of the LUCA, and the uncertainty
deriving from the topology of the phylogenetic tree used in
such a reconstruction, i.e. the uncertainty of the position that
the root occupies in the tree of life (Philippe and Forterre,
1999; Forterre and Philippe, 1999) and the uncertainty of the
relative positions of the various species on this tree
(Philippe et al., 2000; Stitter and Hall, 1999). Furthermore,
the observation that the LUCA was a mesophile `organism'
(Galtier et al., 1999) might imply, though only weakly, that
life originated at a low temperature because between the
origin of life and the evolution of the LUCA there might
have been changes in the physical environment in which life
originated (Arrhenius et al., 1999), although the truth of
such an argument seems to depend above all on the unclear
nature (Doolittle, 2000) of the LUCA (Di Giulio, 2000).
In this paper a method that seems to remove most of these
limitations was used. Moreover the conclusion reached
seems to refer to an evolutionary time prior to that of the
evolution of the LUCA, i.e. the late stage of the origin of
genetic code organisation and thus a stage closer to the
origin of life with the consequence that the conclusion is
more easily ascribed to the time in which life originated on
our planet.
2. Materials and methods
The majority of the sequences used in the analysis were
taken from the NCBI database using the BLASTP program
(Altschul et al., 1990). In particular, the name of a specific
protein belonging to a randomly-chosen organism was used
to obtain the protein sequence. By means of BLASTP
(Altschul et al., 1990) this sequence was used as a probe to
obtain all the orthologous proteins present in the database.
In many cases, in order to ensure that all the orthologous
proteins had been identified, amino acid sequences from the
Archaea, Bacteria and Eukarya domains were used as
probes for a given protein. The different proteins in the
analysis were chosen at random and were not paralogous
to one another.
The amino acid sequences were aligned using the CLUSTALX program (Thompson et al., 1997) with its default
parameters. Only the alignment zones between highly
conserved regions were preserved in the analysis whereas
all the amino acid sites containing at least one gap were
eliminated. All the alignments used in the analysis are available upon request to the author.
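The gap-removal step described above can be sketched as follows. This is only an illustration, not the author's program: the function name `drop_gapped_columns` and the toy alignment are hypothetical, and the selection of zones between highly conserved regions is not reproduced here.

```python
def drop_gapped_columns(aligned, gap_chars="-."):
    """Remove every alignment column in which at least one sequence has a gap."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the column indices where no sequence carries a gap character.
    keep = [i for i in range(length)
            if all(seq[i] not in gap_chars for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


# Hypothetical three-sequence alignment: columns 3 and 5 contain a gap.
toy_alignment = ["MK-LV", "MKALV", "MRAL-"]
print(drop_gapped_columns(toy_alignment))
```

After this filtering every sequence in an alignment has the same number of sites, which is what allows a single alignment length per protein to be reported in Table 2.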
The values of the optimal growth temperatures (Topt) of
the various organisms were taken from Jacobs and Gerstein
(1960) and from Staley et al. (1984). In a number of cases,
especially for the eukaryotes, these values were found by
consulting the specialised literature.
Table 1
Thermophily ranks (see text for their definition)
Arg = 19.50
Trp = 18.25
Pro = 17.25
Ile = 15.50
Tyr = 14.75
Cys = 13.75
Leu = 13.75
Val = 13.00
Glu = 11.25
Ala = 11.00
Phe = 10.25
Lys = 10.00
His = 9.25
Met = 7.00
Gly = 6.00
Asp = 6.00
Gln = 5.25
Thr = 5.00
Asn = 2.25
Ser = 1.00
3. Results
3.1. The construction of a thermophily index
Two works (Haney et al., 1999; McDonald et al., 1999) have compared protein sequences from mesophile and
thermophile-hyperthermophile species. Fig. 1 of McDonald et al. and Table 1 of Haney et al. make it possible to derive three
amino acid rankings, two for the Methanococcus species (Haney et al., 1999; McDonald et al., 1999) and one for
the Bacillus species (McDonald et al., 1999). I attributed the highest rank to the amino acids most frequently found in
the (hyper)thermophiles. The mean rank value was next calculated for each amino acid between the amino acid
ranks obtained from these two works (Haney et al., 1999; McDonald et al., 1999) using only those values from the
Methanococcus species. Finally the mean rank for each amino acid was calculated as emerging from the comparison
between the amino acid ranks from the Bacillus species (McDonald et al., 1999) and the already calculated ones
from the Methanococcus species. These values are reported in Table 1. They can be considered as thermophily ranks in
the sense that a high value indicates a greater propensity for that amino acid to enter the proteins of the (hyper)thermophiles.
The thermophily index (TI) that can be associated to any one protein is defined as
$$\mathrm{TI} = \frac{1}{N} \sum_{j=1}^{N} R_j$$
where $R_j$ is the value of the thermophily rank (Table 1) of the j-th amino acid, and N is the total number of amino acids
in the considered protein. In order to calculate this index, a simple algorithm using a FASTA format input file was
written (available upon request to the author).
3.2. The existence of a correlation between the optimal
growth temperature of organisms and the thermophily index
It was found that there is a strong correlation between the optimal growth temperature of the various organisms and
the thermophily index (Figs. 1 and 2). The regression in Fig. 1 was calculated starting from the multiple alignment of the
signal recognition particle (54 kDa) as obtainable at the web site http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html. After
removing some isoenzyme sequences and some partial sequences and eliminating some regions of dubious alignment, a
multiple alignment of 46 sequences was obtained, each one 356 amino acids long. The regression line
(Topt = −444.433 + 48.082 TI) (Fig. 1) between the optimal growth temperature of the various organisms and the
thermophily index is highly significant (F = 51.8, d.f. = 45, P ≪ 10⁻³). This result was confirmed for
many other proteins (Table 2) with a significance in the regression lines that in several cases is equivalent to that
shown in Fig. 1; in other cases the probability was lower than or equal to 10⁻³, three proteins gave a probability of
about 2% while two proteins showed no significance (P ≈ 0.40); the latter were not included in Table 2.
Fig. 1. The correlation between the optimal growth temperature (Topt) of the
various organisms and the thermophily index. This correlation refers to 46
amino acid sequences of the signal recognition particle (54 kDa). See text
for further information.
Fig. 2. The correlation between the optimal growth temperature (Topt) of the
various organisms and the thermophily index (TI) calculated for 25 amino
acid sequences of S-adenosyl-L-homocysteine hydrolase. The vertical line
refers to the value that the TI assumes in the genetic code. See text for
further information.
3.3. The genetic code originated in a thermophile or
hyperthermophile `organism'
There has been a long debate on the existence of a positive correlation between the number of codons specifying
amino acids in the genetic code and the frequency with which amino acids appear in proteins (King and Jukes,
1969; Jukes et al., 1975; Dufton, 1997). We must therefore believe, at least in part, that the number of codons codifying
in the genetic code for the various amino acids was a variable subject to the influence of natural selection at the time
of genetic code origin. This viewpoint seems to be supported by the observation that the number of codons
attributed to amino acids in the genetic code correlates with the molecular weight of the corresponding amino
acid, or more generally with their `size' (Hasegawa and Miyata, 1980; Di Giulio, 1989; Taylor and Coates, 1989).
This correlation becomes highly significant if we replace the number of codons which in the code codify for amino acids
with the frequency with which amino acids appear in proteins (Di Giulio, 1989). This indicates that the number
of codons was a variable subject to selection (Di Giulio, 1989) because the `size' of amino acids with which the
number of codons correlates is important for protein structure (Epstein, 1967; Grantham, 1974).
Table 2
The mean values of the thermophily index (TI) for the proteins used in the analysis, both for mesophile species (Low T) and for
thermophile-hyperthermophile species (High T) with their relative standard deviations a
(Length = alignment length, in number of amino acids; SD = standard deviation; N = number of sequences)
Proteins                                    Length  Mean TI         SD              N
                                                    Low T   High T  Low T   High T  Low T  High T
Carbamoyl-phosphate synthetase                 501  10.315  10.533   0.117   0.154     26      11
Signal recognition particle 54 kDa             356  10.002  10.654   0.250   0.276     35      11
Threonyl-tRNA synthetase                       374  10.729  11.280   0.198   0.231     26       8
Aspartate carbamoyltransferase                 189  10.358  10.784   0.297   0.249     24       6
Acetylornithine aminotransferase               226  10.044  10.317   0.270   0.214     14       7
Inosine-5'-monophosphate dehydrogenase         338  10.059  10.657   0.266   0.099     18       7
Beta subunit of tryptophan synthase            354  10.289  10.576   0.160   0.147     17       9
S-adenosyl-L-homocysteine hydrolase            343  10.023  10.586   0.082   0.228     16       9
Cell division protein (ftsZ)                   296   9.848  10.260   0.154   0.198     20       8
Glutamate-1-semialdehyde aminotransferase      321  10.260  10.656   0.299   0.209     20       5
Glutamine-fructose-6-phosphate transferase     339  10.458  10.771   0.232   0.188     22       6
Phosphoribosylamine-glycine ligase             243  10.439  10.685   0.261   0.185     22       8
Anthranilate synthetase alpha-subunit          268  10.714  10.964   0.139   0.198     14      11
Threonine synthase                             179   9.937  10.299   0.310   0.136     15       8
Aspartate aminotransferase                     302  10.317  10.635   0.267   0.274     10      11
Succinyl-CoA synthetase beta subunit           288  10.347  10.989   0.154   0.185     22       6
Adenylosuccinate synthase                      240  10.217  10.444   0.190   0.138     20       8
Acetyl-CoA synthetase                          257  10.503  10.931   0.140   0.159     16       5
Valyl-tRNA synthetase                          434  10.703  11.267   0.218   0.173     28       9
Glutamate dehydrogenase (GDH)                  293  10.282  10.754   0.110   0.193     14      12
CTP synthetase                                 356  10.497  10.601   0.112   0.132     20       8
a The table also indicates the number of protein sequences used for each given protein, again for the mesophile group (Low T) and for the
thermophile-hyperthermophile group (High T), as well as the length of the multiple alignment of the amino acid sequences after the regions
of dubious alignment were removed.
All this seems to justify the assumption that the genetic
code structure, in terms of the number of codons attributed to
the various amino acids, supplies the mean amino acid
composition of proteins produced during the evolutionary
stage in which the genetic code was definitively structured.
If this is true, we can calculate the value that the thermophily
index (TI) assumes in the genetic code, i.e. the value of this
index would indicate the mean thermophily status expressed
at that time by the mean protein produced by the genetic code.
The TI calculated for the genetic code is equal to 10.684 and
was calculated by multiplying the value of the thermophily
rank (Table 1) of the j-th amino acid by the number of codons
that specify for that amino acid in the genetic code, summing
these 20 products and then dividing the result by 61.
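The two calculations just described, the TI of a single protein and the TI of the genetic code, can be sketched in a few lines. The ranks are those of Table 1; the codon counts are those of the standard genetic code (61 sense codons). The function names and the use of one-letter amino acid codes are illustrative choices, not the author's original program.

```python
# Thermophily ranks from Table 1, keyed by one-letter amino acid code.
RANKS = {
    "R": 19.50, "W": 18.25, "P": 17.25, "I": 15.50, "Y": 14.75,
    "C": 13.75, "L": 13.75, "V": 13.00, "E": 11.25, "A": 11.00,
    "F": 10.25, "K": 10.00, "H": 9.25, "M": 7.00, "G": 6.00,
    "D": 6.00, "Q": 5.25, "T": 5.00, "N": 2.25, "S": 1.00,
}

# Number of codons assigned to each amino acid in the standard genetic code.
CODONS = {
    "R": 6, "W": 1, "P": 4, "I": 3, "Y": 2, "C": 2, "L": 6, "V": 4,
    "E": 2, "A": 4, "F": 2, "K": 2, "H": 2, "M": 1, "G": 4, "D": 2,
    "Q": 2, "T": 4, "N": 2, "S": 6,
}


def thermophily_index(sequence):
    """Mean thermophily rank over all residues of a protein sequence."""
    ranks = [RANKS[residue] for residue in sequence.upper()]
    return sum(ranks) / len(ranks)


def genetic_code_ti():
    """Codon-weighted mean rank: sum of rank x codon number, divided by 61."""
    total = sum(RANKS[aa] * CODONS[aa] for aa in RANKS)
    return total / sum(CODONS.values())


print(f"TI of the genetic code: {genetic_code_ti():.3f}")
```

With the values of Table 1 the weighted sum is 651.75, and 651.75/61 rounds to the 10.684 quoted in the text.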
In order to establish whether the TI = 10.684 associated to the mean protein produced at the time of genetic code
structuring belongs to a mesophile, thermophile or hyperthermophile `organism', the TI variable was sampled
(Table 2). In other words, for the proteins from a multiple alignment (Table 2) the value of the mean TI was
calculated from all the mesophile sequences (low T) and the mean TI from all the thermophile-hyperthermophile
sequences (high T), with their relative standard deviations (Table 2). The general mean of all these means was then
calculated (Table 2), obtaining a value of 10.302 for the mean general TI of the mesophiles and 10.697 for
the mean general TI of the thermophiles-hyperthermophiles. The respective standard deviations calculated from
those reported in Table 2, i.e. weighted for the corresponding degrees of freedom, turned out to be 0.203 for the
mesophiles and 0.192 for the thermophiles-hyperthermophiles. Finally, assuming that the mean of the population
is equal to the value of the TI of the genetic code, in other words if the null hypothesis is μ = 10.684, a t-test was
conducted (Balaam, 1972) to establish whether the samples with a mean value of TI = 10.302 for the mesophiles and
TI = 10.697 for the thermophiles-hyperthermophiles can be considered as being extracted from the population
defined through the genetic code having μ = 10.684 (Balaam, 1972). The following results were obtained. For
the value TI = 10.302 of the mesophiles we get
$$t = \frac{10.302 - 10.684}{0.203/\sqrt{21}} = -8.623 \quad (\mathrm{d.f.} = 20,\ P \ll 10^{-3}),$$
while for the value TI = 10.697 of the thermophiles-hyperthermophiles we get
$$t = \frac{10.697 - 10.684}{0.192/\sqrt{21}} = +0.310 \quad (\mathrm{d.f.} = 20,\ 0.70 < P < 0.80).$$
This clearly shows that the genetic code originated in a thermophile or hyperthermophile `organism' because the TI value
of the genetic code is indistinguishable from the mean TI value of the proteins belonging to the
thermophile-hyperthermophile group. This conclusion is also supported by the observation that the mean TI value from the
mesophiles is statistically very different from that of the genetic code.
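The summary statistics and the two t values can be recomputed from the per-protein entries of Table 2. The sketch below is a reconstruction under one assumption: the phrase "weighted for the corresponding degrees of freedom" is read as a mean of the standard deviations weighted by (number of sequences − 1), a reading that reproduces the reported 0.203 and 0.192. Function names are illustrative, and P values are not computed.

```python
from math import sqrt

# Per-protein values transcribed from Table 2, in table order (21 proteins).
MEAN_TI_LOW = [10.315, 10.002, 10.729, 10.358, 10.044, 10.059, 10.289,
               10.023, 9.848, 10.260, 10.458, 10.439, 10.714, 9.937,
               10.317, 10.347, 10.217, 10.503, 10.703, 10.282, 10.497]
MEAN_TI_HIGH = [10.533, 10.654, 11.280, 10.784, 10.317, 10.657, 10.576,
                10.586, 10.260, 10.656, 10.771, 10.685, 10.964, 10.299,
                10.635, 10.989, 10.444, 10.931, 11.267, 10.754, 10.601]
SD_LOW = [0.117, 0.250, 0.198, 0.297, 0.270, 0.266, 0.160,
          0.082, 0.154, 0.299, 0.232, 0.261, 0.139, 0.310,
          0.267, 0.154, 0.190, 0.140, 0.218, 0.110, 0.112]
SD_HIGH = [0.154, 0.276, 0.231, 0.249, 0.214, 0.099, 0.147,
           0.228, 0.198, 0.209, 0.188, 0.185, 0.198, 0.136,
           0.274, 0.185, 0.138, 0.159, 0.173, 0.193, 0.132]
N_LOW = [26, 35, 26, 24, 14, 18, 17, 16, 20, 20, 22,
         22, 14, 15, 10, 22, 20, 16, 28, 14, 20]
N_HIGH = [11, 11, 8, 6, 7, 7, 9, 9, 8, 5, 6,
          8, 11, 8, 11, 6, 8, 5, 9, 12, 8]

TI_GENETIC_CODE = 10.684  # null-hypothesis mean (mu)


def general_mean(means):
    """Unweighted mean of the per-protein mean TI values."""
    return sum(means) / len(means)


def df_weighted_sd(sds, ns):
    """Mean of the SDs weighted by degrees of freedom (assumed reading)."""
    dfs = [n - 1 for n in ns]
    return sum(df * sd for df, sd in zip(dfs, sds)) / sum(dfs)


def one_sample_t(sample_mean, sd, n, mu):
    """One-sample t statistic for a sample mean against a fixed mu."""
    return (sample_mean - mu) / (sd / sqrt(n))


def summarise(means, sds, ns):
    """Return (general mean, weighted SD, t), rounding as in the text."""
    mean = round(general_mean(means), 3)
    sd = round(df_weighted_sd(sds, ns), 3)
    return mean, sd, one_sample_t(mean, sd, len(means), TI_GENETIC_CODE)


meso = summarise(MEAN_TI_LOW, SD_LOW, N_LOW)
thermo = summarise(MEAN_TI_HIGH, SD_HIGH, N_HIGH)
print("mesophiles:   mean=%.3f sd=%.3f t=%+.3f" % meso)
print("thermophiles: mean=%.3f sd=%.3f t=%+.3f" % thermo)
```

The sample size entering each t statistic is the number of proteins (21), not the number of sequences, which is why both tests have 20 degrees of freedom.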
3.4. Estimation of the temperature at which the genetic code
originated
Having established that the late stage of genetic code
structuring took place in a thermophile or hyperthermophile
`organism', we can attempt to establish the temperature at
which this took place. In order to do this, all the 21 regression lines that can be calculated from Table 2 were studied
and the one having a mean TI value for the thermophile-hyperthermophile group near the TI of the genetic
code and also a low dispersion was chosen. The choice fell on S-adenosyl-L-homocysteine hydrolase (Table 2). The
regression line (Topt = −885.315 + 91.372 TI) of this protein (Fig. 2),
which is highly significant (F = 158.5, d.f. = 24,
P ≪ 10⁻³), made it possible to estimate a mean value of
the temperature at which the genetic code was fixed of
90.9°C. The 95% confidence interval (Wonnacott and
Wonnacott, 1982) for this value is 90.9 ± 24.2°C. This
interval is the one that would be expected on the basis of
the fact that the temperatures of the thermophiles and
hyperthermophiles used in the analysis (Table 2) mostly
derive from hyperthermophile species. In other words, the
sample (Table 2) is enriched with protein sequences derived
from organisms with an optimal growth temperature greater
than or equal to 80°C. Therefore, the calculated confidence
interval seems to reflect this very feature. However, the
regression lines of other proteins in Table 2 were also used,
such as that of aspartate aminotransferase, which gave a 95% confidence interval of this temperature equal to 80.8 ± 35.5°C,
but to obtain this interval four points belonging to mesophiles were eliminated from the regression. Thus, none of
these lines managed to supply a confidence interval of the
temperature at which the genetic code was fixed that was
perfectly consistent with the general test establishing that
the genetic code originated in a thermophile or hyperthermophile `organism' and therefore better than the interval
derived from the data in Fig. 2.
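The point estimate follows directly from the regression line of Fig. 2 evaluated at the TI of the genetic code. In the minimal sketch below the coefficients and the half-width of the 95% interval are taken from the text; the interval itself is not recomputed, since that would require the individual data points of the regression. The function name is illustrative.

```python
# Regression line of Fig. 2 (S-adenosyl-L-homocysteine hydrolase), from the text.
INTERCEPT = -885.315
SLOPE = 91.372
TI_GENETIC_CODE = 10.684
HALF_WIDTH_95 = 24.2  # reported half-width of the 95% interval, in °C


def predicted_topt(ti, intercept=INTERCEPT, slope=SLOPE):
    """Optimal growth temperature (°C) predicted by the regression line."""
    return intercept + slope * ti


estimate = predicted_topt(TI_GENETIC_CODE)
low, high = estimate - HALF_WIDTH_95, estimate + HALF_WIDTH_95
print(f"estimate {estimate:.1f} °C, 95% interval {low:.1f} to {high:.1f} °C")
```

The steep slope (about 91°C per unit of TI) means that small differences in TI translate into large temperature differences, which is one reason the interval is wide.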
4. Discussion
There apparently exists a conflict between assigning the
value of the thermophily index of the genetic code to a
hyperthermophile (or thermophile) `organism' and the
distribution of the thermophily ranks (Table 1) within the
code itself. The thermophily ranks seem to correlate positively with the `size' of amino acids. Indeed, the correlation
coefficient between the thermophily ranks (Table 1) and the
molecular volume of amino acids (Grantham, 1974) is
significant (r = +0.518, d.f. = 18, Z = +2.364, P = 0.018); the correlation coefficient with the molecular
weight of amino acids is only marginally significant
(r = +0.434, d.f. = 18, Z = +1.918, P = 0.055). This
seems to indicate that an increase in the molecular volume
of an amino acid used in a protein should correspond, on
average, to an increase in the thermostability of the protein
itself. Therefore, provided that the thermostability of
proteins was an important selective factor in the structuring
of the genetic code, we should expect a positive correlation
between the number of codons specifying amino acids in
the genetic code and the `size' of amino acids; whereas a
negative correlation may even be observed, for instance
between the number of codons and the molecular weight
of amino acids (Hasegawa and Miyata, 1980; Di Giulio,
1989; Taylor and Coates, 1989). Hence, the behaviour
between these three variables explains why there is no
correlation between the number of codons specifying
amino acids in the genetic code and the thermophily ranks
(r = +0.070, d.f. = 18, Z = +0.289, P = 0.77). Thus, it
would appear that the number of codons attributed to
amino acids in the genetic code reflects the latter's `mesophile' behaviour because the code attributes, on average, a
larger number of codons to the smaller amino acids (Di
Giulio, 1989), and this prevents the occurrence of the
positive correlation between the number of codons
and the thermophily ranks that would be expected from the main
result of the present manuscript.
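The last of these correlations can be checked from Table 1 and the codon numbers of the standard genetic code. The sketch below computes the Pearson coefficient and, as an assumption, a Z statistic via Fisher's transformation, Z = atanh(r)·sqrt(n − 3); the text does not state how Z was obtained, but this form gives values close to those reported. Function names are illustrative.

```python
from math import atanh, sqrt

# Thermophily ranks from Table 1, keyed by one-letter amino acid code.
RANKS = {
    "R": 19.50, "W": 18.25, "P": 17.25, "I": 15.50, "Y": 14.75,
    "C": 13.75, "L": 13.75, "V": 13.00, "E": 11.25, "A": 11.00,
    "F": 10.25, "K": 10.00, "H": 9.25, "M": 7.00, "G": 6.00,
    "D": 6.00, "Q": 5.25, "T": 5.00, "N": 2.25, "S": 1.00,
}

# Number of codons per amino acid in the standard genetic code.
CODONS = {
    "R": 6, "W": 1, "P": 4, "I": 3, "Y": 2, "C": 2, "L": 6, "V": 4,
    "E": 2, "A": 4, "F": 2, "K": 2, "H": 2, "M": 1, "G": 4, "D": 2,
    "Q": 2, "T": 4, "N": 2, "S": 6,
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fisher_z(r, n):
    """Z statistic from Fisher's transformation (assumed form of the test)."""
    return atanh(r) * sqrt(n - 3)


amino_acids = sorted(RANKS)
r = pearson_r([CODONS[a] for a in amino_acids],
              [RANKS[a] for a in amino_acids])
z = fisher_z(r, len(amino_acids))
print(f"r = {r:+.3f}, Z = {z:+.3f}")
```

Arginine (six codons, highest rank) and serine (six codons, lowest rank) pull in opposite directions, which helps explain why the coefficient is so close to zero.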
A possible explanation is that amino acid `size' is important for protein structure (Epstein, 1967; Grantham, 1974)
and may reflect more general aspects of these structures
rather than reflecting only their thermostability. Moreover,
as protein thermostability is a complex phenomenon, it
cannot be easily described through any single amino acid
property.
However, the anomalous behaviour of arginine (Jukes,
1978) to which the genetic code assigns six codons but
which is under-represented in the mesophile proteins
(Jukes et al., 1975; Jukes, 1978) and has the highest thermophily rank (Table 1), can be interpreted as reflecting the
`thermophile' behaviour of the genetic code (Di Giulio,
2000). (For serine, to which the genetic code also assigns
six codons but which has the lowest thermophily rank
(Table 1) no anomalous behaviour is observed regarding
the discrepancy between the number of its codons in the
genetic code and the frequency with which serine appears
in the mesophile proteins (Jukes et al., 1975; Jukes, 1978).
Therefore, in this case, there is an agreement with the
tendency to assign many codons to the smaller amino
acids in the genetic code (Di Giulio, 1989)).
In conclusion, although a `contradictory' behaviour can
be seen in the genetic code between the number of codons,
the thermophily ranks and the `size' of amino acids, which
in a certain sense seems to place some limitations on the use
here made of the thermophily index value that can be associated to the genetic code, nevertheless the conclusion that
the late phase of genetic code structuring took place in a
hyperthermophile (or thermophile) `organism' on the whole
emerges strengthened. This is because this conclusion is
supported by using aspects of the genetic code that, in a
certain sense, tend to favour the opposite conclusion. It
seems that we must therefore conclude that the distribution
of thermophily ranks in the genetic code, although not
significantly correlating with the number of codons, nevertheless ensured a sufficient base for protein thermostability.
The observation that the late phase of genetic code structuring took place in a hyperthermophile (or thermophile)
`organism' contrasts sharply with a study (Galtier et al.,
1999) which uses the correlation between the optimal
growth temperature of prokaryotes and the G + C content
of rRNAs and estimates, through a complex Markov model,
the G + C percentages of the ancestral sequences of rRNAs
of the LUCA to reach the conclusion that the LUCA did not
live at a high temperature (Galtier et al., 1999). Given the
simplicity of the analysis reported in the present manuscript,
it is believed that the conclusion of Galtier et al. (1999) is
mistaken and that the G + C content estimated by their
sequence evolution model is incorrect. This seems to be
further supported by a study which uses the parsimony
method to reconstruct the ancestral sequences of rRNAs
and reaches the conclusion that the LUCA was a thermophile or hyperthermophile `organism' (Di Giulio, 2000).
But what implications are there for the origin of life in
providing evidence in favour of the possibility that the late
phase of genetic code structuring took place at a high
temperature? Does this observation imply that the origin of
life took place at a high temperature? Clearly the formal
answer to the latter question is `no', because it can in any
case be supposed that the early phase of the origin of life, for
instance in the absence of the genetic system of the type that
we now know, took place at a low temperature and only later,
with the origin of the genetic code, did the system shift to a
high temperature. Nevertheless, it is believed that the more
reasonable answer that we can give to this question is `yes'.
In actual fact the structuring of the genetic code marks the
end of the origin of life and not its beginning because it is
clear that the multitude of molecules involved in its
expression implies a complexity level that is far from the
origin of life in the strict sense of the term. Strictly speaking,
therefore, providing evidence to imply that the code structuring took place at a high temperature does not, in turn,
imply that the origin of life took place at a high temperature.
Nevertheless, we must reasonably believe that not only did
the phase that led to the structuring of the genetic code take
place at a high temperature, but also that many of the
previous and subsequent phases took place at the same
temperature. For the sake of argument, let us assume that
there was a stage in genetic code origin in which only 15
amino acids were codified in the code. Clearly the temperature characterising this stage must have been the same as
that of the final stage that led to the structuring of the code
because a temperature variation between these two stages
would imply changes in the thermostability of a large
number of macromolecules and this variation would thus
have been highly improbable.
If we follow the coevolution theory of genetic code origin
(Wong, 1975) we must believe that there was a stage during
code origin in which only five–six amino acids were codified. In this stage the temperature must reasonably have
been the same as that of the fully developed code for the
same reason as mentioned above. In other words, if the
coevolution theory (Wong, 1975) is true then there was a
stage in which proteins that were already fairly complex and
formed from only five–six types of amino acid were
produced through a protein synthesis machine that was
substantially equivalent to the current one. This implies
that this stage must have had the same temperature as the
final stage of code structuring because a temperature variation would have likewise required many simultaneous
changes in order to vary the thermostability of macromolecules and this variation thus seems unlikely. However, at
this stage the genetic code was still evolving and it therefore
seems possible, although improbable, that such a temperature variation could have taken place.
We have thus reasonably shifted the high temperature
that, in light of the evidence provided here, characterised
the final phase of genetic code structuring back to the early
phase of its origin. Clearly it is not possible to carry this
retrodating operation any further at this point. However, it is
also clear that if in the early phases of genetic code origin
the temperature was high, then the whole system in which
this phase originated must have at least tolerated a high
temperature. Therefore, the phase of the origin of life that
triggered genetic code origin must have taken place at a high
temperature. Although the phases prior to this might have
taken place at a low temperature, it seems much more
reasonable to assume that the origin of life itself took
place at a high temperature. A high temperature as the environment in which life originated is contemplated by some
theories (Russell et al., 1998; Wachtershauser, 1988a,b,
1998).
Finally, we must bear in mind that the coevolution theory
of genetic code origin postulates that this origin took place
through an imprinting by the biosynthetic relationships
between amino acids on the organisation of the genetic
code (Wong, 1975). Therefore, this origin seems to have
something to do with a phase of the origin of metabolism
(Wong, 1975; Wachtershauser, 1988a; Danchin, 1989; Di
Giulio, 1993) and the origin of metabolism in some senses
seems to assume the meaning of the origin of life. In this
sense, the origin of the genetic code might be closer to the
origin of life than other arguments would lead one to believe.
If, on the other hand, the origin of the genetic code took
place as envisaged by the stereochemical theory (Woese,
1967; Shimizu, 1982; Yarus, 1998), which generally seems
to refer to an earlier period, such as the one in which the
genetic code originated, than the one contemplated by the
coevolution theory (Di Giulio, 1998), it is thought that the
conclusion remains substantially the same. Indeed, the
considerations made for the coevolution theory must likewise hold here. Therefore, according to the expectations of
the stereochemical theory, we must believe that the system
in which the genetic code originated through stereochemical
interactions between anticodons (or codons) and amino
acids (Woese, 1967; Shimizu, 1982; Yarus, 1998) must
have at least tolerated a high temperature and, therefore, it
is more likely that this system evolved at a high temperature.
Consequently, this environment probably also housed the
origin of life itself.
In conclusion, although the evidence in favour of a high-temperature code structuring does not allow us to be certain
that life originated at such a temperature, it nevertheless
seems very reasonable to believe that this could actually
have been the case.
Acknowledgements
Thanks to Prof P. Cammarano for the discussions we
have had and for giving me the CPS alignment. Also thanks
to M. Valenzi, S. Cossu and M. Petrillo for their kind help.
References
Achenbach-Richter, L., Gupta, R., Stetter, K.O., Woese, C.R., 1987. Were
the original eubacteria thermophiles? Syst. Appl. Microbiol. 9, 34±39.
Alschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic
local alignment search tool. J. Mol. Biol. 251, 403±410.
Arrhenius, G., Bada, J.L., Joyce, G.F., Lazcano, A., Miller, S., Orgel, L.E.,
1999. Science 283, 792.
Balaam, L.N., 1972. Fundamentals of Biometry. G. Allen, and Uniwin,
London, pp. 120±142.
Baross, J.A., Hoffman, S.E., 1985. Submarine hydrothermal vents and
associated gradient environments as sites for the origin and evolution
of life. Orig. Life 15, 327±345.
Danchin, A., 1989. Homeotopic transformation and the origin of translation. Prog. Biophys. Mol. Biol. 54, 81±86.
Di Giulio, M., 1989. Some aspects of the organization and evolution of the
genetic code. J. Mol. Evol 29, 191±201.
Di Giulio, M., 1993. Origin of glutaminyl-tRNA synthetase: and example
of palimpsest? J. Mol. Evol 37, 5±10.
Di Giulio, M., 1998. Re¯ections on the origin of the genetic code: a hypothesis. J. Theor. Biol 191, 191±196.
Di Giulio, M., 2000. The universal ancestor lived in a thermophilic or
hyperthermophilic environment. J. Theor. Biol 203, 203±213.
Doolittle, W.F., 2000. The nature of the universal ancestor and the evolution of the proteome. Curr. Opin. Struct. Biol. 10, 355±358.
Dufton, M.J., 1997. Genetic code synonym quotas and amino acid
complexity: cutting the cost of proteins. J. Theor. Biol. 187, 165±173.
Epstein, C.J., 1967. Non-randomness of amino-acid changes in the evolution of homologous proteins. Nature 215, 355±359.
Forterre, P., 1995. Thermoreduction, a hypothesis for the origin of prokaryotes. C. R. Acad. Sci. Paris 318, 415±422.
Forterre, P., 1996. A hot topic: the origin of hyperthermophiles. Cell 85,
789±792.
Forterre, P., Confalonieri, F., Charbonnier, F., Duguet, M., 1995. Speculations on the origin of life and thermophily: review of available information on reverse gyrase suggests that hyperthermophilic procaryotes are
not so primitive. Orig. Life Evol. Biosphere 25, 235±249.
Forterre, P., Philippe, H., 1999. Where is the root of the universal ancestor?
Bioessays 21, 871–879.
Galtier, N., Tourasse, N., Gouy, M., 1999. A nonhyperthermophilic
common ancestor to extant life forms. Science 283, 220–221.
Grantham, R., 1974. Amino acid difference formula to help explain protein
evolution. Science 185, 862–864.
Haney, P.J., Badger, J.H., Buldak, G.L., Reich, C.I., Woese, C.R., Olsen,
G.J., 1999. Thermal adaptation analyzed by comparison of protein
sequences from mesophilic and extremely thermophilic Methanococcus
species. Proc. Natl. Acad. Sci. USA 96, 3578–3583.
Hasegawa, M., Miyata, T., 1980. On the antisymmetry of the amino acid
code table. Orig. Life 10, 265–270.
Holm, N.G., 1992. Marine hydrothermal systems and the origin of life.
Orig. Life Evol. Biosphere 22, 1–241.
Jacobs, M.B., Gerstein, M.J., 1960. Handbook of Microbiology. D. van
Nostrand, London.
Jukes, T.H., 1978. The genetic code. Adv. Enzymol. 47, 375–432.
Jukes, T.H., Holmquist, R., Moise, H., 1975. Amino acid composition of
proteins: selection against the genetic code. Science 189, 50–51.
King, J.L., Jukes, T., 1969. Non-Darwinian evolution. Science 164,
788–798.
McDonald, J.H., Grasso, A.M., Rejto, L.K., 1999. Patterns of temperature
adaptation in proteins from Methanococcus and Bacillus. Mol. Biol.
Evol. 16, 1785–1790.
Miller, S.L., Lazcano, A., 1995. The origin of life–did it occur at high
temperatures? J. Mol. Evol. 41, 689–692.
Nisbet, E., 1985. The geological setting of the earliest life forms. J. Mol.
Evol. 21, 289.
Pace, N.R., 1991. Origin of life–facing up to the physical setting. Cell 65,
531–533.
Philippe, H., Forterre, P., 1999. The rooting of the universal tree of life is
not reliable. J. Mol. Evol. 49, 509–523.
Philippe, H., Lopez, P., Brinkmann, H., Budin, K., Germot, A., Laurent, J.,
Moreira, D., Muller, M., Le Guyader, H., 2000. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions.
Proc. R. Soc. London B Biol. Sci. 267, 1213–1221.
Russell, J., Daia, D.E., Hall, A.J., 1998. The emergence of life from FeS
bubbles at alkaline hot springs in an acid ocean. In: Wiegel, J., Adams,
W.W. (Eds.). Thermophiles: The Keys to Molecular Evolution and the
Origin of Life. Taylor & Francis, London, pp. 77–110.
Shimizu, M., 1982. Molecular basis for the genetic code. J. Mol. Evol. 18,
297–303.
Shock, E.L., 1996. Evolution of Hydrothermal Systems on Earth (and
Mars?), Ciba Foundation Symposium 202. Wiley, Chichester,
pp. 40–60.
Staley, J.T., Bryant, M.P., Pfennig, N., Holt, J.G., 1984. In: Hensyl, W.R.
(Ed.). Bergey's Manual of Systematic Bacteriology, Vol. 3. Lippincott
Williams & Wilkins, Philadelphia, Pa, pp. 177–187.
Stetter, K.O., 1995. Microbial life in hyperthermal environments. ASM
News 61, 285–290.
Stiller, J.W., Hall, B.D., 1999. Long-branch attraction and the rDNA model
of early eukaryotic evolution. Mol. Biol. Evol. 16, 1270–1279.
Taylor, F.J.R., Coates, D., 1989. The code within the codons. BioSystems
22, 177–187.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G.,
1997. The CLUSTAL_X windows interface: flexible strategies for
multiple sequence alignment aided by quality analysis tools. Nucleic
Acids Res. 25, 4876–4882.
Wachtershauser, G., 1988a. Before enzymes and templates: theory of
surface metabolism. Microbiol. Rev. 52, 452–484.
Wachtershauser, G., 1988b. Pyrite formation, the first energy source for
life: a hypothesis. Syst. Appl. Microbiol. 10, 207–210.
Wachtershauser, G., 1998. The case for a hyperthermophilic, chemolithoautotrophic origin of life in an iron-sulfur world. In: Wiegel, J.,
Adams, W.W. (Eds.). Thermophiles: The Keys to Molecular Evolution
and the Origin of Life. Taylor & Francis, London, pp. 47–57.
Woese, C.R., 1967. The Genetic Code. Harper & Row, New York.
Woese, C.R., 1987. Bacterial evolution. Microbiol. Rev. 51, 221–271.
Woese, C.R., Kandler, O., Wheelis, M.L., 1990. Towards a natural system
of organisms: proposal for the domains Archaea, Bacteria and Eucarya.
Proc. Natl. Acad. Sci. USA 87, 4576–4579.
Wong, J.T., 1975. A co-evolution theory of the genetic code. Proc. Natl.
Acad. Sci. USA 72, 1909–1912.
Wonnacott, T.H., Wonnacott, R.J., 1982. Introductory Statistics, Chapter
12. Wiley, New York, pp. 281–304.
Yarus, M., 1998. Amino acids as RNA-ligands: a direct-RNA-template
theory for the code's origin. J. Mol. Evol. 47, 109–117.