On the Relationships between the Genetic Code Coevolution

On the Relationships between the Genetic Code Coevolution Hypothesis
and the Physicochemical Hypothesis
M assim o Di G iulio
International Institute o f Genetics and Biophysics, C NR Via G. Marconi 10, 80125 N apoli,
Italy
Z. Naturforsch. 46c, 3 0 5 -3 1 2 (1991); received September 28, 1990
Genetic Code Theories, C oevolution, Precursor-Product Amino Acids, Am ino Acids Proper­
ties, Hypergeometric Distribution
This paper analyzes the relationships between the genetic code coevolution hypothesis and
the physicochemical hypothesis by means o f a comparative study o f the precursor-product
amino acid pairs on which the former hypothesis is based. Even if the coevolution between the
biosynthetic relationships o f amino acids and the organization o f the genetic code is not ques­
tioned in this paper, the results and the arguments used lead us to believe that the selective
pressures considered essential by the physicochemical postulates, played a more active role
than that o f the precursor-product relationships in defining the allocation o f these amino acids
in the genetic code. It is furthermore pointed out that the two evolutionary hypothesis might
be aspects o f the same selective pressure, and thus difficult to differentiate.
Introduction
Some recent findings, such as self-splicing
rR N A in tro n o f ciliate Tetrahym ena and oth er
R N A m olecules having catalytic activity [1], have
given m ore credibility to the suggestion [2] that
when life started on this plan et, R N A was used as
both a genom e and a catalyst o f its own replication
[3]. W einer an d M aizels [4] discussed a m odel for
the replication o f R N A m olecules th a t seems to
have som e im plications for p ro tein synthesis. They
[4] suggest th a t tR N A -like stru ctu res which were
im p o rtan t for the initiation o f R N A synthesis,
m ight have been specifically am inoacylated with
an am ino acid thro u g h an a b a rra n t activity o f the
replicase. In this m odel the association o f an am i­
no acid to a specific cod o n should merely reflect
the fo rtu ito u s affinity o f the am in o acid for the ac­
tive site in a specific tR N A synthase: R N A repli­
case in W einer an d M aizels’s m odel.
The association o f an am in o acid to a specific
codon can also be explained by tw o theories p ro ­
posed several years ago.
The first o f these, the coevolution hypothesis o f
the genetic code [5, 6], suggests th a t cod o n d istri­
bution betw een am ino acids was m ainly d eter­
m ined by the concession o f co d o n s from the p re­
cursor am ino acid to the p ro d u c t am ino acid
Reprint requests to Dr. M. Di G iulio.
Verlag der Zeitschrift für N aturforschung, D-7400 Tübingen
0939-5075/91/0300-0305 $01.30/0
form ed from the precursor. T he o rg anization o f
the genetic code should, therefore, reflect the bio­
synthetic relationships betw een am ino acids. This
evolutionary hypothesis seems to agree w ith the
model discussed by W einer and M aizels [4]: once
the tR N A -like structure was charged by an am ino
acid through the ab erran t activity o f the R N A re­
plicase, the am ino acid was converted into the
product am ino acid while it was still attached to
the tR N A -like m olecule, as discussed by W ong [5].
There have been several direct [5], indirect [6, 7]
and com patible [8-1 4 ] confirm ations o f the co­
evolution m echanism .
On the other hand, the physicochem ical h y p o th ­
esis o f the genetic code [15, 16], postulates that
am ino acid allocation in the genetic code reflects
the physicochem ical properties o f the am ino acids,
since the organization o f the genetic code was de­
term ined by selective pressures tending either to
minimize the deleterious effects o f m u tatio n s [15]
or to reduce translation errors [16], This ev o lu tio n ­
ary hypothesis also seems to agree w ith W einer
and M aizels’s [4] model: we need only to postulate
th at the tR N A -like structure was charged by
tR N A synthase, th ro u g h a generic discrim ination
between am ino acids based on their physicochem i­
cal properties. The successive im provem ent o f this
ap p aratu s through the above selective pressures
probably determ ined am ino acid allocation in the
genetic code.
The physicochem ical selective m echanism also
appears to have received several c o rro b o ra tio n s
Dieses Werk wurde im Jahr 2013 vom Verlag Zeitschrift für Naturforschung
in Zusammenarbeit mit der Max-Planck-Gesellschaft zur Förderung der
Wissenschaften e.V. digitalisiert und unter folgender Lizenz veröffentlicht:
Creative Commons Namensnennung-Keine Bearbeitung 3.0 Deutschland
Lizenz.
This work has been digitalized and published in 2013 by Verlag Zeitschrift
für Naturforschung in cooperation with the Max Planck Society for the
Advancement of Science under a Creative Commons Attribution-NoDerivs
3.0 Germany License.
Zum 01.01.2015 ist eine Anpassung der Lizenzbedingungen (Entfall der
Creative Commons Lizenzbedingung „Keine Bearbeitung“) beabsichtigt,
um eine Nachnutzung auch im Rahmen zukünftiger wissenschaftlicher
Nutzungsformen zu ermöglichen.
On 01.01.2015 it is planned to change the License Conditions (the removal
of the Creative Commons License condition “no derivative works”). This is
to allow reuse in the area of future scientific usage.
306
M. Di G iulio • G enetic C ode E volution
binding the physicochem ical properties o f the am i­
no acids to the o rg an izatio n o f the genetic code
[17-26],
In this p ap er I analyze the relationships between
the coevolution hypothesis [5] and the physico­
chem ical one [15, 16] by m eans o f a com parative
study on the p recu rso r-p ro d u ct am ino acid pairs
on w hich the fo rm er hypothesis is based, so as to
help identify w hich o f the selective pressures con­
trib u ted m ost in determ ining the allocation o f the
am ino acids in a p recu rso r-p ro d u ct relationship in
the genetic code.
R esults
The coevolution hypothesis o f the genetic code
[5, 6] is essentially based on the study o f the conti­
guity o f the p recu rso r and p ro d u c t am ino acid co­
dons. W ong [5] uses the hypergeom etric d istrib u ­
tio n to establish the co rrelatio n between codon
allocation in the genetic code and biosynthetic
p athw ays o f the am ino acids. In p articu lar, W ong
[5] considers th a t fo r every precu rso r codon, there
will be a o th er triplets in the genetic code, which
are contiguous to the am ino acid codifying codons
a n d b oth er triplets th at are not. If a p ro d u ct o f a
given am ino acid is codified by n codons, the ra n ­
d o m probability, P, th a t as m any as x o f these n
co dons will be co n tig u o u s to som e o f the precursor
codons, can be determ ined according to the hyper­
geom etric d istrib u tio n [27]:
p_ y
a\
v (tf-.x:)!.x!
b\
(b~n+ x)\(n~x)\
(a+b~n)\n\
(a+b)\
W ong [5], therefore, associates a probability, P,
to every pair o f p recu rso r-p ro d u ct am ino acids
th a t this p articu lar association was determ ined by
chance. F inally, he [5] calculates the - 2 -lnP
qu an tities an d , uses F ish er’s [28] m ethod, which
established th a t the - 2 • InP q u an tity is distributed
according to the x 2 law w ith tw o degrees o f free­
d o m (d/). He then sum s these q u antities ( - 2 T n P),
a n d from the to ta l x 2 establishes the p ro bability of
the aggregate fo r the set o f the precurso r-p roduct
p airs th a t he considers.
W ong [5] exam ines eight pairs o f am ino acids
fo r w hich the p recu rso r-p ro d u ct relationship is cer­
tain and finds a x 2 value o f 45.01 (df = 16) and,
therefore, a p ro b ab ility o f 1.4 x 10 4 th a t for these
eight pairs the set o f p recu rso r-p ro d u ct contiguities
was determ ined by chance. (T he calculation o f the
probability associated to a given x 2 value was ca r­
ried out using equation 4.2 show n in L ancaster
[29].)
In this analysis W ong [5] does not include pairs
o f p recursor-product am ino acids for w hich the re­
lationship is less certain alth o u g h very plausible. I
have extended W ong’s analysis so as to include a
further seven pairs for w hich the contiguity be­
tween the precu rso r-p ro d u ct am ino acid codons
can be seen in the actual genetic code. The results
are show n in T able I. T he first eight pairs are those
taken from W ong [5], while the last seven pairs are
the ones th at I have added. The x2 value for
the seven additional p airs is 25.73 ( d /= 1 4 ,
P = 0.0280). C onsidering the set o f all fifteen pre­
cursor-product pairs (T able I), we get a x 2 value o f
70.74 (df= 30) and a p ro b ab ility o f 3.9 x 10“5 th at
the p recursor-product relationships were gen erat­
ed by chance.
The physicochem ical hypothesis [15, 16] a ttrib ­
utes a fundam ental role to the physicochem ical
properties o f the am ino acids in the organization
o f the genetic code. It could be asked, therefore, as
in the p recu rso r-p ro d u ct relationship, w hat the
probability is th a t the choice o f th a t particu lar pair
o f am ino acids was determ ined according to the
Table I. Random probability o f precursor-product
codon contiguities. The parameters a, b, n, x and P are
defined by Eqn. (1) and in the text. For the precursorproduct pairs that are biosynthetically interconvertible,
as can be seen in W ong’s [5] figure 1, the precursor-product direction has been established on the basis o f this fig­
ure, shifting from the central am ino acids (Asp, Glu)
towards the amino acids on the outside (products).
Precursorproduct
a
S er-T rp
S er-C y s
V a l-L e u
T h r-Ile
G ln -H is
P h e-T y r
G lu -G ln
A sp -A sn
T h r-M et
A la -S e r
G ly -S e r
A la -G ly
G lu -A sp
A sp -A la
G lu -A la
34 24
34 24
24 36
24 36
14 48
14 48
14 48
14 48
24 36
24 36
24 36
24 36
14 48
14 48
14 48
b
n
.r
P
1
2
6
1
2
6
0.586
0.339
0.00268
0.0591
0.0481
0.0481
0.0481
0.0481
0.400
0.167
0.780
0.0218
0.0481
0.217
0.217
3
2
2
2
2
1
6
6
3
2
2
2
2
1
4
2
4
4
4
4
2
2
2
2
- 2 T nP
1.07
2.16
11.84
5.66
6.07
6.07
6.07
6.07
1.83
3.58
0.50
7.65
6.07
3.05
3.05
M. Di G iulio • G enetic C od e E volution
physicochem ical prop erties o f am ino acids. In o r­
der to calculate this pro b ab ility , we first o f all need
an index to m easure the sim ilarity o r dissim ilarity
between the am ino acids. Since we can expect that
such an index m ust co rrelate w ith som e m easure o f
evolutionary acceptability [7], i.e. the substitution
o f one am ino acid w ith an o th e r, the choice seems
to fall on G ra n th a m ’s [30] index, because it co rre­
lates well w ith the relative su b stitu tio n frequencies
o f am ino acids. Since this index is the com bination
o f three am ino acid p ro p erties (polarity, m olecular
volum e and com position), tw o o f w hich have been
recognized as being im p o rta n t in the o rganization
o f the genetic code [21, 25], it seems a t present to
be the best index available in literatu re and could
potentially be used in the below analysis.
The probability associated to a given pair o f
p recursor-p ro d u ct am ino acids w hich is the expres­
sion o f their physicochem ical p roperties, m ust be
affected by the nu m b er o f tim es th a t this p articu lar
pair o f am ino acids is selected in the genetic code,
as this num ber inform s us o f the im p o rtan ce that
selection has a ttrib u te d to a given p recu rso r-p ro d ­
uct pair.* The coevolution hypothesis is verified
* As 1 have said above from a general point o f view the
probability associated to a given pair o f precursorproduct amino acids must reflect an aspect o f the
genetic code, indicating the im portance that the choice
o f that pair has had in the organization o f the code.
For instance, for the precursor-product pairs
Val-Leu and Ser-Trp it seems natural to believe that
this probability must be affected by the fact that in the
genetic code the Val-Leu pair is chosen 6 times while
the Ser-Trp is chosen only once. Therefore, in the ev­
olution o f the code the Val-Leu pair must have been
more important than Ser-Trp, and this must be re­
flected in the probability. M oreover, it does not seem
relevant to think that the number o f times that the
code chooses a given pair o f amino acids (or equiva­
lently, the number o f contiguous codons; see foot­
note **), must be normalized compared to an expect­
ed value. (This expected value could be determined:
(i) on the basis o f the total number o f times that a
given amino acid interchanges with other amino acids,
or (ii) compared to other and not better specified
characteristics o f the genetic code.) In this case the
choice made during the evolution o f the code would
be replaced with another number that, even taking
other aspects o f the code into account, could not be
more correct than the one given by the number o f
limes that an amino acid interchanges with another.
This is because it is this number that acted during the
evolution o f the code, and it is to this number that a
probability, which must in some way be an expression
o f the physicochemical properties o f the amino acids,
307
on the notion o f contiguous codons between the
precursor and the p ro duct [5] and not on the n u m ­
ber o f times th at a given p recursor changes in to a
product, when its codons undergo a single base
change. It is clear th at in o rd er to com pare the two
evolutionary hypotheses, we m ust also consider
the case in which the weight to be attrib u te d to the
physicochem ical distance o f the p recursor-product
pair is based on the num ber o f contiguous codons
between the precursor and its p roduct, an d not
only on the num ber o f tim es th at tw o in ter­
change.**
has to refer. This number represents a link between
the pair o f amino acids that was determined during
the evolution o f the code and that conditioned its or­
ganization. M oreover, its substitution with another
normalized number, which is abstract to som e extent,
cannot be considered because in this context it would
not reflect the actual organization o f the genetic code.
Finally, it must be stressed that the probabilities cal­
culated by W ong [5] depend in the final analysis, al­
m ost exclusively on the number o f contiguous codons
between the two amino acids. Therefore, a probability
reflecting the physicochemical properties o f the amino
acids and which also incorporates an aspect o f the ge­
netic code, must also be affected by the same weights,
at least for comparative purposes as in this paper.
** In actual fact all the calculations have been carried
out in conditions that are meaningful for the coevo­
lution hypothesis. The use o f hypergeometric distri­
bution (Epn. (1)) on the set o f codons contiguous to
a given precursor, is meaningful for providing a sta­
tistical base for the coevolution hypothesis, but has
no real biological meaning. On the other hand, the
mutational or translational structure o f the genetic
code does have a biological meaning. In other words,
for the Ser-Cys pair for instance, the important thing
from a biological point o f view is not the fact that
2 Cys codons are contiguous to the Ser codons, but
the fact that Ser (Cys) transforms into Cys (Ser)
4 times, when its codons undergo single mutations or
translation errors. The two evolutionary hypotheses
should, therefore, be compared in the more appro­
priate field to the mutational (translational) struc­
ture o f the genetic code. I have performed these cal­
culations, i.e. I have extended the use o f hypergeo­
metric distribution to the set o f all 392 changes o f
sense defined in the genetic code after redefining the
parameters. And in parallel, in the Z test (see below)
I have used the number o f times that a given precur­
sor transforms into the product (on the base o f the
genetic code) as a weight to be attributed to the dis­
tance between the precursor and its product. I have
obtained a similar result to the ones referred to be­
low, even if the differences in the probability values
associated to the x 2 values are more pronounced.
These results are not given because they reach the
same conclusions, albeit more markedly, as those
shown in the paper.
308
The pro b ab ility associated to a given precursorp ro d u ct pair is calculated by Z test [31]. In order to
un d erstan d w hy this test is used, we m ust first
m ake som e considerations.
R egardless o f the m echanism determ ing the al­
location o f am ino acids in a p recu rso r-p ro d u ct re­
lationship, if in th a t m echanism the physicochem i­
cal p roperties o f the am ino acids were involved
(here m easured using G ra n th a m ’s [30] distances),
then the only distances involved in this allocation
were the 37 distances betw een the p recursor and its
p ro d u ct. If the contiguity betw een the codons o f
the precu rso r and those o f the pro d u ct was, for
exam ple, determ ined by a co m petition m echanism
betw een these tw o am ino acids so as to be charged
o n to prim o rd ial tR N A s [5, 10], then the distances
in w hich the p recu rso r an d p ro d u ct am ino acids
are not involved w ould not have been at all able to
affect the contiguity betw een these tw o am ino
acids. H ence the question we m ust answ er is: w hat
is the pro b ab ility th at the characteristic distance of
the p recu rso r-p ro d u ct pair (w hen weight w ith the
n um ber o f contig u o u s co dons betw een these two
am ino acids) w as extracted from the p o p u lation o f
the 37 distances in w hich the precursor and the
p ro d u ct am ino acid participate, and not from the
p o p u latio n o f the 190 distances in w hich all tw enty
am ino acids are involved?
This is an unu su al but favorable statistical situa­
tion. The above argum ents help us understand
th a t the 37 physicochem ical distances, in which the
p recu rso r an d its p ro d u ct are involved, do not re­
present the sam ple b u t are the p o p u latio n o f all the
interactions (at least o f those th a t have been signif­
icant in the definition o f the genetic code) between
the p recu rso r-p ro d u ct and the molecules, such as
tR N A , which m ay have determ ined the position
occupied by these am ino acids in the genetic code.
Since we can identify the m ain statistical indices of
this p o p u latio n , we are in the m ost advantageous
position to apply the Z test [31]. M ore specifically,
by calculating the m ean and variance o f the 37 dis­
tances, we can ask w ith w hat p robability the dis­
tance o f the p recu rso r-p ro d u ct pair was extracted
from this p o p u latio n o f distances, when its n um er­
ousness is given by the n u m b er o f contiguous co­
dons betw een these tw o am ino acids.
T hus, the value o f the p recu rso r-p ro d u ct dis­
tance is considered as a sam ple extracted from the
p o p u latio n o f the 37 distances. We can, therefore,
M. Di G iu lio ■G enetic C ode E volution
m easure w hether or n ot the precu rso r-p ro d u ct dis­
tance can be regarded as being indicative o f the
population o f the 37 distances representing its n a t­
ural control. F o r instance, in the case o f the ValLeu pair, the m ean and sta n d a rd deviation o f the
37 distances [30] in w hich Val an d Leu participate,
are equal to 93.784 and 48.448 respectively, and
since the num ber o f co n tiguous codons is 6, the
value o f the standardized n o rm al variable (Z) is
^ = (^v a i.L eu -^ )/(W W /2) = (32-93.784)/(48.448/
(6)12) = -3 .1 2 3 7 , where 32 is the value o f the
G ra n th a m ’s distance betw een Val and Leu. F rom
the value Z = -3 .1 2 3 7 we get the required p ro b a ­
bility: P ( Z ^ -3 .1 2 3 7 ) = 8.929 x 10“4. This p ro b a ­
bility indicates th a t the V al-Leu distance when
weighted with a frequency o f 6, ca n n o t be regarded
as being representative o f the p o p u latio n o f the 37
distances, and hence physicochem ical selective
pressures m ust have acted on the choice o f this pair.
The results o f this analysis are show n in
Table II. F o r the eight pairs analyzed by W ong [5]
we get P = 1.7 x 10~6(x 2 = 56.89, d / = 16) and for
the rem aining seven pairs we get P = 0.00894 (x2 =
29.50, d /= 14); while the p robability for the set o f
all fifteen am ino acid pairs is 2.3 x 10"7 (x 2 = 86.39,
d /= 30; Table II).
Discussion
In discussing these results it becom e im m ediate­
ly clear th at the coevolution hypothesis [5] does
not appear to be based on sufficiently solid fo u n ­
dations, since all the y j values th a t can be associat­
ed to this theory are system atically below those de­
rived from the physicochem ical postulates [15, 16].
Specifically, the y} p ro bability o f the set o f the fif­
teen pairs o f am ino acids derived from the physi­
cochem ical postulates is a b o u t 170 times sm aller
(about 80 times sm aller for the eight m ain pairs)
than the one calculated for the coevolution hy­
pothesis. A nd even if w ith the coevolution h y p o th ­
esis there should be a certain extension, albeit lim ­
ited, o f the m inim ization o f the physicochem ical
distances in the genetic code [7], m ay be relevant to
ask the following questions. H ow can a subsidiary
selective m echanism , the physicochem ical one, as
specified in the coevolution hypothesis [7], give
probability lower th a n those o b tain ed from the as­
sum ed m ain theory? O r are the tw o evolutionary
hypotheses perhaps aspects o f the sam e phenom e­
M. Di G iulio • G enetic C ode E volution
309
Table II. (i and a are the mean and the standard deviation, respectively, o f the 37
distances [30] in which the precursor and its product are involved. Dtj is the value
o f the Grantham’s [30] distance between the two amino acids; .v indicates the
number o f contiguous codons between the precursor and the product and this
number represents numerousness in the Z test [31]. Z obs = (D l} - }i)/(CT/(.r)12) and
P (Z
Z obs) is the probability o f observing that particular Z value, representing
the normal standardized variable: N (0,1).
S er-T rp
S er-C ys
V a l-L eu
T h r-Ile
G ln -H is
P h e-T yr
G lu -G ln
A sp -A sn
T h r-M et
A la -S e r
G ly -S er
A la -G ly
G lu -A sp
A sp -A la
G lu -A la
111.865
141.703
93.784
85.000
81.649
99.486
89.757
111.594
83.162
102.432
108.703
105.594
107.459
107.351
97.135
45.145
48.510
48.448
43.820
33.918
51.207
38.750
45.254
41.377
36.853
37.514
36.785
45.716
43.110
39.286
DU
177
112
32
89
24
22
29
23
81
99
56
60
45
126
107
X
1
2
6
3
2
2
2
2
1
4
2
4
2
2
2
JOS
N
a
Amino
acid pairs
+1.4428
-0 .8 6 5 9
-3 .1 2 3 7
+ 0.1581
-2 .4 0 3 7
-2 .1 4 0 0
-2 .2 1 7 4
-2 .7 6 8 6
-0 .0 5 2 2
-0 .1 8 6 2
-1 .9 8 6 8
-2 .4 7 8 9
-1 .9 3 2 2
+0.6118
+ 0.3551
P {Z =S Zobs )
- 2 - ln P
0.9255
0.15
0.1933
3.29
8.929 x 10“
4 14.04
0.5628
1.15
0.0081
9.63
0.0162
8.25
8.64
0.0133
11.74
0.00282
0.4792
1.47
0.4261
1.71
0.0235
7.50
10.04
0.0066
0.0267
7.25
0.7296
0.63
0.6387
0.90
non? However, W ong’s [5] results and those m en ­
tioned so far in this p ap er need to be clarified.
One possible in terp retatio n m ay be given by the
fact th a t the physicochem ical prop erties o f the
precursor-p ro d u ct am ino acids, m easured using
G ra n th a m ’s index are preserved along the biosyn­
thetic pathw ays linking them , even if a p riori there
is nothing th a t states th a t this should occur. I have,
therefore, calculated the p artial (linear) c o rrela­
tion coefficient [32] for the fifteen pairs o f am ino
acids between the - 2 T n P qu an tities (T able I and
II), while keeping co n stan t, an d thus elim inating,
the influence o f the x p aram eter values th a t link
the - 2 T n P variables. This gives a p artial co rrela­
tion coefficient o f R = +0.676 ( P < 0.005, t = 3.177,
d f= 12). The co rrelatio n is significant even when
we consider only the eight m ain am ino acids pairs.
This proves th a t G ra n th a m ’s distances are, at least
partly, preserved along the biosynthetic pathw ays
linking the precu rso r to the p ro d u ct am ino acid.
W ong’s [5] observations m ight, therefore, be
due to the fact th a t the physicochem ical p roperties
o f the am ino acids are preserved along the biosyn­
thetic pathw ays linking them . F u rth erm o re, if the
m ain selective pressure determ ining the o rg an iza­
tion o f the genetic code, was the one invoked by
the physicochem ical postu lates [15, 16], then this
pressure m ight not only have affected the o rg an i­
zation o f the code, but also have determ ined the
choice o f the biosynthetic pathw ays linking the
various pairs o f am ino acids.* M oreover, even if
coevolution between the biosynthetic pathw ays
and the genetic code is not questioned here, we are
trying to identify the selective pressure th at was
m ainly responsible for the organization o f the
genetic code.
The coevolution hypothesis identifies in the ex­
tension o f the chem ical varieties o f the am ino acid
lateral chains the m ain selective pressure determ in­
ing the organization o f the genetic code [6, 33]. If,
therefore, the in troduction o f the new am ino acids
occurred through the concession o f codons from
the precursor to the pro d u ct am ino acid then (and
so th a t there is a selectable extension o f the chem i­
cal varieties o f the protein molecule lateral chains),
* The correlation found between the - 2 • \nP quantities
(R = + 0.676, P < 0.005) may be interpreted different­
ly. The pairs o f amino acids for which there is a low
probability that they are involved by chance (on the
basis o f the genetic code), are characterized by having
very similar physicochemical properties. Therefore,
this association might have little to do with the precursor-product relationships, but could be a m anifesta­
tion o f the non-random distribution o f the physico­
chemical properties in the genetic code.
M. Di G iulio • G enetic C ode E volution
310
we can expect th a t the am ino acids in the precursorp ro d u ct relationship m ust have dissim ilar physico­
chem ical properties. Let us consider two develop­
ing biosynthetic pathw ays th a t link the same pre­
cu rso r to tw o pro d u cts, one sim ilar and the other
dissim ilar to the precursor. The selective ad van­
tage o f fixing the biosynthetic pathw ay th at links
the precu rso r to its dissim ilar p ro d u ct, m ust have
been g reater th an th a t fixing the com peting p a th ­
w ay th a t links the precu rso r to a sim ilar product.
T he reason for this is, if the m ain selective pressure
was indeed the one th a t increased the variety o f the
am ino acid lateral chains, then the intro d u ction o f
a p ro d u ct th a t was dissim ilar to its precursor
w ould certainly have caused a greater increase in
the chem ical varieties o f the am ino acid lateral
chains, com pared to th a t m ade by a pro d u ct simi­
lar to its precursor. T his view helps us understand
th a t in o rd er to fully verify the coevolution hy­
pothesis, the correlatio n coefficient between the
- 2 T n P q u antities m ust have been significant and
negative, i.e. co n trary to observations. This, im ­
plies the selection o f dissim ilar p recursor-product
pairs in the physicochem ical properties, which
w ould have rapidly expanded the chem ical varie­
ties o f the lateral chains o f the proteins, thus im ­
proving their catalytic efficiency.
T able III show s a code obtained from the genet­
ic code th ro u g h p erm u tatio n s o f am ino acids, for
w hich the above conditions hold true. This code
Table III. The code obtained from the actual genetic
code by simple permutation o f the amino acids, that is
keeping the blocks o f synonym ous codons unchanged as
in the actual code. See text for the meaning o f this code.
Ter-chain termination signal.
UUU
UUC
UUA
UUG
Glu
Glu
Ala
Ala
U C U lie
u c c lie
U CA lie
U C G lie
UAU
UAC
UAA
UAG
Phe
Phe
Ter
Ter
UGU
UGC
UGA
UGG
Gin
Gin
Ter
Lys
CUU
CUC
CUA
CUG
Ala
Ala
Ala
Ala
CCU
CCC
CCA
CCG
Met
Met
Met
Met
C AU
CAC
C AA
CAG
Tyr
Tyr
Val
Val
CG U
CGC
CGA
CGG
Gly
Gly
Gly
Gly
AUU
AUC
AUA
AUG
Trp
Trp
Trp
Ser
A CU
ACC
ACA
ACG
Thr
Thr
Thr
Thr
AAU
AAC
AAA
AAG
Pro
Pro
Cys
Cys
AGU
AGC
AGA
AGG
Ile
Ile
Gly
Gly
GUU
GUC
GUA
GUG
Asp
Asp
Asp
Asp
GCU
GCC
GCA
GCG
Arg
Arg
Arg
Arg
GAU
GA C
GAA
GAG
Asn
Asn
Leu
Leu
GGU
GGC
GGA
GGG
His
His
His
His
has the sam e fifteen precursor pro d u ct am ino acid
pairs o f the actual code. If we apply the hypergeo­
metric distrib u tio n (Eqn. (1)) we get, for the fifteen
pairs exam ined in this paper, an %2 value o f 87.46
which is greater th an the corresponding value
X2 = 75.11, o btained by applying the Z test (data
not shown). M oreover, this code (T able III) shows
a partial co rrelatio n coefficient betw een the
- 2 ln P quantities (d ata n o t show n) th at is nega­
tive and significant (R = -0 .4 4 3 , t = 1.712, d /= 12,
-P = 0.05). The co n stru ctio n o f this code merely
aim s to show the possibility th at bo th the x 2 value
and the sign and significance o f the correlation
coefficient betw een the - 2 -ln P quantities could
fully verify the coevolution hypothesis. W hen this
code (Table III) is com pared to the real code, it
seems to highlight the fact th a t o ther selective pres­
sures m ust have been in play. M oreover, although
the biosynthetic relationships between am ino acids
are certainly reflected in the organization o f the ge­
netic code, they do n o t seem to confirm the predic­
tions o f the coevolution hypothesis. In fact, these
observations seem to show th at the selective pres­
sure invoked by this hypothesis (even if so general
and hence un d o u b ted ly in play) should have p ro ­
duced a genetic code for the type show n in
Table III. This table shows th a t the concession o f
codons from p recursor to pro d u ct am ino acids is
m ore m anifest com pared to the physicochem ical
postulates.
Finally, the low p ro bability (P = 1.7 x 10“6,
X 2 = 56.89; T able II) associated to the eight m ain
pairs o f precu rso r p ro d u ct am ino acids could indi­
cate th at for these pairs there m ight have been a
com petition m echanism between the precursor
and the p ro d u ct to be charged o nto prim ordial
tR N A [5, 10]. This w ould entail the lack o f pairs o f
am ino acids in the genetic code in which the p re­
cursor am ino acid is involved b ut has no precursor
product relationships and, for which, a lower
probability th an 1.7 x 10“6 can n o t be calculated.
This does n o t agree w ith the following result. In
the genetic code we can choose eight pairs o f am i­
no acids in w hich the precu rso r am ino acid is in­
volved (Ser-A sn, Ser-Thr, Val-Ile, T hr-P ro, G lnLys, Phe-Leu, G lu-Lys, A sp-H is) and for which
we get P = 1.8 x 10 7 (y} = 62.70, df = 16; d ata not
shown). This p ro bability is ab o u t 9 tim es lower
than the one referring to the eight m ain pairs o f
p recursor-product am ino acids (T able II). H ow ­
M . Di G iulio • G enetic Code E volution
311
ever, even if these probabilities are a b o u t the sam e
m agnitude and, therefore, such as not to defini­
tively exclude the com petition m echanism between
the precursor product am ino acids to be charged
on the prim ordial tR N A ; this com petition m echa­
nism w ould have been based n o t on an am ino acid
distance index but rather on an am ino acid com pe­
tition index. There is some evidence th a t this am i­
no acid com petition m echanism was n o t one o f the
m ain factors determ ining the o rganization o f the
genetic code [34],
In my opinion the analysis m ade in this pap er
better highlights the relationships between the co ­
evolution and the physicochem ical hypothesis.
F urtherm o re, this analysis seems to have identified
in the preservation o f the physicochem ical p ro p e r­
ties o f the am ino acids in a p recu rso r-p ro d u ct rela­
tionship, a statistical significance factor higher
th an the one th at can be associated to the precursor-product relationships.
In conclusion, W ong’s [5] observation and those
show n here can be regarded as being tw o aspects
o f the sam e phenom enon. In this case, since the
theories discussed above are the results o f the sam e
selective pressure, they can n o t be easily differen­
tiated even with the com parison used in this paper,
and therefore the evaluation o f the relative im p o rt­
ance o f the tw o evolutionary hypotheses [7] in de­
term ining the organization o f the genetic code,
could lose som e o f its significance. This seems to be
confirm ed by the study perform ed by Ju rk a and
Sm ith [35], These au th o rs stress th at in the prebiotic
environm ent the ß-turns have becom e the object o f
selection and have affected b oth the evolution o f
the genetic code and the biosynthetic pathw ays for
am ino acids.
T he au th o rs [35] also note th a t the simple
codon-am biguity-reduction m odel o f the genetic
code [14] (which does n ot seem to be essentially
different from w hat has been here reported as the
physicochem ical hypothesis) m ay produce, a code,
in only tw o reduction steps th at accounts for
ß-turns, ß-sheets and hydrophilic-hydrophobic
structures. T hus, the coevolution between the bio­
synthetic pathw ays and the organization o f the ge­
netic code could be interpreted as a m anifestation
o f the im portance o f the preservation o f am ino
acid physicochem ical properties, which agrees
w ith the m ain result shown in this paper. This
gives a higher statistical significance in the case in
w hich the organization o f the genetic code is inter­
preted from the p o int o f view o f physicochem ical
properties, th an in the case where this is seen from
the point o f view o f precursor-product relation­
ships.
[1] T. R. Cech and B. L. Bass, Ann. Rev. Biochem. 55,
599(1986).
[2] F. H. C. Crick, J. Mol. Biol. 38, 367 (1968).
[3] T. R. Cech, Proc. Natl. Acad. Sei. U .S.A . 83, 4360
(1986).
[4] A. M. W einerand N. Maizels, Proc. Natl. Acad. Sei.
U .S.A . 84, 7383(1987).
[5] J. T. F. W ong, Proc. Natl. Acad. Sei. U .S.A . 72,
1909(1975).
[6] J. T. F. W ong, Proc. Natl. Acad. Sei. U .S .A . 73,
2336(1976).
[7] J. T. F. Wong, Proc. Natl. Acad. Sei. U .S.A . 77,
1083(1980).
[8] F. R. Salemme, M. D. Miller, and S. R. Jordan,
Proc. Natl. Acad. Sei. U.S. A. 74, 2820 (1977).
[9] J. T. F. W ong and P. M. Bronskill, J. M ol. Evol. 13,
115(1979).
[10] J. T. F. W ong, Proc. Natl. Acad. Sei. U .S .A . 80,
6303(1983).
[11] F. Zinoni, A. Birkmann, W. Leinfelder, and A.
Boch, Proc. Natl. Acad. Sei. U .S.A . 84, 3156 (1987).
[12] A. Schon, C. G. Kannagara, S. Gough, and D. Soli,
Nature 331, 187(1988).
[13] D. Soil, Nature 331, 662 (1988).
[14] W. M. Fitch and K. Upper, Cold Spring Harbor
Sym. Quant. Biol. 52, 759 (1987).
[15] T. M. Sonneborn, in: Evolving Genes and Proteins
(V. Bryson and H. J. Vogel, eds.), pp. 3 7 7 -3 9 7 ,
Academic Press, New York 1965.
[16] C. R. W oese, D. H. Dugre, M. Kando, and W. C.
Saxinger, Cold Spring Harbor Sym. Quant. Biol. 31,
723(1966).
[17] C. J. Epstein, Nature 2 1 0 ,25 (1966).
[18] A. L. Goldberg and R. E. Wittes, Science 153, 420
(1966).
[19] M. V. Volkenstein, Biochim. Biophys. Acta 119, 421
(1966).
[20] C. Allf-Steinberger, Proc. Natl. Acad. Sei. U .S.A.
6 4 ,5 8 4 (1 9 6 9 ).
[21] J. R. Jungck, J. M ol. Evol. 11,211 (1978).
[22] A. L. Weber and J. C. Lacey, Jr., J. M ol. Evol. 11,
199(1978).
[23] R. V. W olfenden, C. C. F. Cullis, and C. C. B.
Southgate, Science 206, 575 (1979).
[24] M. Sjostrom and S. W old, J. M ol. Evol. 22, 272
(1985).
[25] M. Di G iulio, J. M ol. Evol. 29, 191 (1989).
[26] M. Di Giulio, J. M ol. Evol. 29, 288 (1989).
[27] N. L. Johnson and S. Kotz, Urn M odels and their
A pplication, Chapter 2, pp. 7 9 -8 3 , John Wiley &
Sons, New York, London, Sydney, Toronto 1977.
[28] R. A. Fisher, Statistical M ethods for Research
312
Workers, 11 th ed., p. 99, Oliver & Boyd, Edinburgh,
London 1950.
[29] H. O. Lancaster, The chi-Squared Distribution,
p. 21, John Wiley & Sons, New York, London,
Sydney, Toronto 1969.
[30] R. Grantham, Science 185, 862 (1974).
[31] L. N. Balaam, Fundamentals o f Biometry, Chapter
6, George Allen & Unwin (eds.), London 1972.
M. Di G iulio ■G enetic Code E volution
[32] D. A. Kenny, Correlation and Casuality, pp. 5 4 -5 6 ,
John Wiley & Sons, N ew York, London, Sydney,
Toronto 1979.
[33] J .T . F. W ong, TIBS 6 ,3 3 (1 9 8 1 ).
[34] M. Di G iulio, The Evolution o f the Genetic Code: A
Comparison between some Theories (unpublished
results, available on request).
[35] J. Jurka and T. F. Smith, J. Mol. Evol. 25, 15 (1987).