The Universal Genetic Code described by a Model

The Name of the Journal
The Universal Genetic Code described by a
Model based on Group Theory
Paola Pozzo
Abstract
This study was performed to verify whether a mathematical model based on group theory can
describe the universal genetic code and the behavior of codon sequences. We investigated the reproducibility of the 64 variations of nucleotide triplets forming the genetic code and observed that
the model, based on a 4th order cyclic group, can describe the DNA sequence with a unique distribution
of nucleotides. Moreover, the model reproduces the shape of the 3D structures of codon sequences, in
some cases with surprising precision. We also investigated the possibility that special points along the
sequence can represent binding sites. The comparison of our sets of points with the predictions of some
specific tools gives promising results. Additionally, the analysis of the triplet codon selection of some
proteins seems to confirm the recent idea that the 5’ sequence of mRNAs strongly influences their translation.
1 Introduction
Group theory (Rotman, 1994) is the mathematical instrument used to study the symmetries of objects in all disciplines that use mathematical models and computational techniques. The general procedure to describe a system is to associate it with a suitable symmetry group. Various physical systems can be modelled by symmetry groups. Thus, group theory has many applications in physics and chemistry, but up to now it has had few applications in biology. We investigated the possibility that a mathematical description based on group theory can describe the universal genetic code and the behavior of codon sequences.
The genetic code (Crick, 1988) is the set of rules by which information encoded in DNA or mRNA sequences is translated into proteins. The genetic code defines a mapping between nucleotide triplets of mRNA, called codons, and the amino acids forming the backbone of a protein (Ridley, 2006). Nearly all species use the same translation code, which is referred to as the universal genetic code (Elzanowski & Ostell, 2008). Coding regions of genes can be considered as short instructions built up from the “letters” of the DNA alphabet. The genetic code is degenerate, since there are 64 variations of nucleotide triplets, but only 20 amino acids and a translation stop signal to be coded by them. Consequently, some amino acids are encoded multiple times and many different combinations of codons can build the same protein. The purpose of this study was to verify that the genetic code can be described with a mathematical model based on group theory and to verify whether the sequences built following this model are consistent with the properties of the real sequences.
Issue 1(2), 2010
2 Short Introduction to Group Theory
In mathematics, a group is an algebraic structure consisting of a set of objects together with an operation (composition law) that combines any two of its elements to form a third element. Each group has an element that is neutral with respect to the composition law (the identity, called here the neuter element): any element is left unchanged when combined with the neuter one. A cyclic group (Harary, 1994; Lomont, 1987; Scott, 1987) is a special group generated by a single element, in the sense that the group has an element g (called the “generator”) such that, when written multiplicatively, every element of the group is a power of g. If n is the smallest positive integer for which g^n gives the neuter element, n is the order of the group. The elements of any group are partitioned into equivalence classes; members of the same class share many properties. One of the simplest cyclic groups is the 4th order cyclic group C4, the set of all integer multiples of rotation by 90° in a three-dimensional orthogonal space. The composition law is the consecutive application of rotations, and the neuter element is the class of multiples of rotation by 360°.
To any group is associated a multiplication table describing the relations between the equivalence classes (and therefore between the elements). Table 1 shows the multiplication table for the 4th order cyclic group.
Table 1: Multiplication table

      E    A    A2   A3
E     E    A    A2   A3
A3    A3   E    A    A2
A2    A2   A3   E    A
A     A    A2   A3   E

Groups can be represented in several ways. The simplest way is to use one-dimensional representations. Table 2 shows the one-dimensional representations for the 4th order cyclic group.

Table 2: One-dimensional representations table

      R1   R2   R3   R4
E     1    1    1    1
A     1    -1   i    -i
A2    1    1    -1   -1
A3    1    -1   -i   i

Each class (E for the neuter one, and A, A2 and A3 for the others) contains four elements. The neuter element is always represented by the real integer 1. The group may also be represented with 3 × 3 matrices Gij, in which i is the number of the row and j the number of the column. In Table 2 the rows represent the classes and the columns specify the elements within the classes. The non-neuter elements will be identified by the pairs ij of indices in Table 3.

Table 3: ij matrix indices

11   12   13
21   22   23
31   32   33

As we said, this is the group of all integer multiples of rotation by 90°; therefore the index i (equivalence class index) represents the amplitude of the rotation associated with the class. The index j, related to the element within the class, specifies the axis of rotation. For example, the element (i, j) = (1, 3) represents a rotation of 90° around the z axis. Each element is identified in a unique way by two parameters, one with respect to Table 1 and one with respect to Table 2. The neuter elements have the values 1 and 4 for the representations and multiplication tables respectively. For the other elements, Table 1 gives the corresponding multiplication table parameters (the class parameters) and Table 2 gives the corresponding one-dimensional representations table parameters, as shown in Table 4.

Table 4: Multiplication and one-dimensional tables parameters for cyclic group C4

Matrix element indices   Multiplication table params   One-dimens. table params
11                       1                             -1
12                       1                             i
13                       1                             -i
21                       2                             1
22                       2                             -1
23                       2                             -1
31                       3                             -1
32                       3                             -i
33                       3                             i
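As a minimal sketch of Tables 1 and 2 (an illustration, not code from the original study), the cyclic group C4 can be modelled in Python by writing each class as an exponent of the generator A, taken modulo 4. The names `LABELS`, `compose`, `REPRESENTATIONS`, `multiplication_table` and `is_representation` are assumptions introduced here. The final check confirms that every column of Table 2 respects the composition law, that is R(a∘b) = R(a)·R(b).

```python
# Classes of C4 written as exponents of the generator A: E = A^0, A, A^2, A^3.
LABELS = ["E", "A", "A2", "A3"]


def compose(a, b):
    """Composition law: consecutive rotations add their exponents modulo 4."""
    return (a + b) % 4


# One-dimensional representations from Table 2, listed for E, A, A2, A3.
REPRESENTATIONS = {
    "R1": [1, 1, 1, 1],
    "R2": [1, -1, 1, -1],
    "R3": [1, 1j, -1, -1j],
    "R4": [1, -1j, -1, 1j],
}


def multiplication_table():
    """Rows and columns ordered E, A, A2, A3 (Table 1 lists its rows as E, A3, A2, A)."""
    return [[LABELS[compose(a, b)] for b in range(4)] for a in range(4)]


def is_representation(values):
    """True if the values multiply in the same way the classes compose."""
    return all(
        values[compose(a, b)] == values[a] * values[b]
        for a in range(4)
        for b in range(4)
    )


for row in multiplication_table():
    print(" ".join(row))
print(all(is_representation(v) for v in REPRESENTATIONS.values()))
```

Storing the classes as exponents keeps the table generation to a single modular addition; the row for A3 produced here matches the A3 row of Table 1.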
Any state of the system described by the group is represented by a vector in the orthogonal space, the sum of the vectors representing the single elements of the classes. The rotation matrices associated with each state, applied to the vector, represent the evolution of the system. The parameters of any state of the system are the composition of the parameters of the single elements, and have to be equal to 1 and 4 for the representation and multiplication tables respectively. Neutrality with respect to the one-dimensional representations table means that the total product of the parameters of each state has to be equal to unity. This is the requirement for the corresponding matrix to have real eigenvalues; only real values can represent real systems. Neutrality with respect to the multiplication table is a gauge invariance requirement (Frampton, 2008). Gauge invariance is the property of a system to be invariant under a group of local transformations.
3 Group Theory and Genetic Code
Given this short introduction, we ask how such a model could describe the DNA double helix and the genetic code. In the previous section, we defined a group as follows: “In mathematics, a group is an algebraic structure consisting of a set of objects”. In our case we have four objects in two separate classes with two elements each, the purines (A, G) and the pyrimidines (C, T or C, U) (Figure 1).

PURINES        A (Adenine)             G (Guanine)
PYRIMIDINES    T/U (Thymine/Uracil)    C (Cytosine)

Figure 1: Basic classes in the genetic code: purines and pyrimidines. As a starting point we have two separate classes, purines and pyrimidines, containing two elements each: Adenine and Guanine in the first one and Cytosine and Thymine or Uracil in the second one.
Two classes with two elements each can lead to a 3rd order group (two classes plus the neuter element), but this is not enough to describe the genetic code, because each codon needs three nucleotides. A 4th order group seems to be the correct choice. The basic step is to define the correct distribution of nucleotides (purines and pyrimidines) into the equivalence classes of the group, and then to verify that this distribution is the only one that generates the double helix structure.
To describe the double helix, the pair of indices (i, j) that identifies a class element (nucleotide) is not enough. We need a second pair of indices (alpha, beta) to identify the side of the double helix. Therefore, it is convenient to define each nucleotide with a function using the set of indices (i, j) to identify the corresponding element in the group and (alpha, beta) to indicate the side of the double helix. (alpha, beta) can assume the values (1, 0) or (0, 1).
The 4th order group has 4 classes with 4 elements each. The 4th class is the neuter class, and in each class the 4th element is the neuter element. The consequence is that we need 9 elements, three for each class. We have only four elements, and the permutations of the 5 missing elements are 5! = 120, corresponding to 120 different sets of equivalence classes. All these permutations have been analyzed.
To reproduce the double helix structure we have to generate pairs of nucleotides that are neuter with respect to Table 1 and Table 2. This means that the parameters of the corresponding pairs will be 4 and 1 respectively. For example, the (i, j) elements (1, 2) and (3, 2) have as parameters the values (1, i) and (3, –i) and generate a correct pair. The composition of the two class parameters is 1 & 3 = 4, that is, the neuter value. The one-dimensional table parameters are complex values, but i multiplied by –i gives the real integer 1. The alpha/beta indices (indicated with 10 and 01 respectively) in any pair are set with opposite symmetry, referring to the opposite sides of the double helix. Of all the permutations analyzed to generate pairs of nucleotides following the group rules, only two distributions, in Table 5 and Table 6, generate the correct double-helix structure.
Table 5

A     A     G
C     U/T   G
U/T   A     C

Table 6

C     C     U/T
A     G     U/T
G     C     A
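The neutrality rule for pairs described above can be sketched in Python. This is an illustration under stated assumptions rather than the author's implementation: the dictionary `PARAMS` transcribes Table 4, and the helper names `compose_class`, `pair_parameters` and `is_neuter_pair` are hypothetical. The class parameters are composed cyclically on the values 1 to 4, and the one-dimensional parameters are multiplied.

```python
# Table 4 parameters: (i, j) -> (class parameter, one-dimensional parameter).
PARAMS = {
    (1, 1): (1, -1), (1, 2): (1, 1j), (1, 3): (1, -1j),
    (2, 1): (2, 1), (2, 2): (2, -1), (2, 3): (2, -1),
    (3, 1): (3, -1), (3, 2): (3, -1j), (3, 3): (3, 1j),
}


def compose_class(a, b):
    """Cyclic composition on the values 1..4, with 4 as the neuter value."""
    return (a + b - 1) % 4 + 1


def pair_parameters(first, second):
    """Return the composed (class, one-dimensional) parameters of a pair."""
    class_a, rep_a = PARAMS[first]
    class_b, rep_b = PARAMS[second]
    return compose_class(class_a, class_b), rep_a * rep_b


def is_neuter_pair(first, second):
    """A pair is neuter when the class parameter is 4 and the product is 1."""
    class_param, rep_param = pair_parameters(first, second)
    return class_param == 4 and rep_param == 1


print(is_neuter_pair((1, 2), (3, 2)))  # True: 1 & 3 = 4 and i * (-i) = 1
print(is_neuter_pair((1, 2), (3, 3)))  # False: i * i = -1
```

The first call reproduces the example in the text; the second shows a pair that satisfies the class condition but fails the one-dimensional condition.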
Table 7 and Table 8 show the results generated with the Table 5 distribution.
All the other sets of equivalence classes generate pairs not compatible with the DNA double helix. The correct distribution is the one in Table 5. In fact, verifications regarding the reproduction of the 3D structure of proteins have shown that the Table 6 distribution does not build a correct shape. For more details see Appendix A.

      bases
a     AT
b     AA
c     GC
d     CC
e     TT
f     GG

Table 7: Pairs of nucleotides generated according to the group rules.

      alpha/beta   alpha/beta   alpha/beta   alpha/beta   nucleotides
ac    1010         1001         0110         0101         ATGC
be    1001         1010         0101         0110         AATT
df    1001         0101         1010         0110         CCGG

Table 8: Combinations consistent with the DNA double helix.

The first important result is that we can reproduce the genetic code, namely all the 64 variations of nucleotide triplets used to encode 20 amino acids, with the pairs of nucleotides just generated. To generate the triplets of nucleotides we start from the consideration that the three bases coding for one amino acid are a sequence taken from the same side of the double helix. To generate a codon of three bases we considered all the possible combinations of the 6 elementary states of opposite symmetry built with the pairs of bases just generated, extracting only the sets of 3 with the same values of (alpha, beta), that is, belonging to the same side of the double helix. Each codon has its set of two parameters, calculated from the parameters of the three nucleotides forming the codon. Therefore any sequence built with the set of amino acids has its set of two parameters, calculated from the parameters of the single codons forming the sequence.
As we said, the reproduction of the genetic code is the first important result, but now we have to verify whether the sequences built with the codons generated with this model are consistent with the properties of the real sequences. The set of amino acids has been used to build about 100 protein sequences, analyzing for each one the reproduction of the 3D structure, the binding site predictions and the codon distribution along genes.
4 Proteins Three-Dimensional Structure
The three-dimensional structure of a protein is determined by the amino acid sequence. Each protein is translated from a sequence of mRNA into a linear chain of amino acids. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein. For many proteins, the correct three-dimensional structure is essential to function, and failure to reach the correct structure produces inactive proteins with different properties.
We said: “Any state of the system described by the group is represented with a vector in the orthogonal space, summation of the vectors representing the single elements of the classes. The rotation matrices associated to each state applied to the vector will represent the evolution of the system”. In this model, each synonymous codon has associated with it the three-dimensional rotation matrices of its nucleotides. Therefore, synonymous codons encode the same amino acid, but have different table parameters and also different associated rotation matrices.
As we already said in “Short Introduction to Group Theory”, the 4th order cyclic group is the group of all integer multiples of rotation by 90°; therefore the index i (equivalence class index) represents the amplitude of the rotation associated with the class. The index j, related to the element within the class, specifies the axis of rotation. Table 9 reports the rotation corresponding to each element, that is, to each nucleotide.

(1,1)  90° axis x     (1,2)  90° axis y     (1,3)  90° axis z
(2,1)  180° axis x    (2,2)  180° axis y    (2,3)  180° axis z
(3,1)  270° axis x    (3,2)  270° axis y    (3,3)  270° axis z

Table 9: ij matrix elements rotations

For example, for serine:

Codon   Matrix elements    Rotations
UCA     (2,2)(3,3)(1,2)    180° axis Y, 270° axis Z, 90° axis Y
UCC     (3,1)(3,3)(3,3)    270° axis X, 270° axis Z, 270° axis Z
UCU     (3,1)(3,3)(3,1)    270° axis X, 270° axis Z, 270° axis X

The three sequences of rotations are very different, and the choice of one codon gives a different contribution to the “action” of the sequence, that is, to building the correct three-dimensional structure.
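To illustrate how synonymous codons can carry different table parameters, the following Python sketch composes the Table 4 parameters of the three serine codons listed above. It is an assumption, not a statement of the author's procedure, that the codon parameters are obtained with the same rule used for pairs (cyclic composition of class parameters, product of one-dimensional parameters); the names `PARAMS`, `SERINE`, `compose_class` and `codon_parameters` are hypothetical.

```python
# Table 4 parameters: (i, j) -> (class parameter, one-dimensional parameter).
PARAMS = {
    (1, 1): (1, -1), (1, 2): (1, 1j), (1, 3): (1, -1j),
    (2, 1): (2, 1), (2, 2): (2, -1), (2, 3): (2, -1),
    (3, 1): (3, -1), (3, 2): (3, -1j), (3, 3): (3, 1j),
}

# Matrix elements of the serine codons, as given in the serine example.
SERINE = {
    "UCA": [(2, 2), (3, 3), (1, 2)],
    "UCC": [(3, 1), (3, 3), (3, 3)],
    "UCU": [(3, 1), (3, 3), (3, 1)],
}


def compose_class(a, b):
    """Cyclic composition on the values 1..4, with 4 as the neuter value."""
    return (a + b - 1) % 4 + 1


def codon_parameters(elements):
    """Assumed rule: compose class parameters, multiply one-dimensional ones."""
    class_total, rep_total = 4, 1  # start from the neuter values
    for element in elements:
        class_param, rep_param = PARAMS[element]
        class_total = compose_class(class_total, class_param)
        rep_total = rep_total * rep_param
    return class_total, rep_total


for codon, elements in SERINE.items():
    print(codon, codon_parameters(elements))
```

Under this assumed rule the three codons obtain the parameter pairs (2, 1), (1, 1) and (1, i), so they encode the same amino acid with different parameters.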
How do we apply rotations to build a 3D shape? Each nucleotide is a state of the system, represented by a 3 × 3 matrix, and each 3 × 3 matrix has a corresponding state vector in the orthogonal space of the group. A codon also represents a state of the system: the corresponding vector is the sum of the single nucleotide vectors and the corresponding matrix is the sum of the single nucleotide matrices. In the example of the serine UCA codon, the 3 × 3 matrix is the sum of the U, C and A nucleotide matrices, as shown in Table 10.

U nucl. matrix    C nucl. matrix    A nucl. matrix    A+U+C (UCA codon matrix)
0  0  0           0  0  0           0  1  0           0  1  0
0  1  0           0  0  0           0  0  0           0  1  0
0  0  0           0  0  1           0  0  0           0  0  1

Table 10: U, C and A nucleotide matrices and their sum, the UCA codon matrix

The three associated rotations are: 180° about the Y axis, 270° about the Z axis and 90° about the Y axis. Table 11 shows the corresponding rotation matrices (see Appendix B).

180° Y axis       270° Z axis       90° Y axis
-1  0  0           0  1  0           0  0  1
 0  1  0          -1  0  0           0  1  0
 0  0 -1           0  0  1          -1  0  0

Table 11: 180° Y axis, 270° Z axis and 90° Y axis rotation matrices

After joining the triplets to form a sequence, we have something like a “flat strip”. To build a 3D shape in an orthogonal space we need a set of points with coordinates (x, y, z).
We set the origin of an orthogonal reference frame at the beginning of the sequence. The coordinates of the first point are the coordinates of the vector representing the first codon after the application of its three rotations. Let us take as an example serine UCA as the first codon. To execute the rotations, we multiply the three matrices in Table 11 with the codon matrix in Table 10, following the order of the nucleotides:

U * UCA = T1
-1  0  0       0  1  0       0 -1  0
 0  1  0   ×   0  1  0   =   0  1  0
 0  0 -1       0  0  1       0  0 -1

C * T1 = T2
 0  1  0       0 -1  0       0  1  0
-1  0  0   ×   0  1  0   =   0  1  0
 0  0  1       0  0 -1       0  0 -1

A * T2 = T3
 0  0  1       0  1  0       0  0 -1
 0  1  0   ×   0  1  0   =   0  1  0
-1  0  0       0  0 -1       0 -1  0

The resulting vector, represented by the T3 matrix, has coordinates (-3, 2, -2): this is the first point of the set we will use to build the 3D shape.
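The chain of multiplications for the UCA codon can be checked with a short pure-Python sketch. The function names (`rotation`, `codon_matrix`, `apply_rotations`, `signed_columns`) are hypothetical, and the rotation matrices use the usual counter-clockwise convention, which for 180° Y, 270° Z and 90° Y gives integer matrices. The last helper is an assumption: the text does not state how the T3 matrix is read as a vector, and reading each row as "sign times column index of its nonzero entry" is simply one reading that is consistent with the stated coordinates (-3, 2, -2).

```python
# Exact cosine and sine values for the multiples of 90 degrees used here.
COS = {90: 0, 180: -1, 270: 0}
SIN = {90: 1, 180: 0, 270: -1}
AXES = {1: "x", 2: "y", 3: "z"}


def matmul(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rotation(angle, axis):
    """Rotation matrix about a coordinate axis (counter-clockwise convention)."""
    c, s = COS[angle], SIN[angle]
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(f"unknown axis: {axis}")


def codon_matrix(elements):
    """Sum of nucleotide matrices, each with a single 1 at position (i, j)."""
    total = [[0] * 3 for _ in range(3)]
    for i, j in elements:
        total[i - 1][j - 1] += 1
    return total


def apply_rotations(elements):
    """Apply the rotation of each nucleotide in order: T1, T2, then T3."""
    state = codon_matrix(elements)
    for i, j in elements:
        state = matmul(rotation(90 * i, AXES[j]), state)
    return state


def signed_columns(matrix):
    """Assumed reading: sign times column index of each row's nonzero entry."""
    coords = []
    for row in matrix:
        nonzero = [(col + 1) * value for col, value in enumerate(row) if value]
        coords.append(nonzero[0] if len(nonzero) == 1 else None)
    return tuple(coords)


UCA = [(2, 2), (3, 3), (1, 2)]
T3 = apply_rotations(UCA)
print(T3)                  # [[0, 0, -1], [0, 1, 0], [0, -1, 0]]
print(signed_columns(T3))  # (-3, 2, -2)
```

Integer lookup tables for sine and cosine avoid floating-point rounding, so the result can be compared entry by entry with the T3 matrix.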
Then we translate the origin of the reference frame to the tip of the resulting vector, where the second codon starts. In our example, we translate the origin (0, 0, 0) of the reference frame to the point (-3, 2, -2). We then apply the three rotations of the second codon to the vector starting in position (-3, 2, -2). Thus, we obtain the coordinates of the second point, which are of course converted into the original reference frame. Then we translate the origin of the reference frame again to the tip of the resulting vector, and so on until the end of the sequence. In the present study, the set of resulting coordinates has been used as input for gnuplot http://www.gnuplot.info/download.html, free software that displays various mathematical functions and numerical data. The result reproduces the shape of the final three-dimensional structure of the protein, in some cases with surprising precision.
It is as if the rotations associated with the sequence of codons express the force acting between the nucleotides to produce the geometry of the final structure.
The rotations do not represent the real rotations of the linear chain in the real physical space, but the rotations of the state vector representing the sequence in the orthogonal space of the group.
The full example of the protein Nisin 53 http://www.ncbi.nlm.nih.gov/protein/ABV64388.1 is reported in Appendix C. Some examples are given in Figures 2, 3, 4 and 5. The pictures are taken from the NCBI web site.
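The translation step of this construction can be sketched as a running sum of per-codon vectors. This sketch covers only the bookkeeping of moving the origin to the tip of each vector; it assumes that every codon has already been reduced to a displacement expressed in the original reference frame, which simplifies the procedure described above. The names `chain_points` and `to_gnuplot_data` are hypothetical, and all displacements except the first one, (-3, 2, -2), are invented for illustration.

```python
def chain_points(displacements):
    """Accumulate per-codon displacements into points in the original frame."""
    points = []
    x = y = z = 0
    for dx, dy, dz in displacements:
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def to_gnuplot_data(points):
    """One 'x y z' line per point, a layout gnuplot can read as three columns."""
    return "\n".join(f"{x} {y} {z}" for x, y, z in points)


# First displacement from the UCA example; the other two are hypothetical.
steps = [(-3, 2, -2), (1, -1, 2), (2, 3, -1)]
points = chain_points(steps)
print(to_gnuplot_data(points))
```

If the printed lines are saved to a file such as `points.dat`, a gnuplot command like `splot "points.dat" with linespoints` would draw the resulting chain.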
5 Binding Sites Comparison
A binding site is a region on a protein, DNA, or RNA to which specific other sequences, generically called ligands, form a chemical bond. Binding sites allow a protein to interact with specific ligands; therefore predicting the binding sites between two interacting proteins provides important clues to the function of the protein itself.
There are points in a sequence where the corresponding sub-sequence is neuter, with the total equivalence class parameter (related to the multiplication table) equal to 4. For example, if position 125 is a neuter point, this means that the sub-sequence of the first 125 codons has a multiplication table parameter equal to 4, the neuter value. We investigated the possibility that the neuter points along a sequence can represent potential binding sites, comparing our results with the predictions of some specific tools.
A ligand cannot bind to all the neuter points, but only to sites where there is “affinity”, and the parameters of the single codons are the “key”. In fact, if a protein has 50 neuter points, the sub-sequences are neuter, but the corresponding codons in the specific positions have their own parameters. This is valid, of course, for both the protein and the ligand. To have an effective bond, the binding site parameters of the ligand composed with the binding site parameters of the protein must give the neuter values.
Example: codons 120 and 175 are neuter points, which are possible binding sites. The total multiplication table parameter of each of the two subsequences is 4, but the corresponding codons have parameters (3, 1) and (2, –i) respectively. A ligand whose binding site has parameters (2, –i) cannot bind to both sites of the protein, but only to point 175: 2 and 2 give 4 as the parameter for the class, whereas for point 120 we have parameters 3 and 2, and the cyclic result is 1, not neuter (3 + 2 = 5, and 5 – 4 = 1).
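The two steps of this section, locating neuter points and testing the class compatibility of a ligand, can be sketched in Python. The helper names (`compose_class`, `neuter_points`, `compatible_sites`) are hypothetical, the sequence of class parameters passed to `neuter_points` is invented for illustration, and the compatibility test mirrors only the class-parameter part of the example above.

```python
def compose_class(a, b):
    """Cyclic composition on the values 1..4, with 4 as the neuter value."""
    return (a + b - 1) % 4 + 1


def neuter_points(codon_classes):
    """1-based positions where the sub-sequence up to that codon composes to 4."""
    total = 4  # neuter starting value
    positions = []
    for position, class_param in enumerate(codon_classes, start=1):
        total = compose_class(total, class_param)
        if total == 4:
            positions.append(position)
    return positions


def compatible_sites(site_classes, ligand_class):
    """Sites whose codon class composes with the ligand class to give 4."""
    return [site for site, class_param in site_classes.items()
            if compose_class(class_param, ligand_class) == 4]


# Hypothetical class parameters of seven consecutive codons.
print(neuter_points([1, 3, 2, 2, 3, 1, 4]))       # [2, 4, 6, 7]

# Class parameters of the codons at the two neuter points in the example.
print(compatible_sites({120: 3, 175: 2}, 2))      # [175]
```

The second call reproduces the example: a ligand with class parameter 2 matches point 175 (2 and 2 give 4) but not point 120 (3 and 2 give 1).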
The analysis of the regions close to the neuter points can give information about the stability of the bond. High compatibility between the parameters of the codons in the regions around the neuter points indicates a stable resulting bond.
The difficult point is to compare the binding sites of the model with “real” binding sites or at least with reliable prediction tools. There are many tools in bioinformatics, and after a careful investigation we selected two of the most used ones for the comparison: Match and Alibaba2 of Gene Regulation. Both use the binding sites collected in the TRANSFAC database. In both tools, we applied many restrictions in the setup of parameters. In fact, using the standard parameters the list of predicted binding sites is very long and covers almost the whole sequence, becoming meaningless for a comparison.
Match http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi is used with the cut-off parameters for core and matrix similarity set to 1.0 and 0.93 respectively. The matrix similarity is a score that describes the quality of a match between a matrix and an arbitrary part of the input sequence. Analogously, the core similarity denotes the quality of a match between the core sequence of a matrix (i.e. the five most conserved positions within a matrix) and a part of the input sequence.
To use Alibaba2 http://www.gene-regulation.com/pub/programs/alibaba2/index.html we set the parameter “Similarity of sequence to matrix” equal to 100. This parameter measures, in percent, the similarity between the matrix and the analyzed sequence. 100% means that the most frequently occurring nucleotides in the matrix (the matrix’s consensus) are the same as in the unknown sequence. 1% means that the unknown sequence is only loosely similar to the matrix. In the figures, the blue lines represent the predictions of our tool and the red lines represent the Match and Alibaba2 predictions.
In Figure 6 we consider Camp http://www.ncbi.nlm.nih.gov/protein/CAG46759.1
Figure 6A reports the comparison between our predictions and the Match tool with the very strict choice of cut-off parameters for core and matrix similarity, set to 1.0 and 0.93 respectively. The Match predictions fit very well in our binding site regions.
Figure 2: Proteins three-dimensional structure: Collagen (316). The three-dimensional structure of a protein is determined by the amino acid sequence. Each protein is translated from a sequence of mRNA into a linear chain of amino acids. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein. In this model each synonymous codon has an associated “action”, represented by the three-dimensional rotation matrices of its nucleotides. The rotations performed following the nucleotide distribution reproduce the shape of the final three-dimensional structure of the protein. It is as if the rotations express the force acting between the nucleotides to produce the final structure. The shape of collagen is reproduced in a surprising way.
Figure 3: Proteins three-dimensional structure. The three-dimensional structure of Acetylcholine receptor (519) and Arp2/3 (304).
Figure 4: Proteins three-dimensional structure. The three-dimensional structure of Dystrophin (71), Histone (448) and Nisin (53).
Figure 5: Proteins three-dimensional structure. The three-dimensional structure of Scramblase (379), Serum albumin (607) and SMN1 (288).
Figure 6: Binding sites comparison. In this case we consider Camp http://www.ncbi.nlm.nih.gov/protein/CAG46759.1. The blue lines represent the predictions of our tool, the red lines represent the Match and Alibaba2 predictions. (A) Comparison between our predictions and the Match tool with the very strict choice of cut-off parameters for core and matrix similarity, set to 1.0 and 0.93 respectively. The Match predictions fit very well in our binding site regions. (B) Comparison between our predictions and the Alibaba2 tool with the parameter “Similarity of sequence to matrix” equal to 100. The Alibaba2 predictions fit very well in our binding site regions. (C) Comparison between our predictions and the Alibaba2 tool with the standard parameters setup. The Alibaba2 prediction covers almost the whole sequence, becoming meaningless for a comparison.
Figure 6B reports the comparison between our predictions and the Alibaba2 tool with the parameter “Similarity of sequence to matrix” equal to 100. The Alibaba2 predictions fit very well in our binding site regions.
Figure 6C reports the comparison between our predictions and the Alibaba2 tool with the standard parameters setup. The Alibaba2 prediction covers almost the whole sequence, becoming meaningless for a comparison.
In Figure 7 we consider the Pseudomonas fluorescens partial cop gene for putative copper transporting ATPase http://www.ncbi.nlm.nih.gov/nuccore/13992548.
In Figure 8 we consider Insulin http://www.ncbi.nlm.nih.gov/protein/AAA40590.1
This mechanism could also justify the need for 64 triplets in the genetic code to encode 20 amino acids. Each synonymous codon has a different set of parameters. The choice of which one will encode a specific amino acid in a protein influences the positions of the subsequent binding sites, because the parameters of the subsequences will change. If we replace a triplet in a sequence with a synonymous codon, the positions of the binding sites from that point to the end of the sequence will change, and this will influence the potential interactions of the sequence with other ligands.
Figure 9 shows the comparison between neuter points in the sequence of Insulin http://www.ncbi.nlm.nih.gov/protein/AAA40590.1
Figure 9A represents the positions of the neuter points calculated from the original sequence.
Figure 9B represents the neuter points after the replacement of asparagine AAC, parameters (i, 1), in position 34 with the synonymous codon AAU, parameters (i, 4). The first 5 points at the beginning of the sequence are the same, but after position 34 the points change significantly.
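The effect of a synonymous substitution on the downstream neuter points can be illustrated with a small Python sketch. The sequence of class parameters is hypothetical and far shorter than the Insulin sequence; the substitution changes a class parameter from 1 to 4, in analogy with the AAC to AAU replacement described above. The function names `compose_class` and `neuter_points` are assumptions introduced for this illustration.

```python
def compose_class(a, b):
    """Cyclic composition on the values 1..4, with 4 as the neuter value."""
    return (a + b - 1) % 4 + 1


def neuter_points(codon_classes):
    """1-based positions where the sub-sequence up to that codon composes to 4."""
    total = 4  # neuter starting value
    positions = []
    for position, class_param in enumerate(codon_classes, start=1):
        total = compose_class(total, class_param)
        if total == 4:
            positions.append(position)
    return positions


# Hypothetical class parameters of nine codons.
original = [1, 3, 2, 2, 1, 3, 4, 1, 3]

# Synonymous substitution at position 5: class parameter 1 becomes 4.
mutated = list(original)
mutated[4] = 4

print(neuter_points(original))  # [2, 4, 6, 7, 9]
print(neuter_points(mutated))   # [2, 4, 5, 8]
```

The neuter points before the substituted position are unchanged, while every point from that position onward moves, which is the behavior reported for Figure 9.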
6 Codons Distribution along Genes
The results of recent studies focused on searching for genome trends in codon choice suggest that we do not yet understand all the rules guiding translation, but the emerging idea is that codon choice is not random.
The genetic code is degenerate: there are 61 codons for only 20 amino acids. Consequently, some amino acids are encoded multiple times and many different combinations of codons build the same protein. One of the main questions is how the mechanism guiding the codon choice within synonymous sets works. Many recent studies have focused on this subject.
Cannarozzi and colleagues (Cannarozzi et al., 2010; Tuller et al., 2010) examined groups of synonymous codons to understand whether they were randomly or non-randomly ordered along genes.
The discovery of ramps (Tuller et al., 2010; Fredrick & Ibba, 2010) shows that codon choice is not uniform. In the “ramp”, a region including the first 30-50 codons, the speed of translation is slow; it then increases up to a certain level for the rest of the gene.
Kudla et al. (2009), examining the effects of synonymous codon substitutions on the efficiency of translation, found that the sequence at the beginning of the gene strongly influences the translation itself. These results are consistent with other similar studies concerning the importance of mRNA structure in controlling translation initiation (de Smit & van Duin, 2003; Studer & Joseph, 2006). The analysis of the codon sequences of some proteins with our model seems to confirm these recent ideas.
We have examined the distribution of codons in the sequences of many proteins with respect to the values of the multiplication and one-dimensional tables. Figures 10 and 11 report as an example the results for six proteins of different length with respect to the multiplication table. For each protein we have examined the whole sequence and the initial region (including around 10% of the total number of codons). The distribution along the whole sequence is fully consistent and homogeneous with the initial one. It is as if in the initial part the translation process has to choose the best set of parameters for the specific sequence, and the rest of the translation follows the initial choice. The result is a homogeneous distribution of codons along the whole sequence. This is also consistent with the results of the recent studies reported above, especially with the concept of the “ramp” and the idea that the sequence at the beginning of the gene strongly influences the translation.
As we already said, each codon has its set of two parameters, calculated from the parameters of the single nucleotides. Codons in a synonymous set therefore have different parameters: they encode the same amino acid but with a different contribution to the sequence in terms of symmetry; therefore the choice of the synonymous codon can significantly change the final parameters of the sequence.
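The comparison between the initial region and the whole sequence can be sketched as a comparison of two frequency distributions of the multiplication table parameter. This is only one possible way to carry out such a comparison, not necessarily the one used for Figures 10 and 11; the function names `class_distribution` and `initial_region`, the default 10% fraction argument and the 100-codon sequence are hypothetical.

```python
from collections import Counter


def class_distribution(classes):
    """Relative frequency of each multiplication table parameter (1..4)."""
    counts = Counter(classes)
    total = len(classes)
    return {value: counts.get(value, 0) / total for value in (1, 2, 3, 4)}


def initial_region(classes, fraction=0.10):
    """First part of the sequence, around 10% of the codons by default."""
    size = max(1, round(len(classes) * fraction))
    return classes[:size]


# Hypothetical class parameters for a sequence of 100 codons.
sequence = [1, 2, 3, 4, 1, 2, 3, 4, 2, 1] * 10

whole = class_distribution(sequence)
start = class_distribution(initial_region(sequence))

# Largest difference between the two distributions over the four values.
gap = max(abs(whole[value] - start[value]) for value in whole)
print(whole, start, gap)
```

A small gap indicates that the initial region and the whole sequence use the parameter values in similar proportions; the invented sequence is periodic, so its gap is zero by construction.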
The Name of the Journal
12
2.5
2
1.5
Series2
Series1
1
0.5
0
1
3
5
7
9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
(A)
2.5
2
1.5
Series2
Series1
1
0.5
0
1
3
5
7
9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
(B)
2.5
2
1.5
Series2
Series1
1
0.5
0
1
3
5
7
9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
(C)
Figure 7: Binding sites comparison. In this case we consider Pseudomonas fluorescens partial cop gene for putative copper transporting ATPase http://www.ncbi.nlm.nih.gov/nuccore/13992548. Figure A: reports the comparison between our predictions and Match tool with the very strict choice for cut-offs parameters for
core and matrix similarity set to 1.0 and 0.93 respectively. The Match predictions fit very well in our biding sites
regions. Figure B: reports the comparison between our predictions and Alibaba2 tool with the parameter Similarity of sequence to matrix equal 100. The Alibaba2 predictions fit very well in our biding sites regions. Figure
C: comparison between our predictions and Alibaba2 tool with the standard parameters setup. Alibaba2 prediction covers almost the all sequence, becoming meaningless for a comparison.
[Figure 8: two panels (A) and (B), each plotting Series1 and Series2 on a vertical axis from 0 to 2.5 against horizontal ticks from 1 to 109.]
Figure 8: Binding sites comparison. In this case we consider Insulin http://www.ncbi.nlm.nih.gov/protein/AAA40590.1. Figure A: comparison between our predictions and the Match tool with very strict cut-off parameters, with core similarity and matrix similarity set to 1.0 and 0.93 respectively. The Match predictions fit very well within
our binding site regions. Figure B: comparison between our predictions and the Alibaba2 tool with the standard parameter setup. The Alibaba2 prediction covers almost the whole sequence, which makes the comparison meaningless.
Following these results, we could say that
translation is slow at the beginning because choosing
the “best” codon to achieve the final neutrality
of the sequence is more difficult. It is as if the “translator” had to consider a large number of combinations of
parameters to find the best path leading to a correct result. After a certain number of codons, the “path”
is traced and the choice is faster. The remaining translation will follow the choice of parameters made in the
initial region, and the choice of synonymous codons will
be faster because it is “guided”.
In our example, the proteins in Figures 10 and 11 have
been grouped by length. Figure 10 shows the
three shortest sequences: Nisin (53 codons), Insulin (109
codons) and Keratin (138 codons). Figure 11 shows the longest sequences: Adenylyl cyclase (367 codons), Coronin (454 codons) and Myosin (1520 codons).
The left side of each picture reports the percentage
distribution of codons in the initial region of the sequence. The right side reports the percentage distribution of codons along the whole sequence.
Figure 9: Neuter point changes with synonymous codons. This could also justify the need for 64 triplets in
the genetic code to encode 20 amino acids. Each synonymous codon has a different set of parameters. The choice of
which one encodes a specific amino acid in a protein influences the positions of the following binding sites, because
the parameters of the subsequences change. If we replace a triplet in a sequence with a synonymous codon, the positions of the binding sites from that point to the end of the sequence change, and this influences the potential interactions of the sequence with other ligands. The figure compares the neuter points in the sequence of
Insulin http://www.ncbi.nlm.nih.gov/protein/AAA40590.1. Figure A: neuter point positions calculated
from the original sequence. Figure B: neuter points after the replacement of asparagine AAC, parameters
(©, 1), in position 34 with the synonymous codon AAU, parameters (©, 4). The first 5 points at the beginning of the sequence are the same, but after position 34 the points change significantly.
[Figure 10: panels (A), (B) and (C); left column "Begin Sequence", right column "All Sequence".]
Figure 10: Codon distribution along genes in short sequences. The figure reports the distribution of codons in the
sequences of the three shortest proteins with respect to the value in the multiplicative table: Nisin (53 codons), Insulin (109 codons) and Keratin (138 codons). The left side of the picture reports the percentage distribution of codons in the
initial region of the sequence. The right side reports the percentage distribution of codons along the whole sequence. For Nisin_53 (A) the value 2 is predominant and the value 4 is rare. This sequence is very short, so comparing the beginning with the whole sequence has little statistical meaning; nevertheless the agreement is reasonable. For Insulin_109 (B) and Keratin_138 (C) the values 1 and 3 are predominant and only a few 4s are present.
For Nisin_53 (Figure 10A) the value 2 is predominant
and the value 4 is rare. This sequence is very short, so comparing the beginning with the whole sequence has little
statistical meaning; nevertheless the agreement is
reasonable. For Insulin_109 (Figure 10B) and Keratin_138 (Figure 10C) the values 1 and 3 are predominant and only a few 4s
are present. For Coronin_454 (Figure 11A) the values 1
and 3 are predominant. For Myosin_1520 (Figure 11B)
and Adenylyl_367 (Figure 11C) the four values are
evenly distributed.
In conclusion, our results suggest that the
character of the sequence is determined at the beginning
of the translation, when the process is slow. The
initial choice is then maintained along the whole gene.
[Figure 11: panels (A), (B) and (C); left column "Begin Sequence", right column "All Sequence".]
Figure 11: Codon distribution along genes in long sequences. The figure reports the distribution of codons in the
sequences with respect to the value in the multiplicative table: Adenylyl cyclase (367 codons), Coronin (454 codons) and Myosin
(1520 codons). The left side of the picture reports the percentage distribution of codons in the initial region of the sequence. The right side reports the percentage distribution of codons along the whole sequence. For
Coronin_454 (Figure A) the values 1 and 3 are predominant. For Myosin_1520 (Figure B) and Adenylyl_367 (Figure C) the four values are evenly distributed.
6 Discussion
The results of this model suggest a
relation between synonymous codon choice and the
behavior of the sequence, which would justify the
need for 64 triplets in the genetic code to encode 20 amino acids.
Regarding the 3D structure, each synonymous
codon is associated with the three-dimensional rotation matrices
of its nucleotides. Therefore, synonymous codons
encode the same amino acid, but have different table
parameters and also different rotation matrices. Different rotations contribute differently to the
final 3D structure.
We have also seen that the choice of which codon
encodes a specific amino acid in a protein influences
the positions of the neuter points along the sequence. If
neuter points represent binding sites, the synonymous
codon choice influences the interactions of the protein
and the function of the protein itself. The existence of 64
redundant triplets is justified by the need to adapt the
amino acid encoding process to the function of the protein. This is possible by choosing the synonymous codon
with the right parameters for each specific context.
Moreover, if neuter points represent possible binding
sites, it would be possible to modify the regions (i.e. enable
or disable a binding site) by adjusting the codon parameters.
In this model the concept of evolution is also
linked to the possibility for a mutation to modify the
parameters and the symmetry of the sequence. Processes that constantly
introduce variations play a fundamental role in evolution. The main cause of variation is mutation, which changes the sequence of a gene. The total
symmetry of a sequence is invariant under local transformations related to the multiplicative table: only
changes that do not modify the parameters of the sequence are allowed, and after the changes the whole sequence must still belong to the same equivalence class.
Appendix A
The basic step is to define the correct distribution of nucleotides (purines and pyrimidines) into the equivalence
classes of the group, and then to verify that this distribution is the only one that generates the double-helix structure. The 4th order group has 4 classes with 4 elements
each. We need 9 non-neuter elements, three for each
class. We have only four elements, and the permutations
of the 5 missing elements are 5! = 120, corresponding to
120 different sets of equivalence classes.
All the permutations involved have been analyzed,
and only the two distributions of nucleotides in Table
A1 and Table A2 generate the correct double-helix structure. Table A3 shows the results generated
with Table A1 and Table A2 respectively.
Table A1            Table A2
A     A     G       C     C     U/T
C     U/T   G       A     G     U/T
U/T   A     C       G     C     A
Matrix elements   Bases Tab. 1   Bases Tab. 2
(1,1)(3,1)        AT             CG
(1,2)(3,2)        AA             CC
(1,3)(3,3)        GC             TA
(2,1)(2,1)        CC             AA
(2,2)(2,2)        TT             GG
(2,3)(2,3)        GG             TT
Table A3: pairs of nucleotides generated from Table A1
and Table A2 respectively
We are looking for a unique distribution of nucleotides, and we found two distributions able to generate
the correct 64 triplets of the genetic code. However, a careful check
of Tables A1 and A2 shows that the two distributions are complementary with respect to the DNA double helix. In Tables A1 and A2 the same positions hold the corresponding nucleotides of the DNA pairs:
where Table A1 has nucleotide A, Table A2 has
nucleotide C, and the same holds for the pair G-T.
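The relation between the two distributions can be checked mechanically. The following is a minimal Python sketch, assuming that Tables A1 and A2 are 3x3 tables read row by row as (A, A, G / C, T, G / T, A, C) and (C, C, T / A, G, T / G, C, A), with T standing for U/T; the names `TABLE_A1`, `TABLE_A2`, `SWAP` and `pair` are illustrative and not part of the original model. It verifies that the two tables differ by the exchange A-C and G-T, and reproduces the nucleotide pairs listed in Table A3.

```python
# Tables A1 and A2 as 3x3 grids (assumed row-by-row layout, T stands for U/T).
TABLE_A1 = [["A", "A", "G"],
            ["C", "T", "G"],
            ["T", "A", "C"]]
TABLE_A2 = [["C", "C", "T"],
            ["A", "G", "T"],
            ["G", "C", "A"]]

# Exchange described in the text: A <-> C and G <-> T.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}

# Pairs of matrix elements listed in the first column of Table A3.
A3_ELEMENTS = [((1, 1), (3, 1)), ((1, 2), (3, 2)), ((1, 3), (3, 3)),
               ((2, 1), (2, 1)), ((2, 2), (2, 2)), ((2, 3), (2, 3))]


def pair(table, first, second):
    """Return the two nucleotides found at two 1-based (row, col) positions."""
    (r1, c1), (r2, c2) = first, second
    return table[r1 - 1][c1 - 1] + table[r2 - 1][c2 - 1]


# True when every position of A2 is the swapped nucleotide of A1.
tables_are_swapped = all(
    SWAP[TABLE_A1[i][j]] == TABLE_A2[i][j] for i in range(3) for j in range(3)
)

pairs_a1 = [pair(TABLE_A1, a, b) for a, b in A3_ELEMENTS]
pairs_a2 = [pair(TABLE_A2, a, b) for a, b in A3_ELEMENTS]

print(tables_are_swapped)
print(pairs_a1)
print(pairs_a2)
```

Under these assumptions the two printed lists correspond to the "Bases Tab. 1" and "Bases Tab. 2" columns of Table A3.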
Both distributions generate the 64 triplets of the
genetic code, but only the distribution in Table A1 is the
correct one. In fact, checks on the reproduction of the 3D structure of proteins have shown that the Table
A2 distribution does not build a correct shape.
Figures A1-1, A1-2 and A1-3 show as examples the
reproductions of the 3D shapes of Nisin, Collagen and
Histone with both distributions. The comparison with the pictures taken from the NCBI site makes it clear that the
distribution in Table A2 is very far from building a good
3D shape.
Appendix B
The rotations of generic α, β and γ angles are represented
by the following matrices:

\[
R_x(\gamma) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix}
\]
Rotation of γ degrees around x-axis

\[
R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}
\]
Rotation of β degrees around y-axis

\[
R_z(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Rotation of α degrees around z-axis
For γ = 90° we have sin 90° = 1 and cos 90° = 0; for γ =
180° we have sin 180° = 0 and cos 180° = -1; for γ = 270°
we have sin 270° = -1 and cos 270° = 0; therefore the rotation matrices around the x, y and z axes are:
\[
R_x(90^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
R_x(180^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad
R_x(270^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\]
X axis rotation matrices for 90°, 180° and 270°

\[
R_y(90^\circ) = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad
R_y(180^\circ) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad
R_y(270^\circ) = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\]
Y axis rotation matrices for 90°, 180° and 270°

\[
R_z(90^\circ) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_z(180^\circ) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_z(270^\circ) = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Z axis rotation matrices for 90°, 180° and 270°
Appendix C
In this section we will show how to build the 3D structure of the protein Nisin 53 http://www.ncbi.nlm.nih.gov/protein/ABV64388.1. The sequence of amino
acids is the following:
DFNLDLLSVSKKDSGASPRITSISLCTPGCKTGALMGCNMKTATCNCSIHVSK
gattttaacttggatttgctatctgtttcgaagaaagattcaggtgcatcaccacgcattacaagtatttcgctatgtacacccggttgtaaaacaggagctctgatgggttgtaacatgaaaacagcaacttgtaattgtagtattcacgtaagcaaa
T and U nucleotides are represented by the same matrix
element and have the same rotation associated.
Codon   Matrix elements    Rotations associated
GAT     (1,3)(3,2)(2,2)    90° z-axis, 270° y-axis, 180° y-axis
TTT     (3,1)(3,1)(3,1)    270° x-axis, 270° x-axis, 270° x-axis
AAC     (1,1)(1,1)(3,3)    90° x-axis, 90° x-axis, 270° z-axis
TTG     (3,1)(3,1)(1,3)    270° x-axis, 270° x-axis, 90° z-axis
GAT     (1,3)(3,2)(2,2)    90° z-axis, 270° y-axis, 180° y-axis
TTG     (3,1)(3,1)(1,3)    270° x-axis, 270° x-axis, 90° z-axis
CTA     (3,3)(2,2)(1,2)    270° z-axis, 180° y-axis, 90° y-axis
TCT     (3,1)(3,3)(3,1)    270° x-axis, 270° z-axis, 270° x-axis
GTT     (1,3)(3,1)(3,1)    90° z-axis, 270° x-axis, 270° x-axis
TCG     (3,1)(2,1)(2,3)    270° x-axis, 180° x-axis, 180° z-axis
AAG     (1,1)(1,1)(1,3)    90° x-axis, 90° x-axis, 90° z-axis
AAA     (1,1)(1,1)(1,1)    90° x-axis, 90° x-axis, 90° x-axis
GAT     (1,3)(3,2)(2,2)    90° z-axis, 270° y-axis, 180° y-axis
TCA     (2,2)(3,3)(1,2)    180° y-axis, 270° z-axis, 90° y-axis
GGT     (1,3)(1,3)(3,1)    90° z-axis, 90° z-axis, 270° x-axis
GCA     (2,3)(2,1)(1,1)    180° z-axis, 180° x-axis, 90° x-axis
TCA     (2,2)(3,3)(1,2)    180° y-axis, 270° z-axis, 90° y-axis
CCA     (3,3)(3,3)(1,1)    270° z-axis, 270° z-axis, 90° x-axis
CGC     (2,1)(2,3)(3,3)    180° x-axis, 180° z-axis, 270° z-axis
ATT     (3,2)(2,2)(3,1)    270° y-axis, 180° y-axis, 270° x-axis
ACA     (1,1)(3,3)(1,1)    90° x-axis, 270° z-axis, 90° x-axis
AGT     (3,2)(1,3)(2,2)    270° y-axis, 90° z-axis, 180° y-axis
ATT     (3,2)(2,2)(3,1)    270° y-axis, 180° y-axis, 270° x-axis
TCG     (3,1)(2,1)(2,3)    270° x-axis, 180° x-axis, 180° z-axis
CTA     (3,3)(2,2)(1,2)    270° z-axis, 180° y-axis, 90° y-axis
TGT     (3,1)(1,3)(3,1)    270° x-axis, 90° z-axis, 270° x-axis
ACA     (1,1)(3,3)(1,1)    90° x-axis, 270° z-axis, 90° x-axis
CCC     (3,3)(3,3)(3,3)    270° z-axis, 270° z-axis, 270° z-axis
GGT     (1,3)(1,3)(3,1)    90° z-axis, 90° z-axis, 270° x-axis
TGT     (3,1)(1,3)(3,1)    270° x-axis, 90° z-axis, 270° x-axis
AAA     (1,1)(1,1)(1,1)    90° x-axis, 90° x-axis, 90° x-axis
ACA     (1,1)(3,3)(1,1)    90° x-axis, 270° z-axis, 90° x-axis
GGA     (1,3)(1,3)(1,1)    90° z-axis, 90° z-axis, 90° x-axis
GCT     (2,3)(2,1)(3,1)    180° z-axis, 180° x-axis, 270° x-axis
CTG     (2,1)(3,1)(2,3)    180° x-axis, 270° x-axis, 180° z-axis
ATG     (3,2)(2,2)(1,3)    270° y-axis, 180° y-axis, 90° z-axis
GGT     (1,3)(1,3)(3,1)    90° z-axis, 90° z-axis, 270° x-axis
TGT     (3,1)(1,3)(3,1)    270° x-axis, 90° z-axis, 270° x-axis
AAC     (1,1)(1,1)(3,3)    90° x-axis, 90° x-axis, 270° z-axis
ATG     (3,2)(2,2)(1,3)    270° y-axis, 180° y-axis, 90° z-axis
AAA     (1,1)(1,1)(1,1)    90° x-axis, 90° x-axis, 90° x-axis
ACA     (1,1)(3,3)(1,1)    90° x-axis, 270° z-axis, 90° x-axis
GCA     (2,3)(2,1)(1,1)    180° z-axis, 180° x-axis, 90° x-axis
ACT     (1,2)(3,3)(2,2)    90° y-axis, 270° z-axis, 180° y-axis
TGT     (3,1)(1,3)(3,1)    270° x-axis, 90° z-axis, 270° x-axis
CAT     (3,3)(1,2)(2,2)    270° z-axis, 90° y-axis, 180° y-axis
TGT     (3,1)(1,3)(3,1)    270° x-axis, 90° z-axis, 270° x-axis
AGT     (3,2)(1,3)(2,2)    270° y-axis, 90° z-axis, 180° y-axis
ATT     (3,2)(2,2)(3,1)    270° y-axis, 180° y-axis, 270° x-axis
CAC     (3,3)(1,1)(3,3)    270° z-axis, 90° x-axis, 270° z-axis
GTA     (1,3)(2,2)(3,2)    90° z-axis, 180° y-axis, 270° y-axis
AGC     (1,1)(2,3)(2,1)    90° x-axis, 180° z-axis, 180° x-axis
AAA     (1,1)(1,1)(1,1)    90° x-axis, 90° x-axis, 90° x-axis
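The correspondence between matrix elements and rotations in the codon table follows a regular pattern: the first index appears to select the angle (1, 2, 3 for 90°, 180°, 270°) and the second index the axis (1, 2, 3 for x, y, z). The following Python sketch illustrates this reading together with the rotation matrices of Appendix B. The function names `element_to_rotation` and `rotation_matrix` are hypothetical, and the index convention is inferred from the table rows rather than stated explicitly in the text.

```python
import math

# Second index of a matrix element -> rotation axis (inferred from the table).
AXES = {1: "x", 2: "y", 3: "z"}


def element_to_rotation(row, col):
    """Map a matrix element (row, col) to (angle in degrees, axis name)."""
    return 90 * row, AXES[col]


def rotation_matrix(axis, degrees):
    """Integer 3x3 rotation matrix for a multiple of 90 degrees (Appendix B)."""
    c = round(math.cos(math.radians(degrees)))
    s = round(math.sin(math.radians(degrees)))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(f"unknown axis: {axis}")


# GAT is listed in the table with elements (1,3)(3,2)(2,2).
gat_rotations = [element_to_rotation(r, c) for r, c in [(1, 3), (3, 2), (2, 2)]]
print(gat_rotations)

for angle, axis in gat_rotations:
    print(angle, axis, rotation_matrix(axis, angle))
```

Rounding the sine and cosine is safe here only because every angle in the model is a multiple of 90°.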
To build a 3D shape in an orthogonal space we
need a set of points with coordinates (x, y, z). We will
describe in detail the calculation of the first four points;
it is then enough to repeat the steps to get the complete set of coordinates.
We set the origin of an orthogonal reference frame
at the beginning of the sequence, where the first codon
starts. Therefore, the coordinates of the first point are the
coordinates of the vector representing the first codon.
Then, after the application of its three rotations, we
get the coordinates of the second point. For the first codon we have:
GAT (1,3)(3,2)(2,2) 90° z-axis, 270° y-axis, 180° y-axis
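The corresponding codon matrix and the rotation matrices are:

\[
\mathrm{GAT} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad
R_z(90^\circ) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_y(270^\circ) = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
R_y(180^\circ) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]
GAT codon matrix and 90° z-axis, 270° y-axis, 180° y-axis rotation matrices
Consequently, the coordinates of the first point are (3, 2,
2).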
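Now we apply the rotations to get the coordinates
of the second point: we multiply the three rotation
matrices with the codon matrix following the order of
the nucleotides.

G * GAT = T1
\[
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\]

A * T1 = T2
\[
\begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\]

T * T2 = T3
\[
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\]

The coordinates of the resulting T3 vector are (2, 3,
2). To get the coordinates of the second point we convert
the values (2, 3, 2) into the main reference frame by adding them to the first point (3, 2, 2),
and we get (5, 5, 4).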
Now we repeat the calculation for the second
codon. To do this we translate the origin of the reference
frame to the tip of the T3 vector. Thus the origin of the
reference frame for the second codon has coordinates (5,
5, 4) with respect to the main reference frame of the sequence.
In this case too, the coordinates will then be converted
into the reference frame of the sequence.
For the second codon we have:
TTT (3,1)(3,1)(3,1) 270° x-axis, 270° x-axis, 270° x-axis
The corresponding codon matrix and the rotation matrices are:

\[
\mathrm{TTT} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix}, \quad
R_x(270^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \text{ (applied three times)}
\]
TTT codon matrix and 270° x-axis, 270° x-axis, 270° x-axis rotation matrices
Now we apply the rotations to get the coordinates
of the third point: we multiply the three rotation
matrices with the codon matrix following the order of
the nucleotides.

T * TTT = T1
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

T * T1 = T2
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -3 & 0 & 0 \end{bmatrix}
\]

T * T2 = T3
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -3 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 \\ -3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

The coordinates of the resulting T3 vector are (0, -3,
0). To get the coordinates of the third point we convert
the values (0, -3, 0) with respect to the reference frame of the
sequence. Recall that the reference frame for the
second codon has coordinates (5, 5, 4); thus the coordinates of the third point are (5, 2, 4).
Now we repeat the calculation for the third
codon. To do this we translate the origin of the reference
frame to the tip of the T3 vector. Thus the origin of the
reference frame for the third codon has coordinates (5, 2,
4) with respect to the main reference frame of the sequence.
In this case too, the coordinates will then be converted
into the reference frame of the sequence.
For the third codon we have:
AAC (1,1)(1,1)(3,3) 90° x-axis, 90° x-axis, 270° z-axis
The corresponding codon matrix and the rotation
matrices are:

\[
\mathrm{AAC} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_x(90^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \text{ (applied twice)}, \quad
R_z(270^\circ) = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
AAC codon matrix and 90° x-axis, 90° x-axis, 270° z-axis
rotation matrices
Now we apply the rotations to get the coordinates
of the fourth point: we multiply the three rotation
matrices with the codon matrix following the order of
the nucleotides.

A * AAC = T1
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}
\]

A * T1 = T2
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]

C * T2 = T3
\[
\begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 \\ -2 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]

The coordinates of the resulting T3 vector are (0, -2,
-3). To get the coordinates of the fourth point we convert
the values (0, -2, -3) with respect to the reference frame of the
sequence. Recall that the reference frame for the
third codon has coordinates (5, 2, 4); thus the coordinates of the fourth point are (5, 0, 1).
It is now enough to repeat the steps with all the
remaining codons of the sequence to get the complete set
of coordinates.
To see the result, the set of coordinates has to
be saved in a .dat file, say nisin.dat. It can then be used
as input for graphical tools such as gnuplot, free software
that displays various mathematical functions and numerical data http://www.gnuplot.info/download.html,
by running the commands:
> set hidden3d
> splot "C://….path…//nisin.dat" with lines
and the result will be the following:
The final set of coordinates in our example is:
3 2 2
5 5 4
5 2 4
5 0 1
5 3 -1
7 6 1
7 9 -1
4 7 1
9 7 1
9 4 -1
9 5 3
9 10 3
12 10 3
14 13 5
11 15 3
5 16 3
4 16 7
1 18 5
0 12 5
4 12 2
7 12 0
4 12 -2
6 10 -5
9 10 -7
9 11 -3
6 9 -1
4 9 -4
1 9 -6
1 9 3
-5 10 3
-7 10 0
-4 10 0
-7 10 -2
-14 10 -2
-14 9 -6
-14 10 -2
-16 12 -5
-22 13 -5
-24 13 -8
-24 11 -11
-26 13 -14
-23 13 -14
-26 13 -16
-27 13 -12
-29 10 -10
-31 10 -13
-33 10 -10
-35 10 -13
-33 8 -16
-30 8 -18
-36 8 -19
-34 11 -17
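The construction described in this appendix can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. It rests on two assumptions inferred from the worked examples rather than stated in the text: the codon matrix counts how many times each matrix element occurs in the codon, and the coordinates of a matrix are read row by row as the sum of each entry multiplied by its column index. The matrix elements of the first seven codons are copied from the codon table, and all function names are hypothetical.

```python
# First seven Nisin codons with the matrix elements listed in the codon table.
CODONS = [
    ("GAT", [(1, 3), (3, 2), (2, 2)]),
    ("TTT", [(3, 1), (3, 1), (3, 1)]),
    ("AAC", [(1, 1), (1, 1), (3, 3)]),
    ("TTG", [(3, 1), (3, 1), (1, 3)]),
    ("GAT", [(1, 3), (3, 2), (2, 2)]),
    ("TTG", [(3, 1), (3, 1), (1, 3)]),
    ("CTA", [(3, 3), (2, 2), (1, 2)]),
]

# 90-degree rotations about x (1), y (2) and z (3), as in Appendix B.
QUARTER_TURN = {
    1: [[1, 0, 0], [0, 0, -1], [0, 1, 0]],
    2: [[0, 0, 1], [0, 1, 0], [-1, 0, 0]],
    3: [[0, -1, 0], [1, 0, 0], [0, 0, 1]],
}


def matmul(a, b):
    """Product of two 3x3 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation(row, col):
    """Rotation for element (row, col): 90*row degrees about axis number col."""
    m = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for _ in range(row):
        m = matmul(QUARTER_TURN[col], m)
    return m


def codon_matrix(elements):
    """Assumed codon matrix: entry (r, c) counts occurrences of element (r, c)."""
    m = [[0] * 3 for _ in range(3)]
    for r, c in elements:
        m[r - 1][c - 1] += 1
    return m


def coordinates(m):
    """Assumed read-out: each row gives the sum of entry * column index."""
    return tuple(sum(m[i][j] * (j + 1) for j in range(3)) for i in range(3))


def build_points(codons):
    """First point from the first codon matrix, then one step per codon."""
    points = [coordinates(codon_matrix(codons[0][1]))]
    for _, elements in codons:
        t = codon_matrix(elements)
        for r, c in elements:  # apply the rotations in nucleotide order
            t = matmul(rotation(r, c), t)
        step = coordinates(t)
        points.append(tuple(p + s for p, s in zip(points[-1], step)))
    return points


points = build_points(CODONS)
for x, y, z in points:
    print(x, y, z)
```

Under these assumptions the sketch prints one "x y z" line per point, in the layout that can be saved to nisin.dat for gnuplot, and its first eight lines coincide with the first eight rows of the coordinate list above. Extending `CODONS` with the remaining rows of the codon table would continue the path.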
Author Biography
Paola Pozzo
Research and Development Statistics, Intrasoft – Eurostat, Luxembourg.
References
Basilevsky, A. (1983). Applied matrix algebra in the statistical sciences. Dover Publications.
Rotman, J. (1994). An introduction to the theory of groups. New York, Springer-Verlag.
Crick, F. (1988). Chapter 8: The genetic code. What mad pursuit: a personal view of scientific discovery. New York, Basic Books. (pp. 89–101).
Ridley, M. (2006). Genome. New York, Harper Perennial.
Elzanowski, A. & Ostell, J. (2008). The Genetic Codes. National Center for Biotechnology Information (NCBI). http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c.
Harary, F. (1994). Graph Theory. Reading, MA, Addison-Wesley.
Lomont, J.S. (1987). Cyclic Groups, in Applications of Finite Groups. New York, Dover. (p. 78).
Scott, W.R. (1987). Cyclic Groups, in Group Theory. New York, Dover. (pp. 34–35).
Frampton, P. (2008). Gauge Field Theories. 3rd ed. Wiley-VCH.
Cannarozzi, G., et al. (2010). Cell, 141 (pp. 355–367).
Tuller, T., et al. (2010). Cell, 141 (pp. 344–354).
Fredrick, K. & Ibba, M. (2010). Cell, 141 (pp. 227–229).
de Smit, M.H. & van Duin, J. (2003). Journal of Molecular Biology, 331 (pp. 737–743).
Studer, S.M. & Joseph, S. (2006). Molecular Cell, 22 (pp. 105–115).