The Name of the Journal

The Universal Genetic Code described by a Model based on Group Theory

Paola Pozzo

Abstract

This study was performed to verify whether a mathematical model based on group theory can describe the universal genetic code and the behavior of codon sequences. We investigated the reproducibility of the 64 variations of nucleotide triplets forming the genetic code and observed that a model based on the 4th order cyclic group can describe the DNA sequence with a unique distribution of nucleotides. Moreover, the model reproduces the shape of the 3D structures of codon sequences, in some cases with surprising precision. We also investigated the possibility that special points along the sequence can represent binding sites. The comparison of our sets of points with the predictions of some specific tools gives promising results. Additionally, the analysis of the triplet codon selection of some proteins seems to confirm the recent idea that the 5' sequence of mRNAs strongly influences their translation.

1 Introduction

Group theory (Rotman, 1994) is the mathematical instrument used to study the symmetries of objects in all disciplines that use mathematical models and computational techniques. The general procedure to describe a system is to associate it with a suitable symmetry group. Various physical systems can be modelled by symmetry groups. Thus, group theory has many applications in physics and chemistry, but up to now it has not had many applications in biology. We investigated the possibility that a mathematical description based on group theory can describe the universal genetic code and the behavior of codon sequences.

The genetic code (Crick, 1988) is the set of rules by which information encoded in DNA or mRNA sequences is translated into proteins. The genetic code defines a mapping between nucleotide triplets of mRNA, called codons, and the amino acids forming the backbone of a protein (Ridley, 2006).
Nearly all species use the same translation code, which is referred to as the universal genetic code (Elzanowski & Ostell, 2008). Coding regions of genes can be considered as short instructions built up by the “letters” of the DNA alphabet. The genetic code is degenerate, since there are 64 variations of nucleotide triplets, but only 20 amino acids and a translation stop signal to be coded by them. Consequently, some amino acids are encoded multiple times and many different combinations of codons can build the same protein. The purpose of this study was to verify that the genetic code can be described with a mathematical model based on group theory and to verify whether the sequences built following this model are consistent with the properties of the real sequences.

Issue 1(2), 2010

2 Short Introduction to Group Theory

In mathematics, a group is an algebraic structure consisting of a set of objects together with an operation (composition law) that combines any two of its elements to form a third element. Each group has an element that is neuter (neutral) with respect to the composition law: any element is left unchanged when combined with the neuter one. A cyclic group (Harary, 1994; Lomont, 1987; Scott, 1987) is a special group generated by a single element, in the sense that the group has an element g (called the “generator”) such that, when written multiplicatively, every element of the group is a power of g. If n is the smallest positive integer for which g^n gives the neuter element, n is the order of the group. The elements of any group are partitioned into equivalence classes; members of the same class share many properties. One of the simplest cyclic groups is the 4th order cyclic group C4, the set of all integer multiples of rotation by 90° in a three-dimensional orthogonal space. The composition law is the consecutive application of rotations and the neuter element is the class of multiples of rotation by 360°.
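As a minimal sketch of the definition above, C4 can be realized by the powers of a single 90° rotation. The choice of the z axis and the names `matmul`, `power` and `ROT_Z_90` are assumptions made for illustration; they are not taken from the paper.

```python
# Sketch: C4 generated by one 90-degree rotation (the axis choice is an assumption).

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Counter-clockwise rotation by 90 degrees about the z axis (exact integers).
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def power(g, n):
    """Return g composed with itself n times (n = 0 gives the identity)."""
    result = IDENTITY
    for _ in range(n):
        result = matmul(result, g)
    return result


# The four elements E, A, A2, A3 are the powers g**0 .. g**3.
elements = [power(ROT_Z_90, n) for n in range(4)]

# The order of the group is the smallest n > 0 with g**n equal to the identity.
order = next(n for n in range(1, 9) if power(ROT_Z_90, n) == IDENTITY)
print(order)
```

Integer matrices are used so that the comparison with the identity is exact and no floating-point tolerance is needed.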
To any group is associated a multiplication table, describing the relations between the equivalence classes (and therefore between the elements). Table 1 shows the multiplication table for the 4th order cyclic group.

        E     A     A2    A3
  E     E     A     A2    A3
  A3    A3    E     A     A2
  A2    A2    A3    E     A
  A     A     A2    A3    E

Table 1: Multiplication table

Groups can be represented in several ways. The simplest way is to use one-dimensional representations. Table 2 shows the one-dimensional representations for the 4th order cyclic group.

        R1    R2    R3    R4
  E     1     1     1     1
  A     1     -1    i     -i
  A2    1     1     -1    -1
  A3    1     -1    -i    i

Table 2: One-dimensional representations table

Each class (called E for the neuter one, and A, A2 and A3 respectively) contains four elements. The neuter element is always represented by the real integer 1. The group may also be represented with 3 × 3 matrices Gij, in which i is the number of the row and j the number of the column. In Table 2 the rows represent the classes, and the columns specify the elements within the classes. The non-neuter elements are identified by the pairs ij of indices in Table 3; for example, the element (i, j) = (1, 3) represents a rotation of 90° around the z axis.

  11    12    13
  21    22    23
  31    32    33

Table 3: ij matrix indices

Each element is identified in a unique way by two parameters, associated with Table 1 and Table 2 respectively. The neuter elements have the values 1 and 4 for the representations and multiplication tables respectively. For the other elements, Table 1 gives the corresponding multiplication table parameters (the class parameters) and Table 2 gives the corresponding one-dimensional representations table parameters, as shown in Table 4.

  Matrix element indices    Multiplication table params    One-dimens. table params
  11   12   13              1    1    1                    -1    i     -i
  21   22   23              2    2    2                    1     -1    -1
  31   32   33              3    3    3                    -1    -i    i

Table 4: Multiplication and one-dimensional tables parameters for cyclic group C4
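The parameter assignment of Table 4 can be sketched in a few lines. The sketch assumes, as Table 2 suggests, that each column R1 to R4 is fixed by the value it gives to the generator A (1, -1, i, -i) and that the value for a power of A is the same power of that value. The names `GENERATOR_VALUES`, `representation_value` and `element_parameters` are hypothetical.

```python
# Sketch of the Table 4 lookup; names are illustrative, not from the paper.

# Value assigned to the generator A by the columns R1..R4 (Table 2, row A).
GENERATOR_VALUES = [1, -1, 1j, -1j]


def representation_value(rep_index, k):
    """Value of A**k in column R(rep_index + 1), by repeated multiplication."""
    value = 1
    for _ in range(k):
        value *= GENERATOR_VALUES[rep_index]
    return value


def element_parameters(i, j):
    """Return (class parameter, one-dimensional parameter) for element (i, j).

    i in 1..3 is the class (row of Table 3); j in 1..3 selects the element
    inside the class and is read from columns R2..R4 of Table 2.
    """
    if i not in (1, 2, 3) or j not in (1, 2, 3):
        raise ValueError("indices must be in 1..3")
    return i, representation_value(j, i)


for i in range(1, 4):
    print([element_parameters(i, j) for j in range(1, 4)])
```

Repeated multiplication is used instead of a floating-point power so that the values 1, -1, i and -i stay exact.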
As we said, this is the group of all integer multiples of rotation by 90°; therefore the index i (equivalence class index) represents the amplitude of the rotation associated with the class. The index j, related to the element within the class, specifies the axis of rotation.

Any state of the system described by the group is represented by a vector in the orthogonal space, the sum of the vectors representing the single elements of the classes. The rotation matrices associated with each state, applied to the vector, represent the evolution of the system. The parameters of any state of the system are the composition of the parameters of the single elements, and have to be equal to 1 and 4 for the representation and multiplication tables respectively. Neutrality with respect to the one-dimensional representations table means that the total product of the parameters of each state has to be equal to unity. This is the requirement for the corresponding matrix to have real eigenvalues; in fact, only real values can represent real systems. Neutrality with respect to the multiplication table is a gauge invariance requirement (Frampton, 2008). Gauge invariance is the property of a system to be invariant under a group of local transformations.

3 Group Theory and Genetic Code

Given this short introduction, we ask how such a model could describe the DNA double helix and the genetic code. In the previous section, we defined a group as follows: “In mathematics, a group is an algebraic structure consisting of a set of objects”. In our case we have four objects in two separate classes with two elements each, the purines (A, G) and the pyrimidines (C, T or C, U) (Figure 1).
  PURINES          PYRIMIDINES
  A  Adenine       T/U  Thymine / Uracil
  G  Guanine       C    Cytosine

Figure 1: Basic classes in genetic code: purines and pyrimidines

As a starting point we have two separate classes, purines and pyrimidines, containing two elements each: Adenine and Guanine in the first one, and Cytosine and Thymine or Uracil in the second one. Two classes with two elements each can lead to a 3rd order group (two classes plus the neuter element), but this is not enough to describe the genetic code. In fact, for each codon we need three nucleotides. A 4th order group seems to be the correct choice. The basic step is to define the correct distribution of nucleotides (purines and pyrimidines) into the equivalence classes of the group, and then to verify that this distribution is the only one that generates the double helix structure.

To describe the double helix, the pair of indices (i, j) identifying a class element (nucleotide) is not enough. We need a second pair of indices (alpha, beta) to identify the side of the double helix. Therefore, it is convenient to define each nucleotide with a function using the set of indices (i, j) to identify the corresponding element in the group and (alpha, beta) to indicate the side of the double helix. The pair (alpha, beta) can assume the values (1, 0) or (0, 1).

The 4th order group has 4 classes with 4 elements each. The 4th class is the neuter class, and within each class the 4th element is the neuter element. The consequence is that we need 9 elements, three for each non-neuter class. We have only four elements, and the permutations of the 5 missing elements are 5! = 120, corresponding to 120 different sets of equivalence classes. All the permutations involved have been analyzed. To reproduce the double helix structure we have to generate pairs of nucleotides that are neuter with respect to Table 1 and Table 2. This means that the parameters of the corresponding pairs will be 1 and 4 respectively.
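The neutrality condition for a pair can be sketched as follows. The sketch assumes that class parameters compose cyclically with 4 as the neuter value and that one-dimensional parameters compose by multiplication, in line with the worked examples in the text. The names `compose_class` and `is_neuter_pair` are hypothetical helpers.

```python
# Sketch of the pair-neutrality test; function names are illustrative.

def compose_class(a, b):
    """Cyclic composition of class parameters in 1..4, where 4 is neuter."""
    return (a + b - 1) % 4 + 1


def is_neuter_pair(first, second):
    """Each argument is a tuple (class parameter, one-dimensional parameter)."""
    class_ok = compose_class(first[0], second[0]) == 4
    representation_ok = first[1] * second[1] == 1
    return class_ok and representation_ok


# Elements (1, 2) and (3, 2) carry the parameters (1, i) and (3, -i).
print(is_neuter_pair((1, 1j), (3, -1j)))
# Same classes, but the product i * i = -1 is not the neuter value 1.
print(is_neuter_pair((1, 1j), (3, 1j)))
```

With this convention the composition of classes 3 and 2 gives 1, which is the same wrap-around used later in the text for the binding-site example.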
For example, the (i, j) elements (1,2) and (3,2) have as parameters the values (1, i) and (3, -i) and generate a correct pair. The composition of the two class parameters is 1 & 3 = 4, that is, the neuter value. The one-dimensional table parameters are complex values, but i multiplied by -i gives the real integer 1. The alpha/beta indices (indicated with 10 and 01 respectively) in any pair are set to opposite symmetry, referring to the opposite sides of the double helix. All the permutations involved have been analyzed to generate the pairs of nucleotides following the group rules, but only two distributions, in Table 5 and Table 6, generate the correct double-helix structure.

  A      A      G
  C      U/T    G
  U/T    A      C

Table 5

  C      C      U/T
  A      G      U/T
  G      C      A

Table 6

Table 7 and Table 8 show the results generated with the Table 5 distribution. All the other sets of equivalence classes generate pairs not compatible with the DNA double helix. The correct distribution is the one in Table 5. In fact, verifications regarding the reproduction of the 3D structure of proteins have shown that the Table 6 distribution does not build a correct shape. For more details see Appendix A.

         bases
  a      AT
  b      AA
  c      GC
  d      CC
  e      TT
  f      GG

Table 7: Pairs of nucleotides generated according to the group rules.

         alpha/beta    alpha/beta    alpha/beta    alpha/beta    nucleotides
  ac     1010          1001          0110          0101          ATGC
  be     1001          1010          0101          0110          AATT
  df     1001          0101          1010          0110          CCGG

Table 8: Combinations consistent with the DNA double helix.

The first important result is that we can reproduce the genetic code, namely all the 64 variations of nucleotide triplets used to encode 20 amino acids, with the pairs of nucleotides just generated. To generate the triplets of nucleotides we start from the consideration that the three bases of one amino acid are a sequence taken from the same side of the double helix.
To generate a codon of three bases we considered all the possible combinations of the 6 elementary states of opposite symmetry built with the pairs of bases just generated, extracting only the sets of 3 with the same values of (alpha, beta), that is, belonging to the same side of the double helix. Each codon has its set of two parameters, calculated from the parameters of the three nucleotides forming the codon. Therefore any sequence built with the set of amino acids has its set of two parameters, calculated from the parameters of the single codons forming the sequence.

As we said, the reproduction of the genetic code is the first important result, but now we have to verify whether the sequences built with the codons generated by this model are consistent with the properties of the real sequences. The set of amino acids has been used to build about 100 protein sequences, analyzing for each one the reproduction of the 3D structure, the binding site predictions and the codon distribution along genes.

4 Proteins Three-Dimensional Structure

The three-dimensional structure of a protein is determined by the amino acid sequence. In fact, each protein is translated from a sequence of mRNA to a linear chain of amino acids. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein. For many proteins, the correct three-dimensional structure is essential to function, and failure to reach the correct structure produces inactive proteins with different properties.

We said: “Any state of the system described by the group is represented with a vector in the orthogonal space, summation of the vectors representing the single elements of the classes. The rotation matrices associated to each state applied to the vector will represent the evolution of the system”. In this model, each synonymous codon is associated with the three-dimensional rotation matrices of its nucleotides.
Therefore, synonymous codons encode the same amino acid, but have different table parameters and different associated rotation matrices. As we already said in “Short Introduction to Group Theory”, the 4th order cyclic group is the group of all integer multiples of rotation by 90°; therefore the index i (equivalence class index) represents the amplitude of the rotation associated with the class. The index j, related to the element within the class, specifies the axis of rotation. Table 9 reports the rotation corresponding to each element, that is, to each nucleotide. For example, for serine:

  Codon    Matrix elements      Rotations
  UCA      (2,2)(3,3)(1,2)      180° axis Y, 270° axis Z, 90° axis Y
  UCC      (3,1)(3,3)(3,3)      270° axis X, 270° axis Z, 270° axis Z
  UCU      (3,1)(3,3)(3,1)      270° axis X, 270° axis Z, 270° axis X

The three sequences of rotations are very different, and the choice of one codon gives a different contribution to the “action” of the sequence, that is, to building the correct three-dimensional structure.

  (1,1) 90° axis x      (1,2) 90° axis y      (1,3) 90° axis z
  (2,1) 180° axis x     (2,2) 180° axis y     (2,3) 180° axis z
  (3,1) 270° axis x     (3,2) 270° axis y     (3,3) 270° axis z

Table 9: ij matrix elements rotations

How do we apply rotations to build a 3D shape? Each nucleotide is a state of the system, represented by a 3 × 3 matrix, and each 3 × 3 matrix has a corresponding state vector in the orthogonal space of the group. A codon also represents a state of the system: the corresponding vector is the sum of the single nucleotide vectors and the corresponding matrix is the sum of the single nucleotide matrices. In the example of the serine UCA codon, the 3 × 3 matrix is the sum of the U, C and A nucleotide matrices, as shown in Table 10.

\[
M_U = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
M_C = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
M_A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
M_{UCA} = M_A + M_U + M_C = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

Table 10: U, C and A nucleotide matrices and their sum, the UCA codon matrix

The three associated rotations are: 180° about the Y axis, 270° about the Z axis and 90° about the Y axis. Table 11 shows the corresponding rotation matrices (see Appendix B).

\[
R_U = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad
R_C = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
R_A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}
\]

Table 11: 180° Y axis, 270° Z axis and 90° Y axis rotation matrices

After joining the triplets to form a sequence, we have something like a “flat strip”. To build a 3D shape in an orthogonal space we need a set of points with coordinates (x, y, z). We set the origin of an orthogonal reference frame at the beginning of the sequence. The coordinates of the first point are the coordinates of the vector representing the first codon after the application of its three rotations. Let us take serine UCA as the first codon. To execute the rotations, we multiply the three matrices in Table 11 with the codon matrix in Table 10, following the order of the nucleotides:

\[
T_1 = R_U \, M_{UCA} =
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\qquad (U * UCA = T1)
\]

\[
T_2 = R_C \, T_1 =
\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\qquad (C * T1 = T2)
\]

\[
T_3 = R_A \, T_2 =
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \end{pmatrix}
\qquad (A * T2 = T3)
\]

The resulting vector, represented by the T3 matrix, has coordinates (-3, 2, -2): this is the first point of the set we will use to build the 3D shape. Then we translate the origin of the reference frame to the tip of the resulting vector, where the second codon starts. In our example, we translate the origin (0, 0, 0) of the reference frame to the point (-3, 2, -2). We then apply the three rotations of the second codon to the vector starting in position (-3, 2, -2). Thus, we obtain the coordinates of the second point, which are of course converted into the original reference frame. Then we translate the origin of the reference frame again to the tip of the resulting vector, and so on until the end of the sequence.
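The three rotation steps for the UCA codon can be checked with a short sketch. It assumes the standard right-handed rotation matrices for the angles listed in Table 11 and builds the codon matrix as the sum of single-entry nucleotide matrices, as in Table 10. The helper names are hypothetical, and the sketch stops at T3, because the text does not spell out how the coordinates of the state vector are read from that matrix.

```python
# Sketch: reproduce the UCA products T1, T2, T3 with plain 3x3 integer matrices.

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def matadd(a, b):
    """Add two 3x3 matrices entry by entry."""
    return [[a[r][c] + b[r][c] for c in range(3)] for r in range(3)]


def nucleotide_matrix(i, j):
    """3x3 matrix with a single 1 in row i, column j (1-based indices)."""
    return [[1 if (r, c) == (i - 1, j - 1) else 0 for c in range(3)]
            for r in range(3)]


# Rotation matrices for the three nucleotides of UCA (right-handed convention).
ROT_Y_180 = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]   # U, element (2,2)
ROT_Z_270 = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]    # C, element (3,3)
ROT_Y_90 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]     # A, element (1,2)

# Codon matrix of UCA: sum of the matrices of U (2,2), C (3,3) and A (1,2).
codon = matadd(matadd(nucleotide_matrix(2, 2), nucleotide_matrix(3, 3)),
               nucleotide_matrix(1, 2))

t1 = matmul(ROT_Y_180, codon)  # U * UCA
t2 = matmul(ROT_Z_270, t1)     # C * T1
t3 = matmul(ROT_Y_90, t2)      # A * T2

for name, matrix in (("codon", codon), ("T1", t1), ("T2", t2), ("T3", t3)):
    print(name, matrix)
```

The rotations are applied by left multiplication in the order of the nucleotides, so the product for the whole codon is ROT_Y_90 · ROT_Z_270 · ROT_Y_180 · codon.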
In the present study, the set of resulting coordinates has been used as input for gnuplot (http://www.gnuplot.info/download.html), free software that displays various mathematical functions and numerical data. The result reproduces the shape of the final three-dimensional structure of the protein, in some cases with surprising precision. It is as if the rotations associated with the sequence of codons express the force acting between the nucleotides to produce the geometry of the final structure. The rotations do not represent the real rotations of the linear chain in real physical space, but the rotations of the state vector representing the sequence in the orthogonal space of the group. Appendix C reports the full example of the protein Nisin 53 (http://www.ncbi.nlm.nih.gov/protein/ABV64388.1). Some examples are given in Figures 2, 3, 4 and 5. The pictures are taken from the NCBI web site.

5 Binding Sites Comparison

A binding site is a region on a protein, DNA, or RNA to which specific other sequences, generically called ligands, form a chemical bond. Binding sites allow a protein to interact with specific ligands; therefore predicting the binding sites between two interacting proteins provides important clues to the function of the protein itself.

There are points in a sequence where the corresponding sub-sequence is neuter, with the total equivalence class parameter (related to the multiplication table) equal to 4. For example, if position 125 is a neuter point, this means that the sub-sequence of the first 125 codons has a multiplication table parameter equal to 4, the neuter value. We investigated the possibility that the neuter points along a sequence can represent potential binding sites, comparing our results with the predictions of some specific tools. A ligand cannot bind to all the neuter points, but only to sites where there is “affinity”, and the parameters of single codons are the “key”.
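The search for neuter points can be sketched as a running composition over the sequence. The sketch takes the class parameter of each codon as given input, since the way it is derived from the nucleotides is not reproduced here, and it assumes the cyclic composition with 4 as the neuter value. The names `compose_class` and `neuter_points` and the example values are hypothetical.

```python
# Sketch: find neuter points as prefixes whose composed class parameter is 4.

def compose_class(a, b):
    """Cyclic composition of class parameters in 1..4, where 4 is neuter."""
    return (a + b - 1) % 4 + 1


def neuter_points(codon_classes):
    """Return the 1-based positions n such that the first n codons compose to 4."""
    positions = []
    total = 4  # the neuter value leaves the first parameter unchanged
    for position, parameter in enumerate(codon_classes, start=1):
        total = compose_class(total, parameter)
        if total == 4:
            positions.append(position)
    return positions


# Hypothetical class parameters for a short sequence of six codons.
example = [1, 3, 2, 2, 4, 1]
print(neuter_points(example))
```

Because every prefix value depends on all the codons before it, changing the parameter of one codon shifts the running value for every later position.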
In fact, if a protein has 50 neuter points, the sub-sequences are neuter, but the codons in those specific positions have their own parameters. This is valid of course for both the protein and the ligand. To have an effective bond, the binding site parameters of the ligand composed with the binding site parameters of the protein must give the neuter values. Example: codons 120 and 175 are neuter points, and therefore possible binding sites. The total multiplication table parameter of each of the two sub-sequences is 4, but the corresponding codons have parameters (3, 1) and (2, -i) respectively. A ligand whose binding site has parameters (2, -i) cannot bind to both sites of the protein, but only to point 175: 2 and 2 give 4 as the parameter for the class, but for point 120 we have parameters 3 and 2, and the cyclic result is 1, not neuter (3 + 2 = 5, and 5 - 4 = 1).

The analysis of the regions close to the neuter points can give information about the stability of the bond. High compatibility between the parameters of the codons in the regions around the neuter points indicates a stable resulting bond.

The difficult point is to compare the binding sites of the model with “real” binding sites, or at least with reliable prediction tools. There are many tools in bioinformatics, and after a careful investigation we selected for the comparison two of the most used ones: Match and Alibaba2 of Gene Regulation. Both use the binding sites collected in the TRANSFAC database. In both tools, we made many restrictions in the setup of parameters. In fact, using the standard parameters the list of predicted binding sites is very long and covers almost the whole sequence, becoming meaningless for a comparison. Match (http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi) is used with the cut-off parameters for core and matrix similarity set to 1.0 and 0.93 respectively.
The matrix similarity is a score that describes the quality of a match between a matrix and an arbitrary part of the input sequences. Analogously, the core similarity denotes the quality of a match between the core sequence of a matrix (i.e. the five most conserved positions within a matrix) and a part of the input sequence. To use Alibaba2 (http://www.gene-regulation.com/pub/programs/alibaba2/index.html) we set the parameter Similarity of sequence to matrix equal to 100. This parameter measures the similarity between the matrix and the analyzed sequence in percent. 100% means that the most often occurring nucleotides in the matrix (the matrix's consensus) are the same as in the unknown sequence. 1% means that the unknown sequence is only marginally similar to the matrix. The blue lines represent the predictions of our tool; the red lines represent the Match and Alibaba2 predictions.

In Figure 6 we consider Camp (http://www.ncbi.nlm.nih.gov/protein/CAG46759.1). Figure 6A reports the comparison between our predictions and the Match tool with the very strict choice of cut-off parameters for core and matrix similarity, set to 1.0 and 0.93 respectively. The Match predictions fit very well in our binding site regions.

[Image: Collagen 316]

Figure 2: Proteins three-dimensional structure: Collagen. The three-dimensional structure of a protein is determined by the amino acid sequence. In fact, each protein is translated from a sequence of mRNA to a linear chain of amino acids. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein. In this model each synonymous codon has an associated “action”, represented by the three-dimensional rotation matrices of its nucleotides. The rotations performed following the nucleotide distributions reproduce the shape of the final three-dimensional structure of the protein. It is as if the rotations express the force acting between the nucleotides to produce the final structure.
The shape of collagen is reproduced in a surprising way.

[Images: Acetylcholine receptor 519; Arp2/3 304]

Figure 3: Proteins three-dimensional structure. The three-dimensional structure of Acetylcholine receptor and Arp2/3.

[Images: Dystrophin 71; Histone 448; Nisin 53]

Figure 4: Proteins three-dimensional structure. The three-dimensional structure of Dystrophin, Histone and Nisin.

[Images: Scramblase 379; Serum albumine 607; SMN1 288]

Figure 5: Proteins three-dimensional structure. The three-dimensional structure of Scramblase, Serum albumine and SMN1.

[Plots, panels (A), (B) and (C): Series1 and Series2 against position along the sequence (1 to 166), vertical axis from 0 to 2.5]

Figure 6: Binding sites comparison. The binding site predictions of our tool are compared with the predictions of two of the most used tools: Match and Alibaba2 of Gene Regulation. Both use the binding sites collected in the TRANSFAC database. In both tools we made many restrictions in the setup of parameters. In fact, using the standard parameters the list of predicted binding sites is very long and covers almost the whole sequence, becoming meaningless for a comparison. Match (http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi) is used with the cut-off parameters for core and matrix similarity set to 1.0 and 0.93 respectively. To use Alibaba2 (http://www.gene-regulation.com/pub/programs/alibaba2/index.html) we set the parameter Similarity of sequence to matrix equal to 100.
This parameter measures the similarity between the matrix and the analyzed sequence in percent. 100% means that the most often occurring nucleotides in the matrix (the matrix's consensus) are the same as in the unknown sequence. 1% means that the unknown sequence is only marginally similar to the matrix. The blue lines represent the predictions of our tool; the red lines represent the Match and Alibaba2 predictions. In this case we consider Camp (http://www.ncbi.nlm.nih.gov/protein/CAG46759.1). Figure A: comparison between our predictions and the Match tool with the very strict choice of cut-off parameters for core and matrix similarity, set to 1.0 and 0.93 respectively. The Match predictions fit very well in our binding site regions. Figure B: comparison between our predictions and the Alibaba2 tool with the parameter Similarity of sequence to matrix equal to 100. The Alibaba2 predictions fit very well in our binding site regions. Figure C: comparison between our predictions and the Alibaba2 tool with the standard parameter setup. The Alibaba2 prediction covers almost the whole sequence, becoming meaningless for a comparison.

Figure 6B reports the comparison between our predictions and the Alibaba2 tool with the parameter Similarity of sequence to matrix equal to 100. The Alibaba2 predictions fit very well in our binding site regions. Figure 6C reports the comparison between our predictions and the Alibaba2 tool with the standard parameter setup. The Alibaba2 prediction covers almost the whole sequence, becoming meaningless for a comparison.

In Figure 7 we consider the Pseudomonas fluorescens partial cop gene for putative copper transporting ATPase (http://www.ncbi.nlm.nih.gov/nuccore/13992548). In Figure 8 we consider Insulin (http://www.ncbi.nlm.nih.gov/protein/AAA40590.1).

This mechanism could also give a justification for the need of 64 triplets in the genetic code to encode 20 amino acids. Each synonymous codon has a different set of parameters.
The choice of which one will encode a specific amino acid in a protein influences the positions of the subsequent binding sites, because the parameters of the sub-sequences will change. If in a sequence we replace a triplet with a synonymous codon, the positions of the binding sites from that point to the end of the sequence will change, and this will influence the potential interactions of the sequence with other ligands. Figure 9 shows the comparison between neuter points in the sequence of Insulin (http://www.ncbi.nlm.nih.gov/protein/AAA40590.1). Figure 9A represents the neuter point positions calculated from the original sequence. Figure 9B represents the neuter points after the replacement of asparagine AAC, parameters (i, 1), in position 34 with the synonymous codon AAU, parameters (i, 4). The first 5 points at the beginning of the sequence are the same, but after position 34 the points change significantly.

6 Codons Distribution along Genes

The results of recent studies, focused on searching for genome trends in codon choice, suggest that we do not yet understand all the rules guiding translation, but the emerging idea is that codon choice is not random. The genetic code is degenerate: there are 61 sense codons for only 20 amino acids. Consequently, some amino acids are encoded multiple times and many different combinations of codons build the same protein. One of the main questions is how the mechanism guiding the codon choice within synonymous sets works. Many recent studies have focused on this subject. Cannarozzi and colleagues (Cannarozzi et al., 2010; Tuller et al., 2010) examined groups of synonymous codons to understand whether they were randomly or nonrandomly ordered along genes. The discovery of ramps (Tuller et al., 2010; Fredrick & Ibba, 2010) shows that codon choice is not uniform. In the “ramp”, a region including the first 30-50 codons, the speed of translation is slow; it then increases up to a certain level for the rest of the gene. Kudla et al.
(2009), examining the effects of synonymous codon substitutions on the efficiency of translation, found that the sequence at the beginning of the gene strongly influences the translation itself. These results are consistent with other similar studies concerning the importance of mRNA structure in controlling translation initiation (de Smit & van Duin, 2003; Studer & Joseph, 2006).

The analysis of the codon sequences of some proteins with our model seems to confirm these recent ideas. We have examined the distribution of codons in the sequences of many proteins with respect to the values of the multiplication and one-dimensional tables. Figures 10 and 11 report as an example the results for six proteins of different length with respect to the multiplication table. For each protein we have examined the whole sequence and the initial region (including around 10% of the total number of codons). The distribution along the whole sequence is fully consistent and homogeneous with the initial one. It is as if in the initial part the translation process has to choose the best set of parameters for the specific sequence, and the rest of the translation follows the initial choice. The result is a homogeneous distribution of codons along the whole sequence. This is also consistent with the results of the recent studies reported above, especially with the concept of the “ramp” and the idea that the sequence at the beginning of the gene strongly influences the translation.

As we already said, each codon has its set of two parameters, calculated from the parameters of the single nucleotides. Codons in a synonymous set therefore have different parameters: they encode the same amino acid but with a different contribution to the sequence in terms of symmetry; therefore the choice of the synonymous codon can significantly change the final parameters of the sequence.
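The comparison between the initial region and the whole sequence can be sketched as two percentage distributions over the multiplication table parameter. The sketch takes the class parameter of each codon as given input and uses the 10% initial region mentioned in the text. The function names and the example values are hypothetical and do not reproduce any of the proteins analyzed.

```python
from collections import Counter


def percentage_distribution(codon_classes):
    """Percentage of codons carrying each class parameter 1..4."""
    if not codon_classes:
        raise ValueError("the sequence must contain at least one codon")
    counts = Counter(codon_classes)
    total = len(codon_classes)
    return {value: 100.0 * counts.get(value, 0) / total for value in (1, 2, 3, 4)}


def initial_region(codon_classes, fraction=0.10):
    """First part of the sequence; at least one codon is always kept."""
    size = max(1, round(len(codon_classes) * fraction))
    return codon_classes[:size]


# Hypothetical class parameters for a sequence of 20 codons.
sequence = [1, 3, 2, 2, 4, 1, 3, 3, 1, 2, 4, 4, 2, 1, 3, 2, 2, 1, 3, 4]

print(percentage_distribution(initial_region(sequence)))
print(percentage_distribution(sequence))
```

On real data the two dictionaries would be compared side by side, as in the left and right panels of Figures 10 and 11.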
[Plots, panels (A), (B) and (C): Series1 and Series2 against position along the sequence (1 to 91), vertical axis from 0 to 2.5]

Figure 7: Binding sites comparison. In this case we consider the Pseudomonas fluorescens partial cop gene for putative copper transporting ATPase (http://www.ncbi.nlm.nih.gov/nuccore/13992548). Figure A: comparison between our predictions and the Match tool with the very strict choice of cut-off parameters for core and matrix similarity, set to 1.0 and 0.93 respectively. The Match predictions fit very well in our binding site regions. Figure B: comparison between our predictions and the Alibaba2 tool with the parameter Similarity of sequence to matrix equal to 100. The Alibaba2 predictions fit very well in our binding site regions. Figure C: comparison between our predictions and the Alibaba2 tool with the standard parameter setup. The Alibaba2 prediction covers almost the whole sequence, becoming meaningless for a comparison.

[Plots, panels (A) and (B): Series1 and Series2 against position along the sequence (1 to 109), vertical axis from 0 to 2.5]

Figure 8: Binding sites comparison. In this case we consider Insulin (http://www.ncbi.nlm.nih.gov/protein/AAA40590.1).
Figure A: comparison between our predictions and the Match tool with very strict cut-off parameters for core and matrix similarity, set to 1.0 and 0.93 respectively. The Match predictions fit very well within our binding site regions. Figure B: comparison between our predictions and the Alibaba2 tool with the standard parameter setup. The Alibaba2 prediction covers almost the whole sequence, which makes it meaningless for a comparison.

Following these results, we could say that translation is slow at the beginning because choosing the “best” codon to achieve the final neutrality of the sequence is more difficult. It is as if the “translator” has to consider a large number of parameter combinations to find the best path leading to a correct result. After a certain number of codons the “path” is traced and the choice is faster. The remaining translation follows the choice of parameters made in the initial region, and the choice of synonymous codons is faster because it is “guided”.

In our example, the proteins in Figures 10 and 11 have been grouped by length. Figure 10 shows the three shortest sequences: Nisin (53 codons), Insulin (109 codons) and Keratin (138 codons). Figure 11 shows the longest sequences: Adenylyl cyclase (367 codons), Coronin (454 codons) and Myosin (1520 codons).
The left side of each picture reports the percentage distribution of codons in the initial region of the sequence; the right side reports the percentage distribution of codons along the whole sequence.

[Panels (A) and (B).]

Figure 9: Neuter point changes with synonymous codons. This could also give a justification for the need of 64 triplets in the genetic code to encode 20 amino acids. Each synonymous codon has a different set of parameters. The choice of the codon that encodes a specific amino acid in a protein influences the positions of the following binding sites, because the parameters of the subsequences change. If we replace a triplet in a sequence with a synonymous codon, the positions of the binding sites from that point to the end of the sequence change, and this influences the potential interactions of the sequence with other ligands. The figure compares the neuter points in the sequence of Insulin http://www.ncbi.nlm.nih.gov/protein/AAA40590.1. Figure A: neuter point positions calculated from the original sequence. Figure B: neuter points after the replacement of asparagine AAC, parameters (©, 1), in position 34 with the synonymous codon AAU, parameters (©, 4). The first 5 points at the beginning of the sequence are the same, but after position 34 the points change significantly.

[Panels (A), (B) and (C); left column: Begin Sequence, right column: All Sequence.]

Figure 10: Codon distribution along genes in short sequences. The figure reports the distribution of codons in the sequences of the three shortest proteins with respect to the value of the multiplicative table: Nisin (53 codons), Insulin (109 codons) and Keratin (138 codons). The left side of the picture reports the percentage distribution of codons in the initial region of the sequence; the right side reports the percentage distribution of codons along the whole sequence.
For Nisin_53 (A) the value 2 is predominant and the value 4 is rare. This sequence is very short, so comparing the beginning with the whole sequence has little statistical meaning; even so, the comparison is not bad. For Insulin_109 (B) and Keratin_138 (C) the values 1 and 3 are predominant and only a few 4s are present.

For Nisin_53 (Figure 10A) the value 2 is predominant and the value 4 is rare. This sequence is very short, so comparing the beginning with the whole sequence has little statistical meaning; even so, the comparison is not bad. For Insulin_109 (Figure 10B) and Keratin_138 (Figure 10C) the values 1 and 3 are predominant and only a few 4s are present. For Coronin_454 (Figure 11A) the values 1 and 3 are predominant. For Myosin_1520 (Figure 11B) and Adenylyl_367 (Figure 11C) there is a balanced distribution of the four values. In conclusion, our results suggest that the character of the sequence is determined at the beginning of translation, when the process is slow. The initial choice is then maintained along the whole gene.

[Panels (A), (B) and (C); left column: Begin Sequence, right column: All Sequence.]

Figure 11: Codon distribution along genes in long sequences. The figure reports the distribution of codons in the sequences with respect to the value of the multiplicative table: Adenylyl cyclase (367 codons), Coronin (454 codons) and Myosin (1520 codons). The left side of the picture reports the percentage distribution of codons in the initial region of the sequence; the right side reports the percentage distribution of codons along the whole sequence. For Coronin_454 (Figure A) the values 1 and 3 are predominant. For Myosin_1520 (Figure B) and Adenylyl_367 (Figure C) there is a balanced distribution of the four values.
6 Discussion

The results of this model suggest a relation between synonymous codon choice and the behavior of the sequence, which gives a justification for the need of 64 triplets in the genetic code to encode 20 amino acids. Regarding the 3D structure, each synonymous codon is associated with the three-dimensional rotation matrices of its nucleotides. Synonymous codons therefore encode the same amino acid but have different table parameters and different rotation matrices, and different rotations contribute differently to the final 3D structure. We have also seen that the choice of the codon that encodes a specific amino acid in a protein influences the positions of the neuter points along the sequence. If neuter points represent binding sites, the synonymous codon choice influences the interactions of the protein and the function of the protein itself. In this view, the existence of 64 redundant triplets is justified by the need to adapt the amino acid encoding process to the function of the protein, which is possible by choosing the synonymous codon with the right parameters for each specific context. Moreover, if neuter points represent possible binding sites, it would be possible to modify these regions (i.e. enable or disable a binding site) by adjusting the codon parameters.

In this model the concept of evolution is also linked to the possibility that a mutation modifies the parameters and the symmetry of the sequence. Processes that constantly introduce variation play a fundamental role in evolution. The main cause of variation is mutation, which changes the sequence of a gene. The total symmetry of a sequence is invariant under local transformations related to the multiplicative table: only changes that do not modify the parameters of the sequence are allowed, and after the changes the whole sequence must still belong to the same equivalence class.
Appendix A

The basic step is to define the correct distribution of nucleotides (purines and pyrimidines) into the equivalence classes of the group, and then to verify that this distribution is the only one that generates the double helix structure. The 4th order group has 4 classes with 4 elements each. We need 9 non-neuter elements, three for each class. We have only four elements, and the permutations of the 5 missing elements are 5! = 120, corresponding to 120 different sets of equivalence classes. All these permutations have been analyzed, and only the two distributions of nucleotides in Table A1 and Table A2 generate the correct double-helix structure. Table A3 shows the results generated with Table A1 and Table A2 respectively.

Table A1
A     A     G
C     U/T   G
U/T   A     C

Table A2
C     C     U/T
A     G     U/T
G     C     A

Table A3: pairs of nucleotides generated from Table A1 and Table A2 respectively
Matrix elements   Bases Tab. 1   Bases Tab. 2
(1,1)(3,1)        AT             CG
(1,2)(3,2)        AA             CC
(1,3)(3,3)        GC             TA
(2,1)(2,1)        CC             AA
(2,2)(2,2)        TT             GG
(2,3)(2,3)        GG             TT

We are looking for a unique distribution of nucleotides, and we found two distributions able to generate the correct 64 triplets of the genetic code. However, a careful check of Tables A1 and A2 shows that the two distributions are complementary with respect to the DNA double helix: in the same positions we find the corresponding nucleotides of the DNA pairs. Where Table A1 has nucleotide A, Table A2 has nucleotide C, and the same holds for the pair G-T. Both distributions generate the 64 triplets of the genetic code, but only the distribution in Table A1 is the correct one. In fact, verifications of the reproduction of the 3D structure of proteins have shown that the Table A2 distribution does not build a correct shape. Figures A1-1, A1-2 and A1-3 show as examples the reproductions of the 3D shapes of Nisin, Collagen and Histone with both distributions.
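As a quick check of the relation between the two distributions, the following Python sketch reads the pairs of Table A3 directly from Tables A1 and A2 and tests the position-by-position correspondence A to C and G to T described above. The variable and function names are ours, and U/T is written simply as T; the tables themselves are transcribed from this appendix.

```python
# Tables A1 and A2 as 3x3 grids, indexed by (row, column) starting at 1.
TABLE_A1 = [["A", "A", "G"],
            ["C", "T", "G"],
            ["T", "A", "C"]]
TABLE_A2 = [["C", "C", "T"],
            ["A", "G", "T"],
            ["G", "C", "A"]]

# Pairs of matrix elements listed in the first column of Table A3.
ELEMENT_PAIRS = [((1, 1), (3, 1)), ((1, 2), (3, 2)), ((1, 3), (3, 3)),
                 ((2, 1), (2, 1)), ((2, 2), (2, 2)), ((2, 3), (2, 3))]


def base_at(table, element):
    """Nucleotide stored at matrix element (row, column)."""
    row, col = element
    return table[row - 1][col - 1]


def pair(table, first, second):
    """Two-letter pair read from a table at two matrix elements."""
    return base_at(table, first) + base_at(table, second)


table_a3 = [(pair(TABLE_A1, a, b), pair(TABLE_A2, a, b)) for a, b in ELEMENT_PAIRS]

# Correspondence stated in the text: A <-> C and G <-> T, position by position.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}
is_complementary = all(SWAP[TABLE_A1[r][c]] == TABLE_A2[r][c]
                       for r in range(3) for c in range(3))

for (a, b), (tab1, tab2) in zip(ELEMENT_PAIRS, table_a3):
    print(a, b, tab1, tab2)
print("A1 and A2 related by A<->C, G<->T:", is_complementary)
```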
It is clear from the comparison with the pictures taken from the NCBI site that the distribution in Table A2 is very far from building a good 3D shape.

Appendix B

The rotations by generic angles α, β and γ are represented by the following matrices:

\[
R^{(x)}(\gamma) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix}
\]
Rotation of γ degrees around the x-axis

\[
R^{(y)}(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}
\]
Rotation of β degrees around the y-axis

\[
R^{(z)}(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Rotation of α degrees around the z-axis

For γ = 90° we have sin 90° = 1 and cos 90° = 0; for γ = 180° we have sin 180° = 0 and cos 180° = -1; for γ = 270° we have sin 270° = -1 and cos 270° = 0. The same values hold for α and β. Therefore the rotation matrices around the x, y and z axes are:

\[
R^{(x)}(90^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
R^{(x)}(180^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad
R^{(x)}(270^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\]
X axis rotation matrices for 90°, 180° and 270°

\[
R^{(y)}(90^\circ) = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad
R^{(y)}(180^\circ) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad
R^{(y)}(270^\circ) = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\]
Y axis rotation matrices for 90°, 180° and 270°

\[
R^{(z)}(90^\circ) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R^{(z)}(180^\circ) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R^{(z)}(270^\circ) = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Z axis rotation matrices for 90°, 180° and 270°

Appendix C

In this section we show how to build the 3D structure of the protein Nisin 53 http://www.ncbi.nlm.nih.gov/protein/ABV64388.1. The sequence of amino acids is the following:

DFNLDLLSVSKKDSGASPRITSISLCTPGCKTGALMGCNMKTATCNCSIHVSK

gattttaacttggatttgctatctgtttcgaagaaagattcaggtgcatcaccacgcattacaagtatttcgctatgtacacccggttgtaaaacaggagctctgatgggttgtaacatgaaaacagcaacttgtaattgtagtattcacgtaagcaaa

T and U nucleotides are represented by the same matrix element and have the same associated rotation.

Codon   Matrix elements       Rotations associated
GAT     (1,3)(3,2)(2,2)       90° z-axis, 270° y-axis, 180° y-axis
TTT     (3,1)(3,1)(3,1)       270° x-axis, 270° x-axis, 270° x-axis
AAC     (1,1)(1,1)(3,3)       90° x-axis, 90° x-axis, 270° z-axis
TTG     (3,1)(3,1)(1,3)       270° x-axis, 270° x-axis, 90° z-axis
GAT     (1,3)(3,2)(2,2)       90° z-axis, 270° y-axis, 180° y-axis
TTG     (3,1)(3,1)(1,3)       270° x-axis, 270° x-axis, 90° z-axis
CTA     (3,3)(2,2)(1,2)       270° z-axis, 180° y-axis, 90° y-axis
TCT     (3,1)(3,3)(3,1)       270° x-axis, 270° z-axis, 270° x-axis
GTT     (1,3)(3,1)(3,1)       90° z-axis, 270° x-axis, 270° x-axis
TCG     (3,1)(2,1)(2,3)       270° x-axis, 180° x-axis, 180° z-axis
AAG     (1,1)(1,1)(1,3)       90° x-axis, 90° x-axis, 90° z-axis
AAA     (1,1)(1,1)(1,1)       90° x-axis, 90° x-axis, 90° x-axis
GAT     (1,3)(3,2)(2,2)       90° z-axis, 270° y-axis, 180° y-axis
TCA     (2,2)(3,3)(1,2)       180° y-axis, 270° z-axis, 90° y-axis
GGT     (1,3)(1,3)(3,1)       90° z-axis, 90° z-axis, 270° x-axis
GCA     (2,3)(2,1)(1,1)       180° z-axis, 180° x-axis, 90° x-axis
TCA     (2,2)(3,3)(1,2)       180° y-axis, 270° z-axis, 90° y-axis
CCA     (3,3)(3,3)(1,1)       270° z-axis, 270° z-axis, 90° x-axis
CGC     (2,1)(2,3)(3,3)       180° x-axis, 180° z-axis, 270° z-axis
ATT     (3,2)(2,2)(3,1)       270° y-axis, 180° y-axis, 270° x-axis
ACA     (1,1)(3,3)(1,1)       90° x-axis, 270° z-axis, 90° x-axis
AGT     (3,2)(1,3)(2,2)       270° y-axis, 90° z-axis, 180° y-axis
ATT     (3,2)(2,2)(3,1)       270° y-axis, 180° y-axis, 270° x-axis
TCG     (3,1)(2,1)(2,3)       270° x-axis, 180° x-axis, 180° z-axis
CTA     (3,3)(2,2)(1,2)       270° z-axis, 180° y-axis, 90° y-axis
TGT     (3,1)(1,3)(3,1)       270° x-axis, 90° z-axis, 270° x-axis
ACA     (1,1)(3,3)(1,1)       90° x-axis, 270° z-axis, 90° x-axis
CCC     (3,3)(3,3)(3,3)       270° z-axis, 270° z-axis, 270° z-axis
GGT     (1,3)(1,3)(3,1)       90° z-axis, 90° z-axis, 270° x-axis
TGT     (3,1)(1,3)(3,1)       270° x-axis, 90° z-axis, 270° x-axis
AAA     (1,1)(1,1)(1,1)       90° x-axis, 90° x-axis, 90° x-axis
ACA     (1,1)(3,3)(1,1)       90° x-axis, 270° z-axis, 90° x-axis
GGA     (1,3)(1,3)(1,1)       90° z-axis, 90° z-axis, 90° x-axis
GCT     (2,3)(2,1)(3,1)       180° z-axis, 180° x-axis, 270° x-axis
CTG     (2,1)(3,1)(2,3)       180° x-axis, 270° x-axis, 180° z-axis
ATG     (3,2)(2,2)(1,3)       270° y-axis, 180° y-axis, 90° z-axis
GGT     (1,3)(1,3)(3,1)       90° z-axis, 90° z-axis, 270° x-axis
TGT     (3,1)(1,3)(3,1)       270° x-axis, 90° z-axis, 270° x-axis
AAC     (1,1)(1,1)(3,3)       90° x-axis, 90° x-axis, 270° z-axis
ATG     (3,2)(2,2)(1,3)       270° y-axis, 180° y-axis, 90° z-axis
AAA     (1,1)(1,1)(1,1)       90° x-axis, 90° x-axis, 90° x-axis
ACA     (1,1)(3,3)(1,1)       90° x-axis, 270° z-axis, 90° x-axis
GCA     (2,3)(2,1)(1,1)       180° z-axis, 180° x-axis, 90° x-axis
ACT     (1,2)(3,3)(2,2)       90° y-axis, 270° z-axis, 180° y-axis
TGT     (3,1)(1,3)(3,1)       270° x-axis, 90° z-axis, 270° x-axis
CAT     (3,3)(1,2)(2,2)       270° z-axis, 90° y-axis, 180° y-axis
TGT     (3,1)(1,3)(3,1)       270° x-axis, 90° z-axis, 270° x-axis
AGT     (3,2)(1,3)(2,2)       270° y-axis, 90° z-axis, 180° y-axis
ATT     (3,2)(2,2)(3,1)       270° y-axis, 180° y-axis, 270° x-axis
CAC     (3,3)(1,1)(3,3)       270° z-axis, 90° x-axis, 270° z-axis
GTA     (1,3)(2,2)(3,2)       90° z-axis, 180° y-axis, 270° y-axis
AGC     (1,1)(2,3)(2,1)       90° x-axis, 180° z-axis, 180° x-axis
AAA     (1,1)(1,1)(1,1)       90° x-axis, 90° x-axis, 90° x-axis

To build a 3D shape in an orthogonal space we need a set of points with coordinates (x, y, z). We describe in detail the calculation of the first four points; the remaining coordinates are obtained by repeating the same steps. We set the origin of an orthogonal reference frame at the beginning of the sequence, where the first codon starts. The coordinates of the first point are therefore the coordinates of the vector representing the first codon. After the application of its three rotations we get the coordinates of the second point. For the first codon we have:

GAT     (1,3)(3,2)(2,2)       90° z-axis, 270° y-axis, 180° y-axis

The corresponding codon matrix and rotation matrices are:

\[
\mathrm{GAT} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad
R^{(z)}(90^\circ) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R^{(y)}(270^\circ) = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
R^{(y)}(180^\circ) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]
GAT codon matrix and 90° z-axis, 270° y-axis, 180° y-axis rotation matrices

Consequently, the coordinates of the first point are (3, 2, 2). Now we apply the rotations to get the coordinates of the second point: we multiply the three rotation matrices with the codon matrix following the order of the nucleotides. The coordinates of the resulting T3 vector are (2, 3, 2).
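The first step can be reproduced with a short Python sketch. Two points are our reading of the appendix rather than statements made in it, and should be treated as assumptions: (1) a matrix element (i, j) appears to correspond to a rotation of i × 90° around the x, y or z axis for j = 1, 2, 3, as in every row of the codon table; (2) the coordinates read from a matrix are taken row by row as column index times entry, which matches (3, 2, 2) and (2, 3, 2) for GAT. All function names are hypothetical.

```python
# Exact cosine and sine of 90, 180 and 270 degrees (Appendix B),
# keyed by the row index i of a matrix element (i, j).
COS = {1: 0, 2: -1, 3: 0}
SIN = {1: 1, 2: 0, 3: -1}


def rotation(element):
    """Rotation matrix for element (i, j): i * 90 degrees around axis j
    (1 = x, 2 = y, 3 = z). Assumed reading of the codon table."""
    i, j = element
    c, s = COS[i], SIN[i]
    if j == 1:
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if j == 2:
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def codon_matrix(elements):
    """3x3 matrix with one unit added at each nucleotide's element."""
    m = [[0] * 3 for _ in range(3)]
    for i, j in elements:
        m[i - 1][j - 1] += 1
    return m


def matmul(a, b):
    """Product of two 3x3 integer matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def coordinates(m):
    """Inferred read-off rule: each row gives sum(column index * entry)."""
    return tuple(sum((c + 1) * m[r][c] for c in range(3)) for r in range(3))


GAT = [(1, 3), (3, 2), (2, 2)]
first_point = coordinates(codon_matrix(GAT))

t = codon_matrix(GAT)
for element in GAT:          # rotations applied in nucleotide order
    t = matmul(rotation(element), t)
t3 = coordinates(t)

print(first_point, t3)
```

Under these assumptions the sketch returns the first point and the T3 vector given in the text for GAT.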
To get the coordinates of the second point we convert the values (2, 3, 2) into the main reference frame by adding them to the first point (3, 2, 2), and we get (5, 5, 4). The three multiplications for the first codon are:

\[
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\qquad G * GAT = T1
\]

\[
\begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\qquad A * T1 = T2
\]

\[
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\qquad T * T2 = T3
\]

Now we repeat the calculation for the second codon. To do this we translate the origin of the reference frame to the tip of the T3 vector. Thus the origin of the reference frame for the second codon has coordinates (5, 5, 4) with respect to the main reference frame of the sequence. Also in this case the coordinates are then converted into the reference frame of the sequence. For the second codon we have:

TTT     (3,1)(3,1)(3,1)       270° x-axis, 270° x-axis, 270° x-axis

The corresponding codon matrix and rotation matrix (the same for all three nucleotides) are:

\[
\mathrm{TTT} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix}, \quad
R^{(x)}(270^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\]
TTT codon matrix and 270° x-axis, 270° x-axis, 270° x-axis rotation matrices

Now we apply the rotations to get the coordinates of the third point: we multiply the three rotation matrices with the codon matrix following the order of the nucleotides.

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\qquad T * TTT = T1
\]

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -3 & 0 & 0 \end{bmatrix}
\qquad T * T1 = T2
\]

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -3 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ -3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\qquad T * T2 = T3
\]

The coordinates of the resulting T3 vector are (0, -3, 0). To get the coordinates of the third point we convert the values (0, -3, 0) into the reference frame of the sequence. Recall that the reference frame for the second codon has coordinates (5, 5, 4); thus the coordinates of the third point are (5, 2, 4).

Now we repeat the calculation for the third codon. To do this we translate the origin of the reference frame to the tip of the T3 vector. Thus the origin of the reference frame for the third codon has coordinates (5, 2, 4) with respect to the main reference frame of the sequence. Also in this case the coordinates are then converted into the reference frame of the sequence. For the third codon we have:

AAC     (1,1)(1,1)(3,3)       90° x-axis, 90° x-axis, 270° z-axis

The corresponding codon matrix and rotation matrices are:

\[
\mathrm{AAC} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R^{(x)}(90^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
R^{(x)}(90^\circ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
R^{(z)}(270^\circ) = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
AAC codon matrix and 90° x-axis, 90° x-axis, 270° z-axis rotation matrices

Now we apply the rotations to get the coordinates of the fourth point: we multiply the three rotation matrices with the codon matrix following the order of the nucleotides.

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}
\qquad A * AAC = T1
\]

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\qquad A * T1 = T2
\]

\[
\begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ -2 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\qquad C * T2 = T3
\]

The coordinates of the resulting T3 vector are (0, -2, -3). To get the coordinates of the fourth point we convert the values (0, -2, -3) into the reference frame of the sequence. Recall that the reference frame for the third codon has coordinates (5, 2, 4); thus the coordinates of the fourth point are (5, 0, 1).

It is now enough to repeat the steps with all the remaining codons of the sequence to get the complete set of coordinates. To see the result, the set of coordinates has to be saved in a .dat file, say nisin.dat. It can then be used as input for graphical tools such as gnuplot, free software that displays mathematical functions and numerical data http://www.gnuplot.info/download.html, by running the commands:

> set hidden3d
> splot "C://….path…//nisin.dat" with lines

and the result will be the following:

[Figure: 3D plot of the Nisin 53 coordinates produced by gnuplot.]
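The repeated steps can be collected into a small Python sketch that accumulates the points and formats them for a .dat file. As before, this is an illustration under assumptions rather than the author's program: the mapping from a matrix element (i, j) to a rotation of i × 90° around axis j and the row-wise coordinate read-off are inferred from the worked examples, and all names (`build_path`, `NISIN_START`, `save_dat`) are hypothetical. Only the first seven codons of Nisin are included, with matrix elements copied from the codon table.

```python
COS = {1: 0, 2: -1, 3: 0}   # cos of 90, 180, 270 degrees
SIN = {1: 1, 2: 0, 3: -1}   # sin of 90, 180, 270 degrees


def rotation(element):
    """Assumed mapping: element (i, j) -> i * 90 degrees around axis j."""
    i, j = element
    c, s = COS[i], SIN[i]
    if j == 1:
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if j == 2:
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def codon_matrix(elements):
    m = [[0] * 3 for _ in range(3)]
    for i, j in elements:
        m[i - 1][j - 1] += 1
    return m


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def coordinates(m):
    """Inferred rule: each row gives sum(column index * entry)."""
    return tuple(sum((c + 1) * m[r][c] for c in range(3)) for r in range(3))


def displacement(elements):
    """Coordinates of T3: the codon matrix after its three rotations."""
    t = codon_matrix(elements)
    for element in elements:
        t = matmul(rotation(element), t)
    return coordinates(t)


def build_path(codons):
    """First point = first codon matrix; each codon then adds its T3."""
    point = coordinates(codon_matrix(codons[0]))
    path = [point]
    for elements in codons:
        step = displacement(elements)
        point = tuple(p + s for p, s in zip(point, step))
        path.append(point)
    return path


def save_dat(points, filename):
    """Write one 'x y z' line per point, the layout gnuplot's splot reads."""
    with open(filename, "w") as handle:
        for p in points:
            handle.write(" ".join(str(v) for v in p) + "\n")


NISIN_START = [
    [(1, 3), (3, 2), (2, 2)],  # GAT
    [(3, 1), (3, 1), (3, 1)],  # TTT
    [(1, 1), (1, 1), (3, 3)],  # AAC
    [(3, 1), (3, 1), (1, 3)],  # TTG
    [(1, 3), (3, 2), (2, 2)],  # GAT
    [(3, 1), (3, 1), (1, 3)],  # TTG
    [(3, 3), (2, 2), (1, 2)],  # CTA
]

path = build_path(NISIN_START)
dat_text = "\n".join(" ".join(str(v) for v in p) for p in path)
print(dat_text)
```

Calling `save_dat(path, "nisin.dat")` with the full list of 53 codons would produce the input file for the gnuplot commands above.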
The final set of coordinates in our example is:

3 2 2
5 5 4
5 2 4
5 0 1
5 3 -1
7 6 1
7 9 -1
4 7 1
9 7 1
9 4 -1
9 5 3
9 10 3
12 10 3
14 13 5
11 15 3
5 16 3
4 16 7
1 18 5
0 12 5
4 12 2
7 12 0
4 12 -2
6 10 -5
9 10 -7
9 11 -3
6 9 -1
4 9 -4
1 9 -6
1 9 3
-5 10 3
-7 10 0
-4 10 0
-7 10 -2
-14 10 -2
-14 9 -6
-14 10 -2
-16 12 -5
-22 13 -5
-24 13 -8
-24 11 -11
-26 13 -14
-23 13 -14
-26 13 -16
-27 13 -12
-29 10 -10
-31 10 -13
-33 10 -10
-35 10 -13
-33 8 -16
-30 8 -18
-36 8 -19
-34 11 -17

Author Biography

Paola Pozzo, Research and Development Statistics, Intrasoft – Eurostat, Luxembourg.

References

Basilevsky, A. (1983). Applied matrix algebra in the statistical sciences. Dover Publications.
Rotman, J. (1994). An introduction to the theory of groups. New York, Springer-Verlag.
Crick, F. (1988). Chapter 8: The genetic code. What mad pursuit: a personal view of scientific discovery. New York, Basic Books (pp. 89–101).
Ridley, M. (2006). Genome. New York, Harper Perennial.
Elzanowski, A. & Ostell, J. (2008). The Genetic Codes. National Center for Biotechnology Information (NCBI). http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
Harary, F. (1994). Graph Theory. Reading, MA, Addison-Wesley.
Lomont, J.S. (1987). Cyclic Groups, in Applications of Finite Groups. New York, Dover (p. 78).
Scott, W.R. (1987). Cyclic Groups, in Group Theory. New York, Dover (pp. 34–35).
Frampton, P. (2008). Gauge Field Theories, 3rd ed. Wiley-VCH.
Cannarrozzi, G., et al. (2010). Cell, 141 (pp. 355–367).
Tuller, T., et al. (2010). Cell, 141 (pp. 344–354).
Fredrick, K. & Ibba, M. (2010). Cell, 141 (pp. 227–229).
de Smit, M.H. & van Duin, J. (2003). Journal of Molecular Biology, 331 (pp. 737–743).
Studer, S.M. & Joseph, S. (2006). Molecular Cell, 22 (pp. 105–115).