Article
BIOINFORMATICS
HTTP://DX.DOI.ORG/10.5504/BBEQ.2013.0052

TERMINATION CODONS AND STOP CODON CONTEXT IN BACTERIA AND MAMMALIAN MITOCHONDRIA

Kiril T. Kirilov1, Ashkan Golshani2, Ivan G. Ivanov1
1 Institute of Molecular Biology, Bulgarian Academy of Sciences, Sofia, Bulgaria
2 Department of Biology, Carleton University, Ottawa, Canada
Correspondence to: Kiril Kirilov

Abstract
The aim of this study was to analyze the frequency of occurrence of individual stop codons and of combinations of stop codons with adjacent upstream and downstream triplets in 264 bacterial and 1308 mammalian mitochondrial genomes. For this analysis a novel program (Gene Triplet Analysis) was applied. The results indicate that the standard stop codon UAA is the most frequently used one (48 %) in both bacteria and mitochondria. In addition, our analysis revealed 30 non-standard translation termination codons in mitochondria. The preferential nucleotides in all three positions adjacent to the stop codons in mammalian mitochondria are A and U.

Biotechnol. & Biotechnol. Eq. 2013, 27(4), 4018-4025

Keywords: termination codons, stop-codon context, Gene Triplet Analysis (GTA), mitochondrial release factor, RF, ICT1, C12orf65

Introduction
The postgenomic era made it clear that the genetic code is not simply an assignment of 61 trinucleotides to 20 α-amino acids but is also a modulating factor in gene expression. The degeneracy of the genetic code, in combination with species-specific (and possibly tissue-specific) codon usage, is a mechanism for fine regulation/modulation of protein biosynthesis (25). Although the codon usage phenomenon was described long ago, its biological significance remained vague until the last decade. Since the year 2000, thousands of prokaryotic, eukaryotic, viral and organelle genomes have been sequenced, opening a new avenue for extensive bioinformatics analyses, including studies on the codon usage phenomenon.
The results obtained demonstrate that synonymous codons are not randomly employed and that their preference varies widely between taxonomic groups (sometimes between species) (20). Besides single codon bias, the combinations of codons (codon pairs) are also biased (3). The latter is explained by the difference in compatibility between the isoacceptor tRNAs occupying the two functional (A and P) sites on the translating ribosome (13). Taking into consideration that the steric parameters of these two sites are genetically pre-determined by the structure of the small ribosomal subunit, it is logical to assume that the combinations between two isoacceptor tRNAs (having different spatial structures), or between a tRNA in the P site and a release factor (RF1 or RF2) in the A site, will not be equivalent and, therefore, could play the role of a modulating factor in the translation of genetic information (19). Bearing in mind that these combinations are genetically encoded by the collinear arrangement of synonymous codons in mRNA, their effect on translation can be indirectly estimated via the frequency of occurrence of codon pairs in protein coding genes. In a previous study (4) we investigated the frequency of occurrence of codon pairs in all Escherichia coli open reading frames (ORFs) and showed that both the sense:sense and the sense:stop codon combinations were nonrandomly distributed. Our analysis, based on 4289 ORFs, revealed that the frequency of occurrence of codon pairs in the E. coli genome varied from zero to 4913. Based on their preference, the codon pairs were classified as overrepresented, moderately represented, underrepresented and missing. Thus, 19 missing pairs were identified, of which 14 appeared to be combinations between sense and stop codons. With the exception of one pair, ACU:UGA, the rest contained UAG as a stop codon.
Furthermore, we studied the effect of various sense:stop codon pairs on translation efficiency in vivo and demonstrated that the missing pairs CCU:UAG and CCC:UAG, but not CCU:UAA and CCC:UAA (all coding for Pro:stop), had a strong suppressing effect on the translation of the chloramphenicol acetyltransferase (cat) gene (5). The yield of recombinant CAT protein in this experiment correlated directly with the type of stop codon used in the pair, in decreasing order: UAA>UGA>UAG. In our earlier studies, however, we did not investigate either the usage of nucleotides located downstream of the stop codon (in the non-translated region), or their effect on translation. Here, we took advantage of the availability of hundreds of bacterial and thousands of mitochondrial genome sequences to analyze the frequency of occurrence (usage) of combinations of stop codons with adjacent upstream and downstream triplets in 482 453 bacterial and 16 967 mitochondrial genes belonging to 264 bacterial and 1308 mammalian mitochondrial genomes. To study all three types of triplet combinations (pre-stop:stop, stop:post-stop and pre-stop:stop:post-stop triplets), we developed an original program, called Gene Triplet Analysis (GTA), capable of directly using biological data files (*.gbk).

Databases and Methodology
Genome sequence data files
In this study, 1308 mammalian mitochondrial genomes containing 16 967 protein coding genes and 264 bacterial genomes carrying 482 453 open reading frames (ORFs) were obtained and analyzed. Lists of these genomes, grouped by taxonomic group, are found in Appendix 1 and Appendix 2. All bacterial and mitochondrial (mtDNA) sequences were downloaded from the NCBI GenBank (http://www.ncbi.nlm.nih.gov/genbank/) as *.gbk files containing complete information about the organism, its genome and the sequences of its genes and proteins.
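To make the three triplet combinations named above concrete, the following minimal Python sketch extracts the pre-stop, stop and post-stop triplets for a single gene from a plain sequence string. It is an illustration only and not the GTA implementation: the function names, the 0-based end-exclusive coordinates and the strand flag (1 or -1) are assumptions made for this example.

```python
# Illustrative sketch only; the function names and the 0-based,
# end-exclusive coordinate convention are assumptions, not the GTA code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def stop_context(genome, start, end, strand=1):
    """Return (pre_stop, stop, post_stop) as RNA triplets for one ORF.

    genome[start:end] spans the ORF, stop codon included, in forward-strand
    coordinates; strand=-1 marks a gene encoded on the opposite strand.
    Returns None when the nine-nucleotide window runs off the sequence.
    """
    if strand == 1:
        lo, hi = end - 6, end + 3       # last sense codon, stop, next triplet
    else:
        lo, hi = start - 3, start + 6   # same window, seen from the other strand
    if lo < 0 or hi > len(genome):
        return None
    window = genome[lo:hi].upper()
    if strand == -1:
        window = reverse_complement(window)
    rna = window.replace("T", "U")
    return rna[0:3], rna[3:6], rna[6:9]


genome = "GGGATGAAATAATTTCC"           # toy sequence: ATG AAA TAA, then TTT
print(stop_context(genome, 3, 12))       # forward-strand gene
flipped = reverse_complement(genome)     # the same gene on the opposite strand
print(stop_context(flipped, 5, 14, -1))
```

Both calls return the same triplets because the second one reads the same toy gene from the complementary strand. Joining the first two elements gives the pre-stop:stop hexanucleotide, the last two give stop:post-stop, and all three give the nonanucleotide.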
Codon and codon/triplet pairs analysis
To study the stop codon context, a new program named Gene Triplet Analysis (GTA) was written in Java (NetBeans IDE 6.1) and is described in detail elsewhere (10). An advantage of this program is that it reads biological data files (*.gbk) directly, using external BioJava (www.biojava.org) libraries, and saves the results as *.csv (Comma Separated Values) files (see Fig. 1). It loads *.gbk files and consecutively extracts information concerning the species name, gene signature and localization in the full length sequence, and calculates the nucleotide position in relation to the first and the last nucleotide in the ORF. Since transcription may take place in opposite directions, two signs (plus or minus) are possible to refer to the nucleotide position depending on the direction of transcription. A positive sign indicates inverse transcription (transcription from the opposite strand). In the case of transcription from the opposite strand, the program automatically converts the sequence into the correct format. The algorithm also identifies the start codon and the first downstream codon (called the post-start codon). After that, it determines the stop codon together with the two adjacent triplets: the pre-stop and post-stop trinucleotides. Furthermore, the program defines the pre-stop:stop:post-stop nonanucleotides. All data obtained by the GTA program are assembled in a table as *.csv files for further use in mathematical and statistical calculations. The program is designed to work with a user-friendly graphical interface and is available at: http://bio21.bas.bg/kirilov/.

Results and Discussion
Although the genetic code is considered standard and universal, there are important differences between the standard (cellular) and mitochondrial genetic codes. Twenty-three deviations from the standard genetic code have been described in mitochondria, indicating that there are at least 23 different types of mitochondrial genetic codes (18); www.
ncbi.nlm.nih.gov/taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes (1). The main differences between the standard and mitochondrial genetic codes can be formulated as follows: a) Many triplets in mitochondria are assigned to amino acids other than those in the standard genetic code; b) Some sense codons play the role of termination codons in mitochondria; and c) The stop codon UGA encodes tryptophan in mitochondria. In this study we took advantage of the large number of bacterial and mitochondrial genomic sequences available in the DNA databases to compare the usage of stop codons and of combinations of stop codons with adjacent triplets in both bacteria and mammalian mitochondria.

Fig. 1. Gene Triplet Analysis (GTA) Program.

Stop-codon usage in bacteria and mammalian mitochondria
There are three stop codons in the standard genetic code: UAA, UGA and UAG. In this study, we determined their usage (frequency of occurrence) in 482 453 bacterial ORFs belonging to 264 bacterial genomes and also in 16 967 mitochondrial protein-coding genes from 1308 mammalian mitochondria. The data presented in Table 1 show that UAA is the most preferred stop codon (accounting for approximately 48 % of all stop codons) in both bacteria and mitochondria. There is some difference in the usage of the second stop codon UAG in bacteria (19.36 %) and mitochondria (14.22 %) and a noticeable difference in the usage of the third standard stop codon UGA. It is the second most preferred stop codon in bacteria (31.87 %) and is virtually absent in mitochondria (0.018 %). Our analysis also revealed some non-standard stop codons (AUA, UUA, AAA, AAG, AGU and AGC) in bacteria, with a frequency of occurrence below 0.0001 %, which may be accounted for by sequencing or data processing errors.

TABLE 1
Stop codon usage in bacteria and mammalian mitochondria
Codon   Bacteria, occurrence (%)   Mitochondria, occurrence (%)
UAA     48.76765                   47.99364
UGA     31.87005                   0.017677
UAG     19.35919                   14.22426

In mitochondria, however, besides the two standard stop codons (UAA and UAG), 40 additional translation termination triplets were revealed (Table 2). As seen in Table 2, the most frequently used non-standard stop codon is CCU (4.93 %), followed by AAU (4.07 %), AUA (2.99 %), CAU (2.94 %), AGA (2.91 %), AGG (2.8 %), CUU (2.34 %), ACU (2.33 %), CUA (1.93 %), UAU (1.90 %), UUU (1.62 %), UUA (1.51 %), AUU (1.51 %), and UCU (1.01 %). The rest of the non-standard stop codons have a frequency of occurrence lower than 1.0 %. Ten of them (AGC, ACA, ACG, UCA, GAG, GGG, GCA, GCC, CAC, CCG) have an extremely low frequency of usage (lower than 0.01 %), and their appearance could be explained by sequencing or data processing errors. This group of codons was omitted from further analysis, reducing the number of non-standard stop codons in mammalian mitochondria to 30 triplets.

TABLE 2
Non-standard stop codon usage in mammalian mitochondria
Codon  (%)     Codon  (%)     Codon  (%)
CCU    4.93    UUU    1.62    GUA    0.19
AAU    4.07    UUA    1.51    CGU    0.16
AUA    2.99    AUU    1.51    CUG    0.11
CAU    2.95    UCU    1.11    AUG    0.04
AGA    2.91    AGU    0.61    GAA    0.02
AGG    2.80    GAU    0.55    AAG    0.02
CUU    2.34    GCU    0.36    AAA    0.01
ACU    2.33    UGU    0.29    AAC    0.01
CUA    1.93    GUU    0.21    UUG    0.01
UAU    1.91    GGU    0.20    CCA    0.01

Frequency of usage of combinations of stop codons with adjacent triplets in bacteria and mammalian mitochondria
Taking into consideration that the stop codon context might be important for the efficiency of translation termination, we determined the frequency of occurrence of all 183 sense:stop codon combinations (pre-stop:stop pairs) in both bacterial and mammalian mitochondrial genomes.

Usage of pre-stop:stop codon pairs in bacteria and mammalian mitochondria.
The frequency of occurrence of pre-stop:stop codon pairs in 264 bacterial genomes is presented in Table 3 and Appendix 3. As indicated in Table 3, the most preferred 3’-terminal codon pairs in bacteria contain the stop codon UAA (48.76775 %), followed by UGA (31.86991 %) and UAG (19.35923 %), which correlates with the frequency of usage of the corresponding stop codons (see Table 1). The most frequently used codon pair in bacteria is AAA:UAA (6.044 %), followed by GAA:UAA (2.724 %) and AAU:UAA (2.054 %). Among the most frequently used 3’-terminal codon pairs with a frequency of occurrence of at least 1 % (18 in total), the stop codon UAA appears 12 times; UGA, 5 times; and UAG, once. Our data also demonstrate that the most preferred penultimate (last sense) codon is the Lys codon AAA (9.19 %). We revealed 15 combinations of sense codons with non-standard termination triplets (AAA, AAG, AGC, AGU, AUA, CAA, CGU, CUG, GAC, GGU, GUA, UUA, UUG) showing a frequency of occurrence lower than 0.0002 %. These may reflect sequencing or data processing errors and are omitted from further analysis.

TABLE 3
Frequency of occurrence of pre-stop:stop codon pairs in bacteria*
Codon    Freq. (%)  AA     Codon    Freq. (%)  AA     Codon    Freq. (%)  AA     Codon    Freq. (%)  AA
AAA:UAA  6.04       Lys    AUU:UAA  1.44       Ile    CGC:UGA  0.99       Arg    AGA:UAA  0.85       Arg
GAA:UAA  2.72       Glu    CAA:UAA  1.38       Gln    GAC:UGA  0.97       Asp    GAG:UGA  0.85       Glu
AAU:UAA  2.05       Asn    UUA:UAA  1.37       Leu    UAU:UAA  0.94       Tyr    AUA:UAA  0.83       Ile
AAG:UAA  1.80       Lys    GAG:UAA  1.22       Glu    GAA:UAG  0.88       Glu    UUC:UAA  0.82       Phe
AAA:UAG  1.66       Lys    GGC:UGA  1.09       Gly    GCA:UAA  0.87       Ala    GGA:UAA  0.78       Gly
GAU:UAA  1.64       Asp    AAC:UAA  1.09       Asn    GUU:UAA  0.87       Val    AAG:UAG  0.77       Lys
UUU:UAA  1.61       Phe    GCU:UAA  1.08       Ala    GAC:UAA  0.87       Asp    CAU:UAA  0.76       His
GCC:UGA  1.59       Ala    GCA:UGA  1.07       Ala    CUU:UAA  0.87       Leu    CAA:UGA  0.72       Gln
AAA:UGA  1.49       Lys    GAA:UGA  1.06       Glu    GAU:UGA  0.85       Asp    AAG:UGA  0.70       Lys
* The table contains data for frequently used (above 0.7 %) codon pairs only. For more information see Appendix 3. AA: amino acid.

The frequencies of occurrence of pre-stop:stop codon pairs in mammalian mitochondria are presented in Table 4A and Table 4B, and in Appendix 4. Considering that at least 32 different triplets are used for termination in mitochondria (see above), it is logical to expect a higher number of pre-stop:stop codon combinations than in bacteria. Our analysis revealed 968 such pairs (expected number: 62×32 = 1984) with a frequency of occurrence between 4.167 % and less than 0.001 % (see Appendix 4). As expected, the most preferred 3’-terminal codon pairs contained the standard stop codons UAA and UAG. They amount to approximately 62 % of all termination codon pairs in mitochondria. The most preferred codon pairs containing standard stop codons (Table 4A) are: UGC:UAA (4.17 %), AAU:UAA (2.41 %), AUU:UAA (2.19 %), ACC:UAA (2.05 %), UGU:UAA (1.96 %), AAA:UAA (1.82 %), etc. It is worth mentioning that AAA:UAA, the most frequently used pre-stop:stop codon pair in bacteria (with an occurrence of 6 %), takes only the sixth position in mitochondria (1.82 %). Among the most frequently used pre-stop:stop codon pairs in mitochondria, with an occurrence of more than 1 % (24 in total), UAA appears 21 times, UAG twice and CCU once. The most frequently used pre-stop:stop codon pairs containing non-standard stop codons are ACG:CCU, GAA:AGG, CCG:AAU, CUC:AUA, UGA:AGA, etc. (Table 4B). Comparing with the data in Table 2, one can see that their frequency of usage does not correlate with that of the corresponding non-standard stop codons (CCU, AGG, AAU, AUA and AGA).

TABLE 4
Frequency of occurrence of pre-stop:stop codon pairs in mammalian mitochondria (see also Appendix 4).
4A. Standard pre-stop:stop codon pairs in mammalian mitochondria*
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
UGC:UAA  4.1669       AUC:UAA  1.4322       UAC:UAA  1.0727
AAU:UAA  2.4106       GAA:UAA  1.3791       CUU:UAA  1.0550
AUU:UAA  2.1866       GUU:UAG  1.3497       GAG:UAA  1.0373
ACC:UAA  2.0451       CUA:UAA  1.3379       UAU:UAA  0.9843
UGU:UAA  1.9626       CAC:UAA  1.3261       UCC:UAA  0.8369
AAA:UAA  1.8153       UCA:UAA  1.2848       AUA:UAA  0.7721
GUU:UAA  1.7151       UUA:UAA  1.2613       GUC:UAA  0.7721
AAC:UAA  1.6915       ACA:UAA  1.1847       CAU:UAA  0.7603
UGA:UAA  1.5795       CAA:UAA  1.1729       AUU:UAG  0.7544
UUU:UAA  1.5265       GAA:UAG  1.1434       UCU:UAA  0.7426
* The table contains data for frequently used (above 0.4 %) codon pairs only. For more information see Appendix 4.

4B. Non-standard pre-stop:stop codon pairs in mammalian mitochondria*
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
ACG:CCU  1.5972       GAU:CUU  0.4597       CGU:AUA  0.2652
GAA:AGG  0.9843       GAU:CAU  0.4185       GUA:AGA  0.2593
CCG:AAU  0.9784       GAG:CUU  0.3949       AUC:UUA  0.2416
CUC:AUA  0.9784       GUU:AGG  0.3831       AGA:AUA  0.2299
UGA:AGA  0.9135       GAA:ACU  0.3359       GUU:CAU  0.2122
CUG:AAU  0.8782       AUC:AUA  0.3006       CCG:CCU  0.2122
GCU:CAU  0.7426       AUG:CCU  0.2888       AAU:GAU  0.2004
CAG:AAU  0.7073       GAG:CCU  0.2888       AAC:AUA  0.1945
CGU:CUA  0.6365       CGU:UUA  0.2888       UAA:AAU  0.1886
GAU:CCU  0.6306       UUU:ACU  0.2829       ACG:CUU  0.1886
* The table contains data for frequently used (above 0.4 %) codon pairs only. For more information see Appendix 4.

Stop:post-stop codon usage in bacteria and mammalian mitochondria.
As mentioned above (see Databases and Methodology), the new GTA program also made it possible to study the frequency of occurrence of combinations of stop codons with triplets located downstream in the 3’ noncoding region. The results of this analysis for bacteria and mammalian mitochondria are presented in Table 5 and Table 6, and in Appendix 5 and Appendix 6.

The most used stop:post-stop codon pair in bacteria (Table 5) is UAA:AAA (3.53 %), followed by UAA:UUU (2.42 %), UAA:AAU (2.11 %), UAA:UUA (1.78 %), and UAA:AUU (1.63 %). This distribution indicates that the preferred 3’-terminal pairs in bacteria are highly enriched in A and U.

TABLE 5
Frequency of occurrence of stop:post-stop codon pairs in bacteria
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
UAA:AAA  3.5264       UGA:AAA  1.1357       UGA:CGC  0.7941
UAA:UUU  2.4216       UAA:AAG  1.0876       UGA:GCC  0.7910
UAA:AAU  2.1130       UAA:UCA  1.0434       UAA:UCU  0.7806
UAA:UAA  1.8400       UAA:GGA  0.9976       UAG:UUU  0.7787
UAA:UUA  1.7753       UGA:UUU  0.9856       UAA:ACA  0.7752
UAA:AUU  1.6300       UAA:GAA  0.9454       UAA:UUC  0.7557
UAA:UAU  1.5162       UAA:AUG  0.8737       UAA:AGG  0.7487
UAA:AUA  1.5096       UAA:AAC  0.8679       UGA:UGA  0.7476
UAA:UGA  1.2393       UAA:AGA  0.8428       UAA:UGG  0.7396
UGA:GCG  1.1458       UAG:AAA  0.8364       UAA:UUG  0.7385

The frequency of stop:post-stop codon pairs in mammalian mitochondria containing standard stop codons (Table 6A) is 62.21 %. The most frequently used pairs are: UAA:AAA (4.78 %), UAA:UGA (2.60 %), UAA:GCU (2.17 %), UAA:UGG (2.13 %), UAA:AAU (1.83 %), and UAA:GAA (1.72 %). Unlike the stop:post-stop codon pairs in bacteria, 37.79 % of all pairs of this type in mitochondria contain non-standard stop codons (see Table 6B). As shown in Table 6B, the two most frequently used stop:post-stop codon pairs with non-standard stop codons (CCU:CAC and AAU:AGG) contain the two most used non-standard stop codons, CCU and AAU (see Table 2). However, the frequency of occurrence of the other highly used codon pairs (CUA:AUG, AUA:AUC, ACU:GUA) does not correlate with the frequency of usage of the corresponding non-standard stop codons.

TABLE 6
Frequency of occurrence of the stop:post-stop codon pairs in mammalian mitochondria.
6A. Standard stop:post-stop codon pairs in mammalian mitochondria (see also Appendix 6)
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
UAA:AAA  4.7793       UAA:AUU  1.3377       UAA:UAU  0.7720
UAA:UGA  2.6047       UAA:UUA  1.3318       UAA:UCA  0.7720
UAA:GCU  2.1746       UAA:ACU  1.2376       UAA:AGA  0.7190
UAA:UGG  2.1333       UAA:UAA  1.1256       UAA:UGU  0.6600
UAA:AAU  1.8269       UAG:AAA  1.1138       UAA:GUA  0.6188
UAA:GAA  1.7208       UAA:UCU  1.0902       UAA:CCC  0.6188
UAA:UUU  1.6442       UAA:CAA  0.9959       UAA:CCU  0.6129
UAA:AUA  1.4968       UAA:UAG  0.9429       UAG:UUU  0.6129
UAA:AAG  1.4792       UAA:ACA  0.9370       UAA:CCA  0.6070
UAA:CGA  1.3908       UAA:ACC  0.8840       UAG:GGG  0.5952

6B. Non-standard stop:post-stop codon pairs in mammalian mitochondria (see also Appendix 6)
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
CCU:CAC  1.4851       AAU:AUG  0.4597       AGA:AAG  0.2770
AAU:AGG  1.2317       AUA:AUG  0.4420       CCU:GCU  0.2711
CUA:AUG  1.0313       CCU:CGC  0.4420       CUU:GUA  0.2711
AUA:AUC  0.9841       CAU:AUU  0.4243       AGG:AAA  0.2652
ACU:GUA  0.9665       CUU:GCC  0.4066       UAU:GCA  0.2652
AGA:GUC  0.7190       CAU:AUC  0.3595       AUA:AGA  0.2416
AGG:AAG  0.7131       UCU:GUA  0.3595       CCU:GCA  0.2416
CCU:GUA  0.6659       ACU:GCC  0.3536       CUU:GCU  0.2357
UUA:AUG  0.4832       CCU:GCC  0.3300       CCU:AUU  0.2298
AAU:AGA  0.4656       CCU:CAU  0.2829       ACU:GCA  0.2239

Pre-stop:stop:post-stop codon usage in bacteria and mammalian mitochondria.
Taking into consideration that the efficiency of translation termination depends on both the type of stop codon used and the context of adjacent nucleotides (stop codon context), we determined the frequency of occurrence of nonanucleotides representing combinations of the three 3’-terminal triplets (pre-stop:stop:post-stop codon) in bacteria and mammalian mitochondria. As illustrated in Table 7 and Fig. 2, the nucleotide preference at position -1 (the last nucleotide of the pre-stop codon) in bacteria is: A (32.27 %), U (26.6 %), C (22.5 %), G (18.64 %). The preferred nucleotides at position -2 (the second nucleotide of the pre-stop codon) are also A or U (63.19 %).
The same holds true for position +1 (the first nucleotide following the stop codon), where A and U occur at a frequency of about 30 % each, and G and C at 22 % and 17 %, respectively. The two nucleotides, A and U, are also preferred at positions +2 and +3.

Fig. 2. Frequency of occurrence of the pre-stop:stop:post-stop codon triplets in bacteria. Abscissa first dimension: nucleotide’s number in the nonanucleotide consisting of pre-stop, stop and post-stop codon triplets; abscissa second dimension: type of nucleotides at positions -3 to +3 in the nonanucleotides; ordinate: frequency of occurrence of nonanucleotides in bacterial genes.

TABLE 7
Frequency of occurrence of the pre-stop:stop:post-stop codon triplets in bacteria and mammalian mitochondria.
            Bacteria                        Mitochondria
Position    U      A      G      C         U      A      G      C
-3          16.9   31.5   32.1   19.6      29.1   28.0   21.4   21.5
-2          23.9   39.3   18.4   18.4      32.2   31.2   16.7   20.0
-1          26.6   32.3   18.6   22.5      33.9   28.5   14.0   23.6
1           100.0  0.0    0.0    0.0       68.6   17.4   1.6    12.5
2           0.0    68.3   31.7   0.0       12.5   71.7   7.0    8.8
3           0.0    80.6   19.4   0.0       25.2   57.6   17.2   0.0
+1          30.4   30.8   22.1   16.7      21.9   38.5   24.6   15.1
+2          25.5   29.6   22.9   22.0      28.6   29.3   22.0   20.0
+3          26.7   31.4   22.3   19.7      25.2   36.8   21.2   16.8
Vertical column (left) represents nucleotides’ position in the nonanucleotide consisting of pre-stop:stop:post-stop codon triplets; horizontal row (top) represents type of nucleotides in positions -3 to +3 in the corresponding nonanucleotide. The sum of the values in each horizontal row of the two columns is 100 %.

The preference for A and U in the triplets adjacent to stop codons is more noticeable in mitochondria (Table 7 and Fig. 3). In spite of the high number of diverse stop codons employed in mitochondria, the usage of A/U at position -1 is 62.4 %, and 60.3 % at position +1. The most avoided nucleotide in mitochondria at position 1 is G (1.57 %); at position 2, G (7.0 %) and C (8.8 %); and at position 3, C (close to zero).
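The position-wise percentages of the kind reported in Table 7 can be tallied from a list of nonanucleotides as in the sketch below. This is an illustrative reimplementation under stated assumptions, not the GTA code: the function name and the input format (a list of nine-letter RNA strings) are hypothetical, and the position labels simply follow the layout of Table 7.

```python
from collections import Counter

# Position labels in the order used in Table 7: pre-stop, stop, post-stop.
POSITIONS = ["-3", "-2", "-1", "1", "2", "3", "+1", "+2", "+3"]


def positional_frequencies(nonamers):
    """Return {position: {nucleotide: percent}} for 9-nt RNA strings.

    Strings that are not exactly nine nucleotides long are skipped.
    """
    counts = [Counter() for _ in POSITIONS]
    total = 0
    for nonamer in nonamers:
        if len(nonamer) != 9:
            continue
        total += 1
        for index, nucleotide in enumerate(nonamer.upper()):
            counts[index][nucleotide] += 1
    if total == 0:
        return {}
    return {
        label: {nt: 100.0 * counts[i][nt] / total for nt in "UAGC"}
        for i, label in enumerate(POSITIONS)
    }


# Two toy contexts: AAA:UAA:UUU and GCA:UAG:AAA.
table = positional_frequencies(["AAAUAAUUU", "GCAUAGAAA"])
for label in POSITIONS:
    print(label, table[label])
```

By construction the four values in each row sum to 100 %, which matches the note under Table 7.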
Translation in both prokaryotes and eukaryotes is terminated by three stop codons (UAG, UGA and UAA), which are recognized by two classes (class I and II) of translation termination/release factors. Class I includes the release factors RF1 and RF2 (in prokaryotes) and eRF1 and eRF2 (in eukaryotes), and class II is represented by the release factor RF3 only. RF1/eRF1 and RF2/eRF2 recognize the stop codons UAA/UAG and UAA/UGA, respectively. Both class I release factors hydrolyze the ester bond between the growing polypeptide chain and the last tRNA in the ribosomal P site. RF3, also called ribosome recycling factor, is employed by prokaryotes only and is not known in eukaryotes (8, 9).

Fig. 3. Frequency of occurrence of the pre-stop:stop:post-stop codon triplets in mammalian mitochondria. Abscissa first dimension: nucleotide’s number in the nonanucleotide consisting of pre-stop, stop and post-stop codon triplets; abscissa second dimension: type of nucleotides in positions -3 to +3 in the nonanucleotides; ordinate: frequency of occurrence of nonanucleotides in mitochondrial genes.

While the translation termination machinery of prokaryotes and eukaryotes is well understood, limited information is available on translation termination in mitochondria. Mitochondria contain a separate translation apparatus for the synthesis of mitochondrion-specific proteins encoded by the mitochondrial DNA (7). In mammals, mtDNA encodes 13 proteins that play essential roles in the respiratory chain (6). All proteins required for mitochondrial translation, however, including those involved in translation termination, are encoded by nuclear genes and are imported from the cytoplasm. It has been shown that mtRF1a alone is necessary and sufficient for the termination of translation of all 13 mitochondrial polypeptides in human mitochondria (22, 24).
Recent proteomic analyses uncovered 73 proteins associated with the mitochondrial ribosomes (17). Later, Richter et al. (16) postulated that the immature colon carcinoma transcript-1 (ICT1) might be a member of the mitochondrial release factor family. ICT1 is a component of the 39S mitochondrial ribosomal subunit that carries a ribosome-dependent peptidyl-tRNA hydrolase (PTH) activity and is essential for cell viability. The authors also showed that this PTH activity is codon nonspecific and speculated that it might be involved in the hydrolysis of peptidyl-tRNAs in prematurely terminated (stalled) mitochondrial ribosomes. Another feature of the mitochondrial translation machinery is the use of a number of different termination codons. As discussed, UAA is preferred in all (prokaryotic, eukaryotic and mitochondrial) translation systems. In mammalian mitochondria, its usage is close to 48 %, which is much greater than that of the second standard stop codon UAG (14.22 %). The UAA:UAG ratio in mitochondria is 3.4, which is higher than that in bacteria (2.5). The bias for UAA in prokaryotes and eukaryotes is explained by the fact that it is recognized by two release factors (RF1/eRF1 and RF2/eRF2), whereas the other two stop codons (UAG and UGA) are recognized by one release factor each (12). In mitochondria, however, only one release factor, the mitochondrial release factor 1 (mtRF1), is utilized. Since it recognizes both UAA and UAG codons (11), it is logical to expect a lower frequency of usage for the UAA codon in mitochondria. This is contrary to our observation that UAA appears with a high frequency. The unexpectedly high preference for UAA observed here might be explained by: i) the existence of a putative mitochondrial release factor that specifically recognizes UAA and not UAG, or ii) the existence of an auxiliary (helper) factor that enhances the activity/affinity of mtRF1 for the UAA codon.
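As a quick arithmetic check, the UAA:UAG ratios can be recomputed from the Table 1 percentages. The short sketch below only restates that division; the dictionary layout and the function name are chosen for this illustration.

```python
# Stop codon usage (%) copied from Table 1.
TABLE_1 = {
    "bacteria": {"UAA": 48.76765, "UGA": 31.87005, "UAG": 19.35919},
    "mitochondria": {"UAA": 47.99364, "UGA": 0.017677, "UAG": 14.22426},
}


def uaa_uag_ratio(usage):
    """Ratio of UAA usage to UAG usage for one group of genomes."""
    return usage["UAA"] / usage["UAG"]


for group, usage in TABLE_1.items():
    print(f"{group}: UAA:UAG = {uaa_uag_ratio(usage):.2f}")
```

With the Table 1 values this gives roughly 2.52 for bacteria and 3.37 for mitochondria.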
Final remarks In this study we analyzed the frequency of usage of termination codons in 264 bacterial and 1308 mammalian mitochondrial genomes. Expectedly, our results for bacteria are in accordance with published reports from small scale studies. In mitochondria, however, in addition to the two standard stop codons UAA and UAG, we revealed 40 additional non-standard stop codons with a frequency of usage varying from 0.001 % to 5 %. Assuming that the appearance of some extremely rare stop codons might be due to data processing or sequencing errors, we determined a reliability threshold of 0.01 %. This led to the omission of ten extremely rare stop codons and reduction of the number of the non-standard termination codons to 30 triplets (their usage is shown in Table 2). Bearing in mind that these non-standard stop codons may also serve as sense codons in the standard genetic code, the question is by what mechanism they are recognized as termination signals in mitochondria. Certain codons, such as AGA/AGG in human mitochondria are not recognized by any mt-tRNA or mt-RF and promote termination via ribosomal frame-shifting (24). Consequently, they may not be considered to be classical termination codons per se. The number of the predicted non-canonical termination codons that actually function as stop codons in mammalian mitochondria is difficult to estimate, without experimental support. We believe that our statistical analysis will inspire future studies designed to shed more light on the potential of such codons to terminate translation in mitochondria and on the mechanism of this process. The ICT1 protein identified in the study of Richter et al. (16) contains an M domain, which is typical for the RF factors. Within this domain, GGQ is responsible for the hydrolysis of the ester bond between the growing polypeptide chain and the last tRNA. However, the ICT1 protein appears to be devoid of a stop codon-recognizing NIKS domain. 
Hypothetically, this protein binds to the E-site on the mitochondrial large ribosomal subunit and cleaves the ester bond independently of the codon type in the mitoribosomal A site (17). Another protein, C12orf65, which is devoid of a NIKS domain and, therefore, hypothetically functions as a codon non-specific release factor, has recently been described by Antonicka et al. (2) and Smits et al. (21). Together with mtRF1a/mtRF1, ICT1 and C12orf65 are also considered to be mitochondrial release factors (mtRFs). Based on our data and other published reports, we conclude that translation termination in mammalian mitochondria can be realized by both standard and non-standard (non-canonical) stop codons. In principle, the latter are sense codons in the standard genetic code but, in a specific context, could play the role of termination signals. To study the stop codon context in both prokaryotes and mitochondria, we used an original program developed by us (GTA, see above), which allows analysis of the frequency of occurrence of combinations of stop codons and adjacent nucleotides (both upstream and downstream). Thus, the frequency of usage of all hexanucleotides representing pre-stop:stop and stop:post-stop codons, and also of the pre-stop:stop:post-stop nonanucleotides, in all bacterial and mitochondrial genomes was determined. In a previous study, we determined the frequency of occurrence of the pre-stop:stop codon pairs in the E. coli genome and identified the most frequently used and the missing codon pairs (4). In the present study, we expand our analysis to 264 bacterial and 1308 mitochondrial genomes. As shown in Fig. 2 and Fig. 3, the most frequently used nucleotide at positions -3, -2, -1 (upstream) and +1, +2, +3 (downstream) adjacent to the stop codons in both bacteria and mitochondria is A (30 % or higher), followed by U. At position -3 in bacteria the most biased nucleotide is G (32 %).
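The hexanucleotide frequencies discussed in this section are percentages of all termination contexts. A minimal sketch of that tally is given below; it is not the GTA code, and the function name, the input format (tuples of pre-stop, stop and post-stop triplets) and the optional percentage cut-off, modelled loosely on the reliability threshold applied to rare stop codons, are assumptions for illustration.

```python
from collections import Counter


def pair_usage(contexts, threshold_pct=0.0):
    """Return (pre_stop:stop, stop:post_stop) usage as percentages.

    contexts is an iterable of (pre_stop, stop, post_stop) triplets.
    Pairs whose share falls below threshold_pct are dropped.
    """
    pre_stop_pairs = Counter()
    post_stop_pairs = Counter()
    total = 0
    for pre, stop, post in contexts:
        pre_stop_pairs[f"{pre}:{stop}"] += 1
        post_stop_pairs[f"{stop}:{post}"] += 1
        total += 1
    if total == 0:
        return {}, {}

    def as_percent(counter):
        # most_common() keeps the pairs ordered from frequent to rare.
        return {
            pair: 100.0 * n / total
            for pair, n in counter.most_common()
            if 100.0 * n / total >= threshold_pct
        }

    return as_percent(pre_stop_pairs), as_percent(post_stop_pairs)


toy = [("AAA", "UAA", "UUU")] * 3 + [("GAA", "UAA", "AAA")]
pre, post = pair_usage(toy)
print(pre)    # pre-stop:stop pairs
print(post)   # stop:post-stop pairs
```

Passing a non-zero `threshold_pct` removes rare pairs before further analysis, which is one simple way to exclude entries that may reflect sequencing or annotation errors.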
In an experimental system, Mottagui-Tabar and Isaksson (14) varied the nucleotides at positions -1 and -2, using the weak stop signal UGAA, and observed a pronounced modulation of translation termination in E. coli and B. subtilis but not in S. typhimurium. Other studies indicate that the nucleotides located downstream of the stop codon (+ positions) are also important for the efficiency of translation termination (15, 23). This finding is supported by bioinformatics analyses of both prokaryotic and eukaryotic genomes (REF). This is in strong agreement with the X-ray crystallography analysis by Dalphin et al. (4) and Korostelev et al. (7), who independently showed that eRF1 interacts not only with the stop codon situated in the ribosomal A site, but also with the adjacent nucleotides at positions -1, -2, +1 and +2. In addition, the pre-stop:stop codon usage data allowed us to determine the bias of C-terminal amino acids in bacterial proteins. As shown in Fig. 4, the most frequently used C-terminal amino acids in bacteria are: Lys (12.5 %), Ala and Leu (about 8 %), Arg, Glu and Ser (about 7 %), Gly (6.1 %), Asp, Asn, Ile and Val (about 5 %), Phe and Gln (about 4 %), etc.

Fig. 4. Frequency of usage of C-terminal amino acids in bacterial proteins.

Fig. 5. Frequency of usage of C-terminal amino acids in E. coli proteins.

In terms of usage, the C-terminal amino acids can be classified into four groups: a) frequently used (Lys, Ala, Leu, Arg, Glu, Ser and Gly); b) moderately used (Asp, Asn, Ile, Val, Phe, Gln and Pro); c) rare (Tyr, His and Thr), and d) avoided (Met, Trp and Cys). As seen in Fig. 4, the group of frequently used amino acids is represented by hydrophilic (Lys, Arg, Glu and Ser), hydrophobic (Ala, Leu and Gly), basic (Lys and Arg), and acidic (Glu) α-amino acids. To some extent the same holds true for the group of moderately used amino acids.
This makes it difficult to draw conclusions about the relationship between the chemical nature and the frequency of usage of the C-terminal α-amino acids in bacteria. It should be mentioned that the data presented in Fig. 4 represent an average for the 264 bacterial species used in this study. The frequency of occurrence of C-terminal amino acids in E. coli proteins alone (Fig. 5) indicates that the data for an individual species might deviate substantially from the average presented in Fig. 4. For instance, the most preferred C-terminal amino acid in E. coli is Glu (15 %) and not Lys (12 %) as suggested by the average counts, and the two moderately used amino acids Gln and Met in the average distribution are absent in the E. coli proteins.

The program, source codes and Appendices are available at: http://bio21.bas.bg/kirilov/.

Conclusions

The frequency of occurrence of stop codons, as well as of combinations of stop codons with adjacent upstream and downstream nucleotides, in 482 453 open reading frames belonging to 264 bacterial and 1308 mammalian mitochondrial genomes was determined by a novel program (Gene Triplet Analysis). Based on this analysis, the following conclusions can be drawn:
• The most frequently used termination codon in both bacteria and mammalian mitochondria is the standard termination codon UAA.
• Besides the two standard stop codons (UAA and UAG), 30 other non-standard termination codons are found in mammalian mitochondria.
• The preferential nucleotides in all three positions (±1 to ±3) adjacent to the termination codons in both bacteria and mammalian mitochondria are A and U.
• The most frequently used pre-stop:stop codon pairs in mammalian mitochondria are AAA:UAA (6.044 %), GAA:UAA (2.724 %) and AAU:UAA (2.054 %).
• The most common stop:post-stop codon pairs in mammalian mitochondria are UAA:AAA (3.53 %), UAA:UUU (2.42 %), UAA:AAU (2.11 %), UAA:UUA (1.78 %), and UAA:AUU (1.63 %).
• The most frequently used C-terminal amino acids in bacteria are Lys, Ala, Leu, Arg, Glu, Ser, Gly.
• The most frequently used C-terminal amino acids in mammalian mitochondria are Glu, Lys, Ala, Arg, Leu, Gly, Ser.
• The most avoided C-terminal amino acids in bacteria are Met, Trp and Cys.

Acknowledgements

This study was supported by Grant No. IDEAS 02-30/2009 from the National Science Fund of Bulgaria.

References

1. Anjay A. (2012) National Center for Biotechnology Information (NCBI), Bethesda, Maryland, U.S.A.
2. Antonicka H., Ostergaard E., Sasarman F., Weraarpachai W., et al. (2010) Am. J. Hum. Genet., 87, 115-122.
3. Bossi L., Roth J.R. (1980) Nature, 286, 123-127.
4. Boycheva S., Chkodrov G., Ivanov I. (2003) Bioinformatics, 19, 987-998.
5. Boycheva S.S., Bachvarov B.I., Berzal-Herranz A., Ivanov I.G. (2004) Curr. Microbiol., 48, 97-101.
6. D’Aurelio M., Gajewski C.D., Lenaz G., Manfredi G. (2006) Hum. Mol. Genet., 15, 2157-2169.
7. Hunter S.E., Spremulli L.L. (2004) Mitochondrion, 4, 21-29.
8. Janosi L., Mottagui-Tabar S., Isaksson L.A., Sekine Y., et al. (1998) EMBO J., 17, 1141-1151.
9. Janosi L., Shimizu I., Kaji A. (1994) P. Natl. Acad. Sci. USA, 91, 4249-4253.
10. Kirilov K., Ivanov I. (2012) Biotechnol. Biotech. Eq., 26, 3310-3314.
11. Korostelev A., Asahara H., Lancaster L., Laurberg M., et al. (2008) P. Natl. Acad. Sci. USA, 105, 19684-19689.
12. Laurberg M., Asahara H., Korostelev A., Zhu J., et al. (2008) Nature, 454, 852-857.
13. Leger M., Dulude D., Steinberg S.V., Brakier-Gingras L. (2007) Nucleic Acids Res., 35, 5581-5592.
14. Mottagui-Tabar S., Isaksson L.A. (1998) Gene, 212, 189-196.
15. Poole E.S., Brown C.M., Tate W.P. (1995) EMBO J., 14, 151-158.
16. Richter R., Rorbach J., Pajak A., Smith P.M., et al. (2010) EMBO J., 29, 1116-1125.
17. Rorbach J., Richter R., Wessels H.J., Wydro M., et al. (2008) Nucleic Acids Res., 36, 5787-5799.
18. Sayers E.W., Barrett T., Benson D.A., Bolton E., et al. (2012) Nucleic Acids Res., 40, D13-25.
19. Schluenzen F., Tocilj A., Zarivach R., Harms J., et al. (2000) Cell, 102, 615-623.
20. Sharp P.M., Li W.H. (1987) Nucleic Acids Res., 15, 1281-1295.
21. Smits P., Antonicka H., van Hasselt P.M., Weraarpachai W., et al. (2011) Eur. J. Hum. Genet., 19, 275-279.
22. Soleimanpour-Lichaei H.R., Kuhl I., Gaisne M., Passos J.F., et al. (2007) Mol. Cell, 27, 745-757.
23. Tate W.P., Poole E.S., Dalphin M.E., Major L.L., et al. (1996) Biochimie, 78, 945-952.
24. Temperley R., Richter R., Dennerlein S., Lightowlers R.N., Chrzanowska-Lightowlers Z.M. (2010) Science, 327, 301.
25. Welch M., Govindarajan S., Ness J.E., Villalobos A., et al. (2009) PLoS One, 4, e7002.