Article
BIOINFORMATICS
HTTP://DX.DOI.ORG/10.5504/BBEQ.2013.0052
TERMINATION CODONS AND STOP CODON CONTEXT IN BACTERIA AND
MAMMALIAN MITOCHONDRIA
Kiril T. Kirilov1, Ashkan Golshani2, Ivan G. Ivanov1
1 Institute of Molecular Biology, Bulgarian Academy of Sciences, Sofia, Bulgaria
2 Department of Biology, Carleton University, Ottawa, Canada
Correspondence to: Kiril Kirilov
E-mail: [email protected]
Abstract
The aim of this study was to analyze the frequency of occurrence of individual stop codons and combinations of stop codons with
adjacent upstream and downstream triplets in 264 bacterial and 1308 mammalian mitochondrial genomes. For the purpose of
this analysis a novel program (Gene Triplet Analysis) was applied. The obtained results indicate that the standard stop codon
UAA is the most frequently used one (48 %) in both bacteria and mitochondria. In addition, our analysis revealed 30 non-standard
translation termination codons in mitochondria. The preferential nucleotides in all three positions adjacent to the stop
codons in mammalian mitochondria are A and U.
Biotechnol. & Biotechnol. Eq. 2013, 27(4), 4018-4025
Keywords: termination codons, stop-codon context, Gene
Triplet Analysis (GTA), mitochondrial release factor, RF,
ICT1, C12orf65
Introduction
The postgenomic era made it clear that the genetic code is
not simply an assignment of 61 trinucleotides to 20 α-amino
acids but is also a modulating factor in gene expression.
The degeneracy of the genetic code, in combination with the
species-specific (and possibly tissue-specific) codon usage, is a mechanism
for fine regulation/modulation of protein biosynthesis (25).
Although the codon usage phenomenon was described long
ago, its biological significance remained vague until the last
decade. Ever since the year 2000, thousands of prokaryotic,
eukaryotic, viral and organelle genomes have been sequenced,
opening a new thoroughfare for extensive bioinformatics
analyses, including studies on the codon usage phenomenon.
The obtained results undoubtedly demonstrate that the
synonymous codons are not randomly employed and that their
preference varies widely between the different taxonomic
groups (sometimes between the species) (20).
Besides single codon bias, the combinations of codons
(codon pairs) are also biased (3). The latter is explained by
the difference in compatibility between the isoacceptor tRNAs
occupying the two functional (A and P) sites on the translating
ribosome (13). Taking into consideration that the steric
parameters of these two sites are genetically pre-determined
by the structure of the small ribosomal subunit, it is logical
to assume that the combinations between two isoacceptor
tRNAs (having different spatial structures) or a tRNA in P and
a release factor (RF1 or RF2) in the A site will not be equal
and, therefore, could play the role of a modulating factor in the
translation of genetic information (19). Bearing in mind that
these combinations are genetically encoded by the collinear
arrangement of synonymous codons in mRNA, their effect
on translation can be indirectly estimated via the frequency of
occurrence of codon pairs in protein coding genes.
In a previous study (4) we investigated the frequency
of occurrence of codon pairs in all Escherichia coli open
reading frames (ORFs) and showed that both the sense:sense
and the sense:stop codon combinations were non-randomly distributed. Our analysis, based on 4289 ORFs,
revealed that the frequency of occurrence of codon pairs in
the E. coli genome varied from zero to 4913. Based on their
preference, the codon pairs were classified as: overrepresented,
moderately represented, underrepresented and missing.
Thus, 19 missing pairs were identified of which 14 appeared
to be combinations between sense and stop codons. With
the exception of one pair, ACU:UGA, the rest contained
UAG as a stop codon. Furthermore, we studied the effect of
various sense:stop codon pairs on translation efficiency in
vivo and demonstrated that the missing pairs CCU:UAG and
CCC:UAG, but not CCU:UAA and CCC:UAA (all coding for
Pro:stop) had a strong suppressing effect on the translation of the
chloramphenicol acetyltransferase (cat) gene (5). The yield
of recombinant CAT protein in this experiment correlated
directly with the type of stop codon used in the pair, in
decreasing order: UAA>UGA>UAG. In our earlier studies,
however, we did not investigate either the usage of nucleotides
located downstream of the stop codon (in the untranslated
region), or their effect on translation.
Here, we took advantage of the availability of hundreds of
bacterial and thousands of mitochondrial genome sequences, to
analyze the frequency of occurrence (usage) of combinations
of stop codons with adjacent upstream and downstream triplets
in 482 453 bacterial and 16 967 mitochondrial genes belonging
to 264 bacterial and 1308 mammalian mitochondrial genomes.
To study all three types of triplet combinations: pre-stop:stop,
stop:post-stop and pre-stop:stop:post-stop triplets, we
developed an original program, called Gene Triplet Analysis
(GTA), capable of directly using biological data files (*.gbk).
Databases and Methodology
Genome sequence data files
In this study, 1308 mammalian mitochondrial genomes
containing 16 967 protein-coding genes and 264 bacterial
genomes carrying 482 453 open reading frames (ORFs) were
obtained and analyzed. Lists of these genomes, grouped by
taxon, are given in Appendix 1 and
Appendix 2. All bacterial and mitochondrial (mtDNA)
sequences were downloaded from the NCBI GenBank
(http://www.ncbi.nlm.nih.gov/genbank/) as *.gbk files containing
complete information about the organism, its genome and
the sequences of its genes and proteins.
Codon and codon/triplet pairs analysis
To study the stop codon context, a new program named Gene
Triplet Analysis (GTA) was written in Java (NetBeans IDE
6.1); it is described in detail elsewhere (10). An advantage of
this program is that it reads biological data files (*.gbk)
directly, using the external BioJava (www.biojava.org) libraries, and
saves the results as *.csv (comma-separated values) files
(see Fig. 1). It loads *.gbk files and consecutively extracts
information concerning the species name, the gene signature and
its location in the full-length sequence, and calculates the
nucleotide position in relation to the first and the last nucleotide
in the ORF. Since transcription may take place in opposite
directions, the nucleotide position carries one of two signs
(plus or minus), depending on the direction of transcription.
A positive sign indicates inverse transcription (transcription
from the opposite strand). For genes transcribed from the opposite
strand, the program automatically converts the sequence into the correct
format. The algorithm also identifies the start codon and the
first downstream codon (called the post-start codon). After that,
it determines the stop codon together with the two adjacent
triplets: the pre-stop and post-stop trinucleotides. Furthermore, the
program defines the pre-stop:stop:post-stop nonanucleotides.
All data obtained by the GTA program are compiled into a table
and saved as *.csv files for further use in mathematical and statistical
calculations. The program has a user-friendly graphical interface
and is available at: http://bio21.bas.bg/kirilov/.
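The extraction steps described above can be illustrated with a minimal Python sketch. It is not the GTA source code: the function names, the 0-based end-exclusive coordinates and the +1/-1 strand convention are assumptions made for this illustration only, and circular genomes are not handled.

```python
# Illustrative sketch only; names and conventions are hypothetical, not GTA's API.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def stop_context(genome: str, start: int, end: int, strand: int) -> dict:
    """Extract the triplets around the start and stop codon of one ORF.

    start/end are 0-based, end-exclusive genome coordinates that include
    the stop codon; strand is +1 (given strand) or -1 (opposite strand).
    """
    length = end - start
    if length < 9 or length % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and at least 9")
    if strand == 1:
        region = genome[start:end + 3].upper()
    else:
        # On the opposite strand the post-stop triplet lies before `start`.
        region = reverse_complement(genome[max(start - 3, 0):end].upper())
    rna = region.replace("T", "U")
    orf, post_stop = rna[:length], rna[length:length + 3]
    return {
        "start": orf[:3],
        "post_start": orf[3:6],
        "pre_stop": orf[-6:-3],
        "stop": orf[-3:],
        "post_stop": post_stop,  # shorter than 3 nt at a sequence edge
        "nonanucleotide": orf[-6:] + post_stop,
    }
```

For a toy sequence such as "CCATGAAACCCTAAGGGTT" with an ORF at coordinates 2 to 14 on the given strand, the function returns the pre-stop, stop and post-stop triplets of that ORF; the same ORF placed on the opposite strand yields identical triplets.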
Results and Discussion
Although the genetic code is considered standard and
universal, there are important differences between the
standard (cellular) and mitochondrial genetic codes. Twenty-three deviations from the standard genetic code have been
described in mitochondria, indicating that there are at least
23 different types of mitochondrial genetic codes (18);
www.ncbi.nlm.nih.gov/taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes (1). The main differences between the
standard and mitochondrial genetic codes can be formulated
as follows: a) Many triplets in mitochondria are assigned to
different amino acids, other than those for the standard genetic
code; b) Some sense codons play the role of termination
codons in mitochondria; and c) The stop codon UGA encodes
tryptophan in mitochondria.
In this study we took advantage of a large number of
bacterial and mitochondrial genomic sequences available at
the DNA databases to compare the usage of stop codons and
combinations of stop codons with adjacent triplets in both
bacteria and mammalian mitochondria.
Stop-codon usage in bacteria and mammalian
mitochondria
There are three stop codons in the standard genetic code:
UAA, UGA and UAG. In this study, we determined their
usage (frequency of occurrence) in 482 453 bacterial ORFs
belonging to 264 bacterial genomes and also in 16 967
mitochondrial protein-coding genes from 1308 mammalian
mitochondria. The data presented in Table 1 show that UAA is
the most preferred stop codon (accounting for approximately
48 % of all stop codons) in both bacteria and mitochondria.
The usage of the second stop codon, UAG, differs moderately
between bacteria (19.36 %) and mitochondria (14.22 %), whereas
the usage of the third standard stop codon, UGA, differs
markedly: it is the second most preferred stop codon in
bacteria (31.87 %) but is virtually absent in mitochondria (about 0.02 %).
Our analysis also revealed some non-standard stop codons
(AUA, UUA, AAA, AAG, AGU and AGC) in bacteria, with a
frequency of occurrence below 0.0001 %, which may be attributable
to sequencing or data processing errors.
TABLE 1
Stop codon usage in bacteria and mammalian mitochondria
Codon    Bacteria, occurrence (%)    Mitochondria, occurrence (%)
UAA      48.76765                    47.99364
UGA      31.87005                    0.017677
UAG      19.35919                    14.22426
Fig. 1. Gene Triplet Analysis (GTA) Program.
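Percentages of the kind shown in Table 1 can be obtained by simple counting. The sketch below is an illustration in Python, not the GTA implementation; the function name and the input format (one stop codon per gene, written as DNA or RNA) are assumptions.

```python
from collections import Counter


def codon_usage_percent(codons):
    """Return {codon: percentage of all codons}, most frequent first."""
    # Normalize case and write every codon in the RNA alphabet.
    counts = Counter(c.upper().replace("T", "U") for c in codons)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons given")
    return {codon: 100.0 * n / total for codon, n in counts.most_common()}
```

Calling the function on the list of stop codons collected from all ORFs of a data set gives one percentage per codon, summing to 100 %.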
In mitochondria, however, besides the two standard stop
codons (UAA and UAG), 40 additional translation termination
triplets were revealed (Table 2). As seen in Table 2, the most
frequently used non-standard stop codon is CCU (4.93 %),
followed by AAU (4.07 %), AUA (2.99 %), CAU (2.94 %),
AGA (2.91 %), AGG (2.8 %), CUU (2.34 %), ACU (2.33 %),
CUA (1.93 %), UAU (1.90 %), UUU (1.62 %), UUA (1.51 %),
AUU (1.51 %), and UCU (1.01 %). The rest of the non-standard
stop codons have a frequency of occurrence lower than 1 %. Ten
of them (AGC, ACA, ACG, UCA, GAG, GGG, GCA, GCC,
CAC, CCG) have an extremely low frequency of usage (lower
than 0.01 %), and their appearance could be explained
by sequencing or data processing errors. This group of codons
was omitted from further analysis, reducing the number of
non-standard stop codons in mammalian mitochondria to 30
triplets.
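The filtering step described above amounts to applying a cut-off to the usage table. A minimal Python sketch follows, assuming the usage values are held in a dictionary; the function name is hypothetical, and the AGC value in the example is a made-up placeholder, since the text states only that it is below 0.01 %.

```python
def split_by_threshold(usage_percent, threshold=0.01):
    """Split {codon: percent} into (kept, omitted) at the given threshold."""
    kept = {c: p for c, p in usage_percent.items() if p >= threshold}
    omitted = {c: p for c, p in usage_percent.items() if p < threshold}
    return kept, omitted


# CCU and CCA as reported for mammalian mitochondria; AGC is a placeholder value.
example = {"CCU": 4.93, "CCA": 0.01, "AGC": 0.004}
kept, omitted = split_by_threshold(example)
```

Codons at exactly 0.01 % are retained here, which matches the lowest values that remain in the reported set of 30 codons.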
TABLE 2
Non-standard stop codon usage in mammalian mitochondria
Codon  (%)     Codon  (%)     Codon  (%)
CCU    4.93    UUU    1.62    GUA    0.19
AAU    4.07    UUA    1.51    CGU    0.16
AUA    2.99    AUU    1.51    CUG    0.11
CAU    2.95    UCU    1.11    AUG    0.04
AGA    2.91    AGU    0.61    GAA    0.02
AGG    2.80    GAU    0.55    AAG    0.02
CUU    2.34    GCU    0.36    AAA    0.01
ACU    2.33    UGU    0.29    AAC    0.01
CUA    1.93    GUU    0.21    UUG    0.01
UAU    1.91    GGU    0.20    CCA    0.01
Frequency of usage of combinations of stop codons
with adjacent triplets in bacteria and mammalian
mitochondria
Taking into consideration that the stop codon context might
be important for the efficiency of translation termination, we
determined the frequency of occurrence of all 183 sense:stop
codon combinations (pre-stop:stop pairs) in both bacterial and
mammalian mitochondrial genomes.
Usage of pre-stop:stop codon pairs in bacteria and
mammalian mitochondria. The frequency of occurrence of
pre-stop:stop codon pairs in 264 bacterial genomes is presented
in Table 3 and Appendix 3.
As indicated in Table 3, the most preferred 3’-terminal
codon pairs in bacteria contain the stop codon UAA
(48.76775 %), followed by UGA (31.86991 %) and UAG
(19.35923 %), which correlates with the frequency of usage
of the corresponding stop codons (see Table 1). The most
frequently used codon pair in bacteria is AAA:UAA (6.044 %),
followed by GAA:UAA (2.724 %) and AAU:UAA (2.054 %).
Among the most frequently used 3’-terminal codon pairs with
a frequency of occurrence of at least 1 % (18 in total), the
stop codon UAA appears 12 times; UGA, 5 times; and UAG,
once. Our data also demonstrate that the most preferred
penultimate (last sense) codon is the Lys codon AAA (9.19 %).
We revealed 15 combinations of sense codons with non-standard
termination triplets (AAA, AAG, AGC, AGU, AUA,
CAA, CGU, CUG, GAC, GGU, GUA, UUA, UUG) showing
a frequency of occurrence lower than 0.0002 %. These may
reflect sequencing or data processing errors and are omitted from
further analysis.
TABLE 3
Frequency of occurrence of pre-stop:stop codon pairs in bacteria*
Codon    Freq. (%)  AA     Codon    Freq. (%)  AA     Codon    Freq. (%)  AA     Codon    Freq. (%)  AA
AAA:UAA  6.04       Lys    AUU:UAA  1.44       Ile    CGC:UGA  0.99       Arg    AGA:UAA  0.85       Arg
GAA:UAA  2.72       Glu    CAA:UAA  1.38       Gln    GAC:UGA  0.97       Asp    GAG:UGA  0.85       Glu
AAU:UAA  2.05       Asn    UUA:UAA  1.37       Leu    UAU:UAA  0.94       Tyr    AUA:UAA  0.83       Ile
AAG:UAA  1.80       Lys    GAG:UAA  1.22       Glu    GAA:UAG  0.88       Glu    UUC:UAA  0.82       Phe
AAA:UAG  1.66       Lys    GGC:UGA  1.09       Gly    GCA:UAA  0.87       Ala    GGA:UAA  0.78       Gly
GAU:UAA  1.64       Asp    AAC:UAA  1.09       Asn    GUU:UAA  0.87       Val    AAG:UAG  0.77       Lys
UUU:UAA  1.61       Phe    GCU:UAA  1.08       Ala    GAC:UAA  0.87       Asp    CAU:UAA  0.76       His
GCC:UGA  1.59       Ala    GCA:UGA  1.07       Ala    CUU:UAA  0.87       Leu    CAA:UGA  0.72       Gln
AAA:UGA  1.49       Lys    GAA:UGA  1.06       Glu    GAU:UGA  0.85       Asp    AAG:UGA  0.70       Lys
* The table contains data for frequently used (above 0.7 %) codon pairs only. For more information see Appendix 3.
AA: amino acid.
The frequencies of occurrence of pre-stop:stop codon
pairs in mammalian mitochondria are presented in Table 4A
and Table 4B, and Appendix 4.
Considering that at least 32 different triplets are used
for termination in mitochondria (see above), it is logical to
expect the presence of a higher number of pre-stop:stop codon
combinations than in bacteria. Our analysis revealed 968 such
pairs (expected number: 62×32 = 1984) with a frequency
of occurrence between 4.167 % and less than 0.001 % (see
Appendix 4). As expected, the most preferred 3’-terminal
codon pairs contained the standard stop codons UAA and
UAG. They amount to approximately 62 % of all termination
codon pairs in mitochondria. The most preferred codon pairs
containing standard stop codons (Table 4A) are: UGC:UAA
(4.17 %), AAU:UAA (2.41 %), AUU:UAA (2.19 %),
ACC:UAA (2.05 %), UGU:UAA (1.96 %), AAA:UAA
(1.82 %), etc. It is worth mentioning that the most frequently
used pre-stop:stop codon pair in bacteria, AAA:UAA (with an
occurrence of 6 %), takes only the sixth position in mitochondria
(1.82 %). Among the most frequently used pre-stop:stop codon
pairs in mitochondria, with an occurrence of more than 1 % (24
in total), UAA appears 21 times, UAG twice and CCU once.
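The pair statistics discussed in this section rest on two simple computations: counting codon pairs and comparing the number of observed pairs with the number of possible ones. The Python sketch below illustrates both; the function names and the input format (a list of pre-stop and stop triplets, one tuple per gene) are assumptions and do not reproduce the GTA code.

```python
from collections import Counter


def pair_usage_percent(pairs):
    """Return {"XXX:YYY": percentage} for (first, second) triplet tuples."""
    counts = Counter(f"{first}:{second}" for first, second in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codon pairs given")
    return {pair: 100.0 * n / total for pair, n in counts.most_common()}


def pair_space_coverage(observed, n_first, n_second):
    """Fraction of the n_first x n_second possible pairs actually observed."""
    return observed / (n_first * n_second)


# Numbers quoted in the text: 61 sense codons x 3 stop codons (standard code),
# and 62 sense codons x 32 termination triplets (mammalian mitochondria).
standard_pairs = 61 * 3
mito_pairs = 62 * 32
mito_coverage = pair_space_coverage(968, 62, 32)
```

With the figures given in the text, the 968 pairs found in mitochondria cover slightly less than half of the 1984 theoretically possible combinations.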
The most frequently used pre-stop:stop codon pairs
containing non-standard stop codons are ACG:CCU,
GAA:AGG, CCG:AAU, CUC:AUA, UGA:AGA, etc.
(Table 4B). Comparing with the data in Table 2, one can see
that their frequency of usage does not correlate with that of the
corresponding non-standard stop codons (CCU, AGG, AAU,
AUA and AGA).
TABLE 4
Frequency of occurrence of pre-stop:stop codon pairs in
mammalian mitochondria (see also Appendix 4).
4A. Standard pre-stop:stop codon pairs in mammalian
mitochondria*
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
UGC:UAA  4.1669       AUC:UAA  1.4322       UAC:UAA  1.0727
AAU:UAA  2.4106       GAA:UAA  1.3791       CUU:UAA  1.0550
AUU:UAA  2.1866       GUU:UAG  1.3497       GAG:UAA  1.0373
ACC:UAA  2.0451       CUA:UAA  1.3379       UAU:UAA  0.9843
UGU:UAA  1.9626       CAC:UAA  1.3261       UCC:UAA  0.8369
AAA:UAA  1.8153       UCA:UAA  1.2848       AUA:UAA  0.7721
GUU:UAA  1.7151       UUA:UAA  1.2613       GUC:UAA  0.7721
AAC:UAA  1.6915       ACA:UAA  1.1847       CAU:UAA  0.7603
UGA:UAA  1.5795       CAA:UAA  1.1729       AUU:UAG  0.7544
UUU:UAA  1.5265       GAA:UAG  1.1434       UCU:UAA  0.7426
* The table contains data for frequently used (above 0.4 %) codon pairs only.
For more information see Appendix 4.
4B. Non-Standard pre-stop:stop codon pairs in mammalian
mitochondria*
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
ACG:CCU  1.5972       GAU:CUU  0.4597       CGU:AUA  0.2652
GAA:AGG  0.9843       GAU:CAU  0.4185       GUA:AGA  0.2593
CCG:AAU  0.9784       GAG:CUU  0.3949       AUC:UUA  0.2416
CUC:AUA  0.9784       GUU:AGG  0.3831       AGA:AUA  0.2299
UGA:AGA  0.9135       GAA:ACU  0.3359       GUU:CAU  0.2122
CUG:AAU  0.8782       AUC:AUA  0.3006       CCG:CCU  0.2122
GCU:CAU  0.7426       AUG:CCU  0.2888       AAU:GAU  0.2004
CAG:AAU  0.7073       GAG:CCU  0.2888       AAC:AUA  0.1945
CGU:CUA  0.6365       CGU:UUA  0.2888       UAA:AAU  0.1886
GAU:CCU  0.6306       UUU:ACU  0.2829       ACG:CUU  0.1886
* The table contains data for frequently used (above 0.4 %) codon pairs only.
For more information see Appendix 4.
Stop:post-stop codon usage in bacteria and mammalian
mitochondria. As mentioned above (see Databases and
Methodology), the GTA program also made it possible to
study the frequency of occurrence of combinations of
stop codons with triplets located downstream in the 3’ non-coding
region. The results of this analysis for bacteria
and mammalian mitochondria are presented in Table 5 and
Table 6, and in Appendix 5 and Appendix 6.
TABLE 5
Frequency of occurrence of stop:post-stop codon pairs in
bacteria
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
UAA:AAA  3.5264       UGA:AAA  1.1357       UGA:CGC  0.7941
UAA:UUU  2.4216       UAA:AAG  1.0876       UGA:GCC  0.7910
UAA:AAU  2.1130       UAA:UCA  1.0434       UAA:UCU  0.7806
UAA:UAA  1.8400       UAA:GGA  0.9976       UAG:UUU  0.7787
UAA:UUA  1.7753       UGA:UUU  0.9856       UAA:ACA  0.7752
UAA:AUU  1.6300       UAA:GAA  0.9454       UAA:UUC  0.7557
UAA:UAU  1.5162       UAA:AUG  0.8737       UAA:AGG  0.7487
UAA:AUA  1.5096       UAA:AAC  0.8679       UGA:UGA  0.7476
UAA:UGA  1.2393       UAA:AGA  0.8428       UAA:UGG  0.7396
UGA:GCG  1.1458       UAG:AAA  0.8364       UAA:UUG  0.7385
The most used stop:post-stop codon pair in bacteria
(Table 5) is UAA:AAA (3.53 %), followed by UAA:UUU
(2.42 %), UAA:AAU (2.11 %), UAA:UUA (1.78 %), and
UAA:AUU (1.63 %). This distribution indicates that the
preferred 3’-terminal pairs in bacteria are highly
enriched in A and U.
The frequency of stop:post-stop codon pairs in mammalian
mitochondria containing standard stop codons (Table 6A)
is 62.21 %. The most frequently used pairs are: UAA:AAA
(4.78 %), UAA:UGA (2.60 %), UAA:GCU (2.17 %), UAA:UGG
(2.13 %), UAA:AAU (1.83 %), and UAA:GAA (1.72 %).
Unlike the stop:post-stop codon pairs in bacteria, 37.79 %
of all pairs of this type in mitochondria contain non-standard
stop codons (see Table 6B).
As shown in Table 6B, the two most frequently used
stop:post-stop codon pairs with non-standard stop codons
(CCU:CAC and AAU:AGG) contain the two most used non-standard
stop codons, CCU and AAU (see Table 2). However,
the frequency of occurrence of the other highly used codon
pairs (CUA:AUG, AUA:AUC, ACU:GUA) does not correlate
with the frequency of usage of the corresponding non-standard
stop codons.
TABLE 6
Frequency of occurrence of the stop:post-stop codon pairs in
mammalian mitochondria.
6A. Standard stop:post-stop codon pairs in mammalian
mitochondria (see also Appendix 6)
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
UAA:AAA  4.7793       UAA:AUU  1.3377       UAA:UAU  0.7720
UAA:UGA  2.6047       UAA:UUA  1.3318       UAA:UCA  0.7720
UAA:GCU  2.1746       UAA:ACU  1.2376       UAA:AGA  0.7190
UAA:UGG  2.1333       UAA:UAA  1.1256       UAA:UGU  0.6600
UAA:AAU  1.8269       UAG:AAA  1.1138       UAA:GUA  0.6188
UAA:GAA  1.7208       UAA:UCU  1.0902       UAA:CCC  0.6188
UAA:UUU  1.6442       UAA:CAA  0.9959       UAA:CCU  0.6129
UAA:AUA  1.4968       UAA:UAG  0.9429       UAG:UUU  0.6129
UAA:AAG  1.4792       UAA:ACA  0.9370       UAA:CCA  0.6070
UAA:CGA  1.3908       UAA:ACC  0.8840       UAG:GGG  0.5952
6B. Non-Standard stop:post-stop codon pairs in mammalian
mitochondria (see also Appendix 6)
Codon    Freq. (%)    Codon    Freq. (%)    Codon    Freq. (%)
CCU:CAC  1.4851       AAU:AUG  0.4597       AGA:AAG  0.2770
AAU:AGG  1.2317       AUA:AUG  0.4420       CCU:GCU  0.2711
CUA:AUG  1.0313       CCU:CGC  0.4420       CUU:GUA  0.2711
AUA:AUC  0.9841       CAU:AUU  0.4243       AGG:AAA  0.2652
ACU:GUA  0.9665       CUU:GCC  0.4066       UAU:GCA  0.2652
AGA:GUC  0.7190       CAU:AUC  0.3595       AUA:AGA  0.2416
AGG:AAG  0.7131       UCU:GUA  0.3595       CCU:GCA  0.2416
CCU:GUA  0.6659       ACU:GCC  0.3536       CUU:GCU  0.2357
UUA:AUG  0.4832       CCU:GCC  0.3300       CCU:AUU  0.2298
AAU:AGA  0.4656       CCU:CAU  0.2829       ACU:GCA  0.2239
Pre-stop:stop:post-stop codon usage in bacteria and
mammalian mitochondria. Taking into consideration that the
efficiency of translation termination depends on both the type
of stop codon used and the context of adjacent nucleotides (stop
codon context), we determined the frequency
of occurrence of nonanucleotides representing combinations
of the three 3’-terminal triplets (pre-stop:stop:post-stop codons)
in bacteria and mammalian mitochondria. As illustrated in
Table 7 and Fig. 2, the nucleotide preference at position -1
(the last nucleotide of the pre-stop codon) in bacteria is: A
(32.27 %), U (26.6 %), C (22.5 %), G (18.64 %). The preferred
nucleotides at position -2 (the second nucleotide of the pre-stop
codon) are also A and U (63.19 % combined). The same holds true
for position +1 (the first nucleotide following the stop codon),
where A and U occur at a frequency of about 30 % each, and G
and C at 22 % and 17 %, respectively. The two nucleotides,
A and U, are also preferred at positions +2 and +3.
Fig. 2. Frequency of occurrence of the pre-stop:stop:post-stop codon triplets in
bacteria. Abscissa first dimension: nucleotide’s number in the nonanucleotide
consisting of pre-stop, stop and post-stop codon triplets; abscissa second
dimension: type of nucleotides at positions -3 to +3 in the nonanucleotides;
ordinate: frequency of occurrence of nonanucleotides in bacterial genes.
TABLE 7
Frequency of occurrence of the pre-stop:stop:post-stop codon
triplets in bacteria and mammalian mitochondria.
            Bacteria                        Mitochondria
Position    U      A      G      C          U      A      G      C
-3          16.9   31.5   32.1   19.6       29.1   28.0   21.4   21.5
-2          23.9   39.3   18.4   18.4       32.2   31.2   16.7   20.0
-1          26.6   32.3   18.6   22.5       33.9   28.5   14.0   23.6
1           100.0  0.0    0.0    0.0        68.6   17.4   1.6    12.5
2           0.0    68.3   31.7   0.0        12.5   71.7   7.0    8.8
3           0.0    80.6   19.4   0.0        25.2   57.6   17.2   0.0
+1          30.4   30.8   22.1   16.7       21.9   38.5   24.6   15.1
+2          25.5   29.6   22.9   22.0       28.6   29.3   22.0   20.0
+3          26.7   31.4   22.3   19.7       25.2   36.8   21.2   16.8
Vertical column (left) represents nucleotides’ position in the nonanucleotide
consisting of pre-stop:stop:post-stop codon triplets; horizontal row (top)
represents type of nucleotides in positions -3 to +3 in the corresponding
nonanucleotide. The sum of the values in each horizontal row of the two
columns is 100 %.
The preference for A and U in the triplets adjacent to stop
codons is more noticeable in mitochondria (Table 7 and Fig. 3).
In spite of the high number of diverse stop codons employed
in mitochondria, the usage of A/U at position -1 is 62.4 %,
and 60.3 % at position +1. The most avoided nucleotide in
mitochondria at position 1 is G (1.57 %); at position 2, G
(7.0 %) and C (8.8 %); and at position 3, C (close to zero).
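A position-by-position nucleotide profile of the kind summarized in Table 7 can be computed from the collected nonanucleotides as in the following Python sketch. The function name and the position labels are assumptions chosen to mirror the table layout; the sketch is not taken from the GTA program.

```python
from collections import Counter

# Labels follow the table layout: pre-stop (-3..-1), stop (1..3), post-stop (+1..+3).
POSITIONS = ["-3", "-2", "-1", "1", "2", "3", "+1", "+2", "+3"]


def position_frequencies(nonanucleotides):
    """Return {position label: {nucleotide: percent}} for 9-nt contexts."""
    seqs = [s.upper().replace("T", "U") for s in nonanucleotides if len(s) == 9]
    if not seqs:
        raise ValueError("no 9-nucleotide sequences given")
    table = {}
    for index, label in enumerate(POSITIONS):
        counts = Counter(seq[index] for seq in seqs)
        table[label] = {
            base: 100.0 * counts.get(base, 0) / len(seqs) for base in "UAGC"
        }
    return table
```

Sequences that are not exactly nine nucleotides long (for example, genes ending at a sequence edge) are skipped, so that every row of the resulting table sums to 100 % provided the sequences contain only the four standard nucleotides.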
Translation in both prokaryotes and eukaryotes is
terminated by three stop codons (UAG, UGA and UAA),
which are recognized by class I translation
termination/release factors; these are assisted by class II release factors.
In prokaryotes, class I includes the release factors RF1 and RF2,
which recognize the stop codons UAA/UAG and UAA/UGA, respectively,
whereas in eukaryotes a single class I factor, eRF1, recognizes
all three stop codons. Class I release factors promote the
hydrolysis of the ester bond between the
growing polypeptide chain and the last tRNA in the ribosomal
P site. Class II is represented by the GTPase RF3 in prokaryotes
and by eRF3 in eukaryotes (8, 9).
Fig. 3. Frequency of occurrence of the pre-stop:stop:post-stop codon triplets
in mammalian mitochondria. Abscissa first dimension: nucleotide’s number in
the nonanucleotide consisting of pre-stop, stop and post-stop codon triplets;
abscissa second dimension: type of nucleotides in positions -3 to +3 in the
nonanucleotides; ordinate: frequency of occurrence of nonanucleotides in
mitochondrial genes.
While the translation termination machinery of prokaryotes
and eukaryotes is well understood, limited information is
available on the translation termination in mitochondria. They
contain a separate translation apparatus for the synthesis of
mitochondrion-specific proteins encoded by the mitochondrial
DNA (7). In mammals, mtDNA encodes 13 proteins that play
essential roles in the respiratory chain reaction (6). All proteins
required for mitochondrial translation, however, including
those involved in translation termination, are coded by nuclear
genes and are imported from the cellular cytoplasm.
It has been shown that mtRF1a alone is necessary and sufficient
for the termination of translation of all 13 mitochondrial
polypeptides in human mitochondria (22, 24). Recent
proteomic analyses uncovered 73 proteins associated with
the mitochondrial ribosomes (17). Later, Richter et al. (16)
postulated that the immature colon carcinoma transcript-1
(ICT1) might be a member of the mitochondrial release
factor family. ICT1 is a component of the 39S mitochondrial
ribosomal subunit that carries a ribosome-dependent peptidyl-tRNA
hydrolase (PTH) activity and is essential for cell
viability. The authors also showed that this PTH activity is
codon non-specific and speculated that it might be involved in
the hydrolysis of peptidyl-tRNAs in prematurely terminated
(stalled) mitochondrial ribosomes.
Another feature of the mitochondrial translation machinery
is the use of a number of different termination codons. As
discussed, UAA is preferred in all (prokaryotic, eukaryotic
and mitochondrial) translation systems. In mammalian
mitochondria, its usage is close to 48 %, which is much greater
than that of the second standard stop codon UAG (14.22 %).
The UAA:UAG ratio in mitochondria is 3.4, which is higher
than that in bacteria (2.5). The bias for UAA in prokaryotes
is explained by the fact that it is recognized
by two release factors (RF1 and RF2), whereas
the other two stop codons (UAG and UGA) are recognized by
one release factor each (12). In mitochondria, however, only
one release factor, the mitochondrial release factor 1 (mtRF1), is
utilized. Since it recognizes both UAA and UAG codons (11),
it is logical to expect a lower frequency of usage for the UAA
codon in mitochondria. This is contrary to our observation
that UAA appears with a high frequency. The unexpectedly
high preference for UAA observed here might be explained
by: i) the existence of a putative mitochondrial release factor that
specifically recognizes UAA and not UAG, or ii) the existence of
an auxiliary (helper) factor that enhances the activity/affinity
of mtRF1 for the UAA codon.
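The UAA:UAG ratios compared above can be recomputed directly from the Table 1 percentages. The short Python sketch below does so; the dictionary layout and the function name are assumptions made for illustration.

```python
# Stop codon usage (%) as reported in Table 1.
TABLE_1 = {
    "bacteria": {"UAA": 48.76765, "UGA": 31.87005, "UAG": 19.35919},
    "mitochondria": {"UAA": 47.99364, "UGA": 0.017677, "UAG": 14.22426},
}


def usage_ratio(usage, numerator="UAA", denominator="UAG"):
    """Ratio of the usage of two codons within one data set."""
    return usage[numerator] / usage[denominator]


for name, usage in TABLE_1.items():
    print(f"{name}: UAA:UAG = {usage_ratio(usage):.1f}")
```

Because both values of a ratio come from the same data set, the result does not depend on whether usage is expressed as a percentage or as a raw count.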
Final remarks
In this study we analyzed the frequency of usage of termination
codons in 264 bacterial and 1308 mammalian mitochondrial
genomes. As expected, our results for bacteria are in
accordance with published reports from small-scale studies.
In mitochondria, however, in addition to the two standard stop
codons UAA and UAG, we revealed 40 additional non-standard
stop codons with a frequency of usage varying from 0.001 % to
5 %. Assuming that the appearance of some extremely rare stop
codons might be due to data processing or sequencing errors,
we determined a reliability threshold of 0.01 %. This led to the
omission of ten extremely rare stop codons and reduction of the
number of the non-standard termination codons to 30 triplets
(their usage is shown in Table 2). Bearing in mind that these
non-standard stop codons may also serve as sense codons in
the standard genetic code, the question is by what mechanism
they are recognized as termination signals in mitochondria.
Certain codons, such as AGA/AGG in human mitochondria
are not recognized by any mt-tRNA or mt-RF and promote
termination via ribosomal frame-shifting (24). Consequently,
they may not be considered to be classical termination codons
per se. The number of the predicted non-canonical termination
codons that actually function as stop codons in mammalian
mitochondria is difficult to estimate, without experimental
support. We believe that our statistical analysis will inspire
future studies designed to shed more light on the potential of
such codons to terminate translation in mitochondria and on
the mechanism of this process.
The ICT1 protein identified in the study of Richter et al.
(16) contains an M domain, which is typical of release factors.
Within this domain, the GGQ motif is responsible for the hydrolysis of
the ester bond between the growing polypeptide chain and the
last tRNA. However, the ICT1 protein appears to be devoid
of a stop codon-recognizing NIKS domain. Hypothetically,
this protein binds to the E-site on the mitochondrial large
ribosomal subunit and cleaves the ester bond independently
of the codon type in the mitoribosomal A site (17). Another
protein, C12orf65, which is also devoid of a NIKS domain and,
therefore, hypothetically functions as a codon non-specific
release factor, has recently been described by Antonicka et al.
(2) and Smits et al. (21). Together with mtRF1a/mtRF1,
ICT1 and C12orf65 are also considered to be mitochondrial
release factors (mtRFs).
Based on our data and other published reports, we
conclude that translation termination in mammalian
mitochondria can be realized by both standard and non-standard
(non-canonical) stop codons. In principle, the latter
are sense codons in the standard genetic code but, in a specific
context, could play the role of termination signals.
To study the stop codon context in both prokaryotes and
mitochondria, we used an original program developed by
us (GTA, see above), which allows an analysis to be made
of the frequency of occurrence of nucleotide combinations
of stop codons and adjacent nucleotides (both upstream
and downstream). Thus, the frequency of usage of all
hexanucleotides representing pre-stop:stop and stop:post-stop
codons, and also the nonanucleotides pre-stop:stop:post-stop in
all bacterial and mitochondrial genomes was determined. In a
previous study, we determined the frequency of occurrence of
the pre-stop:stop codon pairs in the E. coli genome and identified
the most frequently used and missing codon pairs (4). In the
present study, we expand our analysis to 264 bacterial and 1308
mitochondrial genomes. As shown in Fig. 2 and Fig. 3, the most
frequently used nucleotides at positions -3, -2, -1 (upstream) and
+1, +2, +3 (downstream) adjacent to the stop codons in both
bacteria and mitochondria are A and U; in bacteria, A is usually
the more frequent of the two (about 30 % or higher). At position -3
in bacteria, however, the most frequent nucleotide is G
(32 %). In an experimental system, Mottagui-Tabar and Isaksson
(14) varied the nucleotides at positions -1 and -2, using the weak
stop signal UGAA, and observed a pronounced modulation
of translation termination in E. coli and B. subtilis but
not in S. typhimurium. Other studies indicate that the
nucleotides located downstream of the stop codon (+ positions)
are also important for the efficiency of translation termination
(15, 23). This finding is supported by bioinformatics analyses
of both prokaryotic and eukaryotic genomes (REF). This is in
strong agreement with the X-ray crystallography analysis by
Dalphin et al. (4) and Korostelev et al. (7), who independently
showed that eRF1 interacts not only with the stop codon situated
in the ribosomal A site, but also with the adjacent nucleotides at
positions -1, -2, +1 and +2.
In addition, the pre-stop:stop codon usage data allowed us
to determine the bias of C-terminal amino acids in bacterial
proteins. As shown in Fig. 4, the most frequently used C-terminal
amino acids in bacteria are: Lys (12.5 %), Ala and Leu (about
8 %), Arg, Glu and Ser (about 7 %), Gly (6.1 %), Asp, Asn, Ile
and Val (about 5 %), Phe and Gln (about 4 %), etc.
Fig. 4. Frequency of usage of C-terminal amino acids in bacterial proteins.
Fig. 5. Frequency of usage of C-terminal amino acids in E. coli proteins.
In terms of usage, the C-terminal amino acids can be
classified into four groups: a) frequently used (Lys, Ala, Leu,
Arg, Glu, Ser and Gly); b) moderately used (Asp, Asn, Ile, Val,
Phe, Gln and Pro); c) rare (Tyr, His and Thr), and d) avoided
(Met, Trp and Cys). As seen in Fig. 4, the group of frequently
used amino acids is represented by: hydrophilic (Lys, Arg,
Glu and Ser), hydrophobic (Ala, Leu and Gly), basic (Lys and
Arg), and acidic (Glu) α-amino acids. To some extent the same
holds true for the group of moderately used amino acids. This
makes it difficult to draw conclusions about the relationship
between chemical nature and frequency of usage of the
C-terminal α-amino acids in bacteria. It should be mentioned
that the data presented in Fig. 4 represent an average for the
264 bacterial species used in this study. The frequency of
occurrence of C-terminal amino acids in E. coli proteins alone
(Fig. 5) indicates that the individual data might substantially
deviate from the average presented in Fig. 4. For instance, the
most preferred C-terminal amino acid in E. coli is Glu (15 %)
and not Lys (12 %) as suggested by the average counts, and the
two moderately used amino acids Gln and Met in the average
distribution are absent in the E. coli proteins.
The program, source codes and Appendices are available
at: http://bio21.bas.bg/kirilov/.
Conclusions
The frequency of occurrence of stop codons as well as of
combinations of stop codons and adjacent upstream and
downstream nucleotides in 482 453 open reading frames
belonging to 264 bacterial and 1308 mammalian mitochondrial
genomes was determined by a novel program (Gene Triplet
Analysis). Based on this analysis, the following conclusions
can be drawn:
• The most frequently used termination codon in both
bacteria and mammalian mitochondria is the standard
termination codon UAA.
• Besides the two standard stop codons (UAA and UAG),
30 other non-standard termination codons are found in
mammalian mitochondria.
• The preferential nucleotides in all three positions (±1 to
±3) adjacent to the termination codons in both bacteria
and mammalian mitochondria are A and U.
• The most frequently used pre-stop:stop codon pairs in
mammalian mitochondria are AAA:UAA (6.044 %),
GAA:UAA (2.724 %) and AAU:UAA (2.054 %).
• The most common stop:post-stop codon pairs in
mammalian mitochondria are UAA:AAA (3.53 %),
UAA:UUU (2.42 %), UAA:AAU (2.11 %), UAA:UUA
(1.78 %), and UAA:AUU (1.63 %).
• The most frequently used C-terminal amino acids in
bacteria are Lys, Ala, Leu, Arg, Glu, Ser, Gly.
• The most frequently used C-terminal amino acids in
mammalian mitochondria are Glu, Lys, Ala, Arg, Leu,
Gly, Ser.
• The most avoided C-terminal amino acids in bacteria
are Met, Trp and Cys.
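The codon-pair percentages listed above can be computed along the following lines. This is a hedged Python sketch: the function name `stop_pair_usage` and the input format (one pre-stop, stop, post-stop triple per reading frame) are hypothetical, and the denominator is assumed to be the total number of reading frames counted, which the text does not state explicitly.

```python
from collections import Counter


def stop_pair_usage(triples):
    """Return (pre-stop:stop, stop:post-stop) pair usage as percentages.

    `triples` is an iterable of (pre_stop, stop, post_stop) codon strings,
    one per reading frame (hypothetical input format).
    """
    pre_pairs, post_pairs = Counter(), Counter()
    total = 0
    for pre_stop, stop, post_stop in triples:
        pre_pairs[f"{pre_stop}:{stop}"] += 1
        post_pairs[f"{stop}:{post_stop}"] += 1
        total += 1
    if total == 0:
        return {}, {}

    def as_percent(counter):
        # Assumption: percentages are relative to all reading frames counted.
        return {pair: 100.0 * n / total for pair, n in counter.most_common()}

    return as_percent(pre_pairs), as_percent(post_pairs)
```

Because the stop codon is taken from the annotation and not from a fixed list, the same function also tallies non-standard termination codons.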
Acknowledgements
This study was supported by Grant No. IDEAS 02-30/2009
from the National Science Fund of Bulgaria.
References
1. Anjay A. (2012) National Center for Biotechnology Information
(NCBI), Bethesda, Maryland, U.S.A.
2. Antonicka H., Ostergaard E., Sasarman F., Weraarpachai
W., et al. (2010) Am. J. Hum. Genet., 87, 115-122.
3. Bossi L., Roth J.R. (1980) Nature, 286, 123-127.
4. Boycheva S., Chkodrov G., Ivanov I. (2003) Bioinformatics,
19, 987-998.
5. Boycheva S.S., Bachvarov B.I., Berzal-Heranz A., Ivanov I.G.
(2004) Curr. Microbiol., 48, 97-101.
6. D’Aurelio M., Gajewski C.D., Lenaz G., Manfredi G. (2006)
Hum. Mol. Genet., 15, 2157-69.
7. Hunter S.E., Spremulli L.L. (2004) Mitochondrion, 4, 21-29.
8. Janosi L., Mottagui-Tabar S., Isaksson L.A., Sekine Y., et al.
(1998) EMBO J., 17, 1141-1151.
9. Janosi L., Shimizu I., Kaji A. (1994) P. Natl. Acad. Sci. USA,
91, 4249-4253.
10. Kirilov K., Ivanov I. (2012) Biotechnol. Biotech. Eq., 26, 3310-3314.
11. Korostelev A., Asahara H., Lancaster L., Laurberg M., et al.
(2008) P. Natl. Acad. Sci. USA, 105, 19684-19689.
12. Laurberg M., Asahara H., Korostelev A., Zhu J., et al. (2008)
Nature, 454, 852-857.
13. Leger M., Dulude D., Steinberg S.V., Brakier-Gingras L.
(2007) Nucleic Acids Res., 35, 5581-5592.
14. Mottagui-Tabar S., Isaksson L.A. (1998) Gene, 212, 189-196.
15. Poole E.S., Brown C.M., Tate W.P. (1995) EMBO J., 14, 151-158.
16. Richter R., Rorbach J., Pajak A., Smith P.M., et al. (2010)
EMBO J., 29, 1116-1125.
17. Rorbach J., Richter R., Wessels H.J., Wydro M., et al. (2008)
Nucleic Acids Res., 36, 5787-5799.
18. Sayers E.W., Barrett T., Benson D.A., Bolton E., et al. (2012)
Nucleic Acids Res., 40, D13-25.
19. Schluenzen F., Tocilj A., Zarivach R., Harms J., et al. (2000)
Cell, 102, 615-623.
20. Sharp P.M., Li W.H. (1987) Nucleic Acids Res., 15, 1281-1295.
21. Smits P., Antonicka H., van Hasselt P.M., Weraarpachai W.,
et al. (2011) Eur. J. Hum. Genet., 19, 275-279.
22. Soleimanpour-Lichaei H.R., Kuhl I., Gaisne M., Passos J.F.,
et al. (2007) Mol. Cell, 27, 745-757.
23. Tate W.P., Poole E.S., Dalphin M.E., Major L.L., et al. (1996)
Biochimie, 78, 945-952.
24. Temperley R., Richter R., Dennerlein S., Lightowlers R.N.,
Chrzanowska-Lightowlers Z.M. (2010) Science, 327, 301.
25. Welch M., Govindarajan S., Ness J.E., Villalobos A., et al.
(2009) PLoS One, 4, e7002.