Compilation of tRNA sequences and sequences of tRNA genes

©1993 Oxford University Press
Nucleic Acids Research, 1993, Vol. 21, No. 13
3011-3015
Sergey Steinberg+, Armin Misch and Mathias Sprinzl*
Laboratorium für Biochemie, Universität Bayreuth, Postfach 10 12 51, 8580 Bayreuth, Germany
INTRODUCTION
This compilation, which covers the literature up to December
1992, contains 2011 sequences of tRNAs and tRNA genes, including
305 sequences published since the previous edition (1991) [1].
Mutant tRNAs and their genes are not included in the compilation.
Sequences of tRNAs originating from transformed or differentiated
cells are considered as a separate entry only if they differ from
those of the cells from which they are derived.
A summary of all sequences in the compilation is given in Table
1, where the sequences are listed by source, i.e. organism or
organelle. Each source is specified by a four-digit code: the first
three digits identify the organism (these are the codes listed in
Table 1) and the last digit specifies the isoacceptor. The table
also gives the (abbreviated) name of the organism from which each
sequence was derived.
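The code scheme above can be decoded mechanically. The following is a minimal Python sketch, assuming the code is handled as a four-character string; the function names are hypothetical, and the group ranges are transcribed from the headings of Table 1 (viruses 000-029 through eukaryotic cytoplasm 600-999).

```python
# Top-level source ranges, transcribed from the group headings of Table 1.
SOURCE_RANGES = [
    (0, 29, "Viruses"),
    (30, 109, "Archaebacteria"),
    (110, 239, "Eubacteria"),
    (240, 359, "Chloroplasts"),
    (360, 599, "Mitochondria"),
    (600, 999, "Eukaryotic cytoplasm"),
]


def split_source_code(code: str) -> tuple[int, int]:
    """Split a four-digit source code into (organism code, isoacceptor digit)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return int(code[:3]), int(code[3])


def source_group(organism: int) -> str:
    """Return the Table 1 group whose range contains the organism code."""
    for low, high, name in SOURCE_RANGES:
        if low <= organism <= high:
            return name
    raise ValueError(f"organism code {organism} is outside 000-999")


# Hypothetical code: E.COLI is listed as 166 in Table 1; the trailing
# isoacceptor digit 2 is made up for illustration.
organism, isoacceptor = split_source_code("1662")
print(organism, isoacceptor, source_group(organism))  # 166 2 Eubacteria
```

Finer subdivisions (e.g. mitochondria of animals, 460-599) could be added as further ranges in the same list.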
The sequences, references and footnotes for the tRNAs and tRNA
genes included in the database are deposited in the EMBL Data
Library. The references are restricted to the first complete
publication of each sequence unless additional information (e.g.
base modifications, corrections) was obtained later; in such cases,
additional references were added.
To facilitate computer analysis, the presentation of the sequences
has been changed starting with this edition. Previously, we used
the sequences, annotations and alignments strictly as they were
published in the original literature. The only changes made
concerned the terminology of modified bases and the numbering of
the nucleotides, according to the rules adopted at the Cold Spring
Harbor tRNA Meeting 1979 [2]. Starting with this edition, a new
alignment is used, which is most compatible with tRNA phylogeny
and three-dimensional structure. In particular, the alignment of
the nucleotide residues in the variable region and the alignment of
unusual mitochondrial tRNAs were altered. The tRNAs coding for
selenocysteine were treated as a separate group.
As in the previous edition [1], this publication does not contain
a sequence printout; instead, the sequences have been deposited
with the EMBL Data Library. This publication should therefore be
cited as the reference for data obtained from the electronically
accessible EMBL database.
Information on how to access the sequence files can be obtained
by electronic mail: send a message to [email protected] containing the commands 'Help' and 'HELP
TRNA'. The help file contains all the information needed to
obtain the requested sequences. The tRNA database is also
available via anonymous FTP from FTP.EMBL-Heidelberg.DE
in the directory /pub/databases/trna, and it is distributed quarterly
on the EMBL CD-ROM. Contact: EMBL Data Library,
Postfach 10.2209, 6900 Heidelberg, Germany. Fax: +49 6221
387591, E-mail: [email protected]
Researchers without access to electronic mail who wish to obtain
the sequence information on a floppy disk or as a hard copy
should contact M.Sprinzl, Laboratorium für Biochemie,
Universität Bayreuth, Postfach 101251, D-8580 Bayreuth,
Germany, Fax: +49 921 552432, E-mail: [email protected]
Presentation of sequences
The sequences are divided into three parts. The first two parts
contain, respectively, the sequences of tRNA genes and of tRNAs
that can be fitted into the canonical tRNA structure and the
revised numbering system shown in Fig. 1. The third part contains
tRNA and tRNA gene sequences, mainly from animal mitochondria,
whose secondary structures differ from those of most tRNAs or
have not yet been established.
Each sequence in the compilation occupies two consecutive
lines. The first line begins with the letter 'D' or 'R' and contains
the six-position identification code of the sequence: 'D' or 'R'
for DNA or RNA, respectively; the one-letter code of the amino
acid (X for initiator methionine, Z for selenocysteine); and the
four-digit source code (Table 1) specifying the organism. This is
followed by the anticodon sequence (for tRNA sequences, in its
modified form), the name and kingdom of the organism (Table 1),
the sequence (99 standard positions) and a short footnote giving
the brief information needed to explain features of the sequence.
The second line begins with the sign '+' and contains information
about base pairing (double-helical regions only). All other lines
in the compilation begin with signs other than 'D', 'R' or '+'
(usually '•') and contain comments.
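The two-line record layout can be read with a small loop that pairs each 'D'/'R' line with the '+' line after it and skips comment lines. This is a minimal Python sketch, assuming the identification code occupies the first six characters with no separators; the exact column layout of the remaining fields is not specified here, so the sketch keeps them as a raw string. The class, function and sample lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    molecule: str     # "DNA" (gene) or "RNA" (tRNA)
    amino_acid: str   # one-letter code; X = initiator Met, Z = selenocysteine
    source_code: str  # four-digit source code (Table 1)
    body: str         # rest of the first line: anticodon, name, sequence, footnote
    pairing: str      # the '+' line: base-pair marks for double-helical regions


def parse_records(lines):
    """Pair each 'D'/'R' line with the '+' line that follows; skip comments."""
    records = []
    pending = None
    for line in lines:
        first = line[:1]
        if first in ("D", "R"):
            pending = line
        elif first == "+" and pending is not None:
            ident = pending[:6]  # assumed: identification code in columns 1-6
            records.append(Record(
                molecule="DNA" if ident[0] == "D" else "RNA",
                amino_acid=ident[1],
                source_code=ident[2:6],
                body=pending[6:].strip(),
                pairing=line[1:],
            ))
            pending = None
        # any other first character marks a comment line, which is ignored
    return records


# Made-up input lines (not real database entries).
sample = [
    "• comment line",
    "RA1662 UGC E.COLI ...",
    "+ =======",
    "DX0220 CAU PHAGE T4 ...",
    "+ =======",
]
for rec in parse_records(sample):
    print(rec.molecule, rec.amino_acid, rec.source_code)
```

With the sample input this prints one line per record (`RNA A 1662`, then `DNA X 0220`); the comment line is skipped.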
In all sequences, the nucleotides involved in forming the
secondary structure are marked by the sign '=' (Watson-Crick
pairs) or '*' (GU pairs). Nucleotides 26 and 44 are considered to
form a base pair that is included in the anticodon stem
(Fig. 1).
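The marking rule can be expressed as a function from two paired nucleotides to the sign used in the '+' line. The sketch below is illustrative only: it handles the standard letters A, C, G, T and U, and treating a G-T pair in gene (DNA) sequences like G-U is an assumption not stated in the text. The function name is hypothetical.

```python
# Base pairs marked '=' (Watson-Crick) in the compilation.
WATSON_CRICK = {frozenset("AU"), frozenset("AT"), frozenset("GC")}
# Pairs marked '*'. G-U comes from the text; treating G-T in gene (DNA)
# sequences the same way is an assumption of this sketch.
WOBBLE = {frozenset("GU"), frozenset("GT")}


def pair_mark(a: str, b: str) -> str:
    """Return '=', '*' or ' ' for two standard nucleotides."""
    pair = frozenset((a.upper(), b.upper()))
    if pair in WATSON_CRICK:
        return "="
    if pair in WOBBLE:
        return "*"
    return " "  # mismatch, or a modified symbol this sketch does not decode


# Hypothetical residues at positions 26 and 44, which the compilation
# counts as a base pair belonging to the anticodon stem.
print(repr(pair_mark("G", "C")), repr(pair_mark("G", "U")), repr(pair_mark("A", "G")))
```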
The sequences in original publications denoted as 'yeast' are
now assigned to Saccharomyces cerevisiae. The reader should
* To whom correspondence should be addressed
+ On leave from Engelhardt Institute of Molecular Biology, Vavilova 32, Moscow 117984, Russia
Figure 1. Numbering of nucleotides in tRNAs. Circles represent nucleotides which
are always present; ovals represent nucleotides which are not present in every structure:
these are the nucleotides before position 1 at the 5'-end, before and after the
two invariant GMP residues 18 and 19 in the D-loop, and the nucleotides in the
variable loop. A nucleotide added at a given site is indicated by the number
of the preceding nucleotide followed by a colon and a letter in alphabetical order.
The nucleotides in the variable stem have the prefix 'e' and are located between
positions 45 and 46, obeying the base-pairing rules. The nucleotides in the 5'-strand
and the 3'-strand are numbered e11, e12, e13, ... and e21, e22,
e23, ..., respectively; the second digit identifies the base pair. In the case of
a long variable region, the loop can be formed by up to five nucleotides: e1,
e2, e3, e4 and e5.
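The insertion rule in the Figure 1 caption (number of the preceding nucleotide, a colon, then a letter in alphabetical order) can be sketched as follows. This is a hypothetical helper, assuming at most 26 insertions at a single site.

```python
from string import ascii_lowercase


def insertion_labels(preceding: int, count: int) -> list[str]:
    """Labels for `count` nucleotides inserted after standard position `preceding`."""
    if not 0 <= count <= len(ascii_lowercase):
        raise ValueError("count must be between 0 and 26")
    return [f"{preceding}:{ascii_lowercase[i]}" for i in range(count)]


# Hypothetical D-loop insertions around the invariant G18 and G19:
# one after position 17 and two after position 20.
print(insertion_labels(17, 1), insertion_labels(20, 2))
```

Here the call prints `['17:a'] ['20:a', '20:b']`.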
Table 1:
THERMOFIL. PENDENS
THERMOPROT. TENAX
096
098
EUBACTERIA
110-239
MYCOPLASMA CAPRIC.
114
MYCOPLASMA MYCOID.
MYCOPLASMA PNEUMO.
MYCOPLASMA PG50
ACHOLEPLASMA LAID.
118
120
122
123
SPIROPLASMA CITRI
SPIROPLASMA MELIF.
STREPTOMYCES COEL.
STREPTOMYCES RIM.
STREPTOMYCES LIV.
STREPTOMYCES AMBO.
MYCOBACT. TUBERC.
LACTOBAC. BULG.
LACTOBAC.DELBRUEC.
BACILLUS SUBTILIS
125
126
131
134
135
136
140
150
152
154
BACILLUS CIRCULANS
BACILLUS SP. PS3
THERMUS THERMOPHI.
THIOBACILLUS FERRO
E.COLI
156
157
158
162
166
SALMONELLA TYPHI.
PHOTOBACT. PHOSPH.
PHOTOBAC. LEIOGNA.
AEROMONAS HYDROPH.
PSEUDOMONAS AER.
CAMPYLOBAC.JEJUNI
CAULOBACTER CRES.
RHIZOBIUM MELILOTI
BORDETELLA PERTUS.
HAEMOPHILUS INFLU.
ANACYSTIS NIDULANS
CYANOPHORA PARAD.
PYLAIELLA LITTORA.
170
174
175
178
182
186
190
194
198
200
210
218
222
GM
AALLX
ACDEFGHUKKLLMNP
QRRSSTTVWWXY
ADEFGIMNPRRSTVX
GKLQY
KL
ACDEFGHKKLLLMMN
QRSSTVW
SWW
ACDFIMPRSX
L
EQQXX
CGKNNW
P
P
DEGNPRSV
S
AAAACDEFFGGGHHn
KLLLLLMMNNPQRSS
STTTVWXY
P
DENSV
GGTTY
AI
AACDEFGGGHIIKLLL
LLMNPPPQQRRRRRSS
SSTTTTTVVVWXXYYZ
HLPR
HP
LM
HLPR
AGTTTY
AI
AI
L
L
GKL
AI
AEGILS
AI
ORGANELLES
PART ONE: Sequences of tRNA genes
Source
Code
VIRUSES
000-029
PHAGE T4
PHAGE T5
022
026
ARCHAEBACTERIA
030-109
ARCHAEGLOBUS FULG.
HALOBACTERIUM CUT.
HALOBACTERIUM HAL.
HALOBACTERIUM MAR.
HALOBACTERIUM MED.
HALOBACTERIUM VOL.
METHANOBAC.FORMI.
METHANOBAC.THERM.
METHANOCOC.VANI.
METHANOTHRIX SOEH.
METHANOTHERM. FER.
RUMINOBACTER AMYLO
METHANOCOC.VOLTAE
METHANOSPIR. HUNG.
SULFOLOBUS SOLFA.
THERMOPLASMA ACID.
THERMOCOCCUS CELER
034
038
042
044
046
050
058
062
066
067
068
070
074
078
086
090
094
tRNA genes
GILPQRST
AADGHKLMPQSSTVX
A
AC
A
LS
W
CW
A
A
ADEFHDCLNPQRTTVY
A
ADEHIKLMNPST
E
DKPTY
A
FGLSVX
M
A
CHLOROPLASTS
240-359
STREPTOCOCCUS PN.
CYANOPHORA PARAD.
PYLAIELLA LITTORA.
CHLAMYDOMONAS REIN
CHLAMYDOMO. MOEWU.
CHLORELLA ELLIPSO.
CUCUMIS SATIVUS
EUGLENA GRACILIS
224
240
241
244
246
248
250
252
CRYPTOMONAS SPEC.
SPIROGYRA MAXIMA
ANTITHAMNION SP.
CYANIDIUM CALDAR.
OLISTHODISCUS LUT.
MARCHANTIA POLYM.
254
255
257
258
259
260
CUSCUTA REFLEXA
COLEOCHAETE ORBIC.
HORDEUM VULGARE
TRITICUM AESTIVUM
ORYZA SATIVA
261
262
264
268
270
ZEA MAYS
EPIFAGUS VIRGINIA.
ARABIDOPSIS THAL.
BRASSICA OLERACEA
272
274
276
280
A
AI
AI
ACDEGIRW
T
AIRS
E
AACDEFGGHKLLLMN
PQRSSTVWXY
AIR
I
AI
ADC
AI
ACDEFGGHHKLLLMN
PPQRRRSSSTTWWXY
M
AI
GGMSTVX
CDEGGMPRSTWXY
ACDEFGGHIILLLMMN
PQRRSSSTTWWY
ACFHILLMNPRSSSTWW
LNR
IM
L
GLYCINE MAX
MEDICAGO SATIVA
NICOTIANA TABACUM
284
288
292
NICOTIANA DEBNEYI
OENOTHERA SP.
GOSSYPIUM HIRSUTUM
PELARGONIUM ZONALE
PENNISETUM AMERICA
PETUNIA HYBRIDA
PISUM SATIVUM
PINUS THUNBERGII
PINUS CONTORTA
SINAPIS ALBA
SPINACIA OLERACEA
SPIRODELA OLIGORH.
VICIA FABA
SORGHUM BICOLOR
296
300
302
304
308
312
320
322
323
324
328
332
336
340
MITOCHONDRIA
360-599
SINGLE CELL ORGANISMS
CHLAMYDOMO. REINH.
PARAMECIUM PRIM.
PARAMECIUM TETRA.
PARAMECIUM AURELIA
TETRAHYMENA PYRIF.
TETRAHYMENA THERM.
ASPERGILLUS NIDUL.
NEUROSPORA CRASSA
PODOSPORA ANSERINA
SACCHAROMYCES CER.
SACCHAROMYCES EXI.
PICHIA PIJPERI
WILLIOPSIS MRAKII
SCHIZOSACCHA.POM.
KLUYVEROMYCES LAC.
CANDIDA PARAPSILO.
HANSENULA WINGEI
TORULOPSIS GLAB.
AIMV
H
ACDEFGGHmKLLLM
NPQRRSSSTTWWXY
H
PW
H
R
I
H
DEGHKLNPRRSTVWXY
DCQ
HK
HKQSV
ACDEHIILMRSSTTVY
NRR
EFHLLTY
L
AND FUNGI 360-419
364
MQW
372
XY
376
WY
377
FWY
380
EFHLWX
384
LXY
388
ACCDEFGGHIKLLMMN
PQRSSTVWXY
392
ACMR
396
DMNSVW
400
AACDEFGHKLMNPQR
RSSTTWWXYY
401
MP
402
LMM
403
KLPQS
404
GHLPQ
405
CKLQ
406
P
407
CEGLPQTWW
408
ACDEFGHIKLMNPQRS
STTVWXY
PLANTS 420-459
ARABIDOPSIS THAL.
GLYCINE MAX
SOLANUM LYCOPERS.
LUPINUS LUTEUS
BRASSICA NAPUS
OENOTHERA SP.
PHASEOLUS VULGARIS
TRITICUM AESTIVUM
ZEA MAYS
MARCHANTIA POLYM.
424
428
430
432
434
436
440
444
448
450
ANIMALS
FASCIOLA HEPATICA
MYTILUS EDULIS
460-599
462
470
ARTEMIA SP.
LOCUSTA MIGRATORIA
AEDES ALBOPICTUS
DROSOPHILA MELANO.
DROSOPHILA YAKUBA
DROSOPHILA YAKUBA
DROSOPHILA VIRILIS
PISASTER OCHRACEUS
ASTERINA PECTINI.
ASTERIAS FORBESH
PARACENTROTUS LIV.
472
476
480
484
488
492
496
498
500
502
504
RAINBOW TROUT
506
EMSSY
EMX
C
GINX
K
FGHLSSSWXY
NSY
CDEFKNPQQSSSWXY
CDEHKMMPSSWXY
ACDEFGGHIKLLLMMN
PQRRRSSTVWY
ADIKNPW
ACDEFGHDCLLMMNPQ
STVWY
E
DGKLLS
AEFGLNRV
CDGKLWY
ACDEFGHDCLNPQRTVWXY
LS
IQX
ACDEGLLNPQTVWXY
ACDGHLLMNPQSVWY
ACDGLLNVWXY
ACDEFFGHIKLLNPQR
STVWXY
FPT
STRONGYLOCEN.PURP.
508
ACIPENSER TRANSM.
GADUS MORHUA
XENOPUS LAEVIS
509
510
512
RANA CATESBEIANA
CEPHALORHYN.COM.
CHICKEN
516
520
522
RAT
528
MOUSE
532
BOVINE
536
GREEN MONKEY
MACACA FUSCATA
MACACA MULATTA
MACACA FASCICULA.
MACACA SYLVANUS
SAIMIRI SCIUREUS
TARSIUS SYRICHTA
LEMUR CATTA
CHIMPANZEE
GIBBON
GORILLA
ORANG UTAN
HUMAN
540
544
548
552
556
560
564
568
572
576
580
584
588
AEPYCEROS MELAMPUS
BOSELAPHUS TRAGOC.
CEPHALOPHUS MAXW.
DAMALISCUS DORCAS
GAZELLA THOMSONI
KOBUS ELLIPSIPRYM.
MADOQUA KIRKI
ORYX GAZELLA
TRAGELAPHUS IMBER.
590
591
592
593
594
595
596
597
598
ACDEFGHIKLLNPQRS
TVWXY
PT
ACFGHKLNRWY
ACDEFFGHIKLLPQRS
TVWXY
AGFILNPQTWXY
FPT
ACDEFGHIKLLMNPQR
STVWY
ACCDDEFGHKKLLNN
NPPQQRTTVWWXXY
ACDEFGHIKLLNPQRT
VWXY
ACDEFGHIKLLNPQRT
VWXY
F
HL
HL
HL
HL
HL
HL
HL
HL
HL
HL
HL
ACDEFGHIKLLNPQRT
VWXY
FV
FV
FV
FV
FV
FV
FV
FV
FV
EUKARYOTIC CYTOPLASM 600-999
SINGLE CELL ORGANISMS
TRYPANOSOMA BRUCEI
TETRAHYMENA PYRIF.
DICTYOSTELIUM DIS.
NEUROSPORA CRASSA
PHYTOPHTHORA PAR.
PODOSPORA ANSERINA
SACCHAROMYCES CER.
AND FUNGI 600-669
605
KKKNNQQRRRTY
606
NQS
616
AEEHKKLMNQRRSSST
TWWWY
620
FL
622
D
624
SS
628
AACDEEFFGHIIKKLL
MNPQQRRRSSSSSTT
WWXXY
SCHIZOSACCHA.POM
632
ADEEFHDCRRSSSVXX
PLANTS 670-749
ARABIDOPSIS THAL.
GLYCINE MAX
PHASEOLUS VULGARIS
NICOTIANA RUSTICA
PETUNIA SP.
SORGHUM BICOLOR
ORYZA SATIVA
TRITICUM AESTIVUM
TRITICUM VULGARE
674
690
698
706
710
714
718
720
724
AFSSSSSSVWWXYYYY
DMX
LPP
ANIMALS 750-999
CAENORHABDI. ELEG.
BOMBYX MORI
DROSOPHILA MELANO.
756
768
774
DROSOPHILA SIMUL.
XENOPUS LAEVIS
CHICKEN
780
792
804
DKLPRWXZ
AAEGK
ADEEEFGGHJKKLLMN
PRRSSTWXYZ
S
AFKLNVXXYYYZ
KPPWZ
Y
N
G
G
Y
S
Table 1, continued....
Source
Code
tRNA genes
MOUSE
RAT
BOVINE
HUMAN
810
916
928
999
ACCDEGHKKLPPX
DDEEEFGGKLLLPP
SZ
EEGGKKLLNNPPQQQS
SSSTTWWWXXYY
PART TWO: tRNA Sequences
Code
tRNA
010
014
018
022
026
M
W
PP
GILPQRST
DHLNPQ
HALOBACTERIUM CUT.
HALOBACTERIUM VOL.
038
050
HALOCOCCUS MORRHUA
METHANOBAC.THERM.
SULFOLOBUS ACIDO.
THERMOPLASMA ACID.
054
062
082
090
AGHNQRSTVWX
AAACDEEFGGGGHHK
KLLLLLMNPPPQRRRS
SSTTWWXY
X
GN
X
MX
Source
VIRUSES 000-029
TETRAHYMENA PYRIF.
TETRAHYMENA THERM.
NEUROSPORA CRASSA
SACCHAROMYCES CER.
380 FY
384 W
392
400
ALLTVWXY
FGHKLMPRRSSSTWXY
PLANTS 420-459
SOLANUM TUBEROSUM
LUPINUS LUTEUS
PHASEOLUS VULGARIS
431
432
440
IL
I
FLLLLMPWXY
ANIMALS 460-599
AEDES ALBOPICTUS
HAMSTER
RAT
BOVINE
480
524
528
536
DEGKQRVX
DKR
DDFKLLLRWW
EGIKLLRTVWX
EUKARYOTIC CYTOPLASM 600-999
AVIAN ONCO.-VIRUS
CHICKEN ASV/AMV/RS
MOUSE M-MULV
PHAGE T4
PHAGE T5
ARCHAEBACTERIA 030-109
EUBACTERIA 110-239
MYCOPLASMA CAPRIC.
114
MYCOPLASMA MYCOID.
SPIROPLASMA CITRI
STREPTOMYCES GRIS.
STREPTOMYCES COEL.
STAPHYLOCOC. EPID.
MYCOBAC. SMEG.
BACILLUS STEARO.
BACILLUS SUBTILIS
118
125
130
131
138
142
146
154
THERMUS THERMOPHI.
E.COLI
158
166
SALMONELLA TYPHI.
RHODOSPIRIL. RUB.
AGMENELLUM QUADR.
ANACYSTIS NIDULANS
SYNECHOCYSTIS SP.
170
202
206
210
214
ACDEFGHUKKLLLMN
PQRRSSTTVWWXY
AGIPSTVX
WW
X
G
GG
X
FLVY
AFGKKLMPRSSSTVW
XYY
FIMXX
AAACDEEEFGGGHm
KLLLMNQQRRRRRSSS
SSTTVWWXXYYZ
GGHLPPP
FL
F
LLX
E
CHLAMYDOMONAS REIN
EUGLENA GRACILIS
CODIUM FRAGILE
SCENEDESMUS OBLIQ.
HORDEUM VULGARE
TRITICUM AESTIVUM
ZEA MAYS
GLYCINE MAX
PHASEOLUS VULGARIS
SPINACIA OLERACEA
244
252
253
256
264
268
272
284
316
328
E
F
GKMR
MXY
EQ
E
I
MI.
FLLLWX
FIILMPTVWX
MITOCHONDRIA
360-599
ORGANELLES
SINGLE CELL ORGANISMS AND FUNGI 600-669
EUGLENA GRACILIS
TETRAHYMENA THERM.
SCENEDESMUS OBLIQ.
NEUROSPORA CRASSA 620
SACCHAROMYCES CER.
604
608
612
FX
628
DF
QQQX
FXY
SCHIZOSACCHA.POM
TORULOPSIS UTILIS
632
636
ACDEFFGGHHIKKLLL
MNPPRRRSSSTTVWWXY
EFY
AILPVXY
PLANTS 670-749
HORDEUM VULGARE
WHEAT GERM
BRASSICA NAPUS
LUPINUS LUTEUS
PHASEOLUS VULGARIS
PISUM SATIVUM
SPINACIA OLERACEA
NICOTIANA RUSTICA
SOLANUM TUBEROSUM
678
682
686
694
698
702
704
706
707
EEF
FGKMRWXYY
F
EFGHIMNPSVXY
LLLLX
F
S
YY
L
ANIMALS 750-999
CAENORHABDI. ELEG.
ASTERINA AMURENSIS
BOMBYX MORI
DROSOPHILA MELANO.
EUPHAUSIA SPERBA
XENOPUS LAEVIS
SALMON LIVER
CHICKEN
MOUSE
RAT
RABBIT LIVER
BOVINE
CALF LIVER
COW MAMMARY GLAND
SHEEP LIVER
HUMAN
756
762
768
774
786
792
798
804
810
916
922
928
934
940
946
999
L
X
AAFFGG
EFHKKSSSWVXY
X
DFX
X
w
EFFFIKKMQQRRVXZ
DDEKKKLLNNQSSSWX
DFKKKMV
DFFLNQRRRSTWYZZ
F
LL
HX
AAEFGGHLMNNQQSV
XYYZ
CHLOROPLASTS 240-359
SINGLE CELL ORGANISMS AND FUNGI 360-419
PART THREE: tRNA and tRNA gene sequences which differ from the
conventional alignment
Source
Code
MITOCHONDRIA
360-599
tRNA/tRNA gene
SINGLE CELL ORGANISMS AND FUNGI 360-419
TRYPANOSOMA BRUCEI 368 AA
ANIMALS 460-599
FASCIOLA HEPATICA
ASCARIS SUUM
462
464
ACDEFGHIKLLNPQRS
TVWXYSS
CAENORHABDI.ELEG.
468
MYTILUS EDULIS
ARTEMIA SP.
AEDES ALBOPICTUS
DROSOPHILA YAKUBA
ASTERINA PECTINI.
PARACENTROTUS LIV.
STRONGYLOCEN.PURP.
GADUS MORHUA
XENOPUS LAEVIS
CHICKEN
HAMSTER
RAT
MOUSE
BOVINE
MACACA FUSCATA
MACACA MULATTA
MACACA FASCICULA.
MACACA SYLVANUS
SAIMIRI SCIUREUS
TARSIUS SYRICHTA
LEMUR CATTA
CHIMPANZEE
GIBBON
GORILLA
ORANG UTAN
HUMAN
470
472
480
488
500
504
508
510
512
522
524
528
532
536
544
548
552
556
560
564
568
572
576
580
584
588
ACDEFGHDCLLNPQRT
VWXYSS
SR
F
SS
S
S
S
S
S
SN
S
s
sss
SS
sssss
s
s
s
s
s
s
s
s
s
s
s
sss
EUKARYOTIC CYTOPLASM 600-999
SINGLE CELL ORGANISMS AND FUNGI 600-669
V
TRYPANOSOMA BRUCEI
605
be aware, however, that some of these organisms may have
been misclassified and should consult the original literature.
In contrast to all previous editions, this compilation uses a
one-letter code for all nucleotides, including modified ones.
For the standard nucleotides adenosine, cytidine, guanosine,
thymidine and uridine, the usual abbreviations A, C, G, T and
U, respectively, are used. Modified nucleotides are designated by
the remaining ASCII characters, as defined at the beginning of
the sequence data. Empty positions are indicated by a dash. All
nucleotide insertions are denoted by underlining at the place of
insertion, with a corresponding footnote at the end of the sequence.
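Because every position is a single character, an aligned sequence can be scanned character by character. The following minimal Python sketch separates standard nucleotides, empty positions ('-') and everything else. It does not decode modified nucleotides, since their symbol table is defined in the sequence data itself, and the sample fragment (with '#' standing in for a modified nucleotide) is made up.

```python
STANDARD = set("ACGTU")  # adenosine, cytidine, guanosine, thymidine, uridine
GAP = "-"                # empty alignment position


def ungapped(aligned: str) -> str:
    """Drop empty positions to recover the plain nucleotide sequence."""
    return aligned.replace(GAP, "")


def classify(aligned: str):
    """Count standard nucleotides and gaps; list (column, symbol) of the rest.

    Any other character is treated as a modified nucleotide; decoding it
    requires the symbol table given at the start of the sequence data.
    """
    standard = sum(1 for c in aligned if c in STANDARD)
    gaps = aligned.count(GAP)
    modified = [(i, c) for i, c in enumerate(aligned)
                if c not in STANDARD and c != GAP]
    return standard, gaps, modified


# Made-up aligned fragment; '#' stands in for some modified nucleotide.
fragment = "GC-G#AU-"
print(ungapped(fragment), classify(fragment))
```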
Numbering and alignment of the variable region
The alignment of the variable region has been modified. In
accordance with [3], the extra arm is now placed between
nucleotides 45 and 46, rather than between 47 and 48 as was done
previously [1]. The extra arm now comprises a stem, formed by two
double-helical strands, and a loop. The annotations of the
nucleotides in the extra-arm positions begin with the letter 'e'
(extra) followed by a one- or two-digit number. Space is reserved
for 7 base pairs in the stem and 5 nucleotides in the loop.
The nucleotides in the loop are numbered from 1 to 5, whereas
the nucleotides in the stem are numbered from 11 to 17
(5'-branch) and from 27 to 21, in reverse order (3'-branch),
to indicate base-pair formation between nucleotides 11-21,
12-22, etc. (Fig. 1). In tRNAs with deletions in positions
45-48, these positions are filled in the order 48, 46, 47, 45; i.e.,
a tRNA uses positions 48, 46, 47 and 45 for the first, second,
third and fourth nucleotide, respectively, depending on the length
of the sequence in this region. A similar situation occurs in tRNAs
without a long extra arm, where the highly variable position 47
is deleted in many sequences.
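Both rules of this section can be written out as short functions: one generates the extra-arm labels in 5'-to-3' order (5'-strand e11 upward, loop e1 upward, 3'-strand counting back down to e21 so that e1k pairs with e2k), the other returns which of positions 45-48 are occupied when only some of them are present. This is a sketch under the stated limits (at most 7 stem pairs and 5 loop nucleotides); the function names are hypothetical.

```python
MAX_STEM_PAIRS = 7   # space reserved for 7 base pairs in the extra stem
MAX_LOOP = 5         # and up to 5 nucleotides in the extra loop
FILL_ORDER = (48, 46, 47, 45)  # order in which positions 45-48 are used


def extra_arm_labels(stem_pairs: int, loop_size: int) -> list[str]:
    """Extra-arm labels in 5'->3' order (the arm sits between 45 and 46)."""
    if not (0 <= stem_pairs <= MAX_STEM_PAIRS and 0 <= loop_size <= MAX_LOOP):
        raise ValueError("stem_pairs must be 0-7 and loop_size 0-5")
    five_prime = [f"e1{k}" for k in range(1, stem_pairs + 1)]
    loop = [f"e{k}" for k in range(1, loop_size + 1)]
    # the 3'-strand runs in reverse so that e1k pairs with e2k
    three_prime = [f"e2{k}" for k in range(stem_pairs, 0, -1)]
    return five_prime + loop + three_prime


def positions_45_48(n: int) -> list[int]:
    """Positions occupied when only n nucleotides are present in 45-48."""
    if not 0 <= n <= len(FILL_ORDER):
        raise ValueError("n must be between 0 and 4")
    return sorted(FILL_ORDER[:n])  # sorted back into sequence order


print(extra_arm_labels(3, 4))
print([positions_45_48(n) for n in range(5)])
```

For a stem of three pairs and a four-nucleotide loop, the first call yields `e11, e12, e13, e1 ... e4, e23, e22, e21`. A region with two nucleotides occupies positions 46 and 48.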
Alignment of animal mitochondrial tRNAs
In properly aligned tRNA sequences, nucleotides occupying the
same position in different tRNA sequences should play a
comparable structural or functional role. Most animal
mitochondrial tRNAs cannot easily be aligned with other tRNAs,
mainly because of the lack of information about their
three-dimensional structure. Experimental data, however, point to
the existence of tertiary interactions in these tRNAs. In this
compilation, we use an alignment which accounts for these
interactions as far as possible. Where we could do so, the
animal mitochondrial tRNAs were included in Parts I and II. The
alignment of animal mitochondrial tRNAs, however, remains an
open problem, to be addressed in future editions of the
compilation when more experimental and theoretical data
become available.
Some animal mitochondrial tRNAs have a completely unusual
secondary structure and cannot be fitted into the tRNA alignment
used in Parts I and II. We treated these sequences separately,
placing them in Part III, where each sequence has its own
alignment. This group includes the tRNAs from:
• mitochondria of a parasitic worm, lacking the T- or D-domain;
• mitochondria of mollusc, insect and echinoderm, with
extended anticodon and T-stems;
• mammalian mitochondria, lacking the D-domain.
For some tRNA genes the secondary structure pattern cannot
be clearly established; we have also included these sequences
in Part III. It is possible that post-transcriptional modifications
of these tRNAs will result in an improved secondary
structure.
ACKNOWLEDGEMENTS
We thank Drs R.Cedergren and H.Grosjean for discussions and
suggestions concerning the presentation of the database in a
computer-readable form. This project was supported by the Fonds
der Chemischen Industrie, the Deutsche Forschungsgemeinschaft
(Sonderforschungsbereich 213) and in part by the Medical Research
Council of Canada (MT 3382).
REFERENCES
1. Sprinzl,M., Dank,N., Nock,S. and Schön,A. (1991) Nucl. Acids Res. 19,
2127-2171.
2. Schimmel,P.R., Söll,D. and Abelson,J.N. (eds) (1979) Transfer RNA: Structure,
Properties and Recognition. Cold Spring Harbor Laboratory, N.Y., pp. 518-519.
3. Steinberg,S.V. and Kisselev,L.L. (1992) Biochimie 74, 337-351.