Unique Mammalian tRNA-Derived Repetitive Elements in Dermopterans:
The t-SINE Family and Its Retrotransposition Through Multiple Sources
Oliver Piskurek,* Masato Nikaido,* Boeadi,† Minoru Baba,‡ and Norihiro Okada*§
*Tokyo Institute of Technology, Faculty of Bioscience and Biotechnology, Department of Biological Sciences, Yokohama, Japan;
†Museum Zoologi Bogor (Museum Zoologicum Bogoriense), Puslitbang Biologi-LIPI, Cibinong, Indonesia; ‡Kitakyushu Museum
and Institute of Natural History, Kitakyushu, Japan; and §National Institute for Basic Biology, Department of Cell Biology, Aichi, Japan
Short interspersed nuclear elements (SINEs) are dispersed repetitive DNA sequences that are major components of all
mammalian genomes. They have been described in every lineage of Euarchontoglires (rodents, rabbits, primates,
flying lemurs, and tree shrews) except flying lemurs. Most SINE family members are composed of three distinct
regions: a 5′ tRNA-related region, a tRNA-unrelated region, and a short AT-rich tandem repeat at the 3′ end. The
newly discovered SINE family in Cynocephalus deviates from this common structure. All 30 SINE loci analyzed in this
family lack a tRNA-unrelated region and are composed exclusively of tRNA-related elements. Therefore, this novel
SINE structure, described for the first time in mammalian genomes, was designated as t-SINE. The t-SINE family
exhibits a high copy number and is specific to flying lemurs. Three major t-SINE subfamilies could be distinguished on
the basis of characteristic nucleotides, deletions, insertions, and duplications. These sequence-specific characteristics
within subfamilies and sub-subfamilies reveal that they are derived copies of distinct progenitors. We present
evolutionary relationships between subfamilies and compare relationships between the subfamilies and the isoleucine
tRNA gene. t-SINE amplification occurred through multiple sources, and t-SINEs are presumably mobilized in trans via the L1-encoded
reverse transcriptase-dependent retrotransposition mechanism.
Introduction
Short interspersed nuclear elements (SINEs) of
approximately 70–500 bp are abundant components of
eukaryotic genomes that are often present at >10^4 copies
per genome (Weiner, Deininger, and Efstratiadis 1986;
Britten et al. 1988; Okada 1991; Schmid and Maraia 1992;
Deininger and Batzer 1993). Retroposons account for more
than 40% of the human genome (Graur and Li 2000;
International Human Genome Sequencing Consortium
2001). Whereas LINEs belong to the viral superfamily of
retroposons that encodes reverse transcriptase (RT) and
endonuclease (EN), SINEs are transposable elements of the
nonviral retroposon superfamily that do not encode these
enzymes (Weiner, Deininger, and Efstratiadis 1986). The
high copy number of SINEs distinguishes them from other
nonviral repetitive elements. Whereas SINEs propagate by
retrotransposition, most other nonviral retroposons, such as
processed retropseudogenes, are usually not subjected to
multiple rounds of retrotransposition. Furthermore, other
nonviral repetitive elements are nonfunctional and are
normally present at low copy number. Still, there is
evidence that these elements may occasionally become
functional as components of novel genes (Brosius 1999).
Key words: t-SINE, tRNA-derived, subfamilies, multiple source
gene model, Cynocephalus variegatus, Dermoptera.
E-mail: [email protected].
Mol. Biol. Evol. 20(10):1659–1668. 2003
DOI: 10.1093/molbev/msg187
Molecular Biology and Evolution, Vol. 20, No. 10,
© Society for Molecular Biology and Evolution 2003; all rights reserved.

To replicate via retrotransposition, SINEs use the
existing enzymatic retrotranspositional machinery of their
LINE partners. RNA-mediated retrotransposition, also
known as target-primed reverse transcription (TPRT),
represents the pioneering model for the relationship
between SINEs and LINEs (Luan et al. 1993). Evidence
supporting this LINE retrotransposition model is provided
by an analysis of the Bombyx mori R2Bm RNA transcript.
Initially, the R2Bm-encoded EN creates a first-strand nick
that is utilized by the R2Bm RT, which subsequently
reverse-transcribes copy DNA that is inserted at the 28S
target site. The TPRT model gave rise to the suggestion
that a similar mechanism may be used for SINE
retrotransposition. It was proposed and established by
our group that most SINEs share the 3′ sequence of their
partner LINEs (Ohshima et al. 1996; Okada et al. 1997;
Terai, Takahashi, and Okada 1998; Kajikawa and Okada
2002). During retrotransposition, RTs encoded by LINEs
recognize the corresponding identical 3′ ends in SINEs,
thus implementing SINE amplification via mobilization.
This mechanism accounts for the majority of SINE
retrotransposition in most nonmammalian species. The
retrotranspositional machinery of the predominant mammalian LINE family, L1, is an exception to the conserved
3′ end-specific region for RT recognition in that no 3′ end–
specific sequence (except the poly-A tail) is needed
(Moran et al. 1996; Ostertag and Kazazian 2001).
Additionally, Moran et al. (1996) revealed that mammalian
L1 elements retrotranspose at high frequency in HeLa cells
and that the integration of mammalian retroposons is
mediated by L1-encoded RTs (Jurka 1997). Moreover,
there is no RNA sequence specificity with respect to
retrotransposition mediated by L1 in trans (Esnault,
Maestre, and Heidmann 2000). In other words, each L1
element may not only mobilize its own transcribed RNA
sequences, but may also mobilize SINE transcripts for
retrotransposition via their 3′ poly-A sequence. Thus,
mammalian SINEs appear to be amplified by L1-encoded
RTs via a mechanism that does not require an identical 3′
end in the SINE, apart from the poly-A sequence.
SINEs are derived either from tRNA or 7SL RNA,
with the majority of SINEs characterized thus far being
derived from tRNAs. SINEs are found in a variety of
eukaryotic species such as tobacco, the yellow fever
mosquito, salmon, and mammals (for a list of references
see Shedlock and Okada 2000). Examples of 7SL RNA–
derived SINEs include the primate Alu families, the rodent
B1 family, and two recently described SINE families in
Tupaia (Tu type I and type II; Nishihara, Terai, and Okada
2002). SINEs derived from tRNA genes are composed of
a tRNA-related region containing RNA polymerase III–
specific internal promoter sequences, a tRNA-unrelated
region, and an AT-rich region (Okada 1991; Okada et al.
1997).
SINE families are classified into subfamilies based on
DNA sequence. Because all characterized SINE families
and subfamilies are restricted to particular phylogenetic
groups, Shedlock and Okada (2000) suggested that they
represent powerful noise-free Hennigian synapomorphies
(Hennig 1966). Murphy et al. (2001a, 2001b) proposed
a phylogeny that divides placental mammals into four
major clades, namely Laurasiatheria, Euarchontoglires,
Xenarthra, and Afrotheria. SINE families and subfamilies
have been described in many lineages of these clades,
except in xenarthran species. For example, in laurasiatherians, CHR-1 and CHR-2 represent SINE families
specific to cetartiodactylan lineages (cetaceans, hippopotamuses, and ruminants; Shimamura et al. 1997, 1999). The
AfroSINE family was created in a common ancestor of
afrotherian species, and another AfroSINE subfamily is
present only in the genomes of hyraxes, elephants, and sea
cows (Nikaido et al. 2003). On the other hand, in
Euarchontoglires (rodents, rabbits, primates, flying lemurs,
and tree shrews), all recognized Alu-SINE subfamilies are
distributed exclusively among primate genomes (Britten
et al. 1988; Schmid 1996). As previously stated, 7SL
RNA–derived SINEs are found in other Euarchontoglires
lineages as well (Nishihara, Terai, and Okada 2002). No
7SL RNA–derived SINE family has yet been detected in
rabbits or flying lemurs. The rabbit genome harbors the C
repeat SINE family (Cheng et al. 1984) that appears to be
derived from tRNA genes (Sakamoto and Okada 1985),
although to date no SINE family has been characterized
in the flying lemur genome.
Here, we describe the isolation and characterization
of a unique tRNA-derived SINE family in dermopterans
(flying lemurs or colugos). All 30 members of this new
SINE family are composed exclusively of tRNA-related
regions. Therefore, this novel SINE structure was denoted
t-SINE. We discuss several aspects of t-SINEs including
the evolution of variable subfamily formations, the flying
lemur–specific t-SINE distribution, the multiple source
gene model as an amplification mechanism, and the
enzymatic retrotranspositional machinery of a corresponding LINE partner for the t-SINE family.
Materials and Methods
Flying Lemur Tissue and Extraction of DNA
Genomic DNA from the Malayan flying lemur
(Cynocephalus variegatus) was isolated by phenol-chloroform extraction as described by Blin and Stafford
(1976).
In Vitro Transcription of Total DNA
In vitro transcription of total genomic dermopteran
DNA in HeLa cell extract was performed as described
previously (Endoh and Okada 1986).
Construction and Screening of Genomic Libraries,
in Vitro Runoff Transcription, and Sequencing of
Cloned DNA
Genomic libraries were constructed by complete
digestion of dermopteran genomic DNA with HindIII,
followed by sedimentation through a sucrose gradient and
selection of DNA fragments of up to 2 kbp. The size-fractionated genomic DNA was ligated into HindIII-digested pUC18 plasmids at 37°C overnight. Aliquots of
the ligation reactions were transformed into Escherichia
coli DH5-α cells. Colonies were transferred to membranes
for screening. The first three SINE loci were identified
by random selection and sequencing of ~60 kbp from
the genomic library as described by Okada, Shedlock,
and Nikaido (2003). Runoff transcripts were generated
in vitro using total genomic dermopteran DNA as well
as dermopteran plasmid DNA digested with Bpu1102I
(GCTNAGC, TaKaRa, Japan) as described by Koishi and
Okada (1991). Additional t-SINE loci were screened using
internal primers (CYN-AB-F, GTGCGCCRCTTGGGAAGC; CYN-AB-R, CACTGGCTGAGCGAGGTGC;
CYN-C-F, GCCTGCCCGTGGCTCACT; CYN-C-R, CACCAAGTCAAGGGTTAAGATCC) labeled by primer
extension in the presence of [α-32P]dCTP. [γ-32P]dATP-labeled internal primer sequences were also used to
further investigate the evolution of the Cynocephalus t-SINE family. Hybridization was performed at 42°C
overnight in a solution of 6× SSC, 1% SDS, 2×
Denhardt’s solution, and 100 μg/ml herring sperm
DNA, followed by washing at 50°C for 10 min in a solution of
2× SSC and 1% SDS. Positive phage clones that
appeared to contain Cynocephalus t-SINE loci were
isolated and the inserts sequenced using universal primers
M4 and RV (TaKaRa) as well as internal Cynocephalus
t-SINE specific primers (see above). Sequencing was
performed with an ABI PRISM 3100 Genetic Analyzer
(Applied Biosystems). Nucleotide sequence data with the
following accession numbers were deposited in GenBank: AY278325 through AY278354.
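The Bpu1102I recognition sequence (GCTNAGC) and the CYN-AB-F primer (which contains the degenerate base R) both use IUPAC ambiguity codes. As an illustration only, the following minimal Python sketch shows how such patterns can be located in a sequence; the function names and the test sequence are hypothetical and are not part of the original methods.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as GCTNAGC into a regex string."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_sites(sequence, pattern):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead is used so that overlapping occurrences are also reported.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [match.start() for match in regex.finditer(sequence.upper())]


# Hypothetical test sequence (not taken from the paper) with two GCTNAGC sites.
toy_sequence = "AAGCTTAGCGGGCTCAGCTT"
bpu_sites = find_sites(toy_sequence, "GCTNAGC")
```

The same helper accepts a degenerate primer, for example `find_sites(some_sequence, "GTGCGCCRCTTGGGAAGC")`, which matches either A or G at the R position. Only the forward strand is searched here; a real analysis would also scan the reverse complement.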
Dot-Blot Analysis and PCR
Genomic DNA of Saguinus oedipus, Lemur catta,
Cynocephalus variegatus, Tupaia belangeri, Lepus crawshayi, Mus musculus, Pteropus dasymallus, and Sorex
unguiculatus were arrayed onto a GeneScreen Plus
membrane (Du Pont-NEN Products, Boston, Mass.) with
a dot-blot apparatus (model DP-96, Advantec, Tokyo) and
probed with a t-SINE specific oligonucleotide (GACACTGAGGGTTGCGATCCGTT). The total amount of genomic
DNA for the dot-blot hybridization was titrated from 1,000
ng to 10 ng. Linearized plasmid DNA (1–100 ng) containing
a Cynocephalus t-SINE sequence (CYN-CL56) was used as
a positive control. The DNA samples were denatured for 10
min in 2 M NaOH prior to hybridization. Hybridization
conditions were as described above, except that the washing
step was performed two times at 50°C for 10 min.
Genomic DNA of Homo sapiens, Saguinus oedipus,
Lemur catta, Cynocephalus variegatus, Tupaia belangeri,
Lepus crawshayi, Mus musculus, and Pteropus dasymallus
was amplified by polymerase chain reaction (PCR) using
internal t-SINE primers (CYN-C-F, CYN-C-R). The PCR
conditions were as follows: after initial denaturation for
3 min at 94°C, 33 cycles were performed consisting of 30 s
denaturation at 94°C, 50 s annealing at 56°C, and 30 s
elongation at 72°C.

FIG. 1.—In vitro transcription of total genomic DNA from squid (lane
1) and Cynocephalus variegatus (lane 2). An arrowhead shows a major
discrete transcript of slightly less than 250 nucleotides. The length of
the squid SINE is known to be 250 nucleotides (Ohshima et al. 1993).

Sequence Analyses
Multiple sequence alignments were constructed using
ClustalW (Thompson, Higgins, and Gibson 1994), and
sequence analyses were performed with BioEdit (Hall
1999). Database searches were performed with BlastN
(Altschul et al. 1997). Mouse and human tRNA sequences
were obtained from the tRNA compilation of Sprinzl et al.
(1998) and the tRNAscan-SE program (Lowe and Eddy
1997) and compared with known t-SINEs of Cynocephalus using DNA analysis software (Genetyx version 10.1).
Using Tree-Puzzle 5.0 (Strimmer and von Haeseler 1996),
a maximum likelihood analysis based on the HKY85
model was performed (Hasegawa, Kishino, and Yano
1985) using the discrete gamma distribution (eight
categories) for site heterogeneity (Yang 1996). Puzzling
supports were based on 1,000 replicates.
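For readers unfamiliar with the HKY85 substitution model named above, the following minimal Python sketch builds its instantaneous rate matrix from base frequencies and a transition/transversion rate ratio. It is an illustration only: it is not Tree-Puzzle's implementation, it omits the gamma-distributed rate heterogeneity used in the analysis, and the frequencies and kappa value are hypothetical placeholders rather than estimates from the t-SINE data.

```python
def hky85_rate_matrix(freqs, kappa):
    """Build the unnormalized HKY85 rate matrix as a nested dict.

    freqs: dict mapping each base (A, C, G, T) to its equilibrium frequency.
    kappa: transition/transversion rate ratio.
    """
    bases = "ACGT"
    purines = {"A", "G"}
    matrix = {}
    for i in bases:
        row = {}
        for j in bases:
            if i == j:
                continue
            # A<->G and C<->T are transitions; every other change is a transversion.
            is_transition = (i in purines) == (j in purines)
            row[j] = (kappa if is_transition else 1.0) * freqs[j]
        # Diagonal entries make each row sum to zero.
        row[i] = -sum(row.values())
        matrix[i] = row
    return matrix


# Hypothetical parameter values, chosen only for illustration.
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
example_q = hky85_rate_matrix(example_freqs, kappa=2.0)
```

In practice the matrix is rescaled so that branch lengths are expressed in expected substitutions per site, and the parameters are estimated from the alignment rather than fixed in advance.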
Results and Discussion
Identification of Retrotransposed t-SINEs Isolated from
the Genome of Cynocephalus
We previously used in vitro transcription of total
genomic DNA to detect SINEs in many species (Endoh
and Okada 1986; Ohshima et al. 1993; Kawai et al. 2002).
This technique was applied to the Cynocephalus genome
to identify a new SINE family, and a discrete transcript of
slightly fewer than 250 nucleotides was generated (fig. 1).
This result clearly indicated that RNA polymerase III–
transcribed repetitive elements were present in the genome
of the flying lemur. The first three such loci were detected
by random selection and sequencing of 100 clones from
a genomic dermopteran library. To verify that these
sequences were identical with the in vitro transcription
product, an in vitro runoff transcription assay (Koishi and
Okada 1991) was performed (data not shown). After the
initial characterization of these novel repetitive sequences,
27 additional loci were detected by screening (see
Materials and Methods). Curiously, all loci contained
tRNA-related regions only, along with a poly-A tail.

FIG. 2.—DNA sequence comparison of t-SINE loci flanking direct
repeats. The flanking direct repeats with a length of 11–18 bp for 25
t-SINE loci are shown.

Initially, the repetitive sequences appeared to be
simple tRNA pseudogenes, which are generated in one of
two ways—either by duplication of tRNA genes or by
retrotransposition of tRNA itself. Duplications of tRNA
genes occur at the DNA level and the resultant genes do
not have direct repeats, whereas a tRNA pseudogene
generated by retrotransposition occurs at the level of RNA
and results in direct repeats flanking the gene. In the
present case, the novel repetitive sequences were dispersed
in the Cynocephalus genome and flanked by direct repeats
(fig. 2), suggesting that they were amplified through
retrotransposition. Accordingly, these repetitive sequences
appeared to have many characteristics of SINEs, although
most SINEs described to date are composed of a 5′ tRNA-related region, a tRNA-unrelated region, and a 3′ AT-rich
region (Okada 1991; Okada et al. 1997). Given that all the
novel repetitive dermopteran loci contained only tRNA-related regions, this unique SINE structure was designated
as t-SINE. Although this novel Cynocephalus t-SINE
family lacks the tRNA-unrelated region in every locus, all
other essential SINE features are present, including clearly
recognizable flanking direct repeats that are diagnostic of
amplification via RNA intermediates (fig. 2), A and B
boxes for internal RNA polymerase III promoters, a typical
poly-A tail, and diagnostic nucleotides indicating that they
have undergone multiple rounds of retrotransposition
during evolution (see below).
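The flanking direct repeats discussed above lie immediately upstream of the element and immediately downstream of its poly-A tail. The following minimal Python sketch illustrates how an exact repeat in the 11–18 bp range reported in figure 2 could be detected from the two flanks. It is an assumption-laden illustration, not the procedure used in the study: it requires a perfect match, and the function name and flank sequences are hypothetical.

```python
def flanking_direct_repeat(upstream, downstream, min_len=11, max_len=18):
    """Return the longest exact direct repeat shared by the two flanks, or None.

    upstream: sequence ending immediately before the 5' end of the element.
    downstream: sequence starting immediately after the poly-A tail.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the longest repeat is returned.
    for length in range(longest, min_len - 1, -1):
        if upstream[-length:] == downstream[:length]:
            return upstream[-length:]
    return None


# Hypothetical flanks (not from the paper) sharing a 13-bp direct repeat.
toy_upstream = "GGCATTCAAGTCTGATACCA"
toy_downstream = "AAGTCTGATACCATTGCC"
toy_repeat = flanking_direct_repeat(toy_upstream, toy_downstream)
```

Real direct repeats often carry substitutions accumulated since insertion, so a practical search would tolerate a small number of mismatches.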
Structural and Evolutionary Aspects of Cynocephalus
t-SINE Family Members
Analysis of the repetitive sequences led us to divide
the Cynocephalus t-SINE family into three different
subfamilies, denoted a-, b-, and c-types (fig. 3). Fourteen
dimeric t-SINE loci were divided into 10 a-type sequences
and 4 b-type sequences, while all 16 trimeric t-SINE loci
were categorized in the c-type subfamily. A homology
search revealed that the sequences are closely related to the
isoleucine tRNA gene. The a-type sequences consist of
two tRNA-like regions, each with 74%–80% identity to
the human tRNAIle gene (table 1). Although the b- and
c-type sequences share conserved nucleotides with the a-type sequences, these two subfamilies are less closely
related to the tRNAIle gene than the a-type sequences are.
However, a tRNA-like cloverleaf structure resembling the
human tRNAIle gene (Sprinzl et al. 1998) could be
constructed for all three subfamilies and their subunits.
While these structures are partially disrupted by substitutions, insertions, and deletions in the b- and c-type
subunits, they are primarily conserved in both of the a-type subunits (fig. 4). This suggests a tRNAIle origin for
the a-type subunits.

FIG. 3.—Alignment of the 30 Cynocephalus t-SINE family members. DNA sequences are divided into three different subfamilies (a-, b-, and c-type). Mouse and human Ile-tRNA gene sequences were added to the alignment to illustrate the high similarity of each subunit of the dimeric a-type
loci to the Ile-tRNA gene. RNA polymerase III promoter boxes (A + B) are shown for all subunits of the dimeric a-type (CYN-C, CYN-CL46.2, 64,
162, 263, 290, 293.1, 298, 314, 374) and b-type sequence (CYN-CL180, 293.2, 463.1, 463.2) as well as the trimeric c-type sequence (CYN-A, CYN-B,
CYN-CL1, 8, 39, 46.1, 56, 66, 172, 173, 175, 177, 181, 433, 434, 438). Double stranded regions are boxed. The consensus sequence for each of the
subfamilies is shown at the end of each section. Ac: Acceptor stem, D: D loop stem, An: Anticodon stem, Ps: TΨC stem.

Table 1
Calculated Similarity for Two t-SINE tRNA-like Subunits and Subunit Identity Compared to the Human Isoleucine
tRNA Gene

                 tRNAIle    a 1st      b 1st      c 1st      a 2nd      b 2nd      c 2nd
a 1st subunit    74%        —          —          —          —          —          —
b 1st subunit    61%        76%        —          —          —          —          —
c 1st subunit    57%        67%        86%        —          —          —          —
a 2nd subunit    80%        73%        57%        53%        —          —          —
b 2nd subunit    65%        59%        53%        49%        75%        —          —
c 2nd subunit    63%        58%        55%        51%        73%        92%        —
c 3rd subunit    36 (52)%   32 (45)%   36 (50)%   32 (44)%   39 (55)%   57 (81)%   52 (77)%

NOTE.—The 8-bp insertion of the variable loop region in the b- and c-type was excluded. Numbers in parentheses are given for comparison of the duplicated sequence
region of the third c subunit only. Bold numbers show postulated evolutionary relationships.

The evolutionary relationships among the novel t-SINE subfamilies were analyzed by comparing all the
subunits between subfamilies (table 1). The first and
second tRNA subunits of the b-type are more similar to the
first and second a-type subunit, respectively, than either of
the two b-subunits is to the tRNAIle gene. This result
suggests that the b-type sequence was derived from the a-type. The most obvious distinction is an 8-bp insertion in
the variable loop region of the first b-type subunit, which
is absent in the first a-type subunit (fig. 4). Another
distinction between the a-type sequences and the other
subfamilies is the presence of a TCTTT sequence at the
beginning of the poly-A tail that is absent in both the b- and c-type sequences. In these two subfamilies, the poly-A
tail begins with CGAAAAAGAC. Numerous other diagnostic nucleotides distinguish the a- and b-type
sequences (fig. 3, sites 13, 18, 26, 41, 43, 44, 51–53, 68,
89, 92, 122, 128, 130–132, 136, 139, 140, 142–144, 151,
152, 159). Although not as many obvious diagnostic
nucleotides discriminate the b- from the c-type sequences,
quite a few are still recognized (fig. 3, sites 23, 36, 46, 84,
90, 91, 102, 103, 132, 169, 170).
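The subunit comparisons summarized in table 1 are pairwise percent identities. The following minimal Python sketch shows one way such a value can be computed from two aligned sequences. The treatment of gaps is an assumption (columns with a gap in either sequence are skipped) and may differ from the software used in the study; the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        # Assumption: gapped columns are excluded from the comparison.
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments (not from the paper): 9 of 10 columns agree.
toy_identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```

Excluding gapped columns mirrors the note to table 1, where the 8-bp insertion in the variable loop region was left out of the comparison.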
While the 8-bp insertion is also present in the first
subunit of the c-type, an additional truncated subunit with
a length of 47 bp distinguishes the c- from the b-type. The
first and second subunits of the b- and c-type sequences
exhibit high sequence similarity (table 1). This not only
supports the close relationship between the b- and c-type
sequences but also strongly suggests that the c-type
subfamily is the youngest presented here. The average
length of the 16 trimeric c-type sequences is 230 bp,
whereas the average length of the 14 dimeric a- and b-type sequences is <200 bp. Thus, the major discrete
transcript shown in figure 1 may represent the trimeric c-type sequence, which is likely the most abundant subfamily
in the Cynocephalus genome. In a maximum likelihood
analysis of all different subunits, the third c-type subunit
is placed as the sister group of the second subunit
of the b-type (fig. 5). In other words, the sequence
similarity of the third c-type subunit and the second b-type subunit suggests that the trimeric c-type
subfamily originated after a partial duplication of the second subunit
of the b-type. The major tree topology in figure 5 does not
change if subfamilies are divided into minor subfamilies
(see below).
Dimeric structures such as the a-type t-SINEs may
be generated in a variety of ways (Rogers 1985). For
example, a dimeric retroposon structure may result from
the duplication of a tRNA gene. Alternatively, an original
gene cluster of two functional tRNA genes may be
responsible for the dimeric structure. This possibility was
discussed for the two arginine tRNA-related monomers of
the Twin SINE family from the vector mosquito Culex
pipiens (Feschotte et al. 2001). The Twin SINE subunits
are separated by a 39-bp spacer sequence that is believed
to correspond to the DNA region ancestrally separating
two functional tRNAArg genes. It has been hypothesized
that the Twin SINE family originated from an unprocessed
RNA polymerase III transcript containing a gene cluster of
two tRNA sequences (Feschotte et al. 2001). However, the
t-SINE family in Cynocephalus does not exhibit an
obvious spacer region between the subunits. Therefore,
the t-SINE may not have originated from an unprocessed
polymerase III transcript of a tRNAIle gene cluster.

FIG. 4.—Cloverleaf structure of a human isoleucine tRNA gene (I), and predicted cloverleaf structures of the a-type (II), b-type (III), and c-type
(IV) consensus sequences.

A dimeric retroposon structure may be generated in
other ways, including retrotransposition of a tRNA
pseudogene in the vicinity of a tRNA gene and duplication
of retrotransposed tRNA. The latter scenario is possible
because the sequence similarity between the first and second
a-type subunits is 73%. However, both subunits show greater
similarity to the tRNAIle gene (table 1). The origin of
dimeric Alu elements in the primate genome was described
as a fusion of a free left Alu monomer (FLAM) with a free
right Alu monomer (FRAM) (Jurka and Zuckerkandl
1991; Quentin 1992). An A-rich linker connects these two
Alu precursors as well as many other dimeric SINE
elements. Because there is no such poly-A linker found
between the t-SINE subunits of the flying lemur, the fusion
of two tRNAIle retrotransposed monomers may not be
relevant. At the same time, the t-SINE transcription seems
to be promoted by the first subunit in view of the fact that
a critical nucleotide insertion in the B-Box of the second a-type subunit might be responsible for silencing this t-SINE
subunit. The same B-Box mutation can be seen in the
derived b- and c-type sequences as well (fig. 3). A possible
fusion of a transcriptionally active FLAM with a silenced
FRAM was discussed in previous studies concerning the
process of Alu dimerization (Jurka and Zuckerkandl 1991;
Quentin 1992). The Alu fusion model involves two
separate deletions in 7SL genes or 7SL RNA–derived
SINEs followed by retrotransposition and fusion. Such
a model might explain the origin of the dimeric t-SINEs as
well. Human Alu-SINEs as well as rabbit C repeats show
a common tendency to insert into regions where other
SINEs have previously inserted. Whereas human Alu-SINEs are usually inserted at the 3′ end into the poly-A
region (Slagel et al. 1987), C repeats occasionally insert
into the 5′ end of previously existing SINE loci (Krane et
al. 1991). Many examples regarding C repeats have been
identified in which new SINE insertions occur near or
within preexisting inserts (Krane et al. 1991). The former
situation has occurred in the Cynocephalus genome as well
(data not shown). Although there are several possible ways
to generate a dimer or trimer structure as seen in t-SINEs,
the precise mechanism remains to be resolved. Given that
we could not detect t-SINE monomers that had been
subjected to multiple rounds of retrotransposition, we can
only suggest that a dimer or trimer structure may have
some retrotranspositional advantage over a monomeric
tRNA-like structure.
FIG. 5.—Schematic of evolutionary relationships between consensus
sequences of tRNA-like subunits (1st, 2nd, and 3rd) of the a-, b-, and c-type t-SINEs analyzed using the maximum likelihood method. Numbers
corresponding to internal nodes represent puzzle support values. Branch
lengths represent nucleotide substitutions per site.
Copy Number of t-SINEs in Cynocephalus, Their
Taxonomic Distribution, and Phylogenetic Inference
To estimate the t-SINE copy number, the size of the
mammalian genome was postulated to be 3 × 10^9 bp.
Given the fact that the first three t-SINE loci were found by
random selection and sequencing of ~60 kbp of genomic
dermopteran DNA, the assumed copy number is 1.5 × 10^5
t-SINEs per haploid genome. On the other hand, we
performed a dot-blot experiment (fig. 6) to examine the t-SINE distribution in Cynocephalus and other mammalian
species. Based on the results of this experiment, the
assumed t-SINE copy number was estimated as 1 × 10^6.
Because the c-type sequences seem to be the most
abundant t-SINE members in the Cynocephalus genome,
the sequence designed as a probe was specific to the
duplicated third c-type subunit. The high copy number
shown by the dot-blot experiment is supposedly caused by
the additional hybridization of the probe to the second
subunit of b- and c-type sequences (see above), and
probably even to the second a-type subunit as well.
Hence, the assumed copy number estimated from the
hybridization signal is three to four times higher than could
have been expected had the probe hybridized to one
sequence position only. However, the dot-blot hybridization exhibited a strong signal for Cynocephalus only. The
dot-blot hybridization therefore illustrates that the t-SINE
family discovered in Cynocephalus variegatus is restricted
to dermopteran genomes. To support this result, a PCR
was performed for the flying lemur and other mammalian
species using internal t-SINE primers designed from the a-type subfamily (fig. 7). A strong band of the expected size
(140 bp) was observed only for the Cynocephalus sample.
Thus, the new t-SINE family is specific to dermopterans
and cannot be used as a molecular marker to determine the
phylogenetic position of flying lemurs.
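The sequencing-based copy number given above follows from a simple proportion: the density of loci in the sequenced sample is scaled up to the postulated genome size. The short Python sketch below reproduces that arithmetic with the values stated in the text; the function name is hypothetical, and the calculation assumes that t-SINEs are evenly distributed and that the random clones are representative of the genome.

```python
def estimate_copy_number(loci_found, bases_sequenced, genome_size):
    """Scale the number of loci found in a random sample up to the whole genome."""
    return loci_found * genome_size / bases_sequenced


# Values from the text: 3 loci in ~60 kbp, postulated genome size of 3 x 10^9 bp.
sequencing_estimate = estimate_copy_number(3, 60_000, 3e9)
```

With only three loci in the sample, this figure carries a large sampling uncertainty and is best read as an order-of-magnitude estimate.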
A close relationship between anthropoids (higher
primates) and dermopterans has been proposed on the
basis of large data sets of nuclear and mitochondrial DNA
of eutherian mammals (Madsen et al. 2001; Murphy et al.
2001a; Arnason et al. 2002). In contrast, Schmitz et al.
(2002) argued that the monophyly of primates is
supported by SINE insertions (absent in flying
lemurs) and suggested that the colugo is mistakenly joined
with anthropoids in phylogenetic tree reconstructions
based on similarities in the nucleotide compositions of
mitochondrial DNA. A discussion of the phylogenetic
position of dermopterans has been ongoing since the
traditional morphology-based Archonta hypothesis (Gregory 1910; Sargis 2002). Because of their relatively ancient
history, the identification of 7SL RNA–derived SINEs in
Cynocephalus would contribute to the understanding of its
phylogenetic position. 7SL RNA–derived SINEs had been
identified only in primates and rodents until their recent
discovery in tree shrews (Nishihara, Terai, and Okada
2002). These animals, together with rabbits and flying
lemurs, belong to the Euarchontoglires. Thus, we might
expect to find 7SL RNA–derived SINEs or their ancestral
sequences in rabbits and flying lemurs as well.

FIG. 6.—Genomic DNA dot-blot. DNA from several mammalian
sources was immobilized in decreasing amounts (1,000 ng, 500 ng, 100
ng, 10 ng). Cloned DNA was used as a positive control (t-SINE) and
negative control (squid SINE) at 100 ng, 50 ng, 10 ng, and 1 ng. AN:
Anthropoidea (Saguinus oedipus), ST: Strepsirhini (Lemur catta), DE:
Dermoptera (Cynocephalus variegatus), SC: Scandentia (Tupaia belangeri), LA: Lagomorpha (Lepus crawshayi), RO: Rodentia (Mus
musculus), CH: Chiroptera (Pteropus dasymallus), EU: Eulipotyphla
(Sorex unguiculatus).

Members of the Cynocephalus t-SINE Family Are
Amplified Through Multiple Sources and Are Potentially
Retrotransposed by L1-Encoded RT in trans
Both SINEs and processed retropseudogenes belong
to the nonviral superfamily of retroposons. Once processed
retropseudogenes are integrated into a genome, they
generally do not propagate further. Conversely, SINEs
may propagate and generate progeny during evolution.
Each SINE copy has the potential to propagate depending
on the circumstances in the genome (Schmid and Maraia
1992; Shedlock and Okada 2000). In cases where SINEs
are integrated into unfavorable chromosomal locations, it
is possible that they accumulate too many mutations and
ultimately lose their ability to propagate. Since all
propagating SINE loci are dependent on the retrotranspositional LINE machinery, SINEs may also become inactive
if their partner LINE family becomes inert. However, if
mutations are introduced into a SINE copy but do not
influence its ability to propagate, the progeny may be
distinguished by these mutations (diagnostic nucleotides).
The ‘‘mother copy’’ with these mutations is called a source
gene. In the master gene model only a very limited number
of master SINE loci are responsible for long-term
amplification of non-propagating offspring copies. This
model was proposed by earlier studies of Alu subfamilies
(Shen et al. 1991; Deininger et al. 1992; Deininger and
Batzer 1995). At present, however, most SINEs are
believed to amplify according to the multiple source gene
model and can be divided into subfamilies (source genes)
that are able to propagate. The amplification rate of source
genes will increase or decrease over evolutionary time
depending on whether accumulated mutations deactivate
them faster or slower (Schmid and Maraia 1992; Shedlock
and Okada 2000).

FIG. 7.—Polymerase chain reaction analysis with internal t-SINE
primers. Genomic DNA from mammalian sources was amplified by PCR
using primers CYN-C-F and CYN-C-R as described in Materials and
Methods. M: Marker (φX174-HincII digest), AN1: Anthropoidea (Homo
sapiens), AN2: Anthropoidea (Saguinus oedipus). See figure 6 for the
abbreviations of other mammalian species.

For the new t-SINE family described here, we
demonstrated the existence of different subfamilies, each
containing several members. Moreover, subfamilies can be
further divided into sub-subfamilies based on several
diagnostic nucleotides. For example, the c-type subfamily
may be divided into two sub-subfamilies ([c1: CYN-A,
CYN-CL1, 39, 56, 177, 181, 433, 438] [c2: CYN-B,
CYN-CL8, 46.1, 66, 172, 173, 175, 434]) that are based on
several diagnostic nucleotides (fig. 3, sites 25, 26, 29, 66,
171, 209). c2 may be subdivided into c2.1 (CYN-B,
CYN-CL8) and c2.2 (CYN-CL46.1, 66, 172, 173, 175,
434) based on one diagnostic nucleotide (fig. 3, site 171).
Several c1 sequences cluster together as well (c1.1:
CYN-CL56, 181, 433, 438; fig. 3, sites 100–103; c1.2:
CYN-CL1, 39, 177; fig. 3, sites 46, 143). Finally, it is
possible to divide the a-type subfamily into two
sub-subfamilies, namely a1 (CYN-C, CYN-CL263, 293.1,
298, 314) and a2 (CYN-CL46.2, 64, 162, 290, 374) based
on one diagnostic nucleotide (fig. 3, site 79).
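Assigning copies to sub-subfamilies in this way amounts to grouping aligned sequences by the nucleotides they carry at a few diagnostic alignment columns. The following minimal Python sketch shows the idea; the sequences, the chosen columns, and the function name `group_by_diagnostic_sites` are hypothetical and do not reproduce the alignment of figure 3.

```python
from collections import defaultdict

def group_by_diagnostic_sites(aligned, sites):
    """Group aligned sequences by their states at 1-based diagnostic columns."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        # Concatenate the nucleotides found at the diagnostic columns.
        key = "".join(seq[pos - 1] for pos in sites)
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}

# Toy alignment (hypothetical sequences of equal length).
toy_alignment = {
    "copy1": "ACGTACGT",
    "copy2": "ACGTACGA",
    "copy3": "ATGTACGT",
    "copy4": "ATGTACGA",
}

# Column 2 separates two groups; adding column 8 splits each group again,
# mirroring the subdivision of a subfamily into sub-subfamilies.
first_level = group_by_diagnostic_sites(toy_alignment, sites=[2])
second_level = group_by_diagnostic_sites(toy_alignment, sites=[2, 8])
for pattern, members in sorted(first_level.items()):
    print(pattern, members)
```

Using more columns yields a finer partition, so the choice of diagnostic sites determines the level of the hierarchy that is recovered.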
The delineation of subfamilies suggests the existence
of a small number of progenitors that were responsible for
the amplification of t-SINE members. Thus, for t-SINEs,
an amplification process that is equivalent to SINEs having a common structure (containing a tRNA-unrelated
region as well) must also exist. In the multiple source gene
model, SINE offspring copies serve as multiple sources
for subsequent SINE amplifications (Schmid and Maraia
1992; Shedlock and Okada 2000). This evolutionary model
appears to be appropriate for the amplification of t-SINEs
(fig. 8). Unlike in the master gene model, which implies
long-term persistence of individual source genes, t-SINE
source genes were shown to be derived from each other.
The t-SINE subfamilies were subjected to dimerization,
duplication (trimerization), deletions, insertions, and
substitutions. Certain source genes were successfully
retrotransposed during evolution, and their offspring copies
clearly shape the characteristics of each subfamily.
Whereas a-type sequences were revealed to form the
oldest t-SINE subfamily, with 74%–80% identity to the
tRNAIle, b-type t-SINEs were shown to be derived from
a-type t-SINEs, to which they are 75%–76% identical.
The youngest t-SINE subfamily is represented by the
c-type sequences, which are 81%–92% identical to the
b-type t-SINE subunits (table 1). The number of t-SINE
copies determined in this study is too limited to estimate
the age of major t-SINE lineages. Because the rate of
amplification is expected to be tightly linked to overall
copy numbers, and because c-type sequences are the most
abundant in the genome of the flying lemur, it is expected
that the c-type source gene is highly active and very
successful in its retrotransposition of offspring copies. The
high retrotranspositional frequency of UnaSINE1 may
reflect the affinity of tRNA-derived regions for ribosomes,
which would bring the SINE RNA into proximity with polysomes
that are synthesizing UnaL2 proteins (Kajikawa and Okada
2002). It might further be expected that a trimeric structure
consisting exclusively of tRNA-related subunits (as
represented in the c-type subfamily) strengthens its affinity
for ribosomes. While this hypothesis has not yet been
proven, it is apparent that the youngest trimeric-structured
t-SINE subfamily members are the most dominant in the
flying lemur genome.
FIG. 8.—Multiple source gene model for t-SINE subfamily
amplification. The multiple source gene model illustrates the
amplification process of t-SINE subfamilies in the genome of
Cynocephalus. Some offspring of a parent t-SINE become inactive,
some give rise to nonpropagating copies, and others propagate and
become multiple sources (a-, b-, and c-type and their sub-subfamily
source genes) for new t-SINE copies during evolution.
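Identity percentages of the kind quoted above are computed from pairwise aligned sequences. As a minimal sketch, assuming that columns containing a gap in either sequence are excluded from the comparison (the exact convention behind the values in table 1 is not restated here), the calculation could look as follows; the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned fragments differing at one of ten positions.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(f"{identity:.1f}% identical")
```

Other conventions, such as counting gaps as mismatches or normalizing by the shorter sequence, give different values, so reported ranges are only comparable when the same convention is used throughout.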
SINEs utilize the retrotranspositional enzymes of
their partner LINEs and therefore the retrotranspositional
machinery of a corresponding LINE must be functional for
t-SINE amplification (Ohshima et al. 1996; Jurka 1997;
Okada et al. 1997). Ohshima et al. (1996) proposed that
tRNA-derived SINEs in different nonmammalian species
were generated by fusion with the 3′ ends of their
respective LINEs. Kajikawa and Okada (2002) conclusively demonstrated the mechanism by which SINEs
acquire retrotranspositional activity via LINE-encoded
RTs that recognize the tRNA-unrelated region of SINEs
(identical to that in LINEs). Apart from the 3′ poly-A
region, t-SINEs do not contain a tRNA-unrelated region
that could serve as a recognition site for an RT encoded by
a corresponding LINE family. On the other hand, Jurka
(1997) proposed that Alu and other mammalian retroposons make use of the L1 retrotranspositional machinery.
The recognition specificity of the 3′-end sequence of RNA
is more relaxed in the LINE family L1 (Okada et al.
1997). L1 elements and Alu-SINEs do not share a common
3′ sequence except for a poly-A tail (Boeke 1997), but
L1-encoded proteins are believed to be linked to the
amplification of their partner Alu-SINEs (Esnault,
Maestre, and Heidmann 2000). Interestingly, in primate
genomes the copy number of L1 elements is smaller
(L1-mediated cis retrotransposition) than that of Alus
(L1-mediated trans retrotransposition). Another example in
which trans preference is dominant over cis preference is
the larger copy number of UnaSINE1 compared to its
partner LINE-family UnaL2 (Kajikawa and Okada 2002).
Thus, several examples exist in which L1 elements not
only mobilize their own transcribed RNA sequences but
also retrotranspose SINE transcripts via the 3′ poly-A
sequence. It is very likely that t-SINE retrotransposition is
mediated through the 3′ tail via recognition by L1-encoded
RTs in trans.
Note Added in Proof
Recently, Schmitz and Zischler (2003) independently
published eight sequences of the new tRNA-derived
SINE family in Cynocephalus variegatus and obtained
several similar results.
Acknowledgments
This work was supported by research grants from the
German Academic Exchange Service (DAAD) and from
the Ministry of Education, Culture, Sports, Science and
Technology of Japan. We thank the Indonesian Institute of
Sciences (LIPI) for research permissions (no. 6541/I/KS/
1999 and no. 3452/SU/KS/2002), and the Director of the
Research Center for Biology–LIPI for his sponsorship.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang,
W. Miller, and D. J. Lipman. 1997. Gapped BLAST and
PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25:3389–3402.
Arnason, U., J. A. Adegoke, K. Bodin, E. W. Born, Y. B. Esa, A.
Gullberg, M. Nilsson, R. V. Short, X. Xu, and A. Janke. 2002.
Mammalian mitogenomic relationships and the root of the
eutherian tree. Proc. Natl. Acad. Sci. USA 99:8151–8156.
Blin, N., and D. W. Stafford. 1976. A general method for
isolation of high molecular weight DNA from eukaryotes.
Nucleic Acids Res. 3:2303–2308.
Boeke, J. D. 1997. LINEs and Alus—the poly A connection. Nat.
Genet. 16:6–7.
Britten, R. J., W. F. Baron, D. B. Stout, and E. H. Davidson.
1988. Sources and evolution of human Alu repeated
sequences. Proc. Natl. Acad. Sci. USA 85:4770–4774.
Brosius, J. 1999. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory
elements. Gene 238:115–134.
Cheng, J. F., R. Printz, T. Callaghan, D. Shuey, and R. C.
Hardison. 1984. The rabbit C family of short, interspersed
repeats. Nucleotide sequence determination and transcriptional analysis. J. Mol. Biol. 176:1–20.
Deininger, P. L., and M. A. Batzer. 1993. Evolution of
retroposons. Evol. Biol. 27:157–196.
———. 1995. SINE master genes and population biology. Pp.
43–60 in R. Maraia, ed. The impact of short, interspersed
elements (SINEs) on the host genome. R. G. Landes, Georgetown, Tex.
Deininger, P. L., M. A. Batzer, C. A. Hutchison 3rd, and M. H.
Edgell. 1992. Master genes in mammalian repetitive DNA
amplification. Trends Genet. 8:307–311.
Endoh, H., and N. Okada. 1986. Total DNA transcription in
vitro: a procedure to detect highly repetitive and transcribable
sequences with tRNA-like structures. Proc. Natl. Acad. Sci.
USA 83:251–255.
Esnault, C., J. Maestre, and T. Heidmann. 2000. Human LINE
retrotransposons generate processed pseudogenes. Nat. Genet.
24:363–367.
Feschotte, C., N. Fourrier, I. Desmons, and C. Mouches. 2001.
Birth of a retroposon: the Twin SINE family from the vector
mosquito Culex pipiens may have originated from a dimeric
tRNA precursor. Mol. Biol. Evol. 18:74–84.
Graur, D., and W. H. Li. 2000. Fundamentals of molecular
evolution, 2nd ed. Sinauer Associates, Sunderland, Mass.
Gregory, W. K. 1910. The orders of mammals. Bull. Am. Mus.
Nat. Hist. 27:1–524.
Hall, T. A. 1999. BioEdit: a user-friendly biological sequence
alignment editor and analysis program for Windows 95/98/
NT. Nucl. Acids. Symp. Ser. 41:95–98.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the
human-ape splitting by a molecular clock of mitochondrial
DNA. J. Mol. Evol. 22:160–174.
Hennig, W. 1966. Phylogenetic systematics. University of
Illinois Press, Urbana-Champaign, Ill.
International Human Genome Sequencing Consortium. 2001.
Initial sequencing and analysis of the human genome. Nature
409:860–921.
Jurka, J. 1997. Sequence patterns indicate an enzymatic
involvement in integration of mammalian retroposons. Proc.
Natl. Acad. Sci. USA 94:1872–1877.
Jurka, J., and E. Zuckerkandl. 1991. Free left arms as precursor
molecules in the evolution of Alu sequences. J. Mol. Evol.
33:49–56.
Kajikawa, M., and N. Okada. 2002. LINEs mobilize SINEs in the
eel through a shared 3′ sequence. Cell 111:433–444.
Kawai, K., M. Nikaido, M. Harada, S. Matsumura, L.-K. Lin, Y.
Wu, M. Hasegawa, and N. Okada. 2002. Intra- and interfamily
relationships of Vespertilionidae inferred by various molecular markers including SINE insertion data. J. Mol. Evol.
55:284–301.
Koishi, R., and N. Okada. 1991. Distribution of the salmonid Hpa
1 family species demonstrated by in vitro runoff transcription
assay of total genomic DNA: a procedure to estimate
repetitive frequency and sequence divergence of a certain
repetitive family with a few known sequences. J. Mol. Evol.
32:43–52.
Krane, D. E., A. G. Clark, J.-F. Cheng, and R. C. Hardison. 1991.
Subfamily relationships and clustering of rabbit C repeats.
Mol. Biol. Evol. 8:1–30.
Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for
improved detection of transfer RNA genes in genomic
sequence. Nucleic Acids Res. 25:955–964.
Luan, D. D., M. H. Korman, J. L. Jakubczak, and T. H. Eickbush.
1993. Reverse transcription of R2Bm RNA is primed by a nick
at the chromosomal target site: a mechanism for non-LTR
retrotransposition. Cell 72:595–605.
Madsen, O., M. Scally, C. J. Douady, D. J. Kao, R. W. DeBry, R.
Adkins, H. M. Amrine, M. J. Stanhope, W. W. de Jong, and
M. S. Springer. 2001. Parallel adaptive radiations in two
major clades of placental mammals. Nature 409:610–614.
Moran, J. V., S. E. Holmes, T. P. Naas, R. J. DeBerardinis, J. D.
Boeke, and H. H. Kazazian, Jr. 1996. High frequency
retrotransposition in cultured mammalian cells. Cell 87:917–
927.
Murphy, W. J., E. Eizirik, W. E. Johnson, Y. P. Zhang, O. A.
Ryder, and S. J. O’Brien. 2001a. Molecular phylogenetics and
the origins of placental mammals. Nature 409:614–618.
Murphy, W. J., E. Eizirik, S. J. O’Brien et al. (11 co-authors).
2001b. Resolution of the early placental mammal radiation
using Bayesian phylogenetics. Science 294:2348–2351.
Nikaido, M., H. Nishihara, Y. Fukumoto, and N. Okada. 2003.
Ancient SINEs from African endemic mammals. Mol. Biol.
Evol. 20:522–527.
Nishihara, H., Y. Terai, and N. Okada. 2002. Characterization of
novel Alu- and tRNA-related SINEs from the tree shrew and
evolutionary implications of their origins. Mol. Biol. Evol.
19:1964–1972.
Ohshima, K., M. Hamada, Y. Terai, and N. Okada. 1996. The 3′
ends of tRNA-derived short interspersed repetitive elements
are derived from the 3′ ends of long interspersed repetitive
elements. Mol. Cell. Biol. 16:3756–3764.
Ohshima, K., R. Koishi, M. Matsuo, and N. Okada. 1993.
Several short interspersed elements (SINEs) in distant species
may have originated from a common ancestral retrovirus:
characterization of a squid SINE and a possible mechanism
for generation of tRNA-derived retroposons. Proc. Natl. Acad.
Sci. USA 90:6260–6264.
Okada, N. 1991. SINEs: short interspersed repeated elements of
the eukaryotic genome. Trends Ecol. Evol. 6:358–361.
Okada, N., M. Hamada, I. Ogiwara, and K. Ohshima. 1997.
SINEs and LINEs share common 3′ sequence: a review. Gene
205:229–243.
Okada, N., A. M. Shedlock, and M. Nikaido. 2003. TE mapping
in molecular systematics. In P. Capy, ed. Mobile genetic
elements and their application in genomics. Humana Press,
Totowa, N.J. (in press).
Ostertag, E. M., and H. H. Kazazian, Jr. 2001. Biology
of mammalian L1 retrotransposons. Annu. Rev. Genet. 35:
501–538.
Quentin, Y. 1992. Fusion of a free left Alu monomer and a free
right Alu monomer at the origin of the Alu family in the
primate genomes. Nucleic Acids Res. 20:487–493.
Rogers, J. H. 1985. The origin and evolution of retroposons. Int.
Rev. Cytol. 93:187–279.
Sakamoto, K., and N. Okada. 1985. Rodent type 2 Alu family, rat
identifier sequence, rabbit C family, and bovine or goat 73-bp
repeat may have evolved from tRNA genes. J. Mol. Evol.
22:134–140.
Sargis, E. J. 2002. Primate origins nailed. Science 298:1564–
1565.
Schmid, C. W. 1996. Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog.
Nucleic Acids Res. Mol. Biol. 53:283–319.
Schmid, C. W., and R. Maraia. 1992. Transcriptional regulation
and transpositional selection of active SINE sequences. Curr.
Opin. Genet. Dev. 2:874–882.
Schmitz, J., M. Ohme, B. Suryobroto, and H. Zischler. 2002. The
colugo (Cynocephalus variegatus, Dermoptera): the primates'
gliding sister? Mol. Biol. Evol. 19:2308–2312.
Schmitz, J., and H. Zischler. 2003. A novel family of tRNA-derived SINEs in the colugo and two new retrotransposable
markers separating dermopterans from primates. Mol. Phylogenet. Evol. 28:341–349.
Shedlock, A. M., and N. Okada. 2000. SINE insertions: powerful
tools for molecular systematics. Bioessays 22:148–160.
Shen, M. R., M. A. Batzer, and P. L. Deininger. 1991. Evolution
of the master Alu gene(s). J. Mol. Evol. 33:311–320.
Shimamura, M., H. Abe, M. Nikaido, K. Ohshima, and N.
Okada. 1999. Genealogy of families of SINEs in cetaceans
and artiodactyls: the presence of a huge superfamily of
tRNAGlu-derived families of SINEs. Mol. Biol. Evol.
16:1046–1060.
Shimamura, M., H. Yasue, K. Ohshima, H. Abe, H. Kato, T.
Kishiro, M. Goto, I. Munechika, and N. Okada. 1997.
Molecular evidence from retroposons that whales form a clade
within even-toed ungulates. Nature 388:666–670.
Slagel, V., E. Flemington, V. Traina-Dorge, H. Bradshaw, and P.
Deininger. 1987. Clustering and subfamily relationships of the
Alu family in the human genome. Mol. Biol. Evol. 4:19–29.
Sprinzl, M., C. Horn, M. Brown, M. Ioudovitch, and S.
Steinberg. 1998. Compilation of tRNA sequences and
sequences of tRNA genes. Nucleic Acids Res. 26:148–153.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling:
a quartet maximum-likelihood method for reconstructing tree
topologies. Mol. Biol. Evol. 13:964–969.
Terai, Y., K. Takahashi, and N. Okada. 1998. SINE cousins: the
3′-end tails of two oldest and distantly related families of
SINEs are descended from the 3′ ends of LINEs with the same
genealogical origin. Mol. Biol. Evol. 15:1460–1471.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994.
CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673–4680.
Weiner, A. M., P. L. Deininger, and A. Efstratiadis. 1986.
Nonviral retroposons: genes, pseudogenes, and transposable
elements generated by the reverse flow of genetic information.
Annu. Rev. Biochem. 55:631–661.
Yang, Z. 1996. Among-site rate variation and its impact on
phylogenetic analyses. Trends Ecol. Evol. 11:367–372.
Associate Editor, Naruya Saitou
Accepted May 29, 2003