Sequence Analysis of Transposable Elements in the Sea Squirt,
Ciona intestinalis
Martin W. Simmen and Adrian Bird
Institute of Cell and Molecular Biology, University of Edinburgh, Edinburgh, Scotland
A systematic search of 1 Mb of genomic sequences from the sea squirt, Ciona intestinalis, revealed the presence
of six families of transposable elements. The Cigr-1 retrotransposon contains identical 245-bp long terminal repeats
(LTRs) and a 3,630-bp open reading frame (ORF) encoding translation products in the same order as the domains
characteristic of gypsy/Ty3-type LTR retrotransposons. The closest homologs of the reverse transcriptase domain
were in gypsy elements from Drosophila and the sushi element from the pufferfish. However, the capsid-nucleocapsid
region shows the clearest homology to an echinoderm element, Tgr1. Database searches also indicated two classes
of non-LTR retrotransposon, named Cili-1 and Cili-2. The Cili-1 sequences show matches to regions of the ORF2
product of mammalian L1 elements. The Cili-2 sequences possess similarity to the RNaseH domain of Lian-Aa1,
a mosquito non-LTR retrotransposon. The most abundant element was a short interspersed nucleotide element named
Cics-1 with a copy number estimated at 40,000. Cics-1 consists of two conserved domains separated by an A-rich
stretch. The 172-bp 5′ domain is related to tRNA sequences, whereas the 110-bp 3′ domain is unique. Cics-1 is
unusual, not just in its modular structure, but also in its lack of a 3′ poly(A) tail or direct flanking repeats. A second
abundant element, Cimi-1, has an A+T-rich 193-bp consensus sequence and 30-bp terminal inverted repeats (TIRs)
and is usually flanked by A+T-rich 2–4-bp putative target site duplications, characteristics of miniature inverted-repeat transposable elements found in plants and insects. A single 2,444-bp foldback element was found, possessing
long TIRs containing an A+T-rich internal domain, an array of subrepeats, and a flanking domain at the TIR ends;
this is the first example of a chordate foldback element. This study provides the first systematic characterization of
the families of transposable elements in a lower chordate.
Introduction
Eukaryote genomes harbor a bewildering variety of
transposable elements, the function and evolutionary
significance of which are under debate (e.g., Britten and
Davidson 1969; Doolittle and Sapienza 1980; Orgel and
Crick 1980; Brookfield 1995; Labrador and Corces
1997). These elements can be split into two broad classes according to their modes of transposition (Finnegan
1992). Class I elements perform replicative transposition
via an RNA intermediate which is then reverse transcribed into a cDNA molecule and integrated into the
genome. Such “retroelements” fall into three categories.
First, the long terminal repeat (LTR) retrotransposons
encode the proteins necessary for their own replication
and are closely related to retroviruses (Boeke and Stoye
1997). Second, the non-LTR retrotransposons are also
autonomous but have a different replicative mechanism,
which is believed to utilize the poly(A) tail in their 3′
ends (Luan et al. 1993). Finally, short interspersed nucleotide elements (SINEs) are short elements containing
an RNA polymerase III promoter and a 3′ end which is
A-rich or, less frequently, consists of simple repeats. In
most SINE families the pol III promoter region is derived from a particular tRNA gene, although the abundant human Alu sequences have homology to part of the
7SL RNA gene (Deininger 1989). The discovery that
some SINEs are homologous to the extreme 3′ ends of
non-LTR retrotransposons (Ohshima et al. 1996; Okada
et al. 1997) has given support to the hypothesis that
SINEs depend on the transpositional machinery of non-LTR retrotransposons for mobility (Luan et al. 1993).
Abbreviations: RT, reverse transcriptase; TIR, terminal inverted
repeat.
Key words: retrotransposon, SINEs, LINEs, foldback element, inverted repeat, Ciona intestinalis.
Address for correspondence and reprints: Martin W. Simmen, Institute of Cell and Molecular Biology, University of Edinburgh, Mayfield Road, King’s Buildings, Edinburgh EH9 3JR, United Kingdom.
E-mail: [email protected].
Mol. Biol. Evol. 17(11):1685–1694. 2000
© 2000 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Class II elements mobilize via a DNA intermediate
which is excised and reintegrated elsewhere in the genome by a transposase (Plasterk 1995). Such DNA
transposons possess terminal inverted repeats (TIRs)
containing transposase-binding sites. Some elements
contain an open reading frame (ORF) encoding transposase (e.g., P elements in Drosophila), but copies often
become nonautonomous through mutation. Other element families show characteristics of DNA-mediated
mobilization but neither encode transposase nor appear
to be derivatives of autonomous DNA transposons. Examples are the miniature inverted-repeat transposable elements (MITEs) found in plants (Wessler, Bureau, and
White 1995) and animals (Ünsal and Morgan 1995; Tu
1997) and the foldback elements distinguished by long
modular TIRs containing arrays of direct subrepeats
(Truett, Jones, and Potter 1981; Liebermann et al. 1983;
Rebatchouk and Narita 1997).
Molecular studies on ascidian development play an
important role in attempts to understand the origin of
vertebrates. As primitive members of the phylum Chordata, ascidians in larval stages display many vertebrate-like characteristics, such as a dorsal nerve and a tail
region. We recently analyzed the genome of one ascidian, the sea squirt, Ciona intestinalis, using sequence
data from short fragments and cosmid inserts of genomic DNA to estimate the number of protein-coding
genes (Simmen et al. 1998).
Apart from the partial sequencing of an LTR retrotransposon (Britten et al. 1995), we are unaware of other
work on repeats in ascidians. Here, we report the systematic search for repetitive elements in the C. intestinalis sequences. Members of several element classes are
described, namely, a gypsy/Ty3-type LTR retrotransposon, non-LTR retrotransposons, a tRNA-derived SINE,
a MITE, and a foldback element. A report on the methylation status of some of these elements has been presented elsewhere (Simmen et al. 1999).
FIG. 1.—Multiple-sequence alignment of reverse transcriptase (RT) sequences from Cigr-1 and other gypsy/Ty3-class retrotransposons. The
sequences are ordered by the degree of identity to Cigr-1 RT. Shading is according to a 50% column consensus: amino acids are shaded black
if they match a consensus and gray if they are similar to a consensus. The seven domains previously identified as highly conserved are demarcated
by vertical struts and labeled I–VII (Xiong and Eickbush 1990). The alignment was constructed using CLUSTAL X (Thompson et al. 1997)
and manual adjustment, and the shading is by MACBOXSHADE. The sequence details are as follows: nomad, Drosophila melanogaster
(AF039416); gypsy, D. melanogaster (AF033821); ZAM, D. melanogaster (AJ000387); TED, Trichoplusia ni (M32662); Ty3-2, Saccharomyces
cerevisiae (M23367); sushi, Fugu rubripes (AF030881); MAGGY, Magnaporthe grisea (L35053); Tgr1, Tripneustes gratilla (M75723); Mag,
Bombyx mori (X17219); Cer1, Caenorhabditis elegans (U15406); Cigr-1, Ciona intestinalis (Z83760).
Materials and Methods
DNA Sequences
Ciona intestinalis DNA sequences generated in a
previous study were used (Simmen et al. 1998). These
comprised 1,486 sequences from randomly generated
short fragments (mean read length 592 bp) determined
on a single strand (EMBL accession numbers
AJ226133–AJ227618), and four cosmid sequences, referred to here by their EMBL entry names (accession
numbers in parentheses): cicos1 (Z80904), cicos2
(Z79640), cicos41 (Z83760), and cicos46 (Z83861). The
CIR2 amino acid sequence is given in figure 2 of Britten
et al. (1995).
Identification of Repetitive Elements
To search for known repetitive elements, all of the
DNA sequences were scanned against the nonredundant
NCBI databases using BLASTN and BLASTX (Altschul et al. 1990). To search for novel repetitive elements, the 1,486 short sequences were used to form a
BLAST database, and the four cosmid sequences were
used as queries against this. By visualizing the resulting
hits with the MSPCRUNCH and BLIXEM programs
(Sonnhammer and Durbin 1994), repetitive elements
were identified by their occurrence in different cosmid
regions and in many of the 1,486 short sequences. Preliminary consensus sequences were derived from multiple-sequence alignments of the cosmid hits for each of
the putative elements so detected. Final alignments were
then constructed by aggregating all the sequences from
the short fragments and the cosmids with over 70% nucleotide identity to each of the preliminary consensus
sequences. The Inverted program from the EGCG package (Rice et al. 1996) was used to search the cosmid
sequences for inverted repeats.
Sequence Alignments and Phylogenetic Analysis
Unless stated otherwise, multiple-sequence alignments were generated using Pileup from the GCG Wisconsin Package, version 9.1 (Genetics Computer Group),
using default parameters. In some cases, subsequent manual refinement was performed with an alignment editor,
CINEMA (http://www.biochem.ucl.ac.uk/bsm/dbbrowser/CINEMA2.1/). Consensus sequences were derived from
the multiple-sequence alignments using the GCG utility
Pretty, with parameter values described in the text. Pairwise alignments were made using the GCG programs Gap
and Bestfit. CLUSTAL X (Thompson et al. 1997) was
used to perform the phylogenetic analysis using the neighbor-joining approach (Saitou and Nei 1987).
FIG. 2.—Phylogenetic tree of Cigr-1 and other gypsy/Ty3-class
LTR retrotransposons and retroviruses. The tree is based on the alignment of the amino acids in the reverse transcriptase domain shown in
figure 1, with the addition of two vertebrate retroviruses included for
comparison: Moloney murine leukemia virus, MoMLV (AF033811),
and Rous sarcoma virus, RSV (V01197). A Copia LTR retrotransposon
(EMBL accession number M11240) was used to root the tree. The
analysis was performed using the neighbor-joining method (Saitou and
Nei 1987), and the bootstrap values shown are percentage values from
1,000 replicates performed using CLUSTAL X (Thompson et al. 1997).
Results and Discussion
Gypsy/Ty3-Class LTR Retrotransposon
The cicos41 cosmid contains a 3,630-bp ORF
(starting at position 1310 on the reverse strand) flanked
45 bp upstream and 63 bp downstream by a pair of
identical 245-bp direct repeats. The repeat unit begins
TG . . . and contains a potential polyadenylation signal
at position 231. Immediately 5′ of the downstream repeat is a polypurine tract. These features are characteristic of LTRs, although there is no indication of either
a promoter or the tRNA primer binding site usually
found 3′ of the left LTR in retroviruses and retrotransposons. The LTRs are flanked by a putative 2-bp target
site duplication TA. The 1,209-aa putative ORF product
has similarity to domains encoded by the gag and pol
genes of LTR retrotransposons. Several features indicate
membership of the gypsy/Ty3 class of LTR retrotransposons, so we call the complete 4,226-bp element Cigr-1 (Ciona intestinalis gypsy/Ty3 retrotransposon). First,
the order of the domains (nucleocapsid, NC; protease,
PR; reverse transcriptase, RT; RNaseH, RH; integrase, IN) is that found in gypsy/Ty3-type elements. Second, the top hits in a TBLASTN search of the NCBI
nucleotide database with the ORF product were all to
gypsy/Ty3 elements (P values were in the range 10⁻⁶³–
10⁻⁴⁰; e.g., the hits to nomad of Drosophila melanogaster and sushi of Fugu rubripes had P values of 4 ×
10⁻⁶³ and 2 × 10⁻⁵⁷, respectively), followed by hits to
caulimoviruses (P > 10⁻³⁷), and then to retroviruses (P
> 10⁻²²). A similar pattern emerged when searching
with just the RT domain, residues 464–644 (data not
shown), regarded as the most reliable domain upon
which to base classification (Xiong and Eickbush 1990).
An alignment of the RT domains from Cigr-1 and other
LTR retrotransposons reveals all the motifs expected in
RT (fig. 1).
Several gypsy/Ty3-type elements include a short
env gene coding for a polypeptide similar to that necessary for the extracellular transmission of retroviruses;
thus, they are better classed as endogenous retroviruses,
e.g., gypsy (Pelisson et al. 1994) and ZAM (Leblanc et
al. 1997). Cigr-1 lacks an env gene. Although there are
187 amino acids between the integrase D-DX35E motif
and the C-terminus, this is consistent with the size of
IN domains in other elements. There is also no evidence
of the transmembrane domain expected in an Env
polypeptide.
Several findings indicate that Cigr-1 is a member
of a recently active family. First, Southern blots with
PstI-digested genomic DNA reveal multiple bands, with
different individuals having distinct banding patterns,
indicating that the genomic location of Cigr-1 elements
differs between individuals (Simmen et al. 1999). Second, searching the sequences from 1,486 fragments of
C. intestinalis genomic DNA (see Materials and Methods) revealed four fragments with DNA similarity
(BLASTN; P < 10⁻²⁴) to Cigr-1. In three cases (accession numbers AJ226321, AJ227419, and AJ226522), the
match was 98% or 99%, suggesting that these sequences
lie in recently inserted Cigr-1–type elements. Extrapolating this hit rate directly to the genome yields a Cigr-1 copy number estimate of 75. In contrast, the match
with AJ226402 has only 55% nucleotide identity. It is
notable, however, that the entire AJ226402 sequence is
an ORF encoding 221 amino acids of the IN domain
that is 50% identical to the equivalent stretch of the
Cigr-1 IN. This suggests that this genome contains two
families of gypsy/Ty3-type elements.
Multiple gypsy/Ty3 subfamilies within single species have been found before (e.g., Britten et al. 1995).
Further evidence of this in C. intestinalis comes from
CIR2, a 176-aa fragment of the RT/RH domain of a
retroelement found in C. intestinalis during a study
(Britten et al. 1995) of elements in marine species by
PCR amplification using degenerate primers from Tgr1,
a member of the SURL gypsy/Ty3 family, in the Hawaiian sea urchin, Tripneustes gratilla (Springer, Davidson, and Britten 1991). CIR2 shows only a 24% match
with the equivalent region of the Cigr-1 product. Also,
a TBLASTN search shows that the sequences most similar to CIR2 are Tgr1 (P = 2 × 10⁻²⁴) and the silkworm
gypsy/Ty3 element Mag (P = 3 × 10⁻²²), with the
match to Cigr-1 being weak (P = 0.003). Thus, distinct
gypsy/Ty3 families in C. intestinalis can be more similar
to elements in other species than to each other.
Evolutionary Relationship of Cigr-1 to Other gypsy/
Ty3 Elements
To investigate Cigr-1’s relationship to LTR retrotransposons in other species, a phylogenetic analysis
was performed. This was based on the alignment in figure 1 of the RT domains of Cigr-1 and representative
LTR retrotransposons, plus two retroviruses, using the
copia element from D. melanogaster to root the tree (the
complete alignment is entry ds43388 in the EMBL sequence alignment database). The neighbor-joining tree
(fig. 2) gives a phylogeny broadly similar to those found
in previous studies (e.g., Malik and Eickbush 1999). As
C. intestinalis is a nonvertebrate chordate, we were interested in Cigr-1’s relationship to the puffer fish element sushi (Poulter and Butler 1998), which has been
shown to be representative of most putative vertebrate
gypsy/Ty3 elements found to date (Miller et al. 1999).
Miller et al. (1999) concluded that there are at least two,
and possibly four, vertebrate gypsy/Ty3 lineages (on the
basis of partial RT sequences), with the non-sushi-like
vertebrate elements being related to the Mag/Tgr1 group
(discussed in Springer and Britten 1993). The fact that
the sushi-like elements cluster with fungal elements
(e.g., MAGGY) in phylogenetic reconstructions, coupled with their apparent absence in other deuterostomes,
has led to the conjecture (e.g., Poulter and Butler 1998;
Miller et al. 1999) that they arose by horizontal transmission from either fungi or plants to an early
vertebrate.
Cigr-1 affords a test of this idea, for if a horizontal
transmission event occurred earlier in the primitive chordates, then the descendant lineage in C. intestinalis
should form a sister group to the vertebrate sushi-like
elements. The fact that Cigr-1 and sushi are not neighbors (fig. 2) suggests either that the putative horizontal
transmission event took place after the divergence of the
ascidians from the protovertebrate line or that a family
of sushi-like elements exists in the Ciona genome that
is yet to be discovered. Either way, the situation is complex, as the analysis also shows that nonvertebrate deuterostomes can contain gypsy/Ty3 elements bearing
more similarity (in the RT domain) to the sushi-like
branch than to the Mag/Tgr1 branch. Additional data
and more rigorous phylogenetic analyses would clearly
be useful in clarifying these issues.
In addition, a TBLASTN search revealed that in
the capsid and nucleocapsid domains, Cigr-1 shows
striking similarity to Tgr1 (P = 10⁻¹⁶) and Mag (P =
10⁻⁹), weaker similarity (P > 10⁻⁵) to Arabidopsis thaliana and HIV-1 sequences, and none to the other sequences represented in figure 2. Lack of sequence conservation precludes a reliable phylogenetic analysis
based on the CA/NC domains, but a close relationship
between Cigr-1 and Tgr1 is also supported by their sharing two rare features: two CX₂CX₄HX₄C RNA-binding
motifs in the NC domain (separated in both cases by six
amino acids), and only one ORF.
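The RNA-binding motif named above (Cys, two residues, Cys, four residues, His, four residues, Cys) is simple enough to scan for with a regular expression. The following is a minimal sketch of such a scan, not the procedure used in this study; the function name and the toy peptide are hypothetical, and the peptide is an invented example rather than the Cigr-1 or Tgr1 sequence.

```python
import re

# Cys-X2-Cys-X4-His-X4-Cys, where X is any residue
ZINC_KNUCKLE = re.compile(r"C.{2}C.{4}H.{4}C")


def find_zinc_knuckles(protein):
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in ZINC_KNUCKLE.finditer(protein)]


# Hypothetical peptide (NOT a real element sequence): two motifs
# separated by a six-residue spacer, as described for Cigr-1 and Tgr1.
toy = "MK" + "CAKCGKPGHIARDC" + "PLSKGA" + "CFNCGEKGHLKRNC" + "TE"

hits = find_zinc_knuckles(toy)
# Residues between the end of the first motif and the start of the second
spacer = hits[1][0] - (hits[0][0] + len(hits[0][1]))
print(hits, spacer)
```

Applied to a translated ORF, the same function would report how many motifs are present and the spacing between them, which are the two properties compared in the text.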
The differing phylogenetic signals in the CA/NC
and RT/RH regions suggest that perhaps recombination
events have brought together domains from previously
distinct elements. We speculate that there may be a family of elements in urochordates with homology to the
Mag/Tgr1 group in both the gag and the pol genes. Support for this hypothesis comes from the short CIR2 sequence which shows the strongest similarity to RT/RH
of Tgr1 and Mag and little to Cigr-1. Given the evidence
for recent horizontal transmission of SURL elements (a
family of which Tgr1 is a member) within echinoderms
(Gonzalez and Lessios 1999), another possibility, albeit
a more speculative one, is of a similar, ancient transmission to C. intestinalis.
Non-LTR Retrotransposons
Searching the genomic sequences against the protein database revealed seven fragments which had non-LTR retrotransposons as their closest matches. Three
show similarity (P < 10⁻⁶) to the ORF2 products of
various vertebrate L1 elements. Figure 3A indicates the
similarities with respect to a typical full-length mouse
element, L1spa (EMBL accession number AF016099)
(Naas et al. 1998). In AJ226259 and AJ226190, the pattern of L1 homology is suggestive of the 5′-truncated
copies known to vastly outnumber full-length copies of
mammalian L1s (Voliva et al. 1983). In AJ226870, the
homology is interrupted by a frameshift and three short
insertions relative to L1spa. In the absence of any overlap
between these sequences, there is no formal proof that
they derive from insertions of a common retrotransposon. However, given their common similarity to vertebrate L1-like elements, we suggest that there is such an
element (or closely related families of elements),
which we label Cili-1.
Figure 3B shows the relationship between four sequences that have their closest matches to Lian-Aa1
(EMBL accession number U87543), a non-LTR retrotransposon containing a single 1,189-aa ORF found in
the Aedes aegypti mosquito (Tu, Isoe, and Guzova
1998). The sequences overlap each other (mean pairwise
nucleotide identity by Bestfit 96.7%) and have amino
acid homology to a region starting in the 3′ end of the
RH domain of Lian-Aa1 and extending to within seven
amino acids of the Lian-Aa1 C-terminus. The putative
Ciona element’s ORF extends seven amino acids farther
at the C-terminus than does the Lian-Aa1 ORF and is
followed by a conserved stretch of 98 bp, which we
speculate is a 3′ untranslated region (UTR) (fig. 3B). If
these sequences reflect insertions of a non-LTR retrotransposon, we would predict multiple matches to the
putative 3′ UTR in a search of genomic sequence due
to the high frequency of extreme 5′ truncation events.
This was indeed observed: BLASTN searches using the
putative 3′ UTR (bases 271–368 of AJ226391) found
over a dozen strong matches (P < 10⁻⁶) to the short
Ciona sequences and three to the cosmids (data not
shown). There is therefore evidence for a second non-LTR retrotransposon in C. intestinalis, Cili-2, which, at
least in its 3′ end, bears more homology to insect elements than to vertebrate ones. Extrapolating the observed sample frequencies of Cili-1 and Cili-2 to the
genome suggests copy numbers of approximately 50 per
element.
FIG. 3.—Schematic illustration of the relationship between seven Ciona intestinalis fragments and non-LTR retrotransposons. The shaded
portions of the fragments show similarity at the amino acid level to the corresponding regions of the full-length comparison element. Fragments
are labeled with their EMBL accession numbers (RC denotes reverse complement) and the BLASTX (version 2.0.8) P value and percentage
amino acid identity with the comparison full-length element over the shaded region(s). A, Alignment of the three Ciona fragments which had
strong BLASTX matches to mammalian L1 elements with respect to the representative mouse element L1spa (EMBL accession number
AF016099) (Naas et al. 1998). The locations of the endonuclease (EN) and RT domains in L1spa were determined by comparison to data in
Feng et al. (1996) and Xiong and Eickbush (1990), respectively. AJ226870 RC also has some similarity to L1spa in its 5′ end, but in a different
frame from that in the rest of the fragment (data not shown). B, Alignment of the four Ciona fragments which had the strongest BLASTX
matches to the mosquito element Lian-Aa1 (EMBL accession number U87543). The locations of EN, RT, and RNaseH domains in Lian-Aa1
are according to Tu, Isoe, and Guzova (1998), and the shaded portion of Lian-Aa1 is that which has similarity to one or more Ciona fragments.
An arrow indicates the location of a stop codon found in all four fragments, and the horizontal bar indicates the putative 3′ untranslated region.
These findings support a recent phylogenetic analysis which classified all non-LTR elements into 11
clades and suggested that each clade originated in the
Precambrian era and has since evolved purely by vertical descent (Malik, Burke, and Eickbush 1999). Under
this scheme, Cili-1 would likely be in the L1 clade and
Cili-2 in the LOA clade. Cili-2 therefore significantly
broadens the species distribution of the LOA clade,
which previously contained only arthropod elements.
Composite tRNA-Derived SINE
Three short novel repetitive sequences were identified via the strategy detailed in Materials and Methods.
The distribution of two of these, termed α and γ (approximately 170 and 100 bp long, respectively), in the
cosmids revealed that they tended to colocalize, with α
often being found upstream of one or more γ sequences.
An association between α and γ was also evident from
their distribution in the 1,486 random fragment sequences of genomic DNA.
Further analysis suggested that α and γ are the primary components of a composite tRNA-derived SINE,
which we label Cics-1 (fig. 4). The 172-bp α consensus
sequence was derived from the 23 near-full-length copies of α found in the sequences (mean similarity of the
copies to the consensus 94%, SD 3%). Immediately
downstream of all but one of these α copies is a short
poly(A) region followed by a 12-bp motif (consensus
TAATCACCCACA, termed β) and at least a partial γ
sequence. (A similar pattern is seen in the data set in
which only the 3′ end of α is complete.)
FIG. 4.—Modular structure of the Cics-1 SINE and component consensus sequences. α: boxed regions indicate candidate RNA polymerase
III promoter A and B sites; the single underlined region is tRNA-related according to the tRNAscan-SE program (Lowe and Eddy 1997), and
the double-underlined region is that also found in the AFC family of SINEs in various fish species (see main text). γ: the boxed TTTT motif
is a potential RNA polymerase III transcriptional stop signal. Examples of various Cics-1 variants can be found in the relevant sequence
annotations, as follows: α-p(A)-β-γ (AJ226662), α 3′-p(A)-β-γ₃ (Z80904), p(A)-β-γ₄ (AJ226976).
A total of 35 near-full-length γ sequences or γ clusters (2–4 copies head to tail) were found. Almost all
(32) were flanked immediately 5′ by the β motif. In 21
cases, the flanking region also had similarity to the α 3′
end (or farther), but in the other 14 cases, the γ cluster
appeared to be independent (e.g., in AJ226267). We derived two consensus γ sequences, one from copies 3′ of
an α, and the other from copies not flanked by α. The
former 98-bp consensus is shown in figure 4; the latter
98-bp consensus is 96% identical but lacks the TTTT
motif.
As indicated in figure 4, a 72-bp tRNA-derived region lies at the α 5′ end, containing RNA pol III promoter sites separated by 34 bp, typical of their spacing
in tRNA genes and SINEs (Deininger 1989). The similarity to individual tRNA sequences is moderate; the
closest match is to a tRNA-Thr gene from D. melanogaster (X02575), with 70% identity over bases 5–74 of
α. The relationship was also detected by the tRNAscan-SE program (Lowe and Eddy 1997). As in most other
SINEs, this segment is followed by a tRNA-unrelated
sequence. BLASTN searches indicate that the closest
homologs of Cics-1 in this region are AFC SINEs in
African cichlids (Takahashi et al. 1998). Bases 91–112
of α perfectly match bases from almost the same location in AFCs in several cichlids (e.g., sequences
AB016544 from Julidochromis transcriptus and
AB009707 from Tropheus moorii; BLASTN P =
0.008). Interestingly, this tRNA-unrelated segment of
AFCs has been found to be 74% identical to a 65-bp
“core” sequence shared by many families of SINEs in
eukaryotes (Gilbert and Labuda 1999). Comparing the
reference core sequence used in that study (human Ther-1 consensus; see fig. 4 of Gilbert and Labuda 1999) with
Cics-1-α revealed 55% identity over bases 87–150 of α,
indicating that Cics-1 belongs to the superfamily of SINEs containing this component.
In other respects, however, Cics-1 is unusual. First,
whereas many SINEs have a poly(A) tail, the data indicate that α-p(A) is rarely, if ever, mobilized on its
own. Rather, the almost ubiquitous presence downstream of β and at least part of γ suggests that it is the
α-p(A)-β-γ fusion that is mobile. Composite SINEs
have previously been found (Kaukinen and Varvio 1992;
Izsvák et al. 1996; Serdobova and Kramerov 1998). Second, the 3′ ends of many SINEs are similar to the 3′
ends of non-LTR retrotransposons and are thought to
rely on the latter for mobility (Okada et al. 1997). We
therefore searched for any association between the Cili-2 3′ UTR sequences and Cics-1, but none was apparent.
From sequence data alone, it is impossible to fully
describe how Cics-1 arose or how it mobilizes. However, one possible scenario is that a pol III readthrough
transcript of a tRNA gene or pseudogene coupled to the
SINE core segment was aberrantly polyadenylated then
retrotranscribed and integrated (by enzymes encoded by
Cigr or Cili elements) adjacent to the β-γ sequence. This
event brought into proximity the pol III promoter in α
and the pol III transcriptional stop signal in the γ 3′ end
(fig. 4). It remains unclear, though, how pol III transcripts of the element are reverse transcribed, as the copies lack flanking target site duplications and Cics-1 lacks
the 3′ poly(A) tail believed to help prime this step in
other SINEs (Deininger 1989). Cics-1 is not unique in
this regard: composite SINEs in artiodactyls have simple
repeats at the 3′ end. It may be significant that several
Cics-1 copies have [CATT]2–4 at the 3′ end (e.g., in
AJ226376, AJ226486, and AJ227046).
FIG. 5.—Analysis of the Cimi-1 elements. A multiple-sequence alignment was constructed from the 15 full-length Cimi-1 copies found in
the short genomic sequence data set using Pileup. The consensus sequence in the upper panel was derived from this alignment using the GCG
Pretty program with plurality set to 7; copies found in the cosmid sequences give the same consensus (data not shown). The underlined segments
indicate the terminal inverted repeats. The lower panel summarizes the multiple-sequence alignment. The sequences flanking the elements are
indicated, and putative target site duplications are shown in bold italics. Degree of similarity of the element itself to the consensus is indicated
in the central portion. Sequence identifiers are EMBL accession numbers.
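The legend to figure 5 reports a consensus derived with a plurality setting of 7 from 15 aligned copies. A minimal sketch of a plurality consensus is given here for orientation. It assumes that "plurality" means the minimum number of sequences that must share the most common base in a column for that base to be called; the function name, the gap and unknown symbols, and the toy alignment are hypothetical, and the code is not a reimplementation of the GCG Pretty program.

```python
from collections import Counter


def plurality_consensus(aligned, plurality, gap="-", unknown="n"):
    """Call a base per column only if at least `plurality` sequences agree on it."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        # Gaps do not vote; case is ignored
        counts = Counter(base.upper() for base in column if base != gap)
        if counts:
            base, votes = counts.most_common(1)[0]
        else:
            base, votes = unknown, 0
        consensus.append(base if votes >= plurality else unknown)
    return "".join(consensus)


# Hypothetical four-sequence alignment, invented for illustration
toy_alignment = ["ACGT-A", "ACGTTA", "ACCTTA", "TCGATA"]
relaxed = plurality_consensus(toy_alignment, plurality=3)
strict = plurality_consensus(toy_alignment, plurality=4)
print(relaxed, strict)
```

Raising the threshold leaves more columns uncalled, which is the trade-off a plurality parameter controls.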
Another puzzle concerns the origin of the solitary
β-γ copies and γ clusters, as β-γ lacks a pol III promoter. Perhaps such sequences are the result of incomplete reverse transcription of full-length transcripts
(Weiner, Deininger, and Efstratiadis 1986; Tu 1999). The
mechanism generating γ clusters is unknown, although
it may be relevant that a 71-bp sequence (not shown)
containing a 69% match to bases 1–45 of γ occurs in
head-to-tail arrays in C. intestinalis; the cosmid cicos1,
for example, contains a 29-copy array spanning bases
26504–28535. Whatever the mechanisms allowing it or
parts of it to mobilize, Cics-1 has been highly successful
in proliferating: extrapolation from the frequency of
complete or partial hits (232 in total) in the sequence
sample suggests a genomic copy number of 40,000.
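The text does not spell out the extrapolation formula, so the following is only a sketch under a simple proportional-scaling assumption: every hit in the sample is taken to represent one element copy, and the count is scaled by the ratio of genome size to sampled bases. The inputs are taken from the text (232 hits, 1,486 fragments with a mean read length of 592 bp, a 162-Mb genome). The function name is hypothetical, and the cosmid sequences and any correction for element length are ignored.

```python
def extrapolate_copy_number(hits, n_reads, mean_read_bp, genome_bp):
    """Scale an observed hit count by genome size / sampled bases (assumed model)."""
    sampled_bp = n_reads * mean_read_bp
    return hits * genome_bp / sampled_bp


# Values quoted in the text; treating all 232 hits as coming from the
# short-fragment sample is an assumption of this sketch.
estimate = extrapolate_copy_number(
    hits=232,
    n_reads=1486,
    mean_read_bp=592,
    genome_bp=162_000_000,
)
print(round(estimate))
```

Under these assumptions the estimate comes out at roughly 42,700, the same order as the reported figure of 40,000. For long elements such as Cigr-1, a fragment can overlap an element without lying inside it, so a length correction would be needed and this simple scaling would overestimate.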
Miniature Inverted-Repeat Transposable Element
A third short novel repeat was identified via the
strategy detailed in Materials and Methods. Fifteen near-full-length copies were found, and a 193-bp consensus
sequence was derived (fig. 5). Many incomplete copies,
either truncated or containing internal deletions, were
also found; the copy number was estimated to be
17,000. The element’s features are characteristic of MITEs found in plants and insects (Wessler, Bureau, and
White 1995; Tu 1997), so we label it Cimi-1. First,
Cimi-1 has perfectly matching 30-bp TIRs. Second, the
sequence is A+T-rich (60%). Third, the elements are
usually flanked by 2–4-bp A+T-rich direct repeats, consistent with the bias to A+T-rich insertion target sites
found for other MITEs (Tu 1997). Thirteen out of 15
copies are immediately flanked by TA on both sides; the
two copies that do not are also those with the least similarity to the consensus (fig. 5), consistent with the possibility that the putative original TA repeats have been
altered by mutation. Furthermore, in 6 out of these 13
cases, the direct repeat is TATA. This is a far higher
frequency than expected by chance. In the flanking sequences shown in figure 5, 27% of the dinucleotides are
TA, so the proportion of copies in which the TA direct
repeats are also embedded purely by chance within
TATA repeats on both sides can be estimated as 0.27²
= 0.07, sixfold less than the observed proportion (6/13).
As TA and TATA are palindromic, this analysis cannot
establish whether these repeats are target site duplications or part of Cimi-1’s TIRs. In principle, this can be
resolved by examining cases in which Cimi-1 inserts
into a known sequence, but unfortunately no such cases
were found.
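The chance expectation quoted above can be reproduced in a few lines. The first part of this sketch follows the text directly (0.27 squared, compared with 6 of 13). The binomial tail probability that follows is an addition not made in the text and assumes that the 13 copies are independent; the function name is hypothetical.

```python
from math import comb

p_ta = 0.27                  # TA dinucleotide frequency in the flanking sequences
p_tata_both = p_ta ** 2      # chance that a TA lies next to the TA repeat on both sides
observed = 6 / 13            # copies whose direct repeat is TATA
fold = observed / p_tata_both


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); independence of copies is assumed."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of seeing 6 or more TATA-flanked copies out of 13 by chance
tail = binomial_tail(6, 13, p_tata_both)
print(f"expected {p_tata_both:.3f}, observed {observed:.3f}, "
      f"fold {fold:.1f}, tail {tail:.1e}")
```

The expected proportion is about 0.07 and the observed proportion about 0.46, a ratio of a little over six, in line with the "sixfold" stated in the text.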
Recent work indicates that several MITE families
share TIR sequence similarities with DNA transposons
and that one such family is derived from a Tc1/mariner-class transposon (Feschotte and Mouchès 2000). Cimi-1, however, does not share TIR similarity with those
MITE families. Database searches found no non-Ciona
Cimi-1 homologs, but revealed copies in the UTRs and
introns of various C. intestinalis genes; specifically, five
homeobox genes (X83444, X83447, X83453, X83446,
and AJ002028), a MyoD family gene (U80080), and
myoplasmin-C1 (D42167). Other examples of Cimi-1
copies in genes can be found in figure 1 of Simmen et
al. (1999). This association of MITEs with genes is also
found in other animal and plant species (Wessler, Bureau, and White 1995; Tu 1997). In contrast, the abundance of Cimi-1 in a genome of only 162 Mb (Simmen
et al. 1998) argues against the hypothesis (Tu 1997) that abundant MITEs will be found only in large, highly repetitive genomes.
FIG. 6. Structure of the foldback element. Component names: IR, inverted repeat; ID, inner domain; OD, outer domain; FD, flanking domain; M, middle sequence; L, left; R, right. Relative to the element’s 5′ end, the components have the following nucleotide coordinates: IR-FD-L, 1–130; IR-OD-L, 131–696; IR-ID-L, 709–748; M, 749–1717; IR-ID-R, 1718–1757; IR-OD-R, 1758–2315; IR-FD-R, 2316–2444. The IR-OD 32-bp subrepeat consensus sequence is AGTCTGACAGTTGCAGGTCGTTTTTTTAAAGT.
Foldback Element
A scan of the four C. intestinalis cosmid sequences
for inverted repeats found one prominent pair in Cicos41. Subsequent analysis revealed a 2,444-bp element
(fig. 6) spanning bases 18327–20770, in which each inverted repeat arm has a modular architecture, including
a tandem array of subrepeats. These features are shared
by foldback transposable elements in various eukaryotes, e.g., in Drosophila (Potter 1982), the sea urchin
(Hoffman-Liebermann et al. 1985), and plants (Rebatchouk and Narita 1997).
Dominating each inverted repeat (IR) arm is an array of contiguous 32-bp subrepeats (in previously studied foldbacks, this was located at the IR termini and
labeled OD). The left IR-OD (IR-OD-L) contains 19
subrepeat copies, and IR-OD-R contains 18 copies; in
both domains, the final subrepeat is incomplete. Three
IR-OD-L subrepeats contain short deletions, and one IR-OD-R subrepeat contains a 2-bp insertion. Homology
between the subrepeats is high; the majority-rule 32-bp
consensus sequences from the two domains are identical. Internal to IR-OD is a 40-bp domain (IR-ID) which
shows a high match (37/40 bases) between the sequence
of one arm and the reverse complement of the other.
This domain is highly A+T-rich (77%), characteristic of
IR-IDs in other species (Liebermann et al. 1983; Rebatchouk and Narita 1997). A novel feature is the extra
domain (IR-FD) flanking the ODs. One hundred twenty-seven of the 130 bases in IR-FD-L are matched in the
complementary IR-FD-R. There is no notable sequence
similarity between the different IR domains, nor do they
have any database homologs. The element is immediately flanked on both sides by the sequence GATATGTTT, consistent with the 8–10-bp target insertion
sequences in other foldbacks (Truett, Jones, and Potter
1981; Hoffman-Liebermann et al. 1985; Rebatchouk and
Narita 1997).
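The modular layout described above can be checked against the coordinates given in the figure 6 legend. The Python sketch below is a minimal illustration, not the procedure used in this study: the helper names are hypothetical, the toy sequences are invented, and the arm comparison is a simple ungapped count, whereas arms of unequal length (such as the 130-bp IR-FD-L and the 129-bp IR-FD-R) would require a gapped alignment.

```python
# Domain coordinates (1-based, inclusive) copied from the figure 6 legend.
DOMAINS = {
    "IR-FD-L": (1, 130),
    "IR-OD-L": (131, 696),
    "IR-ID-L": (709, 748),
    "M": (749, 1717),
    "IR-ID-R": (1718, 1757),
    "IR-OD-R": (1758, 2315),
    "IR-FD-R": (2316, 2444),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def domain_length(name: str) -> int:
    """Length in bp of a named domain, from its inclusive coordinates."""
    start, end = DOMAINS[name]
    return end - start + 1


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_matches(left: str, right: str) -> int:
    """Ungapped count of positions where the left arm equals the
    reverse complement of the right arm (a simplifying assumption)."""
    return sum(a == b for a, b in zip(left, reverse_complement(right)))


lengths = {name: domain_length(name) for name in DOMAINS}
for name, length in lengths.items():
    print(f"{name}: {length} bp")

# Toy arms (invented for illustration): identical except for one mismatch.
toy_left = "AATTTAGCAT"
toy_right = reverse_complement("AATTTAGCTT")
toy_score = arm_matches(toy_left, toy_right)
print(f"toy arms match at {toy_score}/{len(toy_left)} positions")
```

The computed lengths agree with the text: 40 bp for each IR-ID, 130 bp for IR-FD-L, and 969 bp for the middle domain M.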
How foldback elements mobilize is unknown, although their structural similarities to class II elements
suggest transposition mediated by a transposase. By
analogy with DNA transposons, the transposase would
be expected to be encoded in the non-repetitive middle
domain (M). However, most foldbacks show no evidence of M encoding proteins and the size and sequence
of M can vary among members of a family (Hoffman-Liebermann et al. 1985), suggesting that most copies are
nonautonomous. The Ciona M domain is only 969 bp
and shows no sign of encoding a transposase: the longest ORF encodes a 99-aa product with no similarity to
any known protein. The foldback in cosmid Cicos41
was the only example found, so proof that it belongs to
a family of dispersed repeats will require further work.
If this were found to be true, it would imply that the
Ciona genome also contains an as yet unidentified DNA
transposon encoding a transposase capable of also mobilizing the foldback element.
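The coding potential of a region such as M can be assessed by scanning all six reading frames for open reading frames. The Python sketch below shows one simple way to do this and is not the method used in this study: the function names and the toy sequence are hypothetical, and the ORF definition (from the first ATG to the next in-frame stop codon) is an assumption, since the text does not state which definition was applied.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-to-stop ORF
    found in any of the six reading frames."""
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG in the current open stretch
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOP_CODONS and start is not None:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


# Toy sequence (invented): ATG AAA TTT GGG TAA sits in the third forward frame.
toy_sequence = "CCATGAAATTTGGGTAACC"
print(longest_orf_codons(toy_sequence))
```

Applied to the 969-bp M domain, a scan of this kind would report the longest ORF in codons, which could then be compared against protein databases.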
Conclusions
Ascidians and the other urochordates form a sister
group of vertebrates within the chordate phylum. Larval
ascidians share many morphological similarities with
higher chordates (Satoh and Jeffery 1995), and this,
combined with their well-characterized development
(Nishida 1987), has led to their use as model systems
for studies of chordate development. Until now, little has
been published about the repetitive elements in ascidians
or other nonvertebrate chordates. In the current work,
we searched a small sample of genomic sequences (1
Mb) from the ascidian C. intestinalis and found examples from five major groups of transposable elements: a
gypsy/Ty3-type LTR retrotransposon, two families of
non-LTR retrotransposons, a SINE, a MITE, and a foldback element.
The discovery of these elements should aid efforts
to unravel the evolutionary history and significance of
various classes of eukaryotic mobile elements. Analysis
of the Cigr-1 LTR retrotransposon indicates that its history may have involved domain-swapping, as the RT/
RH domains are similar to those in vertebrate sushi-like
elements, whereas the CA/NC domains bear close similarity to those in echinoderm SURL elements. The non-LTR elements support a recent phylogeny (Malik,
Burke, and Eickbush 1999) which classed all non-LTR
elements into 11 clades; the two Ciona elements fall into
separate clades and significantly broaden the species distribution in the LOA clade. The two most abundant elements are a MITE and a modular tRNA-derived SINE
with several unusual features: no flanking repeats, an
internal poly(A) region, and a downstream segment that
is also found independently in the genome. Finally, the
foldback element is, to our knowledge, the first example
of this class in a chordate. We also speculate that the
genome may harbor additional families of elements, specifically, another branch of gypsy/Ty3 LTR retrotransposons and an autonomous DNA transposon.
Further study of these repeats should be particularly useful in tracing the origins of vertebrate elements.
We have already shown that the Ciona host genome is
unlikely to suppress element mobility via the mechanism often suggested as serving this function in mammalian genomes, i.e., cytosine methylation (Simmen et
al. 1999). Finally, we also believe that the current study
validates the strategy of systematically searching genomic sequences for repetitive elements, rather than just
detecting elements from well-known families.
Acknowledgments
We thank Susan Tweedie, Jillian Charlton, and the
anonymous referees for comments on the manuscript.
This work was supported by the Wellcome Trust and the
Biotechnology and Biological Sciences Research Council (United Kingdom). M.W.S. is supported by a Research Training Fellowship in Mathematical Biology
from the Wellcome Trust.
LITERATURE CITED
ALTSCHUL, S. F., W. GISH, W. MILLER, E. W. MYERS, and D.
J. LIPMAN. 1990. Basic local alignment search tool. J. Mol.
Biol. 215:403–410.
BOEKE, J. D., and J. P. STOYE. 1997. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. Pp.
343–435 in J. M. COFFIN, S. H. HUGHES, and H. VARMUS,
eds. Retroviruses. Cold Spring Harbor Laboratory Press,
Plainview, N.Y.
BRITTEN, R. J., and E. H. DAVIDSON. 1969. Gene regulation
for higher cells: a theory. Science 165:349–357.
BRITTEN, R. J., T. J. MCCORMACK, T. L. MEARS, and E. H.
DAVIDSON. 1995. Gypsy/Ty3-class retrotransposons integrated in the DNA of herring, tunicate, and echinoderms. J.
Mol. Evol. 40:13–24.
BROOKFIELD, J. F. Y. 1995. Transposable elements as selfish
DNA. Pp. 130–153 in D. J. SHERRATT, ed. Mobile genetic
elements. Oxford University Press, Oxford, England.
DEININGER, P. L. 1989. SINEs: short interspersed repeated
DNA elements in higher eukaryotes. Pp. 619–636 in D. E.
BERG and M. M. HOWE, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
DOOLITTLE, W. F., and C. SAPIENZA. 1980. Selfish genes, the
phenotype paradigm and genome evolution. Nature 284:
601–603.
FENG, Q., J. V. MORAN, H. H. KAZAZIAN JR., and J. D. BOEKE.
1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.
FESCHOTTE, C., and C. MOUCHÈS. 2000. Evidence that a family
of miniature inverted-repeat transposable elements (MITEs)
from the Arabidopsis thaliana genome has arisen from a
pogo-like DNA transposon. Mol. Biol. Evol. 17:730–737.
FINNEGAN, D. J. 1992. Transposable elements. Curr. Opin.
Genet. Dev. 2:861–867.
GILBERT, N., and D. LABUDA. 1999. CORE-SINEs: eukaryotic
short interspersed retroposing elements with common sequence motifs. Proc. Natl. Acad. Sci. USA 96:2869–2874.
GONZALEZ, P., and H. A. LESSIOS. 1999. Evolution of sea urchin retroviral-like (SURL) elements: evidence from 40
echinoid species. Mol. Biol. Evol. 16:938–952.
HOFFMAN-LIEBERMANN, B., D. LIEBERMANN, L. H. KEDES, and
S. N. COHEN. 1985. TU elements: a heterogeneous family
of modularly structured eucaryotic transposons. Mol. Cell.
Biol. 5:991–1001.
IZSVÁK, Z., Z. IVICS, D. GARCIA-ESTEFANIA, S. C. FAHRENKRUG, and P. B. HACKETT. 1996. DANA elements: a family
of composite, tRNA-derived short interspersed DNA elements associated with mutational activities in zebrafish (Danio rerio). Proc. Natl. Acad. Sci. USA 93:1077–1081.
KAUKINEN, J., and S. VARVIO. 1992. Artiodactyl retroposons:
association with microsatellites and use in SINEmorph detection by PCR. Nucleic Acids Res. 20:2955–2958.
LABRADOR, M., and V. G. CORCES. 1997. Transposable element-host interactions: regulation of insertion and excision.
Annu. Rev. Genet. 31:381–404.
LEBLANC, P., S. DESSET, B. DASTUGUE, and C. VAURY. 1997.
Invertebrate retroviruses: ZAM, a new candidate in D. melanogaster. EMBO J. 16:7521–7531.
LIEBERMANN, D., B. HOFFMAN-LIEBERMANN, J. WEINTHAL, G.
CHILDS, R. MAXSON, A. MAURON, S. N. COHEN, and L.
KEDES. 1983. An unusual transposon with long terminal
inverted repeats in the sea urchin Strongylocentrotus purpuratus. Nature 306:342–347.
LOWE, T. M., and S. R. EDDY. 1997. tRNAscan-SE: a program
for improved detection of transfer RNA genes in genomic
sequence. Nucleic Acids Res. 25:955–964.
LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK, and T. H.
EICKBUSH. 1993. Reverse transcription of R2Bm RNA is
primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.
MALIK, H. S., W. D. BURKE, and T. H. EICKBUSH. 1999. The
age and evolution of non-LTR retrotransposable elements.
Mol. Biol. Evol. 16:793–805.
MALIK, H. S., and T. H. EICKBUSH. 1999. Modular evolution
of the integrase domain in the Ty3/gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.
MILLER, K., C. LYNCH, J. MARTIN, E. HERNIOU, and M. TRISTEM. 1999. Identification of multiple gypsy LTR-retrotransposon lineages in vertebrate genomes. J. Mol. Evol. 49:
358–366.
NAAS, T. P., R. J. DEBERARDINIS, J. V. MORAN, E. M. OSTERTAG, S. F. KINGSMORE, M. F. SELDIN, Y. HAYASHIZAKI, S.
L. MARTIN, and H. H. KAZAZIAN JR. 1998. An actively
retrotransposing, novel subfamily of mouse L1 elements.
EMBO J. 17:590–597.
NISHIDA, H. 1987. Cell lineage analysis in ascidian embryos
by intracellular injection of a tracer enzyme. III. Up to the
tissue restricted stage. Dev. Biol. 121:526–541.
OHSHIMA, K., M. HAMADA, Y. TERAI, and N. OKADA. 1996.
The 3′ ends of tRNA-derived short interspersed repetitive
elements are derived from the 3′ ends of long interspersed
repetitive elements. Mol. Cell. Biol. 16:3756–3764.
OKADA, N., M. HAMADA, I. OGIWARA, and K. OHSHIMA. 1997.
SINEs and LINEs share common 3′ sequences: a review.
Gene 205:229–243.
ORGEL, L. E., and F. H. C. CRICK. 1980. Selfish DNA: the
ultimate parasite. Nature 284:604–607.
PELISSON, A., S. SONG, N. PRUD’HOMME, P. SMITH, A. BUCHETON, and V. CORCES. 1994. Gypsy transposition correlates
with the production of a retroviral envelope-like protein under the tissue-specific control of the Drosophila flamenco
gene. EMBO J. 13:4401–4411.
PLASTERK, R. H. A. 1995. Mechanisms of DNA transposition.
Pp. 18–37 in D. J. SHERRATT, ed. Mobile genetic elements.
Oxford University Press, Oxford, England.
POTTER, S. S. 1982. DNA sequence of a foldback transposable
element in Drosophila. Nature 297:201–204.
POULTER, R., and M. BUTLER. 1998. A retrotransposon family
from the pufferfish (fugu) Fugu rubripes. Gene 215:241–
249.
REBATCHOUK, D., and J. O. NARITA. 1997. Foldback transposable elements in plants. Plant Mol. Biol. 34:831–835.
RICE, P., R. LOPEZ, R. DOELZ, and J. LEUNISSEN. 1996. EGCG
8.1 release notes. EMBNET News 3:2–4.
SAITOU, N., and M. NEI. 1987. The neighbour-joining method:
a new method for reconstructing phylogenetic trees. Mol.
Biol. Evol. 4:406–425.
SATOH, N., and W. R. JEFFERY. 1995. Chasing tails in ascidians: developmental insights into the origin and evolution of
chordates. Trends Genet. 11:354–359.
SERDOBOVA, I. M., and D. A. KRAMEROV. 1998. Short retroposons of the B2 superfamily: evolution and application for
the study of rodent phylogeny. J. Mol. Evol. 46:202–214.
SIMMEN, M. W., S. LEITGEB, J. CHARLTON, S. J. M. JONES, B.
R. HARRIS, V. H. CLARK, and A. BIRD. 1999. Nonmethylated transposable elements and methylated genes in a chordate genome. Science 283:1164–1167.
SIMMEN, M. W., S. LEITGEB, V. H. CLARK, S. J. M. JONES, and
A. BIRD. 1998. Gene number in an invertebrate chordate,
Ciona intestinalis. Proc. Natl. Acad. Sci. USA 95:4437–
4440.
SONNHAMMER, E. L. L., and R. DURBIN. 1994. A workbench
for large scale sequence homology analysis. Comput. Appl.
Biosci. 10:301–307.
SPRINGER, M. S., and R. J. BRITTEN. 1993. Phylogenetic relationships of reverse transcriptase and RNase H sequences
and aspects of genome structure in the gypsy group of retrotransposons. Mol. Biol. Evol. 10:1370–1379.
SPRINGER, M. S., E. H. DAVIDSON, and R. J. BRITTEN. 1991.
Retroviral-like element in a marine invertebrate. Proc. Natl.
Acad. Sci. USA 88:8401–8404.
TAKAHASHI, K., Y. TERAI, M. NISHIDA, and N. OKADA. 1998.
A novel family of short interspersed repetitive elements (SINEs) from cichlids: the patterns of insertion of SINES at
orthologous loci support the proposed monophyly of four
major groups of cichlid fishes in Lake Tanganyika. Mol.
Biol. Evol. 15:391–407.
THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN,
and D. G. HIGGINS. 1997. The CLUSTALX windows interface: flexible strategies for multiple sequence alignment
aided by quality analysis tools. Nucleic Acids Res. 25:
4876–4882.
TRUETT, M. A., R. S. JONES, and S. S. POTTER. 1981. Unusual
structure of the FB family of transposable elements in Drosophila. Cell 24:753–763.
TU, Z. 1997. Three novel families of miniature inverted-repeat
transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci.
USA 94:7475–7480.
———. 1999. Genomic and evolutionary analysis of Feilai, a
diverse family of SINES in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 16:760–772.
TU, Z., J. ISOE, and J. A. GUZOVA. 1998. Structural, genomic,
and phylogenetic analysis of Lian, a novel family of non-LTR retrotransposons in the yellow fever mosquito, Aedes
aegypti. Mol. Biol. Evol. 15:837–853.
ÜNSAL, K., and G. T. MORGAN. 1995. A novel group of families of short interspersed repetitive elements (SINEs) in
Xenopus: evidence of a specific target site for DNA-mediated transposition of inverted-repeat SINEs. J. Mol. Biol.
248:812–823.
VOLIVA, C. F., C. L. JAHN, M. B. COMER, C. A. HUTCHISON
III, and M. H. EDGELL. 1983. The L1Md long interspersed
repeat family in the mouse: almost all examples are truncated at one end. Nucleic Acids Res. 11:8847–8859.
WEINER, A. M., P. L. DEININGER, and A. EFSTRATIADIS. 1986.
Nonviral retroposons: genes, pseudogenes, and transposable
elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631–661.
WESSLER, S. R., T. E. BUREAU, and S. W. WHITE. 1995. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5:814–
821.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of
retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.
HOWARD OCHMAN, reviewing editor
Accepted July 13, 2000