MosquI, a Novel Family of Mosquito Retrotransposons Distantly Related to
the Drosophila I Factors, May Consist of Elements of More than One
Origin
Zhijian Tu* and Jennifer J. Hill†
*Department of Biochemistry, Virginia Polytechnic Institute and State University; and †Department of Entomology and
Center for Insect Science, University of Arizona
A novel family of non-long-terminal-repeat (non-LTR) retrotransposons, named MosquI, was discovered in the
yellow fever mosquito, Aedes aegypti. There were approximately 14 copies of MosquI in the A. aegypti genome.
Four of the five analyzed MosquI elements were truncated at the 5′ ends, while the fifth, MosquI-Aa2, was full-length. All five MosquI elements ended with 4–10 TAA tandem repeats, as the Drosophila I factors do. Interestingly,
MosquI elements were often found near genes and other repetitive elements. The 6,623-bp MosquI-Aa2 contained
two open reading frames (ORFs) flanked by a 404-bp 5′ untranslated region and a 326-bp 3′ untranslated region.
The two ORFs code for nucleocapsids, endonuclease, reverse transcriptase, and RNase H domains. Although overall
structural and sequence comparisons suggest that MosquI is highly similar to the Drosophila I factors, phylogenetic
analysis based on the reverse transcriptase domains of 40 non-LTR retrotransposons indicates that MosquI and I
factors are likely paralogous elements which may have been separated before the split between the ancestors of
Mollusca and Arthropoda. Pairwise comparisons between the four truncated MosquI elements showed 96.7%–99.5%
identity at the nucleotide level, while comparisons between the full-length MosquI-Aa2 and the truncated copies
showed only 80.2%–81.8% identity. These comparisons and preliminary phylogenetic analyses suggest that the full-length and truncated MosquI elements may belong to two subfamilies originating from two source genes that
diverged a long time ago. In contrast to the defective I factors in Drosophila melanogaster, which are likely very
old components of the genome, the truncated MosquI elements seem to have been recently active. Finally, the
genomic distribution and evolution of MosquI elements are analyzed in the context of other non-LTR retrotransposons in A. aegypti.
Introduction
Transposable elements are integral components of
eukaryotic genomes. They are classified by the mechanism of their transposition (Finnegan 1992). Class II elements transpose directly from DNA to DNA, while
class I elements transpose via an RNA intermediate.
Class I elements can be further categorized into three
groups, including long terminal repeat (LTR) retrotransposons, non-LTR retrotransposons, and short interspersed nuclear elements (SINEs). Non-LTR retrotransposons utilize internal promoters for their transcription
(Levin 1997). They code for reverse transcriptase and
other functional domains which are essential for retrotransposition. Recent studies suggest that target-primed
reverse transcription, which was first described for the
R2 element of Bombyx mori (Luan et al. 1993), is likely
to be common for non-LTR retrotransposons (Feng et
al. 1996; Levin 1997; Finnegan 1997).
Abbreviations: LINE, long interspersed nuclear element; LTR,
long terminal repeat; ORF, open reading frame.
Key words: non-LTR, retrotransposon, Aedes aegypti, Drosophila,
I factor, evolution.
Address for correspondence and reprints: Zhijian Tu, Department
of Biochemistry, Virginia Polytechnic Institute and State University,
Blacksburg, Virginia 24061. E-mail: [email protected].
Mol. Biol. Evol. 16(12):1675–1686. 1999
© 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
The I factor, a family of non-LTR retrotransposons,
was first discovered in Drosophila melanogaster as the
factor controlling I-R hybrid dysgenesis, a syndrome
of female sterility resulting from a cross between
inducer-strain males and reactive-strain females
(Finnegan 1989; Busseau et al. 1994). The dysgenic
cross results in a high rate of transposition of the I factors
through a mechanism that is just being understood (e.g.,
Jensen, Gassama, and Heidmann 1999). While both the
inducer and the reactive strains contain defective I factors, only the inducer strains possess the full-length active I factors (Busseau et al. 1994). The defective I factors seem to have been derived from an element that
existed in the D. melanogaster genome long before the
recent invasion of the new active I factor. While the
defective I factors are found only in the pericentromeric
regions, the complete I factors are found on the chromosome arms. Like most non-LTR retrotransposons, the
active I factors use an internal promoter that lies within
the first 186 bp for transcription (McLean, Bucheton,
and Finnegan 1993; Udomkit et al. 1996; Minchiotti,
Contursi, and Di Nocera 1997). However, unlike most
non-LTR retrotransposons, which end with strings of
poly-A (Hutchinson et al. 1989), I factors contain unique
TAA tandem repeats at the 3′ end. Active I factors contain two open reading frames (ORFs) that code for nucleocapsids, endonuclease, reverse transcriptase, and
RNase H domains (Fawcett et al. 1986; Finnegan 1989;
Feng et al. 1996; Dawson et al. 1997; Seleme et al.
1999). In addition to D. melanogaster, I factors have
been found in several other Drosophila species, mainly
within the melanogaster species group (Bucheton et al.
1986; Simonelig et al. 1988; Abad et al. 1989).
We here report the discovery and characterization
of MosquI, a novel family of non-LTR retrotransposons
distantly related to the Drosophila I factors. We also
present genomic and evolutionary analysis of MosquI in
the context of other non-LTR retrotransposons. Like
several other transposable elements, MosquI was discovered in Aedes aegypti by serendipity. We are now
systematically studying the molecular genetics and evolution of MosquI and other endogenous mosquito transposable elements (Tu 1997, 1999; Tu, Isoe, and Guzova
1998). We hope that such analyses will provide further
insights into the genetic makeup and organization of
mosquito genomes as well as powerful tools which may
facilitate current efforts to control mosquito-transmitted
diseases using genetic engineering.
Materials and Methods
Genomic Library Screening
The λ Dash II genomic library used in this study
was prepared from the A. aegypti Rock strain by Dr. A.
A. James of the Department of Molecular Biology and
Biochemistry of the University of California at Irvine.
The genomic library was screened using a digoxigenin-labeled ssDNA probe. This probe was prepared by
asymmetric PCR from a dsDNA template that included
a 600-bp region near the 3′ end of MosquI-Aa1. PCR
conditions were the same as those described in Tu and
Hagedorn (1997). Approximately 40,000 plaques were
plated on three 150-mm plates and lifted to MagnaGraph
Nylon membranes (Micron Separation Inc., Westborough, Mass.). The prehybridization solution was 5×
SSC, 0.1% N-lauroylsarcosine, 0.02% SDS, and 2%
nonfat milk. Hybridization occurred at 65°C using 20
ng/ml of the digoxigenin-labeled ssDNA probe. The final washes were carried out in 0.5× SSC containing
0.1% SDS at 65°C. Prehybridization, hybridization, and
washing were performed in a Gene Roller from Savant
Instruments, Inc. (Holbrook, N.Y.).
Estimation of Copy Numbers
The copy number of the MosquI elements in the A.
aegypti genome was estimated during the above screening experiment based on the ratio of positive plaques to
the total number of plaques screened, taking into account the known size of the haploid genome of A. aegypti Rock strain (800 Mb; Rao and Rai 1987) and the
16 kb average insert size of the genomic library. Details
of the method are described in Tu, Isoe, and Guzova
(1998).
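The arithmetic behind such an estimate can be sketched as follows. This is a minimal illustration that assumes a simple proportional model (each positive plaque carries one element, and the library samples the genome evenly); the full procedure in Tu, Isoe, and Guzova (1998) may differ in detail. The function name is hypothetical, and the plaque counts are those reported in Results.

```python
def estimate_copy_number(positive_plaques, total_plaques, genome_size_bp, mean_insert_bp):
    """Estimate copies per haploid genome from a library screen.

    Assumption (not spelled out in the text): each positive plaque carries one
    element, so copies = fraction of positive plaques * number of insert-sized
    segments needed to cover one haploid genome.
    """
    inserts_per_genome = genome_size_bp / mean_insert_bp
    return positive_plaques / total_plaques * inserts_per_genome


copies = estimate_copy_number(
    positive_plaques=11,     # positive plaques reported in Results
    total_plaques=40_000,    # approximate number of plaques screened
    genome_size_bp=800e6,    # 800 Mb haploid genome (Rao and Rai 1987)
    mean_insert_bp=16e3,     # 16 kb average insert size
)
print(round(copies, 2))  # 13.75
```

Under these assumptions the value rounds to the approximately 14 copies per haploid genome given in Results.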
Phage DNA Purification, Subcloning, and DNA
Sequencing
Phage DNA was purified according to Sambrook,
Fritsch, and Maniatis (1989). Fragments of the phage
DNA insert were separated by gel electrophoresis and
subcloned into pBluescript SK(−) plasmid from Stratagene Cloning Systems (La Jolla, Calif.). MosquI sequences were determined from both strands by the DNA
Sequencing Facility of the University of Arizona using
synthetic primers and an automatic sequencer (Model
377, Applied Biosystems International, Foster City, Calif.).
Sequence Analysis
Searches for matches of either nucleotide or amino
acid sequences in the database (nonredundant GenBank
+ EMBL + DDBJ + PDB) were done using FASTA
of GCG, version 9.0 (Genetics Computer Group, Madison,
Wis.), and BLAST (Altschul et al. 1997). E, the
expected frequency of chance occurrence in finding segments with at least a certain level of similarity or higher
between the query and a database sequence, indicates
the significance of the similarities identified in a BLAST
search. Pairwise comparisons were accomplished using
Bestfit and Gap of GCG. Multiple sequences were
aligned by Pileup, which is a progressive, pairwise
method from GCG (gap weight = 8, gap length weight
= 1). Consensus of the multiple-sequence alignment
was obtained using Pretty of GCG. Phylogenetic trees
were constructed using neighbor-joining, minimum-evolution, and maximum-parsimony methods of PAUP* 4.0b1
(Swofford 1998). Specific parameters used in the
phylogenetic analyses are described in the figure legends. Five hundred bootstrap resamplings were used to
assess the confidence in the grouping (Felsenstein and
Kishino 1993).
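The principle of bootstrap resampling can be illustrated with a short sketch. It is not the PAUP* implementation: it only shows how one replicate is drawn by sampling alignment columns with replacement, and it omits the tree building that would follow for each replicate. The toy alignment and the function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


# Hypothetical toy alignment; real input would be the reverse transcriptase alignment.
toy = ["MKVLA", "MKILA", "MRVLG"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(500)]
print(len(replicates), len(replicates[0][0]))  # 500 5
```

A tree would then be inferred from each replicate, and the bootstrap value of a grouping is the percentage of replicate trees in which that grouping appears.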
Results
Discovery of MosquI, a Novel Family of Non-LTR
Retrotransposons in A. aegypti
The first copy of MosquI, MosquI-Aa1, was found
fortuitously in the 3′ flanking sequence of an AaE74-1
gene (unpublished data). The AaE74-1 gene is a homolog of a D. melanogaster transcription factor E74
(Burtis et al. 1990). MosquI-Aa1 is 1,300 bp long
(GenBank accession number AF134899). Although it is
an incomplete copy, it is flanked by 12-bp direct repeats.
Its limited coding sequence showed relatively high similarities to other non-LTR retrotransposons, including
bilbo of Drosophila subobscura (Blesa and Martinez-Sebastian 1997), Lian of A. aegypti (Tu, Isoe, and Guzova 1998), and I factors of D. melanogaster and Drosophila teissieri (Fawcett et al. 1986; Abad et al. 1989),
with E values all lower than 3 × e^-9 according to a
BLASTX analysis. A tandem repeat of seven TAAs was
found at the 3′ end of MosquI-Aa1, similar to the Drosophila I factors. Further analysis described below indicates that MosquI is distantly related to the Drosophila
I factors.
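Counting the terminal TAA repeats of an element is straightforward to automate. The following is a minimal sketch, assuming the sequence is given 5′ to 3′ and that only perfect, uninterrupted TAA units at the very end are counted; the function name and the example sequences are hypothetical and are not the MosquI sequences.

```python
import re


def count_terminal_taa(seq):
    """Count tandem TAA repeats at the very 3' end of a sequence."""
    match = re.search(r"(?:TAA)+$", seq.upper())
    return len(match.group(0)) // 3 if match else 0


# Hypothetical 3' ends, not the real MosquI sequences.
print(count_terminal_taa("GATTACAGG" + "TAA" * 7))  # 7
print(count_terminal_taa("GATTACAGG"))              # 0
```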
Relatively Low Copy Number of MosquI in A. aegypti
To investigate the relative abundance and diversity
of the MosquI family, a genomic library was screened,
using the MosquI-Aa1 probe, under the conditions described in Materials and Methods. Eleven positive
plaques were identified out of approximately 40,000
plaques. Therefore, based on the information described
in Materials and Methods, there should be approximately 14 copies of MosquI elements per haploid A.
aegypti genome. Six of the positive MosquI clones were
further analyzed. One was shown to be the same as the
clone containing MosquI-Aa1. Two other clones were
shown to be identical to one another. This is consistent
with the low copy number of MosquI elements in the
genome.
Structural and Sequence Analyses of the Full-Length
MosquI-Aa2 Suggest that It Is Highly Similar to the
Drosophila I Factors
In addition to MosquI-Aa1, four other MosquI elements were isolated and sequenced (GenBank accession
numbers AF134900–AF134903). The only full-length
element, MosquI-Aa2, is 6,623 bp long, containing two
ORFs flanked by a 404-bp 5′ untranslated region and a
326-bp 3′ untranslated region, as shown in figure 1.
ORF1 and ORF2 are 496 and 1,208 amino acids long,
respectively, separated by a 781-bp noncoding sequence.
The 5′ region contains an initiator sequence CAGT and
a downstream regulatory sequence AGANNCGTG, similar to those known to regulate transcription of other
non-LTR retrotransposons (Minchiotti, Contursi, and Di
Nocera 1997). MosquI-Aa2 also contains a tandem repeat of six TAAs at its 3′ end, which is similar to the
unique sequence at the 3′ end of Drosophila I factors.
Moreover, as shown in figure 2, the overall organization
of the two ORFs and the domains of MosquI-Aa2 is the
same as that of the Drosophila I factors.
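The reported segment lengths of MosquI-Aa2 can be checked against its total length. The sketch below is only a consistency check; it assumes that each ORF contributes 3 bp per amino acid and that the stop codons are counted in the adjacent noncoding segments, which is one reading under which the figures agree.

```python
# Segment lengths reported for MosquI-Aa2 (bp unless noted).
UTR5 = 404
ORF1_AA = 496
SPACER = 781
ORF2_AA = 1208
UTR3 = 326
REPORTED_LENGTH = 6623


def element_length(utr5, orf1_aa, spacer, orf2_aa, utr3):
    """Total length when each ORF contributes 3 bp per amino acid.

    Assumption: stop codons are counted in the adjacent noncoding segments,
    not in the ORF lengths.
    """
    return utr5 + 3 * orf1_aa + spacer + 3 * orf2_aa + utr3


total = element_length(UTR5, ORF1_AA, SPACER, ORF2_AA, UTR3)
print(total, total == REPORTED_LENGTH)  # 6623 True
```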
As shown in table 1, ORF1 of MosquI-Aa2 contains a domain which is most similar to the nucleocapsids of the I factors of D. melanogaster (Dawson et al.
1997; Seleme et al. 1999) according to a BLASTP analysis. Three CCHC motifs, characteristic of the nucleocapsid domains of many retrotransposons, were identified as shown in figure 1. BLAST analysis also showed
that a region in the ORF1, downstream of the CCHC
motifs, had a relatively low similarity to a coiled-coil
motif found in the tropomyosin of the yeast (Pohlmann
and Philippsen 1996). Although the coiled-coil motif has
not been found in invertebrate non-LTR retrotransposons, it has been found in human long interspersed nuclear elements (LINEs). This motif may be responsible
for generating ribonucleoprotein complexes by multimerization (Hohjoh and Singer 1996). It is not clear
whether or not the potential coiled-coil motif in MosquI-Aa2 has a similar function. The ORF2 contains three
domains, namely endonuclease, reverse transcriptase,
and RNase H domains. As shown in table 1, when sequences of these domains were used as queries in
BLAST searches, the Drosophila I factors were again
among the most similar sequences in all of these domains. The similarities in overall organization and domain sequences suggest that MosquI is related to the
Drosophila I factors. However, the relatively high level
of sequence divergence between MosquI and I factors
(table 1) indicates that they may be distantly related.
Phylogenetic Analysis of the Reverse Transcriptase
Domain Is Consistent with the Hypothesis that MosquI
Is Distantly Related to the Drosophila I Factors
Phylogenetic relationships between MosquI and 39
other non-LTR retrotransposons were analyzed using the
reverse transcriptase domain, as shown in figure 3. The
basic pattern of the relationship of the reverse transcriptase domains of these 40 non-LTR retrotransposons is
the same as that of 33 non-LTR retrotransposons analyzed by Tu, Isoe, and Guzova (1998). In addition to
four major groupings shown in the previous analysis,
two other groups are identified because of the addition
of new elements in the analysis. All six major groupings
were supported by bootstrap replicates, scoring higher
than 50% in all three different methods, namely minimum evolution, neighbor joining, and maximum parsimony. MosquI-Aa2 belongs to group V, together with
LINE1-Bg and the I factors from two Drosophila species. LINE1-Bg is a fragment of a non-LTR retrotransposon from a snail, Biomphalaria glabrata (Knight et
al. 1992). The reverse transcriptase domains of MosquI
and LINE1-Bg form a subgroup, while the two I factors
form another. The evolutionary implications of such
groupings are discussed below. The branches separating
MosquI and the other three elements are rather long,
indicating that they are distantly related. The sister relationship between Ingi-Tb (Murphy et al. 1987) and the
group V shown in figure 3 was not supported by bootstrap analyses. Because this is an unrooted tree, the relative relationships between the major groups are not certain. In summary, structural analysis, sequence comparisons, and phylogenetic analysis all suggest that MosquI
is likely a distant relative of the Drosophila I factors.
Frequent 59 Truncations May Be Caused by
Incomplete Reverse Transcription
Only one of the five sequenced elements, MosquI-Aa2, is full-length. The rest are truncated copies ranging
from 443 to 1,300 bp. As shown in figure 4A, all truncations happened at the 5′ end. The 3′ termini are intact,
although the number of TAA repeats varies. Moreover,
all five elements are flanked by short direct repeats that
are putative target duplications, suggesting that the truncations are due neither to deletion nor to recombination
after insertion. It is likely that the truncations were
caused by incomplete reverse transcription. There is no
striking consensus among the direct repeats flanking the
five MosquI elements (fig. 4B).
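A search for flanking direct repeats of this kind can be sketched as follows. The sketch assumes exact matches only, so it would not recover repeats that differ at a single position, and it takes as input the genomic sequence immediately 5′ and 3′ of an element. The function name and the sequences are hypothetical and are not taken from the MosquI clones.

```python
def flanking_direct_repeat(upstream, downstream, max_len=20):
    """Return the longest sequence that ends `upstream` and starts `downstream`.

    A non-empty result is a candidate target-site duplication (exact matches only).
    """
    for k in range(min(max_len, len(upstream), len(downstream)), 0, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# Hypothetical flanks built around a 12-bp duplication.
tsd = "GCATTCGATCAA"
up = "CCGTTAG" + tsd     # genomic sequence just 5' of the element
down = tsd + "TTGACC"    # genomic sequence just 3' of the element
print(flanking_direct_repeat(up, down))  # GCATTCGATCAA
```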
The Full-Length and Truncated MosquI Elements
Form Two Subfamilies
Shown in figure 4A is a multiple-sequence alignment of the five MosquI elements. It is apparent that in
the region shared by all five elements, the four truncated
copies are much more similar to each other than to the
full-length MosquI-Aa2. For example, 101 changes and
2 insertions are found in MosquI-Aa2 when compared
with the consensus of the five elements. Strikingly, only
one of these differences is shared with a truncated copy
of MosquI. As shown in table 2, the pairwise comparisons between the four truncated MosquI elements
showed 96.7%–99.5% identity at the nucleotide level.
However, the pairwise comparisons between the full-length MosquI-Aa2 and the truncated copies showed
only 80.2%–81.8% identity. Therefore, the truncated
copies and the full-length MosquI-Aa2 may form two
subfamilies based on their sequence divergence. Moreover, phylogenetic analyses of the five MosquI elements
showed that the four truncated copies clustered together,
while MosquI-Aa2 was separated as a long branch (data
not shown), which is consistent with the grouping de-
FIG. 1.—Nucleotide and deduced amino acid sequence of MosquI-Aa2, a full-length non-LTR retrotransposon in Aedes aegypti. The 11-bp
direct repeats flanking MosquI-Aa2 are boxed. MosquI-Aa2 contains two open reading frames. Four putative domains are marked by arrows,
including nucleocapsids, endonuclease, reverse transcriptase, and RNase H domains. Three CCHC motifs in the nucleocapsids are underlined.
Note that six TAA tandem repeats are found at the 3′ end.
scribed above. However, as it is not clear where the root
is in the phylogenetic trees, the evolutionary relationships of these elements are not yet certain. As shown in
figure 4A, a large portion of the multiple-sequence alignment is in the 3′ untranslated region. The levels of similarity in the coding and the 3′ regions are comparable
between the truncated copies except for MosquI-Aa3,
which has several divergent nucleotides near the 5′ truncation. However, the similarities between MosquI-Aa2
and the four truncated copies are higher (84.6%–85.8%)
in the coding region than in the 3′ untranslated region
(77.4%–78.9%), perhaps indicating a slower rate of mutation in the coding sequence.
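Percent identity between two aligned sequences can be computed as in the sketch below. It assumes one common convention, in which columns containing a gap in either sequence are excluded; the Gap and Bestfit programs of GCG may treat gaps differently, so values need not match table 2 exactly. The function name and the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one common convention)
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments, for illustration only.
print(round(percent_identity("ACGTTAGC-A", "ACGATAGCTA"), 1))  # 88.9
```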
MosquI Elements Are Often Found Near Genes and
Other Transposable Elements
As shown in figure 5, three of the five MosquI elements, MosquI-Aa1, MosquI-Aa4, and MosquI-Aa5, are
FIG. 2.—Structure of MosquI-Aa2 (A) and the I factor of Drosophila melanogaster (B). The two open reading frames (ORFs) are shown
as open boxes and are separated by a short untranslated region. The domains in each of the ORFs are marked by solid lines above the ORFs.
Both MosquI-Aa2 and the I factor of D. melanogaster contain short 5′ and 3′ untranslated regions and tandem TAA repeats. NC = nucleocapsids;
ENDO = endonuclease; RT = reverse transcriptase; RH = RNase H.
near genes. These genes are AaE74-1 (unpublished
data), a gene similar to a Caenorhabditis elegans gene
coding for an unknown protein (Wilson et al. 1994; E
= 3 × e^-17), and a gene similar to a serine/threonine
protein phosphatase gene of D. melanogaster (Dombradi
et al. 1990; E = 6 × e^-14), respectively. Furthermore,
each of the five MosquI elements is close to at least one
transposable element. In many cases, MosquI elements
are close to multiple transposable elements. For example, four transposable elements are found near MosquI-Aa5. Interestingly, MosquI-Aa1, MosquI-Aa3, and
MosquI-Aa5 contain a transposable element inserted
within their sequences. Except for the BEL-like element
(Davis and Judd 1995), the Q-like element (Besansky,
Bedell, and Mukabayire 1994), and the Wuneng element
(Tu 1997), all transposable elements near a MosquI are,
or are likely to be, full-length.
Discussion
Evolutionary Relationship Between MosquI and Other
Non-LTR Retrotransposons
In addition to the discovery and characterization of
MosquI, we have presented evidence suggesting that
MosquI is highly similar to the Drosophila I factors. We
have also shown that MosquI belongs to the same group
as the I factors and LINE1-Bg (group V) based on analyses of the reverse transcriptase domains of 40 non-LTR
retrotransposons (fig. 3). However, the bootstrap
values for group V were the lowest (63, 68, and 55)
among the six groups, and the sister relationship between Ingi-Tb and group V was not supported by bootstrap analyses. Using an expanded alignment of the reverse transcriptase domain, Malik, Burke, and Eickbush
(1999) recently classified 72 non-LTR retrotransposons
into 11 clades. This extensive new phylogeny is largely
the same as those of previous analyses (Xiong and Eickbush 1990; Tu, Isoe, and Guzova 1998) and what we
described here. However, the classification is much more
comprehensive and the resolution is improved. The I
clade in Malik, Burke, and Eickbush’s (1999) groupings
includes the Drosophila I factors, LINE1-Bg (BGR),
Ingi-Tb (ingi), and L1Tc. Similar to group V in our analysis, the I clade is “the poorest defined,” as it was not
supported by bootstrap analysis using maximum parsimony.
Although MosquI is highly similar to I factors
based on a number of criteria, its reverse transcriptase
domain is most similar to that of LINE1-Bg (table 1).
MosquI and LINE1-Bg form a subgroup within group V
which is supported by bootstrap analysis as shown in
figure 3. LINE1-Bg is a fragment of a non-LTR retrotransposon from a snail, B. glabrata (Knight et al.
1992). If we assume vertical transmission as suggested
by Malik, Burke, and Eickbush (1999), the above phy-
Table 1
Comparison of the Domains of MosquI-Aa2 with Those of Other Non-LTR Retrotransposons
Domain   Element     S         E
NC       I-Dm        123/246   9 × e^-29
NC       I-Dt        122/246   8 × e^-28
NC       Tras1-Bm    88/209    9 × e^-6
NC       R1/R2-Nv    52/140    2 × e^-4
ENDO     I-Dt        106/212   2 × e^-25
ENDO     I-Dm        104/212   NA
ENDO     Lian-Aa1    45/87     3 × e^-8
ENDO     RT1-Ag      50/94     5 × e^-7
RT       LINE1-Bg    145/273   3 × e^-45
RT       I-Dm        129/267   3 × e^-28
RT       I-Dt        125/267   2 × e^-24
RT       Hyp1-Cte    96/193    1 × e^-22
RH       Trim-Dmi    66/127    3 × e^-12
RH       bilbo-Ds    62/122    3 × e^-12
RH       Lian-Aa1    59/124    2 × e^-8
RH       I-Dm        59/124    3 × e^-6
NOTE.—NC = nucleocapsids; ENDO = endonuclease; RT = reverse transcriptase; RH = RNase H; S = similar residues over total comparable residues; E =
E value calculated during a BLAST search. The domains were first identified by pairwise comparisons with known domains from a number of retrotransposons.
BLAST analyses were performed to find similar sequences in the database. Only the four most similar sequences are shown for each domain. The E value for
the comparison between MosquI and I-Dm in the ENDO domain is not available (NA) because the correction of the I-Dm ENDO domain is not entered in the database.
References for the non-LTR retrotransposons are as follows: I-Dm: Fawcett et al. (1986), modified according to Abad et al. (1989); I-Dt: Abad et al. (1989); Tras1-Bm: Okazaki, Ishikawa, and Fujiwara (1995); R1/R2-Nv: GenBank L00950; Lian-Aa1: Tu, Isoe, and Guzova (1998); RT1-Ag: Besansky et al. (1992); LINE1-Bg: Knight et al. (1992); Hyp1-Cte: Blinov et al. (1997); Trim-Dmi: Steinemann and Steinemann (1991); bilbo-Ds: Blesa and Martinez-Sebastian (1997).
FIG. 3.—Phylogenetic analyses of the reverse transcriptase domains of 40 non-LTR retrotransposons including MosquI-Aa2 (marked by an
asterisk). Thirty-three of the 40 elements were analyzed in figure 6A of Tu, Isoe, and Guzova (1998). The seven additional elements include
MosquI-Aa2 (this paper), I-Dt (Abad et al. 1989), LINE1-Mg (GenBank accession number AF018033), Helena-Dy (Petrov, Lozovskaya, and
Hartl 1996), bilbo-Ds (Blesa and Martinez-Sebastian 1997), RT-Ce2 (Wilson et al. 1994), and RT1-Sm (Drew and Brindley 1997). The alignments
used here were obtained using Pileup of GCG (gap weight = 8, gap length weight = 1). A few minor adjustments were made at the N-terminal
end. The entire alignment is deposited in the EMBL database (accession number DS37921). The alignment is highly similar to that of Tu, Isoe,
and Guzova (1998) and Xiong and Eickbush (1990). The tree shown here is an unrooted phylogram constructed using a minimum-evolution
algorithm. The heuristic search was conducted using the tree bisection-reconnection (TBR) branch-swapping algorithm. All characters are of
equal weight and unordered. Three different methods were used, including minimum evolution, neighbor joining and maximum parsimony.
Confidence of the groupings was estimated using 500 bootstrap replications. Each Arabic numeral at the base of a node is the bootstrap value
which represents the percentage of times out of 500 bootstrap resamplings that branches were grouped together at a particular node. The first,
second, and third numbers at a particular node represent the bootstrap values derived from minimum-evolution, neighbor-joining, and maximum-parsimony analysis, respectively. For the parsimony analysis, 20 random additions were done in each bootstrap replicate. Only groupings scored
higher than 50% in all three bootstrap analyses are marked. The Roman numerals at the bases of branch nodes indicate a major grouping of
elements. For example, group I includes elements from Tart-Dm to Juan-Aa. The bootstrap values supporting the six major groups are shown
separately at the bottom right. All phylogenetic analyses were conducted using PAUP* 4.0 b1 (Swofford 1998).
logenetic analysis would indicate that MosquI may be a
paralog of the Drosophila I factors because it is closer
to LINE1-Bg from a snail than to the Drosophila I factors. Thus, there may be at least two subgroups (subclades) within group V (or the I clade) that were separated before the split between the ancestors of Mollusca
and Arthropoda at the latest. Analysis of other related
elements from different genomes will certainly help to
improve our understanding of the evolution of elements
in this relatively poorly defined group.
Genomic Distribution of MosquI
There are approximately 14 copies of MosquI elements, estimated using the stringency described in Materials and Methods. There might be other copies of
MosquI with more divergent sequences that were not
FIG. 4.—A, Multiple-sequence alignment of the four truncated MosquI elements and the 3′ end of the full-length MosquI-Aa2. Pileup of
GCG was used to generate the alignment (GapWeight = 3, GapLengthWeight = 0). Insertions in MosquI-Aa2, MosquI-Aa3, and MosquI-Aa5
were removed prior to generating the alignment. The consensus sequence of the above alignment was created by Pretty (plurality = 3, threshold
= 1) of GCG. Dots indicate sequences that are identical to the consensus. Lowercase letters indicate sequence variation. Dashed lines indicate
gaps. “,” indicates a 5′ truncation. An asterisk indicates the stop codon separating ORF2 and the 3′ untranslated region of MosquI. Note that
the sizes of the four truncated copies vary. B, Direct repeats flanking the five MosquI elements. The lowercase “taa” indicates the equal
possibility of this being part of the TAA tandem repeat of MosquI-Aa1. The lowercase letter “g” indicates the difference at the first nucleotide
between the 5′ repeat (g) and the 3′ repeat (c). The lowercase letter “t” indicates that it is missing in the 3′ repeat.
Table 2
Percentages of Sequence Identity Between the MosquI Elements
              MosquI-Aa1   MosquI-Aa3   MosquI-Aa4   MosquI-Aa5
MosquI-Aa3    97.1
MosquI-Aa4    99.5         97.6
MosquI-Aa5    97.9         96.7         97.6
MosquI-Aa2    81.7         80.2         81.8         80.4
NOTE.—See figure 4 for the sequence alignment of all five MosquI elements.
detected during the screening. However, because
MosquI-Aa2, which is 84% identical to the probe, was
detected under these conditions, any MosquI element
that was not detected should be quite different from the
two subfamilies described here. The distribution of
MosquI in the A. aegypti genome does not seem to be
random. Instead, MosquI elements are often found near
genes and other repetitive elements. It is interesting to
note that three out of the four truncated MosquI elements
contain an insertion of an intact transposable element of
a different family. It is possible that many MosquI ele-
FIG. 5.—MosquI elements and nearby genes and other transposable elements. The figure is not drawn to scale. MosquI elements are shown
as open boxes. Thick arrows indicate retrotransposons, including Mosqcopia-Aa1 (unpublished data), Lian (Tu, Isoe, and Guzova 1998), and
two elements similar to BEL (Davis and Judd 1995) and Q (Besansky, Bedell, and Mukabayire 1994). The orientation of the arrows represents
the orientation of the retrotransposon. Boxes with small squares indicate elements of the Feilai family of SINEs (Tu 1999). Boxes with slanted
stripes indicate miniature inverted-repeat transposable elements, including Wujin, Wuneng, Pony, and Dopey (Tu 1997, unpublished data). The
box with horizontal stripes indicates an as yet unclassified repetitive element. Dotted boxes indicate open reading frames of genes, including
AaE74-1 (unpublished data), a gene similar to a Caenorhabditis elegans gene coding for an unknown protein (Wilson et al. 1994), and a gene
similar to a serine/threonine protein phosphatase gene of D. melanogaster (Dombradi et al. 1990). A question mark indicates that the relative
position is undetermined.
ments may be biased toward noncoding regions of genes
where repetitive elements concentrate. This is consistent
with our preliminary analysis showing concentrations of
a number of repetitive elements in the noncoding regions of a number of genes in A. aegypti (unpublished
data). Obviously, more copies of MosquI elements need
to be analyzed to further understand their distribution.
Nonrandom distributions of retrotransposons and other
transposable elements have been previously shown in A.
aegypti (Tu 1997, 1999; Tu, Isoe, and Guzova 1998).
The distribution patterns of transposable elements are
likely the result of complex interactions between different families of elements and/or between elements and
the host genome. Several mechanisms could account for
the nonrandom distribution and association between different families of repetitive elements and genes in the
genome, as discussed in detail in Tu (1999). In this regard, it may be helpful to view the genome as a complex
ecological system within which the lineage of the host
and the lineages of different transposable elements
evolve (Brookfield 1995; Kidwell and Lisch 1997).
MosquI and Other Non-LTR Retrotransposons in A.
aegypti
In addition to MosquI, three other families of non-LTR retrotransposons have been reported in A. aegypti,
including Juan, JAM1, and Lian (Mouches, Bensaadi,
and Salvado 1992; Hughes, Warren, and Crampton
1996; Tu, Isoe, and Guzova 1998). These three elements
belong to three different clades (Juan: Jockey clade;
JAM1: RTE clade; Lian: LOA clade) as defined by Malik, Burke, and Eickbush (1999). MosquI belongs to
group V in our analysis, which is equivalent to the I
clade in Malik, Burke, and Eickbush (1999). We have
also identified non-LTR retrotransposons in A. aegypti
that belong to the CR1 clade and the R1 clade (unpublished data). Thus, there is a diverse group of non-LTR
retrotransposons in A. aegypti, with at least one representative
from each of six different clades as defined by Malik, Burke,
and Eickbush (1999).
Aedes aegypti has a genome five times the size of
the Drosophila genome. It contains many families of
highly reiterative transposable elements, including the
three non-LTR retrotransposons mentioned above. However, there are only approximately 14 copies of MosquI
per haploid genome. The difference in copy number between MosquI and other non-LTR retrotransposons may
reflect various interactions between non-LTR retrotransposons and the A. aegypti genome. The copy number of
I factors in Drosophila is also low, with 10–15 copies
on the chromosomal arms and approximately 30 defective copies in β-heterochromatin (Busseau et al. 1994).
It has been shown that the transpositional activity of the
I factors in Drosophila can be repressed by the transcription of transgenes containing a small internal region
of the I element (Jensen, Gassama, and Heidmann
1999). Four of the five analyzed MosquI elements are
truncated. It is not clear whether these truncated copies
helped repress the activity of the full-length copy, thus
keeping the number of MosquI elements low.
Evolutionary Origins of the Two Subfamilies of
MosquI Elements and Comparisons with the I Factors
in Drosophila
Sequence comparisons and phylogenetic analyses
suggest that there may be two subfamilies of MosquI
elements, namely the truncated copies and the full-length MosquI-Aa2. It is likely that the truncated copies
analyzed here are derived from a source other than the
full-length MosquI-Aa2. It is possible that there is at
least one other full-length MosquI element which is the
progenitor of the truncated copies. Alternatively, the
truncated copies may have originated from a truncated master gene, borrowing the retrotransposition machinery of the full-length MosquI-Aa2. However, the latter hypothesis is not likely, as the promoter of non-LTR retrotransposons is believed to be in the 5′ end
(McLean, Bucheton, and Finnegan 1993), which is missing in the truncated copies.
Interestingly, it has been shown that there are also
two subfamilies of I factors in D. melanogaster, the defective I factors found in the pericentromeric regions of
both the inducer and the reactive strains, and the I factors originating from the active full-length I factors
which are scattered on chromosomal arms in the inducer
strains (Busseau et al. 1994). It is believed that the defective I factors are ancient components of the Drosophila genome, while the active I factors invaded the
natural populations of D. melanogaster in recent decades. However, the evolutionary relationship between
the two subfamilies of MosquI elements in A. aegypti is
likely to be quite different from the relationship between
the two subfamilies of I factors in D. melanogaster. First
of all, all truncated copies of MosquI analyzed so far
are flanked by direct repeats, while none of the sequenced defective I factors in D. melanogaster are. In
this respect, the truncated MosquI elements are more
similar to the incomplete copies of the active I factors
in the inducer strains of D. melanogaster which are
flanked by direct repeats. Moreover, some of the truncated MosquI elements are highly similar to each other
(99.5% identity). These data suggest that the subfamily
of truncated MosquI elements may have been transposing relatively recently. Based on the presence of the
complete sequence and intact ORFs and the presence of
direct repeats, the full-length MosquI-Aa2 is also likely
an active or recently active element. However, since it
is the only sequence available in this subfamily, it is
difficult to assess the relative time of its activity. Sequence identities between the two subfamilies are relatively low, 84.6%–85.8% in the coding region, and
77.4%–78.9% in the 3′ untranslated region. On the other
hand, the defective and active I factors in D. melanogaster are 94% identical. Thus, either the source genes
of the two subfamilies of MosquI diverged a long time
ago, or they have been evolving at a much faster
rate than the I factors in D. melanogaster. We hypothesize that two divergent MosquI elements have recently
been transposing in the genome of A. aegypti, although
it is not clear whether either one of them is still active.
Analysis of more MosquI sequences from different
strains and natural populations of A. aegypti, and perhaps MosquI from closely related species of mosquitoes,
may be necessary to further understand the evolution of
this family of retrotransposons and their relationships to
different mosquito genomes.
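As a rough illustration of the comparison above, percent identity between two aligned sequences can be converted into an estimated number of substitutions per site. The sketch below is not the method used in this study: it assumes pre-aligned sequences with "-" as the gap character and applies the Jukes-Cantor (JC69) correction, and the function names are hypothetical.

```python
import math


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes `a` and `b` are rows of the same alignment ('-' marks a gap).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def jukes_cantor_distance(identity_percent: float) -> float:
    """JC69 distance d = -(3/4) * ln(1 - 4p/3), where p is the fraction of differing sites.

    JC69 is an illustrative choice of substitution model, not the authors' method.
    """
    p = 1.0 - identity_percent / 100.0
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Identity values quoted in the text (coding region of the two MosquI
    # subfamilies, and defective versus active I factors).
    for label, identity in [("MosquI subfamilies", 85.8), ("I factors", 94.0)]:
        print(label, round(jukes_cantor_distance(identity), 3))
```

Under these assumptions, the 84.6%–85.8% coding-region identity between the two MosquI subfamilies corresponds to a noticeably larger corrected distance than the 94% identity reported for the defective and active I factors, which is consistent with either an older divergence or a faster rate of evolution.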
Potential Applications of the Analysis of Endogenous
Mosquito Transposable Elements
Mosquito-transmitted diseases such as malaria and
dengue fever are on the rise because traditional control
methods have become less effective. An alternative approach is being investigated in which mosquitoes are
genetically transformed to become refractory to disease
pathogens. Analysis of the characteristics, evolution,
and spread of endogenous mosquito transposable elements such as MosquI will provide important basic information needed for the long-term success of such a
genetic strategy. For example, knowledge of the behavior of mosquito transposable elements and their interactions with the host genomes may help in devising better transposon-derived transformation vectors to reduce
possible inactivation by endogenous transposable elements and cross-mobilization of endogenous transposable elements. Moreover, active elements may be identified during the analysis of endogenous transposable elements in mosquitoes. It is not yet clear how effective
it will be to use endogenous transposable elements as
transformation vectors in the same species. However,
active elements found in A. aegypti may at least have
the potential to serve as transformation tools in different
mosquitoes, such as Anopheles gambiae. Finally, some
of the endogenous transposable elements can be used to
develop markers for genetic mapping and population
studies, which are also necessary for the development
of a successful and sustained genetic strategy to control
mosquito-transmitted diseases.
Acknowledgments
We thank H. H. Hagedorn and M. G. Kidwell for
critical comments on the manuscript. We thank A. A.
James for the gift of a genomic library of A. aegypti.
We also thank Skip Vaught and others at the Sequencing
Facility of the University of Arizona for their excellent
service. This work was supported by NIH grant
AI42121 to Z.T. and by a MacArthur Foundation grant
to the Center for Insect Science of the University of
Arizona.
LITERATURE CITED
ABAD, P., C. VAURY, A. PELISSON, M. C. CHABOISSIER, I. BUSSEAU, and A. BUCHETON. 1989. A long interspersed repetitive element—the I factor of Drosophila teissieri—is able
to transpose in different Drosophila species. Proc. Natl.
Acad. Sci. USA 86:8887–8891.
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. ZHANG,
Z. ZHANG, W. MILLER, and D. J. LIPMAN. 1997. Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
BESANSKY, N. J., J. A. BEDELL, and O. MUKABAYIRE. 1994.
Q: a new retrotransposon from the mosquito Anopheles
gambiae. Insect Mol. Biol. 3:49–56.
BESANSKY, N. J., S. M. PASKEWITZ, D. M. HAMM, and F. H.
COLLINS. 1992. Distinct families of site-specific retrotransposons occupy identical positions in the rRNA genes of
Anopheles gambiae. Mol. Cell. Biol. 12:5102–5110.
BLESA, D., and M. J. MARTINEZ-SEBASTIAN. 1997. bilbo, a
non-LTR retrotransposon of Drosophila subobscura: a clue
to the evolution of LINE-like elements in Drosophila. Mol.
Biol. Evol. 14:1145–1153.
BLINOV, A. G., Y. V. SOBANOV, S. V. SCHERBIK, and K. G.
AIMANOVA. 1997. The Chironomus (Camptochironomus)
tentans genome contains two non-LTR retrotransposons.
Genome 40:143–150.
BROOKFIELD, J. F. Y. 1995. Transposable elements as selfish
DNA. Pp. 130–153 in D. J. SHERRATT, ed. Mobile genetic
elements. Oxford University Press, Oxford, England.
BUCHETON, A., M. SIMONELIG, C. VAURY, and M. CROZATIER.
1986. Sequences similar to the I transposable element involved in I-R hybrid dysgenesis in D. melanogaster occur
in other Drosophila species. Nature 322:650–652.
BURTIS, K. C., C. S. THUMMEL, C. W. JONES, F. D. KARIM, and
D. S. HOGNESS. 1990. The Drosophila 74EF early puff contains E74, a complex ecdysone-inducible gene that encodes
two ets-related proteins. Cell 61:85–99.
BUSSEAU, I., M.-C. CHABOISSIER, A. PELISSON, and A. BUCHETON. 1994. I factors in Drosophila melanogaster: transposition under control. Genetica 93:101–116.
DAVIS, P. S., and B. H. JUDD. 1995. Nucleotide sequence of
the transposable element, BEL, of Drosophila melanogaster. Drosoph. Inf. Serv. 76:134–136.
DAWSON, A., E. HARTSWOOD, T. PATERSON, and D. J. FINNEGAN. 1997. A LINE-like transposable element in Drosophila, the I factor, encodes a protein with properties similar to
those of retroviral nucleocapsids. EMBO J. 16:4448–4455.
DOMBRADI, V., J. M. AXTON, N. D. BREWIS, E. F. DA CRUZ E
SILVA, L. ALPHEY, and P. T. COHEN. 1990. Drosophila contains three genes that encode distinct isoforms of protein
phosphatase 1. Eur. J. Biochem. 194:739–745.
DREW, A. C., and P. J. BRINDLEY. 1997. A retrotransposon of
the non-long terminal repeat class from the human blood
fluke Schistosoma mansoni. Similarities to the chicken-repeat-1-like elements of vertebrates. Mol. Biol. Evol. 14:
602–610.
FAWCETT, D. H., C. K. LISTER, E. KELLETT, and D. J. FINNEGAN. 1986. Transposable elements controlling I-R hybrid
dysgenesis in D. melanogaster are similar to mammalian
LINEs. Cell 47:1007–1015.
FELSENSTEIN, J., and H. KISHINO. 1993. Is there something
wrong with the bootstrap on phylogenies? A reply to Hillis
and Bull. Syst. Biol. 42:193–200.
FENG, Q., J. V. MORAN, H. H. KAZAZIAN JR., and J. D. BOEKE.
1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.
FINNEGAN, D. J. 1989. The I factor and I-R hybrid dysgenesis
in Drosophila melanogaster. Pp. 503–517 in D. E. BERG
and M. M. HOWE, eds. Mobile DNA. American Society for
Microbiology, Washington, D.C.
FINNEGAN, D. J. 1992. Transposable elements. Curr. Opin. Genet. Dev.
2:861–867.
FINNEGAN, D. J. 1997. Transposable elements: how non-LTR
retrotransposons do it. Curr. Biol. 7:R245–R248.
HOHJOH, H., and M. F. SINGER. 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and
RNA. EMBO J. 15:630–639.
HUGHES, M. A., A. M. WARREN, and J. M. CRAMPTON. 1996.
JAM1: a novel LINE transposable element in the genome
of the medically important mosquito, Aedes aegypti. Pp.
276 in Proceedings of the XXth International Congress of
Entomology, Florence, Italy.
HUTCHISON, C. A., S. C. HARDIES, D. D. LOEB, W. R. SHEHEE,
and M. H. EDGELL. 1989. LINEs and related retroposons:
long interspersed repeated sequences in the eucaryotic genome. Pp. 593–617 in D. E. BERG and M. M. HOWE, eds.
Mobile DNA. American Society for Microbiology, Washington, D.C.
JENSEN, S., M. P. GASSAMA, and T. HEIDMANN. 1999. Taming
of transposable elements by homology-dependent gene silencing. Nat. Genet. 21:209–212.
KIDWELL, M. G., and D. LISCH. 1997. Transposable elements
as sources of variation in animals and plants. Proc. Natl.
Acad. Sci. USA 94:7704–7711.
KNIGHT, M., A. MILLER, N. RAGHAVAN, C. RICHARDS, and F.
LEWIS. 1992. Identification of a repetitive element in the
snail Biomphalaria glabrata: relationship to the reverse
transcriptase-encoding sequence in LINE-1 transposons.
Gene 118:181–187.
LEVIN, H. L. 1997. It’s prime time for reverse transcriptase.
Cell 88:5–8.
LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK, and T. H.
EICKBUSH. 1993. Reverse transcription of R2Bm RNA is
primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.
MCLEAN, C., A. BUCHETON, and D. J. FINNEGAN. 1993. The
5′ untranslated region of the I factor, a long interspersed
nuclear element-like retrotransposon of Drosophila melanogaster, contains an internal promoter and sequences that
regulate expression. Mol. Cell. Biol. 13:1042–1050.
MALIK, H. S., W. D. BURKE, and T. H. EICKBUSH. 1999. The
age and evolution of non-LTR retrotransposable elements.
Mol. Biol. Evol. 16:793–805.
MINCHIOTTI, G., C. CONTURSI, and P. P. DI NOCERA. 1997.
Multiple downstream promoter modules regulate the transcription of the Drosophila melanogaster I, Doc and F elements. J. Mol. Biol. 267:37–46.
MOUCHES, C., N. BENSAADI, and J. C. SALVADO. 1992. Characterization of a LINE retroposon dispersed in the genome
of three non sibling Aedes mosquito species. Gene 120:183–
190.
MURPHY, N. B., A. PAYS, P. TEBABI, H. COQUELET, M. GUYAUX, M. STEINERT, and E. PAYS. 1987. Trypanosoma brucei
repeated element with unusual structural and transcriptional
properties. J. Mol. Biol. 195:855–871.
OKAZAKI, S., H. ISHIKAWA, and H. FUJIWARA. 1995. Structural
analysis of Tras1, a novel family of telomeric repeat-associated retrotransposons in the silkworm, Bombyx mori. Mol.
Cell. Biol. 15:4545–4552.
PETROV, D. A., E. R. LOZOVSKAYA, and D. L. HARTL. 1996.
High intrinsic rate of DNA loss in Drosophila. Nature 384:
346–349.
POHLMANN, R., and P. PHILIPPSEN. 1996. Sequencing a cosmid
clone of Saccharomyces cerevisiae chromosome XIV reveals 12 new open reading frames (ORFs) and an ancient
duplication of six ORFs. Yeast 12:391–402.
RAO, P. S., and K. S. RAI. 1987. Inter and intraspecific variation
in nuclear DNA content in Aedes mosquitoes. Heredity 59:
253–258.
SAMBROOK, J., E. F. FRITSCH, and T. MANIATIS. 1989. Molecular cloning: a laboratory manual. 2nd edition. Cold Spring
Harbor Press, Cold Spring Harbor, N.Y.
SELEME, M. D., I. BUSSEAU, S. MALINSKY, A. BUCHETON, and
D. TENINGES. 1999. High-frequency retrotransposition of a
marked I factor in Drosophila melanogaster correlates with
a dynamic expression pattern of the ORF1 protein in the
cytoplasm of oocytes. Genetics 151:761–771.
SIMONELIG, M., C. BAZIN, A. PELISSON, and A. BUCHETON.
1988. Transposable and nontransposable elements similar to
the I factor involved in inducer-reactive (IR) hybrid dysgenesis in Drosophila melanogaster coexist in various Drosophila species. Proc. Natl. Acad. Sci. USA 85:1141–1145.
STEINEMANN, M., and S. STEINEMANN. 1991. Preferential Y
chromosomal location of TRIM, a novel transposable element of Drosophila miranda, obscura group. Chromosoma
101:169–179.
SWOFFORD, D. L. 1998. PAUP*. Version 4.0 b1. (A commercial test version; completed version 4.0 to be distributed by
Sinauer, Sunderland, Mass.)
TU, Z. 1997. Three novel families of miniature inverted-repeat
transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci.
USA 94:7475–7480.
TU, Z. 1999. Genomic and evolutionary analysis of Feilai, a
diverse family of highly reiterated SINEs in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 16:760–772.
TU, Z., and H. H. HAGEDORN. 1997. Biochemical, molecular,
and phylogenetic analysis of pyruvate carboxylase in the
yellow fever mosquito, Aedes aegypti. Insect Biochem.
Mol. Biol. 27:133–147.
TU, Z., J. ISOE, and J. A. GUZOVA. 1998. Structural, genomic,
and phylogenetic analysis of Lian, a novel family of non-LTR retrotransposons in the yellow fever mosquito, Aedes
aegypti. Mol. Biol. Evol. 15:837–853.
UDOMKIT, A., S. FORBES, C. MCLEAN, and D. J. FINNEGAN.
1996. Control of expression of the I factor, a LINE-like
transposable element in Drosophila melanogaster. EMBO
J. 15:3174–3181.
WILSON, R., R. AINSCOUGH, K. ANDERSON et al. (53 co-authors). 1994. 2.2 Mb of contiguous nucleotide sequence
from chromosome III of C. elegans. Nature 368:32–38.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of
retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.
PIERRE CAPY, reviewing editor
Accepted September 6, 1999