MosquI, a Novel Family of Mosquito Retrotransposons Distantly Related to the Drosophila I Factors, May Consist of Elements of More than One Origin

Zhijian Tu* and Jennifer J. Hill†

*Department of Biochemistry, Virginia Polytechnic Institute and State University; and †Department of Entomology and Center for Insect Science, University of Arizona

A novel family of non-long-terminal-repeat (non-LTR) retrotransposons, named MosquI, was discovered in the yellow fever mosquito, Aedes aegypti. There were approximately 14 copies of MosquI in the A. aegypti genome. Four of the five analyzed MosquI elements were truncated at the 5′ end, while the fifth, MosquI-Aa2, was full-length. All five MosquI elements ended with 4–10 TAA tandem repeats, as the Drosophila I factors do. Interestingly, MosquI elements were often found near genes and other repetitive elements. The 6,623-bp MosquI-Aa2 contained two open reading frames (ORFs) flanked by a 404-bp 5′ untranslated region and a 326-bp 3′ untranslated region. The two ORFs code for nucleocapsids, endonuclease, reverse transcriptase, and RNase H domains. Although overall structural and sequence comparisons suggest that MosquI is highly similar to the Drosophila I factors, phylogenetic analysis based on the reverse transcriptase domains of 40 non-LTR retrotransposons indicates that MosquI and I factors are likely paralogous elements which may have been separated before the split between the ancestors of mollusca and arthropoda. Pairwise comparisons between the four truncated MosquI elements showed 96.7%–99.5% identity at the nucleotide level, while comparisons between the full-length MosquI-Aa2 and the truncated copies showed only 80.2%–81.8% identity. These comparisons and preliminary phylogenetic analyses suggest that the full-length and truncated MosquI elements may belong to two subfamilies originating from two source genes that diverged a long time ago.
In contrast to the defective I factors in Drosophila melanogaster, which are likely very old components of the genome, the truncated MosquI elements seem to have been recently active. Finally, the genomic distribution and evolution of MosquI elements are analyzed in the context of other non-LTR retrotransposons in A. aegypti.

Abbreviations: LINE, long interspersed nuclear element; LTR, long terminal repeat; ORF, open reading frame.
Key words: non-LTR, retrotransposon, Aedes aegypti, Drosophila, I factor, evolution.
Address for correspondence and reprints: Zhijian Tu, Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061. E-mail: [email protected].
Mol. Biol. Evol. 16(12):1675–1686. 1999
© 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Transposable elements are integral components of eukaryotic genomes. They are classified by the mechanism of their transposition (Finnegan 1992). Class II elements transpose directly from DNA to DNA, while class I elements transpose via an RNA intermediate. Class I elements can be further categorized into three groups: long terminal repeat (LTR) retrotransposons, non-LTR retrotransposons, and short interspersed nuclear elements (SINEs). Non-LTR retrotransposons utilize internal promoters for their transcription (Levin 1997). They code for reverse transcriptase and other functional domains which are essential for retrotransposition. Recent studies suggest that target-primed reverse transcription, which was first described for the R2 element of Bombyx mori (Luan et al. 1993), is likely to be common for non-LTR retrotransposons (Feng et al. 1996; Levin 1997; Finnegan 1997).

The I factor, a family of non-LTR retrotransposons, was first discovered in Drosophila melanogaster as the factor controlling the I-R hybrid dysgenesis, a syndrome of female sterility resulting from a cross between the inducer-strain males and the reactive-strain females (Finnegan 1989; Busseau et al. 1994). The dysgenic cross results in a high rate of transposition of the I factors through a mechanism that is just being understood (e.g., Jensen, Gassama, and Heidmann 1999). While both the inducer and the reactive strains contain defective I factors, only the inducer strains possess the full-length active I factors (Busseau et al. 1994). The defective I factors seem to have been derived from an element that existed in the D. melanogaster genome long before the recent invasion of the new active I factor. While the defective I factors are found only in the pericentromeric regions, the complete I factors are found on the chromosome arms. Like most non-LTR retrotransposons, the active I factors use an internal promoter that lies within the first 186 bp for transcription (McLean, Bucheton, and Finnegan 1993; Udomkit et al. 1996; Minchiotti, Contursi, and Di Nocera 1997). However, unlike most non-LTR retrotransposons, which end with strings of poly-A (Hutchinson et al. 1989), I factors contain unique TAA tandem repeats at the 3′ end. Active I factors contain two open reading frames (ORFs) that code for nucleocapsids, endonuclease, reverse transcriptase, and RNase H domains (Fawcett et al. 1986; Finnegan 1989; Feng et al. 1996; Dawson et al. 1997; Seleme et al. 1999). In addition to D. melanogaster, I factors have been found in several other Drosophila species, mainly within the melanogaster species group (Bucheton et al. 1986; Simonelig et al. 1988; Abad et al. 1989).

We here report the discovery and characterization of MosquI, a novel family of non-LTR retrotransposons distantly related to the Drosophila I factors. We also present genomic and evolutionary analysis of MosquI in the context of other non-LTR retrotransposons. Like several other transposable elements, MosquI was discovered in Aedes aegypti by serendipity.
We are now systematically studying the molecular genetics and evolution of MosquI and other endogenous mosquito transposable elements (Tu 1997, 1999; Tu, Isoe, and Guzova 1998). We hope that such analyses will provide further insights into the genetic makeup and organization of mosquito genomes as well as powerful tools which may facilitate current efforts to control mosquito-transmitted diseases using genetic engineering.

Materials and Methods

Genomic Library Screening

The λ Dash II genomic library used in this study was prepared from the A. aegypti Rock strain by Dr. A. A. James of the Department of Molecular Biology and Biochemistry of the University of California at Irvine. The genomic library was screened using a digoxigenin-labeled ssDNA probe. This probe was prepared by asymmetric PCR from a dsDNA template that included a 600-bp region near the 3′ end of MosquI-Aa1. PCR conditions were the same as those described in Tu and Hagedorn (1997). Approximately 40,000 plaques were plated on three 150-mm plates and lifted to MagnaGraph Nylon membranes (Micron Separation Inc., Westborough, Mass.). The prehybridization solution was 5× SSC, 0.1% N-lauroylsarcosine, 0.02% SDS, and 2% nonfat milk. Hybridization occurred at 65°C using 20 ng/ml of the digoxigenin-labeled ssDNA probe. The final washes were carried out in 0.5× SSC containing 0.1% SDS at 65°C. Prehybridization, hybridization, and washing were performed in a Gene Roller from Savant Instruments, Inc. (Holbrook, N.Y.).

Estimation of Copy Numbers

The copy number of the MosquI elements in the A. aegypti genome was estimated during the above screening experiment based on the ratio of positive plaques to the total number of plaques screened, taking into account the known size of the haploid genome of A. aegypti Rock strain (800 Mb; Rao and Rai 1987) and the 16 kb average insert size of the genomic library. Details of the method are described in Tu, Isoe, and Guzova (1998).
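The estimate described above can be illustrated with a short calculation. This is a minimal sketch, not the exact procedure of Tu, Isoe, and Guzova (1998): it assumes that the library represents the genome uniformly and that each positive plaque carries a single element. The function and parameter names are hypothetical, and the count of 11 positive plaques is the one reported in Results.

```python
def estimate_copy_number(positive_plaques, total_plaques,
                         insert_size_bp, genome_size_bp):
    """Estimate element copies per haploid genome from a library screen.

    Illustrative assumptions: uniform genome representation in the
    library and one element per positive clone.
    """
    # Number of haploid genome equivalents covered by the screened inserts.
    genome_equivalents = total_plaques * insert_size_bp / genome_size_bp
    # Positives per genome equivalent approximates copies per genome.
    return positive_plaques / genome_equivalents


copies = estimate_copy_number(
    positive_plaques=11,          # positive plaques reported in Results
    total_plaques=40_000,         # approximate number of plaques screened
    insert_size_bp=16_000,        # 16-kb average insert size
    genome_size_bp=800_000_000,   # 800-Mb haploid genome (Rock strain)
)
print(round(copies, 2))  # 13.75, i.e. approximately 14 copies
```

With these inputs the screen covers 0.8 genome equivalents, so 11 positives correspond to about 14 copies per haploid genome, in line with the figure given in the text.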
Phage DNA Purification, Subcloning, and DNA Sequencing

Phage DNA was purified according to Sambrook, Fritsch, and Maniatis (1989). Fragments of the phage DNA insert were separated by gel electrophoresis and subcloned into pBluescript SK (−) plasmid from Stratagene Cloning Systems (La Jolla, Calif.). MosquI sequences were determined from both strands by the DNA Sequencing Facility of the University of Arizona using synthetic primers and an automatic sequencer (Model 377, Applied Biosystems International, Foster City, Calif.).

Sequence Analysis

Searches for matches of either nucleotide or amino acid sequences in the database (nonredundant GenBank + EMBL + DDBJ + PDB) were done using FASTA of GCG, version 9.0 (Genetics Computer Group, Madison, Wis.), and BLAST (Altschul et al. 1997). E, the expected frequency of chance occurrence in finding segments with at least a certain level of similarity or higher between the query and a database sequence, indicates the significance of the similarities identified in a BLAST search. Pairwise comparisons were accomplished using Bestfit and Gap of GCG. Multiple sequences were aligned by Pileup, which is a progressive, pairwise method from GCG (gap weight = 8, gap length weight = 1). Consensus of the multiple-sequence alignment was obtained using Pretty of GCG. Phylogenetic trees were constructed using neighbor-joining, minimum-evolution, and maximum-parsimony methods of PAUP* 4.0 b1 (Swofford 1998). Specific parameters used in the phylogenetic analyses are described in the figure legends. Five hundred bootstrap resamplings were used to assess the confidence in the grouping (Felsenstein and Kishino 1993).

Results

Discovery of MosquI, a Novel Family of Non-LTR Retrotransposons in A. aegypti

The first copy of MosquI, MosquI-Aa1, was found fortuitously in the 3′ flanking sequence of an AaE74-1 gene (unpublished data). The AaE74-1 gene is a homolog of a D. melanogaster transcription factor E74 (Burtis et al. 1990).
MosquI-Aa1 is 1,300 bp long (GenBank accession number AF134899). Although it is an incomplete copy, it is flanked by 12-bp direct repeats. Its limited coding sequence showed relatively high similarities to other non-LTR retrotransposons, including bilbo of Drosophila subobscura (Blesa and Martinez-Sebastian 1997), Lian of A. aegypti (Tu, Isoe, and Guzova 1998), and I factors of D. melanogaster and Drosophila teissieri (Fawcett et al. 1986; Abad et al. 1989), with E values all lower than 3 × e−9 according to a BLASTX analysis. A tandem repeat of seven TAAs was found at the 3′ end of MosquI-Aa1, similar to the Drosophila I factors. Further analysis described below indicates that MosquI is distantly related to the Drosophila I factors.

Relatively Low Copy Number of MosquI in A. aegypti

To investigate the relative abundance and diversity of the MosquI family, a genomic library was screened, using the MosquI-Aa1 probe, under the conditions described in Materials and Methods. Eleven positive plaques were identified out of approximately 40,000 plaques. Therefore, based on the information described in Materials and Methods, there should be approximately 14 copies of MosquI elements per haploid A. aegypti genome. Six of the positive MosquI clones were further analyzed. One was shown to be the same as the clone containing MosquI-Aa1. Two other clones were shown to be identical to one another. This is consistent with the low copy number of MosquI elements in the genome.

Structural and Sequence Analysis of the Full-Length MosquI-Aa2 Suggest that It Is Highly Similar to the Drosophila I Factors

In addition to MosquI-Aa1, four other MosquI elements were isolated and sequenced (GenBank accession numbers AF134900–AF134903). The only full-length element, MosquI-Aa2, is 6,623 bp long, containing two ORFs flanked by a 404-bp 5′ untranslated region and a 326-bp 3′ untranslated region, as shown in figure 1.
ORF1 and ORF2 are 496 and 1,208 amino acids long, respectively, separated by a 781-bp noncoding sequence. The 5′ region contains an initiator sequence CAGT and a downstream regulatory sequence AGANNCGTG, similar to those known to regulate transcription of other non-LTR retrotransposons (Minchiotti, Contursi, and Di Nocera 1997). MosquI-Aa2 also contains a tandem repeat of six TAAs at its 3′ end, which is similar to the unique sequence at the 3′ end of Drosophila I factors. Moreover, as shown in figure 2, the overall organization of the two ORFs and the domains of MosquI-Aa2 is the same as that of the Drosophila I factors.

As shown in table 1, ORF1 of MosquI-Aa2 contains a domain which is most similar to the nucleocapsids of the I factors of D. melanogaster (Dawson et al. 1997; Seleme et al. 1999) according to a BLASTP analysis. Three CCHC motifs, characteristic of the nucleocapsid domains of many retrotransposons, were identified as shown in figure 1. BLAST analysis also showed that a region in ORF1, downstream of the CCHC motifs, had a relatively low similarity to a coiled-coil motif found in the tropomyosin of the yeast (Pohlmann and Philippsen 1996). Although the coiled-coil motif has not been found in invertebrate non-LTR retrotransposons, it has been found in human long interspersed nuclear elements (LINEs). This motif may be responsible for generating ribonucleoprotein complexes by multimerization (Hohjoh and Singer 1996). It is not clear whether or not the potential coiled-coil motif in MosquI-Aa2 has a similar function.

ORF2 contains three domains, namely endonuclease, reverse transcriptase, and RNase H domains. As shown in table 1, when sequences of these domains were used as queries in BLAST searches, the Drosophila I factors were again among the most similar sequences in all of these domains. The similarities in overall organization and domain sequences suggest that MosquI is related to the Drosophila I factors.
However, the relatively high level of sequence divergence between MosquI and I factors (table 1) indicates that they may be distantly related.

Phylogenetic Analysis of the Reverse Transcriptase Domain Is Consistent with the Hypothesis that MosquI Is Distantly Related to the Drosophila I Factors

Phylogenetic relationships between MosquI and 39 other non-LTR retrotransposons were analyzed using the reverse transcriptase domain, as shown in figure 3. The basic pattern of the relationship of the reverse transcriptase domains of these 40 non-LTR retrotransposons is the same as that of 33 non-LTR retrotransposons analyzed by Tu, Isoe, and Guzova (1998). In addition to four major groupings shown in the previous analysis, two other groups are identified because of the addition of new elements in the analysis. All six major groupings were supported by bootstrap replicates, scoring higher than 50% in all three different methods, namely minimum evolution, neighbor joining, and maximum parsimony. MosquI-Aa2 belongs to group V, together with LINE1-Bg and the I factors from two Drosophila species. LINE1-Bg is a fragment of a non-LTR retrotransposon from a snail, Biomphalaria glabrata (Knight et al. 1992). The reverse transcriptase domains of MosquI and LINE1-Bg form a subgroup, while the two I factors form another. The evolutionary implications of such groupings are discussed below. The branches separating MosquI and the other three elements are rather long, indicating that they are distantly related. The sister relationship between Ingi-Tb (Murphy et al. 1987) and group V shown in figure 3 was not supported by bootstrap analyses. Because this is an unrooted tree, the relative relationships between the major groups are not certain. In summary, structural analysis, sequence comparisons, and phylogenetic analysis all suggest that MosquI is likely a distant relative of the Drosophila I factors.
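Bootstrap support values of the kind reported for these groupings come from resampling alignment columns with replacement and counting how often a grouping recurs. The toy sketch below illustrates only that resampling idea. It is not PAUP* and not the analysis used here: tree building is replaced by a simple mutual-nearest-neighbor test on p-distances, and the sequences, names, and function names are all hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_pair_support(alignment, first, second, replicates=500, seed=0):
    """Fraction of bootstrap replicates in which `first` and `second`
    are each other's nearest neighbor by p-distance.

    A stand-in for tree-based support; a real analysis rebuilds a tree
    for every replicate.
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[first])
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: [alignment[n][c] for c in cols] for n in names}

        def nearest(name):
            others = [n for n in names if n != name]
            return min(others,
                       key=lambda n: p_distance(resampled[name], resampled[n]))

        if nearest(first) == second and nearest(second) == first:
            hits += 1
    return hits / replicates


# Made-up aligned sequences, for illustration only.
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTAT",
    "seqC": "TTGTCCGAAC",
    "seqD": "TTGTCCGATC",
}
support = bootstrap_pair_support(toy, "seqA", "seqB")
print(f"{support:.0%} of replicates pair seqA with seqB")
```

A grouping that is recovered in more than half of the replicates would pass the 50% criterion applied to the major groups, although the proxy used in this sketch is far cruder than the minimum-evolution, neighbor-joining, and maximum-parsimony searches described in the text.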
Frequent 5′ Truncations May Be Caused by Incomplete Reverse Transcription

Only one of the five sequenced elements, MosquI-Aa2, is full-length. The rest are truncated copies ranging from 443 to 1,300 bp. As shown in figure 4A, all truncations happened at the 5′ end. The 3′ termini are intact, although the number of TAA repeats varies. Moreover, all five elements are flanked by short direct repeats that are putative target duplications, suggesting that the truncations are due neither to deletion nor to recombination after insertion. It is likely that the truncations were caused by incomplete reverse transcription. There is no striking consensus among the direct repeats flanking the five MosquI elements (fig. 4B).

The Full-Length and Truncated MosquI Elements Form Two Subfamilies

Shown in figure 4A is a multiple-sequence alignment of the five MosquI elements. It is apparent that in the region shared by all five elements, the four truncated copies are much more similar to each other than to the full-length MosquI-Aa2. For example, 101 changes and 2 insertions are found in MosquI-Aa2 when compared with the consensus of the five elements. Strikingly, only one of these differences is shared with a truncated copy of MosquI. As shown in table 2, the pairwise comparisons between the four truncated MosquI elements showed 96.7%–99.5% identity at the nucleotide level. However, the pairwise comparisons between the full-length MosquI-Aa2 and the truncated copies showed only 80.2%–81.8% identity. Therefore, the truncated copies and the full-length MosquI-Aa2 may form two subfamilies based on their sequence divergence. Moreover, phylogenetic analyses of the five MosquI elements showed that the four truncated copies clustered together, while MosquI-Aa2 was separated as a long branch (data not shown), which is consistent with the grouping described above. However, as it is not clear where the root is in the phylogenetic trees, the evolutionary relationships of these elements are not yet certain.

As shown in figure 4A, a large portion of the multiple-sequence alignment is in the 3′ untranslated region. The levels of similarity in the coding and the 3′ regions are quite similar between the truncated copies except for MosquI-Aa3, which has several divergent nucleotides near the 5′ truncation. However, the similarities between MosquI-Aa2 and the four truncated copies are higher (84.6%–85.8%) in the coding region than in the 3′ untranslated region (77.4%–78.9%), perhaps indicating a slower rate of mutation in the coding sequence.

FIG. 1. Nucleotide and deduced amino acid sequence of MosquI-Aa2, a full-length non-LTR retrotransposon in Aedes aegypti. The 11-bp direct repeats flanking MosquI-Aa2 are boxed. MosquI-Aa2 contains two open reading frames. Four putative domains are marked by arrows, including nucleocapsids, endonuclease, reverse transcriptase, and RNase H domains. Three CCHC motifs in the nucleocapsids are underlined. Note that six TAA tandem repeats are found at the 3′ end.

FIG. 2. Structure of MosquI-Aa2 (A) and the I factor of Drosophila melanogaster (B). The two open reading frames (ORFs) are shown as open boxes and are separated by a short untranslated region. The domains in each of the ORFs are marked by solid lines above the ORFs. Both MosquI-Aa2 and the I factor of D. melanogaster contain short 5′ and 3′ untranslated regions and tandem TAA repeats. NC = nucleocapsids; ENDO = endonuclease; RT = reverse transcriptase; RH = RNase H.

MosquI Elements Are Often Found Near Genes and Other Transposable Elements

As shown in figure 5, three of the five MosquI elements, MosquI-Aa1, MosquI-Aa4, and MosquI-Aa5, are near genes. These genes are AaE74-1 (unpublished data), a gene similar to a Caenorhabditis elegans gene coding for an unknown protein (Wilson et al. 1994; E = 3 × e−17), and a gene similar to a serine/threonine protein phosphatase gene of D.
melanogaster (Dombradi et al. 1990; E = 6 × e−14), respectively. Furthermore, each of the five MosquI elements is close to at least one transposable element. In many cases, MosquI elements are close to multiple transposable elements. For example, four transposable elements are found near MosquI-Aa5. Interestingly, MosquI-Aa1, MosquI-Aa3, and MosquI-Aa5 contain a transposable element inserted within their sequences. Except for the BEL-like element (Davis and Judd 1995), the Q-like element (Besansky, Bedell, and Mukabayire 1994), and the Wuneng element (Tu 1997), all transposable elements near a MosquI are, or are likely to be, full-length.

Discussion

Evolutionary Relationship Between MosquI and Other Non-LTR Retrotransposons

In addition to the discovery and characterization of MosquI, we have presented evidence suggesting that MosquI is highly similar to the Drosophila I factors. We have also shown that MosquI belongs to the same group as the I factors and LINE1-Bg (group V) based on analyses of the reverse transcriptase domains of 40 non-LTR retrotransposons (fig. 3). However, the bootstrap values for group V were the lowest (63, 68, and 55) among the six groups, and the sister relationship between Ingi-Tb and group V was not supported by bootstrap analyses. Using an expanded alignment of the reverse transcriptase domain, Malik, Burke, and Eickbush (1999) recently classified 72 non-LTR retrotransposons into 11 clades. This extensive new phylogeny is largely the same as those of previous analyses (Xiong and Eickbush 1990; Tu, Isoe, and Guzova 1998) and what we described here. However, the classification is much more comprehensive and the resolution is improved. The I clade in Malik, Burke, and Eickbush’s (1999) groupings includes the Drosophila I factors, LINE1-Bg (BGR), Ingi-Tb (ingi), and L1Tc. Similar to group V in our analysis, the I clade is “the poorest defined,” as it was not supported by bootstrap analysis using maximum parsimony.
Although MosquI is highly similar to I factors based on a number of criteria, its reverse transcriptase domain is most similar to that of LINE1-Bg (table 1). MosquI and LINE1-Bg form a subgroup within group V which is supported by bootstrap analysis as shown in figure 3. LINE1-Bg is a fragment of a non-LTR retrotransposon from a snail, B. glabrata (Knight et al. 1992). If we assume vertical transmission as suggested by Malik, Burke, and Eickbush (1999), the above phylogenetic analysis would indicate that MosquI may be a paralog of the Drosophila I factors because it is closer to LINE1-Bg from a snail than to the Drosophila I factors. Thus, there may be at least two subgroups (subclades) within group V (or the I clade) that were separated before the split between the ancestors of mollusca and arthropoda at the latest. Analysis of other related elements from different genomes will certainly help to improve our understanding of the evolution of elements in this relatively poorly defined group.

Table 1
Comparison of the Domains of MosquI-Aa2 with Those of Other Non-LTR Retrotransposons

Domain  Element    S        E
NC      I-Dm       123/246  9 × e−29
NC      I-Dt       122/246  8 × e−28
NC      Tras1-Bm   88/209   9 × e−6
NC      R1/R2-Nv   52/140   2 × e−4
ENDO    I-Dt       106/212  2 × e−25
ENDO    I-Dm       104/212  NA
ENDO    Lian-Aa1   45/87    3 × e−8
ENDO    RT1-Ag     50/94    5 × e−7
RT      LINE1-Bg   145/273  3 × e−45
RT      I-Dm       129/267  3 × e−28
RT      I-Dt       125/267  2 × e−24
RT      Hyp1-Cte   96/193   1 × e−22
RH      Trim-Dmi   66/127   3 × e−12
RH      bilbo-Ds   62/122   3 × e−12
RH      Lian-Aa1   59/124   2 × e−8
RH      I-Dm       59/124   3 × e−6

NOTE. NC = nucleocapsids; ENDO = endonuclease; RT = reverse transcriptase; RH = RNase H; S = similar residues over total comparable residues; E = E value calculated during a BLAST search; NA = not available. The domains were first identified by pairwise comparisons with known domains from a number of retrotransposons. BLAST analyses were performed to find similar sequences in the database. Only the four most similar sequences are shown for each domain. The E value for the comparison between MosquI and I-Dm in the ENDO domain is not available because the correction of the I-Dm ENDO domain is not entered in the database. References for the non-LTR retrotransposons are as follows: I-Dm, Fawcett et al. (1986), modified according to Abad et al. (1989); I-Dt, Abad et al. (1989); Tras1-Bm, Okazaki, Ishikawa, and Fujiwara (1995); R1/R2-Nv, GenBank L00950; Lian-Aa1, Tu, Isoe, and Guzova (1998); RT1-Ag, Besansky et al. (1992); LINE1-Bg, Knight et al. (1992); Hyp1-Cte, Blinov et al. (1997); Trim-Dmi, Steinemann and Steinemann (1991); bilbo-Ds, Blesa and Martinez-Sebastian (1997).

FIG. 3. Phylogenetic analyses of the reverse transcriptase domains of 40 non-LTR retrotransposons including MosquI-Aa2 (marked by an asterisk). Thirty-three of the 40 elements were analyzed in figure 6A of Tu, Isoe, and Guzova (1998). The seven additional elements include MosquI-Aa2 (this paper), I-Dt (Abad et al. 1989), LINE1-Mg (GenBank accession number AF018033), Helena-Dy (Petrov, Lozovskaya, and Hartl 1996), bilbo-Ds (Blesa and Martinez-Sebastian 1997), RT-Ce2 (Wilson et al. 1994), and RT1-Sm (Drew and Brindley 1997). The alignments used here were obtained using Pileup of GCG (gap weight = 8, gap length weight = 1). A few minor adjustments were made at the N-terminal end. The entire alignment is deposited in the EMBL database (accession number DS37921). The alignment is highly similar to that of Tu, Isoe, and Guzova (1998) and Xiong and Eickbush (1990). The tree shown here is an unrooted phylogram constructed using a minimum-evolution algorithm. The heuristic search was conducted using the tree bisection-reconnection (TBR) branch-swapping algorithm. All characters are of equal weight and unordered. Three different methods were used, including minimum evolution, neighbor joining, and maximum parsimony. Confidence of the groupings was estimated using 500 bootstrap replications. Each Arabic numeral at the base of a node is the bootstrap value which represents the percentage of times out of 500 bootstrap resamplings that branches were grouped together at a particular node. The first, second, and third numbers at a particular node represent the bootstrap values derived from minimum-evolution, neighbor-joining, and maximum-parsimony analysis, respectively. For the parsimony analysis, 20 random additions were done in each bootstrap replicate. Only groupings that scored higher than 50% in all three bootstrap analyses are marked. The Roman numerals at the bases of branch nodes indicate a major grouping of elements. For example, group I includes elements from Tart-Dm to Juan-Aa. The bootstrap values supporting the six major groups are shown separately at the bottom right. All phylogenetic analyses were conducted using PAUP* 4.0 b1 (Swofford 1998).

Genomic Distribution of MosquI

There are approximately 14 copies of MosquI elements, estimated using the stringency described in Materials and Methods. There might be other copies of MosquI with more divergent sequences that were not detected during the screening. However, because MosquI-Aa2, which is 84% identical to the probe, was detected under these conditions, any MosquI element that was not detected should be quite different from the two subfamilies described here. The distribution of MosquI in the A. aegypti genome does not seem to be random. Instead, MosquI elements are often found near genes and other repetitive elements. It is interesting to note that three out of the four truncated MosquI elements contain an insertion of an intact transposable element of a different family. It is possible that many MosquI elements are biased toward noncoding regions of genes where repetitive elements concentrate. This is consistent with our preliminary analysis showing concentrations of a number of repetitive elements in the noncoding regions of a number of genes in A. aegypti (unpublished data). Obviously, more copies of MosquI elements need to be analyzed to further understand their distribution.

Nonrandom distributions of retrotransposons and other transposable elements have been previously shown in A. aegypti (Tu 1997, 1999; Tu, Isoe, and Guzova 1998). The distribution patterns of transposable elements are likely the result of complex interactions between different families of elements and/or between elements and the host genome. Several mechanisms could account for the nonrandom distribution and association between different families of repetitive elements and genes in the genome, as discussed in detail in Tu (1999). In this regard, it may be helpful to view the genome as a complex ecological system within which the lineage of the host and the lineages of different transposable elements evolve (Brookfield 1995; Kidwell and Lisch 1997).

FIG. 4. A, Multiple-sequence alignment of the four truncated MosquI elements and the 3′ end of the full-length MosquI-Aa2. Pileup of GCG was used to generate the alignment (GapWeight = 3, GapLengthWeight = 0). Insertions in MosquI-Aa2, MosquI-Aa3, and MosquI-Aa5 were removed prior to generating the alignment. The consensus sequence of the above alignment was created by Pretty (plurality = 3, threshold = 1) of GCG. Dots indicate sequences that are identical to the consensus. Lowercase letters indicate sequence variation. Dashed lines indicate gaps. “,” indicates a 5′ truncation. An asterisk indicates the stop codon separating ORF2 and the 3′ untranslated region of MosquI. Note that the sizes of the four truncated copies vary. B, Direct repeats flanking the five MosquI elements. The lowercase “taa” indicates the equal possibility of this being part of the TAA tandem repeat of MosquI-Aa1. The lowercase letter “g” indicates the difference at the first nucleotide between the 5′ repeat (g) and the 3′ repeat (c). The lowercase letter “t” indicates that it is missing in the 3′ repeat.

Table 2
Percentages of Sequence Identity Between the MosquI Elements

             MosquI-Aa1  MosquI-Aa3  MosquI-Aa4  MosquI-Aa5
MosquI-Aa3   97.1
MosquI-Aa4   99.5        97.6
MosquI-Aa5   97.9        96.7        97.6
MosquI-Aa2   81.7        80.2        81.8        80.4

NOTE. See figure 4 for the sequence alignment of all five MosquI elements.

FIG. 5. MosquI elements and nearby genes and other transposable elements. The figure is not drawn to scale. MosquI elements are shown as open boxes. Thick arrows indicate retrotransposons, including Mosqcopia-Aa1 (unpublished data), Lian (Tu, Isoe, and Guzova 1998), and two elements similar to BEL (Davis and Judd 1995) and Q (Besansky, Bedell, and Mukabayire 1994). The orientation of the arrows represents the orientation of the retrotransposon. Boxes with small squares indicate elements of the Feilai family of SINEs (Tu 1999). Boxes with slanted stripes indicate miniature inverted-repeat transposable elements, including Wujin, Wuneng, Pony, and Dopey (Tu 1997, unpublished data). The box with horizontal stripes indicates an as yet unclassified repetitive element. Dotted boxes indicate open reading frames of genes, including AaE74-1 (unpublished data), a gene similar to a Caenorhabditis elegans gene coding for an unknown protein (Wilson et al. 1994), and a gene similar to a serine/threonine protein phosphatase gene of D. melanogaster (Dombradi et al. 1990). A question mark indicates that the relative position is undetermined.

MosquI and Other Non-LTR Retrotransposons in A. aegypti

In addition to MosquI, three other families of non-LTR retrotransposons have been reported in A. aegypti, including Juan, JAM1, and Lian (Mouches, Bensaadi, and Salvado 1992; Hughes, Warren, and Crampton 1996; Tu, Isoe, and Guzova 1998).
These three elements belong to three different clades (Juan: Jockey clade; JAM1: RTE clade; Lian: LOA clade) as defined by Malik, Burke, and Eickbush (1999). MosquI belongs to group V in our analysis, which is equivalent to the I clade in Malik, Burke, and Eickbush (1999). We have also identified non-LTR retrotransposons in A. aegypti that belong to the CR1 clade and the R1 clade (unpublished data). Thus, there is a diverse group of non-LTR retrotransposons in A. aegypti, with at least one representative from each of six different clades as defined by Malik, Burke, and Eickbush (1999).

Aedes aegypti has a genome five times the size of the Drosophila genome. It contains many families of highly reiterative transposable elements, including the three non-LTR retrotransposons mentioned above. However, there are only approximately 14 copies of MosquI per haploid genome. The difference in copy number between MosquI and other non-LTR retrotransposons may reflect various interactions between non-LTR retrotransposons and the A. aegypti genome. The copy number of I factors in Drosophila is also low, with 10–15 copies on the chromosomal arms and approximately 30 defective copies in β-heterochromatin (Busseau et al. 1994). It has been shown that the transpositional activity of the I factors in Drosophila can be repressed by the transcription of transgenes containing a small internal region of the I element (Jensen, Gassama, and Heidmann 1999). Four of the five analyzed MosquI elements are truncated. It is not clear whether these truncated copies helped repress the activity of the full-length copy, thus keeping the number of MosquI elements low.

Evolutionary Origins of the Two Subfamilies of MosquI Elements and Comparisons with the I Factors in Drosophila

Sequence comparisons and phylogenetic analyses suggest that there may be two subfamilies of MosquI elements, namely the truncated copies and the full-length MosquI-Aa2.
It is likely that the truncated copies analyzed here are derived from a source other than the full-length MosquI-Aa2. It is possible that there is at least one other full-length MosquI element which is the progenitor of the truncated copies. Alternatively, the truncated copies may have originated from a truncated master gene, borrowing the retrotransposition machinery of the full-length MosquI-Aa2. However, the latter hypothesis is not likely, as the promoter for the non-LTR retrotransposons is believed to be in the 5′ end (McLean, Bucheton, and Finnegan 1993), which is missing in the truncated copies. Interestingly, it has been shown that there are also two subfamilies of I factors in D. melanogaster: the defective I factors found in the pericentromeric regions of both the inducer and the reactive strains, and the I factors originating from the active full-length I factors, which are scattered on chromosomal arms in the inducer strains (Busseau et al. 1994). It is believed that the defective I factors are ancient components of the Drosophila genome, while the active I factors invaded the natural populations of D. melanogaster in recent decades. However, the evolutionary relationship between the two subfamilies of MosquI elements in A. aegypti is likely to be quite different from the relationship between the two subfamilies of I factors in D. melanogaster. First, all truncated copies of MosquI analyzed so far are flanked by direct repeats, while none of the sequenced defective I factors in D. melanogaster are. In this respect, the truncated MosquI elements are more similar to the incomplete copies of the active I factors in the inducer strains of D. melanogaster, which are flanked by direct repeats. Moreover, some of the truncated MosquI elements are highly similar to each other (99.5% identity). These data suggest that the subfamily of truncated MosquI elements may have been transposing relatively recently.
Based on the presence of the complete sequence and intact ORFs and the presence of direct repeats, the full-length MosquI-Aa2 is also likely an active or recently active element. However, since it is the only sequence available in this subfamily, it is difficult to assess the relative time of its activity. Sequence identities between the two subfamilies are relatively low: 84.6%–85.8% in the coding region and 77.4%–78.9% in the 3′ untranslated region. On the other hand, the defective and active I factors in D. melanogaster are 94% identical. Thus, either the source genes of the two subfamilies of MosquI diverged a long time ago, or they have been evolving at a much faster rate than the I factors in D. melanogaster. We hypothesize that two divergent MosquI elements have recently been transposing in the genome of A. aegypti, although it is not clear whether either one of them is still active. Analysis of more MosquI sequences from different strains and natural populations of A. aegypti, and perhaps MosquI from closely related species of mosquitoes, may be necessary to further understand the evolution of this family of retrotransposons and their relationships to different mosquito genomes.

Potential Applications of the Analysis of Endogenous Mosquito Transposable Elements

Mosquito-transmitted diseases such as malaria and dengue fever are on the rise because traditional control methods have become less effective. An alternative approach is being investigated in which mosquitoes are genetically transformed to become refractory to disease pathogens. Analysis of the characteristics, evolution, and spread of endogenous mosquito transposable elements such as MosquI will provide important basic information needed for the long-term success of such a genetic strategy.
For example, knowledge of the behavior of mosquito transposable elements and their interactions with the host genomes may help in devising better transposon-derived transformation vectors that reduce possible inactivation by endogenous transposable elements and cross-mobilization of endogenous transposable elements. Moreover, active elements may be identified during the analysis of endogenous transposable elements in mosquitoes. It is not yet clear how effective it will be to use endogenous transposable elements as transformation vectors in the same species. However, active elements found in A. aegypti may at least have the potential to serve as transformation tools in different mosquitoes, such as Anopheles gambiae. Finally, some of the endogenous transposable elements can be used to develop markers for genetic mapping and population studies, which are also necessary for the development of a successful and sustained genetic strategy to control mosquito-transmitted diseases.

Acknowledgments

We thank H. H. Hagedorn and M. G. Kidwell for critical comments on the manuscript. We thank A. A. James for the gift of a genomic library of A. aegypti. We also thank Skip Vaught and others at the Sequencing Facility of the University of Arizona for their excellent service. This work was supported by NIH grant AI42121 to Z.T. and by a MacArthur Foundation grant to the Center for Insect Science of the University of Arizona.

LITERATURE CITED

ABAD, P., C. VAURY, A. PELISSON, M. C. CHABOISSIER, I. BUSSEAU, and A. BUCHETON. 1989. A long interspersed repetitive element—the I factor of Drosophila teissieri—is able to transpose in different Drosophila species. Proc. Natl. Acad. Sci. USA 86:8887–8891.
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. ZHANG, Z. ZHANG, W. MILLER, and D. J. LIPMAN. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
BESANSKY, N. J., J. A. BEDELL, and O. MUKABAYIRE.
1994. Q: a new retrotransposon from the mosquito Anopheles gambiae. Insect Mol. Biol. 3:49–56.
BESANSKY, N. J., S. M. PASKEWITZ, D. M. HAMM, and F. H. COLLINS. 1992. Distinct families of site-specific retrotransposons occupy identical positions in the rRNA genes of Anopheles gambiae. Mol. Cell. Biol. 12:5102–5110.
BLESA, D., and M. J. MARTINEZ-SEBASTIAN. 1997. bilbo, a non-LTR retrotransposon of Drosophila subobscura: a clue to the evolution of LINE-like elements in Drosophila. Mol. Biol. Evol. 14:1145–1153.
BLINOV, A. G., Y. V. SOBANOV, S. V. SCHERBIK, and K. G. AIMANOVA. 1997. The Chironomus (Camptochironomus) tentans genome contains two non-LTR retrotransposons. Genome 40:143–150.
BROOKFIELD, J. F. Y. 1995. Transposable elements as selfish DNA. Pp. 130–153 in D. J. SHERRATT, ed. Mobile genetic elements. Oxford University Press, Oxford, England.
BUCHETON, A., M. SIMONELIG, C. VAURY, and M. CROZATIER. 1986. Sequences similar to the I transposable element involved in I-R hybrid dysgenesis in D. melanogaster occur in other Drosophila species. Nature 322:650–652.
BURTIS, K. C., C. S. THUMMEL, C. W. JONES, F. D. KARIM, and D. S. HOGNESS. 1990. The Drosophila 74EF early puff contains E74, a complex ecdysone-inducible gene that encodes two ets-related proteins. Cell 61:85–99.
BUSSEAU, I., M.-C. CHABOISSIER, A. PELISSON, and A. BUCHETON. 1994. I factors in Drosophila melanogaster: transposition under control. Genetica 93:101–116.
DAVIS, P. S., and B. H. JUDD. 1995. Nucleotide sequence of the transposable element, BEL, of Drosophila melanogaster. Drosoph. Inf. Serv. 76:134–136.
DAWSON, A., E. HARTSWOOD, T. PATERSON, and D. J. FINNEGAN. 1997. A LINE-like transposable element in Drosophila, the I factor, encodes a protein with properties similar to those of retroviral nucleocapsids. EMBO J. 16:4448–4455.
DOMBRADI, V., J. M. AXTON, N. D. BREWIS, E. F. DA CRUZ E SILVA, L. ALPHEY, and P. T. COHEN. 1990.
Drosophila contains three genes that encode distinct isoforms of protein phosphatase 1. Eur. J. Biochem. 194:739–745.
DREW, A. C., and P. J. BRINDLEY. 1997. A retrotransposon of the non-long terminal repeat class from the human blood fluke Schistosoma mansoni. Similarities to the chicken-repeat-1-like elements of vertebrates. Mol. Biol. Evol. 14:602–610.
FAWCETT, D. H., C. K. LISTER, E. KELLETT, and D. J. FINNEGAN. 1986. Transposable elements controlling I-R hybrid dysgenesis in D. melanogaster are similar to mammalian LINEs. Cell 47:1007–1015.
FELSENSTEIN, J., and H. KISHINO. 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42:193–200.
FENG, Q., J. V. MORAN, H. H. KAZAZIAN JR., and J. D. BOEKE. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.
FINNEGAN, D. J. 1989. The I factor and I-R hybrid dysgenesis in Drosophila melanogaster. Pp. 503–517 in D. E. BERG and M. M. HOWE, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
FINNEGAN, D. J. 1992. Transposable elements. Curr. Opin. Genet. Dev. 2:861–867.
FINNEGAN, D. J. 1997. Transposable elements: how non-LTR retrotransposons do it. Curr. Biol. 7:R245–R248.
HOHJOH, H., and M. F. SINGER. 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 15:630–639.
HUGHES, M. A., A. M. WARREN, and J. M. CRAMPTON. 1996. JAM1: a novel LINE transposable element in the genome of the medically important mosquito, Aedes aegypti. Pp. 276 in Proceedings of the XXth International Congress of Entomology, Florence, Italy.
HUTCHISON, C. A., S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, and M. H. EDGELL. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eucaryotic genome. Pp. 593–617 in D. E. BERG and M. M. HOWE, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
JENSEN, S., M. P. GASSAMA, and T. HEIDMANN. 1999.
Taming of transposable elements by homology-dependent gene silencing. Nat. Genet. 21:209–212.
KIDWELL, M. G., and D. LISCH. 1997. Transposable elements as sources of variation in animals and plants. Proc. Natl. Acad. Sci. USA 94:7704–7711.
KNIGHT, M., A. MILLER, N. RAGHAVAN, C. RICHARDS, and F. LEWIS. 1992. Identification of a repetitive element in the snail Biomphalaria glabrata: relationship to the reverse transcriptase-encoding sequence in LINE-1 transposons. Gene 118:181–187.
LEVIN, H. L. 1997. It’s prime time for reverse transcriptase. Cell 88:5–8.
LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK, and T. H. EICKBUSH. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.
MCLEAN, C., A. BUCHETON, and D. J. FINNEGAN. 1993. The 5′ untranslated region of the I factor, a long interspersed nuclear element-like retrotransposon of Drosophila melanogaster, contains an internal promoter and sequences that regulate expression. Mol. Cell. Biol. 13:1042–1050.
MALIK, H. S., W. D. BURKE, and T. H. EICKBUSH. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793–805.
MINCHIOTTI, G., C. CONTURSI, and P. P. DI NOCERA. 1997. Multiple downstream promoter modules regulate the transcription of the Drosophila melanogaster I, Doc and F elements. J. Mol. Biol. 267:37–46.
MOUCHES, C., N. BENSAADI, and J. C. SALVADO. 1992. Characterization of a LINE retroposon dispersed in the genome of three non sibling Aedes mosquito species. Gene 120:183–190.
MURPHY, N. B., A. PAYS, P. TEBABI, H. COQUELET, M. GUYAUX, M. STEINERT, and E. PAYS. 1987. Trypanosoma brucei repeated element with unusual structural and transcriptional properties. J. Mol. Biol. 195:855–871.
OKAZAKI, S., H. ISHIKAWA, and H. FUJIWARA. 1995. Structural analysis of Tras1, a novel family of telomeric repeat-associated retrotransposons in the silkworm, Bombyx mori. Mol. Cell. Biol. 15:4545–4552.
PETROV, D. A., E. R. LOZOVSKAYA, and D. L. HARTL. 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384:346–349.
POHLMANN, R., and P. PHILIPPSEN. 1996. Sequencing a cosmid clone of Saccharomyces cerevisiae chromosome XIV reveals 12 new open reading frames (ORFs) and an ancient duplication of six ORFs. Yeast 12:391–402.
RAO, P. S., and K. S. RAI. 1987. Inter and intraspecific variation in nuclear DNA content in Aedes mosquitoes. Heredity 59:253–258.
SAMBROOK, J., E. F. FRITSCH, and T. MANIATIS. 1989. Molecular cloning: a laboratory manual. 2nd edition. Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
SELEME, M. D., I. BUSSEAU, S. MALINSKY, A. BUCHETON, and D. TENINGES. 1999. High-frequency retrotransposition of a marked I factor in Drosophila melanogaster correlates with a dynamic expression pattern of the ORF1 protein in the cytoplasm of oocytes. Genetics 151:761–771.
SIMONELIG, M., C. BAZIN, A. PELISSON, and A. BUCHETON. 1988. Transposable and nontransposable elements similar to the I factor involved in inducer-reactive (IR) hybrid dysgenesis in Drosophila melanogaster coexist in various Drosophila species. Proc. Natl. Acad. Sci. USA 85:1141–1145.
STEINEMANN, M., and S. STEINEMANN. 1991. Preferential Y chromosomal location of TRIM, a novel transposable element of Drosophila miranda, obscura group. Chromosoma 101:169–179.
SWOFFORD, D. L. 1998. PAUP*. Version 4.0 b1. (A commercial test version; completed version 4.0 to be distributed by Sinauer, Sunderland, Mass.)
TU, Z. 1997. Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci. USA 94:7475–7480.
TU, Z. 1999. Genomic and evolutionary analysis of Feilai, a diverse family of highly reiterated SINEs in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 16:760–772.
TU, Z., and H. H. HAGEDORN. 1997.
Biochemical, molecular, and phylogenetic analysis of pyruvate carboxylase in the yellow fever mosquito, Aedes aegypti. Insect Biochem. Mol. Biol. 27:133–147.
TU, Z., J. ISOE, and J. A. GUZOVA. 1998. Structural, genomic, and phylogenetic analysis of Lian, a novel family of non-LTR retrotransposons in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 15:837–853.
UDOMKIT, A., S. FORBES, C. MCLEAN, and D. J. FINNEGAN. 1996. Control of expression of the I factor, a LINE-like transposable element in Drosophila melanogaster. EMBO J. 15:3174–3181.
WILSON, R., R. AINSCOUGH, K. ANDERSON et al. (53 co-authors). 1994. 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368:32–38.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

PIERRE CAPY, reviewing editor
Accepted September 6, 1999