Yeast 2001; 18: 865–880. DOI: 10.1002/yea.733

Research Article

The diversity of retrotransposons in the yeast Cryptococcus neoformans

Timothy J. D. Goodwin* and Russell T. M. Poulter

Department of Biochemistry, University of Otago, Cumberland Street, Dunedin, New Zealand

* Correspondence to: T. J. D. Goodwin, Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand. E-mail: [email protected]

Received: 25 October 2000
Accepted: 30 January 2001

Abstract

We have undertaken an analysis of the retrotransposons in the medically important basidiomycetous fungus Cryptococcus neoformans. Using the data generated by a C. neoformans genome sequencing project at the Stanford Genome Technology Center, 15 distinct families of LTR retrotransposons and several families of non-LTR retrotransposons were identified. Members of at least seven families have transposed recently and are probably still active. For several families, only partial elements could be identified and these are quite diverse in sequence, suggesting that they are ancient components of the C. neoformans genome. Most C. neoformans elements are not closely related to previously identified fungal retrotransposons, suggesting that the diversity of fungal retrotransposons has been only sparsely sampled to date. C. neoformans has fewer distinct retrotransposon families than Candida albicans (37 or more), in particular fewer families represented solely by ancient and inactive elements, but it has considerably more families than either Saccharomyces cerevisiae (five) or Schizosaccharomyces pombe (two). The findings suggest that elimination of retrotransposons is faster in C. neoformans than in C. albicans, but perhaps not as rapid as in S. cerevisiae or Sz. pombe. The identification of the retrotransposons of C. neoformans should assist in the molecular characterization of this important pathogen, and also further our understanding of the role played by retroelements in genome evolution.
Copyright © 2001 John Wiley & Sons, Ltd.

Keywords: Cryptococcus neoformans; yeast; retrotransposon; genome

Introduction

Cryptococcus neoformans is a basidiomycetous yeast that is a leading cause of disease in AIDS patients and is also capable of causing disease in people with no known immune disorder (Kwon-Chung and Bennett, 1992). C. neoformans infections present as a meningoencephalitis, which is often fatal if left untreated. Infections are usually not curable. Patients who survive the initial disease undergo long-term suppressive anti-fungal therapy to reduce the likelihood of recurrent infections.

C. neoformans isolates can be classified into two varieties and four serotypes, C. neoformans var. neoformans (serotypes A and D) and C. neoformans var. gattii (serotypes B and C), on the basis of various biochemical, morphological and genetic distinctions (Kwon-Chung et al., 1982). The relationships among these groups are not well defined, however, and there can be extensive genetic variation among strains, even within a group (Casadevall et al., 1992; Franzot et al., 1998). This variation is of significance, as it may result in differences in virulence and response to anti-fungal therapy.

The mechanisms that generate the diversity among C. neoformans isolates are not well understood. C. neoformans is haploid as isolated in nature and has a well-characterized heterothallic mating system (Tanaka et al., 1996; Kwon-Chung, 1975). The overall population structure appears to be largely clonal, however (Chen et al., 1995; Franzot et al., 1997), suggesting that sexual reproduction is relatively infrequent in nature.

Transposable elements have had a profound influence on the evolution of eukaryote genomes (Kidwell and Lisch, 2000).
These elements often constitute a large proportion of the genome; for instance, they make up 40% or more of the human genome (Smit, 1999) and as much as 70% of some plant genomes (San Miguel and Bennetzen, 1998). Insertions of transposable elements near genes can lead to alterations in gene expression patterns, as the elements usually contain transcriptional regulatory sequences, while insertions within genes can directly alter gene structure. Recombination between elements at different sites can lead to large-scale chromosomal rearrangements. As a result of the potentially deleterious effects of transposable element proliferation, host organisms have often evolved mechanisms, such as specific methylation, that limit the activity of these elements. Similarly, because transposable elements are, in general, only rarely transmitted across species boundaries, their continued existence is usually dependent upon the continued survival of their hosts. As a result, the elements themselves often appear to have evolved mechanisms, such as directing their integration to specific parts of the genome, which keep the damage they cause to a minimum (Sandmeyer, 1998).

The transposable elements most commonly isolated from fungi to date are retrotransposons. These are mobile elements that replicate via an RNA intermediate (Boeke and Stoye, 1997). Autonomous examples encode their own reverse transcriptase and are typically 5–10 kb long. There are two broad classes: the LTR retrotransposons and the non-LTR retrotransposons (also known as LINE elements). LTR retrotransposons are characterized by long terminal repeats (LTRs) at each end. These are necessary for the replication cycle and also contain sequences that regulate transcription of the element. Non-LTR retrotransposons lack LTRs. They often contain an internal promoter and frequently terminate in poly-A tails.
The early identification and characterization of retrotransposons can assist in the analysis of new genomes in several ways. First, because these elements are often highly homogeneous in sequence and dispersed throughout the genome, they can potentially lead to serious difficulties in sequence assembly. An early estimate of the diversity and abundance of these elements may allow appropriate measures to be taken to avoid subsequent problems in sequence assembly. Second, these elements often display fairly relaxed targeting specificities, and transposition can often be induced to high levels in vivo. Such properties allow these elements to act as efficient, random, tagged, insertional mutagens, which can prove useful for understanding gene function, including, for example, the identification of virulence factors (Smith et al., 1995). Finally, because these elements are mobile and usually present as multiple copies, they can provide useful markers for strain differentiation and epidemiological analyses.

Because of the medical significance of C. neoformans, the genome of one strain is currently being sequenced at the Stanford Genome Technology Center (http://www-sequence.stanford.edu/group/C.neoformans/index.html). At the time of writing (July 2000), the Stanford C. neoformans sequence database consists of ~82 000 random genomic shotgun sequence reads, containing ~49 Mb of sequence. This represents a slightly greater than two-fold coverage of the 23 Mb haploid genome (Wickes et al., 1994). We have used this considerable resource to study the diversity of retrotransposons in C. neoformans. The strain being sequenced contains 15 or more distinct families of LTR retrotransposons and a number of non-LTR retrotransposon families. Several families appear to have amplified very recently and probably still contain active members.
Other families are present at relatively low copy numbers and are diverse in sequence, suggesting that they are ancient components of the C. neoformans genome. The findings should assist in the characterization of the C. neoformans genome, and also contribute to our understanding of the role played by retrotransposons in genome evolution.

Materials and methods

Sequence analyses

Sequence data from Cryptococcus neoformans strain JEC21 (Heitman et al., 1999) was obtained from the Stanford Genome Technology Center website (http://www-sequence.stanford.edu/group/C.neoformans/index.html). The bulk of the analyses reported here were performed on the sequence obtained up to the end of July 2000. The database at this time consisted of 82 330 random shotgun sequence reads, containing 48.7 Mb of sequence. The Stanford C. neoformans sequences were screened using the BLAST facilities provided on the website. General sequence analyses and manipulations were performed using programs of the GCG package (Devereux et al., 1984). Multiple sequence alignments were constructed with CLUSTAL_X (Thompson et al., 1997). Phylogenetic trees were constructed using PHYLIP (Felsenstein, 1989). Nucleotide diversity (Nei and Li, 1979) was calculated using Arlequin (Schneider et al., 2000).

For most LTR families, the phylogenetic trees and the nucleotide diversity calculations were based on alignments of all the sequences in the database that contained a full-length LTR, except that apparent duplicates were discarded. For the longest LTR families, few or no single sequences covered the entire LTR. For these elements, all the single sequences that covered a particular 500 bp segment of the LTR were used to construct trees and calculate nucleotide diversity. The nucleotide diversity results for the C. neoformans elements are likely to be slight overestimates, as they are taken from raw sequence data, which will contain some errors.
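As an illustration of the nucleotide diversity statistic referred to above, the following is a minimal sketch in Python. It is not the Arlequin implementation used in the study. It assumes that diversity can be approximated as the mean proportion of differing sites over all pairs of aligned sequences, that every sequence is weighted equally, and that sites gapped in either sequence of a pair are skipped. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption for this sketch: sites with a gap ('-') in either
    sequence are ignored rather than counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differences over all sequence pairs."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(aligned, 2))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (invented data): two identical reads and one with a
# single substitution in ten sites.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA"]
toy_pi = nucleotide_diversity(toy_alignment)
```

Under these assumptions, one minus the diversity value gives the average pairwise identity, which is a convenient way to read the values reported for each family.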
The relative abundance of each LTR family was estimated by counting the number of single sequences in the database that contained a greater than 70% match to a particular 100 bp fragment of the LTR in question. It should be emphasized that these are estimates of relative abundance, not absolute copy number. The structure of the C. neoformans methionine tRNA was studied using tRNAscan-SE (Lowe and Eddy, 1997). The sequences described in this paper are available from the authors’ website at URL http://bioc111.otago.ac.nz:8001/retrobase/home.htm.

Results

Identification and classification of retrotransposons in Cryptococcus neoformans

Retrotransposons in C. neoformans were identified in several ways. The first elements were detected by virtue of their sequence similarity to previously identified retrotransposons of other species. For instance, several elements were detected in a series of TBLASTN searches (protein query vs DNA database) of the Stanford C. neoformans sequence database, using the protein sequences of a wide variety of retrotransposons from other species as queries. Several other elements were subsequently identified by performing BLASTN and TBLASTN searches of the C. neoformans database, using the elements identified in the original searches as queries. The remaining elements were detected as insertions present within or adjacent to other retrotransposons.

The sequences identified at this point could easily be classified into distinct families, as they were either highly similar to each other (e.g. 95–100% identity in overlapping regions) or otherwise very different in sequence. For each distinct family, we attempted to assemble a representative full-length sequence. This was done by building up contigs, via series of BLAST searches to identify overlapping sequences, using the sequences identified in the original searches as starting points.
The extent of sequence identified for each LTR retrotransposon family, and the deduced structures of the elements, are shown in Figure 1. For six families we were able to assemble full-length representative sequences. The other families are at present represented by either: (a) LTRs with associated, but incomplete, internal regions; (b) solo LTRs with no associated internal regions; or (c) partial internal regions with no identified LTRs. The LTRs have been named LTR1–15. The corresponding retrotransposons, where they have been identified, have been named Tcn1–10. Partial internal regions, for which the corresponding LTR is not known, have been named RF1–10 (for retrotransposon fragment). The possibility that some of the partial elements will be found to belong to the same family, as more sequence becomes available, means that we cannot at present determine the exact number of distinct families. However, the greater than two-fold coverage of the genome in the database at the time of writing, and the fact that we have devoted considerable effort to identifying additional families without success, suggest that the sequences shown in Figure 1 represent the majority of distinct LTR retrotransposon families present in this C. neoformans strain.

The deduced structure of the most abundant non-LTR element is shown in Figure 2A. This element has been named Cnl1 (C. neoformans LINE-1), and appears to represent a structurally intact non-LTR retrotransposon. There are also several other families of non-LTR retrotransposons in C. neoformans (not shown). These all appear to be degenerate, low-copy-number relatives of Cnl1.

Figure 1. Structure and available sequence for each C. neoformans LTR retrotransposon family. (A) Ty3/gypsy elements. (B) Ty1/copia elements. (C) Solo LTRs. Boxes with triangles represent the LTRs. Shaded boxes represent the ORFs. Changes of reading frame are indicated by offset boxes. Stop codons within the ORFs are indicated by vertical lines. The approximate positions of the various domains within the ORFs are indicated above the first element of each group. The common scale is shown at the bottom.

Figure 2. Structure of the C. neoformans non-LTR retrotransposon, Cnl1. (A) Structure of a single Cnl1 element. The long box represents the single long ORF of the element. Areas of dark shading represent the zinc finger-like sequences. Areas of pale shading represent the conserved regions of reverse transcriptase and the restriction enzyme-like endonuclease (REL-Endo.). Thin black lines represent the 5′ and 3′ untranslated regions. (B) Examples of nested Cnl1 arrays. Thick lines with arrowheads represent the 3′ ends of Cnl1 elements. Shaded boxes represent the Cnl1 ORF. Thin lines represent unrelated sequences. These examples are from the sequences named at the right.

Sequence diversity within families

It is important to know how representative of each family the full-length assembled sequences are, or, phrased another way, what level of sequence diversity is present within each family? To tackle this problem for the LTR retrotransposons, we started by extracting from the database all the sequences containing a full-length LTR and aligning them according to their families. From the alignments phylogenetic trees were constructed (examples in Figure 3A) and levels of nucleotide diversity (π; Nei and Li, 1979) were calculated (Table 1). The phylogenetic trees allow the diversity of sequences within a family to be visualized directly. For several families we found that all the LTRs are nearly identical in sequence, for example, those of Tcn1. Some families, such as Tcn3, have LTRs that fall into distinct subfamilies, although the LTRs within each subfamily are very similar in sequence.
Other families, for example LTR12, contain a much broader range of sequences (Figure 3A). In general, we found that LTRs from families for which full-length sequences were easily assembled have the lowest levels of sequence variation, while the most diverse families are those represented only by solo LTRs. The nucleotide diversity calculations (Table 1) allow the diversity of sequences within a family to be quantitated. Again, LTRs from families with full-length elements generally have low levels of diversity, e.g. the Tcn2 LTR, for which π = 0.022. This equates to an average level of sequence identity of ~98%. The highest levels of diversity were again found in solo LTR families, e.g. LTR12, for which π = 0.212 (corresponding to an average level of sequence identity of ~79%).

Figure 3. Distance trees of full-length LTR sequences (A) and RT sequences (B). (A) All the sequences in the database containing full-length LTRs of a particular family were extracted and aligned. Phylogenetic trees were then constructed by the UPGMA method using PHYLIP. (B) All the sequences in the database containing a particular 400 bp region of the reverse transcriptase gene from four closely related families were extracted and aligned. Trees were again constructed by the UPGMA method using PHYLIP. The trees in A and B are all shown at the same horizontal scale. The distance is Kimura’s (1980) two-parameter distance.

Table 1. Properties of C. neoformans retrotransposon LTR families

LTR           Length (bp)   Relative abundance(a)   Nucleotide diversity   TSD (bp)   Group
1 (Tcn1)      501           53                      0.056                  5          Ty3/gypsy
2 (Tcn2)      942           70                      0.022                  5          Ty3/gypsy
3 (Tcn3)      505           61                      0.113(b)               5          Ty3/gypsy
4 (Tcn4)      433           33                      0.024                  5          Ty3/gypsy
5 (Tcn5)      589           44                      0.011                  ND         Ty3/gypsy
6 (Tcn6)      271           17                      0.123                  5          Ty1/copia
7 (Tcn7)      174           16                      0.184                  5          Ty1/copia
8 (Tcn8)      255           6                       0.107                  5          Ty1/copia
9 (Tcn9)      181           17                      0.135                  5          Ty1/copia
10 (Tcn10)    528           18                      0.004                  4          Ty3/gypsy
11            616           19                      0.110                  5          ND
12            154           4                       0.212                  5          ND
13            393           5                       0.259                  5          ND
14            448           11                      0.065                  5          ND
15            515           10                      0.034                  4          Ty3/gypsy

ND, not determined.
(a) Determined as described in Materials and methods.
(b) This is actually the nucleotide diversity of two closely related, but distinct, subfamilies (Figure 3A).

We then examined the diversity of sequences within the internal regions of some full-length elements. All the sequences in the database containing particular 400 bp segments of either the Gag or reverse transcriptase (RT) coding regions from four related families (Tcn2, 3, 4, and 5; see Phylogenetic analyses, below) were extracted and aligned. The alignments were again used to construct phylogenetic trees and calculate nucleotide diversity. Similar to the results with the LTRs, we found that these families were very homogeneous in their internal regions. Nucleotide diversity levels in the different families ranged from 0.017 to 0.029 for the RT sequences (corresponding to ~97–98% average identity), and from 0.011 to 0.050 (95–99% average identity) for the gag sequences. In the RT region, all but four out of 307 differences were simple base substitutions, and 77% of these base substitutions were in the third-base position, and thus may not alter the encoded amino acid sequence. In the gag regions nearly all differences were again simple base substitutions, but the frequency of those in third-base positions dropped to 60%.

Within the highly conserved RT region, the nucleotide sequences from these four families were sufficiently similar to permit all the sequences to be placed in a single alignment. This allowed the level of sequence diversity within families to be compared to the level of distinction between families. A tree generated from the alignment is shown in Figure 3B.
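The trees in Figure 3 are drawn using Kimura’s (1980) two-parameter distance. As an aid to reading them, the sketch below shows the standard form of that distance for a pair of aligned nucleotide sequences, d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the observed proportions of transitions and transversions. It is an illustration only, not the PHYLIP code used in the study. The function name is hypothetical, and the choice to skip any site that is not A, C, G or T in both sequences is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        # Assumption: gaps and ambiguity codes are simply skipped.
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    return -0.5 * math.log(inner)


# Toy example (invented data): one transition (A->G) in ten sites.
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
# Toy example (invented data): one transversion (A->C) in ten sites.
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
```

For closely similar sequences, such as members of a single family, the corrected distance is only slightly larger than the raw proportion of differing sites.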
It is evident that the diversity of sequences within a family is small compared to the level of distinction between families, even for these closely related families.

A similar analysis was undertaken for a 400 bp segment of the RT gene of the non-LTR retrotransposon Cnl1. The level of nucleotide diversity over this region for Cnl1 was 0.028. All but two from a total of 336 differences were base substitutions. Of these, 86% were third-base substitutions.

Overall, these results show that the level of sequence diversity within single C. neoformans retrotransposon families differs considerably in different families. Families for which full-length sequences were easily assembled display low levels of sequence diversity. This suggests that representative full-length assembled sequences are likely to be reliable indicators of the overall sequence and structure of these retrotransposons. Families for which just partial elements, such as solo LTRs, are present, generally display higher levels of diversity. It is likely that the level of sequence diversity within a related group of retrotransposons reflects the amount of time since the group amplified in the genome, as older elements have had more time to accumulate individual differences than younger ones. The finding that the most diverse C. neoformans retrotransposon families also appear to have no intact examples remaining is consistent with this idea.

Phylogenetic analyses

The relationships between the C. neoformans elements and previously identified retrotransposons were analysed by constructing phylogenetic trees based on multiple sequence alignments of the various Pol domains. For the LTR retrotransposons, we began by constructing trees based on the seven universally conserved domains of reverse transcriptase (RT) described by Xiong and Eickbush (1990). We included all the C.
neoformans elements for which sequence from all seven RT domains was available, and a wide range of elements from other species, including representatives from all four major groups of LTR retrotransposons. An example of the trees we obtained is shown in Figure 4A. Alignments of highly conserved RT regions are shown in Figure 5. The C. neoformans elements were all found to be members of either the Ty3/gypsy or Ty1/copia groups. The full-length element Tcn1 and the partial element RF3 were found to be members of a sub-group of Ty3/gypsy elements that includes many fungal elements, such as Tf1 from Schizosaccharomyces pombe and Maggy from Magnaporthe grisea, as well as vertebrate elements such as sushi from the puffer fish Fugu rubripes. Tcn1 and RF3 do not, however, appear to be closely related to each other (Figure 4A). The other C. neoformans Ty3/gypsy elements for which phylogenetic analysis was possible (Tcn2, 3, 4 and 5) form a monophyletic group of closely related elements that emerges from near the base of the Ty3/gypsy group and which is not closely related to any previously identified elements. The element Tcn10, for which very little internal sequence was detected, was also assigned to the Ty3/gypsy group on the basis of structural features (see below). These same structural features, however, suggest that Tcn10 is not closely related to the other C. neoformans Ty3/gypsy elements. The remaining partial Ty3/gypsy elements, RF2, RF4 and RF10, on the basis of sequence similarities, all appear to belong to the Tf1/sushi subgroup, like Tcn1 and RF3.

Phylogenetic analyses were possible for three C. neoformans Ty1/copia elements, Tcn6, Tcn9, and RF5. These elements were found to be more closely related to retrotransposons from plants and animals than to previously identified fungal elements. All known fungal Ty1/copia elements, such as those from S. cerevisiae and C. albicans, branch off from other elements of this group quite early in the tree. The C.
neoformans elements, in contrast, are fairly closely related to elements such as Tnt1 from tobacco and copia from Drosophila (Figure 4A). Tcn6 and Tcn9 are closely related to each other, while RF5 represents a distinct lineage. All of the partial Ty1/copia group elements are closely related to one of these two lineages, the majority to the Tcn6/Tcn9 lineage.

Much recent effort has been directed towards understanding the distribution, diversity, and evolution of Ty3/gypsy group elements in particular, partly because elements very similar to these probably gave rise to modern-day retroviruses. We also conducted some more detailed phylogenetic analyses of the C. neoformans Ty3/gypsy elements. Trees were constructed based on each of the RT, RNase H and integrase domains, and on all three of these domains combined. A wide variety of elements was included, designed to represent the recognized diversity of the Ty3/gypsy group. A representative tree derived from the combined dataset is shown in Figure 4B. This tree again places Tcn1 as a member of the Tf1/sushi subgroup. Its closest relatives are MarY1, from another basidiomycete, and the vertebrate sushi elements. A range of elements from ascomycetous fungi make up the remainder of the Tf1/sushi group. A group of plant retrotransposons, represented here by dea1 from pineapple and an element from tomato, form a sister group to the Tf1/sushi elements. These two groups, together, have been termed the ‘chromoviruses’ by Marin and Llorens (2000), because most contain a chromodomain-like module at the C-terminal end of their integrase (Malik and Eickbush, 1999). Loosely associated with this chromovirus group are Ty3 from S. cerevisiae, skipper from Dictyostelium discoideum and Tcn2, 3, 4 and 5. The association of Ty3, skipper and Tcn2–5 with the chromoviruses in this tree is of interest, as it is not apparent from the tree based solely on RT sequences.
The difference is largely the result of similarities in the INT domains of these elements and those of the chromoviruses. Marin and Llorens (2000) have previously noted this discrepancy for Ty3 and skipper. They proposed that it may result from differing rates of evolution or reflect a recombinant origin for these elements.

The relationship of the non-LTR retrotransposon Cnl1 to other non-LTR elements was examined by constructing trees based on an alignment of the regions encompassing the 11 highly conserved blocks of non-LTR retrotransposon RT identified by Malik et al. (1999). An example of the trees obtained is shown in Figure 4C. Cnl1 was found to group with the elements CRE1 and SLACS from the trypanosomes Crithidia fasciculata and Trypanosoma brucei, respectively. This grouping received 95% bootstrap support. The distances between Cnl1 and the trypanosome elements in the tree (Figure 4C), and sequence comparisons (Figure 5C), however, show that the elements are not particularly closely related.

Figure 4. Relationships of C. neoformans elements to other retrotransposons. (A) Phylogenetic tree of LTR-retrotransposons based on the seven universally conserved domains of reverse transcriptase. The four major groups of LTR-retrotransposons are indicated. The tree shown is a distance tree constructed by the UPGMA method with PHYLIP (Felsenstein, 1989). (B) Phylogenetic tree of Ty3/gypsy elements based on RT, RNase H and INT sequences. This tree was constructed by the neighbour-joining method using PHYLIP. (C) Phylogenetic tree of non-LTR elements based on the regions encompassing the 11 conserved blocks of non-LTR retrotransposon reverse transcriptase. This tree was generated by the UPGMA method using PHYLIP. In panels B and C the percentages of bootstrap support for branches receiving >50% support are shown.

Structure of the C. neoformans retrotransposons

The deduced structures of all the LTR retrotransposons are presented in Figure 1. The structure of Tcn1 is consistent with its classification with the Tf1/sushi elements. Like most elements of this group, Tcn1 encodes its Gag and Pol proteins in separate open reading frames (ORFs), with the pol ORF being in the −1 phase with respect to gag. Tcn1 also has a chromodomain-like module near the C-terminal end of its integrase (including the most conserved residues, KWxGYx4NSWEP; Malik and Eickbush, 1999). It has the characteristic self-priming structure found in all Tf1/sushi elements (Lin and Levin, 1997a,1997b; Figure 6A). The Tcn1 self-priming structure appears to be particularly robust and actually contains more paired residues (80 vs. 54) than the type element of the group, Tf1. The relative ease with which a full-length and apparently intact Tcn1 sequence was assembled suggests that there are still intact elements in the genome.

Figure 5. Alignments of conserved reverse transcriptase domains of some C. neoformans retrotransposons and related elements of other species. (A) Ty3/gypsy elements (gypsy, Ty3, Tcn1, RF3, Tf1, Tcn2, Tcn5, Tcn3, Tcn4). (B) Ty1/copia elements (Ty5, RF5, copia, Tcn6, Tcn9, Ty1). (C) Non-LTR retrotransposons (Jockey, L1Hs, SLACS, Cnl1, CRE1). The highly conserved pairs of aspartate residues are indicated by asterisks.

Figure 6. Primer-binding sites of C. neoformans Ty3/gypsy elements. (A) The Tf1/sushi-group self-priming structure of Tcn1. (B) Similarity of the PBSs of Tcn2, 3, 4 and 5 to that of skipper from D. discoideum. (C) Complementarity between the Tcn10 PBS and the 3′ end of a C. neoformans methionine tRNA. In each case, the last 10 nucleotides of the upstream LTR are underlined.

Tcn2, 3, 4, and 5 are very similar to each other in structure (Figure 1A), as might be expected from their close relationships (Figure 4A, B). They have, however, diverged sufficiently that their LTRs are now completely different in sequence. Again, full-length and apparently intact sequences for each of these elements were relatively easy to assemble, suggesting that intact elements may be fairly common. An interesting structural feature that these elements all share is a very unusual minus-strand primer-binding site (PBS).
Most LTR retrotransposons either use a cellular tRNA to prime minus-strand DNA synthesis, in which case they have a PBS complementary to part of a tRNA molecule, or, if they belong to the Tf1/sushi group, have a characteristic self-priming structure similar to that of Tcn1. The PBSs of Tcn2–5 have no obvious complementarity to tRNAs, nor can they form Tf1/sushi-like self-priming structures. Instead, their PBSs are very similar to that of the slime-mould element skipper (Leng et al., 1998), mentioned above (Figure 6B). Skipper is known to have an unusual PBS, but the actual mechanism by which it primes minus-strand DNA synthesis is not known. The similarities between the PBSs of skipper and Tcn2–5 suggest that these elements share a similar method of priming.

While little can be learned about the overall structure of the final Ty3/gypsy element, Tcn10, owing to a lack of sequence data, this element also seems to have an unusual PBS. Most LTR elements that use the 3′ end of a tRNA as a primer have PBSs that are 18 nucleotides long. This represents the distance to the first modified base in the tRNA molecule. The PBS of Tcn10, however, is perfectly complementary to 49 bp at the 3′ end of a C. neoformans methionine tRNA (Figure 6C). This is easily the longest PBS that we know of. There is a 2 bp gap between the end of the Tcn10 LTR and its PBS. Such a gap is more commonly found in Ty3/gypsy elements than in members of the Ty1/copia group. It might be argued that the putative Tcn10 PBS is not a PBS at all, but simply a tRNA gene next to which the Tcn10 element has inserted.
We believe that it is a genuine PBS, however, for several reasons: (a) it begins with the sequence TGG, which is complementary to the CCA trinucleotide that is added post-transcriptionally to tRNA molecules, but which is not generally present in tRNA genes; (b) the corresponding tRNA gene has an intron that is not present in the putative PBS; and (c) only the 3′-most 49 bp of the tRNA are present, not an entire tRNA gene. A plausible alternative explanation, however, is that this sequence represents an aberrant reverse transcript that has become inserted in the genome. Additional sequence will be required to distinguish between these possibilities.

The one Ty1/copia element for which the entire sequence is available, Tcn6, has a single ORF, as do several of the elements to which it is most closely related, such as copia and Tnt1. It has a PBS complementary to an internal portion of the initiator methionine tRNA, a feature that is also commonly found in related elements. The Tcn6 sequence was relatively easily assembled, suggesting that there may still be intact elements. For many of the other Ty1/copia families, however, only short sequences are available, and these are sometimes corrupted. It is likely that few, if any, intact examples of these families remain, at least in this strain.

The structure of Cnl1 (Figure 2A) is broadly similar to the structures of CRE1, SLACS and other early-branching non-LTR retrotransposons (see Malik et al., 1999, for a summary of the structures of the various groups of non-LTR elements). It contains a single ORF encoding two N-terminal zinc finger-like motifs (Cx2Cx9Hx4H and Cx2Cx12Hx4C), a reverse transcriptase and a C-terminal zinc finger (Cx2Cx7Hx3C) followed closely by a sequence (RHNx19IEPx7RNDx11TDYx23Kx3F) resembling the restriction enzyme-like endonucleases of other early-branching elements (Yang et al., 1999). Downstream of the ORF is a short untranslated region.
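The spacing shorthand used for the Cnl1 motifs can be read as a pattern: Cx2Cx9Hx4H, for example, denotes a cysteine, two residues of any kind, a cysteine, nine residues, a histidine, four residues and a histidine. The Python sketch below shows one way such shorthand could be turned into a regular expression and used to scan a protein sequence. The helper names `motif_to_regex` and `find_motif` and the test peptide are hypothetical illustrations, and this is not the procedure by which the motifs in this work were identified.

```python
import re

def motif_to_regex(motif):
    """Translate shorthand such as 'Cx2Cx9Hx4H' into a regular expression.

    A capital letter is a literal residue; 'xN' stands for N residues of any kind.
    """
    parts = []
    for residue, gap in re.findall(r"([A-Z])|x(\d+)", motif):
        parts.append(residue if residue else "[A-Z]{" + gap + "}")
    return "".join(parts)

def find_motif(motif, protein):
    """Return the 0-based start of the first match in the protein, or -1 if absent."""
    match = re.search(motif_to_regex(motif), protein.upper())
    return match.start() if match else -1

# Hypothetical peptide built so that it contains one Cx2Cx9Hx4H motif.
peptide = "MK" + "C" + "AA" + "C" + "A" * 9 + "H" + "A" * 4 + "H" + "KK"
```

In this toy case `find_motif("Cx2Cx9Hx4H", peptide)` locates the motif at the third residue, while a search for Cx2Cx7Hx3C finds nothing. A real scan would be run on the translated ORF rather than on an invented peptide.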
The element most commonly terminates in the sequence 5′-TAACCC-3′, rather than with a poly-A tail.

Cnl1 differs from other early-branching non-LTR elements in its integration site preferences. Other early-branching elements, such as CRE1 and SLACS from trypanosomes, R4 and NeSL from nematodes and R2 from insects, display a high degree of sequence specificity in their integration, usually for specific sequences within either rRNA genes or spliced-leader exons. In contrast, Cnl1 seems to insert preferentially within pre-existing copies of itself: of a random sample of 52 3′ ends of Cnl1 elements analysed, 37 (71%) were inserted within another Cnl1 element. In every case, the new element was inserted in the same orientation as its target, although no strict sequence specificity was detected. The integration patterns of Cnl1 elements result in the formation of arrays of tandemly integrated elements (examples in Figure 2B). Interestingly, the Cnl1 elements can frequently be found closely associated with tandem repeats of the sequence 5′-TAACCCCC-3′ (and slight variations thereof). This sequence is thought to comprise the C. neoformans telomeres (Edman, 1992).

Comparisons of the LTRs

Analysis of LTRs can be of particular use in studying LTR retrotransposon evolution, as they are usually the most rapidly evolving part of the element and they are often much more abundant than the internal regions. Several properties of each of the 15 different C. neoformans LTR families are listed in Table 1. As mentioned previously, there is considerable variation in the levels of nucleotide diversity displayed by different families. Families with full-length and apparently intact elements generally have very low levels of diversity, suggesting that they have amplified very recently.
The families with the highest levels of diversity appear to have only solo LTRs remaining, consistent with these being older elements. From rough estimates of the relative abundance of the LTRs of each family, it is apparent that the most abundant LTRs are those of the full-length elements, while LTRs from the partial elements are less common. This also suggests that the full-length elements have undergone a more recent amplification.

The majority of the full-length solo LTRs were found flanked by short direct repeats. These are target-site duplications (TSDs), which are formed during the integration reaction. For most families the associated TSDs are 5 bp in length (Table 1). Analyses of the perfect 5 bp TSDs of retrotransposons from S. cerevisiae and C. albicans revealed a strong preference for A or T residues in the three internal positions of the TSDs (Kim et al., 1998; Goodwin and Poulter, 2000). A similar bias was observed in the perfect 5 bp TSDs of the C. neoformans elements. The percentages of A or T residues at the five positions of the TSD (from a total of 43 LTRs) are: 1, 42%; 2, 65%; 3, 65%; 4, 80%; and 5, 40%.

Two of the C. neoformans LTRs, LTRs 10 and 15, have 4 bp TSDs. Such TSDs are common for Ty3/gypsy elements but, to the best of our knowledge, have not been found for Ty1/copia elements. We therefore suspect that these LTRs are from Ty3/gypsy-group elements. For these LTRs, a very strong bias towards inserting at RCGY sequences (where R=purine and Y=pyrimidine) was observed.

A significant number of the C. neoformans solo LTRs are not flanked by sequences resembling TSDs. The loss of TSDs from these elements is probably the result of recombination between LTRs at different sites in the genome. The presence of such elements suggests that some chromosomal rearrangements in C. neoformans may occur at sites of transposable element insertions.
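The positional A/T bias in the 5 bp TSDs and the RCGY preference of the 4 bp TSDs are both simple tallies over short sequences, and can be illustrated with a brief Python sketch. Everything named here is an assumption made for illustration: the functions `at_percent_by_position` and `matches_rcgy` are hypothetical, and the four example TSDs are invented, not the 43 sequences analysed in this study.

```python
import re

def at_percent_by_position(tsds):
    """Percentage of A or T residues at each position across equal-length TSDs."""
    if not tsds:
        raise ValueError("need at least one TSD")
    length = len(tsds[0])
    if any(len(t) != length for t in tsds):
        raise ValueError("all TSDs must have the same length")
    return [
        100.0 * sum(t[i].upper() in "AT" for t in tsds) / len(tsds)
        for i in range(length)
    ]

# R = purine (A or G), Y = pyrimidine (C or T).
RCGY = re.compile(r"[AG]CG[CT]")

def matches_rcgy(site):
    """True if a 4 bp target site has the form RCGY."""
    return RCGY.fullmatch(site.upper()) is not None

# Invented 5 bp TSDs, chosen only to show an A/T-rich core.
example_tsds = ["GATAC", "CTTTG", "AAATC", "GCAAT"]
```

Applied to the invented set, `at_percent_by_position(example_tsds)` gives a profile that is low at the two outer positions and high at the three inner ones, which is the shape of bias reported above for the real TSDs.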
Discussion

Here we have described the identification and initial characterization of 15 or more families of LTR retrotransposons and several families of non-LTR retrotransposons from the medically important yeast Cryptococcus neoformans. The description of these repetitive elements should assist in the assembly of the C. neoformans genome sequence data and, moreover, may allow the development of an efficient in vivo random tagged mutagenesis system. The results also further our understanding of the impact of transposable elements on genome evolution.

Full-length sequences were readily assembled for six of the C. neoformans LTR retrotransposon families. Five of these are from Ty3/gypsy-group elements and one is from a Ty1/copia element. The sequences represent structurally intact elements, raising the possibility that these elements are still active. This possibility is supported by an evident high degree of sequence homogeneity in these families, which suggests very recent transposition events and, for the Ty3/gypsy elements in particular, relatively high copy numbers. The elements for which only partial sequences could be identified generally appear to be present at lower copy numbers and to be more diverse in sequence. These families probably amplified some time ago and have since accumulated mutations and diverged in sequence. There may be few, if any, intact elements remaining for these families.

An apparently full-length sequence was assembled for one non-LTR retrotransposon family, Cnl1. The sequence represents a structurally intact element, containing all the expected motifs of a functional element. The elements of this family are highly homogeneous in sequence, suggesting that they have recently transposed and may still be active. The only other non-LTR retrotransposon families identified appear to be low copy number and degenerate relatives of Cnl1. Few, if any, of these elements are likely to be active.

C.
neoformans is the first basidiomycete for which a large amount of sequence data is available. At present, the best-characterized fungal genomes are all from ascomycetes. We found that few of the C. neoformans retrotransposons are closely related to previously identified fungal elements. For instance, the several Ty1/copia-group elements are more closely related to elements from plants and animals than to other known fungal elements. Similarly, the Ty3/gypsy-group elements Tcn2, 3, 4 and 5 are not close relatives of any previously identified retrotransposons. These findings suggest that the diversity of fungal retrotransposons has only been sparsely sampled to date. They also suggest that it will be necessary to characterize a wide variety of the most basal eukaryotic lineages in order to understand the full diversity of LTR retrotransposons.

All the Ty3/gypsy elements of C. neoformans have remarkable primer-binding sites. Tcn1 has a self-priming structure resembling those of other members of the Tf1/sushi group, but with considerably more paired residues than any other element. Tcn10 has a PBS complementary to a cellular tRNA, but one that is much longer than any other similar PBS. Tcn2, 3, 4 and 5 have PBSs consisting of single cytosine residues followed by poly-thymidine tracts, the like of which have been seen only once before, in the element skipper from the slime mould Dictyostelium discoideum. These remarkable PBSs may once again reflect an as yet sparse sampling of the diversity of fungal retrotransposons. Alternatively, they may indicate that exceptional PBSs are necessary for successful transposition in C. neoformans. For instance, because C.
neoformans is a pathogen of warm-blooded animals, its retrotransposons will frequently be exposed to higher temperatures than other characterized retrotransposons, the majority of which have been found in plants, plant-pathogenic or saprophytic fungi or cold-blooded animals. The higher temperatures to which the C. neoformans elements are exposed may mean that they require either greater stability in their priming structures or novel priming mechanisms in order to transpose successfully.

The mechanism by which Tcn2–5 and skipper prime minus-strand DNA synthesis is not known at present. Leng et al. (1998) speculated that the poly-A sequence at the end of the skipper mRNA might anneal to the poly-T tract in the PBS and subsequently be used as a minus-strand primer. The fact that this poly-T tract is highly conserved in the C. neoformans elements is consistent with this proposal. The finding of this novel priming system in what are probably active yeast retrotransposons should assist in its molecular characterization.

Tcn2, 3, 4 and 5 do not appear to be closely related to skipper on the basis of their RT sequences, despite sharing the unusual priming system. They also have quite different structures: the four C. neoformans elements all encode their Gag and Pol proteins in a single ORF, whereas skipper has a conserved termination codon between gag and pro, and a +1 phase-shift between pro and the rest of pol. However, when three conserved Pol domains are used to build phylogenetic trees, skipper and the C. neoformans elements group more closely together, largely because of similarities in their INT domains. It seems likely that at least some of these elements have a recombinant origin.

Members of the C. neoformans non-LTR retrotransposon family Cnl1 are most commonly found inserted within pre-existing Cnl1 elements and, in these cases, they are always in the same orientation as their targets. The Cnl1 elements thus form polar arrays of nested retrotransposons.
Cnl1 elements are also frequently associated with tandem repeats of a sequence thought to comprise the C. neoformans telomeres. The putative telomeric repeats are usually located at the 5′-most ends of the Cnl1 elements. This suggests that the Cnl1 arrays are located close to the ends of chromosomes, with the 3′ ends of the elements closest to the centromere. The structures produced by the Cnl1 elements resemble Drosophila telomeres, which consist of polar arrays of nested non-LTR retrotransposons (Pardue et al., 1996). The C. neoformans telomeres may be similar to an evolutionary intermediate between typical telomerase-maintained telomeres and retrotransposon telomeres, and may make a useful model for studying the transition between these two systems.

The number of distinct LTR retrotransposon families varies considerably in different fungal genomes (Table 2). Why should this be so? The data available at present suggest that at least some of the variation is due to differences in the rates at which elements are lost from the genome. For instance, nearly all the retrotransposon families in both S. cerevisiae (Kim et al., 1998) and Sz. pombe (Levin and Boeke, 1992; Hoff et al., 1998) are still active, and there are no families represented just by solo LTRs, suggesting that elimination of non-functional elements is rapid in these species. In contrast, C. albicans has a large number of distinct families (Goodwin and Poulter, 2000). Many of these, such as those represented solely by heterogeneous groups of isolated LTRs, are inactive and unlikely to have transposed in a long time. Turnover of elements in C. albicans is therefore likely to be very slow compared to S. cerevisiae or Sz. pombe.

Table 2. Numbers of LTR families in well-characterized fungal genomes

Species          Number of LTR families(a)   Solo LTR families(b)   Haploid genome size (Mb)   Phylum
C. neoformans    15                          5                      23                         Basidiomycota
S. cerevisiae    4                           0                      12                         Ascomycota
Sz. pombe        1                           0                      14                         Ascomycota
C. albicans      34                          18                     16                         Ascomycota

(a) The Ty1 and Ty2 LTRs of S. cerevisiae, and the Tf1 and Tf2 LTRs of Sz. pombe, would each make up single families under the classification scheme used for C. albicans and C. neoformans.
(b) LTR families for which no associated internal regions have been identified.

The number of distinct LTR retrotransposon families in C. neoformans is intermediate between S. cerevisiae/Sz. pombe and C. albicans (Table 2). In addition, it has several families for which only isolated LTRs have been identified to date, but far fewer such families than C. albicans. These findings suggest that elimination of retrotransposons in C. neoformans is faster than in C. albicans but not as rapid as in S. cerevisiae or Sz. pombe.

Assuming that the rate at which elements are lost from the genome makes a major contribution to the diversity of elements in a species, there is still the problem of why this rate should differ between species. It is possible that differing modes of reproduction in the host will affect the rate at which transposable elements are eliminated. We previously speculated that the apparent lack of sexual reproduction in C. albicans may have contributed to its relative abundance of LTR elements, as, in the absence of meiotic recombination and out-crossing, even selectively disadvantageous insertions would be difficult to remove efficiently from the population (Goodwin and Poulter, 2000). To fully assess this proposal, it will be necessary to have not only a detailed understanding of the diversity of transposable elements within a wide variety of species but also a detailed understanding of the population structure of each species. Several studies suggest that the population structure of C. albicans is predominantly clonal, as expected given that C. albicans has long been considered asexual (Pujol et al., 1993; Graser et al., 1996).
Population studies of C. neoformans show that, like C. albicans, this organism has a predominantly clonal population structure (Chen et al., 1995; Franzot et al., 1997), despite having a well-characterized sexual cycle. The structures of wild S. cerevisiae and Sz. pombe populations have not been described to date, although one might predict, given the ease with which laboratory strains are crossed, that they would undergo more recombination than C. albicans and perhaps more than C. neoformans.

Other features of host genomes could also lead to differing rates of elimination of transposable elements. For instance, the S. cerevisiae genome appears to be particularly compact, as attested to by its small intergenic regions, its rare introns and its small size (Dujon, 1996). The same processes that produce this streamlined genome may bring about the rapid elimination of retrotransposons. The Sz. pombe genome may be similarly compressed, as it is also very small (Fan et al., 1989) and has few transposable elements. C. neoformans is a unicellular organism like S. cerevisiae and Sz. pombe and has a life cycle of similar complexity. The fact that its genome is nearly twice as large as those of S. cerevisiae or Sz. pombe suggests that it has not been under the same pressure to shed nonessential DNA. This may also account for its greater diversity of retrotransposons.

The overall diversity of transposable elements in different species is probably the result of many complex interactions between the elements and their hosts. As more genomes are characterized, it should be possible to determine the most important factors controlling transposable element diversity and evolution.

Acknowledgements

We thank Richard Hyman and the other members of the C. neoformans sequencing project at the Stanford Genome Technology Center and Nagasaki University for permission to use their unpublished data. Sequencing of C.
neoformans at Stanford was accomplished with the support of the NIH.

References

Boeke JD, Stoye JP. 1997. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. In Retroviruses, Coffin JM, Hughes SH, Varmus HE (eds). Cold Spring Harbor Laboratory Press: New York; 343–435.
Casadevall A, Freundlich LF, Marsh L, Scharff MD. 1992. Extensive allelic variation in Cryptococcus neoformans. J Clin Microbiol 30: 1080–1084.
Chen F, Currie BP, Chen LC, Spitzer SG, Spitzer ED, Casadevall A. 1995. Genetic relatedness of Cryptococcus neoformans clinical isolates grouped with the repetitive DNA probe CNRE-1. J Clin Microbiol 33: 2818–2822.
Devereux J, Haeberli P, Smithies O. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12: 387–395.
Dujon B. 1996. The yeast genome project: what did we learn? Trends Genet 12: 263–270.
Edman JC. 1992. Isolation of telomere-like sequences from Cryptococcus neoformans and their use in high-efficiency transformation. Mol Cell Biol 12: 2777–2783.
Fan JB, Chikashige Y, Smith CL, Niwa O, Yanagida M, Cantor CR. 1989. Construction of a Not I restriction map of the fission yeast Schizosaccharomyces pombe genome. Nucleic Acids Res 17: 2801–2818.
Felsenstein J. 1989. PHYLIP - phylogeny inference package (version 3.2). Cladistics 5: 164–166.
Franzot SP, Hamdan JS, Currie BP, Casadevall A. 1997. Molecular epidemiology of Cryptococcus neoformans in Brazil and the United States: evidence for both local genetic differences and a global clonal population structure. J Clin Microbiol 35: 2243–2251.
Franzot SP, Fries BC, Cleare W, Casadevall A. 1998. Genetic relationship between Cryptococcus neoformans var. neoformans strains of serotypes A and D. J Clin Microbiol 36: 2200–2204.
Goodwin TJD, Poulter RTM. 2000. Multiple LTR-retrotransposon families in the asexual yeast Candida albicans. Genome Res 10: 174–191.
Graser Y, Volovsek M, Arrington J, et al. 1996. Molecular markers reveal that population structure of the human pathogen Candida albicans exhibits both clonality and recombination. Proc Natl Acad Sci U S A 93: 12473–12477.
Heitman J, Allen B, Alspaugh JA, Kwon-Chung KJ. 1999. On the origins of congenic MATα and MATa strains of the pathogenic yeast Cryptococcus neoformans. Fungal Genet Biol 28: 1–5.
Hoff EF, Levin HL, Boeke JD. 1998. Schizosaccharomyces pombe retrotransposon Tf2 mobilizes primarily through homologous cDNA recombination. Mol Cell Biol 18: 6839–6852.
Kidwell MG, Lisch DR. 2000. Transposable elements and host genome evolution. Trends Ecol Evol 15: 95–99.
Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res 8: 464–478.
Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16: 111–120.
Kwon-Chung KJ. 1975. A new genus, Filobasidiella, the perfect state of Cryptococcus neoformans. Mycologia 67: 1197–1200.
Kwon-Chung KJ, Bennett JE, Rhodes JC. 1982. Taxonomic studies on Filobasidiella species and their anamorphs. Antonie Leeuwenhoek 48: 25–38.
Kwon-Chung KJ, Bennett JE. 1992. Medical Mycology. Lea & Febiger: Philadelphia, PA.
Leng P, Klatte DH, Schumann G, Boeke JD, Steck TL. 1998. Skipper, an LTR retrotransposon of Dictyostelium. Nucleic Acids Res 26: 2008–2015.
Levin HL, Boeke JD. 1992. Demonstration of retrotransposition of the Tf1 element in fission yeast. EMBO J 11: 1145–1153.
Lin JH, Levin HL. 1997a. A complex structure in the mRNA of Tf1 is recognized and cleaved to generate the primer of reverse transcription. Genes Dev 11: 270–285.
Lin JH, Levin HL. 1997b. Self-primed reverse transcription is a mechanism shared by several LTR-containing retrotransposons. RNA 3: 952–953.
Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Malik HS, Burke WD, Eickbush TH. 1999. The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol 16: 793–805.
Malik HS, Eickbush TH. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73: 5186–5190.
Marin I, Llorens C. 2000. Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol Biol Evol 17: 1040–1049.
Nei M, Li WH. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci U S A 76: 5269–5273.
Pardue ML, Danilevskaya ON, Lowenhaupt K, Slot F, Traverse KL. 1996. Drosophila telomeres: new views on chromosome evolution. Trends Genet 12: 48–52.
Pujol C, Reynes J, Renaud F, et al. 1993. The yeast Candida albicans has a clonal mode of reproduction in a population of infected human immunodeficiency virus-positive patients. Proc Natl Acad Sci U S A 90: 9456–9459.
Sandmeyer S. 1998. Targeting transposition: at home in the genome. Genome Res 8: 416–418.
San Miguel P, Bennetzen JL. 1998. Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann Bot 82: 37–44.
Schneider S, Roessli D, Excoffier L. 2000. Arlequin, version 2.000: a software for population genetics data analysis. Genetics and Biometry Laboratory, University of Geneva, Switzerland.
Smit AFA. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9: 657–663.
Smith V, Botstein D, Brown PO. 1995. Genetic footprinting: a genomic strategy for determining a gene’s function given its sequence. Proc Natl Acad Sci U S A 92: 6479–6483.
Tanaka R, Taguchi H, Takeo K, Miyaji M, Nishimura K. 1996. Determination of ploidy in Cryptococcus neoformans by flow cytometry. J Med Vet Mycol 34: 299–301.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876–4882.
Wickes BL, Moore TDE, Kwon-Chung KJ. 1994. Comparison of the electrophoretic karyotypes and chromosomal location of ten genes in the two varieties of Cryptococcus neoformans. Microbiol 140: 543–550.
Xiong Y, Eickbush TH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9: 3353–3362.
Yang J, Malik HS, Eickbush TH. 1999. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc Natl Acad Sci U S A 96: 7847–7852.