The diversity of retrotransposons in the yeast Cryptococcus

Yeast
Yeast 2001; 18: 865–880.
DOI: 10.1002/yea.733
Research Article
The diversity of retrotransposons in the yeast
Cryptococcus neoformans
Timothy J. D. Goodwin* and Russell T. M. Poulter
Department of Biochemistry, University of Otago, Cumberland Street, Dunedin, New Zealand
* Correspondence to:
T. J. D. Goodwin, Department of
Biochemistry, University of Otago,
PO Box 56, Dunedin, New
Zealand.
E-mail: [email protected]
Received: 25 October 2000
Accepted: 30 January 2001
Abstract
We have undertaken an analysis of the retrotransposons in the medically important
basidiomycetous fungus Cryptococcus neoformans. Using the data generated by a C.
neoformans genome sequencing project at the Stanford Genome Technology Center, 15
distinct families of LTR retrotransposons and several families of non-LTR retrotransposons were identified. Members of at least seven families have transposed recently
and are probably still active. For several families, only partial elements could be identified
and these are quite diverse in sequence, suggesting that they are ancient components of the
C. neoformans genome. Most C. neoformans elements are not closely related to previously
identified fungal retrotransposons, suggesting that the diversity of fungal retrotransposons
has been only sparsely sampled to date. C. neoformans has fewer distinct retrotransposon
families than Candida albicans (37 or more), in particular fewer families represented solely
by ancient and inactive elements, but it has considerably more families than either
Saccharomyces cerevisiae (five) or Schizosaccharomyces pombe (two). The findings
suggest that elimination of retrotransposons is faster in C. neoformans than in C. albicans,
but perhaps not as rapid as in S. cerevisiae or Sz. pombe. The identification of the
retrotransposons of C. neoformans should assist in the molecular characterization of this
important pathogen, and also further our understanding of the role played by retroelements
in genome evolution. Copyright © 2001 John Wiley & Sons, Ltd.
Keywords:
Cryptococcus neoformans; yeast; retrotransposon; genome
Introduction
Cryptococcus neoformans is a basidiomycetous yeast
that is a leading cause of disease in AIDS patients
and is also capable of causing disease in people with
no known immune disorder (Kwon-Chung and
Bennett, 1992). C. neoformans infections present as
a meningoencephalitis, which is often fatal if left
untreated. Infections are usually not curable.
Patients who survive the initial disease undergo
long-term suppressive anti-fungal therapy to reduce
the likelihood of recurrent infections.
C. neoformans isolates can be classified into two
varieties and four serotypes, C. neoformans var.
neoformans (serotypes A and D) and C. neoformans
var. gattii (serotypes B and C), on the basis of
various biochemical, morphological and genetic
distinctions (Kwon-Chung et al., 1982). The relationships among these groups are not well defined,
however, and there can be extensive genetic variation among strains, even within a group (Casadevall
et al., 1992; Franzot et al., 1998). This variation is
of significance, as it may result in differences in
virulence and response to anti-fungal therapy. The
mechanisms that generate the diversity among
C. neoformans isolates are not well understood.
C. neoformans is haploid as isolated in nature and
has a well-characterized heterothallic mating system
(Tanaka et al., 1996; Kwon-Chung, 1975). The
overall population structure appears to be largely
clonal, however (Chen et al., 1995; Franzot et al.,
1997), suggesting that sexual reproduction is relatively infrequent in nature.
Transposable elements have had a profound
influence on the evolution of eukaryote genomes
(Kidwell and Lisch, 2000). These elements often
constitute a large proportion of the genome; for
instance, they make up 40% or more of the human
genome (Smit, 1999) and as much as 70% of some
plant genomes (San Miguel and Bennetzen, 1998).
Insertions of transposable elements near genes can
lead to alterations in gene expression patterns — as
the elements usually contain transcriptional regulatory sequences — while insertions within genes can
directly alter gene structure. Recombination
between elements at different sites can lead to
large-scale chromosomal rearrangements. As a
result of the potentially deleterious effects of
transposable element proliferation, host organisms
have often evolved mechanisms, such as specific
methylation, that limit the activity of these elements. Similarly, because transposable elements are,
in general, only rarely transmitted across species
boundaries, their continued existence is usually
dependent upon the continued survival of their
hosts. As a result, the elements themselves often
appear to have evolved mechanisms, such as
directing their integration to specific parts of the
genome, which keep the damage they cause to a
minimum (Sandmeyer, 1998).
The transposable elements most commonly
isolated from fungi to date are retrotransposons.
These are mobile elements that replicate via an
RNA intermediate (Boeke and Stoye, 1997). Autonomous examples encode their own reverse transcriptase and are typically 5–10 kb long. There are
two broad classes: the LTR retrotransposons and
the non-LTR retrotransposons (also known as
LINE elements). LTR retrotransposons are characterized by long terminal repeats (LTRs) at each
end. These are necessary for the replication cycle
and also contain sequences that regulate transcription of the element. Non-LTR retrotransposons
lack LTRs. They often contain an internal promoter and frequently terminate in poly-A tails.
The early identification and characterization of
retrotransposons can assist in the analysis of new
genomes in several ways. First, because these
elements are often highly homogeneous in sequence
and dispersed throughout the genome, they can
potentially lead to serious difficulties in sequence
assembly. An early estimate of the diversity and
abundance of these elements may allow appropriate
measures to be taken to avoid subsequent problems
in sequence assembly. Second, these elements often
display fairly relaxed targeting specificities, and
transposition can often be induced to high levels
in vivo. Such properties allow these elements to act
as efficient, random, tagged, insertional mutagens,
which can prove useful for understanding gene
function, including, for example, the identification
of virulence factors (Smith et al., 1995). Finally,
because these elements are mobile and usually
present as multiple copies, they can provide useful
markers for strain differentiation and epidemiological analyses.
Because of the medical significance of C.
neoformans, the genome of one strain is currently
being sequenced at the Stanford Genome Technology Center (http://www-sequence.stanford.edu/
group/C.neoformans/index.html). At the time of
writing (July 2000), the Stanford C. neoformans
sequence database consists of ~82 000 random
genomic shotgun sequence reads, containing
~49 Mb of sequence. This represents a slightly
greater than two-fold coverage of the 23 Mb
haploid genome (Wickes et al., 1994). We have
used this considerable resource to study the diversity of retrotransposons in C. neoformans. The
strain being sequenced contains 15 or more distinct
families of LTR retrotransposons and a number of
non-LTR retrotransposon families. Several families
appear to have amplified very recently and probably
still contain active members. Other families are
present at relatively low copy numbers and are
diverse in sequence, suggesting that they are ancient
components of the C. neoformans genome. The
findings should assist in the characterization of the
C. neoformans genome, and also contribute to our
understanding of the role played by retrotransposons in genome evolution.
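The coverage figure quoted above follows from simple arithmetic, which can be sketched as below. The estimate of the fraction of the genome sampled uses the Lander-Waterman idealization (reads placed uniformly at random, edge effects ignored); this is an assumption of the sketch and not part of the original analysis, and the function names are illustrative only.

```python
import math


def fold_coverage(total_read_bases: float, genome_size: float) -> float:
    """Average number of times each base of the genome is sequenced."""
    return total_read_bases / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman approximation: fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)


# Values quoted in the text: ~49 Mb of reads, 23 Mb haploid genome.
c = fold_coverage(49e6, 23e6)
print(f"coverage: {c:.2f}-fold")
print(f"expected fraction of genome sampled: {expected_fraction_covered(c):.2f}")
```

Under these idealized assumptions a little over two-fold coverage would leave roughly one-tenth of the genome unsampled, so low-copy families could plausibly be missed or only partly recovered.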
Materials and methods
Sequence analyses
Sequence data from Cryptococcus neoformans strain
JEC21 (Heitman et al., 1999) was obtained from
the Stanford Genome Technology Center website
(http://www-sequence.stanford.edu/group/C.neoformans/
index.html). The bulk of the analyses reported here
were performed on the sequence obtained up to the
end of July 2000. The database at this time
consisted of 82 330 random shotgun sequence
reads, containing 48.7 Mb of sequence. The Stanford C. neoformans sequences were screened using
the BLAST facilities provided on the website.
General sequence analyses and manipulations
were performed using programs of the GCG
package (Devereux et al., 1984). Multiple sequence
alignments were constructed with CLUSTAL_X
(Thompson et al., 1997). Phylogenetic trees were
constructed using PHYLIP (Felsenstein, 1989).
Nucleotide diversity (Nei and Li, 1979) was calculated using Arlequin (Schneider et al., 2000). For
most LTR families, the phylogenetic trees and the
nucleotide diversity calculations were based on
alignments of all the sequences in the database
that contained a full-length LTR, except that
apparent duplicates were discarded. For the longest
LTR families, few or no single sequences covered
the entire LTR. For these elements, all the single
sequences that covered a particular 500 bp segment
of the LTR were used to construct trees and
calculate nucleotide diversity. The nucleotide diversity results for the C. neoformans elements are likely
to be slight overestimates, as they are taken from
raw sequence data, which will contain some errors.
The relative abundance of each LTR family was
estimated by counting the number of single
sequences in the database that contained a greater
than 70% match to a particular 100 bp fragment of
the LTR in question. It should be emphasized that
these are estimates of relative abundance, not
absolute copy number. The structure of the
C. neoformans methionine tRNA was studied using
tRNAscan-SE (Lowe and Eddy, 1997). The
sequences described in this paper are available
from the authors’ website at URL http://bioc111.otago.ac.nz:8001/retrobase/home.htm.
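The relative-abundance estimate described above (counting reads with a greater than 70% match to a chosen 100 bp LTR fragment) can be illustrated with a minimal sketch. The text does not say how the match was scored, so the ungapped sliding-window identity used here is an assumption, as are the function names and the toy 10 bp sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_window_identity(read: str, fragment: str) -> float:
    """Highest ungapped identity of `fragment` against any same-length window of `read`."""
    n = len(fragment)
    if n == 0 or len(read) < n:
        return 0.0
    best = 0
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        best = max(best, sum(a == b for a, b in zip(window, fragment)))
    return best / n


def count_matching_reads(reads, fragment, threshold=0.70):
    """Count reads whose best match to the fragment (either strand) exceeds the threshold."""
    targets = (fragment, reverse_complement(fragment))
    return sum(
        1 for read in reads
        if max(best_window_identity(read, t) for t in targets) > threshold
    )


# Toy data: a 10 bp fragment stands in for the 100 bp LTR fragment.
fragment = "ACGTTGCAAG"
reads = [
    "TTACGTTGCAAGTT",  # exact copy
    "TTACGATGCTAGTT",  # copy with two mismatches (80% identity)
    "GGCTTGCAACGTGG",  # copy on the opposite strand
    "CCCCCCCCCCCCCC",  # unrelated
]
print(count_matching_reads(reads, fragment))
```

As the text stresses, counts of this kind are only comparable between families; they are not absolute copy numbers.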
Results
Identification and classification of
retrotransposons in Cryptococcus neoformans
Retrotransposons in C. neoformans were identified
in several ways. The first elements were detected by
virtue of their sequence similarity to previously
identified retrotransposons of other species. For
instance, several elements were detected in a series
of TBLASTN searches (protein query vs DNA
database) of the Stanford C. neoformans sequence
database, using the protein sequences of a wide
variety of retrotransposons from other species as
queries. Several other elements were subsequently
identified by performing BLASTN and TBLASTN
searches of the C. neoformans database, using the
elements identified in the original searches as
queries. The remaining elements were detected as
insertions present within or adjacent to other
retrotransposons.
The sequences identified at this point could easily
be classified into distinct families, as they were either
highly similar to each other (e.g. 95–100% identity
in overlapping regions) or otherwise very different
in sequence. For each distinct family, we attempted
to assemble a representative full-length sequence.
This was done by building up contigs, via a series of
BLAST searches to identify overlapping sequences,
using the sequences identified in the original
searches as starting points. The extent of sequence
identified for each LTR retrotransposon family, and
the deduced structures of the elements, are shown in
Figure 1. For six families we were able to assemble
full-length representative sequences. The other
families are at present represented by either: (a)
LTRs with associated, but incomplete, internal
regions; (b) solo LTRs with no associated internal
regions; or (c) partial internal regions with no
identified LTRs. The LTRs have been named
LTR1–15. The corresponding retrotransposons,
where they have been identified, have been named
Tcn1–10. Partial internal regions, for which the
corresponding LTR is not known, have been named
RF1–10 (for retrotransposon fragment). The possibility that some of the partial elements will be found
to belong to the same family, as more sequence
becomes available, means that we cannot at present
determine the exact number of distinct families.
However, the greater than two-fold coverage of the
genome in the database at the time of writing, and
the fact that we have devoted considerable effort to
identifying additional families without success,
suggest that the sequences shown in Figure 1
represent the majority of distinct LTR retrotransposon families present in this C. neoformans strain.
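The contig-building step described above (starting from a seed sequence and repeatedly adding overlapping reads) can be sketched as a greedy extension. This is a deliberately simplified illustration: it requires exact suffix-prefix overlaps and extends in one direction only, whereas the assemblies in this work were built from BLAST hits against error-containing reads. All names and sequences are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` that equals a prefix of `right` (0 if shorter than min_overlap)."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend_contig(seed: str, reads, min_overlap: int = 5) -> str:
    """Greedily extend `seed` rightwards with reads that overlap its current end."""
    contig = seed
    remaining = list(reads)
    extended = True
    while extended:
        extended = False
        for read in remaining:
            k = overlap_length(contig, read, min_overlap)
            if 0 < k < len(read):  # read overlaps and adds new sequence
                contig += read[k:]
                remaining.remove(read)
                extended = True
                break  # restart the scan from the new contig end
    return contig


# Toy example with made-up sequences.
seed = "ATGGCGTACG"
reads = ["TTAGCCATGA", "GTACGTTAGC", "CCCCCCCCCC"]
print(extend_contig(seed, reads))
```

A practical version would also extend leftwards, tolerate mismatches, and guard against joining reads from different but similar families, which is the main risk when elements are highly homogeneous.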
The deduced structure of the most abundant non-LTR element is shown in Figure 2A. This element
has been named Cnl1 (C. neoformans LINE-1), and
appears to represent a structurally intact non-LTR retrotransposon. There are also several
other families of non-LTR retrotransposons in
C. neoformans (not shown). These all appear to be
degenerate, low-copy-number relatives of Cnl1.
Sequence diversity within families
It is important to know how representative of each
family the full-length assembled sequences are, or,
phrased another way, what level of sequence
Figure 1. Structure and available sequence for each C. neoformans LTR retrotransposon family. (A) Ty3/gypsy elements. (B)
Ty1/copia elements. (C) Solo LTRs. Boxes with triangles represent the LTRs. Shaded boxes represent the ORFs. Changes of
reading frame are indicated by offset boxes. Stop codons within the ORFs are indicated by vertical lines. The approximate
positions of the various domains within the ORFs are indicated above the first element of each group. The common scale is
shown at the bottom
Figure 2. Structure of the C. neoformans non-LTR retrotransposon, Cnl1. (A) Structure of a single Cnl1 element.
The long box represents the single long ORF of the element.
Areas of dark shading represent the zinc finger-like
sequences. Areas of pale shading represent the conserved
regions of reverse transcriptase and the restriction enzymelike endonuclease (REL-Endo.). Thin black lines represent the
5′ and 3′ untranslated regions. (B) Examples of nested Cnl1
arrays. Thick lines with arrowheads represent the 3′ ends of
Cnl1 elements. Shaded boxes represent the Cnl1 ORF. Thin
lines represent unrelated sequences. These examples are
from the sequences named at the right
diversity is present within each family? To tackle
this problem for the LTR retrotransposons, we
started by extracting from the database all the
sequences containing a full-length LTR and aligning them according to their families. From the
alignments phylogenetic trees were constructed
(examples in Figure 3A) and levels of nucleotide
diversity (π; Nei and Li, 1979) were calculated
(Table 1). The phylogenetic trees allow the diversity
of sequences within a family to be visualized
directly. For several families we found that all the
LTRs are nearly identical in sequence, for example,
those of Tcn1. Some families, such as Tcn3, have
LTRs that fall into distinct subfamilies, although the LTRs
within each subfamily are very similar
in sequence. Other families, for example LTR12,
contain a much broader range of sequences
(Figure 3A). In general, we found that LTRs from
families for which full-length sequences were easily
assembled have the lowest levels of sequence
variation, while the most diverse families are those
represented only by solo LTRs.
The nucleotide diversity calculations (Table 1)
Figure 3. Distance trees of full-length LTR sequences (A)
and RT sequences (B). (A) All the sequences in the database
containing full-length LTRs of a particular family were
extracted and aligned. Phylogenetic trees were then constructed by the UPGMA method using PHYLIP. (B) All the
sequences in the database containing a particular 400 bp
region of the reverse transcriptase gene from four closely
related families were extracted and aligned. Trees were again
constructed by the UPGMA method using PHYLIP. The trees
in A and B are all shown at the same horizontal scale. The
distance is Kimura’s (1980) two-parameter distance
allow the diversity of sequences within a family to
be quantitated. Again, LTRs from families with
full-length elements generally have low levels of
diversity, e.g. the Tcn2 LTR, for which π = 0.022.
This equates to an average level of sequence identity
of ~98%. The highest levels of diversity were again
found in solo LTR families, e.g. LTR12, for which
π = 0.212 (corresponding to an average level of
sequence identity of ~79%).
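The relationship between nucleotide diversity (Nei and Li, 1979) and average sequence identity used above can be sketched as follows. Here diversity is taken as the mean proportion of differing sites over all pairs of aligned sequences, so average identity is approximately one minus that value. The handling of gaps and ambiguous bases (such positions are skipped) is an assumption of this sketch and may differ from what Arlequin does; the toy sequences are made up.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping positions with a gap or ambiguous base."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nucleotide_diversity(aligned_seqs) -> float:
    """Mean pairwise p-distance over all distinct pairs of aligned sequences."""
    distances = [p_distance(a, b) for a, b in combinations(aligned_seqs, 2)]
    if not distances:
        raise ValueError("need at least two sequences")
    return sum(distances) / len(distances)


# Toy alignment of four 10 bp sequences (hypothetical data).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
print(f"diversity = {pi:.3f}, average identity = {100 * (1 - pi):.0f}%")

# The same conversion applied to two values reported in the text.
for reported in (0.022, 0.212):
    print(f"diversity {reported} -> identity {100 * (1 - reported):.0f}%")
```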
We then examined the diversity of sequences
within the internal regions of some full-length
elements. All the sequences in the database containing particular 400 bp segments of either the Gag or
reverse transcriptase (RT) coding regions from four
related families (Tcn2, 3, 4, and 5; see Phylogenetic
analyses, below) were extracted and aligned.
The alignments were again used to construct
Table 1. Properties of C. neoformans retrotransposon LTR families

LTR          Length (bp)  Relative abundance(a)  Nucleotide diversity  TSD (bp)  Group
1 (Tcn1)     501          53                     0.056                 5         Ty3/gypsy
2 (Tcn2)     942          70                     0.022                 5         Ty3/gypsy
3 (Tcn3)     505          61                     0.113(b)              5         Ty3/gypsy
4 (Tcn4)     433          33                     0.024                 5         Ty3/gypsy
5 (Tcn5)     589          44                     0.011                 ND        Ty3/gypsy
6 (Tcn6)     271          17                     0.123                 5         Ty1/copia
7 (Tcn7)     174          16                     0.184                 5         Ty1/copia
8 (Tcn8)     255          6                      0.107                 5         Ty1/copia
9 (Tcn9)     181          17                     0.135                 5         Ty1/copia
10 (Tcn10)   528          18                     0.004                 4         Ty3/gypsy
11           616          19                     0.110                 5         ND
12           154          4                      0.212                 5         ND
13           393          5                      0.259                 5         ND
14           448          11                     0.065                 5         ND
15           515          10                     0.034                 4         Ty3/gypsy

ND, not determined.
(a) Determined as described in Materials and methods.
(b) This is actually the nucleotide diversity of two closely related, but distinct, subfamilies (Figure 3A).
phylogenetic trees and calculate nucleotide diversity. Similar to the results with the LTRs, we found
that these families were very homogeneous in their
internal regions. Nucleotide diversity levels in the
different families ranged from 0.017 to 0.029 for the
RT sequences (corresponding to ~97–98% average
identity), and from 0.011 to 0.050 (95–99% average
identity) for the gag sequences. In the RT region, all
but four out of 307 differences were simple base
substitutions, and 77% of these base substitutions
were in the third-base position, and thus may not
alter the encoded amino acid sequence. In the gag
regions nearly all differences were again simple base
substitutions, but the frequency of those in third-base positions dropped to 60%. Within the highly
conserved RT region, the nucleotide sequences from
these four families were sufficiently similar to
permit all the sequences to be placed in a single
alignment. This allowed the level of sequence
diversity within families to be compared to the
level of distinction between families. A tree generated from the alignment is shown in Figure 3B. It is
evident that the diversity of sequences within a
family is small compared to the level of distinction
between families, even for these closely related
families.
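The distance trees in Figure 3 are based on Kimura's (1980) two-parameter distance, which corrects the observed proportion of differences for multiple substitutions while treating transitions and transversions separately. A minimal sketch of the standard formula, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences, is given below. The function name and the toy sequences are illustrative, and the sketch is not the PHYLIP implementation.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        sites += 1
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy pair: 20 sites, two transitions and one transversion (hypothetical data).
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "GTCTACGTACGTACGTACGT"
print(f"K2P distance = {k2p_distance(seq_a, seq_b):.4f}")
```

For closely related sequences, such as members of a single family here, the corrected distance is only slightly larger than the raw proportion of differences; the correction matters more for the between-family comparisons.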
A similar analysis was undertaken for a 400 bp
segment of the RT gene of the non-LTR retrotransposon Cnl1. The level of nucleotide diversity
over this region for Cnl1 was 0.028. All but two
from a total of 336 differences were base substitutions. Of these, 86% were third-base substitutions.
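The proportion of substitutions falling at third-codon positions, as reported above, can be tallied along the following lines. The sketch compares one aligned coding sequence with a reference and assigns each difference to a codon position. How the differences were actually counted across the alignment is not stated in the text, so the pairwise comparison against a reference, the `frame_offset` parameter, and the toy sequences are all assumptions.

```python
from collections import Counter


def substitutions_by_codon_position(ref: str, other: str, frame_offset: int = 0) -> Counter:
    """Count base substitutions at codon positions 1, 2 and 3.

    `frame_offset` is the index of the first base of the first complete codon.
    """
    counts = Counter()
    for i, (x, y) in enumerate(zip(ref.upper(), other.upper())):
        if i < frame_offset:
            continue
        if x in "ACGT" and y in "ACGT" and x != y:
            counts[(i - frame_offset) % 3 + 1] += 1
    return counts


# Toy in-frame sequences (hypothetical): four differences, three at third positions.
reference = "ATGGCTAAAGGT"
variant = "ATGACCAAGGGA"
counts = substitutions_by_codon_position(reference, variant)
total = sum(counts.values())
print(dict(counts), f"third-base fraction = {counts[3] / total:.2f}")
```

A strong excess of third-position changes, as seen for the RT regions, is the pattern expected when most surviving substitutions are synonymous.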
Overall, these results show that the level of
sequence diversity within single C. neoformans
retrotransposon families differs considerably in
different families. Families for which full-length
sequences were easily assembled display low levels
of sequence diversity. This suggests that representative full-length assembled sequences are likely to be
reliable indicators of the overall sequence and
structure of these retrotransposons. Families for
which just partial elements, such as solo LTRs, are
present, generally display higher levels of diversity.
It is likely that the level of sequence diversity within a
related group of retrotransposons reflects the amount
of time since the group amplified in the genome, as
older elements have had more time to accumulate
individual differences than younger ones. The finding
that the most diverse C. neoformans retrotransposon
families also appear to have no intact examples
remaining is consistent with this idea.
Phylogenetic analyses
The relationships between the C. neoformans elements and previously identified retrotransposons
were analysed by constructing phylogenetic trees
based on multiple sequence alignments of the
various Pol domains. For the LTR retrotransposons, we began by constructing trees based on the
seven universally conserved domains of reverse
transcriptase (RT) described by Xiong and Eickbush
(1990). We included all the C. neoformans elements
for which sequence from all seven RT domains was
available, and a wide range of elements from other
species, including representatives from all four
major groups of LTR retrotransposons. An example of the trees we obtained is shown in Figure 4A.
Alignments of highly conserved RT regions are
shown in Figure 5. The C. neoformans elements
were all found to be members of either the Ty3/
gypsy or Ty1/copia groups. The full-length element
Tcn1 and the partial element RF3 were found to be
members of a sub-group of Ty3/gypsy elements that
includes many fungal elements, such as Tf1 from
Schizosaccharomyces pombe and Maggy from Magnaporthe grisea, as well as vertebrate elements such
as sushi from the puffer fish Fugu rubripes. Tcn1
and RF3 do not, however, appear to be closely related
to each other (Figure 4A). The other C. neoformans
Ty3/gypsy elements for which phylogenetic analysis
was possible (Tcn2, 3, 4 and 5) form a monophyletic group of closely related elements that
emerges from near the base of the Ty3/gypsy
group and which is not closely related to any
previously identified elements. The element Tcn10,
for which very little internal sequence was detected,
was also assigned to the Ty3/gypsy group on the
basis of structural features (see below). These same
structural features, however, suggest that Tcn10 is
not closely related to the other C. neoformans Ty3/
gypsy elements. The remaining partial Ty3/gypsy
elements, RF2, RF4 and RF10, on the basis of
sequence similarities, all appear to belong to the Tf1/
sushi subgroup, like Tcn1 and RF3.
Phylogenetic analyses were possible for three
C. neoformans Ty1/copia elements, Tcn6, Tcn9,
and RF5. These elements were found to be more
closely related to retrotransposons from plants and
animals than to previously identified fungal elements. All known fungal Ty1/copia elements, such
as those from S. cerevisiae and C. albicans, branch
off from other elements of this group quite early in
the tree. The C. neoformans elements, in contrast,
are fairly closely related to elements such as
Tnt1 from tobacco and copia from Drosophila
(Figure 4A). Tcn6 and Tcn9 are closely related to
each other, while RF5 represents a distinct lineage.
All of the partial Ty1/copia group elements are
closely related to one of these two lineages, the
majority to the Tcn6/Tcn9 lineage.
Much recent effort has been directed towards
understanding the distribution, diversity, and evolution of Ty3/gypsy group elements in particular,
partly because elements very similar to these
probably gave rise to modern-day retroviruses. We
also conducted some more detailed phylogenetic
analyses of the C. neoformans Ty3/gypsy elements.
Trees were constructed based on each of the RT,
RNase H and integrase domains, and on all three of
these domains combined. A wide variety of elements was included, designed to represent the
recognized diversity of the Ty3/gypsy group. A
representative tree derived from the combined
dataset is shown in Figure 4B. This tree again
places Tcn1 as a member of the Tf1/sushi subgroup.
Its closest relatives are MarY1, from another
basidiomycete, and the vertebrate sushi elements.
A range of elements from ascomycetous fungi make
up the remainder of the Tf1/sushi group. A group
of plant retrotransposons, represented here by dea1
from pineapple and an element from tomato, form
a sister group to the Tf1/sushi elements. These two
groups, together, have been termed the ‘chromoviruses’ by Marin and Llorens (2000), because most
contain a chromodomain-like module at the C-terminal end of their integrase (Malik and Eickbush,
1999). Loosely associated with this chromovirus
group are Ty3 from S. cerevisiae, skipper from
Dictyostelium discoideum and Tcn2, 3, 4 and 5. The
association of Ty3, skipper and Tcn2–5 with the
chromoviruses in this tree is of interest, as it is not
apparent from the tree based solely on RT sequences.
The difference is largely the result of similarities in the
INT domains of these elements and those of the
chromoviruses. Marin and Llorens (2000) have
previously noted this discrepancy for Ty3 and
skipper. They proposed that it may result from
differing rates of evolution or reflect a recombinant
origin for these elements.
The relationship of the non-LTR retrotransposon
Cnl1 to other non-LTR elements was examined by
constructing trees based on an alignment of the
regions encompassing the 11 highly conserved blocks
of non-LTR retrotransposon RT identified by Malik
et al. (1999). An example of the trees obtained is
shown in Figure 4C. Cnl1 was found to group with
the elements CRE1 and SLACS from the trypanosomes Crithidia fasciculata and Trypanosoma brucei,
respectively. This grouping received 95% bootstrap
support. The distances between Cnl1 and the trypanosome elements in the tree (Figure 4C), and
Figure 4. Relationships of C. neoformans elements to other retrotransposons. (A) Phylogenetic tree of LTR-retrotransposons
based on the seven universally conserved domains of reverse transcriptase. The four major groups of LTR-retrotransposons
are indicated. The tree shown is a distance tree constructed by the UPGMA method with PHYLIP (Felsenstein, 1989). (B)
Phylogenetic tree of Ty3/gypsy elements based on RT, RNase H and INT sequences. This tree was constructed by the
neighbour-joining method using PHYLIP. (C) Phylogenetic tree of non-LTR elements based on the regions encompassing the
11 conserved blocks of non-LTR retrotransposon reverse transcriptase. This tree was generated by the UPGMA method
using PHYLIP. In panels B and C the percentages of bootstrap support for branches receiving >50% support are shown
sequence comparisons (Figure 5C), however, show
that the elements are not particularly closely related.
Structure of the C. neoformans
retrotransposons
The deduced structures of all the LTR retrotransposons are presented in Figure 1. The structure
of Tcn1 is consistent with its classification with the
Tf1/sushi elements. Like most elements of this
group, Tcn1 encodes its Gag and Pol proteins
in separate open reading frames (ORFs), with the
pol ORF being in the −1 phase with respect to gag.
Tcn1 also has a chromodomain-like module near
the C-terminal end of its integrase (including the
most conserved residues, KWxGYx4NSWEP;
Malik and Eickbush, 1999). It has the characteristic
self-priming structure found in all Tf1/sushi elements (Lin and Levin, 1997a,1997b; Figure 6A).
The Tcn1 self-priming structure appears to be
A
gypsy   FFTTLDLKSGYHQIYLAEHDREKTSFSVNGGKYEFCRLPFGLRNASSIFQRALDDVLREQ
Ty3     IFTTLDLHSGYHQIPMEPKDRYKTAFVTPSGKYEYTVMPFGLVNAPSTFARYMADTFRDL
Tcn1    VFTKIDLRGAYNLLRIKAGEEWKTAFRTRYGHFEYLVMPFGLTNAPASFQHLMNHNFRDL
RF3     FFTKLDLVGTYQLLRISPGHEPLTTFHTQYGMFESLVIQDGLRNAPATFQHFLNNVFRDL
Tf1     IFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLVMPYGISTAPAHFQYFINTILGEA
Tcn2    VFAKLDLTDAFFQTLMHEPDIEKTAISTPWGLYEWVVMPQGACNSPATQQRRLNEALRNL
Tcn5    IFAKIDLSDAFFQTLMHEPDIEKTAITTELGLFEWVVMPQGACNSPATQQRRLNEALRGL
Tcn3    IFAKLDCKDAFFQTLMKEEDIPKTAITTPLGLLEWVVMPQGIRNAPAAQQRRINEALQGL
Tcn4    LFAKLDCKDAFFQTLMKDDDIHKTAITTPLGLLEWVVMPQGIRNAPAAQQRRINEALQGL

gypsy   IGKICYVYVDDVIIF
Ty3     R..FVNVYLDDILIF
Tcn1    LDIFVIIYLDDILIY
RF3     LGKGVTIYINDILIY
Tf1     KESHVVCYMDDILIH
Tcn2    ISVCCEAYVDDIIIW
Tcn5    LGDSCEAYVDDIIVW
Tcn3    TGECCEAYVDDIIIW
Tcn4    AGECCEAYVDDIIIW
                 **
B
Ty5     IVYQMDVDTAFLNSKMNEPVYVKQPPGFIDYVWELYGGMYGLKQAPLLWNEHINNTLQKI
RF5     ELHQMDVKTAFLYGKLEEDVYLDQPEGYDGMVWKLDKALYGLKQAPRAWYQELHSTLTSL
copia   KVHQMDVKTAFLNGTLKEEIYMRLPQGISDNVCKLNKAIYGLKQAARCWFEVFEQALKEC
Tcn6    ECDQVDIKAAFLNGDLEETIYLEAPEGSDNKILLLNKSLYGLRQSPRCFNKALDQWLKSQ
Tcn9    ECDQVDIKSAFLNGDLDETIYLSPPEHSDTHILRLRKSLYGLRQAPRCFNKAFDGWLKSQ
Ty1     YITQLDISSAYLYADIKEELYIRPPPHLGDKLIRLKKSHYGLKQSGANWYETIKSYLIKQ

Ty5     GFRRIALYVDDLLVA
RF5     GFIRVGVYVDDLTIA
copia   EFVNVLLYVDDVVIA
Tcn6    GLKPLSVHVDDQLIA
Tcn9    DLMPLSVHVDDQLIA
Ty1     CGMEICLFVDDMILF
                 **
C
Jockey  GAFLDIQQAFDRVWHPGLLYKAKRL.FPPQLYLVVKSFLEERTFHVSV.DGYKSSIKPIA
L1Hs    IISIDAEKAFDKIQQPFMLKTLNKLGIDGTYFKIIRAIYDKPTANIRL.NGQKLEAFPLK
SLACS   LAMLDGRNAYNAISRRAILEAVYGDSTWSPLWRLVSLLLGTTGEVGFYENGKLCHTWEST
Cnl1    LASLDASNAFNRVDRAEMAAAVKTHA..PTLWRTCKWAYGDSSDLV.C..GDKI..LQSS
CRE1    LVALDGVNAYNTMSRAHILQAVYAEQRLKPIWGVVKVALGGPGFLGVYRDGCLKGNLWST

Jockey  AGVPQGSVLGPTLYSVFASDMPTHTP......VTEVDEEDVLIATYADDTAV
L1Hs    TGTRQGCPLSPLLFNIVLEVLARAIRQEKEIKGIQLGKEEVKLSLFADDMIV
SLACS   RGVRQGMVLGPLLFSIGTLATLRRLQ........QTF.PEAQFTAYLDDVTV
Cnl1    QGVRQGDPFGPLFFSITLRPTLNALS........QSLGPSTQALAYLDDIYL
CRE1    KGIRQGMVLGPLLYATGMAAAIGPVR........QRI.PGVPVTAYIDDITL
                                                       **
Figure 5. Alignments of conserved reverse transcriptase domains of some C. neoformans retrotransposons and related
elements of other species. (A) Ty3/gypsy elements. (B) Ty1/copia elements. (C) Non-LTR retrotransposons. The highly
conserved pairs of aspartate residues are indicated by asterisks
Copyright # 2001 John Wiley & Sons, Ltd.
Figure 6. Primer-binding sites of C. neoformans Ty3/gypsy elements. (A) The Tf1/sushi-group self-priming structure of Tcn1.
(B) Similarity of the PBSs of Tcn2, 3, 4 and 5 to that of skipper from D. discoideum. (C) Complementarity between the Tcn10
PBS and the 3′ end of a C. neoformans methionine tRNA. In each case, the last 10 nucleotides of the upstream LTR are
underlined
particularly robust and actually contains more
paired residues (80 vs. 54) than the type element of
the group, Tf1. The relative ease with which a full-length and apparently intact Tcn1 sequence was
assembled suggests that there are still intact
elements in the genome.
Tcn2, 3, 4, and 5 are very similar to each other in
structure (Figure 1A), as might be expected from
their close relationships (Figure 4A, B). They have,
however, diverged sufficiently that their LTRs are
now completely different in sequence. Again, full-length and apparently intact sequences for each of
these elements were relatively easy to assemble,
suggesting that intact elements may be fairly
common. An interesting structural feature that
these elements all share is a very unusual minus-strand primer-binding site (PBS). Most LTR retrotransposons either use a cellular tRNA to prime
minus-strand DNA synthesis, in which case they
have a PBS complementary to part of a tRNA
molecule, or, if they belong to the Tf1/sushi group,
have a characteristic self-priming structure
similar to that of Tcn1. The PBSs of Tcn2–5 do not
have any obvious complementarity to tRNAs,
nor can they potentially form Tf1/sushi-like self-priming structures. Instead, their PBSs are very
similar to that of the slime-mould element, skipper
(Leng et al., 1998), mentioned above (Figure 6B).
Skipper is known to have an unusual PBS, but the
actual mechanism by which it primes minus-strand
DNA synthesis is not known. The similarities of the
PBSs of skipper and Tcn2–5 suggest that these
elements share a similar method of priming.
While little can be learned about the overall
structure of the final Ty3/gypsy element, Tcn10, due
to a lack of sequence data, this element also seems
to have an unusual PBS. Most LTR elements which
use the 3′ end of a tRNA as a primer have PBSs
that are 18 nucleotides long. This represents the
distance to the first modified base in the tRNA
molecule. The PBS of Tcn10, however, is perfectly
complementary to 49 bp at the 3′ end of a
C. neoformans methionine tRNA (Figure 6C). This
is easily the longest PBS that we know of. There is a
2 bp gap between the end of the Tcn10 LTR and its
PBS. Such a gap is more commonly found in Ty3/
gypsy elements than in members of the Ty1/copia
group. It might be argued that the putative Tcn10
PBS is not a PBS at all, but simply a tRNA gene
which the Tcn10 element has inserted next to. We
believe that it is a genuine PBS, however, for several
reasons: (a) it begins with the sequence TGG, which
is complementary to the CCA trinucleotide that is
added post-transcriptionally to tRNA molecules,
but which is not generally present in tRNA genes;
(b) the corresponding tRNA gene has an intron that
is not present in the putative PBS; and (c) only the
3′-most 49 bp of the tRNA are present, not an entire
tRNA gene. A plausible alternative explanation,
however, is that this sequence represents an aberrant
reverse transcript, which has become inserted in the
genome. Additional sequence will be required to
distinguish between these possibilities.
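Argument (a) can be illustrated computationally. The following is a minimal Python sketch, not part of the original analysis, that counts how many leading bases of a candidate PBS are complementary to the 3′ end of a mature tRNA, with or without the post-transcriptionally added CCA. The function names and the toy sequence are hypothetical; real input would be the sequence immediately downstream of the Tcn10 LTR and the 3′ portion of the C. neoformans methionine tRNA gene.

```python
# Hypothetical helper names; sequences are written in the DNA alphabet, 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pbs_match_length(pbs: str, trna_gene_3prime: str, add_cca: bool = True) -> int:
    """Count the leading PBS bases that pair with the tRNA 3' end.

    trna_gene_3prime is the 3' portion of the tRNA as encoded in the gene.
    If add_cca is True, the CCA added after transcription is appended first,
    so a genuine PBS is expected to begin with TGG.
    """
    mature = trna_gene_3prime.upper().replace("U", "T")
    if add_cca:
        mature += "CCA"
    expected = reverse_complement(mature)  # a perfect PBS, read 5'->3'
    matched = 0
    for observed, wanted in zip(pbs.upper(), expected):
        if observed != wanted:
            break
        matched += 1
    return matched


# Toy example (made-up sequence, for illustration only).
toy_trna_end = "GATTACAG"
toy_pbs = "TGGCTGTAATC"
print(pbs_match_length(toy_pbs, toy_trna_end))
print(pbs_match_length(toy_pbs, toy_trna_end, add_cca=False))
```

A long match that appears only when the CCA is included is what would be expected of a PBS derived from a mature tRNA rather than from an adjacent tRNA gene.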
The one Ty1/copia element for which the entire
sequence is available, Tcn6, has a single ORF,
similar to several of the elements, such as copia and
Tnt1, to which it is most closely related. It has a
PBS complementary to an internal portion of the
initiator methionine tRNA that is also commonly
found in related elements. The Tcn6 sequence was
relatively easily assembled, suggesting that there
may still be intact elements. For many of the other
Ty1/copia families, however, only short sequences
are available, and these are sometimes corrupted. It
is likely that few, if any, intact examples of these
families remain, at least in this strain.
The structure of Cnl1 (Figure 2A) is broadly
similar to the structures of CRE1, SLACS and
other early-branching non-LTR retrotransposons
(see Malik et al., 1999, for a summary of the
structures of the various groups of non-LTR
elements). It contains a single ORF encoding two
N-terminal zinc finger-like motifs (Cx2Cx9Hx4H
and Cx2Cx12Hx4C), a reverse transcriptase and a
C-terminal zinc-finger (Cx2Cx7Hx3C) followed
closely by a sequence (RHNx19IEPx7RNDx11
TDYx23Kx3F) resembling the restriction enzyme-like endonucleases of other early-branching elements (Yang et al., 1999). Downstream of the
ORF is a short untranslated region. The element
most commonly terminates in the sequence
5′-TAACCC-3′, rather than with a poly-A tail.
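Motif shorthand of the kind used above (for example Cx2Cx9Hx4H, where x9 stands for any nine residues) maps directly onto a regular expression. The sketch below is an illustration only, assuming that uppercase letters are fixed residues and that a lowercase x followed by a number is a spacer; the function names and the toy protein are hypothetical and this is not the procedure used to annotate Cnl1.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate shorthand such as 'Cx2Cx9Hx4H' into a regular expression."""
    # 'x9' becomes '.{9}', i.e. any nine residues.
    return re.sub(r"x(\d+)", lambda m: ".{" + m.group(1) + "}", motif)


def find_motif(protein: str, motif: str) -> list:
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = re.compile(motif_to_regex(motif))
    return [m.start() for m in pattern.finditer(protein.upper())]


# Toy protein (made up) containing one Cx2Cx9Hx4H motif.
toy = "MK" + "C" + "AA" + "C" + "A" * 9 + "H" + "A" * 4 + "H" + "GG"
print(motif_to_regex("Cx2Cx9Hx4H"))
print(find_motif(toy, "Cx2Cx9Hx4H"))
```

The same translation handles the longer endonuclease-like pattern, since multi-residue blocks such as RHN or IEP are simply matched literally.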
Cnl1 differs from other early-branching non-LTR
elements in its integration site preferences. Other
early-branching elements, such as CRE1 and
SLACS from trypanosomes, R4 and NeSL from
nematodes and R2 from insects, display a high
degree of sequence-specificity in their integration —
usually for specific sequences within either rRNA
genes or spliced-leader exons. In contrast, Cnl1
seems to insert preferentially within pre-existing
copies of itself: of a random sample of 52 3′
ends of Cnl1 elements analysed, 37 (71%) were
inserted within another Cnl1 element. In every case,
the new element was inserted in the same orientation as its target, although no strict sequence specificity was detected. The integration patterns
of Cnl1 elements result in the formation of arrays
of tandemly integrated elements (examples in
Figure 2B). Interestingly, the Cnl1 elements can
frequently be found closely associated with tandem
repeats of the sequence 5′-TAACCCCC-3′ (and
slight variations thereof). This sequence is thought
to comprise the C. neoformans telomeres (Edman,
1992).
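Tandem arrays of the putative telomeric repeat can be located with a simple pattern search. The following Python sketch is illustrative only. It assumes, purely for the purpose of the example, that the "slight variations" amount to a variable number of C residues (three to six) in each repeat unit; the function name, this tolerance and the toy sequence are hypothetical.

```python
import re


def find_telomere_like_arrays(seq: str, min_copies: int = 2) -> list:
    """Return (start, end) spans of tandem TAAC(3-6) repeat arrays.

    The 3-6 C tolerance is an assumption made for this sketch, not a
    documented property of the C. neoformans telomeric repeat.
    """
    pattern = re.compile(r"(?:TAAC{3,6}){%d,}" % min_copies)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


# Toy sequence (made up): three copies of TAACCCCC flanked by other bases.
toy = "GGG" + "TAACCCCC" * 3 + "AGT"
print(find_telomere_like_arrays(toy))
```

Searching the reverse complement as well would be needed to detect arrays on the opposite strand.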
Comparisons of the LTRs
Analysis of LTRs can be of particular use in
studying LTR retrotransposon evolution, as they
are usually the most rapidly evolving part of the
element and they are often much more abundant
than the internal regions. Several properties of each
of the 15 different C. neoformans LTR families are
listed in Table 1. As mentioned previously, there is
considerable variation in the levels of nucleotide
diversity displayed by different families. Families
with full-length and apparently intact elements
generally have very low levels of diversity, suggesting that they have amplified very recently. The
families with the highest levels of diversity only
appear to have solo LTRs remaining, consistent
with these being older elements. From rough
estimates of the relative abundance of the LTRs of
each family, it is apparent that the most abundant
LTRs are those of the full-length elements, while
LTRs from the partial elements are less common.
This also suggests that the full-length elements have
undergone a more recent amplification.
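The notion of nucleotide diversity within an LTR family can be made concrete as the mean proportion of differing sites over all pairs of aligned copies. The Python sketch below computes this uncorrected quantity; it is an illustration under stated assumptions (pre-aligned sequences of equal length, gap positions skipped, no substitution-model correction) and is not necessarily the estimator used to produce Table 1. The function names and toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nucleotide_diversity(aligned: list) -> float:
    """Mean pairwise p-distance over all pairs of aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    distances = [p_distance(a, b) for a, b in combinations(aligned, 2)]
    return sum(distances) / len(distances)


# Toy alignment (made up): two identical copies and one with a single change.
toy_family = ["ACGT", "ACGT", "ACGA"]
print(nucleotide_diversity(toy_family))
```

A recently amplified family would give a value close to zero, whereas an old family of diverged solo LTRs would give a markedly higher one.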
The majority of the full-length solo LTRs were
found flanked by short direct repeats. These are
target-site duplications (TSDs), which are formed
during the integration reaction. For most families
the associated TSDs are 5 bp in length (Table 1).
Analyses of the perfect 5 bp TSDs of retrotransposons from S. cerevisiae and C. albicans revealed a
strong preference for having A or T residues in the
three internal positions of the TSDs (Kim et al.,
1998; Goodwin and Poulter, 2000). A similar bias
was observed in the perfect 5 bp TSDs of the
C. neoformans elements. The percentages of A or T
residues at the five positions of the TSD (from a
total of 43 LTRs) are: 1, 42%; 2, 65%; 3, 65%; 4,
80%; and 5, 40%. Two of the C. neoformans LTRs,
LTRs 10 and 15, have 4 bp TSDs. Such TSDs are
common for Ty3/gypsy elements but, to the best of
our knowledge, have not been found for Ty1/copia
elements. It is thus suspected that these LTRs are
from Ty3/gypsy-group elements. For these LTRs, a
very strong bias towards inserting at RCGY
sequences (where R=purine and Y=pyrimidine)
was observed. A significant number of the
C. neoformans solo LTRs are not flanked by
sequences resembling TSDs. The loss of TSDs
from these elements is probably the result of
recombinations between LTRs at different sites
in the genome. The presence of such elements
suggests that some chromosomal rearrangements in
C. neoformans may occur at sites of transposable
element insertions.
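The positional A/T bias and the RCGY preference described in this section are simple to tabulate once a set of TSDs has been collected. The following is a minimal Python sketch with hypothetical function names and made-up TSDs; it is not the authors' procedure and does not reproduce the 43 TSDs analysed here.

```python
import re


def at_fraction_by_position(tsds: list) -> list:
    """Fraction of TSDs with A or T at each position (all TSDs same length)."""
    if not tsds:
        raise ValueError("no TSDs supplied")
    length = len(tsds[0])
    if any(len(t) != length for t in tsds):
        raise ValueError("all TSDs must have the same length")
    upper = [t.upper() for t in tsds]
    return [sum(t[i] in "AT" for t in upper) / len(upper) for i in range(length)]


# R = purine (A or G), Y = pyrimidine (C or T).
RCGY = re.compile(r"^[AG]CG[CT]$")


def is_rcgy(tsd: str) -> bool:
    """True if a 4 bp TSD matches the RCGY pattern."""
    return bool(RCGY.match(tsd.upper()))


# Made-up 5 bp TSDs, for illustration only.
toy_tsds = ["GATTC", "CAATG", "ATTAG", "GTACC"]
print(at_fraction_by_position(toy_tsds))
print([is_rcgy(t) for t in ["ACGT", "GCGC", "TCGA"]])
```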
Discussion
Here we have described the identification and initial
characterization of 15 or more families of LTR
retrotransposons and several families of non-LTR
retrotransposons from the medically important
yeast Cryptococcus neoformans. The description of
these repetitive elements should assist in the
assembly of the C. neoformans genome sequence
data and, moreover, may allow the development of
an efficient in vivo random tagged mutagenesis
system. The results also further our understanding
of the impact of transposable elements on genome
evolution.
Full-length sequences were readily assembled for
six of the C. neoformans LTR retrotransposon
families. Five of these are from Ty3/gypsy group
elements and one is from a Ty1/copia element. The
sequences represent structurally intact elements,
raising the possibility that these elements are still
active. This possibility is supported by an evident
high degree of sequence homogeneity in these
families, which suggests very recent transposition
events and, for the Ty3/gypsy elements in particular,
relatively high copy numbers. The elements for
which just partial sequences could be identified, in
general, appear to be present at lower copy
numbers and to be more diverse in sequence.
These families probably amplified some time ago
and have since accumulated mutations and diverged
in sequence. There may be few, if any, intact
elements remaining for these families.
An apparently full-length sequence was
assembled for one non-LTR retrotransposon
family, Cnl1. The sequence represents a structurally
intact element, containing all the expected motifs of
a functional element. The elements of this family
are highly homogeneous in sequence, suggesting
that they have recently transposed and may still be
active. The only other non-LTR retrotransposon
families identified appear to be low copy number
and degenerate relatives of Cnl1. Few, if any, of
these elements are likely to be active.
C. neoformans is the first basidiomycete for which
there is a large amount of sequence data available.
At present, the best-characterized fungal
genomes are all from ascomycetes. We found that few
of the C. neoformans retrotransposons are closely
related to previously identified fungal elements. For
instance, the several Ty1/copia-group elements are
more closely related to elements from plants and
animals than to other known fungal elements.
Similarly, the Ty3/gypsy group elements, Tcn2, 3,
4 and 5, are not close relatives of any previously
identified retrotransposons. These findings suggest
that the diversity of fungal retrotransposons has
only been sparsely sampled to date. They also suggest
that it will be necessary to characterize a wide
variety of the most basal eukaryotic lineages in
order to understand the full diversity of LTR
retrotransposons.
All the Ty3/gypsy elements of C. neoformans have
remarkable primer-binding sites. Tcn1 has a self-priming structure that resembles those of other members of the Tf1/sushi group but has
considerably more paired residues than any other
element. Tcn10 has a PBS complementary to a
cellular tRNA but which is much longer than any
other similar PBS. Tcn2, 3, 4, and 5 have PBSs,
consisting of single cytosine residues followed by
poly-thymidine tracts, the like of which have only
been seen once before — in the element skipper
from the slime mould, Dictyostelium discoideum.
These remarkable PBSs may once again reflect an
as yet sparse sampling of the diversity of fungal
retrotransposons. Alternatively, they may indicate
that exceptional PBSs are necessary for successful
transposition in C. neoformans. For instance,
because C. neoformans is a pathogen of warm-blooded animals, its retrotransposons will frequently be exposed to higher temperatures than
other characterized retrotransposons, the majority
of which have been found in plants, plant-pathogenic or saprophytic fungi or cold-blooded
animals. The higher temperatures that the
C. neoformans elements are exposed to may mean
that they require either greater stability in their
priming structures or novel priming mechanisms, in
order to transpose successfully.
The mechanism by which Tcn2–5 and skipper
prime minus-strand DNA synthesis is not known
at present. Leng et al. (1998) speculated that the
poly-A sequence at the end of the skipper
mRNA might anneal to the poly-T tract in the
PBS and subsequently be used as a minus-strand
primer. The fact that this poly-T tract is highly
conserved in the C. neoformans elements is consistent with this proposal. The finding of this novel
priming system, in what are probably active yeast
retrotransposons, should assist in its molecular
characterization.
Tcn2, 3, 4 and 5 do not appear to be closely
related to skipper on the basis of their RT
sequences, despite sharing the unusual priming
system. They also have quite different structures:
the four C. neoformans elements all encode their
Gag and Pol proteins in a single ORF, whereas
skipper has a conserved termination codon between
gag and pro, and a +1 phase-shift between pro and
the rest of pol. However, when three conserved Pol
domains are used to build phylogenetic trees,
skipper and the C. neoformans elements group
more closely together, largely because of similarities
in their INT domains. It seems likely that at
least some of these elements have a recombinant
origin.
Members of the C. neoformans non-LTR retrotransposon family Cnl1 are most commonly found
inserted within pre-existing Cnl1 elements and, in
these cases, they are always in the same orientation
as their targets. The Cnl1 elements thus form polar
arrays of nested retrotransposons. Cnl1 elements are
also frequently associated with tandem repeats of a
sequence thought to comprise the C. neoformans
telomeres. The putative telomeric repeats are usually
located at the 5′-most ends of the Cnl1 elements. This
suggests that the Cnl1 arrays are located close to the
ends of chromosomes with the 3′-ends of the elements
closest to the centromere. The structures produced by
the Cnl1 elements resemble Drosophila telomeres,
which consist of polar arrays of nested non-LTR retrotransposons (Pardue et al., 1996). The C.
neoformans telomeres may be similar to an evolutionary intermediate between typical telomerase-maintained telomeres and retrotransposon telomeres, and may make a useful model for studying
the transition between these two systems.
The number of distinct LTR retrotransposon
families varies considerably in different fungal
genomes (Table 2). Why should this be so? The
data available at present suggest that at least some
of the variation is due to differences in the rates at
which elements are lost from the genome. For
instance, nearly all the retrotransposon families in
both S. cerevisiae (Kim et al., 1998) and Sz. pombe
(Levin and Boeke, 1992; Hoff et al., 1998) are still
active, and there are no families represented just by
solo LTRs, suggesting that elimination of nonfunctional elements is rapid in these species. In
contrast, C. albicans has a large number of distinct
families (Goodwin and Poulter, 2000). Many of
these, such as those represented solely by heterogeneous groups of isolated LTRs, are inactive
and unlikely to have transposed in a long time.
Turnover of elements in C. albicans is therefore
likely to be very slow compared to S. cerevisiae or
Sz. pombe. The number of distinct LTR retrotransposon families in C. neoformans is intermediate
between S. cerevisiae/Sz. pombe and C. albicans
(Table 2). In addition, it has several families for
which only isolated LTRs have been identified to
date, but nowhere near as many of these families as
C. albicans. These findings suggest that elimination
of retrotransposons in C. neoformans is faster than
in C. albicans but not as rapid as in S. cerevisiae or
Sz. pombe.

Table 2. Numbers of LTR families in well-characterized fungal genomes

Species         Number of LTR families (a)   Solo LTR families (b)   Haploid genome size (Mb)   Phylum
C. neoformans   15                           5                       23                         Basidiomycota
S. cerevisiae   4                            0                       12                         Ascomycota
Sz. pombe       1                            0                       14                         Ascomycota
C. albicans     34                           18                      16                         Ascomycota

(a) The Ty1 and Ty2 LTRs of S. cerevisiae, and the Tf1 and Tf2 LTRs of Sz. pombe, would each make up single families under the classification scheme
used for C. albicans and C. neoformans.
(b) LTR families for which no associated internal regions have been identified.

Assuming that the rate at which elements are lost
from the genome makes a major contribution to the
diversity of elements in a species, there is still the
problem of why this rate should differ between
species. It is possible that differing modes of
reproduction in the host will affect the rate at
which transposable elements are eliminated. We
previously speculated that the apparent lack of
sexual reproduction in C. albicans may have
contributed to its relative abundance of LTR
elements, as, in the absence of meiotic recombination and out-crossing, even selectively disadvantageous insertions would be difficult to remove
efficiently from the population (Goodwin and
Poulter, 2000). To fully assess this proposal, it will
be necessary to have not only a detailed understanding of the diversity of transposable elements
within a wide variety of species but also a detailed
understanding of the population structure of each
species. Several studies suggest that the population
structure of C. albicans is predominantly clonal, as
expected given that C. albicans has long been
considered asexual (Pujol et al., 1993; Graser et al.,
1996). Population studies of C. neoformans show
that, like C. albicans, this organism has a predominantly clonal population structure (Chen et al.,
1995; Franzot et al., 1997), despite having a well-characterized sexual cycle. The structures of wild
S. cerevisiae and Sz. pombe populations have not
been described to date, although one might predict,
given the ease with which laboratory strains are
crossed, that they would undergo more recombination than C. albicans and perhaps more than
C. neoformans.
Other features of host genomes could also lead to
differing rates of elimination of transposable elements. For instance, the S. cerevisiae genome
appears to be particularly compact, as attested to
by its small intergenic regions, its rare introns and
its small size (Dujon, 1996). The same processes
that produce this streamlined genome may bring
about the rapid elimination of retrotransposons.
The Sz. pombe genome may be similarly compressed, as it is also very small (Fan et al., 1989)
and has few transposable elements. C. neoformans
is a unicellular organism like S. cerevisiae and
Sz. pombe and has a life cycle of similar complexity.
The fact that its genome is nearly twice as large as
those of S. cerevisiae or Sz. pombe suggests that it
has not been under the same pressure to shed nonessential DNA. This may also account for its
greater diversity of retrotransposons.
The overall diversity of transposable elements in
different species is probably the result of many
complex interactions between the elements and their
hosts. As more genomes are characterized, it should
be possible to determine the most important factors
controlling transposable element diversity and
evolution.
Acknowledgements
We thank Richard Hyman and the other members of the
C. neoformans sequencing project at the Stanford Genome
Technology Center and Nagasaki University for permission
to use their unpublished data. Sequencing of C. neoformans
at Stanford was accomplished with the support of the NIH.
References
Boeke JD, Stoye JP. 1997. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. In Retroviruses,
Coffin JM, Hughes SH, Varmus HE (eds). Cold Spring Harbor
Laboratory Press: New York; 343–435.
Casadevall A, Freundlich LF, Marsh L, Scharff MD. 1992.
Extensive allelic variation in Cryptococcus neoformans. J Clin
Microbiol 30: 1080–1084.
Chen F, Currie BP, Chen LC, Spitzer SG, Spitzer ED,
Casadevall A. 1995. Genetic relatedness of Cryptococcus
neoformans clinical isolates grouped with the repetitive DNA
probe CNRE-1. J Clin Microbiol 33: 2818–2822.
Devereux J, Haeberli P, Smithies O. 1984. A comprehensive set
of sequence analysis programs for the VAX. Nucleic Acids Res
12: 387–395.
Dujon B. 1996. The yeast genome project: what did we learn?
Trends Genet 12: 263–270.
Edman JC. 1992. Isolation of telomere-like sequences from
Cryptococcus neoformans and their use in high-efficiency
transformation. Mol Cell Biol 12: 2777–2783.
Fan JB, Chikashige Y, Smith CL, Niwa O, Yanagida M, Cantor
CR. 1989. Construction of a Not I restriction map of the
fission yeast Schizosaccharomyces pombe genome. Nucleic
Acids Res 17: 2801–2818.
Felsenstein J. 1989. PHYLIP — phylogeny inference package
(version 3.2). Cladistics 5: 164–166.
Franzot SP, Hamdan JS, Currie BP, Casadevall A. 1997.
Molecular epidemiology of Cryptococcus neoformans in Brazil
and the United States: evidence for both local genetic
differences and a global clonal population structure. J Clin
Microbiol 35: 2243–2251.
Franzot SP, Fries BC, Cleare W, Casadevall A. 1998. Genetic
relationship between Cryptococcus neoformans var. neoformans
strains of serotypes A and D. J Clin Microbiol 36: 2200–2204.
Goodwin TJD, Poulter RTM. 2000. Multiple LTR-retrotransposon families in the asexual yeast Candida albicans.
Genome Res 10: 174–191.
Graser Y, Volovsek M, Arrington J, et al. 1996. Molecular
markers reveal that population structure of the human
pathogen Candida albicans exhibits both clonality and recombination. Proc Natl Acad Sci U S A 93: 12473–12477.
Heitman J, Allen B, Alspaugh JA, Kwon-Chung KJ. 1999. On
the origins of congenic MATα and MATa strains of the
pathogenic yeast Cryptococcus neoformans. Fungal Genet Biol
28: 1–5.
Hoff EF, Levin HL, Boeke JD. 1998. Schizosaccharomyces
pombe retrotransposon Tf2 mobilizes primarily through homologous cDNA recombination. Mol Cell Biol 18: 6839–6852.
Kidwell MG, Lisch DR. 2000. Transposable elements and host
genome evolution. Trends Ecol Evol 15: 95–99.
Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF. 1998.
Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete
Saccharomyces cerevisiae genome sequence. Genome Res 8:
464–478.
Kimura M. 1980. A simple method for estimating evolutionary
rates of base substitutions through comparative studies of
nucleotide sequences. J Mol Evol 16: 111–120.
Kwon-Chung KJ. 1975. A new genus, Filobasidiella, the perfect
state of Cryptococcus neoformans. Mycologia 67: 1197–1200.
Kwon-Chung KJ, Bennett JE, Rhodes JC. 1982. Taxonomic
studies on Filobasidiella species and their anamorphs. Antonie
Leeuwenhoek 48: 25–38.
Kwon-Chung KJ, Bennett JE. 1992. Medical Mycology. Lea &
Febiger: Philadelphia, PA.
Leng P, Klatte DH, Schumann G, Boeke JD, Steck TL. 1998.
Skipper, an LTR retrotransposon of Dictyostelium. Nucleic
Acids Res 26: 2008–2015.
Levin HL, Boeke JD. 1992. Demonstration of retrotransposition
of the Tf1 element in fission yeast. EMBO J 11: 1145–1153.
Lin JH, Levin HL. 1997a. A complex structure in the mRNA of
Tf1 is recognized and cleaved to generate the primer of reverse
transcription. Genes Dev 11: 270–285.
Lin JH, Levin HL. 1997b. Self-primed reverse transcription is a
mechanism shared by several LTR-containing retrotransposons. RNA 3: 952–953.
Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for
improved detection of transfer RNA genes in genomic
sequence. Nucleic Acids Res 25: 955–964.
Malik HS, Burke WD, Eickbush TH. 1999. The age and
evolution of non-LTR retrotransposable elements. Mol Biol
Evol 16: 793–805.
Malik HS, Eickbush TH. 1999. Modular evolution of the
integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73: 5186–5190.
Marin I, Llorens C. 2000. Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary
perspectives derived from comparative genomic data. Mol Biol
Evol 17: 1040–1049.
Nei M, Li WH. 1979. Mathematical model for studying genetic
variation in terms of restriction endonucleases. Proc Natl Acad
Sci U S A 76: 5269–5273.
Pardue ML, Danilevskaya ON, Lowenhaupt K, Slot F, Traverse
KL. 1996. Drosophila telomeres: new views on chromosome
evolution. Trends Genet 12: 48–52.
Pujol C, Reynes J, Renaud F, et al. 1993. The yeast Candida
albicans has a clonal mode of reproduction in a population of
infected human immunodeficiency virus-positive patients. Proc
Natl Acad Sci U S A 90: 9456–9459.
Sandmeyer S. 1998. Targeting transposition: at home in the
genome. Genome Res 8: 416–418.
San Miguel P, Bennetzen JL. 1998. Evidence that a recent
increase in maize genome size was caused by the massive
amplification of intergene retrotransposons. Ann Bot 82:
37–44.
Schneider S, Roessli D, Excoffier L. 2000. Arlequin, version
2.000: a software for population genetics data analysis.
Genetics and Biometry Laboratory, University of Geneva,
Switzerland.
Smit AFA. 1999. Interspersed repeats and other mementos of
transposable elements in mammalian genomes. Curr Opin
Genet Dev 9: 657–663.
Smith V, Botstein D, Brown PO. 1995. Genetic footprinting: a
genomic strategy for determining a gene’s function given its
sequence. Proc Natl Acad Sci U S A 92: 6479–6483.
Tanaka R, Taguchi H, Takeo K, Miyaji M, Nishimura K. 1996.
Determination of ploidy in Cryptococcus neoformans by flow
cytometry. J Med Vet Mycol 34: 299–301.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins
DG. 1997. The CLUSTAL_X windows interface: flexible
strategies for multiple sequence alignment aided by quality
analysis tools. Nucleic Acids Res 25: 4876–4882.
Wickes BL, Moore TDE, Kwon-Chung KJ. 1994. Comparison
of the electrophoretic karyotypes and chromosomal location of
ten genes in the two varieties of Cryptococcus neoformans.
Microbiol 140: 543–550.
Xiong Y, Eickbush TH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences.
EMBO J 9: 3353–3362.
Yang J, Malik HS, Eickbush TH. 1999. Identification of the
endonuclease domain encoded by R2 and other site-specific,
non-long terminal repeat retrotransposable elements. Proc
Natl Acad Sci U S A 96: 7847–7852.