Analyses of RNA Polymerase II Genes from Free-Living Protists:
Phylogeny, Long Branch Attraction, and the Eukaryotic Big Bang
Joel B. Dacks,*1 Alexandra Marinets,†1 W. Ford Doolittle,* Thomas Cavalier-Smith,‡ and
John M. Logsdon, Jr.§
*Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular
Biology, Dalhousie University, Halifax; †Department of Botany, University of British Columbia, Vancouver; ‡Department of
Zoology, University of Oxford, South Parks Road, UK; and §Department of Biology, Emory University
The phylogenetic relationships among major eukaryotic protist lineages are largely uncertain. Two significant obstacles in reconstructing eukaryotic phylogeny are long-branch attraction (LBA) effects and poor taxon sampling
of free-living protists. We have obtained and analyzed gene sequences encoding the largest subunit of RNA Polymerase II (RPB1) from Naegleria gruberi (a heterolobosean), Cercomonas ATCC 50319 (a cercozoan), and Ochromonas danica (a heterokont); we have also analyzed the RPB1 gene from the nucleomorph (nm) genome of
Guillardia theta (a cryptomonad). Using a variety of phylogenetic methods, our analysis shows that RPB1s from
Giardia intestinalis and Trichomonas vaginalis are probably subject to intense LBA effects. Thus, the deep branching of these taxa on RPB1 trees is questionable and should not be interpreted as evidence favoring their early
divergence. Similar effects are discernable, to a lesser extent, with the Mastigamoeba invertens RPB1 sequence.
Upon removal of the outgroup and these problematic sequences, analyses of the remaining RPB1s indicate some
resolution among major eukaryotic groups. The most robustly supported higher-level clades are the opisthokonts
(animals plus fungi) and the red algae plus the cryptomonad nm—the latter result gives added support to the red
algal origin of cryptomonad chloroplasts. Clades comprising Dictyostelium discoideum plus Acanthamoeba castellanii (Amoebozoa) and Ochromonas plus Plasmodium falciparum (chromalveolates) are consistently observed and
moderately supported. The clades supported by our RPB1 analyses are congruent with other data, suggesting that
bona fide phylogenetic relationships are being resolved. Thus, the RPB1 gene has apparently retained some phylogenetically meaningful signal, making it worthwhile to obtain sequences from more diverse protist taxa. Additional
RPB1 data, especially in combination with other genes, should provide further resolution of branching orders among
protist groups within the apparently rapid early divergence of eukaryotes.
Introduction
In the late 1980s and early 1990s, small subunit
ribosomal DNA (ssu rDNA) painted a picture of three
major early evolving eukaryotic lineages: the diplomonads, parabasalids, and microsporidia. These groups were
followed sequentially by euglenozoans and heteroloboseans and an unresolved radiation of so-called crown
taxa, including animals, plants, fungi, and a number of
protists (Sogin 1991). This ssu rDNA-based view of eukaryotic relationships has been greatly weakened by
protein phylogenies, indicating that some taxa, such as
microsporidia (Keeling and Doolittle 1996; Germot,
Philippe, and Le Guyader 1997; Hirt et al. 1999) and
Mycetozoa (Baldauf and Doolittle 1997; Baldauf et al.
2000), are seriously misplaced on ssu rDNA trees. Related studies reveal phylogenetic artifacts in ssu rDNA
trees formerly thought to support the apparently early
divergence of diplomonads and parabasalids (Hirt et al.
1999; Silberman et al. 1999; Stiller and Hall 1999; Philippe et al. 2000b) and suggest that the root of the ssu
rDNA tree using bacterial outgroups may be misplaced
(Embley and Hirt 1998).
1 Contributed equally to this paper.
Abbreviations: RPB1, RNA Polymerase II largest subunit; LBA,
long-branch attraction; CTD, carboxy-terminal domain of RPB1; nm,
nucleomorph.
Key words: evolution, Naegleria, Cercomonas, Ochromonas, intron, nucleomorph.
Address for correspondence and reprints: John M. Logsdon Jr.,
Department of Biology, Emory University, 1111 Rollins Research
Center, 1510 Clifton Road, Atlanta, Georgia 30322.
E-mail: [email protected].
Mol. Biol. Evol. 19(6):830–840. 2002
© 2002 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
How to reconcile the ssu rDNA and protein sequence evidence is hotly debated and has prompted several alternative views of eukaryotic relationships. The
‘‘eukaryotic big bang’’ hypothesis suggests that eukaryotes evolved in a massive radiation of 4–10 groups
whose interrelationships are fundamentally irresolvable
(Philippe and Adoutte 1998; Philippe, Germot, and Moreira 2000a). An alternative view, based on combined
protein data, proposes two superclades of eukaryotes:
one group, called opisthokonts (Cavalier-Smith 1987),
contains animals, fungi, and their choanozoan relatives,
whereas the other contains plants, chromists, and most
protozoa (Embley and Hirt 1998; Dacks and Roger
1999; Baldauf et al. 2000; Edgcomb et al. 2001). As
rRNA trees also invariably and robustly resolve
this dichotomy between opisthokonts and the rest of eukaryotes (Sogin 1991; Cavalier-Smith 1993, 2000),
some of the broad features of both rRNA and protein
trees can be reconciled and are congruent with key ultrastructural data (Cavalier-Smith 2000, 2002). However, a number of unanswered questions remain: the evolutionary affinities of the many protist groups not clearly
attributable to any major grouping, the root of the eukaryotic tree, and the identity of early evolving lineages.
Some difficulties are largely methodological, including
artifacts arising from long-branch attraction (LBA). Another problem, more easily remediable, is poor taxon
sampling—many protein trees entirely omit key, often
free-living, protist groups.
RPB1s from Free-Living Protists
The largest subunit of RNA Polymerase II (RPB1)
has been one of the few proteins used to address issues
of major eukaryotic relationships (Stiller and Hall 1997;
Stiller and Hall 1998; Hirt et al. 1999). RPB1 is large
(ca. 1,600 amino acid residues), and phylogenetic trees
of this molecule can be outgroup-rooted by either its archaebacterial homologs or by its eukaryotic-specific paralogs, RPA1 or RPC1 (the largest subunits of RNA Polymerase I and III, respectively). However, because RPB1
genes are so large, they have not been characterized in a
wide variety of eukaryotic species: the taxonomic representation of RPB1 is particularly sparse compared with
other molecules, such as tubulins or ssu rDNA (see Baldauf et al. 2000). RPB1 orthologs have been well sampled and characterized from animals and fungi. RPB1
sequences are also available for parasitic protists once
thought to be early emerging eukaryotes, but notably
lacking are sequences from free-living protists. Organisms heretofore missing from RPB1 analyses include heterokonts (or stramenopiles), cercomonads, heteroloboseans, and the cryptomonad nucleomorph (nm).
The first three are each monophyletic groups, having a variety of proposed larger-scale evolutionary affinities. The heterokonts are a collection of algae and
secondary heterotrophs that have recently been proposed
as related to the alveolates (Cavalier-Smith 1999; Fast
et al. 2001). Cercomonads are related to various filose
amoebae, thaumatomonads, and chlorarachniophytes
(collectively Cercozoa: Cavalier-Smith 1998), and possibly also to foraminifera (Keeling 2001). Heterolobosea
were proposed as an early evolving lineage because of
ssu rDNA evidence and their lack of Golgi dictyosomes
(Cavalier-Smith 1993) but are now thought to be related
to Euglenozoa in a larger excavate assemblage (Simpson
and Patterson 1999; Cavalier-Smith 2002). The cryptomonad nm is the relict nucleus of an anciently captured red algal cell (Douglas et al. 1991, 2001).
In this study we cloned and sequenced the gene
encoding RPB1 from Ochromonas danica (heterokont),
Cercomonas ATCC50319, and Naegleria gruberi (heterolobosean). We have also analyzed RPB1 from the
Guillardia theta nm. These data provide significant additions to the diversity of protist taxa represented in the
RPB1 data set. We examine the evolutionary affinities
of these lineages, as well as the effects of LBA in this
data set, by a number of phylogenetic methods and consider the implications of RPB1 phylogeny for the evolution of transcription, spliceosomal introns, and the
overall pattern of eukaryotic evolution.
Materials and Methods
DNA
Purified DNA from N. gruberi was generously donated by R. J. Redfield (University of British Columbia).
Total genomic DNA was extracted from O. danica
(ATCC 30004) and Cercomonas ATCC 50319 using a
CTAB extraction (Lichtenstein and Draper 1985).
PCR Amplification
Conserved regions A–D of the Naegleria RPB1
gene sequence were amplified using the degenerate PCR
primers RPB1-F1 (GAG TGT CCA GGN CAY TTY
GG) and RPB1-R2 (GTC GAA GTC TGC RTT RTA
NGG) described in Hirt et al. (1999). After sequencing
of this fragment, an exact-match primer, RPB1-N5X1
(AAG ATG GTA CAC GTA TCG), was used in combination with the reverse degenerate primer RPB1-R4
(TG GAA CGT ATT NAR NGT CAT) to obtain the
remaining regions used for phylogenetic analysis. A second exact-match primer, RPB1-N3X1 (CAA GGG TAC
TGA TGA ATT GTC), was used in combination with
degenerate primer CTDR1 (TGA TAG ACT GGN GAN
GTN GG) to amplify the remaining portion of the gene,
including conserved region H and a portion of the carboxy-terminal domain (CTD). All PCR fragments were
cloned into TOPO 2.1 vector using the TOPO TA cloning kit (Invitrogen). Each clone was sequenced on
LI-COR and ABI automated sequencers. All clones were
sequenced in both directions, and the full gene sequence
was assembled with two- to sixfold coverage.
Degenerate primers (described in Stiller and Hall
1997; Stiller, Duffield, and Hall 1998) were used for
PCR amplification of RPB1 regions A–D, D–F, and F–
G (Stiller and Hall 1997) from Cercomonas
ATCC50319 and O. danica. Products were cloned into
TOPO TA vectors (Invitrogen) and completely sequenced
using ABI sequencing protocols.
Phylogeny
All sequences were obtained from NCBI, with the
exception of those from N. gruberi (AF395110), O. danica (AF395111), and Cercomonas ATCC50319
(AF395835) reported herein. These three RPB1 sequences plus three RPA1 sequences as outgroups were
manually added to a previously published alignment
(Hirt et al. 1999) using MacClade 4.0 (Maddison and
Maddison 2000). The final alignment used for global
phylogenetic analysis contained 22 taxa and 746 aligned
amino acid sites. A sub-data set with the outgroups removed was also analyzed. These alignments are available upon request. The microsporidial RPB1 sequences
were not included in these analyses as they represent
highly divergent fungi (Hirt et al. 1999), and two representative, less divergent, fungal sequences were used
instead.
Maximum parsimony analyses were performed using PAUP* 4.0b (Swofford 1998), whereas Neighbor-Joining and Fitch distance analyses used Phylip 3.573
(Felsenstein 1995). Protein maximum likelihood (ML)
analyses were done using two methods. Puzzle 4.0.2
(Strimmer and von Haeseler 1997) was used incorporating a gamma correction for among-site rate variation
plus a correction for invariant sites (8 + 1 rate categories) estimated from the data set. A Neighbor-Joining
tree, estimated by Puzzle 4.0.2, was used as a basis for
site rate calculations. In addition, a protML 2.2 (Adachi
and Hasegawa 1996) heuristic (-q 10,000) search was
performed for each data set. For the protML analyses,
the relative estimated log likelihood values (RELLs)
were calculated using Mol2con (A. Stoltzfus, personal
communication). Although full ML heuristic searches
were done to search for the optimal topology using
ProML 3.6a (Felsenstein 1995), the optimal topology
found by ProML contradicted several nodes supported
by all other methods in our analyses, including some
which are well established in the literature; these alternate nodes were not supported by any of the other methods at greater than 50%. For this reason, the trees shown
are the best protML topology (i.e., with the highest log
likelihood) with branch lengths estimated in Puzzle to
incorporate gamma-distributed rates and invariant sites.
Although these protML trees provide an accurate representation of RPB1 phylogeny, they may not be the
overall best trees because of computational limitations.
ML distance analyses used Tree-Puzzle 4.0.2 (previously
called Puzzle; Strimmer and von Haeseler 1997) to calculate ML distance matrices along with Puzzleboot (A.
Roger and M. Holder, personal communication;
www.tree-puzzle.de); resampled matrices were then analyzed using Fitch (Felsenstein 1995) with global rearrangements and 10 times jumbling. All bootstrap support values are based on 100 replicates.
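The bootstrap step behind these support values amounts to resampling alignment columns with replacement before each reanalysis. The following is a minimal Python sketch of that resampling only, assuming the alignment is held as a dictionary of equal-length strings; the function name `bootstrap_alignment` and the toy sequences are illustrative and are not taken from the study or from Puzzleboot.

```python
import random

def bootstrap_alignment(alignment, n_replicates=100, seed=0):
    """Yield pseudo-replicate alignments by resampling columns with replacement."""
    rng = random.Random(seed)
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(alignment[name]) != n_sites for name in names):
        raise ValueError("all sequences must have the same aligned length")
    for _ in range(n_replicates):
        # Draw n_sites column indices, each independently and with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}

# Hypothetical toy alignment (not real RPB1 data).
toy = {"taxonA": "MKVLAT", "taxonB": "MKILAS", "taxonC": "MRVLGT"}
replicates = list(bootstrap_alignment(toy, n_replicates=100))
```

Each replicate would then be passed to the distance or tree-building program, and the support for a node is the fraction of replicates in which it is recovered.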
After the LBA tests, a new alignment was constructed initially using Clustal X (Thompson et al. 1997)
and then adjusted manually such that only regions of
unambiguously alignable sequence were retained for
analysis (17 taxa, 910 sites). Phylogenetic analyses for
this restricted data set were identical to those done on
the global set.
LBA Tests
For the 22 taxon, 746 site data set, evolutionary
rates at all amino acid sites were calculated using Puzzle
4.0.2 (Strimmer and von Haeseler 1997), and selected
sites were removed manually in MacClade 4.0 (Maddison and Maddison 2000). For fast site removal (FSR),
all sites calculated to be in the fastest rate category (category 8 of 8) were removed. For constant site removal
(CSR) those in the slowest, i.e., invariant, class (category 0) were removed. For fast and constant site removal (FCSR), sites in both categories were eliminated.
Heuristic protML analyses (-q 10,000) were performed
and RELL values determined as described previously.
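The three site-removal schemes can be pictured as column filters on the alignment. Below is a minimal Python sketch, assuming each aligned site has already been assigned a rate category (0 for invariant, 8 for the fastest class, as described above). The function `remove_sites`, the toy alignment, and the category values are hypothetical; in the study the sites were removed manually in MacClade.

```python
def remove_sites(alignment, site_categories, drop):
    """Return a copy of the alignment without columns whose category is in `drop`."""
    n_sites = len(site_categories)
    for name, seq in alignment.items():
        if len(seq) != n_sites:
            raise ValueError(f"{name}: sequence length does not match category list")
    keep = [i for i, cat in enumerate(site_categories) if cat not in drop]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

FASTEST, INVARIANT = 8, 0  # category labels as described in the text

# Hypothetical toy alignment and per-site rate categories (not real data).
toy = {"t1": "MKVLA", "t2": "MKILS", "t3": "MKVLG"}
cats = [0, 0, 5, 0, 8]

fsr = remove_sites(toy, cats, {FASTEST})              # fast sites removed
csr = remove_sites(toy, cats, {INVARIANT})            # constant sites removed
fcsr = remove_sites(toy, cats, {FASTEST, INVARIANT})  # both classes removed
```

Keeping the filter separate from the rate estimation makes it easy to rerun the same tree search on each reduced data set and compare support for the same nodes.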
For establishing the autapomorphy to symplesiomorphy ratio, each class of substitution was assessed manually from the 22-taxon alignment. Autapomorphies
were defined as unique substitutions at an otherwise invariant position within the in-group taxa and not shared
with the outgroups. The outgroups, however, did not
also have to share the invariant residue with the other
in-groups. Symplesiomorphies were defined as substitutions shared between an in-group and at least two of
the three outgroup taxa but different from the other in-groups at an otherwise invariant position for those in-groups. Substitutions for both classes were tallied for
each in-group taxon individually as well as for several
taxonomic groups (fungi, animals, red algae, kinetoplastids) to compensate for uneven taxon sampling in
the alignment. For these groups the final substitution
count was the sum of those substitutions shared by the
group and the average of the substitutions found in each
of the component taxa.
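The two substitution classes defined above can be expressed as a simple per-column rule. The Python sketch below is one possible reading of those definitions, not the manual procedure used in the study: a column is scored only when exactly one in-group taxon differs from an otherwise invariant in-group, and the differing residue is then compared with the outgroups. Skipping columns that contain gaps in the in-group is an added assumption, and all names and sequences are hypothetical.

```python
from collections import Counter

def classify_substitutions(ingroup, outgroup, min_outgroup_shared=2):
    """Count autapomorphies and symplesiomorphies per in-group taxon."""
    names = list(ingroup)
    n_sites = len(ingroup[names[0]])
    aut, symp = Counter(), Counter()
    for i in range(n_sites):
        column = {name: ingroup[name][i] for name in names}
        if "-" in column.values():      # assumption: ignore gapped columns
            continue
        counts = Counter(column.values())
        if len(counts) != 2:            # need one shared residue plus one variant
            continue
        (_, n_major), (minor, n_minor) = counts.most_common(2)
        if n_minor != 1 or n_major < 2:
            continue                    # variant must be unique to a single taxon
        odd = next(name for name, res in column.items() if res == minor)
        shared = sum(1 for seq in outgroup.values() if seq[i] == minor)
        if shared == 0:
            aut[odd] += 1               # unique, not shared with any outgroup
        elif shared >= min_outgroup_shared:
            symp[odd] += 1              # shared with at least two outgroups
    return aut, symp

# Hypothetical toy data (not real RPB1 or RPA1 sequences).
ingroup = {"A": "MKVLA", "B": "MKVLA", "C": "MRVLA", "D": "MKVIA"}
outgroup = {"O1": "MKVIA", "O2": "MQVIA", "O3": "MKVIG"}
aut, symp = classify_substitutions(ingroup, outgroup)
```

In this toy case taxon C carries one autapomorphy (R at the second site) and taxon D one symplesiomorphy (I at the fourth site, shared with all three outgroups). The group-level tallies described in the text would need an additional step that is not sketched here.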
RASA 2.5 (Lyons-Weiler and Hoelzer 1999) was
used to assess the phylogenetic signal in the various data
sets and to identify long-branch sequences in a phylogeny-independent fashion. Outgroup-rooted analyses (using the analytical method in RASA) were performed on
the 22-taxon data set, including the RPA1 sequences.
Unrooted RASA analyses were performed with the
RPA1 sequences removed; these analyses were performed with both the analytical and permutation methods, the latter with 30 replicates.
Results
Physical Attributes of the RPB1 Sequences
Using various degenerate and specific primers, a
4,790 nt ORF (encoding 1,596 amino acid residues), uninterrupted by introns, was amplified from N. gruberi.
Only conserved regions A–G were amplified from Cercomonas ATCC50319 and O. danica, resulting in RPB1
sequences of 3,258 nt (1,055 amino acid residues) and
3,191 nt (1,062 residues), respectively. Whereas the O.
danica sequence was uninterrupted by introns, the Cercomonas sequence had two, of 119 and 94 bases,
respectively.
Degenerate primers were used to amplify the CTD
of RPB1 from N. gruberi. This region of RPB1 in animals, plants, and fungi, as well as a number of protists
is composed of heptad repeats with a canonical sequence
of YSPTSPS (Lam et al. 1992; Stiller, Duffield, and Hall
1998). Because this region was determined by PCR amplification, the precise number of repeats in Naegleria
is not known, although at least eight heptads are present.
Several repeats in the Naegleria CTD are degenerate,
before beginning a more canonical YSPTSPA/
YSPTSPN register. There is no information regarding
the presence of CTDs in Cercomonas and Ochromonas
RPB1 genes because they would be outside the amplified regions.
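A tandem run of heptads of the kind described above can be located with a simple pattern scan. The Python sketch below assumes a strict YSPTSP core with a free seventh position, so the degenerate repeats mentioned in the text would not be counted; the function name and the toy sequence are hypothetical and do not reproduce the Naegleria CTD.

```python
import re

# Canonical heptad with a free seventh position (YSPTSPS, YSPTSPA, YSPTSPN, ...).
HEPTAD = re.compile(r"YSPTSP[A-Z]")

def longest_heptad_run(protein):
    """Return the longest run of back-to-back heptads as a list of strings."""
    best = []
    i = 0
    while i < len(protein):
        run = []
        j = i
        while True:
            m = HEPTAD.match(protein, j)  # anchored match starting at offset j
            if m is None:
                break
            run.append(m.group())
            j = m.end()
        if len(run) > len(best):
            best = run
        # Skip past a run that was found, otherwise advance one residue.
        i = j if run else i + 1
    return best

# Hypothetical toy C-terminal sequence (not real data).
toy_ctd = "MAAYSPTSPSYSPTSPAYSPTSPNGGYSPTSPS"
run = longest_heptad_run(toy_ctd)
```

Relaxing the pattern (for example, allowing substitutions in the core) would be needed to pick up degenerate repeats, at the cost of more false positives in serine- and proline-rich regions.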
Uniquely, the G. theta nm RPB1 lacks a distinctive
CTD, apparently ending abruptly at the end of domain
H (the final conserved block in RPB1). Interestingly, we
note a possible tandem pair of degenerate heptad repeats: YSLSLKLF-YSMMKNF in one of three ORFs
annotated as hypothetical protein genes in the 1,699 bp
region immediately downstream of the putative RPB1
stop codon and before the next bona fide gene (rpl37A).
However, database searches with this region did not reveal any significant similarity with CTDs (or any other
proteins). Codon usage for the three ORFs resembles
that for the RPB1 gene but with so few codons in each
ORF, this may not be statistically significant. However,
the GC content of the RPB1 gene is higher (27%) than
that of any of the three ORFs (which range from 17%
to 20% GC). For any of these regions to be part(s) of
the RPB1 gene, one or more spliceosomal intron(s)
would appear to be required at the 3′ end of the nm
RPB1, unlike the other known G. theta nm introns,
which are all found at the extreme 5′ ends of genes.
Determination of the 3′ end sequence of the RPB1
mRNA would be required to verify the absence of a
CTD or whether transcription extends into the short
CTD-like region in this downstream ORF (or both).
FIG. 1. Global RPB1 phylogeny. Data sets of 746 aligned amino acid positions were analyzed by protML (ML), Puzzle (PZ), Puzzleboot
(PB), Maximum Parsimony (MP) and PAM-corrected distance methods (DI) with values given at each node in that order. In all trees, nodes
that are not supported by more than 50% with at least one method have no values listed for them and dashes denote cases where the topology
shown was not reconstructed by a particular method. Topologies shown represent the optimal protML tree with branch lengths estimated in
Puzzle incorporating a gamma correction and invariant sites. Sequences in bold are those obtained in this study. A, Rooted analysis of 22 taxa,
including 19 RPB1 sequences and 3 RPA1 outgroups. Uppercase letters denote nodes of interest matched to values in figure 1B. B, Results of
site removal test for long branches. As the optimal topology was the same for all site-removal analyses, the nodes listed are directly comparable
with the phylogeny in figure 1A. NoSR = no sites removed, FSR = fast sites removed, CSR = constant sites removed, and F + CSR = fast
and constant sites removed. Of particular note is the change in support for nodes J and I after site removal.
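The GC-content comparison reported above for the nm RPB1 gene and the downstream ORFs is a simple base count. A minimal Python sketch follows, assuming that only unambiguous A, C, G, and T characters should contribute to the denominator; the function name and the toy sequence are illustrative only.

```python
def gc_percent(dna):
    """Return percent G+C among unambiguous bases of a DNA string."""
    seq = dna.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

# Hypothetical AT-rich toy sequence (not from the G. theta nucleomorph).
toy_gc = gc_percent("ATGCATTTAA")
```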
Global Phylogenetic Analysis
To assess the phylogenetic placement of our new
RPB1s, amino acid sequences from diverse eukaryotes
were aligned with RPA1 homologs from an animal, a
plant, and a fungus. This 22-taxon data set of 746 unambiguously aligned positions was subjected to rigorous
phylogenetic analysis (fig. 1A). The Giardia sequence
emerged as the earliest RPB1 branch followed by the
Trichomonas RPB1 sequence, both with apparently
good support. Neither the Naegleria nor the Cercomonas sequences were strongly placed in these trees,
whereas the Ochromonas sequence grouped with Plasmodium with variable, but modest support. Beyond
those nodes that were universally supported, parsimony
and distance analyses did not seem to provide significant
resolution but were consistent with the seemingly more
resolved ML analyses.
Tests for Long Branches
It has been previously suggested that the Giardia
and Trichomonas RPB1 sequences represent long
branches (Hirt et al. 1999; Stiller and Hall 1999) which
may artifactually place them as early emerging taxa
(Philippe et al. 2000b). Of the in-group sequences, the
Giardia sequence is clearly a long branch (fig. 1A). We
subjected our 22-taxon data set to a number of tests
devised to detect long-branch effects in our analysis (Lyons-Weiler, Hoelzer, and Tausch 1996; Stiller, Duffield,
and Hall 1998; Hirt et al. 1999).
Hirt et al. (1999) showed that failure to correct for
invariant or rapidly evolving sites could lead to artifactual resolution in phylogenies, especially when using
protML in which the assumption of rate constancy is
applied to all sites. A simple method of fast and constant site removal
was used to compensate for this artifact. To
assess the effect of these site rate categories in our data
set, protML (-q 10,000) searches were carried out with
fast, constant, and fast plus constant sites removed (fig.
1B).
As in Hirt et al. (1999), our optimal ML
topology was unchanged by site removal, but the support for several nodes was significantly affected. Although the node separating the Giardia RPB1 and the
three RPA1 sequences from the rest remained robust,
support for the node placing the Trichomonas sequence
with Giardia and outgroups dropped from 78% to 51%
RELL support with fast sites removed and to 42% with
fast plus constant sites removed (fig. 1B). This suggests
that the deep placement of Trichomonas might also be
artifactual. Interestingly, the node uniting Ochromonas
and Plasmodium rose from 78% RELL support to 95%
and 94% with fast and fast plus constant sites removed,
respectively (fig. 1B). It is possible that long-branch effects are masking the real phylogenetic signal in this
case.
Without knowing the location of the root, it is difficult to distinguish whether a long branch is caused by
rapid sequence evolution or early divergence. Stiller,
Duffield, and Hall (1998) proposed a method that may
help to do so. They realized that the ratio of unique
substitutions (autapomorphies) in a sequence to the
shared substitutions with outgroups (symplesiomorphies) should be relatively uniform even in early diverging eukaryotes, if the rate of evolution were fairly
constant and the earliest branches did not precede others
by an immensely long time. However, a high ratio of
autapomorphies to symplesiomorphies indicates rapid
sequence divergence in a taxon rather than slow, but
ancient, evolution. In the 746 amino acid alignment,
Giardia has 24 autapomorphies, Ochromonas and Trichomonas have 8 and 7, respectively, kinetoplastids have
6, and no other taxa or group of taxa have more than 4.
Giardia only has five symplesiomorphies, Naegleria has
one, as does the Homo sapiens sequence. The exceptionally numerous autapomorphies in Giardia are suggestive of rapid evolution in its RPB1 gene but do not
preclude the possibility of it also being an ancient lineage among eukaryotes.
RASA (Lyons-Weiler and Hoelzer 1999) assesses
phylogenetic signal by measuring expected distribution
of synapomorphies in a data set against a null hypothesis
of a random distribution. It also identifies those taxa
contributing more statistical noise than phylogenetic signal in the same data set. As seen in figure 2A, when the
rooted 22-taxon 746-site data set was analyzed using
RASA, the Giardia sequence was clearly identified as a
long-branch sequence. A similar result was obtained
when the outgroup sequences were manually removed
(fig. 2B). As indicated in figure 2A–D, those taxa with
the largest taxon variance were sequentially removed
from the data set until the observed variance distribution
was relatively even (fig. 2E). In each case of taxon deletion, the tRASA value rose or (in the case of Mastigamoeba) was not markedly decreased (fig. 2 lower). Although the status of Mastigamoeba as a long branch is
debatable, it was also deleted from our analysis, both to be
conservative regarding possible long-branch artifacts and to reduce the computational
load of further phylogenetic analyses.
Restricted Phylogeny
After removing long-branch taxa, the remaining sequences were aligned to give a data set of 16 taxa with
910 unambiguously aligned positions. Its tRASA value
was highest among those tested (fig. 2, lower), and little
heterogeneity was seen in taxon variance among all sequences (fig. 2F). The G. theta nm sequence was then
manually added to the alignment. Analysis of this restricted data set robustly reconstructed the major groups
of red algae, animal, fungi, and Euglenozoa and gave
limited resolution of the branching order among the major eukaryotic groups (fig. 3A). We recovered an Amoebozoa clade (sensu Baldauf et al. 2000) with the slime
mould Dictyostelium discoideum and the amoeba Acanthamoeba castellanii grouping together with moderate
affinity. A weak chromalveolate clade of Ochromonas
and Plasmodium was also recovered; alternatively, the
Plasmodium sequence clustered with the euglenozoan
sequences (Trypanosoma and Leishmania) with the
Ochromonas sequence adjacent. Because these four are
the four longest-branching sequences that remain in this
particular data set (fig. 3A), it is possible that moderate
LBA effects (not detectable by RASA) could be masking the chromalveolate relationship. Removing constant
and fast plus constant sites modestly improved support
for this relationship to 60% and 67%, respectively (data
not shown). When euglenozoan RPB1 sequences were
removed, support for chromalveolates and Amoebozoa
changed to 81% and 84% RELL and 61% and 71%
Puzzle support, respectively (fig. 3B).
Some cryptomonad nm genes are highly diverged
and give long branches in phylogenetic analyses (Keeling et al. 1999; Archibald et al. 2001). However, when
the Guillardia nm RPB1 sequence is aligned into a data
set lacking other long-branch taxa (i.e., our 16-taxon, 910-site alignment, above), it is recovered as a sister to the
red algae with strong support (fig. 3A). This result reinforces previous evidence that the cryptomonad nm is
the remnant nucleus of a captured red alga (Douglas et
al. 1991; Douglas and Penny 1999). Addition or exclusion of the long-branch nm sequence did not significantly alter support or topology of the rest of the tree.
FIG. 2. Taxon variance graphs and tRASA values for long-branch removal. RASA analyses were used to sequentially remove taxa until
tRASA was maximized and taxon variance graphs appeared relatively even. Graphs of taxon variance from analytical analysis by RASA 2.5
are shown in the upper panel. Each data set is identified by the number of taxa, followed by the number of sites. In each case where a clearly
long branch is present, that sequence is labeled. Taxa are always listed from top to bottom and are numbered as follows: 1 = Homo sapiens, 2
= Caenorhabditis elegans, 3 = Drosophila melanogaster, 4 = Schizosaccharomyces pombe, 5 = Saccharomyces cerevisiae, 6 = Acanthamoeba
castellanii, 7 = Dictyostelium discoideum, 8 = Cercomonas ATCC50319, 9 = Ochromonas danica, 10 = Naegleria gruberi, 11 = Mastiga-
Discussion
Our analysis of RPB1 sequences has provided additional insights into the evolutionary relationships
among eukaryotes, including considerable support for
chromalveolates and Amoebozoa and a red algal affinity
for the cryptomonad nm. In addition, these data also
provide the opportunity to consider the functional evolution of RPB1, itself an important piece of the eukaryotic transcriptional apparatus.
Evolution of Transcription
The RNA Polymerase II complex is responsible for
transcription and processing of messenger RNAs, with
the largest subunit, RPB1, being central. A comparison
of RPB1 sequences from diverse eukaryotes allows us
to examine its functional evolution, particularly at the
putatively identified functional and highly conserved regions. Block A is the region to which amplification
primers were designed, so no information is available
for the three sequences obtained by PCR. However, it is
notable that the Guillardia nm RPB1 deviates from the
Cys2-His2 zinc finger motif that characterizes this region
(Cornelissen, Evers, and Kock 1988): the first histidine
position of the conserved Cys-X2-Cys-X9-His-X2-His
motif is replaced with a tyrosine. However, this is the
only clear deviation in the nm sequence (as compared
with the red algal sequences) of a previously identified
critical residue. Blocks B through E are present and well
conserved in all sequences that we obtained from free-living protists. Block F, the location of the catalytic sites
(Wlassoff, Kimura, and Ishihama 1999), is well conserved across all taxa. This region also contains residues
identified for α-amanitin sensitivity (reviewed by Quon,
Delgadillo, and Johnson 1996), including Arg-741, Cys-777, and Gly-785; all are perfectly conserved in our
RPB1s. Conserved domains G and H are both implicated in binding RPB6, a subunit important for RNA polymerase complex formation (Minakhin et al. 2001). In
particular, the residues PGEMV in domain G and DAFDVMIDEES in domain H have been pointed out as
contact points. In Naegleria and the Guillardia nm,
these residues in domain G are perfectly conserved. The
region H residues are less well conserved in the Naegleria sequence and the nm gene seems truncated at the
end of this region. Implications of these observations
for RPB6 binding and complex formation will require
direct experimental inquiry.
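The Cys-X2-Cys-X9-His-X2-His spacing discussed above for block A translates directly into a pattern search, which also makes it easy to flag a substitution at the first histidine position of the kind noted for the nm sequence. The Python sketch below is illustrative only: the function name and both toy sequences are hypothetical, and a pattern match on spacing alone says nothing about whether a zinc finger is functional.

```python
import re

# Cys-X2-Cys-X9-His-X2-His, with X standing for any residue.
CANONICAL = re.compile(r"C.{2}C.{9}H.{2}H")
# Same spacing, but capturing whatever residue occupies the first His position.
RELAXED = re.compile(r"C.{2}C.{9}(.).{2}H")

def zinc_finger_status(protein):
    """Classify a sequence by the state of its Cys2-His2 spacing motif."""
    if CANONICAL.search(protein):
        return "canonical"
    m = RELAXED.search(protein)
    if m:
        return f"first His replaced by {m.group(1)}"
    return "motif not found"

# Hypothetical toy sequences (not real RPB1 block A sequences).
canonical_toy = "MKCPECAAAAAAAAAHFGHLE"
variant_toy = "MKCPECAAAAAAAAAYFGHLE"  # Tyr at the first His position
status_canonical = zinc_finger_status(canonical_toy)
status_variant = zinc_finger_status(variant_toy)
```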
The carboxy-terminal domain (CTD) of RPB1 has
a number of transcriptional and posttranscriptional functions, in regulating transcription efficiency and coupling
it to pre-mRNA processing: capping, splicing, 3′ end
cleavage, and polyadenylation. The phosphorylation of
serines 2 and 5 of the heptad repeats is particularly critical. Interestingly, though, Stiller, McConaughy, and
Hall (2000) demonstrated that the last serine of the
YSPTSPS heptad, although highly conserved, is not essential and can be substituted by a nonphosphorylatable
residue. The CTD from Naegleria is congruent with this,
having nonphosphorylatable residues (either alanine or
asparagine) at this position. In addition to regulatory
effects by phosphate addition, the action of the prolyl isomerase ESS1 in yeast also seems to exert a regulatory
effect at the CTD (Wu et al. 2000). In line with this, the
Mastigamoeba and Naegleria CTDs are perfectly conserved at both proline positions.
Despite the CTD being implicated in many aspects
of transcription and RNA processing, several protists appear to be devoid of a bona fide CTD, instead having
only serine- and proline-rich regions at the carboxy terminal end. Even in taxa where the conservation of the
repeats is strong, CTDs sometimes contain a number of
noncanonical repeats. Similarly, of the eight repeats
known in the Naegleria RPB1 CTD, three diverge significantly from the canonical heptad sequence. These
data suggest that the exact sequence of the repeat may
not be critical and that the conservation of the repeats
may be correlated with the rigor with which the function
is required. Because mRNA processing occurs in some
taxa where the CTD is diminished or absent, they may
have different mechanisms of transcription regulation. It
also underlines that the careful functional work carried
out with RPB1 in animals and fungi needs to be taken
in an evolutionary context and not generalized to other
species without direct evidence. Comparative studies, as
here, may help in generalizing to all eukaryotes.
Evolution of Splicing and Introns
The RPB1 CTD plays a major role as a platform
for construction of the spliceosome, (reviewed by Hirose
and Manley 2000). We have observed a relationship between spliceosomal intron density and the presence of a
CTD. For intron-rich species like mammals, the efficiency of spliceosome binding to the CTD may be paramount, perhaps forcing strict adherence to the heptad
repeat sequence. However, for intron-sparse organisms,
(FIG. 2 legend, continued) …moeba invertens, 12 = Arabidopsis thaliana, 13 = Bonnemaisonia hamifera, 14 = Porphyra yezoensis, 15 = Plasmodium falciparum, 16 =
Trichomonas vaginalis, 17 = Trypanosoma brucei, 18 = Leishmania donovani, 19 = Giardia intestinalis. In panels A and B, taxa are ordered
1–19. In C, taxa are ordered as 1–18. Panel D contains taxa 1–15, 17, and 18. In panel E, taxa are ordered 1–10, 12–15, 17, and 18. Panel F
has taxa in the following order: 13, 14, 6, 7, 12, 8, 2, 3, 1, 4, 5, 10, 9, 15, 17, 18. Figure 2, lower tabulates the tRASA values under analytical
(ana) and permutation (perm) models of null slope estimation. The numbering of the data sets matches those in figure 2, upper.
FIG. 3.—Unrooted, taxon-restricted RPB1 phylogeny. Figure 3A shows an unrooted phylogeny arbitrarily rooted on the kinetoplastid
sequences. ProtML RELL (ML), Puzzle (PZ), and Puzzleboot (PB) support values are shown at all nodes over 50%. Taxa in bold are those
obtained in this study or newly analyzed (the G. theta nm). Upper case letters correspond to nodes of interest matched to values in figure 3B.
Figure 3B tabulates the support values for nodes upon removal of the kinetoplastids (taxa denoted with an asterisk) and reanalysis. Notably, the
support for the chromalveolate and Amoebozoa nodes was particularly affected.
this conservation might be relaxed. RPB1s from Trichomonas and Giardia (organisms not known to contain
spliceosomal introns; Logsdon 1998) lack CTDs with
canonical repeats but instead have serine-proline-rich C-terminal regions, possibly representing degenerate
CTDs. Other protists also show possible CTD degeneration (Stiller and Hall 1997; Stiller, Duffield, and Hall
1998). Interestingly, the low intron density in Naegleria
(Logsdon 1998) matches its abnormal CTD. Without
knowing the location of the eukaryotic root or even a
well-resolved eukaryotic phylogeny, we cannot be sure
whether this and similar cases in other protists are degenerate or early stages of CTD evolution. The apparent
absence of a CTD from the Guillardia nm RPB1 contrasts with the presence of 17 spliceosomal introns in its
genome (Douglas et al. 2001). If the CTD is indeed
missing from the Guillardia nm RPB1, it is very likely
caused by loss because both red and green algae contain
either bona fide heptad repeats or clearly degenerate repeats. Whether the absence of CTD from the Guillardia
nm affects the transcription-processing functions and
represents a singular loss event or a general feature of
genome diminution and intron loss are interesting questions, now open to investigation. As RPB1 genes are
sequenced from a diversity of eukaryotes and as more
protist genomes are studied, the relationship between the
CTD and the evolution and spread of spliceosomal introns will be clarified.
Eukaryotic Phylogeny
Our analyses of RPB1 phylogeny reveal support,
for the first time with this molecule, for some higher-level groupings among major eukaryotic lineages. Although RPB1 does not provide robust resolution between some major eukaryotic groups, the opisthokonts
(animals plus fungi), Amoebozoa, and chromalveolates
are moderately supported, as they are for other phylogenetic markers (Baldauf and Palmer 1993; Baldauf et
al. 2000; Fast et al. 2001). While our paper was in preparation, another RPB1 analysis (Stiller, Riley, and Hall
2001) confirmed the alveolate relationship providing a
new ciliate sequence and showed glaucophytes as an
outgroup to red algae. In the analyses shown here (fig.
3), the cryptomonad nm clearly groups with red algae
(Douglas et al. 1991), though it is unclear whether it
will group within the strong glaucophyte-red algal clade.
Unfortunately, neither the Naegleria nor Cercomonas
RPB1 sequences show strong affinity for any others in
our data set.
Two apparently robust nodes in our 22-taxon phylogeny were those separating the Giardia and then the
Trichomonas sequences from the other eukaryotic RPB1
sequences. However, this need not mean they are actually early emerging. In line with previous suggestions
and results (Stiller, Duffield, and Hall 1998; Hirt et al.
1999), our various tests indicate that these two sequences are particularly divergent; thus, their placement as
early evolving lineages is suspect. The site removal (fig.
1B), autapomorphy-symplesiomorphy ratio, and RASA
(fig. 2) analyses confirmed that the Giardia and Trichomonas RPB1 sequences represent long branches within
the analysis. The Trypanosoma sequence has also been
suggested as a long branch; however, the Leishmania
sequence appears to divide this branch and somewhat
reduce its effects. Although our LBA analyses neither
indicate an alternate placement for Giardia or Trichomonas nor prove that these sequences are not early
evolving, they strongly concur with prior suggestions
that the deeply diverging position of diplomonads and
parabasalids be viewed with caution.
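As a rough illustration of why an excess of taxon-specific character states flags a long branch, the following sketch counts, for each taxon, the alignment columns where it carries a state found in no other taxon. This is a crude, unpolarized proxy under stated assumptions. It is not the autapomorphy-symplesiomorphy ratio or the RASA procedure used in our analyses, and the alignment and taxon names are hypothetical.

```python
from collections import Counter


def singleton_counts(alignment: dict[str, str]) -> dict[str, int]:
    """Per-taxon count of columns where the taxon's state is unique.

    Gaps and ambiguity symbols ('-', '?', 'X') are never counted.
    No outgroup polarity is applied, so this only approximates
    an autapomorphy count.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")

    counts = {name: 0 for name in names}
    for col in range(length):
        column = {name: alignment[name][col] for name in names}
        freq = Counter(s for s in column.values() if s not in "-?X")
        for name, state in column.items():
            if freq.get(state) == 1:  # state seen in exactly one taxon
                counts[name] += 1
    return counts


# Hypothetical toy alignment (NOT real RPB1 data).
toy_alignment = {
    "taxonA": "MKTAYL",
    "taxonB": "MKTAYL",
    "taxonC": "MKSAYL",
    "taxonD": "MRTGWL",
}
unique_states = singleton_counts(toy_alignment)
```

In this toy case taxonD accumulates the most unique states, which is the pattern expected of a divergent sequence. A real test would also need shared states and an outgroup to polarize changes.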
When long-branch taxa are excluded, we see less
apparent resolution than in previous reports (Stiller, Duffield, and Hall 1998; Hirt et al. 1999) or in our global
analyses (fig. 1). This suggests that long-branch taxa
may structure the data set and provide false resolution.
It is therefore important to view with caution any conclusions based on RPB1 phylogenies which include
long-branch taxa; their presence may obscure other relationships. Our restricted data set has some resolution
at the supertaxon level, consistent with data from morphological and other molecular analyses. In particular,
the chromalveolates and Amoebozoa are reconstructed
with moderate support (fig. 3), as are the opisthokonts;
the latter two are notable, given their previous lack of
resolution by RPB1 (Stiller, Duffield, and Hall 1998;
Hirt et al. 1999), including a seemingly well supported,
but contradictory, placement of animals and fungi (Sidow and Thomas 1994). An opisthokont plus Amoebozoa branch is recovered in the optimal topology, consistent with other data (Baldauf et al. 2000), but it is not
statistically supported or recovered by other methods.
Stiller, Riley, and Hall (2001) have recently provided
evidence from RPB1 for the separation of red and green
algae; in the analyses done here (including the removal
of long-branch taxa) we find no support for this separation. Although we do not recover a monophyletic plant
clade (red algae and land plants), there is no significant
support for its polyphyly. Indeed, Moreira, Le Guyader,
and Philippe (2000) also showed that RPB1 was the sole exception among a variety of genes that otherwise unite red and green algae, and that analyses of RPB1 are
not strongly inconsistent with this clade.
In its initial formulation, the eukaryotic big bang
hypothesis stated that the major eukaryotic groups were
formed in an explosive radiation yielding as many as 10
or as few as four fundamentally unresolvable groups
(Philippe and Adoutte 1998). In the past few years, a
number of these (and other) major eukaryotic groups
have been confidently placed together using concatenated data (Baldauf et al. 2000; Moreira, Le Guyader, and
Philippe 2000), novel taxon inclusion (Dacks et al.
2001), or alternative protein markers (Hirt et al. 1999;
Moreira, Le Guyader, and Philippe 2000; Keeling 2001;
Fast et al. 2001). Consequently, we doubt that the large-scale relationships between eukaryotes are fundamentally unresolvable by conventional molecular phylogenetics. Recent incarnations of the eukaryotic big bang
hypothesis have focused on the time span of the radiation and less on fundamental lack of resolution among
lineages (Philippe, Germot, and Moreira 2000a). The
major eukaryotic supertaxa probably did evolve rapidly,
in line with the observation that most single genes have
a consistent but weak signal. However, that radiation
probably left behind a phylogenetic signal that could be
unraveled with more data and additional analyses. This
means that the eukaryotic big bang and superclade
views are not as incompatible as they might first appear.
Using several different genes to establish internal relationships may prove more productive and robust than
seeking the deepest diverging taxa using single genes
only. Given the relative success of RPB1 in placing phylogenetically difficult taxa (Hirt et al. 1999; Stiller, Riley, and Hall 2001) and our demonstration of some larger-scale eukaryotic resolution, building a well-represented RPB1 database may help clarify some of these
internal relationships.
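Combining several genes, as suggested above, usually starts by concatenating per-gene alignments into one supermatrix. The sketch below shows one simple way to do this, padding a taxon with gaps where a gene is missing. The function name, the gap-padding choice, and the toy sequences are assumptions for illustration only, not a description of any published data set.

```python
def concatenate(genes: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments end to end, one row per taxon.

    Each gene is a mapping of taxon name -> aligned sequence.
    A taxon absent from a gene is padded with '-' for that gene's width.
    """
    taxa = sorted(set().union(*genes))
    parts = {taxon: [] for taxon in taxa}
    for gene in genes:
        widths = {len(seq) for seq in gene.values()}
        if len(widths) != 1:
            raise ValueError("each gene must be a non-empty, equal-length alignment")
        (width,) = widths
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments (NOT real data).
gene1 = {"taxonA": "MK", "taxonB": "MR"}
gene2 = {"taxonA": "TTT", "taxonC": "TSA"}
supermatrix = concatenate([gene1, gene2])
```

Gap padding keeps every taxon in the matrix but leaves sparsely sampled taxa with little signal, so in practice taxon sampling per gene matters as much as the number of genes.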
Note Added in Proof
Two recent papers have demonstrated that the diverse amoebae Mastigamoeba, Entamoeba and Dictyostelium form a monophyletic group, Conosa (Arisue, N.,
T. Hashimoto, J. A. Lee, D. V. Moore, P. Gordon, C. W.
Sensen, T. Gaasterland, M. Hasegawa, and M. Muller.
2002. The phylogenetic position of Mastigamoeba balamuthi based on sequences of rDNA and translation
elongation factors EF-1α and EF-2. J. Eukaryot. Microbiol. 49:1–10; Bapteste, E., H. Brinkmann, J. A. Lee, D.
V. Moore, C. W. Sensen, P. Gordon, L. Durufle, T. Gaasterland, P. Lopez, M. Muller, and H. Philippe. 2002. The
analysis of 100 genes supports the grouping of three
highly divergent amoebae: Dictyostelium, Entamoeba,
and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:14–
19). A third paper using concatenated mitochondrial
proteins (Forget, L., J. Ustinova, Z. Wang, V. A. R.
Huss, and B. F. Lang. 2002. Hyaloraphidium curvatum:
a linear mitochondrial genome, tRNA editing, and an
evolutionary link to lower fungi. Mol. Biol. Evol. 19:
310–319) shows independently of our nuclear gene evidence and that of Baldauf et al. (2000) that Acanthamoeba also is specifically related to Dictyostelium. This
extensive evidence for the monophyly of Amoebozoa is
not strongly contradicted by our analyses that do not
place Mastigamoeba invertens with the other two amoebae; the RPB1 data set seems sensitive to long branch
effects and M. invertens acts as a long branch. The position of M. invertens is similarly non-robust on gamma
corrected 18S rRNA trees, where it often does not group
with other amoebae (and never with M. balamuthi; T.C.-S., unpublished data). In addition, a spliceosomal intron
has been recently discovered in Giardia (Nixon, J. E.,
A. Wang, H. G. Morrison, A. G. McArthur, M. L. Sogin,
B. J. Loftus, and J. Samuelson. 2002. A spliceosomal
intron in Giardia lamblia. Proc. Natl. Acad. Sci. USA
99:3701–3705). Thus, Giardia must still be capable of
splicing despite its abnormal CTD; this is consistent
with our suggestion of widespread CTD degeneration in
protists.
Acknowledgments
We would like to thank Alastair Simpson, Banoo
Malik, Lesley Davis, and Andrew Roger for critical
reading of the manuscript and helpful comments. We
also thank two anonymous reviewers for their helpful
comments. This work was made possible by grants to
W.F.D. from the CIHR (Grant MT4467), to T.C.-S. from
NSERC (Canada) and NERC (UK) and to J.M.L. from
the NIH (GM19656). J.B.D. was supported by a CIHR
Doctoral Research Award as well as a Walter C. Sumner
scholarship. A.M. was partly supported by a grant from
CIAR. T.C.-S. thanks the CIAR Evolutionary Biology
Program and NERC for Fellowship support.
LITERATURE CITED
ADACHI, J., and M. HASEGAWA. 1996. MOLPHY. Version 2.3.
Programs for molecular phylogenetics based on maximum
likelihood. Computer Science Monographs 28.
ARCHIBALD, J., T. CAVALIER-SMITH, U. MAIER, and S. DOUGLAS. 2001. Molecular chaperones encoded by a reduced nucleus—the cryptomonad nucleomorph. J. Mol. Evol. 52:
490–501.
BALDAUF, S. L., and W. F. DOOLITTLE. 1997. Origin and evolution of the slime molds (Mycetozoa). Proc. Natl. Acad.
Sci. USA 94:12007–12012.
BALDAUF, S. L., and J. D. PALMER. 1993. Animals and fungi
are each other’s closest relatives: congruent evidence from
multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558–
11562.
BALDAUF, S. L., A. J. ROGER, I. WENK-SIEFERT, and W. F.
DOOLITTLE. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–977.
CAVALIER-SMITH, T. 1987. The origin of fungi and pseudofungi. Pp. 339–353 in A. D. M. RAYNER, C. M. BRASIER, and
D. MOORE, eds., Evolutionary biology of the fungi, Vol. 13.
Symp. Br. Mycol. Soc. Cambridge University Press,
Cambridge.
———. 1993. Kingdom Protozoa and its 18 phyla. Microbiol.
Rev. 57:953–994.
———. 1998. A revised six-kingdom system of life. Biol. Rev.
Camb. Philos. Soc. 73:203–266.
———. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 46:347–366.
———. 2000. Flagellate megaevolution: the basis for eukaryote diversification. Pp. 361–390 in J. R. GREEN and B. S.
C. LEADBEATER, eds. The Flagellates. Taylor and Francis,
London.
———. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52:297–354.
CORNELISSEN, A. W., R. EVERS, and J. KOCK. 1988. Structure
and sequence of genes encoding subunits of eukaryotic
RNA polymerases. Oxf. Surv. Eukaryot. Genes 5:91–131.
DACKS, J., and A. J. ROGER. 1999. The first sexual lineage and
the relevance of facultative sex. J. Mol. Evol. 48:779–783.
DACKS, J. B., J. D. SILBERMAN, A. G. SIMPSON, S. MORIYA,
T. KUDO, M. OHKUMA, and R. J. REDFIELD. 2001. Oxymonads are closely related to the excavate taxon Trimastix.
Mol. Biol. Evol. 18:1034–1044.
DOUGLAS, S., S. ZAUNER, M. FRAUNHOLZ, M. BEATON, S. PENNY, L. T. DENG, X. WU, M. REITH, T. CAVALIER-SMITH, and
U. G. MAIER. 2001. The highly reduced genome of an enslaved algal nucleus. Nature 410:1091–1096.
DOUGLAS, S. E., C. A. MURPHY, D. F. SPENCER, and M. W.
GRAY. 1991. Cryptomonad algae are evolutionary chimaeras of two phylogenetically distinct unicellular eukaryotes.
Nature 350:148–151.
DOUGLAS, S. E., and S. L. PENNY. 1999. The plastid genome
of the cryptophyte alga, Guillardia theta: complete sequence and conserved synteny groups confirm its common
ancestry with red algae. J. Mol. Evol. 48:236–244.
EDGCOMB, V. P., A. J. ROGER, A. G. SIMPSON, D. T. KYSELA,
and M. L. SOGIN. 2001. Evolutionary relationships among
‘‘jakobid’’ flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18:514–522.
EMBLEY, T. M., and R. P. HIRT. 1998. Early branching eukaryotes? Curr. Opin. Genet. Dev. 8:624–629.
FAST, N. M., J. C. KISSINGER, D. S. ROOS, and P. J. KEELING.
2001. Nuclear-encoded, plastid-targeted genes suggest a
single common origin for apicomplexan and dinoflagellate
plastids. Mol. Biol. Evol. 18:418–426.
FELSENSTEIN, J. 1995. PHYLIP (phylogeny inference package).
Department of Genetics, University of Washington, Seattle.
GERMOT, A., H. PHILIPPE, and H. LE GUYADER. 1997. Evidence for the loss of mitochondria in Microsporidia from a
mitochondrial-type HSP70 in Nosema locustae. Mol.
Biochem. Parasitol. 87:159–168.
HIROSE, Y., and J. L. MANLEY. 2000. RNA polymerase II and
the integration of nuclear events. Genes Dev. 14:1415–
1429.
HIRT, R. P., J. M. LOGSDON JR., B. HEALY, M. W. DOREY, W.
F. DOOLITTLE, and T. M. EMBLEY. 1999. Microsporidia are
related to Fungi: evidence from the largest subunit of RNA
polymerase II and other proteins. Proc. Natl. Acad. Sci.
USA 96:580–585.
KEELING, P. J. 2001. Foraminifera and cercozoa are related in
actin phylogeny: two orphans find a home? Mol. Biol. Evol.
18:1551–1557.
KEELING, P. J., J. A. DEANE, C. HINK-SCHAUER, S. E. DOUGLAS, U. G. MAIER, and G. I. MCFADDEN. 1999. The secondary endosymbiont of the cryptomonad Guillardia theta
contains alpha-, beta-, and gamma-tubulin genes. Mol. Biol.
Evol. 16:1308–1313.
KEELING, P. J., and W. F. DOOLITTLE. 1996. Alpha-tubulin from
early-diverging eukaryotic lineages and the evolution of the
tubulin family. Mol. Biol. Evol. 13:1297–1305.
LAM, T. Y., L. CHAN, P. YIP, and C. H. SIU. 1992. The largest
subunit of RNA polymerase II in Dictyostelium: conservation of the unique tail domain and gene expression. Biochem. Cell. Biol. 70:792–799.
LICHTENSTEIN, C. P., and J. DRAPER. 1985. Genetic engineering
in plants. Pp. 102–103 in D. M. Glover, ed. DNA cloning:
a practical approach. IRL Press, Oxford.
LOGSDON, J. M. JR. 1998. The recent origins of spliceosomal
introns revisited. Curr. Opin. Genet. Dev. 8:637–648.
LYONS-WEILER, J., and G. A. HOELZER. 1999. Null model selection, compositional bias, character state bias, and the limits of phylogenetic information. Mol. Biol. Evol. 16:1400–
1406.
LYONS-WEILER, J., G. A. HOELZER, and R. J. TAUSCH. 1996.
Relative apparent synapomorphy analysis (RASA). I: the
statistical measurement of phylogenetic signal. Mol. Biol.
Evol. 13:749–757.
MADDISON, D. R., and W. P. MADDISON. 2000. MacClade 4;
analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Mass.
MINAKHIN, L., S. BHAGAT, A. BRUNNING, E. A. CAMPBELL, S.
A. DARST, R. H. EBRIGHT, and K. SEVERINOV. 2001. Bacterial RNA polymerase subunit omega and eukaryotic RNA
polymerase subunit RPB6 are sequence, structural, and
functional homologs and promote RNA polymerase assembly. Proc. Natl. Acad. Sci. USA 98:892–897.
MOREIRA, D., H. LE GUYADER, and H. PHILIPPE. 2000. The
origin of red algae and the evolution of chloroplasts. Nature
405:69–72.
PHILIPPE, H., and A. ADOUTTE. 1998. The molecular phylogeny
of Eukaryota: solid facts and uncertainties. Pp. 25–56 in G.
COOMBS, K. VICKERMAN, M. SLEIGH, and A. WARREN, eds.
Evolutionary relationships among Protozoa. Chapman &
Hall, London.
PHILIPPE, H., A. GERMOT, and D. MOREIRA. 2000a. The new
phylogeny of eukaryotes. Curr. Opin. Genet. Dev. 10:596–
601.
PHILIPPE, H., P. LOPEZ, H. BRINKMANN, K. BUDIN, A. GERMOT,
J. LAURENT, D. MOREIRA, M. MULLER, and H. LE GUYADER. 2000b. Early-branching or fast-evolving eukaryotes?
An answer based on slowly evolving positions. Proc. R.
Soc. Lond. B. Biol. Sci. 267:1213–1221.
QUON, D. V., M. G. DELGADILLO, and P. J. JOHNSON. 1996.
Transcription in the early diverging eukaryote Trichomonas
vaginalis: an unusual RNA polymerase II and alpha-amanitin–resistant transcription of protein-coding genes. J. Mol.
Evol. 43:253–262.
SIDOW, A., and W. K. THOMAS. 1994. A molecular evolutionary framework for eukaryotic model organisms. Curr. Biol.
4:596–603.
SILBERMAN, J. D., C. G. CLARK, L. S. DIAMOND, and M. L.
SOGIN. 1999. Phylogeny of the genera Entamoeba and Endolimax as deduced from small-subunit ribosomal RNA sequences. Mol. Biol. Evol. 16:1740–1751.
SIMPSON, A. G. B., and D. J. PATTERSON. 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota), with reference to the ‘‘excavate hypothesis.’’ Eur. J. Protistol. 35:
353–370.
SOGIN, M. L. 1991. Early evolution and the origin of eukaryotes. Curr. Opin. Genet. Dev. 1:457–463.
STILLER, J. W., E. C. DUFFIELD, and B. D. HALL. 1998. Amitochondriate amoebae and the evolution of DNA-dependent
RNA polymerase II. Proc. Natl. Acad. Sci. USA 95:11769–
11774.
STILLER, J. W., and B. D. HALL. 1997. The origin of red algae:
implications for plastid evolution. Proc. Natl. Acad. Sci.
USA 94:4520–4525.
———. 1998. Sequences of the largest subunit of RNA polymerase II from two red algae and their implications for
rhodophyte evolution. J. Phycol. 34:857–864.
———. 1999. Long-branch attraction and the rDNA model of
early eukaryotic evolution. Mol. Biol. Evol. 16:1270–1279.
STILLER, J. W., B. L. MCCONAUGHY, and B. D. HALL. 2000.
Evolutionary complementation for polymerase II CTD
function. Yeast 16:57–64.
STILLER, J. W., J. RILEY, and B. D. HALL. 2001. Are red algae
plants? A critical evaluation of three key molecular data
sets. J. Mol. Evol. 52:527–539.
STRIMMER, K., and A. VON HAESELER. 1997. Puzzle. Zoologisches Institut. Universitat Muenchen, Munich.
SWOFFORD, D. L. 1998. PAUP*: phylogenetic analysis using
parsimony (* and Other Methods). Sinauer Associates, Sunderland, Mass.
THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN,
and D. G. HIGGINS. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment
aided by quality analysis tools. Nucleic Acids Res. 25:
4876–4882.
WLASSOFF, W. A., M. KIMURA, and A. ISHIHAMA. 1999. Functional organization of two large subunits of the fission yeast
Schizosaccharomyces pombe RNA polymerase II. Location
of the catalytic sites. J. Biol. Chem. 274:5104–5113.
WU, X., C. B. WILCOX, G. DEVASAHAYAM, R. L. HACKETT,
M. AREVALO-RODRIGUEZ, M. E. CARDENAS, J. HEITMAN,
and S. D. HANES. 2000. The Ess1 prolyl isomerase is linked
to chromatin remodeling complexes and the general transcription machinery. EMBO J. 19:3727–3738.
Geoffrey McFadden, reviewing editor
Accepted January 15, 2002