Root of the Eukaryota Tree as Inferred from Combined Maximum
Likelihood Analyses of Multiple Molecular Sequence Data
Nobuko Arisue,*1 Masami Hasegawa,* and Tetsuo Hashimoto*‡§
*Department of Biosystems Science, Graduate University for Advanced Studies (Sokendai), Hayama, Kanagawa, Japan;
The Institute of Statistical Mathematics, Minato-ku, Tokyo, Japan; ‡The Rockefeller University;
§Institute of Biological Sciences, University of Tsukuba, Tsukuba, Japan
Extensive studies aiming to establish the structure and root of the Eukaryota tree by phylogenetic analyses of molecular
sequences have thus far not resulted in a generally accepted tree. To re-examine the eukaryotic phylogeny using
alternative genes, and to obtain a more robust inference for the root of the tree as well as the relationship among major
eukaryotic groups, we sequenced the genes encoding isoleucyl-tRNA and valyl-tRNA synthetases, cytosolic-type heat
shock protein 90, and the largest subunit of RNA polymerase II from several protists. Combined maximum likelihood
analyses of 22 protein-coding genes including the above four genes clearly demonstrated that Diplomonadida and
Parabasala shared a common ancestor in the rooted tree of Eukaryota, but only when the fast-evolving sites were
excluded from the original data sets. The combined analyses, together with recent findings on the distribution of a fused
dihydrofolate reductase–thymidylate synthetase gene, narrowed the possible position of the root of the Eukaryota tree on
the branch leading to Opisthokonta or to the common ancestor of Diplomonadida/Parabasala. However, the analyses did
not agree with the position of the root located on the common ancestor of Opisthokonta and Amoebozoa, which was
argued by Stechmann and Cavalier-Smith [Curr. Biol. 13:R665–666, 2003] based on the presence or absence of a three-gene fusion of the pyrimidine biosynthetic pathway: carbamoyl-phosphate synthetase II, dihydroorotase, and aspartate
carbamoyltransferase. The presence of the three-gene fusion recently found in the Cyanidioschyzon merolae
(Rhodophyta) genome sequence data supported our analyses against the rooting proposed by Stechmann and Cavalier-Smith in 2003.
Introduction
The prominent part of eukaryotic diversity is represented by unicellular organisms, the protists. Based primarily on their morphology and biology, protists can be
assigned to several dozen well-characterized groups (Lee,
Leedale, and Bradbury 2000). Many attempts have been
made to establish a natural, phylogenetic system of eukaryotes, but the relationships and the order of evolutionary
emergence of many diverse groups remain unresolved,
primarily because of a lack of clear synapomorphies. Phylogenetic inferences based on molecular sequences promised to provide a natural system, but thus far they have failed
to give unequivocal results. As yet no consensus has been
reached on either the structure or the root of the eukaryotic
part of the universal tree. Recent articles have shown trees
with essentially unresolved origins of many lineages (Dacks
and Doolittle 2001; Roger and Silberman 2002; Simpson
and Roger 2002; Baldauf 2003).
Analysis of increasing numbers of sequences from
more and more species resulted in trees that often resolved
the relationships within individual lineages well, but more
distant, deeper relationships were more often contradictory
than not. Recent developments in our understanding of the
processes of molecular evolution and the development of
more discriminating techniques of sequence analysis uncovered several reasons for the difficulty in attaining the
desired goal with single sequences (Philippe et al. 2000;
Gribaldo and Philippe 2002). Among these are the progressive
loss of phylogenetic information resulting from mutational saturation of diverging sequences, the long branch
attraction (LBA) artifact of phylogenetic reconstruction
(Felsenstein 1978), and the failure to model the group-specific and species-specific differences in the evolution of
different positions of the macromolecules studied. Lateral
gene transfers (LGT) have also been recognized recently to
contribute to the discordance of different gene trees
(Richards et al. 2003).
1 Present address: Department of Molecular Protozoology, Research
Institute for Microbial Disease, Osaka University, Suita, Osaka 565-0871,
Japan.
Key words: Diplomonadida, Parabasala, eukaryote evolution, root,
maximum likelihood, combined phylogeny.
E-mail: [email protected].
Mol. Biol. Evol. 22(3):409–420. 2005
doi:10.1093/molbev/msi023
Advance Access publication October 20, 2004
To overcome the lack of resolution in single-gene
phylogenies, various extensive analyses based on combined
data sets with multiple genes have recently been performed,
and a monophyletic origin of each of the higher-order
groups, Opisthokonta (Metazoa + Fungi/Microsporidia),
Amoebozoa (Lobosa + Conosa), Plantae (Viridiplantae +
Rhodophyta + Glaucophyta), Euglenozoa + Heterolobosea, and Alveolata + stramenopiles has been established
(Moreira, Le Guyader, and Philippe 2000; Baldauf et al.
2000; Arisue et al. 2002a, 2002b; Bapteste et al. 2002). In
addition to these groups, the presence of several other
higher-order groups has also been suggested by molecular
and/or morphological findings (for review see Baldauf
2003). The group Cercozoa was supported by the actin and
SSUrRNA phylogenies (Keeling 2001; Cavalier-Smith and
Chao 2003a, 2003b) and by a shared insertion in the
polyubiquitin genes (Archibald et al. 2003). The “excavate
taxa” was proposed as a putative monophyletic or paraphyletic group (O’Kelly and Nerad 1999; Simpson and
Patterson 1999), which includes organisms possessing a
ventral feeding groove that collects suspended particles
driven into it by the beating of a posterior flagellum.
Cavalier-Smith (2002) further proposed a larger group,
“Excavata,” based on an unrooted SSUrRNA tree and morphological considerations. Excavata comprises Metamonada
(including Diplomonadida), Parabasala, Percolozoa (including Heterolobosea), Euglenozoa, and Loukozoa (including
Oxymonadida, Trimastix, Malawimonas, Carpediomonas,
Jakobea). To date, however, examination of the molecular
phylogenies of SSUrRNA and tubulins has not supported
either monophyly or paraphyly of Excavata with any
statistical confidence (Dacks et al. 2001; Edgcomb et al.
2001; Silberman et al. 2002; Simpson et al. 2002).
Molecular Biology and Evolution vol. 22 no. 3 © Society for Molecular Biology and Evolution 2004; all rights reserved.
Only small numbers of higher-order groups are
present in the tree of Eukaryota as mentioned above, and
an overall picture of biodiversity of Eukaryota seems to be
rather good and comprehensive (Cavalier-Smith 2004).
However, the phylogenetic relationships of the higher-order groups are still uncertain, and many alternative possibilities still exist regarding the root of the tree of Eukaryota.
On the basis of distribution of the dihydrofolate reductase (DHFR) and thymidylate synthase (TS) fused gene,
Stechmann and Cavalier-Smith (2002) proposed that the
root is likely to be located between Opisthokonta and the
others. Later they argued that the root should be located
between the bikonts and Opisthokonta/Amoebozoa
(Stechmann and Cavalier-Smith 2003a), together with independent lines of evidence for other gene fusion events
in the pyrimidine biosynthetic pathway. Particular attention was also paid to the presence of a fusion of the
genes for carbamoyl-phosphate synthetase (CPS) II, which is
composed of glutamine amidotransferase (GAT) and
CPS domains, dihydroorotase (DHO), and aspartate
carbamoyltransferase (ACT); this fusion is found exclusively
in Metazoa, Fungi, and the amoebozoan Dictyostelium
discoideum (Nara, Hashimoto, and Aoki 2000).
In the present study, in order to obtain a robust
resolution for the evolutionary relationship among major
eukaryotic groups and to gain a better insight into the
possible root of the eukaryotic tree, we performed combined
maximum likelihood (ML) analyses based on 24 genes
concerning the relationships among seven major eukaryotic
groups (Opisthokonta, Amoebozoa, Plantae, Euglenozoa/
Heterolobosea, Alveolata/stramenopiles, Diplomonadida,
and Parabasala) using an outgroup for rooting the tree. For
this purpose we cloned and sequenced the genes from
several protists coding for isoleucyl-tRNA and valyl-tRNA
synthetases (IleRS, ValRS), cytosolic-type heat shock
protein 90 (HSP90c), and the largest subunit of RNA
polymerase II (RPB1). Analysis of all selected sites from
original alignments for the 24 genes including two rRNAs
was strongly affected by the LBA artifact, significantly
positioning Diplomonadida at the base of the eukaryotic
tree. However, analysis of 22 protein-coding genes using
only slowly evolving amino acid sites demonstrated clearly
that Diplomonadida and Parabasala are closely related, and
that an early emergence of the common ancestor of these
two groups is not necessarily exclusively supported. Our
present analyses, together with findings on the distribution
of the fused DHFR-TS gene as mentioned above, narrowed
the possible position of the root of the Eukaryota tree on
the branches leading to Opisthokonta or to the common
ancestor of Diplomonadida/Parabasala.
Materials and Methods
Sequencing of Protist Genes
The original sequences (and the GenBank Accession
Numbers) reported in this work were these: IleRS of Glugea
plecoglossi [Microsporidia] (AB092420), Encephalitozoon
hellem [Microsporidia] (AB092421), Entamoeba histolytica (AB092423 and AB092424), Giardia intestinalis
(AB092425), Trichomonas vaginalis (AB092426), Trypanosoma cruzi (AB092427), Plasmodium falciparum (AB092428); ValRS of G. plecoglossi (AB092429), E. hellem
(AB092430), E. histolytica (AB092433 and AB092434),
T. cruzi (AB092435), P. falciparum (AB092436); HSP90c
of G. intestinalis (AB092407 and AB092408), T. vaginalis
(AB092409 and AB092410), E. histolytica (AB092411);
and RPB1 of G. intestinalis (AB092412), E. histolytica
(AB092413). Details of cloning and sequencing strategies
are described in the Supplementary Materials online.
Sequence Alignments of the Genes Used for the
Phylogenetic Analyses
In addition to IleRS, ValRS, RPB1, and HSP90c, for
which original sequences were established from several protists in this work, 20 other genes were used for phylogenetic analyses. These included small subunit (SSU) rRNA,
large subunit (LSU) rRNA, EF1α, EF2, ribosomal proteins (RP) S14, S15a, L5, L8, L10a, cytosolic-type HSP70
(HSP70c), ER-type HSP70 (HSP70er), mitochondrial-type HSP70 (HSP70mit), chaperonin 60 (CPN60),
chaperonin-containing TCP-1 (CCT) α, δ, γ, ζ subunits,
actin (ACT), α-tubulin (TBα), and β-tubulin (TBβ). Genes
related to metabolic pathways were not used in the present
analyses, because LGT events are frequently observed for
these genes, and thus inclusion of such genes would have
compromised the inference of the organismal phylogeny.
We assumed that the genes used in this study are not
subjected to LGT as far as the analysis of the Eukaryota
domain is concerned, because preliminary phylogenetic
analyses of these genes did not suggest the presence of any
LGT events.
For the above 22 protein-coding genes, amino acid
sequences from diverse eukaryotes and several outgroup
sequences were collected from various databases, and
alignments, including the original sequences of the above
four genes obtained in this study, were constructed using
the SAM2.1 program (Hughey and Krogh 1996). The
obtained alignments were then adjusted manually. For
SSUrRNA and LSUrRNA, alignments of diverse eukaryotic and four archaebacterial (outgroup) sequences were
obtained using the secondary structure-based alignment
database (http://oberon.fvms.ugent.be:8080/rRNA/index.html) (Wuyts et al. 2001, 2002). Several additional
sequences not present in the database were inserted and
then aligned manually. Unambiguously aligned sites were
selected from each of the original alignments and used for
phylogenetic analyses. Alignments and data sets used are
available from T. H. upon request.
Outgroup sequences used for individual genes are
listed in table S1 of the Supplementary Materials online. In
brief, archaebacteria were used for the analyses of the
genes, SSUrRNA, LSUrRNA, EF1α, EF2, RP-S14, RP-S15a, RP-L5, RP-L8, RP-L10a, TCP-1α, TCP-1δ, TCP-1γ, and TCP-1ζ. Eubacteria were used for IleRS, ValRS,
HSP70mit, and CPN60. Paralogous eukaryotic sequences
were used for the other genes. Combined maximum
likelihood (ML) analyses in this study were carried out
under the assumption that ingroup sequences in each gene
have evolved independently.
Programs and Models Used in the Phylogenetic Inference
The NUCML and PROTML programs in the package
MOLPHY (version 2.3) (Adachi and Hasegawa 1996) were
used in the analyses, which assumed a homogeneous across-site rate (Homogeneous model). To take
across-site rate heterogeneity into consideration, the
BASEML and CODEML programs in the package PAML
(version 3.1) (Yang 1997) were used, where a discrete
Γ-distribution with 8 categories for across-site rate heterogeneity was assumed (RAS model). The Γ-shape parameter
(α) was estimated from the analyzed data for each gene. The
combined ML analysis, which calculated the sum of the
log-likelihoods, was carried out using the TOTALML
program in MOLPHY with a variety of different gene combinations. The HKY85 and JTT-F models were assumed
for nucleotide and amino acid substitution processes, respectively (Hasegawa, Kishino, and Yano 1985; Jones,
Taylor, and Thornton 1992). The RELL bootstrap analysis
(Kishino, Miyata, and Hasegawa 1990) was performed on
alternative trees to obtain approximate bootstrap proportion
(BP) values, because the limitation of computational time
did not enable us to carry out real bootstrap analyses. The
RELL method was shown to be a good approximation to
the real bootstrap method (Hasegawa and Kishino 1994).
The AU test (Shimodaira 2002) in the CONSEL program
(Shimodaira and Hasegawa 2001) and the Shimodaira-Hasegawa (SH) test in the BASEML and CODEML
programs were used for statistical comparisons among the
alternative trees of interest.
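The RELL idea described above can be illustrated with a minimal sketch. It assumes that site-wise log-likelihoods have already been computed for every candidate tree; the function name `rell_bootstrap` and the input layout are hypothetical and are not part of MOLPHY or PAML. Sites are resampled with replacement and the stored log-likelihoods are re-summed, so no parameters are re-estimated for the replicates.

```python
import random


def rell_bootstrap(site_lnl, n_rep=1000, seed=0):
    """Approximate bootstrap proportions by resampling site log-likelihoods.

    site_lnl[t][i] is the log-likelihood of site i under tree t
    (a hypothetical input layout chosen for this sketch).
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_rep):
        # Draw n_sites site indices with replacement.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        # Re-sum the stored site log-likelihoods for each tree.
        totals = [sum(row[i] for i in idx) for row in site_lnl]
        # The tree with the highest resampled total wins this replicate.
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_rep for w in wins]
```

For a combined analysis of several genes, one plausible variant (again an assumption, not a description of the programs used here) is to resample sites within each gene separately and then add the per-gene totals.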
Phylogenetic Analyses of Individual Genes
In the preliminary stage of each individual gene
analysis, an unrooted tree was considered for each gene,
excluding sequences that belonged to an outgroup. The
quick topology search option of the NUCML or PROTML
program (-q –n2000) was used to produce candidate trees,
which were subsequently analyzed by the ordinary ML
method using the Homogeneous model. The best tree and
alternative trees, of which the log-likelihood differences
from the log-likelihood of the best tree were within 1
standard error (SE) (1SE criterion), were selected. These
trees were further analyzed by the ML method using the
RAS model, and the best tree was finally selected. Based on
the best tree and widely accepted phylogenetic relationships, constraints on the subtrees for seven higher-order
taxonomic groups of Eukaryota (Opisthokonta, Amoebozoa, Plantae, Alveolata/stramenopiles, Euglenozoa/Heterolobosea, Diplomonadida, and Parabasala) were assumed in
advance. The subtree for the outgroup of each gene was also
assumed in advance, based on established findings. Taxa
and subtrees are shown in Table S1 of the Supplementary
Materials online with other information.
Thereafter, for each gene with the subtree constraints,
a total of 10,395 possible trees for eight groups (seven
groups + an outgroup) was exhaustively analyzed with the
Homogeneous model for a data set including all the sites
initially selected from an original alignment (“all” data
set). Based on the best tree using the Homogeneous model,
site-by-site rate categories, r1 (the slowest evolving sites,
including constant sites) through r8 (the fastest evolving
sites), were estimated by the analysis using the RAS model.
To investigate the effect of removing constant or slowly
evolving sites or rapidly evolving sites in the analyses, we
made alternative data sets in a way similar to that previously
examined by Hirt et al. (1999) and Dacks et al. (2002). The
r8, r1, and r7 sites were stepwise removed from the ‘‘all’’
data set, producing another four data sets, –r8, –r18, –r78,
and –r178. For each of these data sets, an exhaustive
analysis of 10,395 trees using the Homogeneous model was
carried out in the same manner as for the ‘‘all’’ data set.
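The construction of the five data sets can be sketched as follows, assuming that each alignment column has already been assigned a rate category from 1 (slowest) to 8 (fastest). The names `DATA_SETS`, `drop_rate_categories`, and `make_data_sets` are hypothetical and serve only to illustrate the filtering step; the category assignment itself is taken as given input.

```python
# Rate categories removed for each data set named in the text.
DATA_SETS = {
    "all": set(),
    "-r8": {8},
    "-r18": {1, 8},
    "-r78": {7, 8},
    "-r178": {1, 7, 8},
}


def drop_rate_categories(columns, categories, drop):
    """Keep the alignment columns whose rate category is not in `drop`."""
    return [col for col, cat in zip(columns, categories) if cat not in drop]


def make_data_sets(columns, categories):
    """Return a dict mapping each data set name to its retained columns."""
    return {
        name: drop_rate_categories(columns, categories, drop)
        for name, drop in DATA_SETS.items()
    }
```

With eight toy columns labeled A through H and categories 1 through 8, the –r78 data set keeps A through F and the –r178 data set keeps B through F.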
Combined Analyses of the Relationships among Seven
Eukaryotic Higher-Order Groups with the Outgroup
To evaluate the support for a given tree among the
10,395 trees from the total information residing in the
individual genes, phylogenetic information of individual
genes was combined by summing up site-by-site log-likelihoods for each tree. Thereafter, the tree with the
highest log-likelihood in total was selected as the best tree
for the combined analysis. With this approach, parameters
(such as branch lengths) were optimized for each gene,
allowing the combined analysis to take into consideration
heterogeneous phylogenetic information among the genes.
For each of the five data sets (‘‘all,’’ –r8, –r18, –r78, and
–r178) the summation was done over 24 genes including
2 rRNAs and over 22 protein-coding genes. Based on the
analyses using the summation over 24 genes for the data
sets ‘‘all’’ and –r78, candidate trees were selected from
10,395 alternatives based on the 4SE criterion (the best tree
and trees with log-likelihood differences from the best tree
less than 4SE), producing 137 and 572 trees for the data sets
“all” and –r78, respectively. The union of these tree sets
contained 577 trees and was exhaustively searched by the
analysis using the RAS model for each of the 24 individual
genes. The combined analysis was done in the same way as
described above for each of the five data sets.
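The combination step and the SE-based selection of candidate trees can be sketched as below. The sketch assumes that site-wise log-likelihoods are available for every gene and every tree, each gene having been optimized separately. The standard error of a log-likelihood difference is computed here with the usual Kishino-Hasegawa-style variance estimate from site-wise differences; whether TOTALML uses exactly this estimator is an assumption of the sketch. The names `combine` and `select_by_se` are hypothetical.

```python
import math


def combine(per_gene_site_lnl):
    """Concatenate site log-likelihoods over genes for each tree.

    per_gene_site_lnl[g][t] is the list of site log-likelihoods of gene g
    under tree t (hypothetical layout); every gene covers the same trees.
    """
    n_trees = len(per_gene_site_lnl[0])
    combined = [[] for _ in range(n_trees)]
    for gene in per_gene_site_lnl:
        for t in range(n_trees):
            combined[t].extend(gene[t])
    return combined


def select_by_se(site_lnl, k):
    """Return (best tree index, trees within k SE of the best tree)."""
    totals = [sum(row) for row in site_lnl]
    best = max(range(len(totals)), key=totals.__getitem__)
    n = len(site_lnl[best])
    kept = []
    for t, row in enumerate(site_lnl):
        # Site-wise differences between the best tree and tree t.
        diffs = [b - x for b, x in zip(site_lnl[best], row)]
        delta = sum(diffs)
        mean = delta / n
        # Assumed estimator: Var(delta) = n / (n - 1) * sum((d_i - mean)^2).
        se = math.sqrt(n / (n - 1) * sum((d - mean) ** 2 for d in diffs))
        if t == best or delta < k * se:
            kept.append(t)
    return best, kept
```

Calling `select_by_se(combine(genes), 4)` mirrors the 4SE criterion described above, and `k = 1` or `k = 3` gives the other criteria used in this study.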
Combined Analyses with Additional Assumptions
According to the results of the analyses as described
in the following section, Diplomonadida and Parabasala
were grouped in advance, and 945 possible trees for the six
eukaryotic groups with an outgroup were exhaustively
examined for each gene using the –r78 data set and the
RAS model. Based on the combined analyses, over all
24 genes and 22 protein-coding genes, the selection of
candidate trees was carried out against the 3SE criterion,
producing 102 and 205 trees, respectively. The union of
the two tree sets resulted in 214 trees, which were then
used in AU tests to enable statistical comparisons with
different combinations of genes.
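The numbers of alternative trees quoted in Materials and Methods follow from the standard count of unrooted bifurcating trees for n terminal groups, (2n − 5)!!, with the outgroup counted as one of the terminals. The short sketch below checks this arithmetic; the function name `n_unrooted_trees` is hypothetical.

```python
def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa >= 3: (2n - 5)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count
```

Eight terminals (seven groups and an outgroup) give 10,395 trees, seven terminals (six groups and an outgroup) give 945, and six terminals (five groups and an outgroup) give 105, matching the tree-space sizes used in this study.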
Finally, assuming that Euglenozoa/Heterolobosea
are closely related to Diplomonadida/Parabasala, which
corresponds to the Excavata monophyly hypothesis
(Cavalier-Smith 2002), 105 possible trees for the five
eukaryotic groups with an outgroup were exhaustively
examined for each of the 22 protein-coding genes using
the data sets “all,” –r8, and –r78. Based on the combined
analyses for these data sets, 105 alternative trees were
compared.
FIG. 1.—Schematic representation of the best tree from 10,395 alternatives based on the analysis of the “all” data set using the Homogeneous
model for across-site rate. (a), 22 protein genes. (b), 22 protein + 2 rRNA genes. Bootstrap proportion (BP) values are shown on internal branches.
Results
The ML trees for IleRS, ValRS, HSP90c, and RPB1
are shown in figure S1 of the Supplementary Materials
online. Although the monophyly of most of the higher-order eukaryotic groups was reconstructed in these trees,
the analyses did not statistically resolve the relationship
among these groups. None of the best trees of the other 20
individual genes coincided with the accepted trees of
Eukaryota either. The best tree of each gene showed some
differences from the accepted trees (Moreira, Le Guyader,
and Philippe 2000; Baldauf et al. 2000; Arisue et al.
2002a, 2002b; Bapteste et al. 2002).
The Relationship among Seven Higher-Order Groups
of Eukaryota
At first, combined analyses were performed by the
Homogeneous model using the ‘‘all’’ data set in order to
provide an inference based on the most classical
phylogenetic approach. The best tree by the 22 protein
genes positioned the amitochondriate lineages, Diplomonadida and Parabasala, at the earliest and second-earliest
branches of the eukaryotic tree, respectively, with 60% and
81% BP support (fig. 1a). These two early branches were
followed by the stepwise divergences of Opisthokonta,
Amoebozoa, and Plantae. The tree reconstructed the monophyly
of Plantae, Alveolata/stramenopiles, and Euglenozoa/
Heterolobosea, as suggested by the presence of a fused
DHFR-TS gene (Stechmann and Cavalier-Smith 2002;
2003a). Inclusion of two rRNA genes (fig. 1b) further
supported the earliest branching of Diplomonadida (96%)
and changed the branching order apart from the two early
branches. This tree was congruent with a previous tree
inferred by the combined analysis of approximately 100
proteins with 25,000 sites but that did not include
Parabasala (Bapteste et al. 2002). Because the analysis
including two rRNA genes seemed to be strongly affected
by an LBA artifact, uniting Diplomonadida with outgroup
sequences, analyses based on the 22 protein genes were
mainly used for subsequent phylogenetic inference.
In contrast to the analysis using the “all” data set with
Homogeneous modeling (fig. 1a), removal of the fast-evolving sites (–r8, –r78) and/or the use of the RAS model reduced the likelihood of Diplomonadida being the
earliest branch. The best tree of the analysis using the
RAS model on the –r78 data set positioned Diplomonadida
as the closest relative of Parabasala with 86% BP support
(fig. 2a, Tree A). The branching order following the divergence of the common ancestor of Diplomonadida and
Parabasala in Tree A was exactly the same as that of the tree in
figure 1a, although the BP support values for internal
branches were reduced.
In the analysis of the –r78 data set with the RAS
model, alternative trees of interest which have previously
been suggested in the literature were compared (fig. 2a).
Tree B corresponds to the tree of Bapteste et al. (2002),
and Tree C is a rooted version of the tree of Baldauf et al.
(2000) and is based on the concatenated EF1α, actin and
tubulin genes, the root of which is located on the line
leading to the common ancestor of Diplomonadida and
Parabasala. Trees D and E, suggested by Stechmann and
Cavalier-Smith (2002, 2003a), are based on the distribution of fused DHFR-TS and/or CPSII-DHO-ACT genes.
The AU test for comparing the 577 candidate trees
significantly rejected Trees D and E (p < 0.05), but not at
a level of p < 0.01.
To investigate the effect of reducing sites and different model specifications, variations in BP support values
among 577 alternative trees were compared between different data sets using different sample sites and different
models for the site rates (fig. 2b). For node ‘‘d,’’ 81% support was found for the data set ‘‘all’’ with the Homogeneous model, whereas only 45% support was found for
the data set –r78 with the RAS model. For node ‘‘e,’’
Homogeneous model analysis using the data set ‘‘all’’ did
not support the node (38%), whereas analysis using the
data set –r78, with either Homogeneous or RAS modeling,
showed support of greater than 85%. When the
fast-evolving sites were removed, support for the close affinity between
Diplomonadida and Parabasala increased, whereas support
for the early branching status of these two decreased.
This tendency was particularly obvious in the analyses
using the Homogeneous model (fig. 2b, nodes d and e). In
contrast, removal of the slowest evolving sites (–r1)
showed no significant effect on variation of the BP values
for any of the nodes (fig. 2b), suggesting that removal of
the slowest or constant sites did not affect phylogenetic inference as far as the ML method is concerned. For the other
nodes of interest, represented by Trees B, C, D, and E, no
high BP support was obtained for grouping Opisthokonta
with Amoebozoa (node f) or Euglenozoa/Heterolobosea
with Diplomonadida/Parabasala (node j), corresponding to
the Excavata monophyly hypothesis (Cavalier-Smith 2002).
Removal of the fast-evolving sites increased support for
the earliest branching status of Opisthokonta (node l),
especially with regard to the RAS model analysis (more
than 10%), albeit without any clear support.
Possible Root of the Tree of Eukaryota
According to the above analyses, Diplomonadida and
Parabasala were grouped together to form a new clade. To
further analyze the relationship among higher-order groups
and the root of the tree, 945 possible trees for the six higher-order eukaryotic groups (including the Diplomonadida-Parabasala clade) and an outgroup were exhaustively
examined for each gene, using the –r78 data set with RAS
modeling. Thereafter, subsequent combined analyses were
performed. Because the alternative trees for the BP
analyses were different from the previous ones (577
trees), the BP support values for the nodes in figure 2a
except node ‘‘e’’ were slightly changed but were almost
the same as shown in figure 2.
Based on the criteria described in Materials and
Methods, 214 trees were selected out of 945 trees for
statistical comparisons. For these 214 trees, combined
analyses were performed on various combinations of the
genes. Based on the analysis with 22 protein genes,
60 trees were finally selected by the AU test with
a criterion of p > 0.05. Figure 3 compares the p values
of the 60 trees and three additional trees of interest (Tree D
in fig. 2a by Stechmann and Cavalier-Smith (2002), the
best tree from tubulins, and the best tree from rRNAs). The
214 trees did not include Tree E in figure 2a by Stechmann
and Cavalier-Smith (2003a).
In the analysis of the 22 proteins, Tree A (= Tree 44,
shown in fig. 2a) was the best tree. When tubulins were
removed from the analysis of the 22 proteins (‘‘–Tubulin’’),
the best tree shifted to the tree with Opisthokonta rooting
(Tree 19). This tree was also selected as the best tree by the
combined analyses of the chaperone proteins and the ‘‘15
proteins,’’ each of which had an outgroup that was not
extremely distant (see the legend to fig. 3). Tree D was not
rejected in the analyses of “–Tubulin” (p ≥ 0.1), “15
proteins” (p ≥ 0.2), translation-related proteins (p ≥ 0.1),
and chaperone proteins (p ≥ 0.2), whereas analyses of
tubulins (p < 0.01) and rRNAs (p < 0.01) significantly
rejected Tree D.
Analyses of tubulins and rRNAs significantly rejected
most of the 60 trees. It would appear that the phylogenetic
signals residing in the tubulins and rRNAs must be
different from those in the other protein data sets used in
the present analyses. Interestingly, all the trees except Tree
45, that were not rejected by the analysis of tubulins
(p ≥ 0.05), positioned Opisthokonta as the closest relative
to Diplomonadida/Parabasala, indicating that the tubulin
data sets strongly supported the close relationship between
Opisthokonta and Diplomonadida/Parabasala. On the other
hand in the analysis by rRNAs, 59 of the 60 trees were
significantly rejected, leaving only Tree B (Bapteste et al.
2002). Also in this analysis, the classical eukaryotic tree
of SSUrRNA (Tree 62) was significantly supported,
even when rapidly evolving sites were excluded from the
analysis.
Because Plantae, Euglenozoa/Heterolobosea, and
Alveolata/stramenopiles share a fused DHFR-TS gene,
these three groups are likely to be monophyletic, and the
root of the tree of Eukaryota should not be located within
these three groups (Stechmann and Cavalier-Smith 2002).
The ML tree, from the combined analyses presented in this
study, also reconstructed the monophyly for these three
groups by sequence-based evidence, although the BP
support was not high (fig. 2a). Therefore, from the 60
selected trees listed in figure 3, together with the DHFR-TS gene fusion–based findings, the possible position of the
root of the eukaryotic tree could finally be narrowed to the
branch leading either to Opisthokonta (Trees 5, 6, 16, 19,
20), to the common ancestor of Diplomonadida/Parabasala
(Trees 44 [Tree A], 45, 51, 52 [Tree C]), to the common
ancestor of Opisthokonta and Diplomonadida/Parabasala
(Tree 55), or to the common ancestor of Plantae,
Euglenozoa/Heterolobosea, and Alveolata/stramenopiles
(Trees 57 and 58). In addition, if we also accept the close
relationship between Excavata, Plantae, and Alveolata/
stramenopiles as one of the candidate relationships
(Cavalier-Smith 2002; Stechmann and Cavalier-Smith
2002, 2003a), Tree 1 cannot be ruled out either. The
close relationship between Opisthokonta and Diplomonadida/Parabasala found in Trees 55 and 58 is likely to be
artificially affected by tubulins as mentioned above. The
p values for Trees 6, 55, and 57 decreased to 0.01 ≤ p <
0.05 in the analysis of the “15 proteins” whose outgroups
are not extremely distant. Based on these considerations
Trees 6, 55, 57, and 58 can also be ruled out, further
narrowing the possibilities to the Opisthokonta rooting or
to the Diplomonadida/Parabasala rooting.
Analysis with a Constraint on the Excavata Monophyly
To seek a possible relationship among higher-order
groups under the assumption that Excavata are monophyletic (Cavalier-Smith 2002), 105 possible alternative
trees for the five higher-order groups (Opisthokonta,
Amoebozoa, Plantae, Alveolata/stramenopiles, Excavata)
and an outgroup were exhaustively examined using the
RAS model with the –r78 data set. Tree D (Stechmann
and Cavalier-Smith 2002) was selected as the best tree
(fig. 4). Tree 60 in figure 3, in which Excavata was positioned at the base of Eukaryota, was not significantly
different from the best tree (p = 0.433), and the tree (Tree 1
in figure 3) that exchanges the positions of Excavata and
Plantae in Tree D was not significantly different either (p =
0.493). Sixty-nine trees were rejected by the AU test at the
significance level, p < 0.05. Tree E (Stechmann and
Cavalier-Smith 2003a) was included in these trees
414 Arisue et al.
a)
Tree A
Euglenozoa
Heterolobosea
a
(59)
Alveolata
Stramenopiles
b
(47)
c
Plantae
(50)
[4.1]
[4.0]
Amoebozoa
d
[2.5]
[2.5]
(45)
Opisthokonta
e
(86)
[10.3]
Parabasala
[1.1]
Diplomonadida [1.3]
Outgroup [4.6]
(BPs)
[averaged No. of species]
0.1 substitutions/sites
1 species
Tree B
Op
f
p=0.084
g
Am
h
p=0.148
100
l
EH
a
Pl
i
p=0.042
AS
k
Pl
b
Tree E
p=0.043
Am
d
AS
Tree D
Op
f
Pl
d
b)
Tree C
Pl
AS
k
EH
j
i
j
DP
EH
DP
EH
AS
Am
DP
DP
Op
Op
O
O
O
O
c
d
e
f
a
b
all -r8 -r18-r78 178
-r
all -r8 -r18-r78 178 all -r8 -r18-r78 178
-r
-r
all -r8 -r18-r78 178 all -r8 -r18-r78 178 all -r8 -r18-r78 178
-r
-r
-r
g
h
j
all -r8 -r18-r78 178
-r
all -r8 -r18-r78 178 all -r8 -r18-r78 178
-r
-r
Am
f
BPs (%)
80
60
40
20
0
100
BPs (%)
80
i
k
l
60
40
20
0
Homogeneous model
Rate across site (RAS) model
all -r8 -r18-r78 178 all -r8 -r18-r78 178 all -r8 -r18-r78 178
-r
-r
-r
sites used;
All / -r8 / -r18 / -r78 / -r178
8199 / 7199 / 5742 / 6047 / 4590
FIG. 2.—Relationship between seven higher-order eukaryotic groups. (a), The best tree and the alternative trees based on the analysis of the –r78
data set, including 22 protein-coding genes with the rate across site (RAS) model (“–r78 with RAS”). Five trees of interest are shown: Tree A (the best
tree); Tree B (same as the tree shown in Bapteste et al. (2002) but with Parabasala); Tree C (the tree shown in Baldauf et al. (2000) rooted by the
common ancestor of Diplomonadida and Parabasala); Tree D (Stechmann and Cavalier-Smith 2002); Tree E (Stechmann and Cavalier-Smith 2003a).
Internal nodes are represented by lowercase characters, a to l. In Tree A, multi-taxon groups are shown as triangles. The base of the triangle is
proportional to the average number of taxa for different genes, and the number for each group is shown in brackets beside the group name. The width of
the triangle is proportional to average branch length for taxa and for genes weighted by the number of sites used. BP values are shown in parentheses. In
Tree B through Tree E, only topologies are shown schematically with p values of the AU test for comparison of the 577 trees examined. Abbreviations
(p = 0.017). As shown by the BP values indicated on
the internal branches of the trees in figure 4, the basal
placements of Opisthokonta and Excavata were supported by
1% and 97%, respectively, when the “all” data set was
used with the Homogeneous model. In contrast, support
for these placements shifted to 74% and 18%, respectively, in
the analysis using the RAS model on the –r78 data set.
A close relationship between Opisthokonta and Amoebozoa
was found with 91% support when the Homogeneous
model was used on the “all” data set. However, support
decreased to 21% when the –r78 data set was used with
the RAS model. Analyses with the RAS model after removing
fast-evolving sites thus favored the Opisthokonta rooting. This
supported the hypothesis proposed by Stechmann
and Cavalier-Smith in 2002 but not their later hypothesis
of 2003 (Stechmann and Cavalier-Smith 2003a), under the
assumption that Excavata is monophyletic.
Discussion
A close relationship between the amitochondriate lineages
Diplomonadida and Parabasala in a rooted tree has
been suggested by several phylogenies of non-metabolic
genes: β-tubulin (Keeling and Doolittle 1996); EF1α
(Hashimoto et al. 1997); ValRS (Hashimoto et al. 1998);
CPN60 (Horner and Embley 2001); and five concatenated
ribosomal proteins (Arisue et al. 2004), but it has not been
reconstructed in other trees of non-metabolic genes. The
present analyses clearly demonstrated for the first time that
Diplomonadida and Parabasala are each other's closest
relatives in a rooted tree. The presence of mitochondrion-related
genes and organelles in several amitochondriate
protists has suggested that these protists secondarily lost
their typical mitochondria separately in different lineages
(Roger 1999; Rotte et al. 2000; Williams et al. 2002;
Embley et al. 2003; Tovar et al. 2003). The grouping of
Diplomonadida and Parabasala reduces the number of secondary
losses of typical mitochondria in the Eukaryota tree,
although these two lineages show deep differences in the
organization of their amitochondriate phenotype (Martin
and Müller 1998).
The tree of Eukaryota has been examined by the
SSUrRNA phylogeny during the past two decades, and various
phylogenetic questions have been successfully addressed
(e.g., Sogin and Silberman 1998). One of the most important
implications of the SSUrRNA phylogeny was that
the amitochondriate protists, Euglenozoa, Heterolobosea,
and Amoebozoa diverged stepwise before the radiation of
the terminal “Crown” groups (Sogin and Silberman 1998).
The classical SSUrRNA tree, shown as Tree 62 of figure
3, however, was supported only by rRNAs in
the present analyses. Averaged phylogenetic signals residing
in the protein data sets did not favor the rRNA tree
at all. Because figure 3 revealed that the rRNA tree was an
extreme example of the tree of Eukaryota, the widely
accepted, SSUrRNA-based scenario of eukaryotic
evolution should be extensively revised (Cavalier-Smith
2004). At the same time, the tubulin phylogeny was also
very discordant with the combined protein phylogeny
without tubulins (fig. 3). Tubulins polymerize to form
microtubules, the major component of the cytoskeleton, 9+2
axonemes, and mitotic spindles. The tubulins are among the most
important molecules for forming the structure and morphology
of the cell. Compared with those of other proteins used in
the present combined analyses, functional constraints on
tubulins are more likely to be affected by the lifestyle and
environment of the organisms. Thus, convergent evolution
at the molecular level might have occurred and obscured the
organismal phylogeny in the tubulin data sets.
From the combined analyses of the 22 protein
sequences together with the findings for the distribution
of the DHFR-TS gene fusion in the eukaryotic tree, two
possibilities seem to exist for the root of the tree of
Eukaryota, namely the branch leading to Opisthokonta
or the branch leading to the common ancestor of
Diplomonadida/Parabasala. No strong support was detected
for the latter possibility, in contrast to the prominent
support generally obtained by the rRNA phylogeny. If the
latter possibility could be discarded because of a widely
recognized possible LBA artifact, Opisthokonta rooting
would be the most likely option. The early emergence of
Opisthokonta has been weakly recovered by a recent
HSP90c phylogeny including six different, previously
unsampled eukaryotic groups (Stechmann and Cavalier-Smith 2003b).
Although the present analyses did not support the
Excavata monophyly hypothesis (Cavalier-Smith 2002),
we also examined trees constrained to Excavata monophyly (fig. 4). This was
because we could not entirely exclude the hypothesis by
the present combined analyses alone, possibly because
they were affected by serious LBA problems. Many
unknown artifacts, including LBA, may critically influence
the present analyses. One of the two higher-order groups in
Excavata, Euglenozoa/Heterolobosea and Diplomonadida/
Parabasala, might be artificially located at the base of
the eukaryotic tree (fig. 3), and thus the monophyly of
Excavata might be difficult to reconstruct. It is worth
noting that the best tree obtained in the analysis with
a constraint on the Excavata monophyly was exactly the
same as Tree D (Stechmann and Cavalier-Smith 2002).
This result demonstrates once again that Opisthokonta
rooting is most likely, if the possibility of rooting on one of
the fast-evolving groups, Diplomonadida/Parabasala and
Euglenozoa/Heterolobosea, is not considered. If Tree D
were correct, then the DHFR-TS fused gene would
have been lost in the parasites of the Diplomonadida/
Parabasala group, because neither DHFR nor TS activity
was detected in Giardia intestinalis, Trichomonas vaginalis,
and Tritrichomonas foetus (Wang et al. 1983; Wang
and Cheng 1984; Aldritt, Tien, and Wang 1985), and
for the groups are as follows: Op, Opisthokonta; Am, Amoebozoa; Pl, Plantae; AS, Alveolata/stramenopiles; EH, Euglenozoa/Heterolobosea; DP,
Diplomonadida/Parabasala. (b), Variations in BP values for the internal branches of the five trees shown in panel a. The values are shown for 10
different analyses using different data sets and different models for site rates.
[Figure 3 graphics partly reproduced. Tree numbers, topologies, and Δli values are tabulated below; the shaded panel of AU-test p values is not reproduced.]

Tree No.        Topology                        Δli
1               Op|(Am,(Pl,(AS,(EH,DP))))^      88.0
2               Op|(Am,(EH,(AS,(Pl,DP))))       100.2
3               Op|(Am,(EH,((AS,Pl),DP)))       91.2
4               Op|(Am,((AS,EH),(Pl,DP)))       57.5
5               Op|(Am,(DP,(Pl,(AS,EH))))*      47.2
6               Op|(Am,(DP,(EH,(AS,Pl))))*      77.2
7               Op|(AS,((Am,Pl),(EH,DP)))^      114.1
8               Op|(Pl,(Am,((AS,EH),DP)))       80.8
9               Op|(EH,((AS,Pl),(Am,DP)))       95.5
10              Op|((Am,Pl),(AS,(EH,DP)))       105.0
11              Op|((Am,Pl),((AS,EH),DP))       76.1
12              Op|((AS,Pl),((Am,EH),DP))       105.8
13              Op|((AS,EH),(Am,(Pl,DP)))       57.7
14              Op|((AS,EH),(Pl,(Am,DP)))       46.3
15              Op|(DP,(Am,((AS,Pl),EH)))*      44.2
16              Op|(DP,(Am,((AS,EH),Pl)))*      11.7
17              Op|(DP,(Pl,(Am,(AS,EH))))       38.0
18              Op|(DP,((Am,Pl),(AS,EH)))       30.4
19              Op|((Am,DP),(Pl,(AS,EH)))*      28.0
20              Op|((Am,DP),(EH,(AS,Pl)))*      60.8
21              Pl|((AS,EH),(Am,(Op,DP)))       22.2
22              Pl|((Am,(AS,EH)),(Op,DP))       43.1
23              EH|(Am,(DP,(Op,(AS,Pl))))       100.4
24              EH|(AS,(Am,(Pl,(Op,DP))))       33.6
25              EH|(AS,(Pl,(Am,(Op,DP))))       2.3
26              EH|(AS,(Pl,(Op,(Am,DP))))       34.8
27              EH|(AS,(Op,(Pl,(Am,DP))))       51.5
28              EH|(AS,((Am,Pl),(Op,DP)))       24.9
29              EH|(Op,(Am,(AS,(Pl,DP))))       107.4
30              EH|(Op,(Am,((AS,Pl),DP)))       94.3
31              EH|(Op,(DP,(Am,(AS,Pl))))       98.4
32              EH|((Am,Op),(Pl,(AS,DP)))       85.0
33              EH|((Am,Op),((AS,Pl),DP))       70.3
34              EH|((AS,Pl),(Am,(Op,DP)))       18.2
35              EH|((AS,Pl),(Op,(Am,DP)))       50.7
36              EH|(DP,(AS,(Pl,(Am,Op))))       74.6
37              EH|(DP,((Am,Op),(AS,Pl)))       65.0
38              (Am,Pl)|((EH,AS),(Op,DP))       55.1
39              (AS,EH)|(Op,(Am,(Pl,DP)))       77.2
40              (AS,EH)|(Op,(DP,(Pl,Am)))       83.4
41              (AS,EH)|(Pl,(Am,(Op,DP)))       25.3
42              (AS,EH)|((Op,Pl),(Am,DP))       77.4
43              DP|(Op,(Pl,(Am,(AS,EH))))       26.2
44 (Tree A)     DP|(Op,(Am,(Pl,(AS,EH))))*      0.0
45              DP|(Op,(Am,(EH,(Pl,AS))))*      32.6
46              DP|(Op,((Pl,Am),(AS,EH)))       16.0
47              DP|(Pl,((Op,Am),(AS,EH)))       50.3
48              DP|(EH,(Op,(Am,(Pl,AS))))       96.5
49 (Tree B)     DP|(EH,(AS,(Pl,(Op,Am))))       78.8
50              DP|(EH,((Op,Am),(Pl,AS)))       72.4
51              DP|(Am,(Op,(Pl,(AS,EH))))*      29.2
52 (Tree C)     DP|((Op,Am),(Pl,(AS,EH)))*      36.1
53              DP|((AS,EH),(Pl,(Op,Am)))       48.0
54              (Op,(AS,EH))|(Pl,(Am,DP))       78.9
55              (Op,DP)|(Am,(Pl,(AS,EH)))*      20.0
56              (Op,DP)|(Pl,(Am,(AS,EH)))       47.1
57              (Pl,(AS,EH))|(Op,(DP,Am))*      60.5
58              (Pl,(AS,EH))|(Am,(Op,DP))*      30.5
59              (Pl,DP)|((Op,Am),(AS,EH))       82.4
60              (EH,DP)|((Op,Am),(AS,Pl))^      89.5
61 (Tree D)     Op|(Am,((AS,Pl),(EH,DP)))^      74.2
62              DP|(EH,(Am,(Op,(Pl,AS))))       95.6
63              (Op,DP)|(Am,((Pl,AS),EH))*      53.6

Gene combinations shown as columns of the p-value panel: All Proteins; All Proteins + rRNAs; –Tubulins; –RPB1; 15 proteins; Translation related proteins; Chaperone proteins; Tubulins; rRNAs; RPB1.
Numbers of sites given for the columns (in extraction order, not matched to the labels above): 6047, 5534, 5401, 4501, 1773, 646, 2874, 513, 1934, 7981.
Shading categories: best; 0.2 ≤ p; 0.1 ≤ p < 0.2; 0.05 ≤ p < 0.1; 0.01 ≤ p < 0.05; p < 0.01.
FIG. 3.—Comparison of alternative candidate trees for the different combined analyses with different combinations of genes. Sixty trees were
finally selected (Trees 1 to 60) based on analysis of 22 protein genes using the –r78 data set with RAS modeling, and are shown with the other trees
of interest (Trees 61 to 63). Tree A to Tree D in figure 2a are shown in parentheses with Tree number. The symbols |, *, and ^ in topologies are used to
denote the root of Eukaryota, the presence of monophyly for Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles, and the presence of the
Excavata monophyly, respectively. Δli is the log-likelihood difference between the ML tree (Tree 44) and the corresponding tree. P values from the AU
test are categorized into six groups by shading, as shown at the bottom of the figure. Each column corresponds to one of the different combined analyses
with different gene combinations. The “15 proteins” include EF1α, EF2, IleRS, ValRS, RP-S14, RP-S15a, RP-L8, RPB1, HSP70c, HSP70er,
HSP70mit, HSP90c, CPN60, TCP1-α, and ACT. In the best tree for each of these 15 individual proteins, the branch length leading to the outgroup was less
than 0.5 substitutions/site, and thus the outgroup was not as extremely distant as in the other seven proteins.
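The topology notation of figure 3 lends itself to a mechanical check of which groupings a rooted tree contains. The following is a minimal sketch, not part of the original analysis: it assumes that "|" separates the two sides of the root and that every parenthesized group on either side is a clade of the rooted tree. The function names (parse_side, clades, is_monophyletic) are hypothetical.

```python
def parse_side(text):
    """Parse one side of the root, e.g. '(Am,(Pl,DP))' or 'Op', into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # skip '('
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # skip ','
                children.append(node())
            pos += 1                      # skip ')'
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]            # a leaf label such as 'Op'

    return node()


def leaves(tree):
    """Return the set of group labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(topology):
    """Collect the leaf sets of all subtrees on both sides of the root marker '|'."""
    left, right = topology.rstrip("*^").split("|")
    found = set()

    def walk(tree):
        found.add(leaves(tree))
        if not isinstance(tree, str):
            for child in tree:
                walk(child)

    walk(parse_side(left))
    walk(parse_side(right))
    return found


def is_monophyletic(topology, groups):
    """True if the given groups form a clade in the rooted topology."""
    return frozenset(groups) in clades(topology)


tree_a = "DP|(Op,(Am,(Pl,(AS,EH))))*"   # Tree 44 (Tree A) in figure 3
tree_d = "Op|(Am,((AS,Pl),(EH,DP)))^"   # Tree 61 (Tree D) in figure 3
print(is_monophyletic(tree_d, ["EH", "DP"]))        # Excavata grouping in Tree D
print(is_monophyletic(tree_a, ["Pl", "AS", "EH"]))  # Plantae + AS + EH in Tree A
```

Treating each side of the root as its own subtree keeps the check strictly rooted, which matters here because the same unrooted topology appears in figure 3 under several different root positions.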
FIG. 4.—Four alternative trees of interest selected from 105 possible trees in the analysis by the –r78 data set using the RAS model under the
assumption that Excavata are monophyletic. Tree D, the best tree (Stechmann and Cavalier-Smith 2002, Tree 61 in fig. 3); Tree 1 in figure 3; Tree E,
(Stechmann and Cavalier-Smith 2003a); Tree 60 in figure 3. P values of the AU test are shown in parentheses for Trees 1, E, and 60. BP values are
shown over internal branches for different data sets analyzed and for the different models used.
because no related gene was found in the genome sequencing
database of Giardia intestinalis (McArthur et al.
2000). Examination of the presence or absence of the fused
gene in the free-living organisms that belong to this group
will be important to clarify the status of Diplomonadida/
Parabasala in the phylogenetic tree based on the DHFR-TS
gene fusion.
Preliminary exploration of the distribution of the gene
fusion CPSII-ACT-DHO, which was found exclusively in
Opisthokonta and Amoebozoa, suggested that these two
groups are probably monophyletic and that the root is
located on their common ancestor (Stechmann and
Cavalier-Smith 2003a). However, this possibility is less
supported by the present study on the sequence-based
phylogeny. Because information regarding the gene fusion
events in the pyrimidine biosynthetic pathway is scant,
further analyses examining the gene organization of the
pathway in diverse protist lineages are necessary to
settle more precisely the discrepancy between the
inferences based on the gene fusion and the sequence-based
phylogeny. Interestingly, a sequence similarity
search of the genome project database of a unicellular
red alga, Cyanidioschyzon merolae (Matsuzaki et al. 2004)
(http://merolae.biol.s.u-tokyo.ac.jp/), identified a fused
CPSII-ACT-DHO gene in addition to separate CPS and
GAT genes, demonstrating that the gene fusion event most
likely occurred in the common ancestor of all eukaryotes
including bikonts, Opisthokonta, and Amoebozoa. This
finding reduces the possibility of rooting on the common
ancestor of Opisthokonta and Amoebozoa, and instead
gives more support to the Opisthokonta rooting suggested
by the present molecular phylogeny. In addition to the gene
fusions in the pyrimidine biosynthetic pathway summarized
by Nara, Hashimoto, and Aoki (2000), a novel gene
fusion, ACT-DHOD (dihydroorotate dehydrogenase), has
recently been found in a euglenozoan protist, Bodo saliens
(Annoura et al. 2004). Compared to the DHFR-TS fusion
event, gene fusions in the pyrimidine biosynthetic pathway
may be more complicated, with fusion and separation
events possibly occurring more than once on independent
branches of the eukaryotic tree.
If the constraints that were assumed in advance for
each of the higher-order groups examined in the present
analyses were very discordant with the phylogenetic information
in each of the genes analyzed, the constraints may
have distorted the phylogenetic inference. Because constraints
on RPB1 significantly affected the log-likelihood difference
between the best trees with and without constraints
(p < 0.01; see table S1 of the Supplementary Materials
online), we excluded RPB1 from the 22 protein genes in
the combined analysis, as shown in figure 3, and explored
the influence of such exclusion. Analysis only by RPB1
was not significantly different from the analysis of the 22
proteins with regard to the patterns in figure 3. Removal of
RPB1 from the 22 proteins (“–RPB1”) shifted the best tree
to Tree 25 with Euglenozoa/Heterolobosea rooting, but the
overall pattern of p values for the 63 trees in figure 3 did
not change after its removal, demonstrating that no
significant influence was introduced by inclusion of
RPB1. Instead, as already mentioned in the Results,
inclusion of tubulins and/or rRNAs might significantly
affect the present combined analyses. We also examined
an alternative combined analysis of the 22 protein genes
by excluding sequences of Rhodophyta and/or Glaucophyta,
which are present in nine of the genes (see table S1
in the Supplementary Materials online), because the monophyletic
origin of Plantae, including Viridiplantae, Rhodophyta,
and Glaucophyta, has recently been challenged
(Nozaki et al. 2003). No differences were found
with or without Rhodophyta and/or Glaucophyta, demonstrating
that the constraint on the monophyletic origin of
Plantae had no significant influence. The possibility of
a polyphyletic origin of Viridiplantae, Rhodophyta, and
Glaucophyta that was proposed by Nozaki et al. (2003)
should be re-examined with more data.
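The effect of dropping one gene from a combined analysis can be illustrated with a small sketch. It assumes, as a simplification, that the combined log-likelihood of a candidate tree is the sum of the per-gene log-likelihoods and that Δl is measured from the best-scoring tree. The gene names, tree names, and numbers below are hypothetical and are not taken from the present data.

```python
def combined_delta_lnl(per_gene_lnl, exclude=()):
    """Sum per-gene log-likelihoods per tree; return (best tree, {tree: delta from best})."""
    genes = [gene for gene in per_gene_lnl if gene not in exclude]
    trees = list(per_gene_lnl[genes[0]])
    total = {tree: sum(per_gene_lnl[gene][tree] for gene in genes) for tree in trees}
    best = max(total, key=total.get)
    return best, {tree: total[best] - total[tree] for tree in trees}


# Hypothetical log-likelihoods for two candidate trees (illustrative values only).
toy = {
    "geneX": {"T1": -1000.0, "T2": -1004.0},
    "geneY": {"T1": -2010.0, "T2": -2003.0},
    "geneZ": {"T1": -1500.0, "T2": -1505.0},
}

best_all, delta_all = combined_delta_lnl(toy)
best_without_z, delta_without_z = combined_delta_lnl(toy, exclude=("geneZ",))
print(best_all, delta_all)
print(best_without_z, delta_without_z)
```

In this toy case the best tree changes when one gene is excluded even though the difference between the two trees stays small, which is the kind of shift that a comparison of p-value patterns across all candidate trees is meant to put in context.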
In our present analyses performed by removing slow- or
fast-evolving sites, owing to the limitation of
computational time, we did not provide “control” experiments
for assessing the specific effect of removing sites over what is
expected with the random removal of sites. Although we
roughly compared the BP values of the analyses with
different numbers of sites in the present analyses, in general
one cannot simply compare them because a positive correlation
is present between BP values and the number of sites
used in the analysis. A control experiment should be done in
the next step for each of the data sets with different numbers
of sites. In spite of the use of the Γ model for approximating
rate heterogeneity among sites (RAS), the removal of
fast-evolving sites (–r78) still showed an additional effect on the
BP values. This is probably because an effect of model
misspecification was apparent, especially on sites r7 and r8.
Because violation of amino acid frequency constancy was
not so evident for the 22 proteins analyzed (table S1 of the
Supplementary Materials online), the misspecification can
probably be attributed to evolutionary rate distribution
differences across subtrees (covarion shifts), which were not
taken into consideration in the present analyses with the
RAS model. The presence of such model misspecification
was discussed in detail in an EF1α analysis for the position
of Microsporidia in light of LBA and covarion shifts
(Inagaki et al. 2004). If the “all” data set in the present
analyses contained such a covarion-like structure, then the
RAS model could not fully approximate the data, resulting
in a possible LBA artifact that locates Diplomonadida or
a common ancestor of Diplomonadida and Parabasala at the
base of the tree.
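The control experiment proposed above could be set up along the following lines. This is a minimal sketch under stated assumptions, not a procedure used in the present study: it draws random subsets of alignment columns of the same size as the –r78 data set (6047 of the 8199 sites in the “all” data set, as listed in figure 2), so that each subset can be re-analyzed and its BP values compared with those obtained after targeted removal. The function name, the number of replicates, and the seed are hypothetical.

```python
import random


def random_removal_controls(n_total, n_keep, n_replicates, seed=0):
    """Return sorted lists of retained site indices, each drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the control data sets are reproducible
    return [sorted(rng.sample(range(n_total), n_keep)) for _ in range(n_replicates)]


# Same number of retained sites as the -r78 data set, but chosen at random.
controls = random_removal_controls(n_total=8199, n_keep=6047, n_replicates=5)
for subset in controls:
    print(len(subset), subset[:5])
```

Matching the number of retained sites separates the effect of which sites are removed from the effect of simply having fewer sites, which is the confound noted above for BP values.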
The ML tree of the combined analysis using the RAS
model on the –r78 data set, including 22 proteins (Tree A,
shown in fig. 2a), clearly demonstrated the difficulty in
resolving the higher-order phylogeny of Eukaryota. The
branch lengths leading to the outgroup and Parabasala or
Diplomonadida are extremely long, and those leading to
nodes a, b, and c are extremely short. Except for Opisthokonta,
taxon sampling within each higher-order group is
sparse. The relationships between the major eukaryotic
groups and an outgroup cannot be clearly resolved apart
from the close relationship between Diplomonadida and
Parabasala. Although the present analyses could narrow the
possible root of the tree of Eukaryota, the problem is still
open because of the lack of phylogenetic information. With
the accumulation of EST sequence data, large-scale
analyses of eukaryotic phylogeny (“phylogenomics”) have
recently been performed, and conclusively demonstrated the
position of choanoflagellates (Philippe et al. 2004). The
“phylogenomics” approach, using more sequence data with
adequate taxon sampling, together with the application of
sophisticated data analysis for combined phylogeny, will be
indispensable for providing more robust inference on the
higher-order relationships and the root of the Eukaryota tree.
Supplementary Material
Materials and Methods for Cloning and Sequencing
of the Protist Genes.
FIG. S1. Unrooted maximum likelihood trees of Eukaryota. (a), IleRS; (b), ValRS; (c), Hsp90c; and (d), RPB1.
Table S1. Constraints on the subtrees for seven higher-order taxonomic groups of Eukaryota and an outgroup.
Acknowledgments
We express sincere thanks to Dr. M. Müller for
providing us an opportunity to analyze genes from several
amitochondriate protists, for invaluable discussions on the
phylogenetic analyses, and for critical review of the
manuscript. We also thank Dr. H. Philippe for provision
of the IleRS and ValRS alignments and discussions, Dr.
L. B. Sánchez for technical support for gene cloning, Dr.
T. Shirakura for initial sequencing of the G. plecoglossi
ValRS and E. hellem IleRS genes, A. Deguchi and
S. Kikuchi for technical assistance, Drs. F. D. Gillin and S.
A. Aley (San Diego, CA), P. J. Johnson (Los Angeles,
CA), L. B. Sánchez (New York, NY), L. M. Weiss (New
York, NY), and K. Kita (Tokyo, Japan) for provision of
the gDNA and/or gDNA/cDNA libraries of Giardia intestinalis, Trichomonas vaginalis, Entamoeba histolytica,
Encephalitozoon hellem, and Plasmodium falciparum,
respectively. This work was carried out under the ISM
Cooperative Research Programs (03ISMCRP-1015 and
04ISMCRP-1017) and the Research Project at Center for
Computational Sciences, University of Tsukuba. Work
carried out in the laboratory at the Rockefeller University
in New York was supported by USPHS National Institutes
of Health grant AI11942 to M.M. The visits of T.H. to the
New York laboratory were supported by the US-Japan
Cooperative Research Project by the National Science
Foundation (USA), and by the Japan Society for the
Promotion of Science (INT-9726707).
Literature Cited
Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3:
program for molecular phylogenetics based on maximum
likelihood. Comput. Sci. Monographs No. 28, The Institute of
Statistical Mathematics, Tokyo.
Aldritt, S. M., P. Tien, and C. C. Wang. 1985. Pyrimidine salvage
in Giardia lamblia. J. Exp. Med. 161:437–445.
Annoura, T., T. Nara, T. Makiuchi, T. Hashimoto, and T. Aoki.
2004. The origin of dihydroorotate dehydrogenase genes of
kinetoplastids, with special reference to their biological
significance and adaptation to anaerobic, parasitic conditions.
J. Mol. Evol. in press.
Archibald, J. M., D. Longet, J. Pawlowski, and P. J. Keeling.
2003. A novel polyubiquitin structure in Cercozoa and
Foraminifera: evidence for a new eukaryotic supergroup.
Mol. Biol. Evol. 20:62–66.
Arisue, N., T. Hashimoto, J. A. Lee, D. V. Moore, P. Gordon,
C. W. Sensen, T. Gaasterland, M. Hasegawa, and M. Müller.
2002a. The phylogenetic position of the pelobiont Mastigamoeba
balamuthi based on sequences of rDNA and translation
elongation factors EF-1α and EF-2. J. Eukaryot.
Microbiol. 49:1–10.
Arisue, N., T. Hashimoto, H. Yoshikawa, Y. Nakamura, G.
Nakamura, F. Nakamura, T. Yano, and M. Hasegawa. 2002b.
Phylogenetic position of Blastocystis hominis and of stramenopiles inferred from multiple molecular sequence data. J.
Eukaryot. Microbiol. 49:42–53.
Arisue, N., Y. Maki, H. Yoshida, A. Wada, L. B. Sánchez, M.
Müller, and T. Hashimoto. 2004. Comparative analysis of the
ribosomal components of the hydrogenosome-containing
protist, Trichomonas vaginalis. J. Mol. Evol. 59:59–71.
Baldauf, S. L. 2003. The deep roots of eukaryotes. Science
300:1703–1706.
Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle.
2000. A kingdom-level phylogeny of eukaryotes based on
combined protein data. Science 290:972–977.
Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W.
Sensen, P. Gordon, L. Duruflé, T. Gaasterland, P. Lopez,
M. Müller, and H. Philippe. 2002. The analysis of 100 genes
supports the grouping of three highly divergent amoebae:
Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl.
Acad. Sci. USA 99:1414–1419.
Cavalier-Smith, T. 2002. The phagotrophic origin of eukaryotes
and phylogenetic classification of Protozoa. Int. J. Syst. Evol.
Microbiol. 52:297–354.
Cavalier-Smith, T., and E. E. Chao. 2003a. Phylogeny of
choanozoa, apusozoa, and other protozoa and early eukaryote
megaevolution. J. Mol. Evol. 56:540–563.
Cavalier-Smith, T., and E. E. Chao. 2003b. Phylogeny and
classification of phylum Cercozoa (Protozoa). Protist
154:341–358.
Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc. R. Soc.
Lond. Ser. B 271:1251–1262.
Dacks, J. B., and W. F. Doolittle. 2001. Reconstructing/
deconstructing the earliest eukaryotes: how comparative
genomics can help. Cell 107:419–425.
Dacks, J. B., J. D. Silberman, A. G. B. Simpson, S. Moriya,
T. Kudo, M. Ohkuma, and R. J. Redfield. 2001. Oxymonads
are closely related to the excavate taxon Trimastix. Mol. Biol.
Evol. 18:1034–1044.
Dacks, J. B., A. Marinets, W. F. Doolittle, T. Cavalier-Smith, and
J. M. Logsdon Jr. 2002. Analyses of RNA polymerase II
genes from free-living protists: phylogeny, long branch attraction, and the eukaryotic big bang. Mol. Biol. Evol. 19:
830–840.
Edgcomb, V. P., A. J. Roger, A. G. B. Simpson, D. T. Kysela,
and M. L. Sogin. 2001. Evolutionary relationships among
‘‘jakobid’’ flagellates as indicated by alpha- and beta-tubulin
phylogenies. Mol. Biol. Evol. 18:514–522.
Embley, T. M., M. van der Giezen, D. S. Horner, P. L. Dyal, and
P. Foster. 2003. Mitochondria and hydrogenosomes are two
forms of the same fundamental organelle. Philos. Trans. R.
Soc. Lond. B Biol. Sci. 358:191–201.
Felsenstein, J. 1978. Cases in which parsimony or compatibility
methods will be positively misleading. Syst. Zool. 27:401–410.
Gribaldo, S., and H. Philippe. 2002. Ancient phylogenetic
relationships. Theoret. Popul. Biol. 61:391–408.
Hasegawa, M., and H. Kishino. 1994. Accuracies of the simple
methods for estimating the bootstrap probability of a
maximum-likelihood tree. Mol. Biol. Evol. 11:142–145.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the
human-ape splitting by a molecular clock of mitochondrial
DNA. J. Mol. Evol. 22:160–174.
Hashimoto, T., L. B. Sánchez, T. Shirakura, M. Müller, and M.
Hasegawa. 1998. Secondary absence of mitochondria in
Giardia lamblia and Trichomonas vaginalis revealed by
valyl-tRNA synthetase phylogeny. Proc. Natl. Acad. Sci.
USA 95:6860–6865.
Hashimoto, T., Y. Nakamura, T. Kamaishi, and M. Hasegawa.
1997. Early evolution of eukaryotes inferred from protein
phylogenies of translation elongation factors 1α and 2. Arch.
Protistenkd. 148:287–295.
Hirt, R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F.
Doolittle, and T. M. Embley. 1999. Microsporidia are related
to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:
580–585.
Horner, D. S., and T. M. Embley. 2001. Chaperonin 60
phylogeny provides further evidence for secondary loss of
mitochondria among putative early-branching eukaryotes.
Mol. Biol. Evol. 18:1970–1975.
Hughey, R., and A. Krogh. 1996. Hidden Markov models for
sequence analysis: extension and analysis of the basic method.
Comput. Appl. Biosci. 12:95–107.
Inagaki, Y., E. Susko, N. M. Fast, and A. J. Roger. 2004.
Covarion shifts cause a long-branch attraction artifact that
unites Microsporidia and Archaebacteria in EF-1α phylogenies.
Mol. Biol. Evol. 21:1340–1349.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid
generation of mutation data matrices from protein sequences.
Comput. Appl. Biosci. 8:275–282.
Keeling, P. J. 2001. Foraminifera and Cercozoa are related in
actin phylogeny: two orphans find a home? Mol. Biol. Evol.
18:1551–1557.
Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from
early-diverging eukaryotic lineages and the evolution of the
tubulin family. Mol. Biol. Evol. 13:1297–1305.
Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum
likelihood inference of protein phylogeny, and the origin of
chloroplasts. J. Mol. Evol. 31:151–160.
Lee, J. J., G. F. Leedale, and P. Bradbury, eds. 2000. An
Illustrated Guide of Protozoa, 2nd ed. Society of Protozoologists, Lawrence, Kans.
Martin, W., and M. Müller. 1998. The hydrogen hypothesis of
the first eukaryote. Nature 392:37–41.
Matsuzaki, M., O. Misumi, T. Shin-i et al. (42 co-authors). 2004.
Genome sequence of the ultrasmall unicellular red alga
Cyanidioschyzon merolae 10D. Nature 428:653–657.
McArthur, A. G., H. G. Morrison, J. E. Nixon et al. (15 co-authors).
2000. The Giardia genome project database. FEMS
Microbiol. Lett. 189:271–273.
Moreira, D., H. Le Guyader, and H. Philippe. 2000. The origin of
red algae and the evolution of chloroplasts. Nature 405:69–72.
Nara, T., T. Hashimoto, and T. Aoki. 2000. Evolutionary
implications of the mosaic pyrimidine-biosynthetic pathway
in eukaryotes. Gene 257:209–222.
Nozaki, H., M. Matsuzaki, M. Takahara, O. Misumi, H. Kuroiwa,
M. Hasegawa, T. Shin-i, Y. Kohara, N. Ogasawara, and T.
Kuroiwa. 2003. The phylogenetic position of red algae
revealed by multiple nuclear genes from mitochondria-containing
eukaryotes and an alternative hypothesis on the
origin of plastids. J. Mol. Evol. 56:485–497.
O’Kelly, C. J., and T. A. Nerad. 1999. Malawimonas
jakobiformis n. gen., n. sp. (Malawimonadidae n. fam.): a
jakoba-like heterotrophic nanoflagellate with discoidal mitochondrial cristae. J. Eukaryot. Microbiol. 46:522–531.
Philippe, H., P. Lopez, H. Brinkmann, K. Budin, A. Germot, J.
Laurent, D. Moreira, M. Muller, and H. Le Guyader. 2000.
Early-branching or fast-evolving eukaryotes? An answer
based on slowly evolving positions. Proc. R. Soc. Lond. B
Biol. Sci. 267:1213–1221.
Philippe, H., E. A. Snell, E. Bapteste, P. Lopez, P. W. Holland,
and D. Casane. 2004. Phylogenomics of eukaryotes: impact of
missing data on large alignments. Mol. Biol. Evol. 21:1740–
1752.
Richards, T. A., R. P. Hirt, B. A. P. Williams, and T. M. Embley.
2003. Horizontal gene transfer and the evolution of parasitic
protozoa. Protist 154:17–32.
Roger, A. J. 1999. Reconstructing early events in eukaryotic
evolution. Am. Nat. 154:S146–S163.
Roger, A. J., and J. D. Silberman. 2002. Mitochondria in hiding.
Nature 418:827–829.
Rotte, C., K. Henze, M. Müller, and W. Martin. 2000. Origins of
hydrogenosomes and mitochondria. Curr. Opin. Microbiol.
3:481–486.
Shimodaira, H. 2002. An approximately unbiased test of
phylogenetic tree selection. Syst. Biol. 51:492–508.
Shimodaira, H., and M. Hasegawa. 2001. CONSEL: for
assessing the confidence of phylogenetic tree selection.
Bioinformatics 17:1246–1247.
Silberman, J. D., A. G. B. Simpson, J. Kulda, I. Cepicka, V.
Hampl, P. J. Johnson, and A. J. Roger. 2002. Retortamonad
flagellates are closely related to diplomonads–implications for
the history of mitochondrial function in eukaryote evolution.
Mol. Biol. Evol. 19:777–786.
Simpson, A. G. B., and D. J. Patterson. 1999. The ultrastructure
of Carpediemonas membranifera: (Eukaryota), with reference
to the ‘‘excavate hypothesis.’’ Eur. J. Protistol. 35:353–
370.
Simpson, A. G. B., and A. J. Roger. 2002. Eukaryotic evolution:
getting to the root of the problem. Curr. Biol. 12:R691–
R695.
Simpson, A. G. B., A. J. Roger, J. D. Silberman, D. D. Leipe, V.
P. Edgcomb, L. S. Jermiin, D. J. Patterson, and M. L. Sogin.
2002. Evolutionary history of ‘‘early-diverging’’ eukaryotes:
the excavate taxon Carpediemonas is a close relative of
Giardia. Mol. Biol. Evol. 19:1782–1791.
Sogin, M. L., and J. D. Silberman. 1998. Evolution of the protists
and protistan parasites from the perspective of molecular
systematics. Int. J. Parasitol. 28:11–20.
Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:
89–91.
Stechmann, A., and T. Cavalier-Smith. 2003a. The root of the
eukaryote tree pinpointed. Curr. Biol. 13:R665–666.
Stechmann, A., and T. Cavalier-Smith. 2003b. Phylogenetic
analysis of eukaryotes using heat-shock protein Hsp90. J.
Mol. Evol. 57:408–419.
Tovar, J., G. Leon-Avila, L. B. Sánchez, R. Sutak, J. Tachezy, M.
van der Giezen, M. Hernandez, M. Müller, and J. M. Lucocq.
2003. Mitochondrial remnant organelles of Giardia function
in iron-sulphur protein maturation. Nature 426:172–176.
Wang, C. C., R. Verham, S. F. Tzeng, S. Aldritt, and H. W.
Cheng. 1983. Pyrimidine metabolism in Tritrichomonas
foetus. Proc. Natl. Acad. Sci. USA 80:2564–2568.
Wang, C. C., and H. W. Cheng. 1984. Salvage of pyrimidine
nucleosides by Trichomonas vaginalis. Mol. Biochem. Parasitol. 10:171–184.
Williams, B. A., R. P. Hirt, J. M. Lucocq, and T. M. Embley.
2002. A mitochondrial remnant in the microsporidian
Trachipleistophora hominis. Nature 418:865–869.
Wuyts, J., P. De Rijk, Y. Van de Peer, T. Winkelmans, and R. De
Wachter. 2001. The European Large Subunit Ribosomal RNA
database. Nucleic Acids Res. 29:175–177.
Wuyts, J., Y. Van de Peer, T. Winkelmans, and R. De Wachter.
2002. The European database on small subunit ribosomal
RNA. Nucleic Acids Res. 30:183–185.
Yang, Z. 1997. PAML: a program package for phylogenetic
analysis by maximum likelihood. Comput. Appl. Biosci.
13:555–556.
Mark Embley, Associate Editor
Accepted October 8, 2004