Syst. Biol. 64(3):406–415, 2015 © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: [email protected] DOI:10.1093/sysbio/syu126 Advance Access publication December 23, 2014 Taxon-Rich Phylogenomic Analyses Resolve the Eukaryotic Tree of Life and Reveal the Power of Subsampling by Sites LAURA A. KATZ1,2∗ AND JESSICA R. GRANT1 1 Department of Biological Sciences, Smith College, Northampton, MA 01063, USA and 2 Program in Organismic and Evolutionary Biology, UMass-Amherst, Amherst MA 01003, USA ∗ Correspondence to be sent to: Department of Biological Sciences, Smith College, Northampton, MA 01063, USA; E-mail: [email protected] Received 9 March 2014; reviews returned 17 July 2014; accepted 15 December 2014 Associate Editor: Thomas Buckley Abstract.—Most eukaryotic lineages are microbial, and many have only recently been sampled for phylogenetic studies or remain in the “dark area” of the tree of life where there are no molecular data. To assess relationships among eukaryotic lineages, we perform a taxon-rich phylogenomic analysis including 232 eukaryotes selected to maximize taxonomic diversity and up to 1554 genes chosen as vertically inherited based on their broad distribution among eukaryotes. We also include sequences from 486 bacteria and 84 archaea to assess the impact of endosymbiotic gene transfer (EGT) from plastids and to detect contamination. Overall, our analyses are consistent with other less taxon-rich estimates of the eukaryotic tree of life, and we recover strong support for five major clades: Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida, and SAR (Stramenopila, Alveolata, and Rhizaria). Our analyses also highlight the existence of “orphan” lineages, lineages that lack robust placement in the eukaryotic tree of life, and indicate the possibility of as yet undiscovered diversity. In analyses including bacteria and archaea, we find that approximately 10% of the 1554 genes, which we choose because they are found in four or five of the five major eukaryotic clades and hence may be more likely to be inherited vertically, appear to have been acquired from cyanobacteria through EGT in photosynthetic lineages. Removing these EGT genes places the green algae as sister to the glaucophytes instead of the red algae, suggesting that unknowingly including genes of plastid origin, and combining them with genes of nuclear origin, may mislead phylogenetic estimates. Finally, the large size of our data set allows comparative analyses of subsets of data; alignments built from randomly sampled sites provide greater support, particularly for deep relationships, than do equivalent-sized data sets built from randomly sampled genes. [Endosymbiotic gene transfer; eukaryotic phylogeny; microbial diversity; phylogenetics; sampling strategy; tree of life.] The power of phylogenomics for resolving relationships is well-documented for groups shallower than eukaryotes (Delsuc et al. 2005; Philippe et al. 2005; Giribet and Edgecombe 2012; Kumar et al. 2012), and has been applied previously to questions of eukaryotic phylogeny (Hampl et al. 2009; Burki et al. 2010; Burki et al. 2012; Torruella et al. 2012; Sierra et al. 2013) and eukaryotic origins (Cotton and McInerney 2010; Thiergart et al. 2012; Williams et al. 2012). However, given the constraints on large-scale phylogenomic analyses and data availability, the bulk of these studies focused on relatively small numbers of lineages, often in the range of 50–70 species. Yet, the past few years have seen an explosion in genome scale data (e.g., whole genomes and transcriptomes) and high-powered computer resources allowing for more data rich analyses. Reconstructing eukaryotic phylogeny is particularly challenging given the approximately 1.8 billion year age of this clade (Knoll et al. 2006; Parfrey et al. 2011), the tremendous diversity among eukaryotes (Adl et al. 2012), and the impact of both lateral and endosymbiotic gene transfer (LGT and EGT; Andersson 2005; Keeling and Palmer 2008; Archibald 2009; CavalierSmith 2010; Hug et al. 2010; Nowack and Melkonian 2010; Szklarczyk and Huynen 2010). Eukaryotes are marked by tremendous heterogeneity in rates and patterns of molecular evolution, including the rapid evolution of proteins in many lineages (Dacks et al. 2002; Zufall et al. 2006; Yoon et al. 2008; Cuomo et al. 2012). Despite these difficulties, five major clades of eukaryotes have emerged: Amoebozoa, Excavata, Opisthokonta, Archaeplastida, and SAR (stramenopile, alveolate, and rhizaria; e.g., Adl et al. 2012; Katz et al. 2012). The evolutionary history of eukaryotic genomes has also been impacted by the transfer of genes from endosymbionts (i.e., EGT; Martin et al. 1993; 2002). Eukaryotes acquired plastids by endosymbiosis of a cyanobacterium or through secondary endosymbiosis in which a heterotrophic eukaryote engulfed a red or green alga (Delwiche 1999; Cavalier-Smith 2000; Archibald 2009). The major clade Archaeplastida includes green algae, red algae, and glaucophytes, and is argued to have diverged following a single primary acquisition of chloroplasts (Adl et al. 2005). The remaining lineages of photosynthetic eukaryotes (e.g., diatoms, brown algae, euglenoids, cryptomonads, haptophytes, and dinoflagellates) are argued to have resulted from secondary, or in the case of some dinoflagellates, tertiary, or perhaps even quaternary endosymbiosis (Delwiche 1999; Archibald 2009). Given the importance of taxon sampling for phylogenetic reconstruction (Heath et al. 2008), we aimed to capture as broad a diversity of eukaryotes as possible. For example, in studies of eukaryotes (Parfrey et al. 2010), Metazoa (Dunn et al. 2008; Hejnol et al. 2009), and vertebrates (Wiens 2005; Wiens and Tiu 2012), inclusion of taxa in key positions has been critical in stabilizing phylogenies even if these taxa introduced extensive missing data. With well-sampled phylogenomic data, it is also possible to test the idea that robust phylogenies are better obtained from analyses of randomly chosen sites when compared with data 406 2015 KATZ AND GRANT—TAXON-RICH PHYLOGENOMICS OF EUKARYOTES sets of the same size constructed from unlinked genes (Cummings et al. 1995, 1999; Cummings and Meyer 2005). Here we combine a taxon and gene rich approach, with up to 802 species (232 eukaryotes, 486 bacteria, and 84 archaea) and up to 1554 genes (∼500,000 characters), to assess the evolutionary relationships among eukaryotes and the position of eukaryotes on the tree of life. Inclusion of bacterial lineages also enabled us to identify eukaryotic genes that fall sister to cyanobacteria and are hence likely the result of EGTs following acquisition of plastids. The large size of our data set allows us to assess the stability of clades using a variety of subsampling approaches. MATERIALS AND METHODS Taxon and Gene Selection Given our goal of creating a taxon rich phylogenomic study, we collected data from diverse eukaryotes containing at least 100 proteins available on public databases, and curated these sequences in a custombuilt pipeline as described in Grant and Katz (2014). To this end, we began with the taxa included in OrthoMCL (Li et al. 2003; Chen et al. 2006), a database of over 124,000 orthologous groups from whole genomes of 98 eukaryotes, 44 bacteria, and 16 archaea. We then added sequence data from all eukaryotic taxa in GenBank with at least 100 proteins either in the protein database or as EST projects and data from transcriptome projects from 407 our laboratory (Chilodonella uncinata, Subulatomonas tetraspora, Corallomyxa tenera) or from the Marine Microbial Eukaryote Transcriptome Sequencing project (http://marinemicroeukaryotes.org/; Trichosphaerium sp. ATCC40318, Filamoeba nolandi ATCC50430, Mayorella sp., Stereomyxa ramosa, Pessonella sp. PRA-29, Rhodomonas lens, Tetraselmis chuii, Mesodinium pulex, Strombidinopsis sp., and Tiarina fusus). To maintain some taxonomic evenness, we subsampled from groups that are well represented in GenBank (i.e., plants, animals, fungi, and some parasitic microbial lineages such as Plasmodium and Trypanosoma). We also removed taxa from the analyses if the data appeared to be highly contaminated; for example, the EST data from the rhizarian plant parasites Plasmodiophora brassicae and Spongospora subterranea included many sequences that fell within Archaeplastida in preliminary analyses. Other taxa (e.g., Welwitschia mirabilis and Ginkgo biloba) were removed because they had few data that remained in our matrices after data refinement was complete. In the final analyses, we also removed the microsporidian fungi as they consistently fell on very long branches and their position was unstable across analyses. To capture putatively vertically inherited genes, we identified 1554 orthologous groups from OrthoMCL (Chen et al. 2006) that were present in at least four of five major eukaryotic clades (i.e., Opisthokonta, Archaeplastida, Amoebozoa, Excavata and SAR) as represented in OrthoMCL (http://orthomcl. org/orthomcl/; Fig. 1 and Supplementary Table S2; FIGURE 1. Depiction of output from phylogenomic pipeline reveals heterogeneity in data availability for the 1554 genes sampled. Each column represents a gene and the warmth (darkness) of the color represents the number of sequences captured for each gene and clade. The genes are ranked by evenness as measured by representation across both major clades and subclades, and the most evenly sampled 150 genes are marked by the bar at the bottom of figure. The last row indicates genes identified as impacted by EGT as orthologs in photosynthetic eukaryotes nest among cyanobacteria. Am = Amoebozoa: ac, Acanthamoebidae; ar, Archamoebae; da, Dactylopodida; di, Dictyosteliida; fi, Filamoeba; is, incertae sedis; my, Mycetozoa; va, Vannellidae; EE = “everything else”: ap, Apusomonadidae; br, Breviatea; cr, Cryptophyta; ha, Haptophyta; ka, Katablepharidophyta; Ex = Excavata; eu, Euglenozoa; fo, Fornicata; he, Heterolobosea; ja, Jakobida; ma, Malawimonadidae; ox, Oxymonadida; pa, Parabasalia; Op = Opisthokonta; ch, Choanoflagellida; fu, Fungi; ic, Ichthyosporea; me, Metazoa; Pl = Archaeplastida/Plantae: gl, Glaucocystophyceae; gr, Viridiplantae; rh, Rhodophyta; Sr = SAR: ap, Apicomplexa; ch, Chromerida; ci, Ciliophora; di, Dinophyceae; pe, Perkinsida; rh, Rhizaria; st, Stramenopiles. 408 SYSTEMATIC BIOLOGY available from http://dx.doi.org/10.5061/dryad.db78g). We generated alignments using a pipeline that is described in detail in Grant and Katz (2014). In brief, genes are identified first by BLAST, using data from OrthoMCL as query and then pairwise comparisons of sequences are used to remove too similar (>98% identical; e.g., allelic variation) or too divergent (< 70% identical; e.g., non-orthologous) data. Alignments are then refined using Guidance (Penn et al. 2010) and RaxML (Stamatakis 2006) is used to build single gene trees. The resulting data set had 802 taxa (232 eukaryotes, 486 bacteria, and 84 archaea) and up to 493,050 amino acids. The well-sampled marker SSU-rDNA gene was also concatenated to some gene matrices. To maximize the power of our tree reconstructions, we put considerable effort into analysis of the 150 genes that had the most evenly distributed taxa across the eukaryotic clades (e.g., Excavata, Amoebozoa, Alveolata, Rhizaria, Stramenopile, Archaeplastida, and Opisthokonta; indicated in Fig. 1). Phylogenetic Analyses For all analyses, the best fitting model for each gene was determined using ProtTest (Darriba et al. 2011). The model LG was determined to be the best fitting model for all but 8 of the 1554 genes. Of the eight, six had WAG as the best fit, one had JTT, and one had BLOSUM, but of these genes all had LG as the second-best fit with a deltaBIC < 10. Because estimates from ProtTest were very similar for +G and +G+I, and because of the computational cost of partitioning these data, we ran all amino acid matrices under the PROTGAMMALG model using default parameters as implemented in RAxML. Where SSU-rDNA nucleotide data were included, we partitioned the data to run with GTRGAMMA for nucleotide data and PROTGAMMALG for amino acids. Phylogenies were reconstructed in one of several ways: 1) for smaller and more tractable data sets we used RAxML (Stamatakis et al. 2005; Stamatakis 2006; Berger et al. 2011) with rapid bootstrapping of 100 replicates (Stamatakis et al. 2008) followed by full maximum likelihood reconstruction; 2) for larger data sets, and for subsampling, we used ExaML (Stamatakis and Aberer 2013), a faster implementation of the RAxML algorithm that provides the topology of the most likely tree without bootstrap support (BS); 3) we also ran an analysis in PhyloBayes (PhyloBayesMPI 1.4f; Lartillot et al. 2009) as implemented at CIPRES (Miller et al. 2010) using the GTR CAT model, to assess the stability of our results to changing models. Although the PhyloBayes run did not converge after approximately 52 h on a 192 core computer (up to 2750 cycles), likely due to instability among closely related taxa, deeper nodes were consistent with our other analyses. Analyses are named in part based on their inclusion of either the 150 most evenly sampled genes or the full set of 1554 genes (“150” vs. “1554”), whether they were run without genes impacted by (“EGT” vs “-EGT” and varying taxon inclusion to focus only on eukaryotes VOL. 64 (“E”), or to include also Bacteria and Archaea (“A, B, E”; Table 1). We explored the impact of missing data using a range of levels (40–70%) and found that topologies were generally stable to varying levels of missing data (i.e., supported clades retained across analyses). Based on these preliminary analyses, we choose an arbitrary cut-off of more than 50% missing data (per column) for removal in all analyses reported here. As detailed in the legend for Figures 1 and 2; we also used a standard naming system with the first two letters referring to major clade (e.g., Op for Opisthokonta), the second two letters referring to subclade (e.g., me for Metazoa), and the next four letters referring to species (e.g., hsap for Homo sapiens). Comparing Sampling by Sites versus Genes Data sets of the size generated by our pipeline are too large to be analyzed by most phylogenetic software programs. This allowed us to ask whether researchers analyzing phylogenomic data are best off subsampling by genes or by randomly chosen sites. Subsampling by genes at random might enable capture of diverse genes with varying levels of functional constraint, whereas subsampling by sites may better fit the assumption of site independence inherent in most phylogenetic models. We subsampled the full data set of 1554 genes and 493,050 amino acids (again masking columns with ≥50% missing data), choosing random genes or random sites to make new matrices of varying lengths from 1000 to 45,000 characters. For each sampling strategy, we made 20 alignments of lengths 1000, 5000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, and 45,000. These were analyzed with EXaML and the proportion of the 20 topologies that supported monophyly of a particular clade is reported. We assessed the impact of fast evolving sites in our analyses using TIGER (Cummins and McInerney 2011) with default parameters. TIGER infers the evolutionary rate of homologous sites, and groups sites with similar rates into bins. We removed sites from the two and three highest bins (i.e. fastest 14,800 and 26,017 sites, respectively) from the alignments “150” and “150-EGT” with eukaryotes only, respectively, and analyzed the resulting matrices in RAxML as described above. Endosymbiotic Gene Transfer To assess EGTs and possible contamination, we also included whole, genome data from 486 bacteria and 84 archaea, limiting ourselves to two species per genus for well-sampled genera. Prior to concatenation, we removed possible contaminating sequences identified as single eukaryotes (i.e., without eukaryotic sisters) within bacterial and/or archaeal clades. To search for genes with putative signal of EGT, we use python scripts that employed the tree-walking methods in p4 (Foster 2004) and identified genes with eukaryotic sequences sister to or nested within cyanobacteria. We identified 122 genes 2015 409 KATZ AND GRANT—TAXON-RICH PHYLOGENOMICS OF EUKARYOTES TABLE 1. Monophyly of major clades is consistent across most analyses, though with varying levels of support Analysis Domains Number of taxa Number of characters Software 150 E 232 36,346 RAx 150-EGT E 232 35,302 RAx 150 E 232 36,346 EXa 150-EGT E 232 35,302 EXa 1554 E 232 64,860 EXa 1554-EGT E 232 54,939 EXa 150 A, B, E 800 36,346 EXa 150-EGT A, B, E 800 35,302 EXa 1554 A, B, E 800 64,860 EXa 1554-EGT A, B, E 800 54,939 EXa 150 E 235 34,991 PB Amoebozoa Excavata Opisthokonts Plantae SAR 85 81 100 82 66 93 85 100 40 70 o o x x x x x x x x o o x x x o o x x x o o x x x o o x x x o o x o x o o x x x 1 o 1 0.78 0.91 Alveolates Ap, Di, Ch, Pe Discoba Euglenozoa Green + Red Green + Glauc Hacrobia Rhizaria (Sp + Ct) + Me 100 100 98 100 67 o 57 86 99 100 100 92 100 o 75 45 89 99 x x x x x o x x x x x x x o x x x x x x x x x o x x x x x x x o x x x x x x x x x o x x x x x x x o x x x x x x x x o o x x x x x x x o x x x x 0.73 0.91 0.95 0.95 0.79 o 0.69 o o Notes: Analyses are named first based on the number of genes using in initial data set (i.e., “150” for 150 most evenly sampled genes; 1554 for full data set), and then by noting whether genes impacted by EGT are removed (i.e., -EGT). Analyses vary based on the domains included and the resulting number of taxa. x = monophyletic; o = nonmonophyletic; for RAxML (RAx), numbers represent BS for monophyletic clades; for PhyloBayes (PB), numbers represent posterior probabilities of monophyletic clades. The large size of the data sets constrained the approaches that could be used. For example, ExaML (EXa) was used for larger data sets even though it appears to produce topologies that are consistently different in some areas when compared with RAxML (RAx); for example, neither Amoebozoa or Excavata are recovered consistently in EXaML analyses as the long branched Entamoeba tend to fall in Excavata (online supplementary Data). The PB analyses were done excluding SSU-rDNA and including the microsporidians, which were removed from other analyses as these taxa are long branched and unstable across analyses. Also, in the PhyloBayes analyses, Picobiliphyte sp. MS584-22 is sister to Rhizaria whereas in ML analyses, this taxon falls in Orphan clade 1 (Fig. 2) along with the cryptophytes, haptophytes, Roombia truncata, and Telonema subtile. Abbreviations are: Ap, Di, Ch, Pe = apicomplexa, dinoflagellates, chromerids, and perkinsida and (Sp + Ct) + Me = sponges and ctenophores outside of the rest of the Metazoa. from the 1554 (12 of which were from the 150 most even genes) that are putative examples of EGT. We removed these genes from some analyses to assess the impact of EGT (e.g., 150-EGT and 1554-EGT). Placement of Eukaryotes on the Tree of Life To address the placement of eukaryotes on the tree of cellular life while minimizing the impact of lateral gene transfer (LGT), we concatenated the 207 genes from the 1554 gene set that included bacteria and/or archaea, and that yielded monophyletic eukaryotes. We removed taxa with data in fewer than 20 genes and columns with more than 50% missing data. The resulting alignment had 442 taxa and 17,220 characters. This alignment was analyzed under the PROTGAMMALG model of RAxML with rapid bootstrapping of 100 replicates, followed by full maximum likelihood reconstruction. RESULTS AND DISCUSSION Phylogenomic Data Set Generated for this Study Our pipeline yielded an alignment of 1554 genes from 232 eukaryotes, and up to 486 bacteria and 84 archaea (Figs. 1 and 2; supplementary Tables S1 and S2). Among the eukaryotes, the focus of this study, available data are not evenly distributed among lineages, with overrepresentation of genes from macroscopic clades (e.g., animals (Op_me), fungi (Op_fu), green algae, including land plants (Pl_gr)) and microbial clades with many whole-genome sequences (e.g., Apicomplexa such as Plasmodium (Sr_ap), Euglenozoa like Trypanosoma and Leishmania (Ex_eu)). Other clades are underrepresented in our data set, either because there are few molecular data for members (e.g., Amoebozoa:Dactylopodida (Am_da) and Amoebozoa:Filamoeba (Am_fi)) and/or because these clades are relatively species poor (e.g., Excavata:Oxymonadida (Ex_ox) and SAR:Perkinsida (Sr_pe)). Analyses of 150 Most Evenly Sampled Genes Yield Robust Estimate of the Eukaryotic Tree of Life We put the bulk of our effort into analyses of the 150 genes that were most evenly sampled across major clades, representing a data set of 36,346 characters (Fig. 1 and Table 1 analysis “150”). The resulting topology supports five major clades of eukaryotes (Fig. 2)—Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida, and SAR—and is consistent with a subset of previous analyses that used fewer taxa (Hampl et al. 2009) or fewer genes (Yoon et al. 2008; Parfrey et al. 2010). The placement of Malawimonas has also been unstable in other analyses (Simpson et al. 2006; Hampl et al. 2009; 410 SYSTEMATIC BIOLOGY VOL. 64 FIGURE 2. Most likely tree from 150 most even genes, including 232 eukaryotes and 36,346 characters, reveals five major clades of eukaryotes and several “orphan” lineages (lineages lacking clear sister taxa at this time). The tree is drawn rooted between Opisthokonta and the remaining eukaryotic clades, though we recognize that the root of the eukaryotic tree of life is under debate. Subclades indicated in bold are monophyletic and include all subclades except the diatoms as the single stramenopile taxon, Vaucheria litorea, interrupts the monophyly of diatoms in this analysis. Notations on branches indicate full BS (asterisk) or ≥80% BS (filled circles). This analysis is entitled “150” in Table 1, and abbreviations are as in Figure 1. 2015 KATZ AND GRANT—TAXON-RICH PHYLOGENOMICS OF EUKARYOTES Derelle and Lang 2011; Zhao et al. 2012), suggesting that this genus may not belong to the major clade Excavata but may instead represent an orphan lineage (i.e., one without clear position given current taxon sampling). In a RAxML analysis, BS for these major clades varied from 100% for Opisthokonta, to more than 80% for Amoebozoa, Archaeplastida, and Excavata (without Malawimonas), to 66% for SAR (Fig. 2 and Table 1 analysis “150”). Nested relationships within major eukaryotic clades are generally concordant with previous phylogenies, with monophyly indicated by bold text in Figure 2. Monophyletic clades include animals, fungi, green algae, embryophytes, red algae, brown algae, and apicomplexa (Table 1; online Supplementary Data). The one exception is that the diatoms are not monophyletic as the single representative of the golden algae, Vaucheria litorea, falls on a long branch within the diatoms (Fig. 2). Analyses with varying taxon inclusion and software (i.e., RaxML, ExaML, and PhyloBayes) generate similar topologies (Table 1). Removing rapidly evolving sites from these alignments using TIGER (Cummins and McInerney 2011) yields consistent topology with lower support for most clades. We depict the topology rooted between Opisthokonta and the remaining eukaryotes (consistent with some previous studies (Stechmann and CavalierSmith 2002; Derelle and Lang 2011; Katz et al. 2012), though we recognize the debate around the placement of the root of the eukaryotic tree of life. As further evidence of the robustness of analyses of the 150 most even genes, several hypotheses of relationships within major clades are supported. For example, we recover the monophyly of ascomycetes and basidiomycetes within the fungi (Hibbett et al. 2007) and the Discoba within Excavata, which contains Heterolobosea, Euglenozoa, and Jakobida (Fig. 2 and Supplementary Table S3; Hampl et al. 2009). We also assessed the controversial topology for early diverging animal lineages and found that our analyses yield a topology for Metazoa with a clade containing a sponge (Oscarella) and ctenophore (Mnemiopsis) as sister to all remaining animals (Fig. 2 and Table 1), a finding consistent with some recent phylogenomic analyses (Dunn et al. 2008; Hejnol et al. 2009). Our taxon rich analyses also highlight the presence of several clades of “orphan” lineages, lineages that lack clear sister taxa and/or are unstable in position across our trees. The first of these clades, “Orphan clade 1” (Fig. 2), contains several photosynthetic lineages such as the small (∼6m) picobiliphytes, the katablepharid taxon Roombia truncata, and the incertae sedis taxon Telonema subtile. These lineages fall sister to the haptophytes and cryptophytes, two lineages previously considered to be part of the now falsified taxon “Chromalveolata” (Parfrey et al. 2010). The position of these taxa is controversial, in part because some have been placed in the “Hacrobia”, which in turn is a taxon with changing definition (Okamoto et al. 2009; Burki et al. 2012). A second group of orphans, “Orphan clade 2” (Fig. 2), includes the controversial genus Malawimonas, 411 that has been argued to belong to Excavata, though it is often found outside of this major clade (Simpson et al. 2006; Hampl et al. 2009; Derelle and Lang 2011; Zhao et al. 2012). In many of our analyses, Malawimonas falls sister to Collodictyon triciliatum (Zhao et al. 2012), though the placement of these two taxa is unstable (see online Supplementary Data). The final orphan clade includes two taxa previously identified as a potential novel clade of eukaryotes, S. tetraspora and Breviata anathema (Walker et al. 2006; Katz et al. 2011; Grant et al. 2012), and Thecamonas (Amastigomonas) trahens, a member of the Apusozoa (Cavalier-Smith 1997; CavalierSmith and Chao 2003). The position of these lineages varies by analyses (online supplementary Data). Analyses of 1554 Eukaryotic Genes We also assessed relationships using the full data set of 1554 genes, though the large size here (232 eukaryotes × 493,050 characters) made analyses challenging. For example, full RaxML analyses with bootstraps were not possible even with the high-powered computing that we accessed through CIPRES (Miller et al. 2010) and University Florida Research Computing Center. Instead, we used ExaML, despite differences in reconstructions based on these analyses as compared to RaxML; ExaML generally fails to recover monophyly of Excavata (even without Malawimonas) or Amoebozoa because Archamoeba go within Excavata, sister to a clade containing Fornicata and parabasalids (Table 1 analysis “1554”, online Supplementary Data). Given the growing body of evidence supporting these clades (e.g., Hampl et al. 2009; Parfrey et al. 2010; Adl et al. 2012), we suspect that lack of monophyly of Amoebozoa and Excavata (with or without the controversial Malawimonas) may be due to differences in implementation of ML analysis in ExaML when compared with RaxML. Beyond differences that might be attributed to software package, the 1554 analyses are generally concordant with analyses of 150 most evenly sampled genes in that we recover Opisthokonta, Archaeplastida, SAR, and many of the nested clades (Table 1). Analyses Including Bacterial and/or Archaeal Orthologs Inclusion of bacterial and archaeal orthologs enabled us to look at the impact of genes of endosymbiotic origin and the placement of eukaryotes on the tree of cellular life. We identified 122 genes where homologs in photosynthetic eukaryotes appear to be descended from the cyanobacterial ancestor of plastids (see dashes at bottom of Fig. 1, and Supplementary Table S2). These genes were identified by the placement of photosynthetic eukaryotes sister to at least 1 of the 10 cyanobacteria included in our pipeline. Removal of 12 potential EGT genes from the 150 analyses had little impact on the structure of the tree except for in a few places (compare “150 euks” to “150-EGT” in Table 1). Notably, the support for the monophyly of Archaeplastida dropped from 82% 412 SYSTEMATIC BIOLOGY VOL. 64 BS to 40% BS and within Archaeplastida green algae are sister to glaucophytes (75% BS) instead of sister to red algae (67% BS). Removal of 122 potential EGT genes from the 1554 had the same topological effect, though the size of this data set constrained us to use ExaML (Table 1). These data indicate that inclusion of just a few genes impacted by EGT can have a substantial impact on interpretation of evolutionary relationships among photosynthetic taxa. To evaluate the placement of eukaryotes on the tree of life and to explore root of eukaryotes, we concatenated genes that met two criteria: they included bacteria and/or archaea and generated single gene trees with monophyletic eukaryotes. We restricted ourselves to the trees with monophyletic eukaryotes as a means of lessening the impact of both EGT and LGT on our analyses. Phylogenetic reconstructions of the resulting 207 genes revealed that eukaryotes emerge as a lineage of archaea, sister to species in the archaeal clade Thaumarchaeota (Fig. 3). This result is consistent with multiple analyses that challenge the validity of the three domain hypotheses (e.g., Lake et al. 1984; Tourasse and Gouy 1999; Brown et al. 2001; Williams et al. 2013), including analyses that find eukaryotes emerging from among the “TACK” (Thaumarchaeota, Aigarchaeota, Crenarchaeota, and Korarchaeota) clade of archaea (Williams et al. 2012). Within eukaryotes, the root appears between Fornicata (Excavata such as Giardia lamblia), Archamoeba, and the remaining eukaryotes. However, this root may be due to systematic error such as: 1) neither Amoebozoa nor Excavata are monophyletic in these analyses; and 2) ‘long branch’ lineages such as Trichomonas vaginalis and Entamoeba spp. fall toward root of tree (Fig. 3). More Genes or More Sites? Subsampling phylogenomic data for sites chosen at random yields more robust phylogenies than subsampling by gene (Fig. 4). The large size of our data set enabled us to assess clade stability by subsampling characters to build matrices of varying sizes. We took two approaches here, assessing the consistency of nodes from 20 data sets of a given size created using a subsampling of either genes or randomly chosen sites from the supermatrix. In nearly all cases, nodes were more stable in analyses of randomly chosen sites than in matrices of the same size that consisted of gene segments (Fig. 4). For example, 10,000 characters were sufficient to yield monophyletic Opisthonkonta in 19 of 20 trees when subsampling by site but only half of the trees generated with same-sized subsamples of genes (Fig. 4 and Supplementary Table S4). Similarly, the monophyly of SAR is more stable in subsamples by site; in data sets of 30,000 characters, SAR is recovered in 19 of 20 trees sampled by site and only 12 of 20 trees by gene (Supplementary Table S4). Our findings at a phylogenomic scale are consistent with analyses of smaller data sets. For example, the same FIGURE 3. Phylogenetic analyses of 207 genes that included bacteria/archaea and had a monophyletic clade of eukaryotes place eukaryotes as a lineage of archaea. Topology generated in RaxML based on analysis of 17,220 sites. Only monophyletic groups are labeled, with the notable exception of the paraphyletic Archaea. Bootstrap values are shown for the monophyly of eukaryotes (100%) and the sister relationship between archaea and Thaumarchaeota (79%). A full tree with species names and all bootstrap values can be found as a Newick string in the online Supplementary Data. inference on the greater power of random sites when compared with gene segments was found analyzing loci within fully linked mitochondrial genomes (Cummings et al. 1995, 1999). One explanation is that the nonindependence of sites within a gene can lead to 2015 KATZ AND GRANT—TAXON-RICH PHYLOGENOMICS OF EUKARYOTES 413 tree of life, including identifying major clades plus several clusters of “orphan” lineages (lineages without clear sister taxa). Inclusion of bacterial and archaeal sequences also enabled us to investigate the impact of lateral gene transfer from plastids to host nuclei (EGT) on estimates of evolutionary relationships among photosynthetic eukaryotes. Finally, we demonstrate that random sampling sites provides better estimates of phylogeny when compared with data sets of the same size made from gene segments. SUPPLEMENTARY MATERIAL Supplementary material related to this manuscript has been deposited on Treebase (eukaryotes only concatenated alignments and trees, due to size constraints; TB2:S16260) and Dryad (supplementary data tables, plus all concatenated alignments used for analyses and their associated trees; http://dx.doi.org/ 10.5061/dryad.db78g). FUNDING This work was support by the National Science Foundation AVAToL grant [DEB-1208741 to L.A.K.]. ACKNOWLEDGMENTS FIGURE 4. Subsampling by sites chosen at random yields more robust phylogenetic estimates than subsampling in gene pieces, based on assessment of monophyly of clades in 20 trees produced for each sampling strategy and data set size. Analyses are divided into major clades (top panel), nested hypotheses (mid panel), and groups with morphological synapomorphies (lower panel). Warmer (darker) colors represent greater numbers of trees supporting monophyly, and numerical values are in Supplementary Table S4. Clades recovered significantly more often when sampling by site than by gene are marked with asterisk (P < 0.05 as assessed by a two-tailed t-test implemented in R (http://www.r-project.org/)). Abbreviations are: Ap, Di, Ch, Pe = apicomplexa, dinoflagellates, chromerids, and perkinsida and (Sp + Ct) + Me = sponges and ctenophores together, outside of the rest of the metazoan. inaccurate phylogenetic estimates (Cummings et al. 1995, 1999), which contributed to the suggestion that sampling multiple, unlinked genes from various parts of the genome will help overcome this effect (Cummings and Meyer 2005). However, our sampling of genes chosen at random with respect to their position in genomes did not perform as well as sampling random sites from within these unlinked genes (Fig. 4). Intriguingly, the impact of the sampling effect was greatest on deeper nodes (estimated at ∼1 billion years or more), suggesting the need to rethink sampling approaches for studies of early evolutionary events. SYNTHESIS Using a data set of up to 802 species and 1554 genes, we are able to construct a robust estimate for the eukaryotic The authors are grateful to Gordon Burleigh (University of Florida), T. Martin Embley (Newcastle University), Tom Williams (Newcastle University) for help with computer resources and conversations about the work presented here, and to Jean-David Grattepanche (Smith College) for advice on statistics. They also thank to Mark Miller and Wayne Pfeiffer at CIPRES and Charles Taylor and Oleksandr Moskalenko at the University of Florida for support in using computer clusters. The authors also thank William F. Martin (University of Düsseldorf) and two anonymous reviewers for comments on earlier versions of this manuscript. The Gordon and Betty Moore Foundation provided sequence data and assembly through the Marine Microbial Eukaryote Transcriptome project (http://marinemicroeukaryotes.org/; last accessed September 1, 2015). REFERENCES Adl S.M., Simpson A.G.B., Farmer M.A., Andersen R.A., Anderson O.R., Barta J.R., Bowser S.S., Brugerolle G., Fensome R.A., Fredericq S., James T.Y., Karpov S., Kugrens P., Krug J., Lane C.E., Lewis L.A., Lodge J., Lynn D.H., Mann D.G., McCourt R.M., Mendoza L., Moestrup O., Mozley-Standridge S.E., Nerad T.A., Shearer C.A., Smirnov A.V., Spiegel F.W., Taylor M. 2005. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists. J. Eukaryot. Microbiol. 52:399–451. Adl S.M., Simpson A.G.B., Lane C.E., Lukes J., Bass D., Bowser S.S., Brown M.W., Burki F., Dunthorn M., Hampl V., Heiss A., Hoppenrath M., Lara E., le Gall L., Lynn D.H., McManus H., Mitchell E.A.D., Mozley-Stanridge S.E., Parfrey L.W., Pawlowski J., Rueckert S., Shadwick L., Schoch C.L., Smirnov A., Spiegel F.W. 414 SYSTEMATIC BIOLOGY 2012. The revised classification of eukaryotes. J. Eukaryot. Microbiol. 59:429–493. Andersson J.O. 2005. Lateral gene transfer in eukaryotes. Cell. Mol. Life Sci. 62:1182–1197. Archibald J.M. 2009. The puzzle of plastid evolution. Curr. Biol. 19: R81–R88. Berger S.A., Krompass D., Stamatakis A. 2011. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol. 60:291–302. Brown J.R., Douady C.J., Italia M.J., Marshall W.E., Stanhope M.J. 2001. Universal trees based on large combined protein sequence data sets. Nat. Genet. 28:281–285. Burki F., Kudryavtsev A., Matz M.V., Aglyamova G.V., Bulman S., Fiers M., Keeling P.J., Pawlowski J. 2010. Evolution of Rhizaria: new insights from phylogenomic analysis of uncultivated protists. BMC Evol. Biol. 10:18. Burki F., Okamoto N., Pombert J.F., Keeling P.J. 2012. The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc. Roy. Soc. B Biol. Sci. 279:2246–2254. Cavalier-Smith T. 1997. Amoeboflagellates and mitochondrial cristae in eukaryote evolution: megasystematics of the new protozoan subkingdoms Eozoa and Neozoa. Arch. Protistenkd. 147: 237–258. Cavalier-Smith T. 2000. Membrane heredity and early chloroplast evolution. Trends Plant Sci. 5:173–182. Cavalier-Smith T. 2010. Origin of the cell nucleus, mitosis and sex: roles of intracellular coevolution. Biol. Direct 5:7. Cavalier-Smith T., Chao E.E.Y. 2003. Phylogeny of choanozoa, apusozoa, and other protozoa and early eukaryote megaevolution. J. Mol. Evol. 56:540–563. Chen F., Mackey A.J., Stoeckert C.J., Roos D.S. 2006. OrthoMCLDB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 34:D363–D368. Cotton J.A., McInerney J.O. 2010. Eukaryotic genes of archaebacterial origin are more important than the more numerous eubacterial genes, irrespective of function. Proc. Natl Acad. Sci. U. S. A. 107:17252–17255. Cummings M.P., Meyer A. 2005. Magic bullets and golden rules: data sampling in molecular phylogenetics. Zoology 108:329–336. Cummings M.P., Otto S.P., Wakeley J. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12: 814–822. Cummings M.P., Otto S.P., Wakeley J. 1999. Genes and other samples of DNA sequence data for phylogenetic inference. Biol. Bull. 196: 345–347. Cummins C.A., McInerney J.O. 2011. A method for inferring the rate of evolution of homologous characters that can potentially improve phylogenetic inference, resolve deep divergence and correct systematic biases. Syst. Biol. 60:833–844. Cuomo C.A., Desjardins C.A., Bakowski M.A., Goldberg J., Ma A.T., Becnel J.J., Didier E.S., Fan L., Heiman D.I., Levin J.Z., Young S., Zeng Q., Troemel E.R. 2012. Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth. Genome Res. 22:2478–2488. Dacks J.B., Marinets A., Doolittle W.F., Cavalier-Smith T., Logsdon J.M. 2002. Analyses of RNA polymerase II genes from free-living protists: phylogeny, long branch attraction, and the eukaryotic big bang. Mol. Biol. Evol. 19:830–840. Darriba D., Taboada G.L., Doallo R., Posada D. 2011. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27:1164–1165. Delsuc F., Brinkmann H., Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6:361–375. Delwiche C.F. 1999. Tracing the tread of plastid diversity through the tapestry of life. Am. Nat. 154:S164–S177. Derelle R., Lang F.B. 2011. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol. Biol. Evol. 29:1277–1289. Dunn C.W., Hejnol A., Matus D.Q., Pang K., Browne W.E., Smith S.A., Seaver E., Rouse G.W., Obst M., Edgecombe G.D., Sorensen M.V., Haddock S.H.D., Schmidt-Rhaesa A., Okusu A., Kristensen R.M., Wheeler W.C., Martindale M.Q., Giribet G. 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–U745. VOL. 64 Foster P.G. 2004. Modeling compositional heterogeneity. Syst. Biol. 53:485–495. Giribet G., Edgecombe G.D. 2012. Reevaluating the arthropod tree of life. Annu. Rev. Entomol. 57:167–186. Grant J.R., Lahr D.J.G., Rey F.E., Burleigh J.G., Gordon J.I., Knight R., Molestina R.E., Katz L.A. 2012. Gene discovery from a pilot study of the transcriptomes from three diverse microbial eukaryotes: Corallomyxa tenera, Chilodonella uncinata, and Subulatomonas tetraspora. Protist Genomics. 1:3–18. Grant J.R., Katz L.A. 2014. Building a phylogenomic pipeline for the eukaryotic tree of life—addressing deep phylogenies with genomescale data. PLoS Curr. 6. Hampl V., Hug L., Leigh J.W., Dacks J.B., Lang B.F., Simpson A.G.B., Roger A.J. 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”. Proc. Natl. Acad. Sci. U. S. A. 106:3859–3864. Heath T.A., Hedtke S.M., Hillis D.M. 2008. Taxon sampling and the accuracy of phylogenetic analyses. J. Syst. Evol. 46:239–257. Hejnol A., Obst M., Stamatakis A., Ott M., Rouse G.W., Edgecombe G.D., Martinez P., Baguna J., Bailly X., Jondelius U., Wiens M., Muller W.E.G., Seaver E., Wheeler W.C., Martindale M.Q., Giribet G., Dunn C.W. 2009. Assessing the root of bilaterian animals with scalable phylogenomic methods. Proc. Roy. Soc. B Biol. Sci. 276:4261–4270. Hibbett D.S., Binder M., Bischoff J.F., Blackwell M., Cannon P.F., Eriksson O.E., Huhndorf S., James T., Kirk P.M., Lücking R., Thorsten Lumbsch H., Lutzoni F., Matheny P.B., McLaughlin D.J., Powell M.J., Redhead S., Schoch C.L., Spatafora J.W., Stalpers J.A., Vilgalys R., Aime M.C., Aptroot A., Bauer R., Begerow D., Benny G.L., Castlebury L.A., Crous P.W., Dai Y.-C., Gams W., Geiser D.M., Griffith G.W., Gueidan C., Hawksworth D.L., Hestmark G., Hosaka K., Humber R.A., Hyde K.D., Ironside J.E., Kõljalg U., Kurtzman C.P., Larsson K.-H., Lichtwardt R., Longcore J., Miadlikowska J., Miller A., Moncalvo J.-M., Mozley-Standridge S., Oberwinkler F., Parmasto E., Reeb V., Rogers J.D., Roux C., Ryvarden L., Sampaio J.P., Schüssler A., Sugiyama J., Thorn R.G., Tibell L., Untereiner W.A., Walker C., Wang Z., Weir A., Weiss M., White M.M., Winka K., Yao Y.-J., Zhang N. 2007. A higher-level phylogenetic classification of the Fungi. Mycol. Res. 111:509–547. Hug L.A., Stechmann A., Roger A.J. 2010. Phylogenetic distributions and histories of proteins involved in anaerobic pyruvate metabolism in eukaryotes. Mol. Biol. Evol. 27:311–324. Katz L.A. 2012. Origin and diversification of eukaryotes. Annu. Rev. Microbiol. 66:411–427. Katz L.A., Grant J.R., Parfrey L.W., Burleigh J.G. 2012. Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst. Biol. 61:653–660. Katz L.A., Grant J.R., Parfrey L.W., Gant A., O’Kelly C.J., Anderson O.R., Molestina R.E., Nerad T. 2011. Subulatomonas tetraspora nov. gen. nov. sp. is a member of a previously unrecognized major clade of eukaryotes. Protist 162:762–773. Keeling P.J., Palmer J.D. 2008. Horizontal gene transfer in eukaryotic evolution. Nat. Rev. Genet. 9:605–618. Knoll A.H., Javaux E.J., Hewitt D., Cohen P. 2006. Eukaryotic organisms in Proterozoic oceans. Philos. Trans. R. Soc. B Biol. Sci. 361:1023–1038. Kumar S., Filipski A.J., Battistuzzi F.U., Pond S.L.K., Tamura K. 2012. Statistics and truth in phylogenomics. Mol. Biol. Evol. 29:457–472. Lake J.A., Henderson E., Oakes M., Clark M.W. 1984. Eocytes: a new ribosome structure indicates a kingdom with a close relationship to eukaryotes. Proc. Natl. Acad. Sci. U. S. A. 81:3786–3790. Lartillot N., Lepage T., Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288. Li L., Stoeckert C.J. Jr., Roos D.S. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178–2189. Martin W., Brinkmann, H., Savonna, C., Cerff R. 1993. Evidence for a chimeric nature of nuclear genomes: eubacterial origin of eukaryotic glyceraldehyde-3-phosphate dehydrogenase genes. Proc. Natl. Acad. Sci. U. S. A. 90:8692–8698. Martin W., Rujan T., Richly E., Hansen A., Cornelsen S., Lins T., Leister D., Stoebe B., Hasegawa M., Penny D. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl Acad. Sci. U. S. A. 99:12246–12251. 2015 KATZ AND GRANT—TAXON-RICH PHYLOGENOMICS OF EUKARYOTES Miller M.A., Pfeiffer W., Schwartz T. 2010. Creating the CIPRES science gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop (GCE). New Orleans, LA. p. 1–8. Nowack E.C.M., Melkonian M. 2010. Endosymbiotic associations within protists. Philos. Trans. R. Soc. B Biol. Sci. 365:699–712. Okamoto N., Chantangsi C., Horak A., Leander B.S., Keeling P.J. 2009. Molecular phylogeny and description of the novel katablepharid Roombia truncata gen. et sp nov., and establishment of the Hacrobia taxon nov. PLoS One 4:e7080. Parfrey L.W., Grant J., Tekle Y.I., Lasek-Nesselquist E., Morrison H.G., Sogin M.L., Patterson D.J., Katz L.A. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst. Biol. 59:518–533. Parfrey L.W., Lahr D.J.G., Knoll A.H., Katz L.A. 2011. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc. Natl Acad. Sci. U. S. A. 108:13624–13629. Penn O., Privman E., Ashkenazy H., Landan G., Graur D., Pupko T. 2010. GUIDANCE: a web server for assessing alignment confidence scores. Nucleic Acids Res. 38:W23–W28. Philippe H., Delsuc F., Brinkmann H., Lartillot N. 2005. Phylogenomics. Annu. Rev. Ecol. Evol. Sci. 36:541–562. Sierra R., Matz M.V., Aglyamova G., Pillet L.Ø., Decelle J., Not F., de Vargas C., Pawlowski J. 2013. Deep relationships of Rhizaria revealed by phylogenomics: a farewell to Haeckel’s Radiolaria. Mol. Phylogenet. Evol. 67:53–59. Simpson A.G.B., Inagaki Y., Roger A.J. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of “primitive” eukaryotes. Mol. Biol. Evol. 23:615–625. Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690. Stamatakis A., Aberer A.J. 2013. Novel parallelization schemes for large-scale likelihood-based phylogenetic inference. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, IEEE Computer Society. p. 1195–1204. Stamatakis A., Hoover P., Rougemont J. 2008. A rapid bootstrap algorithm for the RAxML web-servers. Syst. Biol. 57:758–771. Stamatakis A., Ott M., Ludwig T. 2005. RAxML-OMP: an efficient program for phylogenetic inference on SMPs. Parallel Comput. Technol. 3606:288–302. 415 Stechmann A., Cavalier-Smith T. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89–91. Szklarczyk R., Huynen M.A. 2010. Mosaic origin of the mitochondrial proteome. Proteomics 10:4012–4024. Thiergart T., Landan G., Schenk M., Dagan T., Martin W.F. 2012. An evolutionary network of genes present in the eukaryote common ancestor polls genomes on eukaryotic and mitochondrial origin. Genome Biol. Evol. 4:466–485. Torruella G., Derelle R., Paps J., Lang B.F., Roger A.J., ShalchianTabrizi K., Ruiz-Trillo I. 2012. Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains. Mol. Biol. Evol. 29:531–544. Tourasse N.J., Gouy M. 1999. Accounting for evolutionary rate variation among sequence sites consistently changes universal phylogenies deduced from rRNA and protein-coding genes. Mol. Phylogenet. Evol. 13:159–168. Walker G., Dacks J.B., Embley T.M. 2006. Ultrastructural description of Breviata anathema, n. Gen., n. Sp., the organism previously studied as “Mastigamoeba invertens”. J. Eukaryot. Microbiol. 53: 65–78. Wiens J.J. 2005. Can incomplete taxa rescue phylogenetic analyses from long-branch attraction? Syst. Biol. 54:731–742. Wiens J.J., Tiu J. 2012. Highly incomplete taxa can rescue phylogenetic analyses from the negative impacts of limited taxon sampling. PLoS One 7:e42925 Williams T.A., Foster P.G., Cox C.J., Embley T.M. 2013. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504:231–236. Williams T.A., Foster P.G., Nye T.M., Cox C.J., Embley T.M. 2012. A congruent phylogenomic signal places eukaryotes within the Archaea. Proc. Roy. Soc. B Biol. Sci. 279:4870–4879. Yoon H.S., Grant J., Tekle Y.I., Wu M., Chaon B.C., Cole J.C., Logsdon J.M., Patterson D.J., Bhattacharya D., Katz L.A. 2008. Broadly sampled multigene trees of eukaryotes. BMC Evol. Biol. 8:14. Zhao S., Burki F., Brate J., Keeling P.J., Klaveness D., Shalchian-Tabrizi K. 2012. Collodictyon—an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29:1557–1568. Zufall R.A., McGrath C., Muse S.V., Katz L.A. 2006. Genome architecture drives protein evolution in ciliates. Mol. Biol. Evol. 23:1681–1687.
© Copyright 2026 Paperzz