Was the ANITA Rooting of the Angiosperm Phylogeny Affected by Long-Branch Attraction?

Yin-Long Qiu,*† Jungho Lee,*† Barbara A. Whitlock,* Fabiana Bernasconi-Quadroni,† and Olena Dombrovska*†

*Department of Biology, University of Massachusetts at Amherst; and †Institute of Systematic Botany, University of Zurich, Zurich, Switzerland

Five groups of basal angiosperms, Amborella, Nymphaeales, Illiciales, Trimeniaceae, and Austrobaileya (ANITA), were identified in several recent studies as representing a series of the earliest-diverging lineages of the angiosperm phylogeny. All of these studies except one employed a multigene analysis approach and used gymnosperms as the outgroup to determine the ingroup topology. The high level of divergence between gymnosperms and angiosperms, however, has long been implicated in the difficulty of reconstructing relationships at the base of angiosperm phylogeny using DNA sequences, for fear of long-branch attraction (LBA). In this study, we replaced the gymnosperm sequences from the five-gene matrix (mitochondrial atp1 and matR, plastid atpB and rbcL, and nuclear 18S rDNA) used in our earlier study with four categories of divergent sequences (random sequences with equal base frequencies or equally AT- and GC-rich contents, homopolymers and heteropolymers, misaligned gymnosperm sequences, and aligned lycopod and bryophyte sequences) to evaluate whether the gymnosperms were an appropriate outgroup to angiosperms in our earlier study that identified the ANITA rooting. All 24 analyses performed rooted the angiosperm phylogeny at either Acorus or Alisma (or Alisma-Triglochin-Potamogeton in one case due to use of a slightly different alignment) and placed the monocots as a basal grade, producing genuine LBA results.
These analyses demonstrate that the identification of ANITA as the basalmost extant angiosperms was based on historical signals preserved in the gymnosperm sequences and that the gymnosperms were an appropriate outgroup with which to root the angiosperm phylogeny in the multigene sequence analysis. This strategy of evaluating the appropriateness of an outgroup using artificial sequences and a series of outgroups with increments of divergence levels can be applied to investigations of phylogenetic patterns at the bases of other major clades, such as land plants, animals, and eukaryotes.

Abbreviations: ANITA, Amborella, Nymphaeales, and Illiciales-Trimeniaceae-Austrobaileya; LBA, long-branch attraction.

Key words: Amborella, ANITA, basal angiosperms, long-branch attraction, outgroup, random sequences.

Address for correspondence and reprints: Yin-Long Qiu, Department of Biology, University of Massachusetts, Amherst, Massachusetts 01003-5810. E-mail: [email protected].

Mol. Biol. Evol. 18(9):1745–1753. 2001 © 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Using an outgroup to polarize character states for identifying basal lineages within an ingroup is a virtually universal practice in phylogenetic analyses (Farris 1972; Stevens 1980; Maddison, Donoghue, and Maddison 1984; Nixon and Carpenter 1993). Choosing outgroups for assessing relationships at the bases of most major clades, however, has been difficult because of the great divergence between the potential outgroups and the ingroup at both morphological and molecular levels, which could confound interpretation of the homology of characters and character states. To reconstruct relationships at the base of the angiosperm phylogeny, extant and fossil gymnosperms or a hypothetical ancestor has been used as the outgroup in morphological cladistic analyses (Dahlgren and Bremer 1985; Donoghue and Doyle 1989; Loconte and Stevenson 1991; Taylor and Hickey 1992; Doyle, Donoghue, and Zimmer 1994). In molecular analyses, however, only living gymnosperms can be, and usually are, used as the outgroup (Martin and Dowd 1991; Hamby and Zimmer 1992; Chase et al. 1993; Qiu et al. 1993, 1999, 2000; Doyle, Donoghue, and Zimmer 1994; Goremykin et al. 1996; Chaw et al. 1997; Soltis et al. 1997, 2000; Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Savolainen et al. 2000). Concerns have been expressed that gymnosperms may be too divergent to use as the outgroup for reconstructing relationships among basal angiosperms (Qiu et al. 1993; Donoghue and Mathews 1998). In DNA sequence data, due to the limited number of states for character evolution and the time elapsed since separation of the outgroup and the ingroup, a distant outgroup might no longer contain historical signals to polarize character states and thus would behave like a random sequence (Miyamoto and Boyle 1989; Wheeler 1990; Qiu and Palmer 1999). Such an outgroup might then attract the longest ingroup branch, creating the so-called "long-branch attraction" (LBA) problem (Felsenstein 1978; Hendy and Penny 1989).

Several approaches have been explored to deal with the distant-outgroup problem. The first approach is to use genes that duplicated along the branch leading to the ingroup, to reciprocally root the two gene phylogenies with each other and thus to infer the organismal phylogeny (Gogarten et al. 1989; Iwabe et al. 1989; Donoghue and Mathews 1998). This strategy works when the duplication occurred close to the point at which the ingroup diversified and when the duplicated copies did not experience dramatic rate acceleration.
Another way to deal with the distant-outgroup problem is to extend the length of sequence analyzed by combining data from multiple genes of all three genomes so that the signal/noise ratio can be increased to allow a reliable rooting of the ingroup topology (Hillis 1996; Soltis et al. 1998; Qiu and Palmer 1999; Graham and Olmstead 2000). A third way to circumvent the distant-outgroup problem is to use genomic structural features that are conserved in their evolution and have clearly understood evolutionary mechanisms (Manhart and Palmer 1990; Raubeson and Jansen 1992; Qiu et al. 1998). Finally, understanding the homology of morphological characters across the large gap between the outgroup and the ingroup at a deeper level by taking the molecular developmental biology approach represents a major direction for future investigation of diversification patterns at the bases of major clades (Kellogg and Shaffer 1993; Doyle 1994; Carroll 1995; Davidson, Peterson, and Cameron 1995; Raff 1996; Frohlich and Meyerowitz 1997; Shubin, Tabin, and Carroll 1997; Theissen et al. 2000). In reconstructing relationships among basal angiosperms, the first two strategies have been used in several recent studies that identified the first branches of the angiosperm phylogeny (Mathews and Donoghue 1999, 2000; Parkinson, Adams, and Palmer 1999; Qiu et al. 1999, 2000; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Soltis et al. 2000).
Despite the mutual corroboration between the studies that employed the duplicated gene rooting strategy and those that adopted the multigene analysis approach in identifying the ANITA lineages as the basalmost extant angiosperms, it is essential to demonstrate that the multigene analysis approach can stand on a solid analytic ground on its own and that sampling multiple genes can indeed enhance the level of phylogenetic signal and thus can overcome the divergence gap problem between gymnosperms and angiosperms. This concern is especially justified by the fact that duplicated gene rooting has been shadowed by the difficulty of placing Ceratophyllum (Mathews and Donoghue 1999, 2000), which was identified as the first lineage of angiosperms in earlier rbcL analyses (Les, Garvin, and Wimpee 1991; Chase et al. 1993; Qiu et al. 1993). The key argument used in suggesting that distant outgroups might no longer be appropriate outgroups in molecular phylogenetic analyses is that the outgroup sequences are so divergent that the variation they contain has been randomized due to back-mutations and parallel mutations during the long time span since separation of the ingroup and the outgroup (Miyamoto and Boyle 1989; Wheeler 1990; Qiu and Palmer 1999). Hence, one can test whether or not a particular outgroup still contains phylogenetic signal to root the ingroup by replacing it with a random sequence. If the subsequent analysis reproduces the ingroup topology obtained by the original outgroup, this may be an indication of LBA caused by the randomized outgroup. Alternatively, if the random sequence attracts the longest ingroup branch and yields a different topology, this would suggest that the use of the original outgroup might have been appropriate (Miyamoto and Boyle 1989; Wheeler 1990; Maddison, Ruvolo, and Swofford 1992; Donoghue 1994; Graham 1997, pp. 122–161; Sullivan and Swofford 1997). 
In this study, we performed a series of analyses on the original data matrix used to identify ANITA as the earliest-diverging lineages of angiosperms (Qiu et al. 1999, 2000) using several types of artificial (random and nonrandom) sequences, as well as sequences that are more divergent than those of gymnosperms, namely, those of a lycopod and a bryophyte, to test whether our original use of gymnosperms as the outgroup was justified. Together with the ingroup taxon deletion analyses and constraint topology analyses presented earlier (Qiu et al. 2000), we hope that these analyses provide a rigorous analytic perspective for identifying the ANITA lineages as the earliest branches of the angiosperm phylogeny.

Materials and Methods

Four categories of divergent sequences were used to replace the eight gymnosperms (Cycas, Zamia, Ginkgo, Podocarpus, Metasequoia, Pinus, Gnetum, and Welwitschia) in the original matrix (Qiu et al. 2000) as the outgroup in a series of 24 analyses (table 1). In the first category, three types of random sequences were generated using the RANUNI function of SAS 8.1 (SAS Institute 2000): the first type consisted of 10 random sequences with equal base frequencies (25% each for A, C, G, and T), the second type consisted of two random sequences with 37.5% each for A and T and 12.5% each for G and C, and the third type consisted of two random sequences with 12.5% each for A and T and 37.5% each for G and C. These sequences represent truly random sequences with equal base frequencies or AT- and GC-rich contents. For the second category, we manually generated five artificial, nonrandom sequences which were homopolymers (poly-A’s, poly-C’s, poly-G’s, and poly-T’s) and heteropolymers (poly-ACGT’s). These sequences represent extreme forms in a sequence universe. Both of these categories of sequences are of the same length (8,741 nt) as that of the five genes used in our earlier study (Qiu et al. 2000).
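The first two categories of artificial outgroups can be illustrated with a short sketch. The study itself used the RANUNI function of SAS 8.1; the Python version below is only an assumed equivalent, and the function names, the dictionary of outgroups, and the seed are hypothetical choices made for illustration.

```python
import random

BASES = "ACGT"
MATRIX_LENGTH = 8741  # length (nt) of the five-gene matrix reported in the text

# Base-frequency profiles for the three types of random sequences.
EQUAL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
AT_RICH = {"A": 0.375, "C": 0.125, "G": 0.125, "T": 0.375}
GC_RICH = {"A": 0.125, "C": 0.375, "G": 0.375, "T": 0.125}


def random_sequence(freqs, length=MATRIX_LENGTH, rng=random):
    """Draw `length` bases independently with the given A/C/G/T probabilities."""
    weights = [freqs[base] for base in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))


def polymer(unit, length=MATRIX_LENGTH):
    """Repeat `unit` (e.g. "A" or "ACGT") and truncate to exactly `length` bases."""
    repeats = length // len(unit) + 1
    return (unit * repeats)[:length]


def gc_percent(seq):
    """Percentage of G and C among the A/C/G/T characters of `seq`."""
    counted = [base for base in seq.upper() if base in BASES]
    return 100.0 * sum(base in "GC" for base in counted) / len(counted)


rng = random.Random(2001)  # arbitrary seed, used only to make the sketch repeatable
artificial_outgroups = {}
for i in range(1, 11):
    artificial_outgroups[f"rand-seq-{i}"] = random_sequence(EQUAL, rng=rng)
for i in (1, 2):
    artificial_outgroups[f"AT-0.375-{i}"] = random_sequence(AT_RICH, rng=rng)
    artificial_outgroups[f"GC-0.375-{i}"] = random_sequence(GC_RICH, rng=rng)
for unit in ("A", "C", "G", "T", "ACGT"):
    artificial_outgroups[f"Poly-{unit}'s"] = polymer(unit)

for name, seq in artificial_outgroups.items():
    print(f"{name:12s} length={len(seq)} GC%={gc_percent(seq):.1f}")
```

Because each base is drawn independently, the realized GC content of a random sequence scatters slightly around its target value, which is in line with the outgroup GC percentages listed in table 1 differing a little from 50%, 25%, and 75%.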
Because these two categories of sequences were of nonbiological origin, they likely lacked certain unique properties of biological sequences and might behave erratically in phylogenetic analyses. To counteract this argument, we generated the third category of divergent sequences by misaligning the original five-gene sequences of the eight gymnosperms through deletion of the first position in the alignment (that of atp1) and filling in the last position in the alignment (that of nu18S rDNA) with a question mark (missing data). In so doing, we destroyed all the nucleotide position homology between gymnosperms and angiosperms by disrupting the original alignment, thus creating artificially divergent sequences (relative to the angiosperm ingroup) but of the same biological origin as the original gymnosperm sequences. For the last category, we used aligned sequences of the five genes of a lycopod and a bryophyte. Both the lycopod and the bryophyte sequences were composite. For the former, atp1 was from Lycopodium digitatum (AF209113), matR and atpB were from Huperzia lucidula (AY033145, this study, and U93819), rbcL was from Lycopodium obscurum (Y07935), and 18S rDNA was from Lycopodium tristachyum (U18511). For the latter, atp1 (M68929), atpB (X04465), rbcL (X04465), and 18S rDNA (X75521) were all from Marchantia polymorpha, and matR (AF068932) was from Notothylas breutelii (a hornwort), since the gene is a group II intron-encoded open reading frame and is absent in liverworts and most mosses (Qiu et al. 1998; unpublished data). To use this last category of divergent sequences, a new alignment for all of the original angiosperm and gymnosperm sequences was needed, since inclusion of the lycopod and bryophyte involved addition or removal of gaps in the alignment, particularly for matR. The alignment was done using Clustal X (Thompson et al. 1997). The purpose of using this last category of divergent sequences was to determine at what point a well-aligned biological outgroup sequence behaved like a random sequence when a nonangiosperm was used as the outgroup.

Table 1
The Results of 24 Analyses that Used Divergent Sequences as the Outgroup to Root the Angiosperm Phylogeny

Analysis and Outgroup          TBML     Outgroup GC%   No. of MPTs   TH    TL       CI      RI
 1. rand-seq-1                 Alisma    51.0           9            936   16,585   0.568   0.578
 2. rand-seq-2                 Alisma    49.4           9            951   16,535   0.565   0.578
 3. rand-seq-3                 Acorus    49.5           9            953   16,424   0.566   0.580
 4. rand-seq-4                 Alisma    49.9           3            490   16,476   0.566   0.579
 5. rand-seq-5                 Alisma    50.8           9            930   16,450   0.566   0.580
 6. rand-seq-6                 Alisma    50.5          12            929   16,529   0.569   0.580
 7. rand-seq-7                 Acorus    50.0           6            782   16,537   0.567   0.579
 8. rand-seq-8                 Alisma    51.0           9            996   16,480   0.567   0.580
 9. rand-seq-9                 Alisma    50.2           3            469   16,520   0.566   0.580
10. rand-seq-10                Acorus    50.7           6            761   16,463   0.568   0.580
11. AT-0.375-1                 Acorus    24.9           6            777   16,456   0.563   0.577
12. AT-0.375-2                 Acorus    24.4           6            784   16,400   0.564   0.580
13. GC-0.375-1                 Alisma    75.2          12            994   16,625   0.571   0.579
14. GC-0.375-2                 Alisma    73.9           6            796   16,545   0.567   0.578
15. Poly-A’s                   Acorus     0.0           6            786   16,313   0.559   0.578
16. Poly-T’s                   Acorus     0.0           6            790   16,452   0.564   0.579
17. Poly-ACGT’s                Acorus    50.0           6            806   16,607   0.567   0.578
18. Poly-C’s                   Alisma   100.0           3            325   16,736   0.571   0.577
19. Poly-G’s                   Alisma   100.0           9            991   16,482   0.569   0.580
20. Misaligned Cycas           Acorus    49.1           9            956   14,879   0.535   0.586
21. Misaligned Ginkgo          Acorus    48.4           9            957   15,813   0.549   0.579
22. Misaligned 8 gymnosperms   Acorus    48.3           9            947   18,513   0.552   0.842
23. Aligned lycopod            Acorus    44.2          16            855   13,555   0.413   0.547
24. Aligned bryophyte          ATP(a)    44.2           8            449   13,843   0.419   0.544

NOTE: Abbreviations: TBML, the basalmost lineage; MPTs, most-parsimonious trees; TH, times the island(s) of the most-parsimonious trees was hit out of 1,000 random-taxon-addition replicates; TL, tree length; CI, consistency index; RI, retention index. The Acorus branch consists of two species: A. calamus and A. gramineus.
(a) ATP = Alisma, Triglochin, and Potamogeton.

In all 24 analyses except one, we used one divergent sequence to replace the eight gymnosperms in the original five-gene matrix as the outgroup (table 1). In only one analysis, we used eight misaligned gymnosperm sequences to replace the eight aligned ones. This analysis was designed to evaluate the effect of the number of divergent sequences on rooting of the angiosperm phylogeny, since we could compare its result with that of two other analyses in which either the misaligned sequence of Cycas (which lacks data for atpB; see Qiu et al. 2000) or that of Ginkgo was used as the outgroup.

A heuristic parsimony (equal weighting) search was conducted using 1,000 random-taxon-addition replicates, one tree held at each step during stepwise addition, tree bisection-reconnection (TBR) branch swapping, the steepest-descent option, the MulTrees option, and no upper limit of MaxTrees. A bootstrap analysis was subsequently performed using 1,000 resampling replicates and the same tree search procedure as described above except with simple taxon addition. All the analyses were performed using PAUP 4.0* (Swofford 1998).
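The misalignment used for the third category of outgroups (deleting the first alignment position and filling the last with a question mark) amounts to a one-column shift of each outgroup row. The sketch below shows that operation on toy rows; the function names and the example sequences are hypothetical and are not taken from the actual matrix.

```python
def misalign(row, missing="?"):
    """Drop the first alignment column of `row` and pad the end with `missing`."""
    if not row:
        return row
    return row[1:] + missing


def column_matches(row_a, row_b):
    """Count columns at which two equal-length rows carry the same nucleotide."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same aligned length")
    return sum(a == b and a in "ACGT" for a, b in zip(row_a, row_b))


# Toy rows for illustration only (not real sequence data).
ingroup_row = "ATGGCTCGTAAC"
outgroup_row = "ATGGCACGTAAC"

shifted_row = misalign(outgroup_row)
print(shifted_row)                                # outgroup row shifted by one column
print(column_matches(ingroup_row, outgroup_row))  # matches with the original alignment
print(column_matches(ingroup_row, shifted_row))   # matches after misalignment
```

The shifted row keeps the aligned length, so it can replace the original row in the matrix, but each of its characters is now compared with a nonhomologous ingroup position.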
To identify the longest ingroup branch and to examine distribution of branch lengths within angiosperms, we performed an unrooted ingroup (i.e., angiosperm only) analysis without using any outgroup. All of the angiosperms in the original matrix (Qiu et al. 2000) were kept after the eight gymnosperms were deleted. A heuristic search with 1,000 random-taxon-addition replicates and the same tree search procedure described above was conducted.

Results

The details of search results of the 24 analyses using divergent sequences as the outgroup are presented in table 1. One of the most parsimonious trees found in the analysis where a random sequence (sequence 1) with equal base frequencies was used as the outgroup is presented in figure 1. The branch lengths, the bootstrap values, and the nodes that collapsed in the strict consensus are shown in the figure. All 24 analyses except one produced virtually the same topology, with the first branch of angiosperm phylogeny being identified as either Acorus (which consists of two species, A. calamus and A. gramineus) or Alisma and the monocots forming a grade at the base of the phylogeny (table 1 and fig. 1). Only in the analysis where the aligned bryophyte sequence was used as the outgroup was the first branch of angiosperm trees composed of Alisma, Triglochin, and Potamogeton, and this variance could be due to the slightly different alignment used in the analysis. In all analyses, the topology of the strict consensus of the most-parsimonious trees was essentially identical to that of our earlier studies (Qiu et al. 1999, 2000), with the conspicuous exceptions of the monocot rooting and the ANITA lineages forming a clade (which occurred in the earlier rbcL analyses when Ceratophyllum was placed as a sister to all other angiosperms; see below; Chase et al. 1993; Qiu et al. 1993).
Magnoliales were sister to Laurales, and Winterales were sister to Piperales, and together these four clades formed the eumagnoliid clade. Relationships among the eumagnoliids, eudicots, Chloranthaceae, Ceratophyllum, and ANITA were unresolved or resolved with no or low bootstrap support.

In the unrooted ingroup analysis, we found two islands of 12 equally most parsimonious trees with a length of 10,520 steps, a consistency index of 0.411, and a retention index of 0.610. One of the trees, presented as an unrooted network showing branch lengths, is shown in figure 2. It is obvious that the longest ingroup branches are those leading to Alisma, Potamogeton, Triglochin, A. calamus, and A. gramineus, when the tree centers around the juncture of the ANITA lineages, Piperales, Winterales, Laurales, Magnoliales, eudicots, Chloranthaceae, monocots, and Ceratophyllum.

Discussion

The mystique surrounding LBA (Felsenstein 1978; Hendy and Penny 1989) has created a situation in which the phenomenon is frequently invoked to explain a topology that seems to be in conflict with other evidence or simply to dismiss an unfavorable topology. However, cases in which LBA is explicitly demonstrated and carefully investigated are few (Miyamoto and Boyle 1989; Wheeler 1990; Maddison, Ruvolo, and Swofford 1992; Graham 1997, pp. 122–161; Huelsenbeck 1997; Sullivan and Swofford 1997; Siddall and Whiting 1999; Sanderson et al. 2000). In the case of reconstruction of basal angiosperm phylogeny, taxa with very different morphologies were placed at the base of the trees in earlier molecular studies that analyzed different data sets: Schisandraceae in nuclear rbcS (Martin and Dowd 1991), Nymphaeales in nuclear rDNAs as well as plastid ITS and rDNA (Hamby and Zimmer 1992; Goremykin et al. 1996; Chaw et al. 1997), Ceratophyllum in plastid rbcL (Les, Garvin, and Wimpee 1991; Chase et al. 1993; Qiu et al. 1993), and Austrobaileya-Illiciales and Amborella in nuclear 18S rDNA (Soltis et al. 1997).
These seemingly unstable results heightened plant systematists’ fear of the effect of LBA on rooting of the angiosperm phylogeny when gymnosperms, which always have the longest branch in any data set, were used as the outgroup (Qiu et al. 1993; Donoghue and Mathews 1998). One critical issue that has not been addressed in any of the molecular studies using gymnosperms as the outgroup to root the angiosperm phylogeny is whether they really are divergent enough to cause LBA (Martin and Dowd 1991; Hamby and Zimmer 1992; Chase et al. 1993; Qiu et al. 1993, 1999, 2000; Doyle, Donoghue, and Zimmer 1994; Goremykin et al. 1996; Chaw et al. 1997; Soltis et al. 1997, 2000; Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Savolainen et al. 2000). Here, we demonstrate the conditions under which LBA really becomes a problem. With two categories of artificial sequences and misaligned gymnosperm sequences as the outgroups, we consistently rooted the angiosperm phylogeny at either Acorus or Alisma (table 1 and fig. 1), two of the longest branches (even longer than any ANITA members or Ceratophyllum) among all the angiosperms (fig. 2). These outgroup sequences plainly contain no historical information and have immensely long branches in comparison with all others in the trees (fig. 1 and data not shown). The branch length of the outgroup, random sequence 1, shown in figure 1 (6,065 steps!) is also in stark contrast to that of the gymnosperms in our earlier study (354 steps). The fact that virtually the same topology was reproduced in all of these analyses suggests that we have demonstrated the conditions under which genuine LBA can occur, and this is what the earlier authors had predicted (Miyamoto and Boyle 1989; Wheeler 1990). The variation of rooting between Acorus (45.5% GC) and Alisma (48.1% GC) appears to be correlated with the GC content of the outgroup sequence (table 1). 
A few exceptions (rand-seq-2, rand-seq-4, rand-seq-10, and the aligned bryophyte) may be due to the altered GC content of informative sites relative to the entire sequence. Therefore, these analyses demonstrate that while the gymnosperm sequences are highly divergent relative to those of angiosperms, they are not divergent enough to cause LBA and thus were an appropriate outgroup in our original studies (Qiu et al. 1999, 2000). Consequently, the placement of the ANITA lineages at the base of the angiosperm phylogeny was based on unique historical signals preserved in the gymnosperm sequences and was not caused by LBA.

FIG. 1. One of the nine most-parsimonious trees found in the search in which random sequence 1 was used to replace the eight gymnosperms as the outgroup in the five-gene matrix from Qiu et al. (2000) to root the angiosperm phylogeny. Numbers above branches are branch lengths (ACCTRAN optimization); those below in italics are bootstrap values (only those >50% are shown). The nodes labeled with asterisks are collapsed in the strict consensus of the nine shortest trees. Abbreviations: MON, monocots; CER, Ceratophyllum; CHL, Chloranthaceae; ITA, Illiciales, Trimeniaceae, and Austrobaileya; AMB, Amborella; NYM, Nymphaeales; EUD, eudicots; WIN, Winterales; PIP, Piperales; MAG, Magnoliales; LAU, Laurales; Acoruspc, Acorus calamus; Acoruspg, Acorus gramineus; Ceratophyllumpd, C. demersum; Ceratophyllumps, C. submersum; Rand seq 1, random sequence 1.

FIG. 2. One of the 12 most-parsimonious trees found in the ingroup (angiosperms) only analysis. Numbers along branches are branch lengths (ACCTRAN optimization). The tree is shown as an unrooted phylogram. The part of the tree covering Laurales exclusive of Calycanthaceae (abbreviated as "L") is shown in 2× magnification.
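The apparent link between outgroup GC content and the Acorus versus Alisma rooting can be checked against table 1 with a simple tally. The sketch below assumes a cutoff that the text does not state: an outgroup with more than 50% GC is expected to root at Alisma (48.1% GC), and any other outgroup at Acorus (45.5% GC). The values are transcribed from table 1 as read column by column, and the function and variable names are hypothetical.

```python
# (analysis, outgroup GC%, basalmost lineage) as read from table 1.
TABLE1 = [
    ("rand-seq-1", 51.0, "Alisma"), ("rand-seq-2", 49.4, "Alisma"),
    ("rand-seq-3", 49.5, "Acorus"), ("rand-seq-4", 49.9, "Alisma"),
    ("rand-seq-5", 50.8, "Alisma"), ("rand-seq-6", 50.5, "Alisma"),
    ("rand-seq-7", 50.0, "Acorus"), ("rand-seq-8", 51.0, "Alisma"),
    ("rand-seq-9", 50.2, "Alisma"), ("rand-seq-10", 50.7, "Acorus"),
    ("AT-0.375-1", 24.9, "Acorus"), ("AT-0.375-2", 24.4, "Acorus"),
    ("GC-0.375-1", 75.2, "Alisma"), ("GC-0.375-2", 73.9, "Alisma"),
    ("Poly-A's", 0.0, "Acorus"), ("Poly-T's", 0.0, "Acorus"),
    ("Poly-ACGT's", 50.0, "Acorus"), ("Poly-C's", 100.0, "Alisma"),
    ("Poly-G's", 100.0, "Alisma"), ("Misaligned Cycas", 49.1, "Acorus"),
    ("Misaligned Ginkgo", 48.4, "Acorus"),
    ("Misaligned 8 gymnosperms", 48.3, "Acorus"),
    ("Aligned lycopod", 44.2, "Acorus"), ("Aligned bryophyte", 44.2, "ATP"),
]


def expected_root(outgroup_gc, cutoff=50.0):
    """Assumed rule: GC-richer outgroups attract the GC-richer long branch."""
    return "Alisma" if outgroup_gc > cutoff else "Acorus"


exceptions = [name for name, gc, root in TABLE1 if root != expected_root(gc)]
print(f"{len(TABLE1) - len(exceptions)} of {len(TABLE1)} analyses follow the rule")
print("exceptions:", ", ".join(exceptions))
```

Under this assumed cutoff, the analyses that do not follow the rule are the same four that the text lists as exceptions.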
The next question to ask is whether the ANITA rooting can still be an artifact caused by some mechanisms that generate similarities in unrelated lineages by chance but do not necessarily produce long branches. One molecular evolutionary phenomenon, RNA editing, so far known to occur only in organellar genomes (Yoshinaga et al. 1996; Steinhauser et al. 1999), may be such a mechanism (Bowe and dePamphilis 1996; Qiu and Palmer 1999). Nevertheless, individual analyses of three genes from two organellar genomes (mitochondrial atp1 and matR and plastid atpB) have all identified the ANITA clades as the earliest-branching angiosperm lineages (Qiu et al. 1999, 2000; Barkman et al. 2000; Savolainen et al. 2000). It is highly unlikely that the three genes in two genomes would experience extensive RNA editing in both gymnosperms and the ANITA members but not in any other lineages. Furthermore, an analysis of the nuclear 18S rDNA alone with extensive taxon sampling also placed Austrobaileya-Illiciales and Amborella at the base of angiosperm phylogeny (Soltis et al. 1997). No RNA editing has been reported at this locus to date. Finally, and most importantly, rooting of the angiosperm phylogeny using duplicated nuclear phytochrome genes has produced a similar result (Mathews and Donoghue 1999, 2000), reinforcing our belief that the ANITA rooting was not caused by RNA editing. GC content bias is another mechanism that does not necessarily increase branch length dramatically but still can generate analytic artifacts in phylogenetic analysis of DNA sequences (Steel, Lockhart, and Penny 1993). A brief examination of the GC content in the five genes across all major lineages of basal angiosperms and gymnosperms shows that there is no significant difference among lineages. Thus, it is unlikely that the ANITA rooting was affected by this factor.
A final question to ask is whether the concern that distant outgroups could cause LBA was well placed (Miyamoto and Boyle 1989; Wheeler 1990; Qiu et al. 1993; Donoghue and Mathews 1998; Qiu and Palmer 1999). Our analyses using well-aligned lycopod and bryophyte sequences as the outgroup to root the angiosperm phylogeny indicate that exceedingly divergent outgroups can indeed generate a spurious rooting topology. Both analyses identified either Acorus or Alisma-Triglochin-Potamogeton as the first branch of the angiosperm phylogeny and placed the monocots as a basal grade (table 1). These results suggest that the lycopod and bryophyte sequences are so divergent that they behave like random sequences. The outgroup branch length in the bryophyte rooting analysis was 1,464 steps, and that in the lycopod rooting analysis was 1,181 steps, as opposed to the 354 steps of the gymnosperm branch in Qiu et al. (2000). (Note that the alignment used for the bryophyte and lycopod rooting analyses was a slightly different one.) On the other hand, placing aligned gymnosperm sequences back into the matrix produced the ANITA rooting again (data not shown; the gymnosperms formed a monophyletic group, and the Gnetum-Welwitschia clade was sister to Pinus), supporting the earlier suggestion that one can avoid LBA by judiciously increasing taxon sampling to break long branches (Chase et al. 1993; Hillis 1996; Graybeal 1998; Soltis et al. 1998; Qiu et al. 1999; Qiu and Palmer 1999). The analyses presented here demonstrate that the gymnosperms were an appropriate outgroup with which to root the angiosperm phylogeny in our earlier multigene analyses (Qiu et al. 1999, 2000) and that the ANITA rooting is likely free of the LBA effect. Several other multigene analyses reached similar conclusions on the identity of the earliest angiosperms (Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Soltis et al. 2000).
It can be extrapolated that their use of gymnosperms as the outgroup did not violate any fundamental rule of choosing an appropriate outgroup. In retrospect, gymnosperms were well-behaved outgroups even in most single-gene analyses. Various members of the ANITA grade were placed at the base of angiosperm trees: Schisandraceae in nuclear rbcS (Martin and Dowd 1991), Nymphaeales in nuclear rDNAs as well as plastid ITS and rDNA (Hamby and Zimmer 1992; Goremykin et al. 1996; Chaw et al. 1997), and Austrobaileya-Illiciales and Amborella in nuclear 18S rDNA (Soltis et al. 1997). Insufficient taxon sampling in all of these studies and the use of single genes (which obviously contain less signal than multigene data sets) naturally complicate the effort of building a well-resolved phylogeny and lead to the suspicion that these seemingly different rooting topologies were produced by LBA due to the great divergence between gymnosperms and angiosperms. Ironically, the only single-gene analyses that sampled basal angiosperms extensively produced a rooting that seems to be an analytical artifact, i.e., the Ceratophyllum rooting (Chase et al. 1993; Qiu et al. 1993). A reanalysis of the rbcL matrix used in our recent multigene analyses (Qiu et al. 1999, 2000) shows that even the placement of Ceratophyllum as the sister to all other angiosperms was also largely due to the historical signal contained in the gymnosperm sequences. When the gymnosperm sequences were replaced with the artificial sequences and misaligned gymnosperm sequences used in this study, the angiosperm trees were rooted at various taxa that have branches longer than Ceratophyllum (data not shown). Ceratophyllum is likely an early-diverging lineage of angiosperms, even though its exact relationship to other major clades of basal angiosperms is not well resolved at present (Qiu et al. 1999, 2000; Soltis, Soltis, and Chase 1999; Mathews and Donoghue 2000; Savolainen et al. 2000; Soltis et al. 2000). 
Thus, its placement at the base of angiosperm trees in the rbcL analyses was probably caused by both phylogenetic signal and a few homoplasious changes that happened to be shared with gymnosperms (not necessarily by LBA). Reconstruction of phylogenetic relationships at the bases of major clades using molecular sequence data routinely generates controversial results (Qiu and Palmer 1999; Adoutte et al. 2000; Philippe, Germot, and Moreira 2000), largely due to use of divergent outgroups and sparse taxon sampling. The LBA problem is frequently invoked to explain results that are otherwise inexplicable. Nevertheless, most claims of LBA have not been substantiated by explicit analyses. Several parsimony- or likelihood-based tests have been developed to examine whether long branches indeed attract each other and to reduce the LBA effect (Huelsenbeck 1997; Lyons-Weiler and Hoelzer 1997; Willson 1999; Sanderson et al. 2000). The strategy employed here follows the ideas of Miyamoto and Boyle (1989), Wheeler (1990), Maddison, Ruvolo, and Swofford (1992), Graham (1997), and Sullivan and Swofford (1997) in using random sequences to evaluate whether phylogenetic signal in the outgroup has been randomized. We further elaborated this approach by increasing the repertoire of test sequences by using homo- and heteropolymers, misaligned original outgroup sequences, and more distantly related aligned outgroup sequences. In particular, this last category of outgroup sequences showed several increments of divergence levels and helped to define the point beyond which the outgroup was no longer appropriate for rooting the ingroup. As it becomes clear that sampling multiple genes from all two or three genomes of a large number of organisms can lead to reliable reconstruction of complicated organismal phylogenies (Hillis 1996; Qiu et al. 1999, 2000; Soltis, Soltis, and Chase 1999; Savolainen et al. 2000; Soltis et al.
2000) and that the LBA problem is tractable thanks to the various strategies that are being developed, phylogenetic analyses of DNA sequences will undoubtedly, along with comparative genomics and evolutionary developmental biology, allow evolutionary biologists to tackle many of the issues in the tree of life. Acknowledgments We thank Ronald Adkins, Albert Blarer, James A. Doyle, Eva Goldwater, Sean Graham, Margaret Hoey, Libo Li, and Peter F. Stevens for helpful suggestions, and Schweizerischer Nationalfonds and University of Massachusetts for financial support. LITERATURE CITED ADOUTTE, A., G. BALAVOINE, N. LARTILLOT, O. LESPINET, B. PRUD’HOMME, and R. DE ROSA. 2000. The new animal phylogeny: reliability and implications. Proc. Natl. Acad. Sci. USA 97:4453–4456. BARKMAN, T. J., G. CHENERY, J. R. MCNEAL, J. LYONS-WEILER, W. J. ELLISENS, G. MOORE, A. D. WOLFE, and C. W. DEPAMPHILIS. 2000. Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny. Proc. Natl. Acad. Sci. USA 97:13166–13171. BOWE, L. M., and C. W. DEPAMPHILIS. 1996. Effects of RNA editing and gene processing on phylogenetic reconstruction. Mol. Biol. Evol. 13:1159–1166. CARROLL, S. B. 1995. Homeotic genes and the evolution of arthropods and chordates. Nature 376:479–485. CHASE, M. W., D. E. SOLTIS, R. G. OLMSTEAD et al. (39 coauthors). 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80:528–580. CHAW, S.-M., A. ZHARKIKH, H.-M. SUNG, T.-C. LAU, and W.-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14:56–68. DAHLGREN, R., and K. BREMER. 1985. Major clades of the angiosperms. Cladistics 1:349–368. DAVIDSON, E. H., K. J. PETERSON, and R. A. CAMERON. 1995. Origin of bilaterian body plans: evolution of developmental regulatory mechanisms. Science 270:1319–1325. DONOGHUE, M. 
J. 1994. Progress and prospects in reconstructing plant phylogeny. Ann. Mo. Bot. Gard. 81:405–418. DONOGHUE, M. J., and J. A. DOYLE. 1989. Phylogenetic analysis of angiosperms and the relationships of Hamamelidae. Pp. 17–45 in P. R. CRANE and S. BLACKMORE, eds. Evolution, systematics, and fossil history of the Hamamelidae. Vol. 1. Clarendon, Oxford, England. DONOGHUE, M. J., and S. MATHEWS. 1998. Duplicated genes and the root of angiosperms, with an example using phytochrome sequences. Mol. Phylogenet. Evol. 9:489–500. DOYLE, J. A., M. J. DONOGHUE, and E. A. ZIMMER. 1994. Integration of morphological and ribosomal RNA data on the origin of angiosperms. Ann. Mo. Bot. Gard. 81:419–450. DOYLE, J. J. 1994. Evolution of a plant homeotic multigene family: toward connecting molecular systematics and molecular developmental genetics. Syst. Biol. 43:307–328. FARRIS, J. S. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106:645–668. FELSENSTEIN, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401–410. FROHLICH, M. W., and E. M. MEYEROWITZ. 1997. The search for homeotic gene homologs in basal angiosperms and Gnetales: a potential new source of data on the evolutionary origin of flowers. Int. J. Plant Sci. 158:S131–S142. GOGARTEN, J. P., H. KILBAK, P. DITTRICH, L. TAIZ, E. J. BOWMAN, B. J. BOWMAN, M. F. MANOLSON, R. J. POOLE, T. DATE, and T. OSHIMA. 1989. Evolution of vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl. Acad. Sci. USA 86:6661–6665. GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A. ANTONOV, and W. MARTIN. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support Gnetalean affinities of angiosperms. Mol. Biol. Evol. 13:383–396. GRAHAM, S. W. 1997. Phylogenetic analysis of breeding system evolution in heterostylous monocotyledons. Ph.D. dissertation, University of Toronto, Toronto, Canada. GRAHAM, S. 
W., and R. G. OLMSTEAD. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87:1712–1730. GRAYBEAL, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9–17. HAMBY, R. K., and E. A. ZIMMER. 1992. Ribosomal RNA as a phylogenetic tool in plant systematics. Pp. 50–91 in P. S. SOLTIS, D. E. SOLTIS, and J. J. DOYLE, eds. Molecular systematics of plants. Chapman and Hall, New York. HENDY, M. D., and D. PENNY. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309. HILLIS, D. M. 1996. Inferring complex phylogenies. Nature 383:130–131. HUELSENBECK, J. P. 1997. Is the Felsenstein zone a fly trap? Syst. Biol. 46:69–74. IWABE, N., K.-I. KUMA, M. HASEGAWA, S. OSAWA, and T. MIYATA. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86:9355–9359. KELLOGG, E. A., and H. B. SHAFFER. 1993. Model organisms in evolutionary studies. Syst. Biol. 42:409–414. LES, D. H., D. K. GARVIN, and C. F. WIMPEE. 1991. Molecular evolutionary history of ancient aquatic angiosperms. Proc. Natl. Acad. Sci. USA 88:10119–10123. LOCONTE, H., and D. W. STEVENSON. 1991. Cladistics of the Magnoliidae. Cladistics 7:267–296. LYONS-WEILER, J., and G. A. HOELZER. 1997. Escaping from the Felsenstein zone by detecting long branches in phylogenetic data. Mol. Phylogenet. Evol. 8:375–384. MADDISON, D. R., M. RUVOLO, and D. L. SWOFFORD. 1992. Geographic origins of human mitochondrial DNA: phylogenetic evidence from control region sequences. Syst. Biol. 41:111–124. MADDISON, W. P., M. J. DONOGHUE, and D. R. MADDISON. 1984. Outgroup analysis and parsimony. Syst. Zool. 33:83–103. MANHART, J. R., and J. D. PALMER. 1990. The gain of two chloroplast tRNA introns marks the green algal ancestors of land plants. Nature 345:268–270. MARTIN, P. 
G., and J. M. DOWD. 1991. Studies of angiosperm phylogeny using protein sequences. Ann. Mo. Bot. Gard. 78:296–337. MATHEWS, S., and M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286:947–950. MATHEWS, S., and M. J. DONOGHUE. 2000. Basal angiosperm phylogeny inferred from duplicated phytochromes A and C. Int. J. Plant Sci. 161:S41–S55. MIYAMOTO, M. M., and S. M. BOYLE. 1989. The potential importance of mitochondrial DNA sequence data to eutherian mammal phylogeny. Pp. 437–450 in B. FERNHOLM, K. BREMER, and H. JOERNVALL, eds. The hierarchy of life. Elsevier, Amsterdam, the Netherlands. NIXON, K. C., and J. M. CARPENTER. 1993. On outgroups. Cladistics 9:413–426. PARKINSON, C. L., K. L. ADAMS, and J. D. PALMER. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9:1485–1488. PHILIPPE, H., A. GERMOT, and D. MOREIRA. 2000. The new phylogeny of eukaryotes. Curr. Opin. Genet. Dev. 10:596–601. QIU, Y.-L., M. W. CHASE, D. H. LES, and C. R. PARKS. 1993. Molecular phylogenetics of the Magnoliidae: cladistic analyses of nucleotide sequences of the plastid gene rbcL. Ann. Mo. Bot. Gard. 80:587–606. QIU, Y.-L., Y. CHO, J. C. COX, and J. D. PALMER. 1998. The gain of three mitochondrial introns identifies liverworts as the earliest land plants. Nature 394:671–674. QIU, Y.-L., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P. S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, and M. W. CHASE. 1999. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407. QIU, Y.-L., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P. S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, and M. W. CHASE. 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161:S3–S27. QIU, Y.-L., and J. D. PALMER. 1999. Phylogeny of early land plants: insights from genes and genomes. Trends Plant Sci. 4:26–30. RAFF, R. A. 1996. The shape of life. University of Chicago Press, Chicago. RAUBESON, L. A., and R. K. JANSEN. 1992. 
Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255:1697–1699. SANDERSON, M. J., M. F. WOJCIECHOWSKI, J.-M. HU, T. SHER KHAN, and S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17:782–797. SAS INSTITUTE. 2000. SAS 8.1. SAS Institute, Cary, N.C. SAVOLAINEN, V., M. W. CHASE, S. B. HOOT, C. M. MORTON, D. E. SOLTIS, C. BAYER, M. F. FAY, A. Y. DE BRUIJN, S. SULLIVAN, and Y.-L. QIU. 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49:306–362. SHUBIN, N., C. TABIN, and S. CARROLL. 1997. Fossils, genes, and the evolution of animal limbs. Nature 388:639–648. SIDDALL, M. E., and M. F. WHITING. 1999. Long-branch abstractions. Cladistics 15:9–24. SOLTIS, D. E., P. S. SOLTIS, M. W. CHASE et al. (16 coauthors). 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL and atpB sequences. Bot. J. Linn. Soc. 133:381–461. SOLTIS, D. E., P. S. SOLTIS, M. E. MORT, M. W. CHASE, V. SAVOLAINEN, S. B. HOOT, and C. M. MORTON. 1998. Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. Syst. Biol. 47:32–42. SOLTIS, D. E., P. S. SOLTIS, D. L. NICKRENT et al. (16 coauthors). 1997. Angiosperm phylogeny inferred from 18S ribosomal DNA sequences. Ann. Mo. Bot. Gard. 84:1–49. SOLTIS, P. S., D. E. SOLTIS, and M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402:402–404. STEEL, M. A., P. J. LOCKHART, and D. PENNY. 1993. Confidence in evolutionary trees from biological sequence data. Nature 364:440–442. STEINHAUSER, S., S. BECKERT, I. CAPESIUS, O. MALEK, and V. KNOOP. 1999. Plant mitochondrial RNA editing. J. Mol. Evol. 48:303–312. STEVENS, P. F. 1980. Evolutionary polarity of character states. Annu. Rev. Ecol. Syst. 11:333–358. SULLIVAN, J., and D. L. SWOFFORD. 
1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mamm. Evol. 4:77–86. SWOFFORD, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b2. Sinauer, Sunderland, Mass. TAYLOR, D. W., and L. J. HICKEY. 1992. Phylogenetic evidence for the herbaceous origin of angiosperms. Plant Syst. Evol. 180:137–156. THEISSEN, G., A. BECKER, A. DI ROSA, A. KANNO, J. T. KIM, T. MUENSTER, K.-U. WINTER, and H. SAEDLER. 2000. A short history of MADS-box genes in plants. Plant Mol. Biol. 42:115–149. THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN, and D. G. HIGGINS. 1997. The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882. WHEELER, W. C. 1990. Nucleic acid sequence phylogeny and random outgroups. Cladistics 6:363–367. WILLSON, S. J. 1999. A higher order parsimony method to reduce long-branch attraction. Mol. Biol. Evol. 16:694–705. YOSHINAGA, K., H. IINUMA, T. MASUZAWA, and K. UEDA. 1996. Extensive RNA editing of U to C in addition to C to U substitution in the rbcL transcripts of hornwort chloroplasts and the origin of RNA editing in green plants. Nucleic Acids Res. 24:1008–1014. ELIZABETH KELLOGG, reviewing editor. Accepted May 18, 2001