Was the ANITA Rooting of the Angiosperm Phylogeny Affected
by Long-Branch Attraction?
Yin-Long Qiu,*† Jungho Lee,*† Barbara A. Whitlock,* Fabiana Bernasconi-Quadroni,† and
Olena Dombrovska*†
*Department of Biology, University of Massachusetts at Amherst; and †Institute of Systematic Botany, University of Zurich,
Zurich, Switzerland
Five groups of basal angiosperms, Amborella, Nymphaeales, Illiciales, Trimeniaceae, and Austrobaileya (ANITA),
were identified in several recent studies as representing a series of the earliest-diverging lineages of the angiosperm
phylogeny. All of these studies except one employed a multigene analysis approach and used gymnosperms as the
outgroup to determine the ingroup topology. The high level of divergence between gymnosperms and angiosperms,
however, has long been implicated in the difficulty of reconstructing relationships at the base of angiosperm phylogeny using DNA sequences, for fear of long-branch attraction (LBA). In this study, we replaced the gymnosperm
sequences from the five-gene matrix (mitochondrial atp1 and matR, plastid atpB and rbcL, and nuclear 18S rDNA)
used in our earlier study with four categories of divergent sequences—random sequences with equal base frequencies
or equally AT- and GC-rich contents, homopolymers and heteropolymers, misaligned gymnosperm sequences, and
aligned lycopod and bryophyte sequences—to evaluate whether the gymnosperms were an appropriate outgroup to
angiosperms in our earlier study that identified the ANITA rooting. All 24 analyses performed rooted the angiosperm
phylogeny at either Acorus or Alisma (or Alisma-Triglochin-Potamogeton in one case due to use of a slightly
different alignment) and placed the monocots as a basal grade, producing genuine LBA results. These analyses
demonstrate that the identification of ANITA as the basalmost extant angiosperms was based on historical signals
preserved in the gymnosperm sequences and that the gymnosperms were an appropriate outgroup with which to
root the angiosperm phylogeny in the multigene sequence analysis. This strategy of evaluating the appropriateness
of an outgroup using artificial sequences and a series of outgroups with increments of divergence levels can be
applied to investigations of phylogenetic patterns at the bases of other major clades, such as land plants, animals,
and eukaryotes.
Introduction
Using an outgroup to polarize character states for
identifying basal lineages within an ingroup is a virtually universal practice in phylogenetic analyses (Farris
1972; Stevens 1980; Maddison, Donoghue, and Maddison 1984; Nixon and Carpenter 1993). Choosing outgroups for assessing relationships at the bases of most
major clades, however, has been difficult because of the
great divergence between the potential outgroups and
the ingroup at both morphological and molecular levels,
which could confound interpretation of the homology of
characters and character states. To reconstruct relationships at the base of the angiosperm phylogeny, extant
and fossil gymnosperms or a hypothetical ancestor has
been used as the outgroup in morphological cladistic
analyses (Dahlgren and Bremer 1985; Donoghue and
Doyle 1989; Loconte and Stevenson 1991; Taylor and
Hickey 1992; Doyle, Donoghue, and Zimmer 1994). In
molecular analyses, however, only living gymnosperms
can be, and usually are, used as the outgroup (Martin
and Dowd 1991; Hamby and Zimmer 1992; Chase et al.
1993; Qiu et al. 1993, 1999, 2000; Doyle, Donoghue,
and Zimmer 1994; Goremykin et al. 1996; Chaw et al.
1997; Soltis et al. 1997, 2000; Parkinson, Adams, and
Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman
et al. 2000; Graham and Olmstead 2000; Savolainen et
al. 2000). Concerns have been expressed that gymnosperms may be too divergent to use as the outgroup for
reconstructing relationships among basal angiosperms
(Qiu et al. 1993; Donoghue and Mathews 1998). In
DNA sequence data, due to the limited number of states
for character evolution and the time elapsed since separation of the outgroup and the ingroup, a distant outgroup might no longer contain historical signals to polarize character states and thus would behave like a random sequence (Miyamoto and Boyle 1989; Wheeler
1990; Qiu and Palmer 1999). Such an outgroup might then attract
the longest ingroup branch, creating the so-called “long-branch attraction” (LBA) problem (Felsenstein 1978;
Hendy and Penny 1989).
Abbreviations: ANITA, Amborella, Nymphaeales, and Illiciales-Trimeniaceae-Austrobaileya; LBA, long-branch attraction.
Key words: Amborella, ANITA, basal angiosperms, long-branch
attraction, outgroup, random sequences.
Address for correspondence and reprints: Yin-Long Qiu, Department of Biology, University of Massachusetts, Amherst, Massachusetts
01003-5810. E-mail: [email protected].
Mol. Biol. Evol. 18(9):1745–1753. 2001
© 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Several approaches have been explored to deal with
the distant-outgroup problem. The first approach is to
use genes that duplicated along the branch leading to
the ingroup, to reciprocally root the two gene phylogenies with each other and thus to infer the organismal
phylogeny (Gogarten et al. 1989; Iwabe et al. 1989;
Donoghue and Mathews 1998). This strategy works
when the duplication occurred close to the point at
which the ingroup diversified and when the duplicated
copies did not experience dramatic rate acceleration.
Another way to deal with the distant-outgroup problem
is to extend the length of sequence analyzed by combining data from multiple genes of all three genomes so
that the signal/noise ratio can be increased to allow a
reliable rooting of the ingroup topology (Hillis 1996;
Soltis et al. 1998; Qiu and Palmer 1999; Graham and
Olmstead 2000). A third way to circumvent the distant-outgroup problem is to use genomic structural features
that are conserved in their evolution and have clearly
understood evolutionary mechanisms (Manhart and
Palmer 1990; Raubeson and Jansen 1992; Qiu et al.
1998). Finally, understanding the homology of morphological characters across the large gap between the outgroup and the ingroup at a deeper level by taking the
molecular developmental biology approach represents a
major direction for future investigation of diversification
patterns at the bases of major clades (Kellogg and Shaffer 1993; Doyle 1994; Carroll 1995; Davidson, Peterson,
and Cameron 1995; Raff 1996; Frohlich and Meyerowitz 1997; Shubin, Tabin, and Carroll 1997; Theissen et
al. 2000).
In reconstructing relationships among basal angiosperms, the first two strategies have been used in several
recent studies that identified the first branches of the
angiosperm phylogeny (Mathews and Donoghue 1999,
2000; Parkinson, Adams, and Palmer 1999; Qiu et al.
1999, 2000; Soltis, Soltis, and Chase 1999; Barkman et
al. 2000; Graham and Olmstead 2000; Soltis et al.
2000). Despite the mutual corroboration between the
studies that employed the duplicated gene rooting strategy and those that adopted the multigene analysis approach in identifying the ANITA lineages as the basalmost extant angiosperms, it is essential to demonstrate
that the multigene analysis approach can stand on a solid
analytic ground on its own and that sampling multiple
genes can indeed enhance the level of phylogenetic signal and thus can overcome the divergence gap problem
between gymnosperms and angiosperms. This concern
is especially justified by the fact that duplicated gene
rooting has been shadowed by the difficulty of placing
Ceratophyllum (Mathews and Donoghue 1999, 2000),
which was identified as the first lineage of angiosperms
in earlier rbcL analyses (Les, Garvin, and Wimpee 1991;
Chase et al. 1993; Qiu et al. 1993).
The key argument used in suggesting that distant
outgroups might no longer be appropriate outgroups in
molecular phylogenetic analyses is that the outgroup sequences are so divergent that the variation they contain
has been randomized due to back-mutations and parallel
mutations during the long time span since separation of
the ingroup and the outgroup (Miyamoto and Boyle
1989; Wheeler 1990; Qiu and Palmer 1999). Hence, one
can test whether or not a particular outgroup still contains phylogenetic signal to root the ingroup by replacing it with a random sequence. If the subsequent analysis reproduces the ingroup topology obtained by the
original outgroup, this may be an indication of LBA
caused by the randomized outgroup. Alternatively, if the
random sequence attracts the longest ingroup branch and
yields a different topology, this would suggest that the
use of the original outgroup might have been appropriate
(Miyamoto and Boyle 1989; Wheeler 1990; Maddison,
Ruvolo, and Swofford 1992; Donoghue 1994; Graham
1997, pp. 122–161; Sullivan and Swofford 1997).
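The reasoning of this replacement test can be written out as a small decision rule. The following is only a sketch of the logic described above, not code used in the study; the function name, the returned labels, and the "inconclusive" fallback are assumptions introduced here for illustration.

```python
def interpret_replacement_test(original_root, replacement_root, longest_ingroup_branches):
    """Classify the outcome of swapping the real outgroup for a random sequence.

    original_root: rooting obtained with the real outgroup.
    replacement_root: rooting obtained with the random (signal-free) sequence.
    longest_ingroup_branches: taxa with the longest branches in the ingroup.
    """
    if replacement_root == original_root:
        # A signal-free outgroup reproduces the original rooting, so the
        # original rooting may itself be a long-branch artifact.
        return "possible LBA"
    if replacement_root in longest_ingroup_branches:
        # The random sequence was drawn to a long branch elsewhere, so the
        # original rooting is not what attraction alone would produce.
        return "original outgroup supported"
    # Case not covered by the argument in the text (assumed fallback).
    return "inconclusive"


# Hypothetical call using labels that appear in this paper.
verdict = interpret_replacement_test("ANITA", "Alisma", {"Alisma", "Acorus"})
```

With these inputs the rule returns "original outgroup supported", which corresponds to the second scenario described in the paragraph above.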
In this study, we performed a series of analyses on
the original data matrix used to identify ANITA as the
earliest-diverging lineages of angiosperms (Qiu et al.
1999, 2000) using several types of artificial (random and
nonrandom) sequences, as well as sequences that are
more divergent than those of gymnosperms, namely,
those of a lycopod and a bryophyte, to test whether our
original use of gymnosperms as the outgroup was justified. Together with the ingroup taxon deletion analyses
and constraint topology analyses presented earlier (Qiu
et al. 2000), we hope that these analyses provide a rigorous analytic perspective for identifying the ANITA
lineages as the earliest branches of the angiosperm
phylogeny.
Materials and Methods
Four categories of divergent sequences were used
to replace the eight gymnosperms (Cycas, Zamia, Ginkgo, Podocarpus, Metasequoia, Pinus, Gnetum, and Welwitschia) in the original matrix (Qiu et al. 2000) as the
outgroup in a series of 24 analyses (table 1). In the first
category, three types of random sequences were generated using the RANUNI function of SAS 8.1 (SAS Institute 2000): the first type consisted of 10 random sequences with equal base frequencies (25% each for A,
C, G, and T), the second type consisted of two random
sequences with 37.5% each for A and T and 12.5% each
for G and C, and the third type consisted of two random
sequences with 12.5% each for A and T and 37.5% each
for G and C. These sequences represent truly random
sequences with equal base frequencies or AT- and GC-rich contents. For the second category, we manually
generated five artificial, nonrandom sequences which
were homopolymers (poly-A’s, poly-C’s, poly-G’s, and
poly-T’s) and heteropolymers (poly-ACGT’s). These sequences represent extreme forms in a sequence universe.
Both of these categories of sequences are of the same
length (8,741 nt) as that of the five genes used in our
earlier study (Qiu et al. 2000). Because these two categories of sequences were of nonbiological origin, they
likely lacked certain unique properties of biological sequences and might behave erratically in phylogenetic
analyses. To counteract this argument, we generated the
third category of divergent sequences by misaligning the
original five-gene sequences of the eight gymnosperms
through deletion of the first position in the alignment
(that of atp1) and filling in the last position in the alignment (that of nu18S rDNA) with a question mark (missing data). In so doing, we destroyed all the nucleotide
position homology between gymnosperms and angiosperms by disrupting the original alignment, thus creating artificially divergent sequences (relative to the angiosperm ingroup) but of the same biological origin as
the original gymnosperm sequences. For the last category, we used aligned sequences of the five genes of a
lycopod and a bryophyte. Both the lycopod and the
bryophyte sequences were composite. For the former,
atp1 was from Lycopodium digitatum (AF209113),
matR and atpB were from Huperzia lucidula
(AY033145, this study, and U93819), rbcL was from
Lycopodium obscurum (Y07935), and 18S rDNA was
from Lycopodium tristachyum (U18511). For the latter,
atp1 (M68929), atpB (X04465), rbcL (X04465), and
18S rDNA (X75521) were all from Marchantia polymorpha, and matR (AF068932) was from Notothylas
breutelii (a hornwort), since the gene is a group II intron-encoded open reading frame and is absent in liverworts and most mosses (Qiu et al. 1998; unpublished
data). To use this last category of divergent sequences,
a new alignment for all of the original angiosperm and
gymnosperm sequences was needed, since inclusion of
the lycopod and bryophyte involved addition or removal
of gaps in the alignment, particularly for matR. The
alignment was done using Clustal X (Thompson et al.
1997). The purpose of using this last category of divergent sequences was to determine at what point a well-aligned biological outgroup sequence behaved like a
random sequence when a nonangiosperm was used as
the outgroup.
Table 1
The Results of 24 Analyses that Used Divergent Sequences as the Outgroup to Root the Angiosperm Phylogeny
Analysis and Outgroup          TBML    Outgroup GC%  No. of MPTs    TH      TL     CI     RI
 1.  rand-seq-1                Alisma          51.0            9   936  16,585  0.568  0.578
 2.  rand-seq-2                Alisma          49.4            9   951  16,535  0.565  0.578
 3.  rand-seq-3                Acorus          49.5            9   953  16,424  0.566  0.580
 4.  rand-seq-4                Alisma          49.9            3   490  16,476  0.566  0.579
 5.  rand-seq-5                Alisma          50.8            9   930  16,450  0.566  0.580
 6.  rand-seq-6                Alisma          50.5           12   929  16,529  0.569  0.580
 7.  rand-seq-7                Acorus          50.0            6   782  16,537  0.567  0.579
 8.  rand-seq-8                Alisma          51.0            9   996  16,480  0.567  0.580
 9.  rand-seq-9                Alisma          50.2            3   469  16,520  0.566  0.580
10.  rand-seq-10               Acorus          50.7            6   761  16,463  0.568  0.580
11.  AT-0.375-1                Acorus          24.9            6   777  16,456  0.563  0.577
12.  AT-0.375-2                Acorus          24.4            6   784  16,400  0.564  0.580
13.  GC-0.375-1                Alisma          75.2           12   994  16,625  0.571  0.579
14.  GC-0.375-2                Alisma          73.9            6   796  16,545  0.567  0.578
15.  Poly-A’s                  Acorus           0.0            6   786  16,313  0.559  0.578
16.  Poly-T’s                  Acorus           0.0            6   790  16,452  0.564  0.579
17.  Poly-ACGT’s               Acorus          50.0            6   806  16,607  0.567  0.578
18.  Poly-C’s                  Alisma         100.0            3   325  16,736  0.571  0.577
19.  Poly-G’s                  Alisma         100.0            9   991  16,482  0.569  0.580
20.  Misaligned Cycas          Acorus          49.1            9   956  14,879  0.535  0.586
21.  Misaligned Ginkgo         Acorus          48.4            9   957  15,813  0.549  0.579
22.  Misaligned 8 gymnosperms  Acorus          48.3            9   947  18,513  0.552  0.842
23.  Aligned lycopod           Acorus          44.2           16   855  13,555  0.413  0.547
24.  Aligned bryophyte         ATP(a)          44.2            8   449  13,843  0.419  0.544
NOTE.—Abbreviations: TBML, the basalmost lineage; MPTs, most-parsimonious trees; TH, times the island(s) of the most-parsimonious trees was hit out of
1,000 random-taxon-addition replicates; TL, tree length; CI, consistency index; RI, retention index. The Acorus branch consists of two species: A. calamus and A.
gramineus.
(a) ATP = Alisma, Triglochin, and Potamogeton.
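The three artificial categories of outgroup sequences described above can be sketched in a few lines. This is an illustration only: the study generated its random sequences with the RANUNI function of SAS, whereas the sketch below substitutes Python's standard random module, and all function names, the seed, and the toy fragment are assumptions introduced here.

```python
import random

BASES = "ACGT"
LENGTH = 8741  # length of the five-gene alignment reported in the text


def random_sequence(length, freqs, rng):
    """Draw each site independently with the given A, C, G, T frequencies."""
    weights = [freqs[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))


def polymer(unit, length):
    """Repeat `unit` and trim to `length` (a homopolymer if unit is one base)."""
    repeats = length // len(unit) + 1
    return (unit * repeats)[:length]


def misalign(aligned):
    """Shift a sequence one column left: drop the first position, pad with '?'."""
    return aligned[1:] + "?"


def gc_percent(seq):
    """GC content as a percentage of the unambiguous A/C/G/T sites."""
    called = [b for b in seq if b in BASES]
    return 100.0 * sum(b in "GC" for b in called) / len(called)


rng = random.Random(2001)  # arbitrary seed, used only for reproducibility
equal = random_sequence(LENGTH, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}, rng)
at_rich = random_sequence(LENGTH, {"A": 0.375, "C": 0.125, "G": 0.125, "T": 0.375}, rng)
gc_rich = random_sequence(LENGTH, {"A": 0.125, "C": 0.375, "G": 0.375, "T": 0.125}, rng)
poly_a = polymer("A", LENGTH)
poly_acgt = polymer("ACGT", LENGTH)
shifted = misalign("ATGGC-TTA")  # toy aligned fragment, not real data
```

The one-column shift in `misalign` leaves the base composition of the sequence essentially unchanged while destroying positional homology with the ingroup, which is the property the third category was designed to have.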
In all 24 analyses except one, we used one divergent sequence to replace the eight gymnosperms in the
original five-gene matrix as the outgroup (table 1). In
only one analysis, we used eight misaligned gymnosperm sequences to replace the eight aligned ones. This
analysis was designed to evaluate the effect of the number of divergent sequences on rooting of the angiosperm
phylogeny, since we could compare its result with that
of two other analyses in which either the misaligned
sequence of Cycas (which lacks data for atpB; see Qiu
et al. 2000) or that of Ginkgo was used as the outgroup.
A heuristic parsimony (equal weighting) search was
conducted using 1,000 random-taxon-addition replicates, one tree held at each step during stepwise addition, tree bisection-reconnection (TBR) branch swapping,
the steepest-descent option, the MulTrees option,
and no upper limit of MaxTrees. A bootstrap analysis
was subsequently performed using 1,000 resampling
replicates and the same tree search procedure as described above except with simple taxon addition. All the
analyses were performed using PAUP 4.0* (Swofford
1998).
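The searches and bootstrap analyses themselves were run in PAUP*. For readers unfamiliar with the procedure, the column-resampling step that underlies one bootstrap pseudoreplicate can be sketched as follows; the function name and the toy alignment are assumptions made for illustration, and the tree search that would follow each replicate is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudoreplicate).

    `alignment` maps taxon name -> aligned sequence; all sequences must
    have the same length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_sites = lengths.pop()
    # Pick n_sites column indices at random, allowing repeats.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


# Toy alignment, not real data.
toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGTTC", "taxon_c": "TCGAAC"}
replicate = bootstrap_alignment(toy, random.Random(1))
```

Each replicate has the same number of sites as the original matrix, and the same column indices are applied to every taxon so that site patterns stay intact.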
To identify the longest ingroup branch and to examine distribution of branch lengths within angiosperms, we performed an unrooted ingroup (i.e., angiosperm only) analysis without using any outgroup. All of
the angiosperms in the original matrix (Qiu et al. 2000)
were kept after the eight gymnosperms were deleted. A
heuristic search with 1,000 random-taxon-addition replicates and the same tree search procedure described
above was conducted.
Results
The details of search results of the 24 analyses using divergent sequences as the outgroup are presented
in table 1. One of the most parsimonious trees found in
the analysis where a random sequence (sequence 1) with
equal base frequencies was used as the outgroup is presented in figure 1. The branch lengths, the bootstrap
values, and the nodes that collapsed in the strict consensus are shown in the figure. All 24 analyses except
one produced virtually the same topology, with the first
branch of angiosperm phylogeny being identified as either Acorus (which consists of two species, A. calamus
and A. gramineus) or Alisma and the monocots forming
a grade at the base of the phylogeny (table 1 and fig.
1). Only in the analysis where the aligned bryophyte
sequence was used as the outgroup was the first branch
of angiosperm trees composed of Alisma, Triglochin,
and Potamogeton, and this variance could be due to the
slightly different alignment used in the analysis. In all
analyses, the topology of the strict consensus of the
most-parsimonious trees was essentially identical to that
of our earlier studies (Qiu et al. 1999, 2000), with the
conspicuous exceptions of the monocot rooting and the
ANITA lineages forming a clade (which occurred in the
earlier rbcL analyses when Ceratophyllum was placed
as a sister to all other angiosperms; see below; Chase et
al. 1993; Qiu et al. 1993). Magnoliales were sister to
Laurales, and Winterales were sister to Piperales, and
together these four clades formed the eumagnoliid clade.
Relationships among the eumagnoliids, eudicots, Chloranthaceae, Ceratophyllum, and ANITA were unresolved
or resolved with no or low bootstrap support.
In the unrooted ingroup analysis, we found two islands of 12 equally most parsimonious trees with a
length of 10,520 steps, a consistency index of 0.411, and
a retention index of 0.610. One of the trees, presented
as an unrooted network showing branch lengths, is
shown in figure 2. It is obvious that the longest ingroup
branches are those leading to Alisma, Potamogeton,
Triglochin, A. calamus, and A. gramineus, when the tree
centers around the juncture of the ANITA lineages, Piperales, Winterales, Laurales, Magnoliales, eudicots,
Chloranthaceae, monocots, and Ceratophyllum.
Discussion
The mystique surrounding LBA (Felsenstein 1978;
Hendy and Penny 1989) has created a situation in which
the phenomenon is frequently invoked to explain a topology that seems to be in conflict with other evidence
or simply to dismiss an unfavorable topology. However,
cases in which LBA is explicitly demonstrated and carefully investigated are few (Miyamoto and Boyle 1989;
Wheeler 1990; Maddison, Ruvolo, and Swofford 1992;
Graham 1997, pp. 122–161; Huelsenbeck 1997; Sullivan
and Swofford 1997; Siddall and Whiting 1999; Sanderson et al. 2000). In the case of reconstruction of basal
angiosperm phylogeny, taxa with very different morphologies were placed at the base of the trees in earlier
molecular studies that analyzed different data sets:
Schisandraceae in nuclear rbcS (Martin and Dowd
1991), Nymphaeales in nuclear rDNAs as well as plastid
ITS and rDNA (Hamby and Zimmer 1992; Goremykin
et al. 1996; Chaw et al. 1997), Ceratophyllum in plastid
rbcL (Les, Garvin, and Wimpee 1991; Chase et al. 1993;
Qiu et al. 1993), and Austrobaileya-Illiciales and Amborella in nuclear 18S rDNA (Soltis et al. 1997). These
seemingly unstable results heightened plant systematists’ fear of the effect of LBA on rooting of the angiosperm phylogeny when gymnosperms, which always
have the longest branch in any data set, were used as
the outgroup (Qiu et al. 1993; Donoghue and Mathews
1998). One critical issue that has not been addressed in
any of the molecular studies using gymnosperms as the
outgroup to root the angiosperm phylogeny is whether
they really are divergent enough to cause LBA (Martin
and Dowd 1991; Hamby and Zimmer 1992; Chase et al.
1993; Qiu et al. 1993, 1999, 2000; Doyle, Donoghue,
and Zimmer 1994; Goremykin et al. 1996; Chaw et al.
1997; Soltis et al. 1997, 2000; Parkinson, Adams, and
Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman
et al. 2000; Graham and Olmstead 2000; Savolainen et
al. 2000). Here, we demonstrate the conditions under
which LBA really becomes a problem. With two categories of artificial sequences and misaligned gymnosperm sequences as the outgroups, we consistently rooted the angiosperm phylogeny at either Acorus or Alisma
(table 1 and fig. 1), two of the longest branches (even
longer than any ANITA members or Ceratophyllum)
among all the angiosperms (fig. 2). These outgroup sequences plainly contain no historical information and
have immensely long branches in comparison with all
others in the trees (fig. 1 and data not shown). The
branch length of the outgroup, random sequence 1,
shown in figure 1 (6,065 steps!) is also in stark contrast
to that of the gymnosperms in our earlier study (354
steps). The fact that virtually the same topology was
reproduced in all of these analyses suggests that we have
demonstrated the conditions under which genuine LBA
can occur, and this is what the earlier authors had predicted (Miyamoto and Boyle 1989; Wheeler 1990). The
variation of rooting between Acorus (45.5% GC) and
Alisma (48.1% GC) appears to be correlated with the
GC content of the outgroup sequence (table 1). A few
exceptions (rand-seq-2, rand-seq-4, rand-seq-10, and the
aligned bryophyte) may be due to the altered GC content
of informative sites relative to the entire sequence.
Therefore, these analyses demonstrate that while the
gymnosperm sequences are highly divergent relative to
those of angiosperms, they are not divergent enough to
cause LBA and thus were an appropriate outgroup in
our original studies (Qiu et al. 1999, 2000). Consequently, the placement of the ANITA lineages at the base of
the angiosperm phylogeny was based on unique historical signals preserved in the gymnosperm sequences and
was not caused by LBA.
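The correlation between outgroup GC content and the rooting can be checked directly against the values in table 1. The sketch below assumes a simple cut-off of 50% GC (at or below: Acorus; above: Alisma); this threshold is not stated in the text and is introduced here only to make the pattern explicit.

```python
# (outgroup, outgroup GC%, basalmost lineage), transcribed from table 1.
TABLE_1 = [
    ("rand-seq-1", 51.0, "Alisma"), ("rand-seq-2", 49.4, "Alisma"),
    ("rand-seq-3", 49.5, "Acorus"), ("rand-seq-4", 49.9, "Alisma"),
    ("rand-seq-5", 50.8, "Alisma"), ("rand-seq-6", 50.5, "Alisma"),
    ("rand-seq-7", 50.0, "Acorus"), ("rand-seq-8", 51.0, "Alisma"),
    ("rand-seq-9", 50.2, "Alisma"), ("rand-seq-10", 50.7, "Acorus"),
    ("AT-0.375-1", 24.9, "Acorus"), ("AT-0.375-2", 24.4, "Acorus"),
    ("GC-0.375-1", 75.2, "Alisma"), ("GC-0.375-2", 73.9, "Alisma"),
    ("Poly-A's", 0.0, "Acorus"), ("Poly-T's", 0.0, "Acorus"),
    ("Poly-ACGT's", 50.0, "Acorus"), ("Poly-C's", 100.0, "Alisma"),
    ("Poly-G's", 100.0, "Alisma"), ("Misaligned Cycas", 49.1, "Acorus"),
    ("Misaligned Ginkgo", 48.4, "Acorus"),
    ("Misaligned 8 gymnosperms", 48.3, "Acorus"),
    ("Aligned lycopod", 44.2, "Acorus"), ("Aligned bryophyte", 44.2, "ATP"),
]

THRESHOLD = 50.0  # assumed cut-off in % GC; not given in the text


def predicted_root(gc):
    """Low-GC outgroups are expected to attract Acorus, high-GC ones Alisma."""
    return "Acorus" if gc <= THRESHOLD else "Alisma"


exceptions = [name for name, gc, root in TABLE_1 if predicted_root(gc) != root]
print(exceptions)
```

Under this assumed cut-off, the analyses that depart from the rule are rand-seq-2, rand-seq-4, rand-seq-10, and the aligned bryophyte, the same four exceptions named above.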
FIG. 1.—One of the nine most-parsimonious trees found in the search in which random sequence 1 was used to replace the eight gymnosperms as the outgroup in the five-gene matrix from Qiu et al. (2000) to root the angiosperm phylogeny. Numbers above branches are branch
lengths (ACCTRAN optimization); those below in italics are bootstrap values (only those >50% are shown). The nodes labeled with asterisks
are collapsed in the strict consensus of the nine shortest trees. Abbreviations: MON, monocots; CER, Ceratophyllum; CHL, Chloranthaceae;
ITA, Illiciales, Trimeniaceae, and Austrobaileya; AMB, Amborella; NYM, Nymphaeales; EUD, eudicots; WIN, Winterales; PIP, Piperales; MAG,
Magnoliales; LAU, Laurales; Acoruspc, Acorus calamus; Acoruspg, Acorus gramineus; Ceratophyllumpd, C. demersum; Ceratophyllumps, C.
submersum; Rand seq 1, random sequence 1.
FIG. 2.—One of the 12 most-parsimonious trees found in the ingroup (angiosperms) only analysis. Numbers along branches are branch
lengths (ACCTRAN optimization). The tree is shown as an unrooted phylogram. The part of the tree covering Laurales exclusive of Calycanthaceae (abbreviated as “L”) is shown in 2× magnification.
The next question to ask is whether the ANITA
rooting can still be an artifact caused by some mechanisms that generate similarities in unrelated lineages by
chance but do not necessarily produce long branches.
One molecular evolutionary phenomenon, RNA editing,
so far known to occur only in organellar genomes
(Yoshinaga et al. 1996; Steinhauser et al. 1999), may be
such a mechanism (Bowe and dePamphilis 1996; Qiu
and Palmer 1999). Nevertheless, individual analyses of
three genes from two organellar genomes (mitochondrial
atp1 and matR and plastid atpB) have all identified the
ANITA clades as the earliest-branching angiosperm lineages (Qiu et al. 1999, 2000; Barkman et al. 2000; Savolainen et al. 2000). It is highly unlikely that the three
genes in two genomes would experience extensive RNA
editing in both gymnosperms and the ANITA members
but not in any other lineages. Furthermore, an analysis
of the nuclear 18S rDNA alone with extensive taxon
sampling also placed Austrobaileya-Illiciales and Amborella at the base of angiosperm phylogeny (Soltis et
al. 1997). No RNA editing has been reported at this
locus to date. Finally, and most importantly, rooting of
the angiosperm phylogeny using duplicated nuclear phytochrome genes has produced a similar result (Mathews
and Donoghue 1999, 2000), reinforcing our belief that
the ANITA rooting was not caused by RNA editing.
GC content bias is another mechanism that does
not necessarily increase branch length dramatically but
still can generate analytic artifacts in phylogenetic analysis of DNA sequences (Steel, Lockhart, and Penny
1993). A brief examination of the GC content in the five
genes across all major lineages of basal angiosperms and
gymnosperms shows that there is no significant difference among lineages. Thus, it is unlikely that the ANITA rooting was affected by this factor.
A final question to ask is whether the concern that
distant outgroups could cause LBA was well placed
(Miyamoto and Boyle 1989; Wheeler 1990; Qiu et al.
1993; Donoghue and Mathews 1998; Qiu and Palmer
1999). Our analyses using well-aligned lycopod and
bryophyte sequences as the outgroup to root the angiosperm phylogeny indicate that exceedingly divergent
outgroups can indeed generate a spurious rooting topology. Both analyses identified either Acorus or Alisma-Triglochin-Potamogeton as the first branch of the angiosperm phylogeny and placed the monocots as a basal
grade (table 1). These results suggest that the lycopod
and bryophyte sequences are so divergent that they behave like random sequences. The outgroup branch
length in the bryophyte rooting analysis was 1,464 steps,
and that in the lycopod rooting analysis was 1,181 steps,
as opposed to the 354 steps of the gymnosperm branch
in Qiu et al. (2000). (Note that the alignment used for
the bryophyte and lycopod rooting analyses was a
slightly different one.) On the other hand, placing
aligned gymnosperm sequences back into the matrix
produced the ANITA rooting again (data not shown; the
gymnosperms formed a monophyletic group, and the
Gnetum-Welwitschia clade was sister to Pinus), supporting the earlier suggestion that one can avoid LBA
by judiciously increasing taxon sampling to break long
branches (Chase et al. 1993; Hillis 1996; Graybeal 1998;
Soltis et al. 1998; Qiu et al. 1999; Qiu and Palmer
1999).
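The outgroup branch lengths quoted in this Discussion can be put on a common scale by dividing each by the 354 steps of the gymnosperm branch. The short calculation below is a rough comparison only, since the lycopod and bryophyte analyses used a slightly different alignment; the variable names are introduced here for illustration.

```python
GYMNOSPERM_STEPS = 354  # gymnosperm outgroup branch in Qiu et al. (2000)

# Outgroup branch lengths (in steps) quoted in the Discussion.
outgroup_steps = {
    "random sequence 1": 6065,
    "aligned bryophyte": 1464,
    "aligned lycopod": 1181,
}

ratios = {name: steps / GYMNOSPERM_STEPS for name, steps in outgroup_steps.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the gymnosperm branch")
```

By this measure the lycopod and bryophyte branches are roughly three to four times as long as the gymnosperm branch, and the random-sequence branch is about 17 times as long.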
The analyses presented here demonstrate that the
gymnosperms were an appropriate outgroup with which
to root the angiosperm phylogeny in our earlier multigene analyses (Qiu et al. 1999, 2000) and that the ANITA rooting is likely free of the LBA effect. Several
other multigene analyses reached similar conclusions on
the identity of the earliest angiosperms (Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999;
Barkman et al. 2000; Graham and Olmstead 2000; Soltis
et al. 2000). It can be extrapolated that their use of gymnosperms as the outgroup did not violate any fundamental rule of choosing an appropriate outgroup. In retrospect, gymnosperms were well-behaved outgroups
even in most single-gene analyses. Various members of
the ANITA grade were placed at the base of angiosperm
trees: Schisandraceae in nuclear rbcS (Martin and Dowd
1991), Nymphaeales in nuclear rDNAs as well as plastid
ITS and rDNA (Hamby and Zimmer 1992; Goremykin
et al. 1996; Chaw et al. 1997), and Austrobaileya-Illiciales and Amborella in nuclear 18S rDNA (Soltis et al.
1997). Insufficient taxon sampling in all of these studies
and the use of single genes (which obviously contain
less signal than multigene data sets) naturally complicate the effort of building a well-resolved phylogeny and
lead to the suspicion that these seemingly different rooting topologies were produced by LBA due to the great
divergence between gymnosperms and angiosperms.
Ironically, the only single-gene analyses that sampled
basal angiosperms extensively produced a rooting that
seems to be an analytical artifact, i.e., the Ceratophyllum
rooting (Chase et al. 1993; Qiu et al. 1993). A reanalysis
of the rbcL matrix used in our recent multigene analyses
(Qiu et al. 1999, 2000) shows that even the placement
of Ceratophyllum as the sister to all other angiosperms
was also largely due to the historical signal contained
in the gymnosperm sequences. When the gymnosperm
sequences were replaced with the artificial sequences
and misaligned gymnosperm sequences used in this
study, the angiosperm trees were rooted at various taxa
that have branches longer than Ceratophyllum (data not
shown). Ceratophyllum is likely an early-diverging lineage of angiosperms, even though its exact relationship
to other major clades of basal angiosperms is not well
resolved at present (Qiu et al. 1999, 2000; Soltis, Soltis,
and Chase 1999; Mathews and Donoghue 2000; Savolainen et al. 2000; Soltis et al. 2000). Thus, its placement
at the base of angiosperm trees in the rbcL analyses was
probably caused by both phylogenetic signal and a few
homoplasious changes that happened to be shared with
gymnosperms (not necessarily by LBA).
Reconstruction of phylogenetic relationships at the
bases of major clades using molecular sequence data
routinely generates controversial results (Qiu and Palmer 1999; Adoutte et al. 2000; Philippe, Germot, and
Moreira 2000), largely due to use of divergent outgroups
and sparse taxon sampling. The LBA problem is frequently invoked to explain results that are otherwise inexplicable. Nevertheless, most claims of LBA have not
been substantiated by explicit analyses. Several parsimony- or likelihood-based tests have been developed to
examine whether long branches indeed attract each other
and to reduce the LBA effect (Huelsenbeck 1997; Lyons-Weiler and Hoelzer 1997; Willson 1999; Sanderson
et al. 2000). The strategy employed here follows the
ideas of Miyamoto and Boyle (1989), Wheeler (1990),
Maddison, Ruvolo, and Swofford (1992), Graham
(1997), and Sullivan and Swofford (1997) in using random sequences to evaluate whether phylogenetic signal
in the outgroup has been randomized. We further elaborated this approach by increasing the repertoire of test
sequences by using homo- and heteropolymers, misaligned original outgroup sequences, and more distantly
related aligned outgroup sequences. In particular, this
last category of outgroup sequences showed several increments of divergence levels and helped to define the
point beyond which the outgroup was no longer appropriate for rooting the ingroup. As it becomes clear that
sampling multiple genes from all of the genomes (two or three, depending on the lineage)
of a large number of organisms can lead to reliable reconstruction of complicated organismal phylogenies
(Hillis 1996; Qiu et al. 1999, 2000; Soltis, Soltis, and
Chase 1999; Savolainen et al. 2000; Soltis et al. 2000)
and that the LBA problem is tractable thanks to the various strategies that are being developed, phylogenetic
analyses of DNA sequences will undoubtedly, along
with comparative genomics and evolutionary developmental biology, allow evolutionary biologists to tackle
many of the issues in the tree of life.
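As an illustration only (not the software or exact procedure used in this study), the kinds of artificial outgroup sequences described above could be generated along the following lines. The function names, the 0.4/0.1 base-frequency skew, and the use of a circular shift to misalign a sequence are assumptions made for this sketch; the resulting strings would still have to be added to the ingroup alignment and analyzed in a phylogenetics program.

```python
import random

BASES = "ACGT"  # order used for the frequency tuples below


def random_sequence(length, freqs=(0.25, 0.25, 0.25, 0.25), seed=None):
    """Return a random DNA string; freqs are the probabilities of A, C, G, T."""
    rng = random.Random(seed)
    return "".join(rng.choices(BASES, weights=freqs, k=length))


def homopolymer(base, length):
    """Return a sequence made of a single repeated base."""
    return base * length


def heteropolymer(unit, length):
    """Repeat a short motif (for example 'AC') and trim it to the given length."""
    repeats = length // len(unit) + 1
    return (unit * repeats)[:length]


def misalign(aligned_seq, shift):
    """Circularly shift a sequence by `shift` columns.

    Assumption: this is one simple way to destroy positional homology
    while keeping length and base composition unchanged.
    """
    if not aligned_seq:
        return aligned_seq
    shift %= len(aligned_seq)
    return aligned_seq[-shift:] + aligned_seq[:-shift]


def build_test_outgroups(real_outgroup, seed=0):
    """Build one example of each kind of test outgroup sequence."""
    n = len(real_outgroup)
    return {
        "random_equal": random_sequence(n, seed=seed),
        # Hypothetical skews; illustrative values, not those of the study.
        "random_AT_rich": random_sequence(n, (0.4, 0.1, 0.1, 0.4), seed=seed),
        "random_GC_rich": random_sequence(n, (0.1, 0.4, 0.4, 0.1), seed=seed),
        "homopolymer_A": homopolymer("A", n),
        "heteropolymer_AC": heteropolymer("AC", n),
        "misaligned": misalign(real_outgroup, shift=n // 3),
    }
```

Each generated sequence has the same length as the real outgroup sequence it replaces, so it can be substituted into the matrix without altering the ingroup alignment.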
Acknowledgments
We thank Ronald Adkins, Albert Blarer, James A.
Doyle, Eva Goldwater, Sean Graham, Margaret Hoey,
Libo Li, and Peter F. Stevens for helpful suggestions,
and Schweizerischer Nationalfonds and University of
Massachusetts for financial support.
LITERATURE CITED
ADOUTTE, A., G. BALAVOINE, N. LARTILLOT, O. LESPINET, B.
PRUD’HOMME, and R. DE ROSA. 2000. The new animal phylogeny: reliability and implications. Proc. Natl. Acad. Sci.
USA 97:4453–4456.
BARKMAN, T. J., G. CHENERY, J. R. MCNEAL, J. LYONS-WEILER, W. J. ELLISENS, G. MOORE, A. D. WOLFE, and C. W.
DEPAMPHILIS. 2000. Independent and combined analyses of
sequences from all three genomic compartments converge
on the root of flowering plant phylogeny. Proc. Natl. Acad.
Sci. USA 97:13166–13171.
BOWE, L. M., and C. W. DEPAMPHILIS. 1996. Effects of RNA
editing and gene processing on phylogenetic reconstruction.
Mol. Biol. Evol. 13:1159–1166.
CARROLL, S. B. 1995. Homeotic genes and the evolution of
arthropods and chordates. Nature 376:479–485.
CHASE, M. W., D. E. SOLTIS, R. G. OLMSTEAD et al. (39 coauthors). 1993. Phylogenetics of seed plants: an analysis of
nucleotide sequences from the plastid gene rbcL. Ann. Mo.
Bot. Gard. 80:528–580.
CHAW, S.-M., A. ZHARKIKH, H.-M. SUNG, T.-C. LAU, and W.-H. LI. 1997. Molecular phylogeny of extant gymnosperms
and seed plant evolution: analysis of nuclear 18S rRNA
sequences. Mol. Biol. Evol. 14:56–68.
DAHLGREN, R., and K. BREMER. 1985. Major clades of the
angiosperms. Cladistics 1:349–368.
DAVIDSON, E. H., K. J. PETERSON, and R. A. CAMERON. 1995.
Origin of bilaterian body plans: evolution of developmental
regulatory mechanisms. Science 270:1319–1325.
DONOGHUE, M. J. 1994. Progress and prospects in reconstructing plant phylogeny. Ann. Mo. Bot. Gard. 81:405–418.
DONOGHUE, M. J., and J. A. DOYLE. 1989. Phylogenetic analysis of angiosperms and the relationships of Hamamelidae.
Pp. 17–45 in P. R. CRANE and S. BLACKMORE, eds. Evolution, systematics, and fossil history of the Hamamelidae.
Vol. 1. Clarendon, Oxford, England.
DONOGHUE, M. J., and S. MATHEWS. 1998. Duplicated genes
and the root of angiosperms, with an example using phytochrome sequences. Mol. Phylogenet. Evol. 9:489–500.
DOYLE, J. A., M. J. DONOGHUE, and E. A. ZIMMER. 1994.
Integration of morphological and ribosomal RNA data on
the origin of angiosperms. Ann. Mo. Bot. Gard. 81:419–
450.
DOYLE, J. J. 1994. Evolution of a plant homeotic multigene
family: toward connecting molecular systematics and molecular developmental genetics. Syst. Biol. 43:307–328.
FARRIS, J. S. 1972. Estimating phylogenetic trees from distance
matrices. Am. Nat. 106:645–668.
FELSENSTEIN, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool.
27:401–410.
FROHLICH, M. W., and E. M. MEYEROWITZ. 1997. The search
for homeotic gene homologs in basal angiosperms and Gnetales: a potential new source of data on the evolutionary
origin of flowers. Int. J. Plant Sci. 158:S131–S142.
GOGARTEN, J. P., H. KILBAK, P. DITTRICH, L. TAIZ, E. J. BOWMAN, B. J. BOWMAN, M. F. MANOLSON, R. J. POOLE, T.
DATE, and T. OSHIMA. 1989. Evolution of vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc.
Natl. Acad. Sci. USA 86:6661–6665.
GOREMYKIN, V., V. BOBROVA, J. PAHNKE, A. TROITSKY, A.
ANTONOV, and W. MARTIN. 1996. Noncoding sequences
from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support Gnetalean affinities of
angiosperms. Mol. Biol. Evol. 13:383–396.
GRAHAM, S. W. 1997. Phylogenetic analysis of breeding system evolution in heterostylous monocotyledons. Ph.D. dissertation, University of Toronto, Toronto, Canada.
GRAHAM, S. W., and R. G. OLMSTEAD. 2000. Utility of 17
chloroplast genes for inferring the phylogeny of the basal
angiosperms. Am. J. Bot. 87:1712–1730.
GRAYBEAL, A. 1998. Is it better to add taxa or characters to a
difficult phylogenetic problem? Syst. Biol. 47:9–17.
HAMBY, R. K., and E. A. ZIMMER. 1992. Ribosomal RNA as
a phylogenetic tool in plant systematics. Pp. 50–91 in P. S.
SOLTIS, D. E. SOLTIS, and J. J. DOYLE, eds. Molecular systematics of plants. Chapman and Hall, New York.
HENDY, M. D., and D. PENNY. 1989. A framework for the
quantitative study of evolutionary trees. Syst. Zool. 38:297–
309.
HILLIS, D. M. 1996. Inferring complex phylogenies. Nature
383:130–131.
HUELSENBECK, J. P. 1997. Is the Felsenstein zone a fly trap?
Syst. Biol. 46:69–74.
IWABE, N., K.-I. KUMA, M. HASEGAWA, S. OSAWA, and T.
MIYATA. 1989. Evolutionary relationship of archaebacteria,
eubacteria, and eukaryotes inferred from phylogenetic trees
of duplicated genes. Proc. Natl. Acad. Sci. USA 86:9355–
9359.
KELLOGG, E. A., and H. B. SHAFFER. 1993. Model organisms
in evolutionary studies. Syst. Biol. 42:409–414.
LES, D. H., D. K. GARVIN, and C. F. WIMPEE. 1991. Molecular
evolutionary history of ancient aquatic angiosperms. Proc.
Natl. Acad. Sci. USA 88:10119–10123.
LOCONTE, H., and D. W. STEVENSON. 1991. Cladistics of the
Magnoliidae. Cladistics 7:267–296.
LYONS-WEILER, J., and G. A. HOELZER. 1997. Escaping from
the Felsenstein zone by detecting long branches in phylogenetic data. Mol. Phylogenet. Evol. 8:375–384.
MADDISON, D. R., M. RUVOLO, and D. L. SWOFFORD. 1992.
Geographic origins of human mitochondrial DNA: phylogenetic evidence from control region sequences. Syst. Biol.
41:111–124.
MADDISON, W. P., M. J. DONOGHUE, and D. R. MADDISON.
1984. Outgroup analysis and parsimony. Syst. Zool. 33:83–
103.
MANHART, J. R., and J. D. PALMER. 1990. The gain of two
chloroplast tRNA introns marks the green algal ancestors
of land plants. Nature 345:268–270.
MARTIN, P. G., and J. M. DOWD. 1991. Studies of angiosperm
phylogeny using protein sequences. Ann. Mo. Bot. Gard.
78:296–337.
MATHEWS, S., and M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome
genes. Science 286:947–950.
———. 2000. Basal angiosperm phylogeny inferred from duplicated phytochromes A and C. Int. J. Plant Sci. 161:S41–
S55.
MIYAMOTO, M. M., and S. M. BOYLE. 1989. The potential
importance of mitochondrial DNA sequence data to eutherian mammal phylogeny. Pp. 437–450 in B. FERNHOLM, K.
BREMER, and H. JOERNVALL, eds. The hierarchy of life. Elsevier, Amsterdam, the Netherlands.
NIXON, K. C., and J. M. CARPENTER. 1993. On outgroups. Cladistics 9:413–426.
PARKINSON, C. L., K. L. ADAMS, and J. D. PALMER. 1999.
Multigene analyses identify the three earliest lineages of
extant flowering plants. Curr. Biol. 9:1485–1488.
PHILIPPE, H., A. GERMOT, and D. MOREIRA. 2000. The new
phylogeny of eukaryotes. Curr. Opin. Genet. Dev. 10:596–
601.
QIU, Y.-L., M. W. CHASE, D. H. LES, and C. R. PARKS. 1993.
Molecular phylogenetics of the Magnoliidae: cladistic analyses of nucleotide sequences of the plastid gene rbcL. Ann.
Mo. Bot. Gard. 80:587–606.
QIU, Y.-L., Y. CHO, J. C. COX, and J. D. PALMER. 1998. The
gain of three mitochondrial introns identifies liverworts as
the earliest land plants. Nature 394:671–674.
QIU, Y.-L., J. LEE, F. BERNASCONI-QUADRONI, D. E. SOLTIS, P.
S. SOLTIS, M. ZANIS, E. A. ZIMMER, Z. CHEN, V. SAVOLAINEN, and M. W. CHASE. 1999. The earliest angiosperms:
evidence from mitochondrial, plastid and nuclear genomes.
Nature 402:404–407.
———. 2000. Phylogeny of basal angiosperms: Analyses of
five genes from three genomes. Int. J. Plant Sci. 161:S3–
S27.
QIU, Y.-L., and J. D. PALMER. 1999. Phylogeny of early land
plants: insights from genes and genomes. Trends Plant Sci.
4:26–30.
RAFF, R. A. 1996. The shape of life. University of Chicago
Press, Chicago.
RAUBESON, L. A., and R. K. JANSEN. 1992. Chloroplast DNA
evidence on the ancient evolutionary split in vascular land
plants. Science 255:1697–1699.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J.-M. HU, T. SHER
KHAN, and S. G. BRADY. 2000. Error, bias, and long-branch
attraction in data for two chloroplast photosystem genes in
seed plants. Mol. Biol. Evol. 17:782–797.
SAS INSTITUTE. 2000. SAS 8.1. SAS Institute, Cary, N.C.
SAVOLAINEN, V., M. W. CHASE, S. B. HOOT, C. M. MORTON,
D. E. SOLTIS, C. BAYER, M. F. FAY, A. Y. DE BRUIJN, S.
SULLIVAN, and Y.-L. QIU. 2000. Phylogenetics of flowering
plants based upon a combined analysis of plastid atpB and
rbcL gene sequences. Syst. Biol. 49:306–362.
SHUBIN, N., C. TABIN, and S. CARROLL. 1997. Fossils, genes,
and the evolution of animal limbs. Nature 388:639–648.
SIDDALL, M. E., and M. F. WHITING. 1999. Long-branch abstractions. Cladistics 15:9–24.
SOLTIS, D. E., P. S. SOLTIS, M. W. CHASE et al. (16 co-authors).
2000. Angiosperm phylogeny inferred from 18S rDNA,
rbcL and atpB sequences. Bot. J. Linn. Soc. 133:381–461.
SOLTIS, D. E., P. S. SOLTIS, M. E. MORT, M. W. CHASE, V.
SAVOLAINEN, S. B. HOOT, and C. M. MORTON. 1998. Inferring complex phylogenies using parsimony: an empirical
approach using three large DNA data sets for angiosperms.
Syst. Biol. 47:32–42.
SOLTIS, D. E., P. S. SOLTIS, D. L. NICKRENT et al. (16 coauthors). 1997. Angiosperm phylogeny inferred from 18S
ribosomal DNA sequences. Ann. Mo. Bot. Gard. 84:1–49.
SOLTIS, P. S., D. E. SOLTIS, and M. W. CHASE. 1999. Angiosperm phylogeny inferred from multiple genes as a tool for
comparative biology. Nature 402:402–404.
STEEL, M. A., P. J. LOCKHART, and D. PENNY. 1993. Confidence in evolutionary trees from biological sequence data.
Nature 364:440–442.
STEINHAUSER, S., S. BECKERT, I. CAPESIUS, O. MALEK, and V.
KNOOP. 1999. Plant mitochondrial RNA editing. J. Mol.
Evol. 48:303–312.
STEVENS, P. F. 1980. Evolutionary polarity of character states.
Annu. Rev. Ecol. Syst. 11:333–358.
SULLIVAN, J., and D. L. SWOFFORD. 1997. Are guinea pigs
rodents? The importance of adequate models in molecular
phylogenetics. J. Mamm. Evol. 4:77–86.
SWOFFORD, D. L. 1998. PAUP*: phylogenetic analysis using
parsimony (*and other methods). Version 4.0b2. Sinauer,
Sunderland, Mass.
TAYLOR, D. W., and L. J. HICKEY. 1992. Phylogenetic evidence
for the herbaceous origin of angiosperms. Plant Syst. Evol.
180:137–156.
THEISSEN, G., A. BECKER, A. DI ROSA, A. KANNO, J. T. KIM,
T. MUENSTER, K.-U. WINTER, and H. SAEDLER. 2000. A
short history of MADS-box genes in plants. Plant Mol.
Biol. 42:115–149.
THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN,
and D. G. HIGGINS. 1997. The Clustal X windows interface:
flexible strategies for multiple sequence alignment aided by
quality analysis tools. Nucleic Acids Res. 25:4876–4882.
WHEELER, W. C. 1990. Nucleic acid sequence phylogeny and
random outgroups. Cladistics 6:363–367.
WILLSON, S. J. 1999. A higher order parsimony method to
reduce long-branch attraction. Mol. Biol. Evol. 16:694–705.
YOSHINAGA, K., H. IINUMA, T. MASUZAWA, and K. UEDA.
1996. Extensive RNA editing of U to C in addition to C to
U substitution in the rbcL transcripts of hornwort chloroplasts and the origin of RNA editing in green plants. Nucleic Acids Res. 24:1008–1014.
ELIZABETH KELLOGG, reviewing editor
Accepted May 18, 2001