Changing the Landscape: A New Strategy for

Syst. Biol. 50(1):60–66, 2001
Changing the Landscape: A New Strategy for Estimating
Large Phylogenies
D ONALD L. J. Q UICKE,1 J ASON TAYLOR,2 AND ANDY PURVIS 2,3
1 Unit of Parasite Systematics, CABI, Bioscience UK Centre (Ascot), Department of Biology,
Imperial College at Silwood Park, Ascot, Berkshire, SL5 7PY, and Department of Entomology,
The Natural History Museum, London, SW7 5BD, U.K.; E-mail: [email protected]
2 Department of Biology, Imperial College, Silwood Park, Ascot, Berkshire, SL5 7PY, U.K.;
E-mail: [email protected]
3 NERC Centre for Population Biology, Silwood Park, Ascot, Berkshire, SL5 7PY, U.K.;
E-mail: [email protected]
Abstract.—In this paper we describe a new heuristic strategy designed to Žnd optimal (parsimonious)
trees for data sets with large numbers of taxa and characters. This new strategy uses an iterative
searching process of branch swapping with equally weighted characters, followed by swapping with
reweighted characters. This process increases the efŽciency of the search because, after each round of
swapping with reweighted characters, the subsequent swapping with equal weights will start from a
different group (island) of trees that are only slightly, if at all, less optimal. In contrast, conventional
heuristic searching with constant equal weighting can become trapped on islands of suboptimal trees.
We test the new strategy against a conventional strategy and a modiŽed conventional strategy and
show that, within a given time, the new strategy Žnds trees that are markedly more parsimonious.
We also compare our new strategy with a recent, independently developed strategy known as the
Parsimony Ratchet. [Heuristic algorithms; large phylogenies; Parsimony Ratchet; search time.]
Without phylogenies, many evolutionary
studies would be difŽcult, if not impossible
(Harvey et al., 1996). In studies that utilize
phylogenies, the inclusion of more taxa often
allows the analysis of more data. Large phylogenies, therefore, have some advantages,
such as increasing sample size for statistical analyses and lending greater credibility
to extrapolations from the data. Large phylogenies also facilitate broad-scale studies that
prevent our view of evolutionary trends from
becoming distorted by small, possibly nonrepresentative, examples (Gould, 1997). Furthermore, certain types of analysis require
large and complete species-level phylogenies, such as the localization of diversiŽcation rates and trait values to nodes in the
phylogeny to allow sister clade comparisons
(Barraclough et al., 1998).
The importance of phylogenetic reconstruction has led to investigations of the
consistency, efŽciency, accuracy, and relative
performance of various optimality criteria
(Stewart, 1993; Hillis et al., 1994; Sidow, 1994;
Hillis, 1995) and of the strategies used for
estimating large phylogenies (Farris et al.,
1996; Nixon, 1999). Here, after briey summarizing the problems in searching for optimal trees, we describe a new search strategy
and show that it is markedly more efŽcient
at Žnding optimal trees, in this case most-
parsimonious trees, for data sets containing
large numbers of characters and taxa. We also
compare our strategy with a similar and independently developed strategy known as
the Parsimony Ratchet (Nixon, 1999).
Searching for Optimal Trees
The reconstruction of large phylogenies is
difŽcult, whatever the optimality criterion
used, because the number of possible trees
grows very rapidly as the number of taxa increases. With 10 taxa, for example, >34 £ 106
trees are possible; for 20 taxa, >8:2 £ 1021 ;
and for 228 taxa, »1:2 £ 10502 (Quicke 1993;
Hillis, 1996; Purvis and Quicke, 1997). Consequently, for more than »20 taxa, searching
strategies that guarantee to Žnd the optimal
tree are unreasonably time-consuming, such
that approximate (heuristic) methods have to
be used.
Conventional heuristic strategies generally proceed by randomly choosing taxa and
adding them, one by one, to an optimal
position on the growing tree until all taxa
are added and the tree is complete (random step-wise addition). Unfortunately, as
taxa are added, alternative and better positions for some of the previously added taxa
may be formed. After all taxa have been
added, branch swapping is then used to try
60
2001
QUICKE ET AL.—CHANGING THE LANDSCAPE
61
FIGURE 1. A hypothetical treescape with many suboptimal peaks, each of which has the potential to trap
conventional heuristic searches.
to Žnd these new positions and thus Žnd
shorter trees. However, as the number of taxa
increases, these conventional strategies become increasingly limited because of the time
taken to swap branches, especially when a
large number of very similar trees are held
in memory. The large amount of search time
devoted to branch swapping results in fewer
random additions and thus limits the search
to small groups of similar trees; this in turn reduces conŽdence in the results of the search.
The New Strategy
The new strategy can be explained most
easily with reference to a hypothetical threedimensional tree-landscape (Fig. 1). Both the
horizontal axes of this landscape can be
any measures of tree similarity, whereas the
height of the landscape is tree optimality.
In this study, parsimony has been chosen
as the optimality criterion because several
of the data sets are morphological and also
because this is the most widely used criterion. Therefore, in this study the height
of the peak refers to how parsimonious the
trees are; that is, higher peaks are more parsimonious (Maddison’s [1991] “tree-islands”
analogy). Although we have concentrated on
parsimony analysis, our new strategy could
equally well be applied to other types of analysis, for example, distance methods.
The new strategy (Fig. 2) explores many
more peaks on the treescape than would a
conventional search. This is achieved by initially holding (saving) only one tree during
branch swapping (stage 1), whereas conventional strategies get bogged down by saving
and swapping large numbers of trees (Swofford et al., 1996). On completion of stage 1,
the search will have found at least one tree
FIGURE 2. A step-by-step ow chart of the New strategy. See text for details.
on one, possibly suboptimal, peak. Further
branch swapping would be able only to move
up this peak, which may be only a local
optimum (Maddison, 1991). To escape these
peaks, the characters are reweighted proportionately to their goodness-of-Žt in the previously found trees (stage 2), and because the
data have been altered, the treescape is then
changed. Perhaps isolating dips will be transformed into accessible uphill climbs, thereby
allowing the search to escape from the peaks
during the next phase of branch-swapping
(stage 3). The treescape is then returned to
its original form by giving the characters
their original weights (stage 4). Any uphill
movement across the modiŽed treescape can
move the search off of the peaks on the original treescape, allowing branch swapping to
62
VOL. 50
S YSTEMATIC BIOLOGY
move the search up new, potentially higher,
peaks (stage 5). When tree length stabilizes
(for equally weighted characters), the procedure stops (stage 6). To recap: Stage 1 of the
new strategy involves exploring many peaks
on the treescape. The rest of the stages explore the surroundings by crossing depressions in the treescape. The weighting and
swapping is iterative and can be continued
until tree length stabilizes. This strategy is
more efŽcient than conventional strategies
because, after each round of swapping with
the weighted characters, the swapping with
equal weights starts from trees that are only
slightly less optimal than trees found previously during the search. This is much more
efŽcient than swapping the branches of trees
found by random addition, which could be
much worse than previously found trees.
Furthermore, if one assumes that the phylogenetic information of a character can be
estimated by its Žt with the trees previously
found by the strategy, then the weighting
can be thought of as amplifying this information. Because characters that are considered good indicators of phylogeny are
given more weight, the uphill movement
on the treescape will be in the direction of
shorter trees for the improved (weighted)
data. When the treescape is converted to its
original form, these trees may be longer; that
is, the search may have moved downhill.
However, the search has moved to trees in
which weighted characters have had more
inuence. Therefore, under the assumption
that the characters that best Žt with the trees
are related to their phylogenetic reliability,
the weighting and swapping will cross the
depressions in the direction of more optimal
trees for the equally weighted data rather
than move in a random direction. This will
also increase the efŽciency of the searching.
M ATERIALS AND M ETHODS
Because the efŽciency of the new searching strategy will depend on the data set
investigated—that is, some treescapes will
be easier to search than others—we tested
the new strategy on four sets of data that
vary in number of taxa, number of characters, and character type: (1) 95 morphological
characters for 126 genera of doryctine wasps
(Insecta: Hymenoptera: Braconidae: Doryctinae) (Belokobylskij et al., in prep.), (2) 575
aligned nucleotides of nuclear 28S rDNA
for 111 taxa of eulophid wasps (Insecta:
Hymenoptera: Eulophidae) (Gauthier et al.,
2000), (3) 123 morphological characters for
80 taxa of ichneumonid wasps (Insecta: Hymenoptera: Ichneumonidae) (Quicke et al.,
2000), and (4) 259 morphological characters
for 234 taxa of chewing lice (Insecta: Phthiraptera: Ischnocera: Trichodectidae) (Taylor
et al., in prep.). All the data sets are available from www.bio.ic.ac.uk/research/data/
and a summary description of each data set
is given in the legend to Table 1.
TABLE 1. Shown are the lengths for the trees found by the various strategies for (1) the doryctine wasps data,
which consists of 95 morphological characters (2 uninformative) for 126 general (2) the eulophid wasp data consisting
of 575 aligned nucleotides from the D2 expansion region of the nuclear 28S rDNA gene (328 uninformative sites)
for 111 taxa; (3) the ichneumonid wasp data consisting of 123 morphological characters (1 uninformative) for 80
taxa; and (4) the chewing louse data which consists of 259 morphological characters (6 uninformative) for 234 taxa.
Six replicates were carried out for the conventional, the modiŽed, and the new strategies. See text for details.
Data set
Doryctine wasps
Eulophid wasps
Ichneumonid wasps
Chewing lice
Strategy
Conventional
ModiŽed
New (max CI)
New (max RI)
Conventional
ModiŽed
New (max CI)
New (max RI)
Conventional
ModiŽed
New (max CI)
New (max RI)
Conventional
ModiŽed
New (max CI)
New (max RI)
Tree lengths
663
660
655
656
1867
1863
1863
1863
1367
1368
1366
1365
1647
1643
1641
1645
661
657
655
659
660
658
663
660
656
667
657
654
660
658
655
1867
1862
1863
1864
1863
1863
1867
1863
1863
1863
1863
1863
1867
1862
1862
1369
1367
1364
1367
1366
1365
1368
1368
1364
1369
1366
1364
1364
1367
1366
1647
1644
1643
1639
1643
1642
1645
1643
1645
1639
1640
1644
1647
1640
1639
2001
QUICKE ET AL.—CHANGING THE LANDSCAPE
Each data set was analyzed with the following strategies:
1. Conventional strategy (Con) of building
starting trees by using random addition,
followed by tree bisection and reconnection (TBR) branch swapping with the
number of trees held effectively unlimited
(set to 30,000).
2. ModiŽed conventional strategy (Mod)
performed the same way as the conventional search, but holding only one tree
during TBR swapping; this strategy is
stage 1 of the new strategy (Fig. 2).
3. The new strategy (New), carried out by using maximum character consistency index
(max CI) as the reweighting factor.
4. The same as strategy 3 but reweighting
by using the maximum character retention
index (max RI).
All analyses were carried out with PAUP¤
(4.0d64(PPC, test)) (kindly provided by D.
Swofford) run on a 266 MHz PowerMacintosh G3, and each strategy was allowed a total of 80 min to analyze each data set. A time
limit was set rather than the number of random additions, because in practice, time is
usually the limiting factor. For the New strategy, the 80 min was divided into eight blocks
of 10 min each: The Žrst 10 min was allocated
to obtaining the starting tree (Fig. 2, step 1)
with random additions followed by TBR
branch swapping, holding no more than one
tree during swapping; 10 min was then allocated for each subsequent round of weighting and swapping, and for these, maxtrees
was increased to 30,000 (i.e., effectively unlimited). Each strategy was then performed
six times per data set, except for the New
strategy with max RI weighting, which was
carried out once as a preliminary investigation of different weighting methods. All
analyses were started with a different random seed. To test if the differences in the tree
lengths found by each strategy were significantly different, a one-way ANOVA and a
Least SigniŽcant Difference (LSD) test were
carried out on the results from each data set.
A direct comparison between the new
technique and the Parsimony Ratchet
(Nixon, 1999) was carried out for one of our
four data sets (ichneumonid wasps). We
used only the one data set, in view of the time
taken for completing searches with maxtrees
63
unlimited, as would be the situation in most
real analyses. Although we used only one
data set, 41 random additions with TBR
swapping (number of trees held and saved
unlimited) led to 40 different islands of trees,
which suggests that the treescape contains
many peaks. That makes this a suitable data
set for testing the relative ability of these
strategies to move between the peaks of the
treescape to Žnd more-parsimonious trees.
To compare the strategies, each of the 40
islands was used as a starting point for one
iteration of the New strategy with maximum
values of the rescaled consistency index (RC),
CI, and RI as the reweighting factors. This
resulted in three searches (each with different weighting function) for 40 starting islands, giving a total of 120 searches for the
New strategy. Each of the 40 islands was
also used as a starting point for one iteration
of the Parsimony Ratchet. The Parsimony
Ratchet was simulated by using PAUP¤ by
up-weighting 15% of the characters to a
weight of 2, compared with the remaining
85%, which were weighted at 1 (in line with
Nixon, 1999). Because characters chosen at
random might affect the performance of the
Parsimony Ratchet, each search was therefore performed with 13 sets of randomly chosen characters up-weighted. This resulted in
13 searches consisting of a single iteration
for each of the 40 starting islands, for a total of 520 searches. Note that the same 13
sets of randomly chosen and up-weighted
characters were used on each of the starting
islands.
R ES ULTS
Comparisons between Con, Mod, and New
Strategies
The lengths of the shortest trees found by
each strategy are shown in Table 1 and the results from the ANOVA and LSD tests carried
out on these tree lengths are given in Table 2.
The results show that, except for the chewing
lice data, and the modiŽed strategy for the
eulophid molecular data, the New method
Žnds signiŽcantly shorter trees. In the New
strategy’s analyses of the ichneumonid morphological data, the search came to a premature Žnish; that is, a single tree was found
in ·80 min. The same numbers of iterations
were carried out, but the search time was
greatly decreased, averaging 28 min. Moreover, the tree found in this much shorter
64
VOL. 50
S YSTEMATIC BIOLOGY
TABLE 2. The results of the statistical analysis of the tree lengths found by various strategies (data in Table 1).
The P-values and the F-ratios (degrees of freedom in brackets) from the ANOVA are shown in the left of the table.
The right of the table contains the comparisons made by the LSD test to show which strategies Žnd signiŽcantly
different tree lengths. The Žnal column contains various conŽdence intervals of the differences in tree length in
the strategies being compared. If this range does not encompass zero, the results are signiŽcantly different. If both
values are <0, the Žrst-named strategy Žnds signiŽcantly shorter trees; if both are positive, the Žrst-named strategy
Žnds signiŽcantly longer trees. The LSD test were carried out for 90%, 95%, and 98% conŽdence intervals; however,
all tests were either signiŽcant at the 2% level (P < 0:02; identiŽed with¤ ) or not signiŽcant at the 10% level (P > 0:1;
ns). The LSD tests were not performed for the louse data because the ANOVA showed no signiŽcant difference
among strategies.
ANOVA
LSD
Data set
F (2, 17)
P-value
Dorycine wasps
16.23
<0.01
Eulophidae
15.04
<0.01
Icneumonid wasps
6.34
0.01
Chewing lice
0.80
0.469
time was signiŽcantly shorter ( p · 0:01) than
those found by the conventional method.
The new searching strategy was also carried out by using max RI to reweight the
characters. Although only one replicate was
performed, the results suggest that, at least
for these four data sets, using max RI
and max CI weighting produced no obvious difference between the lengths of trees
found.
Comparisons
ConŽdence interval
New versus Con
New versus Mod
Mod versus Con
New versus Con
New versus Mod
Mod versus Con
New versus Con
New versus Mod
Mod versus Con
¡4.484, ¡0.516 ¤
¡4.151, ¡0.182 ¤
¡1.670, 1.003ns
¡4.691, ¡1.309 ¤
¡0.972, 1.306ns
¡4.857, ¡1.476
¡4.484, ¡0.516 ¤
¡4.151, ¡0.182 ¤
¡1.670, 1.003ns
trial that never led to trees longer than the
starting trees.
D IS CUS SION
We have described our New strategy and
shown that it Žnds signiŽcantly shorter
trees than both the conventional and modiŽed conventional strategy (Table 2). However, at least three issues remain to be
dealt with: choosing a method for character
Comparison with the Parsimony Ratchet
The results of our comparisons are presented in Table 3. Of the 13 ratchet replicates, some of the randomly selected and
up-weighted character sets consistently led
to shorter trees than the starting island (see
Table 3, best), whereas others led to trees that
were longer than the starting trees more often than to shorter ones (Table 3, worse). The
median performance of the ratchet indicates
that it performs well, leading to shorter tree
islands on 13 occasions, leading to longer islands on 4 occasions, and making no difference in 23 runs. The New strategy, using both
RC and CI as reweighting functions, performed marginally better than the median
ratchet, Žnding shorter trees from 15 and 14
of the starting islands, respectively, although
with RC, more islands led to worse trees than
in the median ratchet. Using RI found shorter
trees fewer times than the median ratchet, but
it was the only reweighting scheme in this
TABLE 3. Comparison of the New strategy and a simulation of the Parsimony Ratchet (using 15% of randomly selected characters up-weighted by 1). The number of reweighting runs (out of 40) that found shorter or
longer trees than the starting trees for the New strategy
using maximum RC, CI, and RI as weighting functions
and are shown on the left half of the table. The results
from the Ratchet simulation are shown on the right half
of the table. The column entitled Best shows the number
of the 40 replicates that found trees shorter or longer than
the starting tree by using the set of randomly chosen and
up-weighted characters that performed best. The Worse
column contains the number of the 40 replicates that
found trees shorter or longer than the starting tree when
the set of randomly chosen and up-weighted characters
used performed worse. The Žnal column (Median) contains the median number of shorter or longer trees found
from the 13 sets of randomly chosen and up-weighted
characters.
New strategy
Shorter
Longer
Parsimony ratchet
RC
CI
RI
Best
Worse
Median
15
9
14
3
10
0
22
1
10
18
13
4
2001
QUICKE ET AL.—CHANGING THE LANDSCAPE
reweighting; possible modiŽcations to our
strategy; and how our work relates to the Parsimony Ratchet (Nixon, 1999).
Reweighting the Characters
Character reweighting can be carried out
automatically within the PAUP¤ program
by a range of methods. Unfortunately, it is
difŽcult to determine whether a particular
method of weighting is suitable for all data
sets, or whether an examination of the data
will allow the appropriate method to be determined. Therefore, the user must make the
choice of method. In a further analysis of
the chewing lice data (Taylor et al., in prep.),
we chose the max CI because it cannot weight
characters as zero. This is an attractive property because the data contain much homoplasy. If the chosen method can allocate zero
weights, many potentially useful characters
would be excluded from the analysis on the
basis of their assessment on the preliminary
starting trees.
In this study the original weights of the
characters are equal. However, this is not a
necessary part of the strategy. Characters can
be given any starting weights that are appropriate, and after each round of reweighting
and swapping, they should be returned to
those weights.
Strategy ModiŽcations
One obvious modiŽcation that can be
made to our strategy is to carry out the
search in parallel for each initial tree found
by stage 1. This would explore more of the
treescape by exploring around each peak and
is likely to be particularly important when
the treescape is too large to be examined by
starting from one point. Alternatively, rather
than stopping the iterations when tree length
stabilizes, more iterations could be carried
out to explore more of the treescape.
65
portant is currently undetermined and will
require further work.
The availability of two similar, contemporary, strategies can have only positive consequences. For instance, researchers estimating
large phylogenies now have a choice of two
strategies that are clearly described, implemented by using two different user-friendly
programs, and both proven to work better
than conventional strategies. Not only does
this give a choice of strategies, it also encourages additional development and improvements. An example of such an anticipated improvement may come from further
investigation of the most appropriate methods for character weighting (see also Nixon,
1999). This may clarify the problem of choosing a method for character weighting or
lead to implementation of other versions of
these strategies that are suitable for particular types of data. If no one method is proved
to be better for all data sets, that might lead
to the implementation of our strategy on
fast parsimony programs such as NONA
(Goloboff, 1993), which, as currently set up,
cannot be used with our strategy.
In summary, the New strategy is easy
to understand, easy to implement by using
PAUP, and more efŽcient than conventional
methods. This, coupled with other methods
for constructing large trees (Farris et al., 1996;
Goloboff, 1999; Nixon, 1999), and the inevitable future improvements, will facilitate
the estimation of large, useful phylogenies
that, until now, have been very difŽcult.
ACKNOWLEDGMENTS
We thank A. Webster, A. Katzourakis, G. Broad,
P.-M. Agapow, J. Harral, and especially M. Goddard and
T. Barraclough for comments on the manuscript. We also
thank D. Swofford for access to developmental versions
of PAUP¤ and two anonymous reviewers for their helpful comments. This work was funded by a Natural Environment Research Council (NERC) PhD studentship
and the NERC Initiative in Taxonomy.
The Parsimony Ratchet
R EFERENCES
As previously mentioned, the Parsimony
Ratchet (Nixon, 1999) is a similar, independently derived strategy. Its main difference is
that, to escape the peaks, the weight of a set
of randomly chosen characters is increased
by 1 (or more) rather than by their Žt to the
previously found trees. Whether any of the
differences in weighting between the New
strategy and the Parsimony Ratchet is im-
B ARRACLOUGH, T. G., A. P. VOGLER, AND P. H. HARVEY.
1998. Revealing factors that promote speciation. Philos. Trans. R. Soc. London Ser. B 353:241–249.
FARRIS , J. S ., V. A. ALBERT , M. K ALLERSJO , D. LIPS COMB ,
AND A. G. K LUGE . 1996. Parsimony jackkniŽng outperforms neighbor-joining. Cladistics 12:99–124.
G AUTHIER, N., J. LASALLE, D. L. J. Q UICKE, AND
H. C. J. G ODFRAY. 2000. The phylogeny and classiŽcation of the Eulophidae (Hymenoptera, Chalcidoidea) and the recognition that the Elasmidae are
derived eulophids. Syst. Entomol. 25:251–539.
66
S YSTEMATIC BIOLOGY
G OLOBOFF, P. A. 1993. NONA, version 1.1. Computer
program and documentation distributed by J. M.
Carpenter, American Museum of Natural History,
New York. ([email protected]).
G OLOBOFF, P. A. 1999. Analyzing large data sets in reasonable times: Solutions for composite optima. Cladistics 15:415–428.
G OULD , S . J. 1997. Cope’s rule as psychological artefact.
Nature 385:199–200.
HARVEY, P. H., A. J. L. B ROWN, J. M. S MITH, AND S. NEE.
1996. New uses for new phylogenies. Oxford University Press, Oxford.
HILLIS , D. M. 1995. Approaches for assessing phylogenetic accuracy. Syst. Biol. 44:3–16.
HILLIS , D. M. 1996. Inferring complex phylogenies.
Nature 383:130–131.
HILLIS , D. M., J. P. HUELSENBECK , AND C. W. CUNNING HAM . 1994. Application and accuracy of molecular
phylogenies. Science 264:671–264.
M ADDIS ON, D. R. 1991. The discovery and importance
of multiple islands of most parsimonious trees. Syst.
Zool. 40:315–328.
NIXON, K. C. 1999. The Parsimony Ratchet, a new
method for rapid parsimony analysis. Cladistics
15:407–414.
VOL. 50
P URVIS , A., AND D. L. J. Q UICKE. 1997. Building phylogenies: Are the big easy? Trends Ecol. Evol. 12:
49–50.
Q UICKE, D. L. J. 1993. Principles and techniques of
contemporary taxonomy. Chapman and Hall, St
Edmunds, Suffolk, U.K.
Q UICKE, D. L. J., M. G. FITTON, D. G. NOTTON, G. R.
B ROAD, AND K. D OLPHIN. 2000. Phylogeny of the subfamilies of Ichneumonidae (Hymenoptera): A simultaneous molecular and morphological analysis. Pages
74–83 in Hymenoptera: evolution, biodiversity and biological control (A. D. Austin and M. Dowton, eds.).
CSIRO, Canberra.
S IDOW, A. 1994. Parsimony or statistics? Nature 367:
26–27.
S TEWART , C.-B. 1993. The powers and pitfalls of parsimony. Nature 361:603–607.
S WOFFORD, D. L., G. J. O LSEN, P. J. WADDELL, AND D. M.
HILLIS . 1996. Phylogenetic inference. Pages 407–514.
in Molecular systematics (D. M. Hillis, C. Moritz, and
B. K. Mable, eds.). Sinauer Associates, Sunderland,
Massachusetts.
Received 9 June 1999; accepted 30 December 1999
Associate Editor: R. Olmstead