Syst. Biol. 61(2):228–239, 2012
© The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
For Permissions, please email: [email protected]
DOI:10.1093/sysbio/syr097
Advance Access publication on November 10, 2011
Characterizing the Phylogenetic Tree-Search Problem
DANIEL MONEY AND SIMON WHELAN∗
Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK
∗Correspondence to be sent to: University of Manchester, Faculty of Life Sciences, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK;
E-mail: [email protected].
Received 16 October 2010; reviews returned 14 January 2011; accepted 5 July 2011
Associate Editor: Mark Holder
Abstract.—Phylogenetic trees are important in many areas of biological research, ranging from systematic studies to the
methods used for genome annotation. Finding the best scoring tree under any optimality criterion is an NP-hard problem,
which necessitates the use of heuristics for tree-search. Although tree-search plays a major role in obtaining a tree estimate,
there remains a limited understanding of its characteristics and how the elements of the statistical inferential procedure
interact with the algorithms used. This study begins to answer some of these questions through a detailed examination
of maximum likelihood tree-search on a wide range of real genome-scale data sets. We examine all 10,395 trees for each
of the 106 genes of an eight-taxa yeast phylogenomic data set, then apply different tree-search algorithms to investigate
their performance. We extend our findings by examining two larger genome-scale data sets and a large disparate data set
that has been previously used to benchmark the performance of tree-search programs. We identify several broad trends
occurring during tree-search that provide an insight into the performance of heuristics and may, in the future, aid their
development. These trends include a tendency for the true maximum likelihood (best) tree to also be the shortest tree
in terms of branch lengths, a weak tendency for tree-search to recover the best tree, and a tendency for tree-search to
encounter fewer local optima in genes that have a high information content. When examining current heuristics for tree-search, we find that nearest-neighbor-interchange performs poorly, and frequently finds trees that are significantly different
from the best tree. In contrast, subtree-pruning-and-regrafting tends to perform well, nearly always finding trees that are
not significantly different to the best tree. Finally, we demonstrate that the precise implementation of a tree-search strategy,
including when and where parameters are optimized, can change the character of tree-search, and that good strategies for
tree-search may combine existing tree-search programs. [Algorithms; heuristics; maximum likelihood; NNI; phylogenetics;
SPR; tree-search.]
Phylogenetic tree estimation is a critical part of many
biological studies and has been used to resolve evolutionary relationships (e.g., Aguinaldo et al. 1997;
Delsuc et al. 2005; Nikolaev et al. 2007), to help understand and fight disease (e.g., Bush et al. 1999; Hahn
et al. 2000), and even as evidence in court (Metzker
et al. 2002). Phylogenetic studies frequently use some
form of optimality criterion to assess how well specific
tree topologies describe the observed sequence data.
Optimality methods typically work by finding the best
scoring tree for a sequence alignment, which is taken
to be the best estimate of the evolutionary relationships
between a set of sequences. Confidence in that tree
estimate is then assessed, typically using statistical procedures such as bootstrapping (see Whelan et al. 2001;
Delsuc et al. 2005). One of the most popular optimality
criterion methods is maximum likelihood (ML), which
calculates the likelihood of the observed sequence data
conditional on a specific tree topology and a substitution
model of how sequences change over time (Felsenstein
2003). Much research effort has been devoted to the development of more realistic substitution models, with
the expectation that they will improve the accuracy of
phylogenetic tree inference (Delsuc et al. 2005). Much
less attention has been given to the methods used to estimate the phylogenetic tree. Most practical applications
require a tree-search strategy to try and find the ML tree
estimate from the overwhelming number of possible
tree topologies, and the rearrangements used to define
the relationships between them (referred to hereafter
as tree-space; see Felsenstein 2003). Different computer
programs use different types of tree-search heuristics,
but they are usually based on the idea of hill-climbing
using a rearrangement algorithm to define neighboring
trees (e.g., Whelan 2007; Whelan 2008), although there
have been attempts to use genetic algorithms (Lewis
1998; Zwickl 2006) and simulated annealing (Stamatakis
2005).
It is established that hill-climbing can produce many
different optimal trees depending where the algorithm
starts, and that the number of optima may vary from
data set to data set (e.g., Salter 2001; Morrison 2007).
Only the optimal tree with the highest likelihood, the
global optimum, has the appealing properties of the
ML estimator. Unfortunately, tree-search is NP-hard,
which means that once an optimum is located there is
no way of knowing whether it is the global optimum or
a lower scoring local optimum (Chor and Tuller 2005).
Some insight into the difficulty of recovering the optimal tree can be obtained by rerunning an analysis from
different starting points and seeing how frequently the
tree-search algorithm identifies other optima (Vinh and
von Haeseler 2004). In alignments where there are large
numbers of different optima, the tree-search problem is
difficult and the heuristic will be highly dependent on
its starting location, indicating that one should treat any
individual estimate with caution. In contrast, if rerunning tree-search from different starting points recovers
few optima, one may conclude that there is a better
chance of recovering the optimal tree. Note that this approach is very different from bootstrapping; the reruns
of tree-search use exactly the same data and the reason
the algorithm returns a different tree is a consequence
of the structure of tree-space.
The efficacy of different rearrangement operations
in hill-climbing is relatively poorly characterized (although see Morrison 2007) and, although more expansive rearrangements are expected to reduce the
number of optima at greater computational cost, our
understanding of how different rearrangements affect
tree-search is limited (although see Whelan and Money
2010). Furthermore, we know tree-search performance
varies between data sets, but it is not clear what properties of alignments affect the difficulty of tree-search and
whether we can identify easier tree-search problems
a priori. There is also a complex and largely uncharacterized interplay between the tree-search problem,
different models of substitution, and different implementations of tree-search. Improving our understanding of the factors affecting the difficulty of tree-search is
an important step toward developing improved heuristics that perform well on a wide variety of data types.
Although tree-search is NP-hard and it is impossible to
guarantee good performance, experience with other NP-hard problems, such as the travelling salesman problem,
shows that more effective heuristics lead to good results
in the majority of cases.
In this study, we investigate the tree-search problem
for a wide range of real amino acid and nucleotide
alignments under a variety of popular substitution
models and computer programs. Our goal is twofold:
to learn about the factors that affect the topography of
tree-space, and to provide pragmatic suggestions that
will aid phylogenetic inference with existing methods.
To investigate the topography of tree-space, we examine
how tree-space varies between alignments, substitution
models, and heuristic tree-search algorithms. We investigate how the difficulty of tree-search differs between
alignments and use correlation analyses to identify predictors for the difficulty of tree-search. We also examine
whether optima share any properties, such as their relative size or their location in tree-space. In our study
of tree-search heuristics currently used in programs,
we examine whether some approaches clearly outperform others, judged by statistically comparing the tree
they find with the globally optimal tree (or the best tree
found during extensive tree-search). We also propose
a range of quick-to-compute statistics that may be predictive of how difficult the tree-search will be for any
given alignment. We address these questions by first
examining a small data set that is amenable to exhaustive tree-search, identifying important trends associated
with tree-search. We then use a sampling approach
to extend our results to heuristic tree-search on much
larger data sets.
MATERIALS AND METHODS
Data Sets
We examine three high-quality phylogenomic data
sets consisting of 8-, 20- and 40-taxa that have been
handcrafted and used extensively in other studies, and a
disparate set of alignments taken from TreeBase (Morell
1996) that have previously been used to benchmark tree-search programs (Guindon et al. 2010). Phylogenomic
data sets consist of a series of genes taken from the same
set of taxa, leading us to expect a single tree relating the
taxa, and enabling us to compare results between genes
to highlight similarities and differences in tree-space
caused by alignment properties. The eight-taxa yeast
data set, taken from Rokas et al. (2003), consists of ungapped nucleotide sequence alignments for 106 genes.
The 20- and 40-taxa phylogenomic data sets are subsets
of the data set used by Philippe et al. (2005), whose data
consist of gapped amino acid sequence alignments for
146 genes, each having sequences from between 25 and
49 taxa. To create the 40-taxa data set, we select the 40
taxa that occur most frequently across the 146 genes, retaining all genes that contain sequences for the selected
taxa. From these genes, any that contains one or more
sequences with >10% unknown characters or gaps is
removed, leaving a total of 20 genes in the 40-taxa phylogenomic data set. The 20-taxa data set is produced in
a similar manner, with the additional rule that genes in
the 40-taxa data set are excluded. One further gene
is removed because it causes irrecoverable errors in
phyml, resulting in a total of 52 genes in the 20-taxa
phylogenomic data set.
The benchmark data set consists of the medium nucleotide and amino acid sequence alignments used
to investigate the performance of phyml in Guindon
et al. (2010), with some minor modifications. To allow straightforward comparison between trees, we remove redundant sequences from each alignment using
RAxML. From these reduced alignments, we discard
all with 10 or fewer sequences or with more than 100 sequences, to
ensure we investigate an appropriate range of data
while maintaining computational tractability. The refined benchmark data set used in our study contains 41
nucleotide alignments and 40 amino acid alignments.
Exhaustive Tree-Search in the Eight-Taxa Phylogenomic
Data Set
There are 10,395 bifurcating trees describing all possible evolutionary relationships of the yeast species in
the eight-taxa data set, making it amenable to exhaustive tree-search and consequently a complete analysis of
the topography of tree-space. For this study, we examine only the extremes of nucleotide model complexity,
the Jukes and Cantor model (JC) and the general time
reversible model with Γ-distributed rates-across-sites
(GTR+Γ) (for details about these models see Felsenstein
2003). This choice of models is limited but more detailed
analyses show that although model choice does affect
the topography of tree-space, it does so in an unpredictable manner and does not substantially affect the
difficulty of the tree-search problem (see Supplementary Table S1 and associated text for full details, available from http://www.sysbio.oxfordjournals.org/).
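The figure of 10,395 follows from the standard count of unrooted bifurcating topologies on n labelled taxa, (2n − 5)!!. The short Python sketch below is purely illustrative (the function name is ours and is not part of any program used in this study); it reproduces the eight-taxa count and shows how quickly tree-space grows.

```python
def num_unrooted_bifurcating_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # The k-th taxon (k = 4..n) can be attached to any of the
    # 2k - 5 branches of the tree built from the first k - 1 taxa.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count


for n in (8, 20, 40):
    print(n, num_unrooted_bifurcating_trees(n))
```

For eight taxa the function returns 10,395; for 20 and 40 taxa the counts are of the order of 10^20 and 10^55, which is why exhaustive search is restricted to the eight-taxa data set.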
A modified version of standard steepest ascent hill-climbing algorithms (Whelan 2008) is used to identify
the optimum found when starting from every tree.
These algorithms function by iteratively improving the
current tree topology using the following scheme: (i)
assign a start tree to the current tree object, (ii) use a
rearrangement operation to define the neighborhood of
the current tree, (iii) calculate likelihoods for the trees
in the neighborhood and assign the highest scoring as
the new current tree, and (iv) if no improvement in
likelihood occurs, then tree-search reaches an optimum
and stops, otherwise go to (ii). We investigate tree-search using nearest-neighbor-interchange (NNI) and
subtree-pruning-and-regrafting (SPR) rearrangement
operations (for details of operations see Whelan 2008).
Our modification to hill-climbing affects only NNI tree-search, enabling the algorithm to escape from
multifurcations where it would otherwise get stuck
(see Whelan and Money 2010 for details). Whenever a
multifurcation is identified during an NNI tree-search,
we identify the subset of connected bifurcating trees it
contains. The neighborhood used for the next round of
hill-climbing is then defined by performing NNI rearrangements to this subset of trees.
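Steps (i) to (iv) of the generic scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: `neighbours` and `log_likelihood` are hypothetical caller-supplied functions standing in for a rearrangement operation (NNI or SPR) and a likelihood calculation, and the multifurcation-escape modification described above is not included.

```python
def steepest_ascent(start, neighbours, log_likelihood):
    """Generic steepest ascent hill-climbing over an arbitrary search space.

    neighbours(tree) returns the trees one rearrangement away (hypothetical);
    log_likelihood(tree) returns the score to be maximised (hypothetical).
    """
    current, current_score = start, log_likelihood(start)    # step (i)
    while True:
        best, best_score = None, current_score
        for candidate in neighbours(current):                 # step (ii)
            score = log_likelihood(candidate)                 # step (iii)
            if score > best_score:
                best, best_score = candidate, score
        if best is None:                                      # step (iv): optimum reached
            return current, current_score
        current, current_score = best, best_score


# Toy search space: positions on a line with two peaks (indices 1 and 3).
scores = [1, 3, 2, 5, 4]
line_neighbours = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
print(steepest_ascent(0, line_neighbours, lambda i: scores[i]))
print(steepest_ascent(2, line_neighbours, lambda i: scores[i]))
```

In the toy example the search started at position 0 stops at the lower peak, whereas the search started at position 2 reaches the higher one, which is the start-point dependence examined throughout this study.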
Number and size of optima.—Throughout this study we
assume that the number of different optima that can
be reached during tree-search is a suitable proxy for
the difficulty of the tree-search problem. The number
of optima for a gene is defined as the number of optima identified during tree-search under a specific rearrangement strategy from all possible starting points
in tree-space. The size of an optimum is defined as
the number of start trees that reach that specific optimum when performing tree-search, which may be
reflective of how frequently an optimum may be encountered if start trees were sampled at random. Therefore, large optima will be reached from many start trees,
whereas small optima will be reached from relatively
few start trees. In a small number of cases, an optimum
is a multifurcating tree, with two or more neighboring
bifurcating trees with approximately the same likelihood (tolerance 10⁻⁵). These clusters of bifurcating
trees are grouped together as a single multifurcating
optimum.
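Given the optimum reached from every start tree, the number and size of optima are simple tallies. The sketch below assumes a hypothetical mapping `optimum_reached` from start tree to optimum (for example, built by running hill-climbing from each start tree); it also computes size relative to the expectation if all optima were of equal size, the quantity plotted in Figure 1. All names are illustrative.

```python
from collections import Counter


def optimum_sizes(optimum_reached):
    """Size of each optimum: the number of start trees whose search ends there.

    optimum_reached is a hypothetical dict {start_tree: optimum}.
    """
    return Counter(optimum_reached.values())


def relative_sizes(sizes):
    """Size of each optimum divided by the size expected if all optima were equal."""
    expected = sum(sizes.values()) / len(sizes)  # total start trees / number of optima
    return {optimum: size / expected for optimum, size in sizes.items()}


# Toy example with four start trees and two optima.
reached = {"t1": "A", "t2": "A", "t3": "A", "t4": "B"}
sizes = optimum_sizes(reached)
print(len(sizes), dict(sizes), relative_sizes(sizes))
```

Grouping neighboring bifurcating trees into a single multifurcating optimum would have to be applied to the mapping before tallying; that step is omitted here.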
Statistical comparison of optima.—Different runs of tree-search may yield different point estimates of the tree,
but it is not known whether these tend to be significantly different to one another or significantly different
to the global optimum. We assess whether each local optimum is significantly different to the global optimum
using the SH test (Shimodaira and Hasegawa 1999), implemented in PAML (Yang 1997). For brevity, hereafter
we denote the “95% confidence interval of the globally
optimal tree” to be the set of trees that cannot be rejected by the SH-test with 95% confidence as being different from the globally optimal tree, although we note
the atypical usage of this phrase.
Optima clustering.—Complete knowledge of tree-space
allows us to perform the NP-hard calculation of how
many rearrangement operations are required to transform tree A into tree B. We use this metric to investigate
whether optima cluster together more than expected by
chance under a particular rearrangement scheme. We
compute the mean NNI distance between our n identified optima and assess the significance of any clustering observed using a bootstrap approach. We take
1000 draws from the null distribution of no significant
clustering by sampling n randomly chosen trees, with
the condition that none is a neighbor to any other, and
computing their mean NNI distance. For genes where
there are multifurcating optima, we use the minimum
distance between any two bifurcating trees within that
optimum and use an equal-sized multifurcation during
our sampling procedure.
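The resampling test can be sketched as follows. The code is a simplified illustration under several assumptions: `distance` is a hypothetical caller-supplied function returning the number of rearrangements between two trees, the P value is taken to be the fraction of null draws whose mean distance is no greater than the observed mean, and the special handling of multifurcating optima is omitted.

```python
import random
from itertools import combinations


def mean_pairwise_distance(trees, distance):
    """Mean of distance(a, b) over all unordered pairs of trees."""
    pairs = list(combinations(trees, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def clustering_p_value(optima, all_trees, distance, draws=1000, rng=None):
    """Fraction of null draws that are at least as tightly clustered as the optima.

    all_trees must be a sequence (e.g., a list) so that it can be sampled.
    """
    rng = rng or random.Random()
    observed = mean_pairwise_distance(optima, distance)
    n = len(optima)
    hits = 0
    for _ in range(draws):
        while True:
            sample = rng.sample(all_trees, n)
            # Reject draws in which any two trees are neighbours (distance 1),
            # because two distinct optima cannot be adjacent.
            if all(distance(a, b) > 1 for a, b in combinations(sample, 2)):
                break
        if mean_pairwise_distance(sample, distance) <= observed:
            hits += 1
    return hits / draws


# Toy space: integers on a line, with distance equal to the absolute difference.
p = clustering_p_value([10, 12, 14], list(range(100)),
                       lambda a, b: abs(a - b), draws=200, rng=random.Random(1))
print(p)
```

Rejection sampling is used for the non-neighbor condition because it keeps the null draws uniform over admissible sets of trees; it may be slow when n is large relative to the size of tree-space.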
Correlating the number of optima with gene and tree
properties.—The relative impact of tree and gene properties on the difficulty of tree-search is assessed by
calculating the Spearman correlation coefficients between specific properties and the number of optima
identified by tree-search. We present correlations for
three major properties: (i) tree length, defined as the
sum of all branches of the globally optimal tree; (ii)
alignment length, defined as the length of the gapped
sequence alignment associated with a gene; and (iii)
the difference in likelihood between the fully resolved
globally optimal tree and the unresolved star-tree (the
tree with no resolved internal branches). The statistic in
(iii), which we label Δ ln L̂, is intended to represent the
information content in the sequence data for resolving
the tree, where the star-tree, which has a single multifurcating internal node, represents a model with no
topological information and the globally optimal tree
represents the fully resolved model. The Δ ln L̂ statistic
is therefore proportional to the difference in Akaike’s
information criterion value (Akaike 1974), although it
does not address the problem that some branches are
easier to resolve than others. Note that we examined
other gene and tree properties, but none were found to
have correlations as strong as those we present. We also
investigate these correlations using the neighbor-joining
tree as a proxy for the global optimum, implemented
through the PAL library (Drummond and Strimmer
2001).
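For reference, the Spearman coefficient is the Pearson correlation of the ranked values. The self-contained sketch below uses only the Python standard library; the function names are ours and the example values are invented for illustration, not taken from our data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: information content per gene and number of optima per gene.
delta_lnl = [120.0, 85.5, 40.2, 15.0]
n_optima = [1, 2, 9, 30]
print(spearman(delta_lnl, n_optima))
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship between a property and the number of optima, not just a linear one.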
Parameter distributions.—Many previous studies assume
that parameter estimates in the substitution model are
stable as long as a “good” tree is used (e.g., Whelan
and Goldman 2001). We investigate the estimates of
tree length and the α parameter of the Γ-distributed
rates-across-sites model by examining their distribution across the 10,395 bifurcating trees relating 8 taxa
for all 106 genes. For each gene, we calculate the mean
rank of the global optimum and the skewness of the
distribution.
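The two summaries can be computed as in the sketch below. It assumes that ranks run from the highest estimate (rank 1) to the lowest, as in Table 2, and uses the moment-based skewness m3 / m2^1.5; the exact skewness estimator is an assumption, and the function names and example values are illustrative only.

```python
def rank_from_highest(values, target_index):
    """1-based rank of values[target_index], where rank 1 is the largest value."""
    target = values[target_index]
    return 1 + sum(1 for v in values if v > target)


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented tree-length estimates for five topologies; index 0 is the optimal tree.
tree_lengths = [1.10, 1.45, 1.50, 1.48, 1.52]
print(rank_from_highest(tree_lengths, 0), skewness(tree_lengths))
```

In this invented example the optimal tree has the shortest length, so it receives the lowest rank (5 of 5), and the long left tail it creates gives a negative skew.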
Heuristic Tree-Search Using Sampling for the 20- and
40-taxa Phylogenomic and Benchmarking Data Sets
The size of tree-space for these data sets makes exhaustive tree-search computationally unfeasible, with
the 20- and 40-taxa sets having on the order of 10²⁰ and
10⁵⁵ possible topologies, respectively. For such large
data sets, there are two computational limitations. First,
a fully defined steepest ascent algorithm is computationally too slow, so instead we use two popular and
fast phylogenetic tree-search programs: phyml v3.0.1
(NNI and SPR: Guindon and Gascuel 2003) and RAxML
v7.04 (SPR: Stamatakis 2006). Second, the total number
of trees is too large to allow exhaustive examination
of tree-space, so instead we use a sampling approach.
For each data set under each condition examined, we
randomly sample 1000 trees (uniform probability for
each bifurcating tree topology) and use these trees as
the starting point to perform independent runs of tree-search to identify optima. For the benchmarking data
set, we use only 100 random samples due to computational limitations of examining alignments with many
sequences. In all cases, the ML scores of optima are recalculated using PAML to ensure an unbiased comparison
between programs.
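One way to draw a bifurcating topology with uniform probability is random stepwise addition: start from the three-taxon star and attach each further taxon to a branch chosen uniformly at random. For a fixed taxon order, every topology corresponds to exactly one sequence of branch choices, so all topologies are equally likely. The sketch below illustrates this approach; it is an assumption about how such sampling can be done, not a description of the procedure used in this study, and the edge-list representation is ours.

```python
import random


def random_unrooted_topology(taxa, rng=None):
    """Uniformly sample an unrooted bifurcating topology as a list of edges.

    Leaves are the taxon names (strings); internal nodes are integers.
    """
    rng = rng or random.Random()
    taxa = list(taxa)
    if len(taxa) < 3:
        raise ValueError("need at least three taxa")
    # Three-taxon star: internal node 0 joined to the first three taxa.
    edges = [(0, taxa[0]), (0, taxa[1]), (0, taxa[2])]
    next_internal = 1
    for taxon in taxa[3:]:
        # Pick a branch uniformly, split it with a new internal node,
        # and hang the new taxon off that node.
        u, v = edges.pop(rng.randrange(len(edges)))
        new = next_internal
        next_internal += 1
        edges += [(u, new), (new, v), (new, taxon)]
    return edges


start_tree = random_unrooted_topology([f"t{i}" for i in range(8)], random.Random(0))
print(len(start_tree), start_tree)
```

A tree on n taxa produced this way always has 2n − 3 branches; branch lengths are not generated and would be optimized by the tree-search program.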
For the phylogenomic data sets, each alignment is
examined under extremes of model complexity, using EQU (the equiprobable model; Bishop and Friday
1987) and WAG+F+Γ (Yang 1994; Whelan and Goldman
2001). For the benchmarking data set, we examine only
a complex model, GTR+Γ for nucleotide alignments and
WAG+F+Γ for amino acid alignments. In RAxML, EQU
is invoked by specifying a user-defined substitution matrix and homogeneous rates-across-sites is enforced by
setting the number of rate categories to one.
Number and size of optima.—Overall numbers of optima
cannot be calculated using a sampling approach, but the
relative numbers of optima discovered by phyml and
RAxML from the randomly sampled start trees should
be indicative of the difficulty of tree-search. For the phylogenomic data sets, the relative sizes of the sampled optima are calculated as the number of start trees that lead
to them. Note that when investigating the relative size
of optima, we exclude genes with >200 optima to ensure that our estimates of optima size are accurate. We
also compare the optima identified by sampling with
those located using out-of-the-box (OOB) SPR settings
for both RAxML and phyml.
Statistical comparison of optima.—The best tree identified
during tree-search is taken as a proxy for the global
optimum, and we compare the 95% confidence interval
of this best tree with the other optima we identify using the SH test implemented in PAML (Yang 1997) and
Consel (Shimodaira and Hasegawa 2001). We also assess
whether the optima found using OOB SPR search settings for each program fall within the 95% confidence
interval of the best optimum found from the sampled
start trees.
Optima clustering.—It is NP-hard to calculate NNI or
SPR distances between trees (SPR: DasGupta et al. 1999;
NNI: DasGupta et al. 2000), so for our larger data sets we
use the Robinson–Foulds (RF) distance metric (Robinson and Foulds 1981). The average RF distance between
the n sampled optima is calculated and a bootstrap
procedure used to assign P values of clustering by comparing the observed distance to the distribution of average
RF-distances between n randomly sampled trees.
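For readers unfamiliar with the RF metric, the Python sketch below computes it for two unrooted topologies on the same taxa by collecting the non-trivial bipartitions (splits) of each tree and counting those present in only one of them. The nested-tuple tree encoding and the function names are assumptions made for illustration and are not taken from any program used in this study.

```python
def leaf_set(tree):
    """All taxon names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions, each stored as the side lacking a reference taxon."""
    taxa = leaf_set(tree)
    reference = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if reference not in clade else taxa - clade
        # Ignore trivial splits that separate zero or one taxon from the rest.
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return clade

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
print(robinson_foulds(t1, t1), robinson_foulds(t1, t2))
```

Unlike NNI and SPR distances, this quantity requires only set operations on the splits, which is why it remains practical for the larger data sets.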
Correlating the number of optima with gene and tree
properties.—To assess the difficulty of tree-search across
alignments, it is necessary to assume that each alignment has the same evolutionary history. For the purpose of this study, we ignore minor variations in gene
tree topology that can result from, for example, incomplete
lineage sorting or model misspecification. This assumption is only true for our phylogenomic data sets, so
the benchmarking data are excluded from these analyses. Spearman correlation coefficients are calculated between the observed number of optima and gene length,
tree length, and likelihood difference between the star-tree and the best optimum identified (Δ ln L̂).
Parameter distributions.—For each alignment examined,
the distributions of tree length and the rates-across-sites
parameter, α, across trees are approximated by taking estimates from our random start trees, for both
the simple and complex model. The parameter estimates obtained for the best tree found during heuristic
tree-search are then compared with this distribution of
randomly selected trees.
RESULTS
Exhaustive Analysis of Eight-taxa Yeast Phylogenomic
Data Set
We characterize the tree-search problem for the 106
genes that comprise the phylogenomic data from Rokas
et al. (2003). We only present results from NNI tree-search because very few genes produce multiple optima
under SPR rearrangements; under JC (GTR+Γ), we find
that 101/106 (105/106) genes have only a single optimum using SPR.
The global optimum is frequently the largest optimum.—
Figure 1 compares the ordered rank of identified optima
with the average size of optima relative to that expected
if all optima were of equal size. There is a strong tendency for the global optimum to be larger than other
lower-scoring optima under both models: in other words,
the highest scoring optimum has more start trees that
reach it during tree-search than expected by chance. The
global optimum is on average 2.37 times larger than
expected under JC (solid black line), and 2.16 times
larger than expected under GTR+Γ (gray dashed line).
Similarly, the second best optimum found is on average 1.03 (1.07) times the size expected by chance under
JC (GTR+Γ), whereas the remaining optima are usually
smaller than expected. These averages are confirmed
when examining individual genes: under JC, the global
optimum is the largest in 95/106 genes, whereas it is the
largest under GTR+Γ for 90/106 genes. The genes where
this does not hold tend to have large numbers of optima
and in all of these cases the global optimum is one of the
four largest optima. Note that the spike in the number
of optima between rank 15 and 20 under GTR+Γ is from
a single gene in our analysis that had many optima.

FIGURE 1. The number of start trees leading to an optimum relative to that which would be expected if the number were uniformly
distributed (total number of trees divided by the number of optima
for an individual gene) for the eight-taxa yeast phylogenomic data set.
Log-likelihood scores are used to rank optima. The solid black line is
under JC, and the dotted gray line is under GTR+Γ.

Statistical comparison of optima.—Under GTR+Γ, we find
that the number of trees that lie within the 95% confidence interval of the global optimum varies widely between alignments, with 41–10,393 trees falling within the
confidence interval (average 1405). For 101/106 genes
examined, the trees in the confidence interval are fully
connected by NNI, in other words they form one group
in tree-space because a series of NNI rearrangements exists between any pair of trees in the confidence interval
such that no intermediate steps contain a tree outside
the confidence interval. For the remaining five genes,
the trees in the confidence interval form two distinct
groups separated by trees outside the confidence interval. Of the 91 genes with multiple optima under GTR+Γ,
we find that 28 have local optima that the SH test finds
significantly different from the global optimum. When averaged across all 91 genes, we find 14.6% of the local
optima found are significantly different to the global optimum. The results for JC are broadly similar.
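Whether the trees in a confidence set are fully connected by NNI can be checked by counting the connected groups of the set, stepping only between trees that are themselves in the set. The sketch below is illustrative: `neighbours` is a hypothetical caller-supplied function returning the trees one rearrangement away, and the toy example uses integers in place of trees.

```python
from collections import deque


def count_groups(tree_set, neighbours):
    """Number of connected groups formed by tree_set under the neighbours relation."""
    remaining = set(tree_set)
    groups = 0
    while remaining:
        groups += 1
        queue = deque([remaining.pop()])
        # Breadth-first search that never leaves the confidence set.
        while queue:
            tree = queue.popleft()
            for other in neighbours(tree):
                if other in remaining:
                    remaining.remove(other)
                    queue.append(other)
    return groups


# Toy example: integers on a line, neighbours differ by one.
step = lambda x: [x - 1, x + 1]
print(count_groups({1, 2, 3}, step), count_groups({1, 2, 3, 7, 8}, step))
```

A result of one corresponds to a fully connected confidence set; a result of two corresponds to the situation observed for the remaining five genes.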
Clustering of optima.—In Figure 2a, we show a representation of tree-space under GTR+Γ for the genes YBR198C
(black) and YLR389C (gray). Tree-search on YLR389C is
easy; there is a single optimum, marked with a gray X,
and as trees increase their NNI distance from this optimum they have progressively lower likelihoods. For
YBR198C, tree-search is more complex. The likelihood
decreases as the NNI distance from the global optimum
increases, but the slope is less steep. Furthermore, there
are 16 local optima (each labeled with a black X) clustered between three and eight NNI operations from the
global optimum and these tend to have higher than average likelihoods.

FIGURE 2. a) Average log-likelihood differences between resolved
tree topologies and the star tree (Δ ln L̂) under GTR+Γ plotted against
their distance from the globally optimal tree. The figure summarizes
information from the 10,395 trees linking the eight-taxa yeast phylogenomic data set in the YBR198C (black) and YLR389C (gray) genes,
with solid lines showing the average log-likelihood and dotted lines
describing the 95% interval in the distribution. Crosses in the figure
show the location of optima; note the two crosses at distance zero are
the global optima. The histogram (top) shows the relative fraction of
trees at each distance from the global optimum, which is dependent
on the shape of the globally optimal tree. In this case, both genes have
the same optimal tree. b) Correlations between Δ ln L̂ and the number
of optima per gene under JC (crosses) and GTR+Γ (squares). Best fit
lines are shown for JC (solid) and for GTR+Γ (dashed).

The clustering of optima observed in YBR198C also
occurs in other genes, but their location is unpredictable.
We examined Δ ln L̂ and α, and found that neither of these
factors predicted which start trees are attracted to a specific optimum. We examine the genes with multiple
optima in more detail. Our bootstrap analyses show
that for the overwhelming majority of genes, the mean
NNI distance between optima is less than expected
by chance, and under JC (GTR+Γ) we find 49/92 (50/91)
genes display significant clustering (P < 0.05; Bonferroni correction). There is no clear interpretation of
this clustering because we cannot differentiate between
potential causes. One cause would be one or more
highly supported branches in the data, which means
that all optimal trees contain this branch. Alternative
causes could be a clustering of quite dissimilar optimal
topologies that share few (if any) features, systematic
exclusion of particular branches, or some potential bias
in tree-shape.

TABLE 1. Correlations between gene properties and the number of optima discovered during tree-search

Data set               Model       Δ ln L̂^h           Gene length        Tree length^i
Yeast^a                JC^e        −0.50* (−0.52*)    −0.40*             0.29* (0.29*)
Yeast^a                GTR+Γ^f     −0.51* (−0.52*)    −0.42*             0.34* (0.34*)
Small metazoa NNI^b    EQU^g       −0.76*             −0.70*             −0.20
Small metazoa NNI^b    WAG+F+Γ     −0.80*             −0.70*             −0.06
Small metazoa SPR^c    EQU         −0.46* (−0.47*)    −0.35* (−0.39*)    −0.24 (−0.04)
Small metazoa SPR^c    WAG+F+Γ     −0.59* (−0.47*)    −0.53* (−0.40*)    0.00 (0.06)
Large metazoa^d        EQU         −0.87* (−0.74*)    −0.80* (−0.74*)    −0.38 (−0.19)
Large metazoa^d        WAG+F+Γ     −0.59* (−0.26)     −0.52* (−0.15)     −0.18 (−0.13)

* Significant correlation.
a Results from eight-taxa yeast species examined using NNI. Results using the neighbor-joining tree instead of the optimal tree are shown in parentheses.
b Sampled results from 20-taxa metazoan data examined using NNI.
c Sampled results from 20-taxa metazoan data examined using SPR using RAxML. Results from phyml are shown in parentheses.
d Sampled results from 40-taxa metazoan data examined using SPR using RAxML. Results from phyml are shown in parentheses.
e Jukes and Cantor model.
f General time reversible model with Γ-distributed rates-across-sites.
g Equiprobable model.
h The log-likelihood difference between the globally optimal tree and the star topology.
i The sum of all branch lengths of the globally optimal tree.

Correlation between number of optima and data properties.—
Figure 2a demonstrates another trend that holds between gene comparisons: the number of optima in a
gene is correlated with the value of Δ ln L̂ at the global
optimum. Figure 2b plots the number of optima compared with the value of Δ ln L̂ across all genes under JC
and GTR+Γ, and we find there are significant, but imperfect, correlations between these variables (Table 1).
These correlations also occur when using the difference
in likelihood between the star-tree and the neighbor-joining tree (JC: r = −0.52; GTR+Γ: r = −0.52), suggesting this form of statistic may be predictive even if the
true global optimum is not known. Table 1 also shows
that the number of optima in a gene is significantly
correlated with gene length and tree length (sum
of branches of the globally optimal tree), although these correlations are weaker
than those between the number of optima and Δ ln L̂.
Parameter distributions.—The distributions of parameter
estimates across tree-space are summarized in Table 2.
Globally optimal trees tend to have relatively high estimates
of α from Γ-distributed rates-across-sites and low
estimates of tree length. Table 2 also shows the skew
of the parameter distributions. There is positive skew
for α, with the majority of trees having low parameter
estimates, in contrast to the high estimate in the globally
optimal tree. There is a negative skew for tree length,
with the majority of trees having longer estimates, in
contrast to the shorter estimate for the globally optimal
tree.
Heuristic Tree-Search in 20- and 40-taxa Metazoan Data Sets
We use a sampling approach to examine whether the
trends observed in smaller data sets under exhaustive
tree-search extend to larger data sets. Furthermore, we
investigate the effect on tree-search of different implementations of tree-search strategies. Results presented
are for WAG+F+Γ, and those for EQU follow a similar
pattern unless described otherwise.
Number of optima.—We find large differences in the number of optima observed both between models and between genes for both data sets. For the 20-taxa data set,
1000 samples of NNI tree-search using phyml reveal an
average number of optima of 196 (range 1–753), whereas
NNI tree-search using phyml on the 40-taxa data set results in very large numbers of optima for most genes.
For several genes, different optima are found from each
of the 1000 start trees examined, and for 16/20 genes
more than 900 different optima are found. These large
numbers preclude any detailed analysis of NNI tree-search on the 40-taxa data set.

TABLE 2. Median ranks of parameter estimates for the global optimum tree and the median skewness of parameter distributions
                       αa                    Lengthb
                       GTR+Γ                 JC                          GTR+Γ
Global optimum rank    2 (1–38)c             10,395 (10,385–10,395)c     10,394 (10,352–10,395)c
Skewness               1.48 (0.51–3.66)c     −1.08 (−1.39 to 0.44)c      −0.90 (−1.49 to 0.19)c
Notes: A total of 10,395 trees are examined. Ranks are assigned from highest parameter estimate to lowest parameter estimate for each gene. A
rank of 1 for α indicates that the globally optimal tree had the highest estimate of α across all possible tree topologies.
a Estimated variance parameter from Γ-distribution.
b Estimated tree length calculated as the sum of all branches.
c Range of values across all genes.

For SPR tree-search, we generally find that model
and implementation affect the number of optima
recovered. On average, the more sophisticated
WAG+F+Γ model finds fewer optima during tree-search
than EQU, whereas phyml consistently finds more optima during tree-search than RAxML. Applying phyml
to the 20-taxa data set, there are an average of 11 (range
1–90) optima under EQU and 8 (1–59) optima under
WAG+F+Γ , whereas for RAxML we find an average of
5 (1–32) optima under EQU and 4 (1–45) optima under WAG+F+Γ . Applying phyml to the 40-taxa data
set results in an average of 290 (60–594) optima under
EQU and an average of 203 (17–838) under WAG+F+Γ ,
whereas RAxML results in 112 (3–415) optima under
EQU and 94 (4–817) optima under WAG+F+Γ .
The optima obtained using sampling are compared
with those obtained with OOB settings. For these phylogenomic data sets, no OOB SPR tree-search provided
a better scoring optimum. Differences between OOB and
sampling are most pronounced for the 40-taxa data set,
where sampling provided a higher scoring tree in 10/20
(18/20) genes for RAxML (phyml).
Size of optima.—The number of times an optimum is
reached during tree-search from 1000 different start
trees is used to approximate the relative size of the
optima discovered. The best optimum located for NNI
tree-search is usually larger than other lower scoring
optima in all conditions examined, but it is difficult to
draw firm conclusions because the number of optima
is frequently very large. Moreover, the expected size of
an optimum is confounded with the number of optima
found, so that an increase in relative optima size may
not result in an increased probability of recovering that
optimum from a random start location in tree-space. The
number of optima for SPR tree-search is much lower,
allowing a more detailed comparison. Figure 3a shows
the relative size of the optima for the 20-taxa data set
using SPR tree-search using phyml and RAxML. We
find that using phyml the best optimum is on average
3.2 times larger than expected, whereas the average using RAxML is 2.4 times larger than expected. Figure
3b shows that SPR tree-search on the 40-taxa data set
follows a similar pattern. The best optimum recovered
using RAxML is 4.3 times larger than expected, whereas
for phyml the best optimum recovered is 8.4 times larger
than expected.
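The "relative size" used here is the number of start trees that reach an optimum divided by the number expected if start trees were spread uniformly over the optima found. A minimal Python sketch of that ratio follows; the optimum labels and counts are hypothetical, and `relative_optimum_sizes` is an illustrative name rather than a function from any of the programs discussed.

```python
from collections import Counter

def relative_optimum_sizes(endpoints):
    """Map each optimum to (observed hits) / (hits expected under uniformity).

    `endpoints` lists, for every start tree, the optimum that tree-search
    ended at. The uniform expectation is the number of start trees divided
    by the number of distinct optima located.
    """
    counts = Counter(endpoints)
    expected = len(endpoints) / len(counts)
    return {optimum: hits / expected for optimum, hits in counts.items()}

# Hypothetical outcome of 10 searches (NOT data from the paper).
endpoints = ["A", "A", "A", "A", "A", "A", "B", "B", "C", "D"]
sizes = relative_optimum_sizes(endpoints)
# Optimum "A" attracts 6 of 10 start trees against an expectation of 2.5.
```

By construction the relative sizes sum to the number of optima, which is why the statistic is confounded with the number of optima found, as noted above.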
Statistical comparison of optima.—We examine whether
the optimal trees found by different search methods and
from different starting points fall within the 95% confidence interval of the best tree found. For the 20-taxa
phylogenomic data set, we find that NNI frequently
finds trees that are significantly different from the best
tree found, with 45.8% of searches yielding trees significantly
worse than the best tree found. In contrast, under both
programs, tree-search using SPR always finds optima
within the 95% confidence interval of the best tree for
both the 20- and 40-taxa data sets. The optima discovered using OOB settings for both phyml and RAxML
also always fall within the 95% confidence interval of
the best optimum found from 1000 random start trees,
for both data sets and both models.
FIGURE 3. The number of start trees leading to an optimum relative to that which would be expected if the number were uniformly
distributed (total number of sampled trees divided by the number of
optima located for an individual gene) for (a) our 20-species metazoan
data set using SPR moves and (b) our 40-species metazoan data set
using SPR moves under WAG+F+Γ. Log-likelihood scores are used
to rank optima. Solid lines denote tree-search using phyml, whereas
dashed lines denote tree-search using RAxML.
Optima clustering.—We examine the mean RF distance
between optima for both the 20- and 40-taxa data sets
and compare it with that expected if the optimal trees
were randomly distributed through tree-space. We find
significant clustering for every gene examined, under
all implementations, and all tree-search methods (P<<
0.01; Bonferroni correction).
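The mean RF distance between optima can be sketched in a few lines of Python if each unrooted tree is represented by its set of nontrivial bipartitions. This is an illustration under stated assumptions: the three 5-taxon trees are hypothetical, each split is recorded as the side that excludes taxon E, and the comparison against randomly distributed trees (which the text does not describe in detail) is not implemented.

```python
from itertools import combinations

def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    return len(splits_a ^ splits_b)

def mean_pairwise_rf(trees):
    """Mean RF distance over all unordered pairs of trees."""
    pairs = list(combinations(trees, 2))
    return sum(rf_distance(a, b) for a, b in pairs) / len(pairs)

def split(*taxa):
    """One bipartition, stored as the side that does not contain taxon E."""
    return frozenset(taxa)

# Hypothetical unrooted trees on taxa A-E, two nontrivial splits each.
t1 = {split("A", "B"), split("A", "B", "C")}  # ((A,B),C,(D,E))
t2 = {split("A", "B"), split("A", "B", "D")}  # ((A,B),D,(C,E))
t3 = {split("A", "C"), split("A", "B", "C")}  # ((A,C),B,(D,E))

mean_rf = mean_pairwise_rf([t1, t2, t3])
```

Clustering would then be assessed by comparing `mean_rf` for the observed optima with the same statistic computed on randomly drawn topologies.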
Correlation between number of optima and data properties.—Table 1 shows the correlation between the number of optima found in a gene and (i) Δ ln L̂, (ii) gene
length, and (iii) tree length. In all cases, the strongest
correlations are found between Δ ln L̂ and the number
of optima, suggesting in some conditions it may be a
useful predictor of the difficulty of tree-search. Note
that the relative strength and significance
of the correlations depend on
the program used for tree-search. The number
of optima identified by RAxML is strongly correlated
with Δ ln L̂, whereas correlations are relatively weaker
and nonsignificant for optima identified by phyml, suggesting that the choice of program and the heuristics
it implements affect the way phylogenetic information is used.
Parameter distributions.—The parameter estimates for the
best tree found during tree-search are compared with
the distribution of parameter estimates across tree-space
estimated from 1000 random trees. The results for both
data sets qualitatively match our observations for the
Rokas data. The best tree has a strong tendency to be
shorter than randomly sampled trees in both data
sets. Under WAG+F+Γ the rates-across-sites parameter,
α, tends to have relatively high values for the best tree,
although the pattern is marginally weaker in the 40-taxa
data set, with 12/20 genes having a value of α greater
than that observed in the sampled distribution.
The effect of implementation on rearrangement algorithms in
tree-search.—Several results in previous sections highlight a difference in the performance of SPR tree-search
between phyml and RAxML, including the number of
optima recovered using both strategies and the correlations detected between the number of inferred optima
and data properties. We further examine this difference in performance by rerunning tree-search from the
optima identified under different programs. The steepest ascent algorithm used for exhaustive tree-search
would recover exactly the same number of optima, but
the heuristics used in different programs will affect
their ability to recover the same optimum. Running a
program a second time from a previously recovered
optimum tests the robustness of the topography of tree-space for an individual implementation. Note that for
these reruns, we provide only the topology and sequence data to the program, which means that branch
lengths and substitution model parameters need to be
reestimated.
In the “1st Run” columns of Table 3, we show how
many optima are recovered when tree-search is initialized from 1000 randomly sampled trees using phyml
and RAxML. Using phyml for the first run, we find
an average of 203.2 optima, which reduces to 83.6 when phyml is run
a second time; an average of only 7.0 optima are shared
between the first and second run. If RAxML is used to
perform the second run, the average of 203.2 optima
reduces to 60.2, but with more optima retained in the
second run (an average of 24.6; Table 3).

TABLE 3. Average number of optima found using different implementations of SPR tree-search
                                                    1st run phyml                                  1st run RAxML
Data                            Model               1st run   2nd run phyml   2nd run RAxML       1st run   2nd run phyml   2nd run RAxML
40-taxa phylogenomic data set   EQU                 289.8     109.3 (9.5)a    83.1 (50.7)a        111.7     51.7 (3.8)a     110.1 (109.2)a
                                WAG+F+Γ             203.2     83.6 (7.0)a     60.2 (24.6)a        94.0      40.1 (2.1)a     89.9 (85.7)a
Benchmarking data set           WAG+F+Γ (protein)   25.0      16.5 (1.4)a     11.8 (4.6)a         17.9      11.4 (2.9)a     16.4 (12.2)a
                                GTR+Γ (DNA)         62.0      46.6 (0.7)a     42.5 (3.7)a         58.6      42.1 (0.2)a     53.3 (17.7)a
a Average number of common optima found in both runs.

When the first run of tree-search is performed with
RAxML, we usually find fewer optima than with phyml (an
average of 94.0 for RAxML compared with 203.2 for
phyml). The optima identified by RAxML appear to be
relatively robust, with the number of optima decreasing by only a small amount when RAxML is used for
the second run. The optima found in the two runs overlap
a great deal. For tree-search, we find that
the 94.0 optima identified by RAxML in the first run
reduces to 89.9 in the second run, and on average 85.7
of these optima are the same as those recovered in the
first run. When phyml is used to perform the second
run, the number of optima reduces substantially and
there is very little overlap between the optima from the
first and second runs. The average number of optima
reduces from 94.0 to 40.1, with an average of only 2.1
shared optima between the two runs.
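The bookkeeping behind these counts (distinct optima in each run and the number common to both) reduces to set operations on tree topologies. The sketch below is a minimal illustration with hypothetical topology labels; it assumes each topology has already been reduced to a canonical string so that identical trees compare equal, a step not shown here.

```python
def compare_runs(first_run, second_run):
    """Count distinct optima in each run and the optima common to both.

    Each argument is a list of canonical topology strings, one per search.
    """
    first, second = set(first_run), set(second_run)
    return {
        "first": len(first),
        "second": len(second),
        "shared": len(first & second),
    }

# Hypothetical example: four optima from the first run are each used as the
# start tree of a second run (labels are placeholders, NOT data from the paper).
first_run = ["T1", "T2", "T3", "T4"]
second_run = ["T1", "T1", "T5", "T4"]  # T2 moved to T1, T3 moved to a new tree T5

summary = compare_runs(first_run, second_run)
```

A second run that collapses several first-run optima onto one tree lowers the "second" count, while a move to a previously unseen tree lowers the "shared" count without necessarily lowering "second".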
The numbers of optima recovered from different runs
do not reveal whether the first or second run yields
the highest scoring optima. When the first run of tree-search is performed using phyml, we find a total of 4063
different optima across all genes. A second run with
phyml (RAxML) yields optima with a higher score in
3161 (3557) cases, optima with the same score in 187
(489) cases, and lower scoring optima in 715 (17) cases.
There are 1880 optima across all genes under WAG+F+Γ
when RAxML is used for the first run of tree-search.
A second run with phyml (RAxML) yields a higher scoring optimum in 799 (141) cases, an optimum with the same
score in 229 (1707) cases, and a lower scoring optimum in
852 (32) cases.
Qualitatively similar results are obtained when using
EQU for tree-search although the initial number of optima tends to be higher (Table 3). From these optima a
second run under RAxML nearly always finds optima
that score the same or better than the original, and a second run with phyml frequently results in lower scoring
optima.
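The WAG+F+Γ tallies reported above can be checked for internal consistency and turned into proportions with a short Python sketch. The counts are copied from the text; the data layout and the helper name `fractions` are illustrative choices only.

```python
# (higher, same, lower) counts for second-run optima, keyed by
# (program used for first run, program used for second run).
tallies = {
    ("phyml", "phyml"): (3161, 187, 715),
    ("phyml", "RAxML"): (3557, 489, 17),
    ("RAxML", "phyml"): (799, 229, 852),
    ("RAxML", "RAxML"): (141, 1707, 32),
}
# Total number of first-run optima across all genes, per first-run program.
totals = {"phyml": 4063, "RAxML": 1880}

def fractions(counts):
    """Convert a tuple of counts into proportions of their sum."""
    total = sum(counts)
    return tuple(c / total for c in counts)

# Every row should account for all first-run optima of its program.
consistent = all(sum(c) == totals[first] for (first, _), c in tallies.items())

for (first, second), counts in tallies.items():
    higher, same, lower = fractions(counts)
    print(f"{first} -> {second}: higher {higher:.1%}, same {same:.1%}, lower {lower:.1%}")
```

Each row sums to its stated total, and the proportions make the contrast easier to see: a RAxML rerun of RAxML optima mostly returns the same score, whereas a phyml rerun of RAxML optima returns a lower score in a large share of cases.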
Heuristic Tree-Search in the Benchmark Data Sets
Examining tree-search performance on phylogenomic data
sets provides some insight into the tree-search problem, but the genes are expected to share an evolutionary history and therefore have similar ML tree
estimates. The chance choice of a set of sequences with
an unusual tree shape may therefore affect our results, so we extend our analyses to larger and more
diverse data sets. Note that dropping the assumption of a
shared evolutionary history means that several comparisons between individual alignments can no longer be
made.
The effect of SPR implementation on the number of optima.—
For each alignment examined from 100 different start
trees, we find the number of different optima discovered varies from 1 to 100 for both phyml and RAxML,
although the number of optima is correlated between
the two programs (proteins: r = 0.71; DNA: r = 0.72). For
the protein (DNA) data set for 36/41 (28/40) sequences,
RAxML finds fewer or the same number of optima as
phyml with, on average, phyml finding 2.71 (2.16) times
as many optima as RAxML. Running OOB tree-search
on the protein data set reveals similar patterns to those
revealed in the larger phylogenomic data sets. For the
DNA data set the performance of OOB tree-search is
more varied. Sampling finds the best tree for the majority of genes (RAxML: 29/40; phyml: 35/40), and for the
remaining genes, there is a roughly even split between
(i) sampling and OOB locating the same best tree, and
(ii) OOB finding the best tree.
Statistical comparison of optima.—In nearly all cases examined, the local optima identified with SPR tree-search
from 100 random start trees fall within the 95% confidence interval of the best tree identified. The single case
where this does not occur is for a 67-sequence nucleotide
alignment, where for one random start tree phyml finds
an optimum that is significantly worse than the best
tree found. Our results suggest that both programs are
highly efficient at finding good tree estimates, even
though their heuristics can result in quite different topographies of tree-space. OOB tree-search for phyml
and RAxML always returns an optimum indistinguishable from the best found from 100 random start trees.
Optima clustering.—There is significant clustering for every nucleotide and amino acid alignment examined under both tree-search programs (P << 0.01; Bonferroni
correction).
Parameter distributions.—The best tree found is shorter
than any of the 100 randomly sampled trees in all of
the amino acid and nucleotide alignments. For the majority of the alignments examined, there is a strong
tendency for the rates-across-sites parameter, α, to be
higher than the randomly sampled trees, although there
are some notable exceptions. For one of the amino acid
alignments, there is very high sequence similarity (86%
constant sites), which results in similar values of α for
all trees. For one of the DNA alignments, the optimal
tree has a very low value of α and we could not identify
the cause.
The effect of implementation on rearrangement algorithms
in tree-search.—The results for the benchmark data set
shown in Table 3 are qualitatively similar to those
obtained from the 40-taxa data set. On average, the
number of optima inferred under RAxML is lower
than the number of optima inferred under phyml for
both nucleotide and amino acid alignments, suggesting
that the heuristics in RAxML may be more effective
than those implemented in phyml. The relative stability of optima found by different programs for the
amino acid alignments is similar to that for the 40-taxa phylogenomic data set, confirming that our results extend
to general protein data sets. We find that optima are
much less stable for nucleotide alignments under both
tree-search programs, although RAxML is marginally
more stable than phyml. This instability may reflect the
greater number of free parameters in GTR+Γ . A poor
choice of starting values can lead the programs away
from good regions of tree-space before closing-in on
a new optimum. For the nucleotide alignments, the
log likelihoods of the optima found in the second run are
highly variable for both programs, with neither program
consistently finding better or worse optima. Further details of these runs are provided in the Supplementary
Material.
DISCUSSION
Our results provide a detailed insight into the topography of tree-space and some of the different factors that
affect it. Previous research has examined the heuristics
used during tree-search in a limited way, showing they
may yield multiple optima (Salter 2001), that model
choice can affect the global optimum (Yang et al. 1995),
and that more general rearrangement strategies are frequently more successful at recovering “good” optima
(Guindon et al. 2010). Our study examines tree-space in
a comprehensive manner and extends previous observations in several key areas.
Our first group of findings relates to the shape of
tree-space and how it may vary between different alignments. We find there are major differences in the topography of tree-space, with the tree-search problem
seeming much easier in some genes than others. As the
number of sequences in an alignment grows, these differences become more pronounced and the heuristics
used for tree-search play a greater role. On the most difficult alignments, we find different optimal tree topologies for every starting tree we examine. Despite these
differences, we are able to make some useful general
statements about tree-space. The global optimum (or at
least the optimum with the highest log likelihood found
during tree-search) tends to have the greatest number
of trees attracted to it, even when complex implementations of tree-search heuristics are used, suggesting
that the topography of tree-space favors finding good
solutions. We also find that optima exhibit a strong
tendency to cluster in tree-space, suggesting that good
trees all share (sub)sets of characteristics. We are unable
to recover what those characteristics are, but they may
group or separate subsets of species in the tree. Further
investigation of these characteristics may enable preliminary analyses of tree-search, which could be used
to define a “reasonable” region in tree-space to search
more thoroughly.
Related to these findings is the interaction of tree-space with the fit of the substitution model. We find that
model choice affects tree-space, but in an unpredictable
manner with no discernable pattern, suggesting that
model choice should be governed by biological and statistical considerations, such as the model being an adequate description of the evolutionary process. Certain
aspects of the substitution model, however, are strongly
related to the quality of the tree estimate. All of our analyses demonstrate that the “best tree”, whether found by
heuristic or exhaustive tree-search, has a significantly
shorter tree length than the rest of tree-space. Associated with this observation, we also find the best tree
has a weaker tendency to have the least rate variation
between sites and we offer an intuitive explanation linking these two observations. When a specific tree misses
branches that occur in the optimal tree, the substitutions
occurring on that branch must happen elsewhere in the
tree. In some cases, for any given site, two substitutions
on different branches will be required to explain the
single substitution on the “correct” branch, which will
lead to an increase in the number of inferred substitutions at that site and a lengthening of suboptimal trees.
The number of “extra” substitutions required to explain
the suboptimal tree for each site will vary randomly
along the sequence, leading to some sites having more
substitutions occurring under the suboptimal topology
than others, which could, in turn, lead to an increase in
among-site rate variation. This increased spread in rate
variation may be compounded by the excess of zero rate
(constant) sites observed in many sequence alignments.
In addition to these general insights into the tree-search problem, our results also provide some useful
observations for researchers performing phylogenetic
tree-search. First, we note that the difference in log-likelihood score between the globally optimal tree (or
neighbor-joining tree) and the star topology (Δ ln L̂) may
provide a useful proxy for the phylogenetic “signal”
available in an alignment to estimate a tree. Although
the inverse correlation between Δ ln L̂ and the difficulty of tree-search is relatively weak, it does suggest
a method of identifying genes in whole-genome studies that could be particularly amenable to tree-search.
If faced with 100 genes, known values of Δ ln L̂, and
limited computational resources, one would expect that
phylogenetic analyses of the 10 genes with the largest values of Δ ln L̂ would yield results that are more
robust to tree-search errors than analyses of 10 genes picked at random. We caution, however, that Δ ln L̂ is an imperfect
statistic and certain properties of the tree may hinder
its application. If, for example, the evolutionary history
of a phylogenomic data set has one exceedingly long
branch, then the value of Δ ln L̂ may solely reflect the
support for that single branch, rather than the other,
more difficult to resolve, parts of the tree.
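The gene-selection idea in this paragraph can be sketched as a simple ranking by Δ ln L̂. The code below is a minimal illustration: the gene names and log likelihoods are invented, and the sign convention (log likelihood of the resolved tree minus that of the star tree, so larger values mean more signal) is an assumption consistent with the description above.

```python
def delta_lnl(lnl_resolved, lnl_star):
    """Log-likelihood gain of a resolved tree (e.g., NJ or best ML) over the star tree."""
    return lnl_resolved - lnl_star

def top_genes(scores, k):
    """Return the k genes with the largest delta lnL.

    `scores` maps a gene name to a (lnL resolved tree, lnL star tree) pair.
    """
    ranked = sorted(scores, key=lambda gene: delta_lnl(*scores[gene]), reverse=True)
    return ranked[:k]

# Hypothetical log likelihoods (NOT data from the paper).
scores = {
    "gene1": (-10250.0, -10900.0),  # delta = 650
    "gene2": (-8300.0, -8420.0),    # delta = 120
    "gene3": (-15100.0, -16650.0),  # delta = 1550
    "gene4": (-5000.0, -5010.0),    # delta = 10
}

chosen = top_genes(scores, k=2)
```

As cautioned above, a single very long branch could dominate Δ ln L̂, so a ranking like this is a rough filter rather than a guarantee of an easy search.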
Second, the results presented here expand on our previous work (Whelan and Money 2010) by demonstrating that in most reasonable applications NNI is prone
to large numbers of local optima and that these optima
are frequently significantly different from the global (or
best) optimum. The conclusions drawn from such inaccurate tree estimates could prove misleading and we
therefore recommend avoiding NNI tree-search where
possible.
Third, our analyses provide positive support for the
use of SPR as a tree-search strategy. In contrast to NNI,
we find that for nearly all random start trees examined,
SPR optimal trees tend to fall within the 95% confidence
interval of the global (or best) optimum. Moreover,
we find that when tree-search programs are run using default settings, the resulting optimal tree is within
the 95% confidence interval for all of the 153 alignments. These results suggest that regardless of whether
the programs have identified the correct tree,
the point estimates of trees produced by RAxML or
phyml represent a useful summary of evolutionary relationships, provided that those estimates are treated
with the caution befitting any point estimates without confidence intervals. Given the fast and efficient
programs available for tree-search, SPR appears an adequate heuristic for moderately sized alignments with
fewer than 100 sequences. We note it is not clear how
SPR will perform on larger alignments, but our previous research suggests it does not suffer from some of
the limitations of NNI (Whelan and Money 2010).
Finally, despite the strong performance of SPR in locating trees that are not significantly different from the
best tree, the effect of using suboptimal trees remains
unclear when performing hypothesis tests to investigate
tree topology or aspects of the substitution process. For
many applications, it is still of benefit to locate the true
ML trees or at least the best possible for a given amount
of computational resource. When trying to find a globally optimal tree, our results suggest that neither phyml
nor RAxML has consistently the best performance, and
that the different parameter optimization strategies and
other heuristics they implement have a substantial effect
on the topography of tree-space. Consequently, we conclude that combining the outputs of both programs and
ranking them with a highly accurate third-party program, such as PAML, may yield better results. Moreover,
our results on the stability of optima suggest that feeding very good tree estimates into such programs may
yield trees with lower likelihoods, especially for relatively parameter-rich models such as GTR+Γ, and that
a selection of different start trees should be tried if users
wish to try and avoid local optima.
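The strategy of pooling candidate trees from several programs and ranking them with one common scorer can be sketched as follows. This is only an outline under stated assumptions: `rescore` stands in for a likelihood evaluation by a third-party program and is represented here by a lookup table of invented scores, and no real program interface is implied.

```python
def best_by_common_score(candidates, rescore):
    """Rescore every distinct candidate tree with one scorer and return the best.

    `candidates` is an iterable of canonical topology strings pooled from
    all programs; `rescore` maps a topology to its log likelihood.
    """
    scored = {tree: rescore(tree) for tree in set(candidates)}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical candidate topologies from two programs (NOT real output).
from_phyml = ["((A,B),C,D);", "((A,C),B,D);"]
from_raxml = ["((A,B),C,D);", "((A,D),B,C);"]

# Placeholder for an external likelihood calculation: invented values.
fake_lnl = {
    "((A,B),C,D);": -1000.5,
    "((A,C),B,D);": -1003.2,
    "((A,D),B,C);": -1001.0,
}

best_tree, best_lnl = best_by_common_score(from_phyml + from_raxml, fake_lnl.get)
```

Using one scorer for all candidates avoids comparing log likelihoods computed with different numerical approximations in different programs.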
CONCLUSIONS
Our results provide several new insights that may
be useful for researchers developing phylogenetic tree-search programs or whose studies involve estimating
phylogenetic trees. For those developing tree-search
programs, we identify properties of the globally optimal tree that may help drive improvements in tree-search algorithms and heuristics. Knowing that the
globally optimal tree tends to have the greatest number of trees attracted to it and that it is frequently the
shortest tree could both feed directly into current algorithms for tree-search. Equally, the strong differences
in the topography of tree-space for different SPR-based
phylogenetic tree-search programs should encourage
authors of these programs to describe their search algorithms more explicitly, perhaps leading to the modularization of programs in such a way that would allow
researchers to assess independently the relative effectiveness of both the tree-search algorithm and the numerical approximations required to speed up likelihood
computation.
For researchers more interested in the estimation
of trees from existing programs, our results offer four
key observations: (i) the difference in log likelihood
between a well-resolved topology and the star topology
may provide a proxy for the phylogenetic information
in an alignment, and in a phylogenomic data set indicate which genes are the most amenable to tree-search;
(ii) NNI tree-search performs poorly on real data and
should be avoided; (iii) current implementations of SPR
tree-search are likely to yield trees that are not significantly different from the globally optimal tree; and (iv)
no single program is likely to yield the best tree estimate and the best strategy for finding it may involve
combining runs from different programs with a range
of different start trees.
SUPPLEMENTARY MATERIAL
Supplementary material, including data files and/or
online-only appendices, can be found at http://www.sysbio.oxfordjournals.org/.
FUNDING
D.M. was supported by a Doctoral Training Centre
studentship awarded to the University of Manchester
by the Biotechnology and Biological Sciences Research
Council, UK.
ACKNOWLEDGMENTS
The metazoan phylogenomic data set was kindly
provided by Nicolas Lartillot. We also thank Junhyong
Kim, Stephane Guindon, Mark Holder, Ron Debry, and
an anonymous referee for their constructive comments,
which have helped improve the manuscript.
REFERENCES
Aguinaldo A.M.A., Turbeville J.M., Linford L.S., Rivera M.C., Garey
J.R., Raff R.A., Lake J.A. 1997. Evidence for a clade of nematodes,
arthropods and other moulting animals. Nature. 387:489–493.
Akaike H. 1974. New look at statistical-model identification. IEEE
Trans. Autom. Control. 19:716–723.
Bishop M.J., Friday A. 1987. Tetrapod relationships: the molecular
evidence. In: Patterson C., editor. Molecules and morphology in
evolution: conflict or compromise? Cambridge (UK): Cambridge
University Press. p. 123–139.
Bush R.M., Bender C.A., Subbarao K., Cox N.J., Fitch W.M. 1999. Predicting the evolution of human influenza A. Science. 286:1921–1925.
Chor B., Tuller T. 2005. Maximum likelihood of evolutionary trees is
hard. Lect. Notes Comput. Sci. 3500:296–310.
DasGupta B., He X., Jiang T., Li M., Tromp J. 1999. On the linear-cost subtree-transfer distance between phylogenetic trees. Algorithmica. 25:176–195.
DasGupta B., He X., Jiang T., Li M., Tromp J., Zhang L. 2000. On
computing the nearest neighbor interchange distance. In: Du D.Z.,
Pardalos P.M., Wang J., editors. Proceedings of the DIMACS workshop on discrete problems with medical applications. Volume 55.
Providence (RI): American Mathematical Society. p. 125–143.
Delsuc F., Brinkmann H., Philippe H. 2005. Phylogenomics and the
reconstruction of the tree of life. Nat. Rev. Genet. 6:361–375.
Drummond A., Strimmer K. 2001. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics. 17:662–663.
Felsenstein J. 2003. Inferring phylogenies. Sunderland (MA): Sinauer
Associates.
Guindon S., Gascuel O. 2003. A simple, fast, and accurate algorithm
to estimate large phylogenies by maximum likelihood. Syst. Biol.
52:696–704.
Guindon S., Dufayard J.-F., Lefort V., Anisimova M., Hordijk W.,
Gascuel O. 2010. New algorithms and methods to estimate
maximum-likelihood phylogenies: assessing the performance of
Phyml 3.0. Syst. Biol. 59:307–321.
Hahn B.H., Shaw G.M., De Cock K.M., Sharp P.M. 2000. AIDS
as a zoonosis: scientific and public health implications. Science.
287:607–614.
Lewis P.O. 1998. A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. Mol. Biol. Evol.
15:277–283.
Metzker M.L., Mindell D.P., Liu X.M., Ptak R.G., Gibbs R.A., Hillis
D.M. 2002. Molecular evidence of HIV-1 transmission in a criminal
case. Proc. Natl. Acad. Sci. U.S.A. 99:14292–14297.
Morell V. 1996. TreeBASE: the roots of phylogeny. Science. 273:569.
Morrison D.A. 2007. Increasing the efficiency of searches for the maximum likelihood tree in a phylogenetic analysis of up to 150 nucleotide sequences. Syst. Biol. 56:988–1010.
Nikolaev S., Montoya-Burgos J.I., Margulies E.H., Rougemont J.,
Nyffeler B., Antonarakis S.E., Progra N.C.S. 2007. Early history
of mammals is elucidated with the ENCODE multiple species sequencing data. PLoS Genet. 3:e2.
Philippe H., Lartillot N., Brinkmann H. 2005. Multigene analyses
of bilaterian animals corroborate the monophyly of Ecdysozoa,
Lophotrochozoa, and Protostomia. Mol. Biol. Evol. 22:1246–1253.
Robinson D.F., Foulds L.R. 1981. Comparison of phylogenetic trees.
Math. Biosci. 53:131–147.
Rokas A., Williams B.L., King N., Carroll S.B. 2003. Genome-scale
approaches to resolving incongruence in molecular phylogenies.
Nature. 425:798–804.
Salter L.A. 2001. Complexity of the likelihood surface for a large DNA
dataset. Syst. Biol. 50:970–978.
Shimodaira H., Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol.
Evol. 16:1114–1116.
Shimodaira H., Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 17:1246–1247.
Stamatakis A. 2005. An Efficient Program for Phylogenetic Inference
Using Simulated Annealing in High Performance Computational
Biology Workshop, Denver, Colorado.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models.
Bioinformatics. 22:2688–2690.
Vinh L.S., von Haeseler A. 2004. IQPNNI: moving fast through tree
space and stopping in time. Mol. Biol. Evol. 21:1565–1571.
Whelan S. 2007. New approaches to phylogenetic tree search and
their application to large numbers of protein alignments. Syst. Biol.
56:727–740.
Whelan S. 2008. Inferring trees. In: Keith J., editor. Bioinformatics:
data, sequence analysis and evolution. Totowa (NJ): Humana Press.
p. 287–309.
Whelan S., Goldman N. 2001. A general empirical model of protein evolution derived from multiple protein families using a
maximum-likelihood approach. Mol. Biol. Evol. 18:691–699.
Whelan S., Lio P., Goldman N. 2001. Molecular phylogenetics:
state-of-the-art methods for looking into the past. Trends Genet.
17:262–272.
Whelan S., Money D. 2010. The prevalence of multifurcations in
tree-space and their implications for tree-search. Mol. Biol. Evol.
27:2674–2677.
Yang Z. 1994. Maximum-likelihood phylogenetic estimation from
DNA-sequences with variable rates over sites—approximate methods. J. Mol. Evol. 39:306–314.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput. Appl. Biosci. 13:555–556
Yang Z., Goldman N., Friday A. 1995. Maximum-likelihood trees from
DNA-sequences—a peculiar statistical estimation problem. Syst.
Biol. 44:384–399.
Zwickl D.J. 2006. Genetic algorithm approaches for the phylogenetic
analysis of large biological sequence datasets under the maximum likelihood criterion [dissertation]. Austin (TX): University of
Texas.