Syst. Biol. 61(2):228–239, 2012
© The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: [email protected]
DOI:10.1093/sysbio/syr097
Advance Access publication on November 10, 2011

Characterizing the Phylogenetic Tree-Search Problem

DANIEL MONEY AND SIMON WHELAN∗

Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK;
∗Correspondence to be sent to: University of Manchester, Faculty of Life Sciences, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK; E-mail: [email protected].

Received 16 October 2010; reviews returned 14 January 2011; accepted 5 July 2011
Associate Editor: Mark Holder

Abstract.—Phylogenetic trees are important in many areas of biological research, ranging from systematic studies to the methods used for genome annotation. Finding the best scoring tree under any optimality criterion is an NP-hard problem, which necessitates the use of heuristics for tree-search. Although tree-search plays a major role in obtaining a tree estimate, there remains a limited understanding of its characteristics and how the elements of the statistical inferential procedure interact with the algorithms used. This study begins to answer some of these questions through a detailed examination of maximum likelihood tree-search on a wide range of real genome-scale data sets. We examine all 10,395 trees for each of the 106 genes of an eight-taxa yeast phylogenomic data set, then apply different tree-search algorithms to investigate their performance. We extend our findings by examining two larger genome-scale data sets and a large disparate data set that has been previously used to benchmark the performance of tree-search programs. We identify several broad trends occurring during tree-search that provide an insight into the performance of heuristics and may, in the future, aid their development.
These trends include a tendency for the true maximum likelihood (best) tree to also be the shortest tree in terms of branch lengths, a weak tendency for tree-search to recover the best tree, and a tendency for tree-search to encounter fewer local optima in genes that have a high information content. When examining current heuristics for treesearch, we find that nearest-neighbor-interchange performs poorly, and frequently finds trees that are significantly different from the best tree. In contrast, subtree-pruning-and-regrafting tends to perform well, nearly always finding trees that are not significantly different to the best tree. Finally, we demonstrate that the precise implementation of a tree-search strategy, including when and where parameters are optimized, can change the character of tree-search, and that good strategies for tree-search may combine existing tree-search programs. [Algorithms; heuristics; maximum likelihood; NNI; phylogenetics; SPR; tree-search.] Phylogenetic tree estimation is a critical part of many biological studies and has been used to resolve evolutionary relationships (e.g., Aguinaldo et al. 1997; Delsuc et al. 2005; Nikolaev et al. 2007), to help understand and fight disease (e.g., Bush et al. 1999; Hahn et al. 2000), and even as evidence in court (Metzker et al. 2002). Phylogenetic studies frequently use some form of optimality criterion to assess how well specific tree topologies describe the observed sequence data. Optimality methods typically work by finding the best scoring tree for a sequence alignment, which is taken to be the best estimate of the evolutionary relationships between a set of sequences. Confidence in that tree estimate is then assessed, typically using statistical procedures such as bootstrapping (see Whelan et al. 2001; Delsuc et al. 2005). 
One of the most popular optimality criterion methods is maximum likelihood (ML), which calculates the likelihood of the observed sequence data conditional on a specific tree topology and a substitution model of how sequences change over time (Felsenstein 2003). Much research effort has been devoted to the development of more realistic substitution models, with the expectation that they will improve the accuracy of phylogenetic tree inference (Delsuc et al. 2005). Much less attention has been given to the methods used to estimate the phylogenetic tree. Most practical applications require a tree-search strategy to try and find the ML tree estimate from the overwhelming number of possible tree topologies, and the rearrangements used to define the relationships between them (referred to hereafter as tree-space; see Felsenstein 2003). Different computer programs use different types of tree-search heuristics, but they are usually based on the idea of hill-climbing using a rearrangement algorithm to define neighboring trees (e.g., Whelan 2007; Whelan 2008), although there have been attempts to use genetic algorithms (Lewis 1998; Zwickl 2006) and simulated annealing (Stamatakis 2005). It is established that hill-climbing can produce many different optimal trees depending where the algorithm starts, and that the number of optima may vary from data set to data set (e.g., Salter 2001; Morrison 2007). Only the optimal tree with the highest likelihood, the global optimum, has the appealing properties of the ML estimator. Unfortunately, tree-search is NP-hard, which means that once an optimum is located there is no way of knowing whether it is the global optimum or a lower scoring local optimum (Chor and Tuller 2005). Some insight into the difficulty of recovering the optimal tree can be obtained by rerunning an analysis from different starting points and seeing how frequently the tree-search algorithm identifies other optima (Vinh and von Haeseler 2004). 
In alignments where there are large numbers of different optima, the tree-search problem is difficult and the heuristic will be highly dependent on its starting location, indicating that one should treat any individual estimate with caution. In contrast, if rerunning tree-search from different starting points recovers few optima, one may conclude that there is a better chance of recovering the optimal tree. Note that this approach is very different from bootstrapping; the reruns of tree-search use exactly the same data, and any difference in the tree returned is a consequence of the structure of tree-space.

The efficacy of different rearrangement operations in hill-climbing is relatively poorly characterized (although see Morrison 2007) and, although more expansive rearrangements are expected to reduce the number of optima at greater computational cost, our understanding of how different rearrangements affect tree-search is limited (although see Whelan and Money 2010). Furthermore, we know tree-search performance varies between data sets, but it is not clear what properties of alignments affect the difficulty of tree-search and whether we can identify easier tree-search problems a priori. There is also a complex and largely uncharacterized interplay between the tree-search problem, different models of substitution, and different implementations of tree-search. Improving our understanding of the factors affecting the difficulty of tree-search is an important step toward developing improved heuristics that perform well on a wide variety of data types. Although tree-search is NP-hard and it is impossible to guarantee good performance, experience with other NP-hard problems, such as the travelling salesman problem, shows that more effective heuristics lead to good results in the majority of cases.
In this study, we investigate the tree-search problem for a wide range of real amino acid and nucleotide alignments under a variety of popular substitution models and computer programs. Our goal is twofold: to learn about the factors that affect the topography of tree-space, and to provide pragmatic suggestions that will aid phylogenetic inference with existing methods. To investigate the topography of tree-space, we examine how tree-space varies between alignments, substitution models, and heuristic tree-search algorithms. We investigate how the difficulty of tree-search differs between alignments and use correlation analyses to identify predictors for the difficulty of tree-search. We also examine whether optima share any properties, such as their relative size or their location in tree-space. In our study of tree-search heuristics currently used in programs, we examine whether some approaches clearly outperform others, judged by statistically comparing the tree they find with the globally optimal tree (or the best tree found during extensive tree-search). We also propose a range of quick-to-compute statistics that may be predictive of how difficult the tree-search will be for any given alignment. We address these questions by first examining a small data set that is amenable to exhaustive tree-search, identifying important trends associated with tree-search. We then use a sampling approach to extend our results to heuristic tree-search on much larger data sets.

MATERIALS AND METHODS

Data Sets

We examine three high-quality phylogenomic data sets consisting of 8, 20, and 40 taxa that have been handcrafted and used extensively in other studies, and a disparate set of alignments taken from TreeBase (Morell 1996) that have previously been used to benchmark tree-search programs (Guindon et al. 2010).
Phylogenomic data sets consist of a series of genes taken from the same set of taxa, leading us to expect a single tree relating the taxa, and enabling us to compare results between genes to highlight similarities and differences in tree-space caused by alignment properties. The eight-taxa yeast data set, taken from Rokas et al. (2003), consists of ungapped nucleotide sequence alignments for 106 genes. The 20- and 40-taxa phylogenomic data sets are subsets of the data set used by Philippe et al. (2005), whose data consist of gapped amino acid sequence alignments for 146 genes, each having sequences from between 25 and 49 taxa. To create the 40-taxa data set, we select the 40 taxa that occur most frequently across the 146 genes, retaining all genes that contain sequences for the selected taxa. From these genes, any that contains one or more sequences with >10% unknown characters or gaps is removed, leaving a total of 20 genes in the 40-taxa phylogenomic data set. The 20-taxa data set is produced in a similar manner, with the additional rule that genes in the 40-taxa data set were excluded. One further gene is removed because it causes irrecoverable errors in phyml, resulting in a total of 52 genes in the 20-taxa phylogenomic data set. The benchmark data set consists of the medium nucleotide and amino acid sequence alignments used to investigate the performance of phyml in Guindon et al. (2010), with some minor modifications. To allow straightforward comparison between trees, we remove redundant sequences from each alignment using RAxML. From these reduced alignments, we discard all with 10 or fewer or greater than 100 sequences, to ensure we investigate an appropriate range of data while maintaining computational tractability. The refined benchmark data set used in our study contains 41 nucleotide alignments and 40 amino acid alignments. 
Exhaustive Tree-Search in the Eight-Taxa Phylogenomic Data Set

There are 10,395 bifurcating trees describing all possible evolutionary relationships of the yeast species in the eight-taxa data set, making it amenable to exhaustive tree-search and consequently a complete analysis of the topography of tree-space. For this study, we examine only the extremes of nucleotide model complexity, the Jukes and Cantor model (JC) and the general time reversible model with Γ-distributed rates-across-sites (GTR+Γ) (for details about these models see Felsenstein 2003). This choice of models is limited but more detailed analyses show that although model choice does affect the topography of tree-space, it does so in an unpredictable manner and does not substantially affect the difficulty of the tree-search problem (see Supplementary Table S1 and associated text for full details, available from http://www.sysbio.oxfordjournals.org/).

A modified version of standard steepest ascent hill-climbing algorithms (Whelan 2008) is used to identify the optimum found when starting from every tree. These algorithms function by iteratively improving the current tree topology using the following scheme: (i) assign a start tree to the current tree object, (ii) use a rearrangement operation to define the neighborhood of the current tree, (iii) calculate likelihoods for the trees in the neighborhood and assign the highest scoring as the new current tree, and (iv) if no improvement in likelihood occurs, then tree-search reaches an optimum and stops, otherwise go to (ii). We investigate tree-search using nearest-neighbor-interchange (NNI) and subtree-pruning-and-regrafting (SPR) rearrangement operations (for details of operations see Whelan 2008). Our modification to hill-climbing only affects NNI tree-search, and enables the algorithm to escape from multifurcations when it would otherwise get stuck (see Whelan and Money 2010 for details).
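The four-step scheme above can be sketched generically. The following is a minimal illustration only, not the authors' implementation: the function name `steepest_ascent`, the integer states, `TOY_SCORES`, and `toy_neighbors` are all hypothetical stand-ins. In a real analysis the states would be tree topologies, the neighborhood would come from NNI or SPR rearrangements, and the score would be the maximized log-likelihood of each tree. The multifurcation-handling modification described in the text is not modeled.

```python
from typing import Callable, Hashable, Iterable


def steepest_ascent(
    start: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    score: Callable[[Hashable], float],
) -> Hashable:
    """Climb from `start` until no neighbor scores strictly higher."""
    current = start                          # step (i): start tree becomes current
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(current):  # step (ii): define the neighborhood
            s = score(candidate)              # step (iii): score every neighbor
            if s > best_score:
                best, best_score = candidate, s
        if best is None:                      # step (iv): no improvement, stop
            return current
        current, current_score = best, best_score


# Hypothetical toy landscape: ten states on a line, each adjacent to the
# states on either side. It stands in for tree-space purely for illustration.
TOY_SCORES = [1, 3, 2, 0, 4, 6, 5, 2, 1, 0]


def toy_neighbors(i: int) -> list:
    return [j for j in (i - 1, i + 1) if 0 <= j < len(TOY_SCORES)]


def toy_score(i: int) -> float:
    return TOY_SCORES[i]
```

On this toy landscape, starting points to the left of the dip at state 3 climb to the local optimum at state 1, while the remaining starts reach the global optimum at state 5, which is the sense in which the optimum found depends on where the search starts.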
Whenever a multifurcation is identified during an NNI tree-search, we identify the subset of connected bifurcating trees it contains. The neighborhood used for the next round of hill-climbing is then defined by performing NNI rearrangements to this subset of trees.

Number and size of optima.—Throughout this study we assume that the number of different optima that can be reached during tree-search is a suitable proxy for the difficulty of the tree-search problem. The number of optima for a gene is defined as the number of optima identified during tree-search under a specific rearrangement strategy from all possible starting points in tree-space. The size of an optimum is defined as the number of start trees that reach that specific optimum when performing tree-search, which may be reflective of how frequently an optimum may be encountered if start trees were sampled at random. Therefore, large optima will be reached from many start trees, whereas small optima will be reached from relatively few start trees. In a small number of cases, an optimum is a multifurcating tree, with two or more neighboring bifurcating trees with approximately the same likelihood (tolerance 10⁻⁵). These clusters of bifurcating trees are grouped together as a single multifurcating optimum.

Statistical comparison of optima.—Different runs of tree-search may yield different point estimates of the tree, but it is not known whether these tend to be significantly different to one another or significantly different to the global optimum. We assess whether each local optimum is significantly different to the global optimum using the SH test (Shimodaira and Hasegawa 1999), implemented in PAML (Yang 1997). For brevity, hereafter we denote the “95% confidence interval of the globally optimal tree” to be the set of trees that cannot be rejected by the SH test with 95% confidence as being different from the globally optimal tree, although we note the atypical usage of this phrase.

Optima clustering.—Complete knowledge of tree-space allows us to perform the NP-hard calculation of how many rearrangement operations are required to transform tree A into tree B. We use this metric to investigate whether optima cluster together more than expected by chance under a particular rearrangement scheme. We compute the mean NNI distance between our n identified optima and assess the significance of any clustering observed using a bootstrap approach. We take 1000 draws from the null distribution of no significant clustering by sampling n randomly chosen trees, with the condition that none is a neighbor to any other, and computing their mean NNI distance. For genes where there are multifurcating optima, we use the minimum distance between any two bifurcating trees within that optimum and use an equal-sized multifurcation during our sampling procedure.

Correlating the number of optima with gene and tree properties.—The relative impact of tree and gene properties on the difficulty of tree-search is assessed by calculating the Spearman correlation coefficients between specific properties and the number of optima identified by tree-search. We present correlations for three major properties: (i) tree length, defined as the sum of all branches of the globally optimal tree; (ii) alignment length, defined as the length of the gapped sequence alignment associated with a gene; and (iii) the difference in likelihood between the fully resolved globally optimal tree and the unresolved star-tree (the tree with no resolved internal branches). The statistic in (iii), which we label Δ ln L̂, is intended to represent the information content in the sequence data for resolving the tree, where the star-tree, which has a single multifurcating internal node, represents a model with no topological information and the globally optimal tree represents the fully resolved model.
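The correlation analysis described above can be sketched with the standard library alone. This is an illustrative sketch rather than the code used in the study: the helper names (`delta_lnl`, `average_ranks`, `pearson`, `spearman`) are hypothetical, tied values are assumed to receive their average rank, and significance testing is omitted.

```python
def delta_lnl(lnl_best_tree: float, lnl_star_tree: float) -> float:
    """Log-likelihood gain of the resolved tree over the star-tree."""
    return lnl_best_tree - lnl_star_tree


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For one gene, `delta_lnl` would be applied to the log-likelihoods of the best tree and the star-tree; `spearman` would then take the per-gene Δ ln L̂ values and the per-gene numbers of optima as its two arguments.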
The Δ ln L̂ statistic is therefore proportional to the difference in Akaike’s information criterion value (Akaike 1974), although it does not address the problem that some branches are easier to resolve than others. Note that we examined other gene and tree properties, but none were found to have correlations as strong as those we present. We also investigate these correlations using the neighbor-joining tree as a proxy for the global optimum, implemented through the PAL library (Drummond and Strimmer 2001).

Parameter distributions.—Many previous studies assume that parameter estimates in the substitution model are stable as long as a “good” tree is used (e.g., Whelan and Goldman 2001). We investigate the estimates of tree length and the α parameter of the Γ-distributed rates-across-sites model by examining their distribution across the 10,395 bifurcating trees relating 8 taxa for all 106 genes. For each gene, we calculate the mean rank of the global optimum and the skewness of the distribution.

Heuristic Tree-Search Using Sampling for the 20- and 40-taxa Phylogenomic and Benchmarking Data Sets

The size of tree-space for these data sets makes exhaustive tree-search computationally unfeasible, with the 20- and 40-taxa sets having in the order of 10²⁰ and 10⁵⁵ possible topologies, respectively. For such large data sets, there are two computational limitations. First, a fully defined steepest ascent algorithm is computationally too slow, so instead we use two popular and fast phylogenetic tree-search programs: phyml v3.0.1 (NNI and SPR: Guindon and Gascuel 2003) and RAxML v7.04 (SPR: Stamatakis 2006). Second, the total number of trees is too large to allow exhaustive examination of tree-space, so instead we use a sampling approach.
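The sizes of tree-space quoted for the 8-, 20-, and 40-taxa data sets follow from the standard count of unrooted bifurcating topologies for n labeled taxa, (2n − 5)!! = 3 × 5 × ... × (2n − 5). A short sketch of that calculation, with a hypothetical function name, is:

```python
def count_unrooted_bifurcating_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5. Adding the k-th taxon to a
    # tree of k - 1 taxa can be done on any of its 2(k - 1) - 3 branches.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count
```

Python integers have arbitrary precision, so the 40-taxa count is computed exactly rather than as a floating-point approximation.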
For each data set under each condition examined, we randomly sample 1000 trees (uniform probability for each bifurcating tree topology) and use these trees as the starting points for independent runs of tree-search to identify optima. For the benchmarking data set, we use only 100 random samples due to computational limitations of examining alignments with many sequences. In all cases, the ML scores of optima are recalculated using PAML to ensure an unbiased comparison between programs. For the phylogenomic data sets, each alignment is examined under extremes of model complexity, using EQU (the equiprobable model; Bishop and Friday 1987) and WAG+F+Γ (Yang 1994; Whelan and Goldman 2001). For the benchmarking data set, we examine only a complex model, GTR+Γ for nucleotide alignments and WAG+F+Γ for amino acid alignments. In RAxML, EQU is invoked by specifying a user-defined substitution matrix and homogeneous rates-across-sites is enforced by setting the number of rate categories to one.

Number and size of optima.—Overall numbers of optima cannot be calculated using a sampling approach, but the relative numbers of optima discovered by phyml and RAxML from the randomly sampled start trees should be indicative of the difficulty of tree-search. For the phylogenomic data sets, the relative size of each sampled optimum is calculated as the number of start trees that lead to it. Note that when investigating the relative size of optima, we exclude genes with >200 optima to ensure that our estimates of optima size are accurate. We also compare the optima identified by sampling with those located using out-of-the-box (OOB) SPR settings for both RAxML and phyml.

Statistical comparison of optima.—The best tree identified during tree-search is taken as a proxy for the global optimum, and we compare the 95% confidence interval of this best tree with the other optima we identify using the SH test implemented in PAML (Yang 1997) and Consel (Shimodaira and Hasegawa 2001).
We also assess whether the optima found using OOB SPR search settings for each program fall within the 95% confidence interval of the best optimum found from the sampled start trees.

Optima clustering.—It is NP-hard to calculate NNI or SPR distances between trees (SPR: DasGupta et al. 1999; NNI: DasGupta et al. 2000), so for our larger data sets we use the Robinson–Foulds (RF) distance metric (Robinson and Foulds 1981). The average RF distance between the n sampled optima is calculated and a bootstrap procedure used to assign P values of clustering by comparing the observed distance to the distribution of average RF distances between n randomly sampled trees.

Correlating the number of optima with gene and tree properties.—To assess the difficulty of tree-search across alignments, it is necessary to assume that each alignment has the same evolutionary history. For the purpose of this study, we ignore minor variations in gene tree topology that can result from, for example, incomplete lineage sorting or model misspecification. This assumption is only true for our phylogenomic data sets, so the benchmarking data are excluded from these analyses. Spearman correlation coefficients are calculated between the observed number of optima and gene length, tree length, and likelihood difference between the star-tree and the best optimum identified (Δ ln L̂).

Parameter distributions.—For each alignment examined, the distributions of tree length and the rates-across-sites parameter, α, across trees are approximated by taking estimates from our random start trees, for both the simple and complex model. The parameter estimates obtained for the best tree found during heuristic tree-search are then compared with this distribution of randomly selected trees.

RESULTS

Exhaustive Analysis of Eight-taxa Yeast Phylogenomic Data Set

We characterize the tree-search problem for the 106 genes that comprise the phylogenomic data from Rokas et al. (2003).
We only present results from NNI tree-search because very few genes produce multiple optima under SPR rearrangements; under JC (GTR+Γ), we find that 101/106 (105/106) genes have only a single optimum using SPR.

The global optimum is frequently the largest optimum.—Figure 1 compares the ordered rank of identified optima with the average size of optima relative to that expected if all optima were of equal size. There is a strong tendency for the global optimum to be larger than other, lower scoring optima under both models: in other words, the highest scoring optimum has more start trees that reach it during tree-search than expected by chance. The global optimum is on average 2.37 times larger than expected under JC (solid black line), and 2.16 times larger than expected under GTR+Γ (gray dashed line). Similarly, the second best optimum found is on average 1.03 (1.07) times the size expected by chance under JC (GTR+Γ), whereas the remaining optima are usually smaller than expected. These averages are confirmed when examining individual genes: under JC, the global optimum is the largest in 95/106 genes, whereas it is the largest under GTR+Γ for 90/106 genes. The genes where this does not hold tend to have large numbers of optima and in all of these cases the global optimum is one of the four largest optima. Note that the spike in the number of optima between rank 15 and 20 under GTR+Γ is from a single gene in our analysis that had many optima.

FIGURE 1. The number of start trees leading to an optimum relative to that which would be expected if the number were uniformly distributed (total number of trees divided by the number of optima for an individual gene) for the eight-taxa yeast phylogenomic data set. Log-likelihood scores are used to rank optima. The solid black line is under JC, and dotted gray line under GTR+Γ.
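The relative optimum size used in this comparison is the number of start trees reaching an optimum divided by the number expected if every optimum were the same size (total start trees divided by the number of optima). A minimal sketch for a single gene follows; the function name, the optimum labels, and the input layout (one optimum label per start tree, plus a log-likelihood per optimum) are assumptions made for illustration.

```python
from collections import Counter


def relative_optimum_sizes(optimum_reached, log_likelihood):
    """Return (optimum, relative size) pairs ranked from best to worst score.

    optimum_reached: one entry per start tree, naming the optimum it reached.
    log_likelihood:  mapping from optimum label to its log-likelihood.
    """
    counts = Counter(optimum_reached)
    # Expected size if start trees were spread evenly over all optima.
    expected = len(optimum_reached) / len(counts)
    ranked = sorted(counts, key=lambda opt: log_likelihood[opt], reverse=True)
    return [(opt, counts[opt] / expected) for opt in ranked]
```

A relative size above 1 means the optimum attracts more start trees than expected under equal sizes; averaging the value at each rank across genes gives curves of the kind summarized in Figure 1.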
Statistical comparison of optima.—Under GTR+Γ, we find that the number of trees that lie within the 95% confidence interval of the global optimum varies widely between alignments, with 41–10,393 trees falling within the confidence interval (average 1405). For 101/106 genes examined, the trees in the confidence interval are fully connected by NNI; in other words, they form one group in tree-space because a series of NNI rearrangements exists between any pair of trees in the confidence interval such that no intermediate steps contain a tree outside the confidence interval. For the remaining five genes, the trees in the confidence interval form two distinct groups separated by trees outside the confidence interval. Of the 91 genes with multiple optima under GTR+Γ, we find that 28 have local optima that the SH test finds significantly different from the global optimum. When averaged across all 91 genes, we find 14.6% of the local optima found are significantly different to the global optimum. The results for JC are broadly similar.

Clustering of optima.—In Figure 2a, we show a representation of tree-space under GTR+Γ for the genes YBR198C (black) and YLR389C (gray). Tree-search on YLR389C is easy; there is a single optimum, marked with a gray X, and as trees increase their NNI distance from this optimum they have progressively lower likelihoods. For YBR198C, tree-search is more complex. The likelihood decreases as the NNI distance from the global optimum increases, but the slope is less steep. Furthermore, there are 16 local optima (each labeled with a black X) clustered between three and eight NNI operations from the global optimum and these tend to have higher than average likelihoods. The clustering of optima observed in YBR198C also occurs in other genes, but their location is unpredictable. We examined Δ ln L̂, and α, and found that none of these factors predicted which start trees are attracted to specific optima.

We examine the genes with multiple optima in more detail. Our bootstrap analyses show that for the overwhelming majority of genes, the mean NNI distance between optima was less than expected by chance and under JC (GTR+Γ) we find 49/92 (50/91) genes display significant clustering (P < 0.05; Bonferroni correction). There is no clear interpretation of this clustering because we cannot differentiate between potential causes. One cause would be one or more highly supported branches in the data, which means that all optimal trees contain this branch. Alternative causes could be a clustering of quite dissimilar optimal topologies that share few (if any) features, systematic exclusion of particular branches, or some potential bias in tree-shape.

FIGURE 2. a) Average log-likelihood differences between resolved tree topologies and the star tree (Δ ln L̂) under GTR+Γ plotted against their distance from the globally optimal tree. The figure summarizes information from the 10,395 trees linking the eight-taxa yeast phylogenomic data set in the YBR198C (black) and YLR389C (gray) genes, with solid lines showing the average log-likelihood and dotted lines describing the 95% interval in the distribution. Crosses in the figure show the location of optima; note the two crosses at distance zero are the global optima. The histogram (top) shows the relative fraction of trees at each distance from the global optimum, which is dependent on the shape of the globally optimal tree. In this case, both genes have the same optimal tree. b) Correlations between Δ ln L̂ and the number of optima per gene under JC (crosses) and GTR+Γ (squares). Best fit lines are shown for JC (solid) and for GTR+Γ (dashed).

Correlation between number of optima and data properties.—Figure 2a demonstrates another trend that holds between gene comparisons: the number of optima in a gene is correlated with the value of Δ ln L̂ at the global optimum. Figure 2b plots the number of optima compared with the value of Δ ln L̂ across all genes under JC and GTR+Γ, and we find there are significant, but imperfect, correlations between these variables (Table 1). These correlations also occur when using the difference in likelihood between the star-tree and the neighbor-joining tree (JC: r = −0.52; GTR+Γ: r = −0.52), suggesting this form of statistic may be predictive even if the true global optimum is not known. Table 1 shows that the number of optima in a gene is also significantly correlated with the gene length and tree length (sum of branches of the globally optimal tree), although these correlations are weaker than those between the number of optima and Δ ln L̂.

TABLE 1. Correlations between gene properties and the number of optima discovered during tree-search

                 Yeast^a                             Small metazoa NNI^b    Small metazoa SPR^c                 Large metazoa^d
Factor           JC^e             GTR+Γ^f            EQU^g      WAG+F+Γ     EQU              WAG+F+Γ            EQU              WAG+F+Γ
Δ ln L̂^h        −0.50* (−0.52*)  −0.51* (−0.52*)    −0.76*     −0.80*      −0.46* (−0.47*)  −0.59* (−0.47*)    −0.87* (−0.74*)  −0.59* (−0.26)
Gene length      −0.40*           −0.42*             −0.70*     −0.70*      −0.35* (−0.39*)  −0.53* (−0.40*)    −0.80* (−0.74*)  −0.52* (−0.15)
Tree length^i    0.29* (0.29*)    0.34* (0.34*)      −0.20      −0.06       −0.24 (−0.04)    0.00 (0.06)        −0.38 (−0.19)    −0.18 (−0.13)

* Significant correlation.
a Results from eight-taxa yeast species examined using NNI. Results using the neighbor-joining tree instead of the optimal tree are shown in parentheses.
b Sampled results from 20-taxa metazoan data examined using NNI.
c Sampled results from 20-taxa metazoan data examined using SPR using RAxML. Results from phyml are shown in parentheses.
d Sampled results from 40-taxa metazoan data examined using SPR using RAxML. Results from phyml are shown in parentheses.
e Jukes and Cantor model.
f General time reversible model with Γ-distributed rates-across-sites.
g Equiprobable model.
h The log-likelihood difference between the globally optimal tree and the star topology.
i The sum of all branch lengths of the globally optimal tree.
Parameter distributions.—The distributions of parameter estimates across tree-space are summarized in Table 2. Globally optimal trees tend to have relatively high estimates of α from Γ-distributed rates-across-sites and low estimates of tree length. Table 2 also shows the skew of the parameter distributions. There is positive skew for α, with the majority of trees having low parameter estimates, in contrast to the high estimate in the globally optimal tree. There is a negative skew for tree length, with the majority of trees having longer estimates, in contrast to the shorter estimate for the globally optimal tree.

TABLE 2. Median ranks of parameter estimates for the global optimum tree and the median skewness of parameter distributions

                      α^a                   Length^b
                      GTR+Γ                 JC                          GTR+Γ
Global optimum rank   2 (1–38)^c            10,395 (10,385–10,395)^c    10,394 (10,352–10,395)^c
Skewness              1.48 (0.51–3.66)^c    −1.08 (−1.39 to 0.44)^c     −0.90 (−1.49 to 0.19)^c

Notes: A total of 10,395 trees are examined. Ranks are assigned from highest parameter estimate to lowest parameter estimate for each gene. A rank of 1 for α indicates that the globally optimal tree had the highest estimate of α across all possible tree topologies.
a Estimated variance parameter from Γ-distribution.
b Estimated tree length calculated as the sum of all branches.
c Range of values across all genes.

Heuristic Tree-Search in 20- and 40-taxa Metazoan Data Sets

We use a sampling approach to examine whether the trends observed in smaller data sets under exhaustive tree-search extend to larger data sets. Furthermore, we investigate the effect on tree-search of different implementations of tree-search strategies. Results presented are for WAG+F+Γ, and those for EQU follow a similar pattern unless described otherwise.

Number of optima.—We find large differences in the number of optima observed both between models and between genes for both data sets. For the 20-taxa data set, 1000 samples of NNI tree-search using phyml reveal an average number of optima of 196 (range 1–753), whereas NNI tree-search using phyml on the 40-taxa data set results in very large numbers of optima for most genes. For several genes, different optima are found from each of the 1000 start trees examined, and for 16/20 genes more than 900 different optima are found. These large numbers preclude any detailed analysis of NNI tree-search on the 40-taxa data set.

For SPR tree-search, we generally find that model and implementation affect the number of optima recovered. On average, the more sophisticated WAG+F+Γ model finds fewer optima during tree-search than EQU, whereas phyml consistently finds more optima during tree-search than RAxML. Applying phyml to the 20-taxa data set, there are an average of 11 (range 1–90) optima under EQU and 8 (1–59) optima under WAG+F+Γ, whereas for RAxML we find an average of 5 (1–32) optima under EQU and 4 (1–45) optima under WAG+F+Γ. Applying phyml to the 40-taxa data set results in an average of 290 (60–594) optima under EQU and an average of 203 (17–838) under WAG+F+Γ, whereas RAxML results in 112 (3–415) optima under EQU and 94 (4–817) optima under WAG+F+Γ.

The optima obtained using sampling are compared with those obtained with OOB settings. For these phylogenomic data sets, no OOB SPR tree-search provided a better scoring optimum than sampling. Differences between OOB and sampling are most pronounced for the 40-taxa data set, where sampling provided a higher scoring tree in 10/20 (18/20) genes for RAxML (phyml).

Size of optima.—The number of times an optimum is reached during tree-search from 1000 different start trees is used to approximate the relative size of the optima discovered. The best optimum located for NNI tree-search is usually larger than other lower scoring optima in all conditions examined, but it is difficult to draw firm conclusions because the number of optima is frequently very large.
Moreover, the expected size of an optimum is confounded with the number of optima found, so that an increase in relative optimum size may not result in an increased probability of recovering that optimum from a random start location in tree-space. The number of optima for SPR tree-search is much lower, allowing a more detailed comparison. Figure 3a shows the relative size of the optima for the 20-taxa data set using SPR tree-search using phyml and RAxML. We find that using phyml the best optimum is on average 3.2 times larger than expected, whereas the average using RAxML is 2.4 times larger than expected. Figure 3b shows that SPR tree-search on the 40-taxa data set follows a similar pattern. The best optimum recovered using RAxML is 4.3 times larger than expected, whereas for phyml the best optimum recovered is 8.4 times larger than expected.

FIGURE 3. The number of start trees leading to an optimum relative to that which would be expected if the number were uniformly distributed (total number of sampled trees divided by the number of optima located for an individual gene) for (a) our 20-species metazoan data set using SPR moves and (b) our 40-species metazoan data set using SPR moves under WAG+F+Γ. Log-likelihood scores are used to rank optima. Solid lines denote tree-search using phyml, whereas dashed lines denote tree-search using RAxML.

Statistical comparison of optima.—We examine whether the optimal trees found by different search methods and from different starting points fall within the 95% confidence interval of the best tree found. For the 20-taxa phylogenomic data set, we find that NNI frequently finds trees that are significantly different to the best tree found, with 45.8% of searches yielding trees significantly worse than the best tree found. In contrast, under both programs, tree-search using SPR always finds optima within the 95% confidence interval of the best tree for both the 20- and 40-taxa data sets.
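The "times larger than expected" values reported for SPR tree-search follow the definition given in the Figure 3 caption: the number of start trees reaching an optimum, divided by the number expected if start trees were spread uniformly over the optima found. A minimal sketch of that ratio is given below; the function name `relative_optimum_sizes` and the example labels are hypothetical, and each optimum is assumed to be identified by some canonical, hashable label for its topology.

```python
from collections import Counter


def relative_optimum_sizes(end_points):
    """Relative size of each optimum reached by a set of tree-searches.

    end_points: one hashable label per start tree, naming the optimum that
    the search from that start tree converged to (an assumed encoding).
    The expected count under uniformity is
    (number of start trees) / (number of distinct optima).
    """
    if not end_points:
        raise ValueError("no tree-search results supplied")
    counts = Counter(end_points)
    expected = len(end_points) / len(counts)
    return {optimum: n / expected for optimum, n in counts.items()}


# Hypothetical example: 10 start trees converge on three optima.
runs = ["A"] * 6 + ["B"] * 3 + ["C"]
sizes = relative_optimum_sizes(runs)
for optimum, size in sizes.items():
    print(optimum, round(size, 2))
```

Because the expected count shrinks as more optima are found, the same raw count yields a larger ratio when optima are numerous, which is the confounding between optimum size and number of optima noted in the text.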
The optima discovered using OOB settings for both phyml and RAxML also always fall within the 95% confidence interval of the best optimum found from 1000 random start trees, for both data sets and both models.

Optima clustering.—We examine the mean RF distance between optima for both the 20- and 40-taxa data sets and compare it with that expected if the optimal trees were randomly distributed through tree-space. We find significant clustering for every gene examined, under all implementations, and all tree-search methods (P << 0.01; Bonferroni correction).

Correlation between number of optima and data properties.—Table 1 shows the correlation between the number of optima found in a gene and (i) Δ ln L̂, (ii) gene length, and (iii) tree length. In all cases, the strongest correlations are found between Δ ln L̂ and the number of optima, suggesting in some conditions it may be a useful predictor of the difficulty of tree-search. Note the difference in the relative strength and significance of the correlation between the statistics examined and the choice of program used for tree-search. The number of optima identified by RAxML is strongly correlated with Δ ln L̂, whereas correlations are relatively weaker and nonsignificant for optima identified by phyml, suggesting that the choice of program and the heuristics it implements affect the way phylogenetic information is used.

Parameter distributions.—The parameter estimates for the best tree found during tree-search are compared with the distribution of parameter estimates across tree-space estimated from 1000 random trees. The results for both data sets qualitatively match our observations for the Rokas data. The best tree has a strong tendency to be shorter than randomly sampled trees under both data sets. Under WAG+F+Γ the rates-across-sites parameter, α, tends to have relatively high values for the best tree, although the pattern is marginally weaker in the 40-taxa data set, with 12/20 genes having a value of α greater than that observed in the sampled distribution.

The effect of implementation on rearrangement algorithms in tree-search.—Several results in previous sections highlight a difference in the performance of SPR tree-search between phyml and RAxML, including the number of optima recovered using both strategies and the correlations detected between the number of inferred optima and data properties. We further examine this difference in performance by rerunning tree-search from the optima identified under different programs. The steepest ascent algorithm used for exhaustive tree-search would recover exactly the same number of optima, but the heuristics used in different programs will affect their ability to recover the same optimum. Running a program a second time from a previously recovered optimum tests the robustness of the topography of tree-space for an individual implementation. Note that for these reruns, we only provide the topology and sequence data to the program, which means that branch length and substitution model parameters need to be reestimated.

In the “1st Run” columns of Table 3, we show how many optima are recovered when tree-search is initialized from 1000 randomly sampled trees using phyml and RAxML. Using phyml for the first run, we find 203.2 optima, which reduces to 83.6 when phyml is run a second time; an average of only 7.0 optima are shared between the first and second run. If RAxML is used to perform the second run, the average of 203.2 optima reduces to 60.2, but with more optima retained in the second run.

When the first run of tree-search is performed with RAxML, we usually find fewer optima than phyml (an average of 94.0 for RAxML compared with 203.2 for phyml). The optima identified by RAxML appear to be relatively robust, with the number of optima decreasing by only a small amount when RAxML is used for the second run. The optima shared between runs have a great deal of overlap. For tree-search, we find that the 94.0 optima identified by RAxML in the first run reduce to 89.9 in the second run, and on average 85.7 of these optima are the same as those recovered in the first run. When phyml is used to perform the second run, the number of optima reduces substantially and there is very little overlap between the optima from the first and second runs. The average number of optima reduces from 94.0 to 40.1, with an average of only 2.1 shared optima between the two runs.

TABLE 3. Average number of optima found using different implementations of SPR tree-search

                                                   1st run with phyml                          1st run with RAxML
Data                           Model               1st run   2nd run phyml   2nd run RAxML    1st run   2nd run phyml   2nd run RAxML
40-taxa phylogenomic data set  EQU                 289.8     109.3 (9.5)^a   83.1 (50.7)^a    111.7     51.7 (3.8)^a    110.1 (109.2)^a
40-taxa phylogenomic data set  WAG+F+Γ             203.2     83.6 (7.0)^a    60.2 (24.6)^a    94.0      40.1 (2.1)^a    89.9 (85.7)^a
Benchmarking data set          WAG+F+Γ (protein)   25.0      16.5 (1.4)^a    11.8 (4.6)^a     17.9      11.4 (2.9)^a    16.4 (12.2)^a
Benchmarking data set          GTR+Γ (DNA)         62.0      46.6 (0.7)^a    42.5 (3.7)^a     58.6      42.1 (0.2)^a    53.3 (17.7)^a

a Average number of common optima found in both runs.

The numbers of optima recovered from different runs do not answer whether the first or second run yields the highest scoring optima. When the first run of tree-search is performed using phyml, we find a total of 4063 different optima across all genes. A second run with phyml (RAxML) yields optima with a higher score in 3161 (3557) cases, optima with the same score in 187 (489) cases, and lower-scoring optima in 715 (17) cases. There are 1880 optima across all genes under WAG+F+Γ when RAxML is used for the first run of tree-search.
A second run with phyml (RAxML) yields higher-scoring optima in 799 (141) cases, optima with the same score in 229 (1707) cases, and lower-scoring optima in 852 (32) cases. Qualitatively similar results are obtained when using EQU for tree-search, although the initial number of optima tends to be higher (Table 3). From these optima a second run under RAxML nearly always finds optima that score the same or better than the original, and a second run with phyml frequently results in lower-scoring optima.

Heuristic Tree-Search in the Benchmark Data Sets

Examining the performance of phylogenomic data sets provides some insight into the tree-search problem, but the genes are expected to share an evolutionary history and therefore have similar ML tree estimates. The chance choice of a set of sequences with an unusual tree-shape may affect our results, however, so we extend our analyses to larger and more diverse data sets. Note that losing the assumption of a shared evolutionary history means that several comparisons between individual alignments can no longer be made.

The effect of SPR implementation on the number of optima.—For each alignment examined from 100 different start trees, we find the number of different optima discovered varies from 1 to 100 for both phyml and RAxML, although the number of optima is correlated between the two programs (proteins: r = 0.71; DNA: r = 0.72). For 36/41 (28/40) alignments in the protein (DNA) data set, RAxML finds fewer or the same number of optima as phyml, with phyml finding on average 2.71 (2.16) times as many optima as RAxML. Running OOB tree-search on the protein data set reveals similar patterns to those revealed in the larger phylogenomic data sets. For the DNA data set the performance of OOB tree-search is more varied.
Sampling finds the best tree for the majority of genes (RAxML: 29/40; phyml: 35/40), and for the remaining genes, there is a roughly even split between (i) sampling and OOB locating the same best tree, and (ii) OOB finding the best tree.

Statistical comparison of optima.—In nearly all cases examined, the local optima identified with SPR tree-search from 100 random start trees fall within the 95% confidence interval of the best tree identified. The single case where this does not occur is for a 67-sequence nucleotide alignment, where for one random start tree phyml finds an optimum that is significantly worse than the best tree found. Our results suggest that both programs are highly efficient at finding good tree estimates, even though their heuristics can result in quite different topographies of tree-space. OOB tree-search for phyml and RAxML always returns an optimum indistinguishable from the best found from 100 random start trees.

Optima clustering.—There is significant clustering for every nucleotide and amino acid alignment examined under both tree-search programs (P << 0.01; Bonferroni correction).

Parameter distributions.—The best tree found is shorter than any of the 100 randomly sampled trees in all of the amino acid and nucleotide alignments. For the majority of the alignments examined, there is a strong tendency for the rates-across-sites parameter, α, to be higher than in the randomly sampled trees, although there are some notable exceptions. For one of the amino acid alignments, there is very high sequence similarity (86% constant sites), which results in similar values of α for all trees. For one of the DNA alignments, the optimal tree has a very low value of α and we could not identify the cause.

The effect of implementation on rearrangement algorithms in tree-search.—The results for the benchmark data set shown in Table 3 are qualitatively similar to those obtained from the 40-taxa data set.
On average, the number of optima inferred under RAxML is lower than the number of optima inferred under phyml for both nucleotide and amino acid alignments, suggesting that the heuristics in RAxML may be more effective than those implemented in phyml. The relative stability of optima found by different programs for the amino acid alignments is similar to that for the 40-taxa phylogenomic data set, confirming that our results extend to general protein data sets. We find that optima are much less stable for nucleotide alignments under both tree-search programs, although RAxML is marginally more stable than phyml. This instability may reflect the greater number of free parameters in GTR+Γ. A poor choice of starting values can lead the programs away from good regions of tree-space before closing in on a new optimum. For the nucleotide alignments, the log likelihoods of the optima found in the second run are highly variable for both programs, with neither program consistently finding better or worse optima. Further details of these runs are provided in the Supplementary Material.

DISCUSSION

Our results provide a detailed insight into the topography of tree-space and some of the different factors that affect it. Previous research has examined the heuristics used during tree-search in a limited way, showing they may yield multiple optima (Salter 2001), that model choice can affect the global optimum (Yang et al. 1995), and that more general rearrangement strategies are frequently more successful at recovering “good” optima (Guindon et al. 2010). Our study examines tree-space in a comprehensive manner and extends previous observations in several key areas.

Our first group of findings relates to the shape of tree-space and how it may vary between different alignments. We find there are major differences in the topography of tree-space, with the tree-search problem seeming much easier in some genes than others.
As the number of sequences in an alignment grows, these differences become more pronounced and the heuristics used for tree-search play a greater role. On the most difficult alignments, we find different optimal tree topologies for every starting tree we examine. Despite these differences, we are able to make some useful general statements about tree-space. The global optimum (or at least the optimum with the highest log likelihood found during tree-search) tends to have the greatest number of trees attracted to it, even when complex implementations of tree-search heuristics are used, suggesting that the topography of tree-space favors finding good solutions. We also find that optima exhibit a strong tendency to cluster in tree-space, suggesting that good trees all share (sub)sets of characteristics. We are unable to recover what those characteristics are, but they may group or separate subsets of species in the tree. Further investigation of these characteristics may enable preliminary analyses of tree-search, which could be used to define a “reasonable” region in tree-space to search more thoroughly.

Related to these findings is the interaction of tree-space with the fit of the substitution model. We find that model choice affects tree-space, but in an unpredictable manner with no discernible pattern, suggesting that model choice should be governed by biological and statistical considerations, such as the model being an adequate description of the evolutionary process. Certain aspects of the substitution model, however, are strongly related to the quality of the tree estimate. All of our analyses demonstrate that the “best tree”, whether found by heuristic or exhaustive tree-search, has a significantly shorter tree length than the rest of tree-space. Associated with this observation, we also find the best tree has a weaker tendency to have the least rate variation between sites and we offer an intuitive explanation linking these two observations.
When a specific tree misses branches that occur in the optimal tree, the substitutions occurring on that branch must happen elsewhere in the tree. In some cases, for any given site, two substitutions on different branches will be required to explain the single substitution on the “correct” branch, which will lead to an increase in the number of inferred substitutions at that site and a lengthening of suboptimal trees. The number of “extra” substitutions required to explain the suboptimal tree for each site will vary randomly along the sequence, leading to some sites having more substitutions occurring under the suboptimal topology than others, which could, in turn, lead to an increase in among-site rate variation. This increased spread in rate variation may be compounded by the excess of zero rate (constant) sites observed in many sequence alignments.

In addition to these general insights into the tree-search problem, our results also provide some useful observations for researchers performing phylogenetic tree-search. First, we note that the difference in log-likelihood score between the globally optimal tree (or neighbor-joining tree) and the star topology (Δ ln L̂) may provide a useful proxy for the phylogenetic “signal” available in an alignment to estimate a tree. Although the inverse correlation between Δ ln L̂ and the difficulty of tree-search is relatively weak, it does suggest a method of identifying genes in whole-genome studies that could be particularly amenable to tree-search. If faced with 100 genes, known values of Δ ln L̂, and limited computational resources, one would expect that phylogenetic analyses of the 10 genes with the largest values of Δ ln L̂ would yield results that are more robust to tree-search errors than those from 10 genes picked at random. We caution, however, that Δ ln L̂ is an imperfect statistic and certain properties of the tree may hinder its application.
If, for example, the evolutionary history of a phylogenomic data set has one exceedingly long branch, then the value of Δ ln L̂ may solely reflect the support for that single branch, rather than the other, more difficult to resolve, parts of the tree.

Second, the results presented here expand on our previous work (Whelan and Money 2010) by demonstrating that in most reasonable applications NNI is prone to large numbers of local optima and that these optima are frequently significantly different from the global (or best) optimum. The conclusions drawn from such inaccurate tree estimates could prove misleading and we therefore recommend avoiding NNI tree-search where possible.

Third, our analyses provide positive support for the use of SPR as a tree-search strategy. In contrast to NNI, we find that for nearly all random start trees examined, SPR optimal trees tend to fall within the 95% confidence interval of the global (or best) optimum. Moreover, we find that when tree-search programs are run using default settings, the resulting optimal tree is within the 95% confidence interval for all of the 153 alignments. These results suggest that regardless of whether the programs have identified the correct tree, the point estimates of trees produced by RAxML or phyml represent a useful summary of evolutionary relationships, providing that those estimates are treated with the caution befitting any point estimates without confidence intervals. Given the fast and efficient programs available for tree-search, SPR appears an adequate heuristic for moderately sized alignments with fewer than 100 sequences. We note it is not clear how SPR will perform on larger alignments, but our previous research suggests it does not suffer from some of the limitations of NNI (Whelan and Money 2010).
Finally, despite the strong performance of SPR in locating trees that are not significantly different from the best tree, the effect of using suboptimal trees remains unclear when performing hypothesis tests to investigate tree topology or aspects of the substitution process. For many applications, it is still of benefit to locate the true ML trees or at least the best possible for a given amount of computational resource. When trying to find a globally optimal tree, our results suggest that neither phyml nor RAxML has consistently the best performance, and that the different parameter optimization strategies and other heuristics they implement have a substantial effect on the topography of tree-space. Consequently, we conclude that combining the output of both programs and ranking them with a highly accurate third-party program, such as PAML, may yield better results. Moreover, our results on the stability of optima suggest that feeding very good tree estimates into such programs may yield trees with lower likelihoods, especially for relatively high parameter models such as GTR+Γ, and that a selection of different start trees should be tried if users wish to try to avoid local optima.

CONCLUSIONS

Our results provide several new insights that may be useful for researchers developing phylogenetic tree-search programs or whose studies involve estimating phylogenetic trees. For those developing tree-search programs, we identify properties of the globally optimal tree that may help drive improvements in tree-search algorithms and heuristics. Knowing that the globally optimal tree tends to have the greatest number of trees attracted to it and that it is frequently the shortest tree could all feed directly into current algorithms for tree-search. Equally, the strong differences in the topography of tree-space for different SPR-based phylogenetic tree-search programs should encourage authors of these programs to describe their search algorithms more explicitly, perhaps leading to the modularization of programs in such a way that would allow researchers to assess independently the relative effectiveness of both the tree-search algorithm and the numerical approximations required to speed up likelihood computation.

For researchers more interested in the estimation of trees from existing programs our results offer four key observations: (i) the difference in log likelihood between a well-resolved topology and the star topology may provide a proxy for the phylogenetic information in an alignment, and in a phylogenomic data set indicate which genes are the most amenable to tree-search; (ii) NNI tree-search performs poorly on real data and should be avoided; (iii) current implementations of SPR tree-search are likely to yield trees that are not significantly different from the globally optimal tree; and (iv) no single program is likely to yield the best tree estimate and the best strategy for finding it may involve combining runs from different programs with a range of different start trees.

SUPPLEMENTARY MATERIAL

Supplementary material, including data files and/or online-only appendices, can be found at http://www.sysbio.oxfordjournals.org/.

FUNDING

D.M. was supported by a Doctoral Training Centre studentship awarded to the University of Manchester by the Biotechnology and Biological Sciences Research Council, UK.

ACKNOWLEDGMENTS

The metazoan phylogenomic data set was kindly provided by Nicolas Lartillot. We also thank Junhyong Kim, Stephane Guindon, Mark Holder, Ron Debry, and an anonymous referee for their constructive comments, which have helped improve the manuscript.

REFERENCES

Aguinaldo A.M.A., Turbeville J.M., Linford L.S., Rivera M.C., Garey J.R., Raff R.A., Lake J.A. 1997. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature.
387:489–493.
Akaike H. 1974. New look at statistical-model identification. IEEE Trans. Autom. Control. 19:716–723.
Bishop M.J., Friday A. 1987. Tetrapod relationships: the molecular evidence. In: Patterson C., editor. Molecules and morphology in evolution: conflict or compromise? Cambridge (UK): Cambridge University Press. p. 123–139.
Bush R.M., Bender C.A., Subbarao K., Cox N.J., Fitch W.M. 1999. Predicting the evolution of human influenza A. Science. 286:1921–1925.
Chor B., Tuller T. 2005. Maximum likelihood of evolutionary trees is hard. Lect. Notes Comput. Sci. 3500:296–310.
DasGupta B., He X., Jiang T., Li M., Tromp J. 1999. On the linear-cost subtree-transfer distance between phylogenetic trees. Algorithmica. 25:176–195.
DasGupta B., He X., Jiang T., Li M., Tromp J., Zhang L. 2000. On computing the nearest neighbor interchange distance. In: Du D.Z., Pardalos P.M., Wang J., editors. Proceedings of the DIMACS workshop on discrete problems with medical applications. Volume 55. Providence (RI): American Mathematical Society. p. 125–143.
Delsuc F., Brinkmann H., Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6:361–375.
Drummond A., Strimmer K. 2001. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics. 17:662–663.
Felsenstein J. 2003. Inferring phylogenies. Sunderland (MA): Sinauer Associates.
Guindon S., Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
Guindon S., Dufayard J.-F., Lefort V., Anisimova M., Hordijk W., Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of Phyml 3.0. Syst. Biol. 59:307–321.
Hahn B.H., Shaw G.M., De Cock K.M., Sharp P.M. 2000. AIDS as a zoonosis: scientific and public health implications. Science. 287:607–614.
Lewis P.O. 1998. A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. Mol. Biol. Evol. 15:277–283.
Metzker M.L., Mindell D.P., Liu X.M., Ptak R.G., Gibbs R.A., Hillis D.M. 2002. Molecular evidence of HIV-1 transmission in a criminal case. Proc. Natl. Acad. Sci. U.S.A. 99:14292–14297.
Morell V. 1996. TreeBASE: the roots of phylogeny. Science. 273:569.
Morrison D.A. 2007. Increasing the efficiency of searches for the maximum likelihood tree in a phylogenetic analysis of up to 150 nucleotide sequences. Syst. Biol. 56:988–1010.
Nikolaev S., Montoya-Burgos J.I., Margulies E.H., Rougemont J., Nyffeler B., Antonarakis S.E., Progra N.C.S. 2007. Early history of mammals is elucidated with the ENCODE multiple species sequencing data. PLoS Genet. 3:e2.
Philippe H., Lartillot N., Brinkmann H. 2005. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Mol. Biol. Evol. 22:1246–1253.
Robinson D.F., Foulds L.R. 1981. Comparison of phylogenetic trees. Math. Biosci. 53:131–147.
Rokas A., Williams B.L., King N., Carroll S.B. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 425:798–804.
Salter L.A. 2001. Complexity of the likelihood surface for a large DNA dataset. Syst. Biol. 50:970–978.
Shimodaira H., Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.
Shimodaira H., Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 17:1246–1247.
Stamatakis A. 2005. An Efficient Program for Phylogenetic Inference Using Simulated Annealing in High Performance Computational Biology Workshop, Denver, Colorado.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22:2688–2690.
Vinh L.S., von Haeseler A. 2004. IQPNNI: moving fast through tree space and stopping in time. Mol. Biol. Evol. 21:1565–1571.
Whelan S. 2007. New approaches to phylogenetic tree search and their application to large numbers of protein alignments. Syst. Biol. 56:727–740.
Whelan S. 2008. Inferring trees. In: Keith J., editor. Bioinformatics: data, sequence analysis and evolution. Totowa (NJ): Humana Press. p. 287–309.
Whelan S., Goldman N. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18:691–699.
Whelan S., Lio P., Goldman N. 2001. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 17:262–272.
Whelan S., Money D. 2010. The prevalence of multifurcations in tree-space and their implications for tree-search. Mol. Biol. Evol. 27:2674–2677.
Yang Z. 1994. Maximum-likelihood phylogenetic estimation from DNA-sequences with variable rates over sites—approximate methods. J. Mol. Evol. 39:306–314.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Yang Z., Goldman N., Friday A. 1995. Maximum-likelihood trees from DNA-sequences—a peculiar statistical estimation problem. Syst. Biol. 44:384–399.
Zwickl D.J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion [dissertation]. Austin (TX): University of Texas.