Syst. Biol. 55(1):116-121, 2006 Copyright © Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DO1:10.1080/10635150500481648 Fundamental Differences Between the Methods of Maximum Likelihood and Maximum Posterior Probability in Phylogenetics BODIL SVENNBLAD,1 PER ERIXON,2 BENGT OXELMAN,2 AND TOM BRITTON,3 department of Mathematics, Uppsala University, Box 480, SE-751 06 Uppsala, Sweden; E-mail: [email protected] department of Systematic Botany, Uppsala University, Norbyvagen 18D, SE-75236 Uppsala, Sweden; E-mail: [email protected] (P.E.); [email protected] (B.O.) ^Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden; E-mail: [email protected] Abstract.— Using a four-taxon example under a simple model of evolution, we show that the methods of maximum likelihood and maximum posterior probability (which is a Bayesian method of inference) may not arrive at the same optimal tree topology. Some patterns that are separately uninformative under the maximum likelihood method are separately informative under the Bayesian method. We also show that this difference has impact on the bootstrap frequencies and the posterior probabilities of topologies, which therefore are not necessarily approximately equal. Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434,1996) stated that bootstrap frequencies can, under certain circumstances, be interpreted as posterior probabilities. This is true only if one includes a noninformative prior distribution of the possible data patterns, and most often the prior distributions are instead specified in terms of topology and branch lengths. [Bayesian inference; maximum likelihood method; Phylogeny; support.] of k = 4s possible ones. Denote the possible columns by Xi, X 2 / ..., Xit where, e.g., X[ = {A, A,..., A), X!2 = {C, C, ...,C} etc. The proportions of Xi, X2,..., Xjt, generated from the true phylogeny r with branch lengths b(T) are p = (pi, p 2 ,..., p*), where £ p,- = 1. Each {r, b{z)} induces a different p vector. The p-space is a continuous space that can be divided into different subspaces corresponding to the different topologies. The data matrix x is a sample from the true phylogeny. When assuming sites to be iid, the data matrix x can instead be represented by a vector n = {n\, n^,... n^), where ]T n, = N and each n, is the number of columns in x that equals X,. The vector p can then be estimated by p = (plf p2/... pk), which depends on n. Depending on the method used for phylogenetic inference and choice of model of evolution, each n vector gives an estimate of the topology. The discrete n space is therefore divided in different regions, Rj, giving f = r,- and there might also be regions where the estimate is not unique. In likelihood based methods the information in data is contained in the likelihood function, L(T,, b^Ti\ 6\n). The likelihood is the probability of data, n, f(n\ij, Uri\ 6), given the topology, r,, corresponding branch lengths, b^Ti) and model of evolution (with parameters 6), but seen as a function of r,-, b^ and 0. In the maximum likelihood approach, L(r,, UTi\ 9\n) is maximized for each r,- with respect to b^ and 0. The estimated topology TML is the topology r; with the largest maximized likelihood. In the Bayesian approach, prior distributions for r,, METHODOLOGY FOR ESTIMATING PHYLOGENETIC TREES b(Tl), and 0 have to be specified. The posterior probaWe are interested in finding the phylogenetic tree, T, bility can then be calculated by Bayes' theorem. In the of s species from aligned DNA sequences of length N. method of maximum posterior probability the estimate That is, we have a n s x N data matrix, x, where each row of the topology, TMPP, is the topology r, with the largest posterior probability, 7r(r,|n) (not the majority-rule conrepresents the DNA sequence for one of the species. The columns in x are assumed to be independent and sensus tree as given by, e.g., Huelsenbeck and Ronquist, identically distributed (iid) where each column is one 2001). To summarize, the estimate of the true phylogeny Phylogenetic methods based on likelihood aim to find the best topology by maximizing the likelihood function with respect to topology and branch lengths (maximum likelihood method, e.g., Felsenstein, 1981) or by comparing posterior probabilities for the different possible topologies (Bayesian inference, e.g., Rannala and Yang, 1996). Regardless of the method of inference, a measure of confidence, or support, is often desired for the estimated topology. Felsenstein (1985) suggested a nonparametric bootstrap procedure to obtain such a measure, a procedure that is commonly used with the maximum likelihood method. In Bayesian inference, the posterior probabilities are the support values. Efron et al. (1996) noted that bootstrap frequencies can be interpreted as posterior probabilities under certain circumstances. Those sentences have been frequently cited (e.g., Larget and Simon, 1999; Cummings et al., 2003; Simmons et al., 2004), claiming bootstrap support values and Bayesian posterior probabilities to be approximately equal. Other authors have noticed empirically that Bayesian posterior probabilities for the best supported clades are significantly higher than corresponding nonparametric bootstrap frequencies, (e.g., Suzuki et al., 2002; Wilcox et al., 2002; Alfaro et al., 2003; Douady et al., 2003; Erixon et al., 2003). We will, in Appendix 1, show that the statement of Efron et al. (1996) is true under the conditions therein given. In this paper, however, we will show that those conditions are violated in general phylogenetic inference. 116 2006 117 SVENNBLAD ET AL.—LIKELIHOOD-BASED METHODS IN PHYLOGENETICS is, depending on the method of inference, = argmaxi{maxb>9 L{zir b{xi\ 6\n)} DIFFERENCES BETWEEN MAXIMUM LIKELIHOOD AND MAXIMUM POSTERIOR PROBABILITY TABLE 1. With Jukes-Cantor model of evolution and with four taxa there are 15 different patterns contributing to the likelihood in different ways. Pattern no. Pattern Description 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 xxry Groups of two nucleotides equal within the group but not equal between groups. One nucleotide differs from the rest. XYXY XYYX XXXY XXYX XYXX YXXX XXYZ YZXX XYXZ XYZX YXXZ YXZX XYZU XXXX Efron et al. (1996) stated that, "The bootstrap probability that f * = f is almost the same as the aposteriOne group with two equal ori probability that r = r starting from an uninformative nucleotides, the other two differ from the group and from each prior density on p", where f * is the estimate of the topolother. ogy for a bootstrap replicate. In Appendix 1 this statment is proven. Note, however, that the parameter used is the 4s-dimensional vector p, where s is the number of taxa. All nucleotides different. All nucleotides equal. As explained earlier, the method of maximum posterior probability does not use the p vector as a parameter. The parameters used with this method are the topology r,, Assume data for taxa {a, b, c,d} consists of one single corresponding branch lengths b^Xi\ and the parameters column of pattern 8, XXYZ. In Appendix 2 it is shown of the model of evolution used, 6. The posterior probaanalytically that for this pattern the maximized likelibility for the topology is hoods for ri = [(a, b), (c, d)\, xi = {(a, c), (b, d)}, and T3 = f {(a, d), (b, c)} are indistinguishable. Now consider the method of maximum posterior probability for pattern XXYZ. The priors for topology and branch lengths are considered independent; that is, (2) where 7r(r,', b^Xi\ 0) is the prior for those parameters. Even though flat priors for topology and branch lengths are used, that is not equivalent to a flat prior for p, which Assume flat priors for topology, 7r(r,) = 1/3, as well as simply is not considered in current implementations of for branch lengths, 7r(b(T|)) = 1/M, that is the prior for a Bayesian phylogenetic inference. Denote the regions in the n-space where the method of branch length is uniform on the interval [0, M] with exmaximum likelihood gives estimates f = r,- with R, (see pected value M/2. The priors will then only be scaling Appendix 1). Denote the corresponding regions where factors and the main contribution to the posterior distrithe method of maximum posterior probability gives es- bution is the integral of the likelihood. Those integrals timates f = n with R,-. In Efron et al. (1996), a distance- can be calculated analytically: based method was used to obtain the regions R,. When stated that bootstrap support values could be interpreted pM pM as posterior probabilities, those regions remained unJo Jo changed. Hence, in their example, R, ~ R,'. When comparing the maximum likelihood method and the method = [M5 + 2M 3 (1 - e~M)2 - 4M 2 (1 - e~Mf of maximum posterior probability, this means that it must hold that XML should be equal to XMPP for almost (3) - 3M(1 - e~Mf + 4(1 - <T M ) 5 ]/4 4 all n vectors with £ n, = N. pM pM To examine whether XML = *MPP, a situation as simple h= • L(x2,b{X2)\n)db{T2) Jo Jo as possible but where more than one topology is possible has been used; i.e., a four-taxon case under the Jukes= [M5 - 2M 3 (1 - e~M)2 + M(l - <r M ) 4 ]/4 4 . (4) Cantor model (Jukes and Cantor, 1969). Many of the 256 possible columns contribute to the likelihood for x and For M > 0 the difference between Equations (3) and b^ in exactly the same way, so we say they have the same (4) is "pattern." For example, AACC contributes to the likelihood in the same way as CC TT (under the Jukes-Cantor model), whereas ACAC is another pattern contributing [4M3(1 - e~M)2 - 4M2(1 - e~Mf - 4M(1 - e'Mf differently. For the Jukes-Cantor model and four taxa, + 4(1 - £TM)5]/44 = [(1 - e~M)2 there are 15 different patterns. The possible different patterns of data at a site for four taxa are summarized in x (M - (1 - e-M))2{M + (1 - <TM))]/43. Table 1. If a specific pattern on its own gives a unique estimate of the topology we say it is separately informative. (5) 118 All the factors of (5) are strictly positive, and hence the difference is strictly positive and l\ > hThe posterior probability of T; , 7T (T, | n), is obtained from (2). To calculate the denominator n(n) in (2) one has to integrate over the branch lengths b and the parameter(s) 6 for all possible topologies and sum those, a complicated numerical task. However, if it is possible to calculate the numerator in (2) for all possible topologies, the posterior probability for T,- can be obtained by dividing the numerator in (2) for T, with the sum of the numerators for all topologies. Then n (r, |n) can be calculated analytically from (3) and (4) as I, 7r(r,|n) = where I3 = I2. Because l\ > I2, the posterior probability for Ti will be larger than for z2 and T3 and hence, using flat priors, TMPP = z\ for pattern XXYZ. The posterior probability will tend to 1/3 as M, the length of the interval of the prior for branch lengths, increases. For any finite M though, TI will be the unique estimate. Using an exponential distribution with mean j as prior for branch lengths but still using uniform distribution as prior for topology, the posterior probabilities can be written as 7T(T;|«) = A/(Ai + A2 + A3) where 1 0 A = 7F5 X 1. A 3 2 X y 1 2 3 X y 2 Xy A 4 y5' 1 where y = (X + 1). Pattern XXYZ is separately informative for the maximum posterior probability method with those priors too, because A\ > A2 for all X > 0. Analogous to the analysis of pattern XXYZ it can be shown that the two methods of inference differ for many of the 15 patterns. For the maximum likelihood method only patterns 1,2, and 3 are separately informative. These are separately informative for the maximum posterior probability method also, but so are patterns 8 to 13 (see Table 2). Even though a site is separately uninformative, it contributes to the likelihood function and therefore cannot be ignored. The fundamental differences between the two methods of inference, summarized in Table 2, have impact on the regions Rj and R,', as the next example will show. TABLE 2. Separately informative patterns favoring topologies TI = {(a, b), (c, d)\, T2 = {(a, c), (b, d)) and r3 = {(a, d), (b, c)} for the methods of maximum likelihood and maximum posterior probability, respectively. The enumeration of the patterns follows Table 1. Estimate of topology Method ML MPP VOL. 55 SYSTEMATIC BIOLOGY 1 1,8,9 2 2,10,13 3 3,11,12 a) MPP, exp(10) prior for t ML-method 100 FIGURE 1. Regions R,, where f = T, for the maximum likelihood method (a) and Bayesian inference (b) with nx + n2 + n8 = 100. The number of columns in data matrix that are equal to pattern 1 is on the x-axis, number of pattern 2 on the y-axis, and for each dot in the triangle bounded by the axes and n^ + n2 = 100 we have n8 = 100 - «] - n2. Example Suppose N = 100 and data only consists of patterns 1, 2, and 8 (e.g., nx = 45, n2 = 35, n8 = 20, n,- = 0 for i = 3 , . . . , 7, 9,..., 15). For this data set XML = TMPP = riFrom Table 2 we see that patterns 1 and 2 are separately informative, favoring topologies r\ and T2, respectively, for both methods of inference. Pattern 8 is, as we have shown analytically, separately informative for the maximum posterior probability method, favoring topology TI, but not for the maximum likelihood method. Columns of this pattern still contribute to the likelihood and are therefore taken into account even in the maximum likelihood method. A consequence of this is the region JRO in Figure 1, where no unique estimate of topology can be found. The regions in Figure 1 have been determined by using PAUP*v4.0bl0 (Swofford, 2002) for all possible n-vectors with n, = 0 for all i except n\ > 0, n2 > 0 and ns > 0 with ]T n, = 100 for the maximum likelihood method. For the Bayesian inference, R,- was determined by using MrBayes3.04 (Huelsenbeck and Ronquist, 2001) with an exponential prior with intensity 10 (mean 1/10) for branch lengths and uniform prior for topology. Figure 1 shows that R, ^ R^ and the bootstrap support value for TML is smaller (0.8715 when approximated by bootstrap frequencies from PAUP*) than the posterior probability for TMPP (which is approximated by MrBayes to 1.0). With an original data set far away from the borders between different regions {R,} and {Rj}, respectively (e.g., m = 70, ri2 = 10, and n% = 20 in Figure 1), the bootstrap support value and the posterior probabilities will both be high, tending to 1, even though R, / R^. This is due to the fact that the probability of a bootstrap replicate ending up in a region other than R, is then very small. CONCLUDING REMARKS We have shown that the often cited statement of Efron et al. (1996) about approximate equivalence between bootstrap support values and posterior probabilities is true when the only parameter of interest is p and a noninformative prior is used. This parameter is not considered 2006 SVENNBLAD ET AL.—LIKELIHOOD-BASED METHODS IN PHYLOGENETICS in the current implementations of Bayesian inference of phytogenies where the parameters topology, r, branch lengths b (r) , and model parameters 9 are considered, nor in the maximum likelihood method. For the four-taxon case with Jukes-Cantor model of evolution, the 256 different possible data columns can be reduced to 15 different patterns contributing to the likelihood function in different ways. We have shown analytically that some of those patterns are separately informative for the method of maximum posterior probability but not for the method of maximum likelihood. The differences between the methods remain when more taxa are considered. In the five-taxon case, {a, b, c, d, e}, there are 45 = 1024 different possible data columns that can, under the Jukes-Cantor model, be reduced to 51 different data patterns. For example, for pattern XXYYZ, seven topologies have the same maximized likelihood, all of them obtained on a border where at least one internal branch is of length 0; i.e., the tree collapse (see Appendix 2 for an analogous case with four taxa). Some of the topologies collapse to the same, not fully resolved, tree. For this specific pattern, four different collapsed trees are obtained. The groups {a, b} and {c,d} exist in three of them, respectively. The tree where the group is absent does not contradict the group. Hence, with the method of maximum likelihood, this pattern is not informative for topology but is informative for some groups. The method of maximum posterior probability, on the other hand, gives a unique estimate of the topology, (a, b, (d, (e, /))), and hence also of groups. Numerically, by using PAUP*v40bl0 and MrBayes3.04 we have found that none of the 51 patterns gives a unique estimate of the topology with maximum likelihood but 15 patterns give fully resolved trees with maximum posterior probability. This is also the case for the 187 different data patterns of the six-taxon case. The posterior probabilities of groups are, for six taxa and more, also strongly influenced by unequal priors for groups that follow a flat prior for topologies (Pickett and Randle, 2005), which may further explain the observed disparities between bootstrap values and posterior probabilities. The fundamental differences between the methods of maximum likelihood and maximum posterior probabilities that we have demonstrated influence the regions of the n space where the estimates equal a specific topology. Those regions influence the bootstrap support values and posterior probabilities differently, and they should therefore not be expected to be approximately equal. How much of the observed differences between maximum likelihood bootstrap values and posterior probabilities that can be attributed to this factor is not clear but need to be further studied. ACKNO WLEDG EMENTS We are grateful to Roderic D. M. Page, Jeff Thorne, and an anonymous reviewer for constructive criticism of an earlier version of this manuscript. This work was supported by the Linnaeus Centre for Bioinformatics, Uppsala University, and the Swedish Research Council. 119 REFERENCES Alfaro, M. E., S. Zoller, and F. Lutzoni. 2003. Mol. Biol. Evol. 20:255266. Berger, J. 0.1980. Statistical decision theory, foundations, concepts and methods. Springer-Veriag, New York. Bernardo, J. M., and A. F. M. Smith. 1994. Bayesian theory. John Wiley & Sons, New York. Box, G. E., and G. C. Tiao. 1973. Bayesian inference in statistical analysis. Addison-Wesley Publishing Company. Cummings, M. P., S. A. Handley, D. S. Myers, D. L. Reed, A. Rokas, and K. Winka. 2003. Syst. Biol. 52:477-487. Douady, C. J., F. Delsuc, Y. Boucher, W. F. Doolittle, and E. J. P. Douzery. 2003. Mol. Biol. Evol. 20:248-254. Efron, B. 1982. SIAM CBMS-NSF Monograph 38. Efron, B., E. Halloran, and S. Holmes. 1996. Proc. Natl. Acad. Sci. USA 93:13429-13434. Erixon, P., B. Svennblad, T. Britton, and B. Oxelman. 2003. Syst. Biol. 52:665-673. Felsenstein, J. 1981. J. Mol. Evol. 17:368-376. Felsenstein, J. 1985. Evolution 39:783-791. Huelsenbeck, J. P., and F. Ronquist 2001. Bioinformatics 17:754-755. Jukes, T. H., and C. R. Cantor 1969. Pages 21-132 in Mammalian protein metabolism. (H. N. Munro., ed.). Academic Press, New York. Larget, B., and Simon, D. L. 1999. Mol. Biol. Evol. 16:750-759. Pickett, K. M., and Randle, C. P. 2005. Mol. Phyl. Evol. 34:203-211. Rannala, B., and Yang, Z. 1996. J. Mol. Evol. 43:304-311. Simmons, M. P., Pickett, K. M., and Miya, M. 2004. Mol. Biol. Evol. 21:188-199. Steel, M. 1994. Syst. Biol. 43:560-564. Suzuki, Y, Glazko, G. V., and Nei, M. 2002. Proc. Natl. Acad. Sci. USA 99:16138-16143. Swofford, D. L. 2002. PAUP*: Phylogenetic analysis using parsimony (*and other methods), version 4.0bl0. Sinauer Associates, Sunderland, Massachusetts. Wilcox, T. P., Zwickl D. J., Heath T. A., and Hillis, D. M., 2002. Mol. Phylogenet. Evol. 25:361-371. First submitted 1 February 2005; reviews returned 6 April 2005; final acceptance 11 August 2005 Associate Editor: Rod Page APPENDIX 1 In Efron et al. (1996), an example where the data x are a twodimensional normal vector with expectation vector \x = {\x\, 1x2) and with identity covariance matrix is used. It is noted that if [i a priori could be anywhere in the plane with equal probability, the a posteriori distribution of [i, given p, = x, is also normally distributed with expectation vector p, and identity covariance matrix, exactly the same as the bootstrap distribution of fi , using parametric bootstrap. Referring to this example it is further stated that "The bootstrap probability that f * = f is almost the same as the aposteriori probability that r = f starting from an uninformative prior density on p", where f * is the estimate of the topology for a bootstrap replicate and p is the vector of proportions of the possible data patterns Xi, X 2 ,..., Xj*, where s is the number of taxa and, e.g., X[ = {A, A,..., A), X£ = {C, C,..., C}, etc. We will now show that the statement is true if a noninformative prior is used for p under the assumption that the method of inference gives the correct tree. The data, x, is an s x N data matrix of aligned DNA sequences. This data matrix is equivalent with drawing N columns from the true phylogeny where each possible column Xi, X 2 ,..., Xk has probability Pi, P2, • •-, Pk of being drawn, where k = 4s. The data matrix x can then be represented by a vector n = (n{,... n^) where £ n,; = N and each «, is the number of columns in x that equals X,-. The probability of data or, equivalently, the probability of n, the likelihood, then follows a multinomial distribution. In the nonparametric bootstrap method used in phylogenetic inference (Felsenstein, 1985), a bootstrap sample of the same size as the original data is obtained by drawing columns from the matrix x with replacement. The bootstrap replicate n* = (n*, n*v ..., n*k) follows, regardless of the method of inference used, a multinomial distribution 120 VOL. 55 SYSTEMATIC BIOLOGY with probability function / defined by f{n\,n*2,...n*k) = N (6) ntni---nt where pi = n,/N, with means E(n*) = Np, = nir variances V(n*) = blpi(l - pi) = (M,-(N - Hj))/N and covariances C(ri[, np = -Npipj = FIGURE 2. The continous p-space is J:-dimensional where k = 4s Now consider the Bayesian framework. The prior for p should, and s is the number of taxa. Here this is illustrated to the left in two diaccording to Efron et al. (1996), be noninformative. This term is not mensions with a hypothetical division of the p-space giving subspaces uniquely defined in the literature. For example, Berger (1980) says R,', where r = r,-. To the right the corresponding /?-space is illustrated "a non-informative prior, by which is meant a prior which contains no in-where each dot represents a possible p-vector. The regions R, and R,' formation about the parameter (or more crudely which favors no possiblemay differ as R,' is the true division of the p space, whereas R, is the values of the parameter over others)." Bernardo and Smith (1994) de- region where the estimate is equal to r,. In the region RQ there is no scribe the idea of a non-informative prior as representing ignorance. unique estimate. Jeffreys' rule for multiparameter problems, described, e.g., in Box and Tiao (1973), states that the prior distribution for a set of parameters is approximately noninformative if "the prior distribution for a set of pa- value a could be calculated exactly by (6) as rameters is taken to be proportional to the square root of the determinant of the information matrix," which relies on invariance under parameter (14) / ( " I ' n*2' • • • "*)• transformation. n'eR, The three different "definitions" of non-informative priors for p given above can all be represented by a symmetric Dirichlet Usually, the n-space is very complex and the division of it is theredistribution fore practically impossible to determine. The bootstrap support value is then estimated by the fraction of bootstrap replicates giving the same estimate of topology as the original data when drawing bootstrap repli(7) cates many times (Fig. 2). We have seen that theoretically the posterior distribution and the bootstrap distribution are approximately equal. Let the continuous region where r = r, be denoted R,'. To get the posterior distribution of where 6 = ( 1 , . . . 1) represents a flat (uniform) prior, 6 = (\,. ..\) rep- r,, the posterior distribution of p is integrated over the region R,'. If resents Jeffery's prior, and letting 6 ->• (0,..., 0), as in Efron (1982), R, ~ R,' and N is large, an integral can be approximated by a sum; i.e., represents "prior ignorance." Because the Dirichlet distribution is a conjugate prior to the multinomial distribution, the posterior distribution of p is D{p\6 + n) with (15) n(p\n)dp~ V f{n'\n)=a. I means £(/?,) = rii + 9 (8) N + 9k' variances V(pt) = covariances C(p,, pj) = — ^ ^ ( I ' + N ^ V " ' {n, + 9)(nj + 9) (9) APPENDIX 2 (10) From the means, variances, and covariances following (6) it is seen that for the bootstrap replicate, where p* = n*/N, (11) 3 V(p?) = V(n*/N) = m{N - n,)/N , 3 C(p;, p)) = C(n*/N, nyN) = -n ; n ; /N . Assuming that the method of inference gives the correct tree, (15) makes the bootstrap support value and the Bayesian posterior probability for a given topology theoretically approximately equal. In this section we show that there is no unique estimate for topology for the maximum likelihood method when data consists of one column of pattern 8, XXYZ, and the Jukes-Cantor model of evolution is used. The three possible topologies are then zu r2, and r3 (see Fig. 3) with branch lengths t, b, and v, respectively. Assume the Jukes-Cantor model of evolution, with a = 0.25. The likelihood functions for r2 and r3 will be identical (for r3 replace bt with i>, in the likelihood function for r2). Denote the likelihood function for Ti with L\ and for r2 (and r3) with L2. (12) (13) For very large N and k small, that is for a lot of data on only a few taxa, (8) ~ (11), (9) ~ (12), and (10) ~ (13) for each of the priors suggested. As we assume N to be large, the multivariate central limit theorem makes the bootstrap distribution and the posterior distribution for p approximately equal. Recall that for the data, n, £ n, = N. Each possible M-vector satisfying £ n, = JV, gives an estimate of the topology. The discrete n-space is therefore divided into different regions R,, with f = r,-. There might also be regions where the estimate is not unique. Denote these regions with RQ. The bootstrap support value a is the probability that the bootstrap replicate n* ends up in the same region as n. That is, the probability that the bootstrap replicate gives the same estimate of the topology as the original data set. If the division of the n-space in regions R, (giving estimate f = T,) would be known explicitly, then the bootstrap support We want, in the semiclosed five-dimensional continuous space bounded by 0 < tu 0 < t2,..., 0 < t5, to find a point t such that Lj (r) is maximized. If we would have a closed five-dimensional space, such that 0 < t, < M, i = 1 , . . . , 5 for some positive M, there would be four possibilities for a maximum point t (shown in Fig. 4 for a threedimensional case): 1. t is in the interior of the closed space, 2. t is in a boundary region of the space, 2006 SVENNBLAD ET AL.—LIKELIHOOD-BASED METHODS IN PHYLOGENETICS 121 V 5 X FIGURE 3. 2 The possible topologies ti, T2, and r3 with branches t, b, and v, respectively, when data consist of one site: {XXYZ}. 3. t is on a border between the different boundary regions, 4. t is in a corner of the closed space. To find the maximum value of L] in a closed space, find, to begin with, possible extreme points in the interior of the closed space (case 1). There is no need to further investigate the kind of possible extreme points but evaluate L\ in those points. Then investigate the boundary regions (case 2) and evaluate Lj in the possible extreme points of these regions. Continue in the same way with case 3 and finally evaluate L\ in the corners of the closed space (case 4). The maximum value of L\ in the closed continuous space is the maximum value of the evaluated points. If, in the five-dimensional closed space, t is an interior point, the tree will be fully resolved. If t is a boundary point with some t, = 0, the tree will collapse. If there exists an interior t that is extreme, i.e., maximizes or minimizes L, or is a so-called saddlepoint, the gradient VLi = a'i,, L'1(5) = (0,..., 0). From L'h - L'^ = 0 and L'lfj - L'1(i = 0 it is found that ^ = t2 and t3 = t4. From L'1( — L\ = 0, and by using tx = fe and t3 =fc»,it also follows that e~* = (-3e"f>(l + e ^ ) ) / ^ " ' 3 ) < 0, which is impossible because 0 < e"*5 < 1 in the interior of the compact space. Therefore, t\ has no interior extreme point (case 1). Continue with cases 2 and 3. In the closed five-dimensional space the boundary regions to be investigated are obtained by letting combinations of variables be fixed, 0 or M. At least one variable must not be fixed as otherwise a corner is investigated (case 4). When a variable is fixed to 0, or M, the likelihood function looks slightly different and an extreme point is searched for in the usual way. In the five-dimensional space there are 210 boundary regions and borders to be investigated. By investigating all of them it is found that only three of the regions have a possible interior extreme point, no matter how large M is. Evaluating L\ at those possible extreme points gives L\ < 4~4 for all three of them. The last possibility for a maximum point is at the corners of the closed space (case 4); i.e., where U is equal to 0 or M. For 14 of the 32 FIGURE 4. The possible locations of a maximum point {x, y, z} in a closed three-dimensional space: 1 an interior point, 2 a point on a boundary region, 3 a point on the border between different boundary regions, and 4 a corner, where each variable is either 0 or M. possible corners, L i = 0, for the others, L i < 4~3. As long as we consider finite M, the maximum point for ^ is in the corner (0, 0, M, M, M) for which L, = [1 — 3e~m + 2e~3M]/43. As M increases, the value of L} in that corner tends to 4"3, but there are other corners with the same limit value. Hence the maximum value of the likelihood will not be a unique point in the space of branch lengths. The fact that this can happen was pointed out by Steel (1994). Now, consider topologies r2 (and T3). The likelihood L2 is L2(b) = — [1 — + 2e~ ( i > 1 + ! ) 2 + b 4 + f ? 5 ) Analyzing L2 in the same way as L\ gives a unique maximum at the corner b = (0, M, 0, M, 0), where L2 = (1 - 2e~M + e"2M)/43, giving max L2{b) = 4"3 as M -*• oo, which is the same as the maximum value of Li. Hence, pattern XXYZ is not separately informative for the maximum likelihood method. In Figure 5 the maximized likelihoods for TI and r2 are shown as a function of M. As can be seen, the maximized likelihoods are indistinguishable for large but finite M. Maximized likelihood FIGURE 5. The maximized likelihood for topology r, and r2 (which has the same likelihood function as r3) for pattern XXYZ. The likelihood for Ti is maximized for branch lengths (0, 0, M, M, M) whereas the likelihood for r2 is maximized for branch lengths (0, M, 0, M, 0).
© Copyright 2026 Paperzz