Building a Maximum Likelihood Tree
1. Estimate a "pretty good" tree (NJ or parsimony).
2. Use the tree to estimate the various model parameters.
3. Choose the model parameters that have the highest likelihood (lowest -lnL).
4. Search tree space using the optimal model, a good tree search method (NNI, SPR, TBR) and 5-10 random starts.
5. Choose the tree with the highest likelihood (lowest -lnL).
6. Using the same optimal model parameters, run a bootstrap analysis to assess support for individual clades.
7. Compare the likelihoods of alternative tree hypotheses.

What do the likelihood scores mean?
H1: Chimp with human tree: lnL1 = -1405.61
H2: Chimp with gorilla tree: lnL2 = -1408.80
But what do these log likelihoods (lnL) mean?
• Remember the likelihood function: L = Pr(D|H)
• And the likelihood ratio: LR = Pr(D|H1)/Pr(D|H2) = e^(-1405.61)/e^(-1408.80)
• The ratio itself is harder mathematically, and we don't know how the raw likelihood score is distributed, so we can't use it to test the hypothesis statistically.
• But the log of the ratio is easy (it is simply lnL1 - lnL2) and equivalent to the ratio.
• And twice the difference between the two log likelihoods is (asymptotically) chi-square (χ²) distributed when the hypotheses are nested.
From Felsenstein, 2003, "Inferring Phylogenies"

So we use logs to compare tree hypotheses
H1: Chimp with human tree: lnL1 = -1405.61
H2: Chimp with gorilla tree: lnL2 = -1408.80
• Note that the lnL scores are negative. That is because the likelihoods (L) are probabilities and are therefore fractions between 0 and 1. The log of any such fraction is a negative number.
• e.g. 100 = 10^2, so the log of 100 is 2; 1/100 = 1/10^2 = 10^-2, so the log of 1/100 is -2; and the log of 1/1000 is -3.

Recall basic math
• Which number is bigger: 1/100 or 1/1000? » 1/100
• Which number is bigger: -2 or -3? » -2
• So which number is bigger: -1405.61 or -1408.80? » -1405.61
• So which hypothesis has the larger (or highest) likelihood? » Chimp with human

We use the natural log (lnL)
• The log is in our case the natural log: ln, to the base e ≈ 2.718 (instead of to the base 10).
H1: Chimp with human tree: lnL1 = -1405.61
H2: Chimp with gorilla tree: lnL2 = -1408.80
• But the same principle applies: we prefer H1 (the tree with the chimp/human clade).
• NOTE: we often see -lnL1 = 1405.61 and -lnL2 = 1408.80. In that case we want the smallest -lnL because it has the highest likelihood.

How to read Seaview output?
The ln(L) statistic for the tree inferred using the GTR+G model:
PhyML ln(L)=-87280.1 7872 sites GTR 4 rate classes
Which tree has the highest likelihood?
PhyML ln(L)=-97326.8 7872 sites GTR
Homework: Which of the trees built from the six models is the best?

Comparing alternative trees
• Recall ln-likelihood tests of alternative models.
• Do the trees differ significantly?
• Again we calculate pair-wise site differences (conditioned on the best model).

Kishino-Hasegawa (KH) test
Simply a paired t-test comparing two trees.
• Calculate the pair-wise difference in lnL between the two trees at each site.
• Sum the differences over all sites (this sum is ΔlnL).
• Calculate the standard error (SE) of the pairwise differences.
• If ΔlnL/SE > 1.96, then p ≤ 0.05: significantly different trees.
Kishino & Hasegawa. 1989. J. Mol. Evol. 29:170-179
From Felsenstein, 2003, "Inferring Phylogenies"

Shimodaira-Hasegawa (SH) test
• A newer variant of the KH test that corrects for multiple tests and some bias.
• Should also correct KH for multiple tests (critical value is 0.05 / number of trees tested).
• For both, use RELL-calculated p-values (Resampling Estimated Log Likelihood).
• For both, use a one-sided test if the ML tree is one of the trees being compared.
Shimodaira & Hasegawa. 1999. MBE 16(8):1114-1116

Here I compared 10 trees using PAUP. Four were statistically poorer than the ML tree.

Seaview?
• Does not do these tests.
• PAUP and Phylip and others will perform these tests.
• Some newer variants are in the newest Consel software (Shimodaira, 2008).
• Not used for Bayesian analysis.

Likelihood vs. Bayesian methods
Both use maximum likelihood optimized models and Markov chains.

Likelihood: L = Pr(D|H)
• (Joint) probability of the data (D) given the hypothesis (H).
• H may be a tree or a branch length or a model parameter.
• D is the sequence of nucleotides.

Bayesian methods I
Bayesian adds a prior: Pr(H|D) = Pr(D|H) × Pr(H) / Pr(D)
• Probability of the hypothesis (H) given the data (D).
• The product of the likelihood and the prior, divided by Pr(D).
• Typically uses Markov chain Monte Carlo (MCMC) to search tree space.

Models of sequence evolution
JC        equal probability of change (1 df)
K2P       transition rate ≠ transversion rate (2 df)
HKY       adds unequal nucleotide frequencies (5 df)
GTR       p(A<-->G) ≠ p(A<-->T) ≠ ... ≠ p(T<-->C) (8 df)
HKY+I+G   adds invariant sites (I) + rate heterogeneity (G) (7 df)
GTR+I+G   most complex (10 df)

Bayesian analysis and MCMC
ML: fix parameters. Bayes: marginalize over parameters.
• Markov chain Monte Carlo combines parameter estimation with the tree search algorithm (integrates over tree and parameter space).
• Whereas conventional likelihood tree search is conditioned on parameters estimated from preliminary trees (searches over tree space only).

Bayesian optimization vs. ML optimization
ω represents the model parameters.
• Bayesian optimization: simultaneously optimizes parameters and trees.
• ML optimization: optimizes the tree likelihood over fixed ω.

Bayesian tree search: Markov chain Monte Carlo
Recall ML tree search and tree space?
• ω, the model parameters, are determined first, then the tree is optimized.
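To make the chimp example concrete, here is a minimal Python sketch of the arithmetic. It assumes that H1 and H2 are the only two hypotheses and that they have equal priors; both assumptions, and the function names, are illustrative only and do not come from the slides. It also shows why we work with logs: the raw likelihoods are too small to represent as floating-point numbers.

```python
import math

LNL_H1 = -1405.61  # chimp with human tree (value from the slides)
LNL_H2 = -1408.80  # chimp with gorilla tree (value from the slides)


def log_likelihood_ratio(lnl_a, lnl_b):
    """ln of the likelihood ratio, i.e. the difference of the two lnL scores."""
    return lnl_a - lnl_b


def posterior_two_hypotheses(lnl_a, lnl_b, prior_a=0.5):
    """Pr(A|D) by Bayes' rule, ASSUMING A and B are the only hypotheses."""
    # Subtract the larger lnL first so that exp() does not underflow.
    top = max(lnl_a, lnl_b)
    weight_a = math.exp(lnl_a - top) * prior_a
    weight_b = math.exp(lnl_b - top) * (1.0 - prior_a)
    return weight_a / (weight_a + weight_b)


raw_likelihood = math.exp(LNL_H1)              # underflows to 0.0
delta = log_likelihood_ratio(LNL_H1, LNL_H2)   # about 3.19
posterior_h1 = posterior_two_hypotheses(LNL_H1, LNL_H2)

print("raw likelihood of H1:", raw_likelihood)
print("lnL1 - lnL2:", round(delta, 2))
print("Pr(H1|D) with equal priors:", round(posterior_h1, 3))
```

With only two hypotheses and equal priors, Pr(D) cancels into the normalizing sum, so the posterior depends only on the difference lnL1 - lnL2. A real Bayesian tree analysis has far too many trees and parameters for this direct sum, which is why MCMC is used.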
• Improve hill climbing with NNI, SPR and TBR, and with random starts, for ML trees.

Search method: Markov chain Monte Carlo (MCMC)
• Simulates a walk through parameter and tree space.
• Analogous to the Maximum Likelihood heuristic search: "hill climbing" through tree space to find the highest likelihood tree.
Thanks to Mark Holder for portions of the following slides. From the Workshop on Molecular Evolution, Woods Hole, MA, July, 2003.

Similarly for Bayesian trees: a hill climbing variant
• R = ratio of the new tree height to the present one.
• Always moves to the next tree if R > 1.
• If R < 1, then the probability of the move = R.
• e.g. R = 0.03: moves with low probability. R = 0.92: moves with high probability.

Tree search in MrBayes
• Begins with a wander through space.
• Propose a new location.
• Calculate the height of the new location.
• R = new height / old height.
• Move with a probability that is a function of R. Always move if R > 1.
• Early steps are discarded: burn-in.

Help to avoid local optima? Metropolis-coupled Markov chain Monte Carlo: MCMCMC or MC3
• Run multiple (at least four) chains simultaneously.
• The cold chain is the main chain, the one that shows up in the buffer and on the output. It explores with relatively short steps.
• 3 heated chains take bigger steps across posterior probability hills and cover much more of the tree space.
• A heated chain sometimes swaps with the cold chain if the hot chain finds better space.

Advantages of MCMCMC
• Cold chain with short steps: better explores parameter space.
• Heated chains may miss optimal parameter space but cover tree space more thoroughly.
• Hot chains may become the cold chain.
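The move rule above (always move if R > 1, otherwise move with probability R) can be sketched in a few lines of Python. This is a toy illustration only: the one-dimensional "landscape", the step size and all function names are hypothetical, and it runs a single cold chain without the heated chains of MC3. Heights are kept as logs, so R = exp(new log height - old log height).

```python
import math
import random


def acceptance_probability(log_height_new, log_height_old):
    """Probability of moving: 1 if R >= 1, otherwise R itself."""
    diff = log_height_new - log_height_old   # ln R
    return 1.0 if diff >= 0 else math.exp(diff)


def toy_log_height(x):
    """Hypothetical landscape with a single hill centred on x = 0."""
    return -0.5 * x * x


def run_chain(n_steps, burn_in, start=10.0, step=1.0, seed=1):
    """Random-walk chain; the first `burn_in` steps are discarded."""
    rng = random.Random(seed)
    x = start
    kept = []
    for i in range(n_steps):
        proposal = x + rng.uniform(-step, step)      # propose a new location
        p_move = acceptance_probability(toy_log_height(proposal),
                                        toy_log_height(x))
        if rng.random() < p_move:                    # move with probability p_move
            x = proposal
        if i >= burn_in:                             # keep only post burn-in samples
            kept.append(x)
    return kept


samples = run_chain(n_steps=20000, burn_in=2000)
mean_position = sum(samples) / len(samples)
print("samples kept:", len(samples))
print("mean position after burn-in:", round(mean_position, 2))
```

The chain starts far from the hill (x = 10), climbs during the burn-in, and then wanders around the top, which is the "white noise" pattern we look for in the lnL plot.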
Chain results:
1 -- [-41631.791] (-43694.786) (-42920.096) (-42782.307) * (-42388.547) [-41306.253] (-43688.544) (-42883.304)
1000 -- (-32120.952) (-31590.257) (-31579.554) [-31096.284] * (-31353.766) (-31437.477) [-31176.966] (-31814.110) -- 0:08:51
Average standard deviation of split frequencies: 0.106151
2000 -- (-30922.429) (-30900.476) (-30861.073) [-30822.676] * [-30826.747] (-30849.901) (-30848.131) (-30874.821) -- 0:07:40

Advantages of MCMCMC
• Short steps: may miss globally optimal hill.
• Hot chains may become the cold chain.

Standard deviations ≤ 0.01?
White noise: no trend over generations.
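A small Python sketch for reading one of these progress lines may help. It relies on the MrBayes screen-output convention as I understand it: square brackets mark the cold chain, parentheses mark heated chains, and the asterisk separates the two independent runs. The function name and the returned dictionary layout are hypothetical, and the example line is typed with all minus signs in place.

```python
import re


def parse_chain_line(line):
    """Return (generation, runs) for one MrBayes progress line.

    Each run is a dict with the lnL of every chain and the lnL of the
    cold chain (the value printed in square brackets).
    """
    generation, rest = line.split("--", 1)
    rest = rest.split("--")[0]            # drop the time-remaining field, if any
    runs = []
    for run_text in rest.split("*"):      # '*' separates the independent runs
        chains = re.findall(r"([\[(])(-?\d+\.\d+)[\])]", run_text)
        lnls = [float(value) for _, value in chains]
        cold = [float(value) for bracket, value in chains if bracket == "["]
        runs.append({"lnL": lnls, "cold": cold[0] if cold else None})
    return int(generation), runs


example = ("1000 -- (-32120.952) (-31590.257) (-31579.554) [-31096.284] * "
           "(-31353.766) (-31437.477) [-31176.966] (-31814.110) -- 0:08:51")
generation, runs = parse_chain_line(example)
for number, run in enumerate(runs, start=1):
    print("generation", generation, "run", number, "cold chain lnL:", run["cold"])
```

Collecting the cold-chain lnL at every printed generation gives the numbers needed to draw the lnL-by-generation plot for each run.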
ML score is no longer improving.

Partitioned analysis: cynmix.nex with lots of parameters for mixed data.
The entire MrBayes block for a mixed analysis:

MrBayes > exe cynmix.nex

begin mrbayes;
    outgroup Ibalia;
    charset morphology = 1-166;
    charset molecules = 167-3246;
    charset COI = 167-1244;
    charset COI_1st = 167-1244\3;
    charset COI_2nd = 168-1244\3;
    charset COI_3rd = 169-1244\3;
    charset EF1a = 1245-1611;
    charset EF1a_2nd = 1245-1611\3;
    charset EF1a_3rd = 1246-1611\3;
    charset EF1a_1st = 1247-1611\3;
    charset LWRh = 1612-2092;
    charset LWRh_2nd = 1612-2092\3;
    charset LWRh_3rd = 1613-2092\3;
    charset LWRh_1st = 1614-2092\3;
    charset 28S = 2093-3246;
    charset 28S_Stem = 2160-2267 2361-2401 2489-2528 2539-2565 2577-2647 2671-2760 2768-2827 2848-3194 3220-3246;
    charset 28S_Loop = 2093-2159 2268-2360 2402-2488 2529-2538 2566-2576 2648-2670 2761-2767 2828-2847 3195-3219;
    partition Names = 5: morphology, COI, EF1a, LWRh, 28S;
    partition Nopart = 2: morphology, molecules;
    partition Morph_mito_nucl_ribo = 4: morphology, COI, EF1a LWRh, 28S;
    partition Extreme = 12: morphology, COI_1st, COI_2nd, COI_3rd, EF1a_2nd, EF1a_3rd, EF1a_1st, LWRh_2nd, LWRh_3rd, LWRh_1st, 28S_Stem, 28S_Loop;
end;

begin mrbayes;
    set partition=Names;
    lset applyto=(2,3,4,5) nst=6 rates=invgamma;
    unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all);
    prset ratepr=variable;
end;

Running and summarizing the analysis:

MrBayes > mcmc ngen=50000 samplefreq=50
MrBayes > sump burnin=500
    Post burn-in parameter summary.
MrBayes > sumt burnin=500
    List of taxon bipartitions found in tree file.
MrBayes > comparetree
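The charset lines use two pieces of NEXUS notation: a range such as 167-1244, and a stride such as \3, meaning "every third site starting from the first one given". A minimal Python sketch of how such a definition expands into site numbers follows. The function name is hypothetical and the sketch handles only the forms that appear in the block above.

```python
def expand_charset(spec):
    """Expand a character set definition into a list of site numbers.

    Handles space-separated blocks, 'first-last' ranges and an optional
    backslash stride (every Nth site), as used in the cynmix.nex block.
    """
    sites = []
    for block in spec.split():
        step = 1
        if "\\" in block:
            block, step_text = block.split("\\")
            step = int(step_text)
        if "-" in block:
            first, last = (int(part) for part in block.split("-"))
        else:
            first = last = int(block)
        sites.extend(range(first, last + 1, step))
    return sites


coi = expand_charset("167-1244")
coi_1st = expand_charset(r"167-1244\3")
coi_2nd = expand_charset(r"168-1244\3")
coi_3rd = expand_charset(r"169-1244\3")

print("COI sites:", len(coi))
print("1st, 2nd, 3rd codon positions:", len(coi_1st), len(coi_2nd), len(coi_3rd))
```

The three codon-position sets are disjoint and together cover the whole COI gene, which is what lets the "Extreme" partition give each codon position its own model parameters.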
Majority rule consensus of all trees sampled after the burn-in. Cynmix.nex

Bipartitions (splits) for 2 runs: Cynmix.nex, 4001 samples from 200,000 generations. Converging to stationarity.
List of taxon bipartitions found in tree file:

ID   Partition                          No. of samples   Probability
--------------------------------------------------------------------
 1   .............*..................   4001             1.000
 2   .........................*......   4001             1.000
 3   ............*...................   4001             1.000
 4   .............................*..   4001             1.000
 5   ..........*.....................   4001             1.000
 6   .....*..........................   4001             1.000
 7   ..............*.................   4001             1.000
 8   ..*.............................   4001             1.000
 9   ...............................*   4001             1.000
10   ................*...............   4001             1.000
11   ..............................*.   4001             1.000
12   ...........................*....   4001             1.000
13   .........*......................   4001             1.000
14   ....*...........................   4001             1.000
15   .......*........................   4001             1.000
16   ......*.........................   4001             1.000
17   ....................*...........   4001             1.000
18   .................*..............   4001             1.000
19   ...............*................   4001             1.000
20   ........................*.......   4001             1.000
21   .......................*........   4001             1.000
22   ...........*....................   4001             1.000
23   ..................*.............   4001             1.000
24   ...................*............   4001             1.000
25   ...*............................   4001             1.000
26   .....................*..........   4001             1.000
27   ......................*.........   4001             1.000
28   ..........................*.....   4001             1.000
29   ............................*...   4001             1.000
30   .*..............................   4001             1.000
31   ........*.......................   4001             1.000
32   .*******************************   4001             1.000
33   .........****...................   3988             0.997
34   ..........................***...   3985             0.996
35   ................**..............   3983             0.996
36   ................***.............   3983             0.996
37   .......*********................   3979             0.995
38   ...****.........................   3978             0.994
39   .........................****...   3976             0.994
40   .*.****.........................   3971             0.993
41   .......******...................   3970             0.992
42   .............**.................   3968             0.992
43   ...........................**...   3961             0.990
44   ...**...........................   3960             0.990
45   ...................***..........   3957             0.989
46   ..........***...................   3952             0.988
47   .....**.........................   3952             0.988
48   .......*************************   3948             0.987
49   .......................**.......   3938             0.984
50   ......................***....***   3908             0.977
51   .............................***   3904             0.976
52   ...........**...................   3882             0.970
53   .******.........................   3814             0.953
54   .......*********...******....***   3791             0.948
55   .......**.......................   3635             0.909

lnL plot of both runs over 200,000 generations
• Markov chains run until stationarity is reached.
• The point where the fit is good and does not improve.
• The "top of the hill" in tree/parameter space.
• Detected when the lnL scores plateau.

But has stationarity been reached?
Plot of lnL by generation. Looks pretty patternless. Just as we'd like.
[Plot: lnL by generation for runs 1 and 2. Vertical axis from -26592.29 (bottom) to -26570.79 (top); generation axis labeled at 25000.]
Looks pretty stationary from 200 to 200,000 generations. Except that SE is too big.

199900 -- [-26577.614] (-26610.759) (-26597.502) (-26603.504) * (-26614.970) (-26643.536) (-26600.831) [-26580.881] -- 0:00:10
200000 -- [-26576.965] (-26619.707) (-26590.745) (-26594.843) * (-26606.805) (-26644.199) (-26601.677) [-26577.404] -- 0:00:00
Average standard deviation of split frequencies: 0.020419

Monitor the run to check the standard error of the split frequencies.
• Should be smaller than 0.01 but is 0.020 here.
• Should have run the analysis for longer than 200,000 generations.
• Lots and lots of parameters to estimate here.

What are they?
Standard deviations of split frequencies
• Splits are bipartitions of the taxa that define clades.
• The standard deviation measures the discrepancies between the two runs.
• Gets smaller as the trees for the two runs become more similar.

Monitoring the standard error of the split frequencies
• We look for SE ≤ 0.01. Should it be SE ≤ 0.0001?
• We want the SE to approach 0. As it does, the two runs converge on the same optimal tree.
• SE is a simple and perhaps better diagnostic than lnL plots.

Bivariate plot of clade probabilities
• This graph plots the probabilities of clades found in file 1 (the x-axis) against the probabilities of the same clades found in file 2 (the y-axis).
• Goal: tight fit to the diagonal as the two runs converge to the same tree.
[Plot: clade probability in analysis 1 (x-axis) against clade probability in analysis 2 (y-axis, run two). From Ronquist lecture.]
• e.g. tree 2 prob < 0.75 where tree 1 prob ~ 1.00: quite a lot of variation between runs.
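As a rough illustration of the diagnostic, the following Python sketch computes an average standard deviation of split frequencies for two runs. Several things here are assumptions rather than the MrBayes implementation: the split frequencies are made up, the sample standard deviation (n - 1 denominator) is used, and splits are ignored unless they reach a minimum frequency of 0.10 in at least one run. The function name is hypothetical.

```python
import statistics


def average_sd_split_frequencies(freqs_by_run, min_freq=0.10):
    """Average, over splits, of the standard deviation of each split's
    frequency across runs.

    freqs_by_run: one dict per run, mapping split -> frequency.
    min_freq: ASSUMED cut-off; splits rarer than this in every run are skipped.
    """
    all_splits = set().union(*freqs_by_run)
    deviations = []
    for split in all_splits:
        values = [run.get(split, 0.0) for run in freqs_by_run]
        if max(values) < min_freq:
            continue
        deviations.append(statistics.stdev(values))
    return sum(deviations) / len(deviations) if deviations else 0.0


# Made-up frequencies: one clade is near 1.00 in run 1 but below 0.75 in run 2.
run_one = {"AB|rest": 1.00, "CD|rest": 1.00, "EF|rest": 0.05}
run_two = {"AB|rest": 1.00, "CD|rest": 0.74, "EF|rest": 0.03}

asdsf = average_sd_split_frequencies([run_one, run_two])
print("average SD of split frequencies:", round(asdsf, 4))
```

One clade that disagrees strongly between runs is enough to push the average well above 0.01, which matches the off-diagonal point in the bivariate plot. When the two runs report identical frequencies, the value is 0.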