Searching tree space. Tree confidence and comparison. Introduction to Bayesian methods.

How do we search tree space?
Search algorithms find the best tree for the data.
Two methods are guaranteed to find the globally best tree:
Exhaustive search: evaluate every single tree.
Branch and bound: rejecting one tree usually means that a whole set of related trees can be rejected too, so the algorithm is much faster than exhaustive search.

Searching methods for ML and MP trees
(Figures from Felsenstein's Inferring Phylogenies.)

How many possible trees?
Too many for computational tractability.

Global vs. heuristic search
Most data sets are large, so there is no possibility of exploring all trees.
Currently few packages even offer the global search options (the exception is PAUP4).
Goal: balance a thorough search against tractability and speed.

Heuristic search strategies (branch swapping)
1. NNI: nearest neighbor interchange. Dissolve an interior branch and form each alternative arrangement of the four subtrees (S, T, U, V) around it.
2. SPR: subtree pruning and regrafting. Break a subtree off and reconnect its "root" somewhere else (on any other branch).
3. TBR: tree bisection and reconnection. Break a branch to split the tree into two, then reattach the pieces using a different branch.

Which branch swapping to use?
NNI --> SPR --> TBR
NNI and SPR rearrangements are subsets of TBR.
Increasingly accurate: TBR is best.
Decreasingly fast: TBR is slowest.
PAUP has all three methods; PhyML has just NNI and SPR.
But all can get stuck on local optima.

Traveling through tree space
Heuristic searches travel through tree space by accepting increasingly better trees (hill climbing) toward the maximum likelihood tree or the maximum parsimony tree.
The landscape has local maxima as well as the global maximum, and all of these methods can get stuck on local optima.
The starting tree influences the outcome: if you start on the slope of one hill, you end up at that hill's peak, which may or may not be the global maximum.

Two effective methods
Sequential addition of taxa.
Felsenstein's recommendation: list the best-resolved taxa first, then add increasingly unresolved relationships.

Sequential addition of taxa
Best known first.
Requires knowledge of the biology (oh dear!).
Must know what the question is.
Must limit the question that you are asking.

Or use stepwise addition with multiple starting points.
Start at multiple random trees to increase the chance of covering tree space; this gives a better chance of reaching the global maximum.
Fastest: start with an MP tree or an NJ distance tree. Use a quick method and hope it lands near the global maximum. No guarantees, but usually OK.

Tree confidence measures: branch support and tree comparisons

Bootstrap (nonparametric)
Sequences are resampled with replacement 1000 times (at least).
Search for the best tree for each of the 1000 replicate alignments.
The bootstrap consensus is the majority-rule consensus of the 1000 trees from the 1000 replicates.
Branches are labeled by % occurrence in the 1000 trees.
Felsenstein, J. 1985. Evolution 39: 783-791.
Invented by Bradley Efron in the 1970s; adapted for phylogenies by Felsenstein in 1985.
Most commonly used measure of tree support.
For MP, distance and ML methods. Unnecessary for Bayesian methods.

The bootstrap (nonparametric)
Used in statistics to obtain confidence levels when the data distribution is unknown (Efron, 1979).
E.g. suppose the data are not normally distributed. Then a pseudosample provides an estimate of the variation.

The bootstrap replicate
The bootstrap samples the sites along the sequence alignment with replacement, so each replicate has the same length as the original alignment.
Some sites are randomly sampled multiple times, while others are randomly omitted.
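The resampling step described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular package: the function name `bootstrap_alignment`, the dictionary layout and the toy sequences are all assumptions made for the example.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    Sites (columns) are drawn with replacement, and every taxon receives the
    same drawn columns, so the replicate has the original alignment length.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols)
            for taxon, seq in alignment.items()}

# Hypothetical toy alignment (three taxa, ten sites), for illustration only.
toy = {"A": "ACGTACGTAC", "B": "ACGTTCGTAC", "C": "ACGAACGTTC"}
rng = random.Random(1)
replicates = [bootstrap_alignment(toy, rng) for _ in range(5)]
```

Each replicate would then be handed to the tree search, giving one best tree per replicate.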
Example: 5 replicate alignments
Each replicate resamples, with replacement, all sites along the alignment (Sample 1, Sample 2, ...). Some sites are sampled twice or more and some not at all.
... and so on to 1000 total samples; then infer one best tree for each replicate alignment.
Notice that it is the sequences that are bootstrapped, not the tree.

Example: 5 replicate sequences
The result is a collection of trees built from the replicated data.
Count the number of times each partition occurs in all the trees.
Any partition that occurs in more than 50% of the trees shows up in the majority-rule consensus tree, the "bootstrap tree".
In the example, 3/5 = 60% of the trees have the E-A clade.

Bootstrap tree
[Figure: bootstrap consensus tree of mammalian taxa (Gulo gulo through dolphin), with bootstrap percentages between 50 and 100 on the branches.]
Pseudosampling the sequence.
Idea: if many sites support a clade, then it will appear in most random replicates.
If just one site supports it, many replicates will lack this site.
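How many replicates lack a given site can be worked out directly: each of the n draws misses that site with probability 1 - 1/n, so the site is absent from a whole replicate with probability (1 - 1/n)^n, which approaches 1/e (about 0.37) as n grows. The short sketch below just evaluates that expression; the function name is an assumption for illustration.

```python
import math

def prob_site_absent(n_sites):
    """Probability that one particular site is never drawn in a bootstrap
    replicate made of n_sites independent draws with replacement."""
    return (1.0 - 1.0 / n_sites) ** n_sites

for n in (10, 100, 1000):
    print(n, round(prob_site_absent(n), 3))
print("limit 1/e:", round(math.exp(-1), 3))
```

So a clade that rests on a single site is missing its only evidence in roughly a third of the replicates, which is why such clades rarely earn high bootstrap values.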
Bootstrap
[Figure: bootstrap consensus tree of mammalian taxa (Gulo gulo through dolphin), with bootstrap percentages on the branches.]
Many clades may be unresolved: all sorts of polytomies. These mean that less than 50% of the bootstrap trees support any particular clade. (Bayesian posterior probabilities have a similar meaning.)

Bootstrap tree
Note the multiple-tests problem: each clade is a separate hypothesis.
Note that high bootstrap support will be misleading if model assumptions are violated.
The bootstrap is not a way to check the fit of the model.
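The percentages on a bootstrap tree are counts of how often each clade occurs among the replicate trees, and a clade below the 50% threshold collapses into a polytomy. The sketch below shows that counting under simplifying assumptions: each tree is represented only as a set of clades (frozensets of taxon names), the five trees are invented for the example, and real programs work with bipartitions of unrooted trees parsed from tree files.

```python
from collections import Counter

def clade_support(trees):
    """Percent of trees that contain each clade.

    trees: list of trees, each given as a set of clades, where a clade is a
    frozenset of taxon names.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: 100.0 * k / len(trees) for clade, k in counts.items()}

def majority_rule(trees):
    """Keep only clades found in more than 50% of the trees."""
    return {c: s for c, s in clade_support(trees).items() if s > 50.0}

# Hypothetical replicate trees: the E-A clade appears in 3 of 5.
EA, BC = frozenset("EA"), frozenset("BC")
ED, AB = frozenset("ED"), frozenset("AB")
trees = [{EA, BC}, {EA, BC}, {EA, BC}, {ED, BC}, {ED, AB}]
consensus = majority_rule(trees)
```

Here the E-A clade (60%) and the B-C clade (80%) survive, while the clades seen in only one or two trees are dropped.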
Bootstrap tree
[Figure: bootstrap consensus tree with bootstrap percentages on the branches.]
Simply an accounting of how many bootstrap replicates support a particular clade.
Branch lengths cannot be represented in the consensus tree itself.

ML tree with bootstrap values
A bootstrap topology need not match the ML topology. E.g. a clade in the ML tree (marked in the figure) is not in the bootstrap consensus tree.
Convention for published trees: a maximum likelihood tree with bootstrap values added in Intaglio or Illustrator or Word (not so great). Programs like FigTree will also display the bootstrap values.

In short
Rare but informative sites are only rarely sampled and so do not show up in all bootstrap trees. Hence the clades supported by just a very few sites will not be resolved.
This is the point: high bootstrap values show that many sites support the clade.

Comparing alternative trees
Do the trees have significantly different tree scores?
Statistical tests compare alternative trees using pairwise site differences.

Kishino-Hasegawa (KH) test
Simply a paired t-test comparing two trees.
Calculate, at each site, the pairwise difference in log likelihood between the two trees.
Sum the differences over all sites.
Calculate the standard error of the pairwise differences (SE).
ΔlnL/SE > 1.96, p ≤ 0.05: significantly different trees (ΔlnL is the summed difference in log likelihood).
Kishino & Hasegawa. 1989. J. Mol. Evol. 29:170-179.

Shimodaira-Hasegawa (SH) test
A newer variant of the KH test that corrects for multiple tests and some bias.
KH should also be corrected for multiple tests (the critical value is for 0.05 / number of trees tested).
For both, use RELL-calculated p-values (resampling estimated log likelihood).
For both, use a one-sided test if the ML tree is one of the trees.
Shimodaira & Hasegawa. 1999. MBE 16(8):1114-1116.
Example: here I compared 10 trees. Four were statistically poorer than the ML tree.

Likelihood vs. Bayesian methods
Likelihood: L = Pr(D|H), the (joint) probability of the data (D) given the hypothesis (H).
H may be a tree or a branch length or a model parameter.
D is the sequence of nucleotides.
Bayesian adds a prior: Pr(H|D) = Pr(D|H) Pr(H) / Pr(D), the probability of the hypothesis (H) given the data (D).
Pr(D|H) is the likelihood, Pr(H) is the prior, and Pr(D) is the probability of the data over all trees.
The posterior is proportional to the product of the likelihood and the prior.
Typically uses Markov chain Monte Carlo (MCMC) to search tree space.

Bayesian analysis and MCMC
Markov chain Monte Carlo combines parameter estimation with the tree search algorithm (it integrates over tree and parameter space).
A conventional likelihood tree search, by contrast, is conditioned on parameters estimated earlier on a pretty good tree (it searches tree space only).
Bayesian: simultaneously estimates parameters and trees; the model parameters ω are allowed to vary.
Maximum likelihood: the model parameters ω are estimated first and then fixed, and then the tree is estimated.

Markov chain Monte Carlo (MCMC)
Simulates a walk through parameter and tree space.
Analogous to the maximum likelihood heuristic search: "hill climbing" through tree space to find the highest-likelihood tree.
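The relationship Pr(H|D) = Pr(D|H) Pr(H) / Pr(D) can be made concrete for a handful of candidate trees. The sketch below is an illustration only: the three log likelihoods and the flat prior are invented numbers, and in a real analysis Pr(D) sums over all trees and parameter values, which is exactly why MCMC is used instead of direct calculation.

```python
import math

def posterior(log_likelihoods, priors):
    """Posterior probabilities over a finite set of hypotheses.

    log_likelihoods: ln Pr(D|H) for each hypothesis.
    priors: Pr(H) for each hypothesis (should sum to 1).
    """
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    m = max(log_joint)                      # subtract the max to avoid underflow
    weights = [math.exp(x - m) for x in log_joint]
    total = sum(weights)                    # proportional to Pr(D)
    return [w / total for w in weights]

# Hypothetical values for three candidate trees.
log_l = [-1000.0, -1001.0, -1005.0]
flat_prior = [1 / 3, 1 / 3, 1 / 3]
post = posterior(log_l, flat_prior)
```

With a flat prior the posterior ranking follows the likelihoods; an informative prior would shift the weights toward the trees it favors.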
Thanks to Mark Holder for portions of the following slides, from the Workshop on Molecular Evolution, Woods Hole, MA, July 2003. (Lewis, Paul: Tuesday's reading.)

MCMC begins with a wander through space.
Avoid getting stuck on local optima.
Early steps are discarded: burn-in.

Metropolis algorithm for MCMC
Propose a new location.
Calculate the height of the new location.
R = new height / old height.
Move with a probability that is a function of R.
Always move if R > 1.
If R < 1, then the probability of the move = R.

MrBayes MCMC search
R = ratio of the new tree height to the present one.
Moves to the next tree if R > 1.
If R < 1, moves with probability R: a slightly lower tree is accepted with high probability (e.g. 0.92), a much lower one with low probability (e.g. 0.03).

The process
Initial tree and model parameters (may be random).
Propose a move, then accept or reject it by the MCMC rules.
Run millions of generations.
Save tree, branch lengths and parameters every k generations.
After n generations, summarize results.

How many generations?
Exploring tree and parameter space: some big steps help explore other locally optimal hills.
Very often: tens of millions of generations.

Correlations in parameter space
Base frequency estimates change as Ts/Tv changes; e.g. the base frequency of C depends on the Ts/Tv ratio.
[Figure: parameter estimates for the base frequencies A, C, G, T (axis 0.00 to 0.60) and ts/tv (axis 0 to 20); a second panel plots the base frequency of C against the Ts/Tv ratio.]
A large step across a narrow, correlated parameter space is risky: a small change in C will send the Ts/Tv ratio right out of the optimal zone.
So we use MCMCMC.

Metropolis-coupled Markov chain Monte Carlo (MCMCMC)
Run at least 4 chains simultaneously.
One chain, the cold chain, explores with relatively short steps.
The others, the heated chains, explore with big steps and cover much more of the tree space.

Advantages of MCMCMC
Cold chain with short steps: better explores parameter space.
Heated chains may miss optimal parameter space but cover tree space more thoroughly.
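The Metropolis rule listed earlier in this section (propose, compute R, always move if R > 1, otherwise move with probability R) can be sketched on a toy one-dimensional surface. Everything here is an assumption for illustration: the two-hill `height` function stands in for the posterior, the proposal is a symmetric uniform step, and a real phylogenetic sampler proposes changes to trees, branch lengths and model parameters instead of a single number.

```python
import math
import random

def metropolis(height, start, n_steps, step_size, rng):
    """Metropolis sampler on a 1-D surface with a symmetric uniform proposal."""
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        r = height(proposal) / height(x)     # R = new height / old height
        if r >= 1 or rng.random() < r:       # always uphill; downhill with prob. R
            x = proposal
        samples.append(x)                    # a rejected move repeats the old x
    return samples

def two_hills(x):
    """Toy surface: a lower hill near -2 and a taller hill near +2."""
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 2) ** 2)

rng = random.Random(0)
chain = metropolis(two_hills, start=-2.0, n_steps=20000, step_size=1.5, rng=rng)
kept = chain[2000:]   # discard the early steps as burn-in
```

Because downhill moves are sometimes accepted, the chain can leave the lower hill and reach the taller one, which a strict hill climber started at -2 would not do.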
Metropolis-coupled Markov chain Monte Carlo: MC3
Run multiple (at least four) chains simultaneously.
The cold chain is the main chain, the one that shows up in the buffer and on the output. With short steps it may miss the globally optimal hill.
Three heated chains take bigger steps across posterior probability hills.
A heated chain sometimes swaps with the cold chain if the hot chain finds better space, so hot chains may become the cold chain.

Chain results:
1 -- [-41631.791] (-43694.786) (-42920.096) (-42782.307) * (-42388.547) [-41306.253] (-43688.544) (-42883.304)
1000 -- (-32120.952) (-31590.257) (-31579.554) [-31096.284] * (-31353.766) (-31437.477) [-31176.966] (-31814.110) -- 0:08:51
Average standard deviation of split frequencies: 0.106151
2000 -- (-30922.429) (-30900.476) (-30861.073) [-30822.676] * [-30826.747] (-30849.901) (-30848.131) (-30874.821) -- 0:07:40
Square brackets mark the cold chain in each run; the asterisk separates the two independent runs.

Convergence checks
Are the standard deviations ≤ 0.01?
Is the trace white noise, with no trend over generations?

MCMC warnings
An apparent plateau may be a local, not the global, optimum.
Failure to run long enough.
ML score is no longer improving.
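The "average standard deviation of split frequencies" reported in the output compares how often each split is sampled in the independent runs; values near zero mean the runs agree. The sketch below shows one way such a statistic could be computed. The split labels and frequencies are invented, and the use of the population standard deviation and a `min_freq` cutoff of 0.1 are assumptions of this example, not a statement of how any particular program implements it.

```python
from statistics import pstdev

def avg_sd_split_freqs(runs, min_freq=0.1):
    """Average, over splits, of the standard deviation of each split's
    frequency across runs.

    runs: list of dicts mapping a split label -> its frequency in that run.
    Splits rarer than min_freq in every run are ignored (assumed cutoff).
    """
    splits = set().union(*runs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:
            sds.append(pstdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0

# Hypothetical split frequencies from two runs.
run1 = {"AB|CDE": 0.90, "ABC|DE": 0.60}
run2 = {"AB|CDE": 0.92, "ABC|DE": 0.50, "AC|BDE": 0.05}
asdsf = avg_sd_split_freqs([run1, run2])
```

In this toy case the two runs still disagree noticeably on one split, so the value sits above the 0.01 target mentioned above and the chains would be run longer.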