Phylogeneticmethods Lecture2.1 Algorithmbased Optimality criterion Distance-based methods Maximum parsimony Distance-based methods Maximum likelihood BayesianPhylogeneticAnalysis Noexplicit substitution model SimonHo A G C T Other Bayesian inference 2 Bayesianphylogeneticanalysis • Bayesianphylogenetic analysiswasdeveloped inthemid1990s • Nowoneofthemostwidely usedmethods Bayesianphylogeneticanalysis Maximum likelihood Given brownbear cavebear blackbear giantpanda Probabilityof? A G brownbear cavebear + C T blackbear giantpanda CGTTAGTACACT CGATAGTTCACT CGTTAGTTTACC CATTGGTTTACT Bayesianinference Given Probabilityof? brownbear cavebear blackbear giantpanda 3 A G brownbear cavebear + C T blackbear giantpanda CGTTAGTACACT CGATAGTTCACT CGTTAGTTTACC CATTGGTTTACT 4 TheBayesianparadigm • Contrast withfrequentist statistics(likelihood) • Parameters havedistributions • Simpleexample brownbear cavebear Before thedataareobserved, eachparameter hasaprior distribution • Thelikelihood ofthedataiscomputed • Theprior distribution iscombined withthelikelihood toyield theposteriordistribution blackbear giantpanda 5 Simpleexample brownbear cavebear blackbear giantpanda AGT AGA CGA CAA 6 Simpleexample AGTAAG AGAAAG CGAAAG CAAAGG brownbear cavebear blackbear giantpanda 7 AGTAAGTACACA AGAAAGATCACA CGAAAGATTACC CAAAGGATTACA 8 Bayesianinference Prior Specifiedbyuser, independent ofdata Pr(θ |D)= Posterior Thisiswhatwe wanttoestimate Priors Likelihood Calculatedfromdata • Priorsarechosenintheformofprobability distributions • Reflect ourpriorexpectations (anduncertainty) aboutvaluesof parameters (without knowledge ofthedata) Pr(θ)Pr(D |θ) Pr(D) • Pastobservations • Personalbeliefs • Useofabiologicalmodel normalisingconstant marginallikelihoodofthedata modellikelihood 9 10 Priors Treeprior:Amongspecies 1. Useaflatpriorfortreetopology (MrBayes) • Alltreeshaveequalprobability • Alsoneedapriorforbranchlengthsornodetimes • • 2. Useabiologicalmodeltogenerate prior distribution (BEAST andMrBayes) • Amongspecies:speciationmodel • Withinspecies:coalescentmodel Treeshapedescribed bya stochastic branching process Yuleprocess brownbear cavebear blackbear giantpanda • Lineagessplitataconstantrate brownbear • Simulatesspeciationprocess cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda 11 12 Estimatingtheposterior MarkovChainMonteCarloSampling • Impossibletoobtaintheposterior directly • Instead,theposterior canbeestimatedusing Markovchain MonteCarlosimulation • Thisisusuallydoneusingthe Metropolis-Hastingsalgorithm Nicholas Metropolis Los Alamos, 1953 14 probability MCMCsimulation probability MCMCsimulation proposea change randomstartingtree brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda 15 brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda 16 MCMCsimulation probability probability MCMCsimulation alwaysaccept changesthatincrease probability brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda usuallyaccept changesthatslightly reduceprobability brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda 17 18 MCMCsimulation probability probability MCMCsimulation usuallyreject changesthatgreatly reduceprobability brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda takemanysteps(millions) brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda 19 brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda 20 Metropolis-coupledMCMC SamplesfromtheMCMC • probability Successivechainsare incrementallyheated brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda OutputfromaBayesianphylogenetic analysis: • AlistoftheparametervaluesvisitedbytheMarkovchain (.pfileinMrBayes, .logfileinBEAST) • AlistofthetreesvisitedbytheMarkovchain (.tfileinMrBayes, .treesfileinBEAST) brownbear cavebear blackbear giantpanda 21 22 SamplesfromtheMCMC SamplesfromtheMCMC Stationaryphase Numberofsamples Probability Stationaryphase Burn-in phase Valueofparameter MCMCsteps(millions) 23 UsingtheMetropolis-Hastings algorithm, wesampleparameter valuesandtreesfromtheMCMC withafrequency proportional to theirposteriorprobability 24 SamplesfromtheMCMC SamplesfromtheMCMC brownbear cavebear giantpanda blackbear brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda Numberofsamples Mean • Takethemeanofthesampled values Meanposteriorestimate • Takethecentral95%ofthe sampledvalues 95%credibilityinterval Valueofparameter 25 SamplesfromtheMCMC brownbear cavebear giantpanda blackbear brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda brownbear cavebear giantpanda blackbear brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda blackbear cavebear giantpanda brownbear 0.90 0.60 brownbear cavebear blackbear giantpanda brownbear cavebear giantpanda blackbear brownbear cavebear blackbear giantpanda brownbear cavebear blackbear giantpanda blackbear cavebear giantpanda brownbear 0.90 brownbear cavebear 26 SamplesfromtheMCMC brownbear • Majority-ruleconsensus tree(MrBayes) Showsallnodeswithposterior probability >0.50 • Maximum aposteriori(MAP) tree Sampled treewithhighestposterior probability • Maximum cladecredibility(MCC)tree(BEAST/TreeAnnotator) Sampled treewithhighestsumorproduct ofposterior node probabilities cavebear blackbear giantpanda 27 28 Diagnostics 1. 2. Convergence Convergence • Arewedrawingsamples from thestationarydistribution? Runatleast2independent chains • Posterior probabilities andlikelihoods shouldbesimilar Sufficientsampling • Modelparameters Havewedrawnenoughsamples toallowareliableestimateof theposteriordistribution? • Poor convergence Good • Poor mixing Checkifestimatesofmodelparametersaresimilarbetween runs Treetopologies • Monitorstandarddeviationofsplitfrequencies(SDSF) • Astreesamplesbecomemoresimilarbetween runs,SDSFà 0 30 29 Sufficientsampling • Effectivesamplesize(ESS) Havewedrawnenough independent samplestoproduce a reliable estimateoftheposterior distribution? • ESSispreferably >200foreachparameter • ESScanbeincreased by: • IncreasingthelengthoftheMCMC • Decreasing thefrequency ofsamplingfromtheMCMC • Modifying theMCMCproposals AdvantagesandProblems Maintainanacceptancerateof15–70% 31 Advantages Nuisanceparameters • Abletoimplementhighlyparameterisedmodels • Integrate over‘nuisance’ parameters • Estimatingtreeuncertainty isstraightforward • Marginal distribution ofaparameter ofinterest • Canonlydothisindirectly inlikelihood (viabootstrapping) • Posteriorprobabilitieshaveanintuitive interpretation • Canincorporate independent information(intheprior) Tree1 Tree2 Tree3 Branchlengths1 0.10 0.07 0.12 0.29 Branchlengths2 0.05 0.22 0.06 0.33 Branchlengths3 0.05 0.19 0.14 0.38 0.20 0.48 0.32 Jointprobabilities 33 Influenceofpriors • Sensitivity oftheposterior totheprior • Thisproblem canoccur ifthedataareuninformative, theprior isstrong,orboth Marginal probabilities Ronquistetal.(2009)ThePhylogeneticHandbook 34 Problems:Inflatedsupportvalues? 35 Chatterjee et al. (2009) BMC Evol Biol 36 BEAST 1 • BayesianEvolutionary AnalysisbySamplingTrees • Analysepopulation- orspecies-level data • Simultaneous estimationoftree andnodetimes • Rangeofclock models • Rangeoftreepriors anddemographic models • Re-write ofBEASTtoincrease modularity • UserscanextendBEASTbyaddingpackages • Lackssomeoftheclock modelsanddemographic modelsfrom BEAST1 • Additional treepriors notavailable inBEAST1 • Capacity toperform simulations Foracomparison ofBEAST1and2: http://beast2.org/beast-features/ 37 38 MrBayes RevBayes • Primarily designed forspecies-level data • UsesitsownR-likelanguage,Rev • Simultaneous estimationoftree andnodetimes • Interactive construction ofgraphical model • Rangeofclock models • Flexible andcanbeusedforsimulationandinference • Rangeoftreepriors • Ongoingdevelopment • Multiple chainsandMCMC diagnostics 39 40 Usefulreferences • Analysesoflargedatasetsoncomputing clusters • Available priorssimilartothoseinolder versionsofMrBayes • Limitedoptions, nomolecular dating • Likelihood component adapted fromRAxML 41 42
© Copyright 2026 Paperzz