Phylogenetics and comparative methods
Dating phylogenetic trees
May 8th, 2015

Molecular phylogenetic trees
Reconstruction methods estimate
• unrooted trees
• with branch lengths in genetic change units
Ideal phylogenetic trees
• are rooted
• have branch lengths in units of time

Example of a dated tree

Dating molecular trees
Dating phylogenetic trees is challenging. The basic idea is that given
• a tree topology
• branch length = $r \cdot t$ (rate times time)
• calibration point(s), e.g. a time estimate for a node ($T$)
Can we date all the other nodes in the tree?

A constant evolutionary rate
• To obtain a calibrated tree, the evolutionary model must assume a relationship between the accumulation of genetic diversity and time
• Zuckerkandl and Pauling (1962): the rate of amino acid replacements in animal haemoglobins was roughly proportional to real time, as judged against the fossil record.

Global molecular clock
Under a global molecular clock, the rate of mutation $r$ in each lineage is the same and constant over time. If that is the case, branch lengths are simply proportional to time.
There are several reasons why we want to believe in the existence of a constant global molecular clock
• phylogenetic inference is much simpler when constant rates of evolution can be assumed
• a constant clock makes it possible to estimate divergence times, and to date specific events (e.g. migration, hybridisation, host switch, ...)
• the molecular clock relates to Kimura's theory of neutral evolution

Evolutionary time with a clock
• estimate pairwise genetic distance: $d$ = genetic distance
• paleontological data to determine date of common ancestor: $T$ = time since divergence
• estimate calibration rate (number of substitutions per unit of time): $r = d_{ac} / (2 T_{ac})$
• calculate time of divergence for all other nodes: $T_{ab} = d_{ab} / (2r)$

Strict molecular clock
Substitutions occur randomly according to a Poisson process
$P(N(t + \delta_t) - N(t) = k) = \frac{e^{-\lambda \delta_t} (\lambda \delta_t)^k}{k!}, \quad k = 0, 1, \ldots$
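A minimal Python sketch of the Poisson formula above. It assumes a hypothetical rate of one substitution per million years per lineage, so that a lineage 15 my old expects 15 substitutions. The rate, the function names and the chosen ages are illustrative assumptions, not values taken from the slides' data.

```python
import math


def poisson_pmf(k, lam, dt):
    """P(N(t + dt) - N(t) = k) for a Poisson process with rate lam."""
    mean = lam * dt
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_between(k_min, k_max, lam, dt):
    """Probability that the number of substitutions lies in [k_min, k_max]."""
    return sum(poisson_pmf(k, lam, dt) for k in range(k_min, k_max + 1))


# Hypothetical rate: 1 substitution per million years (illustration only).
RATE = 1.0

# Spread of substitution counts for a lineage that is 15 my old.
p_15my = prob_between(8, 22, RATE, 15.0)

# Chance that a much younger lineage (5 my) already shows 8 substitutions.
p_8_at_5my = poisson_pmf(8, RATE, 5.0)

print(f"P(8 <= N <= 22 | 15 my) = {p_15my:.3f}")
print(f"P(N = 8 | 5 my)         = {p_8_at_5my:.3f}")
```

Under this assumed rate, close to 95% of the probability mass for a 15 my old lineage falls between 8 and 22 substitutions, while a count of 8 remains possible for a lineage only 5 my old. A single observed count therefore pins down the age only loosely, even when the clock is perfectly strict.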
Number of mutations occurring per million years, with Poisson variance
• 95% of the lineages 15 my old have between 8 and 22 substitutions
• a lineage with 8 substitutions could also be < 5 my old!

Testing the global clock
• strict molecular clock (Zuckerkandl and Pauling, 1962)
  • all lineages evolve at the same rate
  • allows the estimation of the root of the tree and dates of individual nodes
• unconstrained Felsenstein model (Felsenstein, 1981)
  • each branch has its own rate, independent of all others
  • time and rate are confounded and can only be estimated as a compound parameter (the branch length)

Non-clock phylogenetic tree
• unrooted tree
• $2n - 3$ independent branches
• all of the $b_i$ need to be estimated
• Maximum Likelihood: $L(T, b_i, \theta) = \prod_k \mathrm{Prob}(y_k \mid T, b_i, \theta)$

Clock phylogenetic tree
• rooted tree
• $n - 1$ independent branches
• only the heights of the nodes to estimate
• $b_1 = b_2$, $b_3 = b_4$, $b_6 = b_5 + b_1$, $b_8 = b_7 + b_6 - b_3$

Likelihood ratio test
Alternative model $H_1$: $2n - 3$ parameters
Null model $H_0$: $n - 1$ parameters
• likelihood ratio test with $n - 2$ degrees of freedom
• test statistic: $2 (\ln L(H_1) - \ln L(H_0))$, compared against a $\chi^2$ distribution

Relative rate tests
The molecular clock test is a very strict statistical test, because the clock can be rejected even if a single lineage differs from the others. The idea is to detect such lineages and remove them from the tree
• compare two ingroup lineages for their distance to a single outgroup
• can be modified to test multiple lineages
  • for each non-root node
  • test if the two descendants of a node have the same branch length
• remove lineages that show significant deviation from the clock hypothesis
• this creates a "linearized tree"

Solutions to molecular clocks
If the previous tests show deviation from the clock hypothesis, removing taxa might not be ideal. So how can we try to deal with that?
• if rate variation is random in direction and magnitude in different DNA regions, combining a large number of data sets might give reasonable estimates of divergence times
• but grasses have higher rates in plastid, nuclear and mitochondrial genes
• assume that all genes share common divergence times, but allow the pattern of rate variation to differ among genes

Local molecular clocks
If the global molecular clock does not hold, we could try to fit local clocks.
• postulate some small number $k > 1$ of fixed but different rates for sets of branches
• some models are not identifiable: they do not permit unambiguous estimation of times and rates
• combinatorially huge number of ways to assign a small number of rates on a large tree
• tests for rate differences at each node to identify subtrees with common rates

Rate of the rate of evolution
The molecular clock assumption can be relaxed by imposing a weaker constraint that is still sufficient to allow estimation of divergence times
• we need to come up with a way to model changes of evolutionary rates through time
• a tractable solution is to use temporal autocorrelation to model rate changes among lineages

Penalized likelihood
Modelling rate autocorrelation through lineages
• $f(\theta_{SAT} \mid x_1, \ldots, x_n) = \prod_k f(x_k \mid r_k [t_{anc(k)} - t_k])$
• a "smoothing" parameter $\lambda$ is introduced and can be tuned to allow greater or lesser rate smoothing:
• $f(\theta_{SAT} \mid x_1, \ldots, x_n) - \lambda \Theta(r_1, \ldots, r_n)$
• $\Theta(r_1, \ldots, r_n)$ can simply be a penalty on rate differences between branches
• constant rate: $\lambda = \infty$
• each branch has its own rate: $\lambda = 0$
• cross-validation to estimate the smoothing parameter
• allows constraints to be added to the model in the form of known dates
Sanderson, 2002

Bayesian approach
Modeling rate evolution using a hierarchical Bayesian setting:
$f(r, a, \theta_r, \theta_a, \theta_s \mid X) = \frac{f(X \mid r, a, \theta_s) \, f(r \mid \theta_r) \, f(a \mid \theta_a) \, f(\theta_s) \, f(\theta_r) \, f(\theta_a)}{f(X)}$
where
• $r$ = substitution rates for branches, $a$ = ages of interior nodes
• $\theta_r$ = lineage-specific rate variation
• $\theta_a$ = model of branching times
• $\theta_s$ = model of sequence evolution
• MCMC to integrate over all possible rate assignments
• gives credibility intervals around the rate estimates for each branch and around the obtained dates
• allows constraints to be added to the model in the form of known dates

Correlated relaxed clock

Uncorrelated relaxed clock

Divergence time influences
When calibrating the divergence times of some internal nodes, the tree prior is constructed in BEAST using three main ingredients:
1. One or more "calibration densities"
2. A parametric "tree prior" that specifies a density on the topology and all the divergence times of the tree
3. Zero or more additional constraints on the topology, in the form of subsets of taxa that are constrained to be monophyletic

Fossils

Phylogeography calibration
Volcanic islands are nice... but rare
Fleischer et al. 1998

Other possibilities
Sauquet et al. 2012

Calibration points
Heath et al. 2012

Effects of calibrations I
Sauquet et al. 2012

Effects of calibrations II
Sauquet et al. 2012

Effects of substitution models
Brandley et al. 2012
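
To make the effect of a calibration concrete, the strict-clock arithmetic from the "Evolutionary time with a clock" slide (rate from one calibrated node, then dates for the other nodes) can be sketched in a few lines of Python. All distances and calibration ages below are made-up illustrative values, not data from any of the cited studies, and the function names are hypothetical.

```python
def calibration_rate(d_ac, t_ac):
    """r = d_ac / (2 * T_ac): substitutions per site per unit of time.

    The factor 2 accounts for change accumulating along both lineages
    since their common ancestor.
    """
    return d_ac / (2.0 * t_ac)


def divergence_time(d_ab, r):
    """T_ab = d_ab / (2 * r): age of the common ancestor of a and b."""
    return d_ab / (2.0 * r)


# Hypothetical inputs (illustration only).
d_ac = 0.30   # pairwise distance across the calibrated node
d_ab = 0.10   # pairwise distance across the node we want to date

# Calibration A: fossil places the a/c ancestor at 60 my.
r_old = calibration_rate(d_ac, 60.0)
t_ab_old = divergence_time(d_ab, r_old)

# Calibration B: same distances, but the ancestor is placed at 30 my.
r_young = calibration_rate(d_ac, 30.0)
t_ab_young = divergence_time(d_ab, r_young)

print(f"calibration 60 my -> rate {r_old:.4f}, T_ab = {t_ab_old:.1f} my")
print(f"calibration 30 my -> rate {r_young:.4f}, T_ab = {t_ab_young:.1f} my")
```

Under a strict clock every inferred date scales linearly with the calibration age, so halving the assumed age of the calibrated node halves the estimate for the other node. This is one simple way to see why the choice and placement of calibration points matters so much for the resulting dates.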