Molecular dating

Introduction to molecular dating methods

Principles
• Ultrametricity: all descendants of any node are equidistant from that node
• For extant species, branches, in units of time, are ultrametric
[Figure: ultrametric tree with tips A, B, C, D, E, F; axis "Evolutionary branch length", ticks from 100 to 10]
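The ultrametricity principle can be checked mechanically. The sketch below assumes a hypothetical tree encoding that is not part of these slides: a tip is a string, and an internal node is a list of (child, branch_length) pairs. The names tip_depths and is_ultrametric are illustrative only.

```python
def tip_depths(node):
    """Return the distance from `node` to each tip below it."""
    if isinstance(node, str):          # a tip is at distance 0 from itself
        return [0.0]
    depths = []
    for child, length in node:         # internal node: list of (child, length)
        depths.extend(length + d for d in tip_depths(child))
    return depths


def is_ultrametric(root, tol=1e-9):
    """True if all tips are equidistant from the root (within `tol`).

    If every tip is equidistant from the root, the descendants of any
    internal node are also equidistant from that node, so checking the
    root is enough.
    """
    depths = tip_depths(root)
    return max(depths) - min(depths) <= tol


# Hypothetical three-taxon trees (branch lengths in time units).
clock_tree = [("O", 0.2), ([("A", 0.05), ("B", 0.05)], 0.15)]
nonclock_tree = [("O", 0.2), ([("A", 0.08), ("B", 0.05)], 0.15)]
print(is_ultrametric(clock_tree), is_ultrametric(nonclock_tree))
```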
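What is a “molecular clock”?
• Expected number of substitutions/site = rate of change × branch duration
• Example: branch duration = 20 Ma
• Rate = 0.001 sub/site/Ma
• “True” length = 0.001 × 20 = 0.02 sub/site
• Actual length: close to, but generally not exactly, 0.02, because substitution is a stochastic process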
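The difference between the expected and the realized branch length can be illustrated with a small simulation. This is a sketch only: it assumes that the total number of substitutions over all sites is Poisson distributed, and the sequence length of 1000 sites, the seed, and the function names are hypothetical choices, not values from the slides.

```python
import math
import random


def expected_length(rate, duration):
    """Expected substitutions per site = rate x branch duration."""
    return rate * duration


def poisson_draw(mean, rng):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulated_length(rate, duration, n_sites, seed=0):
    """Realized substitutions per site on one branch (assumed Poisson)."""
    rng = random.Random(seed)
    total = poisson_draw(expected_length(rate, duration) * n_sites, rng)
    return total / n_sites


print(expected_length(0.001, 20))         # the "true" length
print(simulated_length(0.001, 20, 1000))  # one realized length
```

Different seeds give different realized lengths scattered around the expected value, which is the source of the error discussed under Issue 1.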
a) All internodes have equal duration
b) All branches have equal rate of substitution
c) All tips are the same number of time units from the root
d) The expected number of substitutions per site is the same for all branches
e) The observed number of substitutions is the same for all descendants of a given node
The molecular clock idea
• First proposed by Zuckerkandl and Pauling (1965) based on haemoglobin data
• If there is the same rate for all branches there will be a linear relationship between sequence distance and time since divergence
• If you know one divergence date then you can calculate others
[Figure: tree with tips A, B, O; plot of percent sequence divergence (y) against time since divergence (x)]
Issue 1: There will be error around the estimates
• Stochastic rate variation
• Uncertainty in dating
[Figure: percent sequence divergence against time since divergence; points x and z, with a range around the estimate]
Issue 2: You need to correct for multiple hits
[Figure: percent sequence divergence against time since divergence]
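The slides do not name a particular correction. As one common choice, the Jukes-Cantor (JC69) formula, d = -(3/4) ln(1 - 4p/3), converts an observed proportion of differing sites p into an estimated number of substitutions per site d. The sketch below assumes that model; the function name is illustrative.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion p.

    Assumes equal base frequencies and equal rates among substitution
    types; p must be below the saturation value 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction grows with divergence: observed vs. corrected distance.
for p in (0.05, 0.30, 0.60):
    print(p, round(jc69_distance(p), 4))
```

The corrected distance always exceeds the observed one for p > 0, and the gap widens as sequences approach saturation, which is why uncorrected divergence plotted against time curves away from a straight line.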
Issue 3: Is evolution clock-like?
[Figure: inferred age against actual age for points x and z, showing the actual relationship]
• Local clock: clade-specific rates
• No clock: rates vary greatly
Why should we expect a clock?
• Under neutral evolution: but that is too fast for most (all?) data sets
• If there is reasonable constancy of population size, mutation rate, and patterns of selection
• We can hope that rates of evolution change slowly and/or rarely
The likelihood approach
• Consider two models of evolution
– The usual model
– The same model but:
  • A root is specified
  • The summed branch lengths from any node to all descendants of that node are the same
• Do a likelihood ratio test

Which is the simpler model? How many degrees of freedom?
• Depends on the number of taxa (n)
• Branch length parameters in the non-clock model = 2n - 3
• Branch length parameters in the clock model = n - 1
• Difference = (2n - 3) - (n - 1) = n - 2
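The test can be sketched in a few lines of Python. The statistic is twice the difference in log-likelihood between the non-clock and clock models, compared with a chi-square distribution with n - 2 degrees of freedom. The log-likelihood values in the usage line are made up for illustration, and chi2_sf is a small standard-library stand-in for a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def clock_lrt(lnl_clock, lnl_nonclock, n_taxa):
    """Likelihood ratio test of the clock (null) against the non-clock model."""
    stat = 2.0 * (lnl_nonclock - lnl_clock)
    df = n_taxa - 2                            # (2n - 3) - (n - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a six-taxon data set.
stat, df, p = clock_lrt(lnl_clock=-1005.0, lnl_nonclock=-1000.0, n_taxa=6)
print(stat, df, round(p, 4))
```

A small p-value means the clock model fits significantly worse than the unconstrained model, so the clock is rejected.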
If a clock model is not rejected
• Calculate rates and then extrapolate from known to unknown pairwise distances
• $D_{OA} = 0.4$; $D_{AB} = 0.1$
• $T_{OA} = 90$; $T_{AB} = (0.1/0.4) \times 90 = 22.5$ Ma
[Figure: tree of O, A, B with branch labels 0.2 (O), 0.05 (A), 0.05 (B), and 0.195; node ages 22.5 and 90]
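The extrapolation is simple proportionality, sketched below with the numbers from the slide. The function names are illustrative, and the per-lineage rate assumes that a pairwise distance accumulates along two lineages since their divergence.

```python
def date_from_calibration(d_calibrated, t_calibrated, d_query):
    """Scale a known divergence time by the ratio of pairwise distances."""
    return d_query / d_calibrated * t_calibrated


def rate_per_lineage(d_pair, t_divergence):
    """Substitutions/site/Ma along one lineage (distance spans two lineages)."""
    return d_pair / (2.0 * t_divergence)


t_ab = date_from_calibration(d_calibrated=0.4, t_calibrated=90.0, d_query=0.1)
print(t_ab)                          # age of the A-B split in Ma
print(rate_per_lineage(0.4, 90.0))   # implied rate under the clock
```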
Should obtain confidence intervals around date estimates
• Look at the curvature of the likelihood surface (can be done with PAML)
• Use bootstrapping (parametric or nonparametric)
– Generate multiple pseudoreplicate data sets
– For each data set calculate relative nodal ages
– Discard the upper and lower 2.5% of the estimates; the remaining range is a 95% interval
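The nonparametric bootstrap steps can be sketched as follows. This is a deliberately simplified illustration: it resamples alignment columns, uses uncorrected pairwise distances, and dates one node by proportionality to a calibrated pair. The input encoding (per-site 0/1 flags marking whether two sequences differ), the toy data, and all names are assumptions made for this example.

```python
import random


def bootstrap_age_interval(diff_cal, diff_query, t_cal, n_reps=500, seed=0):
    """95% percentile interval for a node age from resampled sites.

    diff_cal / diff_query: per-site flags (1 = the two sequences differ)
    for the calibrated pair and the pair to be dated.
    """
    rng = random.Random(seed)
    n_sites = len(diff_cal)
    ages = []
    for _ in range(n_reps):
        # Pseudoreplicate: draw alignment columns with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        d_cal = sum(diff_cal[i] for i in cols) / n_sites
        d_query = sum(diff_query[i] for i in cols) / n_sites
        if d_cal > 0:
            ages.append(d_query / d_cal * t_cal)
    if not ages:
        raise ValueError("no usable pseudoreplicates")
    ages.sort()
    cut = int(0.025 * len(ages))       # discard upper and lower 2.5%
    kept = ages[cut:len(ages) - cut]
    return kept[0], kept[-1]


# Toy data: 1000 sites, 40% differ for the calibrated pair, 10% for the query.
diff_cal = [1] * 400 + [0] * 600
diff_query = [1] * 100 + [0] * 900
low, high = bootstrap_age_interval(diff_cal, diff_query, t_cal=90.0)
print(low, high)
```

A real analysis would re-estimate the tree or branch lengths under a substitution model for each pseudoreplicate; the percentile step at the end is the same.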
Calibrating the tree
• How does one attach a date to an internal node? How old is the fossil? Where does a fossil fit on the tree?
What does that tell us?
[Figure: fossil F (90 Ma) placed on a tree of A, B, O]
• This node is at least 90 Ma
• The lineage leading to F could have been missed

What else?
[Figure: two alternative placements of fossil F on the tree of A, B, O]
• This node is more than 90 Ma
• This node is at least 90 Ma
General issues
• Fossils generally provide only minimal ages
• The age is attached to the node below the lowest place on the tree that the fossil could attach
• Maximal or absolute ages can only be asserted when there are lots of fossil data
• Geological events can sometimes be used to obtain minimal ages
What if a clock is rejected?
• Until recently, three (bad) choices
– Give up on molecular dating
– Go ahead and use molecular dating anyway
– Delete extra-fast or extra-slow taxa
• Now we have other options
– Assume local clocks
– Relaxed clock methods
Local clocks
• Can use likelihood ratio tests to compare to strict clock and non-clock models
• How many parameters?

Non-Parametric Rate-Smoothing (NPRS: Sanderson 1998)
[Figure: node k with ancestral branch a and descendant branches d1 and d2]
• The estimated rate of branch a: $\hat{r}_a = L_a / T_a$ (L = branch length; T = time duration)
• Measure of rate roughness at node k: $R_k = (\hat{r}_a - \hat{r}_{d1})^2 + (\hat{r}_a - \hat{r}_{d2})^2$
• Adjust times so as to minimize overall roughness: $\sum_{k=1}^{n-1} R_k$
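The roughness calculation can be illustrated for a single node. The sketch below is not the published NPRS algorithm: it assumes a hypothetical three-taxon case in which the age of the node above branch a is fixed, both descendants of node k are extant tips, and the age of node k is found by a plain grid search. Branch lengths reuse the earlier O, A, B example.

```python
def roughness(t_k, t_parent, len_a, len_d1, len_d2):
    """R_k for node k of age t_k whose ancestral branch starts at t_parent.

    Rates are branch length / duration; both descendants are tips at age 0.
    """
    r_a = len_a / (t_parent - t_k)
    r_d1 = len_d1 / t_k
    r_d2 = len_d2 / t_k
    return (r_a - r_d1) ** 2 + (r_a - r_d2) ** 2


def smooth_node_age(t_parent, len_a, len_d1, len_d2, steps=10000):
    """Grid search for the node age that minimizes R_k."""
    best_t, best_r = None, float("inf")
    for i in range(1, steps):          # skip 0 and t_parent (zero durations)
        t = t_parent * i / steps
        r = roughness(t, t_parent, len_a, len_d1, len_d2)
        if r < best_r:
            best_t, best_r = t, r
    return best_t, best_r


# Clock-like branch lengths: the smoothest solution recovers the clock date.
age, score = smooth_node_age(t_parent=90.0, len_a=0.15, len_d1=0.05, len_d2=0.05)
print(age, score)
```

With more nodes, the same idea applies to the sum of R_k over all internal nodes, optimized over all unknown node ages at once.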
NPRS
• Uses branch lengths only (ignores raw data)
• Quick and easy to do
• Assumes rate change is smooth
Penalized Likelihood (Sanderson 2001)
• Semi-parametric likelihood approach
• Uses raw data but penalizes the likelihood score by the roughness score, $\sum_{k=1}^{n-1} R_k$, weighted by a smoothness parameter ($\lambda$)
• Selects optimal value of $\lambda$ using cross-validation (pick the value that minimizes the errors made in predicting branch lengths)
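The shape of the penalized objective can be sketched as log-likelihood minus the smoothing parameter times the roughness. The example assumes a Poisson model for the number of substitutions on each branch, which is a simplification chosen here for brevity; the data structures, names, and numbers are hypothetical.

```python
import math


def poisson_loglik(count, mean):
    """Log-probability of `count` substitutions given the expected number."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def penalized_loglik(branches, parent, lam):
    """Log-likelihood minus lam times the summed squared rate differences.

    branches: branch name -> (substitution count, rate, duration)
    parent:   branch name -> name of its ancestral branch
    """
    loglik = sum(poisson_loglik(x, r * t) for x, r, t in branches.values())
    rough = sum((branches[p][1] - branches[c][1]) ** 2
                for c, p in parent.items())
    return loglik - lam * rough


parent = {"d1": "a", "d2": "a"}
# Equal rates on all branches: the penalty is zero whatever lam is.
smooth = {"a": (15, 0.2, 67.5), "d1": (5, 0.2, 22.5), "d2": (5, 0.2, 22.5)}
# A faster rate on d1: the penalty grows with lam.
rough = {"a": (15, 0.2, 67.5), "d1": (5, 0.3, 22.5), "d2": (5, 0.2, 22.5)}
print(penalized_loglik(smooth, parent, 0.0), penalized_loglik(smooth, parent, 10.0))
print(penalized_loglik(rough, parent, 0.0), penalized_loglik(rough, parent, 10.0))
```

A large smoothing parameter pushes the solution toward a strict clock, and a small one lets each branch keep its own rate; cross-validation picks a value between these extremes.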
Penalized Likelihood
• Uses more data than NPRS - more accurate
• More difficult to implement