Maximum Likelihood Evolutionary Trees

Phylogenetics-2
Marek Kimmel (Statistics, Rice)
[email protected]
713 348 5255
Outline
• Distance trees and ultrametric distances
• Existence of a tree given a set of ultrametric
distances
• UPGMA method
• Neighbor Joining method
• Maximum Parsimony (independent reading)
Distance axioms
1. Nonnegativity: $d(x, y) \ge 0$.
2. Nondegeneracy: $d(x, y) = 0 \iff x = y$.
3. Symmetry: $d(x, y) = d(y, x)$.
4. Triangle inequality: $d(x, y) \le d(x, z) + d(z, y)$.
For tree-derived distances:
5. Ultrametricity. For any three points, two distances are
equal and the third is less than these two, e.g.
d(x, y) < d(x, z) = d(z, y)
Ultrametricity
For any 3-subtree, d(x, y) < d(x, z) = d(z, y)
• Distances are tree-derived $\Rightarrow$ all triplets are ultrametric
• If all triplets ultrametric, do the distances uniquely define
a tree?
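The triplet condition above can be tested mechanically on a distance matrix. The following is a minimal sketch, not part of the original slides: the function name `is_ultrametric`, the tolerance `tol`, and both example matrices are assumptions made for illustration. It uses the non-strict form of the condition (the two largest distances in every triplet are equal).

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Return True if, in every triplet, the two largest distances are equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances of the triplet in increasing order
        smallest, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical distances read off a clock-like tree ((A,B),C)
clock = [[0, 2, 6],
         [2, 0, 6],
         [6, 6, 0]]

# Hypothetical tree distances with unequal branch lengths (A: 1, B: 4, C: 5)
skewed = [[0, 5, 6],
          [5, 0, 9],
          [6, 9, 0]]

print(is_ultrametric(clock))   # True
print(is_ultrametric(skewed))  # False
```

The second matrix still comes from a tree, but not from one whose leaves are all equidistant from a root, so the triplet condition fails.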
Proof of tree existence
• Constructive proof, by induction, given a set of nodes $s_1, s_2, \ldots, s_n$ with ultrametric distances.
• First step: construct a tree with 2 species
m-step
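• Suppose a tree has been constructed for the first $m$ species.
• Let $r$ be its root (the old root), and let $S_L$ and $S_R$ be the sets of leaves in its left and right subtrees.
• Pick $x \in S_L$ and $y \in S_R$.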
m+1 - step
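• Take $x$ and $y$ as in the previous slide, together with the new species $s_{m+1}$.
• Suppose $d(s_{m+1}, x) = d(s_{m+1}, y)$ (the other cases are handled similarly).
• Consequently, $d(x, y) < d(s_{m+1}, x) = d(s_{m+1}, y)$.
[Figure: a new root is placed above the old root $r$; a branch of length $a$ joins the new root to $s_{m+1}$ and a branch of length $b$ joins it to $r$; $x \in S_L$ and $y \in S_R$ lie below $r$.]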
Induction
• Choose $x$ and $y$ and define
$a = d(s_{m+1}, x) / 2$
$b = [d(s_{m+1}, x) - d(x, y)] / 2 > 0$ (!?)
• These distances are good for $x$ and $y$; now check any $z$:
$d(z, r) = d(x, y) / 2$ for $z \in S_L \cup S_R$
$\Rightarrow$ in the new tree, the distance from $s_{m+1}$ to $z$ is $a + b + d(x, y) / 2$
$\Rightarrow$ in the new tree, $d(s_{m+1}, x) = a + b + d(x, y) / 2$ (*)
If $z \in S_L$, then $d(x, z) < d(x, y)$
$\Rightarrow d(x, z) < d(x, y) < d(s_{m+1}, x)$, by (*)
$\Rightarrow d(s_{m+1}, z) = d(s_{m+1}, x)$
Remarks
• Similar proofs for the other two cases
• The UPGMA method builds the same trees in a simpler way.
• UPGMA is not good for non-ultrametric distances: the closest nodes do not have to be neighbors.
• Neighbor Joining method is a remedy (to be
continued …).
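For reference, the UPGMA idea mentioned in the remarks can be sketched in a few lines: repeatedly merge the two closest clusters and average their distances to all other clusters, weighted by cluster size. The code below is an illustrative sketch rather than the version in the course text; the function name `upgma`, the nested-tuple tree representation, and the example matrix are assumptions, and ties are broken arbitrarily.

```python
def upgma(names, d):
    """Build a rooted tree from a symmetric distance matrix by UPGMA.

    Returns nested tuples (left, right, height); leaves are plain names.
    """
    n = len(names)
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the two closest clusters
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        ti, ni = clusters.pop(i)
        tj, nj = clusters.pop(j)
        # Distance to each remaining cluster: average weighted by cluster size
        merged = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        # Drop every entry that involves one of the merged clusters
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(merged)
        # The new internal node sits at half the merged distance
        clusters[next_id] = ((ti, tj, dij / 2), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Hypothetical ultrametric distances for the tree ((A,B),(C,D))
names = ["A", "B", "C", "D"]
d = [[0, 2, 8, 8],
     [2, 0, 8, 8],
     [8, 8, 0, 4],
     [8, 8, 4, 0]]
print(upgma(names, d))  # (('A', 'B', 1.0), ('C', 'D', 2.0), 4.0)
```

On ultrametric input the node heights are half the merge distances, matching the heights used in the inductive construction.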
Neighbor-joining distance
$\delta(x, y) = (N - 4)\, d(x, y) - \sum_{z \ne x, y} [d(x, z) + d(y, z)]$
Neighbor-joining “distance” is not a distance, but
it satisfies the following theorem:
Theorem. Suppose $S$ is a set of species and $d$ is a
tree-derived distance on $S$ obtained from an unrooted tree
(so, not necessarily ultrametric). If $x$ and $y$ are such that
$\delta(x, y)$ is minimum, then $x$ and $y$ are neighbors.
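The theorem can be illustrated on a small tree whose closest pair of leaves are not neighbors. In the sketch below, the function name `nj_criterion` and the example are assumptions: the matrix comes from a hypothetical unrooted tree ((A,B),(C,D)) with pendant branches A = 1, B = 4, C = 1, D = 4 and an internal branch of length 1. The code evaluates the neighbor-joining "distance" for every pair and compares its minimizer with the closest pair under d.

```python
from itertools import combinations

def nj_criterion(d):
    """Neighbor-joining 'distance' for every pair, as a dict keyed by (x, y)."""
    n = len(d)
    delta = {}
    for x, y in combinations(range(n), 2):
        # Sum of distances from x and from y to every third leaf z
        s = sum(d[x][z] + d[y][z] for z in range(n) if z not in (x, y))
        delta[(x, y)] = (n - 4) * d[x][y] - s
    return delta

names = ["A", "B", "C", "D"]
d = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]

delta = nj_criterion(d)
# Closest pair under the plain distance d
closest = min(combinations(range(len(d)), 2), key=lambda p: d[p[0]][p[1]])
# Pair minimizing the neighbor-joining criterion (first one in case of ties)
best = min(delta, key=delta.get)

print([names[i] for i in closest])  # ['A', 'C']
print([names[i] for i in best])     # ['A', 'B']
```

Here A and C are closest under d but sit on opposite sides of the internal branch, while the criterion is minimized by the true neighbor pairs (A, B) and (C, D), both with value -24.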
Proof for N = 4
$\delta(i, j) = -d(i, k) - d(i, l) - d(j, k) - d(j, l)$
$\delta(i, k) = -d(i, j) - d(i, l) - d(k, j) - d(k, l)$
$\Rightarrow \delta(i, k) - \delta(i, j) = -d(i, j) - d(k, l) + d(i, k) + d(j, l) > 0$
• In a 4-tree, all leaves have neighbors
• General proof, see the book
• N.-J. Algorithm, see the book
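The sign of the difference in the N = 4 proof can be checked by writing each distance as a sum of branch lengths. The sketch below writes $\delta$ for the neighbor-joining distance and assumes that the 4-tree pairs $i$ with $j$ and $k$ with $l$; the symbols $e_i, e_j, e_k, e_l$ (pendant branches) and $e_0 > 0$ (internal branch) are hypothetical names introduced here, not notation from the slides.

```latex
\begin{align*}
d(i,j) &= e_i + e_j, & d(k,l) &= e_k + e_l,\\
d(i,k) &= e_i + e_0 + e_k, & d(j,l) &= e_j + e_0 + e_l,\\
\delta(i,k) - \delta(i,j) &= d(i,k) + d(j,l) - d(i,j) - d(k,l) = 2e_0 > 0.
\end{align*}
```

The same computation with $l$ in place of $k$ gives $\delta(i,l) - \delta(i,j) = 2e_0 > 0$, so under these assumptions the minimum is attained at a pair of neighbors.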
Gene splitting versus population splitting
Diagram showing that gene splitting (G) usually occurs earlier than population splitting (P)
if the population is genetically polymorphic at time P. The evolutionary history of gene
splitting resulting in the six alleles denoted a-f is shown in solid lines, and population
splitting is shown in broken lines. After Nei (1987).