Chapter 3: Phylogenetics
3.2 Computing Phylogeny
Prof. Yechiam Yemini (YY)
Computer Science Department
Columbia University
Overview
Computing trees
Distance-based techniques
Maximal Parsimony (MP) techniques
Maximum likelihood techniques
This chapter is based on Durbin Chapter 7
Also recommended: The Phylogenetic Handbook, Salemi and Vandamme 2004
Can We Tell Evolution From Homology?
[Figure: a gene family evolving by duplication and speciation; only a partial sample of the homologs 1A, 2A, 3A, 3B, 2B, 1B is observed]
How do we tell the right tree?
[Figure: two alternative phylogenies over the sampled homologs 1A, 3A, 2B]
Phylogeny: Computing Trees
INPUT: five aligned sequences (taxa U, V, W, X, Y):
AGGGCAT, TAGCCCA, TAGACTT, TGCACAA, TGCGCTT
OUTPUT: [Figure: a phylogenetic tree over the leaves X, U, Y, V, W]
Brute Force Approach
Brute Force
Enumerate all trees
Compute some measure of evolutionary likelihood
Select best tree
How many rooted trees are there with n leaves?
n=2 leaves => 1 tree
n=3 leaves => attach the 3rd leaf to any of 3 edges => 3 trees
Let T(n) = # rooted trees with n leaves; E(n) = # edges (counting an edge above the root)
T(2)=1, E(2)=3; T(3)=3, E(3)=5
Adding a leaf creates two new edges => E(n) = E(n-1)+2 => E(n) = 2n-1
T(n) = T(n-1)*E(n-1) = T(n-1)*(2n-3) => T(n) = 1*3*5*...*(2n-3)
For n=20 leaves this is ~8*10^21 trees
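The recurrence above is easy to check numerically; a quick sketch (the function name is ours):

```python
from functools import reduce

# Number of rooted binary trees on n labeled leaves: (2n-3)!! = 1*3*5*...*(2n-3)
def num_rooted_trees(n):
    return reduce(lambda acc, k: acc * k, range(3, 2 * n - 2, 2), 1)

print(num_rooted_trees(3))   # 3
print(num_rooted_trees(20))  # 8200794532637891559375, about 8*10^21
```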
Approaches
Distance based
Tree should best model evolutionary distance metric among taxa
Character-based [Maximal Parsimony (MP)]
Tree should minimize changes
Maximum likelihood (ML)
Tree should maximize likelihood of changes
Distance-Based Techniques
I. Distance Based Techniques
Key Idea:
Compute evolutionary distance metric D among S={U,V,W,X,Y}
Compute a tree on S that best fits the distances D
Formally:
Given: an n×n distance matrix D
Compute: weighted tree T on n leaves that “best fits” D
How to establish evolutionary distance measures?
Distance ~ AA changes
Next chapter: evaluating distance using Markovian evolution models
Is There A Tree That Perfectly Fits D?
Not every distance metric D can be modeled by a tree
How can we tell which distance metrics model a tree?
This metric is fit exactly by a tree:
     U  V  W
  V  2
  W  2  2
  X  2  2  1
[Figure: the tree with leaf branches U:1, V:1, W:0.5, X:0.5 joined by an internal branch of 0.5]
Does any tree fit this one?
     U  V  W
  V  1
  W  1  2
  X  2  1  1
The Four-Point Condition
A distance matrix corresponding to a tree is called additive
THEOREM: D is additive if and only if, for every four indices i,j,k,l, the maximum and the median of the three pairwise sums are identical:
Dij+Dkl <= Dik+Djl = Dil+Djk
This suggests how to connect 4 points into a tree to fit D
[Figure: the quartet tree on i, j, k, l: the sum Dij+Dkl includes no copy of the internal branch, while Dik+Djl and Dil+Djk each include it twice, so those two sums are equal and maximal]
For the additive matrix above, the two largest sums coincide; for the second matrix the three sums are 2, 2, and 4, so the unique maximum exceeds the median and no tree fits.
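The condition can be tested mechanically; a minimal sketch (function name and list layout are ours), run on an additive and a non-additive 4×4 matrix:

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    # Four-point condition: for every quadruple, the maximum of the three
    # pairwise sums must equal the median.
    for i, j, k, l in combinations(range(len(D)), 4):
        s = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

# Order U, V, W, X; the first matrix fits a tree, the second does not
D1 = [[0, 2, 2, 2], [2, 0, 2, 2], [2, 2, 0, 1], [2, 2, 1, 0]]
D2 = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]
print(is_additive(D1), is_additive(D2))  # True False
```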
How Do We Handle Non-Additive D?
Additive metrics are very useful
Provide a perfect fit with a tree model; the tree is easily computed from D
But evolutionary distance metrics are often non-additive
How do we handle a non-additive metric?
Fitch & Margoliash: find a tree T minimizing the least-squares fit:
E(T) = Σ_{i,j} (d_ij(T) - D_ij)²
This problem is NP-hard => need heuristics
Fitch & Margoliash (1968) used exhaustive search
Closest-Pair Clustering
Idea: use D to guide closest-pair clustering
Extend D to clusters by UPGMA/WPGMA averaging
UPGMA Algorithm
Initialization
Initialize n clusters Ci = {Si}
Initialize T with a leaf for each cluster Ci
Iteration
Find Ci, Cj with smallest distance Dij
Create a new cluster Ck = Ci ∪ Cj
Add a new node for Ck to T and connect it to Ci, Cj with branch lengths Dki = Dkj = Dij/2
If all nodes are connected, exit; otherwise compute the distance from Ck to every other cluster Cl:
Dkl = (Dil·|Ci| + Djl·|Cj|) / (|Ci| + |Cj|)
Repeat the iteration
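The iteration above can be sketched in a few lines; an illustrative implementation (data layout and names are ours, not from the slides), run on a small 4-taxon matrix:

```python
import itertools

def upgma(D, labels):
    # D maps frozenset({a, b}) -> distance; returns the final merge (nested
    # tuples) and each cluster's height (its distance from the leaves).
    dist, size = dict(D), {l: 1 for l in labels}
    height = {l: 0.0 for l in labels}
    active = set(labels)
    while len(active) > 1:
        # find the pair of clusters at minimal distance
        ci, cj = min(itertools.combinations(active, 2),
                     key=lambda p: dist[frozenset(p)])
        ck = (ci, cj)
        height[ck] = dist[frozenset((ci, cj))] / 2   # Dki = Dkj = Dij/2
        for cl in active - {ci, cj}:
            # Dkl = (Dil*|Ci| + Djl*|Cj|) / (|Ci| + |Cj|)
            dist[frozenset((ck, cl))] = (
                dist[frozenset((ci, cl))] * size[ci]
                + dist[frozenset((cj, cl))] * size[cj]) / (size[ci] + size[cj])
        size[ck] = size[ci] + size[cj]
        active = active - {ci, cj} | {ck}
    return active.pop(), height

D = {frozenset(p): d for p, d in [(("U", "V"), 22), (("U", "W"), 24),
     (("U", "X"), 32), (("V", "W"), 6), (("V", "X"), 14), (("W", "X"), 10)]}
root, height = upgma(D, ["U", "V", "W", "X"])
print(sorted(v for v in height.values() if v > 0))  # [3.0, 6.0, 13.0]
```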
UPGMA: Molecular Clock Property
Uniform distance from root to leaves
Distance to root ~ evolutionary clock
Species are assumed to take identical time to evolve
[Figure: an ultrametric tree over leaves 1-5 with internal nodes 6-9; each internal node sits at half the distance between the clusters it joins, e.g. heights 0.5·D45, 0.5·D67, 0.5·D18]
Notes
Complexity is O(n²)
Averaging redistributes distances to overcome non-additivity
Clustering can lead to substantial errors and is very sensitive
This limits the applications of clustering
How do we overcome the sensitivity of UPGMA?
Example:
     U   V   W
  V  22
  W  24  6
  X  32  14  10
[Figure: the real additive tree pairs U (branch 20) with V (branch 2), and W (branch 1) with X (branch 9); UPGMA instead first joins the closest pair (V,W), at distance 6, and so produces the wrong topology]
Improvements Through Bootstrapping
Bootstrapping: a statistical technique to increase robustness
Scenario: given a sample S(ω) and a result R(S) computed from S
Bootstrapping:
o Resample S to get S’(ω)
o Evaluate R(S’(ω))
o Evaluate the match of R(S) with the values R(S’(ω))
Here S = the columns of the aligned sequences (n of them); R(S) = the tree
S’(ω) = a sample of n random columns of S, with possible repetitions
Compute the phylogenetic tree R(S’(ω)) for each resample
Use {R(S’(ω))} to compute the consensus/likelihood of the branches of R(S)
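The column-resampling step can be sketched as follows (the function name is ours; the tree-building step itself is left abstract):

```python
import random

def bootstrap_replicate(alignment, rng):
    # Draw n columns with replacement and rebuild one sequence per taxon.
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

rng = random.Random(0)
aln = ["AGGGCAT", "TAGCCCA", "TAGACTT", "TGCACAA", "TGCGCTT"]
rep = bootstrap_replicate(aln, rng)
# every column of the replicate is a column of the original alignment
orig_cols = {"".join(s[i] for s in aln) for i in range(len(aln[0]))}
assert all("".join(s[i] for s in rep) in orig_cols for i in range(len(rep[0])))
```

Each replicate would then be fed to the tree-building method, and the support of a branch of R(S) is the fraction of replicate trees containing it.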
Bootstrapping Example
Closest Pair vs. Evolutionary Neighbors
Additivity: Dij+Dkl <= Dik+Djl = Dil+Djk
[Figure: the quartet tree illustrating the three pairwise sums]
UPGMA overcomes non-additivity by averaging distances
But the closest pair may not be evolutionary neighbors
The evolutionary tree distances may diverge greatly; averaging distorts neighborhoods
[Figure: a tree pairing U (branch 1) with W (branch 4), and V (branch 1) with X (branch 22), joined by an internal branch of 1; the closest pair by distance is (U,V), at distance 3, although U and V are not neighbors in the tree]
Neighbor Joining [Saitou & Nei 87; Studier & Keppler 88]
Neighbor-joining heuristic: join the closest clusters that are also far from the rest
Define Rk = Σ_{i≠k} Dik, the divergence of k
Cluster the nodes k, m that minimize D’km = Dkm - (Rk+Rm)/(n-2)
[Define rk = Rk/(n-2) and consider Dkm - rk - rm]
Example (the tree of the previous slide): divergences rU=16, rV=16, rW=19, rX=37, and
     U    V    W
  V  -29
  W  -30  -29
  X  -29  -30  -29
The minimum, -30, is attained exactly by the true neighbor pairs (U,W) and (V,X)
Neighbor Joining Algorithm
Initialization (same as UPGMA): initialize n clusters Ci = {Si}
Iteration:
1. Compute rk = Σ_{i≠k} Dik/(n-2) for each cluster k
2. Find (k,m) minimizing Dkm - rk - rm
3. Define a new node i and set Dis = 0.5(Dks+Dms-Dkm) for all s
4. Join node i to k and m with edges of respective lengths:
   Dki = 0.5(Dkm+rk-rm), Dmi = 0.5(Dkm+rm-rk)
5. Repeat until all nodes are connected
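Steps 1-5 can be sketched end to end; an illustrative implementation (data layout and the generated internal-node names N1, N2, ... are ours), run on the six-taxon matrix of the worked example that follows:

```python
import itertools

def neighbor_joining(D, labels):
    # D maps frozenset({a, b}) -> distance; returns edges (node, node, length).
    dist, active, edges = dict(D), list(labels), []
    fresh = itertools.count(1)
    while len(active) > 2:
        n = len(active)
        r = {k: sum(dist[frozenset((k, i))] for i in active if i != k) / (n - 2)
             for k in active}                                           # step 1
        k, m = min(itertools.combinations(active, 2),
                   key=lambda p: dist[frozenset(p)] - r[p[0]] - r[p[1]])  # step 2
        new, dkm = "N%d" % next(fresh), dist[frozenset((k, m))]
        for s in active:                                                # step 3
            if s not in (k, m):
                dist[frozenset((new, s))] = 0.5 * (
                    dist[frozenset((k, s))] + dist[frozenset((m, s))] - dkm)
        edges.append((new, k, 0.5 * (dkm + r[k] - r[m])))               # step 4
        edges.append((new, m, 0.5 * (dkm + r[m] - r[k])))
        active = [s for s in active if s not in (k, m)] + [new]
    edges.append((active[0], active[1], dist[frozenset(active)]))
    return edges

pairs = {("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 6,
         ("A", "F"): 8, ("B", "C"): 7, ("B", "D"): 10, ("B", "E"): 9,
         ("B", "F"): 11, ("C", "D"): 7, ("C", "E"): 6, ("C", "F"): 8,
         ("D", "E"): 5, ("D", "F"): 9, ("E", "F"): 8}
edges = neighbor_joining({frozenset(p): d for p, d in pairs.items()}, "ABCDEF")
print(sorted(l for _, _, l in edges))  # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
```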
Example: Step 1: Compute Divergences rX
Distance matrix on six taxa:
     A   B   C   D   E
  B  5
  C  4   7
  D  7   10  7
  E  6   9   6   5
  F  8   11  8   9   8
Step 1: compute rk = Σ_{i≠k} Dik/(n-2): sum each taxon's distances, then divide by 6-2 = 4
Σ: A 30, B 42, C 32, D 38, E 34, F 44
r: A 7.5, B 10.5, C 8, D 9.5, E 8.5, F 11
From The Phylogenetic Handbook, Salemi and Vandamme 2004
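Step 1 can be replayed directly (dictionary layout and helper name are ours):

```python
pairs = {("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 6,
         ("A", "F"): 8, ("B", "C"): 7, ("B", "D"): 10, ("B", "E"): 9,
         ("B", "F"): 11, ("C", "D"): 7, ("C", "E"): 6, ("C", "F"): 8,
         ("D", "E"): 5, ("D", "F"): 9, ("E", "F"): 8}

def d(x, y):  # symmetric lookup
    return 0 if x == y else pairs.get((x, y), pairs.get((y, x)))

taxa = "ABCDEF"
r = {k: sum(d(k, i) for i in taxa) / (len(taxa) - 2) for k in taxa}
print(r)  # {'A': 7.5, 'B': 10.5, 'C': 8.0, 'D': 9.5, 'E': 8.5, 'F': 11.0}
```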
Step 2: Find the Neighboring Pair
Evaluate the neighboring distance matrix Nkm = Dkm - (rk+rm) [subtract the r column and row]:
     A      B      C      D      E
  B  -13
  C  -11.5  -11.5
  D  -10    -10    -10.5
  E  -10    -10    -10.5  -13
  F  -10.5  -10.5  -11    -11.5  -11.5
Find (k,m) minimizing Nkm: the minimum, -13, is attained by (A,B) and (D,E)
Create a new node U and attach it to A and B
(UPGMA would instead connect the closest pair, A and C)
Steps 3,4: Join Neighbors, Update Distances
Step 3: compute the branch lengths to the new node U
DAU = 0.5(DAB+rA-rB) = 0.5(5-3) = 1
DBU = 0.5(DAB+rB-rA) = 0.5(5+3) = 4
Step 4: update the distance matrix with DUX = 0.5(DAX+DBX-DAB):
DUC = 0.5(4+7-5) = 3; DUD = 0.5(7+10-5) = 6; DUE = 0.5(6+9-5) = 5; DUF = 0.5(8+11-5) = 7
     U   C   D   E
  C  3
  D  6   7
  E  5   6   5
  F  7   8   9   8
[Figure: the new node U joined to A (branch 1) and B (branch 4)]
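The arithmetic of steps 3 and 4 can be replayed in a few lines:

```python
# branch lengths when A and B are joined into U (values from the worked example)
rA, rB, dAB = 7.5, 10.5, 5
dAU = 0.5 * (dAB + rA - rB)              # branch A-U
dBU = 0.5 * (dAB + rB - rA)              # branch B-U
assert (dAU, dBU) == (1.0, 4.0)
# updated distances D_UX = 0.5*(D_AX + D_BX - D_AB)
old = {"C": (4, 7), "D": (7, 10), "E": (6, 9), "F": (8, 11)}
new = {x: 0.5 * (dax + dbx - dAB) for x, (dax, dbx) in old.items()}
assert new == {"C": 3.0, "D": 6.0, "E": 5.0, "F": 7.0}
```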
Repeat Steps 1/2/3/4
     U   C   D   E
  C  3
  D  6   7
  E  5   6   5
  F  7   8   9   8
Step 1: compute rk = Σ_{i≠k} Dik/(n-2): r = U 7, C 8, D 9, E 8, F 10.7
Step 2: minimize NXY = DXY-rX-rY: the minimum, -12, is attained by (U,C) or (D,E); join U and C into a new node V
Step 3: branch lengths DUV = 0.5(DUC+rU-rC) = 1; DCV = 2
Step 4: re-compute the distances DVX = 0.5(DUX+DCX-DUC):
     V   D   E
  D  5
  E  4   5
  F  6   9   8
[Figure: the growing tree: V joined to U (branch 1) and C (branch 2)]
Repeat
     V   D   E
  D  5
  E  4   5
  F  6   9   8
Step 1: compute rk = Σ_{i≠k} Dik/(n-2): r = V 7.5, D 9.5, E 8.5, F 11.5
Step 2: minimize NXY = DXY-rX-rY: the minimum, -13, is attained by (V,F) or (D,E); join D and E into a new node W
Step 3: DWD = 0.5(DDE+rD-rE) = 3; DWE = 2
Step 4: re-compute the distances DWX = 0.5(DDX+DEX-DDE):
     V   W
  W  2
  F  6   6
[Figure: the growing tree: W joined to D (branch 3) and E (branch 2)]
Repeat
     V   W
  W  2
  F  6   6
Step 1: compute rk = Σ_{i≠k} Dik/(n-2): r = V 8, W 8, F 12
Step 2: minimize NXY = DXY-rX-rY: all three pairs give -14; join V and F into a new node Z
Step 3: DZV = 0.5(DVF+rV-rF) = 1; DZF = 5
Step 4: re-compute DZW = 0.5(DVW+DFW-DVF) = 1
[Figure: the growing tree: Z joined to V (branch 1) and F (branch 5)]
Complete
Only Z and W remain; connect them with the last edge, of length DZW = 1
Final tree: A (branch 1) and B (branch 4) meet at U; U (branch 1) and C (branch 2) meet at V; D (branch 3) and E (branch 2) meet at W; V (branch 1) and F (branch 5) meet at Z; Z and W are joined by the central edge of length 1
Notes On Neighbor Joining
Complexity is O(n³) (O(n²) work per iteration)
Does not depend on the molecular clock assumption
Heavily used in practice [e.g., Clustal W]
But can be sensitive to non-additivity
Maximal Parsimony (character-based phylogeny)
Key Idea: Minimize Changes
Reconsider the problem: find the “best” tree to explain the evolution of sequences
Motivation: focus on the evolution of positions (e.g., ATTACTG, ATTACTA, GTTGCTA, ATTGCTA)
“Distance” loses information on evolutionary changes
Key idea: find the tree with minimal changes needed to explain the data
[Figure: two trees over the leaves AAG, AGA, AAA, GGA; one labeling of internal nodes (all AAA) costs C=4, while rearranging the leaves and labeling the internal nodes AAA, AAA, AGA costs only C=3]
More Generally
Taxa are considered as sets of attributes: characters
“character” = DNA position, gene order, morphological feature…
“character state” = a value assumed by a character
Characters evolve through state changes
Evolutionary tree represents changes in character states
MP-tree seeks to minimize state changes
MP Example
http://evolution.berkeley.edu/evosite/evo101/IIC1aUsingparsimony.shtml
[Figure: taxa scored on characters with binary states; the preferred tree explains the data with 1 state change]
MP Example
[Figure: two candidate trees for the same data, requiring 7 vs. 6 state changes; parsimony prefers the 6-change tree]
Example: Evolution of A Gene
www.life.uiuc.edu/ib/335/MolSyst.html
Character = position; state = nucleotide
[Figure: aligned gene sequences of the taxa and their inferred tree]
Example: Evolution of A Gene
http://home.cc.umanitoba.ca/~psgendb/GDE/phylogeny/parsimony/phylip.parsimony.html
Character = position; state = nucleotide
Example
Pevzner 2003 Genome Research
MP rearrangements of chromosome X
The Max Parsimony (MP) Problem
“Big” MP:
Input: a set of n aligned sequences of length k
Output: a phylogenetic tree T such that
o T has n leaves labeled with the input sequences (taxa)
o T has internal nodes labeled with sequences of length k (states)
o T minimizes the total Hamming distance along its edges
[Figure: a labeled tree over leaves AAG, AAA, GGA, AGA with internal labels AAA, AAA, AGA and cost H=3]
This is a Steiner-tree-type problem
It can be shown to be NP-hard [Gusfield, Foulds]
But often the number of sequences considered is small
“Small” MP:
Input: a tree with sequence-labeled leaves
Output: a labeling of the internal node states which maximizes parsimony
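The score of a fully labeled tree is just the Hamming distance summed over its edges; a small sketch of that cost function on the H=3 example (internal node names x, y are ours):

```python
def hamming(a, b):
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def tree_cost(tree, label):
    # total Hamming distance over all parent -> child edges
    return sum(hamming(label[p], label[c])
               for p, children in tree.items() for c in children)

# the 4-leaf tree with internal labels AAA (root), AAA (x), AGA (y)
tree = {"root": ["x", "y"], "x": ["AAG", "AAA"], "y": ["GGA", "AGA"]}
label = {"root": "AAA", "x": "AAA", "y": "AGA",
         "AAG": "AAG", "AAA": "AAA", "GGA": "GGA", "AGA": "AGA"}
print(tree_cost(tree, label))  # 3
```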
MP Basics
Consider {ATA, ATT, GTT, GTA, GGT}
The first column (A,A,G,G,G) admits 2 arrangements and identifies a likely mutation:
[Figure: grouping the two A-sequences against the three G-sequences needs 1 mutation (the MP choice); the alternative arrangement needs 2 mutations]
The second column (T,T,T,T,G) does not provide clues on likely mutations:
[Figure: wherever the single-G sequence is placed, exactly 1 mutation is needed]
Non-informative position: a position is parsimony-informative only when at least 2 states each appear in at least 2 taxa
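Whether a position of {ATA, ATT, GTT, GTA, GGT} is informative can be checked mechanically (the function name is ours):

```python
from collections import Counter

def informative(column):
    # a position is parsimony-informative when at least two states
    # each occur in at least two taxa
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

seqs = ["ATA", "ATT", "GTT", "GTA", "GGT"]
cols = ["".join(s[i] for s in seqs) for i in range(3)]
print([informative(c) for c in cols])  # [True, False, True]
```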
MP Basics
Merge the MP trees of columns 1 & 3:
[Figure: combining the MP arrangements of column 1 (A,A,G,G,G) and column 3 (A,T,T,A,T) over {ATA, ATT, GTT, GTA, GGT} yields two alternative MP trees]
Example (N. Friedman)
Aardvark: CAGGTA
Bison: CAGACA
Chimp: CGGGTA
Dog: TGCACT
Elephant: TGCGTA
[Figure: a parsimony tree over the five taxa, with internal nodes labeled by reconstructed sequences such as CAGGTA, CGGGTA, TGGGTA, and TGCGTA]
Example: Evolution of Protein Domains
http://ai.stanford.edu/~serafim/CS374_2006/
[Figure: presence/absence (0/1) patterns of domains D1, D2, D3 evolving along a tree; total cost: 3]
C. Chothia et al., “Evolution of the Protein Repertoire”, Science vol. 300, 13 June 2003
T. Przytycka et al., “Graph Theoretical Insights…”, RECOMB 2005, LNBI 3500, pp. 311-325, 2005
Single Site MP: The Fitch Algorithm
Problem:
Input: a tree T with labeled leaves
Output: labels of the internal nodes of an MP tree + cost C
Step 1: traverse T in postorder (leaves to root) and assign to each node x a set of candidate labels S(x):
If x is a leaf, then S(x) = {label of x}; C ← 0
If x has children y, z:
S(x) = S(y)∩S(z) if S(y)∩S(z) ≠ ∅
otherwise S(x) = S(y)∪S(z) and C ← C+1
Step 2: traverse T in preorder (root to leaves) and assign to each node x a character value v(x):
If y is the parent of x and v(y) ∈ S(x), then v(x) ← v(y)
else v(x) = any label from S(x)
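The two steps can be sketched as follows (tree encoding and node names are ours; the "any label" choice in step 2 is made deterministic with min):

```python
def fitch(tree, leaf_state):
    # tree: internal node -> (left child, right child); leaves are the nodes
    # that never appear as keys. Returns (value assignment, cost C).
    sets, value, cost = {}, {}, 0

    def step1(x):                    # postorder: candidate label sets
        nonlocal cost
        if x not in tree:
            sets[x] = {leaf_state[x]}
            return
        y, z = tree[x]
        step1(y); step1(z)
        common = sets[y] & sets[z]
        if common:
            sets[x] = common
        else:
            sets[x] = sets[y] | sets[z]
            cost += 1

    def step2(x, parent_val):        # preorder: pick concrete values
        value[x] = parent_val if parent_val in sets[x] else min(sets[x])
        for child in tree.get(x, ()):
            step2(child, value[x])

    children = {c for kids in tree.values() for c in kids}
    root = next(n for n in tree if n not in children)
    step1(root)
    step2(root, None)
    return value, cost

tree = {"root": ("n1", "n2"), "n1": ("l1", "l2"), "n2": ("l3", "l4")}
value, cost = fitch(tree, {"l1": "A", "l2": "G", "l3": "G", "l4": "G"})
print(cost, value["root"])  # 1 G
```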
Step 1: Computing Candidate Labels
[Figure: candidate sets computed bottom-up on an example tree with leaves labeled A and G; where children's sets are disjoint ({A}∩{G} = ∅) the parent gets the union {A,G} and the cost is incremented, giving C=2 at the root]
Step 2: Selecting MP Labels
[Figure: values chosen top-down on the same tree: the root picks a label from its set {A,G}, and each child inherits its parent's value whenever that value is in the child's candidate set, yielding an MP labeling with cost C=2]
Notes
The algorithm is fast: O(nk), where n = # nodes and k = # character values
It selects a particular MP tree (there may be others)
[Figure: alternative MP labelings of the same tree, each with cost C=2]
Run separately for each character, then merge the results
May be generalized to weighted parsimony:
Sankoff’s generalization: different costs for different changes
Heuristic MP Algorithms
Use Steiner-tree heuristic algorithms
Branch-and-bound search:
Represent the search space as a tree (nodes at the k-th level represent phylogenetic trees for the first k species)
Find the best-scoring search node and use it as a bound
Branch to the children of this search node
Nearest-neighbor interchange (NNI): switch subtrees
Simulated annealing
…
Maximum Likelihood Approach
(III) Max Likelihood Approaches
(Based on N. Friedman slides)
Key idea: compute the maximum-likelihood tree
Many models of changes (trees) can yield the observed data
Compute the tree that maximizes the likelihood
Problem 1: given T, compute the probability P(S|T)
S = {X1,…,Xn} are the observed sequences
Need a probability model of the changes generated by T:
o Background probabilities: q(a)
o Mutation probabilities: P(a|b,t)
Problem 2: compute the T that maximizes P(S|T); this is the complex part
[Figure: a tree with leaves x1, x2, x3, internal nodes x4, x5, and branch lengths t1, t2, t3, t4]
Tree Likelihood Computation
Define P(Lk|a) = probability of the subtree below node k, given xk = a
Init: for all leaves k, P(Lk|a) = 1 if xk = a; 0 otherwise
Iteration: if k is a node with children i and j, then
P(Lk|a) = Σ_{b,c} P(b|a,ti)·P(Li|b)·P(c|a,tj)·P(Lj|c)
Termination: the likelihood is
P(x1,…,xn|T,t) = Σ_a P(Lroot|a)·q(a)
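The recursion can be sketched directly; an illustrative implementation on the 3-leaf tree of the figure, using a Jukes-Cantor substitution model, uniform background q(a), and made-up branch lengths (all three are our assumptions):

```python
import math

def jc69(a, b, t):
    # Jukes-Cantor substitution probability P(b|a,t) (an assumed model)
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def L(k, tree, leaf_state, t, states="ACGT"):
    # P(L_k|a) for each state a, via the recursion above
    if k not in tree:                # leaf: indicator on the observed state
        return {a: 1.0 if a == leaf_state[k] else 0.0 for a in states}
    i, j = tree[k]
    Li = L(i, tree, leaf_state, t, states)
    Lj = L(j, tree, leaf_state, t, states)
    return {a: sum(jc69(a, b, t[i]) * Li[b] for b in states)
               * sum(jc69(a, c, t[j]) * Lj[c] for c in states)
            for a in states}

tree = {"x5": ("x4", "x3"), "x4": ("x1", "x2")}     # x5 is the root
t = {"x1": 0.1, "x2": 0.1, "x3": 0.3, "x4": 0.2}    # branch lengths (made up)
Lroot = L("x5", tree, {"x1": "A", "x2": "A", "x3": "G"}, t)
likelihood = sum(0.25 * Lroot[a] for a in "ACGT")   # uniform q(a)
print(0 < likelihood < 1)  # True
```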
Maximum Likelihood (ML)
Score each tree by
P(X1,…,Xn|T,t) = Π_m P(x1[m],…,xn[m]|T,t)
(assumes positions m evolve independently)
Find the highest-scoring tree:
Exhaustive search
Sampling methods (Metropolis)
Approximation (consider only a subset of trees)
Comparison
Tony Weisstein, http://bioquest.org:16080/bedrock/terre_haute_03_04/phylogenetics_1.0.ppt
Neighbor-joining:
o Uses only pairwise distances
o Minimizes distance between nearest neighbors
o Very fast
o Easily trapped in local optima
o Good for generating a tentative tree, or choosing among multiple trees
Maximum parsimony:
o Uses only shared derived characters
o Minimizes total distance
o Slow
o Assumptions fail when evolution is rapid
o Good for very small data sets and for testing trees built using other methods
Maximum likelihood:
o Uses all data
o Maximizes tree likelihood given specific parameter values
o Very slow
o Highly dependent on the assumed evolution model
o Best option when tractable (<30 taxa)
Conclusions
Computing phylogeny is an area of active research
Hundreds of algorithms.
New models: phylogenetic networks (generalize trees)
New challenges: whole genome phylogeny
Account for multi-site changes: replication, transpositions…
New algorithms
Applications
Epidemiology
Cancer diagnosis
….