Lecture 14

Realistic evolutionary models
Marjolijn Elsinga & Lars Hemel
Contents
• Models with different rates at different sites
• Models which allow gaps
• Evaluating different models
• Break
• Probabilistic interpretation of Parsimony
• Maximum Likelihood distances
Unrealistic assumptions
1. Same rate of evolution at each site in the substitution matrix
   - In reality, the structure of proteins and the base pairing of RNA result in different rates at different sites
2. Ungapped alignments
   - These discard the useful information given by the pattern of deletions and insertions
Different rates in matrix

• Maximum likelihood assumes that sites are independent, so the likelihood factorises over the sites $x_u$, $u = 1, \ldots, n$:

$$P(x \mid T, t) \;=\; \prod_{u=1}^{n} P(x_u \mid T, t)$$
Different rates in matrix (2)

• Introduce a site-dependent rate variable $r_u$: site $u$ evolves with its edge lengths scaled by $r_u$, so its likelihood becomes $P(x_u \mid T, r_u t)$
Different rates in matrix (3)
• We don't know $r_u$, so we use a prior
• Yang [1993] suggests a gamma distribution $g(r; \alpha, \alpha)$, with mean 1 and variance $1/\alpha$
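Written out with the definitions above, the per-site likelihood averages over the unknown rate (a standard formulation, consistent with the slides' notation):

$$P(x_u \mid T, t) \;=\; \int_0^\infty g(r; \alpha, \alpha)\, P(x_u \mid T, r\,t)\; dr$$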
Problem
• The number of terms grows exponentially with the number of sequences → computationally slow
• Solution: approximation
  - Replace the integral by a discrete sum
  - Subdivide the domain into m intervals
  - Let $r_k$ denote the mean of the gamma distribution in the k-th interval
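If the m intervals are chosen to carry equal probability 1/m (an assumption; this is the usual discrete-gamma construction), the integral above becomes

$$P(x_u \mid T, t) \;\approx\; \frac{1}{m} \sum_{k=1}^{m} P(x_u \mid T, r_k\, t)$$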
Solution
• Yang [1993] found that m = 3–4 gives a good approximation
• Only m times as much computation as for non-varying sites
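A minimal sketch of computing the category rates $r_k$, assuming SciPy is available and that the intervals are equal-probability quantile bins; the function name is illustrative:

```python
# Mean rate of each of m equal-probability intervals of a
# Gamma(shape=alpha, rate=alpha) distribution (mean 1, variance 1/alpha).
from scipy.stats import gamma

def discrete_gamma_rates(alpha, m):
    # Quantile boundaries splitting the distribution into m equal-mass bins
    bounds = [gamma.ppf(k / m, a=alpha, scale=1.0 / alpha) for k in range(m + 1)]
    rates = []
    for k in range(m):
        lo, hi = bounds[k], bounds[k + 1]
        # E[r over (lo, hi]] for Gamma(alpha, rate alpha) equals the CDF
        # difference of Gamma(alpha + 1, rate alpha), since the mean is 1
        mass = (gamma.cdf(hi, a=alpha + 1, scale=1.0 / alpha)
                - gamma.cdf(lo, a=alpha + 1, scale=1.0 / alpha))
        rates.append(m * mass)  # divide by the bin probability 1/m
    return rates

print(discrete_gamma_rates(alpha=0.5, m=4))
```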
Evolutionary models with gaps (1)
• Idea 1: introduce '_' as an extra character of the alphabet of K residues and replace the K × K substitution matrix with a (K+1) × (K+1) matrix
• Drawback: no possibility to assign a lower cost to extending an existing gap; gap positions are treated independently
Evolutionary models with gaps (2)
• Idea 2: Allison, Wallace & Yee [1992] introduce delete and insertion states to ensure affine-type gaps
• Drawback: computationally intractable
Evolutionary models with gaps (3)
• Idea 3: Thorne, Kishino & Felsenstein [1992] use fragment substitution to get a degree of biological plausibility
• Drawback: usable for only two sequences
Finally
• Goal: find a way to use affine-type gap penalties in a computationally reasonable way
• Mitchison & Durbin [1995] made a tree HMM which uses a profile HMM architecture, and treats paths through the model as objects that undergo evolutionary change
Assumptions needed again
• We will use an architecture somewhat simpler than that of the profile HMM of Krogh et al. [1994]: it has only match and delete states
  - Match state: $M_k$
  - Delete state: $D_k$
  - k = position in the model
Tree HMM with gaps (1)
• Sequence y is the ancestor of sequence x
• Both sequences are aligned to the model, so both follow a prescribed path through the model
Tree HMM with gaps (2)
• x emits residue $x_i$ at $M_k$; y emits residue $y_j$ at $M_k$
• The probability of the substitution $y_j \to x_i$ is $P(x_i \mid y_j, t)$
Tree HMM with gaps (3)
• What if x follows a different path than y?
  - x: $M_k \to D_{k+1}$ (= MD)
  - y: $M_k \to M_{k+1}$ (= MM)
• This transition of x, given y's, has probability $P(MD \mid MM, t)$
Tree HMM with gaps (4)
• x: $D_{k+1} \to M_{k+2}$ (= DM)
• y: $M_{k+1} \to M_{k+2}$ (= MM)
• We assume that the choice between DD and DM is controlled by a mutational process that operates independently of y
Substitution matrix
• The probabilities of transitions on the path of x are given by priors:
  $D_{k+1} \to M_{k+2}$ has probability $q_{DM}$
How it works
• At position k: $q_{y_j}\, P(x_i \mid y_j, t)$
• Transition $k \to k+1$: $q_{MM}\, P(MD \mid MM, t)$
• Transition $k+1 \to k+2$: $q_{MM}\, q_{DM}$
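Multiplying the three contributions listed above gives the probability of this stretch of x's path given y's (simply the product of the slide's terms):

$$q_{y_j}\, P(x_i \mid y_j, t) \;\cdot\; q_{MM}\, P(MD \mid MM, t) \;\cdot\; q_{MM}\, q_{DM}$$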
Another example
Evaluating models: evidence
• Comparing models is difficult
• Compare the probabilities $P(D \mid M_1)$ and $P(D \mid M_2)$, obtained by integrating over all parameters of each model
  - Parameters $\theta$
  - Prior probabilities $P(\theta)$
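Written out, the quantity being compared is the evidence, the likelihood integrated against the parameter prior:

$$P(D \mid M) \;=\; \int P(D \mid \theta, M)\, P(\theta \mid M)\; d\theta$$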
Comparing two models
• A natural way to compare $M_1$ and $M_2$ is to compute the posterior probability of $M_1$
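By Bayes' rule, assuming $M_1$ and $M_2$ are the only candidate models:

$$P(M_1 \mid D) \;=\; \frac{P(D \mid M_1)\, P(M_1)}{P(D \mid M_1)\, P(M_1) + P(D \mid M_2)\, P(M_2)}$$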
Parametric bootstrap
• Let $\hat{L}_1 = \max_\theta P(D \mid M_1, \theta)$ be the maximum likelihood of the data D for the model $M_1$
• Let $\hat{L}_2 = \max_\theta P(D \mid M_2, \theta)$ be the maximum likelihood of the data D for the model $M_2$
• Define the test statistic $\Delta = \log \hat{L}_2 - \log \hat{L}_1$
Parametric bootstrap (2)
• Simulate datasets $D_i$ with the values of the parameters of $M_1$ that gave the maximum likelihood for D, and compute $\Delta_i$ for each
• If $\Delta$ exceeds almost all values of $\Delta_i$, then $M_2$ captured aspects of the data that $M_1$ did not mimic; therefore $M_1$ is rejected
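A minimal sketch of the whole test, assuming hypothetical model objects with fit_ml, log_likelihood and simulate methods (these names are placeholders, not a real API):

```python
# Parametric bootstrap: compare the observed log likelihood-ratio
# statistic Delta against its distribution under datasets simulated
# from M1's maximum likelihood fit.
def delta(data, M1, M2):
    """Delta = log L2_hat - log L1_hat for one dataset."""
    return (M2.log_likelihood(data, M2.fit_ml(data))
            - M1.log_likelihood(data, M1.fit_ml(data)))

def parametric_bootstrap(D, M1, M2, n_boot=1000):
    observed = delta(D, M1, M2)
    theta1 = M1.fit_ml(D)               # ML parameters of M1 on the real data
    # Simulate datasets D_i under M1 and recompute Delta_i for each
    deltas = [delta(M1.simulate(theta1), M1, M2) for _ in range(n_boot)]
    # M1 is rejected when the observed Delta exceeds almost all Delta_i
    frac_exceeded = sum(observed > d for d in deltas) / n_boot
    return observed, frac_exceeded
```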
Break
Probabilistic interpretation of various models
Lars Hemel
Overview
• Review of last week's method: Parsimony
  – Assumptions, properties
• Probabilistic interpretation of Parsimony
• Maximum Likelihood distances
  – Example: Neighbour joining
• More probabilistic interpretations
  – Sankoff & Cedergren
  – Hein's affine cost algorithm
• Conclusion / Questions?
Review
• Parsimony = finding a tree which can explain the observed sequences with a minimal number of substitutions
Parsimony
• Remember the following assumptions:
  – Sequences are aligned
  – Alignments do not have gaps
  – Each site is treated independently
• Furthermore, many families of substitution matrices satisfy:
  – Multiplicativity: $S(t + s) = S(t)\, S(s)$
  – Reversibility: $P(b \mid a, t)\, q_a = P(a \mid b, t)\, q_b$
Parsimony
• Basic step: counting the minimal number of changes for one site
• The final number of substitutions is the sum over all sites
• Weighted parsimony uses different 'weights' for different substitutions
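The per-site counting step can be made concrete with Fitch's algorithm, sketched below for a binary tree given as nested tuples (the tree encoding is an illustrative assumption):

```python
# Fitch's algorithm: minimal number of changes at one site.
def fitch(node):
    """Return (candidate residue set, minimal substitution count)."""
    if isinstance(node, str):            # leaf: a single residue, no changes
        return {node}, 0
    left, right = node
    set_l, cost_l = fitch(left)
    set_r, cost_r = fitch(right)
    common = set_l & set_r
    if common:                           # children agree: no extra change
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1   # disagreement costs one

# One site with residues A, A, B, B at the leaves (the example used later)
sets, changes = fitch((("A", "A"), ("B", "B")))
print(changes)   # -> 1
```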
Probabilistic interpretation of parsimony
• Given: a set of substitution probabilities P(b|a), in which we neglect the dependence on the length t
• Calculate substitution costs $S(a, b) = -\log P(b \mid a)$
• Felsenstein [1981] showed that, using these substitution costs, the minimal cost at site u for the whole tree T obtained by the weighted parsimony algorithm can be regarded as an approximation to the likelihood
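Making the correspondence explicit: since the costs are negative log probabilities and the logarithm is monotonic, minimising the total cost over ancestral assignments equals maximising the probability of the single best assignment, which approximates the likelihood (a sum over all assignments):

$$\min_{\text{ancestors}}\; \sum_{\text{edges}\,(a,b)} S(a, b) \;=\; -\log\; \max_{\text{ancestors}}\; \prod_{\text{edges}\,(a,b)} P(b \mid a)$$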
Probabilistic interpretation of parsimony
• Testing the performance of tree-building algorithms can be done by generating sequence data on a known tree by probabilistic sampling, and then seeing how often a given algorithm reconstructs the tree correctly
• Sampling is done as follows:
  – Pick a residue a at the root with probability $q_a$
  – Accept a substitution to b along the edge down to node i with probability $P(b \mid a, t_i)$; repeat recursively down the tree
  – Sequences of length N are generated by N independent repetitions of this procedure
  – Maximum likelihood should reconstruct the correct tree for large N
Probabilistic interpretation of parsimony
• Suppose we have a tree T with the following edge lengths:

[Figure: four-leaf tree T; edge lengths 0.3 to leaves 1 and 3, 0.1 to leaves 2 and 4, and 0.09 on the internal edge]

• and the substitution matrix

$$\begin{pmatrix} 1 - p & p \\ p & 1 - p \end{pmatrix}$$

with p = 0.3 for leaves 1, 3 and p = 0.1 for leaves 2 and 4
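A minimal sketch of the sampling procedure for this two-state example. The rooted encoding of the tree and the uniform root distribution ($q_a = 1/2$, the stationary distribution of the symmetric matrix) are assumptions of the sketch:

```python
import random

def sample_site(node, a=None):
    """Sample one site down a tree. A node is (p, children), where p is
    the substitution probability on the edge above it; leaves have
    children == []. Returns the list of leaf residues."""
    if a is None:
        a = random.choice("AB")                  # root residue, q_a = 1/2
    p, children = node
    b = ("B" if a == "A" else "A") if random.random() < p else a
    if not children:
        return [b]
    return [r for child in children for r in sample_site(child, b)]

# One rooted rendering of the example tree: p = 0.3 above leaves 1 and 3,
# p = 0.1 above leaves 2 and 4, p = 0.09 on the internal edge
tree = (0.0, [(0.3, []),                         # leaf 1
              (0.1, []),                         # leaf 2
              (0.09, [(0.3, []), (0.1, [])])])   # internal edge -> leaves 3, 4
sites = [sample_site(tree) for _ in range(100)]  # 100 independent sites
```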
Probabilistic interpretation of parsimony
• A tree with n leaves has (2n−5)!! possible unrooted topologies; for four leaves there are three:

[Figure: the three unrooted four-leaf topologies T1, T2 and T3, differing in how the leaves 1–4 are paired]
Probabilistic interpretation of parsimony

Number of times each topology was reconstructed from 1000 sampled datasets:

Maximum likelihood:
  N      T1    T2    T3
  20     419   339   242
  100    638   204   158
  500    904    61    35
  2000   997     3     0

Parsimony:
  N      T1    T2    T3
  20     396   378   224
  100    405   515    79
  500    404   594     2
  2000   353   646     0

Parsimony can construct the wrong tree even for large N.
Probabilistic interpretation of parsimony
• Suppose the following example: a tree with residues A, A, B, B at the leaves 1, 2, 3 and 4

[Figure: four-leaf tree with leaf residues A, A, B, B]
Probabilistic interpretation of parsimony
• With parsimony, the number of substitutions is calculated for each candidate topology

[Figure: two candidate trees for the leaves A, A, B, B; the left tree requires 2 substitutions, the right tree requires 1]

• Parsimony prefers the right-hand tree, which needs only 1 substitution, to the left-hand tree, which needs 2
Maximum Likelihood distances
• Suppose a tree T with edge lengths $t = (t_1, \ldots, t_n)$ and sampled sequences $x^i$ at the leaves
• We'll try to compute the distance between $x^1$ and $x^3$

[Figure: tree with leaves $x^1, \ldots, x^5$ and internal nodes $x^6, x^7, x^8$; the path from $x^1$ to $x^3$ runs through the edges $t_1, t_6, t_7, t_3$]
Maximum Likelihood distances

[Figure: the edges on the path from $x^1$ to $x^3$ are merged step by step: $t_1, t_6 \to t_1 + t_6$; then $t_7, t_3 \to t_7 + t_3$; finally a single edge $t_1 + t_6 + t_7 + t_3$]

By multiplicativity:

$$P(a_1 \mid a_8, t_1 + t_6) \;=\; \sum_{a_6} P(a_1 \mid a_6, t_1)\, P(a_6 \mid a_8, t_6)$$

By reversibility and multiplicativity:

$$\sum_{a_8} P(a_1 \mid a_8, t_1 + t_6)\, P(a_3 \mid a_8, t_7 + t_3)\, q_{a_8}
\;=\; \sum_{a_8} P(a_1 \mid a_8, t_1 + t_6)\, P(a_8 \mid a_3, t_3 + t_7)\, q_{a_3}
\;=\; P(a_1 \mid a_3, t_1 + t_6 + t_3 + t_7)\, q_{a_3}$$
Maximum Likelihood distances

The joint probability of two leaf sequences depends only on the total length of the path between them:

$$P(x^i, x^j \mid T, t) \;=\; \prod_u q_{x^j_u}\, P(x^i_u \mid x^j_u, t_{k_1} + \cdots + t_{k_r})$$

where $t_{k_1}, \ldots, t_{k_r}$ are the edge lengths along the path between leaves i and j. This motivates the maximum likelihood distance

$$d_{ij}^{ML} \;=\; \arg\max_t\; \prod_u q_{x^j_u}\, P(x^i_u \mid x^j_u, t) \;=\; \arg\max_t\; \prod_u P(x^i_u \mid x^j_u, t)$$

(the factors $q_{x^j_u}$ do not depend on t, so they can be dropped).
Maximum Likelihood distances

$$d_{ij}^{ML} \;\approx\; t_{k_1} + \cdots + t_{k_r}$$

• ML distances between leaf sequences are close to additive, given a large amount of data
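As a concrete illustration, under the symmetric two-state model used in the earlier example, the ML distance has a closed form. The sketch below assumes the parametrisation $P(\text{change} \mid t) = (1 - e^{-2t})/2$ for the two-state process (the rate convention only rescales distances):

```python
import math

def ml_distance_two_state(x, y):
    """ML distance between two aligned sequences under a symmetric
    two-state model with P(change | t) = (1 - exp(-2t)) / 2."""
    f = sum(a != b for a, b in zip(x, y)) / len(x)   # differing fraction
    if f >= 0.5:
        return math.inf      # saturated: no finite maximum likelihood t
    # Likelihood p^k (1-p)^(n-k) peaks at p = f; invert p(t) = f for t
    return -0.5 * math.log(1.0 - 2.0 * f)

print(ml_distance_two_state("AABAB", "ABBAB"))   # one difference in five
```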
Example: Neighbour joining

[Figure: leaves i and j are joined at a new node k; m is any other leaf]

$$d_{im} = d_{ik} + d_{km}, \qquad d_{jm} = d_{jk} + d_{km}$$

$$d_{km} = \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})$$
Example: Neighbour joining
• Use Maximum Likelihood distances
• Suppose we have a multiplicative, reversible model
• Suppose we have plenty of data
• Suppose the underlying probabilistic model is correct
• Then Neighbour joining will reconstruct any tree correctly
Example: Neighbour joining

Neighbour joining using ML distances (counts out of 1000 sampled datasets):

  N      T1    T2    T3
  20     477   301   222
  100    635   231   134
  500    896    85    19
  2000   997     5     0

It constructs the correct tree where Parsimony failed.
More probabilistic interpretations
• Sankoff & Cedergren
  – Simultaneously aligns the sequences and finds their phylogeny, using a character substitution model
  – Becomes probabilistic when scores are interpreted as log probabilities and the procedure adds instead of maximizing (Allison, Wallace & Yee [1992])
  – But, like the original S&C method, it is not practical for most problems
More probabilistic interpretations
• Hein's affine cost algorithm
  – Simultaneously aligns the sequences and finds their phylogeny, using affine gap penalties
  – Becomes probabilistic when scores are interpreted as log probabilities and the procedure adds instead of maximizing
  – But when using plus instead of max we have to include all paths, which costs $N^2$ at the first node above the leaves, $N^3$ at the next, and so on; all the speed advantages are gone
Conclusion
• Probabilistic interpretations can be better
  – Compare ML with parsimony
• They can also be less useful, because the costs get too high
  – Sankoff & Cedergren
• Neighbour joining constructs the correct tree if its assumptions are correct
• So the trick is to know your problem and to decide which method is best

Questions?