Probabilistic Approaches to Phylogeny

Wouter Van Gool & Thomas Jellema
Contents
• Introduction/Overview
• Probabilistic Models of Evolution
• Calculating the Likelihood
• Pause
• Evolution Demo
• Using the likelihood for inference
• Phylogeny Demo
• Summary/Conclusion
• Questions
Speakers: Wouter (three items), Thomas (four items)
8.1 Introduction
Goal:
• Formulate probabilistic models for phylogeny
• Infer trees from sets of sequences
Aim of probability-based phylogeny:
Rank trees according to
- likelihood P(data | tree)
- posterior probability P(tree | data)
Compute the probability of a set of data given a tree:
P(x* | T, t*)
x* : set of n sequences xj (j = 1…n)
T : tree with n leaves, with sequence j at leaf j
t* : edge lengths of the tree
Example [figure not preserved]
8.2 Probabilistic Models of Evolution
Given the sequences at the leaves x1…xn:
1. Pick a model of evolution: P(x | y, t), P(x)
2. Enumerate all possible tree topologies T with n leaves
3. For each T, maximize the likelihood over all possible edge lengths t
4. Pick the T and t that have the largest probability
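To give a feel for the cost of enumerating every topology, the sketch below counts rooted, leaf-labelled binary trees using the standard double-factorial formula (2n-3)!!. The function name is hypothetical and the count is not taken from the slides.

```python
def num_rooted_topologies(n: int) -> int:
    """Number of rooted, leaf-labelled binary trees with n leaves: (2n-3)!!."""
    count = 1
    # Multiply the odd numbers 3 * 5 * ... * (2n - 3); empty product for n <= 2.
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


for n in (3, 4, 10):
    print(n, num_rooted_topologies(n))
```

The count grows faster than exponentially with n, which is why exhaustive enumeration is only practical for a handful of sequences.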
Simplifying assumptions:
1. Single-base substitutions only, so ungapped alignments only
2. Each base evolves independently, with the same model of evolution based on a substitution matrix
Substitution Matrix for Phylogeny
Many important families of substitution matrices are multiplicative: S(t)S(s) = S(t+s)
Substitution matrices used in phylogeny:
• Jukes & Cantor model [1969]
• Kimura DNA model [1980]
• PAM matrix [1978]
Jukes-Cantor Model
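The slide's formulas are not preserved here. As a minimal sketch, the code below uses the standard Jukes-Cantor substitution probabilities with a single rate alpha (an assumed parameter name) and checks the multiplicative property S(t)S(s) = S(t+s) numerically.

```python
import math


def jukes_cantor(t, alpha=1.0):
    """4x4 Jukes-Cantor substitution matrix S(t); rows and columns are A, C, G, T."""
    e = math.exp(-4.0 * alpha * t)
    same = 0.25 * (1.0 + 3.0 * e)   # probability that a base is unchanged
    diff = 0.25 * (1.0 - e)         # probability of each specific other base
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def matmul(A, B):
    """Plain matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


product = matmul(jukes_cantor(0.1), jukes_cantor(0.25))
direct = jukes_cantor(0.35)
max_gap = max(abs(product[i][j] - direct[i][j]) for i in range(4) for j in range(4))
print("largest difference between S(t)S(s) and S(t+s):", max_gap)
```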
Kimura DNA model
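The slide's matrix is not preserved here. The sketch below uses the standard two-parameter Kimura formulas, assuming alpha is the transition rate (A<->G, C<->T) and beta the transversion rate; both parameter names and the dictionary layout are illustrative choices.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def kimura(t, alpha=2.0, beta=1.0):
    """Kimura two-parameter substitution matrix as a nested dict S[a][b] = P(b | a, t)."""
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))                # each transversion
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))    # the transition
    r = 1.0 - 2.0 * s - u                                       # no change

    def entry(a, b):
        if a == b:
            return r
        # Both purines or both pyrimidines means a transition.
        return u if (a in PURINES) == (b in PURINES) else s

    return {a: {b: entry(a, b) for b in BASES} for a in BASES}


S = kimura(0.3)
print("transition A->G:", S["A"]["G"], " transversion A->C:", S["A"]["C"])
```

With alpha equal to beta the transition and transversion probabilities coincide and the matrix reduces to the Jukes-Cantor form.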
PAM matrix model
8.3 Calculating the likelihood for ungapped alignments
Example: The likelihood of two nucleotide sequences
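The worked example from the slide is not preserved. As a sketch under stated assumptions (Jukes-Cantor substitution probabilities, a uniform root distribution, and hypothetical function names), the likelihood of two sequences joined at a root by edges t1 and t2 is the product over sites of a sum over the unknown root base.

```python
import math

BASES = "ACGT"


def jc_prob(b, a, t, alpha=1.0):
    """P(b | a, t) under the Jukes-Cantor model."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)


def two_sequence_likelihood(x1, x2, t1, t2):
    """P(x1, x2 | T, t1, t2) for two ungapped sequences of equal length."""
    if len(x1) != len(x2):
        raise ValueError("sequences must have the same length")
    lik = 1.0
    for c1, c2 in zip(x1, x2):
        # Sum over the root base a, weighted by the root distribution (1/4 each).
        lik *= sum(0.25 * jc_prob(c1, a, t1) * jc_prob(c2, a, t2) for a in BASES)
    return lik


print(two_sequence_likelihood("CCGGCCGCGCG", "CGGGCCGGCCG", 0.1, 0.2))
```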
Likelihood for the general case
[Equation not preserved], where node α(i) is the immediate ancestor (parent) of node i.
A fixed set of edge lengths t1…t2n-2 and a fixed topology T are required.
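For reference, the general likelihood can be written in the following standard form. The numbering is an assumption: leaves are 1…n, internal nodes are n+1…2n-1 with the root at 2n-1, q is the distribution of the root base, and t_i is the length of the edge above node i (the root has no such edge).

```latex
% Joint probability of all node states, leaves and ancestors:
P(x^1,\dots,x^{2n-1} \mid T, t)
  = q_{x^{2n-1}} \prod_{i=1}^{2n-2} P\!\left(x^i \mid x^{\alpha(i)}, t_i\right)

% The likelihood of the observed leaves sums out the unknown ancestors:
P(x^1,\dots,x^{n} \mid T, t)
  = \sum_{x^{n+1},\dots,x^{2n-1}}
    q_{x^{2n-1}} \prod_{i=1}^{2n-2} P\!\left(x^i \mid x^{\alpha(i)}, t_i\right)
```

The sum has 4^(n-1) terms for DNA, which is what motivates a recursive evaluation.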
Felsenstein’s recursive algorithm
For each site u, define a table of probabilities Fk,a for all tree nodes k and input characters a:
Fk,a = probability of the data at site u in the subtree below node k, assuming the character at node k is a
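The recursion itself is not preserved in this text. A minimal sketch of the idea follows, assuming Jukes-Cantor substitution probabilities, a uniform root distribution, and a hypothetical tree encoding in which a leaf is an integer index into the sequence list and an internal node is a tuple (left, t_left, right, t_right).

```python
import math

BASES = "ACGT"


def jc_prob(b, a, t, alpha=1.0):
    """P(b | a, t) under the Jukes-Cantor model."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)


def conditional(node, site, seqs):
    """Return {a: F[k][a]} for the subtree rooted at node, at one site."""
    if isinstance(node, int):
        # Leaf: probability 1 for the observed base, 0 for the others.
        return {a: 1.0 if seqs[node][site] == a else 0.0 for a in BASES}
    left, t_left, right, t_right = node
    f_left = conditional(left, site, seqs)
    f_right = conditional(right, site, seqs)
    # Combine the two child subtrees, summing over the child bases.
    return {
        a: sum(jc_prob(b, a, t_left) * f_left[b] for b in BASES)
           * sum(jc_prob(c, a, t_right) * f_right[c] for c in BASES)
        for a in BASES
    }


def likelihood(tree, seqs):
    """Multiply the per-site likelihoods, summing over the root base at each site."""
    lik = 1.0
    for site in range(len(seqs[0])):
        f_root = conditional(tree, site, seqs)
        lik *= sum(0.25 * f_root[a] for a in BASES)
    return lik


# Three leaves: (leaf 0, leaf 1) joined first, then leaf 2 at the root.
example_tree = ((0, 0.1, 1, 0.2), 0.05, 2, 0.3)
print(likelihood(example_tree, ["ACGT", "ACGA", "TCGA"]))
```

Each node is visited once per site, so the cost is linear in the number of nodes rather than exponential in the number of ancestors.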
Overall algorithm:
• Enumerate each tree topology T
• For each T, search over sets of edge lengths t (using some n-dimensional optimisation technique)
• Run Felsenstein’s recursive algorithm for each site u and multiply the likelihoods
• Return the best T and t
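The smallest instance of the edge-length search can be sketched for two sequences, where the only topology is a single path and the only free parameter is its total length d. This assumes the Jukes-Cantor model and swaps the n-dimensional optimiser for a plain one-dimensional grid search; the function names are hypothetical.

```python
import math


def log_likelihood(x1, x2, d, alpha=1.0):
    """Log-likelihood of two aligned sequences separated by total edge length d."""
    e = math.exp(-4.0 * alpha * d)
    same = 0.25 * (1.0 + 3.0 * e)
    diff = 0.25 * (1.0 - e)
    # Each site contributes q_a * P(b | a, d) with a uniform q_a = 1/4.
    return sum(math.log(0.25 * (same if a == b else diff)) for a, b in zip(x1, x2))


def ml_distance(x1, x2, step=0.001, n_points=2000):
    """Grid search for the edge length that maximises the likelihood."""
    grid = [step * i for i in range(1, n_points + 1)]
    return max(grid, key=lambda d: log_likelihood(x1, x2, d))


x1, x2 = "ACGTACGTAC", "ACGTACGTTT"
best = ml_distance(x1, x2)
# Closed-form Jukes-Cantor estimate from the fraction p of differing sites.
p = sum(a != b for a, b in zip(x1, x2)) / len(x1)
closed_form = -0.25 * math.log(1.0 - 4.0 * p / 3.0)
print(best, closed_form)
```

For larger trees the same loop would call the recursive likelihood for each candidate set of edge lengths, inside an outer loop over topologies.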
Reversibility & independence of root position
• The score of the optimal tree is independent of the root position if and only if:
- the substitution matrix is multiplicative
- the substitution matrix is reversible
• A substitution matrix is reversible if for all a, b and t:
P(b | a, t) P(a) = P(a | b, t) P(b)
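A small numerical check of root independence, as a sketch: for two sequences under a multiplicative, reversible model, sliding the root along the path between them (keeping t1 + t2 fixed) should leave the likelihood unchanged. The code assumes the two-parameter Kimura model with a uniform root distribution; the names and rates are illustrative.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def kimura_prob(b, a, t, alpha=2.0, beta=1.0):
    """P(b | a, t) under the Kimura two-parameter model."""
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))
    if a == b:
        return 1.0 - 2.0 * s - u
    return u if (a in PURINES) == (b in PURINES) else s


def pair_likelihood(x1, x2, t1, t2):
    """Likelihood of two sequences with the root at distance t1 and t2 from them."""
    lik = 1.0
    for c1, c2 in zip(x1, x2):
        lik *= sum(0.25 * kimura_prob(c1, a, t1) * kimura_prob(c2, a, t2)
                   for a in BASES)
    return lik


# Three root positions on the same path of total length 0.4.
for t1, t2 in [(0.1, 0.3), (0.25, 0.15), (0.4, 0.0)]:
    print(t1, t2, pair_likelihood("ACGT", "AGGC", t1, t2))
```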
Demo
8.4 Using the likelihood for inference
Maximum likelihood:
• The best tree "could be" the tree that maximises the likelihood
• Computationally demanding
Sampling from the posterior distribution:
• We use Bayes’ rule to compute the posterior probability: P(tree | data) = P(data | tree) P(tree) / P(data)
• This is the probability of a model given the data
Example
Model name   Prior chance of model   Data
Model 1      10%                     100% A
Model 2      40%                     50% A, 50% B
Model 3      50%                     100% B
Applying Bayes’ rule to the example, assuming the observed data is a single A:
P(Model 1 | A) = P(A | Model 1) P(Model 1) / P(A) = (100% × 10%) / 30% ≈ 33%
where P(A) = 100% × 10% + 50% × 40% + 0% × 50% = 30%
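The three-model example can be checked with a few lines of code. This sketch assumes the priors are percentages (10%, 40%, 50%) and that a single A was observed; the function name is hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: posterior_i = prior_i * likelihood_i / sum_j(prior_j * likelihood_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(data), summed over all models
    return [j / evidence for j in joint]


priors = [0.10, 0.40, 0.50]        # Model 1, Model 2, Model 3
likelihood_of_A = [1.0, 0.5, 0.0]  # P(A | model)

for name, value in zip(("Model 1", "Model 2", "Model 3"),
                       posterior(priors, likelihood_of_A)):
    print(f"{name}: {value:.3f}")
```

Model 3 has the largest prior but cannot produce an A, so its posterior drops to zero.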
Metropolis algorithm
• It samples trees with probabilities given by their posterior distribution.
• It is a sampling procedure that generates a sequence of trees, each from the previous one.
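The acceptance rule can be illustrated on a toy problem. The sketch below is not a tree sampler: it runs the Metropolis rule on five abstract states arranged in a ring, with hypothetical unnormalised weights standing in for posterior probabilities and a symmetric proposal that steps to a neighbouring state.

```python
import random
from collections import Counter


def metropolis(weights, n_samples, seed=0):
    """Sample state indices with probability proportional to weights."""
    rng = random.Random(seed)
    n_states = len(weights)
    state = 0
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal: step to the left or right neighbour on the ring.
        proposal = (state + rng.choice((-1, 1))) % n_states
        # Accept with probability min(1, P(proposal) / P(current)).
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        samples.append(state)  # a rejected move repeats the current state
    return samples


weights = [1.0, 2.0, 4.0, 2.0, 1.0]
samples = metropolis(weights, 200_000)
counts = Counter(samples)
for s, w in enumerate(weights):
    print(s, counts[s] / len(samples), w / sum(weights))
```

Only ratios of weights are needed, which is why the normalising constant P(data) never has to be computed.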
A proposal distribution
[Figure: tree with nodes numbered 1–8; axes: time from root, order of traversal]
Other phylogenetic uses of sampling
[Figure, built up over four slides: tree labelled with the sequences AAAA, AATC, AATC, TCAA, AATT, TTAA, TCAA]
• Inferring the history of populations
Probability density of a coalescence in time: [equation not preserved]
Probability of a coalescence between any pair: [equation not preserved]
When n is large and p is close to 0, the binomial distribution with parameters n and p can be approximated by a Poisson distribution with mean n*p.
Here n*p = [equation not preserved] and x = 1.
The probability of a coalescence at the end of the period tk: [equation not preserved]
The total probability of the tree: [equation not preserved]
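The Poisson approximation can be checked numerically. The values of n and p below are arbitrary illustrations, not the quantities from the slide, and x = 1 matches the single-event case used in the text.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(x, mean):
    """P(X = x) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


n, p, x = 1000, 0.002, 1
exact = binomial_pmf(x, n, p)
approx = poisson_pmf(x, n * p)
print(exact, approx, abs(exact - approx))
```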
The bootstrap
• The bootstrap can give an approximation to the posterior.
• It takes too much labour, so it is an unattractive alternative to sampling.
• The bootstrap is probably more useful for non-probabilistic tree-building methods.
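The resampling step of the bootstrap can be sketched as follows: columns of the alignment are drawn with replacement to build a replicate data set of the same size. The function name is hypothetical, and the tree-building step that would be run on each replicate is not shown.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    length = len(seqs[0])
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column choices are applied to every sequence to keep sites aligned.
    return ["".join(s[c] for c in columns) for s in seqs]


alignment = ["AATC", "AATT", "TTAA", "TCAA"]
rng = random.Random(42)
for _ in range(3):
    print(bootstrap_alignment(alignment, rng))
```

The labour mentioned above comes from rebuilding a tree for every replicate, typically hundreds of times.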
Demo
Conclusion
• The methods presented today can be used to find the most probable tree.
• Most of the methods are computationally demanding.
• More realistic evolutionary models are explained on Thursday.
Questions?