
Bayesian inference
calculate the model parameters that produce a distribution that gives the observed data the greatest probability
Thomas Bayes
Thomas Bayes? (1701?-1761?)
Bayesian methods were invented in the 18th century, but their application in phylogenetics dates from 1996.
Bayes’ theorem links a conditional probability to its inverse
Bayes’ theorem

Prob(H|D) = Prob(H) · Prob(D|H) / ∑H Prob(H) · Prob(D|H)
in the case of two alternative hypotheses, the theorem can be written as
Bayes’ theorem

Prob(H|D) = Prob(H) · Prob(D|H) / ∑H Prob(H) · Prob(D|H)

Prob(H1|D) = Prob(H1) · Prob(D|H1) / [Prob(H1) · Prob(D|H1) + Prob(H2) · Prob(D|H2)]
Bayes for smarties
Bayes’ theorem
[figure: two bags of smarties, one mainly orange and one mainly blue, and a sample D of five smarties drawn from one of them]

H1 = D came from the mainly orange bag
H2 = D came from the mainly blue bag

Prob(D|H1) = ¾ · ¾ · ¾ · ¾ · ¼ · 5 = 405/1024
Prob(D|H2) = ¼ · ¼ · ¼ · ¼ · ¾ · 5 = 15/1024
Prob(H1) = ½
Prob(H2) = ½

Prob(H1|D) = Prob(H1) · Prob(D|H1) / [Prob(H1) · Prob(D|H1) + Prob(H2) · Prob(D|H2)]
           = (½ · 405/1024) / (½ · 405/1024 + ½ · 15/1024)
           = 0.964
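As an illustration (not part of the slides), the smartie calculation can be written out in a few lines of Python; the factor 5 is the binomial coefficient for the position of the single blue smartie, and the bag compositions (¾ vs ¼ orange) are taken from the factors above:

    from math import comb

    p_orange_H1 = 3 / 4       # mainly orange bag
    p_orange_H2 = 1 / 4       # mainly blue bag
    n_orange, n_blue = 4, 1   # the observed draw D

    def lik(p_orange):
        """Prob(D | bag): binomial probability of drawing 4 orange and 1 blue smartie."""
        return comb(n_orange + n_blue, n_blue) * p_orange ** n_orange * (1 - p_orange) ** n_blue

    lik_H1, lik_H2 = lik(p_orange_H1), lik(p_orange_H2)   # 405/1024 and 15/1024
    prior_H1 = prior_H2 = 1 / 2
    post_H1 = prior_H1 * lik_H1 / (prior_H1 * lik_H1 + prior_H2 * lik_H2)
    print(post_H1)   # ≈ 0.964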
a priori knowledge can affect one’s conclusions
Bayes’ theorem

                   positive test result    negative test result
ill                true positive           false negative
healthy            false positive          true negative

P(test result | health status):
                   positive test result    negative test result
ill                99%                     1%
healthy            0.1%                    99.9%

using the data only, P(ill | positive test result) ≈ 0.99
a priori knowledge can affect one’s conclusions
Bayes’ theorem

P(test result | health status):
                   positive test result    negative test result
ill                99%                     1%
healthy            0.1%                    99.9%

a priori knowledge: 0.1% of the population (n = 100 000) is ill

                   positive test result    negative test result
ill (100)          99                      1
healthy (99 900)   100                     99 800

with a priori knowledge, 99/199 of the persons with a positive test result are ill:
P(ill | positive result) ≈ 50%
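A short Python sketch (mine, not the slides’) of the same numbers, first as Bayes’ theorem with the 0.1% prevalence as prior, then as counts in a population of 100 000:

    sensitivity = 0.99        # P(positive | ill)
    false_positive = 0.001    # P(positive | healthy)
    prevalence = 0.001        # a priori knowledge: 0.1% of the population is ill

    p_ill_given_pos = sensitivity * prevalence / (
        sensitivity * prevalence + false_positive * (1 - prevalence))
    print(p_ill_given_pos)    # ≈ 0.498, i.e. about 50%

    population = 100_000
    ill, healthy = population * prevalence, population * (1 - prevalence)
    true_pos, false_pos = ill * sensitivity, healthy * false_positive
    print(true_pos, false_pos, true_pos / (true_pos + false_pos))   # ≈ 99 and ≈ 100 positives; 99/199 ≈ 0.50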
a priori knowledge can affect one’s conclusions
Bayes’ theorem
Behind door 1   Behind door 2   Behind door 3   Result if staying at door 1   Result if switching to the door offered
Car             Goat            Goat            Car                           Goat
Goat            Car             Goat            Goat                          Car
Goat            Goat            Car             Goat                          Car
a priori knowledge can affect one’s conclusions
Bayes’ theorem
C = number of the door hiding the car
S = number of the door selected by the player
H = number of the door opened by the host

P(C=c | H=h, S=s) = P(H=h | C=c, S=s) · P(C=c | S=s) / P(H=h | S=s)

the probability of finding the car behind door c, after the original selection and the host’s opening of one door
a priori knowledge can affect one’s conclusions
Bayes’ theorem
C = number of the door hiding the car
S = number of the door selected by the player
H = number of the door opened by the host

P(C=c | H=h, S=s) = P(H=h | C=c, S=s) · P(C=c | S=s) / ∑c P(H=h | C=c, S=s) · P(C=c | S=s)
(summing over the three doors c = 1, 2, 3)

the host’s behaviour depends on the candidate’s selection and on where the car is
Bayes’ theorem
a priori knowledge can affect one’s conclusions
C = number of the door hiding the car
S = number of the door selected by the player
H = number of the door opened by the host

P(C=2 | H=3, S=1) = (1 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 2/3
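The same calculation in a small Python sketch (illustrative, not from the slides); the host’s behaviour (never opening the selected door or the door with the car, and choosing at random when two doors remain) is encoded in a hypothetical helper p_host_opens:

    # P(H = h | C = c, S = s)
    def p_host_opens(h, c, s):
        if h == s or h == c:
            return 0.0                      # the host never opens the selected door or the car's door
        return 0.5 if c == s else 1.0       # free choice only if the player happened to pick the car

    prior = {1: 1/3, 2: 1/3, 3: 1/3}        # P(C = c | S = s)
    s, h = 1, 3                             # the player selects door 1, the host opens door 3

    denom = sum(p_host_opens(h, c, s) * prior[c] for c in (1, 2, 3))
    posterior = {c: p_host_opens(h, c, s) * prior[c] / denom for c in (1, 2, 3)}
    print(posterior)                        # door 1: 1/3, door 2: 2/3, door 3: 0, so switching to door 2 helps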
Bayes’ theorem is used to combine a prior probability with the likelihood to produce a posterior probability.
Bayes’ theorem

Prob(H|D) = Prob(H) · Prob(D|H) / ∑H Prob(H) · Prob(D|H)

Prob(H): prior probability
Prob(D|H): likelihood
Prob(H|D): posterior probability
∑H Prob(H) · Prob(D|H): normalizing constant
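The same recipe as a small, generic Python helper (my sketch, not from the slides): multiply prior by likelihood for every hypothesis and divide by the normalizing constant:

    def posterior(priors, likelihoods):
        """priors and likelihoods map each hypothesis H to Prob(H) and Prob(D|H)."""
        unnormalized = {H: priors[H] * likelihoods[H] for H in priors}
        norm = sum(unnormalized.values())           # the normalizing constant
        return {H: v / norm for H, v in unnormalized.items()}

    # the smarties example from above
    print(posterior({"H1": 0.5, "H2": 0.5}, {"H1": 405/1024, "H2": 15/1024}))   # H1 ≈ 0.964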
Bayesian inference of trees
in BI, the players are the tree topology and branch lengths, the evolutionary model, and the (sequence) data
[figure: a tree topology with branch lengths, an evolutionary model of substitutions among A, G, C, T, and the (sequence) data]
the posterior probability of a tree is calculated from the prior and the likelihood
Bayesian inference of trees

Prob(tree | data) = Prob(tree) · Prob(data | tree) / Prob(data)

Prob(tree | data): posterior probability of a tree
Prob(tree): prior probability of a tree
Prob(data | tree): likelihood
(the calculation requires summation over all possible branch lengths and model parameter values)
the prior probability of a tree is often not known and therefore all trees are considered equally probable
Bayesian inference of trees
[figure: the 15 possible unrooted trees for the five taxa A–E, each with prior probability 1/15]

prior probability: Prob(Tree i)
likelihood: Prob(Data | Tree i)
posterior probability: Prob(Tree i | Data)
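A toy Python sketch of this setup (the likelihood values are invented for illustration; in a real analysis they come from the evolutionary model and the sequence data):

    trees = [f"tree {i}" for i in range(1, 16)]        # the 15 unrooted trees for taxa A–E
    prior = {t: 1 / len(trees) for t in trees}         # flat prior: Prob(Tree i) = 1/15

    lik = {t: 1e-50 for t in trees}                    # hypothetical Prob(Data | Tree i)
    lik["tree 2"], lik["tree 3"] = 5e-49, 2e-49

    norm = sum(prior[t] * lik[t] for t in trees)       # Prob(Data)
    post = {t: prior[t] * lik[t] / norm for t in trees}
    print(post["tree 2"], post["tree 3"])              # ≈ 0.60 and ≈ 0.24 under the flat prior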
Bayesian inference of trees
the prior probability of a tree is often not known and therefore all trees are considered equally probable,
but prior knowledge of taxonomy could suggest other prior probabilities
Bayesian inference of trees
(CDE) constrained:
[figure: the same 15 unrooted trees for taxa A–E; the three trees containing the (C,D,E) group each get prior probability 1/3, all other trees get prior probability 0]
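Continuing the toy sketch above (tree identities and likelihoods remain invented), a constrained prior simply sets the prior of incompatible trees to zero, which removes them from the posterior no matter how good their likelihood is:

    trees = [f"tree {i}" for i in range(1, 16)]
    cde_trees = {"tree 1", "tree 2", "tree 12"}        # hypothetical: the 3 trees containing the (C,D,E) group
    prior = {t: (1/3 if t in cde_trees else 0.0) for t in trees}

    lik = {t: 1e-50 for t in trees}                    # same invented likelihoods as before
    lik["tree 2"], lik["tree 3"] = 5e-49, 2e-49

    norm = sum(prior[t] * lik[t] for t in trees)
    post = {t: prior[t] * lik[t] / norm for t in trees}
    print(post["tree 2"], post["tree 3"])              # ≈ 0.96 and 0.0: tree 3 is excluded by its zero prior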
BI requires summation over all possible trees … which is impossible to do analytically
Bayesian inference of trees

Prob(tree | data) = Prob(tree) · Prob(data | tree) / Prob(data)

(plus summation over all possible branch lengths and model parameter values)
Bayesian inference of trees
but Markov chain Monte Carlo (MCMC) allows approximating the posterior probability
[figure: posterior probability density over parameter space, with peaks corresponding to tree 1, tree 2 and tree 3]

1. Start at a random point
2. Make a small random move
3. Calculate the ratio r of the posterior densities of the new and the old state
4. If r > 1, always accept the move
   If r < 1, accept the move with a probability ~ 1/distance
   (a small step downhill is perhaps accepted, a large one rarely)
5. Go to step 2

the proportion of time that the MCMC spends in a particular parameter region is an estimate of that region’s posterior probability
[figure: e.g. 20% of the samples fall in the region of tree 1, 48% in that of tree 2, and 32% in that of tree 3]
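A minimal Metropolis sketch in Python (my illustration, not the lecturer’s code) on a toy one-dimensional “parameter space” whose unnormalized posterior density has three peaks standing in for tree 1, tree 2 and tree 3; downhill moves are accepted with probability r, the textbook version of “the further downhill, the less likely”:

    import math
    import random

    def density(x):
        """Toy unnormalized posterior density with three peaks ('tree 1', 'tree 2', 'tree 3')."""
        peaks = [(-3.0, 0.20), (0.0, 0.48), (3.0, 0.32)]        # peak position and weight
        return 1e-300 + sum(w * math.exp(-0.5 * ((x - m) / 0.5) ** 2) for m, w in peaks)

    random.seed(1)
    x = random.uniform(-6.0, 6.0)                               # 1. start at a random point
    samples = []
    for _ in range(200_000):
        y = x + random.gauss(0.0, 1.5)                          # 2. make a small random move
        r = density(y) / density(x)                             # 3. posterior density ratio new/old
        if r > 1 or random.random() < r:                        # 4. uphill: always; downhill: with probability r
            x = y
        samples.append(x)                                       # 5. go to step 2

    # the proportion of time spent around each peak estimates that region's posterior probability
    for name, centre in [("tree 1", -3.0), ("tree 2", 0.0), ("tree 3", 3.0)]:
        share = sum(abs(s - centre) < 1.5 for s in samples) / len(samples)
        print(name, round(share, 2))                            # roughly 0.20, 0.48 and 0.32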
Bayesian inference of trees
Metropolis-coupled Markov chain Monte Carlo speeds up the search
cold chain: P(tree | data)
hot chain: P(tree | data)^b
hotter chain: P(tree | data)^b
hottest chain: P(tree | data)^b
(0 < b < 1; the smaller b, the flatter the posterior surface)
Bayesian inference of trees
Metropolis-coupled Markov chain Monte Carlo speeds up the search
[figure: a cold scout stuck on a local optimum; a hot scout signals a better spot: “Hey! Over here!”]
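A compact sketch of the idea (again my illustration, on the same toy density as above): every chain runs Metropolis on P^b with its own b, neighbouring chains occasionally propose to swap their current states, and only the cold chain (b = 1) is recorded, so a hot “scout” that has found a better peak can hand its position to the cold chain:

    import math
    import random

    def density(x):
        """Same toy unnormalized posterior density with three peaks as in the MCMC sketch."""
        peaks = [(-3.0, 0.20), (0.0, 0.48), (3.0, 0.32)]
        return 1e-300 + sum(w * math.exp(-0.5 * ((x - m) / 0.5) ** 2) for m, w in peaks)

    random.seed(2)
    betas = [1.0, 0.5, 0.25, 0.1]                       # cold chain first, then increasingly heated chains
    states = [random.uniform(-6.0, 6.0) for _ in betas]
    cold_samples = []

    for _ in range(100_000):
        # one Metropolis update per chain, on the heated density density(x) ** b
        for i, b in enumerate(betas):
            y = states[i] + random.gauss(0.0, 1.5)
            r = (density(y) / density(states[i])) ** b
            if r > 1 or random.random() < r:
                states[i] = y
        # propose to swap the states of two neighbouring chains
        i = random.randrange(len(betas) - 1)
        xi, xj, bi, bj = states[i], states[i + 1], betas[i], betas[i + 1]
        swap_r = (density(xj) ** bi * density(xi) ** bj) / (density(xi) ** bi * density(xj) ** bj)
        if swap_r > 1 or random.random() < swap_r:
            states[i], states[i + 1] = xj, xi
        cold_samples.append(states[0])                  # only the cold chain is used for inference

    for name, centre in [("tree 1", -3.0), ("tree 2", 0.0), ("tree 3", 3.0)]:
        share = sum(abs(s - centre) < 1.5 for s in cold_samples) / len(cold_samples)
        print(name, round(share, 2))                    # roughly 0.20, 0.48 and 0.32, with better mixing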