
Searching tree space.
Tree confidence and comparison.
Introduction to Bayesian methods.
How do we search tree space?




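Search algorithms look for the best tree for the data.
Two methods are guaranteed to find the globally best tree:
• Exhaustive search: evaluates every single tree.
• Branch and bound: once a partial tree already scores worse than the best complete tree found so far, the whole set of trees built from it can be discarded without being evaluated.
Branch and bound is much faster than exhaustive search.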
Searching methods
For ML and MP trees.
Figures from Felsenstein’s Inferring Phylogenies.
How many possible trees?
Too many for computational tractability.
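To make "too many" concrete, the sketch below counts strictly bifurcating trees using the standard double-factorial formulas: (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n taxa. The function names are illustrative only.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, strictly bifurcating trees: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        return 1
    count = 1
    # Multiply the odd numbers 3 * 5 * ... * (2n - 5).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def num_rooted_trees(n_taxa: int) -> int:
    """Number of rooted, strictly bifurcating trees: (2n-3)!! for n >= 2.

    Rooting a tree of n taxa is equivalent to adding one extra "taxon".
    """
    return num_unrooted_trees(n_taxa + 1)


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, num_unrooted_trees(n))
```

With only 10 taxa there are already about two million unrooted trees, which is why exhaustive search stops being practical so quickly.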
Branch Swapping Strategies
Global vs Heuristic Search
• Most data sets are large.
• No possibility of exploring all trees.
• Currently few packages even offer the global (exact) search options; PAUP4 is an exception.
• Goal: balance a thorough search against tractability and speed.
Heuristic Search Strategies
1. NNI: nearest neighbor interchange.
Dissolve an interior branch and form each alternative arrangement of the four subtrees (S, T, U, V) around it.
2. SPR: subtree pruning and reconnection.
Break a branch off and reconnect the “root” somewhere else (any other branch).
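As a minimal sketch of the NNI step, the four subtrees around an interior branch can be treated as opaque labels. If the current arrangement is ((S, T), (U, V)), dissolving the branch leaves exactly two alternatives. The function names and tuple representation are assumptions made for illustration, not any package's API.

```python
def nni_alternatives(s, t, u, v):
    """Return the two NNI alternatives to the arrangement ((s, t), (u, v)).

    s, t, u, v stand for the four subtrees that meet at the two ends of
    one interior branch; they can be any objects (labels, nested tuples).
    """
    return [((s, u), (t, v)), ((s, v), (t, u))]


def nni_neighbourhood_size(n_taxa: int) -> int:
    """Number of NNI neighbours of an unrooted binary tree with n taxa.

    Such a tree has n - 3 interior branches, each giving 2 alternatives.
    """
    return 2 * (n_taxa - 3)


if __name__ == "__main__":
    print(nni_alternatives("S", "T", "U", "V"))
    print(nni_neighbourhood_size(10))
```

Because each interior branch yields only two alternatives, an NNI search looks at very few neighbours per step, which is why it is fast but easily trapped.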
Heuristic Search Strategies
3. TBR: tree bisection and reconnection.
Break an interior branch so that the tree splits into two subtrees. Then reattach them by joining a branch of one subtree to a branch of the other.
Which Branch Swapping to Use?
NNI --> SPR --> TBR
• NNI and SPR rearrangements are subsets of TBR rearrangements.
• Increasingly accurate: TBR is best.
• Decreasingly fast: TBR is slowest.
• PAUP has all three methods; PhyML has just NNI and SPR.
But all can get stuck on local optima (local maxima rather than the global maximum).
Traveling through tree space
Traveling through tree space by accepting increasingly better trees (hill climbing) leads toward the Maximum Likelihood tree, or the Maximum Parsimony tree.
But all can get stuck on local optima.
Starting tree influences outcome.
If you start on the slope of a local maximum, you end up on that local maximum. Only if you start on the slope of the global maximum do you end up on the global maximum.
Two effective methods:
Sequential addition of taxa.
Felsenstein recommendation: list the best resolved taxa first, then add increasingly unresolved relationships.
Best known first:
• Requires knowledge of the biology (oh dear!)
• Must know what the question is.
• Must limit the question that you are asking.
Or use Stepwise Addition: multiple starting points.
Suppose you start at multiple random trees to increase the chance of covering tree space. This gives a better chance of reaching the global maximum.
Fastest: start with an MP or NJ (distance) tree.
Use a quick method. Hope it lands near the global max. No guarantees but usually OK.
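The effect of the starting point can be sketched on a toy one-dimensional "landscape" that stands in for tree space. The landscape values, the neighbour rule and all function names below are hypothetical. A real search would score trees and use NNI, SPR or TBR neighbours instead.

```python
# Hypothetical scores: a local maximum at index 2, the global maximum at index 6.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 7, 3, 1]


def score(i):
    return LANDSCAPE[i]


def neighbours(i):
    """Positions one step to the left or right (stand-in for branch swaps)."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


def hill_climb(start):
    """Move to the best neighbour until no neighbour is strictly better."""
    current = start
    while True:
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current
        current = best


def multi_start(starts):
    """Run one hill climb per starting point and keep the best end point."""
    return max((hill_climb(s) for s in starts), key=score)


if __name__ == "__main__":
    print(hill_climb(0))           # stuck on the local maximum
    print(hill_climb(9))           # reaches the global maximum
    print(multi_start([0, 3, 8]))  # several starts: best end point wins
```

A single climb from position 0 stops on the local peak, while adding more starting points makes it likelier that at least one of them lies on the slope of the global peak.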
Tree confidence measures.
Branch support (nonparametric bootstrap) and tree comparisons.

Bootstrap (nonparametric)
• Alignment sites are resampled with replacement, 1000 times (at least).
• Search for the best tree for each of the 1000 replicate alignments.
• The bootstrap consensus is the majority-rule consensus of the 1000 trees from the 1000 replicates.
• Branches are labeled by % occurrence in the 1000 trees.
Felsenstein, J. 1985. Evolution 39: 783-791.
Invented by Bradley Efron in the 1970s.
Adapted for phylogenies by Felsenstein in 1985.
Most commonly used measure of tree support.
For MP, distance and ML methods.
Unnecessary for Bayesian methods.
The bootstrap (nonparametric)
Used in statistics to obtain confidence levels when the data distribution is unknown (Efron, 1979).
E.g. suppose the data are not normally distributed. Then pseudosamples provide an estimate of the variation.
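Outside phylogenetics, the same idea can be sketched in a few lines: resample the data with replacement many times, recompute the statistic each time, and use the spread of those values as the variation estimate. The data values and the function name `bootstrap_se` are made up for illustration.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_reps=1000, seed=0):
    """Bootstrap standard error of `statistic` computed on `data`.

    Each pseudosample has the same size as the data and is drawn
    with replacement.
    """
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(data, k=len(data))) for _ in range(n_reps)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    skewed = [1, 2, 2, 3, 3, 3, 4, 10, 25, 40]  # clearly not normal
    print(bootstrap_se(skewed, statistics.median))
```

No distributional assumption is needed here, which is the appeal of the nonparametric bootstrap.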
The Bootstrap Replicate
The bootstrap samples the sites along the original sequence alignment with replacement.
• So each replicate has the same length as the original alignment.
• Some sites are randomly sampled multiple times, while others are randomly omitted.

Example: 5 replicate alignments
Resample, with replacement, all sites along the alignment (Sample 1, Sample 2, and so on).
Some sites are sampled twice or more and some not at all.
… and so on to 1000 total samples, then infer 1 best tree for each replicate alignment.
Notice that it is the sequences that are bootstrapped, not the tree.
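A minimal sketch of building one bootstrap replicate follows. It assumes the alignment is held as a dictionary from taxon name to an aligned sequence string. That representation and the function name are illustrative choices, not any package's format.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate of an alignment.

    Columns (sites) are drawn with replacement, and the same column
    indices are used for every taxon so that each site stays intact.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}  # made-up data
    rng = random.Random(1)
    for _ in range(5):
        print(bootstrap_alignment(toy, rng))
```

Each replicate would then be passed to the same tree search used for the original data. That step is left out here.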
Example: 5 replicate sequences
Collection of trees built from the replicated data.
Count the number of times each partition occurs in all the trees.
Any partition that occurs in more than 50% of trees shows up in the majority rule consensus tree - the “bootstrap tree”.
3/5 = 60% of the trees have the E-A clade.
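The counting step can be sketched as follows, with each tree reduced to the set of clades it contains. The five trees below are hypothetical, chosen only so that the E-A clade appears in 3 of 5. The representation and function name are assumptions.

```python
from collections import Counter


def majority_rule(trees):
    """Return {clade: support %} for clades found in more than 50% of trees.

    Each tree is given as a set of clades; each clade is a frozenset of
    taxon names.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {
        clade: 100.0 * c / n for clade, c in counts.items() if c / n > 0.5
    }


# Five hypothetical bootstrap trees on taxa A-E.
example_trees = [
    {frozenset("AE"), frozenset("AEB")},
    {frozenset("AE"), frozenset("CD")},
    {frozenset("AE"), frozenset("AEB")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("BE"), frozenset("AEB")},
]

if __name__ == "__main__":
    for clade, pct in majority_rule(example_trees).items():
        print(sorted(clade), pct)
```

Clades seen in half the trees or fewer are simply dropped, which is what produces polytomies in the consensus.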
Bootstrap
[Figure: bootstrap tree of mammal taxa (mustelids, otters, seals, bears, horses and rhinos, even-toed ungulates, whales and others); each resolved branch is labeled with its bootstrap percentage, from 50 to 100.]
Bootstrap tree
Pseudosampling the sequence.
Idea: if many sites support a clade then it will appear in most random replicates.
If just one site supports it, many replicates will lack this site.
Many clades may be unresolved: all sorts of polytomies.
These mean that fewer than 50% of the bootstrap trees support any particular clade.
(Bayes posterior probabilities have a similar meaning.)
Bootstrap tree
Note the multiple tests problem: each clade is a separate hypothesis.
Note that high bootstrap support will be misleading if model assumptions are violated.
The bootstrap is not a way to check the fit of the model.
Bootstrap tree
Simply an accounting of how many bootstrap replicates support a particular clade.
Branch lengths cannot be represented in the consensus tree itself.
ML tree with bootstrap values
A bootstrap topology need not match the ML topology.
E.g. this clade is not in the bootstrap consensus tree.
Convention: published trees
A maximum likelihood tree with bootstrap values added in Intaglio or Illustrator or Word (not so great).
Also, programs like FigTree will display the bootstrap values.
In short:
• Rare, but informative, sites are only rarely sampled and so do not show up in all bootstrap replicates.
• Hence the clades supported by just a very few sites will not be resolved.
• This is the point: high bootstrap values show that many sites support the clade.

Comparing alternative trees
Do the trees have significantly different tree scores?
Statistical tests to compare alternative trees.
Kishino-Hasegawa (KH) test
Simply a paired t-test comparing two trees.
• Calculate the difference in log-likelihood between the two trees at each site (pair-wise site differences).
• Sum the differences over all sites to get ΔlnL.
• Calculate the standard error (SE) of ΔlnL from the pair-wise site differences.
• If |ΔlnL| / SE > 1.96, then p ≤ 0.05: significantly different trees.
Kishino & Hasegawa. 1989. J. Mol. Evol. 29:170-179
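The KH arithmetic can be sketched as below. It assumes the per-site log-likelihoods of the two trees are already available, takes the standard error of the summed difference as sqrt(n) times the sample standard deviation of the site differences, and uses the normal approximation behind the 1.96 cutoff. The function name and the numbers in the example are hypothetical.

```python
import math
import statistics


def kh_test(site_lnl_a, site_lnl_b):
    """Return (delta_lnL, SE, z, significant_at_0.05) for two trees.

    site_lnl_a and site_lnl_b hold the per-site log-likelihoods of the
    same alignment under tree A and tree B.
    """
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    delta = sum(diffs)                           # total lnL difference
    se = math.sqrt(n) * statistics.stdev(diffs)  # SE of the summed difference
    if se == 0:
        raise ValueError("site differences have no variation")
    z = delta / se
    return delta, se, z, abs(z) > 1.96


if __name__ == "__main__":
    tree_a = [-1.0, -2.0, -1.5, -3.0]  # made-up per-site lnL values
    tree_b = [-1.5, -2.5, -2.5, -3.5]
    print(kh_test(tree_a, tree_b))
```

With only four sites the normal approximation is of course not trustworthy. The tiny example is there just to make the arithmetic easy to follow.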
Shimodaira-Hasegawa (SH) test
• A newer variant of the KH test that corrects for multiple tests and some bias.
• The KH test should also be corrected for multiple tests (the critical value is 0.05 / number of trees tested).
• For both, use RELL-calculated p-values (RELL: Resampling Estimated Log Likelihood).
• For both, use a one-sided test if the ML tree is one of the trees.
– Shimodaira & Hasegawa. 1999. MBE 16(8):1114-1116
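The RELL idea is to resample the per-site log-likelihoods rather than re-estimating each tree on every bootstrap replicate. The sketch below shows only that resampling step, reporting how often each candidate tree has the highest resampled total. It is not the full SH test, which additionally centers the resampled scores before computing p-values. The function name and inputs are hypothetical.

```python
import random


def rell_proportions(site_lnl, n_reps=1000, seed=0):
    """Fraction of RELL replicates in which each tree scores best.

    site_lnl maps a tree name to its list of per-site log-likelihoods;
    all lists must cover the same sites in the same order.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(n_reps):
        # Resample site indices with replacement, reuse them for every tree.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][s] for s in sites) for t in names}
        wins[max(totals, key=totals.get)] += 1
    return {t: w / n_reps for t, w in wins.items()}


if __name__ == "__main__":
    toy = {  # made-up per-site lnL values for two trees
        "tree1": [-1.0, -3.0, -2.0, -1.5],
        "tree2": [-2.0, -1.0, -2.5, -1.0],
    }
    print(rell_proportions(toy))
```

Because no likelihood is re-optimized, thousands of replicates cost little more than summing lists.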
Here I compared 10 trees. Four were statistically poorer than the ML tree.
Likelihood vs. Bayesian methods
Bayesian methods I
Likelihood:
• L = Pr(D|H)
• The (joint) probability of the data (D) given the hypothesis (H).
• H may be a tree or a branch length or a model parameter.
• D is the sequence of nucleotides.
Bayesian adds a prior:
• Pr(H|D) = Pr(D|H) Pr(H) / Pr(D)
• The probability of the hypothesis (H) given the data (D).
• Proportional to the product of the likelihood and the prior.
• Typically uses Markov chain Monte Carlo (MCMC) to search tree space.
Bayesian
Bayesian adds a prior:
Pr(H|D) = Pr(D|H) Pr(H) / Pr(D)
• Pr(D|H): the likelihood.
• Pr(H): the prior.
• Pr(D): the probability of the data over all trees.
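A small worked example of the formula, assuming only three candidate trees so that Pr(D) can be summed directly. The likelihood values and the flat prior are invented for illustration. With realistic numbers of trees this sum is not feasible, which is what motivates MCMC.

```python
def posterior(likelihoods, priors):
    """Apply Bayes' rule over a small, fully enumerated set of hypotheses.

    likelihoods[h] is Pr(D|h) and priors[h] is Pr(h); the denominator
    Pr(D) is the sum of Pr(D|h) * Pr(h) over all hypotheses.
    """
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    evidence = sum(joint.values())  # Pr(D)
    return {h: j / evidence for h, j in joint.items()}


if __name__ == "__main__":
    lik = {"T1": 0.2, "T2": 0.1, "T3": 0.1}  # hypothetical Pr(D|tree)
    flat = {t: 1 / 3 for t in lik}           # flat prior over the trees
    print(posterior(lik, flat))
```

With a flat prior the posterior is just the likelihood rescaled to sum to one. A non-flat prior would shift weight toward the trees it favors.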
Bayesian analysis and MCMC
Markov chain Monte Carlo
• combines parameter estimation with the tree search algorithm
• (integrates over tree and parameter space)
Whereas conventional likelihood
• conditions the tree search on parameters estimated earlier from a pretty good tree
• (searches over tree space only)
Search method
Bayesian simultaneously estimates parameters and trees: the model parameters (ω) are allowed to vary.
Maximum likelihood fixes the parameters and then estimates the tree: the model parameters (ω) are estimated first and then fixed.



Markov Chain Monte Carlo (MCMC)
• Simulates a walk through parameter and tree space.
• Analogous to the Maximum Likelihood heuristic search: “hill climbing” through tree space to find the highest likelihood tree.
Thanks to Mark Holder for portions of the following slides, from the Workshop on Molecular Evolution, Woods Hole, MA, July 2003.
Lewis, Paul: Tuesday’s reading.
Begins with a wander through space.
Avoids getting stuck on local optima.
Early steps are discarded: burn-in.
Metropolis algorithm for MCMC
• Propose a new location.
• Calculate the height of the new location.
• R = new height / old height.
• Move with a probability that is a function of R.
• Always move if R > 1.
• If R < 1, then the probability of the move = R.
MrBayes MCMC search
R = ratio of the new tree height to the present one.
Moves to the next tree if R > 1.
If R < 1, moves with probability R: for example with high probability (0.92) or with low probability (0.03).
The Process
• Initial tree and model parameters (may be random).
• Accept new move? MCMC rules: accept or reject.
• Run millions of generations.
• Save tree, branch lengths, parameters every k generations.
• After n generations, summarize results.
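The process above can be sketched on a toy problem: five states on a ring stand in for trees, with made-up "heights" in place of posterior densities. The sketch assumes a symmetric proposal, which is what allows the acceptance probability to be simply R. All names and numbers are illustrative.

```python
import random
from collections import Counter

HEIGHTS = [1.0, 2.0, 4.0, 2.0, 1.0]  # hypothetical heights of five states


def height(state):
    return HEIGHTS[state]


def propose(state, rng):
    """Symmetric proposal: one step left or right on a ring of states."""
    return (state + rng.choice((-1, 1))) % len(HEIGHTS)


def metropolis(start, n_generations, sample_every, burn_in, seed=0):
    rng = random.Random(seed)
    current = start
    samples = []
    for gen in range(1, n_generations + 1):
        candidate = propose(current, rng)
        r = height(candidate) / height(current)  # R = new / old height
        if r >= 1 or rng.random() < r:           # else move with prob. R
            current = candidate
        # Discard the burn-in, then save every k-th generation.
        if gen > burn_in and gen % sample_every == 0:
            samples.append(current)
    return samples


if __name__ == "__main__":
    kept = metropolis(start=0, n_generations=50000, sample_every=10, burn_in=1000)
    print(Counter(kept))
```

After burn-in, the share of samples in each state approximates that state's share of the total height, so the summary is a frequency count rather than a single best point.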

How many generations?
Exploring tree and parameter space: some big steps help explore other locally optimal hills.
Very often: tens of millions of generations.
Correlations in parameter space: base frequency estimates change as Ts/Tv changes.
[Figure: parameter estimates for the base frequencies of A, C, G and T (axis 0.00 to 0.60) and for ts/tv (axis 0 to 20).]
E.g. base frequency depends on the Ts/Tv ratio.
[Figure: base frequency of C plotted against the Ts/Tv ratio, showing a large step across a narrow, correlated parameter space.]
A small change in C will send the Ts/Tv ratio right out of the optimal zone.
So we use MCMCMC.

Metropolis-coupled Markov chain Monte Carlo
• Run at least 4 chains simultaneously.
• One chain, the cold chain, explores with relatively short steps.
• The others, the heated chains, explore with big steps: they cover much more of the tree space.
Advantages of MCMCMC
Metropolis-coupled Markov Chain Monte Carlo: MC3
• Run multiple (at least four) chains simultaneously.
• The cold chain is the main chain: the one that shows up in the buffer and on the output. Its short steps explore parameter space better, but may miss the globally optimal hill.
• 3 heated chains take bigger steps across posterior probability hills. They may miss optimal parameter space but cover tree space more thoroughly.
• A heated chain sometimes swaps with the cold chain if the hot chain finds better space.
Hot chains may become the cold chain.
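A toy sketch of chain coupling follows. It assumes one common formulation in which a heated chain samples the height raised to a power beta < 1, which flattens the landscape so that valleys are easier to cross, and in which a swap between chains i and j is accepted with probability min(1, (h_j / h_i)^(beta_i - beta_j)). The landscape, the beta values and all names are hypothetical, and this is not presented as how any particular program implements it.

```python
import random

# Two hills separated by a deep valley (state 3); the right hill is higher.
HEIGHTS = [1.0, 10.0, 1.0, 0.001, 1.0, 30.0, 1.0]


def height(state):
    return HEIGHTS[state]


def propose(state, rng):
    """Symmetric proposal: step left or right, staying put at the edges."""
    cand = state + rng.choice((-1, 1))
    return cand if 0 <= cand < len(HEIGHTS) else state


def mc3(start, betas, n_generations, seed=0):
    """Return the states visited by the cold chain (betas[0] must be 1.0)."""
    rng = random.Random(seed)
    states = [start] * len(betas)
    cold_samples = []
    for _ in range(n_generations):
        # Metropolis update within each chain on its flattened landscape.
        for i, beta in enumerate(betas):
            cand = propose(states[i], rng)
            r = (height(cand) / height(states[i])) ** beta
            if r >= 1 or rng.random() < r:
                states[i] = cand
        # Attempt to swap the states of two randomly chosen chains.
        i, j = rng.sample(range(len(betas)), 2)
        r_swap = (height(states[j]) / height(states[i])) ** (betas[i] - betas[j])
        if r_swap >= 1 or rng.random() < r_swap:
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples


if __name__ == "__main__":
    visited = mc3(start=1, betas=[1.0, 0.5, 0.25, 0.125], n_generations=20000)
    print(sum(s >= 4 for s in visited) / len(visited))
```

Started on the lower hill, a single cold chain would rarely cross the valley. Here the hottest chain crosses easily and hands the better region to the cold chain through swaps.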
Chain results:
1 -- [-41631.791] (-43694.786) (-42920.096) (-42782.307) * (-42388.547) [-41306.253] (-43688.544) (-42883.304)
1000 -- (-32120.952) (-31590.257) (-31579.554) [-31096.284] * (-31353.766) (-31437.477) [-31176.966] (-31814.110) -- 0:08:51
Average standard deviation of split frequencies: 0.106151
2000 -- (-30922.429) (-30900.476) (-30861.073) [-30822.676] * [-30826.747] (-30849.901) (-30848.131) (-30874.821) -- 0:07:40
Square brackets mark the cold chain of each run; the asterisk separates the two runs.
Is the average standard deviation of split frequencies ≤ 0.01?
Do the likelihood values look like white noise, with no trend over generations?
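The quantity being checked can be sketched as follows: for every split, take the standard deviation of its frequency across the independent runs, then average over splits. The 0.1 minimum-frequency cutoff and the use of the sample standard deviation are assumptions of this sketch, as are the function name and the toy numbers.

```python
import statistics


def asdsf(run_split_freqs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    run_split_freqs is a list with one dict per run, mapping a split
    (any hashable label) to its frequency among that run's sampled trees.
    Splits missing from a run count as frequency 0 in that run.
    """
    all_splits = set().union(*run_split_freqs)
    sds = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in run_split_freqs]
        if max(freqs) >= min_freq:  # ignore splits that are rare everywhere
            sds.append(statistics.stdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


if __name__ == "__main__":
    run1 = {"AB|CDE": 1.0, "ABC|DE": 0.6}  # made-up split frequencies
    run2 = {"AB|CDE": 1.0, "ABC|DE": 0.4}
    print(asdsf([run1, run2]))
```

Values near zero mean the runs are sampling the same splits at the same rates, which is the sense in which a small value suggests convergence.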
MCMC warnings
• An apparent plateau may be a local, not the global, optimum.
• Failure to run long enough.
• The ML score is no longer improving.