Systematics - Bio 615
Confidence - Assessment of the Strength of
the Phylogenetic Signal - part 2
1. Consistency Index
2. g1 statistic, PTP - test
3. Consensus trees
4. Decay index (Bremer Support)
5. Bootstrapping / Jackknifing
6. Statistical hypothesis testing (frequentist)
7. Posterior probability (see lecture on Bayesian)
Derek S. Sikes University of Alaska
Multiple optimal trees
•  Many methods can yield multiple equally
optimal trees
•  If multiple optimal trees are found we know
that all of them are wrong except, possibly,
(hopefully) one
(as species tree, not gene trees)
•  We can further select among these trees
with additional criteria, but
•  Typically, relationships common to all the
optimal trees are summarized with
consensus trees
Consensus methods
•  Some have argued against consensus tree
methods for this reason
•  Debate over quest for true tree (point
estimate) versus quantification of uncertainty
Strict consensus methods
•  A consensus tree is a summary of the agreement
among a set of fundamental trees
•  Strict consensus methods require agreement
across all the fundamental trees
•  There are many consensus methods that differ in:
  1. the kind of agreement
  2. the level of agreement
•  They show only those relationships that are
unambiguously supported by the data
•  Consensus methods can be used with multiple
trees from a single analysis or from multiple
analyses
•  The commonest method (strict component
consensus) focuses on clades/components/full
splits
Strict consensus methods
•  This method produces a consensus tree that
includes all and only those full splits found in all the
fundamental trees
•  Other relationships (those in which the
fundamental trees disagree) are shown as
unresolved polytomies
•  Can be less optimal than any of the optimal trees
•  Simplest to interpret
[Figure: two fundamental trees for taxa A-G and their strict consensus tree]
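The strict component consensus described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each tree is encoded as a set of clades, each clade as a frozenset of taxon names, and the two trees are hypothetical. It is not how PAUP* stores or processes trees.

```python
def strict_consensus(trees):
    """Return the clades (full splits) found in every fundamental tree.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    This encoding is an assumption of the sketch, not a PAUP* format.
    """
    if not trees:
        raise ValueError("need at least one fundamental tree")
    return set.intersection(*trees)


# Two hypothetical fundamental trees on taxa A-E.
tree1 = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}  # (((A,B),C),(D,E))
tree2 = {frozenset("AB"), frozenset("ABD"), frozenset("CE")}  # (((A,B),D),(C,E))

shared = strict_consensus([tree1, tree2])
# Only clade A+B occurs in both trees; C, D and E collapse into a polytomy.
print(sorted("".join(sorted(clade)) for clade in shared))
```

Any clade missing from even one fundamental tree is dropped, which is why the strict consensus can be much less resolved than the trees it summarizes.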
Majority rule consensus
•  Majority-rule consensus methods require
agreement across a majority of the fundamental
trees
•  This method produces a consensus tree that
includes all and only those full splits found in a
majority (>50%) of the fundamental trees
•  May include relationships that are not supported by
the most parsimonious interpretation of the data
•  Other relationships are shown as unresolved
polytomies
•  The commonest method focuses on clades/
components/full splits
•  Of particular use in bootstrapping and Bayesian
Inference (best not to use for single searches)
•  Implemented in PAUP* and MrBayes
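A minimal sketch of the majority-rule idea, assuming the same hypothetical encoding of trees as sets of clades (frozensets of taxon names). The function name and the `cutoff` parameter are illustrative and do not correspond to PAUP* or MrBayes commands.

```python
from collections import Counter


def majority_rule_consensus(trees, cutoff=0.5):
    """Return {clade: percent of trees containing it} for clades above cutoff.

    cutoff is a proportion; the default keeps clades found in >50% of trees.
    """
    if not trees:
        raise ValueError("need at least one fundamental tree")
    counts = Counter(clade for tree in trees for clade in tree)
    n_trees = len(trees)
    return {
        clade: 100.0 * count / n_trees
        for clade, count in counts.items()
        if count / n_trees > cutoff
    }


# Three hypothetical fundamental trees on taxa A-E.
trees = [
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
    {frozenset("AB"), frozenset("ABD"), frozenset("CE")},
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
]

for clade, percent in majority_rule_consensus(trees).items():
    print("".join(sorted(clade)), round(percent, 1))
```

Clades found in more than half of the trees cannot conflict with one another, so the retained clades always fit on a single tree. Raising `cutoff` (for example to 0.8) collapses the more weakly supported branches.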
Majority rule consensus
Majority Rule Consensus trees are used for
1. Summarizing multiple equally optimal trees
from one search (but they shouldn’t be!)
2. Summarizing the results of a bootstrapping
analysis (multiple searches)
3. Summarizing the results of a Bayesian
analysis
Don’t confuse these! The numbers on the branches
mean very different things in each case
[Figure: three fundamental trees for taxa A-G and their majority-rule consensus tree. Numbers indicate frequency of clades in the fundamental trees (66 or 100)]
Reduced consensus methods
[Figure: two fundamental trees for taxa A-G. Their strict component consensus is completely unresolved. The agreement subtree (PAUP*) for taxa A-F is shown; taxon G is excluded]
Consensus methods
[Figure: three fundamental trees for nine taxa (Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Tracheloraphis, Spirostomum, Euplotes, Gruberia), with their strict consensus, their majority-rule consensus (clade frequencies of 66 or 100), and their agreement subtree (Euplotes excluded)]
Consensus methods
•  Use strict methods to identify those relationships
unambiguously supported by parsimonious
interpretation of the data
•  Use reduced methods where consensus trees are
poorly resolved
•  Avoid methods which have ambiguous
interpretations. Prevent possible confusion between
MR consensus for an optimal tree search and a MR
consensus for a bootstrapping search
Recall
•  Stochastic error vs Systematic error
•  These assessment methods help identify
stochastic error
Accuracy and Precision
•  Accuracy
–  Accuracy is correctness. How close a
measurement is to the true value.
(unless we know the “true tree” in
advance we cannot measure this)
•  Precision
–  Precision is reproducibility. How closely
two or more measurements agree with one
another. (this we can measure!)
–  How repeatable are the results?
–  How strongly do the data support them?
–  This is a measure of precision (which is
hopefully related to accuracy)
Branch Support
•  Several methods have been proposed that attach
numerical values to internal branches in trees that
are intended to provide some measure of the
strength of support for those branches and the
corresponding groups
•  These methods include:
  - The Bootstrap (BS) and jackknife
  - Decay analyses (aka Bremer Support)
  - Bayesian Posterior Probabilities (PP or BPP)
Decay analysis
•  In parsimony analysis, a way to assess support for a
group is to see if the group occurs in slightly less
parsimonious trees also
•  The length difference between:
the shortest trees including the group and
the shortest trees that exclude the group
(the extra steps required to collapse a group)
is the decay index or Bremer support
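The length difference defined above can be written as a small function. This is a sketch, assuming the tree searches have already been run and that the supplied list contains both the shortest tree with the group and the shortest tree without it. In practice the latter has to come from a constrained search. The tree lengths and clades below are made up for illustration.

```python
def decay_index(scored_trees, clade):
    """Bremer support for a clade, from (tree_length, clades) pairs.

    scored_trees: list of (length, set of clades), with clades as frozensets.
    Assumes the list includes the shortest tree that contains the clade and
    the shortest tree that lacks it.
    """
    with_clade = [length for length, clades in scored_trees if clade in clades]
    without_clade = [length for length, clades in scored_trees if clade not in clades]
    if not with_clade or not without_clade:
        raise ValueError("need trees both with and without the clade")
    return min(without_clade) - min(with_clade)


# Hypothetical search results: tree length in steps and the clades on each tree.
AB, ABC, ABD, AC, ACD = (frozenset(s) for s in ("AB", "ABC", "ABD", "AC", "ACD"))
scored_trees = [
    (100, {AB, ABC}),  # most parsimonious tree
    (103, {AB, ABD}),  # shortest tree lacking clade ABC
    (107, {AC, ACD}),  # shortest tree lacking clade AB
]

print(decay_index(scored_trees, AB))   # extra steps needed to lose A+B
print(decay_index(scored_trees, ABC))  # extra steps needed to lose A+B+C
```

With these made-up lengths, clade A+B needs 7 extra steps to collapse and clade A+B+C needs 3.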
Decay analysis - example
[Figure: decay indices on parsimony trees for nine taxa (Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes, Tetrahymena). Ciliate SSUrDNA data: +27, +45, +10, +15, +7. Randomly permuted data: +1, +1, +8, +3]
Decay indices - interpretation
•  Generally, the higher the decay index the better the
relative support for a group
•  Like Bootstrap values (BS), decay indices may be
misleading if the data are misleading
•  Magnitude of decay indices and BS generally
correlated (i.e. they tend to agree)
•  Only groups found in all most parsimonious trees
have decay indices > zero
Decay analyses - in practice
•  Decay indices for each clade can be determined by:
-  Using PAUP* to search for the shortest tree that
lacks the branch of interest using reverse
topological constraints
-  with the Autodecay or TreeRot programs (in
conjunction with PAUP*) - MacClade 4 will also
help prepare for a Decay analysis
-  An excellent use for the Parsimony Ratchet because finding the shortest tree length is all that
matters (not finding multiple shortest trees)
Decay indices - interpretation
•  Unlike BS, decay indices are not scaled (0-100)
–  This has the advantage that the value can exceed 100,
whereas BS “tops out” at 100, meaning that we cannot
distinguish between the support of two branches with BS
values of 100 although one might have a far greater
decay index than the other
•  It is even less clear what is an acceptable decay
index than a BS value…
–  Unlike the BS value very little work has examined the
properties and behavior of decay indices
Decay indices - interpretation
One key study is that of DeBry (2001)
–  He showed that decay indices should be interpreted in
light of branch lengths
–  That the same values, even within the same tree, do not
represent the same support if the branch lengths differ
-  i.e. Decay Indices are not easily comparable as measures
of branch support
-  Values < 4 should be considered weak regardless of
branch length
DeBry, R.W. (2001) Improving interpretation of the Decay Index for DNA sequence data. Systematic
Biology 50: 742-752.
[Figure: decay values versus bootstrap and jackknife values
from one empirical study]
Norén, M. & U. Jondelius. 1999. Phylogeny of the Prolecithophora
(Platyhelminthes) inferred from 18S rDNA sequences. Cladistics 15: 103-112.
Bootstrapping (non-parametric)
•  Bootstrapping is a statistical
technique that uses computer
intensive random resampling
of data to determine sampling
error or confidence intervals
for some estimated parameter
•  Introduced to phylogenetics by
Felsenstein in 1985
•  Based on an idea of Efron (1979)
Bootstrapping (non-parametric)
1. Characters are sampled with replacement to create
many (100-1000) bootstrap replicate data sets
(think shuffle vs random play of music)
2. Each bootstrap replicate data set is analysed (e.g.
with parsimony, distance, ML)
3. Agreement among the resulting trees is
summarized with a majority-rule consensus tree
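Step 1 above, sampling characters with replacement, can be sketched as follows. The toy R/Y matrix, the function name, and the fixed random seed are assumptions made for illustration. Steps 2 and 3 (the tree search on each replicate and the majority-rule consensus) are left out.

```python
import random


def bootstrap_replicate(matrix, rng):
    """Resample characters (columns) with replacement.

    matrix: dict mapping taxon name -> string of character states.
    Returns a replicate matrix with the same taxa and the same number of
    characters; some columns appear more than once, others not at all.
    """
    n_chars = len(next(iter(matrix.values())))
    columns = rng.choices(range(n_chars), k=n_chars)  # draws with replacement
    return {
        taxon: "".join(states[i] for i in columns)
        for taxon, states in matrix.items()
    }


# Toy matrix: five taxa, eight two-state (R/Y) characters.
matrix = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
    "Outgp": "RRRRRRRR",
}

rng = random.Random(615)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate keeps every taxon but reweights the characters, which is what the "shuffle vs random play" analogy refers to: random play can repeat a track, shuffle cannot.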
Bootstrapping
Bootstrapping (non-parametric)
•  Frequency of occurrence of groups, bootstrap
support (BS), is a measure of support for those
groups
•  Additional information is given in partition tables (for
groups below 50% support)
•  Can ask PAUP* to create a MR con-tree with a higher
cut-off, e.g. 80%; all weaker branches collapse
Original data matrix
        Characters
Taxa    1 2 3 4 5 6 7 8
A       R R Y Y Y Y Y Y
B       R R Y Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y R R R R R R
Outgp   R R R R R R R R

Resampled data matrix
        Characters
Taxa    1 2 2 5 5 6 6 8
A       R R R Y Y Y Y Y
B       R R R Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y Y R R R R R
Outgp   R R R R R R R R

Randomly resample characters from the original data with
replacement to build many bootstrap replicate data sets of the
same size as the original - analyse each replicate data set
Summarize the results of
multiple analyses with a
majority-rule consensus tree
Bootstrap values (BS) are the
frequencies with which
groups are encountered in
analyses of replicate data
sets
[Figure: trees for taxa A-D plus outgroup with the characters mapped onto the branches, and a majority-rule consensus tree with bootstrap values of 96% and 66%]
Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap
[Figure: majority-rule consensus tree with bootstrap values 100, 100, 100, 100, 96 and 84]
Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)
Partition Table
123456789     Freq
------------------
.**......   100.00
...**....   100.00
.....**..   100.00
...****..   100.00
...******    95.50
.......**    84.33
...****.*    11.83
...*****.     3.83
.*******.     2.50
.**....*.     1.00
.**.....*     1.00
Bootstrapping
The probability of a character being omitted
from a bootstrap sample ranges from
0-0.367 (depending on N, the number of
characters)
N    P
1    0
2    0.25
3    0.29
4    0.31
…    0.367
Rule of thumb: a branch must be
supported by 3 or more characters to be
recovered in >95% of bootstraps
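The omission probabilities above follow from sampling N characters with replacement: a given character is missed on every draw with probability (1 - 1/N)^N, which approaches 1/e (about 0.368) as N grows. The sketch below also gives one reading of the rule of thumb. It assumes a branch is lost only when all k of its supporting characters are omitted and that no characters conflict with it. The function name is illustrative.

```python
import math


def prob_all_omitted(n_chars, k=1):
    """Probability that none of k particular characters is drawn when
    n_chars characters are sampled with replacement from n_chars."""
    return (1 - k / n_chars) ** n_chars


# Omission probability for a single character, for increasing N.
for n in (1, 2, 3, 4, 1000):
    print(n, round(prob_all_omitted(n), 3))
print("limit 1/e:", round(math.exp(-1), 3))

# Rule of thumb: with k = 3 unopposed supporting characters, the chance that
# all three are omitted is at most about e**-3, so the branch is recovered
# in roughly 95% of replicates or more.
print("recovered with k=3:", round(1 - prob_all_omitted(1000, k=3), 3))
```

The tabulated values 0.29 and 0.31 are (2/3)^3 and (3/4)^4 truncated to two decimals.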
Bootstrapping - random data
Randomly permuted data - parsimony bootstrap
[Figure: 50% majority-rule consensus tree (bootstrap values 71 and 59) and the same consensus with minority components (values 71, 59, 26, 21, 16, 16) for Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Tetrahymena, Euplotes and Gruberia]
50% Majority-rule consensus (with minority components)
Partition Table
123456789     Freq
------------------
.*****.**    71.17
..**.....    58.87
....*..*.    26.43
.*......*    25.67
.***.*.**    23.83
...*...*.    21.00
.*..**.**    18.50
.....*..*    16.00
.*...*..*    15.67
.***....*    13.17
....**.**    12.67
....**.*.    12.00
..*...*..    12.00
.**..*..*    11.00
.*...*...    10.80
.....*.**    10.50
.***.....    10.00
Bootstrap - interpretation
•  Bootstrapping was introduced as a way of
establishing confidence intervals for
phylogenies
•  This interpretation of bootstrap values
depends on the assumption that the
original data is a random sample from a
much larger set of independent and
identically distributed data (i.i.d.)
Bootstrap - interpretation
•  However, several things complicate this interpretation
-  These assumptions are often wrong - making any
strict statistical interpretation of BS invalid
-  Some theoretical work indicates that BS are very
conservative (too low), and may underestimate
confidence intervals - problem increases with
numbers of taxa
-  BS can be high for incongruent relationships in
separate analyses - and can therefore be misleading
(misleading data -> misleading BS) recall the
Mantra: The data are the things
Bootstrap - interpretation
Huelsenbeck & Rannala (2004) list 3 common interpretations
1. Probability that a clade is correct (accuracy)
2. Robustness of the results to perturbation
(repeatability / precision)
3. Probability of incorrectly rejecting a hypothesis of
monophyly (1-P) : probability of getting that
much evidence if, in fact, the group did not exist
Huelsenbeck, J.P. and Rannala, B. (2004) Frequentist properties of Bayesian posterior probabilities of
phylogenetic trees under simple and complex substitution models. Systematic Biology 53: 904-913.
Bootstrap - interpretation
“…bootstrapping provides us a confidence interval within which is
contained not [necessarily] the true phylogeny but the phylogeny that
would be estimated on repeated sampling of many characters from the
underlying pool of characters.”
Joseph Felsenstein (1985)
Be suspicious of maximum bootstrap values…
they might be due to systematic error.
Bootstrap - interpretation
•  High BS (e.g. > 85%) is indicative of strong ‘signal’
in the data (some use 70% as the cutoff, there is no
consensus as to which value is best)
•  Provided we have no evidence of strong misleading
signal due to violation of assumptions (e.g. base
composition biases, great differences in branch
lengths) high BS values are likely to reflect strong
phylogenetic signal
•  In other words, although technically they are meant
to be a measure of precision, they are usually thought
to be at least strongly correlated with accuracy
Bootstrap - interpretation
•  Low BS values, however, need not mean the
relationship is false, only that it is poorly supported
–  This is especially true of morphological data
–  Morphologists often use the Decay index instead
•  Bootstrapping can be viewed as a way of exploring
the robustness of phylogenetic inferences to
perturbations in the balance of supporting and
conflicting evidence for groups
Paul Lewis
Bootstrap - interpretation
Two types of precision (Hillis & Bull 1993):
Precision of bootstrap value vs repeatability of
finding a branch:
- Precision of bootstrap values increases with the
number of bootstrap replicates (variance
among analyses decreases)
- Repeatability tells us how likely we are to find the
same results using a different but similar
dataset - Felsenstein’s original idea
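The first kind of precision can be illustrated with the usual binomial standard error. This sketch assumes each bootstrap replicate independently either recovers the branch or does not. The function name is illustrative, and the calculation says nothing about repeatability across different datasets.

```python
import math


def bootstrap_value_se(proportion, n_reps):
    """Approximate standard error of a bootstrap proportion estimated
    from n_reps replicates, under the binomial assumption."""
    return math.sqrt(proportion * (1 - proportion) / n_reps)


# The same underlying support (70%) estimated with more and more replicates.
for n_reps in (100, 1000, 10000):
    se = bootstrap_value_se(0.70, n_reps)
    print(n_reps, "replicates: 70% +/-", round(100 * se, 1))
```

More replicates narrow the uncertainty around the bootstrap value itself, but they do not make the value a better estimate of accuracy.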
Hillis & Bull (1993) examined
precision, repeatability, and
accuracy of the bootstrap
[Figure: two distributions compared: a) 1,089 BS of 100 reps each from 1 “real” dataset = 1,089 pseudo datasets; b) 100 real datasets]
“Comparison of these two
distributions reveals that the
process of bootstrap resampling
is not the same as repeated,
independent sampling of data.”
Bootstrap - interpretation
Hillis & Bull (1993) examined precision, repeatability, and
accuracy of the bootstrap
- Found that BS provide a very imprecise measure of
repeatability - so imprecise as to be worthless as a
measure of repeatability
- Determined that in some cases a BS as low as 70%
was equivalent to a 95% probability of being true - Bias
confirmed by Newton (1996)
Hillis, D.M. and Bull, J.J. (1993) An empirical test of bootstrapping as a method for assessing confidence in
phylogenetic analysis. Systematic Biology, 42: 182-192.
Bootstrap - interpretation
BS values have been criticized for a variety of
reasons:
Sanderson, M.J. (1995) Objections to Bootstrapping Phylogenies: A Critique. Systematic Biology, 44:
299-320.
But the top reason has been that they seem to
be too conservative - i.e. underestimates of the
probability of the branch being correct - i.e.
biased downward (erratically & unpredictably)
Newton, M.A. (1996) Bootstrapping phylogenies: Large deviations and dispersion effects. Biometrika,
83: 315-328.
Jackknifing
•  Jackknifing is very similar to bootstrapping and
differs only in the character resampling strategy
•  Some proportion of characters (e.g. 37%, 50%) are
randomly selected and deleted
•  Replicate data sets are analyzed and the results
summarized with a majority-rule consensus tree
•  Jackknifing and bootstrapping tend to produce
broadly similar results and have similar
interpretations - Jackknifing is preferred by cladists
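The jackknife resampling step described in the Jackknifing bullets can be sketched as follows, for contrast with bootstrap resampling. The sketch assumes a fixed proportion of characters is deleted without replacement. Some programs instead delete each character independently with a given probability. The toy matrix and function name are hypothetical.

```python
import random


def jackknife_replicate(matrix, rng, delete_fraction=0.5):
    """Delete a random subset of characters (columns) from the matrix.

    matrix: dict mapping taxon name -> string of character states.
    Unlike the bootstrap, no character is duplicated and the replicate
    is shorter than the original matrix.
    """
    n_chars = len(next(iter(matrix.values())))
    n_delete = round(delete_fraction * n_chars)
    kept = sorted(rng.sample(range(n_chars), n_chars - n_delete))
    return {
        taxon: "".join(states[i] for i in kept)
        for taxon, states in matrix.items()
    }


# Toy matrix: four taxa, eight two-state (R/Y) characters.
matrix = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
}

rng = random.Random(615)  # fixed seed so the sketch is repeatable
print(jackknife_replicate(matrix, rng, delete_fraction=0.5))
print(jackknife_replicate(matrix, rng, delete_fraction=0.37))
```

As with bootstrapping, each replicate would then be analysed and the resulting trees summarized with a majority-rule consensus.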
Low Support
Low branch support can result from
1. Conflicting data (homoplasy)
2. Lack of data - even a dataset with no homoplasy can yield
poorly resolved trees if there are branches without change
3. Use of a poorly fitting model (too complex or too simple)
4. Artifact of mid-sized clades? “This indicates that, for all support
measures on trees of a given size, the largest clades and the smallest clades are
supported most strongly, whereas medium sized clades receive lower support”
Pickett, K.M. and Randle, C.P. (2005) Strange bayes indeed: uniform topological priors imply non-uniform
clade priors. Molecular Phylogenetics and Evolution 34: 203-211. SEE ALSO: Brandley, M. et al. (2006)
Are unequal clade priors problematic for Bayesian phylogenetics? Systematic Biology 55: 138-146.
Terms - from lecture & readings
consensus methods
consensus tree
strict consensus
splits
majority rule consensus
reduced consensus trees
agreement subtree
branch support
Decay analysis
Decay index (Bremer Support)
DeBry (2001)
Bootstrapping
resampling with replacement
repeatability
jackknifing
Study questions
Describe the difference between a strict and majority rule
consensus tree.
What were the key findings of DeBry in his (2001) paper on Decay
Indices?
What is the rule of thumb in bootstrapping for a branch to receive
>95% support?
What are two common but different interpretations of bootstrap
values? What did Hillis & Bull (1993) conclude regarding these
interpretations?
What are two common explanations for low branch support?