Evolutionary Trees - New Mexico State University

Systematics
the branch of biology that infers phylogeny, classification
(taxonomy), and patterns of macroevolution (whether
they be morphological, coevolutionary, or biogeographic)
phylogeny - genealogy, literally the relationships of
organisms, often presented in the form of a "family tree"
with axes of time and divergence
although biologists accept as fact that organisms have
ancestors, phylogenies cannot be known except for very
rapidly evolving organisms (e.g., bacteria); phylogenies are
hypotheses, they must be inferred or estimated
Linnaean classification - system of “binomial nomenclature”
(i.e., genus and species) and hierarchical classification, with
increasingly inclusive taxa of higher rank
a system of classification should be one from which we can
recover information about phylogeny
a hierarchical system is one that is similar to the branching
of a tree, with higher ranks closer to the trunk and lower
ranks closer to the tips of the branches
taxonomic category = rank = level, e.g., the genus rank is
above the species
taxon – a proper noun, an entity, like your own name
traditional taxonomic ranks
Kingdom, Phylum, Class, Order, Family, Genus, Species
common prefixes
higher: Super-, Supra-,
lower: Sub-, Infra-, Parv-
cohort – a term used to describe a taxon of no predefined rank
Phylogenies are expressed graphically as “trees”
a tree has a root, nodes, and branches which may be
internodes or tips, ending at terminal taxa or “operational
taxonomic units” (OTUs)
deeper nodes in a tree can represent higher taxonomic ranks
of increasing inclusiveness
sister taxa - two most closely related taxa
sister taxon - the next most related taxon to another
outgroup - a taxon that lies outside the group of interest (the ingroup), i.e., is more distantly related to its members than they are to each other
a network differs from a tree because it lacks a root, i.e., the
outgroup is not distinguished from the ingroup (it may be
unknown)
cladogenesis - the splitting of a lineage into two
daughters
anagenesis - some measure of change of a lineage
through time without cladogenesis
higher taxa are meant to describe monophyletic groups
monophyly - the property of a group being descended
from a common ancestor and including all descendants
of that ancestor; in other words, all the taxa that branch
from a single node in a tree
monophyletic taxa are each others’ nearest relatives
clade - a monophyletic group
grade - a group of organisms that are similar because of
the retention of primitive characters (rather than because
of convergence)
monophyly – the property of a group that includes all descendants
of a common ancestor
polyphyly - the property of being unrelated by descent (e.g.,
winged organisms are polyphyletic)
paraphyly - the property of being descended from a common
ancestor but not including all evolutionary derivatives of that
ancestor (e.g., reptiles are a paraphyletic group)
Monophyly – the property of an
inclusive group of organisms of
shared common ancestry
Polyphyly – the property of being
unrelated by descent
Paraphyly – the property of a group
of organisms of shared common
ancestry that does not include all of
the evolutionary derivatives of that
common ancestor
[Figure: four copies of a tree with tips a, b, c, d, e, illustrating the groupings defined above]
Monophyletic groups are the only ones
intended to be classified taxonomically
Paraphyletic groups are undesirable
in classification because those
organisms most closely related
(i.e., a and b) are not grouped together
- most likely to have been based on superficially
conspicuous traits; therefore, many examples have been discovered
with the application of molecular data to large samples
[Figure: tree with tips a, b, c, d, e]
“Apes” are a paraphyletic group
guenons
gibbons
orang
gorilla
chimps
human
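The definition of monophyly can be checked mechanically. The sketch below is only an illustration: it assumes the tree is written as nested tuples and assumes the branching order suggested by the tip order above (guenons branching first, then gibbons, orang, gorilla, with chimps and human as sisters). The helper names tips and is_monophyletic are hypothetical.

```python
def tips(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def is_monophyletic(tree, group):
    """True if some node in the tree has exactly `group` as its descendants."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Assumed topology, written as nested pairs (not taken from a data file).
primates = ("guenons", ("gibbons", ("orang", ("gorilla", ("chimps", "human")))))

apes_without_humans = {"gibbons", "orang", "gorilla", "chimps"}
apes_with_humans = apes_without_humans | {"human"}

print(is_monophyletic(primates, apes_without_humans))  # False
print(is_monophyletic(primates, apes_with_humans))     # True
```

No single node has "apes" minus humans as its complete set of descendants, which is what makes that group paraphyletic on this tree.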
There are plenty of examples of paraphyletic groups among birds
e.g., “barbets”
[Figure: tree with the following tips]
Neotropical toucans (Ramphastidae)
Neotropical barbets (Capitonidae)
African barbets (Lybiidae)
Asian barbets (Megalaimidae)
phenogram - tree reflects similarity
stratophenogram - tree reflects similarity against
measured time
cladogram - independent of time, simply a rooted
network derived from character data
phylogram - branch lengths proportional to the number
of character state changes that occur along each lineage
consensus tree - combining information from different
trees into one summary tree
strict consensus - makes polytomies of all nodes
for which there are differences
majority rule consensus - shows the groups recovered in more
than 50% of trees, with the percentage of trees supporting each
[Figure: example stratophenogram]
[Figure: equal branches – cladograms, with the corresponding unrooted network]
[Figure: unequal branches – phylograms, with the corresponding unrooted network]
individual nodes do not convey information about branching order
they define all the descendants of a common ancestor, i.e., a monophyletic group or clade
[Figure: tree with tips A, B, C, D, E and nested nodes labeled (AB), (ABC), (ABCD)]
similarly, individual nodes of majority rule consensus trees do not convey information
about branching order
they convey what percentage greater than 50% of trees recover a particular node, i.e.,
group all the same OTUs as descendants of that ancestral node
[Figure: several trees with tips A, B, C, D, E and their majority rule consensus tree, with nodes labeled 75%, 100%, and 100%]
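A majority rule consensus can be sketched by counting how often each group of OTUs is recovered. This is a minimal illustration, assuming trees are written as nested tuples; the four input trees are hypothetical, chosen so that (AB) appears in three of four trees, and the function names are not from any library.

```python
from collections import Counter


def clades(node, found):
    """Record the tip set under every internal node; return this node's tip set."""
    if isinstance(node, str):
        return frozenset([node])
    members = frozenset().union(*(clades(child, found) for child in node))
    found.add(members)
    return members


def majority_rule(trees):
    """Map each clade found in more than 50% of the trees to its percentage."""
    counts = Counter()
    for tree in trees:
        found = set()
        clades(tree, found)
        counts.update(found)
    n = len(trees)
    return {clade: 100 * c / n for clade, c in counts.items() if c / n > 0.5}


# Hypothetical input trees: three group A with B, one groups A with C.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]

support = majority_rule(trees)
for clade, pct in sorted(support.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(clade)), f"{pct:.0f}%")
```

The percentages describe how many trees contain each node's set of descendants; they say nothing about branching order within the group.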
character - any trait, e.g., morphological, developmental,
behavioral, molecular (i.e., relating to DNA), biogeographic,
etc.
character state - a variation of a character
Examples
Character – color
Character states – red, white, blue
Character – number of eyes
Character states – one, two, three, eight
Character – a nucleotide position in a DNA sequence
Character state – adenine, cytosine, guanine, thymine
Character – gene
Character state – allele
How do we go about inferring phylogeny?
usually, in the same way that organisms are classified (that is
circular)
historically, on the basis of similarity - largely justifiable since
organisms that are similar are usually related
but primitive characteristics can be misleading
examples
paraphyly in reptiles (anapsids → turtles, synapsids → mammals, diapsids → birds)
paraphyly in apes → humans
paraphyly in barbets → toucans
homology - the property of any traits, genotypic or
phenotypic, that are shared by two or more biological
entities (taxa, individuals) by virtue of inheritance
(i.e., descent) from a common ancestor, whether or not
similar in function
analogy - similarity in function without homology,
convergent
homoplasy - convergence, reversal, parallelism, i.e., any
character state shared for any reason other than inheritance
from a common ancestor
apomorphy – a derived character state
synapomorphy – a shared derived character state that is
specific to a clade; it unites members of that clade
autapomorphy – a derived character state unique to only
one taxon
plesiomorphy – a primitive character state that is not
specific to a clade because it also exists in outgroups
symplesiomorphy – a shared primitive character state
both plesiomorphies and apomorphies are homologous
characters, in contrast to homoplasies
a single trait can be both plesiomorphic and apomorphic,
but in different contexts
e.g., the possession of four legs is a synapomorphy of
tetrapod vertebrates, but it is a symplesiomorphy of
mammals
assume (ABC) is a monophyletic clade
[Figure: four copies of a tree with tips A, B, C, D, E, each marking one of the following]
synapomorphy of clade (ABC)
plesiomorphy of C in clade (ABC)
autapomorphy of B
symplesiomorphy of clade (ABC)
Assessments of Character Homology and
Character State Polarity
a priori – “beforehand” – an educated guess based on
detailed similarity, development, outgroup comparison, etc.
a posteriori – “after the fact” - homology evaluated in the
context of formal phylogenetic analysis
many putatively homologous characters are "mapped" onto a
tree, many are found to be mutually inconsistent with other
characters that favor a different branching pattern of a tree
Example
OTUs         characters
species A    1, 1, 1
species B    1, 1, 2
species C    2, 2, 1
Importance of a posteriori Homology Assessment
homoplasy is rampant
paraphyletic groups based on symplesiomorphies are common
Methods of Phylogenetic Inference
phenetic
cladistic or parsimony
maximum likelihood
Bayesian
multispecies coalescent
phenetic - similarity or distance data, quantitative, not
qualitative (includes stratophenetics)
"distance" - a measure of similarity is treated as a
measure of relatedness
a pairwise matrix of distances is "fit" to a tree
example of phenetic phylogeny reconstruction
        characters
taxa    1    2    3    4    5
ass     A    T    A    T    T
bat     A    C    A    T    T
cat     A    C    G    T    C
dog     A    T    G    T    C
PHENETIC APPROACH:
count number of character state differences between each pair of taxa
distance matrix:
        ass    bat    cat    dog
ass     -
bat     1      -
cat     3      2      -
dog     2      3      1      -
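The pairwise counts can be reproduced with a few lines of code. This is a minimal sketch; the sequences are the five characters of the example read row by row, and the names alignment and count_differences are made up for illustration.

```python
from itertools import combinations

# Characters 1-5 for each taxon, taken from the example matrix.
alignment = {
    "ass": "ATATT",
    "bat": "ACATT",
    "cat": "ACGTC",
    "dog": "ATGTC",
}


def count_differences(seq1, seq2):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


# Distance for every unordered pair of taxa.
distances = {
    (a, b): count_differences(alignment[a], alignment[b])
    for a, b in combinations(alignment, 2)
}

for (a, b), d in distances.items():
    print(a, b, d)
```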
then, fit distances to trees
but these distances aren't perfectly metric (additive)
ass and bat are sisters, but ass is closer to dog than it is to cat, while bat is closer to cat than it is to dog
[Figure: two attempts to fit the distances to a tree of ass, bat, cat, dog, with branch lengths of 0 and 1]
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
fit distances to trees, beginning with closest pairs
[Figure: UPGMA tree of ass, bat, cat, dog built in two steps; terminal branches labeled ½, internal branches labeled ⅝]
Join nodes using average distance between all OTUs being joined
(ass, bat, cat, dog) = (2+2+3+3)/4 = 2.5
fractional nucleotide changes are impossible, but
distances are usually calculated for larger numbers of
characters
there isn't any way to fit the original distances together on
a dichotomously bifurcating tree, so they are averaged
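The joining order and the averaged distance can be sketched as follows. This is a minimal illustration of average-linkage joining only (it does not compute branch lengths); the function names upgma_joins and average_distance are hypothetical, and the distances are those of the example matrix.

```python
from itertools import combinations

# Pairwise distances from the example distance matrix.
dist = {
    frozenset(["ass", "bat"]): 1,
    frozenset(["ass", "cat"]): 3,
    frozenset(["ass", "dog"]): 2,
    frozenset(["bat", "cat"]): 2,
    frozenset(["bat", "dog"]): 3,
    frozenset(["cat", "dog"]): 1,
}


def average_distance(c1, c2):
    """Mean of the original distances between all OTUs in c1 and all in c2."""
    total = sum(dist[frozenset([a, b])] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma_joins(names):
    """Repeatedly join the two closest clusters; return the joins in order."""
    clusters = [frozenset([n]) for n in names]
    joins = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        joins.append((sorted(c1), sorted(c2), average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return joins


joins = upgma_joins(["ass", "bat", "cat", "dog"])
for left, right, d in joins:
    print(left, right, d)
```

The last join averages the four between-pair distances, (3 + 2 + 2 + 3)/4 = 2.5, which is the value given above.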
even though A (ass) and D (dog) differ by only 2 characters,
parsimony will show that there are really 4 homoplasious
character state changes between them that are undetected
by the phenetic approach
There are more sophisticated phenetic methods, too, that
attempt to ‘correct’ distances for undetected homoplasy
phenetic methods must be used if the data collected
are inherently quantitative rather than qualitative (i.e.,
continuously variable like percentages, e.g., DNA of
chimps and humans is 98% identical, chimps and
gorillas 95% identical, etc.)
unlike phenetic methods, cladistic, maximum
likelihood, and Bayesian methods utilize qualitative
character data and employ an optimality criterion
optimality criterion - a method for evaluating
competing hypotheses of phylogeny; a predefined
metric of how to evaluate what is ‘best’
cladistic or parsimony analysis - employs an optimality criterion
known as parsimony
parsimony - "stingy", the network that invokes the fewest
number of character state changes
in cladistic analysis, synapomorphies are the only kind of
characters that are useful as evidence of monophyly
homoplasies, autapomorphies, and symplesiomorphies are
considered “uninformative” - this distinguishes parsimony from
phenetic and all other methods, which use all characters
unlike phenetic methods, parsimony can often distinguish
homoplasy from homology, and apomorphy from plesiomorphy
Steps in cladistic analysis
1) define all topologically discrete unrooted networks
2) map characters one at a time onto each network
this is in contrast to the phenetic approach, which compared differences
in pairs of OTUs
- phenetic comparison by OTUs
- cladistic comparison by characters
3) choose the optimal network using the criterion of parsimony
4) root the optimal tree using an unambiguous outgroup
Number of topologically distinct unrooted networks is defined by number of OTUs
OTUs    networks
3       1
4       3
5       15
6       105
7       945
8       10,395
9       135,135
10      2,027,025
(you get the point)
there are about 3 x 10^40 topologically distinct unrooted networks for 32 OTUs, and by
roughly 53 OTUs the count exceeds the estimated total number of atoms in the universe (about 10^80)
therefore, various heuristic methods must be relied upon to find optimal networks
when considering more than ~12 OTUs rather than evaluating all possible networks
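The counts in the table follow the product 1 x 3 x 5 x ... x (2n - 5) for n OTUs, a standard result for unrooted bifurcating networks that the slides do not state explicitly. A minimal sketch, with a made-up function name:

```python
def unrooted_networks(n_otus):
    """Number of distinct unrooted bifurcating networks for n_otus tips."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count


for n in range(3, 11):
    print(n, f"{unrooted_networks(n):,}")
```

Each added OTU can attach to any existing branch, so the count is multiplied by the number of branches; this is why exhaustive evaluation becomes impractical so quickly.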
example of cladistic phylogeny reconstruction
        characters
OTUs    1    2    3    4    5
Ass     A    T    A    T    T
Bat     A    C    A    T    T
Cat     A    C    G    T    C
Dog     A    T    G    T    C
step 1: define all the topologically distinct networks
There are three topologically distinct unrooted networks
for 4 OTUs
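1) Ass and Bat on one side, Cat and Dog on the other: (Ass, Bat | Cat, Dog)
2) Ass and Cat on one side, Bat and Dog on the other: (Ass, Cat | Bat, Dog)
3) Ass and Dog on one side, Bat and Cat on the other: (Ass, Dog | Bat, Cat)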
Note: all the networks below are identical to number 1) !
[Figure: eight redrawings of network 1) with tips A, B, C, D rotated or reflected; the branching pattern is the same in each]
step 2: map each character one by one onto each network
the minimum number of character state changes that can be explained by a
network is one less than the number of character states
the actual number of changes observed depends on the topology of the network
character    states (Ass, Bat, Cat, Dog)    network 1)    network 2)    network 3)
1            A, A, A, A                     0 changes     0 changes     0 changes
2            T, C, C, T                     2 changes     2 changes     1 change
3            A, A, G, G                     1 change      2 changes     2 changes
4            T, T, T, T                     0 changes     0 changes     0 changes
5            T, T, C, C                     1 change      2 changes     2 changes
step 3: sum character state changes for each network
                Network 1)    Network 2)    Network 3)
Character 1:    0 steps       0 steps       0 steps
Character 2:    2 steps       2 steps       1 step
Character 3:    1 step        2 steps       2 steps
Character 4:    0 steps       0 steps       0 steps
Character 5:    1 step        2 steps       2 steps
Sum:            4 steps*      6 steps       5 steps
*most parsimonious
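Steps 2 and 3 can be sketched in code for the four-OTU case. This is a minimal illustration that counts changes with Fitch-style set operations, which is one common way to count minimum changes and is not necessarily how the slides derived them; the names networks and changes_on_network are hypothetical.

```python
# Characters 1-5 for each OTU, from the example matrix.
characters = {
    "Ass": "ATATT",
    "Bat": "ACATT",
    "Cat": "ACGTC",
    "Dog": "ATGTC",
}

# The three unrooted networks, each written as two pairs of neighbors.
networks = {
    1: (("Ass", "Bat"), ("Cat", "Dog")),
    2: (("Ass", "Cat"), ("Bat", "Dog")),
    3: (("Ass", "Dog"), ("Bat", "Cat")),
}


def changes_on_network(network, position):
    """Minimum number of state changes for one character on a 4-OTU network."""
    changes = 0
    node_sets = []
    for left, right in network:
        a = {characters[left][position]}
        b = {characters[right][position]}
        if a & b:
            node_sets.append(a & b)   # neighbors agree: no change needed
        else:
            node_sets.append(a | b)   # neighbors differ: one change
            changes += 1
    if not (node_sets[0] & node_sets[1]):
        changes += 1                  # one more change across the internode
    return changes


totals = {
    name: sum(changes_on_network(net, pos) for pos in range(5))
    for name, net in networks.items()
}
print(totals)
best = min(totals, key=totals.get)
print("most parsimonious network:", best)
```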
step 4: root optimal tree by outgroup
[Figure: the optimal network 1) (Ass, Bat | Cat, Dog) redrawn as a rooted tree with tips Dog, Cat, Bat, Ass]
what if all character state changes are not equally probable?
transition substitution: purine ↔ purine (A ↔ G) or pyrimidine ↔ pyrimidine (C ↔ T)
transversion substitution: purine ↔ pyrimidine (A or G ↔ C or T)
if transition substitutions occur at twice the rate of transversions
then is it appropriate to count them equally?
enter the realm of substitution modeling…
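The distinction can be made concrete with a short sketch. The 1:2 cost ratio below is purely an assumption for illustration (transversions, being rarer, are given the larger cost), and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(base1, base2):
    """Classify a difference between two bases."""
    if base1 == base2:
        return "none"
    pair = {base1, base2}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def weighted_changes(seq1, seq2, transition_cost=1.0, transversion_cost=2.0):
    """Sum of differences, counting transversions more heavily (assumed costs)."""
    costs = {"none": 0.0,
             "transition": transition_cost,
             "transversion": transversion_cost}
    return sum(costs[substitution_type(x, y)] for x, y in zip(seq1, seq2))


print(substitution_type("A", "G"))          # transition
print(substitution_type("A", "C"))          # transversion
print(weighted_changes("ATATT", "ACGTC"))   # 3.0 (three transitions)
```

In the four-taxon example every observed difference is a transition (T/C or A/G), so weighting would not change the outcome there; it matters once transversions are present.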
maximum likelihood - employs an optimality criterion of
maximum likelihood
a computationally intensive method that calculates the
likelihood of terminal taxa exhibiting the character states they
do on a given network, given one of numerous models for the
probability of character state changes along branches
(likelihood ~ chance that something would come to pass the
way it did under specified conditions)
(probability – chance that something will happen in the
future)
of all possible networks evaluated, the one calculated to have
the highest likelihood score is chosen as optimal
a great method if the model chosen is accurate
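The idea of a likelihood score can be illustrated on the smallest possible case: two sequences joined by one branch. The sketch below assumes the Jukes-Cantor model (all substitutions equally probable), which the slides do not name, and searches a grid of branch lengths for the one with the highest likelihood. It is not a full tree search, and the function name is hypothetical.

```python
import math


def jc69_log_likelihood(n_sites, n_diffs, d):
    """Log-likelihood of two aligned sequences separated by d expected
    substitutions per site, under the (assumed) Jukes-Cantor model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e    # site ends with the same base it started with
    p_diff = 0.25 - 0.25 * e    # site ends with one particular different base
    return (n_sites * math.log(0.25)                 # base frequencies of 1/4
            + (n_sites - n_diffs) * math.log(p_same)
            + n_diffs * math.log(p_diff))


# ass vs dog in the example: 5 characters, 2 differences.
n_sites, n_diffs = 5, 2

# Evaluate competing hypotheses (branch lengths) and keep the most likely one.
grid = [i / 1000 for i in range(1, 3001)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(n_sites, n_diffs, d))

# Known closed-form estimate under this model, for comparison.
p = n_diffs / n_sites
closed_form = -0.75 * math.log(1 - 4 * p / 3)

print(best_d, round(closed_form, 4))
```

The estimated distance is larger than the observed proportion of differences (0.4) because the model allows for changes that are hidden by later changes at the same site.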
Bayesian – employs an optimality criterion of Bayesian
posterior probability, based on Bayesian statistics
similar to maximum likelihood in using substitution
models but faster and capable of handling independent
data partitions with multiple substitution models
simultaneously
computationally faster because
1) it relies on the specification of priors (previously
“known” information) based on Bayesian statistics
2) it uses a Markov chain Monte Carlo (MCMC) algorithm to explore
Bayesian posterior probabilities in tree space (the
“universe” of possible trees, given that it is not
necessarily tractable to really calculate them all)
multispecies coalescent - finds a species tree that best
combines a large collection of independently inferred gene
trees that may differ from one another
gene trees can be inferred by any method, but it is usually
maximum likelihood
the best species tree is not necessarily the majority consensus
of gene trees
the method is designed to minimize the impact of a
phenomenon known as incomplete lineage sorting, in which
some ancestral polymorphisms are inherited by one daughter
species but not another, and thus may produce incongruence
among gene phylogenies and between gene phylogenies and the species phylogeny
Confidence in phylogenetic estimates
the one thing that is guaranteed in phylogenetic analysis is
that it will produce a tree
phylogenetic reconstructions can differ because
• they are dependent on the data they use
• varying assumptions of techniques
• varying models and parameters chosen to describe
character state transformations and priors
since there is only one historical truth, how do we evaluate
contradictory phylogenetic reconstructions?
"feel good" confidence vs statistical confidence
non-rigorous methods of assessing confidence
• congruence between independent datasets
• congruence between different methods of analysis on
the same data
• Phenetic - residuals from least squares fitting of
distances to branches
• Maximum likelihood and Bayesian - difference in
likelihoods or Bayesian posteriors
• Parsimony –
• skewness of the tree length
distribution
• length to next shortest tree
• Decay (Bremer) Index
• Consistency Index
Consistency Index
for one character - the minimum number of times a
character could undergo state changes if a network was
optimized for that character alone (i.e., number of
observed character states minus one) divided by the actual
number of state changes on the network in question
for a tree - average of all CI's (polymorphic characters only)
Consistency Index is always a value between 0.0 and 1.0
CI = 1.0
CI = 0.5
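Applied to the four-OTU example, the definition works out as follows. This is a minimal sketch that follows the slide's wording (the tree value is the average over variable characters); the function names are hypothetical and the observed changes are those tallied for networks 1) and 3) in the parsimony example.

```python
def consistency_index(n_states, observed_changes):
    """Minimum possible changes (states minus one) over observed changes."""
    return (n_states - 1) / observed_changes


def tree_consistency_index(columns, observed):
    """Average CI over characters that vary among the OTUs."""
    values = [
        consistency_index(len(set(col)), changes)
        for col, changes in zip(columns, observed)
        if len(set(col)) > 1
    ]
    return sum(values) / len(values)


# Characters 1-5 as columns in OTU order (Ass, Bat, Cat, Dog).
columns = ["AAAA", "TCCT", "AAGG", "TTTT", "TTCC"]

network1 = [0, 2, 1, 0, 1]   # observed changes per character on network 1)
network3 = [0, 1, 2, 0, 2]   # observed changes per character on network 3)

print(consistency_index(2, 1))   # 1.0
print(consistency_index(2, 2))   # 0.5
print(round(tree_consistency_index(columns, network1), 3))
print(round(tree_consistency_index(columns, network3), 3))
```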
Statistical methods – the jackknife
resampling without replacement
reiterative analysis omitting one different taxon each time
a majority rule consensus tree shows the percentage of trees
supporting each node
Statistical methods – the bootstrap
resampling with replacement
reiterative analysis of data sets equal in size to the original
but generated by randomly sampling the original data
(thus, some data will be sampled repeatedly, others not at all)
a majority rule consensus tree shows the percentage of trees
supporting each node
the bootstrap is the only method that has been ‘calibrated’ to
statistical confidence intervals
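The resampling step can be sketched with the four-OTU example. This is a minimal illustration under stated assumptions: characters (columns) are resampled with replacement, each replicate is scored by parsimony on the three possible networks, and a network is credited only when it is uniquely shortest. The names, the seed, and the number of replicates are arbitrary choices for illustration.

```python
import random
from collections import Counter

# Character columns in OTU order (Ass, Bat, Cat, Dog), from the example matrix.
COLUMNS = ["AAAA", "TCCT", "AAGG", "TTTT", "TTCC"]

# Each network pairs OTU positions: network 1) is (Ass, Bat | Cat, Dog), etc.
NETWORKS = {1: ((0, 1), (2, 3)), 2: ((0, 2), (1, 3)), 3: ((0, 3), (1, 2))}


def changes(column, network):
    """Minimum number of changes for one character on a 4-OTU network."""
    (a, b), (c, d) = network
    left, right = {column[a], column[b]}, {column[c], column[d]}
    steps = (len(left) - 1) + (len(right) - 1)
    if not left & right:
        steps += 1
    return steps


def bootstrap_columns(columns, rng):
    """Resample characters with replacement, keeping the data set size."""
    return [rng.choice(columns) for _ in columns]


def bootstrap_support(columns, replicates=1000, seed=1):
    """Percentage of replicates in which each network is uniquely shortest."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(replicates):
        sample = bootstrap_columns(columns, rng)
        lengths = {name: sum(changes(col, net) for col in sample)
                   for name, net in NETWORKS.items()}
        shortest = min(lengths.values())
        winners = [name for name, length in lengths.items() if length == shortest]
        if len(winners) == 1:
            wins[winners[0]] += 1
    return {name: 100 * wins[name] / replicates for name in NETWORKS}


support = bootstrap_support(COLUMNS)
print(support)
```

With only five characters the percentages are not meaningful as confidence estimates; the sketch only shows how some characters are sampled repeatedly and others not at all in each replicate.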
what is a statistical confidence interval?
it is measured by an alpha or "P" value
scientists hold that a P value equal to or less than 0.05 is
significant
this means that you make a hypothesis and test it and look up
P values on a statistical table
type 1 statistical error
Incorrect rejection of a true null hypothesis
at P < 0.01 you would incorrectly reject your null hypothesis
1% of the time
at P < 0.05 you would incorrectly reject your null hypothesis
5% of the time – good enough for scientists
type 2 statistical error
Failure to reject a false null hypothesis
the rate of this error is not set by the P threshold; it depends on
the statistical power of the test (e.g., sample size)
a bootstrap score of 95% has a value of
approximately P<0.05
Very loosely stated, this means that we would do the
bootstrap analysis and get this result and it would in fact be
incorrect 5% of the time
the bootstrap usually underestimates confidence, but
the relationship depends on the number of taxa and the
topology of the tree
Does a bootstrap score of 100% mean that a
relationship is certain?
No way!
outcomes are dependent on input data, phylogenetic method,
models and parameters of character state substitution, etc.
a node on a tree with a bootstrap score of 96% is better
supported than another node on the same tree with a
bootstrap score of 95%
bootstrap scores are not directly comparable between trees,
analyses, or data sets
in Bayesian analysis, confidence intervals are
expressed as Bayesian Posteriors instead of
bootstrap scores
Bayesian Posteriors are not equivalent to bootstrap scores
Bayesian Posteriors are more “generous” and “flattering” so
they are commonly presented in publications
To make matters worse, all methods of analysis with the
alleged exception of the multispecies coalescent are
“statistically inconsistent” under certain conditions
“statistically inconsistent” means that increasing the size of the
data set provides ever-stronger statistical support for an
incorrect result
This happens when there are very short internodes
and a mix of long and short terminal branches
It is commonly referred to as “long branch attraction” (LBA)
Susceptibility to LBA:
phenetic >> parsimony > Bayesian >? Maximum Likelihood
Properties of the multispecies coalescent are not well known
the “molecular clock”
Zuckerkandl and Pauling 1962
relative rate test - Sarich and Wilson (1967) modified from
Margoliash 1963, microcomplement fixation study of primates
the difference in the distance between any two sister taxa to
an outgroup can result only from a difference in the rate of
evolution in the lineages of the sisters since the time they
diverged from their common ancestor
“clock-like evolution” AC = BC where C is outgroup to A and B
therefore, fossils are not required to document relative rates
of evolution
Important things learned from systematics
1) homologous features are derived from common ancestors
2) homology - similarity in structure but not function is evidence
for evolution; there is no other reason to make a whale's flipper
from the same bones, muscles, nerves, and blood vessels as a
bat's wing
3) homoplasy is common in evolution – convergence in function
but not structure is evidence for evolution; there is no other reason to
construct a bird’s and a bat’s wings differently
4) phylogenetic analysis documents evolutionary trends, e.g.,
parallel trends such as reduction in number of digits in cursorial
animals
5) rates of character evolution differ
mosaic evolution - evolution of different characters at
different rates within individual lineages
the concept of "living fossils" is erroneous - individual
characters can be primitive but everything living is
specialized in some way(s) to do what it does
Platypus is primitive as an egg laying mammal but specialized
with respect to electrical sensitivity and poison glands
6) Major evolutionary innovations generally occur in many
small gradual steps
ostensibly discrete characters in living organisms are
generally found to be continuously variable characters if
examined in sufficiently fine segments of time in the fossil
record
turtles have shoulders and hips inside their ribcage
[Figure: Eunotosaurus, Middle Permian (>250 million years old)]
birds have large stiff “pennaceous” feathers for flight
[Figure: Sinosauropteryx, with close-up of proto-feathers]
7) characteristics often owe their change in form to a change
in function
8) most clades display evolutionary radiation
https://www.bio.umass.edu/biology/research/gbi/evolution-of-new-world-leaf-nosed-bats
9) organisms can be classified into a hierarchical system of
nomenclature because species evolve by divergence from
common ancestors
a historical process of branching and divergence will
yield objects that can be hierarchically ordered, but few other
processes will do so, e.g., elements and minerals cannot
10) Systematics provides the phylogenetic framework for
comparative analyses of molecular evolution, e.g., nucleotide
substitution (mutation) rates, natural selection at the
nucleotide level, understanding the origins of emergent
disease, and vaccine development