Tutorial 9

ELE4120 Bioinformatics
Content
• Building Phylogenetic Tree
– Principles and terminology
– Methods to construct the tree
– Samples
The construction of trees
• We construct cladograms (= phylogenies
or trees) to determine how closely related
organisms are.
• We need to understand certain terms
before we can construct a tree.
• Polytomy (multifurcation/starburst), outgroup, sister group,
bifurcation
Additional Terms
• Recall: Bifurcation versus Multifurcation/Starbursts (e.g. Trifurcation)
[Figure: two trees of taxa A, B, C, D, E, labelled Bifurcation and Trifurcation]
Multifurcation
- May represent a lack of resolution because too few data are available for
inferring the phylogeny
- May stand for rapid speciation
Additional Terms
• Character
– any phenotypic trait of an organism
– E.g., tongue length, head shape, nucleotide
position
• Character state
– variant (presumed to be homologous) forms
of a character;
– E.g., if characters are positions in a sequence, the states are
coded as the different amino acids, or as G, A, T, or C
Terms: Homology vs. Homoplasy
• Homology: similar traits inherited from a common ancestor
• Homoplasy: similar traits that are not directly caused by
common ancestry (convergent evolution)
[Figure: two trees, each marking where trait X occurs]
Terms: What traits should one evaluate to construct a taxonomy?
1. Phenetic approach: all traits can be useful; the
taxonomist must use subjective judgment to
decide how important each trait is relative to
other traits.
2. Phylogenetic approach: Which traits should be
used?
– Analogies: traits shared because of convergent
evolution
– Homologies: traits shared because a common
ancestor had the trait
   – Derived homologies
   – Ancestral homologies
Terms: Should We Use Analogies to Construct Phylogeny?
• Characters are versions of a trait.
• Species 1 & 3 share character B because of
convergent evolution: an analogy.
Species:     1  2  3  4
Characters:  B  A  B  A
[Figure: phylogenetic tree of species 1 to 4 with character states marked along the branches]
Terms: Should We Use Derived Homologies to Construct
Phylogeny?
• Characters are versions of a trait.
• Species 1 & 2 share character B because a recent common ancestor
derived B, a new character that species 3 & 4 lack.
Species:     1  2  3  4
Characters:  B  B  A  A
[Figure: phylogenetic tree of species 1 to 4 with character states marked along the branches]
Terms: Should We Use Ancestral Homologies to Construct
Phylogeny?
• Characters are versions of a trait.
• Species 2, 3, & 4 share character A because a common ancestor of
these species had A, and other descendants of that common ancestor
lost A.
Species:     1  2  3  4
Characters:  B  A  A  A
[Figure: phylogenetic tree of species 1 to 4 with character states marked along the branches]
Terms: Summary of Previous 4 Slides
• According to a type of phylogenetics called
“cladistic phylogenetics” or “cladistics”:
– Traits to use to construct phylogenetic trees:
shared derived characters = derived
homologies
– Traits to NOT use to construct phylogenetic
trees:
• shared ancestral characters = ancestral
homologies
• analogies
Recall Steps of Reconstructing trees
• Choose the taxa
– whose evolutionary relationships interest you
– must themselves be clades
• Determine the characters
– examine each taxon to determine the character states
– e.g., anatomical traits, or the 362 bases of a particular
gene
• Determine the polarity of characters
– figure out the order of evolution for each character
– may take some work
– fossil evidence is helpful
Recall Steps of Reconstructing trees
• Group taxa by synapomorphies
– derived or "changed" character states shared by two taxa
– Assumption: similar features caused by common ancestry
• Work out conflicts that arise
– by some clearly stated method, usually parsimony
• Build your tree: rules
– All taxa go to endpoints, never nodes
– All nodes must have a list of synapomorphies
– All synapomorphies appear on the tree only once
• You have made a phylogeny!?
– phylogeny is a hypothesis
– Tree is only as good as the data
Example 1: How to determine the most likely
evolutionary tree?
1. Pick a group of taxa-of-interest: for
example, cow, deer, hippo, pig, and whale.
2. Pick a set of traits that vary between these
taxa: e.g., whether or not each species
contains a DNA insertion at various
specific locations in the chromosomes.
Observe the traits in those species.
Observed Traits: Presence or Absence of DNA
Insertions at Different Specific Locations
Names of Locations of Different Specific Insertions
       1  3  5  6  7  8  10 11 12 15 18 19 20
Cow    0  0  0  0  0  1  1  1  1  1  1  0  0
Deer   0  0  0  0  0  1  1  1  1  1  1  0  0
Hippo  0  0  1  1  1  0  1  0  1  0  1  0  0
Pig    0  0  0  0  0  0  0  0  0  0  1  1  1
Whale  1  1  1  1  1  0  1  0  1  0  1  0  0
“0” means insertion is absent; “1” means insertion is present.
Example 1: building tree
3. Find a control taxon that is similar to the other
taxa, but lacks all the traits you are using to
construct this tree: for example, a mammal that
lacks all of these insertions: the camel.
– What do we call this control group?
Observed Traits: Include the Outgroup
Names of Locations of Different Specific Insertions
       1  3  5  6  7  8  10 11 12 15 18 19 20
Cow    0  0  0  0  0  1  1  1  1  1  1  0  0
Deer   0  0  0  0  0  1  1  1  1  1  1  0  0
Hippo  0  0  1  1  1  0  1  0  1  0  1  0  0
Pig    0  0  0  0  0  0  0  0  0  0  1  1  1
Whale  1  1  1  1  1  0  1  0  1  0  1  0  0
Camel  0  0  0  0  0  0  0  0  0  0  0  0  0
“0” means insertion is absent; “1” means insertion is present.
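A quick way to see which taxa might be grouped is to list, for every insertion, the set of taxa that carry it. The following is a minimal sketch in Python; the names `MATRIX`, `LOCATIONS`, and `group_insertions_by_carriers` are illustrative and not part of the tutorial.

```python
from collections import defaultdict

# Column order of the table: names of the insertion locations.
LOCATIONS = [1, 3, 5, 6, 7, 8, 10, 11, 12, 15, 18, 19, 20]

# Presence (1) / absence (0) of each insertion, copied from the table.
MATRIX = {
    "Cow":   [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    "Deer":  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    "Hippo": [0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "Pig":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    "Whale": [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "Camel": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def group_insertions_by_carriers(matrix, locations):
    """Map each set of taxa to the insertions carried by exactly that set."""
    groups = defaultdict(list)
    for column, location in enumerate(locations):
        carriers = frozenset(
            taxon for taxon, row in matrix.items() if row[column] == 1
        )
        groups[carriers].append(location)
    return dict(groups)


if __name__ == "__main__":
    groups = group_insertions_by_carriers(MATRIX, LOCATIONS)
    for carriers, shared in groups.items():
        print(sorted(carriers), "carry insertions", shared)
```

Insertions carried by two or more taxa (and absent from the camel) are the candidates for shared derived characters; insertions carried by a single taxon say nothing about grouping.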
Example 1: building tree
4. Construct many hypothetical phylogenetic trees,
each tree showing each mutational event: gaining
or losing a DNA insertion.
5. Which tree is the most likely?
– The simplest one = the one that hypothesizes the
fewest mutational events!
– “Parsimony” is the general logical principle that the
simplest hypothesis that successfully explains all
the observations is most likely.
– Select the most parsimonious tree.
Construct many hypothetical phylogenetic trees, each tree
showing each mutational event: gaining or losing a DNA insertion.
Any tree must contain 6 tips, representing the 6 taxa in
alphabetical order below and their current traits, the DNA
insertions.
Species   Traits
Camel     none
Cow       8, 10, 11, 12, 15, 18
Deer      8, 10, 11, 12, 15, 18
Hippo     5, 6, 7, 10, 12, 18
Pig       18, 19, 20
Whale     1, 3, 5, 6, 7, 10, 12, 18
Consider one possible phylogenetic tree,
shown on the next slide.
Consider one possible phylogenetic tree.
[Figure: tree with Camel as the outgroup; Cow and Deer (same traits) share one branch; Hippo, Pig, and Whale (last 3 taxa in alphabetical order) each have their own branch]
Gains hypothesized on each branch:
– Cow + Deer: Gain 8, 10, 11, 12, 15, 18
– Hippo: Gain 5, 6, 7, 10, 12, 18
– Pig: Gain 18, 19, 20
– Whale: Gain 1, 3, 5, 6, 7, 10, 12, 18
This tree hypothesizes 23 mutational events.
Can you think of another, more likely tree: one that is
simpler = has fewer mutational events?
For example, would it be simpler to hypothesize that
the insertion at 18 occurred once rather than 4 separate
times?
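The question about insertion 18 can be checked by simple counting. A minimal sketch in Python, assuming each branch of the tree is represented by the list of insertions gained on it; the dictionary layout and the function names are illustrative, not part of the tutorial.

```python
# Insertions gained on each branch of the tree, as read from the slide.
FIRST_TREE = {
    "Cow+Deer": [8, 10, 11, 12, 15, 18],
    "Hippo":    [5, 6, 7, 10, 12, 18],
    "Pig":      [18, 19, 20],
    "Whale":    [1, 3, 5, 6, 7, 10, 12, 18],
}


def count_events(branch_gains):
    """Total number of hypothesized gains, summed over all branches."""
    return sum(len(gains) for gains in branch_gains.values())


def gain_once(branch_gains, insertion, shared_branch="shared ancestor"):
    """Replace separate gains of `insertion` by one gain on a shared branch.

    Assumption: every branch below the shared branch carries the insertion,
    which holds for insertion 18 in this example.
    """
    simplified = {
        branch: [g for g in gains if g != insertion]
        for branch, gains in branch_gains.items()
    }
    simplified[shared_branch] = [insertion]
    return simplified


if __name__ == "__main__":
    print(count_events(FIRST_TREE))                 # 23
    print(count_events(gain_once(FIRST_TREE, 18)))  # 20
```

Four separate gains of insertion 18 collapse into one, so the count drops by three.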
Simplify the previous phylogenetic tree.
[Figure: the same tree, with the insertion at 18 gained only once]
Gains hypothesized on each branch:
– Gain 18 (once, on the branch shared by all taxa except Camel)
– Cow + Deer: Gain 8, 10, 11, 12, 15
– Hippo: Gain 5, 6, 7, 10, 12
– Pig: Gain 19, 20
– Whale: Gain 1, 3, 5, 6, 7, 10, 12
This tree assumes 20 mutational events.
Can you think of another, more likely tree: one that is
simpler = has fewer mutational events?
What other simplifications
can you think of?
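One way to hunt for further simplifications is to let a program count the events that a given tree needs. The sketch below uses Fitch's counting procedure on rooted binary trees written as nested tuples. The two trees tried here are illustrative guesses, not trees given in the slides, and the count treats a gain and a loss as one event each.

```python
LOCATIONS = [1, 3, 5, 6, 7, 8, 10, 11, 12, 15, 18, 19, 20]

# Insertions present in each taxon (from the trait table).
TRAITS = {
    "Camel": set(),
    "Cow":   {8, 10, 11, 12, 15, 18},
    "Deer":  {8, 10, 11, 12, 15, 18},
    "Hippo": {5, 6, 7, 10, 12, 18},
    "Pig":   {18, 19, 20},
    "Whale": {1, 3, 5, 6, 7, 10, 12, 18},
}


def fitch(tree, state_of):
    """Return (possible ancestral states, minimum changes) for one character.

    `tree` is a taxon name (tip) or a pair (left subtree, right subtree).
    """
    if isinstance(tree, str):
        return {state_of[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, state_of)
    right_states, right_changes = fitch(right, state_of)
    common = left_states & right_states
    if common:
        return common, left_changes + right_changes
    # No shared state: one change is needed on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, traits, locations):
    """Sum the minimum number of events over all insertion locations."""
    total = 0
    for location in locations:
        state_of = {taxon: location in ins for taxon, ins in traits.items()}
        total += fitch(tree, state_of)[1]
    return total


# Hypothetical candidate trees (assumptions for illustration only).
CANDIDATE = ("Camel", ("Pig", (("Cow", "Deer"), ("Hippo", "Whale"))))
ALTERNATIVE = ("Camel", ("Hippo", (("Cow", "Deer"), ("Pig", "Whale"))))

if __name__ == "__main__":
    print(parsimony_score(CANDIDATE, TRAITS, LOCATIONS))    # 13
    print(parsimony_score(ALTERNATIVE, TRAITS, LOCATIONS))
```

With 13 variable insertion locations, no tree can need fewer than 13 events, so a tree that reaches 13 cannot be simplified further under this counting.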
Different algorithms used to infer
phylogeny from sequence data
1. Distance-based methods
a. Calculate evolutionary distances between sequences
b. Build a tree based on those distances
2. Maximum Parsimony (character-based method)
a. Find the simplest tree: the one that explains the data with the fewest substitutions/mutations
b. This is the method used in Example 1 above
3. Maximum Likelihood (probabilistic method based on an explicit model)
a. Find the tree that is most likely, given an evolutionary model
4. Newer Bayesian approaches (also probabilistic)
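Step 1a of the distance-based methods can be sketched in a few lines. The example below counts mismatching positions (Hamming distance) between aligned sequences of equal length, using the four short sequences a to d from Example 2. This raw count is only the simplest possible distance; practical analyses usually apply a substitution-model correction, which is not shown here, and the function names are illustrative.

```python
from itertools import combinations


def hamming(seq1, seq2):
    """Number of positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq1, seq2))


def distance_matrix(sequences):
    """Pairwise distances, keyed by (name1, name2) with name1 < name2."""
    return {
        (p, q): hamming(sequences[p], sequences[q])
        for p, q in combinations(sorted(sequences), 2)
    }


SEQUENCES = {"a": "AAG", "b": "GGA", "c": "AAA", "d": "AGA"}

if __name__ == "__main__":
    for pair, distance in distance_matrix(SEQUENCES).items():
        print(pair, distance)
```

Step 1b would then build a tree from these distances by repeatedly joining the closest pairs, for example with a clustering method such as UPGMA or neighbor joining.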
Example 2: building trees - parsimony
• Now, how do we take a bunch of data, say DNA
sequence data, and make a (phylogenetic) tree?
• One way to do this is to draw a tree on which all
of the features of the organisms (or alleles)
evolve in the simplest possible way.
• This is called a parsimony analysis. Parsimony
means stinginess, so we are trying to find the tree
that requires the fewest substitutions possible.
Example 2: Building phylogenetic trees
Maximum Parsimony (character based method)
Search all possible trees and find the one requiring the fewest substitutions
a = AAG
b = GGA
c = AAA
d = AGA
Example 2: Building phylogenetic trees
Maximum Parsimony (character based method)
Search all possible trees and find the one requiring the fewest substitutions
[Figure: tree grouping a (AAG) with c (AAA), and b (GGA) with d (AGA)]
What are the ancestral sequences at each node?
How many base changes are required for this tree?
Example 2: Building phylogenetic trees
Maximum Parsimony (character based method)
Search all possible trees and find the one requiring the fewest substitutions
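[Figure: the same tree with ancestral sequences filled in: AAA at the ancestor of a (AAG) and c (AAA); AGA at the ancestor of b (GGA) and d (AGA); AAA or AGA at the root]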
What are the ancestral sequences at each node?
How many base changes are required for this tree?
3 changes are required.
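The two questions can be answered by brute force for a tree this small. The sketch below, in Python, tries every possible base at the two internal nodes, site by site, and keeps the assignment with the fewest changes along the five branches. It also scores the other two ways of pairing four sequences. The function name `best_ancestors` is illustrative, and this exhaustive approach is only practical for very small examples.

```python
from itertools import product

BASES = "ACGT"
SEQS = {"a": "AAG", "b": "GGA", "c": "AAA", "d": "AGA"}


def best_ancestors(pair1, pair2):
    """Return (ancestor of pair1, ancestor of pair2, minimum changes).

    Each site is solved independently: x is the base at the ancestor of
    pair1, y the base at the ancestor of pair2. Changes are counted on the
    four tip branches plus the branch joining the two ancestors.
    """
    (s1, s2), (t1, t2) = pair1, pair2
    x_seq, y_seq, total = [], [], 0
    for b1, b2, c1, c2 in zip(s1, s2, t1, t2):
        cost, x, y = min(
            ((x != b1) + (x != b2) + (y != c1) + (y != c2) + (x != y), x, y)
            for x, y in product(BASES, repeat=2)
        )
        x_seq.append(x)
        y_seq.append(y)
        total += cost
    return "".join(x_seq), "".join(y_seq), total


if __name__ == "__main__":
    pairings = [(("a", "c"), ("b", "d")),
                (("a", "b"), ("c", "d")),
                (("a", "d"), ("b", "c"))]
    for (p1, p2), (q1, q2) in pairings:
        result = best_ancestors((SEQS[p1], SEQS[p2]), (SEQS[q1], SEQS[q2]))
        print(f"(({p1},{p2}),({q1},{q2})):", result)
```

For the pairing ((a,c),(b,d)) the sketch gives ancestors AAA and AGA with 3 changes; the root lies on the branch between them, which is why it can be read as AAA or AGA. The other two pairings each need 4 changes.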