Phylogenetic Inference with Parsimony

There are many methods that may be used for estimating trees. In this chapter we focus
on the approach called maximum parsimony (or just parsimony). Maximum parsimony
provides the obvious starting point for any discussion of phylogenetic inference. Of all
methods of phylogenetic analysis, parsimony is probably the most intuitive. Also,
parsimony has played a special role in the history of phylogenetic systematics – it was the
development of the parsimony criterion and programs for implementing this approach
that spurred the phylogenetic revolution that began in the 1980s. Finally, even when
more statistically sophisticated approaches are used to estimate trees, parsimony still
provides a useful way to make inferences about the evolution of traits along those trees.
The bulk of the chapter will describe the parsimony method, which makes assumptions
that are generally valid and, thus, provides a simple way to estimate phylogenies from
many different kinds of data. To provide a context for the discussion of parsimony we
will work through a case study – an analysis of the phylogeny of the Carnivora clade of
mammals.
A biological example: Carnivora
We will now work through the process of using biological data to infer
phylogenetic relationships among a set of species, using Carnivora as our group of
interest. Carnivora is a group of mostly meat-eating mammals, including, for example,
dogs, cats, bears, weasels, mongooses, skunks, and seals. While these animals differ
greatly in their external appearance and their ecology they share several skeletal features.
For example, almost all species have enlarged cheek teeth, the carnassials, which are
used for shearing meat. Based on this and other traits, it has long been accepted that the
Carnivora is a monophyletic group, a clade. But what are the relationships within
Carnivora?
Before embarking on a study of carnivore phylogeny, we need to decide which
species of carnivores to select for our study and what traits to score. With around 250
species, we cannot possibly examine all of the carnivores, and depending on the questions
we wish to answer, we do not need to. Let us say that we want to find out if the
carnivores can be divided into two natural (i.e., monophyletic) groups, the aquatic
pinnipeds (seals, sealions, and walruses) and the terrestrial fissipeds (all others). To
answer this question, sampling a single representative species from each of the major
families (dogs, cats, seals, sealions, etc.) will suffice.
©Baum and Smith 1/22/09. Draft. Do not circulate.
[Figure: Skulls of a bobcat and a Mexican grey wolf (from Skulls Unlimited).]
Now that we have chosen which taxa to include, we must decide which traits or
characters to use as evidence for inferring the phylogeny. Any trait that varies among
tips and is thought to show some degree of heritability (ancestors with the trait tend to
have descendants with the trait) has the potential to provide phylogenetic information.
Until recently, phylogenetic analyses were largely based on morphological traits. With
the advent of modern molecular methods, almost all phylogenetic studies now employ
DNA sequence data, where the individual nucleotide positions along the strand of DNA
are the characters. However, it will be instructive to begin with a consideration of
morphological characters. We will do this by extracting morphological data from a
published study (Wyss and Flynn 1993).
There is one final consideration before we can begin to collect data: we must
choose one or more outgroup taxa. Outgroups serve as a point of comparison with our
ingroup (here the carnivores) and allow us to determine the direction of character change
(discussed in more detail later). Any taxon that is not a member of Carnivora could,
theoretically, serve as an outgroup. However, the best outgroups are reasonably closely
related to the ingroup so that they can be easily compared for many traits. For our
analysis of morphological data we will use the creodonts, an extinct group of
mammals that appears to be closely related to Carnivora (perhaps their sister group). We
have good reason to think creodonts are outside of the Carnivora (and thus can be an
outgroup) because they lack the bony auditory bulla, which is found in Carnivora. We
could have used living placental mammals from another group, although we would want to avoid
clades that have undergone extensive phenotypic evolution, such as bats, for which it
might be hard to identify homologous traits.
Now we can proceed to score the ingroup and outgroup taxa for the
morphological traits selected. The species are scored for each trait by observing an
individual or a few individuals from that species and recording the form of that trait, its
character state. The 11 characters and the states for each character are given in the table
below.
Table X: Characters and character states from Wyss and Flynn’s study of carnivore
phylogeny. For each character, the state coded ‘0’ is listed first.
No.  Character                                          States (‘0’; ‘1’)
1    Complexity of the turbinal (nasal) bones           Simple; complex
2    Relative size of auditory bulla                    Small; large
3    Number of lower incisors                           2; 3
4    Upper molar 1                                      Present; absent
5    Baculum (bone within the penis)                    Present; absent
6    Tail                                               Elongated; short
7    Hallux (1st digit, or dewclaw, of the hind foot)   Present; absent
8    Prostate gland size and shape                      Small and simple; large and bilobed
9    Kidney structure                                   Simple; conglomerate
10   External ear (pinna)                               Present; absent
11   Testis position                                    Scrotal; abdominal
Moving through the tips, we can record the character state for each character for each
species to build a character-state matrix. For example, we observe that, for trait 6 (the
tail), creodonts have the “elongated” state. We have chosen to represent this state with a
0, so creodonts are given a score of 0 for trait 6 in the matrix. (We have assigned ‘0’ to
all the character states present in the outgroup, but choosing different labels for the states
would not affect the analysis.)
The complete character-state matrix of 11 characters for 10 ingroup and 1 outgroup taxa
is given below. Notice that some taxa may be scored as unknown for certain characters
(conventionally represented with ‘?’), either because we are ignorant as to the proper
scoring (e.g., soft tissues in a fossil) or because it is impossible to score (e.g., toe number
in snakes). Also, a taxon can be scored as polymorphic by listing multiple states within a
cell.
Character-state scoring (columns are character numbers)

Taxa          1  2  3  4  5  6  7  8  9 10 11
creodonts     0  0  0  0  0  0  0  ?  ?  ?  ?
cats          0  1  0  1  0  0  1  1  0  0  0
hyaenas       0  1  0  1  0  0  1  1  0  0  0
civets        0  1  0  0  0  0  0  1  0  0  0
dogs          1  0  0  0  1  0  0  0  0  0  0
raccoons      1  0  0  0  1  0  0  0  0  0  0
bears         1  0  0  0  1  1  0  0  1  0  0
otters        1  0  0  0  1  0  0  0  1  0  0
seals         1  0  1  0  1  1  0  0  1  1  1
walrus        1  0  1  0  1  1  0  0  1  1  1
sealions      1  0  1  0  1  1  0  0  1  0  1
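The sketch below shows one way a program might store cells like these. It is a minimal Python example and assumes a common convention: ‘?’ allows any state, and a multi-digit cell such as ‘01’ lists the states of a polymorphic taxon. The helper name `parse_cell` is hypothetical and is not part of any particular phylogenetics package.

```python
# Hypothetical helper: turn one scored cell into the set of states it allows.
# Assumed conventions: '?' = unknown (any state), several digits = polymorphic.
ALPHABET = {"0", "1"}


def parse_cell(cell: str, alphabet=ALPHABET) -> set:
    """Return the set of character states that a matrix cell allows."""
    if cell == "?":
        # Unknown or impossible to score: any state is allowed.
        return set(alphabet)
    # A single digit is an ordinary score; several digits (e.g. "01") mark a polymorphism.
    return set(cell)


# The creodont row of the matrix above (characters 1-11).
creodont_row = ["0", "0", "0", "0", "0", "0", "0", "?", "?", "?", "?"]
parsed = [parse_cell(c) for c in creodont_row]
print(parsed[0], parsed[7])
```

Treating a polymorphic cell as a set of allowed states is only one option. Some programs distinguish genuine polymorphism from uncertainty about a single state.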
As you might imagine, it can be difficult to find a large number of morphological traits
showing appropriate levels of variation for building a phylogeny. In comparison, it has
become quite easy to obtain large amounts of DNA sequence data and to find regions that
are suitably variable for phylogenetics. Table X shows what such data might look like
for the carnivores. Whereas for morphological data the character states were coded as 0s
and 1s, the states of DNA are the four bases (A, C, G, T). You may also observe gaps
(marked with a hyphen) in the DNA matrix. These gaps arise when bases are inserted into or
deleted from a species’ sequence in the course of evolution. The process of sequence
alignment is concerned with establishing the correct position of gaps so that homologous
sequence positions are aligned above one another in the data matrix. The methods and
theory of sequence alignment are rather technical, but in most cases DNA sequences can
be aligned by hand.
Table X. The states for 15 consecutive positions in the transthyretin 2 gene. Here the mole is
used as the outgroup, and a weasel rather than an otter represents the Mustelidae.
Sequence data
Taxon list
mole
cat
hyaena
civet
dog
racoon
bear
weasel
seal
walrus
sealion
1
2
3
4
5
6
7
8
9
10
11
12 13 14 15
G
G
G
G
G
G
C
G
G
G
G
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
A
A
A
A
G
G
G
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
G
G
G
G
G
G
G
C
C
C
C
C
G
T
G
C
C
C
T
C
C
C
A
G
G
G
G
G
G
T
T
T
T
T
T
T
T
T
T
T
C
C
C
C
C
C
C
C
C
C
C
T
T
T
T
T
T
T
T
T
T
T
C
T
C
C
G
G
G
G
G
G
G
A
A
A
A
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
T
T
T
T
T
T
T
T
T
T
T
Sequence
position
The maximum parsimony criterion
Acknowledging that traits do not always evolve in a perfectly Hennigian fashion, one
logical way to proceed is to allow that some rule breaking occurred, but to assume that
most of the time the rules were followed. Thus, we favor trees that minimize the number
of times that rules were broken – an application of the principle of parsimony. The
maximum parsimony criterion holds that the best tree is the one that can explain all the
observed data by invoking the fewest character state changes.
Phylogenetic inference using parsimony involves determining a score for each tree and
then selecting the tree with the best score. Three steps are involved in estimating a tree
using simple parsimony:
1) For a single tree we consider each character in turn and determine the minimum
number of state-changes, or steps, that are required to account for the distribution
of states among tips (see Chap. 3).
2) We sum up the number of steps required by each character on this tree. The total
number of steps required to explain all the data on a single tree is called the tree
length.
3) We repeat the preceding steps for all alternative trees and then identify the tree
with the lowest tree length, the shortest or most-parsimonious tree.
The carnivore data matrix is too big to analyze by hand, so we will illustrate the parsimony
method using a simple data matrix for four taxa. If we assume that taxon O (the
outgroup) is sister to the remaining species, i.e., that taxa A-C form a clade, then three
trees are possible for these data.
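Character data

Taxa   1  2  3  4  5  6  7  8
O      0  0  1  0  1  1  0  0
A      0  1  1  0  1  0  1  0
B      1  1  1  1  0  0  1  1
C      0  0  0  1  1  1  0  0

[Figure: Trees 1-3, the three possible trees for ingroup taxa A-C with O as the outgroup. Tree 1 groups B with C, tree 2 groups A with C, and tree 3 groups A with B.]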
We can start by considering tree 1 and seeing how we can explain each character in
turn. If tree 1 were true, the simplest way to explain the first character is that all lineages
had state 0, but there was a single change to state 1 somewhere on the lineage leading to
taxon B. Thus, we can explain this character with a single step, so character 1 has length
1 on tree 1.
Character 2 is more difficult to map onto tree 1. There is no way to explain the
distribution of states among the tips with only one change, but there are three ways to do
it with two changes. These three equally parsimonious reconstructions are shown in the
figure. Branches assigned to state 0 are shown in grey and those assigned to state 1 are
shown in black. The first scenario entails two independent transitions to state 1 (from
state 0), the second entails two independent transitions to state 0 (from state 1), whereas
the third entails one change to state 1 and one reversal back to state 0. From the point of
view of phylogenetic analysis, we do not have to worry about which of these
reconstructions is correct; all that matters is that it takes a minimum of two changes to
map character 2 onto tree 1. Hence, character 2 has length 2 on tree 1.
[Figure: The three equally parsimonious reconstructions of character 2 (O=0, A=1, B=1, C=0) on tree 1. Grey branches, state 0; black branches, state 1.]
Using the same approach we can now map all eight characters onto tree 1. Characters 1,
3, 4, 5, and 8 each have only one most-parsimonious reconstruction, whereas characters
2, 6, and 7 each have multiple equally parsimonious reconstructions. The accompanying figure shows one
most-parsimonious mapping for each character. In total, 11 steps are required to explain
all the data. That is to say, tree 1 has a length of 11.
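The text says character 2 has exactly three most-parsimonious reconstructions on tree 1, and this can be checked by brute force. The sketch makes two assumptions. First, tree 1 pairs B with C, which is the topology implied by the per-character lengths tabulated later in this section. Second, the internal nodes carry our own illustrative labels: r for the root, y for the ancestor of A, B and C, and z for the ancestor of B and C. The code scores every 0/1 assignment to those nodes and keeps the cheapest ones.

```python
from itertools import product

# Tree 1 written as parent -> child branches. Node names are illustrative:
# r = root, y = ancestor of A, B and C, z = ancestor of B and C.
TREE1_BRANCHES = [("r", "O"), ("r", "y"), ("y", "A"), ("y", "z"), ("z", "B"), ("z", "C")]
INTERNAL_NODES = ("r", "y", "z")


def parsimonious_reconstructions(tips, branches=TREE1_BRANCHES, internal=INTERNAL_NODES):
    """Try every 0/1 assignment to the internal nodes.

    Returns the minimum number of steps and every assignment achieving it.
    """
    scored = []
    for assignment in product((0, 1), repeat=len(internal)):
        states = {**dict(zip(internal, assignment)), **tips}
        # A step is any branch whose two ends carry different states.
        steps = sum(states[parent] != states[child] for parent, child in branches)
        scored.append((steps, assignment))
    best = min(steps for steps, _ in scored)
    return best, [assignment for steps, assignment in scored if steps == best]


# Character 2: O = 0, A = 1, B = 1, C = 0.
steps, reconstructions = parsimonious_reconstructions({"O": 0, "A": 1, "B": 1, "C": 0})
print(steps, reconstructions)  # 2 [(0, 0, 0), (0, 1, 1), (1, 1, 1)]
```

The three assignments (r, y, z) correspond to the three scenarios in the text:
- (0, 0, 0) is two independent gains.
- (0, 1, 1) is a gain on the branch to y followed by a reversal in C.
- (1, 1, 1) is two independent losses.

Exhaustive enumeration doubles in cost with every internal node, so it is only practical for very small trees.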
We can now do the same procedure for the other trees. For these same data, tree 2 has a
length of 12, whereas tree 3 has a length of 9 (as shown). This tells us that tree 3 is the
most-parsimonious tree and is the one that would be preferred under the maximum
parsimony criterion.
[Figure: One most-parsimonious mapping of all eight characters on tree 1 (length 11) and on tree 3 (length 9). Numbered bars mark changes in the corresponding characters; black bars, change from 1 to 0; grey bars, change from 0 to 1.]
The length of each tree is the sum of the numbers of steps required to explain each
character on that tree. The table lists, for each of the eight characters, its states and its
length (L) on each of the three trees; the column totals are the tree lengths.
Char.  O  A  B  C   L on tree 1   L on tree 2   L on tree 3
1      0  0  1  0        1             1             1
2      0  1  1  0        2             2             1
3      1  1  1  0        1             1             1
4      0  0  1  1        1             2             2
5      1  1  0  1        1             1             1
6      1  0  0  1        2             2             1
7      0  1  1  0        2             2             1
8      0  0  1  0        1             1             1
Total length            11            12             9 (most-parsimonious tree)
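The three steps listed earlier can be sketched in Python for this four-taxon matrix. Step 1 uses Fitch's algorithm, a standard way of counting the minimum number of changes for an unordered character. It is swapped in for illustration and may not be the procedure described in Chapter 3. The tree topologies are inferred from the per-character lengths in the table because the tree drawings did not survive: tree 1 pairs B with C, tree 2 pairs A with C, and tree 3 pairs A with B.

```python
# Four-taxon matrix from the worked example: one string per taxon, one digit per character.
MATRIX = {
    "O": "00101100",
    "A": "01101010",
    "B": "11110011",
    "C": "00011100",
}

# Candidate rooted trees as nested tuples (topologies inferred from the length table).
TREES = {
    1: ("O", ("A", ("B", "C"))),
    2: ("O", ("B", ("A", "C"))),
    3: ("O", ("C", ("A", "B"))),
}


def fitch(tree, states):
    """Step 1: return (candidate ancestral states, minimum steps) for one character."""
    if isinstance(tree, str):  # a tip: its observed state, no steps
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:  # the two subtrees can agree: no extra step
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # one extra step


def character_lengths(tree, matrix):
    """Minimum steps for each character (column) of the matrix on one tree."""
    n_chars = len(next(iter(matrix.values())))
    return [fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
            for i in range(n_chars)]


# Step 2: tree length = sum over characters. Step 3: keep the shortest tree.
tree_lengths = {name: sum(character_lengths(t, MATRIX)) for name, t in TREES.items()}
best = min(tree_lengths, key=tree_lengths.get)
print(tree_lengths, "-> most parsimonious: tree", best)
# {1: 11, 2: 12, 3: 9} -> most parsimonious: tree 3
```

Enumerating every tree, as step 3 does here, quickly becomes impractical: the number of possible trees grows faster than exponentially with the number of taxa.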
It is worth highlighting that although the optimal (most-parsimonious) tree has the
shortest length overall, it is not optimal for all characters. Character four has a length of
two on the optimal tree, but it has a length of one on tree 1. Character four can be said to
support tree 1 over tree 3. However, the weight of evidence still favors tree 3 over tree 1.
The occurrence of extra changes is an instance of homoplasy. Although parsimony does
not assume that all characters evolve parsimoniously, it favors the tree that invokes the
least amount of homoplasy summed over all characters.
A second observation can be drawn from this table. Only characters 2, 4, 6, and 7 vary in
length among trees; only these characters are parsimony informative. Characters 1, 3, 5,
and 8, in contrast, are parsimony uninformative. Because they have the same length
under each of the three trees, they do not help us choose among trees. Thinking back to
the Clade Race, these characters are like the heart and star stamps that only one runner
picked up. Any derived trait that is found only in one taxon will provide no evidence for
resolving the tree.
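The informativeness rule can be written as a one-line test. A character can favor one tree over another only if at least two of its states each occur in two or more taxa. The minimal sketch below assumes that unknown (‘?’) scores are simply ignored.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in two or more taxa.

    Unknown scores ('?') are ignored here (an assumption for this sketch).
    """
    counts = Counter(state for state in column if state != "?")
    return sum(1 for n in counts.values() if n >= 2) >= 2


# Columns of the four-taxon example, states listed in the order O, A, B, C.
COLUMNS = ["0010", "0110", "1110", "0011", "1101", "1001", "0110", "0010"]
informative = [i + 1 for i, col in enumerate(COLUMNS) if is_parsimony_informative(col)]
print(informative)  # [2, 4, 6, 7]
```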
Character-state scoring for all 19 morphological characters (columns are character numbers)

Taxa          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
outgroup      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
cats          0  1  0  1  0  0  1  1  0  0  0  1  1  1  0  0  0  0  0
hyaenas       0  1  0  1  0  0  1  1  0  0  0  1  1  1  1  0  0  0  0
civets        0  1  0  0  0  0  0  1  0  0  0  0  0  1  1  0  0  0  0
dogs          1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
raccoons      1  0  0  0  1  0  0  0  0  0  0  0  0  1  0  1  0  0  0
bears         1  0  0  0  1  1  0  0  1  0  0  0  0  0  0  1  1  0  0
otters        1  0  0  0  1  0  0  0  1  0  0  1  0  1  1  1  0  0  0
seals         1  0  1  0  1  1  0  0  1  1  1  0  1  1  1  ?  0  1  1
walrus        1  0  1  0  1  1  0  0  1  1  1  0  0  1  1  0  1  1  1
sealions      1  0  1  0  1  1  0  0  1  0  1  0  1  1  1  0  1  1  1
Now we can apply parsimony to the carnivore dataset. The data matrix shown above
includes the 11 characters listed earlier plus a further eight characters from Wyss and
Flynn (1993). Two trees are equally parsimonious for these data, both requiring 29
character state changes to explain all 19 morphological characters. The two trees differ
only in the relationships within the pinnipeds: seals being sister either to walruses or
sealions. The information common to both trees can be shown in a consensus tree
(technically a strict consensus tree): a tree that contains only those clades present in both
equally parsimonious trees, as shown on the right.
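A strict consensus can be computed by representing each tree as a set of clades, where each clade is a set of tip names, and keeping only the clades common to all trees. The clade sets below are hypothetical. They cover only the pinniped part of the two trees, since the two most-parsimonious trees are described as differing only there.

```python
def strict_consensus(*trees):
    """Return the clades present in every tree (each tree given as a set of clades)."""
    shared = set(trees[0])
    for clades in trees[1:]:
        shared &= set(clades)
    return shared


# Hypothetical clade sets covering only the pinniped part of the two trees.
PINNIPEDS = frozenset({"seals", "walrus", "sealions"})
tree_a = {PINNIPEDS, frozenset({"seals", "walrus"})}
tree_b = {PINNIPEDS, frozenset({"seals", "sealions"})}

consensus = strict_consensus(tree_a, tree_b)
print(consensus == {PINNIPEDS})  # True: seals, walrus and sealions remain unresolved
```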
[Figure: The two most-parsimonious trees for the morphological data (both of length 29), which differ only in whether seals are sister to the walrus or to sealions, and their strict consensus.]
How does this compare to the tree inferred from analysis of DNA sequence data for the
same organisms? For most of the living carnivores included here, sequences are
available for a 1116 base-pair region of the transthyretin 2 gene (Nedball and Flynn,
199x). It is necessary to use a weasel rather than an otter to represent the family
Mustelidae. It is also necessary to use a living mammal, in this case a mole, rather than
the fossil creodonts as an outgroup. With these substitutions, we can conduct exactly the
same kind of analysis. Parsimony analysis of these molecular data yields a single tree of
length 812 steps, shown in fig. x.
[Figure: The single most-parsimonious tree (812 steps) for the transthyretin sequence data. Tips: mole (outgroup), cat, hyaena, civet, dog, raccoon, bear, weasel, seal, walrus, sealion.]
This tree has many similarities to the trees obtained from morphological data. Both trees
include a monophyletic pinniped group (seals, walruses, sealions) and a division of the
living carnivores into two major subclades: feliforms (cats, hyaenas, civets) and caniforms
(dogs, raccoons, bears, otters/weasels, and pinnipeds), with dogs sister to the remaining
caniforms. This agreement should not be taken for granted. If these data were not the
result of evolution along a tree, the probability that two independent data sets would
yield trees this similar is 10^-xx (how do we calculate this?). The fact that both kinds of
data yield such similar trees provides concrete evidence in support of the claim that these
tips are the result of descent from common ancestry. As discussed by Penny et al. (1982),
the agreement among independent phylogenetic data sets serves to reject the hypothesis of
separate ancestry.
However, while these two trees are remarkably similar, they also have some differences.
One difference is the sister group of the pinnipeds: the bears for the morphological data
(based in part on the fact that both groups have short tails), but a clade of raccoons plus
otters/weasels for the DNA sequence data. If we assume that there is one true treelike
history of carnivorans along which both molecular and morphological data have evolved,
then these differences are best understood as being due to imperfect phylogenetic
inference: one or the other or both trees are presumably incorrect in some details. While
some conclusions, such as the placement of the pinnipeds within the caniforms, are
well demonstrated by these analyses, other results must remain uncertain
pending the collection of more data.
Parsimony versus compatibility
Early in the development of systematics, when it was realized that pure Hennigian
reasoning did not work, another criterion, compatibility, was proposed as an alternative to
parsimony. While compatibility is not widely used, it is worth knowing about the concept
and how it differs from parsimony.
A character is said to be compatible with a tree if it can be explained on that tree without
invoking any homoplasy. Parsimony-uninformative characters are compatible with all
trees. Parsimony-informative characters are compatible only with some trees. A
character with two states is compatible only with trees that can explain the data with one
change of state. A character with three states is compatible only with trees that can
explain the data with two changes of state. More generally, a character is compatible
with a tree if the number of state changes needed on that tree is one less than the number
of states.
Here is a tree and some examples of characters that are and are
not compatible.
[Figure: A tree for six taxa, A-F.]

              A  B  C  D  E  F   Length on tree
Compatible    A  A  A  A  A  A   0
              G  G  G  G  A  G   1
              T  T  T  T  C  C   1
              G  G  G  T  T  C   2
              G  G  C  T  A  A   3
Incompatible  G  G  A  A  G  G   2
              C  T  T  C  C  T   3
              A  C  A  C  G  G   3
              A  C  A  C  G  G   3
              A  C  G  C  A  G   4
              T  G  A  T  C  G   4
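The compatibility test can be checked mechanically. Count the minimum number of changes for each character and compare it with the number of states minus one. This sketch uses Fitch's algorithm, which is exact for unordered characters with any number of states on a bifurcating tree. It assumes the six-taxon tree is the ladder (A,(B,(C,(D,(E,F))))). That topology is our reconstruction of the lost drawing, chosen because it reproduces every length in the table.

```python
# Assumed six-taxon tree: the ladder (A,(B,(C,(D,(E,F))))), reconstructed from the lengths.
TREE = ("A", ("B", ("C", ("D", ("E", "F")))))
TAXA = "ABCDEF"
CHARACTERS = [  # states for taxa A-F, in the order of the table
    "AAAAAA", "GGGGAG", "TTTTCC", "GGGTTC", "GGCTAA",
    "GGAAGG", "CTTCCT", "ACACGG", "ACACGG", "ACGCAG", "TGATCG",
]


def fitch(tree, states):
    """Return (candidate ancestral states, minimum steps) for an unordered character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (l_set, l_steps), (r_set, r_steps) = fitch(tree[0], states), fitch(tree[1], states)
    shared = l_set & r_set
    return (shared, l_steps + r_steps) if shared else (l_set | r_set, l_steps + r_steps + 1)


def is_compatible(character, tree=TREE):
    """Return (steps, compatible?), where compatible means steps == number of states - 1."""
    steps = fitch(tree, dict(zip(TAXA, character)))[1]
    return steps, steps == len(set(character)) - 1


for char in CHARACTERS:
    print(char, *is_compatible(char))
```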
The best tree according to the compatibility criterion is the tree for which the maximum
number of characters are compatible. Compatibility is similar but not identical to
parsimony: parsimony selects the tree on which there are the fewest homoplastic changes,
whereas compatibility selects the tree on which the fewest characters show any
homoplasy.
While the most-compatible tree is often also the most-parsimonious tree, it need not be.
Whereas compatibility treats all incompatible characters as equivalent, parsimony takes
account of the number of steps required by both compatible and incompatible
characters. This is shown by the following hypothetical example in which tree 1 is
favored by parsimony and tree 2 by compatibility. This is because the last two
characters, which are not compatible with either tree, nonetheless can be explained on
tree 1 with fewer steps than on tree 2.
[Figure: Tree 1 and tree 2, two alternative trees for taxa O and A-E.]

Char.  O  A  B  C  D  E   L on tree 1   L on tree 2
1      A  C  C  C  C  C        1             1
2      C  C  T  T  T  T        1             1
3      G  G  T  T  T  T        1             1
4      G  G  G  A  A  A        2             1
5      A  A  A  A  T  T        1             1
6      T  T  T  T  C  C        1             1
7      A  T  T  A  A  T        2             3
8      G  A  A  G  A  G        2             3
Compatible characters          5             6
Length                        11            12
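The two criteria can be compared directly from the table. The sketch takes the per-character lengths as given in the table rather than recomputing them, because the two tree drawings did not survive. It then scores each tree by total length (parsimony) and by the number of compatible characters (compatibility).

```python
# Data copied from the table above: states for O, A, B, C, D, E per character,
# and the number of steps each character needs on tree 1 and on tree 2.
COLUMNS = ["ACCCCC", "CCTTTT", "GGTTTT", "GGGAAA", "AAAATT", "TTTTCC", "ATTAAT", "GAAGAG"]
STEPS = {"tree 1": [1, 1, 1, 2, 1, 1, 2, 2], "tree 2": [1, 1, 1, 1, 1, 1, 3, 3]}


def parsimony_score(steps):
    """Tree length: total number of steps (lower is better)."""
    return sum(steps)


def compatibility_score(steps, columns=COLUMNS):
    """Number of characters with no homoplasy, i.e. steps == states - 1 (higher is better)."""
    return sum(s == len(set(col)) - 1 for s, col in zip(steps, columns))


by_parsimony = min(STEPS, key=lambda t: parsimony_score(STEPS[t]))
by_compatibility = max(STEPS, key=lambda t: compatibility_score(STEPS[t]))
print(by_parsimony, by_compatibility)  # tree 1 tree 2
```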
Compatibility is not an unreasonable method for estimating phylogenies, especially when
we believe that there is a subset of characters whose evolution perfectly reflects the
phylogeny, while other characters change state too often to be useful. However, if you
believe that all characters, even ones that have experienced some homoplasy, contain
information, then parsimony is a better criterion. For this and other reasons compatibility
is rarely, if ever, used in modern phylogenetics and will not be discussed further.
A justification of parsimony
The aim of phylogenetics is to choose the tree that is most likely to be true given all the
empirical data. Why would we expect more parsimonious (shorter) trees to be more
probably true than less parsimonious (longer) trees? Why would the discovery of a set of
data for which tree A is shorter than tree B tend to increase our belief that tree A is true?
Here we present a verbal argument; a simplified quantitative analysis is provided in
Appendix 1 at the end of this chapter.
Let’s start by assuming that the true evolutionary tree is such that the rate of character
evolution is low and relatively even across the tree. When the rate of character evolution
is low the chance of a single character changing state is low, but the chance of changes
happening on two branches is much lower, because it will be the square of the initial low
probability (both changes must occur, so the probabilities are multiplied). So most characters
will have experienced few changes of state from the ancestral state (at the root of the
tree), meaning that homoplasy will be low. Thus, the data will generally be explicable by
invoking few changes on the tree that matches the true tree, but will require more changes
on other trees. Therefore, when rates of evolution are reasonably low, the tree that can
explain all the characters with the fewest changes will tend to resemble the
true tree.
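The squaring argument can be written out as a short worked formula. It assumes, as the text implies, that changes on different branches occur independently, each with the same small probability p. The value p = 0.01 is borrowed from the appendix for illustration.

```latex
P(\text{change on one given branch}) = p, \qquad
P(\text{changes on two given branches}) = p \times p = p^{2}
% Example with p = 0.01: a single change has probability 0.01,
% whereas a particular pair of changes has probability 0.0001.
```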
What if the rate of evolution is high for all characters? If all characters in the matrix tend
to evolve very rapidly, the data will contain little consistent signal favoring one tree over
another. Once this lack of phylogenetic signal has been detected (by methods that will be
described later in this chapter), it will become clear that the data are not suited to
parsimony analysis.
What if the rate of evolution is low for some characters but very high for others? In this
case the rapidly evolving characters will tend to yield a noisy pattern that will not
strongly favor any one tree. However, provided there are enough slowly evolving traits,
they will tend to lay their collective support behind a single tree – and that tree should
resemble the true tree.
Taken together, parsimony provides a reasonable method for inferring phylogenetic trees:
the probability that a tree is true will generally correlate at least loosely with tree length,
shorter trees being more probable. Further, tree length provides a crude measure of how
much better one tree is than another. Thus, if tree A is one step shorter than tree B but 15
steps shorter than tree C, we can say that the data reject tree C more strongly than they
reject tree B. However, the absolute length of a tree and the magnitude of the length
difference between two trees depend on the particular matrix of characters used.
Without other analyses (such as those presented later in the chapter), we cannot assert
that one tree is “significantly” better than another. Although parsimony gives us valuable
insights into the trees implied by our data, it falls far short of being a rigorous statistical
method.
Appendix 1
A quantitative demonstration that the most-parsimonious tree will tend to be the
tree that is most likely to have generated the data.
Name one character-state ‘0’ and its alternative state ‘1.’ For example, state ‘0’ could be
locomotion = tetrapedal whereas state ‘1’ could be locomotion = bipedal; or state ‘0’
could be position 119 in the haemoglobin gene = adenine and state ‘1’ could be position 119 in
the haemoglobin gene = cytosine.
Consider four taxa: an outgroup, O, and three ingroup taxa, A-C. To be parsimony
informative, a two-state character must have state 0 in two taxa and state 1 in the other two. Suppose B and C
have state 1 and taxa A and O have state 0. If we order the taxa OABC, this character has
the pattern 0011. The most parsimonious tree for this single character will be one that
can explain the states with only a single change of character state, namely a tree with B
and C sister: tree 1 among the three possible trees. So the method of parsimony assumes
that, if we start by considering all three trees to be equally likely to be true, the
observation of a character with a 0011 pattern is sufficient to make tree 1 the most
plausible of the three trees. To understand the validity of parsimony we need to
understand why a 0011 pattern supports tree 1 among these three trees.
[Figure: Trees 1-3 for outgroup O and ingroup taxa A-C. Tree 1 groups B with C, tree 2 groups A with C, and tree 3 groups A with B.]
The answer is that, under certain assumptions, tree 1 is more likely to generate the data
pattern 0011 than are the other trees. It is a general principle of statistics that, when the
hypotheses are considered equally probable beforehand (as we assumed above), the one most
likely to be true is the one that, if it were true, would be the most likely to generate the
observed data. But how can we calculate the probability of the data
being generated under one of these trees?
It will be helpful to work through this numerically, using trees 1 and 2 (we will ignore
tree 3). To make it easy let’s assume that each branch on the two trees is the same length,
where length is here measured in units of probability of a character changing state (in
either direction). Suppose that the probability of a change occurring on each branch is
1%. This is low enough that we can assume that no more than one change will occur per
branch.
[Figure: Tree 1 with its internal nodes labelled x (the root), y (the common ancestor of A, B, and C), and z (the common ancestor of B and C).]
Assuming that nodes x, y, and z represent actual ancestors that
had either state 0 or state 1 (not both), there are eight
possible combinations of ancestral states, as shown in the
table.
[Figure: Histories 1 and 2 mapped on tree 1.]
We can calculate the probability of history 1 yielding a 0011 pattern by taking the product
of the probabilities for the branches along which no change happened (0.99 each) and for
those along which a change happened (0.01 each). In this case there are four branches with
no change and two with changes. So the probability that history 1 yields a 0011 pattern is
0.99^4 × 0.01^2 = 9.6 × 10^-5, or about 1 in 10,000.
Now let’s do the same thing for history 2. In this case, just one character state change is
required. This means that the chance of this outcome is 0.99^5 × 0.01 = 9.5 × 10^-3, which
is nearly 1 in 100.
The table shows the probability of a 0011 pattern for all eight histories.

History (x y z)   Branches without change   Branches with change   Probability
1 (0 0 0)                   4                         2              9.6 × 10^-5
2 (0 0 1)                   5                         1              9.5 × 10^-3
3 (0 1 0)                   1                         5              9.9 × 10^-11
4 (0 1 1)                   4                         2              9.6 × 10^-5
5 (1 0 0)                   2                         4              9.8 × 10^-9
6 (1 0 1)                   3                         3              9.7 × 10^-7
7 (1 1 0)                   1                         5              9.9 × 10^-11
8 (1 1 1)                   4                         2              9.6 × 10^-5
Total                                                                9.8 × 10^-3
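Summing across all eight histories, we can say that, given this tree, the probability of a
0011 pattern is about 0.0098, or a shade less than 1%. You can probably also see that the
probability of a 1100 pattern will be the same. Thus, there is an approximately 2%
chance that evolution along tree 1 will produce a parsimony-informative character
supporting this tree. (Strictly, each history should also be weighted by the probability of
its root state x; if both root states are equally probable, every probability in this appendix
is halved, bringing this chance closer to 1%. Because the same factor applies to every tree,
the comparisons between trees are unaffected.)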
You might consider 2% to be a low number, so why would we still say that these data
favor tree 1 over tree 2? To see why, we also need to calculate the probability of
0011 and 1100 patterns under tree 2. What is the probability of a 0011 pattern under tree
2? Note that because the tips are not in the OABC order but are in the order OBAC, the
0011 pattern would be read across the tips as 0101. This means that the histories have
different probabilities of yielding the desired pattern. The table shows probabilities of
each history that leads to O and A having state 0 and B and C having state 1 under this
tree.
[Figure: Two example histories mapped on tree 2.]

History (x y z)   Branches without change   Branches with change   Probability
1 (0 0 0)                   4                         2              9.6 × 10^-5
2 (0 0 1)                   3                         3              9.7 × 10^-7
3 (0 1 0)                   3                         3              9.7 × 10^-7
4 (0 1 1)                   4                         2              9.6 × 10^-5
5 (1 0 0)                   2                         4              9.8 × 10^-9
6 (1 0 1)                   1                         5              9.9 × 10^-11
7 (1 1 0)                   3                         3              9.7 × 10^-7
8 (1 1 1)                   4                         2              9.6 × 10^-5
Total                                                                2.9 × 10^-4
Notice that while the 0011 pattern was very unlikely under
tree 1 (ca. 1%), it is even less likely under tree 2 (ca. 0.03%): more than 30-fold less
probable. The same is true for this pattern under tree 3. Thus, observation of traits with a
0011 (or 1100) pattern provides support for tree 1 over trees 2 and 3. In case you are
wondering where the rest of the probability is, nearly all of it is divided among the
parsimony-uninformative patterns (0000, 0001, 0010, 0100, 1000, 1110, 1101, 1011, 0111,
and 1111), with a small share going to the informative patterns that favor trees 2 and 3
(0101, 1010, 0110, and 1001).
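The appendix's sums over histories can be reproduced with a short enumeration. This sketch assumes the same setup as the tables:
- six branches per tree, each with a 1% chance of change;
- internal nodes x (the root), y and z.

Like the tables, it does not weight the root state by a prior probability. Giving 0 and 1 equal prior probability would halve both totals without changing their ratio.

```python
from itertools import product

P_CHANGE = 0.01  # probability of a change on any one branch, as in the appendix

# Rooted trees as parent -> child branches; x = root, y and z = internal ancestors.
TREE_1 = [("x", "O"), ("x", "y"), ("y", "A"), ("y", "z"), ("z", "B"), ("z", "C")]  # B+C sisters
TREE_2 = [("x", "O"), ("x", "y"), ("y", "B"), ("y", "z"), ("z", "A"), ("z", "C")]  # A+C sisters


def pattern_probability(branches, tips, p=P_CHANGE):
    """Sum, over all 8 ancestral histories (x, y, z), of the product of branch probabilities.

    As in the tables, the root state is not weighted by a prior probability.
    """
    total = 0.0
    for x, y, z in product((0, 1), repeat=3):
        states = {"x": x, "y": y, "z": z, **tips}
        history_prob = 1.0
        for parent, child in branches:
            changed = states[parent] != states[child]
            history_prob *= p if changed else 1 - p
        total += history_prob
    return total


pattern_0011 = {"O": 0, "A": 0, "B": 1, "C": 1}
p1 = pattern_probability(TREE_1, pattern_0011)
p2 = pattern_probability(TREE_2, pattern_0011)
print(f"tree 1: {p1:.2e}  tree 2: {p2:.2e}  ratio: {p1 / p2:.1f}")
# tree 1: 9.80e-03  tree 2: 2.91e-04  ratio: 33.7
```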
The conclusion that a trait with an informative pattern is more likely to arise on the tree
that it supports by parsimony holds as long as the probability of change on each branch is
relatively low and similar among branches (Felsenstein 1978). More generally, as long
as changes are rare, histories that invoke fewer changes have a much higher probability
of giving rise to the observed data than histories that require many changes. Characters
therefore lend statistical support to trees on which they undergo few changes in
character state, and our back-of-the-envelope calculations explain why the parsimony
criterion is justified for identifying the “best” trees.