Phylogenetic Inference with Parsimony

There are many methods that may be used for estimating trees. In this chapter we focus on the approach called maximum parsimony (or just parsimony). Maximum parsimony provides the obvious starting point for any discussion of phylogenetic inference. Of all methods of phylogenetic analysis, parsimony is probably the most intuitive. Also, parsimony has played a special role in the history of phylogenetic systematics: it was the development of the parsimony criterion, and of programs implementing it, that spurred the phylogenetic revolution that began in the 1980s. Finally, even when more statistically sophisticated approaches are used to estimate trees, parsimony still provides a useful way to make inferences about the evolution of traits along those trees.

The bulk of the chapter will describe the parsimony method, which makes assumptions that are generally valid and thus provides a simple way to estimate phylogenies from many different kinds of data. To provide a context for the discussion of parsimony we will work through a case study: an analysis of the phylogeny of the Carnivora clade of mammals.

A biological example: Carnivora

We will now work through the process of using biological data to infer phylogenetic relationships among a set of species, using Carnivora as our group of interest. Carnivora is a group of mostly meat-eating mammals, including, for example, dogs, cats, bears, weasels, mongooses, skunks, and seals. While these animals differ greatly in their external appearance and their ecology, they share several skeletal features. For example, almost all species have enlarged cheek teeth, the carnassials, which are used for shearing meat. Based on this and other traits, it has long been accepted that Carnivora is a monophyletic group, a clade. But what are the relationships within Carnivora?
Before embarking on a study of carnivore phylogeny, we need to decide which species of carnivores to select for our study and what traits to score. With around 250 species, we cannot possibly examine all of the carnivores and, depending on the questions we wish to answer, we do not need to. Let us say that we want to find out whether the carnivores can be divided into two natural (i.e., monophyletic) groups: the aquatic pinnipeds (seals, sealions, and walruses) and the terrestrial fissipeds (all others). To answer this question, sampling a single representative species from each of the major families (dogs, cats, seals, sealions, etc.) will suffice.

©Baum and Smith 1/22/09. Draft. Do not circulate. Page 1

[Figure: bobcat and Mexican grey wolf skulls (from Skulls Unlimited).]

Now that we have chosen which taxa to include, we must decide which traits, or characters, to use as evidence for inferring the phylogeny. Any trait that varies among the tips and is thought to show some degree of heritability (ancestors with the trait tend to have descendants with the trait) has the potential to provide phylogenetic information. Until recently, phylogenetic analyses were largely based on morphological traits. With the advent of modern molecular methods, almost all phylogenetic studies now employ DNA sequence data, in which the individual nucleotide positions along the DNA strand are the characters. However, it will be instructive to begin with morphological characters, which we will extract from a published study (Wyss and Flynn 1993).

There is one final consideration before we can begin to collect data: we must choose one or more outgroup taxa. Outgroups serve as a point of comparison with our ingroup (here the carnivores) and allow us to determine the direction of character change (discussed in more detail later). Any taxon that is not a member of Carnivora could, in theory, serve as an outgroup.
However, the best outgroups are reasonably closely related to the ingroup, so that they can easily be compared for many traits. For our analysis of morphological data we will use the creodonts, an extinct group of mammals that appears to be closely related to Carnivora (perhaps its sister group). We have good reason to think creodonts fall outside Carnivora (and thus can serve as an outgroup) because they lack the bony auditory bulla found in Carnivora. We could instead have used living placental mammals from another group, although we might avoid clades that have undergone extensive phenotypic evolution, such as bats, for which it might be hard to identify homologous traits.

Now we can proceed to score the ingroup and outgroup taxa for the selected morphological traits. Each species is scored for each trait by observing one or a few individuals of that species and recording the form of the trait, its character state. The 11 characters and the states for each character are given in the table below.

Table X: Characters and character states from Wyss and Flynn’s study of carnivore phylogeny. For each character, the state coded as ‘0’ is listed first.

1. Complexity of the turbinal (nasal) bones: simple; complex
2. Relative size of auditory bulla: small; large
3. Number of lower incisors: 2; 3
4. Upper molar 1: present; absent
5. Baculum (bone within the penis): present; absent
6. Tail: elongated; short
7. Hallux (5th digit, or dew claw, on forelimb): present; absent
8. Prostate gland size and shape: small and simple; large and bilobed
9. Kidney structure: simple; conglomerate
10. External ear (pinna): present; absent
11. Testis position: scrotal; abdominal

Moving through the tips, we can record the state of each character for each species to build a character-state matrix. For example, we observe that, for character 6 (the tail), creodonts have the “elongated” state.
We have chosen to represent this state with a 0, so creodonts are given a score of 0 for character 6 in the matrix. (We have assigned ‘0’ to all the character states present in the outgroup, but choosing different labels for the states would not affect the analysis.) The complete character-state matrix of 11 characters for 10 ingroup taxa and 1 outgroup taxon is given below. Notice that some taxa may be scored as unknown for certain characters (conventionally represented with ‘?’), either because we do not know the proper scoring (e.g., soft tissues in a fossil) or because the character cannot be scored (e.g., toe number in snakes). A taxon can also be scored as polymorphic by listing multiple states within a cell.

Character-state matrix (rows: characters 1–11; columns, left to right: creodonts, cats, hyaenas, civets, dogs, raccoons, bears, otters, seals, walrus, sealions)

1    0 0 0 0 1 1 1 1 1 1 1
2    0 1 1 1 0 0 0 0 0 0 0
3    0 0 0 0 0 0 0 0 1 1 1
4    0 1 1 0 0 0 0 0 0 0 0
5    0 0 0 0 1 1 1 1 1 1 1
6    0 0 0 0 0 0 1 0 1 1 1
7    0 1 1 0 0 0 0 0 0 0 0
8    ? 1 1 1 0 0 0 0 0 0 0
9    ? 0 0 0 0 0 1 1 1 1 1
10   ? 0 0 0 0 0 0 0 1 1 0
11   ? 0 0 0 0 0 0 0 1 1 1

As you might imagine, it can be difficult to find a large number of morphological traits showing appropriate levels of variation for building a phylogeny. In comparison, it has become quite easy to obtain large amounts of DNA sequence data and to find regions that are suitably variable for phylogenetics. Table X shows what such data might look like for the carnivores. Whereas the morphological character states were coded as 0s and 1s, the states of a DNA character are the four bases (A, C, G, T). You may also observe gaps (marked with a hyphen) in a DNA matrix. These gaps arise when bases are inserted into or deleted from a species’ sequence in the course of evolution.
The process of sequence alignment is concerned with establishing the correct position of gaps so that homologous sequence positions are aligned above one another in the data matrix. The methods and theory of sequence alignment are rather technical, but in many cases DNA sequences can be aligned by hand.

Table X. The states for 15 consecutive positions in the transthyretin 2 gene. Here the mole is used as the outgroup, and a weasel rather than an otter represents the Mustelidae.

Taxa (left to right): mole, cat, hyaena, civet, dog, raccoon, bear, weasel, seal, walrus, sealion

Position  States
1         G G G G G G C G G G G
2         T T T T T T T T T T T
3         T T T T T T T T A A A
4         A G G G A A A A A A A
5         A A A A A A A A A A A
6         G G G G G G G C C C C
7         C G T G C C C T C C C
8         A G G G G G G T T T T
9         T T T T T T T C C C C
10        C C C C C C C T T T T
11        T T T T T T T C T C C
12        G G G G G G G A A A A
13–15     C C C C C C C C C C C C C C C C C C T T T T T T T T T T T

The maximum parsimony criterion

Acknowledging that traits do not always evolve in a perfectly Hennigian fashion, one logical way to proceed is to allow that some rule breaking occurred, but to assume that most of the time the rules were followed. Thus, we favor trees that minimize the number of times that rules were broken: an application of the principle of parsimony. The maximum parsimony criterion holds that the best tree is the one that can explain all the observed data by invoking the fewest character-state changes. Phylogenetic inference using parsimony involves determining a score for each tree and then selecting the tree with the best score. Three steps are involved in estimating a tree using simple parsimony:

1) For a single tree, we consider each character in turn and determine the minimum number of state changes, or steps, required to account for the distribution of states among the tips (see Chap. 3).
2) We sum the number of steps required by each character on this tree.
The total number of steps required to explain all the data on a single tree is called the tree length.
3) We repeat the preceding steps for all alternative trees and then identify the tree with the lowest tree length, the shortest or most-parsimonious tree.

The carnivore data matrix is too big to analyze by hand, so we will illustrate the parsimony method using a simple data matrix for four taxa. If we assume that taxon O (the outgroup) is sister to the remaining taxa, i.e., that taxa A–C form a clade, then three trees are possible for these data.

Character data
Char.  O  A  B  C
1      0  0  1  0
2      0  1  1  0
3      1  1  1  0
4      0  0  1  1
5      1  1  0  1
6      1  0  0  1
7      0  1  1  0
8      0  0  1  0

[Figure: the three possible trees. Tree 1 groups B with C, tree 2 groups A with C, and tree 3 groups A with B; in each, O is sister to the ingroup.]

We can start by considering tree 1 and seeing how we can explain each character in turn. If tree 1 were true, the simplest way to explain the first character is that all lineages had state 0, but there was a single change to state 1 somewhere on the lineage leading to taxon B. Thus, we can explain this character with a single step, so character 1 has length 1 on tree 1. Character 2 is more difficult to map onto tree 1. There is no way to explain the distribution of states among the tips with only one change, but there are three ways to do it with two changes. These three equally parsimonious reconstructions are shown in the figure. Branches assigned to state 0 are shown in grey and those assigned to state 1 are shown in black. The first scenario entails two independent transitions to state 1 (from state 0), the second entails two independent transitions to state 0 (from state 1), and the third entails one change to state 1 followed by one reversal back to state 0. From the point of view of phylogenetic analysis we do not have to worry about which of these reconstructions is correct; all that matters is that it takes a minimum of two changes to map character 2 onto tree 1. Hence, character 2 has length 2 on tree 1.
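Step 1 can be carried out mechanically. The following is a minimal Python sketch using Fitch's algorithm, a standard way of computing the minimum number of changes for one character on a binary tree, which this chapter does not itself describe. The nested-tuple tree encoding and the function name `fitch_length` are our own illustrative choices. It should reproduce the lengths just derived by hand for characters 1 and 2 on tree 1.

```python
def fitch_length(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    `tree` is a nested tuple of tip names, e.g. ("O", ("A", ("B", "C"))).
    `states` maps each tip name to a set of possible states; a tip scored
    as unknown ('?') can be given the set of all states.
    """
    def visit(node):
        # A tip contributes its observed state set and no changes.
        if isinstance(node, str):
            return frozenset(states[node]), 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:
            # The two subtrees can agree on a state: no extra change needed.
            return shared, left_cost + right_cost
        # They cannot agree: one change is needed on one of the two branches.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


TREE_1 = ("O", ("A", ("B", "C")))  # B and C are sisters

character_1 = {"O": {"0"}, "A": {"0"}, "B": {"1"}, "C": {"0"}}
character_2 = {"O": {"0"}, "A": {"1"}, "B": {"1"}, "C": {"0"}}

print(fitch_length(TREE_1, character_1))
print(fitch_length(TREE_1, character_2))
```

Because each tip is a set of states, a '?' cell can simply be passed as {"0", "1"}; the algorithm then chooses whichever state minimizes the count.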
[Figure: the three equally parsimonious reconstructions of character 2 (O = 0, A = 1, B = 1, C = 0) on tree 1.]

Using the same approach, we can now map all eight characters onto tree 1. Characters 1, 3, 4, 5, and 8 each have only one most-parsimonious reconstruction, whereas characters 2, 6, and 7 each have multiple equally parsimonious reconstructions. The figure shows one most-parsimonious mapping for each character. In total, 11 steps are required to explain all the data; that is to say, tree 1 has a length of 11. We can now apply the same procedure to the other trees. For these data, tree 2 has a length of 12, whereas tree 3 has a length of 9 (as shown). This tells us that tree 3 is the most-parsimonious tree and is the one that would be preferred under the maximum parsimony criterion.

[Figure: one most-parsimonious mapping of the eight characters on tree 1 and on tree 3. Grey bars: change from 0 to 1; black bars: change from 1 to 0.]

The length of each tree is the sum of the number of steps required to explain each character on that tree. As the table shows, the tree length corresponds to the sum of the lengths of the eight characters.

Char.  O  A  B  C   L on tree 1   L on tree 2   L on tree 3
1      0  0  1  0   1             1             1
2      0  1  1  0   2             2             1
3      1  1  1  0   1             1             1
4      0  0  1  1   1             2             2
5      1  1  0  1   1             1             1
6      1  0  0  1   2             2             1
7      0  1  1  0   2             2             1
8      0  0  1  0   1             1             1
Total length        11            12            9 (most-parsimonious tree)

It is worth highlighting that although the optimal (most-parsimonious) tree has the shortest length overall, it is not optimal for all characters. Character 4 has a length of two on the optimal tree, but a length of one on tree 1. Character 4 can thus be said to support tree 1 over tree 3. However, the weight of evidence still favors tree 3 over tree 1. Changes beyond the minimum number that a character could require on any tree are instances of homoplasy.
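Extending the same idea to all eight characters and all three trees carries out the three steps of the parsimony criterion in one go. This is a minimal sketch that again uses Fitch's algorithm; the nested-tuple encodings of trees 2 and 3 (A sister to C, and A sister to B) are our reading of the figure, chosen because they reproduce the lengths in the table. The code also lists the characters whose length differs among the trees.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one character (Fitch's algorithm)."""
    def visit(node):
        if isinstance(node, str):
            return frozenset(states[node]), 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        if ls & rs:
            return ls & rs, lc + rc
        return ls | rs, lc + rc + 1
    return visit(tree)[1]


TAXA = ("O", "A", "B", "C")
# Each string gives the states of O, A, B, C for one character (characters 1-8).
CHARACTERS = ["0010", "0110", "1110", "0011", "1101", "1001", "0110", "0010"]
TREES = {
    "tree 1": ("O", ("A", ("B", "C"))),
    "tree 2": ("O", ("B", ("A", "C"))),
    "tree 3": ("O", ("C", ("A", "B"))),
}

# Steps 1 and 2: the length of every character on every tree, then the sums.
lengths = {
    name: [fitch_length(tree, {t: {s} for t, s in zip(TAXA, pattern)})
           for pattern in CHARACTERS]
    for name, tree in TREES.items()
}
totals = {name: sum(per_char) for name, per_char in lengths.items()}

# Step 3: the shortest (most-parsimonious) tree.
best = min(totals, key=totals.get)

# Characters whose length differs among the trees (numbered from 1).
varying = [i + 1 for i in range(len(CHARACTERS))
           if len({lengths[name][i] for name in TREES}) > 1]

for name in TREES:
    print(name, lengths[name], "total =", totals[name])
print("most parsimonious:", best)
print("characters whose length varies:", varying)
```

Exhaustively scoring every tree like this is only practical for a handful of taxa, since the number of possible trees grows extremely rapidly as taxa are added.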
Although parsimony does not assume that all characters evolve parsimoniously, it favors the tree that invokes the least amount of homoplasy summed over all characters.

A second observation can be drawn from this table. Only characters 2, 4, 6, and 7 vary in length among trees, and only these characters are parsimony informative. Characters 1, 3, 5, and 8, in contrast, are parsimony uninformative: because they have the same length on each of the three trees, they do not help us choose among trees. Thinking back to the Clade Race, these characters are like the heart and star stamps that only one runner picked up. Any derived trait that is found in only one taxon provides no evidence for resolving the tree.

Character-state matrix (rows: characters 1–19; columns, left to right: outgroup, cats, hyaenas, civets, dogs, raccoons, bears, otters, seals, walrus, sealions)

1    0 0 0 0 1 1 1 1 1 1 1
2    0 1 1 1 0 0 0 0 0 0 0
3    0 0 0 0 0 0 0 0 1 1 1
4    0 1 1 0 0 0 0 0 0 0 0
5    0 0 0 0 1 1 1 1 1 1 1
6    0 0 0 0 0 0 1 0 1 1 1
7    0 1 1 0 0 0 0 0 0 0 0
8    0 1 1 1 0 0 0 0 0 0 0
9    0 0 0 0 0 0 1 1 1 1 1
10   0 0 0 0 0 0 0 0 1 1 0
11   0 0 0 0 0 0 0 0 1 1 1
12   0 1 1 0 0 0 0 1 0 0 0
13   0 1 1 0 0 0 0 0 1 0 1
14   0 1 1 1 0 1 0 1 1 1 1
15   0 0 1 1 0 0 0 1 1 1 1
16   0 0 0 0 0 1 1 1 ? 0 0
17   0 0 0 0 0 0 1 0 0 1 1
18   0 0 0 0 0 0 0 0 1 1 1
19   0 0 0 0 0 0 0 0 1 1 1

Now we can apply parsimony to the carnivore dataset. The data matrix shown above includes the 11 characters listed earlier plus a further eight characters from Wyss and Flynn (1993). Two trees are equally parsimonious for these data, both requiring 29 character-state changes to explain all 19 morphological characters. The two trees differ only in the relationships within the pinnipeds: seals are sister either to walruses or to sealions. The information common to both trees can be shown in a consensus tree (technically a strict consensus tree): a tree that contains only those clades present in both equally parsimonious trees, as shown on the right.
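Because a strict consensus keeps exactly the clades shared by all the input trees, it can be computed as a set intersection. The sketch below assumes trees written as nested tuples. `TREE_A` and `TREE_B` are hypothetical, heavily simplified trees that, like the two most-parsimonious carnivore trees, differ only in which pinniped is sister to the seals; they are not the trees from the analysis.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of tip names."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def strict_consensus(*trees):
    """Clades present in every input tree (the strict consensus)."""
    common = clades(trees[0])
    for tree in trees[1:]:
        common &= clades(tree)
    return common


# Hypothetical, simplified trees: seals+walrus in one, seals+sealions in the other.
TREE_A = ("outgroup", (("cats", "hyaenas"), ("bears", ("sealions", ("seals", "walrus")))))
TREE_B = ("outgroup", (("cats", "hyaenas"), ("bears", ("walrus", ("seals", "sealions")))))

for clade in sorted(strict_consensus(TREE_A, TREE_B), key=len):
    print(sorted(clade))
```

Neither the seals+walrus clade nor the seals+sealions clade survives the intersection, so in the consensus the three pinnipeds form an unresolved group, which is the same kind of result described above.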
[Figure: the two most-parsimonious trees for the morphological data (both of length 29), which differ only in whether seals are sister to walruses or to sealions, and the strict consensus of the two trees.]

How does this compare with the tree inferred from DNA sequence data for the same organisms? For most of the living carnivores included here, sequences are available for a 1116 base-pair region of the transthyretin 2 gene (Nedbal and Flynn, 199x). It is necessary to use a weasel rather than an otter to represent the family Mustelidae. It is also necessary to use a living mammal, in this case a mole, rather than the fossil creodonts as an outgroup. But with these substitutions we can conduct exactly the same kind of analysis. Parsimony analysis of these molecular data yields a single tree of length 812 steps, shown in fig. x.

[Figure: the most-parsimonious tree (812 steps) for the transthyretin sequence data from mole, cat, hyaena, civet, dog, raccoon, bear, weasel, seal, walrus, and sealion.]

This tree has many similarities to the trees obtained from morphological data. Both include a monophyletic pinniped group (seals, walruses, sealions) and a division of the living carnivores into two major subclades: feliforms (cats, hyaenas, civets) and caniforms (dogs, raccoons, bears, otters/weasels, and pinnipeds), with dogs sister to the remaining caniforms. This agreement should not be taken for granted. If these data were not the result of evolution along a tree, the probability that the two trees we found would be this similar is 10^-xx. The fact that both kinds of data yield such similar trees provides concrete evidence in support of the claim that these taxa are the result of descent from common ancestry. As discussed by Penny et al. (1982), agreement among independent phylogenetic data sets serves to reject the hypothesis of separate ancestry.
However, while these two trees are remarkably similar, they also have some differences. One difference is the sister group of the pinnipeds: the bears for the morphological data (based in part on the fact that both groups have short tails), but a clade of raccoons plus otters/weasels for the DNA sequence data. If we assume that there is one true treelike history of carnivorans along which both molecular and morphological data have evolved, then these differences are best understood as being due to imperfect phylogenetic inference: one or the other (or both) of the trees is presumably incorrect in some details. While some conclusions, such as the fact that pinnipeds are embedded within the caniforms, are well demonstrated by these analyses, other results must remain uncertain pending the collection of more data.

Parsimony versus compatibility

Early in the development of systematics, when it was realized that pure Hennigian reasoning did not work, another criterion, compatibility, was proposed as an alternative to parsimony. While compatibility is not widely used, it is worth knowing about the concept and how it differs from parsimony. A character is said to be compatible with a tree if it can be explained on that tree without invoking any homoplasy. Parsimony-uninformative characters are compatible with all trees. Parsimony-informative characters are compatible only with some trees. A character with two states is compatible only with trees that can explain the data with one change of state. A character with three states is compatible only with trees that can explain the data with two changes of state. More generally, a character is compatible with a tree if the number of state changes needed on that tree is one less than the number of states. Here is a tree and some examples of characters that are and are not compatible.
[Figure: a tree for taxa A–F, shown with example characters that are compatible and incompatible with it, and the length of each character on the tree.]

The best tree according to the compatibility criterion is the tree with which the greatest number of characters are compatible. Compatibility is similar, but not identical, to parsimony: parsimony selects the tree on which there are the fewest homoplastic changes, whereas compatibility selects the tree on which there are the fewest characters with any homoplasy.

While the most-compatible tree is often also the most-parsimonious tree, it need not be. Whereas compatibility treats all incompatible characters as equivalent, parsimony takes account of the number of steps associated with both compatible and incompatible characters. This is shown by the following hypothetical example, in which tree 1 is favored by parsimony and tree 2 by compatibility. This is because the last two characters, which are not compatible with either tree, can nonetheless be explained on tree 1 with fewer steps than on tree 2.

[Figure: trees 1 and 2 for taxa O and A–E.]

Char.  O  A  B  C  D  E   L on tree 1   L on tree 2
1      A  C  C  C  C  C   1             1
2      C  C  T  T  T  T   1             1
3      G  G  T  T  T  T   1             1
4      G  G  G  A  A  A   2             1
5      A  A  A  A  T  T   1             1
6      T  T  T  T  C  C   1             1
7      A  T  T  A  A  T   2             3
8      G  A  A  G  A  G   2             3
Compatible characters     5             6
Length                    11            12

Compatibility is not an unreasonable method for estimating phylogenies, especially when we believe that there is a subset of characters whose evolution perfectly reflects the phylogeny, while other characters change state too often to be useful. However, if you believe that all characters, even ones that have experienced some homoplasy, contain information, then parsimony is a better criterion. For this and other reasons, compatibility is rarely, if ever, used in modern phylogenetics and will not be discussed further.
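The compatibility rule stated in this section (a character is compatible with a tree when it needs exactly one change fewer than its number of states) is easy to express in code. This minimal sketch reuses a Fitch-style length calculation and, as an assumption, handles only characters without missing or polymorphic cells. The function names `is_compatible` and `score` are illustrative. Applied to the eight characters of the four-taxon example from the parsimony section, it shows that the two criteria happen to agree there: tree 3 is both the shortest tree and the one with the most compatible characters.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one character on a nested-tuple tree (Fitch)."""
    def visit(node):
        if isinstance(node, str):
            return frozenset(states[node]), 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        if ls & rs:
            return ls & rs, lc + rc
        return ls | rs, lc + rc + 1
    return visit(tree)[1]


def is_compatible(tree, states):
    """True if the character needs exactly (number of observed states - 1) changes.

    Assumes every tip has a single known state (no '?' or polymorphism).
    """
    n_states = len(set().union(*states.values()))
    return fitch_length(tree, states) == n_states - 1


TAXA = ("O", "A", "B", "C")
CHARACTERS = ["0010", "0110", "1110", "0011", "1101", "1001", "0110", "0010"]
TREES = {
    "tree 1": ("O", ("A", ("B", "C"))),
    "tree 2": ("O", ("B", ("A", "C"))),
    "tree 3": ("O", ("C", ("A", "B"))),
}


def score(tree):
    """Return (number of compatible characters, parsimony length) for a tree."""
    chars = [{t: {s} for t, s in zip(TAXA, pattern)} for pattern in CHARACTERS]
    n_compatible = sum(is_compatible(tree, c) for c in chars)
    length = sum(fitch_length(tree, c) for c in chars)
    return n_compatible, length


for name, tree in TREES.items():
    n_compatible, length = score(tree)
    print(f"{name}: {n_compatible} compatible characters, length {length}")
```

The two scores differ in what they count: compatibility adds up characters (each either fits or does not), while length adds up steps, which is why they can disagree on other data sets.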
A justification of parsimony

The aim of phylogenetics is to choose the tree that is most likely to be true given all the empirical data. Why would we expect more parsimonious (shorter) trees to be more likely to be true than less parsimonious (longer) trees? Why would the discovery of a set of data for which tree A is shorter than tree B tend to increase our belief that tree A is true? Here I will present a verbal argument; a simplified quantitative analysis is provided in Appendix 1.

Let’s start by assuming that the true evolutionary tree is such that the rate of character evolution is low and relatively even across the tree. When the rate of character evolution is low, the chance of a character changing state on any one branch is low, but the chance of changes happening on two branches is much lower, because it is approximately the square of that already low probability (both changes must occur, so the probabilities are multiplied). So most characters will have experienced few changes from the ancestral state (at the root of the tree), meaning that homoplasy will be low. Thus, the data will generally be explicable by invoking few changes on the tree that matches the true tree, but will require more changes on other trees. Therefore, when rates of evolution are reasonably low, the tree that can explain all the characters with the fewest changes will tend to resemble the true tree.

What if the rate of evolution is high for all characters? If all characters in the matrix tend to evolve very rapidly, the data will contain little consistent signal favoring one tree over another. Once this lack of phylogenetic signal has been detected (by methods that will be described later in this chapter), it will become clear that the data are not suited to parsimony analysis.

What if the rate of evolution is low for some characters but very high for others?
In this case the rapidly evolving characters will tend to yield a noisy pattern that will not strongly favor any one tree. However, provided there are enough slowly evolving traits, they will tend to lay their collective support behind a single tree, and that tree should resemble the true tree.

Taken together, parsimony provides a reasonable method for inferring phylogenetic trees: the probability that a tree is true will generally correlate, at least loosely, with tree length, shorter trees being more probable. Further, tree length provides a crude measure of how much better one tree is than another. Thus, if tree A is one step shorter than tree B but 15 steps shorter than tree C, we can say that the data reject tree C more strongly than they reject tree B. However, the absolute length of a tree and the magnitude of the length difference between two trees depend on the particular matrix of characters used. Without other analyses (such as those presented later in the chapter), we cannot assert that one tree is “significantly” better than another. Although parsimony gives us valuable insights into the trees implied by our data, it falls far short of being a rigorous statistical method.

Appendix 1

A quantitative demonstration that the most-parsimonious tree will tend to be the tree that is most likely to have generated the data.

Name one character state ‘0’ and its alternative state ‘1’. For example, state ‘0’ could be locomotion = quadrupedal whereas state ‘1’ could be locomotion = bipedal; or state ‘0’ could be position 119 in a haemoglobin gene = adenine and state ‘1’ could be position 119 in that gene = cytosine. Consider four taxa: an outgroup, O, and three ingroup taxa, A–C. To be parsimony informative, a character must have state 0 in two taxa and state 1 in the other two. Suppose B and C have state 1 and taxa O and A have state 0. If we order the taxa OABC, this character has the pattern 0011.
The most-parsimonious tree for this single character is one that can explain the states with only a single change of character state, namely a tree with B and C as sister taxa: tree 1 among the three possible trees. So the method of parsimony assumes that, if we start by considering all three trees to be equally likely to be true, the observation of a character with a 0011 pattern is sufficient to make tree 1 the most plausible of the three trees. To understand the validity of parsimony we need to understand why a 0011 pattern supports tree 1 among these three trees.

[Figure: the three possible trees for O, A, B, and C. Tree 1 groups B with C, tree 2 groups A with C, and tree 3 groups A with B.]

The answer is that, under certain assumptions, tree 1 is more likely to generate the data pattern 0011 than are the other trees. It is a general principle of statistics that, when the competing hypotheses are considered equally probable before the data are seen, the hypothesis that is most probable afterwards is the one that, if it were true, would be most likely to generate the observed data. But how can we calculate the probability of the data being generated under one of these trees? It will be helpful to work through this numerically, using trees 1 and 2 (we will ignore tree 3). To make it easy, let’s assume that each branch on the two trees is the same length, where length is here measured as the probability of a character changing state (in either direction). Suppose that the probability of a change occurring on each branch is 1%. This is low enough that we can assume that no more than one change will occur per branch.

[Figure: tree 1, with the root labelled x, the common ancestor of A, B, and C labelled y, and the common ancestor of B and C labelled z.]

Assuming that nodes x, y, and z represent actual ancestors that had either state 0 or state 1 (not both), there are eight possible combinations of ancestral states, or histories, as shown in the table.

History  x  y  z
1        0  0  0
2        0  0  1
3        0  1  0
4        0  1  1
5        1  0  0
6        1  0  1
7        1  1  0
8        1  1  1

[Figure: histories 1 and 2 on tree 1.]

We can calculate the probability of history 1 yielding a 0011 pattern by taking the product of the probabilities for each branch: 0.99 for each branch on which no change happened and 0.01 for each branch on which a change happened. In this case there are four branches with no change and two with a change. So the overall probability of a 0011 pattern is 0.99^4 × 0.01^2 ≈ 9.6 × 10^-5, or about 1 in 10,000.

Now let’s do the same thing for history 2. In this case, just one character-state change is required. This means that the chance of this outcome is 0.99^5 × 0.01 ≈ 9.5 × 10^-3, which is nearly 1 in 100. The table shows the probability of a 0011 pattern for all eight histories.

History (xyz)   Unchanged branches   Changed branches   Probability
1 (000)         4                    2                  9.6 × 10^-5
2 (001)         5                    1                  9.5 × 10^-3
3 (010)         1                    5                  9.9 × 10^-11
4 (011)         4                    2                  9.6 × 10^-5
5 (100)         2                    4                  9.8 × 10^-9
6 (101)         3                    3                  9.7 × 10^-7
7 (110)         1                    5                  9.9 × 10^-11
8 (111)         4                    2                  9.6 × 10^-5
Total probability: 9.8 × 10^-3

Summing across all eight histories, we can say that, given this tree, the probability of a 0011 pattern is about 0.0098, or a shade less than 1%. You can probably also see that the probability of a 1100 pattern will be the same. Thus, there is an approximately 2% chance that evolution on tree 1 will produce a parsimony-informative character supporting this tree.

You might consider 2% to be a low number, so why would we still say that these data favor tree 1 over tree 2? To see the answer we also need to calculate the probability of a 0011 and a 1100 pattern under tree 2. What is the probability of a 0011 pattern under tree 2? Note that because the tips are not in the order OABC but in the order OBAC, the 0011 pattern would be read across the tips as 0101. This means that the histories have different probabilities of yielding the desired pattern.
The table shows the probability of each history leading to O and A having state 0 and B and C having state 1 under this tree.

[Figure: histories 1 and 2 on tree 2.]

History (xyz)   Unchanged branches   Changed branches   Probability
1 (000)         4                    2                  9.6 × 10^-5
2 (001)         3                    3                  9.7 × 10^-7
3 (010)         3                    3                  9.7 × 10^-7
4 (011)         4                    2                  9.6 × 10^-5
5 (100)         2                    4                  9.8 × 10^-9
6 (101)         1                    5                  9.9 × 10^-11
7 (110)         3                    3                  9.7 × 10^-7
8 (111)         4                    2                  9.6 × 10^-5
Total probability: 2.9 × 10^-4

Notice that while the 0011 pattern was quite unlikely under tree 1 (ca. 1%), it is even less likely under tree 2 (0.03%): more than 30-fold less probable. The same is true for this pattern under tree 3. Thus, observation of traits with a 0011 (or 1100) pattern provides support for tree 1 over trees 2 and 3. In case you are wondering where the rest of the probability is, nearly all of it is divided among the noninformative data patterns (0000, 0001, 0010, 0100, 1000, 1110, 1101, 1011, 0111, and 1111).

The conclusion that a trait with an informative pattern is more likely to arise on the tree that it supports by parsimony holds so long as the probability of change on each branch is relatively low and similar among branches (Felsenstein 1978). More generally, so long as changes are rare, histories that invoke fewer changes have a much higher probability of giving rise to the observed data than histories that require many changes. Thus, characters lend statistical support to trees on which they undergo few changes in character state. Our back-of-the-envelope calculations therefore explain why the parsimony criterion is justified for identifying the “best” trees.
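The calculations in this appendix can be checked by brute-force enumeration of the eight histories. The sketch below follows the appendix's conventions: every branch has the same change probability (1%), and the eight histories are summed without weighting the root state. The branch lists, including the assumed encoding of tree 3 (A and B sisters), are our own reading of the figures, and `pattern_probability` is a hypothetical helper name. It should give roughly 9.8 × 10^-3 for tree 1 and 2.9 × 10^-4 for trees 2 and 3.

```python
from itertools import product

P_CHANGE = 0.01  # probability of a change on any single branch, as in the text

# Branches as (ancestor, descendant) pairs. x is the root, y the ancestor of
# the three ingroup taxa, and z the ancestor of the remaining sister pair.
TREE_1 = [("x", "O"), ("x", "y"), ("y", "A"), ("y", "z"), ("z", "B"), ("z", "C")]
TREE_2 = [("x", "O"), ("x", "y"), ("y", "B"), ("y", "z"), ("z", "A"), ("z", "C")]
TREE_3 = [("x", "O"), ("x", "y"), ("y", "C"), ("y", "z"), ("z", "A"), ("z", "B")]


def pattern_probability(branches, tip_states, p=P_CHANGE):
    """Sum, over the eight histories (x, y, z), of the probability of a tip pattern.

    Each branch contributes p if its two ends differ and 1 - p otherwise.
    As in the appendix, the root state is not weighted.
    """
    total = 0.0
    for x, y, z in product((0, 1), repeat=3):
        state = {"x": x, "y": y, "z": z, **tip_states}
        prob = 1.0
        for ancestor, descendant in branches:
            prob *= p if state[ancestor] != state[descendant] else 1 - p
        total += prob
    return total


PATTERN_0011 = {"O": 0, "A": 0, "B": 1, "C": 1}

p1 = pattern_probability(TREE_1, PATTERN_0011)
p2 = pattern_probability(TREE_2, PATTERN_0011)
p3 = pattern_probability(TREE_3, PATTERN_0011)
print(f"tree 1: {p1:.2e}  tree 2: {p2:.2e}  tree 3: {p3:.2e}  ratio: {p1 / p2:.1f}")
```

Raising `P_CHANGE` toward 0.5 shrinks the ratio between the trees, which is a quick way to see why the argument depends on change being rare.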