Syst. Biol. 46(4):765-769, 1997 Prior Agreement: Arbitration or Arbitrary? MARK E. SIDDALL Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109, USA; Email: [email protected] Contemporary systematists are deeply divided on the issue of combining or partitioning data sets in phylogenetic analyses (Mickevich and Farris, 1981; Miyamoto, 1985; Bull et alv 1993; Chippindale and Wiens, 1994; Huelsenbeck et al v 1994; de Queiroz et al., 1995; Allard and Carpenter, 1996; Ballard, 1996; Huelsenbeck et al., 1996a, 1996b). Intermediate between these positions is the prior agreement (Chippindale and Wiens, 1994) approach, in which one examines properties of the data a priori with the aim of making some determination of data combinability (Bull et al., 1993; Huelsenbeck et al., 1994; Huelsenbeck et al., 1996a, 1996b), i.e., sometimes combine, sometimes do not combine. Motivations for adopting one framework or the other stem from philosophical positions that are as deeply held as are the points of view on combining or compartmentalizing (Kluge, 1989; Kluge and Wolf, 1993; Miyamoto and Fitch, 1995). For example, if one is to accept maximum likelihood as an appropriate methodology for estimating phylogenetic trees it is not clear to me what form of reasoning could justify the combined analysis of morphological and molecular data together under an HKY85 model. Thus, if the "HKY85+r 5 model [is] the best fitting model" (Huelsenbeck, 1997:72) for nucleotide changes, it cannot also be a realistic probabilistic model for the evolution of a femur. Nor, however, should one cast all morphological data to oblivion (although some do take this extreme position; Hedges and Maxson, 1996). Similarly, if one is to take parsimony as the appropriate methodology for inferring phylogenetic trees and accept falsification and explanatory power (Farris, 1983, 1986; Mickevich and Platnick, 1989; Frost and Kluge, 1994) as its logical underpinnings, keeping data sets separate is anathema to explaining all of the data. Although Huelsenbeck et al. (1994:288) suggested that prior agreement represents a "philosophical framework for the treatment of potentially diverse data," they did not indicate what their philosophical underpinnings were. To the contrary, prior agreement is merely a methodological framework and is arbitrary in its application. Although I disagree with Miyamoto and Fitch's (1995) compartmentalist position, clearly they recognized the full implications of adopting a process partition foundation in their approach. That is, if the notion of process partitions is driving the decision to keep data sets separate, whether or not these partitions are incongruent is immaterial. Huelsenbeck et al. (1994; see also Huelsenbeck et al., 1996a) posited that only if data sets can be shown to be "significantly incongruent" should they be kept separate (the statement that data sets should be kept separate unless found not to be significantly incongruent differs in intent but is equivalent in implication). The argument offered by Bull et al. (1993:385) was that data sets "should not automatically be assumed to be homogeneous." The methodological statement then reduces to the dangers of assuming homogeneity of data in an analysis. However, a necessary implication of analyzing the data sets independently thereafter is that the individual data sets are themselves assumed to be internally homogeneous, contrary to Bull et al/s (1993) own admonishment that this is a dangerous assumption. Consider six taxa (I-VI) and two data sets (A, B) each with four character types (Table 1). Data set A has h characters supporting (II, III, IV, V, VI), i characters supporting (II, III, IV), / characters supporting (II, III), and k characters supporting (V, VI). 765 766 VOL. 4 6 SYSTEMATIC BIOLOGY TABLE 1. Incongruent data sets A and B. Bold values are those for the incongruent character in data set B. Taxon I II III IV V VI Na 0... 1... \... 1... 1... 1... h 0... \ ... \ ... 1 ... 0... 0... i 0 . .. 1 . .. 1 ... 0... 0... 0... i 0... 0... 0... 0... 1... 1... k 0. 1. 1. 1. . 1. 1. . X 0... 1... 1 ... 0... 0... 0... y 0... 0... 0... 0... I ... I... z 0 0 0 1 1 1 1 ' Number of characters with this pattern. Data set B has x characters supporting (II, III, i y V, VI), y characters supporting (II, III), z characters supporting (V, VI), and only one character supporting (IV, V, VI). No matter how large or small one makes the numbers h, i, j , and k or the numbers x, y, and z, the lone incongruent character in data set B, which supports a contradicting clade (IV, V, VI), will always be sufficient to render the data sets significantly incongruent (using the incongruence length difference test; Farris et al., 1994). Huelsenbeck et al. (1994) would have us not combine these data sets together, even if the single incongruent character were but one of millions in data set B and billions in data set A and not withstanding that otherwise there is considerable mutual support for three clades by both data sets. Of course, data sets A and B are contrived and the example is trivial. Nonetheless, the principle can be extended to real situations. Siddall et al. (1997) examined the relationships of the protistan phyla Ciliophora, Apicomplexa, and Dinoflagellida and particularly the placement of the genus Perkinsus therein with 18S ribosomal DNA (rDNA) and actin sequences. A different tree was found using rDNA data (Fig. la) than was found using actin data (Fig. lb). The incongruence length difference test of Farris et al. (1994; implemented in ARNIE.EXE; Siddall, 1996) indicated significant heterogeneity (P = 0.001). This result would have us, under the prior agreement criterion, refrain from combining these data sets. However, Siddall et al. (1997) showed that the first and second codon positions in the actin gene were not incon- gruent with the rDNA data (P = 0.643), whereas these data taken together (Fig. la) were incongruent with the third positions in the actin gene (Fig. lc). The conclusion here, then, is that although prior agreement would have had us not combine the actin and rDNA data at all, the decision to act thus would have been arbitrary because most of the actin data are congruent. That is, one manner of partitioning would have been chosen over some other possible manner of partitioning that is equally or more incongruent than the one chosen without there there being any particularly good reason for choosing one over the other. (By arbitrary, I mean "selected at random or without reason," i.e., choice of one manner of partitioning over a conceivable multitude of possible parititions; moreover, the decision to not combine, if made on the basis of arbitrary choice, is then itself arbitrary). Were this the extent of it, prior agreement could simply assert that the third position represents a nonarbitrary partition and is the cause of the incongruence. However, suppose we had a way to arbitrarily set off 75 characters, for example taking the characters in third position serial homology in the actin gene, calculating for each character the difference between its consistency on the tree suggested by the third positions and that on the tree suggested by the rest of the data, and then ranking the characters according to this difference. Arbitrarily removing only these 75 positions (less than one-sixth of the actin data) with the greatest difference in consistency on the two trees renders the rest of the third position characters congruent (Fig. la) 1997 767 POINTS OF VIEW Perkinsus 1 Perkinsus 2 Prorocentrum Amphidinium (a) Toxoplasma Plasmodium Cryptosporidium Oxytricha 1 Oxytricha 2 Euplotes Tetrahymena.t Tetrahymena.p (b) Perkinsus 1 Perkinsus 1 Perkinsus 2 Perkinsus 2 Prorocentrum Prorocentrum Amphidinium Amphidinium Toxoplasma Oxytricha 1 (c) Toxoplasma Oxytricha 1 Tetrahymena.t Oxytricha 2 rf (d) Euplotes Tetrahymena.p Plasmodium Oxytricha 2 Cryptosporidium Euplotes Tetrahymena.t Plasmodium Tetrahymena.p Cryptosporidium Perkinsus 1 Perkinsus 1 Perkinsus 2 Perkinsus 2 Prorocentrum Oxytricha 2 Euplotes Oxytricha 2 Euplotes Plasmodium Cryptosporidium Amphidinium (e) •E Plasmodium Cryptosporidium Tetrahymena.t Tetrahymena.p Toxoplasma Toxoplasma Oxytricha 1 Prorocentrum Tetrahymena.t Amphidinium Tetrahymena.p Oxytricha 1 FIGURE 1. Single most-parsimonious trees resulting from parsimony analyses of genera in three protistan phyla, (a) 18S rDNA, or 18S rDNA + actin, or 18S rDNA + first and second positions of actin, or 18S rDNA + first and second positions + all but 75 third positions of actin, or 18S rDNA + first and second positions + all but 50 third positions of actin. (b) Actin alone, (c) Third positions of actin sequences alone, (d) Subset of 75 third positions of actin alone, (e) Strict consensus of a subset of 50 third positions of actin alone. 768 VOL. 46 SYSTEMATIC BIOLOGY TABLE 2. Incongruent data sets C and D. Taxon C D I II III IV V VI 000000 101101 101101 100100 010010 010010 0000 0111 1011 1100 1100 0010 with the rDNA + actin first position + actin second position data set (P = 0.581). These 75 third position characters (Fig. Id) were significantly incongruent with the rest of the data (P = 0.001). However, there is no discernible connection among these 75 characters. That is, their involvement was not linked to any particular amino acid(s) and they were randomly located across the actin sequence. Lest the incongruence now be atributed to these 75 characters, consider a ranking of their individual incongruence with the rest of the data according to the skew of the frequency distribution of the sum of tree lengths found from random partitions. Now, arbitrarily discarding only the 50 most incongruent (least skewed) characters renders the remaining 25 characters congruent with the rest of the actin + rDNA data set (Fig. la). These 50 third-position characters, of course, are significantly incongruent (Fig. le) with the rest of the data (P = 0.001). There may yet be even fewer than 50 characters that are contributing this heterogeneity, but I ran out of arbitrary criteria for finding them. What is a phylogeneticist to do when confronted with a situation in which there is more than one way to identify incongruence? Data sets C and D (Table 2) are incongruent (P = 0.02). But if one removes the first, second, or third character in D, the data sets are congruent (P > 0.05). Which character is causing incongruence? In fact, incongruence frequently is a more complex issue that cannot be so readily teased apart as Huelsenbeck et al. (1994; Huelsenbeck et al., 1996a) would have us believe. Returning to the alveolate phyla in Figure 1, if the incongruence in actin data (Fig. lb) were merely reflective of the third positions alone (Fig. lc), we would expect third positions to reveal a poorer match to the tree in Figure la. In fact, both of these trees (Figs, lb, lc) match the whole-data tree equally well (9 of 11 possible nodes are correlated [sensu Page, 1994] as determined by an exhaustive search with TreeMap; Page, 1995), they just match in different ways. Incongruence and combinability are different things. Finding two chosen partitions to be heterogeneous carries no more meaning than finding that two other arbitrary partitions are heterogeneous. When data are partitioned, the investigator runs the risk of discarding otherwise corroborating information. The proportion of characters (as made extreme in the contrived single-character example) contributing to incongruence might in fact be vanishingly small and of marginal consequence in light of otherwise overwhelming agreement or mutual corroboration in the two data sets. Prior agreement cannot be saved from arbitrariness, and phylogeneticists must decide either to combine or to not combine according to the merits of the competing philosophies. This approach may not be as comforting as the prior agreement approach, but it is rational. ACKNOWLEDGMENTS I thank A. Kluge, M. Sorenson, and C. Thacker for comments on earlier drafts and Paul Chippindale for his thoughtful review. REFERENCES ALLARD, M., AND J. M. CARPENTER. 1996. On weight- ing and congruence. Cladistics 12:183-198. BALLARD, J. Q 1996. Combining data in phylogenetic analysis. Trends Ecol. Evol. 11:334. BULL, J. J., J. P. HUELSENBECK, C. W. CUNNINGHAM, D. L. SWOFFORD, AND P. J. WADDELL. 1993. Partition- ing and combining data in phylogenetic analysis. Syst. Biol. 42:384-397. CHIPPINDALE, P. T., AND J. J. WIENS. 1994. Weighting, partitioning, and combining characters in phylogenetic analysis. Syst. Biol. 43:278-287. DE QUEIROZ, A., M . J. DONOGHUE, AND J. KlM. 1995. Separate versus combined analysis of phylogenetic evidence. Annu. Rev. Ecol. Syst. 26:657-681. FARRIS, J. S. 1983. The logical basis of phylogenetic analysis. Pages 7-36 in Advances in cladistics, Volume 2 (N. I. Platnick and V. A. Funk, eds.). Columbia Univ. Press, New York. 1997 POINTS OF VIEW FARRIS, J. S. 1986. On the boundries of phylogenetic systematics. Cladistics 2:14—27. FARRIS, J. SV M. KALLERSJO, A. G. KLUGE, AND C. BULT. 1994. Testing significance of incongruence. Cladistics 10:315-319. FROST, D. R., AND A. G. KLUGE. 1994. A consideration of epistemology in systematic biology, with special reference to species. Cladistics 10:259-294. HEDGES, S. B., AND L. R. MAXSON. 1996. Re: Mole- cules and morphology in amniote phylogeny. Mol. Phylogenet. Evol. 6:312-314. HUELSENBECK, J. P. 1997. Is the Felsenstein zone a fly trap? Syst. Biol. 46:69-74. HUELSENBECK, J. P., J. J. BULL, AND C. W. CUNNING- HAM. 1996a. Combining data in phylogenetic analysis. Trends Ecol. Evol. 11:152-158. HUELSENBECK, J. P., J. J. BULL, AND C. W. CUNNING- HAM. 1996b. Reply. Trends Ecol. Evol. 11:335. HUELSENBECK, J. P., D. L. SWOFFORD, C. W. CUNNINGHAM, J. J. BULL, AND P. J. WADDELL. 1994. Is char- acter weighting a panacea for the problem of data heterogeneity in phylogenetic analysis? Syst. Biol. 43:288-291. KLUGE, A. G. 1989. A concern for evidence and a phy- logenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Syst. Zool. 38:7-25. KLUGE, A. G., AND A. J. WOLF. 1993. Cladistics: Whafs in a word? Cladistics 9:183-199. 769 MICKEVICH, M. F., AND J. S. FARRIS. 1981. The impli- cations of congruence in Menidia. Syst. Zool. 30:351369. MICKEVICH, M. E, AND N. I. PLATNICK. 1989. On the information content of classifications. Cladistics 5: 33-47. MIYAMOTO, M. M. 1985. Consensus cladograms and general classifications. Cladistics 1:186-189. MIYAMOTO, M. M., AND W. M. FITCH. 1995. Testing species phylogenies and phylogenetic methods with congruence. Syst. Biol. 44:64-76. PAGE, R. D. M. 1994. Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Syst. Biol. 43:58-77. PAGE, R. D. M. 1995. TreeMap, version 1.0b. Division of Environmental and Evolutionary Biology, Univ. Glasgow, Glasgow. SIDDALL, M. E. 1996. Random cladistics, Ohio edition. Virginia Insitute of Marine Science, College of William and Mary, Gloucester Point. SIDDALL, M. E., K. R. REECE, J. GRAVES, AND E. M. BURRESON. 1997. "Total evidence" refutes inclusion of Perkinsus species in the phylum Apicomplexa. Int. J. Parasitol. 115:165-176. Received 13 November 1996; accepted 15 August 1997 Associate Editor: David Cannatella
© Copyright 2026 Paperzz