Prior Agreement: Arbitration or Arbitrary?

Syst. Biol. 46(4):765-769, 1997
Prior Agreement: Arbitration or Arbitrary?
MARK E. SIDDALL
Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109, USA;
Email: [email protected]
Contemporary systematists are deeply
divided on the issue of combining or partitioning data sets in phylogenetic analyses
(Mickevich and Farris, 1981; Miyamoto,
1985; Bull et alv 1993; Chippindale and
Wiens, 1994; Huelsenbeck et al v 1994; de
Queiroz et al., 1995; Allard and Carpenter,
1996; Ballard, 1996; Huelsenbeck et al.,
1996a, 1996b). Intermediate between these
positions is the prior agreement (Chippindale and Wiens, 1994) approach, in which
one examines properties of the data a
priori with the aim of making some determination of data combinability (Bull et al.,
1993; Huelsenbeck et al., 1994; Huelsenbeck et al., 1996a, 1996b), i.e., sometimes
combine, sometimes do not combine. Motivations for adopting one framework or
the other stem from philosophical positions that are as deeply held as are the
points of view on combining or compartmentalizing (Kluge, 1989; Kluge and Wolf,
1993; Miyamoto and Fitch, 1995). For example, if one is to accept maximum likelihood as an appropriate methodology for
estimating phylogenetic trees it is not clear
to me what form of reasoning could justify
the combined analysis of morphological
and molecular data together under an
HKY85 model. Thus, if the "HKY85+r 5
model [is] the best fitting model" (Huelsenbeck, 1997:72) for nucleotide changes, it
cannot also be a realistic probabilistic
model for the evolution of a femur. Nor,
however, should one cast all morphological
data to oblivion (although some do take
this extreme position; Hedges and Maxson, 1996). Similarly, if one is to take parsimony as the appropriate methodology
for inferring phylogenetic trees and accept
falsification and explanatory power (Farris,
1983, 1986; Mickevich and Platnick, 1989;
Frost and Kluge, 1994) as its logical underpinnings, keeping data sets separate is
anathema to explaining all of the data. Although Huelsenbeck et al. (1994:288) suggested that prior agreement represents a
"philosophical framework for the treatment of potentially diverse data," they did
not indicate what their philosophical underpinnings were. To the contrary, prior
agreement is merely a methodological
framework and is arbitrary in its application.
Although I disagree with Miyamoto and
Fitch's (1995) compartmentalist position,
clearly they recognized the full implications of adopting a process partition foundation in their approach. That is, if the notion of process partitions is driving the
decision to keep data sets separate, whether or not these partitions are incongruent
is immaterial. Huelsenbeck et al. (1994; see
also Huelsenbeck et al., 1996a) posited that
only if data sets can be shown to be "significantly incongruent" should they be
kept separate (the statement that data sets
should be kept separate unless found not
to be significantly incongruent differs in
intent but is equivalent in implication). The
argument offered by Bull et al. (1993:385)
was that data sets "should not automatically be assumed to be homogeneous."
The methodological statement then reduces to the dangers of assuming homogeneity of data in an analysis. However, a necessary implication of analyzing the data
sets independently thereafter is that the individual data sets are themselves assumed
to be internally homogeneous, contrary to
Bull et al/s (1993) own admonishment that
this is a dangerous assumption.
Consider six taxa (I-VI) and two data
sets (A, B) each with four character types
(Table 1). Data set A has h characters supporting (II, III, IV, V, VI), i characters supporting (II, III, IV), / characters supporting
(II, III), and k characters supporting (V, VI).
765
766
VOL. 4 6
SYSTEMATIC BIOLOGY
TABLE 1. Incongruent data sets A and B. Bold values are those for the incongruent character in data set B.
Taxon
I
II
III
IV
V
VI
Na
0...
1...
\...
1...
1...
1...
h
0...
\ ...
\ ...
1 ...
0...
0...
i
0 . ..
1 . ..
1 ...
0...
0...
0...
i
0...
0...
0...
0...
1...
1...
k
0.
1.
1.
1. .
1.
1. .
X
0...
1...
1 ...
0...
0...
0...
y
0...
0...
0...
0...
I ...
I...
z
0
0
0
1
1
1
1
' Number of characters with this pattern.
Data set B has x characters supporting (II,
III, i y V, VI), y characters supporting (II,
III), z characters supporting (V, VI), and
only one character supporting (IV, V, VI).
No matter how large or small one makes
the numbers h, i, j , and k or the numbers
x, y, and z, the lone incongruent character
in data set B, which supports a contradicting clade (IV, V, VI), will always be sufficient to render the data sets significantly
incongruent (using the incongruence
length difference test; Farris et al., 1994).
Huelsenbeck et al. (1994) would have us
not combine these data sets together, even
if the single incongruent character were
but one of millions in data set B and billions in data set A and not withstanding
that otherwise there is considerable mutual
support for three clades by both data sets.
Of course, data sets A and B are contrived
and the example is trivial. Nonetheless, the
principle can be extended to real situations.
Siddall et al. (1997) examined the relationships of the protistan phyla Ciliophora,
Apicomplexa, and Dinoflagellida and particularly the placement of the genus Perkinsus therein with 18S ribosomal DNA
(rDNA) and actin sequences. A different
tree was found using rDNA data (Fig. la)
than was found using actin data (Fig. lb).
The incongruence length difference test of
Farris et al. (1994; implemented in ARNIE.EXE; Siddall, 1996) indicated significant heterogeneity (P = 0.001). This result
would have us, under the prior agreement
criterion, refrain from combining these
data sets. However, Siddall et al. (1997)
showed that the first and second codon positions in the actin gene were not incon-
gruent with the rDNA data (P = 0.643),
whereas these data taken together (Fig. la)
were incongruent with the third positions
in the actin gene (Fig. lc).
The conclusion here, then, is that although prior agreement would have had
us not combine the actin and rDNA data
at all, the decision to act thus would have
been arbitrary because most of the actin
data are congruent. That is, one manner of
partitioning would have been chosen over
some other possible manner of partitioning that is equally or more incongruent
than the one chosen without there there
being any particularly good reason for
choosing one over the other. (By arbitrary,
I mean "selected at random or without
reason," i.e., choice of one manner of partitioning over a conceivable multitude of
possible parititions; moreover, the decision
to not combine, if made on the basis of arbitrary choice, is then itself arbitrary). Were
this the extent of it, prior agreement could
simply assert that the third position represents a nonarbitrary partition and is the
cause of the incongruence. However, suppose we had a way to arbitrarily set off 75
characters, for example taking the characters in third position serial homology in
the actin gene, calculating for each character the difference between its consistency
on the tree suggested by the third positions and that on the tree suggested by the
rest of the data, and then ranking the characters according to this difference. Arbitrarily removing only these 75 positions
(less than one-sixth of the actin data) with
the greatest difference in consistency on
the two trees renders the rest of the third
position characters congruent (Fig. la)
1997
767
POINTS OF VIEW
Perkinsus 1
Perkinsus 2
Prorocentrum
Amphidinium
(a)
Toxoplasma
Plasmodium
Cryptosporidium
Oxytricha 1
Oxytricha 2
Euplotes
Tetrahymena.t
Tetrahymena.p
(b)
Perkinsus 1
Perkinsus 1
Perkinsus 2
Perkinsus 2
Prorocentrum
Prorocentrum
Amphidinium
Amphidinium
Toxoplasma
Oxytricha 1
(c)
Toxoplasma
Oxytricha 1
Tetrahymena.t
Oxytricha 2
rf
(d)
Euplotes
Tetrahymena.p
Plasmodium
Oxytricha 2
Cryptosporidium
Euplotes
Tetrahymena.t
Plasmodium
Tetrahymena.p
Cryptosporidium
Perkinsus 1
Perkinsus 1
Perkinsus 2
Perkinsus 2
Prorocentrum
Oxytricha 2
Euplotes
Oxytricha 2
Euplotes
Plasmodium
Cryptosporidium
Amphidinium
(e)
•E
Plasmodium
Cryptosporidium
Tetrahymena.t
Tetrahymena.p
Toxoplasma
Toxoplasma
Oxytricha 1
Prorocentrum
Tetrahymena.t
Amphidinium
Tetrahymena.p
Oxytricha 1
FIGURE 1. Single most-parsimonious trees resulting from parsimony analyses of genera in three protistan
phyla, (a) 18S rDNA, or 18S rDNA + actin, or 18S rDNA + first and second positions of actin, or 18S rDNA
+ first and second positions + all but 75 third positions of actin, or 18S rDNA + first and second positions +
all but 50 third positions of actin. (b) Actin alone, (c) Third positions of actin sequences alone, (d) Subset of 75
third positions of actin alone, (e) Strict consensus of a subset of 50 third positions of actin alone.
768
VOL. 46
SYSTEMATIC BIOLOGY
TABLE 2. Incongruent data sets C and D.
Taxon
C
D
I
II
III
IV
V
VI
000000
101101
101101
100100
010010
010010
0000
0111
1011
1100
1100
0010
with the rDNA + actin first position + actin second position data set (P = 0.581).
These 75 third position characters (Fig. Id)
were significantly incongruent with the
rest of the data (P = 0.001). However, there
is no discernible connection among these
75 characters. That is, their involvement
was not linked to any particular amino
acid(s) and they were randomly located
across the actin sequence. Lest the incongruence now be atributed to these 75 characters, consider a ranking of their individual incongruence with the rest of the data
according to the skew of the frequency distribution of the sum of tree lengths found
from random partitions. Now, arbitrarily
discarding only the 50 most incongruent
(least skewed) characters renders the remaining 25 characters congruent with the
rest of the actin + rDNA data set (Fig. la).
These 50 third-position characters, of
course, are significantly incongruent (Fig.
le) with the rest of the data (P = 0.001).
There may yet be even fewer than 50 characters that are contributing this heterogeneity, but I ran out of arbitrary criteria for
finding them.
What is a phylogeneticist to do when
confronted with a situation in which there
is more than one way to identify incongruence? Data sets C and D (Table 2) are incongruent (P = 0.02). But if one removes
the first, second, or third character in D,
the data sets are congruent (P > 0.05).
Which character is causing incongruence?
In fact, incongruence frequently is a more
complex issue that cannot be so readily
teased apart as Huelsenbeck et al. (1994;
Huelsenbeck et al., 1996a) would have us
believe. Returning to the alveolate phyla in
Figure 1, if the incongruence in actin data
(Fig. lb) were merely reflective of the third
positions alone (Fig. lc), we would expect
third positions to reveal a poorer match to
the tree in Figure la. In fact, both of these
trees (Figs, lb, lc) match the whole-data
tree equally well (9 of 11 possible nodes
are correlated [sensu Page, 1994] as determined by an exhaustive search with
TreeMap; Page, 1995), they just match in
different ways.
Incongruence and combinability are different things. Finding two chosen partitions to be heterogeneous carries no more
meaning than finding that two other arbitrary partitions are heterogeneous. When
data are partitioned, the investigator runs
the risk of discarding otherwise corroborating information. The proportion of characters (as made extreme in the contrived
single-character example) contributing to
incongruence might in fact be vanishingly
small and of marginal consequence in light
of otherwise overwhelming agreement or
mutual corroboration in the two data sets.
Prior agreement cannot be saved from arbitrariness, and phylogeneticists must decide either to combine or to not combine
according to the merits of the competing
philosophies. This approach may not be as
comforting as the prior agreement approach, but it is rational.
ACKNOWLEDGMENTS
I thank A. Kluge, M. Sorenson, and C. Thacker for
comments on earlier drafts and Paul Chippindale for
his thoughtful review.
REFERENCES
ALLARD, M., AND J. M. CARPENTER. 1996. On weight-
ing and congruence. Cladistics 12:183-198.
BALLARD, J. Q 1996. Combining data in phylogenetic
analysis. Trends Ecol. Evol. 11:334.
BULL, J. J., J. P. HUELSENBECK, C. W. CUNNINGHAM, D.
L. SWOFFORD, AND P. J. WADDELL. 1993. Partition-
ing and combining data in phylogenetic analysis.
Syst. Biol. 42:384-397.
CHIPPINDALE, P. T., AND J. J. WIENS. 1994. Weighting,
partitioning, and combining characters in phylogenetic analysis. Syst. Biol. 43:278-287.
DE QUEIROZ, A., M . J. DONOGHUE, AND J. KlM. 1995.
Separate versus combined analysis of phylogenetic
evidence. Annu. Rev. Ecol. Syst. 26:657-681.
FARRIS, J. S. 1983. The logical basis of phylogenetic
analysis. Pages 7-36 in Advances in cladistics, Volume 2 (N. I. Platnick and V. A. Funk, eds.). Columbia Univ. Press, New York.
1997
POINTS OF VIEW
FARRIS, J. S. 1986. On the boundries of phylogenetic
systematics. Cladistics 2:14—27.
FARRIS, J. SV M. KALLERSJO, A. G. KLUGE, AND C. BULT.
1994. Testing significance of incongruence. Cladistics 10:315-319.
FROST, D. R., AND A. G. KLUGE. 1994. A consideration
of epistemology in systematic biology, with special
reference to species. Cladistics 10:259-294.
HEDGES, S. B., AND L. R. MAXSON. 1996. Re: Mole-
cules and morphology in amniote phylogeny. Mol.
Phylogenet. Evol. 6:312-314.
HUELSENBECK, J. P. 1997. Is the Felsenstein zone a fly
trap? Syst. Biol. 46:69-74.
HUELSENBECK, J. P., J. J. BULL, AND C. W. CUNNING-
HAM. 1996a. Combining data in phylogenetic analysis. Trends Ecol. Evol. 11:152-158.
HUELSENBECK, J. P., J. J. BULL, AND C. W. CUNNING-
HAM. 1996b. Reply. Trends Ecol. Evol. 11:335.
HUELSENBECK, J. P., D. L. SWOFFORD, C. W. CUNNINGHAM, J. J. BULL, AND P. J. WADDELL. 1994. Is char-
acter weighting a panacea for the problem of data
heterogeneity in phylogenetic analysis? Syst. Biol.
43:288-291.
KLUGE, A. G. 1989. A concern for evidence and a phy-
logenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Syst. Zool. 38:7-25.
KLUGE, A. G., AND A. J. WOLF. 1993. Cladistics:
Whafs in a word? Cladistics 9:183-199.
769
MICKEVICH, M. F., AND J. S. FARRIS. 1981. The impli-
cations of congruence in Menidia. Syst. Zool. 30:351369.
MICKEVICH, M. E, AND N. I. PLATNICK. 1989. On the
information content of classifications. Cladistics 5:
33-47.
MIYAMOTO, M. M. 1985. Consensus cladograms and
general classifications. Cladistics 1:186-189.
MIYAMOTO, M. M., AND W. M. FITCH. 1995. Testing
species phylogenies and phylogenetic methods with
congruence. Syst. Biol. 44:64-76.
PAGE, R. D. M. 1994. Maps between trees and cladistic analysis of historical associations among genes,
organisms, and areas. Syst. Biol. 43:58-77.
PAGE, R. D. M. 1995. TreeMap, version 1.0b. Division
of Environmental and Evolutionary Biology, Univ.
Glasgow, Glasgow.
SIDDALL, M. E. 1996. Random cladistics, Ohio edition.
Virginia Insitute of Marine Science, College of William and Mary, Gloucester Point.
SIDDALL, M. E., K. R. REECE, J. GRAVES, AND E. M.
BURRESON. 1997. "Total evidence" refutes inclusion
of Perkinsus species in the phylum Apicomplexa. Int.
J. Parasitol. 115:165-176.
Received 13 November 1996; accepted 15 August 1997
Associate Editor: David Cannatella