Compatibility analysis and its applications

Zoological Journal of the Linnean Society (1982), 74: 267-275. With 2 figures
WALTER J. LE QUESNE
Anne Cottage, 70 Lye Green Road, Chesham, Buckinghamshire HP5 3NB
Accepted for publication June 1981
A two-state character is defined as uniquely derived if it has only evolved once in the history of a group,
without subsequent reversal. Two independent characters cannot both be uniquely derived if all four
possible combinations (or all three excluding that of the two ancestral forms) occur.
A number of ways of choosing compatible sets of uniquely derived characters are discussed and used
to derive possible unrooted and rooted trees. Results of these are related to those chosen on parsimony
criteria, using data for orthopteroid groups, and the assumptions of both methods are compared.
Application of compatibility analysis to the moth genera Teldenia and Argodrepana is also discussed.
Compatibility and parsimony methods are complementary rather than exclusive of each other.
KEY WORDS: cladistics - polarity - parsimony - numerical - taxonomy - incompatibility.
CONTENTS
Introduction
The concept of incompatibility
Compatibility analysis
  The character-pair matrix
  Constructing the network
  The coefficient of character-state randomness
  The normal deviate
  Finding the largest possible set
  Rooted cladograms
  Application to multistate characters
  Use of reference taxa outside the study group
  Application to a subset of the study group
Application to the genus Teldenia
Application to Kamp’s orthopteroid data
Value of compatibility methods
  Philosophical considerations
  Use of more than one technique of cladogram construction
References
INTRODUCTION
The most direct method of attempting computer analysis of any problem is to
try to define the successive steps involved in one’s instinctive approach to it. The
classical method of suggesting a phylogeny normally depended on an implicit
assumption that a chosen character (or group of characters) had only evolved
once, without subsequent reversal, and on using it to indicate the pattern of evolution.
However, the publication of widely diverse trees for the same group by different
authors made taxonomists wonder if more objective methods might not settle the
problems more decisively, though with hindsight it would appear that the
improvements are often rather limited.

0024-4082/82/030267+09 $02.00/0 © 1982 The Linnean Society of London

Table 1. Derivation of two independent uniquely derived characters
(A, ancestral state; B, derived state)

                 Ancestral   First      Second derived
                 state       derived    (one or the other, but not both)
CHARACTER 1      A           B          A     or     B
CHARACTER 2      A           A          B            B
or
CHARACTER 1      A           A          B     or     B
CHARACTER 2      A           B          A            B
THE CONCEPT OF INCOMPATIBILITY
About 13 years ago, as a taxonomist with no practical experience of numerical
methods, the thought occurred to me that by consideration of two independent two-state
characters together one could get an indication of the possibility of their both
being uniquely derived, that is having evolved once without subsequent reversal.
The test depends on the realization that the species ancestral to the whole group
being studied had both characters in the ancestral state, represented in Table 1 by
A, where B represents an evolved state. The first change which occurred produced
state B in one or other of these characters, and the second change might have
occurred either among species of the original AA combination or the evolved AB
(or BA) combination, producing one or other of the extra combinations shown in
either case, but not both, making a total of three combinations. Thus, it is
impossible to obtain all four combinations together without either a character
evolving twice or reversal of an already evolved character.
The logical consequence is that if all four combinations of character-states for
the two characters are found, then one or other of the characters is not uniquely
derived (or possibly both are not), as pointed out in my original paper on this topic
(Le Quesne, 1969). Estabrook et al. (1976) introduced the term ‘compatible’ for
pairs of characters that passed this test. It does not matter for this compatibility test
which is the ancestral and which the derived state. Unfortunately, an
incompatibility does not tell us which of the characters is or is not uniquely
derived.
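
The pairwise test described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the original work: the function name `compatible` and the example characters are hypothetical, and each character is assumed to be a list of exactly two state labels, one per taxon, with no missing data.

```python
def compatible(char_i, char_j):
    """Return True if two two-state characters pass the pairwise test.

    Each argument lists the state of one character for every taxon,
    in the same taxon order. The pair fails only when all four
    combinations of states are found among the taxa.
    """
    combinations_found = set(zip(char_i, char_j))
    return len(combinations_found) < 4


# Hypothetical data for four taxa.
char_1 = ["A", "A", "B", "B"]
char_2 = ["A", "B", "A", "B"]   # with char_1: AA, AB, BA and BB all occur
char_3 = ["A", "A", "A", "B"]   # with char_1: only AA, BA and BB occur

print(compatible(char_1, char_2))  # False
print(compatible(char_1, char_3))  # True
```

Because the test only counts distinct combinations, relabelling A and B within either character leaves the result unchanged, which matches the remark that the polarity of the states does not matter here.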
COMPATIBILITY ANALYSIS
The character-pair matrix
In my 1969 paper, I suggested making up what I called the ‘character-pair’
matrix showing whether or not each possible pair of characters was compatible
with each other, and then eliminating the character with the largest number of
incompatibilities by drawing vertical and horizontal lines. The remaining
incompatibilities are counted again and the process repeated until none remain.
Another simple method suggested in a subsequent paper (Le Quesne, 1972) was
acceptance of the character with the smallest number of incompatibilities,
eliminating those incompatible with it in similar fashion, recounting and again
eliminating until all the incompatibilities have been removed.
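
The two elimination procedures just described can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of the original programs: the function names and the example data are hypothetical, and ties are broken alphabetically by character name, a rule the text does not specify.

```python
from itertools import combinations


def compatible(a, b):
    # A pair is incompatible only when all four state combinations occur.
    return len(set(zip(a, b))) < 4


def incompatibility_matrix(chars):
    """Map each character name to the set of characters it clashes with."""
    clashes = {name: set() for name in chars}
    for i, j in combinations(chars, 2):
        if not compatible(chars[i], chars[j]):
            clashes[i].add(j)
            clashes[j].add(i)
    return clashes


def eliminate_worst(chars):
    """Repeatedly drop the character with the most incompatibilities,
    recounting after each removal, until none remain."""
    clashes = incompatibility_matrix(chars)
    kept = set(chars)
    while kept:
        counts = {c: len(clashes[c] & kept) for c in kept}
        worst = max(sorted(kept), key=counts.get)
        if counts[worst] == 0:
            break
        kept.remove(worst)
    return sorted(kept)


def accept_best(chars):
    """Repeatedly accept the character with the fewest incompatibilities
    and eliminate every character that clashes with it."""
    clashes = incompatibility_matrix(chars)
    remaining = set(chars)
    accepted = []
    while remaining:
        counts = {c: len(clashes[c] & remaining) for c in remaining}
        best = min(sorted(remaining), key=counts.get)
        accepted.append(best)
        remaining -= clashes[best] | {best}
    return sorted(accepted)


# Hypothetical matrix: four characters scored for five taxa.
chars = {
    "c1": [0, 0, 1, 1, 1],
    "c2": [0, 0, 0, 1, 1],
    "c3": [0, 1, 0, 1, 0],   # clashes with c1 and c2
    "c4": [0, 0, 0, 0, 1],
}
print(eliminate_worst(chars))  # ['c1', 'c2', 'c4']
print(accept_best(chars))      # ['c1', 'c2', 'c4']
```

On this small example both procedures agree; on larger matrices they need not, since they make their choices in a different order.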
Constructing the network
After elimination by either of these techniques, one is left with a set of characters
each of which divides the set of organisms under study into two groups depending
on their character-state. To turn this information into a cladogram, I found it most
convenient to start with the selected character which divides the taxa under study
into two groups as nearly equal as possible, when all the other selected characters
will be found to split off a section of one or other of these two groups, enabling one
to build up the tree. The convenience of this approach masks the fact that there is
no evidence for the position of the root, as pointed out by Felsenstein (1975): thus
the term ‘network’ might be more appropriate. The fact that a set of characters
which were all compatible when taken on a pairwise basis was compatible as a
whole was rigorously proved by McMorris (1975).
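
As an illustration of how a mutually compatible set determines a network, the sketch below writes the network as a nested (Newick-style) string. It uses a different construction from the one described above: instead of starting from the most evenly dividing character, each character is read as the group of taxa that differ from an arbitrarily chosen reference taxon, and the groups are then nested by inclusion. The reference taxon is only a drawing convenience and implies nothing about the root. All names are hypothetical, and the input characters are assumed to be mutually compatible already.

```python
def groups_from_characters(taxa, characters, reference):
    """Turn each two-state character into the set of taxa whose state
    differs from that of the reference taxon. Characters that separate
    a single taxon from the rest carry no grouping and are skipped."""
    groups = set()
    for states in characters:
        group = frozenset(t for t in taxa if states[t] != states[reference])
        if 2 <= len(group) <= len(taxa) - 2:
            groups.add(group)
    return groups


def network_string(taxa, characters, reference):
    """Nest the groups by inclusion and write them in bracket notation."""
    groups = sorted(groups_from_characters(taxa, characters, reference),
                    key=lambda g: (len(g), sorted(g)))

    def render(current):
        inside = [g for g in groups if g < current]
        # Keep only the largest groups directly inside the current one.
        children = [g for g in inside if not any(g < h for h in inside)]
        covered = set().union(*children)
        parts = [render(g) for g in children] + sorted(current - covered)
        return "(" + ",".join(parts) + ")"

    return render(frozenset(taxa)) + ";"


# Hypothetical example: two compatible characters for five taxa.
taxa = ["a", "b", "c", "d", "e"]
characters = [
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
]
print(network_string(taxa, characters, "a"))  # (((d,e),c),a,b);
```

The printed string describes the same unrooted network whichever taxon is used as reference; only the way it is drawn changes.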
The coefficient of character-state randomness
I also suggested (Le Quesne 1969, 1972) calculation of the number of
incompatibilities to be expected if each character was represented by the found
number of A and B character-states, but these were distributed in a random
fashion: the calculated total number of these I called P_s. The ratio obtained by
dividing the actual number of incompatibilities (denoted by N_x) by the value of P_s
and expressing it as a percentage is called the ‘coefficient of character-state
randomness’, and clearly will be zero for a group of completely compatible
characters. In most cases studied, the figures have been in the range 67-96% (Le
Quesne, 1975), but when the figures are above 90% there are difficulties in getting
meaningful cladograms.
This coefficient of character-state randomness can be applied to the whole of the
data or just to the pairings between any one character and each of the others. If
this method is used to give a value for each character, it is possible to select the
character with the lowest value and to eliminate characters incompatible with this,
carrying on until a mutually compatible set is obtained. This constitutes a third
selection method.
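
The coefficient can be illustrated with the sketch below. The text does not give the formula used to obtain the expected number of incompatibilities, so the calculation here is an assumption: for each pair of characters, the probability that all four combinations occur is found by hypergeometric enumeration, keeping the observed number of taxa in each state and arranging them at random. Function names and data are hypothetical, and states are assumed to be coded 0 and 1.

```python
from itertools import combinations
from math import comb


def p_all_four(n, a, b):
    """Probability that all four state combinations occur when one
    character has `a` taxa in state 1, another has `b` taxa in state 1,
    and the states are arranged at random among `n` taxa."""
    hits = 0
    for k in range(min(a, b) + 1):        # k taxa have state 1 in both
        ways = comb(a, k) * comb(n - a, b - k)
        all_present = (k >= 1 and a - k >= 1 and b - k >= 1
                       and n - a - b + k >= 1)
        if all_present:
            hits += ways
    return hits / comb(n, b)


def randomness(chars):
    """Return (N_x, P_s, coefficient in per cent) for a list of characters."""
    n = len(chars[0])
    n_x = 0
    p_s = 0.0
    for x, y in combinations(chars, 2):
        n_x += len(set(zip(x, y))) == 4   # observed incompatibility
        p_s += p_all_four(n, sum(x), sum(y))
    return n_x, p_s, (100.0 * n_x / p_s if p_s else 0.0)


def randomness_per_character(chars):
    """Coefficient for each character against all of the others."""
    return [randomness_pairings(x, [y for j, y in enumerate(chars) if j != i])
            for i, x in enumerate(chars)]


def randomness_pairings(x, others):
    n_x = sum(len(set(zip(x, y))) == 4 for y in others)
    p_s = sum(p_all_four(len(x), sum(x), sum(y)) for y in others)
    return 100.0 * n_x / p_s if p_s else 0.0


# Hypothetical data: three characters scored for six taxa.
chars = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
n_x, p_s, coefficient = randomness(chars)
print(n_x, round(p_s, 3), round(coefficient, 1))
print([round(v, 1) for v in randomness_per_character(chars)])
```

The per-character values are what the third selection method would rank; the character with the lowest value would be accepted first.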
The normal deviate
A fourth method, described in my 1972 paper, depends on calculation of a
normal deviate, using the formula

$$\mathrm{N.D.} = \frac{P_s - N_x}{\left[ P_s (n_0 - P_s) / n_0 \right]^{1/2}}$$

in which N_x and P_s are as previously defined and n_0 represents the number of valid
comparisons (those involving characters with two or more examples of both
character-states). This criterion can again be applied to a single character or to a
group of characters: where a number of characters are completely correlated, the
information from them can be reinforced by use of the combined normal deviate.
Again, we can select the character with the largest positive normal deviate,
eliminate characters incompatible with this, choose the remaining character with
the highest positive value, and so on.
The normal deviate can also be calculated for a set of mutually compatible
characters, giving a figure related to the possibility of the assemblage occurring by
accident, as in Table 2.
A single Fortran program has been written which produces networks by all four
of the above selection methods.
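
A minimal numerical sketch of the normal deviate follows. The denominator used here is an assumption: it treats the number of incompatibilities as binomially distributed over the n_0 valid comparisons, giving a standard deviation of [P_s(n_0 - P_s)/n_0]^(1/2). That form should be checked against Le Quesne (1972) before being relied on. The function name and the input values are hypothetical.

```python
from math import sqrt


def normal_deviate(p_s, n_x, n_0):
    """Difference between expected (p_s) and observed (n_x) numbers of
    incompatibilities, in units of an assumed binomial standard deviation
    over n_0 valid comparisons."""
    return (p_s - n_x) / sqrt(p_s * (n_0 - p_s) / n_0)


# Hypothetical values: 100 valid comparisons, 50 incompatibilities
# expected at random, 20 observed.
print(normal_deviate(50, 20, 100))  # 6.0
```

A large positive value indicates far fewer incompatibilities than a random arrangement of the same character-states would be expected to give.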
Finding the largest possible set
A fifth criterion for selection of a set of compatible characters is to find the
largest possible set. I am rather sceptical of the ‘biggest is best’ concept, possibly
based on the outcome of its application to the orthopteroid data which I studied
and thus a rather subjective view, but it has been developed by Estabrook et al.
(1977). My suspicion is that it might bring out parallelisms based on function.
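
Finding the largest mutually compatible set amounts to finding the largest clique in a graph whose vertices are characters and whose edges join compatible pairs. The sketch below uses the standard Bron-Kerbosch enumeration without pivoting; it is not the algorithm of Estabrook et al., whose details are not given here, and the function names and data are hypothetical. Running time grows rapidly with the number of characters.

```python
def compatible(a, b):
    # A pair is incompatible only when all four state combinations occur.
    return len(set(zip(a, b))) < 4


def largest_compatible_sets(chars):
    """Return every largest set of mutually compatible characters."""
    names = sorted(chars)
    friends = {n: {m for m in names
                   if m != n and compatible(chars[n], chars[m])}
               for n in names}
    best = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            # The clique cannot be extended: compare it with the best so far.
            if not best or len(clique) > len(best[0]):
                best[:] = [sorted(clique)]
            elif len(clique) == len(best[0]):
                best.append(sorted(clique))
            return
        for v in sorted(candidates):
            expand(clique | {v}, candidates & friends[v], excluded & friends[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(names), set())
    return best


# Hypothetical matrix: four characters scored for five taxa.
chars = {
    "c1": [0, 0, 1, 1, 1],
    "c2": [0, 0, 0, 1, 1],
    "c3": [0, 1, 0, 1, 0],
    "c4": [0, 0, 0, 0, 1],
}
print(largest_compatible_sets(chars))  # [['c1', 'c2', 'c4']]
```

When several sets tie for largest, all of them are returned, which makes the arbitrariness of a pure ‘biggest is best’ rule visible.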
Rooted cladograms
All the methods described so far require no assumptions as to which is the
primitive and which the derived state: that is why unrooted trees are produced. If
we know the actual direction of change in each character, we can make a
somewhat more stringent compatibility test. Returning to Table 1, we see that for
two uniquely derived characters only AA and two other combinations occur, i.e.
the condition that all three combinations other than AA (i.e. AB, BA and BB)
occur is sufficient for an incompatibility (Le Quesne, 1979). This is the basis for the
‘cliques’ proposed by Estabrook et al. (1977) and will lead to finding compatible
synapomorphies.
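
The more stringent test for characters of known polarity can be sketched as below. This is an illustration only: the function name is hypothetical, and both characters are assumed to be coded with the same label (here "A") for the ancestral state.

```python
def compatible_polarized(char_i, char_j, ancestral="A"):
    """Return True unless all three combinations other than
    ancestral/ancestral are found among the taxa."""
    found = set(zip(char_i, char_j))
    found.discard((ancestral, ancestral))
    return len(found) < 3


# Hypothetical data for three taxa: AB, BA and BB occur, AA does not.
char_1 = ["A", "B", "B"]
char_2 = ["B", "A", "B"]
print(compatible_polarized(char_1, char_2))  # False
print(len(set(zip(char_1, char_2))) < 4)     # True: passes the unrooted test
```

The example pair passes the test that ignores polarity but fails once the ancestral state is known, which is the sense in which the rooted test is more stringent.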
Application to multistate characters
The methods which I have developed only apply to two-state characters, but
Estabrook et al. (1977) have proposed methods for detecting incompatibilities in
multistate characters. However, I prefer breaking these up into pairs of two-state
characters (e.g. for a base sequence into (a) A or not A, (b) C or not C, etc.), since
one character-state may be uniquely derived from the ancestral one and another
derived several times from it: in such circumstances one could easily ‘throw away
the baby with the bath-water’ using Estabrook’s method.
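
The decomposition into ‘state or not state’ characters can be sketched as follows. The function name and the example site are hypothetical; 1 marks taxa showing the state in question and 0 marks all others.

```python
def two_state_factors(states):
    """Split one multistate character into one two-state character
    per observed state (state present = 1, any other state = 0)."""
    return {s: [1 if x == s else 0 for x in states]
            for s in sorted(set(states))}


# Hypothetical site in a base sequence, scored for four taxa.
site = ["A", "C", "A", "G"]
print(two_state_factors(site))
# {'A': [1, 0, 1, 0], 'C': [0, 1, 0, 0], 'G': [0, 0, 0, 1]}
```

Each resulting two-state character can then be tested separately, so that one well-behaved state is not discarded along with a state that has arisen several times.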
It should be noted that the choice of unrooted trees in either compatibility or
‘parsimony’ methods is unaffected by ‘singularities’ (i.e. a character-state only
found in one taxon under study) and this constraint can substantially reduce the
number of two-state characters to be tested.
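
Removing singularities before testing is a simple filter, sketched below with hypothetical names and data. As an added assumption, characters that do not vary at all are dropped as well, since they cannot take part in an incompatibility either.

```python
from collections import Counter


def drop_singularities(chars):
    """Keep only two-state characters in which each state is found
    in at least two taxa."""
    kept = {}
    for name, states in chars.items():
        counts = Counter(states)
        if len(counts) == 2 and min(counts.values()) >= 2:
            kept[name] = states
    return kept


# Hypothetical matrix for four taxa.
chars = {
    "x": [0, 0, 1, 1],
    "y": [0, 0, 0, 1],   # singularity: state 1 in one taxon only
    "z": [0, 0, 0, 0],   # invariant
}
print(sorted(drop_singularities(chars)))  # ['x']
```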
Use of reference taxa outside the study group
In practice, we usually deduce the primitive state by reference to organisms
closely related to, but outside the study group. Thus, a practical alternative is to
add one or more of these reference species to the data matrix, which essentially
gives a direction to the tree and supplies a root. An example is the moth genus
Teldenia, as discussed below.
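
One way of using a reference taxon to deduce the primitive state can be sketched as below: each character is recoded so that the state shown by the reference taxon is labelled A (ancestral) and the other state B (derived). This is an illustration under the assumption of a single reference taxon whose state is taken as primitive for every character; the function name and data are hypothetical.

```python
def polarize(chars, taxa, reference):
    """Recode two-state characters relative to a reference taxon:
    its state becomes 'A' (ancestral), the alternative becomes 'B'."""
    position = taxa.index(reference)
    return {name: ["A" if s == states[position] else "B" for s in states]
            for name, states in chars.items()}


# Hypothetical matrix: one reference taxon and three study species.
taxa = ["reference", "sp1", "sp2", "sp3"]
chars = {
    "k": [1, 1, 0, 0],
    "m": ["x", "y", "y", "x"],
}
print(polarize(chars, taxa, "reference"))
# {'k': ['A', 'A', 'B', 'B'], 'm': ['A', 'B', 'B', 'A']}
```

Leaving the reference taxon in the matrix, as the text suggests, has the same effect on the network: the point where it attaches supplies the root.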
Application to a subset of the study group
I have previously referred (Le Quesne, 1975) to some diaspid data where the
relationships within a cluster of ten species out of the original 26 were not clear from
the initial analysis, but when these ten were studied on their own the relationships
became clear. It is thus often very valuable in this work to make changes in the set
of taxa under analysis.
APPLICATION TO THE GENUS TELDENIA
The methods discussed above have been applied to data for the moth genus
Teldenia published by Wilkinson (1967). As seen from Table 2, application to the
genus Teldenia on its own gave a different cladogram for each of the five methods of
selection, very low normal deviates and a coefficient of character-state randomness
of over 94%, clearly suggesting that not much confidence could be put in the
results. However, by combining these data with those for the related genus
Argodrepana a substantially clearer picture could be obtained.
Table 2. Teldenia and Argodrepana: results of application of five elimination
methods

                              Argodrepana   Teldenia    Argodrepana + Teldenia
No. of characters             23            36          51
No. of species                7             32          39
No. of different selections   1             5           2
No. of characters selected    19            7-11        19-22
Normal deviate                15.5          2.09-4.02   17.4-17.8
C.C.S.R. (all data)           34.7%         94.1%       72.7%

The two alternative cladograms are shown in Fig. 1. In this case, the partition
into these genera was supported by seven characters (numbers 8, 11, 16, 19, 60, 64
and 66), while five other characters (numbers 9, 13, 14, 46 and 52) represented
groupings within Argodrepana. Thus, one can with confidence supply a root at the
point where the two genera separate.
Within Teldenia, characters 1, 2, 22, 58, 87 and 95 are selected for both alternative
trees. Cladogram A also depends on characters 27, 62, 63 and 69, while cladogram
B is supported by character 44. The latter character is one of wing pattern, and
thus must remain rather suspect, though having the advantage that all the species
are included in the cladogram. Three of the characters supporting cladogram A
are based on male genitalia, but this scheme has the disadvantage that not all of
the species can be placed.
APPLICATION TO KAMP’S ORTHOPTEROID DATA
Mickevich (1978) has applied a useful objective test to various numerical
methods by use of pairs of data matrices for the same study group of organisms.
(Incidentally, the method which she ascribes to me is founded more closely on
Estabrook's work, based on finding the largest clusters of mutually compatible
characters.) Following her philosophy, I have recently been analysing the two
separate data matrices on orthopteroids published by Kamp (1973) in an attempt
to place the aberrant group Grylloblattodea. The results obtained are shown in Fig.
2, using the five selection methods which I have discussed above, on both the
separate matrices and on the combined data.

Figure 1. Two cladograms (Cladogram A and Cladogram B) obtained by compatibility analysis of data for the genera Teldenia and Argodrepana. [Cladograms not reproduced.]

Figure 2. Cladograms selected on compatibility analysis and parsimony grounds from Kamp’s data on
orthopteroids. Roman numerals indicate the selection methods leading to each cladogram based on the various
data sets. A, Acrididae; B, Blattaria; D, Dermaptera; G, Gryllidae; Gb, Grylloblattodea; M, Mantodea; P,
Phasmida; T, Tettigoniidae. [Cladograms and step counts not reproduced.]

                             Giles data   Blackith data   Combined data
C.C.S.R. (all data)          86.1%        88.3%           88.7%
No. of characters            76           56              132
No. of characters selected   18-21        14-16           30-33

I have also found the minimum number of steps required for each network as a
parsimony criterion, using a method based on that of Fitch (1971, 1975). In fact,
seven different networks have been selected, but they fall into two groups, the
upper ones fitting the Giles data on both compatibility and parsimony grounds
(the most parsimonious number of steps in each case being underlined). The
bottom three networks fit the Blackith data better using both techniques, while the
combined data fit in with the first, second and fourth networks. (Here methods I
and 1’ do not distinguish between two possibilities; the methods are numbered in
the order in which they are mentioned above and as designated in my 1972 paper.)
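
The step counts used above as a parsimony criterion can be illustrated with the standard Fitch procedure for a single character on a fully resolved tree. This is a sketch of the textbook method, not necessarily the variant used for the figures quoted here. The tree is written as nested pairs; for an unrooted network it may be rooted on any branch, since the count for unordered characters does not depend on that choice. Names and data are hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character.

    `tree` is a taxon name or a pair (left, right) of subtrees;
    `states` maps each taxon name to its character-state."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        # No state in common: one change is needed at this node.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def total_steps(tree, characters):
    """Sum of the minimum step counts over all characters."""
    return sum(fitch_steps(tree, c) for c in characters)


# Hypothetical network for five taxa, rooted arbitrarily for counting.
tree = ((("a", "b"), "c"), ("d", "e"))
characters = [
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},   # fits the tree: 1 step
    {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0},   # needs 2 steps
]
print(total_steps(tree, characters))  # 3
```

Comparing such totals across candidate networks gives the parsimony ranking, which can then be set beside the choice made on compatibility grounds.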
From these results we may conclude that in general compatibility and what are
traditionally termed ‘parsimony’ methods will not lead to widely disparate
answers. It may be noted, incidentally, that the Grylloblattodea have been put on
the left of the network in each case and that three possibilities emerge for their
closest relatives. The coefficients of character-state randomness are between 86 and
89%, making conclusions not very firm: in every case, after the singularities have
been excluded, the number of characters selected is less than 30% of the total
number.
VALUE OF COMPATIBILITY METHODS
Philosophical considerations
Finally, why use compatibility methods? I feel that they are close to the implicit,
sometimes subconscious, judgments long made by taxonomists when assessing
relationships. The fundamental philosophical question that separates
compatibility and ‘parsimony’ methods is whether all characters are equal in their
information content. Classical taxonomists tend to think in terms of ‘stable’ and
‘unstable’ characters, and hope to find some in the former category that are good
indicators of the history of the group. The idea of an ‘unstable character’ is often
applied to those that are adaptive to ecological circumstances. This is less
obviously true with base or amino-acid sequences, but may still be significant if the
biological function of the protein coded for by the gene is not known.
Use of more than one technique of cladogram construction
Moreover, use of a number of different techniques helps to indicate the degree of
confidence one can put in the cladograms produced. When a number of techniques
give similar results, one naturally feels happier than when each gives a very
different conclusion, so the various compatibility and ‘parsimony’ tests available
should be regarded as complementary.
REFERENCES
ESTABROOK, G. F., JOHNSON, C. S. Jr. & McMORRIS, F. R., 1976. A mathematical foundation for the
analysis of cladistic character compatibility. Mathematical Biosciences, 29: 181-187.
ESTABROOK, G. F., STRAUCH, J. C. & FIALA, K. L., 1977. An application of compatibility analysis to
Blackith’s data on Orthopteroid insects. Systematic Zoology, 26: 269-276.
FELSENSTEIN, J., 1975. Discussion of preceding presentation. In G. F. Estabrook (Ed.), Proceedings of the Eighth
International Conference on Numerical Taxonomy: 428. San Francisco: W. H. Freeman.
FITCH, W. M., 1971. Toward defining the course of evolution: minimum change for a specific tree topology.
Systematic Zoology, 20: 406-416.
FITCH, W. M., 1975. Toward finding the tree of maximum parsimony. In G. F. Estabrook (Ed.), Proceedings of the
Eighth International Conference on Numerical Taxonomy: 189-230. San Francisco: W. H. Freeman.
KAMP, J. W., 1973. Numerical classification of the Orthopteroids, with special reference to the Grylloblattodea.
Canadian Entomologist, 105: 1235-1249.
LE QUESNE, W. J., 1969. A method of selection of characters in numerical taxonomy. Systematic Zoology, 18: 201-205.
LE QUESNE, W. J., 1972. Further studies based on the uniquely derived character concept. Systematic Zoology,
21: 281-288.
LE QUESNE, W. J., 1975. Discussion of preceding presentations. In G. F. Estabrook (Ed.), Proceedings of the
Eighth International Conference on Numerical Taxonomy: 416-429. San Francisco: W. H. Freeman.
LE QUESNE, W. J., 1979. Compatibility analysis and the uniquely derived character concept. Systematic Zoology,
28: 92-94.
McMORRIS, F. R., 1975. Compatibility criteria for cladistic and qualitative taxonomic characters. In G. F.
Estabrook (Ed.), Proceedings of the Eighth International Conference on Numerical Taxonomy: 399-415. San Francisco:
W. H. Freeman.
MICKEVICH, M. F., 1978. Taxonomic congruence. Systematic Zoology, 27: 143-158.
WILKINSON, C., 1967. A taxonomic revision of the genus Teldenia Moore (Lepidoptera: Drepanidae,
Drepaninae). Transactions of the Royal Entomological Society of London, 119: 303-362.