CLADISTIC ANALYSIS AND SYNTHESIS:
PRINCIPLES AND DEFINITIONS, WITH A
HISTORICAL NOTE ON ADANSON'S
FAMILLES DES PLANTES (1763-1764)
GARETH NELSON
Abstract
Nelson, G. (Department of Ichthyology, The American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024) 1979. Cladistic analysis and synthesis: principles and definitions, with a historical note on Adanson's Familles des Plantes (1763-1764). Syst. Zool. 28:1-21. Cladistic analysis is the analysis of hierarchically branching diagrams (cladograms), which estimate, with more or less informativeness and efficiency, one or more cladistic parameters. Branch points (components) comprise part of the information of a cladogram (the component information); and branch tips (terminal taxa) comprise the other part (the term information). In an analysis of five cladograms published on allodapine bees, components were segregated into four categories: (1) replicates; (2) components non-combinable with replicates; (3) components combinable with replicates and with each other; (4) components individually combinable with replicates but not with each other. Components replicated in cladograms based on independent data sets have low, but specifiable, probabilities of occurrence. For the five cladograms of bees, the replicates were found to be non-random (P = 10⁻¹⁷%). Through cladistic synthesis, categories (1) and (3) were combined in a general cladogram, the best estimate of the only apparent cladistic parameter. In a comparison of the five cladograms of bees, phyletic procedures proved more efficient and more informative than phenetic procedures in estimating the cladistic parameter, as represented by the general cladogram. The number of characters on which each of the five cladograms is based seems either uncorrelated, or inversely correlated, with the cladogram's efficiency in estimating the cladistic parameter. [Cladistics; phyletics; phenetics; gradistics; classification; Apidae; Allodapinae; bees.]
Downloaded from http://sysbio.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 13, 2016
"Il faloit chercher dans la nature elle-meme son Systeme, s'il etoit vrai qu'ele en eut un . . ." [One had to seek in nature itself its System, if it were true that it had one . . .]
(Michel Adanson, 1763:clvij).
Sensing what I thought are elements of general, as well as historical, interest in Michener's (1977) recent paper on discordant evolution and bee classification, I thought that commenting on it might contribute to a solution of one or more basic disputes in systematics. Having read Michener's (1978) reply to my comments, I now have second thoughts:
"I find Nelson's treatment . . . so extraordinary that I wonder if we are actually writing about the same things or using the same language" (p. 116).
Whether Michener is right or wrong in his wonderment, I believe that his and Brothers' (1978) replies bring into relief a problem centering on the term "cladistic." In recent years there has been some discussion of this problem, with the result that various ideas, sometimes conflicting, are associated with the term. From a historical standpoint, something of a new language is, perhaps, developing. With this possibility in mind, I attempt to formalize some of the fundamental principles of cladistics, some of which may be stated for the first time.
The reader is advised that in this paper, certain terms are used in a sense that might seem unconventional (cladistics, cladistic analysis, cladogram, phylogram, information, efficiency, etc.). The concepts denoted by these terms are, I hope, clearly enough defined to avoid confusion, once the reader is forewarned.
I focus on the cladistic aspect of Michener's paper (1977), again in the hope of progress. What follows is an expansion of my original commentary (Nelson, 1978), not a reply to the replies of Michener (1978) and Brothers (1978).
Briefly summarized, Michener's (1977)
SYSTEMATIC ZOOLOGY
VOL. 28
[Figure 1 panels, each a branching diagram with terminal taxa A-N: 1.1 Phylogram; 1.2 Phenogram: Larvae; 1.3 Phenogram: Pupae; 1.4 Phenogram: Adults; 1.5 Phenogram: Males; 1.6 Phylogram: Alternative (with added taxa A', D', H', M').]
FIG. 1.—Cladograms (1.1-5, after Michener, 1977:figs. 1, 7-10; 1.6, partly after Nelson, 1978:fig. 2). In
cladogram 1.3, taxon F is added arbitrarily, so as to fill in missing data.
paper includes five branching diagrams
and three classifications of 14 genera of
allodapine bees. The diagrams include
what Michener terms a "cladogram . . .
developed using the methods of Hennig"
(his fig. 1), and four diagrams, each of
which he terms a "cladogram on which
are shown certain distance coefficients"
(his figs. 7-10). These are the five cladograms, which include one phylogram (redrawn here as Fig. 1.1) and four phenograms (Fig. 1.2-5). Of each of Michener's
figs. 7-10, he states that "nested rings indicate subjective levels of similarity."
The nested rings are the basis of the phenograms of Figs. 1.2-5. Redrawing them
all in conventional form makes it easy to
specify what they mean. The phenograms include one based on "54 characters of mature larvae" (his fig. 7; cf., Fig.
1.2); a second, on "46 characters of pupae" (his fig. 8; cf., Fig. 1.3); a third, on
"144 external characters of adults" (his
fig. 9; cf., Fig. 1.4); and a fourth, on "25
genital and associated characters of adult
males" (his fig. 10; cf., Fig. 1.5). The five
1979
CLADISTIC ANALYSIS
[Figure 2 panels, each a branching diagram with terminal taxa A-N: 2.1 All variables; 2.2 Larvae; 2.3 Adults; 2.4 Alternative.]
FIG. 2.—Cladograms (2.1-3, after Michener, 1977:51-52, and Nelson, 1978:fig. 1; 2.4, after Nelson,
1978:fig. 2).
cladograms constitute a study of 14 genera coded as follows: A, Halterapis; B,
Compsomelissa; C, Allodape; D, Braunsapis; E, Nasutapis; F, Allodapulodes;
G, Dalloapula; H, Allodapula; I, Eucondylops; J, Exoneurella; K, Brevineura; L,
Inquilina; M, Exoneura; N, Macrogalea.
The classifications include what Michener terms three "different lists of genera." The cladistic aspect of each is specified in the cladograms of Fig. 2. Of
Michener's lists, he states that the first is
based on "all variables so far analyzed"
(cf., Fig. 2.1); the second, on "larval variables only (fig. 7)" (cf., Fig. 2.2); the
third, on "adult external variables only
(fig. 9)" (cf., Fig. 2.3).
CLADISTIC ANALYSIS: DEFINITIONS
Cladistic analysis is the analysis of branching. Specifically, the analysis is of cladograms (hierarchically branching diagrams), or classifications that may be represented by cladograms. The analysis focuses on branch points and what they mean, in general and particular. Some examples of cladograms, based on Michener's bee study, are shown in Figs. 1-2.
Cladistic Components
Cladistic components are branch points. A particular branch point is defined by the branch tips (terminals, or terms) to which it leads. Consider cladogram 1.1, with 12 branch points, numbered 0-11. Branch point 0 is that branch point leading to terms A-N or, in other words,
Component 0 = A-N
Component 1 = AB
Etc.
There are 21 different cladistic components in the cladograms of Figs. 1-2. They are listed in Table 1, with their distribution among the various cladograms. Component 0, defined as that branch point leading to all branch tips, has properties unlike components 1-20, for it appears in all cladograms of the 14 taxa.
TABLE 1. ANALYSIS OF CLADISTIC COMPONENTS OF CLADOGRAMS, AND OF THEIR INFORMATION AND EFFICIENCY.1
Fundamental cladograms: 1.1, Phylogram; 1.2, Phenogram: Larvae; 1.3, Phenogram: Pupae; 1.4, Phenogram: Adults; 1.5, Phenogram: Males.
Derivative cladograms: 2.1, All Variables; 2.2, Larvae; 2.3, Adults; 2.4, Alternative.

Components                      Probability of   Fundamental cladograms    Derivative cladograms
No.  Definition   Information   replication (%)  1.1  1.2  1.3  1.4  1.5   2.1  2.2  2.3  2.4
 1   AB           2             1.1              +    +    -    -    -     -    +    -    +
 2   C-N          12            1.1              +    -    -    -    -     -    -    -    -
 3   C-E          3             0.27             +    +    +    -    +     -    -    -    +
 4   DE           2             1.1              +    +    +    -    -     -    +    -    -
 5   F-M          8             0.033            +    -    -    -    -     -    -    -    -
 6   F-I          4             0.10             +    +    -    -    -     -    -    -    +
 7   G-I          3             0.27             +    +    -    -    -     -    +    -    -
 8   HI           2             1.1              +    +    -    -    -     -    -    -    -
 9   J-M          4             0.10             +    -    -    +    +     -    -    -    +
10   K-M          3             0.27             +    +    -    -    -     -    +    -    -
11   LM           2             1.1              +    -    -    -    -     -    -    -    -
12   ABF-M        10            0.10             -    -    +    -    -     -    -    -    -
13   ABF-HK-M     8             0.033            -    -    +    -    -     -    -    -    -
14   A-HJ-M       12            1.1              -    -    -    +    -     -    -    -    -
15   A-DF-H       7             0.029            -    -    -    +    -     -    -    -    -
16   ADF-H        5             0.050            -    -    -    +    -     -    -    +    -
17   JKM          3             0.27             -    -    -    +    -     -    -    +    -
18   ABF-H        5             0.050            -    -    -    -    +     -    -    -    -
19   FGH          3             -                -    -    -    -    -     +    -    -    -
20   KM           2             -                -    -    -    -    -     +    -    -    -

Information, Component (%):                      92   58   33   42   25    17   33   17   33
Information, Term (terms):                       45   19   23   31   12    5    10   8    13
Efficiency, Component (%)2:                      -    -    -    -    -     0    57   40   36
Efficiency, Term (%)2:                           -    -    -    -    -     0    53   26   29
Efficiency, Cladistic or Overall (%)2:           -    -    -    -    -     0    55   33   32

1 Component present (+); component absent (-).
2 Efficiency for cladogram 2.4 is computed by comparison with cladogram 1.1.
Components 1-20 need not appear in all
cladograms, and, indeed, none of them
does. For cladistic analysis, component
0 is irrelevant.
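As an illustration of how components follow from a branching diagram, the sketch below encodes a cladogram as nested Python tuples and collects, for each branch point, the set of terms to which it leads. The nested-tuple encoding, the function name `components`, and the particular topology (a reading of components 1-11 of cladogram 1.1 from Table 1) are assumptions made for illustration; they are not part of the original analysis.

```python
def components(node):
    """Return (terms, comps) for a nested-tuple cladogram.

    terms: frozenset of terminal labels below `node`.
    comps: list with one frozenset of terms per branch point below `node`.
    """
    if isinstance(node, str):              # a branch tip (term)
        return frozenset([node]), []
    terms, comps = frozenset(), []
    for child in node:                     # a branch point with two or more branches
        child_terms, child_comps = components(child)
        terms |= child_terms
        comps.extend(child_comps)
    comps.append(terms)                    # the definition of this branch point
    return terms, comps


# Hypothetical encoding of cladogram 1.1 (assumed from components 1-11).
CLADOGRAM_1_1 = (
    ("A", "B"),
    (
        ("C", ("D", "E")),
        (("F", ("G", ("H", "I"))), ("J", ("K", ("L", "M")))),
        "N",
    ),
)

all_terms, comps = components(CLADOGRAM_1_1)
# Component 0 leads to every term; it is set aside as irrelevant.
informative = [c for c in comps if c != all_terms]
print(len(comps), len(informative))
```

Under this encoding the branch point leading to A-N is found like any other, and dropping it leaves the components that can differ from one cladogram to another.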
Fundamental and Derivative
Cladograms
The basic principle of cladistic analysis, so far as systematic biology is concerned, is that a hierarchical classification has a cladistic aspect that may be
specified in a cladogram, wherein a taxon
may be represented either as a tip of a
branch (terminal taxon) or a branch point
(inclusive taxon).
Two major types of cladograms may be
distinguished: (1) fundamental cladograms, such as those of Fig. 1, which
summarize data about the interrelationships between terminal taxa; and (2) derivative cladograms (classifications), such
as those of Fig. 2, which specify the cladistic aspect of hierarchical classifications. Systematic study of the interrelationships of the species of a certain group
generally results first in one or more fundamental cladograms (Fig. 1), which are
used as a basis for constructing one or
more derivative cladograms, or hierarchical classifications (Fig. 2). In a given
study, the fundamental and derivative
cladograms may agree or differ. If they
agree, they agree because they include
the same components. If they differ, they
differ only because they include different
components, or different sets of components. Of the cladograms of Figs. 1-2,
each specifies a different set of components (Table 1).
In my earlier commentary (Nelson, 1978), I considered an alternative phylogram somewhat more detailed than Michener's (cf., Figs. 1.1, 1.6). The alternative was obtained by splitting four of Michener's genera and creating four new taxa (A', D', H', M'). I considered also an alternative classification, of which the cladistic aspect is specified in Fig. 2.4, with the omission of the new, as yet undefined, taxa mentioned above. Omitting them makes for easier comparison among the different classifications, without significantly changing the results (see below).
Information Content of a Component
The information content of a component is the number of terms in its definition. A branch point leading to two branch tips has an information content of two terms; a branch point leading to 12 branch tips, 12 terms; and so on. The information of each of the 20 components is listed in Table 1.
Information Content of a Cladogram
The information content of a cladogram has two parts, here called the component information (measured in components) and the term information (measured in terms). The component information is the number of components, or branch points, of a cladogram, expressed as a fraction of the total possible, or as a percentage. With disregard for the initial branch point (component 0), the total possible is two less than the number of branch tips. The component information of the cladograms is given in Table 1.
The term information is the sum of the information of all individual components of a cladogram. Consider cladogram 1.1. It has 11 components. Component 1, with two terms, has an information content of 2 terms; component 2, 12 terms; component 3, 3 terms, etc. For cladogram 1.1 the total term information is the sum of 2 + 12 + 3, etc., = 45 terms. The term information of the cladograms is given in Table 1.
Derivation of Cladograms
A derivative cladogram is a graphic representation of a hierarchical classification. The word "derivative" implies derivation from a fundamental cladogram. Michener supplies three classifications, hence three derivative cladograms (2.1-3). All three may be compared with the fundamental cladograms (Figs. 1.1-5), and two of the three invite comparison with particular fundamental cladograms (2.2 with 1.2, and 2.3 with 1.4).
Comparison 1 (2.2 and 1.2). The classification (derivative cladogram) of larvae contains components 1, 4, 7, 10. The phenogram (fundamental cladogram) for larvae contains components 1, 3, 4, 6, 7, 8, 10. The classification may be said to be derived by accepting components 1, 4, 7, 10 and rejecting components 3, 6, 8.
Comparison 2 (2.3 and 1.4). The classification (derivative cladogram) of adults contains components 16 and 17. The phenogram (fundamental cladogram) for adults contains components 9, 14, 15, 16, 17. The classification may be said to be derived by accepting components 16 and 17 and rejecting components 9, 14, 15.
Comparison 3 (2.1 and all fundamental cladograms). The classification (derivative cladogram) based on all variables contains two components (19, 20), which are absent from all fundamental cladograms (1.1-5). The classification may be said to be derived by accepting components 19 and 20 (from unknown sources) and rejecting components 1-18.
Other comparisons. Other comparisons are possible, for example, between cladograms 2.2 and 1.1, or 2.2 and 1.3, which would reveal various agreements or the lack thereof.
Efficiency
Inasmuch as derivative cladograms invite comparison with particular fundamental cladograms, such that one might be said to be derived from the other, a given derivative cladogram may be said to be more or less efficient in representing a given fundamental cladogram. Component efficiency may be expressed as the fraction of derived to fundamental components. Consider derivative cladogram 2.2 in relation to fundamental cladogram 1.2: four of seven components are represented; the derivative cladogram may be said to have a component efficiency of 4/7, or 57 percent. Term efficiency may be expressed as the fraction of derived to fundamental terms. Consider again cladograms 2.2 and 1.2: 10 of 19 terms are represented; the derivative cladogram may be said to have a term efficiency of 53 percent. Cladistic, or overall, efficiency may be expressed as an average of the two, 55 percent. Efficiency of derivative cladograms is listed in Table 1.
INTERPRETATION OF ANALYSIS
Phenetics and Phyletics
Consider derivative cladogram 2.2 (the classification of larvae). All of its components are represented in fundamental cladogram 1.2 (the phenogram for larvae); all of its components are represented also in fundamental cladogram 1.1 (the phylogram). One may ask, what is the nature of the classification? Is it purely phenetic, purely phyletic, or both? It would seem to be both.
Consider derivative cladogram 2.3 (the classification of adults). Its components are represented only in fundamental cladogram 1.4 (the phenogram for adults). What is the nature of the classification? It would seem to be purely phenetic.
Consider derivative cladogram 2.1 (the classification based on all variables). Its components are represented in no fundamental cladogram. What is the nature of the classification? It would seem to have no nature that can be specified, except to say that it is derived from unknown sources.
That Michener presents three classifications, all differently derived, might be considered curious or unusual. One may ask, why three and three only? Or, alternatively, why not just one?
Chaos vs. Order
Confronted with five different fundamental cladograms, one might assume that each tells a different story of its own: that nature is chaotic; evolution, discordant; convergence, common; etc. This assumption might lead to the idea that the task of systematics is to impose order upon a chaotic nature. An alternative assumption is that each cladogram tells the same, or part of the same story: that nature is ordered; that discordant evolution and convergence are mere artifacts of human perception. This assumption might lead to the idea that the task of systematics is to discover and record nature's order and to embody it in classification.
Thus, different philosophies of systematics are possible. They have been recognized for a long time, as exemplified in the discussion of artificial vs. natural systems, a discussion that is about 200 years old.
To some extent one may try to avoid the pitfalls of philosophy by a judicious choice of questions. For example, rather than assume that nature is chaotic (and opt for some artificial system), or, alternatively, assume that nature is ordered (and ponder the mystery of the natural system), one may ask, how chaos or order might be discovered, granting the possibility that each might exist? For chaos, an immediate, but facile, answer is possible, with reference to the diversity of fundamental cladograms that might be at hand (Fig. 1). For order, in contrast, an immediate answer is problematical. The question, however, may be restated: given a diversity of fundamental cladograms (Fig. 1), what single story might they all be telling?
With cladistic analysis, fundamental cladograms may be analyzed into their components (Table 1). The analysis merely specifies, at an elemental level, the structure of fundamental cladograms. Yet the single most important principle of cladistics is that diverse fundamental cladograms may be combined to form a single general cladogram. It is easiest to view the process in two stages (analysis, followed by synthesis).
CLADISTIC SYNTHESIS: DEFINITIONS
The objective of cladistic synthesis is a single general cladogram. There might seem to be various approaches to this objective, for there are, in truth, a variety of cladistic philosophies at various levels of generality. One might assume, for example, that the phylogram (Fig. 1.1), if it has been resolved by combining various data according to some phyletic clustering technique, is the general cladogram, and that any departure from it is discord, convergence, parallelism, etc. In that sense the phylogram would be more general than the data, but no more general than the particular clustering technique. Alternatively, one might assume that some phenogram (Fig. 1.2-5), if it has been resolved by combining various data according to some phenetic clustering technique, is the general cladogram. Such assumptions create different philosophies, in these cases, particular philosophies of phyletics and phenetics that are bound to one or another particular technique of combining data. Other philosophies, of course, are possible. One might assume, for example, that the general cladogram is yet another, one that portrays grades or adaptive zones, resolved by combining data according to some particular gradistic clustering technique.
The pitfalls of philosophy are many, and many of them are deep. However deep, no pitfall is satisfactory as a solution at a general level. A general solution requires not combining data according to some particular clustering technique, nor even combining various clustering techniques (which would result merely in a hybrid technique), but rather the combining of the results of clustering techniques in general.
Again, one may turn toward a judicious choice of questions. Rather than assume that the general cladogram must be phyletic, or phenetic, or gradistic, or whatever, one may ask, how a general cladogram might be discovered, given the possibility that it exists? What follows is an attempt to answer this question within the context of cladistic synthesis.
Component Relations
One component is more or less like another, such that any two components share one of four possible relations:
Exclusion. Components are combinable and exclusive if their definitions are different and non-overlapping. Example: component 1, with the definition AB, and component 4, with the definition DE (Table 1).
Inclusion. Components are combinable and inclusive if their definitions are different and overlapping, such that one is included in the other. Example: component 3, with the definition CDE, and component 4, with the definition DE. For each inclusive relation one component is a whole (3), and the other (4) is a part.
Non-combinability. Components are non-combinable if their definitions are different and overlapping, such that neither is completely included in the other. Example: component 11, with the definition LM, and component 17, with the definition JKM.
Replication. Components are replicated if their definitions are the same, in which case there are not two components, but merely one that is replicated.
These four relations may alternatively be viewed as two:
Combinability (Inclusion, Exclusion, and Replication). Components are combinable if they can be parts of the same cladogram.
Non-combinability. Components are non-combinable if they cannot be parts of the same cladogram.1
1 These views of component relations presuppose that the reader understands the term "cladogram," by which I basically mean a hierarchically branching diagram in which all species, both fossil and recent, are terminal in position. Also, cladograms are always non-reticulate (if reticulation is allowed, "non-combinable" components may be combined in a single cladogram). In the phyletic context, reticulation implies hybrid origin of species, a possibility that is subsumed, in a different guise, under this concept of cladograms. An instance of hybridization would be represented in fundamental cladograms by non-combinable components that exhibit non-random replication, and in the general cladogram by tri- or polytomies that present conflicting, but non-random, possibilities for dichotomous resolution (as exemplified in fundamental cladograms). A branching diagram showing some supposed hybrid origin of species (reticulation) is a structure, of lower generality than a cladogram, that I have elsewhere suggested be termed a "phyletic tree." The distinction between cladograms and trees is best appreciated by considering the meaning attached to branch points (components). In a cladogram, the branch point represents the generality of supposedly true statements ("synapomorphies") that can be made about the terminal taxa. In a phyletic tree, the branch point represents a supposed speciation event. A cladogram is an atemporal concept; a phyletic tree is a cladogram to which the temporal aspect has been added. In short, a cladogram is a synapomorphy scheme; a phyletic tree, a phylogeny. The two concepts are related: many different trees may be derived from a single cladogram; but only one cladogram can be derived from any given tree. Deriving a tree from a cladogram is apt to change the structure of the branching diagram, for example, when taxa that are designated as "ancestors" are shifted from terminal positions to branch points. To state that a cladogram is a synapomorphy scheme invites the rejoinder that a cladogram must, therefore, be a phyletic concept. Not so, for by "synapomorphy" I mean "defining character" of an inclusive taxon. True, all defining characters, in the phyletic context, may be assumed to be evolutionary novelties. But making that assumption does not render it automatically true; nor does it change the characters, the observations on which the characters are based, or the structure of the branching diagram that expresses the general sense of the characters: i.e., "that there exist certain inclusive taxa (components 1, 2, . . .) that have defining characters." That the taxa may be assumed to be "monophyletic" (derived from a single ancestral species) does not change the structure of the branching diagram, but merely adds the phyletic context of interpretation; hence the diagram becomes a phylogram (phylocladogram). It is still atemporal in the sense that it is necessarily non-reticulate and that its branch points are only mistakenly construed as speciation events. To equate "synapomorphy" and "defining character" invites the rejoinder that a branching diagram specifying only "polythetic" inclusive taxa is no cladogram, because no defining characters are present. Not so. Even if the inclusive taxa are assumed to lack defining characters (because one is not yet known), that assumption does not change the fact that the branching diagram expresses the general sense of defining characters even in their absence: i.e., "if there were some defining characters, these are the inclusive taxa (components 1, 2, . . .) that the characters would define." In short, finding defining characters for each of the "polythetic" taxa would not change the structure of the branching diagram, but merely add replications of components.
[Figure 3 panels: 3.1, the replicated components; 3.2, a branching diagram with terminal taxa A-N.]
FIG. 3. Replicated components (3.1) and their resulting cladogram (3.2).
Replicated Components
Replicated components are significant because the probability of replication for any given component, due to chance alone, is small. In this case (Michener's 14 genera of bees) the probability is 1 percent or less (Table 1). Of interest is the fact that the probability is not a function of the fundamental cladogram in which a component appears, but rather a function of the number of terms in the component's definition (the information content). Consider, for example, component 1. It has two terms. It is one of 91 possible 2-term components that might
result from clustering 14 terminal taxa. Hence its probability of replication, due to chance alone, is about 1 percent.
TABLE 2. PROBABILITIES (P) OF REPLICATION OF COMPONENTS (C) IN FUNDAMENTAL CLADOGRAMS.1

Number    Number of     Fundamental cladograms: C (P, %)                                                      P for components   P
of        possible                                                                                            of same number     cumulative,
terms     components    1.1           1.2                            1.3         1.4         1.5              of terms, %        %
2         91            1, 4, 8, 11   1 (4.40), 4 (4.55), 8 (4.44)   4 (4.40)    -           -                3.90 x 10^-4       3.90 x 10^-4
3         364           3, 7, 10      3 (3.75), 7 (4.08), 10 (3.85)  3 (1.29)    17          3 (1.10)         8.37 x 10^-7       3.27 x 10^-12
4         1,001         6, 9          6 (16.7)                       -           9 (0.59)    9 (0.59)         5.73 x 10^-4       1.87 x 10^-17
5         2,002         -             -                              -           16          18               -                  -
7         3,432         -             -                              -           15          -                -                  -
8         3,003         5             -                              13          -           -                -                  -
10        1,001         -             -                              12          -           -                -                  -
12        91            2             -                              -           14          -                -                  -

1 For a component that replicates a previously resolved component, the probability is that of replication due to chance alone, under the assumption that the components are resolved one at a time in the order listed for each cladogram. The probability fractions for replicates are: cladogram 1.2 (4/91, 3/66, 2/45, 3/80, 2/49, 1/26, 2/12); 1.3 (4/91, 3/232); 1.4 (2/341); 1.5 (4/364, 2/341). A component that is resolved for the first time has 0 probability of replicating a previously resolved component.
In the fundamental cladograms (Table 2), there are eight 2-term components, of which four are replicates (P = 10⁻⁴%); there are nine 3-term components, of which five are replicates (P = 10⁻⁷%); and there are five 4-term components, of which three are replicates (P = 10⁻⁴%). One may doubt that these replicates are due to chance alone (P = 10⁻¹⁷%). There are three different 2-term components that are replicated; three different 3-term components; and two different 4-term components. The replicated components may be combined in one cladogram (Fig.
3.1-2). One may doubt that this combination is due to chance alone (P =
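The chance probabilities quoted in this section can be checked with a short calculation. The sketch assumes, as the text does, that a k-term component is one of C(14, k) equally likely k-term components, and it takes the probability fractions for the 2-term replicates from the footnote to Table 2. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb, prod

N_TIPS = 14  # terminal taxa A-N


def possible_components(k, n_tips=N_TIPS):
    """Number of distinct k-term components that n_tips taxa could form."""
    return comb(n_tips, k)


def chance_of_replication(k, n_tips=N_TIPS):
    """Probability that one particular k-term component arises by chance."""
    return Fraction(1, possible_components(k, n_tips))


# Fractions for the four 2-term replicates (footnote to Table 2):
# cladogram 1.2 (4/91, 3/66, 2/45) and cladogram 1.3 (4/91).
two_term_replicates = [
    Fraction(4, 91), Fraction(3, 66), Fraction(2, 45), Fraction(4, 91),
]
joint = prod(two_term_replicates)  # joint probability, as a fraction

print(possible_components(2))                  # possible 2-term components
print(100 * float(chance_of_replication(2)))   # percent, a little over 1
print(100 * float(joint))                      # percent, on the order of 10^-4
```

Multiplying the fractions for the 3-term and 4-term replicates in the same way, and then multiplying the three products together, gives the cumulative probability reported in Table 2.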
General Cladogram
The cladogram specified by the replicates may be considered a first step toward a general cladogram. Inspection of
the definitions of components (Table 1)
shows that several are non-combinable
with the replicates (cladogram 3.2): components 13, 14, 15, 16, 17, 18. There is no
reason to doubt that these components
are due to chance alone. Of the remaining components (2, 5, 11, 12), each is
combinable (either exclusively or inclusively) with all replicates, but components 2 and 12 are mutually non-combinable. There is no reason to doubt that
these components are due to chance
alone. Nevertheless, four categories of
components may be recognized: (1) replicates (1, 3, 4, 6, 7, 8, 9, 10); (2) components non-combinable with replicates
(13, 14, 15, 16, 17, 18); (3) components
combinable with replicates and with
each other (5, 11); (4) components individually combinable with replicates, but
not with each other (2, 12). Only category
(3) is combinable with the replicates in
a single cladogram. Hence, categories (1)
and (3) may be combined in a general
cladogram (Fig. 4), which may be considered a best estimate of one, and the only
apparent, cladistic parameter.
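The sorting of components into four categories can be sketched as a small set computation. Here two components are treated as combinable when one includes the other or when they share no terms. The component definitions and their distribution among the fundamental cladograms are a reading of Tables 1-3 with ranges spelled out (the entries for cladograms 1.3 and 1.5 in particular should be checked against the printed table), and the function and variable names are assumptions made for illustration rather than the author's own procedure.

```python
from collections import Counter
from itertools import chain

# Component definitions, with ranges such as C-N written out in full.
DEFS = {
    1: "AB", 2: "CDEFGHIJKLMN", 3: "CDE", 4: "DE", 5: "FGHIJKLM",
    6: "FGHI", 7: "GHI", 8: "HI", 9: "JKLM", 10: "KLM", 11: "LM",
    12: "ABFGHIJKLM", 13: "ABFGHKLM", 14: "ABCDEFGHJKLM",
    15: "ABCDFGH", 16: "ADFGH", 17: "JKM", 18: "ABFGH",
}

# Components present in each fundamental cladogram (assumed reading).
FUNDAMENTAL = {
    "1.1": list(range(1, 12)),
    "1.2": [1, 3, 4, 6, 7, 8, 10],
    "1.3": [3, 4, 12, 13],
    "1.4": [9, 14, 15, 16, 17],
    "1.5": [3, 9, 18],
}


def combinable(a, b):
    """True if components a and b can be parts of the same cladogram."""
    sa, sb = set(DEFS[a]), set(DEFS[b])
    return sa <= sb or sb <= sa or sa.isdisjoint(sb)


def categorize():
    """Return the four categories as sorted lists of component numbers."""
    counts = Counter(chain.from_iterable(FUNDAMENTAL.values()))
    replicates = sorted(c for c, n in counts.items() if n > 1)
    singles = sorted(c for c, n in counts.items() if n == 1)
    # Category 2: non-combinable with at least one replicate.
    cat2 = [c for c in singles
            if not all(combinable(c, r) for r in replicates)]
    remaining = [c for c in singles if c not in cat2]
    # Category 3: combinable with the replicates and with each other.
    cat3 = [c for c in remaining
            if all(combinable(c, other) for other in remaining)]
    # Category 4: combinable with the replicates, but not with each other.
    cat4 = [c for c in remaining if c not in cat3]
    return replicates, cat2, cat3, cat4


replicates, cat2, cat3, cat4 = categorize()
general_cladogram = sorted(replicates + cat3)
print(replicates, cat2, cat3, cat4)
print(general_cladogram)
```

The components of the general cladogram are then simply categories (1) and (3) taken together.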
Consider cladogram 3.2 (replicates)
and cladogram 4 (general cladogram).
Category (3) contributes two additional
components (5 and 11)—25 percent of the
replicates. What can be said on behalf of
the added components? Not much beyond the statement that they are combinable with the replicates and with each
other—a statement of less significance
than, and a statement of which the significance depends upon, the phenomenon of improbable replication. Component 5 is one of 15 possible components
that could be formed by component 6 (or
component 9) in exclusive relation with
components 1 and 3 and term N. Component 11 is one of three possible components in inclusive relation with component 10. Hence, there are 15 x 3 = 45
possible combinations. And there is only
one chance in 45 that components 5 and
11 might accurately estimate the cladistic
parameter, under the assumption that the
replicates are an accurate estimation.
That the probability is on the order of 2 percent, rather than 10⁻³ percent, is due wholly to the improbable pattern of replication, which determines that there are 45, rather than 400,000, possible combinations.
The general cladogram (Fig. 4) may be
compared with the fundamental cladograms (Fig. 1.1-5). Its components are all
included in the phylogram (Fig. 1.1), and
it might, therefore, be termed purely
"phyletic." Most of its components are
included also in the phenogram for larvae
(Fig. 1.2), and at least one of its components appears in each of the other
phenograms (Fig. 1.3-5), and it might,
therefore, also be termed somewhat "phenetic." In what follows, the components of the general cladogram are termed "true components"; those non-combinable with the general cladogram (category 2), "false components"; those mutually non-combinable (category 4), "ambiguous components."
FIG. 4. The general cladogram derived from the fundamental cladograms of Fig. 1 (1.1-5).
TABLE 3. ANALYSIS OF GENERAL INFORMATION AND GENERAL EFFICIENCY OF CLADOGRAMS.

            Component Information                            Term Information                     Efficiency (%)
            Components  Components that are:       Overall   Terms   Terms that are:                                  Cladistic
Cladogram   (total)     True   False   Ambiguous   (%)       (tot.)  True   False   Ambiguous     Component   Term    or Overall
1.1         11          10     0       1           88        45      33     0       12            105         118     112
1.2         7           7      0       0           58        19      19     0       0             70          58      64
1.3         4           2      1       1           12        23      5      8       10            15          6       11
1.4         5           1      4       0           -25       31      4      27      0             -30         -70     -50
1.5         3           2      1       0           8         12      7      5       0             10          6       8
2.1         2           0      2       0           -17       5       0      5       0             -20         -15     -18
2.2         4           4      0       0           33        10      10     0       0             40          30      35
2.3         2           0      2       0           -17       8       0      8       0             -20         -24     -22
2.4         4           4      0       0           33        13      13     0       0             40          39      40
General Component Information
Given a general cladogram with its
"true components," it may be compared
with fundamental and derivative cladograms (Table 3), and their information
content may be broken down into three
categories: true, false, ambiguous (by
"true" and "false" I mean only agreement or disagreement with the general
cladogram). For fundamental cladogram
1.1, which includes 11 components, 10
components are true, and one is ambiguous (component 2). One-half of the ambiguity may be allocated to the true category, resulting in 10.5 true components.
Given 12 components as a maximum, the
general component information is, therefore, 10.5/12, or 88 percent. For fundamental cladogram 1.3, which includes
four components, two components are
true; one is false; and one, ambiguous.
The false may be subtracted from the
true, and the ambiguous allocated as
above. Given 12 components as a maximum, the general component information is, therefore, (2 - 1 + 0.5)/12, or 12
percent. General component information
for all cladograms is given in Table 3.
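The calculation just described may be restated as a brief sketch. It assumes only what the text gives: a maximum of 12 components, false components subtracted from true ones, and one-half of the ambiguous components allocated to the true category. The function name is illustrative.

```python
def general_component_information(true, false, ambiguous, maximum=12):
    """Percent general component information: true components minus false
    components, plus one-half of the ambiguous components, relative to the
    maximum number of components."""
    return 100 * (true - false + 0.5 * ambiguous) / maximum

print(general_component_information(10, 0, 1))  # 87.5, reported as 88 percent (cladogram 1.1)
print(general_component_information(2, 1, 1))   # 12.5, reported as 12 percent (cladogram 1.3)
```

The same function, applied to the counts of true, false, and ambiguous components of the other cladograms, gives negative values whenever the false components outnumber the true ones.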
General Term Information
The maximum value of term information, even for dichotomous cladograms,
is not a unique amount, but depends
upon the branching pattern. For a given
cladogram, the general term information
may be expressed by the total terms, rather than a fraction, or percentage. For fundamental cladogram 1.1, with 45 terms,
33 are true and 12 ambiguous (Table 3).
General Efficiency
Efficiency is calculable when there is
a cladogram that can serve as a standard
for comparison. General efficiency is calculable for both fundamental and derivative cladograms, with the general cladogram serving as a standard. Consider
fundamental cladogram 1.1, with 11 components (10 true, 1 ambiguous) and 45
terms (33 true, 12 ambiguous). Again,
one-half of the ambiguity may be allocated to true information. Accordingly, the
general component efficiency is (10 +
0.5)/10, or 105 percent; and the general
term efficiency is (33 + 6)/33, or 118 percent. Both values may be combined as a
general cladistic, or overall, efficiency,
by adding them together and dividing by
two: 112 percent. Consider fundamental
cladogram 1.3, with four components (2
true, 1 false, 1 ambiguous) and 23 terms
(5 true, 8 false, 10 ambiguous). Its general
component efficiency is (2 - 1 + 0.5)/10,
or 15 percent. Its general term efficiency
is (5 - 8 + 5)/33, or 6 percent. And its
general cladistic, or overall, efficiency is
11 percent. In cladograms with ambiguity, general efficiency may exceed 100
percent—indicating that the cladograms
are, in some sense, too efficient. In cladograms with false information, both general information and general efficiency
may achieve negative values (Table 3).
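The efficiency calculations can be sketched in the same way. The divisors 10 and 33 are taken directly from the worked examples above; that they stand for the components and terms of the standard (the general cladogram) is an assumption of this sketch, as are the function names.

```python
def net_score(true, false, ambiguous):
    """True items minus false items, with one-half of the ambiguous items
    allocated to the true category."""
    return true - false + 0.5 * ambiguous

def general_efficiency(components, terms, standard_components=10, standard_terms=33):
    """Return (component, term, overall) efficiency in percent.
    `components` and `terms` are (true, false, ambiguous) counts."""
    component_eff = 100 * net_score(*components) / standard_components
    term_eff = 100 * net_score(*terms) / standard_terms
    overall = (component_eff + term_eff) / 2
    return component_eff, term_eff, overall

examples = [
    ("1.1", (10, 0, 1), (33, 0, 12)),
    ("1.3", (2, 1, 1), (5, 8, 10)),
]
for name, components, terms in examples:
    c, t, o = general_efficiency(components, terms)
    print(name, round(c), round(t), round(o))
# 1.1 105 118 112
# 1.3 15 6 11
```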
Optimum Classification
What is the optimum, or best, kind of classification? A variety of answers is possible, in accordance with the variety of possible taxonomic philosophies. According to one such philosophy, there is no optimum at all. The assumption that there is an optimum defines a philosophy that I call cladistics, a modern version of an old tradition in systematics, namely the theory and practice of methods to resolve the natural system, or classification.
Discussion of optimality has generated a dispute between adherents of phyletic, phenetic, and gradistic philosophies, all of which are cladistic, under the assumption that any optimal classification has a cladistic aspect that may be represented by a cladogram, which is true for all attempts at hierarchical classification. Thus, there is a generally optimal classification, of which the cladistic aspect is represented in a general cladogram.
It is entirely possible, of course, that some methods are more efficient than others in estimating a general cladogram. To judge from Michener's bee study, his phyletic methods are more efficient than his phenetic methods. But his phenetic methods are not without value. They contribute replications and, apparently, generate random variation (i.e., noncombinable components) that may be discriminated as such and set aside. Without methods that generate random variation, there would be no need for cladistic analysis and synthesis, for their results would be intuitively self-evident in advance (which is near the truth anyway, for much of the general cladogram can be immediately apprehended by anyone inspecting the fundamental cladograms with an eye for the replications that they contain; in short, cladistic analysis and synthesis can operate at an intuitive level).
MICHENER'S CLASSIFICATION OF BEES
Michener (1977) provided three classifications of bees, one based on all variables, one based on larvae, one based on external features of adults. He did not say so explicitly, but I assumed that the classification based on all variables was offered by him as a general classification. Because of the incongruity of this classification, I proposed an alternative in my previous commentary (Fig. 2.4). The four classifications are analyzed in Table 3. They differ widely in their information and efficiency. What are the conclusions to be drawn?
The basic question is about optimal classification: is classification to reflect a parameter? If yes, then one may judge one classification better than another in accordance with its agreement with some standard estimate of that parameter. If no, then one may not. To date, various parameters have been suggested: phyletic, phenetic, gradistic, etc., all of which are cladistic in the sense that each may be represented by a cladogram. I suggest the parameter estimated by the general cladogram: the cladistic parameter.
There is no necessary correlation between the cladistic parameter and any one discipline, such as phyletics, phenetics, or gradistics. In the case of Michener's bees, the general cladogram agrees better with Michener's phylogram than with three of his four phenograms, but it agrees better with one phenogram than with the other three. The agreement between estimates of the cladistic parameter and phyletics may prove general, at least in theory, but who knows? In another case, phyletic methods, applied ineptly but with the best of intentions, might generate pure noise; and phenetic methods, with a careful analysis of characters, pure signal.
In his commentary, Michener (1978) makes various statements about cladistics, which he conceives as synonymous with phyletics. In my opinion he does justice to neither subject, and his argument is an argument over words, particularly the word "cladistic," the meaning of which he considers established through his own usage of it. In using the word "cladistic" as I do, I suppose I invite his criticism and, perhaps, that of other persons of his persuasion, who believe that the bible of "cladistics" was written by Hennig (1966). Hennig, in fact, never used the word, which was coined by his critics (Mayr, Sokal, Darlington, Simpson et al.), who argued that Hennig's philosophy was hopelessly narrow-minded and, therefore, deserved to receive a special name. Alas! The critics never seemed quite to understand what they were criticizing to begin with. "Well," according to the old saying, "no good deed goes unpunished."
The conclusions to be drawn? I hope you, the reader, will consider the concepts of cladistic parameter and general cladogram, their implications, their possible applications, and particularly whether the concepts enlighten systematic practice, both past and present (never mind the theory). On these matters I invite you to draw your own conclusions. As for bees and Michener's classifications of them, I suggest that you view them as I do, as interesting examples of contemporary practice in systematics, from which much can be learned, and value them accordingly.
INTERPRETATION OF SYNTHESIS
General Pattern
Traditionally conceived, systematics is the search for general pattern in the living world. During the history of systematics, pattern has been resolved piecemeal, and its resolution has never been claimed to be complete. Discovery of pattern has depended upon sampling, and confidence in the results has been a function of prediction and replication. Traditionally, sample size has been small. With the advent of phenetic methods, stress was placed on large samples, in the hope that their results would be imperturbable with further sampling, and in the hope that the results would estimate a parametric overall-similarity. The recommendation was made that further samples be pooled with the original data, with the consequence that prediction dissolves into non-falsifiability, which, as I see it, is the Achilles' heel of purely phenetic philosophy. Hence, the time is ripe for consideration of small samples, the extent to which elements of cladistic pattern (components) are replicated therein, and the probability of their occurrence due to chance alone. These are the traditional virtues of systematics.
The obvious weakness of phenetics is its assumption of parametric overall-similarity, which is the reason for the recommendation for pooling further samples with the original data (presumably resulting in a more reliable estimate of overall-similarity). The usefulness of the idea of overall similarity would seem to consist only in its implication that, given estimates of the overall-similarities for the species of a certain group (a phenogram), the estimates specify a cladogram, i.e., an estimate of the cladistic parameter. But there is no reason to believe that the cladistic parameter depends for its existence upon the phenetic. Even if there is no parametric overall-similarity, there may yet be a cladistic parameter. Consider, for example, the hypothetical phenograms of Fig. 5. Each specifies estimates of the overall-similarities for 14 taxa of a certain group. The estimates of the two phenograms differ in all particulars, but the cladistic aspect is the same for both.

FIG. 5. Two hypothetical phenograms that differ in their statements about similarity, but specify the same cladistic components. The scales are of similarity (however measured).

Of interest is a comparison between the efficiency of the fundamental cladograms and the number of characters upon which each cladogram is based:

Cladogram   Characters   Efficiency
1.1         about 30     112%
1.2         54           64%
1.3         46           11%
1.4         144          -50%
1.5         25           8%

There seems to be no positive correlation, but there might be a negative one,
with the implication that, with large
numbers of characters processed by phenetic techniques, virtually all resolved
components might be due to chance
alone—a possibility enhanced by the
findings of Mickevich (1978).
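The suggestion of a negative relation can be examined with a product-moment correlation computed from the five pairs of values tabulated above. This is only a sketch: "about 30" is taken as 30, the function `pearson` is written out here for illustration, and five points support no more than an impression.

```python
from math import sqrt

characters = [30, 54, 46, 144, 25]   # cladograms 1.1-1.5; "about 30" taken as 30
efficiency = [112, 64, 11, -50, 8]   # overall efficiency, in percent

def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson(characters, efficiency)
print(f"r = {r:.2f}")
```

The coefficient comes out negative, in keeping with the remark above, but it is dominated by cladogram 1.4, which has by far the most characters and the lowest efficiency.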
General Cladogram: Implications
Suppose that a general cladogram is resolved (Fig. 4). What does it mean? It
specifies a non-random pattern of interrelationships. One may ask, what is the
cause of the non-random pattern? Two
possibilities come to mind: (1) the pattern is a methodological artifact; (2) the
pattern reflects (and, therefore, estimates) an aspect (parameter) of the real
world that is method-independent. Given these two possibilities, how might the
truth be ascertained? Consider possibility (1): if the pattern is artifact, different
methods, to the extent that they are truly
independent, should produce different
(randomly non-combinable) patterns.
Consider possibility (2): if the pattern is
method-independent, different methods,
even if they are truly independent,
should produce the same non-random
pattern, or combinable parts of the same
pattern, in addition to some random variation.
There is already available enough information to assert that different clustering methods produce fundamental cladograms that are to some extent different
and to some extent the same. If the differences are artifact, and the samenesses
method-independent, then the problem
is only distinguishing signal from noise.
Resolving a general cladogram merely
specifies what is signal, given a number
of fundamental cladograms that are to
some extent the same, and to some extent
different. As here conceived, its resolution amounts to asking, and answering in
the affirmative and in detail, this question: given some fundamental cladograms, do they exhibit a non-random pattern of cladistic components?
One may ask, in addition, how this
question might best be explored and answered. Suppose that a certain clustering
method is applied to different data sets
in order to generate various fundamental
cladograms for the species of a certain
group, one cladogram for each data set.
A general cladogram might be resolved.
It could be an artifact of the clustering
method. To explore the possible bias of
a single clustering method, different
methods might be applied, each method
to a different data set, or all methods to
the same data set. Another general cladogram might be resolved. What is to be
expected in the way of agreement between the two general cladograms? The
present study suggests that both general
cladograms would estimate the same cladistic parameter.
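A comparison of two general cladograms for agreement might be sketched as follows. The sketch treats each component as the set of terminal taxa it includes, and assumes that two components are combinable when one includes the other or when they share no taxa; that reading of combinability, the taxa A to E, and the function names are illustrative assumptions, not definitions taken from this paper.

```python
def combinable(a, b):
    """Assumed rule: two components can stand in one cladogram if one
    includes the other or if they have no taxa in common."""
    return a <= b or b <= a or not (a & b)

def compare(first, second):
    """Return the components replicated in both cladograms, and the
    components of each that are non-combinable with the other cladogram."""
    replicated = first & second
    conflicts_first = {c for c in first if any(not combinable(c, d) for d in second)}
    conflicts_second = {d for d in second if any(not combinable(d, c) for c in first)}
    return replicated, conflicts_first, conflicts_second

# Hypothetical components for five taxa, A-E
first = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
second = {frozenset("AB"), frozenset("CD")}

replicated, conflicts_first, conflicts_second = compare(first, second)
print(sorted("".join(sorted(c)) for c in replicated))        # ['AB']
print(sorted("".join(sorted(c)) for c in conflicts_first))   # ['ABC', 'DE']
print(sorted("".join(sorted(c)) for c in conflicts_second))  # ['CD']
```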
Suppose that a general cladogram is resolved (Fig. 4). What then? Is it artifact or knowledge of the real world? The bolder, and non-trivial, hypothesis is the latter. How might this hypothesis be falsified in a given case? The answer seems clear: by resolving a second, and independent, general cladogram from a different, and independent, set of fundamental cladograms. The prediction is that the two general cladograms, even though independently resolved, will estimate the same cladistic parameter. The prediction is a null hypothesis: the two general cladograms will not significantly differ. The null hypothesis may be tested by sampling.
Suppose that the sampling is done, and that, in one's judgement, the two general cladograms differ significantly (I pass over the problem of what might constitute a significant difference). There seem to be four possibilities for an alternative: (1) there are two different cladistic parameters, each estimated by one general cladogram; (2) there is one cladistic parameter, estimated by one, or the other, or neither, of the general cladograms; (3) there is one cladistic parameter, estimated by the sameness (replication and combinability), if any, exhibited jointly by the two general cladograms; (4) there is no cladistic parameter, or parameters, and all pattern is artifact.
Possibility (1) seems purely ad hoc: "I still believe that there is one cladistic parameter (estimated by my first general cladogram), but now I believe that there is another." Possibility (2) also seems purely ad hoc: "I still believe that there is one cladistic parameter, but I have different estimates of it; I know not which to choose, but my first estimate might still be correct." Possibility (3) seems partly ad hoc, but is, nevertheless, interesting: "I still believe that there is one cladistic parameter, but I have revised my estimate of it; the revised estimate might prove more reliable." Possibility (4) seems trivial: "I no longer believe that there is a cladistic parameter, because my prediction was unfulfilled; and my hypothesis, falsified."
Possibility (3) seems unrejectable as an alternative, if there is some replication of components in the two general cladograms. The replicates, and components combinable with them, may be specified in a second-order general cladogram, i.e., a unique estimate of the cladistic parameter. Therefore, some replication between different general cladograms implies that possibility (3) is unrejectable, and that possibility (4) is merely false. I conclude that, even if the null hypothesis is falsified, the only alternatives are either partly ad hoc or trivial. Of the ad hoc possibilities, only (3) seems interesting.
I have asserted that possibility (4) seems trivial. It is a statement that replication of results (if any) is artifact, and non-combinability is an estimate of the real world. As such, it contradicts one of the premises of the falsification argument: namely, that the two general cladograms are, or should be, independently resolved. If independence is assumed, and significant replication is the result, the result is not artifact. Significant artifact can mean only non-independence in the resolution of the two general cladograms. Hence, possibility (4) is a misstatement. It purports to say something about the world, but in reality contradicts the premise of independence.
What then of the initial hypothesis: that a general cladogram is knowledge of the real world? Is it falsifiable? I conclude that it is not in a general sense, but that it is in a particular sense. In the general sense, all that a person can do is to search, but if he fails to find, the failure is no falsification. In the particular sense, a person may find one pattern rather than another.
What are the implications? The initial hypothesis can only be presumed. Once presumed, it may lead to particular null hypotheses that may be tested and, perhaps, falsified, and an alternative may be entertained. If this is the only implication, what is there in the way of a general theory of systematics? The answer seems clear: the theory of general cladograms. As here conceived, the theory includes phenetics, phyletics, gradistics, and any other theory that might be proposed, if its applications lead to results that might be rendered as hierarchically branching diagrams. The theory of general cladograms might be considered a contribution to the "omnispective" systematics of Blackwelder (1967).
At present it is difficult to evaluate the completeness of the theory of general cladograms. Two areas of theoretical problems are evident: (1) levels of significance of difference between two or more general cladograms; (2) criteria of independence in resolution of general cladograms. I do not consider either area particularly important in a practical sense; such a judgement would presuppose that systematic data are necessarily so noisy in themselves that only mathematical treatment can isolate a non-random pattern. The history of systematics belies that presupposition. Systematic data are simply not in themselves necessarily noisy. Hence, I see more practical significance arising from a theoretical consideration of systematic data (characters). If some data sets are noisy, why so? Are there remedies? Yes, I believe that there are, but I am not yet prepared to deal with this subject.
HISTORICAL NOTE
Michel Adanson has been regarded as an early advocate of phenetic systematics (Sokal and Sneath, 1963:50). Commentary from botanists later indicated otherwise (references in Sneath and Sokal, 1973:23). It seems agreed that the confusion over Adanson stems from statements made by A.-P. de Candolle (e.g., 1844:57, translated):
"After having reasoned that all parts of plants should be taken into consideration in the natural method, Adanson established, for each plant organ separately considered, one or more systems . . .: this enterprise resulted in the formation of 65 artificial systems. After this immense undertaking, Adanson reasoned that plant species found closely associated in the largest number of these systems are those that share among themselves the most relations, and are those that should be brought together in the natural order."
Perusal of Adanson's Familles des Plantes (1763-1764) reveals that none of his natural families is resolved in any of his 65 artificial systems, a point emphasized by Adanson to show the futility of artificial systems. It is evident that Adanson acquired his notions of natural families not by study of his artificial systems, but by independent, and unspecified means (Adanson, vol. 1:clvij-clviij, translated):
"These diverse remarks, in showing the utility of botanical exploration [discovery of new species in the tropics], explain why I became more and more convinced of the necessity to consider plants in a totally new fashion. I believed it necessary to abandon old prejudice in favor of [artificial] systems, and the ideas on which the systems are based and which limit our knowledge; and to search in nature herself for her system, if it is true that she really has one. In this belief I examined all parts of plants, without exception, from the roots to the embryo . . . . First I made a complete description of each plant species, considering each part in detail in a separate article; and to the extent that I encountered new species with relations to species already described, I described the new species separately, omitting the similarities and noting only their differences. It was by the ensemble of these comparative descriptions that I perceived that the plants placed themselves naturally in classes, or families, that could be neither artificial nor arbitrary, not based on one or a few parts that must vary within certain limits, but on all parts, such that the absence of one part [in a given species] would be replaced and balanced by the addition of another part that would restore the equilibrium."
To declare that artificial systems are futile (that they fail to resolve a single natural family), Adanson previously had to decide what the natural families are. True enough, he numerically rated each of the 65 artificial systems, not by counting the number of natural families that each system resolved (always 0), but by counting the natural families that each system failed to dismember.
Adanson summarized his results in a table (cf., Table 4), listing his 65 artificial systems and, in three additional columns, their number of classes, "sections," and "natural sections" ("sections natureles qu'ils conservent"). Modern commentators have gone wrong in interpreting the second and third columns. The numbers in column 2 have usually been taken to mean the number of secondary divisions in a system; and the numbers in column 3, as the number of sections (secondary divisions) that are natural, that is, that correspond to his 58 natural families. Hence, the third column has been taken as some indication of the goodness, or naturalness, of each of his 65 systems. And his analysis of the 65 systems has been taken as the means by which he resolved his 58 natural families.

FIG. 6. System no. 33 of Adanson's 65 artificial systems. The three classes of this system are represented by the three vertical columns. His 58 natural families are listed to the left. The 35 "natural families conserved" are noted to the right.

The meaning of Adanson's table may
be considered with reference to an analysis of one (no. 33) of his 65 systems—a
simple one with only three classes, represented by the vertical columns of Fig.
6. As he did for each of his 65 artificial
systems, Adanson specified how his natural families are distributed among the
three different classes of this system. In
Fig. 6, the natural families are listed to
the left. Some families go into only one
class; some go into two; some go into all
three. Counting all of the black boxes
gives the number of sections (column 2
of Table 4)—which are, therefore, either
entire natural families or fragments of
them. Counting the families represented
by only one black box gives the number
of "natural sections conserved" (column
3). Some families went unmentioned
(families 19, 30, 31)—a result of his faulty
proofreading. They are counted here as
"blanks." Of interest is the fact that the
"sections" (column 2) and the "natural
sections conserved" (column 3) are not
features of the artificial system per se.
Rather, they result from the intersection
of the natural families with the three divisions (classes) of the artificial system.
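The counting procedure described in this paragraph may be sketched as follows. The distribution of families among classes shown here is hypothetical (it is not Adanson's system no. 33), and the function and field names are illustrative. Fragments are taken as sections minus natural sections, as in the retabulation of Table 4.

```python
def summarize_system(membership):
    """`membership` maps each natural family to the set of classes of one
    artificial system in which its members fall (one black box per class)."""
    sections = sum(len(classes) for classes in membership.values())
    natural_sections = sum(1 for classes in membership.values() if len(classes) == 1)
    blanks = sum(1 for classes in membership.values() if not classes)
    return {
        "sections": sections,                       # all black boxes
        "natural_sections": natural_sections,       # families with exactly one box
        "blanks": blanks,                           # families left unmentioned
        "fragments": sections - natural_sections,   # boxes of dismembered families
    }

# Hypothetical system with three classes (1, 2, 3) and five families
example = {
    "family a": {1},
    "family b": {1, 2},
    "family c": {1, 2, 3},
    "family d": set(),      # unmentioned, counted as a blank
    "family e": {3},
}
result = summarize_system(example)
print(result)
# {'sections': 7, 'natural_sections': 2, 'blanks': 1, 'fragments': 5}
```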
I have retabulated Adanson's summary
(Table 4). For each system (1-65) there
are three columns (classes, sections,
natural sections) that correspond to his.
I have added a column for the blanks,
TABLE 4. ADANSON'S SUMMARY OF HIS 65 ARTIFICIAL SYSTEMS (CORRECTED).1
System
number
1
2
3
4
5
6
8
9
10
11
12
13
14
11
11
13
10
41
7
22(19)
Sections
Natural
sections
164
7 (9)
275 (277)
0
272
224
1
1
415(417)
107(108)
165 (166)
Blanks
(0)
(0)
0
17 (7)
16 (59)
1(0)
1(0)
Fragments
157
275
271
223
415
90
149
55
154
58
96
299
9
8
10
172 (174)
7
38
7
303(314)
18 (17)
33 (12)
17 (10)
4 (229)
102
26 (31)
76
8
82
40 (44)
42
15
8
5
96
109
25 (29)
16
17
8
116(115)
23 (21)
1(0)
93
18
19
20
21
22
23
24
25
26
27
28
5
15
3
5
4
4
7
145
137
91
85
84
81
72
KO)
128
12
11
9
9
5
7
121 (124)
139(140)
17
15 (14)
25 (23)
34
28
33 (30)
47 (48)
21 (24)
9 (8)
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
5
7
3
14
5
8
9
3
15
4
20
7
14
4
6
3
5
12
6
7
8
8
6
8
4
8
6
6
17
92
91
113
151
256
93
37
1(0)
3(0)
1(0)
20
6- (8)
4 (3)
23 (21)
3(0)
3(0)
1(0)
1(0)
1(0)
1(0)
71
89
122
66
51
56
48
25
100
130
145
252
70
123 (124)
22 (21)
101
106
109
80
185
89
90
105
85
164
83
86
75
284 (283)
131
76
67
74
93
68
93
130
174
113
219(218)
62
211
76 (85)
95 (92)
116
31 (25)
27 (26)
35 (36)
23
34 (33)
25 (32)
24 (25)
36 (37)
29 (30)
37 (36)
37
44
9 (8)
12 (13)
40
49 (48)
43 (45)
29 (28)
50 (53)
44 (45)
33
17 (15)
17
15 (13)
54
13 (12)
47 (48)
28
18 (20)
75
82
45
162
55
65
81
49
135
46
49
31
275
119
36
18
31
64
18
49
97
157
96
204
8
198
29
67
98
3(0)
7(0)
1(0)
1(0)
1(0)
5(0)
7
Classes
TABLE 4.
Classes
Sections
60
61
62
63
64
65
Total:
Accuracy:
5
3
7
22
4
10
598
98%
76
65
92
101
63
94 (93)
(Continued)
Natural
sections
38
51
29
26
51
29
77%
Blanks
Fragments
1(0)
(37)
(55)
(24)
(48)
(35)
38
14
63
75
12
65
1(0)
25%
68%
1 In cases wherein corrected figures disagree with Adanson's, his are given in parentheses. Adanson gave no figures for "blanks," which should have always been zero.
and another column for the fragments—
derived by subtracting the natural sections from the sections. Where my counts
and Adanson's disagree, I give his in parentheses. There are numerous disagreements—probably stemming from his
faulty proofreading. I judge Adanson to
have counted the classes reasonably accurately (98%); that task is easy, for it
amounts only to counting a few items in
a list—one for each system. He was less
accurate in counting "sections" (77%);
that task is not so easy, but it still amounts
to counting items in a list—albeit a longer
one. He was less accurate in counting
"natural sections conserved" (25%); that
task is difficult, for it requires comparing
different lists in search of items that occur in only one. Only 68% of his systems
are without blanks. The only other piece
of information relevant here is the total
number of classes (598), which in effect
are single characters—each defining one
class.
We can examine the relation between
the number of classes and the number of
"natural sections conserved" (Fig. 7).
The conclusion: the more complex the
system, the fewer natural sections conserved. We can examine the relation between the number of classes and the
number of "fragments" (Fig. 8). The conclusion: the more complex the system,
the more fragments. Both conclusions are
trivial.
Adanson's argument is directed toward
one conclusion: that a truly natural group
can have no uniquely defining characters—no synapomorphies. He considered
598 possible characters and rejected
them all, because not one of them
uniquely defined any one of his 58 families. He concluded that looking for characters that uniquely define natural groups
is fruitless, and can lead only to artificiality in classification.
FIG. 7. The relation between classes and "natural sections conserved" in Adanson's 65 systems.

FIG. 8. The relation between classes and fragments (of natural families) in Adanson's 65 systems.

Adanson's natural families stand in relation to his 65 artificial systems exactly as do Michener's components 19 and 20 to his five fundamental cladograms: Adanson's natural families are unresolved in his artificial systems; and Michener's components are unresolved in his fundamental cladograms. Here the resemblance ends between Adanson and Michener. If I were to hazard a guess about the resolution of Adanson's families and Michener's components, I would suppose that Adanson was searching for a natural system; and Michener, for an artificial one. And by "artificial system" I mean the same thing as expressed by de Candolle (1844:44, translated):
"For their unique purpose and their unique result, artificial systems have, as we have seen, to make it possible to learn with more or less ease, the names of the species to which the systems are applied."
In our modern understanding, the function of an artificial system is that of a key. The distinction between artificial and natural systems was made long ago, thanks to the efforts of Linnaeus, the de Jussieu, Adanson, de Candolle, and others; and our modern concept of a key owes much to Lamarck. No one ever denied, and I would not, that artificial systems are useful. Their usefulness has been well expressed by de Candolle and many other writers. Yet the distinction between the two sorts of systems, artificial and natural, is also useful because, again as expressed by de Candolle, the two sorts of systems have different goals (1844:26, translated):
"These sorts of systems follow laws and rules that are entirely different. Nevertheless, they have been, and still are, often confounded. In analyzing them separately we will note that all of the mistakes that have been made in each of them stem from attempting to introduce the principles of one sort into the other. We will see that all of the unjust criticisms that have been made of one or the other sort of system stem from judging one sort by the laws of the other."
I conclude that my previous commentary
on Michener's results (Nelson, 1978) is,
by de Candolle's standard, unjust, under
the assumption that my judgement was
made in the context of the natural system,
whereas Michener's results are fairly
judged only in the context of an artificial
system.
What, then, of cladistics in relation to
the history of systematics? If cladistics is
merely a restatement of the principles of
natural classification, why has cladistics
been the subject of argument? I suspect
that the argument is largely misplaced,
and that the misplacement stems, as de
Candolle suggests, from confounding the
goals of artificial and natural systems. But
why have these been confounded? Partly, perhaps, because of widespread ignorance—inevitable in this age of specialization—of the history of systematics and
that of the distinction between artificial
and natural systems in their past and
present guises (modern artificial systems
are often presented as "evolutionary" or
"phylogenetic" or "phenetic," with the
implication that, being "evolutionary" or
"phylogenetic" or "phenetic," they are
also natural, or natural enough). More importantly, perhaps, because of the search
for some ultimate artificial system (the
ultimate source of data, or the ultimate
clustering algorithm, that by itself would
guarantee a worthwhile result, inaugurate a modern age of systematics, and in
the process relieve us all of a heavy burden—reading the systematic literature of
the past and reaching an informed judgement of its relevance). Beginning even
before "the new systematics," many different artificial systems (data sources and
methods) were hailed as the salvation of
systematics; all ultimately have failed, if
only because no artificial system can prevail as natural; it is nature that prevails.
If there is no ultimate data source or
method (artificial system), will systematics perish for want of one? No, I judge
not. What will perish is the modern
preoccupation with artificial systems.
Systematics will endure, in Adanson's
words, as the "search in nature herself for
her system, if it is true that she really has
one."
ACKNOWLEDGMENTS
This manuscript has evolved tortuously over a long period. I am particularly
indebted to James Farris (for sharing
with me his clarity of thought over many
years), to Denis Brothers (for a patient
and thorough review of a previous draft
of this manuscript), to Leslie Marcus (for
sharing with me his grasp of numbers and
what they mean), and to Norman Platnick
and Donn Rosen (mainly for being themselves). Without them all I would have
been oblivious of many of my own mistakes, and would doubtlessly be submerged in many more mistakes than is
still the case. Once again I acknowledge
my indebtedness to Charles Michener,
for having done his studies, and for having tolerated my attentions to them.
REFERENCES
ADANSON, M. 1763-1764. Familles des plantes.
Paris.
BLACKWELDER, R. E. 1967. Taxonomy. Wiley,
New York.
BROTHERS, D. 1978. How pure must a cladistic
study be?—a response to Nelson on Michener.
Syst. Zool. 27:118-122.
CANDOLLE, A.-P. DE 1844. Théorie élémentaire de
la botanique (A. de Candolle [Ed.]). Paris.
HENNIG, W. 1966. Phylogenetic systematics. University of Illinois Press, Urbana.
MICHENER, C. D. 1977. Discordant evolution and
the classification of allodapine bees. Syst. Zool.
26:32-56.
MICHENER, C. D. 1978. Dr. Nelson on taxonomic
methods. Syst. Zool. 27:112-118.
MICKEVICH, M. 1978. Taxonomic congruence.
Syst. Zool. 27:143-158.
NELSON, G. 1978. Professor Michener on phenetics—old and new. Syst. Zool. 27:104-112.
Manuscript received April 1978
Revised July 1978