Three-item analysis and parsimony, intersection tree and

Bull. Soc. géol. Fr., 2009, t. 180, no 1, pp. 13-15
Three-item analysis and parsimony, intersection tree and strict consensus:
a biogeographical example
NATHANAËL CAO1, ESTELLE BOURDON 2, MEYDI EL AZAWI3 and RENÉ ZARAGÜETA BAGILS3
Key-words. – Three-item analysis, Intersection tree, Historical biogeography, Strict consensus, Parsimony.
Abstract. – Using a biogeographical example, we show that the strict consensus of parsimony programs fails to summarize the information present in the initial hypotheses of homology. The intersection tree used in the three-item analysis
program Nelson05 maximizes the compatibility of the three-item statements deduced from the source trees. The 3ia solution is more precise than the parsimony one because it accurately summarizes the initial information.
Analyse à trois éléments et parcimonie, arbre d’intersection et consensus strict: un exemple
biogéographique
Mots-clés. – Analyse à trois éléments, Arbre d’intersection, Biogéographie historique, Consensus strict, Parcimonie.
Résumé. – En utilisant un exemple biogéographique, nous montrons que le consensus strict usité dans les logiciels de
parcimonie ne résume pas correctement l’information présente dans les hypothèses d’homologie initiales. L’arbre d’intersection utilisé dans le logiciel d’analyse à trois éléments Nelson05 maximise la compatibilité des assertions à trois
éléments déduites des arbres source. La solution 3ia est plus précise que celle proposée en parcimonie car elle résume
de manière exacte l’information initiale.
INTRODUCTION
During the 2007 meeting of the French Society of Systematics (SFS), we presented a communication dealing with the
application of three-item analysis (3ia) to supertree building. Supertree building aims to combine trees of taxa in order to obtain a cladogram that summarizes the information
conveyed by the source trees. Note that the terminals of the
source trees do not need to be the same. From a formal
point of view, supertree construction is identical to historical biogeography [Nelson and Platnick, 1981; Nelson and
Ladiges, 1991]: both methods combine a set of hierarchies
so as to obtain a single tree summarizing the information
present in the source cladograms. However, supertree building is simpler because it does not have to deal with two
challenging phenomena occurring in biogeographic analyses, paralogy [Nelson and Ladiges, 1996] and MASTs
[Ebach et al., 2005]. Here we discuss the rationale underlying the 3ia solution for a particular biogeographical
example.
INTERSECTION TREE
(3is) deduced from the source trees yields optimal
taxon-area cladograms. The intersection tree combines the
3is common to all optimal cladograms, thus maximizing the
compatibility of the 3is deduced from optimal trees. Therefore, the intersection tree is strictly based on the information conveyed by the source trees. The polytomies of the
intersection tree follow interpretation 2 of Nelson and Platnick [1980].
THE PROBLEM
What would be the combination of the following taxon-area
cladograms?
3ia, as implemented in Nelson05 [Cao et al., 2007a], yields
609 optimal trees that maximize compatibility of three-item
statements (3is). The intersection tree of these optimal trees
produces the following hierarchy:
In Nelson05 program [Cao et al., 2007a], hypotheses of area
homology are represented as hierarchical relationships
(source trees). The combination of the three-item statements
1. CNRS UMR 5143, MNHN Département Histoire de la Terre, 75005 Paris, France. [email protected]
2. CNRS UMR 7179, Collège de France, UPMC Univ. Paris VI, 75005 Paris, France
3. CNRS UMR 5143, UPMC Univ. Paris VI, MNHN Département Histoire de la Terre, 75005 Paris, France. Tel: 01 40 79 80 53.
Email: [email protected]
Manuscrit déposé le 30 octobre 2007; accepté après révision le 8 avril 2008
Bull. Soc. géol. Fr., 2009, no 1
14
CAO N. et al.
This solution merely results from the combination or superposition of the two initial trees:
Each optimal tree and the intersection tree show a retention
index RI = 1, because there is no conflict in the original information.
Parsimony, as implemented in Brooks Parsimony Analysis (BPA) [Wiley, 1988] or Matrix Representation using
Parsimony (MRP) [Baum, 1992; Ragan, 1992], finds the
same 609 optimal cladograms. The parsimony program used
here is PAUP* v4b10 [Swofford, 1998], with branch and
bound search and option collapse = no (see below). Combination of the 609 trees through a strict consensus results in
a totally unresolved bush:
represent the set of source trees for which they stand” [Wilkinson and Thorley, 2001: 610]. In our case, the strict
consensus represents a set of 34 459 425 trees. In other
terms, more than 34 million trees can be deduced from this
consensus, the source trees representing only 0.0018% of
the compatible trees.
Null branches appear in a parsimony context when two
taxa have identical descriptions. The two source trees have
no null branches. If the option collapse = yes is used,
PAUP* collapses nodes that have null branches and yields
only 433 optimal trees, some of them showing polytomies.
This phenomenon is inherent to matrix-based methods,
which cannot represent hierarchical relationships [Cao et
al., 2007b], and require all the cells to be scored [Williams
and Ebach, 2006]. When the source hierarchical trees are
destructured and transformed into a partition, the obtained
table is as follows:
This consensus has an ensemble RI = 0. The RI of each
optimal tree is 1. The Adams consensus is identical to the
intersection tree found in 3ia.
CRITICISM
One of the 609 optimal trees found in parsimony or 3ia is figured below: areas B and F, C and G, D and H and E and I
are respectively more closely related to each other than to
other areas.
Critics stated that if the above tree is the “true tree”, the
3ia solution is wrong, because it is over-resolved given the
information present. The absence of resolution found by
BPA (or MRP) and a strict consensus method is preferable,
even if it tells us that there is no information in the original
trees. In other words, the totally unresolved parsimony –
strict consensus solution cannot be wrong, because it tells
us nothing.
REPLY
The strict consensus tree retains nodes present in all the optimal trees [Sokal and Rohlf, 1981]. It is the most widely
used consensus, probably because its interpretation is the
simplest and least ambiguous [Wilkinson and Thorley,
2001] in a parsimony context. The strict consensus has an
important flaw, however. The quality of a consensus may
be defined as “how well do individual consensus trees
Bull. Soc. géol. Fr., 2009, no 1
The quantity of question marks present in BPA is proportional to the degree of overlapping of terminals present
in the source trees. In the above example, the matrix is full
of question marks, because the source trees do not overlap,
except for area A. Remarkably, one of the criticisms of 3ia
is the supposed addition of missing data. Kluge (1993: 250
and 255) wrote that “the matrix resulting from the
three-taxon transformation has considerable missing data,
where none existed before. […] This amounts to the unscientific exercise of throwing away observations”. A simple
example shows that parsimony transforms the “data”, i.e.
sets of hierarchical hypotheses of area homology, into a matrix [Cao et al., 2007b]. This results in the creation of a
considerable amount of question marks where none existed
before [see Zaragüeta Bagils and Bourdon, 2007]. In addition, the under-resolved parsimony solution throws away
the information present in the source trees.
The 3ia solution computes the intersection of the two
source trees through the 3is deduced from them. It only takes into account the presence of the same relationships
whatever the tree. The 3ia solution is compatible only with
THREE-ITEM ANALYSIS AND PARSIMONY, INTERSECTION TREE AND STRICT CONSENSUS
the 609 optimal cladograms. In Wilkinson and Thorley’s
terms [2001], the intersection tree has a maximal efficiency,
at least in this case. The above example illustrates how 3ia
is much more precise than parsimony analysis [Nelson and
Platnick, 1991].
CONCLUSION
Parsimony programs are not able to recover the correct tree
because the strict consensus is unable to summarize the
source trees. The mistake our critics made comes from the a
priori assumption that parsimony is better than 3ia. This is
a common attitude with 3ia detractors [Platnick et al.,
15
1996]. In this case, their conclusions just follow from their
assumptions [Nelson, 1993]. A detailed analysis of the problem shows that the 3ia solution illustrated by the intersection tree is the best one, even from a parsimony point of
view. In sum, the 3ia solution is more precise than the parsimony one because it accurately summarizes the initial information.
Acknowledgements. – We are grateful to Blaise Li, Michel Laurin and
Jean-Yves Dubuisson among others, for their criticism, and we encourage
them to discuss every theoretical point that may be raised. We warmly
thank the organisers of the 1st International Palaeobiogeography Symposium (Paris, 2007). Gareth J. Nelson and an anonymous reviewer provided
useful comments on the manuscript.
References
BAUM B.R. (1992). – Combining trees as a way of combining data for phylogenetic inference, and the desirability of combining gene
trees. – Taxon, 41, 3-10.
CAO N., DUCASSE J. & ZARAGÜETA BAGILS R. (2007a). – NELSON05. – Published by the authors, Paris.
CAO N., ZARAGÜETA BAGILS R. & VIGNES-LEBBE R. (2007b). – Hierarchical
representation of hypotheses of homology. – Geodiversitas, 29,
5-15.
EBACH M., HUMPHRIES C.J., NEWMAN R.A., WILLIAMS D.W. & WALSH S.A.
(2005). – Assumption 2: Opaque to intuition? – Journ. Biogeography, 32, 781-787.
KLUGE A.G. (1993). – Three-taxon transformation in phylogenetic inference: ambiguity and distortion as regards explanatory power. –
Cladistics, 9, 246-259.
NELSON G.J. (1993). – Reply. – Cladistics, 9, 261-265.
NELSON G.J. & LADIGES P.Y. (1991). – Three-area statements: Standard assumptions for biogeographic analysis. – Syst. Zool., 40,
470-485.
NELSON G.J. & LADIGES P.Y. (1996). – Paralogy in cladistic biogeography
and analysis of paralogy-free subtrees. – Am. Mus. Novitates,
3167, 1-58.
NELSON G.J. & PLATNICK N.I. (1980). – Multiple branching in cladograms:
Two interpretations. – Syst. Zool., 29, 86-91.
NELSON G.J. & PLATNICK N.I. (1981). – Systematics and biogeography.
Cladistics and vicariance. – Columbia University Press, New
York, 567 p.
NELSON G.J. & PLATNICK N.I. (1991). – Three-taxon statements: A more
precise use of parsimony? – Cladistics, 7, 351-366.
PLATNICK N.I., HUMPHRIES C.J., NELSON G.J. & WILLIAMS D.W. (1996). –
Is Farris optimization perfect?: Three-taxon statements and multiple branching. – Cladistics, 12, 243-252.
RAGAN M.A. (1992). – Phylogenetic inference based on matrix representation of trees. – Mol. Phyl. Evol., 1, 53-58.
SOKAL R.R. & ROHLF F.J. (1981). – Taxonomic congruence in the Leptopodomorpha re-examined. – Syst. Zool., 30, 309-325.
SWOFFORD D.L. (1998). – PAUP*. Phylogenetic analysis using parsimony
(*and other methods). Version 4. – Sinauer Associates, Sunderland
WILEY E.O. (1988). – Parsimony analysis and vicariance biogeography. –
Syst. Zool., 37, 271-290.
WILKINSON M. & THORLEY J.L. (2001). – Efficiency of strict consensus
trees. – Syst. Biol., 50, 610-613.
WILLIAMS D.W. & EBACH M. (2006). – The data matrix. – Geodiversitas,
28, 5-16.
ZARAGÜETA BAGILS R. & BOURDON E. (2007). – Three-item analysis: Hierarchical representation and treatment of missing and inapplicable data. – C. R. Palevol, 6, 527-534.
Bull. Soc. géol. Fr., 2009, no 1