Bull. Soc. géol. Fr., 2009, t. 180, no 1, pp. 13-15

Three-item analysis and parsimony, intersection tree and strict consensus: a biogeographical example

NATHANAËL CAO¹, ESTELLE BOURDON², MEYDI EL AZAWI³ and RENÉ ZARAGÜETA BAGILS³

Key-words. – Three-item analysis, Intersection tree, Historical biogeography, Strict consensus, Parsimony.

Abstract. – Using a biogeographical example, we show that the strict consensus of parsimony programs fails to summarize the information present in the initial hypotheses of homology. The intersection tree used in the three-item analysis program Nelson05 maximizes the compatibility of the three-item statements deduced from the source trees. The 3ia solution is more precise than the parsimony one because it accurately summarizes the initial information.

Analyse à trois éléments et parcimonie, arbre d’intersection et consensus strict: un exemple biogéographique

Mots-clés. – Analyse à trois éléments, Arbre d’intersection, Biogéographie historique, Consensus strict, Parcimonie.

Résumé. – En utilisant un exemple biogéographique, nous montrons que le consensus strict usité dans les logiciels de parcimonie ne résume pas correctement l’information présente dans les hypothèses d’homologie initiales. L’arbre d’intersection utilisé dans le logiciel d’analyse à trois éléments Nelson05 maximise la compatibilité des assertions à trois éléments déduites des arbres source. La solution 3ia est plus précise que celle proposée en parcimonie car elle résume de manière exacte l’information initiale.

INTRODUCTION

During the 2007 meeting of the French Society of Systematics (SFS), we presented a communication dealing with the application of three-item analysis (3ia) to supertree building. Supertree building aims to combine trees of taxa in order to obtain a cladogram that summarizes the information conveyed by the source trees. Note that the terminals of the source trees do not need to be the same.
From a formal point of view, supertree construction is identical to historical biogeography [Nelson and Platnick, 1981; Nelson and Ladiges, 1991]: both methods combine a set of hierarchies so as to obtain a single tree summarizing the information present in the source cladograms. However, supertree building is simpler because it does not have to deal with two challenging phenomena occurring in biogeographic analyses, paralogy [Nelson and Ladiges, 1996] and MASTs (multiple areas on a single terminal) [Ebach et al., 2005]. Here we discuss the rationale underlying the 3ia solution for a particular biogeographical example.

1. CNRS UMR 5143, MNHN Département Histoire de la Terre, 75005 Paris, France. [email protected]
2. CNRS UMR 7179, Collège de France, UPMC Univ. Paris VI, 75005 Paris, France
3. CNRS UMR 5143, UPMC Univ. Paris VI, MNHN Département Histoire de la Terre, 75005 Paris, France. Tel: 01 40 79 80 53. Email: [email protected]
Manuscript submitted 30 October 2007; accepted after revision 8 April 2008.

INTERSECTION TREE

In the Nelson05 program [Cao et al., 2007a], hypotheses of area homology are represented as hierarchical relationships (source trees). The combination of the three-item statements (3is) deduced from the source trees yields optimal taxon-area cladograms. The intersection tree combines the 3is common to all optimal cladograms, thus maximizing the compatibility of the 3is deduced from optimal trees. Therefore, the intersection tree is strictly based on the information conveyed by the source trees. The polytomies of the intersection tree follow interpretation 2 of Nelson and Platnick [1980].

THE PROBLEM

What would be the combination of the following taxon-area cladograms?

3ia, as implemented in Nelson05 [Cao et al., 2007a], yields 609 optimal trees that maximize compatibility of three-item statements (3is). The intersection tree of these optimal trees produces the following hierarchy:
This solution merely results from the combination or superposition of the two initial trees:

Each optimal tree and the intersection tree show a retention index RI = 1, because there is no conflict in the original information.

Parsimony, as implemented in Brooks Parsimony Analysis (BPA) [Wiley, 1988] or Matrix Representation using Parsimony (MRP) [Baum, 1992; Ragan, 1992], finds the same 609 optimal cladograms. The parsimony program used here is PAUP* v4b10 [Swofford, 1998], with branch and bound search and option collapse = no (see below). Combination of the 609 trees through a strict consensus results in a totally unresolved bush:

This consensus has an ensemble RI = 0. The RI of each optimal tree is 1. The Adams consensus is identical to the intersection tree found in 3ia.

CRITICISM

One of the 609 optimal trees found in parsimony or 3ia is figured below:

In this tree, areas B and F, C and G, D and H, and E and I are, respectively, more closely related to each other than to other areas.

Critics stated that if the above tree is the “true tree”, the 3ia solution is wrong, because it is over-resolved given the information present. On this view, the absence of resolution found by BPA (or MRP) and a strict consensus method is preferable, even if it tells us that there is no information in the original trees. In other words, the totally unresolved parsimony – strict consensus solution cannot be wrong, because it tells us nothing.

REPLY

The strict consensus tree retains nodes present in all the optimal trees [Sokal and Rohlf, 1981]. It is the most widely used consensus, probably because its interpretation is the simplest and least ambiguous [Wilkinson and Thorley, 2001] in a parsimony context. The strict consensus has an important flaw, however. The quality of a consensus may be defined as “how well do individual consensus trees represent the set of source trees for which they stand” [Wilkinson and Thorley, 2001: 610]. In our case, the strict consensus represents a set of 34 459 425 trees. In other terms, more than 34 million trees can be deduced from this consensus, the 609 optimal trees representing only 0.0018% of the compatible trees.

Null branches appear in a parsimony context when two taxa have identical descriptions. The two source trees have no null branches. If the option collapse = yes is used, PAUP* collapses nodes that have null branches and yields only 433 optimal trees, some of them showing polytomies. This phenomenon is inherent to matrix-based methods, which cannot represent hierarchical relationships [Cao et al., 2007b], and require all the cells to be scored [Williams and Ebach, 2006]. When the source hierarchical trees are destructured and transformed into a partition, the obtained table is as follows:

The quantity of question marks present in BPA is proportional to the degree of overlapping of terminals present in the source trees. In the above example, the matrix is full of question marks, because the source trees do not overlap, except for area A.

Remarkably, one of the criticisms of 3ia is the supposed addition of missing data. Kluge [1993: 250 and 255] wrote that “the matrix resulting from the three-taxon transformation has considerable missing data, where none existed before. […] This amounts to the unscientific exercise of throwing away observations”. A simple example shows that parsimony transforms the “data”, i.e. sets of hierarchical hypotheses of area homology, into a matrix [Cao et al., 2007b]. This results in the creation of a considerable amount of question marks where none existed before [see Zaragüeta Bagils and Bourdon, 2007]. In addition, the under-resolved parsimony solution throws away the information present in the source trees.

The 3ia solution computes the intersection of the two source trees through the 3is deduced from them.
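The operation of deducing 3is from trees and keeping only the statements on which all trees agree can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Nelson05 implementation: trees are written as nested tuples, the function names `leaves` and `three_item_statements` are hypothetical, the two small example trees are invented for the purpose and are not the cladograms of this paper, and neither the weighting of statements nor the search for optimal trees is attempted.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of terminal labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def three_item_statements(tree):
    """Return the set of three-item statements implied by a rooted tree.

    A statement is stored as (x, frozenset({y, z})) and read x(yz):
    y and z are more closely related to each other than either is to x.
    """
    all_leaves = leaves(tree)
    statements = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        inside = leaves(node)
        # Every non-root node groups its terminals against all the others.
        if not is_root:
            for y, z in combinations(sorted(inside), 2):
                for x in all_leaves - inside:
                    statements.add((x, frozenset((y, z))))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return statements


# Two hypothetical trees on the same terminals (illustration only).
tree_1 = ("A", ("B", ("C", "D")))
tree_2 = ("A", ("C", ("B", "D")))

# Statements common to both trees.
shared = three_item_statements(tree_1) & three_item_statements(tree_2)
for x, pair in sorted(shared, key=lambda s: (s[0], sorted(s[1]))):
    print(f"{x}({''.join(sorted(pair))})")
```

For these two toy trees the shared statements are A(BC), A(BD) and A(CD), which together correspond to the less resolved tree (A,(B,C,D)): the conflicting statements B(CD) and C(BD) are dropped, and nothing is kept that is not present in both trees.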
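The proportion quoted in this reply (609 trees out of 34 459 425) can be checked with a short calculation. The text does not say how the second figure was obtained; the sketch below assumes that it counts fully resolved rooted trees, for which the standard formula is (2n - 3)!!, and that n = 10 terminals are involved. Both points are assumptions, retained only because they reproduce the published number exactly; the function name `rooted_binary_trees` is hypothetical.

```python
def rooted_binary_trees(n_terminals):
    """Number of fully resolved rooted trees on n terminals: (2n - 3)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for odd in range(3, 2 * n_terminals - 2, 2):
        count *= odd
    return count


compatible = rooted_binary_trees(10)  # assumption: ten terminals
optimal = 609                         # optimal trees reported in the text
share = 100 * optimal / compatible

print(compatible)        # 34459425
print(f"{share:.4f} %")  # 0.0018 %
```

Under these assumptions the 609 optimal trees are indeed about 0.0018% of the trees compatible with the unresolved strict consensus.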
The 3ia solution takes into account only those relationships that are present whatever the tree, and it is compatible only with the 609 optimal cladograms. In Wilkinson and Thorley’s terms [2001], the intersection tree has maximal efficiency, at least in this case. The above example illustrates how 3ia is much more precise than parsimony analysis [Nelson and Platnick, 1991].

CONCLUSION

Parsimony programs are not able to recover the correct tree because the strict consensus is unable to summarize the source trees. The mistake our critics made comes from the a priori assumption that parsimony is better than 3ia. This is a common attitude among 3ia detractors [Platnick et al., 1996]. In this case, their conclusions just follow from their assumptions [Nelson, 1993]. A detailed analysis of the problem shows that the 3ia solution illustrated by the intersection tree is the best one, even from a parsimony point of view. In sum, the 3ia solution is more precise than the parsimony one because it accurately summarizes the initial information.

Acknowledgements. – We are grateful to Blaise Li, Michel Laurin and Jean-Yves Dubuisson, among others, for their criticism, and we encourage them to discuss every theoretical point that may be raised. We warmly thank the organisers of the 1st International Palaeobiogeography Symposium (Paris, 2007). Gareth J. Nelson and an anonymous reviewer provided useful comments on the manuscript.

References

BAUM B.R. (1992). – Combining trees as a way of combining data for phylogenetic inference, and the desirability of combining gene trees. – Taxon, 41, 3-10.
CAO N., DUCASSE J. & ZARAGÜETA BAGILS R. (2007a). – NELSON05. – Published by the authors, Paris.
CAO N., ZARAGÜETA BAGILS R. & VIGNES-LEBBE R. (2007b). – Hierarchical representation of hypotheses of homology. – Geodiversitas, 29, 5-15.
EBACH M., HUMPHRIES C.J., NEWMAN R.A., WILLIAMS D.W. & WALSH S.A. (2005). – Assumption 2: Opaque to intuition? – Journ. Biogeography, 32, 781-787.
KLUGE A.G. (1993). – Three-taxon transformation in phylogenetic inference: ambiguity and distortion as regards explanatory power. – Cladistics, 9, 246-259.
NELSON G.J. (1993). – Reply. – Cladistics, 9, 261-265.
NELSON G.J. & LADIGES P.Y. (1991). – Three-area statements: Standard assumptions for biogeographic analysis. – Syst. Zool., 40, 470-485.
NELSON G.J. & LADIGES P.Y. (1996). – Paralogy in cladistic biogeography and analysis of paralogy-free subtrees. – Am. Mus. Novitates, 3167, 1-58.
NELSON G.J. & PLATNICK N.I. (1980). – Multiple branching in cladograms: Two interpretations. – Syst. Zool., 29, 86-91.
NELSON G.J. & PLATNICK N.I. (1981). – Systematics and biogeography. Cladistics and vicariance. – Columbia University Press, New York, 567 p.
NELSON G.J. & PLATNICK N.I. (1991). – Three-taxon statements: A more precise use of parsimony? – Cladistics, 7, 351-366.
PLATNICK N.I., HUMPHRIES C.J., NELSON G.J. & WILLIAMS D.W. (1996). – Is Farris optimization perfect?: Three-taxon statements and multiple branching. – Cladistics, 12, 243-252.
RAGAN M.A. (1992). – Phylogenetic inference based on matrix representation of trees. – Mol. Phyl. Evol., 1, 53-58.
SOKAL R.R. & ROHLF F.J. (1981). – Taxonomic congruence in the Leptopodomorpha re-examined. – Syst. Zool., 30, 309-325.
SWOFFORD D.L. (1998). – PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. – Sinauer Associates, Sunderland.
WILEY E.O. (1988). – Parsimony analysis and vicariance biogeography. – Syst. Zool., 37, 271-290.
WILKINSON M. & THORLEY J.L. (2001). – Efficiency of strict consensus trees. – Syst. Biol., 50, 610-613.
WILLIAMS D.W. & EBACH M. (2006). – The data matrix. – Geodiversitas, 28, 5-16.
ZARAGÜETA BAGILS R. & BOURDON E. (2007). – Three-item analysis: Hierarchical representation and treatment of missing and inapplicable data. – C. R. Palevol, 6, 527-534.