SYSTEMATIC BIOLOGY, VOL. 50. POINTS OF VIEW
Syst. Biol. 50(3):454–462, 2001

Incorporation, Relative Homoplasy, and Effect of Gap Characters in Sequence-Based Phylogenetic Analyses

MARK P. SIMMONS,1 HELGA OCHOTERENA,1 AND TIMOTHY G. CARR2

1 L.H. Bailey Hortorium, 462 Mann Library, Cornell University, Ithaca, New York 14853, USA; E-mail: [email protected]
2 Department of Ecology and Evolutionary Biology, Corson Hall, Cornell University, Ithaca, New York 14853, USA

Phylogenetic analysis of nucleotide and amino acid sequences requires the alignment of homologous sequences. The alignment procedure often requires the insertion of gaps, putatively corresponding to insertion or deletion events, which can be coded as phylogenetic characters. As a general class of phylogenetic characters, gaps have variously been suggested to be reliable (e.g., Lloyd and Calder, 1991; Van Dijk et al., 1999) or unreliable (e.g., Golenberg et al., 1993; Ford et al., 1995). This difference in opinion, coupled with the lack of a well-supported method for the coding of gaps, has led to a diversity of approaches by which gaps have been treated in, or excluded from, tree searches (González, 1996).

In an earlier paper we presented two methods, termed simple and complex indel coding, in which all gaps (excluding leading and trailing gaps, which are generally artifacts) can be coded from aligned sequence-based matrices (Simmons and Ochoterena, 2000). Simple indel coding, which is used in this study, is implemented by coding all gaps that have different 5' or 3' termini as separate presence/absence characters. Whenever a gap is being coded and the region it spans is completely included within the span of another gap, those sequences having the longer gap (i.e., one that extends to or beyond both the 5' and 3' termini of the gap being coded) are scored as inapplicable for the gap character being coded.
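The coding rule just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure the authors actually ran (they coded gaps manually): the function names `internal_gaps` and `simple_indel_coding`, the toy alignment, and the choice to score a sequence as 0 when the coded region falls in its leading or trailing gap are all hypothetical.

```python
def internal_gaps(seq, gap="-"):
    """Return inclusive (start, end) spans of gaps inside one aligned sequence.

    Leading and trailing gaps are skipped, since the text treats them as artifacts.
    """
    residues = [i for i, c in enumerate(seq) if c != gap]
    if not residues:
        return []
    spans, start = [], None
    # Scan only between the first and last residue, so every gap run closes.
    for i in range(residues[0], residues[-1] + 1):
        if seq[i] == gap:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    return spans


def simple_indel_coding(alignment, gap="-"):
    """Code each distinct gap (distinct 5' or 3' terminus) as one character.

    Scores: '1' gap present, '0' gap absent, '-' inapplicable because the
    sequence has a longer gap reaching to or beyond both termini.
    Assumption: regions covered by leading/trailing gaps are scored '0' here.
    """
    per_seq = [internal_gaps(s, gap) for s in alignment]
    characters = sorted({span for spans in per_seq for span in spans})
    rows = []
    for spans in per_seq:
        row = []
        for start, end in characters:
            if (start, end) in spans:
                row.append("1")
            elif any(s <= start and e >= end for s, e in spans):
                row.append("-")
            else:
                row.append("0")
        rows.append("".join(row))
    return characters, rows


# Hypothetical four-sequence alignment with three overlapping gaps.
alignment = ["ACGTACGT", "AC--ACGT", "A----CGT", "ACG-ACGT"]
characters, rows = simple_indel_coding(alignment)
print(characters)
print(rows)
```

In this toy case the sequence with the longest gap is inapplicable for both shorter gaps, which is why gap characters tend to carry more missing entries than base characters.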
Some have suggested on theoretical and empirical grounds that longer gaps are better phylogenetic characters than shorter gaps. Lloyd and Calder (1991) argued that multiresidue gaps are reliable phylogenetic characters because indels are unlikely to be repeated in the exact same position with the same length and sequence (for insertions); indels of different lengths at the same position are recognized as separate events. Similarly, van Ham et al. (1994) suggested that, based on the relative levels of homoplasy in the intergenic spacer between trnL and trnF, gaps longer than two positions are reliable phylogenetic characters.

In this paper we assess the relative levels of homoplasy of gap and base characters from a selection of 38 published sequence-based matrices. We determine the potential phylogenetic information included in gap characters and the extent to which inclusion of gap characters alters the gene tree topology and branch support values. We also test the assertion that longer gaps are better phylogenetic characters than shorter gaps.

METHODS

Thirty-eight sequence-based data matrices were selected for this study: 5 based on structural rDNA, 5 based on ITS, 6 based on introns, and 22 based on protein-coding exons (Appendix 1). Matrices with many gaps were preferentially selected over matrices with few gaps so that many gaps could be coded. Gaps were coded for all matrices by using simple indel coding (as explained above; see also Simmons and Ochoterena, 2000). In all cases, the original sequence alignments, obtained from the authors or downloaded from EMBL on 9 July 1999 at http://bioinfo.weizmann.ac.il/pub/databases/embl/align/, were used. All aligned positions were included in the analyses, even if certain regions were excluded by the original authors (as was done by Kanai et al., 1997; Budin and Philippe, 1998; Burmester et al., 1998; Downie et al., 1998; Tourancheau et al., 1998).
Gaps in DNA sequence matrices were manually coded by using WinClada (Nixon, 1999). Gaps in amino acid sequence matrices were manually coded by using MacClade (Maddison and Maddison, 1992). DNA sequence matrices were analyzed with Nona (Goloboff, 1993). Amino acid sequence matrices were analyzed with PAUP* (Swofford, 1998). For all analyses equally weighted parsimony was used. Tree searches were performed by using 100 heuristic searches with random order taxon entry and TBR branch swapping. A maximum of 1,000 trees was held.

To compare the potential amount of phylogenetic information contained in the base characters and the gap characters, the maximum possible number of steps minus the minimum possible number of steps for each character was used as a measure of the "amount of possible synapomorphy" (Farris, 1989:418). This statistic, which was calculated directly from the data matrices, was obtained by using the statistics option in WinClada and manually calculated for the amino-acid-based matrices by using MacClade. Note that the statistic for potential phylogenetic information is not particularly sensitive to missing data, and when large amounts of missing data are present, the statistic may be somewhat misleading.

The consistency index (Kluge and Farris, 1969) and the retention index (Farris, 1989) were used to assess relative amounts of homoplasy in gap and base characters. The consistency indices presented include uninformative characters for two reasons. First, uninformative characters that include autapomorphies are not homoplasious. Although they are not phylogenetically informative, these characters are appropriately considered when measuring homoplasy (Goloboff, 1991). Second, the consistency index, when autapomorphic characters are excluded, is not a fair measure when comparing gaps coded as binary characters (as is done when using simple indel coding) with multistate base characters (up to four nucleotides or 20 amino acids).
This is because informative multistate characters may include uninformative character states that raise the consistency index. In contrast, binary characters that have an uninformative character state are eliminated when the consistency index is calculated only on the basis of informative characters.

In comparing the consistency and retention indices of gap and base characters, both groups of characters were optimized onto the most-parsimonious tree(s) found by using base characters only. These trees were selected as a very conservative measure of relative levels of homoplasy between gap and base characters. The base characters were mapped onto the most-parsimonious tree(s) for these characters, whereas the gap characters were mapped onto trees for which they had no effect on the topology. Furthermore, when a range of consistency or retention indices (or both) for gap characters was obtained for the most-parsimonious trees found by using base characters only, the lowest values were reported (i.e., the tree with the worst fit of the gap characters was used). Both of these factors represent biases in favor of base characters when comparing consistency and retention indices.

One problem complicating comparison of homoplasy between gap and base characters is that the two types of characters have different amounts of missing data. All else being equal, the more missing data in a character, the higher the consistency and retention indices are expected to be, because missing data cannot conflict with the groupings inferred by the character state entries that are present. Generally, because of the manner in which overlapping gaps are coded in simple indel coding, more data were missing in gap characters than in the base characters. For our purposes, we corrected for these differences in amounts of missing data by multiplying the consistency and retention indices by the percentage of real (not missing) data for each group of characters being compared.
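As a rough illustration of the quantities used above, the following Python sketch computes the "amount of possible synapomorphy" (maximum minus minimum steps), the ensemble consistency and retention indices, and the missing-data correction. It assumes unordered characters, for which the minimum number of steps is the number of observed states minus one and the maximum is the number of scored taxa minus the count of the most common state. The observed step counts come from optimizing characters on a tree and are simply supplied as input here; the function names, the `MISSING` set, and the toy columns are hypothetical.

```python
from collections import Counter

MISSING = {"?", "-"}  # assumption: symbols treated as missing or inapplicable


def min_max_steps(column):
    """Minimum (m) and maximum (g) possible steps for one unordered character."""
    counts = Counter(c for c in column if c not in MISSING)
    if not counts:
        return 0, 0
    m = len(counts) - 1                              # one step per extra state
    g = sum(counts.values()) - max(counts.values())  # steps on an unresolved bush
    return m, g


def ensemble_indices(columns, steps):
    """Ensemble CI = M/S and RI = (G - S)/(G - M), plus G - M.

    `steps` holds the observed steps per character on the chosen tree
    (an input here; a parsimony program would supply it).
    """
    M = sum(min_max_steps(col)[0] for col in columns)
    G = sum(min_max_steps(col)[1] for col in columns)
    S = sum(steps)
    if S == 0 or G == M:
        raise ValueError("indices are undefined without informative variation")
    return M / S, (G - S) / (G - M), G - M


def fraction_scored(columns):
    """Proportion of cells holding real (not missing) data."""
    cells = [c for col in columns for c in col]
    return sum(c not in MISSING for c in cells) / len(cells)


def corrected_indices(columns, steps):
    """Scale CI and RI by the proportion of scored cells, as in the text."""
    ci, ri, _ = ensemble_indices(columns, steps)
    f = fraction_scored(columns)
    return ci * f, ri * f


# Hypothetical example: four characters scored for four taxa.
columns = ["AACC", "AACT", "0011", "01?1"]
observed_steps = [1, 2, 2, 1]
print(ensemble_indices(columns, observed_steps))
print(corrected_indices(columns, observed_steps))
```

Because the correction is a simple multiplication, a group of characters with more missing entries is penalized in proportion to how much of its matrix is unscored.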
The modified indices are termed the "corrected consistency index" and the "corrected retention index." Note that relative to the "uncorrected" indices, the corrected indices generally favor base characters because those generally contained less missing data.

Strict consensus trees were used to compare the percentage of branches in common between most-parsimonious trees found by using base characters only compared with those based on both base and gap characters. Comparing strict consensus trees is a severe measure of the similarity in tree topologies because a single rearrangement of one taxon from one clade to a distantly related clade results in a substantial decrease in the number of clades in common between the strict consensus trees. To determine the percentage of branches in common, the number of branches resolved in the strict consensus trees of both matrices in each comparison was divided by the number occurring in whichever strict consensus tree was least resolved.

Relative levels of branch support between the trees constructed with base characters and the trees constructed with base and gap characters were compared in terms of bootstrap support values (Felsenstein, 1985). Bootstrap support values were determined with 100 replicates with 10 TBR searches using random taxon addition per replicate. For the nucleotide sequence matrices analyzed with Nona, strict consensus bootstrap support values (described by Davis et al., 1998) were mapped onto the strict consensus of the most-parsimonious trees by using WinClada. For the amino acid sequence matrices analyzed with PAUP*, frequency within replicates bootstrap support values were mapped onto the 50% majority rule bootstrap tree (because PAUP* does not calculate strict consensus bootstrap support values). Average bootstrap support values were calculated on the basis of the branches in common on the strict consensus trees from both matrices.
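The percentage of branches in common described above can be illustrated with a short Python sketch. It assumes each strict consensus tree is already available as a collection of clades, each clade given as the set of terminal names it contains; the function name `percent_shared_clades` and the example trees are hypothetical.

```python
def percent_shared_clades(clades_a, clades_b):
    """Percentage of clades shared by two strict consensus trees.

    The shared count is divided by the number of clades in whichever
    tree is less resolved, following the description in the text.
    """
    a = {frozenset(clade) for clade in clades_a}
    b = {frozenset(clade) for clade in clades_b}
    fewest = min(len(a), len(b))
    if fewest == 0:
        raise ValueError("one tree is completely unresolved")
    return 100.0 * len(a & b) / fewest


# Hypothetical consensus trees for five terminals A-E.
bases_only = [{"A", "B"}, {"A", "B", "C"}, {"D", "E"}]
bases_and_gaps = [{"A", "B"}, {"C", "D"}]
print(percent_shared_clades(bases_only, bases_and_gaps))
```

Dividing by the less-resolved tree means that a loss of resolution alone does not lower the percentage; only clades that conflict or are absent from the better-resolved tree do.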
Two comparisons were performed to test the assertions made by Lloyd and Calder (1991) and van Ham et al. (1994) that longer gaps are better phylogenetic characters than shorter gaps. To test the assertion made by Lloyd and Calder (1991), corrected consistency and retention indices of single-position gaps were compared with gaps longer than one position. Because gaps in exons generally occur in multiples of three nucleotide positions (one codon) and in the coding frame, the shortest gap in exons is generally three nucleotide positions long. According to the criteria for the assertion made by Lloyd and Calder (1991), these gaps are considered equivalent to gaps that are one nucleotide position long in non-exon regions. To test the assertion made by van Ham et al. (1994), we compared corrected consistency and retention indices of one- and two-position-long gaps with gaps longer than two positions. According to the criterion for the assertion made by van Ham et al. (1994), gaps in exons (which were, in all exon matrices, at least three nucleotide positions long) were not considered. Therefore, this test was limited to the structural rDNA, ITS, and intron matrices. Consistency and retention indices were measured by mapping the gap characters onto the most-parsimonious trees found by using base characters only.

Statistical Analyses

The comparisons outlined above are generally split-plot designs consisting of a random factor (matrix) nested within a fixed factor (type of matrix: rDNA, ITS, intron, exon) and on which measurements are paired for another fixed factor (e.g., consistency index measured for base and gap characters; Keppel, 1982). For each split-plot analysis, only the statistically significant results or the relevant statistically insignificant results are reported. ANOVAs were used for analyses comparing a single measurement among the matrix types.
Raw data or residuals were checked for normality and homoscedasticity by using one-sample Kolmogorov–Smirnov tests and Fmax tests, respectively, and transformed if necessary. Although sample sizes for some matrix types were small, in >95% of the cases the data or the residuals (depending on the type of analysis) were not significantly heteroscedastic nor were their distributions significantly different from normal. Thus, being able to include matrix type as a factor prevents exons (which have a much greater sample size) from unduly influencing the overall results. Differential effects of matrix type show up as interactions in the split-plot ANOVA and are reported whenever significant. All analyses were performed with SYSTAT v. 6.0 for Windows, except the Fmax tests were performed according to Sokal and Rohlf (1981) and evaluated by using the table in Rohlf and Sokal (1981). Any post hoc tests were evaluated by using Bonferroni-corrected probabilities (Keppel, 1982).

Our primary purpose in using statistical hypothesis testing is to facilitate interpretation of the results from this set of matrices. Our sample has a broad taxonomic base and examines a range of loci and types of loci, suggesting that its inferential scope should be broad. However, because systematists studying different groups often focus on different loci, correlations between locus type and taxonomic group can confound interpretation (e.g., all ITS sequences analyzed are from plants). Therefore, broader inferences about types of loci and the effects of gaps on phylogenetic analyses should be made with caution.

RESULTS AND DISCUSSION

Relative Homoplasy and Effect of Gap Characters

In the 38 matrices, an average of 8% (from 1% to 22%) of the potential phylogenetic information was contained in gap characters (Table 1).
The percentage of potential phylogenetic information contained in gap characters varied significantly among the types of matrices (ANOVA: F3,34 = 11.19, P < 0.00005) because of the significantly greater phylogenetic content contained in gaps in ITS-based matrices (average 14.8%) and intron-based matrices (average 14.5%) compared with exon-based matrices (average 4.4%; Bonferroni correction for both comparisons: P < 0.0005). Gaps averaged 9.8% of the total characters in all matrices, and the percentage of total characters that are gaps did not differ among the types of matrices (ANOVA: F3,34 = 1.60, P > 0.2).

TABLE 1. Relative homoplasy and effect of gap characters.

Matrix  %gap inf.  Corrected CI  Corrected RI  Topology change  Difference(c) in
no.     cont.(a)   Bases  Gaps   Bases  Gaps   IAR(b)  Not IAR  No. of trees  No. of clades
1       14         42     57     44     60     yes     yes      2             −2
2       8          58     71     50     61     yes     no       1             −1
3       6          50     48     36     37     yes     no       −1            2
4       3          40     61     49     36     yes     yes      8             0
5       17         54     56     63     60     yes     yes      29            −5
6       6          30     51     58     70     yes     yes      —             −5
7       16         60     70     63     68     yes     yes      −3            1
8       12         69     71     73     73     no      no       −2            0
9       22         50     48     34     22     yes     yes      0             0
10      18         62     72     69     76     yes     yes      −10           0
11      17         67     64     61     52     yes     yes      0             0
12      22         75     71     65     60     no      no       0             0
13      11         68     84     70     85     yes     yes      −1            0
14      15         82     72     77     42     no      no       0             0
15      20         56     57     64     65     yes     yes      16            −1
16      2          52     37     67     54     yes     yes      —             18
17      3          57     62     68     67     yes     yes      1             0
18      2          24     40     42     59     yes     yes      −16           −5
19      6          70     77     67     83     no      no       0             0
20      6          47     54     40     57     yes     no       −2            3
21      3          39     57     48     68     yes     no       −261          23
22      2          40     38     48     50     yes     yes      −5            5
23      8          63     61     65     83     yes     no       −50           7
24      12         49     54     45     65     yes     yes      1             −1
25      3          69     78     65     86     no      no       0             0
26      5          68     65     70     86     yes     no       1             1
27      2          59     54     55     65     yes     no       5             −6
28      2          72     80     54     82     no      no       0             0
29      2          64     51     64     64     yes     no       −3            0
30      3          61     49     53     56     no      no       0             0
31      2          67     79     63     90     no      no       0             0
32      6          65     56     58     76     yes     no       −2            4
33      4          70     52     55     51     yes     yes      0             0
34      3          70     86     63     90     yes     no       −3            2
35      1          64     77     50     87     yes     no       1             −1
36      8          71     76     68     81     no      no       0             0
37      9          46     54     55     77     yes     yes      6             −3
38      5          66     67     62     79     no      no       0             0

Difference(c) in bootstrap support, listed in matrix order for the 37 matrices for which it was determined: 2.7, 0.2, −4.3, −2.8, 6.8, 0.4, 1.1, 5.2, 4.6, 7.8, 6.4, 4.5, 0.6, −0.1, 1.6, 6.9, 0.2, 2.2, −0.5, −0.1, 1.0, 4.2, 1.7, −0.4, −0.4, 0.7, 1.4, 1.0, 2.6, 0.6, −0.8, 0.6, 2.0, 2.0, 0.4, 1.0, 2.9.

(a) % of potential phylogenetic information contained in gaps.
(b) IAR, including the amount of resolution.
(c) Positive values reflect increase with the addition of gaps.
CI, consistency index; RI, retention index.

Gap characters were found to have significantly less homoplasy than did base characters. When mapped on the most-parsimonious trees found by using base characters only, gap characters generally had higher corrected ensemble consistency indices than base characters. For 24 of 38 matrices (63%), gap characters had a higher corrected consistency index than did base characters, and on average the corrected consistency index was slightly higher for gap characters (0.62) than for base characters (0.58; Table 1; split-plot ANOVA: F1,34 = 5.69, P < 0.05). For 27 of 38 matrices (71%), gap characters had a higher corrected retention index than did base characters, and for 9 of 38 matrices (24%), gap characters had a lower corrected retention index than did base characters. Overall, the difference between the corrected retention index for gap and base characters was not significant (split-plot ANOVA: F1,34 = 2.01, P > 0.15), but the difference by "matrix type" interaction was (split-plot ANOVA: F3,34 = 7.68, P < 0.0005): The corrected retention index for gaps versus bases averaged 15.6% greater in exon matrices (P < 0.000005).

Inclusion of gap characters usually changed the strict consensus of the most-parsimonious trees. In 28 of 38 matrices (74%), including gap characters resulted in a change in the amount of resolution or topology of the strict consensus tree (Table 1). In 17 of 38 matrices (45%), including gap characters changed the topology of the strict consensus tree, irrespective of the amount of resolution (Table 1).
Inclusion of gap characters did not necessarily decrease the number of most-parsimonious trees (split-plot ANOVA: F1,32 = 0.02, P > 0.8) or increase the resolution of the strict consensus of the most-parsimonious trees (split-plot ANOVA: F1,34 = 0.29, P > 0.5). Overall, the number of most-parsimonious trees decreased in 36% (13) of the matrices and increased in 31% (11) of the matrices when gap characters were included (Table 1). Likewise, in 26% (10) of the matrices the resolution of the strict consensus tree increased, and in 26% (10) of the matrices the resolution of the strict consensus tree decreased (Table 1). No effect of matrix type on the change in number of most-parsimonious trees (split-plot ANOVA: F3,32 = 0.12, P > 0.9) or on the resolution of the strict consensus tree (split-plot ANOVA: F3,34 = 0.27, P > 0.8) was evident. For particular data sets (e.g., 16, 21, and 23), however, including gap characters substantially decreased the number of parsimonious trees found (by as many as 261) and increased the resolution of the strict consensus tree (by as many as 23 clades).

Inclusion of gap characters generally resulted in increased branch support as measured by bootstrap support. In 29 of 37 matrices (78%; bootstrap support values were not determined for one matrix because of the many [130] terminals, which would have resulted in prohibitively long tree-search times) bootstrap support increased on the branches in common between the strict consensus trees (Table 1). In only 8 of 37 matrices (22%) did bootstrap support decrease on the branches in common between the strict consensus trees. Overall, including gap characters produced a small (1.7%) but significant increase in average branch support per tree (split-plot ANOVA: F1,33 = 21.95, P < 0.00005). The average change in bootstrap support was more pronounced when the bootstrap support values increased rather than when they decreased (+2.2% vs. −1.2%).
Homoplasy Relative to Gap Length

Longer gaps were not found to have less homoplasy than shorter gaps. In 20 of 38 matrices (54%), the corrected consistency index was greater for single-position gaps than for gaps longer than one position, and the average corrected consistency index for single-position gaps (0.62) differed insignificantly from that for gaps longer than one position (0.624; Table 2; split-plot ANOVA: F1,34 = 1.94, P > 0.15). Likewise, in 18 of the 37 (49%) matrices with informative gaps both one position long and more than one position long, the corrected retention index was greater for single-position gaps than for gaps longer than one position (Table 2). Overall, the average corrected retention index for single-position gaps (0.656) and that for gaps longer than one position (0.657; Table 2) were not significantly different (split-plot ANOVA: F1,33 = 0.18, P > 0.6).

In exons, however, gaps longer than one position (i.e., one codon) apparently are less homoplasious than single-position gaps. In 14 of 21 exon matrices (67%) the corrected consistency index was greater for gaps longer than one position. In contrast, single-position gaps appear to have less homoplasy in non-exon regions than do gaps longer than one position. In 10 of 16 non-exon matrices (62%), the corrected consistency index was greater for single-position gaps. Thus, there is a nearly significant matrix type × gap length interaction when the corrected consistency index is analyzed (split-plot ANOVA: F1,34 = 2.63, P < 0.07). No such interaction was found for the corrected retention index.

TABLE 2. Homoplasy relative to gap length.

Matrix        No. of gaps   Corrected CI  Corrected RI  No. of gaps     Corrected CI    Corrected RI
no.     Exon  1 bp   >1 bp  1 bp   >1 bp  1 bp   >1 bp  1–2 bp  >2 bp   1–2 bp  >2 bp   1–2 bp  >2 bp
1       no    88     58     57     59     60     61     115     31      56      62      57      69
2       no    75     83     80     63     68     57     109     49      74      64      63      58
3       no    17     39     61     44     29     37     26      30      55      44      28      41
4       no    44     13     65     52     33     43     53      4       63      44      36      39
5       no    191    121    61     50     63     54     243     69      59      46      64      49
6       no    38     44     44     58     67     74     57      25      45      68      68      69
7       no    58     62     75     67     74     65     71      49      74      66      71      64
8       no    54     37     75     66     75     72     76      15      71      70      73      75
9       no    64     104    51     47     24     22     100     68      46      52      25      18
10      no    45     65     75     72     77     76     65      45      75      70      77      73
11      no    23     30     81     55     74     43     26      27      71      58      59      47
12      no    25     47     70     73     60     60     35      37      71      74      57      63
13      no    26     43     83     84     84     85     30      39      82      84      81      86
14      no    46     52     72     75     50     30     55      43      75      72      50      32
15      no    34     130    51     58     36     68     53      111     57      57      57      67
16      no    20     116    48     36     60     53     32      104     45      36      54      54
17      yes   12     24     67     61     71     66
18      yes   47     76     28     33     65     56
19      yes   16     18     74     79     82     84
20      yes   37     95     62     51     66     53
21      yes   64     125    54     59     69     68
22      yes   7      46     27     40     29     52
23      yes   14     19     54     66     78     86
24      yes   39     73     45     58     62     67
25      yes   19     19     67     90     82     92
26      yes   6      3      67     62     91     77
27      yes   1      9      50     54     80     64
28      yes   7      12     70     87     72     88
29      yes   6      7      67     44     69     63
30      yes   12     23     42     53     44     62
31      yes   5      6      82     75     88     90
32      yes   14     5      56     57     71     80
33      yes   8      13     63     48     53     50
34      yes   13     9      78     99     84     99
35      yes   8      2      73     100    87
36      yes   11     17     68     80     77     83
37      yes   24     39     57     52     75     78
38      yes   16     19     69     64     85     72

CI, consistency index; RI, retention index.

For the non-exon matrices, gaps longer than two positions were not less homoplasious than gaps one or two positions long.
In 11 of 16 matrices (69%), the corrected consistency index was greater for the shorter gaps, but the average corrected consistency index was not significantly different for gaps one or two positions long (mean 0.637) and gaps longer than two positions (mean 0.604; Table 2; split-plot ANOVA: F1,13 = 1.87, P > 0.15). In 8 of 16 matrices (50%), the corrected retention index was greater for longer gaps, but again there was no difference on average between the corrected retention indices of the two types of gaps (mean 0.575 for 1- to 2-bp gaps, 0.565 for >2-bp gaps) (Table 2; split-plot ANOVA: F1,13 = 0.15, P > 0.7).

Conclusions

Our results demonstrate the following: (1) gap characters can represent a considerable portion of the potential phylogenetic information in sequence-based matrices; (2) gap characters have significantly less homoplasy than do base characters, but the difference is slight and sometimes depends on the type of matrix; (3) including gap characters in sequence-based matrices often changes the topology or resolution of the strict consensus tree; and (4) including gap characters in sequence-based matrices often increases branch support values. Together, these results support the inclusion of gap characters in phylogenetic analyses that include sequence data from structural rDNA, ITS of rDNA, intron, or exon regions. These empirical results, in combination with the theoretical bases given for using gap characters and rigorous methodologies with which to code gap characters (Giribet and Wheeler, 1999; Simmons and Ochoterena, 2000), strongly support the use of gap characters in phylogenetic analyses.

In contrast to the assertions made by Lloyd and Calder (1991) and van Ham et al. (1994), longer gaps were not necessarily found to be better phylogenetic characters than shorter gaps (assuming that characters with less homoplasy are better phylogenetic characters).
This result challenges any attempt to weight gap characters a priori according to their length.

ACKNOWLEDGMENTS

We thank Jerrold Davis, Jeff Doyle, Damon Little, and the Doyle and Harrison Lab Groups for reviewing the manuscript and for helpful discussions; Kevin Nixon for helpful discussions; and David Hibbett, Richard Olmstead, and two anonymous reviewers for their constructive criticisms. We also thank Gilles Bena, Alessandra Bonci, James Brown, Thorsten Burmester, Elie Dassa, Stephen Downie, Tadashi Kajita, Satoru Kanai, David Krakauer, Roberta Mason-Gamer, Lucinda McDade, Hervé Philippe, Jean-Loup Risler, Douglas Soltis, Anne Baroin Tourancheau, and Miranda von Dornum for sending us the aligned sequences used in this study.

REFERENCES

ALLARD, M. W. 1994. An empirical example of parsimony behavior. Pages 231–248 in Models of phylogeny reconstruction (R. W. Scotland, D. Siebert, and D. M. Williams, eds.). Clarendon Press, Oxford.
ANDREASEN, K., B. G. BALDWIN, AND B. BREMER. 1999. Phylogenetic utility of the nuclear rDNA ITS region in subfamily Ixoroideae (Rubiaceae): Comparisons with cpDNA rbcL sequence data. Plant Syst. Evol. 217:119–135.
BARRIEL, V. 1994. Molecular phylogenies and how to code insertion/deletion events. Life Sci. 317:693–701.
BENA, G., J.-M. PROSPERI, B. LEJEUNE, AND I. OLIVIERI. 1998. Evolution of annual species of the genus Medicago: A molecular phylogenetic approach. Mol. Phylogenet. Evol. 9:552–559.
BLOMSTER, J., C. A. MAGGS, AND M. J. STANHOPE. 1999. Extensive intraspecific morphological variation in Enteromorpha muscoides (Chlorophyta) revealed by molecular analysis. J. Phycol. 35:575–586.
BONCI, A., A. CHIESURIN, P. MUSCAS, AND G. M. ROSSOLINI. 1997. Relatedness and phylogeny within the family of periplasmic chaperones involved in the assembly of pili or capsule-like structures of Gram-negative bacteria. J. Mol. Evol. 44:299–309.
BROWN, J. R., F. T. ROBB, R. WEISS, AND W. F. DOOLITTLE. 1997. Evidence for the early divergence of tryptophanyl- and tyrosyl-tRNA synthetases. J. Mol. Evol. 45:9–16.
BUDIN, K., AND H. PHILIPPE. 1998. New insights into the phylogeny of eukaryotes based on ciliate Hsp70 sequences. Mol. Biol. Evol. 15:943–956.
BURMESTER, T., H. C. MASSEY, JR., S. O. ZAKHARKIN, AND H. BENES. 1998. The evolution of hexamerins and the phylogeny of insects. J. Mol. Evol. 47:93–108.
DAVIS, J. I., M. P. SIMMONS, D. W. STEVENSON, AND J. F. WENDEL. 1998. Data decisiveness, data quality, and incongruence in phylogenetic analysis: An example from the monocotyledons using mitochondrial atpA sequences. Syst. Biol. 47:282–310.
DIAZ-LAZCOZ, Y., J.-C. AUDE, P. NITSCHKÉ, H. CHIAPELLO, C. LANDÈS-DEVAUCHELLE, AND J.-L. RISLER. 1998. Evolution of genes, evolution of species: The case of aminoacyl-tRNA synthetases. Mol. Biol. Evol. 15:1548–1561.
DOWNIE, S. R., S. RAMANATH, D. S. KATZ-DOWNIE, AND E. LLANAS. 1998. Molecular systematics of Apiaceae subfamily Apioideae: Phylogenetic analyses of nuclear ribosomal DNA internal transcribed spacer and plastid rpoC1 intron sequences. Am. J. Bot. 85:563–591.
FARRIS, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5:417–419.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
FORD, V. S., B. R. THOMAS, AND L. D. GOTTLIEB. 1995. The same duplication accounts for the PgiC genes in Clarkia xantiana and C. lewisii (Onagraceae). Syst. Bot. 20:147–160.
GIRIBET, G., AND W. C. WHEELER. 1999. On gaps. Mol. Phylogenet. Evol. 13:132–143.
GOLENBERG, E. M., M. T. CLEGG, M. L. DURBIN, J. DOEBLEY, AND D. P. MA. 1993. Evolution of a noncoding region of the chloroplast genome. Mol. Phylogenet. Evol. 2:52–64.
GOLOBOFF, P. A. 1991. Homoplasy and the choice among cladograms. Cladistics 7:215–232.
GOLOBOFF, P. A. 1993. Nona, version 1.6 (computer software and manual). Distributed by the author. Tucumán, Argentina.
GONZÁLEZ, D. 1996. Codificación de las inserciones-deleciones en el análisis filogenético de secuencias génicas. Bol. Soc. Bot. Mex. 59:115–129.
HASSANIN, A., AND E. J. P. DOUZERY. 1999. Evolutionary affinities of the enigmatic saola (Pseudoryx nghetinhensis) in the context of the molecular phylogeny of Bovidae. Proc. R. Soc. Lond. Biol. Sci. 266:893–900.
KAJITA, T., K. KAMIYA, H. TACHIDA, R. WICKNESWARI, Y. TSUMURA, H. YOSHIMARU, AND T. YAMAZAKI. 1998. Molecular phylogeny of Dipterocarpaceae in southeast Asia based on nucleotide sequences of matK, trnL intron, and trnL–trnF intergenic spacer region in chloroplast DNA. Mol. Phylogenet. Evol. 10:202–209.
KANAI, S., R. KIKUNO, H. TOH, H. RYO, AND T. TODO. 1997. Molecular evolution of the photolyase-blue-light photoreceptor family. J. Mol. Evol. 45:535–548.
KEPPEL, G. 1982. Design & analysis: A researcher's handbook, 2nd edition. Prentice-Hall, Englewood Cliffs.
KLUGE, A. G., AND J. S. FARRIS. 1969. Quantitative phyletics and the evolution of Anurans. Syst. Zool. 18:1–32.
KRAKAUER, D. C., P. M. DE A. ZANOTTO, AND M. PAGEL. 1998. Prion's progress: Patterns and rates of molecular evolution in relation to spongiform disease. J. Mol. Evol. 47:133–145.
LITTLEWOOD, D. T. J., A. B. SMITH, K. A. CLOUGH, AND R. H. EMSON. 1997. The interrelationships of the echinoderm classes: Morphological and molecular evidence. Biol. J. Linn. Soc. 61:409–438.
LLOYD, D. G., AND V. L. CALDER. 1991. Multi-residue gaps, a class of molecular characters with exceptional reliability for phylogenetic analyses. J. Evol. Biol. 4:9–21.
MADDISON, W. P., AND D. R. MADDISON. 1992. MacClade: Analysis of phylogeny and character evolution. Sinauer, Sunderland, Massachusetts.
MASON-GAMER, R. J., C. F. WEIL, AND E. A. KELLOGG. 1998. Granule-bound starch synthase: Structure, function, and phylogenetic utility. Mol. Biol. Evol. 15:1658–1673.
MCDADE, L. A., AND M. L. MOODY. 1999. Phylogenetic relationships among Acanthaceae: Evidence from noncoding trnL-trnF chloroplast DNA sequences. Am. J. Bot. 86:70–80.
NEDBAL, M. A., M. W. ALLARD, AND R. L. HONEYCUTT. 1994. Molecular systematics of hystricognath rodents: Evidence from the mitochondrial 12S rRNA gene. Mol. Phylogenet. Evol. 3:206–220.
NISHIDA, H., AND J. SUGIYAMA. 1993. Phylogenetic relationships among Taphrina, Saitoella, and other higher fungi. Mol. Biol. Evol. 10:431–436.
NIXON, K. C. 2000. WinClada, version 1.0 (computer software and manual). Distributed by the author. Cornell Univ., Ithaca, New York.
ROHLF, F. J., AND R. R. SOKAL. 1981. Statistical tables, 2nd edition. W. H. Freeman and Co., New York.
SAURIN, W., M. HOFNUNG, AND E. DASSA. 1999. Getting in or out: Early segregation between importers and exporters in the evolution of ATP-binding cassette (ABC) transporters. J. Mol. Evol. 48:22–41.
SIMMONS, M. P., AND H. OCHOTERENA. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49:369–381.
SOKAL, R. R., AND F. J. ROHLF. 1981. Biometry, 2nd edition. W. H. Freeman and Co., New York.
SOLTIS, D. E., L. A. JOHNSON, AND C. LOONEY. 1996. Discordance between ITS and chloroplast topologies in the Boykinia group (Saxifragaceae). Syst. Bot. 21:169–185.
SWOFFORD, D. L. 1998. PAUP*: Phylogenetic analysis using parsimony (* and other methods). Sinauer, Sunderland, Massachusetts.
TOURANCHEAU, A. B., E. VILLALOBO, N. TSAO, A. TORRES, AND R. E. PEARLMAN. 1998. Protein coding gene trees in ciliates: Comparison with rRNA-based phylogenies. Mol. Phylogenet. Evol. 10:299–309.
VAN DIJK, M. A. M., E. PARADIS, F. CATZEFLIS, AND W. W. DE JONG. 1999. The virtues of gaps: Xenarthran (Edentate) monophyly supported by a unique deletion in αA-crystallin. Syst. Biol. 48:94–106.
VAN HAM, R. C. H. J., H. HART, T. H. M. MES, AND J. M. SANDBRINK. 1994. Molecular evolution of noncoding regions of the chloroplast genome in the Crassulaceae and related species. Curr.
Genet. 25:558–566.
VON DORNUM, M., AND M. RUVOLO. 1999. Phylogenetic relationships of the new world monkeys (Primates, Platyrrhini) based on nuclear G6PD DNA sequences. Mol. Phylogenet. Evol. 11:459–476.

Received 12 July 2000; accepted 18 August 2000
Associate Editor: R. Olmstead

APPENDIX 1. CITATION AND CHARACTERISTICS OF THE MATRICES

Matrix                                                                                          No. of characters
no.     Citation                        Type      Locus                                         Base   Gap
1       Nedbal et al. (1994)            rDNA      mitochondrial 12S rRNA                        814    146
2       Nishida and Sugiyama (1993)     rDNA      18S rRNA                                      2021   158
3       Allard (1994)                   rDNA      mitochondrial 12S rRNA                        1000   56
4       Hassanin and Douzery (1999)     rDNA      mitochondrial 12S rRNA                        976    57
5       Littlewood et al. (1997)        rDNA      18S rRNA                                      1957   312
6       Downie et al. (1998)            ITS       ITS                                           488    82
7       Soltis et al. (1996)            ITS       ITS                                           597    120
8       Bena et al. (1998)              ITS       ITS and ETS                                   1110   91
9       Andreasen et al. (1999)         ITS       ITS                                           696    168
10      Blomster et al. (1999)          ITS       ITS                                           655    110
11      Mason-Gamer et al. (1998)       intron    granule-bound starch synthase introns         1358   53
12      Mason-Gamer et al. (1998)       intron    granule-bound starch synthase introns         1421   72
13      von Dornum and Ruvolo (1999)    intron    G6PD introns                                  1286   69
14      Kajita et al. (1998)            intron    trnL-trnF intron and spacer                   1948   98
15      McDade and Moody (1999)         intron    trnL-trnF intron and spacer                   1152   164
16      Downie et al. (1998)            intron    rpoC1 intron                                  949    136
17      Krakauer et al. (1998)          exon-DNA  prion precursor protein                       896    36
18      Budin and Philippe (1998)       exon-AA   Hsp70                                         817    123
19      Tourancheau et al. (1998)       exon-AA   phosphoglycerate kinase                       492    34
20      Kanai et al. (1997)             exon-AA   photolyase-blue-light photoreceptor family    838    132
21      Burmester et al. (1998)         exon-AA   hexamerins and hemocyanins                    971    189
22      Saurin et al. (1999)            exon-AA   ABC transporters                              416    53
23      Brown et al. (1997)             exon-AA   tryptophanyl- and tyrosyl-tRNA synthetases    184    33
24      Bonci et al. (1997)             exon-AA   periplasmic chaperone-like proteins           295    112
25      Diaz-Lazcoz et al. (1998)       exon-AA   ArgRS                                         411    38
26      Diaz-Lazcoz et al. (1998)       exon-AA   AspRS                                         218    9
27      Diaz-Lazcoz et al. (1998)       exon-AA   GluRS                                         226    10
28      Diaz-Lazcoz et al. (1998)       exon-AA   GlyRS                                         490    19
29      Diaz-Lazcoz et al. (1998)       exon-AA   HisRS                                         201    13
30      Diaz-Lazcoz et al. (1998)       exon-AA   IleRS                                         461    35
31      Diaz-Lazcoz et al. (1998)       exon-AA   LeuRS                                         267    11
32      Diaz-Lazcoz et al. (1998)       exon-AA   MetRS                                         266    19
33      Diaz-Lazcoz et al. (1998)       exon-AA   PheRS                                         271    21
34      Diaz-Lazcoz et al. (1998)       exon-AA   ProRS                                         321    22
35      Diaz-Lazcoz et al. (1998)       exon-AA   ThrRS                                         343    10
36      Diaz-Lazcoz et al. (1998)       exon-AA   TrpRS                                         245    28
37      Diaz-Lazcoz et al. (1998)       exon-AA   TrpRS and TyrRS                               233    63
38      Diaz-Lazcoz et al. (1998)       exon-AA   TyrRS                                         313    35