American Journal of Botany 91(1): 149–157. 2004. AN EXPANDED PLASTID ORCHIDACEAE DNA PHYLOGENY OF AND ANALYSIS OF JACKKNIFE BRANCH SUPPORT STRATEGY1 JOHN V. FREUDENSTEIN,2,6 CÁSSIO VAN DEN BERG,3 DOUGLAS H. GOLDMAN,4 PAUL J. KORES,5 MIA MOLVRAY,5 AND MARK W. CHASE5 2 Ohio State University Herbarium, Department of Evolution, Ecology, and Organismal Biology, 1315 Kinnear Rd., Columbus, Ohio 43212 USA; 3Universidade Estadual de Feira de Santana, Departamento de Ciências Biológicas, BR116 km 3 Campus Universitario, 44031-460 Feira de Santana, Bahia, Brazil; 4Harvard University Herbaria, 22 Divinity Avenue, Cambridge, Massachusetts 02138 USA; and 5Molecular Systematics Section, Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK An expanded plastid DNA phylogeny for Orchidaceae was generated from sequences of rbcL and matK for representatives of all five subfamilies. The data were analyzed using equally weighted parsimony, and branch support was assessed with jackknifing. The analysis supports recognition of five subfamilies with the following relationships: (Apostasioideae (Vanilloideae (Cypripedioideae (Orchidoideae (Epidendroideae))))). Support for many tribal-level groups within Epidendroideae is evident, but relationships among these groups remain uncertain, probably due to a rapid radiation in the subfamily that resulted in short branches along the spine of the tree. A series of experiments examined jackknife parameters and strategies to determine a reasonable balance between computational effort and results. We found that support values plateau rapidly with increased search effort. Tree bisection-reconnection swapping in a single search replicate per jackknife replicate and saving only two trees resulted in values that were close to those obtained in the most extensive searches. Although this approach uses considerably more computational effort than less extensive (or no) swapping, the results were also distinctly better. The effect of saving a maximal number of trees in each jackknife replicate can also be pronounced and is important for representing support accurately. Key words: branch support; jackknife analysis; matK; Orchidaceae; phylogenetic analysis; plastid DNA; rbcL. The last 10 years have seen a rapid advance in our understanding of orchid relationships, due largely to the contribution of molecular phylogenetic studies. The phylogenetic hypotheses based on morphology set forth by Burns-Balogh and Funk (1986), Dressler (1990a, b, c, 1993), and Szlachetko (1995), and as reflected in their classifications, are being evaluated and revised as necessary. The first broadly sampled molecular analysis of Orchidaceae (Chase et al., 1994; 33 genera) was followed by an even more extensive sampling of rbcL sequences (135 genera), the most widely used plastid DNA locus for phylogenetic analysis. Neyland and Urbatsch (1995, 1996) also sparsely sampled the family (31 and six genera represented, respectively) for the plastid locus ndhF, but their results and those of Chase et al. (1994) were necessarily viewed as tentative, given the importance of sampling to accurate phylogenetic reconstruction (e.g., Graybeal, 1998; Pollock et al., 2002). An example of a sample-dependent effect in the Neyland and Urbatsch analyses would appear to be the placement of Orchidoideae as sister to Cypripedioideae and Epidendroideae, a relationship not supported by subsequent analyses. Cameron et al. (1999) thus presented our first convincing look at the molecular perspective on higher-level relationships across the family. A significant amount of structure was revealed throughout the family, although many nodes received relatively low (,75%) bootstrap support. Cameron et al. (1999) analyzed their data using both equally weighted and successively weighted parsimony, but only the latter results were presented. Successive weighting (Farris, 1969) is based on the principle that some characters (or at least our homology hypotheses regarding them) are better than others at indicating phylogenetic relationships, but recent studies have emphasized the importance of homoplastic characters for providing tree structure (Källersjö et al., 1999; Wenzel and Siddall, 1999), such that it is not clear that downweighting homoplastic characters as is done in successive weighting is the best approach for phylogenetic analysis. At the time that cladistic methods first appeared, morphological characters predominated. Individual characters marking particular clades were valued because characters were relatively few and each was often known in some detail. Indeed, it was often a triumph to find a single synapomorphy to unite a group. With ever increasing amounts of sequence data being generated, the emphasis has shifted now from analysis of individual characters to the sheer weight of group support to assess confidence. This assessment of support is not usually interpreted as being statistical in the sense of estimating ‘‘truth’’ (cf. Sanderson, 1995), but gives an indication of support within the context of a particular data set. Jackknifing has emerged as an important method to estimate support; the suggestion that it provides the result of an infinitely repeated bootstrap analysis argues for its power (Farris et al., 1996). Although the first jackknife analysis programs allowed for only a fast tree construction approach without branch swapping, subsequent programs allowed a more extensive tree search. The availability of these options raises the question of what the optimal search strategy in jackknife analysis might be. Previous studies (DeBry and Olmstead, 2000; Mort et al., 2000; Salamin et al., 2003) have explored these Manuscript received 25 April 2003; revision accepted 4 September 2003. The authors thank W. Mark Whitten, and Norris Willams for access to sequence data and Dan Janies, Alec Pridgeon and an anonymous reviewer for discussion and comments. This research was supported by US NSF grant DEB-9615437 to J. V. F. 6 E-mail: [email protected]. 1 149 150 AMERICAN JOURNAL issues to some extent and with somewhat conflicting results. The current data set provides an opportunity to extend our understanding of this procedure. The purposes of this study were twofold: (1) to combine matK data with the rbcL data already available in a simultaneous analysis to produce an expanded plastid DNA phylogenetic hypothesis for the family that has a rigorous analytical basis and (2) to explore the impact of varying search parameters on jackknife analyses. MATERIALS AND METHODS Many of the rbcL sequences used here have been published previously (cf. Cameron et al., 1999), as have some of the matK sequences (cf. Goldman et al., 2001); they are available in GenBank (Appendix; see Supplemental Data accompanying the online version of this article). Thirty-eight new rbcL and 68 matK sequences were generated by standard polymerase chain reaction (PCR) methods, including amplification with Taq polymerase using primers and conditions given in Cameron et al. (1999) and Goldman et al. (2001). Sequencing was performed using the ABI Prism cycle sequencing kits according to the manufacturer’s directions (Applied Biosystems, Foster City, California, USA), except that 1/4 reactions were used. Reactions were run on an ABI 377A automated sequencer (Applied Biosystems) following the manufacturer’s protocols. Sequences were assembled in AutoAssembler (Applied Biosystems) and Sequencher (GeneCodes, Ann Arbor, Michigan, USA) and aligned by eye; we followed the recommendations of Freudenstein and Chase (2001) for coding gap characters in matK. New sequences have been deposited in GenBank. Outgroups were chosen from related asparagalean families (Asteliaceae, Boryaceae, Hypoxidaceae, Lanariaceae) that have been shown to be close relatives of Orchidaceae (e.g., Rudall et al., 1998; Chase et al., 2000). Equally weighted parsimony analysis of the combined rbcL 1 matK (including gap characters) data was performed with WinClada (Nixon, 2002) running NONA (Goloboff, 1993) as a daughter process; specifically, a heuristic search for the shortest trees was performed using the parsimony ratchet (Nixon, 2002) in WinClada (200 iterations/replicate, one tree held/replicate, 20 ratchet runs). WinClada and NONA were run on a 450 MHz PC with 256 MB RAM running Windows NT. These data were also analyzed with PAUP*, employing 2000 random addition searches, holding one tree, and saving five trees per search replicate, then swapping extensively to obtain the complete set of most parsimonious trees. The PAUP* analysis was run on a MacIntosh PowerPC G3 at 400 MHz with 192 MB RAM (OS 8.6). All analyses were performed on the data set with uninformative characters excluded. A series of jackknife experiments was designed to examine the effects of different search strategies on branch support values. Jackknife analyses were performed with both WinClada/NONA and PAUP*. PAUP* was set to emulate Jac resampling (characters are deleted individually based on a specific probability rather than deleting an overall percentage of characters; Farris et al., 1996), character deletion was set at 37% to correspond to a probability of 1/e that a character appears in the resampled matrix, and tree bisectionreconnection (TBR) branch swapping was employed for most analyses with MulTrees on. One analysis was performed with nearest-neighbor interchange (NNI) swapping to relate the results more directly to those of Mort et al. (2000). Three variables were manipulated in the jackknife analyses: (1) the number of jackknife replicates performed; (2) the number of search replicates performed per jackknife replication; and (3) the number of trees saved in each search replicate. The results of branch support analyses such as bootstrap and jackknife vary from run to run because they depend upon the specific trees that are generated from the ‘‘randomly’’ assembled resampled matrices. The first experiment was designed to assess the variance in support values with different numbers of jackknife replicates. Jackknife values for 11 branches that varied maximally among runs were obtained by performing jackknife analyses comprising 100, 1000, 2500, and 5000 replicates in Winclada/NONA. Each of these analyses was repeated 10 times in order to sample the distribution of variation among individual analyses. The mean value, standard deviation, and range for jackknife values on each of the 11 most variable clades were recorded. OF BOTANY [Vol. 91 The second experiment was designed to compare the results of jackknife analyses performed with various combinations of parameters to explore the efficiency of search strategies. Ten thousand jackknife replicates were performed in PAUP* for each of the following search strategies: fast stepwise (FS, which does not employ swapping), one addition search and saving one tree per jackknife replicate (1/1), as well as 1/2, 2/1, 1/3, 3/1, and 2/2 combinations. Fourteen clades were selected for comparison based on the range of their values in the different strategies—we selected clades that showed maximum change among strategies. Branch support was plotted for these clades against time required to complete the jackknife analysis. A third experiment was performed in PAUP* employing a single jackknife analysis with 10 000 jackknife replicates, each with a single addition replicate and saving 20 trees, in order to assess the effect of saving a greater number of trees. Jackknife analysis was also performed in PAUP* on the rbcL matrix alone (10 000 replicates, each consisting of one random addition search, holding one tree, and saving two trees, TBR branch swapping). RESULTS The raw aligned data matrix comprised 2958 base positions for 173 taxa. The incomplete ends of both rbcL and matK sequences were excluded from the analyses as were uninformative characters, leaving 1180 informative characters to be included in the analysis, 27% (319) from rbcL and 73% from matK. The matK data included 18 multibase indels coded as presence/absence or unordered multistate characters (as in Freudenstein and Chase, 2001) appended to the sequence matrix. One sequence (Neuwiedia) was missing for matK, but all other sequences were at least partially complete for both genes. Analysis of jackknife parameters—The first experiment assessed the variance in values on a particular branch for various numbers of jackknife replicates. Ten analyses of the data set using WinClada/NONA with increasing numbers of jackknife replicates revealed a progressive decrease in the range of values on most branches. The branches on which the greatest range was observed had support between 52% and 87% (Table 1); none of the clades with support over 90% was among the most variable in these analyses. The standard deviations ranged from 1.3 to 2.4 (depending upon clade) for 1000 replication analyses, 0.7 to 1.6 for 2500 replication analyses, and 0.5 to 0.8 for 5000 replication analyses. A value of four standard deviations (equivalent to a 95% probability that a randomly drawn value will be within four standard deviations of any other specific value for the same branch) yields a conservative guide as to what might be considered a ‘‘significant’’ change between treatments for a given number of replications (Table 1). ‘‘Significant’’ here means that for a particular number of jackknife replications a difference is likely due to variation in search strategy parameters rather than to stochastic variation among individual jackknife analyses. It is conservative because it accommodates even data points that are at opposite ends of the distribution (considering that there is a 95% probability that a randomly drawn sample will be within two standard deviations of the mean). For 1000 replicates, the difference in support values for a particular branch between two treatments would have to be greater than 6–10 (depending on the clade) in order to conclude (at 95% probability level) that it was due to difference in treatment. For 2500 replications, this value decreases to 3– 7, and for 5000 it is 2–4. Whereas branches with low support often had a greater variance than those with higher support, this was not always the case, such that no absolute trend was evident (Table 1). To minimize the effects of number of jack- January 2004] TABLE 1. FREUDENSTEIN ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT 151 Summary of results of 10 jackknife analyses at each of three numbers of replicates for 10 clades. Clade Range Mean Variance SD 4 SD 1000 replications Corycium 1 Disa Gennaria 1 Habenaria Pachyplectron 1 Spiranthes Epipactis 1 sister Dilomilis 1 sister Tainia 1 sister Arundina 1 sister Oncidium 1 sister Cattleya 1 Meiracyllium Restrepia 1 sister 49–57 65–72 65–71 69–74 68–74 73–79 70–76 83–87 64–69 66–73 53.7 67.9 68.4 72.2 72.3 75.5 72.3 85.6 66.4 70.0 5.6 3.7 3.6 2.6 2.7 2.5 4.5 1.8 2.7 3.6 2.4 1.9 1.9 1.6 1.6 1.6 2.1 1.3 1.6 1.9 9.4 7.6 7.6 6.5 6.5 6.3 8.4 5.4 6.6 7.5 2500 replications Corycium 1 Disa Gennaria 1 Habenaria Pachyplectron 1 Spiranthes Epipactis 1 sister Dilomilis 1 sister Tainia 1 sister Arundina 1 sister Oncidium 1 sister Cattleya 1 Meiracyllium Restrepia 1 sister 51–55 67–68 65–70 71–73 71–73 75–77 70–74 84–86 66–69 68–71 52.9 67.5 68.5 72.3 72.0 75.9 72.6 85.3 67.1 69.5 2.3 0.3 2.5 0.5 0.4 0.5 1.4 0.7 1.0 0.9 1.5 0.5 1.6 0.7 0.7 0.7 1.2 0.8 1.0 1.0 6.1 2.1 6.3 2.7 2.7 3.0 4.7 3.3 4.0 3.9 5000 replications Corycium 1 Disa Gennaria 1 Habenaria Pachyplectron 1 Spiranthes Epipactis 1 sister Dilomilis 1 sister Tainia 1 sister Arundina 1 sister Oncidium 1 sister Cattleya 1 Meiracyllium Restrepia 1 sister 52–54 67–69 68–70 71–73 71–73 74–77 72–73 84–86 66–68 68–71 53.0 67.6 68.8 72.2 72.0 75.6 72.3 85.2 67.1 69.4 0.4 0.5 0.4 0.6 0.2 0.7 0.2 0.4 0.3 0.7 0.7 0.7 0.6 0.8 0.5 0.8 0.5 0.6 0.6 0.8 2.7 2.8 2.5 3.2 1.9 3.4 1.9 2.5 2.3 3.4 knife replicates even further in treatment comparisons here, the second experiment was performed at 10 000 replicates. Figure 1 shows the results of varying the number of search replications per jackknife replicate and the number of trees saved per jackknife replicate on clades plotted against elapsed time for the analysis performed in PAUP* using TBR and NNI branch swapping. The data on which this table is based are available as Supplementary Data with the online version of this article. Often the clades that experienced the greatest changes among strategies were those with lower support values, although even branches with support values above 90% showed some change. In general, support increased with increased time of search, regardless of how that was achieved. The most dramatic increase in support was between a search with no branch swapping (fast stepwise search) and one in which a minimum amount of swapping was employed (one search replicate with one tree saved). This is the difference between the FS procedure in PAUP* and an NNI analysis with one search replicate and saving one tree per replicate (1/1). The second greatest increase occurred between 1/1 NNI and 1/1 TBR searches. The increases were less striking beyond that. As an example, the most substantial increase was for the epidendroid clade, for which support increased from 60 to 96% between the FS analysis and the 1/1 TBR analysis. Half of that increase occurred between FS and 1/1 NNI and the second half between the latter and 1/1 TBR. In terms of increase in support per unit time, the difference between FS and 1/1 NNI is most striking; although the epidendroid clade had an equal gain in support between FS and 1/1 NNI and 1/1 NNI and 1/1 TBR, the former represented only a 1.73 increase in analysis time, whereas the TBR search took 35 times as long. The support increase per unit time was always greatest for the 1/1 NNI search over FS, as compared to any other increment of swapping effort. A surprising difference in support is evident between the run with two search replicates but one tree saved (the ‘‘2/1’’ analysis) and that with one search replicate saving two trees (1/2). Although the second analysis took only 45 min longer than the first, most clades had an abrupt but small increase in performance (1–3%). Beyond the 2/1 or 1/2 analyses, effectiveness sometimes increased and sometimes decreased. For five of the 14 clades, even the 2/2 analysis resulted in no increase in support over the 2/1 analysis, although it required almost twice as much time. For almost all others, the 2/2 analysis gave the best result; in only one case was the 2/2 result suboptimal. For many clades resolved in the 1/20 analysis (tree not shown), the values were similar to those in the comparable 1/1 analysis. However, for 32 clades the support was at least 4% lower in the 1/20 analysis than in the 1/1 analysis. These clades were at all hierarchic levels in the tree but were typically those with lower support. The highest support value that decreased with increased tree saving was 93% in the 1/1 TBR tree. The greatest decrease observed was 17% for the Epipactis 1 Listera 1 Neottia clade. Not every low or mid-range support value decreased, however. In no case did a clade increase 152 AMERICAN JOURNAL OF BOTANY [Vol. 91 Fig. 1. Jackknife support percentages plotted against analysis time under different parameter combinations for 14 clades of Orchidaceae (for clade composition, see Figs. 2–3). FS 5 fast stepwise, NNI 5 nearest neighbor interchange swapping. The number preceding the slash is the number of random addition searches per jackknife replicate, and the number after the slash is the number of trees saved per jackknife replicate. by four or more points in the 1/20 analysis relative to the 1/1 analysis. Tree topology—The parsimony analysis found 11 201 most parsimonious trees for the rbcL 1 matK sequence data (length 5 7309; consistency index [CI] 5 0.29; retention index [RI] 5 0.59). Although 2000 random addition searches provide a good chance of discovering multiple islands of most parsimonious trees, it is still not possible to conclude that all have been found. This data set appeared to be a relatively easy one to search, however, in that shortest trees were found in 616 of the 2000 replicates. Figures 2–3 show the strict consensus of the 11 201 most parsimonious trees with jackknife values plotted from the 1/2 jackknife analysis. As a measure of relative resolution between the analyses of rbcL 1 matK and rbcL alone, 133 clades are supported at the 51% or greater level in the combined jackknife tree, while only 56 clades are supported at that level in the jackknife analysis of rbcL alone (trees not shown), indicating a substantial increase in branch support when the matK data are added. The family as a whole is supported at the 100% level relative to the asparagoid outgroups in both analyses. The monophyly of each of the currently recognized subfamilies (Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae, Epidendroideae) is supported at the 98% level or above in the combined analysis, which is a significant increase for some groups over the rbcL results (Orchidoideae was 78% and Epidendroideae was 65% with rbcL alone). Resolution among Apostasioideae, Vanilloideae, Cypripedioideae, and Orchidoideae 1 Epidendroideae is not supported in either analysis, resulting in a polytomy. The latter group is well supported (99%) in the combined analysis but not in the rbcL analysis (,51%). Among Orchidoideae in the combined analysis, Chloraea, Megastylis, and Pterostylis are united in a strongly supported clade with the spiranthoid orchids (Dressler’s Spiranthinae, Cranichidinae, and Goodyerinae) and with Pachyplectron (92%), which is sister to Goodyerinae. Sister to this assemblage is the remainder of the diurids, comprising the clades containing Acianthus, Microtis, Eriochilus, Cryptostylis, and Thelymitra. Sister to all of these is a clade comprising a monophyletic Orchideae (the clade defined by Holothrix and Ophrys), a paraphyletic Diseae (all of the groups from Disperis to Satyrium), and, sister to all of these, Codonorchis. Within Epidendroideae, Neottieae 1 Palmorchis is supported at 87%, and Neottieae, comprising Cephalanthera, Epipactis, Neottia, and Listera, at 100% (73% with rbcL). The remainder of epidendroids are united at 71%. The ‘‘lower’’ epidendroids (Nervilia through Sobralia) fall as a polytomy outside of the ‘‘higher’’ epidendroids, the latter receiving support of 95%. Within the higher epidendroids, many small groups receive strong support ($90%), such as Coelia 1 1 Coelia 2, Isochilus 1 Ponera, Liparis 1 Malaxis (5 Malaxideae), Arpophyllum 1 Hexadesmia 1 Cattleya 1 Meiracyllium (5 Laeliinae), Phreatia 1 Appendicula 1 Trichotosia 1 Ceratostylis 1 Eria, Polystachya 1 (Cleisostoma 1 Neofinetia 1 Phalaenopsis 1 Aerangis 1 Diaphananthe 1 Aeranthes 1 Angraecum) [5 Vandeae], and the clade defined by Bletilla and Eleorchis (5 Bletiinae, in part). The leafless and probably achlorophyllous Wullschlaegelia is resolved as sister to Aplectrum 1 Govenia with 71% support. The vandoid orchids (defined by Acriopsis and Kegeliella) are well supported (96%) in the combined analysis, but do not appear in the rbcL support tree. Acriopsis and Thecostele comprise the sister to Cymbidium and together are strongly supported as monophyletic. These three genera are sister to the remainder of the vandoids, which are strongly supported in the combined analysis (99%; 71% with rbcL alone). The remainder of the cymbidioid genera fall in small clades whose relationships are unresolved, with one, Cyrtopodium, appearing as sister to Maxillarieae (defined by Eriopsis and Kegeliella and well supported at 97%). The cymbidioids are therefore paraphyletic. Some clades within Maxillarieae are also well supported, such as Pachyphyllum 1 Dipteranthus 1 Stellilabium (99%), Cryptarrhena-Huntleya (100%), and Acineta-Kegeliella (99%). January 2004] FREUDENSTEIN ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT 153 Fig. 2. Strict consensus tree for combined rbcL 1 matK analysis of Orchidaceae. Jackknife support percentages are shown (.50%). Subfamilies are indicated with vertical bars. Out 5 outgroups, Apo 5 Apostasioideae, Cyp 5 Cypripedioideae, Orc 5 Orchidoideae, Epi 5 Epidendroideae. DISCUSSION Jackknifing—Although the current study focused on the jackknife, we expect that the results would be similar for bootstrapping because they are conceptually related (Farris et al., 1996) and are similar in methodology—only the sampling regime is different. Mort et al. (2000) compared bootstrap and jackknife and attributed the differences observed to the different resampling schemes employed in each, noting that because the resampled matrix is usually smaller in jackknifing, the branch support values may also be smaller. A similar pattern was noted by Salamin et al. (2003). Note that Mort et al. (2000) and Salamin et al. (2003) did not employ the resampling scheme put forth by Farris et al. (1996); they used a fixed proportion of character deletion (33 or 50%) for their jackknife analyses, rather than a probability of 1/e for each character to be deleted. Hedges (1992), Mort et al. (2000), and to a lesser extent Salamin et al. (2003) addressed the issue of the number of replicates that should be performed to ensure precision in estimating the support value. Hedges (1992) calculated the num- ber of replications needed to achieve a statistically accurate estimate of the bootstrap support value based on the variance of a binomial distribution. He determined that if one wishes the 95% confidence range to be 1% of the support value (in this case 95%), then 1825 bootstrap replications would need to be performed. At a bootstrap value of 99%, 381 replications would be required. Similarly, 8067 replicates would be needed for a clade with support of 70%, and 9519 replicates for a clade with 55% support. At the time that Hedges was writing, the use of few bootstrap replications (20–100 as cited by Hedges for several studies) was common; increased computer speed and better heuristics now allow much more thorough support analyses. The numbers calculated by Hedges are in the general range of those used in our analyses and concluded on an empirical basis to be appropriately precise estimates of the calculated support value. Mort et al. (2000) used empirical data, as was done here, to examine the issue of precision. They showed that the variance for repeated runs of bootstrap and jackknife decreased as the number of replicates increased from 100 to 5000, the 154 AMERICAN JOURNAL Fig. 3. Strict consensus tree for combined rbcL 1 matK analysis of Orchidaceae, continued. same effect that was demonstrated here, and that this effect is also greater for those branches with lower support, as predicted by the calculations of Hedges (1992). Our results were not as conclusive on the last point because we found that some clades with low support had lower standard deviations than others with higher support. Although Mort et al. (2000) employed OF BOTANY [Vol. 91 only NNI or no branch swapping in their analyses as opposed to our study, the standard deviations of support for selected branches in the different numbers of jackknife replicates were very similar in the two studies, such that at 5000 replicates, the standard deviations for all branches were below 1.0. With respect to the utility of including swapping in branch support analyses, Mort et al. (2000) reported a minimal increase in branch support values when NNI swapping was performed as opposed to none, whereas DeBry and Olmstead (2000) found a more significant increase in support with the addition of TBR swapping, as found here. DeBry and Olmstead (2000) recommended performing as much branch swapping as possible for bootstrapping a given data set under one’s time constraints. This approach is effective to the extent that increased effort yields increased benefit, but their study did not address the actual gains made with different degrees of swapping, a point that is addressed here. We found that while values did tend to increase a bit beyond the results obtained with the 1/2 analysis, the benefit incurred was so small compared to the increase in effort that it is essentially useless to search more intensively. Because branch support values approach an asymptote, it is worthwhile evaluating this point of diminishing returns. Given that the simplest TBR swapping analysis (1/1) took 33 times as long as an FS analysis and that a 1/2 analysis took 80 times as long, the increase in effort required to do a thorough analysis is clearly substantial, but the cost of performing a poor analysis in terms of underestimated branch support values is also substantial. Mort et al. (2000) found that the increase in support with increased search effort was greatest among clades with lowest support in general, such that many strongly supported clades were strongly supported in all analyses. This was also observed here, at least for the comparison of FS with minimal swapping analyses. Beyond that, there was little difference among clades with different support values. Although various parameters can be manipulated, two key ones are the number of search replications per jackknife replicate and the number of trees saved per jackknife replicate. Increasing the number of search replicates per jackknife replicate is straightforward, in that it provides additional chances of finding shorter trees. The number of trees saved per jackknife replicate has a similar effect because it increases the number of trees that are swapped on. Another parameter that can be varied is the number of trees held during tree search. In preliminary analyses, we found that holding more trees during the search and saving more trees per search had a similar effect and similar timing for the analyses. Further analyses varying the number of trees held were not performed. In general, the increase in branch support values was correlated with the time spent on the analysis. The increase in support values was dramatic at first as branch swapping was introduced, but rapidly lessened until an apparent asymptote was approached. We conclude that beyond the 1/2 TBR analysis, the gains in support values are slight relative to search time, such that 1/2 appears to be a reasonable strategy for maximizing jackknife efficiency. The extent to which the effectiveness of strategy is data-dependent was not assessed here. It is possible that not all data sets behave in exactly the same way and therefore additional data sets should be explored. For instance, it is possible that this data set is easily searched because there are few islands of shortest trees, while other data sets might benefit from additional search replicates because of a more complex tree island situation. January 2004] FREUDENSTEIN ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT There appears to be a small advantage to saving additional trees in each search replicate over increasing the number of search replicates for this data set. This was noticeable at two points in Fig. 1 where strategies that required similar amounts of time and that differed by these parameters favored increased tree saving. We therefore suggest that saving an additional tree may be a better choice of search strategy than adding a search replicate. Clearly, because of the asymptote effect, increasing the numbers of trees saved does not continue to increase branch support indefinitely, such that it is inefficient to perform extremely thorough search analyses. Moreover, support values actually decrease in some cases with increased tree saving. In searches in which increasing numbers of trees are saved, not only are all of those trees swapped on, thus increasing effectiveness of the search, but greater numbers of trees saved will reduce branch support because they have conflict among them. The maximum number of trees saved in a search here was 20, due mainly to the fact that current implementations require swapping on all trees held (rather than just saving a large number of most parsimonious trees), which increases search time dramatically. Parsimony jackknifing was first advanced as a method to provide a rapid way of identifying the relative support of clades in an analysis, particularly of a large data matrix (Farris et al., 1996). Jac (Farris, 1995) is an extremely fast program that does not employ branch swapping. PAUP* allows a similar fast search option (using ‘‘fast stepwise’’ tree construction). The trade-off for fast jackknife analyses is the reduced values on the branches compared to analyses in which branch swapping is employed. Xac (Farris, 1997) allows jackknife analysis with branch swapping, as do both WinClada/NONA and PAUP*. A principal reason for improved support values with increased depth of tree search is that when the analyses in each jackknife replicate do a better job of finding short trees, there will be in general fewer different topologies possible, meaning that the jackknife consensus will have fewer conflicts and more structure will be preserved among jackknife replicates, resulting in higher values. As stated by Mort et al. (2000), fewer suboptimal trees are included in better search analyses. This is essentially the same principle behind Bremer Support (Bremer, 1988), in which clades begin to disappear in longer trees because there are more alternative arrangements possible. Hence, in general, doing the best job possible of finding short trees in each replicate will give the highest jackknife values. Farris et al. (1996) criticized neighbor-joining (Saitou and Nei, 1987) because the method results in a single final tree even though more than one optimal solution may exist. However, depending upon how jackknifing is implemented, a similar criticism can be leveled at this approach. If only a single tree is retained in each jackknife replicate, there is no way to assess how representative that tree is as a solution for the matrix from which it was derived—only in cases for which there truly is a single most parsimonious tree for a particular search will it accurately summarize the topological information from that replicate. It is arbitrary sampling to save only one or a few trees per jackknife replicate, in a similar fashion to the way that neighbor-joining results in only a single tree. All of the most parsimonious trees within a jackknife replicate are relevant to determining branch support, so a jackknife analysis is only as good as its attempt to find and summarize those trees. It is unfortunate that current phylogenetic analysis pro- 155 grams do not have an option for saving large numbers of most parsimonious trees within each jackknife replicate without having to swap through them, which greatly increases the time of the analysis. It is perhaps not surprising that the effect just described is most noticeable with clades that have relatively low support, since it is likely that those clades will be present in fewer most parsimonious trees in each jackknife replicate than strongly supported clades. Branch support decreased with increased tree saving for clades that had support at or lower than 93%. In the end, the approach to performing a branch support analysis can be no different than searching for an optimal tree. In each case, there is a trade-off between resources available for the analysis and thoroughness of search. In each case, there is a point of diminishing return beyond which one is unlikely to improve the outcome of the analysis much by further analysis. In each case, one can get some idea of phylogenetic structure quickly, either by running a support analysis with little or no branch swapping or by examining quickly constructed trees that may be far from optimal. The point of a support analysis is to identify strongly supported clades, so one should not undertake an analysis that is more likely to underestimate support for clades, regardless of time spent. Support analyses are an effective tool for exploring the pattern implications of a data set when done well but should not be considered a ‘‘quick and dirty’’ method of analysis. Topology—The topological patterns resolved here are largely congruent with those reported in Cameron et al. (1999), but they provide a more strongly supported hypothesis because the matK data have now been analyzed in combination with rbcL. Almost all of the differences between the present and Cameron et al. (1999) topologies are for clades that have weak support (,51%; such as the placement of Satyrium relative to Orchideae in the two analyses), so they are not considered important. The few differences for which somewhat higher support is involved include: (1) Cymbidium and Grammatophyllum were placed as sister genera with 75–100% support previously, while the current analysis puts Grammatophyllum with Ansellia with 99% support, and (2) Bifrenaria and Cryptocentrum previously were placed together with 50–74% support, but they are not sister taxa in this analysis. Because a detailed analysis of the rbcL topology was given by Cameron et al. (1999), who compared those results with previous hypotheses and classifications, that will not be repeated here. Only in cases with clear differences or refinements will taxonomic issues be addressed. Branch support for many clades increased significantly here relative to the Cameron et al. analysis, indicating that matK is adding substantially to the signal present in rbcL. For example, Lyperanthus 1 Chiloglottis 1 Calochilus 1 Thelymitra previously had support of ,75%, whereas in this analysis it is supported at 100%. Similarly, that clade 1 Cryptostylis 1 Diuris 1 Orthoceras had support less than 50%, but here it is 81%. The diurid clade, which is defined by Acianthus and Lyperanthus, previously received ,50% support and now receives 100%. The Phreatia-Eria clade previously was supported at ,75%, while it now receives 93% support. Whereas Vandeae were associated with Laeliinae, Arpophyllum, and Polystachya in Cameron et al. (1999; with support less than 50%), now Polystachya is clearly sister to Vandeae with 96% support. The small tribe Polystachyeae is largely Old World, as are Vandeae. No morphological characters have been found 156 AMERICAN JOURNAL that clearly link Polystachya to Vandeae (Freudenstein and Rasmussen, 1999). Although Podochileae share with Vandeae the presence of spherical stegmata (Møller and Rasmussen, 1984), no clear link between these groups was found here; stegmata have not been found in Polystachya (Møller and Rasmussen, 1984; J. V. Freudenstein, unpublished data). In our combined analysis the position of the vanilloids is not well resolved, yet it is clear that the group is distinct, well supported and not sister to the epidendroids, with which the vanilloids have sometimes been associated. Based on jackknife analysis of the rbcL data alone, there is no basis to reject a sister-group relationship between vanilloids and epidendroids because the relationships of the subfamilies are unresolved (tree not shown). Relationships among orchidoid genera recovered here agree with those found by Kores et al. (2001) and Clements et al. (2002), at least where significant branch support (.80%) was found. The position of Codonorchis in Clements et al. (2002), outside of Orchidoideae 1 Epidendroideae, was not well supported in that analysis. It is placed here with 99% support as sister to the Orchideae 1 Diseae, in agreement with Kores et al. (2001). In the Cameron et al. (1999) analysis, particular groups of lower epidendroids were resolved as successive sister groups to other lower epidendroids 1 higher epidendroids. These included members of Nervilieae, Diceratosteleae, Triphoreae, Tropidieae, Palmorchideae, Neottieae, and Sobraliinae. Their first epidendroid bifurcation revealed Nervilia 1 Xerorchis to be sister to the rest of the epidendroids, but with support ,50%. In our analysis, Neottieae 1 Palmorchis are shown to be sister to the remainder of epidendroids, but with only 71% support for the remainder. This low value is due to the presence of Palmorchis in the Neottieae clade, and that position is somewhat tenuous. When the analysis is rerun without Palmorchis, Neottieae are still sister to the rest of the epidendroids, which then have 78% support, suggesting that this group does in fact hold this position in the plastid DNA tree. There is no further structure among the lower epidendroids included here, except for Corymborkis 1 Tropidia (Tropidieae) and Elleanthus 1 Sobralia (Sobraliinae) being resolved. These, together with Nervilia, Triphora, and Xerorchis, comprise an unresolved assemblage of lower epidendroids. Ancistrochilus was resolved as sister to Spathoglottis in this analysis with 100% branch support, but the relationships of this clade to other epidendroids were unresolved. This relationship contrasts with that shown in Goldman et al. (2001), where Ancistrochilus was allied with representatives of Coelogyneae. That study and the present one both indicate a close relationship between Thunia and Coelogyneae, supporting Dressler’s (1993) placement of both groups in the same tribe. The strong support for a sister relationship between Tipularia and Calypso is in agreement with the results of Freudenstein (1994) based on morphology and Senyo and Freudenstein (2000) based on ITS and matK sequences. No close relationship is indicated between these genera and the remainder of the Calypsoeae (represented here by Aplectrum and Govenia), however. Acriopsis and Thecostele were not included in the study of Cameron et al. (1999). They were assigned to their own subtribes in Cymbideae by Dressler but are sister taxa in this analysis and in turn are sister to Cymbidium, all together forming the sister group to the remainder of the vandoids. Support for all of the relationships is 99–100%, providing evidence OF BOTANY [Vol. 91 that these three genera would be more appropriately placed in a single subtribe, perhaps with additional members of Dressler’s (1993) Cyrtopodiinae. Relationships within Maxillarieae are in agreement with those shown in Whitten et al. (2000) in most cases for which branch support is significant. An exception is that Houlletia is here shown to be sister to Kegeliella to the exclusion of Acineta, whereas in Whitten et al. (2000) the latter two genera are more closely related. This may well be due to the considerable difference in sampling in this tribe in the two studies. Few members of Laeliinae were included in this analysis, but the sister relationship of Arpophyllum to Cattleya 1 Hexadesmia 1 Meiracyllium is consistent with the results from van den Berg et al. (2000) based on nuclear ribosomal ITS sequences. The analyses also agree in placing Bletia together with Hexalectris, as also found by Goldman et al. (2001). Perhaps the most unusual topological finding in this analysis is the placement of Wullschlaegelia with Aplectrum and Govenia. The last two have been considered at times to be closely related, reflected in their placement in subtribe Corallorhizinae (Dressler, 1981; Freudenstein, 1994), although Dressler (1993) removed Govenia to its own subtribe in a different tribe based mainly on its different root velamen type. Wullschlaegelia has usually been placed in the leafless tribe Gastrodieae among the lower epidendroids, which is supported by seed type (Dressler, 1993). The small flowers have sectile pollinia (Rasmussen, 1982; Freudenstein and Rasmussen, 1997), and stipes have not been reported. Other than sectile pollinia, seed type, and the leafless habit, there is little in the way of morphology to link the genus to any particular epidendroids. The first evidence that Wullschlaegelia and Aplectrum might be closely related was derived from the analysis of Molvray et al. (2000), based on rbcL, matK, and 18S rDNA sequences. Our study with increased sampling shows the same pattern. Most other members of Calypsoeae are temperate, so the tropical distribution of Wullschlaegelia would be unusual in this tribe. This placement of Wullschlaegelia needs to be confirmed with additional data, but the case does point out the potential for molecular data to assist greatly in placing groups with reduced morphology. Although the higher epidendroids were resolved in the Cameron et al. (1999) study, they were only supported by two characters and received no support ,50%. Here, they received 95% support, confirming recognition of this clade. It is at the base of the advanced epidendroids that the largest polytomy occurs, with 20 clades whose relationships are unresolved in the consensus. This is the same polytomy that was identified in the morphological analysis of Freudenstein and Rasmussen (1999) and the nad1 intron study of Freudenstein and Chase (2001) and is also apparent in the rbcL-only study of Cameron et al. (1999) if the lack of support for most of the larger clades in the Epidendroideae is considered. This large clade comprises 96% of epidendroids and includes most of the epiphytes and those species with specialized pollinarium structures, such as hard pollinia and various types of stalks. The short branches that relate these epidendroid clades to one another in all of the data sets observed thus far were interpreted by Freudenstein et al. (2001) to reflect a rapid radiation of epidendroid groups, perhaps initiated by the evolution of epiphytism and specialized pollinarium structures that are keys to reproductive and vegetative diversity. Such features predominate in the epidendroids as a whole. January 2004] FREUDENSTEIN ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT LITERATURE CITED BREMER, K. 1988. The limits of amino-acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42: 795–803. BURNS-BALOGH, P., AND V. A. FUNK. 1986. A phylogenetic analysis of the Orchidaceae. Smithsonian Contributions to Botany 61. CAMERON, K. M., M. W. CHASE, W. M. WHITTEN, P. J. KORES, D. C. JARRELL, V. A. ALBERT, T. YUKAWA, H. G. HILLS, AND D. H. GOLDMAN. 1999. A phylogenetic analysis of the Orchidaceae: evidence from rbcL nucleotide sequences. American Journal of Botany 86: 208–224. CHASE, M. W., K. CAMERON, H. HILLS, AND D. JARRELL. 1994. DNA sequences and phylogenetics of the Orchidaceae and other lilioid monocots. In A. Pridgeon [ed.], Proceedings of the Fourteenth World Orchid Conference, 61–73. Her Majesty’s Stationery Office, Glasgow, UK. CHASE, M. W., ET AL. 2000. Higher-level systematics of the monocotyledons: an assessment of current knowledge and a new classification. In K. L. Wilson and D. A. Morrison [eds.], Monocots: systematics and evolution, 3–16. CSIRO Publishing, Collingwood, Victoria, Australia. CLEMENTS, M. A., D. L. JONES, I. K. SGARMA, M. E. NIGHTINGALE, M. J. GARRATT, K. J. FITZGERALD, A. M. MACKENZIE, AND B. P. J. MOLLOY. 2002. Phylogenetics of Diurideae (Orchidaceae) based on the internal transcribed spacer (ITS) regions of nuclear ribosomal DNA. Lindleyana 17: 135–171. DEBRY, R. W., AND R. G. OLMSTEAD. 2000. A simulation study of reduced tree-search effort in boostrap resampling analysis. Systematic Biology 49: 171–179. DRESSLER, R. L. 1981. The orchids: natural history and classification. Harvard University Press, Cambridge, Massachusetts, USA. DRESSLER, R. L. 1990a. The major clades of the Orchidaceae—Epidendroideae. Lindleyana 5: 117–125. DRESSLER, R. L. 1990b. The Neottieae in orchid classification. Lindleyana 5: 102–109. DRESSLER, R. L. 1990c. The Spiranthoideae: grade or subfamily? Lindleyana 5: 110–116. DRESSLER, R. L. 1993. Phylogeny and classification of the orchid family. Timber Press, Portland, Oregon, USA. FARRIS, J. S. 1969. A successive approximations approach to character weighting. Systematic Zoology 18: 374–385. FARRIS, J. S. 1995. Jac. Computer program distributed by the author. Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm, Sweden. FARRIS, J. S. 1997. Xac. Computer program distributed by the author. Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm, Sweden. FARRIS, J. S., V. A. ALBERT, M. KÄLLERSJÖ, D. L. LIPSCOMB, AND A. G. KLUGE. 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12: 99–124. FREUDENSTEIN, J. V. 1994. Gynostemium structure and relationships of the Corallorhizinae (Orchidaceae: Epidendroideae). Plant Systematics and Evolution 193: 1–19. FREUDENSTEIN, J. V., AND M. W. CHASE. 2001. Analysis of mitochondrial nad1b-c intron sequences in Orchidaceae: utility and coding of lengthchange characters. Systematic Botany 26: 643–657. FREUDENSTEIN, J. V., AND F. N. RASMUSSEN. 1997. Sectile pollinia and relationships in the Orchidaceae. Plant Systematics and Evolution 205: 125–146. FREUDENSTEIN, J. V., AND F. N. RASMUSSEN. 1999. What does morphology tell us about orchid relationships? A cladistic analysis. American Journal of Botany 86: 225–248. FREUDENSTEIN, J. V., C. VAN DEN BERGH, W. M. WHITTEN, K. M. CAMERON, D. H. GOLDMAN, AND M. W. CHASE. 2001. A multi-locus combined analysis of Epidendroideae (Orchidaceae). Botany 2001 meeting, Albuquerque, New Mexico, USA. (Abstract). GOLDMAN, D. H., J. V. FREUDENSTEIN, P. J. KORES, M. MOLVRAY, D. C. JARRELL, W. M. WHITTEN, K. M. CAMERON, AND M. W. CHASE. 2001. 157 Phylogenetics of Arethuseae (Orchidaceae) based on plastid matK and rbcL sequences. Systematic Botany 26: 670–695. GOLOBOFF, P. 1993. NONA and PIWE. Version 2. American Museum of Natural History, New York, New York, USA. GRAYBEAL, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47: 9–17. HEDGES, S. B. 1992. The number of replications needed for accurate estimation of the bootstrap p value in phylogenetic studies. Molecular Biology and Evolution 9: 366–369. KÄLLERSJÖ, M., V. A. ALBERT, AND J. S. FARRIS. 1999. Homoplasy increases phylogenetic structure. Cladistics 15: 91–93. KORES, P. J., M. MOLVRAY, P. H. WESTON, S. D. HOPPER, A. P. BROWN, K. M. CAMERON, AND M. W. CHASE. 2001. A phylogenetic analysis of Diurideae (Orchidaceae) based on plastid DNA sequence data. American Journal of Botany 88: 1903–1914. MØLLER, J. D., AND H. RASMUSSEN. 1984. Stegmata in Orchidales: character state distribution and polarity. Botanical Journal of the Linnean Society 89: 53–76. MOLVRAY, M, P. J. KORES, AND M. W. CHASE. 2000. Polyphyly of mycoheterotrophic orchids and functional influences on floral and molecular characters. In K. L. Wilson and D. A. Morrison [eds.], Monocots: systematics and evolution, 441–448. CSIRO Publishing, Collingwood, Victoria, Australia. MORT, M. E., P. S. SOLTIS, D. E. SOLTIS, AND M. L. MABRY. 2000. Comparison of three methods for estimating internal support on phylogenetic trees. Systematic Biology 49: 160–171. NEYLAND, R., AND L. E. URBATSCH. 1995. A terrestrial origin for the Orchidaceae is suggested by a phylogeny inferred from ndhF chloroplast gene sequences. Lindleyana 10: 244–251. NEYLAND, R., AND L. E. URBATSCH. 1996. Evolution in the number and position of fertile anthers in Orchidaceae inferred from ndhF chloroplast gene sequences. Lindleyana 11: 47–53. NIXON, K. C. 2002. WinClada ver. 1.00.08. Published by the author, Ithaca, New York, USA. POLLOCK, D. D., D. J. ZWICKL, J. A. MCGUIRE, AND D. M. HILLIS. 2002. Increased taxon sampling is advantageous for phylogenetic inference. Systematic Biology 51: 664–671. RASMUSSEN, F. N. 1982. The gynostemium of the neottioid orchids. Opera Botanica 65: 1–96. RUDALL, P. J., M. W. CHASE, D. F. CUTLER, J. RUSBY, AND A. Y. DE BRUIJN. 1998. Anatomical and molecular systematics of Asteliaceae and Hypoxidaceae. Botanical Journal of the Linnean Society 127: 1–42. SAITOU, N., AND M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406–425. SALAMIN, N, M. W. CHASE, T. R. HODKINSON, AND V. SAVOLAINEN. 2003. Assessing internal support with large phylogenetic DNA matrices. Molecular Phylogenetics and Evolution 27: 528–539. SANDERSON, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Systematic Biology 44: 299–320. SENYO, D. M., AND J. V. FREUDENSTEIN. 2000. A molecular phylogeny of Corallorhiza and related genera. Botany 2000, Portland, Oregon, USA 6–10 August 2000 (Abstract). SZLACHETKO, D. L. 1995. Systema orchidalium. Fragmenta Floristica et Geobotanica Supplementum 3: 1–152. VAN DEN BERG, C., W. E. HIGGINS, R. L. DRESSLER, W. M. WHITTEN, M. A. SOTO ARENAS, A. CULHAM, AND M. W. CHASE. 2000. A phylogenetic analysis of Laeliinae (Orchidaceae) based on sequence data from internal transcribed spacers (ITS) of nuclear ribosomal DNA. Lindleyana 15: 96– 114. WENZEL, J. W., AND M. E. SIDDALL. 1999. Noise. Cladistics 15: 51–64. WHITTEN, W. M., N. H. WILLIAMS, AND M. W. CHASE. 2000. Subtribal and generic relationships of Maxillarieae (Orchidaceae) with emphasis on Stanhopeinae: combined molecular evidence. American Journal of Botany 87: 1842–1856.
© Copyright 2024 Paperzz