an expanded plastid dna phylogeny of orchidaceae and analysis of

American Journal of Botany 91(1): 149–157. 2004.
AN
EXPANDED PLASTID
ORCHIDACEAE
DNA
PHYLOGENY OF
AND ANALYSIS OF JACKKNIFE BRANCH
SUPPORT STRATEGY1
JOHN V. FREUDENSTEIN,2,6 CÁSSIO VAN DEN BERG,3
DOUGLAS H. GOLDMAN,4 PAUL J. KORES,5 MIA MOLVRAY,5
AND MARK W. CHASE5
2
Ohio State University Herbarium, Department of Evolution, Ecology, and Organismal Biology, 1315 Kinnear Rd., Columbus, Ohio
43212 USA; 3Universidade Estadual de Feira de Santana, Departamento de Ciências Biológicas, BR116 km 3 Campus Universitario,
44031-460 Feira de Santana, Bahia, Brazil; 4Harvard University Herbaria, 22 Divinity Avenue, Cambridge, Massachusetts 02138
USA; and 5Molecular Systematics Section, Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK
An expanded plastid DNA phylogeny for Orchidaceae was generated from sequences of rbcL and matK for representatives of all
five subfamilies. The data were analyzed using equally weighted parsimony, and branch support was assessed with jackknifing. The
analysis supports recognition of five subfamilies with the following relationships: (Apostasioideae (Vanilloideae (Cypripedioideae
(Orchidoideae (Epidendroideae))))). Support for many tribal-level groups within Epidendroideae is evident, but relationships among
these groups remain uncertain, probably due to a rapid radiation in the subfamily that resulted in short branches along the spine of
the tree. A series of experiments examined jackknife parameters and strategies to determine a reasonable balance between computational
effort and results. We found that support values plateau rapidly with increased search effort. Tree bisection-reconnection swapping in
a single search replicate per jackknife replicate and saving only two trees resulted in values that were close to those obtained in the
most extensive searches. Although this approach uses considerably more computational effort than less extensive (or no) swapping,
the results were also distinctly better. The effect of saving a maximal number of trees in each jackknife replicate can also be pronounced
and is important for representing support accurately.
Key words:
branch support; jackknife analysis; matK; Orchidaceae; phylogenetic analysis; plastid DNA; rbcL.
The last 10 years have seen a rapid advance in our understanding of orchid relationships, due largely to the contribution
of molecular phylogenetic studies. The phylogenetic hypotheses based on morphology set forth by Burns-Balogh and Funk
(1986), Dressler (1990a, b, c, 1993), and Szlachetko (1995),
and as reflected in their classifications, are being evaluated and
revised as necessary. The first broadly sampled molecular
analysis of Orchidaceae (Chase et al., 1994; 33 genera) was
followed by an even more extensive sampling of rbcL sequences (135 genera), the most widely used plastid DNA locus
for phylogenetic analysis. Neyland and Urbatsch (1995, 1996)
also sparsely sampled the family (31 and six genera represented, respectively) for the plastid locus ndhF, but their results and those of Chase et al. (1994) were necessarily viewed
as tentative, given the importance of sampling to accurate phylogenetic reconstruction (e.g., Graybeal, 1998; Pollock et al.,
2002). An example of a sample-dependent effect in the Neyland and Urbatsch analyses would appear to be the placement
of Orchidoideae as sister to Cypripedioideae and Epidendroideae, a relationship not supported by subsequent analyses.
Cameron et al. (1999) thus presented our first convincing
look at the molecular perspective on higher-level relationships
across the family. A significant amount of structure was revealed throughout the family, although many nodes received
relatively low (,75%) bootstrap support. Cameron et al.
(1999) analyzed their data using both equally weighted and
successively weighted parsimony, but only the latter results
were presented. Successive weighting (Farris, 1969) is based
on the principle that some characters (or at least our homology
hypotheses regarding them) are better than others at indicating
phylogenetic relationships, but recent studies have emphasized
the importance of homoplastic characters for providing tree
structure (Källersjö et al., 1999; Wenzel and Siddall, 1999),
such that it is not clear that downweighting homoplastic characters as is done in successive weighting is the best approach
for phylogenetic analysis.
At the time that cladistic methods first appeared, morphological characters predominated. Individual characters marking
particular clades were valued because characters were relatively few and each was often known in some detail. Indeed,
it was often a triumph to find a single synapomorphy to unite
a group. With ever increasing amounts of sequence data being
generated, the emphasis has shifted now from analysis of individual characters to the sheer weight of group support to
assess confidence. This assessment of support is not usually
interpreted as being statistical in the sense of estimating
‘‘truth’’ (cf. Sanderson, 1995), but gives an indication of support within the context of a particular data set.
Jackknifing has emerged as an important method to estimate
support; the suggestion that it provides the result of an infinitely repeated bootstrap analysis argues for its power (Farris
et al., 1996). Although the first jackknife analysis programs
allowed for only a fast tree construction approach without
branch swapping, subsequent programs allowed a more extensive tree search. The availability of these options raises the
question of what the optimal search strategy in jackknife analysis might be. Previous studies (DeBry and Olmstead, 2000;
Mort et al., 2000; Salamin et al., 2003) have explored these
Manuscript received 25 April 2003; revision accepted 4 September 2003.
The authors thank W. Mark Whitten, and Norris Willams for access to
sequence data and Dan Janies, Alec Pridgeon and an anonymous reviewer for
discussion and comments. This research was supported by US NSF grant
DEB-9615437 to J. V. F.
6
E-mail: [email protected].
1
149
150
AMERICAN JOURNAL
issues to some extent and with somewhat conflicting results.
The current data set provides an opportunity to extend our
understanding of this procedure.
The purposes of this study were twofold: (1) to combine
matK data with the rbcL data already available in a simultaneous analysis to produce an expanded plastid DNA phylogenetic hypothesis for the family that has a rigorous analytical
basis and (2) to explore the impact of varying search parameters on jackknife analyses.
MATERIALS AND METHODS
Many of the rbcL sequences used here have been published previously (cf.
Cameron et al., 1999), as have some of the matK sequences (cf. Goldman et
al., 2001); they are available in GenBank (Appendix; see Supplemental Data
accompanying the online version of this article). Thirty-eight new rbcL and
68 matK sequences were generated by standard polymerase chain reaction
(PCR) methods, including amplification with Taq polymerase using primers
and conditions given in Cameron et al. (1999) and Goldman et al. (2001).
Sequencing was performed using the ABI Prism cycle sequencing kits according to the manufacturer’s directions (Applied Biosystems, Foster City,
California, USA), except that 1/4 reactions were used. Reactions were run on
an ABI 377A automated sequencer (Applied Biosystems) following the manufacturer’s protocols. Sequences were assembled in AutoAssembler (Applied
Biosystems) and Sequencher (GeneCodes, Ann Arbor, Michigan, USA) and
aligned by eye; we followed the recommendations of Freudenstein and Chase
(2001) for coding gap characters in matK. New sequences have been deposited
in GenBank. Outgroups were chosen from related asparagalean families (Asteliaceae, Boryaceae, Hypoxidaceae, Lanariaceae) that have been shown to be
close relatives of Orchidaceae (e.g., Rudall et al., 1998; Chase et al., 2000).
Equally weighted parsimony analysis of the combined rbcL 1 matK (including gap characters) data was performed with WinClada (Nixon, 2002)
running NONA (Goloboff, 1993) as a daughter process; specifically, a heuristic search for the shortest trees was performed using the parsimony ratchet
(Nixon, 2002) in WinClada (200 iterations/replicate, one tree held/replicate,
20 ratchet runs). WinClada and NONA were run on a 450 MHz PC with 256
MB RAM running Windows NT. These data were also analyzed with PAUP*,
employing 2000 random addition searches, holding one tree, and saving five
trees per search replicate, then swapping extensively to obtain the complete
set of most parsimonious trees. The PAUP* analysis was run on a MacIntosh
PowerPC G3 at 400 MHz with 192 MB RAM (OS 8.6). All analyses were
performed on the data set with uninformative characters excluded.
A series of jackknife experiments was designed to examine the effects of
different search strategies on branch support values. Jackknife analyses were
performed with both WinClada/NONA and PAUP*. PAUP* was set to emulate Jac resampling (characters are deleted individually based on a specific
probability rather than deleting an overall percentage of characters; Farris et
al., 1996), character deletion was set at 37% to correspond to a probability
of 1/e that a character appears in the resampled matrix, and tree bisectionreconnection (TBR) branch swapping was employed for most analyses with
MulTrees on. One analysis was performed with nearest-neighbor interchange
(NNI) swapping to relate the results more directly to those of Mort et al.
(2000). Three variables were manipulated in the jackknife analyses: (1) the
number of jackknife replicates performed; (2) the number of search replicates
performed per jackknife replication; and (3) the number of trees saved in each
search replicate.
The results of branch support analyses such as bootstrap and jackknife vary
from run to run because they depend upon the specific trees that are generated
from the ‘‘randomly’’ assembled resampled matrices. The first experiment was
designed to assess the variance in support values with different numbers of
jackknife replicates. Jackknife values for 11 branches that varied maximally
among runs were obtained by performing jackknife analyses comprising 100,
1000, 2500, and 5000 replicates in Winclada/NONA. Each of these analyses
was repeated 10 times in order to sample the distribution of variation among
individual analyses. The mean value, standard deviation, and range for jackknife values on each of the 11 most variable clades were recorded.
OF
BOTANY
[Vol. 91
The second experiment was designed to compare the results of jackknife
analyses performed with various combinations of parameters to explore the
efficiency of search strategies. Ten thousand jackknife replicates were performed in PAUP* for each of the following search strategies: fast stepwise
(FS, which does not employ swapping), one addition search and saving one
tree per jackknife replicate (1/1), as well as 1/2, 2/1, 1/3, 3/1, and 2/2 combinations. Fourteen clades were selected for comparison based on the range
of their values in the different strategies—we selected clades that showed
maximum change among strategies. Branch support was plotted for these
clades against time required to complete the jackknife analysis. A third experiment was performed in PAUP* employing a single jackknife analysis with
10 000 jackknife replicates, each with a single addition replicate and saving
20 trees, in order to assess the effect of saving a greater number of trees.
Jackknife analysis was also performed in PAUP* on the rbcL matrix alone
(10 000 replicates, each consisting of one random addition search, holding
one tree, and saving two trees, TBR branch swapping).
RESULTS
The raw aligned data matrix comprised 2958 base positions
for 173 taxa. The incomplete ends of both rbcL and matK
sequences were excluded from the analyses as were uninformative characters, leaving 1180 informative characters to be
included in the analysis, 27% (319) from rbcL and 73% from
matK. The matK data included 18 multibase indels coded as
presence/absence or unordered multistate characters (as in
Freudenstein and Chase, 2001) appended to the sequence matrix. One sequence (Neuwiedia) was missing for matK, but all
other sequences were at least partially complete for both genes.
Analysis of jackknife parameters—The first experiment assessed the variance in values on a particular branch for various
numbers of jackknife replicates. Ten analyses of the data set
using WinClada/NONA with increasing numbers of jackknife
replicates revealed a progressive decrease in the range of values on most branches. The branches on which the greatest
range was observed had support between 52% and 87% (Table
1); none of the clades with support over 90% was among the
most variable in these analyses. The standard deviations
ranged from 1.3 to 2.4 (depending upon clade) for 1000 replication analyses, 0.7 to 1.6 for 2500 replication analyses, and
0.5 to 0.8 for 5000 replication analyses. A value of four standard deviations (equivalent to a 95% probability that a randomly drawn value will be within four standard deviations of
any other specific value for the same branch) yields a conservative guide as to what might be considered a ‘‘significant’’
change between treatments for a given number of replications
(Table 1). ‘‘Significant’’ here means that for a particular number of jackknife replications a difference is likely due to variation in search strategy parameters rather than to stochastic
variation among individual jackknife analyses. It is conservative because it accommodates even data points that are at
opposite ends of the distribution (considering that there is a
95% probability that a randomly drawn sample will be within
two standard deviations of the mean).
For 1000 replicates, the difference in support values for a
particular branch between two treatments would have to be
greater than 6–10 (depending on the clade) in order to conclude (at 95% probability level) that it was due to difference
in treatment. For 2500 replications, this value decreases to 3–
7, and for 5000 it is 2–4. Whereas branches with low support
often had a greater variance than those with higher support,
this was not always the case, such that no absolute trend was
evident (Table 1). To minimize the effects of number of jack-
January 2004]
TABLE 1.
FREUDENSTEIN
ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT
151
Summary of results of 10 jackknife analyses at each of three numbers of replicates for 10 clades.
Clade
Range
Mean
Variance
SD
4 SD
1000 replications
Corycium 1 Disa
Gennaria 1 Habenaria
Pachyplectron 1 Spiranthes
Epipactis 1 sister
Dilomilis 1 sister
Tainia 1 sister
Arundina 1 sister
Oncidium 1 sister
Cattleya 1 Meiracyllium
Restrepia 1 sister
49–57
65–72
65–71
69–74
68–74
73–79
70–76
83–87
64–69
66–73
53.7
67.9
68.4
72.2
72.3
75.5
72.3
85.6
66.4
70.0
5.6
3.7
3.6
2.6
2.7
2.5
4.5
1.8
2.7
3.6
2.4
1.9
1.9
1.6
1.6
1.6
2.1
1.3
1.6
1.9
9.4
7.6
7.6
6.5
6.5
6.3
8.4
5.4
6.6
7.5
2500 replications
Corycium 1 Disa
Gennaria 1 Habenaria
Pachyplectron 1 Spiranthes
Epipactis 1 sister
Dilomilis 1 sister
Tainia 1 sister
Arundina 1 sister
Oncidium 1 sister
Cattleya 1 Meiracyllium
Restrepia 1 sister
51–55
67–68
65–70
71–73
71–73
75–77
70–74
84–86
66–69
68–71
52.9
67.5
68.5
72.3
72.0
75.9
72.6
85.3
67.1
69.5
2.3
0.3
2.5
0.5
0.4
0.5
1.4
0.7
1.0
0.9
1.5
0.5
1.6
0.7
0.7
0.7
1.2
0.8
1.0
1.0
6.1
2.1
6.3
2.7
2.7
3.0
4.7
3.3
4.0
3.9
5000 replications
Corycium 1 Disa
Gennaria 1 Habenaria
Pachyplectron 1 Spiranthes
Epipactis 1 sister
Dilomilis 1 sister
Tainia 1 sister
Arundina 1 sister
Oncidium 1 sister
Cattleya 1 Meiracyllium
Restrepia 1 sister
52–54
67–69
68–70
71–73
71–73
74–77
72–73
84–86
66–68
68–71
53.0
67.6
68.8
72.2
72.0
75.6
72.3
85.2
67.1
69.4
0.4
0.5
0.4
0.6
0.2
0.7
0.2
0.4
0.3
0.7
0.7
0.7
0.6
0.8
0.5
0.8
0.5
0.6
0.6
0.8
2.7
2.8
2.5
3.2
1.9
3.4
1.9
2.5
2.3
3.4
knife replicates even further in treatment comparisons here,
the second experiment was performed at 10 000 replicates.
Figure 1 shows the results of varying the number of search
replications per jackknife replicate and the number of trees
saved per jackknife replicate on clades plotted against elapsed
time for the analysis performed in PAUP* using TBR and NNI
branch swapping. The data on which this table is based are
available as Supplementary Data with the online version of
this article. Often the clades that experienced the greatest
changes among strategies were those with lower support values, although even branches with support values above 90%
showed some change. In general, support increased with increased time of search, regardless of how that was achieved.
The most dramatic increase in support was between a search
with no branch swapping (fast stepwise search) and one in
which a minimum amount of swapping was employed (one
search replicate with one tree saved). This is the difference
between the FS procedure in PAUP* and an NNI analysis with
one search replicate and saving one tree per replicate (1/1).
The second greatest increase occurred between 1/1 NNI and
1/1 TBR searches. The increases were less striking beyond
that. As an example, the most substantial increase was for the
epidendroid clade, for which support increased from 60 to
96% between the FS analysis and the 1/1 TBR analysis. Half
of that increase occurred between FS and 1/1 NNI and the
second half between the latter and 1/1 TBR. In terms of increase in support per unit time, the difference between FS and
1/1 NNI is most striking; although the epidendroid clade had
an equal gain in support between FS and 1/1 NNI and 1/1
NNI and 1/1 TBR, the former represented only a 1.73 increase in analysis time, whereas the TBR search took 35 times
as long. The support increase per unit time was always greatest
for the 1/1 NNI search over FS, as compared to any other
increment of swapping effort.
A surprising difference in support is evident between the
run with two search replicates but one tree saved (the ‘‘2/1’’
analysis) and that with one search replicate saving two trees
(1/2). Although the second analysis took only 45 min longer
than the first, most clades had an abrupt but small increase in
performance (1–3%). Beyond the 2/1 or 1/2 analyses, effectiveness sometimes increased and sometimes decreased. For
five of the 14 clades, even the 2/2 analysis resulted in no
increase in support over the 2/1 analysis, although it required
almost twice as much time. For almost all others, the 2/2 analysis gave the best result; in only one case was the 2/2 result
suboptimal.
For many clades resolved in the 1/20 analysis (tree not
shown), the values were similar to those in the comparable
1/1 analysis. However, for 32 clades the support was at least
4% lower in the 1/20 analysis than in the 1/1 analysis. These
clades were at all hierarchic levels in the tree but were typically those with lower support. The highest support value that
decreased with increased tree saving was 93% in the 1/1 TBR
tree. The greatest decrease observed was 17% for the Epipactis
1 Listera 1 Neottia clade. Not every low or mid-range support value decreased, however. In no case did a clade increase
152
AMERICAN JOURNAL
OF
BOTANY
[Vol. 91
Fig. 1. Jackknife support percentages plotted against analysis time under different parameter combinations for 14 clades of Orchidaceae (for clade composition, see Figs. 2–3). FS 5 fast stepwise, NNI 5 nearest neighbor interchange swapping. The number preceding the slash is the number of random addition
searches per jackknife replicate, and the number after the slash is the number of trees saved per jackknife replicate.
by four or more points in the 1/20 analysis relative to the 1/1
analysis.
Tree topology—The parsimony analysis found 11 201 most
parsimonious trees for the rbcL 1 matK sequence data (length
5 7309; consistency index [CI] 5 0.29; retention index [RI]
5 0.59). Although 2000 random addition searches provide a
good chance of discovering multiple islands of most parsimonious trees, it is still not possible to conclude that all have
been found. This data set appeared to be a relatively easy one
to search, however, in that shortest trees were found in 616 of
the 2000 replicates. Figures 2–3 show the strict consensus of
the 11 201 most parsimonious trees with jackknife values plotted from the 1/2 jackknife analysis.
As a measure of relative resolution between the analyses of
rbcL 1 matK and rbcL alone, 133 clades are supported at the
51% or greater level in the combined jackknife tree, while only
56 clades are supported at that level in the jackknife analysis
of rbcL alone (trees not shown), indicating a substantial increase in branch support when the matK data are added. The
family as a whole is supported at the 100% level relative to
the asparagoid outgroups in both analyses. The monophyly of
each of the currently recognized subfamilies (Apostasioideae,
Vanilloideae, Cypripedioideae, Orchidoideae, Epidendroideae)
is supported at the 98% level or above in the combined analysis, which is a significant increase for some groups over the
rbcL results (Orchidoideae was 78% and Epidendroideae was
65% with rbcL alone). Resolution among Apostasioideae,
Vanilloideae, Cypripedioideae, and Orchidoideae 1 Epidendroideae is not supported in either analysis, resulting in a polytomy. The latter group is well supported (99%) in the combined analysis but not in the rbcL analysis (,51%).
Among Orchidoideae in the combined analysis, Chloraea,
Megastylis, and Pterostylis are united in a strongly supported
clade with the spiranthoid orchids (Dressler’s Spiranthinae,
Cranichidinae, and Goodyerinae) and with Pachyplectron
(92%), which is sister to Goodyerinae. Sister to this assemblage is the remainder of the diurids, comprising the clades
containing Acianthus, Microtis, Eriochilus, Cryptostylis, and
Thelymitra. Sister to all of these is a clade comprising a monophyletic Orchideae (the clade defined by Holothrix and
Ophrys), a paraphyletic Diseae (all of the groups from Disperis
to Satyrium), and, sister to all of these, Codonorchis.
Within Epidendroideae, Neottieae 1 Palmorchis is supported at 87%, and Neottieae, comprising Cephalanthera, Epipactis, Neottia, and Listera, at 100% (73% with rbcL). The
remainder of epidendroids are united at 71%. The ‘‘lower’’
epidendroids (Nervilia through Sobralia) fall as a polytomy
outside of the ‘‘higher’’ epidendroids, the latter receiving support of 95%. Within the higher epidendroids, many small
groups receive strong support ($90%), such as Coelia 1 1
Coelia 2, Isochilus 1 Ponera, Liparis 1 Malaxis (5 Malaxideae), Arpophyllum 1 Hexadesmia 1 Cattleya 1 Meiracyllium (5 Laeliinae), Phreatia 1 Appendicula 1 Trichotosia 1
Ceratostylis 1 Eria, Polystachya 1 (Cleisostoma 1 Neofinetia 1 Phalaenopsis 1 Aerangis 1 Diaphananthe 1 Aeranthes 1 Angraecum) [5 Vandeae], and the clade defined by
Bletilla and Eleorchis (5 Bletiinae, in part). The leafless and
probably achlorophyllous Wullschlaegelia is resolved as sister
to Aplectrum 1 Govenia with 71% support.
The vandoid orchids (defined by Acriopsis and Kegeliella)
are well supported (96%) in the combined analysis, but do not
appear in the rbcL support tree. Acriopsis and Thecostele comprise the sister to Cymbidium and together are strongly supported as monophyletic. These three genera are sister to the
remainder of the vandoids, which are strongly supported in the
combined analysis (99%; 71% with rbcL alone). The remainder of the cymbidioid genera fall in small clades whose relationships are unresolved, with one, Cyrtopodium, appearing as
sister to Maxillarieae (defined by Eriopsis and Kegeliella and
well supported at 97%). The cymbidioids are therefore paraphyletic. Some clades within Maxillarieae are also well supported, such as Pachyphyllum 1 Dipteranthus 1 Stellilabium
(99%), Cryptarrhena-Huntleya (100%), and Acineta-Kegeliella (99%).
January 2004]
FREUDENSTEIN
ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT
153
Fig. 2. Strict consensus tree for combined rbcL 1 matK analysis of Orchidaceae. Jackknife support percentages are shown (.50%). Subfamilies are indicated
with vertical bars. Out 5 outgroups, Apo 5 Apostasioideae, Cyp 5 Cypripedioideae, Orc 5 Orchidoideae, Epi 5 Epidendroideae.
DISCUSSION
Jackknifing—Although the current study focused on the
jackknife, we expect that the results would be similar for bootstrapping because they are conceptually related (Farris et al.,
1996) and are similar in methodology—only the sampling regime is different. Mort et al. (2000) compared bootstrap and
jackknife and attributed the differences observed to the different resampling schemes employed in each, noting that because the resampled matrix is usually smaller in jackknifing,
the branch support values may also be smaller. A similar pattern was noted by Salamin et al. (2003). Note that Mort et al.
(2000) and Salamin et al. (2003) did not employ the resampling scheme put forth by Farris et al. (1996); they used a
fixed proportion of character deletion (33 or 50%) for their
jackknife analyses, rather than a probability of 1/e for each
character to be deleted.
Hedges (1992), Mort et al. (2000), and to a lesser extent
Salamin et al. (2003) addressed the issue of the number of
replicates that should be performed to ensure precision in estimating the support value. Hedges (1992) calculated the num-
ber of replications needed to achieve a statistically accurate
estimate of the bootstrap support value based on the variance
of a binomial distribution. He determined that if one wishes
the 95% confidence range to be 1% of the support value (in
this case 95%), then 1825 bootstrap replications would need
to be performed. At a bootstrap value of 99%, 381 replications
would be required. Similarly, 8067 replicates would be needed
for a clade with support of 70%, and 9519 replicates for a
clade with 55% support. At the time that Hedges was writing,
the use of few bootstrap replications (20–100 as cited by
Hedges for several studies) was common; increased computer
speed and better heuristics now allow much more thorough
support analyses. The numbers calculated by Hedges are in
the general range of those used in our analyses and concluded
on an empirical basis to be appropriately precise estimates of
the calculated support value.
Mort et al. (2000) used empirical data, as was done here,
to examine the issue of precision. They showed that the variance for repeated runs of bootstrap and jackknife decreased
as the number of replicates increased from 100 to 5000, the
154
AMERICAN JOURNAL
Fig. 3. Strict consensus tree for combined rbcL 1 matK analysis of Orchidaceae, continued.
same effect that was demonstrated here, and that this effect is
also greater for those branches with lower support, as predicted
by the calculations of Hedges (1992). Our results were not as
conclusive on the last point because we found that some clades
with low support had lower standard deviations than others
with higher support. Although Mort et al. (2000) employed
OF
BOTANY
[Vol. 91
only NNI or no branch swapping in their analyses as opposed
to our study, the standard deviations of support for selected
branches in the different numbers of jackknife replicates were
very similar in the two studies, such that at 5000 replicates,
the standard deviations for all branches were below 1.0.
With respect to the utility of including swapping in branch
support analyses, Mort et al. (2000) reported a minimal increase in branch support values when NNI swapping was performed as opposed to none, whereas DeBry and Olmstead
(2000) found a more significant increase in support with the
addition of TBR swapping, as found here. DeBry and Olmstead (2000) recommended performing as much branch swapping as possible for bootstrapping a given data set under one’s
time constraints. This approach is effective to the extent that
increased effort yields increased benefit, but their study did
not address the actual gains made with different degrees of
swapping, a point that is addressed here. We found that while
values did tend to increase a bit beyond the results obtained
with the 1/2 analysis, the benefit incurred was so small compared to the increase in effort that it is essentially useless to
search more intensively. Because branch support values approach an asymptote, it is worthwhile evaluating this point of
diminishing returns. Given that the simplest TBR swapping
analysis (1/1) took 33 times as long as an FS analysis and that
a 1/2 analysis took 80 times as long, the increase in effort
required to do a thorough analysis is clearly substantial, but
the cost of performing a poor analysis in terms of underestimated branch support values is also substantial.
Mort et al. (2000) found that the increase in support with
increased search effort was greatest among clades with lowest
support in general, such that many strongly supported clades
were strongly supported in all analyses. This was also observed here, at least for the comparison of FS with minimal
swapping analyses. Beyond that, there was little difference
among clades with different support values.
Although various parameters can be manipulated, two key
ones are the number of search replications per jackknife replicate and the number of trees saved per jackknife replicate.
Increasing the number of search replicates per jackknife replicate is straightforward, in that it provides additional chances
of finding shorter trees. The number of trees saved per jackknife replicate has a similar effect because it increases the
number of trees that are swapped on. Another parameter that
can be varied is the number of trees held during tree search.
In preliminary analyses, we found that holding more trees during the search and saving more trees per search had a similar
effect and similar timing for the analyses. Further analyses
varying the number of trees held were not performed.
In general, the increase in branch support values was correlated with the time spent on the analysis. The increase in
support values was dramatic at first as branch swapping was
introduced, but rapidly lessened until an apparent asymptote
was approached. We conclude that beyond the 1/2 TBR analysis, the gains in support values are slight relative to search
time, such that 1/2 appears to be a reasonable strategy for
maximizing jackknife efficiency. The extent to which the effectiveness of strategy is data-dependent was not assessed
here. It is possible that not all data sets behave in exactly the
same way and therefore additional data sets should be explored. For instance, it is possible that this data set is easily
searched because there are few islands of shortest trees, while
other data sets might benefit from additional search replicates
because of a more complex tree island situation.
January 2004]
FREUDENSTEIN
ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT
There appears to be a small advantage to saving additional
trees in each search replicate over increasing the number of
search replicates for this data set. This was noticeable at two
points in Fig. 1 where strategies that required similar amounts
of time and that differed by these parameters favored increased
tree saving. We therefore suggest that saving an additional tree
may be a better choice of search strategy than adding a search
replicate.
Clearly, because of the asymptote effect, increasing the
numbers of trees saved does not continue to increase branch
support indefinitely, such that it is inefficient to perform extremely thorough search analyses. Moreover, support values
actually decrease in some cases with increased tree saving. In
searches in which increasing numbers of trees are saved, not
only are all of those trees swapped on, thus increasing effectiveness of the search, but greater numbers of trees saved will
reduce branch support because they have conflict among them.
The maximum number of trees saved in a search here was 20,
due mainly to the fact that current implementations require
swapping on all trees held (rather than just saving a large
number of most parsimonious trees), which increases search
time dramatically.
Parsimony jackknifing was first advanced as a method to
provide a rapid way of identifying the relative support of
clades in an analysis, particularly of a large data matrix (Farris
et al., 1996). Jac (Farris, 1995) is an extremely fast program
that does not employ branch swapping. PAUP* allows a similar fast search option (using ‘‘fast stepwise’’ tree construction). The trade-off for fast jackknife analyses is the reduced
values on the branches compared to analyses in which branch
swapping is employed. Xac (Farris, 1997) allows jackknife
analysis with branch swapping, as do both WinClada/NONA
and PAUP*. A principal reason for improved support values
with increased depth of tree search is that when the analyses
in each jackknife replicate do a better job of finding short
trees, there will be in general fewer different topologies possible, meaning that the jackknife consensus will have fewer
conflicts and more structure will be preserved among jackknife
replicates, resulting in higher values. As stated by Mort et al.
(2000), fewer suboptimal trees are included in better search
analyses. This is essentially the same principle behind Bremer
Support (Bremer, 1988), in which clades begin to disappear in
longer trees because there are more alternative arrangements
possible. Hence, in general, doing the best job possible of
finding short trees in each replicate will give the highest jackknife values.
Farris et al. (1996) criticized neighbor-joining (Saitou and
Nei, 1987) because the method results in a single final tree
even though more than one optimal solution may exist. However, depending upon how jackknifing is implemented, a similar criticism can be leveled at this approach. If only a single
tree is retained in each jackknife replicate, there is no way to
assess how representative that tree is as a solution for the matrix from which it was derived—only in cases for which there
truly is a single most parsimonious tree for a particular search
will it accurately summarize the topological information from
that replicate. It is arbitrary sampling to save only one or a
few trees per jackknife replicate, in a similar fashion to the
way that neighbor-joining results in only a single tree. All of
the most parsimonious trees within a jackknife replicate are
relevant to determining branch support, so a jackknife analysis
is only as good as its attempt to find and summarize those
trees. It is unfortunate that current phylogenetic analysis pro-
155
grams do not have an option for saving large numbers of most
parsimonious trees within each jackknife replicate without
having to swap through them, which greatly increases the time
of the analysis.
It is perhaps not surprising that the effect just described is
most noticeable with clades that have relatively low support,
since it is likely that those clades will be present in fewer most
parsimonious trees in each jackknife replicate than strongly
supported clades. Branch support decreased with increased tree
saving for clades that had support at or lower than 93%.
In the end, the approach to performing a branch support
analysis can be no different than searching for an optimal tree.
In each case, there is a trade-off between resources available
for the analysis and thoroughness of search. In each case, there
is a point of diminishing return beyond which one is unlikely
to improve the outcome of the analysis much by further analysis. In each case, one can get some idea of phylogenetic structure quickly, either by running a support analysis with little or
no branch swapping or by examining quickly constructed trees
that may be far from optimal. The point of a support analysis
is to identify strongly supported clades, so one should not
undertake an analysis that is more likely to underestimate support for clades, regardless of time spent. Support analyses are
an effective tool for exploring the pattern implications of a
data set when done well but should not be considered a ‘‘quick
and dirty’’ method of analysis.
Topology—The topological patterns resolved here are largely congruent with those reported in Cameron et al. (1999), but
they provide a more strongly supported hypothesis because the
matK data have now been analyzed in combination with rbcL.
Almost all of the differences between the present and Cameron
et al. (1999) topologies are for clades that have weak support
(,51%; such as the placement of Satyrium relative to Orchideae in the two analyses), so they are not considered important. The few differences for which somewhat higher support
is involved include: (1) Cymbidium and Grammatophyllum
were placed as sister genera with 75–100% support previously,
while the current analysis puts Grammatophyllum with Ansellia with 99% support, and (2) Bifrenaria and Cryptocentrum
previously were placed together with 50–74% support, but
they are not sister taxa in this analysis.
Because a detailed analysis of the rbcL topology was given
by Cameron et al. (1999), who compared those results with
previous hypotheses and classifications, that will not be repeated here. Only in cases with clear differences or refinements will taxonomic issues be addressed.
Branch support for many clades increased significantly here
relative to the Cameron et al. analysis, indicating that matK is
adding substantially to the signal present in rbcL. For example,
Lyperanthus 1 Chiloglottis 1 Calochilus 1 Thelymitra previously had support of ,75%, whereas in this analysis it is
supported at 100%. Similarly, that clade 1 Cryptostylis 1
Diuris 1 Orthoceras had support less than 50%, but here it
is 81%. The diurid clade, which is defined by Acianthus and
Lyperanthus, previously received ,50% support and now receives 100%. The Phreatia-Eria clade previously was supported at ,75%, while it now receives 93% support. Whereas
Vandeae were associated with Laeliinae, Arpophyllum, and
Polystachya in Cameron et al. (1999; with support less than
50%), now Polystachya is clearly sister to Vandeae with 96%
support. The small tribe Polystachyeae is largely Old World,
as are Vandeae. No morphological characters have been found
156
AMERICAN JOURNAL
that clearly link Polystachya to Vandeae (Freudenstein and
Rasmussen, 1999). Although Podochileae share with Vandeae
the presence of spherical stegmata (Møller and Rasmussen,
1984), no clear link between these groups was found here;
stegmata have not been found in Polystachya (Møller and Rasmussen, 1984; J. V. Freudenstein, unpublished data).
In our combined analysis the position of the vanilloids is
not well resolved, yet it is clear that the group is distinct, well
supported and not sister to the epidendroids, with which the
vanilloids have sometimes been associated. Based on jackknife
analysis of the rbcL data alone, there is no basis to reject a
sister-group relationship between vanilloids and epidendroids
because the relationships of the subfamilies are unresolved
(tree not shown).
Relationships among orchidoid genera recovered here agree
with those found by Kores et al. (2001) and Clements et al.
(2002), at least where significant branch support (.80%) was
found. The position of Codonorchis in Clements et al. (2002),
outside of Orchidoideae 1 Epidendroideae, was not well supported in that analysis. It is placed here with 99% support as
sister to the Orchideae 1 Diseae, in agreement with Kores et
al. (2001).
In the Cameron et al. (1999) analysis, particular groups of
lower epidendroids were resolved as successive sister groups
to other lower epidendroids 1 higher epidendroids. These included members of Nervilieae, Diceratosteleae, Triphoreae,
Tropidieae, Palmorchideae, Neottieae, and Sobraliinae. Their
first epidendroid bifurcation revealed Nervilia 1 Xerorchis to
be sister to the rest of the epidendroids, but with support
,50%. In our analysis, Neottieae 1 Palmorchis are shown to
be sister to the remainder of epidendroids, but with only 71%
support for the remainder. This low value is due to the presence of Palmorchis in the Neottieae clade, and that position
is somewhat tenuous. When the analysis is rerun without Palmorchis, Neottieae are still sister to the rest of the epidendroids, which then have 78% support, suggesting that this
group does in fact hold this position in the plastid DNA tree.
There is no further structure among the lower epidendroids
included here, except for Corymborkis 1 Tropidia (Tropidieae) and Elleanthus 1 Sobralia (Sobraliinae) being resolved.
These, together with Nervilia, Triphora, and Xerorchis, comprise an unresolved assemblage of lower epidendroids.
Ancistrochilus was resolved as sister to Spathoglottis in this
analysis with 100% branch support, but the relationships of
this clade to other epidendroids were unresolved. This relationship contrasts with that shown in Goldman et al. (2001),
where Ancistrochilus was allied with representatives of Coelogyneae. That study and the present one both indicate a close
relationship between Thunia and Coelogyneae, supporting
Dressler’s (1993) placement of both groups in the same tribe.
The strong support for a sister relationship between Tipularia and Calypso is in agreement with the results of Freudenstein (1994) based on morphology and Senyo and Freudenstein (2000) based on ITS and matK sequences. No close
relationship is indicated between these genera and the remainder of the Calypsoeae (represented here by Aplectrum and
Govenia), however.
Acriopsis and Thecostele were not included in the study of
Cameron et al. (1999). They were assigned to their own subtribes in Cymbideae by Dressler but are sister taxa in this
analysis and in turn are sister to Cymbidium, all together forming the sister group to the remainder of the vandoids. Support
for all of the relationships is 99–100%, providing evidence
OF
BOTANY
[Vol. 91
that these three genera would be more appropriately placed in
a single subtribe, perhaps with additional members of Dressler’s (1993) Cyrtopodiinae.
Relationships within Maxillarieae are in agreement with
those shown in Whitten et al. (2000) in most cases for which
branch support is significant. An exception is that Houlletia is
here shown to be sister to Kegeliella to the exclusion of Acineta, whereas in Whitten et al. (2000) the latter two genera
are more closely related. This may well be due to the considerable difference in sampling in this tribe in the two studies.
Few members of Laeliinae were included in this analysis,
but the sister relationship of Arpophyllum to Cattleya 1 Hexadesmia 1 Meiracyllium is consistent with the results from
van den Berg et al. (2000) based on nuclear ribosomal ITS
sequences. The analyses also agree in placing Bletia together
with Hexalectris, as also found by Goldman et al. (2001).
Perhaps the most unusual topological finding in this analysis
is the placement of Wullschlaegelia with Aplectrum and Govenia. The last two have been considered at times to be closely
related, reflected in their placement in subtribe Corallorhizinae
(Dressler, 1981; Freudenstein, 1994), although Dressler (1993)
removed Govenia to its own subtribe in a different tribe based
mainly on its different root velamen type. Wullschlaegelia has
usually been placed in the leafless tribe Gastrodieae among the
lower epidendroids, which is supported by seed type (Dressler,
1993). The small flowers have sectile pollinia (Rasmussen,
1982; Freudenstein and Rasmussen, 1997), and stipes have not
been reported. Other than sectile pollinia, seed type, and the
leafless habit, there is little in the way of morphology to link
the genus to any particular epidendroids. The first evidence
that Wullschlaegelia and Aplectrum might be closely related
was derived from the analysis of Molvray et al. (2000), based
on rbcL, matK, and 18S rDNA sequences. Our study with
increased sampling shows the same pattern. Most other members of Calypsoeae are temperate, so the tropical distribution
of Wullschlaegelia would be unusual in this tribe. This placement of Wullschlaegelia needs to be confirmed with additional
data, but the case does point out the potential for molecular
data to assist greatly in placing groups with reduced morphology.
Although the higher epidendroids were resolved in the Cameron et al. (1999) study, they were only supported by two
characters and received no support ,50%. Here, they received
95% support, confirming recognition of this clade. It is at the
base of the advanced epidendroids that the largest polytomy
occurs, with 20 clades whose relationships are unresolved in
the consensus. This is the same polytomy that was identified
in the morphological analysis of Freudenstein and Rasmussen
(1999) and the nad1 intron study of Freudenstein and Chase
(2001) and is also apparent in the rbcL-only study of Cameron
et al. (1999) if the lack of support for most of the larger clades
in the Epidendroideae is considered. This large clade comprises 96% of epidendroids and includes most of the epiphytes
and those species with specialized pollinarium structures, such
as hard pollinia and various types of stalks. The short branches
that relate these epidendroid clades to one another in all of the
data sets observed thus far were interpreted by Freudenstein
et al. (2001) to reflect a rapid radiation of epidendroid groups,
perhaps initiated by the evolution of epiphytism and specialized pollinarium structures that are keys to reproductive and
vegetative diversity. Such features predominate in the epidendroids as a whole.
January 2004]
FREUDENSTEIN
ET AL.—ORCHID PHYLOGENY AND BRANCH SUPPORT
LITERATURE CITED
BREMER, K. 1988. The limits of amino-acid sequence data in angiosperm
phylogenetic reconstruction. Evolution 42: 795–803.
BURNS-BALOGH, P., AND V. A. FUNK. 1986. A phylogenetic analysis of the
Orchidaceae. Smithsonian Contributions to Botany 61.
CAMERON, K. M., M. W. CHASE, W. M. WHITTEN, P. J. KORES, D. C. JARRELL, V. A. ALBERT, T. YUKAWA, H. G. HILLS, AND D. H. GOLDMAN.
1999. A phylogenetic analysis of the Orchidaceae: evidence from rbcL
nucleotide sequences. American Journal of Botany 86: 208–224.
CHASE, M. W., K. CAMERON, H. HILLS, AND D. JARRELL. 1994. DNA sequences and phylogenetics of the Orchidaceae and other lilioid monocots.
In A. Pridgeon [ed.], Proceedings of the Fourteenth World Orchid Conference, 61–73. Her Majesty’s Stationery Office, Glasgow, UK.
CHASE, M. W., ET AL. 2000. Higher-level systematics of the monocotyledons:
an assessment of current knowledge and a new classification. In K. L.
Wilson and D. A. Morrison [eds.], Monocots: systematics and evolution,
3–16. CSIRO Publishing, Collingwood, Victoria, Australia.
CLEMENTS, M. A., D. L. JONES, I. K. SGARMA, M. E. NIGHTINGALE, M. J.
GARRATT, K. J. FITZGERALD, A. M. MACKENZIE, AND B. P. J. MOLLOY.
2002. Phylogenetics of Diurideae (Orchidaceae) based on the internal
transcribed spacer (ITS) regions of nuclear ribosomal DNA. Lindleyana
17: 135–171.
DEBRY, R. W., AND R. G. OLMSTEAD. 2000. A simulation study of reduced
tree-search effort in boostrap resampling analysis. Systematic Biology 49:
171–179.
DRESSLER, R. L. 1981. The orchids: natural history and classification. Harvard University Press, Cambridge, Massachusetts, USA.
DRESSLER, R. L. 1990a. The major clades of the Orchidaceae—Epidendroideae. Lindleyana 5: 117–125.
DRESSLER, R. L. 1990b. The Neottieae in orchid classification. Lindleyana 5:
102–109.
DRESSLER, R. L. 1990c. The Spiranthoideae: grade or subfamily? Lindleyana
5: 110–116.
DRESSLER, R. L. 1993. Phylogeny and classification of the orchid family.
Timber Press, Portland, Oregon, USA.
FARRIS, J. S. 1969. A successive approximations approach to character
weighting. Systematic Zoology 18: 374–385.
FARRIS, J. S. 1995. Jac. Computer program distributed by the author. Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm,
Sweden.
FARRIS, J. S. 1997. Xac. Computer program distributed by the author. Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm,
Sweden.
FARRIS, J. S., V. A. ALBERT, M. KÄLLERSJÖ, D. L. LIPSCOMB, AND A. G.
KLUGE. 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12: 99–124.
FREUDENSTEIN, J. V. 1994. Gynostemium structure and relationships of the
Corallorhizinae (Orchidaceae: Epidendroideae). Plant Systematics and
Evolution 193: 1–19.
FREUDENSTEIN, J. V., AND M. W. CHASE. 2001. Analysis of mitochondrial
nad1b-c intron sequences in Orchidaceae: utility and coding of lengthchange characters. Systematic Botany 26: 643–657.
FREUDENSTEIN, J. V., AND F. N. RASMUSSEN. 1997. Sectile pollinia and relationships in the Orchidaceae. Plant Systematics and Evolution 205:
125–146.
FREUDENSTEIN, J. V., AND F. N. RASMUSSEN. 1999. What does morphology
tell us about orchid relationships? A cladistic analysis. American Journal
of Botany 86: 225–248.
FREUDENSTEIN, J. V., C. VAN DEN BERGH, W. M. WHITTEN, K. M. CAMERON,
D. H. GOLDMAN, AND M. W. CHASE. 2001. A multi-locus combined
analysis of Epidendroideae (Orchidaceae). Botany 2001 meeting, Albuquerque, New Mexico, USA. (Abstract).
GOLDMAN, D. H., J. V. FREUDENSTEIN, P. J. KORES, M. MOLVRAY, D. C.
JARRELL, W. M. WHITTEN, K. M. CAMERON, AND M. W. CHASE. 2001.
157
Phylogenetics of Arethuseae (Orchidaceae) based on plastid matK and
rbcL sequences. Systematic Botany 26: 670–695.
GOLOBOFF, P. 1993. NONA and PIWE. Version 2. American Museum of
Natural History, New York, New York, USA.
GRAYBEAL, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47: 9–17.
HEDGES, S. B. 1992. The number of replications needed for accurate estimation of the bootstrap p value in phylogenetic studies. Molecular Biology and Evolution 9: 366–369.
KÄLLERSJÖ, M., V. A. ALBERT, AND J. S. FARRIS. 1999. Homoplasy increases
phylogenetic structure. Cladistics 15: 91–93.
KORES, P. J., M. MOLVRAY, P. H. WESTON, S. D. HOPPER, A. P. BROWN, K.
M. CAMERON, AND M. W. CHASE. 2001. A phylogenetic analysis of
Diurideae (Orchidaceae) based on plastid DNA sequence data. American
Journal of Botany 88: 1903–1914.
MØLLER, J. D., AND H. RASMUSSEN. 1984. Stegmata in Orchidales: character
state distribution and polarity. Botanical Journal of the Linnean Society
89: 53–76.
MOLVRAY, M, P. J. KORES, AND M. W. CHASE. 2000. Polyphyly of mycoheterotrophic orchids and functional influences on floral and molecular
characters. In K. L. Wilson and D. A. Morrison [eds.], Monocots: systematics and evolution, 441–448. CSIRO Publishing, Collingwood, Victoria, Australia.
MORT, M. E., P. S. SOLTIS, D. E. SOLTIS, AND M. L. MABRY. 2000. Comparison of three methods for estimating internal support on phylogenetic
trees. Systematic Biology 49: 160–171.
NEYLAND, R., AND L. E. URBATSCH. 1995. A terrestrial origin for the Orchidaceae is suggested by a phylogeny inferred from ndhF chloroplast
gene sequences. Lindleyana 10: 244–251.
NEYLAND, R., AND L. E. URBATSCH. 1996. Evolution in the number and
position of fertile anthers in Orchidaceae inferred from ndhF chloroplast
gene sequences. Lindleyana 11: 47–53.
NIXON, K. C. 2002. WinClada ver. 1.00.08. Published by the author, Ithaca,
New York, USA.
POLLOCK, D. D., D. J. ZWICKL, J. A. MCGUIRE, AND D. M. HILLIS. 2002.
Increased taxon sampling is advantageous for phylogenetic inference.
Systematic Biology 51: 664–671.
RASMUSSEN, F. N. 1982. The gynostemium of the neottioid orchids. Opera
Botanica 65: 1–96.
RUDALL, P. J., M. W. CHASE, D. F. CUTLER, J. RUSBY, AND A. Y. DE BRUIJN.
1998. Anatomical and molecular systematics of Asteliaceae and Hypoxidaceae. Botanical Journal of the Linnean Society 127: 1–42.
SAITOU, N., AND M. NEI. 1987. The neighbor-joining method: a new method
for reconstructing phylogenetic trees. Molecular Biology and Evolution
4: 406–425.
SALAMIN, N, M. W. CHASE, T. R. HODKINSON, AND V. SAVOLAINEN. 2003.
Assessing internal support with large phylogenetic DNA matrices. Molecular Phylogenetics and Evolution 27: 528–539.
SANDERSON, M. J. 1995. Objections to bootstrapping phylogenies: a critique.
Systematic Biology 44: 299–320.
SENYO, D. M., AND J. V. FREUDENSTEIN. 2000. A molecular phylogeny of
Corallorhiza and related genera. Botany 2000, Portland, Oregon, USA
6–10 August 2000 (Abstract).
SZLACHETKO, D. L. 1995. Systema orchidalium. Fragmenta Floristica et
Geobotanica Supplementum 3: 1–152.
VAN DEN BERG, C., W. E. HIGGINS, R. L. DRESSLER, W. M. WHITTEN, M. A.
SOTO ARENAS, A. CULHAM, AND M. W. CHASE. 2000. A phylogenetic
analysis of Laeliinae (Orchidaceae) based on sequence data from internal
transcribed spacers (ITS) of nuclear ribosomal DNA. Lindleyana 15: 96–
114.
WENZEL, J. W., AND M. E. SIDDALL. 1999. Noise. Cladistics 15: 51–64.
WHITTEN, W. M., N. H. WILLIAMS, AND M. W. CHASE. 2000. Subtribal and
generic relationships of Maxillarieae (Orchidaceae) with emphasis on
Stanhopeinae: combined molecular evidence. American Journal of Botany 87: 1842–1856.