Journal of Systematics and Evolution
47 (5): 349–368 (2009)
doi: 10.1111/j.1759-6831.2009.00044.x
Estimating ancestral distributions of lineages with uncertain sister
groups: a statistical approach to Dispersal–Vicariance Analysis
and a case using Aesculus L. (Sapindaceae) including fossils
A.J. HARRIS∗
Qiu-Yun (Jenny) XIANG∗
(North Carolina State University, Department of Plant Biology, Raleigh, North Carolina, USA)
Abstract We propose a simple statistical approach for using Dispersal–Vicariance Analysis (DIVA) software to
infer biogeographic histories without fully bifurcating trees. In this approach, ancestral ranges are first optimized
for a sample of Bayesian trees. The probability P of an ancestral range r at a node is then calculated as
$P(r_Y) = \sum_{t=1}^{n} F(r_Y)_t P_t$, where Y is a node, F(r_Y) is the frequency of range r among all the optimal solutions resulting
from DIVA optimization at node Y, t is one of n topologies optimized, and P_t is the probability of topology t.
Node Y is a hypothesized ancestor shared by a specific crown lineage and the sister of that lineage “x”, where x
may vary due to phylogenetic uncertainty (polytomies and nodes with posterior probability <100%). Using this
method, the ancestral distribution at Y can be estimated to provide inference of the geographic origins of the
specific crown group of interest. This approach takes into account phylogenetic uncertainty as well as uncertainty
from DIVA optimization. It is an extension of the previously described method called Bayes-DIVA, which pairs
Bayesian phylogenetic analysis with biogeographic analysis using DIVA. Further, we show that the probability P of
an ancestral range at Y calculated using this method does not equate to pp∗ F(rY ) on the Bayesian consensus tree
when both variables are <100%, where pp is the posterior probability and F(rY ) is the frequency of range r for the
node containing the specific crown group. We tested our DIVA-Bayes approach using Aesculus L., which has major
lineages unresolved as a polytomy. We inferred the most probable geographic origins of the five traditional sections
of Aesculus and of Aesculus californica Nutt. and examined range subdivisions at parental nodes of these lineages.
Additionally, we used the DIVA-Bayes data from Aesculus to quantify the effects on biogeographic inference of
including two wildcard fossil taxa in phylogenetic analysis. Our analysis resolved the geographic ranges of the
parental nodes of the lineages of Aesculus with moderate to high probabilities. The probabilities were greater than
those estimated using the simple calculation of pp∗ F(ry ) at a statistically significant level for two of the six lineages.
We also found that adding fossil wildcard taxa in phylogenetic analysis generally increased P for ancestral ranges
including the fossil’s distribution area. The increase in P was more dramatic for ranges that include the area of a wildcard
fossil with a distribution area underrepresented among extant taxa. This indicates the importance of including fossils
in biogeographic analysis. Examination of range subdivision at the parental nodes revealed potential range evolution
(extinction and dispersal events) along the stems of A. californica and sect. Parryana.
Key words Aesculus, biogeography, DIVA, fossil wildcards, MrBayes, phylogenetic uncertainty.
Studies in historical biogeography based on phylogeny have accumulated rapidly due to the recent increase in availability of molecular phylogenetic data
(see Xiang et al., 1998a, 2004, 2005, 2006; Wen, 1999;
Sanmartín et al., 2001; Donoghue & Smith, 2004;
Sanmartín & Ronquist, 2004; Soltis et al., 2006). One
of the most widely used methods of inferring biogeographic histories based on phylogeny is Dispersal–
Vicariance Analysis (DIVA) (Ronquist, 1997, 2001).
Received: 11 March 2009 Accepted: 19 June 2009
∗ Authors for correspondence. A.J. Harris E-mail: <[email protected]>;
Tel.: 1-336-6842314. Jenny Xiang E-mail: <[email protected]>;
Tel.: 1-919-5152728; Fax: 1-919-5153436.
© 2009 Institute of Botany, Chinese Academy of Sciences
DIVA is a method of reconstructing biogeographic
history that falls under the broad heading of event-based methods, in which biogeographic processes that
help drive speciation are incorporated a priori into the
methodology (Ronquist, 1996, 1997; Sanmartín et al.,
2001). Specifically, DIVA uses a parsimony approach
that minimizes extinctions and dispersals and assumes
vicariance as the null hypothesis (Ronquist, 1996). The
program estimates distributions of hypothesized ancestors at internal nodes on a fully bifurcating phylogenetic tree based on the distributions of terminal taxa
(Ronquist, 1996). Results of biogeographic analysis using DIVA are optimized ancestral ranges at each internal node under the parsimony criterion. Frequently,
multiple equally parsimonious biogeographic pathways
(MP pathways) are obtained from a given tree, and these
are summarized as multiple optimal solutions at some
or all internal nodes of the tree. Although new model-based likelihood and Bayesian methods of reconstructing biogeographic histories have recently been developed (Ree et al., 2005; Ree & Smith, 2008; Sanmartín
et al., 2008; also Lemmon & Lemmon, 2008), a quick,
advanced search using Google Scholar for 2008 published reports containing the words “biogeography” and
“DIVA” illustrates that DIVA continues to be widely
used in historical biogeographic studies. The primary
advantage of DIVA over the likelihood method of Ree
et al. (2005) is that less prior information is required
(Ree et al., 2005; Ree & Smith, 2008). DIVA is also
fast, simple, and user-friendly and gives results congruent to the model-based likelihood method Lagrange
(http://code.google.com/p/lagrange/) for most lineages
that have been compared (Ree et al., 2005; Burbrink &
Lawson, 2007; Ree & Smith, 2008; Velazco &
Patterson, 2008; Xiang & Thomas, 2008; Xiang et al.,
2009) when analyses using DIVA included outgroups
that are not widely distributed or the root range was
used for area coding for outgroups at higher rank than
species (see Ronquist, 1996).
Running the DIVA program requires that two parameters are defined: the phylogeny and the distributions of terminal taxa. Aside from any questions that
might arise regarding the underlying assumptions implemented in the program, uncertainty in the results
of DIVA arises from two areas: phylogenetic uncertainty and uncertainty in DIVA optimization. Biogeographic reconstruction using DIVA is typically carried
out using a single tree topology, namely the author’s “best” tree
representing the true phylogeny (e.g., Fiz et al., 2008;
Jeandroz et al., 2008; Lim, 2008). The single tree approach is a common practice in phylogenetic biogeography using many methods including Component analysis
(Page, 1993a, 1993b), Bremer’s ancestral area analysis
(Bremer, 1992), and the model-based likelihood methods of Ree et al. (2005) and Ree & Smith (2008). Of the
five reports published in the American Journal of Botany
and Systematic Biology in 2008, in which a primary
research goal was to reconstruct historical biogeography, all five used DIVA; four used a single tree (Calviño
et al., 2008; Hines, 2008; Huttunen et al., 2008; Mansion
et al., 2008) and one showed that alternative resolutions
of polytomies had no effect on biogeographic reconstruction (Mast et al., 2008). Using a single tree rarely
accounts for the full range of possible, slightly less optimal topologies given the data. Additionally, the “best”
phylogeny is not always fully resolved or strongly supported for all nodes; some clades may be weakly supported or there may be polytomies. Polytomies are particularly
problematic. The backbone phylogeny used in
DIVA analysis must be fully bifurcating as the program
is unable to accept polytomies, but polytomies present
a problem for most methods of biogeographic analysis
using phylogeny, as reconstruction necessarily breaks
down at these unresolved nodes. The other area of uncertainty from DIVA is the multiple, equally parsimonious biogeographic scenarios for a given phylogeny.
The program does not provide any quantifiable method
of selecting between the multiple possibilities. However,
authors can use information from area connections and
divergence times to rule out certain hypotheses or to
favor one hypothesis over another, as also discussed by
Ronquist (1996). Both types of uncertainty in DIVA
have been recognized and handled by Nylander et al.
(2008) using posterior probabilities (pp).
Nylander et al. (2008) recently showed the utility
of a probabilistic approach to DIVA in reconstructing
the biogeographic history of the avian genus Turdus L.
Specifically, they optimized 20,000 Bayesian trees in
DIVA and used the results of these optimizations to
determine the marginal distributions of alternative ancestral ranges at each node of interest, dependent on
the node’s occurrence in the sampled topologies. Thus,
alternative ancestral ranges at each node in the tree
(Fig. 1a of Nylander et al., 2008) can be assumed to
have a probability equal to the product of the clade pp
(phylogenetic uncertainty) and the occurrence of the alternative ranges for the clade in DIVA (the uncertainty
in the biogeographic reconstruction). The occurrence of
each alternative range was determined as a fraction of all
optimal ranges; that is, for a given tree, a node with three
optimal ancestral ranges “A, B, or AB”, the occurrence
of each range was recorded “A:1/3, B:1/3, AB:1/3”. This
approach accounts for both uncertainty in the location
of a node in the broader tree topology (i.e., phylogenetic
uncertainty) and uncertainty in ancestral range reconstructions (multiple, equally parsimonious DIVA optimizations). Nylander et al. (2008) referred to this as a
Bayes-DIVA analysis. Using a subset of Bayesian trees
to account for uncertainty in phylogeny has been used
before (e.g., Lutzoni et al., 2001; Pagel et al., 2004). In
biogeography, this methodology was also suggested by
Lemmon and Lemmon (2008) and was previously used
by Huelsenbeck and Imennov (2002). Nylander et al.
(2008) were the first to apply this approach with
DIVA.
Here, we extend the Bayes-DIVA method to allow
estimation of the geographic origin of a lineage in a
polytomy. We first redefined the node of interest as the parent node
(parent node, hereafter) of a crown group node, where
a crown group node (crown node, hereafter) represents
the last shared common ancestor of all constituents of
HARRIS & XIANG: Statistical approach to using DIVA
Fig. 1. Graphical explanation of parent nodes, crown nodes, and unspecified sister groups. A, Hypothetical phylogeny containing well-supported
crown groups marked by triangular symbols and incomplete resolution of
relationships among them. Open circles indicate crown nodes of crown
groups 1–4. Closed circles indicate parent nodes (node, sensu this study).
Numbered parent nodes corresponding to numbered crown groups. B,
Unspecified sister groups (x) for crown groups 1–4. Node numbers in
closed circles correspond to those in A.
a crown group with an undefined sister (x) (Fig. 1).
Therefore, the parent node is inherently present on every tree in the posterior distribution of phylogenetic
trees in which the crown group occurs, regardless of the
relationship of the crown group to other groups. Using this definition allows for estimation of the ancestral
range of the stem lineage of a highly supported terminal
taxon or crown group even if the lineage is resolved
as a member of a polytomy in the phylogeny (Fig. 1).
The probability (P) of an ancestral range r at a node of
interest is calculated as
$$P(r_Y) = \sum_{t=1}^{n} F(r_Y)_t \, P_t \qquad (1)$$
where Y is the parent node, t is one of the randomly
selected Bayesian trees, n is the total number of sampled
trees, F(rY )t is the occurrence of an ancestral range r at
node Y for tree t, and Pt is the probability of tree t,
which is the proportion of the tree in the pool of the
sampled trees (which can be extended to the proportion
of the tree in the pool of the entire posterior distribution
of trees). F(rY ) is calculated as the actual frequency of
r within the pool of biogeographic pathways optimized
using DIVA for each sampled tree: $F(r_Y) = i / R_t$.
The value i is the number of times a range (r) occurs
in the total number of MP pathways (Rt ) over the tree.
The actual frequency can be obtained by using the command “printrecs” in DIVA. An alternative estimation of
F(rY ) is using the method of Nylander et al. (2008) as
1/N, where N is the total number of alternative ancestral distributions at node Y . An example of this method
of probability calculation and both methods of deriving
F(rY ) are illustrated in Fig. 2. This revised Bayes-DIVA
approach can provide statistical confidence on inferred
biogeographic origins of lineages of interest with unresolved or poorly supported phylogenetic placement,
for which the traditional DIVA analysis or the Bayes-DIVA approach used by Nylander et al. (2008) are
uninformative.
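The calculation in Equation 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' spreadsheet: the function names, the input layout (one list of node-Y ranges per sampled tree, one entry per MP pathway), and the toy data are all assumptions made here. Each sampled tree is given P_t = 1/n, so a topology that appears several times in the sample is weighted by its proportion in the pool.

```python
from collections import defaultdict
from fractions import Fraction


def range_frequencies(pathway_ranges, method="actual"):
    """Return F(r_Y) for one tree as a dict mapping range -> frequency.

    pathway_ranges holds the range found at node Y in each MP pathway
    of the tree (hypothetical input layout).
    method="actual" uses i / R_t; method="equal" uses 1 / N.
    """
    if method == "actual":
        total = len(pathway_ranges)  # R_t, the number of MP pathways
        freqs = defaultdict(Fraction)
        for r in pathway_ranges:
            freqs[r] += Fraction(1, total)
        return dict(freqs)
    if method == "equal":
        distinct = sorted(set(pathway_ranges))  # N alternative ranges
        return {r: Fraction(1, len(distinct)) for r in distinct}
    raise ValueError("method must be 'actual' or 'equal'")


def ancestral_range_probabilities(trees, method="actual"):
    """Equation 1: sum F(r_Y)_t * P_t over trees, with P_t = 1/n."""
    n = len(trees)
    probs = defaultdict(Fraction)
    for pathway_ranges in trees:
        for r, f in range_frequencies(pathway_ranges, method).items():
            probs[r] += f * Fraction(1, n)
    return dict(probs)


# Toy sample of three trees (invented values, not from the study).
sampled = [
    ["A", "A", "AB"],  # three MP pathways; A occurs in two of them
    ["A"],             # a single MP pathway
    ["B", "AB"],       # two MP pathways
]
actual = ancestral_range_probabilities(sampled, "actual")
equal = ancestral_range_probabilities(sampled, "equal")
```

With the toy data the two ways of deriving F(rY) give different probabilities for range A, which is why the choice between actual frequencies and 1/N matters whenever a range recurs across MP pathways.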
The parent node Y in this study is similar to the
floating node described by Pagel et al. (2004) in that
both Y and the floating node do not always include the
same crown groups. However, the floating node must
include two specific crown groups of interest, although
it may contain other clades or taxa as well (Pagel et al.,
2004). Y differs in that it is the parent of exactly two
groups: a specific crown group of interest and its sister
x, which is undefined. Another important difference is
that the two clades of interest at a floating node of Pagel
et al. (2004) can have any level of support, whereas the Y
applies to only the nodes connecting the well-supported
crown clade and its unspecified sister. Therefore, the
floating node is not suitable as a substitute for Y .
Using simulated data, we tested whether the range
probabilities of a parent node can be accurately inferred
as the product of the pp at the node containing the
crown group and a defined sister, and the frequency
of occurrence of the range at that node optimized by
DIVA on the Bayesian consensus tree topology, that is,
$$pp \times F(r_Y). \qquad (2)$$
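One way to see why Equation 2 need not reproduce Equation 1 is to split the sum in Equation 1 by sister group. The symbols $S$ and $T_S$ below are not in the original text and are introduced only for illustration: $S$ is the sister of the crown group on the consensus tree, and $T_S$ is the set of sampled trees in which the sister $x$ equals $S$. The sketch also assumes, for simplicity, that DIVA returns the same frequency $F(r_Y)$ on every tree in $T_S$ as on the consensus tree, which need not hold in practice.

```latex
\begin{align*}
P(r_Y) &= \sum_{t \in T_S} F(r_Y)_t \, P_t + \sum_{t \notin T_S} F(r_Y)_t \, P_t \\
       &= pp \times F(r_Y) + \sum_{t \notin T_S} F(r_Y)_t \, P_t,
\qquad \text{where } pp = \sum_{t \in T_S} P_t .
\end{align*}
```

Under these assumptions the two expressions agree only when the second sum is zero, for example when pp = 100% or when range r is never optimal at Y on the trees with a different sister.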
We further tested the utility of our approach using data from Aesculus L., a genus of woody trees and
shrubs with a disjunct Laurasian distribution. We also
illustrate two additional applications of this method.
First, we estimated the impact of two fossil wildcard
taxa (sensu Nixon & Wheeler, 1992) on biogeographic
reconstruction of Aesculus. Second, we examined range
subdivisions at the parental nodes of lineages of interest and estimated the most probable ranges inherited by these lineages (referred to as post-Y range
hereafter) to gain some insights into range evolution
along the stem branches. The primary goals of this
study are: (i) to describe an alternative method of using
the Bayes-DIVA analysis under phylogenetic uncertainty
which can provide estimation of geographic origin for
crown groups with unknown sister relationships; and
(ii) to test the method and its possible applications using
Aesculus L.
Aesculus (Sapindales, Sapindaceae) is a genus
of 13–19 species belonging to six major lineages,
which are supported by phylogenetic studies using
molecular and morphological data: sect. Aesculus
Fig. 2. Example of calculation of P(rY ) and of F(rY ) using two methods. A, Hypothetical sample of three Bayesian trees, T 1 –T 3 . Node Y (circles) is
parent node of Lineage 1. A, B, C, and D are distribution areas. Ranges of terminals are given below lineage names. Possible ranges for node Y include
A, B, C, D and widespread areas including two or more of these. In B and C, only areas with F(rY ) > 0 for at least one tree are shown. B, Calculation of
F(rY ) using actual frequency of areas from dispersal–vicariance analysis output (i.e., $i/R_t$). C, Calculation of F(rY ) assuming all optimal areas equally
probable for each t (i.e., 1/N).
(2 species), sect. Macrothyrsus (1 species), sect. Parryana (1 species), sect. Pavia (4 species), an Asian clade
(3–10 species), and the species Aesculus californica
Nutt. (Xiang et al., 1998b; Forest et al., 2001; Harris
et al., 2009). Extant Aesculus species are distributed
across the Northern Hemisphere and each lineage is restricted to one of the following areas: East Asia (EA);
western North America (wNA); eastern North America
(eNA); and Europe (EU), except sect. Aesculus, which is
disjunct in EA and EU. Aesculus has a rich fossil record
from EA, EU, and wNA, with fossils found in strata
ranging from the Paleocene to the Quaternary (Hu &
Chaney, 1940; Condit, 1944; Puri, 1945; Szafer, 1947,
1954; Tanai, 1952; Schloemer-Jäger, 1958; Prakash
& Barghoorn, 1961; Axelrod, 1966; Budantsev, 1983;
de Lumley, 1988; Mai & Walther, 1988; Wehr, 1998;
Golovneva, 2000; Manchester, 2001; Jeong et al., 2004;
Dilhoff et al., 2005).
Aesculus is an ideal genus for biogeographic study
owing to its small number of species, pan-Northern
Hemisphere distribution, extensive fossil record, and the
continental endemism of most lineages and all species.
However, molecular phylogenetic studies of Aesculus using several DNA regions (Xiang et al., 1998b;
Harris et al., 2009) have resulted in poorly supported or
unresolved relationships among the six major lineages
despite strong support for the polytypic lineages (i.e.,
crown groups). Thus, the utility of DIVA applied in the
traditional way for biogeographic reconstruction of the
genus is limited. In addition to deep node polytomies,
biogeographic reconstruction of Aesculus presents another challenge due to uncertainties in positions of
some fossil species. Recently, many authors have cited
the need for inclusion of fossils in phylogenetic reconstruction and phylogeny-based biogeographic analyses (Manchester, 1999; Rothwell, 1999; Wen, 1999;
Lieberman, 2003; Crane et al., 2004; Donoghue &
Smith, 2004; Xiang et al., 2005, 2006, 2009; Hilton &
Bateman, 2006; Rothwell & Nixon, 2006). Excluding
fossils can produce a false or incomplete biogeographic
history of a group (Manchester, 1999; Lieberman, 2003;
Crane et al., 2004). The limitations of including fossils,
for which often only incomplete morphological data
and rarely ancient DNA data is available, have been
discussed (Nixon & Wheeler, 1992; Kearney, 2002;
Kearney & Clark, 2003; Wiens, 2003, 2006) and observed in empirical studies (e.g. Rothwell & Nixon,
2006; Harris et al., 2009; but see Manos et al., 2007).
Fossil taxa for which little informative data is available may act as wildcard taxa (Nixon & Wheeler, 1992)
in phylogenetic analysis. Wildcard taxa are defined as
those that, due to significant missing characters, may
be placed algorithmically at many or all nodes on the
tree topology (Nixon & Wheeler, 1992; Kearney &
Clark, 2003). Two geographically and temporally important complete leaf (leaflets attached to a petiole) fossil species of Aesculus offer few phylogenetically informative characters. These are Aesculus longipedunculus
Schloemer-Jäger (Eocene, EU) and Aesculus “magnificum” (Budantsev, 1983; Manchester, 2001) (Paleocene,
EA). In preliminary analyses, these fossil species behave as wildcards, limiting phylogenetic resolution for
the fossils and for otherwise well-supported groups. In
the example using Aesculus, we use the revised Bayes-DIVA to provide a statistical measure of shifts in ancestral range probabilities when fossils are included versus
excluded.
1 Material and methods
1.1 Assessing the difference between Equation 1
and Equation 2
It is of interest to determine if the product of the
pp and the frequency of a range (F(rY )) derived from
DIVA analysis of the Bayesian consensus tree (with
compatible groupings below 50% allowed) effectively
reflects the estimation using Equation 1 of the revised
Bayes-DIVA method (i.e., Equation 2 versus Equation
1) because the former is so much simpler. To accomplish
this, 10 random DNA sequences of 200 bp in length
were generated using a JavaScript sequence generator
(http://www.faculty.ucr.edu/~mmaduro/random.htm)
(M. Maduro, pers. comm., 2008). These sequences
were used to represent 10 hypothetical lineages,
Lineage 1–Lineage 10. These lineages represent 10
unique operational taxonomic units where each might
be a species or a clade containing multiple species
with 100% pp. This is a simplistic example, but our
analysis of Aesculus L. provides an example of data
calculation for clades supported by pp less than 100%
of the data. The 10 simulated sequences were treated
as aligned and placed in a data matrix. Any true
relationship among these sequences was
unknown and inessential as the objective was not to
test the utility of Bayesian analysis in recovering true
relationships. The random sequences were expected
to provide phylogenetic uncertainty sufficient to test
whether Equation 1 results in ancestral
range probabilities at a node significantly different
from those resulting from Equation 2. The 10 random
sequences are available from the authors on request.
Phylogenetic analysis of the simulated data was carried out using MrBayes 3.1.2 (Huelsenbeck & Ronquist,
2001; Huelsenbeck & Ronquist, 2003). The program
was run using default priors for two simultaneous runs
of 22 million generations each. Each run used one
hot chain and two cold chains with default settings.
Burnin was set to 2,200,000 (or 10%) and trees were
sampled every 2000 generations. Resulting post-burnin
trees were assembled into a PHYLIP format file and
a majority rule consensus with compatible groupings
>50% was generated using Consense in the PHYLIP
3.68 package (Felsenstein, 1989; Felsenstein, 2008).
Lineage 3 was randomly selected as an outgroup. The
consensus tree was used to identify four lineages, forming two
sister pairs, that would be used to test our hypothesis:
Lineages 1 and 8; and Lineages 4 and 9 (Fig. 3: A).
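The size of the post-burnin tree pool follows directly from the run settings above. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the count ignores any extra tree a program may write at the very start of a run.

```python
def post_burnin_trees(generations, sample_every, burnin_generations, runs):
    """Count post-burnin trees pooled across simultaneous runs."""
    sampled_per_run = generations // sample_every        # trees saved per run
    burnin_per_run = burnin_generations // sample_every  # trees discarded per run
    return (sampled_per_run - burnin_per_run) * runs


# Settings reported in the text: two runs of 22 million generations,
# sampling every 2000 generations, burnin of 2,200,000 generations.
total = post_burnin_trees(22_000_000, 2000, 2_200_000, 2)
```

Each run saves 11,000 trees and discards 1,100 as burnin, which gives the 19,800 pooled post-burnin trees.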
One hundred trees from the 19,800 post-burnin
dataset were randomly selected using RandomTree
(Kauff, 2005). Four ancestral areas, A, B, C, and D,
were randomly assigned to each of the 10 lineages, with
each area being used at least once and with each lineage
endemic to a single area. The 100 trees were optimized
using DIVA 1.1 for Windows (Ronquist, 1996, 1997)
with default settings. The ancestral ranges of the parent
nodes were recorded in a Microsoft Excel 2007 spreadsheet. The spreadsheet format was used for calculation
of ancestral range probabilities at each node of interest
and for statistical testing.
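The two random steps in this setup, drawing 100 trees from the pool and assigning areas to lineages, can be sketched as follows. This is a stand-in written for illustration: it does not reproduce RandomTree, the function names and the seed are arbitrary, and the area assignment shown is only one of several ways to guarantee that every area is used at least once.

```python
import random


def sample_trees(n_trees, k, rng):
    """Return k distinct tree indices drawn without replacement."""
    return sorted(rng.sample(range(n_trees), k))


def assign_areas(lineages, areas, rng):
    """Give each lineage exactly one area, using every area at least once."""
    if len(lineages) < len(areas):
        raise ValueError("need at least as many lineages as areas")
    # Start with one copy of each area, then fill the rest at random.
    extra = [rng.choice(areas) for _ in range(len(lineages) - len(areas))]
    labels = list(areas) + extra
    rng.shuffle(labels)
    return dict(zip(lineages, labels))


rng = random.Random(2009)  # fixed seed, chosen arbitrarily for repeatability
tree_indices = sample_trees(19_800, 100, rng)
lineages = [f"Lineage {i}" for i in range(1, 11)]
areas_by_lineage = assign_areas(lineages, ["A", "B", "C", "D"], rng)
```

Fixing the seed makes the draw repeatable, which helps when the same subset of trees must be re-optimized later.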
Lineages 1, 4, 8, and 9 (Fig. 3: A) were used to
compare the probabilities calculated using Equation 1
and Equation 2. Probabilities of ancestral ranges for the
node shared by Lineage 1 + Lineage 8 and the node
shared by Lineage 4 + Lineage 9 (occurring in the 50%
consensus topology) were first calculated using Equation 2 to provide an estimation of ancestral origin of
these lineages. The results were then compared to those
estimated using Equation 1, in which the sisters of Lineages 1, 4, 8, and 9 were undefined (x). A two-tailed
z-test was used to determine if there was significant difference between probabilities for ranges obtained using
the two methods. The goal of these comparisons, and
of similar comparisons made in the empirical example
using Aesculus, was to determine whether Equation 1
could recover more informative range data than Equation 2 for a
parental node that has <100% pp in the Bayesian consensus tree. Any significant differences
between Equation 1 and Equation 2 indicate that there is
additional useful range information present in the subset
Fig. 3. Results of Bayesian analysis of simulated data. A, Consensus trees for 19,800 (left) and 100 (right) Bayesian trees. Values of posterior probability
support are shown above branches, actual occurrences are given in parentheses. Geographic ranges of terminals subtend terminal names. Parent and
crown nodes used in Bayes-dispersal–vicariance analysis simulation are highlighted, expanded in B. MJ, majority. B, Explanation of nodes of interest
for Bayes-dispersal–vicariance analysis simulation.
of Bayesian trees that is discarded by using Equation 2.
In all DIVA analyses constraints on maximum areas
(“maxareas” command) were not implemented.
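The text does not state which form of z-test was applied. As one plausible reading, the sketch below implements a two-tailed, pooled two-proportion z-test, treating each probability as a proportion out of the 100 sampled trees. The function name, the choice of test, and the example values are assumptions made here, not details taken from the study.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-tailed pooled z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; both proportions are 0 or 1")
    z = (p1 - p2) / se
    # Two-tailed tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical probabilities from Equation 1 and Equation 2, 100 trees each.
z, p_value = two_proportion_z_test(0.80, 100, 0.50, 100)
```

A small p-value here would indicate that the two equations give different range probabilities for the node, which is the comparison described above.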
1.2 Reconstructing ancestral ranges in Aesculus L.
1.2.1 DNA and morphological data DNA sequences from matK, the rps16 intron, and internal transcribed spacer (ITS), available from a previous study
for 16 species of Aesculus as well as for outgroup taxa
Handeliodendron bodinieri Redhr., Billia columbiana
Planch. & Linden ex Triana & Planch. and Billia hippocastanum Peyr., were used in this study (Appendix I).
For information on outgroup selection see Hardin
(1957a), Judd et al. (1994), Xiang et al. (1998b), Forest et al. (2001), Harrington et al. (2005), and Harris
et al. (2009). DNA sequences were aligned manually using MacClade 4.02 (Maddison & Maddison,
2001). The 39-character morphological matrix of Forest
et al. (2001) was modified by: (i) excluding all outgroup
taxa used in their study except those noted above; (ii)
eliminating Aesculus glabra Willd. var. arguta (Buckley) B.L. Rob.; and (iii) combining the species of Billia
into a single taxonomic entry, Billia sp.
The fossil taxa A. longipedunculus and A. “magnificum” were scored based on published reports
(Schloemer-Jäger, 1958; Budantsev, 1983; Golovneva,
2000; Manchester, 2001) for three characters: petiolulate leaflets (as opposed to sessile); serrate margins (as
opposed to entire); and having palmately compound
leaves (as opposed to ternate). The presence of petiolulate leaflets is a parsimony informative character in Aesculus (Hardin, 1957a; Forest et al., 2001; Manchester,
2001; Harris et al., 2009). All extant species of Aesculus
except (arguably) Aesculus parryi (sect. Parryana) have
some degree of leaf serration (Hardin, 1957a; Forest
et al., 2001). Outgroup taxa Handeliodendron and Billia have entire leaflets (Wiggins, 1932; Hardin, 1957a,
1957b; Forest et al., 2001; Harris et al., 2009). Palmately
compound leaves are common to all extant Aesculus and
Handeliodendron, whereas leaves of Billia are ternate
(Forest et al., 2001; Hardin, 1957a, 1957b, 1960).
1.2.2 Phylogenetic analysis Three independent
phylogenetic analyses were carried out. In Analysis 1,
gaps in matK were coded using ambiguous region coding (ARC) (Kauff et al., 2003) for ambiguously aligned
regions and simple gap coding for unambiguous gaps.
In Analysis 2 ARC and simple gap coding were applied
for all genes in the concatenated sequences. Analysis
3 included the extant species as well as the two fossil
species A. longipedunculus and A. “magnificum” and
was carried out using a matrix of combined morphological and molecular data with the same ARC and gap codings as Analysis 2. Analyses were carried out using MrBayes 3.1.2. Data was partitioned into four sets, matK,
rps16, ITS, and morphology including the modified
morphological matrix of Forest et al. (2001) and the
standard states from ARC and simple gap coding. For
each gene region, ModelTest 3.0 (Posada & Crandall,
1998) was used to determine the best model of evolution. Although character state ratios and other specific information were dependent on the use of ARC and simple gap coding, the basic models were not affected by
use of these coding methods. The Akaike Information
Criterion in ModelTest returned the following models:
TVM + I + G for matK, TRN + I for ITS, and K81uf
for rps16. Models were implemented in MrBayes using
the PRSET and LSET commands.
For each analysis, two simultaneous, independent
Markov chains were run for 22 million generations to
check convergence. Trees were sampled every 2000
generations. Burnin was set to 2.2 million generations or 1100 trees, and was checked using Tracer 1.3
(Rambaut & Drummond, 2003). The 19,800 post-burnin trees from each analysis were combined independently and summarized by generating a 50% majority rule consensus tree in PAUP∗ 4.0b10 (Swofford,
2002).
1.2.3 Biogeographic analysis using the revised
Bayes-DIVA method Nine nodes of interest were
identified on the Bayesian consensus tree from analysis
of combined data with gaps in matK coded using ARC
and simple gap coding (Analysis 1). These were the parent nodes of sect. Aesculus, sect. Macrothyrsus, sect.
Parryana, sect. Pavia, the Asian clade, A. californica,
and the crown nodes of each of the polytypic lineages;
sect. Aesculus, sect. Pavia, and the Asian clade. One
hundred trees from the combined post-burnin Bayesian
tree files from each analysis were randomly sampled using RandomTree. Terminals were coded as belonging to
one of five ancestral areas: Europe (A), East Asia (B),
eastern North America (C), western North America (D),
and Latin America (E) to cover distributional ranges of
Aesculus and its outgroup Billia. Trees were optimized
using default settings in DIVA 1.1 for Macintosh. Results from DIVA for each of the nine nodes of interest
were recorded in a Microsoft Excel spreadsheet which
was used for subsequent calculations. Ancestral range
probability at each node of interest was calculated using
Equation 1. For those nodes present in the Bayesian consensus topology, the probability of alternative ancestral
ranges was also calculated using Equation 2 for comparison. Individual topologies of sampled trees were
examined using TreeView 1.6.6 (Page, 1996, 2001) and
PAUP∗ 4.0b10 (Swofford, 2002).
Biogeographic analysis of the Analysis 3 phylogenies using the revised Bayes-DIVA method considered
only the ancestral ranges of the six parent nodes and
did not include the three crown group nodes. We used
the floating node of Pagel et al. (2004) in cases where
crown clades contained fossil species. The floating node
allowed a crown clade to be counted as present on a tree as long as
the floating node included only the crown clade alone
or only the crown clade plus one or both fossil species.
The floating node was not a substitute for Y . Instead Y
included the crown clade (plus any fossils) and x. On
some topologies for some crown clades of interest, x
was a fossil and this was perfectly acceptable. The revised Bayes-DIVA analysis including fossils was done
using a sample of 100 trees from the post-burnin posterior distribution of trees. This analysis was repeated
for the same set of 100 trees with fossils pruned from
the topologies. Z statistics were used to compare the results of these two analyses (fossils included and fossils
pruned) and the results from Analysis 1 including only
extant species.
Post-Y range analyses for each of the six major lineages of Aesculus were carried out using Bayes-DIVA
results from Analysis 1 data. For each node Y of the
six major lineages, all possible ranges that the branch
leading to the crown group of interest could inherit from
ranges at Y with a P > 0 were determined. Inheritance
of each possible range from splitting of ranges at Y was
considered equally probable and was then weighted by
the probability of the ancestral range at Y . The probability of each possible range inherited from node Y by each
of the descendant branches was calculated as the sum
of the probabilities of that range over all ranges with a
P > 0 at Y. For example, suppose Lineage L has P(A) = 0.50
and P(AB) = 0.50 at node Y. For range A at node Y, the probability
of inheritance of range A by the two descendant lineages
is 1.0. For range AB at node Y, the descendant lineages
may inherit A, B, or AB, each with a probability of
1/3. The post-Y probability of range A for Lineage L is,
therefore, $P_{\text{post-}Y}(A) = 0.50 \times 1.0 + 0.50 \times 0.333 = 0.667$.
Post-Y range probability calculations were carried out
using RAD@Y, a Python 2.5 user interface program
developed by the authors for this purpose and available upon request. The post-Y ranges provide information on range inheritance of the descendant lineages and range evolution along the stem of crown
groups.
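The worked example above can be reproduced with a short script. This is a minimal sketch and not the RAD@Y program itself: it assumes that the ranges a descendant branch can inherit from a range at Y are all non-empty subsets of that range (which gives A, B, and AB for range AB, as in the example), and the function names are illustrative.

```python
from itertools import combinations


def inheritable_ranges(ancestral_range):
    """All non-empty subsets of the areas in a range (an assumption of this sketch)."""
    areas = sorted(ancestral_range)
    return [frozenset(combo)
            for size in range(1, len(areas) + 1)
            for combo in combinations(areas, size)]


def post_y_probabilities(range_probs):
    """Split each range at Y equally among its inheritable ranges,
    weight by the probability of that range at Y, and sum."""
    post = {}
    for ancestral_range, p in range_probs.items():
        options = inheritable_ranges(ancestral_range)
        share = p / len(options)
        for option in options:
            post[option] = post.get(option, 0.0) + share
    return post


# Example from the text: P(A) = 0.50 and P(AB) = 0.50 at node Y.
at_y = {frozenset({"A"}): 0.50, frozenset({"A", "B"}): 0.50}
post = post_y_probabilities(at_y)
for rng in sorted(post, key=lambda r: (len(r), sorted(r))):
    # Range A comes out at 0.667, matching the worked example.
    print("-".join(sorted(rng)), round(post[rng], 3))
```

For ranges of three or more areas, the set of inheritable ranges may need to be restricted to match the DIVA rules actually applied; the all-subsets rule here is only the simplest choice consistent with the two-area example.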
For all comparisons between Equation 1 and
Equation 2 in the empirical example using Aesculus,
a quick and conservative approach was used by allowing F(rY ) to have its largest possible value, that is,
F(rY ) = 1 (when there was no uncertainty from DIVA
optimization for node Y), thus Equation 2 = pp. If
the maximized values of Equation 2 are still significantly smaller than those found by using Equation 1,
the conclusion that Equation 2 does not effectively reflect the probability estimated by Equation 1 can be
made.
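For readers who want to see the two quantities side by side, the following sketch computes Equation 1 as the sum over sampled trees of F(rY)t Pt and the maximized Equation 2 as pp with F(rY) = 1. It assumes each tree t is summarized as a pair (Pt, frequencies of ranges at Y), with an empty frequency table when a tree has no node Y for the lineage; the four-tree sample and the pp value are invented for illustration only.

```python
from collections import defaultdict


def equation1(trees):
    """P(r_Y) = sum over trees t of F(r_Y)_t * P_t.

    `trees` is an iterable of (p_t, freqs) pairs, where freqs maps each
    range to its frequency among the optimal DIVA solutions at node Y.
    """
    total = defaultdict(float)
    for p_t, freqs in trees:
        for rng, f in freqs.items():
            total[rng] += f * p_t
    return dict(total)


def equation2_max(pp):
    """Equation 2 with F(r_Y) fixed at its largest value, 1, so the result is pp."""
    return pp * 1.0


# Hypothetical sample of four equally weighted trees (P_t = 1/4 each).
sample = [
    (0.25, {"A": 1.0}),
    (0.25, {"A": 0.5, "AB": 0.5}),
    (0.25, {"A": 1.0}),
    (0.25, {"B": 1.0}),
]

print(equation1(sample))   # A: 0.625, AB: 0.125, B: 0.25
print(equation2_max(0.5))  # 0.5 for a hypothetical pp of 0.5
```

In this toy case range A receives support from trees with different sisters at Y, which is why the Equation 1 value can exceed the pp of any single sister relationship.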
2 Results
2.1 Equation 1 vs. Equation 2 in simulated data
Relationships between all lineages were poorly
supported (Fig. 3: A). The highest pp support was observed for the sister relationships between Lineages 1
and 8 (pp = 52%) and Lineages 4 and 9 (pp = 48%).
In the randomly selected subset of Bayesian trees, the
monophyly of Lineages 1 + 8 was supported in 58%
of the data and the monophyly of Lineages 4 + 9 was
supported in 47% of the data (Fig. 3: A). Results from
DIVA using the 50% majority rule tree with nodes compatible (Fig. 3: A) indicated that the geographic range
for the node shared by Lineages 8 + 1 was A only (no
alternative solutions), thus F(rY ) = 1.0. For the node
shared by Lineages 4 and 9, results from DIVA showed
an ancestral range of BD only with F(rY ) = 1.0. Therefore, the probabilities of ancestral ranges for these nodes
based on Equation 2 were P(A) = 0.58 ∗ 1.0 for L8 + 1
and P(BD) = 0.47 ∗ 1.0 for L4 + 9, implying that the
geographic origins of both L1 and L8 are most likely to
have occurred in A with a probability of 0.58, whereas the
geographic origins of L4 and L9 are both most likely to
have occurred in BD with a probability of 0.47.
In the revised Bayes-DIVA approach applying
Equation 1, the most probable ancestral ranges at four
parent nodes (Fig. 3: B), Lineage 1 + x, Lineage 4 + x,
Lineage 8 + x, and Lineage 9 + x, inferred from the
sample of 100 Bayesian trees were A (P = 0.744), BD
(P = 0.484), A (P = 0.755), and BD (P = 0.643), respectively (Fig. 4: A). For each parent node of interest, the probability of the most highly supported ancestral
range was significantly
greater than that of the second most highly supported ancestral
range (Fig. 4: A) and was significantly greater than
the value obtained by using Equation 2, except in the case
of Lineage 4 + x (Table 1). The probability of BD was
significantly higher for Lineage 9 than for Lineage
4 (Fig. 4: B), whereas P(BD) was equal for Lineages 4 and
9 when using Equation 2.
2.2 Results of analyses using Aesculus
Phylogenetic analyses of different data partitions
showed strong support for the monophyly of polytypic
groups but poor resolution of relationships among the
major lineages (Fig. 5). In the analysis including fossils, support for the monophyly of all polytypic lineages
greatly decreased (Fig. 6). Fossil species were observed
to ally variously with all major lineages and, rarely, with
outgroup species, with low support (Fig. 6).
Results from the modified Bayes-DIVA analysis
(below) are not presented on the consensus tree or other
graphical representation of the relationships between
clades of Aesculus. This is for three reasons. First,
Y cannot be accurately reflected on a consensus tree
or other single topology. Second, the probabilities of
ancestral ranges calculated using Equation 1 are not
dependent on the position of the clades on the tree nor on
the pp support shown on the tree for clades, though they
are weighted by these values. Finally, assuming that x
is best represented by the sister group indicated on the
consensus tree topology would limit confidence in the results
of the revised Bayes-DIVA analysis to confidence in
nodal support.

Fig. 4. Results of Bayes-dispersal–vicariance analysis of simulated data. A, Relative frequency graphs showing probability (P) of ancestral ranges for
the parent nodes of Lineages 1, 4, 8, and 9 and their unspecified sisters (x). Circled numbers correspond to numbered lineages. Ranges are shown above
graphs. Results of Z-test comparing the most highly supported range to the second most highly supported range shown below frequency boxes. Arrows
point to bars compared in B. B, Comparison of P(BD) as ancestral range of Lineages 4 and 9.

Table 1 Comparison of pp∗F(rY) (Equation 2) and $P(r_Y) = \sum_{t=1}^{n} F(r_Y)_t P_t$ (Equation 1) for ancestral areas of lineages of interest from the Java script simulated data

| Lineage | Sister in Bayesian consensus tree | Ancestral area from consensus tree optimization | pp support for sister in consensus of 19,800 | Equation 2 results† | Most highly supported ancestral area from Bayes-DIVA analysis | Equation 1 results | z statistic for comparison of EQ1 and EQ2 | Significant difference at α/2 = .005 | p value |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Lineage 8 | A | 0.52 | 0.58 | A | 0.744 | 3.90 | yes | <0.0001 |
| 4 | Lineage 9 | BD | 0.48 | 0.47 | BD | 0.484 | 0.286 | no | 0.779 |
| 8 | Lineage 1 | A | 0.52 | 0.58 | A | 0.755 | 4.37 | yes | 0.0002 |
| 9 | Lineage 4 | BD | 0.48 | 0.47 | BD | 0.643 | 3.80 | yes | <0.0001 |

†pp∗F(rY) was estimated with F(rY) = 1 for a more conservative test on the majority rule tree of the 100 sampled trees; ‡A, B, C, and D were used in
dispersal–vicariance analysis (DIVA) of simulated data to represent four hypothetical, unique areas. pp, posterior probability.

Fig. 5. Bayesian trees from phylogenetic Analyses 1 and 2 of Aesculus. Consensus trees were condensed, showing major lineages. Values of posterior
probability support are above branches, and bootstrap support are below branches. Modern ranges subtend terminal names corresponding to areas
indicated in C. A, Results of analysis of extant taxa only (Analysis 1 with ambiguous region coding and simple gap coding in matK). Numbered nodes
correspond to nine nodes of interest considered in Bayes-dispersal–vicariance analysis. 1, Asian clade + x; 2, sect. Aesculus + x; 3, Aesculus californica
+ x; 4, sect. Macrothyrsus + x; 5, sect. Pavia + x; 6, sect. Parryana + x; 7–9, last shared ancestor of species of polytypic lineages, that is, crown nodes.
B, Results of Analysis 2 including extant species only and with ambiguous region coding and simple gap coding for all gene regions. C, Geographic
map indicating areas used in Bayes-dispersal–vicariance analysis, created using Online Map Creation (Weinelt, 1999). EA, East Asia; eNA,
eastern North America; EU, Europe; LA, Latin America; wNA, western North America.

Using the revised Bayes-DIVA analysis, the ancestral ranges of the crown nodes of interest (Fig. 5: A,
nodes 7–9), sect. Aesculus, sect. Pavia, and the Asian
clade, were estimated to be EA-EU, eNA, and EA, respectively, in all sampled trees. Therefore, the probabilities for these ranges at these crown nodes are all
equal to 1.0. In this case, there is no difference between
Equation 1 and Equation 2 because all groups in question were supported by 100% pp and there was no optimization uncertainty in DIVA for any of the sampled trees.

Fig. 6. Results from phylogenetic analysis of Aesculus including extant species and wildcard fossils (Analysis 3). A, Bayesian consensus trees from
Analysis 3. Values of posterior probability support from 19,800 trees are shown above branches and those from 100 randomly sampled trees are below
branches. Fossils highlighted in gray. Dashed lines indicate placement of Aesculus “magnificum” in consensus of 19,800 trees (lower) and 100 trees
(upper). Distributional ranges are provided to the right of terminals corresponding to those indicated in B. B, Geographic map indicating areas used
in Bayes-dispersal–vicariance analysis, created using Online Map Creation (Weinelt, 1999). EA, East Asia; eNA, eastern North America; EU,
Europe; LA, Latin America; wNA, western North America.

In contrast, the ancestral ranges at the parent nodes
of the Asian clade, sect. Aesculus, sect. Pavia, A. californica, sect. Parryana, and sect. Macrothyrsus (Fig. 5:
A, nodes 1–6, respectively) were sensitive to topological rearrangements. More than one optimal geographic
range was resolved for each of these parent nodes using Bayes-DIVA (Fig. 7). For five of the six lineages, a
most probable range with P ≥ 0.5 was recovered. The
most probable range for the parent node of the Asian
clade was shown to be EA with P = 0.755 (Table 2,
Fig. 7: A). An EA distribution was also revealed to
be the most likely for the parent node of sect. Aesculus (P = 0.832) (Table 2, Fig. 7: B). For the parent
nodes of sect. Pavia and A. californica, the most likely
ancestral ranges were shown to be widespread in eNA-wNA and EA-wNA, respectively (P = 0.663 and 0.76)
(Table 2, Fig. 7: C, D), whereas the parent nodes of
sect. Parryana and sect. Macrothyrsus were both shown
to be widespread in eNA-wNA (P = 0.90 and 0.395,
respectively) (Table 2, Fig. 7: E, F). Some of these probability values are greater than the pp support for the
nodes shared by these lineages and a specific sister and
all are greater than the Equation 2 values for nodes
present in the 50% majority rule Bayesian consensus
(Table 3). The probabilities obtained for ranges at the
parent nodes of sect. Parryana + x and sect. Aesculus +
x were significantly different from those obtained using
Equation 2 (Table 2), for which the relationships sect.
Parryana + sect. Pavia and sect. Aesculus + the Asian
clade were used for DIVA analysis (Table 3).

Fig. 7. Probabilities of ancestral ranges for the six major lineages of Aesculus L. Highest probabilities are given in black text in beveled slices. A,
Asian clade. B, Section Aesculus. C, Aesculus californica. D, Section Pavia. E, Section Parryana. F, Section Macrothyrsus. EA, East Asia; eNA, eastern
North America; EU, Europe; LA, Latin America; wNA, western North America.

Table 2 Most probable ancestral ranges of the stem node of six lineages inferred from analysis without fossils and comparison between Equation 1
and Equation 2 calculations of probability

| Lineage | Most probable range | Equation 1 results | Equation 2 results† | Z-statistic‡ for comparison of Eqn 1 to Eqn 2 | P value§ |
|---|---|---|---|---|---|
| Asian clade | EA | 0.755 | 0.73 | 0.580 | 0.5619 |
| Aesculus | EA | 0.832 | 0.73 | 2.719 | 0.0065 |
| Aesculus californica | EA-wNA-eNA | 0.760 | 0.75 | 0.234 | 0.815 |
| Pavia | eNA-wNA | 0.663 | 0.70 | −0.781 | 0.4348 |
| Parryana | eNA-wNA | 0.900 | 0.70 | 6.628 | <0.0001 |
| Macrothyrsus | eNA-wNA | 0.395 | none ≥0.50 | n/a | n/a |

†pp∗F(rY) of Equation 2 was estimated with F(rY) = 1 (no optimization uncertainty), leading to pp∗F(rY) = pp for conservative test. See Material
and Methods, 1.2.3; ‡Two-tailed z-test; §Highlighting indicates significance at Zα/2, α = 0.01; n/a, Eqn 2 not used for calculation of ancestral range
probability for sect. Macrothyrsus as no posterior probability (pp) value is available from 50% majority rule consensus of Bayesian topologies. EA, East
Asia; eNA, eastern North America; wNA, western North America.

Table 3 Differences between pp and pp∗F(rY) of Equation 2 for Aesculus stem lineage nodes on Bayesian consensus tree derived from analysis without
fossils (Analysis 1)†. F(rY) was calculated using 1/N

| Lineage | Sister lineage in 50% MJ rule‡ | pp supporting relationship to sister in consensus topology | Most probable range(s) from DIVA analysis of consensus tree‡ | Equation 2 results |
|---|---|---|---|---|
| Asian clade | Aesculus | 0.70 | EA | 0.365 |
| | | | EU-EA | 0.365 |
| Aesculus | Asian clade | 0.70 | EA | 0.365 |
| | | | EU-EA | 0.365 |
| A. californica | (Aesculus + Asian clade) | 0.75 | EU-EA | 0.250 |
| | | | EA-wNA | 0.250 |
| | | | EU-EA-wNA | 0.250 |
| Pavia | Parryana | 0.70 | eNA-wNA | 0.70 |
| Parryana | Pavia | 0.70 | eNA-wNA | 0.70 |

†Sect. Macrothyrsus pruned for this analysis to produce fully bifurcating tree topology; ‡See Fig. 5: A. DIVA, dispersal–vicariance analysis; EA, East
Asia; eNA, eastern North America; EU, Europe; pp, posterior probability; wNA, western North America.

When fossils were included in the Bayes-DIVA
analysis, the probability of any ancestral range including Europe, P(EU ∈ R), increased significantly for three
of six parent nodes (Fig. 8, Table 4) when compared
to results from trees with fossils pruned. The value of
P(EU ∈ R) increased significantly for all six parent
nodes when compared to results from trees resulting
from phylogenetic analysis including extant taxa only
(Table 4). In contrast, changes in the probability of
ranges including East Asia, P(EA ∈ R), were less dramatic when fossils were included vs. excluded (Fig. 8,
Table 4).
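The quantity P(EU ∈ R) used above is the total probability of every ancestral range that contains the area in question. A minimal sketch of that summation follows, assuming the range probabilities at a node are held in a dictionary keyed by sets of area codes; the function name and the probabilities are hypothetical and are not values from Table 4.

```python
def prob_area_in_range(range_probs, area):
    """Sum the probabilities of all ancestral ranges that include `area`."""
    return sum(p for rng, p in range_probs.items() if area in rng)


# Hypothetical range probabilities at one parent node.
range_probs = {
    frozenset({"EA"}): 0.50,
    frozenset({"EA", "EU"}): 0.25,
    frozenset({"EU"}): 0.125,
    frozenset({"wNA"}): 0.125,
}

print(prob_area_in_range(range_probs, "EU"))  # 0.375
print(prob_area_in_range(range_probs, "EA"))  # 0.75
```

Because widespread ranges are counted once for each area they contain, the values of P(area ∈ R) across areas need not sum to 1.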
Fig. 8. Comparison of P(EU ∈ R) and P(EA ∈ R) for the six parent nodes
of interest when fossils are included, pruned, and excluded. Probability
(P; y axis) is the probability of any ancestral area, including widespread
areas, that include Europe (left) and East Asia (right).
Post-Y range calculations yielded moderate to
high support for a post-Y range of EA for the Asian
clade and sect. Aesculus (post−Y P(EA) = 0.837 and
0.888, respectively) (Table 5). The most probable post-Y range for sect. Pavia was eNA, but with lower support (post−Y P(eNA) = 0.543) (Table 5). For the other
three major lineages of Aesculus, no single post-Y range
received support greater than 0.500 (Table 5). These
preliminary results, which do not represent all of the
available molecular and fossil data (see Harris et al.,
2009), revealed possible extinction in EA and migration
to wNA of the A. californica lineage, extinction in
wNA and dispersal to eNA of the sect. Pavia lineage,
and extinction in eNA and dispersal to wNA of sect.
Parryana.

Table 4 Change in probabilities of ancestral ranges including Europe (EU) and East Asia (EA) when fossils excluded vs. included. A, Comparison
of probabilities calculated using Bayes-dispersal–vicariance analysis with fossils pruned vs. fossils included on trees from analysis including both
extant and fossil species. Highlighting indicates significant change in P. B, Comparison of probabilities with fossils included (trees from Analysis 3) vs.
excluded (trees from analysis including only extant species). Highlighting indicates significant change in P

A

| Lineage | P(EU ∈ R), fossils excl. | P(EU ∈ R), fossils incl. | Change in P(EU ∈ R)† | P value‡ | P(EA ∈ R), fossils excl. | P(EA ∈ R), fossils incl. | Change in P(EA ∈ R)† | P value‡ |
|---|---|---|---|---|---|---|---|---|
| Asian clade | 0.745 | 0.655 | 0.090 ↓ | 0.1648 | 1.000 | 0.992 | 0.008 ↓ | 0.3703 |
| Aesculus | 0.678 | 0.707 | 0.029 ↑ | 0.6599 | 0.674 | 0.568 | 0.106 ↓ | 0.1223 |
| Aesculus californica | 0.000 | 0.183 | 0.183 ↑ | <0.0001 | 0.000 | 0.192 | 0.192 ↑ | <0.0001 |
| Pavia | 0.033 | 0.114 | 0.081 ↑ | 0.0282 | 0.033 | 0.044 | 0.011 ↑ | 0.9681 |
| Parryana | 0.044 | 0.112 | 0.068 ↑ | 0.0735 | 0.033 | 0.062 | 0.029 ↑ | 0.3371 |
| Macrothyrsus | 0.000 | 0.200 | 0.200 ↑ | <0.0001 | 0.000 | 0.179 | 0.179 ↑ | <0.0001 |

B

| Lineage | P(EU ∈ R), fossils excl. | P(EU ∈ R), fossils incl. | Change in P(EU ∈ R)† | P value‡ | P(EA ∈ R), fossils excl. | P(EA ∈ R), fossils incl. | Change in P(EA ∈ R)† | P value‡ |
|---|---|---|---|---|---|---|---|---|
| Asian clade | 0.000 | 0.655 | 0.655 ↑ | <0.0001 | 1.000 | 0.992 | 0.008 ↓ | 0.3703 |
| Aesculus | 0.088 | 0.707 | 0.619 ↑ | <0.0001 | 0.990 | 0.568 | 0.422 ↓ | <0.0001 |
| Aesculus californica | 0.080 | 0.183 | 0.103 ↑ | <0.0001 | 0.765 | 0.192 | 0.573 ↑ | <0.0001 |
| Pavia | 0.000 | 0.114 | 0.114 ↑ | <0.0001 | 0.022 | 0.044 | 0.022 ↑ | 0.3843 |
| Parryana | 0.007 | 0.112 | 0.105 ↑ | 0.0017 | 0.036 | 0.062 | 0.026 ↑ | 0.3953 |
| Macrothyrsus | 0.058 | 0.200 | 0.142 ↑ | 0.0027 | 0.337 | 0.179 | 0.158 ↓ | 0.0107 |

†Absolute value of change, arrow indicating direction of change when fossils included; ‡From z-test comparing two means.

Table 5 Possible post-Y ranges inherited from the most probable ancestral range (see Fig. 7) for each of the six major lineages of Aesculus

| Lineage† | Possible post-Y ranges‡ | Probability for each post-Y range |
|---|---|---|
| Asian clade (5) | EA | 0.83700 |
| | EA-wNA | 0.06000 |
| | wNA | 0.06000 |
| Aesculus (11) | EA | 0.88800 |
| | EU | 0.03250 |
| | EA-EU | 0.02905 |
| Aesculus californica (11) | wNA | 0.32500 |
| | EA | 0.26000 |
| | EA-wNA | 0.26000 |
| Pavia (7) | eNA | 0.54300 |
| | wNA | 0.22100 |
| | eNA-wNA | 0.22100 |
| Parryana (23) | wNA | 0.34400 |
| | eNA-wNA | 0.30670 |
| | eNA | 0.30670 |
| Macrothyrsus (15) | eNA | 0.47000 |
| | wNA | 0.15200 |
| | eNA-wNA | 0.15100 |

†Total number of non-zero post-Y ranges are shown in parentheses below lineage names; ‡Only the three highest post-Y ranges are shown.
EA, East Asia; eNA, eastern North America; wNA, western North
America.

3 Discussion
3.1 Accounting for phylogenetic and DIVA optimization uncertainties
Accounting for uncertainties in phylogeny and
optimization is a major challenge in biogeographic
analysis. The Bayes-DIVA method provides a simple
and sound solution to this problem. The Bayes-DIVA
method of Nylander et al. (2008) applies to nodes with
fixed bipartitions (i.e., the two sister lineages at a node
are clearly defined), and only trees containing these fixed
nodes are considered. The revised Bayes-DIVA
approach extends the method to allow estimation of geographic ranges at a node with only one of the two lineages defined, so that all trees containing the defined lineage
contribute to the estimation. This revision to Bayes-DIVA provides a method of estimating the biogeographic
origins of lineages with uncertain sister affiliation with
statistical confidence. Both Bayes-DIVA methods require optimization of a large set of Bayesian topologies
and subsequent analyses of the results. It would be an
easier alternative solution if the product of the pp value
and F(rY ) obtained from the 50% majority rule tree
(i.e., Equation 2) could accurately reflect the full extent
of range information inherent in the sampled Bayesian
trees. However, our comparisons showed that this is not
the case (e.g., P(BD) for Lineages 4 and 9, Fig. 4: B)
and that probabilities calculated using Equation 2, even
when F(rY ) is equal to its maximum value of 1.0, are
usually lower than the probabilities obtained using the
revised Bayes-DIVA method. An alternative way to simplify the calculation of Equation 1 is to use 1/N (N is
the number of alternative optimal ranges from DIVA
for tree t) for F(rY ). However, we found that 1/N (implying occurrence of each unique alternative range with
equal frequency) can be very different from the actual
frequencies (Rit) (Fig. 9). The values of F(rY) calculated using Rit can be substantially different in two trees
showing identical sister relationships at the node of interest but differing elsewhere (Fig. 9). Using Rit as a
calculation of F(rY ) accurately reflects the frequencies
of ranges given the data, which is important because
the actual frequencies better reflect the uncertainty of
DIVA optimization. Because a range with 100% occurrence at node Y on a given tree suggests no uncertainty
in DIVA optimization, a range at node Y occurring more
frequently in the optimal MP pathways indicates greater
certainty of that range in DIVA optimization compared
to other ranges occurring at the lower frequencies. However, 1/N may be used if one prefers to weight the
alternative ranges at a node equally. Software for automation of analyzing the results from Bayes-DIVA and
calculation of probabilities is desirable as, at present,
this can be time consuming. The revised Bayes-DIVA
approach is not in disagreement with Nylander et al.
(2008) or Huelsenbeck and Imennov (2002), but rather
provides an alternative method of accommodating phylogenetic and optimization uncertainties extending to
parent nodes of crown groups with uncertain sisters.
The model-based, full Bayesian approach of
Sanmartín et al. (2008) has also been developed to account
for phylogenetic and optimization uncertainties in inferring biogeographic dispersal events; that approach
is well suited for island biogeography, but may not be
suitable for continental biogeography.
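The contrast drawn above between 1/N and the observed frequencies Rit can be made concrete with a small sketch. It assumes that, for one tree, the range assigned to node Y in each optimal DIVA reconstruction has been collected into a list; the list below is invented for illustration and the function names are hypothetical.

```python
from collections import Counter


def actual_frequencies(ranges_at_y):
    """F(r_Y) as the share of optimal reconstructions in which each range occurs at Y."""
    counts = Counter(ranges_at_y)
    total = len(ranges_at_y)
    return {rng: n / total for rng, n in counts.items()}


def equal_weights(ranges_at_y):
    """F(r_Y) = 1/N, where N is the number of distinct alternative ranges at Y."""
    unique = set(ranges_at_y)
    return {rng: 1.0 / len(unique) for rng in unique}


# Hypothetical ranges at node Y across six equally parsimonious reconstructions.
optimal = ["B", "B", "B", "BC", "AB", "B"]

print(actual_frequencies(optimal))  # B dominates: 4/6, with 1/6 each for BC and AB
print(equal_weights(optimal))       # every range receives 1/3
```

Either table of frequencies could be passed to the Equation 1 summation; the choice only changes how much weight a frequently recovered range carries within a single tree.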
Nylander et al. (2008) raised the question of how
range probabilities obtained using Bayes-DIVA should
be interpreted because the optimal ranges for each node
from DIVA represent only the most parsimonious solutions, rather than including all possible solutions that
may be statistically equally likely. This is no less of a
concern for the revised Bayes-DIVA approach presented
here. Nylander et al. (2008) hypothesized that optimal
solutions from DIVA might be treated as approximating
ML solutions and that the Bayes-DIVA method could
then be treated as a non-parametric empirical Bayesian
method. The method relies on empirical observations
to approximate the actual stochastic distribution (see
Johns, 1957). However, as noted by Nylander et al.
(2008), it is not currently possible to determine how effectively DIVA MP solutions approximate ML solutions
because there is no stochastic model for DIVA and, thus,
no way of estimating the full range of distribution of solutions. Nonetheless, studies comparing biogeographic
inference using DIVA and the model-based likelihood
methods (e.g., Ree et al., 2005; Ree & Smith, 2008) have
found that results are often largely congruent (e.g., Ree
et al., 2005; Xiang & Thomas, 2008; Xiang et al., 2009).
This may support the necessary assumption that MP
solutions from DIVA are reasonable approximations of
the ML solutions.
3.2 Biogeographical inference of extant Aesculus
Conflicting biogeographic hypotheses have been
proposed for Aesculus (Hardin, 1957a; Xiang et al.,
1998b; Forest et al., 2001; Harris et al., 2009). The most
recent hypothesis was proposed by Harris et al. (2009)
based on results of DIVA using phylogenies inferred
from a combination of DNA sequences, morphology,
and fossils. The study of Harris et al. (2009) included
more molecular data and more fossils than were included here for testing the Bayes-DIVA method. We do
not attempt to describe the biogeographic history of lineages of Aesculus with the data presented here. Rather,
this portion of the discussion focuses on the utility of
this approach to Bayes-DIVA with respect to ancestral
distributions at certain nodes of interest.
Despite low to moderate support for placement of
five of six lineages in phylogenetic analysis of extant
taxa (Analysis 1), we were able to obtain high to moderate statistical support for the biogeographic origins
of these lineages (Fig. 5: A, nodes 1–3, 5–6) using the
new approach described here (Fig. 7, Table 3). For example, the placement of sect. Parryana was supported
by pp = 70% in phylogenetic analysis (Fig. 5: A) and
we obtained higher support (P = 0.90) for its ancestral
range in eNA-wNA (Table 2, Fig. 7: E). The support
for the ancestral range of eNA-wNA for the section was
much lower (P = 0.70) when estimated using Equation 2
(Table 3). The higher probability of eNA-wNA for sect. Parryana under the revised Bayes-DIVA
occurred because some alternate placements of the section, for example, sect. Parryana + sect. Macrothyrsus and sect. Parryana + (sect. Macrothyrsus + A.
californica), yielded non-zero F(eNA-wNA) at the node Parryana + x.
For the node Y including sect. Pavia and x, the ancestral range eNA-wNA is supported weakly to moderately
(P = 0.663) (Table 2, Fig. 7: D). However, all four
possible ancestral ranges of sect. Pavia contain eNA,
resulting in P(eNA ∈ R) = 1.0 and providing high confidence for inference that the ancestral range of sect. Pavia
included eNA. Similarly this type of inference can be
applied to sect. Macrothyrsus, represented by a single
extant species known from southeastern USA, which
has an unresolved sister (i.e., part of a polytomy) in the
Bayesian consensus topology (Fig. 5: A, node 4). Although no single ancestral range with P ≥ 0.5 emerged
for sect. Macrothyrsus, the two ancestral ranges with the
highest probabilities, eNA (P = 0.395) and eNA-wNA
(P = 0.245) (Fig. 7: F), can be combined for a total
probability of P = 0.640 that the ancestral range included eNA. Further exploration
of the biogeographic history of this group might begin
with eNA as a working hypothesis. This finding highlights the utility of the Bayes-DIVA analysis in cases
of polytomies. The ancestral ranges of the lineages of
interest inferred in this study are largely congruent with
those inferred in Harris et al. (2009), but here we show
statistical support deriving from analysis that takes into
account topological and optimization uncertainties.

Fig. 9. Comparison of Rit for an identical node in two different Bayesian trees from analysis including fossils (Analysis 3). A, B, Two Bayesian trees
from sample of 100 from Analysis 3. Section Aesculus highlighted dark gray; sect. Macrothyrsus (Aesculus parviflora) + Aesculus californica are
highlighted in light gray. Dots indicate the parent node of sect. Aesculus. The alternative range frequencies at this node are presented in C–E. C, Relative
frequencies of nine alternative optimal ancestral areas determined using 1/N. Arrow indicates area BC (EA – eNA), an example referred to in text. D,
Relative frequencies of the alternative ancestral areas determined based on actual occurrences by Rit for tree A. Arrow indicates area BC (EA – eNA),
example referred to in text. E, Relative frequencies determined using Rit for tree B. Arrow indicates area BC (EA – eNA), example referred to in text.
A, Europe; B, East Asia (EA); C, eastern North America (eNA); D, western North America.

3.3 Adding fossil wildcards
The addition of a European fossil appears to have
had a more significant effect than the addition of an East
Asian fossil on ranges estimated for the parent nodes of
interest (Table 4, Fig. 8). Bayes-DIVA results inferring
P(EU ∈ R) changed significantly for all six parent nodes
of interest (Table 4), whereas the P(EA ∈ R) increased
significantly for only two of the six nodes (Table 4). This
phenomenon can be explained by the fact that only one
extant species of Aesculus occurs in Europe, Aesculus
hippocastanum L., forming sect. Aesculus with Aesculus turbinata Blume (EA) with a pp = 100% (Fig. 5),
but there are several extant species in two major clades
occurring in EA. When EU is specified as the range for
only one, highly stable terminal taxon, the impact of
EU on optimal ancestral ranges for the major lineages
is expected to be small compared to EA. Adding a wildcard fossil from Europe to the phylogeny thus heavily
influenced the outcomes, that is, increased the probability of EU in the ancestral ranges of the parent nodes.
This finding suggests that including phylogenetically uncertain fossils from species-poor
areas can have a dramatic impact on the results of biogeographic analysis. We
therefore recommend that special care be taken to reduce a fossil’s wildcard behavior (see Kearney, 2002;
Kearney & Clark, 2003) especially when introducing a
fossil from a geographic area unrepresented or poorly
represented by extant species. Lineages most affected
by the inclusion of wildcard fossil taxa appear to be
those that have a very low probability of a range including the distribution of the fossil (i.e., a low P(Rfossil ∈ R)) when
fossils are not included in the analysis.
3.4 Determining probable post-Y ranges
How an ancestral range is subdivided and inherited by daughter lineages immediately following speciation (i.e., at the base of the internode of a branch) can
provide additional information about the total historical
biogeographic pathway of a lineage of interest. Recently,
range evolution along internodes on a phylogenetic tree
has been addressed by and can be calculated using the
model-based likelihood method of Ree et al. (2005) and
Ree and Smith (2008). Here we show that the marginal
probabilities obtained using Bayes-DIVA can also be
used to make inferences about range inheritance at the
base of the internode and evolution along the branches.
Our analyses on range divisions at the parental nodes
revealed potential extinction and dispersal events along
branches of two Aesculus lineages (comparing results
of Fig. 7 and Table 5). Future studies could compare the
range evolution data from Bayes-DIVA and the likelihood method implemented in Lagrange.
4 Conclusions
As other authors have previously argued, it is
best to include all available and relevant information when using phylogeny to reconstruct biogeography (Tiffney & Manchester, 2001; Huelsenbeck &
Imennov, 2002; Ree et al., 2005; Nylander et al.,
2008). Historical biogeography is a synthetic discipline that produces the most reliable results when
analyses include data from divergence time, evidence
from paleobotany, geological and ecological data, as
well as highly resolved and robust phylogenies (e.g.,
Tiffney & Manchester, 2001; Emerson & Hewitt,
2005; Ree et al., 2005; Carstens & Richards, 2007;
Nylander et al., 2008). Biogeographic analysis using DIVA, which requires little prior information, is
perhaps not the best method of biogeographic reconstruction when information in addition to phylogenetic pattern and distributions of extant taxa is available. However, given that DIVA is fast, user friendly,
and produces results similar to those from the model-based methods that incorporate prior information into the
optimization, our revised Bayes-DIVA approach provides a solution for authors who favor DIVA but face
the problem of polytomies. In biogeographic studies
using DIVA, the prior information on divergence time
and area connections can be used to distinguish among
multiple optimal solutions. DIVA remains advantageous
when working with groups for which little or unreliable
prior information is available, for its ease of use and
freedom from potential error associated with model selection and model parameter determination required for
the model-based methods (Ree et al., 2005; Nylander
et al., 2008).
Bayes-DIVA offers an advantage over using DIVA
in the traditional way as well as over using many methods that require only a single input tree. This is because
the Bayes-DIVA analysis provides statistical support for
inferred ranges, allows for inference at poorly supported
parent nodes of lineages of interest, and allows for other
types of analyses of support for biogeographic reconstruction including the two applications we have shown
here. As suggested by previous authors (Lemmon &
Lemmon, 2008; Nylander et al., 2008; Sanmartín et al.,
2008), this approach is not limited to use with DIVA and
is applicable to other types of biogeographic analyses.
Acknowledgements The authors are highly indebted
to François LUTZONI (Duke University, Durham, NC,
USA) for his assistance in developing this approach
to use of DIVA software. We also acknowledge Beau
DABBS (University of Chicago, Chicago, IL, USA) for
his assistance with mathematics and statistics, David
THOMAS (formerly North Carolina State University)
for helpful discussion, Morris MADURO (University
of California, Riverside, CA, USA) for correspondence regarding the random sequence generation script,
Holly FORBES (University of California, Berkeley, CA,
USA) for collection of fresh leaf materials, and the
Gray Herbarium at Harvard University for the loan of
herbarium specimens. This manuscript is a part of the
thesis of AJ Harris submitted to the NCSU graduate
school in 2007. This study has benefited from a National Science Foundation (USA) grant made to Xiang
(DEB-0444125). For travel support to workshops and
symposia we thank the Deep Time Research Coordination Network, supported by a NSF grant funded to D.E.
Soltis (DEB-0090283), and the Phytogeography of the
Northern Hemisphere Working Group and the Clock
Workgroup supported by NSF through NESCent.
References
Axelrod DI. 1966. The Eocene Copper Basin flora of northeastern
Nevada. University of California Publications in Geological
Science 59: 1–83.
Bremer K. 1992. Ancestral areas: a cladistic reinterpretation of
the center of origin concept. Systematic Biology 41: 436–
445.
Budantsev LJ. 1983. History of the Arctic flora of the early Cenophytic epoch. Nauka, Leningrad. (in Russian).
Burbrink FT, Lawson R. 2007. How and when did Old World rat
snakes disperse into the New World? Molecular Phylogenetics and Evolution 43: 173–189.
Calviño CI, Martínez SG, Downie SR. 2008. Morphology and
biogeography of Apiaceae subfamily Saniculoideae as inferred by phylogenetic analysis of molecular data. American
Journal of Botany 95: 196–214.
Carstens BC, Richards CL. 2007. Integrating coalescent and
ecological niche modeling in comparative phylogeography.
Evolution 61: 1439–1454.
Condit C. 1944. The Remington Hill flora. Washington: Carnegie
Institute of Washington Publication 553: 21–55.
Crane PR, Herendeen P, Friis EM. 2004. Fossils and plant phylogeny. American Journal of Botany 91: 1683–1699.
de Lumley H. 1988. La stratigraphie du remplissage de la Grotte
du Vallonnet. L’Anthropologie 92: 407–428.
Dilhoff RM, Leopold EB, Manchester SR. 2005. The McAbee
flora of British Columbia and its relation to the early-middle
Eocene Okanagan Highlands flora of the Pacific Northwest.
Canadian Journal of Earth Sciences 42: 151–166.
Donoghue MJ, Smith SA. 2004. Patterns in the assembly of the
temperate forest around the Northern Hemisphere. Philosophical Transactions of the Royal Society of London: Biology 359: 1633–1644.
Emerson BC, Hewitt GH. 2005. Phylogeography. Current Biology 15: 367–371.
Felsenstein J. 1989. PHYLIP (Phylogeny Inference Package) Version 3.2. Cladistics 5: 164–166.
Felsenstein J. 2008. PHYLIP (Phylogeny Inference Package) Version 3.68. Distributed by the author. Seattle: Department of
Genome Sciences, University of Washington.
Forest F, Drouin JN, Charest R, Brouillet L, Bruneau A. 2001.
A morphological phylogenetic analysis of Aesculus L. and
Billia Peyr. (Sapindaceae). Canadian Journal of Botany 79:
154–169.
Fiz O, Vargas P, Alarcón M, Aedo C, Garcia JL, Aldasoro JJ.
2008. Phylogeny and historical biogeography of Geraniaceae in relation to climate changes and pollination ecology.
Systematic Botany 33: 326–342.
Golovneva L. 2000. Early Paleogene floras of Spitzbergen and
North Atlantic floristic exchange. Acta Universitatis Carolinae Geologica 44: 39–50.
Hardin JW. 1957a. A revision of the American Hippocastanaceae.
Brittonia 9: 145–171.
Hardin JW. 1957b. A revision of the American Hippocastanaceae,
II. Brittonia 9: 173–195.
Hardin JW. 1960. Studies in the Hippocastanaceae, V. Species of
the Old World. Brittonia 12: 26–38.
Harrington MG, Edwards KJ, Johnson SA, Chase MW, Gadek
PA. 2005. Phylogenetic inference in Sapindaceae sensu lato
using plastid matK and rbcL DNA sequences. Systematic
Botany 30: 366–382.
Harris AJ, Thomas DT, Xiang QY. 2009. Phylogeny, origin, and
biogeographic history of Aesculus L. (Sapindales): an update from combined analysis of DNA sequences, morphology, and fossils. Taxon 58: 108–126.
Hilton J, Bateman RM. 2006. Pteridosperms are the backbone
of seed-plant phylogeny. Journal of the Torrey Botanical
Society 133: 119–168.
Hines HM. 2008. Historical biogeography, divergence times, and
diversification patterns of bumble bees (Hymenoptera: Apidae: Bombus). Systematic Biology 57: 58–75.
Hu HH, Chaney RW. 1940. A Miocene flora from Shantung
Province, China. Washington: Carnegie Institute of
Washington Publication 507: 1–147.
Huelsenbeck JP, Imennov NS. 2002. Geographic origin of
human mitochondrial DNA: accommodating phylogenetic
uncertainty and model comparison. Systematic Biology 51:
155–165.
Huelsenbeck JP, Ronquist F. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755.
Huelsenbeck JP, Ronquist F. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:
1572–1574.
Huttunen S, Hedenäs L, Ignatov MS, Devos N, Vanderpoorten
A. 2008. Origin and evolution of the Northern Hemisphere
disjunction in the moss genus Homalothecium (Brachytheciaceae). American Journal of Botany 95: 720–730.
Jeandroz S, Murat C, Wang Y, Bonfante P, Tacon FL. 2008.
Molecular phylogeny and historical biogeography of the
genus Tuber, the “true truffles”. Journal of Biogeography
35: 815–829.
Jeong EK, Kim K, Kim JH, Suzuki M. 2004. Fossil woods from
the Janggi Group (Early Miocene) in Pohang Basin, Korea.
Journal of Plant Research 117: 183–189.
Johns MV Jr. 1957. Non-parametric empirical Bayes procedures.
The Annals of Mathematical Statistics 28: 649–669.
Judd WS, Saunders RW, Donoghue MJ. 1994. Angiosperm family pairs: preliminary analyses. Harvard Papers in Botany 5:
1–51.
Kauff F, Miadlikowska J, Lutzoni F. 2003. ARC: a program for ambiguous region coding. Available online at
http://www.lutzonilab.net/ and select “Downloadable Programs” [Accessed 10 October 2006].
Kauff F. 2005. RandomTree: random tree sampling. Available online at http://www.lutzonilab.net/ and select “Downloadable
Programs” [Accessed 10 October 2006].
Kearney M. 2002. Fragmentary taxa, missing data, and ambiguity: mistaken assumptions and conclusions. Systematic
Biology 51: 369–381.
Kearney M, Clark JM. 2003. Problems due to missing data in
phylogenetic analyses including fossils: a critical review.
Journal of Vertebrate Paleontology 23: 263–274.
Lemmon AR, Lemmon EM. 2008. A likelihood framework for
estimating phylogeographic history on a continuous landscape. Systematic Biology 57: 544–561.
Lieberman BS. 2003. Paleobiogeography: the relevance of fossils
to biogeography. Annual Review of Ecology and Systematics 34: 51–69.
Lim K. 2008. Historical biogeography of New World emballonurid bats (tribe Diclidurini): taxon pulse diversification.
Journal of Biogeography 35: 1385–1401.
Lutzoni F, Pagel M, Reeb V. 2001. Major fungal lineages are
derived from lichen symbiotic ancestors. Nature 411: 937–
940.
Maddison DR, Maddison WP. 2001. MacClade 4: analysis of
phylogeny and character evolution. Version 4.02. Sunderland: Sinauer Associates.
Mai DH, Walther H. 1988. Die pliozaenen Floren von Thueringen, Deutsche Demokratische Republik. Quartaerpalaeontologie 7: 55–297.
Manchester SR. 1999. Biogeographical relationships of North
American Tertiary floras. Annals of the Missouri Botanical
Garden 86: 472–522.
Manchester SR. 2001. Leaves and fruits of Aesculus (Sapindales)
from the Paleocene of North America. International Journal
of Plant Sciences 162: 985–996.
Manos PS, Soltis PS, Soltis DE, Manchester SR, Oh SH, Bell CD,
Dilcher DL, Stone DE. 2007. Phylogeny of extant and fossil
Juglandaceae inferred from the integration of molecular and
morphological data sets. Systematic Biology 56: 412–430.
Mansion G, Rosenbaum G, Schoenenberger N, Bacchetta G,
Rosselló JA, Conti E. 2008. Phylogenetic analysis informed
by geological history supports multiple, sequential invasions of the Mediterranean Basin by the angiosperm family
Araceae. Systematic Biology 57: 269–285.
Mast AR, Willis CL, Jones EH, Downs KM, Weston PH. 2008. A
smaller Macadamia from a more vagile tribe: inference of
phylogenetic relationships, divergence times, and diaspore
evolution in Macadamia and relatives (tribe Macadamieae;
Proteaceae). American Journal of Botany 95: 843–870.
Nixon KC, Wheeler QD. 1992. Extinction and the origin of
species. In: Novacek MJ, Wheeler QD eds. Extinction and
phylogeny. New York: Columbia University Press. 119–143.
Nylander JAA, Olsson U, Alström P, Sanmartín I. 2008. Accounting for phylogenetic uncertainty in biogeography: a
Bayesian approach to Dispersal–Vicariance Analysis of the
thrushes (Aves: Turdus). Systematic Biology 57: 257–268.
Page RDM. 1993a. COMPONENT: tree comparison software
for Microsoft Windows, Version 2.0, User’s Guide. London:
Natural History Museum.
Page RDM. 1993b. Genes, organisms, and areas: the problem of
multiple lineages. Systematic Biology 42: 77–84.
Page RDM. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Computer Applications
in Bioscience 12: 357–358.
Page RDM. 2001. TreeView for Windows. Version 1.6.6. Available online at http://taxonomy.zoology.gla.ac.uk/ and select
“Software”.
Pagel M, Meade A, Barker D. 2004. Bayesian estimation of ancestral character states on phylogenies. Systematic Biology
53: 673–684.
Posada D, Crandall KA. 1998. MODELTEST: testing the model
of DNA substitution. Bioinformatics 14: 817–818.
Prakash U, Barghoorn ES. 1961. Miocene fossil woods from the
Columbia Basalts of Central Washington, II. Journal of the
Arnold Arboretum 42: 165–203.
Puri GS. 1945. Some fossil leaflets of Aesculus indica Colebr.
from the Karewa Beds at Laredura and Ningal Nullah, Pir
Panjal, Kashmir. Journal of the Indian Botanical Society 24.
Rambaut A, Drummond J. 2003. Tracer. Version 1.3. Available
online at http://evolve.zoo.ox.ac.uk/Evolve/Welcome.html.
Ree RH, Smith SA. 2008. Maximum likelihood inference of
geographic range evolution by dispersal, local extinction,
and cladogenesis. Systematic Biology 57: 4–14.
Ree RH, Moore BR, Webb CO, Donoghue MJ. 2005. A likelihood
framework for inferring the evolution of geographic range
of phylogenetic trees. Evolution 59: 2299–2311.
Ronquist F. 1996. Dispersal Vicariance Analysis (DIVA) 1.1.
User’s manual. Available online at http://www.ebc.uu.se/
syszoo/research/diva/diva.html.
Ronquist F. 1997. Dispersal–vicariance analysis: a new approach
to the quantification of historical biogeography. Systematic
Biology 46: 195–203.
Ronquist F. 2001. Dispersal Vicariance Analysis (DIVA)
1.2. Available online at http://www.ebc.uu.se/syszoo/
research/diva/diva.html.
Rothwell GW. 1999. Fossils and ferns in the resolution of land
plant phylogeny. Botanical Review 65: 189–218.
Rothwell GW, Nixon KC. 2006. How does the inclusion of fossil
data change our conclusions about the phylogenetic history
of euphyllophytes? International Journal of Plant Sciences
167: 737–749.
Sanmartín I, Ronquist F. 2004. Southern Hemisphere biogeography inferred by event-based models: plant versus animal
patterns. Systematic Biology 53: 216–243.
Sanmartín I, Enghoff H, Ronquist F. 2001. Patterns of
animal dispersal, vicariance and diversification in the
Holarctic. Biological Journal of the Linnean Society 73:
345–390.
Sanmartín I, Van Der Mark P, Ronquist F. 2008. Inferring dispersal: a Bayesian approach to phylogeny-based island biogeography, with special reference to the Canary Islands. Journal
of Biogeography 35: 428–449.
Schloemer-Jäger A. 1958. Alttertiare pflanzen aus flozen der
bragger-halbinsel Spitzbergens. Paleontographica Abt B
39–103.
Soltis DE, Morris AB, MacLachlan JS, Manos PS, Soltis PS.
2006. Comparative phylogeography of unglaciated eastern
North America. Molecular Ecology 15: 4261–4293.
Swofford DL. 2002. PAUP* – Phylogenetic analysis using parsimony
(*and other methods). Version 4.0b10. Sunderland:
Sinauer Associates.
Szafer W. 1947. The Pliocene flora of Kroscienko in Poland.
Rozpr Wydz mat-przyr Akad Um. 72: 91–162. (in Polish
and English).
Szafer W. 1954. Pliocene flora from the vicinity of Czorsztyn
(West Carpathians) and its relationship to the Pleistocene.
Institute of Geology of Warszawa 111: 1–238. (in Polish and
English).
Tanai T. 1952. The fossil vegetation from the coalified basin of
Nishitagawa, Prefecture of Yamagata, Japan. Japanese Journal of Geology and Geography 22: 119–135. (in French).
Tiffney BH, Manchester SR. 2001. Integration of paleobotanical
and neobotanical data in the assessment of phylogeographic
history of Holarctic angiosperm clades. International Journal of Plant Sciences 162: S19–S27.
Velazco PM, Patterson BD. 2008. Phylogenetics and biogeography of the broad-nosed bats, genus Platyrrhinus (Chiroptera:
Phyllostomidae). Molecular Phylogenetics and Evolution
49: 479–459.
Wehr WC. 1998. Middle Eocene insects and plants of the Okanagan Highlands. In: Martin JE ed. Contributions to the paleontology and geology of the West Coast: in honor of V.
Standish Mallory. Seattle: Thomas Burke Memorial Washington State Museum Research. 99–109.
Weinelt M. 1999. Online Map Creation (OMC) version 4.1.
Available online at http://www.aquarius.ifm-geomar.de [Accessed 1 Jan 2008].
Wen J. 1999. Evolution of the eastern Asian and eastern North
American disjunct distributions in flowering plants. Annual
Review of Ecology and Systematics 30: 421–455.
Wiens JJ. 2003. Missing data, incomplete taxa, and phylogenetic
accuracy. Systematic Biology 52: 528–538.
Wiens JJ. 2006. Missing data and the design of phylogenetic
analyses. Journal of Biomedical Informatics 39: 34–42.
Wiggins IL. 1932. The lower California buckeye, Aesculus parryi
A. Gray. American Journal of Botany 19: 406–410.
Xiang QY, Thomas DT. 2008. Tracking character evolution and
biogeographic history through time in Cornaceae—Does
choice of methods matter? Journal of Systematics and Evolution 46: 349–374.
Xiang QY, Soltis DE, Soltis PS. 1998a. The eastern Asian and
eastern and western North American disjunction: congruent
phylogenetic patterns in seven diverse genera. Molecular
Phylogenetics and Evolution 10: 178–190.
Xiang QY, Crawford DJ, Wolfe AD, Tang YC. 1998b. Origin and biogeography of Aesculus L. (Hippocastanaceae):
a molecular phylogenetic perspective. Evolution 52: 988–
997.
Xiang QY, Zhang WH, Ricklefs RE, Qian H, Chen ZD, Wen J,
Li JH. 2004. Regional differences in rates of plant speciation and molecular evolution: a comparison between eastern Asia and eastern North America. Evolution 58: 2175–
2184.
Xiang QY, Manchester SR, Thomas DT, Zhang WH, Fan C.
2005. Phylogeny, biogeography, and molecular dating of
cornelian cherries (Cornus, Cornaceae): tracking Tertiary
plant migration. Evolution 58: 1685–1700.
Xiang QY, Thomas DT, Zhang WH, Manchester SR, Murrell Z.
2006. Species level phylogeny of the genus Cornus (Cornaceae) based on molecular and morphological evidence
– implications for taxonomy and Tertiary intercontinental
migration. Taxon 55: 9–30.
Xiang QY, Smith SA, Harris AJ, Feng C. 2009. Use of fossils in
biogeographic analysis – challenges and possible solutions.
Abstract. Invited presentation: 4th International conference
of the International Biogeography Society, Merida, Mexico.
69.
Appendix I: DNA sequences of Aesculus and
outgroups.
Notes: For each taxon, information reads as taxon,
accession number, and gene sequence data available.
GenBank accessions are given following gene names.
Internal transcribed spacer (ITS) accessions are given in
the order ITS1, ITS2, and 5.8s if available. Superscripts
correspond to numbered accessions in Fig. 3a of Harris
et al. (2009).
Ingroup.—Section Aesculus—. A. hippocastanum L., Kew 00-69.11289-263, rps16 (EU687697)
matK (EU687725) ITS (EU687600, EU687637);
A. turbinata Blume, D.J. Crawford 411¹, rps16
(EU687695) matK (EU687723) ITS (EU687598,
EU687635); A. turbinata Blume, J.C. Raulston Arboretum 950016², rps16 (EU687696) matK (EU687724)
ITS (EU687599, EU687636, EU687666); Section
Calothyrsus (traditional)—. A. assamica Griff., Mongolia Expedition 10039, rps16 (EU687676) ITS
(EU687578, EU687615, EU687651); A. californica
(Spach.) Nutt., D.J. Crawford 406¹, rps16 (EU687689)
matK (EU687715) ITS (EU687590, EU687627,
EU687659); A. californica (Spach.) Nutt., T.M. Hardig
2795², rps16 (EU687690) matK (EU687716) ITS
(EU687591, EU687628, EU687660); A. californica
(Spach.) Nutt., J.C. Raulston Arboretum 950413³,
rps16 (EU687691) matK (EU687717) ITS (EU687592,
EU687629, EU687661); A. californica (Spach.) Nutt.,
UC Berkeley 93.1203⁴, rps16 (EU687692) matK
(EU687718) ITS (EU687593, EU687630, EU687662);
A. californica (Spach.) Nutt., UC Berkeley 93.1116⁵,
rps16 (EU687693) matK (EU687719) ITS (EU687594,
EU687631, EU687663); A. chinensis Bunge, Q.Y.
Xiang 305¹, rps16 (EU687678) ITS (EU687580,
EU687617, EU687652); A. chinensis Bunge, Q.Y. Xiang 04-C88², rps16 (EU687677) matK (EU687706)
ITS (EU687579, EU687616); A. indica (Camb.)
Hook, Q.Y. Xiang 301¹, rps16 (EU687686) matK
(EU687711) ITS (EU687587, EU687624); A. indica
(Camb.) Hook, J.C. Raulston Arboretum 001405²,
rps16 (EU687687) matK (EU687712) ITS (EU687588,
EU687625, EU687658); A. polyneura Hu & Fang, Q.Y.
Xiang 02-255, rps16 (EU687681) matK (EU687707)
ITS (EU687582, EU687619, EU687654); A. tsiangii Hu
& Fang, Q.Y. Xiang 04-C37, rps16 (EU687685) matK
(EU687710) ITS (EU687586, EU687623, EU687657);
A. wilsonii Rehder, Q.Y. Xiang 02-105¹, rps16
(EU687684) ITS (EU687585, EU687622, EU687656);
A. wilsonii Rehder, Q.Y. Xiang 04-C9², rps16
(EU687683) matK (EU687709) ITS (EU687584,
EU687621, EU687655); A. wangii Hu, Q.Y. Xiang 303, rps16 (EU687682) matK (EU687708) ITS
(EU687583, EU687620); Section Macrothyrsus—. A.
parviflora Walter, J.C. Raulston arboretum sene non.,
rps16 (EU687694) matK (EU687721) ITS (EU687596,
EU687633, EU687664); Section Pavia—. A. glabra
Willd., D.J. Crawford 413, rps16 (EU687702) matK
(EU687734) ITS (EU687607, EU687644, EU687671);
A. flava Sol., C.W. DePamphilis F-MI-4¹, matK
(EU687737); A. flava Sol., Q.Y. Xiang 98-150²,
rps16 (EU687703) matK (EU687738) ITS (EU687610,
EU687647, EU687672); A. pavia L., Q.Y. Xiang
01-54¹, rps16 (EU687700) matK (EU687732) ITS
(EU687605, EU687642, EU687669); A. pavia L., Q.Y.
Xiang 98-135², rps16 (EU687701) matK (EU687733)
ITS (EU687606, EU687643, EU687670); A. sylvatica Bart., Q.Y. Xiang 01-251¹, rps16 (EU687698)
matK (EU687726) ITS (EU687601, EU687638,
EU687667); A. sylvatica Bart., Q.Y. Xiang 98-110²,
rps16 (EU687699) matK (EU687728) ITS (EU687602,
EU687639, EU687668); Section Parryana—. A. parryi
Gray, Epling 1936 sene non, rps16 (EU687688) matK
(EU687714).
Outgroup.—Handeliodendron bodinieri (Levl.)
Rehd., Q.Y. Xiang 302, rps16 (EU687674) ITS
(EU687575, EU687612, EU687649); Billia Peyr
sp., Q.Y. Xiang 02-12, rps16 (EU687675) matK
(EU687705) ITS (EU687577, EU687614, EU687650).