Phylogenomic Insights into the Cambrian

Syst. Biol. 62(1):93–109, 2013
© The Author(s) 2012. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
For Permissions, please email: [email protected]
DOI:10.1093/sysbio/sys074
Advance Access publication on September 4, 2012
Phylogenomic Insights into the Cambrian Explosion, the Colonization of Land and the
Evolution of Flight in Arthropoda
CHRISTOPHER W. WHEAT1,2,∗
1 Department
AND
NIKLAS WAHLBERG3
of Biosciences, PL 65, Viikinkaari 1, 00014 University of Helsinki, Finland; 2 Department of Zoology, Stockholm University, S-106 91
Stockholm, Sweden and 3 Laboratory of Genetics, Department of Biology, University of Turku, FI-20014 Turku, Finland
∗ Correspondence to be sent to: Department of Biosciences, PL 65, Viikinkaari 1, 00014 University of Helsinki, Finland;
E-mail: [email protected].
Received 11 May 2012; reviews returned 30 July 2012; accepted 20 August 2012
Associate Editor: Brian Wiegemann
Abstract.—The timing of the origin of arthropods in relation to the Cambrian explosion is still controversial, as are the timing
of other arthropod macroevolutionary events such as the colonization of land and the evolution of flight. Here we assess
the power of a phylogenomic approach to shed light on these major events in the evolutionary history of life on earth.
Analyzing a large phylogenomic dataset (122 taxa, 62 genes) with a Bayesian-relaxed molecular clock, we simultaneously
reconstructed the phylogenetic relationships and the absolute times of divergences among the arthropods. Simulations
were used to test whether our analysis could distinguish between alternative Cambrian explosion scenarios with increasing
levels of autocorrelated rate variation. Our analyses support previous phylogenomic hypotheses and simulations indicate
a Precambrian origin of the arthropods. Our results provide insights into the 3 independent colonizations of land by
arthropods and suggest that evolution of insect wings happened much earlier than the fossil record indicates, with flight
evolving during a period of increasing oxygen levels and impressively large forests. These and other findings provide a
foundation for macroevolutionary and comparative genomic study of Arthropoda. [Arthropod; Bayesian relaxed clock;
Cambrian explosion; comparative genomics; macroevolution; phylogenomics; timing of divergence]
Arthropoda, as the largest animal phylum, may be
considered the most successful group of living animal
species. It includes the largest class of metazoans on
earth, Insecta, which comprise more than half of all
described species (Grimaldi and Engel 2005). Beyond
sheer numbers, arthropods also exhibit an incredible
range of morphological and ecological diversity and
can be found in nearly all habitats on our planet.
Understanding the evolutionary history of Arthropoda
is therefore central to understanding the tempo and
mode of evolution on earth. However, resolving the
evolutionary history of the early arthropods is difficult
as the fossil record is generally scant and episodic
(Grimaldi and Engel 2005; Budd and Telford 2009). The
timing of the origin of arthropods and its relation to the
Cambrian explosion is still controversial, as are several
macroevolutionary events, such as the colonization of
land and the evolution of flight. Here we assess the power
of a phylogenomic approach to shed light on these major
events in the evolutionary history of life on earth.
ancestral lineages of the Cambrian fauna. Continuing
with the explosion metaphor, this interpretation is
referred to as a “short-fuse” scenario (Fig. 1a; Gould 1989;
Conway Morris 2006). Others view this “explosion” as
a historical artifact and argue for a deeper taxonomic
age of arthropods, with longer divergence times among
ancestral lineages extending back into the Precambrian.
This alternative is referred to as a “long-fuse” scenario
(Briggs and Fortey 2005). Although the short-fuse
scenario necessarily requires a dramatic evolutionary
radiation, the long-fuse scenario allows for either a
gradual diversification or an older, yet still rapid
radiation (Fig. 1b, c).
Numerous studies over the past 2 decades have
worked to illuminate the origins of the Cambrian
explosion using molecular data. Generally, molecular
estimates of metazoan ages have been older than fossil
based estimates, and among them there is considerable
differences in the length of their Cambrian fuse,
much of which can be attributed to the sampling
and analytical biases (Bromham 2006). Significant
advances in molecular dating with the advent of relaxed
clock methods suggest the potential for analyses that
are significantly less biased than previous studies
(Drummond et al. 2006; Battistuzzi et al. 2010). In
fact, a series of relaxed clock studies have reported
molecular-based estimates claiming congruence with
Cambrian fossil estimates (Aris-Brosou and Yang 2002;
Aris-Brosou and Yang 2003; Peterson et al. 2004; Peterson
et al. 2008). These results have been welcomed as
long sought evidence of a possible congruence between
molecular and fossil data (Briggs and Fortey 2005; Budd
2008). However, subsequent re-analysis of these studies
identify these claims to derive from spurious results
Origins: A Short- or Long-Fuse Cambrian Explosion?
Prior to the Cambrian, very few metazoan body or
trace fossils have been identified (Briggs and Fortey
2005). This paucity of metazoan fossils in the strata
of Earth is broken by the sudden appearance of
highly developed metazoan fossils in the Cambrian,
a pattern colloquially referred to as the Cambrian
evolutionary “explosion” (Conway Morris 2006). The
cause of this “explosion” remains an unsolved question
(Briggs and Fortey 2005), interpreted by some as
evidence of a very dramatic evolutionary radiation,
with short evolutionary divergence times among the
93
[17:24 14/12/2012 Sysbio-sys074.tex]
Page: 93
93–109
94
SYSTEMATIC BIOLOGY
VOL. 62
FIGURE 1. Representation of the competing hypotheses for the ancient evolutionary radiation of the Arthropoda. Squares are the first instance
of a given taxonomic group in the fossil record. In gray are indicated the temporal positions of the primary Cambrian Konservat-Langerstatte
sites of the Burgess Shale and Chengjiang, along with the termination of the last global glaciation event (i.e., snowball earth). a) short-fuse, b)
long-fuse, gradual, c) long-fuse, rapid. Phylogenetic relationships from Regier et al. (2010), fossils dates from Briggs and Fortey (2005).
(Blair and Hedges 2005; Ho et al. 2005), or result from
excessive restrictions on priors that overly bias posterior
estimates (Sanders and Lee 2010).
A very recent study took a different approach
by exploiting EST databases, sacrificing even taxon
coverage for large numbers of orthologous gene regions
(Rehm et al. 2011). The times of divergence in that study
were based on a fixed topology taken from a previous
study (Meusemann et al. 2010b), and the evolution of
amino acids was calibrated by giving 6 nodes minimum
ages and one node both a minimum and a maximum
age, but the distributions of these age priors are not clear
from the paper. The results suggest that the arthropods
originated in the Precambrian some 590 Ma. However,
several aspects of this study complicate interpretation.
First, an EST approach can result in a dataset with a
high percentage of missing data that can be problematic
for analyses (Sanderson et al. 2010). Second, the taxon
coverage of the several important taxa is restricted to
1 or 2 individuals (e.g., Myriapoda, Pycnogonida and
Pulmonata) or is missing crucial taxa (e.g., Xenocarida of
Pancrustacea). Third, the use of hard constraints for age
priors can lead to overparameterization and result in an
inability of the actual molecular data to have meaningful
effects on posterior estimates (Ho and Phillips 2009;
Heled and Drummond 2012).
When Arthropods Colonized Land and Evolved Flight
Arthropods appear to be the first multicellular animals
to colonize land and 3 groups are abundant in many
[17:24 14/12/2012 Sysbio-sys074.tex]
ecosystems even today, i.e., hexapods, chelicerates, and
myriapods. However, uncertainty remains regarding
when (Labandeira 2005) and how many times ancient
arthropods colonized land. The first identifiable
terrestrial animal fossils, which were the ancestors of
chelicerates and myriapods, appear in the late Silurian
(ca. 419 Ma; Jeram et al. Edwards 1990). The first
fossilized tracks that are clearly terrestrial occur earlier,
in the early Ordovician (ca. 490 Ma), and are mostly likely
left by chelicerate ancestors (MacNaughton et al. 2002).
The phylogenetic position of the hexapods, whose fossils
appear in the Devonian (400 Ma), has been controversial,
with one hypothesis suggesting that they are sister to
the myriapods, which would infer one colonization of
land by their common ancestor (Averof and Akam 1995).
However, it is becoming increasingly clear that insects
are a clade within crustaceans, suggesting independent
colonization of land by myriapods and insects, as well
as by chelicerates (Regier et al. 2010). Crustaceans,
such as woodlice, land crabs, and coconut crabs, have
presumably colonized land more recently and multiple
times (Wildish 1988; Schubart et al. 1998; Labandeira
2005).
Few molecular studies have explicitly addressed the
question of when and how many times land was
colonized. Pisani et al. (2004) suggest that the ancestor
of chelicerates and myriapods colonized land relatively
late in their evolutionary history, i.e., very close to the
time suggested by the fossil record. However, that study
suffers from sparse taxonomic sampling (n = 8 species),
which has been shown to affect estimates of times of
Page: 94
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
TABLE 1.
95
Arthropod species with genomic resourcesa and their human relevance
Human importance
No. of samples by common names
Species names
Disease vectors (malaria, yellow
fever, sleeping sickness,
Chagas, etc.)
3 mosquitoes, 1 tick, 1 louse, 3 flies, 1
bug
Agricultural pests
1 locust, 2 aphids, 2 beetles,1 fly, 1 ant, 2
moths, 1 earwig
Agricultural services (pollination,
products)
Research systems
(Ecology/Genetics/Development)
1 honey bee, 1 wasp, 1 moth
Anopheles gambiae, Culex quinquefasciatus, Aedes aegypti,
Ixodes scapularis, Pediculus humanus, Glossina morsitans,
Lutzomyia longipalpis, Phlebotomus papatasi, Rhodnius
prolixus
Locusta migratoria, Acyrthosiphon pisum, Aphis gossypii,
Dendroctonus ponderosae, Tribolium castaneum,Mayetiola
destructor, Solenopsis invicta, Choristoneura fumiferana,
Heliothis virescens, Spodoptera frugiperda, Forficula
auricularia
Apis mellifera, Nasonia vitripennis, Bombyx mori
a Species
Drosophila species, Bicyclus anynana, Heliconius melpomene,
Melitaea cinxia, Pieris rapae, Anthocharis cardamines,
Belenois gidica, Gonepteryx rhamni, Delias nigrina
12 fruit flies, 8 butterflies
having whole genome or transcriptome sequence in available databases.
divergences (Hug and Roger 2007). Other studies do not
explicitly address the colonization of land (Sanders and
Lee 2010; Rehm et al. 2011), but their estimates of times of
divergence of the 3 major groups suggest that land was
colonized much earlier, i.e., in the Cambrian.
Winged insects (Pterygota) are by far the most
successful group of terrestrial arthropods (they comprise
98.5% of known hexapods; Mayhew 2003; Grimaldi and
Engel 2005), evidently due to their ability to fly. Insects
were the first group of organisms to take wing and their
flight appears to have evolved only once. Although a
diverse range of winged insect fossils are found in the late
Carboniferous (ca. 325 Ma), and insect fossils are known
before this period at ∼420 and 390 Ma, the intervening
time period is one of exceptional fossil paucity across
nearly all life, known as Romers Gap (see Fig. 1 in
Ward ; Labandeira et al. 2006). As a result, very little is
known about when flight evolved (Mayhew 2003). The
recent study by Rehm et al. (2011) suggests that Pterygota
diverged from its sister group ∼450 Ma, which is in
strong conflict with the fossil record.
Dating in the Phylogenomic Era
Previously, all phylogenomic studies of arthropods
reflected a common sampling tradeoff, as studies used
either many genes and very few taxa (e.g., Pisani et al.
2004; Savard et al. 2006; Timmermans et al. 2008; Simon
et al. 2009) or few genes and many taxa (e.g., Giribet
et al. 2001; Mallatt et al. 2004; Sanders and Lee 2010). Both
approaches suffer from significant sampling biases that
can affect inference (Felsenstein 1978; Soltis et al. 2004;
Gatesy et al. 2007). Two recent studies have been made in
the true spirit of the phylogenomic paradigm, one based
on targeted single-copy nuclear genes (62 genes for 41 kb)
with an even taxon sampling scheme across all major
lineages of arthropods (Regier et al. 2010), the second
being the previously mentioned study by Meusemann
et al. (2010a) along with the companion article by Rehm
et al. (2011).
[17:24 14/12/2012 Sysbio-sys074.tex]
Despite these promising advances, many molecularbased studies suffer from analytical biases, such as
(i) assumptions of molecular-clock-like behavior (ArisBrosou and Yang 2002; Welch and Bromham 2005), (ii)
assumptions of no rate variation among positions (Welch
and Bromham 2005), (iii) the use of priors that bias
posterior estimates (Blair and Hedges 2005; Welch et al.
2005; Benton and Donoghue 2007; Ho and Phillips 2009;
Sanders and Lee 2010; Heled and Drummond 2012),
(iv) systematic overestimation of interior, basal branches
(Blair and Hedges 2005; Ho et al. 2005), and potentially
(v) violations of model assumptions of either random- or
autocorrelated rate variation (Battistuzzi et al. 2010). In
general, the interpretation of many phylogenetic studies
is complicated by some combination of these biases.
Here, we use an established Bayesian-relaxed
molecular clock approach (Drummond et al. 2006) and
explicitly avoid and quantify the previously mentioned
biases by (i) using soft bounds on priors (Ho and
Phillips 2009), (ii) assessing the effect of priors on
posterior estimates, (iii) conducting simulations to
assess how autocorrelated rate variation could affect
our posterior estimates of the Cambrian explosion [as
this violates the underlying assumptions of our analysis
(Battistuzzi et al. 2010)], and (iv) assessing the effect
of our fossil calibrations on temporal estimates (Hug
and Roger 2007). Our data sampling takes advantage
of phylogenomic advances in Arthropoda (Table 1)
by using an extended phylogenomic dataset based on
the study by Regier et al. (2010) for a computationally
intensive, simultaneous estimation of phylogenetic
relationships and times of divergence.
MATERIALS AND METHODS
Data
Sixty-two genes, identified as single copy within
Arthropoda (Regier et al. 2010), were used in
BLAST searches to find homologous sequences
in 3 different types of databases [whole-genome
Page: 95
93–109
96
SYSTEMATIC BIOLOGY
sequence (WGS), WGS predicted gene sets, and
assembled transcriptomes from EST sequencing
projects; see Table S1 for details, available online
at:
http://datadryad.org/review?wfID=3040&token
=836f474a-dd66-4000-af86-59ee83a90139]. We use data
from all of the publicly available arthropod genomes (n =
25) and several EST databases (n = 17). Species scientific
name, common name, length of DNA sequence
obtained, data source (genomic vs. transcriptomic),
database, and source are all provided (Supplementary
Table S1).
Gene sequence from WGS was obtained by first
identifying the beginning and end of the chromosomal
region containing the gene of interest, which was then
clipped and fed into the gene prediction program
Genescan
[http://genes.mit.edu/GENSCAN.html;
(Burge and Karlin 1997)]. Using custom python scripts,
identified genes collected from each species database
were aligned with reference sequence using Blastalign
software, which maintained codon positions, and
concatenated. Final alignment was doublechecked by
eye and third positions removed.
Time
Bayesian inference of phylogeny and times of
divergence were performed using the BEAST v1.5.4
software package (Drummond and Rambaut 2007).
Datasets were analyzed as one partition under the
GTR+ model with a relaxed clock allowing branch
lengths to vary according to an uncorrelated log-normal
distribution (Drummond et al. 2006). The branch lengths
were also allowed to vary according to an exponential
distribution to assess the effects of changing priors on
results. The tree prior was set to the birth–death process.
Initial runs with BEAST showed that arthropods did not
remain monophyletic with regard to the outgroup taxa
Tardigrada and Onychophora. Because the monophyly
of Arthropoda is not in question, we constrained it
to be monophyletic in the analyses. All other priors,
except calibration points described below, were left
to the defaults in BEAST. Parameters were estimated
using 26 independent runs of 10–20 million generations
each, with parameters sampled every 2000 generations.
Convergence and effective sample sizes of parameter
estimates were checked in the Tracer v1.4.6 program;
trimmed datasets were combined to yield output files
with 147,465 sampled generations, from which summary
trees were generated using TreeAnnotator v1.5.3.
Fossil Calibration Points
Analyses were based on 8 calibration points with their
priors modeled as a normal distribution, with means set
to the estimated age of the fossil and standard deviations
about these means were set to give confidence intervals
±5% of the fossil age. The use of a mean distribution for
priors, without hard upper or lower constraints, reflects
the uncertainty in the fossil record and allows posterior
[17:24 14/12/2012 Sysbio-sys074.tex]
VOL. 62
estimates to vary in either direction based on their
interactions with the other calibration points during
analysis (Sanders and Lee 2007; Ho and Phillips 2009).
The fossil calibrated nodes are as follows: (i) the first split
in Pycnogonida set to a mean of 425 Ma (SD 11), based
on the fossil Haliestes (Arango and Wheeler 2007), (ii)
the first split in Pulmonata (spiders and scorpions) with
a mean of 417 Ma (SD 11) based on the fossil Proscorpio
(Dunlop, Tetlie and Lorenzo 2008), (iii) the node defining
Communostraca (barnacles, crabs, lobsters, and wood
lice) with a mean of 425 Ma (SD 11) based on the fossil
Rhamphoverritor (Briggs et al. 2005), (iv) the first split
in Chilopoda (centipedes) at 417 Ma (SD 11) based on
the fossil Crussolum (Edgecombe and Giribet 2007), (v)
the first split in Vericrustacea (fairy shrimp, copepods,
barnacles, and crabs) with a mean of 516 Ma (SD 11)
based on the fossils Yicaris and Rehbachiella (Olesen 2004;
Zhang et al. 2007; Møller et al. 2008), (vi) the first split
in Hexapoda was given a mean of 425 Ma (SD 7), based
on the first known fossil of a hexapodan from the late
Devonian/early Silurian (Grimaldi and Engel 2005), (vii)
the first split in Holometabola with a mean of 300 Ma (SD
11) based on a fossil gall on a fern frond attributed to a
holometabolous insect (Labandeira and Phillips 1996),
and (viii) the first split in Diptera with a mean of 230
Ma (SD 11) based on the fossil Grauvogelia arzvilleriana
(Krzeminski et al. 1994). We note that calibrations 1–5
have been used in Sanders and Lee (2010), calibrations 6
and 8 have been used by Rehm et al. (Rehm et al. 2011),
and calibration 8 was used by Wiegmann et al. (2009).
We also tested the effects of using uniform priors
instead of normal priors for the calibration points. In
these cases, the above times were given as minimum
bounds and a maximum bound was given as 100 myr
older than the minimum bound.
Simulation Analyses
Previous assessment of the accuracy of relaxed clock
methods for temporal inference has explored the effects
of node constraints (i.e., fossil dates), tree topology, and
taxonomic sampling (Hug and Roger 2007). Taxonomic
subsets were found to give nearly identical temporal
inferences to the full datasets when enough taxa for
phyogenetic inference were used (i.e., tree topology
as the same in the subset taxa dataset) and coupled
with the same node constraints (Hug and Roger 2007).
This is an important observation when assessing the
temporal inferences from large phylogenomic datasets,
as the BEAST runs of our full dataset (122 taxa for 27,984
bp) with a sufficient number of chains for convergence
and sufficient effective sample sizes (147,465 sampled
generations) took ∼3600 h on a high-performance
computer cluster. In comparison, a taxonomic subset
consisting of 21 taxa for 27,984 bp, representing the
minimal taxa necessary to implement our 8 temporal
constraints, generally finished in 40 h on the same
cluster. This subset approach was necessary, because
simulating and running the full dataset is currently
Page: 96
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
computationally prohibitive: running the subset data
alone required roughly 400,000 CPU hours for the final
simulation analyses used in this article (40 different
parameter settings × 10 runs per parameter setting ×
2 modeled scenarios × 12 CPUs per run × 40 h per run).
All simulations used the same topology as our full
dataset. However, there were 2 types of trees generated,
one with branch lengths based on our observed data
(OD) and the other reflecting a Cambrian explosion
scenario (CE). CE branch lengths were identical to the
OD tree, except for those branch lengths >520 myr
old, as they were shortened to conform to a radiation
of the basal branches to have occurred within the 20
myr window between 540 and 520 Ma (Supplementary
Fig. S1). BEAST is very robust to a range of violations
of its lineage rate variation assumptions, except under
conditions of autocorrelated rate variation analyzed
with an uncorrelated log-normal model (Drummond
et al. 2006), which we used in our analyses. Thus,
by examining the potential effect of autocorrelated
variation on our analyses, we assessed the conditions
most likely to have the largest negative influence on our
findings.
Autocorrelated rate change assumes the heritability
of rate change, with closely related species having
more similar rates. Autocorrelated rate change among
branches was modeled by modifying the starting
tree’s branch lengths. Beginning with the basal branch,
new descendant lineage rates were drawn from an
exponential distribution having a mean equal to the rate
of their ancestral lineage as implemented in RateEvolver
(Ho et al. 2005) (which was kindly modified by its creator
Simon Ho to fit a topology matching our taxonomic
subset). The variance of the distribution from which new
rates were drawn can be modified, as can the probability
of a given branch experiencing a rate change and the
mutation rate. All simulations had a probability of rate
change per branch of 1, except for clock simulations.
Our starting tree had branch lengths equal to millions of
years, which were multiplied by the new autocorrelated
rates depending on the simulation and settings, and the
resulting modified trees then served as templates for
simulations of DNA sequence evolution.
DNA sequences were allowed to evolve with branch
lengths proportional to mutation rate, starting with
the root ancestor, as implemented in SeqGen (Rambaut
and Grassly 1997) using the graphical user interface
SGRunner, written by T.P. Wilcox and provided as part
of the Seq-Gen package. DNA mutation parameters
were determined using likelihood ratio testing upon
our observed subset data as implemented in Modeltest
3.7 (Posada and Crandall 1998). The best-fit model
(GTR + I + G) selected had base frequencies = (0.3138
0.2269 0.2397), substitution model rate matrix = (2.8601
2.7620 1.5287 1.6254 4.7010), gamma distribution shape
parameter = 0.8990, and proportion of invariable sites
= 0.4251, with the rate of transitions and transversions
equal. Using these parameters and modeling gamma
rate heterogeneity with 10 categories, SeqGen was used
to generate datasets of 21 taxa, for 27,984 nucleotides
[17:24 14/12/2012 Sysbio-sys074.tex]
97
(i.e., size and mutation parameters were identical to
our subset data). BEAST analyses upon these simulated
datasets followed previously described analyses on the
observed datasets, but with 50 million chains run,
sampled every 2000.
For each set of rate variation and variance parameters,
10 random realizations of autocorrelated branch length
manipulations were performed and analyzed with
BEAST. Output files were combined only after each
replicate was checked for posterior asympototic behavior
and an appropriate “burn-in” selected. Topologies
failing to asymptote after 40 million runs were discarded.
The trimmed output was then combined to yield
posterior estimates of mean rate, coefficient of variation
in rates, ucld.mean, ucld.stdev, and the mean and
95% highest posterior density (HPD) for age of the
arthropods. Please see schematic diagram provided for
more details (Supplementary Fig. S1).
RESULTS
Phylogenetic Relationships
Available arthropod databases were searched for
orthologs of the 62 nuclear genes used by Regier et al.
(2008, 2010). We identified a total of 854,786 bp sequence
data in 42 new species (average = 19,879 bp per species),
41 of which are from Insecta, represented by 25 species
from genome sequence databases and 17 species from
expressed sequence tag databases (Table S1). Sequences
were aligned and concatenated with the Regier et al.
dataset, for a total of 122 taxa in our study, and a
total of 28% missing data (our new 42 species had an
average of 45% missing data). Our sampling increased
the representation of the most species-rich group of
macro-organisms, the insects, by 300% compared with
the Regier et al. (2010) dataset (which contained 14
species of Insecta). All gene regions included are
protein-coding, single copy, and orthologous (Regier
et al. 2008), and thus alignment was relatively trivial.
For analyses, third codon positions were removed, as
they proved to be too variable to be phylogenetically
informative (Regier et al. 2008). Analyses concentrated
on estimating times of divergence, but topology was
estimated concomitantly using the software BEAST
(Drummond et al. 2006). The resulting tree file is
provided, containing detailed information on node
support and age estimates (Supplementary Text S1).
A stable topology is essential to having meaningful
estimates of divergence times. Our estimated topology is
essentially congruent with that reported by Regier et al.
(2010), differing at a few poorly supported nodes (Fig. 2).
Major lineages that were previously unrepresented, such
as the social insects (Hymenoptera), beetles (Coleoptera),
and the flies (Diptera), show expected relationships
and reinforce the placement of Hymenoptera as the
most basal branching lineage within the holometabolous
insects (Fig. 2; Savard et al. 2006; Consortium 2007;
Wiegmann et al. 2009; Mutanen et al. 2010). In addition,
Page: 97
93–109
98
SYSTEMATIC BIOLOGY
VOL. 62
FIGURE 2. Simultaneously estimated phylogenetic relationships and temporal divergences of the arthropods. Posterior probability support
for each node is indicated above the branch to the left of the node. The 95% HPD of the node age estimate is given for each node. Nodes that
were calibrated with fossils are indicated with orange 95% HPD bars. Taxa with Whole-Genome Sequences available are highlighted yellow;
those with comprehensive Expressed Sequence Tag libraries available are highlighted orange. Named clades are discussed in the text.
[17:24 14/12/2012 Sysbio-sys074.tex]
Page: 98
93–109
2013
TABLE 2.
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
99
Ages of the major crown clades of Arthropoda
Clade
Arthropoda
Pycnogonida
Euchelicerata
Arachnida
Myriapoda
Pancrustacea
Ichthyostraca
Hexapoda
Paleoptera
Polyneoptera
Paraneoptera
Holometabola
Aculeata
Hymenoptera
Coleoptera
Lepidoptera
Nymphalidae
Pieridae
Drosophila
Estimated age
95% HPD
706
420
533
501
538
567
292
433
326
279
275
308
100
167
167
198
77
84
112
631–787
398–442
476–592
449–553
469–614
532–606
179–401
420–445
278–367
218–334
219–325
292–325
47–156
112–228
102–233
170–225
50–104
60–109
85–141
we find Paraneoptera and Polyneoptera to both be
monophyletic. Thus, increased taxon sampling had no
significant effects on topology, while providing greater
resolution within Insecta, indicating a stable backbone
for temporal inference.
Dates: An Assessment of Prior Bias on Posterior Estimates
Eight calibration points can be placed with relative
certainty on the arthropod phylogeny (Fig. 2; Grimaldi
and Engel 2005; Wiegmann et al. 2009; Sanders and
Lee 2010). These were modeled as true soft constraints,
using a normal distribution centered on the fossil age
estimate, reflecting the bidirectionality of uncertainty
inherent in such calibrations (Ho and Phillips 2009).
Mean estimates and their 95% HPD for the age of
clades of interest, and for the last common ancestor
between pairwise comparisons among model genomic
arthropods, were tabulated for easy access (Tables 2
and 3; a detailed pairwise species level list is also
included in Supplementary Table S2).
We next assessed how our priors had influenced
the posteriors by comparing our data driven
posterior estimates to the results of a “null analysis,”
which was solely based upon the fossil priors (i.e.,
without the DNA data). Of the 8 calibration points,
the posterior distributions of age estimates were
approximately the same as the prior distributions for
4 nodes (Vericrustacea, Communostraca, Diptera, and
Pycnogonida), yet had been “updated” by the data
resulting in posteriors that were younger for 2 nodes
(Pulmonata and Chilopoda; Fig. 3) and older for 2
nodes (Hexapoda and Holometabola; Fig. 3). Such
effects were also reflected in analyses using uniform
priors, with the first 4 nodes mentioned above having
a posterior distribution whose 95% HPD tails were
within the uniform prior bounds (not shown), the
second 2 having posterior distributions that are highly
[17:24 14/12/2012 Sysbio-sys074.tex]
FIGURE 3. Comparison of the effect of different priors on posterior
temporal estimates for 4 representative fossil calibrated nodes. Two
posterior density distributions are shown in each plot, resulting
from either a normal or uniform prior, along with the prior normal
distribution (derived from running the model without DNA). In the
Hexapoda and Holometabola plots, the distributions order from left
to right is the prior, the normal, and uniform posterior. Both posterior
estimates are older than the normal prior. In the Chilopoda and
Pulmonata, distributions are, left to right, normal posterior, normal
prior, uniform posterior. The normal posterior estimate is younger than
both its prior and the lower limit of the uniform prior. x-axes show
the relevant range of time in millions of years, while the y-axis is a
normalized value for representing the peak and distribution each set
of posterior estimates.
skewed up against the lower bound, and the latter
2 similarly skewed to the upper bound (Fig. 3). In
sum, our data did significantly influence our posterior
estimates, suggesting our soft priors did not result in any
over parameterization bias in our posterior estimates
for other nodes in our tree (Heled and Drummond
2012).
Estimated Ages of Divergences in Arthropoda
The topological robustness and generally low level
of rate variation among the branches of our resolved
tree (Fig. 4) together provide an excellent foundation
for temporal estimation of the age of Arthropoda.
No maximal temporal constraints were set in order
to minimize bias in our estimate, with the oldest of
the 8 calibration points being from Middle Cambrian
at 516 Ma for Vericrustacea (a clade containing fairy
shrimp, copepods, barnacles and crabs). Based on
these constraints, our estimated age for the crown
group of Arthropoda was 706 Ma with a 95%
HPD of 631–787 Ma (Table 2). The next divergence
between Euchelicerata and Mandibulata is estimated
to have happened 675 Ma (95% HPD: 608–746
Ma), followed by the divergence between Myriapoda
and Pancrustacea at 639 Ma (95% HPD: 580–702 Ma).
Page: 99
93–109
[17:24 14/12/2012 Sysbio-sys074.tex]
Ixodes
—
631–787
631–787
631–787
631–787
631–787
631–787
631–787
631–787
631–787
631–787
631–787
631–787
631–787
Locusta
706
—
324–371
324–371
324–371
324–371
324–371
324–371
324–371
324–371
324–371
324–371
324–371
324–371
Pediculus
706
346
—
219–325
219–325
311–350
311–350
311–350
311–350
311–350
311–350
311–350
311–350
311–350
Rhodnius
706
346
274
—
163–292
311–350
311–350
311–350
311–350
311–350
311–350
311–350
311–350
311–350
Aphids
706
346
274
226
—
311–350
311–350
311–350
311–350
311–350
311–350
311–350
311–350
311–350
Wasps
706
346
330
330
330
—
292–325
292–325
292–325
292–325
292–325
292–325
292–325
292–325
Beetles
706
346
330
330
330
308
—
270–305
270–305
270–305
270–305
270–305
270–305
270–305
Choristoneura
706
346
330
330
330
308
288
—
132–188
132–188
249–285
249–285
249–285
249–285
Macromoths
706
346
330
330
330
308
288
160
—
114–167
249–285
249–285
249–285
249–285
Butterflies
706
346
330
330
330
308
288
160
140
—
249–285
249–285
249–285
249–285
Flies
706
346
330
330
330
308
288
267
267
267
—
174–227
210–244
210–244
Mosquitoes
706
346
330
330
330
308
288
267
267
267
201
—
210–244
210–244
Glossina
706
346
330
330
330
308
288
267
267
267
227
227
—
135–199
Ages of the last common ancestor between the species having genomic sequence and emerging genomic species (having transcriptomic sequences available)
Drosophila
706
346
330
330
330
308
288
267
267
267
227
227
168
—
Notes: Values in upper matrix are mean divergence estimates (Ma); values below are their 95% highest posterior densities. Lesser known species are described under the matrix.
Ixodes scapularis, dear tick; human pest (Arachnida; Ixodida) Locusta migratoria, migratory locust; agricultural pest (Insecta; Orthoptera) Pediculus humanus, head louse; human pest (Insecta;
Phthiraptera) Rhodnius prolixus, blood sucking bug; human pest (Insecta; Hemiptera) Choristoneura fumiferana, spruce budworm; agricultural pest (Insecta; Lepidoptera) Glossina morsitans,
tsetse flies; human pest (Insecta; Diptera)
Ixodes
Locusta
Pediculus
Rhodnius
Aphids
Wasps
Beetles
Choristoneura
Macromoths
Butterflies
Flies
Mosquitoes
Glossina
Drosophila
TABLE 3.
100
SYSTEMATIC BIOLOGY
VOL. 62
Page: 100
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
FIGURE 4. Variation in rates of molecular evolution across branches of
the reconstructed arthropod phylogeny. Posterior estimates of branch
rates vary between 0.0002 (dark blue) to 0.0013 (red) mutations per myr.
Euchelicerata is estimated to have begun diversifying
533 Ma (95% HPD: 476–592 Ma), while Myriapoda first
diversified 538 Ma (95% HPD: 469–614 Ma) and
Hexapoda at 432 Ma (95% HPD: 420–445 Ma).
Test of the Cambrian Fuse Length: “Short” Versus
“Long-Gradual”
Our estimates for the age of Arthropoda and
the branch lengths for the subsequent diversification
[17:24 14/12/2012 Sysbio-sys074.tex]
101
significantly reject alternative hypotheses in favor
of the “long-gradual fuse” scenario. The 95% HPD
for Arthropoda crown group is entirely within the
Cryogenian, with the 3 subsequent bifurcations leading
to the major lineages of arthropods having all, or nearly
all, of their 95% HDP entirely in the Ediacaran, with
mean values for these nodes spanning a range of nearly
150 myr (Fig. 4; Table 2). Given these findings, we used
simulations to explore how much confidence we can
place in this rejection of the “short-fuse” hypothesis,
given the possible violations of the underlying lineage
rate variation model. In other words, we investigated
how much confidence we could place in the rejection of
the “short-fuse” scenario under “worst case” scenarios
of our estimation method.
Analysis of our “short-fuse” and “long-gradual fuse”
simulations focused on the estimated age of Arthropoda,
assessing how close to the fossil Cambrian explosion
event it was located. For both simulations, the 95%
HPD always contained the simulated age of Arthropoda
and the width between the lower and upper 95%
HPD values increased significantly with higher levels
of autocorrelated rate variation (Fig. 5, Table S3). Such
positive effects of autocorrelated rate variation on the
95% HPD interval reflect increasing uncertainty in
posterior estimations as autocorrelated rate variance
increases, as previously observed (Drummond et al.
2006; Battistuzzi et al. 2010). However, this increase
in the 95% HPD interval was not random in our
simulations (Fig. 5). In the “short-fuse” scenario, only the
upper 95% HPD increased dramatically with increases
in autocorrelated rate variation, while the lower 95%
HPD remained very stable, which likely derives from
the calibration points in the younger areas of the
dataset (Fig. 5a). In the “long-gradual fuse” scenario,
a similar effect was observed although there was a
greater decrease in the lower 95% HPD at high levels of
autocorrelated rate variation (Fig. 5b). Importantly, while
our observed 95% HPD does not include the Cambrian
explosion event, the 95% HPD of the “short-fuse”
simulations always contain the Cambrian explosion
event, even at levels of autocorrelated rate variation
much higher than the rate variation in our data.
In contrast, simulations of the “long-gradual fuse”
scenario, which is based on the branch lengths of our
observed tree, only included the Cambrian explosion
event at levels of autocorrelated rate variation that are
much higher than our observed levels, and in this region
the 95% HPD interval is extremely wide reflecting the
large uncertainly in estimation.
In sum, our simulations have allowed us to assess the
potential negative effects of autocorrelated rate variation
on our analyses, as well as the potential limitations of our
fossil calibration points. We find our observed branch
length insights unaffected by autocorrelation at levels
less than and well above the observed autocorrelated
levels. Simulations indicate that a “short-fuse” Cambrian
explosion could be detected at levels of autocorrelated
rate variation nearly 4 times higher than observed levels.
Thus, we conclude that we were able to significantly
Page: 101
93–109
102
SYSTEMATIC BIOLOGY
VOL. 62
FIGURE 5.
Simulation results showing posterior age estimates and the 95% HPD for Arthropoda under alternative Cambrian evolution
scenarios. Each simulation result (mean (circle), lower (square) and upper (triangle) 95 % HPD) is derived from the combination of 10 independent
topological replicates at a given level of autocorrelated rate variation (except when rate = 0). a) Modeling of the “short-fuse” scenario has all
branch lengths <520 Ma and prior age constrains identical to our observed results. Branch lengths of basal lineages >520 Ma were modified to
conform to a Cambrian explosion event between 540 and 530 Ma (shown in gray on both panels). b) Simulation results using observed branch
lengths. Black filled symbols are results from the observed dataset. The x-axis shows increasing levels of autocorrelated rate variation among
branches of the resulting simulated analyses. See Figure S1 for schematic.
[17:24 14/12/2012 Sysbio-sys074.tex]
Page: 102
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
distinguish between the alternative “short-fuse” and
“long-gradual fuse” alternatives for the origin of the
Arthropods (Fig. 1), and find strong evidence rejecting
the “short-fuse” scenario (Fig. 5; Supplementary
Table S3).
DISCUSSION
Estimating times of divergence using molecular data
is becoming a routine part of studies in molecular
systematics and powerful algorithms designed to
perform these analyses continue to be developed.
However, the amount of data needed to estimate ancient
times of divergence within and between phyla is not
clear at the moment. Our empirical results complement
simulation results (Battistuzzi et al. 2010) in suggesting
that datasets comprising about 60 protein coding gene
regions with a wide distribution of fossil calibration
points are enough to generate meaningful estimates of
times of divergence within a phylum.
The Age of Arthropoda and Its Ancient Lineages
Our estimates of the times of the deeper divergences
in Arthropoda are younger than most previous attempts
to date Precambrian divergences using molecular data
and Bayesian relaxed clock methods (e.g., Pisani et al.
2004; Sanders and Lee 2010; Schaefer et al. 2010).
Results of the recent phylogenomic analysis (Rehm et al.
2011) are the most comparable to our results based
on taxon and character sampling, and we find that
some nodes are estimated to be younger and others
to be older in our study. We note that Rehm et al.
(2011) did not sample some key taxa, based their times
of divergence on rates of amino acid changes on a
fixed topology taken from Meusemann et al. (2010),
and were forced to place an age prior on the root
of the topology due to the algorithm they used. In
contrast, our study has a more inclusive taxon sampling
scheme for basal divergences in Arthropoda (based on
Regier et al. 2010), we estimate times of divergence
on rates of nucleotide changes while simultaneously
estimating topology, and we have no restrictions on the
age of the root. Importantly, our phylogenetic hypothesis
differs at some crucial nodes compared with that of
Meusemann et al. (2010). For instance, Meusemann et
al. (2010) sample only 2 Myriapoda and find them to
be sister to Pycnogonida+Euchelicerata. They also find
Xiphosura (1 sampled taxon) to be sister to Araneae (1
sampled taxon) and these 2 sister to Acari (9 sampled
taxa), i.e., horseshoe crabs are found to be within the
chelicerate clade. In contrast, Regier et al. (2010), based on
maximum likelihood and our study based on Bayesian
inference, find Myriapoda (11 sampled taxa) to be sister
to Pancrustacea with strong support and Xiphosura
(2 sampled taxa) to be sister to the chelicerate clade
(which includes 14 sampled taxa of mites, scorpions,
and spiders). We find Pycnogonida (5 sampled taxa)
[17:24 14/12/2012 Sysbio-sys074.tex]
103
to be sister to the rest of Arthropoda, whereas Regier
et al. (2010) have it as sister to Euchelicerata. The latter 2
relationships are weakly supported in all analyses.
Our results strongly suggest a Precambrian origin
for the arthropods, which is consistent with new
fossil finds indicating that spongiform animals had
already diverged from eumetazoans at 630 Ma (Maloof
et al. 2010). We estimate that the arthropods began
diversifying in the Precambrian ∼706 Ma (95% HPD:
631–787), while Rehm et al. (2011) estimate a much
younger age for the first split in Arthropoda at 562 Ma
(95% confidence interval: 523–640), a result that may be
influenced by their root priors.
The Paleozoic is an important era for the
diversification of lineages leading to present day
classes. Xiphosura (horseshoe crabs) diverged from
their sister group Arachnida (mites, scorpions, and
spiders) during the Middle Cambrian (Table 2, clade
Euchelicerata), as did lineages leading to modern
day classes of Myriapoda (centipedes and millipedes;
Table 2). The times of divergence of these clades are
not entirely comparable to Rehm et al. (2011), as their
phylogenetic hypothesis is quite different at these
nodes, but in general, their times are fairly similar to our
estimates. For instance, their estimate for the divergence
of centipedes from millipedes is almost identical to
ours, placing it in the late Cambrian.
Early divergences in Pancrustacea (crabs, waterfleas,
and insects) appear to have happened just prior to
the Cambrian, although Cambrian divergences are
not ruled out (Table 2). Rehm et al. (2011) do not
sample any Oligostraca, thus they are likely missing the
earliest divergence in Pancrustacea. Within Pancrustacea
our results suggest that the clade Oligostraca needs
attention, as Ostracoda is not monophyletic with regard
to Mystacocarida, Pentastomida, and Branchiura. This
is in contrast to Regier et al. (2010), who recovered
Ostracoda as a monophyletic entity, although with
little support. The divergence between Branchiura and
Pentastomida is estimated to be in the Permian (Table 2,
clade Ichthyostraca), ∼200 myr younger than the recent
estimate of Sanders and Lee (2010), although of note
here is that the relationships are very different for
this region of the topology compared with the latter
study. Both Branchiura and Pentastomida are parasitic
and have highly modified morphologies, thus the
fossils attributed to Pentastomida from the Cambrian
(Waloszek et al. 2006) may represent a stem group of
the common ancestor of Branchiura and Pentastomida,
which we estimate to have diverged from ostracodans in
the early Ordovician or even the late Cambrian (445 Ma,
95% HPD: 362–526).
Our taxon sampling scheme covers a large number
of insect lineages including all of the early divergences
(Grimaldi and Engel 2005). The initial divergence
between Entognatha (springtails) and Insecta (insects)
appears to have happened in the Silurian (Table 2,
clade Hexapoda), and the early divergences in Insecta
during the Devonian (Fig. 2), leading to Archaeognatha,
Zygentoma, and Pterygota (winged insects). Pterygota
Page: 103
93–109
104
SYSTEMATIC BIOLOGY
diversified during the late Devonian and early
Carboniferous (Fig. 2). This is in stark contrast to Rehm
et al. (2011), who found that Pterygota diversified in
the late Silurian or early Devonian, despite giving
this node a minimum age in the late Carboniferous.
Similarly, we find that the most diverse extant group,
Holometabola, diversified into the extant orders during
the late Carboniferous and early Permian (Fig. 2),
whereas Rehm et al. (2011) suggest that they did so
much earlier in the late Devonian. As in Regier et al.
(2010), our analysis supports the Paleoptera hypothesis
of Ephemeroptera (mayflies) being the sister group of
Odonata (dragonflies), and the estimated ages of these
groups as early Carboniferous (Table 2) is in accordance
with the fossil record (Grimaldi and Engel 2005). Again,
Rehm et al. (2011) suggest that this group is much
older. Our estimate of the age of the divergence of
Dermaptera (earwigs) from the common ancestor of
Orthoptera (grasshoppers) and Blattodea (cockroaches)
is somewhat younger (early Permian; Table 2) than
the fossil record would suggest (early Carboniferous),
although our credibility interval for the divergence does
encompass the early Carboniferous. We estimate that
Phthiraptera (lice) diverged from Hemiptera (aphids,
bugs) in the early Permian (Table 2) and the Hemiptera
diverged into Sternorrhyncha (aphids, white flies, scale
insects) and Heteroptera (true bugs) in the early Triassic
(Fig. 2). The latter is in line with fossil evidence (Grimaldi
and Engel 2005).
The Colonization of Land and the Origins of Flight:
Insights from Molecular Data
Our estimated times of divergence provide insights
into the 3 independent colonizations of land by
arthropods (Fig. 6). The first lineage likely to have to
colonized land was the common ancestor to myriapods
in the early Cambrian (ca. 538 Ma), although the
95% HPD does stretch into the Ordovician (Table 2).
Colonization by the chelicerate clade Arachnida appears
to have occurred during the Cambrian or early
Ordovician (Fig. 6, Table 2). These estimates for
Arachnida are very close to the proposed ages for the
fossilized tracks of potential chelicerates at ∼490 Ma
(MacNaughton et al. 2002). A recent study on mite
evolution and their colonization of land based on limited
sampling of only one gene (Schaefer et al. 2010) suggests
that the common ancestor of mites existed some 570 Ma,
which is older than our estimate of the colonization of
land by the common ancestor of all arachnids (to which
mites belong). The hexapodans appear to have colonized
land later, during the Ordovician or Silurian (Fig. 6,
Table 2), although our estimate is earlier than the earliest
fossil hexapodans from the early Devonian (Grimaldi
and Engel 2005). The crown clade of Hexapoda was one
of our calibrated nodes, and the posterior estimates of
the time of the first divergence are essentially the same,
but somewhat older than the prior used to calibrate
the node. Interestingly, a fossil insect believed to be a
[17:24 14/12/2012 Sysbio-sys074.tex]
VOL. 62
pterygote has been recorded from the early Devonian
(Engel and Grimaldi 2004), suggesting that the age of
insects in general may go back to the Silurian or indeed
the Ordovician.
Given the overlap in the 95% HPDs among the
myriapod and chelicerate lineages, we cannot exclude
the possibility that the 2 colonizations happened at
about the same time, although this appears unlikely.
Rather, based on our mean estimates of the ages of the
crown clades, myriapods appear to have left the aquatic
life before plants, because fossil evidence for the first
terrestrial plants appears roughly 60 myr later at 475 Ma
(Gray 1985; Wellman et al. 2003). Arachnida also appears
earlier than these plants at 501 Ma. Hexapods on the
other hand appeared after the terrestrial plant fossils in
the late Ordovician at around 433 Ma (Wellman et al.
2003; Gensel 2008).
Grimaldi and Engel (Grimaldi and Engel 2005) state
that “[h]ow, when, and why insect wings originated is
one of the most perplexing conundrums in evolution”
(p. 158). Insect flight, which attains some of the highest
mass-specific aerobic metabolic rates known to science
(Sacktor 1976), has been hypothesized to have evolved
during a period of hyperoxic conditions from the Late
Devonian to the Late Carboniferous [ca. 375–250 Ma;
(Dudley 1998). This period is also known for insect
gigantism (Briggs 1985), even among flying insects
(Carpenter 1992). The higher O2 conditions during this
period, indicated in the geochemical record (Berner
1999), are argued to have facilitated the evolution of
flight by simultaneously providing higher amounts
of O2 for passive uptake by insects and increasing
air density for greater lift (Dudley 1998). Although
suggestive, molecular data are needed to provide
independent confirmation of whether the origins of
the Pterygota occurred during these exceptionally high
ancient atmospheric oxygen levels or predated this
event.
Based on our results, it is clear that the evolution
of insect wings happened much earlier than the fossil
record would suggest and led to a relatively rapid
radiation of insects (Fig. 7). The ancestor to all Pterygota
diverged from the common ancestor of Zygentoma
(silverfish) in the late Devonian (ca. 384 Ma). The putative
Devonian pterygote fossil (Engel and Grimaldi 2004)
is very much inline with our estimate of the origin of
Pterygota. The initial divergence was followed by a series
of divergences leading to clades with large numbers
of species, i.e., the lineages leading to Paleoptera (367
Ma), Polyneoptera (346 Ma), Paraneoptera (330 Ma),
Hymenoptera (308 Ma), Coleoptera (288 Ma), and finally
Lepidoptera and Diptera (267 Ma). Within 120 myr, the
basis for today’s incredible diversity of flying insects was
established. Intriguingly, the first 100 myr of this period
coincides with the period of increasing atmospheric
oxygen levels (Berner 1999), which began rising in the
late Devonian and peaked at about the same time as the
major holometabolan lineages diverged from each other
in the early Permian. This rise in atmospheric oxygen
is attributed to the appearance and spread of large
Page: 104
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
105
FIGURE 6. Multiple independent land invasions of the Arthropods. Clades that are aquatic are dark gray, while clades that are terrestrial
are diagonal line filled. Gray shaded areas give 3 the intervals during which colonization of land may have happened (not taking 95% HPD
intervals into account), except in the case of Malacostraca, where the lower limit of the colonization is not possible to ascertain based on the
taxon sampling.
and woody vascular plants, resulting in an increased
burial of organic carbon that in turn formed the most
abundant coal deposits in Earth’s history, from which
the time period’s name Carboniferous derives (Berner
1999). However, although flight clearly did evolve during
a period of increasing oxygen levels and impressively
large forests, the initial steps in the evolution of flight
are primarily associated with the Devonian just prior
to Romer’s Gap, when atmospheric O2 levels were
relatively low although fossils of large forests were well
established (Willis and McElwain 2002; Ward et al. 2006).
Dating with Confidence
Our findings suggest that in order to date with
confidence (Drummond et al. 2006), size does matter.
The necessity of multiple calibration points has been
discussed by a number of authors (Hug and Roger 2007;
Ho and Phillips 2009), but the size of a dataset, either
number of taxa or number of characters sampled, has
not received as much attention in studies estimating
times of divergence (Wertheim and Sanderson 2011). As
pointed out in the Introduction section, most previous
studies have sampled either very few taxa and many gene
regions or many taxa and few gene regions. Typical of
early studies based on few taxa and few gene regions
were much older estimates of divergence times than one
[17:24 14/12/2012 Sysbio-sys074.tex]
would expect based on the fossil record. Increasing the
number of gene regions has not appeared to alleviate
this problem (Sanders and Lee 2010), unless very strong
priors are used to calibrate the tree (Peterson et al. 2008),
which may decrease informative input from the data
itself (Sanders and Lee 2010). Increasing the number
of taxa and relaxing prior constraints in order to get a
better representation of the basal, ancient divergences
does appear to help in arriving at more realistic estimates
of when they have happened (e.g., Wahlberg et al. 2009;
Bell et al. 2010).
Several previous studies that have estimated the age
of various clades within the arthropods, which, while
using somewhat similar analysis methods to our own,
had much smaller phylogenetic breadth, taxonomic
depth, and gene sampling (see Introduction section).
The exception to this is the recent study based on EST
libraries (Rehm et al. 2011). The study by Rehm et al.
(2011) differs in several respects to our study, as we
have noted above. Our results conflict with theirs in
several important respects, our estimate for the age of
the basal split in Arthropoda is much older than theirs
(ca. 700 vs. ca. 560 Ma) and our estimates for the ages
of the basal splits in insects are much younger than
theirs (between 50 and 100 myr younger). However, both
studies do suggest that arthropods began diversifying
in the Precambrian and our simulations suggest that
this is a robust result. These age discrepancies for the
Page: 105
93–109
106
SYSTEMATIC BIOLOGY
VOL. 62
FIGURE 7. The origins and evolution of insect flight. The Insecta chronogram is taken from Fig. 2. The evolution of oxygen content of the
atmosphere over time is shown below the chronogram and is based on (Berner 1999). The darker shaded area gives the interval during which
major lineages of flying insects diverged.
major insect lineages are probably explained by the way
in which calibrations were implemented. Rehm et al.
(2011) appear to have used hard minimum bounds with
no maximum limits (either soft or hard), except at one
node describing the age of the first split in Diptera, which
has both a hard maximum and a hard minimum bound.
Compared with the use of these hard bounds, we feel
[17:24 14/12/2012 Sysbio-sys074.tex]
the use of soft bounds with mean ages provides for
more interaction between our data and uncertainty in
the calibration points during analysis. However, given
the exceptional taxonomic diversity of the Arthropoda,
estimations of the age of this clade will always be
based on an extremely limited subset of taxa. Whether
larger datasets of taxa and genes continue to disagree or
Page: 106
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
converge upon consensus remains to be determined in
the coming years.
Substantial changes in substitution rates along
branches could, in theory, complicate our ability to
date with confidence, especially when rate changes are
autocorrelated (Drummond et al. 2006; Battistuzzi et al.
2010). Using simulation analyses, we directly addressed
the potential negative effects of such variation across
a range of rate changes, from lower to much higher
than observed levels (Fig. 5). We found no effect on
the ability to detect a “short-fuse” Cambrian explosion
with increasing levels of autocorrelated rate variation,
with our observed results being significantly older than
the Cambrian explosion. Exploration of similar levels
of autocorrelated rate variation in our observed dataset,
which is consistent with a “long-fuse, gradual” scenario
of arthropod evolution, also finds no significant overlap
with the Cambrian explosion unless autocorrelated rates
are simulated at much higher levels.
Dating complications could also arise should rate
changes be associated with basal nodes close to the
Cambrian explosion. Indeed, previous study reported
an increase in substitution rate among the basal branches
of Arthropoda (Aris-Brosou and Yang 2002; Aris-Brosou
and Yang 2003). However, these increases in rate appear
to have been spurious results (Ho et al. 2005). Our
findings suggest that genome-wide rate variation across
the arthropod tree is low when averaged across a large
phylogenomic dataset (Fig. 4), especially among the
basal branches of the tree. However, previous study
using a 5-taxon analysis of genomic data did identify
an increased rate of molecular evolution for an internal
region of the arthropod phylogeny, in the branches
between Coleoptera and Diptera (Savard et al. 2006). Our
results agree. The increased taxonomic sampling of our
dataset localizes this rate increase to the basal branches of
insects with a significant increase in the lineages leading
to Diptera and Lepidoptera (Fig. 4). Further detailed
sampling coupled with additional fossil data is needed
to more accurately resolve the relationship between
divergence timing and evolutionary rates among these
ancestral branches.
A final concern is that of the age constraint
structure and fossil placements when estimating times
of divergence (Hug and Roger 2007; Ho and Phillips
2009), both of which can have large effects on results.
Our use of multiple true soft constraints, relatively
evenly spread throughout the topology and modeled
as normal distributions, has allowed us to observe how
these constraints affect the posterior results. Looking
at the individual nodes with the prior constraints,
we find no systematic bias in the effects, with some
posterior estimates of ages being older than the
priors, some being similar and some being younger
than our imposed priors (Fig. 3). By allowing such
shifts by using soft constraints, we feel, as others
do, that it provides a more robust analysis of the
data (Sanders and Lee 2007; Ho and Phillips 2009).
Moreover, by including all of these constraints in our
simulation analyses, we have simultaneously assessed
[17:24 14/12/2012 Sysbio-sys074.tex]
107
the interaction of these constraints with various amounts
of autocorrelated rate change, finding them sufficient for
our temporal investigations.
Given these findings, should the arthropods have
arisen quickly under a “short-fuse” Cambrian explosion
scenario, our analyses would have detected this event
even under some “worst case” scenarios of molecular
evolution. Complimentary to this, our observed estimate
of the crown age of Arthropoda was significantly older
than the Cambrian explosion event (Fig. 5a). In sum,
our observations and simulations are consistent with
a “long-fuse” scenario of gradual evolution of the
Arthropoda during the Precambrian, possibly beginning
in the Cryogenian.
CONCLUSIONS
The results presented here provide estimates of
times of divergence in the megadiverse phylum of
Arthropoda using what we believe to be the most robust
estimation methods available. In addition, we explicitly
test the Cambrian explosion hypothesis using these
methods on simulated data. Our molecular-based results
provide independent temporal estimates for the study
of macroevolutionary events that are complementary to
fossil data. Knowing the age of the arthropods, as well
as when subsequent major lineages appeared, provides
a powerful tool for studying macroevolutionary events
fundamental to our understanding of evolution. Recent
fossil finds from the Ediacaran suggest that Metazoans
are older than previously thought (Maloof et al. 2010;
Yuan et al. 2011), and such discovery is an ongoing
process that has continually pushed Metazoan origins
deeper in time. This growing body of empirical data
is concordant with our results in suggesting that
conclusions based on the absence of data, such as the
paucity of fossils in the Ediacaran, may be substantially
revised over time with new fossil finds.
SUPPLEMENTARY MATERIAL
Supplementary material, including data files
and/or online-only appendices, can be found in
the Dryad data repository at http://datadryad.org,
doi:10.5061/dryad.3r4j45d2.
FUNDING
Funding for this work was provided by the Academy
of Finland (grant numbers 131155, 129811).
ACKNOWLEDGEMENTS
A special thank you to R. Robertson and the excellent
guides at the Burgess Shale Geoscience Foundation for a
tour of the Walcott Quary, which inspired much of this
work. We thank Simon Ho for help with his program
Page: 107
93–109
108
SYSTEMATIC BIOLOGY
RateEvolver, and Joona Lehtomäki and Jussi NoksoKoivisto for help with Python. We are also grateful for
access to the resources of CSC -IT Center for Science,
Finland, which were used to perform the Bayesian
analyses. We thank associate editor Brian Wiegmann,
Karl Kjer and an anonymous reviewer for constructive
comments on previous versions of this article.
REFERENCES
Arango C.P., Wheeler W.C. 2007. Phylogeny of the sea spiders
(Arthropoda, Pycnogonida) based on direct optimization of six loci
and morphology. Cladistics. 23:255–293.
Aris-Brosou S., Yang Z. 2002. Effects of models of rate evolution
on estimation of divergence dates with special reference to the
metazoan 18S ribosomal RNA phylogeny. Syst. Biol. 51:703–714.
Aris-Brosou S., Yang Z. 2003. Bayesian models of episodic evolution
support a late precambrian explosive diversification of the Metazoa.
Mol. Biol. Evol. 20:1947–1954.
Averof M., Akam M. 1995. Insect crustacean relationships—insights
from comparative developmental and molecular studies. Philos.
Trans. Roy. Soc. B.–Biol. Sci. 347:293–303.
Battistuzzi F.U., Filipski A., Hedges S.B., Kumar S. 2010. Performance of
relaxed-clock methods in estimating evolutionary divergence times
and their credibility intervals. Mol. Biol. Evol. 27:1289–1300.
Bell C.D., Soltis D.E., Soltis P.S. 2010. The age and diversification of the
angiosperms re-revisited. Am. J. Bot. 97:1296–1303.
Benton M.J., Donoghue P.C. 2007. Paleontological evidence to date the
tree of life. Mol. Biol. Evol. 24:26–53.
Berner R.A. 1999. Atmospheric oxygen over Phanerozoic time. Proc.
Natl. Acad. Sci. 96:10955–10957.
Blair JE, Hedges SB. 2005. Molecular clocks do not support the
Cambrian explosion. Mol. Biol. Evol. 22:387–390.
Briggs D., Fortey R.A. 2005. Wonderful strife: systematics,
stemgroups, and the phylogenetic signal of the Cambrian radiation.
Paleobiology. 31:94–112.
Briggs D.E.G. 1985. Gigantism in Palaeozoic arthropods. Special Papers
in Palaeontol. 33:157.
Briggs D.E.G., Sutton M.D., Siveter D.J., Siveter D.J. 2005.
Metamorphosis in a Silurian barnacle. Proc. Roy. Soc. B. 272:
2365–2369.
Bromham L. 2006. Molecular dates for the Cambrian explosion.
Palaeontol. Electron. 9:3.
Budd G.E. 2008. The earliest fossil record of the animals and its
significance. Philos. Trans. Roy. Soc. London Series B, Biol. Sci.
363:1425–1434.
Budd G.E., Telford M.J. 2009. The origin and evolution of arthropods.
Nature. 457:812–817.
Burge C., Karlin S. 1997. Prediction of Complete Gene Structures in
Human Genomic DNA. J. Mol. Biol. 268:78–94.
Carpenter F.M. 1992. Treatise on invertebrate paleontology, Part R,
Arthropoda 4. Lawrence, University of Kansas Press.
Consortium D.G. 2007. Evolution of genes and genomes on the
Drosophila phylogeny. Nature. 450:203–218.
Conway Morris S. 2006. Darwin’s dilemma: the realities of the
Cambrian ’explosion’. Philos. Trans. Roy. Soc. London Series B, Biol.
Sci. 361:1069–1083.
Drummond A., Ho S.Y., Phillips M., Rambaut A. 2006. Relaxed
phylogenetics and dating with confidence. PLoS Biol. 4:e88.
Drummond A., Rambaut A. 2007. BEAST: Bayesian evolutionary
analysis by sampling trees. BMC Evol. Biol. 7:214.
Dudley R. 1998. Atmospheric oxygen, giant Paleozoic insects and
the evolution of aerial locomotor performance. J. Exp. Biol. 201:
1043–1050.
Dunlop J.A., Tetlie O.E., Lorenzo P. 2008. Reinterpretation of the
Silurian scorpion Proscorpius osborni (Whitfield): Integrating data
from Palaeozoic and recent scorpions. Palaeontology. 51:303–320.
Edgecombe G.D., Giribet G. 2007. Evolutionary biology of centipedes
(Myriapoda: Chilopoda). Ann. Rev. Entomol. 52:151–170.
[17:24 14/12/2012 Sysbio-sys074.tex]
VOL. 62
Engel M.S., Grimaldi D.A. 2004. New light shed on the oldest insect.
Nature. 427:627–630.
Felsenstein J. 1978. Cases in which parsimony or compatibility methods
will be positively misleading. Syst. Zool. 27:401–410.
Gatesy J., DeSalle R., Wahlberg N. 2007. How many genes should
a systematist sample? Conflicting insights from a phylogenomic
matrix characterized by replicated incongruence. Syst. Biol. 56:
355–363.
Gensel P.G. 2008. The earliest land plants. Ann. Rev. Ecol. Evol.
Syst. 39.
Giribet G., Edgecombe G.D., Wheeler W.C. 2001. Arthropod phylogeny
based on eight molecular loci and morphology. Nature. 413:
157–161.
Gould S.J. 1989. Wonderful life: the Burgess Shale and the nature of
history. New York, Norton.
Gray J. 1985. The microfossil record of early land plants: advances in
understanding of early terrestrialization, 1970–1984. Philos. Trans.
Roy. Soc. B-Biol. Sci. 309:167–195.
Grimaldi D.A., Engel M.S. 2005. Evolution of the insects. Cambridge,
Cambridge University Press.
Heled J., Drummond A.J. 2012. Calibrated tree priors for relaxed
phylogenetics and divergence time estimation. Syst. Biol. 61:
138–149.
Ho S.Y., Phillips M., Drummond A., Cooper A. 2005. Accuracy of rate
estimation using relaxed-clock models with a critical focus on the
early metazoan radiation. Mol. Biol. Evol. 22:1355–1363.
Ho S.Y., Phillips M.J. 2009. Accounting for calibration uncertainty in
phylogenetic estimation of evolutionary divergence times. Syst. Biol.
58:367–380.
Hug L.A., Roger A.J. 2007. The impact of fossils and taxon
sampling on ancient molecular dating analyses. Mol. Biol. Evol. 24:
1889–1897.
Jeram A.J., Selden P.A., Edwards D. 1990. Land animals in the Silurian:
arachnids and myriapods from Shropshire, England. Science.
250:658–661.
Krzeminski W., Krzeminski E., Papier F. 1994. Grauvogelia
arzvilleriana sp. n.—the oldest Diptera species (Lower/Middle
Triassic of France). Acta Zool. Cracov. 37:95–99.
Labandeira C.C. 2005. Invasion of the continents: cyanobacterial
crusts to tree-inhabiting arthropods. Trends Ecol. Evol. 20:
253–262.
Labandeira C.C., Phillips T.L. 1996. A Carboniferous insect gall: Insight
into early ecologic history of the Holometabola. Proc. Natl. Acad.
Sci. USA. 93:8470–8474.
MacNaughton R.B., Cole J.M., Dalrymple R.W., Braddy S.J., Briggs
D.E.G., Lukie T.D. 2002. First steps on land: Arthropod trackways
in Cambrian-Ordovician eolian sandstone, southeastern Ontario,
Canada. Geology. 30:391–394.
Mallatt J.M., Garey J.R., Shultz J.W. 2004. Ecdysozoan phylogeny
and Bayesian inference: first use of nearly complete 28S and 18S
rRNA gene sequences to classify the arthropods and their kin. Mol.
Phylogenet. Evol. 31:178–191.
Maloof A.C., Rose C.V., Beach R., Samuels B.M., Calmet C.C., Erwin
D.H., Poirier G.R., Yao N., Simons F.J. 2010. Possible animal-body
fossils in pre-Marinoan limestones from South Australia. Nature
Geosci. advance online publication SP - EP -.
Mayhew PJ. 2003. A tale of two analyses: estimating the consequences
of shifts in hexapod diversification. Biol. J. Linn. Soc. 80:23–36.
Meusemann K., von Reumont B.M., Simon S., Roeding F., Strauss S.,
Kuck P., Ebersberger I., Walzl M., Pass G.,
√Breuers S., Achter V., Von
Haeseler A., Burmester T., Hadrys H., W §gele J.W., Misof B. 2010a.
A phylogenomic approach to resolve the arthropod tree of life. Mol.
Biol. Evol. 27:2451–2464.
Meusemann K., Von Reumont B.M., Simon S., Roeding F., Strauss S.,
Kuck P., Ebersberger I., Walzl M., Pass G., Breuers S., Achter V., Von
Haeseler A., Burmester T., Hadrys H., Wagele J.W., Misof B. 2010b.
A phylogenomic approach to resolve the arthropod tree of life. Mol.
Biol. Evol. 27:2451–2464.
Møller O.S., Olsen J., Avenant-Oldewage A., Thomsen P.F., Glenner
H. 2008. First maxillae suction discs in Branchiura (Crustacea):
development and evolution in light of the first molecular phylogeny
of Branchiura, Pentastomida, and other ‘Maxillopoda’ Arthropod
Struct. Develop. 37:333–346.
Page: 108
93–109
2013
WHEAT AND WAHLBERG—DIVERGENCE TIME ESTIMATES FOR ARTHROPODA
Mutanen M., Wahlberg N., Kaila L. 2010. Comprehensive gene
and taxon coverage elucidates radiation patterns in moths and
butterflies. Proc. Biol. Sci. Roy. Soc. 277:2839–2848.
Olesen J. 2004. On the ontogeny of the Branchiopoda (Crustacea):
contribution of development to phylogeny and classification.
In: Scholtz G., editor. Evolutionary Developmental Biology of
Crustacea. Netherlands, Swets and Zeitlinger B. V., p. 217–271.
Peterson K.J., Cotton J.A., Gehling J.G., Pisani D. 2008. The Ediacaran
emergence of bilaterians: congruence between the genetic and the
geological fossil records. Philos. Trans. Roy. Soc. London Series B,
Biol. Sci. 363:1435–1443.
Peterson K.J., Lyons J.B., Nowak K.S., Takacs C.M., Wargo M.J., McPeek
M.A. 2004. Estimating metazoan divergence times with a molecular
clock. Proc. Natl. Acad. Sci. USA. 101:6536–6541.
Pisani D., Poling L.L., Lyons-Weiler M., Hedges S.B. 2004. The
colonization of land by animals: molecular phylogeny and
divergence times among arthropods. BMC Biol. 2:1.
Posada D., Crandall K.A. 1998. MODELTEST: testing the model of DNA
substitution. Bioinformatics. 14:817–818.
Rambaut A., Grassly N.C. 1997. Seq-Gen: An application for the Monte
Carlo simulation of DNA sequence evolution along phylogenetic
trees. Comp. Appl. Biosci.: CABIOS. 13:235–238.
Regier J.C., Shultz J.W., Ganley A.R., Hussey A., Shi D., Ball B.,
Zwick A., Stajich J.E., Cummings M.P., Martin J.W., Cunningham
C.W. 2008. Resolving arthropod phylogeny: exploring phylogenetic
signal within 41 kb of protein-coding nuclear gene sequence. Syst.
Biol. 57:920–938.
Regier J.C., Shultz J.W., Zwick A., Hussey A., Ball B., Wetzer R., Martin
J.W., Cunningham C.W. 2010. Arthropod relationships revealed
by phylogenomic analysis of nuclear protein-coding sequences.
Nature. 463:1079–1083.
Rehm P., Borner J., Meusemann K., von Reumont B.M., Simon S.,
Hadrys H., Misof B., Burmester T. 2011. Dating the arthropod tree
based on large-scale transcriptome data. Mol. Phylogenet. Evol.
61:880–887.
Sacktor B. 1976. Biochemical adaptations for flight in the insect.
Biochem. Soc. Symposia. 41:111–131.
Sanders K.L., Lee M.S. 2007. Evaluating molecular clock calibrations
using Bayesian analyses with soft and hard bounds. Biol. Lett. 3:
275–279.
Sanders K.L., Lee M.S.Y. 2010. Arthropod molecular divergence times
and the Cambrian origin of pentastomids. Syst. Biodivers. 8:
63–74.
Sanderson M.J., McMahon M.M., Steel M. 2010. Phylogenomics with
incomplete taxon coverage: the limits to inference. BMC Evol. Biol.
10:155.
Savard J., Tautz D., Lercher M.J. 2006. Genome-wide acceleration of
protein evolution in flies (Diptera). BMC Evol. Biol. 6:7.
Savard J., Tautz D., Richards S., Weinstock G.M., Gibbs R.A., Werren
J.H., Tettelin H., Lercher M.J. 2006. Phylogenomic analysis reveals
bees and wasps (Hymenoptera) at the base of the radiation of
Holometabolous insects. Genome Res. 16:1334–1338.
[17:24 14/12/2012 Sysbio-sys074.tex]
109
Schaefer I., Norton R.A., Scheu S., Maraun M. 2010. Arthropod
colonization of land—Linking molecules and fossils in oribatid
mites (Acari, Oribatida). Mol. Phylogenet. Evol. 57:113–121.
Schubart C.D., Diesel R., Hedges S.B. 1998. Rapid evolution of
terrestrial life in Jamaican crabs. Nature. 393:363–365.
Simon S., Strauss S., Von Haeseler A., Hadrys H. 2009. A phylogenomic
approach to resolve the basal pterygote divergence. Mol. Biol. Evol.
26:2719–2730.
Soltis D., Albert V.A., Savolainen V., Hilu K., Qiu Y.L., Chase
M.W., Farris J.S., Stefanovif á S., Rice D.W., Palmer J.D., Soltis P.
2004. Genome-scale data, angiosperm relationships, and "ending
incongruence": A cautionary tale in phylogenetics. Trends Plant Sci.
9:477–483.
Timmermans M.J., Roelofs D., Mariën J., van Straalen N.M. 2008.
Revealing pancrustacean relationships: phylogenetic analysis of
ribosomal protein genes places Collembola (springtails) in a
monophyletic Hexapoda and reinforces the discrepancy between
mitochondrial and nuclear DNA markers. BMC Evol. Biol. 8:83.
Wahlberg N., Leneveu J., Kodandaramaiah U., Pena C., Nylin S.,
Freitas A.V., Brower A.V. 2009. Nymphalid butterflies diversify
following near demise at the Cretaceous/Tertiary boundary. Proc.
Biol. Sci./Roy. Soc. 276:4295–4302.
Waloszek D., Repetski J.E., Mass A. 2006. A new Late Cambrian
pentastomid and a review of the relationships of this
parasitic group. Trans. Roy. Soc. Edinburgh, Earth Sci. 96:
163–176.
Ward P., Labandeira C., Laurin M., Berner R.A. 2006. Confirmation of
Romer’s Gap as a low oxygen interval constraining the timing of
initial arthropod and vertebrate terrestrialization. Proc. Natl. Acad.
Sci. USA. 103:16818–16822.
Welch J.J., Bromham L. 2005. Molecular dating when rates vary. Trends
Ecol. Evol. (Personal edition). 20:320–327.
Welch J.J., Fontanillas E., Bromham L. 2005. Molecular dates for the
"Cambrian explosion": the influence of prior assumptions. Syst. Biol.
54:672–678.
Wellman C.H., Osterloff P.L., Mohluddin U. 2003. Fragments of the
earliest land plants. Nature. 425:282–285.
Wertheim J.O., Sanderson M.J. 2011. Estimating diversification rates:
how useful are divergence times? Evolution. 65:309–320.
Wiegmann B., Trautwein M.D., Kim J., Cassel B.K., Bertone M.A.,
Winterton S.L., Yeates D. 2009. Single-copy nuclear genes resolve
the phylogeny of the holometabolous insects. BMC Biology. 7:34.
Wildish D.J. 1988. Ecology and natural history of aquatic Talitroidea.
Can. J. Zool. 66:2340–2359.
Willis K.J., McElwain J.C. 2002. The evolution of plants. Oxford, Oxford
University Press.
Yuan X., Chen Z., Xiao S., Zhou C., Hua H. 2011. An early Ediacaran
assemblage of macroscopic and morphologically differentiated
eukaryotes. Nature. 470:390–393.
Zhang X., Siveter D., Waloszek D., Maas A. 2007. An epipoditebearing crown-group crustacean from the Lower Cambrian. Nature.
449:595–598.
Page: 109
93–109