Full Text - Systematic Biology

Syst. Biol. 57(6):891–904, 2008
c Society of Systematic Biologists
Copyright ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150802570809
The Modified Gap Excess Ratio (GER*) and the Stratigraphic Congruence
of Dinosaur Phylogenies
M ATTHEW A. WILLS ,1 PAUL M. B ARRETT ,2 AND J ULIA F. HEATHCOTE3
2
1
Department of Biology and Biochemistry, The University of Bath, The Avenue, Claverton Down, Bath BA2 7AY, UK; E-mail: [email protected]
Department of Palaeontology, The Natural History Museum, Cromwell Road, South Kensington, London SW7 5BD, UK; E-mail: [email protected]
3
Department of Earth Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, UK
Abstract.— Palaeontologists routinely map their cladograms onto what is known of the fossil record. Where sister taxa first
appear as fossils at different times, a ghost range is inferred to bridge the gap between these dates. Some measure of the
total extent of ghost ranges across the tree underlies several indices of cladistic/stratigraphic congruence. We investigate
this congruence for 19 independent, published cladograms of major dinosaur groups and report exceptional agreement
between the phylogenetic and stratigraphic patterns, evidenced by sums of ghost ranges near the theoretical minima. This
implies that both phylogenetic and stratigraphic data reflect faithfully the evolutionary history of dinosaurs, at least for
the taxa included in this study. We formally propose modifications to an existing index of congruence (the gap excess
ratio; GER), designed to remove a bias in the range of values possible with trees of different shapes. We also propose a
more informative index of congruence—GER∗ —that takes account of the underlying distribution of sums of ghost ranges
possible when permuting stratigraphic range data across the tree. Finally, we incorporate data on the range of possible
first occurrence dates into our estimates of congruence, extending a procedure originally implemented with the modified
Manhattan stratigraphic measure and GER to our new indices. Most dinosaur data sets maintain extremely high congruence
despite such modifications. [Dinosaurs; fossil record; gap excess ratio; phylogeny; randomization; stratigraphic congruence.]
Evidence for the evolutionary history of most groups
derives from two independent sources. The first is the
distribution of phylogenetically informative characters
or markers in extant and extinct taxa. The second is the
stratigraphic or temporal sequence in which taxa occur
as fossils. Neither source of data can be read uncritically,
and both require interpretation. Phylogenies incorporate
assumptions concerning rooting and models of evolution. The resulting trees are therefore inferences rather
than data. Fossils require varying degrees of interpretation depending upon the nature of the material, and dates
may be subject to large margins of error. For these reasons, it is often desirable to compare inferences by mapping cladograms onto stratigraphic range charts (Norell
and Novacek, 1992; Benton and Hitchin, 1997; Clyde and
Fisher, 1997; Wagner and Sidor, 2000; Angielczyk and
Fox, 2006; Pol and Norell, 2006). Where the phylogenetic
and temporal inferences are concordant, they offer mutual corroboration, and both can be assumed reasonably
to reflect the true, underlying evolution of the group.
However, where the order of phylogenetic branching
conflicts with the stratigraphic order, additional evidence is needed to determine which most closely reflects reality (Wills, 2007). Indications of problems with
the phylogeny include the availability of multiple, conflicting cladograms for the same group in the literature
(Wills, 1999, 2001) and poorly supported trees that are
liable to radical branch shifts with minor perturbations
of the data (Cobbett et al., 2007). Indications of problems with the stratigraphic data include a particularly
patchy and fragmentary fossil record (e.g., Crampton
et al., 2003; Smith, 2003), with dates of first occurrences
that have been particularly labile with research time and
effort (e.g., Benton, 2001; Wills, 2002).
As well as focusing on particular cladograms and
specific groups, previous comparisons between phylogenetic and stratigraphic inferences have investigated
large samples of trees, thereby attempting to make generalizations about patterns of congruence across taxa
or through time. Some groups (e.g., tetrapods and fish;
Benton and Hitchin, 1997; Hitchin and Benton, 1997a;
Wills, 2001) match the fossil record comparatively well,
whereas others (e.g., arthropods: Wills, 2001) match
the record comparatively poorly. Cladograms for some
groups—notably the crustaceans (Wills, 1998)—are actually less congruent with the fossil record than with a
significant majority (>99%) of randomly permuted range
assignments.
In cases of very poor congruence, it is difficult to make
inferences about the quality of the fossil record or the
accuracy of cladograms, because neither is known with
certainty, and there are many conflated variables. For example, although congruence is higher through most of
the Mesozoic relative to the Palaeozoic and the Cenozoic
(Wills, 2007), the taxonomic composition of the biota
changes markedly through time. The early Palaeozoic
is dominated by invertebrates, particularly arthropods.
These are known to have relatively low fossilization potential and offer systematists smaller and more homoplastic character sets (Wagner, 2000a). The Late Triassic
and Early Jurassic, in contrast, witness the origin and
early radiation of several extremely well-studied groups
of vertebrates, including mammals and dinosaurs. These
typically have much higher fossilization potential and
provide larger numbers of less homoplastic characters.
Alternatively, the relatively poor congruence of arthropod trees may result from either an arthropod fossil
record that is less complete than that of other groups
(Wills, 2001) or because arthropod cladograms are less
accurate.
Previous piecemeal investigations of a modest number
of dinosaur clades have demonstrated remarkable congruence between phylogeny and stratigraphy (Brochu
and Norell [2000] and Rauhut [2003] for theropods;
891
892
SYSTEMATIC BIOLOGY
Wilson [2002] for sauropods; Pol and Norell [2006] for
basal sauropodomorphs). However, it is unclear whether
this is simply a sampling effect given the trees so far investigated or whether exceptional congruence is a feature of dinosaur cladograms more generally. Here we
provide the first systematic investigation of congruence
between the dinosaur stratigraphical record and phylogeny on the basis of a sample of recently published
dinosaur cladograms. By applying modified congruence
indices to these data, we demonstrate that in half of our
sampled cases, the fit of cladograms to the fossil record
is as good as it could possibly be, whereas in half of the
remaining cases, the fit is within 10% of the theoretical
maximum. We also adapt a procedure implemented by
Pol and Norell (2006) for incorporating uncertainty in the
dating of the earliest occurring fossils. Even allowing for
dating errors and artificially selecting the least congruent
combinations of possible first occurrence dates, a quarter of dinosaur trees were still perfectly congruent, with
another quarter within 10% of this optimum.
M ATERIAL AND M ETHODS
The Data Set
Nineteen higher-level cladograms of dinosaurs covering all of the major groups were sourced from papers
and books (notably, multiple author contributions in
Weishampel et al., 2004a) published in the last 5 years
(Table 1). In many cases, these were revisions and/or
extensions of older trees published by the same or overlapping sets of authors. Our sample is therefore representative of the current, revised state of dinosaur
phylogenetics. Stratigraphic dates for the terminals
were primarily derived from Weishampel et al. (2004b).
Our stratigraphic ranges spanned 27 stages, from the
Ladinian in the Middle Triassic (approximately 237 Ma)
to the Maastrichtian (65.5 Ma), with some taxa (avian
dinosaurs) also extending into the Paleocene (Gradstein
et al., 2004). Each stage was further divided into lower,
middle, and upper, yielding 81 stratigraphic intervals in
total.
First occurrences were dated in two ways. For “static”
datings, we used the best estimates. For “uncertain” datings, we recorded all intervals, or the range of intervals,
over which the earliest fossils may have occurred. Uncertainty of this kind derives from either or both of two
sources. In the first, the assignment of the oldest fossil occurrence may be contentious (e.g., an undiagnostic,
fragmentary, poorly preserved, or uncertainly accredited
specimen), with better and more archetypal material occurring later. In the second, the identity of the oldest
fossil(s) may be known with confidence, but the absolute age and stratigraphic position of this material may
be uncertain. For example, the oldest fossils might date
indubitably from the Late Jurassic (with the next oldest
exemplars occurring in the Cretaceous), but their precise
stratigraphic position within the Oxfordian, Kimmeridgian, and Tithonian stages may be unknown. We have
not sought to distinguish between these sources of uncertainty here. Tables of stratigraphic ranges are provided
in online Appendix 1 (www.systematicbiology.org).
VOL. 57
Indices of Congruence: Modifications to the Gap Excess
Ratio (GER)
A variety of indices have been proposed to quantify
the congruence between phylogenetic and stratigraphic
signals. These have been reviewed in some detail elsewhere (Siddall, 1996, 1998; Benton et al., 1999; Hitchin
and Benton, 1997a, 1997b; Wills, 1999; Wagner and Sidor,
2000). Common to many of these is the identification of
ghost ranges: portions of geological time through which
a lineage is inferred to have existed but for which there
is no direct evidence. These may be measured in absolute time (millions of years) or in stratigraphic units
(however defined). If all of the terminals in a cladogram
are monophyla, then ghost ranges are inferred wherever
sister terminals or sister clades first appear in the fossil
record at different times (Fig. 1). A direct or indirect tally
of these inferred ghost ranges or gaps across the entire
tree underlies several conceptually related congruence
indices, including the gap excess ratio (GER; Wills, 1999),
the Manhattan stratigraphic measure (MSM*; Siddall,
1998; Pol and Norell, 2001), the retention index of the
stratigraphic character (Farris, 1989; Finarelli and Clyde,
2002), and the relative completeness index (RCI; Benton,
1994; although this last is more properly a measure of the
completeness of the fossil record rather than a measure
of congruence per se; Fig. 1). The sum of ghost ranges is
denoted as the minimum implied gap (MIG in Benton
[1994] or simply the MIG in Wills [1999]).
The MIG alone says little about congruence, because
its size depends upon the number of taxa in the cladogram as well as their stratigraphic distribution. The MIG
therefore needs to be scaled between some maximum
and minimum. The GER (Wills, 1999) is widely used and
behaves well in simulations (Finarelli and Clyde, 2002).
It scales the MIG (measured in absolute time) between
the sum of ghost ranges obtained for the best (Gmin ) and
worst (Gmax ) fits of a given set of stratigraphic data onto
any tree topology.
GER = 1 − (MIG − Gmin )/(Gmax − Gmin )
(1)
In practice, Gmin is equivalent to the difference in
age between the oldest origin and the youngest origin,
whereas Gmax is equivalent to the sum of the differences
in age between the oldest origin and the dates of origin
of all other taxa. However, for most nonpectinate tree
topologies, values of MIG can never reach Gmin or Gmax ,
and hence GER values can never reach 0.0 or 1.0 (Fig. 1).
The MSM∗ is conceptually related to the GER but is calculated using an irreversible stepmatrix character (Fig.
1) within parsimony programs. It is determined by dividing the minimum possible number of stratigraphic
character steps by the observed number of steps. Hence,
the measure is not scaled relative to any theoretical maximum number of steps and so can never reach zero (except
in the trivial case where all taxa originate in the same interval). Like the GER, it is also sensitive to differences
in tree balance, with the maximum theoretical values for
nonpectinate trees typically being less than 1.0 (Fig. 1).
893
Group
0.9561 (0.9326)
0.9468 (0.9259)
0.9101 (0.8991)
0.8971 (0.8876)
0.8963 (0.8473)
0.8442 (0.7987)
0.8044 (0.8040)
0.7820 (0.8125)
0.7813 (0.7787)
0.7811 (0.7912)
0.7683 (0.7682)
0.7582 (0.7592)
0.7308 (0.7792)
0.7101 (0.7269)
0.6780 (0.6618)
0.6203 (0.6315)
0.5789 (0.4894)
0.5674 (0.6266)
0.5579 (0.5355)
0.8036 (0.7799)
0.8957 (0.8817)
0.6321 (0.6818)
0.6338 (0.6736)
0.7847 (0.7990)
23
23
40
44
49
Fixed dates
19
12
27
23
14
16
36
40
26
44
38
16
18
21
49
23
10
34
32
Taxa
a
Notes:
Tree of Upchurch et al. (2004), pruning all taxa not in common with the phylogeny of Wilson (2002).
b
Tree of Wilson (2002), pruning all taxa not in common with the phylogeny of Upchurch et al. (2004).
c
Tree of Holtz et al. (2004), with the addition of the problematic fossil Eshanosaurus.
d
Tree of Rauhut (2003), with the addition of the problematic fossil Eshanosaurus.
e
Tree of Holtz et al. (2004), with the addition of nine tyrannosauroid taxa to the tree.
Trees as originally published
Horner et al. (2004)
Hadrosauridae
Maryanska et al. (2004)
Pachycephalosauria
Wilson (2002)
Sauropoda
Vickaryous et al. (2004)
Ankylosauria
Dodson et al. (2004)
Ceratopsia
Norman (2004)
Iguanodontia (basal)
Upchurch et al. (2004)
Sauropoda
Holtz et al. (2004)
Tetanurae (basal)
Upchurch et al. (2007)
Sauropodomorpha (basal)
Rauhut (2003)
Theropoda (basal)
Butler et al. (2008)
Ornithischia (basal)
Xu et al. (2002)
Ceratopsia (basal)
Yates and Kitching (2003)
Sauropoda
Butler (2005)
Ornithischia
Senter (2007)
Paraves + Oviraptorosauria
Galton and Upchurch (2004a) Prosauropoda
Galton and Upchurch (2004b) Stegosauria
Yates (2007)
Sauropodomorpha (basal)
Turner et al. (2007)
Paraves
Trees above with modifications (see notes)
a
Upchurch et al. (2004)
Sauropoda
Wilson (2002)b
Sauropoda
Holtz et al. (2004)c
Tetanurae (basal)
Rauhut (2003)d
Theropoda (basal)
e
Tetanurae (basal)
Holtz et al. (2004)
Authors
0.8027
0.8850
0.6139
0.6379
0.7883
0.8242
0.9230
0.6506
0.6654
0.8106
0.9736
0.9468
0.9201
0.9387
0.9044
0.8433
0.8500
0.7884
0.8008
0.7955
0.7760
0.7973
0.7927
0.7271
0.7430
0.6628
0.6615
0.6903
0.6233
Max.
Min.
0.7878
0.8529
0.5874
0.6162
0.7693
0.9474
0.7673
0.8912
0.8772
0.8871
0.7860
0.7900
0.7351
0.6833
0.7635
0.7554
0.7465
0.6447
0.6815
0.6230
0.5195
0.1915
0.4792
0.5050
Variable dates
0.9583
0.9189
0.9042
0.9009
0.8959
0.8483
0.8138
0.7589
0.7471
0.7783
0.7653
0.7679
0.7206
0.7011
0.6816
0.5965
0.4550
0.5844
0.5629
Mean
GER
1.0000
1.0000
0.7699
1.0000
1.0000
1.0000
0.9740
1.0000
1.0000
0.9167
0.9076
1.0000
1.0000
1.0000
1.0000
1.0000
0.7550
0.9120
1.0000
0.9065
0.7204
0.6111
1.0000
0.5711
dates
Fixed
1.0000
1.0000
0.8421
0.9999
1.0000
1.0000
0.9716
1.0000
1.0000
0.9588
0.9131
1.0000
0.9989
1.0000
1.0000
1.0000
0.7683
0.9960
0.9998
0.9967
0.8778
0.5336
0.9999
0.6869
Mean
1.0000
1.0000
0.9866
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
0.8005
1.0000
1.0000
1.0000
1.0000
0.8235
1.0000
0.8099
Max.
Min.
1.0000
1.0000
0.6841
0.9577
0.9847
0.9795
0.8085
1.0000
1.0000
0.8750
0.8555
1.0000
0.9071
0.9863
1.0000
1.0000
0.7458
0.7923
0.8795
0.8473
0.6628
0.2195
0.9101
0.5534
Variable dates
GERt
1.0000
1.0000
0.9893
1.0000
1.0000
1.0000
0.9980
1.0000
1.0000
0.9923
0.9992
1.0000
1.0000
1.0000
1.0000
1.0000
0.9980
0.9998
1.0000
0.9998
0.9896
0.8463
1.0000
0.8785
dates
Fixed
0.999
0.999
0.983
0.999
0.999
0.999
0.994
0.999
0.999
0.990
0.998
0.999
0.999
0.999
0.999
0.999
0.997
0.998
0.999
0.998
0.990
0.781
0.999
0.911
Mean
0.999
0.999
0.998
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.999
0.998
0.999
0.965
0.999
0.976
Max.
0.998
0.998
0.958
0.998
0.998
0.999
0.952
0.999
0.999
0.952
0.992
0.999
0.994
0.999
0.999
0.999
0.988
0.999
0.999
0.994
0.966
0.485
0.999
0.843
Min.
Variable dates
GER∗
TABLE 1. A variety of congruence indices for 19 dinosaur trees as originally published, plus experimental manipulations of 4 of those trees (see footnotes). Stratigraphic range
data principally from Weishampel et al. (2004b). All indices calculated assuming stratigraphic intervals of unit length, except values in parentheses for the gap excess ratio (GER),
which used absolute ages. Topological GER (GERt) and modified gap excess ratio (GER∗ ) values for fixed dates are based on 50,000 randomizations of stratigraphic data across each
topology. Mean, maximum, and minimum GER, GERt, and GER∗ values for variable dates are based on 1000 random perturbations of stratigraphic dates within the stipulated range
of possibilities, followed by 1000 randomizations of those dates across each perturbed tree (1,000,000 permutations for each original tree).
894
SYSTEMATIC BIOLOGY
VOL. 57
Huelsenbeck, 1994), is calculated simply as the fraction of
internal cladogram nodes with sister nodes of equivalent
or greater age. This is strongly biased by tree topology
(Wills, 1999).
In order to overcome biases in the GER related to tree
balance, we therefore propose a modification to the index, scaling the observed sum of ghost ranges between
its maximum and minimum possible values on a given
tree topology (rather than on any topology; topological
GER or GERt):
GERt = 1 − (MIGu − Gtmin )/ (Gtmax − Gtmin )
FIGURE 1. Existing indices of stratigraphic congruence (SCI, MSM*,
and GER) are biased by tree shape. (a) Cladogram of nine taxa, A to
I. (b) Stratigraphic ranges for taxa A to I, along with the corresponding Manhattan stepmatrix, used as an irreversible stratigraphic character in calculating the MSM*. (c) Mapping of the tree in (a) onto the
stratigraphic ranges in (b). Ghost ranges are indicated by grey vertical bars. Note that the maximum possible SCI value (the fraction of
internal nodes with sister nodes as old or older) is obtained for this
tree, compared with lower values of the MSM* and GER. The stratigraphic character is optimized to “evolve” in parallel along two major
branches of the tree, hence the moderate value for MSM*. (d) The best
possible fit of stratigraphic ranges to the tree in (a), with a GERt of 1.00
by definition. This is obtained by rearranging the stratigraphic data
over the terminals, whilst leaving the topology unchanged. Note that
the MSM* and GER have values below 1.00, despite the impossibility
of obtaining a higher value given a tree of this shape. (e) The worst
possible fit of stratigraphic ranges to the tree in (a), with a GERt of 0.00
by definition. Although this corresponds to a GER of 0.00, the lowest
possible MSM* value is 0.25. Values of GER∗ shown here are based on
10,000 random permutations. See text for further details. (a) and (b) are
partly derivative of Slater (2007).
According to Finarelli and Clyde (2002), the retention
index of the stratigraphic character is equivalent to the
GER and is therefore susceptible to the same biases. An
unrelated index, the stratigraphic consistency index (SCI;
(2)
where MIGu is the sum of ghost ranges for stratigraphic
intervals of unit length, and Gtmax and Gtmin are the maximum and minimum possible values of MIGu on the
given topology. Although Benton (1994) and Wills (1999)
calculated MIG values by summing the lengths of ghost
ranges calculated in millions of years, this effectively incorporates the assumption that preservation potential is
uniform with time, which is often unwarranted (Finarelli
and Clyde, 2002). The MIGu is therefore calculated as the
sum of the number of stratigraphic intervals traversed
(effectively scaling all to a unit length; Figs. 1 and 2).
This assumes that the intervals are at or near the maximum level of stratigraphic resolution for the groups
concerned, with the boundaries between intervals being
demarcated by distinct events for appropriate correlation. (In fact, we consider that both approaches [measuring intervals in absolute time, or in unit length] are
defensible.) Computationally, it is difficult to determine
Gtmax and Gtmin with certainty, because the problem is
NP-complete. Permuting the assignment of range data
over the tree offers a simple way to estimate them. However, because the distribution of possible MIGu values
usually has long tails (particularly towards Gtmin ), these
estimates are likely to be within the true values (Fig. 3).
Moreover, because the distribution is often strongly negatively skewed, Gtmin is more likely to be overestimated
than Gtmax is likely to be underestimated, resulting in
overestimates of GERt. This is because there are relatively
few ways of permuting the data that will yield the minimum possible MIGu value or something closely approximate to it. In order to compute indices in a reasonable
time, we estimated Gtmin , Gtmax , and hence GERt from
50,000 permutations of the stratigraphic data in each
case.
A more refined index is therefore formally proposed,
based on an estimate of the underlying distribution of
randomized MIGu values, rather than estimates of Gtmax
and Gtmin alone. The modified GER (GER∗ ) is estimated
by calculating the fraction of the area under a curve
of permuted values corresponding to a MIGu value
greater than the observed value (Fig. 3). The compliment of GER∗ values also approximate to the probability that an observed level of stratigraphic congruence is
obtained purely by chance. Unlike GERt, estimates of
GER∗ are much less sensitive to the number of permutations used. They also take the shape of the distribution
2008
WILLS ET AL.—MODIFIED GAP EXCESS RATIO
895
FIGURE 2. The relationship between GER and GERt for a cladogram of sixteen predominantly basal ceratopsian dinosaur taxa and outgroups
(Xu et al., 2002). (a) Published cladogram. (b) Observed ranges for the fossil taxa in (a) are plotted as vertical black bars, spanning between the
interval in which each taxon first appears in the fossil record and the last interval before its extinction. Ghost ranges are inferred between all
pairs of sister taxa that originate at different times and are plotted as vertical grey bars. These ghost ranges have a duration that can be measured
as a number of millions of years (Benton, 1994; Wills, 1999) or as the number of stratigraphic intervals traversed (Siddall, 1998; Finarelli and
Clyde, 2002; this work). Summing ghost ranges over the tree yields the minimum implied gap (MIGu) for a given cladogram. In this example,
all stratigraphic intervals have been ascribed unit length, such that MIGu = MIG. Where taxa originate at different times, a certain number of
ghost ranges will be necessary as an absolute minimum in order to unite all observed ranges into a tree structure. Similarly, there is a maximum
possible sum of ghost ranges for a given distribution of observed ranges. The gap excess ratio (GER; Wills, 1999) scales the observed MIG between
(c) the maximum possible sum of ghost ranges on any tree (Gmax ; equivalent to a GER of 0.0) and (d) the minimum possible sum of ghost ranges
on any tree (Gmin ; equivalent to a GER of 1.0). However, balanced and imperfectly balanced trees (those that depart from a pectinate structure)
often cannot yield GER values of 0.0 or 1.0, irrespective of how taxa are rearranged across them. This means that balanced trees are scaled over a
narrower range of possible GER values than pectinate ones. The topological GER (GERt) is analogous to the GER, except that the MIGu value is
scaled between (d) the maximum (Gtmax ) and (e) the minimum (Gtmin ) sum of ghost ranges possible for a given set of observed ranges distributed
over a given topology.
896
SYSTEMATIC BIOLOGY
VOL. 57
FIGURE 3. The distribution of possible sums of ghost ranges for a given set of observed ranges on a given tree topology can be approximated
empirically using randomization. For this data set (Xu et al., 2002), it reveals that there are relatively few ways to obtain the smallest MIGu (or
something closely approximate to it) but many more ways to obtain the largest MIGu. The modified GER (GER∗ ) takes account of the shape of
this distribution and is approximated by the fraction of randomized values yielding a MIGu greater than or equal to that of the original data.
For negatively skewed distributions (such as that here), the GER∗ value will be greater than the corresponding GERt value. Values of Gmin and
Gmax that define the GER lie outside of the distribution of possible MIGu values. The GER is depressed relative to the GERt and GER∗ .
of MIGu values directly into account. As with the GERt,
GER∗ indices were estimated from 50,000 permutations
of stratigraphic ranges for each data set. However, highly
similar values were obtained using just 1,000 permutations, with the correlation between GER∗ values over all
19 data sets being very high (r = 0.995, P < 0.001).
Uncertain Dates of First Occurrence
For most measures of congruence, times of first
occurrence are considered as single, invariant dates, irrespective of our confidence in those dates. In many cases,
however, first occurrences are not known with confidence, or a range of possible first dates is considered by
different authorities. Differences in interpretation therefore have the potential to make differences to GER, GERt,
and GER∗ values. These differences can become large and
significant when the range of first occurrence dates for
a particular taxon or subset of taxa is large relative to
the total range of first occurrence dates over all taxa. Pol
and Norell (2006) proposed a randomization procedure
for estimating the shape of a distribution of MSM* or
GER values obtained for a set of uncertain or certain and
uncertain first occurrence dates. Their approach was to
generate a large number of virtual data sets in which the
actual first occurrence date of a given taxon was selected
at random, and with equal probability, from all the intervals between its oldest and youngest putative date of
first occurrence. (We note, in this context, that the method
is readily modified to incorporate different probabilities
of first occurrence dates within the total range.) These
data sets were then analyzed by conventional means in
order to estimate the distribution (Fig. 4). A precisely
analogous approach is implemented here for the GERt
and GER∗ . We report a variety of statistics for the randomized distribution of each metric. We are particularly
2008
WILLS ET AL.—MODIFIED GAP EXCESS RATIO
897
FIGURE 4. The distribution of GER values obtained for 10,000 permutations of variable first occurrence dates for the tree of Xu et al. (2002).
The value obtained from the original “static” stratigraphic data (0.758) is close to the modal value for “variable” dates (0.756) but considerably
less than the median (0.767) and the mean (0.768). Hence, the uncertain dates of first occurrence tend to yield higher GER values than the original
data in this case. A similar procedure is implemented for both the GERt and the GER∗ .
interested in whether the “worst” possible perturbations
of first occurrence dates across all taxa yield indices that
differ markedly from either the mean value or the value
for fixed dates.
For practical purposes, we implemented 1,000 permutations of flexible first occurrence dates. For each of
these, we ran 1,000 permutations of the stratigraphic data
across the topology. It is not possible to circumvent the
repeated search for Gtmax and Gtmin with each of the 1,000
“adjustments” of first occurrence dates (by, for example,
implementing a single, more thorough search) because
these may differ for each of what is effectively a new data
set. With such a modest number of iterations, GERt values must be treated with caution. GER∗ values are more
likely to provide accurate approximations.
R ESULTS
Over all 19 data sets, congruence was extremely high
(Table 1). The average GER (Wills, 1999) for static first
occurrence dates was 0.767, with the best data set being
the Hadrosauridae (Horner et al., 2004; 0.956) and the
worst being the Paraves (Turner et al., 2007; 0.558). The
GERt (scaling the MIGu between the maximum and minimum possible on the observed tree topology) returned
even higher values, with an average of 0.909. Ten of the
data sets had a GERt of 1.000, whereas five of the remaining nine had values over 0.900. The least congruent
data sets were the Paraves (Turner et al., 2007; GERt =
0.571), Stegosauria (Galton and Upchurch, 2004b; GERt
= 0.611), Prosauropoda (Galton and Upchurch, 2004a;
GERt = 0.720), and Ceratopsia (Xu et al., 2002; GERt =
0.755). GER∗ values were the highest of all, with a mean
of 0.984. Ten data sets (65%) had a GER∗ of 0.99998 (the
highest possible for 50,000 permutations), with only two
(Galton and Upchurch, 2004b; Turner et al., 2007) having
a GER∗ less than 0.975.
Introducing limited uncertainty into the dates of first
occurrence produced mean values for most data sets that
were generally very similar to those from static datings
(GER: r = 0.976, P < 0.0005; GERt: r = 0.923, P < 0.0005;
GER∗ : r = 0.954, P < 0.0005). One notable exception was
the set of GER values for the Stegosauria (0.455 and 0.579,
respectively). Across all 19 data sets, mean GER incorporating uncertainty (Fig. 5) was 0.756 (just 0.011 less than
the static value). Even allowing for the worst possible
datings, mean GER was only reduced to 0.699. Eleven
data sets had their lowest GER above 0.700, whereas
only two (Galton and Upchurch [2004b] for Stegosauria
[0.192] and Yates [2007] for Sauropodomorpha [0.479])
had a value less than 0.500. GERt values were actually
slightly higher for uncertain dates than for static dates,
with a mean of 0.932 (a difference of 0.023). Seven data
sets (37% of the sample) had a mean GERt of 1.000 (and
so therefore also a minimum GERt of 1.000) and were
perfectly congruent, even allowing for the vagaries of
preservation. Of the remaining 12, just 5 had a mean
GERt less than 0.950. Again, the worst possible datings
still yielded a high mean GERt (0.843). Remarkably, however, the best possible datings yielded perfect congruence
(GERt = 1.000) for 16 of the 19 data sets. Only for the trees
of Galton and Upchurch (2004b), Turner et al. (2007),
and Xu et al. (2002) was it impossible to adjust dates
within the margin of uncertainty so as to achieve a perfect
match (maximum GERt values of 0.824, 0.810, and 0.801,
respectively).
GER∗ values for variable dates had a mean of 0.981,
with 10 of 19 data sets having a mean (and minimum)
898
SYSTEMATIC BIOLOGY
VOL. 57
FIGURE 5. Box and whisker plots of the distribution of GER values obtained for 1000 permutations of variable first occurrence dates. Data
from the 19 published dinosaur trees are arranged in order of median GER.
of 0.999 (the highest value possible for 1,000 randomizations). Even the worst possible datings yielded a mean of
0.956, with just two data sets below 0.950. The interpretation of the GER∗ is more straightforward than that of the
GER or GERt. GER∗ values above 0.500 indicate greater
congruence than the median of the null distribution, and
values above 0.975 indicate significantly greater congruence than the null distribution (with P ≤ 0.050, not correcting for multiple tests). For the static first occurrence
dates, 16 of 17 data sets were significantly congruent,
with just two being indistinguishable from random.
Preliminary reanalysis of the 1,000 animal and plant
“static” data sets utilized by Benton et al. (2000) and Wills
(2007) yielded an average GER∗ of 0.688, with only 34%
attaining a GER∗ > 0.975. This strongly suggests that the
congruence of dinosaur phylogenies is better than that
for a large sample of data sets across a range of other
taxa.
The greatest discrepancies between static GERt and
GER∗ values are for the data sets of Paraves (Turner
et al., 2007; a difference of 0.308), Prosauropoda (Galton
and Upchurch, 2004a; a difference of 0.269), Ceratopsia
(Xu et al., 2002; 0.243), and Stegosauria (Galton and
Upchurch, 2004b; 0.235). These produced particularly
skewed distributions of MIGu values upon permutation,
such that there were very few ways of achieving close to
the highest congruence and many ways to achieve close
to the poorest congruence (Fig. 3). All other differences
were less than 0.100. Such sensitivity to the underlying
distribution of possible MIGu values is a desirable property for an index of congruence.
D ISCUSSION
The majority of the dinosaur cladograms investigated
have excellent stratigraphic congruence, and several
have the theoretical maximum congruence. Strong agreement between the phylogenetically inferred order of lineage branching and the known stratigraphic order of
first occurrences strongly suggests that both reflect the
true history and evolution of the group. This implies that
our knowledge of the dinosaur fossil record at the taxonomic levels investigated (predominantly species and
genera), and the stratigraphic level of stages is adequate
2008
WILLS ET AL.—MODIFIED GAP EXCESS RATIO
for palaeontologists to investigate pertinent macroevolutionary patterns and processes (e.g., the origin of birds
[Brochu and Norell, 2000; Zhou, 2004] as well as patterns
of dinosaur diversity through time [Fastovsky et al., 2004;
Wang and Dodson, 2006]) with some confidence.
Other explanations for such exceptionally strong congruence appear unlikely in this context. It is possible,
for example, that systematists might code their data in
such a way as to yield stratigraphically congruent trees
a priori. This would require some considerable skill and
disingenuity, and there is no evidence that coded character distributions have been more biased by stratigraphic
considerations in our sample of dinosaur trees than in
any other group. A less potent but more credible possibility concerns outgroup selection. Although the oldest
taxa may well be the most plesiomorphic (to precisely the
same extent that we expect the phylogenetic and stratigraphic patterns to concur), it does not follow that the
outgroup can always be identified reliably on the grounds
of antiquity. Factoring chronological age into decisions
concerning rooting “contaminates” phylogenetic inferences with the stratigraphic signal, making them nonindependent. This is a more plausible contingency than
wholesale biases in coding practice, requiring just one
assumption that may contributes to rooting decisions in
a manner that is difficult to quantify in the absence of
an explicit, larger “parent” tree. However, the relationships of the major dinosaur subclades appear to have
converged on a robust consensus, with most nodes being
well supported (Pisani et al., 2002). There are therefore
excellent ancillary phylogenetic grounds for the choice
of outgroup taxa in most or all cases. Moreover, it is unclear why such a putative problem should uniquely afflict the systematics of the nonavian Dinosauria, a group
that appears to offer a wealth of morphological character
data (Novas, 1996; Langer and Benton, 2006). Problems
of this nature might more reasonably be expected to afflict groups whose wider relationships were less certain
(e.g., most major subclades of arthropods; Wills, 1999)
and which systematists struggle to root by any means.
The Relationship Between Cladistic/Stratigraphic
Congruence and the Completeness of the Fossil Record
There is no single definition of fossil record “completeness” or “quality,” still less an agreed metric to quantify
it (Paul and Donovan, 1998). The congruence between
cladistic and stratigraphic order is indicative of just one
aspect of completeness. At the broadest level, we can distinguish between organismal, taxonomic, palaeobiogeographical, and stratigraphic incompleteness, although
each has a bearing on all of the others.
Organismal incompleteness refers to the many characters of extinct species that are not amenable to fossilization, in addition to those that, for particular species,
happen not to have been preserved. The lack of information on the limbs of most arthropods and the feathers of
most birds contribute to incompleteness of this sort. Such
“missing data” is cited as a problem in cladistic studies
that include fossils purportedly inflating the number of
899
most parsimonious trees found and thereby obfuscating apparent relationships. Simulations (Wiens, 2003b)
demonstrate that it is the number of coded characters
(as opposed to the number of missing entries) that contributes most strongly to a lack of resolution, whereas
empirical studies identify no significant difference between the behavior of fossil taxa and their extant counterparts in this regard (Cobbett et al., 2007). Organismal
incompleteness is not necessarily a problem for establishing first (or last) occurrence dates, because even a single
fragmentary or poorly preserved specimen with diagnostic characters can establish the presence of a group
at a particular time (e.g., MacLeod, 1994). Such incompleteness may be more problematic when attempting to
determine if a fossil without diagnostic characters belongs
to a particular stem lineage.
Taxonomic incompleteness refers to the inference that
not all taxa have been preserved as fossils and that not
all of these will have been discovered and documented.
This bears on the issue of taxon sampling, which may influence phylogenetic inference and the apparent stratigraphic congruence of trees in complex ways (Wagner
and Sidor, 2000). Trees are likely to be less complete as
the taxonomic level of their terminals decreases; all other
things being equal, the chances of a species leaving no
record is greater than that for a genus, family, or order.
This also implies that first occurrence dates will be less
reliable descending the taxonomy. It is for this reason
that most macroevolutionary studies of diversity and
disparity through time focus on higher taxa (e.g., Wills,
1998; Erwin, 2007; Sahney and Benton, 2008). Nonetheless, many sources of potential bias remain (Smith, 2007).
The trees in this study are predominantly at the generic
level. As such, many are likely to be incomplete to some
degree. Indeed, sophisticated extrapolations of the rates
of discovery of new dinosaur genera (Wang and Dodson,
2006) using abundance-based coverage estimators (ACE;
Chao et al., 1993) suggest that only 29% of discoverable genera are now known, predicting that another 140
years of research will be required before 90% have been
documented. Congruence indices such as the GERt and
GER∗ do not measure the absolute extent of ghost ranges.
Rather, they measure ghost ranges relative to the distribution of possibilities for the data available. Hence,
high congruence is possible with an extremely gappy
record (Wills, 1999); the only requirement is that the phylogenetic and stratigraphic signals are concordant. Such
congruence may therefore be relatively impervious to
rarefaction of the true tree or to the random degradation
of the stratigraphic signal (at least up to a point). Simulation studies are needed to investigate this. Our finding
of exceptional congruence for most dinosaur groups is
therefore perfectly consistent with poor knowledge of
generic diversity.
The most parsimonious explanation for good agreement between the phylogenetic and stratigraphic signals is that both reflect faithfully (albeit partially) the
underlying evolutionary history of the group (Benton
et al., 1999; Wills, 2002). This does imply, however, that
new discoveries are less likely to have a marked impact
900
SYSTEMATIC BIOLOGY
on inferred phylogeny (see a similar example for basal
birds: Fountaine et al., 2005) and will probably subdivide
existing ghost ranges rather than radically overhaul the
stratigraphic data set. We also note that the “known”
29% of genera are unlikely to represent a random sample of discoverable genera, and towards those from
the best-sampled and/or most accessible geographic regions. Moreover, many recent finds are being made
proactively from hitherto inaccessible and unexploited
regions (Wang and Dodson, 2006) engendering a renaissance in dinosaur descriptions. Most of these are readily
assigned to existing clades and interpolate smoothly into
the existing picture of dinosaur evolution (Pisani et al.,
2002).
A more problematic form of taxonomic incompleteness may be introduced by systematists themselves.
Pearson (1999) noticed that published cladograms tend
to be pectinate much more often than predicted by null
models of cladogenesis and the random sampling of
lineages. This was attributed to a tendency for investigators to include only those taxa proximate to some
preconceived transitional series of fossils. This practice is most likely in studies that investigate the origins of one major group from within another (e.g.,
tetrapods from fish [Clack, 2006], birds from nonavian
dinosaurs [Zhou, 2004], mammals from nonmammalian
cynodonts [Kemp, 2007]). However, such selective pruning alone need not necessarily result in spuriously inflated congruence. There is no reason why a series of
morphologically transitional fossils should be especially
congruent with stratigraphy, unless the transitional series were itself identified partly by weeding out “incongruously” placed fossils (rather than solely on the
basis of what is understood of morphological evolution). Eight of the dinosaurian data sets investigate the
evolution of “basal” groups (i.e., Xu et al., 2002 [basal
Ceratopsia]; Rauhut, 2003 [basal theropods]; Holtz et al.,
2004 [basal Tetanurae]; Galton and Upchurch, 2004a
[basal sauropodomorphs]; Norman, 2004 [basal Iguanodontia]; Turner et al., 2007 [stem birds]; Yates, 2007
[basal sauropodomorphs]; Butler et al., 2008 [basal ornithischians]). However, these are actually more balanced on average than the other 11 trees analyzed (mean
lm [Heard 1992] for the two groups is 0.34 and 0.44,
respectively). Other forms of taxonomic rarefaction are
commonplace, in particular the exclusion of incomplete
material and the de facto absence of an indeterminate
number of lineages from the record. Arguably, all cladograms including fossils (and to a lesser extent, those that
do not; Cobbett et al., 2007) are subject to such uncertain
sampling. Although the effects of missing data (Wiens,
2003a, 2003b, 2006) and missing taxa (Poe, 1998; Wagner,
2000b) upon the accuracy of phylogenetic inference have
been investigated in some detail, their effects upon stratigraphic congruence require further exploration (Wagner
and Sidor, 2000). In this context, it is notable that only one
dinosaur clade (Stegosauria) had very poor congruence
(a MIGu indistinguishable from random). In this particular case, the poor fit probably resulted from a number of
clade-specific factors, including poor phylogenetic reso-
VOL. 57
lution and the relatively small number of characters that
were incorporated into the analysis (see Galton and Upchurch, 2004a).
Paleobiogeographical incompleteness refers to the
inferred non-preservation of fossils from some of the geographical regions that the organisms originally inhabited. This is the least problematic form of incompleteness
for congruence studies, because first occurrence dates are
usually reported globally or for a specified region of interest. We note, however, that the extreme paleobiogeographic discontinuity of sister groups has been cited to
imply the existence of ghost ranges of uncertain duration
(e.g., Fortey et al., 1996; Lieberman, 2002; Clarke et al.,
2007), even in cases where these sister groups appear simultaneously in time. We are aware of no attempts to
quantify such putative temporal gaps.
Stratigraphic incompleteness refers to the absence of
fossils from strata formed at times when a taxon’s existence is otherwise inferred. This may be within taxa
(where occurrences pre- and post-date the gap: so-called
Lazarus taxa) or between taxa. Congruence indices measure only this second aspect; namely, the extent of inferred stratigraphic gaps between the first occurrence
dates of sister taxa. They do not attempt to quantify
the intensity of sampling within ranges, nor do they
seek to estimate confidence intervals on those ranges.
Published compendia typically report the first and last
occurrences of taxa (e.g., Benton, 1993), but there is often scant data on the frequency of fossil occurrences
between these dates. Just two incidences of a taxon
are all that is needed to stake out upper and lower
bounds in the fossil record. Within-taxon completeness
may be very poor, however, with all of the intervening stratigraphic intervals documenting no fossils. Under these circumstances, our confidence in the first and
last dates will be very low, especially if they are widely
separated in time. Where the probability of fossilization is uniform, the mean size of gap within the observed range provides an estimate of the time between
the first or last occurrence and the actual end of the
taxon’s range (Strauss and Sadler, 1989; Marshall, 2001).
Methods for extending ranges in cases where fossilization is non-random are a little more complex (e.g., Marshall, 1997; Solow, 2003). A straightforward measure of
within-taxon completeness per interval is the simple
completeness metric or SCM (Benton, 1987). This expresses the number of constituent taxa (typically species)
within a more inclusive taxon of interest observed in
a given stratigraphic interval, as a fraction of the total number of observed taxa plus Lazarus taxa (Fara,
2001). A more sophisticated estimate of the probability of preservation per interval can be determined given
the numbers of constituent taxa spanning one, two, and
three stratigraphic intervals: f (1), f (2), and f (3), respectively. This probability, R (or the “FreqRat”), is given
simply by the ratio f (2)2 /[ f (1) f (3)] (Foote and Raup,
1996). Of course, it is not impossible for a species or
higher taxon to be known from just one locality or horizon, so that all such measures become impossible to
calculate.
2008
WILLS ET AL.—MODIFIED GAP EXCESS RATIO
Taxonomic Level and Taxon Sampling
Previous studies of large samples of cladograms (e.g.,
Benton and Hitchin, 1997; Benton et al., 1999; Benton,
2001) caution that the taxonomic level of OTUs in a
cladogram is likely to have an influence on stratigraphic
congruence. Our modest sample of trees precluded
quantifying this effect rigorously, which requires the systematic analysis of hundreds of trees with multivariate
methods. We note, however, that our trees were predominantly at the generic level, making our sample relatively
homogeneous. Seven trees exclusively sampled genera,
and no tree had more than one third of its OTUs at any
other level (including species). Over all 19 trees, 84% of
terminals were genera. We also note that no dinosaur
systematists currently use any conventional Linnean hierarchical terms; these have become outmoded with the
advent of rigorous cladistic analyses. This is not to say
that the size and number of composite taxa is unimportant, merely that it is beyond the scope of the present
study to address it in a rigorous way. A related issue is
the intensity of taxon sampling, the proportion of taxa
at a given level that are represented in a cladogram.
Of course, this is difficult to quantify in absolute terms,
because sampling is always influenced by the known
and knowable fossil record, as well as by the deliberate exclusion of taxa by the systematist. Dinosaur taxonomists predominantly work on genera and low-level
clades. They are usually cautious of higher taxonomic
ranks, opting instead for clades based explicitly on cladograms. Moreover, most dinosaur genera are monospecific. This means that dinosaur cladograms are either
predominantly based on genera (and therefore nearly
equivalent to species trees) or a complex mixture of different ranks (ranging from subfamilies to orders or even
classes in traditional taxonomies).
Cladograms for basal ceratopsians, Pachycephalosauria, Ceratopsidae, and prosauropods sample
around 70% of valid genera. The exceptions are those
removed a priori by safe taxonomic reduction (STR;
Wilkinson, 1995) or a posteriori by reduced consensus
(RC) methods (Wilkinson, 1994). The sauropod trees
sample only 25% to 30% of known generic diversity.
Some of the shortfall results from the application of
STR or RC methods. However, other taxa are excluded
because they are extremely incomplete, inaccessible,
or very poorly described or because of perceived taxonomic problems in assigning the hypodigm material.
In other cases, taxa are excluded purely for expedience.
The hadrosaur, ankylosaur, and iguanodontian trees
sample less than 50% of the generic diversity and a
lower proportion of species diversity. Stegosaurs are
represented by around two-thirds of taxa. All of the
theropod phylogenies, along with the large ornithischian phylogeny of Butler, woefully undersample the
total diversity (as they collapse many diverse clades
into OTUs). Nonetheless, they do offer good coverage
of all constituent clades. Similarly, the sauropodomorph
phylogenies of Upchurch et al. (2007) and Yates (2007)
contain at most five sauropod exemplars, representing
a clade of over 120 species.
901
Can We Use Stratigraphic Congruence to Choose between
Trees for the Same Taxa?
Stratigraphic congruence can be used as an ancillary
criterion for choosing between two or more otherwise
equally optimal trees derived from the same character
matrix, with preference being given to the subset of trees
that exhibits the best fit to stratigraphy (Wills, 1999). This
idea can be extended to comparisons between two or
more trees proposed by different authors for an identical set of terminal taxa. Comparisons of the stratigraphic
congruence of competing tree topologies for theropod dinosaurs (Rauhut, 2003; Holtz et al., 2004) reveal comparably high indices because the higher-level relationships
proposed by these two authors are similar. Because these
trees are derived from independent character sets, this
agreement corroborates the gross structure of both trees.
Interestingly, congruence measures for more exclusive
clades within Theropoda (Paraves: Senter, 2007; Turner
et al., 2007) were substantially lower than those obtained
from the broader scale analyses of Rauhut (2003) and
Holtz et al. (2004). One likely reason for this discrepancy is that both Rauhut (2003) and Holtz et al. (2004)
used monophyla to represent speciose coelurosaurian
clades (including those that comprise Paraves). As a result, ghost lineages within these monophyla are not included in the calculations.
The various basal sauropodomorph trees examined
showed wide variation in their congruence with stratigraphy. However, with the exception of Upchurch et al.
(2007), GER, GERt, and GER∗ values were lower than
for most other dinosaur clades. Whether this reflects a
poor understanding of phylogeny, a patchy fossil record,
or both is unclear. Although these trees include a small
number of sauropods, these are merely exemplars of
clades containing 120 or more species. Whatever the case,
more work on the evolution of basal sauropodomorphs
is required.
Both of the sauropod phylogenies examined show excellent stratigraphic congruence, although the tree of
Wilson (2002) performs better than that of Upchurch et al.
(2004). However, because the latter contains more ingroup taxa than the former, we reanalyzed the Upchurch
et al. (2004) tree after pruning out the “additional” terminals. This still resulted in poorer congruence than for
the Wilson (2002) tree.
More generally, we note that adding or redating single
terminals can have marked effects on indices of congruence. For example, the Early Jurassic theropod taxon Eshanosaurus is known from only an incomplete lower jaw.
Some authors (Zhao and Xu, 1998; Xu et al., 2001) place it
controversially in the otherwise exclusively Cretaceous
Therizinosauroidea (Clark et al., 2004). If correct, this implies the existence of a ghost lineage that extends through
all of the Jurassic and into the lower part of the Early
Cretaceous. More importantly, however, because therizinosauroids occupy a derived position within theropod
phylogeny, the early occurrence of Eshanosaurus would
result in the generation of additional ghost ranges for
numerous derived theropod clades. Redating the first
occurrence of Therizinosauroidea in the phylogenies of
902
SYSTEMATIC BIOLOGY
Rauhut (2003) and Holtz et al. (2004) does indeed result in
a drop in all measures of congruence. Conversely, we also
note that relatively large numbers of taxa can be added
to trees without significantly altering levels of congruence. For example, the addition of nine tyrannosauroid
taxa to the phylogeny of Holtz et al. (2004) resulted in
only a 0.3% change in GER score. In this case, the limited
impact probably results from two factors: (i) most tyrannosauroids are of Late Cretaceous age and have overlapping stratigraphic ranges and hence few substantial
ghost lineages; and (ii) all basal tyrannosauroids occur
earlier in time than derived forms (Holtz, 2004).
Finally, it is interesting to note that the ornithischian
phylogeny proposed by Butler et al. (2008) has a higher
GER score than its progenitor (Butler, 2005). This probably results almost entirely from the change in position
of a single, stratigraphically old clade: the Late Triassic and Early Jurassic Heterodontosauridae (Weishampel et al., 2004b). In earlier cladograms (e.g., Sereno,
1999; Butler, 2005), these resolved as the sister group
of the relatively derived Euornithopoda, thereby generating many extensive ghost ranges. Butler et al. (2008),
by contrast, recovered Heterodontosauridae close to the
base of the ornithischians, removing several ghost ranges
in the process.
CONCLUSIONS
1. Although notionally scaled between 0.0 and 1.0, the
gap excess ratio (GER) is sensitive to differences in
tree balance. Only pectinate (maximally imbalanced)
trees are always capable of yielding the extreme values. A more balanced tree (i.e., one for which lm > 0;
Heard, 1992) is incapable of having GER = 1.00 or 0.00
(unless several taxa first appear in the same interval),
no matter how a given set of stratigraphic data is artificially redistributed across its terminals. The topological GER (GERt) represents an advance over the
GER by scaling the observed sum of ghost ranges between the maximum and minimum possible for the
stratigraphic data on the given topology (and also by
scaling the length of stratigraphic intervals to unity,
rather than measuring their duration in millions of
years). Upper and lower bounds are estimated empirically here by randomization in the absence of a
more efficient algorithm. However, because the tails
of the distribution of randomized values are typically
long, it becomes prohibitively time-consuming to find
the bounds for all but the smallest data sets, and the
number of randomizations becomes critical. For this
reason, we propose a new measure of congruence—
the modified GER (GER∗ )—that takes the shape of
this distribution directly into account. GER∗ is given
simply by the fraction of randomized values yielding a sum of ghost ranges (measured in unit interval
lengths: MIGu) greater than the original value.
2. The procedure proposed by Pol and Norell (2006) for
investigating the effects of uncertain first occurrence
dates on the MSM* and GER is readily extended to
other indices. Data on the distribution of values ob-
VOL.
57
tained are informative and indicate to what extent perceived congruence is susceptible to differences in the
interpretation of the dating of fossil lineages.
3. The congruence between the phylogenetically inferred order of lineage branching and the order in
which those same lineages first appear in the fossil
record is exceptionally high in most of the dinosaur
clades investigated. Only one tree (Stegosauria;
Galton and Upchurch, 2004b) had a congruence value
indistinguishable from the null model. The most tenable explanation for this agreement is that both phylogenetic and stratigraphic signals reflect the same
underlying pattern of evolutionary history. This implies that our knowledge of the dinosaur fossil record
is sufficiently complete for palaeontologists to investigate macroevolutionary patterns and processes, at
least at the taxonomic level of genera and the stratigraphic level of stages. This finding is consistent with
the inferences of Wang and Dodson (2006), who estimated that as many as 71% of dinosaur genera remain
to be found. Our results do, however, imply that new
discoveries are unlikely to precipitate radical revisions of phylogeny or amendments to the stratigraphic
series.
4. Excellent stratigraphic congruence for dinosaurs is
probably a function of several factors. Firstly, the long
history of wide and intense interest in the group means
that dinosaurs have been especially well studied
(Weishampel et al., 2004a). Secondly, previous studies
have demonstrated that vertebrate cladograms tend
to be more congruent with stratigraphy than those of
many other groups (Hitchin and Benton, 1997a; Wills,
2001). This may itself be because morphological character sets for vertebrates tend to be less homoplastic
than those for many other taxa, facilitating accurate
phylogeny reconstruction. Moreover, the preservation
potential of bone is high relative to that of many other
tissue types. Thirdly, stratigraphic congruence in the
Jurassic and Cretaceous appears to be higher across
all taxa than at many other times, an observation only
partially accounted for by sampling and edge effects
(e.g., the “pull of the Recent”; Wills, 2007).
ACKNOWLEDGMENTS
We thank Michael Benton, Lewis Jacobs, and Norman MacLeod
for constructive reviews that helped us to significantly improve the
quality of this work. We also thank Frank DeNota and T. Michael
Keesey for allowing us to use silhouettes of several of their illustrations.
We are grateful to all those who contributed cladograms (wittingly or
otherwise) to make this work possible. MAW’s work on the comparison of cladistic and stratigraphic data was supported by BBSRC grant
BB/C006682/1.
R EFERENCES
Angielczyk, K. D. 2002. A character-based method for measuring the
fit of a cladogram to the fossil record. Syst. Biol. 51:137–152.
Angielczyk, K. D., and D. L. Fox. 2006. Exploring new uses for measures
of fit of phylogenetic hypotheses to the fossil record. Paleobiology
32:147–165.
Benton, M. J. 1987. Mass extinctions among families of non-marine
tetrapods: The data. Mémoire de la Société Géologique de France,
Numéro Spécial 150:21–32.
2008
WILLS ET AL.—MODIFIED GAP EXCESS RATIO
Benton, M. J. 1993. The fossil record 2. Chapman and Hall, London.
Benton, M. J. 1994. Paleontological data and identifying mass extinctions. Trends Ecol. Evol. 9:181–185.
Benton, M. J. 2001. Finding the tree of life: Matching phylogenetic trees
to the fossil record through the 20th century. Proc. R. Soc. Lond. B
Biol. 268:2123–2130.
Benton, M. J., and R. Hitchin. 1997. Congruence between phylogenetic
and stratigraphic data on the history of life. Proc. R. Soc. Lond. B
Biol. 264:885–890.
Benton, M. J., R. Hitchin, and M. A. Wills. 1999. Assessing congruence
between cladistic and stratigraphic data. Syst. Biol. 48:581–596.
Benton, M. J., M. A. Wills, and R. Hitchin. 2000. Quality of the fossil
record through time. Nature 403:534–537.
Brochu, C. A., and M. A. Norell. 2000. Temporal congruence and the
origin of birds. J. Vertebr. Paleontol. 20:197–200.
Butler, R. J. 2005. The “fabrosaurid” ornithischian dinosaurs of the Upper Elliot Formation (Lower Jurassic) of South Africa and Lesotho.
Zool. J. Linn. Soc. Lond. 145:175–218.
Butler, R. J., P. Upchurch, and D. B. Norman. 2008. The phylogeny of
the ornithischian dinosaurs. J. Syst. Palaeontol. 6:1–40.
Clack, J. A. 2006. The emergence of early tetrapods. Palaeogeog. Palaeoclimatol. Palaeoecol. 232:167–189.
Clark, J. M., T. Maryanska, and R. Barsbold. 2004. Therizinosauroidea.
Pages 151–164 in The Dinosauria, 2nd edition (D. B. Weishampel,
P. Dodson, and H. Osmólska, eds.). University of California Press,
Berkeley.
Clarke, J. A., D. T. Ksepkac, M. Stucchie, M. Urbinaf, N. Gianninig, S.
Bertellii, Y. Narváezj, and C. A. Boyda. 2007. Paleogene equatorial
penguins challenge the proposed relationship between biogeography, diversity, and Cenozoic climate change. Proc. Natl. Acad. Sci.
USA 104:11545–11550.
Clyde, W. C., and D. C. Fisher. 1997. Comparing the fit of stratigraphic
and morphologic data in phylogenetic analysis. Paleobiology 23:1–
19.
Cobbett, A., M. Wilkinson, and M. A. Wills. 2007. Fossils impact as
hard as living taxa in parsimony analyses of morphology. Syst. Biol.
56:753–766.
Crampton, J. S., A. G. Beu, R. A. Cooper, C. M. Jones, B. Marshall, and
P. A. Maxwell. 2003. Estimating the rock volume bias in paleobiodiversity studies. Science 301:358–360.
Dodson, P., C. A. Forster, and S. D. Sampson. 2004. Ceratopsidae. Pages
494–513 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson,
and H. Osmólska, eds.). University of California Press, Berkeley.
Erwin, D. H. 2007. Disparity: Morphological pattern and developmental context. Palaeontology 50:57–73.
Fara, E. 2001. What are Lazarus taxa? Geol. J. 36:291–303.
Farris, J. S. 1989. The retention index and the rescaled consistency index.
Cladistics 5:417–419.
Fastovsky, D. E., Y. F. Huang, J. Hsu, J. Martin-McNaughton, P. M.
Sheehan, and D. B. Weishampel. 2004. Shape of Mesozoic dinosaur
richness. Geology 32:877–880.
Finarelli, J. A., and W. C. Clyde. 2002. Comparing the gap excess ratio and the retention index of the stratigraphic character. Syst. Biol.
51:166–176.
Foote, M., and D. M. Raup. 1996. Fossil preservation and the stratigraphic ranges of taxa. Paleobiology 22:121–140.
Fortey, R. A., D. E. G. Briggs, and M. A. Wills. 1995. Decoupling
cladogenesis from morphological disparity. Biol. J. Linn. Soc. 57:13–
33.
Fountaine, T. M. R., M. J. Benton, G. J. Dyke, and R. L. Nudds. 2005.
The quality of the fossil record of Mesozoic birds. Proc. R. Soc. Lond.
B Biol. 272:289–294
Galton, P. M., and P. Upchurch. 2004a. Prosauropoda. Pages 232–258 in
The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H.
Osmólska, eds.). University of California Press, Berkeley.
Galton, P. M., and P. Upchurch. 2004b. Stegosauria. Pages 343–362 in
The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H.
Osmólska, eds.). University of California Press, Berkeley.
Gradstein, F. M., J. G. Ogg, and A. G. Smith. 2004. A geologic time scale
2004. Cambridge University Press, Cambridge.
Heard, S. B. 1992. Patterns in tree balance among cladistic, phenetic,
and randomly generated phylogenetic trees. Evolution 46:1818–
1826.
903
Hitchin, R., and M. J. Benton. 1997a. Congruence between parsimony
and stratigraphy: Comparisons of three indices. Paleobiology 23:20–
32.
Hitchin, R., and M. J. Benton. 1997b. Stratigraphic indices and tree
balance. Syst. Biol. 46:563–569.
Holtz, T. R. Jr. 2004. Tyrannosauroidea. Pages 111–136 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H.
Osmólska, eds.). University of California Press, Berkeley.
Holtz, T. R., Jr., R. E. Molnar, and P. J. Currie. 2004. Basal Tetanurae.
Pages 71–110 in The Dinosauria, 2nd edition (D. B. Weishampel,
P. Dodson, and H. Osmólska, eds.). University of California Press,
Berkeley.
Horner, J. R., D. B. Weishampel, and C. A. Forster. 2004. Hadrosauridae.
Pages 438–465 in The Dinosauria, 2nd edition (D. B. Weishampel,
P. Dodson, and H. Osmólska, eds.). University of California Press,
Berkeley.
Huelsenbeck, J. P. 1994. Comparing the stratigraphic record to estimates
of phylogeny. Paleobiology 20:470–483.
Kemp, T. S. 2007. The origin of higher taxa: Macroevolutionary processes, and the case of the mammals. Acta Zool. 88:3–22.
Langer, M. C., and M. J. Benton. 2006. Early dinosaurs: A phylogenetic
study. J. Syst. Palaeontol. 4:309–358.
Lieberman, B. S. 2002. Phylogentic analysis of some basal early Cambrian trilobites, the biogeographic origins of the Eutrilobita, and the
timing of the Cambrian radiation. J. Paleont. 76:692–708
MacLeod, K. G. 1994. Bioturbation, inoceramid extinction, and midMaastrichtian ecological change. Geology 22:139–142.
Marshall, C. R. 1997. Confidence intervals on stratigraphic ranges with
non-random distributions of fossil horizons. Paleobiology 23:165–
173.
Marshall, C. R. 2001. Confidence limits in stratigraphy. Pages 542–545 in
Palaeobiology II (D. E. G. Briggs and P. R. Crowther, eds.). Blackwell,
Oxford, UK.
Maryanska, T., R. E. Chapman, and D. B. Weishampel. 2004. Pachycephalosauria. Pages 464–477 in The Dinosauria, 2nd edition (D. B.
Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley.
Norell, M. A., and M. J. Novacek. 1992. Congruence between superpositional and phylogenetic patterns—Comparing cladistic patterns
with fossil records. Cladistics 8:319–337.
Norman, D. B. 2004. Basal Iguanodontia. Pages 413–437 in The
Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H.
Osmólska, eds.). University of California Press, Berkeley.
Novas, F. E. 1996. Dinosaur monophyly. J. Vertebr. Paleontol. 16:723–
741.
Paul, C. R. C. and S. K. Donovan. 1998. An overview of the completeness
of the fossil record. Pages 111–131 in The adequacy of the fossil record
(C. R. C. Paul and S. K. Donovan, eds). John Wiley, New York.
Pearson, P. N. 1999. Apomorphy distribution is an important aspect of
cladogram symmetry. Syst. Biol. 48:399–406.
Pisani, D., A. M. Yates, M. C. Langer, and M. J. Benton. 2002. A genuslevel supertree of the Dinosauria. Proc. R. Soc. Lond. B Biol. 269:915–
921.
Poe, S. 1998. Sensitivity of phylogeny estimation to taxonomic sampling. Syst. Biol. 47:18–31.
Pol, D., and M. A. Norell. 2001. Comments on the Manhattan stratigraphic measure. Cladistics 17:285–289.
Pol, D., and M. A. Norell. 2006. Uncertainty in the age of fossils and
the stratigraphic fit to phylogenies. Syst. Biol. 55:512–521.
Rauhut, O. W. M. 2003. The interrelationships and evolution of basal
theropod dinosaurs. Spec. Pap. Palaeontol. 69:1–213.
Sahney, S. and M. J. Benton. 2008. Recovery from the most profound
mass extinction of all time. Proc. R. Soc. Lond. B Biol. Sci. 275:759–765.
Senter, P. 2007. A new look at the phylogeny of Coelurosauria (Dinosauria: Theropoda). J. Syst. Palaeontol. 5:429–463.
Sereno, P. C. 1999. The evolution of dinosaurs. Science 284:2137–2147.
Siddall, M. E. 1996. Stratigraphic consistency and the shape of things.
Syst. Biol. 45:111–115.
Siddall, M. E. 1998. Stratigraphic fit to phylogenies: A proposed solution. Cladistics 14:201–208.
Slater, C. S. C. 2007. A study of supertree construction using mammalian phylogenies and information from the fossil record. Unpublished PhD Thesis, University of Cambridge.
904
SYSTEMATIC BIOLOGY
Smith, A. B. 2003. Making the best of a patchy fossil record. Science
301:321–322.
Smith, A. B. 2007. Marine diversity through the Phanerozoic: Problems
and prospects. J. Geol. Soc. 164:731–745.
Solow, A. R. 2003. Estimation of stratigraphic ranges when fossil finds are not randomly distributed. Paleobiology 29:181–
185.
Strauss, D., and P. M. Sadler. 1989. classical confidence intervals and
Bayesian probability estimates for ends of local taxon ranges. Math.
Geol. 21:411–427.
Turner, A. H., D. Pol, J. A. Clarke, G. M. Erickson, and M. A. Norell.
2007. A basal dromaeosaurid and size evolution preceding avian
flight. Science 317:1378–1381.
Upchurch, P., P. M. Barrett, and P. Dodson. 2004. Sauropoda.
Pages 259–322 in The Dinosauria, 2nd edition (D. B. Weishampel,
P. Dodson, and H. Osmólska, eds.). University of California Press,
Berkeley.
Upchurch, P., P. M. Barrett, and P. M. Galton. 2007. A phylogenetic
analysis of basal sauropodomorph interrelationships: Implications
for the origin of sauropod dinosaurs. Spec. Pap. Palaeontol. 77:57–
90.
Vickaryous, M. K., T. Maryanska, and D. B. Weishampel. 2004. Ankylosauria. Pages 363–392 in The Dinosauria, 2nd edition (D. B.
Weishampel, P. Dodson, and H. Osmólska, eds.). University of
California Press, Berkeley.
Wagner, P. J. 2000a. Exhaustion of morphologic character states among
fossil taxa. Evolution 54:365–386.
Wagner, P. J. 2000b. The quality of the fossil record and the accuracy
of phylogenetic inferences about sampling and diversity. Syst. Biol.
49:65–86.
Wagner, P. J., and C. A. Sidor. 2000. Age rank/clade rank metrics—
Sampling, taxonomy, and the meaning of “stratigraphic consistency.”
Syst. Biol. 49:463–479.
Wang, S. C., and P. Dodson. 2006. Estimating the diversity of dinosaurs.
Proc. Natl. Acad. Sci. USA 103:13601–13605.
Weishampel, D. B., P. M. Barrett, R. A. Coria, J. Le Loeuff, X. Xu,
X. J. Zhao, A. Sahni, E. M. P. Gomani, and C. R. Noto. 2004. Dinosaur distribution. Pages 517–606 in The Dinosauria, 2nd edition
(D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of
California Press, Berkeley.
Weishampel, D. B., P. Dodson, and H. Osmólska. 2004. The Dinosauria,
2nd Edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.).
University of California Press, Berkeley.
Wiens, J. J. 2003a. Incomplete taxa, incomplete characters, and phylogenetic accuracy: Is there a missing data problem? J. Vertebr. Paleontol.
23:297–310.
VOL. 57
Wiens, J. J. 2003b. Missing data, incomplete taxa, and phylogenetic
accuracy. Syst. Biol. 52:528–538.
Wiens, J. J. 2006. Missing data and the design of phylogenetic analyses.
J. Biomed. Informat. 39:34–42.
Wilkinson, M. 1994. Common cladistic information and its consensus
representation—Reduced Adams and reduced cladistic consensus
trees and profiles. Syst. Biol. 43:343–368.
Wilkinson, M. 1995. Coping with abundant missing entries in phylogenetic inference using parsimony. Syst. Biol. 44:501–514.
Wills, M. A. 1998. Crustacean disparity through the Phanerozoic:
Comparing morphological and stratigraphic data. Biol. J. Linn. Soc.
65:455–500.
Wills, M. A. 1999. Congruence between phylogeny and stratigraphy:
Randomization tests and the gap excess ratio. Syst. Biol. 48:559–
580.
Wills, M. A. 2001. How good is the fossil record of arthropods? An
assessment using the stratigraphic congruence of cladograms. Geol.
J. 36:187–210.
Wills, M. A. 2002. The tree of life and the rock of ages: Are we getting
better at estimating phylogeny? Bioessays 24:203–207.
Wills, M. A. 2007. Fossil ghost ranges are most common in some of the
oldest and some of the youngest strata. Proc. R. Soc. Lond. B Biol.
274:2421–2427
Wilson, J. A. 2002. Sauropod dinosaur phylogeny: Critique and cladistic
analysis. Zool. J. Linn. Soc. Lond. 136:217–276.
Xu, X., P. J. Makovicky, X.-L. Wang, M. A. Norell and H.-L. You. 2002.
A ceratopsian dinosaur from China and the early evolution of Ceratopsia. Nature 416:314–317.
Xu, X., X. J. Zhao, and J. M. Clark. 2001. A new therizinosaur from
the Lower Jurassic Lower Lufeng Formation of China. J. Vertebr.
Paleontol. 21:477–483.
Yates, A. M. 2007. The first complete skull of the Triassic dinosaur
Melanorosaurus Haughton (Sauropodomorpha: Anchisauria). Spec.
Pap. Palaeontol. 77:9–55.
Yates, A. M., and J. W. Kitching. 2003. The earliest known sauropod
dinosaur and the first steps towards sauropod locomotion. Proc. R.
Soc. Lond. B Biol. 270:1753–1758.
Zhao, X. J., and X. Xu. 1998. The oldest coelurosaurian. Nature 394:234–
235.
Zhou, Z. H. 2004. The origin and early evolution of birds: Discoveries, disputes, and perspectives from fossil evidence. Naturwissenschaften 91:455–471.
First submitted 8 January 2008; reviews returned 25 March 2008;
final acceptance 25 August 2008
Associate Editor: Norman MacLeod