Syst. Biol. 57(6):891–904, 2008 c Society of Systematic Biologists Copyright ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150802570809 The Modified Gap Excess Ratio (GER*) and the Stratigraphic Congruence of Dinosaur Phylogenies M ATTHEW A. WILLS ,1 PAUL M. B ARRETT ,2 AND J ULIA F. HEATHCOTE3 2 1 Department of Biology and Biochemistry, The University of Bath, The Avenue, Claverton Down, Bath BA2 7AY, UK; E-mail: [email protected] Department of Palaeontology, The Natural History Museum, Cromwell Road, South Kensington, London SW7 5BD, UK; E-mail: [email protected] 3 Department of Earth Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, UK Abstract.— Palaeontologists routinely map their cladograms onto what is known of the fossil record. Where sister taxa first appear as fossils at different times, a ghost range is inferred to bridge the gap between these dates. Some measure of the total extent of ghost ranges across the tree underlies several indices of cladistic/stratigraphic congruence. We investigate this congruence for 19 independent, published cladograms of major dinosaur groups and report exceptional agreement between the phylogenetic and stratigraphic patterns, evidenced by sums of ghost ranges near the theoretical minima. This implies that both phylogenetic and stratigraphic data reflect faithfully the evolutionary history of dinosaurs, at least for the taxa included in this study. We formally propose modifications to an existing index of congruence (the gap excess ratio; GER), designed to remove a bias in the range of values possible with trees of different shapes. We also propose a more informative index of congruence—GER∗ —that takes account of the underlying distribution of sums of ghost ranges possible when permuting stratigraphic range data across the tree. Finally, we incorporate data on the range of possible first occurrence dates into our estimates of congruence, extending a procedure originally implemented with the modified Manhattan stratigraphic measure and GER to our new indices. Most dinosaur data sets maintain extremely high congruence despite such modifications. [Dinosaurs; fossil record; gap excess ratio; phylogeny; randomization; stratigraphic congruence.] Evidence for the evolutionary history of most groups derives from two independent sources. The first is the distribution of phylogenetically informative characters or markers in extant and extinct taxa. The second is the stratigraphic or temporal sequence in which taxa occur as fossils. Neither source of data can be read uncritically, and both require interpretation. Phylogenies incorporate assumptions concerning rooting and models of evolution. The resulting trees are therefore inferences rather than data. Fossils require varying degrees of interpretation depending upon the nature of the material, and dates may be subject to large margins of error. For these reasons, it is often desirable to compare inferences by mapping cladograms onto stratigraphic range charts (Norell and Novacek, 1992; Benton and Hitchin, 1997; Clyde and Fisher, 1997; Wagner and Sidor, 2000; Angielczyk and Fox, 2006; Pol and Norell, 2006). Where the phylogenetic and temporal inferences are concordant, they offer mutual corroboration, and both can be assumed reasonably to reflect the true, underlying evolution of the group. However, where the order of phylogenetic branching conflicts with the stratigraphic order, additional evidence is needed to determine which most closely reflects reality (Wills, 2007). Indications of problems with the phylogeny include the availability of multiple, conflicting cladograms for the same group in the literature (Wills, 1999, 2001) and poorly supported trees that are liable to radical branch shifts with minor perturbations of the data (Cobbett et al., 2007). Indications of problems with the stratigraphic data include a particularly patchy and fragmentary fossil record (e.g., Crampton et al., 2003; Smith, 2003), with dates of first occurrences that have been particularly labile with research time and effort (e.g., Benton, 2001; Wills, 2002). As well as focusing on particular cladograms and specific groups, previous comparisons between phylogenetic and stratigraphic inferences have investigated large samples of trees, thereby attempting to make generalizations about patterns of congruence across taxa or through time. Some groups (e.g., tetrapods and fish; Benton and Hitchin, 1997; Hitchin and Benton, 1997a; Wills, 2001) match the fossil record comparatively well, whereas others (e.g., arthropods: Wills, 2001) match the record comparatively poorly. Cladograms for some groups—notably the crustaceans (Wills, 1998)—are actually less congruent with the fossil record than with a significant majority (>99%) of randomly permuted range assignments. In cases of very poor congruence, it is difficult to make inferences about the quality of the fossil record or the accuracy of cladograms, because neither is known with certainty, and there are many conflated variables. For example, although congruence is higher through most of the Mesozoic relative to the Palaeozoic and the Cenozoic (Wills, 2007), the taxonomic composition of the biota changes markedly through time. The early Palaeozoic is dominated by invertebrates, particularly arthropods. These are known to have relatively low fossilization potential and offer systematists smaller and more homoplastic character sets (Wagner, 2000a). The Late Triassic and Early Jurassic, in contrast, witness the origin and early radiation of several extremely well-studied groups of vertebrates, including mammals and dinosaurs. These typically have much higher fossilization potential and provide larger numbers of less homoplastic characters. Alternatively, the relatively poor congruence of arthropod trees may result from either an arthropod fossil record that is less complete than that of other groups (Wills, 2001) or because arthropod cladograms are less accurate. Previous piecemeal investigations of a modest number of dinosaur clades have demonstrated remarkable congruence between phylogeny and stratigraphy (Brochu and Norell [2000] and Rauhut [2003] for theropods; 891 892 SYSTEMATIC BIOLOGY Wilson [2002] for sauropods; Pol and Norell [2006] for basal sauropodomorphs). However, it is unclear whether this is simply a sampling effect given the trees so far investigated or whether exceptional congruence is a feature of dinosaur cladograms more generally. Here we provide the first systematic investigation of congruence between the dinosaur stratigraphical record and phylogeny on the basis of a sample of recently published dinosaur cladograms. By applying modified congruence indices to these data, we demonstrate that in half of our sampled cases, the fit of cladograms to the fossil record is as good as it could possibly be, whereas in half of the remaining cases, the fit is within 10% of the theoretical maximum. We also adapt a procedure implemented by Pol and Norell (2006) for incorporating uncertainty in the dating of the earliest occurring fossils. Even allowing for dating errors and artificially selecting the least congruent combinations of possible first occurrence dates, a quarter of dinosaur trees were still perfectly congruent, with another quarter within 10% of this optimum. M ATERIAL AND M ETHODS The Data Set Nineteen higher-level cladograms of dinosaurs covering all of the major groups were sourced from papers and books (notably, multiple author contributions in Weishampel et al., 2004a) published in the last 5 years (Table 1). In many cases, these were revisions and/or extensions of older trees published by the same or overlapping sets of authors. Our sample is therefore representative of the current, revised state of dinosaur phylogenetics. Stratigraphic dates for the terminals were primarily derived from Weishampel et al. (2004b). Our stratigraphic ranges spanned 27 stages, from the Ladinian in the Middle Triassic (approximately 237 Ma) to the Maastrichtian (65.5 Ma), with some taxa (avian dinosaurs) also extending into the Paleocene (Gradstein et al., 2004). Each stage was further divided into lower, middle, and upper, yielding 81 stratigraphic intervals in total. First occurrences were dated in two ways. For “static” datings, we used the best estimates. For “uncertain” datings, we recorded all intervals, or the range of intervals, over which the earliest fossils may have occurred. Uncertainty of this kind derives from either or both of two sources. In the first, the assignment of the oldest fossil occurrence may be contentious (e.g., an undiagnostic, fragmentary, poorly preserved, or uncertainly accredited specimen), with better and more archetypal material occurring later. In the second, the identity of the oldest fossil(s) may be known with confidence, but the absolute age and stratigraphic position of this material may be uncertain. For example, the oldest fossils might date indubitably from the Late Jurassic (with the next oldest exemplars occurring in the Cretaceous), but their precise stratigraphic position within the Oxfordian, Kimmeridgian, and Tithonian stages may be unknown. We have not sought to distinguish between these sources of uncertainty here. Tables of stratigraphic ranges are provided in online Appendix 1 (www.systematicbiology.org). VOL. 57 Indices of Congruence: Modifications to the Gap Excess Ratio (GER) A variety of indices have been proposed to quantify the congruence between phylogenetic and stratigraphic signals. These have been reviewed in some detail elsewhere (Siddall, 1996, 1998; Benton et al., 1999; Hitchin and Benton, 1997a, 1997b; Wills, 1999; Wagner and Sidor, 2000). Common to many of these is the identification of ghost ranges: portions of geological time through which a lineage is inferred to have existed but for which there is no direct evidence. These may be measured in absolute time (millions of years) or in stratigraphic units (however defined). If all of the terminals in a cladogram are monophyla, then ghost ranges are inferred wherever sister terminals or sister clades first appear in the fossil record at different times (Fig. 1). A direct or indirect tally of these inferred ghost ranges or gaps across the entire tree underlies several conceptually related congruence indices, including the gap excess ratio (GER; Wills, 1999), the Manhattan stratigraphic measure (MSM*; Siddall, 1998; Pol and Norell, 2001), the retention index of the stratigraphic character (Farris, 1989; Finarelli and Clyde, 2002), and the relative completeness index (RCI; Benton, 1994; although this last is more properly a measure of the completeness of the fossil record rather than a measure of congruence per se; Fig. 1). The sum of ghost ranges is denoted as the minimum implied gap (MIG in Benton [1994] or simply the MIG in Wills [1999]). The MIG alone says little about congruence, because its size depends upon the number of taxa in the cladogram as well as their stratigraphic distribution. The MIG therefore needs to be scaled between some maximum and minimum. The GER (Wills, 1999) is widely used and behaves well in simulations (Finarelli and Clyde, 2002). It scales the MIG (measured in absolute time) between the sum of ghost ranges obtained for the best (Gmin ) and worst (Gmax ) fits of a given set of stratigraphic data onto any tree topology. GER = 1 − (MIG − Gmin )/(Gmax − Gmin ) (1) In practice, Gmin is equivalent to the difference in age between the oldest origin and the youngest origin, whereas Gmax is equivalent to the sum of the differences in age between the oldest origin and the dates of origin of all other taxa. However, for most nonpectinate tree topologies, values of MIG can never reach Gmin or Gmax , and hence GER values can never reach 0.0 or 1.0 (Fig. 1). The MSM∗ is conceptually related to the GER but is calculated using an irreversible stepmatrix character (Fig. 1) within parsimony programs. It is determined by dividing the minimum possible number of stratigraphic character steps by the observed number of steps. Hence, the measure is not scaled relative to any theoretical maximum number of steps and so can never reach zero (except in the trivial case where all taxa originate in the same interval). Like the GER, it is also sensitive to differences in tree balance, with the maximum theoretical values for nonpectinate trees typically being less than 1.0 (Fig. 1). 893 Group 0.9561 (0.9326) 0.9468 (0.9259) 0.9101 (0.8991) 0.8971 (0.8876) 0.8963 (0.8473) 0.8442 (0.7987) 0.8044 (0.8040) 0.7820 (0.8125) 0.7813 (0.7787) 0.7811 (0.7912) 0.7683 (0.7682) 0.7582 (0.7592) 0.7308 (0.7792) 0.7101 (0.7269) 0.6780 (0.6618) 0.6203 (0.6315) 0.5789 (0.4894) 0.5674 (0.6266) 0.5579 (0.5355) 0.8036 (0.7799) 0.8957 (0.8817) 0.6321 (0.6818) 0.6338 (0.6736) 0.7847 (0.7990) 23 23 40 44 49 Fixed dates 19 12 27 23 14 16 36 40 26 44 38 16 18 21 49 23 10 34 32 Taxa a Notes: Tree of Upchurch et al. (2004), pruning all taxa not in common with the phylogeny of Wilson (2002). b Tree of Wilson (2002), pruning all taxa not in common with the phylogeny of Upchurch et al. (2004). c Tree of Holtz et al. (2004), with the addition of the problematic fossil Eshanosaurus. d Tree of Rauhut (2003), with the addition of the problematic fossil Eshanosaurus. e Tree of Holtz et al. (2004), with the addition of nine tyrannosauroid taxa to the tree. Trees as originally published Horner et al. (2004) Hadrosauridae Maryanska et al. (2004) Pachycephalosauria Wilson (2002) Sauropoda Vickaryous et al. (2004) Ankylosauria Dodson et al. (2004) Ceratopsia Norman (2004) Iguanodontia (basal) Upchurch et al. (2004) Sauropoda Holtz et al. (2004) Tetanurae (basal) Upchurch et al. (2007) Sauropodomorpha (basal) Rauhut (2003) Theropoda (basal) Butler et al. (2008) Ornithischia (basal) Xu et al. (2002) Ceratopsia (basal) Yates and Kitching (2003) Sauropoda Butler (2005) Ornithischia Senter (2007) Paraves + Oviraptorosauria Galton and Upchurch (2004a) Prosauropoda Galton and Upchurch (2004b) Stegosauria Yates (2007) Sauropodomorpha (basal) Turner et al. (2007) Paraves Trees above with modifications (see notes) a Upchurch et al. (2004) Sauropoda Wilson (2002)b Sauropoda Holtz et al. (2004)c Tetanurae (basal) Rauhut (2003)d Theropoda (basal) e Tetanurae (basal) Holtz et al. (2004) Authors 0.8027 0.8850 0.6139 0.6379 0.7883 0.8242 0.9230 0.6506 0.6654 0.8106 0.9736 0.9468 0.9201 0.9387 0.9044 0.8433 0.8500 0.7884 0.8008 0.7955 0.7760 0.7973 0.7927 0.7271 0.7430 0.6628 0.6615 0.6903 0.6233 Max. Min. 0.7878 0.8529 0.5874 0.6162 0.7693 0.9474 0.7673 0.8912 0.8772 0.8871 0.7860 0.7900 0.7351 0.6833 0.7635 0.7554 0.7465 0.6447 0.6815 0.6230 0.5195 0.1915 0.4792 0.5050 Variable dates 0.9583 0.9189 0.9042 0.9009 0.8959 0.8483 0.8138 0.7589 0.7471 0.7783 0.7653 0.7679 0.7206 0.7011 0.6816 0.5965 0.4550 0.5844 0.5629 Mean GER 1.0000 1.0000 0.7699 1.0000 1.0000 1.0000 0.9740 1.0000 1.0000 0.9167 0.9076 1.0000 1.0000 1.0000 1.0000 1.0000 0.7550 0.9120 1.0000 0.9065 0.7204 0.6111 1.0000 0.5711 dates Fixed 1.0000 1.0000 0.8421 0.9999 1.0000 1.0000 0.9716 1.0000 1.0000 0.9588 0.9131 1.0000 0.9989 1.0000 1.0000 1.0000 0.7683 0.9960 0.9998 0.9967 0.8778 0.5336 0.9999 0.6869 Mean 1.0000 1.0000 0.9866 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.8005 1.0000 1.0000 1.0000 1.0000 0.8235 1.0000 0.8099 Max. Min. 1.0000 1.0000 0.6841 0.9577 0.9847 0.9795 0.8085 1.0000 1.0000 0.8750 0.8555 1.0000 0.9071 0.9863 1.0000 1.0000 0.7458 0.7923 0.8795 0.8473 0.6628 0.2195 0.9101 0.5534 Variable dates GERt 1.0000 1.0000 0.9893 1.0000 1.0000 1.0000 0.9980 1.0000 1.0000 0.9923 0.9992 1.0000 1.0000 1.0000 1.0000 1.0000 0.9980 0.9998 1.0000 0.9998 0.9896 0.8463 1.0000 0.8785 dates Fixed 0.999 0.999 0.983 0.999 0.999 0.999 0.994 0.999 0.999 0.990 0.998 0.999 0.999 0.999 0.999 0.999 0.997 0.998 0.999 0.998 0.990 0.781 0.999 0.911 Mean 0.999 0.999 0.998 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.998 0.999 0.965 0.999 0.976 Max. 0.998 0.998 0.958 0.998 0.998 0.999 0.952 0.999 0.999 0.952 0.992 0.999 0.994 0.999 0.999 0.999 0.988 0.999 0.999 0.994 0.966 0.485 0.999 0.843 Min. Variable dates GER∗ TABLE 1. A variety of congruence indices for 19 dinosaur trees as originally published, plus experimental manipulations of 4 of those trees (see footnotes). Stratigraphic range data principally from Weishampel et al. (2004b). All indices calculated assuming stratigraphic intervals of unit length, except values in parentheses for the gap excess ratio (GER), which used absolute ages. Topological GER (GERt) and modified gap excess ratio (GER∗ ) values for fixed dates are based on 50,000 randomizations of stratigraphic data across each topology. Mean, maximum, and minimum GER, GERt, and GER∗ values for variable dates are based on 1000 random perturbations of stratigraphic dates within the stipulated range of possibilities, followed by 1000 randomizations of those dates across each perturbed tree (1,000,000 permutations for each original tree). 894 SYSTEMATIC BIOLOGY VOL. 57 Huelsenbeck, 1994), is calculated simply as the fraction of internal cladogram nodes with sister nodes of equivalent or greater age. This is strongly biased by tree topology (Wills, 1999). In order to overcome biases in the GER related to tree balance, we therefore propose a modification to the index, scaling the observed sum of ghost ranges between its maximum and minimum possible values on a given tree topology (rather than on any topology; topological GER or GERt): GERt = 1 − (MIGu − Gtmin )/ (Gtmax − Gtmin ) FIGURE 1. Existing indices of stratigraphic congruence (SCI, MSM*, and GER) are biased by tree shape. (a) Cladogram of nine taxa, A to I. (b) Stratigraphic ranges for taxa A to I, along with the corresponding Manhattan stepmatrix, used as an irreversible stratigraphic character in calculating the MSM*. (c) Mapping of the tree in (a) onto the stratigraphic ranges in (b). Ghost ranges are indicated by grey vertical bars. Note that the maximum possible SCI value (the fraction of internal nodes with sister nodes as old or older) is obtained for this tree, compared with lower values of the MSM* and GER. The stratigraphic character is optimized to “evolve” in parallel along two major branches of the tree, hence the moderate value for MSM*. (d) The best possible fit of stratigraphic ranges to the tree in (a), with a GERt of 1.00 by definition. This is obtained by rearranging the stratigraphic data over the terminals, whilst leaving the topology unchanged. Note that the MSM* and GER have values below 1.00, despite the impossibility of obtaining a higher value given a tree of this shape. (e) The worst possible fit of stratigraphic ranges to the tree in (a), with a GERt of 0.00 by definition. Although this corresponds to a GER of 0.00, the lowest possible MSM* value is 0.25. Values of GER∗ shown here are based on 10,000 random permutations. See text for further details. (a) and (b) are partly derivative of Slater (2007). According to Finarelli and Clyde (2002), the retention index of the stratigraphic character is equivalent to the GER and is therefore susceptible to the same biases. An unrelated index, the stratigraphic consistency index (SCI; (2) where MIGu is the sum of ghost ranges for stratigraphic intervals of unit length, and Gtmax and Gtmin are the maximum and minimum possible values of MIGu on the given topology. Although Benton (1994) and Wills (1999) calculated MIG values by summing the lengths of ghost ranges calculated in millions of years, this effectively incorporates the assumption that preservation potential is uniform with time, which is often unwarranted (Finarelli and Clyde, 2002). The MIGu is therefore calculated as the sum of the number of stratigraphic intervals traversed (effectively scaling all to a unit length; Figs. 1 and 2). This assumes that the intervals are at or near the maximum level of stratigraphic resolution for the groups concerned, with the boundaries between intervals being demarcated by distinct events for appropriate correlation. (In fact, we consider that both approaches [measuring intervals in absolute time, or in unit length] are defensible.) Computationally, it is difficult to determine Gtmax and Gtmin with certainty, because the problem is NP-complete. Permuting the assignment of range data over the tree offers a simple way to estimate them. However, because the distribution of possible MIGu values usually has long tails (particularly towards Gtmin ), these estimates are likely to be within the true values (Fig. 3). Moreover, because the distribution is often strongly negatively skewed, Gtmin is more likely to be overestimated than Gtmax is likely to be underestimated, resulting in overestimates of GERt. This is because there are relatively few ways of permuting the data that will yield the minimum possible MIGu value or something closely approximate to it. In order to compute indices in a reasonable time, we estimated Gtmin , Gtmax , and hence GERt from 50,000 permutations of the stratigraphic data in each case. A more refined index is therefore formally proposed, based on an estimate of the underlying distribution of randomized MIGu values, rather than estimates of Gtmax and Gtmin alone. The modified GER (GER∗ ) is estimated by calculating the fraction of the area under a curve of permuted values corresponding to a MIGu value greater than the observed value (Fig. 3). The compliment of GER∗ values also approximate to the probability that an observed level of stratigraphic congruence is obtained purely by chance. Unlike GERt, estimates of GER∗ are much less sensitive to the number of permutations used. They also take the shape of the distribution 2008 WILLS ET AL.—MODIFIED GAP EXCESS RATIO 895 FIGURE 2. The relationship between GER and GERt for a cladogram of sixteen predominantly basal ceratopsian dinosaur taxa and outgroups (Xu et al., 2002). (a) Published cladogram. (b) Observed ranges for the fossil taxa in (a) are plotted as vertical black bars, spanning between the interval in which each taxon first appears in the fossil record and the last interval before its extinction. Ghost ranges are inferred between all pairs of sister taxa that originate at different times and are plotted as vertical grey bars. These ghost ranges have a duration that can be measured as a number of millions of years (Benton, 1994; Wills, 1999) or as the number of stratigraphic intervals traversed (Siddall, 1998; Finarelli and Clyde, 2002; this work). Summing ghost ranges over the tree yields the minimum implied gap (MIGu) for a given cladogram. In this example, all stratigraphic intervals have been ascribed unit length, such that MIGu = MIG. Where taxa originate at different times, a certain number of ghost ranges will be necessary as an absolute minimum in order to unite all observed ranges into a tree structure. Similarly, there is a maximum possible sum of ghost ranges for a given distribution of observed ranges. The gap excess ratio (GER; Wills, 1999) scales the observed MIG between (c) the maximum possible sum of ghost ranges on any tree (Gmax ; equivalent to a GER of 0.0) and (d) the minimum possible sum of ghost ranges on any tree (Gmin ; equivalent to a GER of 1.0). However, balanced and imperfectly balanced trees (those that depart from a pectinate structure) often cannot yield GER values of 0.0 or 1.0, irrespective of how taxa are rearranged across them. This means that balanced trees are scaled over a narrower range of possible GER values than pectinate ones. The topological GER (GERt) is analogous to the GER, except that the MIGu value is scaled between (d) the maximum (Gtmax ) and (e) the minimum (Gtmin ) sum of ghost ranges possible for a given set of observed ranges distributed over a given topology. 896 SYSTEMATIC BIOLOGY VOL. 57 FIGURE 3. The distribution of possible sums of ghost ranges for a given set of observed ranges on a given tree topology can be approximated empirically using randomization. For this data set (Xu et al., 2002), it reveals that there are relatively few ways to obtain the smallest MIGu (or something closely approximate to it) but many more ways to obtain the largest MIGu. The modified GER (GER∗ ) takes account of the shape of this distribution and is approximated by the fraction of randomized values yielding a MIGu greater than or equal to that of the original data. For negatively skewed distributions (such as that here), the GER∗ value will be greater than the corresponding GERt value. Values of Gmin and Gmax that define the GER lie outside of the distribution of possible MIGu values. The GER is depressed relative to the GERt and GER∗ . of MIGu values directly into account. As with the GERt, GER∗ indices were estimated from 50,000 permutations of stratigraphic ranges for each data set. However, highly similar values were obtained using just 1,000 permutations, with the correlation between GER∗ values over all 19 data sets being very high (r = 0.995, P < 0.001). Uncertain Dates of First Occurrence For most measures of congruence, times of first occurrence are considered as single, invariant dates, irrespective of our confidence in those dates. In many cases, however, first occurrences are not known with confidence, or a range of possible first dates is considered by different authorities. Differences in interpretation therefore have the potential to make differences to GER, GERt, and GER∗ values. These differences can become large and significant when the range of first occurrence dates for a particular taxon or subset of taxa is large relative to the total range of first occurrence dates over all taxa. Pol and Norell (2006) proposed a randomization procedure for estimating the shape of a distribution of MSM* or GER values obtained for a set of uncertain or certain and uncertain first occurrence dates. Their approach was to generate a large number of virtual data sets in which the actual first occurrence date of a given taxon was selected at random, and with equal probability, from all the intervals between its oldest and youngest putative date of first occurrence. (We note, in this context, that the method is readily modified to incorporate different probabilities of first occurrence dates within the total range.) These data sets were then analyzed by conventional means in order to estimate the distribution (Fig. 4). A precisely analogous approach is implemented here for the GERt and GER∗ . We report a variety of statistics for the randomized distribution of each metric. We are particularly 2008 WILLS ET AL.—MODIFIED GAP EXCESS RATIO 897 FIGURE 4. The distribution of GER values obtained for 10,000 permutations of variable first occurrence dates for the tree of Xu et al. (2002). The value obtained from the original “static” stratigraphic data (0.758) is close to the modal value for “variable” dates (0.756) but considerably less than the median (0.767) and the mean (0.768). Hence, the uncertain dates of first occurrence tend to yield higher GER values than the original data in this case. A similar procedure is implemented for both the GERt and the GER∗ . interested in whether the “worst” possible perturbations of first occurrence dates across all taxa yield indices that differ markedly from either the mean value or the value for fixed dates. For practical purposes, we implemented 1,000 permutations of flexible first occurrence dates. For each of these, we ran 1,000 permutations of the stratigraphic data across the topology. It is not possible to circumvent the repeated search for Gtmax and Gtmin with each of the 1,000 “adjustments” of first occurrence dates (by, for example, implementing a single, more thorough search) because these may differ for each of what is effectively a new data set. With such a modest number of iterations, GERt values must be treated with caution. GER∗ values are more likely to provide accurate approximations. R ESULTS Over all 19 data sets, congruence was extremely high (Table 1). The average GER (Wills, 1999) for static first occurrence dates was 0.767, with the best data set being the Hadrosauridae (Horner et al., 2004; 0.956) and the worst being the Paraves (Turner et al., 2007; 0.558). The GERt (scaling the MIGu between the maximum and minimum possible on the observed tree topology) returned even higher values, with an average of 0.909. Ten of the data sets had a GERt of 1.000, whereas five of the remaining nine had values over 0.900. The least congruent data sets were the Paraves (Turner et al., 2007; GERt = 0.571), Stegosauria (Galton and Upchurch, 2004b; GERt = 0.611), Prosauropoda (Galton and Upchurch, 2004a; GERt = 0.720), and Ceratopsia (Xu et al., 2002; GERt = 0.755). GER∗ values were the highest of all, with a mean of 0.984. Ten data sets (65%) had a GER∗ of 0.99998 (the highest possible for 50,000 permutations), with only two (Galton and Upchurch, 2004b; Turner et al., 2007) having a GER∗ less than 0.975. Introducing limited uncertainty into the dates of first occurrence produced mean values for most data sets that were generally very similar to those from static datings (GER: r = 0.976, P < 0.0005; GERt: r = 0.923, P < 0.0005; GER∗ : r = 0.954, P < 0.0005). One notable exception was the set of GER values for the Stegosauria (0.455 and 0.579, respectively). Across all 19 data sets, mean GER incorporating uncertainty (Fig. 5) was 0.756 (just 0.011 less than the static value). Even allowing for the worst possible datings, mean GER was only reduced to 0.699. Eleven data sets had their lowest GER above 0.700, whereas only two (Galton and Upchurch [2004b] for Stegosauria [0.192] and Yates [2007] for Sauropodomorpha [0.479]) had a value less than 0.500. GERt values were actually slightly higher for uncertain dates than for static dates, with a mean of 0.932 (a difference of 0.023). Seven data sets (37% of the sample) had a mean GERt of 1.000 (and so therefore also a minimum GERt of 1.000) and were perfectly congruent, even allowing for the vagaries of preservation. Of the remaining 12, just 5 had a mean GERt less than 0.950. Again, the worst possible datings still yielded a high mean GERt (0.843). Remarkably, however, the best possible datings yielded perfect congruence (GERt = 1.000) for 16 of the 19 data sets. Only for the trees of Galton and Upchurch (2004b), Turner et al. (2007), and Xu et al. (2002) was it impossible to adjust dates within the margin of uncertainty so as to achieve a perfect match (maximum GERt values of 0.824, 0.810, and 0.801, respectively). GER∗ values for variable dates had a mean of 0.981, with 10 of 19 data sets having a mean (and minimum) 898 SYSTEMATIC BIOLOGY VOL. 57 FIGURE 5. Box and whisker plots of the distribution of GER values obtained for 1000 permutations of variable first occurrence dates. Data from the 19 published dinosaur trees are arranged in order of median GER. of 0.999 (the highest value possible for 1,000 randomizations). Even the worst possible datings yielded a mean of 0.956, with just two data sets below 0.950. The interpretation of the GER∗ is more straightforward than that of the GER or GERt. GER∗ values above 0.500 indicate greater congruence than the median of the null distribution, and values above 0.975 indicate significantly greater congruence than the null distribution (with P ≤ 0.050, not correcting for multiple tests). For the static first occurrence dates, 16 of 17 data sets were significantly congruent, with just two being indistinguishable from random. Preliminary reanalysis of the 1,000 animal and plant “static” data sets utilized by Benton et al. (2000) and Wills (2007) yielded an average GER∗ of 0.688, with only 34% attaining a GER∗ > 0.975. This strongly suggests that the congruence of dinosaur phylogenies is better than that for a large sample of data sets across a range of other taxa. The greatest discrepancies between static GERt and GER∗ values are for the data sets of Paraves (Turner et al., 2007; a difference of 0.308), Prosauropoda (Galton and Upchurch, 2004a; a difference of 0.269), Ceratopsia (Xu et al., 2002; 0.243), and Stegosauria (Galton and Upchurch, 2004b; 0.235). These produced particularly skewed distributions of MIGu values upon permutation, such that there were very few ways of achieving close to the highest congruence and many ways to achieve close to the poorest congruence (Fig. 3). All other differences were less than 0.100. Such sensitivity to the underlying distribution of possible MIGu values is a desirable property for an index of congruence. D ISCUSSION The majority of the dinosaur cladograms investigated have excellent stratigraphic congruence, and several have the theoretical maximum congruence. Strong agreement between the phylogenetically inferred order of lineage branching and the known stratigraphic order of first occurrences strongly suggests that both reflect the true history and evolution of the group. This implies that our knowledge of the dinosaur fossil record at the taxonomic levels investigated (predominantly species and genera), and the stratigraphic level of stages is adequate 2008 WILLS ET AL.—MODIFIED GAP EXCESS RATIO for palaeontologists to investigate pertinent macroevolutionary patterns and processes (e.g., the origin of birds [Brochu and Norell, 2000; Zhou, 2004] as well as patterns of dinosaur diversity through time [Fastovsky et al., 2004; Wang and Dodson, 2006]) with some confidence. Other explanations for such exceptionally strong congruence appear unlikely in this context. It is possible, for example, that systematists might code their data in such a way as to yield stratigraphically congruent trees a priori. This would require some considerable skill and disingenuity, and there is no evidence that coded character distributions have been more biased by stratigraphic considerations in our sample of dinosaur trees than in any other group. A less potent but more credible possibility concerns outgroup selection. Although the oldest taxa may well be the most plesiomorphic (to precisely the same extent that we expect the phylogenetic and stratigraphic patterns to concur), it does not follow that the outgroup can always be identified reliably on the grounds of antiquity. Factoring chronological age into decisions concerning rooting “contaminates” phylogenetic inferences with the stratigraphic signal, making them nonindependent. This is a more plausible contingency than wholesale biases in coding practice, requiring just one assumption that may contributes to rooting decisions in a manner that is difficult to quantify in the absence of an explicit, larger “parent” tree. However, the relationships of the major dinosaur subclades appear to have converged on a robust consensus, with most nodes being well supported (Pisani et al., 2002). There are therefore excellent ancillary phylogenetic grounds for the choice of outgroup taxa in most or all cases. Moreover, it is unclear why such a putative problem should uniquely afflict the systematics of the nonavian Dinosauria, a group that appears to offer a wealth of morphological character data (Novas, 1996; Langer and Benton, 2006). Problems of this nature might more reasonably be expected to afflict groups whose wider relationships were less certain (e.g., most major subclades of arthropods; Wills, 1999) and which systematists struggle to root by any means. The Relationship Between Cladistic/Stratigraphic Congruence and the Completeness of the Fossil Record There is no single definition of fossil record “completeness” or “quality,” still less an agreed metric to quantify it (Paul and Donovan, 1998). The congruence between cladistic and stratigraphic order is indicative of just one aspect of completeness. At the broadest level, we can distinguish between organismal, taxonomic, palaeobiogeographical, and stratigraphic incompleteness, although each has a bearing on all of the others. Organismal incompleteness refers to the many characters of extinct species that are not amenable to fossilization, in addition to those that, for particular species, happen not to have been preserved. The lack of information on the limbs of most arthropods and the feathers of most birds contribute to incompleteness of this sort. Such “missing data” is cited as a problem in cladistic studies that include fossils purportedly inflating the number of 899 most parsimonious trees found and thereby obfuscating apparent relationships. Simulations (Wiens, 2003b) demonstrate that it is the number of coded characters (as opposed to the number of missing entries) that contributes most strongly to a lack of resolution, whereas empirical studies identify no significant difference between the behavior of fossil taxa and their extant counterparts in this regard (Cobbett et al., 2007). Organismal incompleteness is not necessarily a problem for establishing first (or last) occurrence dates, because even a single fragmentary or poorly preserved specimen with diagnostic characters can establish the presence of a group at a particular time (e.g., MacLeod, 1994). Such incompleteness may be more problematic when attempting to determine if a fossil without diagnostic characters belongs to a particular stem lineage. Taxonomic incompleteness refers to the inference that not all taxa have been preserved as fossils and that not all of these will have been discovered and documented. This bears on the issue of taxon sampling, which may influence phylogenetic inference and the apparent stratigraphic congruence of trees in complex ways (Wagner and Sidor, 2000). Trees are likely to be less complete as the taxonomic level of their terminals decreases; all other things being equal, the chances of a species leaving no record is greater than that for a genus, family, or order. This also implies that first occurrence dates will be less reliable descending the taxonomy. It is for this reason that most macroevolutionary studies of diversity and disparity through time focus on higher taxa (e.g., Wills, 1998; Erwin, 2007; Sahney and Benton, 2008). Nonetheless, many sources of potential bias remain (Smith, 2007). The trees in this study are predominantly at the generic level. As such, many are likely to be incomplete to some degree. Indeed, sophisticated extrapolations of the rates of discovery of new dinosaur genera (Wang and Dodson, 2006) using abundance-based coverage estimators (ACE; Chao et al., 1993) suggest that only 29% of discoverable genera are now known, predicting that another 140 years of research will be required before 90% have been documented. Congruence indices such as the GERt and GER∗ do not measure the absolute extent of ghost ranges. Rather, they measure ghost ranges relative to the distribution of possibilities for the data available. Hence, high congruence is possible with an extremely gappy record (Wills, 1999); the only requirement is that the phylogenetic and stratigraphic signals are concordant. Such congruence may therefore be relatively impervious to rarefaction of the true tree or to the random degradation of the stratigraphic signal (at least up to a point). Simulation studies are needed to investigate this. Our finding of exceptional congruence for most dinosaur groups is therefore perfectly consistent with poor knowledge of generic diversity. The most parsimonious explanation for good agreement between the phylogenetic and stratigraphic signals is that both reflect faithfully (albeit partially) the underlying evolutionary history of the group (Benton et al., 1999; Wills, 2002). This does imply, however, that new discoveries are less likely to have a marked impact 900 SYSTEMATIC BIOLOGY on inferred phylogeny (see a similar example for basal birds: Fountaine et al., 2005) and will probably subdivide existing ghost ranges rather than radically overhaul the stratigraphic data set. We also note that the “known” 29% of genera are unlikely to represent a random sample of discoverable genera, and towards those from the best-sampled and/or most accessible geographic regions. Moreover, many recent finds are being made proactively from hitherto inaccessible and unexploited regions (Wang and Dodson, 2006) engendering a renaissance in dinosaur descriptions. Most of these are readily assigned to existing clades and interpolate smoothly into the existing picture of dinosaur evolution (Pisani et al., 2002). A more problematic form of taxonomic incompleteness may be introduced by systematists themselves. Pearson (1999) noticed that published cladograms tend to be pectinate much more often than predicted by null models of cladogenesis and the random sampling of lineages. This was attributed to a tendency for investigators to include only those taxa proximate to some preconceived transitional series of fossils. This practice is most likely in studies that investigate the origins of one major group from within another (e.g., tetrapods from fish [Clack, 2006], birds from nonavian dinosaurs [Zhou, 2004], mammals from nonmammalian cynodonts [Kemp, 2007]). However, such selective pruning alone need not necessarily result in spuriously inflated congruence. There is no reason why a series of morphologically transitional fossils should be especially congruent with stratigraphy, unless the transitional series were itself identified partly by weeding out “incongruously” placed fossils (rather than solely on the basis of what is understood of morphological evolution). Eight of the dinosaurian data sets investigate the evolution of “basal” groups (i.e., Xu et al., 2002 [basal Ceratopsia]; Rauhut, 2003 [basal theropods]; Holtz et al., 2004 [basal Tetanurae]; Galton and Upchurch, 2004a [basal sauropodomorphs]; Norman, 2004 [basal Iguanodontia]; Turner et al., 2007 [stem birds]; Yates, 2007 [basal sauropodomorphs]; Butler et al., 2008 [basal ornithischians]). However, these are actually more balanced on average than the other 11 trees analyzed (mean lm [Heard 1992] for the two groups is 0.34 and 0.44, respectively). Other forms of taxonomic rarefaction are commonplace, in particular the exclusion of incomplete material and the de facto absence of an indeterminate number of lineages from the record. Arguably, all cladograms including fossils (and to a lesser extent, those that do not; Cobbett et al., 2007) are subject to such uncertain sampling. Although the effects of missing data (Wiens, 2003a, 2003b, 2006) and missing taxa (Poe, 1998; Wagner, 2000b) upon the accuracy of phylogenetic inference have been investigated in some detail, their effects upon stratigraphic congruence require further exploration (Wagner and Sidor, 2000). In this context, it is notable that only one dinosaur clade (Stegosauria) had very poor congruence (a MIGu indistinguishable from random). In this particular case, the poor fit probably resulted from a number of clade-specific factors, including poor phylogenetic reso- VOL. 57 lution and the relatively small number of characters that were incorporated into the analysis (see Galton and Upchurch, 2004a). Paleobiogeographical incompleteness refers to the inferred non-preservation of fossils from some of the geographical regions that the organisms originally inhabited. This is the least problematic form of incompleteness for congruence studies, because first occurrence dates are usually reported globally or for a specified region of interest. We note, however, that the extreme paleobiogeographic discontinuity of sister groups has been cited to imply the existence of ghost ranges of uncertain duration (e.g., Fortey et al., 1996; Lieberman, 2002; Clarke et al., 2007), even in cases where these sister groups appear simultaneously in time. We are aware of no attempts to quantify such putative temporal gaps. Stratigraphic incompleteness refers to the absence of fossils from strata formed at times when a taxon’s existence is otherwise inferred. This may be within taxa (where occurrences pre- and post-date the gap: so-called Lazarus taxa) or between taxa. Congruence indices measure only this second aspect; namely, the extent of inferred stratigraphic gaps between the first occurrence dates of sister taxa. They do not attempt to quantify the intensity of sampling within ranges, nor do they seek to estimate confidence intervals on those ranges. Published compendia typically report the first and last occurrences of taxa (e.g., Benton, 1993), but there is often scant data on the frequency of fossil occurrences between these dates. Just two incidences of a taxon are all that is needed to stake out upper and lower bounds in the fossil record. Within-taxon completeness may be very poor, however, with all of the intervening stratigraphic intervals documenting no fossils. Under these circumstances, our confidence in the first and last dates will be very low, especially if they are widely separated in time. Where the probability of fossilization is uniform, the mean size of gap within the observed range provides an estimate of the time between the first or last occurrence and the actual end of the taxon’s range (Strauss and Sadler, 1989; Marshall, 2001). Methods for extending ranges in cases where fossilization is non-random are a little more complex (e.g., Marshall, 1997; Solow, 2003). A straightforward measure of within-taxon completeness per interval is the simple completeness metric or SCM (Benton, 1987). This expresses the number of constituent taxa (typically species) within a more inclusive taxon of interest observed in a given stratigraphic interval, as a fraction of the total number of observed taxa plus Lazarus taxa (Fara, 2001). A more sophisticated estimate of the probability of preservation per interval can be determined given the numbers of constituent taxa spanning one, two, and three stratigraphic intervals: f (1), f (2), and f (3), respectively. This probability, R (or the “FreqRat”), is given simply by the ratio f (2)2 /[ f (1) f (3)] (Foote and Raup, 1996). Of course, it is not impossible for a species or higher taxon to be known from just one locality or horizon, so that all such measures become impossible to calculate. 2008 WILLS ET AL.—MODIFIED GAP EXCESS RATIO Taxonomic Level and Taxon Sampling Previous studies of large samples of cladograms (e.g., Benton and Hitchin, 1997; Benton et al., 1999; Benton, 2001) caution that the taxonomic level of OTUs in a cladogram is likely to have an influence on stratigraphic congruence. Our modest sample of trees precluded quantifying this effect rigorously, which requires the systematic analysis of hundreds of trees with multivariate methods. We note, however, that our trees were predominantly at the generic level, making our sample relatively homogeneous. Seven trees exclusively sampled genera, and no tree had more than one third of its OTUs at any other level (including species). Over all 19 trees, 84% of terminals were genera. We also note that no dinosaur systematists currently use any conventional Linnean hierarchical terms; these have become outmoded with the advent of rigorous cladistic analyses. This is not to say that the size and number of composite taxa is unimportant, merely that it is beyond the scope of the present study to address it in a rigorous way. A related issue is the intensity of taxon sampling, the proportion of taxa at a given level that are represented in a cladogram. Of course, this is difficult to quantify in absolute terms, because sampling is always influenced by the known and knowable fossil record, as well as by the deliberate exclusion of taxa by the systematist. Dinosaur taxonomists predominantly work on genera and low-level clades. They are usually cautious of higher taxonomic ranks, opting instead for clades based explicitly on cladograms. Moreover, most dinosaur genera are monospecific. This means that dinosaur cladograms are either predominantly based on genera (and therefore nearly equivalent to species trees) or a complex mixture of different ranks (ranging from subfamilies to orders or even classes in traditional taxonomies). Cladograms for basal ceratopsians, Pachycephalosauria, Ceratopsidae, and prosauropods sample around 70% of valid genera. The exceptions are those removed a priori by safe taxonomic reduction (STR; Wilkinson, 1995) or a posteriori by reduced consensus (RC) methods (Wilkinson, 1994). The sauropod trees sample only 25% to 30% of known generic diversity. Some of the shortfall results from the application of STR or RC methods. However, other taxa are excluded because they are extremely incomplete, inaccessible, or very poorly described or because of perceived taxonomic problems in assigning the hypodigm material. In other cases, taxa are excluded purely for expedience. The hadrosaur, ankylosaur, and iguanodontian trees sample less than 50% of the generic diversity and a lower proportion of species diversity. Stegosaurs are represented by around two-thirds of taxa. All of the theropod phylogenies, along with the large ornithischian phylogeny of Butler, woefully undersample the total diversity (as they collapse many diverse clades into OTUs). Nonetheless, they do offer good coverage of all constituent clades. Similarly, the sauropodomorph phylogenies of Upchurch et al. (2007) and Yates (2007) contain at most five sauropod exemplars, representing a clade of over 120 species. 901 Can We Use Stratigraphic Congruence to Choose between Trees for the Same Taxa? Stratigraphic congruence can be used as an ancillary criterion for choosing between two or more otherwise equally optimal trees derived from the same character matrix, with preference being given to the subset of trees that exhibits the best fit to stratigraphy (Wills, 1999). This idea can be extended to comparisons between two or more trees proposed by different authors for an identical set of terminal taxa. Comparisons of the stratigraphic congruence of competing tree topologies for theropod dinosaurs (Rauhut, 2003; Holtz et al., 2004) reveal comparably high indices because the higher-level relationships proposed by these two authors are similar. Because these trees are derived from independent character sets, this agreement corroborates the gross structure of both trees. Interestingly, congruence measures for more exclusive clades within Theropoda (Paraves: Senter, 2007; Turner et al., 2007) were substantially lower than those obtained from the broader scale analyses of Rauhut (2003) and Holtz et al. (2004). One likely reason for this discrepancy is that both Rauhut (2003) and Holtz et al. (2004) used monophyla to represent speciose coelurosaurian clades (including those that comprise Paraves). As a result, ghost lineages within these monophyla are not included in the calculations. The various basal sauropodomorph trees examined showed wide variation in their congruence with stratigraphy. However, with the exception of Upchurch et al. (2007), GER, GERt, and GER∗ values were lower than for most other dinosaur clades. Whether this reflects a poor understanding of phylogeny, a patchy fossil record, or both is unclear. Although these trees include a small number of sauropods, these are merely exemplars of clades containing 120 or more species. Whatever the case, more work on the evolution of basal sauropodomorphs is required. Both of the sauropod phylogenies examined show excellent stratigraphic congruence, although the tree of Wilson (2002) performs better than that of Upchurch et al. (2004). However, because the latter contains more ingroup taxa than the former, we reanalyzed the Upchurch et al. (2004) tree after pruning out the “additional” terminals. This still resulted in poorer congruence than for the Wilson (2002) tree. More generally, we note that adding or redating single terminals can have marked effects on indices of congruence. For example, the Early Jurassic theropod taxon Eshanosaurus is known from only an incomplete lower jaw. Some authors (Zhao and Xu, 1998; Xu et al., 2001) place it controversially in the otherwise exclusively Cretaceous Therizinosauroidea (Clark et al., 2004). If correct, this implies the existence of a ghost lineage that extends through all of the Jurassic and into the lower part of the Early Cretaceous. More importantly, however, because therizinosauroids occupy a derived position within theropod phylogeny, the early occurrence of Eshanosaurus would result in the generation of additional ghost ranges for numerous derived theropod clades. Redating the first occurrence of Therizinosauroidea in the phylogenies of 902 SYSTEMATIC BIOLOGY Rauhut (2003) and Holtz et al. (2004) does indeed result in a drop in all measures of congruence. Conversely, we also note that relatively large numbers of taxa can be added to trees without significantly altering levels of congruence. For example, the addition of nine tyrannosauroid taxa to the phylogeny of Holtz et al. (2004) resulted in only a 0.3% change in GER score. In this case, the limited impact probably results from two factors: (i) most tyrannosauroids are of Late Cretaceous age and have overlapping stratigraphic ranges and hence few substantial ghost lineages; and (ii) all basal tyrannosauroids occur earlier in time than derived forms (Holtz, 2004). Finally, it is interesting to note that the ornithischian phylogeny proposed by Butler et al. (2008) has a higher GER score than its progenitor (Butler, 2005). This probably results almost entirely from the change in position of a single, stratigraphically old clade: the Late Triassic and Early Jurassic Heterodontosauridae (Weishampel et al., 2004b). In earlier cladograms (e.g., Sereno, 1999; Butler, 2005), these resolved as the sister group of the relatively derived Euornithopoda, thereby generating many extensive ghost ranges. Butler et al. (2008), by contrast, recovered Heterodontosauridae close to the base of the ornithischians, removing several ghost ranges in the process. CONCLUSIONS 1. Although notionally scaled between 0.0 and 1.0, the gap excess ratio (GER) is sensitive to differences in tree balance. Only pectinate (maximally imbalanced) trees are always capable of yielding the extreme values. A more balanced tree (i.e., one for which lm > 0; Heard, 1992) is incapable of having GER = 1.00 or 0.00 (unless several taxa first appear in the same interval), no matter how a given set of stratigraphic data is artificially redistributed across its terminals. The topological GER (GERt) represents an advance over the GER by scaling the observed sum of ghost ranges between the maximum and minimum possible for the stratigraphic data on the given topology (and also by scaling the length of stratigraphic intervals to unity, rather than measuring their duration in millions of years). Upper and lower bounds are estimated empirically here by randomization in the absence of a more efficient algorithm. However, because the tails of the distribution of randomized values are typically long, it becomes prohibitively time-consuming to find the bounds for all but the smallest data sets, and the number of randomizations becomes critical. For this reason, we propose a new measure of congruence— the modified GER (GER∗ )—that takes the shape of this distribution directly into account. GER∗ is given simply by the fraction of randomized values yielding a sum of ghost ranges (measured in unit interval lengths: MIGu) greater than the original value. 2. The procedure proposed by Pol and Norell (2006) for investigating the effects of uncertain first occurrence dates on the MSM* and GER is readily extended to other indices. Data on the distribution of values ob- VOL. 57 tained are informative and indicate to what extent perceived congruence is susceptible to differences in the interpretation of the dating of fossil lineages. 3. The congruence between the phylogenetically inferred order of lineage branching and the order in which those same lineages first appear in the fossil record is exceptionally high in most of the dinosaur clades investigated. Only one tree (Stegosauria; Galton and Upchurch, 2004b) had a congruence value indistinguishable from the null model. The most tenable explanation for this agreement is that both phylogenetic and stratigraphic signals reflect the same underlying pattern of evolutionary history. This implies that our knowledge of the dinosaur fossil record is sufficiently complete for palaeontologists to investigate macroevolutionary patterns and processes, at least at the taxonomic level of genera and the stratigraphic level of stages. This finding is consistent with the inferences of Wang and Dodson (2006), who estimated that as many as 71% of dinosaur genera remain to be found. Our results do, however, imply that new discoveries are unlikely to precipitate radical revisions of phylogeny or amendments to the stratigraphic series. 4. Excellent stratigraphic congruence for dinosaurs is probably a function of several factors. Firstly, the long history of wide and intense interest in the group means that dinosaurs have been especially well studied (Weishampel et al., 2004a). Secondly, previous studies have demonstrated that vertebrate cladograms tend to be more congruent with stratigraphy than those of many other groups (Hitchin and Benton, 1997a; Wills, 2001). This may itself be because morphological character sets for vertebrates tend to be less homoplastic than those for many other taxa, facilitating accurate phylogeny reconstruction. Moreover, the preservation potential of bone is high relative to that of many other tissue types. Thirdly, stratigraphic congruence in the Jurassic and Cretaceous appears to be higher across all taxa than at many other times, an observation only partially accounted for by sampling and edge effects (e.g., the “pull of the Recent”; Wills, 2007). ACKNOWLEDGMENTS We thank Michael Benton, Lewis Jacobs, and Norman MacLeod for constructive reviews that helped us to significantly improve the quality of this work. We also thank Frank DeNota and T. Michael Keesey for allowing us to use silhouettes of several of their illustrations. We are grateful to all those who contributed cladograms (wittingly or otherwise) to make this work possible. MAW’s work on the comparison of cladistic and stratigraphic data was supported by BBSRC grant BB/C006682/1. R EFERENCES Angielczyk, K. D. 2002. A character-based method for measuring the fit of a cladogram to the fossil record. Syst. Biol. 51:137–152. Angielczyk, K. D., and D. L. Fox. 2006. Exploring new uses for measures of fit of phylogenetic hypotheses to the fossil record. Paleobiology 32:147–165. Benton, M. J. 1987. Mass extinctions among families of non-marine tetrapods: The data. Mémoire de la Société Géologique de France, Numéro Spécial 150:21–32. 2008 WILLS ET AL.—MODIFIED GAP EXCESS RATIO Benton, M. J. 1993. The fossil record 2. Chapman and Hall, London. Benton, M. J. 1994. Paleontological data and identifying mass extinctions. Trends Ecol. Evol. 9:181–185. Benton, M. J. 2001. Finding the tree of life: Matching phylogenetic trees to the fossil record through the 20th century. Proc. R. Soc. Lond. B Biol. 268:2123–2130. Benton, M. J., and R. Hitchin. 1997. Congruence between phylogenetic and stratigraphic data on the history of life. Proc. R. Soc. Lond. B Biol. 264:885–890. Benton, M. J., R. Hitchin, and M. A. Wills. 1999. Assessing congruence between cladistic and stratigraphic data. Syst. Biol. 48:581–596. Benton, M. J., M. A. Wills, and R. Hitchin. 2000. Quality of the fossil record through time. Nature 403:534–537. Brochu, C. A., and M. A. Norell. 2000. Temporal congruence and the origin of birds. J. Vertebr. Paleontol. 20:197–200. Butler, R. J. 2005. The “fabrosaurid” ornithischian dinosaurs of the Upper Elliot Formation (Lower Jurassic) of South Africa and Lesotho. Zool. J. Linn. Soc. Lond. 145:175–218. Butler, R. J., P. Upchurch, and D. B. Norman. 2008. The phylogeny of the ornithischian dinosaurs. J. Syst. Palaeontol. 6:1–40. Clack, J. A. 2006. The emergence of early tetrapods. Palaeogeog. Palaeoclimatol. Palaeoecol. 232:167–189. Clark, J. M., T. Maryanska, and R. Barsbold. 2004. Therizinosauroidea. Pages 151–164 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Clarke, J. A., D. T. Ksepkac, M. Stucchie, M. Urbinaf, N. Gianninig, S. Bertellii, Y. Narváezj, and C. A. Boyda. 2007. Paleogene equatorial penguins challenge the proposed relationship between biogeography, diversity, and Cenozoic climate change. Proc. Natl. Acad. Sci. USA 104:11545–11550. Clyde, W. C., and D. C. Fisher. 1997. Comparing the fit of stratigraphic and morphologic data in phylogenetic analysis. Paleobiology 23:1– 19. Cobbett, A., M. Wilkinson, and M. A. Wills. 2007. Fossils impact as hard as living taxa in parsimony analyses of morphology. Syst. Biol. 56:753–766. Crampton, J. S., A. G. Beu, R. A. Cooper, C. M. Jones, B. Marshall, and P. A. Maxwell. 2003. Estimating the rock volume bias in paleobiodiversity studies. Science 301:358–360. Dodson, P., C. A. Forster, and S. D. Sampson. 2004. Ceratopsidae. Pages 494–513 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Erwin, D. H. 2007. Disparity: Morphological pattern and developmental context. Palaeontology 50:57–73. Fara, E. 2001. What are Lazarus taxa? Geol. J. 36:291–303. Farris, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5:417–419. Fastovsky, D. E., Y. F. Huang, J. Hsu, J. Martin-McNaughton, P. M. Sheehan, and D. B. Weishampel. 2004. Shape of Mesozoic dinosaur richness. Geology 32:877–880. Finarelli, J. A., and W. C. Clyde. 2002. Comparing the gap excess ratio and the retention index of the stratigraphic character. Syst. Biol. 51:166–176. Foote, M., and D. M. Raup. 1996. Fossil preservation and the stratigraphic ranges of taxa. Paleobiology 22:121–140. Fortey, R. A., D. E. G. Briggs, and M. A. Wills. 1995. Decoupling cladogenesis from morphological disparity. Biol. J. Linn. Soc. 57:13– 33. Fountaine, T. M. R., M. J. Benton, G. J. Dyke, and R. L. Nudds. 2005. The quality of the fossil record of Mesozoic birds. Proc. R. Soc. Lond. B Biol. 272:289–294 Galton, P. M., and P. Upchurch. 2004a. Prosauropoda. Pages 232–258 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Galton, P. M., and P. Upchurch. 2004b. Stegosauria. Pages 343–362 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Gradstein, F. M., J. G. Ogg, and A. G. Smith. 2004. A geologic time scale 2004. Cambridge University Press, Cambridge. Heard, S. B. 1992. Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution 46:1818– 1826. 903 Hitchin, R., and M. J. Benton. 1997a. Congruence between parsimony and stratigraphy: Comparisons of three indices. Paleobiology 23:20– 32. Hitchin, R., and M. J. Benton. 1997b. Stratigraphic indices and tree balance. Syst. Biol. 46:563–569. Holtz, T. R. Jr. 2004. Tyrannosauroidea. Pages 111–136 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Holtz, T. R., Jr., R. E. Molnar, and P. J. Currie. 2004. Basal Tetanurae. Pages 71–110 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Horner, J. R., D. B. Weishampel, and C. A. Forster. 2004. Hadrosauridae. Pages 438–465 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Huelsenbeck, J. P. 1994. Comparing the stratigraphic record to estimates of phylogeny. Paleobiology 20:470–483. Kemp, T. S. 2007. The origin of higher taxa: Macroevolutionary processes, and the case of the mammals. Acta Zool. 88:3–22. Langer, M. C., and M. J. Benton. 2006. Early dinosaurs: A phylogenetic study. J. Syst. Palaeontol. 4:309–358. Lieberman, B. S. 2002. Phylogentic analysis of some basal early Cambrian trilobites, the biogeographic origins of the Eutrilobita, and the timing of the Cambrian radiation. J. Paleont. 76:692–708 MacLeod, K. G. 1994. Bioturbation, inoceramid extinction, and midMaastrichtian ecological change. Geology 22:139–142. Marshall, C. R. 1997. Confidence intervals on stratigraphic ranges with non-random distributions of fossil horizons. Paleobiology 23:165– 173. Marshall, C. R. 2001. Confidence limits in stratigraphy. Pages 542–545 in Palaeobiology II (D. E. G. Briggs and P. R. Crowther, eds.). Blackwell, Oxford, UK. Maryanska, T., R. E. Chapman, and D. B. Weishampel. 2004. Pachycephalosauria. Pages 464–477 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Norell, M. A., and M. J. Novacek. 1992. Congruence between superpositional and phylogenetic patterns—Comparing cladistic patterns with fossil records. Cladistics 8:319–337. Norman, D. B. 2004. Basal Iguanodontia. Pages 413–437 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Novas, F. E. 1996. Dinosaur monophyly. J. Vertebr. Paleontol. 16:723– 741. Paul, C. R. C. and S. K. Donovan. 1998. An overview of the completeness of the fossil record. Pages 111–131 in The adequacy of the fossil record (C. R. C. Paul and S. K. Donovan, eds). John Wiley, New York. Pearson, P. N. 1999. Apomorphy distribution is an important aspect of cladogram symmetry. Syst. Biol. 48:399–406. Pisani, D., A. M. Yates, M. C. Langer, and M. J. Benton. 2002. A genuslevel supertree of the Dinosauria. Proc. R. Soc. Lond. B Biol. 269:915– 921. Poe, S. 1998. Sensitivity of phylogeny estimation to taxonomic sampling. Syst. Biol. 47:18–31. Pol, D., and M. A. Norell. 2001. Comments on the Manhattan stratigraphic measure. Cladistics 17:285–289. Pol, D., and M. A. Norell. 2006. Uncertainty in the age of fossils and the stratigraphic fit to phylogenies. Syst. Biol. 55:512–521. Rauhut, O. W. M. 2003. The interrelationships and evolution of basal theropod dinosaurs. Spec. Pap. Palaeontol. 69:1–213. Sahney, S. and M. J. Benton. 2008. Recovery from the most profound mass extinction of all time. Proc. R. Soc. Lond. B Biol. Sci. 275:759–765. Senter, P. 2007. A new look at the phylogeny of Coelurosauria (Dinosauria: Theropoda). J. Syst. Palaeontol. 5:429–463. Sereno, P. C. 1999. The evolution of dinosaurs. Science 284:2137–2147. Siddall, M. E. 1996. Stratigraphic consistency and the shape of things. Syst. Biol. 45:111–115. Siddall, M. E. 1998. Stratigraphic fit to phylogenies: A proposed solution. Cladistics 14:201–208. Slater, C. S. C. 2007. A study of supertree construction using mammalian phylogenies and information from the fossil record. Unpublished PhD Thesis, University of Cambridge. 904 SYSTEMATIC BIOLOGY Smith, A. B. 2003. Making the best of a patchy fossil record. Science 301:321–322. Smith, A. B. 2007. Marine diversity through the Phanerozoic: Problems and prospects. J. Geol. Soc. 164:731–745. Solow, A. R. 2003. Estimation of stratigraphic ranges when fossil finds are not randomly distributed. Paleobiology 29:181– 185. Strauss, D., and P. M. Sadler. 1989. classical confidence intervals and Bayesian probability estimates for ends of local taxon ranges. Math. Geol. 21:411–427. Turner, A. H., D. Pol, J. A. Clarke, G. M. Erickson, and M. A. Norell. 2007. A basal dromaeosaurid and size evolution preceding avian flight. Science 317:1378–1381. Upchurch, P., P. M. Barrett, and P. Dodson. 2004. Sauropoda. Pages 259–322 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Upchurch, P., P. M. Barrett, and P. M. Galton. 2007. A phylogenetic analysis of basal sauropodomorph interrelationships: Implications for the origin of sauropod dinosaurs. Spec. Pap. Palaeontol. 77:57– 90. Vickaryous, M. K., T. Maryanska, and D. B. Weishampel. 2004. Ankylosauria. Pages 363–392 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Wagner, P. J. 2000a. Exhaustion of morphologic character states among fossil taxa. Evolution 54:365–386. Wagner, P. J. 2000b. The quality of the fossil record and the accuracy of phylogenetic inferences about sampling and diversity. Syst. Biol. 49:65–86. Wagner, P. J., and C. A. Sidor. 2000. Age rank/clade rank metrics— Sampling, taxonomy, and the meaning of “stratigraphic consistency.” Syst. Biol. 49:463–479. Wang, S. C., and P. Dodson. 2006. Estimating the diversity of dinosaurs. Proc. Natl. Acad. Sci. USA 103:13601–13605. Weishampel, D. B., P. M. Barrett, R. A. Coria, J. Le Loeuff, X. Xu, X. J. Zhao, A. Sahni, E. M. P. Gomani, and C. R. Noto. 2004. Dinosaur distribution. Pages 517–606 in The Dinosauria, 2nd edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Weishampel, D. B., P. Dodson, and H. Osmólska. 2004. The Dinosauria, 2nd Edition (D. B. Weishampel, P. Dodson, and H. Osmólska, eds.). University of California Press, Berkeley. Wiens, J. J. 2003a. Incomplete taxa, incomplete characters, and phylogenetic accuracy: Is there a missing data problem? J. Vertebr. Paleontol. 23:297–310. VOL. 57 Wiens, J. J. 2003b. Missing data, incomplete taxa, and phylogenetic accuracy. Syst. Biol. 52:528–538. Wiens, J. J. 2006. Missing data and the design of phylogenetic analyses. J. Biomed. Informat. 39:34–42. Wilkinson, M. 1994. Common cladistic information and its consensus representation—Reduced Adams and reduced cladistic consensus trees and profiles. Syst. Biol. 43:343–368. Wilkinson, M. 1995. Coping with abundant missing entries in phylogenetic inference using parsimony. Syst. Biol. 44:501–514. Wills, M. A. 1998. Crustacean disparity through the Phanerozoic: Comparing morphological and stratigraphic data. Biol. J. Linn. Soc. 65:455–500. Wills, M. A. 1999. Congruence between phylogeny and stratigraphy: Randomization tests and the gap excess ratio. Syst. Biol. 48:559– 580. Wills, M. A. 2001. How good is the fossil record of arthropods? An assessment using the stratigraphic congruence of cladograms. Geol. J. 36:187–210. Wills, M. A. 2002. The tree of life and the rock of ages: Are we getting better at estimating phylogeny? Bioessays 24:203–207. Wills, M. A. 2007. Fossil ghost ranges are most common in some of the oldest and some of the youngest strata. Proc. R. Soc. Lond. B Biol. 274:2421–2427 Wilson, J. A. 2002. Sauropod dinosaur phylogeny: Critique and cladistic analysis. Zool. J. Linn. Soc. Lond. 136:217–276. Xu, X., P. J. Makovicky, X.-L. Wang, M. A. Norell and H.-L. You. 2002. A ceratopsian dinosaur from China and the early evolution of Ceratopsia. Nature 416:314–317. Xu, X., X. J. Zhao, and J. M. Clark. 2001. A new therizinosaur from the Lower Jurassic Lower Lufeng Formation of China. J. Vertebr. Paleontol. 21:477–483. Yates, A. M. 2007. The first complete skull of the Triassic dinosaur Melanorosaurus Haughton (Sauropodomorpha: Anchisauria). Spec. Pap. Palaeontol. 77:9–55. Yates, A. M., and J. W. Kitching. 2003. The earliest known sauropod dinosaur and the first steps towards sauropod locomotion. Proc. R. Soc. Lond. B Biol. 270:1753–1758. Zhao, X. J., and X. Xu. 1998. The oldest coelurosaurian. Nature 394:234– 235. Zhou, Z. H. 2004. The origin and early evolution of birds: Discoveries, disputes, and perspectives from fossil evidence. Naturwissenschaften 91:455–471. First submitted 8 January 2008; reviews returned 25 March 2008; final acceptance 25 August 2008 Associate Editor: Norman MacLeod
© Copyright 2026 Paperzz