Syst. Biol. 63(1):31–54, 2014
© The Author(s) 2013. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: [email protected]
DOI:10.1093/sysbio/syt058
Advance Access publication August 20, 2013

Accelerated Rate of Molecular Evolution for Vittarioid Ferns is Strong and Not Driven by Selection

CARL J. ROTHFELS1,2,∗ AND ERIC SCHUETTPELZ3,4

1 Department of Biology, Duke University, Box 90338, Durham, NC 27708, USA; 2 Department of Zoology, University of British Columbia, #4200-6270 University Blvd., Vancouver, BC V6T 1Z4, Canada; 3 Department of Biology and Marine Biology, University of North Carolina Wilmington, 601 South College Road, Wilmington, NC 28403, USA; and 4 Department of Botany (MRC 166), National Museum of Natural History, Smithsonian Institution, PO Box 37012, Washington DC 20013-7012, USA
∗ Correspondence to be sent to: Department of Zoology, University of British Columbia, #4200-6270 University Blvd., Vancouver, BC V6T 1Z4, Canada; E-mail: [email protected].
Received 16 January 2013; reviews returned 26 March 2013; accepted 15 August 2013
Associate Editor: Roberta Mason-Gamer

Molecular evolutionary rate heterogeneity can take many forms, ranging from variation among nucleotide substitution types (Kimura 1980) to variation among sites (Yang 1996), loci (Wolfe et al. 1989b; Small et al. 1998), genomic regions (Wolfe et al. 1989a), and genomic compartments (Wolfe et al. 1987; Baer et al. 2007). Perhaps most vexing, however, is lineage-specific rate heterogeneity, whereby some lineages have significantly different rates of molecular evolution than do their close relatives. Such violations of a molecular clock (Zuckerkandl and Pauling 1962, 1965) are ubiquitous across the tree of life and have been characterized within vertebrates (Bromham 2002; Hoegg et al. 2004; Bininda-Emonds 2007), invertebrates (Hebert et al. 2002; Schon et al. 2003; Shao et al. 2003; Thomas et al.
2006; Singh et al. 2009), fungi (Lutzoni and Pagel 1997; Moncalvo et al. 2000; Woolfit and Bromham 2003; Zoller and Lutzoni 2003; Lumbsch et al. 2008), algae (Zoller and Lutzoni 2003), bacteria (Woolfit and Bromham 2003), liverworts (Lewis et al. 1997), seed plants (Bousquet et al. 1992; Muse 2000; Davies et al. 2004; McCoy et al. 2008; Smith and Donoghue 2008; Xiang et al. 2008), and ferns (Soltis et al. 2002; Des Marais et al. 2003; Schneider et al. 2004; Schuettpelz and Pryer 2006; Korall et al. 2010; Li et al. 2011; Rothfels et al. 2012). Given this ubiquity, the phenomenon has important implications for our understanding of evolution, as well as for our ability to infer and date evolutionary events.

Much of the research into molecular evolutionary rate heterogeneity has focused on finding a correlation between the rate of molecular evolution and some natural history attribute of the organism in question, using multiple independent comparisons (reviewed in Lanfear et al. 2010). However, cases limited to a single potential rate change, or lacking obvious candidate correlated traits, are still tractable within a model selection framework. Using this approach, one can ask whether a potential rate discrepancy is significant by comparing models that permit particular groups to have individual rates (local clocks) with models that enforce a global clock. This approach provides an elegant solution to problems associated with the lack of independence among branches and differing taxon sampling density, does not require the a priori identification of a potential correlated trait of interest, and has enjoyed considerable popularity (Lutzoni and Pagel 1997, their “method 2”; Yoder and Yang 2000; Bromham and Woolfit 2004; Lanfear et al. 2007; Korall et al. 2010; Lanfear 2010; Neiman et al. 2010), although Lanfear (2010) describes important caveats.
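As a toy illustration of comparing a global clock with local clocks, the sketch below assumes that the number of substitutions on each branch and each branch's duration are known exactly and that counts are Poisson-distributed. Real analyses (such as the baseml models used in this study) make neither assumption; they estimate branch lengths jointly under full substitution models. Under the toy assumptions, the maximum-likelihood rate for a clock is simply total substitutions divided by total time, and regimes are compared with the AIC. All branch data, clade labels, and function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (clade label, substitutions on branch, branch duration).
# Assumption: counts and durations are known exactly, which real analyses do not assume.
BRANCHES = [
    ("cheilanthoid", 10, 10.0), ("cheilanthoid", 12, 10.0),
    ("adiantum", 11, 10.0), ("adiantum", 9, 10.0),
    ("vittarioid", 40, 10.0), ("vittarioid", 38, 10.0),
]


def fit_clock_regime(branches, clock_of):
    """Fit one rate per clock by ML; return (rate per clade, number of clocks, log-likelihood)."""
    subs, time = defaultdict(float), defaultdict(float)
    for clade, k, t in branches:
        subs[clock_of(clade)] += k
        time[clock_of(clade)] += t
    # Poisson MLE for a shared rate: total substitutions / total time under that clock.
    clock_rate = {c: subs[c] / time[c] for c in subs}
    rates = {clade: clock_rate[clock_of(clade)] for clade, _, _ in branches}
    loglik = 0.0
    for clade, k, t in branches:
        mu = rates[clade] * t  # expected substitutions on this branch
        loglik += k * math.log(mu) - mu - math.lgamma(k + 1)
    return rates, len(clock_rate), loglik


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Each regime maps a clade label to the clock it uses (hypothetical regimes).
REGIMES = {
    "global clock": lambda clade: "all",
    "vittarioid local clock": lambda clade: "V" if clade == "vittarioid" else "other",
    "one clock per clade": lambda clade: clade,
}

results = {}
for name, clock_of in REGIMES.items():
    rates, n_clocks, loglik = fit_clock_regime(BRANCHES, clock_of)
    results[name] = (rates, n_clocks, aic(loglik, n_clocks))
    print(f"{name}: clocks={n_clocks}, AIC={results[name][2]:.2f}")
```

Here the local-clock regimes score better only because the toy vittarioid counts were made large; the point of the sketch is the structure of the comparison (extra clock parameters must buy enough likelihood to offset their penalty), not any result from this study.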
In this study, we use sequence data from all three plant genomic compartments (nuclear, mitochondrial, and plastid) to probe for signatures of rate heterogeneity in a clade of ferns. We first use a maximum likelihood (ML) method to look for rate heterogeneity at the nucleotide level, which requires the a priori

Downloaded from http://sysbio.oxfordjournals.org/ by Carl Rothfels on January 9, 2014

Abstract.—Molecular evolutionary rate heterogeneity (the violation of a molecular clock) is a prominent feature of many phylogenetic data sets. It has particular importance to systematists not only because of its biological implications, but also because of its practical effects on our ability to infer and date evolutionary events. Here we show, using both maximum likelihood and Bayesian approaches, that a remarkably strong increase in substitution rate in the vittarioid ferns is consistent across the nuclear and plastid genomes. Contrary to some expectations, this rate increase is not due to selective forces acting at the protein level on our focal loci. The vittarioids bear no signature of the change in the relative strengths of selection and drift that one would expect if the rate increase were caused by altered post-mutation fixation rates. Instead, the substitution rate increase appears to stem from an elevated supply of mutations, perhaps limited to the vittarioid ancestral branch. This generalized rate increase is accompanied by extensive fine-scale heterogeneity in rates across loci, genomes, and taxa.
Our analyses demonstrate the effectiveness and flexibility of trait-free investigations of rate heterogeneity within a model-selection framework, emphasize the importance of explicit tests for signatures of selection prior to invoking selection-related or demography-based explanations for patterns of rate variation, and illustrate some unexpected nuances in the behavior of relaxed clock methods for modeling rate heterogeneity, with implications for our ability to confidently date divergence events. In addition, our data provide strong support for the monophyly of Adiantum, and for the position of Calciphilopteris in the cheilanthoid ferns, two relationships for which convincing support was previously lacking. [Adiantum; Calciphilopteris; codon models; divergence time dating; local clocks; model selection; molecular clock; mutation rate; nucleotide substitution rate; Pteridaceae; rate heterogeneity; relaxed clocks; trigenomic analyses.]

MATERIALS AND METHODS

Taxon Sampling

We sampled 26 species, including 8 species from each focal clade within the Pteridaceae (cheilanthoids, vittarioids, and Adiantum) and 2 outgroup species (Cryptogramma crispa and Pityrogramma austroamericana; Appendix 1). The selected species span the phylogenetic diversity of each clade, with at least one representative from each major subclade, and were chosen to capture the branch-length variation within each clade (Crane et al. 1995; Schuettpelz et al. 2007; Windham et al. 2009; Lu et al. 2011b). Equal sampling across clades was adopted to avoid node-density artifacts (Fitch and Beintema 1990; Bromham 2002; Venditti et al. 2006; Hugall and Lee 2007). The likelihood of biases being introduced due to punctuated evolution associated with speciation events (Pagel et al.
2006) is also minimized, given that the vittarioids, hypothesized to have the fastest rates, constitute the smallest clade.

DNA Extraction, Amplification, and Sequencing

DNA was extracted from silica-dried leaf tissue or herbarium fragments in the Fern Lab Database (fernlab.biology.duke.edu) archive. Sequences were obtained for three plastid loci (atpA, atpB, rbcL), two mitochondrial loci (atp1, nad5), and one nuclear locus (gapCp); note that plastids and mitochondria are maternally inherited in this group of ferns (Gastony and Yatskievych 1992). The three plastid loci were amplified and sequenced using previously published protocols (Pryer et al. 2004; Schuettpelz et al. 2006). Most mitochondrial atp1 sequences were obtained using primers F83-atp1 and R725-atp1 (Wikström and Pryer 2005; full data for all primers used are available in Supplementary Table S1), following the protocol of Wikström and Pryer (2005). When reactions failed, we amplified and sequenced atp1 with primers CRATP1F1 and CRATP1R1, using a standard reaction mix (Schuettpelz and Pryer 2007). Our thermal cycling program consisted of an initial denaturation step (94 °C for 3 min), followed by 35 denaturation, annealing, and elongation cycles (94 °C for 45 s, 55 °C for 30 s, 72 °C for 2 min), and a final elongation step (72 °C for 10 min). Taxa with a group II intron (Cryptogramma and Pityrogramma; see Results section) required the additional sequencing primers F328-atp1, F411-atp1, and R348-atp1 (Wikström and Pryer 2005). Initial amplifications of nad5 were performed as for atp1, but with primers K and L (Vangerow et al. 1999) and a modified thermal cycling program consisting of an initial denaturation step (94 °C for 3 min), followed by 40 denaturation, annealing, and elongation cycles (94 °C for 45 s, 45 °C for 30 s, 72 °C for 3 min), and a final elongation step (72 °C for 10 min). The nad5 amplifications tended to be weak and often resulted in multiple bands.
These PCR products were therefore cloned, and the resulting colonies amplified, following the protocol described by Schuettpelz et al. (2008). We sequenced the colony amplifications using primers K, L, KLEX, FLIN, LISEX, M13F, and M13R (V. Knoop, unpublished data; Invitrogen; Vangerow et al. 1999). Particularly recalcitrant taxa were amplified using primers CRNAD5F1 and CRNAD5R1 (using the

selection of models to be compared from among a near-infinite array of possible models. To explore the impact of this requirement, we then employ a Bayesian framework, using two relaxed clock models. These models explicitly incorporate rate variation across the tree without the a priori division of the tree into particular clades or classes. Finally, we conduct a series of codon-based ML analyses to distinguish between mutation-driven and fixation-driven causes of elevated substitution rate.

Our focal group consists of the cheilanthoid ferns (cheilanthoids), the genus Adiantum, and the vittarioid ferns (vittarioids) in the family Pteridaceae. Each of these subclades is fairly large (approximately 400, 200, and 100 species, respectively; Crane 1997; Schuettpelz et al. 2007; Lu et al. 2011b) and moderately old. The divergence between the cheilanthoids and the vittarioids + Adiantum (collectively referred to as the adiantoids) is estimated at approximately 90 Ma, and that between the vittarioids and Adiantum at about 70 Ma (Schuettpelz and Pryer 2009). Earlier molecular analyses inferred considerably longer branch lengths for the vittarioids than for the remainder of the Pteridaceae (Schuettpelz and Pryer 2007; Schuettpelz et al. 2007), suggesting an increase in the vittarioid rate of molecular evolution. This increase may well be coupled with other changes, as the vittarioids differ markedly from their relatives in ecology, population biology, and morphology.
The vittarioids are obligate epiphytes in tropical habitats, have dramatically simplified leaf morphologies (many species are colloquially termed “shoestring ferns,” and others have leaves that do not exceed a centimeter or two in length or two millimeters in width), and have long-lived gametophytes that are capable of asexual propagation via gemmae (Farrar 1974; Crane et al. 1995). Here, we assess the nature and degree of molecular rate heterogeneity in our focal group and identify which biological mechanisms remain tenable explanations for the suggested vittarioid rate increase. In doing so, we provide an example of a focused, trait-free analysis of molecular rates, demonstrate nuances in the behavior of recently developed models that accommodate rate variation, and show the utility and importance of associated analyses of selection. Our study aims to determine: (i) whether vittarioids have elevated rates of molecular evolution compared with those of their closest relatives; (ii) whether observed rate differences are comparable across loci and genomic compartments, and consistent with possible life-history-based explanations; and (iii) whether selection drove the patterns observed.

only two random-addition-sequence starting trees. This analysis allowed us to phylogenetically discriminate gapCp “short” sequences from gapCp “long” and gapC sequences (Supplementary Fig. S1). The originally targeted gapCp “short” locus was the best represented; sequences from the other loci were discarded, along with the gapCp “short” sequences from taxa outside our focal sample. Among the remaining gapCp “short” sequences were two copies from each of the four sampled members of the Adiantum raddianum clade, suggesting that a duplication occurred on its stem branch (Supplementary Fig. S1).
To yield a nuclear data set that was maximally comparable to the plastid and mitochondrial data sets, we removed one set of copies (the two copies have similar branch lengths, and preliminary analyses indicated no effect of retaining one over the other). We were left with a single gapCp “short” sequence from each of 22 (of 26) sampled taxa (gapCp “short” sequences could not be obtained from four vittarioid species). For simplicity, these gapCp “short” sequences are hereafter referred to as gapCp sequences. In total, 117 sequences were newly obtained for this study and deposited in GenBank (Appendix 1; Supplementary Table S2). An additional 78 previously published sequences were used to complete our data sets (Appendix 1; Supplementary Table S2).

Sequence Alignment and Data Set Construction

The plastid and mitochondrial loci were aligned individually, by eye, in Mesquite v2.72 (Maddison and Maddison 2011). Ambiguous portions of each alignment were excluded prior to subsequent analyses. For mitochondrial nad5, unambiguous indels were recoded following Simmons and Ochoterena’s (2000) simple gap recoding method, yielding an additional data set used in the topology searches but not in subsequent analyses. The intron sequences within our retained gapCp data were highly divergent, making by-eye alignment unreliable. To infer an objective alignment for these regions, we used BAli-Phy v2.1.0 (Suchard and Redelings 2006; Redelings and Suchard 2007; see, e.g., Gaya et al. 2011), which coestimates alignment and phylogeny in a Bayesian framework. These analyses were run under a seven-partition scheme (one partition for each of the three intron and four exon regions included in the sequenced section; Schuettpelz et al. 2008), with topologies and branch lengths linked across partitions (a branch length scaling parameter was included to allow intron and exon sequences to differ in their global rates).
Substitution parameters were estimated for the exon partitions independently from those for the intron partitions. All partitions were analysed under a GTR+gwF substitution model (Tavaré 1986; Goldman and Whelan 2002), with among-site rate heterogeneity modeled by a five-state Dirichlet process for the exons and a three-state process for the introns. The RS07

program described above, but with 35 cycles and an annealing temperature of 55 °C) and sequenced directly with CRNAD5F1, CRNAD5F2, CRNAD5F3, CRNAD5R1, CRNAD5R2, CRNAD5R3, and LISIN (Supplementary Table S1). Nuclear gapCp sequences were obtained following the protocol of Schuettpelz et al. (2008). For hard-to-amplify vittarioid species, primer ESGAPCP11R1 was replaced with CJRGAPVITR1 for both amplification and sequencing. Although we attempted to target gapCp “short” sequences through visualization of the colony amplifications on agarose gels, we also obtained sequences from the gapC and gapCp “long” paralogs (Schuettpelz et al. 2008). Our total pool of sequences was thus filtered in a stepwise manner to arrive at a set of gapCp “short” sequences for subsequent analysis. From the sequences obtained for each taxon, we first removed any duplicates and all sequences containing indels in the exons (exon length is highly conserved in this gene family, and exons with indels almost certainly indicate pseudogenes; Peterson et al. 2003; Schuettpelz et al. 2008). We then manually screened for and removed chimaeric sequences, presumably resulting from PCR-mediated recombination (Cronn et al. 2002).
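The stepwise gapCp filter (removing duplicates, excluding sequences with exon indels, and collapsing each accession's sequences into copy-type clusters at the 10-substitution cut-off described in the next paragraph) can be sketched as follows. This is a simplified stand-in, not the authors' procedure. It clusters by single linkage on pairwise mismatch counts rather than on parsimony trees, takes each cluster's medoid as a proxy for the "least apomorphic" sequence, treats any gap in an aligned exon as an indel, counts ambiguity codes as mismatches, and does not model the manual chimera screen. The function names are hypothetical.

```python
from itertools import combinations


def mismatches(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def filter_copies(exon_seqs, threshold=10):
    """Return one representative per copy-type cluster for a single accession."""
    # Step 1: drop exact duplicates, keeping first-seen order.
    unique = list(dict.fromkeys(exon_seqs))
    # Step 2: drop sequences with a gap in the exon alignment (assumed to mark an indel).
    intact = [s for s in unique if "-" not in s]

    # Step 3: single-linkage clustering; sequences within `threshold` mismatches are linked.
    parent = list(range(len(intact)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(intact)), 2):
        if mismatches(intact[i], intact[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(intact):
        clusters.setdefault(find(i), []).append(seq)

    # Step 4: keep each cluster's medoid (smallest summed mismatches to the other members).
    return [
        min(members, key=lambda s: sum(mismatches(s, t) for t in members))
        for members in clusters.values()
    ]


# Toy exon alignments for one hypothetical accession (threshold lowered to suit 8-bp sequences).
example = ["AAAAAAAA", "AAAAAAAT", "AAAAAAAA", "AAAA-AAA",
           "AAAAAATA", "CCCCCCCC", "CCCCCCCG"]
print(filter_copies(example, threshold=1))  # → ['AAAAAAAA', 'CCCCCCCC']
```

Single linkage matches the criterion in the text, under which a cluster is distinct only if it differs from its nearest neighbor by more than the cut-off.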
Because of the considerable phylogenetic depth of our study (an ingroup sample of 24 species selected from across a clade with a crown age of approximately 90 myr; Schuettpelz and Pryer 2009), our goal was not to identify each of the segregating alleles, but rather to capture all of the major copy-types present in each accession. We therefore constructed an exon-only alignment of the sequences remaining from each sample (independently for each sample) and inferred maximum parsimony trees from that alignment through a branch-and-bound search in PAUP* v4.0b10 (Swofford 2002). From the unrooted most parsimonious trees (or, when multiple trees were found, consensus trees), we designated as distinct all clusters of sequences that differed from their nearest neighbors by more than 10 substitutions. The 10-substitution cut-off was chosen ad hoc, based on preliminary examinations of the trees; other cut-off values (from 5 to 20 substitutions) gave similar results (data not shown). From each cluster we selected the sequence that was closest to the geometric center of the cluster (the least apomorphic sequence) and discarded the others. We then constructed a large exon-only alignment comprising our partially filtered set of new sequences, three well-characterized sequences from the fern genus Cystopteris, and a set of previously published sequences of known copy type from Martin et al. (1993), Meyer-Gauen et al. (1994), and Schuettpelz et al. (2008). We analysed this data set using ML in PAUP* (Swofford 2002), under a GTR+I+G model with parameters fixed at values obtained from jModelTest 0.1.1 (Posada 2008). This search used TBR branch swapping and was repeated 100 times from independent random-addition-sequence starting trees.
To assess support, we also performed 500 ML bootstrap replicates, with the same settings as above, but with each search performed from

Tree Reconstruction

We analysed each of the six single-gene data sets (atp1, atpA, atpB, gapCp, nad5, and rbcL), as well as the nad5 recoded indels data set, using ML and Bayesian approaches. ML analyses of the single-gene data sets were conducted with RAxML v7.2.6 (Stamatakis 2006), using the GTR+G model of sequence evolution and the option to conduct a rapid bootstrap analysis (1000 replicates) and a search for the best-scoring tree in a single program run (Stamatakis et al. 2008). The indel data set was analysed in a similar fashion but under BINGAMMA, a likelihood model for binary data (Lewis 2001) that accommodates rate heterogeneity. Bayesian analyses were executed in MrBayes v3.1.1 (Ronquist and Huelsenbeck 2003), using the GTR+G model for genes and the STANDARD+G model for recoded indels. We conducted four independent runs, each with four chains, for 10 million generations. We sampled trees every 1000 generations and assessed convergence by examining the standard deviation of split frequencies within the output and by plotting parameter values in Tracer v1.5 (Rambaut and Drummond 2007). We conservatively excluded the first 2.5 million generations of each run as burn-in and computed a majority-rule consensus from the pooled trees using the “sumt” command. We analysed the combined data sets (plastid data, mitochondrial data, all genes, all data) as above, allowing for partition-specific parameter estimates.

Likelihood Analysis of Nucleotide Models

To investigate lineage-specific rate heterogeneity in our data, within and among loci and across genomic compartments, we adopted a model comparison approach using the program baseml of the PAML v4.4e package (Yang 2007).
We selected eight models of particular biological interest for comparison (Table 1), each of which incorporates a GTR+G substitution model with independent parameter estimates for each included partition. Our models vary in whether branch rates are considered to be proportional (vs. independent) among partitions, the presence of a molecular clock, and (if present) how such a molecular clock is enforced. All of our analyses used a fixed topology, obtained via phylogenetic analysis of our combined data set (Fig. 1c). Because we were interested in relative rather than absolute rates, when a timescale was necessary we fixed the stem age of the ingroup at 10 arbitrary time units (rate estimates are thus in units of substitutions per site per arbitrary time unit). We repeated each analysis 10 times, independently, to avoid results based on suboptimal peaks in the likelihood surface. Here, we report results from the runs with the highest likelihoods (results from all runs are available at the Dryad repository, doi:10.5061/dryad.c5m42, summarized from the PAML outputs using the Python script PAMLparser; Supplementary Appendix S1).

Three of our chosen models do not incorporate a molecular clock. The first, bas1, is the only model investigated that links substitution parameters across partitions; in addition, it requires that branch lengths be proportional across partitions (Table 1). Model bas2 is the most parameter-rich model and is equivalent to analysing each partition independently: only the fixed topology is shared among partitions, and branch lengths are not required to be proportional. The third clockless model (bas4) adds a proportionality requirement: partitions can be faster or slower overall, but individual branch lengths must remain proportional. We also examined two models in which a global molecular clock is enforced.
The more complex of the two (bas3, Table 1) is, as in bas2, equivalent to analysing each partition independently, but here each partition has a global clock. The simpler model (bas5, Table 1) additionally requires a shared set of branch times while allowing the global clock rate to vary among partitions. Finally, we selected three models that incorporate local molecular clocks. The most complex of these models (bas8, Table 1) treats each partition as fully independent, including branch times and rates. The simpler models both require a shared set of branch times, but differ in how local clock rates vary from partition to partition. One (bas7) allows each partition to have its own unique set of clock rates. The other (bas6) requires the local clock rates to be proportional among partitions (if a given local clock is fast in one partition, it must be fast in all). In our local clock analyses (models bas6, bas7, and bas8) we chose to compare four distinct regimes (Fig. 2). Each regime assigns a unique rate parameter (clock) to the outgroup branches and deep ingroup branches following the “nuisance parameter” arguments of Lanfear (2010). The first three regimes allocate a second rate to the cheilanthoids, Adiantum, and the vittarioids, respectively, leaving the third rate to be shared by the remaining two clades. The fourth regime gives each of the major clades its own rate.

indel model (Redelings and Suchard 2007) was applied to the three intron partitions, which shared indel model parameters (there were no gaps within the exon partitions, so no indel model was applied to them; Gaya et al. 2011). Priors were set at their default values and seven independent chains were run, each for 100,000 generations, and sampled every 10 generations. Inspection of parameter traces (Suchard and Redelings 2006; Rambaut and Drummond 2007) indicated that each chain converged (to the same area of parameter space) by 8000 generations. To be conservative, we excluded the first 10,000 generations of each run, and pooled the remaining generations across runs before computing the posterior. For all parameters, the pooled effective sample sizes were >2500. The maximum posterior decoding alignment (the alignment that has the maximum sum of column posteriors, weighted by the number of nucleotides in that column; Redelings, personal communication) from each partition was used to produce an alignment for the complete included region of gapCp. Sites for which only one taxon had a character state other than a gap were excluded prior to subsequent analyses; all other sites were retained. The alignments, and associated trees, are available in TreeBASE (accession S14194).

TABLE 1. Nucleotide models used

Model | Clock(s) | Substitution parameters(a) | Branch lengths(a) | Branch rates(a) | Branch times(a) | baseml clock | baseml mgene
bas1 | None | Shared | Proportional | NA | NA | 0 | 0
bas2 | None | Independent | Independent | NA | NA | 0 | 1
bas3 | Global | Independent | NA | NA | Independent | 1 | 1
bas4 | None | Independent | Proportional | NA | NA | 0 | 4
bas5 | Global | Independent | NA | NA | Shared | 1 | 4
bas6 | Local | Independent | NA | Proportional | Shared | 2 | 4
bas7 | Local | Independent | NA | Independent | Shared | 3 | 4
bas8 | Local | Independent | NA | Independent | Independent | 2 | 1

Parameter counts (substitution parameters: exchange, base frequency, gamma shape; time parameters: branch lengths, rate scalars)

Model | Exchange | Base frequency | Gamma shape | Branch lengths | Rate scalars | Total
bas1 | 5 | 3 | p | 2n−3 | p−1 | 2n+2p+4
bas2 | 5p | 3p | p | p(2n−3) | 0 | 2np+6p
bas3 | 5p | 3p | p | p(n−1) | 0 | np+8p
bas4 | 5p | 3p | p | 2n−3 | p−1 | 2n+10p−4
bas5 | 5p | 3p | p | n−1 | p−1 | n+10p−2
bas6 | 5p | 3p | p | n−1 | (p−1)+(c−1) | n+10p+c−3
bas7 | 5p | 3p | p | n−1 | c(p−1)+(c−1) | cp+n+9p−2
bas8 | 5p | 3p | p | p(n−1) | p(c−1) | np+cp+7p

Descriptions and parameter counts for the general nucleotide substitution models used in this study. NA, not applicable. For the parameter counts: n, number of tips (taxa); p, number of partitions in the analysis; and c, number of predefined local clocks. (a) Among partitions.
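The Table 1 totals and the AICc criterion used to compare the nucleotide models can be combined in a short sketch. With k free parameters, maximized log-likelihood lnL, and sample size N (following the text, the number of alignment sites), AICc = −2 lnL + 2k + 2k(k + 1)/(N − k − 1). The function names are hypothetical; any log-likelihoods passed in would come from baseml output, and none from this study are reproduced here.

```python
# Total free parameters from Table 1 (n taxa, p partitions, c local clocks).
TABLE1_TOTALS = {
    "bas1": lambda n, p, c: 2 * n + 2 * p + 4,
    "bas2": lambda n, p, c: 2 * n * p + 6 * p,
    "bas3": lambda n, p, c: n * p + 8 * p,
    "bas4": lambda n, p, c: 2 * n + 10 * p - 4,
    "bas5": lambda n, p, c: n + 10 * p - 2,
    "bas6": lambda n, p, c: n + 10 * p + c - 3,
    "bas7": lambda n, p, c: c * p + n + 9 * p - 2,
    "bas8": lambda n, p, c: n * p + c * p + 7 * p,
}


def aicc(lnl, k, n_sites):
    """Small-sample corrected AIC; n_sites is the number of alignment sites."""
    if n_sites - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n_sites > k + 1")
    return -2.0 * lnl + 2.0 * k + 2.0 * k * (k + 1) / (n_sites - k - 1)


def delta_aicc(scores):
    """AICc differences from the best (lowest-scoring) model."""
    best = min(scores.values())
    return {model: score - best for model, score in scores.items()}


# Parameter counts for 26 taxa, six partitions, and a four-clock regime.
for model, total in TABLE1_TOTALS.items():
    print(model, total(26, 6, 4))
```

A quick consistency check on the table: with a single partition (p = 1), the options that link or unlink partitions have nothing to act on, so the clockless models bas1, bas2, and bas4 all reduce to 2n + 6 parameters, the global-clock models bas3 and bas5 to n + 8, and the local-clock models bas6 to bas8 to n + c + 7. Under the rule of thumb cited in the text, models with a `delta_aicc` value of 4 or more have considerably less support than the best model.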
For each of the eight general models, the data were analysed under an unpartitioned scheme, a three-partition scheme (one partition for each of the three genomic compartments), and a six-partition scheme (one partition for each locus). Model fit was evaluated using the small-sample correction of the Akaike Information Criterion (AICc; Akaike 1974; Hurvich and Tsai 1989), which converges to the AIC as sample size increases and is less prone to selecting overparameterized models when sample size is small (Burnham and Anderson 2004). Smaller AICc scores indicate better fit and, as a general rule of thumb, any model with an AICc score four or more points above that of the best-fitting model has considerably less support (Burnham and Anderson 2002). For all models, the sample size for the AICc calculation was taken to be the number of nucleotide characters (sites) in the alignment.

Likelihood Analysis of Codon Models

To evaluate whether any lineage-specific rate heterogeneity might be explained by the effects of selection (at the protein level) on our focal loci, we also performed a series of analyses using the program codeml of the PAML v4.4e package (Yang 2007). We selected four codon models (Goldman and Yang 1994) of particular biological interest for comparison (Table 2), each with a parameter (κ) representing the transition:transversion ratio, a second parameter (ω) representing the nonsynonymous:synonymous substitution ratio (i.e., dn:ds), and codon frequencies calculated from the nucleotide frequencies at the three codon positions. This basic model forms the foundation for our models cod1 through cod4 (Table 2), all of which use the fixed (unrooted) topology obtained via phylogenetic analysis of our combined data set (Fig. 1c). The first model (cod1, Table 2) is a basic branch model (Yang 1998; Yang and Nielsen 1998), in which the phylogeny is divided, a priori, into groups of branches that are each allocated their own ω parameter.
We evaluated four versions of this model, each with a different clade regime (Fig. 2). Our second model (cod2) is equivalent to model M2a of Wong et al. (2004) and Yang et al. (2005) and is a basic site model (Nielsen and Yang 1998; Yang et al. 2000), in which the likelihood of the data is computed given a fixed number of site classes (analogous to the discrete rate categories used to model gamma-distributed rate variation in nucleotide models), each with its own ω parameter. Model cod2 has three site classes (with ω0 < 1, ω1 = 1, and ω2 > 1, respectively) and four ω-related parameters: the proportion of sites fitting ω0, the proportion of sites fitting ω1, and the ω0 and ω2 parameters themselves. Because the proportions must sum to 1, the proportion of sites fitting ω2 is not a free parameter, and neither is ω1, which is fixed at 1. The third model (cod3; model M3 in codeml) is very similar, but allows ω1 to vary between zero and infinity, rather than being fixed at 1, and thus has an additional free parameter. The final model we considered (cod4) is model D of Bielawski and Yang (2004).

[Figure 1: panels a–c, ML phylograms (image not reproduced).]
FIGURE 1. ML phylograms. a) From the individual loci: (i) atpA; (ii) atpB; (iii) rbcL; (iv) gapCp; (v) atp1; and (vi) nad5. b) From the data combined by compartment: (i) plastid; and (ii) mitochondrion. c) All data. Branch lengths are in substitutions/site. Bold branches are strongly supported (≥70% ML bootstrap support and ≥0.95 posterior probability). Bootstrap support values are shown above the branch and posterior probabilities below. Asterisks (*) indicate 100% bootstrap support or 1.0 posterior probability.
This clade model is a variation of the general branch-site family of models (Yang and Nielsen 2002) that allow heterogeneous dn:ds ratios across codons and also across branches of the phylogeny. With three site classes, cod4 incorporates two parameters corresponding to the proportion of sites optimized under the first two site classes (as in cod2, the third proportion is not a free parameter).

[Figure 2: panel (i), plastid and mitochondrial data, regimes 1–4 (clades Outgroup, C, A, V); panel (ii), nuclear data, regimes nu1–nu4; shading legend: local clocks 0–3.]
FIGURE 2. Clock/clade regimes. The four “clock” (for nucleotide analyses) or “clade” (for the codon analyses) regimes used for the plastid and mitochondrial data (i) and the nuclear data (ii). For each regime, branches of a given shade share rate or ω parameters (depending on the analysis), whereas branches with different shades get their own individual parameters. Regimes 1–3 each have three rate or ω parameters; regime 4 has four. Clade name abbreviations follow Figure 1c.

TABLE 2. Codon models used

Model | ω variability | codeml model | codeml NSsites | Exchange | Codon frequency | Branch lengths | Total
cod1 | Among branches | 2 | NA | p(c+1) | 9p | p(2n−3) | p(2n+c+7)
cod2 | Among sites(a) | 0 | 2 | 5p | 9p | p(2n−3) | p(2n+11)
cod3 | Among sites(b) | 0 | 3 | 6p | 9p | p(2n−3) | p(2n+12)
cod4 | Among branches and sites | 3 | 3 | p(c+5) | 9p | p(2n−3) | p(2n+c+11)

Descriptions and parameter counts for the general codon models used in this study. Exchange and codon frequency are substitution parameters. NA, not applicable. For the parameter counts: n, number of tips (taxa); p, number of partitions in the analysis; c, number of predefined “clades” for the branch-site analyses. (a) With two free ω parameters (ω0 < 1; ω1 = 1; ω2 > 1). (b) With three free ω parameters (ω0 < 1; ω1 ≥ 0; ω2 > 1).
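Because codeml cannot run partitioned analyses directly, each partition was run separately and the log-likelihoods and parameter counts were summed. A minimal sketch of that bookkeeping, using the per-partition totals from Table 2 (the p = 1 case), is below. The function names and log-likelihood values are hypothetical; real values would be read from codeml output. Counting parameters partition by partition also accommodates partitions with different numbers of taxa, such as gapCp, which was obtained for 22 of the 26 taxa.

```python
def codon_params(model, n, c=0):
    """Free parameters for one partition (Table 2 with p = 1); n taxa, c predefined clades."""
    per_partition = {
        "cod1": 2 * n + c + 7,
        "cod2": 2 * n + 11,
        "cod3": 2 * n + 12,
        "cod4": 2 * n + c + 11,
    }
    return per_partition[model]


def combine_partitions(runs):
    """Sum (lnL, k) pairs from independent per-partition codeml runs."""
    total_lnl = sum(lnl for lnl, _ in runs)
    total_k = sum(k for _, k in runs)
    return total_lnl, total_k


# Hypothetical lnL values for two partitions with different taxon counts.
runs = [(-1000.0, codon_params("cod2", n=26)), (-2000.5, codon_params("cod2", n=22))]
print(combine_partitions(runs))  # → (-3000.5, 118)
```

The summed log-likelihood and parameter count can then be scored with the same AICc formula used for the nucleotide models.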
The ω for the first site class is allowed to take any value between 0 and 1; the ω for the second class can take any value between zero and infinity. Both of these site classes are applied across the full phylogeny. However, the ω parameter for the third site class is independently optimized for each of the predesignated clades and can also take any value between zero and infinity. We ran four versions of this model, each under a different clade regime, as we did for cod1 (Fig. 2). Because it is not possible to perform partitioned analyses directly in codeml, it was necessary to run each partition individually and sum the log-likelihoods and parameter counts. To avoid basing conclusions on results from suboptimal likelihood peaks, we repeated each analysis 10 times independently and report the run with the highest likelihood (results from all runs are available at the Dryad repository, doi:10.5061/dryad.c5m42). Data were again summarized from the PAML outputs using PAMLparser (Supplementary Appendix S1). Each model was run on the unpartitioned data, under a three-partition scheme (one partition for each of the three genomic compartments), and under a six-partition scheme (one partition for each locus). Model fit was evaluated as for the nucleotide models above.

RESULTS

Character Data

The single-gene alignments ranged from 1342 bp (atpB) to 2544 bp (nad5), with the 6-gene data set comprising 10,487 bp (Supplementary Table S3). An additional 82 characters were obtained through the scoring of the nad5 indels, resulting in a combined data set of 10,569 bp.
In atp1 there is a group II intron (studied by Wikström and Pryer 2005) that is present in Cryptogramma and Pityrogramma but absent in all other sampled species (apparently due to a single loss on the branch leading to the adiantoids and cheilanthoids); this intron was excluded prior to analysis. The nad5 group II intron identified by Vangerow et al. (1999), however, is present in all members of our sample and was included in our analyses.

Phylogeny

Among the phylogenies inferred from the six individual loci (the indel characters were included with the nad5 sequence characters), there are only two well-supported conflicts, both involving atp1 (Fig. 1a(v)). This gene places two species of Adiantum (A. peruvianum and A. tetraphyllum) as sister to the remainder of the ingroup, with strong support, rather than with the other species of Adiantum. This gene also places Rheopteris (rather than Anetium) as sister to Vittaria. These incongruences are not the result of misidentification (the same extractions were used for all loci) or of laboratory error (the sequences involved are clearly adiantoid, and all sequenced members of that clade are uniquely represented in the tree). We therefore attributed the incongruence to stochastic variation in the atp1 signal (e.g., see Weisrock 2012; Rothfels et al. 2013b) and concatenated the loci to infer the compartment-specific (Fig. 1b) and global (Fig. 1c) ML phylograms (see Materials and Methods section). All nodes in the phylogeny resulting from ML analysis of the combined data set are well supported by both ML bootstrapping (≥70% bootstrap support) and Bayesian inference (≥0.95 posterior probability; Fig. 1c). This includes strong support for each of the three major subclades of interest (cheilanthoids, vittarioids, and Adiantum, labeled C, V, and A, respectively; Fig. 1c), with Adiantum and the vittarioids sister to each other, and that clade sister to the cheilanthoids.
Bayesian Analysis

To further dissect patterns of rate heterogeneity, while avoiding the need to assign local clocks or even a topology a priori, we analysed our data in a Bayesian framework using BEAST v1.6.1 (Drummond and Rambaut 2007; Drummond and Suchard 2010). BEAST offers two relaxed clock models that are of particular interest here: one with uncorrelated lognormally distributed branch rates and another with randomly assigned local clocks (Drummond et al. 2006; Drummond and Suchard 2010). The six loci (the nad5 indel data set was not included) were analysed individually and in combination using each of these two models. For the partitioned analyses, substitution parameters were unlinked across partitions, while clock parameters and topologies were linked. Monophyly was enforced for four taxon sets: (i) all taxa except for Cryptogramma (in effect, rooting the tree); (ii) all taxa except for Cryptogramma and Pityrogramma; (iii) the cheilanthoids; and (iv) Adiantum (it was never necessary to enforce the monophyly of the vittarioids). In all cases, we employed a GTR+G substitution model and a birth–death tree prior, with the average clock rate fixed at 1.0. Priors were left at their default values with the exception of birthDeath.meanGrowthRate, which was given a uniform prior between 1 and 100, and birthDeath.relativeDeathRate, which was given a uniform prior between 0 and 2. Convergence and effective sample sizes were assessed in Tracer v1.5 (Rambaut and Drummond 2007). For each data set, the lognormal uncorrelated relaxed clock (LURC) analyses were run four times independently, for 50 million generations each, with the chain sampled every 2000 generations. These runs converged relatively rapidly. The first 10 million generations were (very conservatively) discarded as burn-in prior to summarizing the posterior.
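The LURC model can be pictured as drawing each branch's rate independently from a lognormal distribution whose mean equals the fixed average clock rate of 1.0. The Python sketch below illustrates only that idea. It assumes the lognormal is specified by a log-scale mean mu and standard deviation sigma and sets mu = −sigma²/2 so that the expected rate is 1; the function name and the sigma value are hypothetical, and BEAST's own parameterization is not reproduced here.

```python
import random


def lognormal_branch_rates(n_branches: int, sigma: float, seed: int = 1) -> list:
    """Draw i.i.d. branch rates from a lognormal whose mean is fixed at 1.0.

    If log(rate) ~ Normal(mu, sigma**2), then E[rate] = exp(mu + sigma**2 / 2),
    so a mean rate of 1.0 requires mu = -sigma**2 / 2.
    """
    rng = random.Random(seed)
    mu = -0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


# A rooted tree of 26 tips has 2 * 26 - 2 = 50 branches; sigma is arbitrary here.
rates = lognormal_branch_rates(50, sigma=0.5)
# A branch's expected length (substitutions/site) is its rate times its duration.
```

Because each rate is drawn independently, nothing ties a branch's rate to that of its parent, so under this model an acceleration can be concentrated on a single branch rather than spread over a clade.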
The random local clock (RLC) analyses generally took longer to converge and were run 7 times independently for 100 million generations, with chains sampled every 25,000 generations. Even so, some individual runs failed to converge. Of the 7 runs per data set, at least 5 converged in all cases, and the burn-in period for the converged runs ranged from 20 million to 80 million generations. The total post-burn-in sample sizes ranged from 10,400 samples (for the concatenated mitochondrial data) to 19,000 samples (for the concatenated full data).

Likelihood Analysis of Nucleotide Models

Optimizations of branch lengths for the individual-locus data sets (Fig. 3a–c) and the individual-compartment data sets (Fig. 3c–e) on the combined topology reveal a clear trend toward longer branches for the vittarioid taxa. However, this trend is not absolute (Fig. 3a,e), and there is substantial branch length variation within each of the major clades.

FIGURE 3. Branch lengths optimized on consensus topology. Branch lengths of the mitochondrial loci (a), the plastid loci (b), the nuclear locus (c), and the combined plastid (d) and combined mitochondrial (e) data, optimized on the topology from the combined data (Fig. 1c). Branch length bars each indicate 0.03 substitutions/site/arbitrary time unit.

TABLE 3. Fit of the full data on the nucleotide models

Model | Partitioning | Clock regime(a) | Parameter count | lnL | AICc
bas1 | None | NA | 58 | −41,813.15 | 83,743.19
bas3 | None | NA | 34 | −42,229.79 | 84,527.90
bas1 | By locus | NA | 68 | −41,199.80 | 82,536.84
bas2 | By locus | NA | 348 | −40,515.34 | 81,766.80
bas3 | By locus | NA | 204 | −41,156.48 | 82,734.52
bas4 | By locus | NA | 108 | −40,995.03 | 82,209.16
bas5 | By locus | NA | 83 | −41,436.10 | 83,040.04
bas1 | By compartment | NA | 62 | −41,208.81 | 82,542.64
bas2 | By compartment | NA | 174 | −40,722.08 | 81,804.69
bas3 | By compartment | NA | 102 | −41,286.15 | 82,780.57
bas4 | By compartment | NA | 78 | −41,058.03 | 82,273.67
bas5 | By compartment | NA | 53 | −41,501.64 | 83,110.04
bas6 | By compartment | 1 | 56 | −41,361.23 | 82,835.29
bas6 | By compartment | 2 | 56 | −41,446.50 | 83,005.84
bas6 | By compartment | 3 | 56 | −41,275.96 | 82,664.75
bas6 | By compartment | 4 | 57 | −41,238.62 | 82,592.10
bas7 | By compartment | 1 | 60 | −41,350.65 | 82,822.26
bas7 | By compartment | 2 | 60 | −41,331.80 | 82,784.57
bas7 | By compartment | 3 | 60 | −41,167.16 | 82,455.29
bas7 | By compartment | 4 | 63 | −41,095.55 | 82,318.16
bas8 | By compartment | 1 | 108 | −41,121.89 | 82,462.89
bas8 | By compartment | 2 | 108 | −41,188.82 | 82,596.74
bas8 | By compartment | 3 | 108 | −41,037.67 | 82,294.46
bas8 | By compartment | 4 | 111 | −40,984.41 | 82,195.87

NA, not applicable.
(a) Clock regimes correspond to those in Figure 2.

Our model comparison analyses are more illuminating than a simple inspection of branch lengths. The best performing model, by far, is the most heavily parameterized (bas2), partitioned by locus (Table 3; Fig. 4), despite a strong penalty by the AICc. After bas2, the best fitting of the non-local-clock models is bas4, followed by bas1, bas3, and finally bas5 (Fig. 4). The differences in fit among these models are dramatic. On average, each model had an improvement of more than 300 AICc points over the next best performing model, nearly two orders of magnitude greater than the rule-of-thumb difference for a "considerable" AICc improvement (Burnham and Anderson 2002). The three best performing models are all clockless (tips are not constrained to be contemporaneous), differing from each other only in the degree to which parameters are shared across partitions.
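The ranking and the average gap reported above can be checked directly against the by-locus rows of Table 3. The sketch below does that in Python. The `aicc` helper is the standard small-sample correction, AICc = AIC + 2k(k + 1)/(n − k − 1), and is included for reference only, because the sample size n behind Table 3 is not reported alongside it; the variable names are illustrative.

```python
def aicc(lnl: float, k: int, n: int) -> float:
    """Small-sample corrected AIC for k free parameters and sample size n."""
    return 2 * k - 2 * lnl + 2 * k * (k + 1) / (n - k - 1)


# AICc of the general nucleotide models partitioned by locus (Table 3).
by_locus = {"bas1": 82536.84, "bas2": 81766.80, "bas3": 82734.52,
            "bas4": 82209.16, "bas5": 83040.04}

ranking = sorted(by_locus, key=by_locus.get)  # smaller AICc = better fit
gaps = [by_locus[worse] - by_locus[better]
        for better, worse in zip(ranking, ranking[1:])]
mean_gap = sum(gaps) / len(gaps)

print(ranking)              # ['bas2', 'bas4', 'bas1', 'bas3', 'bas5']
print(round(mean_gap, 2))   # 318.31
print(round(mean_gap / 4))  # 80: ratio to the 4-point "considerable" threshold
```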
The best performing model (bas2) has no shared parameters, the next best (bas4) has no shared parameters except that branch lengths are constrained to be proportional across partitions, and the worst fitting of the clockless models (bas1) has proportional branch lengths and shared substitution parameters (Table 3; Fig. 4). Under a given model, the unpartitioned runs had very poor fit, and partitioning by locus (six partitions) always resulted in better fit than did partitioning by compartment (three partitions; Fig. 4(i)). With regard to partitioning, the smallest AICc difference (5.80) was for bas1, which includes relatively few additional parameters for each new partition (Tables 1, 3) but is nonetheless above the "considerable" improvement cutoff of four AICc points. Overall, the effect of altering the partitioning scheme was much smaller than that of changing the model. For example, the six-partition version of a model never outperforms the three-partition version of the next best model (such models differ by an average of over 250 AICc points; Fig. 4(ii)).

Both of the local clock models (bas6 and bas7), partitioned by compartment, fit the combined data better than a comparable global clock model (bas5) and worse than a comparable clockless model (bas4; Table 3; Fig. 5). The more highly parameterized bas7 consistently outperformed bas6, which forces branch lengths to be proportional across partitions (Fig. 5). The effects of the different local clock regimes on the two models are very similar (Fig. 5), with only one notable deviation: clock regime 2 (cheilanthoids and vittarioids sharing a local clock) was a worse fit than clock regime 1 (Adiantum and vittarioids sharing a local clock) under bas6, but a better fit under bas7. That said, local clock regimes 1 and 2 both perform very poorly for the combined data, and there is a strong increase in model fit (decrease in AICc) moving to regime 3, the first to allow the vittarioids their own clock (Fig. 2). A smaller (but still considerable) improvement in fit is seen under clock regime 4, which gives each of the major clades its own clock.

FIGURE 4. Model fit of the full data on the general nucleotide models. Units are in AICc points; smaller values indicate better fit. i) Fit of unpartitioned data (a) versus the same data partitioned by compartment (three partitions; b), and by locus (six partitions; c). ii) Fit of the partitioned analyses. The small tree icons in part (ii) indicate whether the model is clockfree (unrooted) or has some form of molecular clock enforced.

Model fit for the compartment-level data, individually optimized, reveals that much of the difference in fit under the local clock regimes is driven by the strong improvement in fit of the plastid data under clock regimes 3 and 4 (Table 4; Fig. 6). The nuclear data show a similar pattern, but the mitochondrial data are very different, fitting worst under clock regime 3 (which gives the vittarioids their own clock, lumping Adiantum and the cheilanthoids together) and best under regime 1 (which gives the cheilanthoids their own clock, with Adiantum and the vittarioids together; Table 4; Fig. 6).
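The local clock comparisons can be recovered in the same way from the compartment-partitioned rows of Table 3. This Python sketch (variable names illustrative) expresses each regime's AICc as an improvement over the global clock (bas5) and computes the regime-to-regime differences for bas7.

```python
# AICc of the compartment-partitioned models (Table 3), by clock regime (Fig. 2):
# regime 1 = Adiantum + vittarioids share a clock, regime 2 = cheilanthoids +
# vittarioids share, regime 3 = vittarioids alone, regime 4 = one clock per clade.
GLOBAL_CLOCK = 83110.04  # bas5
local = {
    "bas6": {1: 82835.29, 2: 83005.84, 3: 82664.75, 4: 82592.10},
    "bas7": {1: 82822.26, 2: 82784.57, 3: 82455.29, 4: 82318.16},
}

for model, by_regime in local.items():
    # Positive values are AICc decreases (improvements) relative to bas5.
    gains = {r: round(GLOBAL_CLOCK - a, 2) for r, a in by_regime.items()}
    print(model, gains, "best regime:", min(by_regime, key=by_regime.get))

bas7 = local["bas7"]
print(round(bas7[1] - bas7[3], 2))  # 366.97: vittarioids lumped with Adiantum
print(round(bas7[2] - bas7[3], 2))  # 329.28: vittarioids lumped with cheilanthoids
print(round(bas7[3] - bas7[4], 2))  # 137.13: regime 4 over regime 3
```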
FIGURE 5. Comparison of the fit of the local clock models (bas6 and bas7) to the global clock (bas5) and clockfree (bas4) models. Units are in AICc points; smaller values indicate better fit. The data under all models are partitioned by compartment (three partitions). The clock regimes follow Figure 2.

The local clock rate estimates under clock regime 4 (each major clade given its own clock) help explain the discrepancy in fit among the compartments. In the plastid and nuclear genomes, the cheilanthoids have the slowest rate, followed by Adiantum, and finally the vittarioids, with a particularly high rate (Table 4; Fig. 7). The vittarioids have a plastid rate that is 2.73 and 5.82 times those of Adiantum and the cheilanthoids, respectively. For the nuclear data, the numbers are similar: the vittarioids are 2.46 times faster than Adiantum and 3.08 times faster than the cheilanthoids. The mitochondrial data, however, show nearly identical rates for Adiantum and the vittarioids, with both being faster than the cheilanthoids (3.16 and 3.07 times faster, respectively; Table 4; Fig. 7). The rate increase seen for the vittarioids in the plastid and nuclear data thus also includes Adiantum in the mitochondrial data, explaining the better fit of the mitochondrial data to the local clock regime that allows Adiantum and the vittarioids to share a clock (regime 1). These rate estimates are derived from data that, within a given compartment, were not partitioned.
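The fold differences quoted above are ratios of the regime 4 local clock rates in Table 4. In the Python sketch below, the mapping of clocks 1 to 3 onto the cheilanthoids, Adiantum and the vittarioids follows the clade labels of Figure 7, and the function name is illustrative; the recomputed ratios agree with the values in the text to within about 0.01.

```python
# Local clock rates under clock regime 4 (bas8; Table 4), in substitutions per
# site per arbitrary time unit; clocks 1-3 are the cheilanthoids, Adiantum and
# the vittarioids, following the clade order of Figure 7.
RATES = {
    "plastid":       {"cheilanthoids": 0.007547, "Adiantum": 0.016098, "vittarioids": 0.043965},
    "mitochondrial": {"cheilanthoids": 0.002544, "Adiantum": 0.008053, "vittarioids": 0.007822},
    "nuclear":       {"cheilanthoids": 0.017868, "Adiantum": 0.022348, "vittarioids": 0.055020},
}


def fold_change(compartment: str, fast: str, slow: str) -> float:
    """How many times faster clade `fast` evolves than clade `slow`."""
    rates = RATES[compartment]
    return rates[fast] / rates[slow]


for compartment in RATES:
    v_a = fold_change(compartment, "vittarioids", "Adiantum")
    v_c = fold_change(compartment, "vittarioids", "cheilanthoids")
    print(f"{compartment}: vittarioids/Adiantum = {v_a:.2f}, "
          f"vittarioids/cheilanthoids = {v_c:.2f}")
```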
To ensure that heterogeneity among codon positions within a compartment is not biasing our estimates (Brandley et al. 2005; Shapiro et al. 2005; Lanfear et al. 2012; Rothfels et al. 2013a), we additionally ran these analyses with the data partitioned by codon position (one partition for each codon position, and one for noncoding characters, if present). The two model types (unpartitioned vs. partitioned by codon position) gave very similar rate estimates for all compartments (data not shown).

FIGURE 6. Fit of the compartment data to the local clock models (under bas8). For each combination of compartment and clock regime, the fit scores are standardized against the fit of the same data under the global clock model. Units are in AICc points, with larger values indicating bigger improvements in fit for the local clock model. The clock regimes follow Figure 2.

Likelihood Analysis of Codon Models

As with the nucleotide models, the introduction of partitions in the codon models had a strong effect on AICc scores (Table 5; Fig. 8(i)). However, three-partition schemes (by compartment) generally fit better for the codon models than do the more complex six-partition schemes (by locus; Table 5; Fig. 8(i),(ii)). That said, the effect of the model (e.g., cod1 vs. cod2) was still much stronger than that of the partitioning scheme (three vs. six partitions). The branch model (cod1), the only model that did not allow for differences in selection pressure among codons, performed very poorly (at least 1500 AICc points worse than the other models; Table 5; Fig. 8(ii)), even though it accommodates lineage-specific differences. The three remaining models each incorporate three site classes to accommodate selection differences among codons.
The worst performing of these site models was the simplest (cod2), which was over 100 AICc points poorer than either of the other two models. Those remaining models (cod3 and cod4) had very similar AICc scores on the full data, regardless of partitioning scheme (Table 5; Fig. 8). Using the cod4 model, the three genomic compartments have approximately parallel responses in fit to the different clade regimes (Table 6; Fig. 9). In each case, clade regime 4 (the one that gives each clade its own ω parameter) was the worst fitting, with the other three regimes being very close in performance.

TABLE 4. Fit of the compartment data and rate estimates under local clocks

Partition | Clock regime(b) | Parameter count | lnL | AICc | Local clock rates(a): clock 0 | clock 1 | clock 2 | clock 3
Plastid | 1 | 36 | −24,722.411 | 49,517.480 | 0.006381 | 0.029328 | 0.007585 | NA
Plastid | 2 | 36 | −24,762.292 | 49,597.241 | 0.006618 | 0.018745 | 0.010018 | NA
Plastid | 3 | 36 | −24,629.441 | 49,331.538 | 0.006919 | 0.008387 | 0.025738 | NA
Plastid | 4 | 37 | −24,612.159 | 49,299.012 | 0.006363 | 0.007547 | 0.016098 | 0.043965
Mitochondrion | 1 | 36 | −9271.300 | 18,615.601 | 0.003151 | 0.007748 | 0.002504 | NA
Mitochondrion | 2 | 36 | −9295.614 | 18,664.228 | 0.003283 | 0.003935 | 0.006357 | NA
Mitochondrion | 3 | 36 | −9305.914 | 18,684.829 | 0.003269 | 0.005141 | 0.006218 | NA
Mitochondrion | 4 | 37 | −9271.283 | 18,617.623 | 0.003120 | 0.002544 | 0.008053 | 0.007822
Nuclear | 1 | 36 | −7128.177 | 14,331.477 | 0.027437 | 0.032859 | 0.017725 | NA
Nuclear | 2 | 36 | −7130.909 | 14,336.942 | 0.028789 | 0.030989 | 0.019710 | NA
Nuclear | 3 | 36 | −7102.319 | 14,279.761 | 0.027964 | 0.020127 | 0.051948 | NA
Nuclear | 4 | 37 | −7100.970 | 14,279.240 | 0.027612 | 0.017868 | 0.022348 | 0.055020

All values obtained using the bas8 nucleotide model. NA, not applicable.
(a) Rates are in number of substitutions per site per arbitrary time unit.
(b) Clock regimes correspond to those in Figure 2.

TABLE 5. Fit of the full data on the codon models

Model | Partitioning | Clade regime(a) | Parameter count | lnL | AICc
cod1 | By compartment | 1 | 178 | −31,125.31 | 62,617.53
cod1 | By compartment | 2 | 178 | −31,124.32 | 62,615.55
cod1 | By compartment | 3 | 178 | −31,123.62 | 62,614.16
cod1 | By compartment | 4 | 181 | −31,121.95 | 62,617.20
cod1 | By locus | 1 | 364 | −30,917.42 | 62,609.87
cod1 | By locus | 2 | 364 | −30,917.68 | 62,610.39
cod1 | By locus | 3 | 364 | −30,914.27 | 62,603.57
cod1 | By locus | 4 | 370 | −30,911.70 | 62,612.04
cod1 | None | 1 | 62 | −32,204.70 | 64,534.72
cod1 | None | 2 | 62 | −32,198.30 | 64,521.90
cod1 | None | 3 | 62 | −32,194.52 | 64,514.35
cod1 | None | 4 | 63 | −32,193.72 | 64,514.80
cod2 | By compartment | NA | 181 | −30,334.41 | 61,042.12
cod2 | By locus | NA | 370 | −30,133.79 | 61,056.23
cod2 | None | NA | 63 | −31,363.83 | 62,855.01
cod3 | By compartment | NA | 184 | −30,264.73 | 60,909.14
cod3 | By locus | NA | 375 | −30,077.03 | 60,954.07
cod3 | None | NA | 64 | −31,331.96 | 62,793.31
cod4 | By compartment | 1 | 190 | −30,256.20 | 60,904.87
cod4 | By compartment | 2 | 190 | −30,255.01 | 60,902.49
cod4 | By compartment | 3 | 190 | −30,254.78 | 60,902.02
cod4 | By compartment | 4 | 193 | −30,255.59 | 60,910.05
cod4 | By locus | 1 | 388 | −30,042.64 | 60,914.93
cod4 | By locus | 2 | 388 | −30,056.63 | 60,942.92
cod4 | By locus | 3 | 388 | −30,051.57 | 60,932.79
cod4 | By locus | 4 | 394 | −30,051.41 | 60,946.21
cod4 | None | 1 | 66 | −31,329.96 | 62,793.40
cod4 | None | 2 | 66 | −31,330.26 | 62,794.02
cod4 | None | 3 | 66 | −31,325.76 | 62,785.02
cod4 | None | 4 | 67 | −31,324.74 | 62,785.01

NA, not applicable.
(a) Clade regimes correspond to those in Figure 2.
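The codon-model comparisons can likewise be read off Table 5. The Python sketch below takes the lowest AICc for each model (over clade regimes and partitioning schemes) and checks the claims in the text: cod1 trails every other model by more than 1500 points, cod2 trails both cod3 and cod4 by more than 100 points, and the three-partition scheme usually beats the six-partition one. The data layout and names are illustrative.

```python
# AICc of the general codon models (Table 5), by partitioning scheme; cod1 and
# cod4 list one value per clade regime (1-4), cod2 and cod3 have no regimes.
TABLE5 = {
    "cod1": {"compartment": [62617.53, 62615.55, 62614.16, 62617.20],
             "locus":       [62609.87, 62610.39, 62603.57, 62612.04],
             "none":        [64534.72, 64521.90, 64514.35, 64514.80]},
    "cod2": {"compartment": [61042.12], "locus": [61056.23], "none": [62855.01]},
    "cod3": {"compartment": [60909.14], "locus": [60954.07], "none": [62793.31]},
    "cod4": {"compartment": [60904.87, 60902.49, 60902.02, 60910.05],
             "locus":       [60914.93, 60942.92, 60932.79, 60946.21],
             "none":        [62793.40, 62794.02, 62785.02, 62785.01]},
}

# Lowest AICc per model, and per (model, partitioning scheme).
best = {m: min(min(v) for v in s.values()) for m, s in TABLE5.items()}
best_by_scheme = {(m, k): min(v) for m, s in TABLE5.items() for k, v in s.items()}

# cod1 against the weakest of the other models' best scores.
print(round(best["cod1"] - max(best["cod2"], best["cod3"], best["cod4"]), 2))  # 1561.45
# cod2 against the weaker of cod3 and cod4.
print(round(best["cod2"] - max(best["cod3"], best["cod4"]), 2))  # 132.98
# Does the three-partition scheme beat the six-partition scheme for each model?
for m in TABLE5:
    print(m, best_by_scheme[(m, "compartment")] < best_by_scheme[(m, "locus")])
```

Only cod1 prints False in the last loop, which is why the text says the three-partition schemes "generally" fit better.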
This is yet another example of the data not supporting the most parameter-rich models (clade regime 4 has one more parameter than each of the other three regimes; Fig. 2). The mitochondrial data, if analysed separately, showed considerable support for the inclusion of lineage effects under most of the clade regimes (most of the mitochondrial AICc improvements are greater than 4; Fig. 9). In contrast, the nuclear data showed considerable support for the simpler model without lineage effects (differences are greater than 4 AICc points). The plastid data did not show considerable support for either model over the other (Fig. 9). For both the nuclear and plastid data, most sites were optimized by the software under site class 1 (the class constrained to have ω values <1; see Methods section), indicating strong purifying selection. In fact, the ML estimates of ω for those data were near zero (Table 6; Fig. 10). For each of the three compartments, relatively few codon sites were optimized under site class 2 (ω between zero and infinity), additional evidence for a preponderance of purifying selection. However, those few mitochondrial and nuclear sites that were optimized under site class 2 had ML ω estimates considerably above 1, suggesting strong positive selection at a small number of sites across the tree (Table 6; Fig. 10(ii)). The estimates for the ω parameters that were allowed to have different values for each clade (those for site class 3) show two main patterns. The first is that the biggest ω differences are between the outgroup and the ingroup clades, for both the mitochondrial data (a decrease in ω) and the plastid data (an increase in ω; Fig. 10(iii)). Presumably, a strong difference in selective pressure between the ingroup and outgroup is driving the high proportion of mitochondrial sites optimized under this site class (slightly over 50% of the codons; Fig. 10(i)).
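The values plotted in Figure 9 are differences of the per-compartment AICc scores in Table 6 (cod3 minus cod4). This Python sketch recomputes them and counts how many exceed the four-point threshold in each direction; the names are illustrative.

```python
# AICc per compartment under the site model (cod3) and the branch-site model
# (cod4, clade regimes 1-4), from Table 6.
COD3 = {"plastid": 46436.9, "mitochondrial": 9842.1, "nuclear": 4652.8}
COD4 = {
    "plastid":       [46433.9, 46433.6, 46433.1, 46435.2],
    "mitochondrial": [9836.7, 9834.5, 9835.1, 9838.9],
    "nuclear":       [4658.8, 4658.9, 4658.3, 4661.4],
}
CONSIDERABLE = 4.0  # AICc difference treated as considerable in the text

# Figure 9 plots cod3 minus cod4: positive values favour lineage effects.
improvement = {g: [round(COD3[g] - a, 1) for a in COD4[g]] for g in COD3}

for genome, diffs in improvement.items():
    favours_cod4 = sum(d > CONSIDERABLE for d in diffs)
    favours_cod3 = sum(d < -CONSIDERABLE for d in diffs)
    print(genome, diffs, favours_cod4, favours_cod3)
```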
The second broad pattern is that there is little change in the signature of selection among the three ingroup clades. The mitochondrial and nuclear loci have ω estimates for site class 3 of approximately 0.3, regardless of the clade. The plastid site class 3 ω estimate is approximately 1.25 (regardless of clade), but very few sites are allocated to this site class (Table 6; Fig. 10(i), (iii)).

FIGURE 7. Lineage-specific rates of evolution for each genomic compartment. Inferred substitution rates of the compartment data for each of the clades of clock regime 4 (Fig. 2), calculated under the bas8 model. Units are in substitutions per site per arbitrary time unit.

FIGURE 8. Model fit of the full data on the general codon models. Units are in AICc points; smaller values indicate better fit. i) Fit under unpartitioned data (a) versus the same data partitioned by compartment (three partitions; b), and by locus (six partitions; c). ii) Zoomed-in view of the fit of the partitioned analyses.

Bayesian Analyses

The Bayesian analyses under the LURC and RLC models yielded results that were complementary to those obtained through the likelihood analyses. For each of the LURC compartmental analyses, the vittarioid branches are generally reconstructed as having higher rates of molecular evolution (Fig. 11(i)). However, this increase is against a very heterogeneous background of rate variation (within each of the major clades there are both unusually fast branches and particularly slow ones), and much of the rate increase in the vittarioids maps to a single branch (the stem branch of the clade).
The RLC analyses reconstruct much more homogeneous patterns of rate variation (Fig. 11(ii)). Here, the vittarioids are uniformly fast, the cheilanthoids are generally slow, and Adiantum is somewhat intermediate. However, within that broad pattern, it is actually a cheilanthoid terminal branch (the one leading to Hemionitis) that has the highest rate. For the RLC analyses, the mean number of reconstructed rate shifts across the posterior sample is 8.06 ± 0.018, despite half of the prior density for the number of rate changes being on zero.

DISCUSSION

Phylogenetic Relationships

The most significant phylogenetic result of this study is the very strong support for a monophyletic Adiantum (Fig. 1c). Adiantum has an unusual degree of morphological consistency compared to other large fern genera and is defined by a unique character state: sporangia borne on, and limited to, the false indusium (Tryon et al. 1990). However, molecular phylogenetic studies have struggled to find support for its monophyly with respect to the vittarioid ferns, which are strikingly different morphologically (Gastony and Rollo 1995; Schuettpelz and Pryer 2007; Schuettpelz et al. 2007; Ruhfel et al. 2008; Bouma et al. 2010; Lu et al. 2011a,b). Our six-locus data set strongly supports the monophyly of Adiantum, with 94% ML bootstrap support and 1.0 posterior probability (Fig. 1c). Other relationships within the adiantoids (Adiantum plus the vittarioids) were mostly in accordance with earlier studies (Crane et al. 1995; Crane 1997; Schuettpelz et al. 2007; Ruhfel et al. 2008; Lu et al. 2011a,b), but better supported. We find novel support for a clade uniting the enigmatic Rheopteris (the least morphologically reduced of the vittarioids) and Monogramma acrocarpa, one of the most reduced species, a result that suggests a pattern of convergent morphological simplification in the vittarioids (Crane et al. 1995; Schuettpelz et al. 2007; Ruhfel et al. 2008).
FIGURE 9. Fit of the compartment data to the branch-site ("clade") models. The fit scores are standardized against the fit of the same data under the site model (values plotted are AICc score under cod3 minus AICc score under cod4). Units are thus in AICc points, but with larger (more positive) values indicating bigger improvements in fit for the branch-site model over the site model, and thus the magnitude of the influence of lineage effects. The entire line for the nuclear data is below zero because cod3 was a better fit for those data than was cod4 (and thus the AICc scores for cod4 were larger). The clade regimes follow Figure 2.

TABLE 6. Fit of the compartment data and parameter estimates for codon models

Genome | Model | Clade regime(a) | Parameter count | lnL | AICc | Proportions, site classes 1 / 2 / 3 | ω, site classes 1 / 2 (all clades)(b) | ω, site class 3: clade 0 / clade 1 / clade 2 / clade 3
Plastid | cod3 | NA | 64 | −23,153.4 | 46,436.9 | 0.841 / 0.119 / 0.040 | 0.004 / 0.207 | 1.148 / NA / NA / NA
Mitochondrial | cod3 | NA | 64 | −4854.4 | 9842.1 | 0.865 / 0.125 / 0.011 | 0.174 / 3.060 | 14.217 / NA / NA / NA
Nuclear | cod3 | NA | 56 | −2256.9 | 4652.8 | 0.730 / 0.239 / 0.031 | 0.018 / 0.320 | 1.919 / NA / NA / NA
Plastid | cod4 | 1 | 66 | −23,149.8 | 46,433.9 | 0.838 / 0.122 / 0.041 | 0.004 / 0.201 | 0.672 / 1.191 / 1.274 / NA
Plastid | cod4 | 2 | 66 | −23,149.7 | 46,433.6 | 0.837 / 0.122 / 0.041 | 0.004 / 0.200 | 0.673 / 1.176 / 1.296 / NA
Plastid | cod4 | 3 | 66 | −23,149.5 | 46,433.1 | 0.839 / 0.121 / 0.040 | 0.004 / 0.204 | 0.675 / 1.297 / 1.130 / NA
Plastid | cod4 | 4 | 67 | −23,149.5 | 46,435.2 | 0.839 / 0.121 / 0.040 | 0.004 / 0.203 | 0.675 / 1.284 / 1.307 / 1.130
Mitochondrial | cod4 | 1 | 66 | −4849.5 | 9836.7 | 0.404 / 0.087 / 0.509 | 0.000 / 4.831 | 1.048 / 0.333 / 0.418 / NA
Mitochondrial | cod4 | 2 | 66 | −4848.4 | 9834.5 | 0.117 / 0.010 / 0.873 | 3.195 / 15.104 | 0.457 / 0.168 / 0.129 / NA
Mitochondrial | cod4 | 3 | 66 | −4848.7 | 9835.1 | 0.116 / 0.010 / 0.874 | 3.230 / 15.195 | 0.459 / 0.146 / 0.150 / NA
Mitochondrial | cod4 | 4 | 67 | −4849.5 | 9838.9 | 0.404 / 0.087 / 0.509 | 0.000 / 4.832 | 1.049 / 0.418 / 0.336 / 0.328
Nuclear | cod4 | 1 | 58 | −2256.8 | 4658.8 | 0.731 / 0.031 / 0.238 | 0.018 / 1.923 | 0.328 / 0.302 / 0.352 / NA
Nuclear | cod4 | 2 | 58 | −2256.9 | 4658.9 | 0.727 / 0.031 / 0.242 | 0.018 / 1.913 | 0.324 / 0.302 / 0.338 / NA
Nuclear | cod4 | 3 | 58 | −2256.6 | 4658.3 | 0.721 / 0.031 / 0.247 | 0.017 / 1.923 | 0.315 / 0.337 / 0.251 / NA
Nuclear | cod4 | 4 | 59 | −2256.6 | 4661.4 | 0.722 / 0.031 / 0.247 | 0.017 / 1.923 | 0.315 / 0.342 / 0.333 / 0.252

NA, not applicable.
(a) Clade regimes correspond to those in Figure 2.
(b) The ω values for site classes 1 and 2 are shared by all clades and are listed once.

Broad relationships within the cheilanthoid ferns are becoming increasingly well understood (Gastony and Rollo 1995; Gastony and Rollo 1998; Kirkpatrick 2007; Prado et al. 2007; Schuettpelz et al. 2007; Zhang et al. 2007; Rothfels et al. 2008; Windham et al. 2009; Eiserhardt et al. 2011; Johnson et al. 2012), and our results are consistent with these studies. A notable finding for cheilanthoids here, however, is the strong support for Calciphilopteris ludens as a member of this clade (Fig. 1c). Earlier studies that included this taxon typically resolved it as sister to the rest of the cheilanthoids, but without support (Schuettpelz et al. 2007; Zhang et al. 2007), making it unclear whether it was more closely related to the cheilanthoids or to the adiantoids.
Certainly, morphology would place Calciphilopteris in the cheilanthoids (Tryon 1942), and that is where it was treated in a recent classification of the genus (Yesilyurt and Schneider 2010).

Patterns of Substitution Rate Heterogeneity

In our analyses, we find rates of molecular evolution to vary among taxa, loci, and genomic compartments. Consistent with other studies of vascular plants (Wolfe et al. 1987; Wolfe et al. 1989b), our mitochondrial data are more slowly evolving than are our plastid data (Supplementary Fig. S2), despite the fact that the plastid loci are all coding and much of the nad5 alignment is intron sequence. In turn, the plastid loci are themselves much more slowly evolving than is our low-copy nuclear locus (Supplementary Fig. S2). However, these results are coarse; the nuclear rate, in particular, is inferred from a single short locus, with a high proportion of missing data in the intron portions (Supplementary Table S3).

FIGURE 10. Inferences of selection pressure on genomic compartments, across taxa. i) Proportion of sites allocated to each site class, for each genomic compartment, under cod4 (a "clade" branch-site model). Estimates for ω (the ratio of nonsynonymous to synonymous substitutions) for (ii) site classes 1 and 2, and (iii) site class 3. The ω values for site class 3 are permitted to vary across predefined "clades" in the tree, in this case by the four clades of clade regime 4 (Fig. 2).
The lineage-specific and partition-specific differences we observe are strong, and their intersection (the variation, across data sets, of the variation, across taxa, in the rate of molecular evolution) is especially interesting. Overall, the vittarioids show a dramatically elevated rate of molecular evolution, evolving 4.3 times faster on average than the cheilanthoids (under a local clock model with all data linked, bas6 with clock regime 4; Dryad doi:10.5061/dryad.c5m42). This rate acceleration is even more pronounced when the plastid data are considered alone (Table 4; Fig. 7). Adiantum is also faster than the cheilanthoids, but generally much slower than the vittarioids. Local clock models that force the cheilanthoids and Adiantum to share a rate (with the vittarioids given their own) have a much better fit than those that force the vittarioids to share a rate with either the cheilanthoids or Adiantum (by 329 and 367 AICc points, respectively; bas7; Fig. 5; Table 3). However, it is important to note that local clock models that give each of the three major clades its own rate are still preferred over those that force any two to share a rate (clock regime 4 vs. clock regimes 1, 2, or 3; Fig. 2), by a minimum of 137 AICc points (Fig. 5; Table 3). Although within-clade rate variation can result in the superior fit of local clock models even when there is no difference in expected rates among the clades in question (Lanfear 2010), this is not driving our results. Under the Local Clock Permutation Test (Lanfear 2010; 1000 permutations), both the plastid and nuclear data reject the null model of no rate differences among the focal clades (p < 0.001 and p < 0.015, respectively), but the mitochondrial data do not.

From the Bayesian analyses, we reconstruct the vittarioid stem branch to have an average substitution rate that is 2.16 or 2.30 times that of the branch that gave rise to it (under the LURC and RLC models, respectively; Fig. 11).
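The logic of a permutation test such as the Local Clock Permutation Test can be sketched as follows. As we understand Lanfear (2010), the actual test refits local clock and global clock models on the tree for every random reassignment of taxa to clades and compares the resulting improvement in fit with the observed one. Refitting clock models is beyond a short example, so this Python sketch substitutes a toy statistic (the between-clade sum of squares of hypothetical per-taxon rate estimates) and shuffles clade labels while preserving clade sizes. The data, function names and statistic are all illustrative assumptions, not the test's implementation.

```python
import random
from statistics import mean


def between_clade_ss(values, labels):
    """Between-clade sum of squares: larger when clade means differ more."""
    grand = mean(values)
    total = 0.0
    for clade in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == clade]
        total += len(group) * (mean(group) - grand) ** 2
    return total


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """One-sided p-value from shuffling clade labels (clade sizes preserved)."""
    rng = random.Random(seed)
    observed = between_clade_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_clade_ss(values, shuffled) >= observed:
            hits += 1
    # Counting the observed labelling keeps p above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-taxon rate estimates for three clades (toy values only).
toy_rates = [1.0, 1.1, 0.9, 1.05, 1.5, 1.6, 1.4, 3.0, 3.2, 2.9, 3.1]
toy_clades = ["C"] * 4 + ["A"] * 3 + ["V"] * 4
print(permutation_p_value(toy_rates, toy_clades))  # small: clade means differ
```

Because the permutations keep clade sizes fixed, within-clade heterogeneity alone cannot produce a small p-value; only a real difference among clade means can.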
The acceleration of the vittarioids is noteworthy in that, for the subset of previous studies where polarity could be established, rate accelerations were less frequent than slowdowns (e.g., Soltis et al. 2002; Schuettpelz and Pryer 2006; Bininda-Emonds 2007; Korall et al. 2010; Li et al. 2011).

The strong, broad lineage effects we see in our data set occur in the context of extensive fine-grained rate variation. In all cases, clockless (unrooted) models fit the data much better than do local clock models (Fig. 5; Table 3), despite their far larger number of parameters. The fine-grained effects are best exhibited by the Bayesian LURC analyses (Fig. 11(i)). Under this uncorrelated model of rate variation, which lacks any a priori division of the tree into focal clades, the data prefer many rate changes, spread across the tree. Most of the vittarioid branches are slower than at least one branch in each of the other clades (Fig. 11(i)), and the vittarioid rate increase, under this model, appears to be largely attributable to a single very fast branch: the stem branch of the clade. Even under the RLC model, which strongly favors very few rate changes (half the prior probability is on zero rate changes), the mean number of rate changes across the tree is above eight, and three rate changes (the number necessary to give each of the major clades its own rate with no further changes) lies outside the 95% credibility interval for the rateChangeCount parameter. Inferences under this model do, however, suggest rather uniformly elevated rates across the vittarioids, instead of the single fast branch at the base of the clade inferred under the LURC model. Under the RLC model, all vittarioid branches have faster median rates than do any other branches, save that of the cheilanthoid Hemionitis, which is reconstructed as fastest of all.

[FIGURE 11. Comparison of Bayesian relaxed clock models: rate and divergence time contrasts. Results from the full data, partitioned by compartment, analysed under a lognormal uncorrelated relaxed clock (LURC; i) and a random local clock (RLC; ii) model. Branch thickness and shading are both proportional to the inferred median rate for the branch; median rate estimates are presented above the branches (in panel ii, all vittarioid branches have a rate of 2.13 and all Adiantum branches a rate of 0.87 unless otherwise indicated). The time axis is in arbitrary units from the present. Two discrepancies between the models are highlighted: the difference between the inferred ages of the ingroup (arrow a) and between the inferred crown ages of the vittarioids (arrow b).]

The pattern of variation across our data sets is similar to that seen across taxa, with some strong, coarse patterns in the context of considerable fine-scaled variation. In our analyses, the “variation in rate variation” is partitioned much more strongly among compartments than among loci (Fig. 4).
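The remark that half of the RLC prior sits on zero rate changes can be made concrete. Assuming (this is not stated in the text) that the prior on the number of rate changes K is Poisson with mean ln 2, one choice that gives P(K = 0) = e^(−ln 2) = 1/2 exactly, the Python sketch below computes the prior probability of three or more changes and of nine or more, for comparison with the posterior mean of more than eight changes reported for these data.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_upper_tail(k, lam):
    """P(K >= k), computed as 1 minus the lower cumulative sum."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


# Assumed prior: Poisson with mean ln 2, which puts exactly half its mass on K = 0.
lam = math.log(2)
print(f"P(K = 0)  = {poisson_pmf(0, lam):.3f}")
print(f"P(K >= 3) = {poisson_upper_tail(3, lam):.4f}")
print(f"P(K >= 9) = {poisson_upper_tail(9, lam):.2e}")
```

Under that assumed prior, P(K ≥ 3) is only about 0.033 and P(K ≥ 9) is smaller by several further orders of magnitude, so a posterior mean above eight changes reflects signal in the data rather than the prior.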
Generally, the loci within a compartment behave similarly. This result contrasts with, for example, Moncalvo et al. (2000), who found significantly different patterns of rate variation among loci within compartments.

The differences among compartments we observe compose perhaps the most difficult set of results to interpret. Both the plastid and the nuclear data strongly support an accelerated rate of evolution for the vittarioids compared to either other major clade, but the degree of the increase differs (in the plastid data, vittarioids are 5.8 times faster than cheilanthoids and 2.7 times faster than Adiantum; in the nuclear data, those values are 3.1 and 2.5, respectively; Fig. 7; Table 4). The mitochondrial data, in contrast, show a rate increase for both Adiantum and the vittarioids, versus the cheilanthoids, but no subsequent increase from Adiantum to the vittarioids (vittarioids:cheilanthoids = 3.1; Adiantum:cheilanthoids = 3.2; vittarioids:Adiantum = 0.97; Fig. 7; Table 4). The mitochondrial data therefore stand out, even by visual inspection (Figs. 1, 3). The two mitochondrial loci clearly differ from the other compartments, but also from each other, and both are evolving relatively slowly, on average (Supplementary Fig. S2). It is tempting to attribute the anomalous mitochondrial pattern to stochastic variation in the substitution process, given the generally weak signal present in the mitochondrial data sets. However, the model selection analyses demonstrate that the variation is nonetheless significant: the mitochondrial data fit much better under a local clock regime that forces Adiantum and the vittarioids to share a rate and gives the cheilanthoids their own clock than under either of the alternatives (e.g., the clock regime that has a single rate for the cheilanthoids and Adiantum, and a different rate for the vittarioids, is 69 AIC points worse; Fig. 6; Table 4). In addition, unlike the plastid or nuclear data, the mitochondrial data do not favor the four-rate local clock model (clock regime 4, Fig. 2). Giving each of the major clades (plus the outgroup) its own rate results in a small reduction in fit (of 2.0 AIC points) relative to the three-rate clock regime 1 (in which Adiantum and the vittarioids share a rate; Fig. 6; Table 4).

One of the noteworthy results of the nucleotide model selection analyses was the strong preference of the data for the most highly parameterized model (bas2), which is not only clockless, with unlinked substitution parameters, but additionally lacks a proportional branch length constraint. In effect, bas2 treats each partition as coming from a separate evolutionary process. This model is a much better fit for the data than are any of the other general models, and bas2 partitioned by locus outperforms the same model partitioned by compartment (by 37.9 AIC points; Fig. 4; Table 3). Thus, fine-scale, locus-specific variation in the variation of rate is one of the dominant features of these data. Even among the local clock models, those that permit each clock to vary freely across partitions (e.g., if the rate for clock 1 is twice that of clock 2 in the first partition, it need not be so for the second partition) fit considerably better than do the corresponding local clock models where the clocks are constrained to be proportional across partitions (bas6 vs. bas7; Fig. 5; Table 3).

Overall, the results of our analyses eliminate the possibility that the apparent rate acceleration of the vittarioids is simply due to stochastic variation in the substitution process. The differences in fit among the local clock models under the different local clock regimes (Fig. 5), and the strongly elevated rates inferred for the vittarioids under those same models (Fig. 7), demonstrate that the rate acceleration of the vittarioids is a prominent feature of these data. The general consistency of this acceleration across the plastid and nuclear loci (Fig. 7) is most compatible with demographic (selection- and drift-related) and/or life-history-based explanations. Changes in polymerase proofreading accuracy (cf. Cho et al. 2004; Parkinson et al. 2005), for example, can be discounted: since the compartments have their own polymerases, such a change would affect only a single compartment.

Patterns of Selection

Our results with codon models showed a pattern somewhat opposed to that of the nucleotide models. Whereas the fit of the nucleotide models improved with model complexity (the best-fitting model was the most highly parameterized one, with the finest-grained partition scheme), codon model fit was maximized with models of intermediate complexity. The unpartitioned codon models fit very poorly (Fig. 8(i)), but the three-partition (by compartment) scheme outperformed the more complex six-partition (by locus) scheme (Fig. 8(ii)). This result is in contrast to earlier studies (Eyre-Walker and Gaut 1997; Muse and Gaut 1997; Muse 2000) that found selective effects (nonsynonymous substitution rate differences, in their case) to be largely locus specific.

The intermediate-is-best pattern is also seen among the main model types. Model cod1, which does not include site effects, fits very poorly. The site models (cod2 and cod3) fit much better, but the further addition of branch effects (in the branch-site model cod4) has a negligible effect (Fig. 8(ii); Table 5). The lack of a lineage effect in the fit of the codon models is among our most interesting results. In marked contrast to the nucleotide-based analyses, our codon-based analyses show no significant evidence of fine-scale differences across data partitions or strong differences among lineages (Figs. 9 and 10). Differences in selective signatures, when they do appear (i.e., the evidence for strong positive selection in a subset of the mitochondrial sites but not in those of the plastid; Fig. 10(ii)), extend across the tree, and thus cannot explain the heterogeneity in evolutionary rates that is so striking in this group. Finally, the only indications of lineage effects (e.g., ML estimates of ω for site class 3 of cod4) are weak, and occur between the outgroup and ingroup, rather than among ingroup clades (Fig. 10(iii)).

These results further limit the pool of biological mechanisms that might explain the vittarioid rate increase. While our nucleotide-based results are consistent with life-history or demography-based explanations, the subset of explanations that depend on altered post-mutation fixation rates, including those that invoke changes in the relative strength of selection and drift (e.g., reduced efficacy of selection in small populations; Woolfit and Bromham 2003; Bromham and Leys 2005; Woolfit and Bromham 2005), are undermined by the absence of lineage-specific signatures of selection in our codon-based analyses. Selection may still be ultimately responsible for the elevated rates, but only indirectly (e.g., through selection for increased mutation rates in the vittarioids). It is possible that lineage-specific effects were undetectable by our methods (there are one-third as many codons as nucleotide sites, so the codon models may lack some of the power of their nucleotide counterparts). However, the strong contrast between the two model types (vittarioids having very strong lineage effects in the nucleotide models vs. none in the codon models) argues that selective effects, if present, are too weak to explain the observed rate discrepancy. The remaining subset of explanations, then, is limited to those that invoke changes in the supply of mutations, rather than changes in the rate of post-mutation fixation. Because the rate increase spans two genomes (plastid and nuclear), only explanations with organism-wide effects are tenable, and we therefore consider life history or related environmental factors to be the most likely agents. Reports of molecular rate heterogeneity caused by changes in environment-driven mutation rates are rare in the literature. To our knowledge, such effects have been found only in lichen-forming fungi (Lutzoni and Pagel 1997) and halophilic crustaceans (Hebert et al. 2002). That said, it is important to note that the rate increase for the vittarioids may be driven substantially by a single elevated rate on the clade’s stem branch (Fig. 11(i)). If this one branch is responsible for the rate increase, looking for environmental correlates among extant vittarioids could be unproductive. Instead, one would need to ask what was so unusual about the environment of the vittarioid ancestor that it could cause mutation rates high enough to yield a rate of substitution approaching six times that of the background, even when spread across all descendant branches.

Methods of Investigating Rate Heterogeneity

We found our model-fitting approach, using the AICc, to be an effective means of investigating rate heterogeneity. Model-fitting analyses allowed us to establish the pattern of rate variation without requiring multiple independent observations, and to evaluate the assumption that rate is an organism-level phenomenon influenced by some component of that organism’s life history and thus manifest across loci and genomic compartments (e.g., Bousquet et al. 1992; Nikolaev et al. 2007; Bromham 2009; Lanfear et al. 2010). In addition, our approach allowed us to avoid the challenges of phylogenetic non-independence, which have been prevalent in the history of rate heterogeneity research (reviewed in Bromham 2002). Typically, the issue of non-independence is accommodated by examining a series of independent pairwise comparisons (informed by, for example, a particular phenotypic change of choice) evaluated with a sign test or similar approach (e.g., Sarich and Wilson 1967; Muse and Weir 1992; Bromham et al. 1996; Bromham and Woolfit 2004; Lanfear et al. 2010). Phylogenetic non-independence is much more difficult to account for in cases, like ours, that lack repeated examples of a rate change correlated with a plausible external factor. There has been a temptation in such cases to approach evolutionary rate itself as a trait and to reconstruct the evolution of that trait on a phylogeny (e.g., Lutzoni and Pagel 1997, their “method 1”). However, if one attempts to do so in a likelihood or Bayesian framework, then the quantity being reconstructed (rate) is itself part of the model used in the reconstruction (the branch lengths), which makes the reconstruction circular. Other approaches also potentially run subtly afoul of the issue of independence. Many analyses that explicitly model rate variation do so in a manner that assumes a degree of correlation between parent and child branches (e.g., r8s; Sanderson 2002, 2003), rendering branch rates dependent (cf. Hoegg et al. 2004; Korall et al. 2010). Finally, even if a model does not assume autocorrelation (e.g., some of the models in BEAST; Drummond et al. 2006; Drummond and Rambaut 2007), those branches are still embedded within the hierarchy of the phylogenetic tree. Different branches each have a role in accommodating the same data, and thus a rate reconstructed for a given branch is not independent of the rates reconstructed for the others. It remains to be seen whether these issues have practical effects in real data sets; preliminary comparisons suggest that they do not (e.g., Nabholz et al. 2008; Welch et al. 2008).

Much future progress is likely to come from recent approaches that model rate variation explicitly as a trait, or at least as “trait dependent” (Lartillot and Poujol 2010; Mayrose and Otto 2011). By coestimating rate and divergence time, these methods avoid the circularity and non-independence problems; related models can then be compared using likelihood ratio tests or model selection approaches, in a manner similar to this study. Our model-fitting analyses, under likelihood, did require the a priori identification of models to compare, which was not trivial. The number of possible parameterizations of a partitioned analysis increases rapidly with the number of partitions. Our data set of six loci can be divided into groups of loci 203 different ways (the sixth Bell number). In a simplified scenario, where for each of the 203 different partitioning regimes we allow parameters of each “type” (exchangeability; base frequency; gamma shape; clock regime: unrooted, global, or one of four local clock regimes) to be either shared across the full data set or allocated individually to each of the partitions, there are 9744 different possible models. However, even with the requirement of a priori identification of models, the ML model-fitting approach described here will remain attractive. In our case, it allowed us to elucidate complex patterns of rate variation in the absence of any associated correlation-based hypotheses, and the same approach was readily extendable to our investigations of complex heterogeneous patterns of selection.
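The model-fitting approach described under Methods of Investigating Rate Heterogeneity rests on two quantities that are easy to compute directly. The number of ways to divide six loci into groups is the Bell number B6 = 203, and candidate models are ranked by the small-sample corrected Akaike information criterion, AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1), where k is the number of free parameters and n the sample size (taken here, as an assumption, to be the number of alignment sites). The Python sketch below enumerates the set partitions of the six loci and ranks two hypothetical models by AICc and Akaike weight; the locus names are those of the study, but the log-likelihoods, parameter counts, and site count are invented for illustration.

```python
import math


def set_partitions(items):
    """Yield every way to split `items` into non-empty, unordered groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + partition


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    return -2.0 * log_lik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def akaike_weights(scores):
    """Relative support for each model from its AICc (lower is better)."""
    best = min(scores)
    rel = [math.exp(-(s - best) / 2) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


loci = ["atpA", "atpB", "rbcL", "atp1", "nad5", "gapCp"]
schemes = list(set_partitions(loci))
print(len(schemes))  # the Bell number B6

# Hypothetical fits: a 3-partition (by compartment) and a 6-partition
# (by locus) model; log-likelihoods, k, and n are invented.
n_sites = 7000
by_compartment = aicc(-41000.0, k=60, n=n_sites)
by_locus = aicc(-40960.0, k=110, n=n_sites)
print(round(by_locus - by_compartment, 1))
print(akaike_weights([by_compartment, by_locus]))
```

Exhaustive enumeration is feasible only for a handful of loci, because the Bell numbers grow faster than exponentially; tools such as PartitionFinder (Lanfear et al. 2012) offer heuristic searches for larger data sets.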
63 31–54 2014 ROTHFELS AND SCHUETTPELZ—VITTARIOID RATES Implications for Topology Inference and Divergence Time Dating Questions about the patterns and causes of rate variation aside, our results are also relevant to recent debates concerning the influence of rate heterogeneity, and the way it is modeled, on phylogenetic inference. Drummond et al. (2006) make the case that standard “unrooted” (or “no-clock”; Yang and Rannala 2006; Wertheim et al. 2010) models are flawed, in that they do not require contemporaneity of the tips. Given that we know that the tips are contemporaneous (in most cases), is it reasonable to ignore that information when inferring phylogeny? Might our inferences be more accurate if such data were included explicitly in the model via “relaxed clock” approaches (Drummond et al. 2006)? Recent studies failed to find any significant improvement of relaxed clock over unrooted models (Wertheim et al. 2010; Rothfels et al. 2012), but were generally inconclusive: “The question remains whether, in practice, modeling rate variation among branches can improve phylogenetic inference” (Wertheim et al. 2010). In our case, the inference of ingroup relationships was largely insensitive to the choice of model (at least, among those investigated, which included both unrooted and relaxed clock models). However, when unconstrained, both relaxed clock models failed to root the tree correctly, placing the root along the fastest branch (separating the vittarioids from the rest of the tree) rather than between the ingroup and outgroup (results not shown). This placement renders the adiantoids paraphyletic, and strongly conflicts with the results obtained from [13:56 4/12/2013 Sysbio-syt058.tex] expanded taxon samples (e.g., Gastony and Rollo 1995; Schuettpelz and Pryer 2007; Schuettpelz et al. 2007; Bouma et al. 2010). 
At least for these data, then, the relaxed clock models (with our choices of priors and parameterization) are not yet fully capable of accurately modeling rate variation such that they are able to root this tree correctly without additional information. While the effects of model choice on topology are perhaps modest, their effects on inferences of timing (the dating of evolutionary events) are strong. The two relaxed clock models (LURC and RLC) we employed not only made different inferences about the location and scale of rate changes, but also inferred correspondingly strong differences in the relative timing of divergences (Fig. 11). The RLC model inferred events to be generally younger than the LURC, by over 10% of the total tree height for the base of the ingroup (Fig. 11, arrow a), and by over 20% of the tree height for the crown clade of the vittarioids (Fig. 11, arrow b). These discrepancies are potentially the result of a worst-case scenario (very strong rate heterogeneity in a phylogeny without internal time constraints—our sole constraint is at the base of the tree) but are nonetheless dramatic and potentially worrisome. A closer inspection of the results of the two models shows that, as expected, the branch rates inferred under one model are strongly correlated with those inferred under the other (Supplementary Fig. S3). However, the stem branch for the vittarioids is an outlier—it is inferred to have a much higher rate under the LURC model than under the RLC model, whereas the other vittarioid branches have a much lower rate under the LURC model than under the RLC model (the vittarioid stem branch is below the diagonal in Supplementary Fig. S3, vs. the other vittarioid branches, which are above the diagonal). In effect, the tendency of the RLC model to penalize frequent rate changes causes it to prefer to distribute the substitutions that the LURC model loaded onto the single stem branch, and spread them across the entire clade. 
The LURC model correspondingly infers a shorter (but faster) stem branch, whereas the RLC model infers a longer, slower branch (Fig. 11), and thus very different divergence times. A similar complex model-mediated effect is apparent in Hemionitis, which is on a sufficiently long terminal branch that it is given its own rate under both models. The RLC model prefers to force all the neighboring branches to share a single rate—especially Mildella, the slow sister lineage to Hemionitis—and thus the rate change on the Hemionitis branch is particularly extreme (Fig. 11, Supplementary Fig. S3). As with the vittarioids, a single deeper branch is inferred to have a much higher rate under the LURC than RLC models (Supplementary Fig. S3, arrow), with the more distal tips showing the opposite pattern, for the overall result of a small number of ancestral branches below the diagonal in Supplementary Fig. S3, and many more (typically terminal) branches above the diagonal. In effect, the RLC prefers rate homogeneity, but when heterogeneity is necessary, the degree of the change is unimportant. The LURC model, with its Page: 49 Downloaded from http://sysbio.oxfordjournals.org/ by Carl Rothfels on January 9, 2014 Understanding the patterns of selection proved critical in this study, to evaluating the potential causes of the vittarioid rate increase. Given the vittarioids’ idiosyncratic biology, there are many potential correlates for their elevated rate of evolution (e.g., changes associated with their reduced morphologies, with their gametophyte generation, or induced by their epiphytic habit). However, any explanation for their elevated rates, based on these correlates, should be interpreted with caution to ensure that mutation, rather than fixation, is invoked. 
This danger of inferring causation from correlation is perhaps best illustrated, and most intriguing, in studies of rate heterogeneity in plants, where many studies have found a correlation between generation time and evolutionary rate (Bousquet et al. 1992; Gaut et al. 1992; Laroche et al. 1997; Smith and Donoghue 2008; Korall et al. 2010), a pattern well documented in animals. However, plants do not have a segregated germ line, and thus should show no correlation between the number of generations and the number of potentially mutation-inducing replications in the history of their gametes, which is the explanation typically proffered for the “generation time hypothesis” (Mooers and Harvey 1994; Li et al. 1996). As Muse (2000) notes, while such a correlation may exist in plants, the mechanism is unclear (but see Lanfear et al. 2013). 49 31–54 50 SYSTEMATIC BIOLOGY SUPPLEMENTARY MATERIAL Data files and/or other supplementary information related to this paper have been deposited at Dryad (http://datadryad.org/) under doi:10.5061/dryad. c5m42. FUNDING This work was supported by the Society for Systematic Biologists (Graduate Student Research Award to C.J.R.), the Natural Sciences and Engineering Research Council of Canada (a Julie Payette PGS M and PGS D to C.J.R.), and the National Science Foundation (DEB-1145925 to E.S.). 
ACKNOWLEDGMENTS Our great thanks to the individuals who made this project possible: Kathleen Pryer, who had a pivotal role in initiating and guiding this project; Layne Huiet, who assisted in the lab and provided critical materials and unpublished information regarding the phylogeny of Adiantum; Ben Redelings for guidance through (and execution of) the BAli-Phy analyses; Ester Gaya and Casey Dunn for assistance with PAMLparser; Volker Knoop for supplying unpublished nad5 primer [13:56 4/12/2013 Sysbio-syt058.tex] sequences; and Joe Bielawski, Fay-Wei Li, Nimrod Rubinstein, and Dave Swofford, for assistance with the complexities of parameter counting, model selection, and codon models. Maarten Christenhusz, Jim Croft, Layne Huiet, Thomas Janssen, Nathalie Nagalingum, Tom Ranker, Harald Schneider, Alan Smith, and Paul Wolf contributed plant materials used in this study; we thank the staff of A, COLO, DUKE, GOET, P, TUR, UC, and UTC for curating the associated vouchers. This manuscript was greatly improved through the thoughtful comments of Rob Lanfear, Sally Otto, Editorin-Chief Frank Anderson, Associate Editor Roberta J. Mason-Gamer, and an anonymous reviewer. APPENDIX 1 List of accessions sampled in this study, presented in the following format: Species, Fern Lab database number (http://fernlab.biology.duke.edu/), Voucher (HERBARIUM), Provenance: GenBank numbers (with citations for previously published sequences) for atpA, atpB, rbcL, atp1, nad5, gapCp (in that order). Missing data are indicated by “–”. Herbarium acronyms follow Index Herbariorum (Thiers [continuously updated]). Adiantum aethiopicum L., 3895, N.Nagalingum 24 (DUKE), Australia, New South Wales: KC984436, KC984441, KC984519, KC984409, KC984493, KC984450. Adiantum formosum R.Br., 4602, A.R.Smith s.n. (UC), Cult. (wild provenance unknown): KC984437, KC984442, KC984520, KC984410, KC984494, KC984453. Adiantum hispidulum Sw., 4603, L.Huiet 101 (UC), Cult. 
(wild provenance unknown): KC984438, KC984443, KC984521, KC984411, KC984495, KC984455. Adiantum malesianum J.Ghatak, 2506, L.Huiet 111 (UC), Cult. (wild provenance unknown): EF452068 (Schuettpelz et al. 2007), EF452011 (Schuettpelz et al. 2007), EF452132 (Schuettpelz et al. 2007), KC984412, KC984496, EU551257 (Schuettpelz et al. 2008). Adiantum peruvianum Klotzsch, 2507, L.Huiet 103 (UC), Cult. (wild provenance unknown): EF452070 (Schuettpelz et al. 2007), EF452013 (Schuettpelz et al. 2007), EF452133 (Schuettpelz et al. 2007), KC984413, KC984497, KC984456. Adiantum raddianum C.Presl, 638, P.G.Wolf 717 (UTC), Cult. (wild provenance unknown): EF452071 (Schuettpelz et al. 2007), U93840 (Wolf 1997), KC984522, KC984414, KC984498, KC984458. Adiantum tenerum Sw., 2504, L.Huiet 107 (UC), Cult. (wild provenance unknown): EF452072 (Schuettpelz et al. 2007), EF452014 (Schuettpelz et al. 2007), EF452134 (Schuettpelz et al. 2007), KC984415, KC984499, KC984459. Adiantum tetraphyllum Humb. & Bonpl. ex Willd., 2505, L.Huiet 105 (UC), Cult. (wild provenance unknown): EF452073 (Schuettpelz et al. 2007), EF452015 (Schuettpelz et al. 2007), EF452135 (Schuettpelz et al. 2007), KC984416, KC984500, KC984460. Anetium citrifolium (L.) Splitg., 3339, M.Christenhusz 4076 (TUR), France, Guadeloupe: EF452075 (Schuettpelz et al. 2007), EF452017 (Schuettpelz et al. 2007), KC984523, KC984417, KC984501, KC984461. Antrophyum latifolium Page: 50 Downloaded from http://sysbio.oxfordjournals.org/ by Carl Rothfels on January 9, 2014 tendency toward many smaller rate changes, is able to “accommodate” some of the additional substitutions on the deeper ancestral branches, resulting in rate changes that are less dramatic but more frequent (Fig. 11, Supplementary Fig. S3). 
While all of our rate and divergence time inferences are model dependent (we do not know the “true rates”), the concordance between the pattern of rate variation in the LURC results and the strong preference of the data for ML models with the maximum number of permitted rate parameters (Fig. 4; Table 3) suggests that fine-scale rate heterogeneity may be dominant in our data. Contrary to the position of Drummond and Suchard (2010) that “in any given tree there exist a small number of rate changes … in general, the numerous small changes arise as a modeling consequence, and are not necessarily data-driven” (see also Yoder and Yang 2000), our data suggest the opposite: it may be the few, large changes inferred under the RLC model that are a model consequence. An appealing avenue for future research is to compare the fit of complex relaxed clock models on diverse data sets, using Bayes factors. Unfortunately, Bayes factors are frequently calculated from the harmonic mean, which is an unreliable method (Lartillot and Philippe 2006; Fan et al. 2011; Xie et al. 2011); more reliable methods are just now becoming available (Lewis et al. 2010; Baele et al. 2012). Regardless of the ultimate truth, one conclusion is clear—inferences of rate, and thus of divergence times, are critically dependent on the model adopted. VOL. 63 31–54 2014 ROTHFELS AND SCHUETTPELZ—VITTARIOID RATES [13:56 4/12/2013 Sysbio-syt058.tex] 2007), KC984431, KC984515, KC984472. Pityrogramma austroamericana Domin, 2561, E.Schuettpelz 301 (DUKE), Cult. (wild provenance unknown): EU268769 (Rothfels et al. 2008), EF452050 (Schuettpelz et al. 2007), EF452166 (Schuettpelz et al. 2007), KC984432, KC984516, KC984473. Rheopteris cheesemaniae Alston, 3373, Croft 1749 (A), Papua New Guinea: EF452126 (Schuettpelz et al. 2007), EF452063 (Schuettpelz et al. 2007), EF452176 (Schuettpelz et al. 2007), KC984433, KC984517, –. 
Vittaria graminifolia Kaulf., 2395, E.Schuettpelz 227 (DUKE), Ecuador, Zamora-Chinchipe Prov.: EF452128 (Schuettpelz et al. 2007), EF452064 (Schuettpelz et al. 2007), U21295 (Crane et al. 1995), KC984434, KC984518, –. REFERENCES Akaike H. 1974. A new look at the statistical model identification. IEEE T. Automat. Contr. 19:716–723. Baele G., Lemey P., Bedford T., Rambaut A., Suchard M.A., Alekseyenko A.V. 2012. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol. Biol. Evol. 29:2157–2167. Baer C.F., Miyamoto M.M., Denver D.R. 2007. Mutation rate variation in multicellular eukaryotes: Causes and consequences. Nat. Rev. Genet. 8:619–631. Bielawski J., Yang Z. 2004. A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. J. Mol. Evol. 59:121–132. Bininda-Emonds O.R.P. 2007. Fast genes and slow clades: Comparative rates of molecular evolution in mammals. Evol. Bioinform. 3:59–85. Bouma W., Ritchie P. Perrie L. 2010. Phylogeny and generic taxonomy of the New Zealand Pteridaceae ferns from chloroplast rbcL DNA sequences. Aust. Syst. Bot. 23:143–151. Bousquet J., Strauss S., Doerksen A., Price R. 1992. Extensive variation in evolutionary rate of rbcL gene-sequences among seed plants. PNAS 89:7844–7848. Brandley M., Schmitz A., Reeder T. 2005. Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Syst. Biol. 54:373–390. Bromham L. 2002. Molecular clocks in reptiles: Life history influences rate of molecular evolution. Mol. Biol. Evol. 19:302–309. Bromham L. 2009. Why do species vary in their rate of molecular evolution? Biol. Letters 5:401–404. Bromham L., Leys R. 2005. Sociality and the rate of molecular evolution. Mol. Biol. Evol. 22:1393–1402. Bromham L., Rambaut A., Harvey P. 1996. Determinants of rate variation in mammalian DNA sequence evolution. J. Mol. Evol. 
43:610–621. Bromham L., Woolfit M. 2004. Explosive radiations and the reliability of molecular clocks: Island endemic radiations as a test case. Syst. Biol. 53:758–766. Bromham L., Woolfit M., Lee M., Rambaut A. 2002. Testing the relationship between morphological and molecular rates of change along phylogenies. Evolution 56:1921–1930. Burnham K.P., Anderson D.R. 2002. Model selection and multimodel inference: A practical information-theoretic approach. New York, USA: Springer. Burnham K.P., Anderson D.R. 2004. Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Method. Res. 33:261–304. Cho Y., Mower J., Qiu Y.L., Palmer J. 2004. Mitochondrial substitution rates are extraordinarily elevated and variable in a genus of flowering plants. PNAS 101:17741–17746. Crane E.H. 1997. A revised circumscription of the genera of the fern family Vittariaceae. Syst. Bot. 22:509–517. Crane E.H., Farrar D.R., Wendel J.F. 1995. Phylogeny of the Vittariaceae: Convergent simplification leads to a polyphyletic Vittaria. Am. Fern J. 85:283–305. Page: 51 Downloaded from http://sysbio.oxfordjournals.org/ by Carl Rothfels on January 9, 2014 Blume, 3078, T.Ranker 1774 (COLO), Papua New Guinea: EF452076 (Schuettpelz et al. 2007), EF452018 (Schuettpelz et al. 2007), EF452138 (Schuettpelz et al. 2007), KC984418, KC984502, KC984462. Bommeria hispida (Mett. ex Kuhn) Underw., 3174, E.Schuettpelz et al. 467 (DUKE), USA, Arizona, Cochise Co.: EU268725 (Rothfels et al. 2008), EF452022 (Schuettpelz et al. 2007), EF452142 (Schuettpelz et al. 2007), KC984419, KC984503, KC984463. Calciphilopteris ludens (Wall. ex Hook.) Yesilyurt & H.Schneid., 3510, H.Schneider s.n. (GOET), Cult. (wild provenance unknown): EU268741 (Rothfels et al. 2008), EF452031 (Schuettpelz et al. 2007), EF452150 (Schuettpelz et al. 2007), KC984422, KC984506, KC984465. Cheilanthes covillei Maxon, 3150, E.Schuettpelz et al. 443 (DUKE), USA, Arizona, Maricopa Co.: EU268733 (Rothfels et al. 
2008), KC984444, EU268782 (Rothfels et al. 2008), KC984420, KC984504, EU551267 (Schuettpelz et al. 2008). Cryptogramma crispa (L.) R.Br. ex Hook., 2949, M.Christenhusz & F.Katzer 3871 (DUKE), United Kingdom, Scotland: EU268740 (Rothfels et al. 2008), EF452027 (Schuettpelz et al. 2007), EF452148 (Schuettpelz et al. 2007), KC984421, KC984505, KC984464. Haplopteris elongata (Sw.) E.H.Crane, 2546, L.Huiet 112 (UC), New Caledonia: EF452096 (Schuettpelz et al. 2007), EF452035 (Schuettpelz et al. 2007), EF452153 (Schuettpelz et al. 2007), KC984423, KC984507, KC984466. Hecistopteris pumila (Spreng.) J.Sm., 3278, M.Christenhusz 3976 (TUR), France, Guadeloupe: EF452097 (Schuettpelz et al. 2007), EF452036 (Schuettpelz et al. 2007), KC984524, KC984424, KC984508, –. Hemionitis palmata L., 2557, E.Schuettpelz 297 (DUKE), Cult. (wild provenance unknown): EU268743 (Rothfels et al. 2008), EF452037 (Schuettpelz et al. 2007), KC984525, KC984425, KC984509, KC984467. Mildella intramarginalis (Kaulf. ex Link) Trevis., 3513, H.Schneider s.n. (GOET), Cult. (wild provenance unknown): EF452085 (Schuettpelz et al. 2007), EF452025 (Schuettpelz et al. 2007), EF452146 (Schuettpelz et al. 2007), KC984426, KC984510, KC984468. Monogramma acrocarpa (Holttum) D.L.Jones, 3375, T.Ranker 1778 (COLO), Papua New Guinea: KC984435, KC984439, EF452156 (Schuettpelz et al. 2007), KC984427, KC984511, –. Monogramma graminea (Poir.) Schkuhr, 3548, T.Janssen 2692 (P), France, Réunion: EF452102 (Schuettpelz et al. 2007), EF452040 (Schuettpelz et al. 2007), EF452157 (Schuettpelz et al. 2007), KC984428, KC984512, KC984469. Notholaena grayi Davenp., 3187, E.Schuettpelz et al. 480 (DUKE), USA, Arizona, Cochise Co.: EU268749 (Rothfels et al. 2008), JF832173 (Rothfels et al. 2012), EU268794 (Rothfels et al. 2008), KC984429, KC984513, KC984470. Pellaea atropurpurea (L.) Link, 2957, E.Schuettpelz 312 (DUKE), Cult. (originally collected from Virginia, USA): JQ855925 (Johnson et al.
2012), KC984440, EF452162 (Schuettpelz et al. 2007), KC984430, KC984514, KC984471. Pentagramma triangularis (Kaulf.) Yatsk., Windham & E.Wollenw., 3152, E.Schuettpelz et al. 445 (DUKE), USA, Arizona, Gila Co.: EU268768 (Rothfels et al. 2008), EF452049 (Schuettpelz et al. 2007), EF452165 (Schuettpelz et al.
Korall P., Schuettpelz E., Pryer K.M. 2010. Abrupt deceleration of molecular evolution linked to the origin of arborescence in ferns. Evolution 64:2786–2792. Lanfear R. 2010. The local-clock permutation test: A simple test to compare rates of molecular evolution on phylogenetic trees. Evolution 65:606–611. Lanfear R., Calcott B., Ho S.Y.W., Guindon S. 2012. PartitionFinder: Combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29:1695–1701. Lanfear R., Ho S.Y.W., Davies T.J., Moles A.T., Aarssen L., Swenson N.G., Warmann L., Zanne A.E., Allen A.P. 2013. Taller plants have lower rates of molecular evolution. Nat. Commun. 4:1–29. Lanfear R., Thomas J.A., Welch J.J., Brey T., Bromham L. 2007. Metabolic rate does not calibrate the molecular clock. PNAS 104:15388–15393. Lanfear R., Welch J.J., Bromham L. 2010. Watching the clock: Studying variation in rates of molecular evolution between species. Trends Ecol. Evol. 25:496–503. Laroche J., Li P., Maggia L., Bousquet J. 1997. Molecular evolution of angiosperm mitochondrial introns and exons. PNAS 94:5722–5727. Lartillot N., Philippe H. 2006. Computing Bayes factors using thermodynamic integration. Syst. Biol. 55:195–207. Lartillot N., Poujol R. 2010. A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol. Biol. Evol. 28:729–744. Lewis P.O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50:913–925. Lewis P.O., Holder M.T., Swofford D.L. 2010. Phycas.
Available from: URL www.phycas.org (last accessed January 1, 2012). Lewis L., Mishler B.D., Vilgalys R. 1997. Phylogenetic relationships of the liverworts (Hepaticae), a basal embryophyte lineage, inferred from nucleotide sequence data of the chloroplast gene rbcL. Mol. Phylogenet. Evol. 7:377–393. Li W.-H., Ellsworth D.L., Krushkal J., Chang B.H.J., Hewett-Emmett D. 1996. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5:182–187. Li F.-W., Kuo L.-Y., Rothfels C.J., Ebihara A., Chiou W.-L., Windham M.D., Pryer K.M. 2011. rbcL and matK earn two thumbs up as the core DNA barcode for ferns. PLoS ONE 6:e26597. Lu J.-M., Li D.-Z., Lutz S., Soejima A., Yi T., Wen J. 2011a. Biogeographic disjunction between eastern Asia and North America in the Adiantum pedatum complex (Pteridaceae). Am. J. Bot. 98:1680–1693. Lu J.-M., Wen J., Lutz S., Wang Y.-P., Li D.-Z. 2011b. Phylogenetic relationships of Chinese Adiantum based on five plastid markers. J. Plant Res. 125:1–13. Lumbsch H., Hipp A.L., Divakar P.K., Blanco O., Crespo A. 2008. Accelerated evolutionary rates in tropical and oceanic parmelioid lichens (Ascomycota). BMC Evol. Biol. 8:257. Lutzoni F., Pagel M. 1997. Accelerated evolution as a consequence of transitions to mutualism. PNAS 94:11422–11427. Maddison W.P., Maddison D.R. 2011. Mesquite: A modular system for evolutionary analysis. Available from: URL mesquiteproject.org (last accessed January 1, 2012). Martin W., Lydiate D., Brinkmann H., Forkmann G., Saedler H., Cerff R. 1993. Molecular phylogenies in angiosperm evolution. Mol. Biol. Evol. 10:140–162. Mayrose I., Otto S.P. 2011. A likelihood method for detecting trait-dependent shifts in the rate of molecular evolution. Mol. Biol. Evol. 28:759–770. McCoy S.R., Kuehl J.V., Boore J.L., Raubeson L.A. 2008. The complete plastid genome sequence of Welwitschia mirabilis: An unusually compact plastome with accelerated divergence rates. BMC Evol.
Biol. 8:130. Meyer-Gauen G., Schnarrenberger C., Cerff R., Martin W. 1994. Molecular characterization of a novel, nuclear-encoded, NAD+-dependent glyceraldehyde-3-phosphate dehydrogenase in plastids of the gymnosperm Pinus sylvestris L. Plant Mol. Biol. 26:1155–1166. Moncalvo J., Drehmel D., Vilgalys R. 2000. Variation in modes and rates of evolution in nuclear and mitochondrial ribosomal DNA in the mushroom genus Amanita (Agaricales, Basidiomycota): Phylogenetic implications. Mol. Phylogenet. Evol. 16:48–63. Cronn R., Cedroni M., Haselkorn T., Grover C., Wendel J.F. 2002. PCR-mediated recombination in amplification products derived from polyploid cotton. Theor. Appl. Genet. 104:482–489. Davies T., Savolainen V., Chase M., Moat J., Barraclough T. 2004. Environmental energy and evolutionary rates in flowering plants. P. Roy. Soc. Lond. B Bio. 271:2195–2200. Des Marais D.L., Smith A.R., Britton D.M., Pryer K.M. 2003. Phylogenetic relationships and evolution of extant horsetails, Equisetum, based on chloroplast DNA sequence data (rbcL and trnL-F). Int. J. Plant Sci. 164:737–751. Drummond A.J., Ho S.Y.W., Phillips M.J., Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:699–710. Drummond A.J., Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214. Drummond A.J., Suchard M.A. 2010. Bayesian random local clocks, or one rate to rule them all. BMC Biol. 8:1–12. Eiserhardt W.L., Rohwer J.G., Russell S.J., Yesilyurt J.C., Schneider H. 2011. Evidence for radiations of cheilanthoid ferns in the Greater Cape Floristic Region. Taxon 60:1269–1283. Eyre-Walker A., Gaut B. 1997. Correlated rates of synonymous site evolution across plant genomes. Mol. Biol. Evol. 14:455–460. Fan Y., Wu R., Chen M.-H., Kuo L., Lewis P. 2011. Choosing among partition models in Bayesian phylogenetics. Mol. Biol. Evol. 28:523–532.
Farrar D. 1974. Gemmiferous fern gametophytes–Vittariaceae. Am. J. Bot. 61:146–155. Fitch W.M., Beintema J. 1990. Correcting parsimonious trees for unseen nucleotide substitutions: The effect of dense branching as exemplified by ribonuclease. Mol. Biol. Evol. 7:438–443. Gastony G.J., Rollo D.R. 1995. Phylogeny and generic circumscriptions of cheilanthoid ferns (Pteridaceae: Cheilanthoideae) inferred from rbcL nucleotide sequences. Am. Fern J. 85:341–360. Gastony G.J., Rollo D.R. 1998. Cheilanthoid ferns (Pteridaceae: Cheilanthoideae) in the southwestern United States and adjacent Mexico–a molecular phylogenetic reassessment of generic lines. Aliso 17:131–144. Gastony G.J., Yatskievych G. 1992. Maternal inheritance of the chloroplast and mitochondrial genomes in cheilanthoid ferns. Am. J. Bot. 79:716–722. Gaut B., Muse S., Clark W., Clegg M. 1992. Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants. J. Mol. Evol. 35:292–303. Gaya E., Redelings B.D., Navarro-Rosinés P., Llimona X., De Cáceres M., Lutzoni F. 2011. Align or not to align? Resolving species complexes within the Caloplaca saxicola group as a case study. Mycologia 103:361–378. Goldman N., Whelan S. 2002. A novel use of equilibrium frequencies in models of sequence evolution. Mol. Biol. Evol. 19:1821–1831. Goldman N., Yang Z. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736. Hebert P., Remigio E., Colbourne J., Taylor D., Wilson C. 2002. Accelerated molecular evolution in halophilic crustaceans. Evolution 56:909–926. Hoegg S., Vences M., Brinkmann H., Meyer A. 2004. Phylogeny and comparative substitution rates of frogs inferred from sequences of three nuclear genes. Mol. Biol. Evol. 21:1188–1200. Hugall A.F., Lee M.S.Y. 2007. The likelihood node density effect and consequences for evolutionary studies of molecular rates. Evolution 61:2293–2307. Hurvich C.M., Tsai C.-L. 1989.
Regression and time-series model selection in small samples. Biometrika 76:297–307. Johnson A.K., Rothfels C.J., Windham M.D., Pryer K.M. 2012. Unique expression of a sporophytic character on the gametophytes of notholaenid ferns (Pteridaceae). Am. J. Bot. 99:1118–1124. Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120. Kirkpatrick R.E.B. 2007. Investigating the monophyly of Pellaea (Pteridaceae) in the context of a phylogenetic analysis of cheilanthoid ferns. Syst. Bot. 32:504–518. Sanderson M.J. 2003. r8s: Inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301–302. Sarich V.M., Wilson A.C. 1967. Immunological time scale for hominid evolution. Science 158:1200. Schneider H., Smith A., Cranfill R., Hildebrand T., Haufler C., Ranker T. 2004. Unraveling the phylogeny of polygrammoid ferns (Polypodiaceae and Grammitidaceae): Exploring aspects of the diversification of epiphytic plants. Mol. Phylogenet. Evol. 31:1041–1063. Schon I., Martens K., Van Doninck K., Butlin R.K. 2003. Evolution in the slow lane: Molecular rates of evolution in sexual and asexual ostracods (Crustacea: Ostracoda). Biol. J. Linn. Soc. 79:93–100. Schuettpelz E., Grusz A.L., Windham M.D., Pryer K.M. 2008. The utility of nuclear gapCp in resolving polyploid fern origins. Syst. Bot. 33:621–629. Schuettpelz E., Korall P., Pryer K.M. 2006. Plastid atpA data provide improved support for deep relationships among ferns. Taxon 55:897–906. Schuettpelz E., Pryer K.M. 2006. Reconciling extreme branch length differences: Decoupling time and rate through the evolutionary history of filmy ferns. Syst. Biol. 55:485–502. Schuettpelz E., Pryer K.M. 2007.
Fern phylogeny inferred from 400 leptosporangiate species and three plastid genes. Taxon 56:1037–1050. Schuettpelz E., Pryer K.M. 2009. Evidence for a Cenozoic radiation of ferns in an angiosperm-dominated canopy. PNAS 106:11200–11205. Schuettpelz E., Schneider H., Huiet L., Windham M.D., Pryer K.M. 2007. A molecular phylogeny of the fern family Pteridaceae: Assessing overall relationships and the affinities of previously unsampled genera. Mol. Phylogenet. Evol. 44:1172–1185. Shao R., Dowton M., Murrell A., Barker S. 2003. Rates of gene rearrangement and nucleotide substitution are correlated in the mitochondrial genomes of insects. Mol. Biol. Evol. 20:1612–1619. Shapiro B., Rambaut A., Drummond A.J. 2005. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23:7–9. Simmons M.P., Ochoterena H. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49:369–381. Singh N.D., Arndt P., Clark A.G., Aquadro C.F. 2009. Strong evidence for lineage and sequence specificity of substitution rates and patterns in Drosophila. Mol. Biol. Evol. 26:1591–1605. Small R., Ryburn J., Cronn R., Seelanan T., Wendel J. 1998. The tortoise and the hare: Choosing between noncoding plastome and nuclear Adh sequences for phylogeny reconstruction in a recently diverged plant group. Am. J. Bot. 85:1301–1315. Smith S.A., Donoghue M.J. 2008. Rates of molecular evolution are linked to life history in flowering plants. Science 322:86–89. Soltis P.S., Soltis D.E., Savolainen V., Crane P.R., Barraclough T.G. 2002. Rate heterogeneity among lineages of tracheophytes: Integration of molecular and fossil data and evidence for molecular living fossils. PNAS 99:4430–4435. Stamatakis A. 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690. Stamatakis A., Hoover P., Rougemont J. 2008. A rapid bootstrap algorithm for the RAxML web servers. Syst.
Biol. 57:758–771. Suchard M.A., Redelings B.D. 2006. BAli-Phy: Simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22:2047–2048. Swofford D.L. 2002. PAUP*: Phylogenetic analysis using parsimony (*and other methods). Sunderland (MA): Sinauer. Tavaré S. 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. Am. Math. Soc. Lect. Math. Life Sci. 17:57–86. Thiers B. [continuously updated]. Index Herbariorum: A global directory of public herbaria and associated staff. New York Botanical Garden’s Virtual Herbarium. Available from: URL http://sweetgum.nybg.org/ih/ (last accessed January 1, 2013). Thomas J., Welch J.J., Woolfit M., Bromham L. 2006. There is no universal molecular clock for invertebrates, but rate variation does not scale with body size. PNAS 103:7366–7371. Tryon R.M. 1942. A revision of the genus Doryopteris. Contrib. Gray Herb. 143:1–80. Mooers A.Ø., Harvey P.H. 1994. Metabolic rate, generation time, and the rate of molecular evolution in birds. Mol. Phylogenet. Evol. 3:344–350. Muse S.V. 2000. Examining rates and patterns of nucleotide substitution in plants. Plant Mol. Biol. 42:25–43. Muse S., Gaut B. 1997. Comparing patterns of nucleotide substitution rates among chloroplast loci using the relative ratio test. Genetics 146:393–399. Muse S., Weir B. 1992. Testing for equality of evolutionary rates. Genetics 132:269–276. Nabholz B., Glémin S., Galtier N. 2008. Strong variations of mitochondrial mutation rate across mammals–the longevity hypothesis. Mol. Biol. Evol. 25:120–130. Neiman M., Hehman G., Miller J.T., Logsdon J.M. Jr., Taylor D.R. 2010. Accelerated mutation accumulation in asexual lineages of a freshwater snail. Mol. Biol. Evol. 27:954–963. Nielsen R., Yang Z. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.
Nikolaev S.I., Montoya-Burgos J.I., Popadin K., Parand L., Margulies E.H., Antonarakis S.E. 2007. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. PNAS 104:20443–20448. Pagel M., Venditti C., Meade A. 2006. Large punctuational contribution of speciation to evolutionary divergence at the molecular level. Science 314:119–121. Parkinson C.L., Mower J.P., Qiu Y.L., Shirk A.J., Song K.M., Young N.D., dePamphilis C.W., Palmer J.D. 2005. Multiple major increases and decreases in mitochondrial substitution rates in the plant family Geraniaceae. BMC Evol. Biol. 5:73. Peterson J., Brinkmann H., Cerff R. 2003. Origin, evolution, and metabolic role of a novel glycolytic GAPDH enzyme recruited by land plant plastids. J. Mol. Evol. 57:16–26. Posada D. 2008. jModelTest: Phylogenetic model averaging. Mol. Biol. Evol. 25:1253–1256. Prado J., Rodrigues C.D.N., Salatino A., Salatino M.L.F. 2007. Phylogenetic relationships among Pteridaceae, including Brazilian species, inferred from rbcL sequences. Taxon 56:355–368. Pryer K.M., Schuettpelz E., Wolf P.G., Schneider H., Smith A.R., Cranfill R. 2004. Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am. J. Bot. 91:1582–1598. Rambaut A., Drummond A.J. 2007. Tracer. Available from: URL http://tree.bio.ed.ac.uk/software/tracer/ (last accessed January 1, 2012). Redelings B.D., Suchard M.A. 2007. Incorporating indel information into phylogeny estimation for rapidly emerging pathogens. BMC Evol. Biol. 7:40. Ronquist F., Huelsenbeck J.P. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574. Rothfels C.J., Larsson A., Kuo L.-Y., Korall P., Chiou W.-L., Pryer K.M. 2012. Overcoming deep roots, fast rates, and short internodes to resolve the ancient rapid radiation of eupolypod II ferns. Syst. Biol. 61:490–509.
Rothfels C.J., Larsson A., Li F.-W., Sigel E.M., Huiet L., Burge D.O., Ruhsam M., Graham S.W., Stevenson D.W., Wong G.K.-S., Korall P., Pryer K.M. 2013a. Transcriptome-mining for single-copy nuclear markers in ferns. PLoS ONE, in press. Rothfels C.J., Windham M.D., Grusz A.L., Gastony G.J., Pryer K.M. 2008. Toward a monophyletic Notholaena (Pteridaceae): Resolving patterns of evolutionary convergence in xeric-adapted ferns. Taxon 57:712–724. Rothfels C.J., Windham M.D., Pryer K.M. 2013b. A plastid phylogeny of the cosmopolitan fern family Cystopteridaceae (Polypodiopsida). Syst. Bot. 38:295–306. Ruhfel B., Lindsay S., Davis C.C. 2008. Phylogenetic placement of Rheopteris and the polyphyly of Monogramma (Pteridaceae s.l.): Evidence from rbcL sequence data. Syst. Bot. 33:37–43. Sanderson M.J. 2002. Estimating absolute rates of molecular evolution and divergence times: A penalized likelihood approach. Mol. Biol. Evol. 19:101–109.
Tryon R.M., Tryon A.F., Kramer K.U. 1990. Pteridaceae. In: Kramer K.U., Green P.S., editors. The families and genera of vascular plants. Berlin: Springer. p. 230–256. Vangerow S., Teerkorn T., Knoop V. 1999. Phylogenetic information in the mitochondrial nad5 gene of pteridophytes: RNA editing and intron sequences. Plant Biol. 1:235–243. Venditti C., Meade A., Pagel M. 2006. Detecting the node-density artifact in phylogeny reconstruction. Syst. Biol. 55:637–643. Weisrock D.W. 2012. Concordance analysis in mitogenomic phylogenetics. Mol. Phylogenet. Evol. 65:194–202. Welch J.J., Bininda-Emonds O.R.P., Bromham L. 2008. Correlates of substitution rate variation in mammalian protein-coding sequences. BMC Evol. Biol. 8:53. Wertheim J.O., Sanderson M.J., Worobey M., Bjork A. 2010. Relaxed molecular clocks, the bias-variance trade-off, and the quality of phylogenetic inference. Syst. Biol. 59:1–8. Wikström N., Pryer K.M. 2005. Incongruence between primary sequence data and the distribution of a mitochondrial atp1 group II intron among ferns and horsetails. Mol. Phylogenet. Evol. 36:484–493. Windham M.D., Huiet L., Schuettpelz E., Grusz A.L., Rothfels C.J., Beck J. 2009. Using plastid and nuclear DNA sequences to redraw generic boundaries and demystify species complexes in cheilanthoid ferns. Am. Fern J. 99:68–72. Wolf P.G. 1997. Evaluation of atpB nucleotide sequences for phylogenetic studies of ferns and other pteridophytes. Am. J. Bot. 84:1429–1440. Wolfe K.H., Li W.-H., Sharp P.M. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. PNAS 84:9054–9058. Wolfe K.H., Sharp P.M., Li W.-H. 1989a. Mutation rates differ among regions of the mammalian genome. Nature 337:283–285. Wolfe K.H., Sharp P.M., Li W.-H. 1989b. Rates of synonymous substitution in plant nuclear genes. J. Mol. Evol. 29:208–211. Wong W.S.W., Yang Z., Goldman N., Nielsen R. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168:1041–1051. Woolfit M., Bromham L. 2003. Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes. Mol. Biol. Evol. 20:1545–1555. Woolfit M., Bromham L. 2005. Population size and molecular evolution on islands. P. Roy. Soc. B-Biol. Sci. 272:2277–2282. Xiang Q.-Y., Thorne J.L., Seo T.-K., Zhang W., Thomas D.T., Ricklefs R.E. 2008. Rates of nucleotide substitution in Cornaceae (Cornales)–Pattern of variation and underlying causal factors. Mol. Phylogenet. Evol. 49:327–342. Xie W., Lewis P., Fan Y., Kuo L., Chen M.-H. 2011. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Syst. Biol. 60:150–160. Yang Z. 1996. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11:367–372. Yang Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568–573. Yang Z. 2007. PAML 4: A program package for phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24:1586–1591. Yang Z., Nielsen R. 1998. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46:409–418. Yang Z., Nielsen R. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908–917. Yang Z., Nielsen R., Goldman N., Pedersen A.-M.K. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449. Yang Z., Rannala B. 2006. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23:212–226. Yang Z., Wong W.S.W., Nielsen R. 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol. Biol. Evol. 22:1107–1118. Yesilyurt J.C., Schneider H. 2010. The new fern genus Calciphilopteris (Pteridaceae). Phytotaxa 7:52–59. Yoder A.D., Yang Z. 2000. Estimation of primate speciation dates using local molecular clocks. Mol. Biol. Evol. 17:1081–1090. Zhang G., Zhang X., Chen Z., Liu H., Yang W. 2007. First insights in the phylogeny of Asian cheilanthoid ferns based on sequences of two chloroplast markers. Taxon 56:369–378. Zoller S., Lutzoni F. 2003. Slow algae, fast fungi: Exceptionally high nucleotide substitution rate differences between lichenized fungi Omphalina and their symbiotic green algae Coccomyxa. Mol. Phylogenet. Evol. 29:629–640. Zuckerkandl E., Pauling L. 1962. Molecular disease, evolution, and genetic heterogeneity. In: Kasha M., Pullman B., editors. Horizons in biochemistry. New York: Academic Press. p. 189–225. Zuckerkandl E., Pauling L. 1965. Evolutionary divergence and convergence in proteins. In: Bryson V., Vogel H.J., editors. Evolving genes and proteins. New York: Academic Press. p. 97–165.