SUPPLEMENTAL METHODS

Estimating Divergence Times in Angiosperms

To generate Figure 5 in the main manuscript, we estimated divergence times across angiosperms as follows. The full concatenated alignment was partitioned using the k-means algorithm of PartitionFinder with the --RAxML and --min-subset-size 5000 options. This produced 19 partitions. The partitions of the best scheme from PartitionFinder were used with the RAxML option “-f s” to generate separate sub-alignments. These separate PHYLIP files were then concatenated sequentially into a single file for use in PAML v4.9a, which recognizes them as separate partitions (Yang, 2007).

A chronogram was generated in PAML v4.9a using this partitioned alignment file and the best RAxML tree. The MCMCTree algorithm in PAML v4.9a permits long, genome-scale alignments to be used for chronogram estimation in a two-step process: an approximate likelihood calculation in step one, followed by Bayesian divergence time estimation in step two (Thorne et al. 1998; dos Reis and Yang 2011; dos Reis et al. 2012). The default priors were used for two runs with a burn-in of 6000 and a sample frequency of 2; the number of iterations (nsample) was set at 60,000. Convergence was checked by plotting divergence estimates from the two runs against each other, and effective sample sizes (ESS) were checked in Tracer v1.6 (Rambaut et al. 2014) to ensure that values were over 200.

Calibration priors generally followed Magallón et al. (2015). In their Bayesian analysis, priors for the calibration points used a lognormal distribution with the ln-mean set to the youngest fossil age for the calibration plus 10% (e.g., a minimum fossil age of 10 MYA would be set so that the mean was 11 MYA), with a standard deviation of 1.
PAML v4.9a does not use lognormal distributions for calibration points but instead uses the Cauchy distribution, although the program does provide a convenient mechanism for setting the 10% offset (Inoue et al., 2009). Bounds were set in the Newick-formatted reference tree at all calibrated nodes except the root using the PAML format 'L(minimum date, 0.1, 1.45, 1e-300)'. The offset of 0.1 (= 10% of the date) and a scale parameter of 1.45 give a density distribution similar to that of the lognormal calibration priors used by Magallón et al. (2015). This approach was recommended by Mario dos Reis (pers. comm.). The minimum bound was made hard (PAML defaults to soft bounds) via the 1e-300 parameter setting. Taken together, these settings create a hard minimum bound with the mode at an age 10% older than the observed fossil, while a skewed, long-tailed density distribution approximates a soft maximum bound on the calibrations (Inoue et al., 2009). This reflects the idea that the most recent common ancestor (MRCA) of a clade that includes a given fossil is likely to be older than the fossil itself.

Eight calibration points from Magallón et al. (2015) could be used. The crown-group node for the root, Angiospermae (MRCA Viburnum lantanoides-Amborella trichopoda), received lower and upper bounds of 136 and 139.35 MYA (coded in PAML as 'B(1.36, 1.3935)'). All other calibrated nodes received hard minimum bounds and an offset for the mode with soft maximum bounds, following the procedure described above.
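As an illustration of the calibration format described above, the following Python sketch builds the quoted calibration labels from fossil ages given in MYA. It is not the script used in this study: the function names are hypothetical, and the time unit of 100 MY is inferred from the root calibration (136 MYA coded as 1.36). The implied mode simply restates the 10% offset rule from the text.

```python
# Minimal sketch, not the authors' script. Function names are hypothetical.
# Assumption: one PAML time unit = 100 MY (136 MYA is coded as 1.36 in the text).
TIME_UNIT_MY = 100.0


def min_bound(age_mya, offset=0.1, scale=1.45, tail_prob=1e-300):
    """Return a quoted 'L(...)' label: minimum bound with offset and scale."""
    t = age_mya / TIME_UNIT_MY
    return f"'L({t:g},{offset:g},{scale:g},{tail_prob:g})'"


def root_bounds(lower_mya, upper_mya):
    """Return a quoted 'B(...)' label: joint lower and upper bounds."""
    lo = lower_mya / TIME_UNIT_MY
    hi = upper_mya / TIME_UNIT_MY
    return f"'B({lo:g},{hi:g})'"


def implied_mode_mya(age_mya, offset=0.1):
    """Age at the mode under the 10% offset rule (fossil age plus 10%)."""
    return age_mya * (1.0 + offset)


if __name__ == "__main__":
    # Ages below are calibration values quoted in the text.
    print(root_bounds(136, 139.35))   # root, Angiospermae
    print(min_bound(112))             # Monocotyledoneae
    print(implied_mode_mya(112))      # about 123.2 MYA
```

The resulting labels would be placed after the closing parenthesis of the corresponding clade in the Newick reference tree.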
These minimum bounds were as follows: Monocotyledoneae (MRCA Rhynchospora caduca-Acorus americanus) received a lower bound of 112 MYA; Arecaceae (MRCA Sabal bermudana-Musa acuminata) 83.5 MYA; Typhaceae (MRCA Rhynchospora caduca-Typha latifolia) 65.5 MYA; Nelumbonaceae (MRCA Protea pruinosa-Nelumbo nucifera) 99.6 MYA; Solanaceae (MRCA Solanum tuberosum-Ipomoea purpurea) 33.9 MYA; Brassicales (MRCA Stanleya albescens-Theobroma cacao) 89.3 MYA; and Brassicaceae (MRCA Stanleya albescens-Arabidopsis thaliana) 23.03 MYA. In addition, two fossils in Cyperaceae not used by Magallón et al. (2015) were used: a fossil for the Mapanoid clade (MRCA Chorizandra multiarticulata-Hypolytrum nemorum) at 47 MYA (Smith et al. 2009) and one for Carex at 37.8 MYA (Smith et al. 2010).

The resulting branching-time estimates were used to estimate the divergence time between the reference taxa used in the probe design and the taxa sequenced using hybrid enrichment. The chronogram from PAML was read into R using the package ape version 3.4 (Paradis et al., 2004). The command “mrca” was used to determine the node number for each pair of species on the tree, and the command “branching.times” was used to obtain the branching times for all nodes.
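The pairwise lookup described above (find the MRCA of two tips, then read the age of that node) can be sketched in Python as follows. This is an analogue for illustration only, not the R/ape code used in the study. The tree, tip labels, and branch lengths are made up, the function names are hypothetical, and the tree is assumed to be ultrametric so that a node's age equals its distance to any descendant tip.

```python
# Minimal sketch of "find the MRCA, then read its age" on a chronogram.
# Not the authors' R code; the toy tree and all names are hypothetical.

def parse_newick(text):
    """Parse a simple Newick string (labels and branch lengths only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_node():
        nonlocal pos
        node = {"name": "", "length": 0.0, "children": []}
        if text[pos] == "(":
            pos += 1                      # skip "("
            while True:
                node["children"].append(read_node())
                if text[pos] == ",":
                    pos += 1              # another child follows
                    continue
                pos += 1                  # skip ")"
                break
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node["name"] = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    return read_node()


def tips(node):
    """Return the set of tip labels below a node."""
    if not node["children"]:
        return {node["name"]}
    return set().union(*(tips(child) for child in node["children"]))


def mrca(root, a, b):
    """Return the most recent common ancestor node of tips a and b."""
    if not {a, b} <= tips(root):
        raise ValueError("both tips must be present in the tree")
    node = root
    while True:
        for child in node["children"]:
            if {a, b} <= tips(child):
                node = child              # both tips lie in this subtree
                break
        else:
            return node                   # no child contains both tips


def node_age(node):
    """Age of a node, assuming an ultrametric tree with tips at age 0."""
    if not node["children"]:
        return 0.0
    first = node["children"][0]
    return first["length"] + node_age(first)


def divergence_time(root, a, b):
    """Age of the MRCA of tips a and b."""
    return node_age(mrca(root, a, b))


if __name__ == "__main__":
    # Toy ultrametric tree with illustrative branch lengths.
    tree = parse_newick("((ref_taxon:1,sample_1:1):2,sample_2:3);")
    print(divergence_time(tree, "ref_taxon", "sample_1"))
    print(divergence_time(tree, "ref_taxon", "sample_2"))
```

Recomputing tip sets at each step is wasteful on large trees, but it keeps the sketch short; a real analysis would rely on an established phylogenetics library.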