SUPPLEMENTAL METHODS
Estimating Divergence Times in Angiosperms
In order to generate Figure 5 in the main manuscript, we estimated divergence
times across angiosperms as follows. The full concatenated alignment was partitioned
using the k-means algorithm of PartitionFinder with the --RAxML and --min-subset-size
5000 options, which created 19 partitions. The partitions of the best scheme from PartitionFinder
were used with the RAxML option “-f s” to generate separate sub-alignments. These
separate PHYLIP files were then concatenated sequentially into a single file for use in
PAML v4.9a, which recognizes them as separate partitions (Yang, 2007). A chronogram
was generated in PAML v4.9a using this partitioned alignment file and the best RAxML
tree. The MCMCTree algorithm in PAML v4.9a permits long, genome-scale alignments to
be used for chronogram estimation in a two-step process that takes advantage of an
approximate likelihood calculation in step one followed by Bayesian divergence time
estimation in step two (Thorne et al. 1998; dos Reis and Yang 2011; dos Reis et al. 2012).
The default priors were used for two runs with a burn-in of 6,000 and a sampling frequency of 2;
the number of samples (nsample) was set at 60,000. Convergence was checked by
plotting divergence estimates from the two runs against each other, and effective sample
sizes (ESS) were checked in Tracer v1.6 (Rambaut et al. 2014) to ensure that values were
over 200.
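The run settings above can be summarized in a short Python sketch that renders the corresponding MCMCTree control-file entries for the two steps. This is an illustration rather than the control file actually used: the file names are hypothetical placeholders, ndata = 19 is inferred from the 19 partitions, and all entries not stated in the text (clock model, substitution model, priors) are deliberately omitted. It assumes the usual MCMCTree conventions that usedata = 3 prepares the approximate-likelihood inputs, usedata = 2 runs the MCMC with that approximation, and the chain length is burnin + sampfreq × nsample.

```python
# Settings stated in the text; ndata = 19 is an inference from the 19 partitions.
MCMC_SETTINGS = {"ndata": 19, "burnin": 6000, "sampfreq": 2, "nsample": 60000}


def control_fragment(usedata, seqfile="partitions.phy", treefile="calibrated.tre"):
    """Render control-file lines for one MCMCTree step.

    usedata=3: step one, prepare the approximate-likelihood inputs.
    usedata=2: step two, divergence time estimation using the approximation.
    The default file names are hypothetical placeholders.
    """
    if usedata not in (2, 3):
        raise ValueError("this sketch only covers usedata = 3 (step one) or 2 (step two)")
    entries = {"seqfile": seqfile, "treefile": treefile, "usedata": usedata}
    entries.update(MCMC_SETTINGS)
    width = max(len(key) for key in entries)
    return "\n".join(f"{key.rjust(width)} = {value}" for key, value in entries.items())


def total_iterations(burnin, sampfreq, nsample):
    """Chain length: burn-in plus one retained sample every `sampfreq` iterations."""
    return burnin + sampfreq * nsample


if __name__ == "__main__":
    print(control_fragment(3))
    print()
    print(control_fragment(2))
    print()
    print("iterations per run:", total_iterations(6000, 2, 60000))
```

Under these assumptions each run comprises 126,000 iterations, of which the first 6,000 are discarded.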
Calibration priors generally followed Magallón et al. (2015). In their Bayesian
analysis, the priors for the calibration points used a lognormal distribution with the ln-mean
set to represent the youngest fossil age for the calibration plus 10% (e.g., a
minimum fossil age of 10 MYA would be set so the mean was 11 MYA), with a standard
deviation of 1. PAML v4.9a does not use lognormal distributions for calibration points,
but instead uses a truncated Cauchy distribution, which offers a convenient
mechanism for setting the 10% offset (Inoue et al., 2009). Bounds were set in the
Newick-formatted reference tree at all calibrated nodes except the root using the PAML format of
'L(minimum date, 0.1, 1.45, 1e-300)'. The offset of 0.1 (= 10% of the date) and a scale
parameter of 1.45 together give a density distribution similar to that of the lognormal
calibration priors used by Magallón et al. (2015). This approach was recommended by
Mario dos Reis (pers. comm.). The minimum bound was made hard (PAML
defaults to soft bounds) via the 1e-300 parameter setting. Taken together, these settings
create a hard minimum bound with the mode at an age 10% older than the observed fossil
but with a skewed long-tailed density distribution being used to approximate a soft
maximum bound on the calibrations (Inoue et al., 2009). This takes into account the idea
that the most recent common ancestor (MRCA) of a clade that includes a given fossil is
likely to be older than the fossil itself.
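The shape of this calibration density can be sketched in Python. The sketch assumes that, with the hard bound, the prior is effectively a Cauchy distribution with location equal to the minimum age times (1 + offset) and scale equal to the scale parameter times the minimum age, truncated at the minimum age. This is our reading of the parameterization in Inoue et al. and not a reproduction of PAML's code; the function names are illustrative.

```python
import math


def _cauchy_parts(t_min, p, c):
    """Location, scale, and the untruncated probability mass above t_min."""
    location = t_min * (1.0 + p)
    scale = c * t_min
    mass_above = 0.5 + math.atan(p / c) / math.pi
    return location, scale, mass_above


def truncated_cauchy_density(t, t_min, p=0.1, c=1.45):
    """Density at age t of the Cauchy prior truncated to ages >= t_min."""
    if t < t_min:
        return 0.0  # hard minimum bound: no density younger than the fossil
    location, scale, mass_above = _cauchy_parts(t_min, p, c)
    z = (t - location) / scale
    return 1.0 / (mass_above * math.pi * scale * (1.0 + z * z))


def probability_older_than(t, t_min, p=0.1, c=1.45):
    """Prior probability that the node age exceeds t."""
    if t <= t_min:
        return 1.0
    location, scale, mass_above = _cauchy_parts(t_min, p, c)
    z = (t - location) / scale
    return (0.5 - math.atan(z) / math.pi) / mass_above


if __name__ == "__main__":
    t_min = 1.12  # 112 MYA in units of 100 MY
    for age in (t_min, 1.1 * t_min, 1.5 * t_min, 2.0 * t_min):
        print(age, truncated_cauchy_density(age, t_min), probability_older_than(age, t_min))
```

The density peaks at 1.1 times the minimum age and declines slowly toward older ages, which is the long tail that stands in for a soft maximum bound.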
Eight calibration points could be used from Magallón et al. (2015). The crown
group node for the root, Angiospermae (MRCA Viburnum lantanoides-Amborella
trichopoda), received lower and upper bounds of 136 and 139.35 MYA (in PAML coded
as 'B(1.36, 1.3935)'). All other calibrated nodes received hard minimum bounds and an offset for
the mode with soft maximal bounds according to the procedure described previously.
They were as follows: Monocotyledoneae (MRCA Rhynchospora caduca-Acorus
americanus) received a lower bound of 112 MYA; Arecaceae (MRCA Sabal bermudana-Musa
acuminata) 83.5 MYA; Typhaceae (MRCA Rhynchospora caduca-Typha latifolia)
65.5 MYA; Nelumbonaceae (MRCA Protea pruinosa-Nelumbo nucifera) 99.6 MYA;
Solanaceae (MRCA Solanum tuberosum-Ipomoea purpurea) 33.9 MYA; Brassicales
(MRCA Stanleya albescens-Theobroma cacao) 89.3 MYA; and Brassicaceae (MRCA
Stanleya albescens-Arabidopsis thaliana) 23.03 MYA. In addition, two fossils not used
by Magallón et al. (2015) in Cyperaceae were used: a fossil for the Mapanoid clade
(MRCA of Chorizandra multiarticulata-Hypolytrum nemorum) 47 MYA (Smith et al.
2009) and for Carex 37.8 MYA (Smith et al. 2010).
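For reference, the calibrations listed above can be converted into the node labels placed in the reference tree with a few lines of Python. The ages are those given in the text; the time unit of 100 MY is inferred from the root label 'B(1.36, 1.3935)', and the function and variable names are illustrative only.

```python
TIME_UNIT_MY = 100.0  # inferred from the root label: 136 MYA is coded as 1.36

# Minimum ages (MYA) for the nine lower-bound calibrations listed in the text.
MINIMUM_AGES_MYA = {
    "Monocotyledoneae": 112.0,
    "Arecaceae": 83.5,
    "Typhaceae": 65.5,
    "Nelumbonaceae": 99.6,
    "Solanaceae": 33.9,
    "Brassicales": 89.3,
    "Brassicaceae": 23.03,
    "Mapanoid clade": 47.0,
    "Carex": 37.8,
}


def lower_bound_label(age_mya, offset=0.1, scale=1.45, tail_prob=1e-300):
    """Format a hard minimum-bound calibration with the offset and scale used here."""
    age = age_mya / TIME_UNIT_MY
    return f"'L({age:g}, {offset:g}, {scale:g}, {tail_prob:g})'"


def root_label(lower_mya, upper_mya):
    """Format the lower and upper bounds used for the root."""
    return f"'B({lower_mya / TIME_UNIT_MY:g}, {upper_mya / TIME_UNIT_MY:g})'"


if __name__ == "__main__":
    print("Angiospermae (root)", root_label(136.0, 139.35))
    for clade, age in MINIMUM_AGES_MYA.items():
        print(clade, lower_bound_label(age))
```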
These branching time estimates were used to estimate the divergence time
between the reference taxa used in the probe design and the taxa sequenced using hybrid
enrichment. The chronogram from PAML was read into R using the package ape version
3.4 (Paradis et al., 2004). The command “mrca” was used to determine the node numbers
for each pair of species on the tree, and the command “branching.times” was used to get
the branching time for all nodes.
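The logic of this lookup can be illustrated with a minimal Python sketch. It is an analogue of the two ape commands, not the R code that was run: the tree is a hypothetical three-taxon ultrametric example stored as a child-to-parent map, and all names are illustrative.

```python
# Toy ultrametric tree ((A:1.0,B:1.0)n1:0.5,C:1.5)root, stored as
# child -> (parent, branch length). Taxa and lengths are hypothetical.
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("root", 1.5),
}


def ancestors(node, parent):
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in parent:
        node = parent[node][0]
        path.append(node)
    return path


def mrca(a, b, parent):
    """Most recent common ancestor of two nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def depth(node, parent):
    """Summed branch length from the root down to a node."""
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
    return total


def branching_times(parent):
    """Age of every internal node, measured back from the tips."""
    internal = {up for up, _ in parent.values()}
    tree_height = max(depth(node, parent) for node in parent)
    return {node: tree_height - depth(node, parent) for node in internal}


def divergence_time(a, b, parent):
    """Age of the MRCA of two taxa."""
    return branching_times(parent)[mrca(a, b, parent)]


if __name__ == "__main__":
    print(mrca("A", "B", PARENT), divergence_time("A", "B", PARENT))
    print(mrca("A", "C", PARENT), divergence_time("A", "C", PARENT))
```

Applied to the PAML chronogram, the same two steps give the divergence time for each pair of reference and sequenced taxa, in the time units of the tree.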