Evolution of a RNA Polymerase Gene Family in Silene

Syst. Biol. 53(6):914–932, 2004
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150490888840
Evolution of a RNA Polymerase Gene Family in Silene (Caryophyllaceae)—Incomplete
Concerted Evolution and Topological Congruence Among Paralogues
MAGNUS POPP AND BENGT OXELMAN
Department of Systematic Botany, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden;
E-mail: [email protected] (M.P.)
Abstract.—Four low-copy nuclear DNA intron regions from the second largest subunits of the RNA polymerase gene
family (RPA2, RPB2, RPD2a, and RPD2b), the internal transcribed spacers (ITSs) from the nuclear ribosomal regions, and
the rps16 intron from the chloroplast were sequenced and used in a phylogenetic analysis of 29 species from the tribe
Sileneae (Caryophyllaceae). We used a low stringency nested polymerase chain reaction (PCR) approach to overcome the
difficulties of constructing specific primers for amplification of the low copy nuclear DNA regions. Maximum parsimony
analyses resulted in largely congruent phylogenetic trees for all regions. We tested overall model congruence in a likelihood
context using the software PLATO and found that ITSs, RPA2, and RPB2 deviated from the maximum likelihood model
for the combined data. The topology parameter was then isolated and topological congruence assessed by nonparametric
bootstrapping. No strong topological incongruence was found. The analysis of the combined data sets resolves previously
poorly known major relationships within Sileneae. Two paralogues of RPD2 were found, and several independent losses and
incomplete concerted evolution were inferred. The among-site rate variation was significantly lower in the RNA polymerase
introns than in the rps16 intron and ITSs, a property that is attractive in phylogenetic analyses. [Caryophyllaceae; concerted
evolution; congruence test; low-copy nuclear genes; paralogy; RNA polymerase; Sileneae.]
Nuclear ribosomal DNA (nrDNA) and chloroplast
DNA (cpDNA) are the most widely used DNA regions to
infer phylogenetic relationships among plants. And for
good reasons too; both nrDNA and cpDNA are abundant in plant cells, they are usually easy to amplify and
sequence using standard protocols, and the phylogenetic
interpretations are generally reasonably uncomplicated.
However, there are some limitations with nrDNA
and cpDNA, for example incomplete homogenization
of the tandem repeats in nrDNA (Buckler et al., 1997), or
loss of one or more of the homoeologs in allopolyploids
(Wendel et al., 1995). Due to the predominant maternal
inheritance of chloroplasts, the evolutionary history of
cpDNA may differ substantially from that of the main
part of the nuclear genome (Cronn et al., 2002; Ferguson
and Jansen, 2002), and the transfer of DNA regions from
the plastid to the nucleus (Martin and Herrmann, 1998;
Rujan and Martin, 2001) may further complicate the
picture.
Although these characteristics of nrDNA and cpDNA
sometimes may cause severe difficulties when inferring
organismal phylogenies, in many cases they do not. The
major problem when working at shallower taxonomical
levels is often that there is just not enough information
to draw unambiguous conclusions from. The ambiguity
may stem from too little information in the nucleotide
substitutions and/or from cpDNA/nrDNA trees that are
discordant with, or incomplete with respect to,
the organismal history (see Sang, 2002).
To deal with these problems, there has been a surge of
“new” low-copy nuclear DNA regions (lcnDNA) used in
plant phylogenetics in the last few years, for example Adh
(Yokoyama and Harry, 1993; Kosuge et al., 1995), GPAT
(Tank and Sang, 2001), waxy (Mason-Gamer et al., 1998),
Betv1 (Wen et al., 1997), PgiC (Gottlieb and Ford, 1996),
and RPB2 (Denton et al., 1998; Oxelman and Bremer,
2000; Popp and Oxelman, 2001). Cronn et al. (2002) used
11 single-copy nuclear loci to study the evolution of the
major lineages in Gossypium (Malvaceae).
The incorporation of lcnDNA regions in phylogenetics has not been without obstacles. The evolution of lcnDNA regions is largely unknown but seems to be rather
dynamic with fluctuating copy numbers, differences in
chromosomal locations, and recombination events (e.g.,
Gottlieb and Ford, 1996; Clegg et al., 1997; Martin and
Burg, 2002). This contributes to the difficulty of determining the orthology of the sequences used and emphasizes
the importance of sampling sequences as completely
as possible. Paralogy may stem from, for example, duplications within the same organism or lateral
transfer between organisms (Wendel and Doyle, 1998).
In plants, allopolyploidy is an important evolutionary
process, in which the entire genomes of the parental lineages are merged. It is possible to differentiate between
paralogy due to gene duplication and paralogy due to
allopolyploidy by using several unlinked regions simultaneously. However, few, if any, protocols for lcnDNA
regions can be applied to a given plant group without
optimization and redesign of polymerase chain reaction
(PCR) primers and of PCR parameters. Sang (2002) considered it unlikely that there will be universal primers for
the majority of low-copy nuclear genes. Instead, primers
specific for less inclusive groups (families, genera) will
have to be developed.
The RNA polymerase (RNAP) family consists of three
large nuclear DNA-dependent RNA polymerase holoenzymes in most eukaryotes. RNAPs I and III transcribe
structural RNA such as rRNA and tRNA, respectively,
whereas RNAP II mainly transcribes mRNA. However,
a fourth member, RNAP IV, is found only in plants
and its function is yet to be elucidated (The Arabidopsis Genome, 2000). In Arabidopsis thaliana, three of the
genes (RPA2, RPB2, and RPC2), encoding the second
largest subunits of these holoenzymes, are single-copy
and are located on chromosomes 1, 4, and 5, respectively,
whereas the fourth (RPD2) is present in two, presumably
recently diverged paralogues located on chromosome 3
(The Arabidopsis Genome, 2000).
The phylogeny of the tribe Sileneae (Caryophyllaceae)
has recently been investigated using nrDNA (the ITS regions) and cpDNA (the rps16 intron) data (Oxelman and
Lidén, 1995; Desfeux and Lejeune, 1996; Oxelman et al.,
1997, 2001). We chose 29 species from Sileneae to study
the evolution and the phylogenetic utility of the RNAP
introns and to compare the results with the analyses of
the ITS and rps16 data.
In this paper we aim to (1) test the phylogenetic hypothesis based on ITS and rps16 data in Sileneae (Oxelman
et al., 2001); (2) provide future studies of Sileneae with
backbone information from several, presumably unlinked regions, thus facilitating inferences of gene duplications and allopolyploidizations; and (3) investigate
the topological congruence among trees inferred from
the data sets. To accomplish this we develop a general method for rapid design of primers targeting all
members of a gene family; in this case, a region coding
for the second largest subunits of the RNA polymerase
family. We aim at a complete sampling, i.e., all orthologous and paralogous sequences, of intron regions between two highly conserved amino acid motifs (GDK
and GEMERD; Fig. 1) in the genes encoding the second
largest subunits (RPA2, RPB2, RPC2, and RPD2, respectively) in the RNAP family.
MATERIAL AND METHODS
Plant Materials
Plant materials used in this study are presented with
voucher data and GenBank/EMBL accession numbers
in Table 1.
Total genomic DNA was extracted as described in
Oxelman et al. (1997), or in a few cases using either
DNeasy Plant Mini Kit (QiaGen) or Plant DNA Isolation
Kit (Boehringer Mannheim) according to the manufacturer’s manual.
PCR and Sequencing
Typically, 0.625 U of Taq polymerase from Advanced
Biotechnologies was used in 25 µL PCR reactions, with
1.5 to 2.5 mM Mg2+, 200 µM of each dNTP, 0.5 to 1.0 µM
of each primer, 0.01% bovine serum albumin (Boehringer
Mannheim), and 0.5 to 1.0 µL total genomic DNA of unknown concentration.
The rps16 and ITS regions were amplified using PCR
cycling with an initial 5 min denaturation at 95 °C, followed by 35 cycles of 30 s denaturation at 95 °C, 1 min
annealing at 56 to 58 °C, and 2 min extension at 72 °C,
ending with a 10 min final extension at 72 °C. Primers
rpsF/rpsR2R (rps16; Oxelman et al., 1997) and P17/26S82R (ITS; Popp and Oxelman, 2001) were used for PCR,
and rpsF2a/rpsR3R (rps16; Popp et al., in press) and
P16b/ITS4R (ITS; Popp et al., in press, and White et al.,
1990, respectively) were used for sequencing.
To obtain the first RNAP intron sequences we used a
low-stringency nested PCR approach (Fig. 2). The first
PCR was performed with all combinations of the four
RNAP-specific primers kindly provided by B. D. Hall,
Washington University, Seattle, USA (Table 2) to amplify
the region between the highly conserved amino acid motifs GDK and GEMERD of RPA2, RPB2, RPC2, and RPD2
simultaneously (Fig. 1). After an initial 5 min denaturation at 95 °C, the cycling started with 30 s at 95 °C. Annealing began with 3 s at 50 °C followed by a 0.3 °C/s increase
up to 72 °C. This was followed by 72 °C for 2 to 5 min
(+1 s/cycle). The cycling was repeated 34 times before a 15 min extension at 72 °C and subsequent soak at
4 °C. The result was a heterogeneous pool of PCR products, presumably including all the sought-after RNAP
introns.
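The degenerate primers listed in Table 2 are written with IUPAC ambiguity codes, so each primer stands for a pool of oligonucleotides. As a minimal sketch that is not part of the original protocol, the following Python snippet counts and enumerates the sequences such a primer encodes. The function names `degeneracy` and `expand` are illustrative assumptions, and the code table is the standard IUPAC one. Primers that contain inosine (I), such as RNAP10F, fall outside this table and would need separate handling.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])  # raises KeyError for non-IUPAC symbols such as I
    return n


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence a degenerate primer stands for."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


rnap11br = "GGWGARATGGARCGWGATTG"   # RNAP11bR, copied from Table 2
d2r = "YRCCNGTYCKDCCRTYGTADAC"      # D2R, copied from Table 2

print(degeneracy(rnap11br))   # four two-fold positions (W, R, R, W)
print(degeneracy(d2r))
print(expand("AY"))           # tiny illustration of the expansion
```

Enumerating with `expand` is only practical for primers of low degeneracy; for highly degenerate primers such as D2R, the count from `degeneracy` is usually the more useful number.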
The second PCR (Fig. 2) used the same PCR program,
degenerate but subunit-specific primers provided
by B. D. Hall (Table 2), and the pooled PCR products
from the previous four reactions as template. In this PCR
round, the subunit introns were separated. The resulting
FIGURE 1. Structure of second largest subunits of the RNAP gene family in Arabidopsis thaliana. Accession numbers RPA2 to RPD2b: AC008030,
AL035527, AB012240, AB020749, and AP000377, respectively. Boxes represent exons and lines represent introns. Lengths are proportional to
scale bar. Arrows indicate the highly conserved amino acid regions GDK and GEMERD, and also approximate primer sites for RNAP10F,
RNAP10FF, RNAP11R, and RNAP11bR. Note that the two paralogous RPD2 sequences in A. thaliana are not orthologous to the two paralogues
in Sileneae.
Vouchera,b
2
rps16c
1
ITSc
1
RPA2c
2
2
RPC2c
D—082
D—081
D—083
F—140
D—084–85
F—1321
D—086–87
D—0882
F—141
D—090
n.a.
n.a.
n.a.
n.a.
n.a.
n.a.
n.a.
n.a.
n.a.
n.a.
D—063–64 D—154–56
D—065
n.a.
D—066
n.a.
n.a.
n.a.
D—067, 91
n.a.
n.a.
n.a.
D—0892
n.a.
D—068
n.a.
D—0692
D—1572
D—070-71
n.a.
D—072
n.a.
D—073
n.a.
D—0742
n.a.
F—139
n.a.
D—076
n.a.
D—0772
n.a.
D—078
n.a.
D—079–80
D—158
D—0752
n.a.
RPB2c
2
D—102-04
D—100–01
D—120
D—098–99
D—107, 10–12, 18
D—105–063
D—147–48
D—130–312
D—108-09
D—119
D—1382
D—139–40
D—150–53
n.a.
D—125
D—116–17
D—135–362
D—141–42
D—092, 442
D—113–15
D—122–24
D—093–94
D—126–272
D—121, 28–29
D—132–33
D—145–462
D—134, 51–52
D—095–97
D—137, 43, 492
RPD2a/bc
a Superscript numbers indicate corresponding specimen if sequences are produced from more than one voucher.
b BO = B. Oxelman; MP = M. Popp; AS = A. Strid et al.; GF = Gilbert and Fries; OH = O. Hedberg; H = Holmdahl; OT = Oxelman and Tollsten; G = Gubanov; M = Mikhajlova [?]; ME = M. Egger; C = Christodoulakis.
c A— = Z831; B— = X868; C— = AJ629; D— = AJ634; E— = AJ409; F— = AJ296.
1
Agrostemma githago, 2n = 24
BO ITS-AGR 30616 (GB) , MP 1049 (UPS)
A—54
B—95
C—279–80
Atocion armeria, 2n = 24
BO ITS-ARM30611(GB)
A—59
B—80
n.a.
Atocion lerchenfeldiana, 2n = 24
AS 24188 (C)
E—061
E—057
C—281
Atocion rupestre, 2n = 24
BO 2198 (GB)
A—60
B—74
C—282
Eudianthe coeli-rosa, 2n = 24
BO 2285 (GB)
A—56
B—81
C—283
Eudianthe laeta, 2n = 24
BO 1876 (GB)
A—55
B—82
C—284
1
2
1
1
Lychnis abyssinica, 2n = ?
GF 8418 (UPS) , OH 5530 (UPS)
A—61
B—90
C—2852
Lychnis chalcedonica, 2n = 24
BO 2277 (GB)
A—64
B—94
C—286
A—651
B—911
C—2872
Lychnis coronaria, 2n = 24
BO 2278 (GB)1 , MP 1050 (UPS)2
Lychnis flos-cuculi, 2n = 24
BO 2200 (GB)
A—63
B—93
C—288
Lychnis flos-jovis, 2n = 24
BO ITS-FLO30610 (GB)
A—66
B—92
n.a.
Petrocoptis pyrenaica, 2n = 24
BO 2276 (GB)
A—67
B—75
C—289
A—891
B—601
C—2902
Silene acaulis, 2n = 24
BO 2243 (GB)1 , MP 1046 (UPS)2
Silene baccifera, 2n = 24
BO 2287 (GB)
A—69
B—89
C—291
Silene bergiana, 2n = 24
H 1182 (GB)
A—91
B—35
C—292
Silene conica, 2n = 20
BO 1944 (GB)1 , BO 1898 (GB)2
A—701
B—321
C—2932
Silene fruticosa, 2n = 24
OT 934 (GB)
A—88
B—65
C—294
Silene keiskei, 2n = 24, 48
BO 2345 (UPS)
C—913
C—909 C—295, 68–69
1
2
1
1
E—060
E—058
C—296–972
Silene linnaeana, 2n = 24
G 143 (MV) , M 1975.VI.28 (K)
Silene ajanensis, 2n = ?
Silene nivalis, 2n = 24
BO 2255 (GB)
A—90
B—61
C—299
Silene nigrescens, 2n = ?
KGB217 (GB)
C—915
B—58
C—298
Silene nocturna, 2n = 24
OT 654 (GB)
A—92
B—41
C—300
Silene noctiflora, 2n = 24
BO 2229 (GB)
A—76
B—29
n.a.
Silene parishii, 2n = 48
ME 886 (WTU)
C—914
C—910
C—301–02
C—3031
Silene pentelica, 2n = 24
MP 1008 (UPS)1 , MP 1009 (UPS)2 , C 2046 (GB)3 AJ2949661 AJ2998122
Silene rotundifolia, 2n = 48
BO 2231 (GB)
A—83
B—87
C—304
A—941
B—521
C—3052
Silene schafta, 2n = 24
BO 2264 (GB)1 , MP 1053 (UPS)2
Silene zawadskii, 2n = 24
BO 2241 (GB)
A—77
B—83
C—306
Viscaria vulgaris, 2n = 24
MP 1051 (UPS)
C—912
C—911
C—307
Taxon/chromosome number
TABLE 1. Plant material and GenBank accession numbers.
TABLE 2. Primers used for PCR and sequencing.

Primer      Sequence

RNAP-specific forward (F) and reverse (R) primers used in first PCR
RNAP10F     TTYTCIAGYATGCAYGGICARAARGG
RNAP10FF    GGNGAYAARTTYDSNWSNMKRCAYGGNCAR
RNAP11R     ARRCARTCNCKYTCCATYTCNCC
RNAP11bR    GGWGARATGGARCGWGATTG

Subunit-specific (A, B, C, and D, respectively) forward (F) and reverse (R) primers used in second PCR
A2F         GTTTGYTCTCARTTRTGGCCWG
A2Ra        GRACCATGTGWCGCAGHCKYTGRTA
B2F         TGGWCNRYBGARGGSATHAC
B2R         NCCRCGCAYTGRTANCCRCA
C2F         GAATCCWCATGGKTTYCCAAGG
C2R         CAACYTTRTCAGCATKACCAC
D2F         CCHGGNCARYTBYTDGARGCTGCYYT
D2R         YRCCNGTYCKDCCRTYGTADAC

Sileneae-specific forward (F) and reverse (R) PCR primers (P)
RPA2FP      GCCGTTTTCWGAGATAACTGGGATGCGT
RPA2RP      GRTAATAAACAGGYCCAATAAAGATCTC
F7327a      CCATCYCGTATGACAATCGGYCAGCTT
R7586a      CCCMGTGTGACCATTGTACATTGTCT
RPD2FP      GCATGTGGTGGYACDTTGAGATATGCT
RPD2RP      CTTTCAYTYCCCCATCGACAGAATCCAG

Sileneae-specific forward (F) and reverse (R) sequencing primers (S)
RPA2FS      CATGCRTTTCCTTCTAGRATGAC
RPA2RS      GTTAAMTCGGTRCCATAAACTC
F7381a      AGCGTCTCCTTCCTTACCCACATGAGC
R7555a      CCACGCATCTGATACCCACATTTCTG
RPD2FS      CTGTTGAATCSATTACRGAGCAACTTC
RPD2RS      CAGAATCCAGCCCTGCAATC
RPD2FSa     GGTATCCCATTTAMGACTTNKTCTTTTG
RPD2FSb     GGTATCCCATTWAAGACTTRAAGGAAA
a From Popp and Oxelman (2001).
FIGURE 2. Outline of the nested PCR procedure.
fragments were cloned using the TOPO TA Cloning Kit
(Invitrogen) according to the manufacturer’s manual,
with the exception that only half of the volumes recommended for the reactions were used.
Between 15 and 40 positive colonies from each reaction were screened by direct PCR (Fig. 2) using T7 and
M13R universal primers. In general, 5 to 15 PCR products from individual colonies were purified using either
the QIAquick PCR-purification Kit (QiaGen) or Multiscreen
PCR (Millipore) according to the manufacturer’s manual,
and subsequently sequenced using at least one of the T7
and M13R primers. The quality of the obtained sequences was often very good, and the reverse sequencing reaction was performed only when there were ambiguities.
Sequencing was done with either the ABI PRISM BigDye
Terminator Cycle Sequencing Kit, visualized on an
ABI PRISM 377 Sequencer (Perkin-Elmer), or the DYEnamic
ET Terminator Cycle Sequencing Premix Kit (Amersham
Pharmacia Biotech) on a MegaBACE 1000 DNA Analysis
System (Amersham Pharmacia Biotech).
All sequences were edited using Sequencher 3.1.1
(Gene Codes Corporation). Unique substitutions in
clones from one accession were ignored and consensus
sequences were constructed to reduce the effects of putative Taq errors. If a substitution was found at the same
position in two or more clones from the same taxon, it
was considered to be a unique sequence. The number
of clones sequenced for each unique sequence is indicated on the phylogenetic trees (Figs. 3 to 8). A second
and third set of subunit-specific primers, based on the initially obtained sequences and thus
potentially Sileneae specific, were designed and used for
conventional PCR and direct sequencing. The PCR conditions for the “Sileneae specific” primers were as follows:
initial denaturation at 95 °C for 5 min followed by 35 cycles of 95 °C for 30 s, 56 to 58 °C for 1 min, and 72 °C for
2 min. The PCR ended with 10 min at 72 °C and a subsequent soak at 4 °C. Whenever there was an indication
that the PCR product was not unique, either from multiple bands visualized on the agarose gel or from double
signals on chromatograms from direct sequencing reactions, the PCR product was cloned.
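The rule for handling putative Taq errors (ignore a substitution seen in a single clone, but treat one shared by two or more clones as a distinct sequence) can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' procedure, which was carried out in Sequencher: the function name `clone_consensus`, the majority-rule consensus, and the `min_support` parameter are all hypothetical choices made here.

```python
from collections import Counter


def clone_consensus(clones: list[str], min_support: int = 2):
    """Majority-rule consensus of aligned clones from one accession.

    Returns (consensus, flagged), where `flagged` lists the 0-based columns
    at which a non-consensus base occurs in at least `min_support` clones.
    Singleton differences are treated as putative polymerase errors and
    simply disappear into the consensus.
    """
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clones must be aligned to the same length")
    consensus = []
    flagged = []
    for col, column in enumerate(zip(*clones)):
        counts = Counter(column)
        top_base, _ = counts.most_common(1)[0]
        consensus.append(top_base)
        # A minority base supported by several clones hints at a second,
        # genuinely different sequence rather than a polymerase error.
        if any(b != top_base and n >= min_support for b, n in counts.items()):
            flagged.append(col)
    return "".join(consensus), flagged


clones = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",  # singleton at column 3: ignored
    "ACGTTCGT",  # column 4 differs in two clones: flagged
    "ACGTTCGT",
]
print(clone_consensus(clones))
```

In a real data set the flagged columns would prompt splitting the clones into separate consensus sequences; the sketch only reports where that decision arises.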
Preliminary analyses indicated two copies of RPD2,
and a set of paralogue-specific forward primers was
constructed (Table 2) and used for both PCR and
sequencing. The cloning procedure used in this study has
the disadvantage of PCR-mediated recombinants being
sequenced. To detect recombinants, the criteria of Popp
and Oxelman (2001) were employed.
Alignment and Gap coding
The sequences were manually aligned using Se-Al Ver.
1.0a1 (A. Rambaut, http://evolve.zoo.ox.ac.uk). Gaps
(inferred insertions/deletions) were introduced in the sequences to keep the number of substitutions in an aligned
region to a minimum. Equal costs were assumed for gap
opening and extension versus substitutions, but lower
costs to substitutions in case of ties. Two or more gaps of
equal length inferred at the same position were assumed
to represent a homologous character, which was binary coded
as present/absent. Large autapomorphic insertions were
excluded from the analyses. These insertions varied between 50 bp (Silene bergiana, rps16) and 710 bp (Lychnis
coronaria, RPA2).
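The gap-coding rule described above (gaps of equal length at the same position, shared by two or more sequences, become one present/absent character) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' manual procedure: the names `gap_runs` and `code_gaps` are hypothetical, a gap is identified purely by its start column and length, and sequences with a different, overlapping gap are scored as absent rather than as missing data.

```python
import re


def gap_runs(seq: str) -> set[tuple[int, int]]:
    """Return (start, length) for every maximal run of '-' in an aligned sequence."""
    return {(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", seq)}


def code_gaps(alignment: dict[str, str]):
    """Binary-code gaps that occur identically in two or more sequences.

    Returns (shared, matrix): `shared` is the sorted list of coded gaps and
    `matrix` maps each taxon to a string of '1' (gap present) and '0' (absent).
    """
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    counts: dict[tuple[int, int], int] = {}
    for taxon_runs in runs.values():
        for gap in taxon_runs:
            counts[gap] = counts.get(gap, 0) + 1
    shared = sorted(gap for gap, n in counts.items() if n >= 2)
    matrix = {
        name: "".join("1" if gap in taxon_runs else "0" for gap in shared)
        for name, taxon_runs in runs.items()
    }
    return shared, matrix


alignment = {
    "taxonA": "ACGT--ACGT",
    "taxonB": "ACGT--ACGT",
    "taxonC": "ACGTTTACGT",
    "taxonD": "AC---TACGT",  # unique gap, so it is not coded
}
shared, matrix = code_gaps(alignment)
print(shared)
print(matrix)
```

The resulting 0/1 strings correspond to the extra binary characters that are appended to the nucleotide matrix.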
Phylogenetic Analyses
Maximum parsimony analyses (MP in the following text) of all six data sets were performed separately with PAUP* version 4.0b10 for Macintosh (PPC) or UNIX (Swofford, 2002), using heuristic search, random addition sequence with 100 replicates, tree bisection-reconnection (TBR) branch swapping, MULTREES option on, and DELTRAN optimization (ACCTRAN optimization may cause erroneous branch lengths in branch length tables and when printing trees due to a bug in PAUP* version 4.0b10). Maximum parsimony bootstrap analyses (MPB in the following text) were carried out with full heuristics, 1000 replicates, TBR branch swapping, MULTREES option off, and random addition sequence with four replicates. Maximum likelihood estimates of all parameters, including branch lengths, were determined on one arbitrarily chosen most parsimonious tree for each data set.
Bayesian posterior probabilities (PPs) for the nodes in the phylogenetic trees were estimated using MrBayes version 3.0B4 (Huelsenbeck and Ronquist, 2001). Each data set was analyzed with the default prior distributions and an optimal model of evolution determined by MrModeltest version 1.1b (Nylander, 2003). MrModeltest is a modified version of Modeltest (Posada and Crandall, 1998), which only considers models that can be used by MrBayes. Indels were included in the analysis and treated as binary “morphological” characters with absence of an indel coded as “0,” and presence as “1.” The MCMC chains were run for 200,000 generations. Every 100th tree was saved, resulting in 2001 trees for each data set, of which the first 501 were discarded. This strategy presumably discards the burn-in phase conservatively for likelihood scores, but there is no guarantee that this is so for the group frequencies, which are the parameters of interest here. Therefore, and because several studies have indicated high error rates on PPs for groups (e.g., Erixon et al., 2003), we chose a very high “significance level,” 0.99, for those.
Based on previous analyses (Oxelman and Lidén, 1995; Desfeux and Lejeune, 1996; Oxelman et al., 1997, 2001), Agrostemma githago was used for outgroup rooting.
In addition to separate analyses of each region, MP and MPB analyses of the combined data were performed with the same settings. Only the rps16 and ITS data sets contained a single sequence per species, so to be able to concatenate the data sets, the number of terminals was reduced in the other matrices. We used consensus sequences from sequences found to be monophyletic within species. In the cases where sequences were found to be para- or polyphyletic within species, all sequences from that species were excluded from the particular data set and instead treated as missing data. This excluded, in particular, the polyploid taxa from RPB2 and RPD2, as well as S. acaulis, S. nivalis, and S. fruticosa from RPD2a.
Congruence Assessment
PLATO Ver. 2.0 (Grassly and Holmes, 1997) was used
to test for incongruence in the combined data set. This
software uses a sliding window to find regions in a nucleotide matrix that do not fit a given model, and was
originally intended to discover possible recombination
or selection in a maximum likelihood context. Regions
with a significantly low likelihood indicate a deviation
from the a priori phylogenetic model and therefore also
possible topological incongruence. By assembling the sequence regions in an arbitrary order, we avoid restricting
ourselves to the boundaries defined by the PCR primers.
Thus, we also enable the detection of recombination
events within the individual sequenced regions. One potential drawback of this approach might be that the order
of the regions may affect the results. However, as long as
detected deviating regions do not cross region borders,
this appears to be unproblematic.
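The sliding-window idea can be illustrated with a deliberately simplified sketch. It is not PLATO's algorithm or test statistic, and it performs no significance test: it only scans fixed-width windows of per-site log-likelihoods (assumed to have been computed elsewhere under the reference model) and reports the window that fits worst relative to the remaining sites. The function name `worst_window` and the difference-of-means score are assumptions made for this example.

```python
def worst_window(site_lnl: list[float], window: int) -> tuple[int, float]:
    """Return (start, score) of the fixed-width window whose mean per-site
    log-likelihood is lowest relative to the mean of all remaining sites.

    score = mean(inside window) - mean(outside window); a strongly negative
    score marks a region that fits the reference model poorly.
    """
    n = len(site_lnl)
    if not 0 < window < n:
        raise ValueError("window must be shorter than the alignment")
    total = sum(site_lnl)
    win_sum = sum(site_lnl[:window])
    best_start, best_score = 0, None
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one site to the right.
            win_sum += site_lnl[start + window - 1] - site_lnl[start - 1]
        inside = win_sum / window
        outside = (total - win_sum) / (n - window)
        score = inside - outside
        if best_score is None or score < best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Toy data: a poorly fitting stretch of five sites inside a 25-site alignment.
site_lnl = [-2.0] * 10 + [-6.0] * 5 + [-2.0] * 10
print(worst_window(site_lnl, window=5))
```

Because the windows run across the concatenated matrix, a low-scoring window that straddles two regions would reflect the arbitrary concatenation order, which is the caveat raised in the paragraph above.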
Due to the prohibitively long computation time required for a full maximum likelihood (ML in the following text) analysis estimating all parameters, ML analyses
were performed as follows: an MP topology was obtained
with a heuristic search carried out as described above.
Initial parameter values were estimated using an arbitrary MP topology and a general time-reversible model
with substitution rate variability among sites following
a gamma distribution (GTR+Γ model), as suggested by
MrModeltest (Nylander, 2001). A complete TBR branch
swapping with fixed GTR+Γ parameter values was performed under the ML criterion with the MP topology as
starting tree. The ML topology obtained was used to reestimate the parameter values, which in turn were used to
perform TBR branch swapping of the ML topology. Finally, we performed a heuristic search with five random
sequence additions, TBR branch swapping, and GTR+Γ
using parameter values from the last iteration. The ML
topology thus obtained from the combined data sets and
the other model parameter values estimated with PAUP*
constituted the input for PLATO.
TABLE 3. Matrix and tree statistics.

                              rps16         ITS           RPA2          RPB2          RPD2          Combined
Terminals                     29            29            29            33            60            29
Included characters           933           533           883           728           1070          5033
Number/% PIC a                119/13        157/29        220/25        229/31        366/34        962/19
Number/lengths of MP trees    691/378       6/625         528/620       3836/624      1296/1118     2/3240
CI/RI                         0.722/0.738   0.536/0.638   0.823/0.836   0.768/0.765   0.719/0.886   0.739/0.718

a Percentage parsimony informative characters.
Another limitation with PLATO (and many other
model-based congruence tests) is that it is not possible to
distinguish between the topology parameter and other
parameters such as the shape of the gamma distribution, base frequencies, or the substitution rates. In other
words, it is not possible to discern whether data have
evolved under a different topology or if other parameters are causing the potential anomalies detected. We
used a nonparametric bootstrap approach to isolate the
topology parameter from the rest of the parameters in
the model and test if all the data sets evolved under the
same topology.
Let $X_{\mathrm{comb}}$ be the combined data set excluding a deviating data set, and $X_{\mathrm{dev}}$ be the deviating data set. Further,
let $T_{\mathrm{comb}}$ be the ML topology for the combined data set
excluding the deviating data set, and $T_{\mathrm{dev}}$ be the ML
topology for the deviating data set. $X^{i}_{\mathrm{dev}}$ denotes the $i$th
pseudoreplicate of $X_{\mathrm{dev}}$ obtained by nonparametric bootstrapping. The free parameters (all but topology) in the
model are denoted $\theta$ and the negative log likelihood $-\ln L$.
We generated 100 pseudoreplicates of $X_{\mathrm{dev}}$, and $-\ln L(X^{i}_{\mathrm{dev}} \mid T_{\mathrm{dev}})$ was calculated, reestimating $\theta$ for each pseudoreplicate, thus generating a null distribution. Then $-\ln L(X_{\mathrm{dev}} \mid T_{\mathrm{comb}})$ was calculated, reestimating $\theta$, and subsequently ranked in the null distribution. The null hypothesis was rejected if $-\ln L(X_{\mathrm{dev}} \mid T_{\mathrm{comb}})$ was extreme
at some level of significance.
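The resampling logic of this test can be sketched in a few lines. The sketch below is a schematic under explicit assumptions, not the authors' implementation: `neg_lnl` is a hypothetical callback standing in for an external ML program that re-estimates the free parameters on a fixed topology and returns the negative log likelihood, topologies are passed as opaque strings, and the reported value is simply the fraction of null scores at least as large as the observed one. The toy scoring function at the end is a crude stand-in used only to make the example runnable.

```python
import random
from typing import Callable, Sequence


def bootstrap_topology_test(
    columns: Sequence[str],
    neg_lnl: Callable[[Sequence[str], str], float],
    t_dev: str,
    t_comb: str,
    n_reps: int = 100,
    seed: int = 0,
) -> float:
    """Rank -ln L(X_dev | T_comb) in a bootstrap null distribution.

    The null distribution holds -ln L(X^i_dev | T_dev) for `n_reps`
    pseudoreplicates made by resampling alignment columns with replacement.
    Returns the fraction of null scores >= the observed score.
    """
    rng = random.Random(seed)
    n = len(columns)
    null = []
    for _ in range(n_reps):
        replicate = [columns[rng.randrange(n)] for _ in range(n)]
        null.append(neg_lnl(replicate, t_dev))
    observed = neg_lnl(columns, t_comb)
    return sum(score >= observed for score in null) / n_reps


def toy_score(cols: Sequence[str], topo: str) -> float:
    """Stand-in for a likelihood call: counts columns in which the two taxa
    named by `topo` (e.g. "01") differ. Not a real likelihood."""
    i, j = int(topo[0]), int(topo[1])
    return float(sum(c[i] != c[j] for c in cols))


# Three taxa per column; taxa 0 and 1 always agree, taxon 2 always differs.
columns = ["AAG", "CCT", "GGA", "TTC"] * 5
print(bootstrap_topology_test(columns, toy_score, t_dev="01", t_comb="02"))
print(bootstrap_topology_test(columns, toy_score, t_dev="01", t_comb="01"))
```

A small returned fraction means the deviating data fit the combined-data topology worse than nearly all pseudoreplicates fit their own best topology, which is the sense in which the observed value is "extreme".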
As noted by Goldman et al. (2000), the selection of a
ML topology a posteriori (in our case, the deviating data
ML topology) obscures the statistical interpretation of
the obtained probability value. First, the test must obviously be one-sided, because the ML topology has higher
likelihood than any other tree. Secondly, the probability
must be corrected for all other possible tree topologies,
as is done by some implementations of the Shimodaira-Hasegawa test (Goldman et al., 2000). This, however,
severely reduces the power of the test. Parametric tests,
such as those devised by Goldman et al. (2000), are much
more sensitive, but also much more dependent on adequate models. Therefore, we refrain from making strict
probabilistic conclusions from our tests, but rather use
them to evaluate the relative topological incongruence
from the data sets identified by PLATO as favoring significantly different models.
RESULTS
Table 3 summarizes the number of terminals, included characters, parsimony informative characters,
percentage parsimony informative characters, number
and lengths of MP trees, consistency index (CI), and retention index (RI) for the different DNA regions. The ML
estimates of model parameter values for each data set
and the combined data set are presented in Table 4. MPB
percentages and posterior probabilities for groups in the
tree from the combined data set (see Fig. 8) and comparable groups in the individual data sets are presented in
Table 5.
rps16 and ITS
Both the rps16 and ITS data sets support the monophyly of Atocion, Lychnis, and Eudianthe (Figs. 3 and 4).
Although Silene was recovered in a strict consensus of
the most parsimonious trees from the ITS data (Fig. 4),
neither the rps16 nor the ITS data sets have MPB percentages above 50 for a monophyletic Silene. There were
TABLE 4. Maximum likelihood estimates of separate and combined data sets under the general time reversible + gamma (GTR+Γ) model
and maximum parsimony topologies.

Data partition       rps16        ITS          RPA2         RPB2         RPD2         Combined
−ln L                3205.6218    3735.0188    4458.4968    4040.7808    7538.5282    23030.0220
Base frequencies
  A                  0.3543       0.1906       0.2585       0.2576       0.2580       0.2728
  C                  0.1321       0.2855       0.1742       0.1559       0.1799       0.1801
  G                  0.1631       0.2946       0.1867       0.2171       0.1744       0.1989
  T                  0.3505       0.2294       0.3807       0.3694       0.3876       0.3482
Relative nucleotide substitution rates
  AC                 1.1098       1.1337       1.1998       0.5638       0.7559       0.8804
  AG                 1.0683       2.6794       3.3094       1.9755       2.4075       2.0159
  AT                 0.3726       2.5900       1.0250       0.8088       0.7235       0.8003
  CG                 0.3289       0.3088       1.1390       0.9604       1.3284       0.7877
  CT                 1.3763       5.6042       2.4894       2.0557       2.0409       2.3557
Gamma shape (α)      0.4228       0.3581       1.8761       1.3218       3.0581       0.7585
TABLE 5. Summary of MPB percentages/posterior probabilities for nodes in the combined data tree (Fig. 8), and comparable nodes in the
individual data sets. Negative numbers refer to conflicting nodes, numbers in italics indicate groups that are not found in all most parsimonious
trees. Incongruencies that are considered “hard” are indicated in bold.

Node   Combined   rps16        ITS     RPA2    RPB2    RPD2a   RPD2b
1      99/1       75/.77       96      n/a     100     100     100
2      100/1      100/1        87      100     n/a     n/a     n/a
3      100/1      87/1         98      100     50      n/a     99
4      90/1       54/.92       <50     83      <50     80      <50
5      98/1       −55/−.58     <50     90      100     −62     n/a
6      100/1      100/1        100     100     100     100     n/a
7      85/.99     <50/<.50     <50     <50     <50     95      <50
8      100/1      99/1         100     100     <50     100     100
9      71/.63     −82/−.93     −56     86      <50     91      <50
10     100/1      88/1         <50     98      100     100     86
11     100/1      96/1         63      n/a     100     100     100
12     95/1       <50/<.50     <50     −67     <50     98      <50
13     100/1      −50/−.70     64      81      <50     <50     n/a
14     73/.63     −50/−.70     62      −81     −100    −94     n/a
15     100/1      97/1         93      96      <50     100     n/a
16     53/.98     −75/−.76     −86     100     −93     n/a     n/a
17     100/1      100/1        99      99      −100    100     n/a
18     100/1      90/1         <50     98      <50     −<50    <50
19     79/1       <50/<.50     <50     <50     <50     <50     <50
20     86/1       −<50/<.50    −<50    −66     <50     <50     <50
21     100/1      −53/<.50     73      n/a     <50     89      61
22     100/1      100/1        98      <50     99      100     100
23     96/1       91/1         <50     78      −86     68      64
24     96/1       <50/<.50     95      <50     <50     −56     73
25     96/.96     <50/<.50     69      <50     n/a     <50     n/a
two main groups within Silene that are consistently recovered in the rps16/ITS analyses (Oxelman and Lidén,
1995; Oxelman et al., 1997, 2001). In this study, one of
these two groups was represented by Silene nocturna, S.
bergiana, S. schafta, S. fruticosa, S. acaulis, and S. nivalis
and will in the following text be referred to as Silene subgenus Silene, whereas the other group (the rest of the Silene species) will be referred to as Silene subgenus Behen
(Moench) Bunge. No well-supported topological incongruences were detected between the previous analysis of
the combined ITS-rps16 data sets (Oxelman et al., 2001)
and the separate analyses performed here.
RNA Polymerase Introns
Using either low-stringency PCR conditions with degenerate primers and a nested PCR approach, or direct
PCR with Sileneae-specific primers, we amplified at least
one fragment from three of the four RNAP regions (RPA2,
RPB2, and RPD2) in all taxa. RPC2 was excluded from
further study after the initial sequences were produced
(see results below). Direct sequencing of PCR fragments
produced with the Sileneae-specific primers resulted in
clean, unambiguous sequences in most cases. The 5′ and
3′ ends of the sequenced regions had a varying, but relatively high, degree of similarity to the corresponding
regions of Arabidopsis thaliana (Fig. 1), and intron positions from Sileneae sequences were inferred to be identical to intron positions in A. thaliana. Several taxa were
polymorphic for one or more amplified regions and it
was necessary to clone the PCR products to obtain readable sequences. Some of the taxa contained only monophyletic groups of sequences for a given region, whereas
other taxa contained two or more nonmonophyletic sequences (Figs. 5 to 7).
RPA2.—Fourteen synonymous substitutions were
found in the ca. 80 bp long sequenced region corresponding to the 3′ end of exon 23 in A. thaliana (Fig. 1, GenBank AC008030). A total of 56 substitutions, of which
15 were nonsynonymous within Sileneae, were found
in the ca. 190 bp long region corresponding to exon
24 in A. thaliana. Furthermore, Agrostemma githago had
one extra glutamic acid, whereas Eudianthe laeta appears
to have lost a valine in exon 24. No stop codons were
identified in either of the two sequenced exon regions
in RPA2. Intron size for most taxa varied between 460
and 550 bp, with extremes found in A. githago (300 bp)
and Lychnis coronaria (1185 bp), compared to 161 bp in
A. thaliana.
The RPA2 sequences from A. githago, S. linnaeana, and
S. parishii were unreadable when sequenced directly.
Cloning revealed two different, though monophyletic,
sequences in all three taxa (Fig. 5). Most groups with
MPB percentages above 50 in the RPA2 phylogeny were
congruent with the previous analysis of the combined
ITS and rps16 data sets (Oxelman et al., 2001).
RPB2.—The RPB2 intron in Sileneae varied between
462 bp (S. keiskei) and 519 bp (A. githago). The 5′ and 3′
regions corresponded to exons 23 and 24, respectively, in
A. thaliana (Fig. 1). Only synonymous substitutions were
found in the ca. 70 (13 substitutions) and 50 (4 substitutions) bp long sequenced regions of exons 23 and 24,
respectively, and one nonsynonymous substitution between Sileneae and A. thaliana in exon 24.
FIGURE 3. One of 691 most parsimonious trees from the analysis of rps16. Branch lengths are proportional to the inferred number of
substitutions per site under the GTR+Γ model. Numbers above branches indicate parsimony bootstrap percentages over 50; numbers below
branches represent posterior probabilities.
FIGURE 4. One of six most parsimonious trees from the analysis of ITS. Branch lengths are proportional to the inferred number of substitutions
per site under the GTR+Γ model. Numbers above branches indicate parsimony bootstrap percentages above 50; numbers below branches
represent posterior probabilities. Nodes without numbers have bootstrap percentages <50. Dotted branches collapse in the strict consensus tree.
FIGURE 5. One of 528 most parsimonious trees from the analysis of RPA2. Branch lengths are proportional to the inferred number of
substitutions per site under the GTR+Γ model. Numbers associated with taxon names refer to number of clones sequenced. The number is
followed by an asterisk if the PCR product was obtained by nested PCR and degenerated primers. Numbers above branches indicate parsimony
bootstrap percentages; numbers below branches represent posterior probabilities.
FIGURE 6. One of 3836 most parsimonious trees from the analysis of RPB2. Branch lengths are proportional to the inferred number of
substitutions per site under the GTR+Γ model. Numbers associated with taxon names refer to number of clones sequenced. The number is
followed by an asterisk if the PCR product was obtained by nested PCR and degenerated primers. Numbers above branches indicate parsimony
bootstrap percentages; numbers below branches represent posterior probabilities.
The phylogenetic analyses of the RPB2 sequences
resulted in a basally poorly resolved tree (Fig. 6).
Agrostemma githago, E. coeli-rosa, and L. flos-cuculi were
polymorphic, and cloning revealed two different, though
monophyletic, sequences in each of the three taxa. Three
other taxa, S. keiskei, S. rotundifolia, and S. parishii were
also cloned due to polymorphisms. These sequences
did not form monophyletic groups within species. The
monophyly of Atocion was strongly supported (MPB percentage 100), whereas neither Silene nor Lychnis was resolved as monophyletic in the MPB analysis. Lychnis chalcedonica was not found in either of the two Lychnis groups
(both with MPB percentage 100), and Silene itself consisted of two well-supported (MPB percentages 99 and
100, respectively) clades and a few “stray species” in a
polytomy with the rest of the ingroup.
RPC2.—Only a few RPC2 sequences (Table 1) were
produced because a 35- to 45-bp AC/T repeat close to
the 3′ end caused sequencing problems. The length of
the fragment, 1300 to 1400 bp, would have made it necessary to run at least four separate sequencing reactions
to sequence the entire fragment in both directions. The
forward primer site C2F is located at the very 3′ end of
exon 31 of the A. thaliana sequence (GenBank AB012240).
The ca. 95-bp exon sequence corresponding to exon 36
in A. thaliana had only synonymous substitutions in the
taxa investigated in this study. No RPC2 sequences were
included in the analyses.
RPD2.—The partial exon sequences in Sileneae correspond to the 3′ end of exon 6 (Fig. 1, Table 5) in the
two paralogues found in A. thaliana (GenBank AB020749
and AP000377). The Sileneae-specific RPD2RP reverse
primer is located one nucleotide position downstream
of the 5′ end of exon 7. Two or more copies of RPD2
were found in most species in Sileneae. The phylogenetic analysis showed that there were two groups, RPD2a
and RPD2b, of paralogous RPD2 sequences (Fig. 7). Only
RPD2a sequences were found in E. coeli-rosa, Petrocoptis
pyrenaica, and in the subgenus Silene clade, whereas only
the RPD2b copy was found in Viscaria vulgaris. A single
sequence was found in A. githago. The RPD2a and RPD2b
sequences are readily alignable over most of the area in
Sileneae, whereas they could not be reliably aligned with
the Arabidopsis sequences. The length of the intron sequences varied between 227 bp (S. fruticosa) and 1123 bp
(L. flos-cuculi), with most sequences being ca. 700 to 750
bp long.
In all taxa belonging to the subgenus Silene clade,
two or more discrete sequences belonging to the RPD2a
group were found after cloning. Most of these sequences
did not form monophyletic groups within species (Fig.
7). Three sequences were found in S. fruticosa. Two of
them were very short, 296 and 328 bp, respectively. The
third sequence was 624 bp, a more “normal” length. The
two short sequences had a large deletion from the end of
the forward PCR primer (RPD2FP) to approximately half
the intron. These sequences appeared in multiple independent PCR reactions, indicating that they might represent pseudogenes. Although S. nocturna was successfully
sequenced directly without cloning, it contained 12
polymorphic sites. This sequence was sister to one of
the S. bergiana sequences, with the second S. bergiana sequence as sister to this clade (MPB percentage 100). The
S. nocturna polymorphisms did not correspond to the disagreements between the two S. bergiana sequences. The
two sequences from S. schafta formed a monophyletic
group in a trichotomy with the rest of the taxa in subgenus Silene.
The two RPD2b sequences found in S. parishii were not
monophyletic (Fig. 7), but sisters to S. nigrescens and S.
rotundifolia, respectively. The latter S. parishii sequence
lacks the two last amino acids in exon 6 and has one
amino acid substitution (a leucine for a proline). There
was alignment ambiguity in the beginning of the intron,
and a substantial proportion of the conserved splice region is missing. Thus, it seems likely that this sequence
is a pseudogene.
All included genera were strongly supported as monophyletic groups with MPB percentages of 98 or 100 by the
RPD2a sequences. The results from the previous analysis of ITS and rps16 (Oxelman et al., 2001) and the well-supported RPD2a topology (Fig. 7) were congruent at the
generic level. The RPD2b clade was less well resolved
than the RPD2a clade and there was no MPB support
above 50 for a monophyletic Silene. Both Lychnis and Atocion had MPB percentages of 100, though, and the topology was largely congruent with the ITS and rps16 data.
Analysis of the combined data sets.—PLATO identified
part of ITS (464 aligned bp, Z = 26.8), RPA2 (398 aligned
bp, Z = 6.2), and RPB2 (459 aligned bp, Z = 15.3) as having significantly lower likelihoods with the ML topology for the combined datasets and the GTR+ model
with parameter values estimated in PAUP∗ . These regions contains 97%, 66%, and 92% of the parsimony informative characters in the ITS, RPA2, and RPB2 data sets,
respectively. The likelihood for the ITS data set evolving
under the ML topology of the combined data set excluding ITS was ranked as number 72 among the 100 bootstrapped ITS data sets (i.e., P = 0.28). The corresponding P-values were 0.30 and 0.12 for the evolution of the
RPA2 and RPB2 data sets under ML topologies inferred,
excluding RPA2 and RPB2, respectively, from the combined data set. Thus, no strong topological incongruence
was detected, and all data sets were included in the combined analysis.
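The likelihood ranking can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the P value is taken as the fraction of replicate log-likelihoods that fall below the observed score. The function name `rank_p_value` and all log-likelihood values are hypothetical.

```python
def rank_p_value(observed_lnl, replicate_lnls):
    """Fraction of replicate log-likelihoods that are lower than the observed one."""
    if not replicate_lnls:
        raise ValueError("need at least one replicate score")
    n_lower = sum(1 for lnl in replicate_lnls if lnl < observed_lnl)
    return n_lower / len(replicate_lnls)


# Hypothetical scores: 100 replicates at -5000, -5001, ..., -5099.
replicates = [-5000.0 - i for i in range(100)]
observed = -5071.5  # higher than the 28 lowest replicate scores

print(rank_p_value(observed, replicates))  # 0.28
```

Under this reading, an observed score that exceeds 28 of 100 replicate scores yields P = 0.28, the value reported for ITS; a small P would indicate that the alternative topology fits the partition unusually poorly.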
The parsimony bootstrap analysis of the combined
data sets resulted in a well-resolved topology where all
genera have MPB percentages of 95 or higher (Fig. 8).
Lychnis and Silene were resolved as sister to Atocion, Viscaria, Eudianthe, and Petrocoptis. A previously unresolved
sister-group relationship (e.g., Oxelman et al., 2001) between Eudianthe and Petrocoptis was also found in the
latter clade. The results are compatible with previous results from Oxelman et al. (2001).
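For readers unfamiliar with the resampling behind the MPB percentages, a minimal sketch of one nonparametric bootstrap replicate is given here. It only resamples alignment columns with replacement; the tree search on each replicate and the tallying of clade frequencies are not shown. The function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment maps sequence names to aligned sequences of equal length;
    the replicate has the same names and the same number of columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Invented toy alignment, four columns wide.
toy_alignment = {"taxon_a": "ACGT", "taxon_b": "ACGA", "taxon_c": "TCGA"}
replicate = bootstrap_alignment(toy_alignment, random.Random(1))
print(replicate)
```

The bootstrap percentage of a clade is then the share of replicates whose inferred tree contains that clade.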
DISCUSSION
We discuss the results from each RNAP intron separately. Results from more extensive phylogenetic analyses of rps16 and ITS sequences in Sileneae
have been discussed thoroughly elsewhere (Oxelman
and Liden, 1995; Oxelman et al., 1997, 2001), and we will
only discuss results that deviate from these analyses. Finally, we will discuss the combinability of the different
sequence regions and the general utility of the RNAP
strategy proposed in this paper.
FIGURE 7. One of 1296 most parsimonious trees from the analysis of RPD2. Branch lengths are proportional to the inferred number of
substitutions per site under the GTR+Γ model. Numbers associated with taxon names refer to number of clones sequenced. The number is
followed by an asterisk if the PCR product was obtained by nested PCR and degenerated primers. Numbers above branches indicate parsimony
bootstrap percentages; numbers below branches represent posterior probabilities.
FIGURE 8. One of two most parsimonious trees from the analysis of the combined dataset. Branch lengths are proportional to the inferred number of substitutions per site under the GTR+Γ model. Numbers above branches indicate parsimony bootstrap percentages;
numbers below branches represent posterior probabilities. Numbers to the right of branching points represent the branches presented in
Table 5.
RPA2
Although strong bootstrap support for monophyly of
the genus Silene is found only in one of the RPD2 paralogues (Fig. 7), the sister-group relationship between
the Silene clade and the rest of the ingroup shown in
the RPA2 phylogeny (Fig. 5) is somewhat unexpected
when compared to previous studies (Oxelman and Liden, 1995; Oxelman et al., 1997, 2001) as well as the rest
of the results in this study. However, the incongruence
is poorly resolved, and might be due to stochastic effects
together with lack of information. Nevertheless, we find
it instructive to examine other explanations to this putatively incongruent gene phylogeny in some detail.
Wendel and Doyle (1998) list a number of biological
phenomena that can lead to incongruence, such as orthology/paralogy conflation, lineage sorting, rate heterogeneity among taxa, hybridization/introgression, and
short internal branches. One of the aims of this study
is to minimize the risk of orthology/paralogy conflation by using degenerated primers and low-stringency
PCR conditions in combination with cloning to amplify
and sequence all possible paralogues. RPA2 polymorphisms were found in three taxa: S. parishii, S. linnaeana,
and the outgroup Agrostemma githago, but because they
form monophyletic groups within species, the polymorphisms may be explained either by allelic variation or autapomorphic gene duplications. No traces of an ancient
gene duplication or polymorphic RPA2 gene pool were
detected, and we conclude that there is no support for a
hypothesis involving orthology/paralogy conflation.
Lineage sorting (e.g., Pamilo and Nei, 1988), and failure of alleles to coalesce within a species lineage, is very
difficult to distinguish from orthology/paralogy conflation. It is, however, unlikely that a polymorphic RPA2
allele pool has been maintained during a time span long
enough for fixation of one allele and loss of the other in
the Silene clade, while the opposite allele is fixed and lost,
respectively, in all other lineages investigated here.
It is likely that a hybridization event would leave traces
in more than one nuclear gene (e.g., Cronn et al., 1999).
The topological pattern found in RPA2 is not found in any
of the other four nuclear DNA regions, nor did we find
any pattern of strong incongruence between the maternally inherited (Corriveau and Coleman, 1988) cpDNA
and the nuclear DNA, as is sometimes seen in hybrids
if evolutionary processes homogenize the paralogues
(e.g., Brochmann et al., 1996). A hybridization event is
therefore not supported as an explanation to the apparent incongruence.
In phylogenetic analyses using inconsistent models,
rate heterogeneity among taxa may confound phylogeny
estimations as a result of parallel substitution in faster
evolving taxa, i.e., “long branch attraction” (Felsenstein,
1978), and therefore cause incongruence between different data partitions. A solution to this may be a denser
taxon sampling to break up the long branches. The sampling of taxa from the Silene clade is rather scattered,
and perhaps a denser sampling would result in a topology more in line with results from the analyses of the
other datasets. A second solution is to use a phylogenetic method more robust to rate heterogeneity. Given
a reasonable model and enough data, maximum likelihood is often suggested to be more robust than parsimony (Felsenstein, 1973). Analyzing the RPA2 data with
the maximum likelihood method as implemented in PAUP*,
using the HKY+Γ model suggested by MrModeltest
1.1b (Nylander, 2001), five random additions with TBR
branch swapping, and estimating all parameters from
the data resulted in a topology (not shown) that was basically the same as when analyzed with parsimony. Thus,
it must either be that the inferred RPA2 phylogeny is not
confused by long branch attraction, or the model is not
robust to deviations in the data. The conclusion is that
there is no positive evidence for long branch attraction
as an explanation to the observed pattern in RPA2.
In the case of short internal branches, i.e., if lineage
splits are common relative to the substitution rate, a
relatively high degree of random variation is expected
in the data, and a phylogenetic analysis would result
in a poorly resolved tree. Separate analyses of several
putatively independent DNA regions are predicted to
have weakly supported “soft incongruences” (Wendel
and Doyle, 1998) due to these phenomena, and the incongruences are expected to vanish when more data are
added. Most of the MP topologies from the analyses of
the separate data sets are poorly resolved and poorly
supported by MPB to a varying degree. In the light of
this, plus the fact that the MPB support of subgenus Silene as sister to the rest of the ingroup was low (two nodes
with MPB percentages of <50 and 67, respectively), we
draw the conclusion that the putative incongruence is a
result of random variation as an effect of short internal
branches. This hypothesis is not rejected by the likelihood score rank test discussed below.
RPB2
Several taxa in RPB2 had to be cloned due to polymorphisms making the sequences unreadable when the PCR
products were sequenced directly. If sequences within a
species were monophyletic (Fig. 6), the polymorphisms
were assumed to be caused by divergent alleles in heterozygous individuals, or autapomorphic gene duplications. The sequences from S. keiskei, S. rotundifolia, and
S. parishii did not form monophyletic groups within
species, but nonmonophyly did not receive strong support. A weakly supported monophyletic group consisting of one sequence from S. parishii and one from S. keiskei
is found together with the other sequences in a polytomy. Silene zawadskii and S. nigrescens are resolved as
sister group to these. The pattern is not too surprising
as all three former taxa are tetraploids. The variation in
the sequenced part of RPB2, however, is not enough to
resolve the internal relationships of the paralogues. This
group, also including S. linnaeana (MPB percentage of 99),
is one of two well-supported groups found within Silene
in the bootstrap analysis; the other group was within
the subgenus Silene clade. Contrary to previous analyses
(Oxelman and Liden, 1995; Oxelman et al., 1997, 2001)
and the RPA2 and RPD2 trees (Figs. 5 and 7), S. bergiana
is not resolved as sister to S. nocturna. Silene bergiana is
found to be sister to the rest of the clade in the strict consensus from the MP trees, but the MPB percentage for
this relationship was less than 50. There is no MPB support above 50 from RPB2 data either for the otherwise
well-supported Lychnis clade or for the Atocion/Viscaria
clade.
RPD2
By comparing the short sequenced regions of exon 6
(Table 6) in the RPD2 paralogues found in Sileneae and
Arabidopsis thaliana, it is clear that the paralogues in Sileneae are more closely related to each other than to either
of the two paralogues in A. thaliana.
Despite cloning of the PCR products and the use of
paralogue-specific primers, only a single sequence was
found in Agrostemma githago, the outgroup in this study.
Several of the indels and substitutions diagnostic for either of the paralogues are found in the single sequence
from A. githago. This indicates that the duplication occurred in the branch leading to the ingroup, or the sequence may be a result of incomplete concerted evolution (see below). However, the orthology of this sequence,
as well as the duplication event, cannot be determined
until other alignable outgroup sequences are added.
A single gene duplication is not enough to explain
the RPD2 gene phylogeny (Fig. 7). If one accepts only
branches with MPB >95%, at least two more gene duplications and two losses have to be inferred in the subgenus Silene clade. Firstly, one duplication and one loss
in the lineage leading to the clade consisting of S. acaulis
and S. fruticosa, and secondly, one duplication and one
loss in the lineage leading to S. bergiana and S. nocturna
(Fig. 7). As there are numerous species that are morphologically closer to either of these two species,
it seems unlikely that S. nocturna would be derived from
within S. bergiana. Also S. schafta contains two sequences,
and although the two sequences were fairly divergent,
it may be explained by heterozygosity. If one accepts all
branches in the strict consensus tree (Fig. 7), several additional duplications and losses have to be inferred. All
six species mentioned above are diploid, and therefore
only one or two sequences are expected to be found from
a single locus. The bootstrap support is low for most of
the nodes, and part of the pattern could be explained by
lineage sorting in recently diverged lineages. However,
in S. nivalis and S. fruticosa, three sequences were found
in each specimen, and lineage sorting alone cannot explain the pattern. No RPD2b paralogues were recovered
in any of the sampled specimens from the subgenus Silene despite extensive cloning and the use of paralogue-specific primers, implying at least one more paralogue
extinction.
Due to the perfect correspondence between multiple
copies of RPD2a paralogues and the complete loss of
the RPD2b paralogues in subgenus Silene, an alternative explanation to the observed pattern is incomplete
concerted evolution. As a result of incomplete concerted
evolution between RPD2a and RPD2b, one would expect mosaic sequences in case of reciprocal recombination, or more homogeneous sequences if gene conversion
is operating (Wendel and Doyle, 1998). Incomplete concerted evolution has been suggested to occur in small
nuclear gene families such as Adh in Gossypium (Millar
and Dennis, 1996), PgiC in Clarkia (Gottlieb and Ford,
1996), and glutamine synthetase in Pisum (Walker et al.,
1995). No obvious mosaic sequences were found in the
subgenus Silene RPD2 data set. Thus, there is no support for reciprocal recombination as the evolutionary
process. Incomplete concerted evolution by gene conversion is a simpler explanation than a number of independent duplications and losses, and therefore a preferred
hypothesis.
Combined Analysis and Comparisons of the Data Sets
Both the rps16 intron and ITS data sets show high
among-site rate variation (α = 0.42 and 0.35, respectively;
Table 4). The high among-site rate variation is likely correlated to constraints imposed by the secondary structure
found in the rps16 group II intron (e.g., Kelchner, 2002)
and ITS (e.g., Baldwin et al., 1995). The RNAP introns, on
the other hand, have low among-site rate variation (α =
1.88, 1.32, and 3.06 for RPA2, RPB2, and RPD2, respectively; Table 4) and consequently seem to be free from
such constraints to a much greater extent.
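The relationship between the gamma shape parameter and the amount of among-site rate variation can be made explicit with a short calculation. The sketch assumes the usual parameterization in which site rates follow a gamma distribution with mean 1 (shape α, scale 1/α), so that the variance of the rates is 1/α and the coefficient of variation is 1/√α. The α values are those quoted in the text; the function names are hypothetical.

```python
import math

# Gamma shape parameters as quoted in the text (Table 4).
ALPHAS = {"rps16": 0.42, "ITS": 0.35, "RPA2": 1.88, "RPB2": 1.32, "RPD2": 3.06}


def rate_variance(alpha):
    """Variance of site rates for a mean-1 gamma distribution with shape alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / alpha


def rate_cv(alpha):
    """Coefficient of variation of site rates (standard deviation / mean)."""
    return math.sqrt(rate_variance(alpha))


# Partitions listed from most to least rate heterogeneity.
for name, alpha in sorted(ALPHAS.items(), key=lambda item: item[1]):
    print(f"{name:6s} alpha={alpha:.2f} variance={rate_variance(alpha):.2f} CV={rate_cv(alpha):.2f}")
```

A smaller α therefore means a larger spread of rates among sites, which is why the rps16 and ITS values indicate stronger rate heterogeneity than the values for the RNAP introns.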
TABLE 6. Amino acid alignment of the 3′ end of exon 6 in RPD2.
Sileneae RPD2a    GKGIAC------GG-------T-L-RYATPFSTPSVESITEQLH
Sileneae RPD2b    GKGIAC------GG-------T-L-RYATPFSTPSVESITEQLH
A. thaliana 1     SKGIACPIQKKEGSSAAYTKLTRHATPFSTPGVTEITEQLH
A. thaliana 2     SKGIACPIQK-EGSSAAYTKLTRHATPFSTPGVTEITEQLH
PLATO detected several regions where the ML topology had significantly low likelihood to explain the combined data set. These differences in likelihood score indicate deviations from the model including the topology
parameter and/or other model parameter values (ML estimates from the combined data set) supplied as input to
PLATO (Grassly and Holmes, 1997). Using PLATO, it is
not possible to discern whether data have evolved under
a different topology or if other parameters are causing the
anomalies. Are the observed differences caused by different evolutionary histories, or are they just an effect of
stochastic variation? To answer that question, it would
be desirable to test whether or not the ML topologies
inferred from the deviating regions (i.e., ITS, RPA2, and
RPB2, respectively) discovered with PLATO are different from the ML topologies inferred from the rest of the
combined data (excluding, in turn, ITS, RPA2, and RPB2)
at some level of significance. Typically, different implementations of parametric bootstrapping (Huelsenbeck
and Bull, 1996) are formulated to test whether the maximum likelihood topology or an alternative topology is
true (Goldman et al., 2000). However, because we are interested in whether the ML topology differs significantly
from an alternative topology, this is not the appropriate
question to ask in our case. Parametric tests of topologies
are highly sensitive to model misspecification (Buckley,
2002). The nonparametric tests of Kishino and Hasegawa
(KH test) (Hasegawa and Kishino, 1989; Kishino and
Hasegawa, 1989) and Shimodaira-Hasegawa (SH test)
(Shimodaira and Hasegawa, 1999) have been used for
pairwise (KH test) or multiple (SH test) tests of topologies. Goldman et al. (2000) showed that the KH test is
not appropriate if one of the compared topologies is also
the ML topology chosen a posteriori. If only two topologies are compared using the SH test, this test reduces to
the KH test (Shimodaira and Hasegawa, 1999). Because
one of the two topologies we want to compare is always
the ML topology in this study, the SH test is also improper to use. Therefore, we do not use the KH test per se,
but rather measure how the ML topology from one data
set (combined data excluding a deviating data partition,
e.g., ITS) fits into the likelihood distribution of another
data set (e.g., ITS), which has a different ML topology.
Although it is difficult to define a relevant null hypothesis because the ML topology is selected a posteriori, the
relative size of the obtained P values gives an indication
of the relative impact of the topology parameter on each
data partition.
PLATO detected the strongest deviations from the
model in the ITS data. This contrasts with the low impact of
the topology parameter (P = 0.28) on the ITS data. The
only incongruence between the two topologies reasonably well supported by MPB is the internal relationship
of the group consisting of S. acaulis, S. nivalis, and S. fruticosa, where a sister-group relationship between S. acaulis
and S. nivalis is supported by ITS (MPB 86%; Fig. 4), but
contradicted by MPB analysis of the combined data excluding ITS, where a sister-group relationship between
S. nivalis and S. fruticosa is supported instead (MPB 69%;
data not shown). The ML estimates of the parameter values show that ITS deviates in relative nucleotide
substitution rates and is also slightly GC biased compared to the other partitions (and also the values in the
model supplied to PLATO), which are AT biased (Table
4). The gamma parameter included in the model makes
PLATO less sensitive to deviations in relative rates of nucleotide substitutions to some extent, whereas PLATO
will detect significant differences in base composition
(Grassly and Holmes, 1997). It therefore seems plausible that the model deviation stems from parameters other
than topology in this case, and we conclude that the deviation observed with PLATO is due to the base composition in ITS.
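A difference in base composition of the kind discussed here can be checked directly on the sequences. The sketch below computes the GC fraction of a sequence while ignoring gaps and ambiguity codes; the two toy sequences are invented and only illustrate a GC-biased versus an AT-biased partition.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T); gaps and N are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Invented toy sequences standing in for a GC-biased and an AT-biased partition.
gc_rich = "GGCCGCAT-GCN"
at_rich = "ATTTAAGCATTA"

print(round(gc_content(gc_rich), 2), round(gc_content(at_rich), 2))
```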
The apparent incongruence detected in RPA2 and
RPB2 cannot be explained by deviating rates of substitutions and/or base composition. The parameter values
are in both datasets close to the values in the model used
with PLATO (Table 4). Some topological incongruence
was detected in both datasets. Both the position of subgenus Silene (two nodes with MPB <50% and 67%, respectively; Fig. 5) and the sister-group relationship of
S. nivalis and S. fruticosa (MPB 100%) inferred from the
RPA2 data were incongruent with the ML topology from
combined data excluding RPA2. In RPB2, S. linnaeana was
resolved at a slightly different position, but with poor
MPB support (63% and 67%, respectively). The incongruences are not reflected in the likelihood ranking of the
ML topologies (P = 0.30 and 0.12 excluding RPA2 and
RPB2, respectively). It is notable, however, that many of
the internal branches in the RPB2 topology are relatively
short (Fig. 6). The branch lengths constitute a large fraction of the free parameters, and are dependent on the
topology. One may argue that it is difficult to distinguish
between topological differences and branch length differences, but we find the lack of strong differences in
branching order and other parameters as indicative that
it is the branch lengths themselves that are deviating.
The biological explanation for this is obscure, but a reasonable hypothesis is that it is a random effect of rapid
diversification of the group.
Based on the ML ranking, we cannot reject the null hypothesis that all our data have evolved under the same
topology, and we therefore choose to combine all data in
a MPB analysis. The analysis of the combined data set
resolved the previously poorly known generic relationships within Sileneae (Fig. 8). A denser taxon sampling
is needed to infer the relationships within subgenus Silene, but our analysis supports a hypothesis of a monophyletic genus Silene. To resolve the relationships of the
polyploid taxa, it is necessary to search more thoroughly
for paralogues (see below) and include them in the analysis, instead of excluding nonorthologous sequences as
is done in this analysis. The taxonomic conclusions based
on the rps16 and ITS data sets (Oxelman et al., 2001) are
further substantiated by the results presented here.
General Utility of the Primer Design Strategy
The second PCR, with subunit-specific primers,
yielded highly specific PCR products despite the low-stringency PCR conditions. With very few exceptions,
all PCR products used for the sequences in this study
were obtained at the first attempt, with either degenerated or Sileneae-specific primers. Some sequences were
“missing” in the RPD2 data set, i.e., only one of the expected paralogues was found in spite of several attempts
with cloning PCR products obtained with subunit specific primers. In addition, all attempts to amplify a “missing” paralogue with paralogue specific primers failed
despite using several different polymerases, and varying
PCR parameters such as annealing temperature, Mg2+,
and primer concentration. There may be several explanations of this. Differences in secondary structure (Buckler
et al., 1997) or differences in primer mismatching might
bias the PCR, resulting in recovering only one of the
copies, i.e., PCR selection (Wagner et al., 1994). Another
possibility is physical elimination of one of the redundant copies as is found in some allopolyploids (Shaked
et al., 2001), or large inserts in pseudogenes (Tank and
Sang, 2001) causing either inhibition or heavy bias of
the PCR. Besides running several reactions under varying conditions and with several different sets of primers
and/or paralogue specific primers (Rauscher et al., 2002)
combined with cloning, PCR cannot take us any further
and an ultimate answer to whether there is another copy
or not cannot be given by PCR alone. Despite these inherent difficulties, we argue that our method is suitable
for studying evolutionary relationships of lcnDNA sequence regions. The simultaneous analysis of multiple,
presumably unlinked, lcnDNA sequence regions enables
us to detect complicated evolutionary processes at the
genome level, while offering a large amount of data for
strong inferences of phylogenetic relationships at the “organismal” level. We suggest that this approach holds a
very strong potential for phylogenetic studies of many
organismal groups.
CONCLUSIONS
The addition of intron sequences from RPA2, RPB2,
and RPD2 to the rps16 and ITS data sets results in
a strongly supported phylogeny of the tribe Sileneae.
Among-site rate variation is substantially lower in the
RNA polymerase introns than in the rps16 intron and
ITS. The analyses reveal evolutionary patterns consistent
with gene duplication and incomplete concerted evolution in RPD2. Nested PCR with several sets of highly degenerated “universal” primers combined with cloning
and subsequent design of more specific primers proves
to be a powerful way to amplify and sequence low-copy
nuclear DNA regions.
ACKNOWLEDGMENTS
We thank Chris Simon, Roberta Mason-Gamer, two anonymous
reviewers, Katarina Andreasen, Magnus Lidén, Johan Nylander, and
Sylvain Razafimandimbison for valuable comments; Reija Dufva, Inga
Hallin, and Nahid Heidari for excellent help in the lab; Benjamin D. Hall
for providing primer sequences; Mark W. Chase, the herbaria at WTU
and UPS, and the Botanical Garden in Uppsala for providing plant
material. This study was supported by Helge Ax:son Johnsons Stiftelse,
The Swedish Research Council, The Royal Physiographic Society in
Lund, The Royal Swedish Academy of Sciences, and Linnéstipendiet.
REFERENCES
Baldwin, B. G., M. J. Sanderson, M. J. Porter, M. F. Wojciechowski, C.
S. Campbell, and M. J. Donoghue. 1995. The ITS region of nuclear
ribosomal DNA: A valuable source of evidence on angiosperm phylogeny. Ann. Miss. Bot. Garden 82:247–277.
Brochmann, C., T. Nilsson, and T. M. Gabrielsen. 1996. A classical
example of postglacial allopolyploid speciation re-examined using
RAPD markers and nucleotide sequences: Saxifraga osloensis. Symbolae Botanicae Upsaliensis 31:75–89.
Buckler, E. S., A. Ippolito, and T. P. Holtsford. 1997. The evolution of ribosomal DNA: Divergent paralogues and phylogenetic implications.
Genetics 145:821–832.
Buckley, T. R. 2002. Model misspecification and probabilistic tests
of topology: Evidence from empirical data sets. Syst. Biol. 51:509–
523.
Clegg, M. T., M. P. Cummings, and M. L. Durbin. 1997. The evolution
of plant nuclear genes. Proc. Nat. Acad. Sci. USA 94:7791–7798.
Corriveau, J. L., and A. W. Coleman. 1988. Rapid screening method to
detect potential biparental inheritance of plastid DNA and results
for over 200 angiosperm species. Am. J. Bot. 75:1443–1458.
Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by
analysis of sixteen nuclear and chloroplast genes. Am. J. Bot. 89:707–
725.
Cronn, R. C., R. L. Small, and J. F. Wendel. 1999. Duplicated genes
evolve independently after polyploid formation in cotton. Proc. Nat.
Acad. Sci. USA 96:14406–14411.
Denton, A. L., B. L. McConaughy, and B. D. Hall. 1998. Usefulness of
RNA polymerase II coding sequences for estimation of green plant
phylogeny. Mol. Biol. Evol. 15:1082–1085.
Desfeux, C., and B. Lejeune. 1996. Systematics of euromediterranean
Silene (Caryophyllaceae): Evidence from a phylogenetic analysis using ITS sequences. Comptes Rendus Acad. Sci. Serie ii—Sci. Vie–Life
Sci. 319:351–358.
Erixon, P., B. Svennblad, T. Britton, and B. Oxelman. 2003. The reliability of Bayesian posterior probabilities and bootstrap frequencies in
phylogenetics. Syst. Biol. 52:665–673.
Felsenstein, J. 1973. Maximum likelihood and minimum-steps methods
for estimating evolutionary trees from data on discrete characters.
Syst. Zool. 22:240–249.
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Ferguson, C. J., and R. K. Jansen. 2002. A chloroplast DNA phylogeny
of eastern Phlox (Polemoniaceae): Implications of congruence and
incongruence with the ITS phylogeny. Am. J. Bot. 89:1324–1335.
Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based
tests of topologies in phylogenetics. Syst. Biol. 49:652–670.
Gottlieb, L. D., and V. S. Ford. 1996. Phylogenetic relationships among
the sections of Clarkia (Onagraceae) inferred from the nucleotide sequences of PgiC. Syst. Bot. 21:45–62.
Grassly, N. C., and E. C. Holmes. 1997. A likelihood method for the
detection of selection and recombination using nucleotide sequences.
Mol. Biol. Evol. 14:239–247.
Hasegawa, M., and H. Kishino. 1989. Confidence-limits on
the maximum-likelihood estimate of the hominoid tree from
mitochondrial-DNA sequences. Evolution 43:672–677.
Huelsenbeck, J. P., and J. J. Bull. 1996. A likelihood ratio test to detect
conflicting phylogenetic signal. Syst. Biol. 45:92–98.
Kelchner, S. A. 2002. Group II introns as phylogenetic tools: Structure,
function, and evolutionary constraints. Am. J. Bot. 89:1651–1669.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum-likelihood estimate of the evolutionary tree topologies from DNA-sequence data, and the branching order in Hominoidea. J. Mol. Evol.
29:170–179.
Kosuge, K., K. Sawada, T. Denda, J. Adachi, and K. Watanabe. 1995.
Phylogenetic relationships of some genera in the Ranunculaceae
based on alcohol dehydrogenase genes. Plant Syst. Evol.
9(Suppl):263–271.
Martin, A. P., and T. M. Burg. 2002. Perils of paralogy: Using HSP70
genes for inferring organismal phylogenies. Syst. Biol. 51:570–587.
Martin, W., and R. G. Herrmann. 1998. Gene transfer from organelles
to the nucleus: How much, what happens, and why? Plant Physiol.
118:9–17.
Mason-Gamer, R. J., C. F. Weil, and E. A. Kellogg. 1998. Granule-bound
starch synthase: Structure, function, and phylogenetic utility. Mol.
Biol. Evol. 15:1658–1673.
Millar, A. A., and E. S. Dennis. 1996. The alcohol dehydrogenase genes
of cotton. Plant Mol. Biol. 31:897–904.
Nylander, J. A. A. 2003. MrModeltest, version 1.1b. Department of
Systematic Zoology, EBC, Uppsala University, Sweden. E-mail: [email protected]
Oxelman, B., and B. Bremer. 2000. Discovery of paralogous nuclear gene
sequences coding for the second-largest subunit of RNA polymerase
II (RPB2) and their phylogenetic utility in Gentianales of the asterids.
Mol. Biol. Evol. 17:1131–1145.
Oxelman, B., and M. Lidén. 1995. Generic boundaries in the tribe Sileneae (Caryophyllaceae) as inferred from nuclear rDNA sequences.
Taxon 44:525–542.
Oxelman, B., M. Lidén, and D. Berglund. 1997. Chloroplast rps16 intron
phylogeny of the tribe Sileneae (Caryophyllaceae). Plant Syst.
Evol. 206:393–410.
Oxelman, B., M. Lidén, R. K. Rabeler, and M. Popp. 2001. A revised
generic classification of the tribe Sileneae (Caryophyllaceae). Nordic
J. Bot. 20:743–748.
Pamilo, P., and M. Nei. 1988. Relationships between gene trees and
species trees. Mol. Biol. Evol. 5:568–583.
Popp, M., P. Erixon, F. Eggens, and B. Oxelman. In press. Origin
and evolution of a circumpolar polyploid species complex in Silene
(Caryophyllaceae) inferred from low copy nuclear RNA polymerase
introns, rDNA, and chloroplast DNA. Syst. Bot.
Popp, M., and B. Oxelman. 2001. Inferring the history of the polyploid
Silene aegaea (Caryophyllaceae) using plastid and homoeologous nuclear DNA sequences. Mol. Phylogenet. Evol. 20:474–481.
Posada, D., and K. A. Crandall. 1998. MODELTEST: Testing the model
of DNA substitution. Bioinformatics 14:817–818.
Rauscher, J. T., J. J. Doyle, and A. H. D. Brown. 2002. Internal transcribed
spacer repeat-specific primers and the analysis of hybridization in
the Glycine tomentella (Leguminosae) polyploid complex. Mol. Ecol.
11:2691–2702.
Rujan, T., and W. Martin. 2001. How many genes in Arabidopsis come
from cyanobacteria? An estimate from 386 protein phylogenies.
Trends in Genet. 17:113–120.
Sang, T. 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Crit. Rev. Biochem. Mol. Biol. 37:121–147.
Shaked, H., K. Kashkush, H. Ozkan, M. Feldman, and A. A. Levy.
2001. Sequence elimination and cytosine methylation are rapid and
reproducible responses of the genome to wide hybridization and
allopolyploidy in wheat. Plant Cell 13:1749–1759.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol.
Evol. 16:1114–1116.
Swofford, D. L. 2002. PAUP*. Phylogenetic analysis using parsimony
(*and other methods). Version 4.0b10. Sinauer Associates, Sunderland, Massachusetts.
Tank, D. C., and T. Sang. 2001. Phylogenetic utility of the glycerol-3-phosphate acyltransferase gene: Evolution and implications in Paeonia (Paeoniaceae). Mol. Phylogenet. Evol. 19:421–429.
The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–
815.
Wagner, A., N. Blackstone, P. Cartwright, M. Dick, B. Misof, P. Snow, G.
P. Wagner, J. Bartels, M. Murtha, and J. Pendleton. 1994. Surveys of
gene families using polymerase chain-reaction—PCR selection and
PCR drift. Syst. Biol. 43:250–261.
Walker, E. L., N. F. Weeden, C. B. Taylor, P. Green, and G. M. Coruzzi.
1995. Molecular evolution of duplicate copies of genes encoding
cytosolic glutamine synthetase in Pisum sativum. Plant Mol. Biol.
29:1111–1125.
Wen, J., M. Vanek-Krebitz, K. Hoffmann-Sommergruber, O. Scheiner,
and H. Breiteneder. 1997. The potential of Betv1 homologues, a nuclear multigene family, as phylogenetic markers in flowering plants.
Mol. Phylogenet. Evol. 8:317–333.
Wendel, J. F., and J. J. Doyle. 1998. Phylogenetic incongruence: Window into genome history and molecular evolution. Pages 265–296 in
Molecular systematics of plants. II (P. Soltis, D. Soltis, and J. Doyle,
eds.). Kluwer Academic Press, Dordrecht.
Wendel, J. F., A. Schnabel, and T. Seelanan. 1995. Bidirectional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proc. Nat. Acad. Sci. USA 92:280–284.
White, T. J., T. Bruns, S. Lee, and J. Taylor. 1990. Amplification and direct
sequencing of fungal ribosomal RNA genes for phylogenetics. Pages
315–322 in PCR protocols: A guide to methods and applications (M.
Innis, D. Gelfand, J. Sninsky, and T. J. White, eds.). Academic Press,
San Diego, California.
Yokoyama, S., and D. E. Harry. 1993. Molecular phylogeny and evolutionary rates of alcohol dehydrogenases in vertebrates and plants.
Mol. Biol. Evol. 10:1215–1226.
First submitted 29 July 2003; reviews returned 14 January 2004;
final acceptance 23 August 2004
Associate Editor: Roberta Mason-Gamer