Independent Ancient Polyploidy Events in the Sister

The Plant Cell, Vol. 18, 1152–1165, May 2006, www.plantcell.org ª 2006 American Society of Plant Biologists
Independent Ancient Polyploidy Events in the Sister Families
Brassicaceae and Cleomaceae
W
M. Eric Schranza,1,2 and Thomas Mitchell-Oldsa,1
a Department
of Genetics and Evolution, Max Planck Institute for Chemical Ecology, D-07745 Jena, Germany
Recent studies have elucidated the ancient polyploid history of the Arabidopsis thaliana (Brassicaceae) genome. The
studies concur that there was at least one polyploidy event occurring some 14.5 to 86 million years ago (Mya), possibly near
the divergence of the Brassicaceae from its sister family, Cleomaceae. Using a comparative genomics approach, we asked
whether this polyploidy event was unique to members of the Brassicaceae, shared with the Cleomaceae, or an independent
polyploidy event in each lineage. We isolated and sequenced three genomic regions from diploid Cleome spinosa
(Cleomaceae) that are each homoeologous to a duplicated region shared between At3 and At5, centered on the paralogs of
SEPALLATA (SEP) and CONSTANS (CO). Phylogenetic reconstructions and analysis of synonymous substitution rates
support the hypothesis that a genomic triplication in Cleome occurred independently of and more recently than the
duplication event in the Brassicaceae. There is a strong correlation in the copy number (single versus duplicate) of individual
genes, suggesting functionally consistent influences operating on gene copy number in these two independently evolving
lineages. However, the amount of gene loss in Cleome is greater than in Arabidopsis. The genome of C. spinosa is only 1.9
times the size of A. thaliana, enabling comparative genome analysis of separate but related polyploidy events.
INTRODUCTION
Polyploidy, or whole-genome duplication, has played a major
role in the evolution of many branches of the tree of life by
providing exponential and saltational increases in genome size
and gene copy number. Polyploidy provides the opportunity for
selection to sculpt a variety of new gene functions, traits, and
lineages. There are several examples of polyploid animals
(Mable, 2004), including mammals (Gallardo et al., 1999), fishes
(Le Comber and Smith, 2004), and frogs (Ptacek et al., 1994), and
polyploidy is particularly common among flowering plants. Many
important model systems, both agricultural and wild, have been
used to study the consequences of recent polyploidization (e.g.,
cotton [Gossypium hirsutum], tobacco [Nicotiana tabacum],
wheat [Triticum aestivum], canola [Brassica napus], soybean
[Glycine max], potato [Solanum tuberosum], sugarcane [Saccharum officinarum], cordgrass (Spartina anglica), and dandelion
[Taraxacum officinale]) (Wendel, 2000; Osborn et al., 2003; Tate
et al., 2005). These examples of recent polyploidy are easily
detected by changes in chromosome numbers, genome size,
and gene copy number compared with progenitors. However,
ancient, or paleopolyploid, events are much more difficult to
detect. After polyploidy occurs, the genome begins its march
1 Current
address: Department of Biology, PO Box 91000, Duke
University, Durham, NC 27708.
2 To whom correspondence should be addressed. E-mail eric.
[email protected]; fax 919-613-8177.
The authors responsible for distribution of materials integral to the
findings presented in this article in accordance with the policy described
in the Instructions for Authors (www.plantcell.org) are: M. Eric Schranz
([email protected]) and Thomas Mitchell-Olds ([email protected]).
W
Online version contains Web-only data.
Article, publication date, and citation information can be found at
www.plantcell.org/cgi/doi/10.1105/tpc.106.041111.
back to diploidy through gene loss and chromosomal rearrangement (Song et al., 1995; Lynch and Conery, 2000; Wolfe, 2001;
Kashkush et al., 2002). After tens or hundreds of millions of years
of evolution, this diploidization process obfuscates homoeologous genomic regions. Increasingly, comparative genomic approaches within a phylogenetic framework have revealed
evidence of such paleopolyploid events. For example, the complete genome sequencing of the ascomycete yeast Kluyveromyces waltii was used to confirm the ancient polyploid history of
Saccharomyces cerevisiae (Kellis et al., 2004). Similarly, genomic
sequencing of the fish Tetraodon nigroviridis (Jaillon et al., 2004)
revealed the ancient polyploid history of the teleost fish lineage.
Considering the prevalence of recent polyploids among flowering plants, it is not surprising that a plethora of ancient
polyploid events have also been discovered. Evidence for ancient polyploidy has come mostly from comparative genetic
mapping, analysis of specific gene families, or by the identification of duplicated genes in EST collections, for example, Brassica
species (Lagercrantz, 1998), maize (Zea mays; Gaut and Doebley,
1997), grasses (Paterson et al., 2004), legumes (Pfeil et al., 2005),
poplar (Populus spp; Sterck et al., 2005), and several other model
plants (Blanc and Wolfe, 2004a). However, the most extensive
evidence for ancient polyploidy comes from analysis of the
complete sequences of both rice (Oryza sativa; Yu et al., 2005)
and Arabidopsis.
Studies of duplicate genes and colinearity in the A. thaliana
genome differ in their assessment of the total number of polyploidy events but generally agree on the occurrence of at least
one complete ancient genome duplication (Arabidopsis Genome
Initiative, 2000; Vision et al., 2000; Simillion et al., 2002; Blanc
et al., 2003; Bowers et al., 2003). The timing of this most recent
(or a) polyploidy event has been placed somewhere between
14.5 and 86 Mya. It has been hypothesized that the polyploidy
Ancient Polyploidy in Brassicales
event occurred at the divergence of the rest of the Brassicaceae
and the genus Aethionema (Galloway et al., 1998), at the radiation of the entire Brassicaceae (Blanc et al., 2003), or as
distantly as the divergence of the orders Brassicales and
Malvales (Bowers et al., 2003). We seek to resolve this issue by
comparison between the Brassicaceae and other members of
the Brassicales.
Recent phylogenetic analyses have established that the Brassicaceae is sister to the Cleomaceae, with these two families
being sister to the Capparaceae (Hall et al., 2002). The Brassicaceae are characterized by their telltale actinomorphic cruciform flower, with a 2 þ 4 arrangement of stamens, characteristic
silique fruit type, and a high percentage of plants with a base
chromosome number of n ¼ 8 (Warwick and Al-Shehbaz, 2006).
The Cleomaceae has greater variation in floral morphology,
including showy zygomorphic flowers and the occurrence of
C4 photosynthesis (Brown et al., 2005). The genus Cleome is by
far the largest group in the family with ;200 of the 275 species in
the family (Sanchez-Acebo, 2005). A general phylogenetic
framework for some of the taxa in these groups is presented
(Figure 1).
There are three hypotheses to describe the a-polyploidy event
in Arabidopsis relative to the divergence of the sister families
Brassicaceae and Cleomaceae (illustrated in Figure 2). First, the
polyploid event could be unique to the Brassicaceae. Second, it
could predate the divergence of these families and therefore be
shared by the Brassicaceae and Cleomaceae. Finally, there
could have been independent ancient polyploidy events in both
lineages. For each of these hypotheses, there are specific
expectations for genomic organization and relationships among
homologous genes. For the Brassicaceae-only hypothesis, there
should be a 2:1 ratio of genomic regions and genes in the
Brassicaceae compared with the Cleomaceae. For the shared
and independent hypotheses, there should be multiple homologous genomic regions and sets of genes in both lineages.
1153
Figure 2. There Are Three Hypotheses as to When the a-Polyploidy
Event in the Brassicaceae Occurred Relative to Its Divergence from the
Cleomaceae.
The hypotheses are as follows: the polyploidy event was unique to the
Brassicaceae (A), the polyploidy event predated the divergence and
hence is shared by the two lineages (B), or there were independent
polyploidy events in each lineage (C). The three hypotheses can be
distinguished by copy number and phylogenetic relationships of genes
from Arabidopsis (At; shown in purple) and Cleome (Cs; shown in green)
that are rooted relative to an outgroup (shown in black).
(A) In the Brassicaceae-only hypothesis, there would be a 2:1 ratio of
Arabidopsis genes in the Brassicaceae compared with Cleome genes in
the Cleomaceae.
(B) and (C) For the shared (B) and independent (C) hypotheses, there
should be multiple genes in both lineages.
(B) In the shared hypothesis, phylogenetic reconstructions would detect
orthologous gene pairs, resulting in a pattern with each Arabidopsis gene
copy being sister to a Cleome gene copy. Also, molecular evolutionary
analyses should detect similar degrees of divergence among the homologous genes as measured by synonymous substitution rates (Ks) or
branch lengths.
(C) In the independent hypothesis, phylogenetic analyses would detect
only paralogous sets of loci with Arabidopsis gene copies being sister to
one another and Cleome copies being sister to each other. Furthermore,
if the independent polyploidy events were of different ages, then the
levels of synonymous substitutions and branch lengths would be different between the two lineages.
However, these two hypotheses can be distinguished by the
relationship and evolution among the duplicated copies of
each gene (Cannon and Young, 2003; Blanc and Wolfe, 2004a;
Chapman et al., 2004; Van de Peer, 2004; Pfeil et al., 2005). If the
shared hypothesis was correct, phylogenetic reconstructions
should detect orthologous gene pairs, resulting in a pattern with
each Arabidopsis gene copy being sister to a Cleome gene copy.
Also, molecular evolutionary analyses would be expected to find
a similar degree of divergence among the homologous genes, as
measured by synonymous substitution rates (Ks). By contrast,
the independent hypothesis would predict that phylogenetic
analyses detect only paralogous sets of loci, with Arabidopsis
gene copies being sister to one another and Cleome copies
being sister to each other. Under this hypothesis, no clear
orthology relationships would be detectable between Cleome
and Arabidopsis copies of the genes. Furthermore, if the independent polyploidy events were of different ages, then the levels
of synonymous substitutions should be different between the
two lineages.
RESULTS
Figure 1. Phylogenetic Relationships of the Brassicaceae, Cleomaceae,
Capparaceae, and Several Other Genera in the Order Brassicales.
Modified with permission from Hall et al. (2004).
Comparative Cleome BAC Sequencing
We focused our comparative genomics analysis on regions
in Cleome that were homoeologous to a duplicated region in
1154
The Plant Cell
Arabidopsis shared between At3 and At5 (Figure 3A). Our target
region was contained within one of the largest recently duplicated blocks identified in Arabidopsis; it was the fourth largest
block (block 0305000103160) identified by Blanc et al. (2003) and
the fifth largest block (block A12) identified by Bowers et al.
(2003). Our target region within this larger block was defined by
the following flanking sets of paralogous genes: At3g02230/
At5g15650 and At3g02560/At5g16130. This interval spans
;128 kb on At3 and 178 kb on At5 (for a total of 306 kb). There
are 36 annotated genes on At3 in this interval and 51 genes
on At5. A total of 20 duplicated (paralogous) gene pairs exist
between these two regions (Figure 3A).
The screening of our Cleome BAC library with the Cleome SEP
and CO probes identified a number of putative positive clones.
Screening the clones by restriction digestion and PCR analysis
identified three independent homoeologous Cleome genomic
regions. BACs from each of the three regions that had nearly
complete target regions (identified by PCR) were chosen for
shotgun sequencing (Cleome BACs Cs_1, Cs_2, and Cs_3).
Sequencing confirmed that Cs_1 and Cs_2 BACs contained
Figure 3. Genomic Composition and Alignment of Homoeologous Target Regions Found in A. thaliana and C. spinosa.
(A) Duplicated target region in A. thaliana between At3 (green) and At5 (red) that contains a number of duplicated paralogous genes (yellow) created by
ancient polyploidy. The region is flanked by the At3g02230/At5g15650 and At3g02560/At5g16130 paralogs.
(B) Comparison of the two Arabidopsis regions to one of the three sequenced homoeologous regions identified in C. spinosa (Cs_1; shown in blue). The
Cs_1 target region contained 2 At3-specific genes (green), 6 At5-specific genes (red), 12 duplicated genes (yellow), and 14 Cleome-specific ORFs (blue;
10 of which are related to a transposon cluster), illustrating the mosaic-like homology of Cleome to Arabidopsis.
(C) Comparison of all five homoeologous genomic regions: two from Arabidopsis and three from Cleome (Cs_1 in blue, Cs_2 in light green, and Cs_3 in
purple). Three loci for which there were homologs in each of the five regions, including the loci defining the ends of the target region, are aligned (shown
in yellow) to illustrate colinearity.
Ancient Polyploidy in Brassicales
homologs to At3g02230/At5g15650 and At3g02560/At5g16130.
Cs_3 contained a homolog to At3g02230/At5g15650 but not
At3g02560/At5g16130 (the BAC ended at the homolog of
At5g16120). However, using PCR, we were able to extend and
clone the length of the Cs_3 region to include a homolog to
At3g02560/At5g16130. The length of sequence between the
flanking genes that define the target region in each BAC was 124,
102, and 143 kb, respectively (for a combined sequence length of
369 kb).
TwinScan identified a number of open reading frames (ORFs)
on each BAC. BLASTN and BLASTX analyses found that each of
the BACs contained a unique subset of genes with homology to
those from At3 and At5 or that were duplicated (Figure 3; see
Supplemental Table 2 online). For example, the target region on
BAC Cs_1 contained 2 At3-specific genes, 6 At5-specific genes,
13 duplicated genes, and 16 BAC-specific ORFs (most of which
are clustered in the center of the BAC and contain a number of
transposons) (Figure 3B). Overall, the gene density within the
target regions was very similar between Arabidopsis (with an
average gene density of one gene per 3.4 kb) and Cleome (with
an average gene density of one gene per 3.2 kb).
The five genomic regions (two in Arabidopsis and three in
Cleome) were remarkably colinear across the target region and
could easily be aligned (Figure 3C). There were three loci for which
there were five homologs, one on each genomic region, including
the loci defining the ends of the target region (Figure 3C).
The complete list of genes found on these five genomic regions
and the homology relationships among them allowed for the
reconstruction of the putative ancestral gene order (see Supplemental Table 2 online). In total, the three Cleome BACs contained
homologs of 50 of the 66 (76%) unique annotated genes found
in Arabidopsis (paralogous genes were counted only once and
excluded tRNA, transposons, and tandem duplicates found in
Arabidopsis). Of the 17 genes found in Arabidopsis for which no
Cleome homologs were identified, many were members of
multigene families. Hence, these loci were impractical for additional BAC library screening. Only one of the missing loci was
both single copy and highly conserved among many organisms
(At5g15920, the structural maintenance of chromosomes family
protein [MSS2]). The Cleome homolog of the At5g15920 gene
was PCR amplified, cloned, sequenced, and used to reprobe the
Cleome BAC library. The putative positive clones were screened
and found to correspond to only a single genomic region. Limited
shotgun sequencing was done from a pool of four of these BACs.
The similarity of these sequences was to the homolog of
At5g15920 and then to another part of the A. thaliana genome,
suggesting that this locus has undergone a rearrangement in
Cleome relative to Arabidopsis.
There were a total of 36 ORFs predicted in the three Cleome
BACs that did not have homologs within the target regions of
Arabidopsis. Many of these ORFs are short and ultimately may
not represent true genes. However, several of these are likely,
and potentially novel, genes. For example, Cs3_ORF44 is predicted to code for a 392–amino acid peptide whose only similarity
is to an unknown peptide in rice (GenBank accession number
BAD81800). There are other loci that have homology to Arabidopsis genes but not to genes found in the target interval. Often
these are members of large gene families. For example,
1155
Cs3_ORF7 has high similarity to members of phosphatidylinositol
3- and 4-kinase family proteins. Cs1_ORF27 is a 639–amino acid
protein with high similarity to pentatricopeptide family proteins
(and the gene At1g31430 in particular). Cs1_ORF25 has similarity
to Cyp89-like cytochrome P450 proteins, and Cs1_ORF31 is only
78 amino acids long and has similarity to S-adenosylmethionine
decarboxylases. This last ORF may represent a regulatory upstream ORF in the 59 region of the upstream region of the
adenosylmethionine decarboxylase Cs1_At5g15950 gene, as
has been described in other systems (Hanfrey et al., 2003).
In addition, six ORFs on Cs1 had similarity to transposon-like
genes (Cs1_ORFs 20 to 24, 26, and 41). When we did a complete
BLASTX of our BAC sequences to the conserved-motif amino
acid database of the Brassicaceae compiled by Zhang and
Wessler (2004), only Cs1_ORF20 and Cs1_ORF26 had significant similarity. Both loci were most similar to LINE element
reverse transcriptases of the I-f lineage.
For each of the sequenced Cleome BACs, we obtained some
sequence beyond the target interval. Most additional sequence
was colinear with the homologous At3 and/or At5 regions, except
for the BAC Cs_2, which showed evidence of a rearrangement
(with the homolog of At5g15650 next to the homolog of
At5g18950; data not shown). This represents the only structural
rearrangement detected in our data.
Phylogenetic Reconstructions with Homologous Gene Sets
Phylogenetic reconstructions were done for each of the replicated genes identified in both Arabidopsis and Cleome and in a
combined analysis of all duplicated loci together. There were 14
homologous gene sets in the target region and four additional
sets of loci derived from other genomic regions. Sixteen of the 18
phylogenetic reconstructions and the combined analysis identified paralogous pairs of Arabidopsis and Cleome sequences
(Figures 4 to 6). The finding of separate paralogous gene pairs in
each lineage supports the hypothesis of independent polyploidy
events. Of the two phylogenies showing orthology between
Arabidopsis and Cleome sequences (which would support the
shared hypothesis), one was in the target region and was
complicated by the detection of a potentially ancient tandem
duplicate of the gene on At5 (At5g15890 and At5g15900). Hence,
this gene set was excluded from additional analyses. Within the
target region, there were several genes found in duplicate in
Cleome that are single copy in Arabidopsis. Outside the target
region, we detected duplicate copies of SEP3 (At1g24260)
homologs, a single-copy gene in Arabidopsis.
One of the homologous gene sets examined from our target
region was the SEP1 and SEP2 family (Figure 4). Phylogenetic
relationships of the SEP genes have recently been established
(Zahn et al., 2005). We were able to expand upon this published
data set by including duplicated copies of SEP1 and SEP2
homologs from C. spinosa and Boechera stricta (GenBank
accession numbers DQ415916 and DQ415918) cloned in this
study (the B. stricta sequences were derived from the sequencing of clones from an indexed genomic l-clone library; Windsor
et al., 2006) and from Arabidopsis lyrata (AY727598 and
AY727621). Cloned sequences from other taxa were used to
increase taxonomic breadth, including Castanea mollissima
1156
The Plant Cell
Figure 4. Phylogenetic Reconstructions of Loci Replicated in Both A. thaliana and C. spinosa Support the Hypothesis That the Loci Were Duplicated
Independently, Rather Than Sharing a Common Duplication History.
Homologs to the SEP1 and SEP2 genes from a number of eudicot species were analyzed (many reported in Zahn et al., 2005), including duplicates from
C. spinosa cloned from our target region. There is a Brassicaceae-specific gene duplication with orthologous gene sets shared by Arabidopsis,
Boechera, and Brassica for SEP1 and SEP2. The Brassicaceae clade is sister to the duplicated paralogous genes from C. spinosa.
(DQ148295), Prunus dulcis (AY947464), Rosa rugosa
(AB099876), G. max (DQ159905), and Pisum sativum
(AY884290). EST sequences from B. napus (CD844028 and
CD816888), tree cotton (Gossypium arboreum; BG440326), and
citrus (Citrus sinensis; CK932769) were also used.
We recovered the same tree topology for the SEP1 and SEP2
regions as Zahn et al. (2005). However, our expanded data set
found a Brassicaceae-specific gene duplication with orthologous gene sets shared by Arabidopsis, Boechera, and Brassica
for SEP1 and SEP2 (Figure 4). The Brassicaceae clade is sister to
the duplicated paralogous genes from C. spinosa (Figure 4). The
Brassicales and other Rosid II sequences from the Malvales
(cotton) and Sapindales (citrus) form a monophyletic clade in our
analysis that is sister to the monophyletic Rosid II group. These
results agree with our current understanding of angiosperm
relationships (APG II, 2003).
In order to have an independent assessment of gene duplication, we present a representative of a duplicated gene from
outside our target region used for phylogenetic reconstruction,
the Arg decarboxylase (ADC) family (Figure 5). The ADC gene
was duplicated between At2 and At4 (ADC1, At2g16500; ADC2,
At4g34710). Sequences of ADC genes were used previously for
phylogenetic studies within the Brassicaceae where the gene
was found to be duplicated across the entire Brassicaceae,
except in the genus Aethionema (Galloway et al., 1998). By PCR
we were able to amplify duplicated copies of the ADC gene from
C. spinosa. Our phylogenetic analyses agree with those of
Galloway et al. (1998). Most important, however, was our finding
that the duplicates of Cleome are monophyletic and are therefore
independent of the duplicates in the Brassicaceae.
Finally, we combined the coding sequence data for all duplicated genes in the target regions for Arabidopsis and Cleome for
Ancient Polyploidy in Brassicales
1157
Figure 5. Phylogenetic Reconstruction Using the ADC Family, a Representative of a Duplicated Gene from Outside Our Target Region, Supports the
Hypothesis of Independent Polyploidy Events in the A. thaliana and C. spinosa Lineages.
Sequences of ADC genes were used previously for phylogenetic studies within the Brassicaceae (Galloway et al., 1998). Our phylogenetic analyses agree
with those of Galloway et al. (1998) in showing the duplication of ADC1 and ADC2 shared by most Brassicaceae, except in the genus Aethionema. Most
important, however, was our finding that the duplicates of Cleome are monophyletic and are therefore independent of the duplicates in the Brassicaceae.
phylogenetic analysis (Figure 6). The At3 and At5 sequences
contained data for all 14 duplicated genes. However, each of the
three regions in Cleome contained some, but not all, of the duplicates. Hence, the missing genes were scored as missing data.
Recent studies have concluded that phylogenetic studies containing
even large amounts of missing data can be more robust than many
smaller gene and taxa studies (Driskell et al., 2004). The unrooted
phylogeny has 100% bootstrap support for separate Arabidopsis
and Cleome clades (Figure 6). Also, the branch lengths for the
Arabidopsis sequences are longer than for the Cleome sequences.
There is some support for a closer relationship of the Cleome
genomic regions Cs_2 and Cs_1 (with 67% bootstrap support).
Calculation and Analysis of Synonymous Substitution Rates
The means for synonymous substitutions, or Ks values, between
duplicate copies of genes were calculated for the three possible
comparisons: Arabidopsis-Arabidopsis, Arabidopsis-Cleome,
and Cleome-Cleome (Table 1). Our results found ArabidopsisCleome > Arabidopsis-Arabidopsis > Cleome-Cleome. Our
mean value for the Arabidopsis-Arabidopsis of Ks ¼ 0.67 is
very similar to that of Maere et al. (2005), who used a value of Ks ¼
0.7 for the Arabidopsis polyploidy event and slightly lower than
the value of Ks ¼ 0.8 calculated by Lynch and Conery (2000).
Assuming that Ks values increase with the passage of time, this
suggests a temporal pattern for the divergence and subsequent
timing of the individual polyploidy events of these two lineages.
We also performed an analysis of variance on the Ks values
from the comparisons of Arabidopsis-Arabidopsis and CleomeCleome and looked for locus-specific variation in synonymous
substitution rates (Table 2). Levels of synonymous differentiation
differed significantly between Arabidopsis-Arabidopsis and
Cleome-Cleome (P ¼ 0.0007), but we found no evidence of
heterogeneous rates among loci (P ¼ 0.228). Hence, the age of
1158
The Plant Cell
23 ¼ 46), and 3 in triplicate (3 þ 3 þ 3 ¼ 9) for a total of 79 loci.
Thus, the overall degree of gene retention is only 79/150 ¼ 53%.
DISCUSSION
Figure 6. Aligned Duplicated Gene-Coding Sequences for Our Target
Region from A. thaliana Chromosomes 3 and 5 (At3 and At5) and the
Three Cleome BAC Regions (Cs1, Cs2, and Cs3) Were Each Concatenated into Single Large Sequences.
Missing genes were treated as gaps. Phylogenetic analysis of these five
concatenated sequences demonstrates the relationships of the five
homoeologous target regions, with the three Cleome regions being more
closely related to each other than to any one of the Arabidopsis regions.
the duplication events in the two lineages, as measured by Ks
values, is different.
Patterns of Duplicate Gene Retention and Loss
The sequencing of the three replicated genomic regions from C.
spinosa allowed us to examine the overall patterns of duplicate
gene retention and loss compared with that observed within A.
thaliana (Table 3, Figure 7). A x2 test of independence between
species and copy number (single versus duplicate) found a highly
significant result (P ¼ 0.0079). Retention of duplicate copies in
the two species was nonrandom. The copy number of particular
loci is correlated in these two independently evolving lineages
(Table 3).
We compiled the gene ontology (GO) annotations of the genes
that were either duplicated in both lineages or single copy in both
lineages. Although genes can have multiple GO annotations and
therefore may be represented multiple times, we report the
number of gene annotations in each molecular function GO
category for single and duplicated copy genes (Figure 7). The
most obvious overrepresented category for single-copy genes
was hydrolase activity, and the most overrepresented category
for duplicated genes was protein binding (Figure 7).
Assuming that there were 50 ancestral genes in the target
region at the split between Brassicaceae and Cleomaceae (this is
a conservative number based on the number of genes shared
now between Cleome and Arabidopsis), a genome duplication in
the Arabidopsis lineage would have resulted in 100 loci and a
genome triplication in Cleome with 150 loci. For these genes in
the two Arabidopsis regions, there are currently 30 genes in
single copy and 20 in duplicate (20 þ 20 ¼ 40) for a total of 70 loci.
The overall degree of gene retention is thus 70/100 ¼ 70%.
In Cleome, there are 24 single-copy genes, 23 in duplicate (23 þ
Since the ancient polyploid history of A. thaliana was illuminated
by genome analyses, there has been much speculation about the
timing and phylogenetic position of the genome doubling (Arabidopsis Genome Initiative, 2000; Vision et al., 2000; Blanc et al.,
2003; Bowers et al., 2003; Ermolaeva et al., 2003; De Bodt et al.,
2005). We took a comparative genomic approach to address this
issue in relation to the divergence of the sister families Brassicaceae and Cleomaceae (Hall et al., 2002). Our results strongly
support the hypothesis of independent ancient polyploidy events
of different ages in these two sister lineages. First, we identified
three independent and unique BACs that were homoeologous to
a duplicated region between At5 and At3. We also documented
four additional duplicated homologous genes located in other
parts of the A. thaliana genome. Second, the phylogenetic
analysis of almost all of the duplicated homologous gene groups
and of a combined data set found pairs of Arabidopsis paralogs,
clearly separate from Cleome paralogs. Third, the synonymous
substitution rates (Ks values) between Arabidopsis paralogs and
Cleome paralogs were significantly different. All of these factors
combine to make an independent polyploidy event the most
likely explanation. However, until complete genomic data becomes available, we cannot yet rule out the possibility of segmental polyploidization in Cleome.
The occurrence of independent ancient polyploidy events in
these closely related lineages allows analysis of the relative
timing of polyploidization, patterns of gene retention and loss,
and genome evolution. Our results highlight the continuous
cycles of polyploidy and diploidization that have occurred during
angiosperm evolution.
Timing of Ancient Polyploidy Events
Previous studies have used several approaches to infer the
timing of the recent (or a) polyploidy event in Arabidopsis (Blanc
et al., 2003; Bowers et al., 2003; Maere et al., 2005). Bowers et al.
(2003) found that the polyploidy event occurred sometime between the divergence of the Brassicales from the Malvales
(estimated to have occurred 86 Mya) and the divergence of
Arabidopsis from Brassica (estimated to have occurred 14.5 to
20 Mya). Blanc et al. (2003) used divergence times calculated for
Table 1. Least Square Means of Synonymous Substitution Rates
Calculated from Sequence Alignments from Loci That Are Duplicated
in Both A. thaliana and C. spinosa
Comparison
Mean
SE
n
At–Cs
At–At
Cs–Cs
0.816
0.674
0.398
0.036
0.036
0.036
13
13
13
Synonymous substitution rates measured by Ks, which equals the
number of substitutions per synonymous site in a coding sequence.
At, A. thaliana; Cs, C. spinosa.
Ancient Polyploidy in Brassicales
Table 2. Analysis of Variance of the Calculated Ks Values of Loci That
Are Duplicated in Both A. thaliana and C. spinosa
Source
df
F-Ratio
P
Species comparison
Locus
Error
1
12
12
20.452
1.556
0.0007
0.2280
n ¼ 26; r2 ¼ 0.765.
the Brassicaceae (Koch et al., 2001) and placed the polyploidy
event some 24 to 40 Mya, near the divergence of the Brassicaceae from the Cleomaceae. To circumvent the problems of
molecular dating and the conflicts over molecular clocks, Maere
et al. (2005) suggested reporting the age of ancient polyploid
events in Ks units rather than in millions of years. By this measure,
Maere et al. (2005) have given a unit value of Ks ¼ 0.7 to the
Arabidopsis genome duplication. Our Ks estimate of 0.67 for
Arabidopsis paralogs is very similar to this.
Following the suggestion of Maere et al. (2005), we use our
calculated Ks values to estimate the temporal pattern of divergence of the two lineages and of the two independent polyploidy
events. The largest estimate of Ks, and hence the oldest event, is
the divergence of Cleomaceae and Brassicaceae lineages (Ks ¼
0.82). The next event is polyploidization within the Brassicaceae
(Ks ¼ 0.67) and finally the younger polyploidization within the
Cleomaceae (Ks ¼ 0.40). For comparison, Blanc et al. (2003)
estimated the median difference between cotton and Arabidopsis as Ks ¼ 1.8. Thus, the divergence of the Brassicaceae and
Cleomaceae is much more recent than the divergence of the
Brassicales and Malvales, agreeing with phylogenetic reconstructions (APG II, 2003; Hall et al., 2004). Synonymous differentiation between Arabidopsis and Brassica paralogs is Ks ¼
0.46 (Blanc et al., 2003). Hence, the likely whole-genome polyploidy event we have detected in Cleome is of approximately the
same age or younger than the divergence of the Arabidopsis from
Brassica lineages. If we then use the widely cited estimates for
the divergence of Brassica from Arabidopsis of 20 Mya (Yang
et al., 1999; Koch et al., 2001), we can approximate the following
timing of these events: the divergence of the Brassicaceae and
Cleomaceae lineages splitting 41 Mya, followed by the polyploidization event in the Brassicaceae lineage 34 Mya, and finally
a likely whole-genome polyploidization event occurring in the
Cleomaceae lineage 20 Mya.
One of the most important conclusions to be drawn from our
analysis is that the divergence of the Cleome and Arabidopsis
lineages is older than the Arabidopsis lineage polyploidy event. A
major goal now will be to determine when the polyploid event
occurred during the evolution of the Brassicaceae, particularly in
relationship to the divergence of the genus Aethionema from the
rest of the family. A previous study (Galloway et al., 1998)
presented evidence that ADC was single copy in Aethionema
but duplicated throughout the rest of the family. Whether other
genes confirm the nonduplicated nature of Aethionema relative
to the rest of the Brassicaceae remains an important question.
Another unresolved question is when the likely whole-genome
polyploid event occurred during the evolution of the Cleomaceae.
1159
There is a well-supported New World temperate clade containing
the genera Cleomella, Oxystylis, Wislizenia, and Isomeris that is
sister to the remainder of the family (Hall et al., 2002). These
genera have a base chromosome number of n ¼ 20, suggesting a
shared and potentially recent polyploid history. The genus Polansia (n ¼ 10) is then sister to the Old and New World members of the
genus Cleome (with a base chromosome number of n ¼ 10 in
Cleome section Tarenaya; H.H. Iltis, personal communication).
The work of Galloway et al. (1998) detected only a single ADC gene
in Polansia that appears sister to the duplicated copies detected in
Cleome (Figure 5). In future work, we hope to determine if the
ancient polyploid event detected in the Cleome lineage is shared
with Polansia and/or with the North American Cleomaceae clade,
or, alternatively if there was yet another independent polyploid
event within this North American group.
Patterns of Gene Retention and Loss
The occurrence of independent ancient polyploid events in
closely related lineages provides an excellent opportunity to
explore patterns of gene evolution by looking for convergent
fates in gene copy number. By analyzing the Arabidopsis genome, several authors have suggested that there are biases and
evolutionary pressures for the types of genes to be maintained in
duplicate or single copy (Blanc and Wolfe, 2004b; Seoighe and
Gehring, 2004; Maere et al., 2005; Chapman et al., 2006). In our
study, we found a highly significant bias for the genes occurring
in duplicate or single copy in both Arabidopsis and Cleome,
although the overlap is not absolute. Thus, comparative analysis
could suggest which genes are particularly advantageous to
maintain in duplicate or to have in single copy. Alternatively,
genes found in duplicate in one lineage but returned to single
copy in the other provide a good system to explore hypotheses of
subfunctionalization and neofunctionalization (Force et al., 1999;
Prince and Pickett, 2002).
When we examined the GO annotations of the genes found in
duplicate in both lineages or single copy in both lineages, we
found several biased categories despite our rather small sample
of the genomes (only ;0.2% of the Arabidopsis genome). The
most obviously overrepresented single-copy gene category
was for hydrolase activity. Interestingly, Maere et al. (2005)
Table 3. A 2 3 2 Contingency Table Comparing the Observed and
Expected Numbers of Single versus Duplicate Copy Genes under the
Hypothesis of Independence between the Two Lineages Using a x2
Test of Independence
A. thaliana Genes
Cleome
Genes
Single
Duplicated
Sum
Single
Duplicated
19 (14.4)
11 (15.6)
5 (9.6)
15 (10.4)
24
26
Sum
30
20
50
x2
Test
x2 Value: 7.065
P value: 0.00786**
Asterisks indicate that the result is significant at the 0.01 level.
1160
The Plant Cell
Figure 7. GO Annotations Were Compiled for Genes That Were Either Duplicated in Both Lineages or Single Copy in Both Lineages.
Shown are the numbers of gene annotations in each molecular function GO category for single and duplicated copy genes. The most obvious
overrepresented category for single-copy genes was hydrolase activity, and an overrepresented category for duplicated genes was protein binding.
found hydrolases to be the most biased GO molecular function
category for single-copy genes (or duplicated by small tandem
duplications). Similarly, the most biased duplicate copy gene
category we identified was protein binding genes. Maere et al.
(2005) also found protein binding genes to be one of the most
biased to be retained in duplicate due to ancient polyploidy.
More complete genome sampling of Cleome could further refine
GO category biases. These results emphasize, however, that
common processes have been operating in these independently
evolving lineages, rather than lineage-specific biases in gene
retention or loss.
A comparative analysis of genes within our target region has
the potential to elucidate gene evolution and function. The
maintenance of duplicate copies of genes in both Arabidopsis
and Cleome may indicate that it is beneficial or necessary to do
so. For example, we found SEP1 and SEP2 genes to be duplicated in both lineages. This agrees with published reports that
MADS box genes, and SEP genes in particular, often have been
maintained in duplicate throughout angiosperm evolution (Irish
and Litt, 2005; Malcomber and Kellogg, 2005; Zahn et al., 2005),
allowing for the evolution of new floral forms by subfunctionalization and neofunctionalization. Recent work with SEP1 and
SEP2 in Arabidopsis found significant differences in gene expression, suggesting a degree of subfunctionalization (Duarte
et al., 2006). Investigations of the patterns of expression of MADS
box genes, such as SEP1 and SEP2 homologs, could be done to
explore differences in floral development.
Other genes are duplicated in one lineage and single copy in
the other. For example, CO is replicated in Arabidopsis but exists
in single copy in Cleome. In Arabidopsis, there is a tandem
duplication on At5, giving rise to CO and CONSTANS-LIKE1
(COL1), and their paralog on At3, COL2 (Ledger et al., 2001). CO
is an important regulator of flowering time, integrating signals for
daylength (Corbesier and Coupland, 2005), whereas COL1 and
COL2 mutants have no obvious flowering time phenotypes but
may be involved with the circadian clock (Ledger et al., 2001). A
comparison of the function of the single CO-like homolog in
Cleome to the three loci in Arabidopsis could be done to look for
evidence of subfunctionalization.
Polyploidization in Cleome
Analysis of the triplicate regions in Cleome enables several
inferences about the polyploidization process. First, we argue
Ancient Polyploidy in Brassicales
that the triplication was most likely due to whole-genome
polyploidization and not segmental duplication or aneuploidy.
Second, we discuss evidence that the Cleome lineage was
derived from an ancestral hexaploid ancestor.
In addition to the triplicated genomic region that we sequenced in Cleome, we also amplified replicated copies for
four additional loci. These additional loci are found in duplicate in
Arabidopsis and are scattered about the genome. The finding of
additional Cleome replicated genes of the same approximate
age as our triplicated region suggests that there was a wholegenome triplication. Additionally, phylogenetic reconstructions
with these loci show similar patterns suggestive of independent
polyploidy events in the Arabidopsis and Cleome lineages (e.g.,
Figure 5). It is possible that replicated regions in Cleome were
derived by aneuploidy or segmental duplication. Examples of
large-scale segmental duplications have been recently detected
in the rice genome (Yu et al., 2005). Such debates over segmental
versus whole-genome duplication in various lineages have a long
history (Wolfe, 2001; Panopoulou and Poustka, 2005) and can be
resolved by whole-genome sequence analysis (Jaillon et al.,
2004; Kellis et al., 2004; Paterson et al., 2004). The generation of
additional sequence data and/or comparative mapping information from Cleome will likely be necessary to fully determine
whether it underwent segmental duplication or polyploidy.
The likely whole-genome triplication observed in Cleome
suggests the ancestral Cleome polyploid was a hexaploid (6x),
which subsequently underwent diploidization. For example, it
has been hypothesized that diploid Brassica species are derived
from a hexaploid ancestor, and now much of the genome exists
in triplicate compared with Arabidopsis (Lagercrantz, 1998;
Lukens et al., 2004; Lysak et al., 2005; Parkin et al., 2005). In
addition, Brassica shares the ancient polyploidy event seen in
Arabidopsis (Bowers et al., 2003; Parkin et al., 2005). Hence,
diploid Brassica species contain six genomic regions that are
homoeologous to a single ancestral region. Therefore, it is
feasible that a direct comparison of two Arabidopsis, six Brassica, and three Cleome homoeologous regions could be made.
Genome Evolution
After polyploidization, genomes begin the diploidization process
involving both gene loss and chromosomal rearrangements (Ma
and Gustafson, 2005). A comparison of our Cleome and Arabidopsis regions allows us to examine patterns of gene loss,
microsynteny, and their effects on genome size.
The patterns of gene loss are important for understanding both
the evolution of the genes themselves and the genome in
general. For genes present in both genomes, 60% of Arabidopsis
genes are now present in single copy compared with 48% in
Cleome. Although more genes are found in single copy in A.
thaliana, the degree of gene retention is higher (70% versus
53%). The lower degree of gene retention in Cleome is surprising
considering it is a much younger polyploidy event. The difference
of gene loss between the two lineages is likely attributable to the
initial degree of replication. For a gene to become single copy in
Cleome, two paralogous sequences must be lost. Most replicated genes in Cleome are duplicated (23 pairs) rather than
triplicated (only three loci), meaning one gene copy has been
1161
lost. A similar degree of high gene loss in ancient triplicated
genomes has been found by analysis of the diploid Brassica
genomes (O’Neill and Bancroft, 2000; Quiros et al., 2001; Rana
et al., 2004; Yang et al., 2005). These results of increased gene
loss in ancient hexaploids compared with ancient tetraploids is
consistent with an analysis of a large number of taxa where it was
found that mean DNA amount per ancestral genome tended to
decrease with increasing ploidy (Leitch and Bennett, 2004).
Of the genes found in the two Arabidopsis regions, we were
unable to find homologs to 24% of them in the Cleome regions.
Although most of these were members of gene families in
Arabidopsis, we did find evidence for the transduction of at least
one locus. Similarly, we predicted 36 unique ORFs in the three
Cleome regions. Many of these were short and may not represent
true genes. However, many of these Cleome ORFs were similar
to members of gene families in Arabidopsis but did not have
homologs within the target interval. Hence, in both Arabidopsis
and Cleome, there appears to be a migration of members of gene
families in and out of various genomic locations that affects the
microsynteny of the genomes.
In addition to gene loss and movement, chromosomal rearrangements play an important role in the evolution of genomes
after polyploidization. The three regions in Cleome and the two
regions in Arabidopsis showed a remarkable colinearity across
our target region despite the degree of gene loss and movement
that has occurred in each lineage subsequent to polyploidization.
We detected a single inversion or rearrangement in one of the
sequenced BACs, but this lay outside of our target interval. The
degree of large-scale rearrangements can be ascertained by
comparative mapping between Brassicaceae species (e.g.,
Koch et al., 2001; Parkin et al., 2005) and a genetic linkage map
of Cleome. Such broad taxonomic comparative mapping has
been very successful for understanding genome evolution in the
grasses (Feuillet and Keller, 2002; Devos, 2005).
The processes of gene loss and chromosomal rearrangements
will obviously contribute to the evolution of genome size. Within
our target interval, the three Cleome regions are only 1.2 times
the size of the two Arabidopsis regions, and the complete
genome of Cleome is ;1.9 times the size of Arabidopsis (M.E.
Schranz and T. Mitchell-Olds, unpublished data; Johnston et al.,
2005). Hence, the Cleome genome is quite compact, especially
considering that the A. thaliana genome is greatly reduced compared with other members of the genus Arabidopsis (Johnston
et al., 2005). For example, the closest nonpolyploid relative of
A. thaliana reported in Johnston et al. (2005), A. halleri, has a
genome 1.6 times that of A. thaliana. The small genome size of
Cleome may be due to both increased gene loss and suppression of transposon activity. Within our Cleome BACs, there were
very few transposons, with the only clear similarity to ubiquitous
LINE elements. The ancient hexaploid Brassica genomes range
from 3.3 to 4.4 times the size of Arabidopsis (Johnston et al.,
2005). Therefore, although there is substantial gene loss in
Brassica genomes after polyploidization, the overall genome
size is relatively large because of transposon expansions (Zhang
and Wessler, 2004). The combination of small genome size,
independent polyploidy, and close phylogenetic position to
Arabidopsis all make Cleome an attractive candidate for complete genome sequencing.
1162
The Plant Cell
The use of both comparative genomic and phylogenetic
approaches was critical for our elucidation of the independent
polyploidy events. A phylogenetic approach alone, for example,
using EST sequences, would likely have detected independent
polyploid events but probably would have concluded that
Cleome was duplicated rather than triplicated. A comparative
mapping approach may have been able to detect ancient triploidy, but without the phylogenetic analysis it would have been
impossible to say that they were independent polyploidy events.
Our genomic approach also allowed for an assessment of the
types of genes present in duplicate versus single copy. Finally,
the occurrence of independent ancient polyploidy events in
closely related lineages provides an interesting opportunity to
examine duplicate gene evolution and its contribution to phenotypic diversity.
METHODS
Direct nucleic acid labeling and detection system (Amersham Biosciences) according to the manufacturer’s protocol.
Several positive clones were isolated from each hybridization and
amplified in small overnight culture, and DNA minipreparations for each
BAC were done using standard alkaline lysis procedures. The minipreparations of BAC DNA were fingerprinted by restriction digest and
screened by PCR with primers to many of the putative genes located
within the genomic target region.
Three distinct BAC regions, each covering the approximate interval
from At5g15650 to At5g16130, were grown in 250 mL TB liquid cultures
with chloramphenicol selection. BAC DNA isolation was done using the
NucleoBond BAC100 kit (Macherey-Nagel) according to the manufacturer’s instructions. The BAC DNA was then cleaned of contaminating
sequences using plasmid-safe ATP-dependent DNase (Epicentre Biotechnologies). The selected BAC clones were shotgun sequenced by
subjecting each to separate partial digestions with Tsp509I and Sau3AI
enzymes, size fractionating into 12- to 18-kb fragments, cloning into a
pUC19 vector, and end-sequencing the clones with an ABI3700 capillary
sequencer. Sequences were assembled and edited with Seqman 5.0
(DNASTAR).
Plant Materials and DNA Isolation
Plants were grown from seeds of Cleome spinosa (ES1046; Spinnenpflanze) purchased from Kiepenkerl. DNA isolations were done using
Qiagen genomic tips and by a modified CTAB procedure as previously
described (Schranz et al., 2005).
Amplification, Cloning, and Sequencing of PCR Products
Conserved primer pairs were identified from the alignment of paralogous
gene pairs found in the Arabidopsis thaliana genome (Blanc et al., 2003;
Bowers et al., 2003). Most of the primers were to loci within our target
duplication region between At3 and At5 (see below). However, an
additional four primer pairs were to paralogous genes located in other
parts of the A. thaliana genome. In addition, several primer sets were
designed to genes found in single copy in A. thaliana. The genes and
primer sequences used can be found in Supplemental Table 1 online.
PCR products were obtained with 35 cycles as follows: 948C, 30
s/608C, 30 s/728C, 2 min. Two independent PCRs were performed for
each sample, and products were cloned using the TOPO TA cloning kit
(Invitrogen Life Technologies). At a minimum, four clones per cloning
reaction were sequenced as above on both strands with an ABI3700
capillary sequencer (Applied Biosystems).
BAC Library
A BAC library for our C. spinosa genotype was made by Keygene and
cloned into a pIndigoBAC-536 vector. Colony picking and filter spotting
was done by RZPD (library No. 1097).
Clone Identification and Sequencing
For comparative genomic analyses, we identified C. spinosa BAC regions
that corresponded to a duplicated A. thaliana region between At3 and
At5. The target regions in Arabidopsis were centered on paralogous pairs
of the SEP and CO regulatory genes (SEP2 [At3g02310] and COL2
[At3g02380] located on At3 and SEP1 [At5g15800] and CO [At5g15840]
located on At5).
The homologous C. spinosa BAC clones were identified by filter
hybridization of gridded nylon membranes containing the clones of the
BAC library. Gene products for the Cleome homologs of CO/COL2 and
SEP1 and SEP2 were amplified, and PCR products were gel-purified
using the QIAquick gel extraction kit (Qiagen). One hundred nanograms of
each purified PCR product was labeled and hybridized using the ECL
Sequence Annotation
Gene detection from each C. spinosa BAC sequence was done using the
TwinScan program (Korf et al., 2001) with A. thaliana as the informant
genome. The homology of each ORF predicted by TwinScan relative to
A. thaliana was ascertained using BLASTN or WU-BLAST servers at the
National Center for Biotechnology Information and The Arabidopsis
Information Resource (TAIR) (Altschul et al., 1990; WU-BLAST, W. Gish,
1996–2004; http://blast.wustl.edu). We named the genes using the first
letters of the species name followed by the BAC number (1, 2, or 3) and
the unique identifier from the TAIR ID in A. thaliana. If the homology was to
a duplicated locus between At3 and At5, we always used the homology to
At5 for naming purposes. For example, the three homologs of At5g15650
genes in C. spinosa are named as follows: Cs1_At5g15650,
Cs2_At5g15650, and Cs3_At5g15650. The BLAST analyses were also
used to identify additional homologous coding sequences from other taxa
to be used in phylogenetic analyses.
Sequence Alignment and Phylogenetic Analysis
Duplicate copies of 13 loci from the target region and an additional four
loci from cloned PCR products from other regions were found in both
A. thaliana and C. spinosa. The coding sequences of these loci were used
for phylogenetic and molecular evolutionary analyses.
Alignments of homologous sets of loci were done on translated protein
sequences using ClustalW (Thompson et al., 1994) with a gap open
penalty of 8 and gap extension penalty of 0.3. All nucleotide sequence
alignments were then visually inspected to improve the alignment.
Phylogenetic reconstruction was performed by analyzing DNA sequences with maximum-likelihood (ML) methods in the PHYML program
(Guindon et al., 2005). For the likelihood analyses of each of the independent and combined analyses, we selected models of molecular
evolution on the basis of MODELTEST (Posada and Crandall, 1998) as
implemented by FindModel (http://hcv.lanl.gov/content/hcvdb/findmodel/
findmodel.html). The ML parameter values were optimized, with a BIONJ
tree as a starting point (Gascuel, 1997). Support values for nodes on
the ML tree were estimated with 1000 bootstrap replicates (Felsenstein,
1985).
In addition, the aligned duplicated gene coding sequences from At3,
At5, and the three Cleome BAC regions were each concatenated into one
large sequence. Missing genes were treated as gaps. The total aligned
sequence was 12,934 bp long. In order to examine the overall relationships of the regions, these five concatenated sequences were used for
Ancient Polyploidy in Brassicales
phylogenetic analysis in the same manner as described above for
individual genes. The phylogenies were then examined for branching
patterns of Cleome copies relative to Arabidopsis copies for evidence of
orthologous versus paralogous relationships (Bowers et al., 2003; Chapman
et al., 2004; Pfeil et al., 2005).
1163
and assistance. Finally, we thank Maria Clauss, Chris Pires, Martin
Lysak, Amy Bouck, Marcus Koch, David Baum, George Coupland,
Laurent Corbesier, Jeff Doyle, and Ihsan Al-Shehbaz for illuminating
discussions. The Max Planck Gesellschaft and Duke University provided
support for this research.
Evolutionary Analysis
For 13 genes that were duplicated in both Arabidopsis and Cleome,
synonymous substitution rates (Ks values) were calculated from the
alignments of the coding regions using the DnaSP v.4.10.4 computer program (Rozas et al., 2003). There were three categories of comparison:
(1) Arabidopsis-Arabidopsis, (2) Arabidopsis-Cleome, and (3) CleomeCleome. Within each category of comparison, we calculated Ks among
all pairwise gene combinations and summarized the central tendency of
each comparison using the median Ks value. These calculations were
performed for each locus. Finally, the median value for each comparison for
each locus was used as input data for a fixed effects analysis of variance
testing cross-classified effects of comparison and locus using Systat v.10
(Systat Software). Least square mean values of Ks for these three categories of comparison are reported in Table 1. Table 2 compares Ks values
among Arabidopsis paralogs versus among Cleome paralogs (crossclassified analysis of variance with 1 df for the species comparison).
We built a 2 3 2 contingency table (Table 3) to compare the observed and
expected numbers of duplicate versus single-copy genes under the hypothesis of independence between the two lineages and performed a
standard x2 test of independence using DnaSP v.4.10.4 (Rozas et al., 2003).
GO annotations for A. thaliana genes in our target genomic regions
were downloaded from TAIR (www.arabidopsis.org). The GO annotations
of genes found in duplicate in both Arabidopsis and Cleome were
contrasted to those found in single copy in both (Figure 7).
The amount of gene loss in the target region was calculated for both
Arabidopsis and Cleome, assuming that there were 50 ancestral genes
(the number of loci shared between the two genomes in the target region).
Genome duplication in the Arabidopsis lineage would result in 100 loci,
and a genome triplication in Cleome with 150 loci. The number of loci for
these shared genes now present in the two genomes was used to
calculate the degree of gene loss.
Accession Numbers
Annotated sequence data for each BAC can be found in the GenBank/
EMBL data libraries under accession numbers DQ415920, DQ415921,
and DQ415922. SEP1 and SEP2 homologs from Boechera stricta can be
found under accession numbers DQ415916 and DQ415918. Duplicated
ADC loci from C. spinosa can be found under accession numbers
DQ415917 and DQ415919.
Supplemental Data
The following materials are available in the online version of this article.
Supplemental Table 1. PCR Primers Used to Clone Homologs from
Cleome spinosa.
Supplemental Table 2. All Genes Found in the Homoeologous Target
Regions in Arabidopsis thaliana and Cleome spinosa, with the Shared
Genes between the Two Lineages Placed into a Reconstructed
Ancestral Gene Order.
ACKNOWLEDGMENTS
We thank Jocelyn Hall and the three reviewers for helpful comments on
the manuscript. We also thank Heiko Vogel, Karl Schmid, Domenica
Schnabelrauch, Xiaoyu Zhang, and Aaron Windsor for technical advice
Received January 13, 2006; revised February 27, 2006; accepted March
21, 2006; published April 14, 2006.
REFERENCES
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J.
(1990). Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410.
APG II (2003). An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J.
Linn. Soc. 141, 399–436.
Arabidopsis Genome Initiative (2000). Analysis of the genome
sequence of the flowering plant Arabidopsis thaliana. Nature 408,
796–815.
Blanc, G., Hokamp, K., and Wolfe, K.H. (2003). A recent polyploidy
superimposed on older large-scale duplications in the Arabidopsis
genome. Genome Res. 13, 137–144.
Blanc, G., and Wolfe, K.H. (2004a). Widespread paleopolyploidy in
model plant species inferred from age distributions of duplicate
genes. Plant Cell 16, 1667–1678.
Blanc, G., and Wolfe, K.H. (2004b). Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant
Cell 16, 1679–1691.
Bowers, J.E., Chapman, B.A., Rong, J.K., and Paterson, A.H. (2003).
Unravelling angiosperm genome evolution by phylogenetic analysis of
chromosomal duplication events. Nature 422, 433–438.
Brown, N.J., Parsley, K., and Hibberd, J.M. (2005). The futured C-4
research – Maize, Flaveria or Cleome? Trends Plant Sci. 10, 215–221.
Cannon, S.B., and Young, N.D. (2003). OrthoParaMap: Distinguishing
orthologs from paralogs by integrating comparative genome data and
gene phylogenies. BMC Bioinformatics 4, 35.
Chapman, B.A., Bowers, J.E., Feltus, F.A., and Paterson, A.H. (2006).
Buffering of crucial functions by paleologous duplicated genes may
contribute cyclicality to angiosperm genome duplication. Proc. Natl.
Acad. Sci. USA 103, 2730–2735.
Chapman, B.A., Bowers, J.E., Schulze, S.R., and Paterson, A.H.
(2004). A comparative phylogenetic approach for dating whole genome duplication events. Bioinformatics 20, 180–185.
Corbesier, L., and Coupland, G. (2005). Photoperiodic flowering of
Arabidopsis: Integrating genetic and physiological approaches to
characterization of the floral stimulus. Plant Cell Environ. 28, 54–66.
De Bodt, S., Maere, S., and Van de Peer, Y. (2005). Genome duplication and the origin of angiosperms. Trends Ecol. Evol. 20, 591–597.
Devos, K.M. (2005). Updating the ‘crop circle’. Curr. Opin. Plant Biol. 8,
155–162.
Driskell, A.C., Ane, C., Burleigh, J.G., McMahon, M.M., O’Meara,
B.C., and Sanderson, M.J. (2004). Prospects for building the tree of
life from large sequence databases. Science 306, 1172–1174.
Duarte, J.M., Cui, L.Y., Wall, P.K., Zhang, Q., Zhang, X.H., LeebensMack, J., Ma, H., Altman, N., and dePamphilis, C.W. (2006).
Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol. Biol. Evol. 23, 469–478.
Ermolaeva, M.D., Wu, M., Eisen, J.A., and Salzberg, S.L. (2003). The
age of the Arabidopsis thaliana genome duplication. Plant Mol. Biol.
51, 859–866.
1164
The Plant Cell
Felsenstein, J. (1985). Confidence limits on phylogenies: An approach
using the bootstrap. Evolution Int. J. Org. Evolution 39, 783–791.
Feuillet, C., and Keller, B. (2002). Comparative genomics in the grass
family: Molecular characterization of grass genome structure and
evolution. Ann. Bot. (Lond.) 89, 3–10.
Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., and
Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545.
Gallardo, M.H., Bickham, J.W., Honeycutt, R.L., Ojeda, R.A.,
and Kohler, N. (1999). Discovery of tetraploidy in a mammal. Nature
401, 341.
Galloway, G.L., Malmberg, R.L., and Price, R.A. (1998). Phylogenetic
utility of the nuclear gene arginine decarboxylase: An example from
Brassicaceae. Mol. Biol. Evol. 15, 1312–1320.
Gascuel, O. (1997). BIONJ: An improved version of the NJ algorithm
based on a simple model of sequence data. Mol. Biol. Evol. 14,
685–695.
Gaut, B.S., and Doebley, J.F. (1997). DNA sequence evidence for the
segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA
94, 6809–6814.
Guindon, S., Lethiec, F., Duroux, P., and Gascuel, O. (2005). PHYML
Online – A web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33, W557–W559.
Hall, J.C., Iltis, H.H., and Sytsma, K.J. (2004). Molecular phylogenetics
of core brassicales, placement of orphan genera Emblingia, Forchhammeria, Tirania, and character evolution. Syst. Bot. 29, 654–669.
Hall, J.C., Sytsma, K.J., and Iltis, H.H. (2002). Phylogeny of Capparaceae and Brassicaceae based on chloroplast sequence data. Am. J.
Bot. 89, 1826–1842.
Hanfrey, C., Franceschetti, M., Mayer, M.J., Illingworth, C., Elliott,
K., Collier, M., Thompson, B., Perry, B., and Michael, A.J. (2003).
Translational regulation of the plant S-adenosylmethionine decarboxylase. Biochem. Soc. Trans. 31, 424–427.
Irish, V.F., and Litt, A. (2005). Flower development and evolution: Gene
duplication, diversification and redeployment. Curr. Opin. Genet. Dev.
15, 454–460.
Jaillon, O., et al. (2004). Genome duplication in the teleost fish
Tetraodon nigroviridis reveals the early vertebrate proto-karyotype.
Nature 431, 946–957.
Johnston, J.S., Pepper, A.E., Hall, A.E., Chen, Z.J., Hodnett, G.,
Drabek, J., Lopez, R., and Price, H.J. (2005). Evolution of genome
size in Brassicaceae. Ann. Bot. (Lond.) 95, 229–235.
Kashkush, K., Feldman, M., and Levy, A.A. (2002). Gene loss, silencing and activation in a newly synthesized wheat allotetraploid.
Genetics 160, 1651–1659.
Kellis, M., Birren, B.W., and Lander, E.S. (2004). Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624.
Koch, M., Haubold, B., and Mitchell-Olds, T. (2001). Molecular systematics of the Brassicaceae: Evidence from coding plastidic matK
and nuclear Chs sequences. Am. J. Bot. 88, 534–544.
Korf, I., Flicek, P., Duan, D., and Brent, M.R. (2001). Integrating
genomic homology into gene structure prediction. Bioinformatics 17,
S140–S148.
Lagercrantz, U. (1998). Comparative mapping between Arabidopsis
thaliana and Brassica nigra indicates that Brassica genomes have evolved
through extensive genome replication accompanied by chromosome
fusions and frequent rearrangements. Genetics 150, 1217–1228.
Le Comber, S.C., and Smith, C. (2004). Polyploidy in fishes: Patterns
and processes. Biol. J. Linn. Soc. Lond. 82, 431–442.
Ledger, S., Strayer, C., Ashton, F., Kay, S.A., and Putterill, J. (2001).
Analysis of the function of two circadian-regulated CONSTANS-LIKE
genes. Plant J. 26, 14–22.
Leitch, I.J., and Bennett, M.D. (2004). Genome downsizing in polyploid
plants. Biol. J. Linn. Soc. Lond. 82, 651–663.
Lukens, L.N., Quijada, P.A., Udall, J., Pires, J.C., Schranz, M.E., and
Osborn, T.C. (2004). Genome redundancy and plasticity within ancient and recent Brassica crop species. Biol. J. Linn. Soc. Lond. 82,
665–674.
Lynch, M., and Conery, J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155.
Lysak, M.A., Koch, M.A., Pecinka, A., and Schubert, I. (2005).
Chromosome triplication found across the tribe Brassiceae. Genome
Res. 15, 516–525.
Ma, X.F., and Gustafson, J.P. (2005). Genome evolution of allopolyploids: A process of cytological and genetic diploidization. Cytogenet.
Genome Res. 109, 236–249.
Mable, B.K. (2004). Why polyploidy is rarer in animals than in plants’:
myths and mechanisms. Biol. J. Linn. Soc. Lond. 82, 453–466.
Maere, S., De Bodt, S., Raes, J., Casneuf, T., Van Montagu, M.,
Kuiper, M., and Van de Peer, Y. (2005). Modeling gene and genome
duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102, 5454–
5459.
Malcomber, S.T., and Kellogg, E.A. (2005). SEPALLATA gene diversification: Brave new whorls. Trends Plant Sci. 10, 427–435.
O’Neill, C.M., and Bancroft, I. (2000). Comparative physical mapping
of segments of the genome of Brassica oleracea var. alboglabra that
are homoeologous to sequenced regions of chromosomes 4 and 5 of
Arabidopsis thaliana. Plant J. 23, 233–243.
Osborn, T.C., Pires, J.C., Birchler, J.A., Auger, D.L., Chen, Z.J., Lee,
H.S., Comai, L., Madlung, A., Doerge, R.W., Colot, V., and
Martienssen, R.A. (2003). Understanding mechanisms of novel
gene expression in polyploids. Trends Genet. 19, 141–147.
Panopoulou, G., and Poustka, A.J. (2005). Timing and mechanism of
ancient vertebrate genome duplications – The adventure of a hypothesis. Trends Genet. 21, 559–567.
Parkin, I.A.P., Gulden, S.M., Sharpe, A.G., Lukens, L., Trick, M.,
Osborn, T.C., and Lydiate, D.J. (2005). Segmental structure of the
Brassica napus genome based on comparative analysis with Arabidopsis thaliana. Genetics 171, 765–781.
Paterson, A.H., Bowers, J.E., and Chapman, B.A. (2004). Ancient
polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. USA 101,
9903–9908.
Pfeil, B.E., Schlueter, J.A., Shoemaker, R.C., and Doyle, J.J. (2005).
Placing paleopolyploidy in relation to taxon divergence: A phylogenetic analysis in legumes using 39 gene families. Syst. Biol. 54,
441–454.
Posada, D., and Crandall, K.A. (1998). MODELTEST: Testing the model
of DNA substitution. Bioinformatics 14, 817–818.
Prince, V.E., and Pickett, F.B. (2002). Splitting pairs: The diverging
fates of duplicated genes. Nat. Rev. Genet. 3, 827–837.
Ptacek, M.B., Gerhardt, H.C., and Sage, R.D. (1994). Speciation by
polyploidy in treefrogs – Multiple origins of the tetraploid, Hyla
versicolor. Evolution Int. J. Org. Evolution 48, 898–908.
Quiros, C.F., Grellet, F., Sadowski, J., Suzuki, T., Li, G., and
Wroblewski, T. (2001). Arabidopsis and Brassica comparative genomics: Sequence, structure and gene content in the ABI1-Rps2-Ck1
chromosomal segment and related regions. Genetics 157, 1321–1330.
Rana, D., Boogaart, T., O’Neill, C.M., Hynes, L., Bent, E., Macpherson,
L., Park, J.Y., Lim, Y.P., and Bancroft, I. (2004). Conservation of
the microstructure of genome segments in Brassica napus and its
diploid relatives. Plant J. 40, 725–733.
Rozas, J., Sanchez-DelBarrio, J.C., Messeguer, X., and Rozas, R.
(2003). DnaSP, DNA polymorphism analyses by the coalescent and
other methods. Bioinformatics 19, 2496–2497.
Ancient Polyploidy in Brassicales
Sanchez-Acebo, L. (2005). A phylogenetic study of the new world
Cleome (Brassicaceae, Cleomoideae). Ann. Mo. Bot. Gard. 92,
179–201.
Schranz, M.E., Dobes, C., Koch, M.A., and Mitchell-Olds, T. (2005).
Sexual reproduction, hybridization, apomixis, and polyploidization in
the genus Boechera (Brassicaceae). Am. J. Bot. 92, 1797–1810.
Seoighe, C., and Gehring, C. (2004). Genome duplication led to highly
selective expansion of the Arabidopsis thaliana proteome. Trends
Genet. 20, 461–464.
Simillion, C., Vandepoele, K., Van Montagu, M.C.E., Zabeau, M., and
Van de Peer, Y. (2002). The hidden duplication past of Arabidopsis
thaliana. Proc. Natl. Acad. Sci. USA 99, 13627–13632.
Song, K.M., Lu, P., Tang, K.L., and Osborn, T.C. (1995). Rapid genome
change in synthetic polyploids of Brassica and its implications for
polyploid evolution. Proc. Natl. Acad. Sci. USA 92, 7719–7723.
Sterck, L., Rombauts, S., Jansson, S., Sterky, F., Rouze, P., and Van
de Peer, Y. (2005). EST data suggest that poplar is an ancient
polyploid. New Phytol. 167, 165–170.
Tate, J.A., Soltis, D.E., and Soltis, P.S. (2005). Polyploidy in plants. In
The Evolution of the Genome, T.R. Gregory, ed (San Diego, CA:
Elsevier), pp. 371–426.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994). CLUSTAL W:
Improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and
weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Van de Peer, Y. (2004). Computational approaches to unveiling ancient
genome duplications. Nat. Rev. Genet. 5, 752–763.
Vision, T.J., Brown, D.G., and Tanksley, S.D. (2000). The origins of
genomic duplications in Arabidopsis. Science 290, 2114–2117.
Warwick, S.I., and Al-Shehbaz, I.A. (2006). Brassicaceae: Chromosome number index and database on CD-ROM. In Evolution and
1165
Phylogeny of the Brassicaceae, M.A. Koch and K. Mummenhoff, eds
(Heidelberg, Germany: Springer-Verlag), in press.
Wendel, J.F. (2000). Genome evolution in polyploids. Plant Mol. Biol.
42, 225–249.
Windsor, A.J., Schranz, M.E., Formanova, N., Gebauer-Jung, S.,
Bishop, J.G., Schnabelrauch, D., Kroymann, J., and Mitchell-Olds,
T. (2006). Partial shotgun sequencing of the Boechera stricta genome
reveals extensive microsynteny and promoter conservation with
Arabidopsis. Plant Physiol. 140, 1169–1182.
Wolfe, K.H. (2001). Yesterday’s polyploids and the mystery of diploidization. Nat. Rev. Genet. 2, 333–341.
Yang, T.J., Kim, J.S., Lim, K.B., Kwon, S.J., Kim, J.A., Jin, M., Park,
J.Y., Lim, M.H., Kim, H.I., Kim, S.H., Lim, Y.P., and Park, B.S.
(2005). The Korea Brassica Genome Project: A glimpse of the
Brassica genome based on comparative genome analysis with
Arabidopsis. Comp. Funct. Genomics 6, 138–146.
Yang, Y.W., Lai, K.N., Tai, P.Y., and Li, W.H. (1999). Rates of
nucleotide substitution in angiosperm mitochondrial DNA sequences
and dates of divergence between Brassica and other angiosperm
lineages. J. Mol. Evol. 48, 597–604.
Yu, J., et al. (2005). The genomes of Oryza sativa: A history of
duplications. PLoS Biol. 3, e38.
Zahn, L.M., King, H.Z., Leebens-Mack, J.H., Kim, S., Soltis, P.S.,
Landherr, L.L., Soltis, D.E., dePamphilis, C.W., and Ma, H. (2005).
The evolution of the SEPALLATA subfamily of MADS-Box genes: A
preangiosperm origin with multiple duplications throughout angiosperm history. Genetics 169, 2209–2223.
Zhang, X.Y., and Wessler, S.R. (2004). Genome-wide comparative
analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. Proc. Natl. Acad. Sci. USA 101,
5589–5594.
Independent Ancient Polyploidy Events in the Sister Families Brassicaceae and Cleomaceae
M. Eric Schranz and Thomas Mitchell-Olds
Plant Cell 2006;18;1152-1165; originally published online April 14, 2006;
DOI 10.1105/tpc.106.041111
This information is current as of July 31, 2017
Supplemental Data
/content/suppl/2006/03/31/tpc.106.041111.DC1.html
References
This article cites 75 articles, 24 of which can be accessed free at:
/content/18/5/1152.full.html#ref-list-1
Permissions
https://www.copyright.com/ccc/openurl.do?sid=pd_hw1532298X&issn=1532298X&WT.mc_id=pd_hw1532298X
eTOCs
Sign up for eTOCs at:
http://www.plantcell.org/cgi/alerts/ctmain
CiteTrack Alerts
Sign up for CiteTrack Alerts at:
http://www.plantcell.org/cgi/alerts/ctmain
Subscription Information
Subscription Information for The Plant Cell and Plant Physiology is available at:
http://www.aspb.org/publications/subscriptions.cfm
© American Society of Plant Biologists
ADVANCING THE SCIENCE OF PLANT BIOLOGY