A Gene Duplication/Loss Event in the Ribulose-1,5-Bisphosphate-Carboxylase/Oxygenase (Rubisco) Small
Subunit Gene Family among Accessions of Arabidopsis
thaliana
Sandra Schwarte and Ralph Tiedemann*
Evolutionary Biology, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
*Corresponding author: E-mail: [email protected].
Associate editor: Neelima Sinha
Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase; EC 4.1.1.39), the most abundant protein in nature, catalyzes the
assimilation of CO2 (worldwide about 10¹¹ t each year) by carboxylation of ribulose-1,5-bisphosphate. It is a hexadecamer
consisting of eight large and eight small subunits. Although the Rubisco large subunit (rbcL) is encoded by a single gene on
the multicopy chloroplast genome, the Rubisco small subunits (rbcS) are encoded by a family of nuclear genes. In
Arabidopsis thaliana, the rbcS gene family comprises four members, that is, rbcS-1a, rbcS-1b, rbcS-2b, and rbcS-3b. We
sequenced all Rubisco genes in 26 worldwide distributed A. thaliana accessions. In three of these accessions, we detected
a gene duplication/loss event, where rbcS-1b was lost and substituted by a duplicate of rbcS-2b (called rbcS-2b*). By
screening 74 additional accessions using a specific polymerase chain reaction assay, we detected five additional accessions
with this duplication/loss event. In summary, we found the gene duplication/loss in 8 of 100 A. thaliana accessions,
namely, Bch, Bu, Bur, Cvi, Fei, Lm, Sha, and Sorbo. We also sequenced an approximately 1-kb promoter region of each Rubisco gene.
This analysis revealed that the gene duplication/loss event was associated with promoter alterations (two insertions
of 450 and 850 bp, one deletion of 730 bp) in rbcS-2b and a promoter deletion (2.3 kb) in rbcS-2b* in all eight affected
accessions. The substitution of rbcS-1b by a duplicate of rbcS-2b (i.e., rbcS-2b*) might have been caused by gene conversion. All
four Rubisco genes evolve under purifying selection, as expected for central genes of the highly conserved photosynthetic machinery of
green plants. We inferred a single positively selected site, a tyrosine to aspartic acid substitution at position 72 in rbcS-1b.
Exactly the same substitution compromises carboxylase activity in the cyanobacterium Anacystis nidulans. In A. thaliana,
this substitution is associated with an inferred recombination. Functional implications of the substitution remain to be
evaluated.
Key words: Arabidopsis thaliana, Arabidopsis lyrata, Rubisco, gene duplication, positive selection.
Introduction
Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase;
EC 4.1.1.39) is located in the stroma of the chloroplast,
where it catalyzes carboxylation in the Calvin cycle and oxygenation in the photorespiratory pathway. The composition of the Rubisco holoenzyme varies among species. In
purple nonsulfur bacteria, several chemoautotrophic bacteria, and eukaryotic dinoflagellates, the enzyme complex is
a dimer consisting only of two large subunits (form II); in
most chemoautotrophic bacteria, cyanobacteria, red,
brown, and green algae, and all higher plants, the multimeric enzyme consists of eight Rubisco small subunits (rbcS)
and eight Rubisco large subunits (rbcL) (form I) (Baker
et al. 1975; Spreitzer and Salvucci 2002; Andersson and
Backlund 2008). The large subunit (rbcL, AtCg00490) is encoded by a single gene in the chloroplast (Bedbrook et al.
1979). Small subunits are encoded by a multigene family in
the nucleus (Dean et al. 1989). The number of rbcS genes varies widely, ranging from two gene copies
in Chlamydomonas to 22 or more genes in wheat (Spreitzer
2003). In Arabidopsis thaliana, the gene family comprises
four members (rbcS-1a, At1g67090; rbcS-1b, At5g38430;
rbcS-2b, At5g38420; rbcS-3b, At5g38410). With regard to
their chromosomal location, they have been divided into
two classes, A and B (Krebbers et al. 1988), the former comprising only one copy (rbcS-1a) located on chromosome 1,
whereas the three class B genes (rbcS-b) are tandemly arrayed
within 8 kb on chromosome 5.
Individual members of the rbcS gene family show unique
expression levels as well as developmental and tissue-specific expression patterns (Donald and Cashmore 1990; Dedonder
et al. 1993; Sawchuk et al. 2008). In general, rbcS-1a is
the major form (highest expression level), whereas rbcS-1b, rbcS-2b, and rbcS-3b are minor forms (lower levels
of expression) (Dedonder et al. 1993; Yoon et al. 2001;
Sawchuk et al. 2008). Regarding developmental and tissue-specific expression patterns, the detailed analysis of small
subunits in A. thaliana by Sawchuk et al. (2008) generally
found all small subunits to be expressed in seedlings as
well as in mature plant organs, though tissue-specific patterns occurred: RbcS-1a, for instance, is the only one expressed in roots, where its biological function has not yet been revealed.

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Research article. Mol. Biol. Evol. 28(6):1861–1876. 2011 doi:10.1093/molbev/msr008
Advance Access publication January 10, 2011

An interesting peculiarity in the expression
pattern has been described for leaves, where rbcS-1b is exclusively expressed in the abaxial (lower) side of the leaf.
Although carbon fixation is proportional to light intensity
and hence higher at the adaxial (upper) side of the leaf,
Rubisco contents at the abaxial and adaxial surfaces are
similar (Nishio et al. 1993; Sun and Nishio 2001). This suggests that rbcS-1b might modify the catalytic properties of
the Rubisco holoenzyme such that its efficiency under
light-limiting conditions is maintained (Sawchuk et al.
2008). It was demonstrated that expression levels of
rbcS-1a, rbcS-2b, and rbcS-3b show specific responses to
different light pulses, whereas rbcS-1b expression is not affected by any of them (Dedonder et al. 1993). This corresponds to the presence (rbcS-1a, rbcS-2b, and rbcS-3b) and
the absence (rbcS-1b) of light-regulatory units (CMA5: conserved modular array 5) in the promoter of the Rubisco
small subunits (Krebbers et al. 1988). Mutagenesis of the G-box (C/A-CACGTGGC) and the I-box (GATAAG)
in the Arabidopsis rbcS-1a promoter showed that both
are required for gene expression (Donald and Cashmore
1990). López-Ochoa et al. (2007) analyzed the sequences of
the G-box, the I-box, and IbAM5 across plant species and
concluded that these modules contain potential binding
sites for transcription factors and are essential for CMA5
activity. The spacing between G- and I-box (about 15–
25 bp) is important for achieving functional combinatorial
interactions, whereas the relative position does not seem to
be critical.
Although the large subunits contain the catalytic sites, small subunits have been shown to influence enzyme activity (Spreitzer
and Salvucci 2002; Spreitzer 2003; Andersson 2008; Andersson
and Backlund 2008). The βA–βB loop of the small subunits,
which is highly variable in both length and sequence, mediates numerous interactions between
the large and the small subunits. It can hence profoundly influence the holoenzyme’s stability and catalytic
performance (Spreitzer et al. 2001; Spreitzer 2003). Rubisco
enzymes with enlarged βA–βB loops, as in land plants and
green algae, generally have higher CO2/O2 specificity values
than enzymes with normal βA–βB loops. In short,
small subunits are not responsible for the catalytic activity
itself but rather for fine tuning of the Rubisco holoenzyme.
RbcS has been repeatedly engineered by inducing amino
acid substitutions at specific sites in different species, like
pea (Flachmann and Bohnert 1992), Chlamydomonas (Du
et al. 2000; Spreitzer et al. 2001), Anacystis nidulans (= Synechococcus PCC6301) (Voordouw et al. 1987; Lee et al. 1991;
Paul et al. 1991; Read and Tabita 1992; Flachmann et al.
1997; Kostov et al. 1997), and Anabaena 7120 (Fitchen
et al. 1990). Although some substitutions have no effect on Rubisco activity, others influence
carboxylation activity, specificity, or even the formation
of the holoenzyme.
In this study, we sequenced the genes of the large as well as
the four small subunits of Rubisco in 26 worldwide-distributed A. thaliana accessions and detected a gene duplication/loss event, where rbcS-1b was lost and substituted
by a duplicate of rbcS-2b (called rbcS-2b*) in three (Bur,
Cvi, and Sha) of these accessions. We developed a specific
polymerase chain reaction (PCR) assay by which we
screened 74 additional accessions. The newly found
gene duplication/loss event occurred in eight (Bch, Bu,
Bur, Cvi, Fei, Lm, Sha, and Sorbo) of 100 accessions. We also sequenced an approximately 1-kb promoter region of each Rubisco
gene. This analysis revealed that the gene
duplication/loss event was linked to promoter alterations
(two insertions of 450 and 850 bp, one deletion of 730 bp)
in rbcS-2b and a promoter deletion (2.3 kb) in rbcS-2b* in
all eight affected accessions. Functional implications of the
gene duplication/loss event on Rubisco holoenzyme as well
as intraspecific variability among A. thaliana accessions and
interspecific polymorphisms between A. thaliana and its
sister species Arabidopsis lyrata are discussed.
Material and Methods
Plant Cultivation, PCR Amplification, and Sequencing
Seedlings as well as adult plants were grown in a 1:1 mixture
of GS 90 soil and vermiculite. To break dormancy prior to
germination, seeds were incubated at 4 °C for at least 2
days before transfer to a short-day regime (12 h light
[120 µE m⁻² s⁻¹] at 20 °C/12 h dark at 18 °C). Leaves were
harvested after 4 weeks, and genomic DNA was extracted
from a pool of three plants per accession with a modified
cetyltrimethylammonium bromide (CTAB) procedure (Rogers
and Bendich 1985). Primers for the Rubisco genes rbcL
(AtCg00490), rbcS-1a (At1g67090), rbcS-1b (At5g38430),
rbcS-2b (At5g38420), and rbcS-3b (At5g38410) were designed based on the sequence of Col-0 (see supplementary
table, Supplementary Material online). For amplification
and sequencing of coding regions, primers were placed
about 50–200 bp upstream (forward primer) or
downstream (reverse primer) of the coding region. The
promoter region was analyzed with primers amplifying
about 1.0–1.5 kb upstream of the start codon. The fragments
of 26 worldwide-distributed accessions (Bl-1, Bur-0, Can-0,
Cha-0, Col-0, Ct-1, Cvi-0, Edi-0, El-0, Er-0, Est-1, Gre-0, Ler-1,
Mt-0, Nok-2, Oy-0, Rsch-0, Sap-0, Sha[kdara], Stw-0, Te-0,
Tsu-1, Van-0, Wil, Ws-3, and Yo-0; hereafter called “sample
set I”) were amplified with the proofreading polymerase
Phusion (Finnzymes) and purified enzymatically using
Exonuclease I and Antarctic Phosphatase (New England Biolabs). The purified products were used directly as templates for sequencing on an
ABI 3130xl automated sequencer (Applied Biosystems), using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). After detection of a gene duplication/
loss event (see below), we developed a specific PCR assay
to screen 74 additional accessions for this event (AK-1,
Akita, Ang-0, Bay-0, Bch-1, Bd-0, Be-0, Bla-11, Blh-1, Bor-4, Br-0, Bsch-2, Bu-2, C24, Cl-0, Co-3, Da-0, Da-112, Dijon-M, Dr-0, Dra-0, Ei-2, Enkheim-D, Ep-0, Fei-0, Ge-2,
Goe-2, Gol-1, GOT-7, Gr, HI-3, HOG, Hs-0, Is-1, Je-54, Jea,
Kae-0, Kl-0, Kn-0, Konchezero [N13], Kondara-0, Lan-0,
Lip-0, Lm, Lov 5, Lu, Mh-1, Ms-0, Nd, NFA8, No-0, Nw-3,
Old-1, Ove-0, Petergof, Po-0, Pr-0, Pt-0, Pyl-0, Rak-2,
RLD-1, RRS-10, RRS-7, Ru[bezhnoe]-1, Sorbo, St-0, Ta-0,
TAMM-2, Ts-1, Ty-0, Wc-2, Wei-1, Zue-1, Wil-0; hereafter
called “sample set II”). The primers for detecting the gene
duplication/loss event were placed in intron 3 as well as in
the 3′-untranslated region (UTR) of rbcS-1b. In those accessions of sample set II where the PCR assay indicated the
gene duplication/loss event, we verified it by sequencing
the affected loci.
Data Analysis
Alignment
Sequences were assembled with BioEdit version 7.0.5 (Hall
1999), and all variable sites were checked manually during
the construction of a sequence contig for each accession.
All sequences were manually aligned to the reference
sequence of Col-0.
Estimation of Nucleotide Polymorphism
Genetic variability measures were calculated based on the
26 initially sequenced accessions (sample set I) for rbcL
(AtCg00490), rbcS-1a (At1g67090), rbcS-2b (At5g38420),
and rbcS-3b (At5g38410). For rbcS-1b (At5g38430), this
analysis was performed with only 23 sequences, as three accessions (Bur, Cvi, and Sha) turned out to lack this gene (see
below). With DnaSP version 5 (Librado and Rozas 2009) both
intra- and interspecific levels of nucleotide polymorphism
were determined. In a multidomain analysis, we estimated
the number of polymorphic sites (S), the total number
of substitutions (η), the number of alleles (h), haplotype diversity
(Hd), nucleotide diversity (π), and
GC content separately for promoters, exons, and introns.
Nucleotide diversity (π) and divergence (K) were calculated
according to Nei (1987). For the interspecific comparison,
we used A. lyrata sequences from the Joint Genome Institute (JGI).
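The two diversity measures of Nei (1987) used above can be illustrated with a minimal sketch. It assumes equal-length aligned sequences without gaps or missing data (DnaSP applies its own rules to such sites); the function names and toy haplotypes are hypothetical and this is not DnaSP's implementation.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei (1987) pi: average number of pairwise differences per site.

    Assumes equal-length aligned sequences without gaps or missing data.
    """
    n = len(seqs)
    length = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


def haplotype_diversity(seqs):
    """Nei (1987) Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Toy alignment of four hypothetical haplotypes (not data from this study).
toy = ["AAAA", "AAAT", "AAAT", "TAAA"]
print(round(nucleotide_diversity(toy), 4), round(haplotype_diversity(toy), 4))
```

Hd depends only on haplotype identity, whereas π also weights how different the haplotypes are, which is why the two can diverge for the same sample (compare the promoter and exon rows of table 2).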
Promoter Analyses
We sequenced approximately 1.0 kb of the promoter region of each
Rubisco gene to check whether polymorphic sites affect “functionally important elements” according to the plant promoter database (PPDB; Yamamoto and Obokata 2008),
which we searched for regulatory elements and other
important promoter regions, such as the TATA box.
Inference of Gene Conversion and Recombination
We searched for gene conversion within any accession
across the three Rubisco paralogs on chromosome 5
(i.e., rbcS-1b/rbcS-2b*, rbcS-2b, and rbcS-3b), using the GENECONV software with default settings (Sawyer 1989). For
each gene separately, the minimum number of recombination events (Rm) was inferred with DnaSP version 5 (Librado and
Rozas 2009).
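DnaSP reports Rm as the lower bound of Hudson and Kaplan (1985), which is derived from the four-gamete test. The following is a minimal sketch of that idea, not DnaSP's code: it assumes biallelic segregating sites without missing data, the function names are hypothetical, and the toy haplotypes are invented (only the site positions echo those reported in the Results).

```python
from itertools import combinations


def incompatible(col_i, col_j):
    """Four-gamete test: True if two biallelic sites show all four gametic types."""
    if len(set(col_i)) != 2 or len(set(col_j)) != 2:
        return False  # only biallelic sites are tested in this sketch
    return len(set(zip(col_i, col_j))) == 4


def hudson_kaplan_rm(seqs, positions):
    """Lower bound Rm on the number of recombination events.

    seqs: aligned haplotypes restricted to the segregating sites in `positions`.
    """
    columns = [[s[k] for s in seqs] for k in range(len(positions))]
    intervals = [
        (positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
        if incompatible(columns[i], columns[j])
    ]
    # Maximum number of non-overlapping incompatible intervals; intervals may
    # share an endpoint, because each still needs an event strictly inside it.
    rm, last_right = 0, float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left >= last_right:
            rm += 1
            last_right = right
    return rm


# Hypothetical haplotypes at three segregating sites.
print(hudson_kaplan_rm(["000", "011", "101", "110"], [304, 366, 669]))
```

Allowing touching intervals is what lets two events be counted in adjacent intervals such as 304–366 and 366–669.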
Evaluation of Genewise Selection
We tested whether nucleotide substitution patterns are indicative of natural selection. Specifically,
we compared the number of synonymous substitutions per synonymous site (dS) with the number of nonsynonymous substitutions per nonsynonymous site (dN),
as implemented in MEGA version 4 (Tamura et al. 2007).
The ratio of nonsynonymous to synonymous substitution rates (ω) was calculated according to the modified
Nei–Gojobori model (Nei and Kumar 2000) with the Jukes and Cantor (1969) correction for saturation/multiple hits.
With a Z-test, we tested the null hypothesis of neutral evolution (H0: dN = dS) against two alternatives, purifying selection (dN < dS) and positive
selection (dN > dS).
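The Jukes–Cantor correction and the Z statistic can be sketched as follows. This assumes that the uncorrected proportions pN and pS have already been obtained by Nei–Gojobori site counting and that the variance of dN − dS is supplied externally (MEGA estimates it by bootstrapping); the function names and input values are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor (1969) distance from a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dnds_z_test(p_n, p_s, var_diff):
    """Return (dN, dS, omega, Z) for a codon-based Z-test of H0: dN = dS.

    p_n, p_s: uncorrected proportions of nonsynonymous/synonymous differences.
    var_diff: variance of dN - dS, assumed to be supplied externally
    (for example from bootstrap resampling of codons).
    """
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    omega = d_n / d_s if d_s > 0 else float("inf")
    z = (d_n - d_s) / math.sqrt(var_diff)
    return d_n, d_s, omega, z


# Hypothetical proportions and variance, not values from this study.
d_n, d_s, omega, z = dnds_z_test(0.01, 0.10, var_diff=0.001)
print(f"dN={d_n:.4f} dS={d_s:.4f} omega={omega:.3f} Z={z:.2f}")
```

A strongly negative Z (dN < dS, ω < 1) points to purifying selection, and a strongly positive Z to positive selection; the correction is applied to each site class separately before the ratio is taken.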
Tajima’s D statistic (Tajima 1989) was also calculated.
This widely used neutrality test is based on the comparison
of two estimates of the amount of genetic variation, that is,
1) the number of segregating sites (Watterson 1975) and 2)
the average number of pairwise differences. Under the null
hypothesis of neutral evolution, both measures are expected to yield the same estimate, whereas a significant
difference in these measures can be indicative of natural
selection.
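The comparison described above is condensed into Tajima's D: the average number of pairwise differences (k) minus Watterson's estimate S/a1, scaled by its standard deviation. A minimal sketch, assuming gap-free aligned sequences; it computes only the statistic, not its significance, and the function name and toy sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's (1989) D from aligned, gap-free sequences; None if S == 0."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None  # D is undefined without segregating sites
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy sample in which every polymorphism is a singleton.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

An excess of rare variants, as in this singleton-only toy sample, drives D below zero, whereas an excess of intermediate-frequency variants makes it positive.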
Selection at Particular Codons
Also, for individual codons, the ratio of
nonsynonymous to synonymous substitution rates (ω) was
evaluated; values of ω < 1, ω = 1, and ω > 1 indicate purifying selection, neutral evolution, and diversifying (= positive) selection, respectively. Positively selected sites (PSSs),
suggested by ω > 1, were identified using maximum
likelihood–based random-site model analyses implemented
in the PAML 3.14 package (Yang 1997, 2000). The analyses
for each Rubisco gene were run in codeml with runmode “user
tree.” The maximum likelihood trees used
were constructed with RAxML 7.0.4 (Stamatakis 2006). We
performed one likelihood ratio test (LRT) for dN/dS heterogeneity (M0 vs. M3) and two for positive selection (M1 vs.
M2 and M7 vs. M8). The neutral M1 model assumes two site classes, ω0 and ω1. The more complex M2 model
(selection) adds a free ω ratio, which is estimated from the
data set. Both models fix ω0 = 0 and ω1 = 1 and are unrealistic because they do not account for sites
with 0 < ω < 1. Therefore, Wong et al. (2004) and Yang
et al. (2005) described the modified models M1a and M2a, in which
0 < ω0 < 1 is estimated from the data set and ω1 = 1 is fixed.
These modified models have been implemented since PAML version 3.14.
Because the M1–M2 comparison is less powerful, we performed a second LRT with models M7 and M8. M7 (β) assumes a beta distribution of ω over sites, whereas model
M8 (β&ω) adds an additional site class (free ω ratio),
which is estimated from the data set (Yang 2000). The significance of the LRTs was tested assuming that twice the
difference in the log maximum likelihood values between
the two models follows a χ² distribution under the
null hypothesis of no selection. The degrees of freedom (df)
equaled the difference in the number of parameters
between the two tested models. Thus, df = 4 for the M0–M3
comparison, whereas df = 2 for the M1–M2 and M7–M8 comparisons. Whenever the alternative models M2
and M8 fitted the data significantly better (P < 0.05) than the respective null models, the sites identified by them were considered
positively selected.
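The LRT procedure above can be sketched in a few lines. To stay dependency-free, the sketch uses the exact closed-form χ² upper-tail probability for even degrees of freedom, which covers the df = 2 and df = 4 comparisons used here; for other df a library routine such as SciPy's chi-square survival function would be needed. The log-likelihoods are hypothetical and the function names are not part of PAML.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)**k / k!, which is exact for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P) for a pair of nested codeml site models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # guard against tiny negative values
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for an M7 vs. M8 comparison (df = 2).
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-996.0, df=2)
print(stat, round(p, 4))
```

With df = 2 the critical value at P = 0.05 is about 5.99, and with df = 4 about 9.49.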
Maximum Likelihood Gene Tree
With RAxML 7.0.4 (Stamatakis 2006), we constructed
a maximum likelihood gene tree for Rubisco under the general time reversible (GTR) + G + I model of sequence
evolution with 1,000 bootstrap replicates. The tree is based
on a supergene consisting of complete sequences of three
small subunits rbcS-1a, rbcS-2b, and rbcS-3b for 26 natural
Arabidopsis accessions (sample set I). The fourth small subunit rbcS-1b was excluded because that gene was lacking in
some accessions due to a gene duplication/loss event (see
below). Because rbcL is located in the chloroplast rather than the nuclear genome, it cannot be assumed to follow the same
evolutionary pattern as the nuclear genes. Therefore, a separate RAxML analysis was performed for this gene, which,
however, yielded a poorly resolved tree (data not shown).
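The supergene approach can be illustrated with a hypothetical helper that concatenates per-gene alignments accession by accession and writes relaxed sequential PHYLIP, a format RAxML reads. It refuses genes whose accession sets differ, mirroring the reason rbcS-1b was excluded; the data structure and function names are assumptions, not the authors' pipeline.

```python
def build_supergene(alignments, gene_order):
    """Concatenate per-gene alignments {gene: {accession: seq}} into one supergene.

    Raises ValueError when a gene is missing in some accessions, which is why a
    gene such as rbcS-1b would have to be dropped before concatenation.
    """
    taxa = set(alignments[gene_order[0]])
    for gene in gene_order:
        if set(alignments[gene]) != taxa:
            raise ValueError(f"{gene}: accession set differs from {gene_order[0]}")
        if len({len(seq) for seq in alignments[gene].values()}) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
    return {t: "".join(alignments[g][t] for g in gene_order) for t in sorted(taxa)}


def to_relaxed_phylip(supergene):
    """Write a sequential, relaxed PHYLIP string (taxon name, space, sequence)."""
    n_sites = len(next(iter(supergene.values())))
    lines = [f"{len(supergene)} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in supergene.items()]
    return "\n".join(lines) + "\n"


# Tiny hypothetical alignments for two genes and two accessions.
toy = {"rbcS-1a": {"Col-0": "ATG", "Bur-0": "ATC"}, "rbcS-2b": {"Col-0": "GG", "Bur-0": "GA"}}
print(to_relaxed_phylip(build_supergene(toy, ["rbcS-1a", "rbcS-2b"])), end="")
```

Keeping the gene order fixed across accessions ensures that each column of the supergene corresponds to the same homologous site in every accession.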
Results
Gene Duplication/Loss Event
Our analyses of the loci rbcS-1b and rbcS-2b in 26 accessions (sample set I) from around the world revealed two
groups of haplotypes (fig. 1): altogether, 23 accessions appeared similar to Col-0 (termed “standard” hereafter; represented in fig. 1 by Col, Ler, and Ws), whereas three
accessions exhibited a divergent sequence pattern (termed
“exception” hereafter; Bur, Cvi, and Sha in fig. 1). These two
groups can be distinguished by various substitutions, insertions, and deletions in the promoter, the gene sequence, and
the 3′-UTR. Relative to the standard, the promoter of the exception accessions carries a deletion at position −44 as well as two insertions, a 14-bp fragment
(5′-GAAAAAAAGAGCAA-3′, between positions −21
and −20) and a 9-bp fragment (5′-GAAACAACA-3′,
between positions −11 and −10). There are 21 polymorphisms distributed over 3 exons, of which 16 are diagnostic
between standard and exception accessions (fig. 1). Three
of the substitutions are nonsynonymous and lead to
amino acid exchanges between methionine and leucine
(M20L; nucleotide [nt] position 58; cf. fig. 1), serine and
alanine (S32A; nt position 96), as well as threonine and serine (T77S; nt position 319), respectively. These amino acid
exchanges, however, do not profoundly impact biochemical properties of the resulting protein, as polarity and
charge of the side chain are not affected. In comparison
with the exons, there are many more variable sites and
indels in the introns, especially in intron 2 of rbcS-1b.
The lower part of figure 1 shows the genetic variability of
locus rbcS-2b for the same accessions. Unlike the pattern
for rbcS-1b, we did not find evidence for very divergent
haplotypes among the 26 A. thaliana accessions analyzed.
We found three substitutions in the promoter region close
to the transcription start, at positions −47, −43, and −39.
In the coding sequence, we could detect 11 polymorphic
sites, of which four are nonsynonymous. These substitutions lead to amino acid exchanges between leucine and
phenylalanine (L6F in all represented accessions except
Col-0; nt position 16), methionine and leucine (M20L in
Bch; nt position 58), serine and alanine (S32A in Cvi; nt
position 96), as well as serine and asparagine (S48N in
Ler; nt position 143), which have no profound impact
on the biochemical properties of the affected amino acid.
Both introns are less variable than the exons, with just one
3-bp deletion in intron 2 at position 523–525.
The comparison of the exception haplotype group of
rbcS-1b (Bur, Cvi, and Sha) and the sequences of rbcS-2b
revealed several similarities (highlighted in gray in fig. 1)
in the promoter and throughout the gene: the 14-bp promoter
insertion (Insert A) is identical between rbcS-1b exception
and rbcS-2b except for 3 bases, whereas the second, 9-bp
insertion (Insert B) is exactly the same. Regarding exon
1, all rbcS-1b sequences (standard and exception) share a sequence pattern distinguished from all rbcS-2b sequences at
positions 24 and 25. From position 75 on, there is again (i.e.,
as in the promoter) a striking difference among rbcS-1b sequences of different accessions (i.e., between groups standard and exception), such that rbcS-1b exception
resembles the sequence pattern of rbcS-2b (highlighted
gray in fig. 1). This identity between rbcS-1b exception
and rbcS-2b extends through subsequent introns and exons
until position 797 in exon 3. At the two following positions (798 and
799), all rbcS-1b (standard and exception) and rbcS-2b sequences are identical. From position 800 onward and
throughout the subsequent 3′-UTR, the rbcS-1b sequences of all accessions (standard and exception) are again identical to each other and deviate from rbcS-2b (fig. 1). From this sequence pattern, we
assume that in some accessions (Bur, Cvi, and Sha) the
rbcS-1b gene and part of its 5′-UTR region were lost between positions −20 and 797 and substituted by a duplicate
of rbcS-2b. We tested the rbcS-1b and rbcS-2b of Bur,
Cvi, and Sha regarding the occurrence of gene conversion
by using GENECONV (Sawyer 1989) and inferred that
the region between positions 59 and 799 in rbcS-1b was
converted to rbcS-2b. This locus was also the only Rubisco
gene for which we inferred recombination events (Rm =
2; positions 304–366 and 366–669). The Rm value and
the inferred recombination events remained the same regardless of whether the exception accessions were included or excluded. In addition, the inferred regions of recombination
do not coincide with the gene duplication/loss event,
making recombination an unlikely underlying
mechanism.
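GENECONV's permutation-based test is beyond a short example, but the site-by-site reasoning behind fig. 1 (where does the exception sequence match rbcS-1b standard, and where rbcS-2b?) can be sketched with two hypothetical helpers. Positions here are 1-based alignment columns unless an offset is supplied, and the toy sequences are invented.

```python
def classify_sites(query, ref_a, ref_b):
    """Label each aligned column of query relative to two reference paralogs.

    'A' = matches ref_a only, 'B' = matches ref_b only,
    '=' = matches both (uninformative), '.' = matches neither.
    """
    labels = []
    for q, a, b in zip(query, ref_a, ref_b):
        if q == a and q == b:
            labels.append("=")
        elif q == a:
            labels.append("A")
        elif q == b:
            labels.append("B")
        else:
            labels.append(".")
    return "".join(labels)


def ref_b_tract(labels, first_position=1):
    """Return (start, end) positions spanning all 'B' sites, or None if there are none.

    Raises ValueError if an 'A' site interrupts the tract (a mosaic pattern).
    """
    b_sites = [i for i, label in enumerate(labels) if label == "B"]
    if not b_sites:
        return None
    first, last = b_sites[0], b_sites[-1]
    if "A" in labels[first:last + 1]:
        raise ValueError("tract is interrupted by ref_a-specific sites")
    return first + first_position, last + first_position


# Hypothetical sequences: query = exception, ref_a = rbcS-1b standard, ref_b = rbcS-2b.
labels = classify_sites("ACGTTGCA", "ACCATACA", "TCGTTGAA")
print(labels, ref_b_tract(labels))
```

The tract boundaries are set by the outermost diagnostic sites, so sites shared by both paralogs (such as positions 798 and 799 in fig. 1) cannot shift them.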
To assess whether this newly found gene duplication/
loss event was present in more accessions, we developed
a PCR assay and screened 74 additional accessions. In this
assay, one PCR (P307/P57) amplified only the original rbcS-1b standard locus (fig. 2A), whereas two PCRs
with different primer sets (P316/P57 and P317/P57) were designed to
specifically amplify the rbcS-1b exception (hereafter called
rbcS-2b*, because of its assumed origin from a duplication
of rbcS-2b; fig. 2B). Col, Ler, and Ws served as controls for
rbcS-1b standard, whereas Bur, Cvi, and Sha were controls
for the presence of rbcS-2b*, exhibiting the inferred gene
duplication/loss event. Of the 74 additionally analyzed accessions (sample set II), five exhibited the loss of rbcS-1b
and gain of rbcS-2b* in the PCR assay, which we additionally verified by sequencing in these five accessions. Altogether, 8
of 100 A. thaliana accessions analyzed show this alteration
in their rbcS genes.

FIG. 1. Sequence polymorphisms in rbcS-1b and rbcS-2b of Arabidopsis thaliana accessions. Shading indicates sequence identity to Col-0 rbcS-1b
(white) or Col-0 rbcS-2b (gray). Col-0, Ler, and Ws represent the standard type. In Bur, Cvi, Sha, Bch, Bu, Fei, Lm, and Sorbo, the rbcS-1b gene
resembles the sequence pattern of Col-0 rbcS-2b (exception type), indicative of a gene duplication/loss event at that locus (see text for details).
Nucleotides are numbered starting with the initiation codon (ATG). The stop codon (TAA) is at positions 804–806. Dots indicate identity to the
reference sequence of Col-0. Hyphens represent the absence of a nucleotide relative to the reference. Four insertions are present: (A) 14-bp
insertion between positions −21 and −20, (B) 9-bp insertion between positions −11 and −10, (C) 1-bp insertion between positions 175 and
176, and (D) 1-bp insertion between positions 839 and 840. Amino acid substitutions are given as one-letter symbols in the lower part of the
column; the upper symbol indicates the amino acid in Col-0, the lower one the modified amino acid.

FIG. 2. PCR assay for detecting the gene duplication/loss event in
additional Arabidopsis thaliana accessions. Col, Ler, and Ws are
accessions without the gene duplication/loss event, whereas Bur, Cvi, and Sha are reference
accessions with it. Bch, Bu, Fei, Lm, and Sorbo
are additional accessions found to share the duplication/loss
pattern. L = 1-kb ladder (Fermentas). (A) PCR amplification of
rbcS-1b (lost in accessions Bur to Sorbo) and (B) detection of rbcS-2b* (gained through duplication of rbcS-2b in accessions Bur to
Sorbo).
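The decision logic of the PCR screen can be written down as a small hypothetical helper. It assumes that a band from P307/P57 marks rbcS-1b standard and that both rbcS-2b*-specific PCRs (P316/P57 and P317/P57) must yield a band before an accession is called an exception; the text does not state how discordant results were handled, so those are simply flagged.

```python
def classify_accession(band_p307_p57, band_p316_p57, band_p317_p57):
    """Call the rbcS-1b / rbcS-2b* state from band presence (True = band seen)."""
    standard = band_p307_p57  # primer pair specific for rbcS-1b standard
    duplicate = band_p316_p57 and band_p317_p57  # assumption: both rbcS-2b* PCRs must agree
    if standard and not duplicate:
        return "standard"
    if duplicate and not standard:
        return "exception"  # candidate duplication/loss; confirm by sequencing
    return "ambiguous"  # no band, both loci, or discordant rbcS-2b* PCRs


# Expected control patterns (hypothetical band calls for Col and Bur).
for name, bands in {"Col": (True, False, False), "Bur": (False, True, True)}.items():
    print(name, classify_accession(*bands))
```

Treating mutually exclusive bands as the only clean calls keeps accessions carrying both loci, or neither, out of either group until they are sequenced.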
We also amplified and sequenced approximately 1 kb of the promoter region
of each Rubisco gene and detected, by PCR, promoter
length variation in rbcS-2b and rbcS-2b* in all
accessions affected by the gene duplication/loss event
(fig. 3): accessions with the inferred gene duplication/loss
show shorter rbcS-2b* promoters (fig. 3A) and longer rbcS-2b promoters (fig. 3B). Using primers placed in
the upstream gene sph8 (At5g38435) as well as in rbcS-1b,
we sequenced almost the entire intergenic region
to verify that the original rbcS-1b did not remain
upstream of rbcS-2b*. In the rbcS-2b* promoter,
a ~2.3-kb deletion occurred between 2.7 kb and
470 bp upstream of the start codon (fig. 4C). Additionally,
we sequenced the entire intergenic region between rbcS-2b* and rbcS-2b. The rbcS-2b promoter of these accessions
shows length variation as well. A deletion of ~730 bp
occurred between 1.8 and 1.1 kb upstream of the translation
start of rbcS-2b, in the intergenic region between rbcS-2b* and
rbcS-2b. In addition, two insertions occurred in this promoter: the first, of ~850 bp, was inserted about 500 bp
upstream of the translation start, whereas the second, a ~450-bp
fragment, was inserted about 1,000 bp upstream of the
start codon (fig. 4C).
Basic local alignment search tool (BLAST) searches of the
inserted sequences show no similarity to any available
genome sequence of A. thaliana. Using RepeatMasker
(http://repeatmasker.org), we screened for interspersed repeats
within the respective DNA fragments. Within the ~450-bp fragment, we found a DNA sequence
of 219 bp (positions 112–339) with 66% identity
to a long terminal repeat element of the ATHILA7 family, which belongs
to the GYPSY superfamily (The Arabidopsis Information
Resource; Rhee et al. 2003). For the longer ~850-bp fragment, we detected inverted repeats as well: a sequence of
543 bp (positions 329–871) has 89% identity to the
ATREP2 family, which belongs to the RC/HELITRON superfamily.

FIG. 3. PCR assay for detecting the promoter variation in rbcS-1b
(or rbcS-2b*) and rbcS-2b in Arabidopsis thaliana accessions. Col,
Ler, and Ws represent accessions without promoter alteration. The
accessions with gene duplication/loss (Bur, Cvi, Sha, Bch, Bu, Fei, Lm,
and Sorbo) all exhibit length variation in the rbcS-2b and rbcS-2b*
promoters. L = 1-kb ladder (Fermentas). (A) PCR of the rbcS-1b (or
rbcS-2b*) promoter; (B) PCR of the rbcS-2b promoter.
The minimal light-regulatory unit (CMA5) of rbcS promoters in A. thaliana (López-Ochoa et al. 2007), which lies
about 150–300 bp upstream of the translation start, was not
affected by the two insertion events that occurred 500
and 1,000 bp upstream of the start codon in the rbcS-2b
promoter.
Analyses of Regulatory Promoter Elements
We sequenced about 1 kb upstream of the translation start in
26 accessions and searched for “functionally important regions” in the PPDB (Yamamoto and Obokata 2008) to look
for variability in those elements. Information on the rbcL
promoter was not available, probably because
rbcL is a chloroplast gene and hence not considered in that
database. For rbcS-1a, we found the coordinates of the
TATA box (−52 to −63) and four regulatory elements
(−142 to −150; −197 to −208; −254 to −261; −250
to −262). According to our analysis, none of them is
modified by polymorphic sites or indels, such that the substitutions found in the promoter region of rbcS-1a
probably have no implication for transcription. Unfortunately, PPDB contained no information on regulatory
elements in rbcS-1b.
FIG. 4. Graphical overview of promoter variation and gene duplication/loss event in Arabidopsis thaliana. (A) Chromosomal arrangement of
the rbcS-b loci including primer positions that were used for sequencing. Relative positions of the genes are given as well. (B) Sequenced region
of the standard accessions. (C) Sequenced region of the exception accessions, that is, those exhibiting the gene duplication/loss event and the
coupled promoter variations in rbcS-2b* and rbcS-2b. A ~2.3-kb deletion occurred 470–2,700 bp upstream of the translation start codon in the
promoter of rbcS-2b* and co-occurred with the replacement of rbcS-1b by rbcS-2b*. A second ~730-bp deletion occurred in the promoter of
rbcS-2b. Additionally, two different fragments (~450 and ~850 bp long) are inserted in the rbcS-2b promoter. Insertions are positioned 500
and 1,000 bp upstream of the start codon, respectively, relative to the standard type.
We found numerous substitutions and indels in the rbcS-2b* and rbcS-1b promoters, which could potentially alter transcription. As in rbcS-1b,
there are many modifications in the rbcS-2b promoter. Positions of the TATA box (−73 to −86) and seven
regulatory elements (−122 to −129: AtREG479; −140 to
−147: AtREG493; −149 to −156: AtREG654; −211 to
−220: AtREG468, AtREG379, AtREG376; −249 to −256:
AtREG570; −313 to −320: AtREG647; −341 to −348:
AtREG645) in rbcS-2b were retrieved from PPDB. In Bur,
Cvi, and Sha, we detected mutations in the TATA box:
there is a C to T substitution at position −86 in Bur
and Sha, and a deletion from position −84 to −112 in Cvi,
which leads to a loss of 3 bp in the TATA box.
The regulatory element AtREG647 is mutated as well:
the region from −313 to −320 shows two variants: 1) Bur, Cvi, and Sha have a T to A substitution at
position −315; 2) Nok and Sap are affected by a 3-bp deletion between −314 and −316. In rbcS-3b, the TATA box
is situated between −73 and −83, whereas regulatory elements are at positions −103 to −110, −141 to −150, and
−205 to −221. None of these regions shows any polymorphism among the 26 A. thaliana accessions analyzed here.
López-Ochoa et al. (2007) analyzed the functional architecture of the conserved modular array 5 (CMA5), a minimal light-regulatory unit of rbcS promoters in A. thaliana.
The position and sequences of respective motifs of rbcS-2b
and rbcS-3b in A. thaliana were described and compared
with other plants’ CMA5s. Donald and Cashmore (1990) had already identified the G- and I-boxes in rbcS-1a and shown the effect of mutations in these motifs on expression levels. The G- and I-box sequences described by López-Ochoa et al. (2007) and by Donald and Cashmore (1990) differ slightly. The rbcS-1b promoter lacks the G-box due to a 43-bp deletion in that region (Krebbers et al. 1988). We detected no nucleotide substitutions in either the I-box or the G-box, whereas we found variable sites in the IbAM5 module of rbcS-1b and rbcS-2b (table 1). In rbcS-1b, a G to A substitution occurred together with an insertion of two adenine residues in Bur, Cvi, and Sha, such that the IbAM5 box of these accessions contains only adenine residues. The corresponding motif of rbcS-2b was modified in Bur, Cvi, and Sha as well, with the thymine at position 2 substituted by an adenine.
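Once motifs are aligned without gaps, comparing variants position by position is a simple mismatch scan. The sketch below is illustrative only. The reference motifs come from table 1. The Bur/Cvi/Sha rbcS-2b variant is reconstructed from the text (thymine to adenine at position 2) and is not a sequence listed in the table.

```python
def substitutions(reference, variant):
    """Return (1-based position, reference base, variant base) for each mismatch.

    Only valid for gap-free, equal-length motifs; indels (such as the two-adenine
    insertion in the rbcS-1b IbAM5 box) need a proper alignment first.
    """
    if len(reference) != len(variant):
        raise ValueError("motifs differ in length; align them first")
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant), start=1) if r != v]


ibam5_rbcs2b = "ATGAGAAA"        # table 1, rbcS-2b
ibam5_rbcs2b_bur = "AAGAGAAA"    # reconstructed from the text: T -> A at position 2

print(substitutions(ibam5_rbcs2b, ibam5_rbcs2b_bur))   # [(2, 'T', 'A')]
print(substitutions("CCACGTGGC", "CCACGTGAT"))         # [(8, 'G', 'A'), (9, 'C', 'T')]
```

The second call compares the G-box of rbcS-1a with that of rbcS-2b from table 1. It shows that the two paralogs differ only at the last two positions of the box.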
Genetic Variability in Rubisco Genes among Accessions
Nucleotide variability was estimated separately for each structural region (UTRs, exons, introns) of rbcL, rbcS-1a, rbcS-2b, and rbcS-3b in 26 worldwide distributed accessions (table 2). The same analysis for rbcS-1b was performed for 23 accessions, as this gene was lacking in Bur, Cvi, and Sha (see above).
Table 1. Sequences of Minimal Light-Regulatory Unit CMA5 Modules in Arabidopsis thaliana.
Module  rbcS-1a    rbcS-1b    rbcS-2b    rbcS-3b
IbAM5   ATAGATAA   AAAAGAAA   ATGAGAAA   AGAGAAAA
nt      0          0          0          0
I-Box   GATAAG     GATAAG     GATAAG     GATAAG
nt      15         —          14         14
G-Box   CCACGTGGC  —          CCACGTGAT  CCACGTGGC
NOTE.—Variable sites of CMA5 modules among A. thaliana accessions are highlighted gray. nt = number of nucleotides interspersed between regulatory elements. Consensus sequences of IbAM5 as well as I- and G-box are taken from Donald and Cashmore (1990) and López-Ochoa et al. (2007).
MBE
Schwarte and Tiedemann · doi:10.1093/molbev/msr008
Table 2. Genetic Variation in Rubisco Genes among Arabidopsis thaliana Accessions.
Gene     Domain    n   Sites  S   η   Nonsyn  Indels  h   Hd     π       GC Content
rbcL     Promoter  26  999    3   3   —       —       4   0.286  0.0003  0.310
         Exon 1    26  1440   2   2   0       0       3   0.538  0.0004  0.440
         Gene      26  1440   2   2   0       0       3   0.538  0.0004  0.440
rbcS-1a  Promoter  26  1029   17  17  —       —       13  0.889  0.004   0.339
         Exon 1    26  171    0   0   0       0       1   0.000  0.000   0.544
         Intron 1  26  106    9   9   —       2       3   0.520  0.042   0.227
         Exon 2    26  135    0   0   0       0       1   0.000  0.000   0.459
         Intron 2  26  136    2   2   —       0       2   0.271  0.004   0.244
         Exon 3    26  237    0   0   0       0       1   0.000  0.000   0.506
         Gene      26  785    11  11  0       2       4   0.582  0.006   0.424
rbcS-1b  Promoter  23  1221   60  61  —       —       20  0.984  0.0107  0.293
         Exon 1    23  173    6   6   0       0       4   0.320  0.0048  0.543
         Intron 1  23  90     2   2   —       0       3   0.170  0.0028  0.303
         Exon 2    23  133    5   5   2       0       5   0.640  0.0076  0.464
         Intron 2  23  171    4   4   —       1       5   0.451  0.0053  0.299
         Exon 3    23  240    5   5   1       0       5   0.771  0.0070  0.504
         Gene      23  807    22  22  3       1       16  0.964  0.0059  0.455
rbcS-2b  Promoter  26  2389   63  65  —       —       12  0.862  0.0187  0.341
         Exon 1    26  173    7   7   3       0       4   0.452  0.0046  0.553
         Intron 1  26  90     1   1   —       0       2   0.077  0.0009  0.289
         Exon 2    26  133    0   0   0       0       1   0.000  0.0000  0.459
         Intron 2  26  143    1   1   —       1       2   0.077  0.0006  0.286
         Exon 3    26  240    3   3   0       0       4   0.542  0.0026  0.497
         Gene      26  779    12  12  3       1       9   0.809  0.0020  0.441
rbcS-3b  Promoter  26  993    49  50  —       —       18  0.963  0.0081  0.315
         Exon 1    26  173    3   3   1       0       4   0.222  0.0013  0.578
         Intron 1  26  90     2   2   —       1       3   0.151  0.0025  0.271
         Exon 2    26  133    1   1   0       0       2   0.148  0.0011  0.452
         Intron 2  26  170    16  17  —       3       5   0.455  0.0115  0.283
         Exon 3    26  240    4   4   0       0       4   0.222  0.0013  0.508
         Gene      26  806    25  26  1       4       7   0.471  0.0033  0.440
NOTE.—n = number of analyzed accessions; sites = fragment length; S = polymorphic sites; η = total number of mutations; nonsyn = substitutions at nonsynonymous sites; indels = total number of insertions/deletions; h = number of different haplotypes; Hd = haplotype diversity; π = nucleotide diversity.
As expected, rbcL is the most conserved Rubisco gene, with about 0.3% and 0.1% variable sites in promoter and gene, respectively. Both substitutions in exon 1
are synonymous and do not lead to functional alterations at the protein level.
Among the small subunit genes, the most highly expressed subunit, rbcS-1a (Dedonder et al. 1993; Yoon et al. 2001; Sawchuk et al. 2008), is also the most conserved one: Its exons show no variability, and intron 1, with nine mutated sites, is more variable than intron 2, which has only two substitutions. With about 1.7% polymorphic sites (17 substitutions), its promoter is quite polymorphic. In total, sequencing 2,141 bp of the rbcS-1a locus in 26 accessions (sample set I), we found 30 polymorphic sites.
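For readers less familiar with the summary statistics in table 2, the sketch below computes S, nucleotide diversity π (mean pairwise differences per site), and Nei's haplotype diversity Hd for a gap-free alignment. It is a simplified illustration on hypothetical toy sequences. Software used for such analyses also handles gaps, missing data, and multiple hits, and this code does none of that. The last line checks the percentages quoted above against the counts in table 2.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns carrying more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Nei's pi: mean number of pairwise differences divided by sequence length."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / len(pairs) / len(seqs[0])


def haplotype_diversity(seqs):
    """Nei's Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


toy = ["AAAA", "AAAA", "AAAT", "AATT"]  # hypothetical alignment of four accessions
print(segregating_sites(toy))               # 2
print(round(nucleotide_diversity(toy), 4))  # 0.2917
print(round(haplotype_diversity(toy), 4))   # 0.8333

# Percent polymorphic sites quoted in the text (S / sites from table 2):
# rbcS-1a promoter, rbcL promoter, rbcL gene.
print(f"{17 / 1029:.1%}", f"{3 / 999:.1%}", f"{2 / 1440:.1%}")  # 1.7% 0.3% 0.1%
```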
RbcS-1b, rbcS-2b, and rbcS-3b, the tandemly arrayed
small subunit genes on chromosome 5, show similar levels
of variability, higher than those of rbcS-1a. About 6% of
nucleotides in the rbcS-1b promoter are polymorphic.
We found 16 substitutions in exons (fig. 5), of which three are nonsynonymous and lead to amino acid exchanges between tyrosine and aspartic acid (Y72D; nt position 214; accessions El, Er, Gre), proline and arginine (P74R; nt position 221; accession Gre), as well as aspartic acid and asparagine (D180N; nt position 538; accessions Er, Est, Mt, Tsu).
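The nucleotide positions and amino acid positions given above are related by simple codon arithmetic on the coding sequence. The sketch below assumes 1-based CDS coordinates that start at the first base of the start codon, and it uses the standard genetic code. The example codons (TAT, CCT, GAT) are hypothetical, because the exact codons are not reported here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def codon_of(nt_pos):
    """Map a 1-based CDS nucleotide position to (codon number, position within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def mutate(codon, pos_in_codon, new_base):
    """Return the codon with the base at pos_in_codon (1-3) replaced."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


for nt in (214, 221, 538):
    print(nt, codon_of(nt))   # 214 (72, 1) / 221 (74, 2) / 538 (180, 1)

# Hypothetical codons: Tyr TAT with T->G at codon position 1 gives Asp (Y72D).
print(CODON_TABLE["TAT"], CODON_TABLE[mutate("TAT", 1, "G")])  # Y D
print(CODON_TABLE[mutate("CCT", 2, "G")])                      # R (P74R)
print(CODON_TABLE[mutate("GAT", 1, "A")])                      # N (D180N)
```

The same mapping places the rbcS-2b substitutions at nt 16, 94, and 143 in codons 6, 32, and 48, matching F6L, S32A, and S48N.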
The rbcS-2b promoter has about 9% variable sites. As in rbcS-1b, most polymorphic sites within the rbcS-2b gene are located in exons. Just one substitution per intron was detected in rbcS-2b, whereas in exons we found 10 substitutions (fig. 5). Three
of them are nonsynonymous and lead to amino acid substitutions of phenylalanine to leucine (F6L; nt position 16;
accessions Bl, Bur, Can, Cha, Ct, Cvi, Edi, Er, Est, Gre, Ler, Mt,
Nok, Oy, Rsch, Sap, Sha, Stw, Te, Tsu, Wil, and Ws), serine to
alanine (S32A; nt position 94; accession Cvi), and serine to
asparagine (S48N; nt position 143; accessions Ler, Te).
RbcS-3b has a level of variability in promoter (5.2%) and
gene (3.2%) similar to rbcS-1b and rbcS-2b. However, the
distribution of substitutions within the gene is different.
While rbcS-1b and rbcS-2b exhibit more polymorphic sites
in exons than in introns, this pattern is reversed in rbcS-3b.
We could detect eight substitutions within exons (fig. 5),
one of them nonsynonymous leading to an amino acid exchange between alanine and threonine (A47T; nt position
139; accession Sap). Intron 1 of rbcS-3b is less variable (2 substitutions) than intron 2 with 16 polymorphic sites.
Gene Duplication/Loss in Rubisco of A. thaliana · doi:10.1093/molbev/msr008
FIG. 5. Nucleotide polymorphisms in the coding sequence of Rubisco genes among Arabidopsis thaliana accessions. RbcS-1a did not show any
variability in its coding region. Dots indicate identity to the reference Col-0. Sequence patterns and amino acid substitutions of the accessions
Bur, Cvi, and Sha (exception type) are highlighted in gray. Asterisks point to sites where some accessions share a substitution with Arabidopsis
lyrata. Amino acid substitutions are shown in the lower part of each column: The upper symbol indicates the amino acid in Col-0, whereas the lower one indicates the variant. The single inferred positively selected site is indicated by PSS.
All Rubisco genes share some general patterns: Most
substitutions were found in promoters, and the GC content differs between exons (44–58%) and introns
(22–34%), in agreement with previous data (Zhu et al. 2009).
Signs of Selection
To evaluate which kind of selection, positive or purifying, is acting on Rubisco genes, we performed a Z-test of selection (table 3). As rbcS-1a was invariable in its exons, the test could not be performed for that gene. With only two synonymous substitutions in the coding region of rbcL, the test did not reveal a statistically significant pattern for this gene, as neither positive selection (P = 1.000 for the null hypothesis of no positive selection) nor purifying selection (P = 0.127) yielded statistical support. We argue, however, that the very low number of polymorphic sites in the exons of these two genes might be taken as an indication that purifying selection operates. All tests of genewise selection for the rbcS-b genes revealed a statistically significant pattern, that is, significant support for purifying selection acting on rbcS-1b, rbcS-2b, and rbcS-3b. We also calculated Tajima’s D, a commonly used selection test in A. thaliana, and obtained a significant value for rbcS-3b (−2.11; indicative of purifying selection), which corroborates the results of the Z-test. The remaining D statistics were not significant.
Table 3. Tests of Selection on Rubisco Genes in Arabidopsis thaliana.
         Positive Selection      Purifying Selection     Tajima’s D
Gene     Z Statistic  P Value    Z Statistic  P Value    D Value  P Value
rbcL     −1.178       1.000      1.147        0.127      0.18     >0.10
rbcS-1a  0.000        1.000      0.000        1.000      —        —
rbcS-1b  −2.966       1.000      2.935        0.002      −0.68    >0.10
rbcS-2b  −1.751       1.000      1.763        0.040      −1.51    >0.10
rbcS-3b  −2.501       1.000      2.412        0.009      −2.11    <0.05
NOTE.—Z-test of selection with null hypothesis (H0: dN = dS) tested against two different alternative hypotheses, that is, positive selection (HA: dN > dS) and purifying selection (HA: dN < dS). Z statistics and respective significance values (P value) were calculated for coding sequences (CDS) of each Rubisco gene. Exons of rbcL only exhibited two substitutions, both synonymous. Exons of rbcS-1a were monomorphic at all sites.
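Both tests used above can be sketched compactly. The Z-test function takes estimates of dN, dS, and the standard error of their difference as given. In practice that standard error is commonly estimated by bootstrapping, which the sketch does not reproduce. It then returns a one-tailed P value from the standard normal distribution. Tajima’s D follows the standard formula from the number of sequences n, the number of segregating sites S, and the mean number of pairwise differences k. All input values below are hypothetical, not the data behind table 3.

```python
from math import erfc, sqrt


def z_test(dn, ds, se, alternative):
    """One-tailed Z-test of H0: dN = dS against positive (dN > dS) or purifying (dN < dS)."""
    if alternative == "positive":
        z = (dn - ds) / se
    elif alternative == "purifying":
        z = (ds - dn) / se
    else:
        raise ValueError("alternative must be 'positive' or 'purifying'")
    p = 0.5 * erfc(z / sqrt(2))  # upper-tail probability of a standard normal
    return z, p


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, and mean pairwise differences k."""
    if s == 0:
        return None  # undefined without polymorphism
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


print(z_test(dn=0.01, ds=0.03, se=0.01, alternative="purifying"))  # z = 2, P ~ 0.023
print(tajimas_d(n=26, s=10, k=1.0))  # negative: fewer pairwise differences than S predicts
```

A negative D means there are fewer pairwise differences than the number of segregating sites would predict. Such an excess of rare variants can also arise from demographic expansion, which is why D is best read alongside a second test such as the Z-test.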
Table 4. Intra- and Interspecific Variabilities in Rubisco Genes of Arabidopsis thaliana and Arabidopsis lyrata.
Orthologs
                                      Number of Fixed Differences   Number of Polymorphic Sites
Gene      π        K        π/K       Total   Syn   Nonsyn           Total   Syn   Nonsyn
rbcL      0.0007   0.0011   0.627     7       5     2                3       3     0
rbcS-1a   0.0058   0.0095   0.611     38      30    8                10      10    0
rbcS-1b   0.0058   0.0106   0.543     39      32    7                20      17    3
rbcS-2b   0.0020   0.0075   0.272     55      46    9                12      9     3
rbcS-3b   0.0030   0.0088   0.336     58      49    9                22      21    1
Paralogs
Comparison                π (A. thaliana Col)   π (A. thaliana Cvi)   π (A. lyrata)
rbcS-1a versus rbcS-1b    0.193                 0.194#                0.201
rbcS-1a versus rbcS-2b    0.193                 0.193                 0.199
rbcS-1a versus rbcS-3b    0.206                 0.216                 0.209
rbcS-1b versus rbcS-2b    0.053                 0.005#                0.068
rbcS-1b versus rbcS-3b    0.060                 0.093#                0.056
rbcS-2b versus rbcS-3b    0.088                 0.092                 0.075
NOTE.—Measures for complete gene sequences, including introns in rbcS genes. π = nucleotide diversity within A. thaliana; K = nucleotide divergence between A. thaliana and A. lyrata; π/K = ratio of diversity to divergence. Numbers of fixed differences between A. thaliana and A. lyrata as well as numbers of polymorphic sites among A. thaliana accessions are given for synonymous (syn) and nonsynonymous (nonsyn) substitutions. #In these comparisons, rbcS-1b is substituted by rbcS-2b*; see text for explanation.
Genewise purifying selection on such fundamental genes as Rubisco could be expected. However, single amino acid sites could potentially diverge among accessions due to positive selection. We searched for such positively selected sites (PSSs) in Rubisco genes using PAML. PAML analyses can yield false-positive results and should hence not be performed for genes in which gene conversion is inferred to act (Casola and Hahn 2009). However, this difficulty is easily overcome by excluding the sequences affected by gene conversion (Casola and Hahn 2009). We hence excluded rbcS-2b* sequences from this analysis. We identified a single putative PSS in rbcS-1b at position 214 of the coding sequence (fig. 5). There, thymine is substituted by guanine in the accessions El (Ellershausen, Germany), Er (Erlangen, Germany), and Gre (Greenville, Michigan), leading to an amino acid exchange of tyrosine to aspartic acid at position 72 (Y72D) at the protein level (M0 vs. M3: χ2 = 26.512, P < 0.001; M1 vs. M2: χ2 = 15.459, P < 0.001; M7 vs. M8: χ2 = 15.519, P < 0.001; posterior probability [M8 Bayes Empirical Bayes analysis]: 0.96). At this position, we also inferred an intracodon recombination event. This recombination position was not inferred when accessions with the one (tyrosine) or the other (aspartic acid) residue were analyzed separately, suggesting that this substitution might be associated with a recombination. It has been previously demonstrated that recombination can lead to false positives in PAML analyses (Casola and Hahn 2009). However, the selection models favored in our analysis (M7 [β] against M8 [β and ω]) are considered relatively robust to the effect of recombination (Anisimova et al. 2003).
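The PAML comparisons above are likelihood ratio tests: Twice the log-likelihood difference between nested site models is compared with a χ2 distribution. The sketch below assumes the degrees of freedom conventionally used for these model pairs (2 for M1 vs. M2 and M7 vs. M8; 4 for M0 vs. M3 with three site classes), since the text does not state them. For even degrees of freedom, the χ2 upper tail has an exact closed form, so no statistics library is needed.

```python
from math import exp, factorial


def chi2_sf_even(x, df):
    """Exact upper-tail probability of a chi-square statistic for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P) for two nested models."""
    stat = 2 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even(stat, df)


# Test statistics reported above, paired with the assumed degrees of freedom.
reported = {"M0 vs. M3": (26.512, 4), "M1 vs. M2": (15.459, 2), "M7 vs. M8": (15.519, 2)}
for comparison, (stat, df) in reported.items():
    print(comparison, f"P = {chi2_sf_even(stat, df):.1e}")

# Usage with hypothetical log-likelihoods from two nested runs.
print(likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-992.0, df=2))
```

With these degrees of freedom, all three reported statistics fall below P = 0.001, consistent with the values given in the text.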
Comparison of Intra- and Interspecific Variabilities in Rubisco Genes
Substitution rates per site (i.e., the ratio of polymorphic vs. monomorphic sites) differed significantly among the Rubisco genes, regardless of whether all substitutions (χ2 = 42.069; P < 0.001), first and second codon positions only (χ2 = 10.272; P = 0.036), or third codon positions only (χ2 = 33.292; P < 0.001) were considered. Because 1) this difference in substitution rate among genes is particularly large for the third codon position and 2) all observed substitutions at this position were synonymous, it points toward underlying differences in gene-specific mutation rates. When the comparison was restricted to the small subunit genes tandemly arrayed on chromosome 5 (rbcS-1b, rbcS-2b, and rbcS-3b), no significant difference among genes was found, such that substitution rates appear similar for these genes. Interestingly, the two remaining Rubisco genes (rbcL, encoded in the chloroplast; rbcS-1a, encoded on chromosome 1) did not differ significantly from one another in their substitution rate.
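One way to run this comparison is a Pearson χ2 test on a contingency table of genes × (polymorphic, monomorphic) sites. Assuming that design, the coding sites and segregating sites from the exon rows of table 2 give χ2 ≈ 42.07 with 4 degrees of freedom in the sketch below, in line with the value reported for all substitutions. The codon-position subsets cannot be rebuilt from table 2. Expected counts of polymorphic sites are small (about 5 per rbcS gene), so the χ2 approximation should be treated as rough.

```python
from math import exp, factorial

# (coding sites, segregating sites), summed over the exon rows of table 2
CODING = {
    "rbcL": (1440, 2),
    "rbcS-1a": (171 + 135 + 237, 0 + 0 + 0),
    "rbcS-1b": (173 + 133 + 240, 6 + 5 + 5),
    "rbcS-2b": (173 + 133 + 240, 7 + 0 + 3),
    "rbcS-3b": (173 + 133 + 240, 3 + 1 + 4),
}


def chi2_homogeneity(counts):
    """Pearson chi-square for a k x 2 table of (polymorphic, monomorphic) sites."""
    total_sites = sum(sites for sites, _ in counts.values())
    total_poly = sum(poly for _, poly in counts.values())
    stat = 0.0
    for sites, poly in counts.values():
        expected_poly = sites * total_poly / total_sites
        expected_mono = sites - expected_poly
        stat += (poly - expected_poly) ** 2 / expected_poly
        stat += ((sites - poly) - expected_mono) ** 2 / expected_mono
    return stat, len(counts) - 1


def chi2_sf_even(x, df):
    """Exact chi-square upper tail for even df."""
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


stat, df = chi2_homogeneity(CODING)
print(round(stat, 2), df, chi2_sf_even(stat, df) < 0.001)  # 42.07 4 True

chromosome5 = {g: CODING[g] for g in ("rbcS-1b", "rbcS-2b", "rbcS-3b")}
stat5, df5 = chi2_homogeneity(chromosome5)
print(round(stat5, 2), df5, round(chi2_sf_even(stat5, df5), 2))  # 3.12 2 0.21
```

Restricted to the three chromosome 5 genes, the same calculation is far from significant, matching the statement above.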
The accessions in which we detected the gene duplication/loss (Bur, Cvi, and Sha) have previously been described as particularly variable (Schmid et al. 2003; Nordborg et al. 2005; Ossowski et al. 2008). We also evaluated whether our inferred difference in substitution rates among Rubisco genes was caused specifically by the inclusion of these accessions. However, the observed pattern remained regardless of the inclusion or exclusion of these accessions (data not shown). To further investigate this pattern, we also included intron sequences and compared variability (i.e., nucleotide diversity π) among A. thaliana accessions with interspecific divergence (K) from its sister species A. lyrata (table 4). The level of polymorphism for rbcL is by far the lowest, both within A. thaliana (π) and between A. thaliana and A. lyrata (K), compared with all rbcS genes. Among the rbcS genes, rbcS-1a and rbcS-1b exhibit the highest nucleotide diversities (π, K), whereas rbcS-2b and rbcS-3b are intermediate. Despite different diversity and divergence, the π/K ratios were similar among rbcL, rbcS-1a, and rbcS-1b. This ratio was lower in rbcS-2b and rbcS-3b, indicating that these genes contain relatively more fixed differences between the species than polymorphisms among A. thaliana accessions. We also compared the nucleotide diversity among paralogs within A. lyrata and within A. thaliana accessions (i.e., Col-0 as a standard accession; Cvi as an exception accession; table 4). The three genes on chromosome 5 (rbcS-1b, rbcS-2b, and rbcS-3b) are clearly less diverged from one another than from rbcS-1a, located on chromosome 1. Pairwise divergence patterns among paralogs are similar across the two analyzed A. thaliana accessions and A. lyrata, except for the comparison between rbcS-1b and rbcS-2b. Here, Col-0 and A. lyrata show about 10-fold the divergence found in Cvi, where rbcS-1b is substituted by rbcS-2b*.
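The ratios in table 4 are simple quotients, and recomputing them from the rounded values printed in the table is a quick consistency check. Small deviations from the tabulated π/K are expected, since the table was presumably computed from unrounded estimates (for example, 0.267 instead of 0.272 for rbcS-2b). The sketch also computes the roughly 10-fold difference in rbcS-1b versus rbcS-2b divergence between Col-0 and Cvi.

```python
# (pi, K) for orthologs, copied from table 4
ORTHOLOGS = {
    "rbcL": (0.0007, 0.0011),
    "rbcS-1a": (0.0058, 0.0095),
    "rbcS-1b": (0.0058, 0.0106),
    "rbcS-2b": (0.0020, 0.0075),
    "rbcS-3b": (0.0030, 0.0088),
}

ratios = {gene: pi / k for gene, (pi, k) in ORTHOLOGS.items()}
for gene, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{gene:8s} pi/K = {ratio:.3f}")

# Paralog divergence rbcS-1b vs. rbcS-2b (table 4): Col-0 versus Cvi,
# where rbcS-2b* takes the place of rbcS-1b.
col0, cvi = 0.053, 0.005
print(f"fold difference: {col0 / cvi:.1f}")  # 10.6
```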
We detected 12 sites in Rubisco exons where a nucleotide difference relative to Col-0 was shared between some accessions and A. lyrata (asterisks in fig. 5). Among those substitutions, there were two nonsynonymous substitutions in rbcS-2b (positions 16 and 94). The first one leads to an amino acid substitution between phenylalanine and leucine. Interestingly, 22 of the analyzed A. thaliana accessions show the same sequence pattern as A. lyrata, whereas only four accessions resemble the Col-0 pattern, which is typically considered the A. thaliana reference. Regarding the second amino acid exchange, only one accession (Cvi) showed the A. lyrata pattern, that is, a substitution at position 94 (fig. 5), leading to an exchange of serine to alanine.
FIG. 6. Intra- and interspecific comparisons of inferred amino acid sequences of all rbcS genes in Arabidopsis thaliana and Arabidopsis lyrata. Dots indicate identity to the reference sequence of rbcS-1a of A. thaliana (Col-0), whereas capital letters show amino acid substitutions. The first amino acid of the mature protein is methionine at position 56 (bold letter). Intraspecific polymorphisms within Rubisco genes are highlighted in gray.
Figure 6 presents an alignment of inferred amino acid
sequences for all four rbcS genes of A. thaliana and A. lyrata.
RbcS-1a is clearly divergent from the rbcS-b genes in both
species. Moreover, there is less divergence between rbcS-1a
genes among the two species than between rbcS-1a and
the rbcS-b genes within either species (amino acid positions
10, 13, 35, 54, 57, 79, and 113). Apparently, rbcS-1a and the
rbcS-b genes evolve independently, at least since the speciation of A. thaliana and A. lyrata from their most recent
common ancestor. Among the three rbcS-b genes, we detected a different evolutionary pattern. Here, divergence
among paralogous genes within species is less than among orthologs between species (see amino acid positions 143,
151, and 162; cf. table 4). This indicates concerted evolution,
which has already been reported for rbcS genes (Pichersky
et al. 1986; Dean et al. 1987, 1989).
Phylogenetic Relationship and Geographical Origin
Based on 2,370 bp of sequence from three nuclear Rubisco genes (rbcS-1a, rbcS-2b, and rbcS-3b; rbcS-1b was excluded because of the gene duplication/loss event) of 26 accessions, we calculated a maximum likelihood tree (fig. 7). We initially included A. lyrata in our analysis, which yielded a tree poorly resolved for accessions of A. thaliana due to the considerable divergence between the two species (small inset graph in fig. 7). The gene tree for A. thaliana accessions displays four groups of haplotypes with no association with geographical origin. Regarding the relationship among Bur, Cvi, and Sha, that is, those accessions affected by the gene duplication/loss event (the exception type), Bur and Cvi are in haplotype group II, whereas Sha is in group I. There is hence no direct phylogenetic relationship among them, at least none detectable in the Rubisco genes. Comparing the geographical origin of exception accessions, there is no evident pattern either: Affected accessions occur in Ireland (Bur), Cape Verde Island (Cvi), Tadjikistan (Sha, Sorbo), Germany (Bch, Bu), Portugal (Fei), and France (Lm). So far, there is hence neither a clear phylogenetic nor a clear geographic pattern pointing to the putative origin of our inferred gene duplication/loss event in Rubisco.
FIG. 7. Rubisco supergene tree of Arabidopsis thaliana accessions. Unrooted maximum likelihood tree based on the composite supergene of rbcS-1a, rbcS-2b, and rbcS-3b. RbcS-1b/RbcS-2b* were excluded as only one of the two is present in any accession, due to the gene duplication/loss event. The unit for branch length is the number of nucleotide differences. If Arabidopsis lyrata is included (small inset graph on the right), divergence between the two species vastly exceeds intraspecific variation, precluding resolution among A. thaliana accessions.
Discussion
Several previous studies have investigated interspecific genetic variability of Rubisco genes among plants (Kapralov
and Filatov 2007; Andersson and Backlund 2008) as well as
genetic variability among accessions of A. thaliana (Kuittinen
and Aguadé 2000; Aguadé 2001; Lu and Rausher 2003;
Mauricio et al. 2003; Shepard and Purugganan 2003; Moore
et al. 2005; Balasubramanian et al. 2006; Ramos-Onsins et al.
2008). Here, we specifically focus on intraspecific genetic
variation in Rubisco genes among accessions of A. thaliana,
compared with the level of divergence from its sister species A. lyrata. After analyzing 100 worldwide distributed accessions, we found a gene duplication/loss event (the
exception in fig. 4) in eight accessions, that is, Bch, Bu,
Bur, Cvi, Fei, Lm, Sha, and Sorbo. In these accessions, the
major part of rbcS-1b was lost and replaced by rbcS-2b
(fig. 1). Our analysis points toward gene conversion as a possible mechanism for this substitution. Gene conversion is
promoted among gene duplicates with high sequence similarities (Dean et al. 1989), like the three Rubisco small subunit paralogs located on chromosome 5 (cf. table 4). Small
subunits are characterized by unique and partially overlapping gene expression patterns (Dedonder et al. 1993; Yoon et al. 2001; Sawchuk et al. 2008). In leaves, the rbcS-1b lost in our exception accessions is specifically expressed on the abaxial leaf side. It has been discussed whether this subunit makes the Rubisco holoenzyme more efficient under light-limiting conditions (lower side of the leaf). RbcS-1b and rbcS-2b are distinguished from one another by four amino acid differences. Two of them are within the chloroplast transit peptide at positions 6 and 9 and are cleaved off before assembly of the holoenzyme. The other differences are at position 77 (rbcS-1b: threonine; rbcS-2b: serine) and 180 (rbcS-1b: aspartic acid; rbcS-2b: glutamic acid). Threonine and serine both carry hydroxyl groups and differ only by an additional methyl group in threonine. The exchange between threonine and serine is typically tolerated (Betts and Russell 2003). The same holds true for the substitution of aspartic acid by glutamic acid, again two amino acids differing only by one additional methylene group (in glutamic acid). In their amino acid sequences, rbcS-1b and rbcS-2b hence exhibit only subtle differences, which probably have little effect on selection.
The analysis of “functionally important regions” in promoters of rbcS genes revealed variability among A. thaliana accessions. In the TATA box (CCACTATATAAAGA; 73 to 86) of rbcS-2b of Bur and Sha, a C to T substitution occurred at position 86 (position 1 of the TATA box). Mutations at positions 7–9 in the core of the prototype TATA box (TCACTATATATAG, invariant in the most common highly expressed plant genes) have been demonstrated to influence light-dependent transcription efficiency and the formation of transcriptional complexes (Kiran et al. 2006; Ranjan et al. 2009). Comparing the prototype TATA box with that of rbcS-2b, the substitution at position 1 in Bur and Sha makes this position identical to the prototype and therefore probably has no implication for gene expression. In Cvi, the first three bases of the TATA box are lost due to a 29-bp deletion at its upstream end, and a possible impact on gene expression remains to be evaluated. The same holds true for the detected modifications (i.e., one substitution, one indel) of AtREG647. Unfortunately, the function of this regulatory element is unknown (PPDB). We detected intraspecific variability in one module (IbAM5) of the rbcS minimal light-regulatory unit of rbcS-1b and rbcS-2b in accessions Bur, Cvi, and Sha.
We provide evidence for differences in substitution rates
among Rubisco genes, which are indicative of differences in
underlying mutation rates. This coincides with the chromosomal arrangement of the genes. The faster evolving rbcS-1b, rbcS-2b, and rbcS-3b are tandemly arrayed within 8 kb on chromosome 5. RbcL is encoded in the chloroplast genome, which is known to evolve more slowly than nuclear DNA (Wolfe et al. 1987; Lynch et al. 2006). Why rbcS-1a, a nuclear gene on chromosome 1, evolves as slowly as rbcL is not fully evident. Both are highly expressed, important genes that are highly conserved because of purifying selection,
but this does not explain the almost complete absence of
synonymous substitutions in exons as well, compared with
the more variable Rubisco genes on chromosome 5.
Regarding the rbcS-b genes, we detected variability in protein-coding sequences among A. thaliana accessions. Because the small subunits are encoded in the nucleus, they need a target sequence for import into the chloroplast. The respective chloroplast transit peptide is 55 amino acids long (Krebbers et al. 1988) and is cleaved after entering the chloroplast. Amino acid substitutions in rbcS-2b at positions 6, 32, and 48 as well as in rbcS-3b at position 47 are within the target sequence and might therefore be without impact on the mature protein. In rbcS-1b, all substitutions (positions 72, 74, and 180) are in the mature protein. One of them, the tyrosine to aspartic acid substitution at position 72 (Y72D), was statistically inferred to be a PSS, although rbcS-1b as well as the remaining Rubisco genes are under purifying selection overall. The
secondary structure of rbcS in general and the positions of several α-helices and β-strands are known (Spreitzer 2003). Although no substitutions occurred within α-helices and β-strands, two amino acid substitutions in A. thaliana accessions El, Er, and Gre (positions 72 and 74) might have implications for the Rubisco holoenzyme. We compared
the positions of intraspecific substitutions with those created artificially in pea (Flachmann and Bohnert 1992), Chlamydomonas (Du et al. 2000; Spreitzer et al. 2001), A. nidulans (= Synechococcus PCC6301) (Voordouw et al. 1987; Lee et al. 1991; Paul et al. 1991; Read and Tabita 1992; Flachmann et al. 1997; Kostov et al. 1997), and Anabaena 7120 (Fitchen et al. 1990). Those authors modified small subunits at specific sites, mostly within structurally important regions, and analyzed the implications for the Rubisco holoenzyme. Kostov et al. (1997) analyzed an exchange of tyrosine to aspartic acid at position 17 (Y17D) in A. nidulans, a unicellular cyanobacterium in which the genes for the large and the small subunit are encoded in a single operon. This substitution leads to a substantially lower carboxylase activity in A. nidulans, down to 14% compared with wild type, as well as a reduced specific activity (5%). The substitution Y17D almost abolishes carboxylase activity in the assembled enzyme, a result never reported before for any mutation in the small subunits. These authors infer that the Y17D substitution in the single small subunit of A. nidulans should indirectly but profoundly alter the active site structure in proximal dimers of large subunits without affecting the assembly itself (Kostov et al. 1997). The small subunits of A. thaliana (rbcS-1b) and A.
nidulans share only 28.7% sequence identity. However, there is a motif consisting of the 8 amino acids FETLSYLP (positions 12–19 in A. nidulans; 67–74 in A. thaliana), which is identical in these very distantly related species and also highly conserved among plant species in general. Interestingly, the substitution Y17D in Anacystis, found to severely compromise carboxylase activity, is at exactly the same position as our substitution Y72D among A. thaliana accessions, which we inferred to be under positive selection: In both species, the substitution alters the sixth position of the FETLSYLP motif, changing it to FETLSDLP.
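The position bookkeeping in this paragraph can be checked with a short script: which substitutions fall inside the 55-residue transit peptide, and how positions map onto the FETLSYLP motif. It is a sketch only. The motif start positions (67 in A. thaliana, 12 in A. nidulans) and the substitution labels are taken from the text, and the helper names are hypothetical.

```python
import re

TRANSIT_PEPTIDE_LENGTH = 55  # Krebbers et al. (1988), as cited in the text
MOTIF = "FETLSYLP"
MOTIF_START = {"A. thaliana": 67, "A. nidulans": 12}


def parse_substitution(label):
    """Split a label such as 'Y72D' into ('Y', 72, 'D')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def in_transit_peptide(label):
    """True if the substituted residue lies within the cleaved transit peptide."""
    return parse_substitution(label)[1] <= TRANSIT_PEPTIDE_LENGTH


def motif_index(protein_pos, species):
    """1-based position of a residue within the FETLSYLP motif."""
    return protein_pos - MOTIF_START[species] + 1


observed = ["F6L", "S32A", "S48N", "A47T", "Y72D", "P74R", "D180N"]
print([s for s in observed if in_transit_peptide(s)])                  # ['F6L', 'S32A', 'S48N', 'A47T']
print(motif_index(72, "A. thaliana"), motif_index(17, "A. nidulans"))  # 6 6
print(motif_index(74, "A. thaliana"), motif_index(19, "A. nidulans"))  # 8 8

i = motif_index(72, "A. thaliana") - 1
print(MOTIF[:i] + "D" + MOTIF[i + 1:])  # FETLSDLP
```

The same mapping shows that the proline at A. thaliana position 74 (P74R in Gre) corresponds to position 19 in A. nidulans, the eighth residue of the motif.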
We inferred that in A. thaliana, the substitution Y72D is
associated with a putative recombination event. It has been
shown by simulations that recombination can create polymorphisms erroneously inferred to be under positive selection (Casola and Hahn 2009). Although this effect is
considered to be less pronounced under the selection models favored in our analysis (i.e., M7 vs. M8; Anisimova et al.
2003), our inference of positive selection at this polymorphic site should be treated with caution. It remains to be evaluated whether this substitution, which we found in three A. thaliana accessions, also influences carboxylase activity.
Position 19 in the small subunit of A. nidulans, which
corresponds to the substituted position 74 in A. thaliana
accession Gre, was engineered as well. This position, originally exhibiting proline, was replaced by alanine (Kostov
et al. 1997) as well as histidine (Lee et al. 1991). In both
cases, the mutated enzymes showed almost full carboxylase
activity and a similar CO2/O2 specificity factor compared
with the wild type. It was demonstrated in spinach that
proline at that specific position triggers a side-chain interaction with the large subunit (Schneider et al. 1990). Replacements at this position by other small amino acids,
like alanine and histidine, might not influence the holoenzyme very much. However, in accession Gre proline was
substituted by arginine at that position, a disfavored exchange with the potential to compromise protein function
(Betts and Russell 2003).
It was already known that Bur, Cvi, and Sha are highly divergent from most other accessions analyzed so far (Schmid et al. 2003; Nordborg et al. 2005; Ossowski et al. 2008). For this reason, they are often included in investigations of photosynthesis.
Sulpice et al. (2007) measured total and initial Rubisco activities in 118 accessions, including all our exception accessions except Bch. Thus, it was possible to assess whether Rubisco activity is influenced by the gene duplication/loss event. Although total and initial activities in Cvi are the lowest among all analyzed accessions, all other exception accessions show intermediate Rubisco activity levels. Because Rubisco is regulated by a complex network and rbcS genes are differentially regulated, the limited set of environmental conditions analyzed so far might not be sufficient to exclude any functional implication of
the gene duplication/loss found in our study. To further
evaluate this issue, it would be interesting to compare both
groups (i.e., standard and exception) regarding photosynthetic activity and growth under different environmental conditions, such as light or temperature.
Neither our phylogenetic nor our geographic analysis revealed a clear indication of the putative origin of our gene
duplication/loss. The striking similarity across exception accessions with regard to the position of the duplication/loss
as well as of the associated insertion/deletion pattern in the
respective promoter region is nevertheless suggestive of
a single evolutionary origin of this particular gene variant
(i.e., rbcS-2b*). However, there is apparently no single common ancestor of the accessions bearing that gene (see
above and fig. 7). Arabidopsis thaliana typically reproduces
via selfing. Nonetheless, there is the possibility of ancient
recombination due to occasional sexual reproduction (Tian
et al. 2002; Zhang and Gaut 2003; Nordborg et al. 2005). In
fact, average recombination rates for A. thaliana have been found to be surprisingly high (4.8 cM/Mb) compared with other eukaryotes (average 0.7 cM/Mb in maize, 2.9 cM/Mb in Drosophila, and 1.5 cM/Mb in humans; Zhang and Gaut 2003). This could explain the decoupling of the ancestry
pattern for this particular gene from the ancestry of the
accessions bearing it. Yet, it remains difficult to imagine
interbreeding among A. thaliana accessions that now occur
in distant regions (Germany, Ireland, Cape Verde Island,
Portugal, France, and Tadjikistan). On a global scale, A. thaliana populations have been demonstrated to be geographically structured (Nordborg et al. 2005; Schmid et al. 2006). It
remains to be elucidated why neither the Rubisco gene tree in total nor the occurrence of our duplication/loss at rbcS-1b follows any geographic pattern.
Although Rubisco, the most abundant protein in nature (Ellis 1979), is well investigated, with >5,000 publications
(Portis and Parry 2007), the function and interplay of
the different small subunits in higher plants remain mysterious. Future studies should focus on the contribution of
the different genes for the small subunits to the composition of the Rubisco holoenzyme (Spreitzer 2003), in order
to unravel a potential functional implication of the gene
duplication/loss event as well as the manifold alterations
in promoter and protein coding sequences occurring in
Rubisco genes among accessions of the model plant
A. thaliana.
Supplementary Material
Supplementary table 1 is available at Molecular Biology and
Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We thank Bernd Müller-Röber, Mark Stitt, and Angelo
Valleriani for stimulating discussions on Rubisco gene expression and evolution. Madlen Stange and Fanny Wegner
participated in the laboratory work. Financial support from the
Bundesministerium für Bildung und Forschung to the GoFORSYS
initiative (grant no. 313924) is acknowledged.
References
Aguadé M. 2001. Nucleotide sequence variation at two genes of the
phenylpropanoid pathway, the FAH1 and F3H genes, in
Arabidopsis thaliana. Mol Biol Evol. 18:1–9.
Andersson I. 2008. Catalysis and regulation in Rubisco. J Exp Bot.
59:1555–1568.
Andersson I, Backlund A. 2008. Structure and function of Rubisco.
Plant Physiol Biochem. 46:275–291.
Anisimova M, Nielsen R, Yang Z. 2003. Effect of recombination on
the accuracy of the likelihood method for detecting positive
selection at amino acid sites. Genetics 164:1229–1236.
Baker TS, Eisenberg D, Eiserling FA, Weissman L. 1975. The structure
of form I crystals of D-ribulose-1,5-diphosphate carboxylase. J
Mol Biol. 91:391–399.
Balasubramanian S, Sureshkumar S, Agrawal M, Michael TP,
Wessinger C, Maloof JN, Clark R, Warthmann N, Chory J,
Weigel D. 2006. The PHYTOCHROME C photoreceptor gene
mediates natural variation in flowering and growth responses of
Arabidopsis thaliana. Nat Genet. 38:711–715.
Bedbrook JR, Coen DM, Beaton AR, Bogorad L, Rich A. 1979.
Location of the single gene for the large subunit of ribulosebisphosphate carboxylase on the maize chloroplast chromosome. J
Biol Chem. 254:905–910.
Betts MJ, Russell RB. 2003. Amino acid properties and consequences
of substitutions. In: Barnes MR, Gray IC, editors. Bioinformatics
for geneticists. West Sussex (UK): Wiley.
Casola C, Hahn MW. 2009. Gene conversion among paralogs results
in moderate false detection of positive selection using likelihood
methods. J Mol Evol. 68:679–687.
Dean C, Pichersky E, Dunsmuir P. 1989. Structure, evolution, and
regulation of rbcS genes in higher plants. Annu Rev Plant Physiol
Plant Mol Biol. 40:415–439.
Dean C, van den Elzen P, Tamaki S, Black M, Dunsmuir P,
Bedbrook J. 1987. Molecular characterization of the rbcS
multi-gene family of Petunia (Mitchell). Mol Gen Genet.
206:465–474.
Dedonder A, Rethy R, Fredericq H, van Montagu M, Krebbers E.
1993. Arabidopsis rbcS genes are differentially regulated by light.
Plant Physiol. 101:801–808.
Donald RG, Cashmore AR. 1990. Mutation of either G box or I box
sequences profoundly affects expression from the Arabidopsis
rbcS-1A promoter. EMBO J. 9:1717–1726.
Du YC, Hong S, Spreitzer RJ. 2000. RbcS suppressor mutations
improve the thermal stability and CO2/O2 specificity of rbcL-mutant ribulose-1,5-bisphosphate carboxylase/oxygenase. Proc
Natl Acad Sci U S A. 97:14206–14211.
Ellis RJ. 1979. The most abundant protein in the world. Trends
Biochem Sci. 4:241–244.
Fitchen JH, Knight S, Andersson I, Branden CI, McIntosh L. 1990.
Residues in three conserved regions of the small subunit of
ribulose-1,5-bisphosphate carboxylase/oxygenase are required
for quaternary structure. Proc Natl Acad Sci U S A. 87:5768–5772.
Flachmann R, Bohnert HJ. 1992. Replacement of a conserved
arginine in the assembly domain of ribulose-1,5-bisphosphate
carboxylase/oxygenase small subunit interferes with holoenzyme
formation. J Biol Chem. 267:10576–10582.
Flachmann R, Zhu C, Jensen RG, Bohnert HJ. 1997. Mutations in the
small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase increase the formation of the misfire product xylulose-1,5-bisphosphate. Plant Physiol. 114:131–136.
Gene Duplication/Loss in Rubisco of A. thaliana · doi:10.1093/molbev/msr008
Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment
editor and analysis program for Windows 95/98/NT. Nucleic
Acids Symp Ser. 41:95–98.
JGI. Joint Genome Institute. [cited 2009 October]. Available from:
http://www.jgi.doe.gov/.
Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In:
Munro HN, editor. Mammalian protein metabolism. New York:
Academic Press. p. 21–132.
Kapralov MV, Filatov DA. 2007. Widespread positive selection in the
photosynthetic Rubisco enzyme. BMC Evol Biol. 7:73.
Kiran K, Ansari SA, Srivastava R, Lodhi N, Chaturvedi CP, Sawant SV,
Tuli R. 2006. The TATA-box sequence in the basal promoter
contributes to determining light-dependent gene expression in
plants. Plant Physiol. 142:364–376.
Kostov RV, Small CL, McFadden BA. 1997. Mutations in a sequence
near the N-terminus of the small subunit alter the CO2/O2
specificity factor for ribulose bisphosphate carboxylase/oxygenase. Photosynth Res. 54:127–134.
Krebbers E, Seurinck J, Herdies L, Cashmore AR, Timko MP.
1988. Four genes in two diverged subfamilies encode the
ribulose-1,5-bisphosphate carboxylase small subunit polypeptides of Arabidopsis thaliana. Plant Mol Biol. 11:745–759.
Kuittinen H, Aguadé M. 2000. Nucleotide variation at the
CHALCONE ISOMERASE locus in Arabidopsis thaliana. Genetics
155:863–872.
Lee B, Berka RM, Tabita FR. 1991. Mutations in the small subunit of
cyanobacterial ribulose-bisphosphate carboxylase/oxygenase
that modulate interactions with large subunits. J Biol Chem.
266:7417–7422.
Librado P, Rozas J. 2009. DnaSP v5: a software for comprehensive
analysis of DNA polymorphism data. Bioinformatics 25:
1451–1452.
López-Ochoa L, Acevedo-Hernández G, Martı́nez-Hernández A,
Argüello-Astorga G, Herrera-Estrella L. 2007. Structural relationships between diverse cis-acting elements are critical for the
functional properties of a rbcS minimal light regulatory unit. J
Exp Bot. 58:4397–4406.
Lu Y, Rausher MD. 2003. Evolutionary rate variation in anthocyanin
pathway genes. Mol Biol Evol. 20:1844–1853.
Lynch M, Koskella B, Schaack S. 2006. Mutation pressure and the
evolution of organelle genomic architecture. Science 311:
1727–1730.
Mauricio R, Stahl EA, Korves T, Tian D, Kreitman M, Bergelson J.
2003. Natural selection for polymorphism in the disease
resistance gene Rps2 of Arabidopsis thaliana. Genetics
163:735–746.
Moore RC, Grant SR, Purugganan MD. 2005. Molecular population
genetics of redundant floral-regulatory genes in Arabidopsis
thaliana. Mol Biol Evol. 22:91–103.
Nei M. 1987. Molecular evolutionary genetics. New York: Columbia
University Press.
Nei M, Kumar S. 2000. Molecular evolution and phylogenetics. New
York: Oxford University Press.
Nishio JN, Sun J, Vogelmann TC. 1993. Carbon fixation gradients
across spinach leaves do not follow internal light gradients. Plant
Cell. 5:953–961.
Nordborg M, Hu TT, Ishino Y, et al. (11 co-authors). 2005. The
pattern of polymorphism in Arabidopsis thaliana. PLoS Biol.
3:1289–1299.
Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N,
Weigel D. 2008. Sequencing of natural strains of Arabidopsis
thaliana with short reads. Genome Res. 18:2024–2033.
Paul K, Morell MK, Andrews TJ. 1991. Mutations in the small
subunit of ribulosebisphosphate carboxylase affect subunit
binding and catalysis. Biochemistry 30:10019–10026.
MBE
Pichersky E, Bernatzky R, Tanksley SD, Cashmore AR. 1986. Evidence
for selection as a mechanism in the concerted evolution of
Lycopersicon esculentum (tomato) genes encoding the small
subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase.
Proc Natl Acad Sci U S A. 83:3880–3884.
Portis AR, Parry MAJ. 2007. Discoveries in Rubisco (ribulose 1,5-bisphosphate carboxylase/oxygenase): a historical perspective.
Photosynth Res. 94:121–143.
Ramos-Onsins SE, Puerma E, Balañá-Alcaide D, Salguero D,
Aguadé M. 2008. Multilocus analysis of variation using a large
empirical data set: phenylpropanoid pathway genes in Arabidopsis thaliana. Mol Ecol. 17:1211–1223.
Ranjan A, Ansari SA, Srivastava R, Mantri S, Asif MH, Sawant SV,
Tuli R. 2009. A T9G mutation in the prototype TATA-box
TCACTATATATAG determines nucleosome formation and
synergy with upstream activator sequences in plant promoters.
Plant Physiol. 151:2174–2186.
Read BA, Tabita FR. 1992. Amino acid substitutions in the small
subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase that
influence catalytic activity of the holoenzyme. Biochemistry
31:519–525.
Rhee SY, Beavis W, Berardini TZ, et al. (11 co-authors). 2003. The
Arabidopsis Information Resource (TAIR): a model organism
database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res.
31:224–228.
Rogers SO, Bendich AJ. 1985. Extraction of DNA from milligram
amounts of fresh, herbarium and mummified plant tissues. Plant
Mol Biol. 5:69–76.
Sawchuk MG, Donner TJ, Head P, Scarpella E. 2008. Unique and
overlapping expression patterns among members of photosynthesis-associated nuclear gene families in Arabidopsis. Plant
Physiol. 148:1908–1924.
Sawyer SA. 1989. Statistical tests for detecting gene conversion. Mol
Biol Evol. 6:526–538.
Schmid KJ, Sörensen TR, Stracke R, Törjék O, Altmann T, Mitchell-Olds T, Weisshaar B. 2003. Large-scale identification and analysis
of genome-wide single nucleotide polymorphisms for mapping
in Arabidopsis thaliana. Genome Res. 13:1250–1257.
Schmid KJ, Törjék O, Meyer R, Schmuths H, Hoffmann MH,
Altmann T. 2006. Evidence for a large-scale population
structure of Arabidopsis thaliana from genome-wide single
nucleotide polymorphism markers. Theor Appl Genet. 112:
1104–1114.
Schneider G, Knight S, Andersson I, Brändén CI, Lindqvist Y,
Lundqvist T. 1990. Comparison of the crystal structures of L2
and L8S8 Rubisco suggests a functional role for the small
subunit. EMBO J. 9:2045–2050.
Shepard KA, Purugganan MD. 2003. Molecular population genetics of
the Arabidopsis CLAVATA2 region: the genomic scale of variation
and selection in a selfing species. Genetics 163:1083–1095.
Spreitzer RJ. 2003. Role of the small subunit in ribulose-1,5-bisphosphate carboxylase/oxygenase. Arch Biochem Biophys. 414:
141–149.
Spreitzer RJ, Esquivel MG, Du YC, McLaughlin PD. 2001. Alanine-scanning mutagenesis of the small-subunit βA–βB loop of
chloroplast ribulose-1,5-bisphosphate carboxylase/oxygenase:
substitution at Arg-71 affects thermal stability and CO2/O2
specificity. Biochemistry 40:5615–5621.
Spreitzer RJ, Salvucci ME. 2002. Rubisco: structure, regulatory
interactions, and possibilities for a better enzyme. Annu Rev
Plant Biol. 53:449–475.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models.
Bioinformatics 22:2688–2690.
Schwarte and Tiedemann · doi:10.1093/molbev/msr008
Sulpice R, Tschoep H, von Korff M, Büssis D, Usadel B, Höhne M,
Witucka-Wall H, Altmann T, Stitt M, Gibon Y. 2007. Description
and applications of a rapid and sensitive non-radioactive
microplate-based assay for maximum and initial activity of D-ribulose-1,5-bisphosphate carboxylase/oxygenase. Plant Cell
Environ. 30:1163–1175.
Sun J, Nishio J. 2001. Why abaxial illumination limits photosynthetic carbon fixation in spinach leaves. Plant Cell Physiol.
42:1–8.
TAIR. The Arabidopsis Information Resource. TAIR9 genome
release [cited 2009 Jun 19]. Available from: http://www.
arabidopsis.org.
Tajima F. 1989. Statistical method for testing the neutral mutation
hypothesis by DNA polymorphism. Genetics 123:585–595.
Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version 4.0.
Mol Biol Evol. 24:1596–1599.
Tian D, Araki H, Stahl E, Bergelson J, Kreitman M. 2002. Signature of
balancing selection in Arabidopsis. Proc Natl Acad Sci U S A.
99:11525–11530.
Voordouw G, de Vries PA, van den Berg WAM, de Clerck EPJ. 1987.
Site-directed mutagenesis of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase from Anacystis nidulans.
Eur J Biochem. 163:591–598.
Watterson GA. 1975. On the number of segregating sites in
genetic models without recombination. Theor Popul Biol. 7:
253–276.
Wolfe KH, Li WH, Sharp PM. 1987. Rates of nucleotide substitution
vary greatly among plant mitochondrial, chloroplast, and
nuclear DNAs. Proc Natl Acad Sci U S A. 84:9054–9058.
Wong WSW, Yang Z, Goldman N, Nielsen R. 2004. Accuracy and
power of statistical methods for detecting adaptive evolution in
protein coding sequences and for identifying positively selected
sites. Genetics 168:1041–1051.
Yamamoto YY, Obokata J. 2008. PPDB: a plant promoter database.
Nucleic Acids Res. 36:D977–D981.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput Appl Biosci. 13:555–556.
Yang Z. 2000. Phylogenetic analysis by maximum likelihood (PAML),
version 3.0. London: University College. Available from: http://
abacus.gene.ucl.ac.uk/software/paml.html.
Yang Z, Wong WSW, Nielsen R. 2005. Bayes empirical Bayes
inference of amino acid sites under positive selection. Mol Biol
Evol. 22:1107–1118.
Yoon M, Putterill JJ, Ross GS, Laing WA. 2001. Determination of the
relative expression levels of Rubisco small subunit genes in
Arabidopsis by rapid amplification of cDNA ends. Anal Biochem.
291:237–244.
Zhang L, Gaut BS. 2003. Does recombination shape the distribution
and evolution of tandemly arrayed genes (TAGs) in the
Arabidopsis thaliana genome? Genome Res. 13:2533–2540.
Zhu L, Zhang Y, Zhang W, Yang S, Chen J-Q, Tian D. 2009. Patterns
of exon–intron architecture variation of genes in eukaryotic
genomes. BMC Genomics. 10:47.