A Gene Duplication/Loss Event in the Ribulose-1,5-Bisphosphate-Carboxylase/Oxygenase (Rubisco) Small Subunit Gene Family among Accessions of Arabidopsis thaliana

Sandra Schwarte and Ralph Tiedemann*
Evolutionary Biology, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
*Corresponding author: E-mail: [email protected].
Associate editor: Neelima Sinha

Abstract

Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase; EC 4.1.1.39), the most abundant protein in nature, catalyzes the assimilation of CO2 (worldwide about 10¹¹ t each year) by carboxylation of ribulose-1,5-bisphosphate. It is a hexadecamer consisting of eight large and eight small subunits. Although the Rubisco large subunit (rbcL) is encoded by a single gene on the multicopy chloroplast genome, the Rubisco small subunits (rbcS) are encoded by a family of nuclear genes. In Arabidopsis thaliana, the rbcS gene family comprises four members, that is, rbcS-1a, rbcS-1b, rbcS-2b, and rbcS-3b. We sequenced all Rubisco genes in 26 A. thaliana accessions sampled from across the species' worldwide distribution. In three of these accessions, we detected a gene duplication/loss event, where rbcS-1b was lost and substituted by a duplicate of rbcS-2b (called rbcS-2b*). By screening 74 additional accessions using a specific polymerase chain reaction assay, we detected five additional accessions with this duplication/loss event. In summary, we found the gene duplication/loss in 8 of 100 A. thaliana accessions, namely, Bch, Bu, Bur, Cvi, Fei, Lm, Sha, and Sorbo. We also sequenced a promoter region of about 1 kb for all Rubisco genes. This analysis revealed that the gene duplication/loss event was associated with promoter alterations (two insertions of 450 and 850 bp, one deletion of 730 bp) in rbcS-2b and a promoter deletion (2.3 kb) in rbcS-2b* in all eight affected accessions. The substitution of rbcS-1b by a duplicate of rbcS-2b (i.e., rbcS-2b*) might be caused by gene conversion.
All four Rubisco genes evolve under purifying selection, as expected for central genes of the highly conserved photosystem of green plants. We inferred a single positively selected site, a tyrosine to aspartic acid substitution at position 72 in rbcS-1b. Exactly the same substitution compromises carboxylase activity in the cyanobacterium Anacystis nidulans. In A. thaliana, this substitution is associated with an inferred recombination. Functional implications of the substitution remain to be evaluated.

Key words: Arabidopsis thaliana, Arabidopsis lyrata, Rubisco, gene duplication, positive selection.

Introduction

Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase; EC 4.1.1.39) is located in the stroma of the chloroplast, where it catalyzes carboxylation in the Calvin cycle and oxygenation in the photorespiratory pathway. The composition of the Rubisco holoenzyme varies among species. In purple nonsulfur bacteria, several chemoautotrophic bacteria, and eukaryotic dinoflagellates, the enzyme complex is a dimer consisting only of two large subunits (form II), whereas in most chemoautotrophic bacteria, cyanobacteria, red, brown, and green algae, and all higher plants, the multimeric enzyme consists of eight Rubisco small subunits (rbcS) and eight Rubisco large subunits (rbcL) (form I) (Baker et al. 1975; Spreitzer and Salvucci 2002; Andersson and Backlund 2008). The large subunit (rbcL, AtCg00490) is encoded by a single gene in the chloroplast (Bedbrook et al. 1979). Small subunits are encoded by a multigene family in the nucleus (Dean et al. 1989). The number of genes encoding rbcS varies widely, ranging from two gene copies in Chlamydomonas to 22 or more genes in wheat (Spreitzer 2003). In Arabidopsis thaliana, the gene family comprises four members (rbcS-1a, At1g67090; rbcS-1b, At5g38430; rbcS-2b, At5g38420; rbcS-3b, At5g38410). With regard to their chromosomal location, they have been divided into two classes A and B (Krebbers et al.
1988), the former comprising only one copy (rbcS-1a) located on chromosome 1, whereas the latter rbcS-b genes are tandemly arrayed within 8 kb on chromosome 5.

Individual members of the rbcS gene family show unique expression levels as well as developmental and tissue-specific expression patterns (Donald and Cashmore 1990; Dedonder et al. 1993; Sawchuk et al. 2008). In general, rbcS-1a is the major form (highest expression level), whereas rbcS-1b, rbcS-2b, and rbcS-3b are minor forms (lower levels of expression) (Dedonder et al. 1993; Yoon et al. 2001; Sawchuk et al. 2008). Regarding developmental and tissue-specific expression patterns, the detailed analysis of small subunits in A. thaliana by Sawchuk et al. (2008) generally identified expression of all small subunits in seedlings as well as in mature plant organs, though tissue-specific patterns occurred: rbcS-1a, for instance, is the only one expressed in roots, where its biological function has not been revealed. An interesting peculiarity in the expression pattern has been described for leaves, where rbcS-1b is exclusively expressed in the abaxial (lower) side of the leaf. Although carbon fixation is proportional to light intensity and hence higher at the adaxial (upper) side of the leaf, Rubisco contents at the abaxial and adaxial surfaces are similar (Nishio et al. 1993; Sun and Nishio 2001). This suggests that rbcS-1b might modify the catalytic properties of the Rubisco holoenzyme such that its efficiency under light-limiting conditions is maintained (Sawchuk et al. 2008).

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 28(6):1861–1876. 2011 doi:10.1093/molbev/msr008 Advance Access publication January 10, 2011
It was demonstrated that expression levels of rbcS-1a, rbcS-2b, and rbcS-3b show specific responses to different light pulses, whereas rbcS-1b expression is not affected by any of them (Dedonder et al. 1993). This corresponds to the presence (rbcS-1a, rbcS-2b, and rbcS-3b) and the absence (rbcS-1b) of light-regulatory units (CMA5: conserved modular array 5) in the promoter of the Rubisco small subunits (Krebbers et al. 1988). Mutagenesis experiments on the G-box (C/A-CACGTGGC) and I-box (GATAAG) of the Arabidopsis rbcS-1a promoter showed that both are required for gene expression (Donald and Cashmore 1990). López-Ochoa et al. (2007) analyzed sequences of the G- and I-box as well as IbAM5 among plant species and concluded that the respective modules have potential binding sites for transcription factors and are essential for CMA5 activity. The spacing between G- and I-box (about 15–25 bp) is important for achieving functional combinatorial interactions, whereas the relative position does not seem to be critical.

Although large subunits contain the active sites for catalytic activity, it has been demonstrated that small subunits influence protein activity (Spreitzer and Salvucci 2002; Spreitzer 2003; Andersson 2008; Andersson and Backlund 2008). The βA–βB loop of the small subunits, which exhibits high variability in both length and nucleotide sequence, triggers numerous interactions between the large and the small subunits. It can hence profoundly influence the holoenzyme's stability and catalytic performance (Spreitzer et al. 2001; Spreitzer 2003). Rubisco enzymes with enlarged βA–βB loops, as in land plants and green algae, generally have higher specificity values for CO2/O2 than enzymes with normal βA–βB loops. In short, small subunits are not responsible for the catalytic activity itself but rather for fine tuning of the Rubisco holoenzyme.
RbcS has been repeatedly engineered by inducing amino acid substitutions at specific sites in different species, like pea (Flachmann and Bohnert 1992), Chlamydomonas (Du et al. 2000; Spreitzer et al. 2001), Anacystis nidulans (= Synechococcus PCC6301) (Voordouw et al. 1987; Lee et al. 1991; Paul et al. 1991; Read and Tabita 1992; Flachmann et al. 1997; Kostov et al. 1997), and Anabaena 7120 (Fitchen et al. 1990). Although some substitutions have no effect on Rubisco activity, others influence carboxylation activity, specificity, or even the formation of the holoenzyme.

In this study, we sequenced genes of the large as well as the four small subunits of Rubisco in 26 A. thaliana accessions sampled worldwide and detected a gene duplication/loss event, where rbcS-1b was lost and substituted by a duplicate of rbcS-2b (called rbcS-2b*) in three (Bur, Cvi, and Sha) of these accessions. We developed a specific polymerase chain reaction (PCR) assay by which we screened 74 additional accessions. The newly found gene duplication/loss event occurred in eight (Bch, Bu, Bur, Cvi, Fei, Lm, Sha, and Sorbo) of 100 accessions. We also sequenced a promoter region of about 1 kb for all Rubisco genes. This analysis revealed that the gene duplication/loss event was linked to promoter alterations (two insertions of 450 and 850 bp, one deletion of 730 bp) in rbcS-2b and a promoter deletion (2.3 kb) in rbcS-2b* in all eight affected accessions. Functional implications of the gene duplication/loss event for the Rubisco holoenzyme as well as intraspecific variability among A. thaliana accessions and interspecific polymorphisms between A. thaliana and its sister species Arabidopsis lyrata are discussed.

Material and Methods

Plant Cultivation, PCR Amplification, and Sequencing

Seedlings as well as adult plants were grown in a 1:1 mixture of GS 90 soil and vermiculite.
To break dormancy prior to germination, seeds were incubated at 4 °C for at least 2 days before transfer to a short-day regime (12 h light [120 µE m⁻² s⁻¹] at 20 °C/12 h dark at 18 °C). Leaves were harvested after 4 weeks, and genomic DNA was extracted from a pool of three plants per accession with a modified cetyl trimethyl ammonium bromide procedure (Rogers and Bendich 1985). Primers for the Rubisco genes rbcL (AtCg00490), rbcS-1a (At1g67090), rbcS-1b (At5g38430), rbcS-2b (At5g38420), and rbcS-3b (At5g38410) were designed based on the sequence of Col-0 (see supplementary table, Supplementary Material online). For amplification and sequencing of coding regions, primers were placed about 50–200 bp upstream (forward primer) and downstream (reverse primer) of the coding region, respectively. The promoter region was analyzed with primers amplifying about 1.0–1.5 kb upstream of the start codon.

The fragments of 26 accessions sampled worldwide (Bl-1, Bur-0, Can-0, Cha-0, Col-0, Ct-1, Cvi-0, Edi-0, El-0, Er-0, Est-1, Gre-0, Ler-1, Mt-0, Nok-2, Oy-0, Rsch-0, Sap-0, Sha[kdara], Stw-0, Te-0, Tsu-1, Van-0, Wil, Ws-3, and Yo-0; hereafter called “sample set I”) were amplified with the proofreading polymerase Phusion (Finnzymes) and purified enzymatically by using Exonuclease I and Antarctic Phosphatase (New England Biolabs). The template was directly used for sequencing on an ABI 3130xl automated sequencer (Applied Biosystems), using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems).

After detection of a gene duplication/loss event (see below), we developed a specific PCR assay to screen 74 additional accessions for this event (AK-1, Akita, Ang-0, Bay-0, Bch-1, Bd-0, Be-0, Bla-11, Blh-1, Bor4, Br-0, Bsch-2, Bu-2, C24, Cl-0, Co-3, Da-0, Da-112, Dijon-M, Dr-0, Dra-0, Ei-2, Enkheim-D, Ep-0, Fei-0, Ge-2, Goe-2, Gol-1, GOT-7, Gr, HI-3, HOG, Hs-0, Is-1, Je-54, Jea, Kae-0, Kl-0, Kn-0, Konchezero [N13], Kondara-0, Lan-0, Lip-0, Lm, Lov 5, Lu, Mh-1, Ms-0, Nd, NFA8, No-0, Nw-3, Old-1, Ove-0, Petergof, Po-0, Pr-0, Pt-0, Pyl-0, Rak-2, RLD-1, RRS-10, RRS-7, Ru[bezhnoe]-1, Sorbo, St-0, Ta-0, TAMM-2, Ts-1, Ty-0, Wc-2, Wei-1, Zue-1, Wil-0; hereafter called “sample set II”). The primers for detecting the gene duplication/loss event were placed in intron 3 as well as in the 3′-untranslated region (UTR) of rbcS-1b. In those accessions of sample set II where the PCR assay indicated the gene duplication/loss event, we verified it by sequencing the affected loci.

Data Analysis

Alignment

Sequences were assembled with BioEdit version 7.0.5 (Hall 1999), and all variable sites were checked manually during the construction of a sequence contig for each accession. All sequences were manually aligned to the reference sequence of Col-0.

Estimation of Nucleotide Polymorphism

Genetic variability measures were calculated based on the 26 initially sequenced accessions (sample set I) for rbcL (AtCg00490), rbcS-1a (At1g67090), rbcS-2b (At5g38420), and rbcS-3b (At5g38410). For rbcS-1b (At5g38430), this analysis was performed with only 23 sequences, as three accessions (Bur, Cvi, and Sha) turned out to lack this gene (see below). With DnaSP version 5 (Librado and Rozas 2009), both intra- and interspecific levels of nucleotide polymorphism were determined. We performed a multidomain analysis to estimate the number of polymorphic sites (S), total number of substitutions (η), number of alleles (h), haplotype diversity (Hd), nucleotide diversity (π), and GC content, separately for promoters, exons, and introns. Nucleotide diversity (π) and divergence (K) were determined according to Nei (1987). For an interspecific comparison, we used sequences of A. lyrata (JGI).
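The summary statistics named above can be illustrated with a minimal sketch. This is not the DnaSP implementation: the function name `diversity_stats` and the toy alignment are hypothetical, gaps and missing data are not treated specially, and Hd and π are computed with the standard sample-size-corrected estimators, which we assume correspond to those of Nei (1987).

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, h, Hd, pi) for a list of equal-length aligned sequences.

    Hypothetical helper for illustration only; every character,
    including '-', is treated as an ordinary state.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: alignment columns showing more than one state
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # h: number of distinct haplotypes (alleles)
    counts = Counter(seqs)
    h = len(counts)

    # Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # pi: mean number of pairwise differences, expressed per site
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    pi = total_diffs / n_pairs / length

    return s_sites, h, hd, pi


# Toy alignment (invented data, not from the accessions studied here)
example = ["ACGT", "ACGT", "ACGA", "TCGA"]
S, h, Hd, pi = diversity_stats(example)
print(S, h, round(Hd, 3), round(pi, 4))
```

In a multidomain analysis, a function like this would be applied separately to the promoter, exon, and intron slices of each alignment.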
Promoter Analyses

We sequenced the promoter region of about 1.0 kb for all Rubisco genes to check whether polymorphic sites affect “functionally important elements,” according to the plant promoter database (PPDB; Yamamoto and Obokata 2008), which we searched for regulatory elements and other important promoter regions, like the TATA box.

Inference of Gene Conversion and Recombination

We searched for gene conversion within any accession across the three Rubisco paralogs on chromosome 5 (i.e., rbcS-1b/rbcS-2b*, rbcS-2b, and rbcS-3b), using the GENECONV software with default settings (Sawyer 1989). For each gene separately, the number of putative recombination events was inferred with DnaSP version 5 (Librado and Rozas 2009).

Evaluation of Genewise Selection

We tested whether nucleotide substitution patterns are indicative of natural selection acting upon them. Specifically, we compared the relative frequencies of synonymous substitutions per synonymous site (dS) with those of nonsynonymous substitutions per nonsynonymous site (dN) (Tamura et al. 2007; implemented in MEGA version 4). The nonsynonymous to synonymous substitution rate ratio (ω) was calculated according to the modified model of Nei and Gojobori (Nei and Kumar 2000) with the correction of Jukes and Cantor (1969) for saturation/multiple hits. With a Z-test, we assessed the likelihood of the null hypothesis of neutral evolution (H0: dN = dS), relative to two alternatives, that is, purifying selection (dN < dS) and positive selection (dN > dS). Tajima's D statistic (Tajima 1989) was also calculated. This frequently used selection test is based on the comparison of two estimates of the amount of genetic variation, that is, 1) the number of segregating sites (Watterson 1975) and 2) the average number of pairwise differences.
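The comparison of these two estimates can be sketched as follows. This is an illustrative implementation of the textbook formula of Tajima (1989), not the code used by DnaSP; the function name `tajimas_d` and the toy alignments are hypothetical, and gaps are treated as ordinary states.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (k, theta_w, D) for a list of equal-length aligned sequences.

    k       : average number of pairwise differences
    theta_w : Watterson's estimate, S / a1
    D       : Tajima's D, the normalized difference k - theta_w
    """
    n = len(seqs)
    if n < 3:
        raise ValueError("need at least three sequences")
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    # Average number of pairwise differences
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2))
    k = total / (n * (n - 1) / 2)

    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = s_sites / a1
    d = (k - theta_w) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))
    return k, theta_w, d


# Invented toy data: only singleton variants, so k < theta_w and D < 0
print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
# Invented toy data: variants at intermediate frequency, so D > 0
print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

An excess of rare variants pulls D below zero, whereas variants at intermediate frequency push it above zero.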
Under the null hypothesis of neutral evolution, both measures are expected to yield the same estimate, whereas a significant difference in these measures can be indicative of natural selection.

Selection at Particular Codons

The ratio of nonsynonymous to synonymous substitution rate (ω) was also evaluated for each codon, that is, for each single amino acid, as values <1, =1, and >1 are indicative of purifying selection, neutral evolution, and diversifying (= positive) selection, respectively. Positively selected sites (PSSs), suggested by ω > 1, were identified by using the maximum likelihood–based random-site model analysis implemented in the PAML 3.14 package (Yang 1997, 2000). The analyses for each Rubisco gene were performed using runmode “user tree” in codeml. The maximum likelihood trees used were constructed with RAxML 7.0.4 (Stamatakis 2006). We performed one likelihood ratio test (LRT) for dN/dS heterogeneity (M0 vs. M3) and two for positive selection (M1 vs. M2 and M7 vs. M8). In the neutral M1 model, two site classes ω0 and ω1 are assumed. The more complex M2 model (selection) adds a free ω ratio, which is estimated from the data set. Both models fix ω0 = 0 and ω1 = 1 and are unrealistic because they do not account for sites with 0 < ω < 1. Therefore, Wong et al. (2004) and Yang et al. (2005) described new models M1a and M2a, where 0 < ω0 < 1 is estimated from the data set and ω1 is fixed. The new models have been implemented since PAML version 3.14. Because the M1–M2 comparison is less powerful, we performed a second LRT with models M7 and M8. M7 (β) assumes a beta distribution of ω over sites, whereas model M8 (β and ω) adds an additional site class (free ω ratio), which is estimated from the data set (Yang 2000). The significance of the LRTs was tested assuming that twice the difference in the log of maximum likelihood values between the two models follows a χ² distribution under the null hypothesis of no selection.
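The LRT described above can be sketched in a few lines. The log-likelihood values below are invented placeholders rather than results from this study, and the function names `chi2_sf_even_df` and `lrt` are hypothetical. To stay within the standard library, the sketch uses the closed-form upper tail of the χ² distribution, which is valid only for even degrees of freedom (such as the values 2 and 4 used in these comparisons).

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with even df.

    For df = 2m: P = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form implemented for positive even df only")
    if x <= 0:
        return 1.0
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P value) for nested null and alternative models."""
    statistic = 2 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_even_df(statistic, df)


# Placeholder log-likelihoods (hypothetical, for illustration only)
stat, p = lrt(lnl_null=-1000.0, lnl_alt=-997.0, df=2)   # e.g., M7 vs. M8
print(f"2*dlnL = {stat:.2f}, P = {p:.4f}")
stat, p = lrt(lnl_null=-1000.0, lnl_alt=-997.0, df=4)   # e.g., M0 vs. M3
print(f"2*dlnL = {stat:.2f}, P = {p:.4f}")
```

The same likelihood gain is judged more strictly when the alternative model has more additional parameters, which is why the degrees of freedom matter for each comparison.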
The degrees of freedom (df) were equal to the difference in the number of parameters of the two tested models. Thus, df = 4 for the M0–M3 comparison, whereas df = 2 for the M1–M2 and M7–M8 comparisons. Whenever the alternative models M2 and M8 fitted the data better (P < 0.05) than the compared null models, the respective site was considered as being positively selected.

Maximum Likelihood Gene Tree

With RAxML 7.0.4 (Stamatakis 2006), we constructed a maximum likelihood gene tree for Rubisco generated under the general time reversible + G + I model of sequence evolution with 1,000 bootstrap replicates. The tree is based on a supergene consisting of complete sequences of three small subunits rbcS-1a, rbcS-2b, and rbcS-3b for 26 natural Arabidopsis accessions (sample set I). The fourth small subunit rbcS-1b was excluded because that gene was lacking in some accessions due to a gene duplication/loss event (see below). RbcL is not located in the nuclear but in the chloroplast genome and cannot be assumed to follow the same evolutionary pattern as the nuclear genes. Therefore, a separate RAxML analysis was performed for this gene, which, however, yielded a poorly resolved tree (data not shown).

Results

Gene Duplication/Loss Event

Our analyses of the loci rbcS-1b and rbcS-2b in 26 accessions (sample set I) from around the world revealed two groups of haplotypes (fig. 1): Altogether 23 accessions appeared similar to Col-0 (termed “standard” hereafter; represented in fig. 1 by Col, Ler, and Ws), whereas three accessions exhibited a divergent sequence pattern (termed “exception” hereafter; Bur, Cvi, Sha in fig. 1). These two groups can be distinguished by various substitutions, insertions, and deletions in the promoter, the gene sequence as well as the 3′-UTR.
Relative to the standard, we found in the promoter of the exception accessions a deletion at position −44 as well as two insertions, a 14-bp fragment (5′-GAAAAAAAGAGCAA-3′, between positions −21 and −20) and a fragment of 9 bp (5′-GAAACAACA-3′, between positions −11 and −10). There are 21 polymorphisms distributed over 3 exons, of which 16 are diagnostic between standard and exception accessions (fig. 1). Three of the substitutions are nonsynonymous and lead to amino acid exchanges between methionine and leucine (M20L; nucleotide [nt] position 58; cf. fig. 1), serine and alanine (S32A; nt position 96), as well as threonine and serine (T77S; nt position 319), respectively. These amino acid exchanges, however, do not profoundly impact biochemical properties of the resulting protein, as polarity and charge of the side chain are not affected. In comparison with the exons, there are many more variable sites and indels in the introns, especially in intron 2 of rbcS-1b.

The lower part of figure 1 shows the genetic variability of locus rbcS-2b for the same accessions. Unlike the pattern for rbcS-1b, we did not find evidence for very divergent haplotypes among the 26 A. thaliana accessions analyzed. We found three substitutions in the promoter region close to the transcription start at positions −47, −43, and −39. In the coding sequence, we could detect 11 polymorphic sites, of which four are nonsynonymous. These substitutions lead to amino acid exchanges between leucine and phenylalanine (L6F in all represented accessions except Col-0; nt position 16), methionine and leucine (M20L in Bch; nt position 58), serine and alanine (S32A in Cvi; nt position 96), as well as serine and asparagine (S48N in Ler; nt position 143), which have no profound impact on the biochemical properties of the affected amino acid. Both introns are less variable than the exons with just one 3-bp deletion in intron 2 at position 523–525.
The comparison of the exception haplotype group of rbcS-1b (Bur, Cvi, and Sha) and the sequences of rbcS-2b revealed several conformities (gray highlighted in fig. 1) in the promoter and the entire gene: The 14-bp promoter insertion (Insert A) is, except for 3 bases, equal among rbcS-1b exception and rbcS-2b, whereas the second 9-bp insertion (Insert B) is exactly the same. Regarding exon 1, all rbcS-1b sequences (standard and exception) share a sequence pattern distinguished from all rbcS-2b sequences at positions 24 and 25. From position 75 on, there is again (i.e., as in the promoter) a striking difference among rbcS-1b sequences of different accessions (i.e., between groups standard and exception), such that rbcS-1b exception resembles the sequence pattern of rbcS-2b (highlighted gray in fig. 1). This identity between rbcS-1b exception and rbcS-2b extends through subsequent introns and exons until position 797 in exon 3. Two positions follow (798 and 799) where all rbcS-1b (standard and exception) and rbcS-2b sequences were always identical. From position 800 on and throughout the subsequent 3′-UTR, rbcS-1b sequences of all accessions (standard and exception) are again identical and deviate from rbcS-2b (fig. 1). From this sequence pattern, we assume that in some accessions (Bur, Cvi, and Sha) the rbcS-1b gene and part of its 5′-UTR region were lost between positions −20 and 797 and substituted by a duplicate of rbcS-2b.

We tested rbcS-1b and rbcS-2b of Bur, Cvi, and Sha for the occurrence of gene conversion by using GENECONV (Sawyer 1989) and inferred that the region between positions 59 and 799 in rbcS-1b was converted to rbcS-2b. This locus was also the only Rubisco gene for which we inferred recombination events (Rm = 2; positions 304–366 and 366–669). The Rm value and the inferred recombination events remained stable, regardless of inclusion or exclusion of the exception accessions.
In addition, the inferred regions of recombination do not coincide with our gene duplication/loss event, rendering recombination unlikely as the underlying mechanism.

To assess whether this newly found gene duplication/loss event was present in more accessions, we developed a PCR assay and screened 74 additional accessions. In this assay, one PCR (P307/P57) only amplified the original rbcS-1b standard locus (fig. 2A), whereas we designed two PCRs containing different primer sets (P316/P57 and P317/P57) to specifically amplify the rbcS-1b exception (hereafter called rbcS-2b*, because of its assumed origin from a duplication of rbcS-2b; fig. 2B). Col, Ler, and Ws served as controls for rbcS-1b standard, whereas Bur, Cvi, and Sha were controls for the presence of rbcS-2b*, exhibiting the inferred gene duplication/loss event. Of the 74 additionally analyzed accessions (sample set II), five exhibited the loss of rbcS-1b and gain of rbcS-2b* in the PCR assay (additionally verified by sequencing in these five accessions). Altogether, 8 of 100 A. thaliana accessions analyzed show this alteration in their rbcS genes.

FIG. 1. Sequence polymorphisms in rbcS-1b and rbcS-2b of Arabidopsis thaliana accessions. Shading indicates sequence identity to Col-0 rbcS-1b (white) or Col-0 rbcS-2b (gray). Col-0, Ler, and Ws represent the standard type. In Bur, Cvi, Sha, Bch, Bu, Fei, Lm, and Sorbo, the rbcS-1b gene resembles the sequence pattern of Col-0 rbcS-2b (exception type), indicative of a gene duplication/loss event at that locus (see text for details). Nucleotides are numbered starting with the initiation codon (ATG). Stop codon (TAA) is at positions 804–806. Dots indicate identity to the reference sequence of Col-0. Hyphen represents the absence of a nucleotide relative to the reference. Four insertions are present: (A) 14-bp insertion between positions −21 and −20, (B) 9-bp insertion between positions −11 and −10, (C) 1-bp insertion between positions 175 and 176, and (D) 1-bp insertion between positions 839 and 840. Amino acid substitutions are given as one-letter symbols in the lower part of the column. The upper symbol indicates the amino acid in Col-0, whereas the lower is the modified one.

FIG. 2. PCR assay for detecting the gene duplication/loss event in additional Arabidopsis thaliana accessions. Col, Ler, and Ws are accessions without, whereas Bur, Cvi, and Sha represent reference plants with the gene duplication/loss event. Bch, Bu, Fei, Lm, and Sorbo are additional accessions found to share the duplication/loss pattern. L = 1-kb ladder (Fermentas). (A) PCR amplification of rbcS-1b (lost in accessions Bur to Sorbo) and (B) detection of rbcS-2b* (gained through duplication of rbcS-2b in accessions Bur to Sorbo).

We also amplified and sequenced promoter regions of about 1 kb for each Rubisco gene and detected promoter length variations in rbcS-2b and rbcS-2b* during PCR in all accessions affected by the gene duplication/loss event (fig. 3): Accessions with the inferred gene duplication/loss show shorter rbcS-2b* promoters (fig. 3A) and longer rbcS-2b promoters (fig. 3B). By using primers placed in the upstream gene sph8 (At5g38435) as well as in rbcS-1b, we sequenced almost the whole intergenic region in order to verify that the original rbcS-1b did not remain in the upstream region of rbcS-2b*. In the rbcS-2b* promoter, a deletion of about 2.3 kb occurred between 2.7 kb and 470 bp upstream of the start codon (fig. 4C). Additionally, we sequenced the entire intergenic region between rbcS-2b* and rbcS-2b. The rbcS-2b promoter of these accessions shows length variations as well.
A deletion of ~730 bp occurred between 1.8 and 1.1 kb upstream of the translation start of rbcS-2b in the intergenic region of rbcS-2b* and rbcS-2b. Additionally, two insertions occurred in this promoter. The first one of ~850 bp was inserted about 500 bp upstream of the translation start, whereas the second ~450-bp fragment was inserted about 1,000 bp upstream of the start codon (fig. 4C). Basic local alignment search tool (BLAST) searches of the inserted sequences show no similarity to any available genome sequence of A. thaliana. Using RepeatMasker (http://repeatmasker.org), we screened for interspersed repeats within the respective DNA fragments. Within the ~450-bp fragment, we found a DNA sequence of 219 bp (positions 112–339) with an identity of 66% to a long terminal repeat element. It belongs to the ATHILA7 family, which is a member of the GYPSY superfamily (The Arabidopsis Information Resource; Rhee et al. 2003). For the longer ~850-bp fragment, we detected inverted repeats as well. A sequence of 543 bp (positions 329–871) has an identity of 89% to the ATREP2 family, which belongs to the RC/HELITRON superfamily. The minimal light-regulatory unit (CMA5) of rbcS promoters in A. thaliana (López-Ochoa et al. 2007), which is about 150–300 bp upstream of the translation start, was not affected by the two insertion events that occurred 500 and 1,000 bp upstream of the start codon in the rbcS-2b promoter.

FIG. 3. PCR assay for detecting the promoter variation in rbcS-1b (resp. rbcS-2b*) and rbcS-2b in Arabidopsis thaliana accessions. Col, Ler, and Ws represent accessions without promoter alteration. The accessions with gene duplication/loss (Bur, Cvi, Sha, Bch, Bu, Fei, Lm, and Sorbo) all exhibit length variations in rbcS-2b and rbcS-2b* promoters. L = 1-kb ladder (Fermentas). (A) PCR of rbcS-1b (resp. rbcS-2b*) promoter; (B) PCR of rbcS-2b promoter.
FIG. 4. Graphical overview of promoter variation and gene duplication/loss event in Arabidopsis thaliana. (A) Chromosomal arrangement of the rbcS-b loci including primer positions that were used for sequencing. Relative positions of the genes are given as well. (B) Sequenced region of the standard accessions. (C) Sequenced region of the exception accessions, that is, those exhibiting the gene duplication/loss event and the coupled promoter variations in rbcS-2b* and rbcS-2b. A ~2.3-kb deletion occurred 470–2,700 bp upstream of the translation start codon in the promoter of rbcS-2b* and co-occurred with the replacement of rbcS-1b by rbcS-2b*. A second ~730-bp deletion occurred in the promoter of rbcS-2b. Additionally, two different fragments (~450 and ~850 bp long) are inserted in the rbcS-2b promoter. Insertions are positioned 500 and 1,000 bp upstream of the start codon, respectively, relative to the standard type.

Analyses of Regulatory Promoter Elements

We sequenced about 1 kb upstream of the translation start in 26 accessions and searched for “functionally important regions” in the PPDB (Yamamoto and Obokata 2008) to look for variability in those elements. Information for the rbcL promoter was not available, probably because rbcL is a chloroplast gene and hence not considered in that database. For rbcS-1a, we found the coordinates for the TATA box (52 to 63) and four regulatory elements (142 to 150; 197 to 208; 254 to 261; 250 to 262). According to our analysis, none of them are modified by polymorphic sites or indels, such that the substitutions found in the promoter region of rbcS-1a are probably without any implication for transcription. Unfortunately, PPDB contained no information on regulatory elements for rbcS-1b.
We found numerous substitutions and indels in the rbcS-2b* and rbcS-1b promoters, respectively, which could potentially cause transcriptional changes. As in rbcS-1b, there are many modifications in the rbcS-2b promoter. Positions for the TATA box (73 to 86) and seven regulatory elements (122 to 129: AtREG479; 140 to 147: AtREG493; 149 to 156: AtREG654; 211 to 220: AtREG468, AtREG379, AtREG376; 249 to 256: AtREG570; 313 to 320: AtREG647; 341 to 348: AtREG645) in rbcS-2b were retrieved from PPDB. In Bur, Cvi, and Sha, we detected mutations in the TATA box: There is a C to T substitution at position 86 in Bur and Sha, and a deletion at positions 84 to 112 in Cvi, which leads to a loss of 3 bp in the TATA box. The regulatory element AtREG647 is mutated as well: The region of 313 to 320 shows two variants of modifications: 1) Bur, Cvi, and Sha have a T to A substitution at position 315; 2) Nok and Sap are affected by a 3-bp deletion between 314 and 316. In rbcS-3b, the TATA box is situated between 73 and 83, whereas regulatory elements are at positions 103 to 110, 141 to 150, and 205 to 221. None of those regions show any polymorphism among the 26 A. thaliana accessions analyzed here.

López-Ochoa et al. (2007) analyzed the functional architecture of the conserved modular array 5 (CMA5), a minimal light-regulatory unit of rbcS promoters in A. thaliana. The positions and sequences of the respective motifs of rbcS-2b and rbcS-3b in A. thaliana were described and compared with the CMA5s of other plants. Donald and Cashmore (1990) had already detected the G- and I-box in rbcS-1a as well as the effect of mutations in these motifs on the expression level. The described sequences of G- and I-box differ slightly between López-Ochoa et al. (2007) and Donald and Cashmore (1990). The rbcS-1b promoter lacks the G-box due to a 43-bp deletion in that region (Krebbers et al. 1988).
We detected no nucleotide substitutions either in the I- or in the G-box, whereas in IbAM5, we found variable sites in rbcS-1b and rbcS-2b (table 1). In rbcS-1b, a G to A substitution occurred coupled with an insertion of two adenine residues in Bur, Cvi, and Sha, such that the IbAM5 box of these accessions contains only adenine residues. Modifications in the respective motifs of rbcS-2b were found in Bur, Cvi, and Sha as well, where at position 2 thymine was substituted by an adenine.

Table 1. Sequences of Minimal Light-Regulatory Unit CMA5 Modules in Arabidopsis thaliana.

Module   rbcS-1a     rbcS-1b    rbcS-2b     rbcS-3b
IbAM5    ATAGATAA    AAAAGAAA   ATGAGAAA    AGAGAAAA
nt       0           0          0           0
I-Box    GATAAG      GATAAG     GATAAG      GATAAG
nt       15          -          14          14
G-Box    CCACGTGGC   -          CCACGTGAT   CCACGTGGC

NOTE: Variable sites of CMA5 modules among A. thaliana accessions are highlighted gray. nt = number of nucleotides interspersed between regulatory elements. Consensus sequences of IbAM5 as well as I- and G-box are taken from Donald and Cashmore (1990) and López-Ochoa et al. (2007).

Genetic Variability in Rubisco Genes among Accessions

Nucleotide variability was estimated for rbcL, rbcS-1a, rbcS-2b, and rbcS-3b of 26 worldwide distributed accessions for each structural region (UTRs, exons, introns) separately

Table 2. Genetic Variation in Rubisco Genes among Arabidopsis thaliana Accessions.
Gene     Domain    n   Sites  S   η   Nonsyn  Indels  h   Hd     π       GC Content
rbcL     Promoter  26  999    3   3   -       -       4   0.286  0.0003  0.310
rbcL     Exon 1    26  1440   2   2   0       0       3   0.538  0.0004  0.440
rbcL     Gene      26  1440   2   2   0       0       3   0.538  0.0004  0.440
rbcS-1a  Promoter  26  1029   17  17  -       -       13  0.889  0.004   0.339
rbcS-1a  Exon 1    26  171    0   0   0       0       1   0.000  0.000   0.544
rbcS-1a  Intron 1  26  106    9   9   -       2       3   0.520  0.042   0.227
rbcS-1a  Exon 2    26  135    0   0   0       0       1   0.000  0.000   0.459
rbcS-1a  Intron 2  26  136    2   2   -       0       2   0.271  0.004   0.244
rbcS-1a  Exon 3    26  237    0   0   0       0       1   0.000  0.000   0.506
rbcS-1a  Gene      26  785    11  11  0       2       4   0.582  0.006   0.424
rbcS-1b  Promoter  23  1221   60  61  -       -       20  0.984  0.0107  0.293
rbcS-1b  Exon 1    23  173    6   6   0       0       4   0.320  0.0048  0.543
rbcS-1b  Intron 1  23  90     2   2   -       0       3   0.170  0.0028  0.303
rbcS-1b  Exon 2    23  133    5   5   2       0       5   0.640  0.0076  0.464
rbcS-1b  Intron 2  23  171    4   4   -       1       5   0.451  0.0053  0.299
rbcS-1b  Exon 3    23  240    5   5   1       0       5   0.771  0.0070  0.504
rbcS-1b  Gene      23  807    22  22  3       1       16  0.964  0.0059  0.455
rbcS-2b  Promoter  26  2389   63  65  -       -       12  0.862  0.0187  0.341
rbcS-2b  Exon 1    26  173    7   7   3       0       4   0.452  0.0046  0.553
rbcS-2b  Intron 1  26  90     1   1   -       0       2   0.077  0.0009  0.289
rbcS-2b  Exon 2    26  133    0   0   0       0       1   0.000  0.0000  0.459
rbcS-2b  Intron 2  26  143    1   1   -       1       2   0.077  0.0006  0.286
rbcS-2b  Exon 3    26  240    3   3   0       0       4   0.542  0.0026  0.497
rbcS-2b  Gene      26  779    12  12  3       1       9   0.809  0.0020  0.441
rbcS-3b  Promoter  26  993    49  50  -       -       18  0.963  0.0081  0.315
rbcS-3b  Exon 1    26  173    3   3   1       0       4   0.222  0.0013  0.578
rbcS-3b  Intron 1  26  90     2   2   -       1       3   0.151  0.0025  0.271
rbcS-3b  Exon 2    26  133    1   1   0       0       2   0.148  0.0011  0.452
rbcS-3b  Intron 2  26  170    16  17  -       3       5   0.455  0.0115  0.283
rbcS-3b  Exon 3    26  240    4   4   0       0       4   0.222  0.0013  0.508
rbcS-3b  Gene      26  806    25  26  1       4       7   0.471  0.0033  0.440

NOTE: n = number of analyzed accessions; sites = fragment length; S = polymorphic sites; η = total number of mutations; nonsyn = substitutions at nonsynonymous sites; indels = total number of insertions/deletions; h = number of different haplotypes; Hd = haplotype diversity; π = nucleotide diversity.

(table 2). The same analysis for rbcS-1b was performed for 23 accessions, as this gene was lacking in Bur, Cvi, and Sha (see above).
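To make the summary statistics in Table 2 concrete, the sketch below computes the number of polymorphic sites (S), the number of haplotypes (h), haplotype diversity (Hd), and nucleotide diversity (π) with the standard textbook estimators, Hd = n/(n − 1) · (1 − Σ p_i²) and π = mean pairwise differences per site. It is an illustration only: the four short sequences are invented toy data rather than Rubisco sequences, gaps and indels are not handled, and the software actually used for Table 2 may treat such cases differently.

```python
from collections import Counter
from itertools import combinations


def polymorphic_sites(seqs):
    """S: number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences, divided by sequence length."""
    n = len(seqs)
    length = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


# Hypothetical toy alignment (equal-length, gap-free), for illustration only.
TOY_ALIGNMENT = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACCTACGA"]

if __name__ == "__main__":
    print("S  =", polymorphic_sites(TOY_ALIGNMENT))
    print("h  =", len(set(TOY_ALIGNMENT)))
    print("Hd =", round(haplotype_diversity(TOY_ALIGNMENT), 3))
    print("pi =", round(nucleotide_diversity(TOY_ALIGNMENT), 4))
```

For the toy alignment, two columns vary, three haplotypes occur, and the six pairwise comparisons differ at 7 positions in total, so π = 7/6/8 ≈ 0.146 and Hd = 4/3 · (1 − 0.375) ≈ 0.833.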
As expected, RbcL is the most conserved Rubisco gene with about 0.3% and 0.1% variable sites in promoter and gene, respectively. Both substitutions in exon 1 are synonymous and do not lead to functional alterations at the protein level. Comparing the small subunit genes, the most highly expressed subunit rbcS-1a (Dedonder et al. 1993; Yoon et al. 2001; Sawchuk et al. 2008) is also the most conserved one: The exons show no variability, whereas intron 1, with nine mutated sites, is more variable than intron 2, which has only two substitutions. With about 1.7% polymorphic sites (17 substitutions), its promoter is quite polymorphic. In total, sequencing 2,141 bp of the rbcS-1a locus in 26 accessions (sample set I), we found 30 polymorphic sites.

RbcS-1b, rbcS-2b, and rbcS-3b, the tandemly arrayed small subunit genes on chromosome 5, show similar levels of variability, higher than those of rbcS-1a. About 6% of nucleotides in the rbcS-1b promoter are polymorphic. We found 16 substitutions in exons (fig. 5), of which three are nonsynonymous and lead to amino acid exchanges between tyrosine and aspartic acid (Y72D; nt position 214; accessions El, Er, Gre), proline and arginine (P74R; nt position 221; accession Gre), as well as aspartic acid and asparagine (D180N; nt position 538; accessions Er, Est, Mt, Tsu). The RbcS-2b promoter has about 9% variable sites. As in rbcS-1b, most polymorphic sites of rbcS-2b are in exons. Just one substitution per intron was detected in rbcS-2b, whereas in exons, we found 10 substitutions (fig. 5). Three of them are nonsynonymous and lead to amino acid substitutions of phenylalanine to leucine (F6L; nt position 16; accessions Bl, Bur, Can, Cha, Ct, Cvi, Edi, Er, Est, Gre, Ler, Mt, Nok, Oy, Rsch, Sap, Sha, Stw, Te, Tsu, Wil, and Ws), serine to alanine (S32A; nt position 94; accession Cvi), and serine to asparagine (S48N; nt position 143; accessions Ler, Te).
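The nucleotide positions and amino acid labels quoted above can be cross-checked with simple codon arithmetic: a coding-sequence position p (1-based) lies in codon (p − 1) div 3 + 1. The sketch below applies this to the six nonsynonymous substitutions listed for rbcS-1b and rbcS-2b. It also illustrates, as an assumption-laden aside, why a T to G change at the first codon position turns tyrosine into aspartic acid; the text does not state which tyrosine codon (TAT or TAC) occurs at position 72, so both are shown. All names in the code are hypothetical.

```python
def codon_index(nt_pos):
    """Map a 1-based CDS position to (codon number, position within codon)."""
    zero_based = nt_pos - 1
    return zero_based // 3 + 1, zero_based % 3 + 1


# (gene, nt position, label) as reported in the text.
REPORTED = [
    ("rbcS-1b", 214, "Y72D"),
    ("rbcS-1b", 221, "P74R"),
    ("rbcS-1b", 538, "D180N"),
    ("rbcS-2b", 16, "F6L"),
    ("rbcS-2b", 94, "S32A"),
    ("rbcS-2b", 143, "S48N"),
]


def residue_number(label):
    """Extract the residue number from a label such as 'Y72D'."""
    return int("".join(ch for ch in label if ch.isdigit()))


def mutate(codon, pos_in_codon, base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[: pos_in_codon - 1] + base + codon[pos_in_codon:]


# Only the four standard-code codons needed for the Tyr/Asp illustration.
MINI_CODE = {"TAT": "Y", "TAC": "Y", "GAT": "D", "GAC": "D"}

if __name__ == "__main__":
    for gene, nt_pos, label in REPORTED:
        codon, pos = codon_index(nt_pos)
        status = "consistent" if codon == residue_number(label) else "MISMATCH"
        print(f"{gene} nt {nt_pos}: codon {codon}, position {pos} ({label}: {status})")
    # Assumption: the Tyr codon at residue 72 is either TAT or TAC.
    for tyr_codon in ("TAT", "TAC"):
        new_codon = mutate(tyr_codon, 1, "G")
        print(f"{tyr_codon} -> {new_codon}: {MINI_CODE[tyr_codon]} -> {MINI_CODE[new_codon]}")
```

All six reported positions map to the residue numbers given in the labels; for example, nt 214 is the first base of codon 72 and nt 221 the second base of codon 74.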
RbcS-3b has a level of variability in promoter (5.2%) and gene (3.2%) similar to rbcS-1b and rbcS-2b. However, the distribution of substitutions within the gene is different. While rbcS-1b and rbcS-2b exhibit more polymorphic sites in exons than in introns, this pattern is reversed in rbcS-3b. We detected eight substitutions within exons (fig. 5), one of them nonsynonymous and leading to an amino acid exchange between alanine and threonine (A47T; nt position 139; accession Sap). Intron 1 of rbcS-3b is less variable (2 substitutions) than intron 2 with 16 polymorphic sites.

FIG. 5. Nucleotide polymorphisms in the coding sequence of Rubisco genes among Arabidopsis thaliana accessions. RbcS-1a did not show any variability in its coding region. Dots indicate identity to the reference Col-0. Sequence patterns and amino acid substitutions of the accessions Bur, Cvi, and Sha (exception type) are highlighted in gray. Asterisks point to sites where some accessions share a substitution with Arabidopsis lyrata. Amino acid substitutions are shown in the lower part of each column. The upper symbol indicates the amino acid in Col-0, whereas the lower is the deviating one. The single inferred positive selected site is indicated with PSS.

All Rubisco genes share some general patterns: Most substitutions were found in promoters, and the GC content is different between exons (44–58%) and introns (22–34%), in agreement with previous data (Zhu et al. 2009).

Signs of Selection

To evaluate which kind of selection, either positive or purifying selection, is acting on Rubisco genes, we performed a Z-test of selection (table 3). As rbcS-1a was invariable in its exons, the test could not be performed for that gene. With only two synonymous substitutions in the coding region of rbcL, the test did not reveal a statistically significant

Table 3. Tests of Selection on Rubisco Genes in Arabidopsis thaliana.
         Positive Selection      Purifying Selection     Tajima's D
Gene     Z Statistic  P Value    Z Statistic  P Value    D Value  P Value
rbcL     −1.178       1.000      1.147        0.127      0.18     >0.10
rbcS-1a  0.000        1.000      0.000        1.000      -        -
rbcS-1b  −2.966       1.000      2.935        0.002      −0.68    >0.10
rbcS-2b  −1.751       1.000      1.763        0.040      −1.51    >0.10
rbcS-3b  −2.501       1.000      2.412        0.009      −2.11    <0.05

NOTE: Z-test of selection with null hypothesis (H0: dN = dS) tested against two different alternative hypotheses, that is, positive selection (HA: dN > dS) and purifying selection (HA: dN < dS). Z statistics and respective significance values (P value) were calculated for coding sequences (CDS) of each Rubisco gene. Exons of rbcL only exhibited two substitutions, both synonymous. Exons of rbcS-1a were monomorphic at all sites.

pattern for this gene, as neither positive selection (P = 1.000 for null hypothesis of no positive selection) nor purifying selection (P = 0.127) yielded statistical support. We argue, however, that the very low number of polymorphic sites in the exons of these two genes might be taken as an indication that purifying selection is operating. All tests of genewise selection for the rbcS-b genes revealed a statistically significant pattern, that is, significant support for purifying selection acting on rbcS-1b, rbcS-2b, and rbcS-3b. We also calculated Tajima's D, a commonly used selection test in A. thaliana, and obtained a significant value for rbcS-3b (−2.11; indicative of purifying selection), which corroborates the results of the Z-test. The remaining D statistics were not significant.

Genewise purifying selection on such fundamental genes as Rubisco could be expected. However, single amino acid sites could potentially diverge among accessions due to positive selection. We searched for such PSSs in Rubisco genes by using PAML. Generally, PAML analyses could yield false-positive results and should hence not be performed for genes where gene conversion is inferred to act (Casola and Hahn 2009).
However, this difficulty is easily overcome by exclusion of the sequences affected by gene conversion (Casola and Hahn 2009). We hence excluded rbcS-2b* sequences from this analysis.

Table 4. Intra and Interspecific Variabilities in Rubisco Genes of Arabidopsis thaliana and Arabidopsis lyrata.

                                     Number of Fixed Differences   Number of Polymorphic Sites
Orthologs  π       K       π/K      Total  Syn  Nonsyn            Total  Syn  Nonsyn
rbcL       0.0007  0.0011  0.627    7      5    2                 3      3    0
rbcS-1a    0.0058  0.0095  0.611    38     30   8                 10     10   0
rbcS-1b    0.0058  0.0106  0.543    39     32   7                 20     17   3
rbcS-2b    0.0020  0.0075  0.272    55     46   9                 12     9    3
rbcS-3b    0.0030  0.0088  0.336    58     49   9                 22     21   1

Paralogs                 π (A. thaliana Col)  π (A. thaliana Cvi)  π (A. lyrata)
rbcS-1a versus rbcS-1b   0.193                0.194#               0.201
rbcS-1a versus rbcS-2b   0.193                0.193                0.199
rbcS-1a versus rbcS-3b   0.206                0.216                0.209
rbcS-1b versus rbcS-2b   0.053                0.005#               0.068
rbcS-1b versus rbcS-3b   0.060                0.093#               0.056
rbcS-2b versus rbcS-3b   0.088                0.092                0.075

NOTE: Measures for complete gene sequences, including introns in rbcS genes. π = nucleotide diversity within A. thaliana; K = nucleotide divergence among A. thaliana and A. lyrata; π/K = ratio of diversity and divergence. Number of fixed differences between A. thaliana and A. lyrata as well as number of polymorphic sites among A. thaliana accessions are given for synonymous (syn) and nonsynonymous (nonsyn) substitutions. # In these comparisons, rbcS-1b is substituted by rbcS-2b*; see text for explanation.

We identified a single putative PSS in rbcS-1b at position 214 in the coding sequence (fig. 5). There, thymine is substituted by guanine in the accessions El (Ellershausen, Germany), Er (Erlangen, Germany), and Gre (Greenville, Michigan), leading to an amino acid exchange of tyrosine to aspartic acid at position 72 (Y72D) at the protein level (M0 vs. M3: χ² = 26.512, P < 0.001; M1 vs. M2: χ² = 15.459, P < 0.001; M7 vs.
M8: χ² = 15.519, P < 0.001, posterior probability [M8 Bayes Empirical Bayes analysis]: 0.96). At this position, we also inferred an intracodon recombination event. This recombination position was not inferred when accessions with the one (tyrosine) or the other (aspartic acid) residue were analyzed separately, suggesting that this substitution might be associated with a recombination. It has been previously demonstrated that recombination can lead to false positives in PAML analyses (Casola and Hahn 2009). However, the selection models favored in our analysis (M7 [β] against M8 [β and ω]) are considered relatively robust to the effect of recombination (Anisimova et al. 2003).

Comparison of Intra and Interspecific Variabilities in Rubisco Genes

Substitution rates per site (i.e., the ratio of polymorphic vs. monomorphic sites) differed significantly among the different Rubisco genes, regardless of whether all substitutions (χ² = 42.069; P < 0.001), first and second codon positions only (χ² = 10.272; P = 0.036), or third codon positions only (χ² = 33.292; P < 0.001) were considered. As 1) this difference in substitution rate among genes is particularly large for the third codon position and 2) all observed substitutions at this position were synonymous, this points toward underlying differences in gene-specific mutation rates. When this comparison was restricted to those genes of the small subunit tandemly arrayed on chromosome 5 (rbcS-1b, rbcS-2b, and rbcS-3b), no significant difference among genes was found, such that for these genes substitution rates appear similar. Interestingly, the two remaining Rubisco genes (rbcL, encoded in the chloroplast; rbcS-1a, encoded on chromosome 1) did not differ significantly from one another in their substitution rate. The accessions in which we detected the gene duplication/loss (Bur, Cvi, and Sha) have been previously described to be particularly variable (Schmid et al. 2003; Nordborg et al. 2005; Ossowski et al. 2008).
We evaluated also whether our inferred difference in substitution rates among Rubisco genes was caused specifically by the inclusion of these accessions. However, the observed pattern remained, regardless of the inclusion or exclusion of these accessions (data not shown). To further investigate this pattern, we included also intron sequences and compared variability (i.e., nucleotide diversity p) among A. thaliana accessions to interspecific divergences (K) from its sister species A. lyrata (table 4). The level of polymorphism for rbcL is by far the lowest, both within A. thaliana (p) and between A. thaliana and A. lyrata (K), compared with all rbcS genes. Among the rbcS genes, rbcS-1a and rbcS-1b exhibit highest nucleotide diversities (p, K), whereas rbcS-2b and rbcS-3b are intermediate. Despite different diversity and divergence, the p/K ratios were similar among rbcL, rbcS-1a, and rbcS-1b. This ratio was lower in rbcS-2b and rbcS-3b, indicating that these genes contain relatively more fixed differences between the species than polymorphisms among A. thaliana accessions. We also compared the nucleotide diversity among paralogs within A. lyrata and A. thaliana accessions (i.e., Col-0 as a standard accession; Cvi as an exception accession; table 4). It is evident that the three genes on chromosome 5 (rbcS-1b, rbcS-2b, and rbcS-3b) are less diverged from one another than from the rbcS-1a located on the first chromosome. Pairwise divergence patterns among paralogs are similar across the two analyzed A. thaliana accessions and A. lyrata, except for the comparison between rbcS-1b and rbcS-2b. Here, Col-0 and A. lyrata show about 10-fold the divergence found in Cvi, where rbcS-1b is substituted by rbcS-2b*. We detected 12 sites in Rubisco exons, where a nucleotide difference relative to Col-0 was shared among some accessions and A. lyrata (asterisks in fig. 5). Among those substitutions, there were two nonsynonymous substitutions in rbcS-2b (positions 16, 94). 
The first one leads to an amino acid substitution between phenylalanine and leucine. Interestingly, 22 analyzed A. thaliana accessions show the same sequence pattern as that of A. lyrata, whereas only four accessions resemble the Col-0 pattern, which is typically considered the A. thaliana reference. Regarding the second amino acid exchange, only one accession (Cvi) showed the A. lyrata pattern, that is, a substitution at position 94 (fig. 5), leading to an exchange of serine to alanine.

FIG. 6. Intra and interspecific comparisons of inferred amino acid sequences of all rbcS genes in Arabidopsis thaliana and Arabidopsis lyrata. Dots indicate identity to the reference sequence of rbcS-1a of A. thaliana (Col-0), whereas capital letters show amino acid substitutions. The first amino acid of the mature protein is methionine at position 56 (bold letter). Intraspecific polymorphisms within Rubisco genes are highlighted in gray.

Figure 6 presents an alignment of inferred amino acid sequences for all four rbcS genes of A. thaliana and A. lyrata. RbcS-1a is clearly divergent from the rbcS-b genes in both species. Moreover, there is less divergence between rbcS-1a genes among the two species than between rbcS-1a and the rbcS-b genes within either species (amino acid positions 10, 13, 35, 54, 57, 79, and 113). Apparently, rbcS-1a and the rbcS-b genes have evolved independently, at least since the speciation of A. thaliana and A. lyrata from their most recent common ancestor. Among the three rbcS-b genes, we detected a different evolutionary pattern. Here, divergence among paralogous genes within species is less than among orthologous genes between species (see amino acid positions 143, 151, and 162; cf. table 4). This indicates concerted evolution, which has already been reported for rbcS genes (Pichersky et al. 1986; Dean et al. 1987, 1989).
Phylogenetic Relationship and Geographical Origin

Based on 2,370-bp sequences of three nuclear Rubisco genes (rbcS-1a, rbcS-2b, and rbcS-3b; rbcS-1b was excluded because of the gene duplication/loss event) of 26 accessions, we calculated a maximum likelihood tree (fig. 7). We initially included A. lyrata in our analysis, yielding a tree poorly resolved for accessions of A. thaliana due to the considerable divergence among the two species (small inset graph in fig. 7). The gene tree for A. thaliana accessions displays four groups of haplotypes with no association with geographical origin. Regarding the relationship between Bur, Cvi, and Sha, that is, those accessions affected by the gene duplication/loss event (the exception type), Bur and Cvi are in haplotype group II, whereas Sha is in group I. There is hence no direct phylogenetic relationship among them, at least not detectable in the Rubisco genes. Comparing the geographical origin of exception accessions, there is no evident pattern either: Affected accessions occur in Ireland (Bur), Cape Verde Island (Cvi), Tadjikistan (Sha, Sorbo), Germany (Bch, Bu), Portugal (Fei), and France (Lm). So far, there is hence neither a clear phylogenetic nor geographic pattern pointing to the putative origin of our inferred gene duplication/loss event in Rubisco.

FIG. 7. Rubisco supergene tree of Arabidopsis thaliana accessions. Unrooted maximum likelihood tree based on the composite supergene of rbcS-1a, rbcS-2b, and rbcS-3b. RbcS-1b/RbcS-2b* were excluded as only one of the two is present in any accession, due to the gene duplication/loss event. The unit for branch length is the number of nucleotide differences. If Arabidopsis lyrata is included (small inset graph on the right), divergence among both species vastly exceeds intraspecific variation, precluding resolution among A. thaliana accessions.
Discussion

Several previous studies have investigated interspecific genetic variability of Rubisco genes among plants (Kapralov and Filatov 2007; Andersson and Backlund 2008) as well as genetic variability among accessions of A. thaliana (Kuittinen and Aguadé 2000; Aguadé 2001; Lu and Rausher 2003; Mauricio et al. 2003; Shepard and Purugganan 2003; Moore et al. 2005; Balasubramanian et al. 2006; Ramos-Onsins et al. 2008). Here, we specifically focus on intraspecific genetic variation in Rubisco genes among accessions of A. thaliana, compared with the level of divergence from its sister species A. lyrata.

After analyzing 100 worldwide distributed accessions, we found a gene duplication/loss event (the exception in fig. 4) in eight accessions, that is, Bch, Bu, Bur, Cvi, Fei, Lm, Sha, and Sorbo. In these accessions, the major part of rbcS-1b was lost and replaced by rbcS-2b (fig. 1). Our analysis points toward gene conversion as a possible mechanism for this substitution. Gene conversion is promoted among gene duplicates with high sequence similarities (Dean et al. 1989), like the three Rubisco small subunit paralogs located on chromosome 5 (cf. table 4).

Small subunits are characterized by a unique and partially overlapping gene expression pattern (Dedonder et al. 1993; Yoon et al. 2001; Sawchuk et al. 2008). In leaves, the rbcS-1b lost in our exception accessions is specifically expressed in the abaxial leaf side. It has been discussed whether this subunit modifies the Rubisco holoenzyme to be more efficient under light-limiting conditions (lower side of the leaf). RbcS-1b and rbcS-2b are distinguished from one another by four amino acid differences. Two of them are within the chloroplast transit peptide at positions 6 and 9 and are cleaved before assembly of the holoenzyme. The other differences are at position 77 (rbcS-1b: threonine; rbcS-2b: serine) and 180 (rbcS-1b: aspartic acid; rbcS-2b: glutamic acid).
Threonine and serine are characterized by hydroxyl groups and differ only by an additional methyl group in threonine. The exchange between threonine and serine is typically tolerated (Betts and Russell 2003). The same holds true for the substitution of aspartic acid by glutamic acid, again two amino acids differing only by one additional methylene group (in glutamic acid). Regarding their amino acid sequence, rbcS-1b and rbcS-2b hence exhibit only subtle differences with probably little effect on selection.

The analysis of "functionally important regions" in promoters of rbcS genes revealed variability among A. thaliana accessions. In the TATA box (CCACTATATAAAGA; 73 to 86) of rbcS-2b of Bur and Sha, a C to T substitution occurred at position 86 (position 1 of the TATA box). Mutations at positions 7–9 in the core of the prototype TATA box (TCACTATATATAG, invariant in most common highly expressed plant genes) have been demonstrated to influence light-dependent transcription efficiency and formation of transcriptional complexes (Kiran et al. 2006; Ranjan et al. 2009). Comparing the prototype TATA box with that of rbcS-2b, the substitution at position 1 in Bur and Sha leads to an identity with the prototype TATA box at this position and has therefore probably no implication for gene expression. Regarding Cvi, where the first three bases are lost due to a deletion of 29 bp immediately in front (upstream) of the TATA box, a possible impact on gene expression remains to be evaluated. The same holds true for the detected modifications (i.e., one substitution, one indel) of AtREG647. Unfortunately, the function of this regulatory element is unknown (PPDB). We could detect intraspecific variability in one module (IbAM5) of the rbcS minimal light regulator of rbcS-1b and rbcS-2b in accessions Bur, Cvi, and Sha.
We provide evidence for differences in substitution rates among Rubisco genes, which are indicative of differences in underlying mutation rates. This coincides with the chromosomal arrangement of the genes. The faster evolving rbcS-1b, rbcS-2b, and rbcS-3b are tandemly arrayed within 8 kb on chromosome 5. RbcL is encoded in the chloroplast genome, which is known to evolve more slowly than nuclear DNA (Wolfe et al. 1987; Lynch et al. 2006). Why rbcS-1a as a nuclear gene on chromosome 1 is evolving as slowly as rbcL is not fully evident. Both are highly expressed important genes, highly conserved because of purifying selection, but this does not explain the almost complete absence of synonymous substitutions in exons as well, compared with the more variable Rubisco genes on chromosome 5.

Regarding the rbcS-b genes, we could detect variability in protein-coding sequences among A. thaliana accessions. Because small subunits are encoded in the nucleus, they need a target sequence for chloroplast transport. The respective chloroplast transit peptide is 55 amino acids long (Krebbers et al. 1988) and is cleaved after entering the chloroplast. Substitutions at the amino acid level in rbcS-2b at positions 6, 32, and 48 as well as in rbcS-3b at position 47 are within the target sequence and might therefore be without impact on the mature protein. In rbcS-1b all substitutions (positions 72, 74, and 180) are in the mature protein. One of them, the tyrosine to aspartic acid substitution at position 72 (Y72D), was statistically inferred to be a PSS, although rbcS-1b as well as the remaining Rubisco genes in total are under purifying selection. The secondary structure of rbcS in general and the position of several α-helices and β-strands are known (Spreitzer 2003). Although no substitutions occurred within α-helices and β-strands, two amino acid substitutions in A. thaliana accessions El, Er, and Gre (positions 72 and 74) might have implications for the Rubisco holoenzyme.
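The assignment of substitutions to the transit peptide or the mature protein follows directly from the 55-residue transit peptide length: positions 1 to 55 are cleaved, and the mature protein starts at position 56. A minimal sketch of this bookkeeping is given below; the helper names are hypothetical, and the renumbering (precursor position minus 55) is a simple assumption that ignores any alignment gaps between species.

```python
TRANSIT_PEPTIDE_LENGTH = 55  # amino acids (Krebbers et al. 1988, as cited in the text)

# Amino acid positions of the intraspecific substitutions named in the text.
SUBSTITUTIONS = {
    "rbcS-1b": [72, 74, 180],
    "rbcS-2b": [6, 32, 48],
    "rbcS-3b": [47],
}


def classify(position, transit_len=TRANSIT_PEPTIDE_LENGTH):
    """Return (region, position in mature protein or None)."""
    if position <= transit_len:
        return "transit peptide", None
    # Assumed renumbering: mature residue 1 is precursor residue 56.
    return "mature protein", position - transit_len


if __name__ == "__main__":
    for gene, positions in SUBSTITUTIONS.items():
        for pos in positions:
            region, mature_pos = classify(pos)
            suffix = f" (mature position {mature_pos})" if mature_pos else ""
            print(f"{gene} position {pos}: {region}{suffix}")
```

With this numbering, precursor positions 72 and 74 of rbcS-1b become mature positions 17 and 19, which are the positions of the A. nidulans small subunit that the Discussion compares them with.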
We compared the positions of intraspecific substitutions with those created artificially in pea (Flachmann and Bohnert 1992), Chlamydomonas (Du et al. 2000; Spreitzer et al. 2001), A. nidulans (= Synechococcus PCC6301) (Voordouw et al. 1987; Lee et al. 1991; Paul et al. 1991; Read and Tabita 1992; Flachmann et al. 1997; Kostov et al. 1997), and Anabaena 7120 (Fitchen et al. 1990). Those authors modified small subunits at specific sites, mostly within structurally important regions, and analyzed the implications for the Rubisco holoenzyme. Kostov et al. (1997) analyzed an exchange of tyrosine to aspartic acid at position 17 (Y17D) in A. nidulans, a unicellular cyanobacterium in which a single operon encodes the large and the small subunit. This substitution leads to a substantially lower carboxylase activity in A. nidulans, down to 14% compared with wild type, as well as a reduced specific activity (5%). The substitution Y17D almost abolishes carboxylase activity in the assembled enzyme, a result never reported before for any mutation in the small subunits. These authors infer that the Y17D substitution in the single small subunit of A. nidulans should indirectly but profoundly alter the active site structure in proximal dimers of large subunits without affecting the assembly itself (Kostov et al. 1997). The small subunits of A. thaliana (rbcS-1b) and A. nidulans have an identity of only 28.7%. However, there is a motif consisting of the 8 amino acids FETLSYLP (positions 12–19 in A. nidulans; 67–74 in A. thaliana), which is identical in these very distantly related species and also highly conserved among plant species in general. Interestingly, the substitution Y17D in Anacystis found to severely compromise carboxylase activity is at exactly the same position as our substitution Y72D among A.
thaliana accessions, which we inferred to be under positive selection: In both species, the substitution alters the sixth position of the FETLSYLP motif, that is, changing it to FETLSDLP. We inferred that in A. thaliana, the substitution Y72D is associated with a putative recombination event. It has been shown by simulations that recombination can create polymorphisms erroneously inferred to be under positive selection (Casola and Hahn 2009). Although this effect is considered to be less pronounced under the selection models favored in our analysis (i.e., M7 vs. M8; Anisimova et al. 2003), our inference of positive selection at this polymorphic site should be treated with caution. It remains to be evaluated whether this substitution in A. thaliana—as found by us in three accessions—also influences carboxylase activity. Position 19 in the small subunit of A. nidulans, which corresponds to the substituted position 74 in A. thaliana accession Gre, was engineered as well. This position, originally exhibiting proline, was replaced by alanine (Kostov et al. 1997) as well as histidine (Lee et al. 1991). In both cases, the mutated enzymes showed almost full carboxylase activity and similar CO2/O2 specificity factor compared with the wild type. It was demonstrated in spinach that proline at that specific position triggers a side-chain interaction with the large subunit (Schneider et al. 1990). Replacements at this position by other small amino acids, like alanine and histidine, might not influence the holoenzyme very much. However, in accession Gre proline was substituted by arginine at that position, a disfavored exchange with the potential to compromise protein function (Betts and Russell 2003). It was already known that Bur, Cvi, and Sha are accessions very diverse compared with most other accessions analyzed so far (Schmid et al. 2003; Nordborg et al. 2005; Ossowski et al. 2008). For this reason, they are often included in investigations regarding photosynthesis. 
Sulpice et al. (2007) measured total and initial Rubisco activities in 118 accessions, including all our exception accessions except Bch. Thus, it was possible to see if Rubisco activity is influenced by the gene duplication/loss event. Although in Cvi total and initial activities are the lowest among all analyzed accessions, all other exception accessions show intermediate Rubisco activity levels. Because Rubisco is regulated by a complex network and rbcS genes are differently regulated, the limited set of environmental conditions analyzed so far might not be sufficient to exclude any functional implication of the gene duplication/loss found in our study. To further evaluate this issue, it would be interesting to compare both groups (i.e., standard and exception) regarding photosynthetic activity and growth under different environmental conditions, that is, light or temperature.

Neither our phylogenetic nor our geographic analysis revealed a clear indication of the putative origin of our gene duplication/loss. The striking similarity across exception accessions with regard to the position of the duplication/loss as well as of the associated insertion/deletion pattern in the respective promoter region is nevertheless suggestive of a single evolutionary origin of this particular gene variant (i.e., rbcS-2b*). However, there is apparently no single common ancestor of the accessions bearing that gene (see above and fig. 7). Arabidopsis thaliana typically reproduces via selfing. Nonetheless, there is the possibility of ancient recombination due to occasional sexual reproduction (Tian et al. 2002; Zhang and Gaut 2003; Nordborg et al. 2005). In fact, average recombination rates for A. thaliana have been found to be surprisingly high (4.8 cM/Mb), compared with other eukaryotes (average 0.7 cM/Mb in maize, 2.9 cM/Mb in Drosophila, and 1.5 cM/Mb in humans; Zhang and Gaut 2003).
This could explain the decoupling of the ancestry pattern for this particular gene from the ancestry of the accessions bearing it. Yet, it remains difficult to imagine interbreeding among A. thaliana accessions that now occur in distant regions (Germany, Ireland, Cape Verde Island, Portugal, France, and Tadjikistan). On a global scale, A. thaliana populations have been demonstrated to be geographically structured (Nordborg et al. 2005; Schmid et al. 2006). It remains to be elucidated why neither the Rubisco gene tree in total nor the occurrence of our duplication/loss at rbcS-1b follows any geographic pattern.

Although Rubisco, the most abundant protein in nature (Ellis 1979), is, with >5,000 publications, well investigated (Portis and Parry 2007), the function and interplay of the different small subunits in higher plants remain mysterious. Future studies should focus on the contribution of the different genes for the small subunits to the composition of the Rubisco holoenzyme (Spreitzer 2003), in order to unravel a potential functional implication of the gene duplication/loss event as well as the manifold alterations in promoter and protein coding sequences occurring in Rubisco genes among accessions of the model plant A. thaliana.

Supplementary Material

Supplementary table 1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Bernd Müller-Röber, Mark Stitt, and Angelo Valleriani for stimulating discussions on Rubisco gene expression and evolution. Madlen Stange and Fanny Wegner participated in the laboratory work. Financial support is acknowledged from the Bundesministerium für Bildung und Forschung to the GoFORSYS initiative (grant no. 313924).

References

Aguadé M. 2001. Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis thaliana. Mol Biol Evol. 18:1–9.
Andersson I. 2008. Catalysis and regulation in Rubisco. J Exp Bot. 59:1555–1568.
Andersson I, Backlund A. 2008. Structure and function of Rubisco. Plant Physiol Biochem. 46:275–291.
Anisimova M, Nielsen R, Yang Z. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164:1229–1236.
Baker TS, Eisenberg D, Eiserling FA, Weissman L. 1975. The structure of form I crystals of D-ribulose-1,5-diphosphate carboxylase. J Mol Biol. 91:391–399.
Balasubramanian S, Sureshkumar S, Agrawal M, Michael TP, Wessinger C, Maloof JN, Clark R, Warthmann N, Chory J, Weigel D. 2006. The PHYTOCHROME C photoreceptor gene mediates natural variation in flowering and growth responses of Arabidopsis thaliana. Nat Genet. 38:711–715.
Bedbrook JR, Coen DM, Beaton AR, Bogorad L, Rich A. 1979. Location of the single gene for the large subunit of ribulosebisphosphate carboxylase on the maize chloroplast chromosome. J Biol Chem. 254:905–910.
Betts MJ, Russell RB. 2003. Amino acid properties and consequences of substitutions. In: Barnes MR, Gray IC, editors. Bioinformatics for geneticists. West Sussex (UK): Wiley.
Casola C, Hahn MW. 2009. Gene conversion among paralogs results in moderate false detection of positive selection using likelihood methods. J Mol Evol. 68:679–687.
Dean C, Pichersky E, Dunsmuir P. 1989. Structure, evolution, and regulation of rbcS genes in higher plants. Annu Rev Plant Physiol Plant Mol Biol. 40:415–439.
Dean C, van den Elzen P, Tamaki S, Black M, Dunsmuir P, Bedbrook J. 1987. Molecular characterization of the rbcS multi-gene family of Petunia (Mitchell). Mol Gen Genet. 206:465–474.
Dedonder A, Rethy R, Fredericq H, van Montagu M, Krebbers E. 1993. Arabidopsis rbcS genes are differentially regulated by light. Plant Physiol. 101:801–808.
Donald RG, Cashmore AR. 1990. Mutation of either G box or I box sequences profoundly affects expression from the Arabidopsis rbcS-1A promoter. EMBO J. 9:1717–1726.
Du YC, Hong S, Spreitzer RJ. 2000.
RbcS suppressor mutations improve the thermal stability and CO2/O2 specificity of rbcL-mutant ribulose-1,5-bisphosphate carboxylase/oxygenase. Proc Natl Acad Sci U S A. 97:14206–14211.
Ellis RJ. 1979. The most abundant protein in the world. Trends Biochem Sci. 4:241–244.
Fitchen JH, Knight S, Andersson I, Branden CI, McIntosh L. 1990. Residues in three conserved regions of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase are required for quaternary structure. Proc Natl Acad Sci U S A. 87:5768–5772.
Flachmann R, Bohnert HJ. 1992. Replacement of a conserved arginine in the assembly domain of ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit interferes with holoenzyme formation. J Biol Chem. 267:10576–10582.
Flachmann R, Zhu C, Jensen RG, Bohnert HJ. 1997. Mutations in the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase increase the formation of the misfire product xylulose-1,5-bisphosphate. Plant Physiol. 114:131–136.
Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. 41:95–98.
JGI. Joint Genome Institute. [cited 2009 October]. Available from: http://www.jgi.doe.gov/.
Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In: Munro HN, editor. Mammalian protein metabolism. New York: Academic Press, 21–132.
Kapralov MV, Filatov DA. 2007. Widespread positive selection in the photosynthetic Rubisco enzyme. BMC Evol Biol. 7:73.
Kiran K, Ansari SA, Srivastava R, Lodhi N, Chaturvedi CP, Sawant SV, Tuli R. 2006. The TATA-box sequence in the basal promoter contributes to determining light-dependent gene expression in plants. Plant Physiol. 142:364–376.
Kostov RV, Small CL, McFadden BA. 1997. Mutations in a sequence near the N-terminus of the small subunit alter the CO2/O2 specificity factor for ribulose bisphosphate carboxylase/oxygenase. Photosynth Res. 54:127–134.
Krebbers E, Seurinck J, Herdies L, Cashmore AR, Timko MP. 1988. Four genes in two diverged subfamilies encode the ribulose-1,5-bisphosphate carboxylase small subunit polypeptides of Arabidopsis thaliana. Plant Mol Biol. 11:745–759.
Kuittinen H, Aguadé M. 2000. Nucleotide variation at the CHALCONE ISOMERASE locus in Arabidopsis thaliana. Genetics 155:863–872.
Lee B, Berka RM, Tabita FR. 1991. Mutations in the small subunit of cyanobacterial ribulose-bisphosphate carboxylase/oxygenase that modulate interactions with large subunits. J Biol Chem. 266:7417–7422.
Librado P, Rozas J. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452.
López-Ochoa L, Acevedo-Hernández G, Martínez-Hernández A, Argüello-Astorga G, Herrera-Estrella L. 2007. Structural relationships between diverse cis-acting elements are critical for the functional properties of a rbcS minimal light regulatory unit. J Exp Bot. 58:4397–4406.
Lu Y, Rausher MD. 2003. Evolutionary rate variation in anthocyanin pathway genes. Mol Biol Evol. 20:1844–1853.
Lynch M, Koskella B, Schaack S. 2006. Mutation pressure and the evolution of organelle genomic architecture. Science 311:1727–1730.
Mauricio R, Stahl EA, Korves T, Tian D, Kreitman M, Bergelson J. 2003. Natural selection for polymorphism in the disease resistance gene Rps2 of Arabidopsis thaliana. Genetics 163:735–746.
Moore RC, Grant SR, Purugganan MD. 2005. Molecular population genetics of redundant floral-regulatory genes in Arabidopsis thaliana. Mol Biol Evol. 22:91–103.
Nei M. 1987. Molecular evolutionary genetics. New York: Columbia University Press.
Nei M, Kumar S. 2000. Molecular evolution and phylogenetics. New York: Oxford University Press.
Nishio JN, Sun J, Vogelmann TC. 1993. Carbon fixation gradients across spinach leaves do not follow internal light gradients. Plant Cell. 5:953–961.
Nordborg M, Hu TT, Ishino Y, et al. (11 co-authors). 2005.
The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3:1289–1299.
Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D. 2008. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 18:2024–2033.
Paul K, Morell MK, Andrews TJ. 1991. Mutations in the small subunit of ribulosebisphosphate carboxylase affect subunit binding and catalysis. Biochemistry 30:10019–10026.
Pichersky E, Bernatzky R, Tanksley SD, Cashmore AR. 1986. Evidence for selection as a mechanism in the concerted evolution of Lycopersicon esculentum (tomato) genes encoding the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase. Proc Natl Acad Sci U S A. 83:3880–3884.
Portis AR, Parry MAJ. 2007. Discoveries in Rubisco (ribulose 1,5-bisphosphate carboxylase/oxygenase): a historical perspective. Photosynth Res. 94:121–143.
Ramos-Onsins SE, Puerma E, Balañá-Alcaide D, Salguero D, Aguadé M. 2008. Multilocus analysis of variation using a large empirical data set: phenylpropanoid pathway genes in Arabidopsis thaliana. Mol Ecol. 17:1211–1223.
Ranjan A, Ansari SA, Srivastava R, Mantri S, Asif MH, Sawant SV, Tuli R. 2009. A T9G mutation in the prototype TATA-box TCACTATATATAG determines nucleosome formation and synergy with upstream activator sequences in plant promoters. Plant Physiol. 151:2174–2186.
Read BA, Tabita FR. 1992. Amino acid substitutions in the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase that influence catalytic activity of the holoenzyme. Biochemistry 31:519–525.
Rhee SY, Beavis W, Berardini TZ, et al. (11 co-authors). 2003. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 31:224–228.
Rogers SO, Bendich AJ. 1985. Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues. Plant Mol Biol. 5:69–76.
Sawchuk MG, Donner TJ, Head P, Scarpella E. 2008.
Unique and overlapping expression patterns among members of photosynthesis-associated nuclear gene families in Arabidopsis. Plant Physiol. 148:1908–1924.
Sawyer SA. 1989. Statistical tests for detecting gene conversion. Mol Biol Evol. 6:526–538.
Schmid KJ, Sörensen TR, Stracke R, Törjék O, Altmann T, Mitchell-Olds T, Weisshaar B. 2003. Large-scale identification and analysis of genome-wide single nucleotide polymorphisms for mapping in Arabidopsis thaliana. Genome Res. 13:1250–1257.
Schmid KJ, Törjék O, Meyer R, Schmuths H, Hoffmann MH, Altmann T. 2006. Evidence for a large-scale population structure of Arabidopsis thaliana from genome-wide single nucleotide polymorphism markers. Theor Appl Genet. 112:1104–1114.
Schneider G, Knight S, Andersson I, Brändén CI, Lindqvist Y, Lundqvist T. 1990. Comparison of the crystal structures of L2 and L8S8 Rubisco suggests a functional role for the small subunit. EMBO J. 9:2045–2050.
Shepard KA, Purugganan MD. 2003. Molecular population genetics of the Arabidopsis CLAVATA2 region: the genomic scale of variation and selection in a selfing species. Genetics 163:1083–1095.
Spreitzer RJ. 2003. Role of the small subunit in ribulose-1,5-bisphosphate carboxylase/oxygenase. Arch Biochem Biophys. 414:141–149.
Spreitzer RJ, Esquivel MG, Du YC, McLaughlin PD. 2001. Alanine-scanning mutagenesis of the small-subunit βA–βB loop of chloroplast ribulose-1,5-bisphosphate carboxylase/oxygenase: substitution at arg-71 affects thermal stability and CO2/O2 specificity. Biochemistry 40:5615–5621.
Spreitzer RJ, Salvucci ME. 2002. Rubisco: structure, regulatory interactions, and possibilities for a better enzyme. Annu Rev Plant Biol. 53:449–475.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Sulpice R, Tschoep H, von Korff M, Büssis D, Usadel B, Höhne M, Witucka-Wall H, Altmann T, Stitt M, Gibon Y. 2007. Description and applications of a rapid and sensitive non-radioactive microplate-based assay for maximum and initial activity of D-ribulose-1,5-bisphosphate carboxylase/oxygenase. Plant Cell Environ. 30:1163–1175.
Sun J, Nishio J. 2001. Why abaxial illumination limits photosynthetic carbon fixation in spinach leaves. Plant Cell Physiol. 42:1–8.
TAIR. The Arabidopsis Information Resource. TAIR9 genome release [cited 2009 Jun 19]. Available from: http://www.arabidopsis.org.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 24:1596–1599.
Tian D, Araki H, Stahl E, Bergelson J, Kreitman M. 2002. Signature of balancing selection in Arabidopsis. Proc Natl Acad Sci U S A. 99:11525–11530.
Voordouw G, de Vries PA, van den Berg WAM, de Clerck EPJ. 1987. Site-directed mutagenesis of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase from Anacystis nidulans. Eur J Biochem. 163:591–598.
Watterson GA. 1975. On the number of segregating sites in genetic models without recombination. Theor Popul Biol. 7:253–276.
Wolfe KH, Li WH, Sharp PM. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci U S A. 84:9054–9058.
Wong WSW, Yang Z, Goldman N, Nielsen R. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168:1041–1051.
Yamamoto YY, Obokata J. 2008. PPDB: a plant promoter database. Nucleic Acids Res. 36:D977–D981.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood.
Comput Appl Biosci. 13:555–556.
Yang Z. 2000. Phylogenetic analysis by maximum likelihood (PAML), version 3.0. London: University College. Available from: http://abacus.gene.ucl.ac.uk/software/paml.html.
Yang Z, Wong WSW, Nielsen R. 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 22:1107–1118.
Yoon M, Putterill JJ, Ross GS, Laing WA. 2001. Determination of the relative expression levels of Rubisco small subunit genes in Arabidopsis by rapid amplification of cDNA ends. Anal Biochem. 291:237–244.
Zhang L, Gaut BS. 2003. Does recombination shape the distribution and evolution of tandemly arrayed genes (TAGs) in the Arabidopsis thaliana genome? Genome Res. 13:2533–2540.
Zhu L, Zhang Y, Zhang W, Yang S, Chen J-Q, Tian D. 2009. Patterns of exon–intron architecture variation of genes in eukaryotic genomes. BMC Genomics. 10:47.