Evolution of the Mitochondrial rps3 Intron in Perennial and Annual Angiosperms and Homology to nad5 Intron 1

Jérôme Laroche and Jean Bousquet

Centre de Recherche en Biologie Forestière, Université Laval, Québec, Canada

The plant mitochondrial rps3 intron was analyzed for substitution and indel rate variation among 15 monocot and dicot angiosperms from 10 genera, including perennial and annual taxa. Overall, the intron sequence was highly conserved among angiosperms. Based on length polymorphism, 10 different alleles were identified among the 10 genera. These allelic differences were mainly attributable to large indels. An insertion of 133 nucleotides, observed in the Alnus intron, was partially or completely absent in the other lineages of the family Betulaceae. This insertion was located within domain IV of the secondary-structure model of this group IIA intron. A mobile element of 47 nucleotides that showed homology to sequences located in the rice rps3 intron and in intergenic regions of plant mitochondrial genomes was found within this insertion. Both substitution and indel rates were low among the Betulaceae sequences, but substitution rates were increasingly larger than indel rates in comparisons involving more distantly related taxa. From a secondary-structure model, regions involved in helical structures were shown to be well preserved from indels as compared to substitutions, but compensatory changes were not observed among the angiosperm sequences analyzed. Using approximate divergence times based on the fossil record, substitution and indel rate heterogeneity was observed between different pairs of annual and perennial taxa. In particular, the annual petunia and primrose evolved more than 15 and 10 times faster, for substitution and indel rates respectively, than the perennial birch and alder.
This is the first demonstration of an evolutionary rate difference between perennial and annual forms in noncoding DNA, lending support to neutral causes such as the generation time, population size, and speciation rate effects to explain such rate heterogeneity. Surprisingly, the sequence of the rps3 intron had a high identity with the sequence of intron 1 of the angiosperm mitochondrial nad5 gene, suggesting a common origin of these two group IIA introns.

Introduction

Autocatalytic introns are of central interest in genetics because several of them are mobile elements that may insert into intronless alleles. They are also related to ribozymes, in that they direct and catalyze the splicing of the flanking exons (Michel and Ferat 1995). These characteristics are important clues that could link the introns to their early or late origin, a question which remains unresolved in evolutionary genetics (Logsdon et al. 1995; Long, Rosenberg, and Gilbert 1995). Among the classes of autocatalytic introns, plant mitochondrial introns are those for which the least is known concerning the modes and tempo of evolution. On the basis of secondary-structure models, autocatalytic introns are classified into groups I, II (subgroups A and B), and III (Michel and Dujon 1983; Christopher and Hallick 1989). Group I introns are the most widespread and have been found in all eukaryotic genomes, as well as in eubacterial and bacteriophage genomes (Lambowitz and Belfort 1993). Group II introns have been found in fungal and plant organellar genomes and in some cyanobacterial and proteobacterial genomes, which are probable ancestors of mitochondria and chloroplasts (Ferat and Michel 1993; Lambowitz and Belfort 1993; Michel and Ferat 1995). This type of intron shares structural and catalytic characteristics with nuclear pre-mRNA introns and the small nuclear RNA components of the spliceosome (Cech 1986).
Key words: Betulaceae, intron secondary structure, mobile element, indel, substitution, rate heterogeneity.

Address for correspondence and reprints: Jean Bousquet, Pavillon Marchand, Université Laval, Sainte-Foy, Canada G1K 7P4. E-mail: [email protected].

Mol. Biol. Evol. 16(4):441–452. 1999
© 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Group III introns are found in lower euglenoid taxa and are known to form a mixed group II/group III twintron in the chloroplast genome (Copertino, Christopher, and Hallick 1991). Group II introns are characterized by the folding of the RNA sequence into six double-helical domains radiating from a central wheel (Michel and Dujon 1983; Michel, Umesono, and Ozeki 1989). Also, tertiary interactions have been identified between structural domains of group I and II introns (Jacquier and Michel 1987; Jaeger, Westhof, and Michel 1993; Michel and Ferat 1995). These secondary and tertiary interactions are likely to stabilize the folding of the catalytic core of introns. Hence, the helical regions of the secondary structure appear more conserved than the loops. For the chloroplastic group II introns, it has been suggested that some domains are more involved in the stability of the secondary structure because very different substitution rates have been found among the different domains (Learn et al. 1992). Thus, owing to the importance of accurate splicing, mutations that disrupt secondary and tertiary interactions are likely to be eliminated by strong selective pressure (Michel and Ferat 1995). The analyses of substitution and indel rates in plant mitochondrial intron sequences have also shown that numbers of indels are correlated with numbers of substitutions, but the latter seem to accumulate more readily as the taxonomic distance increases between the taxa compared (Laroche et al. 1997).
Such a saturation in the number of indels detected could be due to multiple events at the same site or to the fact that indels are tolerated only at a limited number of sites, mainly located in the loops of the secondary structure. Hence, for both substitutions and indels in intron sequences, functional constraints are likely to be imposed by the secondary and tertiary interactions (Learn et al. 1992).

In higher-plant mitochondrial genomes, the rps3 gene is clustered with the rpl16 gene in an overlapping operon sequence, and it contains one intron (Leblanc et al. 1995). With regard to nucleotide substitutions, the mitochondrial rps3 intron was the most variable among six mitochondrial introns sampled in a set of angiosperm taxa (Laroche et al. 1997) and thus, it might be most informative regarding the modes and tempo of evolution of intron mitochondrial DNA. In lower-plant mitochondrial genomes, such as in the bryophyte Marchantia polymorpha, which contains 32 introns of group I and II, no introns are observed in the rps3 gene sequence (Oda et al. 1992). The primitive mitochondrial genomes of the rhodophyte Chondrus crispus (Leblanc et al. 1995) and the protozoan Reclinomonas (Lang et al. 1997) lack the majority of introns, including the one in the rps3 gene. The rps3 gene is also present in the chloroplast genome of higher plants, but it is not interrupted by an intron. However, the chloroplast rps3 gene of Euglena gracilis contains a mixed group II/group III twintron of only 409 nucleotides (Copertino, Christopher, and Hallick 1991), and the alga Chlamydomonas contains an extra-long coding region instead of an intron (Turmel and Otis 1994).
In this study, the complete nucleotide sequence of the mitochondrial rps3 intron and its secondary structure were determined for a set of taxa representing distant monocot and dicot families as well as taxa from the family Betulaceae. The aims were to analyze rate variation and to study the distribution of substitutions and indels with regard to the secondary structure of the rps3 intron, both for distantly related groups and among closely related taxa within a single family. Betulaceae is a small family of perennial angiosperms well described at the morphological and molecular levels (Crane 1989; Bousquet, Strauss, and Li 1992; Savard, Michaud, and Bousquet 1993). Significantly slower rates of substitution for both chloroplast and mitochondrial gene-coding sequences were observed in the Betulaceae when compared to annual dicots and monocots (Bousquet et al. 1992; Laroche et al. 1997). We also demonstrate a rate heterogeneity pattern between annual and perennial plant sequences for this intron, which parallels that observed for chloroplast, nuclear, and mitochondrial gene-coding regions (Bousquet et al. 1992; Gaut et al. 1992; Eyre-Walker and Gaut 1997; Laroche et al. 1997).

Materials and Methods

DNA Extraction and Amplification

To estimate the amount of sequence variation in the mitochondrial rps3 intron among angiosperm taxa, the following sequences were retrieved from GenBank (with accession numbers): within the monocots, Oryza sativa (D21251) and Zea mays (U96618) (Cyperales, Poaceae); within the dicots, Petunia hybrida (X67028) (Asteridae, Solanales) and Oenothera berteriana (X69140) (Rosidae, Myrtales).
The complete nucleotide sequence for the rps3 intron was obtained for 10 species representing the major generic and subgeneric taxonomic subdivisions within the family Betulaceae (dicots, Hamamelidae, Fagales): within the subfamily Betuleae, Alnus glutinosa (AF080076) from subgenus Alnus, Alnus maritima (AF080077) from subgenus Clethropsis, Betula alleghaniensis (AF080078) from section Costatae, Betula glandulosa (AF080079) from section Humiles, and Betula pendula (AF080081) from section Betulae; within the subfamily Coryleae, Carpinus caroliniana (AF080083), Corylus avellana (AF080084), Corylus colurna (AF080085), Corylus cornuta (AF080086), and Ostrya virginiana (AF080087). The outgroup Quercus rubra (AF080088) was selected from the closely related family Fagaceae (Hamamelidae, Fagales) (Maggia and Bousquet 1994). Total genomic DNA from all species was extracted by a CTAB method (Bousquet, Simon, and Lalonde 1990). The mitochondrial rps3 intron was amplified by PCR for 45 cycles (94°C for 30 s, 55°C for 1 min, and 72°C for 1 min 30 s), followed by 10 min at 72°C, using the forward primer 5′-ATCTGAATCGTAGTTCAGAT-3′ and the reverse primer 5′-CAAAGGTGAGTMTCGTAGGT-3′, located in exons 1 and 2, respectively. The forward and the reverse strands from each taxon were cycle-sequenced using the original primers and several internal primers: upstream, with 5′-GATGAGACTAAGCAGCCACC-3′ and 5′-TCTTATTCATTCAGGGTGCT-3′; downstream, with 5′-GCCGAGCACCCTGAATGAAT-3′ and 5′-CTCCTTCCCTTCCACTGCAT-3′. Oligonucleotides were synthesized with a 394 DNA/RNA Synthesizer, and the reactions were loaded on a 373 XL DNA Sequencer (Perkin Elmer Applied Biosystems).

Sequence Analysis

Sequence analysis was carried out with the Wisconsin Package, version 9.0 (Genetics Computer Group [GCG], Madison, Wisc.). Sequence alignment was conducted with PILEUP and corrected by eye with LINEUP. Database searching for similarity between nucleotide sequences was conducted with BLAST.
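The reverse amplification primer quoted above contains the degenerate IUPAC symbol M (A or C), so it stands for two concrete oligonucleotides. As a minimal sketch, and not part of the original protocol, the following Python snippet expands such a primer and computes reverse complements. The function names and the reduced IUPAC table are illustrative assumptions; only the two primer sequences come from the text.

```python
from itertools import product

# Minimal IUPAC table (assumption: only the symbols needed for the primers
# quoted in the text). "M" means A or C; "K" (G or T) is its complement.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "K": "GT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "M": "K", "K": "M"}


def expand_degenerate(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq):
    """Return the reverse complement, mapping M <-> K for degenerate bases."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


FORWARD_PRIMER = "ATCTGAATCGTAGTTCAGAT"  # exon 1 primer, as quoted in the text
REVERSE_PRIMER = "CAAAGGTGAGTMTCGTAGGT"  # exon 2 primer, contains degenerate M

if __name__ == "__main__":
    # Print each concrete version of the reverse primer with its reverse complement.
    for variant in expand_degenerate(REVERSE_PRIMER):
        print(variant, reverse_complement(variant))
```

Expanding the degenerate position makes explicit that the reverse primer is a mixture of two 20-mers differing at a single site.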
Values of minimum free energy for some regions were calculated with MFOLD. Secondary-structure model determination of the A. maritima rps3 intron was carried out according to a previously determined model (Michel, Umesono, and Ozeki 1989), and with the program STEMLOOP. This approach was preferred to the use of a probabilistic model (Muse 1995) because a secondary structure was already inferred for the group IIA nad5 intron (Michel, Umesono, and Ozeki 1989) for which a high sequence identity was found with the rps3 intron (see Results). Numbers of substitutions per site (rates) among the angiosperm sequences were calculated for each domain. The procedure used here avoids the circularity of identifying sequence segments on the basis of maximum conservation and then comparing rates of evolution among different domains (Golenberg et al. 1993). This also allowed the study of the evolution of secondary structure itself and the strength of secondary-structure models previously reported in the literature. Overall numbers of substitutions per site (K0) were calculated according to the two-parameter model of Kimura (1980) with MEGA, version 1.0 (Kumar, Tamura, and Nei 1993). No attempt was made at correcting for compensatory changes occurring in stem regions since no such changes were detected in this study.

The numbers of indels per site are usually estimated by summing up all indels in each pairwise taxa comparison and dividing by the number of available sites, because each indel is considered to be the result of a single mutational event (Aldrich et al. 1988; Saitou and Ueda 1994; Laroche et al. 1997). However, because the number of sites in indels can be large, this procedure can overestimate the total number of sites and underestimate the indel rate per site.
Therefore, the rate of indel per site between two nucleotide sequences was obtained by the following formula:

$I = N/(L - D + N)$

where I = indel rate, N = total number of indels, L = total number of sites, and D = number of sites involved in indels. In this equation, the number of sites involved in all indels between two sequences is subtracted from the total number of sites, and the total number of indels is added to the total number of sites to recover the sites where the indels occurred. This should allow for a more realistic estimation of numbers of indels per site.

Results

Primary and Secondary Structures of the rps3 Intron

The beginning and end of the rps3 intron were determined from previously published monocot and dicot sequences. With a range of 1475 base pairs (bp) to 1847 bp, 10 distinct rps3 intron-length variants were observed among angiosperm taxa, five of which were observed among the five Betulaceae genera sampled (table 1, diagonal). At the intrageneric level, a total of three substitutions were found among the sequences of B. alleghaniensis, B. glandulosa, and B. pendula; two small indels were found between the sequences of A. maritima and A. glutinosa; and one substitution and two small indels were found among the sequences of C. avellana, C. colurna, and C. cornuta. Thus, because of the very low intrageneric variability observed, further description will be reported for only one taxon per genus for these last three genera. The major length polymorphism among the Betulaceae taxa was caused primarily by a large indel of 133 bp in the sequence of Alnus, as compared to the shortest sequence of Betula (table 1, fig. 1). The sequences of the Coryleae (Corylus, Carpinus, and Ostrya) shared 43 bp at the 5′ end and 30 bp at the 3′ end with the Alnus indel. The outgroup sequence of Quercus (Fagaceae) shared 10 bp at the 5′ end and 5 bp at the 3′ end of this indel with the Betulaceae sequences.
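The corrected indel rate defined in Materials and Methods, I = N/(L − D + N), can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the numbers in the usage lines are made-up inputs chosen only to show how the correction differs from the conventional estimate N/L.

```python
def naive_indel_rate(n_indels, total_sites):
    """Conventional estimate: indel events divided by all aligned sites (N / L)."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    return n_indels / total_sites


def corrected_indel_rate(n_indels, total_sites, indel_sites):
    """Indel rate per site as defined in the text: I = N / (L - D + N).

    n_indels    -- N, total number of indel events between two sequences
    total_sites -- L, total number of aligned sites
    indel_sites -- D, number of sites involved in indels
    """
    if not 0 <= indel_sites <= total_sites:
        raise ValueError("indel_sites must lie between 0 and total_sites")
    denominator = total_sites - indel_sites + n_indels
    if denominator <= 0:
        raise ValueError("no sites left to compute a rate")
    return n_indels / denominator


if __name__ == "__main__":
    # Hypothetical inputs (not taken from table 1): 4 indels spanning 150 of
    # 1750 aligned sites.
    print(naive_indel_rate(4, 1750))           # N / L
    print(corrected_indel_rate(4, 1750, 150))  # N / (L - D + N)
```

With long indels, D is large relative to N, so the corrected denominator is smaller than L and the corrected rate is higher than the conventional one, which is the underestimation the formula is meant to address.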
Between these two small homologous stretches, the sequence of Quercus contained an indel region of 92 bp for which no match was possible with the Betulaceae sequences. This portion of the alignment was considered to be nonhomologous and was excluded from the calculation of the substitution rates between Betulaceae and Quercus sequences. This portion of the rps3 intron also appeared quite variable in a preceding study, and proper alignment between Zea, Triticum, Petunia, and Oenothera could not be achieved (Laroche et al. 1997).

Many repeats and inverted repeats were found within the sequence of the rps3 intron. The most important ones were found within the large indel in the sequences of Alnus and of the Coryleae (fig. 1). A long inverted repeat (32 bp) was found within the large indel in the Alnus sequence. This inverted repeat could form a stable stem-loop structure with a free energy of −68.3 kcal/mol. The truncated indel in the sequences of the Coryleae retained a part of the long inverted repeat: 24 bp in Corylus, with a free energy of −35.7 kcal/mol, and 18 bp in Carpinus and Ostrya, with a free energy of −22.7 kcal/mol (fig. 1). Two mutations, in the sequences of Carpinus and Ostrya, shortened the inverted repeat by six nucleotides (fig. 1). No strong hairpin structure was found in the Betula sequence. A structure with a free energy of −12.9 kcal/mol was observed in the corresponding Quercus sequence. In the Alnus sequence, the large indel also contained four elements of three sets of overlapping direct repeats of 9 bp (no. 1, fig. 1), 13 bp (no. 2, fig. 1), and 17 bp (nos. 3 and 3′, fig. 1). The elements of direct repeats nos. 1 and 2 were also present upstream from the indel in the Alnus intron sequence, and in the other angiosperm sequences.

The complete rps3 intron sequences of A. maritima and Q.
rubra and the portions corresponding to the large insert found in these two sequences were submitted to a BLAST search against nonredundant database sequences. Surprisingly, in the large indel of the Alnus sequence, a portion of 47 bp matched noncoding angiosperm mitochondrial sequences, and the last 20 bp of this portion belonged to the first element of the large inverted repeat (fig. 1). This 47-bp segment was also found in the mitochondrial rps3 intron from Oryza sativa, at a different location in the 3′ end of the intron (data not shown). Other positive matches were with intergenic mitochondrial sequences of diverse angiosperm taxa. For the large indel (92 bp, fig. 1) of the Quercus sequence, matches with high scores were observed with two regions of the mitochondrial rps3 intron in Arabidopsis thaliana and Brassica napus.

An unexpected result was that the rps3 intron sequences showed high identity with the first intron of the plant mitochondrial nad5 gene (fig. 2). The highest BLAST score showed more than 80% identity on a 113-bp stretch. An overall secondary-structure model, which corresponds to group IIA, was already derived for the plant mitochondrial nad5 intron (Michel, Umesono, and Ozeki 1989), so the segments highly similar between nad5 and rps3 introns helped to find the overall secondary-structure model for the rps3 intron (figs. 2–3). The most similar regions are likely involved in secondary base pairing, characterizing the group II introns (fig. 2). The helices of the different domains and the central wheel were particularly conserved between the two introns analyzed (fig. 2). The loops were much more variable in length (see numbers between brackets in fig.
2), but they also contained residue stretches that could participate in base-pairing interactions.

Table 1
Evolutionary Rates of Mitochondrial rps3 Intron Among Angiosperm Taxa. Numbers of Indels and Numbers of Indels per Site (above diagonal), Numbers of Substitutions and Number of Substitutions per Site (below diagonal)

                       Oryza            Zea              Oenothera        Petunia          Quercus          Alnus            Betula           Corylus          Carpinus         Ostrya
Oryza sativa           1847a            11 0.006±0.002   44 0.025±0.004   47 0.031±0.004   35 0.019±0.003   34 0.017±0.003   37 0.022±0.004   34 0.018±0.003   33 0.018±0.003   33 0.018±0.003
Zea mays               35 0.020±0.003   1843a            44 0.025±0.004   44 0.029±0.004   36 0.019±0.003   31 0.016±0.003   34 0.020±0.003   32 0.017±0.003   30 0.016±0.003   31 0.017±0.003
Oenothera berteriana   153 0.122±0.010  147 0.117±0.010  1649a            35 0.024±0.004   33 0.020±0.003   30 0.018±0.003   30 0.019±0.003   31 0.019±0.003   33 0.020±0.003   30 0.018±0.003
Petunia hybrida        153 0.123±0.010  145 0.117±0.010  113 0.083±0.008  1475a            30 0.020±0.004   27 0.018±0.003   28 0.019±0.004   26 0.018±0.003   29 0.020±0.004   28 0.019±0.004
Quercus rubra          141 0.111±0.010  133 0.105±0.009  106 0.075±0.007  98 0.072±0.007   1650a            18 0.010±0.002   16 0.010±0.003   18 0.011±0.003   18 0.011±0.003   17 0.010±0.002
Alnus maritima         136 0.106±0.009  129 0.100±0.009  115 0.077±0.007  99 0.071±0.007   18 0.011±0.003   1734a            4 0.002±0.001    5 0.003±0.001    7 0.004±0.002    6 0.004±0.001
Betula alleghaniensis  140 0.109±0.009  132 0.103±0.009  112 0.077±0.007  105 0.076±0.008  18 0.012±0.003   9 0.006±0.002    1599a            6 0.004±0.002    9 0.006±0.002    8 0.005±0.002
Corylus cornuta        137 0.107±0.009  129 0.100±0.009  114 0.077±0.007  100 0.072±0.007  16 0.010±0.003   6 0.004±0.002    9 0.006±0.002    1668a            4 0.002±0.001    3 0.002±0.001
Carpinus caroliniana   141 0.110±0.009  132 0.103±0.009  117 0.079±0.007  103 0.075±0.008  18 0.011±0.003   11 0.007±0.002   9 0.006±0.002    7 0.004±0.002    1672a            1 0.001±0.001
Ostrya virginiana      139 0.108±0.009  130 0.101±0.009  115 0.077±0.007  101 0.073±0.007  17 0.011±0.003   10 0.006±0.002   8 0.005±0.002    4 0.002±0.001    3 0.002±0.001    1674a

NOTE: The numbers of substitutions and indels per site were obtained from pairwise deletion of gap sites from all comparisons.
a Intron length (bp) for each species.

FIG. 1. Sequence alignment of a portion of the loop in domain IV of the mitochondrial rps3 intron. Inverted repeats are indicated by the large arrows. Direct-repeat elements are indicated by the small numbered lines. The mobile element that had the highest scores in the BLAST search is boxed, with the two small direct repeats (6 bp) indicated by the dashed boxes. Note that the corresponding portion in the Quercus sequence is not homologous to the Betulaceae sequences.

The secondary-structure model derived for the rps3 intron sequence of A.
maritima also corresponds to those of group IIA introns with six domains (I–VI) radiating from a central wheel, regions of exon- and intron-binding sites (EBS and IBS), and γ–γ′ potentially involved in tertiary interactions (fig. 3). Although the EBS1–EBS2 and IBS1–IBS2 regions appeared very different between the nad5 and rps3 introns, they were located in the same regions of domain I and at the 3′ end of exon 1. According to this model, the large indel found in Alnus, Corylus, Ostrya, and Carpinus sequences was located in the loop of domain IV. In the rps3 intron, domain IV was the largest, with 43.3% of the total sequence length, and domain II was the smallest, with only 1.4% of the total sequence length.

Substitution and Indel Rates in rps3 Intron Sequences

Overall numbers of substitutions per site were estimated with pairwise deletion of gap sites to allow a direct comparison with the numbers of indels per site, which were obtained for each pairwise comparison. These estimates varied greatly between the different angiosperm rps3 intron sequences compared (lower left matrix, table 1). Within dicots, substitution rates were similar across subclasses (Asteridae, Rosidae, and Hamamelidae), although rates between Oenothera (Onagraceae, Rosidae) and the Betulaceae (Hamamelidae) were higher than those between Petunia (Solanaceae, Asteridae) and these dicots. Within the Betulaceae, the Coryleae (Corylus, Carpinus, and Ostrya) were more similar in sequence to each other than the Betuleae (Betula and Alnus) were, although these differences were not significant (data not shown). A large rate heterogeneity was observed for substitutions between annual and perennial taxa. Using approximate divergence times based on the fossil record to estimate rates per year, differences from 10- to 30-fold were observed between, on one hand, the annuals Oryza-Zea (Poaceae) and Petunia-Oenothera and, on the other hand, the perennials Alnus-Betula and Carpinus-Ostrya (table 2).
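The substitution estimates in this subsection rest on Kimura's two-parameter model with pairwise deletion of gap sites, which the study computed with MEGA. As a minimal sketch of that calculation, and not a reproduction of MEGA itself, the Python function below counts transitions and transversions over the sites where both aligned sequences carry an unambiguous base and applies the standard formula K = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The function name and the toy sequences are illustrative assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous symbol are skipped
    (pairwise deletion), so only A/C/G/T pairs are compared.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gap or ambiguous sites
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Toy alignment (hypothetical): one transition over ten comparable sites;
    # the gap column is ignored.
    print(k2p_distance("A-AAAAAAAAA", "GAAAAAAAAAA"))
```

Because gap columns are dropped pair by pair, the number of compared sites differs between comparisons, which is why the substitution counts and per-site values in table 1 do not scale by a single common length.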
Errors in calibration dates could not account for such rate heterogeneity between the various groups compared, with the largest differences observed between perennial and annual taxa. Using Oryza or Zea as reference taxon in lineage relative-rate tests (Li and Bousquet 1992), significant differences were observed between, on one hand, the annuals Petunia-Oenothera, and on the other hand, the perennials Alnus-Betula or Carpinus-Ostrya (P < 0.01 in the four tests conducted). No such test could be conducted to compare the annuals Oryza and Zea to the perennial Betulaceae because of the nonavailability of suitable outgroup sequences outside the angiosperms.

The numbers of indels per site were also found to vary extensively across all angiosperm pairwise comparisons (upper right matrix, table 1). Indel rates were similar across dicot subclasses (Asteridae, Rosidae, and Hamamelidae), and higher rates were observed between Oenothera and the Betulaceae than between Petunia and these perennial dicots. Within the Betulaceae, there was less difference in indel rates between Coryleae and Betuleae than in substitution rates. Again here, extensive rate heterogeneity was observed between annual and perennial taxa.

FIG. 2. Sequence alignment showing the most conserved regions between mitochondrial nad5-1 and rps3 introns. Numbers in brackets indicate length of omitted segments. Small dots correspond to gaps which were introduced to increase similarity. The portions involved in helices of domains I–VI are underlined; EBS and IBS stand for exon- and intron-binding sites, respectively; # refers to the γ–γ′ base pair; * indicates the nucleotide involved in the lariat formation (see also fig. 3).
Using approximate divergence times derived from the fossil record to estimate rates per year, differences from 10- to 30-fold were also found between, on one hand, the annuals Oryza-Zea and Petunia-Oenothera and, on the other hand, the perennials Alnus-Betula and Carpinus-Ostrya (table 2). Using Oryza or Zea as reference taxon in lineage relative-rate tests (Li and Bousquet 1992), significant differences were observed between, on one hand, the annuals Petunia-Oenothera, and, on the other hand, the perennials Alnus-Betula or Carpinus-Ostrya (P < 0.01 in the four tests conducted). For the same reason mentioned above, no such test could be conducted between the annuals Oryza and Zea and the perennial Betulaceae.

Indel rates seemed generally more constrained than substitution rates. Indeed, using the ratio of substitution rate (K) to indel rate (I), K/I, the substitution rates were found to increase more rapidly than the indel rates as taxonomic distance increased. The ratio varied between 1.0 and 2.7 for comparisons within the Betulaceae family and for some comparisons between Betulaceae taxa and Fagaceae (Quercus). The ratio values increased to between 3.0 and 4.8 for comparisons between different subclasses of the dicots (Petunia-Asteridae and Oenothera-Rosidae), between Zea and Oryza (Poaceae), and between the monocots (Poaceae) and the dicots (Petunia and Oenothera). The ratio values varied between 5.0 and 6.4 for most comparisons between annuals and perennials. Even if most substitutions and indels observed in the rps3 intron between angiosperm taxa were located within the loops of the different domains, particularly for domains III and IV (table 3), substitutions were more evenly distributed than indels, which could account for the increasing substitution-to-indel rate ratio as taxonomic distances increased.
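The per-year rates and the K/I ratios discussed above follow from simple arithmetic on the pairwise estimates. The sketch below assumes the common convention r = K/(2T), in which the divergence accumulated between two taxa is spread over both lineages since their split. The text does not spell this formula out, although it appears consistent with the ranges reported in table 2. Function names are hypothetical and the input values are illustrative, rounded figures rather than authoritative data.

```python
def rate_per_year(per_site_divergence, divergence_time_myr):
    """Rate per site per year, assuming r = K / (2T) with T in million years."""
    if divergence_time_myr <= 0:
        raise ValueError("divergence time must be positive")
    return per_site_divergence / (2.0 * divergence_time_myr * 1e6)


def rate_range(per_site_divergence, oldest_myr, youngest_myr):
    """Lower and upper rate bounds for a divergence-time interval.

    The oldest date gives the slowest rate, the youngest date the fastest.
    """
    return (rate_per_year(per_site_divergence, oldest_myr),
            rate_per_year(per_site_divergence, youngest_myr))


def substitution_to_indel_ratio(k, i):
    """K/I ratio used in the text to compare substitution and indel rates."""
    if i <= 0:
        raise ValueError("indel rate must be positive")
    return k / i


if __name__ == "__main__":
    # Illustrative inputs: K = 0.083 and I = 0.024 per site, with a
    # divergence-time interval of 90 to 70 Myr.
    low, high = rate_range(0.083, 90, 70)
    print(f"r(K) between {low:.2e} and {high:.2e} per site per year")
    print(f"K/I = {substitution_to_indel_ratio(0.083, 0.024):.1f}")
```

Giving a range rather than a single rate keeps the uncertainty of the fossil calibration visible, which is how the rates are presented in table 2.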
The central wheel of the secondary structure was particularly preserved from substitutions and indels (see nucleotide positions between main domains in fig. 2). The main loop of domain IV was the least conserved, and most substitutions and indels among angiosperm sequences were located in this region (data not shown for indel rates, but see table 3 for substitution rates). Among the six domains, domain II was the most conserved, with only one substitution in the Oenothera sequence, followed by domains V, VI, III, I, and IV (table 3). In general, there was not much difference in substitution rates between helix regions alone and the overall domain. Surprisingly, in some pairwise comparisons, the substitution rates were higher for the regions involved in base-pairing interactions than for the overall domain (table 3). However, no compensatory changes were detected.

FIG. 3. Secondary-structure model of the mitochondrial rps3 gene of Alnus maritima based on Michel et al. (1989). Regions potentially involved in tertiary interactions are indicated: IBS1–IBS2, EBS1–EBS2, γ–γ′, and the bulged nucleotide A in domain VI. The loops are not drawn to scale.

Table 2
Numbers of Substitutions (K) and Indels (I) per Site per Year Between Angiosperm Taxa

Comparison               Divergence time (Myr)a   r (I)                        r (K)
Petunia vs. Oenothera    90–70                    1.32 × 10⁻¹⁰–1.70 × 10⁻¹⁰    4.63 × 10⁻¹⁰–5.96 × 10⁻¹⁰
Oryza vs. Zea            70–50                    4.40 × 10⁻¹¹–6.16 × 10⁻¹¹    1.42 × 10⁻¹⁰–1.99 × 10⁻¹⁰
Alnus vs. Betula         85–75                    1.47 × 10⁻¹¹–1.67 × 10⁻¹¹    3.33 × 10⁻¹¹–3.77 × 10⁻¹¹
Carpinus vs. Ostrya      65–55                    4.60 × 10⁻¹²–5.43 × 10⁻¹²    1.38 × 10⁻¹¹–1.64 × 10⁻¹¹

a References for approximate divergence times are as follows: Petunia vs. Oenothera (divergence between Asteridae and Rosidae): Cronquist (1988, pp. 413–415, 359–361) and Stewart and Rothwell (1993, p. 483); Oryza vs. Zea: G. L. Stebbins, cited in Wolfe et al. (1989); and Alnus vs. Betula and Carpinus vs. Ostrya: Crane (1989).

Table 3
Numbers of Substitutions per Site for Each Domain of Mitochondrial rps3 Intron Between Angiosperm Taxa

                          Oryza/Zea       Oryza/Oenothera  Alnus/Betula    Alnus/Oenothera  Oenothera/Petunia
Domain I (466 sites)      0.015 ± 0.006   0.111 ± 0.016    0.004 ± 0.003   0.060 ± 0.012    0.072 ± 0.013
  Helix (224 sites)       0.014 ± 0.008   0.085 ± 0.020    0.009 ± 0.006   0.060 ± 0.017    0.071 ± 0.019
Domain II (25 sites)      0               0.042 ± 0.043    0               0.042 ± 0.043    0.042 ± 0.043
Domain III (229 sites)    0.022 ± 0.010   0.022 ± 0.010    0.009 ± 0.006   0.059 ± 0.020    0.094 ± 0.021
  Helix (44 sites)        0               0.151 ± 0.063    0.023 ± 0.024   0.023 ± 0.023    0.023 ± 0.023
Domain IV (478 sites)     0.039 ± 0.009   0.163 ± 0.020    0.006 ± 0.004   0.063 ± 0.012    0.093 ± 0.015
Domain V (34 sites)       0               0.061 ± 0.044    0               0.030 ± 0.030    0.061 ± 0.044
Domain VI (80 sites)      0.013 ± 0.013   0.079 ± 0.033    0               0.052 ± 0.026    0.038 ± 0.022
  Helix (49 sites)        0               0.086 ± 0.044    0               0.064 ± 0.037    0.042 ± 0.030

NOTE: See table 1 for complete name of taxa. Secondary model according to figure 3. The numbers of substitutions per site were obtained from complete deletion of gap sites from all comparisons.

Discussion

The rps3 intron was not an arbitrary choice as a case study. In a preliminary screening, six mitochondrial introns were tested for amplification with polymerase chain reaction (PCR) in the family Betulaceae: cox2, nad1, nad4, nad5-ab, nad5-de, and rps3. From this screening, five introns were successfully amplified, and a fragment of the expected size was obtained. No intron was found to split the coding sequence of the cox2 gene (data not shown). This observation is consistent with those of De Benedetto et al. (1992) and Rabbi and Wilson (1993), who found extensive variation in the occurrence of this intron among angiosperms. Of the five introns successfully amplified in the family Betulaceae, a length polymorphism was observed only for the rps3 intron. This intron was also the most variable in sequence among a set of four annual dicots and monocots (Laroche et al.
1997). We therefore focused on this intron for a detailed analysis of substitution and indel rates and of patterns of sequence variation between annual and perennial angiosperms.

Secondary Structures and Paralogy Between Mitochondrial Introns rps3 and nad5

According to the secondary structure obtained here, the angiosperm mitochondrial rps3 intron belongs to group IIA, like intron 1 of the mitochondrial nad5 gene. It has been shown that domain V may be regarded as a specific attribute of group II introns (Michel and Ferat 1995). In this study, not only domain V but also large parts of domains I and VI and all of domain II appeared highly conserved between the mitochondrial rps3 and nad5 introns. These observations indicate a recent common origin of these two introns and support the idea of intron spread by means of reverse-splicing mechanisms (Malek, Brennicke, and Knoop 1997). The gain of an intron into a coding sequence through duplication and integration into a new site has also been reported for the introns of cox1, cox3, and rrn26 in the mitochondrial genome of Marchantia polymorpha and for the introns of nad1 and nad2 in higher plants (reported in Schuster and Brennicke 1994, and references therein). Between the rps3 and nad5 introns, only the 5′ portion of the helical part of domain III remained strongly conserved, and sequences of domain IV could not be aligned. Also, in the rps3 intron, domain II was the shortest and the most conserved domain, while domain IV was the longest and the most variable. The same pattern was found for most of the mitochondrial introns analyzed (Michel, Umesono, and Ozeki 1989). This gradient of substitution rates observed among the six domains would indicate different levels of constraint, with the most variable domains being less essential for the catalytic activity even if they are presumed to be involved in the folding of ribozymes or in protein binding (Lambowitz and Belfort 1993).
It is rather surprising to observe relatively high substitution rates for the helical part of each domain, which could cause sequence mispairing, as previously observed for rRNA genes (Rzhetsky 1995). However, we found no evidence of compensatory changes between the two DNA strands of helical structures. These observations suggest that weak helical core structures are possible for plant mitochondrial group II introns, which could be explained by incompleteness or absence of an RNA-editing process in these regions (Carrillo and Bonen 1997). Different substitution rates were also observed among structural domains of the chloroplast trnV intron, but domain II was longer and more variable than domain IV (Michel, Umesono, and Ozeki 1989; Learn et al. 1992). Such differences in length and substitution rates among domains between chloroplast and mitochondrial introns could be the result of distinct evolutionary modes of these genomes (Wolfe et al. 1987; Michel and Ferat 1995). These results also show that although the central core and the overall structure of the six domains were highly conserved, secondary and tertiary interactions could differ among taxa following extensive evolution by substitutions and indels. A secondary-structure model has also been inferred for the chloroplast rps3 intron in E. gracilis (Copertino, Christopher, and Hallick 1991), but it differs greatly in its primary sequence from the plant mitochondrial rps3 intron. Also, the former contains a group III intron within the sequence of the group II intron.

The occurrence of an insert of variable length in domain IV of the rps3 intron of the Betulaceae was investigated from an evolutionary perspective (fig. 4). According to the known phylogeny of the family Betulaceae (Crane 1989; Bousquet, Strauss, and Li 1992), this DNA fragment, absent in the outgroup sequence of Quercus and in other angiosperm sequences, was likely inserted before the diversification that led to the extant family Betulaceae (event 1). It was subsequently lost, partially (event 2) in the subfamily Coryleae (Corylus, Carpinus, and Ostrya) and completely (event 3) in the genus Betula. Finally, two mutations in the Ostrya-Carpinus lineage shortened the inverted repeat (event 4). The insert was conserved in its most complete form only in the sequences of Alnus, regarded as the most primitive member of the family on the basis of fossil evidence (see Bousquet, Strauss, and Li 1992). We have further shown that part of this fragment is an inverted repeat found to be homologous to other sequences of angiosperm mitochondrial genomes, suggesting a possible transpositional event. The presence of such repeated elements in the large and variable domain IV in the secondary structure of the rps3 intron may increase the recombinational activity in the mitochondrial genome (Malek, Brennicke, and Knoop 1997). Such recombinational activity caused by repeated sequences is a common feature of the plant mitochondrial genome (Houchins et al. 1986; André, Levy, and Walbot 1992).

FIG. 4.—Commonly accepted phylogenetic tree of the family Betulaceae (Crane 1989; Bousquet, Strauss, and Li 1992) based on morphological characters and rbcL nucleotide sites, showing gains (downward arrows) and losses (upward arrows) of the large indel in the mitochondrial rps3 intron. 1—The indel was gained before the divergence of the family. 2—Partial loss of 60 nucleotides during the Coryleae cladogenesis. 3—Complete loss in the genus Betula. 4—Occurrence of two nucleotide substitutions shortening the inverted repeat in the truncated indel of Ostrya and Carpinus.

Heterogeneity of Substitution and Indel Rates

The plant mitochondrial rps3 intron appears to be the most variable in length and in sequence among the plant mitochondrial introns analyzed to date (Laroche et al. 1997).
Substitution rates were fairly similar to indel rates in the mitochondrial rps3 intron when comparisons were made between closely related taxa, such as within the family Betulaceae. However, when the comparisons involved a Betulaceae sequence and the outgroup sequence of Quercus from the Fagaceae, indel rates did not increase as much as substitution rates. Indeed, substitution rates were, on average, 2.5 times higher than indel rates. The ratio was even more unbalanced when taxonomically more distant comparisons were made. This trend toward larger substitution-to-indel ratios with increasing taxonomic distance has already been reported for other mitochondrial introns (Laroche et al. 1997) and for a noncoding chloroplast DNA region (Golenberg et al. 1993). This effect could be attributable to the secondary and tertiary structures of the rps3 intron, which would impose major constraints and more readily eliminate indels that could disrupt the overall stability, or to the fact that multiple indels at the same site cannot be detected. Indeed, substitutions were found to be more evenly distributed than indels, indicating that they would be more easily tolerated: many substitutions were observed in putatively important sites or regions involved in secondary and tertiary interactions, whereas much of the sequence variation attributable to indels was within the loops of the different domains, particularly domains III and IV. In the Betulaceae, the substitution rate of the mitochondrial rps3 intron was very low. This substitution rate is comparable to that observed at the rbcL locus between the same genera (Bousquet, Strauss, and Li 1992), and it is three times lower than that for 18S rRNA between the same taxa and 30 times lower than that estimated for the internal transcribed spacers ITS1 and ITS2 of nuclear rDNA between the same taxa (Savard, Michaud, and Bousquet 1993).
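The comparisons above rest on two simple quantities: an absolute rate per site per year and the ratio of substitutions to indels for a given pair of taxa. The Python sketch below shows one way to compute them, assuming the usual relation r = K / (2T), where K is the pairwise distance and T the time since divergence (the factor 2 accounts for change along both lineages). Whether the study used exactly this relation is an assumption, the class and method names are hypothetical, and every number is a placeholder rather than a value from this study.

```python
from dataclasses import dataclass


@dataclass
class PairwiseComparison:
    """One pair of taxa; all fields are supplied by the user."""
    label: str
    substitutions_per_site: float
    indels_per_site: float
    divergence_time_years: float

    def _rate(self, changes_per_site):
        # r = K / (2T): both lineages accumulate changes since the split.
        if self.divergence_time_years <= 0:
            raise ValueError("divergence time must be positive")
        return changes_per_site / (2.0 * self.divergence_time_years)

    def substitution_rate(self):
        """Substitutions per site per year."""
        return self._rate(self.substitutions_per_site)

    def indel_rate(self):
        """Indels per site per year."""
        return self._rate(self.indels_per_site)

    def substitution_to_indel_ratio(self):
        """Ratio of substitutions to indels; independent of the time scale."""
        if self.indels_per_site == 0:
            raise ValueError("no indels observed; ratio is undefined")
        return self.substitutions_per_site / self.indels_per_site


def fold_difference(fast, slow):
    """How many times faster one pair evolves than another (substitutions)."""
    return fast.substitution_rate() / slow.substitution_rate()


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the functions.
    slow_pair = PairwiseComparison("perennial pair", 0.005, 0.004, 70e6)
    fast_pair = PairwiseComparison("annual pair", 0.080, 0.020, 60e6)
    print(slow_pair.substitution_rate(), fast_pair.substitution_rate())
    print(fast_pair.substitution_to_indel_ratio())
    print(fold_difference(fast_pair, slow_pair))
```

Because the divergence time cancels out of the substitution-to-indel ratio, that ratio can be compared across pairs even when the fossil-based dates are only approximate, whereas the fold difference between pairs depends directly on them.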
Moreover, the lineage relative-rate tests (Li and Bousquet 1992) conducted between Asteridae and Rosidae taxa on the one hand and the Betulaceae on the other hand, together with the estimation of evolutionary rates per year in these diverse groups of plants and in the Poaceae, revealed extensive heterogeneity in rates of molecular evolution between annual and perennial plant taxa. Such rate heterogeneity, observed for substitutions as well as for indels, does not appear to result from biased procedures of rate estimation. Refined models of nucleotide substitution have recently been proposed to take into account the interdependence of sites and the variability of substitution rates across sequences for which base-pairing interactions occur among distant sites (Schöniger and von Haeseler 1994; Rzhetsky 1995; Tillier and Collins 1995). These models have been applied to rRNA sequences of very distant taxa (Rzhetsky 1995) and to rapidly evolving mammalian mitochondrial gene sequences (Schöniger and von Haeseler 1994). At high levels of sequence identity, very little difference was observed between these models and those assuming independence of sites and homogeneous distribution of variation across sites, such as those of Jukes and Cantor (1969) and Kimura (1980). Significant differences were only observed at lower levels of sequence identity, where it becomes extremely difficult to align sequences (Schöniger and von Haeseler 1994). Thus, these methods were not required in this study of angiosperm mitochondrial intron sequences, which could be easily aligned by eye and in which the estimated numbers of substitutions were low. In addition, the single-strand model of Tillier and Collins (1995) could be particularly appropriate when large numbers of compensatory changes are detected. In this study, no compensatory changes were observed.
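A screen for compensatory changes of the kind referred to above can be sketched as follows. Given two aligned sequences and a list of paired positions taken from a secondary-structure model, each helical pair is classified as unchanged, compensatory (both partners changed and pairing maintained), changed on one side with pairing maintained, or disrupted. This is a minimal Python sketch under stated assumptions, not the procedure used in this study: the function names, the option of counting G-U wobble pairs as valid, and the toy sequences and pair list are all hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    """Upper-case the sequence and read T as U so DNA input can be used."""
    return seq.upper().replace("T", "U")


def can_pair(x, y, allow_wobble=True):
    """True if the two bases can form a Watson-Crick (or wobble) pair."""
    pair = (x, y)
    return pair in WATSON_CRICK or (allow_wobble and pair in WOBBLE)


def classify_helix_pairs(seq_a, seq_b, pairs, allow_wobble=True):
    """Classify each paired position (i, j), 0-based alignment columns.

    seq_a is taken as the reference: pairs that contain a gap in either
    sequence, or that are not base-paired in seq_a, are counted as skipped.
    """
    a, b = to_rna(seq_a), to_rna(seq_b)
    counts = {
        "unchanged": 0,
        "compensatory": 0,
        "single_pairing_kept": 0,
        "pairing_lost": 0,
        "skipped": 0,
    }
    for i, j in pairs:
        a_pair, b_pair = (a[i], a[j]), (b[i], b[j])
        if "-" in a_pair + b_pair or not can_pair(*a_pair, allow_wobble):
            counts["skipped"] += 1
        elif a_pair == b_pair:
            counts["unchanged"] += 1
        elif not can_pair(*b_pair, allow_wobble):
            counts["pairing_lost"] += 1
        elif a_pair[0] != b_pair[0] and a_pair[1] != b_pair[1]:
            counts["compensatory"] += 1
        else:
            counts["single_pairing_kept"] += 1
    return counts


if __name__ == "__main__":
    # Toy hairpin: positions 0-2 pair with positions 9-7.
    helix = [(0, 9), (1, 8), (2, 7)]
    print(classify_helix_pairs("GGCAAAAGCC", "GAAAAAAGTC", helix))
```

A result with many substitutions in helical positions but no pairs in the compensatory class would correspond to the situation described for the rps3 intron, where helices accumulate changes without matching changes on the opposite strand.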
The large rate heterogeneity in substitutions and indels observed here between annual and perennial plant taxa therefore cannot be attributed to a biased estimation of the number of substitutions due to compensatory changes. This new observation of rate heterogeneity between annual and perennial taxa for a noncoding region follows the trend observed for chloroplast, nuclear, and mitochondrial coding regions (Bousquet, Strauss, and Li 1992; Gaut et al. 1992; Eyre-Walker and Gaut 1997; Laroche et al. 1997). The much slower rate of evolution in perennial taxa such as the Betulaceae, now detected at the level of noncoding mitochondrial DNA, lends support to the idea that evolutionary forces affecting all regions of the different genomes, such as generation time, population size, and speciation rate, are likely to be involved (Bousquet et al. 1992; Eyre-Walker and Gaut 1997). Furthermore, molecular rate heterogeneity appears to correlate with rates of morphological evolution in a growing number of taxa (Bousquet, Strauss, and Li 1992; Bousquet et al. 1992; Omland 1997), although the main driving force behind this evolutionary trend remains to be identified.

Conclusions

The results presented here show that the regions essential for the folding of the angiosperm mitochondrial rps3 intron are well preserved from mutations, particularly from indels. The indel rates were more similar to the substitution rates when closely related taxa were compared, but the indel-to-substitution ratio decreased when more distant species were compared. The overall substitution rates of the rps3 intron were in the range of synonymous substitution rates for plant mitochondrial exons (Laroche et al. 1997) and in the range of those estimated for chloroplast coding regions; hence, the intron sequences are not very phylogenetically informative at the intrafamilial level.
However, substitution rate heterogeneity between annual and perennial taxa was observed for a noncoding (mitochondrial) region, which parallels the trend previously observed for coding regions in the three plant genomes. The paralogy detected here between the rps3 and nad5 introns is significant, lending support to intron transfer between different mitochondrial genes. This observation stresses the need for an overall phylogenetic tree of introns, organellar and nuclear, in order to understand their complete evolutionary history.

Acknowledgments

We thank W. J. Elisens (Department of Botany, University of Oklahoma) for kindly providing seeds of A. maritima; J. Renaud and S. Pelletier (RSVS, Université Laval) for primer synthesis and DNA sequencing; C. Lemieux (Département de Biochimie, Université Laval) for discussions concerning intron evolution and secondary structures; and D. J. Perry (CRBF, Université Laval) for comments on an earlier draft of this manuscript. This work was supported by an FCAR of Québec fellowship to J.L. and by NSERC of Canada and FCAR grants to J.B.

LITERATURE CITED

ALDRICH, J., B. W. CHERNEY, E. MERLIN, and L. CHRISTOPHERSON. 1988. The role of insertions/deletions in the evolution of the intergenic region between psbA and trnH in the chloroplast genome. Curr. Genet. 14:137–146.
ANDRÉ, C., A. LEVY, and V. WALBOT. 1992. Small repeated sequences and the structure of plant mitochondrial genomes. Trends Genet. 8:128–132.
BOUSQUET, J., L. SIMON, and M. LALONDE. 1990. DNA amplification from vegetative and sexual tissues of trees using polymerase chain reaction. Can. J. For. Res. 20:254–257.
BOUSQUET, J., S. H. STRAUSS, A. H. DOERKSEN, and R. A. PRICE. 1992. Extensive variation in evolutionary rate of rbcL gene sequences among seed plants. Proc. Natl. Acad. Sci. USA 89:7844–7848.
BOUSQUET, J., S. H. STRAUSS, and P. LI. 1992.
Complete congruence between morphological and rbcL-based molecular phylogenies in birches and related species (Betulaceae). Mol. Biol. Evol. 9:1076–1088.
CARRILLO, C., and L. BONEN. 1997. RNA editing status of nad7 intron domains in wheat mitochondria. Nucleic Acids Res. 25:403–409.
CECH, T. R. 1986. The generality of self-splicing RNA: relationship to nuclear mRNA splicing. Cell 44:207–210.
CHRISTOPHER, D. A., and R. B. HALLICK. 1989. Euglena gracilis chloroplast ribosomal protein operon: a new chloroplast gene for ribosomal protein L5 and description of a novel organelle intron category designated group III. Nucleic Acids Res. 17:7591–7608.
COPERTINO, D. W., D. A. CHRISTOPHER, and R. B. HALLICK. 1991. A mixed group II/group III twintron in the Euglena gracilis chloroplast ribosomal protein S3 gene: evidence for intron insertion during gene evolution. Nucleic Acids Res. 19:6491–6497.
CRANE, P. R. 1989. Early fossil history and evolution of the Betulaceae. Pp. 87–116 in P. R. CRANE and S. BLACKMORE, eds. Evolution, systematics, and fossil history of the Hamamelidae, Vol. 2. “Higher” Hamamelidae. Clarendon, Oxford, England.
CRONQUIST, A. 1988. The evolution and classification of flowering plants. 2nd edition. Columbia University Press, New York.
DE BENEDETTO, C., L. DE GARA, O. ARRIGONI, M. ALBRIZIO, and R. GALLERANI. 1992. The structure of the cytochrome oxidase subunit II gene and its use as a new character in the construction of the phylogenetic tree of Angiospermae. Plant Sci. 81:75–82.
EYRE-WALKER, A., and B. S. GAUT. 1997. Correlated rates of synonymous site evolution across plant genomes. Mol. Biol. Evol. 14:455–460.
FERAT, J.-L., and F. MICHEL. 1993. Group II self-splicing introns in bacteria. Nature 364:358–361.
GAUT, B. S., S. V. MUSE, W. D. CLARK, and M. T. CLEGG. 1992. Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants. J. Mol. Evol. 35:292–303.
GOLENBERG, E. M., M. T. CLEGG, M. L. DURBIN, J. DOEBLEY, and D. P. MA.
1993. Evolution of a non-coding region of the chloroplast genome. Mol. Phylogenet. Evol. 2:52–64.
HOUCHINS, J. P., H. GINSBURG, M. ROHRBAUGH, R. M. K. DALE, C. L. SCHARDL, T. P. HODGE, and D. M. LONSDALE. 1986. DNA sequence analysis of a 5.27-kb direct repeat occurring adjacent to the regions of S-episome homology in maize mitochondria. EMBO J. 5:2781–2788.
JACQUIER, A., and F. MICHEL. 1987. Multiple exon-binding sites in class II self-splicing introns. Cell 50:17–29.
JAEGER, L., E. WESTHOF, and F. MICHEL. 1993. Monitoring of the cooperative unfolding of the sunY group I intron of bacteriophage T4. J. Mol. Biol. 234:331–346.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21–123 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KIMURA, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: Molecular evolutionary genetics analysis. Version 1.0. Pennsylvania State University, University Park.
LAMBOWITZ, A. M., and M. BELFORT. 1993. Introns as mobile genetic elements. Annu. Rev. Biochem. 62:587–622.
LANG, B. F., G. BURGER, C. J. O’KELLY, R. CEDERGREN, G. B. GOLDING, C. LEMIEUX, D. SANKOFF, M. TURMEL, and M. W. GRAY. 1997. An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387:493–497.
LAROCHE, J., P. LI, L. MAGGIA, and J. BOUSQUET. 1997. Molecular evolution of angiosperm mitochondrial exons and introns. Proc. Natl. Acad. Sci. USA 94:5722–5727.
LEARN, G. H., J. S. SHORE, G. R. FURNIER, G. ZURAWSKI, and M. T. CLEGG. 1992. Constraints on the evolution of plastid introns: the group II intron in the gene encoding tRNA-Val (UAC). Mol. Biol. Evol. 9:856–871.
LEBLANC, C., C. BOYEN, O. RICHARD, G. BONNARD, J.-M. GRIENENBERGER, and B. KLOAREG. 1995.
Complete sequence of the mitochondrial DNA of the rhodophyte Chondrus crispus (Gigartinales). Gene content and genome organization. J. Mol. Biol. 250:484–495.
LI, P., and J. BOUSQUET. 1992. Relative rate test for nucleotide substitutions between two lineages. Mol. Biol. Evol. 9:1185–1189.
LOGSDON, J. M., M. G. TYSHENKO, C. DIXON, J. D.-JAFARI, V. K. WALKER, and J. D. PALMER. 1995. Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns-late theory. Proc. Natl. Acad. Sci. USA 92:8507–8511.
LONG, M., C. ROSENBERG, and W. GILBERT. 1995. Intron phase correlations and the evolution of the intron/exon structure of genes. Proc. Natl. Acad. Sci. USA 92:12495–12499.
MAGGIA, L., and J. BOUSQUET. 1994. Molecular phylogeny of the actinorhizal Hamamelidae and relationships with host promiscuity towards Frankia. Mol. Ecol. 3:459–467.
MALEK, O., A. BRENNICKE, and V. KNOOP. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc. Natl. Acad. Sci. USA 94:553–558.
MICHEL, F., and B. DUJON. 1983. Conservation of RNA secondary structures in two intron families including mitochondrial-, chloroplast- and nuclear encoded members. EMBO J. 2:33–38.
MICHEL, F., and J.-L. FERAT. 1995. Structure and activities of group II introns. Annu. Rev. Biochem. 64:435–461.
MICHEL, F., K. UMESONO, and H. OZEKI. 1989. Comparative and functional anatomy of group II catalytic introns. Gene 82:5–30.
MUSE, S. V. 1995. Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics 139:1429–1439.
ODA, K., K. YAMATO, E. OHTA et al. (11 co-authors). 1992. Gene organization deduced from the complete sequence of liverwort Marchantia polymorpha mitochondrial DNA. J. Mol. Biol. 223:1–7.
OMLAND, K. E. 1997. Correlated rates of molecular and morphological evolution. Evolution 51:1381–1393.
RABBI, M. F., and K. G. WILSON. 1993.
The mitochondrial cox2 intron has been lost in two different lineages of dicots and altered in others. Am. J. Bot. 80:1216–1223.
RZHETSKY, A. 1995. Estimating substitution rates in ribosomal RNA genes. Genetics 141:771–783.
SAITOU, N., and S. UEDA. 1994. Evolutionary rates of insertion and deletion in non-coding nucleotide sequence of primates. Mol. Biol. Evol. 11:504–512.
SAVARD, L., M. MICHAUD, and J. BOUSQUET. 1993. Genetic diversity and phylogenetic relationships between birches and alders using ITS, 18S rRNA, rbcL gene sequences. Mol. Phylogenet. Evol. 2:112–118.
SCHÖNIGER, M., and A. VON HAESELER. 1994. A stochastic model for the evolution of autocorrelated sequences. Mol. Phylogenet. Evol. 3:240–247.
SCHUSTER, W., and A. BRENNICKE. 1994. The plant mitochondrial genome: physical structure, information content, RNA editing, and gene migration to the nucleus. Annu. Rev. Plant Physiol. Plant Mol. Biol. 45:61–78.
STEWART, W. N., and G. W. ROTHWELL. 1993. Paleobotany and the evolution of plants. 2nd ed. Cambridge University Press, Cambridge, England.
TILLIER, E. R. M., and R. A. COLLINS. 1995. Neighbor joining and maximum likelihood with RNA sequences: addressing the interdependence of sites. Mol. Biol. Evol. 12:7–15.
TURMEL, M., and C. OTIS. 1994. The chloroplast gene cluster containing psbF, psbL, petG and rps3 is conserved in Chlamydomonas. Curr. Genet. 27:54–61.
WOLFE, K. H., M. GOUY, Y.-W. YANG, P. M. SHARP, and W.-H. LI. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. USA 84:9054–9058.
WOLFE, K. H., M. GOUY, Y.-W. YANG, P. M. SHARP, and W.-H. LI. 1989. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86:6201–6205.

BARBARA A. SCHAAL, reviewing editor
Accepted December 8, 1998