DTD 5 Review ARTICLE IN PRESS TIGS 280 TRENDS in Genetics Vol.xx No.xx Monthxxxx

The natural history of group I introns
Peik Haugen, Dawn M. Simon and Debashish Bhattacharya
Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa, 312 Biology Building, Iowa City, IA 52242-1324, USA

There are four major classes of introns: self-splicing group I and group II introns, tRNA and/or archaeal introns and spliceosomal introns in nuclear pre-mRNA. Group I introns are widely distributed in protists, bacteria and bacteriophages. Group II introns are found in fungal and land plant mitochondria, algal plastids, bacteria and Archaea. Group II and spliceosomal introns share a common splicing pathway and might be related to each other. The tRNA and/or archaeal introns are found in the nuclear tRNA of eukaryotes and in archaeal tRNA, rRNA and mRNA.

The mechanisms underlying the self-splicing and mobility of a few model group I introns are well understood. By contrast, the role of these highly distinct processes in the evolution of the 1500 group I introns found thus far in nature (e.g. in algae and fungi) has only recently been clarified. The explosion of new sequence data has facilitated the use of comparative methods to understand group I intron evolution in a broader context and to generate hypotheses about intron insertion, splicing and spread that can be tested experimentally.

Introduction
Mobile genetic elements have had a major impact on eukaryotic genomes [1]. Group I introns are small RNAs (they range in size from 250–500 nt) and comprise one of the four distinct types of introns (Box 1). These elements are found in a wide variety of organisms (e.g. in fungi, algae and in many other unicellular eukaryotes), genes (i.e. protein, rRNA and tRNA coding genes) and genomes throughout the tree of life. Group I introns spread efficiently at the DNA level into intronless cognate sites by a process termed homing (see Glossary).
The success of group I introns in genomes is also a result of their ability to self-splice from RNA transcripts, which potentially renders them neutral to the host. The structure, folding and autocatalysis of a few model group I introns have been rigorously studied, and we now have a detailed understanding of how these elements act as catalytic RNAs (ribozymes) during splicing of precursor RNA [2–4] (Box 2). The mechanism of group I intron spread in DNA has also been studied in detail. Homing endonuclease genes (HEGs) that invade non-critical regions (i.e. terminal loops) of group I introns promote intron mobility by encoding highly site-specific homing endonucleases (HEs). HEGs are themselves selfish genetic elements and the vast majority of group I introns in nature do not contain a HEG (see the following section for intron life cycle). HEs and putative HEs (encoded by HEG-like sequences) are divided into four families based on the presence of conserved amino acid motifs: these are the His-Cys box, LAGLIDADG, GIY-YIG and HNH families. Differences in the endonuclease sequences suggest independent origins of the four families of HE and putative HE proteins [5].

Despite extensive knowledge about group I intron biochemistry, understanding the complex and often unpredictable evolutionary history of group I introns and their associated HEGs has proven to be a formidable challenge. The explosive growth of DNA sequence data in GenBank (http://www.ncbi.nlm.nih.gov/GenBank/GenBankstats.html) has, however, turned the tide, enabling a more detailed picture to be drawn of intron distribution and spread through populations and species. In this article, we present an overview of recent advances in the understanding of group I intron distribution and evolution and describe the driving forces that bring introns to their genomic destinations.

Corresponding author: Bhattacharya, D. ([email protected]).
www.sciencedirect.com 0168-9525/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2004.12.007

Box 1. Types of introns
Four major classes of introns are recognized based on their splicing mechanism. These are the autocatalytic group I and group II introns, spliceosomal introns, and tRNA and/or archaeal introns. Group I introns are widespread in protist nuclear rDNA genes, fungal mitochondria, bacteria and bacteriophages. These pre-RNA insertions self-splice by a well-characterized and distinctive two-step pathway that relies on an external guanosine nucleotide as the cofactor. Group II introns are found in bacterial and organellar genomes and self-splice through a pathway that is different from that of group I introns but is similar to the mechanism of spliceosomal intron removal. In these introns, rather than using an external guanosine, the 2′-OH of an adenine residue within the intron is the nucleophile. The shared splicing mechanism suggests that the small nuclear (sn)RNA components of the spliceosome are derived from a group II intron. Spliceosomal introns are the most common insertions found in nuclear pre-mRNA genes. The tRNA introns are found in eukaryotic nuclei and in Archaea and are enzymatically removed by a cut-and-rejoin mechanism that requires ATP and an endonuclease. This pathway is completely different from that of spliceosomal introns.

Glossary
Homing: the process by which an intron (or intein) spreads into the homologous position in an allele that lacks the intron.
Homing endonuclease: a highly specific endonuclease protein that is typically encoded by a self-splicing intron and that promotes intron mobility through homing.
Introns: intervening sequences in genes that are removed from precursor RNA in a process termed splicing.
Monophyletic: describes a group of organisms (or molecular sequences) that includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor. Monophyletic groups are also called clades.
Phylogeny: the evolutionary relationships among organisms or genes.
Reverse splicing: the reversal of the forward self-splicing reaction in which a free intron integrates into another RNA molecule. The term is also sometimes used to describe the process by which a group I intron becomes stably integrated into the genome following the reverse splicing reaction and subsequent reverse transcription and reintegration into the genome.
Ribozymes: catalytic RNAs (RNA enzymes).

Box 2. Group I intron secondary structure and splicing
The typical secondary structure of a group I intron consists of approximately ten paired elements (P1–P10, plus the optional P11–P17; Figure Ia) that are organized into three domains at the tertiary structure level [4]. The intron recognizes the 5′ exon sequence by a 4–6 nt base pairing [the internal guide sequence (IGS) interaction]. Approximately 100 nucleotides constitute the central catalytic core of the intron RNA (the shaded regions in Figure Ia). Group I introns are removed from precursor RNAs through a two-step splicing reaction (Figure Ib). Many introns can self-splice in vitro as naked RNAs (i.e. no proteins are required for intron splicing), whereas others need protein factors to facilitate correct folding of the ribozyme core. A prerequisite for splicing is the binding of an exogenous guanosine (exoG) cofactor to a pocket in the catalytic core of the intron, called the G-binding site. During the first step of splicing, the cofactor attacks the 5′ splice site (SS) and attaches to the intron, resulting in the release of the upstream exon. The exoG leaves the G-binding site and is replaced by the last nucleotide of the intron, which is always a G (called ωG). The second step is initiated by an attack by the 3′ end of the released exon on the 3′ SS, which results in ligation of the exons and release of the intron RNA. Successful catalysis is dependent on the correct folding of the intron.

Figure I. Secondary structure and self-splicing of group I introns. (a) Secondary structure of a group IC1 intron (paired elements P1–P10 with peripheral extensions, the IGS, the G-binding site, and the 5′ SS and 3′ SS are labeled). (b) The two-step splicing reaction (first step, second step; products: ligated exons and free intron).

Group I intron distribution in nature
The distribution of group I introns on the tree of life is shown in Figure 1 (derived from http://www.rna.icmb.utexas.edu/ [6]). We accounted for intron redundancy by using a single entry to represent sets of sequences from taxa with identical names that represent population samples (e.g. 59 introns at the same site in different isolates of the green alga Closterium ehrenbergii) or different cultivated strains. As of June 8 2004, a total of ~1400 group I introns were reported in eukaryotic genomes. Of these, 800 are in the nucleus at 47 different sites in small subunit (SSU) ribosomal (r)RNA and 44 sites in large subunit (LSU) rRNA, 220 are in mitochondrial genes and 370 are in plastid DNA. Nuclear group I introns are limited to rDNA genes, whereas in the organelles, they are found in rRNA, tRNA and protein coding regions. Group I introns have a sporadic and highly biased distribution in nature and many microbial eukaryotes are rich in these sequences (Figure 1). However, group I introns appear to be absent in animals, except for three cases. Two are found in the mitochondrial genome of the sea anemone Metridium senile and one is in the mitochondrial gene cytochrome c oxidase subunit I of the coral Acropora tenuis.
Group I introns have not yet been found in Archaea. Lineages in the tree of life that are particularly intron-rich are fungi, plants and the red and green algae. These taxa contain ~90% of all group I introns that have been identified to date. However, DNA sequencing efforts (in particular for organelle genomes) have been biased towards particular groups of genes and organisms [e.g. animals and plants (of the 623 sequenced, or partially sequenced, mitochondrial genomes, 555 are from animals; http://www.ncbi.nlm.nih.gov/genomes/static/euk_o.html)], and it is possible that larger pools of introns remain to be discovered (e.g. in excavates, a group of potentially anciently diverged flagellates that bear a unique cluster of flagella; http://microscope.mbl.edu/) as the taxonomic breadth of sequencing programs increases.

In contrast to the situation in eukaryotes, group I introns are rare in bacteria. Only fifty introns representing nine different insertion sites occur in this prokaryotic lineage (i.e. four in tRNA genes, four in rRNA genes and one in the recA gene [6–8]). Group I introns also occur infrequently in viruses [9] and phages [10,11] and are not shown in Figure 1.

Group I intron mobility
The evolutionary fates of group I introns
The sporadic distribution of group I introns suggests that many of these mobile elements have undergone horizontal transfer (through homing or reverse splicing) into different species and genes.
The potential evolutionary fates of laterally transferred group I introns are loss from the lineage, vertical inheritance and/or movement within the lineage with partial loss or fixation (i.e. stable inheritance with no losses and no apparent mobility within the lineage). An additional level of complexity is that many group I introns appear to be mosaic genetic elements. Unlike their counterparts in group II introns, HEGs in group I introns are less likely to share the same evolutionary history as the catalytic RNA [12]. In fact, a group I intron-encoded HEG from one intron (e.g. intron one) can invade another group I intron at the homologous (same) or heterologous (other) position, independent of the group I ribozyme encoded by intron one (see the following section).

Figure 1. The distribution of group I introns on the tree of life. The presence of introns in nuclear (N), plastid (P), mitochondrial (M) or bacterial (B) genomes is indicated by arrowheads and orange text and the relative number of introns is represented by the letter heights. The five different letter sizes indicate the presence of 1–25, 26–50, 51–100, 101–250 or ~515 group I introns. Most (90%) introns are found in land plants plus chlorophyte (green) and rhodophyte (red) algae and the fungi (the presence of many introns is indicated in larger text). Streptophytes (marked with an asterisk) represent both charophyte algae and land plants. There are no reported group I introns in the nuclear DNA of land plants. The tree is a consensus of molecular and ultrastructural data (for more details, see Ref. [43]) and the broken lines represent the possible positions of the root (the prokaryotic lineages) on the consensus phylogeny of eukaryotes. Adapted with permission from Ref. [43].

The intron life-cycle and homing
In 1999, Goddard and Burt published a model of intron evolution that involved cyclical gain and loss (Figure 2 [13]). Analysis of the yeast omega group I ribozyme showed that a group I intron with a full-length HEG invades an intron-minus population and spreads into all homologous (same) sites through homing (see Ref. [5] for mechanistic details of homing). Once the intron becomes fixed in the population, the HEG no longer has a biological function (i.e. its function is to confer intron mobility), accumulates mutations and is eventually inactivated or lost. Unable to spread, the intron is also destined to be lost over time and will not reappear until the same intron with the full-length HEG is reintroduced through gene flow. To evaluate whether the ‘omega’ life-cycle applies to other HEG-associated group I introns, it is necessary to examine many different introns from natural populations. If multiple populations of an organism that carries a HEG-containing intron are sampled, then one would expect to find introns with full-length HEGs, introns with degenerate HEGs and introns lacking HEGs, in addition to examples of intron absence.
This pattern fits well with what has been reported previously [6,11,14–20], although not every step in the intron cycle is found in each case (probably as a result of insufficient sampling). An extensive study of SSU rDNA (S)516 and S1506 (the numbering reflects the homologous position in Escherichia coli rDNA) group I introns from geographically distinct populations of red algae identified individuals lacking introns, those containing introns without a HEG and those containing introns with various deletions and/or mutations in the HEG [17]. In summary, the Goddard and Burt intron life-cycle model, which was established with the yeast omega intron, is supported by the intron and HEG distribution in red algae and is consistent with the intron distribution in other naturally occurring populations. This suggests that recurrent gain and loss is likely to be a general feature of HEG-associated group I introns. Introns do, however, manage to ‘escape’ from the cycle by inserting into a new genic position (discussed in the following section).

Figure 2. Model for the recurrent gain and loss of the omega group I intron. This model was proposed by Goddard and Burt [13] and is consistent with other analyses of HEG-associated group I introns. In this scheme, a mobile intron with a functional HEG invades an intron-less population and becomes fixed in all homologous (same) insertion sites through homing. Because the HEG can no longer promote intron spread (all sites are occupied), the HEG becomes redundant and is eventually lost. The intron life-cycle is completed by the precise loss of the intron. (Stages shown in the diagram: no intron; intron invasion and fixation through homing; functional mobile intron; HEG degeneration or loss of HEG function; intron with nonfunctional HEG; precise intron loss.)
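The gain-and-loss cycle described above can be sketched as a four-state loop. The following Python sketch is an illustration only: the state names follow the categories listed in the text, but the strict ordering of the states and the relative dwell times are hypothetical assumptions, not values from Goddard and Burt [13]. It shows why a survey of natural populations is expected to recover all four states, and why a short-lived state is easily missed when sampling is sparse.

```python
from enum import Enum


class SiteState(Enum):
    """States of one insertion site, named after the categories in the text."""
    NO_INTRON = "no intron"
    FULL_HEG = "intron with full-length HEG"
    DEGENERATE_HEG = "intron with degenerate HEG"
    NO_HEG = "intron lacking a HEG"


# Assumed strict ordering of the cycle (a simplification: the text allows the
# HEG to be inactivated or lost, so real trajectories may skip a state).
CYCLE = [SiteState.NO_INTRON, SiteState.FULL_HEG,
         SiteState.DEGENERATE_HEG, SiteState.NO_HEG]


def next_state(state):
    """Advance one step around the cycle, wrapping back to NO_INTRON."""
    return CYCLE[(CYCLE.index(state) + 1) % len(CYCLE)]


def expected_fractions(dwell_times):
    """Fraction of populations expected in each state at a random time point,
    assuming each state lasts for its (relative) dwell time."""
    total = sum(dwell_times.values())
    return {state: t / total for state, t in dwell_times.items()}


# Hypothetical relative dwell times, chosen only for illustration.
dwell = {SiteState.NO_INTRON: 5, SiteState.FULL_HEG: 1,
         SiteState.DEGENERATE_HEG: 2, SiteState.NO_HEG: 2}

for state, frac in expected_fractions(dwell).items():
    print(f"{state.value}: {frac:.0%}")
```

With these made-up dwell times, only one sampled population in ten would be expected to carry a full-length HEG, which is in line with the observation that intact HEGs are rarely caught in the act.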
Introns in the same insertion site are evolutionarily related
Reconstruction of group I intron phylogeny is made possible by the existence of the conserved RNA secondary structures (P1–P9) that can be used to guide the alignment of these sequences. The alignments are used as input to computer programs (e.g. PAUP*, PHYLIP, MrBayes and MEGA) to infer evolutionary trees. The core region used in these analyses is short (100–250 nt) but, as shown in Figure 3, it still contains sufficient phylogenetic signal to resolve many nodes in the intron trees. An important insight into group I intron evolution that has come from phylogenetic analyses is that sequences from homologous genic sites are closely related (monophyletic) even though they can reside in distantly related organisms (e.g. eukaryotes and prokaryotes) [21]. It is unlikely that this result is explained by a single origin of each intron before the eukaryotic radiation (or the split of bacteria and eukaryotes) followed by widespread loss in different lineages. Introns are highly divergent sequences and it is therefore unlikely that they remain conserved across the millions or billions of years that separate taxa such as prokaryotes and eukaryotes. A more parsimonious scenario is that group I introns are laterally transferred across the tree of life, resulting in their sporadic and highly biased distribution. If this second scenario is correct, then the observation that intron movement nearly always occurs to the homologous site in different organisms suggests that strong restrictions exist to guide introns to their target (i.e. high sequence specificity in homing).

Intron movement through reverse splicing
Group I introns do, however, occasionally escape to heterologous genic sites, but the mechanism(s) responsible for these invasions have proven elusive. Because of the limited sequence requirement (4–6 nt) for group I ribozymes to locate their target region, reverse splicing of free intron RNAs into target RNAs provides a more plausible model for intron spread into ectopic sites than homing, which relies on a 15–45 nt recognition sequence for the intron-encoded homing endonuclease. Whether this pathway has an important role in group I intron spread, however, is unknown because of the lack of any direct evidence of reverse-splicing-mediated intron movement from genetic crosses. Furthermore, in contrast to homing, which has been shown to be highly efficient, it is likely that reverse splicing, with its reliance on chance integration followed by two additional steps (i.e. reverse transcription and then recombination), would be less effective in promoting intron movement. Rare reverse splicing events are therefore most likely to be recognized in the context of a broadly sampled intron phylogeny. A final consideration is that rDNA exists as a multi-copy gene family, which means that alleles containing transferred introns must rise to high frequency (presumably through concerted evolution or, less parsimoniously, repeated reverse splicing events) in individuals and in populations to ensure survival and fixation.

In spite of these constraints, conclusive experimental evidence exists that group I introns can fulfill one crucial step in the reverse splicing mobility pathway: full integration into foreign RNA in vivo [22,23]. In one analysis, 19 variants of the Tetrahymena intron, all with different internal guide sequences, were expressed in E. coli [23]. Partial reverse splicing at 69 sites and complete integration at one novel site in the 23S rRNA were found with this approach. Phylogenetic analyses support another prediction of the reverse splicing pathway: a statistically supported close relationship between introns at ectopic rDNA sites (e.g. S1046–S1052, S289–S1199 [18]).
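The contrast between a 4–6 nt and a 15–45 nt recognition requirement can be made concrete with a back-of-the-envelope calculation. The sketch below is a rough illustration under simplifying assumptions that are not from the original article: exact matching, a uniform base composition and a hypothetical 3000 nt target. Real homing endonucleases tolerate some mismatches and RNA structure limits target accessibility, so the numbers indicate orders of magnitude only.

```python
def expected_chance_sites(k, target_length):
    """Expected number of exact matches to one specific k-nt sequence in a
    random target of the given length (uniform base composition assumed)."""
    windows = max(target_length - k + 1, 0)  # number of k-nt windows
    return windows / 4 ** k


TARGET_LENGTH = 3000  # hypothetical target length in nt, for illustration only

# 4 and 6 nt bracket the IGS requirement; 15 and 45 nt bracket the
# homing endonuclease recognition sequence quoted in the text.
for k in (4, 6, 15, 45):
    print(k, expected_chance_sites(k, TARGET_LENGTH))
```

Under these assumptions a 4 nt guide sequence is expected to find several chance matches in such a target, whereas a 15 nt site is expected essentially never, which is consistent with reverse splicing being the more plausible route to ectopic sites.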
In these cases, sequence similarity is limited to the proximal 5′ flanking exon in each pair of related introns. This finding is consistent with the 4–6 nt sequence requirement in the 5′ exon that is predicted for the RNA-based reverse splicing and integration pathway. In addition, HEGs have not been found within the introns at these insertion sites, making it less likely that endonucleases mediated the invasions of new sites at the DNA level.

Figure 3. The phylogeny of nuclear rDNA group I introns. A summary of the phylogenetic relationships (a) between introns in algae and fungi, and (b) between introns in ascomycete fungi is shown. In both trees, introns at homologous sites are generally more closely related to each other than to introns at ectopic sites. Statistical support for clades using different phylogenetic methods is shown above and below the branches (bootstrap values ≥60%) or as thick lines (Bayesian posterior probabilities ≥0.95). Both trees were built using the neighbor-joining method with evolutionary distances calculated according to (a) the Kimura two-parameter or (b) the Jukes-Cantor model. For more details about the phylogenetic methods used in (a), see Ref. [21] and for those used in (b), see Ref. [37]. Branch lengths are proportional to the number of substitutions per site and the size of the triangles represents the number of sequences in each clade. (Trees not reproduced; clades are labeled by insertion site, e.g. S516, S788, S1199, S1506, L1923 and L1926, and by intron subgroup, IC1 and IE; scale bars: 0.1 and 0.2 substitutions per site.)
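The trees in Figure 3 rest on pairwise distances computed under the Jukes-Cantor and Kimura two-parameter models. As a minimal sketch of what those corrections do, the following Python code applies the standard textbook formulas to a pair of aligned sequences. The function names and the toy sequences are hypothetical, and the code is not the implementation used in Ref. [21] or Ref. [37].

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (transition proportion P, transversion proportion Q) for two
    aligned sequences, skipping columns with gaps or ambiguous bases."""
    transitions = transversions = compared = 0
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jukes_cantor(p_total):
    """Jukes-Cantor distance from the total proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * p_total / 3)


def kimura_2p(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q)
    proportions."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# Toy aligned RNA fragments (hypothetical): one A->G transition in 10 sites.
P, Q = count_differences("ACGUACGUAC", "GCGUACGUAC")
print(jukes_cantor(P + Q), kimura_2p(P, Q))
```

Both corrections return a distance slightly larger than the observed 10% difference because they account for multiple substitutions at the same site; the two models agree when transversions are exactly twice as frequent as transitions.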
Although we still lack conclusive proof of reverse splicing-mediated intron mobility, accumulating experimental and comparative data favor this mechanism having contributed to the spread of group I introns.

Intron-independent HEG mobility
It has been known for some time that sequences encoding HEs insert into the peripheral regions of ribozymes and mobilize them through homing. We now have a more detailed picture of the intron-HEG relationship and the evolutionary history of the two partners turns out to be more complicated than previously thought. The most direct approach to identify intron-independent HEG movement is through the comparison of intron and HEG trees. Unfortunately, this approach is stymied by the rare occurrence of intact HEGs in nature (i.e. they decay rapidly following intron fixation [20]). HEG mobility can, however, leave clear evolutionary footprints (Figure 4). For example, HEGs that are closely related to the His-Cys box family are inserted into different peripheral loops or encoded on different strands of homologous group I introns (e.g. introns inserted after positions 516, 943 and 1506 in the gene that encodes the nuclear SSU rRNA [20]). Movement to different peripheral loop regions within the intron provides evidence for HEG mobility within a single ribozyme lineage. Furthermore, closely related HEGs are found in phylogenetically distantly related group I introns that are located at adjacent rDNA sites (sites L1923, L1925 and L1926), suggesting that local HEG mobility into heterologous introns occurs [20]. A study of rDNA introns that encode HEGs of the LAGLIDADG family also identified related HEGs in adjacent introns, in some cases resulting from HEG mobility alone and in other cases from the mobility of the intron–HEG element as a unit [24].

Figure 4. A model for the movement of homing endonuclease genes. Homing endonuclease genes (HEGs) have the potential to move with the group I intron as (a) a unit by homing or (b) independent of the ribozyme. Homing results in the transfer of the intron-HEG element plus flanking exon sequences (co-conversion) into the homologous genic position. The HE recognition sequence is shown in red. Ribozyme-independent HEG transfer is likely to result in the insertion of the HEG into homologous HEG-minus introns (both shown in orange), or heterologous introns that are located at an adjacent site (shown in orange and blue) with respect to the donor intron. HEGs can insert into different strands of the intron encoding sequence (indicated by arrows above and below the intron), and into different peripheral regions of the ribozyme structure (indicated by the different locations of the HEG on the schematic representation of the intron). (Panel labels in the diagram: (a) intron homing between allele "A" and allele "a" through double-strand break repair (DSBR), after which both alleles are protected against endonuclease (ENase) activity; (b) HEG invades homologous intron or intron in adjacent site, through a double-strand break (DSB) in the intron and illegitimate recombination.)

Did a gene duplication event trigger the spread of LAGLIDADG HEGs?
Homing endonucleases that contain the conserved LAGLIDADG motif represent the largest family of HEs and are widespread in nature, occurring in several different types of mobile elements (i.e. group I and II introns, inteins and archaeal introns) and as freestanding open reading frames (ORFs) [25].
Comparative studies suggest that the successful spread of LAGLIDADG HEGs was driven by a gene duplication and fusion event followed by the rapid evolution of the DNA-binding domain [24,26,27]. Approximately 40 group I intron-associated LAGLIDADG HEGs contain one copy of the LAGLIDADG motif, whereas the remainder contain two copies (>200 cases in total [5,24]). Interestingly, the single-motif HEGs are restricted to introns located at six insertion sites in LSU rDNA; five of these six sites are clustered between L1917 and L1951. By contrast, the double-motif HEGs are more widely distributed in rDNA introns and in introns in other types of genes. The difference in the success of these HEGs in spreading into novel introns probably reflects how they recognize target sequences. The single-motif HEs function as homodimers and, therefore, need a high degree of symmetry in the DNA target sequence, whereas the double-motif HEs function as monomers. The N- and C-terminal halves of the monomeric HEs (i.e. each half corresponds to one protein in the dimers of single-motif HEs) have accumulated many amino acid substitutions and might have a less stringent requirement for DNA target symmetry [26]. Most double-motif LAGLIDADG HEGs in rDNA can be traced to one of two lineages [24]. One of these (designated ‘Clade 1’ in Ref. [24]) appears to be derived from a duplication and fusion event involving a single-motif HEG in an intron located between the rDNA positions L1917–L1951, whereas the origin of the other group of double-motif HEGs (‘Clade 2’) is unclear. In summary, evolution has found an effective solution to target site restriction in LAGLIDADG HEGs. Duplication and divergence of the ancestral single-motif HEGs to form a two-motif unit appears to have freed them to explore a larger set of host introns and genic sites.

The role of proteins in group I intron splicing
HEGs can avoid loss by gaining maturase activity
A fascinating aspect of the group I intron-HEG relationship is that some endonucleases also function as maturases (i.e. they assist in intron folding [28]). The existence of these bifunctional proteins suggests that the gain of maturase activity has provided the HEs and/or their associated introns with an evolutionary advantage. So what advantage could maturase activity offer? One hypothesis is that the gain of HEG DNA (typically ~1 kb in size) by the group I intron might lower the efficiency of self-splicing. The maturase activity conferred by the HEG could therefore be selected for because it rescues or sufficiently improves splicing ability to compensate for the presence of the ORF [28]. Another intriguing possibility is that maturase activity evolves (or is retained) because it gives the HEG an opportunity to escape inactivation or loss [29] (i.e. endonuclease function will be lost if the intron is fixed in the population, whereas maturase activity might be retained to ensure correct intron excision). This scenario appears to have played out recently in Saccharomyces cerevisiae, in which two non-adjacent amino acid changes activate endonuclease activity in an intron-encoded LAGLIDADG maturase [30,31]. Other important insights have come from structural and biochemical studies of the I-AniI HE protein [29,32,33]. In this case, the RNA- and DNA-binding domains of the protein are non-overlapping (i.e. they are located at different places in the protein structure). It is therefore likely that the maturase activity evolved as a secondary adaptation. In summary, there appears to be a dynamic relationship between ribozyme, HE and maturase activities in HEG-associated group I introns. The relative evolutionary importance of these processes reflects primarily the potential for intron spread in populations or species and the extent of dependence on the maturase for intron splicing.

The long-term evolution of self-splicing ability
Insights into the evolution of RNA structure and autocatalysis can be gained by comparing group I introns that have been vertically inherited for millions of years. These sequences provide potential examples of intron-host co-evolution and are candidates for introns that might have evolved a function that is beneficial to the host. Two outstanding examples of long-term vertical inheritance are the ancient group I intron inserted in the tRNA-Leu gene of cyanobacteria and plastids [34–36] and the S788 intron in the ascomycete fungi [37]. The tRNA-Leu intron has probably been vertically inherited for more than a billion years in plastids [38] and two-to-three billion years in the cyanobacterial ancestors of these organelles [34], whereas the S788 intron was probably acquired in ascomycetes ~400–600 million years ago [37]. Although there are some obvious differences in the evolutionary history of these two group I introns, there are also striking similarities. Most importantly, when self-splicing activity is mapped on the intron phylogeny (which generally recapitulates the host tree as a result of intron vertical inheritance), in both cases the early diverging ribozymes self-splice efficiently as naked RNAs. By contrast, the later diverging introns can either only complete the first step of splicing [i.e. attack of the 5′ splice site by the exogenous guanosine, which becomes covalently attached to the intron (the 5′ exon is freed)] or lack autocatalytic activity entirely (Figure 5). It is tempting to speculate that the partial or complete loss of activity in both ribozymes is due to their dependence over evolutionary time on particular host proteins to facilitate splicing. Furthermore, the fixation of the tRNA-Leu intron in all plants combined with the loss of self-splicing ability suggests that a reciprocal dependence between the intron and the host might have evolved in the common ancestor of this highly successful lineage. Although unproven, this idea is supported by a study showing that chloroplast ribonucleoproteins (cpRNPs) associate with unspliced tRNA-Leu transcripts in Nicotiana tabacum [39]. The cpRNP–tRNA complexes confer stability and ribonuclease resistance to the RNAs and potentially act as a scaffold for host-mediated splicing of the intron-containing tRNAs [39].

Figure 5. Analysis of the phylogeny and in vitro splicing of two vertically inherited group I introns. Summary phylogenies and the in vitro splicing ability of introns inserted (a) into the tRNA-Leu gene in plastids and in cyanobacteria and (b) after position 788 in the SSU rRNA of ascomycete fungi are shown. Introns that splice efficiently (+), undergo the first step of splicing (1st), or show no activity in vitro (−), are indicated on the tree. The boxed cladogram in (a) shows the host relationships of taxa containing the tRNA-Leu intron. The host and intron trees are in general agreement. Two distinct clades are indicated in the S788 intron, an ancestral pool that contains the P5 extensions (+P5abcd) and a later diverging clade that lacks this domain (−P5abcd). The trees are modified with permission from Ref. [36] and Ref. [37]. (Trees not reproduced.)
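Mapping in vitro splicing activity onto a phylogeny, as in Figure 5, amounts to asking how few changes of state are needed to explain the pattern at the tips. The sketch below uses Fitch parsimony on a toy tree to illustrate that reasoning. This is a swapped-in textbook technique, not necessarily the approach taken in Ref. [36] or Ref. [37], and the tree shape, tip names and tip states are hypothetical.

```python
def fitch(tree, tip_states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns (possible states at this node, minimum number of state changes
    in the subtree below it)."""
    if isinstance(tree, str):          # a tip: its state is observed
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:                         # children agree: no extra change
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


# Toy ladder-shaped tree; tip names and states are hypothetical.
# "+" = splices efficiently in vitro, "-" = no activity in vitro.
toy_tree = ("early_1", ("early_2", ("late_1", ("late_2", "late_3"))))
toy_states = {"early_1": "+", "early_2": "+",
              "late_1": "-", "late_2": "-", "late_3": "-"}

root_states, changes = fitch(toy_tree, toy_states)
print(root_states, changes)
```

For this toy example the root is reconstructed as self-splicing and a single loss of activity explains all of the inactive, later diverging tips, which mirrors the pattern described for both introns.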
The S788 group I intron is also of great interest because derived members of this ribozyme lineage have undergone major structural changes: many nucleotide substitutions in the central catalytic core, some of which are crucial for intron folding, and the loss of the larger peripheral P5abcd domain (Figure 5). Two distinct clades are identified in phylogenetic analyses of the S788 intron: an ancestral pool that contains the P5 extensions (+P5abcd, Figure 5) and a later diverging clade that lacks this domain (−P5abcd) [37]. The loss of P5abcd is correlated with the inability to self-splice in vitro. This example therefore provides a direct link between group I intron structure evolution and autocatalysis in a vertically inherited intron.

Concluding remarks
The detailed mechanisms of intron autocatalysis and mobility have been established through rigorous biochemical analyses. Intron evolution is being clarified through the analysis of intron phylogeny, distribution, secondary structure and splicing in a diverse group of microbial eukaryotes. Bringing together these two disparate pools of information is essential to understand the natural history of group I introns. Recent work has successfully integrated structural and functional insights regarding group I introns into a broader evolutionary context. Examples of important contributions to our understanding of intron biology and evolution are the compilation of introns into a comprehensive database [6], development of the cyclical model for the gain and loss of HEG-associated introns [13], novel insights into the dynamic intron-HEG relationship [24,26,27,29] and the recognition that group I introns can be vertically inherited over long periods of evolutionary time and that their interaction with the host cell can provide important insights into intron biology [34–37,39]. It will be important to investigate the role of host factors in intron splicing across the tree of life {e.g.
mitochondrial tyrosyl-tRNA synthetase (Cyt-18, [31]), leucyl-tRNA synthetase (LeuRS, [40]) and mitochondrial RNA splicing 1 (Mrs1, [41])} and the contribution of these factors to intron fixation and mobility. It is likely that host proteins generally have a role in the splicing of many group I introns; however, their broader impact on intron evolution is currently unknown. Another unanswered question is how group I introns are transferred between species. Viruses have often been suggested as possible vectors but evidence is still lacking. Finally, the true biological role(s) for group I intron RNAs remains to be clarified (for more details, see Ref. [42]).

Acknowledgements
This work was generously supported by grants awarded to D.B. from the National Science Foundation (MCB 0110252, DEB 0107754), a grant from The Norwegian Research Council to P.H., and Avis E. Cone and Stanley fellowships from the University of Iowa to D.S. We thank H. Joseph Runge (Iowa) for helpful discussions.

References
1 Hurst, G.D. and Werren, J.H. (2001) The role of selfish genetic elements in eukaryotic evolution. Nat. Rev. Genet. 2, 597–606
2 Cech, T.R. (2002) Ribozymes, the first 20 years. Biochem. Soc. Trans. 30, 1162–1166
3 Westhof, E. (2002) Group I introns and RNA folding. Biochem. Soc. Trans. 30, 1149–1152
4 Adams, P.L. et al. (2004) Crystal structure of a self-splicing group I intron with both exons. Nature 430, 45–50
5 Chevalier, B.S. and Stoddard, B.L. (2001) Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucleic Acids Res. 29, 3757–3774
6 Cannone, J.J. et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3, 2 doi: 10.1186/1471-2105-3-2 (http://www.biomedcentral.com/1471-2105/3/2)
7 Ko, M. et al. (2002) Group I self-splicing intron in the recA gene of Bacillus anthracis. J. Bacteriol. 184, 3917–3922
8 Nesbø, C.L. and Doolittle, W.F. (2003) Active self-splicing group I introns in 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc. Natl. Acad. Sci. U. S. A. 100, 10806–10811
9 Nishida, K. et al. (1998) Group I introns found in Chlorella viruses: biological implications. Virology 242, 319–326
10 Edgell, D.R. et al. (2000) Barriers to intron promiscuity in bacteria. J. Bacteriol. 182, 5281–5289
11 Sandegren, L. and Sjöberg, B.M. (2004) Distribution, sequence homology, and homing of group I introns among T-even-like bacteriophages: evidence for recent transfer of old introns. J. Biol. Chem. 279, 22218–22227
12 Toor, N. et al. (2001) Coevolution of group II intron RNA structures with their intron-encoded reverse transcriptases. RNA 7, 1142–1152
13 Goddard, M.R. and Burt, A. (1999) Recurrent invasion and extinction of a selfish gene. Proc. Natl. Acad. Sci. U. S. A. 96, 13880–13885
14 Cho, Y. et al. (1998) Explosive invasion of plant mitochondria by a group I intron. Proc. Natl. Acad. Sci. U. S. A. 95, 14244–14249
15 Haugen, P. et al. (1999) Complex group-I introns in nuclear SSU rDNA of red and green algae: evidence of homing-endonuclease pseudogenes in the Bangiophyceae. Curr. Genet. 36, 345–353
16 Foley, S. et al. (2000) Widespread distribution of a group I intron and its three deletion derivatives in the lysin gene of Streptococcus thermophilus bacteriophages. J. Virol. 74, 611–618
17 Müller, K.M. et al. (2001) A structural and phylogenetic analysis of the group IC1 introns in the order Bangiales (Rhodophyta). Mol. Biol. Evol. 18, 1654–1667
18 Bhattacharya, D. et al. (2002) Vertical evolution and intragenic spread of lichen-fungal group I introns. J. Mol. Evol. 55, 74–84
19 Nozaki, H. et al. (2002) Evolution of rbcL group IA introns and intron open reading frames within the colonial Volvocales (Chlorophyceae). Mol. Phylogenet. Evol. 23, 326–338
20 Haugen, P. et al. (2004) The evolution of homing endonuclease genes and group I introns in nuclear rDNA. Mol. Biol. Evol. 21, 129–140
21 Nikoh, N. and Fukatsu, T. (2001) Evolutionary dynamics of multiple group I introns in nuclear ribosomal RNA genes of endoparasitic fungi of the genus Cordyceps. Mol. Biol. Evol. 18, 1631–1642
22 Roman, J. and Woodson, S.A. (1998) Integration of the Tetrahymena group I intron into bacterial rRNA by reverse splicing in vivo. Proc. Natl. Acad. Sci. U. S. A. 95, 2134–2139
23 Roman, J. et al. (1999) Sequence specificity of in vivo reverse splicing of the Tetrahymena group I intron. RNA 5, 1–13
24 Haugen, P. and Bhattacharya, D. (2004) The spread of LAGLIDADG homing endonuclease genes in rDNA. Nucleic Acids Res. 32, 2049–2057
25 Dalgaard, J.Z. et al. (1997) Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family. Nucleic Acids Res. 25, 4626–4638
26 Lucas, P. et al. (2001) Rapid evolution of the DNA-binding site in LAGLIDADG homing endonucleases. Nucleic Acids Res. 29, 960–969
27 Chevalier, B. et al. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J. Mol. Biol. 329, 253–269
28 Belfort, M. (2003) Two for the price of one: a bifunctional intron-encoded DNA endonuclease-RNA maturase. Genes Dev. 17, 2860–2863
29 Chatterjee, P. et al. (2003) Functionally distinct nucleic acid binding sites for a group I intron encoded RNA maturase/DNA homing endonuclease. J. Mol. Biol. 329, 239–251
30 Szczepanek, T. and Lazowska, J. (1996) Replacement of two nonadjacent amino acids in the S. cerevisiae bi2 intron-encoded RNA maturase is sufficient to gain a homing-endonuclease activity. EMBO J. 15, 3758–3767
31 Lambowitz, A.M. et al. (1999) Group I and II ribozymes as RNPs: Clues from the past and guides to the future. In The RNA World (2nd edn) (Gesteland, R.F. et al., eds), pp. 451–485, Cold Spring Harbor Laboratory Press
32 Bolduc, J.M. et al. (2003) Structural and biochemical analyses of DNA and RNA binding by a bifunctional homing endonuclease and group I intron splicing factor. Genes Dev. 17, 2875–2888
33 Geese, W.J. et al. (2003) In vitro analysis of the relationship between endonuclease and maturase activities in the bi-functional group I intron-encoded protein, I-AniI. Eur. J. Biochem. 270, 1543–1554
34 Kuhsel, M.G. et al. (1990) An ancient group-I intron shared by eubacteria and chloroplasts. Science 250, 1570–1573
35 Xu, M.Q. et al. (1990) Bacterial origin of a chloroplast intron: conserved self-splicing group-I introns in cyanobacteria. Science 250, 1566–1570
36 Simon, D. et al. (2003) Phylogeny and self-splicing ability of the plastid tRNA-Leu group I intron. J. Mol. Evol. 57, 710–720
37 Haugen, P. et al. (2004) Long-term evolution of the S788 fungal nuclear small subunit rRNA group I introns. RNA 10, 1084–1096
38 Yoon, H.S. (2004) A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol. 21, 809–818
39 Nakamura, T. et al. (1999) Chloroplast ribonucleoproteins are associated with both mRNAs and intron-containing precursor tRNAs. FEBS Lett. 460, 437–441
40 Rho, S.B. et al. (2002) An inserted region of leucyl-tRNA synthetase plays a critical role in group I intron splicing. EMBO J. 21, 6874–6881
41 Bassi, G.S. et al. (2002) Recruitment of intron-encoded and co-opted proteins in splicing of the bI3 group I intron RNA. Proc. Natl. Acad. Sci. U. S. A. 99, 128–133
42 Nielsen, H. et al. (2003) The ability to form full-length intron RNA circles is a general property of nuclear group I introns. RNA 9, 1464–1475
43 Baldauf, S.L. (2003) The deep roots of eukaryotes. Science 300, 1703–1706