Evolutionary Dynamics of Ty1-copia Group Retrotransposons in Grass Shown by Reverse Transcriptase Domain Analysis

Yoshihiro Matsuoka and Koichiro Tsunewaki

Department of Bioscience, Fukui Prefectural University, Fukui, Japan

The evolutionary dynamics of Ty1-copia group retrotransposons in grass were examined by reverse transcriptase (RT) domain analysis. Twenty-three rice RT sequences were newly determined for this report. Phylogenetic analysis of 177 RT sequences, mostly derived from wheat, rice, and maize, showed four distinct families, which were designated G1, G2, G3, and G4. Three of these families have elements obtained from distantly related species, indicative of origins prior to the radiation of grass species. Results of Southern hybridization and detailed comparisons between the wheat and rice sequences indicated that each of the families had undergone a distinct pattern of evolution. Multiple families appear to have evolved in parallel in a host species. Analyses of synonymous and nonsynonymous substitutions suggested that there is a low percentage of elements carrying functional RT domains in the G4 family, indicating that the production of new G4 elements has been controlled by a small number of elements carrying functional RT domains.

Introduction

Ty1-copia group retrotransposons are one of the best characterized groups of plant retrotransposons. They have sequences called long terminal repeats (LTRs) at both ends and carry regions encoding gag, proteinase, endonuclease, reverse transcriptase (RT), and RNase H between the LTRs in the order typically observed for Ty1 of Saccharomyces and copia of Drosophila (Grandbastien 1992). They replicate by reverse transcription of RNA intermediates transcribed from the parental copies. Unlike DNA-DNA type transposons, their life cycle lacks excision from the genome (see Eickbush 1994 for review). Once inserted, they usually remain in the locus.
Ty1-copia group retrotransposons have been reported for a wide range of plant taxa, including angiosperms, gymnosperms, ferns, lycopods, and bryophytes, evidence that they are ubiquitous in the plant kingdom (Flavell et al. 1992; Voytas et al. 1992; Hirochika and Hirochika 1993). The majority of plant Ty1-copia group retrotransposons, like other plant retroelements, are thought to be rarely active (see Bennetzen 1996 for review). Their expression and transposition, however, are inducible by such stresses as protoplast isolation and tissue culture, as well as by pathogen-related stresses (Grandbastien, Spielmann, and Caboche 1989; Hirochika 1993; Pouteau, Grandbastien, and Boccara 1994; Moreau-Mhiri et al. 1996; Mhiri et al. 1997). Detection of their transcripts under ordinary growth conditions has also been reported (Suoniemi, Narvanto, and Schulman 1996; Pearce et al. 1997). The cis-regulating sequences in the LTRs for transcription have been identified (Casacuberta and Grandbastien 1993; Hirochika et al. 1996a; Suoniemi, Narvanto, and Schulman 1996). The extrachromosomal circular DNA intermediates formed during retrotransposition have been analyzed in Tto1 from tobacco (Hirochika and Otsuki 1995). Culture-inducible activation of an element in rice called Tos17 has been proposed for insertional mutagenesis and gene function analysis (Hirochika et al. 1996b). Compared with the progress made in understanding the functional aspects of plant Ty1-copia group retrotransposons, relatively little is known about their evolution. Their ubiquity in the plant kingdom suggests that they are of very ancient origin. Some grass elements from the tribe Triticeae make up at least several percent of the genome (Suoniemi et al. 1996; Turcich et al. 1996; Pearce et al. 1997), indicative of active amplifications during the host’s evolution. This abundance suggests that in some manner they may have affected the determination of the host genome structure. 
Analyses of large numbers of RT sequences have shown the family structures of the Ty1-copia group retrotransposons in cotton (VanderWiel, Voytas, and Wendel 1993) and wheat (Matsuoka and Tsunewaki 1996). In wheat, further characterizations suggested (1) that there are two age groups in the wheat Ty1-copia group retrotransposon families, one established before and one established after radiation of the subfamily Pooideae (Matsuoka and Tsunewaki 1997a), and (2) that an ancient family whose origin might date to before the divergence of wheat and rice (60 MYA) was active during the divergence of the Triticum and Aegilops species that occurred some millions of years ago (Matsuoka and Tsunewaki 1997b). To clarify the evolutionary relationships between wheat and other grass Ty1-copia group retrotransposons, RT sequences obtained from several grass species were analyzed. Here, we report the identification of four distinct families, the distribution of these families among the grass species, and the characteristics of the nucleotide substitution patterns. The evolutionary dynamics of the Ty1-copia group retrotransposons in grass and possible amplification mechanisms are discussed on the basis of our findings.

Key words: Ty1-copia group retrotransposon, reverse transcriptase, grass, evolution.

Address for correspondence and reprints: Yoshihiro Matsuoka, Fukui Prefectural University, Matsuoka-cho, Yoshida-gun, Fukui 910–1142, Japan. E-mail: [email protected].

Mol. Biol. Evol. 16(2):208–217. 1999 © 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Materials and Methods

Plant Materials

Twelve grass species representing four of the five subfamilies of the grass family (Poaceae) were used (table 1). The Poaceae is considered monophyletic (Katayama and Ogihara 1996).

Table 1
Poaceae Species Studied

Subfamily       Tribe           Species                             Common Name
Bambusoideae    Oryzeae         Oryza sativa L.                     Rice
Bambusoideae    Oryzeae         Oryza glaberrima Steud.^a           African rice
Bambusoideae    Oryzeae         Oryza australiensis Domin^b         Australian wild rice
Panicoideae     Paniceae        Setaria italica (L.) P. Beauv.      Foxtail millet
Panicoideae     Paniceae        Echinochloa utilis Ohwi et Yabuno   Japanese barnyard millet
Panicoideae     Andropogoneae   Zea mays L.                         Maize
Panicoideae     Andropogoneae   Sorghum bicolor Moench              Sorghum
Chloridoideae   Chlorideae      Eleusine coracana Gaertn.           Finger millet
Pooideae        Aveneae         Avena sativa L.                     Oats
Pooideae        Triticeae       Hordeum vulgare L.                  Barley
Pooideae        Triticeae       Secale cereale L.                   Rye
Pooideae        Triticeae       Triticum aestivum L.                Common wheat

^a Accession number W0025; supplied by the National Institute of Genetics, Mishima, Japan.
^b Accession number W1538; supplied by the National Institute of Genetics, Mishima, Japan.

DNA Isolation, Cloning, and Sequencing

Total DNA was extracted from the leaves of greenhouse-grown plants by the CTAB method (Murray and Thompson 1980). The PCR method of Hirochika, Fukuchi, and Kikuchi (1992) was used to amplify ~270-bp fragments of the RT domain with total DNA as the template. Degenerate primers 5′-CA(A/G)ATGGA(C/T)GT(A/C/G/T)AA(A/G)AC-3′ and 5′-CAT(A/G)TC(A/G)TC(A/C/G/T)AC(A/G)TA-3′, which respectively correspond to the conserved RT peptide motifs (QMDVKT and YVDDM) of the Ty1-copia group retrotransposons, were used. PCR, cloning, and sequencing of the products were done as described previously (Matsuoka and Tsunewaki 1996).

Southern Hybridization

Southern hybridization using a four-base-recognizing restriction endonuclease in combination with a short probe was performed to detect restriction site polymorphism within and near the RT domain. Total DNAs (2–12 μg) were digested with HaeIII or TaqI as specified by the manufacturer (Takara Shuzo). Although large amounts of DNA were used for large-genome-size species like wheat, the number of nuclei represented in samples may vary by species. The digested samples were separated in a 1.5% agarose gel and were then transferred to a Hybond N+ membrane (Amersham). Probes were prepared from the rice RT domain clones by PCR using the M13 primers.
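The stated correspondence between the two degenerate primers and the conserved peptide motifs can be checked mechanically. The following Python sketch expands each bracketed position, translates every forward-primer variant, and translates the reverse complement of every reverse-primer variant. The single-letter codes R, Y, and N, and all function names, are illustrative choices rather than part of the original protocol; the standard genetic code is assumed.

```python
from itertools import product

# Single-letter shorthand for the bracketed positions in the primers
# (an illustrative notation): R = A/G, Y = C/T, N = A/C/G/T.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "R": "AG", "Y": "CT", "N": "ACGT"}

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(DEGENERATE[b] for b in primer))]


def reverse_complement(seq):
    """Reverse-complement a concrete (A/C/G/T only) sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# 5'-CA(A/G)ATGGA(C/T)GT(A/C/G/T)AA(A/G)AC-3'
FORWARD = "CARATGGAYGTNAARAC"
# 5'-CAT(A/G)TC(A/G)TC(A/C/G/T)AC(A/G)TA-3'
REVERSE = "CATRTCRTCNACRTA"

forward_peptides = {translate(v) for v in expand(FORWARD)}
reverse_peptides = {translate(reverse_complement(v)) for v in expand(REVERSE)}
```

Under these assumptions each primer expands to 32 concrete sequences. The forward primer covers the five complete codons of QMDVK plus the first two bases (AC) of the final Thr codon, so `forward_peptides` contains only `"QMDVK"`, and `reverse_peptides` contains only `"YVDDM"`.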
Probe labeling and hybridization were done with a DIG labeling and detection kit (Boehringer Mannheim). The membranes were washed under low-stringency conditions: twice with 2× SSC plus 0.1% SDS at room temperature for 5 min, then twice with 0.1× SSC plus 0.1% SDS at 58°C for 15 min. The hybridized probes were made visible on X-ray film (Fuji Film) by a chemiluminescent reaction with Lumi-Phos 530.

Computer Analysis

Initial sequence manipulation and multiple-sequence alignment were done with DNASIS software (Hitachi Software Engineering Co.) and CLUSTAL W, version 1.7 (Thompson, Higgins, and Gibson 1994), respectively. The latter was also used for neighbor-joining tree generation and bootstrapping. MEGA software (Kumar, Tamura, and Nei 1993) was used to estimate the total nucleotide sequence divergence (calculated as the proportion of nucleotide sites at which the two sequences compared differ), the number of synonymous substitutions per synonymous site, and the number of nonsynonymous substitutions per nonsynonymous site. Sequences corresponding to PCR-primer-annealing sites were excluded from the analysis. The Ty1-copia group RT sequences analyzed are given in table 2, along with their database accession numbers. Note that the five species (Triticum urartu, Aegilops speltoides, Aegilops squarrosa, Arabidopsis thaliana, and Nicotiana tabacum) used as sources of the RT sequences are not listed in table 1.

Results

Rice Ty1-copia Group Retrotransposons Show Different Distribution Ranges in Grass

The RT sequences of Ty1-copia group retrotransposons were amplified by PCR from rice (Oryza sativa) and were then cloned and sequenced. Twenty-three new RT clones and three clones identical to those previously reported (RICTOSRT4, Tos12, and Tos18 in table 2) were isolated. The nucleotide sequences of these clones were successfully aligned with many other plant Ty1-copia group RT sequences (see below), evidence that they are authentic.
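The divergence measure defined under Computer Analysis (the proportion of nucleotide sites at which two compared sequences differ) can be sketched in a few lines of Python. This is a minimal illustration, not the MEGA implementation: the function names are hypothetical, the inputs are assumed to be pre-aligned sequences of equal length, and the handling of gapped sites (skipped here) is an assumption that the text does not specify.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ.

    Assumption: sites with a gap in either sequence are left out of the
    comparison; the paper does not state how gaps were treated.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def divergence_summary(sequences):
    """Return (minimum, maximum, mean) divergence over all sequence pairs."""
    dists = [p_distance(a, b) for a, b in combinations(sequences, 2)]
    return min(dists), max(dists), sum(dists) / len(dists)


# Toy aligned sequences, invented purely for illustration.
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAA"]
low, high, mean = divergence_summary(toy)
```

Reporting the minimum, maximum, and mean over all pairs mirrors the way divergence ranges and averages are quoted for sets of clones in the Results.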
The 23 newly isolated sequences have been designated Ros1 (retrotransposon of Oryza sativa 1)–Ros23 and deposited in the DDBJ/EMBL/GenBank databases (accession numbers AB017978–AB018000). The nucleotide sequence divergences of the new clones range from 0.4% to 57.8% (average 33.2%), evidence of the high level of heterogeneity of rice retrotransposons (Hirochika et al. 1996b; Wang et al. 1997).

Table 2
Accession Numbers of the Ty1-copia Group Reverse Transcriptase Sequences Analyzed

Species                Database Locus Name (element name)   Accession           Reference
Triticum aestivum      TAWIS21A (Wis-2-1A)                  X63184              Murphy et al. (1992)
Triticum aestivum      AB008772 (Tar1)                      AB008772            Matsuoka and Tsunewaki (1996)
Triticum aestivum      TAPOL                                D12832              Hirochika and Hirochika (1993)
Triticum aestivum      TACOPIA                              M94498              Voytas et al. (1992)
Triticum aestivum      D90618–D90671                        D90618–D90671       Matsuoka and Tsunewaki (1996)
Triticum urartu        D90602–D90617                        D90602–D90617       Matsuoka and Tsunewaki (1996)
Aegilops speltoides    D90685–D90698                        D90685–D90698       Matsuoka and Tsunewaki (1996)
Aegilops squarrosa     D90672–D90684                        D90672–D90684       Matsuoka and Tsunewaki (1996)
Hordeum vulgare        HVBARE1 (BARE-1)                     Z17327              Manninen and Schulman (1993)
Hordeum vulgare        BLYCOPIA                             M94470              Voytas et al. (1992)
Avena sativa           ASTCOPIA                             M94483              Voytas et al. (1992)
Zea mays               ZMU68408 (Opie-2)                    U68408              SanMiguel et al. (1996)
Zea mays               ZMU41000 (PREM-2)                    U41000              Turcich et al. (1996)
Zea mays               ZMU12626 (Hopscotch)                 U12626              White, Habera, and Wessler (1994)
Zea mays               MZECOPIAA                            M94481              Voytas et al. (1992)
Zea mays               MZECOPIAB                            M94482              Voytas et al. (1992)
Zea mays               MZEPOL1                              D12830              Hirochika and Hirochika (1993)
Zea mays               MZEPOL2                              D12831              Hirochika and Hirochika (1993)
Oryza sativa           RICCOPIA                             M94492              Voytas et al. (1992)
Oryza sativa           RICTOSRT1                            D12825              Hirochika, Fukuchi, and Kikuchi (1992)
Oryza sativa           RICTOSRT4                            D12826              Hirochika, Fukuchi, and Kikuchi (1992)
Oryza sativa           D85865–D85879 (Tos6–Tos20)           D85865–D85879       Hirochika et al. (1996b)
Oryza sativa           OSCORET1–OSCORET5 (Rrt1–Rrt5)        Z75496–Z75500       Wang et al. (1997)
Oryza sativa           OSCORET7–OSCORET23 (Rrt7–Rrt23)      Z75502–Z75518       Wang et al. (1997)
Oryza sativa           AB017978–AB018000 (Ros1–Ros23)       AB017978–AB018000   This study
Oryza australiensis    D85597 (RIRE1)                       D85597              Noma et al. (1997)
Arabidopsis thaliana   ATTA13 (Ta1-3)                       X13291              Voytas and Ausubel (1988)
Nicotiana tabacum      NTTNT194 (Tnt1)                      X13777              Grandbastien, Spielmann, and Caboche (1989)

Four distantly related clones (Ros4, Ros6, Ros12, and Ros13) were selected as probes in order to examine the distribution of rice Ty1-copia group retrotransposons among the grass species. The nucleotide sequence divergences of these selected clones range from 38.3% to 49.8% (average 43.0%). Figure 1 shows the Southern blots in which homologs of the selected rice RT clones of the 12 grass species were probed (table 1). The Ros6 probe detected its homologs in all of the species except Sorghum bicolor (fig. 1a). Homologs of Ros4 were detected in the Oryza species (Bambusoideae) and in two of the Panicoideae species, Setaria italica (very faintly) and Echinochloa utilis (fig. 1b). In contrast, the Ros12 and Ros13 probes convincingly detected their homologs only in the Oryza species (fig. 1c and d), with long exposure giving only very faint visible bands in Echinochloa utilis and Avena sativa when Ros12 was the probe (data not shown). These results indicate that certain rice retrotransposons are detectable in distantly related species of different subfamilies, whereas others are detectable only in the Oryza species. The distribution range of the rice Ty1-copia group retrotransposons in grass therefore varies with the species. We will call the former “widely distributed” and the latter “Oryza-specific.” The above results correlate very well with previous findings about the distribution of wheat Ty1-copia group retrotransposons in grass (Matsuoka and Tsunewaki 1997a). The wide distribution of some Ty1-copia group retrotransposons of rice and wheat across the borders of subfamilies suggests ancient origins prior to radiation of the grass species.

The polymorphism detected with the probes indicates that there are HaeIII or TaqI site differences within and near the RT domain. Polymorphism between closely related species therefore reflects recent mutations that have occurred in this region. Analysis of three species from the genus Oryza (O. australiensis, O. glaberrima, and O. sativa) showed that the banding patterns of O. sativa were more similar to those of O. glaberrima than to those of O. australiensis (fig. 1). This is consistent with the genetic relationships between those species, with O. glaberrima (genome constitution AA) and O. sativa (AA) being closely related, and O. australiensis (EE) being a distant relative of both (Morishima 1984). Although the similarity of the banding patterns apparently correlates with the genomic similarity of the species, polymorphism was also detected between the two A genome species (fig. 1). These findings are evidence of the accumulation of species-specific elements owing to recent mutations.

Four Distinct Ty1-copia Group Retrotransposon Families in Grass

We analyzed the phylogenetic relationships between 177 Ty1-copia group retrotransposons, including 101 from common wheat and its diploid relatives and 63 from rice (tables 2 and 3) by constructing a neighbor-joining tree (fig. 2). Two dicot sequences, Ta1–3 of Arabidopsis (Voytas and Ausubel 1988) and Tnt1 of tobacco (Grandbastien, Spielmann, and Caboche 1989), were included as references. The four major families recognizable in the tree were designated G1 (grass number 1), G2, G3, and G4, but 14 sequences remained unclassified (table 3). Branches defining G1, G3, and G4 were supported by significantly high bootstrap probabilities (100%).
For the G2 family, the probability (94.2%) is high but not statistically significant (Felsenstein 1985). In a tree in which two dicot sequences (Ta1–3 and Tnt1) had been removed, however, the G2 family branch was significantly supported (100%). The averages of intrafamily nucleotide divergence range from 0.14 to 0.22 (table 4), whereas those of interfamily nucleotide divergence range from 0.42 to 0.49, indicative of these families’ isolation from each other. The four families can be regarded as “superfamilies,” because each contains clusters supported by 100% probability of bootstrapping and the “wheat retrotransposon families” previously identified (Matsuoka and Tsunewaki 1996). In spite of this, at present we prefer to use the term “family” regardless of the potential representativity of a super- or subfamily.

FIG. 1. Hybridization of rice RT clones to the genomic DNAs of 12 selected grass species. Genomic DNAs were digested with TaqI (a–c) or HaeIII (d).

Table 3
The RT Clones Analyzed

Family   Common Wheat and Its Diploid Relatives^a   Oryza sativa   Others    Total
G1       10 (5)                                     21 (14)        1 (1)     32 (20)
G2       0                                          16 (10)        1 (1)     17 (11)
G3       0                                          11 (8)         0         11 (8)
G4       91 (57)                                    7 (4)          5 (4)     103 (65)
Others   0                                          8 (8)          6 (5)     14 (13)
Total    101 (62)                                   63 (44)        13 (11)   177 (117)

NOTE. Numbers of intact clones are given in parentheses.
^a Triticum aestivum, Triticum urartu, Aegilops speltoides, and Aegilops squarrosa.

The widely distributed Ty1-copia retrotransposons of wheat (W-families 1 and 2) were placed in G1, whereas two widely distributed rice elements, Ros4 and Ros6, were placed in two separate families, G1 and G2 (fig. 2). It is noteworthy that the two dicot sequences, Tnt1 and Ta1–3, were both placed close to G2 (89.9% bootstrap probability; fig. 2). Another notable fact is that Ros12, a highly Oryza-specific element, was placed in G4 together with five highly Pooideae-specific wheat families, W-families 3, 4, 5, 6, and 7, and several sequences from other grass species, including maize (subfamily Panicoideae) (table 3). In contrast, Ros13, the other highly Oryza-specific element, was placed in G3, which is composed of only rice elements (fig. 2).

RT Domain Divergence of Ty1-copia Group Retrotransposon Families in Grass

Southern hybridization results (Matsuoka and Tsunewaki 1997a and this study) indicated that the RT domains of the G1 and G2 elements are well conserved among distantly related species belonging to different Poaceae subfamilies. The RT nucleotide divergence between the wheat and rice elements was calculated to determine to what extent they are conserved (table 4) and was then compared with the divergence of two nuclear genes (encoding α-amylase 3 and waxy protein) (table 4). Calculations were made only for the G1 and G4 families in which both the wheat and rice RT sequences were available. Genes encoding α-amylase 3 (Baulcombe et al. 1987; Huang et al. 1990; Sutliff et al. 1991) represent a subfamily of the plant amylase multigene family (Huang, Stebbins, and Rodriguez 1992). This subfamily is assumed to have been formed after the divergence of monocots and dicots (Huang, Stebbins, and Rodriguez 1992). Genomic and cDNA nucleotide sequences encoding the waxy protein (presumably orthologous) have been reported, respectively, in rice (Hirano and Sano 1991) and wheat (Clark, Robertson, and Ainsworth 1991). The divergence of these nuclear genes was calculated using the sequences of the coding regions. The wheat–rice divergence of the G1 elements ranges from 0.11 to 0.27 (average 0.20) (table 4), very similar to the divergences of α-amy3 (0.17–0.23, average 0.20) and waxy (0.17) (table 4). This indicates that the RT domain of the G1 elements is conserved to the same extent as that of the nuclear protein-coding genes. The RT domain of the G4 elements, however, showed greater divergence (0.26–0.44, average 0.35).

Synonymous and Nonsynonymous Substitutions

The numbers of synonymous and nonsynonymous substitutions per site were estimated for the intact RT sequences of each family (table 5). Here, “intact” describes sequences carrying neither frameshift nor nonsense mutations. Its antonym is “defective.” Exceptions were BARE-1 (Manninen and Schulman 1993) and Tar1 (Matsuoka and Tsunewaki 1997a). These are intact in the RT domain but carry nonsense mutations and frameshifts in the other domains and were therefore excluded from the estimation. As expected for the coding sequences, nonsynonymous substitutions are very suppressed compared with synonymous substitutions, indicating that the RT domain has been subjected to purifying selection. The degree of suppression, however, varies among the families. The first (7.08) and second (5.04) highest synonymous/nonsynonymous (S/N) ratios were for the G1 and G2 families, whereas there was a low S/N ratio (3.55) for G4. G3, a highly Oryza-specific family, had an S/N ratio (3.60) similar to that of G4. For some retroelements, the S/N ratios are reported to be low between closely related elements and high between divergent ones (Springer, Davidson, and Britten 1991; Lathe et al. 1995; Springer et al. 1995; McAllister and Werren 1997). In our study, the G4 family contained two large wheat retrotransposon families, W-families 4 and 7 (fig. 2). Each consists of a number of closely related elements with <0.1-nt divergence on average (16 intact elements in W-family 4 and 39 in W-family 7) (table 3, see also Matsuoka and Tsunewaki 1996). We therefore suspected that the inclusion of many closely related wheat RT sequences may have lowered the total S/N ratio of the G4 family (3.55).
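The family-level S/N ratios quoted above follow directly from the per-site estimates in table 5. As a quick arithmetic check, the Python sketch below divides each synonymous estimate by the corresponding nonsynonymous estimate. The pairing of values with families is our reading of the flattened table in the extracted text and should be treated as an assumption; the dictionary and function names are illustrative.

```python
# Per-site substitution estimates (S, N) as read from table 5.
# Assumption: values are paired with families in the order G1, G2, G3, G4.
TABLE5 = {
    "G1": (0.4732, 0.0668),
    "G2": (0.4251, 0.0844),
    "G3": (0.4912, 0.1363),
    "G4": (0.3481, 0.0981),
}

# Ratios as quoted in the text.
REPORTED = {"G1": 7.08, "G2": 5.04, "G3": 3.60, "G4": 3.55}


def sn_ratio(synonymous, nonsynonymous):
    """Synonymous/nonsynonymous ratio; higher values mean stronger constraint."""
    return synonymous / nonsynonymous


ratios = {family: sn_ratio(s, n) for family, (s, n) in TABLE5.items()}

for family, value in ratios.items():
    print(f"{family}: S/N = {value:.2f} (reported {REPORTED[family]:.2f})")
```

Each computed ratio agrees with the reported value to two decimal places, which also supports the assumed pairing of the table columns.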
To test whether closely related sequences lowered the S/N ratio of the G4 family, we examined the relationship between the S/N ratios and total nucleotide divergence. Using the elements of each of the four families, we plotted the S/N ratios estimated for all possible pairs against the total divergence between the paired sequences (fig. 3). For the G1 family, smaller S/N ratios were found between closely related sequences than between divergent sequences (fig. 3a). The G2 and G3 families had similar patterns (data not shown). These results indicate that the phenomenon observed in the RT domains of other retroelements (Springer, Davidson, and Britten 1991; Lathe et al. 1995; Springer et al. 1995; McAllister and Werren 1997) is common to the grass Ty1-copia retrotransposons. The G4 family was exceptional in that the correlation between the S/N ratio and the degree of sequence divergence was not obvious (fig. 3b). Interestingly, high S/N ratios (>20) were found between some closely related sequences (less than 0.1 divergence). We therefore conclude that the relatively low S/N ratio for the G4 family (table 5) is not the result of the inclusion of many closely related sequences.

FIG. 2. Neighbor-joining tree of the grass Ty1-copia group retrotransposons based on RT nucleotide sequences. Distances between sequences were calculated as the proportions of nucleotide sites at which two sequences differ. Numbers on the branches are the bootstrap percentages for 1,000 replicates. Wheat retrotransposon families previously identified (Matsuoka and Tsunewaki 1996) are shown as the “W-family.” Names of the source species were given after the clone names. Asterisks indicate rice clones. See table 2 for accession numbers of the sequences.

Table 4
Nucleotide Divergence of the RT Clones and Two Selected Nuclear Genes

Family or Gene   No. of Sequences   Average Divergence (range)   Average Divergence Between Wheat and Rice Sequences (range)
Intrafamily divergence of the RT clones
G1               32                 0.16 (0.01–0.27)             0.20 (0.11–0.27)
G2               17                 0.14 (0.00–0.31)             NA
G3               11                 0.22 (0.01–0.36)             NA
G4               103                0.17 (0.00–0.44)             0.35 (0.26–0.44)
Nuclear genes
α-amy3^a         6                  0.19 (0.06–0.23)             0.21 (0.17–0.23)
waxy^b           2                  0.17 (NA)                    0.17 (NA)

NOTE. Nucleotide divergence was calculated as the proportion of nucleotide sites at which the two sequences compared differ. NA = not applicable.
^a Database accession numbers: M16991, M59351, M59352, X56336, X56337, and X56338.
^b Database accession numbers: X57233 and X58228.

Discussion

Evolutionary Dynamics of Ty1-copia Group Retrotransposons in Grass

The retroelement RT domain has been the subject of extensive evolutionary studies in animals (e.g., Xiong and Eickbush 1990; Eickbush et al. 1995; Springer et al. 1995; Casavant, Sherman, and Wichman 1996; McAllister and Werren 1997), whereas there have been relatively few such studies in plants. The more than 100 RT sequences obtained by us and others (table 2) provide the opportunity to analyze the evolutionary dynamics of grass Ty1-copia group retrotransposons in detail for the first time. Four families (named G1, G2, G3, and G4) were identified in the neighbor-joining tree constructed with 177 RT sequences (fig. 2). Results of Southern hybridization and RT nucleotide sequence analysis suggest that each of the four families underwent a distinct pattern of evolution.

G1 Family

As indicated by the high S/N ratio estimated for the G1 family, a relatively high percentage of G1 elements are considered to have maintained functional RT domains for long periods of time. The wide distribution range and abundance of elements carrying a functional RT domain suggest that the G1 family, established before radiation of the grass species, remained active until recently.
This view is supported by our previous data, which suggest recent amplifications of the wheat G1 elements during the divergence of Triticum and Aegilops that took place some million years ago (Matsuoka and Tsunewaki 1997b). The hardly detectable hybridization signals in maize and sorghum, however, indicate that the extinction or divergence of specific G1 elements indeed did occur during the differentiation of these species (fig. 1c).

Table 5
Numbers of Synonymous and Nonsynonymous Substitutions per Site Estimated for the Grass Ty1-copia Group Retrotransposon Families

Family   No. of Clones   Synonymous Substitutions (S)   Nonsynonymous Substitutions (N)   Ratio (S/N)
G1       20              0.4732 ± 0.0408                0.0668 ± 0.0085                   7.08
G2       11              0.4251 ± 0.0398                0.0844 ± 0.0124                   5.04
G3       8               0.4912 ± 0.0435                0.1363 ± 0.0155                   3.60
G4       65              0.3481 ± 0.0330                0.0981 ± 0.0097                   3.55

G2 Family

The G2 family may have originated before the divergence of monocots and dicots (200 MYA; see Wolfe et al. 1989), because two dicot elements (Tnt1 and Ta1–3) were placed close to the family (89.9% bootstrap probability) (fig. 2). The percentage of elements carrying a functional RT domain seems to be high in this family, as inferred from the high S/N ratio. Extinction or divergence of the G2 elements has occurred in the subfamily Pooideae, as they are not detectable in four species of the taxon (fig. 1b). G2 appears to be a long-surviving family with a functional RT domain, at least in some grass species.

G3 Family

The G3 family generally appears to have been maintained in Oryza. Its elements have some features in common with G4: high specificity to a given taxon and a low S/N ratio. The evolutionary dynamics may also be similar to those of the G4 elements (see below), but further characterization is necessary.

G4 Family

The G4 family consists of elements derived from species of various subfamilies, suggesting that the G4 family was established before radiation of the grass species.
Unlike the elements of G1 and G2, the elements of G4 appear highly specific to given taxa, as shown by Southern hybridization. This high specificity to given taxa is consistent with the high degree of nucleotide divergence between the wheat and rice elements compared with that of the nuclear protein-coding genes (table 4). We assume that primarily independent amplifications of specific elements are responsible for the high specificity of the G4 elements to given taxa.

One implication for the evolutionary dynamics of grass Ty1-copia group retrotransposons is that these families probably evolved in parallel. For example, the G1 family is suggested to have originated before radiation of the grass species and to have been active during the divergence of the Triticum and Aegilops species. In the period between the grass radiation (60 MYA) and the Triticum and Aegilops divergence (some million years ago), the G1 family supposedly was capable of reproducing new copies at an appropriate rate to compensate for the mutational loss of old ones in the lineage leading to modern Triticum and Aegilops. It is in this period that amplification of the highly Pooideae-specific elements of the G4 family took place. Parallel evolution (i.e., independent amplifications) of the G1 and G4 families is therefore assumed for the ancestor of Triticum and Aegilops. Similarly, two long-term surviving families (G1 and G2) appear to have evolved in parallel in the lineage leading to modern rice. The parallel evolution of these families is supported by the branched tree (fig. 2) whose topology is the one expected for families that have multiple elements capable of producing new ones (Clough et al. 1996).

FIG. 3. Diagrams showing the relationship between the S/N ratio and the total divergence of the G1 (a) and G4 (b) elements. A linear model was assumed in the regression analysis. The coefficients are highly significant (P < 0.000).
This is in sharp contrast to mammal retroelements such as L1 and Alu, in which evolution seems to be controlled by a single or very limited number of master genes that produce a small number of replicatively competent progeny that form a “master lineage” as well as a large number of replicatively incompetent progeny (Deininger et al. 1992). The coexistence of multiple lineages of retroelements has been documented for various organisms (Burke et al. 1993; VanderWiel, Voytas, and Wendel 1993; Springer et al. 1995; Casavant, Sherman, and Wichman 1996).

Amplification Mechanism of the G4 Elements

The S/N ratio enables us to measure the intensity of purifying selection on a given coding sequence. Because purifying selection suppresses nonsynonymous substitutions that would impair protein function, sequences under less intensive purifying selection should show low S/N ratios; therefore, sequences derived from RT domains that are not functional for long periods of time should have low S/N ratios. In the present cases, the percentage of such sequences in each family affects the S/N ratio, because the RT sequences analyzed are a mixture of those derived from functional and nonfunctional RT domains. The percentage of closely related sequences is another factor that may affect the S/N ratio (see Results). Our analyses showed that the low S/N ratio of the G4 family was not due to the presence of many closely related sequences; it was therefore inferred to result primarily from the high percentage of elements that carry nonfunctional RT domains and are free from purifying selection.

It is noteworthy that some G4 elements are abundantly present in the host genome. BARE-1 and its related elements make up 6.7% of the haploid barley genome (Suoniemi et al. 1996). The elements of W-family 7 are abundant in wheat and its close relatives, as shown by Southern hybridization (Matsuoka and Tsunewaki 1996).
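The distinction between synonymous and nonsynonymous changes that underlies the S/N ratio can be illustrated with a deliberately simplified Python sketch. It only counts codon pairs that differ at exactly one position and asks whether the encoded amino acid changes. This is not the per-site estimator implemented in MEGA and used for table 5, which additionally normalizes by the numbers of synonymous and nonsynonymous sites and handles multiple differences within a codon; the function name and the example sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_codon_differences(seq_a, seq_b):
    """Count single-base codon differences between two in-frame sequences.

    Returns (synonymous, nonsynonymous, skipped). Codon pairs that differ
    at more than one position, or that contain characters other than
    A/C/G/T, are counted as skipped (a simplifying assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = nonsynonymous = skipped = 0
    usable = len(seq_a) - len(seq_a) % 3
    for i in range(0, usable, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            skipped += 1
            continue
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            skipped += 1
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous, skipped


# Invented example: both sequences start from a QMDVKT-encoding frame.
counts = classify_codon_differences("CAAATGGACGTTAAAACT",
                                    "CAGATGGATGTTAGAACT")
```

In this toy pair, two codons change without altering the amino acid (Gln and Asp are preserved) and one change replaces Lys with Arg. A sequence under purifying selection is expected to accumulate relatively more of the former kind of change, which is what a high S/N ratio reflects.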
The abundance of BARE-1 and the W-family 7 elements indicates that they amplified actively during the host’s evolution. We assume that a small number of elements carrying functional RT domains are present in the G4 family, because S/N ratios higher than 20 were found. These high ratios were found mainly between elements of W-family 7 (the best-sampled family) and are comparable to those between diverged G1 elements. A high percentage of elements carrying nonfunctional RT domains (that is, a small number of elements carrying functional RT domains) suggests that production of new G4 elements has been controlled by a small number of elements carrying functional RT domains. These elements may have served as “master genes” in the evolution of the G4 family. The presence of diagnostic nucleotides that define the wheat families supports the existence of master elements from which progeny are replicated (Matsuoka and Tsunewaki 1996). Alternatively, elements that carry a functional RT domain may be able to help other elements retrotranspose by providing RT that acts in trans.

Acknowledgment

We thank Toshie Miyabayashi of the National Institute of Genetics, Japan, for providing the seeds of O. australiensis and O. glaberrima.

LITERATURE CITED

BAULCOMBE, D. C., A. K. HUTTLY, R. A. MARTIENSSEN, R. F. BARKER, and M. G. JARVIS. 1987. A novel wheat α-amylase gene (α-Amy3). Mol. Gen. Genet. 209:33–40.
BENNETZEN, J. L. 1996. The contributions of retroelements to plant genome organization, function and evolution. Trends Microbiol. 4:347–353.
BURKE, W. D., D. G. EICKBUSH, Y. XIONG, J. JAKUBCZAK, and T. H. EICKBUSH. 1993. Sequence relationship of retrotransposable elements R1 and R2 within and between divergent insect species. Mol. Biol. Evol. 10:163–185.
CASACUBERTA, J. M., and M. A. GRANDBASTIEN. 1993. Characterisation of LTR sequences involved in the protoplast specific expression of the tobacco Tnt1 retrotransposon. Nucleic Acids Res. 21:2087–2093.
CASAVANT, N. C., A. N. SHERMAN, and H. A. WICHMAN. 1996. Two persistent LINE-1 lineages in Peromyscus have unequal rates of evolution. Genetics 142:1289–1298.
CLARK, J. R., M. ROBERTSON, and C. C. AINSWORTH. 1991. Nucleotide sequence of a wheat (Triticum aestivum L.) cDNA clone encoding the waxy protein. Plant Mol. Biol. 16:1099–1101.
CLOUGH, J. E., J. A. FOSTER, M. BARNETT, and H. A. WICHMAN. 1996. Computer simulation of transposable element evolution: random template and strict master models. J. Mol. Evol. 42:52–58.
DEININGER, P. L., M. A. BATZER, C. A. HUTCHISON III, and M. H. EDGELL. 1992. Master genes in mammalian repetitive DNA amplification. Trends Genet. 8:307–311.
EICKBUSH, D. G., W. C. LATHE III, M. P. FRANCINO, and T. H. EICKBUSH. 1995. R1 and R2 retrotransposable elements of Drosophila evolve at rates similar to those of nuclear genes. Genetics 139:685–695.
EICKBUSH, T. H. 1994. Origin and evolutionary relationships of retroelements. Pp. 121–160 in S. S. MORSE, ed. The evolutionary biology of viruses. Raven Press, New York.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
FLAVELL, A. J., E. DUNBAR, R. ANDERSON, S. R. PEARCE, R. HARTLEY, and A. KUMAR. 1992. Ty1-copia group retrotransposons are ubiquitous and heterogeneous in higher plants. Nucleic Acids Res. 20:3639–3644.
GRANDBASTIEN, M. A. 1992. Retroelements in higher plants. Trends Genet. 8:103–108.
GRANDBASTIEN, M. A., A. SPIELMANN, and M. CABOCHE. 1989. Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337:376–380.
HIRANO, H., and Y. SANO. 1991. Molecular characterization of the waxy locus of rice (Oryza sativa). Plant Cell Physiol. 32:989–997.
HIROCHIKA, H. 1993. Activation of tobacco retrotransposons during tissue culture. EMBO J. 12:2521–2528.
HIROCHIKA, H., A. FUKUCHI, and F. KIKUCHI. 1992. Retrotransposon families in rice. Mol. Gen. Genet. 233:209–216.
HIROCHIKA, H., and R. HIROCHIKA. 1993. Ty1-copia group retrotransposons as ubiquitous components of plant genomes. Jpn. J. Genet. 68:35–46.
HIROCHIKA, H., and H. OTSUKI. 1995. Extrachromosomal circular forms of the tobacco retrotransposon Tto1. Gene 165:229–232.
HIROCHIKA, H., H. OTSUKI, M. YOSHIKAWA, Y. OTSUKI, K. SUGIMOTO, and S. TAKEDA. 1996a. Autonomous transposition of the tobacco retrotransposon Tto1 in rice. Plant Cell 8:725–734.
HIROCHIKA, H., K. SUGIMOTO, Y. OTSUKI, H. TSUGAWA, and M. KANDA. 1996b. Retrotransposons of rice involved in mutations induced by tissue culture. Proc. Natl. Acad. Sci. USA 93:7783–7788.
HUANG, N., N. KOIZUMI, S. REINL, and R. L. RODRIGUEZ. 1990. Structural organization and differential expression of rice α-amylase genes. Nucleic Acids Res. 18:7007–7014.
HUANG, N., G. L. STEBBINS, and R. L. RODRIGUEZ. 1992. Classification and evolution of α-amylase genes in plants. Proc. Natl. Acad. Sci. USA 89:7526–7530.
KATAYAMA, H., and Y. OGIHARA. 1996. Phylogenetic affinities of the grasses to other monocots as revealed by molecular analysis of chloroplast DNA. Curr. Genet. 29:572–581.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. Version 2.01. Pennsylvania State University, University Park.
LATHE, W. C. III, W. D. BURKE, D. G. EICKBUSH, and T. H. EICKBUSH. 1995. Evolutionary stability of the R1 retrotransposable element in the genus Drosophila. Mol. Biol. Evol. 12:1094–1105.
MCALLISTER, B. F., and J. H. WERREN. 1997. Phylogenetic analysis of a retrotransposon with implications for strong evolutionary constraints on reverse transcriptase. Mol. Biol. Evol. 14:69–80.
MANNINEN, I., and A. H. SCHULMAN. 1993. BARE-1, a copia-like retroelement in barley (Hordeum vulgare L.). Plant Mol. Biol. 22:829–846.
MATSUOKA, Y., and K. TSUNEWAKI. 1996. Wheat retrotransposon families identified by reverse transcriptase domain analysis. Mol. Biol. Evol. 13:1384–1392.
MATSUOKA, Y., and K. TSUNEWAKI. 1997a. Presence of wheat retrotransposons in Gramineae species and the origin of wheat retrotransposon families. Genes Genet. Syst. 72:335–343.
MATSUOKA, Y., and K. TSUNEWAKI. 1997b. Origin and the transmission of some types of family 1 wheat retrotransposons in the two related genera Triticum and Aegilops. Genes Genet. Syst. 72:345–351.
MHIRI, C., J. B. MOREL, S. VERNHETTES, J. M. CASACUBERTA, H. LUCAS, and M. A. GRANDBASTIEN. 1997. The promoter of the tobacco Tnt1 retrotransposon is induced by wounding and by abiotic stress. Plant Mol. Biol. 33:257–266.
MOREAU-MHIRI, C., J. B. MOREL, C. AUDEON, M. FERAULT, M. A. GRANDBASTIEN, and H. LUCAS. 1996. Regulation of expression of the tobacco Tnt1 retrotransposon in heterologous species following pathogen-related stresses. Plant J. 9:409–419.
MORISHIMA, H. 1984. Wild plants and domestication. Pp. 3–30 in S. TSUNODA and N. TAKAHASHI, eds. Biology of rice. Jpn. Sci. Soc. Press, Tokyo/Elsevier, Amsterdam.
MURPHY, G. J. P., H. LUCAS, G. MOORE, and R. B. FLAVELL. 1992. Sequence analysis of Wis-2-1A, a retrotransposon-like element from wheat. Plant Mol. Biol. 20:991–995.
MURRAY, M. G., and W. F. THOMPSON. 1980. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8:4321–4325.
NOMA, K., R. NAKAJIMA, H. OHTSUBO, and E. OHTSUBO. 1997. RIRE1, a retrotransposon from wild rice Oryza australiensis. Genes Genet. Syst. 72:131–140.
PEARCE, S. R., G. HARRISON, P. (J. S.) HESLOP-HARRISON, A. J. FLAVELL, and A. KUMAR. 1997. Characterization and genomic organization of Ty1-copia group retrotransposons in rye (Secale cereale). Genome 40:617–625.
POUTEAU, S., M. A. GRANDBASTIEN, and M. BOCCARA. 1994. Microbial elicitors of plant defence responses activate transcription of a retrotransposon. Plant J. 5:535–542.
SANMIGUEL, P., A. TIKHONOV, Y. K. JIN et al. (11 co-authors). 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765–768.
SPRINGER, M. S., E. H. DAVIDSON, and R. J. BRITTEN. 1991. Retroviral-like element in a marine invertebrate. Proc. Natl. Acad. Sci. USA 88:8401–8404.
SPRINGER, M. S., N. A. TUSNEEM, E. H. DAVIDSON, and R. J. BRITTEN. 1995. Phylogeny, rates of evolution, and patterns of codon usage among sea urchin retroviral-like elements, with implications for the recognition of horizontal transfer. Mol. Biol. Evol. 12:219–230.
SUONIEMI, A., K. ANAMTHAWAT-JÓNSSON, T. ARNA, and A. H. SCHULMAN. 1996. Retrotransposon BARE-1 is a major, dispersed component of the barley (Hordeum vulgare L.) genome. Plant Mol. Biol. 30:1321–1329.
SUONIEMI, A., A. NARVANTO, and A. H. SCHULMAN. 1996. The BARE-1 retrotransposon is transcribed in barley from an LTR promoter active in transient assays. Plant Mol. Biol. 31:295–306.
SUTLIFF, T. D., N. HUANG, J. C. LITTS, and R. L. RODRIGUEZ. 1991. Characterization of an α-amylase multigene cluster in rice. Plant Mol. Biol. 16:579–591.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
TURCICH, M. P., A. BOKHARI-RIZA, D. A. HAMILTON, C. HE, W. MESSIER, C. B. STEWART, and J. P. MASCARENHAS. 1996. PREM-2, a copia-type retroelement in maize is expressed preferentially in early microspores. Sex. Plant Reprod. 9:65–74.
VANDERWIEL, P. L., D. F. VOYTAS, and J. F. WENDEL. 1993. Copia-like retrotransposable element evolution in diploid and polyploid cotton (Gossypium L.). J. Mol. Evol. 36:429–447.
VOYTAS, D. F., and F. M. AUSUBEL. 1988. A copia-like transposable element family in Arabidopsis thaliana. Nature 336:242–244.
VOYTAS, D. F., M. P. CUMMINGS, A. KONIECZNY, F. M. AUSUBEL, and S. R. RODERMEL. 1992. copia-like retrotransposons are ubiquitous among plants. Proc. Natl. Acad. Sci. USA 89:7124–7128.
WANG, S., Q. ZHANG, P. J. MAUGHAN, and M. A. SAGHAI MAROOF. 1997. Copia-like retrotransposons in rice: sequence heterogeneity, species distribution and chromosomal locations. Plant Mol. Biol. 33:1051–1058.
WHITE, S. E., L. F. HABERA, and S. R. WESSLER. 1994. Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc. Natl. Acad. Sci. USA 91:11792–11796.
WOLFE, K. H., M. GOUY, Y. W. YANG, P. M. SHARP, and W. H. LI. 1989. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86:6201–6205.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

THOMAS H. EICKBUSH, reviewing editor

Accepted October 22, 1998