Evolutionary Dynamics of Ty1-copia Group Retrotransposons in

Evolutionary Dynamics of Ty1-copia Group Retrotransposons in Grass
Shown by Reverse Transcriptase Domain Analysis
Yoshihiro Matsuoka and Koichiro Tsunewaki
Department of Bioscience, Fukui Prefectural University, Fukui, Japan
The evolutionary dynamics of Ty1-copia group retrotransposons in grass were examined by reverse transcriptase (RT)
domain analysis. Twenty-three rice RT sequences were newly determined for this report. Phylogenetic analysis of 177
RT sequences, mostly derived from wheat, rice, and, maize, showed four distinct families, which were designated G1,
G2, G3, and G4. Three of these families have elements obtained from distantly related species, indicative of origins
prior to the radiation of grass species. Results of Southern hybridization and detailed comparisons between the wheat
and rice sequences indicated that each of the families had undergone a distinct pattern of evolution. Multiple families
appear to have evolved in parallel in a host species. Analyses of synonymous and nonsynonymous substitutions
suggested that there is a low percentage of elements carrying functional RT domains in the G4 family, indicating that
the production of new G4 elements has been controlled by a small number of elements carrying functional RT domains.
Introduction
Ty1-copia group retrotransposons are one of the best
characterized groups of plant retrotransposons. They have
sequences called long terminal repeats (LTRs) at both
ends and carry regions encoding gag, proteinase, endonuclease, reverse transcriptase (RT), and RNase H between the LTRs in the order typically observed for Ty1
of Saccharomyces and copia of Drosophila (Grandbastien 1992). They replicate by reverse transcription of
RNA intermediates transcribed from the parental copies.
Unlike DNA-DNA type transposons, their life cycle lacks
excision from the genome (see Eickbush 1994 for review). Once inserted, they usually remain in the locus.
Ty1-copia group retrotransposons have been reported for a wide range of plant taxa, including angiosperms, gymnosperms, ferns, lycopods, and bryophytes,
evidence that they are ubiquitous in the plant kingdom
(Flavell et al. 1992; Voytas et al. 1992; Hirochika and
Hirochika 1993). The majority of plant Ty1-copia group
retrotransposons, like other plant retroelements, are
thought to be rarely active (see Bennetzen 1996 for review). Their expression and transposition, however, are
inducible by such stresses as protoplast isolation and
tissue culture, as well as by pathogen-related stresses
(Grandbastien, Spielmann, and Caboche 1989; Hirochika 1993; Pouteau, Grandbastien, and Boccara 1994; Moreau-Mhiri et al. 1996; Mhiri et al. 1997). Detection of
their transcripts under ordinary growth conditions has
also been reported (Suoniemi, Narvanto, and Schulman
1996; Pearce et al. 1997). The cis-regulating sequences
in the LTRs for transcription have been identified (Casacuberta and Grandbastien 1993; Hirochika et al. 1996a;
Suoniemi, Narvanto, and Schulman 1996). The extrachromosomal circular DNA intermediates formed during retrotransposition have been analyzed in Tto1 from
tobacco (Hirochika and Otsuki 1995). Culture-inducible
activation of an element in rice called Tos17 has been
proposed for insertional mutagenesis and gene function
analysis (Hirochika et al. 1996b).
Compared with the progress made in understanding
the functional aspects of plant Ty1-copia group retrotransposons, relatively little is known about their evolution. Their ubiquity in the plant kingdom suggests that
they are of very ancient origin. Some grass elements
from the tribe Triticeae make up at least several percent
of the genome (Suoniemi et al. 1996; Turcich et al.
1996; Pearce et al. 1997), indicative of active amplifications during the host’s evolution. This abundance suggests that in some manner they may have affected the
determination of the host genome structure. Analyses of
large numbers of RT sequences have shown the family
structures of the Ty1-copia group retrotransposons in
cotton (VanderWiel, Voytas, and Wendel 1993) and
wheat (Matsuoka and Tsunewaki 1996). In wheat, further characterizations suggested (1) that there are two
age groups in the wheat Ty1-copia group retrotransposon families, one established before and one established
after radiation of the subfamily Pooideae (Matsuoka and
Tsunewaki 1997a), and (2) that an ancient family whose
origin might date to before the divergence of wheat and
rice (60 MYA) was active during the divergence of the
Triticum and Aegilops species that occurred some millions of years ago (Matsuoka and Tsunewaki 1997b).
To clarify the evolutionary relationships between
wheat and other grass Ty1-copia group retrotransposons,
RT sequences obtained from several grass species were
analyzed. Here, we report the identification of four distinct families, the distribution of these families among
the grass species, and the characteristics of the nucleotide substitution patterns. The evolutionary dynamics of
the Ty1-copia group retrotransposons in grass and possible amplification mechanisms are discussed on the basis of our findings.
Key words: Ty1-copia group retrotransposon, reverse transcriptase, grass, evolution.
Materials and Methods
Plant Materials
Address for correspondence and reprints: Yoshihiro Matsuoka,
Fukui Prefectural University, Matsuoka-cho, Yoshida-gun, Fukui 910–
1142, Japan. E-mail: [email protected].
Mol. Biol. Evol. 16(2):208–217. 1999
q 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
208
Twelve grass species representing four of the five
subfamilies of the grass family (Poaceae) were used (table 1). The Poaceae is considered monophyletic (Katayama and Ogihara 1996).
Evolution of Retrotransposons in Grass
209
Table 1
Poaceae Species Studied
Subfamily
Tribe
Bambusoideae . . Oryzeae
Panicoideae . . . . Paniceae
Andropogoneae
Chloridoideae . . Chlorideae
Pooideae . . . . . . . Aveneae
Triticeae
a
b
Species
Oryza sativa L.
Oryza glaberrima Stud.a
Oryza australiensis Dominb
Setaria italica (L.) P. Beauv.
Echinochloa utilis Ohwi et Yabuno
Zea mays L.
Sorghum bicolor Moench
Eleusine coracana Gaertn.
Avena sativa L.
Hordeum vulgare L.
Secale cereale L.
Triticum aestivum L.
Common Name
Rice
African rice
Australian wild rice
Foxtail millet
Japanese barnyard millet
Maize
Sorghum
Finger millet
Oats
Barley
Rye
Common wheat
Accession number W0025; supplied by the National Institute of Genetics, Mishima, Japan.
Accession number W1538; supplied by the National Institute of Genetics, Mishima, Japan.
DNA Isolation, Cloning, and Sequencing
Total DNA was extracted from the leaves of greenhouse-grown plants by the CTAB method (Murray and
Thompson 1980). The PCR method of Hirochika, Fukuchi, and Kikuchi (1992) was used to amplify ;270bp fragments of the RT domain with total DNA as the
template. Degenerate primers 59-CA(A/G)ATGGA(C/
T)GT(A/C/G/T)AA(A/G)AC-39 and 59-CAT(A/
G)TC(A/G)TC(A/C/G/T)AC(A/G)TA-39, which respectively correspond to the conserved RT peptide motifs
(QMDVKT and YVDDM) of the Ty1-copia group retrotransposons, were used. PCR, cloning, and sequencing of the products were done as described previously
(Matsuoka and Tsunewaki 1996).
Southern Hybridization
Southern hybridization using a four-base-recognizing restriction endonuclease in combination with a short
probe was performed to detect restriction site polymorphism within and near the RT domain. Total DNAs (2–
12 mg) were digested with HaeIII or TaqI as specified
by the manufacturer (Takara Shuzo). Although large
amounts of DNA were used for large-genome-size species like wheat, the number of nuclei represented in
samples may vary by species. The digested samples
were separated in a 1.5% agarose gel and were then
transferred to a Hybond N1 membrane (Amasham).
Probes were prepared from the rice RT domain clones
by PCR using the M13 primers. Probe labeling and hybridization were done with a DIG labeling and detection
kit (Boehringer Mannheim). The membranes were
washed under low-stringency conditions: twice with 2
3 SSC plus 0.1% SDS at room temperature for 5 min,
then twice with 0.1 3 SSC plus 0.1% SDS at 588C for
15 min. The hybridized probes were made visible on Xray film (Fuji Film) by a chemiluminescent reaction with
Lumi-phosy 530.
Computer Analysis
Initial sequence manipulation and multiple-sequence alignment were done with DNASIS software
(Hitachi Software Engineering Co.) and CLUSTAL W,
version 1.7 (Thompson, Higgins, and Gibson 1994), respectively. The latter was also used for neighbor-joining
tree generation and bootstrapping. MEGA software (Kumar, Tamura, and Nei 1993) was used to estimate the
total nucleotide sequence divergence (calculated as the
proportion of nucleotide sites at which the two sequences compared differ), the number of synonymous substitutions per synonymous site, and the number of nonsynonymous substitutions per nonsynonymous site. Sequences corresponding to PCR-primer-annealing sites
were excluded from the analysis. The Ty1-copia group
RT sequences analyzed are given in table 2, along with
their database accession numbers. Note that the five species (Triticum urartu, Aegilops speltoides, Aegilops
squarrosa, Arabidopsis thaliana, and Nicotiana tabacum) used as sources of the RT sequences are not listed
in table 1.
Results
Rice Ty1-copia Group Retrotransposons Show
Different Distribution Ranges in Grass
The RT sequences of Ty1-copia group retrotransposons were amplified by PCR from rice (Oryza sativa)
and were then cloned and sequenced. Twenty-three new
RT clones and three clones identical to those previously
reported (RICTOSRT4, Tos12, and Tos18 in table 2)
were isolated. The nucleotide sequences of these clones
were successfully aligned with many other plant Ty1copia group RT sequences (see below), evidence that
they are authentic. The 23 newly isolated sequences
have been designated Ros1 (retrotransposon of Oryza
sativa 1)–Ros23 and deposited in the DDBJ/EMBL/
GenBank databases (accession numbers AB017978–
AB018000). The nucleotide sequence divergences of the
new clones range from 0.4% to 57.8% (average 33.2%),
evidence of the high level of heterogeneity of rice retrotransposons (Hirochika et al. 1996b; Wang et al.
1997). Four distantly related clones (Ros4, Ros6, Ros12,
and Ros13) were selected as probes in order to examine
the distribution of rice Ty1-copia group retrotransposons
among the grass species. The nucleotide sequence divergences of these selected clones range from 38.3% to
49.8% (average 43.0%).
Figure 1 shows the Southern blots in which homologs of the selected rice RT clones of the 12 grass
210
Matsuoka and Tsunewaki
Table 2
Accession Numbers of the Ty1-copia Group Reverse Transcriptase Sequences Analyzed
Species
Triticum aestivum . . . . . .
Triticum urartu . . . . . . . .
Aegilops speltoides . . . . .
Aegilops squarrosa . . . . .
Hordeum vulgare . . . . . .
Avena sativa . . . . . . . . . .
Zea mays . . . . . . . . . . . . .
Oryza sativa . . . . . . . . . .
Oryza australiensis . . . . .
Arabidopsis thaliana . . . .
Nicotiana tabaccum . . . .
Database Locus Name
(element name)
Accession
TAWIS21A (Wis-2-1A)
AB008772 (Tar1)
TAPOL
TACOPIA
D90618–D90671
D90602–D90617
D90685–D90698
D90672–D90684
HVBARE1 (BARE-1)
BLYCOPIA
ASTCOPIA
ZMU68408 (Opie-2)
ZMU41000 (PREM-2)
ZMU12626 (Hopscotch)
MZECOPIAA
MZECOPIAB
MZEPOL1
MZEPOL2
RICCOPIA
RICTOSRT1
RICTOSRT4
D85865–D85879 (Tos6–Tos20)
OSCORET1–OSCORET5 (Rrt1–Rrt5)
OSCORET7–OSCORET23 (Rrt7–Rrt23)
AB017978–AB018000 (Ros1–Ros23)
D85597 (RIRE1)
ATTA13 (Ta1-3)
NTTNT194 (Tnt1)
species were probed (table 1). The Ros6 probe detected
its homologs in all of the species except Sorghum bicolor (fig. 1a). Homologs of Ros4 were detected in the
Oryza species (Bambusoideae) and in two of the Panicoideae species, Setaria italica (very faintly) and Echinochloa utilis (fig. 1b). In contrast, the Ros12 and Ros13
probes convincingly detected their homologs only in the
Oryza species (fig. 1c and d), with long exposure giving
only very faint visible bands in Echinochloa utilis and
Avena sativa when Ros12 was the probe (data not
shown). These results indicate that certain rice retrotransposons are detectable in distantly related species of
different subfamilies, whereas others are detectable only
in the Oryza species. The distribution range of the rice
Ty1-copia group retrotransposons in grass therefore
varies with the species. We will call the former ‘‘widely
distributed’’ and the latter ‘‘Oryza-specific.’’
The above results correlate very well with previous
findings about the distribution of wheat Ty1-copia group
retrotransposons in grass (Matsuoka and Tsunewaki
1997a). The wide distribution of some Ty1-copia group
retrotransposons of rice and wheat across the borders of
subfamilies suggests ancient origins prior to radiation of
the grass species.
The polymorphism detected with the probes indicates that there are HaeIII or TaqI site differences within
and near the RT domain. Polymorphism between closely
related species therefore reflects recent mutations that
have occurred in this region. Analysis of three species
from the genus Oryza (O. australiensis, O. glaberrima,
and O. sativa) showed that the banding patterns of O.
X63184
AB008772
D12832
M94498
D90618–D90671
D90602–D90617
D90685–D90698
D90672–D90684
Z17327
M94470
M94483
U68408
U41000
U12626
M94481
M94482
D12830
D12831
M94492
D12825
D12826
D85865–D85879
Z75496–Z75500
Z75502–Z75518
AB017978–AB018000
D85597
X13291
X13777
Reference
Murphy et al. (1992)
Matsuoka and Tsunewaki (1996)
Hirochika and Hirochika (1993)
Voytas et al. (1992)
Matsuoka and Tsunewaki (1996)
Matsuoka and Tsunewaki (1996)
Matsuoka and Tsunewaki (1996)
Matsuoka and Tsunewaki (1996)
Manninen and Schulman (1993)
Voytas et al. (1992)
Voytas et al. (1992)
SanMiguel et al. (1996)
Turcich et al. (1996)
White, Habera, and Wessler (1994)
Voytas et al. (1992)
Voytas et al. (1992)
Hirochika and Hirochika (1993)
Hirochika and Hirochika (1993)
Voytas et al. (1992)
Hirochika, Fukuchi, and Kikuchi (1992)
Hirochika, Fukuchi, and Kikuchi (1992)
Hirochika et al. (1996b)
Wang et al. (1997)
Wang et al. (1997)
This study
Noma et al. (1997)
Voytas and Ausubel (1988)
Grandbastien, Spielmann, and Caboche
(1989)
sativa were more similar to those of O. glaberrima than
to those of O. australiensis (fig. 1). This is consistent
with the genetic relationships between those species,
with O. glaberrima (genome constitution AA) and O.
sativa (AA) being closely related, and O. australiensis
(EE) being a distant relative of both (Morishima 1984).
Although the similarity of the banding patterns apparently correlates with the genomic similarity of the species, polymorphism was also detected between the two
A genome species (fig. 1). These findings are evidence
of the accumulation of species-specific elements owing
to recent mutations.
Four Distinct Ty1-copia Group Retrotransposon
Families in Grass
We analyzed the phylogenetic relationships between 177 Ty1-copia group retrotransposons, including
101 from common wheat and its diploid relatives and
63 from rice (tables 2 and 3) by constructing a neighborjoining tree (fig. 2). Two dicot sequences, Ta1–3 of Arabidopsis (Voytas and Ausubel 1988) and Tnt1 of tobacco (Grandbastien, Spielmann, and Caboche 1989),
were included as references. The four major families
recognizable in the tree were designated G1 (grass number 1), G2, G3, and G4, but 14 sequences remained unclassified (table 3). Branches defining G1, G3, and G4
were supported by significantly high bootstrap probabilities (100%). For the G2 family, the probability
(94.2%) is high but not statistically significant (Felsenstein 1985). In a tree in which two dicot sequences
(Ta1–3 and Tnt1) had been removed, however, the G2
Evolution of Retrotransposons in Grass
211
FIG. 1.—Hybridization of rice RT clones to the genomic DNAs of 12 selected grass species. Genomic DNAs were digested with TaqI (a–
c) or HaeIII (d).
family branch was significantly supported (100%). The
averages of intrafamily nucleotide divergence range
from 0.14 to 0.22 (table 4), whereas those of interfamily
nucleotide divergence range from 0.42 to 0.49, indicative of these families’ isolation from each other. The
four families can be regarded as ‘‘superfamilies,’’ be-
cause each contains clusters supported by 100% probability of bootstrapping and the ‘‘wheat retrotransposon
families’’ previously identified (Matsuoka and Tsunewaki 1996). In spite of this, at present we prefer to use
the term ‘‘family’’ regardless of the potential representativity of a super- or subfamily.
212
Matsuoka and Tsunewaki
similar to the divergences of a-amy3 (0.17–0.23, average 0.20) and waxy (0.17) (table 4). This indicates that
the RT domain of the G1 elements is conserved to the
same extent as that of the nuclear protein-coding genes.
The RT domain of the G4 elements, however, showed
greater divergence (0.26–0.44, average 0.35).
Table 3
The RT Clones Analyzed
SPECIES
FAMILY
Common
Wheat
and its
Diploid
Relativesa
G1 . . . . . .
G2 . . . . . .
G3 . . . . . .
G4 . . . . . .
Others . . .
Total . . . .
10 (5)
0
0
91 (57)
0
101 (62)
Oryza sativa
21
16
11
7
8
63
(14)
(10)
(8)
(4)
(8)
(44)
Others
1
1
0
5
6
13
(1)
(1)
(4)
(5)
(11)
TOTAL
32
17
11
103
14
177
(20)
(11)
(8)
(65)
(13)
(117)
NOTE.—Numbers of intact clones are given in parentheses.
a Triticum aestivum, Triticum urartu, Aegilops speltoides, and Aegilops
squarrosa.
The widely distributed Ty1-copia retrotransposons
of wheat (W-families 1 and 2) were placed in G1,
whereas two widely distributed rice elements, Ros4 and
Ros6, were placed in two separate families, G1 and G2
(fig. 2). It is noteworthy that the two dicot sequences,
Tnt1 and Ta1–3, were both placed close to G2 (89.9%
bootstrap probability; fig. 2). Another notable fact is that
Ros12, a highly Oryza-specific element, was placed in
G4 together with five highly Pooideae-specific wheat
families, W-families 3, 4, 5, 6, and 7, and several sequences from other grass species, including maize (subfamily Panicoideae) (table 3). In contrast, Ros13, the
other highly Oryza-specific element, was placed in G3,
which is composed of only rice elements (fig. 2).
RT Domain Divergence of Ty1-copia Group
Retrotransposon Families in Grass
Southern hybridization results (Matsuoka and Tsunewaki 1997a and this study) indicated that the RT domains of the G1 and G2 elements are well conserved
among distantly related species belonging to different
Poaceae subfamilies. The RT nucleotide divergence between the wheat and rice elements was calculated to
determine to what extent they are conserved (table 4)
and was then compared with the divergence of two nuclear genes (encoding a-amylase 3 and waxy protein)
(table 4). Calculations were made only for the G1 and
G4 families in which both the wheat and rice RT sequences were available. Genes encoding a-amylase 3
(Baulcombe et al. 1987; Huang et al. 1990; Sutliff et al.
1991) represent a subfamily of the plant amylase multigene family (Huang, Stebbins, and Rodriguez 1992).
This subfamily is assumed to have been formed after
the divergence of monocots and dicots (Huang, Stebbins, and Rodriguez 1992). Genomic and cDNA nucleotide sequences encoding the waxy protein (presumably
orthologous) have been reported, respectively, in rice
(Hirano and Sano 1991) and wheat (Clark, Robertson,
and Ainsworth 1991). The divergence of these nuclear
genes was calculated using the sequences of the coding
regions.
The wheat–rice divergence of the G1 elements
ranges from 0.11 to 0.27 (average 0.20) (table 4), very
Synonymous and Nonsynonymous Substitutions
The numbers of synonymous and nonsynonymous
substitutions per site were estimated for the intact RT
sequences of each family (table 5). Here, ‘‘intact’’ describes sequences carrying neither frameshift nor nonsense mutations. Its antonym is ‘‘defective.’’ Exceptions
were BARE-1 (Manninen and Schulman 1993) and Tar1
(Matsuoka and Tsunewaki 1997a). These are intact in
the RT domain but carry nonsense mutations and frameshifts in the other domains and were therefore excluded
from the estimation. As expected for the coding sequences, nonsynonymous substitutions are very suppressed compared with synonymous substitutions, indicating that the RT domain has been subjected to purifying selection. The degree of suppression, however,
varies among the families. The first (7.08) and second
(5.04) highest synonymous/nonsynonymous (S/N) ratios
were for the G1 and G2 families, whereas there was a
low S/N ratio (3.55) for G4. G3, a highly Oryza-specific
family, had an S/N ratio (3.60) similar to that of G4.
For some retroelements, the S/N ratios are reported
to be low between closely related elements and high
between divergent ones (Springer, Davidson, and Britten
1991; Lathe et al. 1995; Springer et al. 1995; McAllister
and Werren 1997). In our study, the G4 family contained
two large wheat retrotransposon families, W-families 4
and 7 (fig. 2). Each consists of a number of closely
related elements with ,0.1-nt divergence on average (16
intact elements in W-family 4 and 39 in W-family 7)
(table 3, see also Matsuoka and Tsunewaki 1996). We
therefore suspected that the inclusion of many closely
related wheat RT sequences may have lowered the total
S/N ratio of the G4 family (3.55). To clarify this, we
examined the relationship between the S/N ratios and
total nucleotide divergence. Using the elements of each
of the four families, we plotted the S/N ratios estimated
for all possible pairs against the total divergence between the paired sequences (fig. 3). For the G1 family,
smaller S/N ratios were found between closely related
sequences than between divergent sequences (fig. 3a).
The G2 and G3 families had similar patterns (data not
shown). These results indicate that the phenomenon observed in the RT domains of other retroelements
(Springer, Davidson, and Britten 1991; Lathe et al. 1995;
Springer et al. 1995; McAllister and Werren 1997) is
common to the grass Ty1-copia retrotransposons. The
G4 family was exceptional in that the correlation between the S/N ratio and the degree of sequence divergence was not obvious (fig. 3b). Interestingly, high S/N
ratios (.20) were found between some closely related
sequences (less than 0.1 divergence). We therefore conclude that the relatively low S/R ratio for the G4 family
(table 5) is not the result of the inclusion of many closely related sequences.
Evolution of Retrotransposons in Grass
213
FIG. 2.—Neighbor-joining tree of the grass Ty1-copia group retrotransposons based on RT nucleotide sequences. Distances between sequences were calculated as the proportions of nucleotide sites at which two sequences differ. Numbers on the branches are the bootstrap
percentages for 1,000 replicates. Wheat retrotransposon families previously identified (Matsuoka and Tsunewaki 1996) are shown as the ‘‘Wfamily.’’ Names of the source species were given after the clone names. Asterisks indicate rice clones. See table 2 for accession numbers of the
sequences.
214
Matsuoka and Tsunewaki
Table 4
Nucleotide Divergence of the RT Clones and Two
Selected Nuclear Genes
Family or
Gene
Average Divergence
Between Wheat and
No. of Average Divergence
Rice Sequences
Sequences
(range)
(range)
Intrafamily divergence of the RT clones
G1 . . . . . . . .
32
0.16 (0.01–0.27)
G2 . . . . . . . .
17
0.14 (0.00–0.31)
G3 . . . . . . . .
11
0.22 (0.01–0.36)
G4 . . . . . . . . 103
0.17 (0.00–0.44)
0.20 (0.11–0.27)
NA
NA
0.35 (0.26–0.44)
Nuclear genes
a-amy3a . . .
waxyb . . . . .
0.21 (0.17–0.23)
0.17 (NA)
6
2
0.19 (0.06–0.23)
0.17 (NA)
NOTE.—Nucleotide divergence was calculated as the proportion of nucleotide sites at which the two sequences compared differ. NA 5 not applicable.
a Database accession numbers: M16991, M59351, M59352, X56336,
X56337, and X56338.
b Database accession numbers: X57233 and X58228.
Discussion
Evolutionary Dynamics of Ty1-copia Group
Retrotransposons in Grass
The retroelement RT domain has been the subject
of extensive evolutionary studies in animals (e.g., Xiong
and Eickbush 1990; Eickbush et al. 1995; Springer et
al. 1995; Casavant, Sherman, and Wichman 1996; McAllister and Werren 1997), whereas there have been relatively few such studies in plants. The more than 100
RT sequences obtained by us and others (table 2) provide the opportunity to analyze the evolutionary dynamics of grass Ty1-copia group retrotransposons in detail
for the first time. Four families (named G1, G2, G3, and
G4) were identified in the neighbor-joining tree constructed with 177 RT sequences (fig. 2). Results of
Southern hybridization and RT nucleotide sequence
analysis suggest that each of the four families underwent
a distinct pattern of evolution.
G1 Family
As indicated by the high S/N ratio estimated for the
G1 family, a relatively high percentage of G1 elements
are considered to have maintained functional RT domains for long periods of time. The wide distribution
range and abundance of elements carrying a functional
RT domain suggest that the G1 family, established before radiation of the grass species, remained active until
recently. This view is supported by our previous data,
which suggest recent amplifications of the wheat G1 elements during the divergence of Triticum and Aegilops
that took place some million years ago (Matsuoka and
Tsunewaki 1997b). The hardly detectable hybridization
signals in maize and sorghum, however, indicate that the
extinction or divergence of specific G1 elements indeed
did occur during the differentiation of these species (fig.
1c).
G2 Family
The G2 family may have originated before the divergence of monocots and dicots (200 MYA; see Wolfe
Table 5
Numbers of Synonymous and Nonsynonymous
Substitutions per Site Estimated for the Grass Ty1-copia
Group Retrotransposon Families
No. of
Family Clones
G1
G2
G3
G4
....
....
....
....
20
11
8
65
Synonymous
Substitutions (S)
0.4732
0.4251
0.4912
0.3481
6
6
6
6
0.0408
0.0398
0.0435
0.0330
Nonsynonymous
Substitutions (N)
Ratio
(S/N)
6
6
6
6
7.08
5.04
3.60
3.55
0.0668
0.0844
0.1363
0.0981
0.0085
0.0124
0.0155
0.0097
et al. 1989), because two dicot elements (Tnt1 and Ta1–
3) were placed close to the family (89.9% bootstrap
probability) (fig. 2). The percentage of elements carrying a functional RT domain seems to be high in this
family, as inferred from the high S/N ratio. Extinction
or divergence of the G2 elements has occurred in the
subfamily Pooideae, as they are not detectable in four
species of the taxon (fig. 1b). G2 appears to be a longsurviving family with a functional RT domain, at least
in some grass species.
G3 Family
The G3 family generally appears to have been
maintained in Oryza. Its elements have some features in
common with G4: high specificity to a given taxon and
a low S/N ratio. The evolutionary dynamics may also
be similar to those of the G4 elements (see below), but
further characterization is necessary.
G4 Family
The G4 family consists of elements derived from
species of various subfamilies, suggesting that the G4
family was established before radiation of the grass species. Unlike the elements of G1 and G2, the elements
of G4 appear highly specific to given taxa, as shown by
Southern hybridization. This high specificity to given
taxa is consistent with the high degree of nucleotide
divergence between the wheat and rice elements compared with that of the nuclear protein-coding genes (table 4). We assume that primarily independent amplifications of specific elements are responsible for the high
specificity of the G4 elements to given taxa.
One implication for the evolutionary dynamics of
grass Ty1-copia group retrotransposons is that these
families probably evolved in parallel. For example, the
G1 family is suggested to have originated before radiation of the grass species and to have been active during
the divergence of the Triticum and Aegilops species. In
the period between the grass radiation (60 MYA) and
the Triticum and Aegilops divergence (some million
years ago), the G1 family supposedly was capable of
reproducing new copies at an appropriate rate to compensate for the mutational loss of old ones in the lineage
leading to modern Triticum and Aegilops. It is in this
period that amplification of the highly Pooideae-specific
elements of the G4 family took place. Parallel evolution
(i.e., independent amplifications) of the G1 and G4 families is therefore assumed for the ancestor of Triticum
and Aegilops. Similarly, two long-term surviving families (G1 and G2) appear to have evolved in parallel in
Evolution of Retrotransposons in Grass
215
FIG. 3.—Diagrams showing the relationship between the S/N ratio and the total divergence of the G1 (a) and G4 (b) elements. A linear
model was assumed in the regression analysis. The coefficients are highly significant (P , 0.000).
the lineage leading to modern rice. The parallel evolution of these families is supported by the branched tree
(fig. 2) whose topology is the one expected for families
that have multiple elements capable of producing new
ones (Clough et al. 1996). This is in sharp contrast to
mammal retroelements such as L1 and Alu, in which
evolution seems to be controlled by a single or very
limited number of master genes that produce a small
number of replicatively competent progeny that form a
‘‘master lineage’’ as well as a large number of replicatively incompetent progeny (Deininger et al. 1992). The
coexistence of multiple lineages of retroelements has
been documented for various organisms (Burke et al.
1993; VanderWeil, Voytas, and Wendel 1993; Springer
et al. 1995; Casavant, Sherman, and Wichman 1996).
Amplification Mechanism of the G4 Elements
The S/N ratio enables us to measure the intensity
of purifying selection on a given coding sequence. Because purifying selection suppresses nonsynonymous
substitutions that would impair protein function, sequences under less intensive purifying selection should
show low S/N ratios; therefore, sequences derived from
RT domains that are not functional for long periods of
time should have low S/N ratios. In the present cases,
the percentage of such sequences in each family affects
the S/N ratio, because the RT sequences analyzed are a
mixture of those derived from functional and nonfunctional RT domains. The percentage of closely related
sequences is another factor that may affect the S/N ratio
(see Results). Our analyses showed that the low S/N ratio of the G4 family was not due to the presence of many
closely related sequences; it was therefore inferred to
result primarily from the high percentage of elements
that carry nonfunctional RT domains and are free from
purifying selection.
It is noteworthy that some G4 elements are abundantly present in the host genome. BARE-1 and its related elements make up 6.7% of the haploid barley genome (Suoniemi et al. 1996). The elements of W-family
7 are abundant in wheat and its close relatives, as shown
by Southern hybridization (Matsuoka and Tsunewaki
1996). These facts indicate that these elements once actively amplified during the host’s evolution. We assume
that a small number of elements carrying functional RT
domains are present in the G4 family, because S/N ratios
higher than 20 were found. These high ratios were mainly between elements of W-family 7 (the best-sampled
family) and are comparable to those between diverged
G1 elements. A high percentage of elements carrying
nonfunctional RT domains or a small number of elements carrying functional RT domains suggests that production of new G4 elements has been controlled by a
small number of elements carrying functional RT domains. These elements may have served as ‘‘master
genes’’ in the evolution of the G4 family. The presence
of diagnostic nucleotides that define the wheat families
supports the existence of master elements from which
progeny are replicated (Matsuoka and Tsunewaki 1996).
Alternatively, elements that carry a functional RT domain may be able to help other elements retrotranspose
by providing RT which acts in trans.
Acknowledgment
We thank Toshie Miyabayashi of the National
Institute of Genetics, Japan, for providing the seeds of
O. australiensis and O. glaberrima.
LITERATURE CITED
BAULCOMBE, D. C., A. K. HUTTLY, R. A. MARTIENSSEN, R. F.
BARKER, and M.G. JARVIS. 1987. A novel wheat a-amylase
gene (a-Amy3). Mol. Gen. Genet. 209:33–40.
BENNETZEN, J. L. 1996. The contributions of retroelements to
plant genome organization, function and evolution. Trends
Microbiol. 4:347–353.
BURKE, W. D., D. G. EICKBUSH, Y. XIONG, J. JAKUBCZAK, and
T. H. EICKBUSH. 1993. Sequence relationship of retrotransposable elements R1 and R2 within and between divergent
insect species. Mol. Biol. Evol. 10:163–185.
CASACUBERTA, J. M., and M. A. GRANDBASTIEN. 1993. Characterisation of LTR sequences involved in the protoplast
specific expression of the tobacco Tnt1 retrotransposon. Nucleic Acids Res. 21:2087–2093.
216
Matsuoka and Tsunewaki
CASAVANT, N. C., A. N. SHERMAN, and H. A. WICHMAN. 1996.
Two persistent LINE-1 lineages in Peromyscus have unequal rates of evolution. Genetics 142:1289–1298.
CLARK, J. R., M. ROBERTSON, and C. C. AINSWORTH. 1991.
Nucleotide sequence of a wheat (Triticum aestivum L.)
cDNA clone encoding the waxy protein. Plant Mol. Biol.
16:1099–1101.
CLOUGH, J. E., J. A. FOSTER, M. BARNETT, and H. A. WICHMAN. 1996. Computer simulation of transposable element
evolution: random template and strict master models. J.
Mol. Evol. 42:52–58.
DEININGER, P. L., M. A. BATZER, C. A. HUTCHISON III, and
M. H. EDGELL. 1992. Master genes in mammalian repetitive
DNA amplification. Trends Genet. 9:307–311.
EICKBUSH, D. G., W. C. LATHE III, M. P. FRANCINO, and T. H.
EICKBUSH. 1995. R1 and R2 retrotransposable elements of
Drosophila evolve at rates similar to those of nuclear genes.
Genetics 139:685–695.
EICKBUSH, T. H. 1994. Origin and evolutionary relationships
of retroelements. Pp. 121–160 in S. S. MORSE, ed. The evolutionary biology of viruses. Raven Press, New York.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an
approach using the bootstrap. Evolution 39:783–791.
FLAVELL, A. J., E. DUNBAR, R. ANDERSON, S. R. PEARCE, R.
HARTLEY, and A. KUMAR. 1992. Ty1-copia group retrotransposons are ubiquitous and heterogeneous in higher plants.
Nucleic Acids Res. 20:3639–3644.
GRANDBASTIEN, M. A. 1992. Retroelements in higher plants.
Trends Genet. 8:103–108.
GRANDBASTIEN, M. A., A. SPIELMANN, and M. CABOCHE.
1989. Tnt1, a mobile retroviral-like transposable element of
tobacco isolated by plant cell genetics. Nature 337:376–380.
HIRANO, H., and Y. SANO. 1991. Molecular characterization of
the waxy locus of rice (Oryza sativa). Plant Cell Physiol.
32:989–997.
HIROCHIKA, H. 1993. Activation of tobacco retrotransposons
during tissue culture. EMBO J. 12:2521–2528.
HIROCHIKA, H., A. FUKUCHI, and F. KIKUCHI. 1992. Retrotransposon families in rice. Mol. Gen. Genet. 233:209–216.
HIROCHIKA, H., and R. HIROCHIKA. 1993. Ty1-copia group retrotransposons as ubiquitous components of plant genomes.
Jpn. J. Genet. 68:35–46.
HIROCHIKA, H., and H. OTSUKI. 1995. Extrachromosomal circular forms of the tobacco retrotransposon Tto1. Gene 165:
229–232.
HIROCHIKA, H., H. OTSUKI, M. YOSHIKAWA, Y. OTSUKI, K.
SUGIMOTO, and S. TAKEDA. 1996a. Autonomous transposition of the tobacco retrotransposon Tto1 in rice. Plant Cell
8:725–734.
HIROCHIKA, H., K. SUGIMOTO, Y. OTSUKI, H. TSUGAWA, and
M. KANDA. 1996b. Retrotransposons of rice involved in
mutations induced by tissue culture. Proc. Natl. Acad. Sci.
USA 93:7783–7788.
HUANG, N., N. KOIZUMI, S. REINL, and R. L. RODRIGUEZ.
1990. Structural organization and different expression of
rice a-amylase genes. Nucleic Acids Res. 18:7007–7014.
HUANG, N., G. L. STEBBINS, and R. L. RODRIGUEZ. 1992. Classification and evolution of a-amylase genes in plants. Proc.
Natl. Acad. Sci. USA 89:7526–7530.
KATAYAMA, H., and Y. OGIHARA. 1996. Phylogenetic affinities
of the grasses to other monocots as revealed by molecular
analysis of chloroplast DNA. Curr. Genet. 29:572–581.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular
evolutionary genetics analysis. Version 2.01. Pennsylvania
State University, University Park.
LATHE, W. C. III, W. D. BURKE, D. G. EICKBUSH, and T. H.
EICKBUSH. 1995. Evolutionary stability of the R1 retrotran-
sposable element in the genus Drosophila. Mol. Biol. Evol.
12:1094–1105.
MCALLISTER, B. F., and J. H. WERREN. 1997. Phylogenetic
analysis of a retrotransposon with implications for strong
evolutionary constraints on reverse transcriptase. Mol. Biol.
Evol. 14:69–80.
MANNINEN, I., and A. H. SCHULMAN. 1993. BARE-1, a copialike retroelement in barley (Hordeum vulgare L.). Plant
Mol. Biol. 22:829–846.
MATSUOKA, Y., and K. TSUNEWAKI. 1996. Wheat retrotransposon families identified by reverse transcriptase domain analysis. Mol. Biol. Evol. 13:1384–1392.
. 1997a. Presence of wheat retrotranposons in Gramineae species and the origin of wheat retrotransposon families. Genes Genet. Syst. 72:335–343.
. 1997b. Origin and the transmission of some types of
family 1 wheat retrotransposons in the two related genera
Triticum and Aegilops. Genes Genet. Syst. 72:345–351.
MHIRI, C., J. B. MOREL, S. VERNHETTES, J. M. CASACUBERTA,
H. LUCAS, and M. A. GRANDBASTIEN. 1997. The promoter
of the tobacco Tnt1 retrotransposon is induced by wounding
and by abiotic stress. Plant Mol. Biol. 33:257–266.
MOREAU-MHIRI, C., J. B. MOREL, C. AUDEON, M. FERAULT,
M. A. GRANDBASTIEN, and H. LUCAS. 1996. Regulation of
expression of the tobacco Tnt1 retrotransposon in heterologous species following pathogen-related stresses. Plant J.
9:409–419.
MORISHIMA, H. 1984. Wild plants and domestication. Pp. 3–
30 in S. TSUNODA and N. TAKAHASHI, eds. Biology of rice.
Jpn. Sci. Soc. Press, Tokyo/Elsevier, Amsterdam.
MURPHY, G. J. P., H. LUCAS, G. MOORE, and R. B. FLAVELL.
1992. Sequence analysis of Wis-2-1A, a retrotransposonlike element from wheat. Plant Mol. Biol. 20:991–995.
MURRAY, M. G., and W. F. THOMPSON. 1980. Rapid isolation
of high molecular weight plant DNA. Nucleic Acids Res.
8:4321–4325.
NOMA, K., R. NAKAJIMA, H. OHTSUBO, and E. OHTSUBO. 1997.
RIRE1, a retrotransposon from wild rice Oryza australiensis. Genes Genet. Syst. 72:131–140.
PEARCE, S. R., G. HARRISON, P. (J. S.) HESLOP-HARRISON, A.
J. FLAVELL, and A. KUMAR. 1997. Characterization and genomic organization of Ty1-copia group retrotransposons in
rye (Secale cereale). Genome 40:617–625.
POUTEAU, S., M. A. GRANDBASTIEN, and M. BOCCARA. 1994.
Microbial elicitors of plant defence responses activate transcription of a retrotransposon. Plant J. 5:535–542.
SANMIGUEL, P., A. TIKHONOV, Y. K. JIN et al. (11 co-authors).
1996. Nested retrotransposons in the intergenic regions of
the maize genome. Science 274:765–768.
SPRINGER, M. S., E. H. DAVIDSON, and R. J. BRITTEN. 1991.
Retroviral-like element in a marine invertebrate. Proc. Natl.
Acad. Sci. USA 88:8401–8404.
SPRINGER, M. S., N. A. TUSNEEM, E. H. DAVIDSON, and R. J.
BRITTEN. 1995. Phylogeny, rates of evolution, and patterns
of codon usage among sea urchin retroviral-like elements,
with implications for the recognition of horizontal transfer.
Mol. Biol. Evol. 12:219–230.
SUONIEMI, A., K. ANAMTHAWAT-JÓNSSON, T. ARNA, and A. H.
SCHULMAN. 1996. Retrotransposon BARE-1 is a major, dispersed component of the barley (Hordeum vulgare L.) genome. Plant Mol. Biol. 30:1321–1329.
SUONIEMI, A., A. NARVANTO, and A. H. SCHULMAN. 1996. The
BARE-1 retrotransposon is transcribed in barley from an
LTR promoter active in transient assays. Plant Mol. Biol.
31:295–306.
Evolution of Retrotransposons in Grass
SUTLIFF, T. D., N. HUANG, J. C. LITTS, and R. L. RODRIGUEZ.
1991. Characterization of an a-amylase multigene cluster in
rice. Plant Mol. Biol. 16:579–591.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighing, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
TURCICH, M. P., A. BOKHARI-RIZA, D. A. HAMILTON, C. HE,
W. MESSIER, C. B. STEWART, and J. P. MASCARENHAS.
1996. PREM-2, a copia-type retroelement in maize is expressed preferentially in early microspores. Sex. Plant Reprod. 9:65–74.
VANDERWIEL, P. L., D. F. VOYTAS, and J. F. WENDEL. 1993.
Copia-like retrotransposable element evolution in diploid
and polyploid cotton (Gossypium L.). J. Mol. Evol. 36:429–
447.
VOYTAS, D. F., and F. M. AUSUBEL. 1988. A copia-like transposable element family in Arabidopsis thaliana. Nature
336:242–244.
VOYTAS, D. F., M. P. CUMMINGS, A. KONIECZNY, F. M. AUSUBEL, and S. R. RODERMEL. 1992. copia-like retrotranspo-
217
sons are ubiquitous among plants. Proc. Natl. Acad. Sci.
USA 89:7124–7128.
WANG, S., Q. ZHANG, P. J. MAUGHAN, and M. A. SAGHAI MAROOF. 1997. Copia-like retrotransposons in rice: sequence
heterogeneity, species distribution and chromosomal locations. Plant Mol. Biol. 33:1051–1058.
WHITE, S. E., L. F. HABERA, and S. R. WESSLER. 1994. Retrotransposons in the flanking regions of normal plant genes:
a role for copia-like elements in the evolution of gene structure and expression. Proc. Natl. Acad. Sci. USA 91:11792–
11796.
WOLFE, K. H., M. GOUY, Y. W. YANG, P. M. SHARP, and W.
H. LI. 1989. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl.
Acad. Sci. USA 86:6201–6205.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of
retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.
THOMAS H. EICKBUSH, reviewing editor
Accepted October 22, 1998