Structure, Divergence, and Distribution of the CRR Centromeric
Retrotransposon Family in Rice
Kiyotaka Nagaki,* Pavel Neumann,‡ Dongfen Zhang,§ Shu Ouyang,‖ C. Robin Buell,‖
Zhukuan Cheng,§ and Jiming Jiang*
*Department of Horticulture, University of Wisconsin-Madison; Research Institute for Bioresources, Okayama
University, Kurashiki, Japan; ‡Institute of Plant Molecular Biology, Ceske Budejovice, Czech Republic; §Institute of Genetics
and Developmental Biology, Chinese Academy of Sciences, Beijing, China; and ‖The Institute for Genomic Research,
Rockville, Maryland
The centromeric retrotransposon (CR) family in grass species is one of the few Ty3-gypsy groups of retroelements that
preferentially transpose into highly specialized chromosomal domains. It has been demonstrated in both rice and maize that
CRR (CR of rice) and CRM (CR of maize) elements are intermingled with centromeric satellite DNA and are highly
concentrated within cytologically defined centromeres. We collected all of the CRR elements from rice chromosomes
1, 4, 8, and 10 that have been sequenced to high quality. Phylogenetic analysis revealed that the CRR elements are structurally diverged into four subfamilies, including two autonomous subfamilies (CRR1 and CRR2) and two nonautonomous
subfamilies (noaCRR1 and noaCRR2). The CRR1/CRR2 elements contain all characteristic protein domains required for
retrotransposition. In contrast, the noaCRR elements have different structures, containing only a gag or gag-pro domain or
no open reading frames. The CRR and noaCRR elements share substantial sequence similarity in regions required for DNA
replication and for recognition by integrase during retrotransposition. These data, coupled with the presence of young
noaCRR elements in the rice genome and similar chromosomal distribution patterns between noaCRR1 and CRR1/
CRR2 elements, suggest that the noaCRR elements were likely mobilized through the retrotransposition machinery from
the autonomous CRR elements. Mechanisms of the targeting specificity of the CRR elements, as well as their role in
centromere function, are discussed.
Introduction
Retrotransposons are mobile genetic elements that
transpose through reverse transcription of an RNA intermediate. Retrotransposons include two different classes,
depending on the presence of the long terminal repeats
(LTRs). LTR retrotransposons are further subclassified into
the Ty1-copia and Ty3-gypsy groups based on the order
of the coding regions within their pol genes (Xiong and
Eickbush 1990). LTR retrotransposons account for a significant portion of most plant genomes and play an important
role in genome divergence and evolution (Kumar and
Bennetzen 1999; Feschotte, Jiang, and Wessler 2003). For
example, LTR retrotransposons represent more than 50%
of the maize genome, with a majority having transposed
within the past 2 to 6 Myr (SanMiguel et al. 1996, 1998).
Accumulation of LTR retrotransposons in the intergenic
regions played a major role in the divergence between
the maize and other cereal genomes (Chen et al. 1997).
Retrotransposons demonstrate different distribution
patterns in plant genomes. Some retrotransposons are dispersed throughout plant genomes (Heslop-Harrison et al.
1997; Mroczek and Dawe 2003), whereas others are highly
enriched in distinct chromosomal domains (Jiang et al.
2002, Jiang et al. 2003; Mroczek and Dawe 2003). These
distribution patterns are likely caused by different targeting
specificities of the retrotransposons. Recent studies in Saccharomyces cerevisiae have shed light on the mechanisms
of retrotransposon insertion specificity (Sandmeyer 2003;
Bushman 2004). For example, the Ty5 retrotransposon
in S. cerevisiae inserts preferentially into the heterochromatic regions. This targeting specificity is determined by
interactions between the targeting domain at the C-terminus
of the Ty5 integrase (IN) and the heterochromatin protein
Sir4p (Zhu et al. 2003). Thus, the targeted integration of
Ty5 is controlled by protein-protein interactions.
Key words: bacterial artificial chromosomes, centromeric retrotransposon, long terminal repeats, rice.
E-mail: [email protected].
Mol. Biol. Evol. 22(4):845–855. 2005
doi:10.1093/molbev/msi069
Advance Access publication December 22, 2004
One of the most interesting retrotransposon families in
plants is the centromeric retrotransposon (CR) in the grass
species. CR belongs to the Ty3-gypsy group and is highly
specific to the centromeric regions of grass chromosomes
(Jiang et al. 2003). The CR elements are found in both
monocot and dicot species and represent a distinct clade
in the Metaviridae family (Gorinsek, Gubensek, and Kordis
2004). Two repetitive DNA sequences specific to grass centromeres were isolated from sorghum (Jiang et al. 1996) and
Brachypodium sylvaticum (Aragon-Alcaide et al. 1996),
and these sequences were later found to be derived from
different parts of the CR elements (Miller et al. 1998;
Presting et al. 1998; Langdon et al. 2000). The rice and
maize CR subfamilies were named CRR (CR of rice)
and CRM (CR of maize), respectively (Cheng et al.
2002; Zhong et al. 2002). Both CRR and CRM elements
are highly intermingled with centromere-specific satellite
repeats (Cheng et al. 2002; Jin et al. 2004). Chromatin
immunoprecipitation (ChIP) analysis demonstrated that
CRR and CRM elements are enriched in centromeric chromatin containing the centromere-specific histone H3 variant
(CenH3), suggesting that the CRR and CRM elements may
play a role in centromere function (Zhong et al. 2002;
Nagaki et al. 2004).
Rice chromosomes 1, 4, 8, and 10, including the centromeres of chromosomes 4 and 8, have been sequenced to high
quality (Feng et al. 2002; Sasaki et al. 2002; Yu et al. 2003;
Nagaki et al. 2004; Wu et al. 2004) (http://www.tigr.org/tdb/
e2k1/osa1/pseudomolecules/info.shtml). We identified all of
the CRR elements and their solo LTRs, which include only
the LTR sequence and may be derived from illegitimate
recombination within and between the CRR elements
(Devos, Brown, and Bennetzen 2002), in these four chromosomes and further analyzed their structure, distribution, and
divergence. Phylogenetic analysis revealed that the CRR
family consists of four structurally diverged subfamilies,
including two autonomous and two nonautonomous subfamilies. The autonomous and nonautonomous CRR elements show similar chromosomal distribution patterns and
share substantial sequence similarities within regions required for DNA replication and integrase recognition. These
results have provided new insights about the evolution
and mechanism of centromeric targeting specificity of the
CR retrotransposon family in grasses.
Molecular Biology and Evolution vol. 22 no. 4 © Society for Molecular Biology and Evolution 2004; all rights reserved.
Table 1
Primers Used for ChIP-PCR Analysis
Name       Primer-1                  Primer-2
rDNA1      AATCAGCGGGGAAAGAAGACC     TCGAAGGATCAAAAAGCAACG
rDNA2      TGTGGAACAAAAGGGTAAAAGC    TCAAACTCCGAAGGTCTAAAGG
RIRE3      TTCACATCTCCCCCTCTATTCA    GACTTTGTCCATCTCCATCCAT
CRR1       AACCAGATCGCAAGCAACACTA    TACATCCAAACAAAACCCAAAG
CRR2       CACTCGTGTTTTACTCAGGAA     CAGGCAGACGGGCGGTTTAGC
noaCRR1    GCCAGAATCACACGCACAAGGT    AATCGAAGAAACAAGCAAGAAC
noaCRR2    TTCCAAGTCCAAGTCCAGTTCG    AAGGCCAAGTTCGGTTTCAGC
Materials and Methods
Sequence Analyses
Sequenced rice bacterial artificial chromosomes
(BACs) and P1 artificial chromosome clones (PACs) that
are derived from chromosomes 1, 4, 8, and 10 and contain
CRR-related sequences were identified by Blast search
(http://www.ncbi.nlm.nih.gov/Blast/) using CRR sequences
in AC092749 and AC022352 and in BAC 17P22 (Cheng
et al. 2002; Nagaki et al. 2003) as queries. The BAC/PAC
sequences were aligned using the MegAlign software
(DNASTAR, Madison, Wis.) along with the known CRR
sequences to extract the CRR elements. The extracted
CRR sequences were deposited in GenBank (accession numbers AY827956 to AY828189). CRR sequences were analyzed using the Staden Package software (Staden 1996)
and tools implemented at the Biology Workbench server
(http://workbench.sdsc.edu/). The search for conserved
protein domains was carried out with RPS-Blast (Marchler-Bauer et al. 2003)
(http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Phylogeny of LTRs from the CRR elements
was analyzed by the neighbor-joining method with ClustalX
software (Saitou and Nei 1987; Thompson et al. 1997). CRR
elements from different subfamilies were compared with
each other using the MegAlign software. The ages of the
CRR elements were estimated by sequence comparison
between the two LTRs from individual CRR elements
(Nagaki et al. 2003).
ChIP-PCR
ChIP-PCR was conducted as described previously
(Nagaki et al. 2004) using 1-week-old etiolated rice seedlings and purified anti-CenH3 antibody. Preimmune blood
was used as a mock control in the ChIP experiments. DNA from
the antibody-bound fraction and from the mock experiments was used
as the template for PCR. PCR primers specific to each of the
four CRR subfamilies were designed (table 1). Two sets of
primers were designed from the 18S-25S ribosomal RNA
genes (rDNA) (table 1) and were used as negative controls
for ChIP-PCR. PCR conditions were 30 cycles at 94°C for
30 s, 55°C for 30 s, and 72°C for 1 min. The PCR products
were electrophoresed and blotted onto Hybond-N+ membrane (Amersham Biosciences, Piscataway, NJ). The same
PCR products were used as probes for Southern hybridizations. The membranes were hybridized at 65°C overnight
and then washed sequentially with 2× SSC with 0.1%
SDS, 0.5× SSC with 0.1% SDS, and 0.1× SSC with
0.1% SDS. The signals were detected by phosphoimaging.
Relative enrichment (RE) was calculated by comparing
antibody-associated PCR product ratios to product ratios
from mock experiments using the following formula:
RE = [(LTR or rDNA1)/rDNA2]antibody / [(LTR or rDNA1)/rDNA2]mock.
The probability (P) of the mock fractions
and antibody fractions belonging to the same group was analyzed by t-test.
Fluorescence in situ Hybridization
Oryza sativa ssp. japonica cv. Nipponbare was used for
cytological analyses. The fluorescence in situ hybridization
(FISH) procedures on meiotic pachytene chromosomes have
been described previously (Cheng et al. 2001). Primers specific to the LTRs of the four CRR subfamilies were designed.
Primers include CRR1-a (AACCAGATCGCAAGCAACACTA), CRR1-b (TACATCCAAACAAAACCCAAAG),
CRR2-a (CACTCGTGTTTTACTCAGGAA), CRR2-b
(CAGGCAGACGGGCGGTTTAGC), noaCRR1-a (GCCACCTGCTACACTGCTGACT), noaCRR1-b (CCGACTACAACCATACGAGACG), noaCRR2-a (TCATAACTTCACACGCTCCAAT), and noaCRR2-b (TGCAATCGCTACACCACAAACG). DNA fragments corresponding to
LTRs of each subfamily were amplified from the genomic
DNA of Nipponbare and were labeled as FISH probes. Polymerase chain reaction (PCR) conditions were 30 cycles at
94°C for 30 s, 55°C for 30 s, and 72°C for 1 min. Plasmid
pRCS2 (Dong et al. 1998) was used as a probe to detect the
rice centromere-specific satellite CentO.
Results
Divergence of the CRR Elements
The CR elements can be grouped into “autonomous”
and “nonautonomous” subfamilies (Langdon et al. 2000).
The autonomous CR elements are full-size elements. The
nonautonomous CR elements have an internal deletion
leading to the loss of all enzymatic functions, resulting
in the retrotransposons having only LTRs, 5′ untranslated
region (UTR), and a gag structural gene fragment (Langdon
et al. 2000) (fig. 1A). Phylogenetic studies revealed that the
full-size CR elements in maize can be grouped into two distinct subfamilies (Nagaki et al. 2003). The LTRs and 5′
UTRs between the two subfamilies are more diverged than
the pol and gag regions (Nagaki et al. 2003).
We searched all of the sequenced rice BACs/PACs in
GenBank using three published CRR sequences (Cheng
et al. 2002; Nagaki et al. 2003) as queries. We were able
to identify 72, 69, 60, and 53 putative CRR-containing
BACs/PACs from chromosomes 1, 4, 8, and 10, respectively. These four chromosomes, including the centromeres
of chromosomes 4 and 8, have been sequenced to high quality (Feng et al. 2002; Sasaki et al. 2002; Yu et al. 2003;
Nagaki et al. 2004; Wu et al. 2004; Zhang et al. 2004)
(http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml).
Putative CRR elements were individually analyzed
by comparing with the previously reported CRR sequences
using MegAlign software. A total of 32 autonomous elements, 86 nonautonomous elements, and 116 solo LTRs
were identified from this analysis (accession numbers
AY827956 to AY828189).
Intact LTRs from the CRR elements were analyzed
using ClustalX together with the previously reported
CRR sequences and CR elements from maize, including
CRM1 and CRM2 (Nagaki et al. 2003) and CentA (a nonautonomous CRM element) (Ananiev, Phillips, and Rines
1998). The CRR elements were clustered into four groups,
with the branches having more than 97 per 100 bootstrap
test values (fig. 2). Two of the four clusters include only
LTRs from the autonomous CRR elements, and the other
two clusters include LTRs only from the nonautonomous
CRR elements (fig. 2). We named the two autonomous
clusters CRR1 and CRR2, respectively, because of their
sequence similarities to CRM1 and CRM2 in maize
(Nagaki et al. 2003) (fig. 2). The two nonautonomous clusters were arbitrarily named noaCRR1 and noaCRR2,
respectively.
FIG. 1.—(A) The structures of autonomous (CRR1 and CRR2) and
nonautonomous (noaCRR1 and noaCRR2) CRR elements. (B) Conserved
DNA motifs among CRR elements from all four CRR subfamilies.
Structure of the CRR Elements from Different
Subfamilies
Despite the sequence divergence among the four
subfamilies, all CRR elements have a conserved primer-binding site (PBS) complementary to 12 bp at the 3′ end
of tRNAini from wheat (Sprinzl et al. 1999), which is
located 2 bp downstream of the 3′ end of the 5′ LTR,
as well as a polypurine tract (PPT) located immediately
upstream of the 3′ LTR (fig. 1B). The termini of the LTRs
contain the inverted repeat motif TGATG/CATCA that is
strongly conserved among all the CRR elements. An additional common feature of the CRR elements is an A-rich
stretch within the 5′ UTR, although its length varies among
the four CRR subfamilies.
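The inverted-repeat relationship between the two terminal motifs can be verified mechanically: the motif at the 3′ end of an LTR should be the reverse complement of the motif at its 5′ end. The following is a minimal sketch of that check; the helper names and the toy LTR string are illustrative assumptions and are not sequences from the data set.

```python
# Illustrative check of the LTR terminal inverted repeat (TGATG ... CATCA).
# Helper names and the toy LTR below are hypothetical, not real CRR data.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeat(ltr: str, motif: str = "TGATG") -> bool:
    """True if the LTR starts with the motif and ends with its reverse complement."""
    ltr = ltr.upper()
    return ltr.startswith(motif) and ltr.endswith(reverse_complement(motif))


toy_ltr = "TGATG" + "ACGTTGCAAGGCT" + "CATCA"  # made-up internal sequence
print(reverse_complement("TGATG"))            # CATCA
print(has_terminal_inverted_repeat(toy_ltr))  # True
```

A check of this kind only confirms the presence of the terminal motifs; it says nothing about whether the integrase actually recognizes them.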
A search for conserved domains within the polyprotein,
performed using RPS-Blast (Marchler-Bauer et al. 2003),
identified GAG, zinc-finger, protease
(PRO), reverse-transcriptase (RT), and integrase (IN)
domains (table 2). The ribonuclease H (RH) domain was
identified based on the presence of the typical DEDD motif
(Malik and Eickbush 2001). The CRR1/CRR2 elements
contain all characteristic protein domains required for retrotransposition. The noaCRR1 elements contain only a partial GAG domain lacking at least part of the nucleocapsid
domain defined by the zinc finger. The noaCRR2 elements
show heterogeneous structures. Among the six noaCRR2
elements analyzed, two of them show a similar structure
to noaCRR1 elements and contain a partial GAG domain,
and two elements have a complete GAG, together with the
PRO domain. The remaining two elements contain no coding regions (fig. 1A). Besides these domains, putative proteins in the nonautonomous elements contain relatively
large downstream regions, which have no similarity to
the polyprotein of the autonomous elements. Similar to previously described CR elements (Langdon et al. 2000), the
putative coding regions of newly identified CRR elements
from all four groups extended into the 3′ LTR.
Sequence comparisons among representative elements
from the four subfamilies showed that the CRR1 and CRR2
elements have a conserved pol region; yet the LTRs, 5′
UTR, and gag regions were diverged between the two subfamilies (fig. 3). The noaCRR1 and noaCRR2 elements
have partial homology with the autonomous elements
within LTRs, 5′ UTR, and gag regions (fig. 3). The
LTR regions of noaCRR1 show the highest sequence similarity with the LTR regions of the autonomous CRR elements. In contrast, the gag-pro coding region of noaCRR2
has the highest sequence similarity with the corresponding
region of the autonomous elements (fig. 3).
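The pairwise comparisons in figure 3 are dotplots drawn with a 60% match criterion and a window size of 30. As a rough sketch of how such a plot can be computed, the function below places a dot wherever two 30-bp windows agree at 60% or more of their positions. The function name, the brute-force algorithm, and the toy sequence are assumptions for illustration; the software actually used may score windows differently.

```python
# Minimal dotplot sketch: a dot is recorded at (i, j) when the windows
# starting at seq_a[i] and seq_b[j] match at min_identity or more of their
# positions. Names and defaults are illustrative assumptions.

def dotplot(seq_a: str, seq_b: str, window: int = 30, min_identity: float = 0.6):
    """Return a list of (i, j) window start pairs that pass the identity threshold."""
    threshold = min_identity * window
    dots = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(1 for x, y in zip(win_a, win_b) if x == y)
            if matches >= threshold:
                dots.append((i, j))
    return dots


toy = "ACGT" * 10          # 40-bp toy repeat, not a CRR sequence
dots = dotplot(toy, toy)   # self-comparison
print((0, 0) in dots)      # True: the main diagonal is always present
print((0, 1) in dots)      # False: a 1-bp shift of this repeat matches nowhere
```

The nested loops make this quadratic in sequence length, which is acceptable for a pair of elements a few kilobases long but not for whole BACs.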
Table 2
Putative Proteins Encoded by CRR Elements
Element                   Domain              Score    Position
CRR1_CH1-2 (1,632 aa)     GAG (pfam037032)    1e–16    194–289 (zinc finger at 398–411)
                          PRO                 —        491–536
                          RT (pfam00078)      5e–28    779–947
                          RH                  —        1047–1152 (DEDD motif)
                          IN (pfam00665)      5e–26    1286–1442 (HHCC motif at 1229–1269)
CRR2_CH10-1 (1,640 aa)    GAG (pfam037032)    7e–12    192–288 (zinc finger at 392–405)
                          PRO                 —        480–525
                          RT (pfam00078)      4e–29    752–921
                          RH                  —        1019–1124 (DEDD motif)
                          IN (pfam00665)      1e–26    1260–1416 (HHCC motif at 1201–1241)
noaCRR1_CH4-5 (608 aa)    GAG (pfam037032)    7e–13    81–171
noaCRR2_CH1-3 (767 aa)    GAG (pfam037032)    1e–15    182–276 (zinc finger at 372–386)
                          PRO (COG3577)       1.2      463–508
noaCRR2_CH4-2             GAG (pfam037032)    1e–07    91–160
Distribution of the CRR Elements in Rice
Chromosomes 1, 4, 8, and 10
We plotted the CRR sequences on the genetic maps of
rice chromosomes 1, 4, 8, and 10 (fig. 4). Each linkage map
is divided into 10-cM units, with one of the units spanning
the genetically mapped centromeres (Harushima et al.
1998). The majority of the full-size CRR1, CRR2, and
noaCRR1 elements, along with their solo LTRs, are located
in the centromeric regions (fig. 4). In contrast, we did not
find noaCRR2 elements in two of the four centromeres
analyzed.
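The plotting step amounts to counting elements per 10-cM interval of the genetic map. A minimal sketch of that tally is given below; the function name and the map positions are hypothetical and serve only to show the binning.

```python
# Sketch of binning element positions into 10-cM map units.
# The positions below are invented for illustration, not taken from fig. 4.
from collections import Counter


def count_per_unit(positions_cm, unit_cm=10.0):
    """Count elements per map interval; key k covers [k*unit_cm, (k+1)*unit_cm)."""
    return Counter(int(pos // unit_cm) for pos in positions_cm)


toy_positions = [3.2, 48.1, 51.7, 52.0, 55.9, 58.4, 61.0, 97.5]  # hypothetical cM
counts = count_per_unit(toy_positions)
for k in sorted(counts):
    print(f"{k * 10}-{(k + 1) * 10} cM: {counts[k]}")
```

In this toy example the 50-60 cM unit holds four of the eight elements, the kind of peak that would be expected for the unit spanning a genetically mapped centromere.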
PCR-amplified DNAs using primers specific to the
LTRs of each of the four subfamilies were used as probes
for FISH analysis on rice pachytene chromosomes. In
general, the FISH signals derived from the LTR probes
of CRR1, CRR2, and noaCRR1 have a similar pattern
(fig. 5). However, the signals from the CRR1 and CRR2
probes are more concentrated in the centromeric and/or
pericentromeric regions than are those from the noaCRR1
probe. Unambiguous signals outside of the pericentromeric
regions were more frequently observed with the noaCRR1
probe than with the CRR1 and CRR2 probes (fig. 5). These
results are consistent with the distribution patterns
generated from sequence plotting (fig. 4). The noaCRR2
probe generated only weak signals that were inconclusive
on their centromere-specificity (data not shown).
Plant centromeres contain long tracts of satellite
repeats, which prevent cloning and sequencing efforts
(Henikoff 2002). Thus far, only the centromeres of rice
chromosomes 4 and 8 have been sequenced owing to the
limited amount of the centromeric satellite repeat in these
two chromosomes (Cheng et al. 2002; Nagaki et al. 2004;
Wu et al. 2004; Zhang et al. 2004). Quantitative FISH
analysis estimated that the centromeres of rice chromosomes 1 and 10 contain approximately 1,400 and 500 kb
of the CentO repeat (Cheng et al. 2002). However, only
70 kb and 2 kb of CentO sequences are reported in the most
updated chromosomes 1 and 10 sequences (http://www.
tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml). Thus,
the collections of CRR elements from chromosomes 1
and 10 are most likely not complete, as the CRR elements
frequently insert into the CentO arrays (Cheng et al. 2002),
which is reflected by the fact that chromosomes 1 and 10
have fewer CRR elements than do chromosomes 4 and
8 in our collection (fig. 4).
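The size of the gap can be made explicit by dividing the amount of CentO present in the sequence by the amount estimated by quantitative FISH. The short calculation below uses only the four values quoted above; the variable and function names are illustrative.

```python
# Fraction of the FISH-estimated CentO array that is present in the current
# sequence, using the kb values quoted in the text.
cento_kb = {
    "chromosome 1": {"estimated": 1400, "sequenced": 70},
    "chromosome 10": {"estimated": 500, "sequenced": 2},
}


def sequenced_fraction(estimated_kb: float, sequenced_kb: float) -> float:
    """Return sequenced CentO as a fraction of the estimated array size."""
    return sequenced_kb / estimated_kb


for chrom, kb in cento_kb.items():
    frac = sequenced_fraction(kb["estimated"], kb["sequenced"])
    print(f"{chrom}: {frac:.1%} of the estimated CentO array is in the sequence")
```

Roughly 5% of the chromosome 1 array and 0.4% of the chromosome 10 array are therefore represented, which is why CRR elements embedded in CentO are expected to be undercounted on these two chromosomes.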
Age of the CRR Elements
LTR nucleotide identity was used to estimate the ages
of the CRR elements using a reported nucleotide substitution rate of 6.5 × 10⁻⁹ (Gaut et al. 1996). Average age,
standard deviation, youngest age, and oldest age of the
CRR elements from each of the four subfamilies are listed
in table 3. The average ages of the two autonomous subfamilies, CRR1 and CRR2, are approximately 0.44 and
0.87 Myr, respectively. In contrast, the average ages of the
noaCRR1 and noaCRR2 elements are 2.13 and 3.91 Myr,
respectively; these elements are significantly older than the autonomous elements. We did not find a correlation between the
age and the chromosomal locations of the CRR elements
(data not shown).
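Because the two LTRs of an element are identical at the time of insertion, their divergence K can be converted to an insertion time. The sketch below assumes the commonly used relation T = K / (2r) with an uncorrected proportion of mismatches as K and r = 6.5 × 10⁻⁹ per site per year; the exact distance measure applied to the CRR elements is not restated here, so the function names and the choice of an uncorrected distance are assumptions.

```python
# Sketch of LTR-based dating, assuming T = K / (2r) with an uncorrected
# pairwise distance K. The distance model is an assumption for illustration.

RATE = 6.5e-9  # substitutions per site per year (Gaut et al. 1996)


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatches between two pre-aligned LTRs, skipping gap columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def insertion_age_myr(ltr5: str, ltr3: str, rate: float = RATE) -> float:
    """Estimated time since insertion in Myr."""
    return ltr_divergence(ltr5, ltr3) / (2 * rate) / 1e6


identical = "ACGT" * 25                   # toy 100-bp LTR
one_change = "T" + identical[1:]          # same LTR with one substitution
print(insertion_age_myr(identical, identical))             # 0.0
print(round(insertion_age_myr(identical, one_change), 2))  # 0.77
```

Under these assumptions a single substitution per 100 aligned bases corresponds to about 0.77 Myr, and a pair of identical LTRs gives an age of 0.00 Myr, as seen for the youngest elements in table 3.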
Association of CRR Elements with CenH3-Associated
Chromatin
Rice chromosome 8, including the centromere, has
been sequenced (Nagaki et al. 2004; Wu et al. 2004) (http://
www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml).
Chromosome 8 includes a total of 61 CRR1, CRR2, and
noaCRR1 elements and solo LTRs derived from these three
subfamilies. Among these CRR elements, 45 are located in
the genetically defined centromere (fig. 4), and 21 are
located within the approximately 750-kb CenH3-associated
chromatin domain (Nagaki et al. 2004). The only noaCRR2
element associated with chromosome 8 is not located in the
centromere (fig. 4).
To further confirm whether all four CRR subfamilies
are associated with CenH3-associated chromatin, we
conducted ChIP-PCR analysis using the rice anti-CenH3
antibody (Nagaki et al. 2004) and PCR primers designed
to the LTR regions (table 1). The rDNA, which is located
in the subtelomeric regions of rice chromosomes (Fukui,
Ohmido, and Khush 1994), was used as a control in the
ChIP-PCR analysis. A set of PCR primers was also
constructed from the LTR of RIRE3, which is one of the
most dominant Ty3-gypsy retrotransposons in
the rice genome and was found in the centromeric regions
of rice chromosomes 5 (Nonomura and Kurata 2001)
and 8 (Nagaki et al. 2004).
FIG. 2.—Phylogenetic analysis of the CR elements from rice and maize. Phylogenetic trees are constructed from the LTRs of the CR elements.
Bootstrap values in 100 tests are indicated on the branches.
FIG. 3.—Dotplot analysis of the four CRR subfamilies using 60% match and a window size of 30. Representative elements
from the four subfamilies were compared with each other. The specific domains of each element are drawn on the top or left sides of the plots.
FIG. 4.—Distribution of the CRR elements on the genetic maps of rice chromosomes 1, 4, 8, and 10. The CRR elements are plotted on the genetic map
using 10 cM per unit. The genetically mapped centromeric regions (Harushima et al. 1998) are located within one of the 10-cM units that is shown in a
shadowed box. Open bars represent the numbers of the CRR elements located within the genetically mapped centromeric regions.
The relative enrichment (RE) of the CRR1-LTR was 3.8
(standard error [SE] = ±0.6, n = 5) on average, which is
significantly higher (P < 0.003) than the RE of the rDNA
control (0.9, SE = ±0.0, n = 5) (fig. 6). REs of CRR2-LTR and noaCRR1-LTR were 3.9 (SE = ±0.4, n = 5)
and 7.3 (SE = ±0.7, n = 5), respectively, both significantly increased in the immunoprecipitated fraction
(CRR2-LTR: P < 0.0002; noaCRR1-LTR: P < 0.00003).
The RIRE3-LTR was slightly increased (RE = 1.4,
SE = ±0.1, n = 5) compared with the rDNA control (P <
0.004). However, the noaCRR2-LTR was not significantly
increased (RE = 0.9, SE = ±0.1, n = 5, P < 0.31) (fig.
6). These results confirmed that at least a portion of the
CRR1, CRR2, and noaCRR1 elements are located within
CenH3-associated chromatin. However, we failed to demonstrate the potential association of the noaCRR2 elements with
CenH3-associated chromatin.
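The RE values reported here are ratios of ratios: the signal of a target (an LTR or rDNA1) is first normalized to the rDNA2 signal within a fraction, and the antibody-bound ratio is then divided by the mock ratio. The sketch below shows that arithmetic together with the mean and standard error over replicates. The function names and all signal intensities are hypothetical; they are not measurements from this study.

```python
# Sketch of the relative enrichment (RE) calculation and its summary statistics.
# All intensities below are invented for illustration.
from math import sqrt
from statistics import mean, stdev


def relative_enrichment(target_ab, ref_ab, target_mock, ref_mock):
    """RE = (target / rDNA2 reference) in the antibody fraction divided by the
    same ratio in the mock fraction."""
    return (target_ab / ref_ab) / (target_mock / ref_mock)


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of replicate RE values."""
    return mean(values), stdev(values) / sqrt(len(values))


# (target_antibody, rDNA2_antibody, target_mock, rDNA2_mock) per replicate
toy_replicates = [
    (8.0, 2.0, 2.0, 2.0),
    (9.0, 3.0, 2.0, 2.0),
    (10.0, 2.0, 2.0, 2.0),
]
re_values = [relative_enrichment(*rep) for rep in toy_replicates]
print(re_values)               # [4.0, 3.0, 5.0]
print(mean_and_se(re_values))
```

An RE close to 1, as obtained for the rDNA control, indicates no enrichment in the antibody-bound fraction relative to the mock.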
Discussion
Evolution of the CRR Elements
Several types of nonautonomous LTR retrotransposons have been reported recently, including LARDs (large
retrotransposon derivatives) and TRIMs (terminal-repeat
retrotransposons in miniature) (Witte et al. 2001; Jiang
et al. 2002; Havecker, Gao, and Voytas 2004; Kalendar
et al. 2004). Both LARDs and TRIMs lack open reading
frames between the two LTRs but retain a primer-binding
site and a polypurine tract (Havecker, Gao, and Voytas
2004). Therefore, mobilization of LARDs and TRIMs relies
on the retrotransposition machinery provided from the
autonomous elements. Nevertheless, for most of the nonautonomous elements, it is not clear how they are mobilized
and what machinery is utilized for their retrotransposition.
Recently, a relationship between a nonautonomous retrotransposon (Dasheng) and an autonomous element (RIRE2)
has been demonstrated in rice (Jiang, Jordan, and Wessler
2002). Dasheng and RIRE2 share significant sequence similarity within LTRs, PBS, and PPT and have similar chromosomal distribution patterns. In addition, the presence
of chimeric RIRE2-Dasheng elements suggests possible
copackaging of RNAs from both elements in the same
viruslike particle (VLP) (Jiang, Jordan, and Wessler 2002).
The discovery of autonomous and nonautonomous
CRR elements provides another example of evolution of
nonautonomous retrotransposon families. The full-size
CRR elements encode a polyprotein with all characteristic domains. Although some of the CRR1/CRR2 elements
seem to be slightly mutated within the coding region, others
seem to have the region intact, implying that they could
still be capable of autonomous transposition, which is in agreement with the young age estimated for these elements
(table 3). The noaCRR elements have different structures
(fig. 1A). Most noaCRR elements contain the gag or a
gag-pro gene, which is different from LARDs and TRIMs,
which contain no open reading frames. The CRR and
noaCRR elements share substantial sequence similarity
in the LTRs and have fully conserved PBS and PPT regions
(fig. 1B). There is also a strongly conserved heptanucleotide
inverted repeat at the termini of LTRs (fig. 1B). The
inverted repeats differ in sequence and length among different retroelements and are important for recognition by integrase (Hindmarsh and Leis 1999). Thus, the conservation of
these sites suggests that autonomous and nonautonomous
elements use the same or very similar enzyme machinery.
The presence of young noaCRR elements in the rice
genome (table 3) coupled with similar chromosomal distribution between noaCRR1 and CRR1/CRR2 elements further suggests that the noaCRR elements are likely mobilized
through the retrotransposition machinery from CRR elements, a scenario similar to that of Dasheng and RIRE2.
FIG. 5.—FISH mapping on rice pachytene chromosomes using LTR sequences derived from CRR2 (A–C) and noaCRR1 (D–F). (A) and (D): FISH
signals derived from CentO probe pRCS2; (B) and (E): FISH signals derived from the LTR sequences of CRR2 and noaCRR1, respectively; (C) and (F):
Merged images. Arrowheads in (E) and (F) point to some of the noncentromeric FISH signals derived from the noaCRR1 LTR probe. Chromosomes are
stained by 4′,6-diamidino-2-phenylindole in blue. Bars represent 10 μm.
It is interesting to note that most noaCRR elements
contain the gag or gag-pro genes, a feature different from
LARDs and TRIMs. As reverse transcription takes place
only in the VLPs, nonautonomous elements must have a
mechanism that allows their RNA to be packaged during
the assembly of VLPs. This can be achieved by the presence
of encapsidation signals that should be conserved among
autonomous and nonautonomous elements. Several
candidate regions could be identified within LTR, 5′
UTR, and gag (data not shown). However, it remains unclear which of them, if any, serves as an actual encapsidation signal. An alternative scenario can be envisioned for
some of the noaCRR2 elements. These elements encode
a protein with the GAG and PRO domains, which alone
should be capable of RNA packaging, as was demonstrated
in the case of retroviruses (Swanstrom and Wills 1997). The
remaining enzymes supplied by autonomous elements
could be assembled into VLPs by virtue of protein-protein
interactions between the GAG-PRO and GAG-PRO-POL
polyproteins. Such interactions are well documented in retroviruses and are very important in the process of the virion
assembly (Swanstrom and Wills 1997; Freed 1998). This
scenario cannot be readily applied to noaCRR1 elements,
as they lack a portion of the nucleocapsid domain responsible
for RNA binding. However, even the noaCRR1 proteins are
likely to play some roles during the assembly process, as the
appropriate coding region appears to have evolved under
selective constraint. Alternatively, the discovery of several
noaCRR2 elements lacking the whole coding region suggests that all necessary enzymes could be supplied in trans,
as in the case of the Dasheng and RIRE2 elements (Jiang, Jordan, and
Wessler 2002).
Targeting Specificity of the CRR Elements
The CRR elements are highly concentrated in the centromeric and pericentromeric regions (fig. 5). In the centromere of rice chromosome 8, the CRR elements are highly
enriched within the chromatin domain containing CenH3
(Nagaki et al. 2004). In maize, CRM elements are highly
intermingled with a centromeric satellite repeat CentC,
suggesting that CRM transposed preferentially into CentC
satellite arrays or into other CRM elements. Maize CenH3
is almost exclusively associated with intermingled CRM/
CentC sequences (Jin et al. 2004). These results suggest that
CR elements in both rice and maize transposed preferentially into CenH3-associated chromatin domains.
Table 3
The Age of CRR Elements
Element    Average      Standard     Youngest         Oldest           Number of
           Age (Myr)    Deviation    Element (Myr)    Element (Myr)    Elements Analyzed
CRR1       0.44         0.37         0.10             0.84             3
CRR2       0.87         0.70         0.00             2.12             9
noaCRR1    2.13         2.22         0.00             8.82             30
noaCRR2    3.91         2.73         0.20             6.18             5
FIG. 6.—ChIP-PCR analysis using the rice anti-CenH3 antibody. The
relative enrichment (RE) of different CRR elements as well as RIRE3 elements in the CenH3-associated chromatin is compared with the RE of the
rDNA control. Mean (n = 5) RE levels are shown as histogram bars with
standard error. P-values calculated based on Student’s t-test are shown as
percentages.
In yeast, the Ty3 element integrates only in DNA
encoding the 5′ end of genes transcribed by RNA polymerase III. The mechanism of Ty3 integration appears to
involve the interaction between integration complex and
the TFIIIB component of the PolIII transcription apparatus
(Kirchner, Connolly, and Sandmeyer 1995). The targeting
of the Ty5 element into the heterochromatin domains is
determined by interactions between the targeting domain
of the integrase and the heterochromatin protein Sir4p
(Zhu et al. 2003). The preferential integration of CRR elements within and near the CenH3-associated DNA domain
suggests that the targeting mechanism of CRR elements
may involve an interaction with centromeric proteins.
CenH3, a histone H3 variant, would be a good candidate
because it is a constitutive component of the centromeric
chromatin. The Tf1 element of Schizosaccharomyces
pombe preferentially inserts in intergenic regions (Behrens,
Hayles, and Nurse 2000; Singleton and Levin 2002). It has
recently been proposed that Tf1 integration may be controlled by an interaction of the chromodomain located at the
C terminus of the integrase with histone H3 methylated
at lysine 4 (Sandmeyer 2003). The N terminus of CenH3
is significantly diverged from the N terminus of histone
H3 (Henikoff, Ahmad, and Malik 2001), which would provide the specificity for recognition by the CRR elements.
Gorinsek, Gubensek, and Kordis (2004) recently reported
that the CR family shows clear differences in the integrase
sequences from other plant LTR retrotransposons. They
differ in the otherwise conserved sequence motifs in the
C-terminal region of the integrase, such as in the HPVFHS
motif and in two motifs of the chromodomain. It will be of
great interest to test whether the chromodomain of the CRR
integrase interacts with CenH3 in rice.
Interestingly, the nonautonomous CRR elements are
less specific to the centromeric regions compared with the
autonomous CRR elements (figs. 4 and 5). Furthermore,
the LTRs of the noaCRR1 element share more sequence similarity with the LTRs of autonomous elements than with the
LTRs of the noaCRR2 element (fig. 3). In parallel, noaCRR1
elements appear to target the centromeres more frequently
than the noaCRR2 elements (fig. 4), especially considering
the fact that we were not able to reveal an association
between the noaCRR2 elements and CenH3-associated
chromatin (fig. 6). These results support the hypothesis that
the LTR sequences may play a role in centromere specificity
of the CR family (Nagaki et al. 2003). Recognition of centromeric chromatin during noaCRR1 retrotransposition may
be error prone, resulting in lower centromere specificity of
the noaCRR1 elements compared with the CRR1/CRR2 elements. Alternatively, but less likely, noaCRR1 elements
may transpose using the retrotransposition machinery from
other retrotransposon families, which would result in the loss
of the centromeric specificity.
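The LTR comparison invoked above (fig. 3) rests on measuring pairwise similarity between aligned sequences. As a minimal illustrative sketch, and not the procedure used in this study, the Python function below computes percent identity over the gap-free columns of two pre-aligned sequences. The function name `percent_identity` and the toy sequences are hypothetical and do not represent CRR data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing an alignment gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real LTR sequence).
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"
print(round(percent_identity(toy_a, toy_b), 1))
```

Ignoring gapped columns is one of several possible conventions; counting gaps as mismatches would give lower values for sequences that differ by insertions or deletions.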
CRR Elements and Grass Centromere Function
It has been well documented that retrotransposition
within or near genes will generate mutations or alter gene
expression (Kumar and Bennetzen 1999; Hirochika 2001).
However, few retrotransposons have been associated with
specific structural and/or functional roles. For example, the
telomeres of Drosophila chromosomes consist of long
tandem arrays of two non-LTR retrotransposons, HeT-A
and TART. These telomeric retrotransposons have a functional role in preventing the shortening of the chromosome
ends (Pardue and DeBaryshe 2003). A role for the
CR elements in centromere function was proposed mostly
because of their centromere specificity (Miller et al. 1998;
Presting et al. 1998). In maize, the cores of the centromeres
consist primarily of intermingled CRM/CentC sequences
(Jin et al. 2004). Maize CenH3 is associated exclusively
with such intermingled CentC/CRM sequences (Zhong
et al. 2002; Jin et al. 2004). Association of CR elements
with CenH3 has also been demonstrated in rice (Nagaki
et al. 2004) (fig. 5). These recent results strongly suggest
a structural and/or functional role of the CR elements in
grass centromere function.
Jiang et al. (2003) recently proposed that deposition of
CenH3 in centromeres is possibly a transcription-mediated
event. Incorporation of CenH3 into centromeric chromatin
is independent of DNA replication (Shelby, Monier, and
Sullivan 2000; Ahmad and Henikoff 2001; Sullivan and
Karpen 2001). DNA transcription can result in displacement of histone molecules, which may provide an opportunity for CenH3 deposition/replacement (Jiang et al.
2003). DNA transcription in CenH3-associated chromatin
has been reported in a human neocentromere (Saffery et al.
2003) and in the centromere of rice chromosome 8 (Nagaki
et al. 2004). Nakano et al. (2003) recently showed that
activation of centromeric function of ectopically integrated alpha satellite sites on human chromosomes can
be achieved by treatment with histone deacetylase inhibitors, which also increases the acetylation level of histone
H3 and the transcription level of a marker gene within
the ectopic centromeres. This result supports the hypothesized
relationship between DNA transcription and centromere assembly.
LTRs usually diverge faster than the other parts of the
retrotransposons. Even closely related retrotransposon families often have LTRs with no detectable sequence similarity. In contrast, the CR elements from different grass
species share substantial homology in the LTR sequences.
Highly conserved DNA motifs were found in the LTRs of
both autonomous and nonautonomous CR elements from
rice, maize, and barley (Nagaki et al. 2003), which
diverged more than 55 Myr ago (Kellogg 2001). The conservation of LTRs of CR elements from distantly related
grass species suggests a selective pressure at the nucleotide
level. Because the transcriptional regulatory sequences
reside in the LTRs, selection on the LTRs of
CR elements has probably acted on their capacity to initiate
transcription. Transcription of CR elements and/or the
flanking centromeric satellite may be an important component of centromeric chromatin assembly in the grass species
(Jiang et al. 2003).
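Conserved motifs of the kind described above can be located by scanning a multiple alignment for runs of invariant columns. The following Python sketch illustrates that idea under stated assumptions and is not the analysis performed in this study: the function name `conserved_runs`, the minimum run length, and the toy alignment are all hypothetical, and the input is assumed to be already aligned.

```python
from typing import List, Tuple


def conserved_runs(alignment: List[str], min_len: int = 5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges in which every sequence
    carries the same non-gap base for at least `min_len` consecutive columns."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")

    runs = []
    start = None  # first column of the invariant run currently being extended
    for col in range(width):
        column = {seq[col].upper() for seq in alignment}
        invariant = len(column) == 1 and "-" not in column
        if invariant and start is None:
            start = col
        elif not invariant and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and width - start >= min_len:
        runs.append((start, width))
    return runs


# Hypothetical toy alignment (not real LTR sequence).
toy = [
    "ACGTACGGATTACAGG",
    "TCGTACGCATTACAGA",
    "GCGTACGTATTACAGC",
]
print(conserved_runs(toy, min_len=5))
```

Requiring strict invariance is a deliberately simple criterion; a real comparison across distantly related species would more plausibly tolerate occasional substitutions within a motif.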
Acknowledgments
We thank Drs. Dan Voytas and Ning Jiang for their
valuable comments on the manuscript. This research was
supported by grants DE-FG02-01ER15266 and DE-FG02-01ER15265 from the U.S. Department of Energy to
J.J. and C.R.B., respectively. Z.C. is supported by grant
2002AA225011 from the Chinese State High-Tech
Program and grants 30100099 and 30325008 from the
National Natural Science Foundation of China.
Literature Cited
Ahmad, K., and S. Henikoff. 2001. Centromeres are specialized
replication domains in heterochromatin. J. Cell Biol. 153:
101–110.
Ananiev, E. V., R. L. Phillips, and H. W. Rines. 1998. Chromosome-specific molecular organization of maize (Zea mays L.)
centromeric regions. Proc. Natl. Acad. Sci. USA 95:13073–
13078.
Aragon-Alcaide, L., T. Miller, T. Schwarzacher, S. Reader, and
G. Moore. 1996. A cereal centromeric sequence. Chromosoma
105:261–268.
Behrens, R., J. Hayles, and P. Nurse. 2000. Fission yeast retrotransposon Tf1 integration is targeted to 5′ ends of open reading frames. Nucleic Acids Res. 28:4709–4716.
Bushman, F. D. 2004. Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115:
135–138.
Chen, M., P. SanMiguel, A. C. de Oliveira, S. S. Woo, H. Zhang,
R. A. Wing, and J. L. Bennetzen. 1997. Microcolinearity in
sh2-homologous regions of the maize, rice, and sorghum
genomes. Proc. Natl. Acad. Sci. USA 94:3431–3435.
Cheng, Z. K., F. Dong, T. Langdon, S. Ouyang, C. R. Buell, M. H.
Gu, F. R. Blattner, and J. Jiang. 2002. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14:1691–1704.
Cheng, Z., R. M. Stupar, M. Gu, and J. Jiang. 2001. A tandemly
repeated DNA sequence is associated with both knob-like
heterochromatin and a highly decondensed structure in the
meiotic pachytene chromosomes of rice. Chromosoma
110:24–31.
Devos, K. M., J. K. M. Brown, and J. L. Bennetzen. 2002.
Genome size reduction through illegitimate recombination
counteracts genome expansion in Arabidopsis. Genome Res.
12:1075–1079.
Dong, F., J. T. Miller, S. A. Jackson, G. L. Wang, P. C. Ronald,
and J. Jiang. 1998. Rice (Oryza sativa) centromeric regions
consist of complex DNA. Proc. Natl. Acad. Sci. USA
95:8135–8140.
Feng, Q., Y. J. Zhang, P. Hao, et al. (74 co-authors). 2002.
Sequence and analysis of rice chromosome 4. Nature
420:316–320.
Feschotte, C., N. Jiang, and S. R. Wessler. 2003. Plant transposable elements: where genetics meets genomics. Nat. Rev.
Genet. 3:329–341.
Freed, E. O. 1998. HIV-1 gag proteins: diverse functions in the
virus life cycle. Virology 251:1–15.
Fukui, K., N. Ohmido, and G. S. Khush. 1994. Variability in
rDNA loci in the genus Oryza detected through fluorescence
in-situ hybridization. Theor. Appl. Genet. 87:893–899.
Gaut, B. S., B. R. Morton, B. C. McCaig, and M. T. Clegg. 1996.
Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate
differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA
93:10274–10279.
Gorinsek, B., F. Gubensek, and D. Kordis. 2004. Evolutionary
genomics of chromoviruses in eukaryotes. Mol. Biol. Evol.
21:781–798.
Harushima, Y., M. Yano, A. Shomura et al. (17 co-authors). 1998.
A high-density rice genetic linkage map with 2275 markers
using a single F2 population. Genetics 148:479–494.
Havecker, E. R., X. Gao, and D. F. Voytas. 2004. The diversity of
LTR retrotransposons. Genome Biol. 4:225.
Henikoff, S. 2002. Near the edge of a chromosome’s ‘black hole’.
Trends Genet. 18:165–167.
Henikoff, S., K. Ahmad, and H. S. Malik. 2001. The centromere
paradox: stable inheritance with rapidly evolving DNA.
Science 293:1098–1102.
Heslop-Harrison, J. S., A. Brandes, S. Takeda et al. (14 co-authors).
1997. The chromosomal distributions of Ty1-copia group
retrotransposable elements in higher plants and their implications for genome evolution. Genetica 100:197–204.
Hindmarsh, P., and J. Leis. 1999. Retroviral DNA integration.
Microbiol. Mol. Biol. Rev. 63:836–843.
Hirochika, H. 2001. Contribution of the Tos17 retrotransposon
to rice functional genomics. Curr. Opin. Plant Biol. 4:118–122.
Jiang, J., J. B. Birchler, W. A. Parrott, and R. K. Dawe. 2003. A
molecular view of plant centromeres. Trends Plant Sci. 8:
570–575.
Jiang, J., S. Nasuda, F. Dong, C. W. Scherrer, S. Woo, R. A. Wing,
B. S. Gill, and D. C. Ward. 1996. A conserved repetitive DNA
element located in the centromeres of cereal chromosomes.
Proc. Natl. Acad. Sci. USA 93:14210–14213.
Jiang, N., Z. Bao, S. Temnykh, Z. Cheng, J. Jiang, R. A. Wing,
S. R. McCouch, and S. R. Wessler. 2002. Dasheng: a recently
amplified non-autonomous LTR element that is a major
component of pericentromeric regions in rice. Genetics 161:
1293–1305.
Jiang, N., I. K. Jordan, and S. R. Wessler. 2002. Dasheng and
RIRE2. A nonautonomous long terminal repeat element and
its putative autonomous partner in the rice genome. Plant
Physiol. 130:1697–1705.
Jin, W. W., J. R. Melo, K. Nagaki, P. B. Talbert, S. Henikoff,
R. K. Dawe, and J. Jiang. 2004. Maize centromeres: organization and functional adaptation in the genetic background of oat.
Plant Cell 16:571–581.
Kalendar, R., C. M. Vicient, O. Peleg, K. Anamthawat-Jonsson,
A. Bolshoy, and A. H. Schulman. 2004. Large retrotransposon derivatives: abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics
166:1437–1450.
Kellogg, E. A. 2001. Evolutionary history of the grasses. Plant
Physiol. 125:1198–1205.
Kirchner, J., C. M. Connolly, and S. B. Sandmeyer. 1995.
Requirement of RNA-polymerase III transcription factors
for in vitro position-specific integration of a retrovirus-like
element. Science 267:1488–1491.
Kumar, A., and J. L. Bennetzen. 1999. Plant retrotransposons.
Annu. Rev. Genet. 33:479–532.
Langdon, T., C. Seago, M. Mende, M. Leggett, H. Thomas, J. W.
Forster, H. Thomas, R. N. Jones, and G. Jenkins. 2000. Retrotransposon evolution in diverse plant genomes. Genetics
156:313–325.
Malik, H. S., and T. H. Eickbush. 2001. Phylogenetic analysis of
ribonuclease H domains suggests a late, chimeric origin of
LTR retrotransposable elements and retroviruses. Genome
Res. 11:1187–1197.
Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (27
co-authors). 2003. CDD: a curated Entrez database of conserved
domain alignments. Nucleic Acids Res. 31:383–387.
Miller, J. T., F. Dong, S. A. Jackson, J. Song, and J. Jiang. 1998.
Retrotransposon-related DNA sequences in the centromeres of
grass chromosomes. Genetics 150:1615–1623.
Mroczek, R. J., and R. K. Dawe. 2003. Distribution of retroelements in centromeres and neocentromeres of maize. Genetics
165:809–819.
Nagaki, K., Z. K. Cheng, S. Ouyang, P. B. Talbert, M. Kim, K. M.
Jones, S. Henikoff, C. R. Buell, and J. Jiang. 2004. Sequencing
of a rice centromere uncovers active genes. Nature Genet.
36:138–145.
Nagaki, K., J. Song, S. M. Stupar et al. (12 co-authors). 2003.
Molecular and cytological analyses of large tracks of centromeric DNA reveal the structure and evolutionary dynamics of
maize centromeres. Genetics 163:759–770.
Nakano, M., Y. Okamoto, J. I. Ohzeki, and H. Masumoto. 2003.
Epigenetic assembly of centromeric chromatin at ectopic α-satellite sites on human chromosomes. J. Cell Sci. 116:
4021–4034.
Nonomura, K., and N. Kurata. 2001. The centromere composition
of multiple repetitive sequences on rice chromosome 5.
Chromosoma 110:284–291.
Pardue, M. L., and P. G. DeBaryshe. 2003. Retrotransposons
provide an evolutionarily robust non-telomerase mechanism
to maintain telomeres. Annu. Rev. Genet. 37:485–511.
Presting, G. G., L. Malysheva, J. Fuchs, and I. Schubert. 1998. A
Ty3/gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 16:721–728.
Saffery, R., H. Sumer, S. Hassan, L. H. Wong, J. M. Craig, K.
Todokoro, M. Anderson, A. Stafford, and K. H. A. Choo.
2003. Transcription within a functional human centromere.
Mol. Cell 12:509–516.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new
method for reconstructing phylogenetic trees. Mol. Biol. Evol.
4:406–425.
Sandmeyer, S. 2003. Integration by design. Proc. Natl. Acad. Sci.
USA 100:5586–5588.
SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L.
Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nature Genet. 20:43–45.
SanMiguel, P., A. Tikhonov, Y. K. Jin et al. (11 co-authors). 1996.
Nested retrotransposons in the intergenic regions of the maize
genome. Science 274:765–768.
Sasaki, T., T. Matsumoto, K. Yamamoto et al. (79 co-authors).
2002. The genome sequence and structure of rice chromosome
1. Nature 420:312–316.
Shelby, R. D., K. Monier, and K. F. Sullivan. 2000. Chromatin
assembly at kinetochores is uncoupled from DNA replication.
J. Cell Biol. 151:1113–1118.
Singleton, T. L., and H. L. Levin. 2002. A long terminal repeat
retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot. Cell 1:44–55.
Sprinzl, M., K. S. Vassilenko, J. Emmerich, and F. Bauer. 1999.
Compilation of tRNA sequences and sequences of tRNA genes.
http://www.uni-bayreuth.de/departments/biochemie/trna/.
Staden, R. 1996. The Staden sequence analysis package. Mol.
Biotechnol. 5:233–241.
Sullivan, B., and G. Karpen. 2001. Centromere identity in Drosophila is not determined in vivo by replication timing. J. Cell
Biol. 154:683–690.
Swanstrom, R., and J. W. Wills. 1997. Synthesis, assembly, and
processing of viral proteins. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and
D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment
aided by quality analysis tools. Nucleic Acids Res. 25:
4876–4882.
Witte, C. P., Q. H. Le, T. E. Bureau, and A. Kumar. 2001.
Terminal-repeat retrotransposons in miniature (TRIM) are
involved in restructuring plant genomes. Proc. Natl. Acad.
Sci. USA 98:13778–13783.
Wu, J. Z., H. Yamagata, M. Hayashi-Tsugane et al. (21 co-authors). 2004. Composition and structure of the centromeric
region of rice chromosome 8. Plant Cell 16:967–976.
Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences.
EMBO J. 9:3353–3362.
Yu, Y. S., T. Rambo, J. Currie et al. (104 co-authors). 2003. In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300:1566–1569.
Zhang, Y., Y. C. Huang, L. Zhang et al. (12 co-authors). 2004.
Structural features of the rice chromosome 4 centromere.
Nucleic Acids Res. 32:2023–2030.
Zhong, C. X., J. B. Marshall, C. Topp, R. Mroczek, A. Kato, K.
Nagaki, J. A. Birchler, J. M. Jiang, and R. K. Dawe. 2002. Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14:2825–2836.
Zhu, Y., J. Dai, P. G. Fuerst, and D. F. Voytas. 2003. Controlling
integration specificity of a yeast retrotransposon. Proc. Natl.
Acad. Sci. USA 100:5891–5895.
Spencer V. Muse, Associate Editor
Accepted December 14, 2004