Alu Monomer Revisited: Recent Generation of

Alu Monomer Revisited: Recent Generation of Alu Monomers
Kenji K. Kojima*,1,2
1
Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-Ku,
Yokohama, Kanagawa, Japan
2
Genetic Information Research Institute, Mountain View, California
*Corresponding author: E-mail: [email protected].
Associate editor: Norihiro Okada
Abstract
Alu is a predominant short interspersed element (SINE) family in the human genome and consists of two monomer units
connected by an A-rich linker. At present, dimeric Alu elements are active in humans, but Alu monomers are present as
fossilized sequences. A comparative genome analysis of human and chimpanzee genomes revealed eight recent insertions
of Alu monomers. One of them was a retroposed product of another Alu monomer with 3# transduction. Further analysis
of 1,404 loci of the Alu monomer in the human genome revealed that some Alu monomers were recently generated by
recombination between the internal and 3# A-rich tracts inside of dimeric Alu elements. The data show that Alu
monomers were generated by 1) retroposition of other Alu monomers and 2) recombination between two A-rich tracts.
Key words: Alu, monomer, dimer, retroposition, recombination, evolution.
Supplementary Material online). Among them, five
chimpanzee-specific insertions were generated by the
retroposition of BC200 RNA (Kuryshev et al. 2001). Of
the remaining eight Alu monomer insertions, five were
similar to the left monomer of the AluY lineage; two to
the left monomer of the AluS lineage; and one to the left
monomer of the AluJ lineage.
h14_44393 was an AluJ-like monomer on chromosome
13, and the sequence most similar to h14_44393 was also
a monomer (PM_h14_44393) located on chromosome 19
(fig. 1). The two human Alu monomers differed from each
other with respect to only two nucleotides. Alu monomers
orthologous to PM_h14_44393 were found in the genomes
of the chimpanzee, orangutan, gibbon, and baboon. There
is no insertion at the orthologous locus in the mouse lemur
genome. The comparison of uninserted and inserted loci
revealed that PM_h14_44393 was inserted with 12-bp
target site duplications (TSDs) as a monomer after the split
of Strepsirrhini (lemurs, lorises, and bush babies) and
before the split between apes and Old World monkeys.
Notably, h14_44393 contained the 3# flanking TSD of
PM_h14_44393, clearly showing that PM_h14_44393 is
the parental element of h14_44393. Distinct TSDs and
flanking sequences of h14_44393 and PM_h14_44393
show that h14_44393 was not generated by segmental duplication of PM_h14_44393. It indicates that h14_44393 is
generated by retroposition of PM_h14_44393 with 3#
flanking sequence transduction. It is noteworthy that retrocopies of BC200 RNA also showed 3# transduction downstream of their polyA tracts (supplementary material S1,
Supplementary Material online).
Next, all Alu monomers were surveyed in the human
genome. There were 1,404 Alu monomers, and 304 of them
were more similar to dimeric Alu than ancient Alu monomers
© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 28(1):13–15. 2011 doi:10.1093/molbev/msq218
Advance Access publication August 16, 2010
13
Letter
Alu is the most abundant repetitive sequence accounting
for ;11% of the human genome (Lander et al. 2001). A
typical Alu sequence is a dimer consisting of two 7SL
RNA-derived monomer units connected by an A-rich linker
(Ullu and Tschudi 1984). Dimeric Alus are primarily
classified into three lineages (J, S, and Y) (Britten et al.
1988; Jurka and Smith 1988; Batzer et al. 1996). AluJ
elements are inactive in humans, whereas some AluS
and AluY elements actively retropose in the human
genome (Bennett et al. 2008).
Dimeric Alu elements are considered to be originally
generated by a fusion between free left Alu monomer
(FLAM)-C and free right Alu monomer (Quentin 1992a).
Two other types of Alu monomers, fossil Alu monomer
(FAM) and FLAM-A (or free left Alu) are likely the
precursors of these monomeric Alu elements (Jurka and
Zuckerkandl 1991; Quentin 1992b). FLAM-A is nearly identical to the rodent short interspersed element (SINE) family
PB1 (Quentin 1994), and SINEs derived from FLAM-A/PB1
have been identified in the genomes of rodents, tree shrews,
and primates (Nishihara et al. 2002; Kriegs et al. 2007). It
means that FLAM-A/PB1 originated from the common
ancestor of Supraprimates, which includes primates, tree
shrews, flying lemurs, rodents, and lagomorphs. No Alu
monomers have been reported to be active in the human
genome with the exception of BC200 (Kuryshev et al. 2001).
BC200 is a small non–protein coding RNA gene derived by
a FLAM-like monomer insertion shared by New World
monkeys, Old World monkeys, and apes. More than 200
retroposed copies of BC200 are found in the human genome.
On analyzing human-specific and chimpanzee-specific
insertions (Kojima and Okada 2009; Kojima 2010), six
human-specific and seven chimpanzee-specific Alu monomer insertions were found (supplementary material S1,
MBE
Kojima · doi:10.1093/molbev/msq218
FIG. 1 Alignment of h14_44393, PM_h14_44393, and Alu consensus sequences. TSDs are in lowercase. Substitutions shared among
PM_h14_44393 and h14_44393 are shaded except CpG sites.
(supplementary materials S2 and S3, Supplementary Material online). Orthologous loci from eight primate species
(chimpanzee, orangutan, gibbon, macaque, marmoset, tarsier, mouse lemur, and bush baby) were searched for all Alu
monomers.
No Alu monomers shared by the human genome and
the genomes of nonprimates were found (supplementary
material S2, Supplementary Material online). One FAM
insertion was not present at the orthologous locus of
the tree shrew genome (supplementary material S2, Supplementary Material online). This shows that FAM was
active after the branching of primates from tree shrews.
Three uninserted empty loci for human FLAM-A insertions
were identified from the tarsier genome (supplementary
material S2, Supplementary Material online); thus,
FLAM-A was still active after the branching of tarsiers. Only
one FLAM-C insertion was traced back to the common
ancestor of all primates. Recent generations of FLAM-C
include BC200 retroposition and recombination of polyA
tracts of existing AluJ elements as described below.
Several orthologous loci for human Alu monomers in
some other primates were occupied by dimeric Alu elements.
If at least two successive sister groups have dimeric Alu
insertions instead of the Alu monomer, the dimeric Alu is
likely to have been replaced by the Alu monomer on the
lineage leading to humans. Four loci with such distribution
were found (table 1). In all four cases, TSDs were conserved.
In the case of human_chr16_2851, the human and chimp
genomes had AluY-like monomers (fig. 2). The genomes of
the orangutan, gibbon, macaque, and baboon had AluY-like
dimers instead. High similarity between the Alu monomers
and the left monomers of dimeric Alu indicates that the
Alu monomer was derived from the deletion of the right
Alu monomer through recombination between the internal
and 3# A-rich tracts. The other three loci also showed high
similarity between the Alu monomer and the left monomer
of dimeric Alu (supplementary material S4, Supplementary
Material online). The FLAM-C monomer, human_chr1_
4450, was likely derived from an AluJ insertion. The internal
A-rich tract was extended in all dimeric Alus at orthologous
loci of Alu monomers. It is likely that this extended polyA
tract recombined with the polyA tail to generate the Alu
monomer.
This study indicated that at least two mechanisms contribute to the generation of recent Alu monomers (fig. 3). The
first mechanism is represented by h14_44393 (fig. 1). This
14
monomer was generated by retroposition of another monomer, that is, PM_h14_44393. The second mechanism is
represented by human_chr16_2851 (fig. 2). Dimeric Alu
insertion and subsequent recombinational deletion of the
right Alu monomer result in human_chr16_2851 monomer.
Recombination is likely enhanced by the extension of
A-rich tracts.
Methods
Genome sequences were downloaded from the University of California Santa Cruz Genome Browser (http://
hgdownload.cse.ucsc.edu/downloads.html), the Washington
University Genome Sequencing Center (http://genome.wus
tl.edu/), the Broad Institute Mammalian Genome Project
Web site (http://www.broad.mit.edu/mammals/), and the
Trace Archive database at the National Center for Biotechnology Information (NCBI) Web site (http://www.ncbi.nlm
.nih.gov/).
Human-specific and chimpanzee-specific Alu insertions
were collected as described in previous reports (Kojima and
Okada 2009; Kojima 2010). 3#-truncated Alu insertions
were manually analyzed.
All Alu insertions in the human genome were characterized with the aid of RepeatMasker (http://www.repeatmasker
.org/) and the RepBase library (http://www.girinst.org/
repbase/). Alu insertions that were longer than 100 bp,
flanked by 13–21-bp TSDs, and ended between positions
111 and 149 were selected. Alu subfamily classification
was based on RepeatMasker results. To characterize the
distribution of each human Alu monomer, NCBI Blast
was used (ftp://ftp.ncbi.nlm.nih.gob/blast/executables/)
and the genomes of eight primate species (chimpanzee,
orangutan, gibbon, macaque, marmoset, tarsier, mouse
lemur, and bush baby) were examined in order to manually
determine whether Alu is present or absent at the
corresponding loci.
Table 1. Distribution of Alu Monomers and Dimers at
Orthologous Loci.
Insertion at Orthologous Loci
Alu Monomer
human_chr1_4450
human_chr6_4284
human_chr14_1978
human_chr16_2851
Family
FLAM-C
AluS
AluS
AluY
H
M
M
M
M
C
M
M
M
M
O
M
D
M
D
G
M
ND
M
D
Mac
D
D
D
D
Mar
D
D
D
U
NOTE.—H, human; C, chimpanzee; O, orangutan; G, gibbon; Mac, macaque; Mar,
marmoset, M, monomer; D, dimer; ND, not determined; U, uninserted.
MBE
Recent Generation of Alu Monomers · doi:10.1093/molbev/msq218
FIG. 2 Alignment of orthologous loci of human_chr16_2851 locus in primates. TSDs are shaded.
Supplementary Material
Supplementary materials S1–S4 are available at
Molecular Biology and Evolution online (http://www.
mbe.oxfordjournals.org/).
Acknowledgments
The author thanks Dr. Norihiro Okada for useful suggestions. This work was supported by grants from the Ministry
of Education, Culture, Sports, Science, and Technology.
K.K.K. was the recipient of a Grant-in-Aid from the Japan
Society for the Promotion of Science for Young scientists.
References
Batzer MA, Deininger PL, Hellmann-Blumberg U, Jurka J, Labuda D,
Rubin CM, Schmid CW, Zietkiewicz E, Zuckerkandl E. 1996.
Standardized nomenclature for Alu repeats. J Mol Evol. 42:3–6.
Bennett EA, Keller H, Mills RE, Schmidt S, Moran JV,
Weichenrieder O, Devine SE. 2008. Active Alu retrotransposons
in the human genome. Genome Res. 18:1875–1883.
Britten RJ, Baron WF, Stout DB, Davidson EH. 1988. Sources and
evolution of human Alu repeated sequences. Proc Natl Acad Sci
U S A. 85:4770–4774.
Jurka J, Smith T. 1988. A fundamental division in the Alu family of
repeated sequences. Proc Natl Acad Sci U S A. 85:4775–4778.
Jurka J, Zuckerkandl E. 1991. Free left arms as precursor molecules in
the evolution of Alu sequences. J Mol Evol. 33:49–56.
Kojima KK. 2010. Different integration site structures between L1
protein-mediated retrotransposition in cis and retrotransposition in trans. Mobile DNA. 1:17.
Kojima KK, Okada N. 2009. mRNA retrotransposition coupled with
5# inversion as a possible source of new genes. Mol Biol Evol.
26:1405–1420.
Kriegs JO, Churakov G, Jurka J, Brosius J, Schmitz J. 2007.
Evolutionary history of 7SL RNA-derived SINEs in Supraprimates.
Trends Genet. 23:158–161.
Kuryshev VY, Skryabin BV, Kremerskothen J, Jurka J, Brosius J. 2001.
Birth of a gene: locus of neuronal BC200 snmRNA in three
prosimians and human BC200 pseudogenes as archives of
change in the Anthropoidea lineage. J Mol Biol. 309:1049–1066.
Lander ES, Linton LM, Birren B, et al. (100 co-authors). 2001. Initial
sequencing and analysis of the human genome. Nature
409:860–921.
Nishihara H, Terai Y, Okada N. 2002. Characterization of novel Aluand tRNA-related SINEs from the tree shrew and evolutionary
implications of their origins. Mol Biol Evol. 19:1964–1972.
Quentin Y. 1992a. Fusion of a free left Alu monomer and a free right
Alu monomer at the origin of the Alu family in the primate
genomes. Nucleic Acids Res. 20:487–493.
Quentin Y. 1992b. Origin of the Alu family: a family of Alu-like
monomers gave birth to the left and the right arms of the Alu
elements. Nucleic Acids Res. 20:3397–3401.
Quentin Y. 1994. A master sequence related to a free left Alu
monomer (FLAM) at the origin of the B1 family in rodent
genomes. Nucleic Acids Res. 22:2222–2227.
Ullu E, Tschudi C. 1984. Alu sequences are processed 7SL RNA
genes. Nature 312:171–172.
FIG. 3 Mechanisms for recent generation of Alu monomers. (A) Alu monomer is transcribed with its 3# flanking sequence, and the Alu RNA is
retrotransposed into another locus. (B) The polyA tract at the middle of dimeric Alu and the polyA tail are extended, and the latter half of
dimeric Alu is deleted through the recombination between these two polyA tracts.
15