Hominoid Composite Non-LTR Retrotransposons—Variety,
Assembly, Evolution, and Structural Determinants of
Mobilization
Bianca Ianc,†,1 Cornelia Ochis,†,1 Robert Persch,2 Octavian Popescu,1,3 and Annette Damert*,1
1 Institute for Interdisciplinary Research in Bio-Nano-Sciences, Molecular Biology Center, Babes-Bolyai-University, Cluj-Napoca, Romania
2 Wiesbaden, Germany
3 Institute of Biology, Romanian Academy, Bucharest, Romania
† These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Katja Nowick
Abstract
SVA (SINE-R-VNTR-Alu) elements constitute the youngest family of composite non-LTR retrotransposons in hominoid
primates. The sequence of their assembly, however, remains unclear. Recently, a second family of VNTR-containing
composites, LAVA (L1-Alu-VNTR-Alu), has been identified in gibbons. We now report the existence of two additional
VNTR composite families, PVA (PTGR2-VNTR-Alu) and FVA (FRAM-VNTR-Alu), in the genome of Nomascus leucogenys.
Like LAVA, they share the 5′-Alu-like region and VNTR with SVA, but differ at their 3′-ends. The 3′-end of PVA comprises
part of the PTGR2 gene, whereas FVA is characterized by the presence of a partial FRAM element in its 3′-domain. Splicing
could be identified as the mechanism of acquisition of the variant 3′-ends in all four families of VNTR composites. SVAs
have been shown to be mobilized by the L1 protein machinery in trans. A critical role in this process has been ascribed to
their 5′-hexameric repeat/Alu-like region. The Alu-like region displays specific features in each of the VNTR composite
families/subfamilies, with characteristic deletions found in the evolutionarily younger subfamilies. Using reciprocal exchanges between SVA_E and PVA/FVA elements, we demonstrate that the structure, not the mere presence, of the (CCCTCT)n/Alu-like region determines mobilization capacity.
Combination of LAVA and SVA_E domains does not yield any active
elements, suggesting the use of different combinations of host factors for the two major groups of VNTR composites.
Finally, we demonstrate that the LAVA 3′-L1ME5 fragment attenuates mobilization capacity.
Key words: retrotransposon, SVA, VNTR, Nomascus leucogenys.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 31(11):2847–2864 doi:10.1093/molbev/mst256 Advance Access publication September 12, 2014
Introduction
SVA (SINE-R-VNTR-Alu), a family of hominoid-specific retrotransposons, presents a unique composite structure combining
sequences derived from other retroelements (Alu and HERV-K) with a variable number of tandem repeats
(VNTR) region (Shen et al. 1994; Ostertag et al. 2003). SVA elements appear to be one of the most versatile vectors for
shuffling sequences and, thus, have an appreciable impact on genome evolution. SVAs can transduce heterologous
sequences at both their 5′- (Damert et al. 2009; Hancks et al. 2009) and 3′- (Xing et al. 2006) ends. They have also been
shown to function as exon traps, incorporating coding sequence into the elements proper through splicing (Damert
et al. 2009; Hancks et al. 2009).
SVAs have been shown to be mobilized by the L1 protein machinery in trans. Results regarding dependency on
L1ORF1p are controversial, depending on whether an ORF2p-only expression vector or bicistronic vectors carrying
ORF1 mutations or an in-frame deletion have been used (Hancks et al. 2011; Raiz et al. 2012). Reduction of the
retrotransposition rate following deletion of the SVA 5′-hexameric repeat/Alu-like region suggested a functional role
for this part of the element (Raiz et al. 2012). A more recent analysis revealed that none of the SVA domains is
essential for retrotransposition. Furthermore, it identified the 5′-end as the
“minimal active human SVA” (Hancks et al. 2012).
Recently, a second family of primate composite retrotransposons, LAVA, has been identified in gibbons (Damert A,
unpublished data [Carbone et al. 2012]). LAVA elements share with SVA the 5′-CCCTCT hexameric repeats and
Alu-like region as well as the central VNTR. The 3′-end (LA domain) comprises two unique regions (U1 and U2)
separated by an AluSz sequence and followed by an antisense L1ME5 fragment. Around 1,800 LAVA copies are found in
the genome of the Northern white-cheeked gibbon, Nomascus leucogenys (NLE). They can be subdivided into 22
subfamilies (Carbone et al. 2014).
The shared 5′-end and central VNTR suggest a common ancestor for SVA and LAVA. The prototype SVA element has
most likely been assembled before the split of gibbons and great apes but after the divergence of hominoid and Old
World primates, as SVAs are not found in the Rhesus macaque (Han et al. 2007). 5′-Truncated LAVA copies
(VNTR-LA) are found in regions of segmental duplication in humans
and great apes but are absent from the Rhesus genome
(Damert A, unpublished data). Based on this observation, assembly of the prototype LAVA element can be assumed to
have occurred in approximately the same period as that of
SVA. The sequence and mechanism(s) of the assembly of
these prototype elements are, however, elusive. All four domains constituting an SVA/LAVA element are independently
present in the Rhesus genome. The most likely precursor of
SVA/LAVA is SVA2, a VNTR carrying heterologous sequence
at its 3′-end, terminating in a polyA tail and flanked by target
site duplications (TSD) (Jurka 2000; Jurka et al. 2005; Han et al.
2007). Other assembly intermediates, for example, an SVA2
fused to the Alu-like region characteristic for SVA/LAVA, have
not been identified to date, neither in the Rhesus genome
nor in the genomes of the great apes.
The identification of a second family of VNTR composite
retrotransposons in gibbons (Carbone et al. 2012) suggested
that such intermediates might be found in this sister taxon to
the great apes. To address the issue, we analyzed the genome
of the Northern white-cheeked gibbon, NLE (Carbone et al.
2014). Although we failed to detect SVA assembly
intermediates, we discovered two additional families of
VNTR-containing composites sharing the CCCTCT hexameric
repeats and Alu-like region with SVA and LAVA elements, but
differing at their 3′-ends: PVA (PTGR2-VNTR-Alu) and FVA
(FRAM-VNTR-Alu). The distinct characteristics of the
PVA and FVA VNTR–3′ intersections, as well as of those
found in LAVA and the NLE SVA copies, suggest splicing as
the mechanism of assembly for the founder elements of
VNTR composites.
The identification of a total of four families of
VNTR composite retrotransposons sharing the 5′ and
VNTR domains but differing at their 3′-ends prompted us
to investigate their retrotransposition potential and the contribution of their functional domains in more detail. Using a
previously established cell-based assay (Moran et al. 1996; Raiz
et al. 2012), we found considerable differences in the mobilization potential of PVA and FVA on the one hand and LAVA
elements on the other hand. Results obtained following
reciprocal exchange of the 5′-hexameric repeat/Alu-like
region between PVA/FVA and an active SVA_E element indicate that it is not the mere presence but the specific
sequence-based structure of this domain that determines
retrotransposition efficiency. LAVA and SVA_E domains
were found to be incompatible, suggesting different mobilization pathways for these two families. Finally, we demonstrate
that the LAVA 3′-L1ME5 sequence has an inhibitory effect on
retrotransposition.
Results
The NLE Genome Harbors Four Different Types of
VNTR-Containing Composite Retrotransposons
A search of the NLE whole-genome shotgun sequences (wgs)
(Carbone et al. 2014) using the Alu-like region of the SVA_A
consensus (Wang et al. 2005) retrieved a number of composite elements, which were flanked by TSD and carried CCCTCT
hexameric repeats and the SVA Alu-like region at their 5′-ends (fig. 1). As in SVA, these were followed by VNTR regions
of variable length. At the 3′-ends, however, four different
types of sequences could be distinguished, the SVA SINE-R
being one of them. The most numerous of the families, LAVA
(Carbone et al. 2012), is represented by approximately 1,800
elements, falling into 22 subfamilies (Carbone et al. 2014). The
3′-end of the second largest family does not contain any repetitive sequences. Instead, exon 4 and the 5′-part of intron 4
of the gene encoding prostaglandin reductase 2 (PTGR2) were
found to be fused to the 5′-part of the unique sequence
characteristic for SVA2 elements. The third type of VNTR-containing composites is characterized by a 3′-end including
part of a FRAM (Free Right Alu Monomer) element embedded in otherwise nonrepetitive sequence. According to their
distinguishing features, the two families were called PVA
(PTGR2-VNTR-Alu) and FVA (FRAM-VNTR-Alu). Thus,
there are four different families of VNTR-containing composite retrotransposons present in the NLE genome.
Gibbon SVAs Amplified Independently from Those in
the Great Ape Lineage
Twenty-nine SVA elements were identified in the NLE
genome (table 1 and supplementary table S1, Supplementary
Material online). The consensus sequence constructed is
closest to human SVA_A. It displays, however, diagnostic
substitutions compared with the human elements (fig. 2A
and B). Inspection of the orthologous loci in the human,
chimpanzee, and orangutan genomes revealed that SVA elements are absent from the corresponding positions in great
apes in all cases where this could be assessed. For a small
number of elements, their state could not be determined in
either one or all of the great ape genomes due to the lack of
sequence information. Amplification time based on divergence in the SINE-R region was estimated at 11.3 Ma, placing
expansion of gibbon SVAs well after the split from great apes
(~20 Ma) but before separation of the four gibbon genera
(5.6–8 Ma) (Chan et al. 2010; Israfil et al. 2011).
PVA and FVA Elements—Copy Number and Origin
In total, 143 PVA and 11 FVA elements could be identified in
the NLE genome sequence (table 1 and supplementary tables
S2 and S3, Supplementary Material online). TSDs were found
for most of the elements. Neither PVA nor FVA elements
display a subfamily structure. Their Alu-like regions are
most closely related to SVA_A/SVANLE and the ancestral
LAVA_A2 subfamily (Carbone et al. 2014) (fig. 2A). The
3′-part of PVA elements is constituted by exon 4 and the
5′-part of intron 4 of the PTGR2 gene. A polyadenylation
signal present in PTGR2 intron 4 is used for 3′-end processing
of PVA RNA. In Rhesus macaques the PTGR2 gene is localized
on chromosome 7; the corresponding NLE sequence maps to
chr22a:32,514,746–32,515,039 (fig. 2C). PVA elements represent the second case where an entire exon of a protein-coding
gene has been incorporated into a VNTR composite. The SVA
subfamily F1 acquired exon 1 of MAST2 through splicing to its
Alu-like region (Damert et al. 2009; Hancks et al. 2009).
FIG. 1. VNTR-containing composite retrotransposons in the genome of NLE. All families comprise CCCTCT hexameric repeats and an Alu-like region at
their 5′-ends, but have acquired different sequences downstream of the VNTR region. SVA2, not strictly a VNTR composite, is given for comparison.
Al, partial AluSz sequence; L1, partial L1ME5 sequence; PTGR2, exon 4 and 5′-part of intron 4 of the gene encoding Prostaglandin Reductase 2; FR, partial
FRAM sequence; An, polyA tail. Polyadenylation signals are denoted with asterisks; ovals represent TSD. Schematic representations of LAVA_A2 and
LAVA_F2 are based on the Gibbon Genome Sequencing Consortium subfamily categorization (Carbone et al. 2014).
Table 1. Summary Statistics of PVA, FVA, and SVA Elements in the NLE Genome.
| Family | Count | 5′-n.d. (a) | 5′-Truncated | 5′-Transductions (Spliced RNAs) (b) | 5′- and 3′-Transductions (Spliced RNAs) (b) | 3′-Transductions |
|---|---|---|---|---|---|---|
| PVA | 143 | 22 | 41 (33.6%) | 4 (3) | 3 (3) | 16 |
| FVA | 11 | 1 | 5 (45.5%) | — | — | 6 |
| SVANLE | 29 | 1 | 6 (20.7%) | 3 (1) | — | 6 |
(a) 5′-n.d.: 5′-end could not be determined due to assembly gaps.
(b) Spliced RNAs: The number of 5′-transductions constituted of spliced cellular RNAs is given in parentheses.
Interestingly, the sequence found immediately upstream of
PTGR2 exon 4 in PVA elements corresponds to the 5′-part of
the SVA2 unique 3′-sequence. This indicates that SVA2 or a
derivative is the precursor of PVA elements. The 3′-part
of FVA elements could be traced back to an ancestral
sequence localized on chromosome 12 in Macaca mulatta.
The corresponding NLE sequence maps to chromosome
22a:85,484,874–85,485,192 (fig. 2D).
Promiscuous Splicing at the 3′-End Generates VNTR
Composite Variety
The finding that the intersection between SVA2-derived and
PTGR2-derived sequences in PVA elements coincides
precisely with the PTGR2 intron 3–exon 4 junction indicates
that splicing is the most likely mechanism responsible for the
acquisition of the PVA 3′-end. Computational analysis of the
SVA2 3′-unique sequence to which the PTGR2 exon had been
fused identified a splice donor (consensus MAG/gtragt) at the
position of fusion (fig. 3B), further supporting the assumption of PVA assembly through splicing. For SVA, a recent
analysis based on the Repbase consensus sequence SVARep
did not identify a splice site in HERV-K corresponding to the
VNTR–env intersection (Hancks and Kazazian 2010).
However, closer inspection of SVANLE (supplementary fig.
S1A, Supplementary Material online) and, subsequently, of
human SVA_A (not shown) revealed that in these ancestral
families SVA2 sequence is still present. Alignment to
SVA2 (Repbase) and HERV-K10 (GenBank accession
number M14123 [Ono et al. 1986]; supplementary fig. S1A,
Supplementary Material online) showed that the same SVA2
donor as in PVA is used. Splice site prediction confirmed the
acceptor (consensus cag/G) in the HERV-K10 env sequence
(fig. 3C).
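The consensus strings quoted above (donor MAG/gtragt, acceptor cag/G) use IUPAC ambiguity codes, with "/" marking the exon/intron boundary. As a minimal sketch of what an exact-consensus scan looks like, the following Python snippet translates such a string into a regular expression and reports boundary positions. It is an illustration only: the function names and the toy sequence are hypothetical, and the splice site prediction tools used in the study are not specified here and typically score sites with weight matrices rather than requiring an exact match.

```python
import re

# Subset of IUPAC nucleotide codes needed for the consensus strings in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "[AC]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate e.g. 'MAG/gtragt' into a regex; '/' marks the exon/intron boundary."""
    left, right = consensus.upper().split("/")
    left_re = "".join(IUPAC[base] for base in left)
    right_re = "".join(IUPAC[base] for base in right)
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    return re.compile(f"(?=({left_re})({right_re}))")


def find_sites(seq: str, consensus: str) -> list[int]:
    """Return 0-based boundary positions of every exact consensus match in seq."""
    pattern = consensus_to_regex(consensus)
    return [m.start() + len(m.group(1)) for m in pattern.finditer(seq.upper())]


DONOR = "MAG/gtragt"   # splice donor consensus quoted in the text
ACCEPTOR = "cag/G"     # splice acceptor consensus quoted in the text

# Hypothetical toy sequence, NOT taken from SVA2, PTGR2, or HERV-K10.
toy = "TTCAGGTAAGTCCCTTTTCAGGAT"

print(find_sites(toy, DONOR))     # [5]
print(find_sites(toy, ACCEPTOR))  # [5, 21]
```

The short acceptor consensus matches far more often than the donor, which is one reason exact matching alone is a weak predictor and position in the ancestral sequence has to be taken into account.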
The ancestral sequences of the LAVA and FVA 3′-parts
map to intron 2 of the gene encoding hydroxysteroid
(17-beta) dehydrogenase 3 (HSD17B3; LAVA; an alignment
of the LAVA 3′-end to its source sequence is given in supplementary fig. S1B, Supplementary Material online) and to a region
for which no transcripts are reported (NLE chromosome 22a;
FVA; see fig. 2D), respectively. We, therefore, used computational
prediction to establish whether splicing might have been operative in 3′-end assembly for these two families as well. The
analysis revealed that in both cases a splice donor site in the
VNTR region could have been used. Splice acceptor sites were
identified at appropriate positions in the respective ancestral
sequences (fig. 3D and E). Thus, splicing represents a possible
mechanism for the acquisition of 3′-sequence also in LAVA
and FVA elements.
FIG. 2. Alignment of VNTR composite Alu-like regions (A) and the 3′-domains of SVANLE (B), PVA (C), and FVA (D). Substitutions specific for SVANLE
are highlighted in black in A and B. Deletions specific to evolutionarily younger subfamilies are boxed in A. Consensus sequences of SVA_A and SVA_B
are taken from Wang et al. (2005). LAVA subfamily consensus sequences are those established by the Gibbon Genome Sequencing Consortium
(Carbone et al. 2014). In C and D the 3′-domains of PVA and FVA are aligned to their source loci in the NLE PTGR2 gene and at chromosome
22a:85,484,874–85,485,192, respectively. The corresponding sequences in Macaca mulatta (MMU) are given for comparison. Splice acceptors are
highlighted in black in C and D. The arrowhead in C marks the PTGR2 exon 4/intron 4 boundary. The PTGR2 intron 4 polyadenylation signal is boxed in
C. The part of the sequence repeat-masked as FRAM is boxed in D. NLE, Nomascus leucogenys.
FIG. 3. Promiscuous splicing at the 3′-end generates VNTR composite variety. (A) Consensus sequences found at the exon–intron junctions (marked
with arrows) of spliceosomal introns. (B–E) Sequences spliced to give rise to variant VNTR composite 3′-ends. Splice donor sites in the SVA2 3′-unique
sequence or VNTR (top sequences in the panels) and their corresponding acceptor sites in cellular or, in case of SVA, endogenous retroviral RNA (line 2
in each panel) are shown. The sequences found in the resulting PVA (B), SVANLE/SVA_A (C), LAVA (D), and FVA (E) elements are given at the bottom
of each panel. Exon sequences are bold and uppercase; intron sequences lowercase. The 100% conserved residues at the 5′- and 3′-ends of introns are
bold. The part of the SVA2 3′-unique sequence retained in PVA and SVANLE/SVA_A elements is underlined. The SVA2 sequence was obtained from
Repbase, HERV-K10: GenBank accession number M14123 (Ono et al. 1986).
Nomascus VNTR Composites Can Be Mobilized by
L1-Encoded Proteins in trans in Human Cells
Having identified four different families of VNTR composites
in the NLE genome, we next wanted to know whether these
can be mobilized in human cells using human L1RP
(Kimberland et al. 1999) as driver. Toward this aim, we
amplified one copy each of PVA (1 kb), FVA (1 kb), and
SVANLE (1.4 kb) as well as two LAVA elements (2 and
2.2 kb, respectively; table 2 and supplementary fig. S2,
Supplementary Material online) from NLE genomic DNA
and cloned them upstream of the mneoI reporter cassette
(Freeman et al. 1994; Moran et al. 1996) using a similar strategy as for the SVA in pAD3SVA_E (Raiz et al. 2012). The reporter cassette consists of a neomycin resistance gene driven by
an SV40 promoter. Transcription terminates at a thymidine
kinase polyA signal. The entire transcription cassette is placed
in antisense relative to the VNTR composite element; the
neomycin phosphotransferase coding sequence is interrupted
by an intron in sense orientation. This arrangement ensures
that G418 resistant (G418R) cells will only arise when a transcript initiated from the promoter driving transcription of the
VNTR composite is spliced, reverse transcribed and reintegrated into chromosomal DNA (fig. 4).
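The logic of the reporter arrangement can be made concrete with a toy string model. The sketch below is an illustration under simplifying assumptions, not a description of the actual mneoI sequence: all sequences and names (`ELEMENT`, `NEO_5`, `NEO_3`, `INTRON`) are hypothetical placeholders, splicing is reduced to removing one known intron string, and promoters, polyadenylation, and target-site details are ignored. It shows why neo stays interrupted when transcribed from the plasmid, and becomes intact only after the element transcript has been spliced and copied back into DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Toy sequences (hypothetical, for illustration only).
ELEMENT = "CCCTCTCCCTCT"              # stands in for the VNTR composite
NEO_5, NEO_3 = "ATGGCA", "TTCTGA"     # two halves of a toy neo ORF
INTRON = "GTAAGTCCCTAG"               # toy intron, GT...AG in the element's sense orientation
NEO_ORF = NEO_5 + NEO_3

# Top strand of the plasmid (= sense strand of the element):
# the neo cassette lies in antisense, the intron inside it in sense orientation.
plasmid_top = ELEMENT + revcomp(NEO_3) + INTRON + revcomp(NEO_5)


def splice(rna: str) -> str:
    """Remove the toy intron if it is present in its splicable orientation."""
    return rna.replace(INTRON, "", 1)


def neo_transcript(top_strand: str) -> str:
    """RNA made from the cassette's own promoter, i.e. read off the bottom strand."""
    return revcomp(top_strand[len(ELEMENT):])


# 1. Transcription of neo directly from the plasmid: the intron is backwards
#    on this strand, cannot be removed, and the ORF stays interrupted.
from_plasmid = splice(neo_transcript(plasmid_top))

# 2. Element-sense transcript: intron is spliced out; reverse transcription and
#    integration place the spliced copy into the genome (modelled as a string copy).
integrant_top = splice(plasmid_top)
from_integrant = neo_transcript(integrant_top)

print(from_plasmid == NEO_ORF)    # False
print(from_integrant == NEO_ORF)  # True
```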
The LAVA elements were chosen from the phylogenetically younger subfamilies LAVA_E and LAVA_F1 (Carbone
et al. 2014). LAVA_F1 elements are characterized by an
Alu-like region comprising only 182 bp (fig. 2A). The truncation is most likely due to a splicing event as computational
analysis predicts a splice donor site at the point of truncation.
The LAVA_F1 element tested has been shown to be specific
for NLE and polymorphic within NLE (Carbone et al. 2014).
Based on copy number in the NLE genome, we expected
different retrotransposition potentials, with FVA and PVA
being less well mobilized than LAVA. While PVA and
FVA, as expected, were mobilized close to the level obtained with the pseudogene control vector pCEPNeo
(15–20% of SVA_E), the NLE SVA showed only about half
of the activity of the human SVA_E (Raiz et al. 2012) used as
standard. Surprisingly, the two LAVA elements differed
dramatically in their mobilization rates: The LAVA_E retrotransposed consistently below pseudogene levels; the
LAVA_F1, on the other hand, was found to be approximately
twice as active as SVA_E (fig. 5A). For none of the constructs
tested could G418-resistant colonies be detected following
cotransfection with the empty pCEP4 vector, which served as
negative control.
Table 2. VNTR Composite Retrotransposons Amplified from NLE Genomic DNA.
| Family/Subfamily | Identifier | Position in GGSC Nleu3.0/nomLeu3 (Including TSDs) | TSD | Length of the Element Sequence in the mneoI-Tagged Constructs (bp) | CT Hexameric Repeats (bp) |
|---|---|---|---|---|---|
| PVA | Nl_P16_2 | chr16:71,184,467–71,185,653 | AAAGAATGGCAGAAAA | 1,021 | 82 |
| FVA | Nl_F18 | chr18:86,030,704–86,032,071 | AAAATTCGCAATAAACCA/AAAATTCTCAGTAAAACA | 991 | 75 |
| SVANLE | Nl_S5_1 | chr5:81,980,179–81,981,574 | AAAGAAATTAACCTAATA | 1,354 | 90 |
| LAVA_E | | chr2:155,391,066–155,392,835 | AAAAAAAAAAAAGAAGTCAA | 1,991 | 58 |
| LAVA_F1 | | chr3:108,773,434–108,775,518 | AGAAAACACCGACGT | 2,213 | 122 |
| SVA_E | H19_27 | | | 1,899 | 138 |
NOTE: Mismatches in TSDs are bold and underlined.
FIG. 4. Schematic representation of the cell culture retrotransposition
assay. G418 resistant (G418R) cells can arise only if the mneoI-tagged
VNTR composite element is transcribed, spliced and the spliced copy
reintegrated into the genome. Reverse transcription and integration are
mediated by L1 proteins encoded on a cotransfected vector. Following
integration the neo ORF is transcribed from its own promoter, conferring G418 resistance. SD, splice donor; SA, splice acceptor; G418S,
G418 sensitive. The mneoI polyadenylation signal is marked with an
asterisk; An, polyA tail.
As our analysis had identified splicing as the mechanism of
assembly of VNTR composites, we next set out to assess
whether RNAs transcribed from our test vectors are spliced
at sites other than those present in the mneoI cassette and if
additional splicing events have an influence on the availability
of full-length mneoI-spliced RNA. Northern blot analysis revealed the existence of a single species of full-length mneoI-spliced RNA for FVA (2.4 kb), LAVA_E (3.4 kb), LAVA_F1
(3.6 kb) (marked by asterisks in fig. 5B), and SVANLE (2.8 kb,
fig. 7C) following transfection with the respective vectors. In
the case of PVA, two bands corresponding to mneoI-spliced RNAs
could be detected. The longer one of 2.4 kb represents the
expected PVA full-length mneoI-spliced RNA. The shorter one
(~2.2 kb) is most likely the result of PVA internal transcription
initiation, as amplification from cDNA using an upstream
primer at the element's 5′-end and a downstream primer
in the mneoI cassette yields a single product corresponding to
the full-length unspliced PVA (data not shown). Transcription
initiation within the VNTR has been discussed by Hancks et al.
(2011) for 5′-truncated de novo integrants derived from an
SVA lacking an exogenous promoter. Surprisingly, RNA isolated from SVA_E mneoI transfected cells showed a second
hybridization signal at around 2.2 kb in addition to the expected 3.3 kb full-length mneoI-spliced RNA. Reverse transcription-polymerase chain reaction (RT-PCR) analysis
revealed that this hybridization signal represents an mneoI-spliced RNA that, in addition, is spliced between a donor
(AG/gtgag) in the SVA_E VNTR and an acceptor (ag/A) at
the very 3′-end of the neomycin phosphotransferase (neo)
ORF. Lacking the neo stop codon and polyadenylation
signal, this RNA cannot give rise to neomycin-resistant cells
following reverse transcription and integration. The VNTR
splice donor corresponds to the one predicted for the 3′-assembly of LAVA and FVA (fig. 3D and E), further supporting splicing as the mechanism responsible for the acquisition
of variant 3′-ends by VNTR composites. Interestingly, the
VNTR-neo single-spliced RNA does not appear to have
been generated at detectable levels (fig. 5B). With regard to
splicing of an SVA when combined with the mneoI cassette, it
is worth noting that the RNA detected by Hancks et al.
(2011) for SVA.2mneoI (expected length for the full-length
mneoI-spliced transcript is 3.5 kb) migrates well below 3 kb.
In light of our findings, it can be speculated that this represents a double-spliced transcript as well and that the much
less abundant full-length mneoI-spliced RNA is not visible in
the exposure shown.
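The expected transcript sizes quoted above can be cross-checked against the element lengths listed in table 2. The short sketch below subtracts each element length from the corresponding full-length mneoI-spliced RNA size. The roughly constant remainder of about 1.4 kb is an inference from these quoted numbers (presumably the spliced reporter cassette plus flanking vector sequence), not a figure stated in the text, and the RNA sizes are rounded to 0.1 kb, so individual offsets carry an uncertainty of about 50 bp.

```python
# Element lengths in the mneoI-tagged constructs (bp), as listed in table 2.
element_bp = {"PVA": 1021, "FVA": 991, "SVANLE": 1354,
              "LAVA_E": 1991, "LAVA_F1": 2213, "SVA_E": 1899}

# Sizes of the full-length mneoI-spliced RNAs (kb), as quoted in the text.
rna_kb = {"PVA": 2.4, "FVA": 2.4, "SVANLE": 2.8,
          "LAVA_E": 3.4, "LAVA_F1": 3.6, "SVA_E": 3.3}


def offsets_bp(elements: dict, rnas: dict) -> dict:
    """RNA size minus element length, i.e. the non-element part of each transcript."""
    return {name: round(rnas[name] * 1000) - elements[name] for name in elements}


offsets = offsets_bp(element_bp, rna_kb)
for name, offset in offsets.items():
    print(f"{name:8s} {offset} bp")
```

Under these assumptions all six offsets fall between 1,379 and 1,446 bp, so the quoted RNA sizes are mutually consistent with a fixed non-element portion of the transcript.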
Taken together, the results of the Northern blot analysis
indicate that differences in the amount of full-length mneoI-spliced RNA available for retrotransposition cannot explain
the differences in mobilization potential observed. PVA and
FVA are mobilized at low levels despite comparatively
higher amounts of RNA available (fig. 5B). Likewise, in the
case of SVANLE there is no correlation to be observed between
RNA level and retrotransposition capacity relative to SVA_E
(figs. 5A and 7C). SVA_E, LAVA_E, and LAVA_F1 show comparable levels of full-length mneoI-spliced RNA (compare also
fig. 7C); their mobilization potential, however, differs.
FIG. 5. NLE VNTR composites can be mobilized by L1RP in HeLa HA cells. (A) Results of retrotransposition reporter assays following selection with
hygromycin and G418. Cells were cotransfected with driver (pJM101 L1RP Neo) and the respective mneoI-tagged VNTR composite containing
plasmids. Values given represent the average over 3–9 independent experiments ± standard deviation. *In case of SVANLE a 1:5 dilution is shown. (B)
Northern blot analysis of the transcripts generated following transfection of the mneoI-tagged VNTR composite constructs. The bands corresponding to
the full-length mneoI-spliced RNAs are marked with asterisks above. The expected lengths are: SVA_E 3.3 kb, PVA 2.4 kb, FVA 2.4 kb, LAVA_E 3.4 kb, and
LAVA_F 3.6 kb. The left-hand panel schematically depicts the isoforms generated through splicing of the SVA_E mneoI transcript. (C) In SVA_E mneoI
transcripts splicing occurs between the VNTR and the 3′-end of the neomycin phosphotransferase (neo) ORF. Exon/intron junctions as well as the
branchpoint are marked with arrowheads. The exon/exon junction in the resulting spliced sequence is marked by a vertical bar. Consensus splice donor,
branchpoint, and splice acceptor sequences are given in the top panel. Exon sequences are in uppercase; intron sequences (except for the branchpoint)
in lowercase. The 100% conserved residues at the 5′- and 3′-ends of introns are bold and underlined. The branchpoint is marked in uppercase. HSV TK
pA, HSV TK polyadenylation cassette of mneoI.
Table 3. LAVA_F1 De Novo Integrations.
| Id | Insertion Site (hg19) | Gene | Orientation | TSD | EN Site (TTTT/AA) (a) | polyA |
|---|---|---|---|---|---|---|
| #1 | chr2:173,960,322 | ZAK intron 2 | Antisense | AAGATTCTTGA | TCTT/AA | 23 |
| #1A | chr4:48,364,737 | SLAIN2 intron 1 | Antisense | AAAAAAAAAAAAA | TTTT/AA | 20 |
| #1B | chr2:133,636,276 | NCKAP5 intron 9 | Sense | GAAAAGGAAGTGT | TTTC/AA | 63 |
| #1C | chr11:61,763,000 | | | AAAGAAAAATGCCC | CTTT/CA | 40 |
| #1E | chr1:150,617,001 | GOLPH3L downstream | Sense | AAAAAAAAAAAAAAGAAAA | TTTT/GA | 39 |
| #2 | chr14:77,879,917 | NOXRED1 intron 2 | Antisense | AAAAATATAAGGCCAA | TTTT/AA | 45 |
| #2D | chr2:42,291,366 | PKDCC downstream | Sense | AAGAAAAGGCTCTC | TCTT/GA | 90 |
NOTE: EN, endonuclease.
(a) Consensus EN recognition site (Feng et al. 1996).
To confirm that G418-resistant colonies obtained after
LAVA_F1 transfection are indeed the result of retrotransposition events, we characterized integration sites in seven clones.
LAVA_F1 de novo integrants resemble those of SVA (Hancks
et al. 2011, 2012; Raiz et al. 2012): They are mostly full-length
(6/7), contain polyA tails of variable lengths (20–90 bp) and
are flanked by TSDs (11–19 bp). By contrast, only about 50%
of the genomic LAVA insertions for which the 5′-end could be
determined are full-length (Carbone et al. 2014). The insertion
sites resemble the L1 endonuclease consensus cleavage site
(5′-TTTT/AA-3′ [Feng et al. 1996]), except for integration 1C.
In this case, 3′-processing of the bottom strand before reverse
transcription might have taken place, as suggested by
Hancks and Kazazian (2012; based on Kopera et al. 2011)
for genomic insertions displaying atypical L1 endonuclease
sites with the consensus 5′-YYYY/YN-3′.
The actual endonuclease site of integration
1C would then be 5′-TTTC/AG-3′.
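The summary figures in this paragraph can be recomputed from the entries of table 3. The following sketch does so in Python; the tuple layout and helper names are illustrative choices, and the values are transcribed from the table as printed. It reports the TSD and polyA length ranges, counts mismatches to the canonical TTTT/AA site, and tests each site against the degenerate YYYY/YN consensus (Y = C or T, N = any base).

```python
import re

# Values transcribed from table 3: (Id, TSD, EN site, polyA length in bp).
INTEGRANTS = [
    ("#1",  "AAGATTCTTGA",         "TCTT/AA", 23),
    ("#1A", "AAAAAAAAAAAAA",       "TTTT/AA", 20),
    ("#1B", "GAAAAGGAAGTGT",       "TTTC/AA", 63),
    ("#1C", "AAAGAAAAATGCCC",      "CTTT/CA", 40),
    ("#1E", "AAAAAAAAAAAAAAGAAAA", "TTTT/GA", 39),
    ("#2",  "AAAAATATAAGGCCAA",    "TTTT/AA", 45),
    ("#2D", "AAGAAAAGGCTCTC",      "TCTT/GA", 90),
]

IUPAC = {"Y": "[CT]", "N": "[ACGT]"}


def matches_consensus(site: str, consensus: str) -> bool:
    """True if the site fits a consensus written with IUPAC codes and a '/' cut mark."""
    pattern = "".join(IUPAC.get(ch, re.escape(ch)) for ch in consensus)
    return re.fullmatch(pattern, site) is not None


def mismatches(site: str, canonical: str = "TTTT/AA") -> int:
    """Number of positions at which the site differs from the canonical EN site."""
    return sum(a != b for a, b in zip(site, canonical))


tsd_lengths = [len(tsd) for _, tsd, _, _ in INTEGRANTS]
polya_lengths = [polya for _, _, _, polya in INTEGRANTS]
canonical_hits = [i for i, _, site, _ in INTEGRANTS if mismatches(site) == 0]
atypical_hits = [i for i, _, site, _ in INTEGRANTS
                 if matches_consensus(site, "YYYY/YN")]

print(min(tsd_lengths), max(tsd_lengths))      # 11 19
print(min(polya_lengths), max(polya_lengths))  # 20 90
print(canonical_hits)                          # ['#1A', '#2']
print(atypical_hits)                           # ['#1C']
```

With the table values as transcribed, integration 1C is the only site fitting the YYYY/YN pattern, in line with the text singling it out as the exception.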
Four out of seven insertions occurred in introns of genes;
two downstream of genes (table 3). Similar to what has been
observed for SVA de novo integrants (summarized over the
integrations reported in [Hancks et al. 2011, 2012; Raiz et al.
2012]) there does not appear to be a strand bias for insertions
occurring in or near genes. By contrast, only 20% of intragenic
SVAs (Hancks et al. 2009) and 25% of intragenic LAVAs
(Carbone et al. 2014) are on the coding strand.
The SVA_E Hexameric Repeat/Alu-Like Domain
Enhances PVA and FVA Retrotransposition
In a prior publication, we have demonstrated that deletion of
the CT hexameric repeats and Alu-like region reduces SVA
retrotransposition efficiency by 50% (Raiz et al. 2012). More
recently, Hancks et al. (2012) provided evidence that the
CT-Alu-like domain constitutes the minimal active SVA.
PVA and FVA both possess CT-Alu-like
domains; nevertheless, they do not retrotranspose significantly above pseudogene level. We, therefore, reasoned that
not the presence but the specific sequence/structure of the
5′-domain (CT-hexamer plus Alu-like) is important for
efficient mobilization. Alignment of the Alu-like domains of
SVA (Wang et al. 2005) and LAVA (Carbone et al. 2014)
subfamilies, PVA, and FVA reveals distinct characteristics for
each of them (fig. 2A). The ancestral Alu-like domains of
SVA_A, SVANLE, LAVA_A2, FVA, and PVA do not display
any of the deletions found in the evolutionarily younger subfamilies. It is also worth noting that these families
(SVANLE, FVA, PVA, this study) and subfamilies (SVA_A
[Wang et al. 2005], LAVA_A2 [Carbone et al. 2014]) reached
only comparatively low copy numbers in the respective
genomes. To test the hypothesis that not the mere presence
but rather the sequence (and, as VNTR composites are
noncoding, the structure likely determined by the sequence)
of the CT-Alu-like domain determines mobilization potential,
we generated domain swaps by reciprocally exchanging the
SVA_E and PVA/FVA 5′-regions. The chimeras constructed
are schematically depicted in figure 6A.
Given that SVA_E is efficiently mobilized in our assay
system, we expected the SP and SF domain swaps to be
mobilized more efficiently than their parental elements
(PVA, FVA) if indeed the sequence/structure of the CT-Alu-like domain is the key determinant for VNTR composite
retrotransposition mediated by L1 in trans. The PS and FS
chimeras, containing the ancestral PVA/FVA 5′-domains,
were expected to be less well mobilized than SVA_E. The
results obtained (fig. 6B) provide support for the hypothesis
outlined above: The SP and SF chimeras containing the
SVA_E hexameric repeats and Alu-like domain are mobilized
4- and 7-fold more efficiently than PVA and FVA, respectively.
Transcript patterns and steady-state levels of full-length
mneoI-spliced RNAs of parental elements (PVA, FVA) and
corresponding chimeras (SP, SF) are comparable (fig. 6C).
The ancestral 5′-domains of PVA and FVA, on the other
hand, drastically reduce the mobilization capacity of the
respective chimeras (PS; FS) when compared with SVA_E
(fig. 6B). The ratio of full-length mneoI-spliced to double-spliced transcript is roughly equal for SVA_E and the two
chimeras. In case of FS an influence of smaller amounts of
available full-length mneoI-spliced RNA on the observed
retrotransposition rate can, however, not be completely
excluded (fig. 6C, right panel).
Incompatibility of LAVA and SVA Domains Suggests
Different Pathways for LAVA and SVA Mobilization
Based on the results obtained with the SVA–PVA/FVA
domain swaps, we expected a similar effect of the SVA_E
CT-Alu-like domain on retrotransposition of the “inactive”
2855
Ianc et al. . doi:10.1093/molbev/mst256
MBE
FIG. 6. The structure of the hexameric repeat/Alu-like region determines retrotransposition potential of SVA/PVA/FVA. (A) Schematic representation
of the SVA–PVA/FVA chimeras tested. 50 -Domains of SVA and PVA/FVA were reciprocally exchanged at the Alu-like—VNTR junction. (B) The resulting
chimeras were cotransfected with an L1RP expression vector and cells subjected to consecutive hygromycin and G418 selection. Retrotransposition
rates (+ standard deviation) are given relative to that of SVA_E (100%). Numbers above columns denote the number of experiments taken into account
for each individual construct. Brackets link chimeras and parental elements that can be directly compared based on identical transcript patterns. (C)
Northern blot analysis of the transcripts generated following transfection of the mneoI-tagged domain swap constructs. The bands corresponding to the
full-length mneoI-spliced RNAs are marked with asterisks above (left panel) or an arrowhead (right panel). Brackets link chimeras and parental elements
that can be directly compared based on identical transcript patterns. The expected lengths of the full-length mneoI-spliced RNAs are: SVA_E 3.3, SP 2.5,
SF 2.5, PVA 2.4, PS 3.3, FVA 2.4, and FS 3.3 kb. A side-by-side comparison of the spliced transcripts resulting for SVA_E, PS, and FS is shown on the right.
LAVA_E element. However, only a slight increase in
mobilization potential was observed for the SLE chimera
when compared with LAVA_E (fig. 7B; for a schematic representation of the chimeras see fig. 7A and supplementary fig. S3,
Supplementary Material online). RNA levels of the SLE
chimera and the parental LAVA_E element are similar
(fig. 7C) so that observed retrotransposition rates can be directly compared. Domain swaps carrying the LAVA_E or
LAVA_F1 hexameric repeats and Alu-like domains at their
5′-ends (LES, LFS) could not be mobilized above pseudogene
level (fig. 7B). Whereas this result had been expected for the
LAVA_E CT-Alu-like domain, combination of domains
FIG. 7. SVA and LAVA domains are incompatible. (A) Schematic representation of the SVA–LAVA chimeras tested. 5′-Domains of SVA and LAVA_E/
LAVA_F were reciprocally exchanged at the Alu-like–VNTR junction, with the exception of SLF, where the exchange was effected at the VNTR 3′-end of
SVA_E. The fine structure of the junctions of the SVA/LAVA chimeras is given in supplementary figure S3, Supplementary Material online. (B) The
resulting chimeras were cotransfected with an L1RP expression vector and cells subjected to consecutive hygromycin and G418 selection.
Retrotransposition rates (+ standard deviation) are given relative to that of SVA_E (100%). Numbers above columns denote the number of independent
experiments for each individual construct. The results obtained with the pCEPNeo pseudogene control are given for comparison. (C) Northern blot
analysis of the transcripts generated following transfection of the mneoI-tagged domain swap constructs. SVANLE mneoI-tagged RNA analyzed in the
same experiment is shown in addition. The bands corresponding to the full-length mneoI-spliced RNAs are marked with asterisks above. The expected
lengths are: SVA_E 3.3, SLE 3.5, SLF 3.3, LAVA_E 3.4, LES 3.2, LFS 3.1, LAVA_F 3.6, and SVANLE 2.8 kb. Note that the splicing pattern of the SLF chimera, for
which the domain exchange was effected at the VNTR 3′-end, corresponds to that of SVA_E.
derived from the two active elements LAVA_F1 and SVA_E
had been predicted to yield an active element. This, however,
is not the case, indicating that domains of the two large
families of VNTR composites, LAVA and SVA, are incompatible and, by extension, that elements of the two families might
use mobilization pathways with different structural requirements. Results obtained with the last chimera, SLF, support
this assumption: The SVA_E hexameric repeats and Alu-like
domain are not functional when combined with the LAVA_F1
3′-part. Due to the shared VNTR region, the LES, LFS, and SLF
chimeras show the same splicing pattern as SVA_E. No
correlation was observed between the amount of full-length mneoI-spliced transcripts and the retrotransposition
rates.
As the domain swap experiments indicated that SVA and
LAVA might use different mobilization pathways, we next
set out to explore whether LAVA retrotransposition is
dependent on L1 ORF1p. Previously it has been shown
that L1 ORF1p is dispensable for Alu retrotransposition
(Dewannieux et al. 2003), whereas there are conflicting data
on ORF1p requirement in SVA mobilization (Hancks et al.
2011; Raiz et al. 2012). Using an L1 ORF2-only driver, Hancks
FIG. 8. The LAVA_F1 3′-L1ME5 sequence inhibits retrotransposition.
(A) Schematic representation of the LAVA 3′-domain indicating the
sites at which the respective constructs have been truncated.
Numbers are given relative to the first nucleotide of the LAVA_F1
3′-domain. (B) Retrotransposition reporter assay following selection
with G418 only. Cells were cotransfected with driver (pJM101 L1RP Neo) and the respective mneoI-tagged LAVA_F1 3′-deletion mutants or
the LAVA_F1 full-length construct. Retrotransposition rates (+ standard
deviation) are given relative to that of the full-length LAVA_F1 construct (100%). The inset shows the Northern blot analysis of the
RNAs generated from the two shortest LAVA_F 3′-deletions. The expected lengths of the full-length mneoI-spliced RNAs are: LAVA_F
3.6 kb, 15–3.2 kb, and 92–3.3 kb.
et al. found a canonical SVA_D to be independent of ORF1p.
It is, however, possible that endogenous ORF1p is sufficient to
support retrotransposition in this setting. In the presence of a
bicistronic driver containing a double mutation in ORF1, trans
mobilization of SVA_D was reduced to background levels,
suggesting that SVA retrotransposition requires both L1-encoded proteins (Hancks et al. 2011). Results consistent
with the latter finding were obtained by Raiz et al. (2012)
for an SVA_E using a driver containing an ORF1 in-frame
deletion. The conflicting results are most likely
due to differences in the availability of ORF2p, in the
ORF1p/ORF2p ratio (including endogenous ORF1p), and in
the formation and composition of L1 RNPs (Doucet et al.
2010) in each of the three experimental approaches.
Cotransfection of the mneoI-tagged LAVA_F1 with the
driver carrying the L1 ORF1 in-frame deletion (Raiz et al.
2012) did not yield any colonies following G418 selection.
This result provides a first indication that mobilization of
LAVA requires L1 ORF1p.
The LAVA 3′-L1ME5 Sequence Inhibits
Retrotransposition
Analysis of the structure of retrotransposon genomic copies
can provide insights into mechanisms of and requirements
for their mobilization. A whole-genome survey of LAVA
elements in the NLE genome revealed that at least 97 of
them (~5%) are 3′-truncated, the vast majority of them
through premature polyadenylation (supplementary fig. S4,
Supplementary Material online). In contrast to SVAs in the
human genome, for which premature polyadenylation events
were found to be distributed over the entire length of the
SINE-R region (Damert A, unpublished data), LAVA premature polyadenylation occurs exclusively downstream of the
simple repeat (U2) region. The minimum length of the 3′-part
of LAVA genomic copies is, thus, around 300 bp. One explanation for this finding could be that there are no suitable
polyadenylation signals in the U1-AluSz-U2 part of the LAVA
3′-end. On the other hand, it is possible that the U1-AluSz-U2
region is absolutely required for retrotransposition. To test this latter
hypothesis, we generated LAVA_F1 nested 3′-deletions
lacking most of the antisense L1ME5 sequence (332,
fig. 8A), the U2 3′-part and the L1ME5 (227, fig. 8A), the
AluSz 3′-part, U2, and L1ME5 (92, fig. 8A), and the AluSz, U2,
and L1ME5 (15, fig. 8A), respectively. Because LAVA_F1,
when cotransfected with L1RP, had been found to yield
acceptable colony counts after selection with G418 only,
the experiments were carried out without hygromycin preselection. Surprisingly, all of the deletion mutants yielded more
G418R colonies than the full-length LAVA_F1 element
(fig. 8B). One explanation for this finding could be that, in
the case of the full-length construct, the use of alternative polyadenylation sites in the L1ME5 antisense fragment leads to
transcription termination upstream of the mneoI cassette,
thus reducing the amount of mneoI-containing transcript
available for retrotransposition. Northern blot analysis, however, did not provide support for this scenario. Deletion mutants lacking the L1ME5 antisense fragment do not yield
significantly higher amounts of full-length mneoI-spliced
RNA than the full-length construct (inset in fig. 8B). We,
therefore, conclude that the antisense L1ME5 sequence,
which is missing in all deletion mutants but present in the
full-length element, has an inhibitory effect on LAVA_F1
mobilization by L1-encoded proteins in trans.
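Whether a given stretch of sequence, such as the U1-AluSz-U2 part considered above, contains candidate polyadenylation signals can be checked with a simple motif scan. The following Python sketch is an illustration only and is not part of the analysis reported here. It assumes that the canonical AATAAA hexamer and its common ATTAAA variant are the motifs of interest; the function name find_polya_signals is hypothetical, and the presence of a hexamer alone does not show that a site is actually used.

```python
import re

# Assumption: only the canonical hexamer and its most common variant are scanned.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq, signals=POLYA_SIGNALS):
    """Return sorted (position, motif) pairs for every motif occurrence.

    Positions are 0-based and refer to the strand that is passed in.
    RNA input is accepted (U is converted to T).
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for motif in signals:
        # A lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), motif))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGAATAAACCATTAAAGG"  # invented toy sequence, not a LAVA sequence
    for position, motif in find_polya_signals(toy):
        print(position, motif)
```

A region without any hit would be compatible with the first explanation given above, whereas hits would leave the question open.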
Discussion
Assembly of VNTR-Containing Composites
The sequence and mechanism(s) of the assembly of the prototype VNTR composite retrotransposon(s) have so far
remained elusive. Based on the presence of the VNTR region,
which is shared by all VNTR composites, it has been safe to
assume that SVA2 is their common ancestor. With regard to
the Alu-like region, Hancks and Kazazian suggested a series of
splicing events to explain its mosaic structure. Although they
discuss splicing as the mechanism of acquisition for the
SINE-R as well, sequence analysis using the SVARep consensus
led them to conclude that the VNTR-SINE-R fusion most
likely is the result of template switching (Hancks and
Kazazian 2010). We now provide evidence that the
3′-domains of all four families of VNTR composites have
been acquired through splicing. In the case of SVA and PVA,
SVA2 could unambiguously be identified as the molecule
providing the splice donor. It must, therefore, be the precursor of these families. Phylogenetic analysis (fig. 9A) indicates
that the SVANLE/SVA_A/B and PVA Alu-like parts are derived
from a common ancestor; independent acquisition of this
domain appears, therefore, unlikely. Taken together, these two
findings (SVA2 as precursor and a common ancestor for
PVA and SVA) suggest that an SVA2 already carrying the
FIG. 9. Assembly of VNTR composite retrotransposons through splicing to “Alu-SVA2.” (A) PhyML generated maximum-likelihood tree of the Alu-like
domains of ancestral VNTR composite families/subfamilies. (B and C) Schematic representation of the assembly of PVA and SVA prototype elements
through splicing to the SVA2 3′-unique sequence (B) and of LAVA and FVA prototype elements through splicing to the SVA2 VNTR region (C). The
SVA2 3′-unique sequence retained in PVA and SVA is colored dark gray. Exons are represented by numbered boxes. Polyadenylation signals are denoted
with asterisks. TSDs are shown as arrows. SA, splice acceptor; FR, FRAM partial sequence.
Alu-like domain at its 5′-end did exist at one point in evolution. This “Alu-SVA2” subsequently acquired two different
3′-domains (SINE-R and PTGR2 exon 4/intron 4) through
splicing to HERV-K and PTGR2 RNAs, respectively. Whether
splicing occurred in cis or trans can no longer be determined, as
no Alu-SVA2 elements have been preserved during evolution.
For splicing to occur in cis (as illustrated in fig. 9B), copies of
Alu-SVA2 must have existed at an appropriate distance
upstream of a genomic HERV-K copy and in PTGR2 intron 3,
respectively. Alternatively, splicing could have happened in
trans, combining Alu-SVA2 and HERV-K/PTGR2 transcripts
derived from different loci. For the assembly of LAVA and
FVA, splice donors in the VNTR region have been used.
Formally they could, thus, have originated from either
Alu-SVA2 or an element carrying a different 3′-end (SVA,
PVA). Phylogenetic analysis places FVA and LAVA Alu-like
domains on a branch separate from that leading to SVA
and PVA (fig. 9A). Based on this, derivation of FVA and
LAVA from PVA/SVA appears unlikely. Potential splice acceptors could be identified at appropriate positions in the source
sequences of both FVA (on chromosome 22a in NLE) and
LAVA (in HSD17B3 intron 2) 3′-ends (fig. 3 and supplementary fig. S1B, Supplementary Material online). Figure 9C illustrates a possible scenario for acquisition of FVA and LAVA
3′-ends through splicing to Alu-SVA2 in cis.
Whereas both HERV-K (Ahn and Kim 2009) and PTGR2
(Zhang et al. 2003), the precursors of SVA and PVA, have been
demonstrated to be expressed in germ cells/testis, to date
there is no transcript or expressed sequence tag (EST) annotated for the locus of origin of the FVA 3′-part. The LAVA
3′ ancestral sequence maps to intron 2 of HSD17B3, which is
expressed in testis germ cells. However, there is no EST support for alternative splicing at the site used for fusion to the
VNTR region in LAVA. Thus, the 3′-assembly of both LAVA
and FVA must be the result of rare events if splicing is
involved.
Differential Mobilization of SVA, PVA, and FVA Can
Be Attributed to the Structure of Their 5′-Domains
The SVA 5′-part has long been suspected to play a crucial role
in the mobilization of these composite retrotransposons.
A first model (Ostertag et al. 2003; Mills et al. 2007) postulated
hybridization of the SVA 5′-antisense Alu copies with ribosome-bound Alu elements as the mechanism facilitating
interaction of SVA RNA with L1 proteins. Subsequently, we
could show that deletion of the hexameric repeat/Alu-like
region reduces the SVA retrotransposition rate by 50%
(Raiz et al. 2012). More recently, the hexameric repeat/
Alu-like domain has been demonstrated to constitute the
minimal active human SVA (Hancks et al. 2012). The identification of three additional families of VNTR composite non-LTR retrotransposons sharing the SVA 5′-domain, namely LAVA
(Carbone et al. 2012; Damert A, unpublished data), PVA,
and FVA (this study), has now opened up the unique opportunity to assess the contribution of this functional domain in
the context of different elements. The results obtained show
marked differences in the capacity of the elements to be
mobilized by L1RP proteins in trans. PVA and FVA retrotransposition rates were found to be close to those obtained for
the processed pseudogene control, in spite of the presence
of the hexameric repeat/Alu-like domain in both elements.
Thus, either the respective 3′-ends exert an inhibitory effect
on trans mobilization or there are functionally relevant
differences in the 5′-domain when compared with the efficiently mobilized SVA_E. The latter assumption received
support from the finding that evolutionarily younger (and
presumably still active) LAVA and SVA subfamilies are characterized by specific deletions in the Alu-like domain, in
contrast to the ancestral (low genomic copy number) families
SVA_A, SVANLE, PVA, and FVA (fig. 2A). The relatively low
mobilization rate of the NLE SVA is also in line with the
hypothesis that the ancestral type of the Alu-like domain
(“deletion free”) does not support efficient retrotransposition.
Finally, chimeras composed of the SVA_E 5′-domain and the
PVA/FVA VNTR/3′-end are efficiently mobilized, whereas
combination of the PVA/FVA 5′-ends with the SVA_E
VNTR/SINE-R leads to a drastic reduction of retrotransposition potential when compared to SVA_E. Based on these
results, we conclude that the specific sequence-based structure of the hexameric repeat/Alu-like region is the critical
parameter for mobilization efficiency of SVA, PVA, and
FVA. Differences in the Alu-like domains are apparent
(fig. 2A); how exactly they might influence the process of retrotransposition, however, remains to be elucidated. Deletions
are more likely to have an effect on secondary structure than
single nucleotide substitutions; thus, it might be the specific
folding of the 5′-domain that determines mobilization
efficiency. Alternatively, protein binding might differ for the
different Alu-like domains, with possible effects on RNA
stability, transport, and retrotransposition.
Closer inspection of the elements revealed that there are
also differences in the length of the hexameric repeat region.
PVA, FVA, and SVANLE possess 82, 75, and 90 bp of CT repeats,
respectively; the hexameric repeat region of SVA_E spans
138 bp (table 2). Hancks et al. (2012) recently reported that
deletion of the CT repeats significantly reduced SVA retrotransposition rates. The two elements tested in their experiments (SVA_D) contain hexameric repeat regions of 122 and
126 bp, respectively. They also observed that readdition of
20/35 bp of CT repeats to the hexamer-deleted elements
did not rescue SVA activity (Hancks et al. 2012). It is tempting
to speculate that there might be a minimal length of the
hexameric repeat region required for efficient
retrotransposition.
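The length comparison above depends on how the extent of the hexameric repeat region is measured. As a rough illustration, the following Python sketch reports the longest uninterrupted run of perfect CCCTCT units in a sequence. It is not the procedure used to obtain the values in table 2: the function name longest_hexamer_run is hypothetical, and the restriction to perfect hexamers is a simplifying assumption, since real CT repeat regions may contain variant units that this scan would treat as interruptions.

```python
import re


def longest_hexamer_run(seq, unit="CCCTCT"):
    """Return (start, length_bp) of the longest perfect tandem run of `unit`.

    Returns (0, 0) if the unit does not occur. Assumption: only perfect,
    uninterrupted copies of the unit are counted.
    """
    seq = seq.upper()
    best_start, best_length = 0, 0
    for match in re.finditer(f"(?:{re.escape(unit)})+", seq):
        length = match.end() - match.start()
        if length > best_length:
            best_start, best_length = match.start(), length
    return best_start, best_length


if __name__ == "__main__":
    # Invented toy sequence: three perfect units, a spacer, then one more unit.
    toy = "GG" + "CCCTCT" * 3 + "AAGT" + "CCCTCT"
    print(longest_hexamer_run(toy))
```

A tolerant variant that allows a limited number of mismatches per unit would be needed before such a scan could be compared with manually annotated repeat lengths.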
The contribution of differences in the VNTR region to the
differences in retrotransposition rates observed is the most
difficult to assess. From the comparison of the results obtained for PVA, FVA, and SVANLE with their relatively short
VNTRs (250–350 bp) to SVA_E (900 bp VNTR) it could be
assumed that shorter VNTRs support lower retrotransposition rates than longer ones. However, there are two findings
contravening this assumption: First, the SP and SF chimeras,
in which the PVA/FVA short VNTRs are fused to the SVA_E
CT/Alu-like domain, show a 4- and 7-fold increase in retrotransposition rate, respectively, when compared with their
parental elements. Second, Hancks et al. (2012) reported an
increase in retrotransposition following partial deletion of the
VNTR region in SVA. How the internal organization of the
VNTR domain (sequential arrangement of shorter and longer
repeat units, presence of half-repeats and internal deletions in
the VNTR units) affects mobilization capacity remains to be
elucidated.
LAVA Mobilization Has Structural Requirements
Differing from Those of SVA
LAVA is the largest family of VNTR composites in the NLE
genome. The two elements chosen to be tested for trans
mobilization in vitro belong to the evolutionary younger subfamilies LAVA_E and LAVA_F1. Strikingly, only the LAVA_F1
element is retrotransposed efficiently in our assay, whereas
the LAVA_E element was found to be mobilized below pseudogene levels. Contrary to the results obtained with PVA (SP)
and FVA (SF) chimeras, the SVA_E 5′-domain was not able to
rescue LAVA_E retrotransposition, nor was it found
to be functional when combined with the LAVA_F1 3′-end.
Taken together with the fact that the LAVA_F1 hexameric
repeat/Alu-like region (derived from an efficiently mobilized
element) is inactive in the context of the SVA_E VNTR/SINE-R, these findings suggest that SVA and LAVA interact
with different sets of host proteins/use different pathways
for their mobilization. Efficient mobilization in the “SVA
pathway,” which is used by PVA and FVA as well, requires
a particular structure of the 5′-hexameric repeat/Alu-like region. The “LAVA pathway” obviously has different
sequence/structural requirements, as evidenced by the
inactivity of the SVA_E 5′-domain in the context of LAVA
3′-ends. Which of the domains is responsible for targeting of a
VNTR composite element to this pathway remains to be
elucidated.
Deletion analysis of the LAVA_F1 3′-end suggests that
the antisense L1ME5 fragment has an inhibitory effect.
An inhibitory effect of sequences 3′ of the VNTR has
also been suggested for SVA by Hancks et al. (2012) who
found that “most of the SVA deletions lacking SINE-R
sequences are more active than their full-length counterpart.”
The AluSz and U2 regions of the LAVA 3′-end appear to be
dispensable (at least for LAVA_F1 retrotransposition) as a
deletion mutant lacking these sequences is still efficiently
mobilized.
Further characterization of the LAVA pathway will necessitate testing additional elements, especially given that
the element efficiently mobilized in our
study is characterized by the LAVA_F1-specific truncation
in its 5′-domain, which sets it apart from the other subfamilies.
Elements of other LAVA subfamilies are found to be still
polymorphic in NLE (Carbone et al. 2014), suggesting
recent mobilization and, consequently, the presence of
all structural features necessary for retrotransposition in
trans. Analysis of a number of them should also provide an
answer to the question whether the results obtained for the
LAVA_E element here are representative for the entire subfamily. Amplification of LAVA from genomic DNA is, unfortunately, severely hampered by the fact that members of the
younger subfamilies in particular frequently inserted in
or close to other repetitive sequences (Damert A, unpublished data) and, in the case of polymorphic elements, by amplification bias toward the preintegration allele. Once BACs
mapped to the genome assembly become publicly available, a more detailed characterization of the LAVA pathway
will be possible.
Materials and Methods
Element Identification, Retrieval, Age Estimates, and
Phylogenetic Analysis
PVA and FVA elements were initially identified using BLAST
(Altschul et al. 1990) at http://blast.ncbi.nlm.nih.gov (last
accessed September 18, 2014) against the NLE wgs database
(Carbone et al. 2014) with the SVA_A consensus (Wang et al.
2005) as query sequence. Retroelements carrying the SVA_A
5′-end and flanked by TSDs were repeat-masked using the
RepeatMasker web server at http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker (last accessed September 18,
2014). Consensus sequences for the respective 3′-ends were
constructed and subsequently used as BLAST queries to retrieve exhaustive sets of sequences from the GenBank wgs
section (December 2010) and the NLE genome build 1.1 (October 2011) (Carbone et al. 2014). The elements were annotated manually. Sequence logos were generated using
WebLogo 3 at http://weblogo.threeplusone.com (last
accessed September 18, 2014) (Schneider and Stephens
1990; Crooks et al. 2004). All alignments were calculated
using BioEdit. Consensus sequences were generated using a
majority rule approach. Age estimates for SVANLE were obtained by aligning the SINE-R parts of the elements to the
consensus. Substitution densities were then calculated separately for CpG and non-CpG sites using a Python script.
Neutral substitution rates of 0.0090/site per My and 0.0015/
site per My were used for CpG and non-CpG substitutions,
respectively (Xing et al. 2004). Phylogenetic analysis was carried out using PhyML at www.phylogeny.fr (last accessed
September 18, 2014) with default parameters.
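The consensus and age-estimate steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the script used in this study: the function names are hypothetical, the input is assumed to be a set of equal-length, pre-aligned sequences, gap and ambiguous positions are skipped, ties in the majority rule are written as N, and the age is taken as substitution density divided by the neutral rate quoted above for the respective site class.

```python
from collections import Counter

CPG_RATE = 0.0090      # substitutions per CpG site per My (value quoted in the text)
NON_CPG_RATE = 0.0015  # substitutions per non-CpG site per My (value quoted in the text)
BASES = "ACGT"


def majority_consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumptions: gaps/ambiguous characters are ignored, ties give 'N',
    and columns without any base give '-'.
    """
    consensus = []
    for column in zip(*(s.upper() for s in aligned)):
        ranked = Counter(b for b in column if b in BASES).most_common()
        if not ranked:
            consensus.append("-")
        elif len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


def substitution_densities(consensus, copies):
    """Substitutions per compared site, separately for CpG and non-CpG positions."""
    cons = consensus.upper()
    cpg = set()
    for i in range(len(cons) - 1):
        if cons[i:i + 2] == "CG":
            cpg.update((i, i + 1))  # both positions of the consensus CpG
    tally = {"cpg": [0, 0], "non_cpg": [0, 0]}  # [substitutions, compared sites]
    for copy in copies:
        for i, (c, b) in enumerate(zip(cons, copy.upper())):
            if c not in BASES or b not in BASES:
                continue  # skip gaps and ambiguous positions
            key = "cpg" if i in cpg else "non_cpg"
            tally[key][0] += c != b
            tally[key][1] += 1
    return {k: (subs / sites if sites else 0.0) for k, (subs, sites) in tally.items()}


def age_estimates_my(consensus, copies):
    """Divide each substitution density by its neutral rate to get an age in My."""
    d = substitution_densities(consensus, copies)
    return {"cpg": d["cpg"] / CPG_RATE, "non_cpg": d["non_cpg"] / NON_CPG_RATE}


if __name__ == "__main__":
    copies = ["ATGTACGTAA", "ACGTACGTAT", "ACGTACGTAA"]  # invented toy alignment
    consensus = majority_consensus(copies)
    print(consensus, age_estimates_my(consensus, copies))
```

Treating the two site classes separately yields two estimates per element set, which can then be compared with each other as a consistency check.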
Splice Site Prediction
Splice site prediction was performed with Human Splicing
Finder 2.4.1 (Desmet et al. 2009) at http://www.umd.be/HSF3/HSF.html (last accessed September 18, 2014) and with the
splice site prediction tool at http://www.fruitfly.org/seq_tools/splice.html (last accessed September 18, 2014) (Reese et al.
1997).
Plasmid Constructs
All VNTR composite test vectors are based on pCEPNeo (Raiz
et al. 2012). Elements were inserted via KpnI/NheI. Primers
used for amplification are listed in supplementary table S4,
Supplementary Material online. All amplification and cloning
steps were verified using Sanger sequencing. The structure of
the domain swaps is schematically depicted in figures 6A
and 7A.
pAD7PVA, pAD8FVA, pAD9LAVA_E, pAD10LAVA_F1,
and pAD11SVANLE
The respective elements (positions and TSDs listed in table 2
and supplementary fig. S1, Supplementary Material online)
were amplified from NLE genomic DNA (kindly provided by
Christian Roos, Gene Bank of Primates at the German Primate
Centre, Göttingen) using Phusion Hot Start II (Thermo
Scientific) according to the manufacturer’s instructions. To
amplify LAVA elements, DMSO was added to the reaction to
a final concentration of 3%, and denaturation time was
extended to 30 s. Amplified elements were subcloned into
pJET 1.2 (Thermo Scientific). Reamplifications were carried
out using 5′-primers localized directly upstream of the CT
hexameric repeats and 3′-primers designed to exclude the
elements’ polyadenylation signals. Upstream primers contain
a KpnI restriction site and downstream primers an NheI restriction site. Reamplification products were subcloned again into
pJET 1.2 for sequencing and further cloning. Finally, the
elements were transferred into pCEPNeo via KpnI/NheI.
SP, PS, SF, and FS Domain Swaps
The 5′-hexameric repeats and Alu-like domains of the elements were combined with the VNTR/SINE-R of H19_27
(pAD3SVA_E [Raiz et al. 2012]) or VNTR/3′-ends of PVA and
FVA, respectively, via the AlwNI site at the Alu-like–VNTR
junction shared by PVA, FVA, and SVA.
SLE and LES Domain Swaps
As the AlwNI/BstAPI sites of the LAVA_E and the SVA_E
(H19_27) elements differ by 1 nt, they were made compatible
by amplification of the 5′-hexameric repeats and Alu-like domains using downstream primers carrying the AlwNI recognition sequence of the respective other element. The
amplified 5′-ends of LAVA_E and SVA_E were subcloned
and then reciprocally combined with the SVA_E (H19_27)
VNTR/SINE-R and LAVA_E VNTR/LA using BstAPI and
AlwNI, respectively.
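The comparison of degenerate recognition sites between two elements, as described above for AlwNI/BstAPI, can be illustrated with a short Python sketch. It is offered as an illustration under stated assumptions, not as the procedure used here: the helper names are hypothetical, only the recognition sequences are modeled (cut positions and overhangs are not), and the example sequences are invented rather than taken from LAVA_E or SVA_E.

```python
import re

# Recognition sequences, N = any base. Cut positions are not modeled.
SITES = {
    "KpnI": "GGTACC",
    "NheI": "GCTAGC",
    "AlwNI": "CAGNNNCTG",
    "BstAPI": "GCANNNNNTGC",
}


def find_sites(seq, enzyme):
    """Return (position, matched sequence) for each occurrence of the site.

    All four sites above are their own reverse complement, so scanning one
    strand is sufficient for them.
    """
    pattern = re.compile("(?=(" + SITES[enzyme].replace("N", "[ACGT]") + "))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(seq.upper())]


def mismatches(site_a, site_b):
    """Count positions at which two equally long site sequences differ."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must have the same length")
    return sum(x != y for x, y in zip(site_a, site_b))


if __name__ == "__main__":
    element_1 = "TTCAGACGCTGAA"  # invented toy sequence
    element_2 = "TTCAGATGCTGAA"  # invented toy sequence
    site_1 = find_sites(element_1, "AlwNI")[0][1]
    site_2 = find_sites(element_2, "AlwNI")[0][1]
    print(site_1, site_2, mismatches(site_1, site_2))
```

Two sites that match the same degenerate pattern but differ in the N positions produce different overhangs, which is why a single-nucleotide difference matters for ligation.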
SLF and LFS Domain Swaps
LAVA_F1, due to its shorter Alu-like domain, does not offer
the possibility of direct exchange via AlwNI. For generation
of the SLF domain swap the exchange was, therefore, made
at the 3′-end. The SVA_E (H19_27) CT/Alu-like/VNTR was
amplified using a downstream primer complementary to the
VNTR 3′-end which, at its 5′-end, contained the first 6 bp of
the LAVA_F1 LA domain including an NcoI recognition site.
The SVA-derived amplification product was then combined
with the LAVA_F1 3′-region in pCEPNeo KpnI/NcoI/NheI.
The LFS domain swap was generated by amplification of
the LAVA_F1 5′-end using a primer with a SmaI recognition
site. The amplification product was then combined with
the SVA_E VNTR/SINE-R and cloned into pCEPNeo using
KpnI/SmaI/AlwNI(blunt)/NheI.
LAVA 3′-Deletion Mutants
LAVA 3′-deletion mutants were generated using Bal31 digestion. The resulting deletions were repaired at their 3′-ends
and transferred into pCEPNeo via KpnI/NheI (blunt).
pJM101 L1RPNeo
pJM101 L1RPNeo (Wei et al. 2001) was kindly provided by
John Moran.
pJM101 L1RPNeoORF1
pJM101 L1RPNeoORF1 (Raiz et al. 2012) was kindly provided by Gerald Schumann.
Tissue Culture and Retrotransposition Assays
HeLa HA cells (kindly provided by J. Moran and previously shown to support detectable levels of SVA retrotransposition [Raiz et al. 2012]) were cultured in DMEM
(Lonza) 4.5 g/l Glucose, 10% FCS. Cell-based assays to
assess retrotransposition in trans were carried out as
described previously (Moran et al. 1996; Raiz et al. 2012)
with minor modifications. Briefly, 4 × 10^5 cells were seeded
on T25 flasks 24 h before transfection. They were then
cotransfected with 2 µg test plasmid and 2 µg L1 expression
vector (pJM101 L1RPNeo) or pCEP4 (Invitrogen), respectively, using X-tremeGENE 9 (Roche) according to the
manufacturer’s instructions. For assays with hygromycin
selection, the medium was changed 24 h posttransfection to
medium containing 200 µg/ml hygromycin (Invitrogen).
Cells were divided after six days of hygromycin selection
and selection was continued with 50% of the cells for another
six days. Where appropriate the other half of the cells was
kept for RNA isolation. After a total of 12 days of hygromycin
selection cells (~7 × 10^6) were trypsinized and seeded directly
into medium containing 400 µg/ml G418 (Invitrogen).
In case of test vectors displaying higher retrotransposition
rates 1:5/1:10 dilutions were seeded to facilitate counting of
individual colonies. In pilot experiments different dilutions
were plated to validate that starting cell number does not
have an influence on selection conditions and outcome of the
experiment. G418 selection was carried out for 10–12 days.
Subsequently, cells were stained with Giemsa (Merck) and
colonies were counted. For assays without hygromycin selection the medium was changed 24 h posttransfection and cells
were reseeded 48 h posttransfection. G418 selection was
initiated 72 h posttransfection and continued for 12 days.
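The conversion of colony counts into relative retrotransposition rates (with SVA_E set to 100%) can be sketched as follows. This Python example is an illustration only: the exact normalization used for the figures is not spelled out here, so the pairing of each test count with the reference count from the same experiment, the correction for 1:5/1:10 dilutions by simple multiplication, and the function names are all assumptions, and the colony counts are invented.

```python
from statistics import mean, stdev


def relative_rates(test_colonies, reference_colonies,
                   test_dilution=1, reference_dilution=1):
    """Per-experiment rates of a test construct in % of the reference.

    Counts obtained from a 1:5 or 1:10 dilution are scaled back by passing
    5 or 10 as the corresponding dilution factor (assumption).
    """
    if len(test_colonies) != len(reference_colonies):
        raise ValueError("one reference count is needed per experiment")
    rates = []
    for test, reference in zip(test_colonies, reference_colonies):
        if reference <= 0:
            raise ValueError("reference colony count must be positive")
        rates.append(100.0 * (test * test_dilution) / (reference * reference_dilution))
    return rates


def summarize(rates):
    """Return (mean, sample standard deviation); SD is 0.0 for a single value."""
    return mean(rates), (stdev(rates) if len(rates) > 1 else 0.0)


if __name__ == "__main__":
    # Invented counts: test construct plated undiluted, reference plated 1:5.
    rates = relative_rates([12, 18, 15], [40, 50, 45], reference_dilution=5)
    print(summarize(rates))
```

Fold changes such as those quoted for the SP and SF chimeras then follow as the ratio of two such mean relative rates.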
RNA Isolation, Northern Blot Analysis, and RT-PCR
RNA was isolated from cells after 12 days of hygromycin
selection using the peqGOLD Total RNA Kit (PEQLAB
Labortechnologie GmbH). Northern Blot analysis was carried
out using 8 µg of total RNA and the NorthernMax-Gly Kit
(Ambion, Life Technologies). Membranes were hybridized
with a biotin-labeled intron-spanning neo sense riboprobe.
The Chemiluminescent Nucleic Acid Detection Module Kit
(Pierce, Thermo Scientific) was used for detection.
For RT-PCR, 3 µg of total RNA were DNase I (Fermentas)
digested; 1.5 µg of these were then reverse transcribed using
Superscript II (Invitrogen) according to the manufacturer’s
instructions. The remaining 1.5 µg served as negative control
(-RT). Amplification of VNTR–neo-spliced transcripts of
SVA_E and the chimeras containing the SVA_E VNTR was
achieved using the respective upstream (KpnI site-containing) primer used for reamplification (supplementary table S4,
Supplementary Material online) in combination with a downstream primer localized at the 3′-end of the neo ORF
(GS88 5′-CCTTCTATCGCCTTCTTGACGAGTTCTTC-3′; Neo_DW 5′-CTTCTATCGCCTTCTTGACG-3′;
or SVA_Neo_Down_1 5′-ACCGCTTCCTCGTGCTTTAC-3′). The resulting amplicons were
subcloned into pJET1.2 (Thermo Scientific) and sequenced.
Characterization of LAVA_F1 De Novo Integrations
Following transfection with pAD10LAVA_F1/pJM101
L1RPNeo and G418 selection, single colonies were grown
up and genomic DNA was isolated using the DNeasy Blood &
Tissue kit (Qiagen). The presence of the spliced mneoI
cassette was determined by PCR using primers GS86/GS87
(Raiz et al. 2012). Genomic DNA was then digested with MscI/
NheI or MscI/SacI and 3′-ends of de novo integrations were
determined using EPTS-LM PCR as described previously
(Kirilyuk et al. 2008; Raiz et al. 2012). Subsequently, primers
in the upstream genomic sequence were designed and de
novo integration 5′-ends were amplified.
Supplementary Material
Supplementary tables S1–S4 and figures S1–S4 are available at
Molecular Biology and Evolution (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors thank the Gibbon Genome Sequencing
Consortium for making NLE genome sequences available
before publication. Furthermore, we wish to thank Christian
Roos for providing NLE genomic DNA, John Moran for
providing plasmids pJM101 L1RP and pJM101 L1RPNeo
as well as HeLa-HA cells and Gerald Schumann for plasmid
pJM101 L1RPNeoORF1. The authors would also like to thank
the anonymous reviewers of this and an earlier version of
the manuscript for helpful comments. This work was
supported by grants of the Ministry of National Education,
CNCS–UEFISCDI, project number PN-II-ID-PCE-2012-4-0090
(to A.D.) and PN-II-IDEI-PCCE 312/2008 (to O.P.).
References
Ahn K, Kim HS. 2009. Structural and quantitative expression analyses of
HERV gene family in human tissues. Mol Cells. 28:99–103.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local
alignment search tool. J Mol Biol. 215:403–410.
Carbone L, Harris RA, Gnerre S, Veeramah KR, Lorente-Galdos B,
Huddleston J, Meyer TJ, Herrero J, Roos C, Aken B, et al. 2014.
Gibbon genome and the fast karyotype evolution of small apes.
Nature 513:195–201.
Carbone L, Harris RA, Mootnick AR, Milosavljevic A, Martin DI, Rocchi
M, Capozzi O, Archidiacono N, Konkel MK, Walker JA, et al. 2012.
Centromere remodeling in Hoolock leuconedys (Hylobatidae) by a
new transposable element unique to the gibbons. Genome Biol Evol.
4:648–658.
Chan YC, Roos C, Inoue-Murayama M, Inoue E, Shih CC, Pei KJ, Vigilant
L. 2010. Mitochondrial genome sequences effectively reveal the phylogeny of Hylobates gibbons. PLoS One 5:e14419.
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a
sequence logo generator. Genome Res. 14:1188–1190.
Damert A, Raiz J, Horn AV, Lower J, Wang H, Xing J, Batzer MA, Lower R,
Schumann GG. 2009. 5’-Transducing SVA retrotransposon groups
spread efficiently throughout the human genome. Genome Res. 19:
1992–2008.
Desmet FO, Hamroun D, Lalande M, Collod-Beroud G, Claustres M,
Beroud C. 2009. Human Splicing Finder: an online bioinformatics
tool to predict splicing signals. Nucleic Acids Res. 37:e67.
Dewannieux M, Esnault C, Heidmann T. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat Genet. 35:41–48.
Doucet AJ, Hulme AE, Sahinovic E, Kulpa DA, Moldovan JB, Kopera HC,
Athanikar JN, Hasnaoui M, Bucheton A, Moran JV, et al. 2010.
Characterization of LINE-1 ribonucleoprotein particles. PLoS Genet. 6
Feng Q, Moran JV, Kazazian HH Jr, Boeke JD. 1996. Human L1
retrotransposon encodes a conserved endonuclease required for
retrotransposition. Cell 87:905–916.
Freeman JD, Goodchild NL, Mager DL. 1994. A modified indicator gene
for selection of retrotransposition events in mammalian cells.
Biotechniques 17:46, 48–49, 52.
Han K, Konkel MK, Xing J, Wang H, Lee J, Meyer TJ, Huang CT, Sandifer
E, Hebert K, Barnes EW, et al. 2007. Mobile DNA in Old World
monkeys: a glimpse through the rhesus macaque genome. Science
316:238–240.
Hancks DC, Ewing AD, Chen JE, Tokunaga K, Kazazian HH Jr. 2009. Exon-trapping mediated by the human retrotransposon SVA. Genome
Res. 19:1983–1991.
Hancks DC, Goodier JL, Mandal PK, Cheung LE, Kazazian HH Jr. 2011.
Retrotransposition of marked SVA elements by human L1s in cultured cells. Hum Mol Genet. 20:3386–3400.
Hancks DC, Kazazian HH Jr. 2010. SVA retrotransposons: evolution and
genetic instability. Semin Cancer Biol. 20:234–245.
Hancks DC, Kazazian HH Jr. 2012. Active human retrotransposons:
variation and disease. Curr Opin Genet Dev. 22:191–203.
Hancks DC, Mandal PK, Cheung LE, Kazazian HH Jr. 2012. The minimal
active human SVA retrotransposon requires only the 5’-hexamer
and Alu-like domains. Mol Cell Biol. 32:4718–4726.
Israfil H, Zehr SM, Mootnick AR, Ruvolo M, Steiper ME. 2011.
Unresolved molecular phylogenies of gibbons and siamangs
(Family: Hylobatidae) based on mitochondrial, Y-linked, and
X-linked loci indicate a rapid Miocene radiation or sudden vicariance event. Mol Phylogenet Evol. 58:447–455.
Jurka J. 2000. Repbase update: a database and an electronic journal of
repetitive elements. Trends Genet. 16:418–420.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz
J. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110:462–467.
Kimberland ML, Divoky V, Prchal J, Schwahn U, Berger W, Kazazian
HH Jr. 1999. Full-length human L1 insertions retain the capacity
for high frequency retrotransposition in cultured cells. Hum Mol
Genet. 8:1557–1560.
Kirilyuk A, Tolstonog GV, Damert A, Held U, Hahn S, Lower R,
Buschmann C, Horn AV, Traub P, Schumann GG. 2008.
Functional endogenous LINE-1 retrotransposons are expressed
and mobilized in rat chloroleukemia cells. Nucleic Acids Res. 36:
648–665.
Kopera HC, Moldovan JB, Morrish TA, Garcia-Perez JL, Moran JV. 2011.
Similarities between long interspersed element-1 (LINE-1) reverse
transcriptase and telomerase. Proc Natl Acad Sci U S A. 108:
20345–20350.
Mills RE, Bennett EA, Iskow RC, Devine SE. 2007. Which transposable
elements are active in the human genome? Trends Genet. 23:
183–191.
Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian
HH Jr. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87:917–927.
Ono M, Yasunaga T, Miyata T, Ushikubo H. 1986. Nucleotide sequence
of human endogenous retrovirus genome related to the mouse
mammary tumor virus genome. J Virol. 60:589–598.
Ostertag EM, Goodier JL, Zhang Y, Kazazian HH Jr. 2003. SVA elements
are nonautonomous retrotransposons that cause disease in
humans. Am J Hum Genet. 73:1444–1451.
Raiz J, Damert A, Chira S, Held U, Klawitter S, Hamdorf M, Lower J,
Stratling WH, Lower R, Schumann GG. 2012. The non-autonomous
retrotransposon SVA is trans-mobilized by the human LINE-1
protein machinery. Nucleic Acids Res. 40:1666–1683.
Reese MG, Eeckman FH, Kulp D, Haussler D. 1997. Improved splice site
detection in Genie. J Comput Biol. 4:311–323.
Ianc et al. doi:10.1093/molbev/mst256
Schneider TD, Stephens RM. 1990. Sequence logos: a new way to display
consensus sequences. Nucleic Acids Res. 18:6097–6100.
Shen L, Wu LC, Sanlioglu S, Chen R, Mendoza AR, Dangel AW, Carroll
MC, Zipf WB, Yu CY. 1994. Structure and genetics of the partially
duplicated gene RP located immediately upstream of the complement C4A and the C4B genes in the HLA class III region. Molecular
cloning, exon-intron structure, composite retroposon, and
breakpoint of gene duplication. J Biol Chem. 269:8466–8476.
Wang H, Xing J, Grover D, Hedges DJ, Han K, Walker JA, Batzer MA.
2005. SVA elements: a hominid-specific retroposon family. J Mol Biol.
354:994–1007.
Wei W, Gilbert N, Ooi SL, Lawler JF, Ostertag EM, Kazazian HH, Boeke JD,
Moran JV. 2001. Human L1 retrotransposition: cis preference versus
trans complementation. Mol Cell Biol. 21:1429–1439.
Xing J, Hedges DJ, Han K, Wang H, Cordaux R, Batzer MA. 2004. Alu
element mutation spectra: molecular clocks and the effect of DNA
methylation. J Mol Biol. 344:675–682.
Xing J, Wang H, Belancio VP, Cordaux R, Deininger PL, Batzer MA. 2006.
Emergence of primate genes by retrotransposon-mediated sequence
transduction. Proc Natl Acad Sci U S A. 103:17608–17613.
Zhang L, Zhang F, Huo K. 2003. Cloning and characterization of a novel
splicing variant of the ZADH1 gene. Cytogenet Genome Res. 103:79–83.