Hominoid Composite Non-LTR Retrotransposons: Variety, Assembly, Evolution, and Structural Determinants of Mobilization

Bianca Ianc,†,1 Cornelia Ochis,†,1 Robert Persch,2 Octavian Popescu,1,3 and Annette Damert*,1
1 Institute for Interdisciplinary Research in Bio-Nano-Sciences, Molecular Biology Center, Babes-Bolyai-University, Cluj-Napoca, Romania
2 Wiesbaden, Germany
3 Institute of Biology, Romanian Academy, Bucharest, Romania
† These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Katja Nowick

Abstract

SVA (SINE-R-VNTR-Alu) elements constitute the youngest family of composite non-LTR retrotransposons in hominoid primates. The order in which they were assembled, however, remains unclear. A second family of VNTR-containing composites, LAVA (L1-Alu-VNTR-Alu), has recently been identified in gibbons. We now report two additional VNTR composite families in the genome of Nomascus leucogenys: PVA (PTGR2-VNTR-Alu) and FVA (FRAM-VNTR-Alu). Like LAVA, they share the 5′-Alu-like region and the VNTR with SVA but differ at their 3′-ends. The 3′-end of PVA comprises part of the PTGR2 gene, whereas FVA is characterized by a partial FRAM element in its 3′-domain. Splicing was identified as the mechanism by which all four families of VNTR composites acquired their variant 3′-ends. SVAs have been shown to be mobilized in trans by the L1 protein machinery, and a critical role in this process has been ascribed to their 5′-hexameric repeat/Alu-like region. The Alu-like region displays specific features in each of the VNTR composite families/subfamilies, with characteristic deletions in the evolutionarily younger subfamilies. Using reciprocal exchanges between SVA_E and PVA/FVA elements, we demonstrate that the structure, not the mere presence, of the (CCCTCT)n/Alu-like region determines mobilization capacity.
Combining LAVA and SVA_E domains does not yield active elements, suggesting that the two major groups of VNTR composites use different combinations of host factors. Finally, we demonstrate that the LAVA 3′-L1ME5 fragment attenuates mobilization capacity.

Key words: retrotransposon, SVA, VNTR, Nomascus leucogenys.

© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 31(11):2847–2864 doi:10.1093/molbev/mst256 Advance Access publication September 12, 2014
Article

Introduction

SVA (SINE-R-VNTR-Alu), a family of hominoid-specific retrotransposons, has a unique composite structure that combines sequences derived from other retroelements (Alu and HERV-K) with a variable number of tandem repeats (VNTR) region (Shen et al. 1994; Ostertag et al. 2003). SVA elements appear to be among the most versatile vectors for shuffling sequences and thus have an appreciable impact on genome evolution. SVAs can transduce heterologous sequences at both their 5′- (Damert et al. 2009; Hancks et al. 2009) and 3′- (Xing et al. 2006) ends. They have also been shown to function as exon traps, incorporating coding sequence into the elements proper through splicing (Damert et al. 2009; Hancks et al. 2009).

SVAs have been shown to be mobilized in trans by the L1 protein machinery. Results regarding dependency on L1 ORF1p are controversial and depend on whether an ORF2p-only expression vector or bicistronic vectors carrying ORF1 mutations or an in-frame deletion were used (Hancks et al. 2011; Raiz et al. 2012). Reduction of the retrotransposition rate following deletion of the SVA 5′-hexameric repeat/Alu-like region suggested a functional role for this part of the element (Raiz et al. 2012). A more recent analysis revealed that none of the SVA domains is essential for retrotransposition and identified the 5′-end as the "minimal active human SVA" (Hancks et al. 2012).

Recently, a second family of primate composite retrotransposons, LAVA, has been identified in gibbons (Damert A, unpublished data [Carbone et al. 2012]). LAVA elements share with SVA the 5′-CCCTCT hexameric repeats and Alu-like region as well as the central VNTR. The 3′-end (LA domain) comprises two unique regions (U1 and U2) separated by an AluSz sequence and followed by an antisense L1ME5 fragment. Around 1,800 LAVA copies are found in the genome of the Northern white-cheeked gibbon, Nomascus leucogenys (NLE). They can be subdivided into 22 subfamilies (Carbone et al. 2014).

The shared 5′-end and central VNTR suggest a common ancestor for SVA and LAVA. The prototype SVA element was most likely assembled after the divergence of hominoids and Old World monkeys but before the split of gibbons and great apes, as SVAs are not found in the rhesus macaque (Han et al. 2007). 5′-Truncated LAVA copies (VNTR-LA) are found in regions of segmental duplication in humans and great apes but are absent from the rhesus genome (Damert A, unpublished data). Based on this observation, the prototype LAVA element can be assumed to have been assembled in approximately the same period as SVA. The sequence and mechanism(s) of assembly of these prototype elements, however, remain elusive. All four domains constituting an SVA/LAVA element are independently present in the rhesus genome.
The most likely precursor of SVA/LAVA is SVA2, a VNTR carrying heterologous sequence at its 3′-end, terminating in a poly(A) tail and flanked by target site duplications (TSDs) (Jurka 2000; Jurka et al. 2005; Han et al. 2007). Other assembly intermediates, for example an SVA2 fused to the Alu-like region characteristic of SVA/LAVA, have not been identified to date, either in the rhesus genome or in the genomes of the great apes. The identification of a second family of VNTR composite retrotransposons in gibbons (Carbone et al. 2012) suggested that such intermediates might be found in this sister taxon of the great apes. To address this issue, we analyzed the genome of the Northern white-cheeked gibbon, NLE (Carbone et al. 2014). Although we failed to detect SVA assembly intermediates, we discovered two additional families of VNTR-containing composites that share the CCCTCT hexameric repeats and Alu-like region with SVA and LAVA elements but differ at their 3′-ends: PVA (PTGR2-VNTR-Alu) and FVA (FRAM-VNTR-Alu). The distinct characteristics of the PVA and FVA VNTR/3′ intersections, as well as of those found in LAVA and in the NLE SVA copies, suggest splicing as the mechanism of assembly for the founder elements of VNTR composites.

The identification of four families of VNTR composite retrotransposons that share the 5′ and VNTR domains but differ at their 3′-ends prompted us to investigate their retrotransposition potential and the contribution of their functional domains in more detail. Using a previously established cell-based assay (Moran et al. 1996; Raiz et al. 2012), we found considerable differences in mobilization potential between PVA and FVA on the one hand and LAVA elements on the other.
Results obtained following reciprocal exchange of the 5′-hexameric repeat/Alu-like region between PVA/FVA and an active SVA_E element indicate that it is not the mere presence but the specific sequence-based structure of this domain that determines retrotransposition efficiency. LAVA and SVA_E domains were found to be incompatible, suggesting different mobilization pathways for these two families. Finally, we demonstrate that the LAVA 3′-L1ME5 sequence has an inhibitory effect on retrotransposition.

Results

The NLE Genome Harbors Four Different Types of VNTR-Containing Composite Retrotransposons

A search of the NLE whole-genome shotgun sequences (wgs) (Carbone et al. 2014) using the Alu-like region of the SVA_A consensus (Wang et al. 2005) retrieved a number of composite elements that were flanked by TSDs and carried CCCTCT hexameric repeats and the SVA Alu-like region at their 5′-ends (fig. 1). As in SVA, these were followed by VNTR regions of variable length. At the 3′-ends, however, four different types of sequence could be distinguished, one of them being the SVA SINE-R. The most numerous family, LAVA (Carbone et al. 2012), is represented by approximately 1,800 elements falling into 22 subfamilies (Carbone et al. 2014). The 3′-end of the second largest family does not contain any repetitive sequences. Instead, exon 4 and the 5′-part of intron 4 of the gene encoding prostaglandin reductase 2 (PTGR2) are fused to the 5′-part of the unique sequence characteristic of SVA2 elements. The third type of VNTR-containing composite is characterized by a 3′-end that includes part of a FRAM (Free Right Alu Monomer) element embedded in otherwise nonrepetitive sequence. According to their distinguishing features, these two families were named PVA (PTGR2-VNTR-Alu) and FVA (FRAM-VNTR-Alu). Thus, four different families of VNTR-containing composite retrotransposons are present in the NLE genome.
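Only elements flanked by TSDs were accepted as bona fide insertions, but the text does not describe how TSDs were called. The Python sketch below shows one simple way to do it and is not the authors' procedure. It assumes the caller supplies the genomic sequence that ends exactly at the element's 5′ boundary and the sequence that starts immediately after its poly(A) tail. A TSD of length L is then accepted when the last L bases upstream and the first L bases downstream differ at no more than a set number of sites. The function name `find_tsd` and the default length and mismatch limits are hypothetical choices.

```python
from typing import Optional, Tuple

def find_tsd(upstream: str, downstream: str, min_len: int = 7, max_len: int = 25,
             max_mismatches: int = 1) -> Optional[Tuple[str, str]]:
    """Return the (5' copy, 3' copy) of the longest candidate TSD, or None.

    upstream   -- genomic sequence ending exactly at the element's 5' boundary
    downstream -- genomic sequence starting right after the element's poly(A) tail
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    up, down = upstream.upper(), downstream.upper()
    longest = min(max_len, len(up), len(down))
    # Try the longest candidate first, so the first accepted hit is the longest TSD.
    for length in range(longest, min_len - 1, -1):
        left = up[-length:]      # bases immediately 5' of the element
        right = down[:length]    # bases immediately 3' of the poly(A) tail
        mismatches = sum(a != b for a, b in zip(left, right))
        if mismatches <= max_mismatches:
            return left, right
    return None
```

For example, with the 11-bp TSD AAGATTCTTGA (integration #1 in Table 3) embedded in arbitrary flanks, `find_tsd("C" * 10 + "AAGATTCTTGA", "AAGATTCTTGA" + "G" * 10)` returns that sequence twice. A-rich TSDs, which are common, can merge with the poly(A) tail. Where the downstream flank is taken to begin therefore matters.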
Gibbon SVAs Amplified Independently from Those in the Great Ape Lineage

Twenty-nine SVA elements were identified in the NLE genome (table 1 and supplementary table S1, Supplementary Material online). The consensus sequence constructed from them is closest to human SVA_A but displays diagnostic substitutions compared with the human elements (fig. 2A and B). Inspection of the orthologous loci in the human, chimpanzee, and orangutan genomes showed that SVA elements are absent from the corresponding positions in great apes in all cases where this could be assessed. For a small number of elements, their state could not be determined in one or all of the great ape genomes because of missing sequence information. Based on divergence in the SINE-R region, the amplification time was estimated at 11.3 Ma. This places the expansion of gibbon SVAs well after the split from great apes (~20 Ma) but before the separation of the four gibbon genera (5.6–8 Ma) (Chan et al. 2010; Israfil et al. 2011).

PVA and FVA Elements: Copy Number and Origin

In total, 143 PVA and 11 FVA elements were identified in the NLE genome sequence (table 1 and supplementary tables S2 and S3, Supplementary Material online). TSDs were found for most of the elements. Neither PVA nor FVA elements display a subfamily structure. Their Alu-like regions are most closely related to SVA_A/SVANLE and the ancestral LAVA_A2 subfamily (Carbone et al. 2014) (fig. 2A). The 3′-part of PVA elements consists of exon 4 and the 5′-part of intron 4 of the PTGR2 gene. A polyadenylation signal present in PTGR2 intron 4 is used for 3′-end processing of PVA RNA. In rhesus macaques, the PTGR2 gene is located on chromosome 7; the corresponding NLE sequence maps to chr22a:32,514,746–32,515,039 (fig. 2C). PVA elements represent the second case in which an entire exon of a protein-coding gene has been incorporated into a VNTR composite.
The SVA subfamily F1 acquired exon 1 of MAST2 through splicing to its Alu-like region (Damert et al. 2009; Hancks et al. 2009).

Interestingly, the sequence immediately upstream of PTGR2 exon 4 in PVA elements corresponds to the 5′-part of the SVA2 unique 3′-sequence. This indicates that SVA2, or a derivative of it, is the precursor of PVA elements. The 3′-part of FVA elements could be traced back to an ancestral sequence located on chromosome 12 in Macaca mulatta. The corresponding NLE sequence maps to chromosome 22a:85,484,874–85,485,192 (fig. 2D).

FIG. 1. VNTR-containing composite retrotransposons in the genome of NLE. All families comprise CCCTCT hexameric repeats and an Alu-like region at their 5′-ends but have acquired different sequences downstream of the VNTR region. SVA2, which is not strictly a VNTR composite, is shown for comparison. Al, partial AluSz sequence; L1, partial L1ME5 sequence; PTGR2, exon 4 and 5′-part of intron 4 of the gene encoding prostaglandin reductase 2; FR, partial FRAM sequence; An, poly(A) tail. Polyadenylation signals are denoted with asterisks; ovals represent TSDs. Schematic representations of LAVA_A2 and LAVA_F2 are based on the Gibbon Genome Sequencing Consortium subfamily categorization (Carbone et al. 2014).

Table 1. Summary Statistics of PVA, FVA, and SVA Elements in the NLE Genome.

Family | Count | 5′-n.d.(a) | 5′-Truncated | 5′-Transductions (Spliced RNAs)(b) | 5′- and 3′-Transductions (Spliced RNAs)(b) | 3′-Transductions
PVA | 143 | 22 | 41 (33.6%) | 4 (3) | 3 (3) | 16
FVA | 11 | 1 | 5 (45.5%) | – | – | 6
SVANLE | 29 | 1 | 6 (20.7%) | 3 (1) | – | 6

(a) 5′-n.d.: 5′-end could not be determined due to assembly gaps.
(b) Spliced RNAs: the number of 5′-transductions constituted by spliced cellular RNAs is given in parentheses.
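The amplification time given above for gibbon SVAs (11.3 Ma) was estimated from divergence in the SINE-R region, but the calculation itself is not spelled out. A common approach, assumed here and not taken from the study, divides the mean divergence of individual copies from their consensus by a neutral substitution rate. A Jukes–Cantor correction can optionally be applied first. The Python sketch below illustrates that approach. The function names and the rate passed in the usage example are hypothetical placeholders, not the authors' parameters.

```python
import math
from statistics import mean
from typing import Iterable

def p_distance(copy: str, consensus: str) -> float:
    """Fraction of differing sites between an aligned copy and the consensus.

    Columns with a gap ('-') or an ambiguous base ('N') in either sequence are skipped.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differences += int(a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def age_from_divergence(copies: Iterable[str], consensus: str,
                        rate_per_site_per_myr: float, correct: bool = False) -> float:
    """Assumed model: age (Myr) = mean divergence from consensus / substitution rate."""
    distances = [p_distance(c, consensus) for c in copies]
    if correct:
        distances = [jukes_cantor(d) for d in distances]
    return mean(distances) / rate_per_site_per_myr
```

With toy aligned copies that differ from a 10-bp consensus at one, one, and zero sites, the mean p-distance is about 0.067. A placeholder rate of 0.01 substitutions per site per Myr would therefore give an age of about 6.7 Myr. In a real analysis, the outcome depends entirely on the rate chosen.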
Promiscuous Splicing at the 3′-End Generates VNTR Composite Variety

In PVA elements, the intersection between SVA2-derived and PTGR2-derived sequences coincides precisely with the PTGR2 intron 3/exon 4 junction. This indicates that splicing is the most likely mechanism by which the PVA 3′-end was acquired. Computational analysis of the SVA2 3′-unique sequence to which the PTGR2 exon had been fused identified a splice donor (consensus MAG/gtragt) at the position of fusion (fig. 3B), further supporting PVA assembly through splicing. For SVA, a recent analysis based on the Repbase consensus sequence SVARep did not identify a splice site in HERV-K corresponding to the VNTR/env intersection (Hancks and Kazazian 2010). However, closer inspection of SVANLE (supplementary fig. S1A, Supplementary Material online) and, subsequently, of human SVA_A (not shown) revealed that SVA2 sequence is still present in these ancestral families. Alignment to SVA2 (Repbase) and HERV-K10 (GenBank accession number M14123 [Ono et al. 1986]; supplementary fig. S1A, Supplementary Material online) showed that the same SVA2 donor is used as in PVA. Splice site prediction confirmed the acceptor (consensus cag/G) in the HERV-K10 env sequence (fig. 3C). The ancestral sequence of the LAVA 3′-part maps to intron 2 of the gene encoding hydroxysteroid (17-beta) dehydrogenase 3 (HSD17B3); an alignment of the LAVA 3′-end to its source sequence is given in supplementary fig. S1B, Supplementary Material online. The ancestral sequence of the FVA 3′-part maps to a region of NLE chromosome 22a for which no transcripts are reported (fig. 2D). We therefore used computational prediction to establish whether splicing might also have been operative in 3′-end assembly for these two families. The analysis revealed that in both cases a splice donor site in the VNTR region could have been used. Splice acceptor sites were identified at appropriate positions in the respective ancestral sequences (fig. 3D and E). Thus, splicing is a possible mechanism for the acquisition of 3′-sequence in LAVA and FVA elements as well.

FIG. 2. Alignment of VNTR composite Alu-like regions (A) and the 3′-domains of SVANLE (B), PVA (C), and FVA (D). Substitutions specific to SVANLE are highlighted in black in A and B. Deletions specific to evolutionarily younger subfamilies are boxed in A. Consensus sequences of SVA_A and SVA_B are taken from Wang et al. (2005). LAVA subfamily consensus sequences are those established by the Gibbon Genome Sequencing Consortium (Carbone et al. 2014). In C and D, the 3′-domains of PVA and FVA are aligned to their source loci in the NLE PTGR2 gene and at chromosome 22a:85,484,874–85,485,192, respectively. The corresponding sequences in Macaca mulatta (MMU) are given for comparison. Splice acceptors are highlighted in black in C and D. The arrowhead in C marks the PTGR2 exon 4/intron 4 boundary. The PTGR2 intron 4 polyadenylation signal is boxed in C. The part of the sequence repeat-masked as FRAM is boxed in D. NLE, Nomascus leucogenys.

FIG. 3. Promiscuous splicing at the 3′-end generates VNTR composite variety. (A) Consensus sequences found at the exon–intron junctions (marked with arrows) of spliceosomal introns. (B–E) Sequences spliced to give rise to variant VNTR composite 3′-ends. Each panel shows splice donor sites in the SVA2 3′-unique sequence or VNTR (top sequence). Below these (line 2) are the corresponding acceptor sites in cellular RNA or, in the case of SVA, endogenous retroviral RNA. The sequences found in the resulting PVA (B), SVANLE/SVA_A (C), LAVA (D), and FVA (E) elements are given at the bottom of each panel. Exon sequences are bold and uppercase; intron sequences are lowercase. The 100% conserved residues at the 5′- and 3′-ends of introns are bold. The part of the SVA2 3′-unique sequence retained in PVA and SVANLE/SVA_A elements is underlined. The SVA2 sequence was obtained from Repbase; HERV-K10: GenBank accession number M14123 (Ono et al. 1986).

Nomascus VNTR Composites Can Be Mobilized by L1-Encoded Proteins in trans in Human Cells

Having identified four different families of VNTR composites in the NLE genome, we next asked whether they can be mobilized in human cells with human L1RP (Kimberland et al. 1999) as driver. To this end, we amplified one copy each of PVA (1 kb), FVA (1 kb), and SVANLE (1.4 kb), as well as two LAVA elements (2 and 2.2 kb; table 2 and supplementary fig. S2, Supplementary Material online), from NLE genomic DNA. We cloned them upstream of the mneoI reporter cassette (Freeman et al. 1994; Moran et al. 1996), following a strategy similar to that used for the SVA in pAD3SVA_E (Raiz et al. 2012). The reporter cassette consists of a neomycin resistance gene driven by an SV40 promoter; transcription terminates at a thymidine kinase polyadenylation signal. The entire transcription cassette is placed in antisense orientation relative to the VNTR composite element, and the neomycin phosphotransferase coding sequence is interrupted by an intron in sense orientation. With this arrangement, G418-resistant (G418R) cells arise only when a transcript initiated from the promoter driving the VNTR composite is spliced, reverse transcribed, and reintegrated into chromosomal DNA (fig. 4). The LAVA elements were chosen from the phylogenetically younger subfamilies LAVA_E and LAVA_F1 (Carbone et al. 2014). LAVA_F1 elements are characterized by an Alu-like region comprising only 182 bp (fig. 2A). The truncation is most likely due to a splicing event, as computational analysis predicts a splice donor site at the point of truncation.
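The splice-site arguments above rest on how well candidate sites match the donor consensus MAG/gtragt (M = A or C, R = A or G; fig. 3A). The prediction method is not named here. The Python sketch below only illustrates the underlying idea: it scores each position by the number of matches to the nine-position IUPAC consensus and requires the invariant intronic GT. The function name `scan_donors` and the default score threshold are hypothetical, and real predictors use position-specific weights rather than simple match counts.

```python
from typing import List, Tuple

# IUPAC codes needed for the donor consensus MAG/gtragt (fig. 3A).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR = "MAGGTRAGT"   # three exonic positions followed by six intronic positions
EXON_LEN = 3          # the exon/intron boundary lies after the third position

def scan_donors(seq: str, min_score: int = 7) -> List[Tuple[int, int]]:
    """Return (boundary_index, score) for candidate 5' splice sites in seq.

    boundary_index is the 0-based position of the first intronic base (the g of gt);
    score is the number of the nine consensus positions that match.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(DONOR) + 1):
        window = seq[i:i + len(DONOR)]
        if window[EXON_LEN:EXON_LEN + 2] != "GT":   # invariant intron start
            continue
        score = sum(base in IUPAC[code] for base, code in zip(window, DONOR))
        if score >= min_score:
            hits.append((i + EXON_LEN, score))
    return hits
```

As a usage example, the SVA_E VNTR donor reported in the Northern blot analysis (AG/gtgag) scores 8 of 9 when embedded in short flanks: `scan_donors("CCAGGTGAGCC")` returns `[(4, 8)]`.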
The LAVA_F1 element tested has been shown to be specific to NLE and polymorphic within NLE (Carbone et al. 2014). Based on copy number in the NLE genome, we expected the elements to differ in retrotransposition potential, with FVA and PVA mobilized less efficiently than LAVA. As expected, PVA and FVA were mobilized close to the level obtained with the pseudogene control vector pCEPNeo (15–20% of SVA_E). The NLE SVA reached only about half the activity of the human SVA_E (Raiz et al. 2012), which was used as standard.

Table 2. VNTR Composite Retrotransposons Amplified from NLE Genomic DNA.

Family/Subfamily | Identifier | Position in GGSC Nleu3.0/nomLeu3 | TSD Sequence | Length of the Element, Including TSDs (bp) | CT Hexameric Repeats in the mneoI-Tagged Constructs (bp)
PVA | Nl_P16_2 | chr16:71,184,467–71,185,653 | AAAGAATGGCAGAAAA | 1,021 | 82
FVA | Nl_F18 | chr18:86,030,704–86,032,071 | AAAATTCGCAATAAACCA/AAAATTCTCAGTAAAACA | 991 | 75
SVANLE | Nl_S5_1 | chr5:81,980,179–81,981,574 | AAAGAAATTAACCTAATA | 1,354 | 90
LAVA_E | | chr2:155,391,066–155,392,835 | AAAAAAAAAAAAGAAGTCAA | 1,991 | 58
LAVA_F1 | | chr3:108,773,434–108,775,518 | AGAAAACACCGACGT | 2,213 | 122
SVA_E | H19_27 | | | 1,899 | 138

NOTE. Mismatches in TSDs are bold and underlined.

FIG. 4. Schematic representation of the cell culture retrotransposition assay. G418-resistant (G418R) cells can arise only if the mneoI-tagged VNTR composite element is transcribed and spliced and the spliced copy is reintegrated into the genome. Reverse transcription and integration are mediated by L1 proteins encoded on a cotransfected vector. Following integration, the neo ORF is transcribed from its own promoter, conferring G418 resistance. SD, splice donor; SA, splice acceptor; G418S, G418 sensitive; An, poly(A) tail. The mneoI polyadenylation signal is marked with an asterisk.
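The logic of the mneoI reporter (fig. 4) can be made concrete with a toy model. The Python sketch below is a simplified illustration, not a model of the real construct. The "neo" exons and the intron are arbitrary placeholder sequences, and splicing is reduced to removing a known intron bounded by GT...AG. The sketch shows two points. A transcript read in the element's orientation can remove the intron. The uninterrupted neo ORF then appears only on the reverse complement of the spliced copy, that is, after reverse transcription and integration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def splice_first_intron(rna: str, intron: str) -> str:
    """Toy splicing: remove one copy of `intron` if it reads GT...AG in this RNA."""
    if intron.startswith("GT") and intron.endswith("AG") and intron in rna:
        return rna.replace(intron, "", 1)
    return rna

# Hypothetical placeholder sequences (not the real neo gene or mneoI intron).
NEO_EXON1 = "ATGAAACCCGGGTTTAAA"
NEO_EXON2 = "CCCGGGAAATTTCCCTAA"
INTRON = "GTAAGTACTTTTTCTAG"   # GT...AG when read in the element's orientation

# neo is antisense to the element, while the intron is sense, so the transcript
# from the element's promoter reads revcomp(exon2) + intron + revcomp(exon1).
element_transcript = revcomp(NEO_EXON2) + INTRON + revcomp(NEO_EXON1)

def reporter_outcome(splice_element_rna: bool) -> bool:
    """True if an uninterrupted neo ORF exists after retrotransposition (toy model)."""
    rna = element_transcript
    if splice_element_rna:
        rna = splice_first_intron(rna, INTRON)
    # Reverse transcription and integration copy the RNA into DNA; the neo promoter
    # then transcribes the opposite strand, i.e. the reverse complement.
    neo_strand = revcomp(rna)
    return NEO_EXON1 + NEO_EXON2 in neo_strand
```

In this toy model, `reporter_outcome(True)` is `True` and `reporter_outcome(False)` is `False`. A neo transcript made directly from the plasmid carries the intron as CT...AC and cannot be spliced. This mirrors why G418 resistance reports retrotransposition of the spliced element RNA rather than plasmid expression.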
Surprisingly, the two LAVA elements differed dramatically in their mobilization rates. LAVA_E consistently retrotransposed below pseudogene levels, whereas LAVA_F1 was approximately twice as active as SVA_E (fig. 5A). For none of the constructs tested were G418-resistant colonies detected following cotransfection with the empty pCEP4 vector, which served as negative control.

Our analysis had identified splicing as the mechanism of assembly of VNTR composites. We therefore next assessed whether RNAs transcribed from our test vectors are spliced at sites other than those in the mneoI cassette, and whether such additional splicing affects the availability of full-length mneoI-spliced RNA. Northern blot analysis revealed a single species of full-length mneoI-spliced RNA for FVA (2.4 kb), LAVA_E (3.4 kb), LAVA_F1 (3.6 kb) (marked by asterisks in fig. 5B), and SVANLE (2.8 kb, fig. 7C) following transfection with the respective vectors. For PVA, two bands corresponding to mneoI-spliced RNAs were detected. The longer one, of 2.4 kb, represents the expected PVA full-length mneoI-spliced RNA. The shorter one (~2.2 kb) most likely results from transcription initiation within PVA. Supporting this, amplification from cDNA with an upstream primer at the element's 5′-end and a downstream primer in the mneoI cassette yields a single product, which corresponds to full-length PVA without additional splicing (data not shown). Hancks et al. (2011) have discussed transcription initiation within the VNTR for 5′-truncated de novo integrants derived from an SVA lacking an exogenous promoter. Surprisingly, RNA isolated from cells transfected with SVA_E mneoI showed a second hybridization signal at around 2.2 kb in addition to the expected 3.3 kb full-length mneoI-spliced RNA.
Reverse transcription-polymerase chain reaction (RT-PCR) analysis revealed that this hybridization signal represents an mneoI-spliced RNA that is additionally spliced. This second splice joins a donor (AG/gtgag) in the SVA_E VNTR to an acceptor (ag/A) at the very 3′-end of the neomycin phosphotransferase (neo) ORF. Because it lacks the neo stop codon and polyadenylation signal, this RNA cannot give rise to neomycin-resistant cells after reverse transcription and integration. The VNTR splice donor corresponds to the one predicted for the 3′-assembly of LAVA and FVA (fig. 3D and E). This further supports splicing as the mechanism by which VNTR composites acquired their variant 3′-ends. Interestingly, an RNA spliced only between the VNTR and neo does not appear to be generated at detectable levels (fig. 5B). Regarding the splicing of SVA combined with the mneoI cassette, the RNA detected by Hancks et al. (2011) for SVA.2mneoI migrates well below 3 kb, although the full-length mneoI-spliced transcript is expected to be 3.5 kb. In light of our findings, this RNA may likewise be a double-spliced transcript. The much less abundant full-length mneoI-spliced RNA would then not be visible in the exposure shown.

Taken together, the results of the Northern blot analysis indicate that the amount of full-length mneoI-spliced RNA available for retrotransposition cannot explain the observed differences in mobilization potential. PVA and FVA are mobilized at low levels despite comparatively high amounts of available RNA (fig. 5B). Likewise, for SVANLE, RNA level and retrotransposition capacity relative to SVA_E do not correlate (figs. 5A and 7C). SVA_E, LAVA_E, and LAVA_F1 show comparable levels of full-length mneoI-spliced RNA (compare also fig. 7C), yet their mobilization potential differs.

FIG. 5. NLE VNTR composites can be mobilized by L1RP in HeLa HA cells. (A) Results of retrotransposition reporter assays following selection with hygromycin and G418. Cells were cotransfected with driver (pJM101 L1RP Neo) and the respective plasmids containing mneoI-tagged VNTR composites. Values represent the average of 3–9 independent experiments ± standard deviation. *For SVANLE, a 1:5 dilution is shown. (B) Northern blot analysis of the transcripts generated following transfection of the mneoI-tagged VNTR composite constructs. The bands corresponding to the full-length mneoI-spliced RNAs are marked with asterisks. The expected lengths are: SVA_E 3.3 kb, PVA 2.4 kb, FVA 2.4 kb, LAVA_E 3.4 kb, and LAVA_F 3.6 kb. The left-hand panel schematically depicts the isoforms generated through splicing of the SVA_E mneoI transcript. (C) In SVA_E mneoI transcripts, splicing occurs between the VNTR and the 3′-end of the neomycin phosphotransferase (neo) ORF. Exon/intron junctions and the branchpoint are marked with arrowheads. The exon/exon junction in the resulting spliced sequence is marked by a vertical bar. Consensus splice donor, branchpoint, and splice acceptor sequences are given in the top panel. Exon sequences are uppercase; intron sequences (except for the branchpoint) are lowercase. The 100% conserved residues at the 5′- and 3′-ends of introns are bold and underlined. The branchpoint is marked in uppercase. HSV TK pA, HSV TK polyadenylation cassette of mneoI.

Table 3. LAVA_F1 De Novo Integrations.
Id | Insertion Site (hg19) | Gene | Orientation | TSD | EN Site (TTTT/AA)(a) | polyA
#1 | chr2:173,960,322 | ZAK intron 2 | Antisense | AAGATTCTTGA | TCTT/AA | 23
#1A | chr4:48,364,737 | SLAIN2 intron 1 | Antisense | AAAAAAAAAAAAA | TTTT/AA | 20
#1B | chr2:133,636,276 | NCKAP5 intron 9 | Sense | GAAAAGGAAGTGT | TTTC/AA | 63
#1C | chr11:61,763,000 | | | AAAGAAAAATGCCC | CTTT/CA | 40
#1E | chr1:150,617,001 | GOLPH3L downstream | Sense | AAAAAAAAAAAAAAGAAAA | TTTT/GA | 39
#2 | chr14:77,879,917 | NOXRED1 intron 2 | Antisense | AAAAATATAAGGCCAA | TTTT/AA | 45
#2D | chr2:42,291,366 | PKDCC downstream | Sense | AAGAAAAGGCTCTC | TCTT/GA | 90

NOTE. EN, endonuclease.
(a) Consensus EN recognition site (Feng et al. 1996).

To confirm that the G418-resistant colonies obtained after LAVA_F1 transfection indeed result from retrotransposition, we characterized the integration sites in seven clones. LAVA_F1 de novo integrants resemble those of SVA (Hancks et al. 2011, 2012; Raiz et al. 2012). They are mostly full-length (6/7), contain poly(A) tails of variable length (20–90 bp), and are flanked by TSDs (11–19 bp). By contrast, only about 50% of the genomic LAVA insertions for which the 5′-end could be determined are full-length (Carbone et al. 2014). Except for integration 1C, the insertion sites resemble the L1 endonuclease consensus cleavage site (5′-TTTT/AA-3′ [Feng et al. 1996]). In that case, the bottom strand might have been 3′-processed before reverse transcription. Hancks and Kazazian (2012), building on Kopera et al. (2011), proposed this for genomic insertions with atypical L1 endonuclease sites of consensus 5′-YYYY/YN-3′. The actual endonuclease site of integration 1C would then be 5′-TTTC/AG-3′. Four of the seven insertions occurred in introns of genes and two downstream of genes (table 3). Similar to what has been observed for SVA de novo integrants (summarized over the integrations reported in [Hancks et al. 2011, 2012; Raiz et al.
2012]), there does not appear to be a strand bias for insertions occurring in or near genes. By contrast, only 20% of intragenic SVAs (Hancks et al. 2009) and 25% of intragenic LAVAs (Carbone et al. 2014) are on the coding strand.

The SVA_E Hexameric Repeat/Alu-Like Domain Enhances PVA and FVA Retrotransposition

In a previous publication, we demonstrated that deleting the CT hexameric repeats and Alu-like region reduces SVA retrotransposition efficiency by 50% (Raiz et al. 2012). More recently, Hancks et al. (2012) provided evidence that the CT-Alu-like domain constitutes the minimal active SVA. PVA and FVA both possess CT-Alu-like domains, yet they do not retrotranspose significantly above pseudogene level. We therefore reasoned that efficient mobilization depends not on the presence of the 5′-domain (CT hexamer plus Alu-like region) but on its specific sequence/structure. Alignment of the Alu-like domains of SVA (Wang et al. 2005) and LAVA (Carbone et al. 2014) subfamilies, PVA, and FVA reveals distinct characteristics for each of them (fig. 2A). The ancestral Alu-like domains of SVA_A, SVANLE, LAVA_A2, FVA, and PVA lack all of the deletions found in the evolutionarily younger subfamilies. It is also worth noting that these families (SVANLE, FVA, PVA; this study) and subfamilies (SVA_A [Wang et al. 2005], LAVA_A2 [Carbone et al. 2014]) reached only comparatively low copy numbers in their respective genomes. Our hypothesis was that mobilization potential is determined not by the mere presence of the CT-Alu-like domain but by its sequence. Because VNTR composites are noncoding, this presumably acts through the structure the sequence determines. To test this, we generated domain swaps by reciprocally exchanging the SVA_E and PVA/FVA 5′-regions. The chimeras constructed are schematically depicted in figure 6A.
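The reciprocal domain swaps follow a simple rule. Each chimera takes the 5′-domain (CT hexameric repeats plus Alu-like region) from one element and the VNTR and 3′-domain from another, joined at the Alu-like/VNTR junction (fig. 6A). The Python sketch below assumes that each element is given as a sequence plus the 0-based coordinate of that junction. It names chimeras by 5′ donor followed by 3′ donor, as in the text (SP = SVA_E 5′-domain on PVA). The class and function names and the labeled placeholder sequences are hypothetical, and the sketch says nothing about the actual cloning strategy. The SLF chimera, exchanged at the VNTR 3′-end, would need a different junction coordinate.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Element:
    seq: str
    junction: int  # 0-based index of the first VNTR base (end of the Alu-like region)

    def __post_init__(self) -> None:
        if not 0 < self.junction < len(self.seq):
            raise ValueError("junction must fall inside the sequence")

    @property
    def five_prime(self) -> str:
        """CT hexameric repeats plus Alu-like region."""
        return self.seq[:self.junction]

    @property
    def three_prime(self) -> str:
        """VNTR plus the family-specific 3'-domain."""
        return self.seq[self.junction:]

def reciprocal_swaps(elements: Dict[str, Element],
                     pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Build both chimeras for each pair; key = 5' donor code + 3' donor code."""
    chimeras = {}
    for a, b in pairs:
        chimeras[a + b] = elements[a].five_prime + elements[b].three_prime
        chimeras[b + a] = elements[b].five_prime + elements[a].three_prime
    return chimeras
```

For example, with text labels standing in for sequences, `Element("hexSaluSvntrSsineR", 8)` for SVA_E and `Element("hexPaluPvntrPptgr2", 8)` for PVA give `"SP" = "hexSaluSvntrPptgr2"` and `"PS" = "hexPaluPvntrSsineR"`.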
SVA_E is efficiently mobilized in our assay system. If the sequence/structure of the CT-Alu-like domain is indeed the key determinant of VNTR composite retrotransposition mediated by L1 in trans, the SP and SF domain swaps should therefore be mobilized more efficiently than their parental elements (PVA, FVA). Conversely, the PS and FS chimeras, which contain the ancestral PVA/FVA 5′-domains, were expected to be mobilized less efficiently than SVA_E. The results (fig. 6B) support this hypothesis. The SP and SF chimeras, which contain the SVA_E hexameric repeats and Alu-like domain, are mobilized 4- and 7-fold more efficiently than PVA and FVA, respectively. Transcript patterns and steady-state levels of full-length mneoI-spliced RNAs are comparable between the parental elements (PVA, FVA) and the corresponding chimeras (SP, SF) (fig. 6C). The ancestral 5′-domains of PVA and FVA, on the other hand, drastically reduce the mobilization capacity of the respective chimeras (PS, FS) compared with SVA_E (fig. 6B). The ratio of full-length mneoI-spliced to double-spliced transcript is roughly equal for SVA_E and the two chimeras. For FS, however, the smaller amount of available full-length mneoI-spliced RNA may have influenced the observed retrotransposition rate; this cannot be completely excluded (fig. 6C, right panel).

FIG. 6. The structure of the hexameric repeat/Alu-like region determines the retrotransposition potential of SVA/PVA/FVA. (A) Schematic representation of the SVA–PVA/FVA chimeras tested. 5′-Domains of SVA and PVA/FVA were reciprocally exchanged at the Alu-like/VNTR junction. (B) The resulting chimeras were cotransfected with an L1RP expression vector, and cells were subjected to consecutive hygromycin and G418 selection. Retrotransposition rates (+ standard deviation) are given relative to that of SVA_E (100%). Numbers above columns denote the number of experiments taken into account for each construct. Brackets link chimeras and parental elements that can be directly compared based on identical transcript patterns. (C) Northern blot analysis of the transcripts generated following transfection of the mneoI-tagged domain swap constructs. The bands corresponding to the full-length mneoI-spliced RNAs are marked with asterisks (left panel) or an arrowhead (right panel). Brackets link chimeras and parental elements that can be directly compared based on identical transcript patterns. The expected lengths of the full-length mneoI-spliced RNAs are: SVA_E 3.3, SP 2.5, SF 2.5, PVA 2.4, PS 3.3, FVA 2.4, and FS 3.3 kb. A side-by-side comparison of the spliced transcripts resulting from SVA_E, PS, and FS is shown on the right.

FIG. 7. SVA and LAVA domains are incompatible. (A) Schematic representation of the SVA–LAVA chimeras tested. 5′-Domains of SVA and LAVA_E/LAVA_F were reciprocally exchanged at the Alu-like/VNTR junction. The exception is SLF, for which the exchange was made at the 3′-end of the SVA_E VNTR. The fine structure of the junctions of the SVA/LAVA chimeras is given in supplementary figure S3, Supplementary Material online. (B) The resulting chimeras were cotransfected with an L1RP expression vector, and cells were subjected to consecutive hygromycin and G418 selection. Retrotransposition rates (+ standard deviation) are given relative to that of SVA_E (100%). Numbers above columns denote the number of independent experiments for each construct. The results obtained with the pCEPNeo pseudogene control are given for comparison. (C) Northern blot analysis of the transcripts generated following transfection of the mneoI-tagged domain swap constructs. SVANLE mneoI-tagged RNA analyzed in the same experiment is shown in addition. The bands corresponding to the full-length mneoI-spliced RNAs are marked with asterisks. The expected lengths are: SVA_E 3.3, SLE 3.5, SLF 3.3, LAVA_E 3.4, LES 3.2, LFS 3.1, LAVA_F 3.6, and SVANLE 2.8 kb. Note that the splicing pattern of the SLF chimera, for which the domain exchange was made at the VNTR 3′-end, corresponds to that of SVA_E.

Incompatibility of LAVA and SVA Domains Suggests Different Pathways for LAVA and SVA Mobilization

Based on the results obtained with the SVA–PVA/FVA domain swaps, we expected the SVA_E CT-Alu-like domain to have a similar effect on retrotransposition of the "inactive" LAVA_E element. However, the SLE chimera showed only a slight increase in mobilization potential compared with LAVA_E (fig. 7B; schematic representations of the chimeras are given in fig. 7A and supplementary fig. S3, Supplementary Material online). RNA levels of the SLE chimera and the parental LAVA_E element are similar (fig. 7C), so the observed retrotransposition rates can be directly compared. Domain swaps carrying the LAVA_E or LAVA_F1 hexameric repeats and Alu-like domain at their 5′-ends (LES, LFS) could not be mobilized above pseudogene level (fig. 7B). This result had been expected for the LAVA_E CT-Alu-like domain. Combining domains from the two active elements LAVA_F1 and SVA_E, however, had been predicted to yield an active element. This is not the case. Domains of the two large families of VNTR composites, LAVA and SVA, are therefore incompatible, and, by extension, the two families might use mobilization pathways with different structural requirements. Results obtained with the last chimera, SLF, support this assumption: the SVA_E hexameric repeats and Alu-like domain are not functional when combined with the LAVA_F1 3′-part. Because they share the same VNTR region, the LES, LFS, and SLF chimeras show the same splicing pattern as SVA_E.
No correlation was observed between the amount of full-length mneoI-spliced transcripts and the retrotransposition rates.

As the domain swap experiments indicated that SVA and LAVA might use different mobilization pathways, we next set out to explore whether LAVA retrotransposition depends on L1 ORF1p. It has previously been shown that L1 ORF1p is dispensable for Alu retrotransposition (Dewannieux et al. 2003), whereas there are conflicting data on the ORF1p requirement in SVA mobilization (Hancks et al. 2011; Raiz et al. 2012). Using an L1 ORF2-only driver, Hancks et al. found a canonical SVA_D to be independent of ORF1p. It is, however, possible that endogenous ORF1p is sufficient to support retrotransposition in this setting. In the presence of a bicistronic driver containing a double mutation in ORF1, trans mobilization of SVA_D was reduced to background levels, suggesting that SVA retrotransposition requires both L1-encoded proteins (Hancks et al. 2011). Results consistent with the latter finding were obtained by Raiz et al. (2012) for an SVA_E using a driver containing an ORF1 in-frame deletion. The conflicting results most likely reflect differences in the availability of ORF2p, in the ORF1p/ORF2p ratio (including endogenous ORF1p), and in the formation and composition of L1 RNPs (Doucet et al. 2010) among the three experimental approaches. Cotransfection of the mneoI-tagged LAVA_F1 with the driver carrying the L1 ORF1 in-frame deletion (Raiz et al. 2012) did not yield any colonies following G418 selection. This result provides a first indication that mobilization of LAVA requires L1 ORF1p.

FIG. 8. The LAVA_F1 3′-L1ME5 sequence inhibits retrotransposition. (A) Schematic representation of the LAVA 3′-domain indicating the sites at which the respective constructs have been truncated. Numbers are given relative to the first nucleotide of the LAVA_F1 3′-domain. (B) Retrotransposition reporter assay following selection with G418 only. Cells were cotransfected with driver (pJM101 L1RPNeo) and the respective mneoI-tagged LAVA_F1 3′-deletion mutants or the LAVA_F1 full-length construct. Retrotransposition rates (+ standard deviation) are given relative to that of the full-length LAVA_F1 construct (100%). The inset shows the Northern blot analysis of the RNAs generated from the two shortest LAVA_F 3′-deletions. The expected lengths of the full-length mneoI-spliced RNAs are: LAVA_F, 3.6 kb; deletion 15, 3.2 kb; and deletion 92, 3.3 kb.

The LAVA 3′-L1ME5 Sequence Inhibits Retrotransposition

Analysis of the structure of retrotransposon genomic copies can provide insights into the mechanisms of, and requirements for, their mobilization. A whole-genome survey of LAVA elements in the NLE genome revealed that at least 97 of them (~5%) are 3′-truncated, the vast majority through premature polyadenylation (supplementary fig. S4, Supplementary Material online). In contrast to SVAs in the human genome, for which premature polyadenylation events were found to be distributed over the entire length of the SINE-R region (Damert A, unpublished data), LAVA premature polyadenylation occurs exclusively downstream of the simple repeat (U2) region. The minimum length of the 3′-part of LAVA genomic copies is thus around 300 bp. One explanation for this finding could be that there are no suitable polyadenylation signals in the U1-AluSz-U2 part of the LAVA 3′-end. Alternatively, the U1-AluSz-U2 region might be absolutely required for retrotransposition. To test this latter hypothesis, we generated nested LAVA_F1 3′-deletions lacking most of the antisense L1ME5 sequence (332, fig. 8A), the U2 3′-part and the L1ME5 (227, fig. 8A), the AluSz 3′-part, U2, and L1ME5 (92, fig. 8A), and the AluSz, U2, and L1ME5 (15, fig. 8A), respectively.
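The deletion series addresses the second explanation; the first one, that the U1-AluSz-U2 part offers no suitable polyadenylation signals, is a sequence-level claim that could be checked by scanning for poly(A) signal hexamers. The sketch below is illustrative only and is not the analysis performed in this study. It assumes that just the canonical AATAAA and the common ATTAAA variant count as signals (weaker variants and downstream elements are ignored), and the region names and coordinates passed to the hypothetical helper `pas_free_regions` stand in for annotated U1, AluSz, U2, and L1ME5 boundaries.

```python
import re

# Assumption: only the canonical poly(A) signal and its most common variant are
# scanned; weaker hexamer variants and downstream GU/U-rich elements are ignored.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(seq: str, hexamers: tuple[str, ...] = PAS_HEXAMERS) -> list[tuple[int, str]]:
    """Return sorted (0-based start, hexamer) pairs for sense-strand matches.

    A zero-width lookahead is used so that overlapping matches such as
    AATAAATAAA are all reported.
    """
    seq = seq.upper()
    hits = []
    for hexamer in hexamers:
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start(), hexamer))
    return sorted(hits)


def pas_free_regions(seq: str, regions: dict[str, tuple[int, int]]) -> dict[str, bool]:
    """Map each named half-open region [start, end) to True if it holds no complete signal."""
    hits = find_pas(seq)
    return {
        name: not any(start <= pos and pos + 6 <= end for pos, _ in hits)
        for name, (start, end) in regions.items()
    }


# Toy sequence, not a real LAVA 3'-end.
toy = "GGGGAATAAAGGGATTAAACC"
print(find_pas(toy))                                               # [(4, 'AATAAA'), (13, 'ATTAAA')]
print(pas_free_regions(toy, {"left": (0, 10), "tail": (19, 21)}))  # {'left': False, 'tail': True}
```

A region reported as True would only be consistent with the absence of canonical signals on the sense strand; signal strength, spacing to a cleavage site, and read-through are not modeled.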
Because LAVA_F1, when cotransfected with L1RP, had been found to yield acceptable colony counts after selection with G418 only, the experiments were carried out without hygromycin preselection. Surprisingly, all of the deletion mutants yielded more G418-resistant colonies than the full-length LAVA_F1 element (fig. 8B). One explanation for this finding could be that, in the full-length construct, the use of alternative polyadenylation sites in the L1ME5 antisense fragment leads to transcription termination upstream of the mneoI cassette, thus reducing the amount of mneoI-containing transcript available for retrotransposition. Northern blot analysis, however, did not support this scenario: deletion mutants lacking the L1ME5 antisense fragment do not yield significantly higher amounts of full-length mneoI-spliced RNA than the full-length construct (inset in fig. 8B). We therefore conclude that the antisense L1ME5 sequence, which is missing in all deletion mutants but present in the full-length element, has an inhibitory effect on LAVA_F1 mobilization by L1-encoded proteins in trans.

Discussion

Assembly of VNTR-Containing Composites

The sequence and mechanism(s) of assembly of the prototype VNTR composite retrotransposon(s) have, to date, remained elusive. Based on the presence of the VNTR region, which is shared by all VNTR composites, it has been reasonable to assume that SVA2 is their common ancestor. With regard to the Alu-like region, Hancks and Kazazian suggested a series of splicing events to explain its mosaic structure. Although they discuss splicing as the mechanism of acquisition for the SINE-R as well, sequence analysis using the SVARep consensus led them to conclude that the VNTR-SINE-R fusion is most likely the result of template switching (Hancks and Kazazian 2010). We now provide evidence that the 3′-domains of all four families of VNTR composites have been acquired through splicing.
In the case of SVA and PVA, SVA2 could unambiguously be identified as the molecule providing the splice donor; it must, therefore, be the precursor of these families. Phylogenetic analysis (fig. 9A) indicates that the SVANLE/SVA_A/B and PVA Alu-like parts are derived from a common ancestor, so independent acquisition of this domain appears unlikely. Taken together, these two findings (SVA2 as precursor and a common ancestor for PVA and SVA) suggest that an SVA2 already carrying the Alu-like domain at its 5′-end existed at one point in evolution. This “Alu-SVA2” subsequently acquired two different 3′-domains, SINE-R and PTGR2 exon 4/intron 4, through splicing to HERV-K and PTGR2 RNAs, respectively. Whether splicing occurred in cis or in trans can no longer be determined, as no Alu-SVA2 elements have been preserved during evolution. For splicing to occur in cis (as illustrated in fig. 9B), copies of Alu-SVA2 must have existed at an appropriate distance upstream of a genomic HERV-K copy and in PTGR2 intron 3, respectively. Alternatively, splicing could have happened in trans, combining Alu-SVA2 and HERV-K/PTGR2 transcripts derived from different loci.

FIG. 9. Assembly of VNTR composite retrotransposons through splicing to “Alu-SVA2.” (A) PhyML-generated maximum-likelihood tree of the Alu-like domains of ancestral VNTR composite families/subfamilies. (B and C) Schematic representation of the assembly of PVA and SVA prototype elements through splicing to the SVA2 3′-unique sequence (B) and of LAVA and FVA prototype elements through splicing to the SVA2 VNTR region (C). The SVA2 3′-unique sequence retained in PVA and SVA is colored dark gray. Exons are represented by numbered boxes. Polyadenylation signals are denoted with asterisks. TSDs are shown as arrows. SA, splice acceptor; FR, FRAM partial sequence.
For the assembly of LAVA and FVA, splice donors in the VNTR region were used. Formally, they could thus have originated from either Alu-SVA2 or an element carrying a different 3′-end (SVA, PVA). Phylogenetic analysis, however, places the FVA and LAVA Alu-like domains on a branch separate from that leading to SVA and PVA (fig. 9A). On this basis, derivation of FVA and LAVA from PVA/SVA appears unlikely. Potential splice acceptors could be identified at appropriate positions in the source sequences of both the FVA (on chromosome 22a in NLE) and the LAVA (in HSD17B3 intron 2) 3′-ends (fig. 3 and supplementary fig. S1B, Supplementary Material online). Figure 9C illustrates a possible scenario for acquisition of the FVA and LAVA 3′-ends through splicing to Alu-SVA2 in cis. Whereas both HERV-K (Ahn and Kim 2009) and PTGR2 (Zhang et al. 2003), the precursors of the SVA and PVA 3′-ends, have been shown to be expressed in germ cells/testis, to date there is no transcript or expressed sequence tag (EST) annotated for the locus of origin of the FVA 3′-part. The LAVA 3′ ancestral sequence maps to intron 2 of HSD17B3, which is expressed in testis germ cells. However, there is no EST support for alternative splicing at the site used for fusion to the VNTR region in LAVA. Thus, if splicing is involved, the 3′-assembly of both LAVA and FVA must be the result of rare events.

Differential Mobilization of SVA, PVA, and FVA Can Be Attributed to the Structure of Their 5′-Domains

The SVA 5′-part has long been suspected to play a crucial role in the mobilization of these composite retrotransposons. A first model (Ostertag et al. 2003; Mills et al. 2007) postulated hybridization of the SVA 5′-antisense Alu copies with ribosome-bound Alu elements as the mechanism facilitating interaction of SVA RNA with L1 proteins. Subsequently, we showed that deletion of the hexameric repeat/Alu-like region reduces the SVA retrotransposition rate by 50% (Raiz et al. 2012).
More recently, the hexameric repeat/Alu-like domain has been shown to constitute the minimal active human SVA (Hancks et al. 2012). The identification of three additional families of VNTR composite non-LTR retrotransposons sharing the SVA 5′-domain (LAVA [Carbone et al. 2012; Damert A, unpublished data], PVA, and FVA [this study]) has now opened up the unique opportunity to assess the contribution of this functional domain in the context of different elements. The results show marked differences in the capacity of the elements to be mobilized by L1RP proteins in trans. PVA and FVA retrotransposition rates were found to be close to those obtained for the processed pseudogene control, in spite of the presence of the hexameric repeat/Alu-like domain in both elements. Thus, either the respective 3′-ends exert an inhibitory effect on trans mobilization or there are functionally relevant differences in the 5′-domain when compared with the efficiently mobilized SVA_E. The latter assumption is supported by the finding that evolutionarily younger (and presumably still active) LAVA and SVA subfamilies are characterized by specific deletions in the Alu-like domain, in contrast to the ancestral (low genomic copy number) families SVA_A, SVANLE, PVA, and FVA (fig. 2A). The relatively low mobilization rate of the NLE SVA is also in line with the hypothesis that the ancestral (“deletion-free”) type of Alu-like domain does not support efficient retrotransposition. Finally, chimeras composed of the SVA_E 5′-domain and the PVA/FVA VNTR/3′-end are efficiently mobilized, whereas combining the PVA/FVA 5′-ends with the SVA_E VNTR/SINE-R drastically reduces retrotransposition potential when compared with SVA_E. Based on these results, we conclude that the specific sequence-based structure of the hexameric repeat/Alu-like region is the critical parameter for the mobilization efficiency of SVA, PVA, and FVA.
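One measurable component of that 5′-structure is the length of the (CCCTCT)n hexameric repeat tract, which table 2 gives as 82, 75, 90, and 138 bp for PVA, FVA, SVANLE, and SVA_E, respectively. A minimal sketch for measuring such a tract is shown below; `longest_hexamer_tract` is a hypothetical helper, and it assumes only perfect, uninterrupted CCCTCT copies are counted. Real tracts contain variant units, so its output would not be expected to reproduce the annotated lengths.

```python
import re


def longest_hexamer_tract(seq: str, unit: str = "CCCTCT", min_copies: int = 2) -> tuple[int, int]:
    """Return (0-based start, length in bp) of the longest perfect tandem run of `unit`.

    Only exact, uninterrupted copies are counted, so a variant hexamer
    (for example CCCTCC) ends a run. Returns (-1, 0) if no run of at least
    `min_copies` copies exists.
    """
    # Builds e.g. "(?:CCCTCT){2,}": two or more back-to-back copies of the unit.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    best_start, best_len = -1, 0
    for match in pattern.finditer(seq.upper()):
        length = match.end() - match.start()
        if length > best_len:
            best_start, best_len = match.start(), length
    return best_start, best_len


toy = "AA" + "CCCTCT" * 5 + "GG" + "CCCTCT" * 3  # hypothetical sequence
print(longest_hexamer_tract(toy))                # (2, 30)
```

Lengths measured this way could be tabulated next to relative retrotransposition rates when exploring whether a minimal tract length matters; the function itself makes no functional prediction.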
Differences in the Alu-like domains are apparent (fig. 2A); how exactly they might influence the process of retrotransposition, however, remains to be elucidated. Deletions are more likely than single nucleotide substitutions to affect secondary structure; thus, it might be the specific folding of the 5′-domain that determines mobilization efficiency. Alternatively, protein binding might differ among the Alu-like domains, with possible effects on RNA stability, transport, and retrotransposition. Closer inspection of the elements revealed that there are also differences in the length of the hexameric repeat region. PVA, FVA, and SVANLE possess 82, 75, and 90 bp of CT repeats, respectively, whereas the hexameric repeat region of SVA_E spans 138 bp (table 2). Hancks et al. (2012) recently reported that deletion of the CT repeats significantly reduced SVA retrotransposition rates. The two elements tested in their experiments (SVA_D) contain hexameric repeat regions of 122 and 126 bp, respectively. They also observed that re-addition of 20/35 bp of CT repeats to the hexamer-deleted elements did not rescue SVA activity (Hancks et al. 2012). It is tempting to speculate that a minimal length of the hexameric repeat region might be required for efficient retrotransposition. The contribution of differences in the VNTR region to the observed differences in retrotransposition rates is the most difficult to assess. Comparing the results for PVA, FVA, and SVANLE, with their relatively short VNTRs (250–350 bp), with those for SVA_E (900-bp VNTR), one could assume that shorter VNTRs support lower retrotransposition rates than longer ones. However, two findings contravene this assumption. First, the SP and SF chimeras, in which the short PVA/FVA VNTRs are fused to the SVA_E CT/Alu-like domain, show a 4- and 7-fold increase in retrotransposition rate, respectively, when compared with their parental elements. Second, Hancks et al. (2012) reported an increase in retrotransposition following partial deletion of the VNTR region in SVA. How the internal organization of the VNTR domain (sequential arrangement of shorter and longer repeat units, presence of half-repeats, and internal deletions in the VNTR units) affects mobilization capacity remains to be elucidated.

LAVA Mobilization Has Structural Requirements Differing from Those of SVA

LAVA is the largest family of VNTR composites in the NLE genome. The two elements chosen for testing trans mobilization in vitro belong to the evolutionarily younger subfamilies LAVA_E and LAVA_F1. Strikingly, only the LAVA_F1 element is retrotransposed efficiently in our assay, whereas the LAVA_E element was found to be mobilized below pseudogene levels. In contrast to the results obtained with the PVA (SP) and FVA (SF) chimeras, the SVA_E 5′-domain was not able to rescue LAVA_E retrotransposition, nor was it functional when combined with the LAVA_F1 3′-end. Taken together with the fact that the LAVA_F1 hexameric repeat/Alu-like region (derived from an efficiently mobilized element) is inactive in the context of the SVA_E VNTR/SINE-R, these findings suggest that SVA and LAVA interact with different sets of host proteins or use different pathways for their mobilization. Efficient mobilization in the “SVA pathway”, which is used by PVA and FVA as well, requires a particular structure of the 5′-hexameric repeat/Alu-like region. The “LAVA pathway” evidently has different sequence/structural requirements, as shown by the inactivity of the SVA_E 5′-domain in the context of LAVA 3′-ends. Which of the domains is responsible for targeting a VNTR composite element to this pathway remains to be elucidated. Deletion analysis of the LAVA_F1 3′-end suggests that the antisense L1ME5 fragment has an inhibitory effect.
An inhibitory effect of sequences 3′ of the VNTR has also been suggested for SVA by Hancks et al. (2012), who found that “most of the SVA deletions lacking SINE-R sequences are more active than their full-length counterpart.” The AluSz and U2 regions of the LAVA 3′-end appear to be dispensable (at least for LAVA_F1 retrotransposition), as a deletion mutant lacking these sequences is still efficiently mobilized. Further characterization of the LAVA pathway will require testing additional elements, especially given that the element efficiently mobilized in our study carries the LAVA_F1-specific truncation in its 5′-domain, which sets it apart from the other subfamilies. Elements of other LAVA subfamilies are still polymorphic in NLE (Carbone et al. 2014), suggesting recent mobilization and, by implication, the presence of all structural features necessary for retrotransposition in trans. Analysis of a number of them should also answer the question of whether the results obtained here for the LAVA_E element are representative of the entire subfamily. Unfortunately, amplification of LAVA from genomic DNA is severely hampered by the fact that members of the younger subfamilies in particular frequently inserted in or close to other repetitive sequences (Damert A, unpublished data) and, in the case of polymorphic elements, by amplification bias toward the preintegration allele. Once BACs mapped to the genome assembly become publicly available, a more detailed characterization of the LAVA pathway will be possible.

Materials and Methods

Element Identification, Retrieval, Age Estimates, and Phylogenetic Analysis

PVA and FVA elements were initially identified using BLAST (Altschul et al. 1990) at http://blast.ncbi.nlm.nih.gov (last accessed September 18, 2014) against the NLE wgs database (Carbone et al. 2014) with the SVA_A consensus (Wang et al. 2005) as query sequence.
Retroelements carrying the SVA_A 5′-end and flanked by TSDs were repeat-masked using the RepeatMasker web server at http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker (last accessed September 18, 2014). Consensus sequences for the respective 3′-ends were constructed and subsequently used as BLAST queries to retrieve exhaustive sets of sequences from the GenBank wgs section (December 2010) and the NLE genome build 1.1 (October 2011) (Carbone et al. 2014). The elements were annotated manually. Sequence logos were generated using WebLogo 3 at http://weblogo.threeplusone.com (last accessed September 18, 2014) (Schneider and Stephens 1990; Crooks et al. 2004). All alignments were calculated using BioEdit. Consensus sequences were generated using a majority-rule approach. Age estimates for SVANLE were obtained by aligning the SINE-R parts of the elements to the consensus. Substitution densities were then calculated separately for CpG and non-CpG sites using a Python script. Neutral substitution rates of 0.0090/site per My and 0.0015/site per My were used for CpG and non-CpG substitutions, respectively (Xing et al. 2004). Phylogenetic analysis was carried out using PhyML at www.phylogeny.fr (last accessed September 18, 2014) with default parameters.

Splice Site Prediction

Splice site prediction was performed with Human Splicing Finder 2.4.1 (Desmet et al. 2009) at http://www.umd.be/HSF3/HSF.html (last accessed September 18, 2014) and with the splice site prediction tool at http://www.fruitfly.org/seq_tools/splice.html (last accessed September 18, 2014) (Reese et al. 1997).

Plasmid Constructs

All VNTR composite test vectors are based on pCEPNeo (Raiz et al. 2012). Elements were inserted via KpnI/NheI. Primers used for amplification are listed in supplementary table S4, Supplementary Material online. All amplification and cloning steps were verified by Sanger sequencing. The structure of the domain swaps is schematically depicted in figures 6A and 7A.
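The SVANLE age estimate described above under Element Identification (majority-rule consensus, separate CpG and non-CpG substitution densities, neutral rates of 0.0090 and 0.0015 substitutions/site per My) amounts to age = substitution density / neutral rate for each site class. The sketch below is not the authors' Python script; it assumes the SINE-R sequences are already aligned to equal length (for example exported from BioEdit), pools counts over all copies, skips gapped positions, and treats both bases of a consensus CG dinucleotide as CpG sites.

```python
from collections import Counter

CPG_RATE = 0.0090      # neutral substitutions/site per My at CpG sites (Xing et al. 2004)
NON_CPG_RATE = 0.0015  # neutral substitutions/site per My at non-CpG sites


def majority_consensus(aligned: list[str]) -> str:
    """Per-column majority base of equal-length aligned sequences.

    Gaps ('-') are not counted; ties go to the base encountered first
    (Counter.most_common ordering). Both choices are assumptions.
    """
    consensus = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


def cpg_mask(consensus: str) -> list[bool]:
    """Mark both positions of every CG dinucleotide in the consensus as CpG."""
    mask = [False] * len(consensus)
    for i in range(len(consensus) - 1):
        if consensus[i] == "C" and consensus[i + 1] == "G":
            mask[i] = mask[i + 1] = True
    return mask


def substitution_densities(aligned: list[str], consensus: str) -> tuple[float, float]:
    """Pooled (CpG, non-CpG) substitution densities of all copies versus the consensus."""
    mask = cpg_mask(consensus)
    subs = {True: 0, False: 0}
    sites = {True: 0, False: 0}
    for seq in aligned:
        for base, ref, is_cpg in zip(seq, consensus, mask):
            if base == "-" or ref == "-":
                continue  # gapped positions are not scored
            sites[is_cpg] += 1
            if base != ref:
                subs[is_cpg] += 1
    return tuple(subs[k] / sites[k] if sites[k] else 0.0 for k in (True, False))


def age_estimates_my(aligned: list[str]) -> tuple[float, float]:
    """Ages in My from the CpG and the non-CpG densities: density / neutral rate."""
    consensus = majority_consensus(aligned)
    cpg_density, non_cpg_density = substitution_densities(aligned, consensus)
    return cpg_density / CPG_RATE, non_cpg_density / NON_CPG_RATE


copies = ["ACGTA", "ACGTA", "ATGTA", "ACGTT"]   # toy alignment, not SVANLE data
print(majority_consensus(copies))               # ACGTA
print([round(d, 4) for d in substitution_densities(copies, "ACGTA")])  # [0.125, 0.0833]
print([round(a, 1) for a in age_estimates_my(copies)])                 # [13.9, 55.6]
```

Keeping the two site classes separate matters because CpG dinucleotides mutate several times faster than other sites; pooling them under a single rate would distort the age.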
pAD7PVA, pAD8FVA, pAD9LAVA_E, pAD10LAVA_F1, and pAD11SVANLE

The respective elements (positions and TSDs listed in table 2 and supplementary fig. S1, Supplementary Material online) were amplified from NLE genomic DNA (kindly provided by Christian Roos, Gene Bank of Primates at the German Primate Centre, Göttingen) using Phusion Hot Start II (Thermo Scientific) according to the manufacturer’s instructions. To amplify LAVA elements, DMSO was added to the reaction to a final concentration of 3%, and the denaturation time was extended to 30 s. Amplified elements were subcloned into pJET 1.2 (Thermo Scientific). Reamplifications were carried out using 5′-primers located directly upstream of the CT hexameric repeats and 3′-primers designed to exclude the elements’ polyadenylation signals. Upstream primers contain a KpnI site and downstream primers an NheI site. Reamplification products were subcloned again into pJET 1.2 for sequencing and further cloning. Finally, the elements were transferred into pCEPNeo via KpnI/NheI.

SP, PS, SF, and FS Domain Swaps

The 5′-hexameric repeats and Alu-like domains of the elements were combined with the VNTR/SINE-R of H19_27 (pAD3SVA_E [Raiz et al. 2012]) or with the VNTR/3′-ends of PVA and FVA, respectively, via the AlwNI site at the Alu-like–VNTR junction shared by PVA, FVA, and SVA.

SLE and LES Domain Swaps

As the AlwNI/BstAPI sites of the LAVA_E and the SVA_E (H19_27) elements differ by 1 nt, they were made compatible by amplifying the 5′-hexameric repeats and Alu-like domains with downstream primers carrying the AlwNI recognition sequence of the respective other element. The amplified 5′-ends of LAVA_E and SVA_E were subcloned and then reciprocally combined with the SVA_E (H19_27) VNTR/SINE-R and the LAVA_E VNTR/LA using BstAPI and AlwNI, respectively.
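The SLE/LES strategy relies on AlwNI (CAGNNN^CTG) and BstAPI (GCANNNN^NTGC) both leaving 3-nt 3′ overhangs made of degenerate N positions, so two ends anneal only if those three bases match. A minimal sketch for locating such sites and comparing their overhangs is given below, assuming the recognition sequences and cut positions as commonly listed by enzyme suppliers (worth confirming against the supplier's data) and searching the top strand only, which suffices because both sites are palindromic. The example sequence and helper names are hypothetical, and the text does not say whether the 1-nt difference between the LAVA_E and SVA_E sites lies within the overhang; the sketch only shows how such a mismatch would be detected.

```python
import re

# name: (recognition site with N = any base, top-strand cut, bottom-strand cut),
# cut positions counted in bases from the 5' end of the site on the top strand.
ENZYMES = {
    "AlwNI": ("CAGNNNCTG", 6, 3),     # CAGNNN^CTG   -> 3-nt 3' overhang
    "BstAPI": ("GCANNNNNTGC", 7, 4),  # GCANNNN^NTGC -> 3-nt 3' overhang
}


def find_sites(seq: str, enzyme: str) -> list[tuple[int, str]]:
    """Return (0-based site start, top-strand 3' overhang) for every site in `seq`."""
    site, top_cut, bottom_cut = ENZYMES[enzyme]
    # Lookahead allows overlapping sites; N is expanded to any unambiguous base.
    pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
    seq = seq.upper()
    return [(m.start(), seq[m.start() + bottom_cut:m.start() + top_cut])
            for m in pattern.finditer(seq)]


def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two such 3' overhangs anneal only if their top-strand bases are identical."""
    return overhang_a == overhang_b


toy = "AAGCAGTTACTGCAA"  # hypothetical junction with overlapping AlwNI and BstAPI sites
print(find_sites(toy, "AlwNI"))       # [(3, 'TTA')]
print(find_sites(toy, "BstAPI"))      # [(2, 'TTA')]
print(ends_compatible("TTA", "TTC"))  # False
```

In the toy sequence the two enzymes cut at the same positions and produce the same overhang, which is why a combined AlwNI/BstAPI site can be used from either side of a junction.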
SLF and LFS Domain Swaps

Because of its shorter Alu-like domain, LAVA_F1 does not allow direct exchange via AlwNI. For generation of the SLF domain swap, the exchange was therefore made at the VNTR 3′-end. The SVA_E (H19_27) CT/Alu-like/VNTR was amplified using a downstream primer complementary to the VNTR 3′-end which, at its 5′-end, contained the first 6 bp of the LAVA_F1 LA domain including an NcoI recognition site. The SVA-derived amplification product was then combined with the LAVA_F1 3′-region in pCEPNeo via KpnI/NcoI/NheI. The LFS domain swap was generated by amplification of the LAVA_F1 5′-end using a primer with a SmaI recognition site. The amplification product was then combined with the SVA_E VNTR/SINE-R and cloned into pCEPNeo using KpnI/SmaI/AlwNI(blunt)/NheI.

LAVA 3′-Deletion Mutants

LAVA 3′-deletion mutants were generated by Bal31 digestion. The resulting deletions were repaired at their 3′-ends and transferred into pCEPNeo via KpnI/NheI (blunt).

pJM101 L1RPNeo

pJM101 L1RPNeo (Wei et al. 2001) was kindly provided by John Moran.

pJM101 L1RPNeoORF1

pJM101 L1RPNeoORF1 (Raiz et al. 2012) was kindly provided by Gerald Schumann.

Tissue Culture and Retrotransposition Assays

HeLa HA cells (kindly provided by J. Moran and previously shown to support detectable levels of SVA retrotransposition [Raiz et al. 2012]) were cultured in DMEM (Lonza) with 4.5 g/l glucose and 10% FCS. Cell-based assays to assess retrotransposition in trans were carried out as described previously (Moran et al. 1996; Raiz et al. 2012) with minor modifications. Briefly, 4 × 10⁵ cells were seeded in T25 flasks 24 h before transfection. They were then cotransfected with 2 µg test plasmid and 2 µg L1 expression vector (pJM101 L1RPNeo) or pCEP4 (Invitrogen), respectively, using X-tremeGENE 9 (Roche) according to the manufacturer’s instructions. For assays with hygromycin selection, the medium was changed 24 h posttransfection to medium containing 200 µg/ml hygromycin (Invitrogen).
Cells were split after 6 days of hygromycin selection, and selection was continued with 50% of the cells for another 6 days. Where appropriate, the other half of the cells was kept for RNA isolation. After a total of 12 days of hygromycin selection, cells (~7 × 10⁶) were trypsinized and seeded directly into medium containing 400 µg/ml G418 (Invitrogen). For test vectors displaying higher retrotransposition rates, 1:5 or 1:10 dilutions were seeded to facilitate counting of individual colonies. In pilot experiments, different dilutions were plated to confirm that the starting cell number does not influence the selection conditions or the outcome of the experiment. G418 selection was carried out for 10–12 days. Subsequently, cells were stained with Giemsa (Merck) and colonies were counted. For assays without hygromycin selection, the medium was changed 24 h posttransfection and cells were reseeded 48 h posttransfection. G418 selection was initiated 72 h posttransfection and continued for 12 days.

RNA Isolation, Northern Blot Analysis, and RT-PCR

RNA was isolated from cells after 12 days of hygromycin selection using the peqGOLD Total RNA Kit (PEQLAB Labortechnologie GmbH). Northern blot analysis was carried out using 8 µg of total RNA and the NorthernMax-Gly Kit (Ambion, Life Technologies). Membranes were hybridized with a biotin-labeled intron-spanning neo sense riboprobe. The Chemiluminescent Nucleic Acid Detection Module Kit (Pierce, Thermo Scientific) was used for detection. For RT-PCR, 3 µg of total RNA were digested with DNase I (Fermentas), and 1.5 µg of this digest were then reverse transcribed using Superscript II (Invitrogen) according to the manufacturer’s instructions. The remaining 1.5 µg served as a negative control (−RT).
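Figures 6B, 7B, and 8B report colony-based retrotransposition rates relative to a reference construct set to 100% (SVA_E or full-length LAVA_F1), and higher-rate constructs were plated at 1:5 or 1:10 dilutions. A minimal sketch of that arithmetic follows, assuming that counts from diluted plates are scaled back by the dilution factor, that each test count is normalized to the reference count of the same experiment, and that the error bar is the sample standard deviation across experiments; the exact normalization is not spelled out in the text, and all numbers below are invented.

```python
from statistics import mean, stdev


def corrected_count(colonies: int, dilution: int = 1) -> int:
    """Scale colonies counted on a 1:`dilution` plate back to the full cell population."""
    return colonies * dilution


def relative_rate(test_counts: list[float], reference_counts: list[float]) -> tuple[float, float]:
    """Mean and sample SD of per-experiment rates, in % of the reference construct.

    Counts are paired by experiment (an assumption): test_counts[i] and
    reference_counts[i] come from the same transfection series.
    """
    if len(test_counts) != len(reference_counts) or not test_counts:
        raise ValueError("need one reference count per test count")
    percents = [100.0 * t / r for t, r in zip(test_counts, reference_counts)]
    spread = stdev(percents) if len(percents) > 1 else 0.0
    return mean(percents), spread


def fold_change(chimera_rate: float, parent_rate: float) -> float:
    """Ratio behind statements such as 'SF is mobilized 7-fold more efficiently than FVA'."""
    return chimera_rate / parent_rate


# Invented counts from two experiments; the reference was plated at 1:10.
reference = [corrected_count(40, 10), corrected_count(30, 10)]  # 400, 300
test = [corrected_count(20, 5), corrected_count(150)]           # 100, 150
mean_pct, sd_pct = relative_rate(test, reference)
print(round(mean_pct, 1), round(sd_pct, 1))                     # 37.5 17.7
print(fold_change(28.0, 4.0))                                   # 7.0
```

Normalizing within each experiment before averaging removes run-to-run differences in transfection efficiency, which is one plausible reason for reporting rates relative to a reference construct.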
Amplification of VNTR–neo-spliced transcripts of SVA_E and of the chimeras containing the SVA_E VNTR was achieved using the respective upstream (KpnI site-containing) primer used for reamplification (supplementary table S4, Supplementary Material online) in combination with a downstream primer located at the 3′-end of the neo ORF (GS88 5′-CCTTCTATCGCCTTCTTGACGAGTTCTTC-3′; Neo_DW 5′-CTTCTATCGCCTTCTTGACG-3′; or SVA_Neo_Down_1 5′-ACCGCTTCCTCGTGCTTTAC-3′). The resulting amplicons were subcloned into pJET1.2 (Thermo Scientific) and sequenced.

Characterization of LAVA_F1 De Novo Integrations

Following transfection with pAD10LAVA_F1/pJM101 L1RPNeo and G418 selection, single colonies were expanded and genomic DNA was isolated using the DNeasy Blood & Tissue kit (Qiagen). The presence of the spliced mneoI cassette was determined by PCR using primers GS86/GS87 (Raiz et al. 2012). Genomic DNA was then digested with MscI/NheI or MscI/SacI, and the 3′-ends of de novo integrations were determined using EPTS-LM PCR as described previously (Kirilyuk et al. 2008; Raiz et al. 2012). Subsequently, primers in the upstream genomic sequence were designed and the 5′-ends of de novo integrations were amplified.

Supplementary Material

Supplementary tables S1–S4 and figures S1–S4 are available at Molecular Biology and Evolution (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank the Gibbon Genome Sequencing Consortium for making NLE genome sequences available before publication. Furthermore, we wish to thank Christian Roos for providing NLE genomic DNA, John Moran for providing plasmids pJM101 L1RP and pJM101 L1RPNeo as well as HeLa-HA cells, and Gerald Schumann for plasmid pJM101 L1RPNeoORF1. The authors would also like to thank the anonymous reviewers of this and an earlier version of the manuscript for helpful comments.
This work was supported by grants from the Ministry of National Education, CNCS–UEFISCDI, project number PN-II-ID-PCE-2012-4-0090 (to A.D.) and PN-II-IDEI-PCCE 312/2008 (to O.P.).

References

Ahn K, Kim HS. 2009. Structural and quantitative expression analyses of HERV gene family in human tissues. Mol Cells. 28:99–103.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410.
Carbone L, Harris RA, Gnerre S, Veeramah KR, Lorente-Galdos B, Huddleston J, Meyer TJ, Herrero J, Roos C, Aken B, et al. 2014. Gibbon genome and the fast karyotype evolution of small apes. Nature 513:195–201.
Carbone L, Harris RA, Mootnick AR, Milosavljevic A, Martin DI, Rocchi M, Capozzi O, Archidiacono N, Konkel MK, Walker JA, et al. 2012. Centromere remodeling in Hoolock leuconedys (Hylobatidae) by a new transposable element unique to the gibbons. Genome Biol Evol. 4:648–658.
Chan YC, Roos C, Inoue-Murayama M, Inoue E, Shih CC, Pei KJ, Vigilant L. 2010. Mitochondrial genome sequences effectively reveal the phylogeny of Hylobates gibbons. PLoS One 5:e14419.
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res. 14:1188–1190.
Damert A, Raiz J, Horn AV, Lower J, Wang H, Xing J, Batzer MA, Lower R, Schumann GG. 2009. 5’-Transducing SVA retrotransposon groups spread efficiently throughout the human genome. Genome Res. 19:1992–2008.
Desmet FO, Hamroun D, Lalande M, Collod-Beroud G, Claustres M, Beroud C. 2009. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 37:e67.
Dewannieux M, Esnault C, Heidmann T. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat Genet. 35:41–48.
Doucet AJ, Hulme AE, Sahinovic E, Kulpa DA, Moldovan JB, Kopera HC, Athanikar JN, Hasnaoui M, Bucheton A, Moran JV, et al. 2010. Characterization of LINE-1 ribonucleoprotein particles. PLoS Genet. 6
Feng Q, Moran JV, Kazazian HH Jr, Boeke JD. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.
Freeman JD, Goodchild NL, Mager DL. 1994. A modified indicator gene for selection of retrotransposition events in mammalian cells. Biotechniques 17:46, 48–49, 52.
Han K, Konkel MK, Xing J, Wang H, Lee J, Meyer TJ, Huang CT, Sandifer E, Hebert K, Barnes EW, et al. 2007. Mobile DNA in Old World monkeys: a glimpse through the rhesus macaque genome. Science 316:238–240.
Hancks DC, Ewing AD, Chen JE, Tokunaga K, Kazazian HH Jr. 2009. Exon-trapping mediated by the human retrotransposon SVA. Genome Res. 19:1983–1991.
Hancks DC, Goodier JL, Mandal PK, Cheung LE, Kazazian HH Jr. 2011. Retrotransposition of marked SVA elements by human L1s in cultured cells. Hum Mol Genet. 20:3386–3400.
Hancks DC, Kazazian HH Jr. 2010. SVA retrotransposons: evolution and genetic instability. Semin Cancer Biol. 20:234–245.
Hancks DC, Kazazian HH Jr. 2012. Active human retrotransposons: variation and disease. Curr Opin Genet Dev. 22:191–203.
Hancks DC, Mandal PK, Cheung LE, Kazazian HH Jr. 2012. The minimal active human SVA retrotransposon requires only the 5’-hexamer and Alu-like domains. Mol Cell Biol. 32:4718–4726.
Israfil H, Zehr SM, Mootnick AR, Ruvolo M, Steiper ME. 2011. Unresolved molecular phylogenies of gibbons and siamangs (Family: Hylobatidae) based on mitochondrial, Y-linked, and X-linked loci indicate a rapid Miocene radiation or sudden vicariance event. Mol Phylogenet Evol. 58:447–455.
Jurka J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16:418–420.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110:462–467.
Kimberland ML, Divoky V, Prchal J, Schwahn U, Berger W, Kazazian HH Jr. 1999. Full-length human L1 insertions retain the capacity for high frequency retrotransposition in cultured cells. Hum Mol Genet. 8:1557–1560.
Kirilyuk A, Tolstonog GV, Damert A, Held U, Hahn S, Lower R, Buschmann C, Horn AV, Traub P, Schumann GG. 2008. Functional endogenous LINE-1 retrotransposons are expressed and mobilized in rat chloroleukemia cells. Nucleic Acids Res. 36:648–665.
Kopera HC, Moldovan JB, Morrish TA, Garcia-Perez JL, Moran JV. 2011. Similarities between long interspersed element-1 (LINE-1) reverse transcriptase and telomerase. Proc Natl Acad Sci U S A. 108:20345–20350.
Mills RE, Bennett EA, Iskow RC, Devine SE. 2007. Which transposable elements are active in the human genome? Trends Genet. 23:183–191.
Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH Jr. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87:917–927.
Ono M, Yasunaga T, Miyata T, Ushikubo H. 1986. Nucleotide sequence of human endogenous retrovirus genome related to the mouse mammary tumor virus genome. J Virol. 60:589–598.
Ostertag EM, Goodier JL, Zhang Y, Kazazian HH Jr. 2003. SVA elements are nonautonomous retrotransposons that cause disease in humans. Am J Hum Genet. 73:1444–1451.
Raiz J, Damert A, Chira S, Held U, Klawitter S, Hamdorf M, Lower J, Stratling WH, Lower R, Schumann GG. 2012. The non-autonomous retrotransposon SVA is trans-mobilized by the human LINE-1 protein machinery. Nucleic Acids Res. 40:1666–1683.
Reese MG, Eeckman FH, Kulp D, Haussler D. 1997. Improved splice site detection in Genie. J Comput Biol. 4:311–323.
Schneider TD, Stephens RM. 1990. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18:6097–6100.
Shen L, Wu LC, Sanlioglu S, Chen R, Mendoza AR, Dangel AW, Carroll MC, Zipf WB, Yu CY. 1994. Structure and genetics of the partially duplicated gene RP located immediately upstream of the complement C4A and the C4B genes in the HLA class III region. Molecular cloning, exon-intron structure, composite retroposon, and breakpoint of gene duplication. J Biol Chem. 269:8466–8476.
Wang H, Xing J, Grover D, Hedges DJ, Han K, Walker JA, Batzer MA. 2005. SVA elements: a hominid-specific retroposon family. J Mol Biol. 354:994–1007.
Wei W, Gilbert N, Ooi SL, Lawler JF, Ostertag EM, Kazazian HH, Boeke JD, Moran JV. 2001. Human L1 retrotransposition: cis preference versus trans complementation. Mol Cell Biol. 21:1429–1439.
Xing J, Hedges DJ, Han K, Wang H, Cordaux R, Batzer MA. 2004. Alu element mutation spectra: molecular clocks and the effect of DNA methylation. J Mol Biol. 344:675–682.
Xing J, Wang H, Belancio VP, Cordaux R, Deininger PL, Batzer MA. 2006. Emergence of primate genes by retrotransposon-mediated sequence transduction. Proc Natl Acad Sci U S A. 103:17608–17613.
Zhang L, Zhang F, Huo K. 2003. Cloning and characterization of a novel splicing variant of the ZADH1 gene. Cytogenet Genome Res. 103:79–83.