A Major Difference between the Divergence Patterns within the LINES-1 Families in Mice and Voles¹

Flavie Vanlerberghe,² François Bonhomme,³ Clyde A. Hutchison III, and Marshall Hall Edgell

Department of Microbiology and Immunology, University of North Carolina at Chapel Hill

L1 retroposons are represented in mice by subfamilies of interspersed sequences of varied abundance. Previous analyses have indicated that subfamilies are generated by duplicative transposition of a small number of members of the L1 family, whose progeny then become a major component of the murine L1 population. They do not arise from any active process that generates homology within preexisting groups of elements in a particular species. In mice, more than a third of the L1 elements belong to a clade that became active ~5 Mya and whose elements are ≥95% identical. We have collected sequence information from 13 L1 elements isolated from two species of voles (Rodentia: Microtinae: Microtus and Arvicola). We found that divergence within the vole L1 population is quite different from that in mice: there is no abundant subfamily of homologous elements. Individual L1 elements from voles are very divergent from one another and belong to a clade that began a period of elevated duplicative transposition ~13 Mya. Sequence analyses of portions of these divergent L1 elements (~250 bp each) gave no evidence that concerted evolution has acted on the vole L1 elements since the split of the two vole lineages ~3.5 Mya. The observed interspecific divergence (6.7%-24.7%) is not larger than the intraspecific divergence (7.9%-27.2%), and phylogenetic analyses showed no clustering into Arvicola and Microtus clades.
Introduction

Mammals contain a small number of families of very abundant interspersed sequences (Singer 1982). One of these is the long interspersed repetitive sequence called “LINES-1,” or “L1.” L1 elements are present in high copy number in many eukaryotes, including protozoa (Kimmel et al. 1987), insects (Fawcett et al. 1986), plants (Schwarz-Sommer et al. 1987), and all mammals studied so far (Burton et al. 1986). The family has been extensively characterized in the mouse, rat, and primates (for reviews, see Rogers 1985; Singer and Skowronski 1985; Edgell et al. 1987; Fanning and Singer 1988; Hutchison et al. 1989), where it accounts for 10%-20% of the genome. Full-length L1 retroposons are ~7 kb in length. However, 90% of the L1 elements in mice carry a truncation of variable size at their 5' end. Laboratory strains of mice contain three major subfamilies of elements. One subfamily, the A clade, shows ~5% divergence (Loeb et al. 1986; Schichman et al. 1992). F clade elements show ~10% divergence (Wincker et al. 1987; Padgett et al. 1988; Adey et al. 1991). A third clade, the V clade, is quite divergent, with elements differing from each other by 20%-25% (Jubier-Maurin et al. 1991).

1. Key words: transposable elements, LINES-1, mice, voles, divergence patterns.
2. Present address: INRA, Laboratoire de Biologie des Invertebres, Unite de Biologie des Populations, Antibes, France.
3. Present address: Laboratoire Genome et Populations, CNRS URA 1493, Universite de Montpellier II, Place Eugene Bataillon, 34095 Montpellier, France.
Address for correspondence and reprints: Marshall Hall Edgell, Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599.
Mol. Biol. Evol. 10(4):719-731. 1993. © 1993 by The University of Chicago. All rights reserved. 0737-4038/93/1004-0001$02.00
All of the murine species examined to date contain abundant L1 subfamilies showing very little (<5%) divergence between individual members (Jubier-Maurin et al. 1985; Martin et al. 1985; Bellis et al. 1987). It was initially postulated (Jubier-Maurin et al. 1985; Martin et al. 1985; Hardies et al. 1986) that the high species-specific similarity within these subfamilies was due to genetic exchange processes such as gene conversion. Molecular analyses (Casavant et al. 1988) later examined the distribution of L1 elements within the embryonic portion of the β-globin loci of two closely related species of mice, Mus musculus (strain BALB/c) and M. caroli. All of the elements with low divergence were in unique positions, so they must have been inserted into the two loci after the two species diverged some 2.4 Mya [this date reflects a recent recalibration, by She et al. (1990), of the divergence times within the murine lineages]. The great sequence similarity within the murine A and F clades therefore results from a high rate of duplicative transposition of particular subfamilies rather than from a replacement process.

To explore the generality of these processes and conclusions, we have collected L1 sequence information from two species of vole (family Microtinae) that diverged from the Muridae ~15-25 Mya (Lindsay 1978; Jaeger et al. 1985; Catzeflis et al. 1989). The mutation rate in voles is similar to that in mice (Catzeflis et al. 1989), as are the generation times. We isolated seven L1 sequences from the sibling vole, Microtus epiroticus, and six from the water vole, Arvicola terrestris. These two species are estimated to have diverged from each other 3.5 Mya (Catzeflis et al. 1987). Sequences collected from these two species indicate that, unlike murine species, voles contain no abundant subfamily of L1 elements with >95% sequence identity.
Material and Methods

Genomic Libraries

We produced partial libraries of DNA sequence from two species of voles, Microtus epiroticus (sibling vole) and Arvicola terrestris (water vole), using DNA provided by F. Catzeflis (Montpellier, France). Five micrograms of DNA were sonicated to an average size of 600 bp, as measured by gel electrophoresis. Fragmented DNA of 300-900 bp was eluted from a preparative 1.2% agarose gel with DE81 DEAE paper according to the method of Dretzen et al. (1981). The fragments were end repaired for 30 min with 10 units of Klenow enzyme and 10 units of T4 polymerase. They were then phenol extracted, ethanol precipitated, and resuspended in 20 µl of 10 mM Tris-HCl (pH 8.0), 1 mM ethylenediaminetetraacetate. Fifty to two hundred nanograms of fragments were then ligated with 20 ng of M13mp18 plasmid DNA that had been digested with SmaI and treated with alkaline phosphatase (Messing et al. 1977; Schreier and Cortese 1979; Bankier and Barrell 1983), producing a partial library for screening. The ligated material was transfected into DH5αF' cells made competent according to the method of Hanahan (1983). Approximately 70% of the transformants contained recombinants, as determined by lack of β-galactosidase activity on X-gal indicator plates.

Hybridization

The phage library was probed at low stringency with a single-stranded radiolabeled probe called “Bam5”. This probe is the 540-bp BamHI fragment from the 3' end of ORF 2 of a Mus musculus BALB/c L1 element, cloned into M13mp7 by Martin et al. (1984). Hybridization was in 40% formamide at 42°C overnight. It was followed by two 15-min washes in 0.3 M NaCl, 0.03 M sodium citrate (2× SSC), 0.1% SDS at room temperature and by three 30-min washes in 0.1× SSC, 0.1% SDS at 42°C. These conditions should detect any L1 sequence with ≥65% homology to the probe.
Hybridizing clones were picked, replated, and rescreened for purification according to the method of Jahn et al. (1980).

DNA Sequencing

The nucleotide sequences of the vole DNA were determined by the dideoxy chain-termination method of Sanger et al. (1977), as modified by Bankier and Barrell (1983) and Padgett et al. (1988). We collected our initial sequence data with a 17-base sequencing primer derived from sequence at the 3' end of the HindIII site of M13mp18. We then used that sequence to design an oligonucleotide primer (21 bases) from within the vole L1 sequence. The sequence data reported here were derived from only one strand. The GenBank accession numbers for our new sequences are MIC 1, M94693; MIC 6, M94694; MIC 8, M94695; MIC 16, M94696; MIC 19, M94697; MIC 20, M94698; MIC 28, M94699; ARV 2, M94700; ARV 7, M94701; ARV 8, M94702; ARV 17, M94703; ARV 18, M94704; and ARV 20, M94705.

Sequence Analysis

Sequence data were entered into a VAX 11/780 database with the SEQUINP and BATCAT programs developed by Hutchison (1986). Homology searches between pairs of sequences were done with programs from the University of Wisconsin GCG package (version 5.0). Multiple alignments of homologous sequences were done with a sequence alignment tool called “SALT” (White et al. 1984). The phylogenetic trees were constructed with the DNAPARS and DNABOOT programs from the PHYLIP package (version 3.4) provided by J. Felsenstein (Department of Genetics, University of Washington, Seattle).

Results

Copy Number

We wanted to maximize the amount of L1 sequence that could easily be sequenced and compared for divergence analysis, both among the new clones and with previously collected data. To this end, we produced partial genomic libraries from short vole DNA fragments (average size 600 bp, as estimated by electrophoresis) generated by sonication.
Clones were chosen on the basis of hybridization to the 540-bp Bam5 probe. This probe had previously been used to define a region for similar analyses in several species of mice (Martin et al. 1985; Hardies et al. 1986). Of 6,000 Microtus recombinants, 67 hybridized positively with the Bam5 probe at low stringency, and 62 of 5,700 Arvicola recombinants were positive. The number of L1 copies per haploid genome can be approximated by the formula

$$N = -\ln(1 - f)\,\frac{G}{S},$$

where $f$ is the fraction of positive clones, $G$ is the size of the genome, and $S$ is the size of the inserts. Given that $G = 3 \times 10^9$ bp and $S = 600$ bp, the Bam5 homologous sequence is present in ~55,000 copies in the vole genome (Microtus = 56,200; Arvicola = 54,700). We also hybridized the same probe, under the same low-stringency conditions, to a BALB/c library (provided by S. Schichman) containing sonicated DNA fragments with an average size of 1,100 bp. Of 3,000 recombinant phages tested, 98 were positive. This gives a copy number of 90,000 for the Bam5-positive sequences in BALB/c.

Determination of Nucleotide Divergence

The region used for divergence analysis was the 3' end of the large mouse L1 open reading frame (coordinates 6385-6697 of the L1Md-A2 sequence from Mus musculus BALB/c; Loeb et al. 1986). Seven clones from the Microtus library contained ≥200 bp of L1 sequence and were used in this analysis. We also isolated six clones from the Arvicola library with ≥200 bp of sequence from this same region. These 13 sequences were aligned to each other and to consensus L1 sequences from Mus musculus BALB/c (A and F types), Mus caroli, Mus platythrix, Rattus rattus, and Homo sapiens. They were also aligned to a divergent L1 element from Mus musculus BALB/c, L1Md-19 (fig. 1).
The 13 aligned vole sequences were compared with each other and with the reference sequences in all pairwise combinations (table 1). Divergence was determined as the fraction of overlapping bases mismatched, given the alignments; insertions and deletions (indels) made no contribution to the score. In contrast to what was found in the mouse, intraspecific divergence did not differ significantly from interspecific divergence. The average intraspecific divergence was 18.5% in Microtus (range 7.9%-27.2%) and 16.2% in Arvicola (range 9.6%-23.3%), compared with an interspecific divergence of 17.5% (range 6.7%-24.7%). Table 2 summarizes the divergence distribution (number of pairwise comparisons within a particular range of divergence) in the Bam5 homologous region of L1 elements from three groups: voles, mice, and humans. The mouse L1 sequences in this table were collected in previous studies at high stringency. The divergent V clade is therefore not represented in the table, because no V clade sequence is available from the region being analyzed.

FIG. 1. Alignments of Bam5 homologous sequences from Microtus and Arvicola with rodent and human L1 sequences. The alignments are shown in a difference format with respect to the consensus sequence (Con Vole) derived from the vole sequences. Nucleotides underlined in the consensus vole sequence represent the informative sites used to derive the parsimony tree shown in fig. 2. Identity in the other sequences is indicated by a period (.), a nucleotide difference by the appropriate base, a deletion or pad by a dash (-), and lack of sequence by a space. The letter “R” in the figure indicates an A or G; the letter “Y,” a T or C. The sources of the other sequences are as follows: Mus caroli (Con Mc) and Mus platythrix (Con Mp) (Martin et al. 1985); L1Md A2 (Loeb et al. 1986) and L1Md F3 and L1Md 19 (Shehee et al. 1989; R. Shehee, personal communication); rat (from ratline-3; D'Ambrosio et al. 1986); and human (from TbG 41; Hattori et al. 1985).

Table 1
Nucleotide Divergence Matrix for Individual L1 Element Pairs

Each row gives the divergence of that sequence from every sequence listed above it. The columns follow the row order: ARV 2, MIC 1, MIC 19, ARV 8, MIC 6, MIC 28, MIC 8, ARV 17, ARV 7, MIC 20, ARV 18, ARV 20, MIC 16, Con Md, L1Md A2, L1Md F3, Con Mc, Con Mp, Rat, L1Md 19.

MIC 1     12.1
MIC 19    18.6 13.3
ARV 8     19.2 16.5 20.5
MIC 6     18.5 18.4 21.5 18.5
MIC 28    21.2 23.3 27.2 21.8 23.0
MIC 8     15.8 15.5 18.4 17.5 17.2 21.8
ARV 17    19.2 18.0 19.6 15.4 20.3 23.2  6.7
ARV 7     16.7 20.2 22.7 23.2 21.0 23.4 10.0 10.9
MIC 20    17.4 16.0 18.5 21.2 19.2 22.9  7.9  8.7 11.3
ARV 18    19.0 21.7 21.8 21.6 22.4 24.7 17.3 11.9 14.2 11.7
ARV 20    19.9 19.8 22.2 17.3 23.5 24.6 10.9 10.5 15.1 10.0  9.6
MIC 16    17.4 16.9 19.2 20.4 20.1 24.8 12.2  8.4 13.8 10.3 11.5 10.4
Con Md    25.0 20.4 21.1 16.4 22.9 24.6 20.1 21.7 25.7 20.7 24.2 21.0 21.7
L1Md A2   23.8 19.3 20.0 15.7 22.3 23.2 19.0 21.1 24.6 20.1 23.7 20.4 20.1  1.7
L1Md F3   23.8 21.5 21.6 16.4 21.7 24.3 21.2 22.2 26.3 21.3 24.2 21.5 21.5  3.0  4.7
Con Mc    23.8 22.1 21.6 17.2 22.9 23.4 21.2 22.8 26.9 21.7 25.9 21.0 23.6  1.7  2.7  2.7
Con Mp    24.4 21.7 21.6 16.8 24.6 24.0 21.2 20.7 26.3 21.7 25.3 21.0 21.9  7.0  8.9  8.9  6.7
Rat       23.4 21.5 21.6 25.3 24.0 24.7 22.9 24.9 25.1 22.8 27.5 25.4 24.8 15.0 14.7 16.0 16.3 16.2
L1Md 19   25.5 25.5 24.7 22.8 29.5 30.9 24.9 27.0 26.9 26.1 32.0 26.5 28.3 26.6 26.8 27.3 26.6 27.2 24.3
Human     27.7 31.5 32.3 27.4 30.6 30.6 29.3 32.1 33.0 31.0 32.0 32.0 31.8 31.6 33.5 31.7 30.5 31.6 35.2 37.6

NOTE. Data are percent nucleotide divergence, with no contribution from indels.

Table 2
Nucleotide Divergence Distribution within the L1 Families in Voles, Mice, and Humans

                                                No. of Pairs with Divergence (%) of (b)
Species (a)                  n    p   d (%)    0-5  5-10  10-15  15-20  20-25  25-30
Microtus                     7   21   18.45      0     2      2      9      7      1
Arvicola                     6   15   16.25      0     1      5      7      2      0
Voles                       13   78   17.5       0     7     14     28     27      2
Mus domesticus (c), A + F   10   45    4.1      32    13      0      0     ND     ND
M. caroli                   10   45    4.8      25    20      0      0     ND     ND
M. platythrix               10   45    4.1      32    13      0      0     ND     ND
Human                       10   45   13.7       5    15      5     10      9      1

(a) n = no. of sequences; p = no. of pairwise combinations; and d = average divergence.
(b) Data are percent divergence with no contribution from indels. ND = not determined.
(c) The inbred strain, BALB/c.
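The divergence measure described above counts mismatches over overlapping bases, with indels contributing nothing. It can be sketched in Python as follows. This is a minimal illustration, not the GCG or SALT code the authors used. It assumes the difference format of fig. 1 ('.' = same as consensus, '-' = deletion or pad, space = no sequence). It also assumes that any column with a gap, a missing base, or an ambiguity code such as R or Y in either row is simply skipped. The function names are hypothetical.

```python
def expand_difference_row(consensus: str, row: str) -> str:
    """Rebuild explicit bases from a row written in difference format.

    Assumed conventions (after fig. 1): '.' means "same base as the consensus",
    '-' marks a deletion or pad, ' ' marks missing sequence, and any other
    character is the base observed in this row.
    """
    if len(row) != len(consensus):
        raise ValueError("row and consensus must be aligned to the same length")
    return "".join(cons if obs == "." else obs for cons, obs in zip(consensus, row))


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatched sites among columns where both rows carry A, C, G, or T.

    Columns with a gap, missing sequence, or an ambiguity code in either row are
    skipped, so indels make no contribution to the score.
    """
    compared = mismatched = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatched += 1
    if compared == 0:
        raise ValueError("the two rows share no overlapping bases")
    return mismatched / compared


if __name__ == "__main__":
    consensus = "GGATCCAGCA"                                 # toy consensus, not real data
    row_a = expand_difference_row(consensus, "..C...-...")  # -> "GGCTCC-GCA"
    row_b = expand_difference_row(consensus, ".....T..  ")  # -> "GGATCTAG  "
    print(f"{p_distance(row_a, row_b):.1%}")                # 2 mismatches / 7 shared bases -> 28.6%
```

Skipping gapped and ambiguous columns keeps each comparison restricted to the bases the two rows share. This matches the rule stated above that indels add nothing to the score.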
The divergence distribution in the voles is quite different from that in mice, even after allowing for the missing divergent clade in mice. Unlike the mouse, the vole has no abundant clade of very similar L1 elements. In our sample of vole L1 elements, 90% of the pairwise comparisons show a divergence of >10%. By contrast, each species of mouse has a clade whose members differ from each other by <5%.

Age of Divergent Clade

We can estimate the divergence rate of L1 elements in this portion of the element from the observations of Martin et al. (1985). They measured a 5.4% difference between L1 elements from M. caroli and BALB/c, two taxa estimated to have diverged from each other 2.4 Mya. Correcting for homoplastic substitutions by the formula of Jukes and Cantor (1969),

$$P_{\mathrm{est}} = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,P_{\mathrm{observed}}\right),$$

we get an estimated divergence rate of 2.28%/Myr. This rate is very similar to that for single-copy DNAs, as estimated from DNA-DNA hybridization between Mus musculus and Mus caroli (2.9%; She et al. 1990) and within voles (2.5%; Catzeflis et al. 1989). The observed values of L1 divergence in the vole, 6.7%-24.7%, can be corrected in the same way, giving estimated values of 7.0%-30.0%. It therefore appears that the most divergent L1 elements in the vole were placed into the genome ~13.1 Mya and that the least divergent elements in our collection were placed in the genome ~3.1 Mya.

Phylogenetic Analysis

To visualize the ancestry of the various L1 elements, we analyzed our data set with two kinds of phylogeny-inference algorithms. The first was the FITCH program of the PHYLIP package version 3.4, which we used to search for the Fitch-Margoliash least-squares estimate of the phylogenetic tree (results not shown) from the pairwise divergence matrix (table 1).
We also ran the program DNAPARS (same package), which performs a site-by-site maximum-parsimony analysis on the aligned sequences (fig. 1). We ran it over two short regions and over a large region, from which the tree shown in fig. 2 is derived. The region covering nucleotides 11-195 (fig. 1) contains 42 variable sites if only the 13 vole sequences are considered and 79 variable sites if all 21 sequences are taken into account. This region contains two CG dinucleotides, which were removed from one of the short alignments for analysis. Region 10-288 contains 67 informative sites over 12 vole sequences and 101 sites when the 8 reference sequences are added. Parsimony analysis of these three data sets gave the same general results. The branching details among the vole sequences varied somewhat with the exact data set (75, 79, or 101 informative sites). However, the vole sequences always clustered together relative to the mouse, rat, and human sequences, and the Microtus and Arvicola sequences were always admixed. The parsimony tree shown (fig. 2) gives bootstrap support as a percentage of 500 replicates (DNABOOT). As expected from the divergence distribution, these analyses cannot resolve the branching order of the vole L1 sequences into two species-specific groups. Thus, within our sampling of the vole L1 population, there is no evidence for species-specific homogenization of this divergent L1 family. Maximum-parsimony analyses of our data set also showed no clustering of the Microtus sequences with respect to Arvicola. The most parsimonious phylogenetic trees require 105 substitutions for the 38 informative sites among the 13 vole sequences. This gives 2.6 mutations per site, noticeably higher than the number of substitutions per site (2.1) found when one analyzes the entire 185-bp region from which the informative sites were drawn.
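The comparison of intraspecific and interspecific divergence reported in Results can be re-derived from the 13 vole rows of Table 1. The Python sketch below assumes only two things: the table is a lower triangle whose columns follow its row order, and each pair can be assigned to a group by the MIC/ARV prefix of its labels. The names `LABELS`, `ROWS`, `divergence`, and `summarize` are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Vole part of Table 1: LABELS gives the row/column order, and ROWS[k] lists the
# divergences (%) of LABELS[k + 1] from every label that precedes it.
LABELS = ["ARV 2", "MIC 1", "MIC 19", "ARV 8", "MIC 6", "MIC 28", "MIC 8",
          "ARV 17", "ARV 7", "MIC 20", "ARV 18", "ARV 20", "MIC 16"]
ROWS = [
    [12.1],
    [18.6, 13.3],
    [19.2, 16.5, 20.5],
    [18.5, 18.4, 21.5, 18.5],
    [21.2, 23.3, 27.2, 21.8, 23.0],
    [15.8, 15.5, 18.4, 17.5, 17.2, 21.8],
    [19.2, 18.0, 19.6, 15.4, 20.3, 23.2, 6.7],
    [16.7, 20.2, 22.7, 23.2, 21.0, 23.4, 10.0, 10.9],
    [17.4, 16.0, 18.5, 21.2, 19.2, 22.9, 7.9, 8.7, 11.3],
    [19.0, 21.7, 21.8, 21.6, 22.4, 24.7, 17.3, 11.9, 14.2, 11.7],
    [19.9, 19.8, 22.2, 17.3, 23.5, 24.6, 10.9, 10.5, 15.1, 10.0, 9.6],
    [17.4, 16.9, 19.2, 20.4, 20.1, 24.8, 12.2, 8.4, 13.8, 10.3, 11.5, 10.4],
]


def divergence(a: str, b: str) -> float:
    """Look up the divergence of two labels in the lower-triangular matrix."""
    i, j = LABELS.index(a), LABELS.index(b)
    if i < j:
        i, j = j, i          # the later label indexes the row
    return ROWS[i - 1][j]


def summarize() -> dict:
    """Return (pairs, mean, min, max) for Microtus, Arvicola, and between-species pairs."""
    groups = {"Microtus": [], "Arvicola": [], "interspecific": []}
    for a, b in combinations(LABELS, 2):
        if a[:3] != b[:3]:
            key = "interspecific"
        else:
            key = "Microtus" if a.startswith("MIC") else "Arvicola"
        groups[key].append(divergence(a, b))
    return {k: (len(v), mean(v), min(v), max(v)) for k, v in groups.items()}


if __name__ == "__main__":
    for name, (n, avg, lo, hi) in summarize().items():
        print(f"{name:14s} pairs={n:2d} mean={avg:5.2f}% range={lo}-{hi}%")
```

Applied to these values, the sketch gives 21, 15, and 42 pairs. The group means are about 18.5% (Microtus) and 16.2% (Arvicola), in line with Table 2. The Microtus range (7.9%-27.2%) and the interspecific range (6.7%-24.7%) match those quoted in the text. Neither within-species group is less divergent than the between-species pairs.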
Discussion

Distribution of Divergence within the L1 Population in Voles

Although the average pairwise divergence among the vole L1 elements was 17.5%, some pairs of elements differed by as little as 6.7%. This indicates that at least one vole L1 element became active only a few million years ago. This raises a possible alternative explanation for the differences between voles and mice. Voles might have a large number of currently active elements, and the great divergence in our L1 sequences might simply reflect our not having sampled multiple members of each clade. This does not seem to be the case. Elements recently derived from active clades should have intact open-reading-frame sequences. Instead, sequences from the divergent vole elements contain multiple mutations (frameshifts, termination codons, and amino acid replacements), indicating that the elements were inserted into the genome a long time ago. The estimated divergence range (7%-30%) and the estimated L1 divergence rate of 2.28%/Myr indicate that an interval of L1 duplicative transposition began in voles ~13 Mya. This is after the most recent estimated time for the divergence of mice and voles, which suggests that the L1 elements we have isolated from the vole should be vole specific. This supposition is supported by the fact that the bulk of the vole L1 elements cluster together in the phylogenetic analyses (fig. 2). The youngest L1 element in our collection was generated ~3 Mya.
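The conversion from observed divergence to insertion age used here can be sketched as follows. The sketch applies the Jukes-Cantor correction given in Results and divides by the 2.28%/Myr pairwise rate derived there. It treats the corrected pairwise distance divided by that rate as the age, which is the same arithmetic behind the ~13.1 and ~3.1 Mya figures. The names `RATE_PER_MYR`, `jukes_cantor`, and `age_myr` are hypothetical.

```python
import math

# Pairwise L1 divergence rate derived in the text: 2.28% per million years.
RATE_PER_MYR = 0.0228


def jukes_cantor(p_observed: float) -> float:
    """Jukes-Cantor (1969) corrected distance: -3/4 * ln(1 - 4/3 * p)."""
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("the correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)


def age_myr(p_observed: float, rate: float = RATE_PER_MYR) -> float:
    """Time (Myr) implied by a corrected pairwise distance at a constant rate."""
    return jukes_cantor(p_observed) / rate


if __name__ == "__main__":
    # Extremes of the observed vole divergence range (6.7%-24.7%).
    for p in (0.067, 0.247):
        print(f"observed {p:.1%} -> corrected {jukes_cantor(p):.1%} -> ~{age_myr(p):.1f} Myr")
    # observed 6.7% -> corrected 7.0% -> ~3.1 Myr
    # observed 24.7% -> corrected 30.0% -> ~13.1 Myr
```

The correction matters mainly at the upper end. At 24.7% observed divergence it adds about 5 percentage points, and so about 2 Myr to the inferred age. At 6.7% it changes the estimate very little.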
Our data set is too small to exclude the possibility that younger elements exist in the vole genome. Nevertheless, any currently active element must be amplifying at a very low rate compared with the active elements in mice. Voles have no single abundant clade of recently (<3 Mya) amplified L1 elements, whereas such a clade has been seen in six different species of mice (Hardies et al. 1986; Bellis et al. 1987).

FIG. 2. Unrooted parsimony tree for the L1 sequences from voles, rodents, and human. This maximum-parsimony tree was obtained with the DNAPARS and DNABOOT programs from the PHYLIP package (Felsenstein 1991), using 102 informative sites from the nucleotide sequences in fig. 1, with the human sequence set as the outgroup. The number at each node gives the percentage of the 500 replicates analyzed in which the tree members to the right of that node were found together. Branch lengths have no significance.

Copy Number of the Divergent Clade

Mice and voles are estimated to have diverged ~20 Mya. Catzeflis et al. (1989) estimate the split at 15 Mya on the basis of DNA data, and Lindsay (1978) and Jaeger et al. (1985) estimate it at 25 Mya on the basis of fossil evidence. Mice and voles therefore diverged before the oldest L1 elements in our collection were placed into the vole genome. We hybridized an L1 probe from the murine Bam5 region at low stringency to plaques containing DNA fragments of the vole genome. The results indicate that this region is present in voles at a relative abundance of 55,000 copies/genome equivalent.
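These copy-number figures come from the Results formula $N = -\ln(1 - f)\,G/S$. One way to read it (our gloss, not stated by the authors) is as a Poisson argument. If N copies are scattered at random through a genome of G bp, an insert of S bp contains on average NS/G copies, so the fraction of clones with none is $e^{-NS/G} = 1 - f$. The sketch below assumes, as the text does, G = 3 × 10^9 bp for all three libraries. The name `copy_number` is hypothetical.

```python
import math

GENOME_BP = 3e9  # haploid genome size assumed in the text


def copy_number(positive: int, screened: int, insert_bp: float,
                genome_bp: float = GENOME_BP) -> float:
    """Estimate copies per haploid genome as N = -ln(1 - f) * G / S."""
    f = positive / screened                          # fraction of clones that hybridized
    return -math.log1p(-f) * genome_bp / insert_bp   # log1p(-f) equals ln(1 - f)


if __name__ == "__main__":
    libraries = {
        "Microtus": (67, 6000, 600),     # positives, clones screened, mean insert (bp)
        "Arvicola": (62, 5700, 600),
        "BALB/c": (98, 3000, 1100),
    }
    for name, (pos, total, insert) in libraries.items():
        print(f"{name:9s} ~{copy_number(pos, total, insert):,.0f} copies")
```

These inputs give roughly 56,100, 54,700, and 90,600 copies. All three are within about 1% of the 56,200, 54,700, and 90,000 reported in Results. Because f is only about 1%-3% here, the logarithmic term changes the estimate only slightly compared with the simpler fG/S.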
Hybridization of the same probe, under the same low-stringency conditions, to our BALB/c library gives 90,000 copies/genome equivalent of the Bam5 region. By comparison, probing at high stringency gives 57,500 copies (45,000-70,000, depending on the library) for the Bam5 region in the A and F clades (M. Comer, personal communication).

Are L1 Elements Evolving in Concert?

Sequence families that are evolving in concert should show smaller intraspecific than interspecific divergence. In such families, processes are at work that reduce the species-specific divergence of existing elements. In the vole, the intraspecific divergence within our L1 samples (7.9%-27.2%) was not smaller than the interspecific divergence (6.7%-24.7%). Thus, since the two species diverged ~3.5 Mya, no process acting on this divergent vole L1 family has been strong enough to reduce the divergence of existing members within each species. This conclusion is supported by our phylogenetic analyses of the sampled vole L1 elements, which show no species-specific clustering of the Arvicola and Microtus elements. These observations tell us that, since Arvicola and Microtus diverged, any exchange processes acting on the L1 family, such as gene conversion, have been too slow to reduce the average divergence between L1 elements in these species.

L1 Population Dynamics

Our results suggest that L1 expansion within the genome is a discontinuous process and that L1 amplification has turned on and off at various times during evolution. L1 elements appear to produce large numbers of copies via duplicative transposition only during discrete intervals, and these intervals of high amplification seem to differ among species.
This discontinuous pattern of amplification is similar to what has been concluded for the population dynamics of the smaller repetitive elements (i.e., SINEs) in mammals (Deininger 1989), although the amplification intervals for these smaller elements seem to be so short as to be more of a burst. In murines, in contrast to voles, there has been a relatively recent duplicative-transposition interval in which a large number (~50,000) of new L1 elements were produced over 2-3 Myr (Hardies et al. 1986). A puzzling aspect of this amplification event in murines is that it appears to have taken place at approximately the same time in at least six murine species (Mus platythrix, Mus spretus, Mus spicilegus, Mus macedonicus, Mus caroli, and Mus musculus) that diverged from each other before the beginning of this duplicative-transposition interval (Jubier-Maurin et al. 1985; Martin et al. 1985; Bellis et al. 1987). Since it seems unlikely that there were six independent amplification events in mice, all at the same time, there must have been some unknown feature common to these murine species that led to these amplification events and that was not present in voles. One way to explain this would be to postulate that the duplicative-transposition interval began in the ancestor common to these murine species but was accompanied by a deletion process, as proposed by Hardies et al. (1986). Recalibrating the rate of that process by using She et al.'s (1990) date for the domesticus/caroli divergence gives an L1 turnover time of 0.8 Myr, which would be sufficient to produce the observed results. Another puzzling feature, given both this picture of L1 duplicative-transposition intervals occurring at random times in mammals and the capacity to generate a very large number of L1 elements in a time that is very short relative to the mammalian radiation (~50,000 copies in 3 Myr in mice), is the relative similarity of L1 copy numbers among mammals (Burton et al. 1986).
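The rate and turnover figures above can be made concrete with a little arithmetic. In this sketch the 0.8-Myr turnover is treated, as an assumption, as the mean lifetime of an element under first-order (exponential) deletion with a constant insertion rate; this is a simplified stand-in and not necessarily the model used by Hardies et al. (1986). Under it, the density of surviving copies of age a is proportional to exp(-a/τ), so copies inserted before a species split a few Myr ago are rare and the amplification looks recent in each lineage. The split age and start time below are illustrative parameters.

```python
import math


def amplification_rate(copies: float, interval_myr: float) -> float:
    """Average number of new copies per Myr over an amplification interval."""
    return copies / interval_myr


def fraction_older_than(split_myr: float, tau_myr: float, start_myr: float) -> float:
    """Fraction of surviving copies inserted more than `split_myr` ago.

    Hypothetical model: constant insertion from `start_myr` ago to the present
    and first-order deletion with mean lifetime `tau_myr`, so surviving copies
    of age a occur with density proportional to exp(-a / tau_myr).
    """
    if not 0 <= split_myr <= start_myr:
        raise ValueError("need 0 <= split_myr <= start_myr")
    total = 1.0 - math.exp(-start_myr / tau_myr)
    older = math.exp(-split_myr / tau_myr) - math.exp(-start_myr / tau_myr)
    return older / total


# ~50,000 copies over 2-3 Myr (figures from the text).
print(amplification_rate(50_000, 3), amplification_rate(50_000, 2))
# Illustrative: tau = 0.8 Myr, split 3 Myr ago, insertion running for 10 Myr.
print(fraction_older_than(3.0, 0.8, 10.0))
```

Under these assumptions, only about 2% of surviving copies would predate a split 3 Myr ago, and 50,000 copies in 2-3 Myr corresponds to roughly 17,000-25,000 new copies per Myr.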
It will therefore be of great interest to determine whether processes act to limit L1 copy number, such as effects on host fitness, L1-mediated copy-number control, or generic deletion mechanisms acting on the L1 family.

Acknowledgments

We thank S. C. Hardies for providing the primate L1 sequence alignment and for very helpful discussions concerning the manuscript. We also thank S. Stamper for synthesizing our oligonucleotide sequencing primers. This research was supported by Public Health Service grant AI08998 from the National Institutes of Health to M.H.E. and C.A.H. and by NATO grant 88/762 to F.B.

LITERATURE CITED

ADEY, N. B., M. B. COMER, M. H. EDGELL, and C. A. HUTCHISON III. 1991. Nucleotide sequence of a mouse full-length F-type L1 element. Nucleic Acids Res. 19:2497.
BANKIER, A. T., and B. G. BARRELL. 1983. Shotgun DNA sequencing. Pp. 1-34 in R. A. FLAVELL, ed. Techniques in nucleic acid biochemistry. Vol. B5. Elsevier Scientific, Limerick, Ireland.
BELLIS, M., V. JUBIER-MAURIN, B. DOD, F. VANLERBERGHE, A. M. LAURENT, C. SENGLAT, F. BONHOMME, and G. ROIZES. 1987. Distribution of two recently inserted long interspersed elements of the L1 repeat family at the Alb and Bh3 loci in wild mice. J. Mol. Evol. 4:351-363.
BURTON, F. H., D. D. LOEB, C. F. VOLIVA, S. L. MARTIN, M. H. EDGELL, and C. A. HUTCHISON III. 1986. Conservation throughout Mammalia and extensive protein-encoding capacity of the highly repeated DNA interspersed sequence one. J. Mol. Biol. 187:291-304.
CASAVANT, N. C., S. C. HARDIES, F. D. FUNK, M. B. COMER, M. H. EDGELL, and C. A. HUTCHISON III. 1988. Extensive movement of LINES ONE sequences in β-globin loci of Mus caroli and Mus domesticus. Mol. Cell. Biol. 8:4669-4674.
CATZEFLIS, F. M., E. NEVO, J. E. AHLQUIST, and C. G. SIBLEY. 1989. Relationships of the chromosomal species in the Eurasian mole rats of the Spalax ehrenbergi group as determined by DNA-DNA hybridization, and an estimate of the spalacid-murid divergence time. J. Mol. Evol. 29:223-232.
CATZEFLIS, F. M., F. H. SHELDON, J. E. AHLQUIST, and C. G. SIBLEY. 1987. DNA-DNA hybridization evidence of the rapid rate of muroid rodent DNA evolution. Mol. Biol. Evol. 4:242-253.
D'AMBROSIO, E., S. D. WAITZKIN, F. R. WITNEY, A. SALEMME, and A. V. FURANO. 1986. Structure of the highly repeated, long interspersed DNA family (LINE or L1Rn) of the rat. Mol. Cell. Biol. 6:411-424.
DEININGER, P. L. 1989. SINEs: short interspersed repeated DNA elements in higher eucaryotes. Pp. 619-636 in M. HOWE and D. BERG, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
DRETZEN, G., M. BELLARD, P. SASSONE-CORSI, and P. CHAMBON. 1981. A reliable method for recovery of DNA fragments from agarose and acrylamide gels. Anal. Biochem. 112:295-298.
EDGELL, M. H., S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, R. W. PADGETT, F. H. BURTON, M. B. COMER, N. C. CASAVANT, F. D. FUNK, and C. A. HUTCHISON III. 1987. The L1 family in mice. Pp. 107-129 in G. STAMATOYANNOPOULOS and A. W. NIENHUIS, eds. Developmental control of globin gene expression. Alan R. Liss, New York.
FANNING, T. G., and M. F. SINGER. 1988. LINE-1: a mammalian transposable element. Biochim. Biophys. Acta 910:203-212.
FAWCETT, D. H., C. K. LISTER, E. KELLETT, and D. J. FINNEGAN. 1986. Transposable elements controlling I-R hybrid dysgenesis in D. melanogaster are similar to mammalian LINEs. Cell 47:1007-1015.
FELSENSTEIN, J. 1991. PHYLIP (phylogeny inference package), version 3.4. Distributed by the author, University of Washington, Seattle.
HANAHAN, D. 1983. Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol. 166:557-580.
HARDIES, S. C., S. L. MARTIN, C. F. VOLIVA, C. A. HUTCHISON III, and M. H. EDGELL. 1986. An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol. Biol. Evol. 3:109-125.
HATTORI, M., S. HIDAKA, and Y. SAKAKI. 1985. Sequence analysis of a KpnI family member near the 3' end of human beta-globin gene. Nucleic Acids Res. 13:7813-7827.
HUTCHISON, C. A. III. 1986. Sequence gel reading with a portable computer. Nucleic Acids Res. 14:1917.
HUTCHISON, C. A. III, S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, and M. H. EDGELL. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eucaryotic genome. Pp. 157-169 in D. E. BERG and M. M. HOWE, eds. Mobile DNA. Vol. 1. American Society for Microbiology, Washington, D.C.
JAEGER, J. J., H. TONG, E. BUFFETAUT, and R. INGAVAT. 1985. The first fossil rodents from the Miocene of northern Thailand and their bearing on the problems of the origin of the Muridae. Rev. Paleobiol. 4:1-7.
JAHN, C. L., C. A. HUTCHISON III, S. J. PHILLIPS, S. WEAVER, N. L. HAIGWOOD, C. F. VOLIVA, and M. H. EDGELL. 1980. DNA sequence organization of the β-globin complex in the BALB/c mouse. Cell 21:159-168.
JUBIER-MAURIN, V., G. CUNY, A.-M. LAURENT, L. PAQUEREAU, and G. ROIZES. 1991. A new 5' sequence associated with mouse L1 elements is representative of a major class of L1 termini. Mol. Biol. Evol. 9:41-55.
JUBIER-MAURIN, V., B. J. DOD, M. BELLIS, M. PIECHACZYK, and G. ROIZES. 1985. Comparative study of the L1 family in the genus Mus: possible role of retroposition and conversion events in its concerted evolution. J. Mol. Biol. 184:547-564.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Vol. 3. Academic Press, New York.
KIMMEL, B. E., O. K. OLE-MOIYOI, and J. R. YOUNG. 1987. Ingi, a 5.2-kb dispersed sequence element from Trypanosoma brucei that carries half of a smaller mobile element at either end and has homology with mammalian LINEs. Mol. Cell. Biol. 7:1465-1475.
LINDSAY, E. H. 1978. Eucricetodon asiaticus (Matthew and Granger), an Oligocene rodent (Cricetidae) from Mongolia. J. Paleontol. 52:590-595.
LOEB, D. D., R. W. PADGETT, S. C. HARDIES, W. R. SHEHEE, M. H. EDGELL, and C. A. HUTCHISON III. 1986. The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons. Mol. Cell. Biol. 6:168-182.
MARTIN, S. L., C. F. VOLIVA, S. C. HARDIES, M. H. EDGELL, and C. A. HUTCHISON III. 1985. Tempo and mode of concerted evolution in the L1 repeat family of mice. Mol. Biol. Evol. 2:127-140.
MESSING, J., B. GRONENBORN, B. MÜLLER-HILL, and P. H. HOFSCHNEIDER. 1977. Filamentous coliphage M13 as a cloning vehicle: insertion of a HindII fragment of the lac regulatory region in M13 replicative form in vitro. Proc. Natl. Acad. Sci. USA 74:3642.
PADGETT, R. W., C. A. HUTCHISON III, and M. H. EDGELL. 1988. The F-type 5' motif of mouse L1 elements: a major class of L1 termini similar to the A-type in organization but not in sequence. Nucleic Acids Res. 16:739-749.
ROGERS, J. H. 1985. The origin and evolution of retroposons. Int. Rev. Cytol. 93:187-279.
SANGER, F., S. NICKLEN, and A. R. COULSON. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
SCHICHMAN, S. A., D. M. SEVERYNSE, M. H. EDGELL, and C. A. HUTCHISON III. 1992. Strand-specific LINE-1 transcription in mouse F9 cells originates from the youngest phylogenetic subgroup of LINE-1 elements. J. Mol. Biol. 224:559-574.
SCHREIER, P. H., and R. CORTESE. 1979. A fast and simple method for sequencing DNA cloned in the single-stranded bacteriophage M13. J. Mol. Biol. 129:169-172.
SCHWARZ-SOMMER, Z., L. LECLERCQ, E. GÖBEL, and H. SAEDLER. 1987. Cin4, an insert altering the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons. EMBO J. 6:3873-3880.
SHE, J. X., F. BONHOMME, P. BOURSOT, L. THALER, and F. M. CATZEFLIS. 1990. Molecular phylogenies in the genus Mus: comparative analysis of electrophoretic, scnDNA hybridization and mtDNA RFLP data. Biol. J. Linnean Soc. 41:83-103.
SHEHEE, W. R., D. D. LOEB, N. B. ADEY, F. H. BURTON, N. C. CASAVANT, P. COLE, C. J. DAVIES, R. A. MCGRAW, S. A. SCHICHMAN, D. M. SEVERYNSE, C. F. VOLIVA, F. W. WEYTER, G. B. WISELY, M. H. EDGELL, and C. A. HUTCHISON III. 1989. Nucleotide sequence of the BALB/c mouse β-globin complex. J. Mol. Biol. 205:41-62.
SINGER, M. F. 1982. SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28:433-434.
SINGER, M. F., and J. SKOWRONSKI. 1985. Making sense out of LINES: long interspersed repeat sequences in mammalian genomes. Trends Biochem. Sci. 10:119-122.
WHITE, C. T., S. C. HARDIES, C. A. HUTCHISON III, and M. H. EDGELL. 1984. The diagonal-traverse homology search algorithm for locating similarities between two sequences. Nucleic Acids Res. 12:751-766.
WINCKER, P., V. JUBIER-MAURIN, and G. ROIZES. 1987. Unrelated sequences at the 5' end of mouse LINE-1 repeated elements define two distinct subfamilies. Nucleic Acids Res. 15:8593-8606.

BRIAN CHARLESWORTH, reviewing editor

Received October 14, 1991; revision received March 9, 1993; accepted March 11, 1993