A Major Difference between the Divergence Patterns
within the LINES-1 Families in Mice and Voles1
Flavie Vanlerberghe,2 François Bonhomme,3 Clyde A. Hutchison III,
and Marshall Hall Edgell
Department of Microbiology and Immunology, University of North Carolina at Chapel Hill
L1 retroposons are represented in mice by subfamilies of interspersed sequences
of varied abundance. Previous analyses have indicated that subfamilies are generated
by duplicative transposition of a small number of members of the L1 family, the
progeny of which then become a major component of the murine L1 population,
and are not due to any active processes generating homology within preexisting
groups of elements in a particular species. In mice, more than a third of the L1
elements belong to a clade that became active ~5 Mya and whose elements are
≥95% identical. We have collected sequence information from 13 L1 elements
isolated from two species of voles (Rodentia: Microtinae: Microtus and Arvicola)
and have found that divergence within the vole L1 population is quite different
from that in mice, in that there is no abundant subfamily of homologous elements.
Individual L1 elements from voles are very divergent from one another and belong
to a clade that began a period of elevated duplicative transposition ~13 Mya.
Sequence analyses of portions of these divergent L1 elements (~250 bp each) gave
no evidence for concerted evolution having acted on the vole L1 elements since
the split of the two vole lineages ~3.5 Mya; that is, the observed interspecific
divergence (6.7%-24.7%) is not larger than the intraspecific divergence
(7.9%-27.2%), and phylogenetic analyses showed no clustering into Arvicola and
Microtus clades.
Introduction
Mammals contain a small number of families of very abundant interspersed
sequences (Singer 1982), one of which is the long interspersed repetitive sequence
called “LINES-1,” or “L1.” L1 elements are present in high copy number in many
eukaryotes, including protozoa (Kimmel et al. 1987), insects (Fawcett et al. 1986),
plants (Schwarz-Sommer et al. 1987), and all mammals studied so far (Burton et al.
1986). The family has been extensively characterized in the mouse, rat, and primates
(for reviews, see Rogers 1985; Singer and Skowronski 1985; Edgell et al. 1987; Fanning
and Singer 1988; Hutchison et al. 1989), where it accounts for 10%-20% of the genome.
Full-length L1 retroposons are ~7 kb in length. However, 90% of the L1 elements in
mice contain a truncation of variable size at their 5' end. Laboratory strains of mice
1. Key words: transposable elements, LINES-1, mice, voles, divergence patterns.
2. Present address: INRA, Laboratoire de Biologie des Invertebres, Unite de Biologie des Populations,
Antibes, France.
3. Present address: Laboratoire Genome et Populations, CNRS URA 1493, Universite de Montpellier
II, Place Eugene Bataillon, 34095 Montpellier, France.
Address for correspondence and reprints: Marshall Hall Edgell, Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599.
Mol. Biol. Evol. 10(4):719-731. 1993.
© 1993 by The University of Chicago. All rights reserved.
0737-4038/93/1004-0001$02.00
contain three major subfamilies of elements. One subfamily, the A clade, shows ~5%
divergence (Loeb et al. 1986; Schichman et al. 1992). F clade elements show ~10%
divergence (Wincker et al. 1987; Padgett et al. 1988; Adey et al. 1991), while a third
clade, the V clade, is quite divergent, with elements differing from each other by
20%-25% (Jubier-Maurin et al. 1991).
All of the murine species examined to date contain abundant L1 subfamilies
showing very little (<5%) divergence between individual members (Jubier-Maurin et
al. 1985; Martin et al. 1985; Bellis et al. 1987). While it was initially postulated
(Jubier-Maurin et al. 1985; Martin et al. 1985; Hardies et al. 1986) that this large
species-specific similarity seen in these subfamilies was due to genetic exchange processes
such as gene conversion, molecular analyses (Casavant et al. 1988) of the distribution
of L1 elements within the embryonic portion of the β-globin loci of two closely related
species of mice, Mus musculus (strain BALB/c) and M. caroli, indicated that all of
the elements with low divergence were in unique positions and hence must have been
placed within the two loci after the divergence of the two species some 2.4 Mya [this
time represents a recent recalibration, by She et al. (1990), of the divergence times
within the murine lineages]. This indicates that the great sequence similarity within
the murine A and F clades is due to a high rate of duplicative transposition of particular
subfamilies, instead of to a replacement process. To explore the generality of these
processes and conclusions, we have collected L1 sequence information from two species
of vole (family Microtinae) that diverged from the Muridae ~15-25 Mya (Lindsay
1978; Jaeger et al. 1985; Catzeflis et al. 1989). The mutation rate in voles is similar
to that in mice (Catzeflis et al. 1989), as are the generation times. We isolated seven
L1 sequences from the sibling vole, Microtus epiroticus, and six from the water vole,
Arvicola terrestris, which are estimated to have diverged from each other 3.5 Mya
(Catzeflis et al. 1987). Sequences collected from these two species indicate that there
is no abundant subfamily of L1 elements in voles with sequence identity of >95%, as
there is in murine species.
Material and Methods
Genomic Libraries
We produced partial libraries of DNA sequence from two species of voles, Microtus
epiroticus (sibling vole) and Arvicola terrestris (water vole), DNA from which was
provided by F. Catzeflis (Montpellier, France). Five micrograms of DNA were sonicated
to give an average size of 600 bp as measured by gel electrophoresis, and fragmented
DNA of size 300-900 bp was eluted from a preparative 1.2% agarose gel by
using DE81 DEAE paper according to a method described by Dretzen et al. (1981).
The fragments were end repaired for 30 min by using 10 units of Klenow enzyme and
10 units of T4 polymerase and then were phenol extracted, ethanol precipitated, and
resuspended in 20 µl of 10 mM Tris-HCl (pH 8.0), 1 mM ethylenediaminetetraacetate.
Fifty to two hundred nanograms of fragments were then ligated with 20 ng of M13mp18
plasmid DNA that had been digested with SmaI and treated with alkaline phosphatase
(Messing et al. 1977; Schreier and Cortese 1979; Bankier and Barrell 1983) to produce
a partial library for screening. The ligated material was transfected into DH5αF' cells
made competent according to a method described by Hanahan (1983). Approximately 70%
of the transformants contained recombinants, as determined by lack of β-galactosidase
activity on X-gal indicator plates.
Hybridization
The phage library was probed with a single-stranded radiolabeled probe called
“Bam5” [the 540-bp BamHI fragment from the 3' end of ORF 2 of a Mus musculus
BALB/c L1 element cloned into M13mp7 by Martin et al. (1984)] at low stringency,
i.e., in 40% formamide at 42°C overnight, followed by two 15-min washes in 0.3 M
NaCl, 0.03 M sodium citrate (2 × SSC), 0.1% SDS at room temperature and by three
30-min washes in 0.1 × SSC, 0.1% SDS at 42°C. These conditions should detect any
L1 sequence with ≥65% homology with the probe. Hybridizing clones were picked,
replated, and rescreened for purification according to a method described by Jahn et
al. (1980).
DNA Sequencing
The nucleotide sequences of the vole DNA were determined by the dideoxy
chain-termination method of Sanger et al. (1977), as modified by Bankier and Barrell
(1983) and Padgett et al. (1988). We used a 17-base sequencing primer derived from
sequence at the 3' end of the HindIII site of M13mp18 to collect our initial sequence
data and then used that sequence to design an oligonucleotide primer (21 bases) from
within the vole L1 sequence. The sequence data reported here were derived from only
one strand. The GenBank accession numbers for our new sequences are MIC 1,
M94693; MIC 6, M94694; MIC 8, M94695; MIC 16, M94696; MIC 19, M94697;
MIC 20, M94698; MIC 28, M94699; ARV 2, M94700; ARV 7, M94701; ARV 8,
M94702; ARV 17, M94703; ARV 18, M94704; and ARV 20, M94705.
Sequence Analysis
Sequence data were entered into a VAX 11/780 database using the SEQUINP
and BATCAT programs developed by Hutchison (1986). Homology search analysis
between pairs of sequences was done by using programs from the University of
Wisconsin GCG package (version 5.0). Multiple alignments of homologous sequences
were done by using a sequence alignment tool called “SALT” (White et al. 1984).
The phylogenetic trees were constructed by using DNAPARS and DNABOOT programs
from the PHYLIP package (version 3.4) provided by J. Felsenstein (Department
of Genetics, University of Washington, Seattle).
Results
Copy Number
To maximize the amount of useful sequence from L1 that could easily be sequenced
and would then be comparable for divergence analysis both with each other
and with previously collected data, we produced partial genomic libraries by using
short vole DNA fragments (average size 600 bp, as estimated on the basis of
electrophoresis) generated by sonication. Clones were chosen on the basis of hybridization
to a 540-bp Bam5 probe that previously had been used to define a region for similar
analyses in several species of mice (Martin et al. 1985; Hardies et al. 1986). Of 6,000
Microtus recombinants, 67 hybridized positively with the Bam5 probe at low stringency,
and 62 of 5,700 Arvicola recombinants were positive. The number of L1 copies per
haploid genome can be approximated by using the formula N = -[ln(1 - f)] G/S,
where f is the fraction of positive clones, G is the size of the genome, and S is the size
of the inserts. Given that G = 3 × 10^9 bp and that S = 600 bp, the Bam5 homologous
sequence is present in ~55,000 copies in the vole genome (Microtus = 56,200; and
Arvicola = 54,700). Hybridization of the same probe, under the same low-stringency
conditions, to a BALB/c library (provided by S. Schichman) containing sonicated
DNA fragments of average size 1,100 bp gave 98 positive clones, of 3,000 recombinant
phages tested. This gives a copy number of 90,000 for the Bam5-positive sequences
in BALB/c.
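
The copy-number estimates quoted above can be checked numerically. The following is a minimal sketch that assumes only the formula N = -[ln(1 - f)] G/S and the clone counts given in the text; the function name `estimate_copy_number` is illustrative and is not part of the original analysis.

```python
import math

def estimate_copy_number(positives, screened, genome_bp=3e9, insert_bp=600):
    """Approximate copies per haploid genome: N = -ln(1 - f) * G / S.

    f is the fraction of positive clones, G the genome size (bp), and
    S the average insert size (bp). The function name is illustrative.
    """
    f = positives / screened
    return -math.log(1.0 - f) * genome_bp / insert_bp

# Clone counts and insert sizes as quoted in the text.
microtus = estimate_copy_number(67, 6000)
arvicola = estimate_copy_number(62, 5700)
balbc = estimate_copy_number(98, 3000, insert_bp=1100)

for name, n in [("Microtus", microtus), ("Arvicola", arvicola), ("BALB/c", balbc)]:
    print(f"{name}: ~{n:,.0f} copies")
```

Under these assumptions the three estimates land close to the rounded figures reported in the text (56,200; 54,700; and 90,000).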
Determination of Nucleotide Divergence
The region used for divergence analysis was the 3' end of the large mouse L1
open reading frame (coordinates 6385-6697 from the L1Md-A2 sequence from Mus
musculus BALB/c; Loeb et al. 1986). Seven clones from the Microtus library contained
~200 bp of L1 sequence and were used in this analysis. We also isolated from the
Arvicola library six clones with ≥200 bp of sequence from this same region. These 13
sequences were then aligned to each other and to consensus L1 sequences from Mus
musculus BALB/c (A and F types), Mus caroli, Mus platythrix, Rattus rattus, and
Homo sapiens and to a divergent L1 element, L1Md-19, from Mus musculus BALB/c
(fig. 1). Each of the 13 aligned vole sequences was compared with each other and
with reference sequences in all pairwise combinations (table 1). Divergence was
determined by the fraction of overlapping bases mismatched, given the alignments, with

FIG. 1.-Alignments of Bam5 homologous sequences from Microtus and Arvicola with rodent and
human L1 sequences. The alignments are shown in a difference format with respect to the consensus sequence
(Con Vole) derived from the vole sequences. Nucleotides underlined in the consensus vole sequence represent
the informative sites used to derive the parsimony tree shown in fig. 2. Identity in the other sequences is
indicated by a period (.), a nucleotide difference is indicated by the appropriate base, a deletion or pad is
indicated by a dash (-), and lack of sequence is indicated by a space. The letter “R” in the figure indicates
an A or G; the letter “Y,” a T or C. The sources of the other sequences are as follows: Mus caroli (Con Mc)
and Mus platythrix (Con Mp) (Martin et al. 1985); L1Md A2 (Loeb et al. 1986) and L1Md F3 and L1Md
19 (Shehee et al. 1989; R. Shehee, personal communication); rat (from ratline-3; D'Ambrosio et al. 1986);
and human (from TbG 41; Hattori et al. 1985).

no contribution to the score by insertions or deletions (indels). In contrast to what
was found in the mouse, the average intraspecific divergence in both Microtus (18.5%;
range 7.9%-27.2%) and Arvicola (16.2%; range 9.6%-23.3%) was not significantly
different from the interspecific divergence of 17.5% (range 6.7%-24.7%). Table 2
summarizes the divergence distribution (number of pairwise comparisons within a
particular range of divergence) in the Bam5 homologous region of L1 elements from
three groups: voles, mice, and humans. However, the L1 sequences from mice represented
in this table were collected in previous studies at high stringency. Thus the
divergent V clade is not represented in the table, because no V clade sequence is

Table 1
Nucleotide Divergence Matrix for Individual L1 Element Pairs

                 1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
 1 ARV 2
 2 MIC 1      12.1
 3 MIC 19     18.6  13.3
 4 ARV 8      19.2  16.5  20.5
 5 MIC 6      18.5  18.4  21.5  18.5
 6 MIC 28     21.2  23.3  27.2  21.8  23.0
 7 MIC 8      15.8  15.5  18.4  17.5  17.2  21.8
 8 ARV 17     19.2  18.0  19.6  15.4  20.3  23.2   6.7
 9 ARV 7      16.7  20.2  22.7  23.2  21.0  23.4  10.0  10.9
10 MIC 20     17.4  16.0  18.5  21.2  19.2  22.9   7.9   8.7  11.3
11 ARV 18     19.0  21.7  21.8  21.6  22.4  24.7  17.3  11.9  14.2  11.7
12 ARV 20     19.9  19.8  22.2  17.3  23.5  24.6  10.9  10.5  15.1  10.0   9.6
13 MIC 16     17.4  16.9  19.2  20.4  20.1  24.8  12.2   8.4  13.8  10.3  11.5  10.4
14 Con Md     25.0  20.4  21.1  16.4  22.9  24.6  20.1  21.7  25.7  20.7  24.2  21.0  21.7
15 L1Md A2    23.8  19.3  20.0  15.7  22.3  23.2  19.0  21.1  24.6  20.1  23.7  20.4  20.1   1.7
16 L1Md F3    23.8  21.5  21.6  16.4  21.7  24.3  21.2  22.2  26.3  21.3  24.2  21.5  21.5   3.0   4.7
17 Con Mc     23.8  22.1  21.6  17.2  22.9  23.4  21.2  22.8  26.9  21.7  25.9  21.0  23.6   1.7   2.7   2.7
18 Con Mp     24.4  21.7  21.6  16.8  24.6  24.0  21.2  20.7  26.3  21.7  25.3  21.0  21.9   7.0   8.9   8.9   6.7
19 Rat        23.4  21.5  21.6  25.3  24.0  24.7  22.9  24.9  25.1  22.8  27.5  25.4  24.8  15.0  14.7  16.0  16.3  16.2
20 L1Md 19    25.5  25.5  24.7  22.8  29.5  30.9  24.9  27.0  26.9  26.1  32.0  26.5  28.3  26.6  26.8  27.3  26.6  27.2  24.3
21 Human      27.7  31.5  32.3  27.4  30.6  30.6  29.3  32.1  33.0  31.0  32.0  32.0  31.8  31.6  33.5  31.7  30.5  31.6  35.2  37.6

NOTE.-Data are percent nucleotide divergence, with no contribution from indels. Column numbers
refer to the numbered sequences in the rows.

Table 2
Nucleotide Divergence Distribution within the L1 Families in Voles, Mice, and Humans

                                                          No. WITH DIVERGENCE OF (b)
SPECIES (a)                                          0%-5%  5%-10%  10%-15%  15%-20%  20%-25%  25%-30%
Microtus: n = 7, p = 21, d = 18.45%                      0       2        2        9        7        1
Arvicola: n = 6, p = 15, d = 16.25%                      0       1        5        7        2        0
Voles: n = 13, p = 78, d = 17.5%                         0       7       14       28       27        2
Mus domesticus (c), A and F clades:
  n = 10, p = 45, d = 4.1%                              32      13        0        0       ND       ND
M. caroli: n = 10, p = 45, d = 4.8%                     25      20        0        0       ND       ND
M. platythrix: n = 10, p = 45, d = 4.1%                 32      13        0        0       ND       ND
Human: n = 10, p = 45, d = 13.7%                         5      15        5       10        9        1

(a) n = No. of sequences; p = no. of pairwise combinations; and d = average divergence.
(b) Data are percent divergence with no contribution from indels. ND = not determined.
(c) The inbred strain, BALB/c.

available from the region being analyzed. The divergence distribution in the voles is
quite different from that for mice, even after account is taken of the missing divergent
clade in mice, in that there is no abundant clade of very similar L1 elements in the
vole, as there is in the mouse. In our sample of vole L1 elements, 90% of the pairwise
comparisons show a divergence >10%. On the other hand, in each species of mouse
there is a clade whose members show divergence values of <5% from each other.
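
The divergence measure described above (mismatches over overlapping aligned bases, with indels contributing nothing) and the binning used for a distribution table can be sketched as follows. This is an illustrative reimplementation, not the GCG or SALT procedure actually used; the toy sequences are hypothetical, and the half-open bins [0,5), [5,10), ... are an assumption, since the text does not say how boundary values were assigned.

```python
from itertools import combinations

def percent_divergence(a, b, gap="-", missing=" "):
    """Percent mismatches over overlapping aligned bases; indels are skipped."""
    compared = mismatched = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in (gap, missing) or y in (gap, missing):
            continue  # indel or lack of sequence: no contribution to the score
        compared += 1
        mismatched += x != y
    if compared == 0:
        raise ValueError("sequences do not overlap")
    return 100.0 * mismatched / compared

def divergence_histogram(values, width=5.0, n_bins=6):
    """Count values per half-open bin [0,5), [5,10), ...; overflow goes in the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // width), n_bins - 1)] += 1
    return counts

# Toy aligned sequences (hypothetical, not taken from fig. 1).
toy = {
    "seq1": "GGATCCAGCA",
    "seq2": "GGATC-AGCT",
    "seq3": "GGTTCCAGGA",
}
pairs = {(p, q): percent_divergence(toy[p], toy[q]) for p, q in combinations(toy, 2)}
print(pairs)
print(divergence_histogram(pairs.values()))
```

Skipping gapped columns, rather than counting them as mismatches, keeps a single long deletion from dominating the score for ~200-bp fragments.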
Age of Divergent Clade
We can estimate the divergence rate of L1 elements in this portion of the element
on the basis of the observations by Martin et al. (1985), who measured a 5.4% difference
between L1 elements from Mus caroli and BALB/c, which have been estimated to have
diverged from each other 2.4 Mya. Correcting for homoplastic substitutions by the
formula of Jukes and Cantor (1969) [P_estim = -3/4 × ln(1 - 4/3 × P_observed)], we get an
estimated divergence rate of 2.28%/Myr. This rate is very similar to that for single-copy
DNAs, as estimated from DNA-DNA hybridization between Mus musculus and
Mus caroli (2.9%; She et al. 1990) and within voles (2.5%; Catzeflis et al. 1989). The
observed values of L1 divergence in the vole, 6.7%-24.7%, can similarly be corrected,
to give estimated values of 7.0%-30.0%. It therefore appears that the most divergent
L1 elements in the vole were placed into the genome ~13.1 Mya and that the least
divergent elements in our collection were placed in the genome ~3.1 Mya.
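
The correction and dating steps in this paragraph can be sketched in a few lines. The example assumes the Jukes-Cantor (1969) formula in its standard form and takes the 2.28%/Myr rate as given in the text rather than rederiving it; the function and constant names are illustrative.

```python
import math

def jukes_cantor(p_observed):
    """Jukes-Cantor (1969) correction: d = -3/4 * ln(1 - 4/3 * p)."""
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_observed)

# Divergence rate as stated in the text (2.28%/Myr); taken as given here.
RATE_PER_MYR = 0.0228

for observed in (0.067, 0.247):
    corrected = jukes_cantor(observed)
    age_myr = corrected / RATE_PER_MYR
    print(f"observed {observed:.1%} -> corrected {corrected:.1%}, ~{age_myr:.1f} Myr")
```

With these inputs the corrected values come out at about 7.0% and 30.0%, and the implied ages at about 3.1 and 13.1 Myr, in line with the figures quoted above.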
Phylogenetic Analysis
To visualize the ancestry of the various L1 elements, we analyzed our data set
with two kinds of phylogeny-inference algorithms. One, the FITCH program of the
PHYLIP package version 3.4, was used to search for the Fitch-Margoliash least-squares
estimate of the phylogenetic tree (results not shown) by using the pairwise divergence
matrix (table 1). In addition, we ran the program DNAPARS (same package), which
performs a site-by-site maximum-parsimony algorithm on the aligned sequences (fig. 1),
over two short regions and over a large region from which the tree shown (fig. 2)
is derived. The region covering nucleotides 11-195 (fig. 1) contains 42 variable sites
if only the 13 vole sequences are considered and contains 79 variable sites if all 21
sequences are taken into account. This region contains two CG dinucleotides that
were removed from one of the short alignments for analysis. Region 10-288 contains
67 informative sites over 12 vole sequences and contains 101 sites when the 8 reference
sequences are added. Analysis of these three data sets by using the parsimony method
gave the same general results; that is, while the branching details between the vole
sequences varied somewhat, depending on the exact data set (75, 79, or 101 informative
sites), the vole sequences always clustered together relative to the mouse, rat, and
human sequences, and the Microtus and Arvicola sequences were always admixed.
The parsimony tree shown (fig. 2) gives bootstrap information as percent of 500
replicates (DNABOOT).
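
Counts of variable and informative sites like those quoted above can be obtained by scanning alignment columns. The sketch below uses the usual textbook definition of a parsimony-informative site (at least two bases, each present in at least two sequences); it is not the DNAPARS implementation, and the toy alignment is hypothetical.

```python
from collections import Counter

def classify_sites(alignment, ignore="-? "):
    """Return (variable, informative) column indices of an alignment.

    A column is variable if it shows at least two different bases, and
    parsimony informative if at least two bases each occur in two or
    more sequences. Gaps and missing data are ignored.
    """
    variable, informative = [], []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(base for base in column if base not in ignore)
        if len(counts) >= 2:
            variable.append(i)
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative.append(i)
    return variable, informative

# Toy alignment (hypothetical sequences, not the vole data).
toy = ["GATCA", "GATTA", "GCTTA", "GCTCG"]
variable, informative = classify_sites(toy)
print(variable, informative)
```

A column in which only one sequence differs is variable but carries no information about grouping, which is why the informative count is always the smaller of the two.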
As expected from the divergence distribution, the branching order of the vole L1
sequences is not resolvable, by these analyses, into two species-specific groups. Thus
there is no evidence, within our sampling of the vole L1 population, for species-specific
homogenization of this divergent L1 family. Maximum-parsimony analyses of our
data set also showed no clustering of the Microtus sequences with respect to Arvicola.
The number of substitutions found in the most parsimonious phylogenetic trees
for the 38 informative sites among the 13 vole sequences is 105. This gives 2.6 mutations
per site, which is noticeably higher than the number of substitutions per site (2.1)
when one analyzes the entire 185-bp region from which the informative sites
were drawn.
Discussion
Distribution of Divergence within the L1 Population in Voles
Although the average pairwise divergence among the vole L1 elements was 17.5%,
some elements showed pairwise divergence as low as 6.7%. This indicates that at least
one vole L1 element became active only a few million years ago. This raises the
possibility of explaining the differences that we see between voles and mice by
postulating that voles have a large number of currently active elements and that the great
divergence seen in our L1 sequences is due to not collecting multiple samples from
each of the clades. This does not seem to be the case, because elements recently derived
from active clades should have intact open-reading-frame sequences, but sequences
from the divergent vole elements contain multiple mutations (frameshifts, termination
codons, and amino acid replacements), indicating that it has been a long time since
the elements were inserted into the genome. The estimated divergence range (7%-30%)
and the estimated L1 divergence rate of 2.28%/Myr indicate that an L1 duplicative
transposition interval began in voles ~13 Mya, which is after the most recent estimated
time for the divergence of mice and voles. This suggests that the L1 elements that we
have isolated from the vole should be vole specific. This supposition is supported by
the fact that the bulk of the vole L1 elements cluster together in the phylogenetic
analyses (fig. 2). The youngest L1 element in our collection was generated ~3 Mya.
Although the size of our data set does not allow us to exclude the possibility that
younger elements exist in the vole genome, we can conclude that the rate of
amplification of any currently active element must be very low compared with the active
elements in mice, since there is no single abundant clade of recently (<3 Mya) amplified

FIG. 2.-Unrooted parsimony tree for the L1 sequences from voles, rodents, and human. This
maximum-parsimony tree was obtained by using the DNAPARS and DNABOOT programs from the PHYLIP package
(Felsenstein 1991) using 102 informative sites from the nucleotide sequences from fig. 1, with the human
sequence set as the outgroup. The numbers at the nodes are percent of the time that the multiple tree
members to the right of the node were found in the 500 replicates analyzed. Branch lengths have no significance.

L1 elements in voles, as has been seen in six different species of mice (Hardies et al.
1986; Bellis et al. 1987).
Copy Number of the Divergent Clade
Mice and voles are estimated to have diverged ~20 Mya [Catzeflis et al. (1989)
estimate it as 15 Mya on the basis of DNA data, and Lindsay (1978) and Jaeger et
al. (1985) estimate it as 25 Mya on the basis of fossil evidence], and hence the
divergence of mice and voles occurred prior to the time when the oldest L1 elements
in our collection were placed into the vole genome. Hybridization of an L1 probe
from the murine Bam5 region of L1 at low stringency to plaques containing DNA
fragments of the vole genome indicates that this region is present in voles at a relative
abundance of 55,000 copies/genome equivalent. Hybridization of the same probe,
under the same low-stringency conditions, to our BALB/c library gives 90,000 copies/
genome equivalent of the Bam5 region, as compared with 57,500 copies
(45,000-70,000 copies, depending on the library) for the Bam5 region in the A and F clades
when the probing was carried out at high stringency (M. Comer, personal communication).
Are Ll Elements Evolving in Concert?
Sequence families that are evolving in concert (i.e., families within which processes
are at work to reduce the species-specific divergence of existing elements) should show
a smaller intraspecific than interspecific divergence. In the vole the intraspecific
divergence within our L1 samples (7.9%-27.2%) was not smaller than the interspecific
divergence (6.7%-24.7%), indicating that there are no processes at work on this
divergent vole L1 family that are sufficient to reduce the divergence of existing members
within these species since their divergence ~3.5 Mya. This conclusion is supported
by the fact that we see no species-specific clustering of the Arvicola and Microtus L1
elements within our analyses of the phylogenetic relationships of the vole L1 elements
sampled. These observations tell us that the rate of any exchange processes such as
gene conversion acting on the L1 family since the divergence of Arvicola and Microtus
is not sufficient to reduce the average divergence between L1 elements in these species.
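
The intraspecific versus interspecific comparison can be reproduced from the vole portion of table 1. The sketch below is illustrative only: the values are transcribed from table 1 on the assumption that each named column lists its comparisons with the preceding sequences in row order, and the grouping by the "MIC"/"ARV" name prefix is our convention.

```python
from itertools import combinations

# Vole sequences in table 1 order; lower-triangle rows as we read the
# flattened table (an assumption about the original layout).
NAMES = ["ARV 2", "MIC 1", "MIC 19", "ARV 8", "MIC 6", "MIC 28", "MIC 8",
         "ARV 17", "ARV 7", "MIC 20", "ARV 18", "ARV 20", "MIC 16"]
ROWS = [
    [12.1],
    [18.6, 13.3],
    [19.2, 16.5, 20.5],
    [18.5, 18.4, 21.5, 18.5],
    [21.2, 23.3, 27.2, 21.8, 23.0],
    [15.8, 15.5, 18.4, 17.5, 17.2, 21.8],
    [19.2, 18.0, 19.6, 15.4, 20.3, 23.2, 6.7],
    [16.7, 20.2, 22.7, 23.2, 21.0, 23.4, 10.0, 10.9],
    [17.4, 16.0, 18.5, 21.2, 19.2, 22.9, 7.9, 8.7, 11.3],
    [19.0, 21.7, 21.8, 21.6, 22.4, 24.7, 17.3, 11.9, 14.2, 11.7],
    [19.9, 19.8, 22.2, 17.3, 23.5, 24.6, 10.9, 10.5, 15.1, 10.0, 9.6],
    [17.4, 16.9, 19.2, 20.4, 20.1, 24.8, 12.2, 8.4, 13.8, 10.3, 11.5, 10.4],
]

def grouped_divergences():
    """Split the 78 vole pairwise values into within- and between-species sets."""
    groups = {"Microtus": [], "Arvicola": [], "interspecific": []}
    for i, j in combinations(range(len(NAMES)), 2):
        value = ROWS[j - 1][i]  # row j holds comparisons with sequences 0..j-1
        a, b = NAMES[i][:3], NAMES[j][:3]
        if a != b:
            groups["interspecific"].append(value)
        else:
            groups["Microtus" if a == "MIC" else "Arvicola"].append(value)
    return groups

for label, vals in grouped_divergences().items():
    print(f"{label}: n={len(vals)}, mean={sum(vals) / len(vals):.2f}%, "
          f"range={min(vals)}-{max(vals)}%")
```

Read this way, the matrix gives 21 Microtus, 15 Arvicola, and 42 interspecific comparisons, with intraspecific ranges that overlap the interspecific range almost completely, which is the pattern the argument above relies on.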
L1 Population Dynamics
Our results suggest that L1 expansion within the genome is a discontinuous process
and that L1 amplification has turned on and off at various times during evolution;
that is, L1 elements appear to produce large numbers of elements via duplicative
transposition only during discrete intervals, and different species seem to have different
intervals in which the L1 amplifications have been high. This is similar to what has
been concluded for the population dynamics of the smaller repetitive elements (i.e.,
SINES) in mammals (Deininger 1989), although the amplification intervals for these
smaller elements seem to be so short as to be more of a burst. In murines, in contrast
to the voles, there has been a relatively recent duplicative transposition interval in
which a large number (~50,000) of new L1 elements were produced during a 2-3-Myr
interval (Hardies et al. 1986). A puzzling aspect of this amplification event in
murines is that it appears to have taken place at approximately the same time in at
least six murine species (Mus platythrix, Mus spretus, Mus spicilegus, Mus macedonicus,
Mus caroli, and Mus musculus) that diverged from each other prior to the beginning
of this duplicative transposition interval (Jubier-Maurin et al. 1985; Martin
et al. 1985; Bellis et al. 1987). Since it seems unlikely that in mice there were six
independent amplification events all at the same time, there must have been, in mice,
some unknown common feature leading to these amplification events that was not
present in voles. One way to explain this would be to postulate that the duplicative
transposition interval began in the ancestor common to these murine species but that
it was accompanied by a deletion process, as proposed by Hardies et al. (1986). Recalibrating that rate by using She et al.'s (1990) time of the domesticus/caroli divergence
gives an L1 turnover rate of 0.8 Myr, which would be sufficient to give the observed
results. Another puzzling feature, given both this picture of L1 duplicative transposition
intervals occurring at random times in mammals and the capacity to generate a very
large number of L1 elements in a time very short with respect to the mammalian
radiation (~50,000 copies in 3 Myr in mice), is the relative similarity of L1 copy
numbers in the various mammals (Burton et al. 1986). It will therefore be very interesting to discover whether there are processes that can act to limit L1 copy number,
such as impact on fitness, L1-mediated copy-number control, or generic deletion
mechanisms acting on the L1 family.
Acknowledgments
We thank S. C. Hardies for providing the primate Ll sequence alignment and
for very helpful discussions concerning the manuscript. We also thank S. Stamper for
synthesizing our oligonucleotide primers for sequencing. This research was supported
by Public Health Service grant AI08998 from the National Institutes of Health to
M.H.E. and C.A.H. and by NATO grant 88/762 to F.B.
LITERATURE CITED
ADEY, N. B., M. B. COMER, M. H. EDGELL, and C. A. HUTCHISON III. 1991. Nucleotide
sequence of a mouse full-length F-type L1 element. Nucleic Acids Res. 19:2497.
BANKIER, A. T., and B. G. BARRELL. 1983. Shotgun DNA sequencing. Pp. 1-34 in R. A.
FLAVELL, ed. Techniques in nucleic acid biochemistry. Vol. B5. Elsevier Scientific, Limerick,
Ireland.
BELLIS, M., V. JUBIER-MAURIN, B. DOD, F. VANLERBERGHE, A. M. LAURENT, C. SENGLAT,
F. BONHOMME, and G. ROIZES. 1987. Distribution of two recently inserted long interspersed
elements of the L1 repeat family at the Alb and βh3 loci in wild mice. J. Mol. Evol. 4:351-363.
BURTON, F. H., D. D. LOEB, C. F. VOLIVA, S. L. MARTIN, M. H. EDGELL, and C. A. HUTCHISON
III. 1986. Conservation throughout Mammalia and extensive protein-encoding capacity of
the highly repeated DNA interspersed sequence one. J. Mol. Biol. 187:291-304.
CASAVANT, N. C., S. C. HARDIES, F. D. FUNK, M. B. COMER, M. H. EDGELL, and C. A.
HUTCHISON III. 1988. Extensive movement of LINES ONE sequences in β-globin loci of
Mus caroli and Mus domesticus. Mol. Cell. Biol. 8:4669-4674.
CATZEFLIS, F. M., E. NEVO, J. E. AHLQUIST, and C. G. SIBLEY. 1989. Relationships of the
chromosomal species in the Eurasian mole rats of the Spalax ehrenbergi group as determined
by DNA-DNA hybridization, and an estimate of the spalacid-murid divergence time. J. Mol.
Evol. 29:223-232.
CATZEFLIS, F. M., F. H. SHELDON, J. E. AHLQUIST, and C. G. SIBLEY. 1987. DNA-DNA
hybridization evidence of the rapid rate of muroid rodent DNA evolution. Mol. Biol. Evol.
4:242-253.
D'AMBROSIO, E., S. D. WAITZKIN, F. R. WITNEY, A. SALEMME, and A. V. FURANO. 1986.
Structure of the highly repeated, long interspersed DNA family (LINE or L1Rn) of the rat.
Mol. Cell. Biol. 6:411-424.
DEININGER, P. L. 1989. SINES: short interspersed repeated DNA elements in higher eucaryotes.
Pp. 619-636 in M. HOWE and D. BERG, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
DRETZEN, G., M. BELLARD, P. SASSONE-CORSI, and P. CHAMBON. 1981. A reliable method
for recovery of DNA fragments from agarose and acrylamide gels. Anal. Biochem. 112:295-298.
EDGELL, M. H., S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, R. W. PADGETT, F. H. BURTON,
M. B. COMER, N. C. CASAVANT, F. D. FUNK, and C. A. HUTCHISON III. 1987. The L1
family in mice. Pp. 107-129 in G. STAMATOYANNOPOLOS and W. A. NIENHUIS, eds. Developmental control of globin gene expression. Alan R. Liss, New York.
FANNING, T. G., and M. F. SINGER. 1988. LINE-1: a mammalian transposable element. Biochim.
Biophys. Acta 910:203-212.
FAWCETT, D. H., C. K. LISTER, E. KELLETT, and D. J. FINNEGAN. 1986. Transposable elements
controlling I-R hybrid dysgenesis in D. melanogaster are similar to mammalian LINES. Cell
47:1007-1015.
FELSENSTEIN, J. 1991. PHYLIP (phylogeny inference package), version 3.4. Distributed by the
author, University of Washington, Seattle.
HANAHAN, D. 1983. Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol.
166:557-580.
HARDIES, S. C., S. L. MARTIN, C. F. VOLIVA, C. A. HUTCHISON III, and M. H. EDGELL. 1986.
An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol.
Biol. Evol. 3:109-125.
HATTORI, M., S. HIDAKA, and Y. SAKAKI. 1985. Sequence analysis of a Kpn I family member
near the 3' end of human beta-globin gene. Nucleic Acids Res. 13:7813-7827.
HUTCHISON, C. A. III. 1986. Sequence gel reading with a portable computer. Nucleic Acids
Res. 14:1917.
HUTCHISON, C. A. III, S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, and M. H. EDGELL. 1989.
LINES and related retroposons: long interspersed repeated sequences in the eucaryotic genome.
Pp. 157-169 in D. E. BERG and M. M. HOWE, eds. Mobile DNA. Vol 1. American Society
for Microbiology, Washington, D.C.
JAEGER, J. J., H. TONG, E. BUFFETAUT, and R. INGAVAT. 1985. The first fossil rodents from
the Miocene of northern Thailand and their bearing on the problems of the origin of the
Muridae. Rev. Paleobiol. 4:1-7.
JAHN, C. L., C. A. HUTCHISON III, S. J. PHYLLIPS, S. WEAVER, N. L. HAIGWOOD, C. F. VOLIVA,
and M. H. EDGELL. 1980. DNA sequence organization of the β-globin complex in the
BALB/c mouse. Cell 21:159-168.
JUBIER-MAURIN, V., G. CUNY, A.-M. LAURENT, L. PAQUEREAU, and G. ROIZES. 1991. A new
5' sequence associated with mouse L1 elements is representative of a major class of L1
termini. Mol. Biol. Evol. 9:41-55.
JUBIER-MAURIN, V., B. J. DOD, M. BELLIS, M. PIECHACZYK, and G. ROIZES. 1985. Comparative
study of the L1 family in the genus Mus: possible role of retroposition and conversion events
in its concerted evolution. J. Mol. Biol. 184:547-564.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N.
MUNRO, ed. Mammalian protein metabolism. Vol. 3. Academic Press, New York.
KIMMEL, B. E., O. K. OLE-MOIYOI, and J. R. YOUNG. 1987. Ingi, a 5.2-kb dispersed sequence
element from Trypanosoma brucei that carries half of a smaller mobile element at either
end and has homology with mammalian LINES. Mol. Cell. Biol. 7:1465-1475.
LINDSAY, E. H. 1978. Eucricetodon asiaticus (Matthew and Granger), an Oligocene rodent
(Cricetidae) from Mongolia. J. Paleontol. 52:590-595.
LOEB, D. D., R. W. PADGETT, S. C. HARDIES, W. R. SHEHEE, M. H. EDGELL, and C. A.
HUTCHISON III. 1986. The sequence of a large L1Md element reveals a tandemly repeated
5' end and several features found in retrotransposons. Mol. Cell. Biol. 6:168-182.
MARTIN, S. L., C. F. VOLIVA, S. C. HARDIES, M. H. EDGELL, and C. A. HUTCHISON III. 1985.
Tempo and mode of concerted evolution in the L1 repeat family of mice. Mol. Biol. Evol.
2:127-140.
MESSING, J., B. GRONENBORN, B. MULLER-HILL, and P. H. HOFSCHNEIDER. 1977. Filamentous
coliphage M13 as a cloning vehicle: insertion of a Hind II fragment of the lac regulatory
region in M13 replicative form in vitro. Proc. Natl. Acad. Sci. USA 74:3642.
PADGETT, R. W., C. A. HUTCHISON III, and M. H. EDGELL. 1988. The F-type 5' motif of
mouse L1 elements: a major class of L1 termini similar to the A-type in organization but
not in sequence. Nucleic Acids Res. 16:739-749.
ROGERS, J. H. 1985. The origin and evolution of retroposons. Int. Rev. Cytol. 93:187-279.
SANGER, F., S. NICKLEN, and A. R. COULSON. 1977. DNA sequencing with chain terminating
inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
SCHICHMAN, S. A., D. M. SEVERYNSE, M. H. EDGELL, and C. A. HUTCHISON III. 1992. Strand-specific LINE-1 transcription in mouse F9 cells originates from the youngest phylogenetic
subgroup of LINE-1 elements. J. Mol. Biol. 224:559-574.
SCHREIER, P. H., and R. CORTESE. 1979. A fast and simple method for sequencing DNA cloned
in the single-stranded bacteriophage M13. J. Mol. Biol. 129:169-172.
SCHWARZ-SOMMER, Z., L. LECLERCQ, E. GÖBEL, and H. SAEDLER. 1987. Cin4, an insert altering
the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons.
EMBO J. 6:3873-3880.
SHE, J. X., F. BONHOMME, P. BOURSOT, L. THALER, and F. M. CATZEFLIS. 1990. Molecular
phylogenies in the genus Mus: comparative analysis of electrophoretic, scnDNA hybridization
and mtDNA RFLP data. Biol. J. Linnean Soc. 41:83-103.
SHEHEE, W. R., D. D. LOEB, N. B. ADEY, F. H. BURTON, N. C. CASAVANT, P. COLE, C. J.
DAVIES, R. A. MCGRAW, S. A. SCHICHMAN, D. M. SEVERYNSE, C. F. VOLIVA, F. W. WEYTER,
G. B. WISELY, M. H. EDGELL, and C. A. HUTCHISON III. 1989. Nucleotide sequence of the
Balb/c mouse β-globin complex. J. Mol. Biol. 205:41-62.
SINGER, M. F. 1982. SINES and LINES: highly repeated short and long interspersed sequences
in mammalian genomes. Cell 28:433-434.
SINGER, M. F., and J. SKOWRONSKI. 1985. Making sense out of LINES: long interspersed repeat
sequences in mammalian genomes. Trends Biochem. Sci. 10:119-122.
WHITE, C. T., S. C. HARDIES, C. A. HUTCHISON III, and M. H. EDGELL. 1984. The diagonal-traverse homology search algorithm for locating similarities between two sequences. Nucleic
Acids Res. 12:751-766.
WINCKER, P., V. JUBIER-MAURIN, and G. ROIZES. 1987. Unrelated sequences at the 5' end of
mouse LINE-1 repeated elements define two distinct subfamilies. Nucleic Acids Res. 15:8593-8606.
BRIAN CHARLESWORTH, reviewing editor
Received October 14, 1991; revision received March 9, 1993
Accepted March 11, 1993