Evolutionary Dynamics of Tandem Repeats in the Mitochondrial DNA
Control Region of the Minnow Cyprinella spiloptera
Richard E. Broughton¹ and Thomas E. Dowling
Department of Zoology, Arizona State University
Length variation due to tandem repeats is now recognized as a common feature of animal mitochondrial DNA; however, the evolutionary dynamics of repeated sequences are not well understood. Using phylogenetic analysis, predictions of three models of repeat evolution were tested for arrays of 260-bp repeats in the cyprinid fish Cyprinella spiloptera. Variation at different nucleotide positions in individual repeats supported different models of repeat evolution. One set of characters included several nucleotide variants found in all copies from a limited number of individuals, while the other set included an 8-bp deletion found in a limited number of copies in all individuals. The deletion and an associated nucleotide change appear to be the result of a deterministic, rather than stochastic, mutation process. Parallel origins of repeat arrays in different mitochondrial lineages, possibly coupled with a homogenization mechanism, best explain the distribution of nucleotide variation.
Introduction
Animal mitochondrial DNA (mtDNA) has been viewed as an example of genetic economy due to small genome size (typically 16-18 kb), absence of introns, and limited noncoding sequences (Brown 1985). However, this view has recently been challenged by the discovery of frequent tandem duplications, primarily in the form of tandem repeat arrays in the control region. As the number of taxa for which sequence or restriction-site data exist has increased, the occurrence of tandem repeats has become recognized as a common feature of mtDNA. Rand (1993) listed more than 50 species with variable-length genomes, and examples continue to be discovered at a rapid rate (e.g., Ghivizzani et al. 1993; Hoelzel, Hancock, and Dover 1993; Broughton and Dowling 1994; Stewart and Baker 1994; Mundy, Winchell, and Woodruff 1996). Because much of the theory for the evolution of nuclear repeats involves some form of recombination (reviewed by Elder and Turner 1995), the evolution of repeats in mtDNA (which lacks recombination) is not well understood. Knowledge of molecular mechanisms that cause duplications to arise and proliferate is vital for a complete understanding of the evolution of mtDNA and its efficient use as a marker in evolutionary studies.
Extensive length polymorphism in mtDNA tandem repeat arrays results from a propensity for addition and deletion of copies. In the absence of recombination, slipped-strand mispairing (SSM) during replication (Levinson and Gutman 1987; Buroker et al. 1990; Hayasaka, Ishida, and Horai 1991) appears to be the primary mechanism causing additions and deletions (Madsen, Ghivizzani, and Hauswirth 1993). Sequences capable of forming secondary structures have been shown to increase the frequency of SSM in a variety of nonmitochondrial systems (Glickman and Ripley 1984; Pierce, Kong, and Masker 1991), and it appears that such structures also play an important role in mtDNA (Buroker et al. 1990; Wilkinson and Chapman 1991; Arnason and Rand 1992; Broughton and Dowling 1994; Stanton et al. 1994; Stewart and Baker 1994).
¹ Present address: Section of Ecology and Systematics, Cornell University.
Key words: tandem repeat evolution, mitochondrial DNA, Cyprinella spiloptera.
Address for correspondence and reprints: Richard E. Broughton, Section of Ecology and Systematics, Corson Hall, Cornell University, Ithaca, New York 14853-2701. E-mail: [email protected].
Mol. Biol. Evol. 14(12):1187-1196. 1997
© 1997 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Nucleotide variation among copies may reveal historical processes that gave rise to extant repeat arrays. Primary among these processes is the rate of copy addition and deletion relative to the nucleotide mutation rate. At a low rate of addition/deletion, copies in an array would diverge from each other as nucleotide changes accumulate. Alternatively, “homogenization” of nucleotide variation within specific lineages may occur as long as the rate of copy turnover is greater than the nucleotide mutation rate (Rand 1994). Greater sequence similarity within rather than among some repeat arrays has led several investigators (Solignac, Monnerot, and Mounolou 1986; Arnason and Rand 1992; Hoelzel, Hancock, and Dover 1993; Stewart and Baker 1994) to draw analogies with concerted evolution observed for many repeated nuclear sequences. However, because the term “concerted evolution” has been used to describe similarity of repeat arrays among lineages mediated by interlocus recombination (Dover 1982), the more general term “homogenization” may be more appropriate for nonrecombinant lineages. In fact, different repeat arrays in clonal lineages will tend to diverge, rather than evolve in concert, as they are homogenized for different nucleotide variants (Smith 1973; Ohta 1983).
Independent formation of similar duplications in different lineages may also be important. Formation of specific secondary structures may repeatedly facilitate duplication of homologous sequences, resulting in independently derived repeat arrays. Although their frequency is unknown, convergent mitochondrial duplications have been reported within species (Hayasaka, Ishida, and Horai 1991) and among mammalian orders (Wilkinson et al. 1997). If independent duplications arise in the presence of nucleotide polymorphisms, multiple repeat arrays, each possessing lineage-specific nucleotide variants, may result.
We can therefore identify several evolutionary scenarios for tandem repeat evolution and derive predic-
FIG. 2.-Variable nucleotides in C. spiloptera nonduplicated control region sequences. Individual fish are identified by river of origin and a numeral distinguishing individuals from the same collection. Numbering refers to positions in complete 670-bp alignment; a dot indicates same nucleotide as top sequence.
GCATAACGTATCTGTACTTC-3’)
(fig. 1). Specific repeat copies were designated based on their positions on
the light strand, i.e., 5’, internal, or 3’.
Data Analysis
Sequences were aligned using PILEUP in GCG (Devereux, Haeberli, and Smithies 1984). To assess differences in variability among different sequences, average pairwise differences within each of four categories (5’ copies, internal copies, 3’ copies, and nonduplicated sequences) were calculated using Jukes-Cantor distances. Because of an 8-bp deletion found in many of the repeats, distances were calculated using the pairwise deletion option in the computer program MEGA (Kumar, Tamura, and Nei 1993). Variation of Jukes-Cantor distances within and among the four groups of sequences was analyzed with a Kruskal-Wallis nonparametric analysis of variance.
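The distance calculation described above can be illustrated with a minimal Python sketch. It is not the MEGA implementation; it assumes the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and interprets "pairwise deletion" as skipping, for each pair only, any site where either sequence has a gap or missing character. The function names and the set of characters treated as gaps are illustrative assumptions.

```python
import math
from itertools import combinations


def jukes_cantor(seq_a, seq_b, gap_chars="-?N"):
    """Jukes-Cantor distance between two aligned sequences.

    Sites where either sequence carries a gap/missing character are
    dropped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: ignore this site for this pair
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = diffs / compared  # proportion of differing sites
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_pairwise_distance(seqs):
    """Return (mean distance, number of pairs) over all unordered pairs."""
    dists = [jukes_cantor(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists), len(dists)


# Toy example (hypothetical sequences): 8 comparable sites, 1 difference.
print(jukes_cantor("ACGTACGT--", "ACGAACGTAC"))
```

For n sequences there are n(n - 1)/2 unordered pairs, so a group of 15 sequences yields 105 pairwise distances.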
Phylogenetic analysis was conducted separately for duplicated and nonduplicated sequences using parsimony and neighbor-joining methods. For the repeats, each copy was considered a separate OTU. In parsimony analyses (PAUP 3.1.1; Swofford 1993), all nucleotide changes were given equal weight, and an 8-bp deletion was coded as a single base change. Heuristic searches used random taxon addition and TBR branch swapping, and 10 trees were held at each step during 100 search replicates. Neighbor-joining analysis (Saitou and Nei 1987) employing Jukes-Cantor distances and confidence probability (CP) values (Rzhetsky and Nei 1992) was performed with MEGA (Kumar, Tamura, and Nei 1993). Comparison of tree topologies supporting alternate models of repeat evolution was conducted with the nonparametric test of Templeton (1983).
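The effect of coding the 8-bp deletion as a single change can be shown with a small sketch. It is not PAUP's algorithm: it only counts steps on a fixed, rooted binary tree using Fitch parsimony with unordered, equally weighted characters, and it treats a gap as an additional character state. The sequences, taxon labels, and the `recode_deletion` helper are hypothetical.

```python
def fitch_steps(tree, states):
    """Fitch parsimony for one character; returns (state set, step count).

    `tree` is a leaf name (str) or a 2-tuple of subtrees.
    `states` maps leaf name -> character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1  # one change needed here


def tree_length(tree, matrix):
    """Total steps over all columns of an aligned matrix (name -> string)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {name: row[i] for name, row in matrix.items()})[1]
        for i in range(n_chars)
    )


def recode_deletion(aligned, start, end):
    """Collapse columns start..end-1 into one character: '1' if the whole
    block is gapped, otherwise '0' (an assumption for partial gaps)."""
    state = "1" if set(aligned[start:end]) == {"-"} else "0"
    return aligned[:start] + state + aligned[end:]


# Hypothetical toy data: two individuals (A, B), each with a 5' and an
# internal copy; internal copies carry an 8-bp deletion in columns 1-8.
raw = {
    "A-5'": "ACCCCCCCCAGT",
    "A-i":  "A--------GGT",
    "B-5'": "ACCCCCCCCAGA",
    "B-i":  "A--------GGA",
}
recoded = {name: recode_deletion(seq, 1, 9) for name, seq in raw.items()}

by_individual = (("A-5'", "A-i"), ("B-5'", "B-i"))
by_position = (("A-5'", "B-5'"), ("A-i", "B-i"))

for label, tree in (("by individual", by_individual), ("by position", by_position)):
    print(label, tree_length(tree, raw), tree_length(tree, recoded))
```

In this toy case the deletion contributes 16 versus 8 steps to the two topologies when every gapped column is counted, but only 2 versus 1 once it is coded as a single character, so it no longer outweighs the point substitutions.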
Results
Phylogenetic Analysis of Nonduplicated Sequences
Variable nucleotides from nonduplicated sequences are shown in figure 2 (complete sequences available under GenBank accession numbers U73315-U73323 and U86047-U86057). No two sequences were identical; however, much of the variation was unique to single individuals. Phylogenetic analysis yielded limited resolution with both parsimony and neighbor-joining (fig. 3). However, individuals Little 1, Stony 2, Stony 6, Elkhart 2, and French Broad 1 form a lineage distinct from the remaining individuals.
Variation Within and Among Repeats
Variable nucleotides in tandem repeats are shown in figure 4 (complete sequences under GenBank accession numbers U73306-U73314 and U86058-U86068). No standard-length (single-copy) genomes or length heteroplasmy was observed, and the positions of duplication endpoints were identical for all copies in all individuals. Divergence within C. spiloptera was <5% for both duplicated and nonduplicated sequences.
Nucleotide diversity differed significantly among different repeat positions and nonduplicated sequences (fig. 5). Because of the potential problem of assessing the homology of particular copies among arrays which differ in copy number, only the 15 individuals with three copies (fig. 4) were used for the nucleotide diversity analysis. This allowed 105 pairwise distance comparisons for each group of sequences. Diversity was lowest for 5’ copies and highest for the nonduplicated sequences (Kruskal-Wallis test: H = 28.694, P < 0.0001). In a posteriori t-tests (corrected for multiple tests; Rice 1989), 5’ copies were found to exhibit significantly reduced diversity relative to all other groups. These results indicate that among repeats in C. spiloptera, copies in the 5’ position are conserved, and that the portion of the control region adjacent to tRNAPro evolves rapidly relative to the repeat region.
FIG. 3.-Phylogenetic relationships of individual haplotypes indicated by nonduplicated sequences. A, Parsimony strict consensus of 8,124 trees. B, Neighbor-joining tree with CP values of ≥50 indicated.
Phylogenetic Analysis of Repeats
Parsimony analysis of all repeats yielded >32,000 equally most parsimonious trees (the limit of memory on the computer used). The only structure was a monophyletic group containing all copies from individuals Little 1, Stony 2, Stony 6, Elkhart 2, and French Broad 1, and a group uniting the 3’ and internal copies from individual Big 5 (not shown). For practical purposes, subsequent analyses included sequences from 10 individuals (9 C. spiloptera and 1 C. venusta) that exhibited nucleotide differences in two or more copies. For these individuals, parsimony analysis produced 233 shortest trees (fig. 6A). Use of the neighbor-joining approach resolved additional nodes, but CP values on many of these were low (fig. 6B).
Examination of the repeat sequence data (fig. 4) revealed that the source of limited resolution was extreme conflict among characters. Conflict appears to be attributable to a derived 8-bp deletion and an adjacent A-to-G change (positions 11-19 in fig. 4; note that the deletion was weighted equal to a single-nucleotide change). These nucleotides (hereafter referred to as the G/deletion) occurred in all internal and 3’ copies but never in 5’ copies, a distribution consistent with the single-origin model. However, most of the other characters (positions 60-62, 159, 176, 177, 183-185, 187, 188, 210, 225, 240, and 248; fig. 4) exhibited derived states in all copies (internal, 3’, and 5’) in the individuals in which they occurred. Ten of these 15 characters were uniquely derived in single individuals. These 15 characters support grouping of repeats by individuals or groups of individuals as predicted by the multiple-origins or homogenization models. Conflict between these two groups of characters resulted in reduced phylogenetic resolution, yet the presence of well-supported nodes grouping all copies from some individuals (e.g., Stony 6 and French Broad 1) supports the multiple-origins or homogenization models (fig. 6).
Although the G/deletion reduced phylogenetic resolution, in no case among the 233 shortest trees (37 steps) did it form an unreversed synapomorphy. Enforcing a topological constraint on a single node, making the G/deletion (and thus all 3’ and internal copies) monophyletic, required five additional steps. In each of these 582 trees (42 steps), parallel mutation of 15 other nucleotide characters was required. To assess the significance of the difference between trees consistent with each model, the nonparametric test of Templeton (1983) was applied. From each set of most-parsimonious trees (37 vs. 42 steps), one fully resolved tree that minimized the number of homoplastic characters was selected (fig. 7). These trees have similar structures for nodes that are not important in discriminating among hypotheses. The Templeton test indicated that the topology supporting multiple origins or homogenization (fig. 7A) was significantly shorter (Ts = 33, n = 17, P < 0.025) than the alternative supporting a single origin (fig. 7B).
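As an illustration of how a test of this kind works, the sketch below computes a Wilcoxon signed-rank statistic from per-character step counts on two competing trees, which is the general form of Templeton's test. The step counts shown are invented for the example and are not the data behind the reported Ts = 33, n = 17; the function name is an assumption, and significance would still have to be read from a signed-rank table or computed with a statistics package.

```python
def templeton_statistic(steps_tree_a, steps_tree_b):
    """Wilcoxon signed-rank statistic on per-character step differences.

    Returns (Ts, n): Ts is the smaller of the positive and negative rank
    sums, n is the number of characters whose step counts differ.
    """
    if len(steps_tree_a) != len(steps_tree_b):
        raise ValueError("both trees need a step count for every character")
    # Characters requiring the same number of steps on both trees are dropped.
    diffs = [a - b for a, b in zip(steps_tree_a, steps_tree_b) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative), n


# Hypothetical step counts for six characters on two trees.
tree_a_steps = [1, 1, 1, 2, 1, 1]
tree_b_steps = [1, 2, 2, 1, 3, 1]
print(templeton_statistic(tree_a_steps, tree_b_steps))
```

A small Ts relative to n indicates that the characters consistently favor one topology over the other.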
Relationships of repeats from individual fish were consistent with relationships of nonduplicated sequences. In particular, sequences from individuals French Broad 1 and Stony 6 form a well-supported group in both analyses. In addition, phylogenetic analysis of only 5’ copies from each individual (not shown) yielded a pattern very similar to that in figure 3, with strong support for the French Broad 1-Stony 6 group. These results demonstrate that sequence evolution in the tandem repeats reflects the history of the lineages in which they are found, and that recombination or gene conversion among lineages is negligible.
FIG. 4.-Variable nucleotides in copies of tandem repeats. Individuals are identified as in figure 2; copies within individuals are indicated by physical position (5’, internal [i], or 3’). For each individual, all copies are included. Repeats included in phylogenetic analyses are indicated by boldface type. Numbering refers to position in the complete 271-bp alignment; a dot indicates same nucleotide as top sequence; a dash indicates an alignment gap.
Discussion
Repeat Variation
Differences in nucleotide diversity among repeats and nonduplicated sequences suggest that these sequences are subject to different functional constraints. It has been well established that the central part of the control region is more highly conserved than the adjacent flanking regions in mammals (Saccone, Pesole, and Sbisa 1991) and in fishes (Brown, Beckenbach, and Smith 1992; Shedlock et al. 1992; Lee et al. 1993), apparently due to the presence of regulatory elements. Therefore, it is not surprising that nonduplicated sequences proximal to tRNAPro exhibit substantially more diversity. Reduced diversity of 5’ copies relative to other repeat units is similar to results reported for the evening bat (Wilkinson and Chapman 1991). In that and many other systems, tandem repeats are found in the portion of the control region adjacent to tRNAPro, and are assumed to result from strand slippage at the 3’ end of the D-loop (7S) DNA (Buroker et al. 1990). Repeats at the opposite end of the control region (adjacent to tRNAPhe) may result from a similar mechanism involving slippage of the nascent heavy strand as replication nears completion (Hayasaka, Ishida, and Horai 1991; Broughton and Dowling 1994). In either case, repeats would be added on the heavy strand in the same directional orientation but differing in timing (early vs. late) and position (left vs. right of OH).
An implication of these results is that the template for initial duplication events occupies the 5’ position but may be involved to a lesser extent in subsequent additions and deletions. Several studies have indicated that within repeat arrays, derived variation is usually shared among internal copies, whereas terminal copies tend to accumulate fewer changes and have greater similarity among arrays. This phenomenon was termed the “edge-effect” by Rand (1994). Although Fumagalli et al. (1996) suggested that addition and deletion occur at the 5’ end and that the 3’ copy is protected from such events, the present results indicate that 5’ copies are conserved. An edge-effect at the 3’ end was not evident, but this may be a function of the limited number of repeats in C. spiloptera. It therefore appears that 5’ copies are ancestral, while additional copies accumulate in the 3’ direction.
FIG. 5.-Nucleotide diversity in nonduplicated sequences and 5’, internal, and 3’ copies of the repeat. Error bars = SEM; for each group, n = 105.
Character Distribution Among Repeats
Despite limited phylogenetic resolution, examination of nucleotide character state distributions and functional constraints provides insights on the evolutionary dynamics of mtDNA tandem repeats. The G/deletion occurs in what would otherwise be conserved sequence block 2 (CSB-2). Given its regulatory role in replication and transcription (Chang, Hauswirth, and Clayton 1985; Clayton 1991, 1992) and its high level of sequence conservation among vertebrates (Foran, Hixson, and Brown 1988), CSB-2 appears to be an essential element in normal mtDNA function. The duplicated segments contain CSB-2 and CSB-3 and presumably both heavy- and light-strand promoters, and are adjacent to OH. Disruption of regulatory sequences in “extra” copies may confer an advantage, as multiple functional copies of this important region may complicate regulatory processes. Deletion of part of CSB-2 in extra copies may reduce or eliminate potential problems in regulation, favoring individuals lacking multiple intact copies of CSB-2.
FIG. 6.-Phylogenetic relationships of repeats from 10 individuals. A, Strict consensus of 233 most-parsimonious trees (37 steps). B, Neighbor-joining tree with CP values of ≥50.
FIG. 7.-Fully resolved trees consistent with (A) multiple origins or homogenization and (B) a single origin of the repeats. These trees (selected from each set of most-parsimonious trees) minimized the number of characters that were homoplastic. Open hash marks indicate the G/deletion, and shaded hash marks refer to characters in figure 4 as follows: a = 1, 2, 6-8; b = 60-62, 185, 187, 188; c = 159; d = 176; e = 177; f = 183; g = 184; h = 210; i = 225; j = 240; k = 248.
While it is not difficult to envision a full-length copy (i.e., one without the G/deletion) giving rise to a copy with the G/deletion, it seems very unlikely that the reverse could occur. As such, 5’ copies are not likely to be deleted; first, because the molecule would be left without a functional CSB-2, and second, because there would be no conceivable way to regenerate an intact CSB-2. This apparent irreversibility has important implications for character state transformations in phylogenetic analysis because an array containing at least one full-length copy could never be derived from a lineage where all copies possess the G/deletion.
Clearly some characters must have converged across repeat units. Not unexpectedly, coding the G/deletion as a single character (vs. two characters) or removing it entirely reduced homoplasy and substantially increased phylogenetic resolution. These results (not shown) are consistent with the topology in figure 7A. Three lines of evidence suggest that the G/deletion is convergent: (1) Trees allowing convergent evolution of the G/deletion are significantly shorter than alternatives where it evolves only once. (2) It is more parsimonious to assume that the G/deletion (i.e., two possibly related characters) is convergent than to assume that 15 nucleotide characters (that unite sets of copies in specific lineages) were derived more than once. (3) The apparent irreversibility of the G/deletion requires that it must have been independently derived in each monophyletic group containing both 5’ and extra copies.
In light of evidence indicating that the G/deletion is convergent, the distribution of intact CSB-2s only in 5’ copies suggests that whenever a new copy is made from a 5’ copy, a G/deletion is produced (or only when a G/deletion is produced may repeats persist). Presumably, an event drastic enough to result in deletion of eight bases could also cause an adjacent nucleotide change. Such an event could be an artifact of the duplication process, conceivably involving the secondary structures thought to facilitate duplication formation. The G/deletion therefore represents a deterministic form of mutation that is apparently driven by the molecular mechanisms causing sequence duplications. Because the G/deletion provides the only support for the single-origin model, this model may be rejected in favor of either multiple origins or homogenization.
Although the multiple-origins and homogenization models were not differentiated by phylogenetic analysis, they differ in several respects. Primary differences include the timing of point mutations (before duplication for multiple origins and after initial duplication for homogenization) and the relative rate of copy addition/deletion (fast vs. slow, respectively). These differences are incorporated into a simple example where individuals with repeats fixed for nucleotide differences are derived from a single ancestor (fig. 8). This example illustrates two important points: (1) the multiple-origins model requires fewer evolutionary steps than homogenization (accounting for both addition/deletion and point mutations), and (2) complete homogenization passes through intermediate forms in which derived nucleotides are shared by some but not all copies.
To explain repeat variation in C. spiloptera under the two models, multiple origins would require 30 mutational events, while homogenization would require a minimum of 38 events. Because of the presumably stochastic nature of copy addition/deletion, the actual number of events required for complete homogenization may actually be much higher than the minimum. Multiple origins is, therefore, a more parsimonious explanation. With respect to intermediate forms, it appears that copies with the G/deletion cannot occupy the 5’ position; therefore, nucleotide changes arising in copies with the G/deletion cannot be completely homogenized in an array. Consequently, fixed derived nucleotides must originate in 5’ copies, and the homogenization process should produce intermediate states where derived nucleotides are shared by the 5’ and some extra copies. Although some unfixed variants were shared among extra copies in C. spiloptera, there were no cases of incomplete homogenization involving a 5’ copy (fig. 4).
FIG. 8.-Mutational steps required by (A) multiple origins and (B) homogenization for each to produce two repeat arrays fixed for nucleotide differences. Vertical arrows = derived nucleotide state; dashed arrows = copy number changes. Multiple origins requires three steps, while homogenization requires four steps.
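The reasoning about intermediate forms can be made concrete with a small sketch that classifies a single variable site within one repeat array. The category names and the function are illustrative assumptions, not part of the original analysis; the input is the ancestral state, the state in the 5’ copy, and the states in the extra (internal and 3’) copies.

```python
def classify_site(ancestral, five_prime, extras):
    """Classify how a derived state is distributed across one repeat array.

    ancestral  -- inferred ancestral nucleotide at this site
    five_prime -- nucleotide in the 5' copy
    extras     -- nucleotides in the internal and 3' copies (non-empty)
    """
    if not extras:
        raise ValueError("an array needs at least one extra copy")
    derived_5 = five_prime != ancestral
    derived_extras = [state != ancestral for state in extras]
    if derived_5 and all(derived_extras):
        return "fixed"            # derived state in every copy
    if derived_5 and any(derived_extras):
        return "intermediate"     # shared by 5' and some, not all, extras
    if derived_5:
        return "five_prime_only"  # new change not yet spread to extras
    if any(derived_extras):
        return "extras_only"      # unfixed variant among extra copies
    return "ancestral"


# Hypothetical three-copy arrays (5', internal, 3') with ancestral state "A".
print(classify_site("A", "G", ["G", "G"]))  # derived state in all copies
print(classify_site("A", "A", ["G", "A"]))  # variant confined to extra copies
```

Under this scheme, the pattern expected during ongoing homogenization is "intermediate", whereas the observations described above correspond only to "fixed" and "extras_only" sites.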
Both the lack of expected intermediates and the greater number of evolutionary steps predicted by the homogenization model could be explained by a sufficiently high rate of copy addition/deletion. Although an expected by-product of rapid copy turnover is length heteroplasmy, no heteroplasmy was observed among the nearly 50 individuals examined. In the C. spiloptera system with relatively low copy number, large repeat units, and lack of heteroplasmy, the rate of copy turnover would appear to be fairly low. Therefore, while neither homogenization nor multiple origins can be rejected, the lack of circumstantial evidence for homogenization suggests it has not acted recently. In addition, regardless of whether repeats arise independently in different lineages (multiple origins) or are added to existing arrays (homogenization), it is clear that the G/deletion must be produced whenever a 5’ copy serves as the template for duplication.
The patterns of nucleotide variation reported here provide several new perspectives on the evolution of mitochondrial tandem repeats. The multiple-origins model remains a viable explanation for the distribution of fixed nucleotide variants within various arrays. It is possible that homogenization has acted in conjunction with multiple origins, but tests of such a scenario will be difficult because of the potential for each model to produce similar end results. Although homogenization is clearly an important factor governing tandem repeat variation in many animal taxa, multiple origins may be more important than previously thought. The G/deletion also allows novel insights on molecular evolution by providing new evidence for a real, but as yet uncharacterized, deterministic component of DNA mutation. Such mutations may be important where specific secondary structures interfere with DNA replication.
Repeat Evolution
We suggest the following hypothetical framework for control region tandem repeat evolution.
Original duplications are formed via strand slippage, which may be facilitated by DNA
secondary structures. Stem-loop structures may be formed by the repeat units themselves or
possibly occur upstream (with respect to replication) of the duplicated region. Because the
ability to form secondary structures is sequence-specific, sequence variation among taxa may
contribute to variation in the positions and lengths of the structures formed. This may
account for the observation of different tandem repeats in the same general locations among
taxa and could also facilitate the independent duplication of homologous sequences within or
among species. Once two copies are present, the propensity for strand slippage may increase,
allowing tandem arrays to expand and contract. Given multiple copies of a repeat, those that
serve as templates in particular addition events may vary stochastically, but internal copies
should be favored when they outnumber terminal copies.
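The last point can be made concrete with a minimal sketch. It assumes, purely for illustration, that every copy in an array is equally likely to serve as the template in a given addition event; this uniform-choice assumption and the function name `internal_template_probability` are hypothetical and are not taken from the analyses reported here.

```python
from fractions import Fraction


def internal_template_probability(n_copies: int) -> Fraction:
    """Chance that a uniformly chosen template copy is internal (not at either end).

    Assumption (illustrative only): each copy in the array is equally likely
    to be used as the template in an addition event.
    """
    if n_copies < 1:
        raise ValueError("an array needs at least one copy")
    # An array has two terminal copies, except a single copy, which is its own end.
    terminal = min(n_copies, 2)
    return Fraction(n_copies - terminal, n_copies)


if __name__ == "__main__":
    for n in range(2, 9):
        p = internal_template_probability(n)
        print(f"{n} copies: P(internal template) = {p} ({float(p):.2f})")
```

Under this assumption, internal copies become the more likely templates only once an array exceeds four copies; at exactly four copies, internal and terminal templates are equally likely.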
If repeats contain functionally important regions, the primary sequence of terminal copies
(or at least the important parts) may be maintained by functional constraints, while
selection on internal copies is relaxed. Therefore, a greater proportion of variation which
becomes completely homogenized will likely have arisen in terminal copies, yet these copies
should also be more conserved. The extent of nucleotide variation within and among repeat
arrays will vary with the propensity for parallel duplication and the rate of
addition/deletion relative to nucleotide mutation.
The discovery of the same (or very similar) tandem repeat units among many species of shrews
(Fumagalli et al. 1996), darters (Turner 1997), and bats (Wilkinson et al. 1997) suggests
greater evolutionary significance for repeat arrays than has been previously thought. These
results indicate either that such repeats exhibit substantial historical stability or that
mutation pressure has caused them to arise repeatedly in multiple lineages. Additional
detailed investigations should provide further information on the evolutionary mode and
significance of tandem repeats in mtDNA.
Acknowledgments
We thank Joe Bielawski, Rick Harrison, Doug McElroy, Gavin Naylor, David Rand, John Rice, Tom
Turner, and Rob Wood for helpful discussions and/or comments on the manuscript. For
assistance in collecting or providing specimens, we thank Don Buth, Robert Dawley, Abby
Dowling, Karen Dowling, Dave Etnier, Tom Haglund, Randy Hoeh, Wayne Starnes, Ross Timmons,
Matt White, and Rob Wood. This work was supported by the Department of Zoology and Graduate
Student Association at Arizona State University, Sigma Xi, and NSF (DEB-9220683).
LITERATURE CITED
ARNASON, E., and D. M. RAND. 1992. Heteroplasmy of short tandem repeats in mitochondrial DNA of Atlantic cod, Gadus morhua. Genetics 132:211-220.
BROUGHTON, R. E., and T. E. DOWLING. 1994. Length variation in mitochondrial DNA of the minnow Cyprinella spiloptera. Genetics 138:179-190.
BROWN, J. R., A. T. BECKENBACH, and M. J. SMITH. 1992. Mitochondrial DNA length variation and heteroplasmy in populations of white sturgeon (Acipenser transmontanus). Genetics 132:221-228.
BROWN, W. M. 1985. The mitochondrial genome of animals. Pp. 95-130 in R. MACINTYRE, ed. Molecular evolutionary genetics. Plenum Press, New York.
BUROKER, N. E., J. R. BROWN, T. A. GILBERT, P. J. O'HARA, A. T. BECKENBACH, W. K. THOMAS, and M. J. SMITH. 1990. Length heteroplasmy of sturgeon mitochondrial DNA: an illegitimate elongation model. Genetics 124:157-163.
CHANG, D. D., W. W. HAUSWIRTH, and D. A. CLAYTON. 1985. Replication priming and transcription initiate from precisely the same site in mouse mitochondrial DNA. EMBO J. 4:1559-1567.
CLAYTON, D. A. 1991. Nuclear gadgets in mitochondrial DNA replication and transcription. Trends Biochem. Sci. 16:107-111.
CLAYTON, D. A. 1992. Transcription and replication of animal mitochondrial DNAs. Int. Rev. Cytol. 141:217-232.
DEVEREUX, J., P. HAEBERLI, and O. SMITHIES. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.
DOVER, G. A. 1982. Molecular drive: a cohesive mode of species evolution. Nature 299:111-117.
ELDER, J. F., and B. J. TURNER. 1995. Concerted evolution of repetitive DNA sequences in eukaryotes. Q. Rev. Biol. 70:297-319.
FORAN, D. R., J. E. HIXSON, and W. M. BROWN. 1988. Comparisons of ape and human sequences that regulate mitochondrial DNA transcription and D-loop synthesis. Nucleic Acids Res. 13:5841-5861.
FUMAGALLI, L., P. TABERLET, L. FAVRE, and J. HAUSSER. 1996. Origin and evolution of homologous repeated sequences in the mitochondrial DNA control region of shrews. Mol. Biol. Evol. 13:31-46.
GHIVIZZANI, S. C., S. L. D. MACKAY, C. S. MADSEN, P. J. LAIPIS, and W. W. HAUSWIRTH. 1993. Transcribed heteroplasmic repeated sequences in the porcine mitochondrial DNA D-loop region. J. Mol. Evol. 37:36-47.
GLICKMAN, B. W., and L. S. RIPLEY. 1984. Structural intermediates of deletion mutagenesis: a role for palindromic DNA. Proc. Natl. Acad. Sci. USA 81:512-516.
HAYASAKA, K., T. ISHIDA, and S. HORAI. 1991. Heteroplasmy and polymorphism in the major noncoding region of mitochondrial DNA in Japanese monkeys: association with tandemly repeated sequences. Mol. Biol. Evol. 8:399-415.
HOELZEL, A. R., J. M. HANCOCK, and G. A. DOVER. 1993. Generation of VNTRs and heteroplasmy by sequence turnover in the mitochondrial control region of two elephant seal species. J. Mol. Evol. 37:190-197.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. The Pennsylvania State University, University Park.
LEE, W.-J., J. CONROY, W. H. HOWELL, and T. D. KOCHER. 1995. Structure and evolution of teleost mitochondrial control regions. J. Mol. Evol. 41:54-66.
LEVINSON, G., and G. A. GUTMAN. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.
MADSEN, C. S., S. C. GHIVIZZANI, and W. W. HAUSWIRTH. 1993. In vivo and in vitro evidence for slipped mispairing in mammalian mitochondrial DNA. Proc. Natl. Acad. Sci. USA 90:7671-7675.
MUNDY, N. I., C. S. WINCHELL, and D. S. WOODRUFF. 1996. Tandem repeats and heteroplasmy in the mitochondrial DNA control region of the loggerhead shrike (Lanius ludovicianus). J. Hered. 87:21-26.
OHTA, T. 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216-240.
PIERCE, J. C., D. KONG, and W. MASKER. 1991. The effect of the length of direct repeats and the presence of palindromes on deletion between directly repeated DNA sequences in bacteriophage T7. Nucleic Acids Res. 19:3901-3905.
RAND, D. M. 1993. Endotherms, ectotherms, and mitochondrial genome-size variation. J. Mol. Evol. 37:281-295.
RAND, D. M. 1994. Concerted evolution and RAPing in mitochondrial VNTRs and the molecular geography of cricket populations. Pp. 227-245 in B. SCHIERWATER, B. STREIT, G. P. WAGNER, and R. DESALLE, eds. Molecular ecology and evolution: approaches and applications. Birkhauser Verlag, Basel.
RICE, W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223-225.
RZHETSKY, A., and M. NEI. 1992. A simple method for estimating and testing minimum-evolution trees. Mol. Biol. Evol. 9:945-967.
SACCONE, C., G. PESOLE, and E. SBISA. 1991. The main regulatory region of mammalian mitochondrial DNA: structure-function model and evolutionary pattern. J. Mol. Evol. 33:83-91.
SAITOU, N., and M. NEI. 1987. The neighbor joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SHEDLOCK, A. M., J. D. PARKER, D. A. CRISPIN, T. W. PIETSCH, and G. C. BURMER. 1992. Evolution of the salmonid mitochondrial control region. Mol. Phylogenet. Evol. 1:179-192.
SMITH, G. P. 1973. Unequal crossover and the evolution of multigene families. Cold Spring Harb. Symp. Quant. Biol. 38:507-513.
SOLIGNAC, M., M. MONNEROT, and J.-C. MOUNOLOU. 1986. Concerted evolution of sequence repeats in Drosophila mitochondrial DNA. J. Mol. Evol. 24:53-60.
STANTON, D. J., L. L. DAHLER, C. MORITZ, and W. M. BROWN. 1994. Sequences with the potential to form stem-and-loop structures are associated with coding-region duplications in animal mitochondrial DNA. Genetics 137:233-241.
STEWART, D. T., and A. J. BAKER. 1994. Patterns of sequence variation in the mitochondrial D-loop region of shrews. Mol. Biol. Evol. 11:9-21.
SWOFFORD, D. L. 1993. PAUP: phylogenetic analysis using parsimony. Version 3.1.1. Illinois Natural History Survey, Champaign.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221-244.
TURNER, T. F. 1997. Mitochondrial control region sequences and phylogenetic systematics of darters (Teleostei: Percidae). Copeia 1997:319-338.
WILKINSON, G. S., and A. M. CHAPMAN. 1991. Length and sequence variation in evening bat D-loop mtDNA. Genetics 128:607-617.
WILKINSON, G. S., F. MAYER, G. KERTH, and B. PETRI. 1997. Evolution of repeated sequence arrays in the D-loop of bat mitochondrial DNA. Genetics 146:1035-1048.
NARUYA SAITOU, reviewing editor
Accepted August 20, 1997