Evolutionary Dynamics of Tandem Repeats in the Mitochondrial DNA Control Region of the Minnow Cyprinella spiloptera

Richard E. Broughton¹ and Thomas E. Dowling
Department of Zoology, Arizona State University

Length variation due to tandem repeats is now recognized as a common feature of animal mitochondrial DNA; however, the evolutionary dynamics of repeated sequences are not well understood. Using phylogenetic analysis, predictions of three models of repeat evolution were tested for arrays of 260-bp repeats in the cyprinid fish Cyprinella spiloptera. Variation at different nucleotide positions in individual repeats supported different models of repeat evolution. One set of characters included several nucleotide variants found in all copies from a limited number of individuals, while the other set included an 8-bp deletion found in a limited number of copies in all individuals. The deletion and an associated nucleotide change appear to be the result of a deterministic, rather than stochastic, mutation process. Parallel origins of repeat arrays in different mitochondrial lineages, possibly coupled with a homogenization mechanism, best explain the distribution of nucleotide variation.

Introduction

Animal mitochondrial DNA (mtDNA) has been viewed as an example of genetic economy due to small genome size (typically 16-18 kb), absence of introns, and limited noncoding sequences (Brown 1985). However, this view has recently been challenged by the discovery of frequent tandem duplications, primarily in the form of tandem repeat arrays in the control region. As the number of taxa for which sequence or restriction-site data exist has increased, the occurrence of tandem repeats has become recognized as a common feature of mtDNA. Rand (1993) listed more than 50 species with variable-length genomes, and examples continue to be discovered at a rapid rate (e.g., Ghivizzani et al.
1993; Hoelzel, Hancock, and Dover 1993; Broughton and Dowling 1994; Stewart and Baker 1994; Mundy, Winchell, and Woodruff 1996). Because much of the theory for the evolution of nuclear repeats involves some form of recombination (reviewed by Elder and Turner 1995), the evolution of repeats in mtDNA (which lacks recombination) is not well understood. Knowledge of molecular mechanisms that cause duplications to arise and proliferate is vital for a complete understanding of the evolution of mtDNA and its efficient use as a marker in evolutionary studies.

Extensive length polymorphism in mtDNA tandem repeat arrays results from a propensity for addition and deletion of copies. In the absence of recombination, slipped-strand mispairing (SSM) during replication (Levinson and Gutman 1987; Buroker et al. 1990; Hayasaka, Ishida, and Horai 1991) appears to be the primary mechanism causing additions and deletions (Madsen, Ghivizzani, and Hauswirth 1993). Sequences capable of forming secondary structures have been shown to increase the frequency of SSM in a variety of nonmitochondrial systems (Glickman and Ripley 1984; Pierce, Kong, and Masker 1991), and it appears that such structures also play an important role in mtDNA (Buroker et al. 1990; Wilkinson and Chapman 1991; Arnason and Rand 1992; Broughton and Dowling 1994; Stanton et al. 1994; Stewart and Baker 1994).

¹ Present address: Section of Ecology and Systematics, Cornell University.
Key words: tandem repeat evolution, mitochondrial DNA, Cyprinella spiloptera.
Address for correspondence and reprints: Richard E. Broughton, Section of Ecology and Systematics, Corson Hall, Cornell University, Ithaca, New York 14853-2701. E-mail: [email protected].
Mol. Biol. Evol. 14(12):1187-1196. 1997
© 1997 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Nucleotide variation among copies may reveal historical processes that gave rise to extant repeat arrays.
Primary among these processes is the rate of copy addition and deletion relative to the nucleotide mutation rate. At a low rate of addition/deletion, copies in an array would diverge from each other as nucleotide changes accumulate. Alternatively, “homogenization” of nucleotide variation within specific lineages may occur as long as the rate of copy turnover is greater than the nucleotide mutation rate (Rand 1994). Greater sequence similarity within rather than among some repeat arrays has led several investigators (Solignac, Monnerot, and Mounolou 1986; Arnason and Rand 1992; Hoelzel, Hancock, and Dover 1993; Stewart and Baker 1994) to draw analogies with concerted evolution observed for many repeated nuclear sequences. However, because the term “concerted evolution” has been used to describe similarity of repeat arrays among lineages mediated by interlocus recombination (Dover 1982), the more general term “homogenization” may be more appropriate for nonrecombinant lineages. In fact, different repeat arrays in clonal lineages will tend to diverge, rather than evolve in concert, as they are homogenized for different nucleotide variants (Smith 1973; Ohta 1983).

Independent formation of similar duplications in different lineages may also be important. Formation of specific secondary structures may repeatedly facilitate duplication of homologous sequences, resulting in independently derived repeat arrays. Although their frequency is unknown, convergent mitochondrial duplications have been reported within species (Hayasaka, Ishida, and Horai 1991) and among mammalian orders (Wilkinson et al. 1997). If independent duplications arise in the presence of nucleotide polymorphisms, multiple repeat arrays, each possessing lineage-specific nucleotide variants, may result.
We can therefore identify several evolutionary scenarios for tandem repeat evolution and derive predictions [...]

FIG. 2.-Variable nucleotides in C. spiloptera nonduplicated control region sequences. Individual fish are identified by river of origin and a numeral distinguishing individuals from the same collection. Numbering refers to positions in complete 670-bp alignment; a dot indicates same nucleotide as top sequence.

[...] GCATAACGTATCTGTACTTC-3’) (fig. 1).
Specific repeat copies were designated based on their positions on the light strand, i.e., 5’, internal, or 3’.

Data Analysis

Sequences were aligned using PILEUP in GCG (Devereux, Haeberli, and Smithies 1984). To assess differences in variability among different sequences, average pairwise differences within each of four categories (5’ copies, internal copies, 3’ copies, and nonduplicated sequences) were calculated using Jukes-Cantor distances. Because of an 8-bp deletion found in many of the repeats, distances were calculated using the pairwise deletion option in the computer program MEGA (Kumar, Tamura, and Nei 1993). Variation of Jukes-Cantor distances within and among the four groups of sequences was analyzed with a Kruskal-Wallis nonparametric analysis of variance.

Phylogenetic analysis was conducted separately for duplicated and nonduplicated sequences using parsimony and neighbor-joining methods. For the repeats, each copy was considered a separate OTU. In parsimony analyses (PAUP 3.1.1; Swofford 1993), all nucleotide changes were given equal weight, and an 8-bp deletion was coded as a single base change. Heuristic searches used random taxon addition and TBR branch swapping, and 10 trees were held at each step during 100 search replicates. Neighbor-joining analysis (Saitou and Nei 1987) employing Jukes-Cantor distances and confidence probability (CP) values (Rzhetsky and Nei 1992) was performed with MEGA (Kumar, Tamura, and Nei 1993). Comparison of tree topologies supporting alternate models of repeat evolution was conducted with the nonparametric test of Templeton (1983).

Results

Phylogenetic Analysis of Nonduplicated Sequences

Variable nucleotides from nonduplicated sequences are shown in figure 2 (complete sequences available under GenBank accession numbers U73315-U73323 and U86047-U86057). No two sequences were identical; however, much of the variation was unique to single individuals.
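The distance calculation described under Data Analysis can be sketched as follows. This is a minimal illustration and not the MEGA implementation: the function names (`jukes_cantor`, `mean_pairwise_distance`), the gap symbol `-`, and the toy sequences are all assumptions made for the example. Under pairwise deletion, a site is skipped only for those pairs in which at least one sequence has a gap, and the Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites among those compared.

```python
import math
from itertools import combinations


def jukes_cantor(seq_a, seq_b, gap="-"):
    """Jukes-Cantor distance between two aligned sequences (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Pairwise deletion: ignore sites where either sequence has a gap.
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_pairwise_distance(seqs):
    """Average Jukes-Cantor distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(jukes_cantor(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not data from this study).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACG--CGTAC"]
print(round(jukes_cantor(toy[0], toy[1]), 4))  # 0.1073 (1 difference in 10 sites)
print(mean_pairwise_distance(toy))

# Number of unordered pairs among 15 sequences.
print(math.comb(15, 2))  # 105
```

The last line shows why a group of 15 sequences yields 15 x 14 / 2 = 105 pairwise comparisons.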
Phylogenetic analysis yielded limited resolution with both parsimony and neighbor-joining (fig. 3). However, individuals Little 1, Stony 2, Stony 6, Elkhart 2, and French Broad 1 form a lineage distinct from the remaining individuals.

FIG. 3.-Phylogenetic relationships of individual haplotypes indicated by nonduplicated sequences. A, Parsimony strict consensus of 8,124 trees. B, Neighbor-joining tree with CP values of ≥50.

Variation Within and Among Repeats

Variable nucleotides in tandem repeats are shown in figure 4 (complete sequences under GenBank accession numbers U73306-U73314 and U86058-U86068). No standard-length (single-copy) genomes or length heteroplasmy was observed, and the positions of duplication endpoints were identical for all copies in all individuals. Divergence within C. spiloptera was <5% for both duplicated and nonduplicated sequences. Nucleotide diversity differed significantly among different repeat positions and nonduplicated sequences (fig. 5). Because of the potential problem of assessing the homology of particular copies among arrays which differ in copy number, only the 15 individuals with three copies (fig. 4) were used for the nucleotide diversity analysis. This allowed 105 pairwise distance comparisons for each group of sequences. Diversity was lowest for 5’ copies and highest for the nonduplicated sequences (Kruskal-Wallis test: H = 28.694, P < 0.0001). In a posteriori t-tests (corrected for multiple tests; Rice 1989), 5’ copies were found to exhibit significantly reduced diversity relative to all other groups. These results indicate that among repeats in C. spiloptera, copies in the 5’ position are conserved, and that the portion of the control region adjacent to tRNAPro evolves rapidly relative to the repeat region.

Phylogenetic Analysis of Repeats

Parsimony analysis of all repeats yielded >32,000 equally most parsimonious trees (the limit of memory on the computer used). The only structure was a monophyletic group containing all copies from individuals Little 1, Stony 2, Stony 6, Elkhart 2, and French Broad 1, and a group uniting the 3’ and internal copies from individual Big 5 (not shown). For practical purposes, subsequent analyses included sequences from 10 individuals (9 C. spiloptera and 1 C. venusta) that exhibited nucleotide differences in two or more copies. For these individuals, parsimony analysis produced 233 shortest trees (fig. 6A). Use of the neighbor-joining approach resolved additional nodes, but CP values on many of these were low (fig. 6B).

Examination of the repeat sequence data (fig. 4) revealed that the source of limited resolution was extreme conflict among characters. Conflict appears to be attributable to a derived 8-bp deletion and an adjacent A-to-G change (positions 11-19 in fig. 4; note that the deletion was weighted equal to a single-nucleotide change). These nucleotides (hereafter referred to as the G/deletion) occurred in all internal and 3’ copies but never in 5’ copies, a distribution consistent with the single-origin model. However, most of the other characters (positions 60-62, 159, 176, 177, 183-185, 187, 188, 210, 225, 240, and 248; fig. 4) exhibited derived states in all copies (internal, 3’, and 5’) in the individuals in which they occurred. Ten of these 15 characters were uniquely derived in single individuals. These 15 characters support grouping of repeats by individuals or groups of individuals as predicted by the multiple-origins or homogenization models.
Conflict between these two groups of characters resulted in reduced phylogenetic resolution, yet the presence of well-supported nodes grouping all copies from some individuals (e.g., Stony 6 and French Broad 1) supports the multiple-origins or homogenization models (fig. 6). Although the G/deletion reduced phylogenetic resolution, in no case among the 233 shortest trees (37 steps) did it form an unreversed synapomorphy. Enforcing a topological constraint on a single node, making the G/deletion (and thus all 3’ and internal copies) monophyletic, required five additional steps. In each of these 582 trees (42 steps), parallel mutation of 15 other nucleotide characters was required.

To assess the significance of the difference between trees consistent with each model, the nonparametric test of Templeton (1983) was applied. From each set of most-parsimonious trees (37 vs. 42 steps), one fully resolved tree that minimized the number of homoplastic characters was selected (fig. 7). These trees have similar structures for nodes that are not important in discriminating among hypotheses. The Templeton test indicated that the topology supporting multiple origins or homogenization (fig. 7A) was significantly shorter (T_s = 33, n = 17, P < 0.025) than the alternative supporting a single origin (fig. 7B).

Relationships of repeats from individual fish were consistent with relationships of nonduplicated sequences. In particular, sequences from individuals French Broad 1 and Stony 6 form a well-supported group in both analyses. In addition, phylogenetic analysis of only 5’ copies from each individual (not shown) yielded a pattern very similar to that in figure 3, with strong support for the French Broad 1-Stony 6 group. These results demonstrate that sequence evolution in the tandem repeats reflects the history of the lineages in which they are found, and that recombination or gene conversion among lineages is negligible.
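The tree comparison above uses the test of Templeton (1983), which is generally carried out as a Wilcoxon signed-rank test on per-character differences in the number of steps required by two trees. The sketch below assumes that formulation. The function name `signed_rank_statistic` and the step counts are hypothetical; they are not the data behind the reported T_s and n, and the sketch stops at the test statistic rather than computing a P value.

```python
def signed_rank_statistic(steps_a, steps_b):
    """Return (T, n) for a Wilcoxon signed-rank comparison of two trees.

    steps_a and steps_b hold the number of steps each character requires
    on tree A and tree B. T is the smaller of the positive and negative
    rank sums; n is the number of characters whose step counts differ.
    """
    if len(steps_a) != len(steps_b):
        raise ValueError("both trees must be scored for the same characters")
    # Characters with identical step counts are uninformative and dropped.
    diffs = [a - b for a, b in zip(steps_a, steps_b) if a != b]
    n = len(diffs)
    # Rank absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative), n


# Hypothetical per-character step counts on two competing trees.
single_origin_tree = [2, 2, 1, 3, 1, 1, 2]
multiple_origins_tree = [1, 1, 1, 1, 2, 1, 1]
print(signed_rank_statistic(single_origin_tree, multiple_origins_tree))  # (2.5, 5)
```

A small T relative to n means that nearly all differing characters favor the same tree; significance is then read from Wilcoxon signed-rank critical values for that n.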
FIG. 4.-Variable nucleotides in copies of tandem repeats. Individuals are identified as in figure 2; copies within individuals are indicated by physical position (5’, internal [i], or 3’). For each individual, all copies are included. Repeats included in phylogenetic analyses are indicated by boldface type. Numbering refers to position in the complete 271-bp alignment; a dot indicates same nucleotide as top sequence; a dash indicates an alignment gap.

Discussion

Repeat Variation

Differences in nucleotide diversity among repeats and nonduplicated sequences suggest that these sequences are subject to different functional constraints.
It has been well established that the central part of the control region is more highly conserved than the adjacent flanking regions in mammals (Saccone, Pesole, and Sbisa 1991) and in fishes (Brown, Beckenbach, and Smith 1992; Shedlock et al. 1992; Lee et al. 1995), apparently due to the presence of regulatory elements. Therefore, it is not surprising that nonduplicated sequences proximal to tRNAPro exhibit substantially more diversity. Reduced diversity of 5’ copies relative to other repeat units is similar to results reported for the evening bat (Wilkinson and Chapman 1991). In that and many other systems, tandem repeats are found in the portion of the control region adjacent to tRNAPro, and are assumed to result from strand slippage at the 3’ end of the D-loop (7S) DNA (Buroker et al. 1990). Repeats at the opposite end of the control region (adjacent to tRNAPhe) may result from a similar mechanism involving slippage of the nascent heavy strand as replication nears completion (Hayasaka, Ishida, and Horai 1991; Broughton and Dowling 1994). In either case, repeats would be added on the heavy strand in the same directional orientation but differing in timing (early vs. late) and position (left vs. right of O_H).

An implication of these results is that the template for initial duplication events occupies the 5’ position but may be involved to a lesser extent in subsequent additions and deletions. Several studies have indicated that within repeat arrays, derived variation is usually shared among internal copies, whereas terminal copies tend to accumulate fewer changes and have greater similarity among arrays. This phenomenon was termed the “edge-effect” by Rand (1994). Although Fumagalli et al. (1996) suggested that addition and deletion occur at the 5’ end and that the 3’ copy is protected from such events, the present results indicate that 5’ copies are conserved. An edge-effect at the 3’ end was not evident, but this may be a function of the limited number of repeats in C. spiloptera. It therefore appears that 5’ copies are ancestral, while additional copies accumulate in the 3’ direction.

FIG. 5.-Nucleotide diversity in nonduplicated sequences and 5’, internal, and 3’ copies of the repeat. Error bars = SEM; for each group, n = 105.

Character Distribution Among Repeats

Despite limited phylogenetic resolution, examination of nucleotide character state distributions and functional constraints provides insights into the evolutionary dynamics of mtDNA tandem repeats. The G/deletion occurs in what would otherwise be conserved sequence block 2 (CSB-2). Given its regulatory role in replication and transcription (Chang, Hauswirth, and Clayton 1985; Clayton 1991, 1992) and its high level of sequence conservation among vertebrates (Foran, Hixson, and Brown 1988), CSB-2 appears to be an essential element in normal mtDNA function. The duplicated segments contain CSB-2 and CSB-3 and presumably both heavy- and light-strand promoters, and are adjacent to O_H. Disruption of regulatory sequences in “extra” copies may confer an advantage, as multiple functional copies of this important region may complicate regulatory processes. Deletion of part of CSB-2 in extra copies may reduce or eliminate potential problems in regulation, favoring individuals lacking multiple intact copies of CSB-2.

FIG. 6.-Phylogenetic relationships of repeats from 10 individuals. A, Strict consensus of 233 most-parsimonious trees (37 steps). B, Neighbor-joining tree with CP values of ≥50.

FIG. 7.-Fully resolved trees consistent with (A) multiple origins or homogenization and (B) a single origin of the repeats. These trees (selected from each set of most-parsimonious trees) minimized the number of characters that were homoplastic. Open hash marks indicate the G/deletion, and shaded hash marks refer to characters in figure 4 as follows: a = 1, 2, 6-8; b = 60-62, 185, 187, 188; c = 159; d = 176; e = 177; f = 183; g = 184; h = 210; i = 225; j = 240; k = 248.

While it is not difficult to envision a full-length copy (i.e., one without the G/deletion) giving rise to a copy with the G/deletion, it seems very unlikely that the reverse could occur. As such, 5’ copies are not likely to be deleted; first, because the molecule would be left without a functional CSB-2, and second, because there would be no conceivable way to regenerate an intact CSB-2. This apparent irreversibility has important implications for character state transformations in phylogenetic analysis because an array containing at least one full-length copy could never be derived from a lineage where all copies possess the G/deletion. Clearly some characters must have converged across repeat units.
Not unexpectedly, coding the G/deletion as a single character (vs. two characters) or removing it entirely reduced homoplasy and substantially increased phylogenetic resolution. These results (not shown) are consistent with the topology in figure 7A. Three lines of evidence suggest that the G/deletion is convergent: (1) Trees allowing convergent evolution of the G/deletion are significantly shorter than alternatives where it evolves only once. (2) It is more parsimonious to assume that the G/deletion (i.e., two possibly related characters) is convergent than to assume that 15 nucleotide characters (that unite sets of copies in specific lineages) were derived more than once. (3) The apparent irreversibility of the G/deletion requires that it must have been independently derived in each monophyletic group containing both 5’ and extra copies.

In light of evidence indicating that the G/deletion is convergent, the distribution of intact CSB-2s only in 5’ copies suggests that whenever a new copy is made from a 5’ copy, a G/deletion is produced (or only when a G/deletion is produced may repeats persist). Presumably, an event drastic enough to result in deletion of eight bases could also cause an adjacent nucleotide change. Such an event could be an artifact of the duplication process, conceivably involving the secondary structures thought to facilitate duplication formation. The G/deletion therefore represents a deterministic form of mutation that is apparently driven by the molecular mechanisms causing sequence duplications.

Because the G/deletion provides the only support for the single-origin model, this model may be rejected in favor of either multiple origins or homogenization. Although the multiple-origins and homogenization models were not differentiated by phylogenetic analysis, they differ in several respects.
Primary differences include the timing of point mutations (before duplication for multiple origins and after initial duplication for homogenization) and the relative rate of copy addition/deletion (fast vs. slow, respectively). These differences are incorporated into a simple example where individuals with repeats fixed for nucleotide differences are derived from a single ancestor (fig. 8). This example illustrates two important points: (1) the multiple-origins model requires fewer evolutionary steps than homogenization (accounting for both addition/deletion and point mutations), and (2) complete homogenization passes through intermediate forms in which derived nucleotides are shared by some but not all copies. To explain repeat variation in C. spiloptera under the two models, multiple origins would require 30 mutational events, while homogenization would require a minimum of 38 events. Because of the presumably stochastic nature of copy addition/deletion, the number of events required for complete homogenization may actually be much higher than the minimum. Multiple origins is, therefore, a more parsimonious explanation.

FIG. 8.-Mutational steps required by (A) multiple origins and (B) homogenization for each to produce two repeat arrays fixed for nucleotide differences. Vertical arrows = derived nucleotide state; dashed arrows = copy number changes. Multiple origins requires three steps, while homogenization requires four steps.

With respect to intermediate forms, it appears that copies with the G/deletion cannot occupy the 5’ position; therefore, nucleotide changes arising in copies with the G/deletion cannot be completely homogenized in an array. Consequently, fixed derived nucleotides must originate in 5’ copies, and the homogenization process should produce intermediate states where derived nucleotides are shared by the 5’ and some extra copies. Although some unfixed variants were shared among extra copies in C.
spiloptera, there were no cases of incomplete homogenization involving a 5’ copy (fig. 4). Both the lack of expected intermediates and the greater number of evolutionary steps predicted by the homogenization model could be explained by a sufficiently high rate of copy addition/deletion. Although an expected by-product of rapid copy turnover is length heteroplasmy, no heteroplasmy was observed among the nearly 50 individuals examined. In the C. spiloptera system with relatively low copy number, large repeat units, and lack of heteroplasmy, the rate of copy turnover would appear to be fairly low. Therefore, while neither homogenization nor multiple origins can be rejected, the lack of circumstantial evidence for homogenization suggests it has not acted recently. In addition, regardless of whether repeats arise independently in different lineages (multiple origins) or are added to existing arrays (homogenization), it is clear that the G/deletion must be produced whenever a 5’ copy serves as the template for duplication. The patterns of nucleotide variation reported here provide several new perspectives on the evolution of mitochondrial tandem repeats. The multiple-origins model remains a viable explanation for the distribution of fixed nucleotide variants within various arrays. It is possible that homogenization has acted in conjunction with multiple origins, but tests of such a scenario will be difficult because of the potential for each model to produce similar end results. Although homogenization is clearly an important factor governing tandem repeat variation in many animal taxa, multiple origins may be more important than previously thought. The G/deletion also allows novel insights on molecular evolution by providing new evidence for a real, but as yet uncharacterized, deterministic component of DNA mutation. Such mutations may be important where specific secondary structures interfere with DNA replication. 
Repeat Evolution

We suggest the following hypothetical framework for control region tandem repeat evolution. Original duplications are formed via strand slippage, which may be facilitated by DNA secondary structures. Stem-loop structures may be formed by the repeat units themselves or possibly occur upstream (with respect to replication) of the duplicated region. Because the ability to form secondary structures is sequence-specific, sequence variation among taxa may contribute to variation in the positions and lengths of the structures formed. This may account for the observation of different tandem repeats in the same general locations among taxa and could also facilitate the independent duplication of homologous sequences within or among species. Once two copies are present, the propensity for strand slippage may increase, allowing tandem arrays to expand and contract. Given multiple copies of a repeat, those that serve as templates in particular addition events may vary stochastically, but internal copies should be favored when they outnumber terminal copies. If repeats contain functionally important regions, the primary sequence of terminal copies (or at least the important parts) may be maintained by functional constraints, while selection on internal copies is relaxed. Therefore, a greater proportion of variation which becomes completely homogenized will likely have arisen in terminal copies, yet these copies should also be more conserved. The extent of nucleotide variation within and among repeat arrays will vary with the propensity for parallel duplication and the rate of addition/deletion relative to nucleotide mutation.

The discovery of the same (or very similar) tandem repeat units among many species of shrews (Fumagalli et al. 1996), darters (Turner 1997), and bats (Wilkinson et al. 1997) suggests greater evolutionary significance [...]

Acknowledgments

We thank Joe Bielawski, Rick Harrison, Doug McElroy, Gavin Naylor, David Rand, John Rice, Tom Turner, and Rob Wood for helpful discussions and/or comments on the manuscript. For assistance in collecting or providing specimens, we thank Don Buth, Robert Dawley, Abby Dowling, Karen Dowling, Dave Etnier, Tom Haglund, Randy Hoeh, Wayne Starnes, Ross Timmons, Matt White, and Rob Wood. This work was supported by the Department of Zoology and Graduate Student Association at Arizona State University, Sigma Xi, and NSF (DEB-9220683).

LITERATURE CITED

ARNASON, E., and D. M. RAND. 1992. Heteroplasmy of short tandem repeats in mitochondrial DNA of Atlantic cod, Gadus morhua. Genetics 132:211-220.
BROUGHTON, R. E., and T. E. DOWLING. 1994. Length variation in mitochondrial DNA of the minnow Cyprinella spiloptera. Genetics 138:179-190.
BROWN, J. R., A. T. BECKENBACH, and M. J. SMITH. 1992. Mitochondrial DNA length variation and heteroplasmy in populations of white sturgeon (Acipenser transmontanus). Genetics 132:221-228.
BROWN, W. M. 1985. The mitochondrial genome of animals. Pp. 95-130 in R. MACINTYRE, ed. Molecular evolutionary genetics. Plenum Press, New York.
BUROKER, N. E., J. R. BROWN, T. A. GILBERT, P. J. O’HARA, A. T. BECKENBACH, W. K. THOMAS, and M. J. SMITH. 1990. Length heteroplasmy of sturgeon mitochondrial DNA: an illegitimate elongation model. Genetics 124:157-163.
CHANG, D. D., W. W. HAUSWIRTH, and D. A. CLAYTON. 1985. Replication priming and transcription initiate from precisely the same site in mouse mitochondrial DNA. EMBO J. 4:1559-1567.
CLAYTON, D. A. 1991. Nuclear gadgets in mitochondrial DNA replication and transcription. Trends Biochem. Sci. 16:107-111.
-. 1992.
GHIVIZZANI, S. C., S. L. D. MACKAY, C. S. MADSEN, P. J. LAIPIS, and W. W. HAUSWIRTH. 1993. Transcribed heteroplasmic repeated sequences in the porcine mitochondrial DNA D-loop region. J. Mol. Evol. 37:36-47.
GLICKMAN, B. W., and L. S. RIPLEY. 1984. Structural intermediates of deletion mutagenesis: a role for palindromic DNA. Proc. Natl. Acad. Sci. USA 81:512-516.
HAYASAKA, K., T. ISHIDA, and S. HORAI. 1991. Heteroplasmy and polymorphism in the major noncoding region of mitochondrial DNA in Japanese monkeys: association with tandemly repeated sequences. Mol. Biol. Evol. 8:399-415.
HOELZEL, A. R., J. M. HANCOCK, and G. A. DOVER. 1993. Generation of VNTRs and heteroplasmy by sequence turnover in the mitochondrial control region of two elephant seal species. J. Mol. Evol. 37:190-197.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. The Pennsylvania State University, University Park.
LEE, W.-J., J. CONROY, W. H. HOWELL, and T. D. KOCHER. 1995. Structure and evolution of teleost mitochondrial control regions. J. Mol. Evol. 41:54-66.
LEVINSON, G., and G. A. GUTMAN. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.
MADSEN, C. S., S. C. GHIVIZZANI, and W. W. HAUSWIRTH. 1993. In vivo and in vitro evidence for slipped mispairing in mammalian mitochondrial DNA. Proc. Natl. Acad. Sci. USA 90:7671-7675.
MUNDY, N. I., C. S. WINCHELL, and D. S. WOODRUFF. 1996. Tandem repeats and heteroplasmy in the mitochondrial DNA control region of the loggerhead shrike (Lanius ludovicianus). J. Hered. 87:21-26.
OHTA, T. 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216-240.
PIERCE, J. C., D. KONG, and W. MASKER. 1991. The effect of the length of direct repeats and the presence of palindromes on deletion between directly repeated DNA sequences in bacteriophage T7. Nucleic Acids Res. 19:3901-3905.
RAND, D. M. 1993. Endotherms, ectotherms, and mitochondrial genome-size variation. J. Mol. Evol. 37:281-295.
-. 1994. Concerted evolution and RAPing in mitochondrial VNTRs and the molecular geography of cricket populations. Pp. 227-245 in B. SCHIERWATER, B. STREIT, G. P. WAGNER, and R. DESALLE, eds. Molecular ecology and evolution: approaches and applications. Birkhauser Verlag, Basel.
RICE, W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223-225.
Transcription and replication of animal mitoR~HETSKY,A., and M. NEI. 1992. A simple method for estichondrial DNAs. Int. Rev. Cytol. 141:217-232. mating and testing minimum-evolution trees. Mol. Biol. DEVEREUX,J., I? HAEBERLI,and 0. SMITHIES.1984. A comEvol. 9: 945-967. prehensive set of sequence analysis programs for the VAX. SACCONE,C., G. PESOLE,and E. SBISA. 1991. The main regNucleic Acids Res. 12:387-395. ulatory region of mammalian mitochondrial DNA: strucDOVER,G. A. 1982. Molecular drive: a cohesive mode of speture-function model and evolutionary pattern. J. Mol. Evol. cies evolution. Nature 299: 11 l-l 17. 33:83-91. ELDER,J. E, and B. J. TURNER. 1995. Concerted evolution of SAITOU,N., and M. NEI. 1987. The neighbor joining method: repetitive DNA sequences in eukaryotes. Q. Rev. Biol. 70: a new method for reconstructing phylogenetic trees. Mol. 297-319. Biol. Evol. 4:406-425. FORAN, D. R., J. E. HIXSON, and W. M. BROWN. 1988. ComSHEDLOCK,A. M., J. D. PARKER,D. A. CRISPIN,T. W. PIETSCH, parisons of ape and human sequences that regulate mitoand G. C. BURMER. 1992. Evolution of the salmonid michondrial DNA transcription and D-loop synthesis. Nucleic tochondrial control region. Mol. Phylogenet. Evol. 1:179Acids Res. 13:5841-5861. 192. FUMAGALLI,L., I? TABERLET,L. FAVRE,and J. HAUSSER.1996. SMITH, G. F? 1973. Unequal crossover and the evolution of Origin and evolution of homologous repeated sequences in multigene families. Cold Spring Harb. Symp. Quant. Biol. the mitochondrial DNA control region of shrews. Mol. Biol. 38:507-513. Evol. 13:3 l-46. for repeat arrays than has been previously thought. These results indicate either that such repeats exhibit substantial historical stability or that mutation pressure has caused them to arise repeatedly in multiple lineages. Additional detailed investigations should provide further information on the evolutionary mode and significance of tandem repeats in mtDNA. 1196 Broughton and Dowling SOLIGNAC, M., M. 
MONNEROT, and J.-C. MOUNOLOU. 1986. Concerted evolution of sequence repeats in Drosophila mitochondrial DNA. J. Mol. Evol. 2453-60. STANTON, D. J., L. L. DAHLER, C. MORITZ, and W. M. BROWN. 1994. Sequences with the potential to form stem-and-loop structures are associated with coding-region duplications in animal mitochondrial DNA. Genetics 137:233-24 1. STEWART, D. T., and A. J. BAKER. 1994. Patterns of sequence variation in the mitochondrial D-loop region of shrews. Mol. Biol. Evol. 11:9-21. SWOFFORD, D. L. 1993. PAUP: phylogenetic analysis using parisimony. Version 3.1.1. Illinois Natural History Survey, Champaign. TEMPLETON, A. R. 1983. Phylogenetic inference from restric- tion endonuclease cleavage site maps with particular refer- ence to the evolution of humans and the apes. Evolution 371221-244. TURNER, T. E 1997. Mitochondrial control region sequences and phylogenetic systematics of darters (Teleostei: Percidae). Copeia 1997:319-338. WILKINSON, G. S., and A. M. CHAPMAN. 1991. Length and sequence variation in evening bat D-loop mtDNA. Genetics 128:607-617. WILKINSON, G. S., E MAYER, G. KEXRTH,and B. PETRI. 1997. Evolution of repeated sequence arrays in the D-loop of bat mitochondrial DNA. Genetics 146: 1035-1048. NARUYA SAITOU, reviewing Accepted August 20, 1997 editor