Extraordinary genome stability in the ciliate Paramecium tetraurelia

Way Sungᵃ,¹, Abraham E. Tuckerᵃ, Thomas G. Doakᵃ, Eunjin Choiᵃ, W. Kelley Thomasᵇ, and Michael Lynchᵃ

ᵃDepartment of Biology, Indiana University, Bloomington, IN 47405; and ᵇDepartment of Molecular Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824

Mutation plays a central role in all evolutionary processes and is also the basis of genetic disorders. Established base-substitution mutation rates in eukaryotes range between ∼5 × 10⁻¹⁰ and 5 × 10⁻⁸ per site per generation, but here we report a genome-wide estimate for Paramecium tetraurelia that is more than an order of magnitude lower than any previous eukaryotic estimate. Nevertheless, when the mutation rate per cell division is extrapolated to the length of the sexual cycle for this protist, the measure obtained is comparable to that for multicellular species with similar genome sizes. Because Paramecium has a transcriptionally silent germ-line nucleus, these results are consistent with the hypothesis that natural selection operates on the cumulative germ-line replication fidelity per episode of somatic gene expression, with the germ-line mutation rate per cell division evolving downward to the lower barrier imposed by random genetic drift. We observe ciliate-specific modifications of widely conserved amino acid sites in DNA polymerases as one potential explanation for unusually high levels of replication fidelity.

mutation accumulation | drift-barrier

Mutation is the ultimate source of genetic variation, which not only drives adaptive processes but also contributes to genetic disorders and, in some cases, extinction. Understanding the mutation rate is critical for determining rates of molecular evolution, estimating effective population sizes, understanding the impact of mutations on organismal fitness, and evaluating the power of drift, selection, and recombination in shaping genomes (1).
However, because of the extraordinarily high degree of replication fidelity in most organisms, procuring accurate measures of the mutation rate has been fraught with difficulties. The recent application of high-throughput sequencing technology to mutation-accumulation (MA) lines has revolutionized this area of inquiry, yielding essentially unbiased estimates of the genome-wide spontaneous mutation rate and spectrum in several eukaryotes (2–5). Under the MA process, repeated population bottlenecks minimize the efficiency of selection, enabling even highly deleterious mutations to accumulate in an effectively neutral fashion. At various points in the MA process, the accumulated pool of mutations in the individual lines can be assayed using high-throughput sequencing. So far, MA experiments in eukaryotes have shown that per-generation mutation rates generally correlate with genome size and effective population size (6, 7). However, the available whole-genome sequencing data from MA experiments include only a limited number of unicellular eukaryotes (2, 8), and the factors driving mutation-rate evolution across the eukaryotic domain remain unclear.

To further understand mutation-rate evolution in eukaryotes, we have applied the MA process to the ubiquitous unicellular freshwater ciliate Paramecium tetraurelia. This species, like ciliates in general, has a distinctive reproductive biology that may constrain mutation-rate evolution. P. tetraurelia exhibits nuclear dimorphism, harboring a transcriptionally silent (germ-line) micronucleus and a transcriptionally active (somatic) macronucleus. The species normally reproduces by mitotic binary fission, but after ∼75 cell divisions under high-growth conditions (data from this experiment), or ∼30–50 fissions when starved (9), P.
tetraurelia undergoes a self-fertilization process known as autogamy (9), at which time the old macronucleus is destroyed and replaced by a processed version of the new micronuclear genome (10). When in contact with compatible mating types, P. tetraurelia can also undergo conjugation (10), although this can be prevented in the laboratory by using a stock consisting of only one mating type. Under conditions of exclusive autogamy, all mutations arising in the micronucleus during clonal propagation should accumulate in a completely neutral fashion, with their fitness effects realized only after sexual reproduction. The mutations that accumulate in the micronucleus during P. tetraurelia clonal division are analogous to mutations that accumulate during the germ-line cell divisions of metazoans, as the effects of both types of mutations are not realized until they are expressed after the subsequent sexual generation. Although metazoans have some of the highest per-generation mutation rates known, the mutation rate per germ-line cell division remains quite low in such species (6). This observation suggests that although selection operates to drive down mutation rates (6, 11–14), it can act only on the expressed effects of mutations, so its efficiency in reducing the germ-line mutation rate depends on the length of time during which mutations experience quiescent germ-line sequestration (15). If this hypothesis is correct, the mutation rate per cell division in P. tetraurelia (and ciliates in general) is expected to be lower than that in other unicellular species. Thus, application of the MA procedure to P. tetraurelia provides a unique opportunity to shed light on the mechanisms driving the evolution of the mutation rate.

Results

Mutation Rate. To estimate the mutation rate in P. tetraurelia, we analyzed the macronuclear genomes of seven P.
tetraurelia MA lines that had been propagated through single-cell bottlenecks for ∼3,300 cell divisions, starting from an isogenic state, with sexual reproduction restricted to periodic autogamy. At the time of the final assay, essentially all detected mutations will have resided in the germ line at the last autogamy, and hence will reflect true germ-line mutations. A previous estimate of the scaling between mutation rate and genome size predicts a base-substitution rate of ∼1.5 × 10⁻⁹ per site per generation given the ∼72-Mb genome size of P. tetraurelia (6, 7, 16), which would have generated at least a few hundred base substitutions in each MA line.

Author contributions: W.S., W.K.T., and M.L. designed research; W.S., A.T., T.G.D., and E.C. performed research; W.S. analyzed data; and W.S., W.K.T., and M.L. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information Short Read Archive (SRA), www.ncbi.nlm.nih.gov/sra [accession nos. SRP013857 (study), SRS346546 (sample), and SRX155532 (experiment)].
¹To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1210663109/-/DCSupplemental.
Edited by Detlef Weigel, Max Planck Institute for Developmental Biology, Tübingen, Germany, and approved October 10, 2012 (received for review June 21, 2012)
PNAS | November 20, 2012 | vol. 109 | no. 47 | 19339–19344 | EVOLUTION

However, across all seven lines (an average of 63.2 Mb of analyzable sequence per line), we identified only 29 mutations, yielding an overall base-substitution mutation rate of 1.94 (SE = 0.36) × 10⁻¹¹ per site per cell division (Table 1). A separate maximum-likelihood approach (17) yielded a slightly higher rate of 2.64 (SE = 0.22) × 10⁻¹¹ per site per cell division.
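The per-line estimates in Table 1 follow the formulas given under Mutation Rate Calculations: u = m/(nT), with SE √(u/(nT)), and a pooled SE equal to the mean per-line SE divided by √N; Table 1 reports the pooled rate as the average of the per-line rates. The minimal Python sketch below reproduces MA line 15 and the pooled values from the Table 1 numbers, and checks the "few hundred" expectation under ∼1.5 × 10⁻⁹. The function names `line_rate` and `pooled` are illustrative, not part of the original analysis.

```python
import math
from typing import List, Tuple

def line_rate(m: int, n_sites: float, generations: float) -> Tuple[float, float]:
    """Per-site, per-generation rate u = m / (nT) and its SE, sqrt(u / (nT))."""
    exposure = n_sites * generations
    u = m / exposure
    return u, math.sqrt(u / exposure)

def pooled(per_line: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean of per-line rates, and SE_pooled = (mean per-line SE) / sqrt(N)."""
    n = len(per_line)
    mean_rate = sum(u for u, _ in per_line) / n
    mean_se = sum(se for _, se in per_line) / n
    return mean_rate, mean_se / math.sqrt(n)

# MA line 15 (Table 1): 7 substitutions, 6.59e7 analyzed sites, 3.31e3 generations.
u15, se15 = line_rate(7, 6.59e7, 3.31e3)
print(f"line 15: {u15:.3g} (SE {se15:.3g})")  # line 15: 3.21e-11 (SE 1.21e-11)

# Per-line rates and SEs from Table 1, in units of 1e-11.
table1 = [(3.21, 1.21), (2.14, 0.96), (2.16, 0.97), (1.36, 0.96),
          (1.30, 0.75), (1.73, 0.87), (1.67, 0.96)]
rate, se = pooled(table1)
print(f"pooled: {rate:.2f} (SE {se:.2f}) x 1e-11")  # pooled: 1.94 (SE 0.36) x 1e-11

# Expected substitutions per line at ~1.5e-9 per site per generation,
# 63.2 Mb analyzed and ~3,300 cell divisions.
expected = 1.5e-9 * 63.2e6 * 3.3e3
print(round(expected))  # 313
```

Because the SE formula is Poisson-based, √(u/(nT)) equals √m/(nT), so lines with few mutations carry proportionally large SEs.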
At a level ∼75× lower than the expectation for eukaryotes with a similar genome size, and even 10× lower than most prokaryotic estimates, the P. tetraurelia mutation rate is by far the lowest ever observed. Our analysis also revealed a total of five insertions (but no deletions) across the seven lines, only one of which involved a simple sequence repeat (Table S1), yielding an insertion/deletion rate of 3.87 (SE = 2.10) × 10⁻¹² per site per cell division. The low proportion of all mutations that are insertion/deletion events (0.15) is consistent with observations in humans (0.06) (18) and Arabidopsis thaliana (0.10) (4). Using the annotation of the P. tetraurelia genome (19), we determined the functional context of each base substitution (Table 1). Twenty-three of the 29 substitutions (79.3%) were in coding regions, with the remaining six at intergenic sites, consistent with the random expectation based on overall genome composition (χ² test, P = 0.48, 2 df) (Table 2). If substitutions in coding regions are randomly distributed across P. tetraurelia codons (Table S2), 29.1% are expected to result in silent changes. The observed fraction (8 of 23 = 34.8%) does not differ significantly from this expectation (χ² test, P = 0.55, 1 df), suggesting that purifying selection on coding-region mutations was not a significant force in the MA lines.

Mutational Bias. There are 5.2× more G/C→A/T base substitutions than A/T→G/C substitutions in this species, and, taking into account the A/T content of the genome (71%), this implies that G/C→A/T base substitutions arise ∼12.9 times more frequently per target site than A/T→G/C base substitutions (Fig. 1). This mutational bias toward A/T, which is consistent with observations made in all other species to date (2–5, 20, 21), may be a consequence of the spontaneous deamination of cytosine and the oxidation of guanine to 8-oxo-guanine (22).
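Several numbers in these Results subsections can be checked with a few lines of arithmetic: the 1-df χ² test on silent changes, the per-site G/C→A/T bias, and, for the paragraph that follows, the equilibrium A/T content and a zero-count Poisson upper bound. The Python sketch below assumes a Pearson goodness-of-fit test (the text says only "χ² test") and uses the identity P(χ²₁ > x) = erfc(√(x/2)). With the rounded inputs quoted in the text (5.2× and 71%), the per-site bias comes out at ≈12.7 rather than ∼12.9, presumably because of rounding. The exposure passed to the Poisson bound is a hypothetical round number, not the site count used in the paper. All function names are illustrative.

```python
import math
from typing import List

def chi_square_gof(observed: List[int], expected_props: List[float]) -> float:
    """Pearson goodness-of-fit statistic against expected category proportions."""
    total = sum(observed)
    return sum((o - p * total) ** 2 / (p * total)
               for o, p in zip(observed, expected_props))

def p_value_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(stat / 2.0))

def per_site_bias(count_ratio: float, at_fraction: float) -> float:
    """GC->AT vs AT->GC rate per target site: count ratio x (#A/T sites / #G/C sites)."""
    return count_ratio * at_fraction / (1.0 - at_fraction)

def equilibrium_at(bias: float) -> float:
    """A/T content where GC->AT and AT->GC fluxes balance (AT/GC = bias)."""
    return bias / (1.0 + bias)

def zero_count_upper_bound(exposure: float, alpha: float = 0.05) -> float:
    """Largest Poisson rate u for which P(no events) = exp(-u * exposure) >= alpha."""
    return -math.log(alpha) / exposure

# Silent-site test: 8 silent vs 15 replacement changes; 29.1% expected silent.
stat = chi_square_gof([8, 15], [0.291, 0.709])
print(f"chi2 = {stat:.2f}, P = {p_value_1df(stat):.2f}")  # chi2 = 0.36, P = 0.55

# Mutational bias: 5.2x more GC->AT than AT->GC changes in a 71% A/T genome.
bias = per_site_bias(5.2, 0.71)
print(f"bias = {bias:.1f}, equilibrium A/T = {equilibrium_at(bias):.0%}")
# bias = 12.7, equilibrium A/T = 93%

# Hypothetical exposure of 1e12 site-generations with zero observed events.
print(f"{zero_count_upper_bound(1e12):.2e}")  # 3.00e-12
```

The equilibrium follows from balancing fluxes: at steady state u(G/C→A/T)·GC = u(A/T→G/C)·AT, so AT/GC equals the per-site bias. The zero-count bound is the familiar "rule of three" (−ln 0.05 ≈ 3 expected events).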
Over the course of the experiment, no A:T→G:C transitions were observed (Fig. 1). Assuming that mutations are Poisson distributed, the absence of any A:T→G:C base substitutions across the seven lines implies, with 95% confidence, that the rate of origin of such mutations is ≤2.51 × 10⁻¹² per site per cell division. If the P. tetraurelia genome had reached a nucleotide-composition equilibrium under mutation pressure alone, a genome composition of 93% A/T would be expected from the observed mutation spectrum. However, the A/T contents of both the entire genome (71%) and silent sites alone (76%) (Table S2) are substantially below this expectation, implying that the A/T mutation bias has historically been opposed by other forces. Although we cannot rule out the possibility that the P. tetraurelia genome is currently moving toward mutation equilibrium, its structure may facilitate G/C maintenance by biased gene conversion (23). The inverse correlation between chromosome size and G/C content in the P. tetraurelia genome (10) is consistent with an increased density of crossover events (and associated conversion events) in smaller chromosomes, especially given the G/C bias of gene conversion found in most organisms (24).

Mitochondrial Mutations. The same selective pressure that is stabilizing the mutation rate in the silent P. tetraurelia germ line should not apply to the mitochondrial genome in this species, as mitochondrial genes are expressed every generation, although mutations are expected to remain in a heteroplasmic state for a number of cell divisions after they first appear.
Using methods identical to those for detecting nuclear mutations, with an average of 39,422 analyzed sites per line, we discovered no fixed mitochondrial mutations across the seven lines, but we were able to estimate the base-substitution mutation rate from the mutations that attained significantly higher frequencies (>3 SDs) than expected from random sequencing error (Fig. S1). Summing the allele frequencies of all heteroplasmic MA-derived mitochondrial base substitutions, we obtain an estimate of 6.96 (SE = 2.44) × 10⁻⁸ base-substitution mutations per site per cell division (Table S3). This is high compared with most other eukaryotes (∼1 × 10⁻⁸) (2, 6, 25, 26) and ∼2,600× the P. tetraurelia nuclear rate, but consistent with the observation that Paramecium mitochondrial silent-site diversity estimates are among the highest ever reported (27).

Discussion

During periods of vegetative propagation in P. tetraurelia, the germ-line micronucleus is replicated but not expressed (10, 24), and hence germ-line mutation accumulation is unrestrained by natural selection until the mutations are expressed by the new macronucleus following an episode of sexual reproduction. If the P. tetraurelia micronucleus had a typical base-substitution mutation rate for its genome size [∼1.5 × 10⁻⁹ per site per generation with a genome size of 72 Mb (6)], each genome would be expected to accumulate ∼7.9 mutations over the 75 asexual generations, and the subsequent exposure following autogamy would likely impose a very high burden on a genome with 78% coding density (24). Thus, we propose that lengthy periods of germ-line sequestration in P. tetraurelia promote unusually strong selection for low germ-line mutation rates (on a per-cell-division basis).

Because the majority of mutations with fitness effects are slightly deleterious (1, 28), natural selection will generally promote mechanisms that minimize the production of replication errors and maximize the efficiency of repair of nonreplicating DNA (e.g., higher-fidelity polymerases and improved repair enzymes) (6, 11, 14). However, the point at which the advantage of a further reduction in the mutation rate equals the power of random genetic drift (resulting from finite population size) represents the lower bound to which the mutation rate can be driven by natural selection (15). Because selection can operate only on expressed mutations, the unit of selection on the mutation rate in a species with a sequestered germ line is the pool of deleterious mutations accumulated over all germ-line cell divisions, which generally have no phenotypic effects until they are expressed in the soma of the following generation. Thus, in a metazoan, selection operates to minimize the per-generation mutation rate, and this is accomplished by reducing the mutation rate per cell division by a factor equal to the number of germ-line cell divisions per generation. As a consequence, although mammals have the highest known per-generation mutation rates, their rates per germ-line cell division are very low (6). Not all mutations are deleterious, and the fraction of mutations that are beneficial may vary with environmental context (29–32). However, for sexually reproducing organisms like P. tetraurelia, the indirect selection for a higher mutation rate associated with beneficial mutations is expected to be small relative to the downward pressure associated with background deleterious mutations (14, 33).

Viewed in this way, the mutation rate per sexual episode in P. tetraurelia is equal to 75× the rate per germ-line cell division (75 × 1.94 × 10⁻¹¹ ≈ 1.46 × 10⁻⁹), which is essentially the same as the per-generation expectation for other eukaryotes with similar genome sizes, ∼1.5 × 10⁻⁹ per site (Fig. 2). Thus, the extraordinarily low mutation rate in this species appears to be a consequence of the high efficiency of selection operating on this globally abundant species combined with the added constraint of germ-line sequestration. The unusual level of mutation-rate depression observed in Paramecium is not expected in unicellular species that lack a protected germ line (e.g., bacteria and yeast), as such cells immediately experience the effects of mutations.

Table 1. P. tetraurelia summary statistics, base-substitution distribution, and mutation rate

MA line | Intergenic | Intron | Synonymous | Nonsynonymous | Ts/Tv ratio | Read depth | Analyzed sites (×10⁷) | Generations (×10³) | Mutation rate (×10⁻¹¹) | SE (×10⁻¹¹)
15 | 1 | 0 | 3 | 3 | 0.40 | 31.55 | 6.59 | 3.31 | 3.21 | 1.21
25 | 0 | 0 | 1 | 4 | 0.25 | 86.54 | 7.02 | 3.32 | 2.14 | 0.96
30 | 2 | 0 | 1 | 2 | 0.67 | 59.36 | 7.00 | 3.31 | 2.16 | 0.97
40 | 0 | 0 | 0 | 2 | – | 28.54 | 4.47 | 3.31 | 1.36 | 0.96
55 | 1 | 0 | 2 | 0 | 2.00 | 45.36 | 6.97 | 3.31 | 1.30 | 0.75
60 | 1 | 0 | 0 | 3 | 0.33 | 42.79 | 5.45 | 3.31 | 1.73 | 0.87
70 | 1 | 0 | 1 | 1 | – | 73.78 | 7.01 | 3.31 | 1.67 | 0.96
Pooled | 6 | 0 | 8 | 15 | 0.73 | 52.56 | 6.36 | 3.31 | 1.94 | 0.36

Base-substitution distribution, transition/transversion (Ts/Tv) ratios, number of analyzable sites per line, generations at time of sequencing, mutation rate per site per generation, and the SE of the base-substitution mutation rate per site per generation across seven P. tetraurelia MA lines. The Pooled row is the total sum for summary statistics and the average for mutation rates.

Fig. 1. Conditional base-substitution rates for P. tetraurelia MA lines. Rates for each base-substitution type normalized by genome base composition; error bars indicate one SE. Gray horizontal line indicates the average mutation rate per site across all lines, with gray shading showing the SE. [Figure: y axis, conditional base-substitution rate (×10⁻¹¹); categories, transitions (GC>AT, AT>GC) and transversions (GC>TA, GC>CG, AT>TA, AT>CG).]

Fig. 2. Base-substitution rate scaling for P. tetraurelia per sexual episode. Filled squares represent base-substitution mutation rates derived from human disease alleles (18) and MA projects (2–5). P. tetraurelia open circle represents the base-substitution mutation rate per cell division from this experiment, and the P. tetraurelia filled circle indicates the base-substitution mutation rate per sexual episode. Linear regression for filled icons only (r² = 0.85, P = 0.002). [Figure: base-substitution mutation rate (×10⁻⁹) plotted against genome size (Mb) for Mus musculus, Homo sapiens, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, and Paramecium tetraurelia (per cell division and per sexual episode).]

Table 2. P. tetraurelia mutation accumulation derived mutations

Line | Scaff | Position | Cons | Mut | Type
15 | 110 | 113028 | C | A | EX
15 | 125 | 27587 | A | T | EX
15 | 150 | 6745 | T | G | EX
15 | 184 | 1534 | A | T | EX
15 | 560 | 447 | G | A | EX
15 | 63 | 208 | A | T | IG
15 | 72 | 326857 | C | T | EX
25 | 11 | 241977 | C | G | EX
25 | 157 | 128026 | A | T | EX
25 | 32 | 259470 | C | G | EX
25 | 6 | 647359 | A | T | EX
25 | 89 | 192525 | C | T | EX
30 | 103 | 52586 | C | T | EX
30 | 145 | 29743 | G | T | IG
30 | 175 | 100327 | A | T | IG
30 | 65 | 330649 | C | A | EX
30 | 88 | 68666 | G | A | EX
40 | 1 | 547782 | T | G | EX
40 | 8 | 591378 | C | G | EX
55 | 515 | 1496 | G | A | EX
55 | 62 | 62583 | G | A | IG
55 | 73 | 359233 | T | A | EX
60 | 102 | 226059 | G | T | EX
60 | 16 | 307226 | G | A | EX
60 | 57 | 454181 | T | G | IG
60 | 162 | 73827 | C | A | EX
70 | 146 | 108424 | G | T | EX
70 | 60 | 187448 | A | T | EX
70 | 79 | 28592 | C | A | IG

Orig AA (exonic sites, in table order): H P F H I N P V L I I F L R L T P I L L P Q I S
New AA (exonic sites, in table order): H F K S I N F Q * I S L S S I L C P K F P N

P. tetraurelia mutations identified using the consensus approach. Cons, consensus nucleotide; Line, mutation accumulation line; Mut, mutation nucleotide; New AA, new amino acid; Orig AA, original amino acid; Scaff, scaffold; Type, coding context (EX, exon; IG, intergenic).

The high level of replication fidelity in P. tetraurelia may be achieved via modifications in replication enzymes, as alterations of critical domains in B-family replicative polymerases are known to have direct effects on mutation rates (34–36). The B-family replicative polymerases α, δ, ε, and ζ are the primary enzymes involved in initiating replication, synthesizing the leading and lagging strands, and allowing synthesis of DNA across lesions (35, 37, 38). In addition to synthesizing the leading and lagging strands, polymerases δ and ε also contain 3′→5′ proofreading exonuclease domains critical to the excision of incorrectly incorporated nucleotides (39). Using BLAST (Materials and Methods), we identified all P. tetraurelia B-family DNA polymerases (α, δ, ε, and ζ homologs) and compared these with those of other eukaryotes. Remarkably, ciliate-specific amino acid changes in the primary catalytic sites of α-region III and ζ-region III result in switches of amino acid polarity (T→V; T→V/I/M) (Table 3) in active sites that are otherwise highly conserved across eukaryotes. Moreover, the proofreading exonuclease of DNA polymerase ε exhibits ciliate-specific changes in the highly conserved exonuclease region II to highly charged amino acids (F→R/K) (Table 3). Although such changes in amino acid sequence need not translate into improved DNA fidelity and lower mutation rates, the unique sequences of ciliate DNA polymerases suggest a fundamental change in the mechanisms of replication fidelity in this lineage (Table 3). Nevertheless, it remains unclear why residues strongly conserved in other eukaryotic lineages, and presumably essential for high replication fidelity, would be specifically altered in ciliates to achieve even higher accuracy.
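The comparison described above amounts to scanning aligned residues for columns where the ciliate residues occur in no other species. A minimal Python sketch follows, assuming that definition of "ciliate-specific" (ciliate residues disjoint from all non-ciliate residues in the column). The residue strings are transcribed from Table 3 (23 α columns and 25 ε columns, species in the table's row order); `ciliate_specific_columns` is a hypothetical helper, not the authors' pipeline.

```python
from typing import Dict, List, Set

CILIATES: Set[str] = {"T. thermophila", "P. tetraurelia"}

# Residues from Table 3, one aligned string per species.
ALPHA: Dict[str, str] = {
    "H. sapiens": "KLTANSMYGYGCIYGDTDDTDSI",
    "M. musculus": "KLTANSMYGYGCIYGDTDDTDSI",
    "D. melanogaster": "KLTANSMYGYGCVYGDTDDTDSL",
    "C. elegans": "KLTANSMYGYGCVYGDTDDTDSI",
    "S. cerevisiae": "KLTANSMYGYGCVYGDTDDTDSV",
    "A. thaliana": "KLTANSMYGYGCIYGDTDDTDSI",
    "T. pseudonana": "KLTANSMYGYGCIYGDTDDTDSI",
    "T. cruzi": "KLTANSMYGYGCIYGDTDDTDSV",
    "T. thermophila": "KLVANSMYGYGCIYGDTDDTDSL",
    "P. tetraurelia": "KLVANSIYGYGCIYGDTDDTDSM",
    "T. gondii": "KLTANSIYGYGCVYGDTDDTDSI",
    "G. lamblia": "KLCANAMYGYGSIYGDTDDTDSI",
}
EPSILON: Dict[str, str] = {
    "H. sapiens": "KCILNSFYLELGINGDFFDYSVSDA",
    "M. musculus": "KCILNSFYLELGINGDFFDYSVSDA",
    "D. melanogaster": "KCILNSFYLELGINGDFFDYSVSDA",
    "C. elegans": "KCILNSFYLELGINGDFFDYSVSDA",
    "S. cerevisiae": "KVILNSFYLELGINGDFFDYSVSDA",
    "A. thaliana": "KCILNSFYLELGINGDFFDYSVSDA",
    "T. pseudonana": "KCILNSFYLELGINGDFFDYSVSDA",
    "T. cruzi": "KCILNSFYLELGINGDYFDYSVSDA",
    "T. thermophila": "KIILNSFYLELGINGDKFDYSVSDA",
    "P. tetraurelia": "KIILNSFYLELGINGDRFDYSISDS",
    "T. gondii": "KCILNSFYMELGINGDTFDYSVSDA",
    "G. lamblia": "KCILNSFYLELGINGDFFDYSVSDA",
}

def ciliate_specific_columns(rows: Dict[str, str],
                             ciliates: Set[str] = CILIATES) -> List[int]:
    """0-based columns in which no ciliate residue occurs in any non-ciliate species."""
    length = len(next(iter(rows.values())))
    if any(len(seq) != length for seq in rows.values()):
        raise ValueError("rows must be aligned to the same length")
    hits = []
    for i in range(length):
        ciliate_res = {rows[s][i] for s in ciliates}
        other_res = {seq[i] for s, seq in rows.items() if s not in ciliates}
        if ciliate_res.isdisjoint(other_res):
            hits.append(i)
    return hits

print(ciliate_specific_columns(ALPHA))    # [2]      -> T->V column
print(ciliate_specific_columns(EPSILON))  # [1, 16]  -> C->I and F->K/R columns
```

Under this definition, P. tetraurelia-only differences (for example, I or S where T. thermophila keeps the common residue) are not flagged, because the T. thermophila residue is shared with other species.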
Despite the global distribution and high local abundances of the study species, conservative estimates of standing levels of silent-site heterozygosity within species of the P. aurelia complex (πₛ = 0.0096) are not extraordinarily high (27, 40), and for other ciliates they may be even lower [e.g., Tetrahymena thermophila, πₛ = 0.0030 (41)], an observation that led the latter authors to suggest that effective population sizes (Ne) may be low in ciliates (41). A re-evaluation of this interpretation appears to be in order, as it was derived under the assumption of a substantially higher mutation rate than that observed here. Although low per-generation mutation rates may not be common to all ciliates, if standing levels of silent-site diversity reflect the neutral expectation (4Neu), factoring out the term 4u gives Ne on the order of 10⁸ for P. tetraurelia [0.0096/(4 × 1.94 × 10⁻¹¹) ≈ 1.2 × 10⁸] and perhaps 10⁷ for T. thermophila. Because there is substantial evidence of selection on silent sites (42), especially in species with an elevated Ne, these estimates are likely to be downwardly biased. Finally, it is worth noting that, like P. tetraurelia, the distantly related ciliate T. thermophila shares strikingly similar modifications to active sites in B-family DNA polymerases (Table 3),

Table 3. Differences in α and ε B-family DNA polymerases of P. tetraurelia
Polymerase catalytic sites (Region III, Region I); proofreading 3′→5′ exonuclease (Exo II, Exo III)

α (AA positions 839, 783, 825, 885)
H. sapiens        KLTANSMY GY GCIYGDTDD TD SI
M. musculus       KLTANSMY GY GCIYGDTDD TD SI
D. melanogaster   KLTANSMY GY GCVYGDTDD TD SL
C. elegans        KLTANSMY GY GCVYGDTDD TD SI
S. cerevisiae     KLTANSMY GY GCVYGDTDD TD SV
A. thaliana       KLTANSMY GY GCIYGDTDD TD SI
T. pseudonana     KLTANSMY GY GCIYGDTDD TD SI
T. cruzi          KLTANSMY GY GCIYGDTDD TD SV
T. thermophila    KLVANSMY GY GCIYGDTDD TD SL
P. tetraurelia    KLVANSIY GY GCIYGDTDD TD SM
T. gondii         KLTANSIY GY GCVYGDTDD TD SI
G. lamblia        KLCANAMY GY GSIYGDTDD TD SI

ε (AA positions 314, 410)
H. sapiens        KCILNSFYLELGIN GDFFDY SVSDA
M. musculus       KCILNSFYLELGIN GDFFDY SVSDA
D. melanogaster   KCILNSFYLELGIN GDFFDY SVSDA
C. elegans        KCILNSFYLELGIN GDFFDY SVSDA
S. cerevisiae     KVILNSFYLELGIN GDFFDY SVSDA
A. thaliana       KCILNSFYLELGIN GDFFDY SVSDA
T. pseudonana     KCILNSFYLELGIN GDFFDY SVSDA
T. cruzi          KCILNSFYLELGIN GDYFDY SVSDA
T. thermophila    KIILNSFYLELGIN GDKFDY SVSDA
P. tetraurelia    KIILNSFYLELGIN GDRFDY SISDS
T. gondii         KCILNSFYMELGIN GDTFDY SVSDA
G. lamblia        KCILNSFYLELGIN GDFFDY SVSDA

Subset of conserved residues from DNA polymerases α and ε (46); α exonucleases are not involved in proofreading. AA change: “*” designates a ciliate-specific amino acid change in polarity; “+” designates a ciliate-specific amino acid change in charge. AA position indicates amino acid position in each gene. Species and gene identifier number (α/ε) or genome flat file (GFF) number, from top to bottom: Homo sapiens (106507301/3192938), Mus musculus (6679409/195947387), Drosophila melanogaster (217344/23172053), Caenorhabditis elegans (257146921/17507143), Arabidopsis thaliana (332010917/3885342), Saccharomyces cerevisiae (929851/171409), Trypanosoma cruzi (71661998/70882804), Thalassiosira pseudonana (220967910/220976143), Tetrahymena thermophila (47.m00199/3812.m02363, GFF), Paramecium tetraurelia (GSPATP00014027001/GSPATP00033819001, GFF), Toxoplasma gondii (4808579/237841111), and Giardia lamblia (159108384/159115199). Dark gray boxes indicate residues that are specific to P. tetraurelia and T. thermophila.

Materials and Methods

Mutation Accumulation Process and DNA Extraction. For ∼3,300 generations, 100 independent P. tetraurelia MA lines (reference strain d4-2) were passed through single-cell bottlenecks every day to a new tube with fresh wheat grass medium (yeast extract/cerophyl/Na₂HPO₄/stigmasterol/H₂O) seeded with Klebsiella pneumoniae. Self-fertilization (autogamy) is stimulated by starvation, which was prevented by the daily single-cell transfers to fresh medium. In the absence of autogamy, Paramecium cells senesce. Therefore, every 20–25 d (∼75 generations), cells from each P. tetraurelia MA line were transferred into a single well. After approximately 3 d in the well, the cells reached saturation and underwent autogamy. Autogamy was detected with an aceto-carmine nuclear stain that reveals fragmentation of the old macronucleus. Single cells were picked when the majority of cells in a well were undergoing autogamy, and they and their descendants were then propagated using daily single-cell transfers until the next scheduled autogamy.
Because macronuclear fragments were retained for several generations in the progeny, we were able to ensure that the MA lines went through autogamy by continually monitoring descendants of each line. Over the course of the 4-y experiment, 52 lines died out. It is impossible to say in each case whether this was a result of mutational processes or of handling. Attempts were made in all cases to revive these lines from an earlier generation, but they failed for all 52 lines; Paramecium cells are difficult to cryopreserve, which limited how far back we could go to revive lines. Eight of the 48 remaining lines were randomly selected for DNA extraction. Cells of each selected line were filtered using 10-μm Nitex filter cloth and washed several times in Dryl's buffer (sodium citrate/NaH₂PO₄/Na₂HPO₄/CaCl₂). Homogenization and lysis were done using 0.5% Nonidet P-40 suspended in MgCl₂/Tris-Cl/sucrose, which leaves the macronuclei intact. After centrifugation of the lysate, the supernatant containing most of the bacteria, mitochondria, and micronuclei was discarded. DNA was extracted from the pellet, which contained relatively pure macronuclei, using CTAB buffer, and was further purified to Illumina library standards by phenol–chloroform extraction and ethanol precipitation. The P. tetraurelia lines were sequenced on the Illumina GAIIx platform, and the reads were mapped to the reference genome (SI Materials and Methods).

Mutation Identification Procedure. To identify putative mutations (putations), a consensus approach was used, comparing each individual line (focal line) against the consensus of all of the remaining lines. This method has previously been used in multiple MA experiments and is robust to sequencing or alignment errors that may be present in the reference genome (2–4). With the consensus approach, we identified a total of 111 putations before the filtering process.
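The focal-line-versus-consensus comparison can be illustrated at the level of a single site. The Python sketch below is a simplified, hypothetical version: it works on one base call per line ("N" = no call) and assumes that the consensus must be unanimous among at least three called lines, loosely mirroring the requirement that the wild-type base be confirmed in at least three other lines. The actual procedure also involves read-level coverage and quality data and the paralogy and sequencing-error filters described below; the line labels in the example are illustrative.

```python
from typing import Dict, List, Optional

def consensus(calls: List[str], min_lines: int = 3) -> Optional[str]:
    """Base shared by all called lines ('N' = no call), or None if they
    disagree or fewer than min_lines lines have a call."""
    called = [b for b in calls if b != "N"]
    if len(called) >= min_lines and len(set(called)) == 1:
        return called[0]
    return None

def putations_at_site(site_calls: Dict[str, str], min_lines: int = 3) -> List[str]:
    """Focal lines whose base call differs from the consensus of all remaining lines."""
    hits = []
    for focal, base in site_calls.items():
        if base == "N":
            continue  # no call in the focal line
        others = [b for line, b in site_calls.items() if line != focal]
        cons = consensus(others, min_lines)
        if cons is not None and base != cons:
            hits.append(focal)
    return hits

# Hypothetical calls at one site across seven lines.
site = {"15": "C", "25": "C", "30": "C", "40": "A", "55": "C", "60": "N", "70": "C"}
print(putations_at_site(site))  # ['40']
```

Comparing against the other MA lines, rather than the reference genome, means that errors in the reference itself are shared by every line and therefore never surface as putations.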
After filtering for paralogy and sequencing errors (SI Materials and Methods and Tables S4–S7), we reduced the number of putations to 29. The 29 remaining mutations (Table 2) are ∼26.8% of the original putations (29 of 108), which is comparable to the final ratio of mutations to putations observed after filtering in other MA experiments [e.g., the A. thaliana MA experiment, 99 of 538, or ∼18.4% after filtering (4)]. A subset of randomly selected filtered putations was directly sequenced in both directions using standard fluorescent sequencing technology on an ABI3730. Across the seven P. tetraurelia MA lines, 12 of 12 randomly selected putations that passed filtering were verified as MA-derived mutations, and 10 of 10 randomly selected putations that were removed by filtering were verified as false positives using traditional sequencing. In all cases, the wild-type nucleotide was also confirmed at the mutation site in at least 3 other lines without the mutation.

Mutation Rate Calculations. To calculate the base-substitution mutation rate per site per cell division for each line, we used the equation

$$u_{bs} = \frac{m}{nT},$$

where u_bs is the base-substitution mutation rate (per haploid nucleotide site per generation), m is the number of observed base substitutions, n is the number of haploid nucleotide sites analyzed, and T is the number of generations that occurred in the mutation-accumulation line. The SE for an individual line is calculated as

$$SE_x = \sqrt{\frac{u_{bs}}{nT}},$$

and the total SE of the base-substitution mutation rate is given by

$$SE_{pooled} = \frac{s}{\sqrt{N}},$$

where s is the average SE of the mutation rates across all lines and N is the number of lines analyzed. The same calculation was used to calculate the insertion/deletion (indel) mutation rate, with u_bs replaced by u_indel. The maximum-likelihood estimates of the base-substitution mutation rate (u) and error rate (ε) across all lines were obtained from the maximum of the log-likelihood function (17), as previously described (4). The initial values of u (1.94 × 10⁻¹¹ per site per generation) and ε (∼0.005 per site) used to initialize the optimization were those obtained by the consensus method. Minimum and maximum coverage cutoffs of 10 and 100, respectively, were used to obtain the final maximum-likelihood estimate.

Mitochondrial Analysis. To determine mitochondrial base-substitution rates, it is first necessary to set a minimum allele-frequency cutoff for calling a heteroplasmic mitochondrial base substitution. Prior studies involving Illumina reads have shown low false-positive rates in calling heteroplasmic mitochondrial base substitutions when requiring a minimum of three forward and three reverse reads and an allele frequency greater than 0.10 (43). We applied the same criteria across an average of 39,422 mitochondrial sites in each P. tetraurelia MA line. The MA process does not enforce a complete organelle bottleneck, so, unlike in the nuclear genome, selection may still operate at low efficiency, and mutations will remain in a heteroplasmic state. Estimating mitochondrial base-substitution rates therefore requires assuming that the ultimate fixation of a mitochondrial base substitution is determined solely by genetic drift. Under this assumption, neutral theory dictates that the fixation rate is equal to the mutation rate (44), and the probability of fixation of mutation i (d_i) is given by its current frequency within an individual (26). Because sequencing error can contribute to allele frequencies at a site, we subtracted the MA-specific error rate for the reference nucleotide type (nuc_err; Fig. S2) at each heteroplasmic site. The final estimate of the mitochondrial base-substitution rate for each line is calculated as

$$u_{bs} = \frac{\sum_i \left(d_i - nuc_{err}\right)}{nT}.$$

B-Family DNA Polymerases. The B-family DNA polymerases α, δ, ε, and ζ of 12 eukaryotic species were obtained from the National Center for Biotechnology Information (NCBI) data repository (Table 3). The sequence of each DNA polymerase was used to search for homologs in the gene databases for P. tetraurelia (19, 24) and T. thermophila (45) using BLASTP (e-value cutoff of 10⁻¹⁰ to identify homology). To detect differences in each polymerase, the resulting BLAST matches were aligned with the original DNA polymerases from NCBI.

ACKNOWLEDGMENTS. Support for this study was provided by National Institutes of Health Grant R01 GM036827 (to M.L. and W.K.T.); National Science Foundation Grant MCB-1050161 (to M.L.); and US Department of Defense Multidisciplinary University Research Initiative Award W911NF-09-1-0444 (to M.L., P. Foster, H. Tang, and S. Finkel).

1. Baer CF, Miyamoto MM, Denver DR (2007) Mutation rate variation in multicellular eukaryotes: Causes and consequences. Nat Rev Genet 8(8):619–631.
2. Lynch M, et al. (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci USA 105(27):9272–9277.
3. Denver DR, et al. (2009) A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc Natl Acad Sci USA 106(38):16310–16314.
4. Ossowski S, et al. (2010) The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327(5961):92–94.
5. Keightley PD, et al. (2009) Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res 19(7):1195–1201.
6. Lynch M (2010) Evolution of the mutation rate. Trends Genet 26(8):345–352.
7. Lynch M, Conery JS (2003) The origins of genome complexity. Science 302(5649):1401–1404.
8. Sung W, Ackerman MS, Miller SF, Doak TG, Lynch M (2012) The drift-barrier hypothesis and mutation-rate evolution. Proc Natl Acad Sci USA.
9. Berger JD (1986) Autogamy in Paramecium.
Cell cycle stage-specific commitment to meiosis. Exp Cell Res 166(2):475–485.
10. Duret L, et al. (2008) Analysis of sequence variability in the macronuclear DNA of Paramecium tetraurelia: A somatic view of the germline. Genome Res 18(4):585–596.
11. Kimura M (1967) On the evolutionary adjustment of spontaneous mutation rates. Genet Res 9(1):23–24.
12. Dawson KJ (1999) The dynamics of infinitesimally rare alleles, applied to the evolution of mutation rates and the expression of deleterious mutations. Theor Popul Biol 55(1):1–22.
13. Johnson T (1999) The approach to mutation-selection balance in an infinite asexual population, and the evolution of mutation rates. Proc Biol Sci 266(1436):2389–2397.

PNAS | November 20, 2012 | vol. 109 | no. 47 | 19343 EVOLUTION

and also undergoes multiple rounds of vegetative reproduction before a sexual cycle. Although further biochemical assays are needed to confirm whether modifications in B-family polymerases are truly responsible for the reduction in mutation rate, the nuclear dimorphism unique to ciliated protozoa suggests that this lineage is a logical target in the search for commercially useful, high-fidelity DNA polymerases.

14. Sniegowski PD, Gerrish PJ, Johnson T, Shaver A (2000) The evolution of mutation rates: Separating causes from consequences. Bioessays 22(12):1057–1066.
15. Lynch M (2011) The lower bound to the evolution of mutation rates. Genome Biol Evol 3:1107–1118.
16. Lynch M (2006) The origins of eukaryotic gene structure. Mol Biol Evol 23(2):450–468.
17. Lynch M (2008) Estimation of nucleotide diversity, disequilibrium coefficients, and mutation rates from high-coverage genome-sequencing projects. Mol Biol Evol 25(11):2409–2419.
18. Lynch M (2010) Rate, molecular spectrum, and consequences of human mutation. Proc Natl Acad Sci USA 107(3):961–968.
19. Arnaiz O, Sperling L (2011) ParameciumDB in 2011: New tools and new data for functional and comparative genomics of the model ciliate Paramecium tetraurelia. Nucleic Acids Res 39(Database issue):D632–D636.
20. Hershberg R, Petrov DA (2010) Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet 6(9):e1001115.
21. Hildebrand F, Meyer A, Eyre-Walker A (2010) Evidence of selection upon genomic GC-content in bacteria. PLoS Genet 6(9):e1001107.
22. Duncan BK, Miller JH (1980) Mutagenic deamination of cytosine residues in DNA. Nature 287(5782):560–561.
23. Duret L, Galtier N (2009) Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet 10:285–311.
24. Aury JM, et al. (2006) Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444(7116):171–178.
25. Denver DR, Morris K, Lynch M, Vassilieva LL, Thomas WK (2000) High direct estimate of the mutation rate in the mitochondrial genome of Caenorhabditis elegans. Science 289(5488):2342–2344.
26. Haag-Liautard C, et al. (2008) Direct estimation of the mitochondrial DNA mutation rate in Drosophila melanogaster. PLoS Biol 6(8):e204.
27. Catania F, Wurmser F, Potekhin AA, Przybos E, Lynch M (2009) Genetic diversity in the Paramecium aurelia species complex. Mol Biol Evol 26(2):421–431.
28. Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8(8):610–618.
29. Rutter MT, et al. (2012) Fitness of Arabidopsis thaliana mutation accumulation lines whose spontaneous mutations are known. Evolution 66(7):2335–2339.
30. Hall DW, Joseph SB (2010) A high frequency of beneficial mutations across multiple fitness components in Saccharomyces cerevisiae. Genetics 185(4):1397–1409.
31. Hall DW, Mahmoudizad R, Hurd AW, Joseph SB (2008) Spontaneous mutations in diploid Saccharomyces cerevisiae: Another thousand cell generations. Genet Res (Camb) 90(3):229–241.
32. Joseph SB, Hall DW (2004) Spontaneous mutations in diploid Saccharomyces cerevisiae: More beneficial than expected. Genetics 168(4):1817–1825.
33. Johnson T (1999) Beneficial mutations, hitchhiking and the evolution of mutation rates in sexual populations. Genetics 151(4):1621–1631.
34. Pavlov YI, Shcherbakova PV, Kunkel TA (2001) In vivo consequences of putative active site mutations in yeast DNA polymerases alpha, epsilon, delta, and zeta. Genetics 159(1):47–64.
35. Kunkel TA (2009) Evolving views of DNA replication (in)fidelity. Cold Spring Harb Symp Quant Biol 74:91–101.
36. Loh E, Salk JJ, Loeb LA (2010) Optimization of DNA polymerase mutation rates during bacterial evolution. Proc Natl Acad Sci USA 107(3):1154–1159.
37. Hübscher U, Maga G, Spadari S (2002) Eukaryotic DNA polymerases. Annu Rev Biochem 71:133–163.
38. Wang TS (1991) Eukaryotic DNA polymerases. Annu Rev Biochem 60:513–552.
39. Shevelev IV, Hübscher U (2002) The 3′→5′ exonucleases. Nat Rev Mol Cell Biol 3(5):364–376.
40. Snoke MS, Berendonk TU, Barth D, Lynch M (2006) Large global effective population sizes in Paramecium. Mol Biol Evol 23(12):2474–2479.
41. Katz LA, Snoeyenbos-West O, Doerder FP (2006) Patterns of protein evolution in Tetrahymena thermophila: Implications for estimates of effective population size. Mol Biol Evol 23(3):608–614.
42. Lynch M (2007) The Origins of Genome Architecture (Sinauer Associates, Sunderland, MA).
43. Li M, et al. (2010) Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am J Hum Genet 87(2):237–249.
44. Kimura M (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ Press, Cambridge, UK).
45. Eisen JA, et al. (2006) Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol 4(9):e286.
46. Pavlov YI, Shcherbakova PV, Rogozin IB (2006) Roles of DNA polymerases in replication, repair, and recombination in eukaryotes. Int Rev Cytol 255:41–132.