Extraordinary genome stability in the ciliate Paramecium tetraurelia

Way Sung^{a,1}, Abraham E. Tucker^a, Thomas G. Doak^a, Eunjin Choi^a, W. Kelley Thomas^b, and Michael Lynch^a
^a Department of Biology, Indiana University, Bloomington, IN 47405; and ^b Department of Molecular Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824
Mutation plays a central role in all evolutionary processes and is also
the basis of genetic disorders. Established base-substitution mutation rates in eukaryotes range between ∼5 × 10⁻¹⁰ and 5 × 10⁻⁸ per
site per generation, but here we report a genome-wide estimate for
Paramecium tetraurelia that is more than an order of magnitude
lower than any previous eukaryotic estimate. Nevertheless, when
the mutation rate per cell division is extrapolated to the length of
the sexual cycle for this protist, the measure obtained is comparable
to that for multicellular species with similar genome sizes. Because
Paramecium has a transcriptionally silent germ-line nucleus, these
results are consistent with the hypothesis that natural selection
operates on the cumulative germ-line replication fidelity per episode of somatic gene expression, with the germ-line mutation rate
per cell division evolving downward to the lower barrier imposed
by random genetic drift. We observe ciliate-specific modifications of
widely conserved amino acid sites in DNA polymerases as one potential explanation for unusually high levels of replication fidelity.
mutation accumulation | drift-barrier
Mutation is the ultimate source of genetic variation, which
not only drives adaptive processes but also contributes to
genetic disorders, and in some cases, extinction. Understanding
the mutation rate is critical in determining rates of molecular
evolution, estimating effective population sizes, understanding
the impact of mutations on organismal fitness, and evaluating the
power of drift, selection, and recombination in shaping genomes
(1). However, because of the extraordinarily high degree of
replication fidelity in most organisms, procuring accurate measures of mutation rate has been fraught with difficulties. The recent application of high-throughput sequencing technology to
mutation-accumulation (MA) lines has now revolutionized this
area of inquiry, yielding essentially unbiased estimates of the
genome-wide spontaneous mutation rate and spectrum in several
eukaryotes (2–5). Under the MA process, repeated population
bottlenecks minimize the efficiency of selection, enabling even
highly deleterious mutations to accumulate in an effectively
neutral fashion. At various points in the MA process, the accumulated pool of mutations in the individual lines can be assayed
using high-throughput sequencing. So far, MA experiments in
eukaryotes have shown that per-generation mutation rates generally correlate with genome size and effective population size
(6, 7). However, the available data involving whole-genome sequencing of MA experiments include only a limited number of
unicellular eukaryotes (2, 8), and the factors driving mutation-rate evolution across the eukaryotic domain remain unclear.
To further understand mutation-rate evolution in eukaryotes,
we have applied the MA process to the ubiquitous unicellular
freshwater ciliate Paramecium tetraurelia. This species, and ciliates in general, have a distinct reproductive biology that may
constrain mutation-rate evolution. P. tetraurelia exhibits nuclear
dimorphism, harboring a transcriptionally silent (germ-line) micronucleus and a transcriptionally active (somatic) macronucleus. The species normally reproduces by mitotic binary fission,
but after ∼75 cell divisions under high-growth conditions (data
www.pnas.org/cgi/doi/10.1073/pnas.1210663109
from this experiment), or ∼30–50 fissions when starved (9), P.
tetraurelia undergoes a self-fertilization process known as autogamy (9), at which time the old macronucleus is destroyed and
replaced by a processed version of the new micronuclear genome
(10). When in contact with compatible mating types, P. tetraurelia
can also undergo conjugation (10), although this can be prevented in the laboratory by using a stock consisting of only one
mating type. Under conditions of exclusive autogamy, all mutations arising in the micronucleus during clonal propagation
should accumulate in a completely neutral fashion, with the fitness effects only being realized after sexual reproduction.
The mutations that accumulate in the micronucleus during
P. tetraurelia clonal division are analogous to mutations that
accumulate during germ-line cell divisions of metazoans, as the
effects of both types of mutations are not realized until expression after the subsequent sexual generation. Although metazoans
have some of the highest per-generation mutation rates known,
the mutation rate per germ-line cell division remains quite low in
such species (6). This observation suggests that although selection operates to drive down mutation rates (6, 11–14), because it
can only operate on the visible effects of mutations, the efficiency
of selection to reduce the germ-line mutation rate is constrained
by the amount of time during which mutations experience quiescent germ-line sequestration (15). If this hypothesis is correct,
the mutation rate per cell division in P. tetraurelia (and ciliates in
general) is expected to be lower than that in other unicellular
species. Thus, application of the MA procedure to P. tetraurelia
provides a unique opportunity to shed light on the mechanisms
driving the evolution of the mutation rate.
Results
Mutation Rate. To estimate the mutation rate in P. tetraurelia, we
analyzed the macronuclear genomes of seven P. tetraurelia MA
lines that had been propagated through single-cell bottlenecks for
∼3,300 cell divisions starting from an isogenic state and subsequently restricted to periodic autogamy. At the time of the final
assay, essentially all detected mutations will have resided in the
germ line at the last autogamy, and hence will reflect true germline mutations. A previous estimate of the scaling of the mutation
rate and genome size predicts a base-substitution rate of ∼1.5 ×
10⁻⁹ per site per generation given the ∼72-Mb genome size of
P. tetraurelia (6, 7, 16), which would have generated at least a few
hundred base substitutions in each MA line. However, across all
seven lines (an average of 63.2 Mb analyzable sequence per line),
we identified only 29 mutations, yielding an overall base-substitution mutation rate of 1.94 (SE = 0.36) × 10⁻¹¹ per site per
cell division (Table 1). A separate maximum-likelihood approach
(17) yielded a slightly higher rate of 2.64 (SE = 0.22) × 10⁻¹¹ per
site per cell division. At a level ∼75× lower than the expectation
for eukaryotes with a similar genome size, and even 10× lower
than most prokaryotic estimates, the P. tetraurelia mutation rate is
by far the lowest ever observed.

Author contributions: W.S., W.K.T., and M.L. designed research; W.S., A.T., T.G.D., and E.C. performed research; W.S. analyzed data; and W.S., W.K.T., and M.L. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information Short Read Archive (SRA), www.ncbi.nlm.nih.gov/sra, [accession nos. SRP013857 (study), SRS346546 (sample), and SRX155532 (experiment)].
1 To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1210663109/-/DCSupplemental.
PNAS | November 20, 2012 | vol. 109 | no. 47 | 19339–19344
EVOLUTION
Edited by Detlef Weigel, Max Planck Institute for Developmental Biology, Tübingen, Germany, and approved October 10, 2012 (received for review June 21, 2012)

Our analysis also revealed a total of five insertions (but no
deletions) across the seven lines, with only one involving a simple
sequence repeat (Table S1), yielding an insertion/deletion rate of
3.87 (2.10) × 10⁻¹² per site per cell division. The low proportion
of all mutations that are insertion/deletion events (0.15) is consistent with observations in humans (0.06%) (18) and Arabidopsis
thaliana (0.10) (4).
Using the annotation of the P. tetraurelia genome (19), we
attempted to identify the functional context of each base substitution (Table 1). Twenty-three of the 29 substitutions (79.3%)
were in coding regions, with the remaining six found in intergenic
sites, consistent with the random expectation based upon overall
genome composition (χ² test, P = 0.48, 2 df) (Table 2). If the
substitution mutations in coding regions are randomly distributed across P. tetraurelia codons (Table S2), 29.1% are expected
to result in silent changes. The observed ratio (8 of 23 = 34.8%)
does not differ significantly from this expectation (χ² test, P =
0.55, 1 df), suggesting that purifying selection on coding-region
mutations was not a significant force in the MA lines.
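As a rough check on the silent-site comparison above, the following sketch recomputes a 1-df χ² goodness-of-fit test from the figures quoted in the text (8 silent changes among 23 coding substitutions, against an expected silent fraction of 29.1%). The function name and the use of the complementary error function for the 1-df tail probability are illustrative choices, not the authors' pipeline.

```python
import math

def chi2_gof_1df(observed_a, total, expected_frac_a):
    """Chi-square goodness-of-fit test for two categories (1 df).

    Returns (chi2 statistic, p-value). For 1 df, the upper tail of the
    chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    expected_a = expected_frac_a * total
    expected_b = total - expected_a
    observed_b = total - observed_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Values taken from the text: 8 of 23 coding substitutions are silent,
# and 29.1% are expected to be silent under random placement.
chi2, p = chi2_gof_1df(observed_a=8, total=23, expected_frac_a=0.291)
print(f"chi2 = {chi2:.2f}, P = {p:.2f}")
```

Under these inputs the statistic is small and the P value lands close to the reported 0.55.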
Mutational Bias. There are 5.2× more G/C→A/T base substitutions than A/T→G/C mutations in this species, and taking
into consideration the A/T genome composition (71%), this
implies that G/C→A/T base substitutions arise ∼12.9-times more
frequently per target site than do A/T→G/C base substitutions
(Fig. 1). This mutation bias toward A/T, which is consistent with
observations made in all other species to date (2–5, 20, 21), may
be a consequence of the spontaneous deamination of cytosine
and the conversion of guanine to 8-oxo-guanine (22). Over the
course of the experiment, zero A:T→G:C transitions were observed (Fig. 1). If we assume that mutations are Poisson distributed, there is a >0.95 probability of not seeing any A:T→G:C
base-substitution mutations across seven lines if the rate of origin
of such mutations is ≤2.51 × 10⁻¹².
If the P. tetraurelia genome has reached a nucleotide-content
equilibrium from mutation pressure alone, a genome composition
of 93% A/T would be expected based on the observed mutation
spectrum. However, the A/T contents for both the entire genome
(71%) and for silent sites alone (76%) (Table S2) are substantially below this expectation, implying that the A/T mutation
bias has historically been opposed by other forces. Although we
cannot rule out the possibility that the P. tetraurelia genome is
currently moving toward mutation equilibrium, the structure of
the P. tetraurelia genome may facilitate G/C maintenance by biased gene conversion (23). The inverse correlation between
chromosome size and G/C content in the P. tetraurelia genome
(10) is consistent with an increased density of crossover events
(and associated conversion events) in smaller chromosomes, especially when one considers the G/C bias of gene conversion
found in most organisms (24).
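The relationship between the raw substitution counts, the per-target-site bias, and the mutation-equilibrium A/T content described above can be sketched as follows. This is a minimal illustration that assumes a simple two-state model (A/T vs. G/C sites) in which equilibrium A/T content is u/(u + v), with u the G/C→A/T rate and v the A/T→G/C rate; the function names are hypothetical. With the rounded inputs quoted in the text (5.2× and 71% A/T) the per-site bias comes out near 12.7 rather than the reported ∼12.9, presumably because unrounded values were used in the original calculation.

```python
def per_site_bias(count_ratio, at_fraction):
    """Convert a raw ratio of G/C->A/T to A/T->G/C counts into a per-target-site ratio.

    G/C->A/T mutations arise at G/C sites and A/T->G/C mutations at A/T sites,
    so the raw ratio is rescaled by the relative abundance of the two site types.
    """
    gc_fraction = 1.0 - at_fraction
    return count_ratio * at_fraction / gc_fraction

def equilibrium_at(bias):
    """Equilibrium A/T fraction under mutation pressure alone: u / (u + v), with u / v = bias."""
    return bias / (1.0 + bias)

bias = per_site_bias(count_ratio=5.2, at_fraction=0.71)
at_eq = equilibrium_at(bias)
print(f"per-site bias = {bias:.1f}, equilibrium A/T = {at_eq:.2f}")
```

The equilibrium value is insensitive to the small difference in bias: both 12.7 and 12.9 give roughly 93% A/T.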
Mitochondrial Mutations. The same selective pressure that is stabilizing the mutation rate in the silent P. tetraurelia germ line
should not apply to the mitochondrial genome in this species, as
mitochondrial genes are expressed every generation, although
mutations are expected to be found in a heteroplasmic state for
a number of cell divisions after first appearance. Using methods
identical to those for detecting nuclear mutations, with an average of 39,422 analyzed sites per line, we discovered no fixed
mitochondrial mutations across the seven lines, but were able to
estimate the base-substitution mutation rate by using the fraction
of mutations attaining significantly higher frequencies (>3 SDs)
than expected from random sequencing error (Fig. S1). Summing the allele frequencies for all heteroplasmic MA-derived
mitochondrial base substitutions, we obtain an estimate of 6.96
(2.44) × 10⁻⁸ base-substitution mutations per site per cell division (Table S3), which is quite high compared with most other
eukaryotes (∼1 × 10⁻⁸) (2, 6, 25, 26) and ∼2,600× the
P. tetraurelia nuclear rate, but consistent with the observation
that Paramecium mitochondrial silent-site diversity estimates are
among the highest ever reported (27).
Discussion
During periods of vegetative propagation in P. tetraurelia, the
germ-line micronucleus is replicated but not expressed (10, 24),
and hence germ-line mutation accumulation is unrestrained by
natural selection until expression by the new macronucleus following an episode of sexual reproduction. If the P. tetraurelia
micronucleus had a typical base-substitution mutation rate for its
genome size [∼1.5 × 10⁻⁹ per site per generation with a genome
size of 72 Mb (6)], each genome would be expected to accumulate ∼7.9 mutations over the 75 asexual generations, and the
subsequent exposure following autogamy would likely impose
a very high burden for a genome with 78% coding density (24).
Thus, we propose that lengthy periods of germ-line sequestration
in P. tetraurelia promote unusually strong selection for low germ-line mutation rates (on a per-cell division basis).
Because the majority of mutations with fitness effects are
slightly deleterious (1, 28), natural selection will generally operate
to promote mechanisms that minimize the production of replication errors and maximize the efficiency of repair of nonreplicating
DNA (e.g., higher-fidelity polymerases and improved repair
enzymes) (6, 11, 14). However, the point at which the advantage of
a further reduction in the mutation rate equals the power of
random genetic drift (resulting from finite population size) represents the lower bound to which the mutation rate can be driven
by natural selection (15). Because selection can only operate on
expressed mutations, the unit of selection on the mutation rate in
a species with a sequestered germ line is the pool of deleterious
mutations accumulated over all germ-line cell divisions, which
generally have no phenotypic effects until emerging into a soma in
the following generation. Thus, in a metazoan, selection operates
to minimize the per-generation mutation rate, and this is accomplished by reducing the mutation rate per cell division by a factor
equal to the number of germ-line cell divisions per generation. As
a consequence, although mammals have the highest known per-generation mutation rates, the rates per cell division are very low
in the germ line (6). Not all mutations are deleterious, and the
fraction of mutations that are beneficial may vary with environmental context (29–32). However, for sexually reproducing
organisms like P. tetraurelia, the indirect selection for a higher
mutation rate associated with beneficial mutations is expected
to be small relative to the downward pressure associated with
background deleterious mutations (14, 33).
Viewed in this way, the mutation rate per sexual episode in
P. tetraurelia is equal to 75× the rate per germ-line cell division,
which is essentially the same as the per-generation expectation
for other eukaryotes with similar genome sizes, ∼1.5 × 10⁻⁹ per
site (Fig. 2). Thus, the extraordinarily low mutation rate in this
species appears to be a consequence of the high efficiency of
selection operating on this globally abundant species combined
with the added constraint of germ-line sequestration. The unusual level of mutation-rate depression observed in Paramecium
is not expected in unicellular species that lack a protected germ
line (e.g., bacteria and yeast), as such cells immediately experience the effects of mutations.

Table 1. P. tetraurelia summary statistics, base-substitution distribution, and mutation rate

MA line                   15     25     30     40     55     60     70     Pooled
Base substitutions
  Intergenic              1      0      2      0      1      1      1      6
  Intron                  0      0      0      0      0      0      0      0
  Synonymous              3      1      1      0      2      0      1      8
  Nonsynonymous           3      4      2      2      0      3      1      15
Ts/Tv ratio               0.40   0.25   0.67   –      2.00   0.33   –      0.73
Read depth                31.55  86.54  59.36  28.54  45.36  42.79  73.78  52.56
Analyzed sites (×10⁷)     6.59   7.02   7.00   4.47   6.97   5.45   7.01   6.36
Generations (×10³)        3.31   3.32   3.31   3.31   3.31   3.31   3.31   3.31
Mutation rate (×10⁻¹¹)    3.21   2.14   2.16   1.36   1.30   1.73   1.67   1.94
SE (×10⁻¹¹)               1.21   0.96   0.97   0.96   0.75   0.87   0.96   0.36

Base-substitution distribution, transition/transversion (Ts/Tv) ratios, number of analyzable sites per line, generations at time of sequencing, mutation rate per site per generation, and the SE of the base-substitution mutation rate per site per generation across seven P. tetraurelia MA lines. Pooled column is the total sum for summary statistics, and average for mutation rates.

Table 2. P. tetraurelia mutation accumulation derived mutations

Line  Scaff  Position  Cons  Mut  Type
15    110    113028    C     A    EX
15    125    27587     A     T    EX
15    150    6745      T     G    EX
15    184    1534      A     T    EX
15    560    447       G     A    EX
15    63     208       A     T    IG
15    72     326857    C     T    EX
25    11     241977    C     G    EX
25    157    128026    A     T    EX
25    32     259470    C     G    EX
25    6      647359    A     T    EX
25    89     192525    C     T    EX
30    103    52586     C     T    EX
30    145    29743     G     T    IG
30    175    100327    A     T    IG
30    65     330649    C     A    EX
30    88     68666     G     A    EX
40    1      547782    T     G    EX
40    8      591378    C     G    EX
55    515    1496      G     A    EX
55    62     62583     G     A    IG
55    73     359233    T     A    EX
60    102    226059    G     T    EX
60    16     307226    G     A    EX
60    57     454181    T     G    IG
60    162    73827     C     A    EX
70    146    108424    G     T    EX
70    60     187448    A     T    EX
70    79     28592     C     A    IG

Amino acid columns for the exonic sites, in the order recovered from the source layout (row alignment not recoverable):
Orig AA: H P F H I N P V L I I F L R L T P I L L P Q I S
New AA: H F K S I N F Q * I S L S S I L C P K F P N

P. tetraurelia mutations identified using the consensus approach. Cons, consensus nucleotide; Line, mutation accumulation line; Mut, mutation nucleotide; New AA, new amino acid; Orig AA, original amino acid; Scaff, scaffold; Type, coding context (EX, exon; IG, intergenic).

Fig. 1. Conditional base-substitution rates for P. tetraurelia MA lines. Rates
for each base-substitution type normalized by genome base composition;
error bars indicate one SE. Gray horizontal line indicates the average mutation rate per site across all lines, with gray shading showing the SE.
[y axis: conditional base-substitution rates (×10⁻¹¹); x axis: transitions (AT > GC, GC > AT) and transversions (AT > TA, GC > TA, AT > CG, GC > CG).]

Fig. 2. Base-substitution rate scaling for P. tetraurelia per sexual episode.
Filled squares represent base-substitution mutation rates derived from human disease alleles (18) and MA projects (2–5). P. tetraurelia open circle
represents the base-substitution mutation rate per cell division from this
experiment, and the P. tetraurelia filled circle indicates the base-substitution
mutation rate per sexual episode. Linear regression for filled icons only (r² =
0.85, P = 0.002).
[y axis: base-substitution mutation rate (×10⁻⁹); x axis: genome size (Mb); labeled points: Mus musculus, Homo sapiens, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Paramecium tetraurelia, and Paramecium tetraurelia (per sexual episode).]
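The per-sexual-episode scaling described in the Discussion can be reproduced with simple arithmetic. The sketch below multiplies the pooled per-division rate by the approximate number of cell divisions between autogamy events and compares the result with the ∼1.5 × 10⁻⁹ expectation quoted in the text for this genome size; the variable names are illustrative.

```python
# Values quoted in the text
rate_per_division = 1.94e-11        # base substitutions per site per cell division (pooled)
divisions_per_episode = 75          # approximate cell divisions between autogamy events
expected_for_genome_size = 1.5e-9   # per site per generation, expectation for a ~72-Mb genome

# Cumulative germ-line rate exposed to selection at each sexual episode
rate_per_episode = rate_per_division * divisions_per_episode
ratio = rate_per_episode / expected_for_genome_size

print(f"rate per sexual episode: {rate_per_episode:.3e} per site")
print(f"ratio to genome-size expectation: {ratio:.2f}")
```

The ratio comes out just below 1, which is the sense in which the per-episode rate is "essentially the same" as the per-generation expectation.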
The high level of replication fidelity in P. tetraurelia may be
achieved via modifications in replication enzymes, as alterations
of critical domains in B-family replicative polymerases are known
to have direct effects on mutation rates (34–36). B-family replicative polymerases α, δ, ε, and ζ are primary enzymes involved in
initiating replication, synthesizing the leading and lagging strand,
and allowing for synthesis of DNA across lesions (35, 37, 38). In
addition to being involved in the synthesis of the leading and
lagging strands, δ and ε polymerases also contain 3′ proofreading
exonuclease domains critical to excision of incorrectly polymerized nucleotides (39). Using BLAST (Materials and Methods), we
identified all P. tetraurelia B-family DNA polymerases (α, δ, ε, and
ζ homologs) and compared these to other eukaryotes. Remarkably, ciliate-specific amino acid changes in the primary catalytic
sites of α-region III and ζ-region III result in switches of amino
acid polarity (T→V; T→V/I/M) (Table 3) in active sites that are
otherwise highly conserved across eukaryotes. Moreover, the
proofreading exonuclease of DNA polymerase ε exhibits ciliate-specific changes in the highly conserved region II to highly
charged amino acids (F→R/K) (Table 3). Although such changes
in amino acid sequence might not necessarily translate into an
improvement in DNA fidelity and lower mutation rates, the
unique sequences of ciliate DNA polymerases suggest a fundamental change in the mechanisms of replication fidelity in this
lineage (Table 3). Nevertheless, it remains unclear why residues
strongly conserved in other eukaryotic lineages and presumably
essential for high replication fidelity would be specifically altered
in ciliates to achieve even higher accuracy.
Despite the global distribution and high local abundances of
the study species, conservative estimates of standing levels of
silent-site heterozygosity within species of the P. aurelia complex
(π_s = 0.0096) are not extraordinarily high (27, 40), and for other
ciliates may be even lower [e.g., Tetrahymena thermophila, π_s =
0.0030 (41)], an observation that led the latter authors to suggest that effective population sizes (N_e) may be low in ciliates
(41). A re-evaluation of this interpretation appears to be in order, as it is derived under the assumption of a substantially
higher mutation rate than observed herein. Although low per-generation mutation rates may not be common to all ciliates,
assuming standing levels of silent-site diversity reflect the neutral
expectation (4N_e u), by factoring out the term 4u, we estimate
that N_e is on the order of 10⁸ for P. tetraurelia and perhaps 10⁷
for T. thermophila. Because there is substantial evidence of selection on silent sites (42), especially in species with an elevated
N_e, these estimates are likely to be downwardly biased.
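The order-of-magnitude N_e estimates above follow from rearranging the neutral expectation π_s = 4N_e u. The sketch below assumes that u is the per-cell-division rate measured here (1.94 × 10⁻¹¹) and, as a further assumption flagged in the text itself, that the same rate applies to T. thermophila; the function name is hypothetical.

```python
def effective_population_size(pi_s, mu):
    """Solve the neutral expectation pi_s = 4 * Ne * mu for Ne."""
    return pi_s / (4.0 * mu)

# Assumption: the per-cell-division rate from this study is used for both species.
mu = 1.94e-11

ne_paramecium = effective_population_size(pi_s=0.0096, mu=mu)   # P. aurelia complex
ne_tetrahymena = effective_population_size(pi_s=0.0030, mu=mu)  # T. thermophila

print(f"P. tetraurelia  Ne ~ {ne_paramecium:.1e}")
print(f"T. thermophila  Ne ~ {ne_tetrahymena:.1e}")
```

Both values fall in the ranges stated in the text (on the order of 10⁸ and 10⁷, respectively).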
Finally, it is worth noting that like P. tetraurelia, the distantly
related ciliate T. thermophila shares strikingly similar modifications to active sites in B-family DNA polymerases (Table 3),
Table 3. Differences in α and ε B-family DNA polymerases of P. tetraurelia

Column headers: Polymerase catalytic sites (Region III, Region I); Proofreading 3′→5′ exonuclease (Exo II, Exo III). Residue columns are given in the order recovered from the source layout.

α
AA change: *
AA position: 839, 783, 825, 885
Species          Residues
H. sapiens       K L T A N S M Y G Y G C I Y G D T D D T D S I
M. musculus      K L T A N S M Y G Y G C I Y G D T D D T D S I
D. melanogaster  K L T A N S M Y G Y G C V Y G D T D D T D S L
C. elegans       K L T A N S M Y G Y G C V Y G D T D D T D S I
S. cerevisiae    K L T A N S M Y G Y G C V Y G D T D D T D S V
A. thaliana      K L T A N S M Y G Y G C I Y G D T D D T D S I
T. pseudonana    K L T A N S M Y G Y G C I Y G D T D D T D S I
T. cruzi         K L T A N S M Y G Y G C I Y G D T D D T D S V
T. thermophila   K L V A N S M Y G Y G C I Y G D T D D T D S L
P. tetraurelia   K L V A N S I Y G Y G C I Y G D T D D T D S M
T. gondii        K L T A N S I Y G Y G C V Y G D T D D T D S I
G. lamblia       K L C A N A M Y G Y G S I Y G D T D D T D S I

ε
AA change: +, *, *
AA position: 314, 410
Species          Residues
H. sapiens       K C I L N S F Y L E L G I N G D F F D Y S V S D A
M. musculus      K C I L N S F Y L E L G I N G D F F D Y S V S D A
D. melanogaster  K C I L N S F Y L E L G I N G D F F D Y S V S D A
C. elegans       K C I L N S F Y L E L G I N G D F F D Y S V S D A
S. cerevisiae    K V I L N S F Y L E L G I N G D F F D Y S V S D A
A. thaliana      K C I L N S F Y L E L G I N G D F F D Y S V S D A
T. pseudonana    K C I L N S F Y L E L G I N G D F F D Y S V S D A
T. cruzi         K C I L N S F Y L E L G I N G D Y F D Y S V S D A
T. thermophila   K I I L N S F Y L E L G I N G D K F D Y S V S D A
P. tetraurelia   K I I L N S F Y L E L G I N G D R F D Y S I S D S
T. gondii        K C I L N S F Y M E L G I N G D T F D Y S V S D A
G. lamblia       K C I L N S F Y L E L G I N G D F F D Y S V S D A

Subset of conserved residues from DNA polymerases α and ε (46); α exonucleases are not involved in proofreading. AA change: “*” designates ciliate-specific amino acid change in polarity; “+” designates ciliate-specific amino acid change in charge. AA position indicates amino acid position in each gene. Species and gene identifier number (α/ε) or genome flat file (GFF) number from top to bottom: Homo sapiens (106507301/3192938), Mus musculus (6679409/195947387), Drosophila melanogaster (217344/23172053), Caenorhabditis elegans (257146921/17507143), Arabidopsis thaliana (332010917/3885342), Saccharomyces cerevisiae (929851/171409), Trypanosoma cruzi (71661998/70882804), Thalassiosira pseudonana (220967910/220976143), Tetrahymena thermophila (47.m00199/3812.m02363, GFF), Paramecium tetraurelia (GSPATP00014027001/GSPATP00033819001, GFF), Toxoplasma gondii (4808579/237841111), and Giardia lamblia (159108384/159115199). Dark gray boxes indicate residues that are specific to P. tetraurelia and T. thermophila.

Materials and Methods
Mutation Accumulation Process and DNA Extraction. For ∼3,300 generations,
100 independent P. tetraurelia MA lines (reference strain d4-2) were passed
through single-cell bottlenecks every day to a new tube with fresh wheat
grass medium (yeast extract/cerophyl/Na2HPO4/stigmasterol/H2O) seeded
with Klebsiella pneumoniae.
Self-fertilization (autogamy) was stimulated by starvation, which was
prevented during daily single-cell transfers to fresh medium. In the absence of
autogamy, Paramecium cells senesce. Therefore, every 20–25 d (∼75 generations), cells from each P. tetraurelia MA line were transferred into a single
well. After approximately 3 d in the well, the cells reach saturation and undergo autogamy. Autogamy was detected with an Aceto-Carmine nuclear
stain that shows fragmentation of the old macronucleus. Single cells were
picked when the majority of cells in a well were undergoing autogamy, and
they and their descendants were then propagated using daily single-cell
transfers until the next scheduled autogamy. Because macronuclear fragments were retained for several generations in the progeny, we were able to
ensure that the MA lines went through autogamy by continually monitoring
descendants of each line.
Over the course of the 4-y experiment, 52 lines died out. It is impossible to
say in each case whether this was a result of mutational processes or of handling.
Attempts were made in all cases to revert to an earlier generation, but failed
for the 52 lines. Paramecium cells are difficult to cryogenically preserve,
limiting how far back we could go to revive lines. Eight of the 48 remaining
lines were randomly selected for DNA extraction.
Cells of each selected line were filtered using 10-μm Nitex filter cloth, and
washed several times in Drys Buffer (sodium citrate/NaH2PO4/Na2HPO4/
CaCl2). Homogenization and lysis were done using 0.5% Nonidet P-40 suspended in MgCl2/TrisCl/sucrose, which leaves the macronuclei intact. After
centrifugation of the lysate, the supernatant containing most of the bacteria, mitochondria, and micronuclei was discarded. DNA extraction of the
pellet containing relatively pure macronuclei was done using CTAB buffer,
and was further purified to Illumina library standards using phenol chloroform and ethanol precipitation. The P. tetraurelia lines were sequenced
using the Illumina GAIIx platform and mapped to the reference genome
(SI Materials and Methods).
Mutation Identification Procedure. To identify putative mutations (putations),
a consensus approach was used, comparing each individual line (focal line)
against the consensus of all of the remaining lines. This methodology has
previously been used in multiple MA experiments, and is robust against
sequencing or alignment errors that may have occurred in the reference
genome (2–4). With the consensus approach, we identified a total of 111
putations before the filtering process. After filtering for paralogy and sequencing errors (SI Materials and Methods and Tables S4–S7), we reduced
the number of putations to 29. The 29 remaining mutations (Table 2) are
∼26.8% of the original putations (29 of 108), which is comparable to the
final ratio of mutations to putations observed after filtering in other MA
experiments [e.g., A. thaliana MA experiment 99 of 538 or ∼18.4% after
filtering (3)]. A subset of randomly selected filtered putations was directly
sequenced in both directions using standard fluorescent sequencing technology on an ABI3730. Across the seven P. tetraurelia MA lines, 12 of the
12 randomly selected putations were verified as MA-derived mutations, and
10 of the 10 randomly selected filtered putations were verified as false
positives using traditional sequencing. For all cases, the wild-type nucleotide
was also confirmed at the mutation site in at least 3 other lines without
the mutation.
1. Baer CF, Miyamoto MM, Denver DR (2007) Mutation rate variation in multicellular eukaryotes: Causes and consequences. Nat Rev Genet 8(8):619–631.
2. Lynch M, et al. (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci USA 105(27):9272–9277.
3. Denver DR, et al. (2009) A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc Natl Acad Sci USA 106(38):16310–16314.
4. Ossowski S, et al. (2010) The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327(5961):92–94.
5. Keightley PD, et al. (2009) Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res 19(7):1195–1201.
6. Lynch M (2010) Evolution of the mutation rate. Trends Genet 26(8):345–352.
7. Lynch M, Conery JS (2003) The origins of genome complexity. Science 302(5649):1401–1404.
Mutation Rate Calculations. To calculate the base-substitution mutation rate per
site per cell division for each line, we used the following equation: $u_{bs} = m/(nT)$,
where $u_{bs}$ is the base-substitution mutation rate (per haploid nucleotide site
per generation), $m$ is the number of observed base substitutions, $n$ is the
number of haploid nucleotide sites analyzed, and $T$ is the number of generations that occurred in the mutation accumulation line. The SE for an individual line is calculated using $SE_x = \sqrt{u_{bs}/(nT)}$, and the total SE of the base-substitution mutation rate is given by $SE_{pooled} = s/\sqrt{N}$, where $s$ is the average
SE of the mutation rates across all lines and $N$ is the number of lines analyzed.
The same calculation was used to calculate the insertion/deletion (indel) mutation
rate, with $u_{bs}$ replaced with $u_{indel}$.
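As a worked illustration of these formulas, the sketch below applies them to MA line 15 (7 base substitutions, 6.59 × 10⁷ analyzed sites, 3.31 × 10³ generations, as listed in Table 1) and then pools the per-line SEs reported in Table 1. The function names are illustrative and are not taken from the authors' analysis code.

```python
import math

def mutation_rate(m, n, T):
    """Base-substitution rate per site per generation: u = m / (n * T)."""
    return m / (n * T)

def line_se(u, n, T):
    """Standard error for a single line: SE = sqrt(u / (n * T))."""
    return math.sqrt(u / (n * T))

def pooled_se(line_ses):
    """Pooled SE: mean of the per-line SEs divided by sqrt(number of lines)."""
    mean_se = sum(line_ses) / len(line_ses)
    return mean_se / math.sqrt(len(line_ses))

# MA line 15, values from Table 1
u_15 = mutation_rate(m=7, n=6.59e7, T=3.31e3)
se_15 = line_se(u_15, n=6.59e7, T=3.31e3)

# Per-line SEs (x 10^-11) as reported in Table 1 for lines 15, 25, 30, 40, 55, 60, 70
reported_ses = [1.21e-11, 0.96e-11, 0.97e-11, 0.96e-11, 0.75e-11, 0.87e-11, 0.96e-11]
se_pooled = pooled_se(reported_ses)

print(f"line 15: u = {u_15:.2e}, SE = {se_15:.2e}")
print(f"pooled SE = {se_pooled:.2e}")
```

These inputs reproduce the tabulated values for line 15 (3.21 and 1.21 × 10⁻¹¹) and the pooled SE (0.36 × 10⁻¹¹).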
The maximum-likelihood estimates of the base-substitution mutation (u)
and error rate (∈) across all lines were obtained from the maximum of the
log-likelihood function (17), as previously described (4). The initial values of
u (1.94 × 10⁻¹¹ per site per generation) and ∈ (∼0.005 per site) used to initialize the optimization process were those obtained by the consensus
method. A minimum coverage cutoff of 10 and maximum coverage cutoff
of 100 were used to obtain the final maximum-likelihood estimate.
Mitochondrial Analysis. To determine mitochondrial base-substitution rates, it
is necessary to first set a minimum allele-frequency cutoff for calling a heteroplasmic mitochondrial base substitution. Prior studies involving Illumina
reads have shown low false-positive rates in calling heteroplasmic mitochondrial base substitutions when requiring a minimum of three forward and
three reverse reads with a minimum allele-frequency cutoff greater than 0.10
(43). We applied the same criteria across an average of 39,422 mitochondrial
sites in each P. tetraurelia MA line. The MA process does not enforce
a complete organelle bottleneck, so unlike in the nuclear genome, selection
may still be operating at low efficiency, and mutations will remain in
a heteroplasmic state. To determine mitochondrial base-substitution rates, it
is necessary to assume that the ultimate fixation of a mitochondrial base
substitution is determined solely by genetic drift. Under this assumption,
neutral theory dictates that the fixation rate is equal to the mutation rate
(44), and the probability of fixation (d_i) is dependent on the current frequency within an individual (i) (26). Sequencing error can contribute to allele
frequencies at a site, so we subtracted the MA-specific error rates (Fig. S2)
for the reference nucleotide type (nuc_err) at the heteroplasmic site. The
final estimate for the mitochondrial base-substitution rate for each line is
calculated by:
$$u_{bs} = \frac{\sum_i \left(d_i - nuc_{err}\right)}{nT}.$$
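As a rough illustration of the steps above, the following Python sketch applies the heteroplasmy calling criteria (at least three forward and three reverse reads, allele frequency greater than 0.10) and then evaluates the rate formula. Several points are assumptions rather than details given in this section: d_i is taken to equal the observed allele frequency (the neutral expectation for fixation probability), n is read as the number of sites analyzed, T as the number of generations, and the `Site` record, the error rates, and the generation count are hypothetical.

```python
from dataclasses import dataclass

MIN_STRAND_READS = 3   # minimum supporting reads on each strand
MIN_ALLELE_FREQ = 0.10 # allele frequency must exceed this value

@dataclass
class Site:
    ref_base: str  # reference nucleotide at the site
    fwd_alt: int   # forward-strand reads supporting the variant
    rev_alt: int   # reverse-strand reads supporting the variant
    depth: int     # total reads covering the site

def allele_freq(site):
    return (site.fwd_alt + site.rev_alt) / site.depth

def is_heteroplasmic_call(site):
    """Apply the strand-support and allele-frequency criteria."""
    if site.depth == 0:
        return False
    return (site.fwd_alt >= MIN_STRAND_READS
            and site.rev_alt >= MIN_STRAND_READS
            and allele_freq(site) > MIN_ALLELE_FREQ)

def mito_rate(sites, nuc_err, n_sites, generations):
    """u_bs = sum_i (d_i - nuc_err) / (n * T).
    Assumption: d_i equals the observed allele frequency at site i."""
    total = 0.0
    for site in sites:
        if is_heteroplasmic_call(site):
            total += allele_freq(site) - nuc_err[site.ref_base]
    return total / (n_sites * generations)

# Hypothetical candidate sites and error rates for one MA line.
sites = [
    Site("A", 5, 5, 100),    # frequency 0.10, not above the cutoff
    Site("C", 10, 10, 100),  # passes both criteria
    Site("G", 2, 30, 100),   # too few forward reads
]
nuc_err = {"A": 0.005, "C": 0.005, "G": 0.005, "T": 0.005}
rate = mito_rate(sites, nuc_err, n_sites=39422, generations=1000)
print(rate)
```

Only the second site contributes to the sum in this example; the other two are rejected by the frequency and strand-support criteria, respectively.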
B-Family DNA Polymerases. The B-family DNA polymerases α, δ, ε, and ζ of 12
eukaryotic species were obtained from the National Center for Biotechnology Information (NCBI) data repository (Table 3). The sequence of
each individual DNA polymerase was used to search for homologs in the
gene databases for P. tetraurelia (19, 24) and T. thermophila (45) using
BLASTP (e-value cutoff of 10⁻¹⁰ to identify homology). To detect
differences in each polymerase, the resulting BLAST matches were aligned
with the original DNA polymerases from the NCBI.
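The e-value filtering step can be sketched in Python as follows. This assumes BLASTP results saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`, with the e-value in the eleventh column); the section does not state which output format or tooling was used, and the query and hit identifiers below are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10  # cutoff stated in the text

def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return (query, subject, e-value) for hits at or below the cutoff.
    Assumes 12-column BLAST tabular output (e-value in column 11)."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= cutoff:
            hits.append((row[0], row[1], evalue))
    return hits

# Two hypothetical hits: one well below the cutoff, one above it.
sample = "\n".join([
    "\t".join(["polD_query", "hypothetical_hit_1", "52.1", "900", "400",
               "12", "1", "900", "5", "910", "3e-45", "812"]),
    "\t".join(["polD_query", "hypothetical_hit_2", "28.0", "120", "80",
               "6", "30", "150", "40", "160", "1e-05", "48.1"]),
])
print(filter_hits(sample))
```

Hits retained by such a filter would then be passed to a multiple-sequence aligner together with the original NCBI polymerase sequences.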
ACKNOWLEDGMENTS. Support for this study was provided by National Institutes of Health Grant R01 GM036827 (to M.L. and W.K.T.); National Science
Foundation Grant MCB-1050161 (to M.L.); and US Department of Defense
Multidisciplinary University Research Initiative Award W911NF-09-1-0444 (to
M.L., P. Foster, H. Tang, and S. Finkel).
8. Sung W, Ackerman MS, Miller SF, Doak TG, Lynch M (2012) The drift-barrier hypothesis and mutation-rate evolution. Proc Natl Acad Sci USA.
9. Berger JD (1986) Autogamy in Paramecium. Cell cycle stage-specific commitment to
meiosis. Exp Cell Res 166(2):475–485.
10. Duret L, et al. (2008) Analysis of sequence variability in the macronuclear DNA of
Paramecium tetraurelia: A somatic view of the germline. Genome Res 18(4):585–596.
11. Kimura M (1967) On the evolutionary adjustment of spontaneous mutation rates.
Genet Res 9(1):23–24.
12. Dawson KJ (1999) The dynamics of infinitesimally rare alleles, applied to the evolution of
mutation rates and the expression of deleterious mutations. Theor Popul Biol 55(1):1–22.
13. Johnson T (1999) The approach to mutation-selection balance in an infinite asexual
population, and the evolution of mutation rates. Proc Biol Sci 266(1436):2389–2397.
and also undergoes multiple rounds of vegetative reproduction
before a sexual cycle. Although further biochemical assays are
needed to confirm whether modifications in B-family polymerases are truly responsible for the reduction in mutation rate,
the nuclear dimorphism unique to ciliated protozoa suggests that
this lineage is a logical target in the search for commercially
useful, high-fidelity DNA polymerases.
14. Sniegowski PD, Gerrish PJ, Johnson T, Shaver A (2000) The evolution of mutation
rates: Separating causes from consequences. Bioessays 22(12):1057–1066.
15. Lynch M (2011) The lower bound to the evolution of mutation rates. Genome Biol
Evol 3:1107–1118.
16. Lynch M (2006) The origins of eukaryotic gene structure. Mol Biol Evol 23(2):450–468.
17. Lynch M (2008) Estimation of nucleotide diversity, disequilibrium coefficients, and
mutation rates from high-coverage genome-sequencing projects. Mol Biol Evol 25
(11):2409–2419.
18. Lynch M (2010) Rate, molecular spectrum, and consequences of human mutation.
Proc Natl Acad Sci USA 107(3):961–968.
19. Arnaiz O, Sperling L (2011) ParameciumDB in 2011: New tools and new data for
functional and comparative genomics of the model ciliate Paramecium tetraurelia.
Nucleic Acids Res 39(Database issue):D632–D636.
20. Hershberg R, Petrov DA (2010) Evidence that mutation is universally biased towards
AT in bacteria. PLoS Genet 6(9):e1001115.
21. Hildebrand F, Meyer A, Eyre-Walker A (2010) Evidence of selection upon genomic GC-content in bacteria. PLoS Genet 6(9):e1001107.
22. Duncan BK, Miller JH (1980) Mutagenic deamination of cytosine residues in DNA.
Nature 287(5782):560–561.
23. Duret L, Galtier N (2009) Biased gene conversion and the evolution of mammalian
genomic landscapes. Annu Rev Genomics Hum Genet 10:285–311.
24. Aury JM, et al. (2006) Global trends of whole-genome duplications revealed by the
ciliate Paramecium tetraurelia. Nature 444(7116):171–178.
25. Denver DR, Morris K, Lynch M, Vassilieva LL, Thomas WK (2000) High direct estimate
of the mutation rate in the mitochondrial genome of Caenorhabditis elegans. Science
289(5488):2342–2344.
26. Haag-Liautard C, et al. (2008) Direct estimation of the mitochondrial DNA mutation
rate in Drosophila melanogaster. PLoS Biol 6(8):e204.
27. Catania F, Wurmser F, Potekhin AA, Przybos E, Lynch M (2009) Genetic diversity in the
Paramecium aurelia species complex. Mol Biol Evol 26(2):421–431.
28. Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8(8):610–618.
29. Rutter MT, et al. (2012) Fitness of Arabidopsis thaliana mutation accumulation lines
whose spontaneous mutations are known. Evolution 66(7):2335–2339.
30. Hall DW, Joseph SB (2010) A high frequency of beneficial mutations across multiple
fitness components in Saccharomyces cerevisiae. Genetics 185(4):1397–1409.
31. Hall DW, Mahmoudizad R, Hurd AW, Joseph SB (2008) Spontaneous mutations in
diploid Saccharomyces cerevisiae: Another thousand cell generations. Genet Res
(Camb) 90(3):229–241.
32. Joseph SB, Hall DW (2004) Spontaneous mutations in diploid Saccharomyces cerevisiae: More beneficial than expected. Genetics 168(4):1817–1825.
33. Johnson T (1999) Beneficial mutations, hitchhiking and the evolution of mutation
rates in sexual populations. Genetics 151(4):1621–1631.
34. Pavlov YI, Shcherbakova PV, Kunkel TA (2001) In vivo consequences of putative active
site mutations in yeast DNA polymerases alpha, epsilon, delta, and zeta. Genetics 159
(1):47–64.
35. Kunkel TA (2009) Evolving views of DNA replication (in)fidelity. Cold Spring Harb
Symp Quant Biol 74:91–101.
36. Loh E, Salk JJ, Loeb LA (2010) Optimization of DNA polymerase mutation rates during
bacterial evolution. Proc Natl Acad Sci USA 107(3):1154–1159.
37. Hübscher U, Maga G, Spadari S (2002) Eukaryotic DNA polymerases. Annu Rev
Biochem 71:133–163.
38. Wang TS (1991) Eukaryotic DNA polymerases. Annu Rev Biochem 60:513–552.
39. Shevelev IV, Hübscher U (2002) The 3′–5′ exonucleases. Nat Rev Mol Cell Biol 3(5):
364–376.
40. Snoke MS, Berendonk TU, Barth D, Lynch M (2006) Large global effective population
sizes in Paramecium. Mol Biol Evol 23(12):2474–2479.
41. Katz LA, Snoeyenbos-West O, Doerder FP (2006) Patterns of protein evolution in
Tetrahymena thermophila: Implications for estimates of effective population size.
Mol Biol Evol 23(3):608–614.
42. Lynch M (2007) The Origins of Genome Architecture (Sinauer Associates, Sunderland,
MA).
43. Li M, et al. (2010) Detecting heteroplasmy from high-throughput sequencing of
complete human mitochondrial DNA genomes. Am J Hum Genet 87(2):237–249.
44. Kimura M (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ Press,
Cambridge, UK).
45. Eisen JA, et al. (2006) Macronuclear genome sequence of the ciliate Tetrahymena
thermophila, a model eukaryote. PLoS Biol 4(9):e286.
46. Pavlov YI, Shcherbakova PV, Rogozin IB (2006) Roles of DNA polymerases in replication, repair, and recombination in eukaryotes. Int Rev Cytol 255:41–132.