Purifying Selection in Mammalian Mitochondrial Protein-Coding Genes Is Highly Effective and Congruent with Evolution of Nuclear Genes
Konstantin Yu Popadin,*,†,1 Sergey I. Nikolaev,†,1 Thomas Junier,1 Maria Baranova,2 and Stylianos E. Antonarakis*,1
1 Department of Genetic Medicine and Development, University of Geneva Medical School and iGE3 Institute of Genetics and Genomics of Geneva, Switzerland
2 Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Leninskiye Gory, Moscow, Russia
† These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected]; [email protected].
Associate editor: Koichiro Tamura
Article
Abstract
The mammalian mitochondrial genomes differ from the nuclear genomes in maternal inheritance, absence of recombination, and higher mutation rate. All these differences decrease the effective population size of the mitochondrial genome and make it more susceptible to the accumulation of slightly deleterious mutations. It has been hypothesized that mitochondrial genes, especially in species with low effective population size, degrade irreversibly, leading to a decrease in organismal fitness and even to the extinction of species through mutational meltdown. To test this hypothesis, we compared the purifying selection acting on a representative set of mitochondrial (potentially degrading) and nuclear (potentially not degrading) protein-coding genes in species with different effective population sizes. For 21 mammalian species, we calculated the rates of accumulation of slightly deleterious mutations, approximated by Kn/Ks, separately for the mitochondrial and nuclear genomes. Seventy-five percent of the variation in Kn/Ks is explained by two independent variables: the type of genome (mitochondrial or nuclear) and the effective population size of the species, approximated by generation time. First, we observed that purifying selection is more effective in mitochondria than in the nucleus, which implies strong evolutionary constraints on the mitochondrial genome. Mitochondrial de novo nonsynonymous mutations have an at least 5-fold more harmful effect than nuclear ones. Second, the Kn/Ks of both the mitochondrial and the nuclear genomes is positively correlated with the generation time of species, indicating relaxation of purifying selection as species-specific effective population size decreases. Most importantly, the linear regression lines of mitochondrial and nuclear Kn/Ks on generation time are parallel, indicating congruent relaxation of purifying selection in both genomes. Thus, our results reveal that the distribution of selection coefficients of de novo nonsynonymous mitochondrial mutations has a shape similar to that of de novo nonsynonymous nuclear mutations, but its mean is shifted about fivefold toward more deleterious values. The harmful effect of mitochondrial de novo nonsynonymous mutations triggers highly effective purifying selection, which maintains the fitness of the mammalian mitochondrial genome.
Key words: slightly deleterious mutations, purifying selection, effective population size, mitochondrial genome.
© The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 30(2):347–355 doi:10.1093/molbev/mss219 Advance Access publication September 14, 2012
Introduction
The fitness of any mammalian species is genetically determined by variation in the nuclear (nucDNA) and mitochondrial (mitDNA) genomes. Accordingly, deleterious mutations in both the nuclear (Hamosh et al. 2005) and the mitochondrial (Ruiz-Pesini et al. 2007) genomes are eliminated by purifying selection to maintain population fitness. However, these genomes differ in mutation rate, effective population size, and level of recombination, which influence the rate at which natural selection eliminates mutations, so mitDNA could be more susceptible to the accumulation of deleterious mutations. The maternal inheritance of mitDNA prevents interparental recombination, and mitochondrial bottlenecks in oogenesis (Wai et al. 2008) decrease the number of mitochondrial genomes transmitted from mother to offspring; as a result, the effective population size of mitDNA is four times smaller than that of nucDNA (Palumbi et al. 2001; Lynch et al. 2006). The absence of recombination may give rise to genetic processes such as the Hill–Robertson effects (background selection and selective sweeps) (Hill and Robertson 1966; Charlesworth 2009) and Muller's ratchet (the irreversible accumulation of slightly deleterious mutations) (Felsenstein 1974; Gordo et al. 2002). A low effective population size strengthens both processes, which together decrease the population's fitness and can lead to the extinction of species through mutational meltdown (Gabriel et al. 1991; Lynch et al. 1993). Mutational meltdown is the extinction of small populations through a positive feedback loop between population size and genetic drift: small population size leads to strong genetic drift, which results in the accumulation of deleterious mutations and a decrease in fitness, which in turn decreases population size further. The mutation rate in the mitochondrial genomes of mammals is approximately 25-fold higher than in the nuclear genomes (Lynch et al. 2006). This strong mutation pressure can additionally accelerate genetic drift, Muller's ratchet, and mutational meltdown, and it decreases the mitDNA effective population size through the linkage of neutral alleles with deleterious (Charlesworth et al. 1993) or favorable (Gillespie 2000) mutations.
The efficiency of purifying selection in mitDNA is intensely debated. On the one hand, several studies, in line with theoretical expectations, have demonstrated that purifying selection in mitDNA (tRNA, rRNA, and protein-coding genes) is less effective than in nucDNA (Lynch 1997; Lynch and Blanchard 1998). Furthermore, weak or even nonexistent selection against pathogenic mitDNA mutations has been observed in human pedigrees (Jenuth et al. 1996; Chinnery et al. 2000). On the other hand, recent studies suggest strong purifying selection in mitDNA: 1) purifying selection on mammalian mitochondrial protein-coding genes is more effective than on their orthologs in proteobacteria (which have large population sizes and recombination) (Mamirova et al. 2007); 2) a severe mitDNA mutation (a frameshift in the nad6 gene) introduced into mice disappeared after four generations, demonstrating effective purifying selection during oogenesis (Fan et al. 2008; Shoubridge and Wai 2008); and 3) in a mitDNA mutator strain of mice with a proofreading-deficient mitDNA polymerase, rapid and strong elimination of nonsynonymous changes in protein-coding genes was observed over six generations (Stewart et al. 2008).
Here, we use a set of 21 mammalian species with sequenced mitochondrial and nuclear genomes to study the controversial issue of purifying selection in mitDNA. We estimated the rate of accumulation of slightly deleterious mutations, approximated by Kn/Ks, for both genomes (mitochondrial and nuclear) of each species. To reveal potential mitochondria-specific detrimental genetic processes, we contrasted purifying selection in the mitochondrial (potentially degrading) and nuclear (presumably not degrading) genomes across mammalian species with different effective population sizes. We found that 1) selection in mitDNA is always more effective than in nucDNA, 2) selection in mitDNA and nucDNA relaxes in parallel as the population size of species decreases, and 3) the relative efficiency of purifying selection in mitDNA does not decrease in species with low population size compared with species with high population size. Overall, our results provide evidence against mitochondria-specific detrimental genetic processes in mammalian species.
Materials and Methods
Species Studied and Genomic Sequence Alignments
Pilot ENCODE Data Set
The 21 species studied, listed below, represent all major clades of eutherian, metatherian, and prototherian mammals: Loxodonta africana (African elephant), Procavia capensis (rock hyrax), Echinops telfairi (tenrec), Cavia porcellus (guinea pig), Mus musculus (mouse), Oryctolagus cuniculus (rabbit), Rattus norvegicus (rat), Felis catus (cat), Bos taurus (cow), Canis lupus familiaris (dog), Equus caballus (horse), Monodelphis domestica (monodelphis), Pan troglodytes (chimpanzee), Colobus guereza (colobus monkey), Homo sapiens (human), Macaca mulatta (macaque), Pongo abelii (orangutan), Tupaia belangeri (tree shrew), Chlorocebus aethiops (vervet), Ornithorhynchus anatinus (platypus), and Dasypus novemcinctus (armadillo). The alignments of coding sequences (CDS) from the nuclear genomes were extracted from the ENCODE pilot project regions. We used the TBA alignments generated by the Multispecies Sequence Alignment group; these alignments cover 30 Mb of human genomic DNA (Margulies et al. 2007). The pilot ENCODE regions were selected semirandomly: 30 regions were chosen at random and 14 because of biological interest or because they had been extensively studied (ENCODE Project Consortium 2004; Birney et al. 2007). The CDS alignment was created by in-frame concatenation of the longest transcript per gene. To filter out misaligned sequences, the branch lengths of each exon alignment were estimated using the phyml program (Guindon et al. 2009) on the tree topology adopted from Murphy et al. (2001). Misaligned exons were detected by very long branch lengths (>0.36 substitutions per site) and replaced by missing-data symbols. Subsequently, exons were assembled into transcripts so as to maintain the ORFs in the human sequence, and only the 156 transcripts that maintain an ORF in all species were kept. To account for insufficiently represented codons, only codon positions with data in at least one representative species of each mammalian clade were kept. For the mitochondrial genomes, all 13 protein-coding sequences, with a total length of 3,882 codons, were studied; for the nuclear genomes, a total of 15,687 codons (from an original input of 210,984 nt) from the pilot ENCODE regions were used.
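The two alignment filters described above (masking exons with overlong branches and keeping only codon positions covered in every clade) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: we assume that alignments are dicts mapping species names to sequence strings, that "-", "?", and "N" mark missing data, that the branch-length check is applied to each species' terminal branch, and that branch lengths have already been estimated (e.g., with phyml). The function names are hypothetical.

```python
from typing import Dict, List

MAX_BRANCH = 0.36     # substitutions per site; threshold stated in the text
MISSING = set("-?N")  # assumed missing-data symbols


def mask_misaligned_exon(exon: Dict[str, str],
                         branch_lengths: Dict[str, float]) -> Dict[str, str]:
    """Replace a species' exon with '?' if its branch exceeds MAX_BRANCH."""
    return {
        species: "?" * len(seq) if branch_lengths.get(species, 0.0) > MAX_BRANCH else seq
        for species, seq in exon.items()
    }


def _codon_has_data(seq: str, i: int) -> bool:
    """True if codon i of seq contains no missing-data symbol."""
    return not any(base in MISSING for base in seq[3 * i:3 * i + 3])


def keep_codon_positions(aln: Dict[str, str],
                         clades: Dict[str, List[str]]) -> List[int]:
    """Indices of codons with data in at least one species of every clade."""
    n_codons = len(next(iter(aln.values()))) // 3
    return [
        i for i in range(n_codons)
        if all(any(_codon_has_data(aln[sp], i) for sp in members if sp in aln)
               for members in clades.values())
    ]


if __name__ == "__main__":
    exon = {"human": "ATGAAA", "platypus": "ATGCCC"}
    print(mask_misaligned_exon(exon, {"human": 0.05, "platypus": 0.5}))
    aln = {"human": "ATGAAA---", "mouse": "ATG---CCC", "opossum": "ATG???CCC"}
    clades = {"Euarchontoglires": ["human", "mouse"], "Marsupialia": ["opossum"]}
    print(keep_codon_positions(aln, clades))
```

In the toy example, the second codon is dropped because the only marsupial lacks data there, whereas the third codon survives because mouse covers its clade even though human does not.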
COX Data Set
Fourteen species were used for the COX gene data set
(Equus caballus, Sus scrofa, Bos taurus, Canis lupus familiaris,
Ailuropoda melanoleuca, Rattus norvegicus, Mus musculus,
Cavia porcellus, Callithrix jacchus, Macaca mulatta, Pongo
abelii, Pan troglodytes, Homo sapiens, and Loxodonta africana).
The COX data set included 802 codons from the nuclear genes and 1,008 from the mitochondrial genes. The fast-evolving signal peptides (Li et al. 2009) were removed from the nuclear-encoded COX genes. The signal peptides were identified using the amino acid sequences from the X-ray structure of bovine heart cytochrome c oxidase (Protein Data Bank ID: 2zxw) (Aoyama et al. 2009).
Data Set of Essential Genes
The data set of essential nuclear genes was based on the multiple species genomic alignment from the University of California–Santa Cruz (UCSC) genome browser (phastCons46way) (Fujita et al. 2011). We selected the CDS of all genes present in at least one species of each vertebrate clade (Euarchonta, Glires, Laurasiatheria, Afrotheria, Xenarthra, Marsupialia, Monotremata, Archosauria, Amphibia, Actinopterygii, and Petromyzontiformes). From this data set, we extracted the subset of genes that are human housekeeping genes (Chang et al. 2011) and are associated with a lethal knockout phenotype in mice (Yuan et al. 2012). The resulting data set includes 34 nuclear genes: ABL1, ACVR1, BUB3, CANX,
COBRA1, COPS5, CSNK1D, CUL3, CYCS, DPAGT1, EGLN1,
EIF6, FTH1, HIF1A, HMGB1, KRAS, KRT10, MAP3K7, PCNA,
PDIA3, PITPNB, PPP2CA, PRKAR1A, PTGES3, RAC1, SHOC2,
SMAD2, TARDBP, TBP, TCEA1, TPT1, UBE2A, UBE2N, and YY1.
In this work, we did not exclude CpG-prone sites from the nuclear sequences, in order to retain more information in the alignments; in our previous work, all results were robust to the exclusion of CpG-prone sites (Nikolaev et al. 2007). The estimated generation time of each species was taken from the AnAge database (de Magalhães et al. 2009). All statistical analyses were performed in R (R Development Core Team 2012).
Sequence Analysis
To infer trees based on synonymous (Ks) or nonsynonymous (Kn) substitutions, we used the codeml program of PAML (Yang 1997) with a codon model specifying different transition/transversion rate ratios and different nucleotide frequencies for each codon position, without gene partitioning, and imposing the topology adopted from Murphy et al. (2001) (runmode = 0). All sites with ambiguity characters or missing data were excluded (cleandata = 1).
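As a sketch of how such a codeml run might be configured, the Python function below writes the text of a control file. Only runmode = 0 and cleandata = 1 are stated above; the free-ratio branch model (model = 1, one Kn/Ks per branch, from which terminal-branch values can be read), CodonFreq = 2 (F3x4), the genetic-code setting, and the starting values for kappa and omega are our assumptions, and the file names are hypothetical.

```python
def codeml_ctl(seqfile: str, treefile: str, outfile: str, mitochondrial: bool) -> str:
    """Return codeml control-file text (hypothetical settings, see lead-in)."""
    options = {
        "seqfile": seqfile,
        "treefile": treefile,        # fixed topology, e.g. following Murphy et al. (2001)
        "outfile": outfile,
        "runmode": 0,                # user-supplied tree (stated in the text)
        "seqtype": 1,                # codon sequences
        "CodonFreq": 2,              # F3x4: base frequencies per codon position (assumed)
        "model": 1,                  # free-ratio model, one Kn/Ks per branch (assumed)
        "NSsites": 0,                # no site classes
        "icode": 1 if mitochondrial else 0,  # 1 = vertebrate mitochondrial, 0 = standard code
        "Mgene": 0,                  # no gene partitioning
        "fix_kappa": 0, "kappa": 2,      # estimate kappa, starting from 2
        "fix_omega": 0, "omega": 0.4,    # estimate omega, starting from 0.4
        "cleandata": 1,              # drop sites with ambiguity or missing data (stated)
    }
    return "".join(f"{key} = {value}\n" for key, value in options.items())


if __name__ == "__main__":
    print(codeml_ctl("mito_cds.phy", "mammals.tree", "mito.mlc", mitochondrial=True))
```

Separate control files would be needed for the mitochondrial and nuclear alignments, since they use different genetic codes.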
Ratio of Radical to Conservative Amino Acid Substitutions (Kr/Kc)
The ratio of the rates of radical to conservative substitutions (Kr/Kc) was estimated by comparing the nucleotide sequences of extant species with those of their most recent reconstructed ancestors, as described in Zhang (2000). The ancestral nucleotide sequences were reconstructed using Yang's (1997) method implemented in PAML. Because Kr and Kc values were small (<0.3), the Jukes–Cantor formula was used to correct for multiple hits; thus, our estimated Kr/Kc ratio is identical to the dR/dC ratio of Zhang (2000).
The 20 amino acids were classified into groups according to four properties: charge, polarity, volume, and both polarity and volume (Taylor 1986). Amino acid substitutions within a group (i.e., when the ancestral and modern amino acids at homologous sites belong to the same group) were regarded as conservative, and those between groups as radical.
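A simplified Python sketch of the charge-based Kr/Kc estimate is given below. It counts radical and conservative sites per codon in the manner of Nei–Gojobori site counting, counts observed radical and conservative differences, and applies the Jukes–Cantor correction. Several choices are assumptions of this sketch, not of the text: the standard genetic code (mitochondrial genes would need the vertebrate mitochondrial code), a three-class charge grouping (K, R, H positive; D, E negative; all others neutral), exclusion of changes to stop codons, and counting only codons that differ from the ancestor at a single position. Zhang (2000) treats multiple-difference codons more carefully, so this is an illustration only.

```python
import math
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons enumerated in TCAG order.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_CODE)}


def charge_class(aa: str) -> str:
    """Assumed charge grouping: positive (K, R, H), negative (D, E), neutral."""
    if aa in "KRH":
        return "+"
    if aa in "DE":
        return "-"
    return "0"


def is_radical(a: str, b: str) -> bool:
    return charge_class(a) != charge_class(b)


def site_counts(codon: str) -> tuple:
    """Radical and conservative sites of one codon: each of the 9 single-base
    changes counts as 1/3 site; synonymous and stop changes are ignored."""
    aa = CODE[codon]
    radical = conservative = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if new_aa in ("*", aa):
                continue
            if is_radical(aa, new_aa):
                radical += 1 / 3
            else:
                conservative += 1 / 3
    return radical, conservative


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kr_kc(ancestor: str, descendant: str) -> float:
    """JC-corrected radical/conservative rate ratio for one branch."""
    r_sites = c_sites = r_diffs = c_diffs = 0.0
    for i in range(0, len(ancestor) - 2, 3):
        a, d = ancestor[i:i + 3], descendant[i:i + 3]
        for codon in (a, d):  # average site counts over both sequences
            r, c = site_counts(codon)
            r_sites += r / 2
            c_sites += c / 2
        n_diff = sum(x != y for x, y in zip(a, d))
        if n_diff == 1 and CODE[a] != CODE[d]:  # single-difference codons only
            if is_radical(CODE[a], CODE[d]):
                r_diffs += 1
            else:
                c_diffs += 1
    return jukes_cantor(r_diffs / r_sites) / jukes_cantor(c_diffs / c_sites)


if __name__ == "__main__":
    # Toy branch: K->E (radical) and D->E (conservative) substitutions.
    print(round(kr_kc("AAAGATCTG", "GAAGAGCTG"), 3))
```

Swapping the charge grouping for a polarity or volume grouping changes only `charge_class`; the site counting and correction stay the same.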
Average Grantham Distance
To measure amino acid dissimilarity, we computed the average physicochemical distance between modern species and their most recent reconstructed ancestors. The distance between each ancestral and derived amino acid was taken from Grantham's matrix (Grantham 1974) and averaged over all substitutions on a given external branch.
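The averaging step can be sketched in Python as follows. The distance table is deliberately left as a parameter and must be filled from Grantham (1974); the toy table in the usage example contains made-up numbers, not Grantham values. Skipping gaps and unknown residues is our assumption.

```python
from typing import Dict, Tuple

PairTable = Dict[Tuple[str, str], float]


def mean_grantham(ancestor: str, derived: str, table: PairTable) -> float:
    """Average Grantham distance over amino acid substitutions on one branch.

    `ancestor` and `derived` are aligned protein sequences; `table` maps an
    amino acid pair (stored in either order) to its Grantham (1974) distance.
    """
    if len(ancestor) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    distances = []
    for a, b in zip(ancestor, derived):
        if a == b or a in "-X" or b in "-X":
            continue  # no substitution, or a gap/unknown residue (assumed skipped)
        distances.append(table[(a, b)] if (a, b) in table else table[(b, a)])
    return sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    toy_table = {("K", "E"): 10.0, ("L", "I"): 2.0}  # toy numbers, NOT Grantham values
    print(mean_grantham("KLA", "EIA", toy_table))
```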
Evaluation of the Distribution of Selection Coefficients of De Novo Nonsynonymous mitDNA Mutations
Because Nemit = 0.25 Nenuc (Palumbi et al. 2001), we expect the fraction of effectively neutral mutations to be higher for the mitochondrial than for the nuclear genome (according to the scheme in fig. 3, approximately 20% compared with 10% for the nucleus). However, in the mitochondrial data we observe 5% of fixed slightly deleterious mutations (Kn/Ks ≈ 0.05), not 20%. To fit this value, we had to shift the mean of the density distribution of selection coefficients of mitochondrial de novo nonsynonymous mutations by one order of magnitude toward more harmful values (from 1 × 10⁻³ to 1 × 10⁻²; the green curve in fig. 3).
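The expected fractions of effectively neutral mutations can be reproduced approximately with a few lines of Python. As a sketch, we assume that log10|s| is normally distributed; its mean of −3 corresponds to the 1 × 10⁻³ stated above, whereas the standard deviation of 1.6 is an illustrative value of ours, not the authors' fitted parameter. The thresholds |s| < 0.5/Ne, with Nenuc = 5 × 10⁴ and Nemit = 1.25 × 10⁴, follow the fig. 3 caption.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def neutral_fraction(ne: float, mean_log10_s: float, sd_log10_s: float) -> float:
    """Fraction of de novo mutations with |s| < 0.5 / Ne, assuming log10|s|
    is normally distributed (an assumption of this sketch)."""
    return normal_cdf(math.log10(0.5 / ne), mean_log10_s, sd_log10_s)


NE_NUC = 5e4              # long-term human Ne (fig. 3 caption)
NE_MIT = 0.25 * NE_NUC    # 1.25e4
MEAN, SD = -3.0, 1.6      # log10|s| centred at -3 (|s| = 1e-3, text); SD illustrative

f_nuc = neutral_fraction(NE_NUC, MEAN, SD)
f_mit = neutral_fraction(NE_MIT, MEAN, SD)                # same distribution, smaller Ne
f_mit_shifted = neutral_fraction(NE_MIT, MEAN + 1.0, SD)  # centre moved to |s| = 1e-2

if __name__ == "__main__":
    print(f"nuclear {f_nuc:.2f}, mito {f_mit:.2f}, mito shifted {f_mit_shifted:.2f}")
```

With these parameters the sketch gives roughly 11% (nuclear), 19% (mitochondrial, unshifted), and 7% (mitochondrial, shifted), comparable to the 10%, 20%, and 5% shown in fig. 3.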
Results
For both the mitochondrial and the nuclear genomes of each
of 21 mammalian species (hereafter pilot ENCODE data set,
see Materials and Methods), two evolutionary metrics were
estimated: 1) the rates of accumulation of slightly deleterious
mutations, approximated by Kn/Ks and 2) the amino acid
dissimilarity between modern species and their last reconstructed ancestor, approximated by Kr/Kc ratios and the average Grantham distance.
Rates of Accumulation of Slightly Deleterious Mutations
The efficiency of purifying selection was approximated by
ratios of nonsynonymous substitutions per nonsynonymous
site to synonymous substitutions per synonymous site in the
mitDNA (Knmit/Ksmit) and nucDNA (Knnuc/Ksnuc) on the
terminal branches of the tree for all 21 mammalian species.
Kn/Ks in mitDNA was about half that in nucDNA (averages Knmit/Ksmit = 0.047 and Knnuc/Ksnuc = 0.108 among all 21 mammals tested, P < 0.001, Mann–Whitney U test).
[Figure 1: Kn/Ks versus generation time (GT, days) for nucDNA and mitDNA]
FIG. 1. Relationships between Kn/Ks of nuclear genes and generation time (black regression line) and between Kn/Ks of mitochondrial genes and generation time (gray regression line) for 21 mammalian species from the ENCODE data set. The species, sorted by generation time (age at female maturity, days), are as follows: mouse (42), tree shrew (60), guinea pig (66), rat (90), monodelphis (122), cat (289), tenrec (365), armadillo (365), rock hyrax (500), dog (510), cow (548), platypus (548), rabbit (730), horse (914), vervet (1,034), macaque (1,231), colobus monkey (1,461), chimpanzee (3,376), African elephant (4,018), orangutan (4,493), and human (4,745).
The ENCODE phylogenetic tree contains both closely and distantly related species, which can bias the estimation of Kn/Ks through saturation of synonymous substitutions per synonymous site (Ks) on long branches. All nuclear Ks values were smaller than 0.5, suggesting a small effect of saturation. However, the mitochondrial Ks values range from 0.19 to 2.94, and thus some of them are affected by saturation. To test the robustness of our results, we performed the same analysis on a subset of closely related species (primates), which are unlikely to be biased by saturation because of their small Ksmit values (<0.6). The analysis of six primate species confirmed more effective purifying selection in mitDNA than in nucDNA (Knmit/Ksmit = 0.079 and Knnuc/Ksnuc = 0.133, P = 0.03). To test the accuracy of the estimation of Ksmit on the full data set, we investigated the association between Ksnuc and Ksmit. The observed relationship is well described by a linear regression through the origin (Ksnuc = 0.10 Ksmit, P = 3.6 × 10⁻¹³, R² = 0.93, supplementary fig. S1, Supplementary Material online), which indicates an accurate reconstruction of the number of synonymous substitutions per synonymous site in mitDNA despite the saturation effect. A linear relationship was also observed between Knnuc and Knmit (Knnuc = 0.23 Knmit, P = 1.5 × 10⁻¹⁰, R² = 0.88, supplementary fig. S1, Supplementary Material online).
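The regression through the origin used to check Ksmit can be written out explicitly. In this Python sketch, the slope is Σxy/Σx², and R² is computed against the uncentered total sum of squares Σy², which is what R's summary.lm reports for models fitted without an intercept; that the authors used exactly this R² definition is our assumption. The per-species Ks values themselves are not listed in the text, so they would have to be supplied as `x` (Ksmit) and `y` (Ksnuc).

```python
from typing import Sequence, Tuple


def fit_through_origin(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = b * x without intercept.

    Returns the slope b = sum(x*y) / sum(x*x) and an R^2 computed against the
    uncentered total sum of squares sum(y*y).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    rss = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - rss / sum(yi * yi for yi in y)
    return slope, r_squared
```

Because the uncentered R² is typically larger than the centered one, values from no-intercept fits should not be compared directly with R² from ordinary regressions.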
To evaluate the influence of the effective population size of each species on purifying selection in mitDNA and nucDNA, we performed a comparative analysis of variation in Kn/Ks. Because the generation time (GT) of species correlates inversely with population size (Ne) (Chao and Carr 1993), we correlated GT with Knmit/Ksmit and Knnuc/Ksnuc of the 21 mammals studied. We observed significant (P < 0.01) linear regressions between Kn/Ks and GT for both genomes (Knmit/Ksmit = 0.037 + 0.88 × 10⁻⁵ GT, R² = 0.31, P = 0.0089; Knnuc/Ksnuc = 0.094 + 1.18 × 10⁻⁵ GT, R² = 0.31, P = 0.0088, where GT is the age at female maturity of each species in days) (fig. 1). Furthermore, the ratio (Knmit/Ksmit)/(Knnuc/Ksnuc), which measures the relative efficiency of purifying selection in mitDNA, is remarkably constant across all studied species and shows no significant regression on generation time (P = 0.2). These results indicate that as the Ne of species decreases, purifying selection relaxes similarly in nucDNA and mitDNA (note the similar slopes of the regression lines in fig. 1).
To compare the relative efficiency of purifying selection in mitDNA, (Knmit/Ksmit)/(Knnuc/Ksnuc), between species with low and high effective population sizes, we divided the species into two groups at the median GT (548 days). The relative efficiencies of purifying selection in mitDNA did not differ significantly between these two groups (P = 0.1, Mann–Whitney U test).
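The median split can be reproduced from the generation times listed in the fig. 1 caption. The Python sketch below recovers the 548-day median; cow and platypus sit exactly at the median, and the text does not say to which group they were assigned, so the sketch keeps them apart. The U statistic helper is a plain pairwise count for illustration only; a P value would come from a routine such as R's wilcox.test.

```python
from statistics import median

# Generation times (age at female maturity, days) from the fig. 1 caption.
GT_DAYS = {
    "mouse": 42, "tree shrew": 60, "guinea pig": 66, "rat": 90,
    "monodelphis": 122, "cat": 289, "tenrec": 365, "armadillo": 365,
    "rock hyrax": 500, "dog": 510, "cow": 548, "platypus": 548,
    "rabbit": 730, "horse": 914, "vervet": 1034, "macaque": 1231,
    "colobus monkey": 1461, "chimpanzee": 3376, "African elephant": 4018,
    "orangutan": 4493, "human": 4745,
}


def mann_whitney_u(a, b):
    """U statistic of sample a versus sample b (ties count 1/2); no P value."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


cut = median(GT_DAYS.values())
high_ne = sorted(sp for sp, gt in GT_DAYS.items() if gt < cut)   # short GT, high Ne
low_ne = sorted(sp for sp, gt in GT_DAYS.items() if gt > cut)    # long GT, low Ne
at_median = sorted(sp for sp, gt in GT_DAYS.items() if gt == cut)

if __name__ == "__main__":
    print(cut, len(high_ne), len(low_ne), at_median)
```

Applying `mann_whitney_u` to the per-species ratios (Knmit/Ksmit)/(Knnuc/Ksnuc) of the two groups would give the statistic behind the test reported above; those ratios are not tabulated in the text.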
To estimate the fraction of variation in Kn/Ks explained by GT and the type of genome G (G = 0 for mitDNA and G = 1 for nucDNA), we applied a multiple linear regression model. Our results are summarized by the following equation:
$$\log(K_n/K_s) = -4.3 + 1.34\,G + 0.18\,\log(GT) - 0.07\,[G \times \log(GT)], \qquad R^2 = 0.747,\; P = 1.9 \times 10^{-11} \qquad (1)$$
Thus, 75% of the variation in Kn/Ks is explained by these two variables. The P values for G and GT are highly significant (0.0032 and 0.0004, respectively), whereas the interaction of the variables (G × GT) is not significant (P = 0.28); these results confirm the analysis above.
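To see what eq. (1) implies, the fitted model can be evaluated directly. This Python sketch assumes natural logarithms, since the base is not stated; under that assumption, the model at the median GT of 548 days predicts Kn/Ks ≈ 0.042 for mitDNA and ≈ 0.104 for nucDNA, close to the averages of 0.047 and 0.108 reported above.

```python
import math


def predicted_kn_ks(gt_days: float, nuclear: bool) -> float:
    """Evaluate eq. (1); natural logarithms are an assumption of this sketch."""
    g = 1.0 if nuclear else 0.0
    log_gt = math.log(gt_days)
    log_kn_ks = -4.3 + 1.34 * g + 0.18 * log_gt - 0.07 * g * log_gt
    return math.exp(log_kn_ks)


if __name__ == "__main__":
    for gt in (42, 548, 4745):  # mouse, median GT, human (fig. 1 caption)
        mit, nuc = predicted_kn_ks(gt, False), predicted_kn_ks(gt, True)
        print(f"GT={gt:>4} d  mitDNA {mit:.3f}  nucDNA {nuc:.3f}")
```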
To evaluate the generality of the observed difference in purifying selection between mitDNA and nucDNA, we performed the Kn/Ks analysis not only for the semirandomly selected nuclear genes (pilot ENCODE data set) but also for a set of genes encoding the subunits of complex IV (COX) of the respiratory chain (hereafter COX data set), which is functionally related to mitochondria. The COX complex consists of 13 subunits, 3 of which are encoded in the mitochondrial genome and 10 in the nuclear genome (Scheffler 1999). Because this complex forms one integral structure, the two genomes should coevolve (Osada and Akashi 2012) to maintain its structural and functional properties. The mitochondria-encoded COX genes had 15 times smaller Kn/Ks than the nuclear-encoded COX genes (median Knmit/Ksmit = 0.014 versus Knnuc/Ksnuc = 0.222, P = 1 × 10⁻⁴, paired Mann–Whitney U test, fig. 2). This difference is significantly larger than that observed in the pilot ENCODE data set. Linear regressions between Kn/Ks and GT were not significant in either genome (P > 0.16), which could be due to the small number of species (14) and of genes studied (3 in mitochondria and 10 in the nucleus). However, one-sided Kendall's rank correlations showed the expected positive trends between GT and Kn/Ks for both mitochondrial (Kendall's tau = 0.45, P = 0.01) and nuclear genes (Kendall's tau = 0.52, P = 0.005).
The nuclear genome contains many nonconserved genes of uncertain function, which may increase the overall Kn/Ks ratio. To validate the results obtained with the high-quality pilot ENCODE alignment, we constructed an additional, independent data set of nuclear CDS based on the 46-species multiple alignment from the UCSC genome browser (phastCons46way). From this alignment, we selected a subset of 34 essential genes (i.e., genes with a lethal knockout phenotype in mice that are also human housekeeping genes) that are conserved across all vertebrates (see Materials and Methods). These genes are comparable to mitochondrial genes in functional importance and level of conservation. The Kn/Ks ratios of the essential nuclear genes were significantly lower than those of the pilot ENCODE data set for the 16 overlapping mammalian species (Kn/Ks = 0.079 for essential nuclear genes and 0.099 for the pilot ENCODE data set, P = 0.038, paired Mann–Whitney U test). Nevertheless, these essential nuclear genes still have a 2-fold higher Kn/Ks ratio than the mitochondrial genes (Knnuc/Ksnuc = 0.079 and Knmit/Ksmit = 0.039, P = 3 × 10⁻⁵, paired Mann–Whitney U test).
[Figure 2: box-and-whisker plots of Kn/Ks for mitDNA and nucDNA]
FIG. 2. Box-and-whisker plots of Kn/Ks of mitochondrial and nuclear genes from the COX data set in 14 mammalian species (see Materials and Methods). The bold horizontal line marks the median, and the bottom and top of the box are the lower and upper quartiles, respectively. Whiskers extend from the box to at most 1.5 times the interquartile range.
Average Physicochemical Distance of Amino Acid Substitutions
We also studied the rate of amino acid substitutions that affect protein structure, that is, the physicochemical distance between the ancestral and the derived amino acids in protein-coding genes of both mitDNA and nucDNA. This was measured by the ratio of radical to conservative changes (Kr/Kc) and by the average Grantham distance (see Materials and Methods). Substitutions between amino acids with different charges are significantly less frequent in mitDNA than in nucDNA (charge-based Kr/Kc: Krmit/Kcmit = 0.40 and Krnuc/Kcnuc = 0.60, P < 0.001). All other substitution types, that is, those changing polarity, volume, or both, are similar between the mitochondrial and nuclear genomes (polarity-based Kr/Kc: Krmit/Kcmit = 0.81 and Krnuc/Kcnuc = 0.74, P = 0.43; volume-based Kr/Kc: Krmit/Kcmit = 0.73 and Krnuc/Kcnuc = 0.82, P = 0.23; polarity- and volume-based Kr/Kc: Krmit/Kcmit = 0.46 and Krnuc/Kcnuc = 0.47, P = 0.88). The Grantham distance, which is based on composition, polarity, and molecular volume, is significantly smaller in mitDNA than in nucDNA (56.8 ± 5 versus 59.5 ± 2 Grantham units, respectively, P = 0.013).
Comparison of Deleterious Effects of Nonsynonymous Mutations in Mitochondria and Nucleus
We further compared the selection coefficients of nonsynonymous mutations in mitDNA and nucDNA (smit/snuc). If synonymous sites evolve neutrally, the ratio of the fixation probability of a mutation with selection coefficient s ≠ 0 to that of a neutral mutation (s = 0) can be equated with the Kn/Ks ratio (Kimura 1983):
$$K_n/K_s = \frac{S}{1 - \exp(-S)} \qquad (2)$$
where S = Ne s.
Because our estimates of Knnuc/Ksnuc and Knmit/Ksmit were 0.108 and 0.047, respectively (see above), we computed Snuc = −3.4 and Smit = −4.5. Because Nemit = Nenuc/4, the estimated ratio of selection coefficients is smit/snuc = (Smit/Nemit)/(Snuc/Nenuc) = 4 Smit/Snuc ≈ 5.3. Thus, the deleterious effect of amino acid substitutions in mitDNA is about 5-fold higher than in nucDNA.
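Equation (2) can be inverted numerically to recover S from an observed Kn/Ks. The Python sketch below uses bisection over S < 0 (the function is increasing in S) and then converts to the ratio of unscaled coefficients via smit/snuc = 4 Smit/Snuc. With Kn/Ks = 0.108 and 0.047 it returns S ≈ −3.5 and S ≈ −4.6 and a ratio of about 5.2, close to the values reported above.

```python
import math


def kn_ks_from_scaled_s(S: float) -> float:
    """Eq. (2): Kn/Ks = S / (1 - exp(-S)); the neutral limit S -> 0 gives 1."""
    if abs(S) < 1e-12:
        return 1.0
    return S / (1.0 - math.exp(-S))


def scaled_s_from_kn_ks(kn_ks: float, lo: float = -50.0, hi: float = 0.0) -> float:
    """Invert eq. (2) by bisection for deleterious mutations (S < 0, Kn/Ks < 1)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kn_ks_from_scaled_s(mid) > kn_ks:
            hi = mid  # too close to neutrality: move toward more negative S
        else:
            lo = mid
    return 0.5 * (lo + hi)


S_NUC = scaled_s_from_kn_ks(0.108)
S_MIT = scaled_s_from_kn_ks(0.047)
RATIO = 4.0 * S_MIT / S_NUC  # s_mit / s_nuc, using Ne_mit = Ne_nuc / 4

if __name__ == "__main__":
    print(f"S_nuc={S_NUC:.2f}  S_mit={S_MIT:.2f}  s_mit/s_nuc={RATIO:.2f}")
```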
FIG. 3. Distributions of selection coefficients of de novo nonsynonymous mutations in mitochondrial and nuclear genomes. The red curve represents the normal distribution fitted to the empirical distribution (the four gray bins) of selection coefficients of mutations in human nuclear-encoded proteins (Yampolsky et al. 2005). The bold red horizontal line marks the region of nuclear effectively neutral mutations, |s| < 0.5/Nenuc, assuming a long-term Nenuc of the human population of 5 × 10⁴ (Yampolsky et al. 2005). The bold green horizontal line marks the region of mitochondrial effectively neutral mutations, |s| < 0.5/Nemit, assuming a long-term Nemit = 0.25 Nenuc = 1.25 × 10⁴. The red area (10%) represents the fraction of effectively neutral mutations accumulated in nuclear DNA, which corresponds to Knnuc/Ksnuc = 0.1. The area with green horizontal lines (20%) represents the expected fraction of effectively neutral mutations accumulated in mitochondrial DNA, assuming the same distribution of selection coefficients as in nuclear DNA. The green curve represents the hypothetical shift (gray arrow) in the distribution of selection coefficients of mitochondrial mutations, obtained by moving the mean of the red distribution by one order of magnitude toward more deleterious values. The area with vertical green lines (5%) represents the fraction of effectively neutral mitochondrial mutations in the shifted distribution, which fits our empirical results (Knmit/Ksmit ≈ 0.05).
We then attempted to reconstruct the distribution of selection coefficients of de novo nonsynonymous mutations in mitDNA. For nucDNA, it has been estimated empirically that the fractions of amino acid replacements that reduce fitness by >10⁻² (lethal and semilethal), 10⁻²–10⁻⁴ (pathogenic mutations causing Mendelian diseases), 10⁻⁴–10⁻⁵ (segregating in the human population as nonsynonymous variants), and <10⁻⁵ (reaching fixation in the human–chimpanzee divergence) are 25%, 49%, 14%, and 12%, respectively (gray bins of fig. 3) (Yampolsky et al. 2005). We fitted a normal distribution to these bins (red curve of fig. 3) and assumed that the distribution of selection coefficients in mitDNA is the same as in nucDNA. Because Nemit is 4-fold smaller than Nenuc, the expected Kn/Ks in mitDNA would then be 0.2 (area with horizontal green lines, fig. 3). However, because the observed Kn/Ks in mitDNA is about 2-fold lower than in nucDNA (0.047 vs. 0.108), we had to shift the mean of the distribution of mitochondrial selection coefficients by one order of magnitude (green curve of fig. 3; see Materials and Methods for details).
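Fitting a normal distribution to the four binned fractions can be sketched as a least-squares grid search in Python. We assume that the normal is placed on log10|s| and that the fit minimizes the squared difference between observed and expected bin fractions; the text does not state the fitting criterion. Under these assumptions the fitted mean lands close to −3 and the standard deviation at roughly 1.5 to 1.6.

```python
import math
from itertools import product


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Yampolsky et al. (2005) bins on log10|s|: (lower, upper, observed fraction).
BINS = [
    (-math.inf, -5.0, 0.12),  # |s| < 1e-5
    (-5.0, -4.0, 0.14),       # 1e-5 to 1e-4
    (-4.0, -2.0, 0.49),       # 1e-4 to 1e-2
    (-2.0, math.inf, 0.25),   # |s| > 1e-2
]


def bin_sse(mean: float, sd: float) -> float:
    """Sum of squared differences between expected and observed bin fractions."""
    sse = 0.0
    for lo, hi, observed in BINS:
        expected = normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
        sse += (expected - observed) ** 2
    return sse


def fit_normal_to_bins(step: float = 0.02):
    """Grid search: mean in [-5, -1], SD in [0.5, 3.0]."""
    means = [-5.0 + step * i for i in range(round(4.0 / step) + 1)]
    sds = [0.5 + step * i for i in range(round(2.5 / step) + 1)]
    return min(product(means, sds), key=lambda p: bin_sse(*p))


if __name__ == "__main__":
    mean, sd = fit_normal_to_bins()
    print(f"mean log10|s| = {mean:.2f}, sd = {sd:.2f}, SSE = {bin_sse(mean, sd):.4f}")
```

A grid search keeps the sketch dependency-free; a numerical optimizer would give the same optimum more precisely.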
Discussion
Congruent Mitochondrial and Nuclear Purifying Selection
The relaxation of purifying selection with decreasing effective population size (Ne) has been demonstrated separately for mitochondrial (Popadin et al. 2007) and nuclear (Nikolaev et al. 2007) protein-coding genes of mammals. However, comparisons of the dynamics of mutation accumulation in mitDNA and nucDNA have given contradictory results (Bazin et al. 2006; Mulligan et al. 2006; Piganeau and Eyre-Walker 2009), and the rates of relaxation of purifying selection in these two genomes have never before been compared on the same set of species.
In this work, we observe that the rates of accumulation of slightly deleterious mutations in mitDNA and nucDNA change in parallel. This indicates that the effective population sizes of the mitochondrial (Nemit) and nuclear (Nenuc) genomes are positively correlated. The parallel dynamics also imply that the relaxation of purifying selection with decreasing species-specific effective population size leads to proportional increases in the fraction of effectively neutral mutations in mitochondrial and nuclear genes, which suggests that the distributions of selection coefficients of de novo nonsynonymous mutations in mitDNA and nucDNA have similar shapes.
It has been proposed that mammalian species with low effective population size are more prone to extinction than species with high population size (Polishchuk 2002; Popadin et al. 2007). If extinction is associated with a gradual decrease in the population size of species with low Ne, we expect the ratio (Knmit/Ksmit)/(Knnuc/Ksnuc) in these species to be significantly higher, that is, the relative efficiency of purifying selection in mitDNA to be lower, than in species with stable population size. Knmit/Ksmit is more sensitive than Knnuc/Ksnuc to recent changes in population size because of the low Nemit and the correspondingly short coalescence time of mutations in mitochondrial genomes. Therefore, assuming that positive selection is infrequent in both genomes, the ratio (Knmit/Ksmit)/(Knnuc/Ksnuc) reflects the ratio of short-term to long-term purifying selection and allows us to infer the recent population dynamics of species. In our study, this ratio was approximately constant across all studied species, implying a similar, and most likely stable, recent demographic history. Thus, our study provides no evidence for the extinction of species with low Ne through a gradual decrease in their population size.
Highly Effective Purifying Selection in mitDNA
Mitochondrial genes are expected to undergo less effective purifying selection than nuclear genes because of a 4-fold lower effective population size (Palumbi et al. 2001; Lynch et al. 2006) and the absence of recombination. However, our results demonstrate the opposite: we observed significantly stronger purifying selection in mitDNA than in nucDNA (using Kn/Ks, charge-based Kr/Kc, and Grantham's distance) across all investigated mammalian species. We also estimated that de novo nonsynonymous substitutions in mitDNA are on average 5-fold more deleterious than those in nucDNA. Our results are compatible with a study reporting more effective selection in mammalian mitochondrial genes than in their proteobacterial orthologs (Mamirova et al. 2007) and with the fast elimination of deleterious mutations during mouse oogenesis (Fan et al. 2008; Shoubridge and Wai 2008; Stewart et al. 2008).
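A back-of-the-envelope sketch of why a 4-fold lower Ne need not mean weaker selection if mutations are about 5-fold more deleterious. It assumes Kimura's single-coefficient approximation for the substitution rate of a new mutation relative to a neutral one, S/(1 − e^(−S)), with S a population-scaled selection coefficient proportional to Ne·s. The nuclear baseline S = −2 is a hypothetical value, and a single S is a caricature of the full distribution of selection coefficients analyzed in the study; this is not the authors' estimation method.

```python
import math


def relative_fixation(S: float) -> float:
    """Substitution rate relative to neutral, S / (1 - exp(-S)).

    S is the population-scaled selection coefficient (negative for
    deleterious mutations). The limit at S = 0 is 1 (neutral).
    """
    if S == 0:
        return 1.0
    if S < -700:  # expm1 would overflow; fixation is effectively impossible
        return 0.0
    return S / -math.expm1(-S)


# Hypothetical nuclear baseline for the scaled selection coefficient.
S_NUC = -2.0
# From the text: mitochondrial Ne ~4-fold lower, s ~5-fold larger.
# Since S is proportional to Ne * s, S_mit = S_nuc * (1/4) * 5 = -2.5.
S_MIT = S_NUC * (1 / 4) * 5

kn_ks_nuc = relative_fixation(S_NUC)
kn_ks_mit = relative_fixation(S_MIT)
print(f"nuclear Kn/Ks: {kn_ks_nuc:.3f}, mitochondrial Kn/Ks: {kn_ks_mit:.3f}")
```

Because the 5-fold increase in s outweighs the 4-fold decrease in Ne, the product Ne·s is larger in mitDNA, and the predicted Kn/Ks is lower, consistent in direction with the observation above.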
There are three recent lines of evidence from human population genetic studies for stronger evolutionary constraints on mitochondrial genes than on nuclear genes. 1) In mitDNA, the ratio of polymorphic to fixed mutations (50–90%) is significantly higher than in nucDNA (8–28%) (Hasegawa et al. 1998; Subramanian 2011); 2) the ratio of nonsynonymous to synonymous polymorphisms is lower in human mitDNA than in nucDNA (Breen and Kondrashov 2010); and 3) there is an apparent discrepancy between the mitDNA mutation rate estimated from comparative species studies and that estimated from human pedigrees, and this discrepancy does not exist in nucDNA. The mutation rate in nucDNA is $1.0$–$1.2 \times 10^{-8}$ per generation per nt (observed in the genome sequences of two parent–offspring trios [Durbin et al. 2010]) and $2.5 \times 10^{-8}$ per generation per nt (human–chimpanzee comparison [Nachman and Crowell 2000]). In contrast to this similarity, there is a 20-fold difference between the mitDNA mutation rates estimated for the hypervariable regions of the mitochondrial control region: $5 \times 10^{-5}$ per generation per nt in mother–offspring lineages versus $2.4 \times 10^{-6}$ per generation per nt from comparative species analysis (Parsons et al. 1997; Howell et al. 2003). This discrepancy could be explained by strong purifying selection against nearly neutral mutations in the control region or by strong purifying selection against deleterious mutations that are linked to neutral ones.
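The fold differences behind point 3 can be checked directly from the rate estimates quoted above. This sketch only redoes that arithmetic; the variable names are illustrative.

```python
# Mutation-rate estimates quoted in the text (per generation per nt).
NUC_PEDIGREE = (1.0e-8, 1.2e-8)  # two parent-offspring trios (Durbin et al. 2010)
NUC_COMPARATIVE = 2.5e-8         # human-chimpanzee comparison (Nachman and Crowell 2000)
MIT_PEDIGREE = 5e-5              # mother-offspring lineages, control region
MIT_COMPARATIVE = 2.4e-6         # comparative species analysis, control region


def fold(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


# Nuclear: comparative estimate relative to the two pedigree estimates.
nuc_fold = (fold(NUC_COMPARATIVE, NUC_PEDIGREE[1]),
            fold(NUC_COMPARATIVE, NUC_PEDIGREE[0]))
# Mitochondrial: pedigree estimate relative to the comparative estimate.
mit_fold = fold(MIT_PEDIGREE, MIT_COMPARATIVE)

print(f"nucDNA: {nuc_fold[0]:.1f}- to {nuc_fold[1]:.1f}-fold")
print(f"mitDNA: {mit_fold:.1f}-fold")
```

The nuclear estimates agree to within roughly 2- to 2.5-fold, whereas the mitochondrial estimates differ by about 21-fold, which is the contrast the paragraph relies on.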
Potential Mechanisms of Effective Purifying Selection in Mitochondria
The effectively neutral theory cannot explain the strong purifying selection in mitDNA if only the absence of recombination and the low Ne of mitochondria are taken into account. Thus, additional mechanisms should be considered. These include 1) mitochondrial bottlenecks that alter the heteroplasmy level of deleterious mutations, facilitating negative selection (Bergstrom and Pritchard 1998; Shoubridge and Wai 2008); 2) effective haploidy, which is associated with stronger negative selection because there is no second allele to mask the effect of a mutation (Kondrashov and Crow 1991); 3) thousands of copies of mitDNA per somatic cell, with copy number linked to the oxidative phosphorylation activity of the cell (Fernández-Vizarra et al. 2011), which are associated with a high expression level and thus with high selective constraints on mitochondrially encoded proteins; and 4) extensive protein–protein interactions within the respiratory chain complexes, which can additionally constrain mitochondrial genes (Fraser et al. 2003).
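Mechanism 1 can be illustrated with a textbook binomial-sampling model of the germline bottleneck: a mother with mutant fraction p transmits n mitDNA copies to each offspring, so offspring heteroplasmy has variance p(1 − p)/n. A narrow bottleneck therefore produces more offspring with a high mutant load, exposing the mutation to selection. This is a simplified sketch, not the model of Bergstrom and Pritchard (1998); the maternal heteroplasmy of 0.2, the phenotypic threshold of 0.6, and the bottleneck sizes are hypothetical.

```python
import random
import statistics

P_MOTHER = 0.2    # hypothetical maternal fraction of mutant mitDNA
THRESHOLD = 0.6   # hypothetical heteroplasmy level at which a phenotype appears
N_OFFSPRING = 5000


def offspring_heteroplasmy(p, n_bottleneck, n_offspring, rng):
    """Mutant fraction in each offspring after sampling n copies (binomial)."""
    levels = []
    for _ in range(n_offspring):
        mutant = sum(rng.random() < p for _ in range(n_bottleneck))
        levels.append(mutant / n_bottleneck)
    return levels


def expected_variance(p, n):
    """Binomial sampling variance of the offspring mutant fraction."""
    return p * (1 - p) / n


rng = random.Random(1)
results = {}
for n in (10, 200):  # narrow vs. wide bottleneck (hypothetical sizes)
    levels = offspring_heteroplasmy(P_MOTHER, n, N_OFFSPRING, rng)
    above = sum(h >= THRESHOLD for h in levels) / len(levels)
    results[n] = (statistics.pvariance(levels), above)
    print(f"n={n}: variance={results[n][0]:.5f} "
          f"(expected {expected_variance(P_MOTHER, n):.5f}), "
          f"fraction above threshold={above:.4f}")
```

Only the narrow bottleneck produces offspring above the threshold, which is the sense in which bottlenecks can unmask deleterious mitochondrial mutations to selection.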
Evolutionarily, the mammalian mitochondrial genome could be compared with the nonrecombining regions of the Y chromosome, because both lack recombination and have a low Ne. Thus, the Y chromosome could serve as an analog for the evolutionary processes of the mitochondrial genome. During the evolution of the ancestral Y chromosome, there was considerable gene loss (Charlesworth 2003), but because of different "survival" rates, critically important genes with low Kn/Ks are over-represented in derived Y chromosomes (Bachtrog et al. 2008; Chibalina and Filatov 2011). The same process plays an important role in the evolution of bacterial genomes (Mira et al. 2001). We suggest a similar process in the mitDNA, in which the majority of ancestral genes were eliminated (either degraded or transferred to the nucDNA) and only highly conserved genes have remained.
Indeed, only genes encoding core subunits of the respiratory
chain complexes have remained in the mitDNA (Scheffler
1999). This concentration of critically important genes
during the evolution of the mitDNA has two consequences: the average deleterious effect (selection coefficient s) of de novo mutations increases, and the number of new slightly deleterious mutations per genome per generation (u) decreases because the genome has been minimized. Both trends (high s and low u) work against detrimental mechanisms such as Muller's ratchet and background selection (Bachtrog 2008). The third potentially detrimental process, the fixation of deleterious mutations by genetic hitchhiking, can be important even in short nonrecombining regions (Bachtrog 2008). However, there is no evidence of frequent positive selection in the mitDNA of mammals (Mamirova et al. 2007), and therefore we do not expect massive accumulation of deleterious mutations due to hitchhiking.
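One way to see why high s and low u slow Muller's ratchet is the classical deterministic approximation (Haigh's) for the number of mutation-free genomes at mutation–selection balance, n0 = N·exp(−u/s): the smaller n0, the more easily drift loses this least-loaded class and the faster the ratchet turns. The sketch below is illustrative only; N and the (u, s) pairs are hypothetical values, not estimates for mammalian mitDNA.

```python
import math


def least_loaded_class(N: float, u: float, s: float) -> float:
    """Equilibrium number of genomes with no deleterious mutations.

    Deterministic approximation n0 = N * exp(-u / s), where u is the
    deleterious mutation rate per genome per generation and s the
    selection coefficient against each mutation.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    return N * math.exp(-u / s)


N = 10_000  # hypothetical effective number of mitochondrial genomes

# Hypothetical (u, s) pairs: baseline, higher s, then higher s and lower u.
for u, s in [(0.1, 0.01), (0.1, 0.05), (0.02, 0.05)]:
    print(f"u={u}, s={s}: n0 = {least_loaded_class(N, u, s):.1f}")
```

Raising s and lowering u each enlarge the mutation-free class by orders of magnitude in this toy setting, which matches the qualitative argument in the paragraph above.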
Conclusion
Our observations reveal that the distributions of selection coefficients of de novo mutations in mitDNA and nucDNA have a similar shape but different means; on average, mitDNA mutations are five times more harmful than nucDNA mutations. This shift of the selection coefficients of mitochondrial de novo nonsynonymous mutations toward more deleterious values may be partially due to the elimination of nonessential genes from the mitochondrial genome during evolution, because only highly constrained genes can survive on a nonrecombining genome with a low effective population size (Mira et al. 2001; Bachtrog et al. 2008; Chibalina and Filatov 2011). Other reasons for this shift could be 1) mitochondrial bottlenecks, 2) effective haploidy, 3) multiple copies of mitDNA per cell, 4) a high level of expression of mitochondrial genes, and 5) multiple protein–protein interactions of mitochondrially encoded subunits. Altogether, these traits increase or unmask the deleterious effects of de novo nonsynonymous mutations in the mitochondrial genome, providing a mechanism for effective purifying selection and maintaining the fitness of the mammalian mitochondrial genome.
Supplementary Material
Supplementary figure S1 is available at Molecular Biology and
Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to S. Kryazhimskiy, S. Subramanian,
G. Bazykin, D. Filatov, and M. Lynch for constructive criticism.
This work was supported by EMBO long-term fellowship program ALTF 527-2010 and RFBR 10-04-00276-‘ grants to K.Y.P.
and by the Swiss National Science Foundation to S.E.A.
References
Aoyama H, Muramoto K, Shinzawa-Itoh K, Hirata K, Yamashita E,
Tsukihara T, Ogura T, Yoshikawa S. 2009. A peroxide bridge between
Fe and Cu ions in the O2 reduction site of fully oxidized cytochrome
c oxidase could suppress the proton pump. Proc Natl Acad Sci U S A.
106:2165–2169.
Bachtrog D. 2008. The temporal dynamics of processes underlying Y
chromosome degeneration. Genetics 179:1513–1525.
Bachtrog D, Hom E, Wong KM, Maside X, de Jong P. 2008. Genomic
degradation of a young Y chromosome in Drosophila miranda.
Genome Biol. 9:R30.
Bazin E, Glémin S, Galtier N. 2006. Population size does not influence
mitochondrial genetic diversity in animals. Science 312:570–572.
Bergstrom CT, Pritchard J. 1998. Germline bottlenecks and the evolutionary maintenance of mitochondrial genomes. Genetics 149:
2135–2146.
Birney E, Stamatoyannopoulos JA, Dutta A, et al. (311 co-authors). 2007.
Identification and analysis of functional elements in 1% of the
human genome by the ENCODE pilot project. Nature 447:799–816.
Breen MS, Kondrashov FA. 2010. Mitochondrial pathogenic mutations
are population-specific. Biol Direct. 5:68.
Chang C-W, Cheng W-C, Chen C-R, Shu W-Y, Tsai M-L, Huang C-L, Hsu IC. 2011. Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PLoS One 6:e22859.
Chao L, Carr DE. 1993. The molecular clock and the relationship between population size and generation time. Evolution 47:688–690.
Charlesworth B. 2003. The organization and evolution of the human Y
chromosome. Genome Biol. 4:226.
Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.
Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:
1289–1303.
Chibalina MV, Filatov DA. 2011. Plant Y chromosome degeneration is
retarded by haploid purifying selection. Curr Biol. 21:1475–1479.
Chinnery PF, Thorburn DR, Samuels DC, White SL, Dahl HM, Turnbull
DM, Lightowlers RN, Howell N. 2000. The inheritance of mitochondrial DNA heteroplasmy: random drift, selection or both? Trends
Genet. 16:500–505.
de Magalhães JP, Budovsky A, Lehmann G, Costa J, Li Y, Fraifeld V,
Church GM. 2009. The Human Ageing Genomic Resources: online
databases and tools for biogerontologists. Aging Cell. 8:65–72.
Durbin RM, Altshuler DL, Abecasis GR, et al. (362 co-authors). 2010. A
map of human genome variation from population-scale sequencing.
Nature 467:1061–1073.
ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia Of
DNA Elements) Project. Science 306:636–640.
Fan W, Waymire KG, Narula N, Li P, Rocher C, Coskun PE, Vannan MA,
Narula J, Macgregor GR, Wallace DC. 2008. A mouse model of
mitochondrial disease reveals germline selection against severe
mtDNA mutations. Science 319:958–962.
Felsenstein J. 1974. The evolutionary advantage of recombination.
Genetics 78:737–756.
Fernández-Vizarra E, Enríquez JA, Pérez-Martos A, Montoya J,
Fernández-Silva P. 2011. Tissue-specific differences in mitochondrial
activity and biogenesis. Mitochondrion 11:207–213.
Fraser HB, Wall DP, Hirsh AE. 2003. A simple dependence between
protein evolution rate and the number of protein-protein interactions. BMC Evol Biol. 3:11.
Fujita PA, Rhead B, Zweig AS, et al. (27 co-authors). 2011. The UCSC
Genome Browser database: update 2011. Nucleic Acids Res. 39:
D876–D882.
Gabriel W, Bürger R, Lynch M. 1991. Population extinction by mutational load and demographic stochasticity. In: Seitz A, Loeschcke V, editors. Species Conservation: A Population-Biological Approach. Basel (Switzerland): Birkhäuser Verlag. p. 49–59.
Gillespie JH. 2000. Genetic drift in an infinite population. The pseudohitchhiking model. Genetics 155:909–919.
Gordo I, Navarro A, Charlesworth B. 2002. Muller's ratchet and the pattern of variation at a neutral locus. Genetics 161:835–848.
Grantham R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862–864.
Guindon S, Delsuc F, Dufayard J-F, Gascuel O. 2009. Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol. 537:
113–137.
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. 2005.
Online Mendelian Inheritance in Man (OMIM), a knowledgebase of
human genes and genetic disorders. Nucleic Acids Res. 33:
D514–D517.
Hasegawa M, Cao Y, Yang Z. 1998. Preponderance of slightly deleterious
polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species.
Mol Biol Evol. 15:1499–1505.
Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial
selection. Genet Res. 8:269–294.
Howell N, Smejkal CB, Mackey DA, Chinnery PF, Turnbull DM,
Herrnstadt C. 2003. The pedigree rate of sequence divergence in
the human mitochondrial genome: there is a difference between
phylogenetic and pedigree rates. Am J Hum Genet. 72:659–670.
Jenuth JP, Peterson AC, Fu K, Shoubridge EA. 1996. Random genetic drift
in the female germline explains the rapid segregation of mammalian
mitochondrial DNA. Nat Genet. 14:146–151.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge
(UK): Cambridge University Press.
Kondrashov AS, Crow JF. 1991. Haploidy or diploidy: which is better?
Nature 351:314–315.
Li Y-D, Xie Z-Y, Du Y-L, Zhou Z, Mao X-M, Lv L-X, Li Y-Q. 2009.
The rapid evolution of signal peptides is mainly caused by relaxed
selection on non-synonymous and synonymous sites. Gene 436:
8–11.
Lynch M. 1997. Mutation accumulation in nuclear, organelle, and prokaryotic transfer RNA genes. Mol Biol Evol. 14:914–925.
Lynch M, Blanchard JL. 1998. Deleterious mutation accumulation in
organelle genomes. Genetica 102–103:29–39.
Lynch M, Bürger R, Butcher D, Gabriel W. 1993. The mutational meltdown in asexual populations. J Hered. 84:339–344.
Lynch M, Koskella B, Schaack S. 2006. Mutation pressure and the
evolution of organelle genomic architecture. Science 311:1727–1730.
Mamirova L, Popadin K, Gelfand MS. 2007. Purifying selection in mitochondria, free-living and obligate intracellular proteobacteria. BMC
Evol Biol. 7:17.
Margulies EH, Cooper GM, Asimenos G, et al. (77 co-authors). 2007.
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17:
760–774.
Mira A, Ochman H, Moran NA. 2001. Deletional bias and the evolution
of bacterial genomes. Trends Genet. 17:589–596.
Mulligan CJ, Kitchen A, Miyamoto MM. 2006. Comment on “Population
size does not influence mitochondrial genetic diversity in animals.”
Science 314:1390.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O’Brien SJ. 2001.
Molecular phylogenetics and the origins of placental mammals.
Nature 409:614–618.
Nachman MW, Crowell SL. 2000. Estimate of the mutation rate per
nucleotide in humans. Genetics 156:297–304.
Nikolaev SI, Montoya-Burgos JI, Popadin K, Parand L, Margulies EH,
Antonarakis SE. 2007. Life-history traits drive the evolutionary
rates of mammalian coding and noncoding genomic elements.
Proc Natl Acad Sci U S A. 104:20443–20448.
Osada N, Akashi H. 2012. Mitochondrial-nuclear interactions and accelerated compensatory evolution: evidence from the primate cytochrome C oxidase complex. Mol Biol Evol. 29:337–346.
Palumbi SR, Cipriano F, Hare MP. 2001. Predicting nuclear gene coalescence from mitochondrial data: the three-times rule. Evolution 55:
859–868.
Parsons TJ, Muniec DS, Sullivan K, et al. (11 co-authors). 1997. A high
observed substitution rate in the human mitochondrial DNA control region. Nat Genet. 15:363–368.
Piganeau G, Eyre-Walker A. 2009. Evidence for variation in the effective population size of animal mitochondrial DNA. PLoS One 4:e4396.
Polishchuk LV. 2002. Ecology. Conservation priorities for Russian mammals. Science 297:1123.
Popadin K, Polishchuk LV, Mamirova L, Knorre D, Gunbin K. 2007.
Accumulation of slightly deleterious mutations in mitochondrial
protein-coding genes of large versus small mammals. Proc Natl
Acad Sci U S A. 104:13390–13395.
R Development Core Team. 2012. R: a language and environment for
statistical computing. Vienna: R Foundation for Statistical
Computing.
Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D,
Yi C, Kreuziger J, Baldi P, Wallace DC. 2007. An enhanced MITOMAP
with a global mtDNA mutational phylogeny. Nucleic Acids Res. 35:
D823–D828.
Scheffler IE. 1999. Mitochondria. New York: Wiley-Liss.
Shoubridge EA, Wai T. 2008. Medicine. Sidestepping mutational meltdown. Science 319:914–915.
Stewart JB, Freyer C, Elson JL, Wredenberg A, Cansu Z, Trifunovic A, Larsson N-G. 2008. Strong purifying selection in transmission of mammalian mitochondrial DNA. PLoS Biol. 6:e10.
Subramanian S. 2011. High proportions of deleterious polymorphisms in
constrained human genes. Mol Biol Evol. 28:49–52.
Taylor WR. 1986. The classification of amino acid conservation. J Theor
Biol. 119:205–218.
Wai T, Teoli D, Shoubridge EA. 2008. The mitochondrial DNA genetic
bottleneck results from replication of a subpopulation of genomes.
Nat Genet. 40:1484–1488.
Yampolsky LY, Kondrashov FA, Kondrashov AS. 2005. Distribution of
the strength of selection against amino acid replacements in human
proteins. Hum Mol Genet. 14:3191–3201.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput Appl Biosci. 13:555–556.
Yuan Y, Xu Y, Xu J, Ball RL, Liang H. 2012. Predicting the lethal phenotype of the knockout mouse by integrating comprehensive genomic
data. Bioinformatics 28:1246–1252.
Zhang J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J Mol Evol. 50:
56–68.