Purifying Selection in Mammalian Mitochondrial Protein-Coding Genes Is Highly Effective and Congruent with Evolution of Nuclear Genes

Konstantin Yu Popadin,*,†,1 Sergey I. Nikolaev,†,1 Thomas Junier,1 Maria Baranova,2 and Stylianos E. Antonarakis*,1
1 Department of Genetic Medicine and Development, University of Geneva Medical School and iGE3 Institute of Genetics and Genomics of Geneva, Switzerland
2 Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Leninskiye Gory, Moscow, Russia
† These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected]; [email protected].
Associate editor: Koichiro Tamura

© The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 30(2):347–355 doi:10.1093/molbev/mss219 Advance Access publication September 14, 2012

Abstract

Mammalian mitochondrial genomes differ from nuclear genomes by maternal inheritance, absence of recombination, and a higher mutation rate. All these differences decrease the effective population size of the mitochondrial genome and make it more susceptible to the accumulation of slightly deleterious mutations. It has been hypothesized that mitochondrial genes, especially in species with low effective population size, irreversibly degrade, leading to a decrease of organismal fitness and even to extinction of species through mutational meltdown. To test this hypothesis, we compared the purifying selection acting on a representative set of mitochondrial (potentially degrading) and nuclear (potentially not degrading) protein-coding genes in species with different effective population sizes. For 21 mammalian species, we calculated the rate of accumulation of slightly deleterious mutations, approximated by Kn/Ks, separately for mitochondrial and nuclear genomes. Seventy-five percent of the variation in Kn/Ks is explained by two independent variables: the type of genome (mitochondrial or nuclear) and the effective population size of the species, approximated by generation time. First, we observed that purifying selection is more effective in mitochondria than in the nucleus, which implies strong evolutionary constraints on the mitochondrial genome. Mitochondrial de novo nonsynonymous mutations are at least 5-fold more harmful than nuclear ones. Second, Kn/Ks of mitochondrial and nuclear genomes is positively correlated with the generation time of species, indicating relaxation of purifying selection with a decrease of species-specific effective population size. Most importantly, the linear regression lines of mitochondrial and nuclear Kn/Ks on generation time are parallel, indicating congruent relaxation of purifying selection in both genomes. Thus, our results reveal that the distribution of selection coefficients of de novo nonsynonymous mitochondrial mutations has a similar shape to the distribution for de novo nonsynonymous nuclear mutations, but its mean is five times smaller. The harmful effect of mitochondrial de novo nonsynonymous mutations triggers highly effective purifying selection, which maintains the fitness of the mammalian mitochondrial genome.

Key words: slightly deleterious mutations, purifying selection, effective population size, mitochondrial genome.

Introduction

The fitness of any mammalian species is genetically determined by the variation in nuclear (nucDNA) and mitochondrial (mitDNA) genomes. Thus, deleterious mutations in both nuclear (Hamosh et al. 2005) and mitochondrial (Ruiz-Pesini et al. 2007) genomes are eliminated by purifying selection to maintain population fitness. However, these genomes differ in mutation rate, effective population size, and level of recombination, which influence the rate of elimination of mutations by natural selection, so that the mitDNA could be more susceptible to the accumulation of deleterious mutations. The maternal inheritance of mitDNA prevents interparental recombination, and the mitochondrial bottlenecks in oogenesis (Wai et al. 2008) decrease the number of mitochondrial genomes inherited from the mother to offspring, which results in a four times smaller effective population size of mitDNA when compared with nucDNA (Palumbi et al. 2001; Lynch et al. 2006). The absence of recombination may cause genetic processes such as the Hill–Robertson effects (background selection and selective sweeps) (Hill and Robertson 1966; Charlesworth 2009) and Muller’s ratchet (the irreversible accumulation of slightly deleterious mutations) (Felsenstein 1974; Gordo et al. 2002). The low effective population size increases the power of both processes, which together decrease the population’s fitness and can lead to the extinction of species through mutational meltdown (Gabriel et al. 1991; Lynch et al. 1993). Mutational meltdown is a process of extinction of small populations through a positive feedback loop between population size and genetic drift: small population size leads to high genetic drift, which results in accumulation of deleterious mutations and a decrease of fitness, which in turn decreases population size. The mutation rate in the mitochondrial genomes of mammals is approximately 25-fold higher than in the nuclear genomes (Lynch et al. 2006). This strong mutation pressure can additionally speed up the rates of genetic drift, Muller’s ratchet, and mutational meltdown and decrease mitDNA effective population size due to the linkage of neutral alleles with deleterious (Charlesworth et al. 1993) or favorable (Gillespie 2000) mutations.
The efficiency of purifying selection in mitDNA is intensely debated. On one hand, several studies in line with theoretical expectations have demonstrated that purifying selection in the mitDNA (tRNA, rRNA, and protein-coding genes) is less effective than that in the nucDNA (Lynch 1997; Lynch and Blanchard 1998). Furthermore, it has been observed in human pedigrees that selection against pathogenic mitDNA mutations is weak or even nonexistent (Jenuth et al. 1996; Chinnery et al. 2000). On the other hand, recent studies suggest strong purifying selection in mitDNA: 1) purifying selection on mammalian mitochondrial protein-coding genes is more effective than that on orthologous genes in proteobacteria (with large population size and presence of recombination) (Mamirova et al. 2007); 2) a severe mitDNA mutation (a frameshift in the nad6 gene) that was introduced into mice disappeared after four generations, demonstrating effective purifying selection during oogenesis (Fan et al. 2008; Shoubridge and Wai 2008); and 3) using a mitDNA mutator strain of mice with a proofreading-deficient mitDNA polymerase, a rapid and strong elimination of nonsynonymous changes in protein-coding genes was observed during six generations (Stewart et al. 2008).
Here, we use a set of 21 mammalian species with sequenced mitochondrial and nuclear genomes to study the controversial issue of purifying selection in mitDNA. We estimated the rate of accumulation of slightly deleterious mutations, approximated by Kn/Ks, for both genomes (mitochondrial and nuclear) of each species. To reveal potential mitochondria-specific detrimental genetic processes, we contrasted purifying selection in mitochondrial (potentially degrading) and nuclear (not degrading) genomes across mammalian species with different effective population sizes.
We found that 1) selection in mitDNA is always more effective than in nucDNA, 2) selection in mitDNA and nucDNA relaxes in parallel as the population size of species decreases, and 3) the relative efficiency of purifying selection in mitDNA of species with low population size does not decrease when compared with species with high population size. Overall, our results provide evidence against mitochondria-specific detrimental genetic processes in mammalian species.

Materials and Methods

Species Studied and Genomic Sequence Alignments

Pilot ENCODE Data Set
The 21 species studied and listed below represent all major clades of eutherian, metatherian, and prototherian mammals: Loxodonta africana (African elephant), Procavia capensis (rock hyrax), Echinops telfairi (tenrec), Cavia porcellus (guinea pig), Mus musculus (mouse), Oryctolagus cuniculus (rabbit), Rattus norvegicus (rat), Felis catus (cat), Bos taurus (cow), Canis lupus familiaris (dog), Equus caballus (horse), Monodelphis domestica (monodelphis), Pan troglodytes (chimpanzee), Colobus guereza (colobus monkey), Homo sapiens (human), Macaca mulatta (macaque), Pongo abelii (orangutan), Tupaia belangeri (tree shrew), Chlorocebus aethiops (vervet), Ornithorhynchus anatinus (platypus), and Dasypus novemcinctus (armadillo). The alignments of coding sequences (CDS) from the nuclear genomes were extracted from the ENCODE pilot project regions. We used the TBA alignments generated by the Multispecies Sequence Alignment group; these alignments cover 30 Mb of human genomic DNA (Margulies et al. 2007). The pilot ENCODE regions were selected in a semirandom way: 30 regions were selected randomly and 14 were selected because of biological interest or because they have been extensively studied (ENCODE Project Consortium 2004; Birney et al. 2007). The CDS alignment was created by an in-frame concatenation of the longest transcript per gene.
To filter out the misaligned sequences for each exon alignment, the branch length was estimated using the phyml program (Guindon et al. 2009), and the tree topology was adopted from Murphy et al. (2001). Misaligned exons were detected by very long branch lengths (>0.36 substitutions per site) and substituted by missing data symbols. Subsequently, exons were assembled into transcripts to maintain the ORFs in the human sequence. Only the 156 transcripts that maintain an ORF in all species were kept. To account for insufficiently represented codons, only those codon positions were kept where data are present in at least one representative species of each mammalian clade. For the mitochondrial genomes, all 13 protein-coding sequences with a total length of 3,882 codons were studied; for the nuclear genomes, a total of 15,687 codons (from the original input of 210,984 nts) from the pilot ENCODE regions were used.

COX Data Set
Fourteen species were used for the COX gene data set (Equus caballus, Sus scrofa, Bos taurus, Canis lupus familiaris, Ailuropoda melanoleuca, Rattus norvegicus, Mus musculus, Cavia porcellus, Callithrix jacchus, Macaca mulatta, Pongo abelii, Pan troglodytes, Homo sapiens, and Loxodonta africana). The COX data set included 802 codons from the nuclear genes and 1,008 from the mitochondrial genes. The fast-evolving signal peptides (Li et al. 2009) were removed from the nuclear-coded COX genes. The signal peptides were identified using amino acid sequences from the X-ray structure of bovine heart cytochrome c oxidase (protein database ID: 2zxw) (Aoyama et al. 2009).

Data Set of ESSENTIAL Genes
The data set of essential nuclear genes was based on the multiple species genomic alignment from the University of California–Santa Cruz (UCSC) genome browser (phastCons46way) (Fujita et al. 2011). We selected the CDS of all genes that are present in at least one species of each vertebrate clade (Euarchonta, Glires, Laurasiatheria, Afrotheria, Xenarthra, Marsupiales, Monotremata, Archosauria, Amphibia, Actinopterygii, and Petromyzontiformes). From this data set, we extracted the subset of genes that are human housekeeping genes (Chang et al. 2011) and that are associated with a lethal knockout phenotype in mice (Yuan et al. 2012). The resulting data set includes 34 nuclear genes: ABL1, ACVR1, BUB3, CANX, COBRA1, COPS5, CSNK1D, CUL3, CYCS, DPAGT1, EGLN1, EIF6, FTH1, HIF1A, HMGB1, KRAS, KRT10, MAP3K7, PCNA, PDIA3, PITPNB, PPP2CA, PRKAR1A, PTGES3, RAC1, SHOC2, SMAD2, TARDBP, TBP, TCEA1, TPT1, UBE2A, UBE2N, and YY1.
In this work, we did not exclude the CpG-prone sites from the sequences of the nuclear genome, to retain more information in the alignments. In our previous work, all the results were robust to the exclusion of CpG-prone sites (Nikolaev et al. 2007).
The estimated generation time for each species was taken from the AnAge database (de Magalhães et al. 2009). The R environment was used for all statistical analyses (R Development Core Team 2012).

Sequence Analysis
For inferring trees based on synonymous (Ks) or nonsynonymous (Kn) substitutions, we used the codeml program of PAML (Yang 1997) with a codon model specifying different transition/transversion rate ratios and different nucleotide frequencies for each codon position, without gene partitioning, and imposing the topology adopted from Murphy et al. (2001) (runmode = 0). All sites with ambiguity characters or missing data were excluded (cleandata = 1).

Ratio of Radical to Conservative Amino Acid Substitutions (Kr/Kc)
The ratio of the rates of radical over conservative substitutions (Kr/Kc) was estimated by comparison of the nucleotide sequences of extant animals with the nucleotide sequences of their most recent reconstructed ancestors as described in Zhang (2000).
The ancestral nucleotide sequences were reconstructed using Yang’s (1997) method implemented in PAML. Because Kr and Kc values were small (<0.3), the Jukes–Cantor formula was used to correct for multiple hits; thus, our estimated Kr/Kc ratio is identical to the dR/dC ratio of Zhang (2000). The 20 amino acids were classified into groups under four schemes: by volume, by charge, by polarity, and by both polarity and volume (Taylor 1986). Amino acid substitutions within groups (i.e., when ancestral and modern amino acids in homologous sites belong to the same group) were regarded as conservative, whereas those between groups were regarded as radical.

Average Grantham Distance
To measure amino acid dissimilarity, we computed an average physicochemical distance between modern species and their most recent reconstructed ancestors. The distance between each ancestral and derived amino acid was taken from Grantham’s matrix (Grantham 1974) and averaged over all pairs of substitutions for a given external branch.

Evaluation of Distribution of Selection Coefficient of De Novo Nonsynonymous mitDNA Mutations
Because Nemit = 0.25 Nenuc (Palumbi et al. 2001), we expect that the fraction of effectively neutral mutations should be higher for the mitochondrial than for the nuclear genome (according to the scheme in fig. 3, it should be approximately 20%, compared with 10% for the nucleus). However, for mitochondrial data, we observe 5% of fixed slightly deleterious mutations (Kn/Ks ≈ 0.05), not 20%. To fit this value, we had to change the mean of the density distribution of selection coefficients of mitochondrial de novo nonsynonymous mutations and make it one order of magnitude more harmful (from $1 \times 10^{-3}$ to $1 \times 10^{-2}$; the green curve in fig. 3).
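The expectation described in the paragraph above can be illustrated with a short numerical sketch. It assumes, purely for illustration, that log10|s| of de novo nonsynonymous mutations is normally distributed with mean −3 and standard deviation 1.6. Both parameters are hypothetical stand-ins chosen to resemble the nuclear distribution; they are not the fitted values behind fig. 3. A mutation is treated as effectively neutral when |s| < 0.5/Ne, with Nenuc = 5 × 10^4 and Nemit = 0.25 Nenuc as stated in the text.

```python
from math import log10
from statistics import NormalDist

NE_NUC = 5e4             # long-term nuclear Ne quoted in the text
NE_MIT = 0.25 * NE_NUC   # Nemit = 0.25 Nenuc, as in the text

# Hypothetical parameters of the distribution of log10|s| (assumptions,
# not the authors' fitted values).
ASSUMED_MEAN_LOG10_S = -3.0
ASSUMED_SD_LOG10_S = 1.6


def effectively_neutral_fraction(ne, mean_log10_s, sd_log10_s=ASSUMED_SD_LOG10_S):
    """Fraction of mutations with |s| < 0.5 / ne under a normal law on log10|s|."""
    threshold = log10(0.5 / ne)
    return NormalDist(mean_log10_s, sd_log10_s).cdf(threshold)


# Nuclear genome, unshifted distribution.
nuc = effectively_neutral_fraction(NE_NUC, ASSUMED_MEAN_LOG10_S)
# Mitochondrial genome if it shared the nuclear distribution.
mit_same = effectively_neutral_fraction(NE_MIT, ASSUMED_MEAN_LOG10_S)
# Mitochondrial genome with the mean made one order of magnitude more harmful.
mit_shifted = effectively_neutral_fraction(NE_MIT, ASSUMED_MEAN_LOG10_S + 1.0)

print(f"nuclear: {nuc:.3f}, mitochondrial (same): {mit_same:.3f}, "
      f"mitochondrial (shifted): {mit_shifted:.3f}")
```

Under these assumed parameters the three fractions come out at roughly 10%, 20%, and 7%, which reproduces the qualitative pattern described in the text; the exact values depend on the assumed mean and standard deviation.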
Results

For both the mitochondrial and the nuclear genomes of each of 21 mammalian species (hereafter pilot ENCODE data set, see Materials and Methods), two evolutionary metrics were estimated: 1) the rate of accumulation of slightly deleterious mutations, approximated by Kn/Ks, and 2) the amino acid dissimilarity between modern species and their last reconstructed ancestor, approximated by Kr/Kc ratios and the average Grantham distance.

Rates of Accumulation of Slightly Deleterious Mutations
The efficiency of purifying selection was approximated by the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site in the mitDNA (Knmit/Ksmit) and nucDNA (Knnuc/Ksnuc) on the terminal branches of the tree for all 21 mammalian species. The Kn/Ks in mitDNA was about half that in the nucDNA (with averages Knmit/Ksmit = 0.047 and Knnuc/Ksnuc = 0.108 among all 21 tested mammals, P < 0.001, Mann–Whitney U test).
The ENCODE phylogenetic tree contains both closely and distantly related species, which can introduce biases in the estimation of Kn/Ks due to the saturation of synonymous substitutions per synonymous site (Ks) on long branches. All nuclear Ks values were smaller than 0.5, suggesting a small effect of saturation. However, the mitochondrial Ks values range from 0.19 to 2.94, and thus, some of them are affected by saturation. To test the robustness of our results, we performed the same analysis on a subset of closely related species (primates), which are most likely not biased by the saturation effect because of their small Ksmit values (<0.6). The analysis of six primate species confirmed more effective purifying selection in mitDNA than in the nucDNA (Knmit/Ksmit = 0.079 and Knnuc/Ksnuc = 0.133, P = 0.03). To test the accuracy of estimation of Ksmit on the full data set, we investigated the association between Ksnuc and Ksmit.
The observed relationship between Ksnuc and Ksmit is well described by linear regression through the origin (Ksnuc = 0.10 Ksmit, $P = 3.6 \times 10^{-13}$, $R^2 = 0.93$, supplementary fig. S1, Supplementary Material online). This indicates an accurate reconstruction of the number of synonymous mutations per synonymous site in the mitDNA despite the saturation effect. A linear relationship was also observed between Knnuc and Knmit (Knnuc = 0.23 Knmit, $P = 1.5 \times 10^{-10}$, $R^2 = 0.88$, supplementary fig. S1, Supplementary Material online).

FIG. 1. Relationships between Kn/Ks of nuclear genes and generation time (black regression lines) and between Kn/Ks of mitochondrial genes and generation time (gray regression lines) for 21 mammalian species from the ENCODE data set (x axis: generation time, GT, in days; y axis: Kn/Ks). The list of species, sorted according to their generation times (days of female maturity), is as follows: mouse (42), tree shrew (60), guinea pig (66), rat (90), monodelphis (122), cat (289), tenrec (365), armadillo (365), rock hyrax (500), dog (510), cow (548), platypus (548), rabbit (730), horse (914), vervet (1,034), macaque (1,231), colobus monkey (1,461), chimpanzee (3,376), African elephant (4,018), orangutan (4,493), and human (4,745).

To evaluate the influence of the effective population size of each species on purifying selection in mitDNA and nucDNA, we performed a comparative analysis of variation in Kn/Ks. Because the generation time of species (GT) inversely correlates with population size (Ne) (Chao and Carr 1993), we correlated GT with Knmit/Ksmit and Knnuc/Ksnuc of the 21 mammals studied.
We have observed highly significant linear regressions between Kn/Ks and GT for both genomes (Knmit/Ksmit = $0.037 + 0.88 \times 10^{-5}$ GT, $R^2 = 0.31$, P = 0.0089; Knnuc/Ksnuc = $0.094 + 1.18 \times 10^{-5}$ GT, $R^2 = 0.31$, P = 0.0088, where GT is the age of female maturity of each species in days) (fig. 1). Furthermore, the ratio (Knmit/Ksmit)/(Knnuc/Ksnuc), which determines the relative efficiency of purifying selection in mitDNA, is remarkably constant across all studied species and does not demonstrate significant regression with generation time (P = 0.2). These results indicate that when Ne of species decreases, the relaxation of purifying selection is similar in nucDNA and mitDNA (see the same slopes of the linear regression lines in fig. 1).
To compare the relative efficiency of purifying selection in mitDNA (Knmit/Ksmit)/(Knnuc/Ksnuc) between species with low effective population size versus species with high effective population size, we divided our species into two groups by GT median value (548 days). Comparison of relative efficiencies of purifying selection in mitDNA (Knmit/Ksmit)/(Knnuc/Ksnuc) between these two groups did not demonstrate a significant difference (P = 0.1, Mann–Whitney U test).
To estimate the fraction of variation in Kn/Ks explained by both the GT and the type of genome G (mitDNA G = 0 and nucDNA G = 1), we applied a multiple linear regression model. Our results are shown in the equation as follows:

$$\log(K_n/K_s) = -4.3 + 1.34\,G + 0.18\,\log(GT) - 0.07\,[G \times \log(GT)] \qquad (1)$$

$$R^2 = 0.747, \quad P = 1.9 \times 10^{-11}$$

We have thus observed that 75% of variation in Kn/Ks is explained by these two variables. The P values corresponding to G and GT are highly significant (0.0032 and 0.0004, respectively), whereas the interaction of variables (G × GT) is not significant (P = 0.28); these results confirm the analysis shown earlier.
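As a rough consistency check, the simple and multiple regressions reported above can be evaluated side by side. The sketch below rests on several assumptions that are not stated explicitly in the text: that the logarithms in equation 1 are natural logarithms, that the intercept and the interaction coefficient are negative (−4.3 and −0.07), and that the slopes of the simple regressions carry a factor of 10^-5. The function names are illustrative only.

```python
from math import exp, log

# Coefficients as read from the text; signs and exponents are assumptions
# (see the lead-in paragraph).
SIMPLE_FITS = {"mit": (0.037, 0.88e-5), "nuc": (0.094, 1.18e-5)}


def kn_ks_simple(gt_days, genome):
    """Kn/Ks predicted by the per-genome simple linear regression on GT."""
    intercept, slope = SIMPLE_FITS[genome]
    return intercept + slope * gt_days


def kn_ks_multiple(gt_days, genome):
    """Kn/Ks predicted by equation 1, with G = 0 for mitDNA and G = 1 for nucDNA."""
    g = 1 if genome == "nuc" else 0
    log_gt = log(gt_days)  # natural logarithm (assumed)
    return exp(-4.3 + 1.34 * g + 0.18 * log_gt - 0.07 * g * log_gt)


median_gt = 548  # median generation time in days, from the text
for genome in ("mit", "nuc"):
    print(genome,
          round(kn_ks_simple(median_gt, genome), 3),
          round(kn_ks_multiple(median_gt, genome), 3))
```

With these assumed signs, both models give Kn/Ks close to 0.04 for mitDNA and close to 0.10 for nucDNA at the median generation time, in line with the reported averages of 0.047 and 0.108.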
To evaluate the generality of the observed difference in purifying selection between mitDNA and nucDNA, we performed the Kn/Ks analysis not only for the semirandomly selected nuclear genes (pilot ENCODE data set) but also for a set of genes encoding subunits of complex IV (COX) of the respiratory chain (hereafter COX data set), which is functionally related to mitochondria. The COX complex consists of 13 subunits, 3 of which are coded in the mitochondrial and 10 in the nuclear genome (Scheffler 1999). Because this complex represents one integral structure, the two genomes should coevolve (Osada and Akashi 2012) to maintain the structural and functional properties of this complex. The mitochondria-encoded COX genes had 15 times smaller Kn/Ks when compared with nuclear-encoded COX genes (median value of Knmit/Ksmit is 0.014 and 0.222 for Knnuc/Ksnuc, $P = 1 \times 10^{-4}$, paired Mann–Whitney U test, fig. 2). This difference is substantially larger than that observed on the pilot ENCODE data set. Linear regressions between Kn/Ks and GT in both genomes were not significant (P > 0.16); this could be due to the small sample size of species (14) and the small number of studied genes (3 in mitochondria and 10 in the nucleus). However, one-sided Kendall’s rank correlations demonstrated the expected positive trends between GT and Kn/Ks for both mitochondrial (Kendall’s tau = 0.45, P value = 0.01) and nuclear genes (Kendall’s tau = 0.52, P value = 0.005).
The nuclear genome contains many nonconserved genes with uncertain function, which may increase the overall Kn/Ks ratio. To validate our results obtained with the high-quality pilot ENCODE alignment, we constructed an additional independent data set of nuclear CDS based on the 46-species multiple alignment from the UCSC genome browser (phastCons46way). From this alignment, we selected a subset of 34 essential genes (i.e., genes with a lethal knockout phenotype in mice that are also human housekeeping genes), which are conserved across all vertebrates (see Materials and Methods). These genes are comparable with mitochondrial genes in terms of functional importance and the level of conservation. The Kn/Ks ratios of the nuclear essential genes were significantly lower than the Kn/Ks ratios of the pilot ENCODE data set for 16 overlapping mammalian species (Kn/Ks = 0.079 for nuclear essential genes and Kn/Ks = 0.099 for the pilot ENCODE data set, P value = 0.038, paired Mann–Whitney U test). Furthermore, this set of essential nuclear genes is characterized by a 2-fold higher Kn/Ks ratio than the mitochondrial genes (Knnuc/Ksnuc = 0.079 and Knmit/Ksmit = 0.039, P value = $3 \times 10^{-5}$, paired Mann–Whitney U test).

FIG. 2. Box-and-whisker plots of Kn/Ks of mitochondrial (mitDNA) and nuclear (nucDNA) genes from the COX data set in 14 mammalian species (see Materials and Methods). The horizontal bold line corresponds to the median, and the bottom and upper lines of the box are the lower and upper quartiles, respectively. Whiskers extend out from the box no more than 1.5 times the interquartile range from the box.

Average Physicochemical Distance of Amino Acid Substitutions
We also studied the rate of amino acid substitutions that have an impact on protein structure, that is, the physicochemical distance between the ancestral and the derived amino acid in protein-coding genes from both mitDNA and nucDNA. This was measured by the ratio of radical over conservative changes Kr/Kc and the average Grantham distance (see Materials and Methods). Substitutions between amino acids with different charge are significantly less frequent in mitDNA than in nucDNA (for charge-based Kr/Kc: Krmit/Kcmit = 0.40 and Krnuc/Kcnuc = 0.60, P < 0.001). All other substitution types, that is, different polarity, volume, or both, are similar between mitochondrial and nuclear genomes (for polarity-based Kr/Kc: Krmit/Kcmit = 0.81 and Krnuc/Kcnuc = 0.74, P = 0.43; for volume-based Kr/Kc: Krmit/Kcmit = 0.73 and Krnuc/Kcnuc = 0.82, P = 0.23; and for both polarity and volume-based Kr/Kc: Krmit/Kcmit = 0.46 and Krnuc/Kcnuc = 0.47, P = 0.88). The Grantham distance metric, which is based on composition, polarity, and molecular volume, is significantly smaller in mitDNA when compared with the nucDNA (56.8 ± 5 versus 59.5 ± 2 Grantham units, respectively, P = 0.013).

Comparison of Deleterious Effects of Nonsynonymous Mutations in Mitochondria and Nucleus
We further compared the selection coefficients of nonsynonymous mutations in mitDNA and nucDNA (smit/snuc). If silent synonymous sites evolve neutrally, the ratio of the fixation probability for a mutation with selection coefficient s ≠ 0 to the fixation probability for a neutral mutation with s = 0 can be equated with the Kn/Ks ratio (Kimura 1983):

$$K_n/K_s = \frac{S}{1 - \exp(-S)} \qquad (2)$$

where S = Nes.
Because our estimate of Knnuc/Ksnuc was 0.108 and Knmit/Ksmit was 0.047 (see earlier), we computed Snuc = −3.4 and Smit = −4.5. Because Nemit = 1/4 Nenuc, the estimated ratio of selection coefficients is smit/snuc = (Smit/Snuc) × (Nenuc/Nemit) = (4.5/3.4) × 4 ≈ 5.3. Thus, the deleterious effect of amino acid substitutions in mitDNA is 5-fold higher than in the nucDNA.
We then attempted to reconstruct the distribution of selection coefficients of de novo nonsynonymous mutations in mitDNA. It was estimated empirically for nucDNA that the fractions of amino acid replacements that reduce fitness by $>10^{-2}$ (lethal and semilethal), $10^{-2}$–$10^{-4}$ (pathogenic mutations, causing Mendelian diseases), $10^{-4}$–$10^{-5}$ (segregating in the human population as nonsynonymous variants), and $<10^{-5}$ (reaching fixation in the human–chimpanzee divergence) are 25%, 49%, 14%, and 12%, respectively (gray bins of fig. 3) (Yampolsky et al. 2005).
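The step from Kn/Ks to S via equation 2 has no closed-form inverse, so it has to be solved numerically. The sketch below is one possible way to do this, using simple bisection. It assumes equation 2 in the form Kn/Ks = S/[1 − exp(−S)] with negative S for deleterious mutations, and the function names are hypothetical rather than taken from the original analysis.

```python
from math import expm1


def kn_ks_from_s(scaled_s):
    """Equation 2: ratio of fixation probabilities for S = Ne * s."""
    if scaled_s == 0:
        return 1.0  # neutral limit
    # 1 - exp(-S) is written as -expm1(-S) for numerical stability near zero.
    return scaled_s / -expm1(-scaled_s)


def solve_scaled_s(kn_ks, lo=-50.0, hi=-1e-9, tol=1e-10):
    """Find the negative S giving the requested Kn/Ks (0 < Kn/Ks < 1) by bisection."""
    # kn_ks_from_s increases monotonically with S, so bisection is sufficient.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kn_ks_from_s(mid) < kn_ks:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


s_nuc = solve_scaled_s(0.108)  # nuclear average Kn/Ks from the text
s_mit = solve_scaled_s(0.047)  # mitochondrial average Kn/Ks from the text

# smit / snuc = (Smit / Snuc) * (Nenuc / Nemit), with Nenuc / Nemit = 4.
ratio = (s_mit / s_nuc) * 4.0
print(f"S_nuc = {s_nuc:.2f}, S_mit = {s_mit:.2f}, s_mit/s_nuc = {ratio:.2f}")
```

The numerically solved values (about −3.5 and −4.6) are close to, but not identical with, the rounded magnitudes quoted in the text (3.4 and 4.5), so the ratio comes out near 5.2 rather than exactly 5.3.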
We fitted a normal distribution to those bins (the red curve of fig. 3) and assumed that the distribution of selection coefficients in mitDNA is the same as in the nucDNA. Because Nemit is 4-fold less than Nenuc, the expected Kn/Ks in mitDNA is 0.2 (area with horizontal green lines, fig. 3). However, because our observed Kn/Ks in mitDNA is 2-fold less than in nucDNA (0.047 vs. 0.108), we had to shift the mean of the distribution of selection coefficients of mitDNA by one order of magnitude (green curve of fig. 3, see Materials and Methods for details).

FIG. 3. Distributions of selection coefficients of de novo nonsynonymous mutations in mitochondrial and nuclear genomes. The red curve represents the normal distribution that fits the empirical distribution (the four gray bins) of the selection coefficient of mutations in human nuclear-coded proteins (Yampolsky et al. 2005). The horizontal bold red line marks the region of nuclear effectively neutral mutations with $|s| < 0.5\,\mathrm{Nenuc}^{-1}$, assuming the long-term Nenuc of the human population to be $5 \times 10^{4}$ (Yampolsky et al. 2005). The horizontal bold green line marks the region of mitochondrial effectively neutral mutations with $|s| < 0.5\,\mathrm{Nemit}^{-1}$, assuming the long-term Nemit = 0.25 Nenuc = $1.25 \times 10^{4}$. The red area (10%) represents the fraction of the effectively neutral mutations accumulated in nuclear DNA, which corresponds to Knnuc/Ksnuc = 0.1. The area with green horizontal lines (20%) represents the expected fraction of effectively neutral mutations accumulated in mitochondrial DNA assuming the same distribution of selection coefficients as in nuclear DNA. The green curve represents the hypothetical shift (the gray arrow) in the distribution of selection coefficients of mitochondrial mutations, which was obtained by shifting the mean of the red distribution by one order of magnitude. The area with vertical green lines (5%) represents the fraction of effectively neutral mitochondrial mutations in the shifted distribution, which fits our empirical results (Knmit/Ksmit ≈ 0.05).

Discussion

Congruent Mitochondrial and Nuclear Purifying Selection
The relaxation of purifying selection with a decrease of the effective population size of species (Ne) has been demonstrated separately for mitochondrial (Popadin et al. 2007) and nuclear (Nikolaev et al. 2007) protein-coding genes of mammals. However, comparisons of the dynamics of accumulation of mutations in mitDNA and nucDNA gave contradictory results (Bazin et al. 2006; Mulligan et al. 2006; Piganeau and Eyre-Walker 2009), and a comparison of the rates of relaxation of purifying selection in these two genomes on the same set of species has never been performed before. In this work, we observe that the rates of accumulation of slightly deleterious mutations in mitDNA and nucDNA are parallel. This indicates that the effective population sizes of mitochondrial (Nemit) and nuclear (Nenuc) genomes are positively correlated with each other. The parallel dynamics also imply that the relaxation of purifying selection with a decrease of species-specific effective population size leads to proportional increases in the fraction of effectively neutral mutations in mitochondrial and nuclear genes. This implies that the distributions of selection coefficients of de novo nonsynonymous mutations in mitDNA and nucDNA have similar shapes.
It has been proposed that mammalian species with low effective population size are more prone to extinction when compared with species with high population size (Polishchuk 2002; Popadin et al. 2007). If extinction is associated with a gradual decrease of the population size of species with low Ne, we expect that in these species, (Knmit/Ksmit)/(Knnuc/Ksnuc) would be significantly smaller than in species with stable population size.
Knmit/Ksmit is more sensitive to recent changes in population size than Knnuc/Ksnuc because of the low Nemit and the correspondingly short coalescence time of mutations in mitochondrial genomes. So the ratio (Knmit/Ksmit)/(Knnuc/Ksnuc), under the assumption that positive selection is not frequent in either genome, reflects the ratio of short-term purifying selection to long-term purifying selection and allows the recent population dynamics of species to be inferred. In our study, we found that the ratio (Knmit/Ksmit)/(Knnuc/Ksnuc) is the same across all studied species, implying a similar, and most likely stable, recent demographic history of the studied species. Thus, our study provides no evidence for extinction of species with low Ne due to a gradual decrease in their population size.

Highly Effective Purifying Selection in mitDNA
Mitochondrial genes are expected to undergo less effective purifying selection than nuclear genes because of a 4-fold lower effective population size (Palumbi et al. 2001; Lynch et al. 2006) and the absence of recombination. However, our results demonstrate the opposite: we observed significantly stronger purifying selection in mitDNA than in nucDNA (using Kn/Ks, charge-based Kr/Kc, and Grantham’s distance) across all investigated mammalian species. We also estimated that de novo nonsynonymous substitutions in mitDNA are on average 5-fold more deleterious than those in nucDNA. Our results are compatible with the study showing more effective selection in mammalian mitochondria than in proteobacterial orthologs (Mamirova et al. 2007) and with the fast elimination of deleterious mutations during mouse oogenesis (Fan et al. 2008; Shoubridge and Wai 2008; Stewart et al. 2008).
There are three recent lines of evidence from human population genetic studies for higher evolutionary constraints on mitochondrial genes than on nuclear genes.
1) In mitDNA, the ratio of polymorphic to fixed mutations (50–90%) is significantly higher than in nucDNA (8–28%) (Hasegawa et al. 1998; Subramanian 2011); 2) the ratio of nonsynonymous to synonymous polymorphisms is lower in human mitDNA than in nucDNA (Breen and Kondrashov 2010); and 3) there is an apparent discrepancy between the mitDNA mutation rate estimated from comparative species studies and that estimated from human pedigrees, and this discrepancy does not exist in nucDNA. The mutation rate in nucDNA is 1.0–1.2 × 10⁻⁸ per generation per nt (observed in the genome sequences of two parent–offspring trios [Durbin et al. 2010]) and 2.5 × 10⁻⁸ per generation per nt (human–chimpanzee comparison [Nachman and Crowell 2000]). In contrast to this similarity, there is a 20-fold difference between the mitDNA mutation rates estimated for hypervariable regions of the mitochondrial control region: 5 × 10⁻⁵ per generation per nt in mother–offspring lineages and 2.4 × 10⁻⁶ per generation per nt from comparative species analysis (Parsons et al. 1997; Howell et al. 2003). This discrepancy could be explained by strong purifying selection against nearly neutral mutations in the control region or by strong purifying selection against deleterious mutations that are linked to neutral ones.

Potential Mechanisms of Effective Purifying Selection in Mitochondria

The effectively neutral theory cannot explain the strong purifying selection in mitDNA if only the absence of recombination and the low Ne of mitochondria are taken into account.
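The pedigree-versus-phylogeny comparison quoted above can be checked with simple arithmetic. The sketch below assumes that the rates are read as negative powers of ten (1.0–1.2 × 10⁻⁸ and 2.5 × 10⁻⁸ per generation per nt for nucDNA; 5 × 10⁻⁵ and 2.4 × 10⁻⁶ for the mitochondrial control region); the helper name `fold_difference` is introduced here only for illustration.

```python
def fold_difference(rate_a: float, rate_b: float) -> float:
    """Fold difference between two positive rates (larger divided by smaller)."""
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("rates must be positive")
    return max(rate_a, rate_b) / min(rate_a, rate_b)


# Rates per generation per nucleotide, as quoted in the text
# (exponents assumed to be negative powers of ten).
NUC_PEDIGREE_RANGE = (1.0e-8, 1.2e-8)   # parent-offspring trios
NUC_PHYLOGENETIC = 2.5e-8               # human-chimpanzee comparison
MIT_PEDIGREE = 5.0e-5                   # mother-offspring lineages
MIT_PHYLOGENETIC = 2.4e-6               # comparative species analysis

# Nuclear genome: the two kinds of estimate agree within a small factor.
nuc_folds = [fold_difference(NUC_PHYLOGENETIC, r) for r in NUC_PEDIGREE_RANGE]

# Mitochondrial control region: the two estimates differ by about 20-fold.
mit_fold = fold_difference(MIT_PEDIGREE, MIT_PHYLOGENETIC)

print(f"nucDNA fold difference: {min(nuc_folds):.1f} to {max(nuc_folds):.1f}")
print(f"mitDNA fold difference: {mit_fold:.1f}")
```

Under these readings the nuclear estimates differ by a factor of roughly 2 to 2.5, whereas the mitochondrial estimates differ by a factor of about 20, in line with the contrast drawn in the text.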
Thus, additional mechanisms should be considered; these include 1) mitochondrial bottlenecks that alter the heteroplasmy level of deleterious mutations, facilitating negative selection (Bergstrom and Pritchard 1998; Shoubridge and Wai 2008); 2) effective haploidy, which is associated with stronger negative selection because the masking effect of a second allele is absent (Kondrashov and Crow 1991); 3) thousands of copies of mitDNA per somatic cell and the linkage of this copy number with the oxidative phosphorylation activity of the cell (Fernández-Vizarra et al. 2011), which are associated with a high expression level and thus with high selective constraints on mitochondrially encoded proteins; and 4) extensive protein–protein interactions in the complexes of the respiratory chain, which can additionally constrain mitochondrial genes (Fraser et al. 2003). Evolutionarily, the mammalian mitochondrial genome can be compared with the nonrecombining regions of the Y chromosome, because both lack recombination and have a low Ne. Thus, the Y chromosome can be used as an analog for the evolutionary processes acting on the mitochondrial genome. During the evolution of the ancestral Y chromosome, there was considerable gene loss (Charlesworth 2003), but because of different “survival” rates, critically important genes with low Kn/Ks are over-represented in the derived Y chromosomes (Bachtrog et al. 2008; Chibalina and Filatov 2011). The same process has an important role in the evolution of bacterial genomes (Mira et al. 2001). We suggest a similar process in the mitDNA, in which the majority of ancestral genes were eliminated (either degraded or migrated to the nucDNA) and only highly conserved genes have remained. Indeed, only genes encoding core subunits of the respiratory chain complexes have remained in the mitDNA (Scheffler 1999).
This concentration of critically important genes during the evolution of the mitDNA has two consequences: the average deleterious effect (selection coefficient s) of de novo mutations is increased, and the number of new slightly deleterious mutations per genome per generation (u) is decreased because of the minimization of this genome. Both these trends (high s and low u) work against detrimental mechanisms such as Muller’s ratchet and background selection (Bachtrog 2008). The third potentially detrimental process, the fixation of deleterious mutations caused by genetic hitchhiking, can be important even in short nonrecombining regions (Bachtrog 2008). However, in the mitDNA of mammals there is no evidence of frequent positive selection (Mamirova et al. 2007), and therefore we do not expect to observe massive accumulation of deleterious mutations due to hitchhiking.

Conclusion

Our observations reveal that the distributions of selection coefficients of de novo mutations in mitDNA and nucDNA have a similar shape but different means; on average, mitDNA mutations are five times more harmful than nucDNA mutations. This shift of selection coefficients of mitochondrial de novo nonsynonymous mutations toward more deleterious values can be partially due to the elimination of nonessential genes from the mitochondrial genome during evolution, because only highly constrained genes can survive on a nonrecombining genome with low effective population size (Mira et al. 2001; Bachtrog et al. 2008; Chibalina and Filatov 2011). Other reasons for decreased selection coefficients of mitochondrial de novo nonsynonymous mutations could be 1) mitochondrial bottlenecks, 2) effective haploidy, 3) multiple copies of mitDNA per cell, 4) high level of expression of mitochondrial genes, and 5) multiple protein–protein interactions of mitochondria-encoded subunits.
Altogether, these traits increase or unmask the deleterious effects of de novo nonsynonymous mutations in the mitochondrial genome providing the mechanism of effective purifying selection and maintaining the fitness of the mammalian mitochondrial genome. Supplementary Material Supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Acknowledgments The authors are grateful to S. Kryazhimskiy, S. Subramanian, G. Bazykin, D. Filatov, and M. Lynch for constructive criticism. This work was supported by EMBO long-term fellowship program ALTF 527-2010 and RFBR 10-04-00276-‘ grants to K.Y.P. and by the Swiss National Science Foundation to S.E.A. References Aoyama H, Muramoto K, Shinzawa-Itoh K, Hirata K, Yamashita E, Tsukihara T, Ogura T, Yoshikawa S. 2009. A peroxide bridge between Fe and Cu ions in the O2 reduction site of fully oxidized cytochrome c oxidase could suppress the proton pump. Proc Natl Acad Sci U S A. 106:2165–2169. Bachtrog D. 2008. The temporal dynamics of processes underlying Y chromosome degeneration. Genetics 179:1513–1525. Bachtrog D, Hom E, Wong KM, Maside X, de Jong P. 2008. Genomic degradation of a young Y chromosome in Drosophila miranda. Genome Biol. 9:R30. Bazin E, Glémin S, Galtier N. 2006. Population size does not influence mitochondrial genetic diversity in animals. Science 312:570–572. Bergstrom CT, Pritchard J. 1998. Germline bottlenecks and the evolutionary maintenance of mitochondrial genomes. Genetics 149: 2135–2146. Birney E, Stamatoyannopoulos JA, Dutta A, et al. (311 co-authors). 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799–816. Breen MS, Kondrashov FA. 2010. Mitochondrial pathogenic mutations are population-specific. Biol Direct. 5:68. Chang C-W, Cheng W-C, Chen C-R, Shu W-Y, Tsai M-L, Huang C-L, Hsu IC. 2011. 
Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PloS One 6: e22859. Chao L, Carr DE. 1993. The molecular clock and the relationship between population size and generation time. Evolution 47:688–690. Charlesworth B. 2003. The organization and evolution of the human Y chromosome. Genome Biol. 4:226. Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205. Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303. Chibalina MV, Filatov DA. 2011. Plant Y chromosome degeneration is retarded by haploid purifying selection. Curr Biol. 21:1475–1479. Chinnery PF, Thorburn DR, Samuels DC, White SL, Dahl HM, Turnbull DM, Lightowlers RN, Howell N. 2000. The inheritance of mitochondrial DNA heteroplasmy: random drift, selection or both? Trends Genet. 16:500–505. de Magalhães JP, Budovsky A, Lehmann G, Costa J, Li Y, Fraifeld V, Church GM. 2009. The Human Ageing Genomic Resources: online databases and tools for biogerontologists. Aging Cell. 8:65–72. Durbin RM, Altshuler DL, Abecasis GR, et al. (362 co-authors). 2010. A map of human genome variation from population-scale sequencing. Nature 467:1061–1073. ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306:636–640. Fan W, Waymire KG, Narula N, Li P, Rocher C, Coskun PE, Vannan MA, Narula J, Macgregor GR, Wallace DC. 2008. A mouse model of mitochondrial disease reveals germline selection against severe mtDNA mutations. Science 319:958–962. Felsenstein J. 1974. The evolutionary advantage of recombination. Genetics 78:737–756. Fernández-Vizarra E, Enrı́quez JA, Pérez-Martos A, Montoya J, Fernández-Silva P. 2011. Tissue-specific differences in mitochondrial activity and biogenesis. Mitochondrion 11:207–213. Fraser HB, Wall DP, Hirsh AE. 2003.
A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol. 3:11. Fujita PA, Rhead B, Zweig AS, et al. (27 co-authors). 2011. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 39: D876–D882. Gabriel W, Burger R, Lynch M. 1991. Population extinction by mutational load and demographic stochasticity. In: Seitz A, Loeschcke V, editors. Species Conservation: A Population-Biological Approach. Basel (Switzerland): Birkhauser Verlag. p. 49–59. Gillespie JH. 2000. Genetic drift in an infinite population. The pseudohitchhiking model. Genetics 155:909–919. Gordo I, Navarro A, Charlesworth B. 2002. Muller’s ratchet and the pattern of variation at a neutral locus. Genetics 848:835–848. Grantham R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862–864. Guindon S, Delsuc F, Dufayard J-F, Gascuel O. 2009. Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol. 537: 113–137. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. 2005. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33: D514–D517. Hasegawa M, Cao Y, Yang Z. 1998. Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol Biol Evol. 15:1499–1505. Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial selection. Genet Res. 8:269–294. Howell N, Smejkal CB, Mackey DA, Chinnery PF, Turnbull DM, Herrnstadt C. 2003. The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates. Am J Hum Genet. 72:659–670. Jenuth JP, Peterson AC, Fu K, Shoubridge EA. 1996. Random genetic drift in the female germline explains the rapid segregation of mammalian mitochondrial DNA. Nat Genet. 14:146–151. 
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge (UK): Cambridge University Press. Kondrashov AS, Crow JF. 1991. Haploidy or diploidy: which is better? Nature 351:314–315. Li Y-D, Xie Z-Y, Du Y-L, Zhou Z, Mao X-M, Lv L-X, Li Y-Q. 2009. The rapid evolution of signal peptides is mainly caused by relaxed selection on non-synonymous and synonymous sites. Gene 436: 8–11. Lynch M. 1997. Mutation accumulation in nuclear, organelle, and prokaryotic transfer RNA genes. Mol Biol Evol. 14:914–925. Lynch M, Blanchard JL. 1998. Deleterious mutation accumulation in organelle genomes. Genetica 102–103:29–39. Lynch M, Bürger R, Butcher D, Gabriel W. 1993. The mutational meltdown in asexual populations. J Hered. 84:339–344. Lynch M, Koskella B, Schaack S. 2006. Mutation pressure and the evolution of organelle genomic architecture. Science 311:1727–1730. Mamirova L, Popadin K, Gelfand MS. 2007. Purifying selection in mitochondria, free-living and obligate intracellular proteobacteria. BMC Evol Biol. 7:17. Margulies EH, Cooper GM, Asimenos G, et al. (77 co-authors). 2007. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17: 760–774. Mira A, Ochman H, Moran NA. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17:589–596. Mulligan CJ, Kitchen A, Miyamoto MM. 2006. Comment on “Population size does not influence mitochondrial genetic diversity in animals.” Science 314:1390. Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O’Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618. Nachman MW, Crowell SL. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297–304. Nikolaev SI, Montoya-Burgos JI, Popadin K, Parand L, Margulies EH, Antonarakis SE. 2007.
Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. Proc Natl Acad Sci U S A. 104:20443–20448. Osada N, Akashi H. 2012. Mitochondrial-nuclear interactions and accelerated compensatory evolution: evidence from the primate cytochrome C oxidase complex. Mol Biol Evol. 29:337–346. Palumbi SR, Cipriano F, Hare MP. 2001. Predicting nuclear gene coalescence from mitochondrial data: the three-times rule. Evolution 55: 859–868. Parsons TJ, Muniec DS, Sullivan K, et al. (11 co-authors). 1997. A high observed substitution rate in the human mitochondrial DNA control region. Nat Genet. 15:363–368. Piganeau G, Eyre-Walker A. 2009. Evidence for variation in the effective population size of animal mitochondrial DNA. PloS One 4:e4396. Polishchuk LV. 2002. Ecology. Conservation priorities for Russian mammals. Science 297:1123. Popadin K, Polishchuk LV, Mamirova L, Knorre D, Gunbin K. 2007. Accumulation of slightly deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. Proc Natl Acad Sci U S A. 104:13390–13395. R Development Core Team. 2012. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, Yi C, Kreuziger J, Baldi P, Wallace DC. 2007. An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucleic Acids Res. 35: D823–D828. Scheffler IE. 1999. Mitochondria. New York: Wiley-Liss. Shoubridge EA, Wai T. 2008. Medicine. Sidestepping mutational meltdown. Science 319:914–915. Stewart JB, Freyer C, Elson JL, Wredenberg A, Cansu Z, Trifunovic A, Larsson N-G. 2008. Strong purifying selection in transmission of mammalian mitochondrial DNA. Hurst LD, editor. PLoS Biol. 6:e10. Subramanian S. 2011. High proportions of deleterious polymorphisms in constrained human genes. Mol Biol Evol. 28:49–52. Taylor WR. 1986. The classification of amino acid conservation. J Theor Biol.
119:205–218. Wai T, Teoli D, Shoubridge EA. 2008. The mitochondrial DNA genetic bottleneck results from replication of a subpopulation of genomes. Nat Genet. 40:1484–1488. Yampolsky LY, Kondrashov FA, Kondrashov AS. 2005. Distribution of the strength of selection against amino acid replacements in human proteins. Hum Mol Genet. 14:3191–3201. Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13:555–556. Yuan Y, Xu Y, Xu J, Ball RL, Liang H. 2012. Predicting the lethal phenotype of the knockout mouse by integrating comprehensive genomic data. Bioinformatics 28:1246–1252. Zhang J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J Mol Evol. 50: 56–68.