Increased Number of MicroRNA Target Sites in Genes Encoded in CNV Regions. Evidence for an Evolutionary Genomic Interaction

Kyriakos Felekkis,†,1 Konstantinos Voskarides,†,1 Harsh Dweep,2 Carsten Sticht,2 Norbert Gretz,2 and Constantinos Deltas*,1
1 Department of Biological Sciences and Molecular Medicine Research Center, University of Cyprus, Nicosia, Cyprus
2 Medical Research Center, University of Heidelberg, Mannheim, Germany
† These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Douglas Crawford

Mol. Biol. Evol. 28(9):2421–2424. 2011. doi:10.1093/molbev/msr078. Advance Access publication March 25, 2011.
© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Abstract

MicroRNAs (miRNAs) and copy number variations (CNVs) are two newly discovered genetic elements that have revolutionized the field of molecular biology and genetics. By performing in silico whole genome analysis, we demonstrate that both the number of miRNAs that target genes found in CNV regions and the number of miRNA-binding sites in those genes are significantly higher than those of genes found in non-CNV regions. This suggests that miRNAs may have acted as equilibrators of gene expression during evolution in an attempt to regulate aberrant gene expression and to increase the tolerance to genome plasticity.

Letter

MicroRNAs (miRNAs) and copy number variations (CNVs) are two newly discovered categories of genomic elements that have changed the way we view the diploid genome and added a further order of complexity to gene expression and regulation. They have also increased our understanding of the plasticity of the human and other mammalian genomes. miRNAs belong to the most abundant class of small RNAs in animals (Farazi et al. 2008). They are a recently discovered class of eukaryotic, endogenous, noncoding RNAs that play a key role in the regulation of gene expression. Recent studies show that about 30% of human protein-coding genes are regulated by one or more miRNAs. They are short single-stranded RNA molecules (~21–23 nt) that are usually partially complementary to one or more messenger RNA (mRNA) molecules (target mRNAs) (Farazi et al. 2008). Their main function is to downregulate gene expression by inhibiting translation or by targeting the mRNA for degradation or deadenylation (Wu et al. 2006; Borel and Antonarakis 2008).

CNVs refer to genomic regions with segmental duplications that were recognized after systematic comparative genomic hybridization on the DNA of healthy human subjects. They are defined as DNA segments that are 1 kb or larger in size and are present at significant frequency in the population (Bartel 2004; Lafrate et al. 2004; Sebat et al. 2004). The exact boundaries and allele number of CNVs can hardly be estimated by current methods, which adds to the complexity of their role in gene expression and disease (Dear 2009; Henrichsen et al. 2009). In addition, only a few studies have assessed the role of CNVs in gene expression, and there is significant evidence that a number of genes located within CNVs have increased expression levels (Stranger et al. 2007; Henrichsen et al. 2009; Schuster-Bockler et al. 2010).

But what drives evolution for CNVs and miRNAs? With regard to CNVs, there are two main hypotheses: 1) CNVs are somehow beneficial and directional selection increases their frequency in the genome, and 2) CNVs are only slightly deleterious or neutral, so they remain in the genome because they are difficult to eliminate or because of genetic drift. Interestingly, gene classes are not randomly distributed across CNV regions. CNV regions are significantly enriched in sensory receptor genes (Nozawa et al. 2007), whereas dosage-sensitive genes are significantly underrepresented in CNV regions (Schuster-Bockler et al. 2010). Additionally, Nguyen et al. (2008) found indications of reduced purifying selection in human CNV regions.

In contrast to CNVs, miRNAs are much easier to align and compare across species because of their small size. Multiple alignments show that they are highly conserved, which underlines their important role in the fine-tuning of gene expression (Liu et al. 2008).

We hypothesize that, for CNVs and miRNA target sites, a coevolutionary process may exist that shapes, at least partly, their distribution in the different genomes. It is possible that an evolutionary mechanism regulates the number of miRNAs and miRNA target sites in genes found within CNV regions. For example, increased expression of genes that are located in duplicated regions may be counteracted and equilibrated by an increased number of miRNA target sites in the 3′ UTRs of those genes, or possibly by an increased number of miRNAs targeting those genes.

To test our hypothesis, we utilized the miRWalk database (http://www.ma.uni-heidelberg.de/apps/zmf/mirwalk/) and the TargetScan algorithm (http://www.targetscan.org/) to predict the potential number of targeting miRNAs and the number of miRNA-binding sites on the 3′ UTR of various genes that are found either in CNV or in non-CNV regions. The miRWalk algorithm searches for seeds based on Watson–Crick complementarity, walking along the complete sequence of a gene starting with a heptamer (seven-nucleotide) seed of the miRNA sequence. As soon as it identifies a perfectly base-paired heptamer, it extends the length of the miRNA seed until a mismatch arises. It then returns all possible hits with seven or more matches and assigns the prediction results to four parts, according to promoter region, 5′ UTR, coding sequence, and 3′ UTR (Shahi et al. 2006).
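The seed search described above can be illustrated with a short sketch. This is not miRWalk's actual code: the function names, the choice of miRNA nucleotides 2–8 as the heptamer seed, and the toy target sequence are assumptions made for the example. The sketch scans a target sequence for a perfect Watson–Crick match to the seed and then extends the match along the miRNA until the first mismatch, keeping every hit of seven or more nucleotides.

```python
# Illustrative sketch only; names and the seed position are assumptions,
# not taken from miRWalk.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=1, min_len=7):
    """Find perfect seed matches of `mirna` in `utr` (both written 5'->3').

    seed_start is the 0-based index of the first seed nucleotide in the
    miRNA (1 means nucleotide 2, an assumption of this sketch).
    Returns a list of (start_index_in_utr, match_length) tuples.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    # Pairing is antiparallel, so the site on the target is the reverse
    # complement of the miRNA; the seed corresponds to its last 7 letters.
    target = reverse_complement(mirna[seed_start:])
    hits = []
    for end in range(min_len, len(utr) + 1):
        k = 0
        # Walk leftwards on the target while bases keep pairing, which
        # extends the match from the seed towards the miRNA 3' end.
        while k < len(target) and k < end and utr[end - 1 - k] == target[-1 - k]:
            k += 1
        if k >= min_len:
            hits.append((end - k, k))
    return hits


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # familiar miRNA used as an example
    toy_utr = "GGGGCUACCUCGGGG"       # hypothetical 3' UTR fragment
    print(find_seed_sites(let7a, toy_utr))
```

Counting the returned hits per gene gives a number of predicted binding sites for that miRNA; a production tool would add further filters that this sketch does not attempt.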
In this study, we concentrated only on the predicted miRNA-binding sites within the 3′ UTR of genes in the entire human genome. TargetScan, on the other hand, takes into account not only the sequence homology between the miRNA sequence and the target mRNA sequence but also integrates thermodynamics-based modeling of miRNA–mRNA interactions and comparative sequence analysis to predict miRNA targets conserved across multiple genomes (Lewis et al. 2003). The use of two independent and different methods improves the reliability of the predictions made.

We analyzed two categories of genes of the entire human genome: 1) all 12,705 human genes that are encoded within non-CNV regions, downloaded from NCBI, and 2) all 9,673 human genes that belong to a CNV region registered in the "Database of Genomic Variants" as copy number (http://projects.tcag.ca/variation/) (Lafrate et al. 2004). The number of predicted miRNAs and the number of their target sites were determined for each gene using miRWalk and TargetScan, and the means were compared between the two categories of genes. Figure 1a summarizes the results for the two classes of genes. The number of miRNAs predicted to regulate genes found in CNV regions of the genome is significantly higher (P value < 0.0001) than that of genes found in non-CNV regions, as determined by both miRWalk and TargetScan. Because miRNAs can regulate gene expression by binding to the same gene at more than one target site, we also determined the total number of miRNA-binding sites in each of the above genes. As with the number of miRNAs, the total number of binding sites is significantly higher (P value < 0.0001) in CNV genes than in non-CNV genes, as predicted by both algorithms (fig. 1b).
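The two per-gene quantities compared above (number of distinct targeting miRNAs and total number of binding sites) can be derived from a flat list of predictions. The sketch below is only an illustration under stated assumptions: the tuple layout, the function names, and the toy gene and miRNA labels are hypothetical and do not reflect the actual miRWalk or TargetScan output formats.

```python
# Illustrative sketch; the record layout and all names are hypothetical.
from collections import defaultdict
from statistics import mean


def per_gene_counts(predictions):
    """Summarize predictions given as (gene, mirna, site_position) tuples.

    Returns {gene: (number_of_distinct_mirnas, number_of_binding_sites)}.
    """
    sites = defaultdict(int)
    mirnas = defaultdict(set)
    for gene, mirna, _position in predictions:
        sites[gene] += 1          # every record is one predicted site
        mirnas[gene].add(mirna)   # distinct miRNAs targeting the gene
    return {gene: (len(mirnas[gene]), sites[gene]) for gene in sites}


def group_means(counts, genes):
    """Mean miRNAs per gene and mean sites per gene for a list of genes.

    Genes with no prediction count as (0, 0); `genes` must not be empty.
    """
    rows = [counts.get(gene, (0, 0)) for gene in genes]
    return mean(row[0] for row in rows), mean(row[1] for row in rows)


if __name__ == "__main__":
    toy_predictions = [
        ("GENE_A", "miR-1", 10),
        ("GENE_A", "miR-1", 55),
        ("GENE_A", "miR-2", 80),
        ("GENE_B", "miR-1", 5),
    ]
    counts = per_gene_counts(toy_predictions)
    print(group_means(counts, ["GENE_A"]))            # e.g. a CNV group
    print(group_means(counts, ["GENE_B", "GENE_C"]))  # e.g. a non-CNV group
```

Whether genes without any predicted site should enter the mean as zeros is a design choice of this sketch; the text does not state how such genes were handled.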
It should be noted that the difference in the absolute values of miRNAs and miRNA-binding sites as calculated by the two algorithms can be attributed to the different prediction methodologies used.

To address the possibility that the increase in the total number of miRNA-binding sites in CNV genes is independent of the number of miRNAs, we compared the mean number of binding sites per miRNA molecule for CNV and non-CNV genes. As shown in figure 1c, the mean number of binding sites per miRNA in CNV genes is higher than that in non-CNV genes, as determined by both miRWalk (P value < 0.0001) and TargetScan (P value < 0.05). Collectively, these results demonstrate that genes found in CNV regions of the human genome are targeted by more miRNA molecules and, at the same time, have more miRNA-binding sites than genes found in non-CNV regions.

These results cannot be attributed to a difference in the length of the respective 3′ UTRs in CNV versus non-CNV genes. When we compared the lengths of the 3′ UTRs in human and chimpanzee CNV genes versus non-CNV genes, we found them very similar, with no statistically significant difference by Student's t-test (human CNVs vs. non-CNVs: 953.5 ± 11.11 standard error of the mean [SEM] vs. 932 ± 11.54 SEM; chimpanzee CNVs vs. non-CNVs: 976.2 ± 60.76 SEM vs. 1005 ± 14.82 SEM).

These results of in silico analysis support our original hypothesis. But what is the explanation for these results? It appears that, on the evolutionary timescale, as genes were duplicated in the genome, organisms adapted to the higher levels of protein expression by the concurrent accumulation of binding sites for new miRNAs on the 3′ UTR of those genes and, at the same time, by an increase in the number of miRNA-binding sites per miRNA molecule.
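The 3′ UTR length comparison reported above can be checked approximately from the published means and SEMs alone. The sketch below is an assumption-laden illustration, not the authors' procedure: they report a Student's t-test, whereas this code forms the difference of means divided by the combined standard error and uses a normal approximation for the two-sided P value, which is reasonable only for large samples. The function names are hypothetical.

```python
# Illustrative sketch; a normal approximation stands in for the t-test
# reported in the text, and the function names are hypothetical.
import math


def z_from_summary(mean_a, sem_a, mean_b, sem_b):
    """Difference of two means divided by their combined standard error."""
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


def two_sided_p_normal(z):
    """Two-sided P value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # 3' UTR lengths (mean, SEM) as given in the text.
    human = z_from_summary(953.5, 11.11, 932.0, 11.54)
    chimp = z_from_summary(976.2, 60.76, 1005.0, 14.82)
    for label, z in (("human", human), ("chimpanzee", chimp)):
        print(label, round(z, 3), round(two_sided_p_normal(z), 3))
```

Under these assumptions both comparisons give a statistic well inside the range expected by chance, which is consistent with the statement that 3′ UTR length does not differ significantly between CNV and non-CNV genes.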
To support this notion, we performed a comparative analysis of the mean number of miRNAs per gene in human versus chimpanzee CNV genes. As shown in figure 1d, the number of miRNAs in human CNV genes is significantly higher than that in chimpanzee (miRWalk and TargetScan). This indicates that the number of miRNAs targeting a gene increases after the formation of CNVs, because genes in CNVs formed in the human lineage are targeted by a significantly higher number of miRNAs than the genes found within CNVs in chimpanzee for which data are available.

We propose that through this analysis, we may be witnessing an evolutionary interaction between two widespread genomic elements, an evolutionary process that may also have operated between other genomic elements. This phenomenon may be more evident for particular gene families or classes that are more sensitive to dosage and expression deregulation. As discussed above, there are two possible explanations for the wide genomic presence of CNVs. This may be a third one.

Recently, it was shown (Ha et al. 2009) that small RNAs produced during interspecific mating or polyploidization serve as a buffer against genomic shock in interspecific hybrids and allopolyploids. The work of Ha et al. and our in silico analysis indicate that miRNAs (and perhaps other small RNAs) have an equilibrating role in genomic dosage phenomena. Importantly, Lehnert et al. (2009) showed that sense Alu sequences are enriched for miRNA target sites. Even more noteworthy is the work of Li et al. (2008), who found that miRNA targets are significantly enriched for paralogous genes and who also state that their results suggest that "miRNA-mediated regulation plays an important role in the regulatory circuits involving duplicated genes including adjusting imbalanced dosage effects of gene duplicates and possibly creating a mechanism for genetic buffering." Hence, this may be an evolutionary phenomenon pertaining to genomic repeats (Li et al. 2008).

FIG. 1. (a) Mean number of miRNAs per gene as determined by miRWalk (left) and TargetScan (right) in non-CNV genes (97.95 ± 0.89 SEM; 68.03 ± 0.65 SEM) and CNVs (126.80 ± 1.17 SEM; 72.56 ± 0.69 SEM) in the whole genome. (b) Mean number of miRNA-binding sites per gene as determined by miRWalk (left) and TargetScan (right) in non-CNV genes (138.7 ± 1.49 SEM; 85.73 ± 1.01 SEM) and CNVs (185.7 ± 2.05 SEM; 92.89 ± 1.08 SEM) in the whole genome. (c) Average of the mean binding sites per miRNA as determined by miRWalk (left) and TargetScan (right) in non-CNV genes (1.28 ± 0.001 SEM; 1.14 ± 0.003 SEM) and CNVs (1.32 ± 0.002 SEM; 1.16 ± 0.002 SEM) in the whole genome. (d) Mean number of miRNA-binding sites per gene as determined by miRWalk (left) and TargetScan (right) in human CNV genes (128.2 ± 4.62 SEM; 73.17 ± 2.55 SEM) and chimpanzee CNV genes (71.59 ± 2.51 SEM; 12.40 ± 0.59 SEM). Comparison of means was performed by t-test using the SPSS statistical software. Data represent means of total miRNA number per gene. The symbols (*) and (***) denote a significant difference between the means, with a P value < 0.05 and < 0.0001, respectively.

We suggest that miRNAs may have arisen under evolutionary pressure, as a mechanism for increasing the tolerance to and dealing with genome plasticity. Alternatively, it is possible that CNVs that are targeted by miRNAs had a higher chance to segregate or be fixed due to the buffering effects of the miRNAs. This may be an additional underestimated role of miRNAs.
Additionally, the results from this whole human genome analysis support the notion that the main overall molecular contribution of miRNAs is to downregulate gene expression with the aim of conserving physiology and homeostasis. A recent study hypothesized that miRNAs play a role in canalizing gene expression, that is, in stabilizing a phenotype within a species. According to the authors, miRNAs may do so through a dual function, tuning and buffering of gene expression, which means that they assist in presetting a mean level of gene expression while also reducing the variance around this mean. This is turning out to be an essential means of achieving homeostasis in living organisms, and our results are in agreement with this hypothesis (Wu et al. 2009).

We wish to close this report by emphasizing that our hypothesis does not in any way annul the true physiological and required role of genes that have been found, or will be found, to function in more than the two doses generated from the dogmatic two copies in diploid organisms. Our approach has been a holistic and statistical one that identified a trend relating to the hypothesized coevolutionary history of CNVs and miRNAs, in a fashion similar to that proposed by other recent works (Li et al. 2008; Lehnert et al. 2009). The application of similar analyses to other mammalian species, together with other high-throughput and wet-lab approaches, will certainly enable more sophisticated and demanding analyses aimed at elucidating their complete evolutionary role in gene regulation.

Acknowledgments

This work was funded by programs from the Cyprus Research Promotion Foundation, DIDACTOR/DISEK/0308/07 to K.F., and NEW INFRASTRUCTURE/STRATEGIC/0308/24 to C.D.

References

Bartel DP. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297.
Borel C, Antonarakis SE. 2008. Functional genetic variation of human miRNAs and phenotypic consequences. Mamm Genome. 19:503–509.
Dear PH. 2009. Copy-number variation: the end of the human genome? Trends Biotechnol. 27:448–454.
Farazi AT, Juranek AS, Tuschl T. 2008. The growing catalog of small RNAs and their association with distinct Argonaute/Piwi family members. Development 135:1201–1214.
Ha M, Lu J, Tian L, Ramachandran V, Kasschau KD, Chapman EJ, Carrington JC, Chen X, Wang XJ, Chen ZJ. 2009. Small RNAs serve as a genetic buffer against genomic shock in Arabidopsis interspecific hybrids and allopolyploids. Proc Natl Acad Sci U S A. 106:17835–17840.
Henrichsen CN, Chaignat E, Reymond A. 2009. Copy number variants, diseases and gene expression. Hum Mol Genet. 18:R1–R8.
Lafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. 2004. Detection of large-scale variation in the human genome. Nat Genet. 36:949–951.
Lehnert S, Van Loo P, Thilakarathne PJ, Marynen P, Verbeke G, Schuit FC. 2009. Evidence for co-evolution between human microRNAs and Alu-repeats. PLoS One. 4:e4456.
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. 2003. Prediction of mammalian microRNA targets. Cell 115:787–798.
Li J, Musso G, Zhang Z. 2008. Preferential regulation of duplicated genes by microRNAs in mammals. Genome Biol. 9:R132.
Liu N, Okamura K, Tyler DM, Phillips MD, Chung WJ, Lai EC. 2008. The evolution and functional diversification of animal microRNA genes. Cell Res. 18:985–996.
Nguyen DQ, Webber C, Hehir-Kwa J, Pfundt R, Veltman J, Ponting CP. 2008. Reduced purifying selection prevails over positive selection in human copy number variant evolution. Genome Res. 18:1711–1723.
Nozawa M, Kawahara Y, Nei M. 2007. Genomic drift and copy number variation of sensory receptor genes in humans. Proc Natl Acad Sci U S A. 104:20421–20426.
Schuster-Bockler B, Conrad D, Bateman A. 2010. Dosage sensitivity shapes the evolution of copy-number varied regions. PLoS One. 5:e9474.
Sebat J, Lakshmi B, Troge J, et al. (21 co-authors). 2004. Large-scale copy number polymorphism in the human genome. Science 305:525–528.
Shahi P, Loukianiouk S, Bohne-Lang A, Kenzelmann M, Kuffer S, Maertens S, Eils R, Grone HJ, Gretz N, Brors B. 2006. Argonaute – a database for gene regulation by mammalian microRNAs. Nucleic Acids Res. 34:D115–D118.
Stranger BE, Forrest MS, Dunning M, et al. (17 co-authors). 2007. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315:848–853.
Wu CI, Shen Y, Tang T. 2009. Evolution under canalization and the dual roles of microRNAs: a hypothesis. Genome Res. 19:734–743.
Wu L, Fa J, Belasco GJ. 2006. MicroRNAs direct rapid deadenylation of mRNA. Proc Natl Acad Sci U S A. 103:4034–4039.