Increased Number of MicroRNA Target Sites in Genes
Encoded in CNV Regions. Evidence for an Evolutionary
Genomic Interaction
Kyriakos Felekkis,†,1 Konstantinos Voskarides,†,1 Harsh Dweep,2 Carsten Sticht,2 Norbert Gretz,2 and
Constantinos Deltas*,1
1 Department of Biological Sciences and Molecular Medicine Research Center, University of Cyprus, Nicosia, Cyprus
2 Medical Research Center, University of Heidelberg, Mannheim, Germany
† Authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Douglas Crawford
Abstract
MicroRNAs (miRNAs) and copy number variations (CNVs) are two newly discovered genetic elements that have
revolutionized the field of molecular biology and genetics. By performing in silico whole genome analysis, we demonstrate
that both the number of miRNAs that target genes found in CNV regions and the number of miRNA-binding sites in those genes
are significantly higher than the corresponding numbers for genes found in non-CNV regions. This suggests that miRNAs may
have acted as equilibrators of gene expression during evolution, regulating aberrant gene expression and increasing the
tolerance to genome plasticity.
© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 28(9):2421–2424. 2011 doi:10.1093/molbev/msr078
Advance Access publication March 25, 2011
Letter
MicroRNAs (miRNAs) and copy number variations (CNVs)
are two newly discovered categories of genomic elements
that have changed the way we view the diploid genome
and added a further order of complexity to gene
expression and regulation. They have also increased our
understanding of the plasticity of the human and other
mammalian genomes.
miRNAs belong to the most abundant class of small
RNAs in animals (Farazi et al. 2008). They are a recently discovered class of eukaryotic, endogenous, noncoding RNAs that
play a key role in the regulation of gene expression. Recent
studies show that about 30% of human protein-coding
genes are regulated by one or more miRNAs. They are short
single-stranded RNA molecules (~21–23 nt) that usually
are partially complementary to one or more messenger
RNA (mRNA) molecules (target mRNAs) (Farazi et al.
2008). Their main function is to downregulate gene expression by inhibiting translation or by targeting the mRNA for
degradation or deadenylation (Wu et al. 2006; Borel and
Antonarakis 2008).
CNVs refer to genomic regions with segmental duplications that have been recognized after systematic
comparative genomic hybridizations on the DNA of
healthy human subjects and are defined as DNA segments
that are 1 kb or larger in size and are present at significant
frequency in the population (Bartel 2004; Iafrate et al. 2004;
Sebat et al. 2004). The exact boundaries and allele number of
CNVs can hardly be estimated by current methods, adding
to the complexity of their role in gene expression and disease (Dear 2009; Henrichsen et al. 2009). In addition, only
a few studies are available for assessing the role of CNVs in
gene expression, and there is significant evidence that
a number of genes located within CNVs have an increased
expression level (Stranger et al. 2007; Henrichsen et al.
2009; Schuster-Bockler et al. 2010).
But what drives evolution for CNVs and miRNAs? With
regard to CNVs, there are two main hypotheses: 1) CNVs
are somehow beneficial and directional selection increases
their frequency in the genome, or 2) CNVs are only
slightly deleterious or neutral, so they remain in the genome
because they are difficult to eliminate or because of genetic drift phenomena. It is very interesting that different gene classes are
not randomly included in CNV regions. CNV regions are
significantly enriched with sensory receptor genes (Nozawa
et al. 2007), in contrast to dosage-sensitive genes,
which are significantly underrepresented in CNV regions
(Schuster-Bockler et al. 2010). Additionally, Nguyen et al.
(2008) found indications of reduced purifying selection
in human CNV regions. In contrast
to CNVs, miRNAs are much easier to align and compare
across species because of their small size. Multiple alignments
show their great conservation, something that underlines
their important role in the fine-tuning of gene expression (Liu
et al. 2008).
We hypothesize here that for CNVs and miRNA
target sites, a coevolutionary process may exist that configures, at least partly, their distribution in the different
genomes. It is possible that an evolutionary mechanism
regulates the number of miRNAs and miRNA target sites
in genes found within CNV regions. For example, increased
expression of genes that are located in duplicated regions
may be counteracted and equilibrated by an increased number of miRNA target sites in the 3′ UTRs of those genes or possibly
by an increased number of miRNAs targeting those genes.
To test our hypothesis, we utilized the miRWalk database
(http://www.ma.uni-heidelberg.de/apps/zmf/mirwalk/) and
the TargetScan algorithm (http://www.targetscan.org/) to
predict the potential number of targeting miRNAs and
the number of miRNA-binding sites on the 3′ UTR of various
genes that are found either in CNV or non-CNV regions. The miRWalk algorithm searches for seeds based on Watson–Crick
complementarity, walking along the complete sequence of
a gene starting with a heptamer (seven-nucleotide) seed
of the miRNA sequence. As soon as it identifies a perfectly base-paired heptamer, it extends the length of the
miRNA seed until a mismatch arises. It then returns all
possible hits with seven or more matches and assigns
the predictions to four regions: promoter
region, 5′ UTR, coding sequence, and 3′ UTR (Shahi
et al. 2006). In this study, we concentrated only on the
predicted miRNA-binding sites within the 3′ UTR of
genes in the entire human genome. TargetScan, on the other hand, takes into account not only the sequence homology
between the miRNA sequence and target mRNA sequence
but also integrates thermodynamics-based modeling of
miRNA–mRNA interactions and comparative sequence
analysis to predict miRNA targets conserved across multiple
genomes (Lewis et al. 2003). The use of two independent and
different methods improves the reliability of the predictions
made.
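The seed search described above can be illustrated with a minimal Python sketch. It is not the miRWalk implementation: the function names, the choice of miRNA positions 2–8 as the starting heptamer, and the let-7a example sequence are assumptions made here purely for illustration. The sketch looks for the reverse complement of the seed in a 3′ UTR and then extends each perfect match along the miRNA until the first mismatch.

```python
# Watson-Crick pairs only; G:U wobble pairs are deliberately not allowed.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=1, min_len=7):
    """Return (utr_start, match_length) for every perfect seed match.

    seed_start is the 0-based index of the first seed nucleotide in the
    miRNA (1 means positions 2-8, an assumption of this sketch).
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start:seed_start + min_len]
    site = reverse_complement(seed)

    hits = []
    pos = utr.find(site)
    while pos != -1:
        length = min_len
        m = seed_start + min_len  # next unpaired miRNA nucleotide
        u = pos - 1               # UTR base that would pair with it
        # Pairing is antiparallel: moving 3' along the miRNA means
        # moving 5' along the UTR. Stop at the first mismatch.
        while m < len(mirna) and u >= 0 and COMPLEMENT.get(mirna[m]) == utr[u]:
            length += 1
            m += 1
            u -= 1
        hits.append((u + 1, length))
        pos = utr.find(site, pos + 1)  # continue, allowing overlapping sites
    return hits


# Illustrative input: the let-7a sequence and a made-up UTR fragment.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(find_seed_sites(LET7A, "GGGGCUACCUCGGGG"))  # [(4, 7)]
```

Counting the hits returned for each miRNA over a gene's 3′ UTR gives the two quantities used in this letter: the number of miRNAs with at least one hit and the total number of binding sites.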
We analyzed two categories of genes of the entire
human genome: 1) all human genes (12,705) that are encoded within non-CNV regions, downloaded from NCBI,
and 2) all human genes (9,673) that belong to a CNV region
that is registered in the “Database of Genomic Variants” as
a copy number variation (http://projects.tcag.ca/variation/) (Iafrate
et al. 2004). The number of predicted miRNAs and the
number of their target sites were determined for each gene
using miRWalk and TargetScan, and the means were
compared between the two categories of genes
mentioned above.
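The per-gene bookkeeping behind this comparison can be sketched as follows. This is a hedged illustration, not the pipeline used for the letter: the input layout (one tuple per predicted binding site), the function names, and the toy records are all hypothetical. For each gene it counts the distinct targeting miRNAs and the total binding sites, and it derives the ratio of sites per miRNA; a helper then reports the mean and standard error of the mean (SEM) for a group of genes.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def per_gene_summary(predictions):
    """Summarize predictions given as (gene, mirna, site_position) tuples.

    Genes without any predicted site do not appear in the result; they
    would have to be added with zero counts if they should be averaged in.
    """
    sites = defaultdict(lambda: defaultdict(int))
    for gene, mirna, _position in predictions:
        sites[gene][mirna] += 1

    summary = {}
    for gene, by_mirna in sites.items():
        n_mirnas = len(by_mirna)          # distinct miRNAs targeting the gene
        n_sites = sum(by_mirna.values())  # all binding sites on the gene
        summary[gene] = {
            "mirnas": n_mirnas,
            "sites": n_sites,
            "sites_per_mirna": n_sites / n_mirnas,
        }
    return summary


def mean_and_sem(values):
    """Return (mean, SEM) for a sequence with at least two values."""
    values = list(values)
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical toy records, for illustration only.
toy = [
    ("GENE1", "miR-a", 10),
    ("GENE1", "miR-a", 50),
    ("GENE1", "miR-b", 20),
    ("GENE2", "miR-a", 5),
]
summary = per_gene_summary(toy)
print(summary["GENE1"])
print(mean_and_sem(s["mirnas"] for s in summary.values()))
```

Running the same summary separately on the CNV and non-CNV gene lists yields the group means and SEMs that a t-test then compares.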
Figure 1a summarizes the results for the two classes of genes. The number of miRNAs
predicted to regulate genes found in CNV regions of the
genome is significantly higher (P value < 0.0001) than
that for genes found in non-CNV regions, as determined
by both miRWalk and TargetScan.
Because miRNAs can regulate gene expression by binding to the same gene at more than one target site, we also
determined the total number of miRNA-binding sites in
each of the above genes. As with the number of miRNAs,
the total number of binding sites is significantly higher (P
value < 0.0001) in CNV genes than in non-CNV genes,
as predicted by both algorithms (fig. 1b). It should be noted
that the difference in the absolute values of miRNAs and
miRNA-binding sites as calculated by the two algorithms
can be attributed to the difference in the prediction methodologies used.
To address the possibility that the increase in the
total number of miRNA-binding sites in CNV genes is independent of the number of miRNAs, we compared the
mean number of binding sites per miRNA molecule for
CNV and non-CNV genes. As shown in figure 1c, the mean
number of binding sites per miRNA in CNV genes is higher
than that in non-CNV genes, as determined by both
miRWalk (P value < 0.0001) and TargetScan (P value <
0.05). Collectively, these results demonstrate that genes
found in CNV regions of the human genome are targeted
by more miRNA molecules and at the same time have
more miRNA-binding sites than genes found in non-CNV
regions. These results cannot be attributed to a difference
in the length of the respective 3′ UTRs in CNV versus non-CNV genes. When we compared the lengths of the 3′
UTRs in human and chimpanzee CNV genes versus non-CNV genes,
we found them to be very similar, with no statistically significant
difference (human CNVs vs. non-CNVs: 953.5 ± 11.11 standard error of the mean [SEM] vs. 932 ± 11.54 SEM; chimpanzee CNVs vs. non-CNVs: 976.2 ± 60.76 SEM vs. 1005 ±
14.82 SEM; Student's t-test shows no statistical
significance).
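The 3′ UTR length comparison can be checked approximately from the published summary values alone. The sketch below is not the authors' procedure: because the per-gene lengths are not given, it assumes an unequal-variance (Welch-type) standard error built from the two SEMs and a two-sided normal approximation for the P value. The function name is hypothetical, and the approximation is less reliable for the chimpanzee groups, whose sample sizes are not stated.

```python
from math import erfc, sqrt


def compare_from_summary(mean1, sem1, mean2, sem2):
    """Return (t, p) for two independent group means given as mean and SEM.

    t divides the difference of means by sqrt(sem1^2 + sem2^2); p is a
    two-sided normal approximation, an assumption suited to large samples.
    """
    t = (mean1 - mean2) / sqrt(sem1 ** 2 + sem2 ** 2)
    p = erfc(abs(t) / sqrt(2))
    return t, p


# 3' UTR lengths reported in the text, as (mean, SEM) for CNV vs. non-CNV genes.
human_t, human_p = compare_from_summary(953.5, 11.11, 932.0, 11.54)
chimp_t, chimp_p = compare_from_summary(976.2, 60.76, 1005.0, 14.82)
print(f"human: t = {human_t:.2f}, p = {human_p:.2f}")
print(f"chimpanzee: t = {chimp_t:.2f}, p = {chimp_p:.2f}")
```

Under these assumptions both comparisons give an absolute t value well below 2, in line with the absence of a significant difference reported above.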
These results of the in silico analysis support our original
hypothesis. But what is the explanation for these results?
It appears that on the evolutionary timescale, as genes were
duplicated in the genome, organisms adapted to the
higher levels of protein expression by the concurrent
accumulation of binding sites for new miRNAs on the
3′ UTR of those genes and, at the same time, by increasing
the number of miRNA-binding sites per miRNA molecule.
To support this notion, we performed a comparative analysis of the mean number of miRNAs per
gene in human versus chimpanzee CNV genes. As shown in
figure 1d, the number of miRNAs in human CNV genes
is significantly higher than that in chimpanzee CNV genes
(miRWalk and TargetScan). This is an indication that
the number of miRNAs targeting a gene increases after
the formation of CNVs, because genes in CNVs formed
in the human lineage are targeted by a significantly higher
number of miRNAs than the genes found within CNVs
in chimpanzee for which data are available. We propose
that through this analysis, we may be witnessing an evolutionary interaction between two widespread genomic
elements, an evolutionary process that may also have
operated between other genomic elements. This
phenomenon may be more evident for particular gene
families or classes that are more sensitive to dosage
and expression deregulation.
As discussed above, there are two possible explanations for the wide genomic presence of CNVs. This
may be a third one. Recently, Ha et al. (2009) showed
that small RNAs produced during interspecific
mating or polyploidization serve as a buffer against
genomic shock in interspecific hybrids and allopolyploids. The work of Ha et al. and our in silico analysis
show that miRNAs (and perhaps other small RNAs) have
an equilibrating role for genomic dosage phenomena.
Importantly, Lehnert et al. (2009) showed that sense
Alu sequences are enriched for miRNA target sites.
Even more noteworthy is the work
by Li et al. (2008), who found that miRNA targets are
significantly enriched for paralogous genes and who
also note that their results suggest that “miRNA-mediated
regulation plays an important role in the regulatory circuits involving duplicated genes including adjusting imbalanced dosage effects of gene duplicates and
possibly creating a mechanism for genetic buffering.”
Hence, this may be an evolutionary phenomenon pertaining to genomic repeats (Li et al. 2008).
FIG. 1. (a) Mean number of miRNAs per gene as determined by miRWalk (left) and TargetScan (right) in non-CNV genes (97.95 ± 0.89 SEM;
68.03 ± 0.65 SEM) and CNVs (126.80 ± 1.17 SEM; 72.56 ± 0.69 SEM) in the whole genome. (b) Mean number of miRNA-binding sites
per gene as determined by miRWalk (left) and TargetScan (right) in non-CNV genes (138.7 ± 1.49 SEM; 85.73 ± 1.01 SEM) and CNVs
(185.7 ± 2.05 SEM; 92.89 ± 1.08 SEM) in the whole genome. (c) Average of the mean binding sites per miRNA as determined by miRWalk
(left) and TargetScan (right) in non-CNV genes (1.28 ± 0.001 SEM; 1.14 ± 0.003 SEM) and CNVs (1.32 ± 0.002 SEM; 1.16 ± 0.002
SEM) in the whole genome. (d) Mean number of miRNA-binding sites per gene as determined by miRWalk (left) and TargetScan (right) in
human CNV genes (128.2 ± 4.62 SEM; 73.17 ± 2.55 SEM) and chimpanzee CNV genes (71.59 ± 2.51 SEM; 12.40 ± 0.59 SEM).
Comparison of means was performed by t-test using the SPSS statistical software. Data represent means of total miRNA number per gene. The
symbols (*) and (***) denote a significant difference between the means, with a P value < 0.05 and < 0.0001, respectively.
We suggest that miRNAs may have arisen under evolutionary pressure as a mechanism for increasing
tolerance to, and dealing with, genome plasticity. Alternatively, it is possible that CNVs that are targeted by miRNAs
had a higher chance to segregate or be fixed due to the
buffering effects of the miRNAs. This may be an additional,
underestimated role of miRNAs. Additionally, the results
from this whole human genome analysis support the
notion that the overall main molecular contribution of
miRNAs is to downregulate gene expression with the
aim of conserving physiology and homeostasis. A recent
study hypothesized that miRNAs play a role in canalizing gene expression, that is, in stabilizing a phenotype
within a species (Wu et al. 2009). According to the authors, miRNAs may
do so through a dual function, tuning and buffering of gene expression, which means that they assist in presetting a mean
level of gene expression while also reducing the variance
around this mean. This is turning out to be an essential
means of achieving homeostasis in living organisms, and
our results are in agreement with this hypothesis.
We wish to close this report by emphasizing that our
hypothesis does not in any way annul the true physiological
and required role of genes that have been found or will be
found to function in more than two doses generated from
the dogmatic two copies in diploid organisms. Our approach has been a holistic and statistical one that identified
a trend consistent with the hypothesized coevolutionary history of CNVs and miRNAs, in a fashion that is similar to
the one proposed by other recent works (Li et al. 2008;
Lehnert et al. 2009). The application of similar analysis
to other mammalian species, together with other high-throughput and
wet-lab approaches, should enable more sophisticated and demanding analysis aimed at elucidating their
complete evolutionary role in gene regulation.
Acknowledgments
This work was funded by programs from the Cyprus Research Promotion Foundation, DIDACTOR/DISEK/0308/07
to K.F., and NEW INFRASTRUCTURE/STRATEGIC/0308/24
to C.D.
References
Bartel DP. 2004. MicroRNAs: genomics, biogenesis, mechanism, and
function. Cell 116:281–297.
Borel C, Antonarakis SE. 2008. Functional genetic variation of human
miRNAs and phenotypic consequences. Mamm Genome. 19:
503–509.
Dear PH. 2009. Copy-number variation: the end of the human
genome? Trends Biotechnol 27:448–454.
Farazi TA, Juranek SA, Tuschl T. 2008. The growing catalog of small
RNAs and their association with distinct Argonaute/Piwi family
members. Development 135:1201–1214.
Ha M, Lu J, Tian L, Ramachandran V, Kasschau KD, Chapman EJ,
Carrington JC, Chen X, Wang XJ, Chen ZJ. 2009. Small RNAs serve
as a genetic buffer against genomic shock in Arabidopsis
interspecific hybrids and allopolyploids. Proc Natl Acad Sci U S A.
106:17835–17840.
Henrichsen CN, Chaignat E, Reymond A. 2009. Copy number variants,
diseases and gene expression. Hum Mol Genet. 18:R1–R8.
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y,
Scherer SW, Lee C. 2004. Detection of large-scale variation in the
human genome. Nat Genet. 36:949–951.
Lehnert S, Van Loo P, Thilakarathne PJ, Marynen P, Verbeke G,
Schuit FC. 2009. Evidence for co-evolution between human
microRNAs and Alu-repeats. PLoS One. 4:e4456.
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. 2003.
Prediction of mammalian microRNA targets. Cell 115:787–798.
Li J, Musso G, Zhang Z. 2008. Preferential regulation of duplicated
genes by microRNAs in mammals. Genome Biol. 9:R132.
Liu N, Okamura K, Tyler DM, Phillips MD, Chung WJ, Lai EC. 2008.
The evolution and functional diversification of animal microRNA
genes. Cell Res. 18:985–996.
Nguyen DQ, Webber C, Hehir-Kwa J, Pfundt R, Veltman J,
Ponting CP. 2008. Reduced purifying selection prevails over
positive selection in human copy number variant evolution.
Genome Res. 18:1711–1723.
Nozawa M, Kawahara Y, Nei M. 2007. Genomic drift and copy
number variation of sensory receptor genes in humans. Proc
Natl Acad Sci U S A. 104:20421–20426.
Schuster-Bockler B, Conrad D, Bateman A. 2010. Dosage sensitivity
shapes the evolution of copy-number varied regions. PLoS One.
5:e9474.
Sebat J, Lakshmi B, Troge J, et al. (21 co-authors). 2004. Large-scale
copy number polymorphism in the human genome. Science
305:525–528.
Shahi P, Loukianiouk S, Bohne-Lang A, Kenzelmann M, Kuffer S,
Maertens S, Eils R, Grone HJ, Gretz N, Brors B. 2006. Argonaute–
a database for gene regulation by mammalian microRNAs.
Nucleic Acids Res. 34:D115–D118.
Stranger BE, Forrest MS, Dunning M, et al. (17 co-authors). 2007.
Relative impact of nucleotide and copy number variation on
gene expression phenotypes. Science 315:848–853.
Wu CI, Shen Y, Tang T. 2009. Evolution under canalization and the
dual roles of microRNAs: a hypothesis. Genome Res. 19:734–743.
Wu L, Fan J, Belasco JG. 2006. MicroRNAs direct rapid deadenylation
of mRNA. Proc Natl Acad Sci U S A. 103:4034–4039.