BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS. VOL 8. NO 6. 407–423 doi:10.1093/bfgp/elp038

Pervasive transcription of the eukaryotic genome: functional indices and conceptual implications

Marcel E. Dinger, Paulo P. Amaral, Timothy R. Mercer and John S. Mattick

Advance Access publication date 21 September 2009

Abstract

Genome-wide analyses of the eukaryotic transcriptome have revealed that the majority of the genome is transcribed, producing large numbers of non-protein-coding RNAs (ncRNAs). This surprising observation challenges many assumptions about the genetic programming of higher organisms and how information is stored and organized within the genome. Moreover, the rapid advances in genomics have given little opportunity for biologists to integrate these emerging findings into their intellectual and experimental frameworks. This problem has been compounded by the perception that genome-wide studies often generate more questions than answers, which in turn has led to confusion and controversy. In this article, we address common questions associated with the phenomenon of pervasive transcription and consider the indices that can be used to evaluate the function (or lack thereof) of the resulting ncRNAs. We suggest that many lines of evidence, including expression profiles, conservation signatures, chromatin modification patterns and examination of increasing numbers of individual cases, argue in favour of the widespread functionality of non-coding transcription. We also discuss how informatic and experimental approaches used to analyse protein-coding genes may not be applicable to ncRNAs and how the general perception that protein-coding genes form the main informational output of the genome has resulted in much of the misunderstanding surrounding pervasive transcription and its potential significance.
Finally, we present the conceptual implications of the majority of the eukaryotic genome being functional and describe how appreciating this perspective will provide considerable opportunity to further understand the molecular basis of development and complex diseases.

Keywords: non-coding RNA; functional RNA; pervasive transcription; gene definition

Corresponding author. John S. Mattick, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia. Tel: +61-7-3346-2079; Fax: +61-7-3346-2111; E-mail: [email protected]

Marcel Dinger is a Senior Research Officer at the Institute for Molecular Bioscience at the University of Queensland, Australia. His research focuses on unravelling the functions of long non-coding RNAs in mammalian development and disease. Paulo Amaral is a PhD student at the Institute for Molecular Bioscience at the University of Queensland, Australia. His research focuses on the role of non-coding RNA in the epigenetic control of differentiation and development. Tim Mercer is a Research Officer at the Institute for Molecular Bioscience at the University of Queensland, Australia. His research focuses on the evolution and expression of long non-coding RNAs and their function in the brain. John Mattick is a Professor of Molecular Biology and Australian Research Council Federation Fellow at the Institute for Molecular Bioscience at the University of Queensland, Australia. His research focuses on the role of non-coding RNAs in the evolution and development of complex organisms, and the emergence of cognition in humans.

© The Author 2009. Published by Oxford University Press. For permissions, please email: [email protected]

INTRODUCTION

Each technical advance in examining the eukaryotic transcriptome has revealed further layers of complexity. Our understanding of the structure of the genome has shifted accordingly, from a simple model in which each gene (with its adjacent cis-regulatory sequences) comprises a discrete unit that yields a messenger RNA encoding a single protein, to a model in which a gene is a blurred entity that encompasses a complex network of protein-coding and non-coding transcripts with both proximal and distal regulatory elements. Historically, new insights into the complexity of the transcriptome, such as the existence of alternative splicing, antisense transcripts and microRNAs, were initially identified as idiosyncratic phenomena, then later recognized as common or typical features. In every case this recognition has involved a debate about the degree to which each new phenomenon has general relevance. Transcriptomic studies based on the analysis of cDNA libraries [1], 5′-CAGE tags [2] and genome tiling arrays [3–5] have shown that, surprisingly, the vast majority of the eukaryotic genome is dynamically transcribed in one setting or another, mostly as non-protein-coding RNAs (ncRNAs). Estimates of the transcribed fraction range from 70% in the nematode worm to 85% in the fruit fly and 93% in humans. It is also clear that much of the genome is transcribed on both strands, and that many protein-coding loci also express antisense RNAs [6]. Moreover, recent data show that many human genes (two-thirds of those examined) contain previously unrecognized first exons (and associated promoters) located at great distances (an average of 186 kb; median 85 kb) upstream of the currently annotated start sites; the resulting transcripts are expressed in a tissue-specific manner and often span neighbouring genes [7], with similar results in fruit fly [8, 9]. Transcriptomic studies have also revealed intergenic splicing and fusion of transcripts that may be derived from widely separated locations, including different chromosomes [10–14], and various types of post-splicing processing that produce smaller or variant transcripts [15–22] (Figure 1).
These observations build a picture of the genome as a semi-continuous information system that is extraordinarily sophisticated and highly regulated, producing complex suites of RNA with structural, catalytic, protein-coding and regulatory potential in different cells and in different developmental contexts. In parallel, it is becoming increasingly apparent that the traditional concept of the gene as a discrete unit of (generally) protein-coding information with cis-regulatory elements is simplistic and misleading when applied to the genomes or genetics of higher organisms. Although these unexpected non-protein-coding transcripts were initially discounted as artefacts of cDNA library preparation, many independent observations have shown that they are prevalent in vivo. However, the proportion of this pervasive transcription that is functional is unknown and has been the subject of ongoing debate, with contrasting suggestions that it simply represents transcriptional ‘noise’ [23–26] or that it represents a hidden layer of regulatory information affecting gene expression, especially epigenetic status, during differentiation and development [27, 28].

Figure 1: Overview of pervasive transcription and its implications for the gene concept. A representation of a traditional gene (boxed) shown in context with associated coding and non-coding transcripts identified in various transcriptomic analyses. The highlighted sections exemplify how experimental approaches targeting regions within one ‘gene’ may inadvertently target other intersecting transcripts and thereby yield confounding results. Abbreviations: PASRs, promoter-associated small RNAs; TALRs, terminal-associated long RNAs; tiRNAs, tiny RNAs.

Some indices are consistent with function, such as the dynamic and precise expression patterns of these transcripts during differentiation [29–31], but others appear to be inconsistent with function.
The latter include the observations that the fraction of non-coding DNA (ncDNA) considered to be conserved is significantly smaller than the fraction that is transcribed, that many ncRNAs are transient and present only at low levels, and finally that, if functionally relevant, these RNAs should have been discovered before now. However, these considerations are largely conceived within a paradigm based on the properties of protein-coding sequences, which has influenced expectations for functional conservation and expression levels, and dominated the focus and interpretation of genetic screens [32]. In this article, we address key questions concerning the significance of pervasive transcription in eukaryotic genomes and propose that assumptions about functional indices based on the characteristics of protein-coding information do not necessarily apply to non-coding transcribed regions. We present the argument that a large fraction, perhaps the majority, of non-coding transcription is functional and consider how this affects our understanding of the information content of the genome.

MOST PERVASIVE TRANSCRIPTION OCCURS AT LOW LEVELS: CAN LOW-LEVEL EXPRESSION BE MEANINGFUL?

Most transcription across the genome occurs at low levels relative to that of protein-coding genes. Indeed, in the absence of normalization strategies to enrich for less abundant transcripts [1, 2] or array-based interrogation methods that can detect low-abundance transcripts [3, 4], it appeared in early studies that the RNA repertoire of cells was composed (apart from common infrastructural ncRNAs such as ribosomal and transfer RNAs) almost entirely of mRNAs, with any remainder simply discounted as degradation products or incomplete cloned fragments (‘expressed sequence tags’ or ESTs) of protein-coding transcripts.
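The effect of sampling depth on the detection of rare transcripts follows from a simple binomial argument. The sketch below is a minimal illustration under stated assumptions: clones are drawn independently and at random from an unnormalized library, and cloning and sequencing biases are ignored. The function names and the abundance values are hypothetical and are not taken from the studies cited.

```python
import math


def detection_probability(abundance: float, n_clones: int) -> float:
    """Probability that a transcript is sampled at least once when n_clones
    clones are drawn at random from a library in which it makes up the
    fraction `abundance` of all clones (assumes independent random draws)."""
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must be a fraction between 0 and 1")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    return 1.0 - (1.0 - abundance) ** n_clones


def clones_needed(abundance: float, target: float = 0.95) -> int:
    """Smallest number of random clones giving at least `target` probability
    of sampling the transcript at least once."""
    if not 0.0 < abundance < 1.0:
        raise ValueError("abundance must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Solve 1 - (1 - a)^n >= target for n; log1p keeps precision for small a.
    return math.ceil(math.log1p(-target) / math.log1p(-abundance))


if __name__ == "__main__":
    # Hypothetical rare transcript: one copy per million transcripts.
    rare = 1e-6
    print(f"{detection_probability(rare, 10_000):.4f}")  # chance within 10 000 clones
    print(clones_needed(rare))  # clones needed for a 95% chance of one hit
```

Under these assumptions, a transcript present at one copy per million has only about a 1% chance of appearing among 10 000 random clones, and roughly three million clones would be needed for a 95% chance of sampling it once, which illustrates why normalization and array-based methods were needed to reveal such transcripts.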
The conclusion that most of the transcriptome specified mRNAs was also influenced by the use of purification strategies based on the presence of polyA tails to reduce contamination by infrastructural RNAs, but array-based methods subsequently revealed that 44% of transcripts are not polyadenylated, and that this fraction is largely different in sequence composition from the polyadenylated RNAs [3]. Additionally, significant numbers of transcripts derived from RNA polymerase III promoters escape capture by traditional methods [33]. The existence of pervasive expression of the genome has only recently been established and, like the discovery of introns, has come as a surprise. This in turn has prompted concerns that this transcription may simply be background noise, a view informed by the perception that most of the genome is composed of evolutionary debris and reinforced by the expectation that the low expression of the non-coding transcripts reflects non-function. However, these perceptions rest on the assumptions that proteins transact most genetic information and that the functional efficacy of ncRNAs can be quantified on the same scale as that of protein-coding transcripts, which is not reasonable if their mode of action is different. Messenger RNAs must be present at levels sufficient to produce the required amounts of protein, which are high for proteins that have core structural or metabolic functions and are common to most cell types [34]. In contrast, regulatory signals, like hormones and many transcription factors, may be expressed only in small amounts (or only in particular cells, see below), yet can trigger a cascade with amplified downstream effects. For example, an ncRNA that interacts directly with the genome at a unique locus would need to be present only at a 1:1 ratio with each allele (i.e.
just one or two copies per cell), especially since many regulatory RNAs appear to act locally, for example to alter local chromatin architecture [28], in which case diffusion is less of a problem. Moreover, if an ncRNA targets a regulatory gene that in turn controls other genes in a network, profound changes in expression programs could result. Indeed, there is accumulating evidence that transcription factors and other global regulators are prevalent targets for different classes of ncRNAs [29, 35–38] and that small variations in the expression of these factors can have dramatic phenotypic consequences (see e.g. [39]). In addition, ncRNAs, especially those involved in regulating differentiation and development, might be expected to be expressed only in particular places or in restricted subsets of cell populations, giving the impression of low overall levels in a tissue or in the organism as a whole. For instance, one of the few genetically identified microRNAs (miRNAs) in Caenorhabditis elegans, lsy-6, which determines left-right asymmetry in taste-receptor neurons, is expressed in a very limited subset of neurons and was initially very difficult to verify biochemically [40]. Such restricted expression appears to be common. A large fraction of long ncRNAs (849 out of 1328 examined) are expressed in the brain and are easily detectable by in situ hybridization in particular cells in, for example, the hippocampus, cortex or cerebellum [30], but they comprise only a tiny fraction of all transcripts in the brain as a whole (Figure 2).

Figure 2: Specific expression of a long ncRNA in the adult mouse brain. In situ hybridization image of the whole brain (top-left) showing the expression of ncRNA AK011437 in the CA1 region of the hippocampus (top-right). The false colour image (bottom) highlights the specificity of the expression. Image courtesy of the Allen Institute for Brain Science.
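The dilution argument can be made concrete with simple arithmetic: the abundance measured in bulk tissue is the copy number in expressing cells multiplied by the fraction of cells that express the transcript. The sketch below assumes that non-expressing cells contain no copies; the function names and all numbers are hypothetical illustrations, not measurements from [30] or [40].

```python
def bulk_copies_per_cell(copies_per_expressing_cell: float,
                         fraction_of_cells_expressing: float) -> float:
    """Mean copies per cell measured in a bulk sample when a transcript is
    present only in a subset of cells and absent from all others."""
    if copies_per_expressing_cell < 0:
        raise ValueError("copy number must be non-negative")
    if not 0.0 <= fraction_of_cells_expressing <= 1.0:
        raise ValueError("fraction_of_cells_expressing must lie in [0, 1]")
    return copies_per_expressing_cell * fraction_of_cells_expressing


def fold_below_reference(reference_copies_per_cell: float,
                         copies_per_expressing_cell: float,
                         fraction_of_cells_expressing: float) -> float:
    """How many times less abundant the restricted transcript appears in bulk
    than a reference transcript expressed uniformly in every cell."""
    bulk = bulk_copies_per_cell(copies_per_expressing_cell,
                                fraction_of_cells_expressing)
    if bulk == 0:
        raise ValueError("transcript is absent from the bulk sample")
    return reference_copies_per_cell / bulk


if __name__ == "__main__":
    # Hypothetical: one copy per allele (two per cell) in 1% of cells,
    # compared with a uniformly expressed mRNA at 500 copies per cell.
    print(bulk_copies_per_cell(2, 0.01))
    print(fold_below_reference(500, 2, 0.01))
```

With these assumed values, a transcript at one copy per allele in 1% of cells averages 0.02 copies per cell in bulk, roughly 25 000-fold below the uniformly expressed mRNA, even though every expressing cell contains enough for a locus-specific interaction.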
Restricted expression has likewise obscured small RNAs, as strikingly illustrated by piwi-interacting RNAs (piRNAs), small RNAs of about 30 nt that are highly enriched in germline cells. piRNAs were unknown until very recently, when they were identified in testis by specific immunoprecipitation, despite being present in sufficient amounts to be visible on an agarose gel [41]. Finally, it should be noted that, despite the general trend, some long ncRNAs are actually highly expressed [42].

NON-CODING RNAs ARE OFTEN MORE RAPIDLY DEGRADED THAN mRNAs: DOES THAT SUGGEST THEY ARE JUST NOISE THAT THE CELL IS TRYING TO DAMPEN?

The rapid degradation of some non-coding transcripts is often cited as indicative of lack of function [43]. However, the ready detection of many ncRNAs by Northern blot, microarray or in situ hybridization analyses indicates that there is an operational homeostasis between their synthesis and degradation, and that many are relatively stable. Nonetheless, knockdown of exosome components in Arabidopsis and human cells has revealed an additional hidden layer of hundreds of ncRNAs [44, 45] that are normally rapidly degraded by RNA surveillance programs [46], such as the 3′→5′ exonucleases of the exosome [47] or the so-called nonsense-mediated decay (NMD) pathway [48]. Although this hidden layer is targeted by NMD [43], increasing evidence suggests that not all of these transcripts are actually ‘nonsense’. The transcripts rapidly degraded by the exosome, known in yeast as ‘cryptic unstable transcripts’ (CUTs) [49], are non-randomly distributed in the genome and are often associated with promoters [50], suggesting an involvement in gene regulation. Indeed, several CUTs have been ascribed roles in the regulation of histone modifications, with subsequent effects on gene expression [51]. For example, in yeast, a cryptic antisense transcript delays chromatin remodelling and the subsequent recruitment of RNA polymerase II to the sense PHO5 gene [52].
Similarly, the expression of two antisense cryptic transcripts was shown to induce histone deacetylase activity that resulted in repression of the sense PHO84 gene [53]. It has also been shown that these long antisense RNAs, when ectopically expressed, can act in trans to mediate transcriptional silencing by targeting different histone deacetylases specifically to the PHO84 promoter [54]. Moreover, CUTs play a broad role in limiting the dispersion of repetitive elements in the yeast genome: a cryptic antisense RNA was recently found to silence the sense Ty1 repetitive element in trans through the action of the Set1 histone methyltransferase [55]. Although much of this analysis is currently restricted to yeast, whose simplicity and experimental tractability have favoured the quantitative description of non-coding transcription and its stability, it seems likely that these phenomena are widespread in eukaryotes. Similarly, given the range of key processes affected by NMD [56], it is possible that NMD is not simply a mechanism to degrade aberrant mRNAs, but rather a sophisticated system for modulating transcript stability that can discriminate between protein-coding and regulatory RNAs. Such a system could involve uncharacterized mechanisms controlling ncRNA processing and metabolism, as indicated by the involvement of NMD and nuclear degradation pathways in the regulation of Xist, whose role in mammalian dosage compensation is well established: these pathways are essential for the up-regulation of spliced Xist transcripts at the onset of X-inactivation [57]. The biological potency of relatively short-lived transcripts can also be rationalized from a mechanistic perspective, in a similar fashion to low-level transcription. Unlike mRNAs, which need to persist long enough to be exported and undergo several rounds of translation, a nuclear-acting ncRNA can exert its function immediately.
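The kinetic side of this argument can be made explicit with the simplest model of transcript turnover: constant synthesis at rate k_syn and first-order degradation at rate k_deg, so that the steady-state level is k_syn/k_deg and the time needed to approach a new steady state depends only on k_deg. The sketch below assumes this textbook model rather than anything measured in the studies cited; the function names and half-lives are hypothetical.

```python
import math


def half_life_to_rate(half_life: float) -> float:
    """First-order degradation rate constant for a given half-life."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.log(2) / half_life


def level(t: float, k_syn: float, k_deg: float, x0: float = 0.0) -> float:
    """Transcript level at time t for dx/dt = k_syn - k_deg * x with x(0) = x0
    (constant synthesis, first-order degradation)."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    x_ss = k_syn / k_deg  # steady state: synthesis balances degradation
    return x_ss + (x0 - x_ss) * math.exp(-k_deg * t)


def response_time(k_deg: float, fraction: float = 0.5) -> float:
    """Time to move `fraction` of the way to a new steady state after a step
    change in synthesis; it depends only on the degradation rate."""
    if k_deg <= 0 or not 0.0 < fraction < 1.0:
        raise ValueError("need k_deg > 0 and 0 < fraction < 1")
    return -math.log(1.0 - fraction) / k_deg


if __name__ == "__main__":
    # Hypothetical half-lives in minutes: an unstable ncRNA and a stable mRNA.
    for name, half_life in [("unstable ncRNA", 5.0), ("stable mRNA", 480.0)]:
        k_deg = half_life_to_rate(half_life)
        print(name, round(response_time(k_deg, 0.9), 1), "min to reach 90%")
```

Under this model, a transcript with a 5-minute half-life reaches 90% of a new steady state, or falls to 10% of its previous level once synthesis stops, 96 times faster than one with an 8-hour half-life, whatever its synthesis rate; maintaining the same steady-state level with faster turnover simply requires proportionally faster synthesis.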
Indeed, this can represent an intrinsic advantage of RNA regulation, given that RNA signals can be not only rapidly produced but also rapidly eliminated, allowing efficient and dynamic changes to the system. For example, non-coding transcripts associated with the CCND1 (cyclin D1) promoter are quickly induced by DNA damage and, even at low copy numbers, can recruit and locally activate the RNA-binding protein TLS to repress CCND1 expression [58]. Conversely, short-lived transcripts may also act as triggers for stably induced responses. This is exemplified by the p15 antisense RNA, p15AS, which induces silencing of the p15 tumour suppressor gene in leukaemia through heterochromatin formation [59]. Subsequent inhibition of p15AS does not release the silencing, suggesting that transient expression of the antisense transcript acts as a trigger to induce persistent epigenetic changes. Moreover, transcripts that target proteins to the genome are required only for the duration of the recruitment. Indeed, such a mechanism, in which ncRNAs are involved in the recruitment of regulatory proteins, has been proposed for centromeric RNAs [60, 61] and for promoter-associated transcripts that are targets for small RNAs, both of which undergo rapid turnover [62].

IS IT THE ACT OF TRANSCRIPTION OR THE TRANSCRIPT THAT IS FUNCTIONAL?

An alternative mechanism by which regulated expression of ncDNA might exert a biologically important function is through the act of transcription itself [25]. This may be illustrated by the fbp1 locus in Schizosaccharomyces pombe, where rare long transcripts initiated upstream of the fbp1 gene trigger a cascade of ncRNAs with concomitant alteration of the chromatin around the fbp1 promoter, allowing its efficient transcription [63].
In this case, the nucleosome rearrangement caused by RNA polymerase II activity is considered to be responsible for the local chromatin remodelling, and the resulting non-coding transcripts are thought to be mere by-products with no intrinsic function. A recent study suggests that non-coding transcription can remodel the expression landscape (‘ripples of transcription’) and thereby influence the expression of neighbouring loci [64]. Negative regulation of gene expression by transcriptional interference also occurs in yeast [65] and probably in higher eukaryotes [66], suggesting that ncDNA transcription per se is an important mechanism for both positive and negative influence on gene expression, a possibility supported by the observation that the promoters of non-coding transcripts are generally more conserved than the transcripts themselves [2, 67]. However, it is difficult to reconcile this essentially binary act with the complex range of epigenetic changes that have been documented at active and inactive gene loci [68, 69]. Moreover, in most cases where transcription of non-coding regions clearly has a regulatory role, such as at the β-globin enhancers [70] and imprinted DNA methylation sites in the germline [71], the function of the resulting transcripts has not been studied, and it remains an open possibility that they convey a regulatory signal. For example, the imprinted ncRNA Air, an unspliced 100 kb transcript whose transcription had been proposed to have a regulatory function [72], was recently shown to interact directly with histone methyltransferase complexes to repress neighbouring gene expression epigenetically in cis, thus regulating the imprinting of target genes [73].
In some cases both the transcription of a locus and the resulting transcripts may be functional, as observed at the dihydrofolate reductase (DHFR) locus, where transcription from an upstream minor promoter and the resulting ncRNA both contribute to DHFR repression [74]. This may also be the case for the Tsix-mediated inhibition of Xist expression. Indeed, although a role for Air transcription itself has not been directly demonstrated and the transcript has been shown to act in trans [73], a number of observations have led to the proposal that both mechanisms could operate at this locus: the transcript regulating imprinting of the genes that Air transcription does not overlap (Slc22a2 and Slc22a3), and transcription itself regulating the gene it does overlap (Igf2r) (see ref. [75] for discussion and additional references). Finally, nascent transcripts may also serve as targets that recruit effector factors, as in the previous examples of regulatory functions of the centromeric and promoter-associated RNAs [60–62]. These observations highlight the need for careful experimental design to distinguish the functions of transcription per se from those of the resulting transcripts.

NON-CODING TRANSCRIPTS ARE POORLY CONSERVED: DOESN’T THIS INDICATE THEY ARE NOT UNDER SELECTION AND THEREFORE NON-FUNCTIONAL?

When considered collectively, ncRNAs are on average less conserved in sequence than mRNAs, although their conservation levels are similar to those of pre-mRNA transcripts [67]. Nevertheless, examination of various signatures of selection, including those in promoters, primary sequences, splice site motifs [67] and secondary structures [76–78], reveals that many long ncRNAs do not evolve neutrally [67]. Furthermore, thousands of loci that express novel long spliced ncRNAs, identified by chromatin marks characteristic of transcription initiation and elongation in intergenic regions of the mouse and human genomes, exhibit detectable conservation [36, 79].
Moreover, the most highly conserved sequences in the vertebrate genome are almost exclusively non-coding [80, 81], and many of them are transcribed in a regulated manner that is disrupted in cancer [82]. Since several known functional long ncRNAs, such as Xist and Air, are poorly conserved, it is evident that relative lack of conservation does not necessarily signify lack of function and that long ncRNAs are under different evolutionary constraints from protein-coding genes [83]. Indeed, novel intergenic ncRNAs predicted in Drosophila species on the basis of splice-site conservation alone were shown to be developmentally regulated and expressed in different species, suggesting that they represent functional transcripts [84]. The secondary structure of RNA can likewise be maintained despite changes to the primary sequence, because compensatory substitutions that alter both partners of a base pair preserve pairing, and because insertions can be tolerated. Accordingly, many regions of eukaryotic genomes are predicted to be conserved at the level of RNA secondary structure [76–78]. For example, analysis of conserved RNA secondary structures identified more than 30 000 potential structural RNA elements in the human genome, many of which overlap known sites of transcription [78]. Similarly, comparative analysis of seven yeast species identified 2800 genomic loci with signatures of evolutionarily conserved RNA secondary structure, a substantial subset of which occurred in non-coding and antisense regions [85]. The structure–function relationships of regulatory sequences, including regulatory RNAs, can also be quite different from those of proteins, which are essential, often multifunctional components of cells with precise three-dimensional requirements, features that are reflected in their different conservation characteristics.
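The point about compensatory substitutions can be illustrated by checking whether two aligned sequences can both form the base pairs of a shared secondary structure. The sketch below is a minimal illustration, not the structure-conservation methods used in [76–78] or [85]; it assumes pre-aligned, equal-length sequences, a structure given in dot-bracket notation, and Watson–Crick plus G·U wobble pairs. The function names and hairpin sequences are hypothetical.

```python
from __future__ import annotations

# Watson-Crick pairs plus G-U wobble pairs.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
                 ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions in two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def compare_structure(seq_a: str, seq_b: str, structure: str) -> tuple[int, int, int]:
    """Return (total pairs, pairs formed in both sequences, compensatory pairs).

    A pair counts as compensatory when both of its positions differ between
    the two sequences yet each sequence still forms an allowed pair there."""
    if not len(seq_a) == len(seq_b) == len(structure):
        raise ValueError("sequences and structure must have equal length")
    pairs = base_pairs(structure)
    conserved = compensatory = 0
    for i, j in pairs:
        if (seq_a[i], seq_a[j]) in ALLOWED_PAIRS and (seq_b[i], seq_b[j]) in ALLOWED_PAIRS:
            conserved += 1
            if seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]:
                compensatory += 1
    return len(pairs), conserved, compensatory


if __name__ == "__main__":
    # Hypothetical hairpin: a 4-bp stem closed by a UUCG loop.
    structure = "((((....))))"
    seq_a = "GGCAUUCGUGCC"
    seq_b = "CGCGUUCGCGCG"
    print(identity(seq_a, seq_b))
    print(compare_structure(seq_a, seq_b, structure))  # (4, 4, 2)
```

Here the two hypothetical sequences are identical at only 8 of 12 positions, yet all four stem pairs are preserved, two of them through compensatory changes at both positions of the pair; a sequence-identity filter alone would understate the conservation of the structure.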
Moreover, most evolutionary innovation in higher organisms, and evolutionary selection for altered expression patterns that underpin phenotypic radiation, occurs in regulatory sequences [86], which can and do evolve flexibly and rapidly [87–91]. Indeed, regions of the human genome that are conserved among vertebrates but have undergone recent evolutionary change relative to chimpanzee lie mainly in non-coding sequence, some of which is transcribed [92, 93]. These include the ncRNA HAR1, which has undergone rapid evolutionary change in humans and is specifically expressed in Cajal–Retzius cells in the human neocortex [93]. The observation that many functionally validated RNAs are evolving quickly [83, 90] may reflect more plastic structure–function constraints on these sequences, and we may expect considerable evolutionary innovation to occur in them. Indeed, many RNAs are lineage-specific, such as Xist [94] and Air [95] in eutherian mammals and transcribed Alu sequences in primates [96], and some organisms, such as Leishmania [97] and Dictyostelium [98], even have whole classes of ncRNAs specific to them. Similar observations have been made in bacteria, in which functional RNAs are poorly conserved [68] and can be specific to certain strains [99]. Finally, lineage-specific phenomena, such as DNA elimination [100] and whole-genome rearrangements in ciliate reproduction, can be guided by ncRNAs [100, 101], suggesting the evolution of a whole mechanistic infrastructure based on RNA regulation.

MUCH OF THE MAMMALIAN GENOME IS REPETITIVE: EVEN IF IT IS TRANSCRIBED, WHAT EVIDENCE IS THERE TO SUGGEST THAT REPETITIVE SEQUENCE CAN BE FUNCTIONAL?

A major criticism of the proposal that much of the mammalian genome is functional is that a significant proportion, almost half in human and mouse, is composed of transposon-derived sequences.
Along with duplication, transposition is a major force in genome evolution [102, 103], and it is increasingly implicated in the emergence of regulatory innovations such as imprinting [104, 105]. Nevertheless, transposon-derived sequences per se have been widely considered to be (mainly) non-functional and not major contributors to genome function. This may well be an incorrect assumption, and one that has also biased our understanding of how much of the genome is conserved, because transposon-derived sequences are used as the index sequences for assessing the background rate of ‘neutral evolution’ [87]. This issue is the subject of intense debate, with conflicting evidence and/or interpretations thereof. For example, the apparent uniformity of indel distribution in transposon-derived sequences suggests that these sequences are not under selection, but such analyses did not consider different subclasses of transposon-derived sequences or examine the indel distribution in regulatory sequences (such as promoters) that have different structure–function constraints from protein-coding exons or miRNAs. On the other hand, mammalian-wide interspersed repeats (MIRs), which occupy 1–2% of the genome (300 000 copies) and date back 130 million years, show lower than expected divergence from the mammalian MIR consensus. Moreover, this divergence is similar in human and mouse, whereas neutrally evolving ancient repeats should be twice as divergent in mouse, suggesting that MIRs are subject to selection [106]. These elements also exhibit variable conservation across their length, with a relatively conserved 70-nt central region containing a 15–25 nt highly conserved core, also indicative of selection [106, 107]. Alu elements, which occupy 10.5% of the human genome, likewise have a conserved core [108] and a non-random distribution that suggests positive selection [109]. The full regulatory potential of transposon-derived sequences is unknown, but it appears to include a range of functions [87, 110].
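The logic of the MIR argument can be sketched as a comparison of divergence from a consensus in two lineages. The code below is a minimal illustration, not the analysis of [106]; it assumes pre-aligned copies, a simple p-distance with a Jukes-Cantor correction, and the expectation stated in the text that neutrally evolving ancient repeats should be about twice as divergent in mouse as in human. The function names and sequences are hypothetical.

```python
import math


def p_distance(seq: str, consensus: str) -> float:
    """Proportion of aligned, ungapped positions at which `seq` differs from
    `consensus`; '-' marks an alignment gap."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped in this simple measure
        compared += 1
        differences += (a != b)
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a p-distance for multiple substitutions."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to apply")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_divergence(copies, consensus) -> float:
    """Mean corrected divergence of repeat copies from their consensus."""
    return sum(jukes_cantor(p_distance(c, consensus)) for c in copies) / len(copies)


def mouse_to_human_ratio(mouse_copies, human_copies, consensus) -> float:
    """Ratio to compare with the ~2-fold neutral expectation stated in the text."""
    human = mean_divergence(human_copies, consensus)
    if human == 0:
        raise ValueError("human copies show no divergence from the consensus")
    return mean_divergence(mouse_copies, consensus) / human


if __name__ == "__main__":
    # Hypothetical 10-bp consensus and one aligned copy from each genome.
    consensus = "ACGTACGTAC"
    print(mouse_to_human_ratio(["ACGTTCGTAA"], ["ACGTTCGTAC"], consensus))
```

With these toy sequences the ratio is slightly above 2. A ratio close to 2 would fit the neutral expectation described in the text, whereas similar divergence in both genomes (a ratio near 1), as reported for MIRs [106], is the pattern on which the selection argument rests.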
Transposon-derived sequences are largely transcribed in a regulated manner and contain promoters that drive specific expression [111]. Important roles have also been demonstrated for Alu-derived RNAs in the regulation of RNA polymerase II during heat shock [112] and in the regulation of alternative splicing, translation and mRNA stability [113]. Similarly, it has been shown that transcripts derived from retrotransposons can regulate chromatin structure in transposon-rich regions such as centromeres and neocentromeres [114, 115], that LINE L1 retrotransposition can mediate somatic mosaicism in neuronal precursor cells [116], and that transcription of inverted repeats that serve as boundary elements can influence gene expression [117]. Moreover, a recent study has shown that 6–30% of cap-selected mouse and human RNA transcripts initiate within repetitive elements, that approximately 250 000 of these transcripts are generally tissue-specific, and that transposon-derived sequences located immediately upstream of protein-coding loci frequently function as alternative promoters and/or express non-coding RNAs, identifying some 23 000 candidate regulatory regions derived from retrotransposons [111]. In addition, repetitive sequences may be included as parts of larger transcripts, including many ncRNAs whose functions have been demonstrated, such as Xist, Air, Kcnq1ot1, BORG, DISC2, NTT and Xlsirts, suggesting that these elements may be functional modules common among ncRNAs [118] as well as mRNAs [119].

IF MANY NON-CODING REGIONS IN THE GENOME ARE FUNCTIONAL, WHY HAVE THEY NOT BEEN DETECTED IN GENETIC SCREENS?

The widely held assumption that protein-coding genes are the main functional outputs of the genome was classically supported by genetic screens, which identified genes associated with modified phenotypes, most of which had mutations that resulted in defective proteins.
This assumption is largely a consequence of a historical emphasis, both technical and phenotypic, on protein-coding genes, as well as of assumptions about the basis of regulatory mutations (for review and full discussion see [32]). However, it is now evident that most complex genetic and epigenetic phenomena in eukaryotes are RNA-directed [27]. These include RNA interference-related processes such as transcriptional and post-transcriptional gene silencing [114, 120–123], position effect variegation [124, 125], hybrid dysgenesis [126], chromosome dosage compensation, parental imprinting and allelic exclusion [127], germ cell reprogramming [128], paramutation [129, 130] and possibly transvection and transinduction [32]. The recent recognition that ncRNAs function in various aspects of cell biology has prompted a reconsideration of functional polymorphisms located in non-coding regions. Most genome-wide association studies of disease susceptibility have mapped the associated single nucleotide polymorphisms (SNPs) to non-coding regions, a number of which overlap expressed ncRNAs [131]. For example, SNPs that identified a susceptibility locus for myocardial infarction mapped to a long ncRNA, MIAT [132]. Similarly, a region associated with coronary artery disease [133] encompasses a long ncRNA, ANRIL, which is associated with a high-risk haplotype for coronary artery disease and is expressed in tissues and cell types affected by atherosclerosis [133]. However, the complex and pervasive networks of non-coding transcription within these regions can make it particularly difficult to elucidate the functional effects of polymorphisms. For example, a SNP that lies both within the truncated form of ZFAT and in the promoter of an antisense transcript increases ZFAT expression not by increasing mRNA stability but by repressing expression of the antisense transcript [134].
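The annotation difficulty described above can be illustrated with a strand-agnostic overlap query: a variant assigned only to the nearest protein-coding gene would miss an antisense promoter that it also disrupts. The sketch below assumes 0-based, half-open coordinates; the `Feature` class, feature names and coordinates are hypothetical, loosely modelled on the ZFAT case rather than taken from [134].

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str    # e.g. "mRNA", "antisense ncRNA", "promoter"
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def features_at(position: int, features: list[Feature]) -> list[Feature]:
    """All features, on either strand, whose interval contains `position`.

    A linear scan is enough for a sketch; genome-scale annotation would use an
    interval tree or a sorted index instead."""
    return [f for f in features if f.start <= position < f.end]


if __name__ == "__main__":
    # Hypothetical locus: a sense mRNA overlapped by an antisense ncRNA whose
    # promoter lies inside the mRNA.
    locus = [
        Feature("gene_X", "mRNA", "+", 1_000, 5_000),
        Feature("antisense_RNA", "antisense ncRNA", "-", 2_000, 3_000),
        # For a minus-strand transcript the promoter lies at higher coordinates.
        Feature("antisense_promoter", "promoter", "-", 3_000, 3_200),
    ]
    for snp in (2_500, 3_100, 6_000):
        hits = features_at(snp, locus)
        print(snp, [f"{f.name} ({f.kind}, {f.strand})" for f in hits] or "no feature")
```

In this toy locus the variant at position 3100 falls within both the sense mRNA and the antisense promoter, so an annotation that reports only the nearest protein-coding gene would miss the candidate mechanism.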
Further association studies and the coming era of personal genomics will likely reveal many more examples of natural polymorphisms within functional non-coding transcripts.

IN THE CURRENT ABSENCE OF EXTENSIVE FUNCTIONAL STUDIES, WHAT EVIDENCE IS THERE TO SUGGEST PERVASIVE NON-CODING TRANSCRIPTION IS GENERALLY FUNCTIONAL?

The number of specific studies on these newly identified ncRNAs is limited. To date, there are only 40 documented examples of mammalian ncRNAs whose function has been verified, mainly by siRNA- or shRNA-mediated knockdown (see ref. [32]). We anticipate an avalanche of such studies in the coming years, with in vivo strategies also being used to determine the biological processes in which individual ncRNAs may be involved (see e.g. [135]). Nonetheless, the currently validated sample is still too small to support firm conclusions about the entire population of ncRNAs, although some themes are beginning to emerge, notably the role of many long ncRNAs in regulating chromatin structure [28, 79, 136] (see below). Despite limited functional data on individual ncRNAs, there is substantial genome-wide evidence that non-coding transcripts do not arise merely from leaky transcription. As with mRNAs, the structure of the non-coding transcriptome is largely consistent across tissues and cell types, with ESTs and tiling array signals showing coincident start and termination sites; this indicates discrete transcriptional units, some of which are conserved across species [2, 5, 35, 67]. The observation that the majority of the genome is transcribed should not be confused with the fraction transcribed in any individual cell type, which is estimated at 15% [3].
Moreover, new evidence from strand-specific transcriptome analysis in yeast indicates that a significant proportion (13%) of all RNAs exhibit differential expression between alleles on both strands, indicating that transcription of both protein-coding and non-coding loci is regulated on all four strands of a diploid genome [137]. There are also many general characteristics of ncRNAs that point to their intrinsic functionality [138]. These include: (i) the conservation of their promoters, splice junctions, exons, predicted structures, genomic position and expression patterns [2, 29, 35, 36, 67, 76–78, 83, 139–141]; (ii) their dynamic expression and alternative splicing during differentiation [29, 36, 142]; (iii) their altered expression or splicing patterns in cancer and other diseases [82, 143–152]; (iv) their association with particular chromatin signatures that are indicative of actively transcribed genes [29, 36]; (v) their regulation by key morphogens and transcription factors [29, 36, 152, 153]; and (vi) their tissue- and cell-specific expression patterns and subcellular localization [30, 152, 154–161]. Taken on its own, the developmental and tissue-specific expression of most ncRNAs provides perhaps the most compelling case for their widespread functionality. A study of ncRNAs expressed in mouse brain by in situ hybridization showed that the majority (623 out of 849) are selectively expressed in discrete functional regions of the brain, sometimes with evidence of specific subcellular localization [30]. Moreover, expression signatures and dynamic regulation of hundreds of ncRNAs have been observed across tissue types [38, 162, 163] and in various developmental systems, from Drosophila embryogenesis [156, 163] to differentiation of mammalian ES cells [29], T-cells [31] and muscle cells [160].
Large-scale alterations in the expression of ncRNAs are observed in cancers, indicating disrupted regulation and a possible involvement in disease [149, 164]. Importantly, many of these regulated transcripts, the so-called ‘macroRNAs’, span several thousand bases of the genome [5, 165], indicating that not only is transcription initiation regulated, but elongation is also highly processive rather than abortive, as might be expected of random initiation events. The rapidly increasing number of individual ncRNAs implicated in various biological processes, through a vast range of mechanisms [138] from DNA replication [144, 166] to VDJ recombination [167], suggests that we are likely only at the beginning of discovering the biological capabilities and functional repertoires of RNA. While these may be isolated cases involving one or a few special ncRNAs, they may also represent precedents for general mechanisms by which other RNAs (perhaps even families of RNAs) function. As an example, gene silencing directed by Xist was recently shown to involve recruitment of chromatin silencing proteins [168], in particular Polycomb complexes, which induce histone methylation to regulate the expression of a large cohort of genes. Intriguingly, Polycomb has subsequently been found to be recruited to several loci by other ncRNAs, including RepA, Air, Kcnq1ot1 and Hotair, suggesting a general mechanism whereby specific ncRNAs direct the action of Polycomb to different sites in the genome. Indeed, a recent study found that a high proportion (estimated at 24%) of long intergenic ncRNAs associate with Polycomb Repressive Complex 2 (PRC2) and can affect the expression of target genes, with many additional ncRNAs associated with other chromatin-modifying complexes [79].
In addition, ncRNAs that are induced during embryonic stem cell differentiation are associated with activated chromatin and chromatin-activating complexes [29], and it seems that the regulation of epigenetic processes may be a major function of ncRNAs [28, 136]. Many classes of eukaryotic effector proteins have RNA-binding domains, including chromatin-modifying proteins [28], transcription factors [169] and even proteins involved in membrane signal transduction pathways [170], suggesting a general infrastructure that can interact with, and potentially be regulated by, RNA. Some of these proteins bind specific populations of RNAs, and such RNA-binding propensity has been used to identify regulatory RNAs, such as the roX RNAs that regulate dosage compensation in Drosophila [171]. Finally, many cellular processes and subcellular structures, such as chromatin [172, 173], the nuclear matrix [174] and the origin recognition complex [166], are RNase-sensitive, indicating that they require RNA components. Although most of these RNA components are yet to be identified, the demonstration of fundamental roles for ncRNAs in nuclear structures such as paraspeckles [160, 175, 176] and in other previously unrecognized domains [159] sets a precedent for further roles of ncRNAs in cell biology.

HOW WILL IT BE POSSIBLE TO TEST WHETHER LARGE FRACTIONS OF NON-CODING RNAs ARE FUNCTIONAL?

A definitive identification of all biologically relevant ncRNAs in eukaryotic genomes may not be achievable in the foreseeable future. Nevertheless, recent years have witnessed an explosion in the number of reports of functional non-coding transcripts [32, 138, 177]. With increased awareness of ncRNA functionality and the inclusion of ncRNAs in genome-wide screens, it is likely that many more functional transcripts will be defined, and that a better quantification will subsequently become possible.
Estimating the proportion of ncRNAs that are functional may be more feasible in simple organisms. Dozens of regulatory RNAs have been found in bacteria over the past two decades, including antisense transcripts and a diversity of plasmid-encoded and intergenic small regulatory RNAs [178]. The regulatory RNA repertoire can vary considerably between microbial groups [178, 179] and, although most of these RNAs are still to be characterized, the roles of many have been examined with the aid of the well-established genetic and biochemical tools available in model bacterial systems. Although the non-coding space in bacterial genomes is very limited compared with that of eukaryotes, bacterial RNAs show an exquisite diversity of mechanisms of action and are widely employed in adaptation to specific environmental conditions, indicating that regulation by ncRNAs has been exploited since very early in evolutionary history. Similarly, many ncRNAs, such as the aforementioned CUTs, have been discovered in yeast [49], whose simplicity and experimental tractability have favoured the quantitative description of non-coding transcription and its stability. In contrast, ncRNAs in complex eukaryotes are considerably more abundant and, given the recent discovery of pervasive transcription, investigation of their potential functions and biological roles is in its infancy. The field has so far concentrated on cataloguing the repertoire of RNAs [180], distinguishing them from protein-coding genes [181] and assessing their expression in different systems, all of which are important first steps towards testing their possible functionality and identifying the biological processes in which they may be involved [42]. Moreover, although the involvement of ncRNAs in many processes in eukaryotic cells has been demonstrated, our understanding of their modes of action is presently poor. Some ncRNAs are involved in complex phenomena that have so far proven challenging to dissect.
This is exemplified by the first transcript recognized as an ncRNA in mammalian cells, H19, which was discovered 20 years ago [182] and has since been implicated in many genetic phenomena, but whose mechanisms of action remain unknown [183]. Our evolving awareness of ncRNAs will likely prompt the development of novel tools to study their roles. The discovery of RNA interference, for instance, has given molecular biologists the very powerful tool of siRNA-mediated knockdown for functional analyses. Such tools may similarly be employed in large-scale approaches, such as systematic siRNA knockdowns to identify ncRNAs involved in specific functions or diseases [8]. Likewise, sequencing the RNAs associated with particular protein complexes will be useful for broadly defining classes of RNAs functionally implicated in biological processes. Nevertheless, it remains likely that in most cases the elucidation of novel mechanisms will require ad hoc characterization of individual RNAs.

HOW DOES PERVASIVE NON-CODING TRANSCRIPTION AFFECT THE TRADITIONAL CONCEPT OF A GENE?

Traditionally, a gene is described as a sequence of DNA that occupies a specific location on a chromosome and determines a particular characteristic of an organism. There is no a priori problem in considering a genomic locus that encodes an ncRNA as a ‘gene’ [184]. The gene concept can also be expanded to encompass splice variants and post-transcriptional processing in cases where the alternative products still contribute to the same characteristic. However, in cases where the alternative product has a distinct function, such as a tissue-specific splice variant that lacks a transmembrane domain and consequently has a different cellular localization and function, the concept of a gene becomes troubled.
Further challenges to the gene concept arise from factors that blur the boundaries of genes, such as variable transcription start sites, variable locations of promoter sequences that may be active in different types of cells, and variable polyadenylation sites. The occurrence of overlapping and antisense regulatory ncRNAs that may be expressed differently from the associated protein-coding ‘gene’ further confounds the gene concept [29, 30], as does the co-expression of multiple small RNAs [18, 19, 185, 186]. This blurring has led to genes in higher organisms being described as ‘fuzzy transcription clusters with multiple products’ [187], although descriptors for these clusters and their products, as well as hierarchically structured lexicons, are problematic. More recent data describing complex pervasive transcription across the genome [2, 3, 5, 185, 188] further challenge even this fuzzier conception of a gene [184, 189, 190]. The observation that many transcripts cover huge areas of the genome, and may include exons from very distal locations that traverse ‘introns’ that themselves harbour other ‘genes’ specifying other proteins or ncRNAs, which may be separately expressed, makes the idea of a gene as a discrete locus untenable. Moreover, as such overlapping suites of genetic information (including distal promoters and enhancers) are likely to form contigs that ultimately span whole chromosomes, it becomes impossible to separate one gene as an entity from another. Finally, the discovery of interactions between genes, as in the case of transcripts generated by ‘fusions’ of different genomic regions or trans-spliced to generate chimeras [14, 191–193], cannot be rationalized within existing conceptualizations of a gene.
This revives the original, phenotype-centred focus of the definition of a gene, although in this case the operational unit is a final transcriptional product rather than a fixed genomic coordinate [189]. Together, these observations also prompt a revision of our understanding of genome architecture, away from a linear model and towards a concept that allows any combination of subsequences from the genome to be incorporated into a protein-coding or non-coding product. This suggests that the information content of the genome is far greater, and its organization far more sophisticated, than previously imagined. This in turn implies that most of the genome specifies a continuum of RNA regulatory information that may be intimately involved in the ontogeny of complex organisms, by controlling the expression of proteins and their alternative isoforms. However, a much more comprehensive description of the genome and its relationship to the transcriptome will be necessary before the practicalities of such a revision can be meaningfully proposed. In the interim, at the very least, it will be prudent to consider carefully the genomic context of any ‘gene’ when designing and interpreting experiments that target gene subsequences, such as microarray, in situ hybridization or RNA interference experiments. Such experiments are often presumed to provide data representative of the entire gene, but may actually be inadvertently affecting or measuring other overlapping transcriptional products (Figure 1).

FINAL CONSIDERATIONS: IN UNCHARTED TERRITORY

It is now clear that RNAs have much greater structural and functional versatility than was assumed only a few years ago. Considering this potential, it has been argued that extensive non-coding transcription primarily provides a cache of RNA molecules that can eventually evolve useful functions [23].
While this may indeed represent a mechanism for generating novel regulatory RNAs, it is unknown what fraction of the extant transcriptome falls into this reservoir category; conversely, it is quite possible that most ncRNAs have already evolved functions, in many cases in a lineage-specific manner. Indeed, it has been proposed that the increasing extent of transcribed non-coding RNA may provide an important expansion of regulatory information underpinning the developmental and cognitive complexity and the phenotypic diversity of animals, which have similar sets of protein-coding genes but exhibit large increases in their non-coding genomic sequences as their complexity increases [27, 194, 195]. To clarify the matter, it will be important to elucidate the processes of genesis, fixation and functionalization of non-coding transcription, in particular the differences between the evolutionary forces that shape non-coding sequences and those that shape protein-coding genes. The proportion of pervasive transcription that is functional remains an open question. Given the sheer amount of ncRNA transcription in higher eukaryotes, even if only a small fraction (let alone a majority) has evolved functions, there could be thousands of new genetic loci that have escaped our attention and may hold the key to understanding complex processes in eukaryotic biology, especially in relation to development and neural function [27, 196]. Research on ncRNAs may also lead to the discovery of important molecules that, like protein-coding genes, may be used as tools and targets in biotechnology and therapeutics [197]. Given how remarkably fertile ncRNA research has been in recent years, there appears to be a whole universe in the genome waiting to be explored. The focusing of so-called ‘genome-wide’ studies on protein-coding regions can be misleading to biologists who are not closely acquainted with the genomics field.
The assumption that an understanding of the protein-coding subset of the genome and its products will equate to a complete picture of the molecular basis of development, cognition or disease is not sound, and indeed may be arbitrarily diverting attention from many other genetic factors underlying particular phenotypes. Recent advances in technology make experiments that investigate these other factors easier to design, and it will be sensible henceforth to treat any transcript, or indeed any part of the genome, as being potentially functional.

Key Points
• Transcriptomic analysis reveals prevalent transcription across the eukaryotic genome.
• The functional relevance of pervasive transcription is widely debated.
• Many indices of function, including dynamic expression profiles, conservation signatures, splicing and chromatin modification patterns, and subcellular localization, suggest the general functionality of non-coding transcription.
• Pervasive transcription challenges traditional conceptions of the gene and genetic information, and has important implications for the design and interpretation of experiments investigating gene function.

References
1. Okazaki Y, Furuno M, Kasukawa T, et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002;420:563–73.
2. Carninci P, Kasukawa T, Katayama S, et al. The transcriptional landscape of the mammalian genome. Science 2005;309:1559–63.
3. Cheng J, Kapranov P, Drenkow J, et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 2005;308:1149–54.
4. Kapranov P, Cheng J, Dike S, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 2007;316:1484–8.
5. Kapranov P, Drenkow J, Cheng J, et al. Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res 2005;15:987–97.
6. Katayama S, Tomaru Y, Kasukawa T, et al. Antisense transcription in the mammalian transcriptome. Science 2005;309:1564–6.
7. Denoeud F, Kapranov P, Ucla C, et al. Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res 2007;17:746–59.
8. Willingham AT, Orth AP, Batalov S, et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 2005;309:1570–3.
9. Manak JR, Dike S, Sementchenko V, et al. Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nat Genet 2006;38:1151–8.
10. Li H, Wang J, Mor G, et al. A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science 2008;321:1357–61.
11. Li X, Zhao L, Jiang H, et al. Short homologous sequences are strongly associated with the generation of chimeric RNAs in eukaryotes. J Mol Evol 2009;68:56–65.
12. Akiva P, Toporik A, Edelheit S, et al. Transcription-mediated gene fusion in the human genome. Genome Res 2006;16:30–6.
13. Unneberg P, Claverie JM. Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data. PLoS ONE 2007;2:e254.
14. Maher CA, Palanisamy N, Brenner JC, et al. Chimeric transcript discovery by paired-end transcriptome sequencing. Proc Natl Acad Sci USA 2009;106:12353–8.
15. Fejes-Toth K, Sotirova V, Sachidanandam R, et al. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 2009;457:1028–32.
16. Rodriguez A, Griffiths-Jones S, Ashurst JL, et al. Identification of mammalian microRNA host genes and transcription units. Genome Res 2004;14:1902–10.
17. Berezikov E, van Tetering G, Verheul M, et al. Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 2006;16:1289–98.
18. Kiss T. SnoRNP biogenesis meets Pre-mRNA splicing. Mol Cell 2006;23:775–6.
19. Okamura K, Hagen JW, Duan H, et al. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 2007;130:89–100.
20. Taft RJ, Glazov EA, Cloonan N, et al. Tiny RNAs associated with transcription start sites in animals. Nat Genet 2009;41:572–8.
21. Taft RJ, Glazov EA, Lassmann T, et al. Small RNAs derived from snoRNAs. RNA 2009;15:1233–40.
22. Ruby JG, Jan CH, Bartel DP. Intronic microRNA precursors that bypass Drosha processing. Nature 2007;448:83–6.
23. Brosius J. Waste not, want not—transcript excess in multicellular eukaryotes. Trends Genet 2005;21:287–8.
24. Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol 2007;14:103–5.
25. Chakalova L, Debrand E, Mitchell JA, et al. Replication and transcription: shaping the landscape of the genome. Nat Rev Genet 2005;6:669–77.
26. Babak T, Blencowe BJ, Hughes TR. A systematic search for new mammalian noncoding RNAs indicates little conserved intergenic transcription. BMC Genomics 2005;6:104.
27. Mattick JS. A new paradigm for developmental biology. J Exp Biol 2007;210:1526–47.
28. Mattick JS, Amaral PP, Dinger ME, et al. RNA regulation of epigenetic processes. Bioessays 2009;31:51–9.
29. Dinger ME, Amaral PP, Mercer TR, et al. Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res 2008;18:1433–45.
30. Mercer TR, Dinger ME, Sunkin SM, et al. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA 2008;105:716–21.
31. Pang KC, Dinger ME, Mercer TR, et al. Genome-wide identification of long noncoding RNAs in CD8+ T cells. J Immunol 2009;182:7738–48.
32. Mattick JS. The genetic signatures of noncoding RNAs. PLoS Genet 2009;5:e1000459.
33. Dieci G, Fiorino G, Castelnuovo M, et al. The expanding RNA polymerase III transcriptome. Trends Genet 2007;23:614–22.
34. Karlin S, Brocchieri L, Campbell A, et al. Genomic and proteomic comparisons between bacterial and archaeal genomes and related comparisons with the yeast and fly genomes. Proc Natl Acad Sci USA 2005;102:7309–14.
35. Engstrom PG, Suzuki H, Ninomiya N, et al. Complex loci in human and mouse genomes. PLoS Genet 2006;2:e47.
36. Guttman M, Amit I, Garber M, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 2009;458:223–7.
37. Makeyev EV, Zhang J, Carrasco MA, et al. The MicroRNA miR-124 promotes neuronal differentiation by triggering brain-specific alternative pre-mRNA splicing. Mol Cell 2007;27:435–48.
38. Nakaya HI, Amaral PP, Louro R, et al. Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol 2007;8:R43.
39. Kopp JL, Ormsbee BD, Desler M, et al. Small increases in the level of Sox2 trigger the differentiation of mouse embryonic stem cells. Stem Cells 2008;26:903–11.
40. Johnston RJ, Hobert O. A microRNA controlling left/right neuronal asymmetry in Caenorhabditis elegans. Nature 2003;426:845–9.
41. Girard A, Sachidanandam R, Hannon GJ, et al. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 2006;442:199–202.
42. Dinger ME, Pang KC, Mercer TR, et al. NRED: a database of long noncoding RNA expression. Nucleic Acids Res 2009;37:D122–6.
43. Kurihara Y, Matsui A, Hanada K, et al. Genome-wide suppression of aberrant mRNA-like noncoding RNAs by NMD in Arabidopsis. Proc Natl Acad Sci USA 2009;106:2453–8.
44. Chekanova JA, Gregory BD, Reverdatto SV, et al. Genome-wide high-resolution mapping of exosome substrates reveals hidden features in the Arabidopsis transcriptome. Cell 2007;131:1340–53.
45. Preker P, Nielsen J, Kammler S, et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 2008;322:1851–4.
46. Fasken MB, Corbett AH. Process or perish: quality control in mRNA biogenesis. Nat Struct Mol Biol 2005;12:482–8.
47. Houseley J, LaCava J, Tollervey D. RNA-quality control by the exosome. Nat Rev Mol Cell Biol 2006;7:529–39.
48. Isken O, Maquat LE. The multiple lives of NMD factors: balancing roles in gene and genome regulation. Nat Rev Genet 2008;9:699–712.
49. Wyers F, Rougemaille M, Badis G, et al. Cryptic pol II transcripts are degraded by a nuclear quality control pathway involving a new poly(A) polymerase. Cell 2005;121:725–37.
50. Davis CA, Ares M, Jr. Accumulation of unstable promoter-associated transcripts upon loss of the nuclear exosome subunit Rrp6p in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 2006;103:3262–7.
51. Johnson JM, Edwards S, Shoemaker D, et al. Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments. Trends Genet 2005;21:93–102.
52. Uhler JP, Hertel C, Svejstrup JQ. A role for noncoding transcription in activation of the yeast PHO5 gene. Proc Natl Acad Sci USA 2007;104:8011–6.
53. Camblong J, Iglesias N, Fickentscher C, et al. Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae. Cell 2007;131:706–17.
54. Camblong J, Beyrouthy N, Guffanti E, et al. Trans-acting antisense RNAs mediate transcriptional gene cosuppression in S. cerevisiae. Genes Dev 2009;23:1534–45.
55. Berretta J, Pinskaya M, Morillon A. A cryptic unstable transcript mediates transcriptional trans-silencing of the Ty1 retrotransposon in S. cerevisiae. Genes Dev 2008;22:615–26.
56. Neu-Yilik G, Kulozik AE. NMD: multitasking between mRNA surveillance and modulation of gene expression. Adv Genet 2008;62:185–243.
57. Ciaudo C, Bourdet A, Cohen-Tannoudji M, et al. Nuclear mRNA degradation pathway(s) are implicated in Xist regulation and X chromosome inactivation. PLoS Genet 2006;2:e94.
58. Wang X, Arai S, Song X, et al. Induced ncRNAs allosterically modify RNA-binding proteins in cis to inhibit transcription. Nature 2008;454:126–30.
59. Yu W, Gius D, Onyango P, et al. Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 2008;451:202–6.
60. Buhler M, Verdel A, Moazed D. Tethering RITS to a nascent transcript initiates RNAi- and heterochromatin-dependent gene silencing. Cell 2006;125:873–86.
61. Motamedi MR, Verdel A, Colmenares SU, et al. Two RNAi complexes, RITS and RDRC, physically interact and localize to noncoding centromeric RNAs. Cell 2004;119:789–802.
62. Han J, Kim D, Morris KV. Promoter-associated RNA is required for RNA-directed transcriptional gene silencing in human cells. Proc Natl Acad Sci USA 2007;104:12422–7.
63. Hirota K, Miyoshi T, Kugou K, et al. Stepwise chromatin remodelling by a cascade of transcription initiation of non-coding RNAs. Nature 2008;456:130–4.
64. Ebisuya M, Yamamoto T, Nakajima M, et al. Ripples from neighbouring transcription. Nat Cell Biol 2008;10:1106–13.
65. Martens JA, Laprade L, Winston F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 2004;429:571–4.
66. Mazo A, Hodgson JW, Petruk S, et al. Transcriptional interference: an unexpected layer of complexity in gene regulation. J Cell Sci 2007;120:2755–61.
67. Ponjavic J, Ponting CP, Lunter G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res 2007;17:556–65.
68. Barski A, Cuddapah S, Cui K, et al. High-resolution profiling of histone methylations in the human genome. Cell 2007;129:823–37.
69. Kouzarides T. Chromatin modifications and their function. Cell 2007;128:693–705.
70. Ashe HL, Monks J, Wijgerde M, et al. Intergenic transcription and transinduction of the human beta-globin locus. Genes Dev 1997;11:2494–509.
71. Chotalia M, Smallwood SA, Ruf N, et al. Transcription is required for establishment of germline methylation marks at imprinted genes. Genes Dev 2009;23:105–17.
72. Pauler FM, Koerner MV, Barlow DP. Silencing by imprinted noncoding RNAs: is transcription the answer? Trends Genet 2007;23:284–92.
73. Nagano T, Mitchell JA, Sanz LA, et al. The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 2008;322:1717–20.
74. Martianov I, Ramadass A, Serra Barros A, et al. Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 2007;445:666–70.
75. Mohammad F, Mondal T, Kanduri C. Epigenetics of imprinted long noncoding RNAs. Epigenetics 2009;4.
76. Torarinsson E, Sawera M, Havgaard JH, et al. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res 2006;16:885–9.
77. Torarinsson E, Yao Z, Wiklund ED, et al. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res 2008;18:242–51.
78. Washietl S, Hofacker IL, Lukasser M, et al. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol 2005;23:1383–90.
79. Khalil AM, Guttman M, Huarte M, et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci USA 2009;106:11667–72.
80. Bejerano G, Pheasant M, Makunin I, et al. Ultraconserved elements in the human genome. Science 2004;304:1321–5.
81. Stephen S, Pheasant M, Makunin IV, et al. Large-scale appearance of ultraconserved elements in tetrapod genomes and slowdown of the molecular clock. Mol Biol Evol 2008;25:402–8.
82. Calin GA, Liu CG, Ferracin M, et al. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell 2007;12:215–29.
83. Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet 2006;22:1–5.
84. Hiller M, Findeiss S, Lein S, et al. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res 2009;19:1289–300.
85. Steigele S, Huber W, Stocsits C, et al. Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions. BMC Biol 2007;5:25.
86. Carroll SB. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 2008;134:25–36.
87. Pheasant M, Mattick JS. Raising the estimate of functional human sequences. Genome Res 2007;17:1245–53.
88. Fisher S, Grice EA, Vinton RM, et al. Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science 2006;312:276–9.
89. Frith MC, Ponjavic J, Fredman D, et al. Evolutionary turnover of mammalian transcription start sites. Genome Res 2006;16:713–22.
90. Smith NG, Brandstrom M, Ellegren H. Evidence for turnover of functional noncoding DNA in mammalian genome evolution. Genomics 2004;84:806–13.
91. Taylor MS, Kai C, Kawai J, et al. Heterotachy in mammalian promoter evolution. PLoS Genet 2006;2:e30.
92. Pollard KS, Salama SR, King B, et al. Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2006;2:e168.
93. Pollard KS, Salama SR, Lambert N, et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature 2006;443:167–72.
94. Chow JC, Yen Z, Ziesche SM, et al. Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet 2005;6:69–92.
95. Sleutels F, Zwart R, Barlow DP. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 2002;415:810–3.
96. Liu GE, Alkan C, Jiang L, et al. Comparative analysis of Alu repeats in primate genomes. Genome Res 2009;19:876–85.
97. Dumas C, Chow C, Muller M, et al. A novel class of developmentally regulated noncoding RNAs in Leishmania. Eukaryot Cell 2006;5:2033–46.
98. Aspegren A, Hinas A, Larsson P, et al. Novel non-coding RNAs in Dictyostelium discoideum and their expression during development. Nucleic Acids Res 2004;32:4646–56.
99. Mandin P, Repoila F, Vergassola M, et al. Identification of new noncoding RNAs in Listeria monocytogenes and prediction of mRNA targets. Nucleic Acids Res 2007;35:962–74.
100. Liu Y, Taverna SD, Muratore TL, et al. RNAi-dependent H3K27 methylation is required for heterochromatin formation and DNA elimination in Tetrahymena. Genes Dev 2007;21:1530–45.
101. Nowacki M, Vijayan V, Zhou Y, et al. RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature 2008;451:153–8.
102. Zeh DW, Zeh JA, Ishida Y. Transposable elements and an epigenetic basis for punctuated equilibria. Bioessays 2009;31:715–26.
103. Oliver KR, Greene WK. Transposable elements: powerful facilitators of evolution. Bioessays 2009;31:703–14.
104. Pask AJ, Papenfuss AT, Ager EI, et al. Analysis of the platypus genome suggests a transposon origin for mammalian imprinting. Genome Biol 2009;10:R1.
105. Gehring M, Bubb KL, Henikoff S. Extensive demethylation of repetitive elements during seed development underlies gene imprinting. Science 2009;324:1447–51.
106. Silva JC, Shabalina SA, Harris DG, et al. Conserved fragments of transposable elements in intergenic regions: evidence for widespread recruitment of MIR- and L2-derived sequences within the mouse and human genomes. Genet Res 2003;82:1–18.
107. Smit AF, Riggs AD. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res 1995;23:98–102.
108. Jelinek WR, Toomey TP, Leinwand L, et al. Ubiquitous, interspersed repeated sequences in mammalian genomes. Proc Natl Acad Sci USA 1980;77:1398–402.
109. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004;431:931–45.
110. Jurka J. Conserved eukaryotic transposable elements and the evolution of gene regulation. Cell Mol Life Sci 2008;65:201–4.
111. Faulkner GJ, Kimura Y, Daub CO, et al. The regulated retrotransposon transcriptome of mammalian cells. Nat Genet 2009;41:563–71.
112. Mariner PD, Walters RD, Espinoza CA, et al. Human Alu RNA is a modular trans-acting repressor of mRNA transcription during heat shock. Mol Cell 2008;29:499–509.
113. Hasler J, Samuelsson T, Strub K. Useful ‘junk’: Alu RNAs in the human transcriptome. Cell Mol Life Sci 2007;64:1793–800.
114. Buhler M, Moazed D. Transcription and RNAi in heterochromatic gene silencing. Nat Struct Mol Biol 2007;14:1041–8.
115. Chueh AC, Northrop EL, Brettingham-Moore KH, et al. LINE retrotransposon RNA is an essential structural and functional epigenetic component of a core neocentromeric chromatin. PLoS Genet 2009;5:e1000354.
116. Muotri AR, Gage FH. Generation of neuronal variability and complexity. Nature 2006;441:1087–93.
117. Lunyak VV, Prefontaine GG, Nunez E, et al. Developmentally regulated activation of a SINE B2 repeat as a domain boundary in organogenesis. Science 2007;317:248–51.
118. Amaral PP, Mattick JS. Noncoding RNA in development. Mamm Genome 2008;19:454–92.
119. Peaston AE, Evsikov AV, Graber JH, et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 2004;7:597–606.
120. Mette MF, Aufsatz W, van der Winden J, et al. Transcriptional silencing and promoter methylation triggered by double-stranded RNA. EMBO J 2000;19:5194–201.
121. Pal-Bhadra M, Bhadra U, Birchler JA. RNAi related mechanisms affect both transcriptional and posttranscriptional transgene silencing in Drosophila. Mol Cell 2002;9:315–27.
122. Vaucheret H. Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev 2006;20:759–71.
123. Matzke M, Kanno T, Huettel B, et al. Targets of RNA-directed DNA methylation. Curr Opin Plant Biol 2007;10:512–19.
124. Pal-Bhadra M, Leibovitch BA, Gandhi SG, et al. Heterochromatic silencing and HP1 localization in Drosophila are dependent on the RNAi machinery. Science 2004;303:669–72.
125.
Singh J, Freeling M, Lisch D. A position effect on the heritability of epigenetic silencing. PLoS Genet 2008;4: e1000216. 126. Brennecke J, Malone CD, Aravin AA, et al. An epigenetic role for maternally inherited piRNAs in transposon silencing. Science 2008;322:1387–92. 127. Yang PK, Kuroda MI. Noncoding RNAs and intranuclear positioning in monoallelic gene expression. Cell 2007;128: 777–86. 421 128. Sasaki H, Matsui Y. Epigenetic events in mammalian germcell development: reprogramming and beyond. Nat Rev Genet 2008;9:129–40. 129. Chandler VL. Paramutation: from maize to mice. Cell 2007; 128:641–5. 130. Cuzin F. Induction by microRNAs of hereditary epigenetic modifications (paramutation) in the mouse. Proceedings of the 5th Colmar Symposium: The New RNA Frontiers. France: Colmar, 2007. 131. Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science 2008;322:881–8. 132. Ishii N, Ozaki K, Sato H, et al. Identification of a novel non-coding RNA, MIAT, that confers risk of myocardial infarction. J Hum Genet 2006;51:1087–99. 133. Pasmant E, Laurendeau I, Heron D, etal. Characterization of a germ-line deletion, including the entire INK4/ARF locus, in a melanoma-neural system tumor family: identification of ANRIL, an antisense noncoding RNA whose expression coclusters with ARF. Cancer Res 2007;67:3963–9. 134. Shirasawa S, Harada H, Furugaki K, et al. SNPs in the promoter of a B cell-specific antisense transcript, SASZFAT, determine susceptibility to autoimmune thyroid disease. Hum Mol Genet 2004;13:2221–31. 135. Bond AM, Vangompel MJ, Sametsky EA, et al. Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat Neurosci 2009;12: 1020–7. 136. Costa FF. Non-coding RNAs, epigenetics and complexity. Gene 2008;410:9–17. 137. Gagneur J, Sinha H, Perocchi F, et al. Genome-wide alleleand strand-specific expression profiling. Mol Syst Biol 2009;5: 274. 138. Amaral PP, Dinger ME, Mercer TR, et al. 
The eukaryotic genome as an RNA machine. Science 2008;319:1787–9. 139. Louro R, Smirnova AS, Verjovski-Almeida S. Long intronic noncoding RNA transcription: expression noise or expression choice? Genomics 2009;93:291–8. 140. Trinklein ND, Aldred SF, Hartman SJ, et al. An abundance of bidirectional promoters in the human genome. Genome Res 2004;14:62–6. 141. Tupy JL, Bailey AM, Dailey G, et al. Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster. Proc Natl Acad Sci USA 2005;102:5495–500. 142. Rinn JL, Kertesz M, Wang JK, et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 2007;129:1311–23. 143. Angeloni D, ter Elst A, Wei MH, et al. Analysis of a new homozygous deletion in the tumor suppressor region at 3p12.3 reveals two novel intronic noncoding RNA genes. Genes Chromosomes Cancer 2006;45:676–91. 144. Christov CP, Trivier E, Krude T. Noncoding human Y RNAs are overexpressed in tumours and required for cell proliferation. BrJ Cancer 2008;98:981–8. 145. Ji P, Diederichs S, Wang W, et al. MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 2003;22:8031–41. 146. Mutsuddi M, Marshall CM, Benzow KA, et al. The spinocerebellar ataxia 8 noncoding RNA causes neurodegeneration and associates with staufen in Drosophila. Curr Biol 2004;14:302–8. 422 Dinger et al. 147. Perez DS, Hoage TR, Pritchett JR, et al. Long, abundantly expressed non-coding transcripts are altered in cancer. Hum Mol Genet 2008;17:642–55. 148. Reis EM, Nakaya HI, Louro R, et al. Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer. Oncogene 2004;23: 6684–92. 149. Reis EM, Ojopi EP, Alberto FL, et al. Large-scale transcriptome analyses reveal new genetic marker candidates of head, neck, and thyroid cancer. Cancer Res 2005;65:1693–9. 150. 
Sonkoly E, Bata-Csorgo Z, Pivarcsi A, et al. Identification and characterization of a novel, psoriasis susceptibilityrelated noncoding RNA gene, PRINS. J Biol Chem 2005; 280:24159–67. 151. Thrash-Bingham CA, Tartof KD. aHIF: a natural antisense transcript overexpressed in human renal cancer and during hypoxia. J Natl Cancer Inst 1999;91:143–51. 152. Zhang X, Lian Z, Padden C, etal. A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster. Blood 2009;113:2526–34. 153. Cawley S, Bekiranov S, Ng HH, etal. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004;116:499–509. 154. Blin-Wakkach C, Lezot F, Ghoul-Mazgar S, et al. Endogenous Msx1 antisense transcript: in vivo and in vitro evidences, structure, and potential involvement in skeleton development in mammals. Proc Natl Acad Sci USA 2001;98:7336–41. 155. Brena C, Chipman AD, Minelli A, et al. Expression of trunk Hox genes in the centipede Strigamia maritima: sense and anti-sense transcripts. Evol Dev 2006;8:252–65. 156. Inagaki S, Numata K, Kondo T, et al. Identification and expression analysis of putative mRNA-like non-coding RNA in Drosophila. Genes Cells 2005;10:1163–73. 157. Kohtz JD, Fishell G. Developmental regulation of EVF-1, a novel non-coding RNA transcribed upstream of the mouse Dlx6 gene. Gene Expr Patterns 2004;4:407–12. 158. Redrup L, Branco MR, Perdeaux ER, et al. The long noncoding RNA Kcnq1ot1 organises a lineage-specific nuclear domain for epigenetic gene silencing. Development 2009;136: 525–30. 159. Sone M, Hayashi T, Tarui H, et al. The mRNA-like noncoding RNA Gomafu constitutes a novel nuclear domain in a subset of neurons. J Cell Sci 2007;120:2498–506. 160. Sunwoo H, Dinger ME, Wilusz JE, et al. MEN varepsilon/ beta nuclear-retained non-coding RNAs are up-regulated upon muscle differentiation and are essential components of paraspeckles. 
Genome Res 2009;19:347–59. 161. Young TL, Matsuda T, Cepko CL. The noncoding RNA taurine upregulated gene 1 is required for differentiation of the murine retina. Curr Biol 2005;15:501–12. 162. Ravasi T, Suzuki H, Pang KC, etal. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res 2006;16:11–9. 163. Sasaki YT, Sano M, Ideue T, et al. Identification and characterization of human non-coding RNAs with tissuespecific expression. Biochem Biophys Res Commun 2007;357: 991–6. 164. Lu J, Getz G, Miska EA, et al. MicroRNA expression profiles classify human cancers. Nature 2005;435:834–8. 165. Furuno M, Pang KC, Ninomiya N, et al. Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet 2006;2:e37. 166. Norseen J, Thomae A, Sridharan V, et al. RNA-dependent recruitment of the origin recognition complex. EmboJ 2008; 27:3024–35. 167. Abarrategui I, Krangel MS. Noncoding transcription controls downstream promoters to regulate T-cell receptor alpha recombination. EmboJ 2007;26:4380–90. 168. Wutz A, Gribnau J. X inactivation Xplained. Curr Opin Genet Dev 2007;17:387–93. 169. Cassiday LA, Maher LJ, 3rd. Having it both ways: transcription factors that bind DNA and RNA. Nucleic Acids Res 2002;30:4118–26. 170. Kennedy D, Wood SA, Ramsdale T, et al. Identification of a mouse orthologue of the human ras-GAP-SH3-domain binding protein and structural confirmation that these proteins contain an RNA recognition motif. Biomed Pept Proteins Nucleic Acids 1996;2:93–9. 171. Kelley RL, Meller VH, Gordadze PR, et al. Epigenetic spreading of the Drosophila dosage compensation complex from roX RNA genes into flanking chromatin. Cell 1999; 98:513–22. 172. Rodriguez-Campos A, Azorin F. RNA is an integral component of chromatin that contributes to its structural organization. PLoS ONE 2007;2:e1182. 173. Sanchez-Elsner T, Gou D, Kremmer E, et al. 
Noncoding RNAs of trithorax response elements recruit Drosophila Ash1 to Ultrabithorax. Science 2006;311:1118–23. 174. Barboro P, D’Arrigo C, Diaspro A, et al. Unraveling the organization of the internal nuclear matrix: RNAdependent anchoring of NuMA to a lamin scaffold. Exp Cell Res 2002;279:202–18. 175. Clemson CM, Hutchinson JN, Sara SA, et al. An architectural role for a nuclear noncoding RNA: NEAT1 RNA is essential for the structure of paraspeckles. Mol Cell 2009;33: 717–26. 176. Sasaki YT, Ideue T, Sano M, et al. MENepsilon/beta noncoding RNAs are essential for structural integrity of nuclear paraspeckles. Proc Natl Acad Sci USA 2009;106: 2525–30. 177. Mercer TR, Dinger ME, Mattick JS. Long noncoding RNAs: insights into function. Nat Rev Genet 2009; 10:155–9. 178. Waters LS, Storz G. Regulatory RNAs in bacteria. Cell 2009;136:615–28. 179. Shi Y, Tyson GW, DeLong EF. Metatranscriptomics reveals unique microbial small RNAs in the ocean’s water column. Nature 2009;459:266–9. 180. Pang KC, Stephen S, Dinger ME, et al. RNAdb 2.0–an expanded database of mammalian non-coding RNAs. Nucleic Acids Res 2007;35:D178–82. 181. Dinger ME, Pang KC, Mercer TR, et al. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol 2008;4:e1000176. 182. Brannan CI, Dees EC, Ingram RS, et al. The product of the H19 gene may function as an RNA. Mol Cell Biol 1990;10: 28–36. 183. Gabory A, Ripoche MA, Yoshimizu T, et al. The H19 gene: regulation and function of a non-coding RNA. Cytogenet Genome Res 2006;113:188–93. Pervasive transcription of the eukaryotic genome 184. Gerstein MB, Bruce C, Rozowsky JS, et al. What is a gene, post-ENCODE? History and updated definition. Genome Res 2007;17:669–81. 185. Birney E, Stamatoyannopoulos JA, Dutta A, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007;447:799–816. 186. Watanabe T, Totoki Y, Toyoda A, et al. 
Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 2008;453:539–43. 187. Mattick JS. Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. Bioessays 2003;25:930–9. 188. Carninci P, Sandelin A, Lenhard B, et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 2006;38:626–35. 189. Gingeras TR. Origin of phenotypes: genes and transcripts. Genome Res 2007;17:682–90. 190. Pesole G. What is a gene? An updated operational definition. Gene 2008;417:1–4. 191. Ruan Y, Ooi HS, Choo SW, et al. Fusion transcripts and transcribed retrotransposed loci discovered through 423 comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res 2007;17:828–38. 192. Smalheiser NR. EST analyses predict the existence of a population of chimeric microRNA precursor-mRNA transcripts expressed in normal human and mouse tissues. Genome Biol 2003;4:403. 193. Chen C, Fossar N, Weil D, et al. High frequency trans-splicing in a cell line producing spliced and polyadenylated RNA polymerase I transcripts from an rDNA-myc chimeric gene. Nucleic Acids Res 2005;33: 2332–42. 194. Mattick JS. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep 2001;2:986–91. 195. Taft RJ, Pheasant M, Mattick JS. The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays 2007;29:288–99. 196. Mehler MF, Mattick JS. Noncoding RNAs and RNA editing in brain development, functional diversification, and neurological disease. Physiol Rev 2007;87: 799–823. 197. Costa FF. Non-coding RNAs and new opportunities for the private sector. Drug DiscovToday 2009;14:446–52.