BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS. VOL 8. NO 6. 407–423
doi:10.1093/bfgp/elp038
Pervasive transcription of the
eukaryotic genome: functional indices
and conceptual implications
Marcel E. Dinger, Paulo P. Amaral, Timothy R. Mercer and John S. Mattick
Advance Access publication date 21 September 2009
Abstract
Genome-wide analyses of the eukaryotic transcriptome have revealed that the majority of the genome is
transcribed, producing large numbers of non-protein-coding RNAs (ncRNAs). This surprising observation challenges
many assumptions about the genetic programming of higher organisms and how information is stored and organized
within the genome. Moreover, the rapid advances in genomics have given little opportunity for biologists to
integrate these emerging findings into their intellectual and experimental frameworks. This problem has been
compounded by the perception that genome-wide studies often generate more questions than answers, which in
turn has led to confusion and controversy. In this article, we address common questions associated with the
phenomenon of pervasive transcription and consider the indices that can be used to evaluate the function (or lack
thereof) of the resulting ncRNAs. We suggest that many lines of evidence, including expression profiles, conservation signatures, chromatin modification patterns and examination of increasing numbers of individual cases, argue
in favour of the widespread functionality of non-coding transcription. We also discuss how informatic and experimental approaches used to analyse protein-coding genes may not be applicable to ncRNAs and how the general perception that protein-coding genes form the main informational output of the genome has resulted in much of the
misunderstanding surrounding pervasive transcription and its potential significance. Finally, we present the conceptual implications of the majority of the eukaryotic genome being functional and describe how appreciating this
perspective will provide considerable opportunity to further understand the molecular basis of development and
complex diseases.
Keywords: non-coding RNA; functional RNA; pervasive transcription; gene definition
INTRODUCTION
Each technical advance in examining the eukaryotic
transcriptome has revealed increasing degrees of its
complexity. Our understanding of the structure of
the genome has shifted accordingly, from a simple
model where each gene (with its adjacent cis-regulatory sequences) comprises a discrete unit that yields
a messenger RNA encoding a single protein, to a
model where a gene is a blurred entity that encompasses a complex network of protein-coding and
non-coding transcripts with both proximal and
distal regulatory elements. Historically, new insights
into the complexity of the transcriptome, such as the
existence of alternative splicing, antisense transcripts
and microRNAs, were initially identified as idiosyncratic phenomena, then later recognized as common
Corresponding author. John S. Mattick, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia.
Tel: +61-7-3346-2079; Fax: +61-7-3346-2111; E-mail: [email protected]
Marcel Dinger is a Senior Research Officer at the Institute for Molecular Bioscience at the University of Queensland, Australia. His
research focuses on unravelling the functions of long non-coding RNAs in mammalian development and disease.
Paulo Amaral is a PhD student at the Institute for Molecular Bioscience at the University of Queensland, Australia. His research
focuses on the role of non-coding RNA in the epigenetic control of differentiation and development.
Tim Mercer is a Research Officer at the Institute for Molecular Bioscience at the University of Queensland, Australia. His research
focuses on the evolution and expression of long non-coding RNAs and their function in the brain.
John Mattick is a Professor of Molecular Biology and Australian Research Council Federation Fellow at the Institute for Molecular
Bioscience at the University of Queensland, Australia. His research focuses on the role of non-coding RNAs in the evolution and
development of complex organisms, and the emergence of cognition in humans.
© The Author 2009. Published by Oxford University Press. For permissions, please email: [email protected]
or typical features. In every case this recognition has
involved a debate about the degree to which each
new phenomenon has general relevance.
Transcriptomic studies based on the analysis of
cDNA libraries [1], 5′-CAGE tags [2] and genome
tiling arrays [3–5] have shown that, surprisingly, the
vast majority of the eukaryotic genome is dynamically transcribed in one setting or another, mostly
as non-protein-coding RNAs (ncRNAs). Estimates
range from 70% in nematode worm to 85% in fruit
fly and 93% in humans. It is also clear that much
of the genome is transcribed on both strands, and
that many protein-coding loci also express antisense
RNAs [6]. Moreover, recent data show that many
human genes (two-thirds of those examined) contain
previously unrecognized first exons (and associated
promoters) located at huge distances (an average of
186 kb; median 85 kb) upstream of the currently
annotated start sites that are expressed in a tissue-specific manner and often span neighbouring genes
[7], with similar results in fruit fly [8, 9], as well as
intergenic splicing and fusion of transcripts that may
be derived from widely separated locations including
different chromosomes [10–14], and various types of
post-splicing processing to produce smaller/variant
transcripts [15–22] (Figure 1).
These observations build a picture of the genome
as a semi-continuous information system that is
extraordinarily sophisticated and highly regulated
to produce complex suites of RNA with structural,
catalytic, protein-coding and regulatory potential in
different cells and in different developmental contexts. In parallel, it is becoming increasingly apparent
that the traditional concept of the gene as a discrete
unit of (generally) protein-coding information with
cis-regulatory elements is simplistic and misleading
when applied to the genomes or genetics of the
higher organisms.
Although initially discounted as artefacts in cDNA
library preparations, many independent observations
have shown that these unexpected non-protein-coding transcripts are prevalent in vivo. However,
the proportion of this pervasive transcription that is
functional is unknown, and has been the subject of
ongoing debate, with contrasting suggestions that it
simply represents transcriptional ‘noise’ [23–26] or
that it represents a hidden layer of regulatory information affecting gene expression, especially epigenetic status, during differentiation and development
[27, 28]. There are some indices that are consistent
with function, such as the dynamic and precise
expression patterns of these transcripts during
Figure 1: Overview of pervasive transcription and its implications on the gene concept. A representation of
a traditional gene (boxed) shown in context with associated coding and non-coding transcripts identified in
various transcriptomic analyses. The highlighted sections exemplify how experimental approaches targeting regions
within one ‘gene’ may inadvertently target other intersecting transcripts and thereby yield confounding results.
Abbreviations: PASRs, promoter-associated small RNAs; TALRs, terminal-associated long RNAs; tiRNAs,
tiny RNAs.
differentiation [29–31], but others that appear to be
inconsistent with function. The latter include the
observations that the fraction of the non-coding
DNA (ncDNA) that is considered conserved is
significantly smaller than that which is transcribed,
that many ncRNAs are transient and only present
in low levels, and finally that, if functionally relevant,
these RNAs should have been discovered
before now.
However, these considerations are largely conceived within a paradigm based on the properties
of protein-coding sequences, which have influenced
expectations for functional conservation and expression levels, and dominated the focus and interpretation of genetic screens [32]. In this article, we address
key questions concerning the significance of pervasive transcription in eukaryotic genomes and propose
that assumptions about functional indices based on
the characteristics of protein-coding information do
not necessarily apply to non-coding transcribed
regions. We present the argument that a large
fraction, perhaps the majority, of the non-coding
transcription is functional and consider how this
impacts on our understanding of the information
content of the genome.
MOST PERVASIVE
TRANSCRIPTION OCCURS AT
LOW LEVELS: CAN LOW-LEVEL
EXPRESSION BE MEANINGFUL?
The prevalent transcription across the genome
generally occurs at a low frequency relative to
protein-coding genes. Indeed, in the absence of
normalization strategies to enrich for less abundant
transcripts [1, 2] or array-based interrogation methods that can detect low abundance transcripts [3, 4],
it appeared in early studies that the RNA repertoire
of cells was comprised (apart from common infrastructural ncRNAs such as ribosomal and transfer
RNAs) almost entirely of mRNAs, and any remainder simply discounted as degradation products or
incomplete cloned fragments (‘expressed sequence
tags’ or ESTs) from protein-coding transcripts. The
conclusion that most of the transcriptome specified
mRNAs was also influenced by the use of purification strategies based on the presence of polyA tails
to reduce contamination by infrastructural RNAs,
but array-based methods subsequently revealed that
44% of transcripts are not polyadenylated, and
that this fraction was largely different in sequence
composition from the polyadenylated RNAs [3].
Additionally, there are significant numbers of transcripts derived from RNA polymerase III promoters
that escape capture by traditional methods [33].
The existence of pervasive expression of the
genome has only recently been established and,
like the discovery of introns, has come as a surprise.
This in turn has prompted concerns that this transcription may simply be background noise, a view
informed by the perception that most of the genome
is comprised of evolutionary debris and reinforced
by the expectation that the low expression of the
non-coding transcripts is reflective of non-function.
However, these perceptions rest on the assumption
that proteins transact most genetic information,
and that the functional efficacy of ncRNAs can be
quantified on the same scale as protein-coding
transcripts, which is not reasonable if their mode of
action is different. Messenger RNAs are necessary
in levels adequate to produce required amounts of
proteins, which is high for those with core structural
or metabolic functions and common to most types
of cells [34].
In contrast, regulatory signals may only be
expressed in small amounts (or in particular cells,
see below), like hormones and many transcription
factors, which can trigger a cascade with amplified
downstream effects. For example, an ncRNA that
interacts directly with the genome at a unique
locus would need only to be present at a 1:1 ratio
with each allele (i.e. just one or two copies per
cell), especially since many regulatory RNAs
appear to act locally, for example to alter local
chromatin architecture [28], in which case diffusion
is less of a problem. Moreover, if an ncRNA targets
a regulatory gene that targets other genes in a network, profound changes in expression programs
could eventuate. Indeed, there is accumulating
evidence that transcription factors and other global
regulators are prevalent targets for different classes of
ncRNAs [29, 35–38] and that small variations in
the expression of these factors can have dramatic
phenotypic consequences (see e.g. [39]).
In addition, it might be expected that ncRNAs,
especially those involved in regulating differentiation and development, would only be expressed in
particular places or restricted subsets of cell populations, giving the impression of overall low levels
in a tissue or organism as a whole. For instance,
one of the few genetically identified microRNAs
(miRNAs) in Caenorhabditis elegans, lsy-6, which
Figure 2: Specific expression of a long ncRNA in the adult mouse brain. In situ hybridization image of the whole
brain (top-left) showing the expression of ncRNA AK011437 in the CA1 region of the hippocampus (top-right). The
false colour image (bottom) highlights the specificity of the expression. Image courtesy of the Allen Institute for
Brain Science.
determines left-right asymmetry in taste-receptor
neurons, is expressed in a very limited subset of
neurons and was initially very difficult to verify biochemically [40]. This appears to be common. A large
fraction of long ncRNAs (849 out of 1328 examined) are expressed in brain and are easily detectable
by in situ hybridization in particular cells in, for example, the hippocampus, cortex or cerebellum [30],
but they only comprise a tiny fraction of all transcripts in the brain as a whole (Figure 2). This also
applies to small RNAs, as strikingly illustrated by
piwi-interacting RNAs (piRNAs), small RNAs of
30 nt highly enriched in germline cells that were
unknown until very recently when they were
identified in testis by specific immunoprecipitation,
despite being present in sufficient amounts to be
visible on an agarose gel [41]. Finally, it should be
noted that, despite the general trend, some long
ncRNAs are actually highly expressed [42].
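The dilution argument in this section can be made concrete with a small worked calculation. The sketch below is illustrative only: the copy numbers and the fraction of expressing cells are hypothetical values chosen for the arithmetic, not measurements from any of the cited studies, and the function name is our own.

```python
def bulk_copies_per_cell(copies_in_expressing_cells, expressing_fraction):
    """Average copies per cell seen when a whole tissue is assayed in bulk.

    copies_in_expressing_cells: transcript copies in each cell that expresses it.
    expressing_fraction: fraction (0 to 1) of cells in the tissue that express it.
    """
    if not 0.0 <= expressing_fraction <= 1.0:
        raise ValueError("expressing_fraction must lie between 0 and 1")
    return copies_in_expressing_cells * expressing_fraction


# Hypothetical ncRNA: abundant (50 copies) but confined to 1% of cells.
restricted = bulk_copies_per_cell(50, 0.01)

# Hypothetical transcript: modest (5 copies) but present in every cell.
broad = bulk_copies_per_cell(5, 1.0)

print(f"restricted ncRNA, bulk average: {restricted:.2f} copies per cell")
print(f"broad transcript, bulk average: {broad:.2f} copies per cell")
```

Under these assumed values the cell-restricted transcript appears ten-fold less abundant in the bulk measurement, even though it is ten-fold more abundant in the cells that actually express it.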
NON-CODING RNAs ARE OFTEN
MORE RAPIDLY DEGRADED
THAN mRNAs: DOES THAT
SUGGEST THEY ARE JUST NOISE
THAT THE CELL IS TRYING
TO DAMPEN?
The rapid degradation of some non-coding
transcripts is often cited as indicative of lack of
function [43]. However, the ready detection of
many ncRNAs by Northern, microarray or in situ
hybridization analyses indicates that there is an
operational homeostasis between their synthesis and
degradation, and that many are relatively stable.
Nonetheless, the knockdown of exosome components in Arabidopsis and human cells has revealed
an additional hidden layer of hundreds of ncRNAs
[44, 45] that are normally rapidly degraded by RNA
surveillance programs [46], such as by 3′→5′ exonucleases in exosomes [47] or the so-called nonsense-mediated decay (NMD) pathway [48]. Despite
the targeting of this hidden layer by NMD [43],
increasing evidence suggests that it is unlikely that
all these transcripts are actually ‘nonsense’.
The transcripts rapidly degraded by the exosome,
also known in yeast as ‘cryptic unstable transcripts’
(CUTs) [49], are non-randomly distributed in the
genome and are often associated with promoters
[50], suggesting an involvement in gene regulation.
Indeed, several CUTs have been ascribed roles in the
regulation of histone modifications with subsequent
effects on gene expression [51]. For example, in
yeast, a cryptic antisense transcript is able to delay
the chromatin remodelling and subsequent RNA
polymerase II recruitment to the sense PHO5
gene [52]. Similarly, the expression of two antisense
cryptic transcripts was shown to induce histone
deacetylase activity that resulted in the repression
of the sense PHO84 gene [53]. It has also been
shown that these long antisense RNAs when ectopically expressed can act in trans to mediate transcriptional silencing by targeting different histone
deacetylases specifically to the PHO84 promoter
[54]. Moreover, CUTs play a broad role in limiting
the dispersion of repetitive elements in the yeast
genome. Recently, a cryptic antisense RNA was
found to silence the action of the sense Ty1 repetitive
element in trans via Set1 histone methyltransferase
action [55]. Although much of this analysis is
currently restricted to yeast, whose simplicity and
experimental tractability have favoured the quantitative description of non-coding transcription and
its stability, it seems likely that these phenomena
are widespread in eukaryotes. Similarly, given the
range of key processes affected by NMD [56], it is
possible that NMD is not simply a mechanism to
degrade aberrant mRNAs, but rather a sophisticated
system for modulating transcript stability that can
discriminate between protein-coding and regulatory
RNAs. These could involve uncharacterized
mechanisms of control of ncRNA processing and
metabolism, as indicated by the involvement of
NMD and nuclear degradation pathways in the
well-established role of Xist in dosage compensation in mammals, which are essential for the
up-regulation of spliced Xist transcripts at the onset
of X-inactivation [57].
The biological potency of relatively short-lived transcripts can also be rationalized from a
mechanistic perspective, in a similar fashion to low-level transcription. Unlike mRNAs, which need to
persist for a sufficient period to be exported and
undergo several rounds of translation, a nuclear-acting ncRNA can immediately exert its function.
Indeed, this can represent an intrinsic advantage of
RNA regulation given that RNA signals can not
only be rapidly produced, but also rapidly eliminated, providing efficient and dynamic changes to
the system. For example, non-coding transcripts
associated with the Ccnd1 promoter are quickly
induced by DNA damage, and even in low copy
numbers can recruit and locally activate the RNA-binding protein TLS to repress Ccnd1 expression
[58]. Conversely, short-lived transcripts may also
work as triggers with stably induced responses. This
is exemplified by the p15 antisense RNA, p15AS,
which induces the silencing of the p15 tumour suppressor gene in leukaemia through heterochromatin
formation [59]. The subsequent inhibition of p15AS
does not release the silencing, suggesting that transient expression of the antisense transcript acts as a
trigger to induce persistent epigenetic changes.
Moreover, transcripts that target proteins to the
genome are only required for the duration of the
recruitment. Indeed, such a mechanism whereby
ncRNAs are involved in the recruitment of regulatory proteins has been proposed for centromeric
RNAs [60, 61] and promoter-associated transcripts
that are targets for small RNAs, both of which
undergo rapid turnover [62].
IS IT THE ACT OF TRANSCRIPTION
OR THE TRANSCRIPT THAT IS
FUNCTIONAL?
An alternative mechanism by which regulated
expression of ncDNA might exert a biologically
important function is by the act of transcription
itself [25]. This may be illustrated by the fbp1 locus
in Schizosaccharomyces pombe, wherein long rare transcripts initiated upstream of the fbp1 gene initiate a
cascade of ncRNAs with concomitant alteration of
the chromatin around the fbp1 promoter to allow its
efficient transcription [63]. In this case, the nucleosome rearrangement caused by RNA polymerase II
activity is considered to be responsible for the local
chromatin remodelling, with the resulting non-coding transcripts thought simply to be by-products
with no intrinsic function.
A recent study suggests that non-coding transcription can remodel the expression landscape (‘ripples of
transcription’), and thereby influence the expression
of neighbouring loci [64]. Negative regulation of
gene expression by transcriptional interference also
occurs in yeast [65] and probably in higher eukaryotes [66], suggesting that ncDNA transcription
per se represents an important mechanism for both
positive and negative influence on gene expression,
a possibility supported by the observation that the
promoters of non-coding transcripts are generally
more conserved than the transcripts themselves
[2, 67]. However, it is difficult to reconcile this
essentially binary act with the complex range of epigenetic changes that have been documented at active
and inactive gene loci [68, 69]. Moreover, in most
cases where transcription of non-coding regions
clearly has a regulatory role, such as in the
β-globin enhancers [70] and imprinted DNA methylation sites in the germline [71], the function of
the resulting transcripts has not been studied, and it
remains an open possibility that they convey a regulatory signal. For example, the imprinted ncRNA
Air, an unspliced 100 kb transcript whose transcription had been proposed to have regulatory function [72], was recently demonstrated to directly
interact with histone methyltransferase complexes
to epigenetically repress neighbouring gene expression in cis, thus regulating the imprinting of target
genes [73].
In some cases both the transcription of a locus
and the resultant transcripts may be functional, as
observed in the dihydrofolate reductase (DHFR) locus
where the transcription from an upstream minor
promoter and the resulting ncRNA contribute to
DHFR repression [74]. This may also be the case
for the Tsix inhibition of Xist expression, and,
indeed, while direct demonstration of a role for
Air transcription itself is missing and the transcript
has been shown to act in trans [73], a number of
observations have led to the proposition that
both mechanisms could be taking place to regulate
imprinting of the genes that are not overlapped
(Slc22a2 and Slc22a3) or overlapped (Igf2r) by Air
transcription, respectively (see ref. [75] for discussion
and additional references). Finally, nascent transcripts
may also represent targets that recruit effector factors,
such as in the previous examples of regulatory functions of the centromeric and promoter-associated
RNAs [60–62]. These observations highlight the
need for careful experimental design to differentiate
the assignment of functions to transcription per se
and/or the resulting transcripts.
NON-CODING TRANSCRIPTS
ARE POORLY CONSERVED:
DOESN’T THIS INDICATE THEY
ARE NOT UNDER SELECTION AND
THEREFORE NON-FUNCTIONAL?
When considered collectively, ncRNAs are on average less conserved in sequence than mRNAs,
although they have similar conservation levels to
pre-mRNA transcripts [67]. Nevertheless, an examination of various signatures of selection, including
those in promoters, primary sequences, splice site
motifs [67] and secondary structures [76–78], reveals
that many long ncRNAs do not evolve neutrally
[67]. Furthermore, thousands of loci that express
novel long spliced ncRNAs, identified by particular
chromatin marks characteristic of transcription initiation and elongation in intergenic regions of mouse
and human genomes, exhibit detectable conservation [36, 79]. Moreover, the most highly conserved
sequences in the vertebrate genome are almost
exclusively non-coding [80, 81], many of which
are transcribed in a regulated manner that is disrupted
in cancer [82].
Since several known functional long ncRNAs,
such as Xist and Air, are poorly conserved, it is
evident that relative lack of conservation does not
necessarily signify lack of function and that long
ncRNAs are under different evolutionary constraints
to protein-coding genes [83]. Indeed, novel intergenic ncRNAs predicted in Drosophila species on
the basis of the conservation of splice sites alone
were shown to be developmentally regulated and
expressed in different species, suggesting these represent functional transcripts [84].
The secondary structure of RNA can similarly be
maintained despite changes to the primary sequence
due to the possibility of complementary base-pair
substitutions and tolerance for insertions. Accordingly, many regions of eukaryotic genomes are predicted to be conserved at the RNA secondary
structural level [76–78]. For example, analysis of
conserved RNA secondary structures identified
more than 30 000 potential structural RNA elements
in the human genome, many of which overlap with
known sites of transcription [78]. Similarly, comparative analysis of seven yeast species identified 2800
genomic loci that showed signatures of evolutionarily conserved RNA secondary structures, a substantial subset of which occurred in non-coding and
antisense regions [85].
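The point that base-pairing can be preserved while the primary sequence diverges can be illustrated with a toy example. The sketch below is a minimal illustration, not the method used in the cited structural surveys: the two 10-nt sequences and the hairpin are invented for the purpose, the function names are our own, and G-U wobble pairs are accepted alongside Watson-Crick pairs, as is conventional in RNA folding.

```python
# Toy illustration: sequences and structure are invented, not taken from the text.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),  # wobble pairs
}


def parse_pairs(structure):
    """Return (i, j) index pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def is_compatible(sequence, structure):
    """True if every paired position in the structure can form a canonical pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return all((sequence[i], sequence[j]) in CANONICAL_PAIRS
               for i, j in parse_pairs(structure))


def sequence_identity(a, b):
    """Fraction of aligned positions that are identical (ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


HAIRPIN = "(((....)))"
SEQ_A = "GGGAAAUCCC"  # hypothetical ancestral hairpin
SEQ_B = "CACGAAAGUG"  # hypothetical diverged hairpin with compensatory changes

print(is_compatible(SEQ_A, HAIRPIN), is_compatible(SEQ_B, HAIRPIN))
print(f"primary sequence identity: {sequence_identity(SEQ_A, SEQ_B):.0%}")
```

In this toy case both sequences can adopt the same three-base-pair stem although only two of the ten positions are identical, which is why a screen based on primary sequence alone would score such a locus as poorly conserved.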
The structure–function relationships of regulatory
sequences, including regulatory RNAs, can also
be quite different from those of proteins, which
are essential functional and often multitasked components of cells with precise three-dimensional
requirements; features that are reflected in their
different conservation characteristics. Moreover,
most evolutionary innovation in higher organisms,
and evolutionary selection for altered expression
patterns that underpin phenotypic radiation, occurs
in regulatory sequences [86], which can and do
evolve flexibly and rapidly [87–91].
Indeed, those regions of the human genome conserved amongst vertebrates that are subject to recent
evolutionary change relative to chimpanzee occur
mainly in non-coding regions, some of which are
transcribed [92, 93]. This includes the ncRNA
HAR1, which has undergone rapid evolutionary
change in humans and is specifically expressed in the
Cajal-Retzius cells in the human neocortex [93].
The observation that many functionally validated
RNAs are evolving quickly [83, 90] may result
from these sequences having more plastic structure–function constraints, and we may expect considerable evolutionary innovation to occur in such
sequences. Indeed, many RNAs can be lineage-specific, such as Xist [94] and Air [95] in eutherian
mammals and transcribed Alu sequences in primates
[96], and even whole classes of ncRNAs specific to
particular organisms, such as Leishmania [97] and
Dictyostelium [98]. Similar observations have been
made in bacteria, in which functional RNAs are
poorly conserved [68] and can be specific to certain
strains [99]. Finally, lineage-specific phenomena,
such as DNA elimination [100] and whole genome
rearrangements in ciliate reproduction, can be
guided by ncRNAs [100, 101], suggesting the evolution of a whole mechanistic infrastructure based
on RNA-regulation.
MUCH OF THE MAMMALIAN
GENOME IS REPETITIVE: EVEN
IF IT IS TRANSCRIBED, WHAT
EVIDENCE IS THERE TO SUGGEST
THAT REPETITIVE SEQUENCE
CAN BE FUNCTIONAL?
A major criticism of the proposal that much of the
mammalian genome is functional is that a significant
proportion, almost half in human and mouse, is
composed of transposon-derived sequences. Along
with duplication, transposition is a major force in
genome evolution [102, 103], and is increasingly
implicated in the emergence of regulatory innovations such as imprinting [104, 105]. Nevertheless,
transposon-derived sequences per se have been
widely considered to be (mainly) non-functional
and not major contributors to genome function.
This may well be an incorrect assumption—one
which has also biased our understanding of how
much of the genome is conserved, as they are used
as the index sequences to assess the background
rate of ‘neutral evolution’ [87]. This issue is one of
intense debate, with conflicting evidence and/or
interpretations thereof. For example, the apparent
uniformity of indel distribution in transposon-derived sequences suggests that these sequences are
not under selection, but such analyses did not consider different subclasses of transposon-derived
sequences or examine the indel distribution in
regulatory sequences (such as promoters) that have
different structure–function constraints to protein-coding exons or miRNAs. On the other hand,
mammalian-wide interspersed repeats (MIRs),
which occupy 1–2% of the genome (300 000
copies) and date back 130 million years, have a
lower than expected divergence from the mammalian MIR consensus, with similar divergence in both
human and mouse, although neutrally evolving
ancient repeats should be twice as divergent in
mouse, suggesting they are subject to selection
[106]. These elements also exhibit variable conservation across their length with a relatively conserved
70-nt central region containing a 15–25 nt highly
conserved core, also indicative of selection
[106, 107]. Moreover, Alu elements, which occupy
10.5% of the human genome, also have a conserved core [108] and a non-random distribution
that suggests positive selection [109].
The full regulatory potential of transposon-derived sequences is unknown, but appears to
include a range of functions [87, 110]. They are
largely transcribed in a regulated manner, and feature promoters to drive specific expression [111].
Important roles have also been demonstrated for
Alu-derived RNAs in the regulation of RNA
polymerase II during heat shock [112] and the
regulation of alternative splicing, translation and
mRNA stability [113]. Similarly, it has been
shown that transcripts derived from retrotransposons
can regulate chromatin structure in transposon-rich
regions such as centromeres and neocentromeres
[114, 115], that LINE L1 retrotransposition can
mediate somatic mosaicism in neuronal precursor
cells [116], and that the transcription of inverted
repeats that serve as boundary elements can influence
gene expression [117]. Moreover, a recent study has
shown that 6–30% of cap-selected mouse and human
RNA transcripts initiate within repetitive elements,
that approximately 250 000 of these transcripts
are generally tissue specific, and that transposon-derived sequences located immediately upstream of
protein-coding loci frequently function as alternative
promoters and/or express non-coding RNAs, identifying some 23 000 candidate regulatory regions
derived from retrotransposons [111]. In addition,
repetitive sequences may be included as parts of
larger transcripts, including many ncRNAs whose
functions have been demonstrated, such as Xist,
Air, Kcnq1ot1, BORG, DISC2, NTT and Xlisrts,
suggesting that these elements may be functional
modules common among ncRNAs [118] as well as
mRNAs [119].
IF MANY NON-CODING REGIONS
IN THE GENOME ARE
FUNCTIONAL, WHY HAVE THEY
NOT BEEN DETECTED IN GENETIC
SCREENS?
The widely held assumption that protein-coding
genes are the main functional outputs of the
genome was classically supported by genetic screens,
which identified genes associated with modified
phenotypes, most of which had mutations that
resulted in defective proteins. This assumption is
largely a consequence of an historic emphasis, both
technically and phenotypically, on protein-coding
genes, as well as assumptions about the basis of
regulatory mutations (for review and full discussion
see [32]). However, it is now evident that most
complex genetic and epigenetic phenomena in
eukaryotes are RNA-directed [27]. These include
RNA interference-related processes such as transcriptional and post-transcriptional gene silencing
[114, 120–123], position effect variegation
[124, 125], hybrid dysgenesis [126], chromosome
dosage compensation, parental imprinting and allelic
exclusion [127], germ cell reprogramming [128],
paramutation [129, 130] and possibly transvection
and transinduction [32].
The recent recognition that ncRNAs function
in various aspects of cell biology has prompted a
reconsideration of functional polymorphisms
located in non-coding regions. The majority of
genome-wide association studies to identify single
nucleotide polymorphisms (SNPs) associated with
disease susceptibility have mapped the variation to
non-coding regions, with a number overlapping
expressed ncRNAs [131]. For example, SNPs that
identified a susceptibility locus for myocardial infarction mapped to a long ncRNA, MIAT [132].
Similarly, a region associated with coronary artery
disease [133] encompasses a long ncRNA, ANRIL,
that associates with a high-risk haplotype for coronary artery disease and is expressed in tissues and cell
types affected by atherosclerosis [133]. However, the
complex and pervasive networks of non-coding
transcription within these regions can make it particularly difficult to elucidate the functional effects of
polymorphisms. For example, a SNP located both within the
truncated form of ZFAT and within the promoter of an
antisense transcript increases the expression of
ZFAT not by increasing mRNA stability,
but by repressing the expression of the antisense transcript [134]. Further association studies and
an era of personal genomics will likely reveal many
more similar examples of natural polymorphisms
within functional non-coding transcripts.
IN THE CURRENT ABSENCE OF
EXTENSIVE FUNCTIONAL
STUDIES, WHAT EVIDENCE IS
THERE TO SUGGEST PERVASIVE
NON-CODING TRANSCRIPTION
IS GENERALLY FUNCTIONAL?
The number of specific studies on these newly identified ncRNAs is limited. To date, there are only
40 documented examples of ncRNAs in mammals
whose function has been verified, mainly using
siRNA- or shRNA-mediated knockdown (see ref.
[32]). We anticipate there will be an avalanche of
such studies in the coming years, with in vivo
strategies also being used to determine the biological
processes in which individual ncRNAs may be
involved (see e.g. [135]). Nonetheless, the currently
validated sample is as yet too small to derive any firm
conclusions about the entire population of ncRNAs,
although some themes are beginning to emerge,
notably the role of many long ncRNAs in regulating
chromatin structure [28, 79, 136] (see below).
Despite limited functional data on individual
ncRNAs, there is substantial genome-wide evidence
to indicate that non-coding transcripts do not arise
merely by leaky transcription. The structure of the
transcriptome from different tissues or cell types is,
like mRNAs, largely consistent, with ESTs and tiling
array signals showing coincident start and termination sites, thereby indicating discrete transcriptional
units, which in some cases are conserved across
species [2, 5, 35, 67]. The observation that the
majority of the genome is transcribed should not
be confused with the fraction transcribed in any individual cell type, which is estimated at 15% [3].
Moreover, new evidence from yeast strand-specific
transcriptome analysis indicates that a significant
proportion (13%) of all RNAs exhibit differential
expression between alleles in both strands, indicating
that transcription of both protein-coding and non-coding loci is regulated on all four strands of a diploid
genome [137].
There are also many general characteristics of
ncRNAs that point to their intrinsic functionality
[138]. These include: (i) the conservation of their
promoters, splice junctions, exons, predicted structures, genomic position and expression patterns
[2, 29, 35, 36, 67, 76–78, 83, 139–141]; (ii) their
dynamic expression and alternative splicing during
differentiation [29, 36, 142]; (iii) their altered expression or splicing patterns in cancer and other diseases
[82, 143–152]; (iv) their association with particular
chromatin signatures that are indicative of actively
transcribed genes [29, 36]; (v) their regulation by
key morphogens and transcription factors [29, 36,
152, 153]; and (vi) their tissue- and cell-specific
expression patterns and subcellular localization
[30, 152, 154–161].
Independently, the developmental and tissue-specific expression of most ncRNAs provides perhaps the most compelling case for their widespread
functionality. A study of ncRNAs expressed in
mouse brain by in situ hybridization showed that
the majority (623 out of 849) are selectively
expressed in discrete functional regions of the
brain, sometimes with evidence of specific subcellular localization [30]. Moreover, expression signatures
and dynamic regulation of hundreds of ncRNAs have
been observed across tissue types [38, 162, 163] and
in various developmental systems, from Drosophila
embryogenesis [156, 163] to differentiation of
mammalian ES cells [29], T-cells [31] and muscle
cells [160]. Large-scale alterations in the expression
of ncRNAs are observed in cancers, indicating
disrupted regulation and a possible involvement in
disease [149, 164]. Importantly, many of these regulated transcripts, the so-called ‘macroRNAs’, span
several thousands of bases in the genome [5, 165],
indicating not only that transcription initiation is a
regulated process, but also that elongation is highly
processive, and not abortive as might be expected
of random initiation events.
The rapidly increasing number of individual
ncRNAs implicated in various biological processes,
involving a vast range of mechanisms [138], ranging
from DNA replication [144, 166] to VDJ recombination [167], suggests that we are likely only at the beginning of discovering the biological capabilities and
functional repertoires of RNA. While it is possible
that these are isolated cases involving one or a few
special ncRNAs, they may also represent precedents
for general mechanisms by which other RNAs
(perhaps even families of RNAs) may function.
As an example, the gene silencing directed by Xist
was recently shown to involve recruitment of
chromatin silencing proteins [168], in particular
Polycomb complexes, which induce histone methylation to regulate expression of a large cohort of
genes. Intriguingly, Polycomb has subsequently
been found to be recruited to several loci by other
ncRNAs including RepA, Air, Kcnq1ot1 and Hotair,
suggesting a general mechanism whereby specific
ncRNAs direct the action of Polycomb to different
sites in the genome. Indeed, a recent study found a
high proportion (estimated at 24%) of long intergenic ncRNAs associate with Polycomb Repressive
Complex 2 (PRC2), and can affect expression of
target genes, with many additional ncRNAs being
associated with other chromatin-modifying complexes [79]. In addition, ncRNAs that are induced
during embryonic stem cell differentiation are associated with activated chromatin and chromatin-activating complexes [29] and it seems that the
regulation of epigenetic processes may be a major
function of ncRNAs [28, 136].
Many classes of eukaryotic effector proteins have
RNA-binding domains, including chromatin-modifying proteins [28], transcription factors [169]
and even proteins involved in membrane signal transduction pathways [170], suggestive of a general
infrastructure that can interact with and potentially
be regulated by RNA. Some of these proteins
bind specific populations of RNAs and such RNA-binding propensity has been used as evidence to identify regulatory RNAs, such as the roX RNAs that
regulate dosage compensation in Drosophila [171].
Finally, many cellular processes and subcellular
structures, such as chromatin [172, 173], the nuclear
matrix [174], and the origin recognition complex
[166], are RNase-sensitive, indicating that RNA
components are required. Although most are yet to
be identified, the demonstration of fundamental
roles for ncRNAs in nuclear structures such as paraspeckles [160, 175, 176] and in other previously
unrecognized domains [159], sets a precedent for
further roles of ncRNAs in cell biology.
HOW WILL IT BE POSSIBLE TO
TEST WHETHER LARGE
FRACTIONS OF NONCODING
RNAs ARE FUNCTIONAL?
A definitive identification of all ncRNAs in eukaryotic genomes that are biologically relevant may not
be achievable in the foreseeable future. Nevertheless,
recent years have witnessed an explosion in the
number of reports of functional non-coding transcripts [32, 138, 177]. With increased
awareness of ncRNA functionality, and the inclusion of ncRNAs in genome-wide screens, it is likely that many
more functional transcripts will be defined, and that
a better quantification may subsequently be possible.
Estimating the proportion of ncRNAs that are
functional may be more feasible in simple organisms.
Dozens of regulatory RNAs have been found in
bacteria over the past two decades, including antisense transcripts and a diversity of plasmid-encoded
and intergenic small regulatory RNAs [178]. The
regulatory RNA repertoire can vary considerably
between microbial groups [178, 179] and, although
most are still to be characterized, the roles of many
have been examined with the aid of the well-established genetic and biochemical tools in model
bacterial systems. Although the non-coding space
in bacterial genomes is very limited compared to
eukaryotes, bacterial RNAs show an exquisite diversity in terms of mechanisms of action and are largely
employed in adaptation to specific environmental
conditions, indicating that ncRNA regulation has
been exploited since very early in evolutionary
history. Similarly, many ncRNAs, such as the aforementioned CUTs, have been discovered in yeast
[49], whose simplicity and experimental tractability
have favoured the quantitative description of non-coding transcription and its stability.
In contrast, ncRNAs in complex eukaryotes are
considerably more abundant and, given the recent
discovery of pervasive transcription, investigation
of their potential functions and biological roles is
in its infancy. The field has so far concentrated on
attempting to catalogue the repertoire of RNAs
[180], distinguishing them from protein-coding
genes [181], and assessing their expression in different
systems, all of which are important first steps to test
their possible functionality and to identify the biological processes in which they may be involved [42].
Moreover, although the involvement of ncRNAs
in many processes in eukaryotic cells has been
demonstrated, the understanding of their modes of
action is presently poor. Some ncRNAs are involved
in complex phenomena, which have so far proven
challenging to dissect. This is exemplified by the first
transcript recognized as an ncRNA in mammalian
cells, H19, which was discovered 20 years ago
[182] and has since been implicated in many genetic
phenomena, but whose mechanisms of action
remain unknown [183].
Our evolving awareness of ncRNAs will likely
prompt the development of novel tools to study
their roles. Indeed, the discovery of RNA interference has been exploited by molecular biologists to
provide the very powerful tool of siRNA-mediated
knockdown for functional analyses. These tools
may be similarly employed in large-scale approaches,
such as systematic siRNA knockdowns to identify
ncRNAs involved in specific functions or diseases
[8]. Similarly, the sequencing of RNAs associated
with particular protein complexes will be useful to
broadly define classes of RNAs functionally implicated in biological processes. Nevertheless, it remains
likely that in most cases the elucidation of novel
mechanisms will require ad hoc characterization of
individual RNAs.
HOW DOES PERVASIVE
NON-CODING TRANSCRIPTION
AFFECT THE TRADITIONAL
CONCEPT OF A GENE?
Traditionally, a gene is described as a sequence of
DNA that occupies a specific location on a chromosome and determines a particular characteristic of
an organism. There is no a priori problem in considering
a genomic locus that encodes an ncRNA as a ‘gene’
[184]. The gene concept can also be expanded to
encompass splice variants and post-transcriptional
processing in cases where the alternate products still
contribute to the same characteristic. However, in
cases where the alternate product has a distinct function, such as a tissue-specific splice variant lacking
a transmembrane domain resulting in an alternative
cellular localization and function, the concept of
a gene becomes troubled. Further challenges to the
gene concept are instigated by factors that blur the
boundaries of genes, such as variable transcription
start sites, variable locations of promoter sequences
that may be active in different types of cells, and
variable polyadenylation sites. The occurrence of
overlapping and antisense regulatory ncRNAs that
may be expressed differentially to the associated
protein-coding ‘gene’ further confounds the gene
concept [29, 30], as does the co-expression of multiple small RNAs [18, 19, 185, 186]. This blurring
has led to genes in higher organisms being described
as ‘fuzzy transcription clusters with multiple products’ [187], although descriptors for these clusters
and their products, as well as hierarchically structured
lexicons, are problematic.
The more recent data describing the complex
pervasive transcription across the genome [2, 3, 5,
185, 188] further challenge even this fuzzier conception of a gene [184, 189, 190]. The observation
that many transcripts cover huge areas of the
genome, and may include exons from very distal
locations that then traverse ‘introns’ that themselves
harbour other ‘genes’ specifying other proteins or
ncRNAs, which may be separately expressed,
makes the idea of a gene as a discrete locus untenable. Moreover, as it is likely that such overlapping
suites of genetic information (including distal
promoters and enhancers) may ultimately form
contigs that span whole chromosomes, it becomes
impossible to separate one gene as an entity from
another. Finally, the discovery of interactions
between genes, as in the case of transcripts generated
by ‘fusions’ from different genomic regions or trans-spliced to generate chimeras [14, 191–193], cannot be
rationalized within existing conceptualizations of
a gene. This renews the original focus for the definition of a gene centred on phenotype, although in
this case the operational unit is a transcriptional final
product and not a fixed genomic coordinate [189].
Together, these observations also prompt a revision in our understanding of genome architecture,
away from a linear model, to a concept that provides
for the inclusion of any combination of subsequences
from the genome to be incorporated into a protein-coding or non-coding product. It suggests that the
information content of the genome is far greater and
that its organization is far more sophisticated than
previously imagined. This in turn implies that most
of the genome specifies a continuum of RNA regulatory information that may be intimately involved
in the ontogeny of complex organisms, by controlling the expression of proteins and their alternative
isoforms. However, a much more comprehensive
description of the genome and its relationship to
the transcriptome will be necessary before the practicalities of such a revision could be meaningfully
proposed. In the interim, at the very least, it will
be prudent to carefully consider the genomic context
of any ‘gene’ in the interpretation and design of
experiments that involve the targeting of gene subsequences, such as microarray, in situ hybridization
or RNA interference. Such experiments are often
presumed to provide data representative of the
entire gene, but may actually be inadvertently
affecting or measuring other overlapping transcriptional products (Figure 1).
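As a minimal sketch of the kind of check suggested above, one could test whether the interval targeted by a probe or siRNA overlaps annotated transcripts other than the intended one, on either strand. The `Transcript` record, the function name, the coordinates and the transcript names below are hypothetical and purely illustrative; in practice the annotations would be read from a genome annotation file, and the coordinate convention (assumed here to be 0-based, half-open) would need to match that file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # Hypothetical minimal annotation record (0-based, half-open coordinates).
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlapping_transcripts(chrom, start, end, annotations, intended):
    """Return names of annotated transcripts, other than the intended target,
    that overlap the probe interval [start, end) on either strand."""
    hits = []
    for t in annotations:
        if t.name == intended or t.chrom != chrom:
            continue
        # Two half-open intervals overlap when each starts before the other ends.
        if t.start < end and start < t.end:
            hits.append(t.name)
    return hits


# Illustrative, made-up locus: a coding transcript, an antisense ncRNA
# overlapping its 3' end, and an unrelated downstream transcript.
annotations = [
    Transcript("geneA_mRNA", "chr1", 1000, 5000, "+"),
    Transcript("geneA_antisense_ncRNA", "chr1", 4500, 6000, "-"),
    Transcript("geneB_mRNA", "chr1", 8000, 9000, "+"),
]

# A probe designed against the 3' end of geneA_mRNA.
probe_hits = overlapping_transcripts("chr1", 4600, 4660, annotations, "geneA_mRNA")
print(probe_hits)  # ['geneA_antisense_ncRNA']
```

The strand is stored but deliberately not used as a filter, since a double-stranded probe would detect transcripts from both strands; a strand-specific assay could add a comparison on `t.strand`.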
FINAL CONSIDERATIONS: IN
UNCHARTED TERRITORY
It is now clear that RNAs have much greater structural and functional versatility than assumed only a
few years ago. Considering this potential, it has been
argued that extensive non-coding transcription
primarily provides a cache of RNA molecules that
can eventually evolve useful functions [23]. While
this may indeed represent a mechanism for generating novel regulatory RNAs, it is unknown what
fraction of the extant transcriptome might fall into
this reservoir category and, conversely, it is therefore
quite possible that most ncRNAs may have already
evolved functions, in many cases lineage-specifically.
Indeed, it has been proposed that the increasing
extent of transcribed non-coding RNA may provide
an important expansion of regulatory information
underpinning the developmental and cognitive complexity and the phenotypic diversity of animals,
which have a similar set of protein-coding genes
but exhibit large increases in their non-coding genomic sequences as their complexity increases [27, 194,
195]. To clarify the matter, it will be important to
elucidate the processes of genesis, fixation and functionalization of non-coding transcription, in particular the differences in the evolutionary forces that
shape non-coding sequences compared to protein-coding genes.
The proportion of pervasive transcription that is
functional remains an open question. Considering
the sheer amount of ncRNA transcription in
higher eukaryotes, even if a small fraction (let
alone a majority) has evolved functions, there
could be thousands of new genetic loci that have
escaped our attention and may hold the key to
understanding complex processes in eukaryotic
biology, especially in relation to development and
neural function [27, 196]. Research on ncRNAs
may also lead to discovery of important molecules
that, like protein-coding genes, may be used as tools
and targets in biotechnology and therapeutics [197].
Given the remarkably fertile grounds of ncRNA
research in recent years, there appears to be a
whole universe out there in the genome to be
explored.
The focusing of so-called ‘genome-wide’ studies
on protein-coding regions can be misleading to
biologists who are not closely acquainted with the
genomics field. The assumption that an understanding of the protein-coding subset of the genome and
its products will equate to a complete picture of the
molecular basis for development, cognition or disease is not sound, and indeed may be arbitrarily
diverting attention from many other genetic factors
underlying particular phenotypes. Recent advances
in technology make such experiments easier to
design, and it will be sensible henceforth to treat
any transcript, or indeed any part of the genome,
as being potentially functional.
Key Points
Transcriptomic analysis reveals prevalent transcription across
the eukaryotic genome.
The functional relevance of pervasive transcription is widely
debated.
Many indices of function, including dynamic expression profiles,
conservation signatures, splicing and chromatin modification
patterns, and subcellular localization, suggest the general functionality of non-coding transcription.
Pervasive transcription challenges traditional conceptions of the
gene and genetic information, and has important implications
for the design and interpretation of experiments investigating
gene function.
References
1. Okazaki Y, Furuno M, Kasukawa T, et al. Analysis of the
mouse transcriptome based on functional annotation of
60,770 full-length cDNAs. Nature 2002;420:563–73.
2. Carninci P, Kasukawa T, Katayama S, et al. The transcriptional landscape of the mammalian genome. Science 2005;
309:1559–63.
3. Cheng J, Kapranov P, Drenkow J, et al. Transcriptional
maps of 10 human chromosomes at 5-nucleotide resolution.
Science 2005;308:1149–54.
4. Kapranov P, Cheng J, Dike S, et al. RNA maps reveal new
RNA classes and a possible function for pervasive transcription. Science 2007;316:1484–8.
5. Kapranov P, Drenkow J, Cheng J, et al. Examples of the
complex architecture of the human transcriptome revealed
by RACE and high-density tiling arrays. Genome Res 2005;
15:987–97.
6. Katayama S, Tomaru Y, Kasukawa T, et al. Antisense transcription in the mammalian transcriptome. Science 2005;309:
1564–6.
7. Denoeud F, Kapranov P, Ucla C, et al. Prominent use of
distal 5′ transcription start sites and discovery of a large
number of additional exons in ENCODE regions. Genome
Res 2007;17:746–59.
8. Willingham AT, Orth AP, Batalov S, et al. A strategy for
probing the function of noncoding RNAs finds a repressor
of NFAT. Science 2005;309:1570–3.
9. Manak JR, Dike S, Sementchenko V, et al. Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nat Genet 2006;38:1151–8.
10. Li H, Wang J, Mor G, et al. A neoplastic gene fusion mimics
trans-splicing of RNAs in normal human cells. Science 2008;
321:1357–61.
11. Li X, Zhao L, Jiang H, et al. Short homologous sequences
are strongly associated with the generation of chimeric
RNAs in eukaryotes. J Mol Evol 2009;68:56–65.
12. Akiva P, Toporik A, Edelheit S, et al. Transcription-mediated gene fusion in the human genome. Genome Res
2006;16:30–6.
13. Unneberg P, Claverie JM. Tentative mapping of
transcription-induced interchromosomal interaction using
chimeric EST and mRNA data. PLoS ONE 2007;2:e254.
14. Maher CA, Palanisamy N, Brenner JC, et al. Chimeric transcript discovery by paired-end transcriptome sequencing.
Proc Natl Acad Sci USA 2009;106:12353–8.
15. Fejes-Toth K, Sotirova V, Sachidanandam R, et al.
Post-transcriptional processing generates a diversity of
5′-modified long and short RNAs. Nature 2009;457:
1028–32.
16. Rodriguez A, Griffiths-Jones S, Ashurst JL, et al.
Identification of mammalian microRNA host genes and
transcription units. Genome Res 2004;14:1902–10.
17. Berezikov E, van Tetering G, Verheul M, et al. Many novel
mammalian microRNA candidates identified by extensive
cloning and RAKE analysis. Genome Res 2006;16:1289–98.
18. Kiss T. SnoRNP biogenesis meets Pre-mRNA splicing.
Mol Cell 2006;23:775–6.
19. Okamura K, Hagen JW, Duan H, et al. The mirtron
pathway generates microRNA-class regulatory RNAs in
Drosophila. Cell 2007;130:89–100.
20. Taft RJ, Glazov EA, Cloonan N, et al. Tiny RNAs
associated with transcription start sites in animals. Nat
Genet 2009;41:572–8.
21. Taft RJ, Glazov EA, Lassmann T, et al. Small RNAs derived
from snoRNAs. RNA 2009;15:1233–40.
22. Ruby JG, Jan CH, Bartel DP. Intronic microRNA precursors that bypass Drosha processing. Nature 2007;448:83–6.
23. Brosius J. Waste not, want not—transcript excess in multicellular eukaryotes. Trends Genet 2005;21:287–8.
24. Struhl K. Transcriptional noise and the fidelity of initiation
by RNA polymerase II. Nat Struct Mol Biol 2007;14:103–5.
25. Chakalova L, Debrand E, Mitchell JA, et al. Replication and
transcription: shaping the landscape of the genome. Nat Rev
Genet 2005;6:669–77.
26. Babak T, Blencowe BJ, Hughes TR. A systematic search
for new mammalian noncoding RNAs indicates little conserved intergenic transcription. BMC Genomics 2005;6:104.
27. Mattick JS. A new paradigm for developmental biology.
J Exp Biol 2007;210:1526–47.
28. Mattick JS, Amaral PP, Dinger ME, et al. RNA regulation
of epigenetic processes. Bioessays 2009;31:51–9.
29. Dinger ME, Amaral PP, Mercer TR, et al. Long noncoding
RNAs in mouse embryonic stem cell pluripotency and
differentiation. Genome Res 2008;18:1433–45.
30. Mercer TR, Dinger ME, Sunkin SM, et al. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl
Acad Sci USA 2008;105:716–21.
31. Pang KC, Dinger ME, Mercer TR, et al. Genome-wide
identification of long noncoding RNAs in CD8+ T cells.
J Immunol 2009;182:7738–48.
32. Mattick JS. The genetic signatures of noncoding RNAs.
PLoS Genet 2009;5:e1000459.
33. Dieci G, Fiorino G, Castelnuovo M, et al. The expanding
RNA polymerase III transcriptome. Trends Genet 2007;23:
614–22.
34. Karlin S, Brocchieri L, Campbell A, et al. Genomic
and proteomic comparisons between bacterial and archaeal
genomes and related comparisons with the yeast and fly
genomes. Proc Natl Acad Sci USA 2005;102:7309–14.
35. Engstrom PG, Suzuki H, Ninomiya N, et al. Complex loci
in human and mouse genomes. PLoS Genet 2006;2:e47.
36. Guttman M, Amit I, Garber M, et al. Chromatin signature
reveals over a thousand highly conserved large non-coding
RNAs in mammals. Nature 2009;458:223–7.
37. Makeyev EV, Zhang J, Carrasco MA, et al. The MicroRNA
miR-124 promotes neuronal differentiation by triggering
brain-specific alternative pre-mRNA splicing. Mol Cell
2007;27:435–48.
38. Nakaya HI, Amaral PP, Louro R, et al. Genome mapping
and expression analyses of human intronic noncoding
RNAs reveal tissue-specific patterns and enrichment in
genes related to regulation of transcription. Genome Biol
2007;8:R43.
39. Kopp JL, Ormsbee BD, Desler M, et al. Small increases in
the level of Sox2 trigger the differentiation of mouse
embryonic stem cells. Stem Cells 2008;26:903–11.
40. Johnston RJ, Hobert O. A microRNA controlling left/right
neuronal asymmetry in Caenorhabditis elegans. Nature 2003;
426:845–9.
41. Girard A, Sachidanandam R, Hannon GJ, et al. A germline-specific class of small RNAs binds mammalian Piwi proteins.
Nature 2006;442:199–202.
42. Dinger ME, Pang KC, Mercer TR, et al. NRED: a database
of long noncoding RNA expression. Nucleic Acids Res 2009;
37:D122–6.
43. Kurihara Y, Matsui A, Hanada K, et al. Genome-wide suppression of aberrant mRNA-like noncoding RNAs by
NMD in Arabidopsis. Proc Natl Acad Sci USA 2009;106:
2453–8.
44. Chekanova JA, Gregory BD, Reverdatto SV, et al.
Genome-wide high-resolution mapping of exosome substrates reveals hidden features in the Arabidopsis transcriptome. Cell 2007;131:1340–53.
45. Preker P, Nielsen J, Kammler S, et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 2008;322:1851–4.
46. Fasken MB, Corbett AH. Process or perish: quality
control in mRNA biogenesis. Nat Struct Mol Biol 2005;12:
482–8.
47. Houseley J, LaCava J, Tollervey D. RNA-quality control
by the exosome. Nat Rev Mol Cell Biol 2006;7:529–39.
48. Isken O, Maquat LE. The multiple lives of NMD factors:
balancing roles in gene and genome regulation. Nat Rev
Genet 2008;9:699–712.
49. Wyers F, Rougemaille M, Badis G, et al. Cryptic pol II
transcripts are degraded by a nuclear quality control pathway involving a new poly(A) polymerase. Cell 2005;121:
725–37.
50. Davis CA, Ares M, Jr. Accumulation of unstable promoter-associated transcripts upon loss of the nuclear exosome subunit Rrp6p in Saccharomyces cerevisiae. Proc Natl Acad Sci
USA 2006;103:3262–7.
51. Johnson JM, Edwards S, Shoemaker D, et al. Dark matter in
the genome: evidence of widespread transcription detected
by microarray tiling experiments. Trends Genet 2005;21:
93–102.
52. Uhler JP, Hertel C, Svejstrup JQ. A role for noncoding
transcription in activation of the yeast PHO5 gene. Proc
Natl Acad Sci USA 2007;104:8011–6.
53. Camblong J, Iglesias N, Fickentscher C, et al. Antisense
RNA stabilization induces transcriptional gene silencing
via histone deacetylation in S. cerevisiae. Cell 2007;131:
706–17.
54. Camblong J, Beyrouthy N, Guffanti E, et al. Trans-acting
antisense RNAs mediate transcriptional gene cosuppression
in S. cerevisiae. Genes Dev 2009;23:1534–45.
55. Berretta J, Pinskaya M, Morillon A. A cryptic unstable
transcript mediates transcriptional trans-silencing of the
Ty1 retrotransposon in S. cerevisiae. Genes Dev 2008;22:
615–26.
56. Neu-Yilik G, Kulozik AE. NMD: multitasking between
mRNA surveillance and modulation of gene expression.
Adv Genet 2008;62:185–243.
57. Ciaudo C, Bourdet A, Cohen-Tannoudji M, et al. Nuclear
mRNA degradation pathway(s) are implicated in Xist
regulation and X chromosome inactivation. PLoS Genet
2006;2:e94.
58. Wang X, Arai S, Song X, et al. Induced ncRNAs allosterically modify RNA-binding proteins in cis to inhibit
transcription. Nature 2008;454:126–30.
59. Yu W, Gius D, Onyango P, et al. Epigenetic silencing of
tumour suppressor gene p15 by its antisense RNA. Nature
2008;451:202–6.
60. Buhler M, Verdel A, Moazed D. Tethering RITS to a
nascent transcript initiates RNAi- and heterochromatin-dependent gene silencing. Cell 2006;125:873–86.
61. Motamedi MR, Verdel A, Colmenares SU, et al. Two
RNAi complexes, RITS and RDRC, physically interact
and localize to noncoding centromeric RNAs. Cell 2004;
119:789–802.
62. Han J, Kim D, Morris KV. Promoter-associated RNA
is required for RNA-directed transcriptional gene
silencing in human cells. Proc Natl Acad Sci USA 2007;104:
12422–7.
63. Hirota K, Miyoshi T, Kugou K, et al. Stepwise chromatin
remodelling by a cascade of transcription initiation of
non-coding RNAs. Nature 2008;456:130–4.
64. Ebisuya M, Yamamoto T, Nakajima M, et al. Ripples
from neighbouring transcription. Nat Cell Biol 2008;10:
1106–13.
65. Martens JA, Laprade L, Winston F. Intergenic transcription
is required to repress the Saccharomyces cerevisiae SER3 gene.
Nature 2004;429:571–4.
66. Mazo A, Hodgson JW, Petruk S, et al. Transcriptional interference: an unexpected layer of complexity in gene regulation. J Cell Sci 2007;120:2755–61.
67. Ponjavic J, Ponting CP, Lunter G. Functionality or transcriptional noise? Evidence for selection within long
noncoding RNAs. Genome Res 2007;17:556–65.
68. Barski A, Cuddapah S, Cui K, et al. High-resolution profiling of histone methylations in the human genome. Cell
2007;129:823–37.
69. Kouzarides T. Chromatin modifications and their function.
Cell 2007;128:693–705.
70. Ashe HL, Monks J, Wijgerde M, et al. Intergenic transcription and transinduction of the human beta-globin locus.
Genes Dev 1997;11:2494–509.
71. Chotalia M, Smallwood SA, Ruf N, et al. Transcription is
required for establishment of germline methylation marks at
imprinted genes. Genes Dev 2009;23:105–17.
72. Pauler FM, Koerner MV, Barlow DP. Silencing by
imprinted noncoding RNAs: is transcription the answer?
Trends Genet 2007;23:284–92.
73. Nagano T, Mitchell JA, Sanz LA, et al. The Air noncoding
RNA epigenetically silences transcription by targeting G9a
to chromatin. Science 2008;322:1717–20.
74. Martianov I, Ramadass A, Serra Barros A, et al. Repression
of the human dihydrofolate reductase gene by a non-coding
interfering transcript. Nature 2007;445:666–70.
75. Mohammad F, Mondal T, Kanduri C. Epigenetics of
imprinted long noncoding RNAs. Epigenetics 2009;4.
76. Torarinsson E, Sawera M, Havgaard JH, et al. Thousands of
corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure.
Genome Res 2006;16:885–9.
77. Torarinsson E, Yao Z, Wiklund ED, et al. Comparative
genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res 2008;18:
242–51.
78. Washietl S, Hofacker IL, Lukasser M, et al. Mapping of
conserved RNA secondary structures predicts thousands
of functional noncoding RNAs in the human genome.
Nat Biotechnol 2005;23:1383–90.
79. Khalil AM, Guttman M, Huarte M, et al. Many human large
intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl
Acad Sci USA 2009;106:11667–72.
80. Bejerano G, Pheasant M, Makunin I, et al. Ultraconserved
elements in the human genome. Science 2004;304:1321–5.
81. Stephen S, Pheasant M, Makunin IV, et al. Large-scale
appearance of ultraconserved elements in tetrapod genomes
and slowdown of the molecular clock. Mol Biol Evol 2008;
25:402–8.
82. Calin GA, Liu CG, Ferracin M, et al. Ultraconserved regions
encoding ncRNAs are altered in human leukemias and
carcinomas. Cancer Cell 2007;12:215–29.
83. Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of
function. Trends Genet 2006;22:1–5.
84. Hiller M, Findeiss S, Lein S, et al. Conserved introns reveal
novel transcripts in Drosophila melanogaster. Genome Res
2009;19:1289–300.
85. Steigele S, Huber W, Stocsits C, et al. Comparative analysis
of structured RNAs in S. cerevisiae indicates a multitude of
different functions. BMC Biol 2007;5:25.
86. Carroll SB. Evo-devo and an expanding evolutionary
synthesis: a genetic theory of morphological evolution.
Cell 2008;134:25–36.
87. Pheasant M, Mattick JS. Raising the estimate of functional
human sequences. Genome Res 2007;17:1245–53.
88. Fisher S, Grice EA, Vinton RM, et al. Conservation of RET
regulatory function from human to zebrafish without
sequence similarity. Science 2006;312:276–9.
89. Frith MC, Ponjavic J, Fredman D, et al. Evolutionary
turnover of mammalian transcription start sites. Genome
Res 2006;16:713–22.
90. Smith NG, Brandstrom M, Ellegren H. Evidence for
turnover of functional noncoding DNA in mammalian
genome evolution. Genomics 2004;84:806–13.
91. Taylor MS, Kai C, Kawai J, et al. Heterotachy in mammalian promoter evolution. PLoS Genet 2006;2:e30.
92. Pollard KS, Salama SR, King B, et al. Forces shaping the
fastest evolving regions in the human genome. PLoS Genet
2006;2:e168.
93. Pollard KS, Salama SR, Lambert N, et al. An RNA gene
expressed during cortical development evolved rapidly in
humans. Nature 2006;443:167–72.
94. Chow JC, Yen Z, Ziesche SM, et al. Silencing of the
mammalian X chromosome. Annu Rev Genomics Hum
Genet 2005;6:69–92.
95. Sleutels F, Zwart R, Barlow DP. The non-coding Air RNA
is required for silencing autosomal imprinted genes. Nature
2002;415:810–3.
96. Liu GE, Alkan C, Jiang L, et al. Comparative analysis of Alu
repeats in primate genomes. Genome Res 2009;19:876–85.
97. Dumas C, Chow C, Muller M, et al. A novel class of developmentally regulated noncoding RNAs in Leishmania.
Eukaryot Cell 2006;5:2033–46.
98. Aspegren A, Hinas A, Larsson P, et al. Novel non-coding
RNAs in Dictyostelium discoideum and their expression
during development. Nucleic Acids Res 2004;32:4646–56.
99. Mandin P, Repoila F, Vergassola M, et al. Identification
of new noncoding RNAs in Listeria monocytogenes and
prediction of mRNA targets. Nucleic Acids Res 2007;35:
962–74.
100. Liu Y, Taverna SD, Muratore TL, et al. RNAi-dependent
H3K27 methylation is required for heterochromatin
formation and DNA elimination in Tetrahymena. Genes
Dev 2007;21:1530–45.
101. Nowacki M, Vijayan V, Zhou Y, et al. RNA-mediated
epigenetic programming of a genome-rearrangement
pathway. Nature 2008;451:153–8.
102. Zeh DW, Zeh JA, Ishida Y. Transposable elements and an
epigenetic basis for punctuated equilibria. Bioessays 2009;31:
715–26.
103. Oliver KR, Greene WK. Transposable elements: powerful
facilitators of evolution. Bioessays 2009;31:703–14.
104. Pask AJ, Papenfuss AT, Ager EI, et al. Analysis of the
platypus genome suggests a transposon origin for mammalian imprinting. Genome Biol 2009;10:R1.
105. Gehring M, Bubb KL, Henikoff S. Extensive demethylation
of repetitive elements during seed development underlies
gene imprinting. Science 2009;324:1447–51.
106. Silva JC, Shabalina SA, Harris DG, et al. Conserved
fragments of transposable elements in intergenic regions:
evidence for widespread recruitment of MIR- and
L2-derived sequences within the mouse and human
genomes. Genet Res 2003;82:1–18.
107. Smit AF, Riggs AD. MIRs are classic, tRNA-derived SINEs
that amplified before the mammalian radiation. Nucleic Acids
Res 1995;23:98–102.
108. Jelinek WR, Toomey TP, Leinwand L, et al. Ubiquitous,
interspersed repeated sequences in mammalian genomes.
Proc Natl Acad Sci USA 1980;77:1398–402.
109. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of
the human genome. Nature 2004;431:931–45.
110. Jurka J. Conserved eukaryotic transposable elements and the
evolution of gene regulation. Cell Mol Life Sci 2008;65:
201–4.
111. Faulkner GJ, Kimura Y, Daub CO, et al. The regulated
retrotransposon transcriptome of mammalian cells. Nat
Genet 2009;41:563–71.
112. Mariner PD, Walters RD, Espinoza CA, et al. Human
Alu RNA is a modular transacting repressor of mRNA
transcription during heat shock. Mol Cell 2008;29:499–509.
113. Hasler J, Samuelsson T, Strub K. Useful ‘junk’: Alu RNAs
in the human transcriptome. Cell Mol Life Sci 2007;64:
1793–800.
114. Buhler M, Moazed D. Transcription and RNAi in heterochromatic gene silencing. Nat Struct Mol Biol 2007;14:
1041–8.
115. Chueh AC, Northrop EL, Brettingham-Moore KH, et al.
LINE retrotransposon RNA is an essential structural and
functional epigenetic component of a core neocentromeric
chromatin. PLoS Genet 2009;5:e1000354.
116. Muotri AR, Gage FH. Generation of neuronal variability
and complexity. Nature 2006;441:1087–93.
117. Lunyak VV, Prefontaine GG, Nunez E, et al.
Developmentally regulated activation of a SINE B2 repeat
as a domain boundary in organogenesis. Science 2007;317:
248–51.
118. Amaral PP, Mattick JS. Noncoding RNA in development.
Mamm Genome 2008;19:454–92.
119. Peaston AE, Evsikov AV, Graber JH, et al. Retrotransposons
regulate host genes in mouse oocytes and preimplantation
embryos. Dev Cell 2004;7:597–606.
120. Mette MF, Aufsatz W, van der Winden J, et al.
Transcriptional silencing and promoter methylation triggered by double-stranded RNA. EMBO J 2000;19:5194–201.
121. Pal-Bhadra M, Bhadra U, Birchler JA. RNAi related
mechanisms affect both transcriptional and posttranscriptional transgene silencing in Drosophila. Mol Cell 2002;9:
315–27.
122. Vaucheret H. Post-transcriptional small RNA pathways
in plants: mechanisms and regulations. Genes Dev 2006;20:
759–71.
123. Matzke M, Kanno T, Huettel B, et al. Targets of RNA-directed DNA methylation. Curr Opin Plant Biol 2007;10:
512–19.
124. Pal-Bhadra M, Leibovitch BA, Gandhi SG, et al.
Heterochromatic silencing and HP1 localization in
Drosophila are dependent on the RNAi machinery.
Science 2004;303:669–72.
125. Singh J, Freeling M, Lisch D. A position effect on the
heritability of epigenetic silencing. PLoS Genet 2008;4:
e1000216.
126. Brennecke J, Malone CD, Aravin AA, et al. An epigenetic
role for maternally inherited piRNAs in transposon silencing. Science 2008;322:1387–92.
127. Yang PK, Kuroda MI. Noncoding RNAs and intranuclear
positioning in monoallelic gene expression. Cell 2007;128:
777–86.
128. Sasaki H, Matsui Y. Epigenetic events in mammalian germ-cell development: reprogramming and beyond. Nat Rev
Genet 2008;9:129–40.
129. Chandler VL. Paramutation: from maize to mice. Cell 2007;
128:641–5.
130. Cuzin F. Induction by microRNAs of hereditary epigenetic
modifications (paramutation) in the mouse. Proceedings of
the 5th Colmar Symposium: The New RNA Frontiers. France:
Colmar, 2007.
131. Altshuler D, Daly MJ, Lander ES. Genetic mapping in
human disease. Science 2008;322:881–8.
132. Ishii N, Ozaki K, Sato H, et al. Identification of a novel
non-coding RNA, MIAT, that confers risk of myocardial
infarction. J Hum Genet 2006;51:1087–99.
133. Pasmant E, Laurendeau I, Heron D, et al. Characterization of
a germ-line deletion, including the entire INK4/ARF locus,
in a melanoma-neural system tumor family: identification of
ANRIL, an antisense noncoding RNA whose expression
coclusters with ARF. Cancer Res 2007;67:3963–9.
134. Shirasawa S, Harada H, Furugaki K, et al. SNPs in the
promoter of a B cell-specific antisense transcript, SAS-ZFAT, determine susceptibility to autoimmune thyroid
disease. Hum Mol Genet 2004;13:2221–31.
135. Bond AM, Vangompel MJ, Sametsky EA, et al. Balanced
gene regulation by an embryonic brain ncRNA is critical for
adult hippocampal GABA circuitry. Nat Neurosci 2009;12:
1020–7.
136. Costa FF. Non-coding RNAs, epigenetics and complexity.
Gene 2008;410:9–17.
137. Gagneur J, Sinha H, Perocchi F, et al. Genome-wide allele- and strand-specific expression profiling. Mol Syst Biol 2009;5:
274.
138. Amaral PP, Dinger ME, Mercer TR, et al. The eukaryotic
genome as an RNA machine. Science 2008;319:1787–9.
139. Louro R, Smirnova AS, Verjovski-Almeida S. Long intronic noncoding RNA transcription: expression noise or
expression choice? Genomics 2009;93:291–8.
140. Trinklein ND, Aldred SF, Hartman SJ, et al. An abundance
of bidirectional promoters in the human genome. Genome
Res 2004;14:62–6.
141. Tupy JL, Bailey AM, Dailey G, et al. Identification of putative noncoding polyadenylated transcripts in Drosophila
melanogaster. Proc Natl Acad Sci USA 2005;102:5495–500.
142. Rinn JL, Kertesz M, Wang JK, et al. Functional demarcation
of active and silent chromatin domains in human HOX loci
by noncoding RNAs. Cell 2007;129:1311–23.
143. Angeloni D, ter Elst A, Wei MH, et al. Analysis of a new
homozygous deletion in the tumor suppressor region at
3p12.3 reveals two novel intronic noncoding RNA genes.
Genes Chromosomes Cancer 2006;45:676–91.
144. Christov CP, Trivier E, Krude T. Noncoding human Y
RNAs are overexpressed in tumours and required for cell
proliferation. Br J Cancer 2008;98:981–8.
145. Ji P, Diederichs S, Wang W, et al. MALAT-1, a novel
noncoding RNA, and thymosin beta4 predict metastasis
and survival in early-stage non-small cell lung cancer.
Oncogene 2003;22:8031–41.
146. Mutsuddi M, Marshall CM, Benzow KA, et al. The spinocerebellar ataxia 8 noncoding RNA causes neurodegeneration and associates with staufen in Drosophila. Curr Biol
2004;14:302–8.
147. Perez DS, Hoage TR, Pritchett JR, et al. Long, abundantly
expressed non-coding transcripts are altered in cancer. Hum
Mol Genet 2008;17:642–55.
148. Reis EM, Nakaya HI, Louro R, et al. Antisense intronic
non-coding RNA levels correlate to the degree of tumor
differentiation in prostate cancer. Oncogene 2004;23:
6684–92.
149. Reis EM, Ojopi EP, Alberto FL, et al. Large-scale transcriptome analyses reveal new genetic marker candidates of head,
neck, and thyroid cancer. Cancer Res 2005;65:1693–9.
150. Sonkoly E, Bata-Csorgo Z, Pivarcsi A, et al. Identification
and characterization of a novel, psoriasis susceptibility-related noncoding RNA gene, PRINS. J Biol Chem 2005;
280:24159–67.
151. Thrash-Bingham CA, Tartof KD. aHIF: a natural antisense
transcript overexpressed in human renal cancer and during
hypoxia. J Natl Cancer Inst 1999;91:143–51.
152. Zhang X, Lian Z, Padden C, et al. A myelopoiesis-associated
regulatory intergenic noncoding RNA transcript within the
human HOXA cluster. Blood 2009;113:2526–34.
153. Cawley S, Bekiranov S, Ng HH, et al. Unbiased mapping of
transcription factor binding sites along human chromosomes
21 and 22 points to widespread regulation of noncoding
RNAs. Cell 2004;116:499–509.
154. Blin-Wakkach C, Lezot F, Ghoul-Mazgar S, et al.
Endogenous Msx1 antisense transcript: in vivo and
in vitro evidences, structure, and potential involvement in
skeleton development in mammals. Proc Natl Acad Sci USA
2001;98:7336–41.
155. Brena C, Chipman AD, Minelli A, et al. Expression of trunk
Hox genes in the centipede Strigamia maritima: sense and
anti-sense transcripts. Evol Dev 2006;8:252–65.
156. Inagaki S, Numata K, Kondo T, et al. Identification and
expression analysis of putative mRNA-like non-coding
RNA in Drosophila. Genes Cells 2005;10:1163–73.
157. Kohtz JD, Fishell G. Developmental regulation of EVF-1, a
novel non-coding RNA transcribed upstream of the mouse
Dlx6 gene. Gene Expr Patterns 2004;4:407–12.
158. Redrup L, Branco MR, Perdeaux ER, et al. The long noncoding RNA Kcnq1ot1 organises a lineage-specific nuclear
domain for epigenetic gene silencing. Development 2009;136:
525–30.
159. Sone M, Hayashi T, Tarui H, et al. The mRNA-like noncoding RNA Gomafu constitutes a novel nuclear domain in
a subset of neurons. J Cell Sci 2007;120:2498–506.
160. Sunwoo H, Dinger ME, Wilusz JE, et al. MEN epsilon/
beta nuclear-retained non-coding RNAs are up-regulated
upon muscle differentiation and are essential components of
paraspeckles. Genome Res 2009;19:347–59.
161. Young TL, Matsuda T, Cepko CL. The noncoding RNA
taurine upregulated gene 1 is required for differentiation of
the murine retina. Curr Biol 2005;15:501–12.
162. Ravasi T, Suzuki H, Pang KC, et al. Experimental validation
of the regulated expression of large numbers of non-coding
RNAs from the mouse genome. Genome Res 2006;16:11–9.
163. Sasaki YT, Sano M, Ideue T, et al. Identification and
characterization of human non-coding RNAs with tissue-specific expression. Biochem Biophys Res Commun 2007;357:
991–6.
164. Lu J, Getz G, Miska EA, et al. MicroRNA expression
profiles classify human cancers. Nature 2005;435:834–8.
165. Furuno M, Pang KC, Ninomiya N, et al. Clusters of internally primed transcripts reveal novel long noncoding RNAs.
PLoS Genet 2006;2:e37.
166. Norseen J, Thomae A, Sridharan V, et al. RNA-dependent
recruitment of the origin recognition complex. EMBO J 2008;
27:3024–35.
167. Abarrategui I, Krangel MS. Noncoding transcription controls downstream promoters to regulate T-cell receptor
alpha recombination. EMBO J 2007;26:4380–90.
168. Wutz A, Gribnau J. X inactivation Xplained. Curr Opin
Genet Dev 2007;17:387–93.
169. Cassiday LA, Maher LJ, 3rd. Having it both ways: transcription factors that bind DNA and RNA. Nucleic Acids Res
2002;30:4118–26.
170. Kennedy D, Wood SA, Ramsdale T, et al. Identification of
a mouse orthologue of the human ras-GAP-SH3-domain
binding protein and structural confirmation that these
proteins contain an RNA recognition motif. Biomed Pept
Proteins Nucleic Acids 1996;2:93–9.
171. Kelley RL, Meller VH, Gordadze PR, et al. Epigenetic
spreading of the Drosophila dosage compensation complex
from roX RNA genes into flanking chromatin. Cell 1999;
98:513–22.
172. Rodriguez-Campos A, Azorin F. RNA is an integral
component of chromatin that contributes to its structural
organization. PLoS ONE 2007;2:e1182.
173. Sanchez-Elsner T, Gou D, Kremmer E, et al. Noncoding
RNAs of trithorax response elements recruit Drosophila
Ash1 to Ultrabithorax. Science 2006;311:1118–23.
174. Barboro P, D’Arrigo C, Diaspro A, et al. Unraveling the
organization of the internal nuclear matrix: RNA-dependent anchoring of NuMA to a lamin scaffold. Exp
Cell Res 2002;279:202–18.
175. Clemson CM, Hutchinson JN, Sara SA, et al. An architectural role for a nuclear noncoding RNA: NEAT1 RNA is
essential for the structure of paraspeckles. Mol Cell 2009;33:
717–26.
176. Sasaki YT, Ideue T, Sano M, et al. MENepsilon/beta
noncoding RNAs are essential for structural integrity of
nuclear paraspeckles. Proc Natl Acad Sci USA 2009;106:
2525–30.
177. Mercer TR, Dinger ME, Mattick JS. Long noncoding RNAs: insights into function. Nat Rev Genet 2009;
10:155–9.
178. Waters LS, Storz G. Regulatory RNAs in bacteria. Cell
2009;136:615–28.
179. Shi Y, Tyson GW, DeLong EF. Metatranscriptomics reveals
unique microbial small RNAs in the ocean’s water column.
Nature 2009;459:266–9.
180. Pang KC, Stephen S, Dinger ME, et al. RNAdb 2.0–an
expanded database of mammalian non-coding RNAs.
Nucleic Acids Res 2007;35:D178–82.
181. Dinger ME, Pang KC, Mercer TR, et al. Differentiating
protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol 2008;4:e1000176.
182. Brannan CI, Dees EC, Ingram RS, et al. The product of the
H19 gene may function as an RNA. Mol Cell Biol 1990;10:
28–36.
183. Gabory A, Ripoche MA, Yoshimizu T, et al. The H19
gene: regulation and function of a non-coding RNA.
Cytogenet Genome Res 2006;113:188–93.
184. Gerstein MB, Bruce C, Rozowsky JS, et al. What is a gene,
post-ENCODE? History and updated definition. Genome
Res 2007;17:669–81.
185. Birney E, Stamatoyannopoulos JA, Dutta A, et al.
Identification and analysis of functional elements in 1% of
the human genome by the ENCODE pilot project. Nature
2007;447:799–816.
186. Watanabe T, Totoki Y, Toyoda A, et al. Endogenous
siRNAs from naturally formed dsRNAs regulate transcripts
in mouse oocytes. Nature 2008;453:539–43.
187. Mattick JS. Challenging the dogma: the hidden layer of
non-protein-coding RNAs in complex organisms.
Bioessays 2003;25:930–9.
188. Carninci P, Sandelin A, Lenhard B, et al. Genome-wide
analysis of mammalian promoter architecture and evolution.
Nat Genet 2006;38:626–35.
189. Gingeras TR. Origin of phenotypes: genes and transcripts.
Genome Res 2007;17:682–90.
190. Pesole G. What is a gene? An updated operational definition. Gene 2008;417:1–4.
191. Ruan Y, Ooi HS, Choo SW, et al. Fusion transcripts
and transcribed retrotransposed loci discovered through
comprehensive transcriptome analysis using Paired-End
diTags (PETs). Genome Res 2007;17:828–38.
192. Smalheiser NR. EST analyses predict the existence of a
population of chimeric microRNA precursor-mRNA transcripts expressed in normal human and mouse tissues.
Genome Biol 2003;4:403.
193. Chen C, Fossar N, Weil D, et al. High frequency
trans-splicing in a cell line producing spliced and polyadenylated RNA polymerase I transcripts from an
rDNA-myc chimeric gene. Nucleic Acids Res 2005;33:
2332–42.
194. Mattick JS. Non-coding RNAs: the architects of eukaryotic
complexity. EMBO Rep 2001;2:986–91.
195. Taft RJ, Pheasant M, Mattick JS. The relationship between
non-protein-coding DNA and eukaryotic complexity.
Bioessays 2007;29:288–99.
196. Mehler MF, Mattick JS. Noncoding RNAs and
RNA editing in brain development, functional diversification, and neurological disease. Physiol Rev 2007;87:
799–823.
197. Costa FF. Non-coding RNAs and new opportunities for
the private sector. Drug Discov Today 2009;14:446–52.