Lars Steinmetz, Marina
Granovskaia, Wolfgang Huber
and Sandra Clauder-Münster
Reappraising the genome
Here there is a gap in the sequence;
the RNA is missing some information
that was present in the gene because
an intron has been spliced out.
WHAT’S IN A GENOME? Just a few years ago,
most people would probably have answered
“genes and junk,” and there seemed to be an awful lot of
junk. Upon completion of the human genome, scientists
announced that only about two percent of the complete
DNA sequence encoded proteins. Most of the rest
appeared to be excess baggage, the leftovers of evolution.
Surely some of it had a function – cells were known to
produce some RNAs that didn’t encode proteins and had
regulatory functions. But what was the rest up to?
In a collaborative project with Stanford University, Lars
Steinmetz of Heidelberg and Wolfgang Huber of EMBL-EBI have been trying to answer this question. They are
using a new method called a tiling array to search for new
functions in the complete yeast genome. A tiling array is
a DNA chip, so the method is similar to DNA chips based
on genes or microRNAs (see previous story). All of these
methods use DNA probes fixed to a glass surface to
detect RNA molecules extracted from cells; thus a tiling
array study also shows what part of the genome is active
under various conditions. But each of these DNA chip
methods is like asking a series of yes/no questions: you
only get an answer if you’ve posed the right question.
Typical gene chips hold only probes for known genes, and
the probes on a microRNA chip look for matches to
sequences preselected by a computational analysis of the
genome. The result is like conducting a survey only of
friends, and supposing that the results apply to everyone.
This means that the assay is biased towards the sequences
which are selected and placed onto the array.

In contrast, a tiling array carries probes from both strands
of the entire genome, even “junk” regions that have no
known function. “There are several reasons to generate an
array in such an unbiased fashion,” Lars says. “One is that
we might find new genes. Another is that it will give us our
first full look at the cell’s complexity at the level of RNA. It
can tell us which bases in the genome are transcribed. At
the moment, a tiling array is the best way to find this out.”

As often happens in the development of new technologies,
he says, at first skeptics wondered whether such
arrays would yield useful results. Now the power of this
technology is apparent. It promises to revolutionize what
microarrays can reveal about genomes.

The tiling array used by Lars’ group uses 25-nucleotide
sequences from the yeast genome as probes. RNAs
(represented by the long molecule along the top) are
extracted from cells, cut into fragments, and allowed to
bind to any probes with complementary sequences.
The method reveals not only which RNAs are produced
at particular times, but where they begin and end, and
also whether sequences are missing – for example,
when an intron has been removed.

COMPLEMENTARITY AND THE FATES OF CELLS
EMBL ANNUAL REPORT 05·06
A readout from a tiling array experiment. Blue and green
represent different strands of DNA. Dots show where probes
recorded “hits” – RNA transcripts produced by the cell.
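
To make the idea of reading such a plot concrete, here is a minimal Python sketch that turns a list of probe "hits" into stretches of covered sequence and the gaps between them. It is a deliberately naive illustration, not the statistical analysis used in the study: the function name `hits_to_segments`, the hit positions and the yes/no treatment of each probe are all assumptions made for the example. The probe length of 25 and the spacing of 8 bases are the values quoted for this array.

```python
def hits_to_segments(hit_starts, step=8, probe_len=25):
    """Merge probe hits that are neighbors on the tiling grid.

    hit_starts: start positions of probes that recorded a hit (one strand).
    Returns a list of (start, end) pairs, end exclusive, in genome coordinates.
    """
    runs = []  # each run is [first_probe_start, last_probe_start]
    for pos in sorted(hit_starts):
        if runs and pos - runs[-1][1] <= step:
            runs[-1][1] = pos          # extends the current run
        else:
            runs.append([pos, pos])    # starts a new run
    return [(first, last + probe_len) for first, last in runs]


# Hypothetical hit positions: the probes starting at 32..56 stayed silent.
hits = [0, 8, 16, 24, 64, 72, 80]
segments = hits_to_segments(hits)

# Uncovered stretches between neighboring segments.
gaps = [(left_end, right_start)
        for (_, left_end), (right_start, _) in zip(segments, segments[1:])]

print(segments)
print(gaps)
```

In this toy picture the outer ends of a segment suggest where an RNA begins and ends. A gap may mark a piece of sequence missing from the RNA, such as a removed intron, or it may simply separate two different transcripts. Real data are noisy, so telling these cases apart takes considerably more than a yes/no threshold.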
Tiling arrays have been made before on a smaller scale,
for example to investigate the genome of cellular structures called mitochondria. Most scientists believe that
these structures evolved from independent organisms –
probably bacteria – which once took up residence in
other types of cells and never left. Mitochondria have
their own DNA, a much smaller genome which reproduces independently of the DNA in the nucleus. Recently
scientists have begun interrogating larger stretches of
DNA, such as whole genomes, but these studies have
yielded unclear results because of a lack of precision and
problems in interpretation. Lars and his colleagues took
on these issues last year when they created a new “high-resolution” array that contains 6.5 million separate
probes from the yeast genome.
“The resolution comes from the number of probes making up each array and the overlap between consecutive
probes as they map to the genome,” Lars says. “It’s a bit
like trying to read a book by sampling the text. A standard
DNA chip based on genes says, we know there is some
content on page five, so we start at page five and grab 50
or 60 characters in the middle. Then we skip to page eight
and do the same thing. The tiling array starts with the first
letter in the book (really the first base of the genome), and
captures the first 25 letters. Then we move down eight letters and take another sample of 25 letters – an overlap of
17 letters in the code. And we continue this way all the
way to the end of the book. Then we do exactly the same
thing with the second strand of DNA.”
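
The sampling scheme in this analogy can be sketched in a few lines of Python. This is only an illustration of the arithmetic (25-letter probes, a shift of 8 letters, hence 17 letters shared by neighboring probes). The function names and the toy sequence are made up for the example, and the design of the real array will have involved details not covered here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the second strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def tile(seq, probe_len=25, step=8):
    """Yield (start, probe) for every window of probe_len letters, step letters apart."""
    for start in range(0, len(seq) - probe_len + 1, step):
        yield start, seq[start:start + probe_len]


def tile_both_strands(genome, probe_len=25, step=8):
    """Tile the first strand, then do exactly the same with the second strand.

    Start positions on the "-" strand count from the start of the
    reverse-complemented sequence, not from the start of the "+" strand.
    """
    plus = [("+", s, p) for s, p in tile(genome, probe_len, step)]
    minus = [("-", s, p)
             for s, p in tile(reverse_complement(genome), probe_len, step)]
    return plus + minus


toy_genome = "ACGT" * 20                 # 80 made-up bases, not yeast sequence
probes = tile_both_strands(toy_genome)
overlap = 25 - 8                         # letters shared by consecutive probes

print(len(probes), overlap)
```

Because consecutive probes share 17 letters, every base away from the ends of the sequence is covered by several probes, which is where the resolution comes from.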
Each experiment using the array produces hundreds of
megabytes of data – a nightmare for interpretation. Here
Lars and his colleagues could draw on the expertise of
Wolfgang’s group at EMBL-EBI, who have been collecting methods needed to analyze microarray experiments
in a suite of tools called Bioconductor. The methods are
particularly good at distinguishing meaningful data from
noise – something that matters especially with short probes, in
which the specific sequences that make up each probe
affect how well RNAs from the sample bind to them. Poor
binding leads to ambiguous results and lots of noise.
The analysis was important and complicated, Wolfgang
says, because the tiling array shows where RNAs are
bound – but not precisely what they are. The same DNA
sequence can produce different RNAs, for example, when
an RNA is spliced to remove an intron. Thus several
forms of a molecule may be bound to the same probe, and
it takes clever computational and statistical methods to
understand what a “hit” means.
Overall, the study revealed that when yeast grows in a rich
source of food, 84.5% of the entire genome is transcribed
into RNA. This is substantially more than the protein-encoding part, which accounts for about 75% of the yeast
genome. Sixteen percent of the bases that are transcribed in the
genome had never before been observed or predicted.
THE POWERS OF PROTEINS
The scientists also made some important discoveries
about the structure of genes. “In many cases the coding
region of the gene was well-known,” Lars says. “But it’s
been harder to get a direct look at the untranslated
regions of the RNA at the head and tail. With this study
we could determine exactly where an RNA molecule
begins and ends.”
“This opens a new frontier,” Lars says. “Yeast was the first
completely sequenced eukaryotic organism, and people
have had ten years to work on the information encoded in
its genome. Even so, there is a vast amount of transcription detected by our study that was not known.”
The same is true of other genomes, including humans’.
Paul Bertone, who recently joined Nick Luscombe’s
group at EMBL-EBI, carried out a tiling array study of the
entire human genome as a PhD student at Yale. In both
studies, probes captured a large number of RNAs that
hadn’t been known to exist, including antisense RNAs.
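
An antisense RNA is transcribed from the strand opposite a gene, so its sequence is the reverse complement of the RNA made from the gene itself. The short Python sketch below shows why two such molecules could, in principle, bind to each other. The sequence and the function names are invented for the illustration, and perfect pairing along the whole length is a simplifying assumption.

```python
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def transcribe_sense(coding_strand):
    """RNA with the same sequence as the coding strand (U replaces T)."""
    return coding_strand.replace("T", "U")


def transcribe_antisense(coding_strand):
    """RNA made from the opposite strand over the same region, read 5' to 3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(coding_strand))


def can_pair(rna_a, rna_b):
    """True if the two RNAs pair base for base when aligned antiparallel."""
    return len(rna_a) == len(rna_b) and all(
        RNA_PAIRS[x] == y for x, y in zip(rna_a, reversed(rna_b))
    )


region = "ATGGCCTTAGC"                      # made-up stretch of a coding strand
sense = transcribe_sense(region)
antisense = transcribe_antisense(region)

print(sense, antisense, can_pair(sense, antisense))
```

Whether such pairing actually happens in a cell, and what it does there, is a biological question that sequence complementarity alone cannot answer.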
Wolfgang’s analysis showed that some genes had an
unexpectedly complex architecture. At times, parts of
genes were expressed at different levels, indicating that
RNA molecules of different lengths had been created from
the same DNA sequence. Other unusual cases included
single RNAs that seemed to encompass two neighboring
protein-coding regions.

Another discovery was that the average tail region of an
mRNA is longer than the head – 91 versus 68 nucleotides.
That makes sense, Lars says, because the tail region is often
packed with information that helps cells regulate when,
where and how often an RNA is translated into protein.
The longest tails were usually found in RNAs that encoded
proteins which would be used in the mitochondria, the cell
membrane or the cell wall. Long untranslated regions
usually indicated that the RNAs were somehow being
regulated, for example through the attachment of proteins
or other RNAs.

As well as discovering hundreds of new RNAs, and RNAs
produced by reading the “second” strand of DNA, the
researchers obtained new insights into the functions of
these molecules. The length of an RNA’s untranslated
regions is related to its function and the region of the cell
in which it operates. And what happens on the two
strands is not independent. If in a particular region, both
strands of DNA encode an RNA, their untranslated
regions tend to be longer. “Antisense” RNAs made by
transcribing the strand opposite another gene often seem
to be involved in regulating other RNAs – which is logical,
because their sequences are complementary to those of
RNAs made from the other strand, and thus the two
molecules could bind to each other. This was suggested
based on genetic engineering experiments several years
ago, but so far the phenomenon hasn’t been considered to
have a serious role under normal conditions in the cell.
Lars says that this study revisits the issue and suggests that
antisense transcription, which the scientists have now
observed extensively over the genome, could indeed have a
regulatory role in yeast cells.

The yeast data represents the most accurate transcriptional
map of any eukaryotic organism, with far higher
resolution, Lars says. Even so, it’s just a beginning. He
believes that there is much more information to be mined
from the data that has been obtained. “And comparing
these results to similar studies in other organisms, once
they can be carried out at a sufficiently high resolution,
will give us unique insights into evolution,” he says.
“Comparing complete genomes has already suggested
that a lot more DNA beyond the protein-encoding content
of genes may have a function than we have been able to
observe. If we find that this information is transcribed in
several species, it will give us a handle to start looking for
its functions.”