EMBL ANNUAL REPORT 05·06
Reappraising the genome
[Photo: Lars Steinmetz, Marina Granovskaia, Wolfgang Huber and Sandra Clauder-Münster]
WHAT’S IN A GENOME? Just a few years ago, most people would probably have answered “genes and junk,” and there seemed to be an awful lot of junk. Upon completion of the human genome, scientists announced that only about two percent of the complete DNA sequence encoded proteins. Most of the rest appeared to be excess baggage, the leftovers of evolution. Surely some of it had a function – cells were known to produce some RNAs that didn’t encode proteins and had regulatory functions. But what was the rest up to?
In a collaborative project with Stanford University, Lars Steinmetz of Heidelberg and Wolfgang Huber of EMBL-EBI have been trying to answer this question. They are using a new method called a tiling array to search for new functions in the complete yeast genome.
A tiling array is a DNA chip, so the method is similar to DNA chips based on genes or microRNAs (see previous story). All of these methods use DNA probes on a glass surface to detect RNA molecules extracted from cells; thus a tiling array study also shows which parts of the genome are active under various conditions. But each of these DNA chip methods is like asking a series of yes/no questions: you only get an answer if you’ve posed the right question. Typical gene chips hold only probes for known genes, and the probes on a microRNA chip look for matches to sequences preselected by a computational analysis of the genome. The result is like conducting a survey only of friends, and supposing that the results apply to everyone. This means that the assay is biased towards the sequences which are selected and placed onto the array.
In contrast, a tiling array creates probes from both strands of the entire genome, even “junk” regions that have no known function. “There are several reasons to generate an array in such an unbiased fashion,” Lars says. “One is that we might find new genes. Another is that it will give us our first full look at the cell’s complexity at the level of RNA. It can tell us which bases in the genome are transcribed. At the moment, a tiling array is the best way to find this out.”
As often happens in the development of new technologies, he says, at first skeptics wondered whether such arrays would yield useful results. Now the power of this technology is apparent. It promises to revolutionize what microarrays can reveal about genomes.
[Figure: The tiling array used by Lars’ group uses 25-nucleotide sequences from the yeast genome as probes. RNAs (represented by the long molecule along the top) are extracted from cells, cut into fragments, and allowed to bind to any probes with complementary sequences. The method reveals not only which RNAs are produced at particular times, but where they begin and end, and also whether sequences are missing – for example, when an intron has been removed. Label in figure: Here there is a gap in the sequence; the RNA is missing some information that was present in the gene because an intron has been spliced out.]
[Figure: A readout from a tiling array experiment. Blue and green represent different strands of DNA. Dots show where probes recorded “hits” – RNA transcripts produced by the cell.]
Tiling arrays have been made before on a smaller scale, for example to investigate the genome of cellular structures called mitochondria. Most scientists believe that these structures evolved from independent organisms – probably bacteria – which once took up residence in other types of cells and never left. Mitochondria have their own DNA, a much smaller genome which reproduces independently of the DNA in the nucleus.
Recently scientists have begun interrogating larger stretches of DNA, such as whole genomes, but these studies have yielded unclear results because of a lack of precision and problems in interpretation. Lars and his colleagues took on these issues last year when they created a new “high-resolution” array that contains 6.5 million separate probes from the yeast genome.
“The resolution comes from the number of probes making up each array and the overlap between consecutive probes as they map to the genome,” Lars says. “It’s a bit like trying to read a book by sampling the text. A standard DNA chip based on genes says, we know there is some content on page five, so we start at page five and grab 50 or 60 characters in the middle. Then we skip to page eight and do the same thing. The tiling array starts with the first letter in the book (really the first base of the genome), and captures the first 25 letters. Then we move down eight letters and take another sample of 25 letters – an overlap of 17 letters in the code. And we continue this way all the way to the end of the book. Then we do exactly the same thing with the second strand of DNA.”
Each experiment using the array produces hundreds of megabytes of data – a nightmare for interpretation. Here Lars and his colleagues could draw on the expertise of Wolfgang’s group at EMBL-EBI, who have been collecting methods needed to analyze microarray experiments in a suite of tools called Bioconductor. The methods are particularly good at distinguishing meaningful data from noise, which matters most with short probes, because the specific sequence of each probe affects how well RNAs from the sample bind to it. Poor binding leads to ambiguous results and lots of noise.
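The sampling scheme Lars describes can be made concrete with a short sketch. It is only an illustration, not the group's actual probe-design pipeline: the function names `tile_probes` and `reverse_complement` and the 80-base toy sequence are invented for this example, and only the probe length of 25 and the step of eight letters come from the text. Whether a spotted probe carries the genomic sequence or its complement is a detail the article does not state, so the sketch simply samples each strand as written.

```python
# Illustrative sketch only: names and the toy sequence are assumptions,
# not part of the study described in the article.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the second DNA strand, read in its own 5'-to-3' direction."""
    return sequence.translate(COMPLEMENT)[::-1]


def tile_probes(sequence, probe_len=25, step=8):
    """Return (start, probe) pairs: a window of probe_len letters every step letters."""
    last_start = len(sequence) - probe_len
    return [
        (start, sequence[start:start + probe_len])
        for start in range(0, last_start + 1, step)
    ]


# Made-up 80-base "genome" standing in for the book in Lars' analogy.
genome = "ATGCGTACCTGAAGCTTGACCGTATGCAAGTCCGATTAGC" * 2

forward_probes = tile_probes(genome)                      # first strand
reverse_probes = tile_probes(reverse_complement(genome))  # second strand

# Consecutive probes share probe_len - step letters.
overlap = 25 - 8

for start, probe in forward_probes[:3]:
    print(start, probe)
```

With a 25-letter window and a step of 8, each probe shares its last 17 letters with the start of the next one, which is the overlap quoted in the text.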
The analysis was important and complicated, Wolfgang says, because the tiling array shows where RNAs are bound – but not precisely what they are. The same DNA sequence can produce different RNAs, for example, when an RNA is spliced to remove an intron. Thus several forms of a molecule may be bound to the same probe, and it takes clever computational and statistical methods to understand what a “hit” means.
Overall, the study revealed that when yeast grows in a rich source of food, 84.5% of the entire genome is transcribed into RNA. This is substantially more than the protein-encoding part, which accounts for about 75% of the yeast genome. Of the bases that are transcribed, 16% had never before been observed or predicted to be transcribed.
The scientists also made some important discoveries about the structure of genes. “In many cases the coding region of the gene was well-known,” Lars says. “But it’s been harder to get a direct look at the untranslated regions of the RNA at the head and tail. With this study we could determine exactly where an RNA molecule begins and ends.”
“This opens a new frontier,” Lars says. “Yeast was the first completely sequenced eukaryotic organism, and people have had ten years to work on the information encoded in its genome. Even so, there is a vast amount of transcription detected by our study that was not known.”
The same is true of other genomes, including the human genome. Paul Bertone, who recently joined Nick Luscombe’s group at EMBL-EBI, carried out a tiling array study of the entire human genome as a PhD student at Yale. In both studies, probes captured a large number of RNAs that hadn’t been known to exist, including antisense RNAs.
Wolfgang’s analysis showed that some genes had an unexpectedly complex architecture. At times, parts of genes were expressed at different levels, indicating that RNA molecules of different lengths had been created from the same DNA sequence.
Other unusual cases included single RNAs that seemed to encompass two neighboring protein-coding regions.
Another discovery was that the average tail region of an mRNA is longer than the head – 91 versus 68 nucleotides. That makes sense, Lars says, because the tail region is often packed with information that helps cells regulate when, where and how often an RNA is translated into protein. The longest tails were usually found in RNAs that encoded proteins which would be used in the mitochondria, the cell membrane or the cell wall. Long untranslated regions usually indicated that the RNAs were somehow being regulated, for example through the attachment of proteins or other RNAs.
The yeast data represents the most accurate transcriptional map of any eukaryotic organism, with far higher resolution, Lars says. Even so, it’s just a beginning. He believes that there is much more information to be mined from the data that has been obtained. “And comparing these results to similar studies in other organisms, once they can be carried out at a sufficiently high resolution, will give us unique insights into evolution,” he says. “Comparing complete genomes has already suggested that a lot of DNA beyond the protein-encoding content of genes may have more functions than we have been able to observe. If we find that this information is transcribed in several species, it will give us a handle to start looking for its functions.”
As well as discovering hundreds of new RNAs, and RNAs produced by reading the “second” strand of DNA, the researchers obtained new insights into the functions of these molecules. The length of an RNA’s untranslated regions is related to its function and the region of the cell in which it operates. And what happens on the two strands is not independent.
If both strands of DNA in a particular region encode an RNA, their untranslated regions tend to be longer. “Antisense” RNAs made by transcribing the strand opposite another gene often seem to be involved in regulating other RNAs – which is logical, because their sequences are complementary to the RNA made from the opposite strand, and thus the two molecules could bind to each other. This was suggested based on genetic engineering experiments several years ago, but so far the phenomenon hasn’t been considered to have a serious role under normal conditions in the cell. Lars says that this study revisits the issue and suggests that antisense transcription, which the scientists have now observed extensively over the genome, could indeed have a regulatory role in yeast cells.
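The point about complementary sequences can be illustrated with a few lines of code. This is a minimal sketch under simplifying assumptions: it only tests for a perfect, full-length match, whereas real RNA duplexes tolerate mismatches and partial overlap. The names `reverse_complement_rna` and `can_pair` and the 12-base fragment are hypothetical and do not come from the study.

```python
# Illustrative sketch only: function names and the example fragment are made up.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna):
    """Return the RNA that would pair base-for-base with the input, read 5' to 3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def can_pair(rna_a, rna_b):
    """True if rna_b is the exact reverse complement of rna_a (a perfect duplex)."""
    return rna_b == reverse_complement_rna(rna_a)


sense = "AUGGCUUACGGA"                     # hypothetical fragment of a transcript
antisense = reverse_complement_rna(sense)  # what the opposite strand would yield

print(antisense)
print(can_pair(sense, antisense))
print(can_pair(sense, sense))
```

An RNA transcribed from the opposite strand of the same stretch of DNA passes this test by construction, which is why antisense transcripts are natural candidates for binding, and possibly regulating, their sense partners.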