Vol. 20 no. 10 2004, pages 1495–1499 doi:10.1093/bioinformatics/bth108 BIOINFORMATICS Conserved RNA secondary structures in viral genomes: a survey I.L. Hofacker1, ∗, P.F. Stadler1,2 and R.R. Stocsits1 1 Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, Vienna, A-1090, Austria and 2 Bioinformatik, Institut für Informatik, Universität Leipzig, Leipzig, D-04103, Germany Received on October 12, 2003; accepted on February 3, 2004 ABSTRACT Summary: The genomes of RNA viruses often carry conserved RNA structures that perform vital functions during the life cycle of the virus. Such structures can be detected using a combination of structure prediction and co-variation analysis. Here we present results from pilot studies on a variety of viral families performed during bioinformatics computer lab courses in past years. Contact: [email protected] INTRODUCTION In addition to coding for proteins, the genomes of RNA viruses often contain functionally active RNA structures that are vital during the various stages of the viral life cycle. In many cases only a small part of the viral genome is functionally relevant at the level of RNA. Detecting these motifs is a difficult and tedious task since the secondary structures of such regions in general do not look significantly different from structures formed by random sequences. RNA secondary structures are however very fragile to accumulation of point mutations: computer simulations (Schuster et al., 1994) showed that a small number of point mutations is very likely to cause large changes in the secondary structures: mutations in some 10% of the sequence positions already lead almost certainly to unrelated structures if the mutated positions are chosen randomly. Secondary structure elements that are consistently present in a group of sequences with less than, say 95%, average pairwise identity are therefore most likely the result of stabilizing selection, not a consequence of the high degree of sequence conservation. This fact is exploited by the alidot algorithm for a systematic search of conserved secondary structure patterns in large RNA. In brief, independent predictions of the secondary structure for each of the sequences and a multiple sequence alignment that is obtained without any reference to the predicted secondary structures are combined to a list of homologous base pairs. This list is then sorted ∗ To whom correspondence should be addressed. Bioinformatics 20(10) © Oxford University Press 2004; all rights reserved. by means of hierarchical credibility criteria that explicitly take both thermodynamic information and information on sequence covariation into account. A detailed description of the method can be found in Hofacker et al. (1998) and Hofacker and Stadler (1999), and the programs are available from http://www.tbi.univie.ac.at/RNA/ as part of the Vienna RNA Package. A comprehensive analysis of the genomic secondary features is available for Picornaviridae (Witwer et al., 2001), Flaviviridae (Thurner et al., 2004), and Hepadnaviridae (Stocsits et al., 1999; Kidd-Ljunggren et al., 2000). In this contribution we report on preliminary surveys of a variety of virus families. LEVIVIRIDAE Leviviridae are small ssRNA phages without an envelope and tail. The replication cycle includes no DNA stage. The total genome length is 3466–4276 nt depending on the type of strain. Most Levivirus species have four (partly) overlapping genes; some exceptions exist that contain only three open reading frames. Currently, two structural virion proteins have been found and identified. We have investigated eight sequences of the Levivirus genus. Currently, there are five types of strains represented in the GenBank database as complete genomes: the Enterobacteria phages MS2, KU1, GA, fr and AP205. The last one was eliminated from the data set analyzed here because of large changes in genomic structure. Figure 1 gives an overview of potentially conserved structures in Leviviridae, in the form of a mountain plot. In the mountain plot, each base pair (i, j ) is represented as a slab ranging from i to j with a height corresponding to the mean probability of the pair. In the gray-scale version shown here, shades of gray, from light to dark, indicate an increasing number of consistent mutations. The base pairs marked by sparse and dense white stripes have one or two non-compatible sequences, respectively. Detailed results for each of these regions can be found online at http://rna.tbi.univie.ac.at/virus. Of particular interest is the 5 terminal end, where we detect 1495 I.L.Hofacker et al. GG G C C U A G CC G C G C G C G C G U C U AB C Adsorption A2 DE F G Coat H I Polymerase Fig. 1. The Hogeweg mountain plot on the left gives an overview of the potentially conserved secondary structures of the complete RNA genome of Levivirus. The underlying alignment is obtained using code2aln (Stocsits, 2003). In the genome chart below credible RNA structures and positions of the open reading frames are indicated. The position numbers of these elements (compiled in the small table on the right) refer to the alignment available at http://rna.tbi.univie.ac.at/virus/. The 5 -end structure A is homologous to the 5 -terminal RNA replicase recognition site of the allolevivirus Qβ where it is required for the recognition of the template and its amplification. Element F could correspond to the coat protein binding site that is located just upstream of the polymerase gene in Qβ . a short GC-rich hairpin (tetraloop) that follows an unpaired GGG element (see upper right of Fig. 1). This element is most likely the analogon to the stem-loop-structure that is recognized by the Qβ replicase enzyme and is essential for viral replication (Biebricher and Luce, 1993). NEGATIVE-STRANDED RNA VIRUSES The latest version of Virus Taxonomy (Büchen-Osmond, 2003) classifies the families Paramyxoviridae, Rhabdoviridae, Filoviridae and Bornaviridae as members of the order Mononegavirales. The families Orthomyxoviridae, Bunyaviridae and Arenaviridae remain unassigned. Paramyxoviridae have genome sizes of about 15–16 kb. The only unambiguous RNA structure is a panhandle structure that is highly conserved. In addition, there are a few weak signals for specific small RNA features in the genera Respirovirus and Morbillivirus that deserve a more detailed analysis. Rhabdoviridae have a genome size of about 12 kb. No unambiguous signals have been detected. In the genus Rhabdovirus 1496 conserved gene-end and -start sequences have been reported (Wertz et al., 1994) that might be associated also with conserved structural features. Filoviridae. Complete sequences are available only for Reston and Zaire Ebola and for the Marburg virus. These three sequences are insufficient for the alidot approach. The program qrna (Rivas and Eddy, 2001) was therefore used to identify possible conserved RNA features in a pairwise alignment. An example is shown in Figure 2. Bunyaviridae exhibit panhandle structures that are confirmed by the existence of compensatory mutations. Neither the genus Hantavirus nor the genus Bunyavirus (which has a genome consisting of three distinct chromosomes) exhibit convincing additional RNA secondary structure motifs in either the (+) or the (−)-strand although there are some small hairpins containing a few consistent and compensatory mutations reported in Fekete (2000). Orthomyxoviridae have genomes that contain a moderate number of small segments (eight in the case of the influenza virus). As for most negative-stranded RNA viruses our computations confirm the well-known panhandle structures (Cheong et al., 2003). A hairpin loop structure at both the 5 Conserved RNA structures in viral genomes G U C UC AG AG A CC A A A CG AA U A C C UU A A CUC C A A C A A A UU A A C C U C C C A A GA U U G AGG_ A A U A A A C C C GA A AGAG A A A A C A U GA AGA CU A U U A C A UUUU A A C A CGGGGG CUUU A A C UGUG A C A A U A A AG A A U A AUACA A G G G A A U A A CUCC A U A A A A A U C U U A AA AA C A CA A Fig. 2. Common RNA structure of Reston Ebola and Marburg virus in the region from 13 100 to 13 270 of their clustalw alignment. The signal was detected by qrna and then re-analyzed using alidot. Circles mark sequence variation supporting the structure. and 3 ends (which competes with the panhandle and hence does not show in our data) is required for efficient endonuclease activity of influenza virus RNA polymerase, an activity that is required for the cap-snatching activity of primers from host pre-mRNA (Leahy et al., 2001). Arenaviridae also exhibit panhandle structures that have been repeatedly reported in the literature. Conserved intergenic regions containing poly-U stretches are reminiscent of Rhabdoviridae. We did not find evidence that these sequences form conserved secondary structures. Panhandle structures appear to be the only common RNA secondary structure features of negative stranded RNA viruses, despite at least weak indications of additional conserved features in some groups. POSITIVE-STRANDED RNA VIRUSES The survey of positive-stranded RNA viruses is far from complete at this point. Apart from the families Picornaviridae (Witwer et al., 2001) and Flaviviridae (Thurner et al., 2004), which have been studied in detail, preliminary data are available for three additional unrelated groups. Dicistroviridae are an only recently described family of arthropod viruses. The genome contains a 5 untranslated region of 500–800 nt followed by two ORFs of around 5500 nt and 2600 nt separated by an untranslated intergenic region (IGR) of approximately 190 nt. This di-cistronic structure of the genome is special among RNA viruses. The IGR between the replicase and the four capsid proteins contains an IRES region while the 5 end contains another IRES motif despite the fact that the translation is fMet-initiated. Preliminary data indicate that the IGR–IRES is structurally well conserved. Comoviridae. A number of conserved elements with a larger number of compensatory mutations can be found within the three genera Comovirus, Fabavirus and Nepovirus. A hairpin structure with a well-conserved stem and a relatively large A/U-rich loop region (pos. 1257–1271 of the Nepovirus alignment) is reminiscent of the cis-acting replication element (CRE) of Picornaviridae. The close similarity of the life cycles of Picornaviridae and Comoviridae lends credibility to this conjecture. Potyviridae have genomes of almost 10 000 nt consisting of a single ORF, coding for a polyprotein, and short UTRs at the 5 and 3 ends. Many complete sequences are available since the family includes important plant pathogens. Because of high sequence diversity, it is difficult to get good alignments; nevertheless, a number of interesting features were found. As an example Figure 3 shows the conserved structure in the 3 UTR of Potyvirus. RETROVIRIDAE Retroviridae, in contrast, exhibit a large number of functional RNA secondary structures. In the case of the lentivirus HIV, some of them are very well characterized both experimentally and computationally, such as the TAR hairpin at the 5 end, the gag/pol frameshift hairpin (also present in Mammalian Type B Retrovirus) and the RRE structure. Our pilot study yields indications of functional RNA structures throughout the family. However, there seems to be little similarity in the motifs used by different genera. A comparison between different genera is difficult because the low sequence homology does not permit reliable alignments. A much more detailed analysis will be necessary to determine whether some secondary motifs are shared between all Retroviridae. DISCUSSION By combining thermodynamic structure prediction with co-variation analysis, conserved RNA secondary structures in viral RNA genomes can be detected reliably. The analysis works best if at least five sequences with pairwise identities of around 80% are available. For many types of viruses such sequence data are now available, and we have begun a systematic survey of viral RNA structures, the results of which are made available in an online database at http:// rna.tbi.univie.ac.at/virus/ All the RNA viruses studied so far contain at least some functional RNA structures. These conserved structures occur particularly often in the 5 and 3 UTRs, but even within coding regions conserved RNA structures are not rare. To our surprise, the bibliographic search for experimental evidence of and further information on ‘RNA secondary structures’ in a given group of virus—a seemingly rather straightforward task—turned out to be more tedious than the work on the actual sequence and structure data. We have therefore developed the litsift tool (Faulstich et al., 2003) to make a bibliographic search in the context of the Atlas of Viral Secondary Structures more effective. To 1497 I.L.Hofacker et al. UU G A G A C G A U A A A U G G U A G U U G UA G A C A C G U AC U A A A UA A CC C G AG A G C G UU U U C G A A C GA A A U C U G GU U U G U A U GU G UG A A A C G A A A C U U U AA A U CA C CG UG G A C G C U U CGU U A A C U U A C C G A A G A A AG A A A A U U A AUU G G GGC G U GC U A A GG G U G A CG U A A A C G U AA G A A C U G C C CC GU A A U AA U _ A _ C AAUC G UA A G G C G U _ A G G A UA A U G U U _ GU GU G C G A U UU U A C A C G G A A CU CAC UG U _ U U C A U C C U A G C G U A U A A A AG G A U G UG U U A A GA GAG A G U G C U C A G A AA AC U U AA C U U A CC C G GU A C C U _ UC GA U G GG G C A A C GU U U G U GC AG A A A G GU G A A U ACA C CC UA UC UA G G AU A A AGA A A C U A UA GC CG CG UA GA U A CG U G G G CA AC U U G A C AU A G UA GU C G C U A UG AC G A U GC GG U A G C U CG U GC A AG G G A U U GU A AC U A AU G U G A C A C A A C U GC A A GC G U AU C AA U A A CG A A U CUG U UA C GC CG GC AA GC AU A UA C UG G C A UU A A U U UG C G C A AA U UG U CU A G U UG U G UA CG U U A U G G Fig. 3. Conserved secondary structures in the 3 end of Potyvirus. The region spans from approximately 150 nt before to 400 nt after the stop codon. Potyvirus sequences are too divers to obtain a reliable alignment of all sequences. Here we used five sequences (NC_001517 Pepper Mottle Virus, NC_001616 Potato Virus Y, NC_004010 Potato Virus V, NC_004426 Wild Potato Mosaic Virus, NC_004573 Peru Tomato Mosaic Virus) aligned using clustalw. this end, we use classifiers trained on sample corpora in a system that filters and ranks search results from bibliographic databases such as Pubmed. Preliminary results indicate that a classifier trained on one virus group can indeed be applied successfully to search the literature on other virus groups. ACKNOWLEDGEMENTS This work is supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung, Project Nos. P-13545MAT and P-15893, and the German DFG Bioinformatics Initiative. Pilot studies on individual virus groups were conducted as part of bioinformatics computer lab courses in Vienna and Leipzig—Winter 2000: Leviviridae (Erwin Gaubitzer, Philipp Hohensinner); Winter 2001: Retroviridae (Peter Andorfer, Veronika Bayer, Alexander Biedermann, Matthias Binder, Claudia Blaukopf, Ulrich Braunschweig, Claudia Fried, Stefan Fringer, Sebastian Glatt, David Grote, Michaela Gruber, Jakob Haglmüller, Leonhard Heinz, Astrid Hrdina, Alexander Kassler, Vedhu Krystufek, Christian Lahsnig, Beate Lichtenberger, Christian Martschitsch, Georg Mitterer, Michael Müller, Gregor Obernosterer, Florian Pauler, Paul Perco, Thomas Perlot, Sonja Prohaska, Markus Reschke, Miriam Satler, Manfred Schifrer, Yvonne Schindlegger, Ulrike Seifert, Sabine Stampfl, Stefanie Strobach, Andrea Tanzer, Thomas Taylor, Siegfried Ussar, Stefan Washietl, 1498 Wolfgang Winkler); Winter 2002: Mononegavirales (Philipp Adaktylos, Katja Balazs, Florian Baumgart, Stefan Bruckner, Christoph Buchmann, Valerie Diederichs, Karin Ebner, Jörg Ettenauer, Wolfgang Fischl, Andrea Förster, Andreas Gruber, Susanne Haider, Philipp Hainzl, Jennifer Hetzl, Hans-Peter Kantner, Martina Knapp, Stefan Kraus, Roman Ferdinand Kreindl, Dominik Muggenhumer, Murat Özcelik, Jürgen Pfatschbacher, Petra Pokorny, Josef Ruppert, Johann Schmidt, Simone Schopper, Andreas Verhounig, Michael Wittinger) Spring 2003: Dicistroviridae (Christine Körner, Torsten Glomb); Potyviridae (Franziska Heinze, Sonja Karam, Jörg Lehmann, Markus Rohrschneider); Comoviridae (Jan Buck, Alexander Groß, Jana Hertel, Manuela Lindemeyer, Peter Menzel, Alexander Schubert, Stefan Seemann). REFERENCES Biebricher,C.K. and Luce,R. (1993) Sequence analysis of RNA species synthesized by Qbeta replicase without template. Biochemistry, 32, 4848–4854. Büchen-Osmond,C. (2003) The universal virus database ICTVdB. Comput. Sci. Eng., 5, 16–25. Cheong,H.-K., Cheong,C., Lee,Y.-S., Seong,B.L. and Choi,B.-S. (2003) Structure of influenza virus panhandle RNA studied by NMR spectroscopy and molecular modeling. Nucleic Acids Res., 27, 1392–1397. Conserved RNA structures in viral genomes Faulstich,L.C., Stadler,P.F., Thurner,C. and Witwer,C. (2003) litsift: Automated text categorization in bibliographic search. In Scheffer,T. and Leser,U. (eds), Data Mining and Text Mining for Bioinformatics, (ECML/PKDD 2003), Berlin, pp. 20–25. Fekete,M. (2000) Scanning RNA virus genomes for functional secondary structures. Ph.D. thesis, University of Vienna. Hofacker,I.L. and Stadler,P.F. (1999) Automatic detection of conserved base pairing patterns in RNA virus genomes. Comp. Chem., 23, 401–414. Hofacker,I.L., Fekete,M., Flamm,C., Huynen,M.A., Rauscher,S., Stolorz,P.E. and Stadler,P.F. (1998) Automatic detection of conserved RNA structure elements in complete RNA virus genomes. Nucleic Acids Res., 26, 3825–3836. Kidd-Ljunggren,K., Zuker,M., Hofacker,I.L. and Kidd,A.H. (2000) The hepatitis B virus pregenome: prediction of RNA structure and implications for the emergence of deletions. Intervirology, 43, 154–164. Leahy,M.B., Dobbyn,H.C. and Brownlee,G.G. (2001). Hairpin loop structure in the 3 arm of the influenza A virus virion RNA promoter is required for endonuclease activity. J. Virol., 75, 7042–7049. Rivas,E. and Eddy,S.R. (2001). Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics, 2, 1–19. Schuster,P., Fontana,W., Stadler,P.F. and Hofacker,I.L. (1994) From sequences to shapes and back: A case study in RNA secondary structures. Proc. R. Soc. London B, 255, 279–284. Stocsits,R., Hofacker,I.L. and Stadler,P.F. (1999) Conserved secondary structures in hepatitis B virus RNA. In Computer Science in Biology. University of Bielefeld, Bielefeld, D, pp. 73–79. Proceedings of the GCB’99, Hannover, D. Stocsits,R. (2003) Nucleic acid sequence alignments of partly coding regions. Ph.D. thesis, University of Vienna. Thurner,C., Witwer,C., Hofacker,I. and Stadler,P.F. (2004) Conserved RNA secondary structures in Flaviviridae genomes. J. Gen. Virol., submitted. Wertz,G., Whelan,S., LeGrone,A. and Ball,L.A. (1994) Extent of terminal complementarity modulates the balance between transcription and replication of vesicular stomatitis virus RNA. Proc. Natl Acad. Sci., USA, 91, 8587–8591. Witwer,C., Rauscher,S., Hofacker,I.L. and Stadler,P.F. (2001) Conserved RNA secondary structures in picornaviridae genomes. Nucleic Acids Res., 29, 5079–5089. 1499
© Copyright 2026 Paperzz