Outline Motivation Existing implementations Combination of two models Application Combined thermodynamic and evolutionary model for RNA secondary structure prediction Rolf Backofen1 , Jan Gorodkin2 and Stefan Seemann1,2 1 Bioinformatics 2 Division - Inst. of Computer Science, Albert-Ludwigs-University Freiburg of Genetics and Bioinformatics, IBHV, Copenhagen University, Denmark Bled, Feb 2007 Discussion Outline Motivation Existing implementations Combination of two models 1 Motivation 2 Existing implementations Pfold Vienna RNA Package - RNAfold 3 Combination of two models 4 Application Model performance Alignment dependencies 5 Discussion Application Discussion Outline Motivation Existing implementations Combination of two models Outline 1 Motivation 2 Existing implementations Pfold Vienna RNA Package - RNAfold 3 Combination of two models 4 Application Model performance Alignment dependencies 5 Discussion Application Discussion Outline Motivation Existing implementations Combination of two models Application Motivation non-coding RNA genes provide their functionality through their space conformation functional structures are conserved in the evolution several independent models to judge consensus secondary structures: 1 2 3 evolutionary model of RNA sequences probabilistic model for secondary structure thermodynamic model for folding energy Discussion Outline Motivation Existing implementations Combination of two models Outline 1 Motivation 2 Existing implementations Pfold Vienna RNA Package - RNAfold 3 Combination of two models 4 Application Model performance Alignment dependencies 5 Discussion Application Discussion Outline Motivation Existing implementations Combination of two models Application Pfold Probabilistic evolutionary model1 , which consists of stochastic evolutionary model (T) 1 ~ i ~Ai+j−1|T ] Prpaired [A ~ i |T ] Prsingle [A SCFG-based probabilistic model (M) for secondary structure 2 production rules: S → LS | L F → dFd | LS L → s | dFd 1 (produces loops) (produces stems) (single base or new stem) Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res. 31(13):3423-8. Discussion Outline Motivation Existing implementations Combination of two models Application Discussion Pfold Most probable consensus structure σ can be determined by maximize Pr[σ|A, T , M]: σ MAP = argmax Pr[A|σ, T , M] Pr[σ|T , M] σ This can be solved using the CYK-algorithm. Outline Motivation Existing implementations Combination of two models Application Discussion Vienna RNA Package - RNAfold2 The partition function measures the probability of a secondary structure σ in thermodynamic equilibrium: ∆Gσ Zσ e− RT Pσ = =P ∆GS Z e− RT S∈Ω It can be calculated the density probability of base pairs Pr [(Aiu , Aui+j−1 ) | su ] unpaired bases Prunpaired [Aui+j−1 |su ] 2 I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 Outline Motivation Existing implementations Combination of two models Outline 1 Motivation 2 Existing implementations Pfold Vienna RNA Package - RNAfold 3 Combination of two models 4 Application Model performance Alignment dependencies 5 Discussion Application Discussion Outline Motivation Existing implementations Combination of two models Application Discussion Extended model Combination of probabilistic evolutionary information with thermodynamic parameters of the standard energy minimization model: sh T, M and σ s1 Pr[ σ1| σ =A σ 1 sn s1] Pr[σn| σn =A σ sn] Outline Motivation Existing implementations Combination of two models Application Probabilistic evolutionary model Def. of probability of structure σ: Pr[A|σ, T , M] Pr[σ|T , M] = probM,τM (σ) (r , A) Recursively definition for the probabilistic evolutionary model: probM,τM (σ) (n, A) = Pr[rule(n)|M] Q × kℓ=1 probM,τM (σ) (nℓ , A) ~ i ~Ai+j−1|T ] if rule(n) = F → dFd Prpaired [A or rule(n) = L → dFd × i ~ Prsingle [A |T ] if rule(n) = L → s 1 else Discussion Outline Motivation Existing implementations Combination of two models Application PE thermodynamic model probM,τ (σ) (n, A) = M Pr[rule(n)|M] × Qk ℓ=1 probM,τM (σ) (nℓ , A) 8 Prpaired [~ Ai ~ Ai+j−1 |T ] > > ( > i+j−1 > > Q Pr [(Aiu , Au ) | su ] > > × nu=1 > > i+j−1 > |su ] Prunpaired [Aiu |su ] × Prunpaired [Au > > > > > > > > i i+j−1 > > Prpaired [~ A~ A |T ] < ( i+j−1 i+j × Q Pr [(Aiu , Au ) | (Ai−1 u , Au ), su ] >× n > i+j−1 u=1 > > Prunpaired [Aiu |su ] × Prunpaired [Au |su ] > > > > > > > Qn > i i > ~ > Pr [A |T ] × u=1 Prunpaired [Au |su ] > > single > > > > : 1 if rule(n) = L → dFd i+j−1 if bp(sui , su ) i+j−1 if ¬bp(sui , su ) if rule(n) = F → dFd i+j−1 if bp(sui , su ) i+j−1 if ¬bp(sui , su ) if rule(n) = L → s else Discussion Outline Motivation Existing implementations Combination of two models Application Discussion Gaps Treating gaps is a general problem in biological sequence analysis: alignment columns with ≥ 25% gaps are removed (like in Pfold) sequence depended probabilities are calculated without gaps gap probabilities are estimated as geometric mean of probabilities in the appropriate column Outline Motivation Existing implementations Combination of two models Outline 1 Motivation 2 Existing implementations Pfold Vienna RNA Package - RNAfold 3 Combination of two models 4 Application Model performance Alignment dependencies 5 Discussion Application Discussion Outline Motivation Existing implementations Combination of two models Application Model performance in U1 Comparison of: PETfold, RNAalifold, Pfold Test data: Rfam seed alignment of U1 spliceosomal RNA (av.id= 40.6%; max.id = 50%; #seq = 5) set rules 0 0 0 0 1 1 1 1 set set log2 prob tree seq 0 0 0 0 1 -1089 1 0 -12 1 1 -1381 0 0 -37 0 1 -1276 1 0 -106 1 1 -1540 RNAalifold Pfold sensitivity [%] 0 75 95 72.5 0 72.5 90 70 62.5 95 specificity [%] 0 64 90 81 0 74 90 82 86 90 Discussion Outline Motivation Existing implementations Combination of two models Application Discussion Optimal consensus structure Rfam Pfold RNAalifold PETfold Rfam Pfold RNAalifold PETfold Rfam Pfold RNAalifold PETfold Rfam Pfold RNAalifold PETfold mir-9/mir-79 microRNA precursor family ...<<<<<<<<<<<<<<<..<.<<<................>.>>>..>>>>>>>>>>>>>>>... ...(((((((((((((((..(.(((................).)))..)))))))))))))))... ...(((((((((((((((..............................)))))))))))))))... ...(((((((..(.(((((.(-((((.((-......--))))-))).))))).)..)))))))... sensitivity = 100%|79%|84%; specificity = 100%|100%|80% U98 small nucleolar RNA .......<<<<<.......<<<<<<<........>>>>>>>...........>>>>>...... ....................(((((..........)))))....................... .....(((..(((...((.((((((((......)))))))).))(((((.............. ..--((.((.......(((((((((((......)))))))...))))......)).))..... .......<<<<<<<...........<<<...........>>>................>>>>>>>....... ........................................................................ (((((.(((......))).)))))......(((......)))))))).....)))...)))........... ......((((((((.((((.....))))..(((......)))..(((.--..)))...)))))))..).... sensitivity = 23%|55%|73%; specificity = 100%|37.5%|52% Small nucleolar RNA Z159/U59 ..<<<<.............∼........<<<<<<<<<..∼..>>>>>>>>>..............∼..>>>>.. ((((((.............∼........(((((((((..∼..)))))))))......((...)).∼..)))))) ..((((.............∼........(((((((((..∼..)))))))))......(....)..∼..)))).. (.(((((.....((.((..∼..))..))(..(..(((..∼..)))..)..)...)-.(....)..∼..)))).) sensitivity = 100%|100%|69%; specificity = 76%|93%|53% Outline Motivation Existing implementations Combination of two models Application RNAfold vs PETfold −20 50 suboptimal structures of the U1 spliceosomal RNA AE003745 1 1 1 1 1 1 1 1 1 1 −60 1 1 1 1 1 1 11 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 −100 −80 1 1 −50 −40 1 1 1 −120 qfold Log2 Prob of substructure −40 1 1 11 1 1 1 1 1 1 −30 RNAfold Log2 Prob of substructure −20 Discussion G C C G G C C C G G G U U A C A G C C G AG G C UA A CU C G UA UU A A U AU U G C U C G U C GGU G G C AAC C G G U U GUG A UG C G A U A A U C G U G A U C A U A U GU AAG C GG U C CACU G CGG C G U C U G U C U AC G U G AG A A A G U C C U C CG A U A G G U C GU UUA G G G U C GC A CG U C U U G C A CC C G U C U G G C C C G G G U U A C A G C C G AG G C UA A C C U UA G AU CG U CG U U G U U G U C UAA G G U C C A G G U AG U G G C C GG U C G UC UU CC A A A GC A U A U G C A AU G A A G CUAG U U C U A A G CGGUU GAG U A C U GU CU U C G U C C C G C A U A GG G C A U G U CC A GC A G G U G CAU U CG C U G U U Observation: PETfold predicts more probable basepairs as single bases in multiloops RNAfold vs PETfold 3 most probable structures of the U1 spliceosomal RNA AE003745 predicted by RNAfold Discussion Application Combination of two models Existing implementations Motivation Outline C G U C U G G C C C G G G U U A C A C G C G AG G C UA A CU C G UA UAA A U UU U CG U G G U C GU C G G AC C GC U A UC U G G A UG C G A U U G A A U A G AU C U G G U U C G U UU CC A GC A C AA GA G A A G C U A UG C C U U U A G CGGU GAG U CUC U C C GC GG G C A U G U CC A GC A G G G CAUU U CG C U G Outline Motivation Existing implementations Combination of two models Application RNAfold vs PETfold 50 suboptimal structures of the mir-9/mir-79 microRNA Z81467 1 1 1 −10 1 1 1 1 1 1 1 1 1 1 −20 1 1 1 −30 1 −50 −40 1 −60 qfold Log2 Prob of substructure 1 1 1 1 −25 −20 −15 RNAfold Log2 Prob of substructure −10 −5 1 1 Discussion Outline Motivation Existing implementations Combination of two models Application Discussion Alignment dependencies in U1 Influence of average pairwise identity of an alignment on the optimal structure probability of our model: Test data: Rfam seed alignment of U1 spliceosomal RNA −4000 −6000 −8000 Log2 Probability −2000 0 the number of sequences in the alignment is fixed 2 seq 5 seq 10 seq 20 seq 30 40 50 60 70 average pairwise identity [%] 80 90 100 Outline Motivation Existing implementations Combination of two models Application Alignment dependencies in U1 Influence of sequence number in an alignment on the optimal structure probability of our model: Test data: Rfam seed alignment of U1 spliceosomal RNA 0 the average pairwise identity of the alignment is fixed −2000 −3000 −4000 −5000 Log2 Probability −1000 average pairwise identity 40 % 60 % 80 % 2 4 6 8 number of sequences 10 12 Discussion Outline Motivation Existing implementations Combination of two models Outline 1 Motivation 2 Existing implementations Pfold Vienna RNA Package - RNAfold 3 Combination of two models 4 Application Model performance Alignment dependencies 5 Discussion Application Discussion Outline Motivation Existing implementations Combination of two models Application Discussion Problems and further work prior probability of structures without extra information content (uniform distribution) low number of basepairs in large alignments (also Pfold has problems with large input) basepairs with higher probability as single bases in multiloops dangling ends are not considered until now (usage of RNAfold -p2) Outline Motivation Existing implementations Combination of two models Application Discussion Extended grammar considering stacking probabilities Modified grammar: single bases are considered in their structural context by changed F rule S → LS | L F → dFd | dFdS | sS L → s | dFd Sequence stacking probabilities are estimated by RNAfold constraints. Outline Motivation Existing implementations Thank you!!! :-) Combination of two models Application Discussion
© Copyright 2026 Paperzz