nd
RNA 2 structure prediction
Why?
●
●
●
tRNA phenylalanine from yeast
tertiary and 2nd structure
RNA molecules can be
classified in:
●
messenger (coding) RNA
●
non-coding RNA
The non-coding RNAs have a
wide range of functions that is
(believed to be) determined by
its tertiary structure
The scaffold for the tertiary
structure is provided by the 2nd
structure
RNA sequence
●
●
●
RNA (RiboNucleic Acid) molecules are very similar to
DNA (DeoxyriboNucleic Acid) molecules
Each molecule is made of a chain of nucleotides
(bases). There are only four nucleotides.
Thus, the sequence (or primary structure) of the RNA
molecule can be represented as a string over the
alphabet {A, C, G, U}
adenine
cytosine
guanine
uracil
nd
RNA 2 structure
●
●
Unlike DNA, RNA is produced as a single stranded
molecule which then folds to form base pairs
(2nd structure)
The typical base pairs are:
●
canonical (Watson - Crick):
–
–
●
non-canonical:
–
●
A and U
C and G
G and U
RNA can have other base pairs but they are formed
with very low frequency
nd
RNA 2 structure representation
●
Primary structure
●
Dome
UUUGGAUAAAA
●
Sequence of base pairs
1
3
{1 · 11, 2 · 10, 3 · 9}
●
Bracket
(((.....)))
●
Standard
graphical
representation
9
11
Base pairs
●
Any base can take part in at most one base pair
●
Two base pairs can be in one of three configurations
juxtaposed
●
●
nested
overlapping
Overlapping base pairs form a pseudoknot
A 2nd structure without pseudoknots can be
represented as a planar graph
nd
RNA 2 structure prediction
●
●
●
Energy minimization
●
predict a 2nd structure of least free energy
●
based on primary structure only
●
example Nussinov, Zuker's Mfold
Comparative structure prediction
●
predict 2nd structures for several sequences
●
based on a prior (reliable) alignment
Probabilistic models
●
example SCFGs (stochastic context free grammars)
Nussinov
●
●
●
Minimum energy ↔ maximum number of base pairs
Calculate best structure for small subsequences and work
outwards to larger and larger subsequences
Notations
●
seq
the RNA sequence
(over alphabet {A, C, G, U})
●
seq[i, j]
the RNA sequence from position i to j
●
str
the best 2nd structure for seq
(over alphabet {(, ), .})
●
str[i, j]
the best 2nd structure for seq[i, j]
●
score[i, j]
the number of base pairs in str[i, j]
Nussinov
●
i unpaired and str[i+1, j]
i
●
i+1
j
j unpaired and str[i, j-1]
i
●
j
j-1
j
seq[i] · seq[j] and str[i+1, j-1]
i
●
j-1
i+1
str[i, k] and str[k+1, j]
for some i < k < j
i
k
k+1
j
Nussinov
score [i , j ] =
i
i
{
0
if j − i < 2
{
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
i+1
j-1
j
i
j
i
i+1
j-1
k
k+1
j
j
Nussinov
score [i , j ] =
i
{
0
if j − i < 2
{
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
i+1
i
j-1
Space?
j
i
j
i
j-1
i+1
k
k+1
Time?
j
j
Nussinov
score [i , j ] =
i
{
0
if j − i < 2
{
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
i+1
i
j-1
Space? O(n2)
j
i
j
i
j-1
i+1
k
k+1
Time? O(n3)
j
j
Nussinov
U
U
U
G
G
A
U
A
A
A
A
U
0
0
0
0
0
0
0
0
0
0
0
score [i , j ] =
U
0
0
0
0
0
0
0
0
0
0
0
{
0
U
G
G
A
U
A
A
A
A
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
if j − i < 2
{
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
Nussinov
U
U
U
G
G
A
U
A
A
A
A
U
0
0
0
0
0
0
0
0
0
0
0
score [i , j ] =
U
0
0
0
0
0
0
0
0
0
0
0
{
0
U
0
0
0
0
0
0
0
0
0
0
0
G
1
1
0
0
0
0
0
0
0
0
0
G
2
1
1
0
0
0
0
0
0
0
0
A
2
2
1
0
0
0
0
0
0
0
0
U
2
2
1
1
1
0
0
0
0
0
0
A
3
2
2
1
1
0
0
0
0
0
0
A
3
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
if j − i < 2
{
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
Backtracking
U
U
U
G
G
A
U
A
A
A
A
U
0
0
0
0
0
0
0
0
0
0
0
score [i , j ] =
U
0
0
0
0
0
0
0
0
0
0
0
{
0
U
0
0
0
0
0
0
0
0
0
0
0
G
1
1
0
0
0
0
0
0
0
0
0
G
2
1
1
0
0
0
0
0
0
0
0
if j − i < 2
{
A
2
2
1
0
0
0
0
0
0
0
0
U
2
2
1
1
1
0
0
0
0
0
0
A
3
2
2
1
1
0
0
0
0
0
0
A
3
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
(4 (3 (2 . (1 . )1 )2 )3 )4 .
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
Backtracking
U
U
U
G
G
A
U
A
A
A
A
U
0
0
0
0
0
0
0
0
0
0
0
score [i , j ] =
U
0
0
0
0
0
0
0
0
0
0
0
{
0
U
0
0
0
0
0
0
0
0
0
0
0
G
1
1
0
0
0
0
0
0
0
0
0
G
2
1
1
0
0
0
0
0
0
0
0
if j − i < 2
{
A
2
2
1
0
0
0
0
0
0
0
0
U
2
2
1
1
1
0
0
0
0
0
0
A
3
2
2
1
1
0
0
0
0
0
0
A
3
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
(4 (3 (2 . ( 1 . ) 1 )2 )3 )4 .
Time?
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
Backtracking
U
U
U
G
G
A
U
A
A
A
A
U
0
0
0
0
0
0
0
0
0
0
0
score [i , j ] =
U
0
0
0
0
0
0
0
0
0
0
0
{
0
U
0
0
0
0
0
0
0
0
0
0
0
G
1
1
0
0
0
0
0
0
0
0
0
G
2
1
1
0
0
0
0
0
0
0
0
if j − i < 2
{
A
2
2
1
0
0
0
0
0
0
0
0
U
2
2
1
1
1
0
0
0
0
0
0
A
3
2
2
1
1
0
0
0
0
0
0
A
3
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
A
4
3
2
1
1
1
1
0
0
0
0
(4 (3 (2 . (1 . )1 )2 )3 )4 .
Time? O(n2)
score [i + 1, j ]
score [i , j − 1]
max
score [i + 1, j − 1] + 1 if seq [i ]⋅seq [ j ]
max i < k < j − 1 (score [i , k ] + score [k + 1, j ])
Nussinov
VS
(((.....)))
(((.(.)))).
True
Nussinov
VS
© Copyright 2026 Paperzz