Computational RNA Design

Peter F. Stadler
Bioinformatics Group, Dept. of Computer Science &
Interdisciplinary Center for Bioinformatics,
University of Leipzig
Max Planck Institute for Mathematics in the Sciences
RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology
Institute for Theoretical Chemistry, Univ. of Vienna (external faculty)
Center for non-coding RNA in Technology and Health, U. Copenhagen
The Santa Fe Institute (external faculty)
Boston, Jul 07 2014
P.F. Stadler (Leipzig)
RNAdesign
Boston, Jul 07 2014
1 / 32
From Sequence to Shape and Back
A historical note: why inverse folding is easy in practice ...
The starting point
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
[Figure: secondary structure drawing of the sequence above]
Sequence-structure relationships in RNA
Sequence-Structure Map of RNA
1. Redundancy: There are many more sequences than structures.
2. Sensitivity: Small changes in the sequence may lead to large changes in the structure.
3. Neutrality: A substantial fraction of mutations does not alter the structure.
Implications:
1. Neutral Networks: S(ψ), the set of sequences folding into structure ψ, forms a connected “percolating” network in sequence space for all “common” structures.
2. Mutual Accessibility: The neutral networks of any two structures almost touch each other somewhere in sequence space.
RNA sequences evolve drift-like while maintaining secondary structure.
Substitution patterns reflect selective constraints on the structure.
Proc. Roy. Soc. B 255, 279-284 (1994); Proc. Natl. Acad. Sci. USA 93, 397-401 (1996);
Bull. Math. Biol. 59, 339-397 (1997); RNA 7, 254-265 (2000).
Implementations
Several programs are available that implement simple inverse folding:
RNAinverse (Hofacker et al. 1994)
  Unbiased search using adaptive walks and a simple hierarchical problem decomposition
RNAdesigner (Andronescu 2004)
  Uses sequence bias in paired/unpaired regions and a more sophisticated decomposition to speed up the search
INFO-RNA (Busch & Backofen 2006)
  Uses a sequence with minimal energy for the target structure as starting point (strong sequence bias)
NUPACK (Zadeh, Wolfe, Pierce 2011)
  Allows design of multi-strand structures
RNAiFOLD (Garcia-Martin, Clote, Dotu 2013)
  Uses constraint programming or a large neighborhood search heuristic
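To make the adaptive-walk idea concrete, here is a minimal sketch in Python. It is not the RNAinverse implementation: the energy-based folding of a real package is replaced by a toy Nussinov-style base-pair maximization, and the names `fold`, `distance` and `adaptive_walk` are hypothetical. The walk mutates one position at a time and keeps a mutation only if the base-pair distance to the target does not increase.

```python
import random

# Allowed base pairs (Watson-Crick plus GU wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def fold(seq, min_loop=3):
    """Toy folding (assumption): maximize the number of base pairs, return dot-bracket."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                      # case: j unpaired
            for k in range(i, j - min_loop):            # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    struct, stack = ["."] * n, [(0, n - 1)]
    while stack:                                        # traceback
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    struct[k], struct[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(struct)

def pair_set(struct):
    """Set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, c in enumerate(struct):
        if c == "(":
            stack.append(j)
        elif c == ")":
            pairs.add((stack.pop(), j))
    return pairs

def distance(a, b):
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(pair_set(a) ^ pair_set(b))

def adaptive_walk(target, steps=500, seed=0):
    """Accept single-point mutations that do not increase the distance to the target."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGU") for _ in target]
    start = d = distance(fold("".join(seq)), target)
    for _ in range(steps):
        if d == 0:
            break
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice("ACGU")
        new_d = distance(fold("".join(seq)), target)
        if new_d <= d:
            d = new_d          # keep neutral or improving mutation
        else:
            seq[pos] = old     # reject worsening mutation
    return "".join(seq), start, d

if __name__ == "__main__":
    print(adaptive_walk("((((....))))", steps=500, seed=1))
```

With a real thermodynamic folding routine in place of `fold`, the same loop structure would apply; the toy model has many degenerate optima, so it illustrates the search scheme rather than realistic designs.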
One Sequence, Two target Structures
Schultes, EA & Bartel, DP; Science (2000), 289:448-452
Design of Artificial Riboswitches
Riboswitches are convenient gadgets in synthetic biology
Task: combine ligand-specific sensor with an effector
(i.e., some form of a regulatory element)
Question: to what extent is this really modular?
Idea: use RNA structure prediction to model the interplay of
sensor and effector
Riboswitches: Regulators of Gene Expression
Transcriptional versus translational riboswitch
Kim & Breaker, Biol. Cell (2008)
Ribo-Switching of Transcription
[Figure: RNA polymerase (RNAP) transcribing from the 5’ end with and without metabolite, with poly-U stretches marked. Panels: Transcriptional ON-Switch; Transcriptional OFF-Switch]
Theophylline Aptamer
Unbound aptamer: model predicted using Rosetta
Theophylline-bound aptamer: crystal structure (PDB-ID 1O15)
Design Idea
[Figure: the construct drawn 5’ to 3’ in two states, labeled "termination" and "transcription"; each shows the aptamer region (AAGUGAUACCAG ... CCC), the poly-U stretch UUUUUUUU, AGGA, and the reporter gene bgaB; theophylline is drawn as a chemical structure]
Goal: a theophylline triggered on-switch
Essence of the Multistable Design Problem
Design a sequence that is compatible with not just one but several target structures
Each target should be almost a ground state
Questions:
When can this be solved?
How can we include ligand specificity?
First step: generate sequences that are compatible with all design goals.
Second step: optimize the sequences toward the design goal(s)
Bi-Stable Structures
Given two structures S1 and S2, are there sequences compatible with both?
Intersection theorem:
$C[S_1] \cap C[S_2] \neq \emptyset$
Proof: The dependency graph decomposes into paths and cycles of even length, e.g. for
(((...)))((...))((...)).
(((.((.(((...))).)).))).
Assigning the alternating sequence AUAUAU... along each path and cycle yields a compatible sequence.
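The decomposition used in the proof can be checked directly. The following Python sketch builds the dependency graph of two dot-bracket structures and classifies its connected components as paths (including isolated vertices) or cycles. The function names are hypothetical and the code is only an illustration of the argument, not a published implementation.

```python
from collections import defaultdict

def pairs(struct):
    """List of base pairs (i, j) in a dot-bracket string."""
    stack, out = [], []
    for j, c in enumerate(struct):
        if c == "(":
            stack.append(j)
        elif c == ")":
            out.append((stack.pop(), j))
    return out

def dependency_graph(s1, s2):
    """Vertices are positions; edges are the base pairs of either structure."""
    adj = defaultdict(set)
    for s in (s1, s2):
        for i, j in pairs(s):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj, n):
    """Connected components, each labeled 'path' or 'cycle'.

    Every vertex has degree <= 2 (at most one partner per structure),
    so a component is a cycle exactly when all its vertices have degree 2.
    """
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, todo = [], [start]
        seen.add(start)
        while todo:
            v = todo.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        is_cycle = len(comp) > 1 and all(len(adj[v]) == 2 for v in comp)
        result.append(("cycle" if is_cycle else "path", sorted(comp)))
    return result

if __name__ == "__main__":
    s1 = "(((...)))((...))((...))."
    s2 = "(((.((.(((...))).)).)))."
    for kind, comp in components(dependency_graph(s1, s2), len(s1)):
        print(kind, comp)
```

Cycles alternate between edges of S1 and edges of S2, which is why their length is always even and an alternating A/U assignment closes up consistently.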
Examples of bistable structures
GUCCUUGCGUGAGGACAGCCCUUAUGUGAGGGC
[Figure: the example sequence above, shown with a tree of 30 numbered states and numeric edge labels]
$\Xi(x) = E(x,\Omega_1) + E(x,\Omega_2) - 2G(x) + \xi\,\bigl(E(x,\Omega_1) - E(x,\Omega_2)\bigr)^2$
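As a small worked example, the objective can be evaluated from three numbers per candidate sequence. The sketch below assumes that E(x, Ω1) and E(x, Ω2) are the energies of x on the two target structures and that G(x) is the ensemble free energy of x; the slide does not define these symbols, so this reading and the function name `bistable_objective` are assumptions.

```python
def bistable_objective(e1, e2, g, xi=1.0):
    """Xi(x) = E1 + E2 - 2G + xi * (E1 - E2)^2.

    e1, e2: energies of the sequence on targets Omega_1 and Omega_2 (assumed meaning)
    g:      ensemble free energy of the sequence (assumed meaning)
    xi:     weight of the penalty for unequal target energies
    """
    return e1 + e2 - 2.0 * g + xi * (e1 - e2) ** 2

if __name__ == "__main__":
    # Both targets equally stable and close to the ensemble free energy: small value.
    print(bistable_objective(-10.0, -10.0, -10.5, xi=0.5))   # 1.0
    # Unequal target energies are penalized by the quadratic term.
    print(bistable_objective(-10.0, -8.0, -10.5, xi=0.5))    # 5.0
```

The first three terms reward sequences whose targets dominate the ensemble, and the quadratic term pushes both targets toward equal energy.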
Multi-Stable Structures
Generalization to multiple targets:
Theorem. There is a sequence satisfying all secondary structure constraints S1, S2, ..., SM if and only if the overlap graph S1 ∪ S2 ∪ ... ∪ SM is bipartite.
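The theorem translates into a simple test: two-color the overlap graph by breadth-first search and map the two colors to complementary bases. The Python sketch below is one way to do this, with A/U chosen as the two colors; the function name `compatible_sequence` is hypothetical.

```python
from collections import deque

def pairs(struct):
    """List of base pairs (i, j) in a dot-bracket string."""
    stack, out = [], []
    for j, c in enumerate(struct):
        if c == "(":
            stack.append(j)
        elif c == ")":
            out.append((stack.pop(), j))
    return out

def compatible_sequence(structures):
    """Return an A/U sequence compatible with all structures,
    or None if the overlap graph is not bipartite."""
    n = len(structures[0])
    adj = [set() for _ in range(n)]
    for s in structures:
        for i, j in pairs(s):
            adj[i].add(j)
            adj[j].add(i)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if color[w] is None:
                    color[w] = 1 - color[v]   # neighbors get the opposite color
                    queue.append(w)
                elif color[w] == color[v]:
                    return None               # odd cycle: no compatible sequence
    return "".join("AU"[c] for c in color)

if __name__ == "__main__":
    # Three structures whose pairs (0,4), (4,8), (0,8) form a triangle.
    print(compatible_sequence(["(...)....", "....(...)", "(.......)"]))  # None
    print(compatible_sequence(["(((...)))((...))((...)).",
                               "(((.((.(((...))).)).)))."]))
```

For two structures the test always succeeds; with three or more, an odd cycle such as the triangle in the usage example rules out any compatible sequence.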
[Figure: plot of log(fraction of bipartite graphs) against number of structures; plot of slope against chain length]
(..)..(....).(....)..
(.....)(...).........
(......)(....).......
(..(....)......).....
(..(......)(...))....
...(.......(....).)..
Solving Multi-Constraint Design Problems
One possibility: constraint programming [Dotu’s work]
Stochastic heuristics
Complex search space: only $C := \bigcap_{i=1}^{M} C(\Omega_i)$ is allowed
How to choose a good (fair) starting position?
Simple for M = 2: constraints are paths and cycles, so simple recursions suffice to sample uniformly from C
Difficult for M > 2: need more complex decompositions of graphs
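For the M = 2 case, the recursion on a single path component can be sketched as follows. This is an illustrative Python example, not code from the cited work: it counts the base assignments along a path in which neighboring vertices must be able to pair (AU, GC, GU), and then samples one assignment uniformly by drawing each base with probability proportional to the number of completions. Cycles need an additional closing condition and are not handled here; all names are hypothetical.

```python
import random

# Bases that can pair with a given base (Watson-Crick plus GU wobble).
PARTNERS = {"A": "U", "C": "G", "G": "CU", "U": "AG"}
BASES = "ACGU"

def count_table(n):
    """tail[k][b]: number of valid assignments of vertices k..n-1
    along the path when vertex k carries base b."""
    tail = [dict.fromkeys(BASES, 1) for _ in range(n)]
    for k in range(n - 2, -1, -1):
        for b in BASES:
            tail[k][b] = sum(tail[k + 1][c] for c in PARTNERS[b])
    return tail

def count_path(n):
    """Number of compatible base assignments on a path with n >= 1 vertices."""
    return sum(count_table(n)[0].values())

def sample_path(n, rng=random):
    """Draw one assignment uniformly at random; bases are listed in path order."""
    tail = count_table(n)
    seq = [rng.choices(BASES, weights=[tail[0][b] for b in BASES])[0]]
    for k in range(1, n):
        options = PARTNERS[seq[-1]]
        seq.append(rng.choices(options, weights=[tail[k][c] for c in options])[0])
    return "".join(seq)

if __name__ == "__main__":
    print([count_path(n) for n in range(1, 6)])
    print(sample_path(6, random.Random(0)))
```

Because each choice is weighted by the number of ways to complete the path, the product of the step probabilities is the same for every valid assignment, which is what makes the starting position fair.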
How to sample uniformly?
Use the overlap graph: it defines the dependence structure of the nucleotides.
Zeroth Step: connected components are sampled independently.
First Step: block decomposition of the overlap graph.
Color every block separately with fixed colors at the cut points.
[Figure: block decomposition of a graph, with blocks labeled G1, G2, G’2, G’’2, G3 and cut points x, y]
Second Step: Color Blocks by Dynamic Programming
Biopolymers 99: 1124-1136 (2013)
Coloring dense blocks: Ear decomposition
[Figure: ear decomposition of a block into ears P0 ... P6, with the corresponding graphs G0 ... G6]
Complement graph with attachment vertices.
Dynamic programming approach to count colorings with given color combinations at the attachment vertices.
Memory is exponential in α, the maximum number of attachment vertices; CPU time is exponential in β, the maximum size of the union of attachment vertices in consecutive steps.
A design for the SV11 switch
CCGCGCUCGGGGUUCCUGGGUUUGUUAUCGCUCAAGUGGGUGGUAGCUCAGAUUAGCGGUUAAAGCUAUUGCUUGUUCGUGCGGUAGUGGGUUCGGGAAGUUUCCGGGCCACGGC
[Figure: secondary structure drawing and dot plot (dot.ps) of the designed sequence; tree of 25 numbered states with an energy axis from -32.0 to -44.0; plot of population probability against time (arbitrary units, 1 to 1e+08)]
Back to the Theophylline Switch
[Figure: the construct drawn 5’ to 3’ in two states, labeled "termination" and "transcription"; each shows the aptamer region (AAGUGAUACCAG ... CCC), the poly-U stretch UUUUUUUU, AGGA, and the reporter gene bgaB; theophylline is drawn as a chemical structure]
Goal: a theophylline triggered on-switch
Designed Theophylline Switches
Transcribed from arabinose promoter of plasmid pBAD, using β-galactosidase as reporter gene.
Construct Expression
Segments (5’ to 3’): sensor (aptamer), spacer, 3’-part terminator, poly(U)
RS1:  aptamer-UUACAUC------------UGAAGUGCUGCC--UUUUUUUU
RS2:  aptamer-UGAUCUCGCU---------UGAAGUGCUGC---UUUUUUUU
RS3:  aptamer-UUUACAUACUCGGUAAAC-UGAAGUGCUGCCA-UUUUUUUU
RS4:  aptamer-AACCGAAAUUUGCGCU---UGAAGUGCUGC---UUUUUUUU
RS8:  aptamer-CUCCUAGUGGAG-------UGAAGUGCUG----UUUUUUUU
RS10: aptamer-GAAAUCUC-----------UGAAGUGCUG----UUUUUUUU
[Figure: bar chart of activity [MU] for RS1, RS2, RS3, RS4, RS8 and RS10 at 0 mM theophylline, 2 mM theophylline, and 2 mM caffeine]
Transcriptional Switching
[Figure: blot with lanes C, T10 (−/+) and RS10 (−/+); legend: 0 mM Theophylline, 2 mM Theophylline; bands: bgaB, 23S rRNA; bar chart for RS10]
Northern blot of RS10 and terminator T10
Optimizing the Switch
[Figure: secondary structure drawings of the constructs RS10, RS10loop, RS10shift and RS10ΔT, and of bgaB, bgaBU8 and bgaBshift; labels include the aptamer region AAGUGAUACCAG ... CCCUUG, a 19 nt spacer, the poly-U stretch, and AGGA upstream of bgaB]
Optimizing the Switch
[Figure: bar chart of activity [MU] at 0 mM theophylline, 2 mM theophylline, and 2 mM caffeine for bgaB, bgaBU8, bgaBshift, RS10, RS10loop, RS10shift and RS10ΔT]
Activities of optimized constructs
Working in Tandem ...
... works even better
A more principled way to include ligand binding
Known/measured binding energy −ε of the ligand to a particular structural motif Ψ necessary for binding.
Without ligand: we want structural feature Ω
With ligand: we want structural feature Ψ
Compute the partition function Z[Ψ] over all structures with feature Ψ.
The partition function over structures without feature Ψ is
$Z[\neg\Psi] := Z - Z[\Psi]$.
Binding distorts the ensemble of structures when the ligand is present:
$Z_L = Z[\neg\Psi] + Z[\Psi]\exp(-\varepsilon)$
Objectives
Without ligand: $p_0(\Omega) := Z[\Omega]/Z \to \max$, and $p_0(\Psi)$ should be small.
With ligand: $p_L(\Psi) := Z[\Psi]\exp(-\varepsilon)/Z_L \to \max$, and $p_L(\Omega)$ should be small.
... easy if Ψ and Ω are mutually exclusive; otherwise we also need the partition function $Z[\Omega \wedge \Psi]$.
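The four quantities in the objectives follow from three partition functions and the weight of the bound motif. The Python sketch below assumes that Ψ and Ω are mutually exclusive (the easy case named above) and takes the Boltzmann factor written exp(−ε) on the slide as a single parameter `w`, so no sign or unit convention for ε is implied. The function name and the numbers in the usage example are hypothetical.

```python
def switch_objectives(z, z_psi, z_omega, w):
    """Probabilities of the features Omega and Psi without and with ligand.

    z:       full partition function Z
    z_psi:   partition function Z[Psi] over structures with the binding motif
    z_omega: partition function Z[Omega] over structures with feature Omega
    w:       Boltzmann factor of ligand binding (exp(-eps) in the slide's notation)

    Assumes Omega and Psi are mutually exclusive; otherwise Z[Omega and Psi]
    would be needed for pL_omega.
    """
    z_not_psi = z - z_psi                 # structures without the motif
    z_l = z_not_psi + z_psi * w           # ensemble in the presence of ligand
    return {
        "p0_omega": z_omega / z,
        "p0_psi": z_psi / z,
        "pL_psi": z_psi * w / z_l,
        "pL_omega": z_omega / z_l,
    }

if __name__ == "__main__":
    # Made-up numbers: Omega dominates without ligand, Psi with ligand.
    for name, value in switch_objectives(10.0, 1.0, 6.0, 20.0).items():
        print(f"{name}: {value:.3f}")
```

A design procedure would then try to maximize `p0_omega` and `pL_psi` together while keeping `p0_psi` and `pL_omega` small.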
Using Constraints ...
Modified folding algorithms that score certain structures differently.
Z{ψ; e} scores a (set of) pattern(s) ψ (in practice, e.g., the loop forming the binding pocket) with bonus energies e.
Key relationship:
$\frac{[\mathrm{RNA}\cdot L]}{[\mathrm{RNA}]\,[L]} = K = \frac{Z\{\psi; e\}}{Z\, z_L}$
Set $z_L = 1$ for a small molecular ligand and gauge the binding energies accordingly.
$Z\{\psi; e\} \approx Z[\neg\Psi] + Z[\Psi]\exp(-\varepsilon)$
In the more general model with (soft) constraints we may include a more elaborate parametrization that includes, e.g., a set of variant binding site structures ...
Details of the theory (and implementations) are still being developed ...
see Ronny’s talk
Designing an Integrated XOR Switch
Idea: Combination of two aptamers interacting non-linearly
Designing an Integrated XOR Switch
4 computational designs for the cobalamin and theophylline aptamers
Initial experimental tests
... are promising ... but a lot of optimization will still be necessary
Acknowledgements
Funding: DFG
Collaborators:
Ronny Lorenz
Mörl Lab (Leipzig):
Manja Wachsmuth
Nadine Weissheimer
Hofacker Lab (Vienna):
Sven Findeiß
Christoph Flamm
Christian Höner zu Siederdissen