Workshop in Computational Structural Biology - CS

4. Modeling of side chains
Side chain modeling is part of
structure prediction
Protein Structure Prediction:
– given: sequence of protein
– predict: structure of protein
Challenges:
– conformation space
• goal: describe continuous, immense space of conformations in an efficient and
representative way
– realistic energy function
• goal: energy minimum at or near experimentally derived structure (native)
– efficient and reliable search algorithm
• goal: locate minimum (global minimum energy conformation, GMEC)
Prediction of side chain conformations:
– subtask of protein structure prediction
The importance of side chain
modeling
Side chain prediction
subtask of protein structure prediction
• given: correct backbone conformation
• predict: side chain conformations (i.e. whole protein)
• successful prediction of protein structure depends on
successful prediction of the side chain conformations
• complete details not solved by experiment
• allows evaluation of protocol at detailed, full-atom level
• allows flexibility in docking
Today’s menu
Prediction of side chain conformations
1. rotamer libraries
2. dependence on backbone accuracy
3. approaches that locate GMEC or MECs
Rosetta & other approaches
DEE - dead-end elimination, SCWRL, BP - belief
propagation, LP - linear (integer) programming
Side chains are described as rotamers
Dihedral angles χ1-χ4 define the side chain conformation
(assuming equilibrium bond and angle values)
[Figure: side chain dihedral angles, from Wikipedia]
Side chains assume discrete
conformations
Staggered conformations
minimize collision with
neighboring atoms
Serine χ1 preferences (Lovell, 2000):
t = 180°
g+ = +60°
g- = -60°
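The three staggered wells can be turned into a simple classifier. The following is a minimal sketch, assuming each well spans 120° around its centre and using the slide's naming (t = 180°, g+ = +60°, g- = -60°); the function names `wrap_angle` and `chi_bin` are hypothetical.

```python
def wrap_angle(deg: float) -> float:
    """Map any angle in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def chi_bin(chi_deg: float) -> str:
    """Assign a chi angle to the nearest staggered well (slide convention)."""
    chi = wrap_angle(chi_deg)
    if 0.0 <= chi < 120.0:
        return "g+"   # well centred at +60 degrees
    if -120.0 <= chi < 0.0:
        return "g-"   # well centred at -60 degrees
    return "t"        # well centred at 180 degrees


print([chi_bin(x) for x in (62.0, -65.0, 178.0, -175.0, 300.0)])
# → ['g+', 'g-', 't', 't', 'g-']
```

Note that other sources name g+ and g- the other way round, so the convention should be checked before comparing libraries.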
Rotamer libraries contain
preferred conformations
Rotamer: discrete side chain
conformation defined by χ1-χ4
[Table of rotamer libraries (Dunbrack, 2002), extended with the backbone-dependent (BBDEP) library of Shapovalov & Dunbrack, Structure 2011; surviving table entries for that row: 3854 and 1.8]
Representative rotamer libraries are
surprisingly small
Ponder & Richards, 1987:
Analysis of ~20 proteins
(~2000 side chains)
67 rotamers can adequately
represent side chain
conformations (for 17 of the 20 amino acids)
Backbone dependent
rotamer libraries
Dunbrack & Karplus, 1993:
• For each φ-ψ bin (20° × 20°), derive statistics
on χ1+χ2 values
• Reflects dependence of side chain
conformation on backbone conformation
[Figure: φ-ψ map; axes φ and ψ]
Rotamer preferences depend on backbone
conformation: example Valine
Observed frequency of gauche+, gauche- and trans
is very different in different backbone conformations:
sheet, helix, and coil regions
(n=850 proteins, <1.7 Å resolution,
and pair-wise seqid < 50%)
[Figure: frequencies of gauche+, gauche- and trans; panels: Sheet, Helix, Coil]
Bayesian statistical analysis of
rotamer library Dunbrack 1997
Estimate populations
• for all rotamers,
• of all side chain types,
• for each φ-ψ bin (10° × 10°)
$P(\chi_1 \mid \phi, \psi)$
$P(\chi_2, \chi_3, \chi_4 \mid \chi_1, \phi, \psi)$
using Bayesian formalism
Rotamer energy (Edun):
a knowledge-based score
Boas & Harbury, 2007
1. Calculate p_obs: frequencies of
rotamers (or any other
feature)
2. Convert into effective
potential energy using
Boltzmann equation
$\Delta G = -RT \ln(p_{obs}/p_{exp})$
(p_exp: frequency expected in a reference state)
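The conversion from frequencies to energies can be sketched in a few lines of Python. This is only an illustration: the rotamer frequencies are made up, the uniform reference state for p_exp is an assumption, and the function name `knowledge_based_energy` is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def knowledge_based_energy(p_obs: float, p_exp: float, temperature: float = 298.0) -> float:
    """Return dG = -RT ln(p_obs / p_exp) in kcal/mol."""
    if p_obs <= 0.0 or p_exp <= 0.0:
        raise ValueError("probabilities must be positive")
    return -R_KCAL * temperature * math.log(p_obs / p_exp)


# Hypothetical rotamer frequencies for one residue type (not real library values).
observed = {"g+": 0.10, "t": 0.30, "g-": 0.60}
expected = 1.0 / len(observed)  # assumption: uniform reference state

for name, p in observed.items():
    print(f"{name}: {knowledge_based_energy(p, expected):+.2f} kcal/mol")
```

Rotamers seen more often than expected get a negative (favourable) energy, rare ones a positive energy.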
Rotamer energy Edun
• Calculate rotamer preference for given φ-ψ bin:
1. For each rotamer r of an amino acid: determine a probability density estimate ρ(φ,ψ|r)
(= Ramachandran distribution for each rotamer)
2. Use Bayes' rule to invert this density to produce an estimate of the rotamer
probability P(r | φ,ψ)
P(r): backbone-independent
probability of rotamer r
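Written out, the inversion in step 2 is Bayes' rule applied to the per-rotamer Ramachandran densities. The symbols follow the slide (ρ for the density, P(r) for the backbone-independent prior); the sum runs over all rotamers r' of the same amino acid.

```latex
P(r \mid \phi, \psi) =
  \frac{\rho(\phi, \psi \mid r)\, P(r)}
       {\sum_{r'} \rho(\phi, \psi \mid r')\, P(r')}
```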
Bayesian statistical analysis of
rotamer library Dunbrack 1997
Combine
• prior distribution based on P(φ)*P(ψ)
• fully φ,ψ-dependent data
… to describe both
• well-sampled regions
• sparsely sampled regions
Rotamer energy Edun
Prior distributions:
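$P(\chi_1 \mid \phi, \psi) = P(\chi_1 \mid \phi)\, P(\chi_1 \mid \psi)$
$P(\chi_2, \chi_3, \chi_4 \mid \chi_1) = P(\chi_2 \mid \chi_1)\, P(\chi_3 \mid \chi_2)\, P(\chi_4 \mid \chi_3)$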
Structure determination revisited
Refit electron density maps
15% of non-rotameric side chains can be refitted to 1 (or 2) rotameric conformations
(Shapovalov & Dunbrack, 2007)
Structure determination revisited
Refit electron density maps
Rotameric side chains
have lower entropy
(dispersion of electron
density around χ)
than side chains with
multiple conformations in the
PDB, or non-rotameric
side chains
[Figure: χ1 entropy by residue type]
(Shapovalov & Dunbrack, 2007)
2011: Improved
Dunbrack library
Many good reasons:
1. More structural data
2. Improved set: Electron density calculations - remove
highly dynamic side chains
3. Derive accurate and smooth density estimates of
rotamer populations (incl. rare rotamers) as continuous
function of backbone dihedral angles
4. Derive smooth estimates of the mean values and
variances of rotameric side-chain dihedral angles
5. Improve treatment of non-rotameric degrees of
freedom
Shapovalov & Dunbrack, 2011
Smoother density function
$P(r = g^{+} \mid \phi, \psi, \mathrm{aa} = \mathrm{Ser})$
• Original probability density: histogram
• Using adaptive density kernels
(integrate over neighborhood of adaptive size)
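The idea of replacing histogram bins with a kernel-weighted neighborhood can be sketched as follows. This is a simplified illustration, not the published procedure: it uses a fixed-width von Mises kernel (parameter `kappa`) instead of adaptive kernel sizes, and the sample data and function names are hypothetical.

```python
import math


def von_mises_kernel(delta_deg: float, kappa: float) -> float:
    """Unnormalised von Mises weight for an angular difference in degrees."""
    return math.exp(kappa * (math.cos(math.radians(delta_deg)) - 1.0))


def smoothed_rotamer_probability(samples, phi, psi, rotamer, kappa=20.0):
    """Kernel-weighted estimate of P(rotamer | phi, psi).

    samples: iterable of (phi_deg, psi_deg, rotamer_label) observations.
    """
    weight_all = 0.0
    weight_rot = 0.0
    for s_phi, s_psi, s_rot in samples:
        # Periodic kernel: the cosine handles the wrap-around at +/-180 degrees.
        w = von_mises_kernel(phi - s_phi, kappa) * von_mises_kernel(psi - s_psi, kappa)
        weight_all += w
        if s_rot == rotamer:
            weight_rot += w
    if weight_all == 0.0:
        raise ValueError("no data available")
    return weight_rot / weight_all


# Hypothetical observations: (phi, psi, chi1 rotamer).
samples = [(-60, -45, "g-"), (-65, -40, "g-"), (-58, -47, "t"),
           (-120, 130, "g+"), (-125, 135, "g+"), (-118, 128, "t")]

print(smoothed_rotamer_probability(samples, -60, -45, "g-"))
```

An adaptive variant would lower `kappa` (widen the kernel) where few observations are available, so that sparsely sampled backbone regions still receive a smooth estimate.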
Better description of
non-rotameric side chains
Example: Gln χ3 angles for (χ1=g+; χ2=t)
[Figure: Gln χ3 distributions in alpha helix, beta sheet and loops (polyP II), original library vs. new library; inset labels: Met χ1 (sp3), Gln χ3 (sp2)]
… leads to slight improvement
in modeling
Some conclusions about rotamer
libraries
Rotamer frequency:
• rare conformations reflect increased internal strain – important to
take frequency into account
• frequency can be used as energy term: $E_i = -K \ln P_i$
Increasing availability of high-resolution structures
• narrows distribution around rotamer in library
• indicates that errors are responsible for outliers
Refitting of electron density maps
• non-rotameric conformations are often incorrectly modeled and high in
entropy
Some conclusions about rotamer
libraries
Rotamericity <100%:
• Include more side chain conformations!
– Position-dependent rotamers (example: unbound
conformations in docking predictions)
– Additional conformations around rotamer (± sd)
– Non-rotameric side chain angles: describe as continuous
density function
Backrub Motions: “How protein backbone
shrugs when side chain dances”
• Most common local backbone move in ultra-high resolution
structures (<1.0 Å)
• Changes side chain orientation without effect on backbone
• 3 rotations around Cα-Cα axes
• In 3% of all residues (1/4 of these are Serine)
[Figure (Davis, 2006): two distinct rotamers related by backrub moves for Ile (tt, mm); change of rotation angle 1,3 with compensatory changes of rotation angles 1,2 and 2,3]
Prediction of side chain
conformations using rotamers
• Given:
– protein backbone
– for each residue: set of possible
conformations (rotamers from library)
• Wanted:
Combination of rotamers that results in
lowest total energy
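$E_{GMEC} = \min \left( \sum_i E_{i_r} + \sum_{i<j} E_{i_r j_s} \right)$
where $E_{i_r}$ is the self energy of rotamer r at position i and $E_{i_r j_s}$ is the pair energy between rotamer r at position i and rotamer s at position j
[Figure: neighboring positions i, i+1, i+2 with their candidate rotamers]
Locating the GMEC is NP-hard
(Fraenkel, 1997; Pierce, 2002)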
Side chain modeling = find best
combination of rotamers
How?
1. systematic scan
• for a protein with
– 50 residues, and
– 9 rotamers/residue
number of combinations to scan:
N = 9^50 ≈ 5 × 10^47 !
→ feasible only for small proteins
→ search space needs to be reduced
[Figure: matrix of pair energies e(ia,ja), e(ib,ja), e(ia,jb), e(ib,jb), … between rotamers ia, ib, ic, … at position i and rotamers ja, jb, … at positions i+1, i+2, …]
$E_{tot} = \sum_i E_i + \sum_{i<j} E_{ij}$
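A systematic scan is easy to write down, which makes its cost concrete. The sketch below assumes hypothetical self and pair energies for a toy system of 3 positions with 2 rotamers each; the data layout (a list of lists for self energies, a dictionary keyed by `(i, r, j, s)` with i < j for pair energies) is an illustrative choice.

```python
import itertools
import math

# Size of the search space quoted above: 9 rotamers at each of 50 positions.
print(f"9^50 is about 10^{50 * math.log10(9):.1f}")


def total_energy(assignment, self_e, pair_e):
    """E_tot = sum_i E(i_r) + sum_{i<j} E(i_r, j_s) for one rotamer assignment."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for i, j in itertools.combinations(range(len(assignment)), 2):
        energy += pair_e.get((i, assignment[i], j, assignment[j]), 0.0)
    return energy


def brute_force_gmec(self_e, pair_e):
    """Enumerate every combination; feasible only for tiny systems."""
    choices = [range(len(rots)) for rots in self_e]
    return min(itertools.product(*choices),
               key=lambda a: total_energy(a, self_e, pair_e))


# Hypothetical energies: 3 positions with 2 rotamers each.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {(0, 0, 1, 1): 5.0,    # clash: rotamer 0 at position 0 vs rotamer 1 at position 1
          (1, 0, 2, 0): -1.0}   # favourable contact
best = brute_force_gmec(self_e, pair_e)
print(best, total_energy(best, self_e, pair_e))
```

The toy system has only 2^3 = 8 combinations; the same loop over 9^50 combinations would never finish, which is why the search space has to be reduced.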
Search strategies for locating
GMEC or MECs
Deterministic Approaches (e.g. DEE):
– Guarantee location of GMEC
– Can be slow
– Advantageous when GMEC is (the only) near-native conformation
Heuristic Approaches (e.g. MC):
– Locate population of low-energy models (not
necessarily GMEC)
– Faster, often converge
Guaranteed finding of GMEC
DEE (Dead-end elimination)
– prune impossible rotamers, determine GMEC from reduced rotamer
set
Residue-interacting graphs (SCWRL)
– use dynamic programming on graph to find GMEC
– start with “leaves”: residues with low connectivity in graph
Linear Programming (Kingsford)
– solve set of linear constraints
– can locate GMEC for sparsely connected graphs
– poses constraints on energy function
Dead End Elimination (DEE)
• Approach: remove rotamers that cannot be part of
the GMEC
Rotamer r at position i can be eliminated if there exists a
rotamer t such that:
[Figure: energy E of rotamers r and t across combinations of rotamers at positions j≠i]
• Iterative application of DEE removes many rotamers; at
certain positions only one rotamer is left
• (Note that some rotamers can be removed from the beginning because they clash
with the backbone: self energy too high)
Desmet & Lasters, 1992
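The original criterion (Desmet & Lasters, 1992) eliminates rotamer r when even its best-case energy is worse than the worst-case energy of some competitor t at the same position: E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). The following is a minimal Python sketch of its iterative application; the energies are hypothetical and the data layout and function names are illustrative assumptions.

```python
def pair(pair_e, i, r, j, s):
    """Symmetric lookup of a pair energy; missing entries count as 0."""
    if i > j:
        i, r, j, s = j, s, i, r
    return pair_e.get((i, r, j, s), 0.0)


def dee_original(self_e, pair_e):
    """Iteratively apply the original DEE criterion; return surviving rotamers."""
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            others = [j for j in range(len(alive)) if j != i]
            for r in list(alive[i]):
                # Best case for r: every other position picks what suits r most.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    # Worst case for competitor t.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)   # r is a dead end
                        changed = True
                        break
    return alive


# Hypothetical energies: 3 positions with 2 rotamers each.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {(0, 0, 1, 1): 5.0, (1, 0, 2, 0): -1.0}
survivors = dee_original(self_e, pair_e)
print(survivors)
# → [{0}, {0}, {0}]
```

In this toy case three passes are needed: a rotamer removed at one position tightens the min/max bounds at the others, which is why the criterion is applied iteratively.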
Refined DEE
• Approach: remove rotamers that cannot be part of
the GMEC, second criterion:
Rotamer r at position i can be eliminated if there exists a
rotamer t such that:
[Figure: energy E of rotamers r and t across combinations of rotamers at positions j≠i]
• This criterion allows removal of additional rotamers
Goldstein, 1994
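For reference, the Goldstein criterion is usually written as below: r is eliminated if switching from r to t lowers the energy for every choice of rotamers s at the other positions. Here $E_{i_r}$ denotes the self energy of rotamer r at position i and $E_{i_r j_s}$ the pair energy with rotamer s at position j.

```latex
E_{i_r} - E_{i_t} + \sum_{j \neq i} \min_{s} \left[ E_{i_r j_s} - E_{i_t j_s} \right] > 0
```

Because r and t are compared against the same neighbor rotamer s, this test is less conservative than comparing the best case of r with the worst case of t, and therefore removes more rotamers.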
More sophisticated DEE criteria….
• Approach: remove rotamers that cannot be part of
the GMEC - additional criterion:
Rotamer r at position i can be eliminated if there exist rotamers
t1 and t2 such that either t1 or t2 is better than r for
any combination of rotamers at positions j≠i
[Figure: energy E of rotamers r, t1 and t2 across combinations of rotamers at positions j≠i]
• takes more time to compute
At the end, we are left with 1 combination, or with a
few combinations only, that need to be evaluated using
other criteria
DEE-based approaches
• DEE guarantees to find GMEC…
• … but may miss conformations that have only slightly
worse energy
• Given that the energy function is not perfect, we
want to find also additional conformations with
comparable energy
• Approach used in Orbit: use MC to find additional
low-energy combinations that resemble GMEC (we will
talk about this when we discuss protein design)*
Dahiyat & Mayo (1997)
SCWRL - residue-interacting graphs
• After DEE, residues left with > 1
rotamer are the “active residues”
• undirected graph of active residues:
– side chains = vertices
– interacting rotamer pairs: connected by
edge
• identify
– articulation points (break cluster apart) &
– bi-connected components (cannot be
broken into different parts by removing
one node)
Canutescu, 2003
Very simple energy function: only
Dunbrack energy and repulsion
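Articulation points of the residue-interaction graph can be found with a depth-first search. The sketch below is a generic textbook implementation in plain Python, not SCWRL's own code; the example graph of six residues is hypothetical.

```python
def articulation_points(graph):
    """Return the set of articulation points of an undirected graph.

    graph: dict mapping each vertex to the set of its neighbours.
    """
    depth, low, cut = {}, {}, set()

    def dfs(v, parent, d):
        depth[v] = low[v] = d
        children = 0
        for w in graph[v]:
            if w == parent:
                continue
            if w in depth:                       # already visited: back edge
                low[v] = min(low[v], depth[w])
            else:                                # tree edge
                children += 1
                dfs(w, v, d + 1)
                low[v] = min(low[v], low[w])
                # The subtree under w cannot reach above v: v separates it.
                if parent is not None and low[w] >= depth[v]:
                    cut.add(v)
        if parent is None and children > 1:      # root with several subtrees
            cut.add(v)

    for v in graph:
        if v not in depth:
            dfs(v, None, 0)
    return cut


# Hypothetical interaction graph: two triangles (A-B-C and D-E-F) joined by edge C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(sorted(articulation_points(graph)))
# → ['C', 'D']
```

Removing C or D splits the cluster, so each triangle is a bi-connected component that can be solved on its own for every rotamer of the shared residue.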
SCWRL - residue-interacting graphs
Solve a cluster using bi-connected
components
• For each, calculate best energy
given specific rotamer in bi-connected residue
• Pruning is easy since the energy
function is only positive [Backtracking:
once a certain energy threshold is exceeded, a
specific rotamer (combination) can
be discarded]
Canutescu, 2003
Heuristic approaches
• Define cutoff values to prune branches that
probably do not contain low-energy
conformations
• Mean-field approach, Belief Propagation
• Self-consistent algorithms
• Monte-Carlo sampling
Side chain modeling in Rosetta:
part of a cycle
• rigid body optimization
• backbone optimization
[Figure: cycle from START to FINISH: random perturbation → side chain optimization → rigid body minimization → MC acceptance, repeated; plot of energy vs. rigid body orientations]
Side chain modeling protocols in Rosetta
• Monte-Carlo procedure:
• heuristic
• does not converge – several runs needed to locate solution
• Use Dunbrack bb-dependent rotamer library
Approaches:
1. “Repacking” – model side chain conformation from
scratch
2. “Rotamer Trial” – refine side chain conformations
3. (“Rotamer Trial with minimization” (RTmin) –
off-rotamer sampling by minimization)
Monte Carlo sampling
• Pre-calculate the $E_{i_r}$ and $E_{i_r j_t}$ matrices
• Self energy: energy between rotamer r at position i and the
constant part
• Pairwise energy: between rotamer r at position i and rotamer
t at position j (sparse matrix)
$E_{total} = \sum_i E_{i_r} + \sum_i \sum_{j>i} E_{i_r j_t}$
• Simulated annealing
• make random change
• start at high temperature (high acceptance rate), gradually lower temperature
• acceptance based on Boltzmann
distribution
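The annealing loop over precalculated energy tables can be sketched as follows. This is a minimal illustration, not Rosetta's packer: the energies are hypothetical, the geometric cooling schedule and all parameter values are assumptions, and the full energy is recomputed at every step for simplicity (a real implementation would update only the terms that change).

```python
import math
import random


def energy(assign, self_e, pair_e):
    """E_total = sum_i E(i_r) + sum_{i<j} E(i_r, j_t) from precalculated tables."""
    e = sum(self_e[i][r] for i, r in enumerate(assign))
    for (i, r, j, t), value in pair_e.items():
        if assign[i] == r and assign[j] == t:
            e += value
    return e


def anneal(self_e, pair_e, steps=2000, t_start=5.0, t_end=0.1, seed=0):
    """Simulated annealing over rotamer assignments; returns (best, best energy)."""
    rng = random.Random(seed)
    current = [rng.randrange(len(rots)) for rots in self_e]
    e_cur = energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Random change: new rotamer at one random position.
        i = rng.randrange(len(self_e))
        trial = list(current)
        trial[i] = rng.randrange(len(self_e[i]))
        e_trial = energy(trial, self_e, pair_e)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / temp):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return tuple(best), e_best


# Hypothetical energies: 3 positions with 2 rotamers each, keys (i, r, j, t) with i < j.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {(0, 0, 1, 1): 5.0, (1, 0, 2, 0): -1.0}
best, e_best = anneal(self_e, pair_e)
print(best, round(e_best, 3))
```

On realistic problems a single run is not guaranteed to reach the GMEC, so several runs with different seeds are typically compared.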
“Repacking”: full combinatorial side
chain optimization
• remove all side chains
• gradually add side chains: select from
backbone-dependent rotamer library
add position-specific rotamers (e.g. from unbound conformation): set
their energy to minimum rotamer energy, to ensure acceptance
• use simulated annealing to create increasingly
well packed side chains
• repeat to sample range of low-energy
conformations
“Rotamer trial”: side chain
adjustment
• Find better rotamers for existing structure
• pick residue at random
• search for rotamer with lower energy
• replace rotamer
• Repeated until all high-energy positions
are improved
• Fast
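The steps above can be sketched as a greedy, one-position-at-a-time refinement. This is an illustration of the idea under simple assumptions (hypothetical energies, every position visited in random order until no single substitution helps), not the actual Rosetta routine; the function names are hypothetical.

```python
import random


def site_energy(i, r, assign, self_e, pair_e):
    """Self energy of rotamer r at position i plus its pair energies with the rest."""
    e = self_e[i][r]
    for j, s in enumerate(assign):
        if j != i:
            key = (i, r, j, s) if i < j else (j, s, i, r)
            e += pair_e.get(key, 0.0)
    return e


def rotamer_trials(assign, self_e, pair_e, seed=0):
    """Replace rotamers one position at a time until no substitution lowers the energy."""
    rng = random.Random(seed)
    assign = list(assign)
    improved = True
    while improved:
        improved = False
        order = list(range(len(assign)))
        rng.shuffle(order)                       # visit positions in random order
        for i in order:
            current_e = site_energy(i, assign[i], assign, self_e, pair_e)
            best_r = min(range(len(self_e[i])),
                         key=lambda r: site_energy(i, r, assign, self_e, pair_e))
            if site_energy(i, best_r, assign, self_e, pair_e) < current_e - 1e-12:
                assign[i] = best_r               # keep the better rotamer
                improved = True
    return tuple(assign)


# Hypothetical energies: 3 positions with 2 rotamers each, keys (i, r, j, s) with i < j.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {(0, 0, 1, 1): 5.0, (1, 0, 2, 0): -1.0}
print(rotamer_trials((1, 1, 1), self_e, pair_e))
# → (0, 0, 0)
```

Each accepted substitution lowers the total energy, so the loop terminates quickly, but in general it stops at a local minimum that depends on the starting structure.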
Side chain modeling: Summary
• Side chain modeling based on rotamer libraries →
combinatorial problem
• Approaches for side chain modeling involve smart
reduction of combinatorial complexity (heuristic or
exact)
• Side chain modeling as a “toy model” for structural
modeling
• Side chain modeling can be extended to Design by
adding rotamer options of different amino acids