Information Encoding in Biological Molecules

Lab 10.2:
Homology Modeling Lab
Boris Steipe
[email protected]
http://biochemistry.utoronto.ca/steipe
Departments of Biochemistry and Molecular and Medical Genetics
Program in Proteomics and Bioinformatics
University of Toronto
Concepts
1. Sequence alignment is the single most important step in homology modeling.
2. Reasons to model need to be defined.
3. Fully automated homology modeling services perform well.
4. SwissModel in practice.
Concept 1:
Sequence alignment is the single most important step in homology modeling.
Superposition vs Alignment
• In a superposition, the coordinates of two proteins are “superimposed” in space.
• An alignment may be derived from a superposition by pairing spatially corresponding alpha-carbons.
• The alignment derived from a superposition may differ from an optimized symbolic (sequence) alignment...
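One way to read an alignment off a superposition is to pair alpha-carbons that end up close together in space. The following is a minimal sketch, not a published method: the mutual-nearest-neighbour rule, the 3.0 Å cutoff, and the toy coordinates are all assumptions made for illustration.

```python
import math

def pairs_from_superposition(ca_a, ca_b, cutoff=3.0):
    """Pair alpha-carbons of two superposed chains.

    A pair (i, j) is kept only if atom j is the nearest atom to i,
    atom i is the nearest atom to j, and they are within `cutoff` (in Å).
    """
    def nearest(point, coords):
        return min(range(len(coords)), key=lambda k: math.dist(point, coords[k]))

    pairs = []
    for i, p in enumerate(ca_a):
        j = nearest(p, ca_b)
        if nearest(ca_b[j], ca_a) == i and math.dist(p, ca_b[j]) <= cutoff:
            pairs.append((i, j))
    return pairs

# Toy coordinates (hypothetical): chain B has one extra residue (index 2)
# that bulges away from chain A and therefore stays unpaired.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
ca_b = [(0.1, 0.0, 0.0), (3.7, 0.2, 0.0), (5.7, 3.3, 0.0),
        (7.7, 0.1, 0.0), (11.3, 0.0, 0.0)]

pairs = pairs_from_superposition(ca_a, ca_b)
print(pairs)
```

The unpaired residue in chain B corresponds to a gap in the structure-derived alignment; a sequence alignment program has no access to this geometric information and may place the gap elsewhere.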
Insert of 4 residues
• Optimal sequence alignment
  gktlit-----nfsqehip
  gktlisflyeqnfsqehip
• Optimal structure alignment (blue=helix)
  gktlitnfsq-----ehip
  gktlisflyeqnfsqehip
Off by 1, Off by 4
(Figure label: 3.8 Å)
• A shift of 1 residue in the alignment corresponds to a skew of about 4 Å in the modeled structure (3.8 Å is the distance between consecutive alpha-carbons).
• Nothing you can do AFTER an alignment will fix this error (not even molecular dynamics).
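The size of a register error can be illustrated numerically. The sketch below assumes an idealized planar zig-zag alpha-carbon trace with 3.8 Å between consecutive atoms (the 120 degree angle is an arbitrary choice) and measures how far each residue is displaced when it is built on the template position one step along the chain.

```python
import math

CA_CA = 3.8  # Å, distance between consecutive alpha-carbons

def zigzag_trace(n, bond=CA_CA, angle_deg=120.0):
    """Build a planar zig-zag alpha-carbon trace with a fixed bond length."""
    half = math.radians(180.0 - angle_deg) / 2.0
    coords, x, y = [], 0.0, 0.0
    for i in range(n):
        coords.append((x, y, 0.0))
        x += bond * math.cos(half)
        y += bond * math.sin(half) * (1 if i % 2 == 0 else -1)
    return coords

def register_shift_error(trace, shift):
    """Mean displacement when residue i is placed on template position i + shift."""
    dists = [math.dist(trace[i], trace[i + shift])
             for i in range(len(trace) - shift)]
    return sum(dists) / len(dists)

trace = zigzag_trace(30)
print(round(register_shift_error(trace, 1), 2))  # 3.8 by construction
```

Because every residue downstream of the error is displaced by a full residue spacing, a local refinement method that moves atoms by fractions of an ångström cannot be expected to recover the correct register.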
Alignment is the limiting step for homology model accuracy
No amount of forcefield minimization will put a misaligned residue in the right place!
HOMSTRAD @ CASP4: Williams MG et al. (2001) Proteins Suppl. 5: 92-97
Indels (inserts or deletions)
• Comparisons of structures known to be similar show that assuming a uniform gap penalty is NOT BIOLOGICAL.
• Indels are most often observed in loops, less often in secondary structure elements.
• When they do occur outside loops, the helical or strand properties are usually maintained.
Can we do better with the gap assumption?
• Required: position-specific gap penalties
• One approach: implemented in Clustal as secondary structure masks
• Get secondary structure information and convert it to Clustal mask format. (Easy: read the documentation!)
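The idea behind position-specific gap penalties can be sketched independently of any particular program. The example below is an illustration only: the one-letter secondary structure code (H, E, -), the base penalty, and the multiplication factor are assumptions, and the output is not in Clustal's mask format.

```python
def gap_open_penalties(sec_struct, base=10.0, factor=4.0):
    """Return one gap-opening penalty per alignment position.

    sec_struct: string with 'H' (helix), 'E' (strand), '-' (loop or other).
    base, factor: illustrative values, not defaults of any real program.
    """
    return [base * factor if state in ("H", "E") else base
            for state in sec_struct]

ss = "--HHHHHH---EEEE--"
for state, penalty in zip(ss, gap_open_penalties(ss)):
    print(state, penalty)
```

Raising the penalty inside helices and strands steers gaps toward loops, which is where indels are most often observed.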
Secondary structure from PDB ... (Algorithm?)
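Whatever algorithm produced them, helix annotations in a PDB file are stored in fixed-column HELIX records (chain identifier in column 20, first residue number in columns 22-25, last residue number in columns 34-37). The sketch below reads those fields; the helper that builds sample records and the residue names in it are placeholders for illustration, not real data.

```python
def make_helix_record(serial, helix_id, chain, start, end):
    """Build a minimal HELIX record for testing (residue names are placeholders)."""
    return "HELIX  {:>3} {:>3} ALA {} {:>4}  GLY {} {:>4}".format(
        serial, helix_id, chain, start, chain, end)

def parse_helix_records(lines):
    """Extract (chain, first_residue, last_residue) from PDB HELIX records."""
    helices = []
    for line in lines:
        if line.startswith("HELIX "):
            chain = line[19]              # column 20: chain identifier
            start = int(line[21:25])      # columns 22-25: first residue number
            end = int(line[33:37])        # columns 34-37: last residue number
            helices.append((chain, start, end))
    return helices

records = [
    make_helix_record(1, "H1", "A", 10, 20),
    make_helix_record(2, "H2", "A", 35, 48),
    "REMARK   not a helix record",
]
helices = parse_helix_records(records)
print(helices)
```

The records say where the helices are, not how they were assigned, which is why the question of the underlying algorithm matters when the assignment is used to set gap penalties.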
Secondary structure from RasMol ... (DSSP!)
Concept 2:
Reasons to model need to be defined.
Use of homology models
Interpreting homology models: biochemical inference from 3D similarity
• Bonds
• Angles, planar and dihedral
• Surfaces, solvent accessibility
• Amino acid functions, presence in structure patterns
• Spatial relationship of residues to active site
• Spatial relationship to other residues
• Participation in function / mechanism
• Static and dynamic disorder
• Electrostatics
• Conservation patterns (structural and functional)
• Posttranslational modification sites
Abuse of homology models
• Modelling structures that cannot / will not be verified
• Analysing geometry of model
• Interpreting loop structures
Databases of Models
Don’t build a model before checking whether one already exists...
• Swiss-Model repository
  • 64,000 models based on 4000 structures and Swiss-Prot proteins
• ModBase
  • Made with "Modeller": 15,000 reliable models for substantial segments of approximately 4,000 proteins in the genomes of Saccharomyces cerevisiae, Mycoplasma genitalium, Methanococcus jannaschii, Caenorhabditis elegans, and Escherichia coli.
Concept 3:
Fully automated services perform well.
Homology Modeling Software?
• Freely available packages perform as well as commercial ones at CASP (Critical Assessment of Structure Prediction)
• Swiss Model (tutorial)
• Modeller (http://guitar.rockefeller.edu)
Swiss-Model steps:
1. Search for sequence similarities
BLASTP against EX-NRL 3D
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swiss-Model steps:
2. Evaluate suitable templates
Identity: > 25%
Expected model: > 20 residues
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
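The two template criteria can be expressed as a simple filter. This is a sketch under stated assumptions: the `Hit` record, its field names, and the example hits are hypothetical, and "expected model" is read here as the number of residues the resulting model would cover.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    template_id: str      # hypothetical template identifier
    identity: float       # percent sequence identity to the target
    aligned_length: int   # residues the model is expected to cover

def suitable_templates(hits, min_identity=25.0, min_length=20):
    """Keep hits with identity > 25% and an expected model of > 20 residues."""
    return [h for h in hits
            if h.identity > min_identity and h.aligned_length > min_length]

hits = [
    Hit("tmplA", 29.5, 94),   # passes both criteria
    Hit("tmplB", 22.0, 150),  # identity too low
    Hit("tmplC", 41.0, 18),   # expected model too short
]
selected = [h.template_id for h in suitable_templates(hits)]
print(selected)
```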
Swiss-Model steps:
3. Generate structural alignments
Select regions of similarity and match in coordinate space (EXPDB).
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swiss-Model steps:
4. Average backbones
Compute weighted average coordinates for backbone atoms expected to be in the model.
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
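Averaging a backbone atom over several superposed templates amounts to a weighted mean of its coordinates. The sketch below shows only that arithmetic; how Swiss-Model actually derives its weights is not stated here, so the weights in the example are hypothetical.

```python
def weighted_average_coords(coords, weights):
    """Weighted mean of equivalent atom positions from several templates."""
    if len(coords) != len(weights) or not coords:
        raise ValueError("need one weight per coordinate triple")
    total = sum(weights)
    return tuple(
        sum(w * c[axis] for c, w in zip(coords, weights)) / total
        for axis in range(3)
    )

# One backbone atom as seen in two superposed templates (hypothetical values);
# the second template is given three times the weight of the first.
position = weighted_average_coords([(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)], [1.0, 3.0])
print(position)
```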
Swiss-Model steps:
5. Build loops
Pick plausible loops from library, ligate to stems; if not possible, try combinatorial search.
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swiss-Model steps:
6. Bridge incomplete backbones
Bridge with overlapping pieces from a pentapeptide fragment library, anchor with the terminal residues and add the three central residues.
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swiss-Model steps:
7. Rebuild sidechains
Rebuild sidechains from a rotamer library: complete sidechains first, then regenerate partial sidechains using a probabilistic approach.
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swiss-Model steps:
8. Energy minimize
Gromos 96 energy minimization
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swiss-Model steps:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
6. Bridge incomplete backbones
7. Rebuild sidechains
8. Energy minimize
9. Write Alignment and PDB file
e-mail results
Peitsch M & Guex N (1997) Electrophoresis 18: 2714
Swissmodel in comparison
3D-Crunch: 211,000 sequences -> 64,000 models
Controls:
• > 50% ID: ~ 1 Å RMSD
• 40-49% ID: 63% < 3 Å
• 25-29% ID: 49% < 4 Å
Manual alternatives: Modeller ...
Automatic alternatives: SwissModel, sdsc1, 3djigsaw, pcomb_pcons, cphmodels, easypred
Ranking: #1 for RMSD and % correct aligned, #2 for coverage
Guex et al. (1999) TIBS 24:365-367
EVA: Eyrich et al. (2001) Bioinformatics 17:1242-1243 (http://cubic.bioc.columbia.edu/eva)
What structural elements change between similar sequences?
• Subtle changes in protein backbone path
• Changes in amino acid side-chain rotamer orientation
  • backbone dependent
• Loops added or truncated
• Model may be incomplete
Concept 4:
SwissModel in practice.
SwissModel ... first approach mode
http://www.expasy.org/swissmod
... enter the ExPDB template ID...
... run in Normal Mode (except when defining a DeepView project)...
... successful submission.
Results come by e-mail.
Optimal sequence alignment
http://cbrmain.cbr.nrc.ca/EMBOSS/index.html
[...]
# Matrix: EBLOSUM35
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 122
# Identity:      36/122 (29.5%)
# Similarity:    55/122 (45.1%)
# Gaps:          28/122 (23.0%)
# Score: 150.5
[...]
#=======================================
 23 LNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNVFLEKNKMYSREW  72
    |:.||:.|||||||...||.|....|:.:....:||. |..:.|.|.:.|
 11 LDSKKSRAEGRRIPRRFAVPNVKLHELVEASKELGLK-FRAEEKKYPKSW  59

 73 NRDVQYRGRVRVQLKQEDGSLCLVQFPSRKSVMLYAAEMIPKLKTRTQKT 122
       .:..|||.|:.:           .::..:|:..|..|.:::
 60 ---WEEGGRVVVEKR-----------GTKTKLMIELARKIAEIR------  89

123 GGADQSLQQGEGSKKGKGKKKK 144
       :|..:|    ||.|.||||
 90 ---EQKREQ----KKDKKKKKK 104
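The Length, Identity and Gaps lines of the alignment report can be recomputed from the two gapped rows. The sketch below transcribes the rows from the report; the final gap run of the second block is assumed to be six columns wide so that the row fills all 50 columns. The Similarity line is not reproduced, because it depends on the EBLOSUM35 matrix.

```python
# Gapped rows transcribed from the alignment report, block by block.
top = (
    "LNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNVFLEKNKMYSREW"
    "NRDVQYRGRVRVQLKQEDGSLCLVQFPSRKSVMLYAAEMIPKLKTRTQKT"
    "GGADQSLQQGEGSKKGKGKKKK"
)
bottom = (
    "LDSKKSRAEGRRIPRRFAVPNVKLHELVEASKELGLK-FRAEEKKYPKSW"
    + "---WEEGGRVVVEKR" + "-" * 11 + "GTKTKLMIELARKIAEIR" + "-" * 6
    + "---EQKREQ" + "-" * 4 + "KKDKKKKKK"
)

def alignment_stats(a, b):
    """Return (length, identities, gap columns) for two gapped rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return len(a), identities, gaps

length, identities, gaps = alignment_stats(top, bottom)
print(f"Length: {length}")
print(f"Identity: {identities}/{length} ({100 * identities / length:.1f}%)")
print(f"Gaps: {gaps}/{length} ({100 * gaps / length:.1f}%)")
```

Note that 20 of the gap columns fall in the second block alone, which is exactly the kind of region where the placement of gaps decides whether the model is in register.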
Optimal structural superposition
1.4 Å in 32 residues
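A figure such as this is commonly a root-mean-square deviation (RMSD) over the superposed residue pairs; the slide does not name the measure, so that reading is an assumption. A minimal sketch of the calculation for coordinates that are already superposed, using made-up positions:

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over equivalent, already superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Two alpha-carbon pairs, each displaced by 1.4 Å (hypothetical values).
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.4), (3.8, 1.4, 0.0)]
value = rmsd(model, reference)
print(round(value, 2))
```

The function only measures the deviation; finding the rotation and translation that minimize it is a separate step that a superposition program performs first.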
Questions? Feedback?
[email protected]
http://biochemistry.utoronto.ca/steipe/