Molecular Docking based on Shape Complementary Methods

-- Presenter: Xiaoyan Xiang
-- Advisor: Michela Taufer
Outline
• Purpose and my project organization
• Introduction
• Protein docking based on shape
complementary method
• Software for molecular docking
• Discussion on MD simulation and shape
complementary methods
Purpose
• Approaches to Molecular docking
– Shape Complementary methods:
• Using a matching technique that describes the protein and
the ligand as complementary surfaces.
– Simulation Processes:
• Simulating the actual docking process
• The ligand-protein pairwise interaction energies are
calculated.
• This is what we have studied in this course: molecular
dynamics (MD) simulation.
• Our Purpose:
– How to predict the docking pockets by the shape
complementary methods?
My Project Organization (1)
• Start from the most common search engine with basic key words
• Collect a first list of papers
• Move to an appropriate database and extend the list with:
1. Papers listed in the reference
2. Papers citing the basic articles
3. Papers related to the same topics listed by the engine
4. Updated keywords
• Collect a second list of papers
• Rank them to find the most correlated papers
My Project Organization (2)
– Search Engine & database
• Google, PubMed (commonly used)
• Library -> Electronic Journals -> Journals related to
“computational science/chemistry/biology”
– Key words
• “docking”, “docking pocket”, “docking site”
• “survey”, “review”
• “computer vision”
• “3D surface reconstruction”
– Other sources
• Talk to people doing research in related areas
• Talk to your supervisor, who might have experience in
related areas
My Project Organization (3)
• Some Results
Keywords | Database | Papers
protein docking + summary | Pubmed | 10
docking pocket + summary | Pubmed | 3
docking site + summary | Pubmed | 7
docking pocket + survey | Pubmed | 22 (6*)
Related papers listed by “Pubmed” | | 15
docking | wiki | 1*
docking/dock | google | Inf
“Rusting of the lock and key model for protein-ligand binding” (Ref from wiki) | Science | 7*
My Project Organization (4)
• Searching from the E-Journal from library
Keywords | Database (computational) | Papers
docking/dock | Computational and Theoretical Polymer Science (1997 – 2001) | 0
docking/dock | Computational Biology and Chemistry (2003 – 2008) | 16
docking/dock | Computational Geometry (1991 – 2008) | 0
docking/dock | Computational Geosciences | 0
docking/dock | Computational Mathematics and Modeling | 0
My Project Organization (5)
• Searching from the engine linking to library
Keywords | Database | Papers
docking (within this content) | SpringerLink | 5912
docking (title) | SpringerLink | 238
docking and pocket (title) | SpringerLink | 4
docking site (title) | SpringerLink | 13
Paper distribution
Biomedical and Life Science (138), Chemistry (114), Computer
Application in Chemistry (96), Animal Anatomy /Morphology
/Histology (74), Physical Chemistry (73), Chemistry and Materials
Science (50), Life Science, general (27), Biomedicine general (23),
Computer Science (22), Biochemistry, general (18)
My Project Organization (6)
• Clustering Criteria
– Papers mentioned by Wikipedia
– Papers related to “vision/computer vision”
– Papers published in important journals and venues (Science,
Protein, IEEE Transactions, CVPR, Graphics)
– Papers recommended by friends in related areas
or advisors
Introduction (1)
• Molecular docking:
– Docking is a method which predicts the preferred
orientation of one molecule to a second when
bound to each other to form a stable complex.
[Figure: a protein and a ligand in their best-fit orientation]
Ref: http://upload.wikimedia.org/wikipedia/en/2/2b/Docking.gif
Introduction (2)
• Two types of docking
– Rigid docking
• Treat the molecules as rigid objects that cannot change
their spatial shape during the docking process [Haim Wolfson]
– Flexible (soft) docking
• Conformational changes take place between the bound and
unbound structures [Inbal Halperin etc. 2002]
• The surface of one molecule can penetrate or overlap the
other
[Diagram relating Shape Complementary methods to rigid docking and Simulation Processes to flexible docking]
Introduction (3)
• Rigid docking
– Ligand-protein docking
• A large molecule
• A small molecule (the ligand)
• “key in lock” situation: the ligand is docking
in a cavity of the protein
– Protein-protein docking
• Two proteins of approximately the same size
• Usually the docking site is a more “planar”
surface
http://bioinfo3d.cs.tau.ac.il/Education/CS99b/class_notes/class6.html
Docking based on shape complementary
method
• This method consists of 4 stages
(1) Surface representation (Stage 1)
• Surface construction
• Smoothing
(2) Feature calculation (Stage 2)
• Curvature calculation
(3) Docking (Stage 3)
• Searching procedure
• Matching algorithm
(4) Scoring scheme (Stage 4)
• Geometric complementary criteria
Combine stage 2 and 3
[Inbal Halperin etc. 2002]
Stage 1: Surface Representation (1)
• The surface is represented by its geometric
features. [Inbal Halperin etc. 2002]
– Michael Connolly, Analytical molecular surface
calculation, J. Appl. Cryst 16. 548 – 558, 1983
• The solvent molecule is modeled by a sphere
• The sphere is rolled over the molecule to generate a
smooth outer-surface contour
• Two surfaces are distinguished: the van der Waals surface, and
the solvent-accessible surface formed by rolling a solvent (probe)
sphere over the van der Waals surface [Tolga Can, 2006]
– Lin SL, etc. Molecular surface representation by
sparse critical points, Proteins, 18:94-101, 1994
• Surface is described by sparse critical points, defined by
the projection of the gravity center of a Connolly face
Stage 1: Surface Representation (2)
– Ausiello G, Cesareni G, Helmer-Citterich, M
ESCHER: A new docking procedure applied to the
reconstruction of protein tertiary structure, Proteins,
28:556 – 567, 1997
• The solvent-accessible surface is cut into parallel slices
• Each slice is transformed into a polygon to be used in a
rigid surface matching
– Tolga Can, Chao-I Chen, and Yuan-Fang Wang,
Efficient molecular surface generation using level-set methods,
Journal of Molecular Graphics and Modelling, Vol 25, 4:442-454, 2006
• The level-set technique is used to enhance the speed
Stage 1: Surface Representation (3)
• Using volumetric properties [Michael Teschner etc,
1995, Thitiwan Srinark etc, 2003]
ra: probe radius + van der Waals radius -> solvent-accessible surface
rb: van der Waals radius -> van der Waals surface
[Figure label: discrete]
Note: the surface is 3D; 2D is used here for explanation
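The two radii can be illustrated with a small sketch that labels a grid point against a set of atom spheres. This is only an assumption-laden illustration, not the method of the cited papers: the function name, the atom format (center, van der Waals radius) and the default probe radius of 1.4 (a value often used for a water probe) are all hypothetical choices.

```python
import math

def classify_grid_point(point, atoms, probe_radius=1.4):
    """Label one grid point relative to the two sphere sets.

    atoms: list of ((x, y, z), vdw_radius) pairs (hypothetical input format).
    Returns 'inside_vdw' if the point lies within rb of some atom,
    'inside_sas' if it lies within ra = rb + probe_radius of some atom
    but outside every van der Waals sphere, and 'outside' otherwise.
    """
    label = "outside"
    for center, r_b in atoms:
        d = math.dist(point, center)
        if d <= r_b:
            return "inside_vdw"          # inside the van der Waals sphere
        if d <= r_b + probe_radius:
            label = "inside_sas"         # inside the enlarged sphere only
    return label
```

Evaluating this function on every node of a regular grid gives the discrete volume from which a surface can then be extracted.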
Stage 1: Surface Representation (3)
• Find the volume surface
– Using the grid points
Fig: Generating the pseudo contour using the existing grid. [Michael Teschner etc, 1995]
Fig: Generating the contact part of the surface, by base point (left), and the
reentrant part of the surface (right).
Stage 1: Surface Representation (4)
– using the marching cubes algorithm to create a
surface mesh [Lorensen and Cline’ 87, Thitiwan Srinark
etc, 2003]
Image source: http://kom.auc.dk/~zeek/kowd/mcubes/ind.html
Stage 1: Surface Representation (5)
– The surface is estimated
by triangles
– These triangles are
approximated from
intersecting points of the
surface and edges of
cubes
– The size of the cubes -> the size of the triangles
-> the resolution of the mesh (or surface)
Image source: Thitiwan Srinark etc 2003
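The link between cube size and resolution can be sketched as follows. This is not a full marching cubes implementation: it only finds where an implicit surface cuts the cube edges (the points from which the triangles are built) and counts them for a given cube size. The test surface (a sphere of radius 9.5), the grid extent and all function names are assumptions made for illustration.

```python
def edge_crossing(p0, p1, f0, f1, iso=0.0):
    """Linearly interpolate where the iso-surface cuts the edge p0-p1.

    f0 and f1 are the field values at the two edge ends.
    Returns None when the edge does not cross the surface.
    """
    if (f0 - iso) * (f1 - iso) > 0 or f0 == f1:
        return None
    t = (iso - f0) / (f1 - f0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

def sphere(p, radius=9.5):
    """Implicit test surface: negative inside, positive outside."""
    x, y, z = p
    return x * x + y * y + z * z - radius * radius

def count_crossings(cube_size, extent=12, field=sphere):
    """Count cube edges cut by the surface on a grid with the given cube size."""
    n = 0
    coords = range(-extent, extent + 1, cube_size)
    steps = ((cube_size, 0, 0), (0, cube_size, 0), (0, 0, cube_size))
    for x in coords:
        for y in coords:
            for z in coords:
                p0 = (x, y, z)
                for step in steps:
                    p1 = tuple(a + s for a, s in zip(p0, step))
                    if edge_crossing(p0, p1, field(p0), field(p1)) is not None:
                        n += 1
    return n
```

Comparing `count_crossings(2)` with `count_crossings(6)` shows that smaller cubes produce many more intersection points, and therefore more and smaller triangles.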
Stage 1: Surface Representation (6)
[Figure: mesh reconstruction with cube size = 6 and cube size = 2]
Image source: Thitiwan Srinark etc 2003
Stage 1: Surface Representation (7)
• Mesh Smoothing
– To smooth the surface for analysis
– A surface subdivision method: “Loop Surface” [Charles Loop, 1987]
[Figure: four panels numbered 1 to 4]
Image source: Thitiwan Srinark etc 2003
Stage 1: Surface Representation (8)
• Results of loop subdivision
[Figure: mesh of cube size = 5; result after iteration # = 1; result after iteration # = 2]
Image source: Thitiwan Srinark etc 2003
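How a subdivision iteration refines a mesh can be sketched with the topological step alone: every triangle is split into four by inserting edge midpoints. This is a simplified sketch, since the vertex smoothing weights of Loop's scheme are deliberately omitted, and the function name and mesh format (vertex list plus index triples) are assumptions.

```python
def subdivide_once(vertices, triangles):
    """Split every triangle into four by inserting edge midpoints.

    vertices: list of (x, y, z) tuples; triangles: list of index triples.
    Only the 1-to-4 split is done here; Loop's smoothing of vertex
    positions is not applied.
    """
    vertices = list(vertices)
    midpoint_index = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))        # shared edges reuse one midpoint
        if key not in midpoint_index:
            a, b = vertices[i], vertices[j]
            vertices.append(tuple((p + q) / 2.0 for p, q in zip(a, b)))
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    new_triangles = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, new_triangles
```

Each iteration multiplies the triangle count by four, which is why only one or two iterations are usually shown.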
Stage 2: Feature Calculation (1)
• Features used to characterize the potential
docking sites
– They are determined by the surface representation
– Points suggested:
• Interesting points: a cap, belt, or pit for convex, toroidal
and concave faces [Connolly, 1983]
• Critical points of the facets and the associated surface
normal [Raquel Norel etc. 1994]
[Diagram: features in continuous form -> discretize (?) -> intermediate features (?) -> features in discrete form]
Stage 2: Feature Calculation (2)
• Curvatures are used to represent the
characteristic features [Connolly 1983, Michael Teschner
etc, 1995, Thitiwan Srinark etc, 2003]
– Working on discrete data
– Intermediate features
• Total, Gaussian and mean curvatures
– Surface (facet) type is classified from the curvature
information
• Determined by the combination of the intermediate features
Stage 2: Feature Calculation (3)
• Total Curvature
[Mangan & Whitaker’ 99]
N = the number of triangles associated
with the vertex
[xt, yt, zt] = normal of triangle t
C = the covariance matrix
D = the total curvature which is equal
to the norm of C
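The definitions above can be turned into a short sketch. The slide's formula did not survive, so several details here are assumptions rather than the cited method: the normals are unweighted, the covariance is mean-centered, and the Frobenius norm is used as "the norm of C".

```python
import math

def total_curvature(normals):
    """Total curvature D at a vertex from its adjacent triangle normals.

    normals: list of N unit normals [xt, yt, zt], one per adjacent triangle.
    C is the 3x3 covariance matrix of the normals (assumed mean-centered,
    unweighted); D is its Frobenius norm (an assumed choice of norm).
    """
    n = len(normals)
    mean = [sum(v[k] for v in normals) / n for k in range(3)]
    cov = [[sum((v[r] - mean[r]) * (v[c] - mean[c]) for v in normals) / n
            for c in range(3)] for r in range(3)]
    return math.sqrt(sum(cov[r][c] ** 2 for r in range(3) for c in range(3)))
```

With this definition a flat neighborhood, where all normals agree, gives D = 0, and D grows as the normals spread apart.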
Stage 2: Feature Calculation (4)
• Gaussian Curvature
[Falcidieno & Spagnuolo’ 92]
[Formula annotations: the angle deficit; a constant 3; the total area of the adjacent triangles]
source: http://www.cse.ucsc.edu/research/slvg/mesh.html
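The three ingredients named on this slide match the common angle-deficit estimate K = 3 (2π − Σθ) / A, where θ are the triangle angles at the vertex and A is the total area of the adjacent triangles. The sketch below implements that estimate under the assumption that this is the formula meant; the function name and the input format (a vertex and its ordered ring of neighbors forming a closed fan) are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def gaussian_curvature(vertex, ring):
    """Angle-deficit estimate of Gaussian curvature at a mesh vertex.

    ring: neighbors of the vertex in order, forming a closed triangle fan.
    """
    angle_sum = 0.0
    area = 0.0
    for i in range(len(ring)):
        u = _sub(ring[i], vertex)
        w = _sub(ring[(i + 1) % len(ring)], vertex)
        cross_len = math.sqrt(sum(c * c for c in _cross(u, w)))
        dot = sum(a * b for a, b in zip(u, w))
        angle_sum += math.atan2(cross_len, dot)   # angle of this triangle at the vertex
        area += 0.5 * cross_len                   # area of this triangle
    return 3.0 * (2.0 * math.pi - angle_sum) / area
```

A flat fan gives a value near zero, while a vertex at the tip of a cone-like fan gives a positive value.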
Stage 2: Feature Calculation (5)
• Mean Curvature
[Desbrun et al. ’99]
N(i) = set of adjacent polygons
around xi
A = sum of the areas of triangles
in N(i)
source: http://www.cse.ucsc.edu/research/slvg/mesh.html
Stage 2: Feature Calculation (6)
• Surface classification
– The surface type (T) of a vertex is classified using
Gaussian curvature (K) and mean curvature (H)
[Besl and Jain ’88, Thitiwan Srinark etc, 2003]
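The classification table itself is not reproduced on the slide, so the following is a sketch of the usual sign-based scheme associated with Besl and Jain, in which a peak has K > 0 and H < 0. The label strings, the zero threshold `eps` and the function name are assumptions, and the sign of H depends on the chosen normal orientation.

```python
def surface_type(K, H, eps=1e-3):
    """Classify a vertex from the signs of Gaussian (K) and mean (H) curvature.

    Values with magnitude below eps are treated as zero (eps is an
    illustrative threshold, not a value from the slides).
    """
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    table = {
        (1, -1): "peak", (1, 1): "pit",
        (0, -1): "ridge", (0, 1): "valley", (0, 0): "flat",
        (-1, -1): "saddle ridge", (-1, 1): "saddle valley", (-1, 0): "minimal",
    }
    # K > 0 together with H = 0 cannot occur on a real surface.
    return table.get((k, h), "undefined")
```

Grouping the ridge, valley and saddle labels into coarser classes would give a smaller set of patch types for segmentation.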
Stage 2: Feature Calculation (7)
• Surface Segmentation
– To reduce calculation
– Surface meshes are segmented based on distance
between vertices and surface types
– Four surface (patch) types are defined [Thitiwan
Srinark etc, 2003]
PEAK-TYPE
FLAT-TYPE
SADDLE-TYPE
PIT-TYPE
[Figure panels: Total Curvature (blue < 0.01 < green < 0.1 < red); Gaussian Curvature (red > 0.0; blue < 0.0); Mean Curvature (red < 0.0; blue > 0.0); Surface Type; Segmented Mesh]
Image source: Thitiwan Srinark etc 2003
Stage 3: Docking (1)
• Searching procedure
– Rotation and translation are
computed from each pair of
matchable segments
• Coarse docking
• Fine docking
• Matching algorithm
– This is determined by the
surface representation
– It will influence the searching
procedure
– It is a scoring scheme
[Diagram: coarse docking (matching algorithm) -> a first set of candidates -> increasing resolution -> fine docking (matching algorithm) -> a second set of candidates]
Stage 3: Docking (2)
• Criteria for two segments to be matchable
(1) “holes” and “knobs” [Inbal Halperin etc, 2002]
(2) Critical points with associated normals [Inbal Halperin
etc, 2002]
• Share the same internal distance
• If superimposed, have opposing surface normals
(3) Cavities in the surface of the receptor
[Haim Wolfson]
• For protein-ligand docking
(4) The complementary types
[Thitiwan Srinark etc, 2003]
• One is peak-type, and the other is pit-type
• Both are saddle-type
• Both are flat-type
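Criterion (4) is simple enough to write down directly. The sketch below encodes only the three complementary-type rules listed above; the type labels as plain strings and the function name are assumptions.

```python
def matchable(type_a, type_b):
    """Return True if two segment types are complementary under criterion (4).

    Matchable pairs: peak with pit, saddle with saddle, flat with flat.
    """
    pair = {type_a, type_b}
    return pair == {"peak", "pit"} or pair == {"saddle"} or pair == {"flat"}
```

Using a set makes the test symmetric, so `matchable("pit", "peak")` and `matchable("peak", "pit")` agree.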
Stage 3: Docking (3)
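• Coarse Docking Results
Protein info, mesh info and PreDock (per chain):
No. | PDB ID | Chain | #Atom | MC Size | #Vertices | #Edges | #Seg | Seg. Time
1 | 1G0B | 1G0BA | 1069 | 4 | 1704 | 5118 | 105 | 3.210
1 | 1G0B | 1G0BB | 1134 | 4 | 1908 | 5724 | 122 | 3.940
1 | 1G0B | 1G0BA | 1069 | 5 | 1001 | 3000 | 64 | 1.450
1 | 1G0B | 1G0BB | 1134 | 5 | 1070 | 3204 | 64 | 1.620
1 | 1G0B | 1G0BA | 1069 | 6 | 699 | 2088 | 34 | 1.509
1 | 1G0B | 1G0BB | 1134 | 6 | 746 | 2232 | 48 | 1.713
2 | 1ACB | 1ACBE | 1769 | 5 | 1434 | 4296 | 97 | 2.274
2 | 1ACB | 1ACBI | 522 | 5 | 506 | 1512 | 22 | 0.663
2 | 1ACB | 1ACBE | 1769 | 6 | 986 | 2952 | 72 | 1.171
2 | 1ACB | 1ACBI | 522 | 6 | 370 | 1104 | 16 | 0.732
3 | 1SBN | 1SBNE | 1938 | 4 | 2172 | 6514 | 135 | 4.726
3 | 1SBN | 1SBNI | 525 | 4 | 830 | 2448 | 48 | 1.100
3 | 1SBN | 1SBNE | 1938 | 5 | 1386 | 4152 | 87 | 2.258
3 | 1SBN | 1SBNI | 525 | 5 | 494 | 1476 | 24 | 0.738
Docking results (per complex and MC size):
No. | PDB ID | MC Size | #Results | Rank | RMSD | Time
1 | 1G0B | 4 | 10000 | 3618 | 9.266 | 8.468
1 | 1G0B | 5 | 4749 | 9 | 10.750 | 3.538
1 | 1G0B | 6 | 1344 | 875 | 12.740 | 0.954
2 | 1ACB | 5 | 1957 | 82 | 8.916 | 1.414
2 | 1ACB | 6 | 627 | 302 | 7.832 | 0.582
3 | 1SBN | 4 | 7153 | 1081 | 7.310 | 4.335
3 | 1SBN | 5 | 2080 | 180 | 10.030 | 1.431
[Thitiwan Srinark etc, 2003]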
Stage 3: Docking (4)
• Fine docking results (1) (Protein docking)
[Figure: 1G0B, RMSD 7.04234; 6ADH, RMSD 6.65955]
[Thitiwan Srinark etc, 2003]
Stage 3: Docking (5)
• Fine docking results (2) (Protein docking)
[Figure: 3TIM and 1A2Y; RMSD values shown: 6.51693 and 8.55502]
[Thitiwan Srinark etc, 2003]
Stage 4: Scoring scheme (1)
• Why we need the scoring scheme (function)
[Inbal Halperin etc. 2002]
– Reason:
• A search algorithm may produce solutions that are
unmanageable for any practical need
– Purpose of the scoring function:
• To discriminate between “correct” native solutions with
low rmsd from crystal complex and others within a
reasonable computation time
Purpose:
1. To check their own geometric properties
2. To compare the complex with a known structure
-> To design the scoring schemes
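Comparing a predicted complex with a known structure is usually expressed as an rmsd value. A minimal sketch is given below, assuming the two coordinate lists hold corresponding atoms in the same order and are already in the same reference frame; no superposition step is performed, and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists.

    Each entry is an (x, y, z) tuple; entry i of coords_a is assumed to
    correspond to entry i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2
                  for a, b in zip(coords_a, coords_b)
                  for p, q in zip(a, b))
    return math.sqrt(squared / len(coords_a))
```

A candidate with a low value against the crystal complex would count as a "correct" solution in the sense used above.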
Stage 4: Scoring scheme (2)
• Scoring function based on geometric properties
[Inbal Halperin etc. 2002]
– Area shared by two matching dots [Lin et al. 1994; Hou et al. 1999]
– The number of matching dots
– Using grid cubes to represent the surface -> the
overlap between surface cells from different
molecules
• Problem
– There is no reliable method for quickly locating correct
solutions, especially if the binding site is unknown
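The grid-cube idea can be sketched by mapping surface points of each molecule to integer cell indices and counting the cells they share. This is an illustrative simplification, not a scoring function from the cited papers; the function names, the point-list input and the default cell size are assumptions.

```python
import math

def surface_cells(points, cell_size):
    """Map surface points to the set of grid cells (integer triples) they fall in."""
    return {tuple(int(math.floor(c / cell_size)) for c in p) for p in points}

def overlap_score(surface_a, surface_b, cell_size=1.0):
    """Count grid cells occupied by surface points of both molecules."""
    cells_a = surface_cells(surface_a, cell_size)
    cells_b = surface_cells(surface_b, cell_size)
    return len(cells_a & cells_b)
```

The score depends strongly on `cell_size`, which plays the same role as the surface resolution elsewhere in these slides.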
Stage 4: Scoring scheme (3)
An example of scoring function based on geometric
properties [Thitiwan Srinark etc, 2003]
– Mesh Based Scoring
• (docking segment area of mesh surfaces) / (distance between
the two surfaces) (fast)
– Volume Based Scoring
• Measure how much the two surfaces intersect each
other (slow)
– Their combinations
$E_{\mathrm{match}} = \alpha E_{\mathrm{mesh}} - \beta E_{\mathrm{volume}}$
Note: they are also used in the matching process
to remove some unreasonable results
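The combination is a weighted difference of the two scores. The sketch below only evaluates it; the default weights are placeholders, and how candidates are thresholded or ranked on the result is not specified on the slide, so it is left out.

```python
def match_energy(e_mesh, e_volume, alpha=1.0, beta=1.0):
    """Combine the mesh-based and volume-based scores as alpha*E_mesh - beta*E_volume.

    alpha and beta are weighting factors; the defaults are placeholders,
    not values from the cited work.
    """
    return alpha * e_mesh - beta * e_volume
```

For example, `match_energy(2.0, 0.5, alpha=1.0, beta=2.0)` evaluates 1.0 * 2.0 - 2.0 * 0.5.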
Stage 4: Scoring scheme (4)
• The relationship between the docking and
scoring stages
[Inbal Halperin etc. 2002]
Software for molecular docking
• Software packages
– DOCK
– HotDock
– IDock
– AUTODOCK
– ZDOCK
Discussion on MD and SC (1)
• Comparison between MD and SC
Property | Shape complementary | MD simulation
Rigid docking | Yes | Yes
Soft docking | Special process (not accurate) | Yes
Calculation | Relatively small | Large
Speed | Fast* | Slow
Parallel computation | ? | Yes
Protein chain | Needs special process | The same
*: It depends on the resolution of the surface representation
Discussion on MD and SC (2)
• How to handle soft docking by SC methods?
[Inbal Halperin etc. 2002]
– To add a soft belt at the contacting surfaces,
where one molecule can penetrate or overlap the
other
Ligand (Protein) | Protein
Rigid | Rigid
Flexible (soft) | Rigid
Flexible (soft) | Flexible (soft)
Discussion on MD and SC (3)
• How to dock protein chains by SC methods?
[Inbal Halperin etc. 2002]
– Segment the protein chains into different parts
– Dock each segment
• Dock each individual part
• Grow patches from the potential docking sites and compare
them with the segments from the other protein
– The docking of each segment contributes a vote to the
total score
Discussion on MD and SC (4)
• Can we combine the two methods?
– Yes
[Diagram: Shape Complementary, MD simulation]
– Why?
• Rigid / soft docking
• Cost / calculation
• Speed
• Accuracy
– Examples
• [Jiang & Kim 1991]
• [Katchalski-Katzir et al. 1992]
[Diagram: coarse docking (matching algorithm) -> a first set of candidates -> increasing resolution -> fine docking (matching algorithm) -> a second set of candidates]
Discussion on MD and SC (5)
• The strategy we used in our class
[Diagram: MD simulation -> low energy states -> check]
– Benefits
• No need to worry about the size of the
protein (chain or domain)
• No need to worry about rigid/soft docking
• Parallel computation is easy to use
– Disadvantages
• The results must be checked manually
Discussion on MD and SC (6)
– Potential solutions
• At the end of the soft docking, we can reconstruct the
surface of each protein
• Segment each protein into patches with different types
• Find the patches from different proteins with distances
less than a threshold
• Check the patch type or the area of the contacting
surface
– The patch type should match
– The area of the contacting surface should be greater than a
threshold
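The last two steps can be sketched as a filter over patch pairs. Everything about the data layout is an assumption: each patch is a dictionary with a hypothetical `center`, `type` and `area`, the complementary type pairs are taken from the matchable-segment rules, and the smaller of the two patch areas stands in for the contacting area, which the slide does not define.

```python
import math

# Type pairs treated as complementary (peak/pit, saddle/saddle, flat/flat).
COMPLEMENTARY = {("peak", "pit"), ("pit", "peak"),
                 ("saddle", "saddle"), ("flat", "flat")}

def contacting_patches(patches_a, patches_b, max_distance, min_area):
    """Return index pairs (i, j) of patches that pass the automatic check.

    A pair passes if the patch centers are closer than max_distance and
    either the patch types match or the (proxy) contact area exceeds min_area.
    """
    hits = []
    for i, pa in enumerate(patches_a):
        for j, pb in enumerate(patches_b):
            close = math.dist(pa["center"], pb["center"]) < max_distance
            types_ok = (pa["type"], pb["type"]) in COMPLEMENTARY
            area_ok = min(pa["area"], pb["area"]) > min_area
            if close and (types_ok or area_ok):
                hits.append((i, j))
    return hits
```

A non-empty result for a low-energy MD state would flag it for closer inspection instead of a fully manual check.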
Q&A
Reference (1)
[1] Thitiwan Srinark, Chandra Kambhamettu, An approach for 3D
segmentation on multiresolution surfaces, Proceedings of International
Conference on Intelligent Technologies, 2003.
[2] W.E. Lorensen and H.E. Cline, Marching cubes: a high resolution 3D
surface construction algorithm, Computer Graphics, 21(4), 1987.
[3] Protein Data Bank, http://www.rcsb.org/pdb/.
[4] A.P. Mangan and R.T. Whitaker, Partitioning 3D surface meshes using
watershed segmentation, IEEE Transactions on Visualization and
Computer Graphics, 5(4), 1999.
[5] Inbal Halperin, Buyong Ma, Haim Wolfson, and Ruth Nussinov,
Principles of docking: an overview of search algorithms and a guide to
scoring functions, PROTEINS: Structure, Function, and Genetics,
47:409-443, 2002.
[6] Michael Connolly, Analytical molecular surface calculation, J. Appl.
Cryst., 16:548 – 558, 1983.
[7] Michael Connolly, Solvent-accessible surfaces of proteins and nucleic
acids, Science, 221:709 – 713, 1983.
[8] Lin SL, Nussinov R, Fischer D, Wolfson HJ, Molecular surface
representation by sparse critical points, Proteins, 18:94-101, 1994.
Reference (2)
[9] Ausiello G, Cesareni G, Helmer-Citterich M, ESCHER: A new docking
procedure applied to the reconstruction of protein tertiary structure,
Proteins, 28:556 – 567, 1997.
[10] Tolga Can, Chao-I Chen, and Yuan-Fang Wang, Efficient molecular
surface generation using level-set methods, Journal of Molecular
Graphics and Modelling, Vol 25, 4:442-454, 2006.
[11] Michael Teschner and Christian Henn, Mapping volumetric properties
on molecular surfaces in real-time, Proceedings of the 28th Annual
Hawaii International Conference on System Sciences, 1995.
[12] Haim Wolfson, Structural Bioinformatics, online:
http://bioinfo3d.cs.tau.ac.il/Education/CS99b/class_notes/
[13] Charles Loop, Smooth subdivision surfaces based on triangles, Thesis,
University of Utah, 1987.
[14] Raquel Norel, Daniel Fischer, Haim J. Wolfson and Ruth Nussinov,
Molecular surface recognition by a computer vision-based technique,
Protein Engineering, 7:39 – 46, 1994.
Reference (3)
[15] P.J. Besl and R.C. Jain, Segmentation through variable-order surface
fitting, IEEE Transactions on Pattern Analysis and Machine Intelligence,
10(2), 1988.
[16] Hou T, Wang J, Chen L, Xu X, Automated docking of peptides and
proteins by using genetic algorithm combined with a tabu search,
Protein Engineering, 12:639 – 647, 1999.
[17] Jiang, F. and Kim, S.H., Soft Docking: Matching of Molecular Surface
Cubes, J. of Mol. Bio., vol. 219:79-102, 1991.
[18] Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A.A., Aflalo,
C., Vakser, I.A., Molecular Surface Recognition: Determination of
Geometric Fit between Proteins and their Ligands by Correlation
Techniques, Proc. of the Nat. Acad. Sc., USA, vol. 89:2195-2199, 1992.
[Figure: Van der Waals surface, surface normal, surface type, surface segment] [Thitiwan Srinark etc, 2003]
[Figure: Van der Waals surface, surface normal, surface type, surface segment] [Thitiwan Srinark etc, 2003]
[Figure: Van der Waals surface, surface type, mean curvature, Gaussian curvature] [Thitiwan Srinark etc, 2003]
Coarse Docking
• A rotation and translation (Ri, Ti) are computed from each pair of
matchable segments
• Matchable segments: two segments are matchable if
– (i) one segment is peak-type and the other one is pit-type,
– (ii) both segments are saddle-type, or
– (iii) both segments are flat-type
[Thitiwan Srinark etc, 2003]
[Figure: Coarse Docking Algorithm] [Thitiwan Srinark etc, 2003]
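Geometric Matching Energy
$$E_{\mathrm{match}} = \alpha E_{\mathrm{mesh}} - \beta E_{\mathrm{volume}}$$
$$E_{\mathrm{mesh}} = \sum_{i=1}^{N_m} \left( \alpha_0 \lVert v_i - v_j \rVert + \alpha_1 \lvert K_i - K_j \rvert + \alpha_2 \lvert H_i \pm H_j \rvert + \alpha_3 \lvert D_i - D_j \rvert + \alpha_4 \lvert 1 \pm n_i \cdot n_j \rvert \right)$$
[Thitiwan Srinark etc, 2003]
$M_{si}$: surface & space-fill (with penalties) matrix of $O_i$
$M_{sj}$: surface matrix of $O_j$
$N_v$: number of intersecting elements
$N_m$: number of corresponding vertices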
[Figure: ICP Based Docking Algorithm] [Thitiwan Srinark etc, 2003]