PPT - Bioinformatics.ca

Small Molecule-Protein
Interactions
Howard Feldman
The Blueprint Initiative
Toronto, Ontario
Lecture 4.3
Drug Discovery Pipeline
• Most drugs are small molecules, and the interactions
they make with proteins determine their effects on,
and toxicity to, the human body
• Clinical trials are the most expensive part of the pipeline
– if failure can be predicted before this point, it saves
time and money
Drug Discovery Pipeline
• It is of utmost importance to identify, in the early
stages of drug discovery, the lead compounds
that are most likely to succeed
• A recent study by the Tufts Center for the Study of
Drug Development showed that bringing one
drug to market costs an average of $800M!
• Only 5 in 5,000 potential new drugs tested on
animals reach clinical trials, and only one of
those ultimately wins FDA approval
How small is a small molecule?
• A small molecule is generally considered to be
anything that may interact with protein or DNA
• Must be biologically relevant
• Examples include ions, polysaccharides, peptides,
drugs
Small Molecules
• No absolute maximum size, though drug-like
molecules often have a molecular weight of 500
Da or less
• However, they can get complex – branched
polysaccharides, cyclic antibiotics, etc.
• Normally not interested in: detergents,
buffers, solvents, denaturants, non-biological
ions
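The 500 Da rule of thumb above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names and the small atomic-mass table are assumptions made for this example, and the inputs are the molecular formulas of tamoxifen (parent compound of the 4H-tamoxifen example) and taxol, both of which appear later in the lecture.

```python
# Average atomic masses (Da) for the few elements used below.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Rule-of-thumb cutoff from the slide; it is not a hard limit.
DRUG_LIKE_MAX_DA = 500.0


def molecular_weight(formula):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def is_drug_like_by_weight(formula, max_da=DRUG_LIKE_MAX_DA):
    """True if the molecular weight is at or below the cutoff."""
    return molecular_weight(formula) <= max_da


# Molecular formulas: tamoxifen is C26H29NO, taxol is C47H51NO14.
tamoxifen = {"C": 26, "H": 29, "N": 1, "O": 1}
taxol = {"C": 47, "H": 51, "N": 1, "O": 14}

for name, formula in [("tamoxifen", tamoxifen), ("taxol", taxol)]:
    weight = round(molecular_weight(formula), 1)
    print(name, weight, is_drug_like_by_weight(formula))
```

Taxol comes out well above the cutoff, which fits the point that there is no absolute maximum size for a biologically relevant small molecule.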
How Many?
• Recently a number of public small molecule
databases have become available:
– CAS Registry – 26,000,000 substances
– Cambridge Structural Database – 300,000 3D
structures
– PDBSum – 6700 3D ligands from PDB
– NCI database – 250,000 molecules
– NCBI PubChem – 700,000 compounds
– ChemBank – 1,100,000 molecules
• Problem: data can be very messy and sparse
Popular small molecules and domains
A. Small molecules (SM)

Rank  SM        % Total
1     Mg2+      0.38
2     ATP       0.26
3     Mn2+      0.26
4     ADP       0.25
5     Cl-       0.24
6     Ca2+      0.23
7     Zn2+      0.22
8     AMP-PNP   0.21
9     AGS       0.19
10    GTP       0.18
Total           2.43

B. Domains

Rank  Identifier  Domain          % Total
1     pfam00004   AAA             0.31
2     pfam01443   Viral helicase  0.29
3     pfam00910   RNA helicase    0.27
4     pfam00270   DEAD            0.27
5     pfam00680   RNA pol         0.23
6     pfam01695   IstB            0.23
7     pfam00005   ABC             0.23
8     pfam00437   GSPII           0.22
9     pfam03288   Pox             0.20
10    pfam05729   NACHT           0.20
Total                             2.45
• Not surprisingly, divalent cations and ATP are the
most common small molecules found interacting with
proteins
• AAA is an ATPase domain; the next three are all
helicases, which also bind various nucleotides
Toxicity
• Caused when a drug interferes with biological
pathway(s) in the host
• The fewer side effects, the better
• Must be determined in the early stages of
discovery; otherwise it becomes very costly
• Hence predicting toxicity is very important
and desirable
• Boils down to predicting interactions, or
rather, non-interactions
Predicting Toxicity
• Inverse docking – Chen and Zhi developed a
database of cavities in PDB structures
• INVDOCK searches these cavities for potential
interactions with the ligand of interest, using a
scoring function
– Compares the energy to an absolute threshold, as well
as to the energy of the observed PDB ligand(s) at that site
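The two-part acceptance test described above can be sketched as a small filter. This is not INVDOCK's actual code: the function name, the threshold value and the cavity data are all hypothetical, and the slide does not say exactly how the docked energy is compared with the observed ligand's energy, so the "at least as good as the best observed ligand" rule below is an assumption.

```python
def is_putative_target(docked_energy, energy_threshold, observed_ligand_energies):
    """Return True if a docked pose passes both checks described above.

    Energies are in arbitrary units where lower (more negative) means
    stronger predicted binding.
    """
    # Check 1: the docked energy must beat an absolute threshold.
    if docked_energy > energy_threshold:
        return False
    # Check 2 (assumed form): it must also be at least as good as the best
    # ligand already observed at that site, when one is known.
    if observed_ligand_energies:
        return docked_energy <= min(observed_ligand_energies)
    return True


# Hypothetical cavities: (name, docked energy, energies of observed ligands).
cavities = [
    ("cavity_A", -52.0, [-48.0, -45.5]),
    ("cavity_B", -30.0, [-41.0]),
    ("cavity_C", -55.0, []),
    ("cavity_D", -12.0, []),
]
THRESHOLD = -25.0  # made-up value for illustration

hits = [name for name, energy, observed in cavities
        if is_putative_target(energy, THRESHOLD, observed)]
print(hits)
```

Here cavity_B passes the absolute threshold but is rejected because the ligand already observed at that site binds better, which is the kind of case the second check is meant to catch.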
Example – 4H-Tamoxifen
• Used to treat breast cancer
• INVDOCK finds 22 putative protein targets, at least 10
of which have some experimental backing (including
the ones shown here):
– Estrogen receptor (the drug target)
– Alcohol dehydrogenase (enhances sedative effect of
alcohol)
– IgG light chain (modulates immune response)
– 17β-hydroxysteroid dehydrogenase (tumor regression)
– GST (suppressed activity, genotoxicity, carcinogenicity)
Drug Docking
• Shares much in common with structure
prediction
• Two components
– Exploration of conformational space
– Scoring function
• Plus one additional component
– Locating the binding site
Drug Docking – Level of Detail
• Rigid body docking – protein remains fixed,
small molecule has 6 degrees of freedom
(DOF) – 3 translational and 3 rotational
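To make the six degrees of freedom concrete, the sketch below applies one rigid-body pose (three translations, three rotation angles) to a ligand's coordinates. The function names, the toy three-atom ligand and the pose values are all made up for illustration; a real docking program would generate many such poses and score each one.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation matrix for rotations about x, then y, then z (radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # R = Rz * Ry * Rx
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Apply a 6-DOF pose (tx, ty, tz, rx, ry, rz) to a list of (x, y, z) atoms.

    The ligand is rotated about its own centroid and then translated, so
    all internal distances stay fixed.
    """
    tx, ty, tz, rx, ry, rz = pose
    rot = rotation_matrix(rx, ry, rz)
    n = len(coords)
    centroid = [sum(atom[i] for atom in coords) / n for i in range(3)]
    shift = (tx, ty, tz)
    moved = []
    for atom in coords:
        local = [atom[i] - centroid[i] for i in range(3)]
        rotated = [sum(rot[r][c] * local[c] for c in range(3)) for r in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + shift[i] for i in range(3)))
    return moved


# Toy three-atom "ligand" and an arbitrary pose (both hypothetical).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
pose = (2.0, -1.0, 0.5, 0.3, 1.1, -0.7)
moved = apply_pose(ligand, pose)
print(moved)
```

Because only these six numbers change, the search space for rigid-body docking is small compared with the flexible cases.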
Drug Docking – Level of Detail
• Flexible-ligand docking – protein remains fixed, small
molecule has standard 6 DOF plus internal DOF –
can rotate about bonds
– More time consuming, but necessary for complex ligands if
binding conformation is unknown
• Flexible docking – as above, and in addition protein
atoms in neighbourhood of binding site can move
– Largest conformational space to search
– Often done by using multiple static protein conformers, and
treating each by flexible ligand docking
– Often important when docking to an apo-protein, e.g. to
capture allosteric effects
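The internal degrees of freedom mentioned above are rotations about bonds. A minimal sketch of one such move, using Rodrigues' rotation formula, is shown below; the function names and the four-atom chain are hypothetical, and choosing which atoms lie "downstream" of the bond is left to the caller.

```python
import math


def rotate_about_bond(point, a, b, angle):
    """Rotate `point` by `angle` radians about the axis running from a to b."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]
    v = [point[i] - a[i] for i in range(3)]
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    # Rodrigues' rotation formula.
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
        for i in range(3)
    ]
    return tuple(rotated[i] + a[i] for i in range(3))


def twist_bond(coords, bond, moving_atoms, angle):
    """Return new coords with `moving_atoms` rotated about bond (i, j)."""
    a, b = coords[bond[0]], coords[bond[1]]
    return [
        rotate_about_bond(atom, a, b, angle) if idx in moving_atoms else atom
        for idx, atom in enumerate(coords)
    ]


# Hypothetical four-atom chain; atoms 1 and 2 define the rotatable bond.
chain = [(-0.5, 1.4, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
flipped = twist_bond(chain, bond=(1, 2), moving_atoms={3}, angle=math.pi)
print(flipped[3])
```

Each rotatable bond adds one such angle to the six rigid-body parameters, which is why flexible-ligand docking is more time consuming.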
Drug Docking – Level of Detail
• Some methods such as FlexX perform
incremental construction within the binding
pocket rather than docking per se
Drug Docking – Techniques
• Drug docking algorithms share much with
protein structure prediction, and include:
– Monte Carlo search
– Molecular Dynamics
– Genetic Algorithms
– Fragment Assembly
– Tabu Search
– Many more…
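As an illustration of the first technique in the list, here is a minimal Metropolis Monte Carlo search. The scoring function is a hypothetical stand-in with a single known minimum, not a real docking score, and the step size, temperature and number of steps are arbitrary choices for this sketch.

```python
import math
import random


def toy_score(pose):
    """Hypothetical stand-in for a docking score; lowest (0.0) at (3, -2)."""
    x, y = pose
    return (x - 3.0) ** 2 + (y + 2.0) ** 2


def monte_carlo_search(score, start, steps=5000, step_size=0.5,
                       temperature=1.0, seed=0):
    """Random-walk search that keeps track of the best pose seen."""
    rng = random.Random(seed)
    current = start
    current_score = score(current)
    best, best_score = current, current_score
    for _ in range(steps):
        # Propose a small random perturbation of the current pose.
        candidate = tuple(c + rng.uniform(-step_size, step_size) for c in current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: always accept improvements, and accept
        # worse moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


start = (0.0, 0.0)
best_pose, best_score = monte_carlo_search(toy_score, start)
print(best_pose, best_score)
```

Accepting some worse moves is what lets the search escape local minima, which matters far more on a real, rugged energy surface than on this smooth toy function.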
Drug Docking
• When ligand and target are known, can allow
complete flexible docking
• For high-throughput screening (HTS), can usually only afford
rigid-body docking for the initial pass
• Location to dock to on protein target may be known
ahead of time, or may be computed through binding
pocket detection
– Often binding site can be predicted if 3D structure is
available using cavity-detection algorithms
• Search must be efficient, as with protein folding,
since exhaustive search is not possible
• Scoring function must be selective and efficient
Drug Docking Example
• Study by Thornton’s group (Nature Biotech.
22(8) (2004) p 1039-1045)
• Took 120 enzymes and 125 metabolites
from EcoCyc – subset of 29 complexes
have crystal structures
• Docked all-vs-all with AUTODOCK
• Energy plots for docking (a) and reverse docking (b) for the
subset of 29 with crystal structures; triangles represent the
crystal complex
• Note from (a) that enzymes are not that selective about
substrate, nor, from (b), are substrates that specific for enzyme
Drug Docking Example
• Computed P value – ability of substrate or enzyme to
recognize its partner based on energy distribution
• Now, with 4 exceptions, the docked pairs show that either
the enzyme OR the substrate OR both are specific
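One simple way to turn an energy distribution into a P value is to ask how often a non-partner docks at least as well as the true partner. The slide does not give the statistic used in the study, so the rank-based estimate below is a swapped-in illustration, not the published method; the function name and all energies are invented.

```python
def empirical_p_value(cognate_energy, decoy_energies):
    """Fraction of decoys that score at least as well as the true partner.

    Lower energy means better binding. The (k + 1) / (n + 1) convention
    keeps the estimate from being exactly zero with a finite decoy set.
    """
    as_good = sum(1 for energy in decoy_energies if energy <= cognate_energy)
    return (as_good + 1) / (len(decoy_energies) + 1)


# Hypothetical docking energies for one enzyme: its known substrate
# and nine other metabolites docked to the same site.
cognate = -9.5
decoys = [-4.1, -6.3, -5.0, -9.8, -3.2, -7.7, -5.5, -6.9, -4.8]

p = empirical_p_value(cognate, decoys)
print(p)
```

The same function works in the reverse direction (one substrate against many enzymes), so a pair can be called specific from the enzyme side, the substrate side, or both.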
Transition state
• Most potent inhibitors are not substrate analogues but
rather transition-state analogues
• Important to remember when screening compounds
Interaction Databases
• BIND (Protein-ligand interactions from PDB
and literature, SLRI)
• Het-PDB Navi (Protein-ligand interactions
from PDB, Nagahama Inst. Bio-Science)
• EcoCyc (metabolic pathways, SRI)
• KEGG (pathway database, Kyoto)
Blueprint’s
Small Molecule Resources
• BIND-3DSM Division
– 23,584 Filtered Small Molecule – Biopolymer interactions, automatically
derived from crystal structures
– Biologically insignificant records removed (i.e. crystal packing, non-biological
ions)
– Published: Biopolymers. 2001-2002; 61(2):111-20
• SMID
– 48886 records matching 4283 small molecules (from PDB structures) to 2807
protein families (CDD, SMART, PFAM)
• SMID-BLAST
– BLAST calibre tool to attach small molecule binding annotation (residue-level)
to genomic sequence
• SMID-Genomes
– SMID-BLAST vs all completely sequenced genomes
– 9.6 Million high-quality small molecule interaction annotations mapped to
sequences
– Database interface to browse/compare/investigate small molecule specificity
across organisms
www.bind.ca
A 3DSM Record
BIND record – binding site
Interaction Example
• Taxol is derived from natural products, and
was discovered to be effective against certain
types of cancer
• Interacts with tubulin and stabilizes the microtubules
that form the cell cytoskeleton, preventing
mitosis and leading to cell death
Visualizing Binding Sites
SMID
• http://smid.blueprint.org/
• Small Molecule Interaction Database
• Matches small molecule binding sites in structures
to protein domains in NCBI's Conserved Domain
Database
• 4283 small molecules from PDB
Creating SMID Records
• Start with an MMDB record (PDB record) containing more
than one “molecule”.
• Find atoms from one molecule in proximity
(0.5 Å) of atoms from another molecule.
• Interactions Found:
1) Residues 62, 74 & 83 interacting with smA.
2) Residues 321, 336, 345, 357, 371 & 401 interacting with smB.
[Figure: Protein A (ProA) with bound Small Molecule A (smA) and Small
Molecule B (smB); interacting residues are labelled by number]
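The proximity step above amounts to a distance test between every protein atom and every ligand atom. The sketch below shows the idea; the function name, the coordinates and the cutoff passed in the example are invented for illustration, and the cutoff is left as a parameter so that whatever distance criterion the pipeline uses can be supplied.

```python
import math


def find_contacts(protein_atoms, ligand_atoms, cutoff):
    """Residue numbers with any atom within `cutoff` Å of any ligand atom.

    protein_atoms: list of (residue_number, (x, y, z))
    ligand_atoms:  list of (x, y, z)
    """
    residues = set()
    for residue_number, atom_xyz in protein_atoms:
        if any(math.dist(atom_xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            residues.add(residue_number)
    return sorted(residues)


# Made-up coordinates: residues 62, 74 and 83 sit near the ligand,
# residue 100 and a second atom of residue 62 do not.
ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
protein = [
    (62, (3.0, 0.0, 0.0)),
    (62, (8.0, 0.0, 0.0)),
    (74, (0.0, 3.5, 0.0)),
    (83, (-2.0, -2.0, 1.0)),
    (100, (10.0, 10.0, 10.0)),
]

# The cutoff value here is an arbitrary choice for this toy data.
contacts = find_contacts(protein, ligand, cutoff=4.0)
print(contacts)
```

A residue counts as interacting as soon as one of its atoms is close enough, which is why residue 62 is reported even though its second atom is far away.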
Creating SMID Records
• RPS-BLAST all sequences found to interact with a small molecule
in order to obtain alignments with conserved domains (DomA &
DomB).
• Overlay small molecule – protein interaction on aligned
conserved domains.
• Interactions Found:
1) DomA (residues 98, 105 & 123) interacting with smA.
2) DomB (residues 31, 44, 62, 73 & 86) interacting with smB.
[Figure: ProA aligned to conserved domains DomA and DomB, with the
interacting residues renumbered to domain positions]
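Overlaying an interaction on a domain means translating protein residue numbers into domain positions through the alignment. A minimal sketch of that renumbering follows; the function name and the toy alignment are hypothetical, and real RPS-BLAST output would first have to be parsed into the two gapped strings and start positions used here.

```python
def map_to_domain(query_aln, domain_aln, query_start, domain_start):
    """Map query residue numbers to domain positions via a gapped alignment.

    query_aln / domain_aln: equal-length aligned strings, '-' marks a gap.
    query_start / domain_start: 1-based number of the first aligned residue.
    Residues aligned to a gap have no domain position and are left out.
    """
    mapping = {}
    q_pos, d_pos = query_start, domain_start
    for q_char, d_char in zip(query_aln, domain_aln):
        if q_char != "-" and d_char != "-":
            mapping[q_pos] = d_pos
        if q_char != "-":
            q_pos += 1
        if d_char != "-":
            d_pos += 1
    return mapping


# Toy alignment (made up): the protein segment starts at residue 60,
# the domain model at position 10.
mapping = map_to_domain("ACDE-FGH", "AC-EQFGH", query_start=60, domain_start=10)

binding_residues = [61, 62, 64]  # hypothetical small-molecule contacts
on_domain = {r: mapping[r] for r in binding_residues if r in mapping}
print(on_domain)
```

Residue 62 falls opposite a gap in the domain and is dropped, which is one way a protein-level binding site can end up with fewer positions at the domain level.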
Use Cases for SMID
• Domain Studies
– Binding site analysis
– Domain family binding site conservation
– Map a small molecule to the domain families that bind it
• Structural Genomics
– Domain/ligand/binding site identification
• Some ligands go over domain boundaries
– Easier pattern recognition for interactions
– Quickly identify candidate co-crystallization ligands
Taxol ligand conservation in
Tubulin/FtsZ domain family
SMID-BLAST
• Uses RPS-BLAST (unmodified) with a new scoring scheme to
improve domain family hits using specific ligand conservation
information
• Validation - 1652 new unique interactions deposited into PDB
– 1027 (62%) of these interactions are predicted within our selected
ligand score cutoff
– Of these 262 (25%) were top predictions
• This is very good, as the test set is not comprehensive…
– we do not have a set of all possible ligands to each protein crystal
structure
– we can only use exact small molecule matches (not similar
molecules, e.g. ATP vs ATP-gamma-S)
• Specificity – able to distinguish closely related Trp- and Tyr-
aminoacyl-tRNA synthetases that hit the same protein domain
families
Use Cases for SMID-BLAST
• Annotation of Newly Sequenced Genomes
– New enzyme discovery
– Rhodococcus genome
• William Mohn (UBC)
• Metabolic diversity
• PCB degradation
• Drug Docking
– Can help prioritize experiments
• Homology Modelling
– May help in template selection phase
SMID-BLAST Results: Summary
Summary
• Understanding and cataloguing biopolymer-small
molecule interactions is critical to the drug discovery
process
• Drug docking can help explain toxicity and side
effects, and can be useful in understanding the forces
behind interactions
• Transition state analogues make the best inhibitors
• Tools like SMID-BLAST provide a simple, powerful
way to predict what ligands may interact with a
protein, and vice-versa