John's lecture slides

Interpreting protein structure to discover new small molecules for biology
John J. Irwin
UCSF Pharmaceutical Chemistry
12/2/2010 @ University of San Francisco
Acknowledgements
Kong Nguyen
Michael Mysinger
Lab members
Pascal Wassam, Ryan Coleman,
Michael Mysinger
Shoichet Lab
Previous lab members
Eddie Cao, Francesco Colizzi, Niu Huang,
Michael Carchia, Kong Nguyen
SFSU computer science students
Teague Sterling, Cassidy Kelly, Gurgen Tumanian
USFCA computer science students
Laurgen Assour, Nathalie Le Guay, Dave Rice
NIH for funding
What is this molecule?
11 M patients
25 M potential patients
$25 Bn market
Atorvastatin
“Lipitor”
$12.4 billion/year
Lowers cholesterol
Simvastatin
“Zocor”
Chemical space
How big?
9 atoms ~ 10^6
11 atoms ~ 10^7
13 atoms ~ 10^9
35 atoms ~ 10^60
Lovastatin
Rosuvastatin aka “Crestor”
Mevastatin
Pitavastatin
Fluvastatin
Cerivastatin
Atorvastatin (Lipitor) (41 atoms)
Screening for Novel Inhibitors by Molecular Docking
dock, then test high-scoring molecules
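The screen itself reduces to a ranking step: dock every molecule, sort by score, and send the best few to the bench. A minimal sketch, assuming lower (more negative) scores are better; the molecule IDs and scores are made up for illustration.

```python
import heapq

def pick_for_testing(scores, n):
    """Return the IDs of the n best-scoring molecules.

    Assumes `scores` maps molecule ID -> docking score and that a more
    negative score means a better predicted fit.
    """
    best = heapq.nsmallest(n, scores.items(), key=lambda item: item[1])
    return [mol_id for mol_id, _ in best]

# Hypothetical docking scores, for illustration only.
scores = {"ZINC_A": -42.1, "ZINC_B": -17.3, "ZINC_C": -55.0, "ZINC_D": -30.8}
print(pick_for_testing(scores, 2))  # ['ZINC_C', 'ZINC_A']
```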
Calculating orientations in DOCK
Hot spots on protein surface
Match ligand atoms onto hot spots using internal distances
Thousands of orientations per molecule
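The matching idea can be illustrated with a brute-force search: keep every pairing of three ligand atoms onto three hot spots whose internal distances agree within a tolerance. This is a simplified stand-in written for clarity, not DOCK's actual matching algorithm, which is far more efficient; the 0.5 Å tolerance and the coordinates are hypothetical. Each surviving match would then be turned into an orientation by least-squares superposition (not shown).

```python
import itertools
import math

def distance_compatible_matches(atoms, spheres, k=3, tol=0.5):
    """Enumerate k-point pairings of ligand atoms onto hot-spot spheres.

    A pairing is kept when every ligand atom-atom distance matches the
    corresponding sphere-sphere distance within `tol` (Angstrom).
    Brute force over all atom subsets and sphere orderings: illustration only.
    """
    matches = []
    for atom_idx in itertools.combinations(range(len(atoms)), k):
        for sph_idx in itertools.permutations(range(len(spheres)), k):
            pairs = list(zip(atom_idx, sph_idx))
            if all(
                abs(math.dist(atoms[a1], atoms[a2]) - math.dist(spheres[s1], spheres[s2])) <= tol
                for (a1, s1), (a2, s2) in itertools.combinations(pairs, 2)
            ):
                matches.append(pairs)
    return matches

# Hypothetical ligand triangle and hot spots (a translated copy plus one distant sphere).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
hot_spots = [(10.0, 10.0, 10.0), (11.5, 10.0, 10.0), (10.0, 11.5, 10.0), (20.0, 20.0, 20.0)]
for match in distance_compatible_matches(ligand, hot_spots):
    print(match)
```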
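Scoring orientations
$\Delta G_{bind} = \Delta G_{interact} - \Delta G_{solv,L} - \Delta G_{solv,R}$
$\Delta G_{interact} = \sum_{i,j}\left(\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + q_i P_j\right)$
$\Delta G_{solv,L} = \frac{1}{2r}\left(\frac{1}{D_0} - \frac{1}{D_w}\right)\sum_i\sum_j Q_i\,\delta q_j + \Delta H_{np}$ (precalculated)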
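The interaction term is a double sum over ligand atoms i and receptor atoms j. A minimal sketch, assuming a single pair of Lennard-Jones coefficients A and B for every atom pair (real force fields use per-type A_ij and B_ij) and reading the electrostatic term as the ligand charge q_i times a receptor potential evaluated at that ligand atom's position, which is an interpretation of the slide's P notation. The two desolvation terms are taken as precomputed inputs.

```python
import math

def lj_pair(r, A, B):
    """Lennard-Jones pair term A/r^12 - B/r^6."""
    return A / r**12 - B / r**6

def interaction_energy(ligand, receptor, A, B, potential):
    """van der Waals double sum plus electrostatics.

    ligand:    list of (x, y, z, q) tuples
    receptor:  list of (x, y, z) tuples
    A, B:      one hypothetical coefficient pair shared by all atom pairs
    potential: callable (x, y, z) -> receptor electrostatic potential (assumed given)
    """
    vdw = sum(lj_pair(math.dist(atom[:3], r_atom), A, B)
              for atom in ligand for r_atom in receptor)
    elec = sum(q * potential(x, y, z) for x, y, z, q in ligand)
    return vdw + elec

def binding_energy(dg_interact, dg_solv_lig, dg_solv_rec):
    """dG_bind = dG_interact - dG_solv,L - dG_solv,R."""
    return dg_interact - dg_solv_lig - dg_solv_rec
```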
Scoring on a grid
Lennard-Jones potential: $f(r) = \frac{A}{r^{12}} - \frac{B}{r^{6}}$
[Plot: energy (kcal/mol, −5 to 20) vs. radius (0 to 3 Å); annotated X = 1.50 Å]
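Scoring on a grid means the receptor's contribution is computed once at regularly spaced points and then looked up for each ligand atom, instead of being re-summed over every receptor atom for every orientation. A minimal sketch, assuming a single Lennard-Jones probe type, trilinear interpolation between the eight surrounding grid points, and a hypothetical 0.5 Å distance floor that keeps overlapping points finite.

```python
import math

def build_grid(receptor, A, B, origin, spacing, shape, r_floor=0.5):
    """Precompute the receptor Lennard-Jones energy for a probe at each grid point.

    r_floor (hypothetical) caps the repulsion so points on top of atoms stay finite.
    """
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                p = (origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing)
                energy = 0.0
                for atom in receptor:
                    r = max(math.dist(p, atom), r_floor)
                    energy += A / r**12 - B / r**6
                grid[i][j][k] = energy
    return grid

def grid_energy(grid, origin, spacing, point):
    """Trilinear interpolation of the grid at an interior point."""
    f = [(point[d] - origin[d]) / spacing for d in range(3)]
    i0 = [math.floor(c) for c in f]
    t = [f[d] - i0[d] for d in range(3)]
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # weight of each corner of the surrounding grid cell
                w = ((t[0] if di else 1 - t[0])
                     * (t[1] if dj else 1 - t[1])
                     * (t[2] if dk else 1 - t[2]))
                value += w * grid[i0[0] + di][i0[1] + dj][i0[2] + dk]
    return value
```

The grid is built once per target; during docking each atom costs eight lookups regardless of how many receptor atoms there are.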
The docking problem: flexible fits
• Docking scales badly with degrees of freedom
– Configuration, conformation, chemistry
• Ligand & protein conformations ∝ 3^N (N = rotatable bonds)
[Plot: time (minutes, log scale 10^0 to 10^8) vs. number of rotatable bonds (0 to 14)]
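The 3^N scaling follows from a systematic search if each rotatable bond is sampled at three torsion values (an assumption here, matching the three staggered minima of an sp3-sp3 bond). Each extra rotatable bond triples the number of conformations to score:

```python
def conformer_count(n_rotatable, states_per_bond=3):
    """Number of conformations in a systematic torsion search."""
    return states_per_bond ** n_rotatable

for n in (0, 2, 4, 8, 14):
    print(f"{n:2d} rotatable bonds -> {conformer_count(n):,} conformations")
```

Going from 4 to 14 rotatable bonds multiplies the work by 3^10 = 59,049, which is why docking time climbs so steeply with ligand flexibility.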
Hierarchical docking
[Figure: ligand atoms A, B, and C with alternative positions A1 to A3, B1 to B9, C1 to C3]
Flexible docking: 27 confs × 3 atoms = 81 atom positions (81 evaluations)
Hierarchical docking: 27 confs, 3C + 3A + 9B = 15 atom positions (9 evaluations)
Neglects internal energies
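The saving in hierarchical docking comes from scoring each distinct atom position once and reusing it across every conformation that contains it. The sketch assumes a hypothetical tree for a three-atom ligand: A has 3 positions, B has 3 positions for each A position (B1 to B9), and C has 3 positions independent of the others. That gives the slide's 27 conformations, 81 atom positions for the naive search, and 15 for the hierarchical one. Like the slide's scheme, it neglects internal energies: a conformation's score is just the sum of its atom scores.

```python
# Hypothetical hierarchy: B positions depend on A; C positions are independent.
A_POS = ["A1", "A2", "A3"]
B_POS = {"A1": ["B1", "B2", "B3"], "A2": ["B4", "B5", "B6"], "A3": ["B7", "B8", "B9"]}
C_POS = ["C1", "C2", "C3"]

def conformations():
    """Yield every (A, B, C) position combination: 3 x 3 x 3 = 27."""
    for a in A_POS:
        for b in B_POS[a]:
            for c in C_POS:
                yield (a, b, c)

def flexible_search(score_atom):
    """Score every atom of every conformation from scratch."""
    evaluations, best = 0, None
    for conf in conformations():
        total = 0.0
        for pos in conf:
            total += score_atom(pos)
            evaluations += 1
        if best is None or total < best[0]:
            best = (total, conf)
    return best, evaluations

def hierarchical_search(score_atom):
    """Score each distinct atom position once, then reuse the cached value."""
    cache = {}
    def cached(pos):
        if pos not in cache:
            cache[pos] = score_atom(pos)
        return cache[pos]
    best = min((sum(cached(p) for p in conf), conf) for conf in conformations())
    return best, len(cache)
```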
Why is Docking Difficult?
Binding sites are complicated
Lots of interactions to consider
Everything in competition with water
$K_d = e^{\Delta G_{bind}/RT}$
$\Delta G_{bind} = \Delta G_{inter} - \Delta G_{solv}$: a small difference between two large terms
[Figure: a phosphate group (P, O, −O) shown solvated (x) and bound to positively charged (+) groups]
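Because Kd depends exponentially on ΔG_bind, a small error in the difference of two large terms becomes a large error in predicted affinity: at 298 K, every RT ln 10 ≈ 1.36 kcal/mol shifts Kd ten-fold. A minimal sketch, assuming a 1 M standard state and ΔG in kcal/mol; the ΔG_inter and ΔG_solv values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def kd_from_dg(dg_bind, temp_k=298.15):
    """Kd in M (1 M standard state) from the binding free energy in kcal/mol."""
    return math.exp(dg_bind / (R_KCAL * temp_k))

def dg_from_kd(kd, temp_k=298.15):
    """Inverse relation: dG_bind = RT ln Kd."""
    return R_KCAL * temp_k * math.log(kd)

# Hypothetical values: two large terms, one small difference.
dg_inter, dg_solv = -60.0, -52.0
dg_bind = dg_inter - dg_solv              # -8.0 kcal/mol
shifted = (dg_inter - 1.5) - dg_solv      # a 1.5 kcal/mol (2.5%) error in dG_inter
print(kd_from_dg(dg_bind) / kd_from_dg(shifted))  # fold change in predicted Kd
```

Here a 2.5% error in ΔG_inter alone moves the predicted Kd by more than ten-fold.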
Predicted docking pose
tested by crystallography (1)
CTX-M β-lactamase
Chen Y, Shoichet BK. Molecular docking and ligand specificity in fragment-based inhibitor discovery. Nature Chemical Biology 5, 358-364 (2009).
Teotico DG, Babaoglu K, Rocklin GJ, Ferreira RS, Giannetti AM, Shoichet BK. Docking for fragment inhibitors of AmpC β-lactamase. PNAS 106 (18), 7455-60 (2009).
Predicted docking pose
tested by crystallography (2)
Babaoglu K, et al. Austin CP, Shoichet BK. Comprehensive mechanistic analysis of hits
from high-throughput and docking screens against beta-lactamase. J Med Chem 51,
2502-11 (2008).
Graves AP, Shivakumar DM, Boyce SE, Jacobson MP, Case DA, Shoichet BK. Rescoring docking hit lists for model cavity sites: predictions and experimental testing. J Mol Biol 377, 914-34 (2008).
T4 lysozyme L99A
Predicted docking pose
tested by crystallography (3)
T4 lysozyme. L99A/M102Q
Cytochrome C Peroxidase W191G
Graves AP, Shivakumar DM, Boyce SE, Jacobson MP, Case DA, Shoichet BK.
Rescoring docking hit lists for model cavity sites: predictions and experimental testing. J
Mol Biol 377, 914-34 (2008).
Predicted docking pose
tested by crystallography (4)
Hermann JC, Marti-Arbona R, Fedorov AA, Fedorov E, Almo SC, Shoichet BK, Raushel FM. Structure-based activity prediction for an enzyme of unknown function. Nature 448, 775-9 (2007).
Use of DOCK to predict the substrate of an enzyme of previously unknown function.
Predicted docking pose
tested by crystallography (5)
[Figure labels: Leu177, His175, Asp235, Met230, Wat308]
Brenk R, Vetter SW, Boyce SE, Goodin DB, Shoichet BK. Probing molecular docking in a charged model binding site. J Mol Biol 357, 1449-70 (2006).
CCP W191G
[Figure labels: L84, V87, Q102, V103, V111, F114, L118, L121]
Wei BQ, Weaver LH, Ferrari AM, Matthews BW, Shoichet BK. Testing a flexible-receptor docking algorithm in a model binding site. J Mol Biol 337 (5), 1161-82 (2004).
M102A
Predicted docking pose
tested by crystallography (6)
Wei BQ, Baase WA, Weaver LH, Matthews BW, Shoichet BK. A model binding site for testing scoring functions in molecular docking. J Mol Biol 322, 339-55 (2002).
M102Q
Powers RA, Shoichet BK. Structure-based approach for binding site identification on AmpC β-lactamase. J Med Chem 45, 3222-34 (2002).
Backlog of uninterpreted proteins
Unknown function of many proteins
Why is Docking Difficult to Automate?
Structure
Interpretation
of structure
Why is Docking Difficult to Automate?
Database
preparation
Structure
Interpretation
of structure
The ZINC Database
http://zinc.docking.org
18 million compounds
commercially available
structures calculated
multiple conformations
properties (charge, solv, etc…)
links to suppliers
Free to the community
Multiple subsets
13.8 M drug-like (Lipinski)
3.8 M lead-like (Oprea…)
385 K fragment-like (Astex, …)
Available in popular formats
SMILES, SDF, mol2, flexibase
Irwin & Shoichet JCIM 2005
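The drug-like and fragment-like subsets are defined by simple property cutoffs. A minimal sketch of two widely used rules, assuming the properties are already computed: Lipinski's rule of five (applied strictly here, although many filters tolerate one violation) and the Astex "rule of three" for fragments. ZINC's exact subset criteria may differ, and the `Props` record and molecule names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Props:
    """Precomputed properties of one molecule (hypothetical container)."""
    mw: float    # molecular weight, Da
    logp: float  # calculated logP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors

def passes_rule_of_five(p: Props) -> bool:
    """Lipinski: MW <= 500, logP <= 5, donors <= 5, acceptors <= 10 (no violations allowed)."""
    return p.mw <= 500 and p.logp <= 5 and p.hbd <= 5 and p.hba <= 10

def passes_rule_of_three(p: Props) -> bool:
    """Fragment 'rule of three': MW < 300, logP <= 3, donors <= 3, acceptors <= 3."""
    return p.mw < 300 and p.logp <= 3 and p.hbd <= 3 and p.hba <= 3

library = {
    "mol_1": Props(420.0, 3.8, 2, 6),
    "mol_2": Props(210.0, 1.2, 1, 2),
    "mol_3": Props(640.0, 6.1, 4, 9),
}
drug_like = [name for name, p in library.items() if passes_rule_of_five(p)]
fragment_like = [name for name, p in library.items() if passes_rule_of_three(p)]
print(drug_like, fragment_like)
```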
Rapid Turnover!
Updated continuously (~10,000 new today)
Over 2 million new compounds per year
Over 1 million depletions per year
New Search Tool
Search 19M+ molecules in seconds
Why is Docking Difficult to Automate?
dock
Database
preparation
Structure
Running docking:
Site preparation
Software configuration
Parameter choices
File manipulations
Interpretation
of structure
Automated Docking Pipeline
Irwin*, Shoichet, Mysinger et al., J Med Chem 2009, 52(18), 5712-5720
Try Docking Four Ways
Sampling: coarser, finer
Scoring: polarized, AMBER
Runs #1 to #4 cover each sampling/scoring combination.
Thus four docking runs with four different parameter sets.
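The four runs are the Cartesian product of two sampling levels and two scoring variants. A minimal sketch that enumerates them and keeps the run with the best enrichment; the dictionary keys, the run numbering, and choosing the winner by an enrichment score are assumptions for illustration, not the pipeline's documented behavior.

```python
import itertools

SAMPLING = ("coarser", "finer")
SCORING = ("polarized", "AMBER")

def parameter_sets():
    """One hypothetical parameter dict per sampling x scoring combination."""
    return [
        {"run": n, "sampling": s, "scoring": c}
        for n, (s, c) in enumerate(itertools.product(SAMPLING, SCORING), start=1)
    ]

def best_run(enrichment_by_run):
    """Pick the run whose (hypothetical) enrichment score is highest."""
    return max(enrichment_by_run, key=enrichment_by_run.get)

for params in parameter_sets():
    print(params)
```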
Start with a PDB Code
Pick a PDB Code for Docking. Click DOCK!
Why is Docking Difficult to Automate?
[Plot: % of known ligands found (0 to 100) vs. % of ranked database (0.1 to 100, log scale); curves for dock and ACE (automated)]
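The y value on this plot is the fraction of known ligands recovered within a given top fraction of the ranked database. A minimal sketch computing it, plus the enrichment factor (a common one-number summary that is not shown on the slide), assuming the ranking is a list of booleans in score order with True marking a known ligand:

```python
import math

def _cutoff(n, fraction):
    # number of top-ranked molecules in the first `fraction` of the list (at least 1);
    # the small epsilon guards against float round-up such as 10.000000000000002
    return max(1, math.ceil(fraction * n - 1e-9))

def ligands_found(ranked_is_ligand, fraction):
    """Fraction of all known ligands ranked within the top `fraction` of the database."""
    top = _cutoff(len(ranked_is_ligand), fraction)
    return sum(ranked_is_ligand[:top]) / sum(ranked_is_ligand)

def enrichment_factor(ranked_is_ligand, fraction):
    """Ligand recovery at the cutoff divided by what random ranking would give."""
    n = len(ranked_is_ligand)
    return ligands_found(ranked_is_ligand, fraction) / (_cutoff(n, fraction) / n)
```

A random ranking gives an enrichment factor near 1; the steep early rise of a good curve corresponds to a large value at a 1% cutoff.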
Assessment.
Did docking work?
Structure
Interpretation
of structure
Assessment: Review Docking Hits
Browse using Chimera or PyMOL
DUD
A Multi-Ligand Metric
13,500 protein targets (http://www.rcsb.org/pdb)
• X-ray, ≤ 2.5 Å
• Non-covalent organic ligand
1825 ligand sets for targets (http://www.ebi.ac.uk/chembl)
• 5 or more ligands
• 10 µM affinity or better
733 ligand sets; 7408 PDB codes
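The selection amounts to two filters, one on structures and one on ligand sets. A minimal sketch, assuming hypothetical record fields and reading the ligand-set criteria as "at least 5 ligands with affinity of 10 µM or better", with affinities given in nM:

```python
def usable_structure(entry, max_resolution=2.5):
    """X-ray structure at <= 2.5 A resolution with a non-covalent organic ligand."""
    return (
        entry["method"] == "X-RAY"
        and entry["resolution"] <= max_resolution
        and entry["noncovalent_organic_ligand"]
    )

def usable_ligand_set(affinities_nm, min_ligands=5, max_affinity_nm=10_000):
    """At least `min_ligands` ligands at 10 uM (10,000 nM) or better."""
    return sum(1 for a in affinities_nm if a <= max_affinity_nm) >= min_ligands
```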
Multi-Ligand Enrichment Results
• 7408 PDB structures with ligands
• 4826 automatic docking starts (65%)
• 4018 automatic docking completes (54%)
• 2500+ with good enrichment => docking works
How Well Does Docking Work?
Less Hopeless Than We Feared
• 134/345 targets attempted
• Showing results for 114
[Left plot: % of known ligands found (0 to 100) vs. % of ranked database/decoys (0.1 to 100, log scale) for ACE (automated)]
[Right plot: Adj.LogAUC (−10 to 45) for each of the 114 targets; bands of 8 (7%), 65 (57%), and 41 (36%) targets]
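The Adj.LogAUC values summarize each target's enrichment curve with extra weight on early recognition. A minimal sketch, assuming the usual definition of this metric: plot the fraction of ligands found against the fraction of ranked decoys on a log10 axis from 0.1% to 100%, take the area normalized so a perfect ranking scores 1.0, and subtract the area of the random diagonal (about 0.145). The plot appears to report the result multiplied by 100. Trapezoidal integration in log space and the input format (booleans in rank order, True for a known ligand, at least one ligand and one decoy) are choices made here.

```python
import math

def enrichment_points(ranked_is_ligand):
    """(fraction of decoys ranked, fraction of ligands found) after each molecule."""
    n_lig = sum(ranked_is_ligand)
    n_dec = len(ranked_is_ligand) - n_lig
    points = [(0.0, 0.0)]
    lig = dec = 0
    for is_lig in ranked_is_ligand:
        if is_lig:
            lig += 1
        else:
            dec += 1
        points.append((dec / n_dec, lig / n_lig))
    return points

def log_auc(points, lam=0.001):
    """Area under the curve with x on a log10 axis from lam to 1, normalized to 1."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 <= x0 or x1 <= lam:
            continue  # vertical step, or segment entirely left of the cutoff
        if x0 < lam:
            # clip the segment at x = lam by linear interpolation
            y0 = y0 + (y1 - y0) * (lam - x0) / (x1 - x0)
            x0 = lam
        area += 0.5 * (y0 + y1) * (math.log10(x1) - math.log10(x0))
    return area / math.log10(1.0 / lam)

def random_log_auc(lam=0.001):
    """LogAUC of the random diagonal y = x, integrated analytically."""
    return (1.0 - lam) / (math.log(10) * math.log10(1.0 / lam))

def adjusted_log_auc(ranked_is_ligand, lam=0.001):
    return log_auc(enrichment_points(ranked_is_ligand), lam) - random_log_auc(lam)
```

A ranking no better than chance scores near zero, and negative values mean the ligands were ranked worse than random.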
Opportunities
• Research
– Discover new ligands for proteins
– Discover protein function using docking
• Programming / engineering
– Website (re-)design and implementation
– Database (re-)design and implementation
Summary
DUD
• Docking has been automated
• Compelling results for more than half of all targets for which data are available
• Algorithms and hypotheses can now be tested on a massive scale
• Docking is freely accessible