Molecular Docking
Tuomo Laitinen, PhD (Chem.)
School of Pharmacy
UEF
Flow of slides…
• Introduction
• Search algorithms
• Scoring functions
• Reliability / limitations
• Performance
Molecular Docking
“Docking studies are computational techniques for the exploration of the possible binding modes of a substrate to a given receptor, enzyme or other binding site.”
(PAC, 1997, 69, 1137 (IUPAC Recommendations 1997), on page 1142)
“Docking is a computational technique that samples conformations of small molecules in protein binding sites; scoring functions are used to assess which of these conformations best complements the protein binding site.” (Warren et al., J. Med. Chem. 2006)
Slide made by Heikki Salo.
In theory, there is no difference between theory and practice. But, in practice, there is.
(Jan L. A. van de Snepscheut/Yogi Berra)
What is docking?
• Protein-small ligand
• Protein-protein
Protein-protein interaction is key to understanding cellular events
University of Kuopio, Pharmaceutical Chemistry
Key-lock theory
• The specific action of a protein with a single substrate can be explained using a lock-and-key analogy first postulated in 1894 by Emil Fischer.
– the lock is the enzyme and the key is the substrate.
– only the correctly sized key (substrate) fits into the keyhole (active site) of the lock (enzyme).
Key lock theory
9 June, 2016 Tuomo Laitinen
Molecular databases
Induced Fit Theory
• Not all experimental evidence can be adequately explained by the key-lock theory.
• Assumes that the substrate plays a role in determining the final shape of the enzyme and that the enzyme is partially flexible.
– explains why certain compounds can bind to the enzyme but do not react: the enzyme has been distorted too much.
– other molecules may be too small to induce the proper alignment and therefore cannot react
– only the proper substrate is capable of inducing the proper alignment of the active site
Molecular recognition
• A central phenomenon in biochemistry.
– enzymes and their substrates
– protein receptors and ligands
– antigens and their antibodies
– etc.
• Approaches to investigate molecular recognition
– Molecular docking
– Free energy calculations
• MM-GBSA, FEP, TI, LIE,…
– QM/MM methods
• Part of the system is treated with QM and the rest with MM.
• ONIOM method developed by Morokuma et al., 1996
Some definitions:
• pKd measures tightness of binding
• pKi measures ability to inhibit
• Free energy of binding
• ΔG = ΔH − TΔS (ΔH enthalpy, ΔS entropy)
• Mechanisms
– competitive inhibition (most typical case in docking)
– allosteric inhibition
• Inhibitor binds to a different pocket
– allosteric activation
• activator first binds to another pocket and activates the enzyme
Energetics of binding
• Gibbs energy of binding
• ΔG = ΔH − TΔS (ΔH enthalpy, ΔS entropy)
• and ΔG = RT ln Ki = ΔH − TΔS (Ki taken as a dissociation-type constant; equivalently ΔG = −RT ln Ka with Ka = 1/Ki)
• Molecular recognition depends on both enthalpy (ΔH) and entropy (ΔS)
• Enthalpy
– Direct interactions between ligand, solvent, proteins, ions
• Ligand-protein interaction
• Ligand-solvent interaction
• Solvent-protein interaction
• Conformational changes during binding
• Entropy
– Rotational and translational entropy
– Conformational entropy
– Solvent reorganization (hydrophobicity)
– Vibrational entropy
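As a worked illustration of these relations, the following Python sketch converts an inhibition constant into a binding free energy and splits it into enthalpic and entropic parts. It assumes that Ki behaves as a dissociation-type constant referenced to a 1 M standard state (so ΔG = RT ln Ki) and that T = 298.15 K; the function names and the example ΔH value are illustrative and do not come from the slides.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(ki_molar, temperature=298.15):
    """Return dG in kcal/mol for a dissociation-type constant Ki given in mol/L.

    Assumes a 1 M standard state, i.e. dG = RT ln(Ki / 1 M).
    """
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return R_KCAL * temperature * math.log(ki_molar)


def p_constant(k_molar):
    """Return pK = -log10(K), e.g. pKi or pKd."""
    return -math.log10(k_molar)


def entropic_term(delta_g, delta_h):
    """Return -T*dS in kcal/mol, rearranged from dG = dH - T*dS."""
    return delta_g - delta_h


# Example: a 50 nM inhibitor with a hypothetical measured dH of -6.0 kcal/mol.
dg = binding_free_energy(50e-9)
minus_t_ds = entropic_term(dg, delta_h=-6.0)
```

A tighter binder has a smaller Ki, a larger pKi and a more negative ΔG under this convention.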
Prediction of binding energetics
• How to estimate binding
– Entropy is usually size-dependent (rotational, translational, conformational, vibrational)
• Are waters released from the cavity on binding? How tightly were they bound?
• Can be measured using calorimetric methods
– Isothermal titration calorimetry (ITC)
– Differential scanning calorimetry (DSC)
• Difficult to estimate computationally
– Hydrophobicity is also connected to size
– Enthalpy usually covers the direct binding effects PLUS solvent effects
• Conformational effects also affect the enthalpic component
What is docking?
• Finding the right docking pose
• AND distinguishing right from wrong docking poses
• AND distinguishing high-affinity from not-so-high-affinity compounds
Molecular Docking – Three Tasks of Molecular Docking
• Binding mode prediction
• Binding affinity prediction: predicted affinity (e.g. ΔG prediction) vs. experimental measurements (e.g. ITC ΔG measurement)
• Relative binding affinity prediction
[Figure: docked poses labeled with example scores −9.1, −9.3, −7.6 and −8.5]
Slide made by Heikki Salo.
Usage of Molecular Docking:
• Reproducing the binding mode of an X-ray ligand
• Predicting the binding mode of known active ligands
• Predicting the binding affinities of related compounds from a known active series
• Identifying new ligands using virtual screening
Amy C. Anderson. The process of structure-based drug design. Chem. Biol. 10:787-797, 2003
[Flowchart: structure-based drug design cycle]
• Choose drug target
• Determine 3D structure (X-ray crystallography or NMR), or homology modeling
• Analyze inhibitor binding sites
• Dock compound database to selected sites
• Select a subset for in vitro testing
• Is the lead a µM inhibitor? (yes / no)
• Determine 3D structure of ligand-protein complex (X-ray crystallography or NMR)
• Analyze interactions
• In silico optimization
• Is the lead a nM inhibitor? (yes / no)
• Can the lead be modified? (yes / no)
• A potential drug candidate is passed on to further drug development phases
Slide made by Heikki Salo.
Approaches for molecular docking
• Rigid ligand – rigid protein
– Historically the first approaches
– Search for the relative orientation of the two molecules with the lowest energy
– Conformational analysis is needed for both ligand and protein
– FLOG (Flexible Ligands Oriented on Grid): each ligand is represented by up to 25 low-energy conformations.
• Flexible ligand – rigid protein
– Several protein structures are needed
– GOLD, AutoDock, Glide
• Both ligand and protein are flexible
– More time-consuming
– "Induced Fit" methods (Glide, MOE)
– Side-chain flexibility: Surflex, GOLD, AutoDock, ...
Crude workflow of molecular docking:
• Binding mode prediction
– a search algorithm that finds the docked complex structure ranked best by the scoring function
– CPU-time consumption is critical
– local minima
• Binding affinity prediction, e.g. ranking
– a scoring function that can discriminate the correct (experimentally observed) docked complex structure from incorrect ones
– strict control of false positives
– good correlation with pKd
– (Note: pKd does not always correlate with activity)
– No consensus
– Multiple terms
Search algorithms:
• 1) Stochastic search
• 2) Incremental construction
• 3) Multiconformer
– Generation of a set of low-­‐energy conformers for ligands
– Rigid docking
– FRED, FLOG
Stochastic search:
• Simulated annealing, MC, genetic algorithm, tabu search
• GOLD (GA), Glide (MC), AutoDock 4.0 (Lamarckian genetic algorithm)
• Monte Carlo simulated annealing (MCSA)
– Random
– Outcome varies
– Repeat to improve chances of success
– Glide
• an initial rough positioning
• torsionally flexible energy optimization (OPLS-AA)
• MC refinement for energetically best poses
– AutoDock 2.4
• random changes in the ligand's orientation and conformation in each temperature cycle
• a new state is accepted (1) if its energy is lower than that of the previous state, (2) otherwise with a temperature-dependent probability (Metropolis criterion)
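The acceptance rule of Monte Carlo simulated annealing can be sketched in a few lines of Python. This is a minimal illustration, not the AutoDock or Glide implementation: the one-dimensional `toy_energy`, the `toy_perturb` move and the cooling schedule are assumptions chosen only to make the example runnable, and the temperature is expressed in energy units (kT).

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(energy, perturb, start, t_start=5.0, t_end=0.05,
                        cooling=0.9, steps_per_cycle=50, seed=0):
    """Run temperature cycles of random moves and return the best state seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_cycle):
            candidate = perturb(current, rng)
            e_candidate = energy(candidate)
            if metropolis_accept(e_candidate - e_current, temperature, rng):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        temperature *= cooling  # cool down before the next cycle
    return best, e_best


def toy_energy(x):
    """Hypothetical 1D energy surface with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def toy_perturb(x, rng):
    """Hypothetical move: a small random displacement."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(toy_energy, toy_perturb, start=4.0)
```

Because the search is random, different seeds can end in different minima, which is why repeated runs are recommended.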
Genetic Optimization for Ligand Docking (GOLD)
• GA (a genetic algorithm)
– mimics the process of evolution
– initially a population of conformations is generated
– a scoring algorithm evaluates the fitness of each conformation
• => conformation = chromosome
– genetic operations (crossover, mutation)
• GOLD has fitness functions:
– GoldScore or ChemScore
– Calculations based on chemical and physical theories
• Geometrical properties
• Bonding affinities
• Full ligand flexibility
• Partial protein flexibility,
– including protein side chain and backbone flexibility for up to ten user-defined residues
– the ability to dock into multiple models of the same or different proteins, i.e. ensemble docking
Stochastic algorithms continued...:
• Tabu search
– limits the conformational search space
• imposes restrictions to help the search process negotiate difficult regions
• difficult regions are listed and the search is prevented from entering them again
[Flowchart: random initial solution → evaluate and update tabu list → generate moves → rank moves by interaction energy → examine each move: accept if it has the lowest energy so far or is not on the tabu list, otherwise reject]
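The tabu loop described above can be sketched as follows. This is an illustrative toy, not the code of any docking program: the state is a tuple of discretized torsion bins, and `toy_energy`, `toy_neighbours` and the tabu-list length are hypothetical choices.

```python
from collections import deque


def tabu_search(energy, neighbours, start, n_iter=200, tabu_size=25):
    """Move to the best-ranked neighbour that is either a new overall best
    or not on the tabu list; remember visited states in the tabu list."""
    current = start
    best, e_best = start, energy(start)
    tabu = deque([start], maxlen=tabu_size)  # oldest entries drop out
    for _ in range(n_iter):
        ranked = sorted(neighbours(current), key=energy)  # lowest energy first
        chosen = None
        for move in ranked:
            if energy(move) < e_best or move not in tabu:
                chosen = move
                break
        if chosen is None:  # every neighbour is tabu
            break
        current = chosen
        tabu.append(current)
        e_current = energy(current)
        if e_current < e_best:
            best, e_best = current, e_current
    return best, e_best


def toy_energy(state, targets=(3, 8, 5)):
    """Hypothetical energy: squared distance of each torsion bin from a target."""
    return sum((t - target) ** 2 for t, target in zip(state, targets))


def toy_neighbours(state, n_bins=12):
    """All states reachable by changing one torsion by one bin (with wrap-around)."""
    moves = []
    for i, value in enumerate(state):
        for step in (-1, 1):
            changed = list(state)
            changed[i] = (value + step) % n_bins
            moves.append(tuple(changed))
    return moves


best_state, best_energy = tabu_search(toy_energy, toy_neighbours, start=(0, 0, 0))
```

Once the minimum is reached, the tabu list forces the search to keep exploring instead of cycling back, while the best state found is kept separately.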
Incremental construction:
– Ligand is divided into single fragments
– Incrementally reconstructed inside the active site using preferred torsional angles
– DOCK, FlexX, Surflex-Dock
Protocol
1. Fragmentation
2. Selection of anchor fragment
- specificity
- placeability
3. Anchor fragment placement
4. Incremental addition of other fragments
[Figure: example ligand split into fragments (labels SO2CH3, Cl, COOH)]
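Step 4 of the protocol can be pictured as a greedy, keep-the-best-few search over torsion choices. The sketch below assumes the anchor fragment is already placed and represents each added fragment only by the torsion angle chosen for it; the scoring function, the angle set and the `beam_width` parameter are hypothetical and stand in for a real placement and scoring step.

```python
def incremental_construction(n_fragments, torsion_choices, score, beam_width=3):
    """Add fragments one at a time, keeping only the best partial solutions.

    A partial solution is the tuple of torsion angles chosen so far;
    lower scores are better.
    """
    partial = [()]  # anchor placed, no further fragment added yet
    for _ in range(n_fragments):
        extended = [p + (angle,) for p in partial for angle in torsion_choices]
        extended.sort(key=score)
        partial = extended[:beam_width]  # discard poorly scoring partial ligands
    return partial[0]


def toy_score(torsions, ideal=(180, 60, -60)):
    """Hypothetical score: total deviation from an 'ideal' torsion per fragment."""
    return sum(abs(angle - ideal[i]) for i, angle in enumerate(torsions))


best_torsions = incremental_construction(
    n_fragments=3, torsion_choices=(-60, 60, 180), score=toy_score)
```

Keeping several partial solutions rather than one reduces the risk that an early, locally good choice blocks a better complete placement.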
Surflex-­‐Dock (Sybyl)
• Surflex-Dock was developed by A. N. Jain (J. Med. Chem. (2003), 46, 499-511)
• Uses an empirical scoring function
– based on the binding affinities of protein-ligand complexes and on their X-ray structures
– terms: hydrophobic, polar, repulsive, entropic, solvation
– scores are expressed in −log10(Kd) units to represent binding affinities
• Successful at eliminating false positives
• Note: Surflex-Dock results depend on having a properly typed input ligand!
Identify active site (cavity)
Probe binding pocket
Protein's surface is coated with three types of probes:
• CH4 represents a steric, hydrophobic probe
• N-H represents a hydrogen bond donor probe
• C=O represents a hydrogen bond acceptor probe
Protomol generation
Aligns ligand fragments to protomol
But! -> scoring is calculated inside the binding pocket using scoring functions.
Scoring functions
• Gibbs energy of binding
• ΔG = ΔH − TΔS (ΔH enthalpy, ΔS entropy)
• => exact calculation is time-consuming
• Scoring functions are used to estimate free energies of binding
• Force field scoring
– GoldScore, DOCK, AutoDock
• Empirical scoring
– ChemScore, Glide SP/XP
• Knowledge-based scoring
– PMF, DrugScore
• Consensus scoring
Scoring functions continued...
• Force field scoring
– fast and transferable
– well studied, with a physical basis
– disadvantages
• only part of the relevant energies is included (electrostatics dominate, which causes systematic problems in ranking)
– force field scoring is based on the idea of using only enthalpic contributions to estimate the binding free energy
– for example, DOCK's force field score consists of the intermolecular terms of the AMBER energy function
– Intra-ligand interactions are also included in the score
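A force-field style intermolecular score can be sketched as a sum of Lennard-Jones 12-6 and Coulomb terms over all ligand-protein atom pairs. The Python below is a minimal illustration under stated assumptions, not DOCK's implementation: the atom tuples, the Lorentz-Berthelot combining rules and the distance-dependent dielectric (ε = D·r with D = 4) are choices made for the example, and intra-ligand terms are left out.

```python
import math

COULOMB_KCAL = 332.0637  # converts e^2/Angstrom to kcal/mol


def intermolecular_score(ligand_atoms, protein_atoms, dielectric=4.0):
    """Sum van der Waals and electrostatic energies over ligand-protein pairs.

    Each atom is a tuple (x, y, z, charge, r_min_half, epsilon), with
    coordinates and radii in Angstrom and epsilon in kcal/mol.
    """
    total = 0.0
    for lx, ly, lz, lq, lr, le in ligand_atoms:
        for px, py, pz, pq, pr, pe in protein_atoms:
            r = math.dist((lx, ly, lz), (px, py, pz))
            r_min = lr + pr              # assumed combining rule for radii
            eps = math.sqrt(le * pe)     # assumed combining rule for well depths
            ratio6 = (r_min / r) ** 6
            vdw = eps * (ratio6 ** 2 - 2.0 * ratio6)
            # distance-dependent dielectric: effective permittivity grows with r
            coulomb = COULOMB_KCAL * lq * pq / (dielectric * r * r)
            total += vdw + coulomb
    return total


# Two hypothetical atoms sitting exactly at their van der Waals minimum.
neutral = intermolecular_score([(0.0, 0.0, 0.0, 0.0, 1.9, 0.1)],
                               [(3.8, 0.0, 0.0, 0.0, 1.9, 0.1)])
```

With partial charges of opposite sign the Coulomb term quickly outweighs the van der Waals well, which illustrates why electrostatics can dominate such scores.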
Scoring functions continued...
• 2. Empirical scoring function
– Fast, with good predictive power
– multivariate regression method
– FlexX uses an empirical scoring function generated by Böhm
– free energy of binding is estimated as a sum of terms:
ΔG = ΔG0 + ΔGrot × Nrot + ΔGhb Σ f(ΔR, Δα) + ΔGio Σ f(ΔR, Δα) + ΔGaro Σ f(ΔR, Δα) + ΔGlipo Σ f*(ΔR)
(the sums run over neutral hydrogen bonds, ionic interactions, aromatic interactions and lipophilic contacts, respectively)
Nrot is the number of rotatable bonds that are immobilized in the complex. ΔGhb, ΔGio, ΔGrot, and ΔG0 are adjustable parameters. ΔGaro accounts for the interactions of aromatic groups and is set to −0.7 kJ/mol. ΔGlipo is a modified term that is calculated as a pairwise sum over all atom-atom contacts. f(ΔR, Δα) is a scaling function that penalizes deviations from ideal geometry. ΔR = R − R0, where R is the distance between the atom centers and R0 is the ideal value, which is assumed to be the sum of both van der Waals radii plus 0.6 Å.
Scoring functions continued...
2. Empirical scoring functions
– X-Score
– includes van der Waals interaction, hydrogen bonding, hydrophobic effect and deformation effect
– Disadvantages
• the functions are trained solely on crystal structures of protein-ligand complexes (medium-strong affinity)
• no effective penalty term for bad conformations
• usage/accuracy feasible only for compounds similar to those included in the training set
ChemScore
Scoring functions continued...
3. Knowledge-based scoring function
• Based on information from known protein-ligand complexes (Protein Data Bank)
• More general than empirical scoring functions
• PMF-Score
The protein-ligand interaction energy is calculated as a sum of distance-dependent pairwise potentials over all protein-ligand heavy-atom pairs in the complex. Both enthalpic and entropic effects are assumed to be included implicitly.
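The idea behind such a potential can be sketched with the inverse Boltzmann relation, A(r) = −kT ln[g_obs(r) / g_ref(r)], followed by a sum over atom pairs. The Python below is a simplified illustration and not the published PMF-Score: it omits the volume correction used in practice, and the distance-bin counts, pair-type labels, bin width and kT value (0.593 kcal/mol, roughly 298 K) are assumptions.

```python
import math


def pair_potential(observed_counts, reference_counts, kt=0.593):
    """Turn distance histograms into a potential, one value per distance bin.

    observed_counts: how often an atom-pair type occurs in each distance bin
    in known complexes; reference_counts: the same for a reference state.
    """
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            potential.append(0.0)  # assumption: no statistics, no contribution
        else:
            potential.append(-kt * math.log((obs / n_obs) / (ref / n_ref)))
    return potential


def pmf_score(pair_distances, potentials, bin_width=1.0):
    """Sum the tabulated potentials over (pair_type, distance) entries."""
    total = 0.0
    for pair_type, distance in pair_distances:
        table = potentials[pair_type]
        index = int(distance / bin_width)
        if index < len(table):  # pairs beyond the table are ignored
            total += table[index]
    return total


# Hypothetical counts for one pair type over two 1 Angstrom bins.
tables = {"C-O": pair_potential([10, 30], [20, 20])}
score = pmf_score([("C-O", 1.5), ("C-O", 0.2), ("C-O", 5.0)], tables)
```

Distances seen more often than in the reference state get a negative (favorable) value, which is how the statistics of known complexes enter the score.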
Scoring functions continued...
3. Knowledge-based scoring function
• DrugScore. The total protein-ligand binding score is a combination of distance-dependent potentials and surface-dependent potentials.
Uses combinations of mean-field terms and extra terms for e.g. solvation
Scoring functions continued...
4. Consensus scoring
– In protein−ligand docking, the scoring function is responsible for identifying the correct pose of a particular ligand as well as separating ligands from nonligands.
– Consensus scoring involves combining the results from several rescoring experiments
– “Consensus” hypothesis: rescoring is a way of combining results from two scoring functions such that only true positives are likely to score highly
– “Complementary” hypothesis: the scoring functions used in rescoring have complementary strengths; one is better at ranking actives with respect to inactives while the other is better at ranking poses of actives
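One simple way to combine several scoring functions is to rank the ligands under each function and add up the ranks. The Python sketch below shows this rank-sum variant only; it is one of several possible consensus schemes, the ligand names and score values are hypothetical, and it assumes that lower scores are better in every table.

```python
def rank_positions(scores):
    """Map each ligand to its rank (1 = best); lower scores rank first."""
    ordered = sorted(scores, key=scores.get)
    return {ligand: position + 1 for position, ligand in enumerate(ordered)}


def consensus_by_rank_sum(score_tables):
    """Order ligands by the sum of their ranks across all scoring functions.

    Assumes every table scores the same set of ligands; ties are broken
    alphabetically so the result is deterministic.
    """
    ranks = [rank_positions(table) for table in score_tables]
    totals = {ligand: sum(r[ligand] for r in ranks) for ligand in score_tables[0]}
    return sorted(totals, key=lambda ligand: (totals[ligand], ligand))


# Hypothetical scores from two scoring functions on different scales.
function_a = {"lig1": -9.3, "lig2": -7.6, "lig3": -8.5, "lig4": -9.1}
function_b = {"lig1": -38.0, "lig2": -30.0, "lig3": -41.0, "lig4": -35.0}
ordering = consensus_by_rank_sum([function_a, function_b])
```

Working with ranks rather than raw scores avoids having to put scoring functions with different units on a common scale.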
Venn diagrams contrasting the performance of the three scoring functions for different proteins. The results are taken from a single repetition (the first repetition) of the docking experiments. (a) The number of actives placed in the top-ranked position. For example, there are 25 actives (out of 85) which all three scoring functions correctly place in the top-ranked position. (b) Poses correctly predicted; that is, where the top-ranked pose is within 2.0 Å rmsd of the crystal structure. For example, there are 53 proteins where all three scoring functions correctly predict the active pose.
Published in: Noel M. O’Boyle; John W. Liebeschuetz; Jason C. Cole; J. Chem. Inf. Model. 2009, 49, 1871-1878.
DOI: 10.1021/ci900164f
Copyright © 2009 American Chemical Society
Problems with Docking and Scoring functions
• The best docking methods predict the experimental pose about 70% of the time, although selecting the program that will give the best result for any given target is not straightforward
• The most stringent test of docking is the accurate prediction of the binding affinities of a series of related compounds
– This goal is essentially beyond all of the current docking methods
Virtual screening performance
Problems continue…
• Prediction of binding affinities for a diverse set of molecules is a difficult task
– Scoring
– Search space is high-dimensional
• Both molecules are flexible: hundreds to thousands of degrees of freedom
• Total number of possible poses is astronomical
• About 30 docking programs
• Calculations in gas phase
– accurate calculations in water are time-consuming
• The free energy difference between the best ligand (potency 50 nM) and the experimental detection limit (potency 100 µM) is only 4.5 kcal/mol
– Conformational factors alone for ligands can be as large as that
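The 4.5 kcal/mol figure can be checked from the ratio of the two potencies. The worked arithmetic below assumes that both potencies behave like dissociation-type constants and that T ≈ 298 K, so RT ≈ 0.593 kcal/mol:

```latex
\Delta\Delta G
  = RT \ln\frac{100\ \mu\mathrm{M}}{50\ \mathrm{nM}}
  = RT \ln 2000
  \approx 0.593\ \mathrm{kcal/mol} \times 7.60
  \approx 4.5\ \mathrm{kcal/mol}
```

Under the same assumptions, each tenfold change in affinity corresponds to about 1.4 kcal/mol.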
Docking Accuracy
• Docking of 100 ligands to their cognate protein X-ray structure.
• Cumulative percentage of complexes as a function of the RMSD from the X-ray pose.
• (A) Docking accuracy: RMSD in Å of the best pose (nearest to the experimental binding mode) from the experimental solution.
• (B) Scoring accuracy: RMSD in Å of the top pose (best-scored solution) from the experimental solution.
The current plot was obtained using the X-ray pose as the input conformation of the ligand to dock.
Note the scale!
Scoring… ☹
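The two accuracy measures can be made concrete with a short Python sketch. It assumes that pose and reference coordinates list the same heavy atoms in the same order and share one coordinate frame (no fitting, no symmetry correction), and that lower scores are better; the function names, the example coordinates and the 2.0 Å cutoff default are illustrative choices.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation in Angstrom between two coordinate lists."""
    if not pose or len(pose) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for atom_p, atom_r in zip(pose, reference)
                  for a, b in zip(atom_p, atom_r))
    return math.sqrt(squared / len(pose))


def best_and_top_rmsd(scored_poses, reference):
    """Return (RMSD of the pose nearest to the reference, RMSD of the best-scored pose).

    scored_poses is a list of (score, coordinates) pairs.
    """
    evaluated = [(score, rmsd(coords, reference)) for score, coords in scored_poses]
    best = min(value for _, value in evaluated)          # docking accuracy (A)
    top = min(evaluated, key=lambda item: item[0])[1]    # scoring accuracy (B)
    return best, top


def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of complexes whose RMSD is within the cutoff."""
    return sum(1 for value in rmsd_values if value <= cutoff) / len(rmsd_values)


# Hypothetical two-atom ligand with two docked poses.
xray = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
poses = [(-9.1, [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]),
         (-7.6, [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])]
best_value, top_value = best_and_top_rmsd(poses, xray)
```

In this toy case the search found a pose 1 Å from the crystal structure, but the scoring function preferred one that is 3 Å away, which is the gap the two curves illustrate.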
Docking Performance
• Binding mode prediction: Docking Power
• Binding affinity prediction: Scoring Power (predicted affinity, e.g. ΔG prediction, vs. experimental measurements, e.g. ITC ΔG measurement)
• Relative binding affinity prediction: Screening Performance
[Figure: docked poses labeled with example scores −9.1, −7.6, −9.3 and −8.5]