Small Molecule-Protein Interactions
Howard Feldman
The Blueprint Initiative
Toronto, Ontario
Lecture 4.3

Drug Discovery Pipeline
• Most drugs are small molecules, and the interactions they make with proteins determine their effects on, and toxicity to, the human body
• Clinical trials are the most expensive part of the pipeline; if failure can be predicted before this point, it saves time and money

Drug Discovery Pipeline
• It is of utmost importance to identify, in the early stages of drug discovery, the lead compounds most likely to succeed
• A recent study by the Tufts Center for the Study of Drug Development showed that bringing one drug to market costs an average of $800M
• Only 5 in 5,000 potential new drugs tested on animals reach clinical trials, and only one of those ultimately wins FDA approval

How small is a small molecule?
• A small molecule is generally considered to be anything which may interact with protein/DNA
• Must be biologically relevant
• Examples include ions, polysaccharides, peptides, drugs

Small Molecules
• No absolute maximum size, though drug-like molecules often have a molecular weight of 500 Da or less
• However, they can get complex: branched polysaccharides, cyclic antibiotics, etc.
• Normally not interested in: detergents, buffers, solvents, denaturants, non-biological ions

How Many?
• Recently a number of public small molecule databases have become available:
  – CAS Registry: 26,000,000 substances
  – Cambridge Structural Database: 300,000 3D structures
  – PDBSum: 6,700 3D ligands from PDB
  – NCI database: 250,000 molecules
  – NCBI PubChem: 700,000 compounds
  – ChemBank: 1,100,000 molecules
• Problem: data can be very messy and sparse

Popular small molecules and domains

Panel A
      SM        % Total
  1   Mg2+      0.38
  2   ATP       0.26
  3   Mn2+      0.26
  4   ADP       0.25
  5   Cl-       0.24
  6   Ca2+      0.23
  7   Zn2+      0.22
  8   AMP-PNP   0.21
  9   AGS       0.19
 10   GTP       0.18
      Total     2.43

Panel B
      Identifier   Domain           % Total
  1   pfam00004    AAA              0.31
  2   pfam01443    Viral helicase   0.29
  3   pfam00910    RNA helicase     0.27
  4   pfam00270    DEAD             0.27
  5   pfam00680    RNA pol          0.23
  6   pfam01695    IstB             0.23
  7   pfam00005    ABC              0.23
  8   pfam00437    GSPII            0.22
  9   pfam03288    Pox              0.20
 10   pfam05729    NACHT            0.20
      Total                         2.45

• Not surprisingly, divalent cations and ATP are the most common small molecules found interacting with proteins
• AAA is an ATPase domain; the next three are all helicases, which bind various nucleotides as well

Toxicity
• Caused when a drug interferes with biological pathway(s) in the host
• The fewer side effects, the better
• Must be determined in the early stages of discovery, or it becomes very costly
• Hence predicting toxicity is very important and desirable
• Boils down to predicting interactions, or rather, non-interactions

Predicting Toxicity
• Inverse docking
  – Chen and Zhi developed a database of cavities in PDB structures
• INVDOCK searches the cavities for potential interactions with the ligand of interest, using a scoring function
  – Compares the energy to an absolute threshold, as well as to the energy of the observed PDB ligand(s) at that site

Example – 4H-Tamoxifen
• Used to treat breast cancer
• INVDOCK finds 22 putative protein targets, at least 10 of which have some experimental backing (including the ones shown here):
  – Estrogen receptor (the drug target)
  – Alcohol dehydrogenase (enhances sedative effect of alcohol)
  – IgG light chain (modulates
    immune response)
  – 17β-hydroxysteroid dehydrogenase (tumor regression)
  – GST (suppressed activity, genotoxicity, carcinogenicity)

Drug Docking
• Shares much in common with structure prediction
• Two components
  – Exploration of conformational space
  – Scoring function
• Plus one additional component
  – Locating the binding site

Drug Docking – Level of Detail
• Rigid body docking: protein remains fixed, small molecule has 6 degrees of freedom (DOF), 3 translational and 3 rotational

Drug Docking – Level of Detail
• Flexible-ligand docking: protein remains fixed, small molecule has the standard 6 DOF plus internal DOF (it can rotate about bonds)
  – More time consuming, but necessary for complex ligands if the binding conformation is unknown
• Flexible docking: as above, and in addition protein atoms in the neighbourhood of the binding site can move
  – Largest conformational space to search
  – Often done by using multiple static protein conformers, and treating each by flexible-ligand docking
  – Often important when docking to apo-protein, e.g.
    allosteric effects

Drug Docking – Level of Detail
• Some methods, such as FlexX, perform incremental construction within the binding pocket rather than docking per se

Drug Docking – Techniques
• Drug docking algorithms share much with protein structure prediction, and include:
  – Monte Carlo search
  – Molecular Dynamics
  – Genetic Algorithms
  – Fragment Assembly
  – Tabu Search
  – Many more…

Drug Docking
• When ligand and target are known, can allow complete flexible docking
• For HTS, can usually only afford rigid body docking for the initial pass
• Location to dock to on the protein target may be known ahead of time, or may be computed through binding pocket detection
  – Often the binding site can be predicted using cavity-detection algorithms if a 3D structure is available
• Search must be efficient, as with protein folding, since exhaustive search is not possible
• Scoring function must be selective and efficient

Drug Docking Example
• Study by Thornton’s group (Nature Biotech.
  22(8) (2004) p 1039-1045)
• Took 120 enzymes and 125 metabolites from EcoCyc; a subset of 29 complexes have crystal structures
• Docked all-vs-all with AUTODOCK

• Energy plots for docking (a) and reverse docking (b) for the subset of 29 with crystal structures; triangles represent the crystal complex
• Note from (a) that enzymes are not that selective about substrate, nor, from (b), are substrates that specific for enzyme

Drug Docking Example
• Computed P value: ability of substrate or enzyme to recognize its partner based on energy distribution
• Now, with 4 exceptions, the docked pairs show that either enzyme OR substrate OR both are specific

Transition state
• Most potent inhibitors are not substrate analogues but rather transition-state analogues
• Important to remember when screening compounds

Interaction Databases
• BIND (protein-ligand interactions from PDB and literature, SLRI)
• Het-PDB Navi (protein-ligand interactions from PDB, Nagahama Inst. Bio-Science)
• EcoCyc (metabolic pathways, SRI)
• KEGG (pathway database, Kyoto)

Blueprint’s Small Molecule Resources
• BIND-3DSM Division
  – 23,584 filtered small molecule-biopolymer interactions, automatically derived from crystal structures
  – Biologically insignificant records removed (i.e. crystal packing, non-biological ions)
  – Published: Biopolymers.
    2001-2002; 61(2):111-20
• SMID
  – 48,886 records matching 4283 small molecules (from PDB structures) to 2807 protein families (CDD, SMART, PFAM)
• SMID-BLAST
  – BLAST calibre tool to attach small molecule binding annotation (residue-level) to genomic sequence
• SMID-Genomes
  – SMID-BLAST vs all completely sequenced genomes
  – 9.6 million high-quality small molecule interaction annotations mapped to sequences
  – Database interface to browse/compare/investigate small molecule specificity across organisms

A 3DSM Record
www.bind.ca

BIND record – binding site

Interaction Example
• Taxol is derived from natural products, and was discovered to be effective against certain types of cancer
• Interacts with tubulin and stabilizes the tubules forming the cell cytoskeleton, preventing mitosis and leading to cell death

Visualizing Binding Sites

SMID
• http://smid.blueprint.org/
• Small Molecule Interaction Database
• Matches small molecule binding sites in structures to protein domains in NCBI's Conserved Domain Database
• 4283 small molecules from PDB

Creating SMID Records
• Start with an MMDB record (PDB record) containing more than one “molecule”: here Protein A (ProA), Small Molecule A (smA) and Small Molecule B (smB).
• Find atoms from one molecule in proximity (0.5 Å) of atoms from another molecule.
Interactions Found:
1) Residues 62, 74 & 83 interacting with smA.
2) Residues 321, 336, 345, 357, 371 & 401 interacting with smB.

Creating SMID Records
• RPS-BLAST all sequences found to interact with a small molecule in order to obtain alignments with conserved domains (DomA & DomB).
• Overlay the small molecule-protein interaction on the aligned conserved domains.
Interactions Found:
1) DomA (residues 98, 105 & 123) interacting with smA.
2) DomB (residues 31, 44, 62, 73 & 86) interacting with smB.

Use Cases for SMID
• Domain Studies
  – Binding site analysis
  – Domain family binding site conservation
  – Mapping a small molecule to the domain families that bind it
• Structural Genomics
  – Domain/ligand/binding site identification
    • Some ligands go over domain boundaries
  – Easier pattern recognition for interactions
  – Quickly identify candidate co-crystallization ligands

Taxol ligand conservation in Tubulin/FtsZ domain family

SMID-BLAST
• Uses RPS-BLAST (unmodified) with a new scoring scheme to improve domain family hits using specific ligand conservation information
• Validation: 1652 new unique interactions deposited into PDB
  – 1027 (62%) of these interactions are predicted within our selected ligand score cutoff
  – Of these, 262 (25%) were top predictions
• This is very good, as the test set is not comprehensive…
  – we do not have a set of all possible ligands for each protein crystal structure
  – we can only use exact small molecule matches (not similar molecules, e.g.
    ATP vs ATP-gamma-S)
• Specificity: able to distinguish closely related Trp- and Tyr-aminoacyl-tRNA synthetases that hit the same protein domain families

Use Cases for SMID-BLAST
• Annotation of Newly Sequenced Genomes
  – New enzyme discovery
  – Rhodococcus genome
    • William Mohn (UBC)
    • Metabolic diversity
    • PCB degradation
• Drug Docking
  – Can help prioritize experiments
• Homology Modelling
  – May help in template selection phase

SMID-BLAST Results: Summary

Summary
• Understanding and cataloguing biopolymer-small molecule interactions is critical to the drug discovery process
• Drug docking can help explain toxicity and side effects, and can be useful in understanding the forces behind interactions
• Transition state analogues make the best inhibitors
• Tools like SMID-BLAST provide a simple, powerful way to predict what ligands may interact with a protein, and vice versa
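
Worked sketch: proximity step from Creating SMID Records
The proximity search described on the Creating SMID Records slides (find atoms of one molecule within a cutoff of atoms of another, then report the interacting residues) might look roughly like the following. This is only a sketch under stated assumptions: atoms are plain (x, y, z) tuples, "proximity" is taken as centre-to-centre distance, and the function name contact_residues, the data layout and the toy coordinates are all hypothetical rather than SMID's actual implementation. The cutoff is left as a parameter; the value 0.5 is the figure quoted on the slide.

```python
import math
from collections import defaultdict


def contact_residues(protein_atoms, ligand_atoms, cutoff):
    """Map each ligand name to the sorted residue numbers it contacts.

    protein_atoms: list of (residue_number, (x, y, z)) tuples
    ligand_atoms:  list of (ligand_name, (x, y, z)) tuples
    cutoff:        maximum atom-atom distance, in the coordinate units
    """
    contacts = defaultdict(set)
    # Brute-force all-pairs comparison; fine for a toy example.
    for res_num, p_xyz in protein_atoms:
        for lig_name, l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) <= cutoff:
                contacts[lig_name].add(res_num)
    return {name: sorted(residues) for name, residues in contacts.items()}


# Toy coordinates (hypothetical), chosen so that residues 62, 74 and 83
# lie near smA, residue 321 lies near smB, and residue 200 is far from both.
protein_atoms = [
    (62, (0.0, 0.0, 0.0)),
    (74, (0.5, 0.2, 0.0)),
    (83, (0.2, 0.6, 0.0)),
    (321, (10.0, 0.0, 0.0)),
    (200, (50.0, 50.0, 50.0)),
]
ligand_atoms = [
    ("smA", (0.2, 0.2, 0.0)),
    ("smB", (10.3, 0.0, 0.0)),
]

result = contact_residues(protein_atoms, ligand_atoms, cutoff=0.5)
print(result)
```

The residue lists returned per ligand play the role of the "Interactions Found" entries on the slide; the later RPS-BLAST step would then map those residue numbers onto conserved-domain alignments, which this sketch does not attempt.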