Structure-Activity Relationship Clusters (and the Difference between 2-D and 3-D Similarity)

Volker D. Hähnke, Lianyi Han, Sunghwan Kim, Evan E. Bolton, Stephen H. Bryant
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda (MD), USA

PubChem Structure-Activity Relationship Clusters

Description of Chemical Sample
• 127 million substances

Unique Chemical Structures, Aggregated Information
• 48 million compounds
• Structure, Activity, Properties, Metabolic pathways, Vendors, Publishers, Patents

Bioactivity Information, Assays & Results
• 720 thousand assays
• 215 million bioactivity results

PubChem 3D

Generation of 3-D Conformers*
• ≤ 50 non-hydrogen atoms
• ≤ 15 rotatable bonds
• H, C, N, O, F, Si, P, S, Cl, Br, and I
• 1 covalent unit
• ≤ 6 undefined stereo centers
• All atom types supported by MMFF94

Yields
• Conformers
• 3-D descriptors
• 3-D similarities

Thematic Series in J Cheminf
*Bolton, Kim & Bryant; J Cheminf 2011; 3:4.

PubChem 3D Coverage*

92.3% have at least 1 conformer
[Pie chart. Categories: has conformer; salt (parent has conformer); too flexible; atom type not supported; too big; undefined stereo; complex/mixture; failed. Segment labels as extracted: 0.9%, 89.6%, 0.8%, 2.7%, 0.4%, 0.3%, 0.3%, 4.9%.]

Method
• OpenEye OMEGA
• MMFF94, no Coulomb term
• Custom RMSD sampling threshold
• Up to 500 conformers per compound

[Conformer images: CID 23666729, CID 2244]

*Bolton et al.; J Cheminf 2011; 3:32.

PubChem – Neighboring

Instant access to structurally similar compounds (pre-computed)
• 2-D
• 3-D

PubChem – Neighboring …

2-D: PubChem Fingerprint - 881 Bits
• Atoms
• Rings
• Atom pairs
• Atom environments
• More specific substructures
Tanimoto Coefficient ≥ 0.9
A: # bits set in A; B: # bits set in B; C: # bits set in A&B

3-D: Overlay of Volumes (ROCS)
• Shape Tanimoto (ST) ≥ 0.8
• Color Tanimoto (CT) ≥ 0.5; differentiates 6 different features
• Combo Tanimoto combines ST and CT
• ComboTST-optimized and ComboTCT-optimized
• 10 conformers; maximum similarity between pairs

PubChem – Neighboring

CID     Name
1302    naproxen
2581    carprofen
3332    felbinac
3394    flurbiprofen
3826    ketorolac
5468    surgam
5733    zomepirac
39941   benoxaprofen

Upper triangle: 3-D Similarity (ST / CT). Lower triangle: 2-D Similarity.

CID     1302         2581         3332         3394         3826         5468         5733         39941
1302    -            0.92 / 0.55  0.89 / 0.41  0.84 / 0.53  0.84 / 0.28  0.83 / 0.39  0.80 / 0.34  0.84 / 0.56
2581    0.43         -            0.92 / 0.50  0.92 / 0.52  0.87 / 0.27  0.90 / 0.35  0.81 / 0.38  0.84 / 0.29
3332    0.70         0.49         -            0.95 / 0.73  0.86 / 0.39  0.86 / 0.50  0.87 / 0.59  0.83 / 0.21
3394    0.70         0.49         0.94         -            0.86 / 0.40  0.87 / 0.59  0.88 / 0.45  0.75 / 0.43
3826    0.42         0.65         0.49         0.49         -            0.92 / 0.52  0.81 / 0.70  0.79 / 0.17
5468    0.57         0.43         0.71         0.71         0.52         -            0.78 / 0.60  0.79 / 0.32
5733    0.42         0.72         0.48         0.48         0.86         0.51         -            0.64 / 0.27
39941   0.49         0.69         0.39         0.40         0.62         0.41         0.70         -

Bolton, Kim & Bryant; J Cheminf 2011; 3:13.
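
The neighboring criteria above can be sketched in a few lines of Python. This is a minimal illustration, not PubChem's implementation: fingerprints are represented as plain Python integers (one bit per fingerprint position), all function names are hypothetical, and the Combo Tanimoto is taken to be the simple sum ST + CT, which is an assumption here because the slide only says it "combines" the two.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit fingerprints stored as ints."""
    a = bin(fp_a).count("1")          # A: number of bits set in A
    b = bin(fp_b).count("1")          # B: number of bits set in B
    c = bin(fp_a & fp_b).count("1")   # C: number of bits set in A&B
    union = a + b - c
    # Convention chosen for this sketch: two empty fingerprints score 0.
    return c / union if union else 0.0


def is_2d_neighbor(fp_a: int, fp_b: int, threshold: float = 0.9) -> bool:
    """2-D neighbors: Tanimoto coefficient at or above the threshold."""
    return tanimoto(fp_a, fp_b) >= threshold


def is_3d_neighbor(st: float, ct: float,
                   st_min: float = 0.8, ct_min: float = 0.5) -> bool:
    """3-D neighbors: both shape (ST) and color (CT) thresholds are met."""
    return st >= st_min and ct >= ct_min


def combo_tanimoto(st: float, ct: float) -> float:
    """Combo Tanimoto, assumed here to be ST + CT (range 0 to 2)."""
    return st + ct
```

Applied to the table above, the naproxen/carprofen pair (ST 0.92, CT 0.55) would pass the 3-D criteria, while the naproxen/ketorolac pair (ST 0.84, CT 0.28) would fail on the color score.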

Structure-Activity-Clustering

PubChem Neighboring
• Instant access to structurally similar compounds (pre-computed)
• Not necessarily biologically similar

Structure-Activity-Relationship Clusters
• Instant access to structurally and biologically similar compounds

Structure-Activity-Clustering

Bioactivity
[Diagram labels: inactive; Non-Inactive; undecided / inconclusive; active]
• Active
  o In at least one of 548,071 Assays (PubChem AID): 843,845 Compounds
  o Against one of 4,280 Proteins (NCBI GI number): 400,599 Compounds
  o Modulate at least one of 4,540 Pathways (BioSystems BSID): 265,470 Compounds

Structure-Activity-Clustering – Method

Leader Algorithm (Taylor-Butina Grouping)
1 Create a Nearest Neighbors List
2 Eliminate (real) Singletons
3 Find Compound with largest list
4 Group Compounds in largest list & eliminate from further consideration

Score Distribution* (randomly drawn pairs)
[Plot: % Scores vs. Similarity; $\bar{x} = 0.4229$, $\sigma = 0.1326$; distance threshold $d_{thresh} = 1 - (\bar{x} + 2\sigma)$]

Similarity   $d_{thresh}$
2-D          0.3119
ST           0.1502
ComboTST     0.4822
CT           0.6102
ComboTCT     0.4748

*Kim, Bolton & Bryant; J Cheminf 2012; 4:28.
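
The four steps of the leader algorithm listed above can be sketched as follows. This is a small, quadratic-time illustration under stated assumptions rather than the code used to build the clusters: the function names, the generic `distance` callback, and the tie-breaking rule (lowest index wins) are all hypothetical. The helper `distance_threshold` encodes the relation 1 − (mean + 2·std), which reproduces the 2-D value on the slide (1 − (0.4229 + 2·0.1326) = 0.3119); whether the other measures use exactly the same relation is not stated in the source.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def distance_threshold(mean: float, std: float) -> float:
    """Assumed cutoff: 1 - (mean + 2 * std) of the random-pair scores."""
    return 1.0 - (mean + 2.0 * std)


def taylor_butina(items: Sequence[T],
                  distance: Callable[[T, T], float],
                  d_thresh: float) -> Tuple[List[List[T]], List[T]]:
    """Return (clusters, real_singletons); each cluster starts with its leader."""
    n = len(items)
    # Step 1: nearest-neighbor list for every item (all items within d_thresh).
    neighbors = {
        i: {j for j in range(n)
            if j != i and distance(items[i], items[j]) <= d_thresh}
        for i in range(n)
    }
    # Step 2: remove "real" singletons, i.e. items with no neighbor at all.
    singletons = [items[i] for i in range(n) if not neighbors[i]]
    remaining = {i for i in range(n) if neighbors[i]}

    clusters: List[List[T]] = []
    while remaining:
        # Step 3: pick the item with the largest list among unassigned items.
        leader = max(sorted(remaining),
                     key=lambda i: len(neighbors[i] & remaining))
        # Step 4: group the leader with its unassigned neighbors and drop them.
        members = neighbors[leader] & remaining
        clusters.append([items[leader]] + [items[i] for i in sorted(members)])
        remaining -= members | {leader}
    return clusters, singletons
```

Note that a late cluster can end up containing only its leader when all of its neighbors were claimed by earlier clusters; such items differ from the "real" singletons removed in step 2, which never had a neighbor to begin with.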

Structure-Activity-Clustering – Results

Taylor-Butina Grouping
[Bar charts: # Clusters per similarity method (ST, ComboTST, CT, ComboTCT, 2-D). Panels: Assays (843,845 Compounds); Proteins (400,599 Compounds); Pathways (265,470 Compounds).]

Structure-Activity-Clustering – Results

Cluster(-ing) Statistics
[Bar charts: Compounds in cluster vs. singletons per similarity method. Panels: Assays; Proteins; Pathways.]
[Histogram: Absolute Frequency vs. Cluster Size per similarity method.]

Similarity   Cluster size ($\bar{x}$)
ST           4.0 ± 5.2
ComboTST     5.3 ± 7.8
CT           5.9 ± 9.5
ComboTCT     5.4 ± 8.3
2-D          8.2 ± 13.8

Structure-Activity-Clustering – Results

[Example cluster images: 4 Compounds, 14 Conformers; 4 Compounds, 16 Conformers]

Structure-Activity-Clustering – Results

High Value Compounds
• IC50 / EC50 < 10 µM
• Has MeSH annotation
• More reliable information
• Mainly in bigger clusters

[Diagram values as extracted]
Assay: 43% (23%, 5.1%, 14.9%)
Protein: 49.5% (29.4%, 10.7%, 9.3%)
Pathway: 50.9% (25.2%, 8.7%, 17%)

2-D & 3-D Similarity

Clustering Differences
• O(i,j): % overlapping compounds in clusters for a given UID between similarity measures i & j
Rows: Method i. Columns: Method j.

Assay O(i,j)
            ST    ComboTST  CT    ComboTCT  2-D
ST          -     79        77    79        76
ComboTST    73    -         85    88        83
CT          71    85        -     86        83
ComboTCT    72    87        86    -         84
2-D         69    82        82    83        -

Protein O(i,j)
            ST    ComboTST  CT    ComboTCT  2-D
ST          -     90        87    89        85
ComboTST    77    -         89    92        87
CT          75    89        -     90        87
ComboTCT    77    91        90    -         86
2-D         71    83        83    83        -

Pathway O(i,j)
            ST    ComboTST  CT    ComboTCT  2-D
ST          -     94        93    94        90
ComboTST    83    -         95    96        92
CT          81    93        -     95        92
ComboTCT    82    95        95    -         92
2-D         76    87        89    88        -

PubChem Cluster Explorer

Public Resource: https://pubchem.ncbi.nlm.nih.gov/sar/

… CID 2244
• 555 AIDs: 4,401 Clusters
• 107 GIs: 1,562 Clusters
• 467 BSIDs: 5,902 Clusters
Cluster list columns: Similarity Method; No. Compounds; No. Conformers; …
Export
[Histogram: Absolute Frequency vs. Cluster Size]

PubChem Cluster Explorer

Cluster 697257876062209
• Similarity Method: 2D
• AID 162343: Inhibitory concentration in DMSO with purified human Prostaglandin G/H synthase 2 (COX-2)
• 47 Compounds / Conformers

PubChem Cluster Explorer

• Export AIDs
• Export GIs
• Export BSIDs

Structure-Activity-Clustering

Structure-Activity-Relationship Clusters
• Instant access to structurally and biologically similar compounds

Limitations
• No Inactives
• No quality measure for clusters

Current Work

Adding Inactives
• Neighbors to compounds in cluster
• Tested in the same assay

Current Work

Adding Inactives
• Measure quality / modelability*

Establish “good” clusters
• How good is good enough?
• Suitable for model generation

*Golbraikh et al.; J Chem Inf Model 2014; 54:1-4.

• Steve Bryant
• Jiyao Wang
• Evan Bolton
• Siqian He
• Sunghwan Kim
• Jane He
• Lianyi Han
• Bo Yu
• Paul Thiessen
• Renata Geer
• Asta Gindulyte
• Ben Shoemaker
• Lewis Geer
• Gang Fu
• Yanli Wang
• Tiejun Cheng
• Jian Zhang

Mesa Analytics & Computing, Inc.
• John MacCuish
• Nora MacCuish
• Mitch Chapman

This research was supported [in part] by the Intramural Research Program of the NIH, National Library of Medicine.

Key Points

Structure Activity Clusters
• Instant access to structurally & biologically similar compounds
• 50% of clusters have very reliable activity information
• Publicly accessible
• Inactives are incoming
https://pubchem.ncbi.nlm.nih.gov/sar/

2-D & 3-D Similarity:
• 2-D similarity is less restrictive
• Pure shape similarity is the most restrictive
• Feature similarity is similar to 2-D similarity