Practical Applications of Matched Molecular Pairs at Vernalis
Steve Roughley, Richard Sherhod
16 March 2017

• What are they?
• How do we find them?
• How to deploy for users?

About Vernalis
• Expertise
  • Fragments and structure-based drug discovery (Protein Science, Structural Biology, Chemistry)
• Therapeutic areas
  • Oncology, CNS, infectious diseases
• Location
  • Based in Granta Park, outside Cambridge, UK
• Trusted community contributor since 2013 (2 KNIME-trained developers)

Matched Molecular Pairs (MMPs)
Definition: “MMP can be defined as a pair of molecules that differ in only a minor single point change” (Wikipedia)
Example pair: CHEMBL2263252 and CHEMBL60592
• Multiple open-source implementations in various forms
• At least 2 in KNIME
  • Vernalis
  • Erlwood
• Recently reviewed
  • Christian Tyrchan and Emma Evertsson, Comput. & Struct. Biotech. J., 2017, 15, 86-90

Anatomy of a Matched Molecular Pair
• Hussain-Rea algorithm (ref. 3)
  • Identify bonds that can be broken
    • E.g. acyclic bonds
  • Break molecule along each matching bond in turn
  • Match identical ‘Keys’
  • ‘Values’ form pair transforms
[Figure: Molecule A and Molecule B are each cut at a matching bond (*); the identical keys match and the values form the pair transform. Worked example: CHEMBL2263252 and CHEMBL60592.]
3. Jameed Hussain, Ceara Rea, J. Chem. Inf. Model., 2010, 50, 339–348

Multi-cut pairs
• Pairs can also be formed by cutting 2 or more bonds simultaneously
  • Allows scaffold replacement transforms
  • Need to track which breaking bond is which
[Figure: two molecules cut at two bonds each, with attachment points numbered 1 and 2; the identical keys match and the values form the pair transform. Worked example: CHEMBL373838 and CHEMBL1368873.]

Transforms application
• A transform A>>B is derived from the pair Molecule A (value ‘A’) and Molecule B (value ‘B’)
• Transform takes no account of relevance or context
• Original molecules interconvert when transform is applied
  • CHEMBL2263252, A>>B, gives CHEMBL60592
• Other molecules generate new ideas
  • CHEMBL1350874, A>>B, gives a product not found in ChEMBL

DESTRUCTION TESTING
Can we fragment all of ChEMBL?
A ‘reasonable’ ‘representative’ test set

Pre-processing: “Speedy SMILES”
Vernalis Community Nodes
• Fast string-based SMILES manipulation
• No chemical toolkit conversion, e.g. c1cc[nH]c1C(=O)OCCN(C)C
• Streamable
• Example application: pre-processing ChEMBL
  • De-salt
  • Remove large (HAC>40) or small (HAC<8) molecules
  • and those with a net charge or large number of charges

SpeedySMILES pre-processing ChEMBL
• 1,581,653 molecules in
• Processed in 76 seconds
• 1,486,077 molecules out

Failure category      Count
HAC < 8 or > 50       63,920
Non-neutral           30,706
Total charges > 4     949
Broken bonds          1

ChEMBL fragmentation
The numbers…
• 1,420,462 molecules fragmented
  • 1-10 cuts
  • Non-functional group single bonds
  • Maximum 10,000 fragmentations / molecule
• 134,020,007 fragments
• 139,679 failed rows:

Failure category          Count
Complexity limit          139,550
Too few matching bonds    79
Valence error             25
Kekulisation error        25

• 10h30min (Intel® Core™ i7-4770 @ 3.4GHz; W10)
  • 10 threads; 500 rows buffer
  • -Xmx16329m
This version will be released to the community ‘imminently’

Example failure rows
New Vernalis Matching Bonds Renderer Node
• CHEMBL2297882: Molecule failed complexity limit (15965 possible fragmentations)
• CHEMBL1698868: No matching bonds found, or too few to cut
• CHEMBL178180: Error parsing … Explicit valence for atom # 8 Te, 4, is greater than permitted
• CHEMBL3188982: Error parsing… OC(=O)C(=O)Nc1cccc(c1)c2nnnn2 … Unkekulized atoms
• CHEMBL2006679: Error parsing … Unkekulized atoms

Matched Molecular Pairs (MMPs)
• MMP concept has been extended to improve utility
• Data analysis
  • What effect does a transform have on data, e.g. Metabolism/Stability, hERG binding, target binding?
• Matched Molecular Series
  • When my series is seen in activity order, what other new members are commonly ‘better’?
• Fingerprint similarity
  • How closely related is the surrounding chemical matter to my input molecule?
• 3D Matched Pairs
  • Molecular shape/pharmacophore presentation
• In all cases, provides additional ‘context’ to the pairs
• Can be used for substituent analysis/replacement or scaffold replacement

MMAnalyser: Applied MMP/S Analysis
Richard Sherhod

MMAnalyser
• KNIME Web Portal application
• Composed of multiple interactive KNIME workflows
• Allows chemists to do matched-molecular pair/series analysis
• Guides users through the analysis process

MMAnalyser: Application
• Two interactive workflows for chemists
  • MMPair Analyser: matched-molecular pair analysis
  • MMSeries Analyser: matched-molecular series analysis
• Interactive admin workflows for database maintenance
  • Rebuilding MMPair and MMSeries databases from pre-defined sources
Kenny, P.W. & Sadowski, J., 2005. Structure Modification in Chemical Databases. In Wiley-VCH Verlag GmbH & Co. KGaA, pp. 271–285. Available at: http://doi.wiley.com/10.1002/3527603743.ch11 [Accessed March 6, 2017].
Wawer, M. & Bajorath, J., 2011. Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing. Journal of Medicinal Chemistry, 54(8), pp.2944–2951. Available at: http://pubs.acs.org/doi/abs/10.1021/jm200026b [Accessed March 6, 2017].

MMPair Analyser: Application
MMP Analysis
• Inputs
  • Input structure
  • Pre-generated transformations with observation data
• All transforms applied to the input structure
• Results filtered and sorted by:
  • Observation count
  • Enrichment of positive observations

MMPair Analyser: Database
ChEMBL data
• Molecule dictionary
• Compound structures
• Compound properties (QED)
• Compound records
• Activities
• Assays
• Documents
Generate MMPs (from filtered structures)
1. Fragment structures
   • Fragments (values) and scaffolds (keys)
2. Add hydrogens to fragments and scaffolds
3. Generate transformations from fragments
   • Record ID of left and right structures
Gather evidence (from observations)
1. Filter transformations by observation count
2. Get observed changes in property for each transformation
3. Calculate enrichment of positive observations
4. Perform one-tailed binomial test
   • Keep transforms with p >= 0.05

MMPair Analyser: Demo

MMSeries Analyser
Matched-molecular Series Analysis
• Extension of MMPs to three or more R-groups (values)
• Originally proposed by Wawer & Bajorath (2011)
• Several implementations, e.g. MATSY (O’Boyle et al. 2014)
• R-groups ordered by the properties of their parent

pIC50   R    pIC50
7.00    H    8.30
7.68    F    8.00
8.51    Cl   7.77
8.77    Br   8.89

Left series order: Br > Cl > F > H
Right series order: Br > H > F > Cl

O’Boyle, N.M. et al., 2014. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity. Journal of Medicinal Chemistry, 57(6), pp.2704–2713. Available at: http://pubs.acs.org/doi/abs/10.1021/jm500022q [Accessed March 3, 2017].
Wawer, M. & Bajorath, J., 2011. Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing. Journal of Medicinal Chemistry, 54(8), pp.2944–2951. Available at: http://pubs.acs.org/doi/abs/10.1021/jm200026b [Accessed March 6, 2017].
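
As a rough illustration of the series ordering on the MMSeries Analyser slide above, the sketch below ranks the four R-groups by pIC50 for the two example series and compares the two orderings with Spearman's rank correlation. The names `left_series`, `right_series`, `order_by_activity`, `ranks` and `spearman_rho` are hypothetical, and the tie-free rank formula is an assumption made for brevity; this is not the MMSeries Analyser or MATSY implementation.

```python
# pIC50 values per R-group, copied from the two example series on the slide.
left_series = {"H": 7.00, "F": 7.68, "Cl": 8.51, "Br": 8.77}
right_series = {"H": 8.30, "F": 8.00, "Cl": 7.77, "Br": 8.89}


def order_by_activity(series):
    """Return R-groups sorted from most to least active."""
    return sorted(series, key=series.get, reverse=True)


def ranks(series):
    """Map each R-group to its rank (1 = lowest value). Assumes no ties."""
    ordered = sorted(series, key=series.get)
    return {r_group: i + 1 for i, r_group in enumerate(ordered)}


def spearman_rho(series_a, series_b):
    """Spearman's rank correlation over the R-groups common to both series.

    Uses the tie-free formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    common = sorted(set(series_a) & set(series_b))
    n = len(common)
    if n < 2:
        raise ValueError("need at least two common R-groups")
    rank_a = ranks({r: series_a[r] for r in common})
    rank_b = ranks({r: series_b[r] for r in common})
    d_squared = sum((rank_a[r] - rank_b[r]) ** 2 for r in common)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


left_order = order_by_activity(left_series)
right_order = order_by_activity(right_series)
rho = spearman_rho(left_series, right_series)
print(" > ".join(left_order))   # Br > Cl > F > H
print(" > ".join(right_order))  # Br > H > F > Cl
print(round(rho, 3))            # 0.2
```

The two orderings reproduce those shown on the slide. The MMSeries Analyser application described next sorts matching series by this kind of rank correlation.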

MMSeries Analyser: Application
MMS Analysis
• Inputs
  • Input structures, with unique IDs and numeric data
  • Pre-generated sets of scaffolds and R-groups with data
• Query structures are fragmented into scaffolds and R-groups
• Sets of R-groups, their scaffold and data are arranged into series
• Query series are compared to pre-generated series
  • Common R-groups are recorded
  • Query and database series ordered by data
  • Spearman's rank correlation between matching series calculated
• Matching series sorted by rank correlation

MMSeries Analyser: Application
[Figure: example results table listing matching scaffolds with their rank correlation (0.975, 0.9 and 0.5 in the example shown) and the associated data values.]

MMSeries Analyser: Database
ChEMBL data
• Molecule dictionary
• Compound structures
• Compound properties (MW)
• Compound records
• Activities
• Assays
• Documents
Generate MMPs (from filtered structures)
1. Fragment structures into R-groups and scaffolds
   • 4 methods including Hussain/Rea rules
2. Group R-groups into series by common parent scaffolds
3. Keep series of 3 or more R-groups
4. Record IDs of parent structures
Gather evidence (from observations)
• Associate parent structures with observation data

MMSeries Analyser: Demo

MMAnalyser: Possible Improvements
• Present observation data to the user
  • Direct the user to relevant source material
• More datasets
• Better (more robust) processing
• Incorporate transformation site similarity (MMPs)
  • Associate transformations with fingerprints from their parent scaffold(s)
• Incorporate scaffold similarity (MMSs)
  • Filter/order series by similarity to query scaffold

Acknowledgments
• Vernalis colleagues
• Greg Landrum (RDKit)
• ChEMBL
• KNIME
Matched-molecular series implementation: Hunt, P. et al., 2017. Practical applications of matched series analysis: SAR transfer, binding mode suggestion and data point validation. Future Medicinal Chemistry, 9(2), pp.153–168. Available at: http://www.future-science.com/doi/10.4155/fmc-2016-0203 [Accessed March 3, 2017].

Thank you!

BACKUP SLIDES

Stereochemistry
• Fragmentation can create new chiral centres / double bond geometries
• Existing absolute/unknown stereochemistry must be preserved
[Figure: decision grid built up over several slides. Known stereocentre? (Yes/No) against Unknown/racemic stereocentre? (Yes/No).]
• “KNOWN KNOWNS”: We know we know about stereochemistry
• “KNOWN UNKNOWN”: We know unknown or racemic
• “UNKNOWN UNKNOWN”: We have no idea…

Sneak Preview: Upcoming revised release
• Memory leak fixed
  • Survives ‘destruction testing’
• New fragmentation type added
  • More flexibility for custom types
• New parallelised pair generation nodes
  • Transform filtering options
  • Reference table version
    • Only generates pairs between rows from the two tables
• Improved Rendering/Filtering nodes

Attachment point fingerprints
• RDKit Morgan ECFP-like fingerprint
• Rooted at the attachment point atom for each ‘key’ component
• Default radius 4, size 2048 bit
• Calculated during fragmentation in Vernalis Nodes
Example 32-bit AP fingerprints (prefix is the attachment point number):
1: 10000000000100000000010100000100
1: 00001000000010000000100000000101
1: 00000000000010000000000100000100
2: 10000000000001000000001100000100
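
One way such attachment point fingerprints could be compared is by the Tanimoto similarity of their set bits. The sketch below does this in plain Python for the example 32-bit fingerprints above. The choice of Tanimoto and the names `on_bits` and `tanimoto` are assumptions for illustration and are not taken from the Vernalis nodes, which the slides only say calculate the fingerprints during fragmentation.

```python
# Example 32-bit attachment point fingerprints, copied from the slide.
fp_a = "10000000000100000000010100000100"
fp_b = "00001000000010000000100000000101"
fp_c = "00000000000010000000000100000100"
fp_d = "10000000000001000000001100000100"


def on_bits(bit_string):
    """Return the set of positions holding '1' in a fingerprint string."""
    if set(bit_string) - {"0", "1"}:
        raise ValueError("fingerprint must contain only '0' and '1'")
    return {i for i, bit in enumerate(bit_string) if bit == "1"}


def tanimoto(fp_1, fp_2):
    """Tanimoto similarity: shared on-bits divided by on-bits set in either."""
    if len(fp_1) != len(fp_2):
        raise ValueError("fingerprints must have the same length")
    bits_1, bits_2 = on_bits(fp_1), on_bits(fp_2)
    union = bits_1 | bits_2
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_1 & bits_2) / len(union)


sim_ad = tanimoto(fp_a, fp_d)
sim_ac = tanimoto(fp_a, fp_c)
print(sorted(on_bits(fp_a)))  # [0, 11, 21, 23, 29]
print(round(sim_ad, 3))       # 0.429
print(round(sim_ac, 3))       # 0.333
```

A score like this could, for instance, support the transformation site similarity idea listed under possible improvements, by ranking stored transforms according to how closely their attachment environment resembles that of the query.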