Practical Applications of Matched Molecular Pairs at Vernalis

• What are they?
• How do we find them?
• How to deploy for users?
Steve Roughley
Richard Sherhod
About Vernalis
• Expertise
• Fragments and structure‐based drug discovery
(Protein Science, Structural Biology, Chemistry)
• Therapeutic areas
• Oncology, CNS, infectious diseases
• Location
• Based in Granta Park, outside Cambridge, UK
Trusted community contributor since 2013 (2 KNIME‐trained developers)
16 March 2017
Matched Molecular Pairs (MMPs)
Definition
“MMP can be defined as a pair of molecules that differ in only a minor single point change”
(Wikipedia)
CHEMBL2263252
CHEMBL60592
• Multiple open‐source implementations in various forms
• At least 2 in KNIME
• Vernalis
• Erlwood
• Recently reviewed
• Christian Tyrchan and Emma Evertsson, Comput. & Struct. Biotech. J., 2017, 15, 86‐90
Anatomy of a Matched Molecular Pair
• Hussain‐Rea Algorithm [3]
• Identify bonds that can be broken
• E.g. acyclic bonds
• Break molecule along each matching bond in turn
• Match identical ‘Keys’
• ‘Values’ form pair transforms
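The key/value matching step above can be sketched in a few lines of Python. This is a minimal illustration, not the Vernalis node implementation: it assumes the bond cutting has already been done by a cheminformatics toolkit and that keys and values arrive as canonical strings. The function name `find_pairs`, the molecule IDs and the fragment strings are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_pairs(fragmentations):
    """Group (mol_id, key, value) rows by key and pair up differing values."""
    by_key = defaultdict(list)
    for mol_id, key, value in fragmentations:
        by_key[key].append((mol_id, value))

    pairs = []
    for key, members in by_key.items():
        # Every two molecules sharing a key form a candidate pair
        for (id_a, val_a), (id_b, val_b) in combinations(members, 2):
            if val_a != val_b:  # identical values are not a transform
                pairs.append((id_a, id_b, key, f"{val_a}>>{val_b}"))
    return pairs


# Hypothetical, pre-canonicalised single-cut fragmentations
rows = [
    ("mol_A", "[*]c1ccccc1", "[*]F"),
    ("mol_B", "[*]c1ccccc1", "[*]Cl"),
    ("mol_C", "[*]C1CCCCC1", "[*]F"),
]
pairs = find_pairs(rows)
```

Here only `mol_A` and `mol_B` share a key, so they give the single transform `[*]F>>[*]Cl`; `mol_C` has a different key and pairs with nothing.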
[Figure: Molecule A and Molecule B are each cut at one bond; the identical keys are matched and the values form the pair transform]
3. Jameed Hussain, Ceara Rea, J. Chem. Inf. Model., 2010, 50, 339–348
Anatomy of a Matched Molecular Pair (example)
[Figure: the same scheme applied to CHEMBL2263252 and CHEMBL60592; each is cut, the identical keys are matched, and the values form the pair transform]
Multi‐cut pairs
• Pairs can also be formed by cutting 2 or more bonds simultaneously
• Allows scaffold replacement transforms
• Need to track which breaking bond is which
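One way to track which broken bond is which is to renumber the attachment points into a canonical order before comparing keys. The sketch below is an assumption about how this could be done, not a description of the Vernalis nodes: it works on strings only, assumes each key component carries exactly one labelled attachment point of the form `[*:n]`, and uses hypothetical fragments. Keys with two identical components remain ambiguous and are not handled.

```python
import re

_LABEL = re.compile(r"\[\*:(\d+)\]")


def normalise_labels(key_parts, value):
    """Renumber attachment points so equal keys give equal strings."""
    # Order key components by their text with the labels blanked out
    ordered = sorted(key_parts, key=lambda s: _LABEL.sub("[*]", s))

    # Old label -> new label, following the sorted component order
    mapping = {}
    for new, part in enumerate(ordered, start=1):
        mapping[_LABEL.search(part).group(1)] = str(new)

    def relabel(text):
        return _LABEL.sub(lambda m: f"[*:{mapping[m.group(1)]}]", text)

    return ".".join(relabel(p) for p in ordered), relabel(value)


# Two hypothetical double-cut fragmentations, labelled in different orders
key_a, val_a = normalise_labels(["[*:2]C(=O)O", "[*:1]c1ccccc1"], "[*:1]CC[*:2]")
key_b, val_b = normalise_labels(["[*:1]C(=O)O", "[*:2]c1ccccc1"], "[*:2]CCC[*:1]")
transform = f"{val_a}>>{val_b}" if key_a == key_b else None
```

After renumbering, both keys read `[*:1]C(=O)O.[*:2]c1ccccc1`, and the two values use the same label for the same side of the molecule, so the transform is unambiguous.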
[Figure: two molecules are each cut at two bonds, with the attachment points numbered 1 and 2; the identical keys are matched and the values form the pair transform]
Multi‐cut pairs (example)
[Figure: the same scheme applied to CHEMBL373838 and CHEMBL1368873; each is cut, the identical keys are matched, and the values form the pair transform]
Transforms application
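[Figure: the transform A>>B replaces value ‘A’ in Molecule A with value ‘B’ to give Molecule B]
• Transform takes no account of relevance or context
• Original molecules interconvert when transform is applied
[Figure: A>>B relates CHEMBL2263252 and CHEMBL60592]
• Other molecules generate new ideas
[Figure: A>>B applied to CHEMBL1350874 gives a new molecule]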
Not found in ChEMBL
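Applying a transform can be illustrated at the fragment level. The sketch below is a simplification under stated assumptions: molecules are represented by hypothetical, pre-computed (key, value) fragmentations, and the result is returned as a (key, new value) pair instead of being recombined into a molecule, which would need a cheminformatics toolkit. It also shows why a bare transform ignores context: it fires wherever the left-hand value is found, whatever the key.

```python
def apply_transform(fragmentations, transform):
    """Return (key, new_value) for every fragmentation whose value matches."""
    left, right = transform.split(">>")
    return [(key, right) for key, value in fragmentations if value == left]


def reverse(transform):
    """Swap the two sides of a transform, B>>A from A>>B."""
    left, right = transform.split(">>")
    return f"{right}>>{left}"


transform = "[*]F>>[*]Cl"

# Hypothetical fragmentations of one of the original pair members
molecule_a = [("[*]c1ccccc1", "[*]F")]
as_b = apply_transform(molecule_a, transform)       # gives the partner
back_to_a = apply_transform(as_b, reverse(transform))  # and back again

# An unrelated molecule with the same value also matches
other = [("[*]C1CCNCC1", "[*]F"), ("[*]F", "[*]C1CCNCC1")]
ideas = apply_transform(other, transform)
```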
DESTRUCTION TESTING
Can we fragment all of ChEMBL?
A ‘reasonable’ ‘representative’ test set
Pre‐processing – “Speedy SMILES”
Vernalis Community Nodes
• Fast, string‐based SMILES manipulation
• No chemical toolkit conversion
e.g. c1cc[nH]c1C(=O)OCCN(C)C
• Streamable
• Example application – pre‐processing ChEMBL
• De‐salt
• Remove large (HAC>40) or small (HAC<8) molecules
• and those with a net charge or large number of charges
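The kind of toolkit-free filtering listed above can be approximated with regular expressions. This is a rough sketch, not the Speedy SMILES implementation: the function names are hypothetical, "de-salt" is taken to mean keeping the component with the most heavy atoms, and "number of charges" is taken to mean the number of charged atoms. The size limits are passed in as parameters instead of being hard-coded.

```python
import re

# Bracket atoms first, then two-letter and one-letter organic-subset atoms
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
_BRACKET = re.compile(r"\[[^\]]+\]")
_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|se|as|[bcnops]|\*)")
_CHARGE = re.compile(r"([+-]+)(\d*)")


def _is_heavy(token):
    """True unless the token is a bracketed hydrogen or dummy atom."""
    if not token.startswith("["):
        return True
    m = _SYMBOL.match(token)
    return bool(m) and m.group(1) not in ("H", "*")


def _charge(token):
    """Formal charge written inside one bracket atom, e.g. [O-] or [Fe+3]."""
    m = _CHARGE.search(token)
    if not m:
        return 0
    signs, digits = m.groups()
    size = int(digits) if digits else len(signs)
    return size if signs[0] == "+" else -size


def heavy_atom_count(smiles):
    return sum(_is_heavy(m.group()) for m in _ATOM.finditer(smiles))


def desalt(smiles):
    """Keep the dot-separated component with the most heavy atoms."""
    return max(smiles.split("."), key=heavy_atom_count)


def net_charge(smiles):
    return sum(_charge(b) for b in _BRACKET.findall(smiles))


def charged_atoms(smiles):
    return sum(1 for b in _BRACKET.findall(smiles) if _charge(b) != 0)


def passes(smiles, min_hac, max_hac, max_charged=4):
    """Apply the de-salt, size and charge filters to one SMILES string."""
    parent = desalt(smiles)
    return (min_hac <= heavy_atom_count(parent) <= max_hac
            and net_charge(parent) == 0
            and charged_atoms(parent) <= max_charged)


example = "c1cc[nH]c1C(=O)OCCN(C)C"
hac = heavy_atom_count(example)
```

Because every function takes a string and returns a value without shared state, rows can be processed one at a time, which is what makes this style of node streamable.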
SpeedySMILES pre‐processing
ChEMBL: 1,581,653 molecules in; processed in 76 seconds; 1,486,077 molecules out
Failure Category     Count
HAC < 8 or > 50      63,920
Non‐neutral          30,706
Total Charges > 4    949
Broken Bonds         1
ChEMBL fragmentation
The numbers…
• 1,420,462 molecules fragmented
• 1‐10 cuts
• Non‐functional group single bonds
• Maximum 10,000 fragmentations / molecule
• 134,020,007 fragments
• 139,679 failed rows:
Failure category          Count
Complexity limit          139,550
Too few matching bonds    79
Valence error             25
Kekulisation error        25
• 10 h 30 min (Intel® Core™ i7‐4770 @ 3.4 GHz; Windows 10)
• 10 threads; 500 rows buffer
• -Xmx16329m
This version will be released to the community ‘imminently’
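The need for a complexity limit follows from simple counting: with up to 10 simultaneous cuts, the number of bond subsets grows very quickly with the number of matching bonds. The sketch below computes that count. It is only an upper bound under the assumption that every subset of matching bonds is tried; the slides do not say how the node counts possible fragmentations, and not every subset is necessarily a valid multi-cut. The function name is hypothetical.

```python
from math import comb


def max_fragmentations(n_bonds, max_cuts=10):
    """Upper bound: ways to choose 1..max_cuts bonds out of n_bonds."""
    return sum(comb(n_bonds, k) for k in range(1, max_cuts + 1))


limit = 10_000  # the per-molecule limit quoted above
counts = {n: max_fragmentations(n) for n in (10, 13, 14, 20)}
over_limit = [n for n, c in counts.items() if c > limit]
```

On this bound, a molecule with 13 matching bonds stays under 10,000 candidate fragmentations, while one with 14 already exceeds it.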
Example failure rows
New Vernalis Matching Bonds Renderer Node
• CHEMBL2297882: Molecule failed complexity limit; 15965 possible fragmentations
• CHEMBL1698868: No matching bonds found, or too few to cut
• CHEMBL178180: Error parsing … Explicit valence for atom # 8 Te, 4, is greater than permitted
• CHEMBL3188982: Error parsing … OC(=O)C(=O)Nc1cccc(c1)c2nnnn2 … Unkekulized atoms
• CHEMBL2006679: Error parsing … Unkekulized atoms 4
Matched Molecular Pairs (MMPs)
• The MMP concept has been extended to improve utility
• Data analysis
• What effect does a transform have on data, e.g. Metabolism/Stability, hERG binding, target binding?
• Matched Molecular Series
• When my series is seen in activity order, what other new members are commonly ‘better’?
• Fingerprint similarity
• How closely related is the surrounding chemical matter to my input molecule?
• 3D Matched Pairs
• Molecular shape/pharmacophore presentation
• In all cases, provides additional ‘context’ to the pairs
• Can be used for substituent analysis/replacement or scaffold replacement
MMAnalyser: Applied MMP/S Analysis
Richard Sherhod
MMAnalyser
• KNIME Web Portal application
• Composed of multiple interactive KNIME workflows
• Allows chemists to do matched‐molecular pair/series analysis
• Guides users through the analysis process
MMAnalyser: Application
• Two interactive workflows for chemists
• MMPair Analyser – matched‐molecular pair analysis
• MMSeries Analyser – matched molecular series analysis
• Interactive admin workflows for database maintenance
• Rebuilding MMPair and MMSeries databases from pre‐defined sources
Kenny, P.W. & Sadowski, J., 2005. Structure Modification in Chemical Databases. In Wiley‐VCH Verlag GmbH & Co. KGaA, pp. 271–285. Available at: http://doi.wiley.com/10.1002/3527603743.ch11 [Accessed March 6, 2017].
Wawer, M. & Bajorath, J., 2011. Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing. Journal of Medicinal Chemistry, 54(8), pp.2944–2951. Available at: http://pubs.acs.org/doi/abs/10.1021/jm200026b [Accessed March 6, 2017].
MMPair Analyser: Application
MMP Analysis
[Diagram: the input structure and the pre‐generated transformations with observation data feed the MMP analysis]
• All transforms applied to the input structure
• Results filtered and sorted by:
• Observation count
• Enrichment of positive observations
MMPair Analyser: Database
ChEMBL data
• Molecule dictionary
• Compound structures
• Compound properties (QED)
• Compound records
• Activities
• Assays
• Documents
Generate MMPs (from filtered structures)
1. Fragment structures
• Fragments (values) and scaffolds (keys)
2. Add hydrogens to fragments and scaffolds
3. Generate transformations from fragments
• Record ID of left and right structures
Gather evidence (from observations)
1. Filter transformations by observation count
2. Get observed changes in property for each transformation
3. Calculate enrichment of positive observations
4. Perform one‐tailed binomial test
• Keep transforms with p <= 0.05
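The enrichment and binomial test steps can be written out with the standard library alone. This is a sketch under assumptions the slides do not state: a "positive observation" is taken to be a pair where the property moved in the desired direction, and the null hypothesis is that positive and negative changes are equally likely (`p_null = 0.5`). The function names are hypothetical, and filtering on the resulting p-value is left to the caller.

```python
from math import comb


def enrichment(n_positive, n_total):
    """Fraction of observations of a transform that were positive."""
    return n_positive / n_total


def binomial_p_one_tailed(n_positive, n_total, p_null=0.5):
    """P(X >= n_positive) for X ~ Binomial(n_total, p_null)."""
    return sum(
        comb(n_total, i) * p_null**i * (1 - p_null) ** (n_total - i)
        for i in range(n_positive, n_total + 1)
    )


# A transform seen 10 times, improving the property in 8 or 9 of them
p_8_of_10 = binomial_p_one_tailed(8, 10)
p_9_of_10 = binomial_p_one_tailed(9, 10)
```

With only 10 observations, 8 positives gives a p-value just above 0.05 and 9 positives gives one well below it, which shows why filtering by observation count comes first.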
MMPair Analyser: Demo
MMSeries Analyser
Matched‐molecular Series Analysis
• Extension of MMPs to three or more R‐groups (values)
• Originally proposed by Wawer & Bajorath (2011)
• Several implementations, e.g. MATSY (O’Boyle et al. 2014)
• R‐groups ordered by the properties of their parent
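R     pIC50 (series 1)   pIC50 (series 2)
H     7.00               8.30
F     7.68               8.00
Cl    8.51               7.77
Br    8.77               8.89
Series 1 order: Br > Cl > F > H
Series 2 order: Br > H > F > Cl
[Figure: the two parent scaffolds bearing R; structure labels Ms and MeO]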
O’Boyle, N.M. et al., 2014. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity. Journal of Medicinal Chemistry, 57(6), pp.2704–2713. Available at: http://pubs.acs.org/doi/abs/10.1021/jm500022q [Accessed March 3, 2017].
Wawer, M. & Bajorath, J., 2011. Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing. Journal of Medicinal Chemistry, 54(8), pp.2944–2951. Available at: http://pubs.acs.org/doi/abs/10.1021/jm200026b [Accessed March 6, 2017].
MMSeries Analyser: Application
MMS Analysis
Input structures
With unique IDs and numeric data
• Query structures are fragmented into scaffolds and R‐groups
• Sets of R‐groups, their scaffold and data are arranged into series
• Query series are compared to pre‐generated series
• Common R‐groups are recorded
• Query and database series ordered by data
• Spearman's rank correlation between matching series calculated
Pre‐generated sets of scaffolds and R‐groups with data
• Matching series sorted by rank correlation
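The comparison step can be sketched as a Spearman rank correlation over the R‐groups that two series have in common. The code below is an illustration, not the MMSeries Analyser workflow: the function names are hypothetical, ties receive average ranks, and the example values are the pIC50 figures from the H/F/Cl/Br series shown earlier in the deck.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of the ranks; undefined if either side is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def series_correlation(query, database):
    """Correlate two {R-group: value} series over their shared R-groups."""
    common = sorted(set(query) & set(database))
    return spearman([query[r] for r in common], [database[r] for r in common])


series_1 = {"H": 7.00, "F": 7.68, "Cl": 8.51, "Br": 8.77}
series_2 = {"H": 8.30, "F": 8.00, "Cl": 7.77, "Br": 8.89}
rho = series_correlation(series_1, series_2)
```

The two example series agree only on Br being best, so their correlation is weak; a database series ordered the same way as the query would score 1.0 and sort to the top.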
MMSeries Analyser: Application
[Figure: table of matching scaffolds with their series data and rank correlations; the correlations shown include 0.975, 0.9 and 0.5]
MMSeries Analyser: Database
ChEMBL data
• Molecule dictionary
• Compound structures
• Compound properties (MW)
• Compound records
• Activities
• Assays
• Documents
[Diagram: filtered structures feed “Generate MMPs”; observations feed “Gather evidence”]
Generate MMPs
1. Fragment structures into R‐groups and scaffolds
• 4 methods including Hussain/Rea rules
2. Group R‐groups into series by common parent scaffolds
3. Keep series of 3 or more R‐groups
4. Record IDs of parent structures
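Steps 2 to 4 above amount to a group-by on the scaffold followed by a size filter. The sketch below assumes the fragmentation of step 1 has already produced (parent ID, scaffold, R‐group) rows as canonical strings; the function name, IDs and fragment strings are hypothetical.

```python
from collections import defaultdict


def build_series(fragments, min_size=3):
    """Group R-groups by scaffold and keep series with enough members."""
    by_scaffold = defaultdict(dict)
    for parent_id, scaffold, r_group in fragments:
        # Record which parent structure carries each R-group
        by_scaffold[scaffold][r_group] = parent_id
    return {
        scaffold: members
        for scaffold, members in by_scaffold.items()
        if len(members) >= min_size
    }


rows = [
    ("id_1", "[*]c1ccccc1", "[*]H"),
    ("id_2", "[*]c1ccccc1", "[*]F"),
    ("id_3", "[*]c1ccccc1", "[*]Cl"),
    ("id_4", "[*]C1CCCCC1", "[*]F"),
    ("id_5", "[*]C1CCCCC1", "[*]Cl"),
]
series = build_series(rows)
```

Keeping the parent IDs alongside each R‐group is what allows the later step to attach observation data to every member of a series.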
Gather evidence
Associate parent structures with observation data
MMSeries Analyser: Demo
MMAnalyser: Possible Improvements
• Present observation data to the user
• Direct the user to relevant source material
• More datasets
• Better (more robust) processing
• Incorporate transformation site similarity (MMPs)
• Associate transformations with fingerprints from their parent scaffold(s)
• Incorporate scaffold similarity (MMSs)
• Filter/order series by similarity to query scaffold
Acknowledgments
• Vernalis colleagues
• Greg Landrum (RDKit)
• ChEMBL
• KNIME
Matched‐molecular series implementation:
Hunt, P. et al., 2017. Practical applications of matched series analysis: SAR transfer, binding mode suggestion and data point validation. Future Medicinal Chemistry, 9(2), pp.153–168. Available at: http://www.future-science.com/doi/10.4155/fmc-2016-0203 [Accessed March 3, 2017].
Thank you!
BACKUP SLIDES
Stereochemistry
• Fragmentation can create new chiral centres / double bond geometries
• Existing absolute/unknown must be preserved
[Figure: 2×2 grid of “Known Stereocentre?” (Yes/No) against “Unknown/Racemic Stereocentre?” (Yes/No), built up over several slides]
• “KNOWN KNOWNS”: We know we know about stereochemistry
• “KNOWN UNKNOWN”: We know unknown or racemic
• “UNKNOWN UNKNOWN”: We have no idea…
[Figure: the “?” case shown as alternative structures joined by “or”]
Sneak Preview – Upcoming revised release
• Memory leak fixed
• Survives ‘destruction testing’
• New fragmentation type added
• More flexibility for custom types
• New parallelised pair generation nodes
• Transform filtering options
• Reference table version
• Only generates pairs between rows from the two tables
• Improved Rendering/Filtering nodes
Attachment point fingerprints
• RDKit Morgan ECFP‐like fingerprint
• Rooted at the attachment point atom for each ‘key’ component
• Default radius 4, size 2048 bit
• Calculated during fragmentation in Vernalis Nodes
1: 10000000000100000000010100000100
1: 00001000000010000000100000000101
1: 00000000000010000000000100000100
2: 10000000000001000000001100000100
[Figure: the corresponding key structures with numbered attachment points]
Example 32‐bit AP fingerprints
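One use for attachment point fingerprints is to compare the environment around two attachment points. The sketch below does this with a Tanimoto coefficient on the example bit strings above. The choice of Tanimoto is an assumption, since the slides do not say which similarity measure is used, and the function names are hypothetical.

```python
def on_bits(bits):
    """Positions of the '1' characters in a bit string."""
    return {i for i, b in enumerate(bits) if b == "1"}


def tanimoto(a, b):
    """Shared on-bits divided by all on-bits; 0.0 if both are empty."""
    bits_a, bits_b = on_bits(a), on_bits(b)
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 0.0


# The first three example 32-bit fingerprints from the slide
fp_1 = "10000000000100000000010100000100"
fp_2 = "00001000000010000000100000000101"
fp_3 = "00000000000010000000000100000100"

sim_1_2 = tanimoto(fp_1, fp_2)
sim_1_3 = tanimoto(fp_1, fp_3)
```

With the default 2048-bit fingerprints the same functions apply unchanged; only the string length differs.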