Free Energy Perturbation at Merck: Benchmarking against Faster Methods
Vertex Free Energy Workshop 2016
05/16/16
Andreas Verras, Merck

FEP Evaluation and Benchmarking
Overview:
• Discussion of the data set used for validation
• Performance
• Domain of applicability
• Benchmarking against faster methods
• Future directions

FEP Data Set Generation
The initial data set was identified from an internal crystallographic database with a search for near neighbors. The following criteria were also applied:
• No charge state changes
• No qualified data; well-behaved binding curves only
• All chirality must be known absolutely
• No metals in the protein active site
• Symmetry events flagged but included

FEP Data Set Generation
• Initially, single-atom perturbations were identified.
• Because of the mapping workflow, we built maps containing multiple-atom perturbations.
• Hysteresis is assessed over cycles rather than back-and-forth transformations (see the cycle-closure sketch following the slides below).

Data Set
• 15 maps completed, ~200 total perturbations. Data for the first 124 pairs are reviewed here.
• Perturbation rate: ~2 per 24 hours on 4 GPUs.
• All receptor structures were prepared by modelers.
• Receptor and water inclusion were determined by someone who had supported each project.

Performance on Validation Set
• Overall R² of about 0.3
• Mean unsigned error of about 1.5 kcal/mol
• Performance varies by target and by map

FEP at Merck Is Still an Enrichment Method
• Data were binned into 1 kcal/mol ΔΔG changes (see the binning sketch following the slides below).
• In cases where FEP predicts a 1 kcal/mol change in either direction, the experimental result agrees or is essentially unchanged 95% of the time.
[Figure: predictions vs. experimental ΔΔG, binned as ≤ -1, -1 to 1, and ≥ 1 kcal/mol]

Domain Applicability – Perturbation Size
• Red box indicates perturbation sizes observed N ≥ 10 times.
• Performance likely declines with increasing perturbation size.

Domain Applicability – Ring vs. Substitution
• Perturbations were classified as Ring Changes if an atom is modified within a ring, vs. Ring Substitutions if a ring substituent is modified.
• No difference in performance was observed.

Domain Applicability – Cycle Closure Error
• Red box indicates perturbations observed N ≥ 10 times.
• Error may trend upward with increasing hysteresis.

Benchmarking – PhysProps and Fast Methods
Performance was compared against the change in physical properties, including:
• PSA and SASA
• LogD and heavy atom count
• HBD and HBA
Performance was also compared against MMGBSA scoring implemented in Schrodinger, with a 6 Å flexible active site around the ligand and default parameters.

Benchmark – SASA
[Figure: ΔSASA benchmark vs. experimental ΔΔG, binned as ≤ -1, -1 to 1, and ≥ 1 kcal/mol]

Benchmark – MMGBSA
[Figure: MMGBSA benchmark vs. experimental ΔΔG, binned as ≤ -1, -1 to 1, and ≥ 1 kcal/mol]

Benchmarking – Visual Inspection
There was concern that FEP may have been getting "easy" perturbations correct. We developed a visual inspection tool and had 18 participants vote on the perturbations. A vote of -1 indicated the left compound was more potent by at least 1 kcal/mol, +1 indicated the right compound was more potent, and 0 indicated the two compounds were within 1 kcal/mol of each other in potency. No one was allowed to vote on their own projects. A voting mean was calculated for all perturbations.

Visual Inspection Tool
[Figure: screenshot of the visual inspection tool]

Benchmark – Visual Inspection
[Figure: voting mean vs. experimental ΔΔG, binned as ≤ -1, -1 to 1, and ≥ 1 kcal/mol]

Consensus Methods
Performance of a Random Forest with AP, DP, and MOE2D descriptors (a) compared with FEP (b) and an unweighted geometric mean (c). While this is an initial evaluation on a single target, it may suggest that a consensus approach adds predictivity (see the consensus sketch following the slides below).
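The cycle-closure assessment described under "FEP Data Set Generation" can be illustrated with a minimal sketch: sum the predicted ΔΔG values around each closed cycle of a perturbation map; a perfect map sums to zero, and the residual is the hysteresis. The edge list and ΔΔG values below are hypothetical placeholders, and this is a sketch rather than the production FEP workflow.

```python
# Minimal sketch: cycle-closure ("hysteresis") error over a perturbation map.
# Edges and ddG values are hypothetical; each value is the predicted ddG of
# transforming ligand u into ligand v, in kcal/mol.
import networkx as nx

edges = {
    ("L1", "L2"): 0.8,
    ("L2", "L3"): -0.3,
    ("L3", "L1"): -0.6,   # closes the triangle
    ("L3", "L4"): 1.1,
    ("L4", "L1"): -1.4,
}

G = nx.DiGraph()
for (u, v), ddg in edges.items():
    G.add_edge(u, v, ddg=ddg)

def cycle_closure_errors(graph):
    """Sum ddG around each independent cycle; a perfect map sums to zero."""
    errors = []
    for cycle in nx.cycle_basis(graph.to_undirected()):
        total = 0.0
        for u, v in zip(cycle, cycle[1:] + cycle[:1]):
            if graph.has_edge(u, v):
                total += graph[u][v]["ddg"]
            else:                      # edge stored in the opposite direction
                total -= graph[v][u]["ddg"]
        errors.append((cycle, total))
    return errors

for cycle, err in cycle_closure_errors(G):
    print(" -> ".join(cycle), f"closure error = {err:+.2f} kcal/mol")
```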
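The headline statistics (mean unsigned error, R²) and the enrichment-style binning can be reproduced with a short sketch. The ΔΔG arrays below are made-up placeholders, and the exact bin-edge and sign conventions used in the slides are an assumption.

```python
# Minimal sketch of the summary statistics (MUE, R^2) and the "enrichment"
# view: when FEP predicts a >= 1 kcal/mol change in either direction, how
# often does experiment agree or stay essentially unchanged?
# The ddG arrays are hypothetical placeholders, not Merck data.
import numpy as np

ddg_pred = np.array([-1.8, -1.2, -0.4, 0.1, 0.9, 1.3, 2.1, -2.5])   # kcal/mol
ddg_exp  = np.array([-1.5, -0.2,  0.3, 0.4, 0.5, 1.6, 1.1, -1.9])   # kcal/mol

mue = np.mean(np.abs(ddg_pred - ddg_exp))
r2 = np.corrcoef(ddg_pred, ddg_exp)[0, 1] ** 2

def bin_ddg(x, cutoff=1.0):
    """Bin into <= -1, -1 to 1, and >= 1 kcal/mol, as in the slides."""
    return np.digitize(x, [-cutoff, cutoff])      # bins 0, 1, 2

confident = np.abs(ddg_pred) >= 1.0               # FEP calls a 1 kcal/mol change
agree = (bin_ddg(ddg_pred) == bin_ddg(ddg_exp)) | (np.abs(ddg_exp) < 1.0)
agreement_rate = agree[confident].mean()

print(f"MUE = {mue:.2f} kcal/mol")
print(f"R^2 = {r2:.2f}")
print(f"Agreement when |ddG_pred| >= 1 kcal/mol: {agreement_rate:.0%}")
```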
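The "unweighted geometric mean" consensus is not fully specified in the slide; one plausible reading, sketched below with hypothetical numbers, is a per-compound geometric mean of the potencies predicted by FEP and by the Random Forest, which is equivalent to averaging the corresponding free energies.

```python
# Sketch of one plausible reading of the "unweighted geometric mean" consensus:
# average two methods' predicted potencies on a geometric (log/free-energy)
# scale. The prediction arrays are hypothetical and this may not match the
# exact consensus used in the slides.
import numpy as np

ki_fep_nM = np.array([12.0, 450.0, 3.2, 88.0])   # potencies implied by FEP
ki_rf_nM  = np.array([25.0, 150.0, 9.5, 40.0])   # potencies from the Random Forest

# Unweighted geometric mean of the two predictions per compound; equivalent
# to averaging the two predicted binding free energies.
ki_consensus_nM = np.sqrt(ki_fep_nM * ki_rf_nM)

print(ki_consensus_nM.round(1))
```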
Conclusions
• FEP is still an enrichment method.
• Performance depends on both the target and the map.
• Error likely depends on perturbation size and hysteresis, but our understanding of the domain of applicability is incomplete.
• FEP outperforms cheaper methods and physical properties.
• FEP outperforms visual inspection.
• Including consensus scores may improve performance.

Future Directions
• Prospective application in projects.
• Work with others to better understand the domain of applicability and user-to-user variation.
• Evaluate consensus approaches.

Acknowledgements
Merck: Alejandro Crespo, Kerim Babaoglu, John Sanders, Deping Wang, Xavier Fradera, Hakan Gunaydin, Hongwu Wang, Michael Altman, Sung-Sau So, Jennifer Johnston, Daniel Mcmasters, Matt Walker, Robert Sheridan, Zhuyan Guo, Yuan Hu, Chip Lesburg, Frank Brown, Brad Sherborne
Schrodinger: Alessandro Monge, Jeff Sanders, Fiona McRobb, Teng Lin, Thijs Beuming

Backup

Benchmark – MMGBSA
All protein–ligand complexes were rescored by MMGBSA. MMGBSA is not predictive of experimental affinities for our data set.

Benchmark – PhysProp
Benchmarks against delta logD between pairs (left) and delta heavy atom count (right). Neither physical property is predictive of affinity for our data set (a property-delta sketch follows below).
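As a sketch of the physical-property baselines in the backup slides, per-pair property deltas can be computed with RDKit. The SMILES pairs below are hypothetical, cLogP is used as a crude stand-in for measured LogD, and SASA (a 3D property) is omitted.

```python
# Minimal sketch of the physical-property baseline: for each perturbation pair,
# compute the change in simple 2D properties, which can then be compared
# against the experimental ddG. SMILES are hypothetical examples.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def props(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return {
        "TPSA": Descriptors.TPSA(mol),
        "cLogP": Descriptors.MolLogP(mol),        # surrogate for measured LogD
        "HeavyAtoms": mol.GetNumHeavyAtoms(),
        "HBD": Lipinski.NumHDonors(mol),
        "HBA": Lipinski.NumHAcceptors(mol),
    }

pairs = [("c1ccccc1O", "c1ccccc1OC"),           # phenol -> anisole
         ("c1ccccc1N", "c1ccccc1NC(=O)C")]      # aniline -> acetanilide

for smi_a, smi_b in pairs:
    pa, pb = props(smi_a), props(smi_b)
    delta = {k: round(pb[k] - pa[k], 2) for k in pa}
    print(f"{smi_a} -> {smi_b}: {delta}")
```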