Applicability Domain: Towards a more formal definition
17th International Conference on QSAR in Environmental and Health Sciences
Thierry Hanser, Research Leader
[email protected]

Presentation aims
• Applicability Domain (AD) is not a monolithic concept: there are 3 key layers
• Separation of concerns can help clarify and formalise the notion of AD
• Purpose: initiate a constructive discussion among the (Q)SAR community to build a common understanding together
• Harmonize the way we define and present AD to the end user across models and applications
• Remove confusion for the end user and improve the value of the AD model

Current understanding and definitions
(Q)SAR principles (OECD):
• A defined endpoint
• An unambiguous algorithm
• A defined domain of applicability
• Appropriate measures of goodness-of-fit, robustness and predictivity
• A mechanistic interpretation, if possible

Common definition:
"AD is the response and chemical structure space in which the model makes predictions with a given reliability."
Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship (Q)SAR Models; OECD Series on Testing and Assessment No. 69; OECD Environment Directorate, Environment, Health and Safety Division: Paris, 2007.
This definition already entangles several distinct notions: boundaries ("the response and chemical structure space"), applicability ("in which the model makes predictions"), and reliability or likelihood ("with a given reliability"). It is nevertheless a good foundation on which to build.

A good foundation on which to build:
• Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inf. 2016 May 1;35(5):160–80.
• Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O. Applicability Domain for QSAR Models: Where Theory Meets Reality. International Journal of Quantitative Structure-Property Relationships. 2016 Jan;1(1):45–63.
• Norinder U, Rybacka A, Andersson PL. Conformal prediction to define applicability domain – A case study on predicting ER and AR binding. SAR and QSAR in Environmental Research. 2016 Apr 2;27(4):303–16.
• Toccaceli P, Nouretdinov I, Gammerman A. Conformal Predictors for Compound Activity Prediction. arXiv:1603.04506 [cs]. 2016 Mar 14. Available from: http://arxiv.org/abs/1603.04506
• Roy K, Kar S, Ambure P. On a simple approach for determining applicability domain of QSAR models. Chemometrics and Intelligent Laboratory Systems. 2015 Jul 15;145:22–9.
• Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model. 2015 Jun 22;55(6):1098–107.
• Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR Modeling: Where have you been? Where are you going to? J Med Chem. 2014 Jun 26;57(12):4977–5010.
• Carrió P, Pinto M, Ecker G, Sanz F, Pastor M. Applicability Domain Analysis (ADAN): A Robust Method for Assessing the Reliability of Drug Property Predictions. J Chem Inf Model. 2014 May 27;54(5):1500–11.
• Toplak M, Močnik R, Polajnar M, Bosnić Z, Carlsson L, Hasselgren C, et al. Assessment of Machine Learning Reliability Methods for Quantifying the Applicability Domain of QSAR Regression Models. J Chem Inf Model. 2014 Feb 24;54(2):431–41.
• Dragos H, Gilles M, Alexandre V. Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models. J Chem Inf Model. 2009 Jul 27;49(7):1762–76.
• And many more...
Current common methods
Molecule classes:
• Organic / organometallic / inorganic
• Class of molecules (e.g. aromatic amines)
Feature representation:
• Unseen features (e.g. a boronic acid group in the query compound when there are no boronic acids in the training set)
Agreement based:
• Random Forest consensus (distribution of the individual tree predictions)
• k-nearest-neighbour (kNN) agreement
Descriptor ranges:
• Bounding box over the descriptor ranges (e.g. D1, D2)
• Convex hull of the training set in descriptor space
Distance based methods:
• Distance to the training data points
• Density of the training data around the query
Response domain:
• Distribution of the predicted property
Two of these checks (descriptor ranges and distance to the training data) are sketched in code after the next slide.

Mixture of different concepts
These common methods mix several distinct questions:
• Applicability: can I use this model to make a prediction?
• Reliability: is the prediction reliable?
• Decidability (likelihood): can I make a clear decision?
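To make the mechanics of these common checks concrete, here is a minimal sketch in Python of two of them: a descriptor-range (bounding box) test and a distance-to-nearest-neighbours test. It is an illustration only, not the method of any particular tool; the descriptor matrices X_train and X_query, the neighbour count k and the percentile threshold are all assumed names and values.

# Minimal sketch of two common applicability-domain checks (illustrative only).
# X_train and X_query are assumed numeric descriptor matrices
# (rows = compounds, columns = descriptors).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def in_bounding_box(X_train, X_query):
    # Descriptor-range check: is each query inside the per-descriptor
    # min/max box spanned by the training set?
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return np.all((X_query >= lo) & (X_query <= hi), axis=1)

def within_knn_distance(X_train, X_query, k=5, percentile=95):
    # Distance-based check: is the mean distance of a query to its k
    # nearest training compounds no larger than what is typical
    # inside the training set itself?
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    # k + 1 because the nearest neighbour of a training compound is itself.
    d_train, _ = nn.kneighbors(X_train, n_neighbors=k + 1)
    threshold = np.percentile(d_train[:, 1:].mean(axis=1), percentile)
    d_query, _ = nn.kneighbors(X_query, n_neighbors=k)
    return d_query.mean(axis=1) <= threshold

# Example with random data standing in for real descriptors:
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
X_query = rng.normal(size=(5, 4))
print(in_bounding_box(X_train, X_query))
print(within_knn_distance(X_train, X_query))

Note that the two checks answer different questions: the bounding box is about the model's boundaries (applicability), while the neighbour distance says something about how well supported an individual prediction is (reliability). This is exactly the mixture of concepts that the extended framework below separates.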
Applicability Domain: Towards an extended and more formal framework
Confidence in the decision if:
• My model can be applied for this query compound (Applicability domain)
• The prediction is reliable enough for my use case (Reliability domain)
• I can make a clear decision (Decidability domain)

Applicability (of the model)
Can I apply my model? If not, the query is out of the Applicability Domain; otherwise, perform the prediction.
Applicability reflects the model boundaries (the designer's specifications):
• Is the class of my query compound supported by the model? e.g. exclude polymers, proteins, inorganic molecules
• Is my query compound in the descriptor range of the training set? e.g. inside the convex hull, minimum information density
• Has my model seen all the structural features present in the query compound? e.g. an unseen boronic acid functional group

Reliability (of the prediction)
Can I trust my prediction? If not, the query is out of the Reliability Domain; otherwise, look at the prediction.
Reliability reflects the prediction boundaries (user defined):
• How close are the nearest neighbours?
• How reliable are these nearest data points? e.g. GLP compliance
• How well did my model predict the nearest neighbours? e.g. performance during cross-validation

Decidability (of the outcome)
Can I make a clear call? If not, the outcome is equivocal / undecided; otherwise, make a statement.
Decidability reflects the likelihood boundaries (user defined):
• Does my evidence converge or conflict? e.g. the distribution of the nearest neighbours
• Is there a consensus between intermediate conclusions? e.g. the distribution of the Random Forest trees
• Is my posterior likelihood strong enough? e.g. the Naïve Bayes posterior probability

3-step process (separation of concerns)
1. Can I apply my model? (Applicability) If not: out of Applicability Domain. Otherwise perform the prediction.
2. Can I trust my prediction? (Reliability) If not: out of Reliability Domain. Otherwise look at the prediction.
3. Can I make a clear call? (Decidability) If not: equivocal / undecided. Otherwise make a statement.
This gives an intuitive, non-ambiguous and formal decision framework; a code sketch of the 3-step gate is given further below.

Decision Domain
Proposal: extend and refine the Applicability Domain into a Decision Domain.
"The Decision Domain (DD) is the scope within which it is possible to apply the model to make a non-equivocal decision based on a reliable prediction."

Confidence in the decision
Confidence (which is subjective) builds on the three layers: applicability, reliability and decidability.
• Low vs. high reliability: a high predicted likelihood cannot be trusted when the query lies outside the Reliability Domain ("we can't trust this high likelihood!").
• Low vs. high decidability: some calls are inherently difficult (e.g. near an activity cliff); the outcome is then equivocal / undecided rather than a confident call.

Reliability and likelihood are not compensatory
• Reliability and likelihood are not interconvertible.
• They cannot compensate each other (e.g. a low likelihood cannot be compensated by a high reliability).
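As a complement, here is a minimal, runnable sketch of the 3-step gate described above, using a Random Forest classifier on stand-in data. The specific choices (a bounding-box test for applicability, a mean 5-nearest-neighbour distance for reliability, the forest's class probability for decidability, and all threshold values) are illustrative assumptions, not prescriptions from the presentation.

# Sketch of the 3-step decision process: Applicability -> Reliability -> Decidability.
# All thresholds and the stand-in data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 4))          # stand-in descriptors
y_train = (X_train[:, 0] > 0).astype(int)    # stand-in endpoint

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
nn = NearestNeighbors(n_neighbors=5).fit(X_train)

def decide(x, reliability_dist=2.0, decidability_p=0.8):
    x = x.reshape(1, -1)

    # 1. Applicability: model boundaries (designer's specifications),
    #    here a simple descriptor-range check against the training set.
    if not np.all((x >= X_train.min(axis=0)) & (x <= X_train.max(axis=0))):
        return "Out of Applicability Domain"

    # 2. Reliability: prediction boundaries (user defined),
    #    here the mean distance to the 5 nearest training compounds.
    dist, _ = nn.kneighbors(x)
    if dist.mean() > reliability_dist:
        return "Out of Reliability Domain"

    # 3. Decidability: likelihood boundaries (user defined),
    #    here the Random Forest consensus (predicted class probability).
    proba = model.predict_proba(x)[0]
    if proba.max() < decidability_p:
        return "Equivocal / Undecided"

    return "Decision: class {} (inside the Decision Domain)".format(proba.argmax())

print(decide(rng.normal(size=4)))

Keeping the three gates separate makes the reported outcome unambiguous: a confident forest vote at step 3 is never allowed to mask a failure at step 1 or 2, reflecting the point that reliability and likelihood cannot compensate each other.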
Transparent decision process
• My model can be applied for this query compound (Applicability)
• The prediction is reliable enough for my use case (Reliability)
• I can make a clear decision (Decidability)
• I understand which assumptions the model has used in its reasoning (Interpretation)
• I know which data (examples) support the model's conclusion (Evidence)
The first three items define the Decision Domain; the last two provide decision support. Together they enable a transparent decision.

TARDIS principle
Requirements to usefully support a decision process:
• Transparency (of the method)
• Applicability (of the model)
• Reliability (of the prediction)
• Decidability (non-equivocal conclusion)
• Interpretability (of the conclusion)
• Support (evidence supporting the conclusion)

Conclusion
• Confidence in a decision can be assessed in 3 steps corresponding to 3 separate concerns
• Separation of concerns offers clarity and transparency and can be more easily formalised
• In the context of risk assessment, models should provide transparent, interpretable, reliable and non-equivocal conclusions that can be easily assessed by human experts using their own knowledge
• It is the human experts who make the final decision, based on their knowledge and the helpful information provided by the tools

Thank you for your contribution and attention!