Update on Negative Predictions in Derek
vICGM3, 14th May 2014
Richard Williams
Principal Scientist, Lhasa Limited
[email protected]
Summary
• Why implement negative predictions?
• What has been implemented?
• Definitions of new concepts
 Misclassified and unclassified features
 Lhasa Ames test reference set
• How does it look?
• What is the performance?
• How should negative predictions be interpreted?
 Two worked examples
Why implement negative predictions?
• To assist users when no alerts or examples have been
matched for bacterial, in vitro mutagenicity
 Increase confidence in non-alerting outcomes (which have often
been interpreted as ‘negative’)
 Provide information to support expert assessment
 Are there any features that could be considered concerning?
 Develop ‘nothing to report’ into more meaningful outcomes
• The necessity for this is dependent on the application of the predictions
 Early screening – non-alerting may be good enough
 Regulatory submissions – should consider the reliability of a non-alerting outcome
 Derek now provides the information to support this assessment
No changes for alerting compounds
[Diagram: a query that matches an alert or example is reported as before]
Starting position: nothing to report
[Diagram: a query with no alert or example match previously gave 'nothing to report']
What has been implemented – background work
[Flowchart: aggregate public data sets into a Lhasa Ames test reference set → process the reference set against Derek mutagenicity alerts → identify non-alerting mutagens]
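In code, this background step might look like the minimal Python sketch below; `derek_alerts` and the reference records are hypothetical stand-ins, since neither Derek's alert matcher nor the full data set is public.

```python
def derek_alerts(smiles: str) -> list:
    """Hypothetical stub for Derek's bacterial in vitro mutagenicity alert matching."""
    return []  # a real matcher would return the alerts fired by the structure

# Hypothetical records from the aggregated Lhasa Ames test reference set
reference_set = [
    ("CCOC(=O)C", False),            # (SMILES, is_mutagen)
    ("O=[N+]([O-])c1ccccc1", True),
]

# Non-alerting mutagens are the compounds from which misclassified features are drawn
non_alerting_mutagens = [smi for smi, is_mutagen in reference_set
                         if is_mutagen and not derek_alerts(smi)]
print(non_alerting_mutagens)
```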
What has been implemented – workflow
[Flowchart: for a query with no alert or example match, two checks are made before the prediction is given:]
• Does the query contain features found in non-alerting mutagens? → misclassified features
• Does the query contain features not found in the reference set? → unclassified features
What has been implemented – four potential outcomes
[Diagram: a query with no alert or example match receives one of four predictions:]
• Inactive
• Inactive with misclassified features
• Inactive with unclassified features
• Inactive with mis- and unclassified features
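A minimal sketch of the four-outcome decision logic; the two boolean checks stand in for Derek's internal feature searches against the reference set.

```python
def negative_prediction(has_misclassified: bool, has_unclassified: bool) -> str:
    """Classify a query that matched no mutagenicity alert or example."""
    if has_misclassified and has_unclassified:
        return "Inactive with mis- and unclassified features"
    if has_misclassified:
        return "Inactive with misclassified features"
    if has_unclassified:
        return "Inactive with unclassified features"
    return "Inactive"

print(negative_prediction(False, False))  # Inactive
print(negative_prediction(True, True))    # Inactive with mis- and unclassified features
```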
What has been implemented – new functionality: negative predictions
[Diagram: the 'no alert or example match' outcome now yields one of the negative predictions above]
Definitions: Lhasa Ames test reference set
• Lhasa Ames test reference set is an aggregation of six sets of publicly available Ames test data
 CGX (Kirkland et al)
 Hansen data set (aka the Benchmark data set)
 ISSSTY (derived from CCRIS database)
 Marketed pharmaceuticals (derived from Snyder et al publications)
 NTP data (derived from Vitic database)
 FDA CFSAN data set (provided as part of collaboration with FDA)
• Compounds with equivocal and inconsistent results have been removed
 5177 Mutagens, 5066 Non-mutagens
• We have not
 Gone back to primary references and rechecked all of the data
 Carried out searches to determine whether reported activity (a snapshot) is
representative of all published data for each compound
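A minimal sketch of this aggregation and cleaning step, assuming each public source has been exported to a CSV with hypothetical columns `smiles` and `ames_call` (the file names are placeholders, not Lhasa's actual pipeline):

```python
import pandas as pd

# Hypothetical file names for the six public sources
sources = ["cgx.csv", "hansen.csv", "isssty.csv",
           "marketed_pharma.csv", "ntp.csv", "fda_cfsan.csv"]

# Pool all records into a single table
pooled = pd.concat([pd.read_csv(f) for f in sources], ignore_index=True)

# Drop equivocal calls outright
pooled = pooled[pooled["ames_call"].isin(["positive", "negative"])]

# Keep only compounds whose pooled calls are consistent (one unique call)
n_calls = pooled.groupby("smiles")["ames_call"].nunique()
consistent = n_calls[n_calls == 1].index
reference_set = pooled[pooled["smiles"].isin(consistent)].drop_duplicates("smiles")

# The slides report 5177 mutagens and 5066 non-mutagens after this step
print(reference_set["ames_call"].value_counts())
```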
Definitions: misclassified and unclassified features
• These features are only reported for compounds that do not
activate bacterial, in vitro mutagenicity alerts or examples
• Misclassified features
 Are present in (at least one) non-alerting mutagen in publicly available data
• Unclassified features
 Are not present, in the context of the query molecule, in publicly available data
• These features are weak arguments against the inactive
prediction
 Depending on the application, the presence of such features may
require follow-up
Confidence in negative predictions
[Diagram: confidence in the negative prediction, DX3 vs DX4. DX3: nothing to report. DX4: Inactive (no misclassified or unclassified features) and Inactive (contains misclassified and/or unclassified features). Confidence can be increased or decreased by expert assessment of misclassified and/or unclassified features.]
How does it look? Inactive prediction (without misclassified or unclassified features)
[Screenshot: the prediction with its explanatory text]
How does it look? Inactive (with misclassified features) prediction
[Screenshot: the prediction with its explanatory text and the misclassified features highlighted]
How does it look? Inactive (with unclassified features) prediction
[Screenshot: the prediction with its explanatory text and the unclassified features highlighted]
Performance
• The performance of the new functionality has been
assessed using three proprietary data sets
[Bar chart: number of Ames +ve and Ames -ve compounds in each data set (Prop. 1, Prop. 2, Vitic Int.); y-axis 0–1000]
Distribution
• These charts demonstrate how many compounds fall into
each predictive category
[Pie charts: compound counts per predictive category for Prop. Dataset 1, Prop. Dataset 2 and Vitic Intermediates]
Predictivity
[Bar chart: % negative predictivity for the categories Inactive, Inactive (+mis) and Inactive (+unc) across the data sets Prop. 1, Prop. 2 and Vitic Int.; bars are labelled with group sizes (280, 29, 15; 372, 13, 31; 464, 20, 9)]
Negative predictivity
= How often are negative predictions correct?
= (Σ negative predictions made for non-mutagens) / (Σ all negative predictions)
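As a concrete illustration of the formula, a short Python sketch with made-up counts (the numbers are placeholders, not taken from the charts):

```python
# Negative predictivity = correct negative predictions / all negative predictions
negatives_for_non_mutagens = 250  # negative predictions made for true non-mutagens
negatives_for_mutagens = 30       # negative predictions made for true mutagens (misses)

all_negative_predictions = negatives_for_non_mutagens + negatives_for_mutagens
negative_predictivity = 100 * negatives_for_non_mutagens / all_negative_predictions

print(f"Negative predictivity: {negative_predictivity:.1f}%")  # 89.3%
```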
Summary of distribution and predictivity
• Majority of compounds in analysis are either alerting or inactive (with
no misclassified or unclassified features)
 Varies from 87.6% to 96.6%
• Negative predictivity is high for compounds without misclassified or
unclassified features
 Varies from 86.0% to 94.3%
 Comparable to repeatability of Ames test
• Negative predictivity for compounds with misclassified and/or unclassified features
 Is more variable (sample groups are small)
 Is reduced by the presence of unclassified and misclassified features, but in all cases comparable to Ames test repeatability
 Inactive (89.2%) > Inactive+unc. (86.7%) > Inactive+mis. (83.6%)
Interpretation – unclassified features
• Unclassified features are those that are not found following a search
in our reference set
 Built using data in the public domain
• Where these features are reported, Derek has found no alerts
 Depending on the application of the prediction, this may be good
enough (e.g. during early screening)
• In these cases, the public data can’t be used to determine the
reliability of Derek’s inactive call
 This tells us something about the data, not something about Derek
(which may be considering proprietary toxicity data, or
mechanistic/chemical data)
• If required, the significance of the unclassified features can be
determined by an expert
Interpretation – misclassified features
• Misclassified features are those that have been found in non-alerting
mutagens in our reference set
 Built using data in the public domain
• Where these features are reported, Derek has found no alerts
 Depending on the application of the prediction, this may be good
enough (e.g. during early screening)
• This is not a flag for mutagenicity
 Although present in at least one mutagen, it may not be the feature promoting mutagenicity
 Uncertainty in public data set – these are only snapshots of the whole
• If required, the significance of the misclassified features can be determined by an expert
Interpretation – how would an expert determine
significance?
• For misclassified features
 Use databases (e.g. Toxnet, Vitic) to identify similar
compounds in public data sets
• For both misclassified and unclassified features
 Use in-house data sets to identify proprietary analogues
containing the same feature
• For the specific case of GTI assessments
 If the (Ames negative) API and the evaluated impurity contain the same unclassified or misclassified feature, it can be argued that the feature is not relevant for activity
 PhRMA class 4 impurity
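For the analogue searches above, a minimal RDKit sketch; the SMARTS feature and the in-house records are hypothetical placeholders (Derek itself does not expose this API):

```python
from rdkit import Chem

# Hypothetical misclassified/unclassified feature expressed as SMARTS
feature = Chem.MolFromSmarts("c1ccccc1OC")  # e.g. an aryl alkyl ether

# Hypothetical in-house records: (compound id, SMILES, Ames result)
in_house = [
    ("CMPD-001", "COc1ccccc1C(=O)O", "negative"),
    ("CMPD-002", "CCN(CC)CCCl", "positive"),
]

# Report every analogue containing the feature, together with its Ames call
for cmpd_id, smiles, ames in in_house:
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None and mol.HasSubstructMatch(feature):
        print(f"{cmpd_id}: contains feature, Ames {ames}")
```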
Worked example 1
Worked example 1 – compound with misclassified
features
Worked example 1 – evaluate misclassified
features in public data
Worked example 1 – follow-up analysis of data
Worked example 2
Worked example 2 – potential impurity in simeprevir
[Structures: simeprevir, the Active Pharmaceutical Ingredient (API, Ames neg.), and a potential impurity (GTI)]
Worked example 2 – impurity contains unclassified
features
Worked example 2 – API also contains unclassified
features
Worked example 2 – compare unclassified features
[Structures: the unclassified feature highlighted in both the API and the GTI]
• API and GTI both contain the same unclassified feature
 API is Ames negative, so unclassified feature unlikely to promote
mutagenicity
 Analogous to class 4 impurity: ‘alerting structure’ related to the API
Worked example 2 – potential class 4 impurity
Control as an ordinary impurity*
*Text taken from Muller et al (2006) A rationale for determining, testing, and controlling specific impurities in pharmaceuticals that possess potential for genotoxicity. Regulatory Toxicology and Pharmacology 44(3), 198-211
Conclusion
• New functionality implemented into Derek
• Provides reliable negative predictions
• Highlights features that
 May reduce confidence in negative predictions
 Can be further interrogated by users
Questions?
Extra slides
Lhasa Ames test reference set composition