Evaluation Of The Statistics-Based Ames
Mutagenicity Model Sarah Nexus And
Interpretation Of The Results Obtained
Alex Cayley
Summary
• What is a “statistics-based Ames mutagenicity model” and
why are such models important?
• How can we judge how useful these models will be?
(validation statistics, expert interpretation)
• How well does Sarah Nexus (SX) version 1.1 predict and
how can results from the program be interpreted to
increase performance?
• Worked examples of SX version 1.1 predictions
What Are Statistical Models?
• In our sphere there is no real clear-cut definition, but…
• Data for a given endpoint (binary or continuous) is fed into
the model builder
• An algorithm based on one or multiple descriptors is used to
distinguish compound categories or predict values
• Any patterns found in the data are the result of statistical
relationships; they are machine learnt, NOT the product of
human intervention
• The definition is important for ICH M7 guidance compliance
• It may not be as important elsewhere, and the distinction is
blurring…
ICH M7 Guidance
What Makes A Good Statistical Model?
[Figure: predictors compared by how often they are “right” and by the explanation they give. Named examples: Super Expert Scientist, Expert Scientist. Combinations shown:]
• “right” = Every time; Explanation = Full and Reasoned
• “right” = Most times; Explanation = Full and Reasoned
• “right” = Most times; Explanation = None
• “right” = Sometimes; Explanation = Some
• “right” = Sometimes; Explanation = None
What Makes A Good Statistical Model?
[Figure: validation scheme. Labels: Training Data, QSAR Model, Test Set, Performance Stats, Validation, Expert Interpretation]
Test Set   Source      BA   SEN   SPEC
1          Pharma A    72    76     68
2          Pharma B    63    38     89
3          Pharma C    72    64     80
4          Pharma D    75    65     85
…
14         Pharma N    xx    xx     xx
Validation of (Q)SAR Models In The Literature
Validation Of Sarah Nexus (v1.1)
Specificity = 69-91% (83% mean)
Sensitivity = 38-68% (55% mean)
Balanced Accuracy = (Sens + Spec) / 2 = 62-77% (69% mean)
Positive <50: incorrectly assign 3-4 = ~10%
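As a quick check of the balanced accuracy formula on this slide, the following minimal Python sketch recomputes the mean balanced accuracy from the quoted mean sensitivity and specificity. The function name `balanced_accuracy` is purely illustrative and is not part of Sarah Nexus.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Return the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Mean values quoted on the slide, in percent.
mean_sensitivity = 55.0
mean_specificity = 83.0

mean_ba = balanced_accuracy(mean_sensitivity, mean_specificity)
print(f"Mean balanced accuracy: {mean_ba:.0f}%")  # prints "Mean balanced accuracy: 69%"
```

The quoted range of balanced accuracy (62-77%) cannot be recovered in the same way from the range endpoints, because the lowest sensitivity and the lowest specificity need not occur in the same test set.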
Prediction Scenarios in Sarah Nexus
[Figure: prediction scenarios. Labels: Overall Prediction + Confidence; Overruled Hypotheses + Confidence; Positive; Negative]
Negative Predictions
Positive Predictions
Confidence Correlation With Predictivity
Equivocal Predictions
Out Of Domain Predictions
An Update
Sarah Nexus V1.2
Data Set ID  SIZE   POS   NEG   BAC   ACC   SEN  SPEC   PPV   NPV    TP    TN    FP    FN   COV    EQ   OOD    EM
1             879   279   600    72    74    68    76    55    85   143   372   115    67    79   136    46    51
2             513    97   416    74    81    62    86    48    91    44   284    48    27    79    76    34     0
3            4018   576  3442    67    78    51    82    32    91   231  2261   488   224    80   498   316   158
4            2862   170  2692    67    83    48    85    17    96    57  1661   283    61    72   646   192    28
5            4040   725  3315    62    81    33    90    42    87   186  2444   260   371    81   430   374   243
Mean            -     -     -    68     -    53    84     -     -     -     -     -     -    78     -     -     -
(BAC, ACC, SEN, SPEC, PPV, NPV and COV in %; all other columns are compound counts; - = not given on the slide)
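The percentage columns of the V1.2 table can be recomputed from the confusion counts. The sketch below does this for data set 1, taking TP = 143, TN = 372, FP = 115, FN = 67 and SIZE = 879. This reading of the slide table is an assumption, chosen because it reproduces the reported percentages. The class and function names are hypothetical, and coverage is assumed to be the fraction of compounds that received a positive or negative call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of compounds given a definite positive or negative call."""
    tp: int
    tn: int
    fp: int
    fn: int


def performance_stats(c: ConfusionCounts, size: int) -> dict:
    """Derive the percentage statistics, rounded to whole percent."""
    sen = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    predicted = c.tp + c.tn + c.fp + c.fn
    raw = {
        "BAC": (sen + spec) / 2,
        "ACC": (c.tp + c.tn) / predicted,
        "SEN": sen,
        "SPEC": spec,
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
        # Assumption: coverage = compounds with a definite call / data set size.
        "COV": predicted / size,
    }
    return {name: round(100 * value) for name, value in raw.items()}


# Data set 1, Sarah Nexus V1.2 (values as read from the slide table).
ds1 = ConfusionCounts(tp=143, tn=372, fp=115, fn=67)
stats = performance_stats(ds1, size=879)
print(stats)
# {'BAC': 72, 'ACC': 74, 'SEN': 68, 'SPEC': 76, 'PPV': 55, 'NPV': 85, 'COV': 79}
```

The same function can be applied to the other data sets to check individual rows.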
Sarah Nexus V2.0.1
Data Set ID  SIZE   POS   NEG   BAC   ACC   SEN  SPEC   PPV   NPV    TP    TN    FP    FN   COV    EQ   OOD    EM
1             879   279   600    77    76    80    74    60    89   188   361   126    46    82   136    22   121
2             513    97   416    72    76    64    80    44    90    51   253    65    29    78    84    31     0
3            4018   576  3442    69    78    56    82    33    92   254  2262   507   198    80   524   272   313
4            2862   170  2692    67    83    48    85    17    96    56  1654   283    61    72   621   187    27
5            4040   725  3315    68    80    50    86    44    89   282  2205   361   278    77   590   343   389
Mean            -     -     -    71     -    60    81     -     -     -     -     -     -    78     -     -     -
(BAC, ACC, SEN, SPEC, PPV, NPV and COV in %; all other columns are compound counts; - = not given on the slide)
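To make the change between versions easier to see, the following sketch tabulates the per-data-set differences in BAC, SEN and SPEC between V1.2 and V2.0.1. It takes the BAC, SEN and SPEC values for each data set to be those consistent with the TP/TN/FP/FN counts, which is an assumption about how the slide tables should be read. The variable and function names are illustrative only.

```python
# (BAC, SEN, SPEC) in percent for data sets 1-5, as read from the slides.
V1_2 = {
    1: (72, 68, 76),
    2: (74, 62, 86),
    3: (67, 51, 82),
    4: (67, 48, 85),
    5: (62, 33, 90),
}
V2_0_1 = {
    1: (77, 80, 74),
    2: (72, 64, 80),
    3: (69, 56, 82),
    4: (67, 48, 85),
    5: (68, 50, 86),
}


def deltas(old: dict, new: dict) -> dict:
    """Return new minus old for each statistic, keyed by data set ID."""
    return {ds: tuple(n - o for o, n in zip(old[ds], new[ds])) for ds in old}


changes = deltas(V1_2, V2_0_1)
for ds, (d_bac, d_sen, d_spec) in changes.items():
    print(f"Data set {ds}: BAC {d_bac:+d}, SEN {d_sen:+d}, SPEC {d_spec:+d}")
```

On this reading, sensitivity rises or holds in every data set while specificity falls or holds, so each change in BAC reflects a trade-off between the two.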
Conclusions
• Proprietary data sets can give a good indication of the
performance of statistical prediction systems
• Sarah Nexus performs well when tested against a
number of different proprietary validation sets
• Additional information provided for each prediction is also
important in aiding the user to make a final decision
• Negative predictions based purely on negative hypotheses
are more reliable
• Positive predictions with a higher confidence are more
reliable
Barber et al.; Reg. Tox. Pharm.; 76; 7-20 (2016)
http://www.sciencedirect.com/science/article/pii/S0273230015301410
Acknowledgements
• Sandy Weiner
• Joerg Wichard
• Amanda Giddings
• Susanne Glowienke
• Alexis Parenty
• Alessandro Briggo
• Hans-Peter Spirkl
• Alexander Amberg
• Ray Kemper
• Nigel Greene
• Chris Barber
• Thierry Hanser
• Alex Harding
• Crina Heghes
• Jonathan Vessey
• Stephane Werner
Questions?
Work in progress disclaimer
This document is intended to outline our general product direction and is for information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon. The development, release, and timing of any features or functionality described for
Lhasa Limited’s products remains at the sole discretion of Lhasa Limited.