Artificial neural network used for detection of mutations in DNA sequence raw data.
Tobias Söderman and Lennart Björkesten, Amersham Pharmacia Biotech, Uppsala, Sweden.
[email protected], [email protected]
Keywords: Mutation, base-calling, artificial neural network.
Abstract
Mutation detection has important applications in many different fields, ranging from early drug discovery to clinical diagnostics. Although many methods are available, few are as generally applicable, or define the nature of the change as accurately, as direct DNA sequencing [1].
Depending on the origin of the mutation, one must in general expect any mixture of a mutated DNA component and the wild-type component in the sample. Inherited mutations typically give rise to a heterozygous 50-50 mixture, while induced mutations, e.g. in tumour tissue, may give rise to any mixture, depending on the heterogeneous composition of the tissue [2].
Base-calling strategies are traditionally designed and optimised for genomics-oriented applications. The main focus has been on read length and on the accurate assignment of clean bases [3]. Less effort has been put into the analysis of heterozygous positions and general mixtures.
Our ANN-based mutation detection algorithm is applied as a second pass on data processed by ordinary base-calling software. Every single base position is reconsidered as potentially hiding a point mutation. A number of descriptive features derived from the raw data traces, from the actual sample as well as from a reference sample, are fed into an artificial neural network, which is trained to produce mutation assignments.
The performance of the ANN-based mutation detection algorithm will be discussed in terms of sensitivity and specificity, and it will be pointed out that pre-processing of the descriptive features is essential to keep the required training at a reasonable level.
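For readers less familiar with the two performance measures named above, the following is a minimal sketch of how sensitivity and specificity are conventionally computed from classification counts. The function names and the example counts are illustrative assumptions; they are not taken from the poster's results.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real mutations that were assigned as mutations."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-mutated positions that were correctly left unassigned."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_sensitivity = sensitivity(true_pos=20, false_neg=0)
example_specificity = specificity(true_neg=980, false_pos=20)
print(example_sensitivity, example_specificity)
```

Because almost every base position in a sequence is unmutated, even a small loss of specificity can produce many false positives in absolute terms, which is why both measures are reported together.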
Introduction
The detection of point mutations using data produced by automated sequencers is complicated by the fact that induced point mutations can give rise to any mixture of wild-type component and mutated component.
[Figure: example traces for mixtures at a single position, labelled 10% A, 15% A, 50% A, 75% A and 90% A.]
We have developed a data evaluation procedure that utilises Artificial Neural Networks for the analysis. We evaluated this procedure on the p53 gene. Deletions and insertions are not considered in this evaluation.
Algorithm Overview
[Flowchart: Ordinary Base Calling -> Primary Feature Extraction -> Normalisation -> Reference Sample Comparison -> Functional Labelling -> Deviation Pattern Analysis -> Mutation Assignment.]
Intermediate data named in the flowchart:
• Primary Features: Peak Amplitude, Modulation, Asymmetry...
• Normalised Features
• Compared Features
• Labelled Features: Wild Type Component, Second Significant Component...
Primary Feature Extraction
• A set of primary features is extracted from raw data produced by the automated sequencer. Examples of features are peak amplitude, peak modulation and asymmetry.
[Figure labels: Peak Amplitude, Peak Modulation, Peak Asymmetry.]
Normalisation - Reference Comparison
• Features are normalised against the sample average for that particular feature to correct for variations in sample concentration.
• The normalised features are compared to normalised reference sample features to calculate deviation signals.
[Figure labels: Sample Data, Sample Averages, Reference Data, Sample, Reference, Deviation signals.]
Functional Labelling - Deviation Pattern Analysis
F1 = “Leading Feature”
F2 = “Secondary Feature”
• The deviation signals for the nucleotide position to be investigated are sorted to produce a deviation pattern.
[Figure: Deviation Pattern. Labels: WT; Sorted on decreasing size; Sorted according to F1 size. Pattern entries: F1[WT Comp.], F1[Comp. 2], F1[Comp. 3], F1[Comp. 4], F2[WT Comp.], F2[Comp. 2], F2[Comp. 3], F2[Comp. 4]. Numeric values shown in the figures, panel not recoverable: 0.00, 0.01, 0.96, 0.68.]
Artificial Neural Net - Mutation Assignment
• The deviation patterns are fed into a multilayer perceptron neural network, which is trained to produce accurate mutation assignments.
[Figure: Automated Mutation Assignment.]
Results
[Table: counts for Exp. 1-6 and Exp. 7-12 under the headings Positive, Negative and False Positive, each with Training set and Test set columns. Values as extracted, cell assignment not recoverable: 0, 161, 0, 73; 975, 1040, 199, 193, 25, 27.]
[Table: Remaining False Positive (Training set, Test set): 0.9 %, 1.3 %, 1.4 %, 2.8 %. Remaining Positive (Training set, Test set): 100 %, 100 %, 100 %, 100 %. Cell assignment within each heading not recoverable.]
References
[1] Grompe, M., Nat. Genet. 5, 111-117 (1993).
[2] Hedrum, A., Pontén, F., Ren, Z., Lundeberg, J., Pontén, J. and Uhlén, M., BioTechniques 17, 118-129 (1994).
[3] Tibbetts, C., Bowling, J. M. and Golden, J. B. III, in Adams, M. D. et al., eds., Automated DNA Sequencing and Analysis, 219-230. San Diego, CA: Academic Press (1994).
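As an illustration of the processing chain described in this poster (normalisation, reference comparison, sorting into a deviation pattern, multilayer perceptron), here is a minimal Python sketch. Several details are assumptions made only for the example and are not stated in the poster: the deviation signal is taken as a simple difference, the wild-type component is placed first with the remaining components ordered by decreasing F1 deviation, and the network has one hidden layer with random, untrained weights. All function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def normalise(values):
    """Divide each value by the sample average of that feature."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def deviation_signals(sample_norm, reference_norm):
    """Assumed definition: normalised sample minus normalised reference."""
    return [s - r for s, r in zip(sample_norm, reference_norm)]


def deviation_pattern(f1, f2, wild_type):
    """Build one input vector for a single nucleotide position.

    f1, f2: dicts mapping base -> deviation signal of the leading and
    secondary feature. Assumed ordering: wild type first, then the other
    three components by decreasing F1 deviation; F2 follows the same order.
    """
    others = sorted((b for b in BASES if b != wild_type),
                    key=lambda b: f1[b], reverse=True)
    order = [wild_type] + others
    return [f1[b] for b in order] + [f2[b] for b in order]


def mlp_forward(pattern, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron: tanh hidden units, sigmoid output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, pattern)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: the sample is the reference at twice the concentration,
# so normalisation removes the difference and all deviations vanish.
sample_norm = normalise([2.0, 4.0, 6.0])
reference_norm = normalise([1.0, 2.0, 3.0])
print(deviation_signals(sample_norm, reference_norm))

# Toy deviation signals at one position whose wild-type base is A.
f1 = {"A": -0.4, "C": 0.05, "G": 0.5, "T": 0.0}
f2 = {"A": -0.1, "C": 0.0, "G": 0.2, "T": 0.01}
pattern = deviation_pattern(f1, f2, wild_type="A")

# Untrained network with random weights, for shape illustration only.
rng = random.Random(0)
n_hidden = 3
w_hidden = [[rng.uniform(-1, 1) for _ in pattern] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
score = mlp_forward(pattern, w_hidden, b_hidden, w_out, b_out=0.0)
print(pattern, score)
```

Sorting the components before they reach the network means the input no longer depends on which base is the wild type or which base is the mutant. This is one plausible reading of why pre-processing keeps the required training at a reasonable level: the network only has to learn the shape of the deviation pattern rather than every base-specific variant of it.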