Artificial neural network used for detection of mutations in DNA sequence raw data

Tobias Söderman and Lennart Björkesten
Amersham Pharmacia Biotech, Uppsala, Sweden
[email protected], [email protected]

Keywords: Mutation, base-calling, artificial neural network.

Abstract

Mutation detection has important applications in many different fields, ranging from early drug discovery to clinical diagnostics. Although many methods are available, few are as generally applicable, or define the nature of the change as accurately, as direct DNA sequencing [1]. Depending on the origin of the mutation, one must in the general case expect any mixture of a mutated DNA component and the wild-type component in the sample. Inherited mutations typically give rise to a heterozygote 50-50 mixture, while induced mutations, e.g. in tumour tissue, may give rise to any mixture depending on the heterogeneous composition of the tissue [2].

Base-calling strategies are traditionally designed and optimised for genomics-oriented applications. The main focus has been on read length and on the accurate assignment of clean bases [3]. Less effort has been put into the analysis of heterozygote situations and general mixtures.

Our ANN-based mutation detection algorithm is applied as a second pass on data processed by ordinary base-calling software. Every single base position is reconsidered as potentially hiding a point mutation. A number of descriptive features derived from the raw data traces, from the actual sample as well as from a reference sample, are fed into an artificial neural network, which is trained to produce mutation assignments.

The performance of the ANN-based mutation detection algorithm will be discussed in terms of sensitivity and specificity, and it will be pointed out that pre-processing of the descriptive features is essential in order to keep the required training at a reasonable level.
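The pre-processing this poster describes (normalising each feature against its sample average, comparing it with the normalised reference sample, and ordering the result into a deviation pattern) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: normalisation by division, deviation as a simple difference, the wild-type-first ordering, and all function names and numbers are hypothetical.

```python
BASES = "ACGT"


def normalise(values, average):
    """Scale one feature's per-base values by that feature's sample average.

    Division is an assumption; the poster only says features are
    normalised against the sample average.
    """
    return {base: values[base] / average for base in BASES}


def deviation_signals(sample_norm, reference_norm):
    """Deviation of the normalised sample from the normalised reference
    (assumed here to be a plain difference)."""
    return {base: sample_norm[base] - reference_norm[base] for base in BASES}


def deviation_pattern(leading_dev, secondary_dev, wild_type):
    """Order the components: wild type first, the others by decreasing
    leading-feature (F1) deviation; the secondary feature (F2) follows
    the same order. The ordering rule is an assumed reading of the poster.
    """
    others = sorted((b for b in BASES if b != wild_type),
                    key=lambda b: leading_dev[b], reverse=True)
    order = [wild_type] + others
    pattern = ([leading_dev[b] for b in order]
               + [secondary_dev[b] for b in order])
    return order, pattern


# Hypothetical peak amplitudes at one position: the reference is a clean A,
# the sample is roughly an even A/G mixture at lower overall concentration.
sample_amp = normalise({"A": 400.0, "C": 8.0, "G": 420.0, "T": 16.0}, 800.0)
reference_amp = normalise({"A": 1000.0, "C": 10.0, "G": 10.0, "T": 20.0}, 1000.0)
f1_dev = deviation_signals(sample_amp, reference_amp)

# Hypothetical secondary-feature deviations (e.g. peak asymmetry).
f2_dev = {"A": 0.10, "C": 0.00, "G": 0.30, "T": 0.00}

order, pattern = deviation_pattern(f1_dev, f2_dev, wild_type="A")
```

Sorting the components before they reach the network means a given kind of deviation always appears in the same input slot regardless of which base is involved, which is one plausible reason such pre-processing would reduce the amount of training required.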
Introduction

The detection of point mutations using data produced by automated sequencers becomes complicated due to the fact that induced point mutations give rise to any mixture of wildtype component and mutated component. We have developed a data evaluation procedure that utilises Artificial Neural Networks for the analysis. We evaluated this procedure on the p53-gene. Deletions and insertions are not considered in this evaluation.

Algorithm Overview

[Flow chart: Ordinary Base Calling → Primary Feature Extraction → Peak Amplitude Normalisation → Reference Sample Comparison → Functional Labelling → Deviation Pattern Analysis → Automated Mutation Assignment. Intermediate data: Primary Features (Peak Amplitude, Modulation, Asymmetry...), Normalised Features, Compared Features, Labelled Features (Wild Type Component, Second Significant Component...).]

• A set of primary features are extracted from raw data produced by the automated sequencer. Examples of features are peak amplitude, peak modulation and asymmetry.

Normalisation - Reference Comparison

[Figure: labels Sample Data, Sample Averages, Reference Data, Sample, Deviation signals; features Peak Modulation, Peak Asymmetry; trace panels 75% A, 10% A, 50% A, 15% A, 90% A; values shown 0.96, 0.68, 0.00, 0.01.]

• Features are normalised against sample average for that particular feature to correct for variations in sample concentration.
• The normalised features are compared to normalised reference sample features to calculate deviation signals.

Functional Labelling - Deviation Pattern Analysis

[Figure: components labelled WT and sorted on decreasing size; Deviation Pattern sorted according to F1 size, with entries F1[WT Comp.], F1[Comp. 2], F1[Comp. 3], F1[Comp. 4], F2[WT Comp.], F2[Comp. 2], F2[Comp. 3], F2[Comp. 4]. F1 = "Leading Feature", F2 = "Secondary Feature".]

• The deviation signals for the nucleotide position to be investigated are sorted to produce a deviation pattern.

Artificial Neural Net Mutation Assignment

• The deviation patterns are fed into a multilayer perceptron neural network which is trained to produce accurate mutation assignments.

Results

[Table: Exp. (1-6, 7-12) against Negative, Positive and False Positive, each for Training set and Test set. Values as extracted, in source order: 0, 161, 0, 73; 975, 1040, 199, 193, 25, 27.]

[Table: Exp. (1-6, 7-12) against Remaining False Positive and Remaining Positive, each for Training set and Test set. Values as extracted, in source order: Remaining False Positive 0.9 %, 1.3 %, 1.4 %, 2.8 %; Remaining Positive 100 %, 100 %, 100 %, 100 %.]

References

[1] Grompe, M., Nat. Genet. 5, 111-117 (1993).
[2] Hederum, A., Pontén, F., Ren, Z., Lundeberg, J., Pontén, J. and Uhlén, M., BioTechniques 17, 118-129 (1994).
[3] Tibbetts, C., Bowling, J. M. and Golden, J. B. III, in Adams, M. D. et al., eds., Automated DNA Sequencing and Analysis, 219-230. San Diego, CA: Academic Press (1994).
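The final step, in which deviation patterns are fed into a multilayer perceptron trained to produce mutation assignments, can be illustrated with a small sketch. Everything here is an assumption for illustration only: the network size, the tanh and sigmoid activations, the cross-entropy loss, plain stochastic gradient descent, and the four toy eight-value patterns (four F1 entries followed by four F2 entries) are hypothetical and are not taken from the poster's data or network design.

```python
import math
import random


class TinyMLP:
    """One tanh hidden layer and a sigmoid output giving a score in (0, 1)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is repeatable
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return the hidden activations and the output score."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, target, lr):
        """One gradient-descent update on the cross-entropy loss."""
        hidden, y = self.forward(x)
        delta_out = y - target  # gradient at the output pre-activation
        for j, h in enumerate(hidden):
            # Back-propagate through the old output weight and tanh.
            delta_h = delta_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
        self.b2 -= lr * delta_out


def mean_loss(net, data):
    """Average cross-entropy over (pattern, label) pairs."""
    total = 0.0
    for x, target in data:
        y = min(max(net.forward(x)[1], 1e-12), 1.0 - 1e-12)
        total -= target * math.log(y) + (1 - target) * math.log(1.0 - y)
    return total / len(data)


def train(net, data, epochs=1000, lr=0.5):
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)


# Hypothetical deviation patterns; label 1 = mutation, 0 = no mutation.
TOY_DATA = [
    ([-0.50, 0.52, 0.00, 0.00, 0.10, 0.30, 0.00, 0.00], 1),
    ([-0.25, 0.24, 0.01, 0.00, 0.05, 0.20, 0.00, 0.00], 1),
    ([0.02, 0.01, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00], 0),
    ([-0.03, 0.02, 0.01, 0.00, 0.00, 0.01, 0.00, 0.00], 0),
]

net = TinyMLP(n_in=8, n_hidden=4)
loss_before = mean_loss(net, TOY_DATA)
train(net, TOY_DATA)
loss_after = mean_loss(net, TOY_DATA)
```

A mutation assignment would then follow from thresholding the output score. The choice of threshold trades sensitivity against specificity, which is the trade-off the abstract uses to describe performance.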