Anomaly detection through Bayesian Support Vector Machines
Vasilis A. Sotiris, Michael Pecht
AMSC663 Project Proposal

Detection Algorithm

[Block diagram: training data in the input space R^(n x m) is decomposed by a Karhunen–Loève expansion into a model space R^(k x m), k < n, and a residual space R^(l x m), l < n; a decision boundary is built in each space, and the two decisions are combined into a positive (normal) or negative (abnormal) class.]

Basics: Linear Classification – Separable

• For separable data the SVM finds a function D(x) that best separates the two classes (maximum margin M).
• The function D(x) can be used as a classifier.
• Through the support vectors we can
  – compress the input space
  – detect anomalies
• By minimizing the norm of w we find the line (or linear surface) that best separates the two classes.
• The decision function is a linear combination involving the weight vector w.

[Figure: optimal separating line D(x) with margin M between the normal and abnormal classes in the x1–x2 plane; training support vectors and a new observation vector are marked.]

\min \; \tfrac{1}{2}\|w\|^2 = \tfrac{1}{2} w^T w, \qquad w = \sum_{i=1}^{n} \alpha_i y_i x_i

D(x) = w^T x + b = \sum_{i=1}^{n} y_i \alpha_i \, x_i^T x + b

where the \alpha_i are the Lagrange multipliers.

Basics: Linear Classification – Inseparable

• Maximize the margin M and minimize the sum of the slack errors \xi_i.
• The function D(x) can again be used as a classifier (now incorporating a degree of error).

[Figure: separating line with margin M between the normal and abnormal classes; misclassified points carry slack variables \xi_1, \xi_2; training support vectors and a new observation vector are marked.]

\min \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i = \tfrac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i

Nonlinear classification

• For inseparable data the SVM finds a nonlinear function D(x) that best separates the two classes through
  – a kernel map k(·,·)
  – K = \Phi(x_i) \cdot \Phi(x)
  – an example of a feature map: \Phi(x) = [x^2, \sqrt{2}\,x, 1]^T
• The decision function D(x) requires only dot products of the feature map \Phi, so it uses the same mathematical framework as the linear classifier.
• The class y of a data point is determined by the sign of D(x).

[Figure: nonlinear decision boundary D(x) enclosing the normal class, with the abnormal class outside and a new normal observation in the x1–x2 plane.]

D(x) = w^T \Phi(x) + b = \sum_{i=1}^{n} y_i \alpha_i \, \Phi(x_i) \cdot \Phi(x) + b

y = +1 if D(x) \ge 0, and y = -1 if D(x) < 0.

Nonlinear classification for detection

[Figure: three panels — training data in the input space (x1, x2); the mapped data in the feature space (F1, F2, F3), where a linear boundary is found; and the resulting nonlinear decision boundary D(x) back in the input space.]

• Given: a training data set that contains the normal and artificial abnormal data points (blue crosses and red circles, respectively).
• Solve the linear optimization problem to find w and b in the feature space.
• Form a nonlinear decision function by mapping back to the input space using the same kernel mapping.
• The result is a decision boundary on the given training set that can be used to classify new observations.

Need for a soft decision boundary

• Class predictions are not probabilistic: the SVM output is a "hard" binary decision.
  – We would like to estimate the conditional distribution p(y | x) to capture uncertainty.
• User-defined model parameters such as C can lead to poor generalization.
• Bayesian methods can determine the model parameters.

[Figure: a hard decision boundary versus a soft decision boundary D(x) around the normal class; an observation just outside the hard boundary could be a false alarm.]

p(y \mid x, w) = \prod_{i=1}^{n} p(y_i \mid x_i, w) \quad (likelihood function)

Validation

• Training data: simulate an (n x m) matrix of observations.
• Test data: use the training data and inject a fault.
• Construct D(x) with the BSVM on the training data.
• Validation criteria:
  – detect the injected faults
  – reduce false alarms (compared with the standard SVM)

BACKUP – Bayesian Classifier Design

• Loss functions for a soft decision and a hard decision boundary.

[Figure: loss curves for yx = -1 and yx = +1 plotted against D(x).]
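
The sketches below illustrate several of the preceding slides. They are not part of the proposal: all data, parameter values, and the use of NumPy/scikit-learn are assumptions made for illustration only. This first sketch corresponds to the Detection Algorithm slide, computing a Karhunen–Loève (principal component) expansion of a simulated training matrix and projecting it onto a k-dimensional model space and the complementary residual space.

```python
# Minimal sketch (assumed data and k): Karhunen-Loeve expansion computed as a
# principal component decomposition, splitting the input space into a model
# space and a residual space, as in the detection-algorithm block diagram.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))          # n x m training matrix (n observations, m variables)
Xc = X - X.mean(axis=0)                # center before the expansion

# Right singular vectors of the centered data are the KL basis vectors
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                  # retained components -> model space
P_model = Vt[:k].T                     # m x k basis of the model space
P_resid = Vt[k:].T                     # m x (m - k) basis of the residual space

X_model = Xc @ P_model                 # projections used for the model-space boundary
X_resid = Xc @ P_resid                 # projections used for the residual-space boundary
```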
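The next sketch relates to the "Linear Classification – Separable" slide. It fits a linear SVM on simulated two-class data (the data and labels are assumptions) and recomputes D(x) = sum_i y_i alpha_i x_i^T x + b directly from the fitted support vectors and dual coefficients, checking that sign(D(x)) gives the class.

```python
# Minimal sketch: recover the linear SVM decision function from the support
# vectors and dual coefficients (which hold y_i * alpha_i) of a fitted model.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_normal = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))    # "normal" class, y = +1
X_abnormal = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))  # "abnormal" class, y = -1
X = np.vstack([X_normal, X_abnormal])
y = np.hstack([np.ones(50), -np.ones(50)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = np.array([0.5, 0.4])                                      # new observation vector
D = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]
print(D, np.sign(D))                   # manual D(x) and the class given by its sign
print(clf.decision_function([x_new]))  # matches the library's D(x)
```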
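This sketch follows the recipe on the "Nonlinear classification for detection" slide, again with assumed data: healthy training observations plus artificially generated abnormal points, an RBF kernel standing in for the feature map \Phi, and classification of new observations by the sign of the learned decision function.

```python
# Minimal sketch: kernel SVM trained on normal data plus artificial abnormal
# points; new observations are classified by the sign of D(x).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_normal = rng.normal(loc=50.0, scale=2.0, size=(200, 2))      # healthy training data
X_abnormal = rng.uniform(low=40.0, high=60.0, size=(40, 2))    # artificial abnormal points
X = np.vstack([X_normal, X_abnormal])
y = np.hstack([np.ones(len(X_normal)), -np.ones(len(X_abnormal))])

# The RBF kernel plays the role of the feature map Phi; the optimization is
# still the linear problem for w and b in the (implicit) feature space.
clf = SVC(kernel="rbf", gamma=0.1, C=1.0).fit(X, y)

x_new = np.array([[51.0, 49.5], [58.0, 42.0]])
print(clf.decision_function(x_new))   # D(x): positive -> normal, negative -> abnormal
print(clf.predict(x_new))             # sign(D(x)) as class labels
```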
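The last sketch touches the "soft decision boundary" and "Validation" slides. It is only a stand-in: scikit-learn's probability=True uses Platt scaling rather than the Bayesian SVM the proposal targets, but it illustrates replacing the hard sign(D(x)) with an estimate of p(y | x) and the fault-injection step used for validation. The data, fault magnitude, and 0.5 threshold are assumptions.

```python
# Sketch only: Platt-scaled probabilities as a stand-in for the BSVM's p(y|x);
# a simulated training matrix and an injected fault for validation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X_train = rng.normal(loc=50.0, scale=2.0, size=(200, 2))       # simulated (n x m) healthy data
X_art = X_train + rng.normal(scale=8.0, size=X_train.shape)    # artificial abnormal points
X = np.vstack([X_train, X_art])
y = np.hstack([np.ones(len(X_train)), -np.ones(len(X_art))])

clf = SVC(kernel="rbf", gamma=0.1, C=1.0, probability=True).fit(X, y)

# Test data: reuse the healthy data and inject a fault (a mean shift) into half of it
X_test = X_train.copy()
X_test[100:] += 10.0                                           # injected fault

p_normal = clf.predict_proba(X_test)[:, 1]                     # column for class +1 (classes_ = [-1, +1])
detected = p_normal[100:] < 0.5                                # faulty half flagged as abnormal
false_alarms = p_normal[:100] < 0.5                            # healthy half wrongly flagged
print("detection rate:", detected.mean(), "false alarm rate:", false_alarms.mean())
```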