The Ninth International Conference on Condition Monitoring and Machinery Failure Prevention Technologies

A fault detection concept for single class problems

Jens Strackeljan(1), Stefan Goreczka(1) and Dietrich Behr(2)
(1) Otto-von-Guericke-Universität Magdeburg, Fakultät für Maschinenbau, Institut für Mechanik, Universitätsplatz 2, 39106 Magdeburg, Germany
e-mail: [email protected]
e-mail: [email protected]
(2) TU Clausthal, Institut für Technische Mechanik, Adolph-Roemer-Straße 2A, 38678 Clausthal-Zellerfeld, Germany

Abstract: Signal processing and the feature selection process can be optimized if test samples of different classes are available. In most cases, free parameters in signal processing, such as the order of a derivative or filter frequencies, can be determined with the best class separation as the main objective. Feature selection in supervised learning, where class information is present for all states, has also been extensively studied. Supervised feature selection algorithms try to find features that help separate data of different classes. Unsupervised feature selection aims to find a subset of features according to a certain criterion without prior information. Compared to supervised methods, only a few algorithms have been published for these cases. The One-Class Classification (OCC) problem is completely different from the conventional multi-class problem because a second (or negative) class is either not present or no data representing that class are available. The problem of classifying so-called target cases in the absence of negative cases has gained increasing attention in recent years. This paper demonstrates a concept for designing a fault detection system when only information for a single class is present. Assuming that this class and the corresponding time signals describe the fault-free state, no information about a fault state is available.
Under these conditions, the difference between the labelled training samples and an unknown measurement signal recorded during the operation phase of the monitoring system has to be recognized. What is a significant change or difference between two states? What features should be considered if no feature selection process is available?

1. Introduction

One-class classification tries to distinguish one class of objects from all other possible objects by learning from a training set that contains only objects of that class. This is different from, and more difficult than, the traditional classification problem, which tries to distinguish between two or more classes with a training set containing objects from all classes. In OCC(1,2), one of the classes, generally referred to as the positive class or target class, is well characterized by instances in the training data. For the other class (non-target), there are either no instances at all, very few of them, or they do not form a statistically representative sample of the negative concept. The term one-class classification was coined by Moya(3), and a number of applications can be found in the literature. The motivation for using OCC in condition monitoring is obvious: if a classifier should detect abnormal or faulty behaviour in a motor, pump or any other technical device or plant, measurements of the normal operation of the machine are easy to obtain. Obtaining a large amount of target training data is no problem at all. Fortunately, most faults will not have occurred, and the deliberate generation of faults is often not reasonable; nobody will allow major faults in a nuclear power plant just to generate test samples. This situation leads to little or no training data for a second (negative) class.
This is the reason why the boundary between the target class and a second class, which will possibly occur during the working phase of the classifier, has to be estimated from data of the normal class only. Tax(2) introduces OCC with a nice example and defines the task of distinguishing between apples and pears. This problem does not seem very complicated; everyone can immediately separate the two types of fruit by how they look and feel. If one wants to design a classifier that performs the task automatically, it turns out to be more complicated. What features should be the basis for the decision to call one object 'apple' and another object 'pear'? It could be the weight, the height or colour, the shape, smell or flavour, or a combination of all of these properties(2). Assuming we use measurements of the colour and the smell of an object, how should an apple with some mud on it be classified? Is it an apple or is it dirt? In the context of OCC, the object should be classified as a genuine object (apple or pear) or as an outlier object such as another type of fruit, rotten fruit or dirt. Figure 1 shows a conventional and a one-class classifier applied to an example dataset containing apples and pears, represented by 2 features per object. The solid line is the conventional classifier which distinguishes between the apples and pears, while the dashed line describes the dataset. This description can identify the outlier apple in the lower right corner, while the conventional classifier will simply classify it as a pear(2). Khan(1) gives an excellent overview of the recent literature. OCC problems have been studied extensively under three broad frameworks(1):
1. Learning with positive examples only
2. Learning with positive examples and some amount of poorly distributed negative examples
3. Learning with positive and unlabelled data.

Figure 1.
Difference between a one-class classifier and a conventional classifier distinguishing between apples and pears(2)

2. Support Vector Data Description (SVDD)

Tax and Duin(4) and Scholkopf(5) have developed algorithms based on support vector machines to tackle the problem of OCC using positive examples only. The main idea behind these strategies is to construct a decision boundary around the positive data so as to differentiate the outliers (non-positives) from the positive data. Different algorithms can be used to estimate the target density or to generate a model oriented towards a data support vector classifier. Tax and Duin(4) seek to solve the problem of OCC by distinguishing the positive class from all other possible patterns in the pattern space. Instead of using a hyper-plane to distinguish between two classes, a hyper-sphere is found around the positive class data. Free parameters in the algorithm decide how many samples of the training data are covered by a hyper-sphere of minimum radius. This method is called the Support Vector Data Description (SVDD). Furthermore, the hyper-sphere model of the SVDD can be made more flexible by introducing kernel functions. Tax(2) considers a polynomial and a Gaussian kernel and found that the Gaussian kernel works better for most data sets. A drawback of this technique is that it often requires a large data set. When designing an OCC, the question is what feature subset should be selected to generate the hyper-sphere or to estimate a density. Completely different concepts without SVDD are also possible.

3. A Concept based on Feature Selection Methods

The target of any feature selection/extraction algorithm is to choose a feature subset such that a classification system trained with the data performs successfully. The process of feature creation and feature selection must have an effective and positive impact on the classifier performance.
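Before turning to feature selection, the hyper-sphere idea of Section 2 can be sketched in code. As an illustration only (not the authors' implementation), scikit-learn's OneClassSVM implements Scholkopf's one-class SVM(5), which with a Gaussian (RBF) kernel is closely related to the SVDD of Tax and Duin(4). The data set and the parameter values below are synthetic assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative target-class data: 200 samples of the fault-free state,
# described by 2 features (purely synthetic values).
rng = np.random.default_rng(0)
X_target = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu bounds the fraction of training samples allowed outside the boundary;
# gamma controls the flexibility of the Gaussian kernel.
occ = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_target)

# predict() returns +1 inside the description (target) and -1 outside (outlier)
inlier_fraction = (occ.predict(X_target) == 1).mean()
far_point = occ.predict([[8.0, 8.0]])[0]
```

With nu = 0.05, roughly 95% of the target samples end up inside the boundary, while a point far from the training data is rejected as an outlier.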
Feature selection in supervised learning has been extensively studied. Supervised feature selection algorithms depend on measures that take the class information into account. They try to find features that help separate data of different classes. When a small number of instances are labelled but the majority are not, semi-supervised feature selection is designed to take advantage of both the large number of unlabelled instances and the labelling information, as in semi-supervised learning. Intuitively, the additional labelling information should help constrain the search space of unsupervised feature selection(9). Unsupervised feature selection aims to find a subset of features according to a certain criterion without prior information(9):
• Redundancy based: These algorithms try to eliminate the redundancy among the features.
• Clustering based: In this category, clustering quality assessment methods are used for evaluation.
Different concepts are available to execute the process of feature selection(10,11,12). The authors have developed a concept for feature selection following a wrapper(13) approach, with the reclassification error as a quality indicator of the feature set(14,15). On the basis of a classified learning sample, for which an unambiguous class assignment has been performed for each sample during the learning phase, a measure of appraisal can be obtained by reclassifying the learning sample with the respective classification algorithm. The ratio of the number of samples correctly classified in accordance with the given class assignment to the total number of samples investigated provides the reclassification rate. The objective is to obtain a very small reclassification error.
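A minimal sketch of this wrapper approach is given below. The classifier (a 3-nearest-neighbour rule) and the synthetic data are illustrative assumptions, not the authors' exact implementation; the quality criterion is the reclassification rate as defined above:

```python
import itertools
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def reclassification_rate(X, y, subset):
    """Fraction of training samples assigned back to their given class
    by a classifier trained on the selected feature subset."""
    Xs = X[:, list(subset)]
    clf = KNeighborsClassifier(n_neighbors=3).fit(Xs, y)
    return float((clf.predict(Xs) == y).mean())

def best_pair(X, y):
    """Exhaustive wrapper search over all 2-feature combinations."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    return max(pairs, key=lambda s: reclassification_rate(X, y, s))

# Synthetic example: features 0 and 1 separate the classes, 2 and 3 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = np.r_[np.zeros(50), np.ones(50)]
X[y == 1, :2] += 5.0
```

Because the search is exhaustive over all pairs, the wrapper finds a subset containing the informative features; for larger feature sets a heuristic search would replace the full enumeration.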
In the ideal case, the decision on the class assignment in the reclassification agrees, on the basis of maximal membership, with the class subdivision of the learning sample for all objects. The advantage of the reclassification error concept is the possibility of determining conclusive values even with a small number of samples. In the example of Figure 2, 100 power spectra with 50 magnitudes per spectrum are used for the feature selection process. Each magnitude is a possible input feature. The feature selection should determine the best combination of input features that solves the classification task with a minimal number of features. To demonstrate the complexity of the search function, Figure 2 shows the results for all combinations of two input features. Because the order of the features within a combination is arbitrary, the 50 features generate C(50,2) = 50·49/2 = 1225 different combinations. In Figure 2 all combinations with an error rate of less than 30% are presented. The example clearly indicates that feature selection is a crucial point for the performance of a classifier and strongly depends on the particular task. If only one class is available, the concept has to be adapted, because a second class cannot be used for the calculation of the quality criteria. We assume a transition from state A without a fault to state C, where an alert should be generated, over a long time (Figure 3). For a number of faults in condition monitoring caused by wear processes this assumption is valid.

Figure 2. Classification rate in an example with 50 features
Figure 3. Transition of machinery condition from a state A to a fault state C

After an initial phase we declare a set of new measurements with the corresponding features as data belonging to a second class (non-target).
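This step can be sketched as follows: treat the reference data and the current block as two classes and count how many 2-feature combinations can separate them. The classifier (3-nearest-neighbour) and the 90% threshold are illustrative assumptions; the resulting count is the quantity used as a fault indicator below:

```python
import itertools
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def separating_pairs(X_ref, X_cur, threshold=0.9):
    """Number of 2-feature combinations whose reclassification rate for
    'reference vs. current block' exceeds the given threshold."""
    X = np.vstack([X_ref, X_cur])
    y = np.r_[np.zeros(len(X_ref)), np.ones(len(X_cur))]
    count = 0
    for pair in itertools.combinations(range(X.shape[1]), 2):
        Xs = X[:, list(pair)]
        clf = KNeighborsClassifier(n_neighbors=3).fit(Xs, y)
        if (clf.predict(Xs) == y).mean() > threshold:
            count += 1
    return count

# Reference block (state A) and two current blocks: one from the unchanged
# condition, one with a shift in features 0 and 1 (synthetic assumption).
rng = np.random.default_rng(2)
X_ref = rng.normal(size=(100, 6))
X_same = rng.normal(size=(100, 6))    # no fault: few pairs separate
X_shift = rng.normal(size=(100, 6))
X_shift[:, :2] += 5.0                 # fault: many pairs separate
```

If the machine condition has not changed, hardly any pair exceeds the threshold; after a change, every pair containing an affected feature does, and the count rises sharply.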
If the feature selection process is not able to separate the two classes, the assumption that the current measurements form a new class can be rejected (Figure 4). The advantage is that all possible feature combinations are evaluated in the feature selection process. During the life cycle, a change of the vibration signal (states B, C) should occur. The feature selection will detect this change, and not only a few but many different feature combinations can be used for the separation of the two classes (Figure 5): 1175 combinations with 2 features generate an error of less than 70%. The situation in Figure 2 could lie between states B and C. The fault indicator is not a physical parameter like kurtosis but the number of feature combinations which can separate the target class from a second one. The difficulty is to identify a reasonable limit on the number of separating feature combinations for setting an alert (Figure 6).

Figure 4. Using the initial training data as a reference (target class) and a current block of measurements as class 2
Figure 5. Classification rate where the original training set and the second class can be separated by many feature combinations
Figure 6. Development of the number of separating feature combinations over time until an alert is set

We have good experience with the detection of cracks in shafts, delaminations in plates and other applications(16).

4. Conclusions

If only data for the good condition of a machine are available as a target class, these data can still be employed as starting material for a monitoring concept. SVDD is only one possible idea. When using feature selection methods, all significant deviations from this initial condition can be detected.

References
1. S Khan and M Madden, 'A Survey of Recent Trends in One Class Classification', Proceedings of the 20th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, LNAI volume 6206, pp 181-190, Springer-Verlag, 2009.
2. D Tax, 'One Class Classification', PhD thesis, Delft University of Technology, 2001.
3. M Moya, M Koch and L Hostetler, 'One-class classifier networks for target recognition applications', Proceedings World Congress on Neural Networks, 1993.
4. D Tax and R Duin, 'Uniform object generation for optimizing one-class classifiers', Journal of Machine Learning Research, Vol 2, p 155, 2001.
5. B Scholkopf et al., 'Support vector method for novelty detection', in S A Solla et al. (eds.): Neural Information Processing Systems, pp 582-588, 2000.
6. G Ritter and M Gallegos, 'Outliers in statistical pattern recognition and an application to automatic chromosome classification', Pattern Recognition Letters, Vol 18, pp 525-539, 1997.
7. C Bishop, 'Novelty detection and neural network validation', IEEE Proceedings on Vision, Image and Signal Processing, Special Issue on Applications of Neural Networks, 141(4), pp 217-222, 1994.
8. N Japkowicz, 'Concept-Learning in the absence of counterexamples: An autoassociation-based approach to classification', PhD thesis, Rutgers, The State University of New Jersey, New Brunswick, 1999.
9. A Mosallam, 'Self-organized Selection of Features for Unsupervised On-board Fault Detection', Master thesis, Department of Technology, Örebro University, 2010.
10. H Liu et al., 'Feature Selection: An Ever Evolving Frontier in Data Mining', JMLR Workshop and Conference Proceedings 10, pp 4-13, The Fourth Workshop on Feature Selection in Data Mining, 2010.
11. X Wu, K Yu, H Wang and W Ding, 'Online Streaming Feature Selection', Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010.
12. L Yu and H Liu, 'Efficient Feature Selection via Analysis of Relevance and Redundancy', Journal of Machine Learning Research, Vol 5, pp 1205-1224, October 2004.
13. R Kohavi and G H John, 'Wrappers for feature subset selection', Artificial Intelligence, Vol 97, pp 273-324, 1997.
14. J Strackeljan, 'Feature selection methods - an application oriented overview', TOOLMET'01 Symposium, Oulu, Finland, pp 29-49, 2001.
15. J Strackeljan and A Schubert, 'Evolutionary Strategy to Select Input Features for a Neural Network Classifier', in Advances in Computational Intelligence and Learning: Methods and Applications (eds: H-J Zimmermann, G Tselentis), 2002.
16. S Goreczka and J Strackeljan, 'Comparison of one-class classifiers for Condition Monitoring of rolling bearings in non-stationary operations', Proceedings CM 2012 - MFPT 2012, 9th International Conference on Condition Monitoring and Machinery Failure Prevention Technologies, London, June 2012.