
The Ninth International Conference on Condition Monitoring and Machinery Failure Prevention Technologies
A fault detection concept for single class problems
Jens Strackeljan(1), Stefan Goreczka(1) and Dietrich Behr(2)
(1)
Otto-von-Guericke-Universität Magdeburg, Fakultät für Maschinenbau
Institut für Mechanik, Universitätsplatz 2, 39106 Magdeburg, Germany
e-mail: [email protected]
e-mail: [email protected]
(2)
TU Clausthal, Institut für Technische Mechanik,
Adolph-Roemer-Straße 2A, 38678 Clausthal-Zellerfeld, Germany
Abstract:
Signal processing and the feature selection process can be optimized if test samples of
different classes are available. In most cases, free parameters in signal processing, such
as the order of a derivative or filter frequencies, can be determined under the main
objective of the best separation. Feature selection in supervised learning, where class
information for all states is present, has also been studied extensively. Supervised
feature selection algorithms try to find features that help separate data of different
classes. Unsupervised feature selection aims to find a subset of features according to a
certain criterion without prior information. In comparison to supervised methods, only a
few algorithms have been published for these cases. The one-class classification (OCC)
problem is completely different from the conventional multi-class problem because a
second (or negative) class is either not present or no data representing that class are
available. The problem of classifying so-called target cases in the absence of negative
cases has gained increasing attention in recent years.
The paper presents a concept for designing a fault detection system when information is
available for only a single class. Assuming that this class and the corresponding time
signals describe the fault-free state, no information about a fault state is available.
Under these conditions, the difference between the labelled training samples and an
unknown measurement signal recorded during the operating phase of the monitoring
system has to be recognized. What constitutes a significant change or difference
between two states? Which features should be considered if no feature selection process
is available?
1. Introduction
One-class classification tries to distinguish one class of objects from all other possible
objects, by learning from a training set containing only the objects of that class. This is
different from and more difficult than the traditional classification problem, which tries
to distinguish between two or more classes with the training set containing objects from
all the classes. In OCC(1,2), one of the classes, generally referred to as the positive or
target class, is well characterized by instances in the training data. For the other
(non-target) class, there are either no instances at all, very few of them, or they do not
form a statistically representative sample of the negative concept.
The term one-class classification was coined by Moya(3), and a number of applications
can be found in the literature.
The motivation for using OCC in condition monitoring is obvious. If a classifier is
supposed to detect abnormal or faulty behaviour in a motor, pump or any other technical
device or plant, measurements of the normal operation of the machine are easy to obtain.
Collecting a large amount of target training data is therefore no problem. Fortunately,
most faults will not yet have occurred, and the deliberate generation of artificial faults
is often not reasonable: nobody will permit major faults in a nuclear power plant just to
generate test samples. This situation leads to little or no training data for a second
(negative) class.
This is why the boundary between the target class and a second class, which may
possibly occur during the working phase of the classifier, has to be estimated from data
of the normal class alone.
Tax(2) introduces OCC with a nice example, defining the task of distinguishing between
apples and pears. This problem does not seem very complicated; everyone can
immediately separate the two types of fruit by how they look and feel. If one wants to
design a classifier that performs this task automatically, it turns out to be more
complicated. Which features should be the basis for the decision to call one object
'apple' and another object 'pear'? It could be the weight, the height, the color, the
shape, the smell, the flavor or a combination of all of these properties(2). Assuming we
use measurements of the color and the smell of an object, how should an apple with
some mud on it be classified? Is it an apple or is it dirt? In the context of OCC, the
object should be classified as a genuine object (apple or pear) or as an outlier object
such as another type of fruit, rotten fruit or dirt.
In Figure 1, a conventional and a one-class classifier are applied to an example dataset
containing apples and pears, represented by two features per object. The solid line is the
conventional classifier which distinguishes between the apples and pears, while the
dashed line describes the dataset. This description can identify the outlier apple in the
lower right corner, while the conventional classifier will simply classify it as a pear(2).
An excellent overview of the recent literature is given by Khan(1). OCC problems have
been studied extensively under three broad frameworks(1):
1. Learning with positive examples only
2. Learning with positive examples and some amount of poorly distributed
negative examples
3. Learning with positive and unlabeled data.
Figure 1. Difference between a one-class classifier and a conventional classifier
for distinguishing between apples and pears(2)
2. Support Vector Data Description (SVDD)
Tax and Duin(4) and Scholkopf(5) have developed algorithms based on support vector
machines to tackle the problem of OCC using positive examples only. The main idea
behind these strategies is to construct a decision boundary around the positive data so as
to differentiate the outliers (non-positives) from the positive data.
Different algorithms can be used to estimate the target density or to generate a
boundary model analogous to a support vector classifier. Tax and Duin(4) seek to solve
the problem of OCC by distinguishing the positive class from all other possible patterns
in the pattern space. Instead of using a hyper-plane to distinguish between two classes,
a hyper-sphere is found around the positive class data. Free parameters in the algorithm
can be used to decide how many samples of the training data are covered by a
hyper-sphere of minimum radius. This method is called the Support Vector Data
Description (SVDD). Furthermore, the hyper-sphere model of the SVDD can be made
more flexible by introducing kernel functions. Tax(2) considers a polynomial and a
Gaussian kernel and found that the Gaussian kernel works better for most data sets. A
drawback of this technique is that it often requires a large data set. When designing an
OCC system, the question is which feature subset should be selected to generate the
hyper-sphere or to estimate a density. Entirely different concepts without SVDD are
also possible.
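As a rough illustration of the kernel-based boundary idea, the following sketch uses
scikit-learn's OneClassSVM, which implements the formulation of Scholkopf(5); with a
Gaussian (RBF) kernel this is closely related to the SVDD of Tax and Duin(4). The data,
the kernel width gamma and the outlier fraction nu are illustrative assumptions, not
values taken from this paper.

```python
# Sketch: one-class classification with a Gaussian kernel, in the spirit of
# SVDD. The parameter nu bounds the fraction of training samples that may be
# treated as outliers; all numbers below are illustrative assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
target = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "fault-free" data
outlier = np.array([[6.0, 6.0]])                        # clearly deviating point

model = OneClassSVM(kernel="rbf", gamma=0.2, nu=0.05)
model.fit(target)  # trained on the target class only

print(model.predict(outlier))  # -1: outside the learned data description
```

Here nu plays the role of the free parameter mentioned above: it controls how many
training samples are allowed to fall outside the description of minimum size.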
3. A Concept Based on Feature Selection Methods
The target of any feature selection/extraction algorithm is to choose a feature subset
such that a classification system trained with the data performs well. Feature creation
and feature selection must have an effective and positive impact on the classifier
performance. Feature selection in supervised learning has been
extensively studied. Supervised feature selection algorithms depend on measures that
take into account the class information. They try to find features that help separate data
of different classes. When a small number of instances is labelled but the majority is
not, semi-supervised feature selection is designed to take advantage of both the large
number of unlabelled instances and the labelling information, as in semi-supervised
learning. Intuitively, the additional labelling information should help constrain the
search space of unsupervised feature selection(9).
Unsupervised feature selection aims to find a subset of features according to a certain
criterion without prior information(9):
• Redundancy based: These algorithms try to eliminate the redundancy among the
features.
• Clustering based: In this category, clustering quality assessment methods are
used as the evaluation criterion.
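A minimal sketch of the redundancy-based idea, assuming absolute Pearson correlation
as the redundancy measure and an arbitrary threshold; both choices are illustrative and
not prescribed by the cited work.

```python
# Redundancy-based selection sketch: greedily drop any feature that is highly
# correlated with a feature that has already been kept.
import numpy as np

def drop_redundant(X, threshold=0.95):
    """Return indices of features whose pairwise |correlation| stays below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(3)
a = rng.normal(size=100)
b = rng.normal(size=100)
# Feature 1 is an almost exact linear copy of feature 0.
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=100), b])

print(drop_redundant(X))  # [0, 2]: the near-duplicate feature 1 is dropped
```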
Different concepts are available to execute the process of feature selection(10,11,12). The
authors have developed a concept for feature selection following a wrapper approach(13),
with the reclassification error as a quality indicator of the feature set(14,15).
On the basis of a classified learning sample, for which an unambiguous class
assignment has been performed for each sample during the learning phase, a quality
measure can be obtained by reclassifying the learning sample with the respective
classification algorithm. The ratio of the number of samples correctly classified in
accordance with the given class assignment to the total number of samples investigated
provides the reclassification rate. The objective is to obtain a very small
reclassification error. In the ideal case, the decision on the class assignment during
reclassification agrees with the class subdivision of the learning sample for all objects
on the basis of the maximal membership. The advantage of the reclassification error
concept is the possibility of determining conclusive values even with a small number of
samples.
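The reclassification rate described above can be sketched as follows. The classifier (a
k-nearest-neighbour model) and the synthetic data are assumptions for illustration; the
wrapper approach of the paper scores a feature subset by exactly this kind of rate.

```python
# Wrapper-style scoring sketch: a feature subset is rated by the
# reclassification rate of a classifier trained on exactly that subset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def reclassification_rate(X, y, feature_idx):
    """Train on the selected features and reclassify the learning sample."""
    Xs = X[:, feature_idx]
    clf = KNeighborsClassifier(n_neighbors=3).fit(Xs, y)
    return np.mean(clf.predict(Xs) == y)  # fraction correctly reclassified

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = (X[:, 2] > 0).astype(int)  # only feature 2 carries class information

print(reclassification_rate(X, y, [0, 1]))  # lower: uninformative features
print(reclassification_rate(X, y, [2]))     # higher: the informative feature
```

A feature selection run would evaluate this rate for many candidate subsets and keep
the subset with the smallest reclassification error.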
In Figure 2, 100 power spectra with 50 spectral amplitudes each are used for the feature
selection process. Each amplitude is a possible input feature. The feature selection
should determine the combination of input features that solves the classification task
with a minimal number of features. To demonstrate the complexity of the search,
Figure 2 shows the results for all combinations of two of the 50 possible input features.
Because the sequence of the features within a combination is arbitrary, the 50 features
generate 1225 different combinations. In Figure 2, all combinations with an error rate
of less than 30% are presented. The example clearly indicates that feature selection is
a crucial point for the performance of a classifier and strongly depends on the
particular task.
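The count of unordered pairs in this example follows from elementary combinatorics
and can be checked directly: choosing 2 out of 50 features without regard to order gives
C(50, 2) = 50 · 49 / 2 = 1225.

```python
# Number of unordered feature pairs when 2 of 50 features are combined.
from math import comb

n_features = 50
print(comb(n_features, 2))  # 1225
```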
If only one class is available, the concept has to be adapted, because a second class
cannot be used for the calculation of the quality criterion.
We assume a transition over a long time from the fault-free state A to a state C in
which an alert should be generated (Figure 3). For a number of faults in condition
monitoring caused by wear processes, this assumption is valid.
Figure 2. Classification rate in an example with 50 features
Figure 3. Transition of machinery condition from a state A to a fault state C
After an initial phase, we declare a set of new measurements with the corresponding
features as data belonging to a second (non-target) class. If the feature selection process
is not able to separate the two classes, the assumption that the current measurements
form a new class can be rejected (Figure 4). The advantage is that in the feature
selection process all possible feature combinations are evaluated.
During the life cycle, a change of the vibration signal (states B, C) should occur. The
feature selection will detect this change, and not only a few but many different feature
combinations can be used for the separation of the two classes (Figure 5). 1175
combinations of 2 features generate an error of less than 70%. The situation in Figure 2
could lie between states B and C.
The fault indicator is not a physical parameter like the kurtosis but the number of
feature combinations that can separate the target class from a second one. The difficulty
is to identify a reasonable limit on the number of separating feature combinations for
setting an alert (Figure 6).
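The fault indicator, i.e. the number of separating feature combinations, can be sketched
as follows. The block sizes, the k-nearest-neighbour classifier and the error threshold
are illustrative assumptions, not values from the paper.

```python
# Sketch of the fault indicator: declare the current block a hypothetical
# second class and count how many feature pairs separate it from the
# reference block better than an (assumed) error threshold.
from itertools import combinations
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def separating_pairs(reference, current, max_error=0.2):
    X = np.vstack([reference, current])
    y = np.r_[np.zeros(len(reference)), np.ones(len(current))]
    count = 0
    for pair in combinations(range(X.shape[1]), 2):
        clf = KNeighborsClassifier(n_neighbors=3).fit(X[:, pair], y)
        error = np.mean(clf.predict(X[:, pair]) != y)  # reclassification error
        if error < max_error:
            count += 1
    return count  # many separating pairs -> the signal has changed

rng = np.random.default_rng(2)
reference = rng.normal(0.0, 1.0, size=(50, 10))  # fault-free block
unchanged = rng.normal(0.0, 1.0, size=(50, 10))  # same distribution
shifted = rng.normal(3.0, 1.0, size=(50, 10))    # changed condition

print(separating_pairs(reference, unchanged))  # few pairs separate
print(separating_pairs(reference, shifted))    # many pairs separate
```

Setting an alert then amounts to thresholding this count over time, which is exactly
the open question of choosing a reasonable limit.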
Figure 4. Using the initial training data as a reference (target class) and a current
block of measurements as class 2
Figure 5. Classification rates where the original training set and the second class
can be separated by many feature combinations
Figure 6. Development of the number of separating feature combinations over
time until an alert is set
We have had good experience with the detection of cracks in shafts, delaminations in
plates and other applications(16).
4. Conclusions
If only data for the good condition of a machine are available as a target class, these
data can still be employed as the starting material for a monitoring concept. SVDD is
only one possible approach. When feature selection methods are used, all significant
deviations from this initial condition can then be detected.
References
1. S Khan and M Madden, 'A Survey of Recent Trends in One Class Classification',
Proceedings of the 20th Irish Conference on Artificial Intelligence and Cognitive
Science, Dublin, LNAI Vol 6206, pp 181-190, Springer-Verlag, 2009.
2. D Tax, 'One Class Classification', PhD thesis, Delft University of Technology,
2001.
3. M Moya, M Koch and L Hostetler, 'One-class classifier networks for target
recognition applications', Proceedings World Congress on Neural Networks, 1993.
4. D Tax and R Duin, 'Uniform object generation for optimizing one-class
classifiers', Journal of Machine Learning Research, Vol 2, p 155, 2001.
5. B Scholkopf et al, 'Support vector method for novelty detection', in S A Solla
et al (eds), Neural Information Processing Systems, pp 582-588, 2000.
6. G Ritter and M Gallegos, 'Outliers in statistical pattern recognition and an
application to automatic chromosome classification', Pattern Recognition Letters,
Vol 18, pp 525-539, 1997.
7. C Bishop, 'Novelty detection and neural network validation', IEE Proceedings on
Vision, Image and Signal Processing, Special Issue on Applications of Neural
Networks, Vol 141(4), pp 217-222, 1994.
8. N Japkowicz, 'Concept-Learning in the absence of counterexamples: An
autoassociation-based approach to classification', PhD thesis, Rutgers, The State
University of New Jersey, New Brunswick, 1999.
9. A Mosallam, 'Self-organized Selection of Features for Unsupervised On-board
Fault Detection', Master thesis, Studies from the Department of Technology at
Örebro University, 2010.
10. H Liu et al, 'Feature Selection: An Ever Evolving Frontier in Data Mining',
JMLR Workshop and Conference Proceedings, Vol 10, pp 4-13, The Fourth Workshop
on Feature Selection in Data Mining, 2010.
11. X Wu, Kui Yu, H Wang and W Ding, ‘Online Streaming Feature Selection’,
Proceedings of the 27th International Conference on Machine Learning, Haifa,
Israel, 2010.
12. L Yu and H Liu, ‘Efficient Feature Selection via Analysis of Relevance and
Redundancy’, Journal of Machine Learning Research, Vol 5, pp 1205–1224,
October 2004
13. R Kohavi and G H John, ‘Wrappers for feature subset selection’, Artificial
Intelligence, Vol 97, pp 273-324, 1997.
14. J Strackeljan, ‘Feature selection methods - an application oriented overview’,
TOOLMET‘01 Symposium, Oulu, Finland, pp 29-49, 2001.
15. J Strackeljan and A Schubert, ‘Evolutionary strategy to Select Input Features for a
Neural Network Classifier’, Advances in Computational Intelligence and Learning:
Methods and Applications (eds.: Zimmerman, H-J., Tselentis, G.), 2002.
16. S Goreczka and J Strackeljan, 'Comparison of one-class classifiers for Condition
Monitoring of rolling bearings in non-stationary operations', Proceedings CM 2012
– MFPT 2012, 9th International Conference on Condition Monitoring and
Machinery Failure Prevention Technologies, London, June 2012.