The review of literature in this chapter centers on recent studies of classifier-based text mining approaches for data mining applications and on ensemble methods. Early work on ensembles (Hansen & Salamon, 1990) suggested that ensembles with as few as ten members were adequate to sufficiently reduce test-set error. Michie, Spiegelhalter, and Taylor (1994) tried to find the relationship between the best-performing method and the data types of input/output variables. However, the common understanding among data mining practitioners and researchers is that no universally best-performing method exists. Different kinds of methods have their own advantages and defects: a method can perform best on one specific problem, while on another problem a different method works better. This situation is called selective superiority (Michie et al., 1994). Many researchers have investigated techniques for combining the predictions of multiple classifiers to produce a single classifier (Breiman, 1996c; Clemen, 1989; Perrone, 1993; Wolpert, 1992). The resulting classifier (hereafter referred to as an ensemble) is generally more accurate than any of the individual classifiers making up the ensemble. Both theoretical (Hansen & Salamon, 1990; Krogh & Vedelsby, 1995) and empirical (Hashem, 1997; Opitz & Shavlik, 1996a, 1996b) research has demonstrated that a good ensemble is one whose individual classifiers are both accurate and make their errors on different parts of the input space. Two popular methods for creating accurate ensembles are bagging (Breiman, 1996c) and boosting (Freund & Schapire, 1996; Schapire, 1990). These methods rely on “resampling” techniques to obtain different training sets for each of the classifiers. This work presents a comprehensive evaluation of bagging on data mining problems using four base classification methods: k-Nearest Neighbor (k-NN), Radial Basis Function (RBF), Multilayer Perceptron (MLP), and Support Vector Machine (SVM). Combining the output of several classifiers is useful only if there is disagreement among them; obviously, combining several identical classifiers produces no gain. Hansen and Salamon (1990) proved that if the average error rate for an example is less than 50% and the component classifiers in the ensemble are independent in the production of their errors, the expected error for that example can be reduced to zero as the number of classifiers combined goes to infinity; however, such assumptions rarely hold in practice. Krogh and Vedelsby (1995) later proved that the ensemble error can be divided into a term measuring the generalization error of each individual classifier and a term measuring the disagreement among the classifiers. What they formally showed was that an ideal ensemble consists of highly correct classifiers that disagree as much as possible; Opitz and Shavlik (1996a, 1996b) empirically verified that such ensembles generalize well. Breiman (1996c) showed that bagging is effective on “unstable” learning algorithms, where small changes in the training set result in large changes in predictions, and claimed that neural networks and decision trees are examples of unstable learning algorithms. If the learning algorithm has tunable parameters, diversity can also be obtained by giving each base model different parameter values; for example, in the area of neural networks, each base model can be given different initial random weights or a different topology (Sharkey, 1996).
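The decomposition of Krogh and Vedelsby (1995) referred to above can be stated compactly. In their formal setting of real-valued predictors f_α combined by a weighted average under squared error (the result carries over to classification only informally), the ensemble generalization error satisfies

\[
E \;=\; \bar{E} \;-\; \bar{A}, \qquad
\bar{E} = \sum_{\alpha} w_{\alpha} E_{\alpha}, \qquad
\bar{A} = \sum_{\alpha} w_{\alpha} \int p(x)\,\bigl(f_{\alpha}(x) - \bar{f}(x)\bigr)^{2}\,dx,
\]

where \(\bar{f}(x) = \sum_{\alpha} w_{\alpha} f_{\alpha}(x)\) is the ensemble output, \(E_{\alpha}\) is the generalization error of member α, and \(\bar{A}\) is the ensemble “ambiguity”. Since \(\bar{A} \ge 0\), the ensemble error never exceeds the weighted average member error, and it decreases as the members disagree more, which is precisely the accuracy–diversity trade-off noted above.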
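As a concrete illustration of the resampling idea behind bagging, the following minimal sketch trains base models on bootstrap replicates and combines them by majority vote. It assumes scikit-learn-style base classifiers and small non-negative integer class labels, and is illustrative rather than the exact experimental code used in this work.

```python
import numpy as np
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier

def bagging_fit(base_model, X, y, n_estimators=10, seed=0):
    """Train ensemble members on bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        models.append(clone(base_model).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Combine member predictions by unweighted majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (members, samples)
    # np.bincount assumes small non-negative integer labels
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Example with a bagged k-NN base classifier (X, y, X_test assumed numpy arrays):
# members = bagging_fit(KNeighborsClassifier(n_neighbors=5), X, y, n_estimators=25)
# y_hat = bagging_predict(members, X_test)
```

Because each member sees a different bootstrap replicate, unstable base learners in the sense of Breiman (1996c) produce the disagreement that the combination step then exploits.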
The boosting literature (Schapire, Freund, Bartlett, & Lee, 1997) has more recently suggested (based on a few data sets with decision trees) that it is possible to further reduce the test-set error even after ten members have been added to an ensemble, and the authors note that this result also applies to bagging. The idea of combining fitted values from a number of fitting attempts has been suggested by several authors (LeBlanc & Tibshirani, 1996; Mojirsheibani, 1997, 1999; Merz, 1999). In an important sense, the whole becomes more than the sum of its parts. Perhaps the earliest procedure to exploit a combination of “random trees” is bagging (Breiman, 1996c). In supervised learning (Mitchell, 1997), a pool of labeled data, S, is used to predict the labels of unseen data. Using S, an empirical accuracy can be calculated and used as an estimate of the generalization accuracy. Two accepted techniques for estimating the generalization accuracy are subsampling and k-fold cross-validation. Both techniques may employ a stratified partitioning in which the subsets contain approximately the same proportion of classes as S (a code sketch of this protocol is given below). Systems can be monitored at various levels, and the choice is influenced by several factors, including cost, accuracy, and the ability to differentiate normal from abnormal behavior. Typically, intrusion detection systems monitor either user behavior or the behavior of privileged processes. Although the former approach was more popular earlier (Denning, 1987), recent studies have used the latter (Lee, Stolfo, & Mok, 1998; Hofmeyr et al., 1998). Hofmeyr et al. (1998) found that short sequences of system calls are a good discriminator for several types of intrusion. Ensembles improve prediction performance through the combination of two effects: reduction of errors due to bias and reduction of errors due to variance (Haykin, 1999). The purpose of ensemble learning is to build a learning model that integrates a number of base learning models, so that the combined model gives better generalization performance on a particular dataset than any of the individual base models (Dietterich, 2000). Ensemble methods often perform extremely well and, in many cases, can be shown to have desirable statistical properties (Breiman, 2001a, 2001c). Fang, B., et al. (2003) proposed two methods to track the variations in the signature patterns written by the same person; such variations can occur in the shape or in the relative positions of the characteristic features. Given a set of training signature samples, the first method measures the positional variations of the one-dimensional projection profiles of the signature patterns, and the second determines the variations in relative stroke positions in the two-dimensional signature patterns. Kagan Tumer and Nikunj C. Oza (2003) showed that input-decimated ensembles outperform, on a wide range of domains, ensembles whose base classifiers use all the input features, randomly selected subsets of features, or features created using principal component analysis. Niall Rooney et al. (2004) investigated an algorithmic extension to the technique of stacked regression that prunes the size of a homogeneous ensemble set based on a consideration of the accuracy and diversity of the set members. They showed that the pruned ensemble set is, on average over the data sets tested, as accurate as the non-pruned version, which provides benefits in terms of application efficiency and reduced complexity of the ensemble.
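The accuracy-estimation protocol mentioned above can be made concrete with a short sketch. The following assumes scikit-learn's StratifiedKFold and numpy arrays X and y; it is a generic illustration, not the precise evaluation setup of this thesis.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def cv_accuracy(model, X, y, k=10, seed=0):
    """Stratified k-fold estimate of generalization accuracy: each fold
    preserves approximately the class proportions of the full pool S."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        scores.append(np.mean(fitted.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(scores))

# Example: estimate the accuracy of a k-NN base classifier
# print(cv_accuracy(KNeighborsClassifier(n_neighbors=5), X, y, k=10))
```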
Berk et al. (2004) suggested that ensemble methods could perhaps capture a selection process better: the probability of membership in each treatment group could be estimated with less bias, and more credible estimates of intervention effects would follow. P. M. Granitto, P. F. Verdes, and H. A. Ceccatto (2005) presented an extensive evaluation of several algorithms for ensemble construction, including new proposals, and compared them with standard methods from the literature; their algorithms and the weighted modifications compared favorably against the other methods, producing a sensible improvement in performance on most of the standard statistical databases used as benchmarks. Zonghua Zhang and Hong Shen (2005) modified the conventional SVM, Robust SVM, and one-class SVM based on ideas from Online SVM, and compared the performance of the modified algorithms with that of the originals. After elaborate theoretical analysis, concrete experiments with the 1998 DARPA BSM data set collected at MIT's Lincoln Labs were carried out. These experiments verify that the modified SVMs can be trained online and outperform the original ones, with fewer support vectors (SVs) and less training time, without decreasing detection accuracy; both of these achievements could significantly benefit an effective online intrusion detection system. Gavin Brown, Jeremy L. Wyatt, and Peter Tiňo (2005) presented the results of an empirical study showing significant improvements over simple ensemble learning, and found that their technique is competitive with a variety of methods, including boosting, bagging, mixtures of experts, and Gaussian processes, on a number of tasks. Oliver Buchtala, Manuel Klimek, and Bernhard Sick (2005) described an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. In order to reduce the optimization effort, various techniques are integrated that accelerate and improve the EA significantly: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and the benefits of the approach are demonstrated by means of four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. It is shown that, compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% while error rates are lowered by up to 86%, depending on the application. The algorithm is independent of specific applications, so many of its ideas and solutions can be transferred to other classifier paradigms. Songbo Tan (2006) proposed a new refinement strategy, called Drag Pushing, for the kNN classifier; experiments on three benchmark evaluation collections show that Drag Pushing achieves a significant improvement in the performance of the kNN classifier. Alok Sharma, Arun K. Pujari, and Kuldip K. Paliwal (2007) focused on intrusion detection based on system call sequences using text processing techniques. They introduce a kernel-based similarity measure for the detection of host-based intrusions, and the k-nearest neighbour (kNN) classifier is used to classify a process as either normal or abnormal (a simplified sketch of this style of classification is given below).
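To illustrate the flavor of text-processing-based host intrusion detection with a kNN classifier, the following sketch represents each process trace as a bag of system-call n-grams and uses cosine similarity. The actual kernel similarity of Sharma, Pujari, and Paliwal (2007) is not reproduced here; the n-gram and cosine choices are illustrative assumptions.

```python
from collections import Counter
import math

def ngram_profile(calls, n=3):
    """Bag of system-call n-grams for one process trace (a list of call names)."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[g] * q[g] for g in p if g in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def knn_label(trace, train, k=5, n=3):
    """Label a trace 'normal'/'abnormal' by majority vote of its k most
    similar training traces; train is a list of (trace, label) pairs."""
    p = ngram_profile(trace, n)
    sims = sorted(((cosine(p, ngram_profile(t, n)), lab) for t, lab in train),
                  reverse=True)
    top = [lab for _, lab in sims[:k]]
    return max(set(top), key=top.count)
```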
The proposed technique is evaluated on the DARPA-1998 database and its performance is compared with other existing techniques from the literature; it is shown to be significantly better than the others at achieving lower false positive rates at a 100% detection rate. Hyoung-joo Lee and Sungzoon Cho (2007) proposed using novelty detection approaches to alleviate the class imbalance in response modeling. Two novelty detectors, the one-class support vector machine (1-SVM) and learning vector quantization for novelty detection (LVQ-ND), were compared with binary classifiers on a catalogue mailing task with the DMEF4 dataset. The novelty detectors are more accurate and more profitable when the response rate is low; when the response rate is relatively high, however, a support vector machine model with modified misclassification costs performs best. In addition, the novelty detectors turn in higher profits with a low mailing cost, while the SVM model is the most profitable with a high mailing cost. Sandhya Peddabachigari, Ajith Abraham, Crina Grosan, and Johnson Thomas (2007) present two hybrid approaches for modeling IDS: decision trees (DT) and support vector machines (SVM) are combined as a hierarchical hybrid intelligent system model (DT–SVM), and an ensemble approach combines the base classifiers. The hybrid intrusion detection model combines the individual base classifiers and other hybrid machine learning paradigms to maximize detection accuracy and minimize computational complexity, and empirical results illustrate that the proposed hybrid systems provide more accurate intrusion detection. Taeshik Shon and Jongsub Moon (2007) proposed a new SVM approach, named Enhanced SVM, which combines the soft-margin SVM and the one-class SVM in order to provide unsupervised learning and a low false alarm capability similar to that of a supervised SVM approach. Nikunj C. Oza and Kagan Tumer (2008) showed that, mathematically, classifier ensembles provide an extra degree of freedom in the classical bias/variance tradeoff, allowing solutions that would be difficult (if not impossible) to reach with only a single classifier. Because of these advantages, classifier ensembles have been applied to many difficult real-world problems; they survey selected applications of ensemble methods to problems that have historically been most representative of the difficulties in classification, in particular remote sensing, person recognition, one-vs-all recognition, and medicine. Response modeling has become a key factor in direct marketing. In general, there are two stages in response modeling: the first stage identifies respondents from a customer database, while the second stage estimates the purchase amounts of the respondents. Dongil Kim, Hyoung-joo Lee, and Sungzoon Cho (2008) focused on the second stage, where a regression problem, not a classification problem, is solved. Recently, several non-linear models based on machine learning, such as support vector machines (SVM), have been applied to response modeling (a minimal sketch of such a second-stage regression is given below).
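As a sketch of the second-stage regression described above, the following fits a support vector regression model to respondents' purchase amounts. It assumes scikit-learn's SVR; the variable names and hyperparameter values are hypothetical.

```python
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Stage 2 of response modeling: regress purchase amount on customer features.
# X_resp holds features of customers identified as respondents in stage 1;
# amounts holds their (hypothetical) purchase amounts.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
# model.fit(X_resp, amounts)
# predicted_amounts = model.predict(X_new_respondents)
```

Scaling the inputs before the RBF-kernel SVR matters in practice, since the kernel is distance-based; the pipeline above handles this in one step.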
Rachid Beghdad (2008) presents a critical study of the use of several neural networks (NNs) to detect and classify intrusions. The aim of the research is to determine which NN classifies the attacks well and leads to the highest detection rate for each attack. The study focused on two types of classification: a single class (normal or attack), and a multiclass setting, where the category of attack is also detected by the NN. Five different types of NNs were tested: the multilayer perceptron (MLP), generalized feed forward (GFF), radial basis function (RBF), self-organizing feature map (SOFM), and principal component analysis (PCA) NN. In the single-class case, the PCA NN achieves the highest detection rate. Xuchun Li, Lei Wang, and Eric Sung (2008) show that AdaBoost incorporating properly designed RBFSVM (SVM with the RBF kernel) component classifiers, which they call AdaBoostSVM, can perform as well as SVM; furthermore, the proposed AdaBoostSVM demonstrates better generalization performance than SVM on imbalanced classification problems. The key idea of AdaBoostSVM is that, for the sequence of trained RBFSVM component classifiers, the σ values start large (implying weak learning) and are reduced progressively as the boosting iteration proceeds. This effectively produces a set of RBFSVM component classifiers whose model parameters are adaptively different, yielding better generalization than an AdaBoost approach with SVM component classifiers using a fixed (optimal) σ value. On benchmark data sets, their AdaBoostSVM approach is shown to outperform other AdaBoost approaches using component classifiers such as decision trees and neural networks. Wing W.Y. Ng, Daniel S. Yeung, Michael Firth, Eric C.C. Tsang, and Xi-Zhao Wang (2008) proposed a novel hybrid filter–wrapper feature subset selection methodology using a localized generalization error model. In their experiments, for two of the datasets, classifiers built using feature subsets with 90% of the features removed by the proposed approach yield average testing accuracies higher than those of classifiers trained using the full set of features. Tian Xinguang, Duan Miyi, Sun Chunlai, and Li Wenfa (2008) presented a novel method for detecting anomalous program behavior, applicable to host-based intrusion detection systems that monitor system call activities. The method constructs a homogeneous Markov chain model to characterize the normal behavior of a privileged program, associating the states of the Markov chain with the unique system calls in the training data; at the detection stage, the probabilities that the Markov chain model assigns to the system call sequences generated by the program are computed (a sketch of this scheme is given below). Yaochu Jin and Bernhard Sendhoff (2008) presented an overview of the existing research on multiobjective machine learning, focusing on supervised learning. In addition, a number of case studies are provided to illustrate the major benefits of the Pareto-based approach to machine learning, e.g., how to identify interpretable models, and models that generalize to unseen data, from the obtained Pareto-optimal solutions. Three approaches to Pareto-based multiobjective ensemble generation are compared and discussed in detail, and potentially interesting topics in multiobjective machine learning are suggested. Sebastián Maldonado and Richard Weber (2009) introduced a novel wrapper algorithm for feature selection using support vector machines with kernel functions. The method is based on sequential backward selection, using the number of errors on a validation subset as the measure for deciding which feature to remove in each iteration (sketched below); they compared their approach with other algorithms, such as a filter method and Recursive Feature Elimination SVM, to demonstrate its effectiveness and efficiency.
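The following is a minimal sketch of wrapper-style sequential backward selection in the spirit of Maldonado and Weber (2009), using validation-set errors of an SVM as the elimination criterion; the kernel choice and stopping rule are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def backward_select(X_tr, y_tr, X_val, y_val, n_keep):
    """Greedy backward elimination: repeatedly drop the feature whose
    removal yields the fewest validation errors (assumes n_keep >= 1)."""
    feats = list(range(X_tr.shape[1]))
    while len(feats) > n_keep:
        errs = []
        for f in feats:
            rest = [g for g in feats if g != f]
            clf = SVC(kernel="rbf").fit(X_tr[:, rest], y_tr)
            errs.append(np.sum(clf.predict(X_val[:, rest]) != y_val))
        feats.pop(int(np.argmin(errs)))  # drop the least useful feature
    return feats  # indices of the retained features
```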
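Similarly, the Markov chain scheme of Tian, Duan, Sun, and Li (2008) reviewed above can be sketched as follows: a first-order chain over the unique system calls is estimated from normal traces, and at detection time a trace is scored by the log-probability the chain assigns to it. The smoothing floor and scoring details here are illustrative assumptions, not the paper's exact procedure.

```python
from collections import defaultdict
import math

def train_chain(traces):
    """Estimate first-order transition probabilities between system calls
    from normal training traces (each trace is a list of call names)."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    chain = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        chain[a] = {b: c / total for b, c in nxt.items()}
    return chain

def avg_log_prob(chain, trace, floor=1e-6):
    """Average log-probability of a trace under the chain; low values flag
    anomalous behavior (the alarm threshold is application-dependent)."""
    lp = sum(math.log(chain.get(a, {}).get(b, floor))
             for a, b in zip(trace, trace[1:]))
    return lp / max(len(trace) - 1, 1)
```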
Ioannis Partalas, Grigorios Tsoumakas, and Ioannis Vlahavas (2009) studied the problem of pruning an ensemble of classifiers from a reinforcement learning perspective. Their work contributes a new pruning approach that uses the Q-learning algorithm to approximate an optimal policy for choosing whether to include or exclude each classifier from the ensemble; extensive experimental comparisons of the proposed approach against state-of-the-art pruning and combination methods show very promising results. Following this literature review, the work presented in the remainder of the thesis is guided by the above considerations.