SSRG International Journal of Computer Science and Engineering - (ICRTECITA-2017) - Special Issue - March 2017

Subset Feature Selection Algorithm Based on Optimal Characterization of Data Set

M Chitra, Assistant Professor, Department of Computer Science and Engineering, Ramco Institute of Technology, Rajapalayam
Abi Nanthana M, Abinaya S L, Abitha P, Kaleeswari T, Department of Computer Science and Engineering, Ramco Institute of Technology, Rajapalayam

Abstract—Most feature selection methods determine a single global subset of features onto which all data instances are projected in order to improve classification accuracy. An attractive alternative is to adaptively find a local subset of features. This paper presents a Local Feature Selection (LFS) approach for data classification in the presence of a large number of irrelevant features, in which each region of the sample space has its own distinct optimized feature set that varies in both membership and size across the sample space, allowing the feature set to adapt to local variations in the data. In addition, a method for measuring the similarity of a query datum to each of the classes is proposed; it makes no assumption about the underlying structure of the samples and is therefore insensitive to the distribution of the data over the sample space. The method is formulated as a linear programming optimization problem and is robust against over-fitting. Experimental results demonstrate the viability of the formulation and the effectiveness of the proposed algorithm.

Keywords—feature selection, class similarity, distance measure

I. INTRODUCTION

Feature selection has become the focus of much research in application areas for which datasets with tens or hundreds of thousands of variables are available. Selecting only the most relevant variables is usually suboptimal for building a predictor, particularly if the variables are redundant; conversely, a subset of useful variables may exclude many redundant, but relevant, variables. One of the most widely studied issues across scientific disciplines is dimensionality reduction, which can be divided into two approaches: feature extraction and feature selection. Feature selection is the process of selecting a subset of the terms occurring in the training set and using only this subset as features, for example in text classification.

Feature selection serves two main purposes. First, it makes training and applying a classifier more efficient by decreasing the size of the effective vocabulary. Second, it often increases classification accuracy by eliminating noise features. Feature selection can therefore be viewed as a method for replacing a complex classifier (using all features) with a simpler one (using a subset of the features).

Feature selection is an optimization problem involving two steps:
1) Search the space of possible feature subsets.
2) Select the subset that is optimal or near-optimal with respect to some objective function.

In this work, feature selection is considered for data classification: given a set of training samples and their classes, feature selection involves finding a subset of relevant features. We introduce an alternative to conventional global feature selection called Localized Feature Selection (LFS). Localized feature selection is realized by considering each training sample as a representative point of its neighboring region and by selecting an optimal feature set for that region. This also mitigates the over-fitting problem, which arises when too many parameters are fitted to the available observations and leads to increased error on new data.

II. METHOD DESCRIPTION

A. Subset Selection Method: This method selects a subset of features that together have good predictive power, as opposed to ranking features individually. Sequential forward selection and sequential backward selection can be used to add new features to, or remove features from, the existing set. The candidate subsets are evaluated using a category distance measure and the classification error.

Figure 1. Steps in the feature selection method: the original feature set passes through generation and evaluation; when the stopping criterion is met, the selected subset of features is validated.

1. Generation: The generation step produces a candidate subset of features for evaluation. The search may start with no features, all features, or a random subset, and subsequent steps may add features, remove features, or both. The feature space can be examined in three ways: complete, heuristic, and random.

2. Evaluation: The evaluation step determines the relevance of the generated candidate subset to the classification task. Each candidate is scored as Rvalue = J(candidate subset); if Rvalue > best_value, then best_value = Rvalue. In this work, distance is used as the evaluation function, specifically the Euclidean distance, which in two dimensions takes the familiar form $z^2 = x^2 + y^2$. The distance measure operates as follows:
1) It selects those features that keep instances of the same class within the same proximity.
2) Instances of the same class should be closer, in terms of distance, than instances of different classes.
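As a concrete illustration of the generation and evaluation steps above, the following Python sketch performs a simple sequential forward selection driven by a toy distance-based criterion (mean between-class distance divided by mean within-class distance). The function names, the criterion, and the stopping rule are illustrative assumptions, not the exact procedure used in the paper.

```python
import numpy as np

def distance_criterion(X, y):
    """Toy evaluation function J: mean between-class Euclidean distance
    divided by mean within-class Euclidean distance."""
    within, between = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = np.linalg.norm(X[i] - X[j])              # Euclidean distance
            (within if y[i] == y[j] else between).append(d)
    return np.mean(between) / (np.mean(within) + 1e-12)

def sequential_forward_selection(X, y, max_features):
    """Generation step: greedily add one feature at a time; keep the subset
    whose criterion value R is best and stop when no candidate improves it."""
    selected, best_value = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {f: distance_criterion(X[:, selected + [f]], y) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_value:                  # stopping criterion
            break
        best_value = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_value

# Tiny usage example: two informative features plus three noise features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.hstack([y[:, None] + 0.3 * rng.standard_normal((40, 2)),
               rng.standard_normal((40, 3))])
print(sequential_forward_selection(X, y, max_features=3))
```

Sequential backward selection would instead start from the full feature set and greedily remove the feature whose removal most improves the same criterion.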
III. FEATURE SELECTION

A. Preprocessing: Preprocessing is a data mining technique that transforms raw data into an understandable format. It is needed because the data may be inconsistent and contain errors; to extract the required information from large, incomplete, noisy and inconsistent data, preprocessing is essential. Here, a Butterworth filter is used to preprocess the data. First, the Butterworth coefficients are computed from the filter order and the cutoff frequency, expressed as a normalized frequency; the data are then filtered with these coefficients.

B. Feature Optimization: In feature optimization, eigenvalues are computed to determine how much information each direction carries and to rank the usefulness of the features. This requires the covariance matrix of the data. If an eigenvalue is large, the corresponding feature direction is selected; otherwise it is not considered. Finally, the original dataset is projected onto the selected directions.
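A minimal sketch of the preprocessing and feature-optimization steps described above is given below, assuming a numeric data matrix with samples in rows. The filter order, the normalized cutoff frequency, the use of zero-phase filtering (scipy.signal.filtfilt), and the number of retained directions are illustrative assumptions rather than values specified in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(X, order=4, cutoff=0.2):
    """Low-pass Butterworth filtering of every feature column.
    `cutoff` is given directly as a normalized frequency (0 < cutoff < 1)."""
    b, a = butter(order, cutoff)          # Butterworth filter coefficients
    return filtfilt(b, a, X, axis=0)      # zero-phase filtering along the samples

def optimize_features(X, n_keep=2):
    """Rank directions by the eigenvalues of the covariance matrix and
    project the data onto those carrying the most information (variance)."""
    Xc = X - X.mean(axis=0)                               # centre the data
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    idx = np.argsort(eigval)[::-1]                        # largest eigenvalues first
    W = eigvec[:, idx[:n_keep]]
    return Xc @ W, eigval[idx]                            # projection, ranked eigenvalues

# Usage on random data: filter, then keep the three strongest directions.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 6))
X_proj, ranked_eigvals = optimize_features(preprocess(X), n_keep=3)
print(X_proj.shape, np.round(ranked_eigvals, 3))
```

The eigenvalue step amounts to a principal-component style projection: directions with large eigenvalues carry most of the variance and are retained, while the rest are discarded.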
C. Graph Construction: Graph construction involves building a graph over the samples; if the graph already exists, a weight must be assigned to each edge. The weights can be assigned in one of three ways: binary, heat kernel, or cosine. The output of this step is then passed on for further processing. This step also requires the Euclidean distance between data points in the sample; it focuses on the neighboring samples by assigning higher weights to them. Initially, the weights are all assigned uniform values. If two samples are close to each other in one subspace, they are close in most of the other subspaces. The weight of sample $j$ with respect to sample $i$, averaged over the $N$ candidate subspaces, is obtained as

$$ w_j^{(i)} = \frac{1}{N} \sum_{k=1}^{N} \exp\!\left( -\left( d_{ij|k} - d_{ij|k}^{\min} \right) \right), $$

where the distance between samples $i$ and $j$ restricted to the $k$-th candidate subspace is

$$ d_{ij|k} = \left\| \left( \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right) \otimes \mathbf{f}^{(k)} \right\|_2 , $$

with $\otimes$ denoting the elementwise product and $\mathbf{f}^{(k)}$ the binary indicator vector of the $k$-th candidate feature subset, and where the reference distance is taken to the nearest same-class or different-class sample,

$$ d_{ij|k}^{\min} = \begin{cases} \min_{v \in y^{(i)}} d_{iv|k}, & \text{if } y^{(j)} = y^{(i)}, \\ \min_{v \notin y^{(i)}} d_{iv|k}, & \text{if } y^{(j)} \neq y^{(i)}. \end{cases} $$
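The weight computation above can be sketched as follows, treating each f^(k) as a binary indicator vector over the features and the circled product as elementwise multiplication. The helper names, and the exclusion of sample i itself from the same-class minimum, are assumptions made for this illustration rather than a verified reimplementation of the paper's code.

```python
import numpy as np

def subspace_distance(xi, xj, f_k):
    """d_{ij|k}: Euclidean distance between two samples restricted to the
    features switched on by the binary indicator vector f_k."""
    return np.linalg.norm((xi - xj) * f_k)

def sample_weights(X, y, i, F):
    """w_j^{(i)} for every sample j, averaged over the N candidate
    subspaces given as rows of the binary matrix F (shape N x n_features)."""
    n, N = X.shape[0], F.shape[0]
    w = np.zeros(n)
    for k in range(N):
        d = np.array([subspace_distance(X[i], X[j], F[k]) for j in range(n)])
        same = (y == y[i]) & (np.arange(n) != i)   # v in y^(i); i itself excluded (assumption)
        diff = y != y[i]                           # v not in y^(i)
        for j in range(n):
            d_min = d[same].min() if y[j] == y[i] else d[diff].min()
            w[j] += np.exp(-(d[j] - d_min))
    return w / N

# Usage on a toy data set with two candidate feature subsets.
rng = np.random.default_rng(2)
X = rng.standard_normal((10, 4))
y = np.array([0] * 5 + [1] * 5)
F = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])          # two binary subspace indicators
print(np.round(sample_weights(X, y, i=0, F=F), 3))
```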
D. Laplacian Score: The Laplacian Score (LS) is a popular ranking-based feature selection method that can be used in both supervised and unsupervised settings. It seeks the features that best reflect the underlying manifold structure. LS constructs a nearest-neighbor graph to model the local structure and then selects those features that best respect this graph structure.

E. Class Similarity Measurement: In this approach there is no common set of features across the sample space, since a single global feature set is considered inappropriate. The proposed structure is instead based on the similarity of a datum to a specific class. It consists of N regions, where each region has a representative point, a class label, and an optimal feature set. Each region also carries an impurity level, defined as the ratio of samples with a differing class label to samples with the same class label. The similarity of a query sample is measured with respect to all regions, and hence to all classes; after computing the similarity to all classes, the class label that provides the largest similarity is assigned to the query. If the query sample does not fall into any region, its class is assigned as the class label of the nearest sample, where the region's coordinate system is used to determine the nearest neighboring sample.
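The region-based decision rule just described might look like the sketch below. The radius-based test for whether a query falls inside a region and the counting-based class similarity are assumptions introduced purely for illustration; the paper formulates the actual similarity measure as a linear programming problem, which is not reproduced here.

```python
import numpy as np

def classify_query(q, reps, labels, feature_masks, radii):
    """Assign a class to query q given N regions, each described by a
    representative point, a class label, a local binary feature mask and a
    radius. Class similarity here is simply the number of that class's
    regions containing q in their own local feature subspace; if no region
    contains q, fall back to the class of the nearest representative."""
    dists = np.array([np.linalg.norm((q - r) * m)          # local-subspace distance
                      for r, m in zip(reps, feature_masks)])
    inside = dists <= radii                                 # regions that contain q
    similarity = {c: int(np.sum(inside & (labels == c))) for c in np.unique(labels)}
    best = max(similarity, key=similarity.get)
    if similarity[best] > 0:
        return best
    return labels[np.argmin(dists)]                         # nearest-sample fallback

# Toy usage: four representative points, two classes, local feature masks.
reps = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1])
feature_masks = np.array([[1, 0], [1, 1], [0, 1], [1, 1]])
radii = np.full(4, 1.5)
print(classify_query(np.array([0.5, 0.2]), reps, labels, feature_masks, radii))
```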
F. Overfitting Problem: Overfitting occurs when a learning model adapts itself too closely to the relationship between the training data and the labels; it tends to make the model overly complex by introducing too many parameters, which leads to poor performance on new data. Overfitting happens mainly when one tries to learn from a small dataset. An algorithm that automatically decides which features to keep and which to discard reduces the number of features, and this idea can work well in reducing overfitting. The potential for overfitting depends not only on the number of parameters and the amount of data, but also on the conformability of the model structure with the shape of the data and on the magnitude of the model error compared with the expected level of noise in the data. The risk of overfitting is never fully eliminated for real data sets; however, the LFS algorithm inherently tends to select only relevant features and to reject irrelevant ones.

IV. EXPERIMENTAL RESULTS

In the proposed method, a synthetic data set is distributed in a two-dimensional feature space in which the class Y1 data are split into two clusters, and artificial irrelevant features are independently sampled with zero mean and unit variance. The data sets "Prostate", "Duke breast cancer", "Leukemia" and "Colon" are microarray data sets in which, in each case, the number of features is significantly larger than the number of samples. The proposed algorithm is implemented in MATLAB and executed on a desktop with an Intel Core i3 CPU.

A. Overlapping Feature Sets: We consider whether there is any overlap between the optimal feature sets of the representative points. The height of each feature bar indicates the percentage of representative points that select the respective feature. The assumption of a common feature set over the entire sample space is not necessarily optimal in these applications: the commonly selected features can be interpreted as the most informative features in terms of accuracy over the whole space, while the less frequently selected features may still be informative in parts of it.

B. CPU Time: The computational complexity of computing a feature set depends mainly on the data dimension. The proposed method performs feature selection for one representative point of the data set in the presence of a number of irrelevant features. The feature selection for each representative point is independent of the others and can therefore be performed in parallel. Determining the class label of a query from its nearest neighbors requires no further optimization, so that stage takes only a fraction of a second of CPU time.

Figure 2. Option for getting input.
Figure 3. Loading of dataset.
Figure 4. Preprocessed data.
Figure 5. Filtered data.
Figure 6. Optimized data.

V. CONCLUSION

We presented an effective local feature selection method for the data classification problem. Most feature selection algorithms select a single global feature set; in the proposed method, local subsets of features are selected that are most informative for the small regions around the data points. The computation for each region is independent of all the others and can be performed in parallel, so the proposed algorithm offers efficient, parallelizable feature selection.

Acknowledgment

The authors wish to acknowledge their guide for the support provided.