CHAPTER 3
BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS
3.1 Introduction
Feature selection [37, 38, 49, 88] is a technique used to reduce the dimensionality of data by eliminating irrelevant features, thereby improving predictive accuracy. The search begins with an empty set of features and generates all possible single-feature expansions; the subset with the maximum accuracy is chosen and expanded in the same way by adding single features. If expanding a subset yields no further improvement in accuracy, the search backtracks to the next best unexpanded subset and continues from there. Finally, the subset with the maximum accuracy is selected as the reduced feature set [7, 33, 59].
The objective of this study is to predict the life expectancy of patients with hepatitis from hepatitis data and to improve the classification accuracy. The Naive Bayes algorithm is used to obtain the classification and prediction accuracy. To increase its accuracy, Correlation based Feature Selection (CFS) with the best first and greedy search strategies of feature selection is used, ensuring that noisy or irrelevant features are removed. The prediction accuracy of Naive Bayes is then compared with that of other classification algorithms such as J48, Multilayer Perceptron (MLP), Sequential Minimal Optimization (SMO) and Radial Basis Function (RBF).
This chapter is organized as follows: Section 3.2 deals with the concepts of CFS and the best first and greedy search algorithms; Section 3.3 presents the proposed methodology; and Section 3.4 illustrates the experimental results.
3.2 Correlation Based Feature Selection Filter
Attribute selection involves searching through all possible combinations of
attributes in the data to find which subset of attributes works best for prediction. For that
purpose an attribute evaluator and a search method are needed. The evaluator determines
what method is used to assign a worth to each subset of attributes. The search method
determines what style of search is performed. In this work CFS is used as an evaluator
and BFS and GS as searching methods.
3.2.1 Feature Evaluation
CFS [41, 57, 61, 100] is a heuristic approach for evaluating the worth, or merit, of a subset of features. It ranks the relevance of features by measuring the correlation between features and classes and between features and other features. Given k features and classes c, CFS defines the relevance of a feature subset using Pearson's correlation equation:
$M_S = \dfrac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$    (3.1)
where $M_S$ is the relevance of the feature subset, $\overline{r_{cf}}$ is the average linear correlation coefficient between the features and the classes, and $\overline{r_{ff}}$ is the average linear correlation coefficient among the different features.
Normally, CFS adds (forward selection) or deletes (backward selection) one feature at a time; in this work, however, best first search (BFS) and greedy hill-climbing search algorithms are used to obtain better results [24, 25].
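As an illustration of Equation (3.1), the following Python sketch computes the merit of a candidate subset from the average feature-class and feature-feature correlations; the function name and the example values are hypothetical and are given only to make the formula concrete.

import math

def cfs_merit(avg_feature_class_corr, avg_feature_feature_corr, k):
    """Merit of a k-feature subset as in Equation (3.1).

    avg_feature_class_corr: mean correlation between the features and the class.
    avg_feature_feature_corr: mean pairwise correlation among the features.
    """
    if k == 0:
        return 0.0
    return (k * avg_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * avg_feature_feature_corr)

# Example: a 10-feature subset with average r_cf = 0.4 and average r_ff = 0.2
print(round(cfs_merit(0.4, 0.2, 10), 4))  # approximately 0.7559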
3.2.2 Searching the Feature Subset
Feature selection can be viewed as a search problem, with each state in the search
space specifying a subset of the possible features. Search strategies can be influenced by
search directions. In this section Greedy search and best first search are elaborated.
3.2.2.1 Greedy Hill Climbing Search (GS)
Searching the space of feature subsets within reasonable time constraints is necessary if a feature selection algorithm is to operate on data with a large number of features. One simple search strategy, called greedy hill climbing, considers local changes to the current feature subset. Frequently, a local change is the addition or deletion of a single feature from the subset. When the algorithm considers only additions to the feature subset it is known as forward selection; when it considers only deletions it is known as backward elimination [24, 25, 15].
An alternative approach, called stepwise bi-directional search, uses both addition and deletion. Within each of these variations, the search method may consider all possible local changes to the current subset and then choose the best, or it may simply take the first change that improves the merit of the current feature subset. In either case, once a change is accepted it is never reconsidered.
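To make the procedure concrete, the following Python sketch performs greedy forward selection over feature indices. The evaluate callable stands in for any subset evaluator, such as the CFS merit of Equation (3.1); it is an assumption of this sketch rather than part of the original description.

def greedy_forward_selection(n_features, evaluate):
    """Greedy hill climbing (forward selection) over feature subsets.

    n_features: total number of candidate features.
    evaluate:   callable mapping a frozenset of feature indices to a merit score.
    Stops as soon as no single-feature addition improves the current merit.
    """
    current, current_merit = frozenset(), evaluate(frozenset())
    while True:
        # Local changes considered here: adding one feature not yet in the subset.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        if not candidates:
            break
        best = max(candidates, key=evaluate)
        best_merit = evaluate(best)
        if best_merit > current_merit:
            current, current_merit = best, best_merit
        else:
            break  # no local change improves the merit
    return current, current_merit

Backward elimination is obtained by starting from the full feature set and considering single-feature deletions instead of additions.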
3.2.2.2 Best First Search (BFS)
Best first search is an important AI search strategy that allows backtracking along the search path [24, 25]. Like greedy hill climbing, best first search moves through the search space by making local changes to the current feature subset. Unlike hill climbing, however, if the path being explored begins to look less promising, best first search can backtrack to a more promising previous subset and continue the search from there. Given enough time, best first search will explore the entire search space, so it is common to use a stopping criterion; normally this involves limiting the number of consecutive fully expanded subsets that result in no improvement.
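A minimal sketch of this strategy is given below, assuming the same hypothetical evaluate callable as above and a stall limit of five consecutive non-improving expansions; the limit is an assumption of the sketch, not a value taken from this work.

import heapq
import itertools

def best_first_selection(n_features, evaluate, max_stale=5):
    """Best first forward search with backtracking over feature subsets.

    OPEN is a priority queue of subsets ordered by merit; CLOSED holds the
    subsets already expanded. The search stops after max_stale consecutive
    expansions that fail to improve the best merit found so far.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares subsets
    start = frozenset()
    open_list = [(-evaluate(start), next(counter), start)]
    closed = set()
    best, best_merit = start, evaluate(start)
    stale = 0
    while open_list and stale < max_stale:
        neg_merit, _, subset = heapq.heappop(open_list)
        if subset in closed:
            continue
        closed.add(subset)
        if -neg_merit > best_merit:
            best, best_merit = subset, -neg_merit
            stale = 0
        else:
            stale += 1
        # Expand: every child reached by adding one unused feature.
        for f in range(n_features):
            if f in subset:
                continue
            child = subset | {f}
            if child not in closed:
                heapq.heappush(open_list, (-evaluate(child), next(counter), child))
    return best, best_merit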
3.2.3 Correlation Measures
Correlation measures are important for obtaining the merit of a feature subset. To estimate the merit of a feature subset, it is necessary to compute the correlation (dependence) among attributes by applying Equation (3.1). Research on decision tree induction has provided a number of methods for estimating the quality of an attribute, that is, how predictive one attribute is of another.
For discrete class problems, CFS first discretizes numeric features using the technique of Fayyad and Irani and then uses a modified information gain measure (symmetrical uncertainty) to estimate the degree of association among discrete features. If X and Y are discrete random variables, Equations (3.2) and (3.3) give the entropy of Y before and after observing X.
$H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y)$    (3.2)

$H(Y|X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log_2 p(y|x)$    (3.3)
The amount by which the entropy of Y decreases reflects the additional information about Y provided by X and is called the information gain, which is given by
$\mathrm{Gain} = H(Y) - H(Y|X) = H(X) - H(X|Y) = H(Y) + H(X) - H(X,Y)$    (3.4)
Information gain is a symmetrical measure; that is, the amount of information gained about Y after observing X is equal to the amount of information gained about X after observing Y. Unfortunately, information gain is biased in favour of features with more values: attributes with greater numbers of values appear to gain more information than those with fewer values even if they are actually no more informative. Furthermore, the correlations in Equation (3.1) should be normalized to ensure that they are comparable and have the same effect. Symmetrical uncertainty compensates for information gain's bias toward attributes with more values and normalizes its value to the range [0, 1]:
$\mathrm{Symmetrical\ Uncertainty} = 2.0 \times \left[\dfrac{\mathrm{Gain}}{H(Y) + H(X)}\right]$    (3.5)
To handle unknown (missing) data values in an attribute, CFS distributes their
counts across the represented values in proportion to their relative frequencies.
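As a sketch of Equations (3.2) to (3.5), the following Python functions compute the entropy, conditional entropy and symmetrical uncertainty of two discrete attributes represented as equal-length lists of values; the function names and the example data are illustrative only.

import math
from collections import Counter

def entropy(values):
    """H(Y) = -sum p(y) log2 p(y) over the observed values (Equation 3.2)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(y, x):
    """H(Y|X) = -sum p(x) sum p(y|x) log2 p(y|x) (Equation 3.3)."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum((len(ys) / n) * entropy(ys) for ys in groups.values())

def symmetrical_uncertainty(y, x):
    """2 x Gain / (H(Y) + H(X)), normalized to [0, 1] (Equations 3.4 and 3.5)."""
    gain = entropy(y) - conditional_entropy(y, x)
    denom = entropy(y) + entropy(x)
    return 2.0 * gain / denom if denom > 0 else 0.0

# A perfectly predictive attribute yields a symmetrical uncertainty of 1.0
print(symmetrical_uncertainty(["live", "die", "live", "die"],
                              ["a", "b", "a", "b"]))  # 1.0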
3.3 Proposed Work
The proposed approach comprises two variants. In the first, the nineteen features of the hepatitis disease dataset are reduced to ten by the CFS evaluator with best first search, and the resulting subset is classified using the Naive Bayes algorithm. The model is evaluated with performance measures such as accuracy, sensitivity, specificity and precision. In the second, the nineteen features are reduced to ten by the CFS evaluator with greedy search, and the resulting subset is again classified using Naive Bayes and evaluated with the same performance measures. The architectural diagram of the proposed methodology is shown in Fig. 3.1.
[Fig. 3.1 block diagram: Hepatitis Dataset → Dimensionality Reduction (CFS with Best First Search / CFS with Greedy Search) → Reduced Subset 1 / Reduced Subset 2 → Classification by Naive Bayes → Performance Evaluation.]
Fig. 3.1 Proposed Methodology of BFSCFS-NB and GSCFS-NB
3.3.1 BFSCFS-NB Algorithm
A copy of the training data is first discretized and then passed to CFS. CFS calculates the feature-class and feature-feature correlations using the symmetrical uncertainty measure and then searches the feature subset space using best first search. The subset with the highest merit found during the search is used to reduce the dimensionality of both the original training data and the testing data, so that they contain only the features selected by CFS. The dimensionally reduced data are then passed to the Naïve Bayes algorithm for classification and prediction.
The working principle of BFSCFS-NB is shown in Fig.3.2.
[Fig. 3.2 block diagram: the training data is pre-processed and discretized, CFS calculates the feature-class and feature-feature correlations, BFS searches the feature set and evaluates subset merits, the selected subset drives the dimensionality reduction of the training and testing data, and the Naïve Bayes algorithm performs the final evaluation.]
Fig. 3.2 Proposed Model for BFS based CFS and Naïve Bayes Algorithm
The following shows the proposed BFSCFS-NB algorithm:
Step 1: Begin with the OPEN list containing the start state, the CLOSED list empty, and BEST ← start state.
Step 2: Let s = arg max e(x) (the state from OPEN with the highest evaluation).
Step 3: Remove s from OPEN and add it to CLOSED.
Step 4: If e(s) ≥ e(BEST), then BEST ← s.
Step 5: For every child t of s that is not in the OPEN or CLOSED list, evaluate it and add it to OPEN.
Step 6: If BEST changed in the last set of expansions, go to Step 2.
Step 7: Return BEST.
Step 8: Obtain the new (reduced) data set.
Step 9: Discretize both the training and test data.
Step 10: Estimate the prior probabilities P(Cj), j = 1, ..., k, from the training data, where k is the number of classes.
Step 11: Estimate the conditional probabilities P(Ai = aℓ | Cj), i = 1, ..., D, j = 1, ..., k, ℓ = 1, ..., d, from the training data, where D is the number of features and d is the number of discretization levels.
Step 12: Estimate the posterior probabilities P(Cj | A) for each test example x represented by a feature vector A.
Step 13: Assign x to the class C* such that C* = arg max_{j=1,2} P(Cj | A).
The first half of the algorithm (Steps 1 to 8) selects the feature subset using best first search, and the second half (Steps 9 to 13) performs classification using Naïve Bayes.
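To illustrate Steps 9 to 13, the following Python sketch estimates the priors and conditional probabilities from discretized training data and assigns a test example to the most probable class. It is a plain, unsmoothed sketch on hypothetical data structures and is not the exact implementation used in the experiments.

from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate P(Cj) and P(Ai = a | Cj) from discretized training data.

    X: list of feature vectors (each a list of discrete values).
    y: list of class labels, one per training example.
    """
    n = len(y)
    class_counts = Counter(y)
    priors = {c: count / n for c, count in class_counts.items()}
    # counts[(i, c)][a] = number of class-c examples whose attribute i equals a
    counts = defaultdict(Counter)
    for xi, c in zip(X, y):
        for i, a in enumerate(xi):
            counts[(i, c)][a] += 1
    def conditional(i, a, c):
        return counts[(i, c)][a] / class_counts[c]
    return priors, conditional

def classify(x, priors, conditional):
    """Assign x to C* = arg max_j P(Cj) * prod_i P(Ai = xi | Cj)."""
    def score(c):
        p = priors[c]
        for i, a in enumerate(x):
            p *= conditional(i, a, c)
        return p
    return max(priors, key=score)

# Tiny illustrative example
X = [["low", "yes"], ["high", "no"], ["low", "no"], ["high", "yes"]]
y = ["LIVE", "DIE", "LIVE", "DIE"]
priors, cond = train_naive_bayes(X, y)
print(classify(["low", "yes"], priors, cond))  # LIVE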
3.3.2 GSCFS-NB Algorithm
The working principle of GSCFS-NB is shown in Fig.3.3.
[Fig. 3.3 block diagram: the training data is pre-processed and discretized, CFS calculates the feature-class and feature-feature correlations, greedy search (GS) searches the feature set and evaluates subset merits, the selected subset drives the dimensionality reduction of the training and testing data, and the Naïve Bayes algorithm performs the final evaluation.]
Fig. 3.3 Proposed Model for GS based CFS and Naïve Bayes Algorithm
A copy of the training data is first discretized and then passed to CFS. CFS calculates the feature-class and feature-feature correlations using the symmetrical uncertainty measure and then searches the feature subset space using greedy search. The subset with the highest merit found during the search is used to reduce the dimensionality of both the original training data and the testing data, so that they contain only the features selected by CFS. The dimensionally reduced data are then passed to the Naïve Bayes classifier for prediction.
The proposed GSCFS-NB algorithm is given below.
Step 1: Let s ← start state.
Step 2: Expand s by making each possible local change.
Step 3: Evaluate each child t of s.
Step 4: Let s' ← the child t with the highest evaluation e(t).
Step 5: If e(s') ≥ e(s), then s ← s' and go to Step 2.
Step 6: Return s.
Step 7: Obtain the new (reduced) data set.
Step 8: Discretize both the training and test data.
Step 9: Estimate the prior probabilities P(Cj), j = 1, ..., k, from the training data, where k is the number of classes.
Step 10: Estimate the conditional probabilities P(Ai = aℓ | Cj), i = 1, ..., D, j = 1, ..., k, ℓ = 1, ..., d, from the training data, where D is the number of features and d is the number of discretization levels.
Step 11: Estimate the posterior probabilities P(Cj | A) for each test example x represented by a feature vector A.
Step 12: Assign x to the class C* such that C* = arg max_{j=1,2} P(Cj | A).
The first half of the algorithm (Steps 1 to 7) selects the feature subset using greedy search, and the second half (Steps 8 to 12) performs classification using Naïve Bayes.
3.4 Experimental Results and Discussion
The hepatitis dataset was used to evaluate the proposed method. The whole dataset is divided into training and test sets in the ratio 66%:34%. The training set is used to estimate the parameters of each model, while the test set is used to independently assess the individual models. The experiments are implemented in the WEKA data mining workbench on an Intel(R) Core(TM) i3-2328M CPU at 2.20 GHz.
In this work, CFS was used as the feature selection method, with best first and greedy search as the search strategies. After feature selection, the attributes selected by greedy search based CFS were age, sex, malaise, spiders, ascites, varices, bilirubin, albumin, protime and histology. The attributes selected by best first search based CFS were the same: age, sex, malaise, spiders, ascites, varices, bilirubin, albumin, protime and histology. After identifying the subset, the naïve Bayes algorithm was used for classification, and its performance was then compared with existing classification algorithms such as sequential minimal optimization, J48, multilayer perceptron and radial basis function.
3.4.1 Performance Evaluation
The experimental results illustrate the various measures that are used to evaluate
the model for classification and prediction. In this work the accuracy, sensitivity,
specificity, precision and kappa statistics are elaborated.
Accuracy
The accuracy of a classifier on a given test set is the percentage of test set tuples
that are correctly classified by the classifier. The associated class label of each test tuple
is compared with the learned classifier's class prediction for that tuple. The ten-fold cross-validation method is used for estimating classifier accuracy. Equation (3.6) is used to measure the accuracy.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3.6)
Sensitivity
Sensitivity is also referred to as the true positive rate, that is, the proportion of positive tuples that are correctly identified. Equation (3.7) is used to measure the sensitivity.

Sensitivity = TP / (TP + FN)    (3.7)
Specificity
The specificity is the true negative rate, that is, the proportion of negative tuples that are correctly identified. Equation (3.8) is used to measure the specificity.

Specificity = TN / (TN + FP)    (3.8)
Precision
A false positive (FP) occurs when the outcome is incorrectly predicted as yes (or positive) when it is actually no (negative). A false negative (FN) occurs when the outcome is incorrectly predicted as no when it is actually yes. Equation (3.9) is used to measure the precision.

Precision = TP / (TP + FP)    (3.9)
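A short Python sketch of Equations (3.6) to (3.9), computed from the four confusion-matrix counts, is given below; the function name and the example counts are illustrative.

def performance_measures(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and precision (Equations 3.6 to 3.9)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
    }

# Example with hypothetical counts
print(performance_measures(tp=23, tn=114, fp=9, fn=9))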
The accuracy, time, precision, sensitivity and specificity for Naive Bayes, J48, Multilayer Perceptron, SMO and RBF [91, 92] are shown in Table 3.1.
Table 3.1 Performance Measures: Before Feature Selection

Classification Algorithms   Accuracy   Time   Precision   Sensitivity   Specificity
Multi Layer Perceptron      80.0%      0.61   80.7%       80.0%         85.17%
RBF                         83.80%     0.2    83.2%       83.8%         87.86%
SMO                         83.16%     0.11   82.7%       83.2%         86.95%
J48                         83.87%     0.11   82.5%       83.9%         87.93%
Naive Bayes                 84.51%     0.02   85.3%       84.5%         88.61%
The accuracy, time, precision, sensitivity and specificity for BFS based CFS-MLP, BFS based CFS-RBF, BFS based CFS-SMO, BFS based CFS-J48 and BFS based CFS-NB are listed in Table 3.2.
Table 3.2 Performance Measures: After Feature Selection based on BFSCFS

Classification Algorithms   Accuracy   Time   Precision   Sensitivity   Specificity
BFS based CFS-MLP           84.51%     0.36   84.2%       84.5%         91.05%
BFS based CFS-RBF           86.4%      0.33   85.8%       86.5%         91.49%
BFS based CFS-SMO           83.22%     0.25   81.6%       83.2%         90.30%
BFS based CFS-J48           81.29%     0.08   79.3%       81.3%         89.68%
BFS based CFS-NB            88.53%     0.02   88.9%       88.7%         92.86%
The models were evaluated based on classification accuracy, sensitivity and specificity. In this proposed method, CFS with best first search and the naïve Bayes algorithm were applied; using this model, a prediction accuracy of 88.53% is achieved.
The accuracy, time, precision, sensitivity and specificity for GS based CFS-MLP, GS based CFS-RBF, GS based CFS-SMO, GS based CFS-J48 and GS based CFS-NB are listed in Table 3.3.
Table 3.3 Performance Measures: After Feature Selection based on GSCFS

Classification Algorithms   Accuracy   Time   Precision   Sensitivity   Specificity
GS based CFS-MLP            84.51%     0.47   84.2%       84.5%         89.90%
GS based CFS-RBF            86.45%     0.13   85.8%       86.5%         89.15%
GS based CFS-SMO            83.22%     0.06   81.6%       83.2%         88.30%
GS based CFS-J48            81.29%     0.05   79.3%       81.3%         89.38%
GS based CFS-NB             87.74%     0.02   87.9%       87.7%         91.86%
The models were evaluated based on classification accuracy, sensitivity and specificity. In this variant, CFS with greedy search and the naïve Bayes algorithm were applied; using this model, a prediction accuracy of 87.74% is achieved.
3.4.2 K-Fold Cross Validation
In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets, or folds, D1, D2, ..., Dk, each of approximately equal size, and training and testing are performed k times. In iteration i, partition Di is kept as the test set and the remaining partitions are collectively used to train the model. That is, in the first iteration, subsets D2, ..., Dk collectively serve as the training set to obtain the first model, which is tested on D1; the second iteration is trained on D1, D3, ..., Dk and tested on D2; and so on. Each sample is thus used the same number of times for training and exactly once for testing. The accuracy estimate for classification is the overall number of correct classifications from the k iterations divided by the total number of tuples in the initial data. The error estimate for prediction can be computed as the total loss from the k iterations divided by the total number of initial tuples.
Leave-one-out is a special case of k-fold cross-validation where k is set to the
number of initial tuples. That is, only one sample is left out at a time for the test set.
In stratified cross-validation, the folds are stratified so that the class distribution of the
tuples in each fold is approximately the same as that in the initial data. In this work
10-fold cross-validation is used for estimating accuracy.
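A minimal sketch of plain (unstratified) k-fold partitioning and accuracy estimation is given below; the train_and_test callable is a placeholder for fitting and evaluating any of the classifiers discussed in this chapter and is an assumption of the sketch.

import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition the sample indices into k folds of roughly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, train_and_test):
    """Estimate accuracy by k-fold cross-validation.

    train_and_test(train_idx, test_idx) should train a model on the training
    indices and return the number of test tuples it classifies correctly.
    The estimate is the total number of correct classifications over the k
    iterations divided by the total number of tuples.
    """
    folds = k_fold_indices(n_samples, k)
    correct = 0
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        correct += train_and_test(train_idx, test_idx)
    return correct / n_samples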
3.4.3 Kappa Statistics
The kappa parameter measures pairwise agreement between two different observers, corrected for the expected chance agreement [40]. For instance, a value of one means that there is complete agreement between the classifier and the real-world value. The kappa value is calculated using the following equations:
$K = \dfrac{P(A) - P(E)}{1 - P(E)}$    (3.10)

$P(A) = \dfrac{TP + TN}{N}$    (3.11)

$P(E) = \dfrac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{N^2}$    (3.12)
where N is the total number of instances used, P(A) is the percentage of agreement between the classifier and the underlying truth, calculated by Equation (3.11), and P(E) is the chance agreement, calculated by Equation (3.12). In this study, the kappa value calculated by Equation (3.10) is 0.6302 for CFS-NB based on both best first search and greedy search.
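As an illustration of Equations (3.10) to (3.12), the following Python sketch computes the kappa statistic directly from binary confusion-matrix counts; the function name and the example counts are illustrative.

def kappa_statistic(tp, tn, fp, fn):
    """Kappa from binary confusion-matrix counts (Equations 3.10 to 3.12)."""
    n = tp + tn + fp + fn
    p_a = (tp + tn) / n                                             # observed agreement, Equation (3.11)
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2  # chance agreement, Equation (3.12)
    return (p_a - p_e) / (1 - p_e)                                  # Equation (3.10)

# Example with hypothetical counts
print(round(kappa_statistic(tp=23, tn=114, fp=9, fn=9), 4))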
The mean absolute error is a quantity used to measure how close predictions are to the eventual outcomes, and it is given by
$\mathrm{Mean\ Absolute\ Error} = \dfrac{1}{d} \sum_{i=1}^{d} |y_i - y_i'|$    (3.13)

The mean absolute error is an average of the absolute errors $|y_i - y_i'|$, where $y_i$ is the prediction and $y_i'$ is the true value.
The root mean squared error is the square root of the mean of the squared errors. Because the errors are squared before they are averaged, the root mean square error gives a relatively high weight to large errors. The root mean square error $E_i$ of an individual program $i$ is evaluated by the equation

$E_i = \sqrt{\dfrac{1}{n} \sum_{j=1}^{n} \left(\dfrac{P_{(ij)} - T_j}{T_j}\right)^2}$    (3.14)

where $P_{(ij)}$ is the value predicted by the individual program $i$ for fitness case $j$, and $T_j$ is the target value for fitness case $j$.
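A short Python sketch of Equations (3.13) and (3.14), applied to hypothetical prediction and target lists, is shown below.

import math

def mean_absolute_error(predictions, targets):
    """Average of the absolute errors |y_i - y_i'| (Equation 3.13)."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def root_mean_square_error(predictions, targets):
    """Square root of the mean squared relative error (Equation 3.14)."""
    return math.sqrt(sum(((p - t) / t) ** 2
                         for p, t in zip(predictions, targets)) / len(targets))

print(mean_absolute_error([0.9, 0.2, 0.7], [1.0, 0.4, 1.0]))  # approximately 0.2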
The mean absolute error, root mean square error and kappa statistics of the classifiers are shown in Tables 3.4 and 3.5.
Table 3.4 Classifier Statistical Results based on BFSCFS

Classifier            Mean Absolute Error   Root Mean Square Error   Kappa Statistics
BFS based CFS-MLP     0.1743                0.369                    0.5163
BFS based CFS-RBF     0.1858                0.3354                   0.5611
BFS based CFS-SMO     0.1677                0.4096                   0.4056
BFS based CFS-J48     0.254                 0.3822                   0.3458
BFS based CFS-NB      0.1514                0.3235                   0.6302
Table 3.5 Classifier Statistical Results based on GSCFS

Classifier            Mean Absolute Error   Root Mean Square Error   Kappa Statistics
GS based CFS-MLP      0.1743                0.369                    0.5163
GS based CFS-RBF      0.1858                0.3354                   0.5611
GS based CFS-SMO      0.1677                0.4096                   0.4056
GS based CFS-J48      0.2541                0.3822                   0.3458
GS based CFS-NB       0.1514                0.3235                   0.6302
3.4.4 Confusion Matrix
The confusion matrix is a useful tool for analyzing how well your classifier can
recognize tuples of different classes. A confusion matrix for two classes is shown in Table 3.6.
Table 3.6 Different Outcomes of Two-Class Prediction

                            Predicted Class
                            Yes               No
Actual Class    Yes         True Positive     False Negative
                No          False Positive    True Negative
Given m classes, a confusion matrix is a table of at least size m by m. An entry CMi,j in the first m rows and m columns indicates the number of tuples of class i that were labelled by the classifier as class j. For a classifier to have good accuracy, ideally most of the tuples would lie along the diagonal of the confusion matrix, from entry CM1,1 to entry CMm,m, with the rest of the entries being close to zero. The table may have additional rows or columns to provide totals or recognition rates per class. Given two classes, the tuples can be regarded as positive and negative tuples. True positives are the positive tuples that were correctly labelled by the classifier, while true negatives are the negative tuples that were correctly labelled. False positives are the negative tuples that were incorrectly labelled, and false negatives are the positive tuples that were incorrectly labelled. These terms are useful when analyzing a classifier's ability.
A confusion matrix is calculated for the Naive Bayes, BFS based CFS-NB and GS based CFS-NB classifiers to interpret the results. The confusion matrices are shown in Tables 3.7 to 3.9.
Table 3.7 Confusion Matrix: Before Feature Selection based on NB

 a     b     classified as
 22    10    a = DIE
 14    109   b = LIVE
Table 3.8 Confusion Matrix: After Feature Selection based on BFSCFS-NB

 a     b     classified as
 23    9     a = DIE
 9     114   b = LIVE
Table 3.9 Confusion Matrix: After Feature Selection based on GSCFS-NB

 a     b     classified as
 22    10    a = DIE
 9     114   b = LIVE
3.4.5 Graph Results
This section illustrates the graphical results for accuracy, time, sensitivity, specificity and precision of BFS based Naive Bayes, BFS based J48, BFS based Multilayer Perceptron, BFS based Sequential Minimal Optimization, BFS based Radial Basis Function, GS based Naive Bayes, GS based J48, GS based Multilayer Perceptron, GS based Sequential Minimal Optimization and GS based Radial Basis Function.
Fig. 3.4 shows the performance analysis related to accuracy of the various classification algorithms for BFS based CFS.
[Bar chart: accuracy (%) of BFS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.4 Performance related to Accuracy for BFS based CFS-NB
Fig. 3.5 shows performance analysis related to time over various classification
algorithms for BFS based CFS.
[Bar chart: time of BFS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.5 Performance related to Time for BFS based CFS-NB
Fig. 3.6 shows the performance analysis related to sensitivity of the various classification algorithms for BFS based CFS.
[Bar chart: sensitivity (%) of BFS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.6 Performance related to Sensitivity for BFS based CFS-NB
Fig. 3.7 shows performance analysis related to Specificity of various classification
algorithms for BFS based CFS.
[Bar chart: specificity (%) of BFS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.7 Performance related to Specificity for BFS based CFS-NB
Fig. 3.8 shows performance analysis related to Precision of various classification
algorithms for BFS based CFS.
[Bar chart: precision (%) of BFS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.8 Performance related to Precision for BFS based CFS-NB
Fig. 3.9 shows performance analysis related to accuracy of various classification
algorithms for GS based CFS.
[Bar chart: accuracy (%) of GS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.9 Performance related to Accuracy for GS based CFS-NB
Fig. 3.10 shows performance analysis related to time of various classification
algorithms for GS based CFS.
[Bar chart: time of GS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.10 Performance related to Time for GS based CFS-NB
Fig. 3.11 shows performance analysis related to Sensitivity of various
classification algorithms for GS based CFS.
[Bar chart: sensitivity (%) of GS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.11 Performance related to Sensitivity for GS based CFS-NB
Fig. 3.12 shows performance analysis related to Specificity of various
classification algorithms for GS based CFS.
[Bar chart: specificity (%) of GS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.12 Performance related to Specificity for GS based CFS-NB
Fig. 3.13 shows performance analysis related to Precision of various classification
algorithms for GS based CFS.
[Bar chart: precision (%) of GS based CFS-MLP, CFS-RBF, CFS-SMO, CFS-J48 and CFS-NB]
Fig. 3.13 Performance related to Precision for GS based CFS-NB
3.5 Summary
In this proposed work, an enhanced medical diagnostic method for the hepatitis diagnosis problem is developed. Experimental results on various portions of the hepatitis dataset show that the new approach performs better in distinguishing patients who survive from those who do not. It is observed that BFS based CFS-NB achieved the best classification accuracy for a reduced feature subset containing ten features. A comparative study is also conducted with methods such as BFS based CFS-MLP, BFS based CFS-SMO, BFS based CFS-J48, BFS based CFS-RBF, GS based CFS-MLP, GS based CFS-SMO, GS based CFS-J48 and GS based CFS-RBF. The experimental results show that BFS based CFS-NB performs advantageously over the other methods in terms of classification accuracy, sensitivity, specificity and time. Other measures such as the kappa statistic and the classification error measures are also reported; overall, BFS based CFS-NB performs better than the other algorithms.
In this work, best first search and greedy search techniques were used. As further work, Particle Swarm Optimization and Genetic Search based algorithms, GNSCFS-NB and PSOCFS-NB, are developed to reduce the dimensionality of the data and the computational cost. These algorithms are described in the subsequent chapter.