Data Mining Project, Phase 2

Index:
1.1) Introduction
1.2) Associations
1.3) Classifications
1.4) Conclusion

1.1) Introduction:

In this phase we apply association and classification methods to the two datasets described in the previous phase, White Wine and Breast Tissue. We apply association mining to find the basic rules in each dataset, and we apply classification (Rule Induction and K-Nearest Neighbours on White Wine; Decision Tree and Naïve Bayes on Breast Tissue) to find the model with the best prediction accuracy.

1.2) Associations:

1.2.1) White Wine dataset:

First we select the attributes [alcohol, free sulfur dioxide, pH, quality, residual sugar, volatile acidity], which the preprocessing phase showed to be relevant. We then convert the real values to integers to prepare the data for conversion to binominal. Next we use FP-Growth to generate the frequent itemsets at min support = 0.8. Finally we create the association operator to generate the rules.

Figure 1.2.1.1: building the association rules for the White Wine dataset
Figure 1.2.1.2: table view of the results for the White Wine dataset
Figure 1.2.1.3: text view of the association rules for the White Wine dataset

Explanation of results:

Figure 1.2.1.3 shows that the confidence is greater than or equal to 0.9, which suggests the rules are strong. However, Figure 1.2.1.2 shows that the lift is equal to 1, which means the items in each rule are independent rather than correlated.

1.2.2) Breast Tissue dataset:

First we select the attributes [A/DA, DR, I0, Max IP, P, Case #, Class], which the preprocessing phase showed to be relevant. We then convert the real values to integers to prepare the data for conversion to binominal. Next we use FP-Growth to generate the frequent itemsets at min support = 0.95. Finally we create the association operator to generate the rules.

Figure 1.2.2.1: building the association rules for the Breast Tissue dataset
Figure 1.2.2.2: table view of the results for the Breast Tissue dataset
Figure 1.2.2.3: text view of the results for the Breast Tissue dataset (association rules)

Explanation of results:

Figure 1.2.2.3 shows that the confidence is equal to 100%, which suggests the rules are strong. However, Figure 1.2.2.2 shows that the lift is equal to 1, which again means the items are independent rather than correlated.

1.3) Classifications:

1.3.1) White Wine dataset:

Before applying any classification model to this dataset we must discretize the target on both the training and testing sides, because the dataset has no nominal target class (quality is numeric).

1.3.1.1) Rule Induction:

Figure 1.3.1.1.1: the first step in building the classification for the White Wine dataset
Figure 1.3.1.1.2: the second step, building the training and testing process for the White Wine dataset using Rule Induction
Figure 1.3.1.1.3: the classification results for the White Wine dataset

1.3.1.2) K-Nearest Neighbours:

Figure 1.3.1.2.1: the first step in building the classification for the White Wine dataset
Figure 1.3.1.2.2: the second step, building the training and testing process for the White Wine dataset using K-NN
Figure 1.3.1.2.3: the results for the White Wine dataset

Explanation of results:

Comparing the accuracies (Rule Induction = 73.27%, Decision Tree = 60.22%, K-NN = 72.45%, Naïve Bayes = 65.31%), we find that Rule Induction is the best model. It predicts "true good" and "true very good" with a large ratio of correct predictions, predicts "true excellent" with fewer correct predictions, and makes no predictions for "true bad". K-NN predicts "true very good" with a large ratio of correct predictions and "true good" with a smaller ratio, but has no correct predictions for "true bad" or "true excellent". A minimal sketch of this pipeline follows.
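The following is a minimal sketch of the K-NN pipeline described above, written with pandas and scikit-learn rather than RapidMiner. The file name, bin edges, class labels, split ratio, and value of k are illustrative assumptions, not values taken from the report.

    # Minimal sketch of the white wine K-NN classification pipeline.
    # Assumptions: the UCI winequality-white.csv file (semicolon-separated),
    # the quality bins/labels, a 70/30 split, and k = 5.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    wine = pd.read_csv("winequality-white.csv", sep=";")  # assumed file name

    # Discretize the numeric target into nominal classes, mirroring the
    # Discretize step the report applies on the training and testing sides.
    wine["quality_class"] = pd.cut(
        wine["quality"], bins=[0, 4, 6, 8, 10],
        labels=["bad", "good", "very good", "excellent"])

    # The attributes kept in the preprocessing phase.
    X = wine[["alcohol", "free sulfur dioxide", "pH",
              "residual sugar", "volatile acidity"]]
    y = wine["quality_class"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    knn = KNeighborsClassifier(n_neighbors=5)  # k is a tunable assumption
    knn.fit(X_train, y_train)
    print("K-NN accuracy:", accuracy_score(y_test, knn.predict(X_test)))

Because the class labels come from binning a numeric score, the exact accuracy depends on the bin edges chosen; the figures above come from the RapidMiner runs, not from this sketch.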
1.3.2) Breast Tissue dataset:

1.3.2.1) Decision Tree:

Figure 1.3.2.1.1: the first step in building the classification for the Breast Tissue dataset
Figure 1.3.2.1.2: the second step, building the training and testing process for the Breast Tissue dataset using Decision Tree
Figure 1.3.2.1.3: the results for the Breast Tissue dataset

1.3.2.2) Naïve Bayes:

Figure 1.3.2.2.1: the first step in building the classification for the Breast Tissue dataset
Figure 1.3.2.2.2: the second step, building the training and testing process for the Breast Tissue dataset using Naïve Bayes
Figure 1.3.2.2.3: the results for the Breast Tissue dataset

Explanation of results:

Comparing the accuracies of the models on the Breast Tissue dataset, Decision Tree and Rule Induction are the best at 100%, followed by Naïve Bayes at 93.75% and K-NN at 71.88%. The Decision Tree predicts the target class correctly in every column of the confusion matrix, while Naïve Bayes makes prediction errors in "true con" and "true mas". A minimal code sketch of this comparison appears after the conclusion.

1.4) Conclusion:

1. If the confidence is equal to 100%, then every transaction that contains the left side of the rule also contains the right side: the two sides appear together exactly as many times as the left side appears alone. (A small worked example follows this list.)
2. If the confidence is less than 100%, the left and right sides appear together in fewer transactions than the left side appears alone.
3. Rule Induction is better than Decision Tree on large, numerical data, because rule induction can produce additional rules beyond those of a decision tree, which only gives the rules that can be read off its branches.
4. The White Wine dataset is large and noisy, and K-NN does not give good accuracy on it, because its accuracy depends on the value of k and it treats the attributes as independent.
5. Naïve Bayes also gives low accuracy on the White Wine dataset; its prediction of "true very good" is the strongest because "very good" is repeated in the target class more often than the other classes.
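To make conclusion points 1 and 2, and the lift = 1 observation from Section 1.2, concrete, here is a small worked example on invented toy transactions. It uses the mlxtend library's FP-Growth as a stand-in for RapidMiner's operator; the transactions and thresholds are assumptions for illustration only.

    # Toy association-rule example: B appears in every transaction, so the
    # rule A -> B has confidence 100% but lift 1 (A and B are independent),
    # the same situation shown in Figures 1.2.1.2 and 1.2.2.2.
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import fpgrowth, association_rules

    transactions = [["A", "B"], ["A", "B"], ["A", "B"], ["B"], ["B"]]
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

    itemsets = fpgrowth(onehot, min_support=0.5, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)

    # confidence(A -> B) = support(A and B) / support(A) = 0.6 / 0.6 = 1.0
    # lift(A -> B) = confidence(A -> B) / support(B) = 1.0 / 1.0 = 1.0
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])

A confidence of 100% only says that every transaction holding the left side also holds the right side (point 1); when the right side is frequent on its own, that can happen even though the two sides are unrelated, which is what a lift of 1 signals.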
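Finally, as a companion to Section 1.3.2, the sketch below reproduces the Decision Tree versus Naïve Bayes comparison with scikit-learn. The file name and column names are assumptions (the UCI Breast Tissue data ships as an Excel sheet, so a CSV export is presumed), and the exact accuracies will differ from the RapidMiner figures.

    # Minimal sketch of the Breast Tissue model comparison in Section 1.3.2.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    tissue = pd.read_csv("breast_tissue.csv")      # assumed CSV export
    X = tissue.drop(columns=["Class", "Case #"])   # assumed column names
    y = tissue["Class"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    for name, model in [("Decision Tree", DecisionTreeClassifier(random_state=42)),
                        ("Naive Bayes", GaussianNB())]:
        model.fit(X_train, y_train)
        print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))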