Sharif University of Technology
Computer Engineering
Machine Learning (40717)
Fall 2008
Assignment 2
Machine Learning Tools (1): “Weka”
From: 1387/8/21
Due: 1387/9/4
What is Weka:
(Waikato Environment for Knowledge Analysis)
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine
learning software written in Java and developed at the University of Waikato. Weka is free
software available under the GNU General Public License.
Weka is a collection of machine learning algorithms used especially in data mining
tasks. The algorithms can either be applied directly to a dataset or called from your own
Java code. Weka contains tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. It is also well-suited for developing new
machine learning schemes. In this assignment you will work with Weka and use it to
analyze different machine learning algorithms.
Downloading and installing Weka:
There are different options for downloading and installing Weka on your system. See:
http://www.cs.waikato.ac.nz/~ml/weka/index.html
Learning Weka:
You must use the Weka Knowledge Explorer. For more information about it, see:
http://www.cs.waikato.ac.nz/~ml/weka/gui_explorer.html
Downloading Data Sets:
You must download two datasets, iris.arff and spambase.arff, from:
http://www.hakank.org/weka/iris.arff
http://www.hakank.org/weka/spambase.arff
What you need to do:
1. For each dataset D, perform the following experiments using 1X-fold cross-validation:
(X is the last digit of your student ID. For example, if X = 2, use 12-fold cross-validation.)
a. Create a classifier based on D using J48 (C4.5):
• Run error-based pruning (the default), while testing the effect of the
confidence factor parameter (if X is odd, use {0.25, 0.5, and 0.75};
otherwise use {0.3, 0.5, and 0.7}).
• Run "Reduced error pruning" (which uses a validation set), testing the effect of
the validation set portion (if X is odd, use {3, 5, and 7}; otherwise use {4, 6, and 8}).
b. Create a classifier based on D using KNN (IBk), while examining the
effect of:
• Different values of K (if X is odd, use {3, 5, and 7}; otherwise use {4, 9, and 14}).
• Weighting by distance (1/distance and 1-distance).
c. Create a classifier based on D using NaiveBayes.
d. Create a classifier based on D using ID3.
(First, you must discretize the non-nominal attributes using the Discretize filter.)
2. Summarize your experimental results in tables and graphs.
3. Compare the performance of ID3, KNN, J48, and NaiveBayes based on the accuracy
they yielded in the above experiments. Draw conclusions about their relative
performance with respect to datasets of different nature and to different
parameter settings.
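The runs required per dataset in item 1 can be enumerated with a short sketch like the
one below. It is only an illustration: the class ExperimentGrid and its methods are
hypothetical helpers, not part of Weka. The option letters (-C, -R, -N for J48; -K, -I, -F
for IBk) are assumed to match the command-line options of the Weka classifiers, the
"validation set portion" is read here as J48's number of pruning folds (-N), and every
value of K is assumed to be tried with both weighting schemes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Weka): it only builds option strings.
class ExperimentGrid {

    // Parameter sets from the assignment, selected by the parity of X.
    static double[] confidenceFactors(int x) {
        return (x % 2 == 1) ? new double[] {0.25, 0.5, 0.75}
                            : new double[] {0.3, 0.5, 0.7};
    }

    static int[] pruningFolds(int x) {
        return (x % 2 == 1) ? new int[] {3, 5, 7} : new int[] {4, 6, 8};
    }

    static int[] neighbourCounts(int x) {
        return (x % 2 == 1) ? new int[] {3, 5, 7} : new int[] {4, 9, 14};
    }

    // One entry per run on a single dataset.
    static List<String> runs(int x) {
        List<String> runs = new ArrayList<>();
        for (double c : confidenceFactors(x)) {
            runs.add("J48 -C " + c);          // error-based pruning
        }
        for (int n : pruningFolds(x)) {
            runs.add("J48 -R -N " + n);       // reduced error pruning
        }
        for (int k : neighbourCounts(x)) {
            runs.add("IBk -K " + k + " -I");  // weight by 1/distance
            runs.add("IBk -K " + k + " -F");  // weight by 1-distance
        }
        runs.add("NaiveBayes");
        runs.add("Id3");                      // after discretizing the attributes
        return runs;
    }

    public static void main(String[] args) {
        // X = 2 is the example digit used in the assignment text.
        for (String run : runs(2)) {
            System.out.println(run);
        }
    }
}
```

Under these assumptions each dataset needs 14 runs, which gives a natural row layout for
the result tables asked for in item 2.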
Note: Don’t forget to perform all experiments under 1X-fold cross-validation!
(X is the last digit of your student ID. For example, if X = 2, use 12-fold cross-validation.)
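The 1X rule amounts to 10 plus the last digit of the student ID. A minimal sketch,
assuming the ID is a plain string of digits; FoldCount and foldsFor are hypothetical
names used only for illustration, and the sample ID is the one from the file-name
example in the delivery format.

```java
// Hypothetical helper: derives the number of cross-validation folds from a student ID.
class FoldCount {

    // "1X" means the digit 1 followed by the last digit X of the ID, i.e. 10 + X.
    static int foldsFor(String studentId) {
        char last = studentId.charAt(studentId.length() - 1);
        if (last < '0' || last > '9') {
            throw new IllegalArgumentException("ID must end in a digit: " + studentId);
        }
        return 10 + (last - '0');
    }

    public static void main(String[] args) {
        System.out.println(foldsFor("87111111")); // prints 11
        System.out.println(foldsFor("87111112")); // prints 12
    }
}
```

The resulting number is the value to enter as the number of folds for cross-validation
in the Explorer's test options.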
Feel free to contact Mr. Ghasempour with your questions at [email protected]
Delivery format:
1. You should briefly explain the work you did with Weka for this assignment, and in
particular present your results completely as graphs, tables, etc., and analyze them.
2. You should upload your results, your document, and any other needed files as a single
zipped file, only to Sharif Courseware (http://cw.sharif.edu). Your file name should be
HW_X_ID_fullName.rar, where X is the homework number and ID is your student
number, for example: “HW_2_87111111_RasoulMohammadiNasiri.rar”.
3. Check your file before uploading; it must not be corrupted.
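The file-name rule can be checked before uploading with a sketch like the following.
SubmissionName is a hypothetical helper, and the regular expression is only one reading
of the HW_X_ID_fullName.rar format: it assumes the homework number and student ID are
digits and the full name contains only Latin letters. Note that X here is the homework
number, not the last digit of the student ID.

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds and checks the submission file name.
class SubmissionName {

    // Assumed reading of the format: HW_<homework number>_<student ID>_<fullName>.rar
    static final Pattern FORMAT = Pattern.compile("HW_\\d+_\\d+_[A-Za-z]+\\.rar");

    static String build(int homework, String studentId, String fullName) {
        return "HW_" + homework + "_" + studentId + "_" + fullName + ".rar";
    }

    static boolean isValid(String fileName) {
        return FORMAT.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Values taken from the example in the delivery format.
        String name = build(2, "87111111", "RasoulMohammadiNasiri");
        System.out.println(name + " valid=" + isValid(name));
    }
}
```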