A Short Introduction to Weka
Natural Language Processing
Thursday, November 5th

What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: 'weka java'

weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

% 1. Title: Iris Plants Database
%
@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
…

.arff file format
@attribute attrName {numeric, string, <nominal>, date}
numeric: a number
nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
string: <arbitrary strings>
date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss

Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff

To Classify with weka GUI
1. Run weka GUI
   (in Unix: java -jar weka.jar)
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with:
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find references by selecting a classifier
● Use Cross-Validation!
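N-fold cross-validation splits the data into N folds; each fold serves once as the test set while a model is trained on the remaining N−1 folds, so every instance is tested exactly once. The sketch below shows only that index bookkeeping, in plain Java with no Weka dependency; the class and method names (`CrossValidationSketch`, `testFolds`, `trainIndices`) are hypothetical. Weka's own cross-validation also shuffles the data and, for a nominal class, stratifies the folds so each keeps roughly the original class proportions; this sketch shuffles but does not stratify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Sketch of N-fold cross-validation index splitting (shuffled, NOT stratified).
 * Each instance lands in exactly one test fold; the other N-1 folds form the training set.
 */
public class CrossValidationSketch {

    /** Returns, for each fold, the instance indices used as that fold's test set. */
    static List<List<Integer>> testFolds(int numInstances, int numFolds, long seed) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException("need 2 <= numFolds <= numInstances");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed)); // fixed seed => reproducible folds

        List<List<Integer>> folds = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < numFolds; f++) {
            // The first (numInstances % numFolds) folds get one extra instance.
            int size = numInstances / numFolds + (f < numInstances % numFolds ? 1 : 0);
            folds.add(new ArrayList<>(order.subList(start, start + size)));
            start += size;
        }
        return folds;
    }

    /** Training indices for one fold: every index that is not in that fold's test set. */
    static List<Integer> trainIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = testFolds(10, 3, 1L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + ": test=" + folds.get(f)
                    + " train=" + trainIndices(folds, f));
        }
    }
}
```

With 10 instances and 3 folds, the test folds hold 4, 3 and 3 instances; the reported accuracy is then computed over all predictions pooled across the folds.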
Analyzing Results
● Important tools for Homework 3
  – Accuracy
    ● "Correctly classified instances"
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization

Running weka from the Command Line
● http://weka.wikispaces.com/Primer
● Running an N-fold cross-validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?

Homework 3 Weka Workflow
[Diagram]
● Preprocessing (you): T1 … TN → Your Feature Extractor → .arff
● Experimentation (you): .arff → Weka → best model
● Grading (us): S1 … SN → Your Feature Extractor → Test .arff → Weka (best model) → results

Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.
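Accuracy ("Correctly classified instances") and the per-class F-measure can both be recomputed from a confusion matrix, which helps when comparing runs. A minimal sketch in plain Java, assuming the matrix is laid out the way Weka prints it (rows = actual class, columns = predicted class); the class name `ConfusionMetrics` and the 2×2 counts are made up for illustration, not real Weka output. F-measure here is F1, the harmonic mean of precision and recall.

```java
/**
 * Sketch: recomputing accuracy and per-class F-measure (F1) from a confusion matrix.
 * Assumed layout: m[actual][predicted], i.e. rows = actual class, columns = predicted class.
 */
public class ConfusionMetrics {

    /** Fraction of instances on the diagonal, i.e. correctly classified. */
    static double accuracy(int[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** F1 for class c: 2 * precision * recall / (precision + recall). */
    static double fMeasure(int[][] m, int c) {
        int tp = m[c][c];
        int predicted = 0, actual = 0;
        for (int k = 0; k < m.length; k++) {
            predicted += m[k][c]; // column sum: everything predicted as class c
            actual += m[c][k];    // row sum: everything whose true class is c
        }
        double precision = predicted == 0 ? 0.0 : (double) tp / predicted;
        double recall = actual == 0 ? 0.0 : (double) tp / actual;
        return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical 2-class matrix (made-up counts).
        int[][] m = {
            {40, 10}, // actual class 0: 40 predicted as 0, 10 predicted as 1
            {5, 45}   // actual class 1: 5 predicted as 0, 45 predicted as 1
        };
        System.out.printf("accuracy = %.3f%n", accuracy(m));
        for (int c = 0; c < m.length; c++) {
            System.out.printf("F-measure(class %d) = %.3f%n", c, fMeasure(m, c));
        }
    }
}
```

For these counts, accuracy is 85/100 = 0.850, and F1 is 80/95 ≈ 0.842 for class 0 and 90/105 ≈ 0.857 for class 1.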
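In the Homework 3 workflow, the feature extractor turns each text into one @DATA row of an .arff file. A minimal sketch of that output step, in plain Java with no Weka dependency; the class name `ArffFeatureWriter`, the relation `example`, the attributes (`numTokens`, `text`, `created`, `class`) and the `pos`/`neg` labels are hypothetical placeholders, not the homework's actual features. It exercises the four attribute types (numeric, string, date, nominal). The `quote` helper is a simplified reading of the ARFF quoting rules (single quotes, backslash escapes) and does not handle embedded newlines.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a feature extractor's output step: writing rows as ARFF text.
 * Relation, attribute names, and labels are hypothetical placeholders.
 */
public class ArffFeatureWriter {

    // ARFF's default DATE format (ISO-8601), in java.time pattern syntax.
    static final DateTimeFormatter ARFF_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    /** One instance: hypothetical features extracted from a single text. */
    static final class Row {
        final int numTokens;
        final String text;
        final LocalDateTime created;
        final String label;

        Row(int numTokens, String text, LocalDateTime created, String label) {
            this.numTokens = numTokens;
            this.text = text;
            this.created = created;
            this.label = label;
        }
    }

    /**
     * Single-quotes a string or nominal value when it is empty or contains whitespace
     * or ARFF-significant characters. Simplified: embedded newlines are not handled.
     */
    static String quote(String value) {
        if (!value.isEmpty() && !value.matches(".*[\\s,'\"%{}?].*")) {
            return value;
        }
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds the ARFF text: header (@RELATION, @ATTRIBUTE lines), then @DATA rows. */
    static String buildArff(List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("% Hypothetical example produced by ArffFeatureWriter\n");
        sb.append("@RELATION example\n\n");
        sb.append("@ATTRIBUTE numTokens NUMERIC\n");
        sb.append("@ATTRIBUTE text STRING\n");
        sb.append("@ATTRIBUTE created DATE\n"); // no format given => default ISO-8601 format
        sb.append("@ATTRIBUTE class {pos,neg}\n\n");
        sb.append("@DATA\n");
        for (Row r : rows) {
            sb.append(r.numTokens).append(',')
              .append(quote(r.text)).append(',')
              .append(ARFF_DATE.format(r.created)).append(',')
              .append(quote(r.label)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(2, "good movie", LocalDateTime.of(2009, 11, 5, 10, 0, 0), "pos"),
                new Row(1, "bad", LocalDateTime.of(2009, 11, 5, 11, 30, 0), "neg"));
        // Redirect stdout to a file (e.g. "> train.arff") to save the output.
        System.out.print(buildArff(rows));
    }
}
```

The two rows come out as `2,'good movie',2009-11-05T10:00:00,pos` and `1,bad,2009-11-05T11:30:00,neg`. Building the header and rows in one place keeps the attribute order and the row layout in sync as features are added incrementally.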