CPCS202 - The Lab Note

28th March 2010
LAB 5: Data Pre-Processing Techniques
Statement Purpose:
In this lab, students learn data pre-processing techniques and apply
several of them to different Excel datasets using RapidMiner.
Activity Outcomes:
Students will learn and practice the following data pre-processing
techniques using real-world examples and data:
1. Data Cleaning Techniques:
a. Filling missing values using RapidMiner
b. Outlier treatment for reducing noise using RapidMiner
2. Data Transformation Technique:
• Normalization using RapidMiner
3. Data Discretization Technique:
• Discretization using RapidMiner
Instructor Note:
Follow the instructions.
CPIS-342 - The Lab Note
Lab 5
1. Data Cleaning Techniques:
a. Fill the Missing Values by using RapidMiner
• Open RapidMiner and import the file “sales_data missing”.
• Check for missing values in the Meta Data View; look at the
amount attribute.
• In the Operators panel, expand Data Transformation, then expand
Data Cleansing, and open Replace Missing Values.
• Run the process, then check that the missing values have been replaced.
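The effect of this step can be sketched outside RapidMiner. The snippet below is a minimal illustration, assuming the missing cells are replaced by the average of the observed values; the lab does not state which replacement function is selected, and the sample `amount` values and the helper name `fill_missing_with_mean` are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical 'amount' column; None marks a missing cell.
amount = [120.0, None, 80.0, 100.0, None]
filled = fill_missing_with_mean(amount)
print(filled)
```

After the replacement, the column has no missing cells left, which is what the Meta Data View should also report for the amount attribute.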
b. Detect Outliers by using RapidMiner
• Import the file “sales_data Outlier”.
• Look at the amount attribute in this file. It contains some
outliers (noise), that is, values that lie very far from the others.
We will now apply a data cleaning technique to reduce this noise in
the data.
• In the Operators panel, expand Data Transformation, then expand Data
Cleansing, then Outlier Detection, and open Detect Outlier (Distances).
• In the Detect Outlier parameters, set number of outliers to 1.
• Run the process, then check which example is flagged as an outlier.
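To make the idea of distance-based outlier detection concrete, here is a rough sketch in Python. It assumes each value is scored by the distance to its k-th nearest neighbour and that the highest-scoring values are flagged; this is an illustration of the general technique, not a reproduction of the RapidMiner operator. The function name `detect_outliers`, the choice `k=2`, and the sample data are hypothetical.

```python
def detect_outliers(values, n_outliers=1, k=2):
    """Flag the n_outliers values whose k-th nearest neighbour is farthest away."""
    if k >= len(values):
        raise ValueError("k must be smaller than the number of values")
    scores = []
    for i, v in enumerate(values):
        # Distances from this value to every other value, smallest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    # Indices ordered from the most isolated value to the least isolated.
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(values))]


# Hypothetical 'amount' column with one value far from the rest.
amount = [100.0, 105.0, 98.0, 102.0, 950.0, 101.0]
flags = detect_outliers(amount, n_outliers=1)
print(flags)
```

With the number of outliers set to 1, only the single most isolated value is marked, which mirrors the parameter setting used in the step above.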
2. Data Transformation Technique:
• Normalization by using RapidMiner
• Import the file “sales_data Normalize”.
• Look at all attributes in this file. We will normalize it using
the “range transformation” (min-max) method, with a range of 0.0 to 1.0.
• In the Operators panel, expand Data Transformation, then expand
Value Modification > Numerical Value Modification, and open
Normalize.
• Run the process and check the result: after normalization, all
values are rescaled into the given range.
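Min-max (range) normalization maps each value v to new_min + (v - min) / (max - min) * (new_max - new_min). A minimal sketch follows, using the 0.0 to 1.0 range from this lab; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical 'amount' column.
amount = [50.0, 100.0, 150.0, 250.0]
normalized = min_max_normalize(amount)
print(normalized)
```

The smallest original value becomes 0.0 and the largest becomes 1.0, while the relative spacing between values is preserved.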
3. Data Discretization Technique:
• Discretization by using RapidMiner
• Import the file “sales_data Discretization”.
• Look at all attributes in this file. We will discretize it using
“Discretize by Binning”, with the number of bins given as a parameter.
• In the Operators panel, expand Data Transformation, then expand
Type Conversion > Discretization > Discretize by Binning.
• Run the process and check the result: after discretization, see the
difference by plotting a histogram of the attribute “product_id”.
Figure: Histogram of “product_id” before discretization, with the original number of bins.
Figure: Histogram of “product_id” after discretization, with 5 bins.
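Binning can also be sketched in a few lines of Python. The example below assumes equal-width bins: the range between the minimum and maximum is split into intervals of the same width, and each value is replaced by the index of its interval. The function name `discretize_by_binning` and the sample `product_id` values are hypothetical, and RapidMiner labels its ranges differently from the plain indices used here.

```python
def discretize_by_binning(values, n_bins=5):
    """Assign each value to one of n_bins equal-width bins, numbered from 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column falls entirely into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value would land on index n_bins; keep it in the last bin.
        bins.append(min(idx, n_bins - 1))
    return bins


# Hypothetical 'product_id' column.
product_id = [1, 3, 5, 8, 10, 12, 15, 18, 20, 21]
bins = discretize_by_binning(product_id, n_bins=5)
print(bins)
```

A histogram of the binned column has at most 5 bars, one per bin, which is the difference the two histograms in this step are meant to show.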