28th March 2010 LAB 5: Data Pre-Processing Techniques

Statement Purpose:
Today students will learn about data pre-processing techniques. We will apply several techniques to different Excel datasets using RapidMiner.

Activity Outcomes:
Students will learn and practice the following data pre-processing schemes using real-world examples and data:
1. Data Cleaning Techniques:
   a. Filling missing values using RapidMiner
   b. Outlier treatment for reducing noise using RapidMiner
2. Data Transformation Technique: Normalization using RapidMiner
3. Data Discretization Technique: Discretization using RapidMiner

Instructor Note: Follow the instructions.

CPIS-342 - The Lab Note Lab 5

1. Data Cleaning Techniques:

a. Fill the Missing Values by using RapidMiner
Open RapidMiner and import the file "sales_data missing".
Check the missing values in the Meta Data View; look at the amount attribute.
In the Operators panel, expand Data Transformation, then expand Data Cleansing, and open Replace Missing Values.
Run the process and check that the missing values in amount have been filled.

b. Detect Outliers by using RapidMiner
Import the file "sales_data Outlier".
Look at the amount attribute in this file: it contains some outliers (noise), meaning that some of its values lie very far from the others. We will now apply a data cleaning technique to reduce this noise.
In the Operators panel, expand Data Transformation, then expand Data Cleansing, then Outlier Detection, and open Detect Outlier (Distances).
In Detect Outlier, change the number of outliers parameter to 1.
Run the process and check the detected outlier.
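To see what the two cleaning operators compute, the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not RapidMiner's implementation: missing values are replaced by the attribute mean (one of the options offered by Replace Missing Values; the lab does not say which option to use), and outliers are scored by the distance to each example's k-th nearest neighbour, with the n highest scores flagged, which is the distance-based approach Detect Outlier (Distances) is documented to follow. The function names and the sample amounts are hypothetical and are not taken from the sales_data files.

```python
import math
from typing import List, Optional


def replace_missing_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the present values."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("all values are missing")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def detect_outliers_by_distance(
    points: List[List[float]], k: int, n_outliers: int
) -> List[bool]:
    """Flag the n_outliers points whose distance to their k-th nearest
    neighbour is largest (a distance-based outlier score)."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Euclidean distances from p to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbour
    # The points with the largest scores are marked as outliers.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]


if __name__ == "__main__":
    # Hypothetical "amount" values with two gaps.
    amounts = [120.0, None, 135.0, 128.0, None, 131.0]
    print(replace_missing_with_mean(amounts))
    # [120.0, 128.5, 135.0, 128.0, 128.5, 131.0]

    # Hypothetical "amount" values with one value far from the rest;
    # number of outliers = 1, as in the lab step.
    noisy = [[120.0], [135.0], [128.0], [131.0], [950.0]]
    print(detect_outliers_by_distance(noisy, k=1, n_outliers=1))
    # [False, False, False, False, True]
```

With only one outlier requested, the example whose nearest neighbour is farthest away (950.0) is the one flagged, which mirrors setting the number of outliers to 1 in the operator.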
2. Data Transformation Technique: Normalization by using RapidMiner
Import the file "sales_data Normalize".
Look at all the attributes in this file. We will normalize them using the range transformation (min-max) method with a range of 0.0 to 1.0.
In the Operators panel, expand Data Transformation, then expand Value Modification, then Numerical Value Modification, and open Normalize.
Run the process and check the result: after normalization, all values are rescaled into the given range.

3. Data Discretization Technique: Discretization by using RapidMiner
Import the file "sales_data Discretization".
Look at all the attributes in this file. We will discretize them using Discretize by Binning, which takes the number of bins as a parameter.
In the Operators panel, expand Data Transformation, then expand Type Conversion, then Discretization, and open Discretize by Binning.
Run the process and check the result after discretization. To see the difference, plot a histogram of the attribute "product_id".
Figure: Histogram of "product_id" before discretization, with the original number of bins.
Figure: Histogram of "product_id" after discretization, with 5 bins.
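The normalization and discretization steps can be sketched the same way. This minimal Python sketch assumes that the range transformation maps each attribute's minimum and maximum linearly onto the target range (0.0 to 1.0 here) and that Discretize by Binning creates bins of equal width between the attribute's minimum and maximum. RapidMiner labels bins with nominal range names, whereas this sketch returns bin indices 0 to n_bins - 1. The function names, the handling of constant attributes, and the sample values are hypothetical choices made for illustration.

```python
from typing import List


def min_max_normalize(
    values: List[float], new_min: float = 0.0, new_max: float = 1.0
) -> List[float]:
    """Range (min-max) transformation: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: map every value to new_min (a choice made for this sketch).
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign each value the index (0 .. n_bins - 1) of its equal-width bin."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    # min(..., n_bins - 1) keeps the maximum value inside the last bin.
    return [min(int((v - lo) * n_bins / (hi - lo)), n_bins - 1) for v in values]


if __name__ == "__main__":
    # Hypothetical numeric attribute rescaled into 0.0 .. 1.0.
    print(min_max_normalize([10.0, 20.0, 30.0, 50.0]))
    # [0.0, 0.25, 0.5, 1.0]

    # Hypothetical product ids 1..10 split into 5 equal-width bins.
    product_ids = [float(v) for v in range(1, 11)]
    print(equal_width_bins(product_ids, 5))
    # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Counting how many values fall into each bin index gives the data behind a 5-bin histogram like the one described for "product_id".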