Using IF-THEN Rules for Classification

Rule-based Classification
Compiled By:
Umair Yaqub
Lecturer
Govt. Murray College Sialkot
Readings:
Chapter 6 – Han and Kamber
Using IF-THEN Rules for Classification
 A rule-based classifier uses a set of IF-THEN rules for classification.
 An IF-THEN rule is an expression of the form:

IF condition THEN conclusion.

For example:

R1: IF age = youth AND student = yes THEN buys_computer = yes.

 The “IF” part (or left-hand side) of a rule is known as the rule antecedent or precondition.
 The “THEN” part (or right-hand side) is the rule consequent.
 R1 can also be written as

R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes).
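As an illustration (not part of the slides), a rule such as R1 can be represented as an antecedent of attribute tests plus a consequent. A minimal Python sketch, where the tuple X and its attribute values are chosen for illustration:

```python
# Rule R1 as data: the antecedent is a list of attribute tests,
# the consequent an (attribute, value) pair.
R1 = {
    "antecedent": [("age", "youth"), ("student", "yes")],
    "consequent": ("buys_computer", "yes"),
}

def covers(rule, t):
    """A rule covers a tuple when every attribute test in its
    antecedent holds true for that tuple."""
    return all(t.get(attr) == val for attr, val in rule["antecedent"])

# An illustrative tuple X; R1's antecedent holds true for it.
X = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(covers(R1, X))  # -> True: the rule antecedent is satisfied, so R1 covers X
```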
 If the condition (that is, all of the attribute tests) in a rule antecedent
holds true for a given tuple, we say that the rule antecedent is satisfied
(or simply, that the rule is satisfied) and that the rule covers the tuple.
 A rule R can be assessed by its coverage and accuracy.
 A rule’s coverage is the percentage of tuples that are covered by the rule (i.e., whose attribute values hold true for the rule’s antecedent):

coverage(R) = ncovers / |D|

where ncovers is the number of tuples covered by R and |D| is the total number of tuples in D.
 For a rule’s accuracy, we look at the tuples that it covers and see what percentage of them the rule can correctly classify:

accuracy(R) = ncorrect / ncovers

where ncovers is the number of tuples covered by R and ncorrect is the number of those tuples correctly classified by R.
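These two measures can be computed directly from their definitions. A small sketch (my own, not from the slides), reusing the antecedent/consequent representation and a toy data set built so the rule covers 2 of 14 tuples:

```python
def covers(antecedent, t):
    # The antecedent holds true when every attribute test matches the tuple.
    return all(t.get(a) == v for a, v in antecedent)

def coverage_and_accuracy(antecedent, consequent, D):
    covered = [t for t in D if covers(antecedent, t)]        # ncovers tuples
    n_covers = len(covered)
    attr, val = consequent
    n_correct = sum(1 for t in covered if t.get(attr) == val)  # ncorrect
    coverage = n_covers / len(D)
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# Toy data mirroring the slide's numbers: 2 covered-and-correct tuples
# out of 14, so coverage = 2/14 and accuracy = 2/2.
D = ([{"age": "youth", "student": "yes", "buys_computer": "yes"}] * 2
     + [{"age": "senior", "student": "no", "buys_computer": "no"}] * 12)
cov, acc = coverage_and_accuracy(
    [("age", "youth"), ("student", "yes")], ("buys_computer", "yes"), D)
print(cov, acc)  # coverage 2/14, accuracy 2/2
```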
[Table: class-labeled customer training tuples (14 tuples) used in the buys_computer examples — see Han and Kamber.]
 Our task is to predict whether a customer will buy a computer.
 Consider rule R1 above, which covers 2 of the 14 tuples.
 It can correctly classify both tuples. Therefore, coverage(R1) = 2/14 = 14.28% and accuracy(R1) = 2/2 = 100%.
 Let’s see how we can use rule-based classification to predict the class label of a given tuple, X. If a rule is satisfied by X, the rule is said to be triggered.
 Note that triggering does not always mean firing, because there may be more than one rule that is satisfied! If more than one rule is triggered, we have a potential problem.
 We look at two strategies, namely size ordering and rule ordering.
 The size ordering scheme assigns the highest priority to the triggering
rule that has the “toughest” requirements, where toughness is measured
by the rule antecedent size. That is, the triggering rule with the most
attribute tests is fired.
 The rule ordering scheme prioritizes the rules beforehand. With class-based ordering, the classes are sorted in order of decreasing “importance,” such as by decreasing order of prevalence. That is, all of the rules for the most prevalent (or most frequent) class come first, the rules for the next prevalent class come next, and so on.
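Size ordering can be sketched in a few lines. In this hypothetical example (my own, not from the slides), two rules trigger for the same tuple and the one with more attribute tests fires:

```python
# Two conflicting rules: both trigger for a youth student,
# but they predict different classes.
rules = [
    {"antecedent": [("age", "youth")], "consequent": "no"},
    {"antecedent": [("age", "youth"), ("student", "yes")], "consequent": "yes"},
]

def classify_size_ordering(rules, t, default="no"):
    # A rule is triggered when its antecedent holds true for the tuple.
    triggered = [r for r in rules
                 if all(t.get(a) == v for a, v in r["antecedent"])]
    if not triggered:
        return default  # no rule fires: fall back to a default class
    # Size ordering: the triggered rule with the most attribute tests fires.
    return max(triggered, key=lambda r: len(r["antecedent"]))["consequent"]

X = {"age": "youth", "student": "yes"}
print(classify_size_ordering(rules, X))  # -> yes: the two-test rule wins
```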
Rule Extraction from a Decision Tree
 Rules may be easier to understand than trees
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
 Rules are mutually exclusive and exhaustive

[Decision tree: age? splits into <=30 (then student?: no / yes), 31..40 (yes), and >40 (then credit rating?: excellent / fair).]

 Example: rule extraction from our buys_computer decision tree

IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = yes
IF age = old AND credit_rating = fair THEN buys_computer = no

IF part -> Rule antecedent, THEN part -> Rule consequent
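The path-to-rule extraction above can be sketched in Python. The nested-dict tree encoding is my own choice; it mirrors the buys_computer tree, and each root-to-leaf path yields one rule:

```python
# A node is either a leaf (a class label string) or an internal node:
# {"attr": attribute_name, "branches": {value: subtree, ...}}.
tree = {"attr": "age", "branches": {
    "young": {"attr": "student", "branches": {"no": "no", "yes": "yes"}},
    "mid-age": "yes",
    "old": {"attr": "credit_rating", "branches": {"excellent": "yes", "fair": "no"}},
}}

def extract_rules(node, path=()):
    """Yield one (antecedent, class_label) rule per root-to-leaf path."""
    if not isinstance(node, dict):            # leaf: the path becomes a rule
        yield (list(path), node)
        return
    for value, subtree in node["branches"].items():
        # Each attribute-value pair along the path joins the conjunction.
        yield from extract_rules(subtree, path + ((node["attr"], value),))

for antecedent, label in extract_rules(tree):
    conds = " AND ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"IF {conds} THEN buys_computer = {label}")
```

Because every tuple follows exactly one path, the extracted rules are mutually exclusive and exhaustive, as the slide notes.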
Pruning Rule Sets
 In some cases, when trees are large, the set of extracted rules may be difficult to interpret
 Decision trees may suffer from subtree repetition and replication
 The rule set may need pruning
Rule Assessment
 For a given rule antecedent, any condition that does not improve the
estimated accuracy of the rule can be pruned
 Assessment of a rule: coverage and accuracy
 ncovers = # of tuples covered by R (rule antecedent holds true)
 ncorrect = # of tuples correctly classified by R

coverage(R) = ncovers / |D| /* D: training data set */
accuracy(R) = ncorrect / ncovers
Issues in Rule Extraction
 The rules may no longer be mutually exclusive and exhaustive
 More than one rule may be triggered
 What if they specify different classes?
 What if no rule is triggered?
 If more than one rule is triggered, we need conflict resolution
 Size ordering: assign the highest priority to the triggering rule that has the “toughest” requirement (i.e., with the most attribute tests)
 Class-based ordering: classes are sorted in order of decreasing importance (order of prevalence or misclassification cost per class)
Issues in Rule Extraction (contd…)
 Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
 If no rule is satisfied, a default rule can be used to specify a default class
 This may be the majority class, or the majority class of the tuples that were not covered by any rule
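A decision list with a default class can be sketched as follows (a hypothetical example, not from the slides): rules are tried in priority order, the first rule whose antecedent holds fires, and the default class handles tuples no rule covers.

```python
# Rules in priority order: (antecedent, class label).
decision_list = [
    ([("age", "youth"), ("student", "yes")], "yes"),
    ([("age", "youth")], "no"),
]

def classify(decision_list, t, default="no"):
    for antecedent, label in decision_list:
        if all(t.get(a) == v for a, v in antecedent):
            return label   # the first satisfied rule fires
    return default         # default rule: no rule was satisfied

print(classify(decision_list, {"age": "youth", "student": "yes"}))  # -> yes
print(classify(decision_list, {"age": "senior"}))                   # -> no (default)
```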
Rule Extraction from Training Data
 Sequential covering algorithm: extracts rules directly from training data
 Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
 Rules are learned sequentially; each rule for a given class Ci will cover many tuples of Ci but none (or few) of the tuples of other classes
 Steps:
 Rules are learned one at a time
 Each time a rule is learned, the tuples covered by the rule are removed
 The process repeats on the remaining tuples until a termination condition holds, e.g., when there are no more training examples or when the quality of a rule returned is below a user-specified threshold
 Comparison with decision-tree induction: decision-tree induction learns a set of rules simultaneously, whereas sequential covering learns rules one at a time
Sequential covering algorithm
[Pseudocode of the basic sequential covering algorithm appeared here — see Han and Kamber, Chapter 6.]
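The steps above can be sketched as runnable Python. This is a simplified sketch, not the textbook's pseudocode: the Learn-One-Rule step here just picks the single best attribute test (real algorithms such as FOIL or RIPPER grow conjunctions greedily), and for simplicity each class restarts from the full training set.

```python
def covers(antecedent, t):
    return all(t.get(a) == v for a, v in antecedent)

def learn_one_rule(D, target_class, class_attr):
    """Toy Learn-One-Rule: return the single attribute test with the
    best accuracy for the target class, plus that accuracy."""
    best, best_acc = None, 0.0
    for t in D:
        for a, v in t.items():
            if a == class_attr:
                continue
            covered = [u for u in D if u.get(a) == v]
            acc = sum(1 for u in covered if u[class_attr] == target_class) / len(covered)
            if acc > best_acc:
                best, best_acc = [(a, v)], acc
    return best, best_acc

def sequential_covering(D, classes, class_attr="buys_computer", threshold=0.6):
    rules = []
    for c in classes:                 # rules are learned one class at a time
        remaining = list(D)
        while remaining:
            antecedent, quality = learn_one_rule(remaining, c, class_attr)
            if antecedent is None or quality < threshold:
                break                 # rule quality below threshold: stop
            rules.append((antecedent, c))
            # each time a rule is learned, the tuples it covers are removed
            remaining = [t for t in remaining if not covers(antecedent, t)]
    return rules

# Toy data: students buy computers, non-students do not.
D = [{"student": "yes", "buys_computer": "yes"},
     {"student": "yes", "buys_computer": "yes"},
     {"student": "no", "buys_computer": "no"},
     {"student": "no", "buys_computer": "no"}]
print(sequential_covering(D, ["yes", "no"]))
```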