Representing structural patterns

Plain classification rules
Decision trees
Rules with exceptions
Rules involving relations
Trees for numeric prediction
Instance-based representation

Reading material: Chapter 3 of the textbook by Witten
Ways of representing patterns: decision trees, rules, instance-based, … — "knowledge" representation
— Representation determines the inference method
— Understanding the output is the key to understanding the underlying learning methods
— Different types of output for different learning problems (e.g. classification, regression, …)

For classification, a natural choice is to use the same form as the input.
  Outlook    Humidity   Play
  Sunny      High       No
  Sunny      Normal     Yes
  Overcast   High       Yes
  Overcast   Normal     Yes
  Rainy      High       No
  Rainy      Normal     No
How to choose the right attributes?
Decision trees
— "Divide-and-conquer" approach
— Nodes involve comparing an attribute's value with a constant
  Other possibilities:
    Comparing two attribute values
    Using functions of attributes
— Leaves assign a classification, a set of classifications, or a probability distribution
— An unknown instance is routed down the tree
Nominal attribute: the number of children usually equals the number of values. Usually tested only once, unless the attribute values are divided into two subsets.
Numeric attribute: test whether the value is greater or less than a constant.
  May be tested several times
  Three-way split (or multi-way split):
    Integer: less than, equal to, greater than
    Real: below, within, above
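To make the idea of attribute tests and routing concrete, here is a minimal sketch (not from the textbook); the nested-dict tree encoding and this particular weather tree are assumptions made purely for illustration.

```python
# A small decision tree for the weather data, encoded as nested dicts:
# an inner node is {attribute: {value: subtree}}, a leaf is just a class label.
weather_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy":    {"Windy": {True: "No", False: "Yes"}},
    }
}

def classify(tree, instance):
    """Route an instance down the tree, following the branch whose value
    matches the instance at each node, until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree

print(classify(weather_tree, {"Outlook": "Sunny", "Humidity": "Normal", "Windy": False}))  # Yes
```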
Missing values:
1. Assign "missing" a separate value
2. Assign the instance to the most popular branch
3. Split the instance into pieces that receive weight according to the fraction of training instances that go down each branch
   Classifications from leaf nodes are combined using the weights that have percolated down to them (only works for some special schemes)
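A hedged sketch of option 3, using the same nested-dict tree encoding as the sketch above; the function name and the example branch fractions are illustrative, not the textbook's code.

```python
def classify_with_missing(tree, instance, branch_fractions, weight=1.0):
    """Return a {class: weight} distribution. An instance whose test attribute is
    missing is split across all branches, weighted by the fraction of training
    instances that went down each branch."""
    if not isinstance(tree, dict):                  # leaf: all remaining weight goes to its class
        return {tree: weight}
    attribute, branches = next(iter(tree.items()))
    value = instance.get(attribute)
    if value is not None:                           # known value: follow a single branch
        return classify_with_missing(branches[value], instance, branch_fractions, weight)
    votes = {}
    for branch_value, subtree in branches.items():  # missing value: split the instance into pieces
        fraction = branch_fractions[attribute][branch_value]
        for cls, w in classify_with_missing(subtree, instance, branch_fractions,
                                            weight * fraction).items():
            votes[cls] = votes.get(cls, 0.0) + w
    return votes

# Example: suppose 3/5 of the training instances at this Humidity node went down "High".
fractions = {"Humidity": {"High": 3 / 5, "Normal": 2 / 5}}
subtree = {"Humidity": {"High": "No", "Normal": "Yes"}}
print(classify_with_missing(subtree, {"Humidity": None}, fractions))  # {'No': 0.6, 'Yes': 0.4}
```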
Classification rules
— Pre-condition: a series of tests
  Tests are usually logically ANDed together
— Consequent: class, set of classes, or probability distribution assigned by the rule
— Individual rules are often logically ORed together
  Conflicts arise if different conclusions apply
The weather problem
  Outlook    Temperature  Humidity  Windy  Play
  Sunny      Hot          High      False  No
  Sunny      Hot          High      True   No
  Overcast   Hot          High      False  Yes
  Rainy      Mild         High      False  Yes
  Rainy      Cool         Normal    False  Yes
  Rainy      Cool         Normal    True   No
  Overcast   Cool         Normal    True   Yes
  Sunny      Mild         High      False  No
  Sunny      Cool         Normal    False  Yes
  Rainy      Mild         Normal    False  Yes
  Sunny      Mild         Normal    True   Yes
  Overcast   Mild         High      True   Yes
  Overcast   Hot          Normal    False  Yes
  Rainy      Mild         High      True   No
A decision list is a set of rules to be interpreted in sequence. For instance:
  If outlook = sunny and humidity = high then play = no
  If outlook = rainy and windy = true then play = no
  If outlook = overcast then play = yes
  If humidity = normal then play = yes
  If none of the above then play = yes
Interpreted as a decision list, the rules correctly classify all the instances.
Note: applying the fourth rule on its own, out of sequence, leads to a wrong conclusion for some instances! A sketch of this sequential interpretation follows.
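A minimal sketch of sequential interpretation (the rule encoding below is an assumption made for illustration): the first rule whose conditions all hold determines the class, so later rules are never reached for instances an earlier rule already covers.

```python
# Each rule is (conditions, class); rules are tried strictly in order.
decision_list = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True},      "no"),
    ({"outlook": "overcast"},                  "yes"),
    ({"humidity": "normal"},                   "yes"),
    ({},                                       "yes"),   # "none of the above" default
]

def apply_decision_list(instance, rules):
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label                                  # first matching rule wins

# Rainy, windy day with normal humidity: rule 4 alone would wrongly say "yes",
# but in sequence rule 2 fires first and correctly says "no".
print(apply_decision_list({"outlook": "rainy", "humidity": "normal", "windy": True}, decision_list))
```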
An unordered set of rules may overlap and lead to different conclusions for the same instance.
What should we do if a conflict appears?
  Go with the most popular rule.
What should we do if no rule applies to a test instance?
  Go with the most frequent class in the training set?
Special case: boolean class
— Assumption: if an instance does not belong to class "yes", it belongs to class "no"
— Trick: only learn rules for class "yes" and use a default rule for "no"
— Order of rules is not important. No conflicts!
— Rules can be written in disjunctive normal form:
    If x = 1 and y = 1 then class = a
    If z = 1 and w = 1 then class = a
  The rule set gives no clue about, and needs no, order of execution.
From trees to rules
— Easy: convert the tree into a set of rules by ANDing the conditions of all the nodes on the path from the root to a leaf; the consequent is the class assigned by that leaf (see the sketch below)
— Produces rules that are unambiguous
— But the resulting rules are unnecessarily complex
  Pruning is needed to remove redundant rules
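A sketch of the easy direction, reusing the nested-dict tree encoding assumed in the earlier sketch: each root-to-leaf path becomes one rule whose ANDed conditions are the tests along the path.

```python
def tree_to_rules(tree, path=()):
    """Yield one (conditions, class) rule for every root-to-leaf path in the tree."""
    if not isinstance(tree, dict):                   # leaf: its label is the rule's consequent
        yield dict(path), tree
        return
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, path + ((attribute, value),))

weather_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy":    {"Windy": {True: "No", False: "Yes"}},
    }
}
for conditions, label in tree_to_rules(weather_tree):
    print(conditions, "->", label)   # e.g. {'Outlook': 'Sunny', 'Humidity': 'High'} -> No
```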
From rules to trees
— Difficult: transforming a rule set into a tree
— A tree cannot easily express the disjunction between rules
— Example: rules that test different attributes
    If a and b then x
    If c and d then x
  The symmetry needs to be broken: the corresponding tree contains identical subtrees
Decision tree vs. classification rules:
  Decision tree:
    Hard to add structure
    Hard to describe disjunctive rules
    No conflicting rules
    Classification by following a path
  Classification rules:
    Easy to add extra rules
    Easy for disjunctive rules
    Conflicting rules might appear
    Order of interpretation is important
Association rules
— Association rules can predict any attribute and combinations of attributes
— Not intended to be used together as a set, due to the huge number of possible rules
— Output needs to be restricted to show only the most predictive associations
Issue: how do we find those rules?
Support: the number of instances the rule predicts correctly
Confidence: the number of correct predictions, as a proportion of all instances the rule applies to (also known as coverage and accuracy)
— Example: If temperature = cool then humidity = normal
  There are 4 cool days, all with normal humidity, so support = 4 and confidence = 100%
— Minimum support and confidence are prespecified (e.g. 58 rules with support 2 and confidence 95% for the weather data); a sketch of computing these measures follows.
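To make the two measures concrete, a small sketch that counts them for the rule above over the weather table (the data is re-entered here as tuples; "support" follows the slide's definition, i.e. the number of instances the rule predicts correctly).

```python
# (outlook, temperature, humidity, windy, play) rows of the weather table above.
weather = [
    ("sunny", "hot", "high", False, "no"),       ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),   ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),   ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),   ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no"),
]

# Rule: if temperature = cool then humidity = normal
covered = [row for row in weather if row[1] == "cool"]      # instances the rule applies to
correct = [row for row in covered if row[2] == "normal"]    # ... for which it is also right
support = len(correct)
confidence = len(correct) / len(covered)
print(support, confidence)   # 4 1.0  (support 4, confidence 100%)
```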
Interpreting association rules
— The rule
    If humidity = high and windy = false and play = no then outlook = sunny
  is not the same as
    If windy = false and play = no then outlook = sunny
    If windy = false and play = no then humidity = high
— But it does mean
    If windy = false and play = no then outlook = sunny and humidity = high
Rules with exceptions
— Idea: allow rules to have exceptions
— Example: a rule for the iris data
    If petal-length in [2.45, 4.45) then Iris-versicolor
New instance:
  Sepal length  Sepal width  Petal length  Petal width  Type
  5.1           3.5          2.6           0.2          Iris-setosa
— Modified rule:
    If petal-length in [2.45, 4.45) then Iris-versicolor
    EXCEPT if petal-width < 1.0 then Iris-setosa
— Exceptions to exceptions can be added… (see the sketch below)
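A sketch of how such a nested rule might be evaluated: the exception is only ever considered inside the rule it modifies. The function name is illustrative; the bounds simply repeat the modified rule above.

```python
def iris_rule(petal_length, petal_width):
    """If petal-length in [2.45, 4.45) then Iris-versicolor,
    EXCEPT if petal-width < 1.0 then Iris-setosa."""
    if 2.45 <= petal_length < 4.45:
        if petal_width < 1.0:            # exception, checked only in the context of the outer rule
            return "Iris-setosa"
        return "Iris-versicolor"         # the rule's default conclusion
    return None                          # the rule does not apply to this instance

# The new instance above: petal-length 2.6, petal-width 0.2
print(iris_rule(2.6, 0.2))               # Iris-setosa
```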
Advantages of using exceptions
— Rules can be updated incrementally
    Easy to incorporate new data
    Easy to incorporate domain knowledge
— People often think in terms of exceptions and like to be treated as exceptions
— Each conclusion can be considered just in the context of the rules and exceptions that lead to it
    This locality property is important for understanding large rule sets
Rules involving relations
— Rules that compare an attribute value to a constant are called "propositional", as they have the same expressive power as propositional logic
— What if the problem involves relationships between examples (e.g. the family-tree problem from above)?
A propositional solution
  Width  Height  Sides  Class
  2      4       4      Standing
  3      6       4      Standing
  4      3       4      Lying
  7      8       3      Standing
  7      6       3      Lying
  2      9       4      Standing
  9      1       3      Lying
  10     2       3      Lying
  If width >= 3.5 and height < 7.0 then lying
  If height >= 3.5 then standing
A relational solution:
  If width > height then lying
  If height > width then standing
— Compares attributes with each other
— Generalizes better to new data
— Standard relations: =, <, >
— But: learning relational rules is costly
— Simple solution: add extra attributes (e.g. a binary attribute "is width < height?"), as in the sketch below
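A minimal sketch of the "extra attribute" idea on the blocks data above: once the derived boolean attribute width < height is added, a purely propositional rule separates the two classes.

```python
# (width, height, sides, class) rows from the table above.
blocks = [
    (2, 4, 4, "standing"), (3, 6, 4, "standing"), (4, 3, 4, "lying"),
    (7, 8, 3, "standing"), (7, 6, 3, "lying"),    (2, 9, 4, "standing"),
    (9, 1, 3, "lying"),    (10, 2, 3, "lying"),
]

# Add the derived binary attribute "width < height" to every instance.
augmented = [(w, h, s, w < h, cls) for (w, h, s, cls) in blocks]

# An ordinary propositional learner can now express the relational rule directly:
for w, h, s, narrower_than_tall, cls in augmented:
    predicted = "standing" if narrower_than_tall else "lying"
    print(predicted == cls)   # True for every row
```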
Trees for numeric prediction: a combination of decision trees and regression.
— Regression: the process of computing an expression that predicts a numeric quantity
— Regression tree: a "decision tree" where each leaf predicts a numeric quantity
    The predicted value is the average value of the training instances that reach the leaf
— Model tree: a "regression tree" with linear regression models at the leaf nodes
    Linear models at the leaves approximate a nonlinear continuous function
— Example: predicting CPU performance (see the sketch below)
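A sketch of the two leaf types; the numbers and the hypothetical CPU attributes (cache, mmax) are invented for illustration, and in practice the linear model's coefficients would come from least-squares fitting on the instances that reach the leaf.

```python
# Regression-tree leaf: predicts the average target value of the training
# instances that reached this leaf.
def regression_leaf(train_targets):
    mean = sum(train_targets) / len(train_targets)
    return lambda instance: mean

# Model-tree leaf: predicts with a linear model stored at the leaf.
def model_leaf(intercept, coefficients):
    return lambda instance: intercept + sum(c * instance[a] for a, c in coefficients.items())

leaf_a = regression_leaf([19.3, 24.7, 21.0])                 # e.g. observed performance values
leaf_b = model_leaf(11.0, {"cache": 0.5, "mmax": 0.004})     # hypothetical CPU attributes
print(leaf_a({}), leaf_b({"cache": 64, "mmax": 32000}))      # ~21.67 and 171.0
```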
Instance-based representation
1. The training instances are searched for the instance that most closely resembles the new instance
2. The instances themselves represent the knowledge
3. Also called instance-based learning
— The similarity function defines what is "learned": "lazy" learning
— Methods: nearest-neighbor, k-nearest-neighbor, …
The distance function
— One numeric attribute: distance is the difference between the two attribute values
— Several numeric attributes: Euclidean distance, for normalized attributes
— Nominal attributes: distance is set to 1 if the values are different, 0 if they are equal
Issue: are all attributes equally important?
  Weight the attributes if necessary (see the sketch below)
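A sketch of the distance function described above, assuming the numeric attributes have already been normalized to [0, 1]; the optional per-attribute weights correspond to the last point.

```python
import math

def distance(a, b, weights=None):
    """Mixed-attribute distance: difference for numeric values (assumed already
    normalized), 0/1 mismatch for nominal values, optionally weighted per attribute."""
    total = 0.0
    for attr in a:
        w = 1.0 if weights is None else weights.get(attr, 1.0)
        if isinstance(a[attr], (int, float)) and not isinstance(a[attr], bool):
            d = a[attr] - b[attr]                      # numeric attribute
        else:
            d = 0.0 if a[attr] == b[attr] else 1.0     # nominal attribute
        total += w * d * d
    return math.sqrt(total)

x = {"outlook": "sunny", "temperature": 0.8, "humidity": 0.9, "windy": "false"}
y = {"outlook": "rainy", "temperature": 0.3, "humidity": 0.9, "windy": "false"}
print(distance(x, y))   # sqrt(1 + 0.25) ~= 1.118
```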