Representing structural patterns

Topics: plain classification rules, decision trees, rules with exceptions, relational rules, trees for numeric prediction, instance-based representation.
Reading material: Chapter 3 of the textbook by Witten.

Ways of representing patterns: decision trees, rules, instance-based, ... ("knowledge" representation)
- The representation determines the inference method.
- Understanding the output is the key to understanding the underlying learning method.
- Different learning problems call for different types of output (e.g. classification, regression, ...).

For classification, a natural choice is to use the same form as the input, e.g. a table of attribute values and the class they lead to:

  Outlook   Humidity  Play
  Sunny     High      No
  Sunny     Normal    Yes
  Overcast  High      Yes
  Overcast  Normal    Yes
  Rainy     High      No
  Rainy     Normal    No

How do we choose the right attributes to test?

Decision trees
- "Divide-and-conquer" approach.
- Internal nodes usually compare an attribute's value with a constant. Other possibilities: comparing two attribute values, or testing a function of several attributes.
- Leaves assign a classification, a set of classifications, or a probability distribution over classes.
- An unknown instance is routed down the tree to a leaf (see the routing sketch at the end of this part).

Nominal attributes
- The number of children usually equals the number of attribute values, so a nominal attribute is normally tested at most once on any path, unless its values are divided into two subsets, in which case it may be tested more than once.

Numeric attributes
- Test whether the value is greater or less than a constant; the attribute may be tested several times along a path.
- Three-way (or multi-way) splits are also possible: for integers, less than / equal to / greater than; for reals, below / within / above an interval.

Missing values
1. Treat "missing" as a separate attribute value.
2. Send the instance down the most popular branch.
3. Split the instance into pieces that receive weight according to the fraction of training instances going down each branch; the classifications from the leaf nodes are then combined using the weights that have percolated to them (only supported by some schemes).

Classification rules
- Antecedent (pre-condition): a series of tests, usually logically ANDed together.
- Consequent: a class, a set of classes, or a probability distribution assigned by the rule.
- Individual rules are often logically ORed together; conflicts arise if different rules lead to different conclusions for the same instance.

The weather problem

  Outlook   Temperature  Humidity  Windy  Play
  Sunny     Hot          High      False  No
  Sunny     Hot          High      True   No
  Overcast  Hot          High      False  Yes
  Rainy     Mild         High      False  Yes
  Rainy     Cool         Normal    False  Yes
  Rainy     Cool         Normal    True   No
  Overcast  Cool         Normal    True   Yes
  Sunny     Mild         High      False  No
  Sunny     Cool         Normal    False  Yes
  Rainy     Mild         Normal    False  Yes
  Sunny     Mild         Normal    True   Yes
  Overcast  Mild         High      True   Yes
  Overcast  Hot          Normal    False  Yes
  Rainy     Mild         High      True   No

A decision list is a set of rules that is interpreted in sequence, for instance:

  If outlook = sunny and humidity = high then play = no
  If outlook = rainy and windy = true then play = no
  If outlook = overcast then play = yes
  If humidity = normal then play = yes
  If none of the above then play = yes

Interpreted as a decision list, these rules classify all 14 instances correctly. Note that the fourth rule, applied on its own, leads to a wrong conclusion for one instance (outlook = rainy, humidity = normal, windy = true has play = no); in the decision list that instance is caught first by the second rule (see the decision-list sketch at the end of this part).

An unordered set of rules may overlap, and different rules may lead to different conclusions for the same instance.
- What if a conflict arises? One option is to go with the most popular rule.
- What if no rule applies to a test instance? One option is to predict the most frequent class in the training set.

Special case: Boolean class
- Assumption: if an instance does not belong to class "yes", it belongs to class "no".
- Trick: learn rules only for class "yes" and use a default rule for "no".
- The order of the rules is then unimportant and no conflicts arise.
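To make the routing step concrete, here is a minimal Python sketch (not from the textbook; the dictionary-based tree layout and the function name classify are choices made for this note) that routes an instance down the small outlook/humidity tree shown above.

```python
# Minimal sketch: route an unknown instance down a decision tree.
# Internal nodes are dicts {"attribute": ..., "branches": {...}}; leaves are class labels.

def classify(node, instance):
    while isinstance(node, dict):              # still at an internal node
        value = instance[node["attribute"]]    # look up the attribute being tested
        node = node["branches"][value]         # follow the matching branch
    return node                                # leaf reached: return the class

# The small tree from the table above (outlook at the root, humidity below it).
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "no"}},
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes
```

The decision-list interpretation of the play rules can be sketched the same way; again this is illustrative code, with the rule encoding and the default class taken from the five rules listed above.

```python
# Minimal sketch: interpret an ordered rule set (a decision list) with a default class.
# Each rule is (tests, class); the first rule whose tests all succeed fires.

rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": "true"},    "no"),
    ({"outlook": "overcast"},                  "yes"),
    ({"humidity": "normal"},                   "yes"),
]
default_class = "yes"                      # "if none of the above then play = yes"

def decision_list_classify(instance):
    for tests, cls in rules:
        if all(instance.get(attr) == value for attr, value in tests.items()):
            return cls                     # first matching rule wins
    return default_class                   # no rule applied

# Rule 4 alone would say "yes" for this instance, but rule 2 fires first:
print(decision_list_classify({"outlook": "rainy", "temperature": "cool",
                              "humidity": "normal", "windy": "true"}))  # -> no
```

Ordering matters here exactly as described above: with an unordered rule set, rules 2 and 4 would conflict on that last instance.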
Rules in disjunctive normal form

With this closed-world assumption, the rules for the target class can be written in disjunctive normal form, e.g.

  If x = 1 and y = 1 then class = a
  If z = 1 and w = 1 then class = a

and no execution order needs to be specified.

From trees to rules
- Easy: generate one rule per leaf by ANDing the tests at all the nodes on the path from the root to that leaf; the consequent is the class assigned by the leaf.
- This produces rules that are unambiguous, but the resulting rules are unnecessarily complex, so pruning is used to remove redundant tests and rules.

From rules to trees
- Difficult: a tree cannot easily express the disjunction between rules.
- Example: rules that test different attributes, such as

  If a and b then x
  If c and d then x

- The symmetry between such rules has to be broken, and the corresponding tree contains identical (replicated) subtrees.

Decision trees versus classification rules

  Decision tree                              Classification rules
  Hard to add new structure incrementally    Easy to add extra rules
  Hard to describe disjunctive concepts      Easy to express disjunctive concepts
  No conflicting rules                       Conflicting rules may appear
  Classify by following a path to a leaf     Order of interpretation is important

Association rules
- Association rules can predict any attribute, or combinations of attributes, not just the class.
- They are not intended to be used together as a set, because the number of possible rules is huge; the output must be restricted to the most predictive associations.
- Issue: how do we find those rules?
- Support: the number of instances the rule predicts correctly (also called coverage).
- Confidence: the number of correct predictions, as a proportion of all instances the rule applies to (also called accuracy).
- Example: "If temperature = cool then humidity = normal". There are 4 cool days, all with normal humidity, so support = 4 and confidence = 100% (see the sketch after this part).
- Minimum support and confidence are prespecified (e.g. 58 rules with support >= 2 and confidence >= 95% for the weather data).

Interpreting association rules
- A rule with multiple consequents, such as

  If windy = false and play = no then outlook = sunny and humidity = high

  is not just shorthand for the two separate rules

  If windy = false and play = no then outlook = sunny
  If windy = false and play = no then humidity = high

  although both of those are implied by it. It also implies

  If humidity = high and windy = false and play = no then outlook = sunny

  but the reverse implications do not hold.

Rules with exceptions
- Idea: allow rules to have exceptions.
- Example rule for the iris data:

  If petal-length in [2.45, 4.45) then Iris-versicolor

- New instance: sepal length 5.1, sepal width 3.5, petal length 2.6, petal width 0.2, type Iris-setosa. The rule above misclassifies it, so it is modified:

  If petal-length in [2.45, 4.45) then Iris-versicolor
    EXCEPT if petal-width < 1.0 then Iris-setosa

- Exceptions to exceptions can be added, and so on.

Advantages of using exceptions
- Rules can be updated incrementally: it is easy to incorporate new data and easy to incorporate domain knowledge.
- People often think in terms of exceptions (and like to be treated as exceptions).
- Each conclusion can be considered just in the context of the rules and exceptions that lead to it; this locality property is important for understanding large rule sets.

Rules involving relations
- Rules that compare an attribute value with a constant are called "propositional": they have the same expressive power as propositional logic.
- What if the problem involves relationships between examples or between attributes (e.g. a family tree problem)?

A propositional solution (blocks that are standing or lying):

  Width  Height  Sides  Class
  2      4       4      Standing
  3      6       4      Standing
  4      3       4      Lying
  7      8       3      Standing
  7      6       3      Lying
  2      9       4      Standing
  9      1       3      Lying
  10     2       3      Lying

  If width >= 3.5 and height < 7.0 then lying
  If height >= 3.5 then standing

A relational solution:

  If width > height then lying
  If height > width then standing

- Relational rules compare attributes with each other and generalize better to new data.
- Standard relations: =, <, >.
- But learning relational rules is costly.
- A simple workaround: add extra attributes (e.g. a binary attribute "is width < height?"), as in the second sketch after this part.
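A short Python sketch (illustrative only; the list weather and the variable names are choices made for this note) shows how the support and confidence of the example rule "If temperature = cool then humidity = normal" are computed over the 14 weather instances.

```python
# Minimal sketch: support and confidence of one association rule
#   If temperature = cool then humidity = normal
# over the weather data listed earlier.

weather = [  # (outlook, temperature, humidity, windy, play) for the 14 instances
    ("sunny", "hot", "high", False, "no"),      ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),  ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),  ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),  ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),   ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),("rainy", "mild", "high", True, "no"),
]

applies = [row for row in weather if row[1] == "cool"]      # antecedent holds
correct = [row for row in applies if row[2] == "normal"]    # consequent also holds

support    = len(correct)                  # instances the rule predicts correctly
confidence = len(correct) / len(applies)   # proportion correct among instances it applies to

print(support, confidence)                 # -> 4 1.0  (support 4, confidence 100%)
```

The "extra attribute" trick for relational concepts can be sketched in the same way: a derived Boolean attribute turns the relational test width < height into an ordinary propositional test (names such as blocks and augmented are invented for this example).

```python
# Minimal sketch: add a derived Boolean attribute so a propositional learner
# can express the relational concept "width < height".

blocks = [  # (width, height, sides, class) from the table above
    (2, 4, 4, "standing"), (3, 6, 4, "standing"), (4, 3, 4, "lying"),
    (7, 8, 3, "standing"), (7, 6, 3, "lying"),    (2, 9, 4, "standing"),
    (9, 1, 3, "lying"),    (10, 2, 3, "lying"),
]

# Append "width < height?" as an extra attribute to every instance.
augmented = [(w, h, s, w < h, cls) for (w, h, s, cls) in blocks]

for row in augmented:
    print(row)   # e.g. (2, 4, 4, True, 'standing') -- the new attribute predicts the class
```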
Trees for numeric prediction

A combination of decision trees and regression.
- Regression: the process of computing an expression that predicts a numeric quantity.
- Regression tree: a "decision tree" in which each leaf predicts a numeric quantity; the predicted value is the average value of the training instances that reach the leaf.
- Model tree: a "regression tree" with a linear regression model at each leaf; the piecewise linear models can approximate a nonlinear continuous function.
- Example: predicting the performance of a CPU.

Instance-based representation
1. The training instances are searched for the instance that most closely resembles the new instance.
2. The instances themselves represent the knowledge.
3. Also called instance-based learning.
- The similarity (distance) function defines what is "learned": this is "lazy" learning, because nothing is done until a new instance must be classified.
- Methods: nearest-neighbor, k-nearest-neighbor, ...

The distance function
- One numeric attribute: the distance is the difference between the two attribute values.
- Several numeric attributes: typically the Euclidean distance, computed over normalized attribute values.
- Nominal attributes: the distance is 1 if the values are different and 0 if they are equal.
- Issues: are all attributes equally important? If not, the attributes can be weighted. A distance-function sketch follows below.
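The difference between a regression-tree leaf and a model-tree leaf can be shown in a few lines of code: the former stores the mean of the numeric class values of the training instances that reach it, the latter stores a fitted linear model. A tiny sketch with made-up leaf values:

```python
# Minimal sketch: what a regression-tree leaf predicts (the values are made up).
leaf_targets = [28.0, 33.0, 30.5, 29.5]   # numeric class values of instances at one leaf

prediction = sum(leaf_targets) / len(leaf_targets)   # regression tree: the average
print(prediction)                                    # -> 30.25
# A model tree would instead store a linear regression model fitted to these
# instances and evaluate it on the attribute values of the new instance.
```

The distance function and nearest-neighbor step described above can be sketched as follows (illustrative only; the helper names distance and nearest_neighbor, the attribute names, and the two training instances are invented for this example).

```python
# Minimal sketch: distance over mixed attributes plus a 1-nearest-neighbor classifier.
# Numeric attributes are normalized to [0, 1] before taking a Euclidean distance;
# nominal attributes contribute 0 if equal and 1 if different.

import math

def distance(a, b, numeric_ranges):
    """a, b: dicts of attribute -> value; numeric_ranges: attr -> (min, max)."""
    total = 0.0
    for attr, va in a.items():
        vb = b[attr]
        if attr in numeric_ranges:                      # numeric: normalized difference
            lo, hi = numeric_ranges[attr]
            diff = (va - vb) / (hi - lo) if hi > lo else 0.0
            total += diff * diff
        else:                                           # nominal: 0 if equal, 1 otherwise
            total += 0.0 if va == vb else 1.0
    return math.sqrt(total)

def nearest_neighbor(query, training, numeric_ranges):
    """Return the class of the training instance closest to the query ('lazy' learning)."""
    best = min(training, key=lambda inst: distance(query, inst["attrs"], numeric_ranges))
    return best["cls"]

# Tiny made-up example mixing one numeric and one nominal attribute.
train = [
    {"attrs": {"temperature": 64, "outlook": "overcast"}, "cls": "yes"},
    {"attrs": {"temperature": 85, "outlook": "sunny"},    "cls": "no"},
]
ranges = {"temperature": (64, 85)}
print(nearest_neighbor({"temperature": 70, "outlook": "sunny"}, train, ranges))  # -> no
```

Normalizing the numeric attributes keeps an attribute with a large range (such as temperature) from dominating the nominal ones, which is exactly why the notes raise normalization and attribute weighting as issues.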