Induction of Decision Trees

An Example Data Set and Decision Tree

   #   Outlook  Company  Sailboat  Sail?
   1   sunny    big      small     yes
   2   sunny    med      small     yes
   3   sunny    med      big       yes
   4   sunny    no       small     yes
   5   sunny    big      big       yes
   6   rainy    no       small     no
   7   rainy    med      small     yes
   8   rainy    big      big       yes
   9   rainy    no       big       no
  10   rainy    med      big       no

A decision tree induced from these data:

  Outlook?
    sunny → yes
    rainy → Company?
              no  → no
              med → Sailboat?
                      small → yes
                      big   → no
              big → yes

Classification

New examples are classified by following a path from the root of the tree to a leaf:

   #   Outlook  Company  Sailboat  Sail?
   1   sunny    no       big       ?
   2   rainy    big      small     ?

(Following the tree above, example 1 is sunny, so Sail? = yes; example 2 is rainy with a big company, so Sail? = yes.)

Breast Cancer Recurrence

[Tree figure: the root tests Degree of Malignancy (< 3 / >= 3); lower levels test Tumor Size, Involved Nodes, and Age; each leaf gives the number of no-recurrence and recurrence cases, e.g. "no rec 125 / recurr 39".]

• Tree induced by Assistant Professional
• Interesting: the accuracy of this tree compared to that of medical specialists

Another Example

   #   Outlook   Temperature  Humidity  Windy  Class: Play
   1   sunny     hot          high      no     N
   2   sunny     hot          high      yes    N
   3   overcast  hot          high      no     P
   4   rainy     moderate     high      no     P
   5   rainy     cold         normal    no     P
   6   rainy     cold         normal    yes    N
   7   overcast  cold         normal    yes    P
   8   sunny     moderate     high      no     N
   9   sunny     cold         normal    no     P
  10   rainy     moderate     normal    no     P
  11   sunny     moderate     normal    yes    P
  12   overcast  moderate     high      yes    P
  13   overcast  hot          normal    no     P
  14   rainy     moderate     high      yes    N

Simple Tree

  Outlook?
    sunny    → Humidity?
                 high   → N
                 normal → P
    overcast → P
    rainy    → Windy?
                 yes → N
                 no  → P

Complicated Tree

[Tree figure: a much deeper tree rooted at Temperature, with repeated tests on Outlook, Windy, and Humidity at lower levels; it classifies the same data but is far less comprehensible than the simple tree.]

Attribute Selection Criteria

• Main principle
  – Select the attribute which partitions the learning set into subsets that are as "pure" as possible
• Various measures of purity
  – Information-theoretic
  – Gini index
  – χ² (chi-squared)
  – ReliefF
  – ...

Information-Theoretic Approach

• To classify an object, a certain amount of information is needed
  – I, information
• After we have learned the value of attribute A, we only need some remaining amount of information to classify the object
  – Ires, residual information
• Gain
  – Gain(A) = I – Ires(A)
• The most "informative" attribute is the one that minimizes Ires, i.e., maximizes Gain

Entropy

• The average amount of information I needed to classify an object is given by the entropy measure:
  I = – Σc p(c) · log2 p(c)
• For a two-class problem:
  I = – p(c1) · log2 p(c1) – p(c2) · log2 p(c2)
  [plot: entropy as a function of p(c1), maximal at p(c1) = 0.5]

Triangles and Squares

   #   Color   Outline  Dot  Shape
   1   green   dashed   no   triangle
   2   green   dashed   yes  triangle
   3   yellow  dashed   no   square
   4   red     dashed   no   square
   5   red     solid    no   square
   6   red     solid    yes  triangle
   7   green   solid    no   square
   8   green   dashed   no   triangle
   9   yellow  solid    yes  square
  10   red     solid    no   square
  11   green   solid    yes  square
  12   yellow  dashed   yes  square
  13   yellow  solid    no   square
  14   red     dashed   yes  triangle

Data Set: a set of classified objects (5 triangles and 9 squares).

Entropy

• 5 triangles
• 9 squares
• class probabilities: p(triangle) = 5/14 = 0.357, p(square) = 9/14 = 0.643
• entropy: I = – 0.357 · log2 0.357 – 0.643 · log2 0.643 = 0.940 bits

Entropy Reduction by Data Set Partitioning

• Splitting on Color partitions the objects into three subsets:
  – red: 2 triangles, 3 squares (I = 0.971 bits)
  – yellow: 4 squares (I = 0 bits)
  – green: 3 triangles, 2 squares (I = 0.971 bits)

Entropy of the Attribute's Values

• The residual information is the weighted average of the subset entropies:
  Ires(Color) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.694 bits
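As a quick check of the numbers above, here is a minimal Python sketch (not part of the original slides; the function name entropy is illustrative) that computes the two-class entropy of the data set and the residual information for the Color split from the class counts read off the table.

  import math

  def entropy(*counts):
      """I = -sum p(c) * log2 p(c) for a node with the given class counts."""
      n = sum(counts)
      return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

  i_all = entropy(5, 9)                      # whole data set: 5 triangles, 9 squares
  i_res_color = (5/14) * entropy(2, 3) \
              + (4/14) * entropy(0, 4) \
              + (5/14) * entropy(3, 2)       # red, yellow, green subsets
  print(round(i_all, 3), round(i_res_color, 3))   # 0.94 0.694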
Information Gain

• Gain(Color) = I – Ires(Color) = 0.940 – 0.694 = 0.246 bits

Information Gain of the Attribute

• Attributes
  – Gain(Color) = 0.246
  – Gain(Outline) = 0.151
  – Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen
• This heuristic is local (local minimization of impurity)

After Color is chosen at the root, the same criterion is applied within each impure subset (each has I = 0.971 bits):

• In the green subset:
  – Gain(Outline) = 0.971 – 0 = 0.971 bits
  – Gain(Dot) = 0.971 – 0.951 = 0.020 bits
• In the red subset:
  – Gain(Outline) = 0.971 – 0.951 = 0.020 bits
  – Gain(Dot) = 0.971 – 0 = 0.971 bits

So the green subtree is split on Outline, the red subtree is split on Dot, and the yellow subset is already pure.

Decision Tree

  Color?
    red    → Dot?
               yes → triangle
               no  → square
    yellow → square
    green  → Outline?
               dashed → triangle
               solid  → square

Gini Index

• Another sensible measure of impurity (i and j are classes):
  Gini = Σ(i≠j) p(ci) · p(cj)
• After applying attribute A, the resulting Gini index is
  Gini(A) = Σv p(v) · Gini(v)
• Gini can be interpreted as expected error rate

Gini Index for Color

[Worked example: the Gini index of the whole data set and of the red, yellow, and green subsets obtained by splitting on Color.]

Gain of Gini Index

• GiniGain(A) = Gini – Gini(A), analogous to the information gain

Three Impurity Measures

  A        Gain(A)  GainRatio(A)  GiniGain(A)
  Color    0.247    0.156         0.058
  Outline  0.152    0.152         0.046
  Dot      0.048    0.049         0.015

• These impurity measures assess the effect of a single attribute
• The criterion of "most informative" that they define is local (and "myopic")
• It does not reliably predict the effect of several attributes applied jointly
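The Gain(A) and GiniGain(A) columns of the table can be reproduced with the following Python sketch. It is not from the original slides; the data encoding and function names are illustrative, and the Gini function counts each pair of distinct classes once, which is the convention that appears to match the GiniGain values in the table (an assumption inferred from the numbers, not stated in the source). GainRatio is not reproduced here.

  import math
  from collections import Counter

  # Triangles-and-squares data: (Color, Outline, Dot, Shape)
  data = [
      ("green", "dashed", "no", "triangle"), ("green", "dashed", "yes", "triangle"),
      ("yellow", "dashed", "no", "square"),  ("red", "dashed", "no", "square"),
      ("red", "solid", "no", "square"),      ("red", "solid", "yes", "triangle"),
      ("green", "solid", "no", "square"),    ("green", "dashed", "no", "triangle"),
      ("yellow", "solid", "yes", "square"),  ("red", "solid", "no", "square"),
      ("green", "solid", "yes", "square"),   ("yellow", "dashed", "yes", "square"),
      ("yellow", "solid", "no", "square"),   ("red", "dashed", "yes", "triangle"),
  ]
  ATTRS = {"Color": 0, "Outline": 1, "Dot": 2}

  def class_probs(rows):
      n = len(rows)
      return [c / n for c in Counter(r[-1] for r in rows).values()]

  def entropy(rows):
      return -sum(p * math.log2(p) for p in class_probs(rows))

  def gini(rows):
      # Each pair of distinct classes counted once (assumed convention of the table above).
      ps = class_probs(rows)
      return sum(ps[i] * ps[j] for i in range(len(ps)) for j in range(i + 1, len(ps)))

  def gain(rows, attr, impurity):
      """Impurity of `rows` minus the weighted impurity after splitting on `attr`."""
      idx, n = ATTRS[attr], len(rows)
      subsets = {}
      for r in rows:
          subsets.setdefault(r[idx], []).append(r)
      residual = sum(len(s) / n * impurity(s) for s in subsets.values())
      return impurity(rows) - residual

  for a in ATTRS:
      print(a, round(gain(data, a, entropy), 3), round(gain(data, a, gini), 3))
  # Color 0.247 0.058 / Outline 0.152 0.046 / Dot 0.048 0.015 -- matching the table up to rounding

Color maximizes both gains, so both criteria select the same root attribute here; the slides' point is that such single-attribute scores remain local and myopic regardless of which impurity measure is used.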