Decision Trees
What is a Decision Tree? How to build a good one…
Machine Learning Group, University College Dublin

Classifying Apples & Pears

No.   Greenness   Height   Width   Taste   Weight   Height/Width   Class
1     210         60       62      Sweet   186      0.97           Apple
2     220         70       53      Sweet   180      1.32           Pear
3     215         55       50      Tart    152      1.10           Apple
4     180         76       40      Sweet   152      1.90           Pear
5     220         68       45      Sweet   153      1.51           Pear
6     160         65       68      Sour    221      0.96           Apple
7     215         63       45      Sweet   140      1.40           Pear
8     180         55       56      Sweet   154      0.98           Apple
9     220         68       65      Tart    221      1.05           Apple
10    190         60       58      Sour    174      1.03           Apple

[Figure: scatter plot of Height (x-axis, 50–80) against Width (y-axis, 30–70); the apples and pears form two separable groups.]

A Decision Tree

One tree that separates the apples from the pears:

Width?
  >55 → Apple
  <55 → Height?
          <59 → Apple
          >59 → Pear

A Decision Tree

Using the derived Height/Width feature from the same table, a single test suffices:

Height/Width?
  <1.2 → Apple
  >1.2 → Pear

Decision Trees

• Each internal node tests an attribute.
• Each branch corresponds to an attribute value.
• Each leaf node assigns a classification.
• Cannot readily represent: XOR, (A ∧ B) ∨ (C ∧ D ∧ E), M-of-N functions.

When to consider D-Trees

• Instances described by attribute-value pairs.
• Target function is discrete valued.
• Disjunctive hypothesis may be required.
• Possibly noisy training data.
• Classification can be done using a few features.

Examples: equipment or medical diagnosis, credit risk analysis.

D-Tree Example

Alternative    Whether there is a suitable alternative restaurant nearby.
Bar            Is there a comfortable bar area?
Fri/Sat        True on Friday or Saturday nights.
Hungry         How hungry is the subject?
Patrons        How many people are in the restaurant?
Price          Price range.
Raining        Is it raining outside?
Reservation    Does the subject have a reservation?
Type           Type of restaurant.
Stay?          Stay or go (the target classification).

D-Tree Example

Case  Alt.  Bar  Fri.  Hun  Pat    Price  Rain  Res  Type     Est.    Stay?
X1    Yes   No   No    Yes  Some   $$$    No    Yes  French   0-10    Yes
X2    Yes   No   No    Yes  Full   $      No    No   Thai     30-60   No
X3    No    Yes  No    No   Some   $      No    No   Burger   0-10    Yes
X4    Yes   No   Yes   Yes  Full   $      No    No   Thai     10-30   Yes
X5    Yes   No   Yes   No   Full   $$$    No    Yes  French   >60     No
X6    No    Yes  No    Yes  Some   $$     Yes   Yes  Italian  0-10    Yes
X7    No    Yes  No    No   None   $      Yes   No   Burger   0-10    No
X8    No    No   No    Yes  Some   $$     Yes   Yes  Thai     0-10    Yes
X9    No    Yes  Yes   No   Full   $      Yes   No   Burger   >60     No
X10   Yes   Yes  Yes   Yes  Full   $$$    No    Yes  Italian  10-30   No
X11   No    No   No    No   None   $      No    No   Thai     0-10    No
X12   Yes   Yes  Yes   Yes  Full   $      No    No   Burger   30-60   Yes

D-Tree Example

A very good D-Tree:
• Classifies all examples correctly
• Very few nodes

Patrons?
  None → No
  Some → Yes
  Full → Hungry?
           No  → No
           Yes → Type?
                   French  → Yes
                   Italian → No
                   Burger  → Yes
                   Thai    → Fri/Sat?
                               No  → No
                               Yes → Yes

The objective in building a decision tree is to choose attributes so as to minimise the depth of the tree.

Top-down induction of D-Trees

1. A ← the “best” decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For each value of A, create a new descendant of the node.
4. Sort the training examples to the leaf nodes.
5. If the training examples are perfectly classified, then stop; else repeat recursively over the leaf nodes.
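A minimal Python sketch of this top-down procedure, assuming each example is stored as an attribute→value dictionary; the names build_tree and choose_best and the majority-vote leaf rule are illustrative choices, not part of the lecture. How the “best” attribute is measured is the subject of the entropy and information-gain slides that follow.

```python
from collections import Counter

def build_tree(examples, attributes, target, choose_best):
    """Top-down induction: pick an attribute, split the examples, recurse.

    examples    - list of dicts mapping attribute names to values
    attributes  - attribute names still available for testing
    target      - name of the class attribute (e.g. "Stay?")
    choose_best - function scoring attributes (e.g. by information gain)
    """
    classes = [e[target] for e in examples]
    # Stop when the examples are perfectly classified, or no attributes remain.
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]   # leaf: majority class

    best = choose_best(examples, attributes, target)    # step 1: pick "best" attribute
    node = {best: {}}
    for value in set(e[best] for e in examples):        # step 3: one branch per value
        subset = [e for e in examples if e[best] == value]          # step 4: sort examples
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, target,   # step 5: recurse
                                       choose_best)
    return node
```

With an information-gain scorer plugged in as choose_best (picking the attribute with the highest gain), this gives the basic ID3-style procedure described in these notes.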
Which attribute is best? Starting from a collection S = [29+, 35-]:
• A1 splits S into [21+, 5-] (A1 = t) and [8+, 30-] (A1 = f).
• A2 splits S into [18+, 33-] (A2 = t) and [11+, 2-] (A2 = f).

Good and Bad Attributes

• A perfect attribute divides the examples into categories of one type (e.g. Patrons).
• A poor attribute produces categories of mixed type (e.g. Type).

Patrons?
  None → 2 No
  Some → 4 Yes
  Full → 4 No, 2 Yes (→ Hungry?)

Type?
  French  → 1 No, 1 Yes
  Thai    → 2 No, 2 Yes
  Burger  → 2 No, 2 Yes
  Italian → 1 No, 1 Yes

How can we measure this?

Entropy

[Figure: Entropy(S) plotted against p; it is 0 at p = 0 and p = 1 and reaches its maximum of 1 at p = 0.5.]

• S is a sample of training examples.
• p is the proportion of positive examples in S.
• q is the proportion of negative examples in S.
• Entropy measures the impurity of S:

  Entropy(S) = -p log2(p) - q log2(q)

Entropy

Entropy(S) = expected number of bits needed to encode the class (positive or negative) of a randomly drawn member of S (under the optimal, shortest-length code).

Why? Information theory: the optimal-length code assigns -log2(p) bits to a message having probability p. So the expected number of bits to encode the classes of random members of S, whose classes occur in the ratio p:q, is

  -p log2(p) - q log2(q)

i.e. Entropy(S) = -p log2(p) - q log2(q).

Information Gain

Gain(S, A) = expected reduction in entropy due to sorting on A:

  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

Applied to the earlier question: for S = [29+, 35-], A1 gives subsets [21+, 5-] and [8+, 30-], while A2 gives [18+, 33-] and [11+, 2-]; the attribute with the larger gain is the better choice.

D-Tree Example

Name      Height   Hair    Eyes    Clan
Sean      short    blond   blue    McD
Mike      tall     blond   brown   Joyce
Paddy     tall     red     blue    McD
Mike Óg   short    dark    blue    Joyce
Colm      tall     dark    blue    Joyce
Liam      tall     blond   blue    McD
Johnny    tall     dark    brown   Joyce
Cóilín    short    blond   brown   Joyce

Minimal D-Tree

Hair?
  dark  → Joyce   (short, dark, blue: J; tall, dark, blue: J; tall, dark, brown: J)
  red   → McD     (tall, red, blue: McD)
  blond → Eyes?
            blue  → McD    (short, blond, blue: McD; tall, blond, blue: McD)
            brown → Joyce  (tall, blond, brown: J; short, blond, brown: J)

Summary

• ML avoids some knowledge-engineering (KE) effort.
• Recursive algorithm for building D-Trees.
• Using information gain (entropy) to select the discriminating attribute.
• Worked example.

Important People
• Claude Shannon: http://en.wikipedia.org/wiki/Claude_Shannon
• William of Ockham: http://en.wikipedia.org/wiki/William_of_Ockham
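To close the loop on the information-gain criterion summarised above, here is a minimal sketch of the entropy and gain calculations, applied to the clan data from the last example. The dict-based data representation and the function names are illustrative assumptions, not from the lecture.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy(S) = -p*log2(p) - q*log2(q), generalised to any number of classes."""
    counts = Counter(e[target] for e in examples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv|/|S|) * Entropy(Sv)."""
    total = len(examples)
    remainder = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

# The eight examples from the clan table above.
clan_data = [
    {"Height": "short", "Hair": "blond", "Eyes": "blue",  "Clan": "McD"},
    {"Height": "tall",  "Hair": "blond", "Eyes": "brown", "Clan": "Joyce"},
    {"Height": "tall",  "Hair": "red",   "Eyes": "blue",  "Clan": "McD"},
    {"Height": "short", "Hair": "dark",  "Eyes": "blue",  "Clan": "Joyce"},
    {"Height": "tall",  "Hair": "dark",  "Eyes": "blue",  "Clan": "Joyce"},
    {"Height": "tall",  "Hair": "blond", "Eyes": "blue",  "Clan": "McD"},
    {"Height": "tall",  "Hair": "dark",  "Eyes": "brown", "Clan": "Joyce"},
    {"Height": "short", "Hair": "blond", "Eyes": "brown", "Clan": "Joyce"},
]

for a in ("Height", "Hair", "Eyes"):
    print(a, round(information_gain(clan_data, a, "Clan"), 3))
# Hair scores highest, which is why the minimal tree tests Hair at the root.
```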