FEUP :: PDEEC Machine Learning 2016/17
Decision Trees
Jaime S. Cardoso, [email protected]
INESC TEC and Faculdade de Engenharia, Universidade do Porto
Oct. 3, 2016

Predict if John will play tennis

ID3 algorithm
• Split(node, {examples}):
  1. A ← the best attribute for splitting the {examples}
  2. Decision attribute for this node ← A
  3. For each value of A, create a new child node
  4. Split the training {examples} among the child nodes
  5. If the examples are perfectly classified: STOP; else iterate over the new child nodes: Split(child_node, {subset of examples})
• Ross Quinlan (ID3: 1986; C4.5: 1993)
• Breiman et al. (CART: 1984), from the statistics community

Which attribute to split on?

Entropy

Information Gain

Overfitting in Decision Trees
• A decision tree can always classify the training examples perfectly
  – keep splitting until each node contains a single example (singleton = pure)
• But such a tree does not generalize to new data

Avoid overfitting
• Stop splitting when a split is not statistically significant
• Or grow the full tree, then post-prune
  – based on a validation set
• Sub-tree replacement pruning (WF 6.1)
  – For each node: pretend to remove the node and all its children from the tree, and measure performance on the validation set
  – Remove the node whose removal yields the greatest improvement
  – Repeat until further pruning is harmful

General Structure
• Task: classification, discriminative
• Model structure: decision tree
• Score function
  – information gain at each node
  – preference for short trees
  – preference for high-gain attributes near the root
• Optimization/search method
  – greedy search from simple to complex
  – guided by information gain

Problems with Information Gain

Trees are interpretable

Continuous Attributes

Multiclass and regression

Pros and Cons

Random Decision Forest

Summary

References
• Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
• Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning, Springer.
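To make the entropy and information-gain slides concrete, here is a minimal Python sketch, not from the slides themselves. It uses the classic play-tennis toy data (Mitchell, 1997), which matches the "Predict if John will play tennis" example; the temperature column is omitted for brevity, and the helper names (`entropy`, `information_gain`) are illustrative choices.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Reduction in entropy obtained by splitting on a discrete attribute."""
    n = len(labels)
    groups = {}
    for x, y in zip(examples, labels):
        groups.setdefault(x[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Play-tennis data (Mitchell, 1997): (Outlook, Humidity, Wind) -> PlayTennis
data = [
    ("Sunny", "High", "Weak", "No"),      ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"),  ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),    ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"), ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"),   ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"), ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"), ("Rain", "High", "Strong", "No"),
]
X = [{"Outlook": o, "Humidity": h, "Wind": w} for o, h, w, _ in data]
y = [p for *_, p in data]

print(round(entropy(y), 3))   # entropy of 9 Yes / 5 No is about 0.94 bits
best = max(X[0], key=lambda a: information_gain(X, y, a))
print(best)                   # Outlook has the highest gain (about 0.247)
```

ID3's step "A ← the best attribute" is exactly the `max` over `information_gain` in the last lines; recursing on each `Outlook` value would grow the rest of the tree.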
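For the "Continuous Attributes" slide, a common approach (used by C4.5-style trees) is to turn a numeric attribute into binary tests `value <= t`, trying the midpoint between each pair of consecutive distinct sorted values as a candidate threshold. A sketch under that assumption, with made-up toy numbers:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Scan midpoints between consecutive distinct sorted values and
    return the threshold t (split: value <= t) with the highest
    information gain, together with that gain."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Made-up data: a threshold near 6.5 separates the classes perfectly.
t, g = best_threshold([1, 2, 3, 10, 11, 12],
                      ["No", "No", "No", "Yes", "Yes", "Yes"])
print(t, round(g, 3))  # 6.5 1.0
```

Unlike a discrete attribute, the same continuous attribute can be tested again deeper in the tree with a different threshold.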
• Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier / Academic Press, 2009.
• Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
• Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley & Sons, 2001.

References
• IAML (source of the slides): http://www.inf.ed.ac.uk/teaching/courses/iaml/
• See also the corresponding section of Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier / Academic Press, 2009.
• Eric Xing's homepage: http://www.cs.cmu.edu/~epxing/
• Andrew Moore, Statistical Data Mining Tutorials: http://www.autonlab.org/tutorials/
• Mário A. T. Figueiredo's homepage: http://www.lx.it.pt/~mtf/
• Nuno Vasconcelos' homepage: http://www.svcl.ucsd.edu/~nuno/
• Joachim Buhmann's homepage: http://ml2.inf.ethz.ch/courses/iml/ and http://www.ml.inf.ethz.ch/people/professors/jbuhmann