Machine Learning: Decision Trees
Homework 4 assigned
Slides courtesy of Geoffrey Hinton, Yann LeCun, and Tan, Steinbach & Kumar (Introduction to Data Mining)

Fully worked-out example

Weekend (Example)   Weather   Homework   Money   Decision (Category)
W1                  Sunny     Yes        Rich    Cinema
W2                  Sunny     No         Rich    Tennis
W3                  Windy     Yes        Rich    Cinema
W4                  Rainy     Yes        Poor    Cinema
W5                  Rainy     No         Rich    Stay in
W6                  Rainy     Yes        Poor    Cinema
W7                  Windy     No         Poor    Cinema
W8                  Windy     No         Rich    Shopping
W9                  Windy     Yes        Rich    Cinema
W10                 Sunny     No         Rich    Tennis

Decision Tree Based Classification
• Advantages:
  – Inexpensive to construct
  – Extremely fast at classifying unknown records
  – Easy to interpret for small-sized trees
  – Accuracy is comparable to other classification techniques for many simple data sets

Tree Induction
• Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
• Issues:
  – Determine how to split the records:
      · How to specify the attribute test condition?
      · How to determine the best split? (a worked computation on the Weekend data appears after the pruning slides below)
  – Determine when to stop splitting

The "training set / testing set" paradigm (2): learning the model (slide credit: Leon Bottou)

The "validation set" (slide credit: Leon Bottou)

Underfitting and Overfitting
• Underfitting: when the model is too simple, both training and test errors are large
• Overfitting: when the model is too complex, training error keeps decreasing while test error starts to grow

Overfitting due to Noise
• The decision boundary is distorted by noise points

Notes on Overfitting
• Overfitting results in decision trees that are more complex than necessary
• Training error no longer provides a good estimate of how well the tree will perform on previously unseen records
• New ways of estimating errors are needed

How to Address Overfitting
• Pre-pruning (early stopping rule)
  – Stop the algorithm before it grows the fully-grown tree
  – Typical stopping conditions for a node:
      · Stop if all instances belong to the same class
      · Stop if all the attribute values are the same
  – More restrictive conditions:
      · Stop if the number of instances is less than some user-specified threshold
      · Stop if the class distribution of the instances is independent of the available features (e.g., using a χ² test)
      · Stop if expanding the current node does not improve impurity measures (e.g., information gain)

How to Address Overfitting (continued)
• Post-pruning
  – Grow the decision tree to its entirety
  – Trim the nodes of the decision tree in a bottom-up fashion
  – If the generalization error improves after trimming, replace the sub-tree with a leaf node
  – The class label of the leaf node is determined by the majority class of the instances in the sub-tree
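To make the "how to determine the best split" step concrete, here is a minimal sketch that scores each attribute of the Weekend data set above by information gain (entropy-based impurity). The helper names entropy and information_gain are illustrative, not part of the original slides.

```python
from collections import Counter
from math import log2

# Weekend data set from the table above: (Weather, Homework, Money) -> Decision
rows = [
    ("Sunny", "Yes", "Rich", "Cinema"),    # W1
    ("Sunny", "No",  "Rich", "Tennis"),    # W2
    ("Windy", "Yes", "Rich", "Cinema"),    # W3
    ("Rainy", "Yes", "Poor", "Cinema"),    # W4
    ("Rainy", "No",  "Rich", "Stay in"),   # W5
    ("Rainy", "Yes", "Poor", "Cinema"),    # W6
    ("Windy", "No",  "Poor", "Cinema"),    # W7
    ("Windy", "No",  "Rich", "Shopping"),  # W8
    ("Windy", "Yes", "Rich", "Cinema"),    # W9
    ("Sunny", "No",  "Rich", "Tennis"),    # W10
]
attributes = ["Weather", "Homework", "Money"]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of Decision minus the weighted entropy after splitting on one attribute."""
    base = entropy([r[-1] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder

for i, name in enumerate(attributes):
    print(f"Gain({name}) = {information_gain(rows, i):.3f}")
```

Running this reproduces the usual result for this data set: Weather has the highest gain (about 0.70), so an ID3-style learner would place it at the root.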
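The pre-pruning and post-pruning strategies above correspond to standard library controls. Below is a minimal sketch using scikit-learn, with the Iris data standing in for a training set; all parameter values are illustrative assumptions. Note that scikit-learn's built-in post-pruning is cost-complexity pruning (ccp_alpha chosen on a held-out validation set), used here as a stand-in for the bottom-up, generalization-error-based trimming described on the slide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a validation set, as in the "validation set" slide.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Pre-pruning: early-stopping thresholds keep the tree from growing fully.
pre_pruned = DecisionTreeClassifier(
    min_samples_split=10,        # stop if a node has fewer than 10 instances
    min_impurity_decrease=0.01,  # stop if the split barely improves impurity
    random_state=0,
).fit(X_train, y_train)

# Post-pruning: grow the full tree, then trim it via cost-complexity pruning,
# keeping the ccp_alpha that scores best on the validation set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
    .fit(X_train, y_train)
    .score(X_val, y_val),
)
post_pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)

print("pre-pruned  validation accuracy:", pre_pruned.score(X_val, y_val))
print("post-pruned validation accuracy:", post_pruned.score(X_val, y_val))
```

Pre-pruning is cheap but can stop too early; post-pruning sees the fully grown tree before deciding what to cut, at the price of growing it first.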
Decision Boundary
[Figure: a depth-two decision tree with tests x < 0.43?, y < 0.33?, and y < 0.47?, and the corresponding axis-parallel partition of the unit square into pure regions]
• The border line between two neighboring regions of different classes is known as the decision boundary
• The decision boundary is parallel to the axes because each test condition involves a single attribute at a time

Oblique Decision Trees
[Figure: an oblique split x + y < 1 separating Class = + from Class = −]
• A test condition may involve multiple attributes
• More expressive representation
• Finding the optimal test condition is computationally expensive
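To see why axis-parallel boundaries can be limiting, the sketch below fits an ordinary (axis-parallel) tree to synthetic 2-D points labelled by the oblique rule x + y < 1, and compares it with a depth-one tree applied to the derived feature x + y, which amounts to a single oblique split. The data set and all parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic points in the unit square, labelled by the oblique rule x + y < 1.
rng = np.random.default_rng(0)
X = rng.random((500, 2))
y = (X[:, 0] + X[:, 1] < 1).astype(int)

# An ordinary tree tests one attribute at a time, so it approximates the
# diagonal boundary with an axis-parallel staircase and needs many leaves.
axis_parallel = DecisionTreeClassifier(random_state=0).fit(X, y)
print("axis-parallel leaves:", axis_parallel.get_n_leaves())

# Handing the tree the linear combination x + y as a single derived feature
# turns a depth-one split into an oblique split in the original space.
z = X.sum(axis=1, keepdims=True)
oblique = DecisionTreeClassifier(max_depth=1, random_state=0).fit(z, y)
print("oblique threshold on x + y:", oblique.tree_.threshold[0])
print("oblique training accuracy :", oblique.score(z, y))
```

The axis-parallel tree needs many single-attribute tests to trace the diagonal boundary, while the oblique test separates the classes with one node; the cost, as the slide notes, is that searching over arbitrary combinations of attributes is computationally expensive.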