Decision Trees

FEUP::PDEEC, Machine Learning 2016/17
Jaime S. Cardoso ([email protected])
INESC TEC and Faculdade de Engenharia, Universidade do Porto
Oct. 3, 2016
Predict if John will play tennis
ID3 algorithm
• Split(node, {examples}):
  1. A <- the best attribute for splitting the {examples}
  2. Decision attribute for this node <- A
  3. For each value of A, create a new child node
  4. Split the training {examples} among the child nodes
  5. If the examples are perfectly classified: STOP
     else: iterate over the new child nodes
     Split(child_node, {subset of examples})
• Ross Quinlan (ID3: 1986; C4.5: 1993)
• Breiman et al. (CART: 1984), from statistics
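The Split procedure above can be sketched as a recursive Python function. This is a minimal illustration, not Quinlan's implementation: attribute selection uses information gain, examples are assumed to be dicts that carry their class under a "label" key, and all function names are mine.

```python
import math
from collections import Counter

def entropy(examples):
    """Shannon entropy, in bits, of the label distribution."""
    counts = Counter(ex["label"] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attr):
    """Entropy reduction obtained by splitting on attr."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    """Grow a tree top-down; a node is a dict, a leaf is a class label."""
    labels = {ex["label"] for ex in examples}
    if len(labels) == 1:                 # perfectly classified: STOP
        return labels.pop()
    if not attributes:                   # no attributes left: majority vote
        return Counter(ex["label"] for ex in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a))
    node = {"attribute": best, "children": {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        node["children"][value] = id3(subset, [a for a in attributes if a != best])
    return node
```

Run on a few play-tennis-style rows, `id3` picks the most informative attribute for the root and recurses only where a branch is still impure.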
Which attribute to split on?
Entropy
Information Gain
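The worked figures for these slides did not survive extraction; as a stand-in, here is the standard computation on the classic Play-Tennis counts (14 examples, 9 yes / 5 no; Outlook partitions them as sunny 2+/3-, overcast 4+/0-, rainy 3+/2-). The function name is mine.

```python
import math

def entropy(pos, neg):
    """Entropy, in bits, of a binary label distribution given pos/neg counts."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

# Whole training set: 9 yes / 5 no  ->  about 0.940 bits
h_root = entropy(9, 5)

# Information gain of Outlook = entropy before the split minus the
# example-weighted entropy of the children (sunny, overcast, rainy).
remainder = (5/14) * entropy(2, 3) + (4/14) * entropy(4, 0) + (5/14) * entropy(3, 2)
gain_outlook = h_root - remainder  # about 0.247 bits
```

The attribute with the largest gain (here Outlook) is chosen for the split.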
Overfitting in Decision Trees
• Can always classify the training examples perfectly
  – Keep splitting until each node contains a single example
  – A singleton node is pure
• Doesn't work on new data
Avoid overfitting
• Stop splitting when a split is not statistically significant
• Grow the full tree, then post-prune
  – Based on a validation set
• Sub-tree replacement pruning (WF 6.1)
  – For each node:
    • Pretend to remove the node and all its children from the tree
    • Measure the performance on the validation set
  – Remove the node whose removal yields the greatest improvement
  – Repeat until further pruning is harmful
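The pruning procedure above can be sketched as follows. This is a simplification, not the exact loop on the slide: it makes a single bottom-up pass and prunes whenever the majority leaf is no worse on the validation examples reaching the node, instead of greedily removing the single best node and repeating. The node layout (dicts with "attribute", "children", "majority" keys; leaves are class labels) is my assumption.

```python
def predict(node, example):
    """Walk the tree; an unseen attribute value falls back to the majority label."""
    while isinstance(node, dict):
        child = node["children"].get(example[node["attribute"]])
        if child is None:
            return node["majority"]
        node = child
    return node

def prune(node, validation):
    """Sub-tree replacement pruning, bottom-up.
    A node is replaced by its majority-class leaf whenever that leaf
    classifies the validation examples reaching the node at least as
    well as the sub-tree does."""
    if not isinstance(node, dict):
        return node
    # First prune the children, each with the validation rows that reach it.
    for value in node["children"]:
        reaching = [ex for ex in validation if ex[node["attribute"]] == value]
        node["children"][value] = prune(node["children"][value], reaching)
    # Compare sub-tree vs. "pretend the node is a leaf" on validation data.
    subtree_correct = sum(predict(node, ex) == ex["label"] for ex in validation)
    leaf_correct = sum(ex["label"] == node["majority"] for ex in validation)
    if leaf_correct >= subtree_correct:
        return node["majority"]
    return node
```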
General Structure
• Task: classification, discriminative
• Model structure: decision tree
• Score function
– Information gain at each node
– Preference for short trees
– Preference for high-gain attributes near the root
• Optimization/search method
– Greedy search from simple to complex
– Guided by information gain
Problems with Information Gain
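The body of this slide is missing; the standard problem is that information gain is biased toward attributes with many distinct values — an ID-like attribute that is unique per example produces pure singleton children and therefore maximal gain, yet generalizes badly. C4.5 normalizes the gain by the "split information" (the entropy of the partition sizes), giving the gain ratio. A minimal sketch, with names of my choosing:

```python
import math

def entropy_of_counts(counts):
    """Entropy, in bits, of a partition given the subset sizes."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def gain_ratio(gain, subset_sizes):
    """C4.5's gain ratio: information gain divided by the split
    information, i.e. the entropy of the partition sizes. Splitting
    into many small subsets inflates the denominator, penalizing
    ID-like attributes."""
    split_info = entropy_of_counts(subset_sizes)
    return gain / split_info if split_info > 0 else 0.0
```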
Trees are interpretable
Continuous Attributes
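The slide body is missing; the standard treatment (as in C4.5) turns a continuous attribute into a binary test "value <= t", with candidate thresholds at the midpoints between consecutive sorted values, each scored by information gain. A minimal sketch under those assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best binary split (value <= t) for one continuous attribute.
    Candidate thresholds are midpoints between consecutive distinct
    sorted values; each is scored by information gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # equal values: no threshold between them
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

Unlike a categorical split, the same continuous attribute may usefully be tested again (with a different threshold) deeper in the tree.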
Multiclass and regression
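The slide body is missing. Multiclass problems need no special machinery: entropy and information gain are already defined over any number of classes. For regression, CART replaces the entropy criterion with a reduction in squared error around the mean (variance reduction), and leaves predict the mean target of their training examples. A minimal sketch of that split score (function names are mine):

```python
def sse(ys):
    """Sum of squared errors around the mean: the CART regression impurity."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def variance_reduction(ys, left_idx):
    """Drop in SSE when the targets ys are split into a left part
    (indices in left_idx) and a right part (the rest); the best
    regression split maximizes this quantity."""
    left = [y for i, y in enumerate(ys) if i in left_idx]
    right = [y for i, y in enumerate(ys) if i not in left_idx]
    return sse(ys) - sse(left) - sse(right)
```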
Pros and Cons
Random Decision Forest
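The slide body is missing. A random decision forest trains many trees, each on a bootstrap resample of the training set and each restricted to random attribute choices, then aggregates their predictions by majority vote. The sketch below uses one-level stumps in place of full trees to stay short; every name in it is my own choice.

```python
import random
from collections import Counter

def train_stump(sample, attributes, rng):
    """One-level 'tree': split on a randomly chosen attribute and predict
    the majority label of each branch (a stand-in for a full tree learner)."""
    attr = rng.choice(attributes)
    branch = {}
    for value in {ex[attr] for ex in sample}:
        labels = [ex["label"] for ex in sample if ex[attr] == value]
        branch[value] = Counter(labels).most_common(1)[0][0]
    default = Counter(ex["label"] for ex in sample).most_common(1)[0][0]
    return attr, branch, default

def random_forest(examples, attributes, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap resample of the examples."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(examples) for _ in examples]   # sample with replacement
        forest.append(train_stump(sample, attributes, rng))
    return forest

def forest_predict(forest, example):
    """Majority vote over the individual stumps' predictions."""
    votes = [branch.get(example[attr], default)
             for attr, branch, default in forest]
    return Counter(votes).most_common(1)[0][0]
```

Averaging many decorrelated trees reduces the high variance of a single deep tree, which is the key reason forests tend to generalize better.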
Summary
References
• Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
• Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer.
• Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier/Academic Press, 2009.
• Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
• Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, John Wiley & Sons, 2001.
References
• IAML (source of the slides)
  – http://www.inf.ed.ac.uk/teaching/courses/iaml/
• See also the corresponding section of Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier/Academic Press, 2009.
• Eric Xing's homepage, http://www.cs.cmu.edu/~epxing/
• Andrew Moore, Statistical Data Mining Tutorials, http://www.autonlab.org/tutorials/
• Mário A. T. Figueiredo's homepage, http://www.lx.it.pt/~mtf/
• Nuno Vasconcelos' homepage, http://www.svcl.ucsd.edu/~nuno/
• Joachim Buhmann's homepage, http://ml2.inf.ethz.ch/courses/iml/ and http://www.ml.inf.ethz.ch/people/professors/jbuhmann