Using R for Decision Trees for Multiple Measures

June 7, 2016
http://www.rpgroup.org/projects/multiple-measures-assessment-project
Photo: Jean Lafitte National Historical Park and Preserve, by Willett
Why Decision Trees?
•Provide easy-to-follow and easy-to-code rule sets
•Create discrete categories
•No distributional or relational assumptions
•Help visualize decision-making criteria
•Positive predictive values for discrete groups
•They’re fun!
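[Figure: example decision tree showing increasing homogeneity with each split. A mixed root node (ABZ, ZAB, BZA) is divided along branches into internal nodes (ABA, BAB, ZZZ) and finally into pure leaf nodes (AAA, BBB, ZZZ). Labeled parts: Root Node, Branch, Internal Node, Leaf Node.]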
Link to R Code from RP Home Page
Home › Projects › All Projects › Multiple Measures Assessment Project › Decision Rules and Analysis Code
http://rpgroup.org/system/files/MMAPScriptsPhaseIIDraft3.pdf
How is heterogeneity measured?
•Gini-Simpson Index
$D = 1 - \sum_{i=1}^{n} p_i^2$
•If two items are selected at random from a collection, D is the probability that they fall in different categories.
•The Gini coefficient, a distinct measure that shares the name, summarizes the inequality of a distribution: a value of 0 expresses total equality and a value of 1 maximal inequality.
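As a minimal sketch of the index above, the following R function computes D for a vector of category labels, taking p_i to be the proportion of items in category i. The function name `gini_simpson` is hypothetical and is not taken from the MMAP scripts.

```r
# Gini-Simpson index: 1 minus the sum of squared category proportions.
# `gini_simpson` is a hypothetical helper name used only for illustration.
gini_simpson <- function(labels) {
  counts <- table(labels)        # number of items in each category
  p <- counts / sum(counts)      # category proportions p_i
  1 - sum(p^2)
}

# A pure node (all one category) has no heterogeneity.
gini_simpson(c("A", "A", "A"))

# A node with three equally represented categories: 1 - 3 * (1/3)^2
gini_simpson(c("A", "B", "Z"))
```

A split is attractive when the child nodes have lower values of D than the parent, which is the "increasing homogeneity" idea from the tree diagram.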
Splitting Methods
•Class = used for a categorical dependent variable
•ANOVA = used for a continuous dependent variable
•Poisson = used for counts of events in a time frame, such as survival data
•Exponential = can also be used for survival data, with different distributional assumptions
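The splitting methods listed above correspond to the `method` argument of `rpart()` in the rpart package (`"class"`, `"anova"`, `"poisson"`, and `"exp"`). The sketch below assumes rpart is the package in use and fits the first two methods on built-in example data (`kyphosis` from rpart and `mtcars` from base R), not on the MMAP data.

```r
library(rpart)

# Classification tree: the dependent variable (Kyphosis) is a factor.
fit_class <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")

# Regression tree: the dependent variable (mpg) is continuous.
fit_anova <- rpart(mpg ~ wt + hp + cyl,
                   data = mtcars, method = "anova")

# Printing a fitted tree lists its splits as a readable rule set.
print(fit_class)
print(fit_anova)
```

If `method` is omitted, rpart chooses one based on the type of the response, so stating it explicitly mainly documents intent.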
Two approaches to avoid overfitting
•Forward pruning: Stop growing the tree earlier.
–Stop splitting the nodes if the number of samples is
too small to make reliable decisions.
–Stop if the proportion of samples from a single class (node purity) is larger than a given threshold.
•Post-pruning: Allow overfit and then post-prune
the tree.
–Use estimates of error and tree size to decide which subtrees should be pruned.
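Both approaches can be sketched with rpart, assuming that is the package in use. Stopping rules are set through `rpart.control()`, and post-pruning uses `prune()` with a complexity parameter chosen from the cross-validated error table. The specific values of `minsplit`, `minbucket`, and `cp` below are illustrative assumptions, and rpart has no direct node-purity threshold, so `cp` stands in as the stopping criterion.

```r
library(rpart)
set.seed(42)  # cross-validation folds are random, so fix the seed

# Forward pruning: stop early when nodes are too small or a split
# improves the fit by less than cp.
fit_early <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class",
                   control = rpart.control(minsplit = 20, minbucket = 7,
                                           cp = 0.01))

# Post-pruning: deliberately grow an overfit tree first...
fit_full <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, method = "class",
                  control = rpart.control(minsplit = 2, cp = 0))

# ...then cut it back to the subtree with the lowest cross-validated error.
cp_table <- fit_full$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
fit_pruned <- prune(fit_full, cp = best_cp)

# Compare tree sizes (number of nodes) before and after pruning.
c(full = nrow(fit_full$frame), pruned = nrow(fit_pruned$frame))
```

`printcp(fit_full)` shows the same table in a readable form, which can help when choosing a pruning point by eye.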
Predicting success in college statistics using Poisson
[Figure: fitted decision tree annotated with Root Node, Internal Node/Split, and Leaf/Terminal Node. Each leaf shows the success rate and the percent of the sample in that leaf.]
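The two quantities shown in each leaf, success rate and percent of sample, can be read from a fitted rpart object. The sketch below uses simulated data, since the MMAP data are not included here: the data frame `students` and the columns `hs_gpa`, `took_precalc`, and `success` are hypothetical, as are the coefficients used to generate them.

```r
library(rpart)
set.seed(2016)

# Simulated (hypothetical) student records; not the MMAP data.
n <- 500
students <- data.frame(
  hs_gpa       = round(runif(n, 1.5, 4.0), 2),
  took_precalc = rbinom(n, 1, 0.4)
)
students$success <- rbinom(
  n, 1, plogis(-4 + 1.3 * students$hs_gpa + 0.8 * students$took_precalc)
)

# Poisson splitting on a 0/1 outcome, treated as an event count per student.
fit <- rpart(success ~ hs_gpa + took_precalc,
             data = students, method = "poisson")

# Terminal nodes are the rows of the frame whose split variable is "<leaf>".
leaves <- fit$frame[fit$frame$var == "<leaf>", ]
leaf_summary <- data.frame(
  node          = rownames(leaves),
  success_rate  = round(leaves$yval, 3),  # estimated event rate in the leaf
  pct_of_sample = round(100 * leaves$n / fit$frame$n[1], 1)
)
print(leaf_summary)

# Draw the tree if it has at least one split.
if (nrow(fit$frame) > 1) {
  plot(fit, margin = 0.1)
  text(fit, use.n = TRUE)
}
```

With the Poisson method, rpart applies some shrinkage to leaf rates by default, so the reported rates may differ slightly from the raw proportion of successes in each leaf.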
Visualization with rCharts
•http://rcharts.io/
•To install rCharts:
    install.packages('devtools')
    library(devtools)
    install_github('ramnathv/rCharts')
•Tutorials and examples using rCharts:
–https://sites.google.com/site/usingrcharts/
–https://github.com/ramnathv/rCharts
Contacts
Terrence Willett
The RP Group
[email protected]
Craig Hayward
The RP Group
[email protected]
Loris Fagioli
The RP Group
[email protected]
John Hetts
Educational Results Partnership
[email protected]
Mallory Newell
The RP Group
[email protected]
Ken Sorey
Educational Results Partnership
[email protected]
Peter Bahr
University of Michigan
[email protected]
Daniel Lamoree
Educational Results Partnership
[email protected]