Using R for Decision Trees for Multiple Measures
June 7, 2016
http://www.rpgroup.org/projects/multiple-measures-assessment-project
Photo: Jean Lafitte National Historical Park and Preserve, by Willett

Why Decision Trees?
• Provide easy-to-follow and easy-to-code rule sets
• Create discrete categories
• Make no distributional or relational assumptions
• Help visualize decision-making criteria
• Give positive predictive values for discrete groups
• They're fun!

[Figure: tree diagram labeling the root node, a branch, an internal node, and leaf nodes. Node contents run from mixed letter groups (ABZ, ZAB, BZA, ABA, BAB) to uniform ones (ZZZ, AAA, BBB), illustrating increasing homogeneity with each split.]

Link to R Code from RP Home Page
Home › Projects › All Projects › Multiple Measures Assessment Project › Decision Rules and Analysis Code
http://rpgroup.org/system/files/MMAPScripts PhaseIIDraft3.pdf

How is heterogeneity measured?
• Gini-Simpson Index:
$$D = 1 - \sum_{i=1}^{n} p_i^2$$
where p_i is the proportion of items in category i and n is the number of categories.
• D is the probability that two items selected at random (with replacement) from a collection are in different categories. For example, a node split evenly between two categories has D = 1 − (0.5² + 0.5²) = 0.5, while a node containing a single category has D = 0.
• The Gini coefficient, a distinct measure, describes the inequality of a distribution: a value of 0 expresses total equality and a value of 1 maximal inequality.

Splitting Methods
• Class = used for a categorical dependent variable
• ANOVA = used for a continuous dependent variable
• Poisson = used for a count of events in a time frame, such as survival data
• Exponential = can also be used for survival data, with different distributional assumptions

Two approaches to avoid overfitting
• Forward pruning: stop growing the tree earlier.
– Stop splitting a node if the number of samples is too small to make reliable decisions.
– Stop if the proportion of samples from a single class (node purity) is larger than a given threshold.
• Post-pruning: allow the tree to overfit, then prune it back.
– Use estimates of error and tree size to decide which subtrees should be pruned.
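The two approaches to avoiding overfitting can be sketched in R. This is a minimal illustration, not the presenters' script: it assumes the rpart package (whose "class", "anova", "poisson", and "exp" methods appear to match the splitting methods listed above) and uses rpart's bundled kyphosis data as a stand-in for the project data. The control values are illustrative defaults, not recommendations. The node-purity threshold has no direct counterpart here; the complexity parameter cp serves as the stopping rule instead.

```r
# Sketch only: rpart and the kyphosis example data are assumptions,
# not the presenters' data or code.
library(rpart)

set.seed(2016)  # cross-validation folds are random, so fix the seed

# Forward pruning: stop growing early through the control settings.
early_stop <- rpart(
  Kyphosis ~ Age + Number + Start,
  data    = kyphosis,
  method  = "class",                # categorical dependent variable
  parms   = list(split = "gini"),   # Gini index as the splitting criterion
  control = rpart.control(
    minsplit  = 20,    # do not try to split nodes with fewer than 20 cases
    minbucket = 7,     # every leaf must keep at least 7 cases
    cp        = 0.01   # a split must improve the fit by at least this much
  )
)

# Post-pruning: deliberately overgrow the tree, then cut it back.
overgrown <- rpart(
  Kyphosis ~ Age + Number + Start,
  data    = kyphosis,
  method  = "class",
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0)
)

# cptable lists, for each subtree size, the cross-validated error ("xerror").
cp_table <- overgrown$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned   <- prune(overgrown, cp = best_cp)

# Count the leaves in each tree to see the effect of each approach.
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
c(early_stop = n_leaves(early_stop),
  overgrown  = n_leaves(overgrown),
  pruned     = n_leaves(pruned))
```

Picking the cp value with the lowest cross-validated error is one common convention; other selection rules exist, and the right choice depends on the data.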
Predicting success in college statistics using Poisson
[Figure: decision tree annotated with the root node, an internal node (split), and a leaf (terminal node); each leaf shows the success rate and the percent of the sample in that leaf.]

Visualization with rCharts
• http://rcharts.io/
• To install rCharts:
– install.packages('devtools')
– require('devtools')
– install_github('ramnathv/rCharts')
• Tutorials and examples using rCharts:
– https://sites.google.com/site/usingrcharts/
– https://github.com/ramnathv/rCharts

Contacts
• Terrence Willett, The RP Group
• Craig Hayward, The RP Group
• Loris Fagioli, The RP Group
• John Hetts, Educational Results Partnership
• Mallory Newell, The RP Group
• Ken Sorey, Educational Results Partnership
• Peter Bahr, University of Michigan
• Daniel Lamoree, Educational Results Partnership