CS 461: Machine Learning
Lecture 3
Dr. Kiri Wagstaff ([email protected])
1/24/09, CS 461, Winter 2009

Questions?
- Homework 2
- Project Proposal
- Weka
- Other questions from Lecture 2

Review from Lecture 2
- Representation and feature types: numeric, discrete, ordinal
- Decision trees: nodes, leaves; greedy, hierarchical, recursive, non-parametric
- Impurity: misclassification error, entropy
- Evaluation: confusion matrix, cross-validation

Plan for Today
- Decision trees: regression trees, pruning, rules; benefits of decision trees
- Evaluation: comparing two classifiers
- Support Vector Machines: classification (linear discriminants, maximum margin), learning (optimization), non-separable classes, regression

Remember Decision Trees?

Algorithm: Build a Decision Tree
[Algorithm figure from Alpaydin 2004, The MIT Press]

Building a Regression Tree
- Same algorithm, different splitting criterion
- Instead of impurity, use the mean squared error in the local region
- Predict the mean output for the node
- Compute the training error (the same as computing the variance for the node)
- Keep splitting until the node error is acceptable; then it becomes a leaf
- Acceptable: error < threshold

Turning Trees into Rules
[Figure from Alpaydin 2004, The MIT Press]

Comparing Two Algorithms (Chapter 14)

Machine Learning Showdown! McNemar's Test
- $e_{01}$: test examples misclassified by classifier 1 but not by classifier 2; $e_{10}$: the reverse
- Under $H_0$ we expect $e_{01} = e_{10} = (e_{01} + e_{10})/2$
- Test statistic: $\dfrac{(|e_{01} - e_{10}| - 1)^2}{e_{01} + e_{10}} \sim \chi^2_1$
- Accept $H_0$ if the statistic is less than $\chi^2_{\alpha,1}$
[Alpaydin 2004, The MIT Press]

Support Vector Machines (Chapter 10)

Linear Discrimination
- Model class boundaries (not the data distribution)
- Learning: maximize accuracy on labeled data
- Inductive bias: the form of discriminant used
- $g(x \mid w, b) = w^T x + b = \sum_{i=1}^{d} w_i x_i + b$
- Choose $C_1$ if $g(x) > 0$, otherwise choose $C_2$
[Alpaydin 2004, The MIT Press]

How to find the best w, b?
- $E(w \mid X)$ is the error with parameters $w$ on sample $X$
- $w^* = \arg\min_w E(w \mid X)$
- Gradient: $\nabla_w E = \left[\dfrac{\partial E}{\partial w_1}, \dfrac{\partial E}{\partial w_2}, \ldots, \dfrac{\partial E}{\partial w_d}\right]^T$
- Gradient descent: start from a random $w$ and update $w$ iteratively in the negative direction of the gradient
[Alpaydin 2004, The MIT Press]

Gradient Descent
- $\Delta w_i = -\eta \dfrac{\partial E}{\partial w_i}, \ \forall i$, then $w_i \leftarrow w_i + \Delta w_i$
- [Figure: one descent step from $w^t$ to $w^{t+1}$, reducing $E(w^t)$ to $E(w^{t+1})$ with step size $\eta$]
[Alpaydin 2004, The MIT Press]

Support Vector Machines
- Maximum-margin linear classifiers
- Imagine: army ceasefire (the margin as a buffer zone between the two sides)
- How to find the best $w, b$? Quadratic programming:
  $\min \tfrac{1}{2}\|w\|^2$ subject to $y^t(w^T x^t + b) \geq +1, \ \forall t$

Optimization (primal formulation)
- $\min \tfrac{1}{2}\|w\|^2$ subject to $y^t(w^T x^t + b) \geq +1, \ \forall t$ (must get the training data right!)
- Lagrangian:
  $L_p = \tfrac{1}{2}\|w\|^2 - \sum_{t=1}^{N} \alpha^t \left[ y^t(w^T x^t + b) - 1 \right] = \tfrac{1}{2}\|w\|^2 - \sum_{t=1}^{N} \alpha^t y^t (w^T x^t + b) + \sum_{t=1}^{N} \alpha^t$
- $\dfrac{\partial L_p}{\partial w} = 0 \Rightarrow w = \sum_{t=1}^{N} \alpha^t y^t x^t$
- $\dfrac{\partial L_p}{\partial b} = 0 \Rightarrow \sum_{t=1}^{N} \alpha^t y^t = 0$
- $d + 1$ parameters ($w$ and $b$)
[Alpaydin 2004, The MIT Press]

Optimization (dual formulation)
- Substitute the two conditions above into $L_p$ to get the dual:
  $L_d = \tfrac{1}{2} w^T w - w^T \sum_t \alpha^t y^t x^t - b \sum_t \alpha^t y^t + \sum_t \alpha^t = -\tfrac{1}{2} w^T w + \sum_t \alpha^t = -\tfrac{1}{2} \sum_t \sum_s \alpha^t \alpha^s y^t y^s (x^t)^T x^s + \sum_t \alpha^t$
- Maximize $L_d$ subject to $\sum_t \alpha^t y^t = 0$ and $\alpha^t \geq 0, \ \forall t$
- Instances with $\alpha^t > 0$ are the support vectors (in the soft-margin version, each $\alpha^t$ is capped by $C$)
- $N$ parameters; where did $w$ and $b$ go?
[Alpaydin 2004, The MIT Press]
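To make the dual solution concrete, here is a minimal sketch (not from the original slides) that fits a linear SVM to a small made-up 2D dataset and recovers $w = \sum_t \alpha^t y^t x^t$ from the support vectors. It uses scikit-learn's SVC for illustration rather than the Weka SMO implementation mentioned later in the lecture, and the data and parameter values are arbitrary.

```python
# Sketch: recover w and b from the dual solution of a linear SVM.
# Assumes scikit-learn; the toy data below is purely illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2],    # class +1 cluster
               rng.randn(20, 2) - [2, 2]])   # class -1 cluster
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha^t * y^t for the support vectors (alpha^t > 0),
# so w = sum_t alpha^t y^t x^t is a matrix product over the SVs.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_

print("number of support vectors:", len(clf.support_vectors_))
print("w (from dual):    ", w_from_dual.ravel())
print("w (sklearn coef_):", clf.coef_.ravel())   # should match the line above
print("b:", clf.intercept_[0])
```

Only the support vectors contribute to $w$; every other training point has $\alpha^t = 0$ and could be removed without changing the classifier.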
What if Data isn't Linearly Separable?
1. Embed the data in a higher-dimensional space
   - Explicit: basis functions (new features)
   - Implicit: kernel functions (a new dot product / similarity)
   - Visualization of 2D -> 3D
   - Kernels: polynomial, RBF/Gaussian, sigmoid
   - SVM applet
   - Still need to find a linear hyperplane (in the new space)
2. Add "slack" variables to permit some errors

Example: Orbital Classification
- Linear SVM flying on the EO-1 Earth Orbiter since Dec. 2004
- Classify every pixel of a Hyperion image
- Four classes: ice, water, land, snow
- 12 features (of the 256 collected)
- [Figures: Hyperion image and classified result]
[Castano et al., 2005]

SVM in Weka
- SMO: Sequential Minimal Optimization
- Faster than QP-based versions
- Try linear and RBF kernels (a small sketch comparing the two follows the last slide)

Summary: What You Should Know
- Decision trees: regression trees, pruning, rules; benefits of decision trees
- Evaluation: comparing two classifiers (McNemar's test)
- Support Vector Machines: classification (linear discriminants, maximum margin), learning (optimization), non-separable classes

Next Time
- Reading: Evaluation (Ch. 14.7); Support Vector Machines (Ch. 10.1-10.4, 10.6, 10.9)
- Questions to answer from the reading: posted on the website
- Three volunteers: Sen, Jimmy, and Irvin
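As a closing illustration (not part of the original slides), the sketch below ties together the Weka slide's suggestion to try linear and RBF kernels with McNemar's test from earlier in the lecture: it trains both kernels on a made-up, non-linearly-separable dataset and tests whether their error rates differ. It uses scikit-learn and SciPy instead of Weka, and the dataset, C, gamma, and significance level are arbitrary illustrative choices.

```python
# Sketch: compare a linear and an RBF SVM with McNemar's test.
# Assumes scikit-learn and SciPy; data and hyperparameters are illustrative.
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

pred_lin = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr).predict(X_te)
pred_rbf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X_tr, y_tr).predict(X_te)

# e01: linear wrong but RBF right; e10: linear right but RBF wrong
e01 = int(np.sum((pred_lin != y_te) & (pred_rbf == y_te)))
e10 = int(np.sum((pred_lin == y_te) & (pred_rbf != y_te)))

# McNemar statistic (|e01 - e10| - 1)^2 / (e01 + e10), compared to chi^2 with 1 dof
# (assumes e01 + e10 > 0, i.e., the classifiers disagree on at least one example)
stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
threshold = chi2.ppf(0.95, df=1)   # chi^2_{alpha=0.05, 1} is about 3.84
print(f"e01={e01}, e10={e10}, statistic={stat:.2f}, threshold={threshold:.2f}")
print("Accept H0 (same error rate)" if stat < threshold else "Reject H0")
```

On data like this the RBF kernel usually wins by a wide margin, so the statistic typically exceeds the threshold and H0 is rejected; on a linearly separable problem the two kernels would disagree on few test points and H0 would usually be accepted.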