Sparse, Flexible and Efficient Modeling using L1-Regularization
Saharon Rosset and Ji Zhu
Presented by Markus Uhr, Feature Extraction seminar

Contents
1. Idea
2. Algorithm
3. Results

Part 1: Idea

Introduction
Setting:
• implicit dependency on the training data
• linear model
• model: f(x) = Σ_j w_j h_j(x), built from fixed basis functions h_j

Introduction
Problem: how do we choose the weight λ of the regularization?
Answer: compute the solution w(λ) for all λ ∈ [0, ∞).
• Can this be done efficiently (time, memory)?
• Yes, if we impose restrictions on the loss L(w) and the penalty J(w). (Sketch 2 in the code sketches below computes such a full path for the square-loss case.)

Restrictions
The solution path w(λ) shall be piecewise linear in λ.
• What impact does this have on L(w) and J(w)?
• Can we still solve real-world problems?

Restrictions
The derivatives of L(w) and J(w) along the path must be piecewise constant, i.e.
• L(w) is (piecewise) quadratic in w
• J(w) is (piecewise) linear in w

(Piecewise) Quadratic Loss Functions
• square loss in regression
• hinge loss for classification (SVM)

Linear Penalty Functions
• the L1 penalty J(w) = ||w||_1 = Σ_j |w_j|
• sparseness property: many components of w(λ) are exactly zero (sketch 1 below shows the soft-thresholding mechanism behind this)

Bet on Sparseness
Simulation: 50 samples with 300 independent Gaussian variables.
• row 1: 3 non-zero variables
• row 2: 30 non-zero variables
• row 3: 300 non-zero variables
The L1 penalty shines when the true model is sparse (row 1) and loses its edge as the model becomes dense (row 3); sketch 3 below reproduces this setup.

Part 2: Algorithm

"Linear Toolbox"
Each admissible loss is assembled per example from pieces with piecewise constant coefficients a(r), b(r) and c(r):
ℓ(r) = a(r)·r² + b(r)·r + c(r)
• regression: r is the residual, r = y − f(x)
• classification: r is the margin, r = y·f(x)

Optimization Problem
w(λ) = argmin_w Σ_i ℓ(r_i(w)) + λ·||w||_1

Algorithm: Initialization
• start at t = 0 with w = 0
• determine the set of non-zero components (the active set)
• compute the starting direction

Algorithm: Loop
Follow the current direction until one of the following happens:
• addition of a new component (an inactive component becomes non-zero)
• vanishing of a non-zero component (an active component hits zero)
• a "knot" is hit (a discontinuity of a(r), b(r) or c(r))

Algorithm: Loop
• direction update after each event
• stopping criterion

Part 3: Results

NIPS Results
General procedure (NIPS feature selection challenge):
1. pre-selection (univariate t-statistic)
2. run the algorithm with the Huberized hinge loss
3. find the best λ* on a validation dataset (sketch 4 below shows this selection pattern)

NIPS Results: Dexter Dataset
• m = 300 samples, n = 20,000 features
• pre-selection reduces this to n = 1152 features
• the solution path consists of 452 linear pieces
• optimum at λ* with 120 non-zero components

NIPS Results: Discussion
Not very happy with the results, but the approach has merits:
• it works with the original variables
• it is a simple linear model
• L1 regularization performs the feature selection

Conclusion
• the theory carries over to practice
• limited to linear classifiers
• other extensions: regularization path for the (L2-regularized) SVM
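Code Sketches

Sketch 1: sparseness via soft-thresholding. The exact zeros produced by the L1 penalty come from the soft-thresholding operator in the optimality conditions. This is a minimal sketch for the square-loss case using cyclic coordinate descent, not the authors' path algorithm; the function names, the fixed iteration count and the value lam=5.0 are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t; the result is exactly zero when |z| <= t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    # Minimize 0.5*||y - X w||^2 + lam*||w||_1 by cyclic coordinate descent.
    n, p = X.shape
    w = np.zeros(p)
    r = y.copy()                        # residual y - X @ w, kept up to date
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            w_old = w[j]
            # Correlation of feature j with the partial residual.
            rho = X[:, j] @ r + col_sq[j] * w_old
            # Soft-thresholding sets w[j] exactly to zero for weak features.
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            if w[j] != w_old:
                r -= X[:, j] * (w[j] - w_old)
    return w

# Tiny demo: 3 truly relevant variables out of 300 (dimensions as on the slides).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 300))
w_true = np.zeros(300)
w_true[:3] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(50)
w_hat = lasso_coordinate_descent(X, y, lam=5.0)   # lam chosen by hand here
print("non-zero components:", int((w_hat != 0).sum()))
```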
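Sketch 2: the whole path for all λ. For the square-loss/L1 case, the full piecewise-linear solution path is available off the shelf; scikit-learn's lasso_path is used here as a stand-in for the paper's generic path follower, only to show the "solution for every λ" output the algorithm produces. The data dimensions mirror the simulation on the slides; everything else is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 300))      # 50 samples, 300 Gaussian variables
w_true = np.zeros(300)
w_true[:3] = 1.0                        # sparse ground truth: 3 non-zeros
y = X @ w_true + 0.5 * rng.standard_normal(50)

# One call returns the entire regularization path;
# coefs has shape (n_features, n_alphas), one column per lambda value.
alphas, coefs, _ = lasso_path(X, y)
n_nonzero = (np.abs(coefs) > 1e-10).sum(axis=0)
for a, k in list(zip(alphas, n_nonzero))[::20]:
    print(f"lambda={a:.4f}  non-zero components={k}")
```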
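Sketch 3: the bet on sparseness. A rough reproduction of the simulation behind the "Bet on Sparseness" slide, assuming the figure compares L1- against L2-regularized least squares at the three sparsity levels; the noise level, coefficient values and test-set size are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

def compare(k):
    # 50 training samples, 300 independent Gaussian variables, k true non-zeros.
    X = rng.standard_normal((50, 300))
    w = np.zeros(300)
    w[:k] = rng.standard_normal(k)
    y = X @ w + rng.standard_normal(50)
    X_test = rng.standard_normal((1000, 300))
    y_test = X_test @ w
    l1 = LassoCV(cv=5).fit(X, y)
    l2 = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
    return (mean_squared_error(y_test, l1.predict(X_test)),
            mean_squared_error(y_test, l2.predict(X_test)))

# One row of the figure per sparsity level: 3, 30 and 300 non-zero variables.
for k in (3, 30, 300):
    mse_l1, mse_l2 = compare(k)
    print(f"{k:3d} non-zeros: L1 test MSE = {mse_l1:8.2f}  L2 test MSE = {mse_l2:8.2f}")
```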
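Sketch 4: selecting λ* on a validation set. This shows the pattern behind step 3 of the NIPS procedure: walk along the regularization path and keep the model with the best validation score. L1-penalized logistic regression stands in for the Huberized hinge loss, which scikit-learn does not provide; the toy data dimensions and the grid over C (scikit-learn's inverse regularization strength) are arbitrary, not the Dexter ones.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Toy stand-in for Dexter-style data: few informative features among many.
X = rng.standard_normal((300, 1000))
w = np.zeros(1000)
w[:20] = 2.0
y = (X @ w + rng.standard_normal(300) > 0).astype(int)
X_tr, y_tr = X[:200], y[:200]          # training split
X_val, y_val = X[200:], y[200:]        # validation split

best = None
for C in np.logspace(-2, 1, 12):       # small C = strong L1 regularization
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X_tr, y_tr)
    acc = clf.score(X_val, y_val)
    nnz = int((clf.coef_ != 0).sum())  # sparsity at this point of the path
    if best is None or acc > best[0]:
        best = (acc, C, nnz)
print(f"best validation accuracy = {best[0]:.3f} at C = {best[1]:.3g} "
      f"with {best[2]} non-zero components")
```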