Paper A slides

Feature Extraction
Sparse, Flexible and Efficient Modeling using
L1-Regularization
Saharon Rosset and Ji Zhu
Markus Uhr
Contents
1. Idea
2. Algorithm
3. Results
Part 1: Idea
Introduction
Setting:
• Implicit dependency on the training data
• Linear model
• Model: f(x) = Σⱼ wⱼ·hⱼ(x)  (using basis functions hⱼ)
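A minimal sketch of this model family (the basis functions below are illustrative examples, not from the paper): the prediction is a weighted sum of fixed basis functions hⱼ.

    import numpy as np

    # linear-in-parameters model: f(x) = sum_j w_j * h_j(x)
    def predict(x, w, basis):
        return sum(w_j * h_j(x) for w_j, h_j in zip(w, basis))

    basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2]   # example h_j
    print(predict(2.0, np.array([0.5, 1.0, -0.25]), basis))  # 0.5 + 2 - 1 = 1.5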
Introduction
Problem: How to choose the regularization weight λ?
Answer: Find ŵ(λ) = argmin_w L(w) + λ·J(w)
for all λ ∈ [0, ∞)
• Can this be done efficiently (time, memory)?
• Yes, if we impose restrictions on L and J
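As an illustration of the path idea only (scikit-learn's coordinate-descent solver, not the authors' algorithm), the full ŵ(λ) curve for squared loss can be traced over a grid of λ values:

    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 300))
    y = X[:, :3] @ np.ones(3) + 0.1 * rng.standard_normal(50)

    # coefs[j, k] is w-hat_j at the k-th lambda value
    alphas, coefs, _ = lasso_path(X, y)
    print(coefs.shape)   # (300, n_lambdas)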
Restrictions
ŵ(λ) shall be piecewise linear in λ
• What impact does this have on L(w) and J(w)?
• Can we still solve real-world problems with such restrictions?
Restrictions
∂ŵ(λ)/∂λ must be piecewise constant in λ
• ⇒ L(w) must be (piecewise) quadratic in w
• ⇒ J(w) must be (piecewise) linear in w
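Why these restrictions suffice (a one-line sketch, using the penalized form): differentiating the stationarity condition ∇L(ŵ(λ)) + λ·s = 0, with s = sign(ŵ) fixed on a segment, gives ∂ŵ/∂λ = −(∇²L)⁻¹·s, which is constant wherever the Hessian ∇²L is constant, i.e. wherever L is locally quadratic and J locally linear.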
Quadratic Loss Functions
• square loss for regression: ℓ(r) = r²
• huberized hinge loss for classification (SVM)
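A minimal sketch of the two losses (the huberization knot δ = −1 is an assumption; the paper's exact parametrization may differ):

    import numpy as np

    def square_loss(r):
        # regression residual r = y - f(x)
        return r ** 2

    def huberized_hinge(m, delta=-1.0):
        # classification margin m = y * f(x): zero above 1,
        # quadratic on [delta, 1), linear below delta
        m = np.asarray(m, dtype=float)
        loss = np.zeros_like(m)
        quad = (m >= delta) & (m < 1)
        lin = m < delta
        loss[quad] = (1 - m[quad]) ** 2 / (2 * (1 - delta))
        loss[lin] = 1 - m[lin] - (1 - delta) / 2
        return loss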
Linear Penalty Functions
• J(w) = ‖w‖₁ = Σⱼ |wⱼ| is piecewise linear in w
• Sparseness property: many components of ŵ(λ) are exactly zero
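Why the L1 penalty produces exact zeros, in the simplest (orthonormal-design) case: each coefficient is soft-thresholded, so every coefficient with |zⱼ| ≤ λ is set exactly to zero. A two-line sketch:

    import numpy as np

    def soft_threshold(z, lam):
        # per-coordinate L1 solution for an orthonormal design
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    print(soft_threshold(np.array([2.5, 0.3, -1.2]), 1.0))  # [ 1.5  0.  -0.2]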
Bet on Sparseness
• 50 samples with 300 independent Gaussian variables
• figure row 1: 3 non-zero coefficients in the true model
• figure row 2: 30 non-zero coefficients
• figure row 3: 300 non-zero coefficients
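A hedged sketch of this simulation setup (the error metric, noise level, and use of cross-validated λ are assumptions; the original figure's protocol may differ):

    import numpy as np
    from sklearn.linear_model import LassoCV, RidgeCV

    rng = np.random.default_rng(0)
    for k in (3, 30, 300):          # non-zero coefficients per figure row
        X = rng.standard_normal((50, 300))
        w = np.zeros(300)
        w[:k] = rng.standard_normal(k)
        y = X @ w + rng.standard_normal(50)

        X_test = rng.standard_normal((1000, 300))
        y_test = X_test @ w
        for name, model in (("L1", LassoCV(cv=5)), ("L2", RidgeCV())):
            mse = np.mean((model.fit(X, y).predict(X_test) - y_test) ** 2)
            print(f"k={k:3d}  {name}: test MSE {mse:.2f}")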
Part 2: Algorithm
"Linear Toolbox"
• piecewise quadratic loss: ℓ(r) = a(r)·r² + b(r)·r + c(r)
  with a(r), b(r) and c(r) piecewise constant coefficients
• Regression: residual r = y − w⊤x
• Classification: margin r = y·w⊤x
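As a concrete instance of this toolbox, a sketch of the (a, b, c) lookup for the huberized hinge loss from before (the knot at δ = −1 is again an assumption):

    # l(r) = a(r)*r**2 + b(r)*r + c(r), margin r = y * f(x)
    def huber_hinge_coeffs(r, delta=-1.0):
        if r >= 1:                   # correct side of the margin: no loss
            return 0.0, 0.0, 0.0
        if r >= delta:               # quadratic piece: (1 - r)^2 / (2*(1 - delta))
            s = 1.0 / (2 * (1 - delta))
            return s, -2 * s, s
        return 0.0, -1.0, 1 - (1 - delta) / 2   # linear piece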
Optimization Problem
minimize Σᵢ ℓ(rᵢ(w))  subject to  ‖w‖₁ ≤ t
(equivalently: minimize Σᵢ ℓ(rᵢ(w)) + λ·‖w‖₁)
Algorithm Initialization
• start at t = 0 ⇒ w = 0
• determine the initial set of non-zero components
  (those with maximal |∂L/∂wⱼ| at w = 0)
• starting direction: move these components against the sign of their gradient
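A minimal sketch of this initialization, assuming squared loss (the paper's algorithm covers the general piecewise-quadratic case):

    import numpy as np

    def init_path(X, y):
        grad = -X.T @ y                    # dL/dw at w = 0 for L = 0.5*||y - Xw||^2
        j = int(np.argmax(np.abs(grad)))   # first component to become non-zero
        direction = np.zeros(X.shape[1])
        direction[j] = -np.sign(grad[j])   # step against the gradient
        return j, direction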
Algorithm Loop
follow the direction until one of the following happens:
• addition of a new component
• vanishing of a non-zero component
• hit of a “knot” (a discontinuity of a(r), b(r), c(r))
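A sketch of one of the three event checks, the vanishing of an active component (the "add" and "knot" step lengths would be computed analogously from the gradient and residual equations):

    import numpy as np

    def step_to_vanishing(w, direction, active):
        # how far can we follow `direction` before some active w_j hits 0?
        steps = np.full(len(w), np.inf)
        for j in active:
            if direction[j] != 0 and np.sign(w[j]) == -np.sign(direction[j]):
                steps[j] = -w[j] / direction[j]
        return steps.min()   # the loop advances by the smallest event step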
Algorithm Loop
• direction update
• stopping criterion
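A hedged sketch of the direction update for the squared-loss case (the general piecewise-quadratic case replaces X_Aᵀ·X_A with the local Hessian): differentiating the stationarity condition X_Aᵀ(Xw − y) + λ·s = 0 in λ gives the direction of the current linear segment.

    import numpy as np

    def update_direction(X, active, signs):
        XA = X[:, active]                  # columns of the active set A
        d = np.zeros(X.shape[1])
        # dw/d(lambda) = -(XA^T XA)^{-1} s on the current segment
        d[active] = np.linalg.solve(XA.T @ XA, -signs)
        return d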
Part 3: Results
NIPS Results
General procedure:
1. pre-selection (univariate t-statistic)
2. algorithm loss function: huberized hinge loss
3. find the best λ* on a validation dataset
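A sketch of step 1, assuming labels in {−1, +1} (the threshold and count choices in the actual NIPS entry may differ):

    import numpy as np

    def t_statistic_preselect(X, y, n_keep):
        pos, neg = X[y == 1], X[y == -1]
        num = pos.mean(0) - neg.mean(0)
        den = np.sqrt(pos.var(0) / len(pos) + neg.var(0) / len(neg))
        t = np.abs(num / (den + 1e-12))        # univariate t-statistic
        return np.argsort(t)[::-1][:n_keep]    # indices of retained features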
NIPS Results
Dexter Dataset
• m = 300, n = 20'000
• pre-selection: n = 1152
• linear pieces of ŵ(λ): 452
• optimum at λ* (120 non-zero components)
NIPS Results
Not very happy with the results
→ working with the original variables
→ simple linear model
→ L1-regularization for feature selection
Conclusion
• theory → practice
• limited to linear classifiers
• other extensions, e.g. the Regularization Path for the SVM (L2)