Robust Nonparametric Regression
by Controlling Sparsity
Gonzalo Mateos and Georgios B. Giannakis
ECE Department, University of Minnesota
Acknowledgments: NSF grants no. CCF-0830480, 1016605, EECS-0824007, 1002180
May 24, 2011
Nonparametric regression
 Given training data $T := \{(\mathbf{x}_n, y_n)\}_{n=1}^N$, function estimation allows predicting $y$ at any new point $\mathbf{x}$
 Estimate the unknown function $f$ from the training data set $T$
 If one trusts data more than any parametric model
⇒ go nonparametric regression: $f$ lives in a (possibly infinite-dimensional) space of “smooth’’ functions
 Ill-posed problem
 Workaround: regularization [Tikhonov’77], [Wahba’90]
 RKHS $\mathcal{H}$ with reproducing kernel $K(\cdot,\cdot)$ and norm $\|\cdot\|_{\mathcal{H}}$:
$$\hat f = \arg\min_{f \in \mathcal{H}} \sum_{n=1}^{N} [y_n - f(\mathbf{x}_n)]^2 + \mu \|f\|_{\mathcal{H}}^2, \qquad \mu \ge 0$$
 Our focus
 Nonparametric regression robust against outliers
 Robustness by controlling sparsity (a sketch of the non-robust baseline follows below)
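The regularized estimator above is the (non-robust) baseline that the rest of the talk hardens against outliers. A minimal kernel ridge sketch, assuming a Gaussian reproducing kernel and illustrative parameter values; none of these choices come from the slides.

```python
import numpy as np

def gaussian_kernel(X1, X2, bw=0.5):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 * bw^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def kernel_ridge(X, y, mu=0.1, bw=0.5):
    # Representer theorem: f(x) = sum_n alpha_n K(x, x_n),
    # with alpha = (K + mu*I)^{-1} y
    K = gaussian_kernel(X, X, bw)
    alpha = np.linalg.solve(K + mu * np.eye(len(y)), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, bw) @ alpha

# toy usage on 1-D data
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(50)
f_hat = kernel_ridge(X, y)
```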
Our work in context
 Noteworthy applications
 Load curve data cleansing [Chen et al’10]
 Spline-based PSD cartography [Bazerque et al’09]
 Robust nonparametric regression
 Huber’s function [Zhu et al’08]
 No systematic way to select thresholds
 Robustness and sparsity in linear (parametric) regression
 Huber’s M-type estimator as Lasso [Fuchs‘99]; contamination model
 Bayesian framework [Jin-Rao‘10], [Mitra et al’10]; rigid choice of the regularization parameters
Variational LTS
 Least-trimmed squares (LTS) regression [Rousseeuw’87]
$$\hat f_{\mathrm{LTS}} = \arg\min_{f} \sum_{n=1}^{s} r_{[n]}^2(f)$$
 Variational (V)LTS counterpart
$$\min_{f \in \mathcal{H}} \sum_{n=1}^{s} r_{[n]}^2(f) + \mu \|f\|_{\mathcal{H}}^2 \qquad \text{(VLTS)}$$
 $r_{[n]}^2(f)$ is the $n$-th order statistic among the squared residuals $r_1^2(f), \ldots, r_N^2(f)$, with $r_n(f) := y_n - f(\mathbf{x}_n)$; the $N - s$ largest residuals are discarded
 Q: How should we go about minimizing (VLTS)?
(VLTS) is nonconvex; existence of minimizer(s)?
 A: Try all $\binom{N}{s}$ subsamples of size $s$, solve, and pick the best (brute-force sketch below)
 Simple but intractable beyond small problems
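A brute-force rendering of the subsample search just described, reusing the hypothetical kernel_ridge helper from the previous sketch; it is only meant to make the combinatorial cost concrete, and the $\mu\|f\|_{\mathcal{H}}^2$ term is dropped from the score for brevity.

```python
from itertools import combinations
import numpy as np

def vlts_bruteforce(X, y, s, mu=0.1, bw=0.5):
    # Enumerate all size-s subsamples, fit the regularized estimator on each,
    # and keep the fit whose s smallest squared residuals sum to the least.
    best_cost, best_f = np.inf, None
    for idx in combinations(range(len(y)), s):
        idx = list(idx)
        f = kernel_ridge(X[idx], y[idx], mu=mu, bw=bw)   # helper from the earlier sketch
        cost = np.sort((y - f(X)) ** 2)[:s].sum()        # trimmed sum of squared residuals
        if cost < best_cost:
            best_cost, best_f = cost, f
    return best_f, best_cost
```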
Modeling outliers
 Outlier variables $o_n$ s.t.
$o_n \neq 0$ if $y_n$ is an outlier, $o_n = 0$ otherwise
 Nominal data obey $y_n = f(\mathbf{x}_n) + \varepsilon_n$; outliers something else
 Model accounting for both: $y_n = f(\mathbf{x}_n) + o_n + \varepsilon_n$, $n = 1, \ldots, N$ (toy generator below)
 Remarks
 Both $f$ and $\mathbf{o} := [o_1, \ldots, o_N]^T$ are unknown
 If outliers are sporadic, then the vector $\mathbf{o}$ is sparse!
 Natural (but intractable) nonconvex estimator: fit $f$ and $\mathbf{o}$ jointly while constraining the number of nonzero entries $\|\mathbf{o}\|_0$
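A toy generator for the outlier model above, assuming a 1-D sinusoidal $f$, Gaussian nominal noise, and a small fraction of large sparse outliers; every constant here is an illustrative assumption.

```python
import numpy as np

def generate_data(N=200, outlier_frac=0.1, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, size=(N, 1))
    f_true = np.sin(2 * np.pi * X[:, 0])                 # unknown "smooth" function
    eps = sigma * rng.standard_normal(N)                 # nominal noise
    o = np.zeros(N)                                      # sparse outlier vector
    idx = rng.choice(N, size=int(outlier_frac * N), replace=False)
    o[idx] = rng.choice([-1.0, 1.0], size=idx.size) * rng.uniform(2, 5, size=idx.size)
    y = f_true + o + eps                                 # y_n = f(x_n) + o_n + eps_n
    return X, y, o
```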
VLTS as sparse regression
 Lagrangian form
$$\min_{f \in \mathcal{H},\, \mathbf{o}} \sum_{n=1}^{N} [y_n - f(\mathbf{x}_n) - o_n]^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda_0 \|\mathbf{o}\|_0 \qquad \text{(P0)}$$
 Tuning parameter $\lambda_0$ controls sparsity in $\mathbf{o}$ = number of outliers
Proposition 1: If $\{\hat f, \hat{\mathbf{o}}\}$ solves (P0) with $\lambda_0$ chosen s.t. $\|\hat{\mathbf{o}}\|_0 = N - s$, then $\hat f$ solves (VLTS) too.
 The equivalence
 Formally justifies the regression model and its estimator (P0)
 Ties sparse regression with robust estimation
Just relax!
 (P0) is NP-hard ⇒ relax $\|\mathbf{o}\|_0$ to the $\ell_1$-norm $\|\mathbf{o}\|_1$
$$\min_{f \in \mathcal{H},\, \mathbf{o}} \sum_{n=1}^{N} [y_n - f(\mathbf{x}_n) - o_n]^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda_1 \|\mathbf{o}\|_1 \qquad \text{(P1)}$$
 (P1) is convex, and thus efficiently solved
 Role of the sparsity-controlling parameter $\lambda_1$ is central
 Q: Does (P1) yield robust estimates $\hat f$?
A: Yes! Huber’s M-type estimator is a special case: minimizing (P1) over $o_n$ for a fixed residual $r_n := y_n - f(\mathbf{x}_n)$ leaves Huber’s loss with threshold $\lambda_1/2$ (numeric check below)
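A quick numeric check of the Huber connection claimed above, under the normalization of (P1) used here: for a fixed residual $r$, minimizing $(r - o)^2 + \lambda_1 |o|$ over $o$ yields Huber's loss with threshold $\lambda_1/2$.

```python
import numpy as np

def huber(r, k):
    # Huber's loss: r^2 for |r| <= k, and 2*k*|r| - k^2 otherwise
    return np.where(np.abs(r) <= k, r ** 2, 2 * k * np.abs(r) - k ** 2)

def min_over_o(r, lam):
    # Closed-form minimizer: o* = sign(r) * max(|r| - lam/2, 0) (soft-thresholding);
    # plug back into (r - o)^2 + lam*|o|
    o = np.sign(r) * np.maximum(np.abs(r) - lam / 2, 0.0)
    return (r - o) ** 2 + lam * np.abs(o)

r = np.linspace(-3, 3, 601)
lam = 1.0
assert np.allclose(min_over_o(r, lam), huber(r, lam / 2))   # the two curves coincide
```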
Alternating minimization
 (P1) is jointly convex in $\{f, \mathbf{o}\}$ ⇒ alternating minimization (AM) solver for (P1)
 $f$-step: with $\mathbf{o}[k]$ fixed, fit $f[k+1]$ by kernel ridge regression on the outlier-compensated data $\{(\mathbf{x}_n,\, y_n - o_n[k])\}_{n=1}^N$
 $\mathbf{o}$-step: with $f[k+1]$ fixed, set $o_n[k+1]$ by soft-thresholding the residual $y_n - f[k+1](\mathbf{x}_n)$ at level $\lambda_1/2$
 Remarks
 Single Cholesky factorization of $\mathbf{K} + \mu \mathbf{I}_N$ suffices for all iterations
 Soft-thresholding gives the $\mathbf{o}$-update in closed form
 Reveals the intertwining between
 Outlier identification
 Function estimation with outlier-compensated data
(AM sketch below)
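A compact sketch of the AM iteration, assuming the Gaussian-kernel setup of the earlier sketches; the Cholesky factor of $\mathbf{K} + \mu\mathbf{I}_N$ is computed once and reused at every iteration, as noted on the slide.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def am_solver(K, y, mu, lam1, n_iter=100):
    # Alternating minimization for (P1):
    #   f-step: alpha = (K + mu*I)^{-1} (y - o)   (kernel ridge on compensated data)
    #   o-step: o_n   = soft(y_n - [K alpha]_n, lam1 / 2)
    N = len(y)
    c_and_low = cho_factor(K + mu * np.eye(N))    # single Cholesky factorization, reused below
    o = np.zeros(N)
    for _ in range(n_iter):
        alpha = cho_solve(c_and_low, y - o)       # f-step
        o = soft(y - K @ alpha, lam1 / 2)         # o-step
    return alpha, o
```

With K = gaussian_kernel(X, X), alpha holds the kernel-expansion coefficients of $\hat f$ and the nonzero entries of o flag the identified outliers.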
Lassoing outliers
 Alternative to AM: eliminate $f$ and solve for $\mathbf{o}$ directly (reduction sketched below)
Proposition 2: Minimizers $\hat{\mathbf{o}}$ of (P1) solve the Lasso [Tibshirani’94]
$$\hat{\mathbf{o}} = \arg\min_{\mathbf{o}} \|\tilde{\mathbf{y}} - \tilde{\mathbf{X}} \mathbf{o}\|_2^2 + \lambda_1 \|\mathbf{o}\|_1$$
w/ $\tilde{\mathbf{X}} := \sqrt{\mu}\,(\mathbf{K} + \mu \mathbf{I}_N)^{-1/2}$ and $\tilde{\mathbf{y}} := \tilde{\mathbf{X}} \mathbf{y}$, and $\hat f$ fully determined by $\hat{\mathbf{o}}$ as the kernel ridge fit to the compensated data $\mathbf{y} - \hat{\mathbf{o}}$
 Enables effective methods to select $\lambda_1$
 Lasso solvers return the entire robustification path (RP)
 Cross-validation (CV) fails with multiple outliers [Hampel’86]
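A sketch of the reduction as written above, using scipy for the matrix square root and scikit-learn's Lasso; the alpha conversion reflects scikit-learn's 1/(2N)-scaled objective and is an implementation detail, not part of the slides.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import Lasso

def lasso_outliers(K, y, mu, lam1):
    N = len(y)
    X_t = np.real(sqrtm(mu * np.linalg.inv(K + mu * np.eye(N))))   # sqrt(mu)*(K + mu*I)^{-1/2}
    y_t = X_t @ y
    # scikit-learn minimizes (1/(2N))||y - Xw||^2 + alpha*||w||_1, hence alpha = lam1/(2N)
    lasso = Lasso(alpha=lam1 / (2 * N), fit_intercept=False, max_iter=10000)
    o_hat = lasso.fit(X_t, y_t).coef_
    alpha_hat = np.linalg.solve(K + mu * np.eye(N), y - o_hat)     # f from compensated data
    return o_hat, alpha_hat
```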
Robustification paths
 LARS returns the whole RP [Efron’03]
 Same cost as a single LS fit
 The Lasso path of solutions is piecewise linear in $\lambda_1$
[Figure: Lasso coefficient paths of $\hat{\mathbf{o}}$ vs. $\lambda_1$]
 The Lasso is simple in the scalar case
 Coordinate descent is fast! [Friedman ‘07]
 Exploits warm starts, sparsity
 Other solvers: SpaRSA [Wright et al’09], SPAMS [Mairal et al’10]
 Leverage these solvers: for each value of $\mu$, compute the RP over a grid of $\lambda_1$ values, i.e., a 2-D grid of $(\mu, \lambda_1)$ pairs (see the sketch below)
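One way to compute robustification paths over the $(\mu, \lambda_1)$ grid with an off-the-shelf coordinate-descent path solver; scikit-learn's lasso_path is used here as a stand-in for the LARS/SpaRSA/SPAMS solvers cited on the slide.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import lasso_path

def robustification_paths(K, y, mu_grid, lam1_grid):
    # For each mu, reduce (P1) to a Lasso in o and sweep lambda_1 with a path solver.
    N = len(y)
    paths = {}
    for mu in mu_grid:
        X_t = np.real(sqrtm(mu * np.linalg.inv(K + mu * np.eye(N))))
        y_t = X_t @ y
        alphas = np.asarray(lam1_grid) / (2 * N)           # scikit-learn's alpha convention
        alphas_used, coefs, _ = lasso_path(X_t, y_t, alphas=alphas)
        paths[mu] = (alphas_used * 2 * N, coefs)           # (lambda_1 values, o-hat paths)
    return paths
```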
Selecting $\lambda_1$ and $\mu$
 Relies on the RP and on available knowledge of the data model (selection sketch below)
 Number of outliers known: from the RP, obtain the range of $\lambda_1$ s.t. $\|\hat{\mathbf{o}}\|_0$ equals the number of outliers. Discard the (known) outliers, and use CV to determine $\mu$
 Variance of the nominal noise known: from the RP, for each $(\mu, \lambda_1)$ on the grid, obtain an entry of the sample variance matrix as the sample variance of the residuals of the data not flagged as outliers. The best $(\mu, \lambda_1)$ are those whose entry is closest to the known $\sigma^2$
 Variance of the nominal noise unknown: replace $\sigma^2$ above with a robust estimate $\hat\sigma^2$, e.g., obtained from the median absolute deviation (MAD) of the residuals
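A sketch of the variance-matching selection rule, assuming the robustification_paths output from the previous sketch; sigma2 is either the known nominal variance or a MAD-based estimate computed from residuals of a preliminary fit.

```python
import numpy as np

def mad_sigma(residuals):
    # Robust scale estimate: 1.4826 * MAD is consistent for the standard deviation
    # under Gaussian nominal noise; apply to residuals of a preliminary fit.
    return 1.4826 * np.median(np.abs(residuals - np.median(residuals)))

def select_mu_lam1(K, y, paths, sigma2):
    # Pick the (mu, lambda_1) grid point whose nominal-residual sample variance
    # is closest to sigma2 (known variance, or mad_sigma(...)**2 if unknown).
    best, best_gap = None, np.inf
    for mu, (lam1_vals, o_paths) in paths.items():
        for j, lam1 in enumerate(lam1_vals):
            o = o_paths[:, j]
            alpha = np.linalg.solve(K + mu * np.eye(len(y)), y - o)
            res = (y - K @ alpha)[o == 0]          # residuals of data not flagged as outliers
            if res.size < 2:
                continue
            gap = abs(np.var(res) - sigma2)
            if gap < best_gap:
                best, best_gap = (mu, lam1), gap
    return best
```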
Nonconvex regularization
 Nonconvex penalty terms approximate the $\ell_0$-(pseudo)norm better than $\|\mathbf{o}\|_1$ in (P0)
 Options: SCAD [Fan-Li’01], or sum-of-logs [Candes et al’08]
 Iterative linearization-minimization of the concave penalty around the current iterate $\hat{\mathbf{o}}[k]$ ⇒ iteratively reweighted version of (P1), with weights $w_n[k] = 1/(|\hat o_n[k]| + \delta)$ (reweighting sketch below)
 Remarks
 Initialize with the (P1) solution $\hat{\mathbf{o}}$, and reuse the selected $\lambda_1$ and $\mu$
 Bias reduction (cf. adaptive Lasso [Zou’06])
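A sketch of the iteratively reweighted refinement, reusing the X_t, y_t reduction from the earlier Lasso sketch; the weights follow the sum-of-logs linearization, and the weighted $\ell_1$ penalty is emulated by rescaling the columns of X_t (a standard substitution, not spelled out on the slides).

```python
import numpy as np
from sklearn.linear_model import Lasso

def reweighted_lasso(X_t, y_t, lam1, o_init, n_iter=5, delta=1e-3):
    # Sum-of-logs refinement: each pass solves a weighted Lasso with
    # weights w_n = 1 / (|o_n| + delta) taken from the previous iterate.
    o = o_init.copy()                              # initialize with the (P1) solution
    for _ in range(n_iter):
        w = 1.0 / (np.abs(o) + delta)
        lasso = Lasso(alpha=lam1 / (2 * len(y_t)), fit_intercept=False, max_iter=10000)
        u = lasso.fit(X_t / w, y_t).coef_          # weighted l1 via column rescaling
        o = u / w                                  # undo the substitution o_n = u_n / w_n
    return o
```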
Robust thin-plate splines
 Specialize to thin-plate splines [Duchon’77], [Wahba’80]
 Smoothing penalty is only a seminorm in this case; its nullspace contains the affine functions
 Solution: $\hat f(\mathbf{x}) = \sum_{n=1}^{N} \beta_n K(\|\mathbf{x} - \mathbf{x}_n\|) + \boldsymbol\alpha^T [1,\ \mathbf{x}^T]^T$
 Radial basis function $K(r) = r^2 \log r$ (2-D inputs)
 Augment w/ a member of the nullspace of the penalty (affine term)
 Given $\mathbf{o}$, the unknowns $\{\boldsymbol\beta, \boldsymbol\alpha\}$ are found in closed form (basis assembly sketched below)
 Still, Proposition 2 holds for appropriately redefined $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{y}}$
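A sketch of the thin-plate basis assembly for 2-D inputs; the augmented linear system below is the standard smoothing-spline system solved for fixed $\mathbf{o}$ (call it with $\mathbf{y} - \mathbf{o}$), offered as an assumption about the closed form rather than a quotation of the slides.

```python
import numpy as np

def tps_kernel(r):
    # Thin-plate radial basis for 2-D inputs: K(r) = r^2 * log(r), with K(0) = 0
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def tps_fit(X, y, mu):
    # Standard smoothing thin-plate spline system (pass y - o for fixed outliers):
    #   [ E + mu*I   T ] [beta ]   [ y ]
    #   [ T^T        0 ] [alpha] = [ 0 ]
    N = X.shape[0]
    E = tps_kernel(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))
    T = np.hstack([np.ones((N, 1)), X])            # nullspace basis: affine functions
    A = np.block([[E + mu * np.eye(N), T],
                  [T.T, np.zeros((T.shape[1], T.shape[1]))]])
    b = np.concatenate([y, np.zeros(T.shape[1])])
    sol = np.linalg.solve(A, b)
    return sol[:N], sol[N:]                        # beta (kernel weights), alpha (affine part)
```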
Simulation setup
 Training set $T = \{(\mathbf{x}_n, y_n)\}_{n=1}^N$: noisy samples of a Gaussian-mixture true function, with the $\mathbf{x}_n$ drawn i.i.d.
 Nominal data: $y_n = f(\mathbf{x}_n) + \varepsilon_n$, with $\varepsilon_n$ i.i.d. ($\sigma^2$ known)
 Outliers: nonzero $o_n$ drawn i.i.d. for a randomly chosen subset of the training examples
[Figures: true function and training data]
(toy generator in this spirit below)
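A small synthetic generator in the spirit of this setup; the mixture shape, outlier mechanism, sample size, and noise level are illustrative guesses, not the values used in the slides' experiments.

```python
import numpy as np

def gaussian_bump(X, mean, cov_inv):
    d = X - mean
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, cov_inv, d))

def make_mixture_data(N=200, sigma=0.1, outlier_frac=0.1, seed=1):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(N, 2))
    # "True" function: difference of two Gaussian bumps (illustrative parameters)
    f = gaussian_bump(X, np.array([-0.4, -0.4]), np.eye(2) / 0.1) \
        - gaussian_bump(X, np.array([0.4, 0.4]), np.eye(2) / 0.1)
    y = f + sigma * rng.standard_normal(N)               # nominal data
    idx = rng.choice(N, size=int(outlier_frac * N), replace=False)
    y[idx] += rng.choice([-1.0, 1.0], size=idx.size) * rng.uniform(1, 3, size=idx.size)
    return X, y, idx                                      # idx marks the true outliers
```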
Robustification paths
 Grid parameters: 2-D grid over values of $\mu$ and $\lambda_1$
 Paths obtained using SpaRSA [Wright et al’09]
[Figure: robustification paths, showing outlier vs. inlier coefficient trajectories]
Results
[Figures: true function, robust predictions, nonrobust predictions, refined predictions]
 Effectiveness in rejecting outliers is apparent
Generalization capability
 In all cases, 100% outlier identification success rate
 Figures of merit
 Training error: average error over the training set
 Test error: average prediction error over an independent test set
 Nonconvex refinement leads to consistently lower test error
Load curve data cleansing
 Load curve: electric power consumption recorded periodically
 Reliable data: key to realize smart grid vision
 B-splines for load curve prediction and denoising [Chen et al ’10]
 Deviation from nominal models (outliers)
 Faulty meters, communication errors
 Unscheduled maintenance, strikes, sporting events
[Figure: Uruguay’s aggregate power consumption (MW)]
Real data tests
[Figures: robust, nonrobust, and refined predictions for the load curve data]
Concluding summary
 Robust nonparametric regression
 VLTS as $\ell_0$-(pseudo)norm regularized regression (NP-hard)
 Convex relaxation ⇒ variational M-type estimator ⇒ Lasso
 Controlling sparsity amounts to controlling the number of outliers
 Sparsity-controlling role of $\lambda_1$ is central
 Selection of $\lambda_1$ and $\mu$ using the Lasso robustification paths
 Different options dictated by available knowledge of the data model
 Refinement via nonconvex penalty terms
 Bias reduction and improved generalization capability
 Real data tests for load curve cleansing