Basis Expansion and
Regularization
Presenter: Hongliang Fei
Brian Quanz
Date:
July 03, 2008
Contents
Introduction
Piecewise Polynomials and
Splines
Filtering and Feature Extraction
Smoothing Splines
Automatic Smoothing Parameter
Selection
1. Introduction
Basis: In linear algebra, a basis is a set of vectors satisfying:
Linear combinations of the basis vectors can represent every vector in the given vector space;
No element of the set can be represented as a linear combination of the others.
In a function space, the basis generalizes to a set of basis functions;
Each function in the function space can be represented as a linear combination of the basis functions.
Example: the quadratic polynomial basis $\{1, t, t^2\}$
What is Basis Expansion?
Given data $X$ and transformations $h_m(X): \mathbb{R}^p \to \mathbb{R}$, $m = 1, \ldots, M$, we model
$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X),$$
a linear basis expansion in $X$, where each $h_m(X)$ is a basis function.
Why Basis Expansion?
In regression problems, f(X) is typically nonlinear in X;
A linear model is convenient and easy to interpret;
When the sample size is very small but the number of attributes is very large, a linear model may be all we can fit without overfitting (see the sketch below).
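A minimal sketch in Python, assuming the quadratic basis from the introduction and invented toy data:

```python
import numpy as np

# A linear basis expansion: fit ordinary least squares not on X itself
# but on transformed features h_m(X). The quadratic basis {1, x, x^2}
# and the toy data are illustrative assumptions.
def basis_expand(x):
    """Map 1-D inputs to the basis functions h_1=1, h_2=x, h_3=x^2."""
    return np.column_stack([np.ones_like(x), x, x ** 2])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)  # toy data

H = basis_expand(x)                           # N x M design matrix
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # f(X) = sum_m beta_m h_m(X)
y_hat = H @ beta                              # fitted values
```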
2. Piecewise Polynomials and
Splines
Spline:
In Mathematics, a spline is a special function
defined piecewise by polynomials;
In Computer Science, the term spline more
frequently refers to a piecewise polynomial
(parametric) curve.
Splines are popular because of their simple construction, ease and accuracy of evaluation, and capacity to approximate complex shapes through curve fitting and interactive curve design.
Example of a Spline
http://en.wikipedia.org/wiki/Image:BezierInterpolation.gif
Assume a spline with four knots (two boundary knots and two interior knots $\xi_1, \xi_2$), and X one-dimensional.
Piecewise constant basis:
$$h_1(X) = I(X < \xi_1), \quad h_2(X) = I(\xi_1 \le X < \xi_2), \quad h_3(X) = I(\xi_2 \le X)$$
Piecewise linear basis: add to these
$$h_{m+3}(X) = h_m(X)\,X, \quad m = 1, 2, 3$$
Piecewise cubic polynomial basis functions:
$$h_1(X) = 1,\; h_2(X) = X,\; h_3(X) = X^2,\; h_4(X) = X^3,\; h_5(X) = (X - \xi_1)_+^3,\; h_6(X) = (X - \xi_2)_+^3$$
Six functions corresponding to a six-dimensional linear space.
An order-$M$ spline with knots $\xi_j$, $j = 1, \ldots, K$ has continuous derivatives up to order $M-2$. The general form for the truncated-power basis set would be:
$$h_j(X) = X^{j-1}, \quad j = 1, \ldots, M,$$
$$h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \ldots, K.$$
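A minimal sketch of constructing this basis, with illustrative knot locations:

```python
import numpy as np

# Truncated-power basis for an order-M spline with knots xi_1..xi_K:
# h_j(X) = X^(j-1) for j = 1..M, h_{M+l}(X) = (X - xi_l)_+^(M-1) for l = 1..K.
# M = 4 gives the cubic spline; the knot locations below are illustrative.
def truncated_power_basis(x, knots, M=4):
    """Return the N x (M + K) truncated-power basis matrix."""
    cols = [x ** j for j in range(M)]                       # 1, X, ..., X^(M-1)
    cols += [np.maximum(x - xi, 0.0) ** (M - 1) for xi in knots]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 100)
B = truncated_power_basis(x, knots=[0.33, 0.66])  # cubic, 2 knots -> 6 columns
```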
Natural Cubic Spline
A natural cubic spline adds additional constraints: the function is linear beyond the boundary knots.
A natural cubic spline with K knots is
represented by K basis functions.
One can start from a basis for cubic
splines, and derive the reduced basis
by imposing boundary constraints.
Example of a natural cubic spline
Starting from the truncated power series basis, we arrive at:
$$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X),$$
where
$$d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}.$$
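A minimal sketch evaluating this reduced basis (knot placement here is an illustrative assumption):

```python
import numpy as np

# Natural cubic spline basis as derived above: N_1(X) = 1, N_2(X) = X,
# N_{k+2}(X) = d_k(X) - d_{K-1}(X).
def natural_spline_basis(x, knots):
    """Return the N x K natural cubic spline basis matrix (K = len(knots))."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):  # d_k(X) = ((X - xi_k)_+^3 - (X - xi_K)_+^3) / (xi_K - xi_k)
        num = (np.maximum(x - knots[k], 0.0) ** 3
               - np.maximum(x - knots[-1], 0.0) ** 3)
        return num / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]  # d_{K-1} is d(K-2) 0-indexed
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 100)
N = natural_spline_basis(x, knots=np.linspace(0.05, 0.95, 6))  # K = 6 basis functions
```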
An example application (Phoneme Recognition)
Data: 1000 samples drawn from 695 “aa”s and 1022 “ao”s, each with a feature vector of length 256.
Goal: use these data to classify the spoken phoneme.
The coefficients can be plotted as a
function of frequency
Fitted via maximum likelihood alone, the coefficient curve is very rough;
Fitting through natural cubic splines:
Rewrite the coefficient function as an expansion of splines,
$$\beta(f) = \sum_{m=1}^{M} h_m(f)\,\theta_m,$$
that is, $\beta = H\theta$, where $H$ is a $p \times M$ basis matrix of natural cubic splines.
Since $x^T \beta = x^T H \theta$, we replace the input features $x$ by their filtered versions $x^* = H^T x$.
Fit $\theta$ via linear logistic regression on $x^*$.
Final result: the smooth coefficient curve $\hat{\beta}(f) = h(f)^T \hat{\theta}$.
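A hedged sketch of this pipeline, with synthetic placeholders standing in for the phoneme data and reusing the natural_spline_basis helper sketched earlier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The filtering pipeline above, with synthetic placeholders standing in
# for the real "aa"/"ao" log-periodograms; p, M, n are assumed sizes.
p, M, n = 256, 12, 1000
freqs = np.linspace(0.0, 1.0, p)
H = natural_spline_basis(freqs, knots=np.linspace(0.02, 0.98, M))  # p x M

rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))            # placeholder feature vectors
y = rng.integers(0, 2, n)                  # placeholder class labels

X_star = X @ H                             # filtered features x* = H^T x, row by row
clf = LogisticRegression(max_iter=1000).fit(X_star, y)
beta_hat = H @ clf.coef_.ravel()           # smooth coefficient curve beta = H theta
```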
3. Filtering and Feature Extraction
Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm.
The previous example used $x^* = H^T x$, a filtering approach to transform the features;
such transformations need not be linear, but can take the general form $x^* = g(x)$.
Another example: the wavelet transform; see Section 5.9.
4. Smoothing Splines
Purpose: avoid the complexity of the knot-selection problem by using a maximal set of knots.
Complexity is controlled via regularization.
Consider this problem: among all functions with two continuous derivatives, minimize
$$\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2\, dt.$$
Though RSS is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i = 1, \ldots, N$.
The penalty term translates into a penalty on the spline coefficients.
Rewrite the solution as
$$f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j,$$
where the $N_j(x)$ are an $N$-dimensional set of basis functions representing the family of natural splines.
The criterion in matrix form:
$$\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\, \theta^T \Omega_N \theta,$$
where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t)\, N_k''(t)\, dt$.
Using the ridge regression result, the solution is
$$\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y.$$
The fitted smoothing spline is given by
$$\hat{f}(x) = \sum_{j=1}^{N} N_j(x)\,\hat{\theta}_j.$$
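A minimal sketch of this computation, assuming the basis matrix and penalty matrix have been precomputed:

```python
import numpy as np

# Generalized ridge solution above, assuming a precomputed N x N
# natural-spline basis matrix Nmat (knots at the unique x_i) and a
# penalty matrix Omega with entries integral N_j''(t) N_k''(t) dt.
def fit_smoothing_spline(Nmat, Omega, y, lam):
    """Return theta_hat = (N^T N + lam * Omega)^{-1} N^T y."""
    A = Nmat.T @ Nmat + lam * Omega
    return np.linalg.solve(A, Nmat.T @ y)

# Fitted values: f_hat = Nmat @ fit_smoothing_spline(Nmat, Omega, y, lam)
```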
Example of a smoothing spline
Degrees of freedom and the smoother matrix
A smoothing spline with prechosen $\lambda$ is a linear operator.
Let $\hat{f}$ be the $N$-vector of fitted values $\hat{f}(x_i)$ at the training predictors $x_i$:
$$\hat{f} = N(N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda\, y.$$
Here $S_\lambda$ is called the smoother matrix; it depends only on $\lambda$ and the $x_i$.
Suppose $B_\xi$ is an $N \times M$ matrix of $M$ cubic spline basis functions evaluated at the $N$ training points $x_i$, with knot sequence $\xi$. The fitted spline values are given by:
$$\hat{f} = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y = H_\xi\, y.$$
Here the linear operator $H_\xi$ is a projection operator, known as the hat matrix in statistics.
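A minimal sketch, assuming a precomputed basis matrix B:

```python
import numpy as np

# The hat matrix above, for a precomputed N x M basis matrix B.
def hat_matrix(B):
    """H = B (B^T B)^{-1} B^T, the projection onto the column space of B."""
    return B @ np.linalg.solve(B.T @ B, B.T)

# H is idempotent: np.allclose(hat_matrix(B) @ hat_matrix(B), hat_matrix(B))
```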
Similarities and differences between $S_\lambda$ and $H_\xi$:
Both are symmetric and positive semi-definite.
$H_\xi$ is idempotent ($H_\xi H_\xi = H_\xi$), while $S_\lambda S_\lambda \preceq S_\lambda$: $S_\lambda$ shrinks rather than projects.
$\mathrm{Rank}(S_\lambda) = N$, $\mathrm{Rank}(H_\xi) = M$.
The trace of $H_\xi$ gives the dimension of the projection space (the number of basis functions).
Define the effective degrees of freedom as:
$$df_\lambda = \mathrm{trace}(S_\lambda).$$
By specifying $df_\lambda$, we can derive $\lambda$.
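A sketch of solving for $\lambda$ by bisection, assuming the same precomputed matrices as above:

```python
import numpy as np

# Derive lambda from a prechosen df by bisection on trace(S_lambda),
# assuming the precomputed Nmat and Omega from the earlier sketch.
def smoother_matrix(Nmat, Omega, lam):
    """S_lambda = N (N^T N + lam * Omega)^{-1} N^T."""
    A = Nmat.T @ Nmat + lam * Omega
    return Nmat @ np.linalg.solve(A, Nmat.T)

def lambda_for_df(Nmat, Omega, df, lo=1e-8, hi=1e8, iters=60):
    """Bisect on log-lambda; trace(S_lambda) is decreasing in lambda."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint
        if np.trace(smoother_matrix(Nmat, Omega, mid)) > df:
            lo = mid            # too little smoothing: raise lambda
        else:
            hi = mid
    return np.sqrt(lo * hi)
```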
Since $S_\lambda$ is symmetric, we can rewrite it as
$$S_\lambda = (I + \lambda K)^{-1},$$
where $\hat{f} = S_\lambda y$ solves $\min_f\, (y - f)^T (y - f) + \lambda f^T K f$.
$K$ is known as the penalty matrix.
The eigen-decomposition of $S_\lambda$ is given by:
$$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \qquad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k},$$
where $d_k$ and $u_k$ are the eigenvalues and eigenvectors of $K$.
Highlights of the eigen-decomposition
The eigenvectors are not affected by changes in $\lambda$.
Shrinking nature: $S_\lambda y = \sum_{k=1}^{N} u_k\, \rho_k(\lambda)\, \langle u_k, y \rangle$, so each eigen-component of $y$ is shrunk by the factor $\rho_k(\lambda)$.
The eigenvector sequence, ordered by decreasing $\rho_k(\lambda)$, appears to increase in complexity.
The first two eigenvalues are always 1, since $d_1 = d_2 = 0$: linear functions are never penalized. A numerical check is sketched below.
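A small numerical check, assuming a precomputed penalty matrix K:

```python
import numpy as np

# Check the shrinking eigen-structure numerically: with
# S_lambda = (I + lam * K)^{-1}, its eigenvalues are rho_k = 1 / (1 + lam * d_k),
# where d_k are the eigenvalues of the penalty matrix K.
def smoother_eigenvalues(K, lam):
    d = np.linalg.eigvalsh(K)          # eigenvalues of the penalty matrix
    return 1.0 / (1.0 + lam * d)       # rho_k(lambda), each in (0, 1]

# The two zero eigenvalues of K (linear functions) give rho = 1: unpenalized.
```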
Figure: a cubic smoothing spline fit to some data
5. Automatic selection of the
smoothing parameters
Selecting the placement and number
of knots for regression splines can be
a combinatorially complex task;
For smoothing splines, only the penalty $\lambda$ needs to be selected.
One method: fix the degrees of freedom and solve for $\lambda$ from $df_\lambda = \mathrm{trace}(S_\lambda)$.
Criterion: the bias-variance tradeoff.
The Bias-Variance Tradeoff
Integrated squared prediction error (EPE):
$$\mathrm{EPE}(\hat{f}_\lambda) = \mathrm{E}(Y - \hat{f}_\lambda(X))^2 = \sigma_\varepsilon^2 + \mathrm{MSE}(\hat{f}_\lambda),$$
combining both bias and variance.
Cross-validation:
$$\mathrm{CV}(\hat{f}_\lambda) = \frac{1}{N}\sum_{i=1}^{N} \bigl(y_i - \hat{f}_\lambda^{(-i)}(x_i)\bigr)^2 = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{y_i - \hat{f}_\lambda(x_i)}{1 - S_\lambda(i,i)}\right)^2.$$
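A sketch of this shortcut, which scores $\lambda$ from a single fit via the diagonal of $S_\lambda$:

```python
import numpy as np

# Leave-one-out CV shortcut above: score lambda from a single fit,
# using only the diagonal of the smoother matrix S_lambda.
def loocv_score(S, y):
    """CV(lambda) = mean of ((y_i - f_hat(x_i)) / (1 - S_ii))^2."""
    f_hat = S @ y
    resid = (y - f_hat) / (1.0 - np.diag(S))
    return np.mean(resid ** 2)

# Choose lambda by minimizing loocv_score(smoother_matrix(Nmat, Omega, lam), y)
# over a grid of candidate lambda values.
```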
An example:
Figure: EPE and CV curves for different degrees of freedom
Any questions?