Generalized Cross Validation

Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Generalized Cross Validation
Mårten Marcus
26 november 2009
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
References
Spline models for Observational Data (1990) Grace Wahba
Optimal Estimation of Contour Properties by Cross-Validated
Regularization (1989) - Behzad Shahraray, David Anderson
Smoothing Noisy Data with Spline Function (1979) - Peter
Craven, Grace Wahba
Chapter 4 of
Mårten Marcus
Generalized Cross Validation
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Conditional Expectations
From probability theory, we have for
X ∈ L2 {Ω, F, P}
E[X ] = argmin E (X − θ)2
(1)
θ∈R
The least squares estimate therefore gives discretized estimate of
θ
and a natural norm to choose when searching for data disturbed by
white noise
Mårten Marcus
Generalized Cross Validation
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Ill-posedness
Consider the model
m
yi = g (ti ) + i , i = 1, 2, . . . , n, ti ∈ [0, 1]
m
m
0 , f 00 , . . . , f ( −1) }abs .cont ., f ( )
( )
g ∈ W2 = {f |f
{i ∼ WN (0, σ 2 )}
where
σ
∈ L2 [0, 1]
where
and
is unknown.
The least squares estimate gives
min
n
X
gˆ ∈W2 i =1
(m )
(yi − gˆi )2 = 0
∀{yi }ni=1
i
The minimum does not depend on data and is clearly ill-posed.
Mårten Marcus
Generalized Cross Validation
(2)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Regularization
We may choose to regularize, or smooth, the data as
ĝn,λ
Z 1
n
X
2
= argmin
(yi − ĝ (ti )) + λ
(ĝ (m) (t ))2 dt
ĝ ∈W2 i =1
0
(m )
i
λ≥0
(3)
which has a unique solution.
Mårten Marcus
Generalized Cross Validation
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Properties of
For
Discussion
ĝn,λ
The reason for the term
For
Generalized Cross-Validation
λ = 0, ĝn,λ
smoothing spline
is the following
can be seen as an interpolating spline
λ = ∞, ĝn,λ
is a single polynomial of degree
m−1
(optimal in the least squares sense)
For 0
< λ < ∞,
it can be shown that
polynomials of degree at most 2
[ti , ti +1 ],
i ∈ {1, 2, . . . , n − 1}
m−1
ĝn,λ
is composed by
on the intevals
such that the function and its
m − 2 derivative are
(k ) (t ) = 0 for
1) = f
n
derivatives up to and including the 2
continuous at the knots, and
k = {m, m + 1, . . . , 2m − 2},
f
k
( ) (t
i.e. natural conditions in the end
points.
From which we can see that there is a strong dependence between
the quality of the result and a good choice of
Mårten Marcus
λ.
Generalized Cross Validation
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Examples
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Examples
Example
λ on ĝn,λ and gˆ0 n,λ for
πt
g (t ) = sin( 180
), t ∈ [0, 360]
Example of impact of dierent
yi = g (ti ) + i
where
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Dening the Optimal λ
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Dening the Optimal λ
True Mean Square Error
Generalized Cross-Validation
Discussion
R (λ) and Optimal λ
Using the notation above we dene the true mean square error,
R (λ),
as
R (λ) :=
The optimal
λ
1
n
n
X
i =1
(ĝn,λ (ti ) − gt )2
i
(4)
is then dened as
λ∗ = argmin R (λ)
λ∈R+
Mårten Marcus
Generalized Cross Validation
(5)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Cross Validation
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Cross Validation
Intuition
Given our sample of
n
measurements, we leave one measurement
out and predict that data point by the use of the
n−1
remaining
measurements.
The idea is now that the best modell for the measurements is the
one that best predicts each measurement as a function of the
others.
Mårten Marcus
Generalized Cross Validation
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Cross Validation
The Ordinary Cross-Validation (OCV)
Denition
Let
ĝnk,λ ,
k
n
ĝ [ ,λ]
=
k ∈ {1, 2, . . . , n}
argmin
n
X
ĝ ∈W2 ii 6==k1
(m )
be dened as
(yi − ĝ (ti )) + λ
2
i
Z
0
1
(ĝ (m) (t ))2 dt
λ≥0
(6)
Then we dene the OCV mean square error as
V0 (λ) =
1
n
n
X
i =1
Mårten Marcus
(ĝn[i,λ] (ti ) − yi )2
Generalized Cross Validation
(7)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Cross Validation
The Ordinary Cross-Validation (OCV)
Denition
Let
V0 (λ)
be dened as in the previous denition, the Ordinary
Cross-Validation Estimate of
λ
is
λ0 = argmin V0 (λ)
λ∈R+
Mårten Marcus
Generalized Cross Validation
(8)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Cross Validation
Why we need more
The OCV estimate have proven good accuracy for certain periodic
functions, but in general it lacks a feature shown by the following:
m
( )
{g (t ) ∈ W2 , g (t ) = g (t + 1)} sampled
{tj = nj , j ∈ {1, 2, . . . , n}} adding some
Consider the function
equidistantly e.g.
uniform noise.
In this case all data are treated symmetrically. On the other hand,
dropping the conditions of periodicity and equidistant sampling, the
points interact dierently, giving rise to the need of weighting the
samples dierently.
Mårten Marcus
Generalized Cross Validation
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Generalized Cross Validation (GCV)
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Generalized Cross Validation (GCV)
Rewriting
V0
There exists an
n×n
A(λ),
matrix
the
inuence matrix,
with the
property
ĝn,λ (t1 )
 ĝn,λ (t2 ) 






Such that
V0 (λ)
ĝn,λ (tn )
 = A(λ)y

(9)
can be rewritten as
V0 (λ) =
Where
.
.
.
1
n
akj , k , j ∈ 1, 2, . . . , n
n
X
k =1
Pn
i =1 (akj yj − yk )2
(1 − akk )2
is element
Mårten Marcus
{k , j }
of
A(λ)
Generalized Cross Validation
(10)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Generalized Cross Validation (GCV)
The Generalized Cross Validation (GCV)
Denition
Let
A(λ)
be the inuence matrix dened above, then the GCV
function is dened as
1
||(I − A(λ))y ||2
2
tr
(I
−
A
(λ))
n
V (λ) = n
1
We say that the Generalized Cross-Validation Estimate of
λ̄ = argmin V (λ)
λ∈R+
Mårten Marcus
Generalized Cross Validation
(11)
λ
is
(12)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Generalized Cross Validation (GCV)
Similarity to OCV
By rewriting
V (λ)
we get
V (λ) =
where
wk (λ)
1
n
n
X
i =1
(13)
are given by
wk (λ) =
wk (λ)
(ĝn[i,λ] (ti ) − yi )2 wk (λ)
− akk (λ)
1
n tr (I − A(λ))
1
have been shown to adjust
V0 (λ)
!2
for periodicity and
non-equidistant samples.
Mårten Marcus
Generalized Cross Validation
(14)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Generalized Cross Validation (GCV)
Comparison
Comparison of the true mean square error with the GCV function
for the example
t ),
g (t ) = sin( 180
π
λ
on
ĝn,λ
and
t ∈ [0, 360]
gˆ0 n,λ
Mårten Marcus
for
yi = g (ti ) + i
Generalized Cross Validation
where
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Convergence Result
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Discussion
Convergence Result
Convergence Result
Theorem
g (·) ∈ W2(m) and let {ti }ni=1 satisfy 0t w (u )du = ni , where
w (u ) is a strictly positive continuous weight function. Then there
∞
∗ ∞
exists sequences {λ̄n }n=1 and {λn }n=1 of minimas of E[V (λ)] and
E[R (λ)] such that the expectation ineciency I ∗ ,
R
Let
I∗ =
E[V (λ̄n )]
→ 1,
E[R (λ∗n )]
Mårten Marcus
as
i
n→∞
Generalized Cross Validation
(15)
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
Plan
1
Generals
2
What Regularization Parameter
λ
is Optimal?
Examples
Dening the Optimal
3
λ
Generalized Cross-Validation
Cross Validation
Generalized Cross Validation (GCV)
Convergence Result
4
Discussion
Mårten Marcus
Generalized Cross Validation
Discussion
Innehåll
Generals
What Regularization Parameter λ is Optimal?
Generalized Cross-Validation
An implicit assumption that the true function
g (t )
Discussion
is smooth
GCV was applied eectively to problems such as:
Finding the right order of splines in regression
Regularized solution of the Fredholm integral equations of the
rst kind
It provides an estimate of the regularization parameter from
general assumptions on the data.
Mårten Marcus
Generalized Cross Validation