Regularization Parameter Estimation for Least Squares:
A Newton Method using the χ²-distribution
Rosemary Renaut, Jodi Mead
Arizona State and Boise State
September 2007
Outline

1. Introduction: Ill-posed least squares
   - Some standard methods
2. A statistically based method: the Chi-squared method
   - Background
   - Algorithm
   - Single variable Newton method
   - Extension for general D: Generalized Tikhonov
   - Observations
3. Results
4. Conclusions
5. References
Regularized Least Squares for Ax = b

Assume A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, and the system is ill-posed.

Generalized Tikhonov regularization, with an operator D acting on x:
  $\hat{x} = \arg\min J(x) = \arg\min\{ \|Ax - b\|^2_{W_b} + \|D(x - x_0)\|^2_{W_x} \}$.   (1)

Assume N(A) ∩ N(D) = ∅.

Statistically, W_b is the inverse covariance matrix for the data b.

Standard choice: W_x = λ²I, with λ an unknown penalty parameter, so (1) becomes
  $\hat{x}(\lambda) = \arg\min J(x) = \arg\min\{ \|Ax - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2 \}$.   (2)

Question: What is the correct λ?
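Before the parameter-selection methods, a minimal sketch (not from the slides) of evaluating x̂(λ) in (2) for one fixed λ via the regularized normal equations; all array names are illustrative.

```python
import numpy as np

def tikhonov_solve(A, b, Wb, D, x0, lam):
    """Solve min ||Ax - b||^2_Wb + lam^2 ||D(x - x0)||^2 for a fixed lam
    via the normal equations (a sketch, not the authors' implementation)."""
    lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
    rhs = A.T @ Wb @ b + lam**2 * (D.T @ D @ x0)
    return np.linalg.solve(lhs, rhs)

# Tiny shape check with random data.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x_hat = tikhonov_solve(A, b, np.eye(20), np.eye(10), np.zeros(10), lam=1.0)
```

The whole question addressed by the talk is how to choose lam in such a solve.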
Some standard approaches I: L-curve - find the corner

Let r(λ) = (A(λ) − A)b, with influence matrix
  $A(\lambda) = A (A^T W_b A + \lambda^2 D^T D)^{-1} A^T$.

Plot $(\log\|Dx\|, \log\|r(\lambda)\|)$ and find the corner.

- Trades off the two contributions.
- Expensive: requires a range of λ values.
- The GSVD makes the calculations efficient.
- Not statistically based.

[Figures: L-curves with a clear corner and with no corner.]
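A hedged sketch of tabulating L-curve points over a range of λ. Direct solves are used instead of the GSVD shortcut, the residual is taken as b − Ax(λ) measured in the W_b-norm, and all names are illustrative.

```python
import numpy as np

def l_curve_points(A, b, Wb, D, x0, lambdas):
    """Tabulate (log ||D x(lam)||, log ||b - A x(lam)||_Wb) over lambdas.

    Direct solves for clarity; the GSVD would let one factorization be
    reused for every lambda, as the slide notes.
    """
    pts = []
    for lam in lambdas:
        lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
        rhs = A.T @ Wb @ b + lam**2 * (D.T @ D @ x0)
        x = np.linalg.solve(lhs, rhs)
        r = b - A @ x
        pts.append((np.log(np.linalg.norm(D @ x)),
                    np.log(np.sqrt(float(r @ Wb @ r)))))
    return np.array(pts)

# The corner of the resulting curve is then located, e.g. by maximum curvature.
```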
Some standard approaches II: Generalized Cross-Validation (GCV)

Minimizes the GCV function
  $\frac{\|b - Ax(\lambda)\|^2_{W_b}}{[\mathrm{trace}(I_m - A(\lambda))]^2}$,
which estimates the predictive risk.

- Expensive: requires a range of λ values.
- The GSVD makes the calculations efficient.
- Statistically based.
- Requires finding a minimum.

[Figures: GCV curves with multiple minima and with a flat region near the minimum.]
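A sketch of evaluating the GCV function for one λ. Folding the data weighting W_b into the influence matrix so that A(λ)b reproduces Ax(λ) when x0 = 0 is an implementation choice here; the GSVD would make repeated evaluations cheap, as the slide notes.

```python
import numpy as np

def gcv(A, b, Wb, D, x0, lam):
    """GCV(lam) = ||b - A x(lam)||^2_Wb / [trace(I_m - A(lam))]^2 (sketch)."""
    m = A.shape[0]
    lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
    x = np.linalg.solve(lhs, A.T @ Wb @ b + lam**2 * (D.T @ D @ x0))
    infl = A @ np.linalg.solve(lhs, A.T @ Wb)     # influence matrix A(lam)
    r = b - A @ x
    return float(r @ Wb @ r) / np.trace(np.eye(m) - infl) ** 2

# One would evaluate gcv on a grid of lambda values and keep the minimizer.
```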
Some standard approaches III: Unbiased Predictive Risk Estimation (UPRE)

Minimize the expected value of the predictive risk, i.e. minimize the UPRE function
  $\|b - Ax(\lambda)\|^2_{W_b} + 2\,\mathrm{trace}(A(\lambda)) - m$.

- Expensive: requires a range of λ values.
- The GSVD makes the calculations efficient.
- Statistically based.
- Requires finding a minimum.
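The corresponding sketch for the UPRE function; it differs from the GCV sketch above only in the final expression, and the same caveats about the influence matrix apply.

```python
import numpy as np

def upre(A, b, Wb, D, x0, lam):
    """UPRE(lam) = ||b - A x(lam)||^2_Wb + 2 trace(A(lam)) - m (sketch)."""
    m = A.shape[0]
    lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
    x = np.linalg.solve(lhs, A.T @ Wb @ b + lam**2 * (D.T @ D @ x0))
    infl = A @ np.linalg.solve(lhs, A.T @ Wb)     # influence matrix A(lam)
    r = b - A @ x
    return float(r @ Wb @ r) + 2.0 * np.trace(infl) - m
```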
Development

A new statistically based method: the Chi-squared method
- Its background
- A Newton algorithm
- Some examples
- Future work
General Result: Tikhonov (D = I) - the cost functional at the minimum is a χ² random variable

Theorem (Rao 1973; Tarantola; Mead 2007)
Let
  $J(x) = (b - Ax)^T C_b^{-1} (b - Ax) + (x - x_0)^T C_x^{-1} (x - x_0)$,
where
- x and b are stochastic (they need not be normal),
- the components of r = b − Ax_0 are i.i.d. (assume no components are zero),
- the matrices $C_b = W_b^{-1}$ and $C_x = W_x^{-1}$ are SPD.
Then, for large m, the minimum value of J is a random variable that follows a χ² distribution with m degrees of freedom.
Implications

The theorem implies
  $m - \sqrt{2}\,z_{\alpha/2} < J(\hat{x}) < m + \sqrt{2}\,z_{\alpha/2}$
for a (1 − α) confidence interval, with x̂ the solution.

Equivalently, when D = I,
  $m - \sqrt{2}\,z_{\alpha/2} < r^T (A C_x A^T + C_b)^{-1} r < m + \sqrt{2}\,z_{\alpha/2}$.

Note there are no assumptions on W_x: it is completely general.

Can we use this result to obtain an efficient algorithm?
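A small sketch of the acceptance test the theorem implies: compute the minimum value J(x̂) (or its D = I form r^T(A C_x A^T + C_b)^{-1} r) and check whether it lies in the (1 − α) interval around m. The half-width √2 z_{α/2} is taken directly from the slide; scipy is assumed available for the normal quantile, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def chi2_functional(A, r, Cx, Cb):
    """D = I form of the minimum value: J(x_hat) = r^T (A Cx A^T + Cb)^{-1} r."""
    return float(r @ np.linalg.solve(A @ Cx @ A.T + Cb, r))

def in_chi2_interval(J_min, m, alpha=0.05):
    """Check m - sqrt(2) z_{alpha/2} < J_min < m + sqrt(2) z_{alpha/2},
    the (1 - alpha) interval stated on the slide."""
    half_width = np.sqrt(2.0) * norm.ppf(1.0 - alpha / 2.0)
    return (m - half_width) < J_min < (m + half_width)
```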
First attempt: a new algorithm for estimating the model covariance

Algorithm (Mead 07)
Given the confidence interval parameter α, the initial residual r = b − Ax_0, and an estimate of the data covariance C_b, find L_x which solves the nonlinear optimization:

  Minimize    $\|L_x L_x^T\|_F^2$
  Subject to  $m - \sqrt{2}\,z_{\alpha/2} < r^T (A L_x L_x^T A^T + C_b)^{-1} r < m + \sqrt{2}\,z_{\alpha/2}$,
              $A L_x L_x^T A^T + C_b$ well-conditioned.

Expensive.
Single Variable Approach: seek an efficient, practical algorithm

Let $W_x = \sigma_x^{-2} I$, where the regularization parameter is $\lambda = 1/\sigma_x$.

Use the SVD $U_b \Sigma_b V_b^T = W_b^{1/2} A$, with singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p$, and define $s = U_b^T W_b^{1/2} r$.

Find σ_x such that
  $m - \sqrt{2}\,z_{\alpha/2} < s^T \mathrm{diag}\Big(\frac{1}{\sigma_i^2\sigma_x^2 + 1}\Big) s < m + \sqrt{2}\,z_{\alpha/2}$.

Equivalently, find σ_x such that
  $F(\sigma_x) = s^T \mathrm{diag}\Big(\frac{1}{1 + \sigma_x^2\sigma_i^2}\Big) s - m = 0$.

Scalar root finding: Newton's method.
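A sketch, under the stated assumptions, of assembling s and the scalar function F(σ_x) from the SVD quantities on this slide. W_b is taken to be diagonal purely to keep its square root trivial, and the helper names are illustrative; the Newton update with the analytical derivative is sketched after the next slide.

```python
import numpy as np

def build_s(A, b, x0, wb_diag):
    """s = U_b^T W_b^{1/2} r with r = b - A x0 (W_b assumed diagonal here)."""
    Wb_half = np.diag(np.sqrt(wb_diag))
    Ub, sing_vals, _ = np.linalg.svd(Wb_half @ A)   # full U_b is m x m
    r = b - A @ x0
    return Ub.T @ (Wb_half @ r), sing_vals

def chi2_F(sigma_x, s, sing_vals):
    """F(sigma_x) = s^T diag(1/(1 + sigma_x^2 sigma_i^2)) s - m,
    with sigma_i = 0 for indices beyond the p computed singular values."""
    sv = np.zeros(s.size)
    sv[:sing_vals.size] = sing_vals
    weights = 1.0 / (1.0 + (sigma_x * sv) ** 2)
    return float(s @ (weights * s)) - s.size
```

When F(0) > 0 a root exists and a bracketing solver or the Newton iteration below can be applied; the case F(0) < 0 is discussed on a later slide.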
Extension to Generalized Tikhonov

Define
  $\hat{x}_{GTik} = \arg\min J_D(x) = \arg\min\{ \|Ax - b\|^2_{W_b} + \|D(x - x_0)\|^2_{W_x} \}$.   (3)

Theorem
For large m, the minimum value of J_D is a random variable which follows a χ² distribution with m − n + p degrees of freedom (assuming that no components of r are zero).

Proof.
Use the Generalized Singular Value Decomposition of $[W_b^{1/2} A,\; W_x^{1/2} D]$.

Goal: find W_x such that J_D is χ² with m − n + p degrees of freedom.
Newton Root Finding, $W_x = \sigma_x^{-2} I_p$

Let the GSVD of $[W_b^{1/2} A,\; D]$ be
  $A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n)\times n} \end{bmatrix} X^T, \qquad D = V\,[M,\; 0_{p\times(n-p)}]\,X^T$,
where the γ_i are the generalized singular values. Define
  $\tilde m = m - n + p - \sum_{i=1}^{p} s_i^2\,\delta_{\gamma_i 0} - \sum_{i=n+1}^{m} s_i^2$,
  $\tilde s_i = s_i/(\gamma_i^2\sigma_x^2 + 1)$, $i = 1,\dots,p$, and $t_i = \tilde s_i\,\gamma_i$.

Find the root of
  $\sum_{i=1}^{p}\Big(\frac{1}{\gamma_i^2\sigma_x^2+1}\Big) s_i^2 + \sum_{i=n+1}^{m} s_i^2 = m - n + p$,
i.e. solve $F(\sigma_x) = 0$, where
  $F(\sigma_x) = s^T\tilde{s} - \tilde m$  and  $F'(\sigma_x) = -2\sigma_x\|t\|_2^2$.
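A minimal sketch of the Newton iteration itself, using F′(σ_x) = −2σ_x‖t‖₂² as quoted above. The inputs s (the entries paired with the γ_i), γ, and m̃ are assumed to have already been computed from the GSVD, and no step-size safeguarding is included.

```python
import numpy as np

def newton_sigma(s_p, gamma, m_tilde, sigma0=1.0, tol=1e-8, max_iter=50):
    """Newton iteration for F(sigma_x) = s^T s_tilde - m_tilde = 0 (sketch),
    with s_tilde_i = s_i / (gamma_i^2 sigma_x^2 + 1) and t_i = s_tilde_i gamma_i."""
    sigma = sigma0
    for _ in range(max_iter):
        s_tilde = s_p / (gamma**2 * sigma**2 + 1.0)
        t = s_tilde * gamma
        F = float(s_p @ s_tilde) - m_tilde
        dF = -2.0 * sigma * float(t @ t)          # F'(sigma_x)
        if abs(F) < tol or dF == 0.0:
            break
        sigma -= F / dF
    return sigma
```

The regularization parameter is then λ = 1/σ_x; the tables later in the talk report typical iteration counts for this loop.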
An Illustrative Example: phillips Fredholm integral equation (Hansen)

Example with 10% error. Add noise to b:
- standard deviation $\sigma_{b_i} = .01|b_i| + .1\,b_{\max}$,
- covariance matrix $C_b = \sigma_b^2 I_m = W_b^{-1}$, with $\sigma_b^2$ the average of the $\sigma_{b_i}^2$.

[Figure: − is the original b and ∗ is the noisy data.]
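A sketch of the noise model on this slide; the right-hand side b of the phillips test problem is assumed to be available already (e.g. from a port of Hansen's Regularization Tools), and b_max is interpreted here as max|b_i|.

```python
import numpy as np

def add_noise(b, rng=None):
    """Per-component std sigma_{b_i} = .01|b_i| + .1 b_max, then
    Cb = sigma_b^2 I with sigma_b^2 the average of the sigma_{b_i}^2."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_bi = 0.01 * np.abs(b) + 0.1 * np.max(np.abs(b))
    b_noisy = b + sigma_bi * rng.standard_normal(b.size)
    sigma_b2 = np.mean(sigma_bi ** 2)
    Cb = sigma_b2 * np.eye(b.size)     # Cb = Wb^{-1}
    return b_noisy, Cb
```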
An Illustrative Example: phillips Fredholm integral equation (Hansen)

Comparison with the new method. Compare solutions:
- + is the reference x_0, −− is the exact solution, o is the L-curve solution;
- three other solutions: UPRE, GCV and the χ² method (blue, magenta, black).

Each method gives a different solution, but UPRE, GCV and χ² are comparable.
Observations: Example F

- Initialization: GCV, UPRE, L-curve and the χ² method all use the GSVD (or SVD).
- The algorithm is cheap compared to GCV, UPRE and the L-curve.
- F is monotonically decreasing for σ > 0 and is an even function of σ.
- Either a solution exists and is unique for positive σ, or no solution exists (F(0) < 0).
- Theoretically, lim_{σ→∞} F > 0 is possible; this is equivalent to λ = 0, i.e. no regularization is needed.
Remark on F(0) < 0

Notice that when F(0) < 0, m̃ is too big relative to J; equivalently, there are insufficient degrees of freedom.

Notice that
  $J(\hat{x}) = \|P^{1/2} s\|_2^2$, where $P = \mathrm{diag}\Big(\frac{1}{(\gamma_i\sigma)^2+1},\; 0_{n-p},\; I_{m-n}\Big)$.

In particular $J(\hat{x}(0)) = \|P^{1/2}(0)\,s\|_2^2 = y$ for some y. If y < m̃, set m̃ = floor(y).

The theorem is revised to: $\tilde m = \min\{\mathrm{floor}(J(0)),\; m - n + p\}$.

In the example, J(0) ≈ 39 and F(0) ≈ −461; in the figure on the right, m̃ = 38.
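A sketch of this revised degrees-of-freedom rule; s is assumed to be ordered as in the GSVD partition above (p entries paired with the γ_i, then n − p, then m − n), and the function name is illustrative.

```python
import numpy as np

def adjusted_dof(s, m, n, p):
    """m_tilde = min(floor(J(0)), m - n + p). At sigma = 0 the weights
    1/((gamma_i*0)^2 + 1) are 1, so J(0) is the sum of the first p and the
    last m - n squared entries of s (the middle n - p entries get weight 0)."""
    J0 = float(s[:p] @ s[:p]) + float(s[n:] @ s[n:])
    return min(int(np.floor(J0)), m - n + p)
```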
Example: Seismic Signal Restoration

- Real data set of 48 signals of length 500.
- The point spread function is derived from the signals.
- Solve Pf = g, where P is the PSF matrix and g is the signal, and restore f.
- Calculate the signal variance pointwise over all 48 signals.
- Compare restoration of the S-wave with derivative orders 0, 1, 2.
- Weighting matrices are $I$, $\sigma_g^{-2} I$, and $\mathrm{diag}(\sigma_{g_i}^{-2})$: cases 1, 2, and 3.
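A sketch of setting up the three data-weighting cases from a stack of signals; the array layout (48 × 500) follows the slide, while taking the case-2 scalar variance as the mean of the pointwise variances is an assumption.

```python
import numpy as np

def weighting_cases(signals):
    """signals: shape (48, 500). Returns Wb for cases 1-3:
    I, sigma_g^{-2} I, and diag(sigma_{g_i}^{-2})."""
    var_pointwise = np.var(signals, axis=0)        # pointwise over the 48 signals
    n_pts = signals.shape[1]
    W1 = np.eye(n_pts)                             # case 1
    W2 = np.eye(n_pts) / var_pointwise.mean()      # case 2 (scalar variance assumed)
    W3 = np.diag(1.0 / var_pointwise)              # case 3
    return W1, W2, W3
```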
Tikhonov Regularization

Observations:
- The reduced degrees of freedom are relevant!
- The degrees of freedom are found automatically.
- Cases 1 and 2 have different solutions.
- Case 3 preserves the features of the signal.
First and Second Order Derivative Restoration

Observations:
- Here derivative smoothing is not desirable.
- Case 3 preserves the signal characteristics.
- The value given is λ, the regularization parameter.
- λ increases with the derivative order - more smoothing.
Comparison with L-curve and UPRE Solutions

Observations:
- The L-curve underestimates λ severely.
- UPRE and χ² are similar when the degrees of freedom are limited in the χ² method.
- UPRE underestimates λ for case 2 and case 3 weighting.
Newton's Method converges in 5-10 Iterations

                 Iterations k
  l   cb      mean        std
  0    1    8.23e+00   6.64e-01
  0    2    8.31e+00   9.80e-01
  0    3    8.06e+00   1.06e+00
  1    1    4.92e+00   5.10e-01
  1    2    1.00e+01   1.16e+00
  1    3    1.00e+01   1.19e+00
  2    1    5.01e+00   8.90e-01
  2    2    8.29e+00   1.48e+00
  2    3    8.38e+00   1.50e+00

Table: Convergence characteristics for problem phillips with n = 40 over 500 runs.
Newton's Method converges in 5-10 Iterations

                 Iterations k
  l   cb      mean        std
  0    1    6.84e+00   1.28e+00
  0    2    8.81e+00   1.36e+00
  0    3    8.72e+00   1.46e+00
  1    1    6.05e+00   1.30e+00
  1    2    7.40e+00   7.68e-01
  1    3    7.17e+00   8.12e-01
  2    1    6.01e+00   1.40e+00
  2    2    7.28e+00   8.22e-01
  2    3    7.33e+00   8.66e-01

Table: Convergence characteristics for problem blur with n = 36 over 500 runs.
Estimating The Error and Predictive Risk

Error (mean)
  l   cb      χ²        L-curve      GCV        UPRE
  0    2    4.37e-03   4.39e-03   4.21e-03   4.22e-03
  0    3    4.32e-03   4.42e-03   4.21e-03   4.22e-03
  1    2    4.35e-03   5.17e-03   4.30e-03   4.30e-03
  1    3    4.39e-03   5.05e-03   4.38e-03   4.37e-03
  2    2    4.50e-03   6.68e-03   4.39e-03   4.56e-03
  2    3    4.37e-03   6.66e-03   4.43e-03   4.54e-03

Table: Error characteristics for problem phillips with n = 60 over 500 runs with error-contaminated x0. Relative errors larger than .009 removed.

Results are comparable.
Estimating The Error and Predictive Risk

Risk (mean)
  l   cb      χ²        L-curve      GCV        UPRE
  0    2    3.78e-02   5.22e-02   3.15e-02   2.92e-02
  0    3    3.88e-02   5.10e-02   2.97e-02   2.90e-02
  1    2    3.94e-02   5.71e-02   3.02e-02   2.74e-02
  1    3    1.10e-01   5.90e-02   3.27e-02   2.79e-02
  2    2    3.41e-02   6.00e-02   3.35e-02   3.79e-02
  2    3    3.61e-02   5.98e-02   3.35e-02   3.82e-02

Table: Predictive risk characteristics for problem phillips with n = 60 over 500 runs.

The χ² method does not give the best estimate of the risk.
Estimating The Error and Predictive Risk

[Figure: error histogram. Normal noise on the right-hand side, first order derivative, Cb = σ²I.]
Estimating The Error and Predictive Risk

[Figure: error histogram. Exponential noise on the right-hand side, first order derivative, Cb = σ²I.]
Conclusions

- The χ² Newton algorithm is cost effective.
- It performs as well as (or better than) GCV and UPRE when statistical information is available.
- It should be the method of choice when statistical information is provided.
- The method can be adapted to find W_b if W_x is provided.
Future Work

- Analyse truncated expansions (TSVD and TGSVD), which reduce the degrees of freedom.
- Further theoretical analysis and simulations with other noise distributions; comparison with the new work of Rust & O'Leary (2007).
- Can the method be extended to nonlinear regularization terms (e.g. total variation)?
- Development of the nonlinear least squares approach for general diagonal W_x.
- Efficient calculation of uncertainty information (the covariance matrix).
- Nonlinear problems?
Some Solutions: with no prior information x0

Illustrated are solutions and error bars.

[Figures: no statistical information - the solution is smoothed; with statistical information, Cb = diag(σ_{b_i}²).]
Some Generalized Tikhonov Solutions: First Order Derivative

[Figures: no statistical information; Cb = diag(σ_{b_i}²).]
Some Generalized Tikhonov Solutions: Prior x0 - the solution is not smoothed

[Figures: no statistical information; Cb = diag(σ_{b_i}²).]
Some Generalized Tikhonov Solutions: x0 = 0, Exponential Noise

[Figures: no statistical information; Cb = diag(σ_{b_i}²).]
Relationship to the Discrepancy Principle

The discrepancy principle can also be implemented by a Newton method. It finds σ_x such that the regularized residual satisfies
  $\sigma_b^2 = \frac{1}{m}\,\|b - Ax(\sigma)\|_2^2$.   (4)

Consistent with our notation,
  $\sum_{i=1}^{p}\Big(\frac{1}{\gamma_i^2\sigma^2+1}\Big)^{2} s_i^2 + \sum_{i=n+1}^{m} s_i^2 = m$.   (5)

The weight in the first sum is squared here; otherwise the functional is the same.

But the discrepancy principle often oversmooths. What happens here?
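For comparison, a sketch of the left-hand side of (5) minus m as a scalar function of σ, in the same GSVD notation and with s ordered as in the earlier partition; a root of this function gives the discrepancy-principle parameter. The only difference from the χ² function F is the squared weight.

```python
import numpy as np

def discrepancy_F(sigma, s, gamma, n):
    """Left-hand side of (5) minus m: squared weights on the first p terms,
    plus the trailing residual terms (a sketch in the slide's notation)."""
    p, m = gamma.size, s.size
    w = 1.0 / (gamma**2 * sigma**2 + 1.0)
    return float((w**2) @ (s[:p]**2)) + float(np.sum(s[n:]**2)) - m
```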
Major References

Bennett, A., 2005. Inverse Modeling of the Ocean and Atmosphere, Cambridge University Press.

Hansen, P. C., 1994. Regularization Tools: A Matlab Package for Analysis and Solution of Discrete Ill-posed Problems, Numerical Algorithms 6, 1-35.

Mead, J., 2007. A priori weighting for parameter estimation, J. Inv. Ill-posed Problems, to appear.

Rao, C. R., 1973. Linear Statistical Inference and its Applications, Wiley, New York.

Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM.

Vogel, C. R., 2002. Computational Methods for Inverse Problems, SIAM, Frontiers in Applied Mathematics.
blur Atmospheric (Gaussian PSF) (Hansen): Again with noise
Solution on Left and Degraded on the Right
blur Atmospheric (Gaussian PSF) (Hansen): Again with noise
Solutions using x0 = 0, Generalized Tikhonov Second Derivative 5% noise
blur Atmospheric (Gaussian PSF) (Hansen): Again with noise
Solutions using x0 = 0, Generalized Tikhonov Second Derivative 10% noise