### Towards Solution of Large Scale Image Restoration and Reconstruction Problems

```Motivation Statistical Results for Least Squares Newton algorithm Large Scale Problems Stopping the EM Algorithm Statistically Edge Detection for PSF Estimation
Towards Solution of Large Scale Image Restoration and Reconstruction Problems
Rosemary Renaut
Joint work with Anne Gelb, Aditya Viswanathan, Hongbin Guo, Doug Cochran, and Youzuo Lin
Arizona State University
November 4, 2009
National Science Foundation: Division of Computational Mathematics
Outline
1. Motivation
   Quick Review
2. Statistical Results for Least Squares
   Summary of LS Statistical Results
   Implications of Statistical Results for Regularized Least Squares
3. Newton algorithm
   Algorithm with LSQR (Paige and Saunders)
   Results
4. Large Scale Problems
   Application in Image Reconstruction and Restoration
5. Stopping the EM Algorithm Statistically
6. Edge Detection for PSF Estimation
Signal/Image Restoration:
Integral model of signal degradation:
$b(t) = \int K(t, s)\, x(s)\, ds$
K(t, s) describes the blur of the signal.
Convolutional model: the invariant kernel K(t, s) = K(t − s) is the Point Spread Function (PSF).
Typically sampling includes noise e(t), so the model is
$b(t) = \int K(t - s)\, x(s)\, ds + e(t)$
Discrete model: given discrete samples b, find samples x of x(s).
Let A discretize K, assumed known; the model is given by
b = Ax + e.
Naïvely invert the system to find x!
Example 1-D Original and Blurred Noisy Signal
Original signal x.
Blurred and noisy signal b
Gaussian PSF.
The Solution: Regularization is needed
Naïve Solution
A Regularized Solution
Least Squares for Ax = b: A Quick Review
Background
Consider discrete systems: $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$
Ax = b + e,
Classical approach: linear least squares (A full rank)
$x_{LS} = \arg\min_x \|Ax - b\|_2^2$
Difficulty: $x_{LS}$ is sensitive to changes in the right hand side b when A is ill-conditioned.
For convolutional models the system is ill-posed.
Introduce Regularization to Find Acceptable Solution
Weighted Fidelity with Regularization
• Regularize
$x_{RLS}(\lambda) = \arg\min_x \{\|b - Ax\|^2_{W_b} + \lambda^2 R(x)\}$,
with weighting matrix $W_b$
• R(x) is a regularization term
• λ is a regularization parameter, which is unknown.
The solution $x_{RLS}(\lambda)$
• depends on λ,
• depends on the regularization operator R,
• depends on the weighting matrix $W_b$.
The Weighting Matrix:
Some Assumptions for Multiple Data Measurements
Given multiple measurements of data b:
Usually the error e in b is an m-vector of random measurement errors with mean 0 and positive definite covariance matrix $C_b = E(ee^T)$.
For uncorrelated heteroskedastic measurements $C_b$ is the diagonal matrix of variances of the errors. (Colored noise)
For white noise $C_b = \sigma^2 I$.
Weighting by $W_b = C_b^{-1}$ in the data fit term means that, theoretically, the weighted errors $\tilde e = W_b^{1/2} e$ are uncorrelated with unit variance.
Difficulty if $W_b$ increases ill-conditioning of A!
For images, find $W_b$ from the image data.
Formulation: Generalized Tikhonov Regularization With Weighting
Use $R(x) = \|D(x - x_0)\|^2$
$\hat x = \arg\min J(x) = \arg\min \{\|Ax - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2\}$. (1)
D is a suitable operator, often a derivative approximation.
Assume $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$.
$x_0$ is a reference solution, often $x_0 = 0$; it might need to be an average solution.
Having found λ, the posterior inverse covariance matrix is
$\tilde W_x = A^T W_b A + \lambda^2 I$
Posterior information can give some confidence on parameter estimates.
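Worked step (standard linear algebra, not stated on the slide): setting the gradient of the functional in (1) to zero gives the normal equations. Under the null-space assumption above, with $W_b$ positive definite and λ > 0, the minimizer is unique:
$$ (A^T W_b A + \lambda^2 D^T D)(\hat x - x_0) = A^T W_b (b - A x_0), $$
$$ \hat x = x_0 + (A^T W_b A + \lambda^2 D^T D)^{-1} A^T W_b (b - A x_0). $$
With D = I the matrix on the left reduces to $A^T W_b A + \lambda^2 I$, which is the form of the posterior inverse covariance quoted above.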
Choice of λ is crucial: an example with D = I.
Choice of λ crucial: Different algorithms - different solutions.
Discrepancy Principle
Suppose the noise is white: $C_b = \sigma_b^2 I$.
Find λ such that the regularized residual satisfies
$\sigma_b^2 = \frac{1}{m} \|b - Ax(\lambda)\|_2^2$.
Can be implemented by a Newton root finding algorithm.
But the discrepancy principle typically oversmooths.
Others [Vog02]
• L-Curve
• Generalized Cross Validation (GCV)
• Unbiased Predictive Risk (UPRE)
Some standard approaches I: L-curve - Find the corner
Let $r(\lambda) = (A(\lambda) - I_m) b$, with influence matrix
$A(\lambda) = A (A^T W_b A + \lambda^2 D^T D)^{-1} A^T$
Plot $(\log \|Dx\|, \log \|r(\lambda)\|)$ and find the corner.
Expensive - requires a range of λ.
GSVD makes calculations efficient.
Not statistically based
No corner
Generalized Cross-Validation (GCV)
Let
$A(\lambda) = A (A^T W_b A + \lambda^2 D^T D)^{-1} A^T$
Can pick $W_b = I$.
Minimize the GCV function
$\frac{\|b - Ax(\lambda)\|^2_{W_b}}{[\mathrm{trace}(I_m - A(\lambda))]^2}$,
which estimates predictive risk.
Expensive - requires a range of λ.
GSVD makes calculations efficient.
Requires minimum
Multiple minima
Sometimes flat
Unbiased Predictive Risk Estimation (UPRE)
Minimize the expected value of the predictive risk: minimize the UPRE function
$\|b - Ax(\lambda)\|^2_{W_b} + 2\,\mathrm{trace}(A(\lambda)) - m$
Expensive - requires a range of λ.
GSVD makes calculations efficient.
Need estimate of trace
Minimum needed
Background: Statistics of the Least Squares Problem
Theorem (Rao73)
Let r be the rank of A and $b \sim N(Ax, \sigma_b^2 I)$ (errors in measurements are normally distributed with mean 0 and covariance $\sigma_b^2 I$). Then
$J = \min_x \|Ax - b\|^2 \sim \sigma_b^2 \chi^2(m - r)$.
J follows a $\chi^2$ distribution with m − r degrees of freedom: basically the Discrepancy Principle.
Corollary (Weighted Least Squares)
For $b \sim N(Ax, C_b)$ and $W_b = C_b^{-1}$,
$J = \min_x \|Ax - b\|^2_{W_b} \sim \chi^2(m - r)$.
Extension: Statistics of the Regularized Least Squares Problem
Thm: $\chi^2$ distribution of the regularized functional (Renaut/Mead 2008) (NOTE: Weighting Matrix on Regularization term.)
$\hat x = \arg\min J_D(x) = \arg\min \{\|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D}\}, \quad W_D = D^T W_x D.$ (2)
Assume
• $W_b$ and $W_x$ are symmetric positive definite.
• The problem is uniquely solvable: $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$.
• The Moore-Penrose generalized inverse of $W_D$ is $C_D$.
Statistics: errors in the right hand side $e \sim N(0, C_b)$, and $x_0$ is known so that
$(x - x_0) = f \sim N(0, C_D)$,
where $x_0$ is the mean vector of the model parameters.
Then
$J_D(\hat x(W_D)) \sim \chi^2(m + p - n)$
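Significance of the $\chi^2$ result
$J_D \sim \chi^2(m + p - n)$
For sufficiently large $\tilde m = m + p - n$,
$E(J(x(W_D))) = m + p - n, \quad \mathrm{Var}(J(x(W_D))) = 2(m + p - n)$.
Moreover
$\tilde m - \sqrt{2\tilde m}\, z_{\alpha/2} < J(\hat x(W_D)) < \tilde m + \sqrt{2\tilde m}\, z_{\alpha/2}$. (3)
$z_{\alpha/2}$ is the relevant z-value for a $\chi^2$ distribution with $\tilde m = m + p - n$ degrees of freedom.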
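For a sense of scale, here is a hypothetical instance of interval (3). The values $\tilde m = 100$ and α = 0.05 are illustrative assumptions, not taken from the talk. With the standard normal value $z_{0.025} \approx 1.96$:
$$ \sqrt{2\tilde m}\, z_{\alpha/2} = \sqrt{200} \times 1.96 \approx 27.7, \qquad 72.3 < J(\hat x(W_D)) < 127.7. $$
A value of J far outside such a range would suggest that the weighting $W_D$, or the assumed noise covariance, is poorly chosen.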
General Result [MR09b], [RHM09], [MR09a]
The Cost Functional follows a $\chi^2$ Statistical Distribution
If $x_0$ is not the mean value, then we introduce a non-central $\chi^2$ distribution with centrality parameter c.
If the problem is rank deficient the degrees of freedom are reduced.
Suppose the degrees of freedom are $\tilde m$ and the centrality parameter is c; then
$E(J_D) = \tilde m + c, \quad \mathrm{Var}(J_D) = 2\tilde m + 4c$.
Suggests: try to find $W_D$ so that $E(J) = \tilde m + c$.
First find λ only: find $W_x = \lambda^2 I$.
What do we need to apply the Theory?
Requirements
Covariance $C_b$ on data parameters b (or on model parameters x!)
A priori information $x_0$, the mean of x.
But x (and hence $x_0$) are not known.
If not known, use repeated data measurements to calculate $C_b$ and the mean of b.
Hence estimate the centrality parameter: E(b) = A E(x) implies $\bar b = A \bar x$ for the means $\bar b$ and $\bar x$. Hence
$c = \|\mathbf{c}\|_2^2 = \|\tilde Q U^T W_b^{1/2} (b - A x_0)\|_2^2$
$E(J_D) = E(\|\tilde Q U^T W_b^{1/2} (b - A x_0)\|_2^2) = m + p - n + \|\mathbf{c}\|_2^2$
Given the GSVD, estimate the degrees of freedom $\tilde m$.
Then we can use E(J) to find λ.
Assume $x_0$ is the mean (experimentalists know something about model parameters)
DESIGNING THE ALGORITHM: I
Recall: if $C_b$ and $C_x$ are good estimates of covariance, then
$|J_D(\hat x) - (m + p - n)|$
should be small.
Thus, let $\tilde m = m + p - n$; then we want
$\tilde m - \sqrt{2\tilde m}\, z_{\alpha/2} < J(x(W_D)) < \tilde m + \sqrt{2\tilde m}\, z_{\alpha/2}$.
$z_{\alpha/2}$ is the relevant z-value for a $\chi^2$ distribution with $\tilde m$ degrees of freedom.
GOAL
Find $W_x$ to make (3) tight. Single variable case: find λ such that
$J_D(\hat x(\lambda)) \approx \tilde m$
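A Newton-line search Algorithm to find λ = 1/σ. (Basic algebra)
Newton to solve $F(\sigma) = J_D(\sigma) - \tilde m = 0$
We use σ = 1/λ, and $y(\sigma^{(k)})$ is the current solution, for which
$x(\sigma^{(k)}) = y(\sigma^{(k)}) + x_0$
Then
$\frac{\partial}{\partial \sigma} J(\sigma) = -\frac{2}{\sigma^3} \|Dy(\sigma)\|^2 < 0$
Hence we have a basic Newton iteration
$\sigma^{(k+1)} = \sigma^{(k)} \left(1 + \frac{1}{2} \left(\frac{\sigma^{(k)}}{\|Dy\|}\right)^2 (J_D(\sigma^{(k)}) - \tilde m)\right)$.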
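The step from the derivative to the iteration is one line of algebra, spelled out here as a sketch using only the quantities defined on this slide. Newton's method for $F(\sigma) = J_D(\sigma) - \tilde m$ with $F'(\sigma) = -\frac{2}{\sigma^3}\|Dy(\sigma)\|^2$ gives
$$ \sigma^{(k+1)} = \sigma^{(k)} - \frac{F(\sigma^{(k)})}{F'(\sigma^{(k)})} = \sigma^{(k)} + \frac{(\sigma^{(k)})^3}{2\|Dy\|^2}\,(J_D(\sigma^{(k)}) - \tilde m) = \sigma^{(k)} \left(1 + \frac{1}{2}\left(\frac{\sigma^{(k)}}{\|Dy\|}\right)^2 (J_D(\sigma^{(k)}) - \tilde m)\right). $$
Because $F' < 0$, $J_D$ decreases monotonically in σ, so the update increases σ when $J_D > \tilde m$ and decreases it when $J_D < \tilde m$.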
Practical Details of Algorithm: Large Scale problems
Algorithm
Initialization
• Convert the generalized Tikhonov problem to standard form. (If L is not invertible you just need to know how to find Ax and $A^T x$, and the null space of L.)
• Use the LSQR (Paige and Saunders) algorithm to find the bidiagonal matrix for the projected problem.
• Obtain a solution of the bidiagonal problem for the given initial σ.
Subsequent Steps
• Increase the dimension of the space if needed, reusing the existing bidiagonalization. May also use a smaller size system if appropriate.
• Each σ calculation of the algorithm reuses saved information from the Lanczos bidiagonalization.
Illustrating the Results for Problem Size 512: Two Standard Test Problems
Comparison for noise level 10%. On left D = I and on right D is first derivative
Notice L-curve and χ2 -LSQR perform well.
UPRE does not perform well.
Real Data: Seismic Signal Restoration
The Data Set and Goal
Real data set of 48 signals of length 3000.
The point spread function is derived from the signals.
Calculate the signal variance pointwise over all 48 signals.
Goal: restore the signal x from Ax = b, where A is PSF matrix and b is given blurred
signal.
Method of Comparison- no exact solution known: use convergence with respect to
downsampling.
Comparison High Resolution White noise
Greater contrast with χ2 . UPRE is insufficiently regularized.
L-curve severely undersmooths (not shown). Parameters not consistent across resolutions
THE UPRE SOLUTION: x0 = 0 White Noise
Regularization Parameters are consistent: σ = 0.01005 all resolutions
THE LSQR Hybrid SOLUTION: White Noise
Regularization quite consistent resolution 2 to 100
σ = 0.0000029, .0000029, .0000029, .0000057, .0000057
Illustrating the Deblurring Result: Problem Size 65536
Example taken from RESTORE TOOLS Nagy et al 2007-8: 15% Noise
Computational Cost is Minimal: Projected Problem Size is 15, λ = .58
Problem Grain noise 15% added: increasing subproblem size to validate against increasing subproblem size
(a) Signal to noise ratio $10 \log_{10}(1/e)$, relative error e
(b) Regularization Parameter Against Problem Size
Illustrating the progress of the Newton algorithm post LSQR
Illustrating the progress of the Newton algorithm with LSQR
Problem Grain noise 15% added for increasing subproblem size
Figure: Signal to noise ratio $10 \log_{10}(1/e)$, relative error e
An Alternative Direction For Large Scale Problems: Domain Decomposition [Ren98]
Domain decomposition of x into several domains:
$x = (x_1^T, x_2^T, \dots, x_p^T)^T$.
Corresponding to the splitting of the image x, the kernel operator A is split as
$A = (A_1, A_2, \dots, A_p)$.
e.g.: Different Splitting Schemes
Formulation - Regularized Least Squares [LRG09]
The linear system Ax ≈ b is replaced with the split systems
$A_i y_i \approx b_i(x), \quad b_i(x) = b - \sum_{j \ne i} A_j x_j = b - Ax + A_i x_i.$
Locally solve Ax ≈ b:
$\min_{y_i \in \mathbb{R}^{n_i}} \|A_i y_i - b_i(x)\|^2, \quad 1 \le i \le p.$
If the problem is ill-posed we have the regularized problem
$\min_x \{\|Ax - b\|_2^2 + \lambda^2 \|Dx\|_2^2\}.$
Similarly, we will have a splitting of the operator, assuming local regularization:
$\begin{pmatrix} A \\ D_\Lambda \end{pmatrix} = \left( \begin{pmatrix} A_1 \\ \lambda_1 D_1 \end{pmatrix} \begin{pmatrix} A_2 \\ \lambda_2 D_2 \end{pmatrix} \cdots \begin{pmatrix} A_p \\ \lambda_p D_p \end{pmatrix} \right).$
Solve iteratively using novel updates for changing right hand sides, [Saa87], [CW97]
Update Scheme - Global solution update from local solutions at step k to step k + 1
$x^{(k+1)} = \sum_{i=1}^{p} \tau_i^{(k+1)} (x_{local}^{(k+1)})_i,$
where $(x_{local}^{(k+1)})_i = ((x_1^{(k)})^T, \dots, (x_{i-1}^{(k)})^T, (y_i^{(k+1)})^T, (x_{i+1}^{(k)})^T, \dots, (x_p^{(k)})^T)^T$
Feasibility of the Approach 1-D Phillips, size 1024, noise level 6% Regularization .25
(a) No Splitting Rel Error .0525
(b) 4 Sub Problems Rel Error .0499
2-D PET Reconstruction, Size 64 × 64, Noise Level 1.5% Regularization .2
(c) No Splitting SNR 11.73DB
(d) 4 Sub Problems SNR 12.24DB
A New Stopping Rule for the EM Method [GR09]
Quick Rationale PET
It is well known that the ML-EM method converges to overly noisy solutions, so iterative methods have to stop before convergence [SV82], [HHL92], [HL94].
Detected counts in tube i are Poisson with mean $b_i = \sum_j a_{ij} x_j$. Hence the basic relationship is b ≈ Ax, where A is the projection matrix, b are the counts and x is the density to be reconstructed.
EM iteration: $x^{(k+1)} = (A^T (b ./ (A x^{(k)}))) .* x^{(k)}$.
(e) True
(f) k = 95
(g) k = 500
A New Estimate of the Stopping Level
Algorithm: for k until converged, for m tubes
• Calculate a step of EM, $x^{(k)}$.
• Update tube means $b^{(k)} = A x^{(k)}$. Bin all tubes to have $b^{(k)} > 20$.
• Calculate $y = (b - b^{(k)}) ./ \sqrt{b^{(k)}}$; then $y \sim N(0, 1)$.
• Calculate the mean $\bar y$ and sample standard deviation s for $y_i$, i = 1 : m.
• Calculate $\alpha = \sqrt{m - 1}\, \bar y / s$, and $p_t(\alpha)$, $\alpha \sim t(m - 1)$ (Student's t density with m − 1 degrees of freedom).
• Calculate $\beta = (m - 1) s^2$ and $p_N(\beta)$, $\beta \sim N(m - 1, 2(m - 1))$ (Gaussian density with mean m − 1).
• Calculate the likelihood of sampling α and β from the two distributions: $l^{(k)} = p_t(\alpha)\, p_N(\beta)$.
• When $l^{(k)}$ has passed its maximum, i.e. $l^{(k)} < l^{(k-1)}$, STOP. The solution is $x^{(k-1)}$.
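A brief justification of the Gaussian model for β, added here as a standard distributional fact rather than something stated on the slide: if the $y_i$ are independent N(0, 1) samples and $s^2$ is the sample variance with divisor m − 1, then
$$ \beta = (m - 1) s^2 = \sum_{i=1}^{m} (y_i - \bar y)^2 \sim \chi^2(m - 1), \qquad E(\beta) = m - 1, \qquad \mathrm{Var}(\beta) = 2(m - 1), $$
and for large m the $\chi^2(m - 1)$ density is close to a Gaussian with that mean and variance. Reading the second argument of N(m − 1, 2(m − 1)) as a variance is an assumption consistent with this.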
Simulations: Validation
Table: The best and the predicted stopping step for 11 simulations
Simulation    1    2    3    4    5    6    7    8    9   10   11
Best         95   90   89   90   90   95   90   92   89   94   91
Pred         96   88   94   89   89   95  100  100   95   93   94
Figure: $l^{(k)} = p_t(\alpha)\, p_N(\beta)$ for 500 steps. Maximized at k = 96.
Extension for Mammogram Denoising: Early Ideas
The Model
Assume blurring of the mammogram by PSF kernel K; the measured image is b.
Deconvolve in the Fourier domain and invert to give a noisy estimate of optical density d.
Each entry of d is a linear combination of x-ray energy with Poisson noise, and $\sqrt{d}$ is close to normally distributed [ANS48].
To find the true optical density x, denoise the deconvolved d. Use total variation:
$\min_x \sum_{i=1}^{m} (\sqrt{d_i} - \sqrt{x_i})^2 + \lambda^2 \|x\|_{TV}$, s.t. x ≥ 0.
Given knowledge of the variance of the noise in the x-ray, automatically select λ using a statistical estimation approach.
Trial Experiment: Use data set from UoFlorida (DDSM) cancer case 0001, left breast with CC
scanning angle
(a) Original Image
(b) Restored Image
Figure: Total yellow (calcification) reduced by deblurring. Rectangle at bottom rhs indicates deblurring
PSF Estimation in Blurring Problems using Edge Detection (Cochran, Gelb, Viswanathan, Renaut, Stefan)
Given the blurring model (PSF convolution operator K) and $x \in L^2(-\pi, \pi)$ piecewise-smooth.
We estimate the PSF starting with 2N + 1 blurred Fourier coefficients $\hat b(j)$, j = −N, ..., N.
$b = K * x + e$
Principle:
Apply a linear edge detector, denoted by T. We shall assume that the edge detector can be written as a convolution with an appropriate kernel:
$T * (K * x + e) = (K * x + e) * T$
$= x * K * T + e * T$
$= (x * T) * K + e * T$
$\approx [x] * K + \tilde e$
Here [x](s) is a jump function. For a jump discontinuity in a function, the jump function at any point s depends only on the values of x at $s^+$ and $s^-$:
$[x](s) := x(s^+) - x(s^-)$
Hence, we observe shifted and scaled replicates of the PSF.
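To make the last statement concrete, here is a sketch under an idealization that is an assumption of this note, not of the talk: suppose x has jumps of heights $a_1, \dots, a_J$ at locations $\xi_1, \dots, \xi_J$ (hypothetical symbols), and treat the edge-detector output x ∗ T as a sum of point masses at the jumps. Then
$$ x * T \approx \sum_{j=1}^{J} a_j\, \delta(s - \xi_j), \qquad (x * T) * K \approx \sum_{j=1}^{J} a_j\, K(s - \xi_j), $$
so each edge contributes one copy of the PSF, shifted to $\xi_j$ and scaled by the jump height $a_j = [x](\xi_j)$.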
Example (No Noise)
[Figure panels: (a) True function; (b) Motion Blur PSF; (c) Blurred Function; (d) After applying edge detection. Legend in (d): Function; Blur after edge detection; True blur (normalized).]
Figure: Function subjected to motion blur, N = 128
Representative Examples: Gaussian PSF
[Figure panels: (a) Noisy blur estimation; (b) After low-pass filtering. Axes: x, f(x). Legend: Function; Blurred, noisy function; Blur after edge detection; True blur (normalized).]
Figure: Function subjected to Gaussian blur, N = 128
Complex noise distribution on Fourier coefficients: $\hat e \sim N\left(0, \frac{1.5}{(2N + 1)^2}\right)$
Second picture subjected to low-pass (Gaussian) filtering
It is conceivable that parameter estimation for a Gaussian PSF can take into account the effect of Gaussian filtering
Representative Examples: Motion Blur
[Figure panels: (a) Noisy blur estimation; (b) After TV denoising. Axes: x, f(x). Legend: Function; Blurred, noisy function; Blur after edge detection; True blur (normalized).]
Figure: Function subjected to Motion blur, N = 128
Cannot perform conventional low-pass filtering since the blur is piecewise-smooth
We compute the noisy blur estimate for the Fourier expansion of the blurred jump
$S_N[b] \approx [x] * K + \tilde e$
Denoising problem formulation
$\min_x \|x - S_N[b]\|_2^2 + \lambda^2 \|Dx\|_1$.
Future Work Combining Approaches
Extend the parameter selection methods to the domain decomposition problems for large
scale.
Use efficient schemes for large scale problems - eg right hand side updates
Extend to edge detection approaches
Use tensor product of the PSF for extension to 2D - is it feasible
Use parameter estimation techniques for the 2D problem
Further development of statistical techniques for estimating acceptable solutions.
Bibliography I
[ANS48] F. J. Anscombe. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35:246–254, 1948.
[CW97] T. F. Chan and W. L. Wan. Analysis of projection methods for solving linear systems with multiple right-hand sides. 1997.
[GR09] H. Guo and R. A. Renaut. Revisiting stopping rules for iterative methods used in emission tomography: Analysis and developments. Physics in Medicine and Biology, submitted, 2009.
[HHL92] H. M. Hudson, B. F. Hutton, and R. Larkin. Accelerated EM reconstruction using ordered subsets. J. Nucl. Med., 33:960, 1992.
[HL94] H. M. Hudson and R. Larkin. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag., 13(4):601–609, 1994.
Bibliography II
Y. Lin, R. A. Renaut, and H. Guo.
Multisplitting for regularized least squares.
in prep., 2009.
J. Mead and R. A. Renaut.
Least squares problems with inequality constraints as quadratic constraints.
Linear Algebra and its Applications, 2009.
J. Mead and R. A. Renaut.
A Newton root-finding algorithm for estimating the regularization parameter for solving
ill-conditioned least squares problems.
Inverse Problems, 25, 2009.
R. A. Renaut.
A parallel multisplitting solution of the least squares problem.
BIT, 1998.
R. A. Renaut, I. Hnetynkova, and J. Mead.
Regularization parameter estimation for large scale Tikhonov regularization using a priori
information.
Computational Statistics and Data Analysis, 54(1), 2009.
Bibliography III
Y. Saad.
On the Lanczos method for solving symmetric linear systems with several right-hand sides.
1987.
L. A. Shepp and Y. Vardi.
Maximum likelihood reconstruction for emission tomography.
IEEE Trans. Med. Imag., MI-1(2):113–122, Oct. 1982.
Curtis R. Vogel.
Computational Methods for Inverse Problems.
Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
Future Work
Other Results and Future Work
Software Package!
Diagonal Weighting Schemes
Edge preserving regularization - Total Variation
Better handling of Colored Noise.
Residual Periodogram for large scale.
Algorithm Using the GSVD
GSVD
Use GSVD of [W_b^{1/2} A, D]
For \gamma_i the generalized singular values, and s = U^T W_b^{1/2} r,
    \tilde{m} = m - n + p,
    \tilde{s}_i = s_i / (\gamma_i^2 \sigma_x^2 + 1),   t_i = \tilde{s}_i \gamma_i,   i = 1, \dots, p.
Find root of
    F(\sigma_x) = \sum_{i=1}^{p} \left( \frac{1}{\gamma_i^2 \sigma_x^2 + 1} \right) s_i^2 + \sum_{i=n+1}^{m} s_i^2 - \tilde{m} = 0
Equivalently: solve F = 0, where
    F(\sigma_x) = s^T \tilde{s} - \tilde{m}   and   F'(\sigma_x) = -2 \sigma_x \|t\|_2^2.
Practical Details of Algorithm
Find the parameter
Step 1: Bracket the root by a logarithmic search on \sigma to handle the asymptotes; this yields sigmamax and sigmamin.
Step 2: Calculate the step, with steepness controlled by tolD. Let t = Dy / \sigma^{(k)}, where y is the current update; then
    step = \frac{1}{2} \left( \frac{1}{\max\{\|t\|, tolD\}} \right)^2 (J_D(\sigma^{(k)}) - \tilde{m})
Step 3: Introduce a line search parameter \alpha^{(k)} in the Newton update:
    sigmanew = \sigma^{(k)} (1 + \alpha^{(k)} step)
Key Aspects of the Proof I: The Functional J
Algebraic Simplifications: Rewrite functional as quadratic form
Regularized solution given in terms of regularization matrix R(W_D)
    \hat{x} = x_0 + (A^T W_b A + D^T W_x D)^{-1} A^T W_b r,   r = b - A x_0    (4)
            = x_0 + R(W_D) W_b^{1/2} r,    (5)
            = x_0 + y(W_D).
    R(W_D) = (A^T W_b A + D^T W_x D)^{-1} A^T W_b^{1/2}    (6)
Functional is given in terms of influence matrix A(W_D)
    A(W_D) = W_b^{1/2} A R(W_D)    (7)
    J_D(\hat{x}) = r^T W_b^{1/2} (I_m - A(W_D)) W_b^{1/2} r,   let \tilde{r} = W_b^{1/2} r    (8)
                 = \tilde{r}^T (I_m - A(W_D)) \tilde{r}.   A Quadratic Form    (9)
Key Aspects of the Proof II: Properties of a Quadratic Form
\chi^2 distribution of quadratic forms x^T P x for normal variables (Fisher-Cochran Theorem)
Components x_i are independent normal variables x_i \sim N(0, 1), i = 1 : n.
A necessary and sufficient condition that x^T P x has a central \chi^2 distribution is that P is idempotent, P^2 = P, in which case the degrees of freedom of \chi^2 is rank(P) = trace(P) = n.
When the means of x_i are \mu_i \ne 0, x^T P x has a non-central \chi^2 distribution, with non-centrality parameter c = \mu^T P \mu.
A \chi^2 random variable with n degrees of freedom and non-centrality parameter c has mean n + c and variance 2(n + 2c).
Key Aspects of the Proof III: Requires the GSVD
Lemma
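Assume invertibility and m \ge n \ge p. There exist unitary matrices U \in R^{m \times m}, V \in R^{p \times p}, and a nonsingular matrix X \in R^{n \times n} such that
    A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T,   D = V [M, 0_{p \times (n-p)}] X^T,    (10)
    \Upsilon = diag(\upsilon_1, \dots, \upsilon_p, 1, \dots, 1) \in R^{n \times n},   M = diag(\mu_1, \dots, \mu_p) \in R^{p \times p},
    0 \le \upsilon_1 \le \dots \le \upsilon_p \le 1,   1 \ge \mu_1 \ge \dots \ge \mu_p > 0,
    \upsilon_i^2 + \mu_i^2 = 1,   i = 1, \dots, p.    (11)
The Functional with the GSVD
Let \tilde{Q} = diag(\mu_1, \dots, \mu_p, 0_{n-p}, I_{m-n}); then
    J = \tilde{r}^T (I_m - A(W_D)) \tilde{r} = \|\tilde{Q} U^T \tilde{r}\|_2^2 = \|k\|_2^2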
Proof IV: Statistical Distribution of the Weighted Residual
Covariance Structure
Errors in b are e \sim N(0, C_b). Now b depends on x, b = Ax, hence we can show
    b \sim N(A x_0, C_b + A C_D A^T)   (x_0 is the mean of x)
Residual r = b - A x_0 \sim N(0, C_b + A C_D A^T).
\tilde{r} = W_b^{1/2} r \sim N(0, I + \tilde{A} C_D \tilde{A}^T),   \tilde{A} = W_b^{1/2} A.
Use the GSVD
    I + \tilde{A} C_D \tilde{A}^T = U Q^{-2} U^T,   Q = diag(\mu_1, \dots, \mu_p, I_{n-p}, I_{m-n})
Now k = Q U^T \tilde{r}; then k \sim N(0, Q U^T (U Q^{-2} U^T) U Q) \sim N(0, I_m).
But J = \|\tilde{Q} U^T \tilde{r}\|^2 = \|\tilde{k}\|^2, where \tilde{k} is the vector k excluding components p+1 : n. Thus
    J_D \sim \chi^2(m + p - n).
When mean of the parameters is not known, or x0 = 0 is not the mean
Corollary: non-central χ2 distribution of the regularized functional
Recall
    \hat{x} = argmin J_D(x) = argmin \{ \|Ax - b\|_{W_b}^2 + \|x - x_0\|_{W_D}^2 \},   W_D = D^T W_x D.
Assume all assumptions as before, but x \ne x_0 is the mean vector of the model parameters.
Let
    c = \|\mathbf{c}\|_2^2 = \|\tilde{Q} U^T W_b^{1/2} A (x - x_0)\|_2^2
Then
    J_D \sim \chi^2(m + p - n, c)
The functional at the optimum follows a non-central \chi^2 distribution.
A further result when A is not of full column rank
The Rank Deficient Solution
Suppose A is not full column rank. Then the filtered solution can be written in terms of the GSVD
    x_{FILT}(\lambda) = \sum_{i=p+1-r}^{p} \frac{\gamma_i^2}{\upsilon_i (\gamma_i^2 + \lambda^2)} s_i \tilde{x}_i + \sum_{i=p+1}^{n} s_i \tilde{x}_i
                      = \sum_{i=1}^{p} \frac{f_i}{\upsilon_i} s_i \tilde{x}_i + \sum_{i=p+1}^{n} s_i \tilde{x}_i.
Here f_i = 0, i = 1 : p - r, and f_i = \gamma_i^2 / (\gamma_i^2 + \lambda^2), i = p - r + 1 : p. This yields
    J(x_{FILT}(\lambda)) \sim \chi^2(m - n + r, c);
notice that the degrees of freedom are reduced.
```
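The root-finding problem on the "Algorithm Using the GSVD" slide, combined with the bracketing idea from "Practical Details of Algorithm", can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The function names `F_and_dF` and `find_sigma`, the half-decade search grid, and the fallback to the geometric midpoint of the bracket are all assumptions made here. The slides' tolD-controlled step and line search are not reproduced. The inputs `gamma` (the p generalized singular values) and `s` (the vector U^T W_b^{1/2} r of length m) are assumed to come from a GSVD computed elsewhere.

```python
import math

def F_and_dF(sigma, gamma, s, n):
    """Return F(sigma) and F'(sigma) = -2 sigma ||t||^2 as on the GSVD slide.

    gamma: the p generalized singular values; s: vector of length m;
    n: number of columns of A. Slide indices are 1-based, Python's are 0-based.
    """
    p, m = len(gamma), len(s)
    m_tilde = m - n + p
    s_tilde = [s[i] / (gamma[i] ** 2 * sigma ** 2 + 1.0) for i in range(p)]
    t = [s_tilde[i] * gamma[i] for i in range(p)]
    F = (sum(s[i] * s_tilde[i] for i in range(p))
         + sum(v * v for v in s[n:]) - m_tilde)
    dF = -2.0 * sigma * sum(v * v for v in t)
    return F, dF

def find_sigma(gamma, s, n, tol=1e-10, max_iter=100):
    """Safeguarded Newton iteration for F(sigma) = 0 (illustrative only)."""
    # Step 1: logarithmic search for a sign change; F decreases as sigma grows.
    grid = [10.0 ** (k / 2.0) for k in range(-16, 17)]  # assumed search range
    lo = hi = None
    for a, b in zip(grid[:-1], grid[1:]):
        if F_and_dF(a, gamma, s, n)[0] > 0.0 >= F_and_dF(b, gamma, s, n)[0]:
            lo, hi = a, b
            break
    if lo is None:
        raise ValueError("F has no sign change on the search grid")
    sigma = math.sqrt(lo * hi)
    for _ in range(max_iter):
        F, dF = F_and_dF(sigma, gamma, s, n)
        if abs(F) < tol:
            break
        # Shrink the bracket using the sign of F at the current iterate.
        if F > 0.0:
            lo = sigma
        else:
            hi = sigma
        # Newton step; use the geometric midpoint if it leaves the bracket.
        new = sigma - F / dF if dF < 0.0 else 0.0
        sigma = new if lo < new < hi else math.sqrt(lo * hi)
    return sigma

# Synthetic data built so that sigma = 0.3 is the exact root (not from the talk).
gamma = [0.1, 0.5, 1.0, 2.0, 10.0]
m, n = 12, 8
s = [math.sqrt(g * g * 0.3 ** 2 + 1.0) for g in gamma] + [1.0] * (m - len(gamma))
print(find_sigma(gamma, s, n))
```

Keeping a bracket around the iterate is a conservative choice: each term s_i^2 / (\gamma_i^2 \sigma^2 + 1) has an inflection point, so a plain Newton step from a poor starting value can overshoot.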
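The moment formulas on the "Properties of a Quadratic Form" slide (mean n + c and variance 2(n + 2c) for a non-central \chi^2 variable) can be checked with a small Monte Carlo experiment. The sketch below is an illustration under assumptions made here, not material from the talk. It uses the centering matrix P = I - (1/n) 1 1^T as the idempotent matrix, so the degrees of freedom are rank(P) = n - 1 for vectors of length n. The mean vector `mu`, the trial count, and the seed are arbitrary choices.

```python
import random

def centered_quadratic_form(x):
    """x^T P x for P = I - (1/n) 1 1^T, which is idempotent with rank n - 1."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)

def simulate(mu, trials=20000, seed=0):
    """Sample mean and variance of x^T P x with x_i ~ N(mu_i, 1) independent."""
    rng = random.Random(seed)
    vals = [centered_quadratic_form([rng.gauss(m_i, 1.0) for m_i in mu])
            for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / (trials - 1)
    return mean, var

mu = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]   # assumed means, chosen for illustration
dof = len(mu) - 1                      # rank(P) = trace(P)
c = centered_quadratic_form(mu)        # non-centrality parameter mu^T P mu
mean, var = simulate(mu)
print(f"mean: sample {mean:.2f}, theory {dof + c:.2f}")
print(f"var:  sample {var:.2f}, theory {2 * (dof + 2 * c):.2f}")
```

The sample values should land close to the theoretical ones, with a gap that shrinks as `trials` grows.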