Towards Solution of Large Scale Image Restoration and Reconstruction Problems

Rosemary Renaut
Joint work with Anne Gelb, Aditya Viswanathan, Hongbin Guo, Doug Cochran, Youzuo Lin (Arizona State University) and Jodi Mead (Boise State University)
November 4, 2009
National Science Foundation: Division of Computational Mathematics

Outline
1 Motivation: Quick Review
2 Statistical Results for Least Squares: Summary of LS Statistical Results; Implications of Statistical Results for Regularized Least Squares
3 Newton Algorithm: Algorithm with LSQR (Paige and Saunders); Results
4 Large Scale Problems: Application in Image Reconstruction and Restoration
5 Stopping the EM Algorithm Statistically
6 Edge Detection for PSF Estimation

Signal/Image Restoration: Integral Model of Signal Degradation
b(t) = \int K(t, s) x(s)\, ds, where K(t, s) describes the blur of the signal.
Convolutional (shift-invariant) model: K(t, s) = K(t - s) is the point spread function (PSF).
Sampling typically includes noise e(t), so the model is
b(t) = \int K(t - s) x(s)\, ds + e(t).
Discrete model: given discrete samples b, find samples x of x(s). Let A discretize K (assumed known); the model is b = Ax + e.
Naively invert the system to find x!

Example: 1-D Original and Blurred Noisy Signal
[Figure: original signal x; blurred and noisy signal b, Gaussian PSF.]

The Solution: Regularization is Needed
[Figure: the naive solution versus a regularized solution.]

Least Squares for Ax = b: A Quick Review
Background: consider discrete systems A \in R^{m \times n}, b \in R^m, x \in R^n, with b = Ax + e.
Classical approach, linear least squares (A of full rank):
x_{LS} = \arg\min_x \|Ax - b\|_2^2.
Difficulty: x_{LS} is sensitive to changes in the right-hand side b when A is ill-conditioned. For convolutional models the system is ill-posed.

Introduce Regularization to Find an Acceptable Solution
Weighted fidelity with regularization:
x_{RLS}(\lambda) = \arg\min_x \{ \|b - Ax\|^2_{W_b} + \lambda^2 R(x) \},
with weighting matrix W_b, regularization term R(x), and regularization parameter \lambda, which is unknown.
The solution x_{RLS}(\lambda) depends on \lambda, on the regularization operator R, and on the weighting matrix W_b.
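The following is a minimal sketch (not the talk's code, all sizes and parameter values are illustrative assumptions) of why regularization is needed: naive inversion of a discretized 1-D Gaussian blurring operator amplifies the noise, while a small Tikhonov penalty recovers a stable solution.

```python
import numpy as np

n = 200
s = np.linspace(-1, 1, n)
x_true = (np.abs(s) < 0.4).astype(float)          # piecewise-constant test signal

# Discretize a Gaussian PSF K(t - s); sigma_psf is an assumed blur width
sigma_psf = 0.05
T, S = np.meshgrid(s, s, indexing="ij")
A = np.exp(-(T - S) ** 2 / (2 * sigma_psf ** 2))
A /= A.sum(axis=1, keepdims=True)                  # normalize rows

rng = np.random.default_rng(0)
e = 0.02 * rng.standard_normal(n)                  # additive white noise
b = A @ x_true + e

# Naive inversion: ill-conditioned A makes the result noise dominated
x_naive = np.linalg.solve(A, b)

# Tikhonov-regularized solution for one illustrative lambda
lam = 0.1
x_tik = np.linalg.solve(A.T @ A + lam ** 2 * np.eye(n), A.T @ b)

for name, x in [("naive", x_naive), ("tikhonov", x_tik)]:
    print(name, "relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```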
The Weighting Matrix: Assumptions for Multiple Data Measurements
Given multiple measurements of the data b:
The error e in b is an m-vector of random measurement errors with mean 0 and positive definite covariance matrix C_b = E(e e^T).
For uncorrelated heteroskedastic measurements (colored noise), C_b is a diagonal matrix of the error variances.
For white noise, C_b = \sigma^2 I.
Weighting the data-fit term by W_b = C_b^{-1} means that, theoretically, \tilde{e} = W_b^{1/2} e is uncorrelated.
Difficulty: W_b may increase the ill-conditioning of A.
For images, W_b is found from the image data.

Formulation: Generalized Tikhonov Regularization with Weighting
Use R(x) = \|D(x - x_0)\|^2:
\hat{x} = \arg\min_x J(x) = \arg\min_x \{ \|Ax - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2 \}.   (1)
D is a suitable operator, often a derivative approximation; assume N(A) \cap N(D) = \{0\}.
x_0 is a reference solution, often x_0 = 0; it may need to be the average solution.
Having found \lambda, the posterior inverse covariance matrix is \tilde{W}_x = A^T W_b A + \lambda^2 I.
Posterior information can give some confidence in the parameter estimates.

Choice of \lambda is crucial: an example with D = I.
[Figure sequence: regularized solutions for a range of \lambda values, from undersmoothed to oversmoothed.]
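A minimal sketch (illustrative, not the talk's implementation) of solving the generalized Tikhonov problem (1) for a fixed \lambda via the normal equations, with a data weighting matrix W_b built from assumed per-measurement standard deviations and a first-difference operator D.

```python
import numpy as np

def tikhonov_weighted(A, b, Wb, D, lam, x0=None):
    """Solve min ||Ax - b||^2_{Wb} + lam^2 ||D(x - x0)||^2 via the normal equations."""
    n = A.shape[1]
    if x0 is None:
        x0 = np.zeros(n)
    lhs = A.T @ Wb @ A + lam ** 2 * (D.T @ D)
    rhs = A.T @ Wb @ (b - A @ x0)
    return x0 + np.linalg.solve(lhs, rhs)

# Example use with heteroskedastic noise, Wb = diag(1/sigma_i^2) (assumed data).
m = n = 100
rng = np.random.default_rng(1)
A = rng.standard_normal((m, n)) / np.sqrt(n)
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
sigma = 0.01 + 0.05 * rng.random(m)                 # per-measurement standard deviations
b = A @ x_true + sigma * rng.standard_normal(m)
Wb = np.diag(1.0 / sigma ** 2)
D = np.eye(n) - np.eye(n, k=1)                      # forward-difference-like operator
x_hat = tikhonov_weighted(A, b, Wb, D, lam=0.5)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```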
Choice of \lambda is crucial: different algorithms give different solutions.

Discrepancy Principle
Suppose the noise is white: C_b = \sigma_b^2 I. Find \lambda such that the regularized residual satisfies
\sigma_b^2 = \frac{1}{m} \|b - Ax(\lambda)\|_2^2.
This can be implemented by a Newton root-finding algorithm, but the discrepancy principle typically oversmooths.
Other standard methods [Vog02]: L-curve, generalized cross validation (GCV), unbiased predictive risk estimation (UPRE).

Some standard approaches I: L-curve - find the corner
Let r(\lambda) = (A(\lambda) - A)b, with influence matrix A(\lambda) = A (A^T W_b A + \lambda^2 D^T D)^{-1} A^T.
Plot (\log \|Dx\|, \log \|r(\lambda)\|) and find the corner, trading off the two contributions.
Expensive - requires a range of \lambda; the GSVD makes the calculations efficient.
Not statistically based; there may be no corner.

Generalized Cross-Validation (GCV)
Let A(\lambda) = A (A^T W_b A + \lambda^2 D^T D)^{-1} A^T; one can pick W_b = I.
Minimize the GCV function
\frac{\|b - Ax(\lambda)\|^2_{W_b}}{[\mathrm{trace}(I_m - A(\lambda))]^2},
which estimates the predictive risk.
Expensive - requires a range of \lambda; the GSVD makes the calculations efficient.
Requires a minimum; the function may have multiple minima or be flat near the minimum.
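A minimal sketch (assumed setup with W_b = I and D = I) of evaluating the GCV function and the discrepancy-principle residual over a grid of \lambda values using the SVD of A; for general D the GSVD plays the same role.

```python
import numpy as np

def gcv_and_discrepancy(A, b, lambdas):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    m = b.size
    rows = []
    for lam in lambdas:
        f = s ** 2 / (s ** 2 + lam ** 2)              # Tikhonov filter factors
        resid2 = np.sum(((1 - f) * beta) ** 2) + (b @ b - beta @ beta)
        trace_term = m - np.sum(f)                    # trace(I_m - A(lambda))
        rows.append((lam, resid2 / trace_term ** 2, resid2 / m))
    return rows                                       # (lambda, GCV, ||r||^2 / m)

# Example use: pick the lambda minimizing GCV (synthetic, assumed data).
rng = np.random.default_rng(2)
A = rng.standard_normal((80, 60)) / 8.0
x = rng.standard_normal(60)
b = A @ x + 0.05 * rng.standard_normal(80)
table = gcv_and_discrepancy(A, b, np.logspace(-4, 1, 50))
lam_gcv = min(table, key=lambda row: row[1])[0]
print("GCV choice of lambda:", lam_gcv)
```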
Unbiased Predictive Risk Estimation (UPRE)
Minimize the expected value of the predictive risk via the UPRE function
\|b - Ax(\lambda)\|^2_{W_b} + 2\,\mathrm{trace}(A(\lambda)) - m.
Expensive - requires a range of \lambda; the GSVD makes the calculations efficient.
Needs an estimate of the trace, and a minimum must be found.

Background: Statistics of the Least Squares Problem
Theorem (Rao73). Let r be the rank of A and let b \sim N(Ax, \sigma_b^2 I) (the measurement errors are normally distributed with mean 0 and covariance \sigma_b^2 I). Then
J = \min_x \|Ax - b\|^2 \sim \sigma_b^2 \chi^2(m - r),
i.e. J follows a \chi^2 distribution with m - r degrees of freedom. This is essentially the discrepancy principle.
Corollary (weighted least squares). For b \sim N(Ax, C_b) and W_b = C_b^{-1},
J = \min_x \|Ax - b\|^2_{W_b} \sim \chi^2(m - r).

Extension: Statistics of the Regularized Least Squares Problem
Theorem: \chi^2 distribution of the regularized functional (Renaut/Mead 2008). (Note: the weighting matrix is on the regularization term.)
\hat{x} = \arg\min_x J_D(x) = \arg\min_x \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}, \quad W_D = D^T W_x D.   (2)
Assume W_b and W_x are symmetric positive definite, the problem is uniquely solvable (N(A) \cap N(D) = \{0\}), and let C_D be the Moore-Penrose generalized inverse of W_D.
Statistics: the errors in the right-hand side satisfy e \sim N(0, C_b), and x_0 is known so that x - x_0 = f \sim N(0, C_D); i.e. x_0 is the mean vector of the model parameters. Then
J_D(\hat{x}(W_D)) \sim \chi^2(m + p - n).

Significance of the \chi^2 result J_D \sim \chi^2(m + p - n)
For sufficiently large \tilde{m} = m + p - n,
E(J_D(\hat{x}(W_D))) = m + p - n, \quad \mathrm{Var}(J_D) = 2(m + p - n),
and moreover
\tilde{m} - \sqrt{2\tilde{m}}\, z_{\alpha/2} < J_D(\hat{x}(W_D)) < \tilde{m} + \sqrt{2\tilde{m}}\, z_{\alpha/2},   (3)
where z_{\alpha/2} is the relevant critical value for a \chi^2 distribution with \tilde{m} = m + p - n degrees of freedom.

General Result [MR09b], [RHM09], [MR09a]: the cost functional follows a \chi^2 distribution
If x_0 is not the mean value, the distribution becomes a non-central \chi^2 with centrality parameter c.
If the problem is rank deficient, the degrees of freedom are reduced.
With degrees of freedom \tilde{m} and centrality parameter c,
E(J_D) = \tilde{m} + c, \quad \mathrm{Var}(J_D) = 2\tilde{m} + 4c.
This suggests: find W_D so that E(J_D) = \tilde{m} + c. First find \lambda only, i.e. find W_x = \lambda^2 I.
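A minimal sketch (assumed synthetic data, W_b = C_b^{-1}, D = I, x_0 the true prior mean) that evaluates the regularized functional J_D for \lambda = 1/\sigma_x implied by the covariances and checks it against the \chi^2 interval (3) with \tilde{m} = m + p - n degrees of freedom; the normal approximation z_{\alpha/2} = 1.96 (\alpha = 0.05) is used.

```python
import numpy as np

def JD(A, b, x0, lam):
    """Value of ||A x - b||^2 + lam^2 ||x - x0||^2 at its minimizer (D = I)."""
    n = A.shape[1]
    y = np.linalg.solve(A.T @ A + lam ** 2 * np.eye(n), A.T @ (b - A @ x0))
    x = x0 + y
    return np.sum((A @ x - b) ** 2) + lam ** 2 * np.sum((x - x0) ** 2)

rng = np.random.default_rng(3)
m, n = 120, 80
p = n                                        # D = I, so p = n and m~ = m
A = rng.standard_normal((m, n)) / np.sqrt(n)
sigma_x, sigma_b = 2.0, 0.1
x0 = np.zeros(n)
x_true = x0 + sigma_x * rng.standard_normal(n)      # x - x0 ~ N(0, sigma_x^2 I)
b = A @ x_true + sigma_b * rng.standard_normal(m)   # e ~ N(0, sigma_b^2 I)

m_tilde = m + p - n
lam = 1.0 / sigma_x                          # lambda = 1/sigma, as in the talk
J = JD(A / sigma_b, b / sigma_b, x0, lam)    # data fit weighted by W_b = C_b^{-1}
lo = m_tilde - 1.96 * np.sqrt(2 * m_tilde)
hi = m_tilde + 1.96 * np.sqrt(2 * m_tilde)
print(f"J_D = {J:.1f}, 95% chi^2 interval = ({lo:.1f}, {hi:.1f})")
```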
What do we need to apply the theory?
Requirements: the covariance C_b of the data b (or of the model parameters x), and a priori information x_0, the mean of x.
But x (and hence x_0) is not known. If it is not known, use repeated data measurements to calculate C_b and the mean \bar{b}, and hence estimate the centrality parameter: E(b) = A E(x) implies \bar{b} = A\bar{x}, so
c = \|c\|_2^2 = \|\tilde{Q} U^T W_b^{1/2} (\bar{b} - Ax_0)\|_2^2, \quad
E(J_D) = E(\|\tilde{Q} U^T W_b^{1/2} (b - Ax_0)\|_2^2) = m + p - n + \|c\|_2^2.
Given the GSVD, estimate the degrees of freedom \tilde{m}; then use E(J_D) to find \lambda.

Designing the Algorithm I (assume x_0 is the mean; experimentalists know something about the model parameters)
Recall: if C_b and C_x are good estimates of the covariances, |J_D(\hat{x}) - (m + p - n)| should be small. Thus, with \tilde{m} = m + p - n, we want
\tilde{m} - \sqrt{2\tilde{m}}\, z_{\alpha/2} < J_D(\hat{x}(W_D)) < \tilde{m} + \sqrt{2\tilde{m}}\, z_{\alpha/2},
where z_{\alpha/2} is the relevant critical value for a \chi^2 distribution with \tilde{m} degrees of freedom.
GOAL: find W_x to make (3) tight. In the single-variable case, find \lambda such that J_D(\hat{x}(\lambda)) \approx \tilde{m}.

A Newton line-search algorithm to find \lambda = 1/\sigma (basic algebra)
Use Newton's method to solve F(\sigma) = J_D(\sigma) - \tilde{m} = 0.
With \sigma = 1/\lambda and y(\sigma^{(k)}) the current solution, for which x(\sigma^{(k)}) = y(\sigma^{(k)}) + x_0,
\frac{\partial J}{\partial \sigma}(\sigma) = -\frac{2}{\sigma^3} \|D y(\sigma)\|^2 < 0,
so the basic Newton iteration is
\sigma^{(k+1)} = \sigma^{(k)} \left( 1 + \frac{1}{2} \left( \frac{\sigma^{(k)}}{\|Dy\|} \right)^2 (J_D(\sigma^{(k)}) - \tilde{m}) \right).

Practical Details of the Algorithm: Large Scale Problems
Initialization: convert the generalized Tikhonov problem to standard form (if L is not invertible, you only need to know how to apply A and A^T and the null space of L). Use the LSQR algorithm (Paige and Saunders) to find the bidiagonal matrix for the projected problem, and obtain a solution of the bidiagonal problem for the given initial \sigma.
Subsequent steps: increase the dimension of the projected space if needed, reusing the existing bidiagonalization; a smaller system may also be used if appropriate. Each \sigma update of the algorithm reuses saved information from the Lanczos bidiagonalization.
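A minimal sketch (assumed small dense problem, D = I, W_b = I) of the Newton iteration above for the single parameter \lambda = 1/\sigma. In the talk J_D is evaluated through the LSQR-projected bidiagonal problem with bracketing and a line search; here it is evaluated directly for clarity, so convergence is not safeguarded.

```python
import numpy as np

def solve_y(A, r0, sigma):
    """y minimizing ||A y - r0||^2 + (1/sigma^2)||y||^2, i.e. lambda = 1/sigma."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + np.eye(n) / sigma ** 2, A.T @ r0)

def chi2_newton(A, b, x0, m_tilde, sigma0=1.0, tol=1e-3, maxit=50):
    r0 = b - A @ x0
    sigma = sigma0
    for _ in range(maxit):
        y = solve_y(A, r0, sigma)
        JD = np.sum((A @ y - r0) ** 2) + np.sum(y ** 2) / sigma ** 2
        F = JD - m_tilde
        if abs(F) < tol * m_tilde:
            break
        # Newton update sigma <- sigma (1 + 0.5 (sigma/||Dy||)^2 (J_D - m~)), D = I here
        sigma = sigma * (1.0 + 0.5 * (sigma / np.linalg.norm(y)) ** 2 * F)
    return x0 + y, 1.0 / sigma        # regularized solution and the selected lambda
```

For example, with the weighted data of the previous snippet one would call chi2_newton(A / sigma_b, b / sigma_b, x0, m_tilde) and recover \lambda close to 1/\sigma_x.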
Illustrating the Results for Problem Size 512: Two Standard Test Problems
[Figures: comparison for noise level 10%; on the left D = I, on the right D is a first-derivative operator.]
The L-curve and \chi^2-LSQR methods perform well; UPRE does not perform well.

Real Data: Seismic Signal Restoration
The data set and goal: a real data set of 48 signals of length 3000; the point spread function is derived from the signals; the signal variance is calculated pointwise over all 48 signals.
Goal: restore the signal x from Ax = b, where A is the PSF matrix and b is a given blurred signal.
Method of comparison (no exact solution is known): use convergence with respect to downsampling.

Comparison: High Resolution, White Noise
[Figures.] The \chi^2 solution shows greater contrast; UPRE is insufficiently regularized; the L-curve severely undersmooths (not shown). The parameters are not consistent across resolutions.

The UPRE Solution: x_0 = 0, White Noise
[Figures.] The regularization parameters are consistent: \sigma = 0.01005 at all resolutions.

The LSQR Hybrid Solution: White Noise
[Figures.] The regularization is quite consistent from resolution 2 to 100: \sigma = 0.0000029, 0.0000029, 0.0000029, 0.0000057, 0.0000057.

Illustrating the Deblurring Result: Problem Size 65536
Example taken from RestoreTools (Nagy et al., 2007-8), with 15% noise.
[Figures.] The computational cost is minimal: the projected problem size is 15, \lambda = 0.58.

Problem Grain, 15% noise added: validation against increasing subproblem size
[Figures: (a) signal-to-noise ratio 10 \log_{10}(1/e) for relative error e; (b) regularization parameter against problem size.]

Illustrating the progress of the Newton algorithm post LSQR
[Figures.]

Illustrating the progress of the Newton algorithm with LSQR
[Figures.]

Problem Grain, 15% noise added, for increasing subproblem size
[Figure: signal-to-noise ratio 10 \log_{10}(1/e) for relative error e.]

An Alternative Direction for Large Scale Problems: Domain Decomposition [Ren98]
Decompose x into several domains: x = (x_1^T, x_2^T, \ldots, x_p^T)^T. Corresponding to this splitting of the image x, the kernel operator A is split A = (A_1, A_2, \ldots, A_p).
[Figure: examples of different splitting schemes.]

Formulation - Regularized Least Squares [LRG09]
The linear system Ax \approx b is replaced with the split systems
A_i y_i \approx b_i(x), \quad b_i(x) = b - \sum_{j \neq i} A_j x_j = b - Ax + A_i x_i.
Locally solve
\min_{y_i \in R^{n_i}} \|A_i y_i - b_i(x)\|^2, \quad 1 \le i \le p.
If the problem is ill-posed we have the regularized problem
\min_x \{ \|Ax - b\|_2^2 + \lambda^2 \|Dx\|_2^2 \},
and similarly the augmented operator is split, with local regularization:
\begin{pmatrix} A \\ D_\Lambda \end{pmatrix} = \left( \begin{pmatrix} A_1 \\ \lambda_1 D_1 \end{pmatrix} \; \begin{pmatrix} A_2 \\ \lambda_2 D_2 \end{pmatrix} \cdots \begin{pmatrix} A_p \\ \lambda_p D_p \end{pmatrix} \right).
Solve iteratively, using novel updates for changing right-hand sides [Saa87], [CW97].
Update scheme - the global solution is updated from the local solutions at step k:
x^{(k+1)} = \sum_{i=1}^{p} \tau_i^{(k+1)} (x_{\mathrm{local}}^{(k+1)})_i,
where (x_{\mathrm{local}}^{(k+1)})_i = ((x_1^{(k)})^T, \ldots, (x_{i-1}^{(k)})^T, (y_i^{(k+1)})^T, (x_{i+1}^{(k)})^T, \ldots, (x_p^{(k)})^T)^T.

Feasibility of the Approach: 1-D Phillips, size 1024, noise level 6%, regularization 0.25
[Figures: (a) no splitting, relative error 0.0525; (b) 4 subproblems, relative error 0.0499.]

2-D PET Reconstruction, Size 64 x 64, Noise Level 1.5%, Regularization 0.2
[Figures: (c) no splitting, SNR 11.73 dB; (d) 4 subproblems, SNR 12.24 dB.]

A New Stopping Rule for the EM Method [GR09]
Quick rationale (PET): it is well known that the ML-EM method converges to overly noisy solutions, so iterative methods have to stop before convergence [SV82], [HHL92], [HL94].
The detected counts in tube i are Poisson with mean b_i = \sum_j a_{ij} x_j, so the basic relationship is b \approx Ax, where A is the projection matrix, b are the counts, and x is the density to be reconstructed.
EM iteration: x^{(k+1)} = (A^T (b ./ (A x^{(k)}))) .* x^{(k)}.
[Figures: (e) true image; (f) k = 95; (g) k = 500.]
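A minimal sketch (not the talk's implementation) of the ML-EM iteration quoted above, x^{(k+1)} = (A^T (b ./ (A x^{(k)}))) .* x^{(k)}. The optional normalization by the column sums A^T 1 is included because many presentations of EM use it; it is an assumption here, not taken from the slides.

```python
import numpy as np

def em_iterations(A, b, n_iter, x0=None, normalize=True, eps=1e-12):
    """Return the list of ML-EM iterates x^(1), ..., x^(n_iter)."""
    m, n = A.shape
    x = np.ones(n) if x0 is None else x0.copy()
    col_sums = A.sum(axis=0) if normalize else np.ones(n)
    history = []
    for _ in range(n_iter):
        Ax = A @ x
        ratio = b / np.maximum(Ax, eps)          # b ./ (A x^(k)), guarded against zeros
        x = x * (A.T @ ratio) / np.maximum(col_sums, eps)
        history.append(x.copy())
    return history

# Example use with a small synthetic projection matrix and Poisson counts (assumed data).
rng = np.random.default_rng(4)
A = rng.random((50, 30))
x_true = rng.random(30) * 10
b = rng.poisson(A @ x_true).astype(float)
iterates = em_iterations(A, b, n_iter=100)
```

The statistical stopping rule described next decides which of these iterates to keep.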
A New Estimate of the Stopping Level
Algorithm: for k until converged (over the m tubes):
- Calculate the EM step x^{(k)} and update the tube means b^{(k)} = A x^{(k)}. Bin all tubes so that b^{(k)}_i > 20.
- Calculate y = (b - b^{(k)}) ./ \sqrt{b^{(k)}}; then y \sim N(0, 1).
- Calculate the mean \bar{y} and sample standard deviation s of y_i, i = 1 : m.
- Calculate \alpha = \sqrt{m - 1}\, \bar{y} / s and p_t(\alpha), where \alpha \sim t(m - 1) (Student's t density with m - 1 degrees of freedom).
- Calculate \beta = (m - 1) s^2 and p_N(\beta), where \beta \sim N(m - 1, 2(m - 1)) (Gaussian density with mean m - 1 and variance 2(m - 1)).
- Calculate the likelihood of sampling \alpha and \beta from the two distributions: l^{(k)} = p_t(\alpha)\, p_N(\beta).
- When l^{(k)} has passed its maximum, i.e. l^{(k)} < l^{(k-1)}, STOP; the solution is x^{(k-1)}.

Simulations: Validation
Table: the best and the predicted stopping step for 11 simulations.
Best: 95  90  89  90  90  95  90  92  89  94  91
Pred: 96  88  94  89  89  95  100 100 95  93  94
Figure: l^{(k)} = p_t(\alpha) p_N(\beta) over 500 steps, maximized at k = 96.
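A minimal sketch of the stopping rule above, assuming access to the EM iterates from the previous snippet. Discarding tubes with small means is a crude stand-in for the binning step (the rule as stated bins tubes so that all means exceed 20).

```python
import numpy as np
from scipy.stats import t as student_t, norm

def stopping_index(A, b, iterates, min_mean=20.0):
    """Return the index k-1 at which l(k) = p_t(alpha) p_N(beta) starts to decrease."""
    prev_l = -np.inf
    for k, xk in enumerate(iterates):
        bk = A @ xk
        keep = bk > min_mean                    # stand-in for binning tubes to means > 20
        m = int(keep.sum())
        y = (b[keep] - bk[keep]) / np.sqrt(bk[keep])
        ybar, s = y.mean(), y.std(ddof=1)
        alpha = np.sqrt(m - 1) * ybar / s
        beta = (m - 1) * s ** 2
        lk = student_t.pdf(alpha, df=m - 1) * norm.pdf(beta, loc=m - 1, scale=np.sqrt(2 * (m - 1)))
        if lk < prev_l:
            return k - 1                        # likelihood has passed its maximum
        prev_l = lk
    return len(iterates) - 1

# e.g. k_stop = stopping_index(A, b, iterates)
```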
Extension for Mammogram Denoising: Early Ideas
The model: assume blurring of the mammogram by a PSF kernel K, with measured image b. Deconvolve in the Fourier domain and invert to give a noisy estimate of the optical density d.
Each entry of d is a linear combination of x-ray energy with Poisson noise, and \sqrt{d} is close to normally distributed [ANS48]. To find the true optical density x, denoise the deconvolved d using total variation:
\min_x \sum_{i=1}^{m} (\sqrt{d_i} - \sqrt{x_i})^2 + \lambda^2 \|x\|_{TV}, \quad \text{s.t. } x \ge 0.
Given knowledge of the noise variance in the x-ray, automatically select \lambda using the statistical estimation approach.

Trial Experiment: data set from the University of Florida (DDSM), cancer case 0001, left breast, CC scanning angle
Figure: (a) original image; (b) restored image. Total yellow (calcification) is reduced by deblurring; the rectangle at bottom right indicates the deblurred region.

PSF Estimation in Blurring Problems using Edge Detection (Cochran, Gelb, Viswanathan, Renaut, Stefan)
Given the blurring model (PSF convolution operator K) and x \in L^2(-\pi, \pi) piecewise smooth, we estimate the PSF starting from 2N + 1 blurred Fourier coefficients \hat{b}(j), j = -N, \ldots, N, where
b = K * x + e.
Principle: apply a linear edge detector, denoted T, which we assume can be written as convolution with an appropriate kernel:
T * (K * x + e) = (K * x + e) * T = x * K * T + e * T = (x * T) * K + e * T \approx [x] * K + \tilde{e}.
Here [x](s) is the jump function; for a jump discontinuity, the jump function at a point s depends only on the values of x at s^+ and s^-:
[x](s) := x(s^+) - x(s^-).
Hence we observe shifted and scaled replicates of the PSF.

Example (No Noise)
[Figures: (a) true function; (b) motion blur PSF; (c) blurred function; (d) blur after edge detection versus the true blur (normalized). Function subjected to motion blur, N = 128.]

Representative Examples: Gaussian PSF
[Figures: (a) noisy blur estimation; (b) after low-pass filtering. Function subjected to Gaussian blur, N = 128.]
Complex noise distribution on the Fourier coefficients: \hat{e} \sim N(0, 1.5/(2N+1)^2).
The second picture is subjected to low-pass (Gaussian) filtering. It is conceivable that parameter estimation for a Gaussian PSF can take the effect of the Gaussian filtering into account.

Representative Examples: Motion Blur
[Figures: (a) noisy blur estimation; (b) after TV denoising. Function subjected to motion blur, N = 128.]
Conventional low-pass filtering cannot be used, since the blur is piecewise smooth. We compute the noisy blur estimate from the Fourier expansion of the blurred jump,
S_N[b] \approx [x] * K + \tilde{e},
and formulate the denoising problem as
\min_x \|x - S_N[b]\|_2^2 + \lambda^2 \|Dx\|_1.
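A minimal sketch (a simple discrete analogue, not the talk's Fourier-space edge detector) of the principle that applying an edge detector T to the blurred signal K * x leaves shifted, scaled replicates of the PSF at the jumps of x, i.e. T * (K * x) = (T * x) * K \approx [x] * K. The grid, PSF width, and jump locations are assumptions for illustration.

```python
import numpy as np

n = 512
s = np.linspace(-np.pi, np.pi, n, endpoint=False)

x = np.where(np.abs(s) < 1.0, 1.0, -0.5)            # piecewise constant, jumps at s = -1 and s = +1
psf = np.exp(-s ** 2 / (2 * 0.05 ** 2))
psf /= psf.sum()                                     # normalized Gaussian PSF centred at 0

# Periodic blur K * x via the FFT
blur = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.fft.ifftshift(psf))))

# Crude difference-based edge detector T applied to the blurred signal
detected = np.roll(blur, -1) - np.roll(blur, 1)

# Near each jump the (normalized) response should match the (normalized) PSF shape
for s_jump in (-1.0, 1.0):
    j = np.argmin(np.abs(s - s_jump))
    win = slice(j - 40, j + 41)
    resp = detected[win] / np.max(np.abs(detected[win]))
    ref = psf[n // 2 - 40 : n // 2 + 41] / psf.max()
    print(f"jump at {s_jump:+.1f}: max shape mismatch = {np.max(np.abs(np.abs(resp) - ref)):.3f}")
```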
Future Work: Combining Approaches
Extend the parameter selection methods to domain decomposition problems at large scale.
Use efficient schemes for large scale problems, e.g. right-hand-side updates.
Extend to the edge detection approaches.
Use a tensor product of the PSF for the extension to 2-D - is it feasible?
Use the parameter estimation techniques for the 2-D problem.
Further development of statistical techniques for estimating acceptable solutions.

Bibliography I
F. J. Anscombe. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35:246-254, 1948.
T. F. Chan and W. L. Wan. Analysis of projection methods for solving linear systems with multiple right-hand sides. 1997.
H. Guo and R. A. Renaut. Revisiting stopping rules for iterative methods used in emission tomography: analysis and developments. Physics in Medicine and Biology, submitted, 2009.
H. M. Hudson, B. F. Hutton, and R. Larkin. Accelerated EM reconstruction using ordered subsets. J. Nucl. Med., 33:960, 1992.
H. M. Hudson and R. Larkin. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag., 13(4):601-609, 1994.

Bibliography II
Y. Lin, R. A. Renaut, and H. Guo. Multisplitting for regularized least squares. In preparation, 2009.
J. Mead and R. A. Renaut. Least squares problems with inequality constraints as quadratic constraints. Linear Algebra and its Applications, 2009.
J. Mead and R. A. Renaut. A Newton root-finding algorithm for estimating the regularization parameter for solving ill-conditioned least squares problems. Inverse Problems, 25, 2009.
R. A. Renaut. A parallel multisplitting solution of the least squares problem. BIT, 1998.
R. A. Renaut, I. Hnetynkova, and J. Mead. Regularization parameter estimation for large scale Tikhonov regularization using a priori information. Computational Statistics and Data Analysis, 54(1), 2009.

Bibliography III
Y. Saad. On the Lanczos method for solving symmetric linear systems with several right-hand sides. 1987.
L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imag., MI-1(2):113-122, October 1982.
C. R. Vogel. Computational Methods for Inverse Problems. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2002.

Future Work: Other Results and Future Work
Software package!
Diagonal weighting schemes.
Edge preserving regularization - total variation.
Better handling of colored noise.
Residual periodogram for large scale problems.

Algorithm Using the GSVD
Use the GSVD of [W_b^{1/2} A; D]. For \gamma_i the generalized singular values, and s = U^T W_b^{1/2} r, set
\tilde{m} = m - n + p, \quad \tilde{s}_i = s_i / (\gamma_i^2 \sigma_x^2 + 1), \quad i = 1, \ldots, p, \quad t_i = \tilde{s}_i \gamma_i.
Find the root of
F(\sigma_x) = \sum_{i=1}^{p} \frac{s_i^2}{\gamma_i^2 \sigma_x^2 + 1} + \sum_{i=n+1}^{m} s_i^2 - \tilde{m} = 0.
Equivalently, solve F = 0, where F(\sigma_x) = s^T \tilde{s} - \tilde{m} and F'(\sigma_x) = -2 \sigma_x \|t\|_2^2.
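A minimal sketch of the root-finding problem F(\sigma_x) = s^T \tilde{s} - \tilde{m} = 0 with the Newton update using F'(\sigma_x) = -2\sigma_x \|t\|^2. For simplicity D = I is assumed, so the SVD of W_b^{1/2} A stands in for the GSVD; the talk's version adds bracketing and a line search, which are omitted here.

```python
import numpy as np

def chi2_root_svd(A_w, r_w, sigma0=1.0, tol=1e-8, maxit=50):
    """A_w = W_b^{1/2} A, r_w = W_b^{1/2}(b - A x_0); returns sigma_x with J_D ~ m~ (D = I)."""
    U, gamma, Vt = np.linalg.svd(A_w, full_matrices=True)
    m, n = A_w.shape
    p = n                                       # D = I
    s = U.T @ r_w
    m_tilde = m - n + p
    tail = np.sum(s[n:] ** 2)                   # components i = n+1, ..., m
    sigma = sigma0
    for _ in range(maxit):
        denom = gamma ** 2 * sigma ** 2 + 1.0
        s_tilde = s[:n] / denom
        F = np.sum(s[:n] ** 2 / denom) + tail - m_tilde
        if abs(F) < tol * m_tilde:
            break
        t = s_tilde * gamma
        dF = -2.0 * sigma * np.sum(t ** 2)
        sigma = sigma - F / dF                  # basic Newton step
    return sigma
```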
Practical Details of the Algorithm: Finding the Parameter
Step 1: bracket the root by a logarithmic search on \sigma to handle the asymptotes; this yields sigma_max and sigma_min.
Step 2: calculate the step, with steepness controlled by tolD. Let t = Dy / \sigma^{(k)}, where y is the current update; then
\mathrm{step} = \frac{1}{2} \left( \frac{1}{\max\{\|t\|, \mathrm{tolD}\}} \right)^2 (J_D(\sigma^{(k)}) - \tilde{m}).
Step 3: introduce a line search \alpha^{(k)} in the Newton update:
\sigma_{\mathrm{new}} = \sigma^{(k)} (1 + \alpha^{(k)}\, \mathrm{step}).

Key Aspects of the Proof I: The Functional J
Algebraic simplifications: rewrite the functional as a quadratic form.
The regularized solution is given in terms of the regularization matrix R(W_D):
\hat{x} = x_0 + (A^T W_b A + D^T W_x D)^{-1} A^T W_b r = x_0 + R(W_D) W_b^{1/2} r = x_0 + y(W_D),   (4)-(5)
where r = b - A x_0 and R(W_D) = (A^T W_b A + D^T W_x D)^{-1} A^T W_b^{1/2}.   (6)
The functional is given in terms of the influence matrix A(W_D) = W_b^{1/2} A R(W_D):   (7)
J_D(\hat{x}) = r^T W_b^{1/2} (I_m - A(W_D)) W_b^{1/2} r = \tilde{r}^T (I_m - A(W_D)) \tilde{r}, \quad \tilde{r} = W_b^{1/2} r,   (8)-(9)
a quadratic form.

Key Aspects of the Proof II: Properties of a Quadratic Form
\chi^2 distribution of quadratic forms x^T P x for normal variables (Fisher-Cochran theorem):
Suppose the components x_i are independent normal variables, x_i \sim N(0, 1), i = 1 : n. A necessary and sufficient condition that x^T P x has a central \chi^2 distribution is that P is idempotent, P^2 = P; in that case the degrees of freedom are rank(P) = trace(P) = n.
When the means of the x_i are \mu_i \neq 0, x^T P x has a non-central \chi^2 distribution with non-centrality parameter c = \mu^T P \mu.
A \chi^2 random variable with n degrees of freedom and centrality parameter c has mean n + c and variance 2(n + 2c).

Key Aspects of the Proof III: Requires the GSVD
Lemma. Assume invertibility and m \ge n \ge p. There exist unitary matrices U \in R^{m \times m}, V \in R^{p \times p} and a nonsingular matrix X \in R^{n \times n} such that
A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T, \quad D = V [M, 0_{p \times (n-p)}] X^T,   (10)
\Upsilon = \mathrm{diag}(\upsilon_1, \ldots, \upsilon_p, 1, \ldots, 1) \in R^{n \times n}, \quad 0 \le \upsilon_1 \le \cdots \le \upsilon_p \le 1,
M = \mathrm{diag}(\mu_1, \ldots, \mu_p) \in R^{p \times p}, \quad 1 \ge \mu_1 \ge \cdots \ge \mu_p > 0, \quad \upsilon_i^2 + \mu_i^2 = 1, \; i = 1, \ldots, p.   (11)
The functional with the GSVD: let \tilde{Q} = \mathrm{diag}(\mu_1, \ldots, \mu_p, 0_{n-p}, I_{m-n}); then
J = \tilde{r}^T (I_m - A(W_D)) \tilde{r} = \|\tilde{Q} U^T \tilde{r}\|_2^2 = \|k\|_2^2.

Proof IV: Statistical Distribution of the Weighted Residual
Covariance structure: the errors in b are e \sim N(0, C_b). Since b depends on x through b = Ax, we can show b \sim N(Ax_0, C_b + A C_D A^T) (x_0 is the mean of x), so the residual satisfies
r = b - Ax_0 \sim N(0, C_b + A C_D A^T),
\tilde{r} = W_b^{1/2} r \sim N(0, I + \tilde{A} C_D \tilde{A}^T), \quad \tilde{A} = W_b^{1/2} A.
Using the GSVD, I + \tilde{A} C_D \tilde{A}^T = U Q^{-2} U^T with Q = \mathrm{diag}(\mu_1, \ldots, \mu_p, I_{n-p}, I_{m-n}).
Now let k = Q U^T \tilde{r}; then k \sim N(0, Q U^T (U Q^{-2} U^T) U Q) \sim N(0, I_m).
But J = \|\tilde{Q} U^T \tilde{r}\|^2 = \|\tilde{k}\|^2, where \tilde{k} is the vector k excluding components p + 1 : n. Thus J_D \sim \chi^2(m + p - n).

When the mean of the parameters is not known, or x_0 = 0 is not the mean
Corollary: non-central \chi^2 distribution of the regularized functional.
Recall \hat{x} = \arg\min_x J_D(x) = \arg\min_x \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}, W_D = D^T W_x D.
Under the same assumptions as before, but with the mean vector \bar{x} of the model parameters satisfying \bar{x} \neq x_0, let
c = \|c\|_2^2 = \|\tilde{Q} U^T W_b^{1/2} A (\bar{x} - x_0)\|_2^2.
Then J_D \sim \chi^2(m + p - n, c): the functional at the optimum follows a non-central \chi^2 distribution.

A further result when A is not of full column rank
The rank-deficient solution: suppose A is not of full column rank, with rank r in the regularized part. Then the filtered solution can be written in terms of the GSVD as
x_{\mathrm{FILT}}(\lambda) = \sum_{i=1}^{p} \frac{\gamma_i^2}{\upsilon_i (\gamma_i^2 + \lambda^2)} s_i \tilde{x}_i + \sum_{i=p+1}^{n} s_i \tilde{x}_i = \sum_{i=p+1-r}^{p} f_i \frac{s_i}{\upsilon_i} \tilde{x}_i + \sum_{i=p+1}^{n} s_i \tilde{x}_i,
where f_i = 0 for i = 1 : p - r and f_i = \gamma_i^2 / (\gamma_i^2 + \lambda^2) for i = p - r + 1 : p.
This yields J(x_{\mathrm{FILT}}(\lambda)) \sim \chi^2(m - n + r, c); notice that the degrees of freedom are reduced.
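A Monte Carlo sketch (assumed synthetic problem) checking the central result numerically: with e \sim N(0, C_b) and x - x_0 \sim N(0, C_D), where C_D is the pseudo-inverse of D^T W_x D for a first-difference D (so p = n - 1), the optimal value J_D should have mean m + p - n and variance 2(m + p - n).

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 60, 40
A = rng.standard_normal((m, n)) / np.sqrt(n)
D = (np.eye(n) - np.eye(n, k=1))[:-1]           # first-difference operator, p = n - 1
p = D.shape[0]
sigma_b, sigma_x = 0.1, 1.0
Wb = np.eye(m) / sigma_b ** 2
Wx = np.eye(p) / sigma_x ** 2
WD = D.T @ Wx @ D
CD = np.linalg.pinv(WD)

# Symmetric square root of C_D for sampling f ~ N(0, C_D)
w, V = np.linalg.eigh(CD)
CD_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

x0 = np.zeros(n)
trials = []
for _ in range(2000):
    x = x0 + CD_half @ rng.standard_normal(n)
    b = A @ x + sigma_b * rng.standard_normal(m)
    xhat = x0 + np.linalg.solve(A.T @ Wb @ A + WD, A.T @ Wb @ (b - A @ x0))
    JD = (A @ xhat - b) @ Wb @ (A @ xhat - b) + (xhat - x0) @ WD @ (xhat - x0)
    trials.append(JD)

trials = np.array(trials)
m_tilde = m + p - n
print("sample mean of J_D:", trials.mean(), " vs m + p - n =", m_tilde)
print("sample variance:   ", trials.var(), " vs 2(m + p - n) =", 2 * m_tilde)
```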