Inverse Problems and Regularization – An Introduction

Stefan Kindermann
Industrial Mathematics Institute
University of Linz, Austria
What are Inverse Problems?
One possible definition [Engl, Hanke, Neubauer ’96]:
Inverse problems are concerned with determining causes for a
desired or an observed effect.
Direct problem:  Cause (parameter, unknown, solution of the inverse problem, ...)  =⇒  Effect (data, observation, ...)
Inverse problem: Effect (data, observation, ...)  =⇒  Cause
Direct and Inverse Problems
The classification as direct or inverse is in most cases based on
the well-/ill-posedness of the associated problems:
Cause =⇒ Effect: stable
Effect =⇒ Cause: unstable
Inverse Problems ∼ Ill-posed/(Ill-conditioned) Problems
What are Inverse Problems?
A central feature of inverse problems is their ill-posedness.
Well-Posedness in the sense of Hadamard
[Hadamard ’23]
• Existence of a solution (for all admissible data)
• Uniqueness of a solution
• Continuous dependence of the solution on the data
Well-Posedness in the sense of Nashed [Nashed ’87]
A problem is well posed if the set of data/observations is a closed
set (i.e., the range of the forward operator is closed).
Abstract Inverse Problem
Abstract inverse problem:
Solve the equation
F(x) = y
for x ∈ X (a Banach/Hilbert/... space), given data y ∈ Y (a Banach/Hilbert/... space),
where F⁻¹ does not exist or is not continuous.
F ... forward operator
We want
“x† = F⁻¹(y)”
x† ... (generalized) solution
Abstract Inverse Problem
• If the forward operator is linear ⇒ linear inverse problem.
• A linear inverse problem is well-posed in the sense of Nashed if
the range of F is closed.
Theorem: A linear operator with finite-dimensional range is
always well-posed (in Nashed’s sense).
“Ill-posedness lives in infinite dimensional spaces”
Abstract Inverse Problem
“Ill-posedness lives in infinite dimensional spaces”
Problems with a small number of parameters usually do not need
regularization.
Discretization acts as Regularization/Stabilization
Ill-posedness in finite dimensional space ∼ Ill-conditioning
Measure of ill-posedness: decay of the singular values of the forward operator
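As an illustration (not part of the slides), the following minimal Python sketch looks at the singular values of a discretized integration operator; the choice of operator and discretization level are assumptions made only for this example.

```python
import numpy as np

# Illustrative sketch: the decay of the singular values of a discretized forward
# operator measures the degree of ill-posedness / ill-conditioning.
# F is a rectangle-rule discretization of x -> int_0^t x(s) ds (a mildly ill-posed problem).
n = 200
F = np.tril(np.ones((n, n))) / n

sigma = np.linalg.svd(F, compute_uv=False)       # singular values, in descending order
print("largest singular value :", f"{sigma[0]:.3e}")
print("smallest singular value:", f"{sigma[-1]:.3e}")
print("condition number       :", f"{sigma[0] / sigma[-1]:.1e}")
# Refining the discretization (larger n) makes the smallest singular values smaller,
# i.e. the finite-dimensional problem becomes increasingly ill-conditioned.
```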
Methodologies in studying Inverse Problems
• Deterministic inverse problems (regularization, worst-case convergence, infinite-dimensional, no assumptions on the noise)
• Statistics (estimators, average-case analysis, often finite-dimensional, noise is a random variable with specific structure)
• Bayesian inverse problems (posterior distribution, finite-dimensional, analysis of the posterior distribution by estimators, specific assumptions on noise and prior)
• Control theory (x = control, F(x) = state; convergence of the state, not the control; infinite-dimensional, no assumptions)
Deterministic Inverse Problems and Regularization
Try to solve
F (x) = y ,
when
“x† = F⁻¹(y)”
does not exist.
Notation: x† is the “true” (unknown) solution (the minimal-norm solution)
Even if F −1 (y ) exists, it might not be computable [Pour-El, Richards ’88]
Deterministic Inverse Problems and Regularization
Data noise: Usually we do not have the exact data
y = F (x † )
but only noisy data
y_δ = F(x†) + noise
Amount of noise (noise level):
δ = ‖F(x†) − y_δ‖
Deterministic Inverse Problems and Regularization
Method for solving ill-posed problems:
Regularization: approximate the inverse F⁻¹ by a family of
stable operators R_α:
F(x) = y,  “x† = F⁻¹(y)”  ⇒  x_α = R_α(y),  R_α ≈ F⁻¹
R_α ... regularization operators
α ... regularization parameter
Regularization
α small ⇒ R_α is a good approximation of F⁻¹, but unstable
α large ⇒ R_α is stable, but a poor approximation of F⁻¹
α controls the trade-off between approximation and stability.
Total error = approximation error + propagated data error
[Figure: error ‖x_α − x†‖ versus α; the approximation error increases with α, the propagated data error decreases, and the total error is minimal at an intermediate α.]
How to select α: Parameter choice rules
Example: Tikhonov Regularization
Tikhonov Regularization: [Phillips ’62; Tikhonov ’63]
Let F : X → Y be a linear operator between Hilbert spaces.
A least-squares solution of F(x) = y is given by the normal
equations
F*F x = F*y
Tikhonov regularization: solve the regularized problem
F*F x + αx = F*y,  i.e.
x_α = (F*F + αI)⁻¹ F*y
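A minimal numerical sketch of this formula (not part of the slides): the forward operator, test solution and noise level below are illustrative assumptions.

```python
import numpy as np

# Sketch: Tikhonov regularization x_alpha = (F^T F + alpha I)^(-1) F^T y_delta
# for a discretized integration operator (an ill-conditioned linear problem).
n = 200
h = 1.0 / n
t = np.linspace(0, 1, n)
F = h * np.tril(np.ones((n, n)))                    # forward operator (integration)
x_true = np.sin(2 * np.pi * t)                      # "true" solution x^dagger
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
noise *= 1e-3 / np.linalg.norm(noise)               # noise level delta = 1e-3
y_delta = F @ x_true + noise

def tikhonov(F, y, alpha):
    """Solve the regularized normal equations (F^T F + alpha I) x = F^T y."""
    return np.linalg.solve(F.T @ F + alpha * np.eye(F.shape[1]), F.T @ y)

for alpha in (1e-1, 1e-3, 1e-5, 1e-7):
    x_alpha = tikhonov(F, y_delta, alpha)
    err = np.linalg.norm(x_alpha - x_true) * np.sqrt(h)
    print(f"alpha = {alpha:.0e}   error = {err:.3e}")
# Typically the error first decreases and then increases again as alpha shrinks:
# the trade-off between approximation and stability discussed above.
```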
Example: Tikhonov Regularization
Error estimates (under some conditions)
‖x_α − x†‖² ≤ δ²/α + C α^ν
(total error ≤ propagated data error (stability) + approximation error)
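As a side remark (not on the slide), balancing the two terms of this bound shows where the optimal α and the rates quoted later come from; identifying ν with 2µ via the Hölder source condition introduced below is an assumption of this sketch.

```latex
% Balance the propagated data error delta^2/alpha with the approximation error C*alpha^nu:
\[
  \frac{\delta^2}{\alpha} \;=\; C\,\alpha^{\nu}
  \quad\Longrightarrow\quad
  \alpha_* \sim \delta^{\frac{2}{\nu+1}},
  \qquad
  \|x_{\alpha_*} - x^\dagger\|^2 \;\le\; C'\,\delta^{\frac{2\nu}{\nu+1}} .
\]
% With nu = 2*mu (Hoelder source condition x^dagger = (F^*F)^mu * omega, see below),
% this gives the order-optimal rate ||x_alpha - x^dagger|| ~ delta^{2mu/(2mu+1)}.
```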
Theory of linear and nonlinear problems in Hilbert spaces:
[Tikhonov, Arsenin ’77; Groetsch ’84; Hofmann ’86; Baumeister ’87; Louis ’89;
Kunisch, Engl, Neubauer ’89; Bakushinskii, Goncharskii ’95; Engl, Hanke, Neubauer
’96; Tikhonov, Leonov, Yagola ’98; . . . ]
Example: Landweber iteration
Landweber iteration
[Landweber ’51]
Solve the normal equations by Richardson iteration
Landweber iteration
x_{k+1} = x_k − F*(F(x_k) − y),   k = 0, 1, ...
The iteration index k plays the role of the regularization parameter: α = 1/k.
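A minimal sketch (not from the slides) of the linear Landweber iteration on a toy problem; the relaxation factor w is an added assumption so that the iteration is convergent when ‖F‖ > 1 (the slide's form corresponds to a rescaled F with norm at most 1).

```python
import numpy as np

# Sketch: scaled Landweber iteration x_{k+1} = x_k - w * F^T (F x_k - y_delta)
# for a discretized integration operator; alpha ~ 1/k is the regularization parameter.
n = 200
h = 1.0 / n
t = np.linspace(0, 1, n)
F = h * np.tril(np.ones((n, n)))
x_true = np.sin(2 * np.pi * t)
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
noise *= 1e-3 / np.linalg.norm(noise)
y_delta = F @ x_true + noise

w = 1.0 / np.linalg.norm(F, 2) ** 2      # step size, ensures w * ||F||^2 <= 1
x = np.zeros(n)
errors = []
for k in range(1, 5001):
    x = x - w * (F.T @ (F @ x - y_delta))          # one Landweber step
    errors.append(np.linalg.norm(x - x_true) * np.sqrt(h))

# Semiconvergence: with noisy data the error typically decreases first and grows again
# later, so the stopping index k acts as the regularization parameter.
k_best = int(np.argmin(errors)) + 1
print(f"best stopping index k = {k_best}, error = {errors[k_best - 1]:.3e}")
```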
Example: Landweber iteration
Error estimates (under some conditions)
‖x_k − x†‖² ≤ k δ² + C / k^ν
(total error ≤ propagated data error (stability) + approximation error)
Semiconvergence
Iterative Regularization Methods:
Parameter choice = choice of stopping index k
Theory: [Landweber ’51; Fridman ’56; Bialy ’59; Strand ’74; Vasilev ’83; Groetsch
’85; Natterer ’86; Hanke, Neubauer, Scherzer ’95; Bakushinskii, Goncharskii ’95;
Engl, Hanke, Neubauer ’96;. . . ]
Notion of Convergence
Does the regularized solution converge to the true solution as the
noise level tends to 0?
x_α → x†  as  δ → 0
(Worst-case) convergence (for a given parameter choice rule):
lim_{δ→0} sup{ ‖x_α − x†‖ : ‖y_δ − F(x†)‖ ≤ δ } = 0
Convergence in expectation:
E‖x_α − x†‖² → 0  as  E‖y_δ − F(x†)‖² → 0
Theory of Regularization of Inverse Problems
Convergence depends on x†.
Question of speed: convergence rates
‖x_α − x†‖ ≤ f(α)   or   ‖x_α − x†‖ ≤ f(δ)
Theoretical Results
[Schock ’85]:
Convergence can be arbitrarily slow!
Theorem: For problems that are ill-posed in the sense of Nashed, there
is no function f with lim_{δ→0} f(δ) = 0 such that for all x†
‖x_α − x†‖ ≤ f(δ).
Uniform bounds on the convergence rates are impossible.
Convergence rates are possible if x† lies in some smoothness class.
Theoretical Results
Convergence rates require a source condition:
x† ∈ M
Convergence rates ∼ modulus of continuity of the inverse:
Ω(δ, M) = sup{ ‖x†₁ − x†₂‖ : ‖F(x†₁) − F(x†₂)‖ ≤ δ, x†₁, x†₂ ∈ M }
Theorem [Tikhonov, Arsenin ’77; Morozov ’92; Traub, Wozniakowski ’80]:
For an arbitrary regularization map and an arbitrary parameter choice rule
(with R_α(0) = 0), in the worst case
‖x_α − x†‖ ≥ Ω(δ, M)
Theoretical Results
Standard smoothness classes:
For linear ill-posed problems in Hilbert spaces one can take
M = X^µ = { x† = (F*F)^µ ω : ω ∈ X }
(Hölder) source condition (= abstract smoothness condition)
Ω(δ, X^µ) = C δ^{2µ/(2µ+1)}
Best convergence rate for Hölder source conditions:
A regularization operator together with a parameter choice rule such that
‖x_α − x†‖ = C δ^{2µ/(2µ+1)}
is called order optimal.
Theoretical Results
Special case:
x† = F*ω
Such source conditions can be generalized to nonlinear problems, e.g.
x† = F'(x†)* ω
x† = (F'(x†)* F'(x†))^ν ω
Theoretical Results
Many regularization methods have been shown to be order optimal.
A significant amount of the theoretical results in regularization theory
deals with this issue:
convergence of a method together with a parameter choice rule,
and optimal-order convergence under a source condition.
The source condition itself does not have to be known.
Parameter Choice Rules
How to choose the regularization parameter:
Classification
a-priori: α = α(δ)
a-posteriori: α = α(δ, y)
heuristic: α = α(y)
Bakushinskii veto
Bakushinskii veto: [Bakushinskii ’84] A parameter choice without
knowledge of δ cannot yield a convergent regularization in the
worst case (for ill-posed problems).
Knowledge of δ is needed!
⇒ heuristic parameter choice rules are nonconvergent in the worst
case
a-priori-rules
Example of an a-priori rule:
If x† ∈ X^µ, then
α = δ^{2/(2µ+1)}
yields the optimal order for Tikhonov regularization.
+ Easy to implement
− Needs information on the source condition
a-posteriori rules
Example of a-posteriori rules:
Morozov’s discrepancy principle: [Morozov ’66]
Fix τ > 1.
DP: Choose the largest α such that the residual is of the order of
the noise level:
‖F(x_α) − y‖ ≤ τδ
Yields in many situations an optimal-order method.
+ Easy to implement
+ No information on the source condition needed
− In some cases not of optimal order
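A minimal sketch of the discrepancy principle (not from the slides), reusing the Tikhonov solver and toy problem of the earlier sketches; the grid of candidate α values and τ = 1.5 are illustrative assumptions.

```python
import numpy as np

def tikhonov(F, y, alpha):
    """Tikhonov regularization: solve (F^T F + alpha I) x = F^T y."""
    return np.linalg.solve(F.T @ F + alpha * np.eye(F.shape[1]), F.T @ y)

def discrepancy_principle(F, y_delta, delta, tau=1.5, alphas=None):
    """Return the largest alpha on a descending grid with ||F x_alpha - y_delta|| <= tau*delta."""
    if alphas is None:
        alphas = 10.0 ** np.arange(0, -12, -0.5)   # candidate alphas, from large to small
    x_alpha = None
    for alpha in alphas:
        x_alpha = tikhonov(F, y_delta, alpha)
        if np.linalg.norm(F @ x_alpha - y_delta) <= tau * delta:
            return alpha, x_alpha                  # first (= largest) admissible alpha
    return alphas[-1], x_alpha                     # fallback: smallest alpha on the grid

# Example usage on the toy problem of the earlier sketches:
n, h = 200, 1.0 / 200
F = h * np.tril(np.ones((n, n)))
x_true = np.sin(2 * np.pi * np.linspace(0, 1, n))
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
noise *= 1e-3 / np.linalg.norm(noise)
y_delta, delta = F @ x_true + noise, 1e-3
alpha, x_alpha = discrepancy_principle(F, y_delta, delta)
print(f"discrepancy principle selected alpha = {alpha:.2e}")
```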
Other a-posteriori choice rules:
Gfrerer–Raus rule (improved discrepancy principle) [Raus ’85; Gfrerer ’87]
Balancing principle [Lepski ’90; Mathe, Pereverzev ’03]
...
Heuristic Parameter Choice rules
Example of heuristic rules:
Quasi-optimality rule
[Tikhonov, Glasko ’64]
Choose a sequence of geometrically decaying regularization
parameters
α_n = C q^n,  q < 1.
For each α_n compute x_{α_n}.
Choose α = α_{n*}, where n* is the minimizer of
‖x_{α_{n+1}} − x_{α_n}‖
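A minimal sketch of this rule (not from the slides); the constants C, q and the number of candidate parameters N are illustrative assumptions.

```python
import numpy as np

# Sketch of the quasi-optimality rule: compute x_{alpha_n} along a geometric sequence
# alpha_n = C*q^n and pick the n minimizing ||x_{alpha_{n+1}} - x_{alpha_n}||.
def quasi_optimality(solve, C=1.0, q=0.5, N=40):
    """solve(alpha) -> x_alpha.  Returns the selected (alpha_{n*}, x_{alpha_{n*}})."""
    alphas = [C * q ** n for n in range(N)]
    xs = [solve(a) for a in alphas]
    diffs = [np.linalg.norm(xs[n + 1] - xs[n]) for n in range(N - 1)]
    n_star = int(np.argmin(diffs))
    return alphas[n_star], xs[n_star]

# Example usage with the Tikhonov solver of the earlier sketches
# (F and y_delta assumed to be defined as there):
#   alpha, x_alpha = quasi_optimality(lambda a: tikhonov(F, y_delta, a))
```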
Heuristic Parameter Choice rules
Example of heuristic rules:
Hanke–Raus rule [Hanke, Raus ’96]
Choose α as the minimizer of
(1/√α) ‖F(x_α) − y‖
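A minimal sketch of this rule (not from the slides); the grid of candidate α values is an illustrative assumption, and no knowledge of δ is used.

```python
import numpy as np

# Sketch of the Hanke-Raus rule: choose alpha minimizing ||F(x_alpha) - y|| / sqrt(alpha)
# over a grid of candidate values.
def hanke_raus(solve, residual_norm, alphas):
    """solve(alpha) -> x_alpha;  residual_norm(x) = ||F(x) - y_delta||."""
    scored = [(residual_norm(solve(a)) / np.sqrt(a), a) for a in alphas]
    _, alpha_star = min(scored)
    return alpha_star, solve(alpha_star)

# Example usage with the Tikhonov solver of the earlier sketches
# (F and y_delta assumed to be defined as there):
#   alphas = 10.0 ** np.arange(-1, -10, -0.5)
#   alpha, x_alpha = hanke_raus(lambda a: tikhonov(F, y_delta, a),
#                               lambda x: np.linalg.norm(F @ x - y_delta), alphas)
```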
Heuristic Parameter Choice rules: Theory
Heuristic rules cannot converge in the worst case, but convergence
holds in the restricted noise case [K., Neubauer ’08; K. ’11]:
‖x_α − x†‖ → 0 as δ → 0,
if y_δ = F(x†) + noise with noise ∈ N.
The condition
noise ∈ N
is an abstract noise condition.
Heuristic Parameter Choice rules: Theory
In the linear case, reasonable noise conditions can be stated, and
convergence and convergence rates can be shown.
Noise condition: "the data noise has to be sufficiently irregular"
Nonlinear Case: Tikhonov Regularization
F (x) = y
with F nonlinear
Tikhonov Regularization for Nonlinear Problems
[Tikhonov, Arsenin ’77; Engl, Kunisch Neubauer, ’89; Neubauer ’89, . . . ]
x_α is a (global) minimizer of the Tikhonov functional
J(x) = ‖F(x) − y‖² + αR(x)
R(x) is a regularization functional
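A minimal numerical sketch of this approach (not from the slides), using a toy nonlinear forward operator and scipy's generic local optimizer; the operator, the quadratic penalty R(x) = ‖x‖² and the use of L-BFGS-B are assumptions made for illustration only, and a local optimizer does not guarantee the global minimizer required by the theory.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: Tikhonov regularization for a toy nonlinear forward operator.
# J(x) = ||F(x) - y_delta||^2 + alpha * R(x)  with  R(x) = ||x||^2.
# NOTE: a generic local optimizer only returns a stationary point; the theory on this
# slide requires a *global* minimizer, which is not guaranteed here.
n = 50
A = np.tril(np.ones((n, n))) / n

def F(x):                      # toy nonlinear operator: integration followed by tanh
    return np.tanh(A @ x)

t = np.linspace(0, 1, n)
x_true = np.sin(2 * np.pi * t)
rng = np.random.default_rng(0)
y_delta = F(x_true) + 1e-3 * rng.standard_normal(n)

def J(x, alpha):
    return np.sum((F(x) - y_delta) ** 2) + alpha * np.sum(x ** 2)

alpha = 1e-4
res = minimize(J, x0=np.zeros(n), args=(alpha,), method="L-BFGS-B")
x_alpha = res.x
print(f"J(x_alpha) = {res.fun:.3e}, error = {np.linalg.norm(x_alpha - x_true) / np.sqrt(n):.3e}")
```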
Nonlinear Case: Tikhonov Regularization
Convergence (Rates) Theory:
Hilbert spaces [Engl, Kunisch Neubauer ’89; Neubauer ’89]
Banach spaces [Kaltenbacher, Hofmann, Pöschl, Scherzer ’08]
Parameter Choice rules:
a-priori: α = δ^ξ
a-posteriori: Discrepancy principle
Nonlinear Case: Tikhonov Regularization
Examples:
Sobolev norm: R(x) = ‖x‖²_{H^s}
Total variation: R(x) = ∫ |∇x|
L¹-norm: R(x) = ∫ |x|
Maximum entropy: R(x) = ∫ |x| log(x)
Nonlinear Case: Tikhonov Regularization
Choice of the Regularization functional:
Deterministic theory: the user can choose. R(x)
• should stabilize the problem
• should be such that the convergence theory applies
• should reflect what we expect from the solution
Bayesian viewpoint: Regularization functional ∼ prior
Nonlinear Case: Tikhonov Regularization
Computational issue:
The regularized solution is a global minimizer of an optimization
problem:
x_α is a (global) minimizer of
J(x) = ‖F(x) − y‖² + αR(x)
Iterative Methods
Example:
Nonlinear Landweber iteration
[Hanke, Neubauer, Scherzer ’95]
x_{k+1} = x_k − F'(x_k)* (F(x_k) − y)
Parameter choice by choosing the stopping index.
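A minimal sketch of this iteration (not from the slides), applied to the toy nonlinear operator F(x) = tanh(Ax) used in the Tikhonov sketch above; the step size, the discrepancy-principle stopping rule and the noise level are illustrative assumptions.

```python
import numpy as np

# Sketch of the nonlinear Landweber iteration
#   x_{k+1} = x_k - w * F'(x_k)^* (F(x_k) - y_delta)
# for the toy operator F(x) = tanh(A x).
n = 50
A = np.tril(np.ones((n, n))) / n

def F(x):
    return np.tanh(A @ x)

def Fprime_adjoint(x, r):                      # F'(x)^* r = A^T diag(1 - tanh(Ax)^2) r
    return A.T @ ((1.0 - np.tanh(A @ x) ** 2) * r)

t = np.linspace(0, 1, n)
x_true = np.sin(2 * np.pi * t)
rng = np.random.default_rng(0)
y_delta = F(x_true) + 1e-3 * rng.standard_normal(n)
delta = 1e-3 * np.sqrt(n)                      # rough noise level

x, w, tau = np.zeros(n), 1.0 / np.linalg.norm(A, 2) ** 2, 2.0
for k in range(10000):
    residual = F(x) - y_delta
    if np.linalg.norm(residual) <= tau * delta:    # discrepancy principle as stopping rule
        break
    x = x - w * Fprime_adjoint(x, residual)
print(f"stopped at k = {k}, error = {np.linalg.norm(x - x_true) / np.sqrt(n):.3e}")
```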
Convergence rates theory needs a nonlinearity condition
‖F(x) − F(x†) − F'(x†)(x − x†)‖ ≤ C ‖F(x) − F(x†)‖
Restricts the nonlinearity of the problem
Variants of a nonlinearity condition
Range-invariance [Blaschke/Kaltenbacher ’96]
Curvature condition [Chavent, Kunisch ’98]
Variational inequalities [Kaltenbacher, Hofmann, Pöschl, Scherzer ’08]
Faster alternative: Gauss-Newton type iterations [Bakushinskii ’92,
Blaschke, Neubauer, Scherzer ’97]
Summary
Theoretical issues:
For a given inverse problem
Understand ill-posedness (Uniqueness/Stability)
Are the data rich enough to characterize the solution uniquely?
How unstable is the inverse problem (degree of ill-posedness)
Method of Regularization + Parameter Choice
Design efficient regularization methods for a class of problems
Convergence, Convergence rates (optimal order),
Interplay: Regularization, Discretization
Practical issues:
How to compute the global optimum in Tikhonov regularization (efficiently)
Improving iterative methods (Newton-type, preconditioning)
What Regularization term to choose
Dynamic Inverse Problems
The forward operator and the solution x(t) depend on time:
F(x(t′ ≤ t), t) = y(t)
Dynamic Inverse Problems
Examples:
Volterra integral equation of the first kind (see the sketch after this list):
∫₀ᵗ k(t, s) x(s) ds = y(t)
Parameter identification in ODEs:
y′(t) = f(t, y(t), x(t))
Control theory:
z′(t) = A z(t) + B x(t)
y′(t) = C z(t) + D x(t)
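A minimal sketch (not from the slides) for the Volterra example: the discretized equation can be solved causally because the system matrix is lower triangular, but this direct inverse is unstable; the kernel k(t, s) = exp(−(t − s)) and the noise level are illustrative assumptions.

```python
import numpy as np

# Sketch: discretized Volterra equation of the first kind, int_0^t k(t,s) x(s) ds = y(t),
# with k(t,s) = exp(-(t-s)) and a rectangle rule.  Forward substitution is causal but
# strongly amplifies data noise (the finite-dimensional footprint of ill-posedness).
n = 200
t = np.linspace(0, 1, n)
h = 1.0 / n
K = h * np.tril(np.exp(-(t[:, None] - t[None, :])))   # lower-triangular system matrix
x_true = np.sin(2 * np.pi * t)
y = K @ x_true

rng = np.random.default_rng(0)
y_noisy = y + 1e-4 * rng.standard_normal(n)

x_exact_data = np.linalg.solve(K, y)        # direct solve, exact data: accurate
x_noisy_data = np.linalg.solve(K, y_noisy)  # direct solve, noisy data: strongly amplified error
print("error (exact data):", f"{np.linalg.norm(x_exact_data - x_true) * np.sqrt(h):.3e}")
print("error (noisy data):", f"{np.linalg.norm(x_noisy_data - x_true) * np.sqrt(h):.3e}")
```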
Methods
Example: Tikhonov Regularization
∫₀ᵀ ‖F(t, x(·, t)) − y(t)‖² dt + α R(x)
+ Convergence
− Not causal/sequential: Computation of x(t) requires all
data (past/future)
Methods
Alternative:
Dynamic Programming
[K., Leitao ’06]
x′(t) = G(x(t), V(t))
+ Convergence
− Only for linear problems
− Partially causal/sequential: Computation of V (t) requires
all data (past/future)
Methods
Control Theoretic Methods:
Feedback control
x(t) = K y(t)
(x(t), x′(t)) = K y(t)
− Convergence in x (asymptotic convergence)?
− Fully causal/sequential: Computation of x(t) requires only
data (at t)
+ Nonlinear
Methods
Control Theoretic Methods:
Kalman filter
− Restrictive Assumptions on noise
+ Fully causal/sequential
Methods
Local Regularization
[Lamm, Scofield ’01; Lamm ’03]
x_α(t) is given by an ODE related to the Volterra equation
+ Fully causal/sequential
+ Convergence theory
+ Nonlinear
− Quite specific method for Volterra equations
Methods
Kügler’s online parameter identification
[Kügler ’08]
x′(t) = G(x(t))* (F(x(t)) − y(t))
+ Fully causal/sequential
+ Asymptotic convergence theory (also for nonlinear case)
− Are the assumptions realistic?
− Assumes that x does not depend on time
The desired method
fully causal/sequential method
convergence theory in the ill-posed and nonlinear case
no/weak assumptions on operator
no/weak assumptions on solution
no assumption on noise
efficient to compute