Parallel Total Variation Minimization

Introduction
Total Variation Regularization
Domain Decomposition
The Algorithm
Jahn Müller
[email protected]
27.01.2009
Outline
1 Introduction
2 Total Variation Regularization
    The ROF model
    Different Formulations
3 Domain Decomposition
    Overlapping Decomposition
    Schwarz Methods
4 The Algorithm
    Numerical Realization
    Results
Introduction
it is desirable to extend 2D algorithms to 3D
currently used workstations reach their technical limits
remedy:
divide the original problem into subproblems
solve them independently on several CPUs
merge the results into a solution of the complete problem
data dependencies may arise (neglecting them leads to undesirable effects at the interfaces)
Image Denoising
function space for denoising
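Lebesgue space
\[ L^p(\Omega) := \Bigl\{ u : \Omega \to \bar{\mathbb{R}} \;\Big|\; \int_\Omega |u|^p \, dx < \infty \Bigr\}, \quad p \ge 1 \]
contains oscillating images, in particular noise
Sobolev space
\[ W^{1,p}(\Omega) := \Bigl\{ u \in L^p(\Omega) \;\Big|\; \frac{\partial u}{\partial x_j} \in L^p(\Omega),\ j = 1, \dots, d \Bigr\}, \quad p \ge 1 \]
too restrictive, e.g. for piecewise constant images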
need for a proper space
space of functions with bounded variation (BV)
Total Variation
Total Variation Denoising
The ROF model
Different Formulations
Total Variation Regularization
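Let $f : \Omega \subset \mathbb{R}^d \to \mathbb{R}$ be a noisy version of a given image $u_0$ with noise variance given by $\int_\Omega (u_0 - f)^2 \, dx \le \sigma^2$.
The ROF model (Rudin, Osher, Fatemi):
\[ \hat{u} = \arg\min_{u \in BV(\Omega)} \underbrace{\frac{\lambda}{2} \int_\Omega (u - f)^2 \, dx}_{\text{data fitting}} + \underbrace{|u|_{TV}}_{\text{regularization}} \tag{1} \]
where
\[ |u|_{TV} := \sup_{\substack{\varphi \in C_0^\infty(\Omega)^d \\ \|\varphi\|_\infty \le 1}} \int_\Omega u \, \nabla \cdot \varphi \, dx \]
denotes the total variation of $u$ and
$BV(\Omega) = \{ u \in L^1(\Omega) \mid |u|_{TV} < \infty \}$ is the space of functions with bounded total variation.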
Total Variation Regularization
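depending on the exact definition of the supremum norm
\[ \|p\|_\infty := \operatorname*{ess\,sup}_{x \in \Omega} \|p(x)\|_{\ell^r} \]
family of equivalent seminorms:
\[ \int_\Omega |Du|_{\ell^s} = \sup_{\substack{\varphi \in C_0^\infty(\Omega)^d \\ \|\varphi\|_\infty \le 1}} \int_\Omega u \, \nabla \cdot \varphi \, dx \]
with $\frac{1}{s} + \frac{1}{r} = 1$ (Hölder conjugate)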
isotropic total variation (r = 2)
anisotropic total variation (r = ∞)
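A minimal sketch of the two seminorms on a pixel grid, assuming forward differences and a zero difference at the last row and column (this discretization and the name `total_variation` are illustrative choices, not taken from the slides). With r = 2 the conjugate exponent is s = 2, so the gradient is measured in the Euclidean norm; with r = ∞ it is s = 1, so the absolute differences are summed.

```python
import math

def total_variation(u, isotropic=True):
    """Discrete total variation of a 2D image given as a list of rows."""
    rows, cols = len(u), len(u[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            # forward differences, set to zero past the last row / column
            dx = u[i + 1][j] - u[i][j] if i + 1 < rows else 0.0
            dy = u[i][j + 1] - u[i][j] if j + 1 < cols else 0.0
            if isotropic:
                tv += math.hypot(dx, dy)   # l^2 norm of the gradient (s = 2)
            else:
                tv += abs(dx) + abs(dy)    # l^1 norm of the gradient (s = 1)
    return tv

# 4 x 4 test image with a diagonal edge: the two seminorms differ here
image = [[1.0 if i + j >= 4 else 0.0 for j in range(4)] for i in range(4)]
tv_iso = total_variation(image, isotropic=True)
tv_aniso = total_variation(image, isotropic=False)
```

For an edge aligned with the grid axes both values coincide; a diagonal edge is cheaper in the isotropic seminorm than in the anisotropic one.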
Isotropic vs Anisotropic Total Variation
[Figure: (a) Original image, (b) Noisy image, (c) Isotropic TV, (d) Anisotropic TV]
Total Variation Regularization
The functional
\[ J(u) := \frac{\lambda}{2} \int_\Omega (u - f)^2 \, dx + |u|_{TV} \]
is strictly convex and attains a unique minimum in BV (lower semicontinuity and weak* compactness of the sub-level sets).
optimality condition in terms of subgradients
($\partial J(u) := \{ p \in U^* \mid J(w) \ge J(u) + \langle p, w - u \rangle \ \forall w \in U \}$):
\[ 0 \in \partial J(u), \quad \text{i.e.} \quad 0 \in \lambda (u - f) + \partial |u|_{TV} \]
Total Variation Regularization
Primal Formulation:
\[ J(u) = \frac{\lambda}{2} \int_\Omega (u - f)^2 \, dx + \int_\Omega |\nabla u| \, dx \]
for $u$ sufficiently smooth, in particular $u \in W^{1,1}(\Omega)$.
Euler-Lagrange equation:
\[ \lambda (u - f) - \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) = 0 \]
perturb the norm to overcome the singularity at $\nabla u = 0$:
\[ |\nabla u|_\beta := \sqrt{|\nabla u|^2 + \beta} \]
solution methods: steepest descent (Rudin et al.), fixed point iteration (Vogel and Oman), Newton's method (Chan et al.)
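A minimal sketch of steepest descent on the perturbed primal functional, assuming a 1D signal, forward differences and a fixed step size (all names, the test signal and the step-size rule are illustrative assumptions, not the implementation from Rudin et al.). The step 1 / (λ + 4 / √β) bounds the curvature of the discrete functional from above, so each step cannot increase the energy.

```python
import math

def energy(u, f, lam, beta):
    """Discrete J(u) with the perturbed norm |.|_beta."""
    fit = 0.5 * lam * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    reg = sum(math.sqrt((u[i + 1] - u[i]) ** 2 + beta) for i in range(len(u) - 1))
    return fit + reg

def gradient(u, f, lam, beta):
    """Discrete version of lam * (u - f) - div(grad u / |grad u|_beta)."""
    g = [lam * (ui - fi) for ui, fi in zip(u, f)]
    for i in range(len(u) - 1):
        d = u[i + 1] - u[i]
        flux = d / math.sqrt(d * d + beta)
        g[i] -= flux
        g[i + 1] += flux
    return g

def steepest_descent(f, lam=2.0, beta=1e-2, steps=200):
    step = 1.0 / (lam + 4.0 / math.sqrt(beta))  # conservative fixed step size
    u = list(f)
    for _ in range(steps):
        g = gradient(u, f, lam, beta)
        u = [ui - step * gi for ui, gi in zip(u, g)]
    return u

# hypothetical noisy step signal, for illustration only
noisy = [0.1, -0.1, 0.05, -0.05, 1.1, 0.9, 1.05, 0.95]
denoised = steepest_descent(noisy)
```

A small β keeps the perturbed functional close to the original one but forces a small step, which is one reason the fixed point and Newton variants are attractive.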
Total Variation Regularization
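exact formulation:
\[ \min_{u \in BV(\Omega)} \ \sup_{\|p\|_\infty \le 1} \ \underbrace{\frac{\lambda}{2} \int_\Omega (u - f)^2 \, dx + \int_\Omega u \, \nabla \cdot p \, dx}_{=: L(u,p)} \]
interchange min and sup (not trivial!):
\[ \min_u \sup_p L(u, p) = \max_p \min_u L(u, p) \]
Dual Formulation:
\[ \min_{\|p\|_\infty \le 1} \left\| \frac{1}{\lambda} \nabla \cdot p - f \right\|_2^2 \tag{2} \]
after solving (2) we obtain the solution for $u$ from
\[ u = f - \frac{1}{\lambda} \nabla \cdot p \]
solution methods: projection algorithm (Chambolle)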
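The step from the saddle point problem to the dual formulation can be written out as follows. This is a formal sketch that assumes the interchange of min and sup and enough regularity to differentiate L with respect to u; it uses only the symbols defined above.

```latex
\begin{align*}
\partial_u L(u,p) = \lambda (u - f) + \nabla \cdot p = 0
  \;&\Longrightarrow\; u = f - \tfrac{1}{\lambda} \nabla \cdot p, \\
L\bigl(f - \tfrac{1}{\lambda} \nabla \cdot p,\; p\bigr)
  &= \frac{1}{2\lambda} \|\nabla \cdot p\|_2^2
     + \int_\Omega f \, \nabla \cdot p \, dx
     - \frac{1}{\lambda} \|\nabla \cdot p\|_2^2 \\
  &= \frac{\lambda}{2} \|f\|_2^2
     - \frac{\lambda}{2} \Bigl\| \tfrac{1}{\lambda} \nabla \cdot p - f \Bigr\|_2^2 .
\end{align*}
```

Since the first term does not depend on p, maximizing over the set of p with norm at most 1 is the same as minimizing the squared norm, which gives the dual problem.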
Total Variation Regularization
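Primal Dual Formulation:
\[ \inf_{u \in BV(\Omega)} \ \sup_{\|p\|_\infty \le 1} \ \underbrace{\frac{\lambda}{2} \int_\Omega (u - f)^2 \, dx + \int_\Omega u \, \nabla \cdot p \, dx}_{=: L(u,p)} \]
For a saddle point we obtain the following optimality conditions
\[ \frac{\partial L}{\partial u} = \lambda (u - f) + \nabla \cdot p = 0 \tag{3} \]
and
\[ L(u, p) \ge L(u, q) \quad \forall q, \ \|q\|_\infty \le 1. \tag{4} \]
(4) implies $\nabla \cdot p \in \partial |u|_{TV}$ and hence $0 \in \partial J(u)$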
Penalty and Barrier Approximation
approximate the constraint $\|p\| \le 1$ in (3) by using barrier or penalty methods
Penalty approximation
\[ L_\varepsilon(u, p) = L(u, p) - \frac{1}{\varepsilon} F(\|p\| - 1) \]
with $\varepsilon > 0$ small and a term $F$ that penalizes $\|p\| - 1 > 0$.
Typical example:
\[ F(s) = \frac{1}{2} \max\{s, 0\}^2 \]
still allows violations of the constraint
Penalty and Barrier Approximation
Barrier approximation
add a continuous barrier term to L such that G (s) = ∞, if the
constraint is violated
\[ L_\varepsilon(u, p) = L(u, p) - \varepsilon \, G(\|p\|^2 - 1) \]
For example:
G (s) = − log(−s)
ensures that the constraint is not violated
also called interior-point method
choice of approximation affects the shape of the solution u
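A minimal sketch of the two example terms, F(s) = ½ max{s, 0}² and G(s) = −log(−s), evaluated for a scalar argument s (the function names are illustrative). It shows the qualitative difference: the penalty is finite for s > 0, the barrier is infinite as soon as s ≥ 0.

```python
import math

def penalty(s):
    """F(s) = 1/2 * max(s, 0)^2: zero while the constraint s <= 0 holds."""
    return 0.5 * max(s, 0.0) ** 2

def barrier(s):
    """G(s) = -log(-s): finite only strictly inside the feasible set s < 0."""
    return -math.log(-s) if s < 0.0 else math.inf

# s is the constraint value, e.g. |p| - 1 for the penalty term
for s in (-0.5, -0.01, 0.2):
    print(s, penalty(s), barrier(s))
```

The penalty term only charges a finite cost for s > 0, so slightly infeasible iterates remain admissible; the barrier term grows without bound as s approaches 0 from below.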
Penalty and Barrier Approximation
[Figure: results of the penalty method and of the barrier method; plot annotations $h = 10^{-3}$ with $\varepsilon = 10^{-3}$ and $\varepsilon = 0.1$]
Newton method with damping
Optimality conditions (using penalty approximation)
\[ \frac{\partial L_\varepsilon}{\partial u} = \lambda (u - f) + \nabla \cdot p = 0 \]
\[ \frac{\partial L_\varepsilon}{\partial p} = -\nabla u - \frac{2}{\varepsilon} H(p) = 0 \]
with $H(p)$ being the derivative of $F(\|p\| - 1)$.
Linearize $H(p)$ via first-order Taylor approximation
\[ H(p^{k+1}) \approx H(p^k) + H'(p^k)(p^{k+1} - p^k) \]
Add damping term, with damping parameter $\tau$
\[ \tau^k (p^{k+1} - p^k) \]
Newton method with damping
Solve in each step
\[ \lambda (u^{k+1} - f) + \nabla \cdot p^{k+1} = 0 \]
\[ -\nabla u^{k+1} - \frac{2}{\varepsilon} H'(p^k)(p^{k+1} - p^k) - \frac{2}{\varepsilon} H(p^k) - \tau^k (p^{k+1} - p^k) = 0 \]
linear system and easy to discretize
choose ε → 0 during iteration, to obtain fast convergence
start with small value τ and increase during iteration, to avoid oscillations
ε and τ are chosen from experimental runs
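A minimal 1D sketch of one way to carry out this iteration, under several assumptions that are not from the slides: a 1D signal, dual unknowns between neighbouring samples with p = 0 at both ends, fixed ε and τ instead of the adaptive choice described above, and elimination of u from the first equation so that only a tridiagonal system in p remains. All function names are illustrative.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (lower[0] and upper[-1] are not used)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def H(p):
    """Derivative of 1/2 * max(|p| - 1, 0)^2."""
    return (abs(p) - 1.0) * (1.0 if p > 0 else -1.0) if abs(p) >= 1.0 else 0.0

def H_prime(p):
    return 1.0 if abs(p) >= 1.0 else 0.0

def damped_newton_1d(f, lam=2.0, eps=1e-2, tau=1.0, iterations=30):
    n, m = len(f), len(f) - 1          # m dual unknowns between the samples
    df = [f[i + 1] - f[i] for i in range(m)]
    p = [0.0] * m
    for _ in range(iterations):
        # insert u = f - div(p) / lam into the second equation:
        # (2/lam + c_i) p_i - (p_{i-1} + p_{i+1}) / lam
        #     = -df_i - (2/eps) H(p_i^k) + c_i p_i^k
        c = [2.0 / eps * H_prime(pi) + tau for pi in p]
        diag = [2.0 / lam + ci for ci in c]
        off = [-1.0 / lam] * m
        rhs = [-df[i] - 2.0 / eps * H(p[i]) + c[i] * p[i] for i in range(m)]
        p = thomas(off, diag, off, rhs)
    u = []
    for i in range(n):
        right = p[i] if i < m else 0.0   # p vanishes outside the signal
        left = p[i - 1] if i > 0 else 0.0
        u.append(f[i] - (right - left) / lam)
    return u, p

# hypothetical noisy step signal, for illustration only
noisy = [0.1, -0.1, 0.05, -0.05, 1.1, 0.9, 1.05, 0.95]
u, p = damped_newton_1d(noisy)
```

Because u is recovered from f by subtracting a discrete divergence with zero boundary values, the mean of the signal is preserved regardless of how far the iteration has converged.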
Overlapping Decomposition
Schwarz Methods
Domain Decomposition
Example
\[ Lu = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega \]
split the given domain Ω of the problem into subdomains $\Omega_i$, $i = 1, \dots, S$
overlapping or non-overlapping decompositions
when all unknowns are coupled, a straightforward splitting leads to significant errors at the interfaces
transmission conditions at the interface
avoid computing transmission conditions by using overlapping decompositions
Overlapping Decomposition
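[Figure: overlapping decomposition showing $\Omega_1$, $\Omega_2$, $\Omega'_1$, $\Omega'_2$, the interfaces $\Gamma_1$, $\Gamma_2$ and the overlap width $\delta$]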
for a uniform lattice with step size $h$: $\delta = mh$ with $m \in \mathbb{N}$
redundant degrees of freedom
obtain updates for the boundary data from this redundancy
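A minimal sketch of an overlapping splitting of a 1D index range, assuming strips of nearly equal size where each strip is extended by m cells into its right-hand neighbour, so that neighbouring strips share exactly m indices (δ = mh). The function name and the one-sided extension are illustrative choices.

```python
def overlapping_ranges(n, parts, m):
    """Split indices 0..n-1 into `parts` strips; neighbours share m indices.

    Returns half-open (start, stop) pairs.
    """
    base, extra = divmod(n, parts)
    ranges = []
    start = 0
    for k in range(parts):
        stop = start + base + (1 if k < extra else 0)
        # extend to the right by m cells, clipped at the domain boundary
        ranges.append((start, min(stop + m, n)))
        start = stop
    return ranges

strips = overlapping_ranges(16, 4, 2)
```

The shared indices are the redundant degrees of freedom: each of them is computed on two subdomains, and the copy from the neighbour supplies the boundary data for the next iteration.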
Additive Schwarz Method
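Let $u^{(0)}$ be an initial function,
\[ \begin{cases} L u_1^{(k+1)} = f & \text{in } \Omega_1 \\ u_1^{(k+1)} = u^{(k)}|_{\Gamma_1} & \text{on } \Gamma_1 \\ u_1^{(k+1)} = 0 & \text{on } \partial\Omega_1 \setminus \Gamma_1 \end{cases} \]
and
\[ \begin{cases} L u_2^{(k+1)} = f & \text{in } \Omega_2 \\ u_2^{(k+1)} = u_1^{(k)}|_{\Gamma_2} & \text{on } \Gamma_2 \\ u_2^{(k+1)} = 0 & \text{on } \partial\Omega_2 \setminus \Gamma_2. \end{cases} \]
The next step is computed by
\[ u^{(k+1)}(x) = \begin{cases} u_2^{(k+1)}(x) & \text{if } x \in \Omega \setminus \Omega_1 \\ u_1^{(k+1)}(x) & \text{if } x \in \Omega \setminus \Omega_2 \\ \dfrac{u_1^{(k+1)}(x) + u_2^{(k+1)}(x)}{2} & \text{if } x \in \Omega_1 \cap \Omega_2. \end{cases} \]
related to the well-known Jacobi method
parallelization is directly applicable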
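A minimal sketch of the additive iteration for a 1D model problem, assuming L = −d²/dx² on (0, 1) with homogeneous Dirichlet data, a uniform grid and two overlapping intervals (the model problem, the grid and all names are illustrative assumptions). Both subdomain solves use only the previous iterate, so they could run at the same time.

```python
def solve_dirichlet(f, left, right, h):
    """Solve -u'' = f on the interior nodes with given boundary values."""
    n = len(f)
    rhs = [h * h * fi for fi in f]
    rhs[0] += left
    rhs[-1] += right
    # Thomas algorithm for the matrix tridiag(-1, 2, -1)
    c, g = [0.0] * n, [0.0] * n
    c[0], g[0] = -0.5, rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        g[i] = (rhs[i] + g[i - 1]) / denom
    u = [0.0] * n
    u[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        u[i] = g[i] - c[i] * u[i + 1]
    return u

def additive_schwarz(f, N, a, b, iterations):
    """Nodes 0..N; Omega_1 has interior nodes 1..b-1, Omega_2 has a+1..N-1, a < b."""
    h = 1.0 / N
    u = [0.0] * (N + 1)
    for _ in range(iterations):
        # both solves read only the old iterate u (independent of each other)
        u1 = solve_dirichlet(f[1:b], 0.0, u[b], h)      # interface at node b
        u2 = solve_dirichlet(f[a + 1:N], u[a], 0.0, h)  # interface at node a
        new = [0.0] * (N + 1)
        for j in range(1, N):
            if j <= a:
                new[j] = u1[j - 1]
            elif j >= b:
                new[j] = u2[j - a - 1]
            else:                                       # overlap: average
                new[j] = 0.5 * (u1[j - 1] + u2[j - a - 1])
        u = new
    return u

N = 32
f = [1.0] * (N + 1)
u = additive_schwarz(f, N, a=12, b=20, iterations=80)
exact = [0.5 * (j / N) * (1.0 - j / N) for j in range(N + 1)]
max_error = max(abs(ui - ei) for ui, ei in zip(u, exact))
```

For this model problem the interface errors shrink by a fixed factor per iteration, and the factor moves towards 1 as the overlap b − a gets smaller.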
Multiplicative Schwarz Method
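Let $u^{(0)}$ be an initial function,
\[ \begin{cases} L u_1^{(k+1)} = f & \text{in } \Omega_1 \\ u_1^{(k+1)} = u^{(k)}|_{\Gamma_1} & \text{on } \Gamma_1 \\ u_1^{(k+1)} = 0 & \text{on } \partial\Omega_1 \setminus \Gamma_1 \end{cases} \]
and
\[ \begin{cases} L u_2^{(k+1)} = f & \text{in } \Omega_2 \\ u_2^{(k+1)} = u_1^{(k+1)}|_{\Gamma_2} & \text{on } \Gamma_2 \\ u_2^{(k+1)} = 0 & \text{on } \partial\Omega_2 \setminus \Gamma_2. \end{cases} \]
The next step is computed by
\[ u^{(k+1)}(x) = \begin{cases} u_2^{(k+1)}(x) & \text{if } x \in \Omega_2 \\ u_1^{(k+1)}(x) & \text{if } x \in \Omega \setminus \Omega_2. \end{cases} \]
related to the well-known Gauss-Seidel method
does not seem convenient for a parallel implementation
need for coloring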
Multiplicative Schwarz Method - Coloring
paint the subdomains in several colors such that subdomains of the same color do not overlap
solve subdomains of the same color in parallel, using the latest boundary conditions from the other "colors"
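A minimal sketch of such a coloring for a rectangular grid of subdomains, assuming the overlap is smaller than a subdomain so that a subdomain can only overlap its eight direct neighbours (including the diagonal ones). Under that assumption four colors are enough; the functions `color` and `sweeps` are illustrative names.

```python
def color(i, j):
    """Color (0..3) of the subdomain in row i, column j of the subdomain grid."""
    return (i % 2) + 2 * (j % 2)

def sweeps(rows, cols):
    """Group the subdomains by color; each group can be solved in parallel."""
    groups = {c: [] for c in range(4)}
    for i in range(rows):
        for j in range(cols):
            groups[color(i, j)].append((i, j))
    return groups

groups = sweeps(4, 4)
# one multiplicative iteration: process color 0, then 1, 2, 3,
# exchanging boundary data between two consecutive colors
```

A two-color checkerboard, `(i + j) % 2`, needs fewer sequential stages, but diagonal neighbours then share a color and may overlap at their corners.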
Numerical Realization
Results
Numerical Realization
[Figure: staggered grid for an n × n image. Pixels $u_{ij}$ are marked •, dual degrees of freedom ∗: $p^1_{ij}$ lies between $u_{ij}$ and $u_{i+1,j}$ ($i = 1, \dots, n-1$), $p^2_{ij}$ lies between $u_{ij}$ and $u_{i,j+1}$ ($j = 1, \dots, n-1$).]
Place the degrees of freedom of the dual variable p midway between the pixels of u
the divergence of p can then be computed efficiently as one value per pixel (using a one-sided difference quotient)
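A minimal sketch of the staggered discretization, assuming unit grid spacing, images stored as lists of rows with at least two rows and two columns, and p = 0 outside the image (the names `grad` and `div` are illustrative). With this layout the discrete divergence is the negative adjoint of the discrete gradient, mirroring the integration by parts behind the dual formulation.

```python
def grad(u):
    """Differences between neighbouring pixels, stored between the pixels."""
    n, m = len(u), len(u[0])
    p1 = [[u[i + 1][j] - u[i][j] for j in range(m)] for i in range(n - 1)]
    p2 = [[u[i][j + 1] - u[i][j] for j in range(m - 1)] for i in range(n)]
    return p1, p2

def div(p1, p2):
    """One divergence value per pixel; p is taken as zero outside the image."""
    n, m = len(p2), len(p1[0])
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            down = p1[i][j] if i < n - 1 else 0.0
            up = p1[i - 1][j] if i > 0 else 0.0
            right = p2[i][j] if j < m - 1 else 0.0
            left = p2[i][j - 1] if j > 0 else 0.0
            d[i][j] = (down - up) + (right - left)
    return d

# check the adjoint relation  sum(u * div p) = -sum(grad u . p)
u = [[1.0, 2.0, 4.0], [0.0, 3.0, 5.0], [7.0, 1.0, 2.0]]
p1 = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
p2 = [[1.0, -2.0], [0.25, 0.75], [-1.0, 3.0]]
g1, g2 = grad(u)
d = div(p1, p2)
lhs = sum(u[i][j] * d[i][j] for i in range(3) for j in range(3))
rhs = -(sum(g1[i][j] * p1[i][j] for i in range(2) for j in range(3))
        + sum(g2[i][j] * p2[i][j] for i in range(3) for j in range(2)))
```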
Numerical Realization
applying anisotropic total variation
slightly change penalty term F :
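\[ F(p) = \frac{1}{2} \max\{|p_1| - 1, 0\}^2 + \frac{1}{2} \max\{|p_2| - 1, 0\}^2 \]
and hence
\[ H(p) = \begin{pmatrix} \operatorname{sgn}(p_1) \cdot (|p_1| - 1) \cdot 1_{\{|p_1| \ge 1\}} \\ \operatorname{sgn}(p_2) \cdot (|p_2| - 1) \cdot 1_{\{|p_2| \ge 1\}} \end{pmatrix} \]
and the Hessian of $F$
\[ H'(p) = \begin{pmatrix} 1_{\{|p_1| \ge 1\}} & 0 \\ 0 & 1_{\{|p_2| \ge 1\}} \end{pmatrix} \]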
Newton with damping
Numerical Realization
write $u$, $p^1$, $p^2$ column-wise as vectors to obtain a vector $x = (\vec{u}, \vec{p}^{\,1}, \vec{p}^{\,2})$
construct a system matrix $A$ and right-hand side $b$ to obtain a system $Ax = b$
\[ A = \begin{pmatrix} \operatorname{diag}(\lambda, \dots, \lambda) & D_1 & D_2 \\ D_1^t & M_1 & 0 \\ D_2^t & 0 & M_2 \end{pmatrix} \]
due to the shape of $A$, some improvements for solving this system are applicable, e.g. Schur complement or conjugate gradient methods.
Parallel Implementation
[Figure: the pixel grid partitioned into four blocks, assigned to Processor #1, Processor #2, Processor #3 and Processor #4]
Parallel Implementation
"ghost cells" (red) have to be communicated after each iteration
explicit communication needed
communication realized via the Message Passing Interface (MPI)
MatlabMPI makes MPI available in MATLAB (provided by the Lincoln Laboratory of the Massachusetts Institute of Technology (MIT))
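A minimal sketch of what a ghost cell exchange does, simulated inside a single process with plain lists. It makes no MPI or MatlabMPI calls; in a real parallel run each of the two copies below would be a send on one process and a matching receive on the other. The 1D layout and the function name are illustrative assumptions.

```python
def exchange_ghost_cells(left, right, width=1):
    """Refresh the ghost layers of two neighbouring 1D subdomains in place.

    Layout: the last `width` entries of `left` and the first `width` entries
    of `right` are ghost cells mirroring the neighbour's outermost own values.
    """
    from_right = right[width:2 * width]     # first own values of `right`
    from_left = left[-2 * width:-width]     # last own values of `left`
    left[-width:] = from_right
    right[:width] = from_left

# global values 0..7; -1.0 marks an outdated ghost cell
left = [0.0, 1.0, 2.0, 3.0, -1.0]
right = [-1.0, 4.0, 5.0, 6.0, 7.0]
exchange_ghost_cells(left, right)
```

Both messages are read from the old state before anything is overwritten, which corresponds to exchanging data only after all subdomain solves of an iteration have finished.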
Additive Version
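Neumann boundary conditions for u turn into Dirichlet boundary conditions for p
\[ A \begin{pmatrix} u \\ p \end{pmatrix} = b \ \text{in } \Omega, \qquad p = 0 \ \text{on } \partial\Omega. \]
solve in each step
\[ \begin{cases} A|_{\Omega_1} \begin{pmatrix} u_1^{(k+1)} \\ p_1^{(k+1)} \end{pmatrix} = b|_{\Omega_1} & \text{in } \Omega_1 \\ p_1^{(k+1)} = p_2^{(k)} & \text{on } \Gamma_1 \\ p_1^{(k+1)} = 0 & \text{on } \partial\Omega_1 \setminus \Gamma_1 \end{cases} \]
\[ \begin{cases} A|_{\Omega_2} \begin{pmatrix} u_2^{(k+1)} \\ p_2^{(k+1)} \end{pmatrix} = b|_{\Omega_2} & \text{in } \Omega_2 \\ p_2^{(k+1)} = p_1^{(k)} & \text{on } \Gamma_2 \\ p_2^{(k+1)} = 0 & \text{on } \partial\Omega_2 \setminus \Gamma_2. \end{cases} \]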
Multiplicative Version
with m colors, the number of subdomains would be m times the number of CPUs
alternatively: a coloring similar to a checkerboard
subdomains of the same color then overlap (but only by 1 pixel)
Results
[Figure: (a) Original, (b) Noisy]
using the following parameters:
$\lambda = 2$
$\varepsilon = 10^{-2}$
Results: Sequential Algorithm
[Figure: (a) After 1 iteration, (b) After 2 iterations, (c) After 3 iterations, (d) After 13 iterations]
Results: Multiplicative version on 2 CPUs (4 domains)
[Figure: (a) After 1 iteration, (b) After 2 iterations, (c) After 3 iterations, (d) After 13 iterations]
Results: Additive Version on 4 CPUs
[Figure: (a) After 1 iteration, (b) After 2 iterations, (c) After 3 iterations, (d) After 13 iterations]
Results: Multiplicative Version on 32 CPUs (64 domains)
[Figure: (a) After 1 iteration, (b) After 2 iterations, (c) After 3 iterations, (d) After 14 iterations]
Results: Additive Version on 32 CPUs
[Figure: (a) After 1 iteration, (b) After 2 iterations, (c) After 3 iterations, (d) After 16 iterations]
Computation Time
[Figure: computation time in seconds versus number of processes (1, 2, 4, 8, 16, 32) for the additive and the multiplicative version; (a) Image size 512 × 512, (b) Image size 1024 × 1024, (c) Image size 2048 × 2048, (d) Image size 4096 × 4096]
Speedup
[Figure: speedup versus number of processes (1, 2, 4, 8, 16, 32) for the additive and the multiplicative version, compared with linear speedup; (a) Image size 512 × 512, (b) Image size 1024 × 1024, (c) Image size 2048 × 2048, (d) Image size 4096 × 4096]