Variational Methods in Computer Vision
ICCV Tutorial, November 6, 2011

Chapter 1: Introduction - Mathematical Foundations

Daniel Cremers and Bastian Goldlücke
Computer Vision Group, Technical University of Munich

Thomas Pock
Institute for Computer Graphics and Vision, Graz University of Technology

Overview

1 Variational Methods
  • Introduction
  • Convex vs. non-convex functionals
  • Archetypical model: ROF denoising
  • The variational principle
  • The Euler-Lagrange equation

2 Total Variation and Co-Area
  • The space BV(Ω)
  • Geometric properties
  • Co-area

3 Convex analysis
  • Convex functionals
  • Constrained problems
  • Conjugate functionals
  • Subdifferential calculus
  • Proximation and implicit subgradient descent

4 Summary

1 Variational Methods

Fundamental problems in computer vision

Image labeling problems:
• Segmentation and classification
• Stereo
• Optic flow

3D reconstruction

[Figures: example results for segmentation, stereo, optic flow and multi-view 3D reconstruction.]

Variational methods

Unifying concept: the variational approach. The solution of a problem is the minimizer of an energy functional E,

    argmin_{u ∈ V} E(u).

In the variational framework, we adopt a continuous world view.

Images are functions

A greyscale image is a real-valued function u : Ω → R on an open set Ω ⊂ R².

Surfaces are manifolds

[Figure: a region Ω₁ (flower) separated from the background region Ω₀ by an interface Σ, shown in 2D and in 3D.]

A volume is usually modeled as the level set {x ∈ Ω : u(x) = 1} of a binary function u : Ω → {0, 1}.

Convex versus non-convex energies

Non-convex energy:
• Cannot, in general, be globally minimized
• Realistic modeling

Convex energy:
• Efficient global minimization
• Often unrealistic models

[Figure: graphs of a non-convex energy and a convex energy.]

Convex relaxation

Convex relaxation: best of both worlds?
• Start with a realistic non-convex model energy E.
• Relax to a convex lower bound R, which can be efficiently minimized.
• Find a (hopefully small) optimality bound to estimate the quality of the solution.

[Figure: the non-convex energy E and its convex lower bound R, with a brace marking the gap between them.]

Archetypical model: ROF denoising

A simple (but important) example: denoising.

The TV-L² (ROF) model, Rudin-Osher-Fatemi 1992: for a given noisy input image f, compute

    argmin_{u ∈ L²(Ω)}  ∫_Ω |∇u|₂ dx  +  (1/(2λ)) ∫_Ω (u − f)² dx,

where the first integral is the regularizer (prior) and the second is the data (model) term.

Note: in Bayesian statistics, this can be interpreted as a MAP estimate for Gaussian noise.

[Figure: original image, noisy image, and the denoised result for λ = 2.]
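
Not on the original slides: the ROF model can also be minimized numerically, and one standard choice is Chambolle's projection algorithm (2004) for the dual problem. Below is a minimal NumPy sketch under that choice; the helper names (grad, div, rof_denoise) and the parameter defaults are our own.

    import numpy as np

    def grad(u):
        # Forward differences, zero across the right/bottom image border.
        gx = np.zeros_like(u)
        gy = np.zeros_like(u)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        return gx, gy

    def div(px, py):
        # Discrete divergence, defined as the negative adjoint of grad.
        dx = np.zeros_like(px)
        dx[0, :] = px[0, :]
        dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
        dx[-1, :] = -px[-2, :]
        dy = np.zeros_like(py)
        dy[:, 0] = py[:, 0]
        dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
        dy[:, -1] = -py[:, -2]
        return dx + dy

    def rof_denoise(f, lam=2.0, tau=0.125, n_iter=200):
        # Fixed-point iteration on the dual variable p = (px, py); the denoised
        # image is recovered as u = f - lam * div(p). Step size tau <= 1/8
        # guarantees convergence of this scheme.
        px, py = np.zeros_like(f), np.zeros_like(f)
        for _ in range(n_iter):
            gx, gy = grad(div(px, py) - f / lam)
            norm = np.sqrt(gx ** 2 + gy ** 2)
            px = (px + tau * gx) / (1.0 + tau * norm)
            py = (py + tau * gy) / (1.0 + tau * norm)
        return f - lam * div(px, py)

In this parametrization a larger λ weights the regularizer more strongly and smooths more, matching the role of λ in the formula above.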

Rⁿ vs. L²(Ω)

                          V = Rⁿ                          V = L²(Ω)
    Elements              finitely many components        infinitely many "components"
                          xᵢ, 1 ≤ i ≤ n                   u(x), x ∈ Ω
    Inner product         (x, y) = Σᵢ₌₁ⁿ xᵢ yᵢ            (u, v) = ∫_Ω u v dx
    Norm                  |x|₂ = (Σᵢ₌₁ⁿ xᵢ²)^(1/2)        ‖u‖₂ = (∫_Ω |u|² dx)^(1/2)

Derivatives of a functional E : V → R

                            V = Rⁿ                        V = L²(Ω)
    Gradient (Fréchet)      dE(x) = ∇E(x)                 dE(u) = ?
    Directional (Gâteaux)   δE(x; h) = ∇E(x) · h          δE(u; h) = ?
    Condition for minimum   ∇E(x̂) = 0                     ?

Gâteaux differential

Definition
Let V be a vector space, E : V → R a functional, and u, h ∈ V. If the limit

    δE(u; h) := lim_{α→0} (1/α) (E(u + αh) − E(u))

exists, it is called the Gâteaux differential of E at u with increment h.

• The Gâteaux differential can be thought of as the directional derivative of E at u in direction h.
• A classical term for the Gâteaux differential is the "variation of E", hence the name "variational methods": you test how the functional "varies" when you move in direction h.
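
To make the definition concrete (our addition, with an arbitrary discretization), the difference quotient from the definition can be compared against the closed-form variation of the quadratic functional E(u) = ∫_Ω (u − f)² dx, whose Gâteaux differential is δE(u; h) = ∫_Ω 2 (u − f) h dx.

    import numpy as np

    # Discretize Omega = (0, 1) with 100 samples.
    rng = np.random.default_rng(0)
    n, dx = 100, 1.0 / 100
    f = rng.standard_normal(n)   # fixed "data" function
    u = rng.standard_normal(n)   # point at which we take the variation
    h = rng.standard_normal(n)   # direction (increment)

    def E(v):
        return np.sum((v - f) ** 2) * dx

    alpha = 1e-6
    numeric = (E(u + alpha * h) - E(u)) / alpha   # difference quotient from the definition
    analytic = 2.0 * np.sum((u - f) * h) * dx     # the closed-form variation
    print(numeric, analytic)                      # agree up to O(alpha)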

The variational principle

The variational principle is a generalization of the necessary condition for extrema of functions on Rⁿ.

Theorem (variational principle)
If û ∈ V is an extremum of a functional E : V → R, then

    δE(û; h) = 0 for all h ∈ V.

For a proof, note that if û is an extremum of E, then 0 must be an extremum of the real function t ↦ E(û + th) for every h.

Derivation of the Euler-Lagrange equation (1)

Method:
• Compute the Gâteaux derivative of E at u in direction h, and write it in the form

      δE(u; h) = ∫_Ω φ_u h dx,

  with a function φ_u : Ω → R and a test function h ∈ C_c^∞(Ω).
• At an extremum, this expression must vanish for arbitrary test functions h; by the du Bois-Reymond lemma, this yields the condition

      φ_u = 0.

  This is the Euler-Lagrange equation of the functional E.
• Note: the form above is in analogy to the finite-dimensional case, where the gradient satisfies δE(x; h) = ⟨∇E(x), h⟩.

The Euler-Lagrange equation

The Euler-Lagrange equation is a PDE which has to be satisfied by an extremal point û. A ready-to-use formula can be derived for energy functionals of a specific, but very common, form.

Theorem
Let û be an extremum of the functional E : C¹(Ω) → R, where E is of the form

    E(u) = ∫_Ω L(u, ∇u, x) dx,

with L : R × Rⁿ × Ω → R, (a, b, x) ↦ L(a, b, x) continuously differentiable. Then û satisfies the Euler-Lagrange equation

    ∂_a L(u, ∇u, x) − div_x [∇_b L(u, ∇u, x)] = 0,

where the divergence is computed with respect to the location variable x, and

    ∂_a L := ∂L/∂a,   ∇_b L := (∂L/∂b₁, …, ∂L/∂bₙ)ᵀ.
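
A worked example (our addition): for the quadratic (Tikhonov) relative of ROF,

    E(u) = ∫_Ω |∇u|₂² dx + (1/(2λ)) ∫_Ω (u − f)² dx,

the integrand is L(a, b, x) = |b|₂² + (1/(2λ)) (a − f(x))², so ∂_a L = (1/λ)(a − f(x)) and ∇_b L = 2b. The theorem yields the linear Euler-Lagrange equation

    (1/λ)(u − f) − 2 div(∇u) = (1/λ)(u − f) − 2Δu = 0.

The actual ROF regularizer ∫_Ω |∇u|₂ dx is not covered by this computation, since |b|₂ is not differentiable at b = 0; this is one reason for the convex analysis developed below.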

Derivation of the Euler-Lagrange equation (2)

The Gâteaux derivative of E at u in direction h is

    δE(u; h) = lim_{α→0} (1/α) ∫_Ω L(u + αh, ∇(u + αh), x) − L(u, ∇u, x) dx.

Because of the assumptions on L, we can take the limit under the integral and apply the chain rule to get

    δE(u; h) = ∫_Ω ∂_a L(u, ∇u, x) h + ∇_b L(u, ∇u, x) · ∇h dx.

Applying integration by parts to the second part of the integral with p = ∇_b L(u, ∇u, x), and noting h|_∂Ω = 0, we get

    δE(u; h) = ∫_Ω ( ∂_a L(u, ∇u, x) − div_x [∇_b L(u, ∇u, x)] ) h dx.

This is the desired expression, from which we can directly read off φ_u.

Roadmap: Open questions

• The regularizer of the ROF functional is

      ∫_Ω |∇u|₂ dx,

  which requires u to be differentiable. Yet we are looking for minimizers in L²(Ω). It is necessary to generalize the definition of the regularizer, which leads to the total variation in the next section.
• The total variation is not a differentiable functional, so the variational principle is not applicable. We need a theory for convex, but not necessarily differentiable, functionals.

2 Total Variation and Co-Area

The space BV(Ω): Definition of the total variation

• Let u ∈ L¹_loc(Ω). Then the total variation of u is defined as

      J(u) := sup { −∫_Ω u · div(ξ) dx : ξ ∈ C¹_c(Ω, Rⁿ), ‖ξ‖_∞ ≤ 1 }.

• The space BV(Ω) of functions of bounded variation is defined as

      BV(Ω) := { u ∈ L¹_loc(Ω) : J(u) < ∞ }.

An idea why this is the same as before: the norm of the gradient |∇u(x)|₂ below the integral can, for a fixed point x, be written as

    |∇u(x)|₂ = sup_{ξ ∈ Rⁿ : |ξ|₂ ≤ 1} ∇u(x) · ξ.

We can then use Gauss' theorem again to shift the gradient from u to a divergence on ξ.
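
For intuition (our addition): a common discrete counterpart of J is the isotropic TV, obtained by summing the Euclidean norm of a forward-difference gradient over all pixels. Applied to the characteristic function of a square, it approximately recovers the perimeter, anticipating the geometric properties discussed next.

    import numpy as np

    def total_variation(u):
        # Discrete isotropic TV: sum over pixels of |forward-difference gradient|_2,
        # with zero differences across the right/bottom image border.
        gx = np.zeros_like(u)
        gy = np.zeros_like(u)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        return np.sum(np.sqrt(gx ** 2 + gy ** 2))

    u = np.zeros((64, 64))
    u[16:48, 16:48] = 1.0        # characteristic function of a 32x32 square
    print(total_variation(u))    # ~128, the perimeter, up to corner discretization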

Convexity and lower semi-continuity

Below are the main analytical properties of the total variation. It also enjoys a number of interesting geometric relationships, which will be explored next.

Proposition
• J is a semi-norm on BV(Ω), and it is convex on L²(Ω).
• J is lower semi-continuous on L²(Ω), i.e.

      ‖uₙ − u‖₂ → 0  ⟹  J(u) ≤ lim inf_{n→∞} J(uₙ).

Convexity follows immediately from the definition; lower semi-continuity requires Fatou's lemma. Lower semi-continuity is important for the existence of minimizers, see the next section.

Geometric properties: Characteristic functions of sets

Let U ⊂ Ω. Then the characteristic function of U is defined as

    1_U(x) := 1 if x ∈ U,  0 otherwise.

[Figure: a set U = {1_U = 1} with boundary ∂U, inside the complement {1_U = 0}.]

Notation
If f : Ω → R, then {f = 0} is a short notation for the set {x ∈ Ω : f(x) = 0} ⊂ Ω. Similar notation is used for inequalities and other properties.

Total variation of a characteristic function

We now compute the TV of the characteristic function of a "sufficiently nice" set U ⊂ Ω with C¹ boundary.

Remember: to compute the total variation, one maximizes over all vector fields ξ ∈ C¹_c(Ω, Rⁿ) with ‖ξ‖_∞ ≤ 1:

    −∫_Ω 1_U · div(ξ) dx = −∫_U div(ξ) dx = ∫_∂U n · ξ ds   (Gauss' theorem),

where n denotes the unit normal on ∂U. The expression is maximized by any vector field with ξ|_∂U = n, hence

    J(1_U) = ∫_∂U ds = H^{n−1}(∂U).

Here H^{n−1} is the (n−1)-dimensional Hausdorff measure, i.e. the length in the case n = 2, or the area for n = 3.

The co-area formula

The co-area formula in its geometric form says that the total variation of a function equals the integral over the (n−1)-dimensional area of the boundaries of all its lower level sets. More precisely:

Theorem (co-area formula)
Let u ∈ BV(Ω). Then

    J(u) = ∫_{−∞}^{∞} J(1_{u≤t}) dt.
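
The identity can be sanity-checked discretely (our addition). The check below uses the anisotropic (L¹) discretization of TV, for which the discrete co-area identity holds exactly; for the isotropic discretization it holds only in the continuum.

    import numpy as np

    def tv_aniso(v):
        # Anisotropic discrete TV: sum of absolute forward differences.
        return np.abs(np.diff(v, axis=0)).sum() + np.abs(np.diff(v, axis=1)).sum()

    rng = np.random.default_rng(1)
    u = rng.integers(0, 5, size=(32, 32)).astype(float)  # values in {0,...,4}

    # 1_{u<=t} is constant in t between consecutive integers, so the integral
    # over t collapses to a finite sum over the thresholds t = 0,...,3.
    lhs = tv_aniso(u)
    rhs = sum(tv_aniso((u <= t).astype(float)) for t in range(4))
    print(lhs, rhs)  # equal: the discrete co-area identity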

3 Convex analysis

The epigraph of a functional

Definition
The epigraph epi(f) of a functional f : V → R ∪ {∞} is the set "above the graph", i.e.

    epi(f) := {(x, μ) : x ∈ V and μ ≥ f(x)}.

[Figure: the epigraph epi(f) of a function f over its domain dom(f) ⊂ V.]

Convex functionals

We choose the geometric definition of a convex function here because it is more intuitive; the usual algebraic property, f(tx + (1−t)y) ≤ t f(x) + (1−t) f(y) for t ∈ [0, 1], is a simple consequence.

Definition
• A functional f : V → R ∪ {∞} is called proper if f ≢ ∞, or equivalently, if its epigraph is non-empty.
• A functional f : V → R ∪ {∞} is called convex if epi(f) is a convex set.
• The set of all proper and convex functionals on V is denoted conv(V).

The only non-proper function is the constant function f = ∞. We exclude it right away, since otherwise some theorems become cumbersome to formulate. From now on, every functional we write down will be proper.

Extrema of convex functionals

Convex functionals have some very important properties with respect to optimization.

Proposition
Let f ∈ conv(V). Then
• the set of minimizers argmin_{x∈V} f(x) is convex (possibly empty);
• if x̂ is a local minimum of f, then x̂ is in fact a global minimum, i.e. x̂ ∈ argmin_{x∈V} f(x).

Both claims can easily be deduced from the convexity of the epigraph.

Lower semi-continuity and closed functionals

Lower semi-continuity is an important property for convex functionals, since together with coercivity it guarantees the existence of a minimizer. It has an intuitive geometric interpretation.

Definition
Let f : V → R ∪ {∞} be a functional. Then f is called closed if epi(f) is a closed set.

Proposition (closedness and lower semi-continuity)
For a functional f : V → R ∪ {∞}, the following are equivalent:
• f is closed;
• f is lower semi-continuous, i.e.

      f(x) ≤ lim inf_{xₙ→x} f(xₙ)

  for every sequence (xₙ) which converges to x.

An existence theorem for a minimum

Definition
Let f : V → R ∪ {∞} be a functional. Then f is called coercive if it is "unbounded at infinity": for every sequence (xₙ) ⊂ V with lim ‖xₙ‖ = ∞, we have lim f(xₙ) = ∞.

Theorem
Let f be a closed, coercive and convex functional on a Banach space V. Then f attains a minimum on V.

The requirement of coercivity can be weakened; a precise condition and proof can be formulated with the subdifferential calculus. On Hilbert spaces (and more generally, the so-called "reflexive" Banach spaces), the requirement "closed and convex" can be replaced by "weakly lower semi-continuous". See [Rockafellar] for details.

Examples

• The function x ↦ exp(x) is convex and lower semi-continuous, but not coercive on R. The infimum 0 is not attained.
• The function

      x ↦ ∞ if x ≤ 0,  x² if x > 0

  is convex and coercive, but not closed on R. The infimum 0 is not attained.
• The functional of the ROF model is closed and convex. It is also coercive on L²(Ω): from the inverse triangle inequality,

      | ‖u‖₂ − ‖f‖₂ | ≤ ‖u − f‖₂.

  Thus, if ‖uₙ‖₂ → ∞, then

      E(uₙ) ≥ (1/(2λ)) ‖uₙ − f‖₂² ≥ (1/(2λ)) ( ‖uₙ‖₂ − ‖f‖₂ )² → ∞.

  Therefore, a minimizer of ROF exists for each input f ∈ L²(Ω).

Constrained problems: The indicator function of a set

Definition
For any subset S ⊂ V of a vector space, the indicator function δ_S : V → R ∪ {∞} is defined as

    δ_S(x) := 0 if x ∈ S,  ∞ if x ∉ S.

Indicator functions are examples of particularly simple convex functions, as they take only two different values.

Proposition (convexity of indicator functions)
S is a convex set if and only if δ_S is a convex function.

The proposition is easy to prove (exercise). Note that by convention, r < ∞ for all r ∈ R.

Constrained problems

Suppose you want to find the minimizer of a convex functional f : C → R defined on a convex set C ⊂ V. You can always exchange this with an unconstrained problem which has the same minimizer: introduce the extended function

    f̃ : V → R ∪ {∞},  f̃(x) := f(x) if x ∈ C,  ∞ otherwise.

Then

    argmin_{x∈C} f(x) = argmin_{x∈V} f̃(x),

and f̃ is convex.

Similarly, if f : V → R is defined on the whole space V, then

    argmin_{x∈C} f(x) = argmin_{x∈V} [ f(x) + δ_C(x) ],

and the function on the right-hand side is convex.
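
A small numeric illustration of this device (our addition, with an arbitrary objective and constraint set): minimizing f(x) = (x − 2)² over C = [−1, 1] by adding the indicator δ_C.

    import numpy as np

    # The unconstrained minimizer of f + delta_C equals the constrained minimizer.
    x = np.linspace(-4.0, 4.0, 80001)
    f = (x - 2.0) ** 2
    delta_C = np.where(np.abs(x) <= 1.0, 0.0, np.inf)
    print(x[np.argmin(f + delta_C)])  # 1.0, the point of C closest to the free minimizer 2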

Conjugate functionals: Affine functions

Note: if you do not know what the dual space V* of a vector space is, you can substitute V itself: we work in the Hilbert space L²(Ω), so the two coincide.

Definition
Let ϕ ∈ V* and c ∈ R. Then an affine function on V is given by

    h_{ϕ,c} : x ↦ ⟨x, ϕ⟩ − c.

We call ϕ the slope and c the intercept of h_{ϕ,c}.

[Figure: the graph of h_{ϕ,c}, a hyperplane with normal direction (ϕ, −1), meeting the vertical axis at −c.]

We would like to find the largest affine function below f. For this, consider for each x ∈ V the affine function which passes through (x, f(x)):

    h_{ϕ,c}(x) = f(x)  ⇔  ⟨x, ϕ⟩ − c = f(x)  ⇔  c = ⟨x, ϕ⟩ − f(x).

[Figure: the affine function h_{ϕ,⟨x,ϕ⟩−f(x)} touching the graph of f at x; it meets the vertical axis at −(⟨x, ϕ⟩ − f(x)).]

To get the largest affine function below f, we have to pass to the supremum. The intercept of this function is called the conjugate functional of f.

Conjugate functionals

Definition
Let f ∈ conv(V). Then the conjugate functional f* : V* → R ∪ {∞} is defined as

    f*(ϕ) := sup_{x∈V} [ ⟨x, ϕ⟩ − f(x) ].

[Figure: the largest affine function with slope ϕ below epi(f) is h_{ϕ,f*(ϕ)}; it meets the vertical axis at −f*(ϕ).]
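
A quick numeric illustration (our addition): for f(x) = x² one computes f*(ϕ) = ϕ²/4 by hand, and a grid-based supremum reproduces it.

    import numpy as np

    # Conjugate of f(x) = x² on a grid: f*(phi) = sup_x [phi*x - f(x)] = phi**2 / 4.
    x = np.linspace(-10.0, 10.0, 20001)
    f = x ** 2

    def conjugate(phi):
        return np.max(phi * x - f)  # discrete supremum over the grid

    for phi in (-3.0, 0.0, 2.0):
        print(conjugate(phi), phi ** 2 / 4)  # agree; the maximizer x = phi/2 lies in the grid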

Second conjugate

The epigraph of f* consists of all pairs (ϕ, c) such that h_{ϕ,c} lies below f. It almost completely characterizes f. The reason for the "almost" is that you can recover f only up to closure.

Theorem
Let f ∈ conv(V) be closed and V be reflexive, i.e. V** = V. Then f** = f.

For the proof, note that

    f(x) = sup_{h_{ϕ,c} ≤ f} h_{ϕ,c}(x) = sup_{(ϕ,c) ∈ epi(f*)} h_{ϕ,c}(x) = sup_{ϕ∈V*} [ ⟨x, ϕ⟩ − f*(ϕ) ] = f**(x).

The first equality is intuitive, but surprisingly difficult to show: it ultimately relies on the Hahn-Banach theorem.

The subdifferential

Definition
• Let f ∈ conv(V). A vector ϕ ∈ V* is called a subgradient of f at x ∈ V if

      f(y) ≥ f(x) + ⟨y − x, ϕ⟩ for all y ∈ V.

• The set of all subgradients of f at x is called the subdifferential ∂f(x).

Geometrically speaking, ϕ is a subgradient if the graph of the affine function

    h(y) = f(x) + ⟨y − x, ϕ⟩

lies below the epigraph of f. Note that h(x) = f(x), so the graph "touches" the epigraph.

Example: the subdifferential of f : x ↦ |x| at 0 is

    ∂f(0) = [−1, 1].
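
This can be verified directly from the definition (our addition): the subgradient inequality f(y) ≥ f(0) + ⟨y − 0, ϕ⟩ for f(y) = |y| holds for all y exactly when ϕ ∈ [−1, 1].

    import numpy as np

    # |y| >= phi * y for all y  <=>  phi in [-1, 1].
    y = np.linspace(-2.0, 2.0, 4001)
    for phi in (-1.2, -1.0, 0.3, 1.0, 1.2):
        print(phi, bool(np.all(np.abs(y) >= phi * y)))  # True exactly for phi in [-1, 1]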

Subdifferential and derivatives

The subdifferential is a generalization of the Fréchet derivative (or of the gradient in finite dimensions), in the following sense.

Theorem (subdifferential and Fréchet derivative)
Let f ∈ conv(V) be Fréchet differentiable at x ∈ V. Then

    ∂f(x) = {df(x)}.

The proof of the theorem is surprisingly involved: it requires relating the subdifferential to one-sided directional derivatives. We will not explore these relationships in this lecture.

Relationship between subgradient and conjugate

[Figure: the affine function h_{ϕ,f*(ϕ)} touching epi(f) at x.]

ϕ is a subgradient at x if and only if the affine function h_{ϕ,f*(ϕ)} touches the epigraph at x. In formulas,

    ϕ ∈ ∂f(x)  ⇔  h_{ϕ,f*(ϕ)}(y) = f(x) + ⟨y − x, ϕ⟩  ⇔  f*(ϕ) = ⟨x, ϕ⟩ − f(x).

The subdifferential and duality

The relationship between subgradients and the conjugate functional seen above can be summarized in the following theorem.

Theorem
Let f ∈ conv(V) and x ∈ V. Then the following conditions on a vector ϕ ∈ V* are equivalent:
• ϕ ∈ ∂f(x);
• x = argmax_{y∈V} [ ⟨y, ϕ⟩ − f(y) ];
• f(x) + f*(ϕ) = ⟨x, ϕ⟩.

If, furthermore, f is closed, then more conditions can be added to this list:
• x ∈ ∂f*(ϕ);
• ϕ = argmax_{ψ∈V*} [ ⟨x, ψ⟩ − f*(ψ) ].

Formal proof of the theorem

The equivalences are easy to see.
• Rewriting the subgradient definition, one sees that ϕ ∈ ∂f(x) means

      ⟨x, ϕ⟩ − f(x) ≥ ⟨y, ϕ⟩ − f(y) for all y ∈ V.

  This implies the first equivalence.
• We have seen the second one on the previous slide.
• If f is closed, then f** = f, thus we get

      f**(x) + f*(ϕ) = ⟨x, ϕ⟩.

  This is equivalent to the last two conditions by the same arguments as above, applied to the conjugate functional.

Variational principle for convex functionals

As a corollary of the previous theorem, we obtain a generalized variational principle for convex functionals. It is a necessary and sufficient condition for a global extremum.

Corollary (variational principle for convex functionals)
Let f ∈ conv(V). Then x̂ is a global minimum of f if and only if

    0 ∈ ∂f(x̂).

Furthermore, if f is closed, then x̂ is a global minimum if and only if

    x̂ ∈ ∂f*(0),

i.e. minimizing a functional is the same as computing the subdifferential of the conjugate functional at 0.

To see this, just set ϕ = 0 in the previous theorem.

Proximation and implicit subgradient descent: Moreau's theorem

For the remainder of the lecture, we will assume that the underlying space is a Hilbert space H, for example L²(Ω).

Theorem (geometric version of Moreau's theorem)
Let f be convex and closed on the Hilbert space H, which we identify with its dual. Then for every z ∈ H there is a unique decomposition

    z = x̂ + ϕ  with  ϕ ∈ ∂f(x̂),

and the unique x̂ in this decomposition can be computed with the proximation

    prox_f(z) := argmin_{x∈H} [ (1/2) ‖x − z‖²_H + f(x) ].

This is a corollary to Theorem 31.5 in Rockafellar, page 339. The actual theorem has somewhat more content, but is very technical and quite hard to digest; the above is the essential consequence.

Proof of Moreau's theorem

The correctness of the theorem is not too hard to see: if x̂ = prox_f(z), then

    x̂ ∈ argmin_{x∈H} [ (1/2) ‖x − z‖²_H + f(x) ]
    ⇔ 0 ∈ x̂ − z + ∂f(x̂)
    ⇔ z ∈ x̂ + ∂f(x̂).

Existence and uniqueness of the proximation follow because the functional is closed, strictly convex and coercive.
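
To make the proximation concrete (our addition): for f(x) = |x| on H = R, prox_f is the classical soft-thresholding operator, prox_f(z) = sign(z) · max(|z| − 1, 0). The sketch below checks this closed form against a brute-force argmin and verifies Moreau's decomposition z = x̂ + ϕ with ϕ ∈ ∂f(x̂).

    import numpy as np

    def prox_abs(z):
        # Closed-form proximation of f(x) = |x|: soft-thresholding.
        return np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)

    # Brute-force check of prox_f(z) = argmin_x (1/2)(x - z)^2 + |x|.
    x = np.linspace(-5.0, 5.0, 100001)
    for z in (-2.5, 0.3, 4.0):
        brute = x[np.argmin(0.5 * (x - z) ** 2 + np.abs(x))]
        xhat = prox_abs(z)
        phi = z - xhat          # Moreau: z = xhat + phi with phi in ∂|xhat|
        print(z, xhat, brute, phi)  # phi = ±1 if xhat != 0, else phi = z in [-1, 1]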

The geometry of the graph of ∂f

• The map z ↦ (prox_f(z), z − prox_f(z)) is a continuous map from H into the graph of ∂f,

      graph(∂f) := {(x, ϕ) : x ∈ H, ϕ ∈ ∂f(x)} ⊂ H × H,

  with continuous inverse (x, ϕ) ↦ x + ϕ.
• The theorem of Moreau now says that this map is one-to-one. In particular,

      H ≅ graph(∂f),

  i.e. the sets are homeomorphic.
• In particular, graph(∂f) is always connected.

Fixed points of the proximation operator

Proposition
Let f be closed and convex on the Hilbert space H, and let ẑ be a fixed point of the proximation operator prox_f, i.e.

    ẑ = prox_f(ẑ).

Then ẑ is a minimizer of f. In particular, it also follows that ẑ ∈ (I − prox_f)⁻¹(0).

To prove this, just note that by Moreau's theorem,

    ẑ ∈ prox_f(ẑ) + ∂f(ẑ)  ⇔  0 ∈ ∂f(ẑ)

if ẑ is a fixed point.

Subgradient descent

Let λ > 0, z ∈ H and x = prox_{λf}(z). Then

    z ∈ x + ∂(λf)(x)  ⇔  x ∈ z − λ∂f(x).

In particular, we have the following interesting observation:

The proximation operator prox_{λf} computes an implicit subgradient descent step of step size λ for the functional f.

"Implicit" here means that the subgradient is evaluated not at the original but at the new location, which improves the stability of the descent. Note that if subgradient descent converges, it converges to a fixed point ẑ of I − λ∂f; in particular, ẑ is a minimizer of the functional f.
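
A minimal demonstration (our addition, with an arbitrary test function f(x) = |x − 3|): iterating the proximation performs implicit subgradient descent and reaches the minimizer, in line with the fixed-point proposition above.

    import numpy as np

    def prox(z, lam):
        # prox_{lam*f} for f(x) = |x - 3|: shift, soft-threshold, shift back.
        w = z - 3.0
        return 3.0 + np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

    lam, x_k = 0.5, 10.0
    for _ in range(30):
        x_k = prox(x_k, lam)   # implicit subgradient descent step of size lam
    print(x_k)  # 3.0, the minimizer of f; reached after 14 steps, then a fixed point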

4 Summary

• Variational calculus deals with functionals on infinite-dimensional vector spaces.
• Minima are characterized by the variational principle, which leads to the Euler-Lagrange equation as a condition for a local minimum.
• The total variation is a powerful regularizer for image processing problems. For a binary function u, it equals the perimeter of the set where u = 1.
• Convex optimization deals with finding minima of convex functionals, which may be non-differentiable.
• The generalization of the variational principle to convex functionals is the condition that the subdifferential at a minimum contains zero.
• Efficient optimization methods rely heavily on the concept of duality, which allows certain useful transformations of convex problems; these will be employed in the next chapter.

References (1)

Variational methods

Luenberger, "Optimization by Vector Space Methods", Wiley 1969.
• Elementary introduction to optimization on Hilbert and Banach spaces.
• Easy to read, with many examples from other disciplines, in particular economics.

Gelfand and Fomin, "Calculus of Variations", translated 1963 (original in Russian).
• Classical introduction to variational calculus; somewhat outdated terminology, but inexpensive and easy to get.
• Historically very interesting, with lots of non-computer-vision applications (classical geometric problems; physics: optics, mechanics, quantum mechanics, field theory).

References (2)

Total variation

Chambolle, Caselles, Novaga, Cremers, Pock, "An Introduction to Total Variation for Image Analysis", Summer School, Linz, Austria 2006.
• Focused introduction to total variation for image processing applications, plus some basics of convex optimization and the numerics of optimization.
• Available online for free.

Attouch, Buttazzo and Michaille, "Variational Analysis in Sobolev and BV Spaces", SIAM 2006.
• Exhaustive introduction to variational methods and convex optimization in infinite-dimensional spaces, as well as the theory of BV functions.
• Mathematically very advanced; requires solid knowledge of functional analysis.

References (3)

Convex optimization

Boyd and Vandenberghe, "Convex Optimization", Cambridge University Press 2004.
• Excellent recent introduction to convex optimization.
• Reads very well; available online for free.

Rockafellar, "Convex Analysis", Princeton University Press 1970.
• Classical introduction to convex analysis and optimization.
• Somewhat technical and not too easy to read, but very exhaustive.