Gradient Flows in Spaces of Probability Measures

Luigi Ambrosio
Scuola Normale Superiore, Pisa
Giuseppe Savaré
Dipartimento di Matematica, Università di Pavia
January 20, 2006
Contents
1 Notation and measure-theoretic results
2 Metric and differentiable structure of the Wasserstein space
2.1 Absolutely continuous maps and metric derivative
2.2 The change of variables formula
2.3 The quadratic optimal transport problem
2.4 Geodesics in P2(Rd)
2.5 Existence of optimal transport maps
2.6 The continuity equation with locally Lipschitz velocity fields
2.7 The tangent bundle to the Wasserstein space
3 Convex functionals in P2(Rd)
3.1 λ-geodesically convex functionals in Pp(X)
3.2 Examples of convex functionals in P2(Rd)
3.3 Relative entropy and convex functionals of measures
3.4 Log-concavity and displacement convexity
4 Subdifferential calculus in P2(Rd)
4.1 Definition of the subdifferential for a.c. measures
4.2 Subdifferential calculus in P2a(Rd)
4.3 The case of λ-convex functionals along geodesics
4.4 Regular functionals
4.5 Examples of subdifferentials
4.5.1 Variational integrals: the smooth case
4.5.2 The potential energy
4.5.3 The internal energy
4.5.4 The relative internal energy
4.5.5 The interaction energy
4.5.6 The opposite Wasserstein distance
4.5.7 The sum of internal, potential and interaction energy
5 Gradient flows of λ-geodesically convex functionals in P2(Rd)
5.1 Uniqueness and various characterizations of gradient flows
5.2 Main properties of Gradient Flows
5.3 Existence of Gradient Flows by convergence of the “Minimizing Movement” scheme
5.4 Bibliographical notes
6 Applications to Evolution PDE’s
6.1 Gradient flows and evolutionary PDE’s of diffusion type
6.1.1 Changing the reference measure
6.2 The linear transport equation for λ-convex potentials
6.3 Kolmogorov-Fokker-Planck equation
6.3.1 Relative Entropy and Fisher Information
6.3.2 Wasserstein formulation of the Kolmogorov-Fokker-Planck equation
6.3.3 The construction of the Markovian semigroup
6.4 Nonlinear diffusion equations
6.5 Drift diffusion equations with non local terms
6.6 Gradient flow of −W2/2 and geodesics
6.7 Bibliographical notes
1 Notation and measure-theoretic results
In this section we recall the main notation used in this paper and some basic measure-theoretic terminology and results. Given a metric space $(X, d)$, we denote by $\mathscr{P}(X)$ the set of probability measures $\mu : \mathscr{B}(X) \to [0, 1]$, where $\mathscr{B}(X)$ is the Borel σ-algebra. When $X$ is a Borel subset of a Euclidean space $\mathbb{R}^d$, we set
\[ m_2(\mu) := \int_X |x|^2 \, d\mu \]
and we denote by $\mathscr{P}_2(X)$ the subspace of $\mathscr{P}(X)$ consisting of the measures with finite quadratic moment:
\[ \mathscr{P}_2(X) := \{ \mu \in \mathscr{P}(X) : m_2(\mu) < \infty \}. \]
We denote by $\mathscr{L}^d$ the Lebesgue measure in $\mathbb{R}^d$ and set
\[ \mathscr{P}_2^a(X) := \{ \mu \in \mathscr{P}_2(X) : \mu \ll \mathscr{L}^d \}, \]
whenever $X \in \mathscr{B}(\mathbb{R}^d)$.
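If $\mu \in \mathscr{P}(X_1)$ and $\mathbf{r} : X_1 \to X_2$ is a Borel (or, more generally, $\mu$-measurable) map, we denote by $\mathbf{r}_\#\mu \in \mathscr{P}(X_2)$ the push-forward of $\mu$ through $\mathbf{r}$, defined by
\[ \mathbf{r}_\#\mu(B) := \mu(\mathbf{r}^{-1}(B)) \quad \forall\, B \in \mathscr{B}(X_2). \tag{1.1} \]
More generally we have
\[ \int_{X_1} f(\mathbf{r}(x)) \, d\mu(x) = \int_{X_2} f(y) \, d\,\mathbf{r}_\#\mu(y) \tag{1.2} \]
for every bounded (or $\mathbf{r}_\#\mu$-integrable) Borel function $f : X_2 \to \mathbb{R}$. It is easy to check that
\[ \nu \ll \mu \implies \mathbf{r}_\#\nu \ll \mathbf{r}_\#\mu \quad \forall\, \mu, \nu \in \mathscr{P}(X_1). \tag{1.3} \]
Notice also the natural composition rule
\[ (\mathbf{r} \circ \mathbf{s})_\#\mu = \mathbf{r}_\#(\mathbf{s}_\#\mu) \quad \text{where } \mathbf{s} : X_1 \to X_2,\ \mathbf{r} : X_2 \to X_3,\ \mu \in \mathscr{P}(X_1). \tag{1.4} \]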
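As an illustration of (1.1), (1.2) and (1.4), the following minimal sketch computes push-forwards of a finitely supported measure. The representation of a measure as a dictionary `{point: mass}`, the helper names `push_forward` and `integrate`, and the particular maps and masses are assumptions made for this example only.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(mu, r):
    """Return r#mu for a finitely supported measure mu given as {point: mass}."""
    out = defaultdict(Fraction)
    for x, mass in mu.items():
        # (1.1) on singletons: the mass of x is moved to r(x).
        out[r(x)] += mass
    return dict(out)


def integrate(f, mu):
    """Integral of f against the discrete measure mu."""
    return sum(f(x) * mass for x, mass in mu.items())


# Hypothetical data: a probability measure on three points of the real line.
mu = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
s = lambda x: x * x        # plays the role of s : X1 -> X2
r = lambda y: y + 3        # plays the role of r : X2 -> X3
f = lambda z: 2 * z - 1    # test function

# (1.2): integrating f(s(x)) against mu equals integrating f against s#mu.
lhs = integrate(lambda x: f(s(x)), mu)
rhs = integrate(f, push_forward(mu, s))

# (1.4): (r o s)#mu coincides with r#(s#mu).
composed = push_forward(mu, lambda x: r(s(x)))
iterated = push_forward(push_forward(mu, s), r)
```

Exact rational arithmetic is used so that both identities can be compared with `==` rather than up to rounding.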
We denote by $\pi^i$, $i = 1, 2$, the projection operators on a product space $X := X_1 \times X_2$, defined by
\[ \pi^1 : (x_1, x_2) \mapsto x_1 \in X_1, \qquad \pi^2 : (x_1, x_2) \mapsto x_2 \in X_2. \tag{1.5} \]
If $X$ is endowed with the canonical product metric and the Borel σ-algebra and $\mu \in \mathscr{P}(X)$, the marginals of $\mu$ are the probability measures
\[ \mu_i := \pi^i_\#\mu \in \mathscr{P}(X_i), \quad i = 1, 2. \tag{1.6} \]
Given $\mu_1 \in \mathscr{P}(X_1)$ and $\mu_2 \in \mathscr{P}(X_2)$, the class $\Gamma(\mu_1, \mu_2)$ of transport plans between $\mu_1$ and $\mu_2$ is defined by
\[ \Gamma(\mu_1, \mu_2) := \big\{ \mu \in \mathscr{P}(X_1 \times X_2) : \pi^i_\#\mu = \mu_i,\ i = 1, 2 \big\}. \tag{1.7} \]
Notice also that
\[ \Gamma(\mu_1, \mu_2) = \{\mu_1 \times \mu_2\} \quad \text{if either $\mu_1$ or $\mu_2$ is a Dirac mass.} \tag{1.8} \]
To each pair of measures $\mu_1 \in \mathscr{P}(X_1)$, $\mu_2 = \mathbf{r}_\#\mu_1 \in \mathscr{P}(X_2)$ linked by a Borel transport map $\mathbf{r} : X_1 \to X_2$ we can associate the transport plan
\[ \mu := (\mathbf{i} \times \mathbf{r})_\#\mu_1 \in \Gamma(\mu_1, \mu_2), \quad \text{$\mathbf{i}$ being the identity map on $X_1$.} \tag{1.9} \]
If $\mu$ is representable as in (1.9) then we say that $\mu$ is induced by $\mathbf{r}$. Each transport plan $\mu$ concentrated on a $\mu$-measurable graph in $X_1 \times X_2$ admits the representation (1.9) for some $\mu_1$-measurable map $\mathbf{r}$, which therefore transports $\mu_1$ to $\mu_2$ (see, e.g., [6]).
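The definitions (1.6), (1.7) and (1.9) can be checked by hand on finitely supported measures. The sketch below is only an illustration: the dictionary representation of measures and plans, the helper names `marginal`, `induced_plan`, `product_plan`, and the sample data are all assumptions of this example.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(plan, i):
    """i-th marginal (i = 0 or 1) of a plan given as {(x1, x2): mass}, cf. (1.6)."""
    out = defaultdict(Fraction)
    for pair, mass in plan.items():
        out[pair[i]] += mass
    return dict(out)


def induced_plan(mu1, r):
    """The plan (i x r)#mu1 of (1.9), concentrated on the graph of r."""
    plan = defaultdict(Fraction)
    for x, mass in mu1.items():
        plan[(x, r(x))] += mass
    return dict(plan)


def product_plan(mu1, mu2):
    """The product measure mu1 x mu2, which always belongs to Gamma(mu1, mu2)."""
    return {(x, y): a * b for x, a in mu1.items() for y, b in mu2.items()}


# Hypothetical data: mu1 on {0, 1, 2} and the map r(x) = x mod 2.
mu1 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
r = lambda x: x % 2

graph_plan = induced_plan(mu1, r)
mu2 = marginal(graph_plan, 1)          # this is r#mu1
independent_plan = product_plan(mu1, mu2)
```

Both `graph_plan` and `independent_plan` have marginals `mu1` and `mu2`, so they are two different elements of the same class Γ(µ1, µ2); only the first one is induced by a map.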
In accordance with the probabilistic terminology, we say that a sequence $(\mu_n) \subset \mathscr{P}(X)$ is narrowly convergent to $\mu \in \mathscr{P}(X)$ as $n \to \infty$ if
\[ \lim_{n\to\infty} \int_X f(x) \, d\mu_n(x) = \int_X f(x) \, d\mu(x) \tag{1.10} \]
for every function $f \in C^0_b(X)$, the space of continuous and bounded real functions defined on $X$.
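Theorem 1.1 (Prokhorov, [29, III-59]) If a set $\mathcal{K} \subset \mathscr{P}(X)$ is tight, i.e.
\[ \forall\, \varepsilon > 0 \quad \exists\, K_\varepsilon \text{ compact in } X \text{ such that } \mu(X \setminus K_\varepsilon) \le \varepsilon \quad \forall\, \mu \in \mathcal{K}, \tag{1.11} \]
then $\mathcal{K}$ is relatively compact in $\mathscr{P}(X)$.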
When one needs to pass to the limit in expressions like (1.10) w.r.t. unbounded or lower semicontinuous functions $f$, the following two properties are quite useful. The first one is a lower semicontinuity property:
\[ \liminf_{n\to\infty} \int_X g(x) \, d\mu_n(x) \ge \int_X g(x) \, d\mu(x) \tag{1.12} \]
for every sequence $(\mu_n) \subset \mathscr{P}(X)$ narrowly convergent to $\mu$ and any l.s.c. function $g : X \to (-\infty, +\infty]$ bounded from below: it follows easily by a monotone approximation argument of $g$ by continuous and bounded functions. Changing $g$ into $-g$ one gets the corresponding “lim sup” inequality for upper semicontinuous functions bounded from above. In particular, choosing as $g$ the characteristic functions of open and closed subsets of $X$, we obtain
\[ \liminf_{n\to\infty} \mu_n(G) \ge \mu(G) \quad \forall\, G \text{ open in } X, \tag{1.13} \]
\[ \limsup_{n\to\infty} \mu_n(F) \le \mu(F) \quad \forall\, F \text{ closed in } X. \tag{1.14} \]
The statement of the second property requires the following definitions: we say that a Borel function $g : X \to [0, +\infty]$ is uniformly integrable w.r.t. a given set $\mathcal{K} \subset \mathscr{P}(X)$ if
\[ \lim_{k\to\infty} \int_{\{x : g(x) \ge k\}} g(x) \, d\mu(x) = 0 \quad \text{uniformly w.r.t. } \mu \in \mathcal{K}. \tag{1.15} \]
In the particular case of $g(x) := d(x, \bar{x})^p$, for some (and thus any) $\bar{x} \in X$ and a given $p > 0$, i.e. if
\[ \lim_{k\to\infty} \int_{X \setminus B_k(\bar{x})} d^p(\bar{x}, x) \, d\mu(x) = 0 \quad \text{uniformly w.r.t. } \mu \in \mathcal{K}, \tag{1.16} \]
we say that the set $\mathcal{K} \subset \mathscr{P}(X)$ has uniformly integrable $p$-moments. The following lemma (see for instance Lemma 5.1.7 of [8] for its proof) provides a characterization of $p$-uniformly integrable families, extending the validity of (1.10) to unbounded functions with $p$-growth, i.e. functions $f : X \to \mathbb{R}$ such that
\[ |f(x)| \le A + B\, d^p(\bar{x}, x) \quad \forall\, x \in X, \tag{1.17} \]
for some $A, B \ge 0$ and $\bar{x} \in X$.
Lemma 1.2 Let $(\mu_n)$ be a sequence in $\mathscr{P}(X)$ narrowly convergent to $\mu \in \mathscr{P}(X)$. If $f : X \to \mathbb{R}$ is continuous, $g : X \to (-\infty, +\infty]$ is lower semicontinuous, and $|f|$ and $g^-$ are uniformly integrable w.r.t. the set $\{\mu_n\}_{n\in\mathbb{N}}$, then
\[ \liminf_{n\to\infty} \int_X g(x) \, d\mu_n(x) \ge \int_X g(x) \, d\mu(x) > -\infty, \tag{1.18a} \]
\[ \lim_{n\to\infty} \int_X f(x) \, d\mu_n(x) = \int_X f(x) \, d\mu(x). \tag{1.18b} \]
Conversely, if $f : X \to [0, \infty)$ is continuous, $\mu_n$-integrable, and
\[ \limsup_{n\to\infty} \int_X f(x) \, d\mu_n(x) \le \int_X f(x) \, d\mu(x) < +\infty, \tag{1.19} \]
then $f$ is uniformly integrable w.r.t. $\{\mu_n\}_{n\in\mathbb{N}}$.
In particular, a family $\{\mu_n\}_{n\in\mathbb{N}} \subset \mathscr{P}(X)$ has uniformly integrable $p$-moments iff (1.10) holds for every continuous function $f : X \to \mathbb{R}$ with $p$-growth.
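A standard example shows why uniform integrability cannot be dropped. The sketch below uses the sequence $\mu_n = (1 - 1/n)\delta_0 + (1/n)\delta_n$ on the real line, which is chosen here for illustration and does not come from the text above: it converges narrowly to $\delta_0$, yet its first moments stay equal to 1 and the tails in (1.16) with $p = 1$ do not vanish uniformly. The function names are hypothetical.

```python
import math


def integrate(f, n):
    """Integral of f against mu_n = (1 - 1/n) delta_0 + (1/n) delta_n."""
    return (1 - 1 / n) * f(0.0) + (1 / n) * f(float(n))


def tail_first_moment(k, n):
    """Integral of |x| over {|x| >= k} against mu_n, as in (1.16) with p = 1."""
    return integrate(lambda x: abs(x) if abs(x) >= k else 0.0, n)


bounded = math.cos                     # a continuous bounded test function
first_moment = abs                     # continuous, with 1-growth, unbounded

ns = [10, 100, 1000, 10000]

# Gap between the integral of cos against mu_n and against delta_0; it is at most 2/n.
bounded_gaps = [abs(integrate(bounded, n) - bounded(0.0)) for n in ns]

# First moments of mu_n: all equal to 1, while the first moment of delta_0 is 0.
first_moments = [integrate(first_moment, n) for n in ns]

# Supremum over the sampled n of the tail integrals, for growing thresholds k.
worst_tails = [max(tail_first_moment(k, n) for n in ns) for k in (5, 50, 500, 5000)]
```

Here (1.10) holds for the bounded function but fails for $f(x) = |x|$, in agreement with the last statement of Lemma 1.2.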
2 Metric and differentiable structure of the Wasserstein space
2.1 Absolutely continuous maps and metric derivative
Let $(E, d)$ be a metric space.
Definition 2.1 (Absolutely continuous curves) Let $I \subset \mathbb{R}$ be an interval and let $v : I \to E$. We say that $v$ is absolutely continuous if there exists $m \in L^1(I)$ such that
\[ d(v(s), v(t)) \le \int_s^t m(\tau) \, d\tau \quad \forall\, s, t \in I,\ s \le t. \tag{2.1} \]
Any absolutely continuous curve is obviously uniformly continuous, and therefore it can be uniquely extended to the closure of $I$. It is not difficult to show (see for instance Theorem 1.1.2 in [8] or [9]) that the metric derivative
\[ |v'|(t) := \lim_{h\to 0} \frac{d(v(t + h), v(t))}{|h|} \tag{2.2} \]
exists at $\mathscr{L}^1$-a.e. $t \in I$ for any absolutely continuous curve $v(t)$. Furthermore, $|v'| \in L^1(I)$ and is the minimal $m$ fulfilling (2.1) (i.e. $|v'|$ fulfils (2.1) and $m \ge |v'|$ $\mathscr{L}^1$-a.e. in $I$ for any $m$ with this property).
2.2 The change of variables formula
Let $f : A \subset \mathbb{R}^d \to \mathbb{R}^d$ be a function, with $A$ open. Then, denoting by $\Sigma_f = D(\nabla f)$ the Borel set where $f$ is differentiable, there is a sequence of sets $\Sigma_n \uparrow \Sigma_f$ such that $f|_{\Sigma_n}$ is a Lipschitz function for any $n$ (see [34, 3.1.8]). Therefore the well-known area formula for Lipschitz maps (see for instance [33, 34]) extends to this general class of maps and reads as follows:
\[ \int_{\Sigma_f} h(x)\, |\det \nabla f|(x) \, dx = \int_{\mathbb{R}^d} \sum_{x \in \Sigma_f \cap f^{-1}(y)} h(x) \, dy \tag{2.3} \]
for any Borel function $h : \mathbb{R}^d \to [0, +\infty]$. This formula leads to a simple rule for computing the density of the push-forward of measures absolutely continuous w.r.t. $\mathscr{L}^d$.
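Lemma 2.2 (Density of the push-forward) Let $\rho \in L^1(\mathbb{R}^d)$ be a nonnegative function and assume that there exists a Borel set $\Sigma \subset \Sigma_f$ such that $f|_\Sigma$ is injective and the difference $\{\rho > 0\} \setminus \Sigma$ is $\mathscr{L}^d$-negligible. Then $f_\#(\rho \mathscr{L}^d) \ll \mathscr{L}^d$ if and only if $|\det \nabla f| > 0$ $\mathscr{L}^d$-a.e. on $\Sigma$ and in this case
\[ f_\#(\rho \mathscr{L}^d) = \frac{\rho}{|\det \nabla f|} \circ f^{-1}\big|_{f(\Sigma)} \, \mathscr{L}^d. \]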
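Proof. If $|\det \nabla f| > 0$ $\mathscr{L}^d$-a.e. on $\Sigma$ we can put $h = \rho \chi_{f^{-1}(B) \cap \Sigma} / |\det \nabla f|$ in (2.3), with $B \in \mathscr{B}(\mathbb{R}^d)$, to obtain
\[ \int_{f^{-1}(B)} \rho \, dx = \int_{f^{-1}(B) \cap \Sigma} \rho \, dx = \int_{B \cap f(\Sigma)} \frac{\rho(f^{-1}(y))}{|\det \nabla f(f^{-1}(y))|} \, dy. \]
Conversely, if there is a Borel set $B \subset \Sigma$ with $\mathscr{L}^d(B) > 0$ and $|\det \nabla f| = 0$ on $B$, the area formula gives $\mathscr{L}^d(f(B)) = 0$. On the other hand
\[ f_\#(\rho \mathscr{L}^d)(f(B)) = \int_{f^{-1}(f(B))} \rho \, dx > 0 \]
because at $\mathscr{L}^d$-a.e. $x \in B$ we have $\rho(x) > 0$. Hence $f_\#(\rho \mathscr{L}^d)$ is not absolutely continuous with respect to $\mathscr{L}^d$.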
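The density formula of Lemma 2.2 can be tested numerically in dimension one. The following sketch makes several assumptions that are not in the text: the map $f(x) = x + x^3$ (smooth, strictly increasing, with $f' > 0$ everywhere), the standard Gaussian density for $\rho$, a bisection routine for $f^{-1}$, and a midpoint quadrature rule. It compares the mass that the candidate density $\rho(f^{-1}(y)) / |f'(f^{-1}(y))|$ gives to an interval with the mass of its preimage under $\rho$.

```python
import math


def f(x):
    """Assumed transport map: strictly increasing, so injective on the whole line."""
    return x + x ** 3


def df(x):
    """Derivative of f; it plays the role of det(grad f) in one dimension."""
    return 1.0 + 3.0 * x * x


def rho(x):
    """Assumed initial density: the standard Gaussian."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f_inverse(y, lo=-10.0, hi=10.0):
    """Inverse of f by bisection, valid for f(lo) <= y <= f(hi)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pushforward_density(y):
    """Density of f#(rho L^1) given by Lemma 2.2."""
    x = f_inverse(y)
    return rho(x) / abs(df(x))


def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mass_by_density(a, b, panels=4000):
    """Midpoint-rule integral of the push-forward density over [a, b]."""
    h = (b - a) / panels
    return h * sum(pushforward_density(a + (j + 0.5) * h) for j in range(panels))


def mass_by_preimage(a, b):
    """Mass that rho gives to f^{-1}([a, b]), i.e. the definition (1.1)."""
    return gaussian_cdf(f_inverse(b)) - gaussian_cdf(f_inverse(a))
```

For instance, `mass_by_density(-2.0, 3.0)` and `mass_by_preimage(-2.0, 3.0)` agree up to the quadrature error.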
By applying the area formula again we obtain the rule for computing integrals of the densities:
\[ \int_{\mathbb{R}^d} F\Big( \frac{f_\#(\rho \mathscr{L}^d)}{\mathscr{L}^d} \Big) \, dx = \int_{\mathbb{R}^d} F\Big( \frac{\rho}{|\det \nabla f|} \Big) |\det \nabla f| \, dx \tag{2.4} \]
for any Borel function $F : [0, +\infty) \to [0, +\infty]$ with $F(0) = 0$. Notice that in this formula the set $\Sigma_f$ does not appear anymore (due to the fact that $F(0) = 0$ and $\rho = 0$ out of $\Sigma$), so it holds provided $f$ is differentiable $\rho \mathscr{L}^d$-a.e., it is $\rho \mathscr{L}^d$-essentially injective (i.e. there exists a Borel set $\Sigma$ such that $f|_\Sigma$ is injective and $\rho = 0$ $\mathscr{L}^d$-a.e. out of $\Sigma$) and $|\det \nabla f| > 0$ $\rho \mathscr{L}^d$-a.e. in $\mathbb{R}^d$.
We will mostly apply these formulas when $f$ is the gradient of a convex function $g$. In this specific case the following result holds (see for instance [4, 33]):
Theorem 2.3 (Aleksandrov) Let $\Omega \subset \mathbb{R}^d$ be a convex open set and let $g : \Omega \to \mathbb{R}$ be a convex function. Then $g$ is a locally Lipschitz function, $\nabla g$ is differentiable, its gradient $\nabla^2 g(x)$ is a symmetric matrix and $g$ has the second order Taylor expansion
\[ g(y) = g(x) + \langle \nabla g(x), y - x \rangle + \frac{1}{2} \langle \nabla^2 g(x)(y - x), y - x \rangle + o(|y - x|^2) \quad \text{as } y \to x \tag{2.5} \]
for $\mathscr{L}^d$-a.e. $x \in \Omega$.
Notice that $\nabla g$ is also monotone
\[ \langle \nabla g(x_1) - \nabla g(x_2), x_1 - x_2 \rangle \ge 0 \quad x_1, x_2 \in D(\nabla g), \]
and that the above inequality is strict if $g$ is strictly convex: in this case, it is immediate to check that $\nabla g$ is injective on $D(\nabla g)$, and that $|\det \nabla^2 g| > 0$ on the differentiability set of $\nabla g$ if $g$ is uniformly convex.
2.3 The quadratic optimal transport problem
Let $X$, $Y$ be complete and separable metric spaces and let $c : X \times Y \to [0, +\infty]$ be a Borel cost function. Given $\mu \in \mathscr{P}(X)$, $\nu \in \mathscr{P}(Y)$ the optimal transport problem, in Monge’s formulation, is given by
\[ \inf \Big\{ \int_X c(x, \mathbf{t}(x)) \, d\mu(x) : \mathbf{t}_\#\mu = \nu \Big\}. \tag{2.6} \]
This problem can be ill posed because sometimes there is no transport map $\mathbf{t}$ such that $\mathbf{t}_\#\mu = \nu$ (this happens for instance when $\mu$ is a Dirac mass and $\nu$ is not a Dirac mass). Kantorovich’s formulation
\[ \min \Big\{ \int_{X \times Y} c(x, y) \, d\gamma(x, y) : \gamma \in \Gamma(\mu, \nu) \Big\} \tag{2.7} \]
circumvents this problem (as µ × ν ∈ Γ(µ, ν)). The existence of an optimal
transport plan, when c is l.s.c., is provided by (1.12) and by Theorem 1.1,
taking into account that Γ(µ, ν) is tight (this follows easily by the fact that the
marginals of the measures in Γ(µ, ν) are fixed, and by the fact that according
to Ulam’s theorem any finite measure in a complete and separable metric space
is tight, see also Chapter 6 in [8] for more general formulations).
The problem (2.7) is truly a weak formulation of (2.6) in the following sense:
if c is bounded and continuous, and if µ has no atom, then the “min” in (2.7)
is equal to the “inf” in (2.6), see [36], [6]. This result can also be extended to
classes of unbounded cost functions, see [61].
In the sequel we consider the case when $X = Y$ and $c(x, y) = d^2(x, y)$, where $d$ is the distance in $X$, and denote by $\Gamma_o(\mu, \nu)$ the optimal plans in (2.7) corresponding to this choice of the cost function. In this case we use the minimum value to define the Kantorovich-Rubinstein-Wasserstein distance
\[ W_2(\mu, \nu) := \Big( \int_{X \times X} d^2(x, y) \, d\gamma \Big)^{1/2}, \quad \gamma \in \Gamma_o(\mu, \nu). \tag{2.8} \]
Theorem 2.4 Let $X$ be a complete and separable metric space. Then $W_2$ defines a distance in $\mathscr{P}_2(X)$ and $\mathscr{P}_2(X)$, endowed with this distance, is a complete and separable metric space. Furthermore, for a given sequence $(\mu_n) \subset \mathscr{P}_2(X)$ we have
\[ \lim_{n\to\infty} W_2(\mu_n, \mu) = 0 \iff \begin{cases} \mu_n \text{ narrowly converge to } \mu, \\ (\mu_n) \text{ has uniformly integrable 2-moments.} \end{cases} \tag{2.9} \]
Proof. We just prove that $W_2$ is a distance. The complete statement is proved for instance in Proposition 7.1.5 of [8] or, in the locally compact case, in [65].
Let $\mu, \nu, \sigma \in \mathscr{P}_2(X)$ and let $\gamma \in \Gamma_o(\mu, \nu)$ and $\eta \in \Gamma_o(\nu, \sigma)$. General results of probability theory (see the above-mentioned references) ensure the existence of $\lambda \in \mathscr{P}(X \times X \times X)$ such that
\[ (\pi^1, \pi^2)_\#\lambda = \gamma, \qquad (\pi^2, \pi^3)_\#\lambda = \eta. \]
Then, as
\[ \pi^1_\#(\pi^1, \pi^3)_\#\lambda = \pi^1_\#\lambda = \pi^1_\#\gamma = \mu, \qquad \pi^2_\#(\pi^1, \pi^3)_\#\lambda = \pi^3_\#\lambda = \pi^2_\#\eta = \sigma, \]
we obtain that $(\pi^1, \pi^3)_\#\lambda \in \Gamma(\mu, \sigma)$, hence
\[ W_2(\mu, \sigma) \le \Big( \int_{X \times X} d^2(x_1, x_2) \, d(\pi^1, \pi^3)_\#\lambda \Big)^{1/2} = \| d(x_1, x_3) \|_{L^2(\lambda)}. \]
As $d(x_1, x_3) \le d(x_1, x_2) + d(x_2, x_3)$ and
\[ \| d(x_1, x_2) \|_{L^2(\lambda)} = \| d(x_1, x_2) \|_{L^2(\gamma)} = W_2(\mu, \nu), \qquad \| d(x_2, x_3) \|_{L^2(\lambda)} = \| d(x_2, x_3) \|_{L^2(\eta)} = W_2(\nu, \sigma), \]
the triangle inequality $W_2(\mu, \sigma) \le W_2(\mu, \nu) + W_2(\nu, \sigma)$ follows from the standard triangle inequality in $L^2(\lambda)$.
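For a concrete feeling of (2.8) and of the triangle inequality just proved, one can compute $W_2$ between empirical measures on the real line. The sketch below relies on a classical one-dimensional fact that is not stated in the text and is taken here as an assumption: for two measures made of the same number $n$ of atoms with equal weights $1/n$, an optimal plan pairs the atoms in increasing order. The function name `w2_empirical_1d` and the sample points are hypothetical.

```python
import math


def w2_empirical_1d(xs, ys):
    """W_2 between (1/n) sum delta_{x_i} and (1/n) sum delta_{y_i} on the real line.

    Assumes the monotone (sorted) pairing is optimal, which is the classical
    one-dimensional situation for the quadratic cost.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    n = len(xs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n)


# Three hypothetical empirical measures with four atoms each.
mu = [0.0, 1.0, 2.0, 5.0]
nu = [1.0, 1.5, 4.0, 4.5]
sigma = [-2.0, 0.0, 3.0, 7.0]

d_mu_nu = w2_empirical_1d(mu, nu)
d_nu_sigma = w2_empirical_1d(nu, sigma)
d_mu_sigma = w2_empirical_1d(mu, sigma)
```

With this pairing, $W_2$ is a rescaled Euclidean distance between the sorted vectors of atoms, so symmetry and the triangle inequality can be checked directly on `d_mu_nu`, `d_nu_sigma` and `d_mu_sigma`.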
Working with Monge’s formulation the proof above is technically easier, as an admissible transport map between µ and σ can be obtained simply by composing transport maps between µ and ν with transport maps between ν and σ. However, in order to give a complete proof one needs to know either that optimal plans are induced by maps, or that the infimum in Monge’s formulation coincides with the minimum in Kantorovich’s one, and neither of these results is trivial, even in Euclidean spaces.
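Although in many situations that we consider in this paper the optimal plans are induced by maps, the Kantorovich formulation of the optimal transport problem is still quite useful to provide estimates from above on $W_2$. For instance:
\[ W_2^2(\mu, \nu) \le \int_X d^2(\mathbf{t}(x), \mathbf{s}(x)) \, d\sigma(x) \quad \text{whenever } \mathbf{t}_\#\sigma = \mu,\ \mathbf{s}_\#\sigma = \nu. \tag{2.10} \]
This follows from the fact that $(\mathbf{t}, \mathbf{s})_\#\sigma \in \Gamma(\mu, \nu)$ and from the identity
\[ \int_X d^2(\mathbf{t}(x), \mathbf{s}(x)) \, d\sigma(x) = \int_{X \times X} d^2(x, y) \, d(\mathbf{t}, \mathbf{s})_\#\sigma. \]
2.4 Geodesics in $\mathscr{P}_2(\mathbb{R}^d)$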
Let $(X, d)$ be a metric space. Recall that a constant speed geodesic $\gamma : [0, T] \to X$ is a map satisfying
\[ d(\gamma(s), \gamma(t)) = \frac{t - s}{T} \, d(\gamma(0), \gamma(T)) \quad \text{whenever } 0 \le s \le t \le T. \]
Actually only the inequality $d(\gamma(s), \gamma(t)) \le T^{-1}(t - s)\, d(\gamma(0), \gamma(T))$ needs to be checked for all $0 \le s \le t \le T$. Indeed, if the strict inequality occurs for some $s < t$, then the triangle inequality provides
\[ d(\gamma(0), \gamma(T)) \le d(\gamma(0), \gamma(s)) + d(\gamma(s), \gamma(t)) + d(\gamma(t), \gamma(T)) < \frac{1}{T} \big( s + (t - s) + (T - t) \big)\, d(\gamma(0), \gamma(T)) = d(\gamma(0), \gamma(T)), \]
a contradiction.
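Using this elementary fact one can show that, for any choice of $\mu, \nu \in \mathscr{P}_2(\mathbb{R}^d)$ and $\gamma \in \Gamma_o(\mu, \nu)$, the map
\[ \mu_t := \big( (1 - t)\pi^1 + t\pi^2 \big)_\#\gamma, \quad t \in [0, 1] \tag{2.11} \]
is a constant speed geodesic. Indeed, for $0 \le s \le t \le 1$,
\[ \gamma_{st} := \big( (1 - s)\pi^1 + s\pi^2,\ (1 - t)\pi^1 + t\pi^2 \big)_\#\gamma \in \Gamma(\mu_s, \mu_t) \]
and this plan provides the estimate
\[ W_2(\mu_s, \mu_t) \le (t - s)\, W_2(\mu, \nu), \tag{2.12} \]
as
\[ \int_{\mathbb{R}^d \times \mathbb{R}^d} |x_1 - x_2|^2 \, d\gamma_{st} = \int_{\mathbb{R}^d \times \mathbb{R}^d} |(1 - s)x_1 + s x_2 - (1 - t)x_1 - t x_2|^2 \, d\gamma = (s - t)^2 \int_{\mathbb{R}^d \times \mathbb{R}^d} |x_1 - x_2|^2 \, d\gamma. \]
It has been proved in Theorem 7.2.2 of [8] that any constant speed geodesic joining $\mu$ to $\nu$ can be built in this way. We discuss additional regularity properties of the geodesics in the next section. Here we just mention that, in the case when $\gamma$ is induced by a transport map $\mathbf{t}$ (i.e. $\gamma = (\mathbf{i}, \mathbf{t})_\#\mu$), then (2.11) reduces to
\[ \mu_t = \big( (1 - t)\mathbf{i} + t\,\mathbf{t} \big)_\#\mu, \quad t \in [0, 1]. \tag{2.13} \]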
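The interpolation (2.11) and the constant speed property can be observed on empirical measures on the real line. As in any such one-dimensional sketch, the code below assumes (this is not taken from the text) that for two measures with $n$ equally weighted atoms the sorted pairing of atoms is an optimal plan; the names `w2` and `interpolate` and the sample points are hypothetical.

```python
import math


def w2(xs, ys):
    """W_2 between two empirical measures on the line, via the sorted pairing (assumed optimal)."""
    n = len(xs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n)


def interpolate(xs, ys, t):
    """Atoms of mu_t = ((1 - t) pi^1 + t pi^2)# gamma for the sorted pairing gamma, cf. (2.11)."""
    return [(1 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


mu = [0.0, 1.0, 2.0, 5.0]
nu = [1.0, 1.5, 4.0, 4.5]
total = w2(mu, nu)


def speed_defect(s, t):
    """|W_2(mu_s, mu_t) - (t - s) W_2(mu, nu)|, which should vanish for s <= t."""
    return abs(w2(interpolate(mu, nu, s), interpolate(mu, nu, t)) - (t - s) * total)
```

Since a convex combination of two increasing lists is again increasing, the atoms of `interpolate(mu, nu, s)` and `interpolate(mu, nu, t)` are paired index by index, and `speed_defect(s, t)` is zero up to rounding.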
2.5 Existence of optimal transport maps
The following result was first proved in []. See [], [] and of [8] for several extensions of it (more general cost functions as well as more general classes of initial measures), and for additional bibliographical references. Here we just briefly sketch the proof of (iii).
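Theorem 2.5 (Existence and uniqueness of optimal transport maps) For any $\mu \in \mathscr{P}_2^a(\mathbb{R}^d)$, $\nu \in \mathscr{P}_2(\mathbb{R}^d)$ Kantorovich’s optimal transport problem (2.7) with $c(x, y) = |x - y|^2$ has a unique solution $\gamma$. Moreover:
(i) $\gamma$ is induced by a transport map $\mathbf{t}$, i.e. $\gamma = (\mathbf{i}, \mathbf{t})_\#\mu$. In particular $\mathbf{t}$ is the unique solution of Monge’s optimal transport problem (2.6).
(ii) The map $\mathbf{t}$ coincides $\mu$-a.e. with the gradient of a convex function $\varphi : \mathbb{R}^d \to (-\infty, +\infty]$, whose finiteness domain $D(\varphi)$ satisfies $\mu(\mathbb{R}^d \setminus D(\varphi)) = 0$.
(iii) If $\nu = \rho' \mathscr{L}^d \in \mathscr{P}_2^a(\mathbb{R}^d)$ as well, then, denoting by $\mathbf{s}$ the optimal transport map between $\nu$ and $\mu$,
\[ \mathbf{s} \circ \mathbf{t} = \mathbf{i} \quad \mu\text{-a.e. in } \mathbb{R}^d \qquad \text{and} \qquad \mathbf{t} \circ \mathbf{s} = \mathbf{i} \quad \nu\text{-a.e. in } \mathbb{R}^d. \]
In particular $\mathbf{t}$ is $\mu$-essentially injective, i.e. there exists a $\mu$-negligible set $N \subset \mathbb{R}^d$ such that, setting $\Omega = \mathbb{R}^d \setminus N$, $\mathbf{t}|_\Omega$ is injective. Finally, denoting by $\rho$ the density of $\mu$ with respect to $\mathscr{L}^d$,
\[ \rho' := \frac{\rho}{\det \nabla^2 \varphi} \circ (\mathbf{t}|_\Omega)^{-1} \quad \nu\text{-a.e. in } \mathbb{R}^d. \]
Proof. Let $\mathbf{s}$ be the unique optimal transport map between $\nu$ and $\mu$. Since $(\mathbf{i}, \mathbf{t})_\#\mu$ and $(\mathbf{s}, \mathbf{i})_\#\nu$ are both optimal plans between $\mu$ and $\nu$, they coincide. Testing this identity between plans on $|\mathbf{s}(\mathbf{t}(x)) - x|$ (resp. $|\mathbf{t}(\mathbf{s}(y)) - y|$) we obtain that $\mathbf{s} \circ \mathbf{t} = \mathbf{i}$ $\mu$-a.e. in $\mathbb{R}^d$ (resp. $\mathbf{t} \circ \mathbf{s} = \mathbf{i}$ $\nu$-a.e. in $\mathbb{R}^d$):
\[ \int_{\mathbb{R}^d} |x - \mathbf{s}(\mathbf{t}(x))| \, d\mu(x) = \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - \mathbf{s}(y)| \, d(\mathbf{i}, \mathbf{t})_\#\mu = \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - \mathbf{s}(y)| \, d(\mathbf{s}, \mathbf{i})_\#\nu = \int_{\mathbb{R}^d} |\mathbf{s}(y) - \mathbf{s}(y)| \, d\nu(y) = 0. \]
The formula for the density of $\nu$ with respect to $\mathscr{L}^d$ follows by Lemma 2.2, taking into account the $\mu$-essential injectivity of $\mathbf{t}$.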
In the following we shall denote by $\mathbf{t}_\mu^\nu$ the unique optimal map given by Theorem 2.5. Notice that $\mathbf{t} = \nabla \varphi$ is uniquely determined only $\mu$-a.e., hence $\varphi$ is not uniquely determined, not even up to additive constants, unless $\mu = \rho \mathscr{L}^d$ with $\rho > 0$ $\mathscr{L}^d$-a.e. in $\mathbb{R}^d$. However, the existence proof (at least the one achieved through a duality argument) yields some “canonical” $\varphi$, given by
\[ \varphi(x) = \sup_{y \in \operatorname{supp} \nu} \big( \langle x, y \rangle + \psi(y) \big), \quad x \in \mathbb{R}^d \tag{2.14} \]
for a suitable function $\psi : \operatorname{supp} \nu \to [-\infty, +\infty)$. This explicit expression is sometimes technically useful: for instance, it shows that when $\operatorname{supp} \nu$ is bounded we can always find a globally convex and Lipschitz map $\varphi$ whose gradient is the optimal transport map.
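The monotone structure in Theorem 2.5(ii) has a simple one-dimensional analogue that can be verified by brute force. Theorem 2.5 does not apply literally to the sketch below, since the measures are discrete rather than absolutely continuous; the example only assumes two empirical measures with five equally weighted atoms each (the data and variable names are hypothetical) and searches all bijections between the atoms for the cheapest quadratic cost, as in Monge’s problem (2.6).

```python
import itertools

# Hypothetical atoms of mu and nu, each carrying mass 1/5.
xs = [0.3, -1.0, 2.0, 0.9, 1.4]
ys = [1.0, 3.5, -0.5, 0.2, 2.2]


def cost(perm):
    """Quadratic transport cost of the map sending xs[i] to ys[perm[i]]."""
    return sum((xs[i] - ys[perm[i]]) ** 2 for i in range(len(xs))) / len(xs)


# Exhaustive search over the 5! = 120 bijections between the atoms.
best_perm = min(itertools.permutations(range(len(xs))), key=cost)
best_cost = cost(best_perm)

# Cost of the monotone rearrangement: i-th smallest x goes to i-th smallest y.
sorted_cost = sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# The optimal map is nondecreasing, like the derivative of a convex function.
pairs = sorted((xs[i], ys[best_perm[i]]) for i in range(len(xs)))
is_monotone = all(p[1] <= q[1] for p, q in zip(pairs, pairs[1:]))
```

The search returns the nondecreasing pairing, which is the discrete counterpart of a map of the form $\nabla \varphi$ with $\varphi$ convex.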
Theorem 2.6 (Regularity in the interior of geodesics) Let $\mu, \nu \in \mathscr{P}_2(\mathbb{R}^d)$ and let
\[ \mu_t := \big( (1 - t)\pi^1 + t\pi^2 \big)_\#\gamma \]
be a constant speed geodesic induced by $\gamma \in \Gamma_o(\mu, \nu)$. Then the following properties hold:
(i) For any $t \in [0, 1)$ there exists a unique optimal plan between $\mu_t$ and $\mu$, and this plan is induced by a map $\mathbf{s}_t$ with Lipschitz constant at most $1/(1 - t)$;
(ii) If $\mu = \rho \mathscr{L}^d \in \mathscr{P}_2^a(\mathbb{R}^d)$ then $\mu_t \in \mathscr{P}_2^a(\mathbb{R}^d)$ for all $t \in [0, 1)$.
Proof. (i) The necessary optimality conditions at the level of plans (see for instance §6.2.3 of [8], or [65]) imply that the support of $\gamma$ is contained in the graph
\[ \{ (x, y) : y \in \Gamma(x) \} \]
of a monotone operator $\Gamma(x)$. On the other hand, the same argument used in the proof of (2.12) shows that the plan $\gamma_t := \big( \pi^1, (1 - t)\pi^1 + t\pi^2 \big)_\#\gamma$ is optimal between $\mu$ and $\mu_t$. The support of $\gamma_t$ is contained in the graph of the monotone operator $\Gamma_t := (1 - t)I + t\Gamma$, whose inverse
\[ \Gamma_t^{-1}(y) := \{ x \in \mathbb{R}^d : y \in \Gamma_t(x) \} \]
is single-valued and $1/(1 - t)$-Lipschitz continuous. Therefore the graph of $\Gamma_t^{-1}$ is the graph of a $1/(1 - t)$-Lipschitz map $\mathbf{s}_t$ pushing $\mu_t$ to $\mu$. The uniqueness of this map, even at the level of plans, is proved in Lemma 7.2.1 of [8].
(ii) If $A \in \mathscr{B}(\mathbb{R}^d)$ is $\mathscr{L}^d$-negligible, then $\mathbf{s}_t(A)$ is also $\mathscr{L}^d$-negligible, hence $\mu$-negligible. Setting $\mathbf{t}_t := (1 - t)\mathbf{i} + t\,\mathbf{t}$, where $\mathbf{t}$ is the optimal map inducing $\gamma$ as in (2.13), the identity $\mathbf{s}_t \circ \mathbf{t}_t = \mathbf{i}$ $\mu$-a.e. then gives
\[ \mu_t(A) = \mu(\mathbf{t}_t^{-1}(A)) \le \mu(\mathbf{s}_t(A)) = 0. \]
This proves that $\mu_t \ll \mathscr{L}^d$.
2.6 The continuity equation with locally Lipschitz velocity fields
In this section we collect some results on the continuity equation
\[ \partial_t \mu_t + \nabla \cdot (v_t \mu_t) = 0 \quad \text{in } \mathbb{R}^d \times (0, T), \tag{2.15} \]
which we will need in the sequel. Here $\mu_t$ is a Borel family of probability measures on $\mathbb{R}^d$ defined for $t$ in the open interval $I := (0, T)$, $v : (x, t) \mapsto v_t(x) \in \mathbb{R}^d$ is a Borel velocity field such that
\[ \int_0^T \int_{\mathbb{R}^d} |v_t(x)| \, d\mu_t(x) \, dt < +\infty, \tag{2.16} \]
and we suppose that (2.15) holds in the sense of distributions, i.e.
\[ \int_0^T \int_{\mathbb{R}^d} \Big( \partial_t \varphi(x, t) + \langle v_t(x), \nabla_x \varphi(x, t) \rangle \Big) \, d\mu_t(x) \, dt = 0 \quad \forall\, \varphi \in C_c^\infty(\mathbb{R}^d \times (0, T)). \tag{2.17} \]
Remark 2.7 (More general test functions) By a simple regularization argument via convolution, it is easy to show that (2.17) holds if $\varphi \in C_c^1(\mathbb{R}^d \times (0, T))$ as well. Moreover, under condition (2.16), we can also consider bounded test functions $\varphi$, with bounded gradient, whose support has a compact projection in $(0, T)$ (that is, the support in $x$ need not be compact): it suffices to approximate $\varphi$ by $\varphi \chi_R$ where $\chi_R \in C_c^\infty(\mathbb{R}^d)$, $0 \le \chi_R \le 1$, $|\nabla \chi_R| \le 2$ and $\chi_R = 1$ on $B_R(0)$.
First of all we recall some technical preliminaries.
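Lemma 2.8 (Continuous representative) Let $\mu_t$ be a Borel family of probability measures satisfying (2.17) for a Borel vector field $v_t$ satisfying (2.16). Then there exists a narrowly continuous curve $t \in [0, T] \mapsto \tilde{\mu}_t \in \mathscr{P}(\mathbb{R}^d)$ such that $\mu_t = \tilde{\mu}_t$ for $\mathscr{L}^1$-a.e. $t \in (0, T)$. Moreover, if $\varphi \in C_c^1(\mathbb{R}^d \times [0, T])$ and $t_1 \le t_2 \in [0, T]$ we have
\[ \int_{\mathbb{R}^d} \varphi(x, t_2) \, d\tilde{\mu}_{t_2}(x) - \int_{\mathbb{R}^d} \varphi(x, t_1) \, d\tilde{\mu}_{t_1}(x) = \int_{t_1}^{t_2} \int_{\mathbb{R}^d} \Big( \partial_t \varphi + \langle \nabla \varphi, v_t \rangle \Big) \, d\mu_t(x) \, dt. \tag{2.18} \]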
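Proof. Let us take $\varphi(x, t) = \eta(t)\zeta(x)$, $\eta \in C_c^\infty(0, T)$ and $\zeta \in C_c^\infty(\mathbb{R}^d)$; we have
\[ -\int_0^T \eta'(t) \Big( \int_{\mathbb{R}^d} \zeta(x) \, d\mu_t(x) \Big) dt = \int_0^T \eta(t) \Big( \int_{\mathbb{R}^d} \langle \nabla \zeta(x), v_t(x) \rangle \, d\mu_t(x) \Big) dt, \]
so that the map
\[ t \mapsto \mu_t(\zeta) = \int_{\mathbb{R}^d} \zeta(x) \, d\mu_t(x) \]
belongs to $W^{1,1}(0, T)$ with distributional derivative
\[ \dot{\mu}_t(\zeta) = \int_{\mathbb{R}^d} \langle \nabla \zeta(x), v_t(x) \rangle \, d\mu_t(x) \quad \text{for } \mathscr{L}^1\text{-a.e. } t \in (0, T) \tag{2.19} \]
with
\[ |\dot{\mu}_t(\zeta)| \le V(t) \sup_{\mathbb{R}^d} |\nabla \zeta|, \qquad V(t) := \int_{\mathbb{R}^d} |v_t(x)| \, d\mu_t(x), \qquad V \in L^1(0, T). \tag{2.20} \]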
If $L_\zeta$ is the set of Lebesgue points of $t \mapsto \mu_t(\zeta)$, we know that $\mathscr{L}^1((0, T) \setminus L_\zeta) = 0$. Let us now take a countable set $Z$ which is dense in $C_c^1(\mathbb{R}^d)$ with respect to the usual $C^1$ norm $\|\zeta\|_{C^1} = \sup_{\mathbb{R}^d}(|\zeta|, |\nabla \zeta|)$ and let us set $L_Z := \cap_{\zeta \in Z} L_\zeta$. The restriction of the curve $\mu$ to $L_Z$ provides a uniformly continuous family of bounded functionals on $C_c^1(\mathbb{R}^d)$, since (2.20) shows
\[ |\mu_t(\zeta) - \mu_s(\zeta)| \le \|\zeta\|_{C^1} \int_s^t V(\lambda) \, d\lambda \quad \forall\, s, t \in L_Z. \]
Therefore, it can be extended in a unique way to a continuous curve $\{\tilde{\mu}_t\}_{t \in [0, T]}$ in $[C_c^1(\mathbb{R}^d)]'$. If we show that $\{\mu_t\}_{t \in L_Z}$ is also tight, the extension provides a continuous curve in $\mathscr{P}(\mathbb{R}^d)$.
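To this end, let us consider nonnegative, smooth functions $\zeta_k : \mathbb{R}^d \to [0, 1]$, $k \in \mathbb{N}$, such that
\[ \zeta_k(x) = 1 \ \text{ if } |x| \le k, \qquad \zeta_k(x) = 0 \ \text{ if } |x| \ge k + 1, \qquad |\nabla \zeta_k(x)| \le 2. \]
It is not restrictive to suppose that $\zeta_k \in Z$. Applying the previous formula (2.19), for $t, s \in L_Z$ we have
\[ |\mu_t(\zeta_k) - \mu_s(\zeta_k)| \le a_k := 2 \int_0^T \int_{k < |x| < k + 1} |v_\lambda(x)| \, d\mu_\lambda(x) \, d\lambda, \]
with $\sum_{k=1}^{+\infty} a_k < +\infty$. For a fixed $s \in L_Z$ and $\varepsilon > 0$, since $\mu_s$ is tight, we can find $k \in \mathbb{N}$ such that $\mu_s(\zeta_k) > 1 - \varepsilon/2$ and $a_k < \varepsilon/2$. It follows that
\[ \mu_t(B_{k+1}(0)) \ge \mu_t(\zeta_k) \ge 1 - \varepsilon \quad \forall\, t \in L_Z. \]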
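Now we show (2.18). Let us choose $\varphi \in C_c^1(\mathbb{R}^d \times [0, T])$ and set $\varphi_\varepsilon(x, t) = \eta_\varepsilon(t)\varphi(x, t)$, where $\eta_\varepsilon \in C_c^\infty(t_1, t_2)$ is such that
\[ 0 \le \eta_\varepsilon(t) \le 1, \qquad \lim_{\varepsilon \downarrow 0} \eta_\varepsilon(t) = \chi_{(t_1, t_2)}(t) \quad \forall\, t \in [0, T], \qquad \lim_{\varepsilon \downarrow 0} \eta_\varepsilon' = \delta_{t_1} - \delta_{t_2} \]
in the duality with continuous functions in $[0, T]$. We get
\[ \begin{aligned} 0 &= \int_0^T \int_{\mathbb{R}^d} \Big( \partial_t(\eta_\varepsilon \varphi) + \langle \nabla_x(\eta_\varepsilon \varphi), v_t \rangle \Big) \, d\mu_t(x) \, dt \\ &= \int_0^T \eta_\varepsilon(t) \int_{\mathbb{R}^d} \Big( \partial_t \varphi(x, t) + \langle v_t(x), \nabla_x \varphi(x, t) \rangle \Big) \, d\mu_t(x) \, dt + \int_0^T \eta_\varepsilon'(t) \int_{\mathbb{R}^d} \varphi(x, t) \, d\tilde{\mu}_t(x) \, dt. \end{aligned} \]
Passing to the limit as $\varepsilon$ vanishes and invoking the continuity of $\tilde{\mu}_t$, we get (2.18).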
Lemma 2.9 (Time rescaling) Let $\mathsf{t} : s \in [0, T'] \to \mathsf{t}(s) \in [0, T]$ be a strictly increasing absolutely continuous map with absolutely continuous inverse $\mathsf{s} := \mathsf{t}^{-1}$. Then $(\mu_t, v_t)$ is a distributional solution of (2.15) if and only if
\[ \hat{\mu} := \mu \circ \mathsf{t}, \qquad \hat{v} := \mathsf{t}' \, v \circ \mathsf{t}, \]
is a distributional solution of (2.15) on $(0, T')$.
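Proof. By an elementary smoothing argument we can assume that $\mathsf{s}$ is continuously differentiable and $\mathsf{s}' > 0$. We choose $\hat{\varphi} \in C_c^1(\mathbb{R}^d \times (0, T'))$ and set $\varphi(x, t) := \hat{\varphi}(x, \mathsf{s}(t))$; since $\varphi \in C_c^1(\mathbb{R}^d \times (0, T))$ we have
\[ \begin{aligned} 0 &= \int_0^T \int_{\mathbb{R}^d} \Big( \mathsf{s}'(t)\, \partial_s \hat{\varphi}(x, \mathsf{s}(t)) + \langle \nabla_x \hat{\varphi}(x, \mathsf{s}(t)), v_t(x) \rangle \Big) \, d\mu_t(x) \, dt \\ &= \int_0^T \mathsf{s}'(t) \int_{\mathbb{R}^d} \Big( \partial_s \hat{\varphi}(x, \mathsf{s}(t)) + \Big\langle \nabla_x \hat{\varphi}(x, \mathsf{s}(t)), \frac{v_t(x)}{\mathsf{s}'(t)} \Big\rangle \Big) \, d\mu_t(x) \, dt \\ &= \int_0^{T'} \int_{\mathbb{R}^d} \Big( \partial_s \hat{\varphi}(x, s) + \langle \nabla_x \hat{\varphi}(x, s), \mathsf{t}'(s)\, v_{\mathsf{t}(s)}(x) \rangle \Big) \, d\hat{\mu}_s(x) \, ds. \end{aligned} \]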
When the velocity field vt is more regular, the classical method of characteristics provides an explicit solution of (2.15). First we recall an elementary
result of the theory of ordinary differential equations.
Lemma 2.10 (The characteristic system of ODE) Let $v_t$ be a Borel vector field such that for every compact set $B \subset \mathbb{R}^d$
\[ \int_0^T \Big( \sup_B |v_t| + \mathrm{Lip}(v_t, B) \Big) \, dt < +\infty. \tag{2.21} \]
Then, for every $x \in \mathbb{R}^d$ and $s \in [0, T]$ the ODE
\[ X_s(x, s) = x, \qquad \frac{d}{dt} X_t(x, s) = v_t(X_t(x, s)), \tag{2.22} \]
admits a unique maximal solution defined in an interval $I(x, s)$ relatively open in $[0, T]$ and containing $s$ as (relatively) internal point.
Furthermore, if $t \mapsto |X_t(x, s)|$ is bounded in the interior of $I(x, s)$ then $I(x, s) = [0, T]$; finally, if $v$ satisfies the global bounds analogous to (2.21)
\[ S := \int_0^T \Big( \sup_{\mathbb{R}^d} |v_t| + \mathrm{Lip}(v_t, \mathbb{R}^d) \Big) \, dt < +\infty, \tag{2.23} \]
then the flow map $X$ satisfies
\[ \int_0^T \sup_{x \in \mathbb{R}^d} |\partial_t X_t(x, s)| \, dt \le S, \qquad \sup_{t, s \in [0, T]} \mathrm{Lip}(X_t(\cdot, s), \mathbb{R}^d) \le e^S. \tag{2.24} \]
For simplicity, we set Xt (x) := Xt (x, 0) in the particular case s = 0 and we
denote by τ (x) := sup I(x, 0) the length of the maximal time domain of the
characteristics leaving from x at t = 0.
Remark 2.11 (The characteristics method for backward first order linear PDE’s) Characteristics provide a useful representation formula for classical solutions of the backward equation (formally adjoint to (2.15))
\[ \partial_t \varphi + \langle v_t, \nabla \varphi \rangle = \psi \quad \text{in } \mathbb{R}^d \times (0, T), \qquad \varphi(x, T) = \varphi_T(x) \quad x \in \mathbb{R}^d, \tag{2.25} \]
when, e.g., $\psi \in C_b^1(\mathbb{R}^d \times (0, T))$, $\varphi_T \in C_b^1(\mathbb{R}^d)$ and $v$ satisfies the global bounds (2.23), so that maximal solutions are always defined in $[0, T]$. A direct calculation shows that
\[ \varphi(x, t) := \varphi_T(X_T(x, t)) - \int_t^T \psi(X_s(x, t), s) \, ds \tag{2.26} \]
solves (2.25). Indeed, the identity $X_s(X_t(x, 0), t) = X_s(x, 0)$ yields
\[ \varphi(X_t(x, 0), t) = \varphi_T(X_T(x, 0)) - \int_t^T \psi(X_s(x, 0), s) \, ds, \]
and differentiating both sides with respect to $t$ we obtain
\[ \Big[ \frac{\partial \varphi}{\partial t} + \langle v_t, \nabla \varphi \rangle \Big](X_t(x, 0), t) = \psi(X_t(x, 0), t). \]
Since $x$ (and then $X_t(x, 0)$) is arbitrary we conclude that (2.25) is fulfilled.
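The representation formula (2.26) can be checked numerically on a case where the flow is explicit. The sketch below assumes, purely for illustration, the one-dimensional field $v_t(x) = \cos t$ (which satisfies the global bounds (2.23)), the source term $\psi(x, s) = \cos x$, the final datum $\varphi_T(x) = \sin(2x)$ and $T = 1$; the integral in (2.26) is approximated by the midpoint rule and the PDE residual of (2.25) by central finite differences.

```python
import math

T = 1.0


def flow(s, x, t):
    """X_s(x, t) for the assumed field v_t(x) = cos(t): position at time s when starting from x at time t."""
    return x + math.sin(s) - math.sin(t)


def velocity(t, x):
    return math.cos(t)


def psi(x, s):
    """Assumed right-hand side of (2.25)."""
    return math.cos(x)


def phi_T(x):
    """Assumed final datum of (2.25)."""
    return math.sin(2.0 * x)


def phi(x, t, panels=2000):
    """Formula (2.26), with the time integral computed by the midpoint rule."""
    h = (T - t) / panels
    integral = h * sum(
        psi(flow(t + (j + 0.5) * h, x, t), t + (j + 0.5) * h) for j in range(panels)
    )
    return phi_T(flow(T, x, t)) - integral


def residual(x, t, h=1e-3):
    """Finite-difference approximation of d_t phi + v_t d_x phi - psi at (x, t)."""
    dphi_dt = (phi(x, t + h) - phi(x, t - h)) / (2.0 * h)
    dphi_dx = (phi(x + h, t) - phi(x - h, t)) / (2.0 * h)
    return dphi_dt + velocity(t, x) * dphi_dx - psi(x, t)
```

At interior points such as $(x, t) = (0.4, 0.3)$ the residual is of the order of the discretization error, and `phi(x, T)` returns `phi_T(x)` exactly because the integral runs over an empty interval.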
Now we use characteristics to prove the existence, the uniqueness, and a representation formula of the solution of the continuity equation, under suitable assumptions on $v$.
Lemma 2.12 Let $v_t$ be a Borel velocity field satisfying (2.21), (2.16), let $\mu_0 \in \mathscr{P}(\mathbb{R}^d)$, and let $X_t$ be the maximal solution of the ODE (2.22) (corresponding to $s = 0$). Suppose that for some $\bar{t} \in (0, T]$
\[ \tau(x) > \bar{t} \quad \text{for } \mu_0\text{-a.e. } x \in \mathbb{R}^d. \tag{2.27} \]
Then $t \mapsto \mu_t := (X_t)_\#\mu_0$ is a continuous solution of (2.15) in $[0, \bar{t}]$.
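Proof. The continuity of $\mu_t$ follows easily since $\lim_{s \to t} X_s(x) = X_t(x)$ for $\mu_0$-a.e. $x \in \mathbb{R}^d$: thus for every continuous and bounded function $\zeta : \mathbb{R}^d \to \mathbb{R}$ the dominated convergence theorem yields
\[ \lim_{s \to t} \int_{\mathbb{R}^d} \zeta \, d\mu_s = \lim_{s \to t} \int_{\mathbb{R}^d} \zeta(X_s(x)) \, d\mu_0(x) = \int_{\mathbb{R}^d} \zeta(X_t(x)) \, d\mu_0(x) = \int_{\mathbb{R}^d} \zeta \, d\mu_t. \]
For any $\varphi \in C_c^\infty(\mathbb{R}^d \times (0, \bar{t}))$ and for $\mu_0$-a.e. $x \in \mathbb{R}^d$ the maps $t \mapsto \phi_t(x) := \varphi(X_t(x), t)$ are absolutely continuous in $(0, \bar{t})$, with
\[ \dot{\phi}_t(x) = \partial_t \varphi(X_t(x), t) + \langle \nabla \varphi(X_t(x), t), v_t(X_t(x)) \rangle = \Lambda(\cdot, t) \circ X_t, \]
where $\Lambda(x, t) := \partial_t \varphi(x, t) + \langle \nabla \varphi(x, t), v_t(x) \rangle$. We thus have
\[ \begin{aligned} \int_0^{\bar{t}} \int_{\mathbb{R}^d} |\dot{\phi}_t(x)| \, d\mu_0(x) \, dt &= \int_0^{\bar{t}} \int_{\mathbb{R}^d} |\Lambda(X_t(x), t)| \, d\mu_0(x) \, dt = \int_0^{\bar{t}} \int_{\mathbb{R}^d} |\Lambda(x, t)| \, d\mu_t(x) \, dt \\ &\le \mathrm{Lip}(\varphi) \Big( \bar{t} + \int_0^{\bar{t}} \int_{\mathbb{R}^d} |v_t(x)| \, d\mu_t(x) \, dt \Big) < +\infty \end{aligned} \]
and therefore
\[ \begin{aligned} 0 &= \int_{\mathbb{R}^d} \varphi(x, \bar{t}) \, d\mu_{\bar{t}}(x) - \int_{\mathbb{R}^d} \varphi(x, 0) \, d\mu_0(x) = \int_{\mathbb{R}^d} \Big( \varphi(X_{\bar{t}}(x), \bar{t}) - \varphi(x, 0) \Big) \, d\mu_0(x) \\ &= \int_{\mathbb{R}^d} \int_0^{\bar{t}} \dot{\phi}_t(x) \, dt \, d\mu_0(x) = \int_0^{\bar{t}} \int_{\mathbb{R}^d} \Big( \partial_t \varphi + \langle \nabla \varphi, v_t \rangle \Big) \, d\mu_t \, dt, \end{aligned} \]
by a simple application of Fubini’s theorem.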
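Lemma 2.12 can be illustrated with a cloud of particles. The sketch below assumes the linear field $v(x) = -x$ on the real line, whose flow $X_t(x) = x e^{-t}$ is explicit and globally defined, an empirical initial measure with five atoms, and the test function $\zeta(x) = e^{-x^2}$ (bounded with bounded gradient, in the spirit of Remark 2.7). None of these choices comes from the text. The code compares a finite-difference time derivative of $t \mapsto \int \zeta \, d\mu_t$ with the right-hand side of (2.19).

```python
import math


def flow(t, x):
    """X_t(x) for the assumed velocity field v(x) = -x."""
    return x * math.exp(-t)


def velocity(x):
    return -x


def zeta(x):
    """Assumed smooth bounded test function."""
    return math.exp(-x * x)


def dzeta(x):
    """Derivative of zeta."""
    return -2.0 * x * math.exp(-x * x)


# Hypothetical initial measure mu_0: five atoms of mass 1/5.
particles = [-1.5, -0.4, 0.1, 0.8, 2.0]


def integral_of_zeta(t):
    """Integral of zeta against mu_t = (X_t)# mu_0, computed through (1.2)."""
    return sum(zeta(flow(t, x)) for x in particles) / len(particles)


def weak_rhs(t):
    """Integral of <grad zeta, v> against mu_t, the right-hand side of (2.19)."""
    return sum(dzeta(flow(t, x)) * velocity(flow(t, x)) for x in particles) / len(particles)


def weak_lhs(t, h=1e-5):
    """Central finite-difference approximation of d/dt of integral_of_zeta."""
    return (integral_of_zeta(t + h) - integral_of_zeta(t - h)) / (2.0 * h)
```

For every sampled time the two quantities `weak_lhs(t)` and `weak_rhs(t)` coincide up to the finite-difference error, which is the particle version of the distributional formulation (2.17).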
We want to prove that, under reasonable assumptions, in fact any solution
of (2.15) can be represented as in Lemma 2.12. The first step is a uniqueness
theorem for the continuity equation under minimal regularity assumptions on
the velocity field. Notice that the only global information on vt is (2.28). The
proof is based on a classical duality argument (see for instance [30, 6]).
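Proposition 2.13 (Uniqueness and comparison for the continuity equation) Let $\sigma_t$ be a narrowly continuous family of signed measures solving $\partial_t \sigma_t + \nabla \cdot (v_t \sigma_t) = 0$ in $\mathbb{R}^d \times (0, T)$, with $\sigma_0 \le 0$,
\[ \int_0^T \int_{\mathbb{R}^d} |v_t| \, d|\sigma_t| \, dt < +\infty, \tag{2.28} \]
and
\[ \int_0^T \Big( |\sigma_t|(B) + \sup_B |v_t| + \mathrm{Lip}(v_t, B) \Big) \, dt < +\infty \]
for any bounded closed set $B \subset \mathbb{R}^d$. Then $\sigma_t \le 0$ for any $t \in [0, T]$.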
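Proof. Fix $\psi \in C_c^\infty(\mathbb{R}^d \times (0, T))$ with $0 \le \psi \le 1$, $R > 0$, and a smooth cut-off function
\[ \chi_R(\cdot) = \chi(\cdot/R) \in C_c^\infty(\mathbb{R}^d) \ \text{ such that } 0 \le \chi_R \le 1,\ |\nabla \chi_R| \le 2/R,\ \chi_R \equiv 1 \text{ on } B_R(0), \text{ and } \chi_R \equiv 0 \text{ on } \mathbb{R}^d \setminus B_{2R}(0). \tag{2.29} \]
We define $w_t$ so that $w_t = v_t$ on $B_{2R}(0) \times (0, T)$, $w_t = 0$ if $t \notin [0, T]$ and
\[ \sup_{\mathbb{R}^d} |w_t| + \mathrm{Lip}(w_t, \mathbb{R}^d) \le \sup_{B_{2R}(0)} |v_t| + \mathrm{Lip}(v_t, B_{2R}(0)) \quad \forall\, t \in [0, T]. \tag{2.30} \]
Let $w_t^\varepsilon$ be obtained from $w_t$ by a double mollification with respect to the space and time variables: notice that $w_t^\varepsilon$ satisfy
\[ \sup_{\varepsilon \in (0, 1)} \int_0^T \Big( \sup_{\mathbb{R}^d} |w_t^\varepsilon| + \mathrm{Lip}(w_t^\varepsilon, \mathbb{R}^d) \Big) \, dt < +\infty. \tag{2.31} \]
We now build, by the method of characteristics described in Remark 2.11, a smooth solution $\varphi^\varepsilon : \mathbb{R}^d \times [0, T] \to \mathbb{R}$ of the PDE
\[ \frac{\partial \varphi^\varepsilon}{\partial t} + \langle w_t^\varepsilon, \nabla \varphi^\varepsilon \rangle = \psi \quad \text{in } \mathbb{R}^d \times (0, T), \qquad \varphi^\varepsilon(x, T) = 0 \quad x \in \mathbb{R}^d. \tag{2.32} \]
Combining the representation formula (2.26), the uniform bound (2.31), and the estimate (2.24), it is easy to check that $0 \ge \varphi^\varepsilon \ge -T$ and $|\nabla \varphi^\varepsilon|$ is uniformly bounded with respect to $\varepsilon$, $t$ and $x$.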
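We now insert the test function $\varphi^\varepsilon \chi_R$ in the continuity equation and take into account that $\sigma_0 \le 0$ and $\varphi^\varepsilon \le 0$ to obtain
\[ \begin{aligned} 0 &\ge -\int_{\mathbb{R}^d} \varphi^\varepsilon \chi_R \, d\sigma_0 = \int_0^T \int_{\mathbb{R}^d} \Big( \chi_R \frac{\partial \varphi^\varepsilon}{\partial t} + \langle v_t, \chi_R \nabla \varphi^\varepsilon + \varphi^\varepsilon \nabla \chi_R \rangle \Big) \, d\sigma_t \, dt \\ &= \int_0^T \int_{\mathbb{R}^d} \chi_R \big( \psi + \langle v_t - w_t^\varepsilon, \nabla \varphi^\varepsilon \rangle \big) \, d\sigma_t \, dt + \int_0^T \int_{\mathbb{R}^d} \varphi^\varepsilon \langle \nabla \chi_R, v_t \rangle \, d\sigma_t \, dt \\ &\ge \int_0^T \int_{\mathbb{R}^d} \chi_R \big( \psi + \langle v_t - w_t^\varepsilon, \nabla \varphi^\varepsilon \rangle \big) \, d\sigma_t \, dt - \int_0^T \int_{\mathbb{R}^d} |\nabla \chi_R| |v_t| \, d|\sigma_t| \, dt. \end{aligned} \]
Letting $\varepsilon \downarrow 0$ and using the uniform bound on $|\nabla \varphi^\varepsilon|$ and the fact that $w_t = v_t$ on $\operatorname{supp} \chi_R \times [0, T]$, we get
\[ \int_0^T \int_{\mathbb{R}^d} \chi_R \psi \, d\sigma_t \, dt \le \int_0^T \int_{\mathbb{R}^d} |\nabla \chi_R| |v_t| \, d|\sigma_t| \, dt \le \frac{2}{R} \int_0^T \int_{R \le |x| \le 2R} |v_t| \, d|\sigma_t| \, dt. \]
Finally, letting $R \to \infty$ we obtain that $\int_0^T \int_{\mathbb{R}^d} \psi \, d\sigma_t \, dt \le 0$. Since $\psi$ is arbitrary the proof is achieved.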
Proposition 2.14 (Representation formula for the continuity equation) Let $\mu_t$, $t \in [0, T]$, be a narrowly continuous family of Borel probability measures solving the continuity equation (2.15) w.r.t. a Borel vector field $v_t$ satisfying (2.21) and (2.16). Then for $\mu_0$-a.e. $x \in \mathbb{R}^d$ the characteristic system (2.22) admits a globally defined solution $X_t(x)$ in $[0, T]$ and
\[ \mu_t = (X_t)_\#\mu_0 \quad \forall\, t \in [0, T]. \tag{2.33} \]
Moreover, if
\[ \int_0^T \int_{\mathbb{R}^d} |v_t(x)|^2 \, d\mu_t(x) \, dt < +\infty \tag{2.34} \]
then the velocity field $v_t$ is the time derivative of $X_t$ in the $L^2$-sense
\[ \lim_{h \downarrow 0} \int_0^{T-h} \int_{\mathbb{R}^d} \Big| \frac{X_{t+h}(x) - X_t(x)}{h} - v_t(X_t(x)) \Big|^2 \, d\mu_0(x) \, dt = 0, \tag{2.35} \]
\[ \lim_{h \to 0} \frac{X_{t+h}(x, t) - x}{h} = v_t(x) \quad \text{in } L^2(\mu_t; \mathbb{R}^d) \text{ for } \mathscr{L}^1\text{-a.e. } t \in (0, T). \tag{2.36} \]
Proof. Let Es = {τ > s}. By Lemma 2.12, t ↦ Xt# (χEs µ0 ) is the solution
of (2.15) in [0, s]. By Proposition 2.13 we get also
\[
(X_t)_\#(\chi_{E_s}\mu_0) \le \mu_t \qquad\text{whenever } 0\le t\le s.
\]
Using the previous inequality with s = t we can estimate:
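\[
\begin{aligned}
\int_{\mathbb{R}^d} \sup_{(0,\tau(x))} |X_t(x) - x|\,d\mu_0(x)
&\le \int_{\mathbb{R}^d}\int_0^{\tau(x)} |\dot X_t(x)|\,dt\,d\mu_0(x)
 = \int_{\mathbb{R}^d}\int_0^{\tau(x)} |v_t(X_t(x))|\,dt\,d\mu_0(x) \\
&= \int_0^T\!\!\int_{E_t} |v_t(X_t(x))|\,d\mu_0(x)\,dt
 \le \int_0^T\!\!\int_{\mathbb{R}^d} |v_t|\,d\mu_t\,dt.
\end{aligned}
\]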
It follows that Xt (x) is bounded on (0, τ (x)) for µ0 -a.e. x ∈ Rd and therefore
Xt (x) is globally defined in [0, T ] for µ0 -a.e. x ∈ Rd . Applying Lemma 2.12 and
Proposition 2.13 we obtain (2.33).
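Now we observe that the differential quotient Dh (x, t) := h−1 (Xt+h (x) − Xt (x))
can be bounded in L2 (µ0 × L 1 ) by
\[
\begin{aligned}
\int_0^{T-h}\!\!\int_{\mathbb{R}^d} \left| \frac{X_{t+h}(x) - X_t(x)}{h} \right|^2 d\mu_0(x)\,dt
&= \int_0^{T-h}\!\!\int_{\mathbb{R}^d} \left| \frac{1}{h}\int_0^h v_{t+s}(X_{t+s}(x))\,ds \right|^2 d\mu_0(x)\,dt \\
&\le \int_0^{T-h}\!\!\int_{\mathbb{R}^d} \frac{1}{h}\int_0^h |v_{t+s}(X_{t+s}(x))|^2\,ds\,d\mu_0(x)\,dt \\
&\le \int_0^T\!\!\int_{\mathbb{R}^d} |v_t(X_t(x))|^2\,d\mu_0(x)\,dt < +\infty.
\end{aligned}
\]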
Since we already know that Dh is pointwise converging to vt ◦ Xt µ0 × L 1 -a.e.
in Rd × (0, T ), we obtain the strong convergence in L2 (µ0 × L 1 ), i.e. (2.35).
Finally, we can consider t ↦ Xt (·) and t ↦ vt (Xt (·)) as maps from (0, T ) to
L2 (µ0 ; Rd ); (2.35) is then equivalent to
\[
\lim_{h\downarrow 0} \int_0^{T-h} \left\| \frac{X_{t+h} - X_t}{h} - v_t(X_t) \right\|^2_{L^2(\mu_0;\mathbb{R}^d)} dt = 0,
\]
and it shows that t ↦ Xt (·) belongs to AC 2 (0, T ; L2 (µ0 ; Rd )). General results
for absolutely continuous maps with values in Hilbert spaces yield that Xt is
differentiable L 1 -a.e. in (0, T ), so that
\[
\lim_{h\to 0} \int_{\mathbb{R}^d} \left| \frac{X_{t+h}(x) - X_t(x)}{h} - v_t(X_t(x)) \right|^2 d\mu_0(x) = 0 \quad\text{for } \mathscr{L}^1\text{-a.e. } t\in(0,T).
\]
Since Xt+h (x) = Xh (Xt (x), t), we obtain (2.36).
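The representation formula (2.33) can be illustrated numerically. The following is only a sketch under assumptions that are not in the text: an atomic initial measure on the real line, the velocity field v_t(x) = −x (for which Xt (x) = x e^{−t} is known in closed form), and a Runge-Kutta integrator; the names `flow` and `push_forward` are hypothetical.

```python
import math

def flow(x0, velocity, T, steps=2000):
    """Integrate dX/dt = velocity(X, t), X(0) = x0, with the classical RK4 scheme."""
    x, h = x0, T / steps
    for k in range(steps):
        t = k * h
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def push_forward(atoms, velocity, T):
    """atoms is a list of (position, mass); returns the atoms of (X_T)_# mu_0."""
    return [(flow(x, velocity, T), m) for x, m in atoms]

def v(x, t):
    # Illustrative assumption: linear contraction towards the origin.
    return -x

mu0 = [(-1.0, 0.25), (0.5, 0.25), (2.0, 0.5)]
mu1 = push_forward(mu0, v, 1.0)
```

Each atom keeps its mass and is moved along its characteristic, so `mu1` should agree with the exact image of `mu0` under x ↦ x e^{−1} up to the integration error.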
Now we state an approximation result for general solutions of (2.15) by
more regular ones, satisfying the conditions of the previous Proposition 2.14.
Lemma 2.15 (Approximation by regular curves) Let µt be a time-continuous
solution of (2.15) w.r.t. a velocity field satisfying the integrability condition
\[
\int_0^T\!\!\int_{\mathbb{R}^d} |v_t(x)|^2\,d\mu_t(x)\,dt < +\infty. \tag{2.37}
\]
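Let (ρε ) ⊂ C ∞ (Rd ) be a family of strictly positive mollifiers in the x variable
(e.g. $\rho_\varepsilon(x) = (2\pi\varepsilon)^{-d/2}\exp(-|x|^2/2\varepsilon)$), and set
\[
\mu_t^\varepsilon := \mu_t * \rho_\varepsilon, \qquad E_t^\varepsilon := (v_t\mu_t) * \rho_\varepsilon, \qquad v_t^\varepsilon := \frac{E_t^\varepsilon}{\mu_t^\varepsilon}. \tag{2.38}
\]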
Then µεt is a continuous solution of (2.15) w.r.t. vtε , which satisfies the local
regularity assumptions (2.21) and the uniform integrability bounds
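\[
\int_{\mathbb{R}^d} |v_t^\varepsilon(x)|^2\,d\mu_t^\varepsilon(x) \le \int_{\mathbb{R}^d} |v_t(x)|^2\,d\mu_t(x) \qquad \forall\, t\in(0,T). \tag{2.39}
\]
Moreover, Etε → vt µt narrowly and
\[
\lim_{\varepsilon\downarrow 0} \|v_t^\varepsilon\|_{L^2(\mu_t^\varepsilon;\mathbb{R}^d)} = \|v_t\|_{L^2(\mu_t;\mathbb{R}^d)} \qquad \forall\, t\in(0,T). \tag{2.40}
\]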
Proof. With a slight abuse of notation, we are denoting the measure µεt and
its density w.r.t. L d by the same symbol. Notice first that |E ε |(t, ·) and its
spatial gradient are uniformly bounded in space by the product of kvt kL1 (µt )
with a constant depending on ε, and the first quantity is integrable in time.
Analogously, |µεt |(t, ·) and its spatial gradient are uniformly bounded in space
by a constant depending on ε. Therefore, as vtε = Etε /µεt , the local regularity
assumption (2.21) is fulfilled if
\[
\inf_{|x|\le R,\ t\in[0,T]} \mu_t^\varepsilon(x) > 0 \qquad\text{for any } \varepsilon>0,\ R>0.
\]
This property is immediate, since µεt are continuous w.r.t. t and equi-continuous
w.r.t. x, and therefore continuous in both variables.
Lemma 2.16 below shows that (2.39) holds. Notice also that µεt solve the
continuity equation
\[
\partial_t\mu_t^\varepsilon + \nabla\cdot(v_t^\varepsilon\mu_t^\varepsilon) = 0 \qquad\text{in } \mathbb{R}^d\times(0,T), \tag{2.41}
\]
because, by construction, ∇·(vtε µεt ) = ∇·((vt µt )∗ρε ) = (∇·(vt µt ))∗ρε . Finally,
general lower semicontinuity results on integral functionals defined on measures
of the form
\[
(E,\mu) \mapsto \int_{\mathbb{R}^d} \left|\frac{E}{\mu}\right|^2 d\mu
\]
(see for instance Theorem 2.34 and Example 2.36 in [7]) provide (2.40).
Lemma 2.16 Let µ ∈ P(Rd ) and let E be a Rm -valued measure in Rd with
finite total variation and absolutely continuous with respect to µ. Then
\[
\int_{\mathbb{R}^d} \left|\frac{E*\rho}{\mu*\rho}\right|^2 \mu*\rho\,dx \le \int_{\mathbb{R}^d} \left|\frac{E}{\mu}\right|^2 d\mu
\]
for any convolution kernel ρ.
Proof. We use Jensen inequality in the following form: if Φ : Rm+1 → [0, +∞]
is convex, l.s.c. and positively 1-homogeneous, then
\[
\Phi\Big(\int_{\mathbb{R}^d} \psi(x)\,d\theta(x)\Big) \le \int_{\mathbb{R}^d} \Phi(\psi(x))\,d\theta(x)
\]
for any Borel map ψ : Rd → Rm+1 and any positive and finite measure θ in Rd
(by rescaling θ to be a probability measure and looking at the image measure
ψ# θ the formula reduces to the standard Jensen inequality). Fix x ∈ Rd and
apply the inequality above with ψ := (E/µ, 1), θ := ρ(x − ·)µ and
\[
\Phi(z,t) :=
\begin{cases}
\dfrac{|z|^2}{t} & \text{if } t>0, \\
0 & \text{if } (z,t)=(0,0), \\
+\infty & \text{if either } t<0 \text{ or } t=0,\ z\ne 0,
\end{cases}
\]
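to obtain
\[
\begin{aligned}
\left|\frac{E*\rho(x)}{\mu*\rho(x)}\right|^2 \mu*\rho(x)
&= \Phi\Big(\int_{\mathbb{R}^d} \frac{E}{\mu}(y)\rho(x-y)\,d\mu(y),\ \int_{\mathbb{R}^d} \rho(x-y)\,d\mu(y)\Big) \\
&\le \int_{\mathbb{R}^d} \Phi\Big(\frac{E}{\mu}(y),1\Big)\rho(x-y)\,d\mu(y) \\
&= \int_{\mathbb{R}^d} \left|\frac{E}{\mu}\right|^2(y)\,\rho(x-y)\,d\mu(y).
\end{aligned}
\]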
An integration with respect to x leads to the desired inequality.
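The inequality of Lemma 2.16 can be checked numerically in a simple case. The sketch below makes assumptions that are not in the text: d = 1, an atomic measure µ with three atoms, E = vµ for a velocity prescribed on the atoms, a Gaussian kernel of variance ε, and a midpoint quadrature on a truncated interval; the helper names are hypothetical.

```python
import math

def gauss(x, eps):
    """Gaussian kernel of variance eps on the real line."""
    return math.exp(-x * x / (2 * eps)) / math.sqrt(2 * math.pi * eps)

def mollified_energy(points, masses, velocities, eps, lo=-12.0, hi=12.0, n=20000):
    """Midpoint-rule value of  int |E*rho|^2 / (mu*rho) dx  for atomic mu and E = v mu."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        dens = sum(m * gauss(x - p, eps) for p, m in zip(points, masses))
        flux = sum(m * v * gauss(x - p, eps)
                   for p, m, v in zip(points, masses, velocities))
        if dens > 0.0:  # the density may underflow to zero far from the atoms
            total += flux * flux / dens * h
    return total

points, masses, velocities = [-1.0, 0.0, 1.5], [0.2, 0.5, 0.3], [1.0, -2.0, 0.5]
# Right hand side of the lemma: int |E/mu|^2 dmu = sum_i m_i v_i^2.
rhs = sum(m * v * v for m, v in zip(masses, velocities))
lhs = {eps: mollified_energy(points, masses, velocities, eps) for eps in (1.0, 0.1, 0.01)}
```

In this example every value in `lhs` should stay below `rhs`, and the gap should shrink as ε decreases, in agreement with (2.39) and (2.40).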
2.7 The tangent bundle to the Wasserstein space
In this section we endow P2 (Rd ) with a kind of differential structure, consistent
with the metric structure introduced in Section 2.3. Our starting point is the
analysis of absolutely continuous curves µt : (a, b) → P2 (Rd ): recall that this
concept depends only on the metric structure of P2 (Rd ), by Definition 2.1. We
show in Theorem 2.17 that this class of curves coincides with (distributional)
solutions of the continuity equation
\[
\frac{\partial}{\partial t}\mu_t + \nabla\cdot(v_t\mu_t) = 0 \qquad\text{in } \mathbb{R}^d\times(a,b).
\]
More precisely, given an absolutely continuous curve µt , one can find a Borel
time-dependent velocity field vt : Rd → Rd such that ‖vt‖L2 (µt ) ≤ |µ′|(t) for
L 1 -a.e. t ∈ (a, b) and the continuity equation holds. Here |µ′|(t) is the metric
derivative of µt , defined in (2.2). Conversely, if µt solves the continuity equation
for some Borel velocity field vt with ∫_a^b ‖vt‖L2 (µt ) dt < +∞, then µt is an
absolutely continuous curve and ‖vt‖L2 (µt ) ≥ |µ′|(t) for L 1 -a.e. t ∈ (a, b).
As a consequence of Theorem 2.17 we see that among all velocity fields vt
which produce the same flow µt , there is a unique optimal one with smallest
L2 (µt ; Rd )-norm, equal to the metric derivative of µt ; we view this optimal field
as the “tangent” vector field to the curve µt . To make this statement more
precise, one can show that the minimality of the L2 norm of vt is characterized
by the property
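\[
v_t \in \overline{\{\nabla\varphi : \varphi\in C_c^\infty(\mathbb{R}^d)\}}^{L^2(\mu_t;\mathbb{R}^d)} \qquad\text{for } \mathscr{L}^1\text{-a.e. } t\in(a,b). \tag{2.42}
\]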
The characterization (2.42) of tangent vectors strongly suggests to consider
the following tangent bundle to P2 (Rd )
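\[
\mathrm{Tan}_\mu\mathscr{P}_2(\mathbb{R}^d) := \overline{\{\nabla\varphi : \varphi\in C_c^\infty(\mathbb{R}^d)\}}^{L^2(\mu;\mathbb{R}^d)} \qquad \forall\,\mu\in\mathscr{P}_2(\mathbb{R}^d), \tag{2.43}
\]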
endowed with the natural L2 metric. Moreover, as a consequence of the
characterization of absolutely continuous curves in P2 (Rd ), we recover the
Benamou–Brenier formula for the Wasserstein distance (see [12], where the formula
was introduced for numerical purposes):
\[
W_2^2(\mu_0,\mu_1) = \min\Big\{ \int_0^1 \|v_t\|^2_{L^2(\mu_t;\mathbb{R}^d)}\,dt : \frac{d}{dt}\mu_t + \nabla\cdot(v_t\mu_t) = 0 \Big\}. \tag{2.44}
\]
Indeed, for any admissible curve we use the inequality between L2 norm of vt
and metric derivative to obtain:
\[
\int_0^1 \|v_t\|^2_{L^2(\mu_t;\mathbb{R}^d)}\,dt \ge \int_0^1 |\mu'|^2(t)\,dt \ge W_2^2(\mu_0,\mu_1).
\]
Conversely, since we know that P2 (Rd ) is a length space, we can use a geodesic
µt and its tangent vector field vt to obtain equality in (2.44). We also show that
optimal transport maps belong to Tanµ P2 (Rd ) under quite general conditions.
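The two inequalities behind (2.44) can be observed on particle systems. The sketch below rests on assumptions that are not in the text: d = 1, uniform empirical measures with the same number of atoms (for which the monotone matching of sorted samples is optimal), and an action computed by finite differences of the particle paths; `w2_squared_1d` and `action` are hypothetical names.

```python
import math

def w2_squared_1d(xs, ys):
    """W_2^2 between uniform empirical measures on the line, via monotone matching."""
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def action(paths, steps=400):
    """Riemann sum of int_0^1 ||v_t||^2_{L^2(mu_t)} dt for the particle paths t -> p(t)."""
    n, h, total = len(paths), 1.0 / steps, 0.0
    for k in range(steps):
        t0, t1 = k * h, (k + 1) * h
        total += h * sum(((p(t1) - p(t0)) / h) ** 2 for p in paths) / n
    return total

xs, ys = sorted([0.0, 1.0, 3.0]), sorted([0.5, 2.0, 2.5])
# Displacement interpolation: each particle travels on a straight line at constant speed.
straight = [lambda t, x=x, y=y: (1 - t) * x + t * y for x, y in zip(xs, ys)]
# Another admissible curve with the same endpoints, obtained by adding a detour.
detour = [lambda t, x=x, y=y: (1 - t) * x + t * y + 0.3 * math.sin(math.pi * t)
          for x, y in zip(xs, ys)]
```

The action of `straight` should reproduce `w2_squared_1d(xs, ys)`, while `detour` joins the same two measures with a strictly larger action.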
In this way we recover in a more general framework the Riemannian interpretation
of the Wasserstein distance developed by Otto in [57] (see also [56], [45]) and
used to study the long time behaviour of the porous medium equation.
In the original paper [57], (2.44) is derived using formally the concept of
Riemannian submersion and the family of maps φ ↦ φ# µ (indexed by µ ≪ L d )
from Arnold’s space of diffeomorphisms into the Wasserstein space. In Otto’s
formalism tangent vectors are rather thought of as s = d/dt µt and these vectors are
identified, via the continuity equation, with −D · (vs µt ). Moreover vs is chosen
to be the gradient of a function ψs , so that D · (∇ψs µt ) = −s. Then the metric
tensor is induced by the identification s ↦ ∇ψs as follows:
\[
\langle s, s'\rangle_{\mu_t} := \int_{\mathbb{R}^d} \langle\nabla\psi_s, \nabla\psi_{s'}\rangle\,d\mu_t.
\]
As noticed in [57], both the identification between tangent vectors and gradients
and the scalar product depend on µt , and these facts lead to a non trivial
geometry of the Wasserstein space. We prefer instead to consider directly vt as
the tangent vectors, allowing them to be not necessarily gradients: this leads to
(2.43).
Another consequence of the characterization of absolutely continuous curves
is a result, given in Proposition 2.22, concerning the infinitesimal behaviour
of the Wasserstein distance along absolutely continuous curves µt : given the
tangent vector field vt to the curve, we show that
\[
\lim_{h\to 0} \frac{W_2(\mu_{t+h}, (\boldsymbol{i} + hv_t)_\#\mu_t)}{|h|} = 0 \qquad\text{for } \mathscr{L}^1\text{-a.e. } t\in(a,b).
\]
Moreover the rescaled optimal transport maps between µt and µt+h converge to
the transport plan (i × vt )# µt associated to vt (see (2.57)). As a consequence,
we will obtain in Theorem 2.23 a key formula for the derivative of the map
t 7→ W22 (µt , ν).
Theorem 2.17 (Absolutely continuous curves and the continuity equation)
Let I be an open interval in R, let µt : I → P2 (Rd ) be an absolutely continuous
curve and let |µ′| ∈ L1 (I) be its metric derivative, given by (2.2). Then there
exists a Borel vector field v : (x, t) ↦ vt (x) such that
\[
v_t\in L^2(\mu_t;\mathbb{R}^d), \qquad \|v_t\|_{L^2(\mu_t;\mathbb{R}^d)} \le |\mu'|(t) \qquad\text{for } \mathscr{L}^1\text{-a.e. } t\in I, \tag{2.45}
\]
and the continuity equation
\[
\partial_t\mu_t + \nabla\cdot(v_t\mu_t) = 0 \qquad\text{in } \mathbb{R}^d\times I \tag{2.46}
\]
holds in the sense of distributions, i.e.
\[
\int_I\int_{\mathbb{R}^d} \big(\partial_t\varphi(x,t) + \langle v_t(x), \nabla_x\varphi(x,t)\rangle\big)\,d\mu_t(x)\,dt = 0 \qquad \forall\,\varphi\in C_c^\infty(\mathbb{R}^d\times I). \tag{2.47}
\]
Moreover, for L 1 -a.e. t ∈ I vt belongs to the closure in L2 (µt , Rd ) of the subspace generated by the gradients ∇ϕ with ϕ ∈ Cc∞ (Rd ).
Conversely, if a narrowly continuous curve µt : I → P2 (Rd ) satisfies the continuity equation for some Borel velocity field vt with kvt kL2 (µt ;Rd ) ∈ L1 (I) then
µt : I → P2 (Rd ) is absolutely continuous and |µ0 |(t) ≤ kvt kL2 (µt ;Rd ) for L 1 -a.e.
t ∈ I.
Proof.
Taking into account that any absolutely continuous curve can be
reparametrized by arc length (see for instance [9]) and Lemma 2.9, we will
assume with no loss of generality that |µ0 | ∈ L∞ (I) in the proof of the first
statement. To fix the ideas, we also assume that I = (0, 1).
First of all we show that for every ϕ ∈ Cc∞ (Rd ) the function t 7→ µt (ϕ)
is absolutely continuous, and its derivative can be estimated with the metric
derivative of µt . Indeed, for s, t ∈ I and µst ∈ Γo (µs , µt ) we have, using the
Hölder inequality,
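\[
|\mu_t(\varphi) - \mu_s(\varphi)| = \left| \int_{\mathbb{R}^d\times\mathbb{R}^d} \varphi(y) - \varphi(x)\,d\mu_{st} \right| \le \mathrm{Lip}(\varphi)\,W_2(\mu_s,\mu_t),
\]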
whence the absolute continuity follows. In order to estimate more precisely the
derivative of µt (ϕ) we introduce the upper semicontinuous and bounded map
\[
H(x,y) :=
\begin{cases}
|\nabla\varphi(x)| & \text{if } x=y, \\[4pt]
\dfrac{|\varphi(x)-\varphi(y)|}{|x-y|} & \text{if } x\ne y,
\end{cases}
\]
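and notice that, setting µh = µ(s+h)s , we have
\[
\frac{|\mu_{s+h}(\varphi) - \mu_s(\varphi)|}{|h|} \le \frac{1}{|h|}\int_{\mathbb{R}^d\times\mathbb{R}^d} |x-y|\,H(x,y)\,d\mu_h \le \frac{W_2(\mu_{s+h},\mu_s)}{|h|}\Big(\int_{\mathbb{R}^d\times\mathbb{R}^d} H^2(x,y)\,d\mu_h\Big)^{1/2}.
\]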
If t is a point where s 7→ µs is metrically differentiable, using the fact that
µh → (x, x)# µt narrowly (because their marginals are narrowly converging,
any limit point belongs to Γo (µt , µt ) and is concentrated on the diagonal of
Rd × Rd ) we obtain
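\[
\limsup_{h\to 0} \frac{|\mu_{t+h}(\varphi) - \mu_t(\varphi)|}{|h|} \le |\mu'|(t)\Big(\int_{\mathbb{R}^d} H^2(x,x)\,d\mu_t\Big)^{1/2} = |\mu'|(t)\,\|\nabla\varphi\|_{L^2(\mu_t;\mathbb{R}^d)}. \tag{2.48}
\]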
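Set Q = Rd × I and let µ = ∫ µt dt ∈ P(Q) be the measure whose disintegration
is {µt }t∈I . For any ϕ ∈ Cc∞ (Q) we have
\[
\begin{aligned}
\int_Q \partial_s\varphi(x,s)\,d\mu(x,s)
&= \lim_{h\downarrow 0} \int_Q \frac{\varphi(x,s) - \varphi(x,s-h)}{h}\,d\mu(x,s) \\
&= \lim_{h\downarrow 0} \int_I \frac{1}{h}\Big(\int_{\mathbb{R}^d} \varphi(x,s)\,d\mu_s(x) - \int_{\mathbb{R}^d} \varphi(x,s)\,d\mu_{s+h}(x)\Big)\,ds.
\end{aligned}
\]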
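Taking into account (2.48), Fatou’s Lemma yields (with p = q = 2 in the present setting)
\[
\begin{aligned}
\left|\int_Q \partial_s\varphi(x,s)\,d\mu(x,s)\right|
&\le \int_J |\mu'|(s)\Big(\int_{\mathbb{R}^d} |\nabla\varphi(x,s)|^q\,d\mu_s(x)\Big)^{1/q} ds \\
&\le \Big(\int_J |\mu'|^p(s)\,ds\Big)^{1/p}\Big(\int_Q |\nabla\varphi(x,s)|^q\,d\mu(x,s)\Big)^{1/q},
\end{aligned} \tag{2.49}
\]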
where J ⊂ I is any interval such that supp ϕ ⊂ J × Rd . If $\overline{V}$ denotes the closure
in L2 (µ; Rd ) of the subspace V := {∇ϕ, ϕ ∈ Cc∞ (Q)}, the previous formula
says that the linear functional L : V → R defined by
\[
L(\nabla\varphi) := -\int_Q \partial_s\varphi(x,s)\,d\mu(x,s)
\]
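can be uniquely extended to a bounded functional on $\overline{V}$. Therefore the minimum
problem
\[
\min\Big\{ \frac12\int_Q |w(x,s)|^2\,d\mu(x,s) - L(w) : w\in\overline{V} \Big\} \tag{2.50}
\]
admits a unique solution w ∈ $\overline{V}$ such that v := jq (w) satisfies
\[
\int_Q \langle v(x,s), \nabla\varphi(x,s)\rangle\,d\mu(x,s) = \langle L, \nabla\varphi\rangle \qquad \forall\,\varphi\in C_c^\infty(Q). \tag{2.51}
\]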
Setting vt (x) = v(x, t) and using the definition of L we obtain (2.47). Moreover,
choosing a sequence (∇ϕn ) ⊂ V converging to w in Lq (µ; Rd ), it is easy to show
that for L 1 -a.e. t ∈ I there exists a subsequence n(i) (possibly depending on t)
such that ∇ϕn(i) (·, t) ∈ Cc∞ (Rd ) converge in L2 (µt ; Rd ) to w(·, t) = jp (v(·, t)).
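Finally, choosing an interval J ⊂ I and η ∈ Cc∞ (J) with 0 ≤ η ≤ 1, (2.51)
and (2.49) yield
\[
\begin{aligned}
\int_Q \eta(s)|v(x,s)|^2\,d\mu(x,s)
&= \int_Q \eta\langle v, w\rangle\,d\mu = \lim_{n\to\infty}\int_Q \eta\langle v, \nabla\varphi_n\rangle\,d\mu = \lim_{n\to\infty} \langle L, \nabla(\eta\varphi_n)\rangle \\
&\le \Big(\int_J |\mu'|^2(s)\,ds\Big)^{1/2} \lim_{n\to\infty}\Big(\int_{\mathbb{R}^d\times J} |\nabla\varphi_n|^2\,d\mu\Big)^{1/2} \\
&= \Big(\int_J |\mu'|^2(s)\,ds\Big)^{1/2}\Big(\int_{\mathbb{R}^d\times J} |w|^2\,d\mu\Big)^{1/2}
 = \Big(\int_J |\mu'|^2(s)\,ds\Big)^{1/2}\Big(\int_{\mathbb{R}^d\times J} |v|^2\,d\mu\Big)^{1/2}.
\end{aligned}
\]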
Taking a sequence of smooth approximations of the characteristic function of J
we obtain
\[
\int_J\int_{\mathbb{R}^d} |v_s(x)|^2\,d\mu_s(x)\,ds \le \int_J |\mu'|^2(s)\,ds, \tag{2.52}
\]
and therefore
\[
\|v_t\|_{L^2(\mu_t;\mathbb{R}^d)} \le |\mu'|(t) \qquad\text{for } \mathscr{L}^1\text{-a.e. } t\in I.
\]
Now we show the converse implication. We apply the regularization Lemma 2.15,
finding approximations µεt , vtε satisfying the continuity equation, the uniform
integrability condition (2.16) and the local regularity assumptions (2.21).
Therefore, we can apply Proposition 2.14, obtaining the representation formula
µεt = (Ttε )# µε0 , where Ttε is the maximal solution of the ODE Ṫtε = vtε (Ttε ) with the
initial condition T0ε = x (see Lemma 2.10).
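Now, taking into account Lemma 2.16, we estimate
\[
\begin{aligned}
\int_{\mathbb{R}^d} |T_{t_2}^\varepsilon(x) - T_{t_1}^\varepsilon(x)|^2\,d\mu_0^\varepsilon
&\le (t_2-t_1)\int_{\mathbb{R}^d}\int_{t_1}^{t_2} |\dot T_t^\varepsilon(x)|^2\,dt\,d\mu_0^\varepsilon \\
&= (t_2-t_1)\int_{t_1}^{t_2}\!\!\int_{\mathbb{R}^d} |v_t^\varepsilon(x)|^2\,d\mu_t^\varepsilon\,dt
 \le (t_2-t_1)\int_{t_1}^{t_2}\!\!\int_{\mathbb{R}^d} |v_t|^2\,d\mu_t\,dt,
\end{aligned} \tag{2.53}
\]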
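therefore the transport plan γ ε := (Ttε1 × Ttε2 )# µε0 ∈ Γ(µεt1 , µεt2 ) satisfies
\[
W_2^2(\mu_{t_1}^\varepsilon, \mu_{t_2}^\varepsilon) \le \int_{\mathbb{R}^{2d}} |x-y|^2\,d\gamma^\varepsilon \le (t_2-t_1)\int_{t_1}^{t_2}\!\!\int_{\mathbb{R}^d} |v_t|^2\,d\mu_t\,dt.
\]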
Since for every t ∈ I µεt converges narrowly to µt as ε → 0, a compactness
argument (see Lemma 5.2.2 or Proposition 7.1.3 of [8]) gives
\[
W_2^2(\mu_{t_1}, \mu_{t_2}) \le \int_{\mathbb{R}^{2d}} |x-y|^2\,d\gamma \le (t_2-t_1)\int_{t_1}^{t_2}\!\!\int_{\mathbb{R}^d} |v_t|^2\,d\mu_t\,dt
\]
for some optimal transport plan γ between µt1 and µt2 . Since t1 and t2 are
arbitrary this implies that µt is absolutely continuous and that its metric derivative
is less than ‖vt‖L2 (µt ;Rd ) for L 1 -a.e. t ∈ I.
Notice that the continuity equation (2.46) involves only the action of vt on
∇ϕ with ϕ ∈ Cc∞ (Rd ). Moreover, Theorem 2.17 shows that the minimal norm
among all possible velocity fields vt is the metric derivative and that vt belongs
to the L2 closure of gradients of functions in Cc∞ (Rd ). These facts suggest
a “canonical” choice of vt and the following definition of tangent bundle to
P2 (Rd ).
Definition 2.18 (Tangent bundle) Let µ ∈ P2 (Rd ). We define
\[
\mathrm{Tan}_\mu\mathscr{P}_2(\mathbb{R}^d) := \overline{\{\nabla\varphi : \varphi\in C_c^\infty(\mathbb{R}^d)\}}^{L^2(\mu;\mathbb{R}^d)}.
\]
This definition is motivated by the following variational selection principle:
Lemma 2.19 (Variational selection of the tangent vectors) A vector v ∈
L2 (µ; Rd ) belongs to the tangent space Tanµ P2 (Rd ) iff
\[
\|v+w\|_{L^2(\mu;\mathbb{R}^d)} \ge \|v\|_{L^2(\mu;\mathbb{R}^d)} \qquad \forall\, w\in L^2(\mu;\mathbb{R}^d) \text{ such that } \nabla\cdot(w\mu)=0. \tag{2.54}
\]
In particular, for every v ∈ L2 (µ; Rd ), denoting by Π(v) its orthogonal projection
on Tanµ P2 (Rd ), we have ∇ · ((v − Π(v))µ) = 0.
Proof. By the convexity of the L2 norm, (2.54) holds iff
\[
\int_{\mathbb{R}^d} \langle v, w\rangle\,d\mu = 0 \qquad\text{for any } w\in L^2(\mu;\mathbb{R}^d) \text{ such that } \nabla\cdot(w\mu)=0 \tag{2.55}
\]
and, by the Hahn-Banach theorem, this is true iff v belongs to the L2 closure
of {∇φ : φ ∈ Cc∞ (Rd )}. Therefore (2.54) holds iff v belongs to Tanµ P2 (Rd ).
The remarks above lead also to the following characterization of divergence-free
vector fields (we skip the elementary proof of this statement):
Proposition 2.20 Let w ∈ L2 (µ; Rd ). Then ∇ · (wµ) = 0 iff
\[
\|v-w\|_{L^2(\mu;\mathbb{R}^d)} \ge \|v\|_{L^2(\mu;\mathbb{R}^d)} \qquad \forall\, v\in \mathrm{Tan}_\mu\mathscr{P}_2(\mathbb{R}^d).
\]
Moreover equality holds for some v iff w = 0.
By the characterization (2.55) of Tanµ P2 (Rd ) we obtain also
\[
\mathrm{Tan}_\mu^\perp\mathscr{P}_2(\mathbb{R}^d) = \{ v\in L^2(\mu;\mathbb{R}^d) : \nabla\cdot(v\mu) = 0 \}. \tag{2.56}
\]
The following two propositions show that the notion of tangent space is
consistent with the metric structure, with the continuity equation, and with
optimal transport maps (if any).
Proposition 2.21 (Tangent vector to a.c. curves) Let µt : I → P2 (Rd )
be an absolutely continuous curve and let vt ∈ L2 (µt ; Rd ) be such that (2.46)
holds. Then vt satisfies (2.45) as well if and only if vt = Π(vt ) ∈ Tanµt P2 (Rd )
for L 1 -a.e. t ∈ I. The vector vt is uniquely determined L 1 -a.e. in I by (2.45)
and (2.46).
Proof. The uniqueness of vt is a straightforward consequence of the linearity
with respect to the velocity field of the continuity equation and of the strict
convexity of the L2 norm.
In the proof of Theorem 2.17 we built vector fields vt ∈ Tanµt P2 (Rd ) satisfying (2.45) and (2.46). By uniqueness, it follows that conditions (2.45) and
(2.46) imply vt ∈ Tanµt P2 (Rd ) for L 1 -a.e. t.
In the following proposition we recover the tangent vector field to a curve
(µt ) ⊂ P2a (Rd ) through the infinitesimal behaviour of optimal transport maps
along the curve. See Proposition 8.4.6 of [8] for a more general result in the case
of curves (µt ) ⊂ P2 (Rd ).
Proposition 2.22 (Optimal plans along a.c. curves) Let µt : I → P2a (Rd )
be an absolutely continuous curve and let vt ∈ Tanµt P2 (Rd ) be characterized by
Proposition 2.21. Then, for L 1 -a.e. t ∈ I the following properties hold:
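\[
\lim_{h\to 0} \frac{1}{h}\big(\boldsymbol{t}_{\mu_t}^{\mu_{t+h}} - \boldsymbol{i}\big) = v_t \qquad\text{in } L^2(\mu_t;\mathbb{R}^d), \tag{2.57}
\]
where $\boldsymbol{t}_{\mu_t}^{\mu_{t+h}}$ is the unique optimal transport map between µt and µt+h , and
\[
\lim_{h\to 0} \frac{W_2(\mu_{t+h}, (\boldsymbol{i}+hv_t)_\#\mu_t)}{|h|} = 0. \tag{2.58}
\]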
Proof. Let D ⊂ Cc∞ (Rd ) be a countable set with the following property: for
any integer R > 0 and any ϕ ∈ Cc∞ (Rd ) with supp ϕ ⊂ BR there exist (ϕn ) ⊂ D
with supp ϕn ⊂ BR and ϕn → ϕ in C 1 (Rd ).
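We fix t ∈ I such that W2 (µt+h , µt )/|h| → |µ′|(t) = ‖vt‖L2 (µt ) and
\[
\lim_{h\to 0} \frac{\mu_{t+h}(\varphi) - \mu_t(\varphi)}{h} = \int_{\mathbb{R}^d} \langle\nabla\varphi, v_t\rangle\,d\mu_t \qquad \forall\,\varphi\in D. \tag{2.59}
\]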
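Since D is countable, the metric differentiation theorem implies that both
conditions are fulfilled for L 1 -a.e. t ∈ I. Set
\[
s_h := \frac{\boldsymbol{t}_{\mu_t}^{\mu_{t+h}} - \boldsymbol{i}}{h}
\]
and fix ϕ ∈ D and a weak limit point s0 of sh as h → 0. We use the identity
\[
\begin{aligned}
\frac{\mu_{t+h}(\varphi) - \mu_t(\varphi)}{h}
&= \frac{1}{h}\int_{\mathbb{R}^d} \varphi\big(\boldsymbol{t}_{\mu_t}^{\mu_{t+h}}(x)\big) - \varphi(x)\,d\mu_t \\
&= \frac{1}{h}\int_{\mathbb{R}^d} \varphi(x + hs_h(x)) - \varphi(x)\,d\mu_t
 = \int_{\mathbb{R}^d} \langle\nabla\varphi(x), s_h(x)\rangle + \omega_x(h)\,d\mu_t
\end{aligned}
\]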
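with ωx (h) bounded and infinitesimal as h → 0, to obtain
\[
\int_{\mathbb{R}^d} \langle\nabla\varphi, v_t\rangle\,d\mu_t = \int_{\mathbb{R}^d} \langle\nabla\varphi, s_0\rangle\,d\mu_t(x).
\]
It follows that
\[
\nabla\cdot((s_0 - v_t)\mu_t) = 0. \tag{2.60}
\]
We now claim that
\[
\int_{\mathbb{R}^d} |s_0|^2\,d\mu_t(x) \le [|\mu'|(t)]^2. \tag{2.61}
\]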
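Indeed
\[
\begin{aligned}
\int_{\mathbb{R}^d} |s_0|^2\,d\mu_t(x)
&\le \liminf_{h\to 0} \int_{\mathbb{R}^d} |s_h|^2\,d\mu_t
 = \liminf_{h\to 0} \frac{1}{h^2}\int_{\mathbb{R}^d} |\boldsymbol{t}_{\mu_t}^{\mu_{t+h}}(x) - x|^2\,d\mu_t \\
&= \liminf_{h\to 0} \frac{W_2^2(\mu_{t+h},\mu_t)}{h^2} = |\mu'|^2(t).
\end{aligned}
\]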
From (2.61) we obtain that ‖s0‖L2 (µt ;Rd ) ≤ |µ′|(t) = ‖vt‖L2 (µt ;Rd ) . Therefore
Proposition 2.20 entails that s0 = vt . Moreover, the first inequality above is
strict if sh converge weakly, but not strongly, to s0 . Therefore (2.57) holds.
Now we show (2.58). By (2.10) we can estimate the distance between µt+h
and (i + hvt )# µt with $\|\boldsymbol{i} + hv_t - \boldsymbol{t}_{\mu_t}^{\mu_{t+h}}\|_{L^2(\mu_t;\mathbb{R}^d)}$, and because of (2.57) this norm
tends to 0 faster than h.
As an application of (2.58) we are now able to show the L 1 -a.e. differentiability
of t ↦ W2 (µt , σ) along absolutely continuous curves µt , with
µt ∈ P2a (Rd ).
Theorem 2.23 (Generic differentiability of W2 (µt , σ)) Let µt : I → P2a (Rd )
be an absolutely continuous curve, let σ ∈ P2a (Rd ) and let vt ∈ Tanµt P2 (Rd )
be its tangent vector field, characterized by Proposition 2.21. Then
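\[
\frac{d}{dt} W_2^2(\mu_t,\sigma) = 2\int_{\mathbb{R}^d} \langle x - \boldsymbol{t}_{\mu_t}^{\sigma}(x), v_t(x)\rangle\,d\mu_t(x) \qquad\text{for } \mathscr{L}^1\text{-a.e. } t\in I. \tag{2.62}
\]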
Proof. We show that the stated property is true at any t̄ where (2.58) holds
and the derivative of t ↦ W2 (µt , σ) exists (recall that this map is absolutely
continuous). Due to (2.58), we know that the limit
\[
L := \lim_{h\to 0} \frac{W_2^2((\boldsymbol{i}+hv_{\bar t})_\#\mu_{\bar t}, \sigma) - W_2^2(\mu_{\bar t},\sigma)}{h}
\]
exists and coincides with d/dt W22 (µt , σ) evaluated at t = t̄, and we have to show
that it is equal to the right hand side in (2.62).
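Using the transport $\boldsymbol{s} := (\boldsymbol{i} + hv_{\bar t})\circ\boldsymbol{t}_{\sigma}^{\mu_{\bar t}}$ to estimate from above
W2 ((i + hvt̄ )# µt̄ , σ), we get
\[
\begin{aligned}
W_2^2((\boldsymbol{i}+hv_{\bar t})_\#\mu_{\bar t}, \sigma)
&\le \int_{\mathbb{R}^d} |(\boldsymbol{i}+hv_{\bar t})\circ\boldsymbol{t}_{\sigma}^{\mu_{\bar t}} - \boldsymbol{i}|^2\,d\sigma
 = \int_{\mathbb{R}^d} |\boldsymbol{i} + hv_{\bar t} - \boldsymbol{t}_{\mu_{\bar t}}^{\sigma}|^2\,d\mu_{\bar t} \\
&= W_2^2(\mu_{\bar t},\sigma) + 2h\int_{\mathbb{R}^d} \langle \boldsymbol{i} - \boldsymbol{t}_{\mu_{\bar t}}^{\sigma}, v_{\bar t}\rangle\,d\mu_{\bar t} + o(h).
\end{aligned}
\]
Subtracting W22 (µt̄ , σ), dividing both sides by h and taking limits as h ↓ 0 or h ↑ 0 we obtain
\[
L \le 2\int_{\mathbb{R}^d} \langle x - \boldsymbol{t}_{\mu_{\bar t}}^{\sigma}(x), v_{\bar t}(x)\rangle\,d\mu_{\bar t}(x) \le L.
\]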
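Formula (2.62) can be compared with a finite difference quotient in a simple case. The sketch below uses assumptions that are not in the text: d = 1 and atomic measures with equally weighted atoms, so the absolute continuity hypothesis of Theorem 2.23 is not satisfied and the monotone matching of sorted atoms plays the role of the optimal map; all names and numerical values are hypothetical.

```python
def w2_squared_1d(xs, ys):
    """W_2^2 between uniform empirical measures on the line, via monotone matching."""
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

X0 = [0.0, 1.0, 2.5]      # atoms of mu_0
V = [1.0, -0.5, 0.3]      # constant velocity carried by each atom
SIGMA = [0.2, 3.0, 1.1]   # atoms of the fixed measure sigma

def curve(t):
    """Atoms of mu_t: each particle moves on a straight line."""
    return [x + t * v for x, v in zip(X0, V)]

def derivative_formula(t):
    """Right hand side of (2.62), with the monotone matching as transport map."""
    xs = curve(t)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    targets = sorted(SIGMA)
    return 2.0 * sum((xs[i] - targets[rank]) * V[i]
                     for rank, i in enumerate(order)) / len(xs)

def finite_difference(t, h=1e-5):
    """Centered difference quotient of t -> W_2^2(mu_t, sigma)."""
    return (w2_squared_1d(curve(t + h), SIGMA)
            - w2_squared_1d(curve(t - h), SIGMA)) / (2 * h)
```

As long as the atoms of µt do not collide, `derivative_formula(t)` and `finite_difference(t)` should agree up to rounding, for instance at t = 0.1.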
The argument in the previous proof leads to the so-called superdifferentiability property of the Wasserstein distance, a theme used in many papers on this
subject (see in particular [50] and Chapter 10 of [8]). Finally, we compare the
tangent space arising from the closure of gradients of smooth compactly supported
functions with the tangent space built using optimal maps. Proposition
2.22 suggests indeed another possible definition of tangent cone to a measure
µ ∈ P2a (Rd ): we define
\[
\mathrm{Tan}_\mu^r\mathscr{P}_2(\mathbb{R}^d) := \overline{\{\lambda(\boldsymbol{t}_\mu^\nu - \boldsymbol{i}) : \nu\in\mathscr{P}_2(\mathbb{R}^d),\ \lambda>0\}}^{L^2(\mu;\mathbb{R}^d)}. \tag{2.63}
\]
As a matter of fact, the two concepts coincide (see also §8.5 of [8] for a more
general statement).
Theorem 2.24 For any µ ∈ P2a (Rd ) we have Tanµ P2 (Rd ) = Tanrµ P2 (Rd ).
Proof. We show first that optimal transport maps $\boldsymbol{t} = \boldsymbol{t}_\mu^\sigma$ belong to Tanµ P2 (Rd ).
Assume that supp σ is contained in $\overline{B}_R(0)$ for some R > 0. We know that we
can represent t = ∇ϕ, where ϕ is a Lipschitz convex function. We consider
now the mollified functions ϕε . A truncation argument enabling an approximation by gradients with compact support gives that ∇ϕε belong to Tanµ P2 (Rd ).
Due to the absolute continuity of µ it is immediate to check using the dominated convergence theorem that ∇ϕε converge to ∇ϕ in L2 (µ; Rd ), therefore
∇ϕ ∈ Tanµ P2 (Rd ) as well. In the case when the support of σ is not bounded
we approximate σ in P2 (Rd ) by measures with compact support (details are
worked out in Lemma 8.5.3 of [8]).
Now we show the opposite inclusion: if ϕ ∈ Cc∞ (Rd ) it is always possible
to choose λ > 0 such that x ↦ ½|x|² + λ⁻¹ϕ(x) is convex. Therefore r :=
i + λ⁻¹∇ϕ is the optimal map between µ and ν := r # µ; by (2.63) we obtain
that ∇ϕ = λ(r − i) belongs to Tanrµ P2 (Rd ).
3 Convex functionals in P2 (Rd )
The importance of geodesically convex functionals in Wasserstein spaces was
first pointed out by McCann [51], who introduced the three basic examples
we will discuss in detail in Examples 3.4, 3.6, 3.8. His original motivation was to
prove the uniqueness of the minimizer of an energy functional which results from the
sum of the above three contributions.
Applications of this idea have been given to (im)prove many deep functional
(Brunn-Minkowski, Gaussian, (logarithmic) Sobolev, Isoperimetric, etc.) inequalities: we refer to Villani’s book [65, Chap. 6] (see also the survey [37])
for a detailed account on this topic. Connections with evolution equations have
also been exploited [53, 57, 58, 1, 21], mainly to study the asymptotic decay of
the solution to the equilibrium.
From our point of view, convexity is a crucial tool to study the well posedness
and the basic regularity properties of gradient flows. Thus in this section we
discuss the basic notions and properties related to this concept: the first part
of Section 3.1 is devoted to fixing the notion of convexity along geodesics in
P2 (Rd ).
Section 3.2 discusses in great generality the main examples of geodesically
convex functionals: potential, interaction and internal energy. We consider
also the convexity properties of the map µ 7→ −W22 (µ, ν) and its geometric
implications.
In the last section we take a closer look at the convexity properties of general
Relative Entropy functionals, showing that they are strictly related to the
log-concavity of the reference measures.
3.1 λ-geodesically convex functionals in Pp (X)
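In McCann’s approach, a functional φ : P2a (Rd ) → (−∞, +∞] is displacement
convex if
\[
\begin{gathered}
\text{setting } \mu_t^{1\to2} := \big(\boldsymbol{i} + t(\boldsymbol{t} - \boldsymbol{i})\big)_\#\mu^1, \text{ with } \boldsymbol{t} = \boldsymbol{t}_{\mu^1}^{\mu^2}, \\
\text{the map } t\in[0,1] \mapsto \phi(\mu_t^{1\to2}) \text{ is convex}, \quad \forall\,\mu^1,\mu^2\in\mathscr{P}_2^a(\mathbb{R}^d).
\end{gathered} \tag{3.1}
\]
We have seen that the curve $\mu_t^{1\to2}$ is the unique constant speed geodesic connecting
µ1 to µ2 ; therefore the following definition seems natural, when we consider
functionals whose domain contains general probability measures.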
Definition 3.1 (λ-convexity along geodesics) Let φ : P2 (Rd ) → (−∞, +∞].
Given λ ∈ R, we say that φ is λ-geodesically convex in P2 (Rd ) if for every couple µ1 , µ2 ∈ P2 (Rd ) there exists µ ∈ Γo (µ1 , µ2 ) such that
\[
\phi(\mu_t^{1\to2}) \le (1-t)\phi(\mu^1) + t\phi(\mu^2) - \frac{\lambda}{2}t(1-t)W_2^2(\mu^1,\mu^2) \qquad \forall\, t\in[0,1], \tag{3.2}
\]
where $\mu_t^{1\to2} = \big((1-t)\pi^1 + t\pi^2\big)_\#\boldsymbol{\mu}$, π 1 , π 2 being the projections onto the first
and the second coordinate in Rd × Rd , respectively.
Remark 3.2 (The map t ↦ φ(µ_t^{1→2}) is λ-convex) The standard definition
of λ-convexity for a map ϕ : Rn → R requires
\[
\varphi(tx+(1-t)y) \le t\varphi(x) + (1-t)\varphi(y) - \frac{\lambda}{2}t(1-t)|x-y|^2 \qquad \forall\, t\in[0,1],\ x,y\in\mathbb{R}^n \tag{3.3}
\]
(equivalently, if ϕ is continuous, one might ask that D2 ϕ ≥ λI in the sense
of distributions). The definition of λ-convexity expressed through (3.2) implies
that
\[
\text{the map } t\in[0,1] \mapsto \phi(\mu_t^{1\to2}) \text{ is } \lambda\text{-convex}, \tag{3.4}
\]
thus recovering an (apparently) stronger and more traditional form. This
equivalence follows easily from the fact that for t1 < t2 in [0, 1] with {t1 , t2 } ≠ {0, 1}
the plan $(\pi_{t_1}^{1\to2}\times\pi_{t_2}^{1\to2})_\#\boldsymbol{\mu}$ is the unique element of $\Gamma_o(\mu_{t_1}^{1\to2}, \mu_{t_2}^{1\to2})$.
Let us discuss now the convexity properties of the squared Wasserstein distance.
In the 1-dimensional case it can be easily shown (see Theorem 6.0.2 of [8])
that P2 (R) is isometrically isomorphic to a closed convex subset of a Hilbert
space: precisely the space of nondecreasing functions in (0, 1) (the inverses of
distribution functions), viewed as a subset of L2 (0, 1). Thus the 2-Wasserstein
distance in R satisfies the generalized parallelogram rule
\[
W_2^2(\mu^1, \mu_t^{2\to3}) = (1-t)W_2^2(\mu^1,\mu^2) + tW_2^2(\mu^1,\mu^3) - t(1-t)W_2^2(\mu^2,\mu^3) \qquad \forall\, t\in[0,1],\ \mu^1,\mu^2,\mu^3\in\mathscr{P}_2(\mathbb{R}^1). \tag{3.5}
\]
On the other hand, if the ambient space has dimension ≥ 2 the following example
shows that there is no constant λ such that W22 (·, µ1 ) is λ-convex along geodesics.
Example 3.3 (The squared distance function is not λ-convex along geodesics)
Let d = 2 and
\[
\mu^2 := \frac12\big(\delta_{(0,0)} + \delta_{(2,1)}\big), \qquad \mu^3 := \frac12\big(\delta_{(0,0)} + \delta_{(-2,1)}\big).
\]
It is easy to check that the unique optimal map r pushing µ2 to µ3 maps (0, 0)
to (−2, 1) and (2, 1) to (0, 0), therefore there is a unique constant speed geodesic
joining the two measures, given by
\[
\mu_t^{2\to3} := \frac12\big(\delta_{(-2t,t)} + \delta_{(2-2t,1-t)}\big), \qquad t\in[0,1].
\]
Choosing $\mu^1 := \frac12\big(\delta_{(0,0)} + \delta_{(0,-2)}\big)$, there are two maps r t , st pushing µ1 to
$\mu_t^{2\to3}$, given by
\[
\boldsymbol{r}_t(0,0) = (-2t,t), \quad \boldsymbol{r}_t(0,-2) = (2-2t,1-t), \qquad \boldsymbol{s}_t(0,0) = (2-2t,1-t), \quad \boldsymbol{s}_t(0,-2) = (-2t,t).
\]
Therefore
\[
W_2^2(\mu_t^{2\to3}, \mu^1) = \min\Big\{ 5t^2 - 7t + \frac{13}{2},\ 5t^2 - 3t + \frac{9}{2} \Big\}
\]
has a concave cusp at t = 1/2 and therefore is not λ-convex along the geodesic
$\mu_t^{2\to3}$ for any λ ∈ R.
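The computation of Example 3.3 can be reproduced by brute force. The sketch below relies on a fact that is not stated in the text: for two uniform measures with the same number of atoms an optimal plan can be chosen among the permutations of the atoms, so W2² is a minimum over finitely many matchings; the function names are hypothetical.

```python
from itertools import permutations

def w2_squared(atoms_a, atoms_b):
    """W_2^2 between uniform measures on equally many planar atoms (all matchings)."""
    n = len(atoms_a)
    return min(
        sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 for a, b in zip(atoms_a, perm)) / n
        for perm in permutations(atoms_b)
    )

MU1 = [(0.0, 0.0), (0.0, -2.0)]

def geodesic(t):
    """Atoms of the geodesic from mu^2 to mu^3 given in the example."""
    return [(-2 * t, t), (2 - 2 * t, 1 - t)]

def f(t):
    return w2_squared(MU1, geodesic(t))

def closed_form(t):
    return min(5 * t * t - 7 * t + 6.5, 5 * t * t - 3 * t + 4.5)

def second_difference(h):
    """Second difference quotient of f at t = 1/2."""
    return (f(0.5 + h) - 2 * f(0.5) + f(0.5 - h)) / (h * h)
```

Here `f` should coincide with `closed_form`, and `second_difference(h)` behaves like 10 − 4/h, so it is unbounded from below as h ↓ 0: this is the concave cusp that rules out λ-convexity.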
3.2 Examples of convex functionals in P2 (Rd )
In this section we introduce the main classes of geodesically convex functionals.
Example 3.4 (Potential energy) Let V : Rd → (−∞, +∞] be a proper,
lower semicontinuous function whose negative part has a quadratic growth, i.e.
\[
V(x) \ge -A - B|x|^2 \quad \forall\, x\in\mathbb{R}^d \qquad\text{for some } A, B\in\mathbb{R}_+. \tag{3.6}
\]
In P2 (Rd ) we define
\[
\mathcal{V}(\mu) := \int_{\mathbb{R}^d} V(x)\,d\mu(x). \tag{3.7}
\]
Evaluating 𝒱 on Dirac masses we check that 𝒱 is proper; since V − has at
most quadratic growth it is not hard to show that 𝒱 is lower semicontinuous in
P2 (Rd ). If V is bounded from below we have even lower semicontinuity w.r.t.
narrow convergence.
The following simple proposition shows that 𝒱 is convex along all interpolating
curves induced by admissible plans; choosing optimal plans one obtains
in particular that 𝒱 is convex along geodesics.
Proposition 3.5 (Convexity of 𝒱) If V is λ-convex then for every µ1 , µ2 ∈
D(𝒱) and µ ∈ Γ(µ1 , µ2 ) we have
\[
\mathcal{V}(\mu_t^{1\to2}) \le (1-t)\mathcal{V}(\mu^1) + t\mathcal{V}(\mu^2) - \frac{\lambda}{2}t(1-t)\int_{\mathbb{R}^d\times\mathbb{R}^d} |x_1-x_2|^2\,d\boldsymbol{\mu}(x_1,x_2). \tag{3.8}
\]
In particular 𝒱 is λ-convex along geodesics.
Proof. Since V is bounded from below either by a continuous affine functional
(if λ ≥ 0) or by a quadratic function (if λ < 0) its negative part satisfies (3.6);
therefore the definition (3.7) makes sense.
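Integrating (3.3) along any admissible transport plan µ ∈ Γ(µ1 , µ2 ) with
µ1 , µ2 ∈ D(𝒱) we obtain (3.8), since
\[
\begin{aligned}
\mathcal{V}(\mu_t^{1\to2})
&= \int_{\mathbb{R}^d\times\mathbb{R}^d} V((1-t)x_1 + tx_2)\,d\boldsymbol{\mu}(x_1,x_2) \\
&\le \int_{\mathbb{R}^d\times\mathbb{R}^d} (1-t)V(x_1) + tV(x_2) - \frac{\lambda}{2}t(1-t)|x_1-x_2|^2\,d\boldsymbol{\mu}(x_1,x_2) \\
&= (1-t)\mathcal{V}(\mu^1) + t\mathcal{V}(\mu^2) - \frac{\lambda}{2}t(1-t)\int_{\mathbb{R}^d\times\mathbb{R}^d} |x_1-x_2|^2\,d\boldsymbol{\mu}(x_1,x_2).
\end{aligned}
\]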
Since V(δx ) = V (x), it is easy to check that the conditions on V are also
necessary for the validity of the previous proposition.
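Inequality (3.8) is easy to test on discrete plans. The sketch below uses assumptions that are not in the text: d = 1, the potential V(x) = x² + x⁴ (which is λ-convex with λ = 2, since V(x) − x² is convex), and a randomly generated plan with finitely many atoms that is admissible but in general not optimal.

```python
import random

LAMBDA = 2.0

def V(x):
    # Illustrative potential: V(x) - (LAMBDA / 2) x^2 = x^4 is convex.
    return x ** 2 + x ** 4

def potential_energy(atoms):
    """Integral of V against the atomic measure given as a list of (position, mass)."""
    return sum(m * V(x) for x, m in atoms)

random.seed(1)
n = 30
weights = [random.random() for _ in range(n)]
total = sum(weights)
# A plan with atoms (x1, x2) of mass m; its marginals are mu^1 and mu^2.
plan = [(random.uniform(-2, 2), random.uniform(-2, 2), w / total) for w in weights]

def gap(t):
    """Right hand side minus left hand side of (3.8) along the plan."""
    mu1 = [(x1, m) for x1, _, m in plan]
    mu2 = [(x2, m) for _, x2, m in plan]
    interp = [((1 - t) * x1 + t * x2, m) for x1, x2, m in plan]
    cost = sum(m * (x1 - x2) ** 2 for x1, x2, m in plan)
    return ((1 - t) * potential_energy(mu1) + t * potential_energy(mu2)
            - 0.5 * LAMBDA * t * (1 - t) * cost - potential_energy(interp))
```

The quantity `gap(t)` should be nonnegative on [0, 1] and vanish at the endpoints, whichever admissible plan is used.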
Example 3.6 (Interaction energy) Let us fix an integer k > 1 and let us
consider a lower semicontinuous function W : Rkd → (−∞, +∞], whose negative part satisfies the usual quadratic growth condition. Denoting by µ×k the
measure µ × µ × · · · × µ on Rkd , we set
\[
\mathcal{W}_k(\mu) := \int_{\mathbb{R}^{kd}} W(x_1,x_2,\ldots,x_k)\,d\mu^{\times k}(x_1,x_2,\ldots,x_k). \tag{3.9}
\]
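If
\[
\exists\, x\in\mathbb{R}^d : \quad W(x,x,\ldots,x) < +\infty, \tag{3.10}
\]
then 𝒲k is proper; its lower semicontinuity follows from the fact that
\[
\mu_n\to\mu \ \text{ in } \mathscr{P}_2(\mathbb{R}^d) \quad\Longrightarrow\quad \mu_n^{\times k}\to\mu^{\times k} \ \text{ in } \mathscr{P}_2(\mathbb{R}^{kd}). \tag{3.11}
\]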
Here the typical example is k = 2 and W (x1 , x2 ) := W̃ (x1 − x2 ) for some
W̃ : Rd → (−∞, +∞] with W̃ (0) < +∞.
Proposition 3.7 (Convexity of 𝒲) If W is convex then the functional 𝒲k
is convex along any interpolating curve $\mu_t^{1\to2}$, µ ∈ Γ(µ1 , µ2 ), in P2 (Rd ).
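Proof. Observe that 𝒲k is the restriction to the subset
\[
\mathscr{P}_2^\times(\mathbb{R}^{kd}) := \{ \mu^{\times k} : \mu\in\mathscr{P}_2(\mathbb{R}^d) \}
\]
of the potential energy functional 𝒲 on P2 (Rkd ) given by
\[
\mathcal{W}(\boldsymbol{\mu}) := \int_{\mathbb{R}^{kd}} W(x_1,\cdots,x_k)\,d\boldsymbol{\mu}(x_1,\ldots,x_k).
\]
We consider the linear permutation of coordinates P : (R2d )k → (Rkd )2 defined
by
\[
P\big((x_1,y_1),(x_2,y_2),\ldots,(x_k,y_k)\big) := \big((x_1,\ldots,x_k),(y_1,\ldots,y_k)\big).
\]
If µ ∈ Γ(µ1 , µ2 ) then it is easy to check that
$P_\#\boldsymbol{\mu}^{\times k} \in \Gamma((\mu^1)^{\times k}, (\mu^2)^{\times k}) \subset \mathscr{P}((\mathbb{R}^{kd})^2)$
and
\[
(\pi_t^{1\to2})_\# P_\#(\boldsymbol{\mu}^{\times k}) = \big((\pi_t^{1\to2})_\#\boldsymbol{\mu}\big)^{\times k}.
\]
Therefore all the convexity properties for 𝒲k follow from the corresponding ones
of 𝒲.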
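Example 3.8 (Internal energy) Let F : [0, +∞) → (−∞, +∞] be a proper,
lower semicontinuous convex function such that
\[
F(0) = 0, \qquad \liminf_{s\downarrow 0} \frac{F(s)}{s^\alpha} > -\infty \quad\text{for some } \alpha > \frac{d}{d+2}. \tag{3.12}
\]
We consider the functional ℱ : P2 (Rd ) → (−∞, +∞] defined by
\[
\mathcal{F}(\mu) :=
\begin{cases}
\displaystyle\int_{\mathbb{R}^d} F(\rho(x))\,d\mathscr{L}^d(x) & \text{if } \mu = \rho\cdot\mathscr{L}^d\in\mathscr{P}_2^a(\mathbb{R}^d), \\
+\infty & \text{otherwise.}
\end{cases} \tag{3.13}
\]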
Remark 3.9 (The meaning of condition (3.12)) Condition (3.12) simply
guarantees that the negative part of F (ρ) is integrable in Rd . Indeed, let us observe
that there exist nonnegative constants c1 , c2 such that the negative part of F
satisfies
\[
F^-(s) \le c_1 s + c_2 s^\alpha \qquad \forall\, s\in[0,+\infty),
\]
and it is not restrictive to suppose α < 1. Since µ = ρ L d ∈ P2 (Rd ) and
2α/(1 − α) > d we have
\[
\begin{aligned}
\int_{\mathbb{R}^d} \rho^\alpha(x)\,d\mathscr{L}^d(x)
&= \int_{\mathbb{R}^d} \rho^\alpha(x)(1+|x|)^{2\alpha}(1+|x|)^{-2\alpha}\,d\mathscr{L}^d(x) \\
&\le \Big(\int_{\mathbb{R}^d} \rho(x)(1+|x|)^2\,d\mathscr{L}^d(x)\Big)^{\alpha}\Big(\int_{\mathbb{R}^d} (1+|x|)^{-2\alpha/(1-\alpha)}\,d\mathscr{L}^d(x)\Big)^{1-\alpha} < +\infty
\end{aligned}
\]
and therefore F − (ρ) ∈ L1 (Rd ).
Remark 3.10 (Lower semicontinuity of F) General results on integral functionals (see for instance [7]) show that F is narrowly lower semicontinuous if F
has a superlinear growth at infinity. Indeed, under this assumption sequences
µn = ρn L d on which F is bounded have the property that (ρn ) is sequentially
weakly relatively compact in L1 (Rd ), and the convexity of F together with
the lower semicontinuity of F ensure the sequential lower semicontinuity with
respect to the weak L1 topology.
Proposition 3.11 (Convexity of ℱ) If
\[
\text{the map } s\mapsto s^d F(s^{-d}) \text{ is convex and non increasing in } (0,+\infty), \tag{3.14}
\]
then the functional ℱ is convex along geodesics in P2 (Rd ).
Proof. We consider two measures µi = ρi L d ∈ D(ℱ), i = 1, 2 and the
optimal transport map r such that r # µ1 = µ2 . Setting r t := (1 − t)i + tr,
by the characterization of constant speed geodesics we know that r t is the
optimal transport map between µ1 and µt := (r t )# µ1 for any t ∈ [0, 1], and
µt = ρt L d ∈ P2a (Rd ), with
\[
\rho_t(\boldsymbol{r}_t(x)) = \frac{\rho^1(x)}{\det\tilde\nabla\boldsymbol{r}_t(x)} \qquad\text{for } \mu^1\text{-a.e. } x\in\mathbb{R}^d.
\]
By (2.4) it follows that
\[
\mathcal{F}(\mu_t) = \int_{\mathbb{R}^d} F(\rho_t(y))\,dy = \int_{\mathbb{R}^d} F\Big(\frac{\rho^1(x)}{\det\tilde\nabla\boldsymbol{r}_t(x)}\Big)\det\tilde\nabla\boldsymbol{r}_t(x)\,dx.
\]
Since for a diagonalizable map D with nonnegative eigenvalues
\[
t\mapsto \det((1-t)I + tD)^{1/d} \text{ is concave in } [0,1], \tag{3.15}
\]
the integrand above may be seen as the composition of the convex and nonincreasing
map s ↦ s^d F (ρ1 (x)/s^d ) and of the concave map in (3.15), so that the
resulting map is convex in [0, 1] for µ1 -a.e. x ∈ Rd . Thus we have
\[
F\Big(\frac{\rho^1(x)}{\det\tilde\nabla\boldsymbol{r}_t(x)}\Big)\det\tilde\nabla\boldsymbol{r}_t(x) \le (1-t)F(\rho^1(x)) + tF(\rho^2(\boldsymbol{r}(x)))\det\tilde\nabla\boldsymbol{r}(x)
\]
and the thesis follows by integrating this inequality in Rd .
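The concavity property (3.15) can be probed numerically. The following is a minimal sketch, assuming a diagonal $D$ with arbitrarily chosen nonnegative eigenvalues; the helper names `det_root` and `is_midpoint_concave` are illustrative and not part of the text.

```python
def det_root(t, eigs):
    """det((1-t)I + tD)^(1/d) for a diagonal D with the given eigenvalues."""
    d = len(eigs)
    prod = 1.0
    for lam in eigs:
        prod *= (1.0 - t) + t * lam
    return prod ** (1.0 / d)


def is_midpoint_concave(f, n=50, tol=1e-12):
    """Check f((a+b)/2) >= (f(a)+f(b))/2 for all pairs on a grid of [0, 1]."""
    grid = [k / n for k in range(n + 1)]
    return all(f(0.5 * (a + b)) >= 0.5 * (f(a) + f(b)) - tol
               for a in grid for b in grid)


# Nonnegative eigenvalues chosen for illustration (one of them degenerate).
eigs = [0.0, 0.5, 3.0]
concave = is_midpoint_concave(lambda t: det_root(t, eigs))
```

A grid test of this kind cannot prove concavity; it only fails to find a counterexample for the chosen eigenvalues.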
In order to express (3.14) in a different way, we introduce the function
\[ L_F(z) := zF'(z) - F(z), \quad \text{which satisfies} \quad -L_F(e^{-z})e^{z} = \frac{d}{dz}\big(F(e^{-z})e^{z}\big); \tag{3.16} \]
denoting by $\hat F$ the modified function $F(e^{-z})e^{z}$ we have the simple relation
\[ \hat L_F(z) = -\frac{d}{dz}\hat F(z), \qquad \widehat{L^2_F}(z) = -\frac{d}{dz}\hat L_F(z) = \frac{d^2}{dz^2}\hat F(z), \qquad \text{where } L^2_F(z) := L_{L_F}(z) = zL_F'(z) - L_F(z). \tag{3.17} \]
The nonincreasing part of condition (3.14) is equivalent to say that
\[ L_F(z) \ge 0 \qquad \forall\, z \in (0,+\infty), \tag{3.18} \]
and it is in fact implied by the convexity of $F$. A simple computation in the case $F \in C^2(0,+\infty)$ shows
\[ \frac{d^2}{ds^2}\big(F(s^{-d})s^{d}\big) = \frac{d^2}{ds^2}\hat F(d\cdot\log s) = \widehat{L^2_F}(d\cdot\log s)\,\frac{d^2}{s^2} + \hat L_F(d\cdot\log s)\,\frac{d}{s^2}, \]
and therefore
\[ \text{(3.14) is equivalent to} \quad L^2_F(z) \ge -\frac{1}{d}L_F(z) \qquad \forall\, z \in (0,+\infty), \tag{3.19} \]
i.e.
\[ zL_F'(z) \ge \Big(1 - \frac{1}{d}\Big)L_F(z), \qquad \text{the map } z \mapsto z^{1/d-1}L_F(z) \text{ is nondecreasing}. \tag{3.20} \]
Observe that the bigger the dimension $d$ is, the stronger the above conditions are, and that they always imply the convexity of $F$.
Remark 3.12 (A "dimension free" condition) The weakest condition on $F$ yielding the geodesic convexity of $\mathcal{F}$ in any dimension is therefore
\[ L^2_F(z) = zL_F'(z) - L_F(z) \ge 0 \qquad \forall\, z \in (0,+\infty). \tag{3.21} \]
Taking into account (3.17), this is also equivalent to ask that
\[ \text{the map } s \mapsto F(e^{-s})e^{s} \text{ is convex and nonincreasing in } (0,+\infty). \tag{3.22} \]
Among the functionals $\mathcal{F}$ satisfying (3.14) we quote:
\[ \text{the entropy functional: } F(s) = s\log s, \tag{3.23} \]
\[ \text{the power functional: } F(s) = \frac{1}{m-1}s^m \quad \text{for } m \ge 1 - \frac{1}{d}. \tag{3.24} \]
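Condition (3.14) can be tested on a grid for the two examples above. The sketch below is only a numerical sanity check under assumed parameters ($d = 3$, a grid on $[0.2, 5]$, and two sample exponents); the function names are illustrative.

```python
import math


def mccann_map(F, d):
    """Return the map s -> s^d F(s^-d) appearing in (3.14)."""
    return lambda s: s ** d * F(s ** (-d))


def convex_nonincreasing(g, lo=0.2, hi=5.0, n=200, tol=1e-9):
    """Grid test: first differences <= 0 and second differences >= 0."""
    h = (hi - lo) / n
    v = [g(lo + k * h) for k in range(n + 1)]
    nonincreasing = all(v[k + 1] <= v[k] + tol for k in range(n))
    convex = all(v[k - 1] - 2 * v[k] + v[k + 1] >= -tol for k in range(1, n))
    return convex and nonincreasing


def entropy(s):
    return s * math.log(s)


def power(m):
    return lambda s: s ** m / (m - 1)


d = 3
ok_entropy = convex_nonincreasing(mccann_map(entropy, d))
ok_power_above = convex_nonincreasing(mccann_map(power(0.8), d))  # m >= 1 - 1/d
ok_power_below = convex_nonincreasing(mccann_map(power(0.5), d))  # m <  1 - 1/d
```

With these choices the check passes for the entropy and for $m = 0.8$, and fails for $m = 0.5 < 1 - 1/d$, where the map $s \mapsto -2s^{3/2}$ is nonincreasing but concave.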
Observe that the entropy functional and the power functional with $m > 1$ have a superlinear growth. In order to deal with the power functional with $m \le 1$, due to the failure of the lower semicontinuity property one has to introduce a suitable relaxation $\mathcal{F}^*$ of it, defined by [43, 16]
\[ \mathcal{F}^*(\mu) := \frac{1}{m-1}\int_{\mathbb{R}^d} F(\rho(x))\, d\mathcal{L}^d(x) \quad \text{with } \mu = \rho\cdot\mathcal{L}^d + \mu^s, \ \mu^s \perp \mathcal{L}^d. \tag{3.25} \]
In this case the functional takes into account only the density of the absolutely continuous part of $\mu$ w.r.t. $\mathcal{L}^d$, and the domain of $\mathcal{F}^*$ is the whole $\mathcal{P}_2(\mathbb{R}^d)$. The functional $\mathcal{F}^*$ retains the convexity properties of $\mathcal{F}$, see [8].
Example 3.13 (The opposite Wasserstein distance) Let us fix a base measure $\mu^1 \in \mathcal{P}_2(\mathbb{R}^d)$ and let us consider the functional
\[ \phi(\mu) := -\frac{1}{2}W_2^2(\mu^1,\mu). \tag{3.26} \]
Proposition 3.14 For each couple $\mu^2, \mu^3 \in \mathcal{P}_2(\mathbb{R}^d)$ and each transfer plan $\mu^{2\,3} \in \Gamma(\mu^2,\mu^3)$ we have
\[ W_2^2(\mu^1,\mu_t^{2\to3}) \ge (1-t)W_2^2(\mu^1,\mu^2) + tW_2^2(\mu^1,\mu^3) - t(1-t)\int_{\mathbb{R}^d\times\mathbb{R}^d}|x_2 - x_3|^2\, d\mu^{2\,3}(x_2,x_3) \qquad \forall\, t \in [0,1]. \tag{3.27} \]
In particular the map $\phi : \mu \mapsto -\frac{1}{2}W_2^2(\mu^1,\mu)$ is $(-1)$-convex along geodesics.
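Proof. For $\mu^{2\,3} \in \Gamma(\mu^2,\mu^3)$, we can find (see Proposition 7.3.1 of [8]) $\mu \in \mathcal{P}(\mathbb{R}^d\times\mathbb{R}^d\times\mathbb{R}^d)$ whose projection on the second and third variable is $\mu^{2\,3}$ and such that
\[ \big(\pi^1, (1-t)\pi^2 + t\pi^3\big)_\#\mu \in \Gamma_o(\mu^1,\mu_t^{2\to3}), \tag{3.28} \]
with $\mu_t^{2\to3} := \big((1-t)\pi^2 + t\pi^3\big)_\#\mu^{2\,3}$. Therefore
\[
\begin{aligned}
W_2^2(\mu^1,\mu_t^{2\to3}) &= \int_{\mathbb{R}^{3d}} |(1-t)x_2 + tx_3 - x_1|^2\, d\mu(x_1,x_2,x_3) \\
&= \int_{\mathbb{R}^{3d}} \Big((1-t)|x_2-x_1|^2 + t|x_3-x_1|^2 - t(1-t)|x_2-x_3|^2\Big)\, d\mu(x_1,x_2,x_3) \\
&\ge (1-t)W_2^2(\mu^1,\mu^2) + tW_2^2(\mu^1,\mu^3) - t(1-t)\int_{\mathbb{R}^{2d}}|x_2-x_3|^2\, d\mu^{2\,3}(x_2,x_3).
\end{aligned}
\]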
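The second equality in the proof rests on an elementary Hilbertian identity, $|(1-t)x_2 + tx_3 - x_1|^2 = (1-t)|x_2-x_1|^2 + t|x_3-x_1|^2 - t(1-t)|x_2-x_3|^2$. A short randomized check, with illustrative helper names and arbitrarily chosen sampling ranges, could look as follows.

```python
import random


def sq(u):
    """Squared Euclidean norm of a vector given as a list."""
    return sum(c * c for c in u)


def lhs(x1, x2, x3, t):
    return sq([(1 - t) * b + t * c - a for a, b, c in zip(x1, x2, x3)])


def rhs(x1, x2, x3, t):
    d21 = sq([b - a for a, b in zip(x1, x2)])
    d31 = sq([c - a for a, c in zip(x1, x3)])
    d23 = sq([b - c for b, c in zip(x2, x3)])
    return (1 - t) * d21 + t * d31 - t * (1 - t) * d23


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    # Three random points of R^3 and a random interpolation parameter.
    x1, x2, x3 = ([rng.uniform(-5, 5) for _ in range(3)] for _ in range(3))
    t = rng.random()
    max_gap = max(max_gap, abs(lhs(x1, x2, x3, t) - rhs(x1, x2, x3, t)))
```

The largest discrepancy `max_gap` stays at the level of floating point rounding.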
In particular, choosing optimal plans in (3.27), we obtain the semiconcavity inequality of the Wasserstein distance from a fixed measure $\mu^3$ along the constant speed geodesics $\mu_t^{1\to2}$ connecting $\mu^1$ to $\mu^2$:
\[ W_2^2(\mu_t^{1\to2},\mu^3) \ge (1-t)W_2^2(\mu^1,\mu^3) + tW_2^2(\mu^2,\mu^3) - t(1-t)W_2^2(\mu^1,\mu^2). \tag{3.29} \]
According to Aleksandrov's metric notion of curvature (see [5], [46]), this inequality can be interpreted by saying that the Wasserstein space is a positively curved metric space (in short, a PC-space). This was already pointed out by a formal computation in [57], showing also that generically the inequality is strict. An example where strict inequality occurs can be obtained as follows: let $d = 2$ and
\[ \mu^1 := \frac{1}{2}\big(\delta_{(1,1)} + \delta_{(5,3)}\big), \qquad \mu^2 := \frac{1}{2}\big(\delta_{(-1,1)} + \delta_{(-5,3)}\big), \qquad \mu^3 := \frac{1}{2}\big(\delta_{(0,0)} + \delta_{(0,-4)}\big). \]
Then, it is immediate to check that $W_2^2(\mu^1,\mu^2) = 40$, $W_2^2(\mu^1,\mu^3) = 30$, and $W_2^2(\mu^2,\mu^3) = 30$. On the other hand, the unique constant speed geodesic joining $\mu^1$ to $\mu^2$ is given by
\[ \mu_t := \frac{1}{2}\big(\delta_{(1-6t,\,1+2t)} + \delta_{(5-6t,\,3-2t)}\big) \]
and a simple computation gives
\[ 24 = W_2^2(\mu_{1/2},\mu^3) > \frac{30}{2} + \frac{30}{2} - \frac{40}{4}. \]
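The numbers in this example can be reproduced by brute force. The sketch below assumes the standard fact that, for uniform measures on equally many atoms, an optimal plan can be chosen among permutations; the function names are illustrative.

```python
from itertools import permutations


def w2_squared(xs, ys):
    """Squared W2 distance between uniform measures on equally many points.

    Brute force over permutations, adequate only for tiny supports.
    """
    n = len(xs)

    def cost(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return min(sum(cost(xs[i], ys[s[i]]) for i in range(n)) / n
               for s in permutations(range(n)))


mu1 = [(1, 1), (5, 3)]
mu2 = [(-1, 1), (-5, 3)]
mu3 = [(0, 0), (0, -4)]


def geodesic(t):
    """Atoms of the geodesic from mu1 to mu2 stated in the text."""
    return [(1 - 6 * t, 1 + 2 * t), (5 - 6 * t, 3 - 2 * t)]


d12 = w2_squared(mu1, mu2)
d13 = w2_squared(mu1, mu3)
d23 = w2_squared(mu2, mu3)
mid = w2_squared(geodesic(0.5), mu3)
bound = 0.5 * d13 + 0.5 * d23 - 0.25 * d12  # right-hand side of (3.29) at t = 1/2
```

Here `mid` evaluates to 24 and `bound` to 20, so the inequality (3.29) is strict at $t = 1/2$.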
3.3 Relative entropy and convex functionals of measures
In this section we study in detail the relative entropy functional; although we
confine the discussion to a finite-dimensional situation, the formalism used in
this section is well adapted to the extension to an infinite-dimensional context,
see [8].
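Definition 3.15 (Relative entropy) Let $\gamma, \mu$ be Borel probability measures on $\mathbb{R}^d$; the relative entropy of $\mu$ w.r.t. $\gamma$ is
\[ \mathcal{H}(\mu|\gamma) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} \frac{d\mu}{d\gamma}\log\Big(\frac{d\mu}{d\gamma}\Big)\, d\gamma & \text{if } \mu \ll \gamma, \\ +\infty & \text{otherwise.} \end{cases} \tag{3.30} \]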
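As in Example 3.8 we introduce the nonnegative, l.s.c., (extended) real, (strictly) convex function
\[ H(s) := \begin{cases} s(\log s - 1) + 1 & \text{if } s > 0, \\ 1 & \text{if } s = 0, \\ +\infty & \text{if } s < 0, \end{cases} \tag{3.31} \]
and we observe that
\[ \mathcal{H}(\mu|\gamma) = \int_{\mathbb{R}^d} H\Big(\frac{d\mu}{d\gamma}\Big)\, d\gamma \ge 0; \qquad \mathcal{H}(\mu|\gamma) = 0 \ \Leftrightarrow\ \mu = \gamma. \tag{3.32} \]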
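For measures on a finite set the two expressions (3.30) and (3.32) can be compared directly. The following sketch uses weight vectors as a discrete stand-in for $\mu$ and $\gamma$ (an assumption made only for illustration; the text works on $\mathbb{R}^d$).

```python
import math


def rel_entropy(mu, gamma):
    """Discrete analogue of (3.30): sum of mu_i log(mu_i / gamma_i)."""
    total = 0.0
    for m, g in zip(mu, gamma):
        if m > 0 and g == 0:
            return math.inf  # mu is not absolutely continuous w.r.t. gamma
        if m > 0:
            total += m * math.log(m / g)
    return total


def H(s):
    """The convex function of (3.31)."""
    if s < 0:
        return math.inf
    return 1.0 if s == 0 else s * (math.log(s) - 1) + 1


def rel_entropy_via_H(mu, gamma):
    """Discrete analogue of (3.32); assumes mu << gamma."""
    return sum(g * H(m / g) for m, g in zip(mu, gamma) if g > 0)


gamma = [0.5, 0.25, 0.25]
mu = [0.2, 0.3, 0.5]
```

The two formulas agree because $\mu$ and $\gamma$ have the same total mass, so the extra terms $-s + 1$ in $H$ integrate to zero.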
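Remark 3.16 (Changing $\gamma$) Let $\gamma$ be a Borel measure on $\mathbb{R}^d$ and let $V : \mathbb{R}^d \to (-\infty,+\infty]$ be a Borel map such that
\[ V^+ \text{ has at most quadratic growth,} \qquad \tilde\gamma := e^{-V}\cdot\gamma \text{ is a probability measure.} \tag{3.33} \]
Then for measures in $\mathcal{P}_2(\mathbb{R}^d)$ the relative entropy w.r.t. $\gamma$ is well defined by the formula
\[ \mathcal{H}(\mu|\gamma) := \mathcal{H}(\mu|\tilde\gamma) - \int_{\mathbb{R}^d} V(x)\, d\mu(x) \in (-\infty,+\infty] \qquad \forall\, \mu \in \mathcal{P}_2(\mathbb{R}^d). \tag{3.34} \]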
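In particular, when $\gamma$ is the $d$-dimensional Lebesgue measure, we find the standard entropy functional introduced in (3.23).
More generally, we can consider a
\[ \text{proper, l.s.c., convex function } F : [0,+\infty) \to [0,+\infty] \text{ with superlinear growth} \tag{3.35} \]
and the related functional
\[ \mathcal{F}(\mu|\gamma) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} F\Big(\frac{d\mu}{d\gamma}\Big)\, d\gamma & \text{if } \mu \ll \gamma, \\ +\infty & \text{otherwise.} \end{cases} \tag{3.36} \]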
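Lemma 3.17 (Joint lower semicontinuity) Let $\gamma_n, \mu_n \in \mathcal{P}(\mathbb{R}^d)$ be two sequences narrowly converging to $\gamma, \mu$ in $\mathcal{P}(\mathbb{R}^d)$. Then
\[ \liminf_{n\to\infty} \mathcal{H}(\mu_n|\gamma_n) \ge \mathcal{H}(\mu|\gamma), \qquad \liminf_{n\to\infty} \mathcal{F}(\mu_n|\gamma_n) \ge \mathcal{F}(\mu|\gamma). \tag{3.37} \]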
The proof of this lemma follows easily from the next representation formula; before stating it, we need to introduce the conjugate function of $F$
\[ F^*(s^*) := \sup_{s\ge0}\ \big(s\cdot s^* - F(s)\big) < +\infty \qquad \forall\, s^* \in \mathbb{R}, \tag{3.38} \]
so that
\[ F(s) = \sup_{s^*\in\mathbb{R}}\ \big(s^*\cdot s - F^*(s^*)\big); \tag{3.39} \]
if $s_0 \ge 0$ is a minimizer of $F$ then
\[ F^*(s^*) \ge s^*s_0 - F(s_0), \qquad s \ge s_0 \ \Rightarrow\ F(s) = \sup_{s^*\ge0}\ \big(s^*\cdot s - F^*(s^*)\big). \tag{3.40} \]
In the case of the entropy functional, we have $H^*(s^*) = e^{s^*} - 1$. Now we recall a classical duality formula for functionals defined on measures.
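Lemma 3.18 (Duality formula) For any $\gamma, \mu \in \mathcal{P}(\mathbb{R}^d)$ we have
\[ \mathcal{F}(\mu|\gamma) = \sup\Big\{ \int_{\mathbb{R}^d} S^*(x)\, d\mu(x) - \int_{\mathbb{R}^d} F^*(S^*(x))\, d\gamma(x) : S^* \in C_b^0(\mathbb{R}^d) \Big\}. \tag{3.41} \]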
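On a finite set the duality formula can be checked by hand for the entropy, where $F^* = H^*$ with $H^*(s^*) = e^{s^*} - 1$. The sketch below is a discrete illustration only: measures are weight vectors, test functions $S^*$ are vectors, and the candidate maximiser $S^* = \log(d\mu/d\gamma)$ is an assumption suggested by pointwise maximisation.

```python
import math
import random


def conj_H(s_star):
    """Conjugate of H(s) = s(log s - 1) + 1, as stated in the text."""
    return math.exp(s_star) - 1.0


def dual_value(S, mu, gamma):
    """Discrete analogue of the expression in braces in (3.41) for F = H."""
    return (sum(s * m for s, m in zip(S, mu))
            - sum(conj_H(s) * g for s, g in zip(S, gamma)))


gamma = [0.5, 0.25, 0.25]
mu = [0.2, 0.3, 0.5]
entropy = sum(m * math.log(m / g) for m, g in zip(mu, gamma))

# Candidate maximiser: the logarithm of the density of mu w.r.t. gamma.
S_opt = [math.log(m / g) for m, g in zip(mu, gamma)]
best = dual_value(S_opt, mu, gamma)

# Random competitors should never exceed the relative entropy.
rng = random.Random(1)
others = [dual_value([rng.uniform(-3, 3) for _ in mu], mu, gamma)
          for _ in range(2000)]
```

In this example `best` coincides with the relative entropy up to rounding, and no random competitor exceeds it.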
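Proof. Up to an addition of a constant, we can always assume $F^*(0) = -\min_{s\ge0}F(s) = -F(s_0) = 0$. Let us denote by $\mathcal{F}'(\mu|\gamma)$ the right hand side of (3.41). It is obvious that $\mathcal{F}'(\mu|\gamma) \le \mathcal{F}(\mu|\gamma)$, so that we have to prove only the converse inequality.
First of all we show that $\mathcal{F}'(\mu|\gamma) < +\infty$ yields that $\mu \ll \gamma$. For, let us fix $s^*, \varepsilon > 0$ and a Borel set $A$ with $\gamma(A) \le \varepsilon/2$. Since $\mu, \gamma$ are finite measures we can find a compact set $K \subset A$, an open set $G \supset A$ and a continuous function $\zeta : \mathbb{R}^d \to [0,s^*]$ such that
\[ \mu(G\setminus K) \le \varepsilon, \qquad \gamma(G) \le \varepsilon, \qquad \zeta(x) = s^* \text{ on } K, \qquad \zeta(x) = 0 \text{ on } \mathbb{R}^d\setminus G. \]
Since $F^*$ is increasing (by the definition (3.38)) and $F^*(0) = 0$, we have
\[
s^*\mu(K) - F^*(s^*)\varepsilon \le \int_K \zeta(x)\, d\mu(x) - \int_G F^*(\zeta(x))\, d\gamma(x)
\le \int_{\mathbb{R}^d} \zeta(x)\, d\mu(x) - \int_{\mathbb{R}^d} F^*(\zeta(x))\, d\gamma(x) \le \mathcal{F}'(\mu|\gamma).
\]
Taking the supremum w.r.t. $K \subset A$ and $s^* \ge 0$, and using (3.40) we get
\[ \varepsilon F\big(\mu(A)/\varepsilon\big) \le \mathcal{F}'(\mu|\gamma) \qquad \text{if } \mu(A) \ge \varepsilon s_0. \]
Since $F(s)$ has a superlinear growth as $s \to +\infty$, we conclude that $\mu(A) \to 0$ as $\varepsilon \downarrow 0$.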
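Now we can suppose that $\mu = \rho\cdot\gamma$ for some Borel function $\rho \in L^1(\gamma)$, so that
\[ \mathcal{F}'(\mu|\gamma) = \sup\Big\{ \int_{\mathbb{R}^d} \big(S^*(x)\rho(x) - F^*(S^*(x))\big)\, d\gamma(x) : S^* \in C_b^0(\mathbb{R}^d) \Big\} \]
and, for a suitable dense countable set $C = \{s_n^*\}_{n\in\mathbb{N}} \subset \mathbb{R}$,
\[ \mathcal{F}(\mu|\gamma) = \int_{\mathbb{R}^d} \sup_{s^*\in C}\big(s^*\rho(x) - F^*(s^*)\big)\, d\gamma(x) = \lim_{k\to\infty}\int_{\mathbb{R}^d} \sup_{s^*\in C_k}\big(s^*\rho(x) - F^*(s^*)\big)\, d\gamma(x), \]
where $C_k = \{s_1^*,\dots,s_k^*\}$. Our thesis follows if we show that for every $k$
\[ \int_{\mathbb{R}^d} \max_{s^*\in C_k}\big(s^*\rho(x) - F^*(s^*)\big)\, d\gamma(x) \le \mathcal{F}'(\mu|\gamma). \tag{3.42} \]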
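For we call
\[ A_j = \Big\{ x \in \mathbb{R}^d : s_j^*\rho(x) - F^*(s_j^*) \ge s_i^*\rho(x) - F^*(s_i^*) \ \ \forall\, i \in \{1,\dots,k\} \Big\}, \]
and
\[ A_1' = A_1, \qquad A_{j+1}' = A_{j+1}\setminus\bigcup_{i=1}^{j} A_i. \]
We find compact sets $K_j \subset A_j'$, open sets $G_j \supset A_j$ with $G_j \cap K_i = \emptyset$ if $i \ne j$, and continuous functions $\zeta_j$ such that
\[ \sum_{j=1}^{k} \gamma(G_j\setminus K_j) + \mu(G_j\setminus K_j) \le \varepsilon, \qquad \zeta_j \equiv s_j^* \text{ on } K_j, \qquad \zeta_j \equiv 0 \text{ on } \mathbb{R}^d\setminus G_j. \]
Denoting by $\zeta := \sum_{j=1}^{k}\zeta_j$, $M := \sum_{j=1}^{k}|s_j^*|$, since the negative part of $F^*(s^*)$ is bounded above by $|s^*|s_0$ we have
\[
\begin{aligned}
\int_{\mathbb{R}^d} \max_{s^*\in C_k}\big(s^*\rho(x) - F^*(s^*)\big)\, d\gamma(x)
&= \sum_{j=1}^{k}\int_{A_j'} \big(s_j^*\rho(x) - F^*(s_j^*)\big)\, d\gamma(x) \\
&\le \sum_{j=1}^{k}\int_{K_j} \big(s_j^*\rho(x) - F^*(s_j^*)\big)\, d\gamma(x) + \varepsilon(M + Ms_0) \\
&= \sum_{j=1}^{k}\int_{K_j} \big(\zeta(x)\rho(x) - F^*(\zeta(x))\big)\, d\gamma(x) + \varepsilon(M + Ms_0) \\
&\le \int_{\mathbb{R}^d} \big(\zeta(x)\rho(x) - F^*(\zeta(x))\big)\, d\gamma(x) + \varepsilon\big(M + Ms_0 + M + F^*(M)\big).
\end{aligned}
\]
Passing to the limit as $\varepsilon \downarrow 0$ we get (3.42).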
3.4 Log-concavity and displacement convexity
We want to characterize the probability measures γ inducing a geodesically
convex relative entropy functional H(·|γ) in P2 (Rd ). The following lemma
provides the first crucial property; the argument is strictly related to the proof of
the Brunn-Minkowski inequality for the Lebesgue measure, obtained via optimal
transportation inequalities [65]. See also [13] for the link between log-concavity
and representation formulae like (3.50).
Lemma 3.19 ($\gamma$ is log-concave if $\mathcal{H}(\cdot|\gamma)$ is displacement convex) Suppose that for each couple of probability measures $\mu^1, \mu^2 \in \mathcal{P}(\mathbb{R}^d)$ with bounded support there exists $\mu \in \Gamma(\mu^1,\mu^2)$ such that $\mathcal{H}(\cdot|\gamma)$ is convex along the interpolating curve $\mu_t^{1\to2} = \big((1-t)\pi^1 + t\pi^2\big)_\#\mu$, $t \in [0,1]$. Then for each couple of open sets $A, B \subset \mathbb{R}^d$ and $t \in [0,1]$ we have
\[ \log\gamma((1-t)A + tB) \ge (1-t)\log\gamma(A) + t\log\gamma(B). \tag{3.43} \]
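Proof. We can obviously assume that $\gamma(A) > 0$, $\gamma(B) > 0$ in (3.43); we consider
\[ \mu^1 := \gamma(\cdot|A) = \frac{1}{\gamma(A)}\chi_A\cdot\gamma, \qquad \mu^2 := \gamma(\cdot|B) = \frac{1}{\gamma(B)}\chi_B\cdot\gamma, \]
observing that
\[ \mathcal{H}(\mu^1|\gamma) = -\log\gamma(A), \qquad \mathcal{H}(\mu^2|\gamma) = -\log\gamma(B). \tag{3.44} \]
If $\mu_t^{1\to2}$ is induced by a transfer plan $\mu \in \Gamma(\mu^1,\mu^2)$ along which the relative entropy is displacement convex, we have
\[ \mathcal{H}(\mu_t^{1\to2}|\gamma) \le (1-t)\mathcal{H}(\mu^1|\gamma) + t\mathcal{H}(\mu^2|\gamma) = -(1-t)\log\gamma(A) - t\log\gamma(B). \]
On the other hand the measure $\mu_t^{1\to2}$ is concentrated on $(1-t)A + tB = \pi_t^{1\to2}(A\times B)$ and the next lemma shows that
\[ -\log\gamma((1-t)A + tB) \le \mathcal{H}(\mu_t^{1\to2}|\gamma). \]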
Lemma 3.20 (Relative entropy of concentrated measures) Let $\gamma, \mu \in \mathcal{P}(\mathbb{R}^d)$; if $\mu$ is concentrated on a Borel set $A$, i.e. $\mu(\mathbb{R}^d\setminus A) = 0$, then
\[ \mathcal{H}(\mu|\gamma) \ge -\log\gamma(A). \tag{3.45} \]
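A finite-state illustration of (3.45), together with the equality case (3.44) for the conditional measure $\gamma(\cdot|A)$, is sketched below. The weights and the index set standing for $A$ are arbitrary choices made for the example.

```python
import math


def rel_entropy(mu, gamma):
    """Discrete relative entropy, assuming mu << gamma."""
    return sum(m * math.log(m / g) for m, g in zip(mu, gamma) if m > 0)


gamma = [0.1, 0.2, 0.3, 0.4]
A = [0, 2]  # indices of the set A
gamma_A = sum(gamma[i] for i in A)

# Two measures concentrated on A: an arbitrary one and gamma(.|A).
mu_any = [0.7, 0.0, 0.3, 0.0]
mu_cond = [gamma[i] / gamma_A if i in A else 0.0 for i in range(4)]

lower_bound = -math.log(gamma_A)
gap_any = rel_entropy(mu_any, gamma) - lower_bound
gap_cond = rel_entropy(mu_cond, gamma) - lower_bound
```

The gap is strictly positive for the arbitrary measure and vanishes (up to rounding) for $\gamma(\cdot|A)$.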
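Proof. It is not restrictive to assume $\mu \ll \gamma$ and $\gamma(A) > 0$; denoting by $\gamma_A$ the probability measure $\gamma(\cdot|A) := \gamma(A)^{-1}\chi_A\cdot\gamma$, we have
\[
\begin{aligned}
\mathcal{H}(\mu|\gamma) &= \int_{\mathbb{R}^d} \log\Big(\frac{d\mu}{d\gamma}\Big)\, d\mu = \int_A \log\Big(\frac{d\mu}{d\gamma_A}\cdot\frac{1}{\gamma(A)}\Big)\, d\mu \\
&= \int_A \log\Big(\frac{d\mu}{d\gamma_A}\Big)\, d\mu - \int_A \log\gamma(A)\, d\mu = \mathcal{H}(\mu|\gamma_A) - \log\gamma(A) \ge -\log\gamma(A).
\end{aligned}
\]
The previous results justify the following definition: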
Definition 3.21 (log-concavity of a measure) We say that a Borel probability measure $\gamma \in \mathcal{P}(\mathbb{R}^d)$ is log-concave if for every couple of open sets $A, B \subset \mathbb{R}^d$ and every $t \in [0,1]$ we have
\[ \log\gamma((1-t)A + tB) \ge (1-t)\log\gamma(A) + t\log\gamma(B). \tag{3.46} \]
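As an illustration of the definition, the sketch below tests (3.46) for the one-dimensional standard Gaussian on two open intervals, for which the Minkowski combination $(1-t)A + tB$ is again an interval. The intervals are arbitrary choices; that the Gaussian is log-concave is a standard fact, and this code only probes it numerically.

```python
import math


def gauss_measure(a, b):
    """Standard Gaussian measure of the interval (a, b)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)


def minkowski(A, B, t):
    """(1-t)A + tB for open intervals A = (a0, a1), B = (b0, b1)."""
    return ((1 - t) * A[0] + t * B[0], (1 - t) * A[1] + t * B[1])


A, B = (-1.0, 0.5), (1.0, 3.0)
gaps = []
for k in range(11):
    t = k / 10
    lhs = math.log(gauss_measure(*minkowski(A, B, t)))
    rhs = ((1 - t) * math.log(gauss_measure(*A))
           + t * math.log(gauss_measure(*B)))
    gaps.append(lhs - rhs)  # nonnegative if (3.46) holds at this t
```

All entries of `gaps` are nonnegative, with equality at the endpoints $t = 0$ and $t = 1$.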
In Definition 3.21 and also in the previous theorem we confined ourselves
to pairs of open sets, to avoid the non trivial issue of the measurability of
(1 − t)A + tB when A and B are only Borel (in fact, it is an open set whenever
A and B are open). Observe that a log-concave measure γ in particular satisfies
\[ \log\gamma\big(B_r((1-t)x_0 + tx_1)\big) \ge (1-t)\log\gamma(B_r(x_0)) + t\log\gamma(B_r(x_1)), \tag{3.47} \]
for every couple of points $x_0, x_1 \in \mathbb{R}^d$, $r > 0$, $t \in [0,1]$.
We want to show that in fact log concavity is equivalent to the geodesic
convexity of the Relative Entropy functional H(·|γ).
Let us first recall some elementary properties of convex sets in $\mathbb{R}^d$. Let $C \subset \mathbb{R}^d$ be a convex set; the affine dimension $\dim C$ of $C$ is the linear dimension of its affine envelope
\[ \operatorname{aff} C = \big\{ (1-t)x_0 + tx_1 : x_0, x_1 \in C,\ t \in \mathbb{R} \big\}, \tag{3.48} \]
which is an affine subspace of $\mathbb{R}^d$. We denote by $\operatorname{int} C$ the relative interior of $C$ as a subset of $\operatorname{aff} C$: it is possible to show that
\[ \operatorname{int} C \ne \emptyset, \qquad \overline{\operatorname{int} C} = \overline{C}, \qquad \mathcal{H}^k(\overline{C}\setminus\operatorname{int} C) = 0 \quad \text{if } k = \dim C, \tag{3.49} \]
where $\mathcal{H}^k$ is the $k$-dimensional Hausdorff measure in $\mathbb{R}^d$. The following theorem shows that log-concavity of $\gamma$ is equivalent to the convexity of $\mathcal{H}(\mu|\gamma)$ along (even generalized) geodesics of the Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$: the link between these two concepts is provided by the representation formula (3.50).
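Theorem 3.22 Let us suppose that $\gamma \in \mathcal{P}(\mathbb{R}^d)$ satisfies the log-concavity assumption on balls (3.47). Then $\operatorname{supp}\gamma$ is convex and there exists a convex l.s.c. function $V : \mathbb{R}^d \to (-\infty,+\infty]$ such that
\[ \gamma = e^{-V}\cdot\mathcal{H}^k|_{\operatorname{aff}(\operatorname{supp}\gamma)}, \qquad \text{where } k = \dim(\operatorname{supp}\gamma). \tag{3.50} \]
Conversely, if $\gamma$ admits the representation (3.50) then $\gamma$ is log-concave and the relative entropy functional $\mathcal{H}(\cdot|\gamma)$ is convex along any (generalized) geodesic of $\mathcal{P}_2(\mathbb{R}^d)$.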
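Proof. Let us suppose that $\gamma$ satisfies the log-concave inequality on balls and let $k$ be the dimension of $\operatorname{aff}(\operatorname{supp}\gamma)$. Observe that the measure $\gamma$ satisfies the same inequality (3.47) for the balls of $\operatorname{aff}(\operatorname{supp}\gamma)$: up to an isometric change of coordinates it is not restrictive to assume that $k = d$ and $\operatorname{aff}(\operatorname{supp}\gamma) = \mathbb{R}^d$.
Let us now introduce the set
\[ D := \Big\{ x \in \mathbb{R}^d : \liminf_{r\downarrow0} \frac{\gamma(B_r(x))}{r^d} > 0 \Big\}. \tag{3.51} \]
Since (3.47) yields, with $x_t := (1-t)x_0 + tx_1$,
\[ \frac{\gamma(B_r(x_t))}{r^k} \ge \Big(\frac{\gamma(B_r(x_0))}{r^k}\Big)^{1-t}\Big(\frac{\gamma(B_r(x_1))}{r^k}\Big)^{t} \qquad t \in (0,1), \tag{3.52} \]
it is immediate to check that $D$ is a convex subset of $\mathbb{R}^d$ with $D \subset \operatorname{supp}\gamma$.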
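General results on derivation of Radon measures in $\mathbb{R}^d$ (see for instance Theorem 2.56 in [7]) show that
\[ \limsup_{r\downarrow0} \frac{\gamma(B_r(x))}{r^d} < +\infty \qquad \text{for } \mathcal{L}^d\text{-a.e. } x \in \mathbb{R}^d \tag{3.53} \]
and
\[ \limsup_{r\downarrow0} \frac{r^d}{\gamma(B_r(x))} < +\infty \qquad \text{for } \gamma\text{-a.e. } x \in \mathbb{R}^d. \tag{3.54} \]
Using (3.54) we see that actually $\gamma$ is concentrated on $D$ (so that $\operatorname{supp}\gamma \subset \overline{D}$) and therefore, $d$ being the dimension of $\operatorname{aff}(\operatorname{supp}\gamma)$, it follows that $d$ is also the dimension of $\operatorname{aff}(D)$.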
If a point $\bar x \in \mathbb{R}^d$ exists such that
\[ \limsup_{r\downarrow0} \frac{\gamma(B_r(\bar x))}{r^d} = +\infty, \]
then (3.52) forces every point of $\operatorname{int}(D)$ to verify the same property, but this would be in contradiction with (3.53), since we know that $\operatorname{int}(D)$ has strictly positive $\mathcal{L}^d$-measure. Therefore
\[ \limsup_{r\downarrow0} \frac{\gamma(B_r(x))}{r^d} < +\infty \qquad \text{for all } x \in \mathbb{R}^d \tag{3.55} \]
and we obtain that $\gamma \ll \mathcal{L}^d$, again by the theory of derivation of Radon measures in $\mathbb{R}^d$. In the sequel we denote by $\rho$ the density of $\gamma$ w.r.t. $\mathcal{L}^d$ and notice that by the Lebesgue differentiation theorem $\rho > 0$ $\mathcal{L}^d$-a.e. in $D$ and $\rho = 0$ $\mathcal{L}^d$-a.e. in $\mathbb{R}^d\setminus D$.
By (3.47) the maps
\[ V_r(x) = -\log\Big(\frac{\gamma(B_r(x))}{\omega_d r^d}\Big) \]
are convex on $\mathbb{R}^d$, and (3.55) gives that the family $V_r(x)$ is bounded as $r \downarrow 0$ for any $x \in D$. Using the pointwise boundedness of $V_r$ on $D$ and the convexity of $V_r$ it is easy to show that the $V_r$ are locally equi-bounded (hence locally equicontinuous) on $\operatorname{int}(D)$ as $r \downarrow 0$. Let $W$ be a limit point of $V_r$, with respect to the local uniform convergence, as $r \downarrow 0$: $W$ is convex on $\operatorname{int}(D)$ and the Lebesgue differentiation theorem shows that
\[ \exists \lim_{r\downarrow0} V_r(x) = -\log\rho(x) = W(x) \qquad \text{for } \mathcal{L}^d\text{-a.e. } x \in \operatorname{int}(D), \tag{3.56} \]
so that $\gamma = \rho\,\mathcal{L}^d = e^{-W}\chi_{\operatorname{int}(D)}\mathcal{L}^d$. In order to get a globally defined convex and l.s.c. function $V$ we extend $W$ with the value $+\infty$ outside $\operatorname{int}(D)$ and define $V$ to be its convex and l.s.c. envelope. It turns out that $V$ coincides with $W$ on $\operatorname{int}(D)$, so that the representation $\gamma = e^{-V}\mathcal{L}^d$ still holds.
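The convergence (3.56) can be observed on the one-dimensional standard Gaussian, for which $\omega_1 = 2$ and the potential is known in closed form. The following sketch assumes this specific $\gamma$ and an arbitrary evaluation point; it is an illustration, not part of the proof.

```python
import math


def gauss_ball(x, r):
    """Standard Gaussian measure of the one-dimensional ball (x - r, x + r)."""
    def cdf(y):
        return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
    return cdf(x + r) - cdf(x - r)


def V_r(x, r):
    """V_r(x) = -log(gamma(B_r(x)) / (omega_d r^d)) with d = 1, omega_1 = 2."""
    return -math.log(gauss_ball(x, r) / (2.0 * r))


def V(x):
    """Potential of the standard Gaussian, gamma = exp(-V) dx."""
    return 0.5 * x * x + 0.5 * math.log(2.0 * math.pi)


x = 0.7
errors = [abs(V_r(x, r) - V(x)) for r in (1.0, 0.1, 0.01)]
```

The errors decrease as $r \downarrow 0$, consistently with $V_r \to -\log\rho$.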
Conversely, let us suppose that γ admits the representation (3.50) for a given
convex l.s.c. function V and let µ1 , µ2 ∈ P2 (Rd ); if their relative entropies are
finite then they are absolutely continuous w.r.t. γ and therefore their supports
are contained in aff(supp γ). It follows that the support of any optimal plan
µ ∈ Γo (µ1 , µ2 ) in P2 (Rd ) is contained in aff(supp γ)×aff(supp γ): up to a linear
isometric change of coordinates, it is not restrictive to suppose aff(supp γ) = Rd ,
µ1 , µ2 ∈ P2a (Rd ), γ = e−V · L d ∈ P(Rd ).
In this case we introduce the densities $\rho^i$ of $\mu^i$ w.r.t. $\mathcal{L}^d$, observing that
\[ \frac{d\mu^i}{d\gamma} = \rho^i e^{V} \qquad i = 1, 2, \]
where we adopted the convention $0\cdot(+\infty) = 0$ (recall that $\rho^i(x) = 0$ for $\mathcal{L}^d$-a.e. $x \in \mathbb{R}^d\setminus D(V)$). Therefore the entropy functional can be written as
\[ \mathcal{H}(\mu^i|\gamma) = \int_{\mathbb{R}^d} \rho^i(x)\log\rho^i(x)\, dx + \int_{\mathbb{R}^d} V(x)\, d\mu^i(x), \tag{3.57} \]
i.e. the sum of two geodesically convex functionals, as we proved discussing Example 3.4 and Example 3.8. Lemma 3.19 yields the log-concavity of $\gamma$.
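The decomposition (3.57) can be verified in closed form for one-dimensional Gaussians. The sketch below takes $\gamma = N(0,1)$, so that $V(x) = x^2/2 + \frac12\log(2\pi)$, and $\mu = N(m, s^2)$; the formulas used for the Gaussian relative entropy and differential entropy are the standard ones, and the parameter values are arbitrary.

```python
import math


def kl_gauss(m, s):
    """Closed-form H(mu|gamma) for mu = N(m, s^2), gamma = N(0, 1)."""
    return 0.5 * (s * s + m * m - 1.0 - math.log(s * s))


def internal_part(s):
    """Integral of rho log rho for the density of N(m, s^2)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * s * s)


def potential_part(m, s):
    """Integral of V d(mu) with V(x) = x^2/2 + log(2 pi)/2."""
    return 0.5 * (m * m + s * s) + 0.5 * math.log(2.0 * math.pi)


m, s = 1.3, 0.6
gap = kl_gauss(m, s) - (internal_part(s) + potential_part(m, s))
```

The quantity `gap` vanishes up to rounding, as (3.57) predicts.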
If γ is log-concave and F satisfies (3.22), then all the integral functionals F(·|γ)
introduced in (3.36) are geodesically convex in P2 (Rd ).
Theorem 3.23 (Geodesic convexity for relative integral functionals) Suppose that $\gamma$ is log-concave and $F : [0,+\infty) \to [0,+\infty]$ satisfies conditions (3.35) and (3.22). Then the integral functional $\mathcal{F}(\cdot|\gamma)$ is geodesically convex in $\mathcal{P}_2(\mathbb{R}^d)$.
Proof. Arguing as in the final part of the proof of Theorem 3.22 we can assume that $\gamma := e^{-V}\mathcal{L}^d$ for a convex l.s.c. function $V : \mathbb{R}^d \to (-\infty,+\infty]$ whose domain has nonempty interior. For every couple of measures $\mu^1, \mu^2 \in D(\mathcal{F}(\cdot|\gamma))$ we have
\[ \mu^i = \rho^i e^{V}\cdot\gamma, \qquad \mathcal{F}(\mu^i|\gamma) = \int_{\mathbb{R}^d} F\big(\rho^i(x)e^{V(x)}\big)e^{-V(x)}\, dx \qquad i = 1, 2. \tag{3.58} \]
We denote by $r$ the optimal transport map for the Wasserstein distance pushing $\mu^1$ to $\mu^2$ and we set $r_t := (1-t)i + tr$, $\mu_t := (r_t)_\#\mu^1$; arguing as in Proposition 3.11, we get
\[ \mathcal{F}(\mu_t|\gamma) = \int_{\mathbb{R}^d} F\Big(\frac{\rho^1(x)e^{V(r_t(x))}}{\det\nabla r_t(x)}\Big)\det\nabla r_t(x)\, e^{-V(r_t(x))}\, dx, \tag{3.59} \]
and the integrand above may be seen as the composition of the convex and nonincreasing map $s \mapsto F(\rho^1(x)e^{-s})e^{s}$ with the concave curve
\[ t \mapsto -V(r_t(x)) + \log\big(\det\nabla r_t(x)\big), \]
since $D(x) := \nabla r(x)$ is a diagonalizable map with nonnegative eigenvalues and $t \mapsto \log\det\big((1-t)I + tD(x)\big)$ is concave in $[0,1]$.
4 Subdifferential calculus in $\mathcal{P}_2(\mathbb{R}^d)$
Let $X$ be a Hilbert space. In the classical theory of subdifferential calculus for proper, lower semicontinuous functionals $\phi : X \to (-\infty,+\infty]$, the Fréchet subdifferential $\partial\phi : X \to 2^X$ of $\phi$ is a multivalued operator defined as
\[ \xi \in \partial\phi(v) \iff v \in D(\phi), \quad \liminf_{w\to v} \frac{\phi(w) - \phi(v) - \langle\xi, w - v\rangle}{|w - v|} \ge 0, \tag{4.1} \]
which we will also write in the equivalent form for $v \in D(\phi)$
\[ \xi \in \partial\phi(v) \iff \phi(w) \ge \phi(v) + \langle\xi, w - v\rangle + o\big(|w - v|\big) \quad \text{as } w \to v. \tag{4.2} \]
As usual in multivalued analysis, the proper domain $D(\partial\phi) \subset D(\phi)$ is defined as the set of all $v \in X$ such that $\partial\phi(v) \ne \emptyset$; we will use this convention for all the multivalued operators we will introduce.
The metric counterpart of the Fréchet subdifferential is represented by the metric slope of $\phi$, which for every $v \in D(\phi)$ is defined by
\[ |\partial\phi|(v) = \limsup_{w\to v} \frac{(\phi(v) - \phi(w))^+}{|w - v|}, \tag{4.3} \]
and can also be characterized by an asymptotic expansion similar to (4.2) for $s \ge 0$
\[ s \ge |\partial\phi|(v) \iff \phi(w) \ge \phi(v) - s|w - v| + o\big(|w - v|\big) \quad \text{as } w \to v. \tag{4.4} \]
It is then immediate to check that
\[ \xi \in \partial\phi(v) \implies |\partial\phi|(v) \le |\xi|. \tag{4.5} \]
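The Fréchet subdifferential and the metric slope occur quite naturally in the Euler equations for minima of (smooth perturbations of) $\phi$:
A. Euler equation for quadratic perturbations. If $v_\tau$ is a minimizer of
\[ w \mapsto \Phi(\tau, v; w) := \phi(w) + \frac{1}{2\tau}|w - v|^2 \qquad \text{for some } \tau > 0,\ v \in X \tag{4.6} \]
then
\[ v_\tau \in D(\partial\phi) \quad \text{and} \quad -\frac{v_\tau - v}{\tau} \in \partial\phi(v_\tau); \tag{4.7} \]
concerning the slope we easily get
\[ v_\tau \in D(|\partial\phi|) \quad \text{and} \quad |\partial\phi|(v_\tau) \le \frac{|v - v_\tau|}{\tau}. \tag{4.8} \]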
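A one-dimensional illustration of the Euler equation (4.7): for $\phi(w) = |w|$ on $X = \mathbb{R}$ the minimizer of (4.6) is given by soft thresholding, and the element $-(v_\tau - v)/\tau$ can be tested for membership in $\partial\phi(v_\tau)$. The choice of $\phi$ and the helper names are assumptions made for this sketch.

```python
def prox_abs(v, tau):
    """Minimiser of w -> |w| + (w - v)^2 / (2 tau) (soft thresholding)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def in_subdifferential_abs(xi, w, tol=1e-12):
    """Membership of xi in the subdifferential of |.| at the point w."""
    if w > 0:
        return abs(xi - 1.0) <= tol
    if w < 0:
        return abs(xi + 1.0) <= tol
    return -1.0 - tol <= xi <= 1.0 + tol


tau = 0.5
checks = []
for v in (-2.0, -0.3, 0.0, 0.4, 1.7):
    v_tau = prox_abs(v, tau)
    xi = -(v_tau - v) / tau  # the element appearing in (4.7)
    checks.append(in_subdifferential_abs(xi, v_tau))
```

Every entry of `checks` is true: when $|v| > \tau$ the element equals the sign of $v_\tau$, and when $|v| \le \tau$ the minimizer is $0$ and $v/\tau \in [-1,1]$.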
For λ-convex functionals the Fréchet subdifferential enjoys at least two other
simple but fundamental properties, which play a crucial role in the corresponding variational theory of evolution equations:
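B. Characterization by variational inequalities and monotonicity. If $\phi$ is $\lambda$-convex, then
\[ \xi \in \partial\phi(v) \iff \phi(w) \ge \phi(v) + \langle\xi, w - v\rangle + \frac{\lambda}{2}|w - v|^2 \quad \forall\, w \in D(\phi); \tag{4.9} \]
in particular,
\[ \xi_i \in \partial\phi(v_i) \implies \langle\xi_1 - \xi_2, v_1 - v_2\rangle \ge \lambda|v_1 - v_2|^2 \quad \forall\, v_1, v_2 \in D(\partial\phi). \tag{4.10} \]
As in (4.9), the slope of a $\lambda$-convex functional can also be characterized by a system of inequalities for $s \ge 0$
\[ s \ge |\partial\phi|(v) \iff \phi(w) \ge \phi(v) - s|w - v| + \frac{\lambda}{2}|w - v|^2 \quad \forall\, w \in D(\phi), \tag{4.11} \]
which can be equivalently reformulated as
\[ |\partial\phi|(v) = \sup_{w\ne v}\Big(\frac{\phi(v) - \phi(w)}{|v - w|} + \frac{\lambda}{2}|v - w|\Big)^{+}. \tag{4.12} \]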
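C. Convexity and strong-weak closure. [15, Chap. II, Ex. 2.3.4, Prop. 2.5] If $\phi$ is $\lambda$-convex, then $\partial\phi(v)$ is closed and convex, and for every pair of sequences $(v_n) \subset X$, $(\xi_n) \subset X$ we have
\[ \xi_n \in \partial\phi(v_n), \quad v_n \to v, \quad \xi_n \rightharpoonup \xi \implies \xi \in \partial\phi(v), \quad \phi(v_n) \to \phi(v). \tag{4.13} \]
The slope is l.s.c.:
\[ v_n \to v \implies \liminf_{n\to\infty} |\partial\phi|(v_n) \ge |\partial\phi|(v). \tag{4.14} \]
Modeled on the last property C, and following a terminology introduced by F.H. Clarke, see e.g. [62, Chap. 8], we say that a functional $\phi$ is regular if
\[ \left.\begin{array}{l} \xi_n \in \partial\phi(v_n), \quad \varphi_n = \phi(v_n) \\ v_n \to v, \quad \xi_n \rightharpoonup \xi, \quad \varphi_n \to \varphi \end{array}\right\} \implies \xi \in \partial\phi(v), \quad \varphi = \phi(v). \tag{4.15} \]
D. Minimal selection and slope. (see *.*.* of [8]) If $\phi$ is regular (in particular if $\phi$ is $\lambda$-convex), $|\partial\phi|(v)$ is finite if and only if $\partial\phi(v) \ne \emptyset$ and
\[ |\partial\phi|(v) = \min\big\{ |\xi| : \xi \in \partial\phi(v) \big\}. \tag{4.16} \]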
E. Chain rule. If $v : (a,b) \to D(\phi)$ is a curve in $X$ then
\[ \frac{d}{dt}\phi(v(t)) = \langle\xi, v'(t)\rangle \qquad \forall\, \xi \in \partial\phi(v(t)), \tag{4.17} \]
at each point $t$ where $v$ and $\phi\circ v$ are differentiable and $\partial\phi(v(t)) \ne \emptyset$. In particular (see [15, Chap. III, Lemma 3.3] and Corollary 2.4.10 in [8]) if $\phi$ is also $\lambda$-convex, $v \in AC(a,b;X)$, and
\[ \int_a^b |\partial\phi|(v(t))\,|v'(t)|\, dt < +\infty, \tag{4.18} \]
then $\phi\circ v$ is absolutely continuous in $(a,b)$ and (4.17) holds for $\mathcal{L}^1$-a.e. $t \in (a,b)$.
The aim of this section is to extend the notion of Fréchet subdifferentiability and
these properties to the Wasserstein framework (see also [21] for related results).
4.1 Definition of the subdifferential for a.c. measures
In this section we focus our attention on functionals $\phi$ defined on $\mathcal{P}_2(\mathbb{R}^d)$. The formal mechanism for translating statements from the Euclidean framework to the Wasserstein formalism is simple: if $\mu \leftrightarrow v$ is the reference point, scalar products $\langle\cdot,\cdot\rangle$ have to be intended in the reference Hilbert space $L^2(\mu;\mathbb{R}^d)$ (which contains the tangent space $\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$) and displacement vectors $w - v$ correspond to transport maps $t_\mu^\nu - i$, which are well defined if $\mu \in \mathcal{P}_2^a(\mathbb{R}^d)$. According to these two natural rules, the transposition of (4.1) yields:
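Definition 4.1 (Fréchet subdifferential and metric slope) Let us consider a functional $\phi : \mathcal{P}_2(\mathbb{R}^d) \to (-\infty,+\infty]$ and a measure $\mu \in D(\phi)\cap\mathcal{P}_2^a(\mathbb{R}^d)$. We say that $\xi \in L^2(\mu;\mathbb{R}^d)$ belongs to the Fréchet subdifferential $\partial\phi(\mu)$ if
\[ \phi(\nu) - \phi(\mu) \ge \int_{\mathbb{R}^d} \langle\xi(x), t_\mu^\nu(x) - x\rangle\, d\mu(x) + o\big(W_2(\mu,\nu)\big). \tag{4.19} \]
When $\xi \in \partial\phi(\mu)$ also satisfies
\[ \phi(t_\#\mu) - \phi(\mu) \ge \int_{\mathbb{R}^d} \langle\xi(x), t(x) - x\rangle\, d\mu(x) + o\big(\|t - i\|_{L^2(\mu;\mathbb{R}^d)}\big), \tag{4.20} \]
then we will say that $\xi$ is a strong subdifferential.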
It is obvious that $\partial\phi(\mu)$ is a closed convex subset of $L^2(\mu;\mathbb{R}^d)$; in fact, we could also impose that it is contained in the tangent space $\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$, since the vector $\xi$ in (4.19) acts only on tangent vectors (see Theorem 2.24): for, if $\Pi$ denotes the orthogonal projection onto $\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$ in $L^2(\mu;\mathbb{R}^d)$,
\[ \xi \in \partial\phi(\mu) \implies \Pi\xi \in \partial\phi(\mu). \tag{4.21} \]
It is interesting to note that elements in $\partial\phi(\mu)\cap\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$ are in fact strong subdifferentials.
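Proposition 4.2 (Subdifferentials in $\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$ are strong) Let $\mu \in D(\phi)\cap\mathcal{P}_2^a(\mathbb{R}^d)$ and let $\xi \in \partial\phi(\mu)\cap\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$. Then $\xi$ is a strong subdifferential.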
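Proof. We argue by contradiction, and we assume that a constant $\delta > 0$ and a sequence $s_n \in L^2(\mu;\mathbb{R}^d)$ with $\varepsilon_n := \|s_n - i\|_{L^2(\mu;\mathbb{R}^d)} \to 0$ as $n \to \infty$ exist such that
\[ \phi(\mu_n) - \phi(\mu) - \int_{\mathbb{R}^d} \langle\xi, s_n - i\rangle\, d\mu \le -\delta\varepsilon_n, \qquad \mu_n := (s_n)_\#\mu. \tag{4.22} \]
Let us denote by $t_n$ the optimal transport pushing $\mu$ onto $\mu_n$: we know that
\[ \|t_n - i\|_{L^2(\mu;\mathbb{R}^d)} = W_2(\mu,\mu_n) \le \varepsilon_n \to 0. \tag{4.23} \]
By the definition of subdifferential, there exists $n_0 \in \mathbb{N}$ such that for every $n \ge n_0$
\[ \phi(\mu_n) - \phi(\mu) \ge \int_{\mathbb{R}^d} \langle\xi, t_n - i\rangle\, d\mu - \frac{\delta}{2}\varepsilon_n; \]
combining with (4.22) we obtain
\[ \int_{\mathbb{R}^d} \langle\xi, t_n - s_n\rangle\, d\mu \le -\frac{\delta}{2}\varepsilon_n \qquad \forall\, n \ge n_0. \tag{4.24} \]
Up to an extraction of a suitable subsequence, we can assume that
\[ \frac{s_n - i}{\varepsilon_n} \rightharpoonup \tilde s, \qquad \frac{t_n - i}{\varepsilon_n} \rightharpoonup \tilde t \qquad \text{weakly in } L^2(\mu;\mathbb{R}^d) \text{ as } n \to \infty; \tag{4.25} \]
by (4.24) we get
\[ \frac{t_n - s_n}{\varepsilon_n} \rightharpoonup \tilde t - \tilde s, \qquad \int_{\mathbb{R}^d} \langle\xi, \tilde t - \tilde s\rangle\, d\mu \le -\frac{\delta}{2} < 0. \tag{4.26} \]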
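On the other hand, for every test function $\zeta \in C_c^\infty(\mathbb{R}^d)$, the second order Taylor expansion
\[ \big|\zeta(y) - \zeta(x) - \langle D\zeta(x), y - x\rangle\big| \le C|y - x|^2 \qquad \forall\, x, y \in \mathbb{R}^d, \quad \text{for some constant } C \ge 0, \]
yields, since $(t_n)_\#\mu = (s_n)_\#\mu = \mu_n$,
\[
\begin{aligned}
0 = \int_{\mathbb{R}^d} \big(\zeta(t_n(x)) - \zeta(s_n(x))\big)\, d\mu(x)
&\le \int_{\mathbb{R}^d} \langle D\zeta(x), t_n(x) - s_n(x)\rangle\, d\mu(x) + C\int_{\mathbb{R}^d} \big(|s_n(x) - x|^2 + |t_n(x) - x|^2\big)\, d\mu(x) \\
&\le \int_{\mathbb{R}^d} \langle D\zeta(x), t_n(x) - s_n(x)\rangle\, d\mu(x) + 2C\varepsilon_n^2.
\end{aligned}
\]
Dividing by $\varepsilon_n$ and passing to the limit as $n \to \infty$ we get
\[ \int_{\mathbb{R}^d} \langle D\zeta, \tilde t - \tilde s\rangle\, d\mu \ge 0 \qquad \forall\, \zeta \in C_c^\infty(\mathbb{R}^d). \tag{4.27} \]
Applying (4.27) to both $\zeta$ and $-\zeta$ shows that $\tilde t - \tilde s$ is orthogonal to every gradient of a $C_c^\infty(\mathbb{R}^d)$ function. Since these gradients are dense in $\mathrm{Tan}_\mu\mathcal{P}_2(\mathbb{R}^d)$ and $\xi$ belongs to this space, (4.27) contradicts (4.26).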
The definition of the metric slope of $\phi$ is in fact common to functionals defined in arbitrary metric spaces.
Definition 4.3 (Metric slope) Let us consider a functional $\phi : \mathcal{P}_2(\mathbb{R}^d) \to (-\infty,+\infty]$ and a measure $\mu \in D(\phi)$. The metric slope of $\phi$ at $\mu$ is defined by
\[ |\partial\phi|(\mu) = \limsup_{\nu\to\mu} \frac{(\phi(\mu) - \phi(\nu))^+}{W_2(\nu,\mu)}, \tag{4.28} \]
or, equivalently, by
\[ |\partial\phi|(\mu) := \inf\Big\{ s \ge 0 : \phi(\nu) \ge \phi(\mu) - sW_2(\nu,\mu) + o\big(W_2(\nu,\mu)\big) \ \text{as } W_2(\nu,\mu) \to 0 \Big\}. \tag{4.29} \]
4.2 Subdifferential calculus in $\mathcal{P}_2^a(\mathbb{R}^d)$
We now try to reproduce in the Wasserstein framework the calculus properties for the subdifferential that we briefly discussed at the beginning of the present section.
In order to simplify some technical points, we suppose that
\[ \phi : \mathcal{P}_2(\mathbb{R}^d) \to (-\infty,+\infty] \text{ is proper and lower semicontinuous, with } D(|\partial\phi|) \subset \mathcal{P}_2^a(\mathbb{R}^d), \tag{4.30a} \]
and that for some $\tau_* > 0$ the functional
\[ \nu \mapsto \Phi(\tau,\mu;\nu) = \frac{1}{2\tau}W_2^2(\mu,\nu) + \phi(\nu) \text{ admits at least a minimum point } \mu_\tau, \text{ for all } \tau \in (0,\tau_*) \text{ and } \mu \in \mathcal{P}_2(X). \tag{4.30b} \]
Notice that $D(\phi) \subset \mathcal{P}_2^r(X)$ is a sufficient but not necessary condition for (4.30a): the internal energy functionals induced by a class of sublinear functions $F$ satisfy (4.30a), but have a domain strictly larger than $\mathcal{P}_2^a(\mathbb{R}^d)$ (see Theorem 10.4.8 of [8]).
A. Euler equation for quadratic perturbations. When we want to minimize the perturbed functional (4.30b) we get a result completely analogous to the Euclidean one:
Lemma 4.4 Let $\phi$ satisfy (4.30a,b). Each minimizer $\mu_\tau$ of (4.30b) belongs to $D(|\partial\phi|)$ and
\[ \frac{1}{\tau}\big(t_{\mu_\tau}^{\mu} - i\big) \in \partial\phi(\mu_\tau) \text{ is a strong subdifferential.} \tag{4.31} \]
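Proof. The minimality of $\mu_\tau$ gives for every $\nu \in \mathcal{P}_2(\mathbb{R}^d)$
\[
\begin{aligned}
\phi(\nu) - \phi(\mu_\tau) &= \Phi(\tau,\mu;\nu) - \Phi(\tau,\mu;\mu_\tau) + \frac{1}{2\tau}\big(W_2^2(\mu_\tau,\mu) - W_2^2(\nu,\mu)\big) \\
&\ge \frac{1}{2\tau}\big(W_2^2(\mu_\tau,\mu) - W_2^2(\nu,\mu)\big) && (4.32) \\
&\ge -\frac{1}{2\tau}W_2(\mu_\tau,\nu)\big(W_2(\mu_\tau,\mu) + W_2(\nu,\mu)\big). && (4.33)
\end{aligned}
\]
Letting $\nu$ converge to $\mu_\tau$, (4.33) yields
\[ |\partial\phi|(\mu_\tau) \le \frac{W_2(\mu_\tau,\mu)}{\tau}. \tag{4.34} \]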
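By (4.30a) we get $\mu_\tau \in \mathcal{P}_2^a(\mathbb{R}^d)$; if $\nu = t_\#\mu_\tau$ we have
\[ W_2^2(\mu_\tau,\mu) = \int_{\mathbb{R}^d} |t_{\mu_\tau}^{\mu}(x) - x|^2\, d\mu_\tau(x), \qquad W_2^2(\nu,\mu) \le \int_{\mathbb{R}^d} |t(x) - t_{\mu_\tau}^{\mu}(x)|^2\, d\mu_\tau(x), \]
and therefore the elementary identity $\frac12|a|^2 - \frac12|b|^2 = \langle a, a - b\rangle - \frac12|a - b|^2$ and (4.32) yield
\[
\begin{aligned}
\phi(\nu) - \phi(\mu_\tau) &\ge \frac{1}{2\tau}\int_{\mathbb{R}^d} \Big(|t_{\mu_\tau}^{\mu}(x) - x|^2 - |t_{\mu_\tau}^{\mu}(x) - t(x)|^2\Big)\, d\mu_\tau(x) \\
&= \int_{\mathbb{R}^d} \Big(\frac{1}{\tau}\langle t_{\mu_\tau}^{\mu}(x) - x, t(x) - x\rangle - \frac{1}{2\tau}|t(x) - x|^2\Big)\, d\mu_\tau(x) \\
&= \frac{1}{\tau}\int_{\mathbb{R}^d} \langle t_{\mu_\tau}^{\mu}(x) - x, t(x) - x\rangle\, d\mu_\tau(x) - \frac{1}{2\tau}\|t - i\|^2_{L^2(\mu_\tau;\mathbb{R}^d)}.
\end{aligned}
\]
We deduce $\frac{1}{\tau}\big(t_{\mu_\tau}^{\mu} - i\big) \in \partial\phi(\mu_\tau)$ and the strong subdifferentiability condition.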
The above result, though simple, is very useful and usually provides the first
crucial information when one looks for the properties of solutions of the variational problem (4.30b). The nice argument which combines the minimality of
µτ and the possibility to use any “test” transport map t to estimate W22 (t# ν, µ)
was originally introduced by F. Otto.
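A toy illustration of the minimization step (4.30b) and of the vector in (4.31) is sketched below for the potential energy $\phi(\mu) = \int V\, d\mu$ with $V(x) = x^2/2$ on the real line. It relies on assumptions that go beyond the lemma: the measures are atomic (so outside $\mathcal{P}_2^a$), and the minimizer is computed by moving each atom independently, which is legitimate for this particular functional because the cost splits pointwise. The comparison with $V'$ is meant only as a plausibility check.

```python
def jko_step_quadratic_potential(points, tau):
    """One minimising step for phi(mu) = int V dmu with V(x) = x^2/2,
    starting from the uniform measure on `points` (one-dimensional).

    Each atom x moves to the minimiser of y -> (x - y)^2 / (2 tau) + V(y),
    which is y = x / (1 + tau).
    """
    return [x / (1.0 + tau) for x in points]


def grad_V(y):
    """Derivative of the assumed potential V(y) = y^2/2."""
    return y


tau = 0.25
mu = [-2.0, -0.5, 0.1, 1.0, 3.0]
mu_tau = jko_step_quadratic_potential(mu, tau)

# (t - i)/tau at the atoms of mu_tau, where t maps each atom back to mu.
xi = [(x - y) / tau for x, y in zip(mu, mu_tau)]
residual = max(abs(a - grad_V(y)) for a, y in zip(xi, mu_tau))
```

In this example the vector $\frac{1}{\tau}(t - i)$ coincides with $V'$ at every atom of $\mu_\tau$, which is the discrete counterpart of the Euler equation for the potential energy.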
4.3 The case of λ-convex functionals along geodesics
Let us now focus our attention on the case of a $\lambda$-convex functional:
\[ \phi \text{ is } \lambda\text{-convex on geodesics, according to Definition 3.1.} \tag{4.35} \]
B. Characterization by variational inequalities and monotonicity. Suppose that $\phi$ satisfies (4.30a,b) and (4.35). Then a vector $\xi \in L^2(\mu;X)$ belongs to the Fréchet subdifferential of $\phi$ at $\mu$ iff
\[ \phi(\nu) - \phi(\mu) \ge \int_{\mathbb{R}^d} \langle\xi(x), t_\mu^\nu(x) - x\rangle\, d\mu(x) + \frac{\lambda}{2}W_2^2(\mu,\nu) \qquad \forall\, \nu \in D(\phi). \tag{4.36} \]
In particular if $\xi_i \in \partial\phi(\mu_i)$, $i = 1, 2$, and $t = t_{\mu_1}^{\mu_2}$ is the optimal transport map, then
\[ \int_{\mathbb{R}^d} \langle\xi_2(t(x)) - \xi_1(x), t(x) - x\rangle\, d\mu_1(x) \ge \lambda W_2^2(\mu_1,\mu_2). \tag{4.37} \]
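Concerning the slope of $\phi$ we have for every $s \ge 0$
\[ s \ge |\partial\phi|(\mu) \iff \phi(\nu) \ge \phi(\mu) - sW_2(\nu,\mu) + \frac{\lambda}{2}W_2^2(\nu,\mu) \qquad \forall\, \nu \in D(\phi), \tag{4.38} \]
or, equivalently,
\[ |\partial\phi|(\mu) = \sup_{\nu\ne\mu}\Big(\frac{\phi(\mu) - \phi(\nu)}{W_2(\nu,\mu)} + \frac{\lambda}{2}W_2(\nu,\mu)\Big)^{+}. \tag{4.39} \]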
Proof. One implication of (4.36) and of (4.38) is trivial. To prove the other one, in the case of (4.36) suppose that $\xi \in \partial\phi(\mu)$ and $\nu \in D(\phi)$; for $t \in [0,1]$ we set $\mu_t := (i + t(t_\mu^\nu - i))_\#\mu$ and we recall that the $\lambda$-convexity yields
\[ \frac{\phi(\mu_t) - \phi(\mu)}{t} \le \phi(\nu) - \phi(\mu) - \frac{\lambda}{2}(1-t)W_2^2(\mu,\nu). \tag{4.40} \]
On the other hand, since $W_2(\mu,\mu_t) = tW_2(\mu,\nu)$, Fréchet differentiability yields
\[ \liminf_{t\downarrow0} \frac{\phi(\mu_t) - \phi(\mu)}{t} \ge \liminf_{t\to0^+} \frac{1}{t}\int_{\mathbb{R}^d} \langle\xi(x), t_\mu^{\mu_t}(x) - x\rangle\, d\mu(x) \ge \int_{\mathbb{R}^d} \langle\xi(x), t_\mu^\nu(x) - x\rangle\, d\mu(x), \]
since $t_\mu^{\mu_t}(x) = x + t(t_\mu^\nu(x) - x)$.
In the case of the slope, (4.40) and the fact that
\[ \liminf_{t\downarrow0} \frac{\phi(\mu_t) - \phi(\mu)}{t} \ge -|\partial\phi|(\mu)\,W_2(\mu,\nu) \tag{4.41} \]
yield (4.38).
C. Convexity and strong-weak closure. The next step is to show the closure of the graph of $\partial\phi$: here one has to be careful about the meaning of the convergence of vectors $\xi_n \in L^2(\mu_n;\mathbb{R}^d)$, which belong to different $L^2$-spaces, and we will adopt the following one.
Definition 4.5 Let $(\mu_n) \subset \mathcal{P}(\mathbb{R}^d)$ be narrowly converging to $\mu$ in $\mathcal{P}(\mathbb{R}^d)$ and let $v_n \in L^1(\mu_n;\mathbb{R}^d)$. We say that $v_n$ weakly converge to $v \in L^1(\mu;\mathbb{R}^d)$ if
\[ \lim_{n\to\infty}\int_{\mathbb{R}^d} \zeta(x)v_n(x)\, d\mu_n(x) = \int_{\mathbb{R}^d} \zeta(x)v(x)\, d\mu(x) \qquad \forall\, \zeta \in C_c^\infty(\mathbb{R}^d). \tag{4.42} \]
We say that $v_n$ strongly converge to $v$ in $L^2$ if (4.42) holds and
\[ \limsup_{n\to\infty} \|v_n\|_{L^2(\mu_n;\mathbb{R}^d)} \le \|v\|_{L^2(\mu;\mathbb{R}^d)}. \tag{4.43} \]
We state without proof (see Theorem 5.4.4 of [8] for it) some basic properties of this convergence.
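Theorem 4.6 Let $(\mu_n) \subset \mathcal{P}_2(\mathbb{R}^d)$ be narrowly converging to $\mu$ in $\mathcal{P}(\mathbb{R}^d)$ and let $v_n \in L^2(\mu_n;\mathbb{R}^d)$ be such that
\[ \sup_{n\in\mathbb{N}}\int_{\mathbb{R}^d} \big(|x|^2 + |v_n(x)|^2\big)\, d\mu_n(x) < +\infty. \tag{4.44} \]
Then the following statements hold:
(i) The sequence $(v_n)$ has weak limit points as $n \to \infty$. If $v$ is any limit point, along some subsequence $n(k)$, then
\[ \lim_{k\to\infty}\int_{\mathbb{R}^d} \langle v_{n(k)}, \varphi\rangle\, d\mu_{n(k)}(x) = \int_{\mathbb{R}^d} \langle v(x), \varphi\rangle\, d\mu(x), \tag{4.45} \]
for every continuous function $\varphi : \mathbb{R}^d \to \mathbb{R}^d$ with $p$-growth for some $p < 2$.
(ii) If $v_n$ strongly converge to $v$ in $L^2$ and $W_2(\mu_n,\mu) \to 0$, then
\[ \lim_{n\to\infty}\int_{\mathbb{R}^d} f(x, v_n(x))\, d\mu_n(x) = \int_{\mathbb{R}^d} f(x, v(x))\, d\mu(x) \tag{4.46} \]
holds for every continuous function $f : \mathbb{R}^d\times\mathbb{R}^d \to \mathbb{R}$ with quadratic growth.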
Lemma 4.7 (Closure of the subdifferential) Let $\phi$ be a $\lambda$-convex functional satisfying (4.30a), let $(\mu_n)$ be converging to $\mu \in D(\phi)$ in $\mathcal{P}_2(\mathbb{R}^d)$, let $\xi_n \in \partial\phi(\mu_n)$ be satisfying
\[ \sup_n \int_{\mathbb{R}^d} |\xi_n(x)|^2\, d\mu_n(x) < +\infty, \tag{4.47} \]
and converging to $\xi$ according to Definition 4.5. Then $\xi \in \partial\phi(\mu)$.
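Proof. Let $\nu \in D(\phi)$ and let $C$ be the constant in (4.47). We have to pass to the limit as $n \to \infty$ in the subdifferential inequality
\[ \phi(\nu) - \phi(\mu_n) \ge \int_{\mathbb{R}^d} \langle\xi_n(x), t_{\mu_n}^\nu(x) - x\rangle\, d\mu_n(x) + \frac{\lambda}{2}W_2^2(\mu_n,\nu). \tag{4.48} \]
By the lower semicontinuity of $\phi$ the upper limit of $\phi(\nu) - \phi(\mu_n)$ is less than $\phi(\nu) - \phi(\mu)$. Passing to the right hand side, given $\varepsilon > 0$ we choose $\bar t \in C_b^0(\mathbb{R}^d;\mathbb{R}^d)$ such that $\|t_\mu^\nu - \bar t\|_{L^2(\mu;\mathbb{R}^d)} < \varepsilon^2$ and split the integrals as
\[ \int_{\mathbb{R}^d} \langle\xi_n(x), t_{\mu_n}^\nu(x) - \bar t(x)\rangle\, d\mu_n(x) + \int_{\mathbb{R}^d} \langle\xi_n(x), \bar t(x) - x\rangle\, d\mu_n(x). \tag{4.49} \]
By the Young inequality, the first integrals can be estimated with
\[ \frac{C\varepsilon}{2} + \frac{1}{2\varepsilon}\limsup_{n\to\infty}\int_{\mathbb{R}^d} |t_{\mu_n}^\nu - \bar t|^2\, d\mu_n = \frac{C\varepsilon}{2} + \frac{1}{2\varepsilon}\limsup_{n\to\infty}\int_{\mathbb{R}^d\times\mathbb{R}^d} |y - \bar t(x)|^2\, d\gamma_n, \]
where $\gamma_n = (i\times t_{\mu_n}^\nu)_\#\mu_n$ are the optimal plans induced by $t_{\mu_n}^\nu$. [Margin note: it seems that a control on the moments of the $\mu_n$ is also needed.] Now, by Proposition 7.1.3 of [8] (showing that optimal plans are stable under narrow convergence), we know that $\gamma_n$ narrowly converge to the plan $\gamma = (i\times t_\mu^\nu)_\#\mu$ induced by $t_\mu^\nu$; moreover, as $|y|^2$ is uniformly integrable with respect to $\{\gamma_n\}$ (because the second marginal of $\gamma_n$ is constant), Lemma 1.2 gives that the upper limits above are less than
\[ \int_{\mathbb{R}^d\times\mathbb{R}^d} |y - \bar t(x)|^2\, d\gamma = \int_{\mathbb{R}^d} |t_\mu^\nu - \bar t|^2\, d\mu \le \varepsilon^2. \]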
Summing up, we proved that the limsup of the first integrals in (4.49) is less than $(C+1)\varepsilon/2$. The convergence of the second integrals in (4.49) to
\[ \int_{\mathbb{R}^d} \langle\xi(x), \bar t(x) - x\rangle\, d\mu(x) \]
follows directly from Theorem 4.6(i). As a consequence
\[ \liminf_{n\to\infty}\int_{\mathbb{R}^d} \langle\xi_n(x), t_{\mu_n}^\nu(x) - x\rangle\, d\mu_n(x) \ge \int_{\mathbb{R}^d} \langle\xi(x), t_\mu^\nu(x) - x\rangle\, d\mu(x) - (C+1)\frac{\varepsilon}{2} - \int_{\mathbb{R}^d} |\xi(x)|\cdot|\bar t(x) - t_\mu^\nu(x)|\, d\mu(x). \]
As $\varepsilon$ is arbitrary, the variational inequality (4.48) passes to the limit.
The lower semicontinuity of the slope
\[ \mu_n \to \mu \text{ in } \mathcal{P}_2(\mathbb{R}^d) \implies \liminf_{n\to\infty} |\partial\phi|(\mu_n) \ge |\partial\phi|(\mu) \tag{4.50} \]
is an immediate consequence of (4.38) and of the lower semicontinuity of $\phi$.
4.4 Regular functionals
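Definition 4.8 A functional $\phi : \mathcal{P}_2(\mathbb{R}^d) \to (-\infty,+\infty]$ satisfying (4.30a) is regular if, whenever the strong subdifferentials $\xi_n \in \partial\phi(\mu_n)$, $\varphi_n = \phi(\mu_n)$ satisfy
\[ \begin{cases} \mu_n \to \mu \text{ in } \mathcal{P}_2(\mathbb{R}^d), \quad \varphi_n \to \varphi, \quad \sup_n \|\xi_n\|_{L^2(\mu_n;\mathbb{R}^d)} < +\infty, \\ \xi_n \to \xi \text{ weakly, according to Definition 4.5,} \end{cases} \tag{4.51} \]
then $\xi \in \partial\phi(\mu)$ and $\varphi = \phi(\mu)$.
We just proved that $\lambda$-convex functionals are indeed regular.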
D. Minimal selection and slope.
Lemma 4.9 Let $\phi$ be a regular functional satisfying (4.30a,b). Then $\mu\in D(|\partial\phi|)$ if and only if $\partial\phi(\mu)$ is not empty and
\[
|\partial\phi|(\mu)=\min\bigl\{\|\boldsymbol\xi\|_{L^2(\mu;\mathbb R^d)}:\ \boldsymbol\xi\in\partial\phi(\mu)\bigr\}, \tag{4.52}
\]
where the metric slope $|\partial\phi|(\mu)$ is defined in (4.3). By the convexity of $\partial\phi(\mu)$ there exists a unique vector $\boldsymbol\xi\in\partial\phi(\mu)$ which attains the minimum in (4.52): we will denote it by $\partial^\circ\phi(\mu)$; it belongs to $\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d)$ and it is also a strong subdifferential.
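Proof. It is clear from the very definition of Fréchet subdifferential that
\[
|\partial\phi|(\mu)\le\|\boldsymbol\xi\|_{L^2(\mu;\mathbb R^d)}\qquad\forall\,\boldsymbol\xi\in\partial\phi(\mu);
\]
thus we have to prove that if $|\partial\phi|(\mu)<+\infty$ there exists $\boldsymbol\xi\in\partial\phi(\mu)$ such that $\|\boldsymbol\xi\|_{L^2(\mu;\mathbb R^d)}\le|\partial\phi|(\mu)$. We argue by approximation: for $\mu\in D(|\partial\phi|)$ and $\tau\in(0,\tau_*)$, let $\mu_\tau$ be a minimizer of (4.30b); by Lemma 4.4 we know that
\[
\boldsymbol\xi_\tau=\frac1\tau\bigl(\mathbf t_{\mu_\tau}^{\mu}-\mathbf i\bigr)\in\partial\phi(\mu_\tau),\qquad
\int_{\mathbb R^d}|\boldsymbol\xi_\tau(x)|^2\,d\mu_\tau(x)=\frac{W_2^2(\mu,\mu_\tau)}{\tau^2},
\]
and $\boldsymbol\xi_\tau$ is a strong subdifferential. Furthermore, it is proved in Lemma 3.1.5 of [8] (in a general metric space setting) that there exists a sequence $(\tau_n)\downarrow0$ such that
\[
\lim_{n\to\infty}\frac{W_2^2(\mu_{\tau_n},\mu)}{\tau_n^2}=|\partial\phi|^2(\mu). \tag{4.53}
\]
By Theorem 4.6(i) we know that $\boldsymbol\xi_{\tau_n}$ has some limit point $\boldsymbol\xi\in L^2(\mu;\mathbb R^d)$ as $n\to\infty$, according to Definition 4.5. By the regularity property (4.51) we get $\boldsymbol\xi\in\partial\phi(\mu)$ with $\|\boldsymbol\xi\|_{L^2(\mu;\mathbb R^d)}\le|\partial\phi|(\mu)$, so that $\boldsymbol\xi$ is the (unique) element of minimal norm in $\partial\phi(\mu)$.
By (4.21) we also deduce that $\boldsymbol\xi\in\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d)$ and Proposition 4.2 shows that $\boldsymbol\xi$ is a strong subdifferential.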
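Remark 4.10 (The $\lambda$-convex case) When $\phi$ satisfies the $\lambda$-convexity assumption (4.35), the proof of Property (4.53) is considerably easier, since $\mu_\tau$ satisfies the a priori bound [8, Thm. 3.1.6]
\[
(1+\lambda\tau)\frac{W_2(\mu_\tau,\mu)}{\tau}\le|\partial\phi|(\mu). \tag{4.54}
\]
Indeed, we choose $\mu_t:=(\mathbf i+t(\mathbf t_\mu^{\mu_\tau}-\mathbf i))_\#\mu$ and we recall that $\lambda$-convexity of $\phi$ yields
\[
\begin{aligned}
\frac{1}{2\tau}W_2^2(\mu,\mu_\tau)+\phi(\mu_\tau)
&\le\frac{1}{2\tau}W_2^2(\mu,\mu_t)+\phi(\mu_t)\\
&\le\frac{t}{2\tau}\bigl(t-\lambda\tau(1-t)\bigr)W_2^2(\mu,\mu_\tau)+(1-t)\phi(\mu)+t\phi(\mu_\tau).
\end{aligned}
\]
Since the right hand quadratic function has a minimum for $t=1$, taking the left derivative we obtain
\[
\Bigl(\frac\lambda2+\frac1\tau\Bigr)W_2^2(\mu,\mu_\tau)+\phi(\mu_\tau)-\phi(\mu)\le0,
\]
and therefore, by (4.38),
\[
\begin{aligned}
\frac12(1+\lambda\tau)\frac{W_2^2(\mu,\mu_\tau)}{\tau^2}
&\le\frac{\phi(\mu)-\phi(\mu_\tau)}{\tau}-\frac{W_2^2(\mu,\mu_\tau)}{2\tau^2}\\
&\le|\partial\phi|(\mu)\frac{W_2(\mu_\tau,\mu)}{\tau}-(1+\lambda\tau)\frac{W_2^2(\mu_\tau,\mu)}{2\tau^2}\\
&\le\frac{1}{2(1+\lambda\tau)}|\partial\phi|^2(\mu),
\end{aligned}
\]
which yields (4.54).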
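The bound (4.54) can be checked by hand in a very simple situation. The following minimal sketch is not taken from the text: it assumes the potential energy with $V(x)=\lambda x^2/2$ on the real line and an empirical measure $\mu$ with equal-weight atoms. For a potential energy the minimizer $\mu_\tau$ is obtained by moving each atom $x$ to the minimizer of $y\mapsto|x-y|^2/(2\tau)+V(y)$, here $x/(1+\lambda\tau)$, and we take the slope to be $\|V'\|_{L^2(\mu)}$, an assumption consistent with Proposition 4.12 and Lemma 4.9. All function names are illustrative.

```python
import math

def proximal_point(x, lam, tau):
    # Minimizer of y -> (x - y)**2 / (2*tau) + lam * y**2 / 2, valid when 1 + lam*tau > 0.
    return x / (1.0 + lam * tau)

def a_priori_bound_sides(atoms, lam, tau):
    """Return ((1 + lam*tau) * W2(mu_tau, mu) / tau, slope) for equal-weight atoms."""
    n = len(atoms)
    moved = [proximal_point(x, lam, tau) for x in atoms]
    # x -> x / (1 + lam*tau) is increasing, so on the line it is an optimal map
    # and W2 is the L2(mu) norm of the displacement.
    w2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(atoms, moved)) / n)
    # Assumed slope of the potential energy: the L2(mu) norm of V'(x) = lam * x.
    slope = math.sqrt(sum((lam * x) ** 2 for x in atoms) / n)
    return (1.0 + lam * tau) * w2 / tau, slope

left, slope = a_priori_bound_sides([-1.0, 0.5, 2.0], lam=3.0, tau=0.1)
print(left, slope)
```

In this quadratic example the two sides coincide up to rounding, so (4.54) holds with equality; for a general $\lambda$-convex potential only the inequality is expected.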
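E. Chain rule. Let $\phi:\mathcal P_2(\mathbb R^d)\to(-\infty,+\infty]$ be a regular functional satisfying (4.30a,b), and let $\mu:(a,b)\ni t\mapsto\mu_t\in D(\phi)\subset\mathcal P_2(\mathbb R^d)$ be an absolutely continuous curve with tangent velocity vector $\mathbf v_t$. Let $\Lambda\subset(a,b)$ be the set of points $t\in(a,b)$ such that
(a) $|\partial\phi|(\mu_t)<+\infty$;
(b) $\phi\circ\mu$ is differentiable at $t$;
(c) condition (2.57) of Proposition 2.22 holds.
Then
\[
\frac{d}{dt}\phi(\mu_t)=\int_{\mathbb R^d}\langle\boldsymbol\xi_t(x),\mathbf v_t(x)\rangle\,d\mu_t(x)\qquad\forall\,\boldsymbol\xi_t\in\partial\phi(\mu_t),\quad\forall\,t\in\Lambda. \tag{4.55}
\]
Moreover, if $\phi$ is $\lambda$-convex (4.35) and
\[
\int_a^b|\partial\phi|(\mu_t)\,|\mu'|(t)\,dt<+\infty, \tag{4.56}
\]
then the map $t\mapsto\phi(\mu_t)$ is absolutely continuous, and $(a,b)\setminus\Lambda$ is $\mathcal L^1$-negligible.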
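Proof. Let $\bar t\in\Lambda$; observing that
\[
\mathbf v_h:=\frac1h\bigl(\mathbf t_{\mu_{\bar t}}^{\mu_{\bar t+h}}-\mathbf i\bigr)\to\mathbf v_{\bar t}\quad\text{in }L^2(\mu_{\bar t};\mathbb R^d), \tag{4.57}
\]
we have
\[
\phi(\mu_{\bar t+h})-\phi(\mu_{\bar t})\ge h\int_{\mathbb R^d}\langle\mathbf v_h(x),\boldsymbol\xi_{\bar t}(x)\rangle\,d\mu_{\bar t}(x)+o(h). \tag{4.58}
\]
Dividing by $h$ and taking the right and left limits as $h\to0$ we obtain that the right and left derivatives $d/dt_\pm\,\phi(\mu_t)$ satisfy
\[
\frac{d}{dt_+}\phi(\mu_t)\Big|_{t=\bar t}\ge\int_{\mathbb R^d}\langle\mathbf v_{\bar t}(x),\boldsymbol\xi_{\bar t}(x)\rangle\,d\mu_{\bar t}(x),\qquad
\frac{d}{dt_-}\phi(\mu_t)\Big|_{t=\bar t}\le\int_{\mathbb R^d}\langle\mathbf v_{\bar t}(x),\boldsymbol\xi_{\bar t}(x)\rangle\,d\mu_{\bar t}(x),
\]
and therefore we find (4.55).
In the $\lambda$-convex case, it can be shown (see Corollary 2.4.10 in [8]) that (4.56) implies that $t\mapsto\phi(\mu_t)$ is absolutely continuous in $(a,b)$ and thus conditions (a,b,c) hold $\mathcal L^1$-a.e. in $(a,b)$.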
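As an illustration of the chain rule (4.55), the sketch below compares a finite-difference derivative of $t\mapsto\phi(\mu_t)$ with the pairing $\int\langle\boldsymbol\xi_t,\mathbf v_t\rangle\,d\mu_t$. The setting is an assumption made only for this example: a potential energy with the smooth convex potential $V(x)=x^4/4+x^2/2$ on the line, an empirical measure with three atoms, and the dilation curve $\mu_t=((1+t)\mathbf i)_\#\mu$, whose velocity field $\mathbf v_t(y)=y/(1+t)$ is a gradient. We take $\boldsymbol\xi_t=V'$, in the spirit of Proposition 4.12.

```python
def V(x):
    return 0.25 * x ** 4 + 0.5 * x ** 2

def dV(x):
    return x ** 3 + x

atoms = [-1.0, 0.3, 1.5]          # equal-weight atoms of the reference measure mu

def phi_along_curve(t):
    # phi(mu_t) for the dilation curve mu_t = ((1 + t) i)_# mu.
    return sum(V((1.0 + t) * x) for x in atoms) / len(atoms)

def chain_rule_rhs(t):
    # Integral of <xi_t, v_t> d mu_t with xi_t = V' and v_t(y) = y / (1 + t),
    # evaluated at the atoms y = (1 + t) x of mu_t.
    total = 0.0
    for x in atoms:
        y = (1.0 + t) * x
        total += dV(y) * (y / (1.0 + t))
    return total / len(atoms)

t0, h = 0.2, 1e-5
finite_difference = (phi_along_curve(t0 + h) - phi_along_curve(t0 - h)) / (2.0 * h)
print(finite_difference, chain_rule_rhs(t0))
```

The two printed numbers agree up to the discretization error of the central difference.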
4.5 Examples of subdifferentials
In this section we consider in detail the subdifferential of the convex functionals presented in Section 3.2 (potential energy, interaction energy, internal energy, negative Wasserstein distance), with particular attention to the characterization of the elements with minimal norm.
We start by considering a general, but smooth, situation.
4.5.1 Variational integrals: the smooth case
In order to clarify the underlying structure of many examples and the link between the notion of Wasserstein subdifferential and the standard variational calculus for integral functionals, we first consider the case of a variational integral of the type
\[
\mathcal F(\mu):=
\begin{cases}
\displaystyle\int_{\mathbb R^d}F(x,\rho(x),\nabla\rho(x))\,dx&\text{if }\mu=\rho\cdot\mathcal L^d\text{ with }\rho\in C^1(\mathbb R^d),\\
+\infty&\text{otherwise.}
\end{cases} \tag{4.59}
\]
Since we are not claiming any generality and we are only interested in the form of the subdifferential, we will assume enough regularity to justify all the computations; therefore, we suppose that $F:\mathbb R^d\times[0,+\infty)\times\mathbb R^d\to[0,+\infty)$ is a $C^2$ function with $F(x,0,p)=0$ for every $x,p\in\mathbb R^d$ and we consider the case of a smooth and strictly positive density $\rho$: as usual, we denote by $(x,z,p)\in\mathbb R^d\times\mathbb R\times\mathbb R^d$ the variables of $F$ and by $\delta F/\delta\rho$ the first variation density
\[
\frac{\delta F}{\delta\rho}(x):=F_z(x,\rho(x),\nabla\rho(x))-\nabla\cdot F_p(x,\rho(x),\nabla\rho(x)). \tag{4.60}
\]
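Lemma 4.11 If $\mu=\rho\cdot\mathcal L^d\in\mathcal P_2^a(\mathbb R^d)$ with $\rho\in C^1(\mathbb R^d)$ satisfies $\mathcal F(\mu)<+\infty$ and $\mathbf w\in L^2(\mu;\mathbb R^d)$ belongs to the strong subdifferential of $\mathcal F$ at $\mu$ (in particular, by Proposition 4.2, if $\mathbf w\in\partial\mathcal F(\mu)\cap\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d)$), then
\[
\mathbf w(x)=\nabla\frac{\delta F}{\delta\rho}(x)\qquad\text{for }\mu\text{-a.e. }x\in\mathbb R^d, \tag{4.61}
\]
and for every vector field $\boldsymbol\xi\in C_c^\infty(\mathbb R^d;\mathbb R^d)$ we have
\[
\int_{\mathbb R^d}\mathbf w(x)\cdot\boldsymbol\xi(x)\,d\mu(x)=-\int_{\mathbb R^d}\frac{\delta F}{\delta\rho}(x)\,\nabla\cdot\bigl(\rho(x)\boldsymbol\xi(x)\bigr)\,dx. \tag{4.62}
\]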
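Proof. We take a smooth vector field $\boldsymbol\xi\in C_c^\infty(\mathbb R^d;\mathbb R^d)$ and we set, for $\varepsilon\in\mathbb R$ sufficiently small, $\mu_\varepsilon:=(\mathbf i+\varepsilon\boldsymbol\xi)_\#\mu$. If $\mathbf w$ is a strong subdifferential, we know that
\[
\limsup_{\varepsilon\uparrow0}\frac{\mathcal F(\mu_\varepsilon)-\mathcal F(\mu)}{\varepsilon}
\le\int_{\mathbb R^d}\mathbf w(x)\cdot\boldsymbol\xi(x)\,d\mu(x)
\le\liminf_{\varepsilon\downarrow0}\frac{\mathcal F(\mu_\varepsilon)-\mathcal F(\mu)}{\varepsilon}; \tag{4.63}
\]
on the other hand, by the change of variables formula we know that $\mu_\varepsilon=\rho_\varepsilon\mathcal L^d$ with
\[
\rho_\varepsilon(y)=\frac{\rho}{\det(I+\varepsilon\nabla\boldsymbol\xi)}\circ(\mathbf i+\varepsilon\boldsymbol\xi)^{-1}(y)\qquad\forall\,y\in\mathbb R^d. \tag{4.64}
\]
The map $(x,\varepsilon)\mapsto\rho_\varepsilon(x)$ is of class $C^2$ with $\rho_\varepsilon(x)=\rho(x)$ outside a compact set and
\[
\rho_\varepsilon(x)\big|_{\varepsilon=0}=\rho(x),\qquad
\frac{\partial\rho_\varepsilon(x)}{\partial\varepsilon}\Big|_{\varepsilon=0}=-\nabla\cdot\bigl(\rho(x)\boldsymbol\xi(x)\bigr). \tag{4.65}
\]
Standard variational formulae (see e.g. [41, Vol. I, 1.2.1]) yield
\[
\lim_{\varepsilon\to0}\frac{\mathcal F(\mu_\varepsilon)-\mathcal F(\mu)}{\varepsilon}
=-\int_{\mathbb R^d}\frac{\delta F}{\delta\rho}(x)\,\nabla\cdot\bigl(\rho(x)\boldsymbol\xi(x)\bigr)\,dx, \tag{4.66}
\]
which shows (4.62).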
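Let us now suppose that $\mathbf w\in\partial\mathcal F(\mu)\cap\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d)$; then (4.66) holds whenever $\mathbf i+\varepsilon\boldsymbol\xi$ is an optimal transport map for $|\varepsilon|$ small enough, and in particular for gradient vector fields $\boldsymbol\xi=\nabla\zeta$ with $\zeta\in C_c^\infty(\mathbb R^d)$. Since $\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d)$ is the closure in $L^2(\mu;\mathbb R^d)$ of the space of such gradients, integrating by parts in (4.62) we have
\[
\int_{\mathbb R^d}\mathbf w(x)\cdot\boldsymbol\xi(x)\,d\mu(x)=\int_{\mathbb R^d}\nabla\frac{\delta F}{\delta\rho}(x)\cdot\boldsymbol\xi(x)\,d\mu(x)\qquad\forall\,\boldsymbol\xi\in\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d). \tag{4.67}
\]
We obtain (4.61) noticing that $\nabla\,\delta F/\delta\rho\in\mathrm{Tan}_\mu\mathcal P_2(\mathbb R^d)$, by the assumption that $\rho\in C_c^2(\mathbb R^d)$.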
4.5.2 The potential energy
Let $V:\mathbb R^d\to(-\infty,+\infty]$ be a proper, l.s.c. and $\lambda$-convex function and let $\mathcal V(\mu)=\int_{\mathbb R^d}V\,d\mu$ be defined on $\mathcal P_2(\mathbb R^d)$. We denote by $\operatorname{graph}\partial V$ the graph of the Fréchet subdifferential of $V$ in $\mathbb R^d\times\mathbb R^d$, i.e. the subset of the couples $(x_1,x_2)\in\mathbb R^d\times\mathbb R^d$ satisfying
\[
V(x_3)\ge V(x_1)+\langle x_2,x_3-x_1\rangle+\frac\lambda2|x_3-x_1|^2\qquad\forall\,x_3\in\mathbb R^d. \tag{4.68}
\]
As usual, $\partial^\circ V(x)$ denotes the element of minimal norm in $\partial V(x)$.
Notice that the potential energy functional (as well as the interaction energy functional) fails to satisfy (4.30a). For this reason it would be more appropriate to consider a more general notion of subdifferential, which involves plans, and not only maps, as elements of the subdifferential and which, at the same time, takes into account transport plans and not only transport maps (see §10.3 of [8]). In the present case, we choose an intermediate generalization, and say that $\boldsymbol\xi\in L^2(\mu;\mathbb R^d)$ belongs to the Fréchet subdifferential $\partial\mathcal V(\mu)$ at $\mu\in D(\mathcal V)$ if
\[
\mathcal V(\nu)-\mathcal V(\mu)\ge\inf_{\boldsymbol\gamma\in\Gamma_o(\mu,\nu)}\int_{\mathbb R^d\times\mathbb R^d}\langle\boldsymbol\xi(x),y-x\rangle\,d\boldsymbol\gamma(x,y)+o\bigl(W_2(\mu,\nu)\bigr). \tag{4.69}
\]
The following result is proved in Proposition 10.4.2 of [8].
Proposition 4.12 Let $\mu\in\mathcal P_2(\mathbb R^d)$ and $\boldsymbol\xi\in L^2(\mu;\mathbb R^d)$. Then
(i) $\boldsymbol\xi$ is a subdifferential of $\mathcal V$ at $\mu$ iff $\boldsymbol\xi(x)\in\partial V(x)$ for $\mu$-a.e. $x$.
(ii) $\partial^\circ\mathcal V(\mu)(x)=\partial^\circ V(x)$ for $\mu$-a.e. $x$.
4.5.3 The internal energy
Let $\mathcal F$ be the functional
\[
\mathcal F(\mu):=
\begin{cases}
\displaystyle\int_{\mathbb R^d}F(\rho(x))\,d\mathcal L^d(x)&\text{if }\mu=\rho\cdot\mathcal L^d\in\mathcal P_2^a(\mathbb R^d),\\
+\infty&\text{otherwise,}
\end{cases} \tag{4.70}
\]
for a convex differentiable function $F$ satisfying
\[
F(0)=0,\qquad\liminf_{s\downarrow0}\frac{F(s)}{s^\alpha}>-\infty\quad\text{for some }\alpha>\frac{d}{d+2}, \tag{4.71}
\]
as in Example 3.8. Recall that if $F$ has superlinear growth at infinity then the functional $\mathcal F$ is l.s.c. with respect to the narrow convergence (indeed, under this growth condition the lower semicontinuity can be checked w.r.t. the stronger weak $L^1$ convergence, by the Dunford-Pettis theorem, and lower semicontinuity w.r.t. weak $L^1$ convergence is a direct consequence of the convexity of $F$).
We confine our discussion to the case when $F$ has a more than linear growth at infinity, i.e.
\[
\lim_{z\to+\infty}\frac{F(z)}{z}=+\infty, \tag{4.72}
\]
see Theorem 10.4.6 and Theorem 10.4.8 of [8] for a discussion of the (sub)linear case.
We set $L_F(z)=zF'(z)-F(z):[0,+\infty)\to[0,+\infty)$ and we observe that $L_F$ is strictly related to the convex function
\[
G(z,s):=sF(z/s),\qquad z\in[0,+\infty),\ s\in(0,+\infty), \tag{4.73}
\]
since
\[
\frac{\partial}{\partial s}G(z,s)=-\frac zsF'(z/s)+F(z/s)=-L_F(z/s). \tag{4.74}
\]
In particular (recall that $F(0)=0$, by (4.71))
\[
G(z,s)\le F(z)\quad\text{for }s\ge1,\qquad\frac{F(z)-G(z,s)}{s-1}\uparrow L_F(z)\quad\text{as }s\downarrow1. \tag{4.75}
\]
We will also suppose that $F$ satisfies the condition
\[
\text{the map }s\mapsto s^dF(s^{-d})\ \text{is convex and non increasing in }(0,+\infty), \tag{4.76}
\]
yielding the geodesic convexity of $\mathcal F$.
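The relations (4.73), (4.74) and (4.76) can be tested numerically on a concrete density. The sketch below assumes the entropy density $F(s)=s\log s$ (a choice made here for illustration; it is the case considered later for the relative entropy), for which $L_F(z)=z$ and $s^dF(s^{-d})=-d\log s$. It compares a central difference of $G$ in $s$ with $-L_F(z/s)$ and checks monotonicity and convexity of $s\mapsto s^dF(s^{-d})$ on a uniform grid; grid, dimension and tolerances are arbitrary.

```python
import math

def entropy(s):
    # F(s) = s log s, extended by F(0) = 0.
    return s * math.log(s) if s > 0.0 else 0.0

def entropy_derivative(s):
    return math.log(s) + 1.0

def L_F(z):
    # L_F(z) = z F'(z) - F(z); for the entropy this is identically z.
    return z * entropy_derivative(z) - entropy(z)

def G(z, s):
    # G(z, s) = s F(z / s), as in (4.73).
    return s * entropy(z / s)

def rescaled(s, d):
    # The map s -> s**d * F(s**(-d)) appearing in (4.76).
    return s ** d * entropy(s ** (-d))

z, s, h = 2.0, 1.3, 1e-6
dG_ds = (G(z, s + h) - G(z, s - h)) / (2.0 * h)    # central difference in s

d = 3
grid = [0.5 + 0.25 * k for k in range(11)]          # s from 0.5 to 3.0
values = [rescaled(s_k, d) for s_k in grid]
nonincreasing = all(a >= b for a, b in zip(values, values[1:]))
# Nonnegative second differences on a uniform grid are a discrete convexity test.
convex = all(values[k - 1] - 2.0 * values[k] + values[k + 1] >= -1e-12
             for k in range(1, len(values) - 1))
print(dG_ds, -L_F(z / s), nonincreasing, convex)
```

A grid test of this kind can only falsify (4.76), not prove it; here the closed form $-d\log s$ confirms the outcome.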
The following lemma shows the existence of the directional derivative of F
along a suitable class of directions including all optimal transport maps.
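Lemma 4.13 (Directional derivative of $\mathcal F$) Suppose that $F:[0,+\infty)\to\mathbb R$ is a convex differentiable function satisfying (4.71), (4.72) and (4.76). Let $\mu=\rho\mathcal L^d\in D(\mathcal F)$, $\mathbf r\in L^2(\mu;\mathbb R^d)$ and $\bar t>0$ be such that
(i) $\mathbf r$ is differentiable $\rho\mathcal L^d$-a.e. and $\mathbf r_t:=(1-t)\mathbf i+t\mathbf r$ is $\rho\mathcal L^d$-injective with $|\det\nabla\mathbf r_t(x)|>0$ $\rho\mathcal L^d$-a.e., for any $t\in[0,\bar t]$;
(ii) $\nabla\mathbf r_{\bar t}$ is diagonalizable with positive eigenvalues;
(iii) $\mathcal F((\mathbf r_{\bar t})_\#\mu)<+\infty$.
Then the map $t\mapsto t^{-1}\bigl(\mathcal F((\mathbf r_t)_\#\mu)-\mathcal F(\mu)\bigr)$ is nondecreasing in $[0,\bar t]$ and
\[
+\infty>\lim_{t\downarrow0}\frac{\mathcal F((\mathbf r_t)_\#\mu)-\mathcal F(\mu)}{t}=-\int_{\mathbb R^d}L_F(\rho)\,\operatorname{tr}\tilde\nabla(\mathbf r-\mathbf i)\,dx. \tag{4.77}
\]
The identity above still holds when assumption (ii) on $\mathbf r$ is replaced by
(ii') $\|\tilde\nabla(\mathbf r-\mathbf i)\|_{L^\infty(\rho\mathcal L^d;\mathbb R^{d\times d})}<+\infty$ (in particular if $\mathbf r-\mathbf i\in C_c^\infty(\mathbb R^d;\mathbb R^d)$),
and $F$ satisfies in addition the "doubling" condition
\[
\exists\,C>0:\qquad F(z+w)\le C\bigl(1+F(z)+F(w)\bigr)\qquad\forall\,z,w. \tag{4.78}
\]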
Proof. By assumptions (i) and (ii), taking into account Lemma 2.2 we have
\[
\begin{aligned}
\mathcal F((\mathbf r_t)_\#\mu)-\mathcal F(\mu)
&=\int_{\mathbb R^d}F\Bigl(\frac{\rho(x)}{\det\nabla\mathbf r_t(x)}\Bigr)\det\nabla\mathbf r_t(x)\,dx-\int_{\mathbb R^d}F(\rho(x))\,dx\\
&=\int_{\mathbb R^d}\Bigl(G\bigl(\rho(x),\det\nabla\mathbf r_t(x)\bigr)-F(\rho(x))\Bigr)\,dx
\end{aligned}
\]
for any $t\in(0,\bar t]$. Assumption (4.76), together with the concavity of the map $t\mapsto[\det((1-t)I+t\nabla\mathbf r)]^{1/d}$, implies that the function
\[
\frac{G(\rho(x),\det\nabla\mathbf r_t)-F(\rho(x))}{t},\qquad t\in(0,\bar t], \tag{4.79}
\]
is nondecreasing w.r.t. $t$ and bounded above by an integrable function (take $t=\bar t$ and apply (iii)). Therefore the monotone convergence theorem gives
\[
\lim_{t\downarrow0}\frac{\mathcal F((\mathbf r_t)_\#\mu)-\mathcal F(\mu)}{t}=\int_{\mathbb R^d}\frac{d}{dt}G\bigl(\rho(x),\det\nabla\mathbf r_t(x)\bigr)\Big|_{t=0}\,dx
\]
and the expansion $\det\nabla\mathbf r_t=1+t\operatorname{tr}\nabla(\mathbf r-\mathbf i)+o(t)$ together with (4.74) give the result.
In the case when (ii') holds, the argument is analogous but, since condition (ii) fails, we cannot rely anymore on the monotonicity of the function in (4.79). However, using the inequalities
\[
F(w)-F(0)\le wF'(w)\le F(2w)-F(w)
\]
and the doubling condition we easily see that the derivative w.r.t. $s$ of the function $G(z,s)$ can be bounded by $C(1+F^+(z))$ for $|s-1|\le1/2$. Therefore we can use the dominated convergence theorem instead of the monotone convergence theorem to pass to the limit.
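The next technical lemma shows that we can "integrate by parts" in (4.77) preserving the inequality, if $L_F(\rho)$ is locally in $W^{1,1}$.
Lemma 4.14 (A "weak" integration by parts formula) Under the same assumptions of Lemma 4.13, let us suppose that
(i) $\operatorname{supp}\mu\subset\overline\Omega$, $\Omega$ being a convex open subset of $\mathbb R^d$ (not necessarily bounded);
(ii) $L_F(\rho)\in W^{1,1}_{\mathrm{loc}}(\Omega)$;
(iii) $\operatorname{supp}((\mathbf r_{\bar t})_\#\mu)$ is a compact subset of $\Omega$ for some $\bar t\in[0,1]$;
(iv) $\mathbf r\in BV_{\mathrm{loc}}(\mathbb R^d;\mathbb R^d)$ and $D\cdot\mathbf r\ge0$.
Then we can find an increasing family of nonnegative Lipschitz functions $\chi_k:\mathbb R^d\to[0,1]$ with compact support in $\Omega$ such that $\chi_k\uparrow\chi_\Omega$ and
\[
-\int_{\mathbb R^d}L_F(\rho(x))\,\operatorname{tr}\tilde\nabla(\mathbf r-\mathbf i)\,dx\ge\limsup_{k\to\infty}\int_{\mathbb R^d}\langle\nabla L_F(\rho),\mathbf r-\mathbf i\rangle\,\chi_k\,dx. \tag{4.80}
\]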
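Proof. Possibly replacing $\mathbf r$ by $\mathbf r_{\bar t}$, we can assume that $\bar t=1$ in (iii). Let us first recall that by the Calderón-Zygmund theorem (see for instance [7]) the pointwise divergence $\operatorname{tr}(\nabla\mathbf r)$ is the absolutely continuous part of the distributional divergence $D\cdot\mathbf r$; therefore we have
\[
\int_{\mathbb R^d}v\,\operatorname{tr}(\nabla\mathbf r)\,dx\le-\int_{\mathbb R^d}\langle\nabla v,\mathbf r\rangle\,dx, \tag{4.81}
\]
provided $v\in C_c^\infty(\mathbb R^d)$ is nonnegative. As $\mathbf r$ is bounded, by approximation the same inequality remains true for every nonnegative function $v\in W^{1,1}(\mathbb R^d)$. For every Lipschitz function $\eta:\mathbb R^d\to[0,1]$ with compact support in $\Omega$, choosing $v:=\eta L_F(\rho)\in W^{1,1}(\mathbb R^d)$ we get
\[
\int_{\mathbb R^d}\eta L_F(\rho)\,\operatorname{tr}(\nabla\mathbf r)\,dx\le-\int_{\mathbb R^d}\langle\nabla\bigl(\eta L_F(\rho)\bigr),\mathbf r\rangle\,dx. \tag{4.82}
\]
On the other hand, a standard integration by parts yields
\[
\int_\Omega\eta L_F(\rho)\,\operatorname{tr}(\nabla\mathbf i)\,dx=-\int_\Omega\langle\nabla\bigl(\eta L_F(\rho)\bigr),\mathbf i\rangle\,dx; \tag{4.83}
\]
summing up with (4.82) and inverting the sign we find
\[
-\int_{\mathbb R^d}\eta L_F(\rho)\,\operatorname{tr}(\nabla(\mathbf r-\mathbf i))\,dx\ge\int_{\mathbb R^d}\langle\nabla\bigl(\eta L_F(\rho)\bigr),\mathbf r-\mathbf i\rangle\,dx. \tag{4.84}
\]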
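Now we choose carefully the test function $\eta$. We consider an increasing family of bounded open convex sets $\Omega_k$ such that
\[
\Omega_k\subset\subset\Omega,\qquad\Omega=\bigcup_{k=1}^\infty\Omega_k,
\]
and for each convex set $\Omega_k$ we consider the function
\[
\chi_k(x):=k\,d(x,\mathbb R^d\setminus\Omega_k)\wedge1. \tag{4.85}
\]
$\chi_k$ is an increasing family of nonnegative Lipschitz functions which take their values in $[0,1]$ and satisfy $\chi_k(x)\equiv1$ if $d(x,\mathbb R^d\setminus\Omega_k)\ge\frac1k$; in particular, $\chi_k\equiv1$ in the compact set $K:=\operatorname{supp}(\mathbf r_\#\mu)$ of (iii) for $k$ sufficiently large. Moreover $\chi_k$ is concave in $\Omega_k$, since the distance function $d(\cdot,\mathbb R^d\setminus\Omega_k)$ is concave. Choosing $\eta:=\chi_k$ in (4.84) we get
\[
\begin{aligned}
-\int_{\mathbb R^d}\chi_kL_F(\rho)\,\operatorname{tr}(\nabla(\mathbf r-\mathbf i))\,dx
&\ge\int_{\mathbb R^d}\langle\nabla L_F(\rho),\mathbf r-\mathbf i\rangle\,\chi_k\,dx+\int_{\Omega_k}\langle\nabla\chi_k,\mathbf r-\mathbf i\rangle\,L_F(\rho)\,dx\\
&\ge\int_{\mathbb R^d}\langle\nabla L_F(\rho),\mathbf r-\mathbf i\rangle\,\chi_k\,dx
\end{aligned} \tag{4.86}
\]
since the integrand of the second integral in (4.86) is nonnegative: in fact, for $\mathcal L^d$-a.e. $x\in\Omega_k$ where $L_F(\rho(x))$ is strictly positive, the concavity of $\chi_k$ and $\mathbf r(x)\in K$ yield
\[
\langle\nabla\chi_k(x),\mathbf r(x)-\mathbf i(x)\rangle\ge\chi_k(\mathbf r(x))-\chi_k(x)=1-\chi_k(x)\ge0.
\]
Passing to the limit as $k\to\infty$ in the previous integral inequality, we obtain (4.80) (recall that the function in the left hand side of (4.80) is semiintegrable by (4.77)).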
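In the following theorem we characterize the minimal selection in the subdifferential of $\mathcal F$ and give, under the doubling condition, a formula for the slope of the functional.
Theorem 4.15 (Slope and subdifferential of $\mathcal F$) Suppose that $F:[0,+\infty)\to\mathbb R$ is a convex differentiable function satisfying (4.71), (4.72), (4.76) and (4.78). Assume that $\mathcal F$ has finite slope at $\mu=\rho\mathcal L^d\in\mathcal P_2^a(\mathbb R^d)$. Then $L_F(\rho)\in W^{1,1}(\mathbb R^d)$, $\nabla L_F(\rho)=\mathbf w\rho$ for some function $\mathbf w\in L^2(\rho\mathcal L^d;\mathbb R^d)$ and
\[
\Bigl(\int_{\mathbb R^d}|\mathbf w(x)|^2\rho(x)\,dx\Bigr)^{1/2}=|\partial\mathcal F|(\mu)<+\infty. \tag{4.87}
\]
Conversely, if $L_F(\rho)\in W^{1,1}_{\mathrm{loc}}(\mathbb R^d)$ and $\nabla L_F(\rho)=\mathbf w\rho$ for some $\mathbf w\in L^2(\mu;\mathbb R^d)$, then $\mathcal F$ has a finite slope at $\mu=\rho\mathcal L^d$ and $\mathbf w=\partial^\circ\mathcal F(\mu)$.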
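Proof. (a) We apply first (4.77) with $\mathbf r=0$ $\rho\mathcal L^d$-a.e. and $\mathbf r=2\mathbf i$ and take into account that
\[
W_2\bigl(\mu,((1-t)\mathbf i+t\mathbf r)_\#\mu\bigr)\le t\,\|\mathbf i\|_{L^2(\rho\mathcal L^d;\mathbb R^d)}
\]
to obtain
\[
d\int_{\mathbb R^d}L_F(\rho)\,dx\le|\partial\mathcal F|(\mu)\,\|\mathbf i\|_{L^2(\rho\mathcal L^d;\mathbb R^d)},
\]
so that $L_F(\rho)\in L^1(\mathbb R^d)$. Next, we apply (4.77) with $\mathbf r-\mathbf i$ equal to a $C_c^\infty(\mathbb R^d;\mathbb R^d)$ function $\mathbf t$ (notice that condition (i) holds with $\bar t<1/\sup|\nabla\mathbf t|$) and use again the inequality $W_2(\mu,((1-t)\mathbf i+t\mathbf r)_\#\mu)\le t\,\|\mathbf r-\mathbf i\|_{L^2(\rho\mathcal L^d)}$ to obtain
\[
\int_{\mathbb R^d}L_F(\rho)\,\operatorname{tr}(\nabla\mathbf t)\,dx\le|\partial\mathcal F|(\mu)\,\|\mathbf t\|_{L^2(\rho\mathcal L^d)}\le|\partial\mathcal F|(\mu)\,\sup_{\mathbb R^d}|\mathbf t|.
\]
As $\mathbf t$ is arbitrary, the Riesz theorem gives that $L_F(\rho)$ is a function of bounded variation (i.e. its distributional derivative $DL_F(\rho)$ is a finite $\mathbb R^d$-valued measure in $\mathbb R^d$), so that we can rewrite the inequality as
\[
\sum_{i=1}^d\int_{\mathbb R^d}t_i\,dD_iL_F(\rho)\le|\partial\mathcal F|(\mu)\,\|\mathbf t\|_{L^2(\rho\mathcal L^d;\mathbb R^d)}.
\]
By $L^2$ duality theory there exists $\mathbf w\in L^2(\rho\mathcal L^d;\mathbb R^d)$ with $\|\mathbf w\|_2\le|\partial\mathcal F|(\mu)$ such that
\[
\sum_{i=1}^d\int_{\mathbb R^d}t_i\,dD_iL_F(\rho)=\int_{\mathbb R^d}\langle\mathbf w,\mathbf t\rangle\,d\rho\mathcal L^d\qquad\forall\,\mathbf t\in C_c^\infty(\mathbb R^d;\mathbb R^d).
\]
Therefore $L_F(\rho)\in W^{1,1}(\mathbb R^d)$ and $\nabla L_F(\rho)=\mathbf w\rho$. This leads to the inequality $\le$ in (4.87).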
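In order to show that equality holds in (4.87) we will prove that $(\mathbf i\times\mathbf w)_\#\mu$ belongs to $\partial\mathcal F(\mu)$. We have to show that (4.36) holds for any $\nu\in D(\mathcal F)$. Using the doubling condition it is also easy to find a sequence of measures $\nu_h$ with compact support converging to $\nu$ in $\mathcal P_2(\mathbb R^d)$ and such that $\mathcal F(\nu_h)$ converges to $\mathcal F(\nu)$, hence we can also assume that $\operatorname{supp}\nu$ is compact.
As $\mathbf t_\mu^\nu$ is induced by the gradient of a Lipschitz and convex map $\varphi$, we know that all the conditions of Lemma 4.13 are fulfilled with $\mathbf r=\nabla\varphi$, and also Lemma 4.14 holds; therefore, by applying (4.77), the geodesic convexity of $\mathcal F$, and (4.80) we obtain
\[
\begin{aligned}
\mathcal F(\nu)-\mathcal F(\mu)
&\ge\limsup_{h\to\infty}\int_{\mathbb R^d}\langle\nabla L_F(\rho),\mathbf r-\mathbf i\rangle\,\chi_h\,dx\\
&=\limsup_{h\to\infty}\int_{\mathbb R^d}\langle\mathbf w,\mathbf r-\mathbf i\rangle\,\chi_h\,\rho\,dx=\int_{\mathbb R^d}\langle\mathbf w,\mathbf r-\mathbf i\rangle\,d\mu,
\end{aligned}
\]
proving that $\mathbf w\in\partial\mathcal F(\mu)$.
Finally, we notice that our proof that $\mathbf w=\nabla L_F(\rho)/\rho\in\partial\mathcal F(\mu)$ does not use the finiteness of slope, but only the assumption $\mathbf w\in L^2(\mu;\mathbb R^d)$; therefore these conditions imply that the subdifferential is not empty and that the slope is finite.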
4.5.4 The relative internal energy
In this section we briefly discuss the modifications which should be made to the previous results when one considers a relative energy functional as in Section 3.3.
We thus consider a log-concave probability measure $\gamma=e^{-V}\cdot\mathcal L^d\in\mathcal P(\mathbb R^d)$ induced by a convex l.s.c. potential
\[
V:\mathbb R^d\to(-\infty,+\infty],\qquad\text{with }\Omega=\operatorname{int}D(V)\neq\emptyset. \tag{4.88}
\]
We are also assuming that the energy density
\[
F:[0,+\infty)\to[0,+\infty]\ \text{is convex and l.s.c., it satisfies the doubling property (4.78) and the geodesic convexity condition (3.22),} \tag{4.89}
\]
which yield that the map $s\mapsto\hat F(s):=F(e^{-s})e^s$ is convex and non increasing in $\mathbb R$. The functional
\[
\mathcal F(\mu|\gamma):=\int_{\mathbb R^d}F(\sigma)\,d\gamma=\int_\Omega F\bigl(\rho/e^{-V}\bigr)e^{-V}\,dx,\qquad\mu=\sigma\cdot\gamma=\rho\mathcal L^d, \tag{4.90}
\]
is therefore geodesically convex in $\mathcal P_2(\mathbb R^d)$, by Theorem 3.23. It is easy to check that whenever $\hat F$ is not constant (a case which corresponds to a linear $F$ and a constant functional $\mathcal F$), $F$ has a superlinear growth and therefore $\mathcal F$ is lower semicontinuous in $\mathcal P_2(\mathbb R^d)$.
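Theorem 4.16 (Subdifferential of $\mathcal F(\cdot|\gamma)$) The functional $\mathcal F(\cdot|\gamma)$ has finite slope at $\mu=\sigma\cdot\gamma\in D(\mathcal F)$ if and only if $L_F(\sigma)\in W^{1,1}_{\mathrm{loc}}(\Omega)$ and $\nabla L_F(\sigma)=\sigma\mathbf w$ for some function $\mathbf w\in L^2(\mu;\mathbb R^d)$. In this case
\[
\Bigl(\int_{\mathbb R^d}|\mathbf w(x)|^2\,d\mu(x)\Bigr)^{1/2}=|\partial\mathcal F|(\mu), \tag{4.91}
\]
and $\mathbf w=\partial^\circ\mathcal F(\mu)$.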
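Proof. We argue as in Theorem 4.15: in the present case the directional derivative formula (4.77) becomes
\[
\begin{aligned}
+\infty>\lim_{t\downarrow0}\frac{\mathcal F((\mathbf r_t)_\#\mu|\gamma)-\mathcal F(\mu|\gamma)}{t}
&=-\int_{\mathbb R^d}L_F(\rho/e^{-V})\Bigl(e^{-V}\operatorname{tr}\tilde\nabla(\mathbf r-\mathbf i)-e^{-V}\langle\nabla V,\mathbf r-\mathbf i\rangle\Bigr)\,dx\\
&=-\int_{\mathbb R^d}L_F(\sigma)\,\operatorname{tr}\tilde\nabla\bigl(e^{-V}(\mathbf r-\mathbf i)\bigr)\,dx
\end{aligned} \tag{4.92}
\]
for every vector field $\mathbf r$ satisfying the assumptions of Lemma 4.13 and $\mathcal F(\mathbf r_\#\mu|\gamma)<+\infty$. Choosing as before $\mathbf r=\mathbf i+e^V\mathbf t$, $\mathbf t\in C_c^\infty(\Omega;\mathbb R^d)$, since $V$ is bounded in each compact subset of $\Omega$, we get
\[
\int_{\mathbb R^d}L_F(\sigma)\,\operatorname{tr}\nabla\mathbf t\,dx\le|\partial\mathcal F|(\mu)\,\sup_\Omega|e^V\mathbf t|,
\]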
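so that $L_F(\sigma)\in BV_{\mathrm{loc}}(\Omega)$. Choosing now $\mathbf r=\mathbf i+\mathbf t$ with $\mathbf t\in C_c^\infty(\Omega;\mathbb R^d)$ we get
\[
\sum_{i=1}^d\int_\Omega t_i\,dD_iL_F(\sigma)\,d\gamma\le|\partial\mathcal F|(\mu)\,\|\mathbf t\|_{L^2(\mu;\mathbb R^d)}
\]
so that there exists $\mathbf w\in L^2(\mu;\mathbb R^d)$ such that
\[
\sum_{i=1}^d\int_\Omega t_i\,dD_iL_F(\sigma)\,d\gamma=\int_{\mathbb R^d}\langle\mathbf w,\mathbf t\rangle\,d\mu=\int_{\mathbb R^d}\langle\sigma\mathbf w,\mathbf t\rangle\,e^{-V}\,dx\qquad\forall\,\mathbf t\in C_c^\infty(\Omega;\mathbb R^d),
\]
thus showing that $L_F(\sigma)\in W^{1,1}_{\mathrm{loc}}(\Omega)$ and $\nabla L_F(\sigma)=\rho e^{V}\mathbf w=\sigma\mathbf w$.
Conversely, if $L_F(\sigma)\in W^{1,1}_{\mathrm{loc}}(\Omega)$ with $\nabla L_F(\sigma)=\sigma\mathbf w$ and $\mathbf w\in L^2(\mu;\mathbb R^d)$, arguing as in Lemma 4.14 we have for every measure $\nu=\mathbf r_\#\mu$ with compact support in $\Omega$
\[
\begin{aligned}
\mathcal F(\nu|\gamma)-\mathcal F(\mu|\gamma)
&\ge\limsup_{k\to\infty}\,-\int_\Omega L_F(\sigma)\,\operatorname{tr}\tilde\nabla\bigl(e^{-V}(\mathbf r-\mathbf i)\bigr)\chi_k\,dx\\
&\ge\limsup_{k\to\infty}\int_\Omega\langle\chi_k\nabla L_F(\sigma)+L_F(\sigma)\nabla\chi_k,\mathbf r-\mathbf i\rangle\,d\gamma\\
&\ge\limsup_{k\to\infty}\int_\Omega\langle\nabla L_F(\sigma),\mathbf r-\mathbf i\rangle\,\chi_k\,d\gamma\\
&\ge\limsup_{k\to\infty}\int_\Omega\langle\mathbf w,\mathbf r-\mathbf i\rangle\,\chi_k\,d\mu=\int_\Omega\langle\mathbf w,\mathbf r-\mathbf i\rangle\,d\mu,
\end{aligned}
\]
which shows, through a density argument, that $\mathbf w\in\partial\mathcal F(\mu)$.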
4.5.5 The interaction energy
In this section we consider the interaction energy functional $\mathcal W:\mathcal P_2(\mathbb R^d)\to[0,+\infty]$ defined by
\[
\mathcal W(\mu):=\frac12\int_{\mathbb R^d\times\mathbb R^d}W(x-y)\,d\mu\times\mu(x,y).
\]
Without loss of generality we shall assume that $W:\mathbb R^d\to[0,+\infty)$ is an even function; our main assumption, besides the convexity of $W$, is the doubling condition
\[
\exists\,C_W>0:\qquad W(x+y)\le C_W\bigl(1+W(x)+W(y)\bigr)\qquad\forall\,x,y\in\mathbb R^d. \tag{4.93}
\]
Let us first state a preliminary result: we are denoting by $\bar\mu$ the barycenter of the measure $\mu$:
\[
\bar\mu:=\int_{\mathbb R^d}x\,d\mu(x). \tag{4.94}
\]
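Lemma 4.17 Assume that $W:\mathbb R^d\to[0,+\infty)$ is convex, Gateaux differentiable, even, and satisfies the doubling condition (4.93). Then for any $\mu\in D(\mathcal W)$ we have
\[
\int_{\mathbb R^d}W(x)\,d\mu(x)\le C_W\bigl(1+\mathcal W(\mu)+W(\bar\mu)\bigr)<+\infty, \tag{4.95}
\]
\[
\int_{\mathbb R^d\times\mathbb R^d}|\nabla W(x-y)|\,d\mu\times\mu(x,y)\le C_W\bigl(1+S_W+\mathcal W(\mu)\bigr)<+\infty, \tag{4.96}
\]
where $S_W:=\sup_{|y|\le1}W(y)$. In particular $\mathbf w:=(\nabla W)*\mu$ is well defined for $\mu$-a.e. $x\in\mathbb R^d$, it belongs to $L^1(\mu;\mathbb R^d)$, and it satisfies
\[
\int_{\mathbb R^{2d}\times\mathbb R^d}\langle\nabla W(x_1-x_2),y_1-x_1\rangle\,d\boldsymbol\gamma(x_1,y_1)\,d\mu(x_2)
=\int_{\mathbb R^{2d}}\langle\mathbf w(x_1),y_1-x_1\rangle\,d\boldsymbol\gamma(x_1,y_1) \tag{4.97}
\]
for every plan $\boldsymbol\gamma\in\Gamma(\mu,\nu)$ with $\nu\in D(\mathcal W)$. In particular, choosing $\boldsymbol\gamma:=(\mathbf i\times\mathbf r)_\#\mu$, we have
\[
\int_{\mathbb R^d\times\mathbb R^d}\langle\nabla W(x-y),\mathbf r(x)\rangle\,d\mu\times\mu(x,y)=\int_{\mathbb R^d}\langle\mathbf w(x),\mathbf r(x)\rangle\,d\mu(x) \tag{4.98}
\]
for every vector field $\mathbf r\in L^\infty(\mu;\mathbb R^d)$ and for $\mathbf r:=\lambda\mathbf i$, $\lambda\in\mathbb R$.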
Proof. By Jensen inequality we have
\[
W(x-\bar\mu)\le\int_{\mathbb R^d}W(x-y)\,d\mu(y)\qquad\forall\,x\in\mathbb R^d, \tag{4.99}
\]
so that a further integration yields
\[
\int_{\mathbb R^d}W(x-\bar\mu)\,d\mu(x)\le\mathcal W(\mu); \tag{4.100}
\]
(4.95) follows directly from (4.100) and the doubling condition (4.93), since
\[
W(x)\le C_W\bigl(1+W(x-\bar\mu)+W(\bar\mu)\bigr).
\]
Combining the doubling condition and the convexity of $W$ we also get
\[
|\nabla W(x)|=\sup_{|y|\le1}\langle\nabla W(x),y\rangle\le\sup_{|y|\le1}W(x+y)-W(x)\le C_W\Bigl(1+W(x)+\sup_{|y|\le1}W(y)\Bigr), \tag{4.101}
\]
which yields (4.96).
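If now $\nu\in D(\mathcal W)$ and $\boldsymbol\gamma\in\Gamma(\mu,\nu)$, then the positive part of the map $(x_1,y_1,x_2)\mapsto\langle\nabla W(x_1-x_2),y_1-x_1\rangle$ belongs to $L^1(\boldsymbol\gamma\times\mu)$ since convexity yields
\[
\langle\nabla W(x_1-x_2),y_1-x_1\rangle\le W(y_1-x_2)-W(x_1-x_2),
\]
and the right hand side of this inequality is integrable:
\[
\int_{\mathbb R^{3d}}W(y_1-x_2)\,d\boldsymbol\gamma\times\mu=\int_{\mathbb R^{2d}}W(y_1-x_2)\,d\nu\times\mu\le C\bigl(1+\mathcal W(\nu)+\mathcal W(\mu)+W(\bar\nu-\bar\mu)\bigr),
\]
\[
\int_{\mathbb R^{3d}}W(x_1-x_2)\,d\boldsymbol\gamma\times\mu=\int_{\mathbb R^{2d}}W(x_1-x_2)\,d\mu\times\mu=\mathcal W(\mu).
\]
Therefore we can apply the Fubini-Tonelli theorem to obtain
\[
\begin{aligned}
\int_{\mathbb R^{3d}}\langle\nabla W(x_1-x_2),y_1-x_1\rangle\,d\boldsymbol\gamma\times\mu(x_1,y_1,x_2)
&=\int_{\mathbb R^{2d}}\Bigl(\int_{\mathbb R^d}\langle\nabla W(x_1-x_2),y_1-x_1\rangle\,d\mu(x_2)\Bigr)\,d\boldsymbol\gamma(x_1,y_1)\\
&=\int_{\mathbb R^{2d}}\Bigl\langle\int_{\mathbb R^d}\nabla W(x_1-x_2)\,d\mu(x_2),\,y_1-x_1\Bigr\rangle\,d\boldsymbol\gamma(x_1,y_1)\\
&=\int_{\mathbb R^{2d}}\langle\mathbf w(x_1),y_1-x_1\rangle\,d\boldsymbol\gamma(x_1,y_1),
\end{aligned}
\]
which yields (4.97).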
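As the interaction energy fails to satisfy (4.30a), as we did for the potential energy functional we say that $\boldsymbol\xi\in L^2(\mu;\mathbb R^d)$ belongs to the Fréchet subdifferential $\partial\mathcal W(\mu)$ at $\mu\in D(\mathcal W)$ if
\[
\mathcal W(\nu)-\mathcal W(\mu)\ge\inf_{\boldsymbol\gamma\in\Gamma_o(\mu,\nu)}\int_{\mathbb R^d\times\mathbb R^d}\langle\boldsymbol\xi(x),y-x\rangle\,d\boldsymbol\gamma(x,y)+o\bigl(W_2(\mu,\nu)\bigr). \tag{4.102}
\]
Theorem 4.18 (Minimal subdifferential of $\mathcal W$) Assume that $W:\mathbb R^d\to[0,+\infty)$ is convex, Gateaux differentiable, even, and satisfies the doubling condition (4.93). Then $\mu\in\mathcal P_2(\mathbb R^d)$ belongs to $D(|\partial\mathcal W|)$ if and only if $\mathbf w=(\nabla W)*\mu\in L^2(\mu;\mathbb R^d)$. In this case $\mathbf w=\partial^\circ\mathcal W(\mu)$.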
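Proof. As we did for the internal energy functional, we start by computing the directional derivative of $\mathcal W$ along a direction induced by a transport map $\mathbf r=\mathbf i+\mathbf t$, with $\mathbf t$ bounded and with a compact support (by the growth condition on $W$, this ensures that $\mathcal W(\mathbf r_\#\mu)<+\infty$). Since the map
\[
t\mapsto\frac{W\bigl((x-y)+t(\mathbf t(x)-\mathbf t(y))\bigr)-W(x-y)}{t}
\]
is nondecreasing w.r.t. $t$, the monotone convergence theorem and (4.98) give (taking into account that $\nabla W$ is an odd function)
\[
+\infty>\lim_{t\downarrow0}\frac{\mathcal W((\mathbf i+t\mathbf t)_\#\mu)-\mathcal W(\mu)}{t}
=\frac12\int_{\mathbb R^d\times\mathbb R^d}\langle\nabla W(x-y),\mathbf t(x)-\mathbf t(y)\rangle\,d\mu\times\mu=\int_{\mathbb R^d}\langle\mathbf w,\mathbf t\rangle\,d\mu.
\]
On the other hand, since $|\partial\mathcal W|(\mu)<+\infty$, using the inequality $W_2((\mathbf i+t\mathbf t)_\#\mu,\mu)\le t\,\|\mathbf t\|_{L^2(\mu;\mathbb R^d)}$ we get
\[
\int_{\mathbb R^d}\langle\mathbf w,\mathbf t\rangle\,d\mu\ge-|\partial\mathcal W|(\mu)\,\|\mathbf t\|_{L^2(\mu;\mathbb R^d)};
\]
changing the sign of $\mathbf t$ we obtain
\[
\Bigl|\int_{\mathbb R^d}\langle\mathbf w,\mathbf t\rangle\,d\mu\Bigr|\le|\partial\mathcal W|(\mu)\,\|\mathbf t\|_{L^2(\mu;\mathbb R^d)},
\]
and this proves that $\mathbf w\in L^2(\mu;\mathbb R^d)$ and that $\|\mathbf w\|_{L^2}\le|\partial\mathcal W|(\mu)$.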
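Now we prove that if $\mathbf w=(\nabla W)*\mu\in L^2(\mu;\mathbb R^d)$, then it belongs to $\partial\mathcal W(\mu)$. Let us consider a test measure $\nu\in D(\mathcal W)$, a plan $\boldsymbol\gamma\in\Gamma(\mu,\nu)$, and the directional derivative of $\mathcal W$ along the direction induced by $\boldsymbol\gamma$. Since the map
\[
t\mapsto\frac{W\bigl((1-t)(x_1-x_2)+t(y_1-y_2)\bigr)-W(x_1-x_2)}{t}
\]
is nondecreasing w.r.t. $t$, the monotone convergence theorem, the fact that $\nabla W$ is an odd function, and (4.98) give
\[
\begin{aligned}
\mathcal W(\nu)-\mathcal W(\mu)
&\ge\lim_{t\downarrow0}\frac{\mathcal W\bigl(((1-t)\pi^1+t\pi^2)_\#\boldsymbol\gamma\bigr)-\mathcal W(\mu)}{t}\\
&=\frac12\int_{\mathbb R^{2d}\times\mathbb R^{2d}}\langle\nabla W(x_1-x_2),(y_1-x_1)-(y_2-x_2)\rangle\,d\boldsymbol\gamma\times\boldsymbol\gamma\\
&=\int_{\mathbb R^{2d}}\langle\mathbf w(x_1),y_1-x_1\rangle\,d\boldsymbol\gamma(x_1,y_1),
\end{aligned}
\]
and this proves that $(\mathbf i\times\mathbf w)_\#\mu\in\partial\mathcal W(\mu)$.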
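The identity between the symmetrized double integral and the pairing with $\mathbf w=(\nabla W)*\mu$ used in the proof can be checked on an empirical measure. The sketch below is an illustration under assumptions that are not in the text: dimension one, $W(x)=x^4/4$ (convex, even, differentiable and doubling), four equal-weight atoms, and the bounded perturbation field $\mathbf t(x)=\sin x$. It also compares both expressions with a central finite difference of $s\mapsto\mathcal W((\mathbf i+s\mathbf t)_\#\mu)$ at $s=0$.

```python
import math

def W(x):
    return 0.25 * x ** 4

def dW(x):
    return x ** 3

def interaction_energy(points):
    # W(mu) = 1/2 * double integral of W(x - y) for equal-weight atoms.
    n = len(points)
    return 0.5 * sum(W(x - y) for x in points for y in points) / n ** 2

def convolution_gradient(points, x):
    # w(x) = ((grad W) * mu)(x) for the empirical measure mu.
    return sum(dW(x - y) for y in points) / len(points)

atoms = [-1.0, 0.2, 0.9, 2.0]
field = [math.sin(x) for x in atoms]        # values t(x_i) of the perturbation field
n = len(atoms)

# 1/2 * double integral of <grad W(x - y), t(x) - t(y)>
symmetric_form = 0.5 * sum(dW(x - y) * (tx - ty)
                           for x, tx in zip(atoms, field)
                           for y, ty in zip(atoms, field)) / n ** 2
# integral of <w, t> d mu
pairing = sum(convolution_gradient(atoms, x) * tx for x, tx in zip(atoms, field)) / n

def perturbed_energy(s):
    return interaction_energy([x + s * tx for x, tx in zip(atoms, field)])

h = 1e-6
finite_difference = (perturbed_energy(h) - perturbed_energy(-h)) / (2.0 * h)
print(symmetric_form, pairing, finite_difference)
```

The equality of the first two numbers relies only on $\nabla W$ being odd, which is why evenness of $W$ is assumed throughout the section.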
4.5.6 The opposite Wasserstein distance
In this section we compute the (metric) slope of the function $\psi(\cdot):=-\frac12W_2^2(\cdot,\mu^2)$, i.e. the limit
\[
\frac12\limsup_{\nu\to\mu}\frac{W_2^2(\nu,\mu^2)-W_2^2(\mu,\mu^2)}{W_2(\nu,\mu)}=|\partial\psi|(\mu); \tag{4.103}
\]
observe that the triangle inequality shows that the "lim sup" above is always less than $W_2(\mu,\mu^2)$; however this inequality is always strict when optimal plans are not induced by transports, as the following theorem shows; the right formula for the slope involves the minimal $L^2$ norm of the barycentric projection of the optimal plans and gives that the minimal selection is always induced by a map.
Theorem 4.19 (Minimal subdifferential of the opposite Wasserstein distance) Let $\psi(\mu)=-\frac12W_2^2(\mu,\mu^2)$. Then
\[
|\partial\psi|^2(\mu)=\min\Bigl\{\int_{\mathbb R^d}|\bar{\boldsymbol\gamma}-\mathbf i|^2\,d\mu:\ \boldsymbol\gamma\in\Gamma_o(\mu,\mu^2)\Bigr\}\qquad\forall\,\mu\in\mathcal P_2(\mathbb R^d), \tag{4.104}
\]
and $\partial^\circ\psi(\mu)=\bar{\boldsymbol\gamma}-\mathbf i$ is a strong subdifferential, where $\boldsymbol\gamma$ is the unique minimizing plan above.
Moreover $\mu\mapsto|\partial\psi|(\mu)$ is lower semicontinuous with respect to narrow convergence in $\mathcal P(\mathbb R^d)$, along sequences bounded in $\mathcal P_2(\mathbb R^d)$.
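A one-dimensional toy case makes the strict inequality concrete. The sketch below assumes $\mu=\delta_0$ and $\mu^2=\frac12(\delta_{-1}+\delta_1)$ (an example chosen here, not taken from the text): the only plan between them is the product plan, which is not induced by a map, and its barycentric projection at $0$ is the mean of $\mu^2$. Formula (4.104) then gives a vanishing slope, whereas $W_2(\mu,\mu^2)=1$. The difference quotient is evaluated only along Dirac competitors $\nu=\delta_h$, so it is a consistency check and not a computation of the full limsup in (4.103).

```python
import math

def barycentric_projection_at_dirac(target_atoms, target_weights):
    # For mu = delta_a the only plan with marginals mu and mu2 is the product,
    # so the barycentric projection at a is the mean of mu2.
    return sum(w * y for y, w in zip(target_atoms, target_weights))

def squared_distance_from_dirac(point, target_atoms, target_weights):
    # W_2^2(delta_point, mu2) = integral of |point - y|^2 d mu2(y).
    return sum(w * (point - y) ** 2 for y, w in zip(target_atoms, target_weights))

a = 0.0
target_atoms, target_weights = [-1.0, 1.0], [0.5, 0.5]

barycenter = barycentric_projection_at_dirac(target_atoms, target_weights)
candidate_slope = abs(barycenter - a)          # L2(mu) norm of (bar gamma - i)
w2_to_target = math.sqrt(squared_distance_from_dirac(a, target_atoms, target_weights))

def difference_quotient(h):
    # Quotient in (4.103) restricted to the competitors nu = delta_{a+h}.
    gain = (squared_distance_from_dirac(a + h, target_atoms, target_weights)
            - squared_distance_from_dirac(a, target_atoms, target_weights))
    return 0.5 * gain / abs(h)

print(candidate_slope, w2_to_target, difference_quotient(1e-3))
```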
4.5.7 The sum of internal, potential and interaction energy
In this section we consider, as in [21], the functional $\phi:\mathcal P_2(\mathbb R^d)\to(-\infty,+\infty]$ given by the sum of internal, potential and interaction energy:
\[
\phi(\mu):=\int_{\mathbb R^d}F(\rho)\,dx+\int_{\mathbb R^d}V\,d\mu+\frac12\int_{\mathbb R^d\times\mathbb R^d}W\,d\mu\times\mu\qquad\text{if }\mu=\rho\mathcal L^d, \tag{4.105}
\]
setting $\phi(\mu)=+\infty$ if $\mu\in\mathcal P_2(\mathbb R^d)\setminus\mathcal P_2^a(\mathbb R^d)$. Recalling the "doubling condition" stated in (4.78), we make the following assumptions on $F$, $V$ and $W$:
(F) $F:[0,+\infty)\to\mathbb R$ is a doubling, convex differentiable function with superlinear growth satisfying (4.71) (i.e. the bounds on $F^-$) and (4.76) (yielding the geodesic convexity of the internal energy).
(V) $V:\mathbb R^d\to(-\infty,+\infty]$ is a l.s.c. $\lambda$-convex function with proper domain $D(V)$ with nonempty interior $\Omega\subset\mathbb R^d$.
(W) $W:\mathbb R^d\to[0,+\infty)$ is a convex, differentiable, even function satisfying the doubling condition (4.93).
The finiteness of $\phi(\mu)$ yields
\[
\operatorname{supp}\mu\subset\overline\Omega=\overline{D(V)},\qquad\mu(\partial\Omega)=0, \tag{4.106}
\]
so that its density $\rho$ w.r.t. $\mathcal L^d$ can be considered as a function of $L^1(\Omega)$.
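The same monotonicity argument used in the proof of Lemma 4.13 gives
\[
+\infty>\lim_{t\downarrow0}\frac{\int_{\mathbb R^d}V\,d\bigl(((1-t)\mathbf i+t\mathbf r)_\#\mu\bigr)-\int_{\mathbb R^d}V\,d\mu}{t}=\int_{\mathbb R^d}\langle\nabla V,\mathbf r-\mathbf i\rangle\,d\mu, \tag{4.107}
\]
whenever both $\int_{\mathbb R^d}V\,d\mu<+\infty$ and $\int_{\mathbb R^d}V\,d\mathbf r_\#\mu<+\infty$.
Analogously, denoting by $\mathcal W$ the interaction energy functional induced by $W/2$, arguing as in the first part of Theorem 4.18 we have
\[
+\infty>\lim_{t\downarrow0}\frac{\mathcal W\bigl(((1-t)\mathbf i+t\mathbf r)_\#\mu\bigr)-\mathcal W(\mu)}{t}=\int_{\mathbb R^d}\langle(\nabla W)*\mu,\mathbf r-\mathbf i\rangle\,d\mu, \tag{4.108}
\]
whenever $\mathcal W(\mu)+\mathcal W(\mathbf r_\#\mu)<+\infty$. The growth condition on $W$ ensures that $\mu\in D(\mathcal W)$ implies $\mathbf r_\#\mu\in D(\mathcal W)$ if either $\mathbf r-\mathbf i$ is bounded or $\mathbf r=2\mathbf i$ (here we use the doubling condition).
We have the following characterization of the minimal selection in the subdifferential $\partial^\circ\phi(\mu)$: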
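Theorem 4.20 (Minimal subdifferential of $\phi$) A measure $\mu=\rho\mathcal L^d\in D(\phi)\subset\mathcal P_2(\mathbb R^d)$ belongs to $D(|\partial\phi|)$ if and only if $L_F(\rho)\in W^{1,1}_{\mathrm{loc}}(\Omega)$ and
\[
\rho\mathbf w=\nabla L_F(\rho)+\rho\nabla V+\rho(\nabla W)*\rho\qquad\text{for some }\mathbf w\in L^2(\mu;\mathbb R^d). \tag{4.109}
\]
In this case the vector $\mathbf w$ defined $\mu$-a.e. by (4.109) is the minimal selection in $\partial\phi(\mu)$, i.e. $\mathbf w=\partial^\circ\phi(\mu)$.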
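Proof. We argue exactly as in the proof of Theorem 4.15, computing the Gateaux derivative of $\phi$ in several directions $\mathbf r$, using Lemma 4.13 for the internal energy and (4.107), (4.108) respectively for the potential and interaction energy. Choosing $\mathbf r=\mathbf i+\mathbf t$, with $\mathbf t\in C_c^\infty(\Omega;\mathbb R^d)$, we obtain
\[
-\int_{\mathbb R^d}L_F(\rho)\,\nabla\cdot\mathbf t\,dx+\int_{\mathbb R^d}\langle\nabla V,\mathbf t\rangle\,d\mu+\int_{\mathbb R^d}\langle(\nabla W)*\rho,\mathbf t\rangle\,d\mu\ge-|\partial\phi|(\mu)\,\|\mathbf t\|_{L^2(\mu)}. \tag{4.110}
\]
Since $V$ is locally Lipschitz in $\Omega$ and $\nabla W*\rho$ is locally bounded, following the same argument of Theorem 4.15, we obtain from (4.110) first that $L_F(\rho)\in BV_{\mathrm{loc}}(\mathbb R^d)$ and then that $L_F(\rho)\in W^{1,1}_{\mathrm{loc}}(\mathbb R^d)$, with
\[
\nabla L_F(\rho)+\rho\nabla V+\rho(\nabla W)*\rho=\mathbf w\rho\qquad\text{for some }\mathbf w\in L^2(\mu;\mathbb R^d) \tag{4.111}
\]
with $\|\mathbf w\|_{L^2}\le|\partial\phi|(\mu)$.
In order to show that the vector $\mathbf w$ is in the subdifferential (and then, by the previous estimate, it is the minimal selection) we choose eventually a test measure $\nu\in D(\phi)$ with compact support contained in $\Omega$ and the associated optimal transport map $\mathbf r=\mathbf t_\mu^\nu$; Lemma 4.13, (4.107), (4.108), and Lemma 4.14 yield
\[
\begin{aligned}
\phi(\nu)-\phi(\mu)
&\ge\frac{d}{dt}\phi\bigl(((1-t)\mathbf i+t\mathbf r)_\#\mu\bigr)\Big|_{t=0+}\\
&=-\int_\Omega L_F(\rho)\,\nabla\cdot(\mathbf r-\mathbf i)\,dx+\int_\Omega\langle\nabla V,\mathbf r-\mathbf i\rangle\,d\mu+\int_\Omega\langle(\nabla W)*\rho,\mathbf r-\mathbf i\rangle\,d\mu\\
&\ge\limsup_{h\to\infty}\int_\Omega\langle\nabla L_F(\rho),\mathbf r-\mathbf i\rangle\,\chi_h\,dx+\int_\Omega\langle\nabla V+(\nabla W)*\rho,\mathbf r-\mathbf i\rangle\,d\mu\\
&=\limsup_{h\to\infty}\int_\Omega\langle\nabla L_F(\rho)+\rho\nabla V+\rho(\nabla W)*\rho,\mathbf r-\mathbf i\rangle\,\chi_h\,dx\\
&=\int_\Omega\langle\rho\mathbf w,\mathbf r-\mathbf i\rangle\,dx=\int_\Omega\langle\mathbf w,\mathbf r-\mathbf i\rangle\,d\mu.
\end{aligned}
\]
Finally, we notice that the proof that $\mathbf w$ belongs to the subdifferential did not use the finiteness of slope, but only the assumption (previously derived by the finiteness of slope) that $L_F(\rho)\in W^{1,1}_{\mathrm{loc}}(\Omega)$, (4.109), and $\phi(\mu)<+\infty$; therefore these conditions imply that the subdifferential is not empty, hence the slope is finite and the vector $\mathbf w$ is the minimal selection in $\partial\phi(\mu)$.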
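An interesting particular case of the above result is provided by the relative entropy functional: let us choose $W\equiv0$ and
\[
F(s):=s\log s,\qquad\gamma:=\frac1Ze^{-V}\cdot\mathcal L^d=e^{-(V(x)+\log Z)}\cdot\mathcal L^d,
\]
with $Z>0$ chosen so that $\gamma(\mathbb R^d)=1$. Recalling Remark 3.16, the functional $\phi$ can also be written as
\[
\phi(\mu)=\mathcal H(\mu|\gamma)-\log Z. \tag{4.112}
\]
Since in this case $L_F(\rho)=\rho$, a vector $\mathbf w\in L^2(\mu;\mathbb R^d)$ is the minimal selection $\partial^\circ\phi(\mu)$ if and only if
\[
-\int_{\mathbb R^d}\nabla\cdot\boldsymbol\zeta(x)\,d\mu(x)=\int_{\mathbb R^d}\langle\mathbf w(x),\boldsymbol\zeta(x)\rangle\,d\mu(x)-\int_{\mathbb R^d}\langle\nabla V(x),\boldsymbol\zeta(x)\rangle\,d\mu(x) \tag{4.113}
\]
for every test function $\boldsymbol\zeta\in C_c^\infty(\mathbb R^d;\mathbb R^d)$; (4.113) can also be written in terms of $\sigma=\frac{d\mu}{d\gamma}$ as
\[
-\int_{\mathbb R^d}\sigma\,\nabla\cdot\bigl(e^{-V(x)}\boldsymbol\zeta(x)\bigr)\,dx=\int_{\mathbb R^d}\langle\sigma\mathbf w(x),e^{-V(x)}\boldsymbol\zeta(x)\rangle\,dx, \tag{4.114}
\]
which shows that $\sigma\mathbf w=\nabla\sigma$.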
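The conclusion $\sigma\mathbf w=\nabla\sigma$ can be verified pointwise in a Gaussian example. The sketch below assumes (for illustration only) dimension one, $V(x)=x^2/2$, so that $Z=\sqrt{2\pi}$, and $\mu$ a Gaussian with mean $m$ and standard deviation $s$; by (4.109) with $W\equiv0$ and $L_F(\rho)=\rho$ the candidate minimal selection is $\mathbf w=\rho'/\rho+V'$. The derivative of $\sigma=d\mu/d\gamma$ is approximated by a central difference.

```python
import math

m, s = 0.5, 0.8                      # mean and standard deviation of mu (illustrative)
Z = math.sqrt(2.0 * math.pi)         # normalization of gamma for V(x) = x^2 / 2

def V(x):
    return 0.5 * x * x

def dV(x):
    return x

def rho(x):
    # Lebesgue density of mu.
    return math.exp(-(x - m) ** 2 / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

def sigma(x):
    # Density of mu with respect to gamma = exp(-V) / Z.
    return rho(x) * Z * math.exp(V(x))

def w(x):
    # w = rho'/rho + V', from rho * w = grad(rho) + rho * grad(V).
    return -(x - m) / (s * s) + dV(x)

def residual(x, h=1e-5):
    # |sigma * w - sigma'| with sigma' replaced by a central difference.
    dsigma = (sigma(x + h) - sigma(x - h)) / (2.0 * h)
    return abs(sigma(x) * w(x) - dsigma)

max_residual = max(residual(x) for x in [-1.0, 0.0, 0.5, 1.7])
print(max_residual)
```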
5 Gradient flows of $\lambda$-geodesically convex functionals in $\mathcal P_2(\mathbb R^d)$
In this chapter we state some structural results, concerning existence, uniqueness, approximation, and qualitative properties of gradient flows in $\mathcal P_2(\mathbb R^d)$ generated by
\[
\text{a proper and l.s.c. functional }\phi:\mathcal P_2(\mathbb R^d)\to(-\infty,+\infty]. \tag{5.1a}
\]
We will also assume that
\[
\phi\ \text{is }\lambda\text{-geodesically convex, according to Definition 3.1.} \tag{5.1b}
\]
Since we are mostly concerned with absolutely continuous measures, some technical details will be simpler assuming that
\[
D(|\partial\phi|)\subset\mathcal P_2^a(\mathbb R^d); \tag{5.1c}
\]
finally, the (simplified) existence theory we are presenting here will also require that for some $\tau_*>0$
\[
\text{the map }\nu\mapsto\Phi(\tau,\mu;\nu)=\frac{1}{2\tau}W_2^2(\mu,\nu)+\phi(\nu)\ \text{admits at least a minimum point }\mu_\tau,\ \text{for all }\tau\in(0,\tau_*)\text{ and }\mu\in\mathcal P_2(\mathbb R^d). \tag{5.1d}
\]
Notice that (5.1b) gives that any minimizer $\mu_\tau$ in (5.1d) belongs to $\mathcal P_2^a(\mathbb R^d)$, due to Lemma 4.4.
Remark 5.1 (5.1d) is slightly more restrictive than lower semicontinuity in $\mathcal P_2(\mathbb R^d)$; by the standard direct method in Calculus of Variations, it surely holds if $\phi$ satisfies the following coerciveness-l.s.c. conditions:
\[
\inf_{\mu\in\mathcal P_2(\mathbb R^d)}\phi(\mu)+\frac{1}{2\tau_*}m_2^2(\mu)>-\infty, \tag{5.2a}
\]
\[
\mu_n\to\mu\ \text{narrowly in }\mathcal P(\mathbb R^d),\quad\sup_nm_2(\mu_n)<+\infty\quad\Longrightarrow\quad\liminf_{n\to\infty}\phi(\mu_n)\ge\phi(\mu). \tag{5.2b}
\]
Another sufficient condition yielding (5.1d) and satisfied by our main examples is (5.60): it will be introduced in the "existence" Theorem 5.8.
The inclusion (5.1c) is a simplifying assumption, which ensures that the flows stay inside the absolutely continuous measures, thus avoiding more complicated notions of subdifferentials (see Chapter 11 of [8], where this restriction is completely removed).
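To see what the minimization in (5.1d) does when it is iterated, the sketch below runs a few steps of the resulting time-discrete scheme in a deliberately simple setting. The assumptions are ours and not part of the text: the potential energy with $V(x)=\lambda x^2/2$ on the line, acting on equal-weight atoms, so that each minimization step reduces to the pointwise update $x\mapsto x/(1+\lambda\tau)$. Such atomic measures do not satisfy (5.1c), so this is only a finite-dimensional analogue. The result after $n$ steps of size $T/n$ is compared with the exact solution $x\,e^{-\lambda T}$ of $\dot x=-V'(x)$.

```python
import math

def minimizing_movement(atoms, lam, tau, steps):
    # Each step minimizes nu -> W2^2(mu, nu) / (2*tau) + integral of V d nu,
    # which for V(x) = lam * x^2 / 2 moves every atom x to x / (1 + lam*tau).
    points = list(atoms)
    for _ in range(steps):
        points = [x / (1.0 + lam * tau) for x in points]
    return points

def exact_flow(atoms, lam, time):
    # Solution of x' = -lam * x started from each atom.
    return [x * math.exp(-lam * time) for x in atoms]

def w2_same_order(points_a, points_b):
    # Both configurations are increasing images of the same atoms,
    # so pairing them in order is an optimal coupling on the line.
    n = len(points_a)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(points_a, points_b)) / n)

atoms, lam, final_time = [-1.0, 0.5, 2.0], 1.0, 1.0
errors = [w2_same_order(minimizing_movement(atoms, lam, final_time / n, n),
                        exact_flow(atoms, lam, final_time))
          for n in (10, 100, 1000)]
print(errors)
```

The error shrinks roughly linearly in the step size, as expected for an implicit Euler discretization.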
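Definition 5.2 (Gradient flows) We say that a map $\mu_t\in AC^2_{\mathrm{loc}}\bigl((0,+\infty);\mathcal P_2(\mathbb R^d)\bigr)$ is a solution of the gradient flow equation
\[
\mathbf v_t\in-\partial\phi(\mu_t),\qquad t>0, \tag{5.3}
\]
if, for $\mathcal L^1$-a.e. $t>0$, $\mu_t\in\mathcal P_2^a(\mathbb R^d)$ and its velocity vector field $\mathbf v_t\in\mathrm{Tan}_{\mu_t}\mathcal P_2(\mathbb R^d)$ belongs to the subdifferential (4.19) of $\phi$ at $\mu_t$.
Recalling the characterization of the tangent velocity field to an absolutely continuous curve, the above definition is equivalent to the requirement that there exists a Borel vector field $\mathbf v_t$ such that
\[
\mathbf v_t\in\mathrm{Tan}_{\mu_t}\mathcal P_2(\mathbb R^d)\ \text{for }\mathcal L^1\text{-a.e. }t>0,\qquad\|\mathbf v_t\|_{L^2(\mu_t;\mathbb R^d)}\in L^2_{\mathrm{loc}}(0,+\infty), \tag{5.4a}
\]
the continuity equation
\[
\partial_t\mu_t+\nabla\cdot(\mathbf v_t\mu_t)=0\qquad\text{in }\mathbb R^d\times(0,+\infty) \tag{5.4b}
\]
holds in the sense of distributions according to (2.47), and finally
\[
-\mathbf v_t\in\partial\phi(\mu_t)\qquad\text{for }\mathcal L^1\text{-a.e. }t>0. \tag{5.4c}
\]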
Before studying the question of existence of solutions to (5.3), which we will
postpone to the next sections, we want to discuss some preliminary issues.
5.1 Uniqueness and various characterizations of gradient flows
Theorem 5.3 (Gradient flows, E.V.I., and curves of Maximal Slope)
Let φ : P2 (Rd ) → (−∞, +∞] be as in (5.1a,b). An absolutely continuous curve
2
(0, +∞); P2 (Rd ) with µt ∈ P2a (Rd ) for L 1 -a.e. t ∈ (0, +∞) is a
µ ∈ ACloc
gradient flow of φ according to Definition 5.2 if and only if it satisfies one of
the following equivalent characterizations:
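i) There exists a Borel vector field $\tilde{\boldsymbol{v}}_t$ with $\|\tilde{\boldsymbol{v}}_t\|_{L^2(\mu_t;\mathbb{R}^d)}$ in $L^2_{loc}(0,+\infty)$ such
that
\[ \partial_t\mu_t + \nabla\cdot(\tilde{\boldsymbol{v}}_t\mu_t) = 0 \quad \text{in } \mathbb{R}^d\times(0,+\infty), \tag{5.5a} \]
in the sense of distributions, and
\[ -\int_{\mathbb{R}^d}\langle\tilde{\boldsymbol{v}}_t, \boldsymbol{t}_{\mu_t}^{\sigma} - \boldsymbol{i}\rangle\,d\mu_t \le \phi(\sigma) - \phi(\mu_t) - \frac{\lambda}{2}W_2^2(\sigma,\mu_t) \quad \forall\,\sigma \in D(\phi), \tag{5.5b} \]
$\mathscr{L}^1$-a.e. in $(0,+\infty)$.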
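ii) Every Borel vector field $\tilde{\boldsymbol{v}}_t$ with $\|\tilde{\boldsymbol{v}}_t\|_{L^2(\mu_t;\mathbb{R}^d)}$ in $L^2_{loc}(0,+\infty)$ (in particular the velocity vector field $\boldsymbol{v}_t \in \mathrm{Tan}_{\mu_t}\mathscr{P}_2(\mathbb{R}^d)$) satisfying the continuity
equation
\[ \partial_t\mu_t + \nabla\cdot(\tilde{\boldsymbol{v}}_t\mu_t) = 0 \quad \text{in } \mathbb{R}^d\times(0,+\infty), \tag{5.6a} \]
in the sense of distributions, satisfies the variational inequality
\[ -\int_{\mathbb{R}^d}\langle\tilde{\boldsymbol{v}}_t, \boldsymbol{t}_{\mu_t}^{\sigma} - \boldsymbol{i}\rangle\,d\mu_t \le \phi(\sigma) - \phi(\mu_t) - \frac{\lambda}{2}W_2^2(\sigma,\mu_t) \quad \forall\,\sigma \in D(\phi), \tag{5.6b} \]
for $t \in (0,+\infty)\setminus N$, $N$ being an $\mathscr{L}^1$-negligible set.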
iii) The metric Evolution Variational Inequalities
\[ \frac{1}{2}\frac{d}{dt}W_2^2(\mu_t,\sigma) + \frac{\lambda}{2}W_2^2(\mu_t,\sigma) \le \phi(\sigma) - \phi(\mu_t) \quad \text{for } \mathscr{L}^1\text{-a.e. } t > 0 \tag{5.7} \]
hold for every σ ∈ D(φ).
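iv) The map $t \mapsto \phi(\mu_t)$ is locally absolutely continuous in $(0,+\infty)$ and
\[ -\frac{d}{dt}\phi(\mu_t) \ge \frac{1}{2}\|\boldsymbol{v}_t\|^2_{L^2(\mu_t;\mathbb{R}^d)} + \frac{1}{2}|\partial\phi|^2(\mu_t) \quad \mathscr{L}^1\text{-a.e. in } (0,+\infty). \tag{5.8} \]
v) The map $t \mapsto \phi(\mu_t)$ is locally absolutely continuous in $(0,+\infty)$ and
\[ -\frac{d}{dt}\phi(\mu_t) = \|\boldsymbol{v}_t\|^2_{L^2(\mu_t;\mathbb{R}^d)} = |\partial\phi|^2(\mu_t) \quad \mathscr{L}^1\text{-a.e. in } (0,+\infty). \tag{5.9} \]
In particular, (5.3) and v) yield
\[ -\boldsymbol{v}_t = \partial^\circ\phi(\mu_t) \quad \text{for } \mathscr{L}^1\text{-a.e. } t > 0. \tag{5.10} \]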
Proof.
i) If µt is a gradient flow according to Definition 5.2, recalling the
property of the subdifferential (4.36), it is immediate that µt and its velocity
vector field v t satisfy (5.5a,b).
Conversely, suppose that ṽ t satisfies (5.5a,b) and let us denote by v t ∈
Tanµt P2 (Rd ) the tangent velocity vector of µt . Since, by (2.56), for L 1 -a.e.
t > 0 v t is the orthogonal projection of ṽ t on Tanµt P2 (Rd ), the difference
ṽ t − v t is orthogonal to the tangent space, and therefore by Theorem 2.24 we
have
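\[ \int_{\mathbb{R}^d}\langle\tilde{\boldsymbol{v}}_t - \boldsymbol{v}_t, \boldsymbol{t}_{\mu_t}^{\sigma} - \boldsymbol{i}\rangle\,d\mu_t = 0 \quad \forall\,\sigma \in \mathscr{P}_2(\mathbb{R}^d), \ \text{for } \mathscr{L}^1\text{-a.e. } t > 0. \tag{5.11} \]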
As a consequence, v t fulfils (5.5b) for L 1 -a.e. t, and this property characterizes
the elements of the subdifferential.
ii) follows by the same argument, thanks to (5.11).
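iii) Assume that (5.7) holds for all σ ∈ D(φ). For any fixed σ ∈ D(φ), the
differentiability of $W_2^2$ stated in Lemma 2.23 gives
\[ \frac{1}{2}\frac{d}{dt}W_2^2(\mu_t,\sigma) = \int_{\mathbb{R}^d}\langle\boldsymbol{v}_t, \boldsymbol{i} - \boldsymbol{t}_{\mu_t}^{\sigma}\rangle\,d\mu_t \quad \text{for } \mathscr{L}^1\text{-a.e. } t \in (0,+\infty). \]
Therefore we can find, for any countable set $D \subset D(\phi)$, an $\mathscr{L}^1$-negligible set of
times $N$ such that
\[ -\int_{\mathbb{R}^d}\langle\boldsymbol{v}_t, \boldsymbol{t}_{\mu_t}^{\sigma} - \boldsymbol{i}\rangle\,d\mu_t \le \phi(\sigma) - \phi(\mu_t) - \frac{\lambda}{2}W_2^2(\sigma,\mu_t) \tag{5.12} \]
holds for all $t \in (0,+\infty)\setminus N$ and all $\sigma \in D$. Choosing $D$ to be dense relative to
the distance $W_2(\mu,\nu) + |\phi(\mu) - \phi(\nu)|$ in $D(\phi)$, we obtain that (5.5b) holds for
all $t \in (0,+\infty)\setminus N$. The converse implication is analogous.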
iv) If $\mu_t$ is a gradient flow in the sense of (5.3), taking into account that
$|\mu'_t| = \|\boldsymbol{v}_t\|_{L^2(\mu_t;\mathbb{R}^d)}$ and that $|\partial\phi|(\mu_t) \le \|\boldsymbol{v}_t\|_{L^2(\mu_t;\mathbb{R}^d)}$ (by (4.52)) we obtain
\[ |\partial\phi|(\mu_t)\,|\mu'_t| \in L^1_{loc}((0,+\infty)). \]
Thanks to the λ-convexity and the lower semicontinuity of φ, this implies
(see Corollary 2.4.10 in [8]) that $t \mapsto \phi(\mu_t)$ is locally absolutely continuous
in $(0,+\infty)$. Then, the chain rule (4.55) easily yields
\[ -\frac{d}{dt}\phi(\mu_t) = \int_{\mathbb{R}^d}|\boldsymbol{v}_t|^2\,d\mu_t \ge |\partial\phi|^2(\mu_t) \tag{5.13} \]
and therefore (5.8).
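Conversely, if $t \mapsto \phi(\mu_t)$ is locally absolutely continuous and $\mu_t$ satisfies
(5.8), we know that $\partial\phi(\mu_t) \ne \emptyset$ for $\mathscr{L}^1$-a.e. $t > 0$; thus the chain rule (4.55)
shows that
\[ \frac{d}{dt}\phi(\mu_t) = \int_{\mathbb{R}^d}\langle\boldsymbol{\xi}, \boldsymbol{v}_t\rangle\,d\mu_t \quad \forall\,\boldsymbol{\xi} \in \partial\phi(\mu_t), \ \text{for } \mathscr{L}^1\text{-a.e. } t > 0. \tag{5.14} \]
Choosing in particular $\boldsymbol{\xi}_t = \partial^\circ\phi(\mu_t)$, for $\mathscr{L}^1$-a.e. $t > 0$ we get
\[ \int_{\mathbb{R}^d}\Bigl(\frac{1}{2}|\boldsymbol{v}_t|^2 + \frac{1}{2}|\boldsymbol{\xi}_t|^2 + \langle\boldsymbol{\xi}_t, \boldsymbol{v}_t\rangle\Bigr)\,d\mu_t \le 0. \tag{5.15} \]
Since the integrand equals $\frac{1}{2}|\boldsymbol{v}_t + \boldsymbol{\xi}_t|^2 \ge 0$, it follows that
\[ \boldsymbol{\xi}_t(x) = -\boldsymbol{v}_t(x) \quad \text{for } \mu_t\text{-a.e. } x \in \mathbb{R}^d, \]
i.e. $\boldsymbol{v}_t = -\partial^\circ\phi(\mu_t)$.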
v) is equivalent to iv) by the previous argument.
Remark 5.4 The “purely metric” formulations (5.7) or (5.8) do not require
that µt is an absolutely continuous measure at L 1 -a.e. t ∈ (0, +∞) and do
not depend on an explicit expression of the subdifferential of φ, as only the
metric slope is involved; therefore they can be used to define the gradient flow
of φ under more general assumptions: again, we refer to [8] for a complete
development of this approach.
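Theorem 5.5 (Uniqueness of gradient flows) If $\mu^i_t : (0,+\infty) \to \mathscr{P}_2(\mathbb{R}^d)$,
$i = 1, 2$, are gradient flows satisfying $\mu^i_t \to \mu^i \in \mathscr{P}_2(\mathbb{R}^d)$ as $t \downarrow 0$ in $\mathscr{P}_2(\mathbb{R}^d)$,
then
\[ W_2(\mu^1_t, \mu^2_t) \le e^{-\lambda t}\,W_2(\mu^1, \mu^2) \quad \forall\,t > 0. \tag{5.16} \]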
In particular, for any µ0 ∈ P2 (Rd ) there is at most one gradient flow µt satisfying the initial Cauchy condition µt → µ0 as t ↓ 0.
Proof. If $\mu^1_t$, $\mu^2_t$ are two gradient flows satisfying the initial Cauchy condition
$\mu^i_t \to \mu^i$ as $t \downarrow 0$, $i = 1, 2$, by the E.V.I. formulation (5.7) we can apply the next
Lemma 5.6 with the choices $d(s,t) := W_2^2(\mu^1_s, \mu^2_t)$, $\delta(t) := d(t,t)$, thus obtaining
$\delta' \le -2\lambda\delta$. Since $\delta(0^+) = W_2^2(\mu^1, \mu^2)$ we obtain (5.16).
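The last step is a Gronwall-type integration. As a short sketch (it only uses that δ is locally absolutely continuous, as provided by Lemma 5.6, and the differential inequality above), it can be spelled out as follows:

```latex
% Integrating \delta' \le -2\lambda\delta on (s,t), with 0 < s < t:
\frac{d}{dt}\bigl(e^{2\lambda t}\delta(t)\bigr)
  = e^{2\lambda t}\bigl(\delta'(t) + 2\lambda\,\delta(t)\bigr) \le 0
\quad\Longrightarrow\quad
\delta(t) \le e^{-2\lambda (t-s)}\,\delta(s).
% Letting s \downarrow 0 and taking square roots:
W_2(\mu^1_t,\mu^2_t) = \sqrt{\delta(t)}
  \le e^{-\lambda t}\sqrt{\delta(0^+)}
  = e^{-\lambda t}\,W_2(\mu^1,\mu^2).
```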
Lemma 5.6 Let $d(s,t) : (a,b)^2 \to \mathbb{R}$ be a map satisfying
\[ |d(s,t) - d(s',t)| \le |v(s) - v(s')|, \qquad |d(s,t) - d(s,t')| \le |v(t) - v(t')| \]
for any $s, t, s', t' \in (a,b)$, for some locally absolutely continuous map $v : (a,b) \to \mathbb{R}$, and let $\delta(t) := d(t,t)$. Then δ is locally absolutely continuous in
$(a,b)$ and
\[ \frac{d}{dt}\delta(t) \le \limsup_{h\downarrow 0}\frac{d(t,t) - d(t-h,t)}{h} + \limsup_{h\downarrow 0}\frac{d(t,t+h) - d(t,t)}{h} \quad \mathscr{L}^1\text{-a.e. in } (a,b). \]
Proof. Since $|\delta(s) - \delta(t)| \le 2|v(s) - v(t)|$ the function δ is locally absolutely
continuous. We fix a nonnegative function $\zeta \in C_c^\infty(a,b)$ and $h > 0$ such that
$\pm h + \operatorname{supp}\zeta \subset (a,b)$. We have then
\[ \begin{aligned}
-\int_a^b \delta(t)\,\frac{\zeta(t+h) - \zeta(t)}{h}\,dt
 &= \int_a^b \zeta(t)\,\frac{d(t,t) - d(t-h,t-h)}{h}\,dt \\
 &= \int_a^b \zeta(t)\,\frac{d(t,t) - d(t-h,t)}{h}\,dt + \int_a^b \zeta(t+h)\,\frac{d(t,t+h) - d(t,t)}{h}\,dt,
\end{aligned} \]
where the last equality follows by adding and subtracting $d(t-h,t)$ and then
making a change of variables in the last integral. Since
\[ h^{-1}|d(t,t) - d(t-h,t)| \le h^{-1}|v(t) - v(t-h)| \to |v'(t)| \quad \text{in } L^1_{loc}(a,b) \text{ as } h \downarrow 0 \]
and an analogous inequality holds for the other difference quotient, we can
apply (an extended version of) Fatou's Lemma and pass to the upper limit in
the integrals as $h \downarrow 0$ (recall that Fatou's lemma with the limsup holds even for
sequences bounded above by a sequence strongly convergent in $L^1$); denoting
by α and β the two upper derivatives in the statement of the Lemma we get
$-\int \delta\zeta'\,dt \le \int(\alpha + \beta)\zeta\,dt$, whence the inequality between distributions follows.
5.2 Main properties of Gradient Flows
In this section we collect the main properties of the gradient flow generated by
a functional φ : P2 (Rd ) → (−∞, +∞] satisfying the assumptions (5.1a,b,c).
Theorem 5.7 (Main properties of gradient flows) Let us suppose that φ :
P2 (Rd ) → (−∞, +∞] satisfies (5.1a,b) and let us suppose that its gradient
flow µt (according to the E.V.I. formulation (5.7)) exists for every initial value
µ0 ∈ D, D being a dense subset of D(φ).
λ-contractive semigroup. For every $\mu_0 \in D(\phi)$ there exists a unique solution
$\mu := S[\mu_0]$ of the Cauchy problem associated to (5.3) with $\lim_{t\downarrow 0}\mu_t = \mu_0$.
The map $\mu_0 \mapsto S_t[\mu_0]$ is a λ-contracting semigroup on $D(\phi)$, i.e.
\[ W_2(S[\mu_0](t), S[\nu_0](t)) \le e^{-\lambda t}\,W_2(\mu_0,\nu_0) \quad \forall\,\mu_0, \nu_0 \in D(\phi). \tag{5.17} \]
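Regularizing effect. $S_t$ maps $D(\phi)$ into $D(\partial\phi) \subset D(\phi)$ for every $t > 0$,
\[ \text{the map } t \mapsto e^{\lambda t}|\partial\phi|(\mu_t) \text{ is nonincreasing,} \tag{5.18} \]
and each solution $\mu_t = S_t[\mu_0]$ satisfies the following regularization estimates:
\[ \begin{cases}
\phi(\mu_t) \le \dfrac{1}{2t}\,W_2^2(\mu_0,\sigma) + \phi(\sigma) & \text{if } \lambda \ge 0, \\[2mm]
\phi(\mu_t) \le \dfrac{\lambda}{2(e^{\lambda t} - 1)}\,W_2^2(\mu_0,\sigma) + \phi(\sigma) & \text{if } \lambda \ne 0,
\end{cases} \tag{5.19} \]
\[ e^{-2\lambda^- t}\,|\partial\phi|^2(\mu_t) \le |\partial\phi|^2(\sigma) + \frac{1}{t^2}\,W_2^2(\mu_0,\sigma) - \frac{\lambda}{t}\,W_2^2(\mu_t,\sigma) - \frac{\lambda}{t^2}\int_0^t W_2^2(\mu_s,\sigma)\,ds \tag{5.20} \]
for every $\sigma \in D(\partial\phi)$.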
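Energy identity. If $\boldsymbol{v}_t \in \mathrm{Tan}_{\mu_t}\mathscr{P}_2(\mathbb{R}^d)$ is the tangent velocity field of a gradient flow $\mu_t = S_t[\mu_0]$, then the energy identity holds:
\[ \int_a^b\!\!\int_{\mathbb{R}^d}|\boldsymbol{v}_t(x)|^2\,d\mu_t(x)\,dt + \phi(\mu_b) = \phi(\mu_a) \quad \forall\,0 \le a < b < +\infty. \tag{5.21} \]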
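Asymptotic behaviour. If $\lambda > 0$, then φ admits a unique minimum point $\bar\mu$
and for $t \ge t_0$ we have
\[ \frac{\lambda}{2}W_2^2(\mu_t,\bar\mu) \le \phi(\mu_t) - \phi(\bar\mu) \le \frac{1}{2\lambda}|\partial\phi|^2(\mu_t) \quad \forall\,t \ge 0, \tag{5.22a} \]
\[ W_2(\mu_t,\bar\mu) \le W_2(\mu_{t_0},\bar\mu)\,e^{-\lambda(t-t_0)}, \tag{5.22b} \]
\[ \phi(\mu_t) - \phi(\bar\mu) \le \bigl(\phi(\mu_{t_0}) - \phi(\bar\mu)\bigr)\,e^{-2\lambda(t-t_0)}, \tag{5.22c} \]
\[ |\partial\phi|(\mu_t) \le |\partial\phi|(\mu_{t_0})\,e^{-\lambda(t-t_0)}. \tag{5.22d} \]
If $\lambda = 0$ and $\bar\mu$ is any minimum point of φ then we have
\[ |\partial\phi|(\mu_t) \le \frac{W_2(\mu_0,\bar\mu)}{t}, \qquad \phi(\mu_t) - \phi(\bar\mu) \le \frac{W_2^2(\mu_0,\bar\mu)}{2t}, \qquad \text{the map } t \mapsto W_2(\mu_t,\bar\mu) \text{ is nonincreasing.} \tag{5.23} \]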
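Right and left limits, precise pointwise formulation of the equation.
If moreover φ satisfies (5.1c), then for every $t > 0$ the right limit
\[ \boldsymbol{v}_{t+} := \lim_{h\downarrow 0}\frac{\boldsymbol{t}_{\mu_t}^{\mu_{t+h}} - \boldsymbol{i}}{h} \quad \text{exists in } L^2(\mu_t;\mathbb{R}^d) \tag{5.24} \]
and satisfies
\[ -\boldsymbol{v}_{t+} = \partial^\circ\phi(\mu_t) \quad \forall\,t > 0, \tag{5.25} \]
\[ \frac{d}{dt_+}\phi(\mu_t) = -\int_{\mathbb{R}^d}|\boldsymbol{v}_{t+}|^2\,d\mu_t = -|\partial\phi|^2(\mu_t) \quad \forall\,t > 0. \tag{5.26} \]
(5.24), (5.25), and (5.26) hold at $t = 0$ iff $\mu_0 \in D(\partial\phi) = D(|\partial\phi|)$. Moreover, there exists an at most countable set $C \subset (0,+\infty)$ such that the
analogous identities for the left limits hold for every $t \in (0,+\infty)\setminus C$:
\[ \left\{\begin{aligned}
&\boldsymbol{v}_{t-} = \lim_{h\downarrow 0}\frac{\boldsymbol{t}_{\mu_t}^{\mu_{t-h}} - \boldsymbol{i}}{h} = -\partial^\circ\phi(\mu_t), \\
&\frac{d}{dt_-}\phi(\mu_t) = -|\partial\phi|^2(\mu_t),
\end{aligned}\right. \qquad \forall\,t \in (0,+\infty)\setminus C. \tag{5.27} \]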
Proof.
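Regularizing effect. We first observe that for every $h > 0$ the map $t \mapsto \mu_{t+h}$
is still a gradient flow, and therefore estimate (5.16) yields
\[ W_2(\mu_{t+h},\mu_t) \le e^{-\lambda(t-t_0)}\,W_2(\mu_{t_0+h},\mu_{t_0}) \quad \forall\,0 \le t_0 < t < +\infty. \tag{5.28} \]
Setting
\[ \delta(t) := \limsup_{h\downarrow 0}\frac{W_2(\mu_{t+h},\mu_t)}{h}, \qquad t \ge 0, \tag{5.29} \]
(5.28) yields
\[ \text{the map } t \mapsto e^{\lambda t}\delta(t) \text{ is nonincreasing.} \tag{5.30} \]
We denote by $N$ the subset of $(0,+\infty)$ whose points $t$ satisfy $\mu_t \in D(\partial\phi) \subset \mathscr{P}_2^a(\mathbb{R}^d)$, the metric derivative of $\mu$ at $t$ coincides with $\|\boldsymbol{v}_t\|_{L^2(\mu_t;\mathbb{R}^d)}$, and $-\boldsymbol{v}_t = \partial^\circ\phi(\mu_t)$: by the definition of gradient flow, Theorem 2.17, and point v) of
Theorem 5.3, $\mathscr{L}^1\bigl((0,+\infty)\setminus N\bigr) = 0$ and
\[ \delta(t) = \|\boldsymbol{v}_t\|_{L^2(\mu_t;\mathbb{R}^d)} = |\partial\phi|(\mu_t) < +\infty \quad \forall\,t \in N; \tag{5.31} \]
in particular, (5.30) yields $\delta(t) < +\infty$ for every $t > 0$.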
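We want to show now that
\[ \delta(t) = |\partial\phi|(\mu_t) \quad \forall\,t \ge 0. \tag{5.32} \]
Integrating the E.V.I. (5.7) in the interval $(t, t+h)$ and dividing by $h$ we get
for every $\sigma \in D(\phi)$
\[ \begin{aligned}
\frac{1}{h}\int_0^h\Bigl(\phi(\mu_{t+s}) + \frac{\lambda}{2}W_2^2(\mu_{t+s},\sigma)\Bigr)\,ds - \phi(\sigma)
 &\le \frac{1}{2h}W_2^2(\mu_t,\sigma) - \frac{1}{2h}W_2^2(\mu_{t+h},\sigma) \\
 &\le \frac{W_2(\mu_{t+h},\mu_t)}{2h}\bigl(W_2(\mu_t,\sigma) + W_2(\mu_{t+h},\sigma)\bigr).
\end{aligned} \tag{5.33} \]
Passing to the limit as $h \downarrow 0$ and recalling that the map $t \mapsto \phi(\mu_t)$ is (absolutely)
continuous, we obtain
\[ \phi(\mu_t) - \phi(\sigma) + \frac{\lambda}{2}W_2^2(\mu_t,\sigma) \le \delta(t)\,W_2(\mu_t,\sigma), \tag{5.34} \]
which yields
\[ |\partial\phi|(\mu_t) \le \delta(t) \quad \forall\,t \ge 0. \tag{5.35} \]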
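Choosing $\sigma := \mu_t$ in (5.33), and rescaling the integrand, we can use ?? to obtain
\[ \begin{aligned}
\frac{1}{2h^2}W_2^2(\mu_{t+h},\mu_t)
 &\le \frac{1}{h}\int_0^1\Bigl(\phi(\mu_t) - \phi(\mu_{t+hs}) - \frac{\lambda}{2}W_2^2(\mu_{t+hs},\mu_t)\Bigr)\,ds \\
 &\le |\partial\phi|(\mu_t)\int_0^1\frac{W_2(\mu_{t+hs},\mu_t)}{hs}\,s\,ds - \lambda\int_0^1\frac{W_2^2(\mu_{t+hs},\mu_t)}{h}\,ds.
\end{aligned} \]
Passing to the limit as $h \downarrow 0$ we obtain
\[ \frac{1}{2}\delta^2(t) \le |\partial\phi|(\mu_t)\int_0^1\delta(t)\,s\,ds = \frac{1}{2}|\partial\phi|(\mu_t)\,\delta(t), \tag{5.36} \]
which yields (5.32) and in particular (5.18).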
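The estimates (5.19) follow easily by integrating in the interval $(0,t)$ the
following form of (5.7)
\[ \frac{d}{ds}\Bigl(\frac{e^{\lambda s}}{2}W_2^2(\mu_s,\sigma)\Bigr) + e^{\lambda s}\phi(\mu_s) \le e^{\lambda s}\phi(\sigma) \]
and recalling that $t \mapsto \phi(\mu_t)$ is nonincreasing; when $\lambda \ne 0$ we get
\[ \begin{aligned}
\frac{e^{\lambda t} - 1}{\lambda}\,\phi(\mu_t)
 &\le -\int_0^t\frac{d}{ds}\Bigl(\frac{e^{\lambda s}}{2}W_2^2(\mu_s,\sigma)\Bigr)\,ds + \int_0^t e^{\lambda s}\phi(\sigma)\,ds \\
 &\le \frac{1}{2}W_2^2(\mu_0,\sigma) + \frac{e^{\lambda t} - 1}{\lambda}\,\phi(\sigma).
\end{aligned} \tag{5.37} \]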
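In order to show (5.20) we apply (5.18), the fact that $-\frac{d}{dt}\phi(\mu_t) = |\partial\phi|^2(\mu_t)$, and
finally the E.V.I. to obtain
\[ \begin{aligned}
\frac{e^{-2\lambda^- t}\,t^2}{2}\,|\partial\phi|^2(\mu_t)
 &\le \int_0^t s\,e^{-2\lambda^- s}\,|\partial\phi|^2(\mu_s)\,ds \le -\int_0^t s\,\frac{d}{ds}\phi(\mu_s)\,ds \\
 &= \int_0^t\phi(\mu_s)\,ds - t\,\phi(\mu_t) \\
 &\le t\bigl(\phi(\sigma) - \phi(\mu_t)\bigr) + \frac{1}{2}W_2^2(\mu_0,\sigma) - \frac{1}{2}W_2^2(\mu_t,\sigma) - \frac{\lambda}{2}\int_0^t W_2^2(\mu_s,\sigma)\,ds.
\end{aligned} \]
If $\sigma \in D(\partial\phi)$, using ?? we can bound the right hand side by
\[ \begin{aligned}
& t\,|\partial\phi|(\sigma)\,W_2(\mu_t,\sigma) - \frac{1}{2}(t\lambda + 1)\,W_2^2(\mu_t,\sigma) + \frac{1}{2}W_2^2(\mu_0,\sigma) - \frac{\lambda}{2}\int_0^t W_2^2(\mu_s,\sigma)\,ds \\
&\qquad \le \frac{t^2}{2}|\partial\phi|^2(\sigma) - \frac{t\lambda}{2}W_2^2(\mu_t,\sigma) + \frac{1}{2}W_2^2(\mu_0,\sigma) - \frac{\lambda}{2}\int_0^t W_2^2(\mu_s,\sigma)\,ds,
\end{aligned} \]
which yields (5.20).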
λ-contractive semigroup. Thanks to the λ-contraction estimate of Theorem 5.5, it is now easy to extend the semigroup S defined on D to its closure,
which coincides with D(φ). Observe that each trajectory µt of the extended
semigroup still satisfies the E.V.I. formulation (5.7); moreover, the previous
regularization estimates show that t 7→ µt is locally Lipschitz and µt ∈ D(|∂φ|)
for every t > 0, in particular µt ∈ P2a (Rd ) for every t > 0. Theorem 5.3 then
shows that µt is a gradient flow for φ.
Energy identity. It is an immediate consequence of (5.9).
Asymptotic behaviour. When $\lambda > 0$, (5.28) shows that for every gradient
flow $\mu_t$ the sequence $k \mapsto \mu_k$ satisfies the Cauchy condition in $\mathscr{P}_2(\mathbb{R}^d)$, since
\[ W_2(\mu_{k+1},\mu_k) \le e^{-\lambda}\,W_2(\mu_k,\mu_{k-1}). \tag{5.38} \]
Therefore it is convergent to some limit $\bar\mu$; (5.19) and the lower semicontinuity of
φ show that $\bar\mu$ is a minimum point for φ; in particular, the constant curve $t \mapsto \bar\mu$
is a gradient flow. (5.22b) is a particular case of the λ-contraction property
(5.17) and in particular it shows that the minimum point $\bar\mu$ is unique, when
$\lambda > 0$.
The inequality (5.22d) is simply (5.30), while (5.22a) is a general property
of λ-geodesically convex functions (even in metric spaces, see Theorem 2.4.14 of
[8]): for, if $\mu \in D(\partial\phi)$, property ?? of the slope and Young inequality yield
\[ \phi(\mu) - \phi(\bar\mu) \le |\partial\phi|(\mu)\,W_2(\mu,\bar\mu) - \frac{\lambda}{2}W_2^2(\mu,\bar\mu) \le \frac{1}{2\lambda}|\partial\phi|^2(\mu). \tag{5.39} \]
For the opposite inequality, being $0 \in \partial\phi(\bar\mu)$, from (4.36) we easily get
\[ \phi(\mu) - \phi(\bar\mu) \ge \frac{\lambda}{2}W_2^2(\mu,\bar\mu). \tag{5.40} \]
The estimate (5.22c) now follows by observing that (5.39) yields
\[ \frac{d}{dt}\bigl(\phi(\mu_t) - \phi(\bar\mu)\bigr) = -|\partial\phi|^2(\mu_t) \le -2\lambda\bigl(\phi(\mu_t) - \phi(\bar\mu)\bigr). \tag{5.41} \]
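For the reader's convenience, the two elementary steps used above (Young's inequality in (5.39) and the integration of (5.41)) can be sketched as follows; the auxiliary function $g$ is a name introduced here only for this illustration.

```latex
% Young's inequality ab \le \frac{a^2}{2\lambda} + \frac{\lambda b^2}{2} (\lambda > 0),
% with a = |\partial\phi|(\mu) and b = W_2(\mu,\bar\mu):
|\partial\phi|(\mu)\,W_2(\mu,\bar\mu) - \frac{\lambda}{2}W_2^2(\mu,\bar\mu)
  \le \frac{1}{2\lambda}|\partial\phi|^2(\mu).
% Integration of (5.41), with g(t) := \phi(\mu_t) - \phi(\bar\mu) \ge 0:
g'(t) \le -2\lambda\,g(t)
\;\Longrightarrow\;
\frac{d}{dt}\bigl(e^{2\lambda t}g(t)\bigr) \le 0
\;\Longrightarrow\;
g(t) \le g(t_0)\,e^{-2\lambda(t-t_0)} \qquad (t \ge t_0).
```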
Right and left limits, precise pointwise formulation of the equation.
Here, for the sake of simplicity, we are assuming that λ ≥ 0.
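We already know that $\partial\phi(\mu_t)$ is not empty for $t > 0$: we set $\boldsymbol{\xi}_t = \partial^\circ\phi(\mu_t)$;
since the slope $|\partial\phi|$ is lower semicontinuous ?? and the map $t \mapsto |\partial\phi|(\mu_t)$ is
nonincreasing, we obtain
\[ |\partial\phi|(\mu_t) = \lim_{h\downarrow 0}|\partial\phi|(\mu_{t+h}). \tag{5.42} \]
Moreover, the map $t \mapsto \phi(\mu_t)$ is absolutely continuous, nonincreasing, and its
time derivative coincides $\mathscr{L}^1$-a.e. with the nondecreasing map $-|\partial\phi|^2(\mu_t)$; it
follows that $t \mapsto \phi(\mu_t)$ is continuous and convex, so that
\[ \exists\ \frac{d}{dt_+}\phi(\mu_t) = \lim_{h\downarrow 0}\frac{\phi(\mu_{t+h}) - \phi(\mu_t)}{h} = -|\partial\phi|^2(\mu_t) = -\delta^2(t) \quad \forall\,t > 0. \tag{5.43} \]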
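Let us now fix $t > 0$ and an infinitesimal sequence $h_n$ such that
\[ \frac{\boldsymbol{t}_{\mu_t}^{\mu_{t+h_n}} - \boldsymbol{i}}{h_n} \rightharpoonup \tilde{\boldsymbol{v}}_t \quad \text{weakly in } L^2(\mu_t;\mathbb{R}^d). \tag{5.44} \]
By the definition of subdifferential, it is immediate to check that
\[ -|\partial\phi|^2(\mu_t) = -\|\boldsymbol{\xi}_t\|^2_{L^2(\mu_t;\mathbb{R}^d)} = \frac{d}{dt_+}\phi(\mu_t) \ge \int_{\mathbb{R}^d}\langle\boldsymbol{\xi}_t, \tilde{\boldsymbol{v}}_t\rangle\,d\mu_t. \tag{5.45} \]
On the other hand
\[ \|\tilde{\boldsymbol{v}}_t\|_{L^2(\mu_t;\mathbb{R}^d)} \le \delta(t) = \|\boldsymbol{\xi}_t\|_{L^2(\mu_t;\mathbb{R}^d)}. \tag{5.46} \]
It follows that $\tilde{\boldsymbol{v}}_t = -\boldsymbol{\xi}_t$; since the limit is uniquely determined independently
of the subsequence $h_n$, we obtain that
\[ \lim_{h\downarrow 0}\frac{\boldsymbol{t}_{\mu_t}^{\mu_{t+h}} - \boldsymbol{i}}{h} = -\boldsymbol{\xi}_t \quad \text{weakly in } L^2(\mu_t;\mathbb{R}^d). \tag{5.47} \]
On the other hand
\[ \limsup_{h\downarrow 0}\Bigl\|\frac{\boldsymbol{t}_{\mu_t}^{\mu_{t+h}} - \boldsymbol{i}}{h}\Bigr\|_{L^2(\mu_t;\mathbb{R}^d)} = \limsup_{h\downarrow 0}\frac{W_2(\mu_t,\mu_{t+h})}{h} = \delta(t) = \|\boldsymbol{\xi}_t\|_{L^2(\mu_t;\mathbb{R}^d)} \]
and therefore the limit in (5.47) is also strong in $L^2(\mu_t;\mathbb{R}^d)$.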
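The same argument can be applied for the left limit at each continuity point
of the map $t \mapsto |\partial\phi|(\mu_t)$ (whose complement $C$ in $(0,+\infty)$ is at most countable),
i.e. for every $t$ such that
\[ \lim_{h\downarrow 0}|\partial\phi|(\mu_{t-h}) = |\partial\phi|(\mu_t), \tag{5.48} \]
observing that in this case
\[ \exists\ \frac{d}{dt}\phi(\mu_t) = -|\partial\phi|^2(\mu_t) \tag{5.49} \]
and (by the $\mathscr{L}^1$-a.e. equality of $|\mu'_t|$ and $|\partial\phi|(\mu_t)$ and the monotonicity of
$e^{\lambda t}|\partial\phi|(\mu_t)$)
\[ \frac{W_2(\mu_{t-h},\mu_t)}{h} \le \frac{1}{h}\int_{t-h}^t|\mu'_s|\,ds = \frac{1}{h}\int_{t-h}^t|\partial\phi|(\mu_s)\,ds \le \frac{1 - e^{-\lambda h}}{\lambda h}\,|\partial\phi|(\mu_{t-h}), \]
and therefore
\[ \limsup_{h\downarrow 0}\Bigl\|\frac{\boldsymbol{t}_{\mu_t}^{\mu_{t-h}} - \boldsymbol{i}}{h}\Bigr\|_{L^2(\mu_t;\mathbb{R}^d)} = \limsup_{h\downarrow 0}\frac{W_2(\mu_{t-h},\mu_t)}{h} \le |\partial\phi|(\mu_t) \quad \text{if } t \in (0,+\infty)\setminus C. \tag{5.50} \]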
5.3 Existence of Gradient Flows by convergence of the “Minimizing Movement” scheme
The existence of solutions to the Cauchy problem for (5.3) will be obtained
as the limit of a variational approximation scheme (the “Minimizing Movement”
scheme, in De Giorgi’s terminology [25]), which we will briefly recall.
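The variational approximation scheme. Let us introduce a uniform partition $P_\tau$ of $(0,+\infty)$ by intervals $I_\tau^n$ of size $\tau > 0$
\[ P_\tau := \bigl\{0 < t_\tau^1 = \tau < t_\tau^2 = 2\tau < \cdots < t_\tau^n = n\tau < \cdots\bigr\}, \qquad I_\tau^n := ((n-1)\tau, n\tau], \]
and a given family of “discrete” values $M_\tau^0$ approximating the initial value $\mu_0 \in D(\phi)$ so that
\[ M_\tau^0 \to \mu_0 \ \text{in } \mathscr{P}_2(\mathbb{R}^d), \qquad \phi(M_\tau^0) \to \phi(\mu_0) \qquad \text{as } \tau \downarrow 0. \tag{5.51} \]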
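If (5.1c) and (5.1d) are satisfied, for every $\tau \in (0,\tau_*)$ we can find sequences
$(M_\tau^n)_{n\in\mathbb{N}} \subset \mathscr{P}_2^a(\mathbb{R}^d)$ recursively defined by solving the variational problem
\[ M_\tau^n \ \text{minimizes} \quad \mu \mapsto \Phi(\tau, M_\tau^{n-1}; \mu) := \frac{1}{2\tau}W_2^2(\mu, M_\tau^{n-1}) + \phi(\mu). \tag{5.52} \]
We call “discrete solution” the piecewise constant interpolant
\[ \overline{M}_\tau(t) := M_\tau^n \quad \text{if } t \in ((n-1)\tau, n\tau], \tag{5.53} \]
and we say that a curve $\mu_t$ is a Minimizing Movement of Φ starting from $\mu_0$,
writing $\mu_t \in MM(\Phi; \mu_0)$, if there exists a family of discrete solutions $\overline{M}_\tau$ such
that
\[ \overline{M}_\tau(t) \to \mu_t \ \text{in } \mathscr{P}_2(\mathbb{R}^d) \ \text{for every } t > 0, \ \text{as } \tau \downarrow 0. \tag{5.54} \]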
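In order to clarify why this variational scheme provides an approximation of
the Gradient Flow equation (5.3), we introduce the optimal transport maps
$\boldsymbol{t}_\tau^n = \boldsymbol{t}_{M_\tau^n}^{M_\tau^{n-1}}$ pushing $M_\tau^n$ to $M_\tau^{n-1}$, and we define the discrete velocity vector
$\boldsymbol{V}_\tau^n$ as $(\boldsymbol{i} - \boldsymbol{t}_\tau^n)/\tau$. By Lemma 4.4 and Theorem 4.19
\[ -\boldsymbol{V}_\tau^n = \frac{\boldsymbol{t}_\tau^n - \boldsymbol{i}}{\tau} \in \partial\phi(M_\tau^n), \tag{5.55} \]
which can be considered as an implicit Euler discretization of (5.3). By introducing the piecewise constant interpolant
\[ \overline{\boldsymbol{V}}_\tau(t) := \boldsymbol{V}_\tau^n \quad \text{if } t \in ((n-1)\tau, n\tau], \tag{5.56} \]
the identity (5.55) reads
\[ -\overline{\boldsymbol{V}}_\tau(t) \in \partial\phi(\overline{M}_\tau(t)) \quad \text{for } t > 0. \tag{5.57} \]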
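By a general compactness argument it would not be difficult to show that, up to
subsequences, $\overline{\boldsymbol{V}}_\tau\overline{M}_\tau \rightharpoonup \boldsymbol{v}\mu$ in the distribution sense in $\mathbb{R}^d\times(0,+\infty)$, for some
vector field $\boldsymbol{v}(t,x) = \boldsymbol{v}_t(x)$ satisfying
\[ \partial_t\mu_t + \nabla\cdot(\boldsymbol{v}_t\mu_t) = 0 \ \text{in } \mathbb{R}^d\times(0,+\infty), \qquad \|\boldsymbol{v}_t\|_{L^2(\mu_t;\mathbb{R}^d)} \in L^2_{loc}(0,+\infty). \tag{5.58} \]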
The main difficulty is to show that the nonlinear equation (5.57) is preserved in
the limit.
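To make the scheme (5.52)-(5.54) concrete, here is a minimal numerical sketch in a toy case that is not treated in the text above: the potential energy $\phi(\mu) = \int \frac{\lambda}{2}x^2\,d\mu$ on $\mathscr{P}_2(\mathbb{R})$, with $\mu$ an empirical measure of equally weighted particles. This functional does not satisfy (5.1c), so the example is only an illustration of the recursion. For a potential energy, one step of (5.52) is the push-forward of $M_\tau^{n-1}$ under the proximal map of $\tau V$, here $x \mapsto x/(1+\lambda\tau)$. All function names are hypothetical.

```python
import math

def jko_step(particles, tau, lam=1.0):
    """One step of (5.52) for phi(mu) = int (lam/2) x^2 dmu (toy assumption).

    The minimizer is the push-forward of the previous measure under the
    proximal map of tau*V, which for V(x) = lam*x^2/2 is x -> x/(1 + lam*tau).
    """
    return [x / (1.0 + lam * tau) for x in particles]

def w2(xs, ys):
    """W_2 between two empirical measures on R with the same number of
    equally weighted atoms: the optimal coupling matches sorted points."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))

def discrete_solution(m0, tau, n_steps, lam=1.0):
    """Value of the piecewise constant interpolant (5.53) at t = n_steps*tau."""
    m = list(m0)
    for _ in range(n_steps):
        m = jko_step(m, tau, lam)
    return m

def exact_flow(m0, t, lam=1.0):
    """Gradient flow of the potential energy: each particle solves x' = -lam*x."""
    return [x * math.exp(-lam * t) for x in m0]

m0 = [-1.0, 0.5, 2.0]
T = 1.0
# Wasserstein error at time T for time steps tau = T/10, T/100, T/1000
errors = [w2(discrete_solution(m0, T / n, n), exact_flow(m0, T))
          for n in (10, 100, 1000)]
```

In this toy case the error decreases roughly linearly in τ, which is the behaviour one expects from an implicit Euler discretization.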
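Theorem 5.8 (Existence and approximation of Gradient Flows) Let us
assume that $\phi : \mathscr{P}_2(\mathbb{R}^d) \to (-\infty,+\infty]$ satisfies (5.1a,b,c) and at least one of the
following two conditions:
i) (Coercivity) For every $C > 0$ the sublevels
\[ \bigl\{\mu \in \mathscr{P}_2(\mathbb{R}^d) : \phi(\mu) \le C, \ m_2(\mu) \le C\bigr\} \quad \text{are compact in } \mathscr{P}_2(\mathbb{R}^d). \tag{5.59} \]
ii) (Generalized convexity) For every $\mu \in D(|\partial\phi|)$ and $\sigma_0, \sigma_1 \in D(\phi)$
\[ \text{the map } s \mapsto \phi(\sigma_s) - \frac{\lambda}{2}W_2^2(\sigma_0,\sigma_1)\,s^2 \ \text{is convex in } [0,1], \quad \text{where } \sigma_s := \bigl((1-s)\boldsymbol{t}_\mu^{\sigma_0} + s\,\boldsymbol{t}_\mu^{\sigma_1}\bigr)_\#\mu. \tag{5.60} \]
Then for every $\mu_0 \in D(\phi)$ there exists a unique solution $\mu_t$ of the gradient flow
(according to Definition 5.2) satisfying the Cauchy condition
\[ \lim_{t\downarrow 0}\mu_t = \mu_0 \quad \text{in } \mathscr{P}_2(\mathbb{R}^d). \tag{5.61} \]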
Moreover, for every choice of the discrete initial values Mτ0 satisfying (5.51), the
discrete solutions M τ (t) converge to µt in P2 (Rd ), uniformly in each bounded
time interval.
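Finally, if condition ii) holds with $\lambda \ge 0$ and $M_\tau^0 = \mu_0 \in D(\phi)$, for every
$t = k\tau \in P_\tau$ we have the a priori error estimate
\[ W_2^2(\mu_t, \overline{M}_\tau(t)) \le \tau\bigl(\phi(\mu_0) - \phi_\tau(\mu_0)\bigr) \le \frac{\tau^2}{2}|\partial\phi|^2(\mu_0), \tag{5.62} \]
where we set
\[ \phi_\tau(\mu) := \inf_{\nu\in\mathscr{P}_2(\mathbb{R}^d)}\Bigl\{\phi(\nu) + \frac{1}{2\tau}W_2^2(\mu,\nu)\Bigr\} = \inf_{\nu\in\mathscr{P}_2(\mathbb{R}^d)}\Phi(\tau,\mu;\nu). \tag{5.63} \]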
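As a sanity check of the chain of inequalities in (5.62), one can evaluate the three quantities in closed form in a toy case that lies outside the absolutely continuous setting of (5.1c): $\mu_0 = \delta_{x_0}$ and $\phi(\mu) = \int \frac{1}{2}x^2\,d\mu$. The sketch below is only an illustration under these assumptions, not an instance of the theorem; the function name and the closed forms in the comments are specific to this example.

```python
import math

def error_estimate_terms(x0, tau, k):
    """Return the three terms of (5.62) at t = k*tau for mu_0 = delta_{x0}
    and phi(mu) = int x^2/2 dmu (toy assumption; closed forms below are
    specific to this case)."""
    t = k * tau
    exact = x0 * math.exp(-t)              # gradient flow: x' = -x
    discrete = x0 / (1.0 + tau) ** k       # k proximal (implicit Euler) steps
    w2_sq = (exact - discrete) ** 2        # W_2 between Dirac masses is |a - b|
    phi0 = 0.5 * x0 ** 2
    # Moreau-Yosida value: min over y of y^2/2 + (x0 - y)^2 / (2*tau)
    phi_tau0 = 0.5 * x0 ** 2 / (1.0 + tau)
    slope_sq = x0 ** 2                     # squared slope at delta_{x0}: |V'(x0)|^2
    return w2_sq, tau * (phi0 - phi_tau0), 0.5 * tau ** 2 * slope_sq

checks = [error_estimate_terms(2.0, tau, k)
          for tau in (0.5, 0.1, 0.01)
          for k in (1, 5, 50)]
ordered = all(a <= b <= c for a, b, c in checks)
```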
Proof. We shall only give a brief sketch of the proof (showing a rough error
estimate, still sufficient to prove convergence) in a rather simplified setting, by
assuming that the generalized convexity assumption (5.60) holds for λ ≥ 0, φ is
nonnegative, and µ0 , Mτ0 ∈ D(φ).
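As a preliminary remark, let us observe that if $\sigma_s$ is defined as in (5.60) we
have
\[ \begin{aligned}
W_2^2(\mu,\sigma_s) &= \int_{\mathbb{R}^d}\bigl|(1-s)\boldsymbol{t}_\mu^{\sigma_0} + s\,\boldsymbol{t}_\mu^{\sigma_1} - \boldsymbol{i}\bigr|^2\,d\mu \\
 &= \int_{\mathbb{R}^d}\Bigl((1-s)\bigl|\boldsymbol{t}_\mu^{\sigma_0} - \boldsymbol{i}\bigr|^2 + s\,\bigl|\boldsymbol{t}_\mu^{\sigma_1} - \boldsymbol{i}\bigr|^2 - s(1-s)\bigl|\boldsymbol{t}_\mu^{\sigma_0} - \boldsymbol{t}_\mu^{\sigma_1}\bigr|^2\Bigr)\,d\mu \\
 &= (1-s)W_2^2(\mu,\sigma_0) + s\,W_2^2(\mu,\sigma_1) - s(1-s)\int_{\mathbb{R}^d}\bigl|\boldsymbol{t}_\mu^{\sigma_0} - \boldsymbol{t}_\mu^{\sigma_1}\bigr|^2\,d\mu \\
 &\le (1-s)W_2^2(\mu,\sigma_0) + s\,W_2^2(\mu,\sigma_1) - s(1-s)W_2^2(\sigma_0,\sigma_1).
\end{aligned} \tag{5.64} \]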
This inequality reflects a nice convexity property of the functional Φ defined in
(5.52) and provides the starting point of our estimates.
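The second equality in (5.64) rests on the pointwise identity $|(1-s)a + sb|^2 = (1-s)|a|^2 + s|b|^2 - s(1-s)|a-b|^2$, applied with $a = \boldsymbol{t}_\mu^{\sigma_0} - \boldsymbol{i}$ and $b = \boldsymbol{t}_\mu^{\sigma_1} - \boldsymbol{i}$. A small numerical check on random vectors (a sketch; the function name is hypothetical) reads:

```python
import random

def convexity_defect(a, b, s):
    """Difference between |(1-s)a + s b|^2 and
    (1-s)|a|^2 + s|b|^2 - s(1-s)|a-b|^2 for vectors a, b given as lists."""
    lhs = sum(((1 - s) * ai + s * bi) ** 2 for ai, bi in zip(a, b))
    rhs = ((1 - s) * sum(ai ** 2 for ai in a)
           + s * sum(bi ** 2 for bi in b)
           - s * (1 - s) * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return lhs - rhs

random.seed(0)
defects = []
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(3)]
    b = [random.uniform(-1, 1) for _ in range(3)]
    s = random.random()
    defects.append(abs(convexity_defect(a, b, s)))
max_defect = max(defects)  # zero up to floating point rounding
```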
A “metric variational inequality” for $M_\tau^n$. The first step consists in writing a variational inequality for the discrete solution, analogous to (5.7): here we
will use in a crucial way (5.60) and (5.64). In fact, it is easy to see that they
yield the following strong convexity property for the functionals $s \mapsto \Phi(\tau,\mu;\sigma_s)$
\[ \Phi(\tau,\mu;\sigma_s) \le (1-s)\Phi(\tau,\mu;\sigma_0) + s\,\Phi(\tau,\mu;\sigma_1) - \frac{1}{2\tau}s(1-s)W_2^2(\sigma_0,\sigma_1). \tag{5.65} \]
Starting from the minimum property (5.52) and applying (5.65) with $\mu := M_\tau^{n-1}$, $\sigma_0 := M_\tau^n$, $\sigma := \sigma_1 \in D(\phi)$, we get
\[ \begin{aligned}
\Phi(\tau, M_\tau^{n-1}; M_\tau^n) &\le \Phi(\tau, M_\tau^{n-1}; \sigma_s) \\
 &\le (1-s)\Phi(\tau, M_\tau^{n-1}; M_\tau^n) + s\,\Phi(\tau, M_\tau^{n-1}; \sigma) - \frac{1}{2\tau}s(1-s)W_2^2(M_\tau^n,\sigma).
\end{aligned} \]
The minimum condition says that the right derivative at $s = 0$ of the right hand
side is nonnegative; thus we find
\[ \Phi(\tau, M_\tau^{n-1}; \sigma) - \Phi(\tau, M_\tau^{n-1}; M_\tau^n) - \frac{1}{2\tau}W_2^2(M_\tau^n,\sigma) \ge 0 \quad \forall\,\sigma \in D(\phi), \tag{5.66} \]
which can also be written as
\[ \frac{1}{\tau}\Bigl(\frac{1}{2}W_2^2(M_\tau^n,\sigma) - \frac{1}{2}W_2^2(M_\tau^{n-1},\sigma)\Bigr) \le \phi(\sigma) - \phi(M_\tau^n) - \frac{1}{2\tau}W_2^2(M_\tau^n, M_\tau^{n-1}). \tag{5.67} \]
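A continuous formulation of (5.67). We want to write (5.67) as a true
differential evolution inequality for the discrete solution $\overline{M}_\tau$, in order to compare
two discrete solutions corresponding to different time steps $\tau, \eta > 0$, and to try
to reproduce the same comparison argument which we used in Theorem 5.5.
Therefore, we set
$\phi_\tau(t)$ := “the linear interpolant of $\phi(M_\tau^{n-1})$ and $\phi(M_\tau^n)$” if $t \in (t_\tau^{n-1}, t_\tau^n]$,
i.e.
\[ \phi_\tau(t) := \frac{t_\tau^n - t}{\tau}\,\phi(M_\tau^{n-1}) + \frac{t - t_\tau^{n-1}}{\tau}\,\phi(M_\tau^n), \qquad t \in (t_\tau^{n-1}, t_\tau^n]. \tag{5.68} \]
Analogously, for any $\sigma \in D(\phi)$ we set
\[ W_\tau^2(t;\sigma) := \frac{t_\tau^n - t}{\tau}\,W_2^2(M_\tau^{n-1},\sigma) + \frac{t - t_\tau^{n-1}}{\tau}\,W_2^2(M_\tau^n,\sigma), \qquad t \in (t_\tau^{n-1}, t_\tau^n]. \tag{5.69} \]
Since
\[ \frac{d}{dt}W_\tau^2(t;\sigma) = \frac{1}{\tau}\Bigl(W_2^2(M_\tau^n,\sigma) - W_2^2(M_\tau^{n-1},\sigma)\Bigr), \qquad t \in (t_\tau^{n-1}, t_\tau^n], \]
neglecting the last negative term, (5.67) becomes
\[ \frac{d}{dt}\frac{1}{2}W_\tau^2(t;\sigma) \le \phi(\sigma) - \phi_\tau(t) + \frac{1}{2}R_\tau(t) \quad \forall\,t \in (0,T)\setminus P_\tau, \tag{5.70} \]
where we set, for $t \in (t_\tau^{n-1}, t_\tau^n]$,
\[ \frac{1}{2}R_\tau(t) := \phi_\tau(t) - \phi(M_\tau^n) = \frac{t_\tau^n - t}{\tau}\Bigl(\phi(M_\tau^{n-1}) - \phi(M_\tau^n)\Bigr) \ge 0. \tag{5.71} \]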
The comparison argument. We consider now another time step $\eta > 0$
inducing the partition $P_\eta$, a corresponding discrete solution $(M_\eta^k)$, and the
piecewise linear interpolating functions
\[ W_{\tau,\eta}^2(t,s) := \frac{s - t_\eta^{k-1}}{\eta}\,W_\tau^2(t, M_\eta^k) + \frac{t_\eta^k - s}{\eta}\,W_\tau^2(t, M_\eta^{k-1}), \qquad s \in (t_\eta^{k-1}, t_\eta^k], \tag{5.72} \]
observing that
\[ W_{\tau,\eta}^2(t,s) = W_{\eta,\tau}^2(s,t) \quad \forall\,s, t \ge 0, \qquad W_{\tau,\eta}^2(t_\tau^n, t_\eta^k) = W_2^2(M_\tau^n, M_\eta^k). \tag{5.73} \]
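Taking a convex combination w.r.t. the variable $s \in I_\eta^k$ of (5.70) written for
$\sigma := M_\eta^{k-1}$ and $\sigma := M_\eta^k$, we easily get
\[ \frac{\partial}{\partial t}\frac{1}{2}W_{\tau,\eta}^2(t,s) \le \phi_\eta(s) - \phi_\tau(t) + \frac{1}{2}R_\tau(t), \qquad t \in (0,+\infty)\setminus P_\tau, \ s > 0. \tag{5.74} \]
Reversing the rôles of η and τ, and recalling (5.73), we also find
\[ \frac{\partial}{\partial s}\frac{1}{2}W_{\tau,\eta}^2(t,s) \le \phi_\tau(t) - \phi_\eta(s) + \frac{1}{2}R_\eta(s), \qquad t > 0, \ s \in (0,+\infty)\setminus P_\eta. \tag{5.75} \]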
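Summing up (5.74) and (5.75) and multiplying by 2 we end up with
\[ \frac{\partial}{\partial t}W_{\tau,\eta}^2(t,s) + \frac{\partial}{\partial s}W_{\tau,\eta}^2(t,s) \le R_\tau(t) + R_\eta(s), \qquad t \in (0,+\infty)\setminus P_\tau, \ s \in (0,+\infty)\setminus P_\eta. \tag{5.76} \]
Choosing $s = t$ we eventually find
\[ \frac{d}{dt}W_{\tau,\eta}^2(t,t) \le R_\tau(t) + R_\eta(t), \qquad t \in (0,\infty)\setminus(P_\tau \cup P_\eta), \tag{5.77} \]
and therefore, being $t \mapsto W_{\tau,\eta}^2(t,t)$ continuous,
\[ W_{\tau,\eta}^2(T,T) \le W_{\tau,\eta}^2(0,0) + \int_0^T\bigl(R_\tau(t) + R_\eta(t)\bigr)\,dt \quad \forall\,T > 0. \tag{5.78} \]
Observe now that
\[ \int_0^{+\infty}R_\tau(t)\,dt = \sum_{j=1}^{+\infty}\int_{t_\tau^{j-1}}^{t_\tau^j}R_\tau(t)\,dt = \sum_{j=1}^{+\infty}\tau\bigl(\phi(M_\tau^{j-1}) - \phi(M_\tau^j)\bigr) \le \tau\,\phi(M_\tau^0), \tag{5.79} \]
so that (5.78) yields
\[ W_{\tau,\eta}^2(T,T) \le W_2^2(M_\tau^0, M_\eta^0) + \tau\,\phi(M_\tau^0) + \eta\,\phi(M_\eta^0) \quad \forall\,T > 0. \tag{5.80} \]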
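Convergence and rough error estimates. Recalling that
\[ W_2^2(M_\tau^n, M_\tau^{n-1}) \le \tau\,\phi(M_\tau^0), \qquad W_2^2(M_\eta^k, M_\eta^{k-1}) \le \eta\,\phi(M_\eta^0), \]
and
\[ W_2^2(\overline{M}_\tau(t), \overline{M}_\eta(t)) \le 3\Bigl(W_{\tau,\eta}^2(t,t) + W_2^2(M_\tau^n, M_\tau^{n-1}) + W_2^2(M_\eta^k, M_\eta^{k-1})\Bigr), \qquad t \in I_\tau^n \cap I_\eta^k, \]
we get
\[ \sup_{t\ge 0}W_2^2(\overline{M}_\tau(t), \overline{M}_\eta(t)) \le 3\Bigl(W_2^2(M_\tau^0, M_\eta^0) + 2\tau\,\phi(M_\tau^0) + 2\eta\,\phi(M_\eta^0)\Bigr), \tag{5.81} \]
thus showing that $\tau \mapsto \overline{M}_\tau(t)$ is a Cauchy sequence in $\mathscr{P}_2(\mathbb{R}^d)$ for every $t \ge 0$.
Denoting by $\mu_t$ its limit, we can pass to the limit in (5.80) as $\eta \downarrow 0$ by taking τ
fixed and choosing $t \in P_\tau$, thus obtaining the error estimate
\[ \sup_{t\in P_\tau}W_2^2(\overline{M}_\tau(t), \mu_t) \le W_2^2(M_\tau^0, \mu_0) + \tau\,\phi(M_\tau^0). \tag{5.82} \]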
$\mu_t$ is the gradient flow. To this aim, it suffices to check that $\mu_t$ satisfies the
metric Evolution Variational Inequality (5.7) with $\lambda = 0$ for every $\sigma \in D(\phi)$.
Starting from the integrated form of (5.70) and recalling (5.79), we get for every
$0 < a < b < +\infty$
\[ \frac{1}{2}W_\tau^2(b,\sigma) - \frac{1}{2}W_\tau^2(a,\sigma) + \int_a^b\phi_\tau(t)\,dt \le (b-a)\,\phi(\sigma) + \tau\,\phi(M_\tau^0). \tag{5.83} \]
Since
\[ \lim_{\tau\downarrow 0}W_\tau^2(t,\sigma) = W_2^2(\mu_t,\sigma), \qquad \liminf_{\tau\downarrow 0}\phi_\tau(t) \ge \phi(\mu_t), \qquad \lim_{\tau\downarrow 0}\phi(M_\tau^0) = \phi(\mu_0) < +\infty, \]
we easily get
\[ \frac{1}{2}W_2^2(\mu_b,\sigma) - \frac{1}{2}W_2^2(\mu_a,\sigma) + \int_a^b\phi(\mu_t)\,dt \le (b-a)\,\phi(\sigma) \quad \forall\,\sigma \in D(\phi), \tag{5.84} \]
which yields (5.7). The regularization estimates of Theorem 5.7 (which depend
only on the metric E.V.I. formulation) show then that $\mu_t \in \mathscr{P}_2^a(\mathbb{R}^d)$ for $t > 0$.
5.4 Bibliographical notes
There are at least four possible approaches to gradient flows which can be
adapted to the framework of Wasserstein spaces:
1. The “Minimizing Movement” approximation. We can simply consider
any limit curve of the variational approximation scheme we introduced in
Section 5.3, a “Generalized minimizing movement” in the terminology suggested by E. De Giorgi in [25]. In the context of P2 (Rd ) this procedure
has been first used in [45, 54, 55, 53, 56] and subsequently it has been
applied in many different contexts, e.g. by [44, 52, 57, 38, 39, 42, 35, 18,
19, 1, 40, 32, ?]. It has the advantage of allowing for the greatest generality
of functionals φ, it provides a simple constructive method for proving existence of gradient flows, and it can be applied to arbitrary metric spaces,
in particular to Pp (Rd ), the space of probability measures endowed with
the p-Wasserstein distance.
2. Curves of Maximal Slope. We can look for absolutely continuous curves
$\mu_t \in AC^2_{loc}\bigl((0,+\infty); \mathscr{P}_2(\mathbb{R}^d)\bigr)$ which satisfy the differential form of the
Energy inequality
\[ \frac{d}{dt}\phi(\mu_t) \le -\frac{1}{2}|\mu'|^2(t) - \frac{1}{2}|\partial\phi|^2(\mu_t) \le -|\partial\phi|(\mu_t)\cdot|\mu'|(t) \tag{5.85} \]
for $\mathscr{L}^1$-a.e. $t \in (0,+\infty)$. This definition has been introduced (in a slightly
different form) in [26] and further developed in [27, 49, ?]; it is still purely
metric and it provides a general strategy to deduce differential properties
satisfied by the limit curves of the Minimizing Movement scheme.
3. The pointwise differential formulation. It is the notion we adopted in
Definition 5.2 and which requires the richest structure: since we have at
our disposal a notion of tangent space and the related concepts of velocity
vector field $\boldsymbol{v}_t$ and (sub)differential $\partial\phi(\mu_t)$, we can reproduce the simple
definition of gradient flow modeled on smooth Riemannian manifolds, i.e.
\[ \boldsymbol{v}_t \in -\partial\phi(\mu_t). \tag{5.86} \]
The a priori assumption that $\mu_t \in \mathscr{P}_2^a(\mathbb{R}^d)$ avoids subtle technical complications arising from the introduction of “plan” (or measure valued)
subdifferentials instead of the simpler vector fields. The general theory,
also covering the case of an underlying Hilbert space of infinite dimension,
has been presented in [?]. A similar approach has been developed in [?].
4. Systems of Evolution Variational Inequalities (E.V.I.). In the case of
λ-convex functionals along geodesics in $\mathscr{P}_2(\mathbb{R}^d)$, one can try to find solutions of the family of “metric” variational inequalities
\[ \frac{1}{2}\frac{d}{dt}W_2^2(\mu_t,\nu) \le \phi(\nu) - \phi(\mu_t) - \frac{\lambda}{2}W_2^2(\mu_t,\nu) \quad \forall\,\nu \in D(\phi). \tag{5.87} \]
This formulation provides the best kind of solutions, for which in particular one can prove not only uniqueness, but also error estimates and the
regularization effect, following the same argument of Theorem 5.7.
In any case, the variational approximation scheme is the basic tool for proving
existence of gradient flows: at the highest level of generality, when the functional
φ does not satisfy any convexity or regularity assumption, one can only hope
to prove the existence of a limit curve which will satisfy a sort of “relaxed”
differential equation: this point of view is developed in [8, Sec. 11.1.3].
When φ satisfies the λ-convexity or compactness conditions of Theorem 5.8, Theorems 5.3 and 5.8
show that all these notions essentially coincide.
We shall see three different kinds of arguments which give some
insights for this problem and reflect different properties of the functional.
1. The first one is a direct development of the compactness method:
passing to the limit in the discrete equation satisfied at each
step by the approximating sequence Mτn , one tries to write a
relaxed form of the limit differential equation, assuming only
narrow convergences of weak type. It may happen that under
suitable closure and convexity assumptions on the sections of
the subdifferential, which should be checked in each particular
situation, this relaxed version coincides with the stronger one,
and therefore one gets an effective solution to (5.3).
In general, however, even under some simplifying assumptions
(p = 2, all the measures are regular), the results of this direct
approach are not completely satisfactory: it could be considered
as a first basic step, which should be common to each attempt
to apply the Wasserstein formalism for studying a gradient flow.
In order to clarify the basic arguments of this preliminary strategy, we will try to explain it at the end of this section (in a
simplified setting, to keep the presentation easier) without invoking all the abstract results of the first part of this book.
2. The second approach (see Section ??) involves the regularity of
the functional according to Definition ??, and works for every
p > 1; in particular it can be applied to λ-convex functionals.
In this case, thanks to Theorem ??, the gradient flow equation
is equivalent to the maximal slope condition, which is of purely
metric nature. We can then apply the abstract theory we presented in Chapter 2 and therefore we can prove that any limit
curve µ of (5.56) is a solution to (5.3).
The key ingredient, which allows one to pass to the limit in the
“doubly nonlinear” differential inclusion (without any restrictions on the regularity of µt ) and to gain a better insight on
the limit than the previous simpler method, is the refined discrete energy estimate (??) (related to De Giorgi’s variational
interpolation (??)) and the lower semicontinuity of the slope,
which follows from the regularity of the functional.
3. The last approach, presented in Section 5.2, is based on the general estimates of Theorem ?? of Chapter 4: it can be performed
only in P2 (Rd ) and imposes on the functionals the strongest
condition of λ-convexity along generalized geodesics.
Despite the strong convexity requirements on φ, which are nevertheless satisfied by all the examples of Section 4.5 in P2 (Rd ),
this approach has many nice features:
• it does not require compactness assumptions of the sublevels of φ in P2 (Rd ): the convergence of the “Minimizing
movement” scheme is proved by a Cauchy-type estimate.
• The gradient flow equation (5.3) is satisfied in the limit,
since Theorem ?? provides directly the system of evolution
variational inequalities (5.7).
• It provides our strongest results in terms of regularity,
asymptotic behaviour, and error estimates for the continuous solution, which can be directly derived from the general
metric setting.
• It allows for general initial data µ0 which belong to the
closure of the domain of φ: in particular, one can often
directly consider a sort of “nonlinear fundamental solution”
for initial values which are concentrated in one point.
• The Γ-convergence of functionals, in the sense of Lemma
??, induces the uniform convergence of the corresponding
gradient flows.
6 Applications to Evolution PDE’s
First of all we mention the basic (but formal, at this level) example, which
provides one of the main motivations to study this kind of gradient flows.
6.1 Gradient flows and evolutionary PDE’s of diffusion type
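In the space-time open cylinder $\mathbb{R}^d\times(0,+\infty)$ we look for nonnegative solutions
$u : \mathbb{R}^d\times(0,+\infty) \to \mathbb{R}$ of a parabolic equation of the type
\[ \partial_t u - \nabla\cdot\Bigl(u\,\nabla\frac{\delta\mathcal{F}}{\delta u}\Bigr) = 0 \quad \text{in } \mathbb{R}^d\times(0,+\infty), \tag{6.1} \]
where
\[ \frac{\delta\mathcal{F}(u)}{\delta u} = -\nabla\cdot F_p(x,u,\nabla u) + F_z(x,u,\nabla u) \tag{6.2} \]
is the first variation of a typical integral functional as in (4.59a,b),
\[ \mathcal{F}(u) = \int_{\mathbb{R}^d}F(x,u(x),\nabla u(x))\,dx, \tag{6.3} \]
associated to a (smooth) Lagrangian $F = F(x,z,p) : \mathbb{R}^d\times[0,+\infty)\times\mathbb{R}^d \to \mathbb{R}$.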
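Observe that (6.1) has the following structure:
\[ \partial_t u + \nabla\cdot(u\boldsymbol{v}) = 0 \qquad \text{(continuity equation),} \tag{6.4a} \]
\[ u\boldsymbol{v} = u\nabla\psi \qquad \text{(gradient condition),} \tag{6.4b} \]
\[ \psi = -\frac{\delta\mathcal{F}(u)}{\delta u} \qquad \text{(nonlinear relation).} \tag{6.4c} \]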
Observe that in the case when $F$ depends only on $z = u$ we have
\[ \frac{\delta\mathcal{F}(u)}{\delta u} = F_z(u), \qquad u\nabla F_z(u) = \nabla L_F(u), \qquad L_F(z) := zF'(z) - F(z). \tag{6.5} \]
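Two standard special cases illustrate (6.5); the choices of $F$ below are examples selected here for illustration. Since $u\nabla F_z(u) = \nabla L_F(u)$, equation (6.1) reduces to $\partial_t u - \Delta L_F(u) = 0$, and one computes:

```latex
% Entropy: F(z) = z\log z
F'(z) = \log z + 1, \qquad
L_F(z) = z(\log z + 1) - z\log z = z,
\qquad\text{so (6.1) reads}\quad \partial_t u - \Delta u = 0 .
% Power law: F(z) = \frac{z^m}{m-1}, \quad m \ne 1
F'(z) = \frac{m}{m-1}\,z^{m-1}, \qquad
L_F(z) = \frac{m\,z^m - z^m}{m-1} = z^m,
\qquad\text{so (6.1) reads}\quad \partial_t u - \Delta u^m = 0 .
```

The first case is the heat equation, the second the porous medium equation.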
Since we look for nonnegative solutions having (constant, by (6.4a), normalized)
finite mass
\[ u(x,t) \ge 0, \qquad \int_{\mathbb{R}^d}u(x,t)\,dx = 1 \quad \forall\,t \ge 0, \tag{6.6} \]
and finite quadratic moment
\[ \int_{\mathbb{R}^d}|x|^2u(x,t)\,dx < +\infty \quad \forall\,t \ge 0, \tag{6.7} \]
recalling Example 4.5.1, we can
\[ \text{identify } u \text{ with the measures } \mu_t := u(\cdot,t)\cdot\mathscr{L}^d, \tag{6.8} \]
and we consider F as a functional defined in P2 (Rd ). Then any smooth positive
function u is a solution of the system (6.4a,b,c) if and only if µ is a solution in
P2 (Rd ) of the Gradient Flow equation (5.3) for the functional F .
Observe that (6.4a) coincides with (5.4b), the gradient constraint (6.4b)
corresponds to the tangent condition $\boldsymbol{v}_t \in \mathrm{Tan}_{\mu_t}\mathscr{P}_2(\mathbb{R}^d)$ of (5.4a), and the
nonlinear coupling $\psi = -\delta\mathcal{F}(u)/\delta u$ is equivalent to the differential inclusion
$\boldsymbol{v}_t \in -\partial\mathcal{F}(\mu_t)$ of (5.4c).
At this level of generality the equivalence between the system (6.4a,b,c) and
the evolution equation (5.3) is known only for smooth solutions (which, by the
way, may not exist); nevertheless, the point of view of gradient flow in the
Wasserstein spaces, which was introduced by F. Otto in a series of pioneering
and enlightening papers [54, 45, 56, 57], still presents some interesting features,
whose role should be discussed in each concrete case.
6.1.1 Changing the reference measure
In many situations the choice of the Lebesgue measure L d as a reference measure, thus inducing the identification (6.8), looks quite natural; nevertheless
there are some interesting cases where a different measure γ plays a crucial role
(see e.g. the example of Section 4.5.4 and the next Section 6.3) and it may happen that an evolution PDE takes a simpler form by an appropriate choice of
γ.
From the Wasserstein point of view, an integral functional φ inducing the
gradient flow is defined on measures µ but its explicit form depends on the
reference γ, so that different PDE’s involving the density of µ w.r.t. γ could
arise from the same functional.
Let us suppose, e.g., that φ takes the integral form
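\[
\phi(\mu) = \mathcal{F}_\gamma(\rho) = \int_{\mathbb{R}^d} \tilde F(x,\rho(x),\nabla\rho(x))\,d\gamma(x) \quad\text{if } \mu = \rho\gamma, \tag{6.9}
\]
where $\gamma$ is a Radon measure induced by the (smooth) potential $V$, i.e.
\[
\gamma := e^{-V}\mathcal{L}^d \in \mathcal{P}_2(\mathbb{R}^d). \tag{6.10}
\]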
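Since
\[
u = \frac{d\mu}{d\mathcal{L}^d} = e^{-V}\rho,\qquad \nabla u = e^{-V}(\nabla\rho - \rho\nabla V),\qquad\text{i.e.}\quad \rho = e^{V}u,\quad \nabla\rho = e^{V}(u\nabla V + \nabla u), \tag{6.11}
\]
the integrand $\tilde F(x,\tilde z,\tilde p)$ of (6.9) is related to the integrand $F$ of the representation (6.3) by the relation
\[
F(x,z,p) = e^{-V(x)}\tilde F(x,\tilde z,\tilde p) = e^{-V(x)}\tilde F\big(x, e^{V(x)}z, e^{V(x)}(z\nabla V(x) + p)\big),\qquad \tilde z := e^{V(x)}z,\quad \tilde p := e^{V(x)}(z\nabla V(x) + p). \tag{6.12}
\]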
In this case it could be better to write the solution of the gradient flow µt
generated by φ in terms of the density
\[
\rho_t := \frac{d\mu_t}{d\gamma} = e^{V}\frac{d\mu_t}{d\mathcal{L}^d}, \tag{6.13}
\]
and to use the differential operators associated with γ
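\[
\nabla_\gamma\rho := e^{V}\nabla\big(e^{-V}\rho\big) = \nabla\rho - \rho\nabla V, \tag{6.14a}
\]
\[
\nabla_\gamma\cdot\xi := e^{V}\nabla\cdot\big(e^{-V}\xi\big) = \nabla\cdot\xi - \xi\cdot\nabla V, \tag{6.14b}
\]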
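which satisfy the “integration by parts formulae” with respect to the measure $\gamma$
\[
\int_{\mathbb{R}^d}\xi\cdot\nabla\zeta\,d\gamma = -\int_{\mathbb{R}^d}\zeta\,\nabla_\gamma\cdot\xi\,d\gamma,\qquad
\int_{\mathbb{R}^d}(\nabla\cdot\xi)\,\zeta\,d\gamma = -\int_{\mathbb{R}^d}\nabla_\gamma\zeta\cdot\xi\,d\gamma, \tag{6.15}
\]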
when ζ ∈ Cc∞ (Rd ), ξ ∈ Cc∞ (Rd ; Rd ). The system (6.4a,b,c) preserves the same
structure and takes the form
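\[
\partial_t\rho + \nabla_\gamma\cdot(\rho v) = 0 \quad\text{(continuity equation)}, \tag{6.16a}
\]
\[
\rho v = \rho\nabla\psi \quad\text{(gradient condition)}, \tag{6.16b}
\]
\[
\psi = -\frac{\delta\mathcal{F}_\gamma(\rho)}{\delta\rho} \quad\text{(nonlinear relation)}, \tag{6.16c}
\]
where
\[
\frac{\delta\mathcal{F}_\gamma(\rho)}{\delta\rho} = -\nabla_\gamma\cdot\tilde F_{\tilde p}(x,\rho,\nabla\rho) + \tilde F_{\tilde z}(x,\rho,\nabla\rho). \tag{6.17}
\]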
For, (6.16a) (resp. (6.16b)) can be transformed into (6.4a) (resp. (6.4b)), simply
by multiplying the equation by e−V and recalling (6.14b). The equivalence of
(6.16c) and (6.4c) follows by a direct computation starting from (6.12): by
(6.11) we get (with the obvious convention to evaluate F in (x, u, ∇u) and F̃ in
(x, ρ, ∇ρ))
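\[
\frac{\delta\mathcal{F}(u)}{\delta u} = F_z - \nabla\cdot F_p = \tilde F_{\tilde z} + \nabla V\cdot\tilde F_{\tilde p} - \nabla\cdot\tilde F_{\tilde p} = \tilde F_{\tilde z} - \nabla_\gamma\cdot\tilde F_{\tilde p} = \frac{\delta\mathcal{F}_\gamma(\rho)}{\delta\rho}.
\]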
Remark 6.1 (Equations in bounded sets and Neumann B.C.) The possibility to change the reference measure is also useful to study evolution equations in a bounded open set $\Omega\subset\mathbb{R}^d$: they correspond to a measure $\gamma$ whose support is included in $\Omega$, e.g.
\[
\gamma := \mathcal{L}^d|_\Omega.
\]
Observe that in any case the family of time-dependent measures $\mu_t = u_t\mathcal{L}^d|_\Omega$, which solves the gradient flow equation according to Definition 5.2, still satisfies the continuity equation (5.4b) in $\mathbb{R}^d\times(0,+\infty)$. This can be seen as a weak formulation of the continuity equation for $u_t$ in $\Omega\times(0,+\infty)$ with Neumann boundary conditions on $\partial\Omega\times(0,+\infty)$:
\[
\partial_t u_t + \nabla\cdot(u_t v_t) = 0 \quad\text{in } \Omega\times(0,+\infty),\qquad u_t v_t\cdot n = 0 \quad\text{on } \partial\Omega\times(0,+\infty). \tag{6.18}
\]
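For, since $q := uv \in L^1(\Omega\times(a,b))$ for every bounded interval with $0 < a < b < +\infty$, by choosing a test function $\zeta(x,t) = \eta(t)\varphi(x)$ with $\eta\in C_c^\infty(0,\infty)$, $\varphi\in C_c^\infty(\mathbb{R}^d)$, we obtain
\[
\int_\Omega\Big(\int_0^{+\infty}\eta(t)\,q(x,t)\,dt\Big)\cdot\nabla\varphi(x)\,dx = -\int_\Omega\Big(\int_0^{+\infty}\eta'(t)\,\rho(x,t)\,dt\Big)\varphi(x)\,dx.
\]
Selecting a family $\eta_\varepsilon$ converging to the characteristic function of a time interval $(a,b)\Subset(0,+\infty)$, we easily see that $\tilde q^{a,b}(x) := \int_a^b q(x,t)\,dt$ satisfies
\[
\int_\Omega\tilde q^{a,b}(x)\cdot\nabla\varphi(x)\,dx = \int_\Omega\big(\rho(x,b) - \rho(x,a)\big)\varphi(x)\,dx \quad\forall\,\varphi\in C_c^\infty(\mathbb{R}^d), \tag{6.19}
\]
which shows that $\nabla\cdot\tilde q^{a,b}\in L^1(\Omega)$ and $\partial_n\tilde q^{a,b} = 0$ on $\partial\Omega$ for every $0 < a < b < +\infty$.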
6.2 The linear transport equation for λ-convex potentials
Let V : Rd → (−∞, +∞] be a proper, l.s.c. and λ-convex potential. We are
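looking for curves $t\mapsto\mu_t\in\mathcal{P}_2(\mathbb{R}^d)$ which solve the evolution equation
\[
\frac{\partial}{\partial t}\mu_t + \nabla\cdot(\mu_t v_t) = 0,\qquad\text{with } -v_t(x)\in\partial V(x) \text{ for } \mu_t\text{-a.e. } x\in\mathbb{R}^d, \tag{6.20}
\]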
which is the gradient flow in P2 (Rd ) of the potential energy functional discussed
in Example 3.4:
\[
\mathcal{V}(\mu) := \int_{\mathbb{R}^d} V(x)\,d\mu(x). \tag{6.21}
\]
If V is differentiable, (6.20) can also be written as
\[
\frac{\partial}{\partial t}\mu_t(x) = \nabla\cdot\big(\mu_t(x)\nabla V(x)\big) \quad\text{in the distribution sense.} \tag{6.22}
\]
In the statement of the following theorem we denote by T the λ-contractive
semigroup on D(V ) ⊂ Rd induced by the differential inclusion
\[
\frac{d}{dt}T_t(x)\in-\partial V(T_t(x)),\qquad T_0(x) = x \quad\forall\, x\in D(V). \tag{6.23}
\]
Recall also that, according to Brezis theorem, $\frac{d}{dt}T_t(x)$ equals $-\partial^\circ V(T_t(x))$ at each point of differentiability $t > 0$.
Theorem 6.2 For every $\mu_0\in\mathcal{P}_2(\mathbb{R}^d)$ with $\operatorname{supp}\mu_0\subset D(V)$ there exists a unique solution $(\mu_t, v)$ of (6.20) satisfying
\[
\lim_{t\downarrow 0}\mu_t = \mu_0,\qquad \int_{\mathbb{R}^d}|v_t(x)|^2\,d\mu_t(x)\in L^1_{loc}(0,+\infty); \tag{6.24}
\]
this solution is the gradient flow of $\mathcal{V}$ in the sense of the E.V.I. formulation (6.20) and the Energy Identity (5.9) of Theorem 5.3, it induces a λ-contractive semigroup on $\{\mu\in\mathcal{P}_2(\mathbb{R}^d) : \operatorname{supp}(\mu)\subset D(V)\}$, and it exhibits the regularizing effect and the asymptotic behaviour as in Theorem 5.7.
Moreover, for every $t > 0$ we have the representation formula
\[
\mu_t = (T_t)_\#\mu_0,\qquad v_t(x) = -\partial^\circ V(x) \quad\text{for } \mu_t\text{-a.e. } x\in\mathbb{R}^d. \tag{6.25}
\]
Proof.
Proposition 3.5 shows that the functional V satisfies (5.1a), (5.1b),
(5.1d); it is also easy to check that (5.60) holds. On the other hand, V does not
satisfy (5.1c), thus our simplified existence results can not be directly applied.
Nevertheless, the more general theory of [8] covers also this case and yields the
present result.
In any case, the solution to (6.20) can also be directly constructed by the
representation formula (6.25). For, it is immediate to check directly that if we
choose µ0 of the type
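\[
\mu_0 := \sum_{k=1}^{K}\alpha_k\delta_{x_k},\qquad \alpha_k\ge 0,\quad \sum_{k=1}^{K}\alpha_k = 1,\quad x_k\in D(V), \tag{6.26}
\]
then
\[
\mu_t = \sum_{k=1}^{K}\alpha_k\delta_{T_t(x_k)} = (T_t)_\#\mu_0 \tag{6.27}
\]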
solves (6.20) (see also Section 2.6, where the connection between characteristics
and solutions of the continuity equation is studied in detail), whereas (6.24)
follows by the energy identity
\[
\int_a^b|\partial^\circ V(T_t(x))|^2\,dt + V(T_b(x)) = V(T_a(x)) \quad\forall\, x\in D(V).
\]
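Arguing as in the proof of Theorem 2.23 we also get for every $\sigma\in D(\mathcal{V})$ and every $\gamma_t\in\Gamma_o(\mu_t,\sigma)$
\[
\begin{aligned}
\frac12\frac{d}{dt}W_2^2(\mu_t,\sigma) &= \int_{\mathbb{R}^d\times\mathbb{R}^d}\langle v_t(x), x - y\rangle\,d\gamma_t(x,y)\\
&\le \int_{\mathbb{R}^d\times\mathbb{R}^d}\Big(V(y) - V(x) - \frac\lambda2|x - y|^2\Big)\,d\gamma_t(x,y)\\
&= \mathcal{V}(\sigma) - \mathcal{V}(\mu_t) - \frac\lambda2 W_2^2(\mu_t,\sigma).
\end{aligned} \tag{6.28}
\]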
µt = (Tt )# µ0 thus solves the E.V.I. formulation (5.7) of the gradient flow for
every initial datum µ0 which is a convex combination of Dirac masses in D(V ).
A standard approximation argument via (5.17) and Theorem 5.7 yields the same result for $\mu_t = (T_t)_\#\mu_0$ and every admissible initial measure $\mu_0\in D(\mathcal{V})$: for, being $\operatorname{supp}\mu_0\subset D(V)$, we can find a sequence $(\nu_n)\subset D(\mathcal{V})$ of convex combinations of Dirac masses
\[
\nu_n := \sum_{k=1}^{K_n}\alpha_{n,k}\delta_{x_{n,k}},\qquad \alpha_{n,k}\ge 0,\quad \sum_{k=1}^{K_n}\alpha_{n,k} = 1,\quad x_{n,k}\in D(V), \tag{6.29}
\]
such that $\nu_n\to\mu_0$ in $\mathcal{P}_2(\mathbb{R}^d)$.
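The representation formula (6.25) and the λ-contraction property can be illustrated on discrete measures. The sketch below assumes the one-dimensional quadratic potential $V(x) = \lambda x^2/2$ (not singled out in the text), for which the flow of $\dot x = -V'(x)$ is $T_t(x) = e^{-\lambda t}x$; the function names are hypothetical. It uses the fact that, on the real line, the Wasserstein distance between two uniform measures with the same number of atoms is attained by pairing the sorted atoms.

```python
import math

def flow(x, t, lam):
    """Characteristic T_t(x) of dx/dt = -V'(x) for V(x) = lam * x**2 / 2."""
    return math.exp(-lam * t) * x

def push_forward(atoms, t, lam):
    """(T_t)_# of a uniform discrete measure: move every atom, keep weights."""
    return [flow(x, t, lam) for x in atoms]

def w2_uniform_1d(xs, ys):
    """W_2 between two uniform measures on the line with equally many atoms
    (the monotone pairing of the sorted atoms is optimal in one dimension)."""
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))

lam, t = 0.7, 1.5
mu0 = [-1.0, 0.2, 2.0, 3.5]
nu0 = [0.0, 0.4, 1.1, 5.0]
d0 = w2_uniform_1d(mu0, nu0)
d_t = w2_uniform_1d(push_forward(mu0, t, lam), push_forward(nu0, t, lam))
ratio = d_t / d0
print(ratio, math.exp(-lam * t))
```

For this linear flow the contraction estimate holds with equality, so the two printed numbers should coincide up to rounding.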
6.3 Kolmogorov-Fokker-Planck equation
6.3.1 Relative Entropy and Fisher Information
Let us consider
\[
\text{a l.s.c. λ-convex potential } V:\mathbb{R}^d\to(-\infty,+\infty] \text{ with } \Omega := \operatorname{Int} D(V)\neq\emptyset; \tag{6.30}
\]
for the sake of simplicity, we assume that the reference measure induced by the potential $V$ is a probability measure with finite quadratic moment, i.e.
\[
\gamma := e^{-V}\mathcal{L}^d\in\mathcal{P}_2(\mathbb{R}^d). \tag{6.31}
\]
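This condition, up to a renormalization, is always satisfied if, e.g., $\lambda > 0$. Observe that the density $e^{-V}$ of $\gamma$ with respect to $\mathcal{L}^d$ is 0 outside $\overline\Omega = \overline{D(V)}$. We adopt the convention to write a measure $\mu\in\mathcal{P}_2^a(\mathbb{R}^d)$ supported in $\Omega$ as
\[
\mu = u\,\mathcal{L}^d|_\Omega = \rho\gamma,\qquad u = e^{-V}\rho; \tag{6.32}
\]
the Relative Entropy (see Section 3.3) of $\mu$ w.r.t. $\gamma$ is defined as
\[
H(\mu|\gamma) = \int_\Omega\rho\log\rho\,d\gamma = \int_\Omega u\,(\log u + V)\,dx, \tag{6.33}
\]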
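whereas the Relative Fisher Information is defined as
\[
I(\mu|\gamma) := \int_\Omega\Big|\frac{\nabla\rho}{\rho}\Big|^2 d\mu = \int_\Omega\frac{|\nabla\rho|^2}{\rho}\,d\gamma = \int_\Omega\frac{|\nabla u + u\nabla V|^2}{u}\,dx \tag{6.34}
\]
whenever $u,\rho\in W^{1,1}_{loc}(\Omega)$ (recall that $V$ is locally Lipschitz in $\Omega$); as usual, we set $H(\mu|\gamma) = +\infty$ if $\mu$ is not absolutely continuous, and $I(\mu|\gamma) = +\infty$ if $\rho\notin W^{1,1}_{loc}(\Omega)$.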
Let us collect in the following proposition the main properties of these two functionals, which we already discussed in Sections 3 and 4.
[Margin note: This proposition could also be moved to the end of Chapter 4.]
Proposition 6.3 (Entropy and Fisher information) Let $V$, $\gamma$ be as in (6.30) and (6.31).
i) λ-convexity of the Relative Entropy: The functional $\mu\mapsto H(\mu|\gamma)$ is λ-displacement convex and it also satisfies the “generalized” convexity assumption of (5.60).
ii) Subdifferential and slope of the Entropy: A measure $\mu = \rho\gamma = u\mathcal{L}^d|_\Omega$ belongs to $D(\partial H) = D(|\partial H|)$ iff $I(\mu|\gamma) < +\infty$, i.e.
\[
\rho, u\in W^{1,1}_{loc}(\Omega)\quad\text{and}\quad \frac{\nabla\rho}{\rho} = \frac{\nabla u}{u} + \nabla V\in L^2(\mu;\mathbb{R}^d); \tag{6.35}
\]
in this case
\[
\xi = \partial^\circ H(\mu|\gamma)\quad\Longleftrightarrow\quad \xi = \frac{\nabla\rho}{\rho}\in L^2(\mu;\mathbb{R}^d), \tag{6.36}
\]
so that
\[
I(\mu|\gamma) = \int_\Omega|\xi|^2\,d\mu = |\partial H|^2(\mu). \tag{6.37}
\]
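iii) Variational inequality for the logarithmic gradient: If $I(\mu|\gamma) < +\infty$, the logarithmic gradient $\xi = \frac{\nabla\rho}{\rho}$ satisfies
\[
\int_\Omega\Big(\big(t_\mu^\sigma - x\big)\cdot\xi + \frac\lambda2\big|t_\mu^\sigma - x\big|^2\Big)\,d\mu \le H(\sigma|\gamma) - H(\mu|\gamma) \quad\forall\,\sigma\in\mathcal{P}_2^a(\mathbb{R}^d). \tag{6.38}
\]
iv) Log-Sobolev inequality: If $\lambda > 0$ then
\[
H(\mu|\gamma)\le\frac{1}{2\lambda}I(\mu|\gamma) \quad\forall\,\mu\in\mathcal{P}_2^a(\mathbb{R}^d). \tag{6.39}
\]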
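v) Derivative of the Entropy along curves: Let $\mu : t\in[0,T]\mapsto\mu_t = \rho_t\gamma\in\mathcal{P}_2(\mathbb{R}^d)$ be a continuous family of measures satisfying the continuity equation
\[
\partial_t\mu + \nabla\cdot(v\mu) = 0 \quad\text{in } \mathcal{D}'(\mathbb{R}^d\times(0,T)) \tag{6.40}
\]
for a Borel vector field $v$ with
\[
\int_0^T\!\!\int_\Omega|v_t|^2\,d\mu_t\,dt < +\infty,\qquad \int_0^T I(\mu_t|\gamma)\,dt < +\infty. \tag{6.41}
\]
Then the map $t\mapsto H(\mu_t|\gamma)$ is absolutely continuous in $[0,T]$ and for $\mathcal{L}^1$-a.e. $t\in(0,T)$ its derivative is
\[
\frac{d}{dt}H(\mu_t|\gamma) = \int_\Omega v_t\cdot\frac{\nabla\rho_t}{\rho_t}\,d\mu_t = \int_\Omega v_t\cdot\nabla\rho_t\,d\gamma. \tag{6.42}
\]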
Proof. i) follows from Proposition 3.5 and 3.11. The generalized convexity property (5.60) follows by analogous arguments (see [8, Prop. 9.3.9]).
ii) and iii) have been proved in 4.20 and (4.36).
iv), i.e. (6.39), follows from (5.39).
v) follows from the general Chain rule (4.55).
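The Log-Sobolev inequality (6.39) can be tested on a simple explicit case. The sketch below is an illustration under assumptions not taken from the text: $d = 1$, $\gamma = N(0, 1/\lambda)$ (so $V(x) = \lambda x^2/2$ up to a constant), and $\mu = N(m, 1/\lambda)$, whose density with respect to $\gamma$ is $\rho(x) = \exp(\lambda m x - \lambda m^2/2)$. A hand computation then gives $H(\mu|\gamma) = \lambda m^2/2$ and $I(\mu|\gamma) = \lambda^2 m^2$, so (6.39) holds with equality; the code recovers both values by a trapezoidal rule.

```python
import math

lam, m = 2.0, 0.8

def gamma_density(x):
    """Lebesgue density of the reference Gaussian N(0, 1/lam)."""
    return math.sqrt(lam / (2.0 * math.pi)) * math.exp(-lam * x * x / 2.0)

def rho(x):
    """Density of N(m, 1/lam) with respect to gamma."""
    return math.exp(lam * m * x - lam * m * m / 2.0)

def drho(x):
    return lam * m * rho(x)

def integrate(f, a=-12.0, b=12.0, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# Relative entropy (6.33) and relative Fisher information (6.34).
H = integrate(lambda x: rho(x) * math.log(rho(x)) * gamma_density(x))
I = integrate(lambda x: drho(x) ** 2 / rho(x) * gamma_density(x))
print(H, I / (2.0 * lam))
```

Translated Gaussians are thus extremal for (6.39) in this one-dimensional Gaussian setting.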
6.3.2 Wasserstein formulation of the Kolmogorov-Fokker-Planck equation
Under the same assumption (6.30), (6.31) of the previous section, and recalling
the differential operators of (6.14a,b), let us introduce the Laplacian operator
∆γ induced by γ:
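\[
\Delta_\gamma\rho := \nabla_\gamma\cdot(\nabla\rho) = e^{V}\nabla\cdot\big(e^{-V}\nabla\rho\big) = \Delta\rho - \nabla\rho\cdot\nabla V, \tag{6.43}
\]
and its formal adjoint (with respect to the Lebesgue measure), the Fokker-Planck operator
\[
\Delta_\gamma^* u := e^{-V}\Delta_\gamma\big(e^{V}u\big) = \nabla\cdot\big(\nabla u + u\nabla V\big). \tag{6.44}
\]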
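For smooth functions with compact support in $\Omega$ they satisfy
\[
-\int_\Omega\Delta_\gamma\rho\,\zeta\,d\gamma = \int_\Omega\nabla\rho\cdot\nabla\zeta\,d\gamma = -\int_\Omega\rho\,\Delta_\gamma\zeta\,d\gamma, \tag{6.45}
\]
\[
-\int_\Omega\Delta_\gamma^* u\,\zeta\,dx = -\int_\Omega u\,\Delta_\gamma\zeta\,dx. \tag{6.46}
\]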
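In the case of the centered Gaussian measure with variance $\lambda^{-1}$,
\[
V(x) = \frac12\Big(\lambda|x|^2 - d\log(\lambda/2\pi)\Big),\qquad \gamma = \frac{1}{(2\pi/\lambda)^{d/2}}\,e^{-\frac\lambda2|x|^2}\mathcal{L}^d, \tag{6.47}
\]
$\Delta_\gamma$ is the Ornstein-Uhlenbeck operator $\Delta - \lambda x\cdot\nabla$.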
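To make the Ornstein-Uhlenbeck operator concrete, the sketch below applies $\Delta_\gamma\rho = \rho'' - \lambda x\rho'$ (the one-dimensional case of (6.43) with $\nabla V = \lambda x$) to the polynomial $\rho(x) = x^2 - 1/\lambda$. This test function is an assumption chosen for illustration: a direct computation gives $\Delta_\gamma\rho = 2 - 2\lambda x^2 = -2\lambda\rho$, so it is an eigenfunction with eigenvalue $-2\lambda$. Derivatives are approximated by central differences, and the function names are hypothetical.

```python
lam = 1.5

def rho(x):
    # Assumed test function: a rescaled second Hermite polynomial.
    return x * x - 1.0 / lam

def ou_operator(f, x, h=1e-3):
    """Delta_gamma f = f'' - lam * x * f' in one dimension (central differences)."""
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return d2 - lam * x * d1

points = [-2.0, -1.0, 0.0, 0.5, 1.7]
# Residual of the eigenvalue relation Delta_gamma rho = -2 * lam * rho.
residual = max(abs(ou_operator(rho, x) + 2.0 * lam * rho(x)) for x in points)
print(residual)
```

The residual should be at rounding level, because central differences are exact on quadratics.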
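Definition 6.4 (“Wasserstein” solutions of K.F.P. equations) A continuous family $\mu_t = \rho_t\gamma = u_t\mathcal{L}^d|_\Omega\in C^0(0,+\infty;\mathcal{P}_2(\mathbb{R}^d))$ is a Wasserstein solution of the Kolmogorov-Fokker-Planck equation if $t\mapsto I(\mu_t|\gamma)$ belongs to $L^2_{loc}(0,+\infty)$, so that for $\mathcal{L}^1$-a.e. $t\in(0,+\infty)$
\[
\rho_t, u_t\in W^{1,1}_{loc}(\Omega),\qquad \xi_t = \frac{\nabla\rho_t}{\rho_t} = \frac{\nabla u_t}{u_t} + \nabla V\in L^2(\mu_t;\mathbb{R}^d), \tag{6.48}
\]
and
\[
\partial_t\mu_t - \nabla\cdot\Big(\mu_t\frac{\nabla\rho_t}{\rho_t}\Big) = 0 \quad\text{in } \mathcal{D}'(\mathbb{R}^d\times(0,+\infty)), \tag{6.49}
\]
i.e.
\[
\int_0^{+\infty}\!\!\int_\Omega\Big(-\partial_t\zeta + \frac{\nabla\rho_t}{\rho_t}\cdot\nabla\zeta\Big)\,d\mu_t\,dt = 0 \quad\forall\,\zeta\in C_c^\infty(\mathbb{R}^d\times(0,+\infty)). \tag{6.50}
\]
Equivalently, $\rho_t$ satisfies the weak formulation
\[
\int_0^{+\infty}\!\!\int_\Omega\Big(-\rho\,\partial_t\zeta + \nabla\rho\cdot\nabla\zeta\Big)\,d\gamma\,dt = 0 \quad\forall\,\zeta\in C_c^\infty(\mathbb{R}^d\times(0,+\infty)) \tag{6.51}
\]
of
\[
\partial_t\rho_t - \Delta_\gamma\rho_t = 0 \quad\text{in } \Omega\times(0,+\infty),\qquad e^{-V}\partial_n\rho_t = 0 \quad\text{on } \partial\Omega\times(0,+\infty). \tag{6.52}
\]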
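Remark 6.5 In terms of the Lebesgue density $u_t$, (6.49) reads
\[
\int_0^{+\infty}\!\!\int_\Omega\Big(-u\,\partial_t\zeta + \big(\nabla u + u\nabla V\big)\cdot\nabla\zeta\Big)\,dx\,dt = 0 \quad\forall\,\zeta\in C_c^\infty(\mathbb{R}^d\times(0,+\infty)), \tag{6.53}
\]
corresponding to the Fokker-Planck equation
\[
\partial_t u - \Delta_\gamma^* u = \partial_t u - \nabla\cdot\big(\nabla u + u\nabla V\big) = 0 \quad\text{in } \Omega\times(0,+\infty), \tag{6.54}
\]
with variational boundary conditions $\big(\nabla u + u\nabla V\big)\cdot n = 0$ on $\partial\Omega\times(0,+\infty)$.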
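We introduce the (narrowly) closed (geodesically and linearly) convex subset of $\mathcal{P}_2(\mathbb{R}^d)$
\[
\mathcal{P}_2(\Omega) := \big\{\mu\in\mathcal{P}_2(\mathbb{R}^d) : \operatorname{supp}(\mu)\subset\Omega\big\}. \tag{6.55}
\]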
Theorem 6.6 For every $\mu_0\in\mathcal{P}_2(\Omega)$ there exists a unique Wasserstein solution $\mu_t = \rho_t\gamma = u_t\mathcal{L}^d|_\Omega$ of the Kolmogorov-Fokker-Planck equation (6.49) satisfying $\lim_{t\downarrow0}\mu_t = \mu_0$ in $\mathcal{P}_2(\mathbb{R}^d)$, and it coincides with the Wasserstein gradient flow generated by the functional $\phi(\mu) := H(\mu|\gamma)$.
The maps $S_t : \mu_0\mapsto\mu_t$, $t\ge0$, define a continuous λ-contractive semigroup in $\mathcal{P}_2(\Omega)$ which can be characterized by the system of E.V.I.
\[
\frac12\frac{d}{dt}W_2^2(\mu_t,\sigma) + \frac\lambda2 W_2^2(\mu_t,\sigma)\le H(\sigma|\gamma) - H(\mu_t|\gamma) \quad\forall\,\sigma\in\mathcal{P}_2(\Omega); \tag{6.56}
\]
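it exhibits the regularizing effect
\[
H(\mu_t|\gamma) < +\infty,\qquad I(\mu_t|\gamma) < +\infty \quad\forall\, t > 0, \tag{6.57}
\]
with, for $\lambda\ge0$,
\[
H(\mu_t|\gamma)\le\frac{1}{2t}W_2^2(\mu_0,\gamma),\qquad I(\mu_t|\gamma)\le\frac{1}{t^2}W_2^2(\mu_0,\gamma). \tag{6.58}
\]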
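The map $t\mapsto e^{2\lambda t}I(\mu_t|\gamma)$ is non increasing and the flow satisfies the Energy Identity
\[
H(\mu_b|\gamma) + \int_a^b I(\mu_t|\gamma)\,dt = H(\mu_a|\gamma) \quad\forall\, 0\le a\le b\le+\infty. \tag{6.59}
\]
When $\lambda > 0$ the asymptotic behaviour of $\mu_t$ as $t_0\le t\to+\infty$ is governed by
\[
W_2(\mu_t,\gamma)\le e^{-\lambda(t-t_0)}W_2(\mu_{t_0},\gamma),\quad
H(\mu_t|\gamma)\le e^{-2\lambda(t-t_0)}H(\mu_{t_0}|\gamma),\quad
I(\mu_t|\gamma)\le e^{-2\lambda(t-t_0)}I(\mu_{t_0}|\gamma). \tag{6.60}
\]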
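Moreover, for every $t > 0$ (and also for $t = 0$ provided $I(\mu_0|\gamma) < +\infty$)
\[
\exists\lim_{h\downarrow0}\frac{t_{\mu_t}^{\mu_{t+h}} - i}{h} = -\frac{\nabla\rho_t}{\rho_t} \quad\text{in } L^2(\mu_t;\mathbb{R}^d),\qquad
\exists\lim_{h\downarrow0}\frac{H(\mu_{t+h}|\gamma) - H(\mu_t|\gamma)}{h} = -I(\mu_t|\gamma). \tag{6.61}
\]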
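Proof. It is not difficult to check that
\[
\overline{D(\phi)} = \mathcal{P}_2(\Omega). \tag{6.62}
\]
For, $D(\phi)$ contains all the measures of the type
\[
\mu_{x_0,\rho} := \frac{1}{\gamma(B_\rho(x_0))}\chi_{B_\rho(x_0)}\cdot\gamma \quad\text{with } x_0\in\operatorname{supp}\gamma,\ \rho > 0,
\]
and their convex combinations, so that
\[
\sum_i\alpha_i\delta_{x_i}\in\overline{D(\phi)} \quad\text{if } x_i\in\operatorname{supp}\gamma,\ \alpha_i\ge0,\ \sum_i\alpha_i = 1.
\]
Since the subset of all the finite convex combinations of δ-measures concentrated in $\Omega$ is dense in $\mathcal{P}_2(\Omega)$, we get (6.62).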
By Proposition 6.3 the Relative Entropy functional µ 7→ H(µ|γ) satisfies
all the assumptions of Theorems 5.3, 5.7, and 5.8(ii). Theorem 6.6 is a simple
transposition of the results of Section 4, taking also into account the particular
form of the subdifferential of H expressed by (6.36) and the fact that γ is the
unique minimum of H with H(γ|γ) = 0.
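The Energy Identity (6.59) and the decay estimates (6.60) can be followed explicitly on a Gaussian example. The sketch below rests on assumptions that are not part of the text: $d = 1$, $\gamma = N(0, 1/\lambda)$ and $\mu_0 = N(m, 1/\lambda)$, for which the Ornstein-Uhlenbeck flow gives $\mu_t = N(m e^{-\lambda t}, 1/\lambda)$, so that $W_2(\mu_t,\gamma) = |m|e^{-\lambda t}$, $H(\mu_t|\gamma) = \lambda m^2 e^{-2\lambda t}/2$ and $I(\mu_t|\gamma) = \lambda^2 m^2 e^{-2\lambda t}$ (closed forms computed by hand). The function names are hypothetical.

```python
import math

lam, m = 1.2, 2.0

def mean(t):
    """Mean of mu_t = N(m e^{-lam t}, 1/lam) under the assumed Gaussian flow."""
    return m * math.exp(-lam * t)

def w2(t):
    return abs(mean(t))               # W_2(mu_t, gamma) for a pure translation

def entropy(t):
    return 0.5 * lam * mean(t) ** 2   # H(mu_t | gamma)

def fisher(t):
    return lam ** 2 * mean(t) ** 2    # I(mu_t | gamma)

def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

a, b = 0.3, 2.0
# Energy identity (6.59): H(mu_b) + int_a^b I dt - H(mu_a) should vanish.
energy_gap = entropy(b) + integrate(fisher, a, b) - entropy(a)
# Decay rate in (6.60), attained with equality for this example.
decay_ratio = w2(b) / w2(a)
print(energy_gap, decay_ratio, math.exp(-lam * (b - a)))
```

In this example all three estimates in (6.60) are equalities, which shows that the exponential rates cannot be improved in general.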
We conclude this section by briefly discussing some further properties of the
semigroup constructed by Theorem 6.6. First of all we set
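\[
B_\gamma := \big\{\rho\in L^1(\gamma) : \rho\gamma\in\mathcal{P}_2(\mathbb{R}^d)\big\}, \tag{6.63}
\]
we introduce the weighted Sobolev space
\[
W^{1,2}_\gamma(\Omega) := \big\{\rho\in W^{1,2}_{loc}(\Omega) : \nabla\rho\in L^2(\gamma;\mathbb{R}^d)\big\}, \tag{6.64}
\]
and the “transition probabilities” $\nu_{x,t} = \vartheta_{x,t}\gamma$,
\[
\nu_{x,t} := S_t[\delta_x] \quad\text{with densities}\quad \vartheta_{x,t} := \frac{d\nu_{x,t}}{d\gamma}\in L^1(\gamma) \quad\forall\, x\in\Omega,\ t > 0. \tag{6.65}
\]
As we shall see in the next section, it is possible to find densities $\vartheta_{x,t}$ such that for every $t > 0$
\[
\text{the map } (x,y)\in\Omega\times\Omega\mapsto\vartheta_{x,t}(y) \text{ is Borel.} \tag{6.66}
\]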
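Theorem 6.7 (The associated Markovian semigroup) Let $(S_t)_{t\ge0}$ be the semigroup constructed in the previous Theorem 6.6.
Extension to a contraction semigroup in $L^p(\gamma)$. There exists a unique strongly continuous semigroup of linear contraction operators $(\boldsymbol{S}_t)_{t\ge0}$ in $L^1(\gamma)$ such that
\[
\boldsymbol{S}_t[\rho_0] = \rho_t \quad\Longleftrightarrow\quad S_t[\rho_0\gamma] = \rho_t\gamma \quad\forall\,\rho_0\in B_\gamma. \tag{6.67}
\]
For every $p\in[1,+\infty]$ (the restriction to $L^p(\gamma)$ of) $\boldsymbol{S}_t$ is a $C^0$ (only weakly$^*$ continuous, if $p = +\infty$) contraction semigroup in $L^p(\gamma)$,
\[
\|\boldsymbol{S}_t[\rho]\|_{L^p(\gamma)}\le\|\rho\|_{L^p(\gamma)} \quad\forall\,\rho\in L^p(\gamma), \tag{6.68}
\]
it is order preserving,
\[
\rho_0\le\rho_1 \quad\Rightarrow\quad \boldsymbol{S}_t[\rho_0]\le\boldsymbol{S}_t[\rho_1], \tag{6.69}
\]
and regularizing, since
\[
\boldsymbol{S}_t\big(L^\infty(\gamma)\big)\subset C_b(\Omega) \quad\forall\, t > 0. \tag{6.70}
\]
Moreover
\[
\boldsymbol{S}_t\operatorname{Lip}(\Omega)\subset\operatorname{Lip}(\Omega) \quad\forall\, t > 0,\qquad
\operatorname{Lip}(\boldsymbol{S}_t[\rho];\Omega)\le e^{-\lambda t}\operatorname{Lip}(\rho;\Omega) \quad\forall\,\rho\in\operatorname{Lip}(\Omega). \tag{6.71}
\]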
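Representation formula. The semigroups $S$, $\boldsymbol{S}$ admit the representation formula for $\gamma$-a.e. $x\in\Omega$
\[
S_t[\mu] = \rho_t\gamma \quad\text{with}\quad \rho_t(x) = \int_{\mathbb{R}^d}\vartheta_{y,t}(x)\,d\mu(y), \tag{6.72}
\]
\[
\boldsymbol{S}_t[\rho_0] = \rho_t \quad\text{with}\quad \rho_t(x) = \int_\Omega\vartheta_{y,t}(x)\,\rho_0(y)\,d\gamma(y). \tag{6.73}
\]
Dirichlet form. In $L^2(\gamma)$, $\boldsymbol{S}_t$ coincides with the (analytic) semigroup $\tilde S$ associated to the symmetric Dirichlet form with domain $W^{1,2}_\gamma(\Omega)\subset L^2(\gamma)$
\[
a_\gamma(\rho,\eta) := \int_\Omega\nabla\rho\cdot\nabla\eta\,d\gamma \quad\forall\,\rho,\eta\in W^{1,2}_\gamma(\Omega). \tag{6.74}
\]
In particular, if $\rho_0\in L^2(\gamma)$ then $\rho_t = \boldsymbol{S}_t[\rho_0]\in W^{1,2}_\gamma(\Omega)$ for every $t > 0$, $\partial_t\rho\in C^0_{loc}(0,+\infty;L^2(\gamma))$, and
\[
\int_\Omega\partial_t\rho\,\eta\,d\gamma + \int_\Omega\nabla\rho\cdot\nabla\eta\,d\gamma = 0 \quad\forall\,\eta\in W^{1,2}_\gamma(\Omega),\ \forall\, t > 0. \tag{6.75}
\]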
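Symmetry of the transition densities. For every $t > 0$ the transition densities $\vartheta_{x,t}$ satisfy
\[
\vartheta_{x,t}(y) = \vartheta_{y,t}(x) \quad\text{for } \gamma\times\gamma\text{-a.e. } (x,y)\in\Omega\times\Omega, \tag{6.76}
\]
so that the “adjoint” representation formula holds
\[
\boldsymbol{S}_t[\rho_0] = \rho_t \quad\text{with}\quad \rho_t(x) = \int_\Omega\vartheta_{x,t}(y)\,\rho_0(y)\,d\gamma(y), \tag{6.77}
\]
which provides the continuous representative of $\rho_t$ when $\rho_0\in L^\infty(\gamma)$.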
Proof. Most of the results of the previous theorem are a direct consequence of
the “linearity” of the semigroup St and of its regularizing effect; therefore, we
postpone their proof to the next section, where we will discuss from a general
point of view the construction of a Markov semigroup starting from a “linear”
Wasserstein semigroup.
[Margin note: Starting from the representation formula (6.72) one can prove that $S_t$ is λ-contractive also with respect to the Wasserstein distance $W_p$, $1\le p\le2$: should this be added? The case $p > 2$, however, I am not able to deduce.]
Here we only consider the last two properties, establishing the link with the “Dirichlet form” approach. In order to check the equivalence, we observe that for every initial datum $\rho_0\in L^\infty(\gamma)$ with $\rho_0(x)\ge r > 0$ for $\gamma$-a.e. $x\in\Omega$, the unique solution $\rho_t$ of (6.75) still satisfies the lower bound $\rho_t\ge r$ and the estimates
\[
\sup_{t>0}\,t^2\int_\Omega|\partial_t\rho_t|^2\,d\gamma < +\infty,\qquad
\int_0^{+\infty}\!\!\int_\Omega|\nabla\rho_t|^2\,d\gamma\,dt\le\int_\Omega|\rho_0|^2\,d\gamma < +\infty. \tag{6.78}
\]
In particular,
\[
\int_0^{+\infty}I(\rho_t\gamma|\gamma)\,dt\le r^{-1}\int_0^{+\infty}\!\!\int_\Omega|\nabla\rho_t|^2\,d\gamma\,dt < +\infty, \tag{6.79}
\]
so that the measures µt = ρt γ provide the (unique) Wasserstein solution of
(6.51) (since γ is a finite measure, Cc∞ (Rd ) is a subset of Wγ1,2 (Ω)). Therefore
the semigroups S and S̃ coincide on L2 (γ)-densities bounded away from 0: a
simple density argument shows that they coincide on L2 (γ).
Since aγ is a symmetric form, St is selfadjoint in L2 (γ) and we have
\[
\int_\Omega\rho\,\boldsymbol{S}_t[\eta]\,d\gamma = \int_\Omega\boldsymbol{S}_t[\rho]\,\eta\,d\gamma \quad\forall\,\rho,\eta\in L^2(\gamma). \tag{6.80}
\]
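For every bounded nonnegative $\rho,\eta\in L^\infty(\gamma)$, (6.73) yields
\[
\int_\Omega\rho(x)\Big(\int_\Omega\vartheta_{y,t}(x)\,\eta(y)\,d\gamma(y)\Big)d\gamma(x) = \int_\Omega\Big(\int_\Omega\vartheta_{x,t}(y)\,\rho(x)\,d\gamma(x)\Big)\eta(y)\,d\gamma(y). \tag{6.81}
\]
By (6.66) and Fubini’s Theorem, we get (6.76).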
Remark 6.8 The measures (νx,t )t≥0 are a Markovian semigroup of kernels associated with (St )t≥0 [48, II-4].
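The symmetry (6.76) of the transition densities can be observed on the Ornstein-Uhlenbeck case. The sketch below assumes $d = 1$ and $\gamma = N(0, 1/\lambda)$, and uses the classical fact (not stated in the text) that the transition probability $\nu_{x,t}$ is then the Gaussian $N\big(e^{-\lambda t}x, (1 - e^{-2\lambda t})/\lambda\big)$. The function `theta` is a hypothetical helper returning $\vartheta_{x,t}(y) = d\nu_{x,t}/d\gamma\,(y)$.

```python
import math

lam, t = 1.3, 0.4

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def theta(x, y):
    """Density at y of nu_{x,t} = N(e^{-lam t} x, (1 - e^{-2 lam t}) / lam)
    with respect to gamma = N(0, 1 / lam) (assumed Ornstein-Uhlenbeck kernel)."""
    a = math.exp(-lam * t)
    return normal_pdf(y, a * x, (1.0 - a * a) / lam) / normal_pdf(y, 0.0, 1.0 / lam)

grid = [-1.5, -0.7, 0.0, 0.4, 1.5]
# Largest violation of theta_{x,t}(y) = theta_{y,t}(x) over the grid.
gap = max(abs(theta(x, y) - theta(y, x)) for x in grid for y in grid)
print(gap)
```

The gap is at rounding level, reflecting the detailed-balance relation between the kernel and its invariant measure $\gamma$.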
6.3.3 The construction of the Markovian semigroup
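Among general λ-contracting semigroups in $\mathcal{P}_2(\mathbb{R}^d)$ the Kolmogorov-Fokker-Planck equation enjoys several other interesting features, due to its linearity. As we will see in the next Lemma, this is a direct consequence of the following “linearity condition”
\[
\left.\begin{aligned}
&\xi_i = \partial^\circ\phi(\mu_i),\quad \alpha_i\ge0,\quad \alpha_1 + \alpha_2 = 1\\
&\xi\,(\alpha_1\mu_1 + \alpha_2\mu_2) = \alpha_1\xi_1\mu_1 + \alpha_2\xi_2\mu_2
\end{aligned}\right\}
\quad\Longrightarrow\quad \xi\in\partial\phi(\alpha_1\mu_1 + \alpha_2\mu_2), \tag{6.82}
\]
satisfied by the Wasserstein subdifferential of $\phi(\mu) := H(\mu|\gamma)$.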
The aim of this section is to show how easily one can deduce contraction and regularizing estimates starting from a “linear” Wasserstein semigroup; in particular, the construction of the fundamental solutions is particularly simple. It should not be too difficult to extend the following results to infinite dimensional underlying spaces.
Lemma 6.9 (Linearity of the gradient flow) Let $\phi : \mathcal{P}_2(\mathbb{R}^d)\to(-\infty,+\infty]$ be a functional satisfying (5.1a,b,c,d) and let $S_t$ be the λ-contractive semigroup generated by its gradient flow on $D(\phi)$ as in Theorem 5.7. If $\phi$ satisfies (6.82), then the semigroup $S_t$ satisfies the “linearity” property
\[
S_t[\alpha_1\mu_1 + \alpha_2\mu_2] = \alpha_1 S_t[\mu_1] + \alpha_2 S_t[\mu_2] \quad\forall\,\mu_1,\mu_2\in D(\phi),\ \alpha_1,\alpha_2\ge0,\ \alpha_1 + \alpha_2 = 1. \tag{6.83}
\]
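Proof. Take two initial data $\mu_1,\mu_2\in D(\phi)$ and set $\mu_{i,t} := S_t[\mu_i]$, $v_{i,t} = -\partial^\circ\phi(\mu_{i,t})$ their velocity vector fields, $\mu_t = \alpha_1\mu_{1,t} + \alpha_2\mu_{2,t}$, and define the vector field $v_t$ so that
\[
v_t\mu_t := \alpha_1 v_{1,t}\mu_{1,t} + \alpha_2 v_{2,t}\mu_{2,t}. \tag{6.84}
\]
Assuming $\alpha_i > 0$ and introducing the densities
\[
\rho_{i,t} := \frac{d\mu_{i,t}}{d\mu_t},\qquad\text{so that}\qquad v_t = \alpha_1\rho_{1,t}v_{1,t} + \alpha_2\rho_{2,t}v_{2,t},\quad \alpha_1\rho_{1,t} + \alpha_2\rho_{2,t} = 1,
\]
it is easy to check (by the convexity of $|\cdot|^2$) that for every $t > 0$
\[
\begin{aligned}
\int_{\mathbb{R}^d}|v_t|^2\,d\mu_t &= \int_{\mathbb{R}^d}|\alpha_1\rho_{1,t}v_{1,t} + \alpha_2\rho_{2,t}v_{2,t}|^2\,d\mu_t\\
&\le \alpha_1\int_{\mathbb{R}^d}|v_{1,t}|^2\rho_{1,t}\,d\mu_t + \alpha_2\int_{\mathbb{R}^d}|v_{2,t}|^2\rho_{2,t}\,d\mu_t\\
&= \alpha_1\int_{\mathbb{R}^d}|v_{1,t}|^2\,d\mu_{1,t} + \alpha_2\int_{\mathbb{R}^d}|v_{2,t}|^2\,d\mu_{2,t}.
\end{aligned} \tag{6.85}
\]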
It follows that the map $t\mapsto\|v_t\|_{L^2(\mu_t;\mathbb{R}^d)}$ belongs to $L^2_{loc}(0,+\infty)$ and, by linearity, $\mu_t$ satisfies the continuity equation
\[
\partial_t\mu_t + \nabla\cdot(v_t\mu_t) = 0 \quad\text{in } \mathbb{R}^d\times(0,+\infty). \tag{6.86}
\]
Since v t ∈ ∂φ(µt ) by (6.82), µt is the unique gradient flow with initial datum
α1 µ1 + α2 µ2 , that is µt = St (α1 µ1 + α2 µ2 ).
Let γ be a nonnegative Borel measure on Rd , with support D and let Bγ be
defined as in (6.63).
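Theorem 6.10 Let $S_t$, $t\ge0$, be a continuous λ-contracting semigroup defined in $\mathcal{P}_2(D)$ satisfying the following assumptions:
\[
S_0[\mu] = \mu,\qquad S_t S_s[\mu] = S_{t+s}[\mu] \quad\forall\,\mu\in\mathcal{P}_2(D),\ s,t\ge0, \tag{6.87a}
\]
\[
S_t[\mu]\ll\gamma \quad\forall\,\mu\in\mathcal{P}_2(D),\ t > 0, \tag{6.87b}
\]
\[
S_t[\alpha\mu + \beta\nu] = \alpha S_t[\mu] + \beta S_t[\nu] \quad\forall\,\mu,\nu\in\mathcal{P}_2(D),\ \alpha,\beta\ge0,\ \alpha + \beta = 1. \tag{6.87c}
\]
Extension to $L^1(\gamma)$. There exists a unique $C^0$ semigroup (denoted by $\boldsymbol{S}_t$) of bounded linear operators on $L^1(\gamma)$ such that
\[
S_t[\rho\gamma] = \boldsymbol{S}_t[\rho]\,\gamma \quad\forall\,\rho\in B_\gamma. \tag{6.88}
\]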
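Contraction and order preserving properties. $\boldsymbol{S}_t$ is in fact a contraction and order preserving semigroup, i.e.
\[
\|\boldsymbol{S}_t[\rho]\|_{L^1(\gamma)}\le\|\rho\|_{L^1(\gamma)},\qquad \rho_1\le\rho_2 \ \Rightarrow\ \boldsymbol{S}_t[\rho_1]\le\boldsymbol{S}_t[\rho_2]. \tag{6.89}
\]
Representation formula. Denoting by $\nu_{x,t} = \vartheta_{x,t}\gamma$ the “transition probabilities”
\[
\nu_{x,t} := S_t[\delta_x] \quad\text{with densities}\quad \vartheta_{x,t} := \frac{d\nu_{x,t}}{d\gamma}\in L^1(\gamma) \quad\forall\, x\in D,\ t > 0, \tag{6.90}
\]
the semigroup $S$ admits the representation formula
\[
S_t[\mu] = \rho_t\gamma \quad\text{with}\quad \rho_t(x) = \int_{\mathbb{R}^d}\vartheta_{y,t}(x)\,d\mu(y) \quad\text{for } \gamma\text{-a.e. } x\in D. \tag{6.91}
\]
Invariant measure and Markov property. If
\[
\gamma\in\mathcal{P}_2(\mathbb{R}^d) \text{ is an invariant measure, i.e. } S_t[\gamma] = \gamma \quad\forall\, t\ge0, \tag{6.92}
\]
then
\[
\boldsymbol{S}_t\big(L^p(\gamma)\big)\subset L^p(\gamma), \tag{6.93}
\]
and the restriction of $\boldsymbol{S}_t$ to $L^p(\gamma)$ is a $C^0$ contraction semigroup.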
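The representation formula (6.91) can be tried out on a discrete initial measure. As a sketch under assumptions not made in the text, take $d = 1$, $\gamma = N(0, 1/\lambda)$ and the Ornstein-Uhlenbeck transition probabilities $\nu_{x,t} = N\big(e^{-\lambda t}x, (1 - e^{-2\lambda t})/\lambda\big)$. For $\mu = \sum_i\alpha_i\delta_{x_i}$ formula (6.91) reduces to $\rho_t = \sum_i\alpha_i\vartheta_{x_i,t}$, and the code checks that $\rho_t\gamma$ has unit mass and mean $e^{-\lambda t}\sum_i\alpha_i x_i$. All names are hypothetical.

```python
import math

lam, t = 1.0, 0.5

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gamma_density(y):
    return normal_pdf(y, 0.0, 1.0 / lam)

def theta(x, y):
    """Assumed Ornstein-Uhlenbeck transition density of nu_{x,t} w.r.t. gamma."""
    a = math.exp(-lam * t)
    return normal_pdf(y, a * x, (1.0 - a * a) / lam) / gamma_density(y)

atoms = [-1.0, 0.5, 2.0]
weights = [0.2, 0.5, 0.3]

def rho_t(y):
    """Formula (6.91) for the discrete measure sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * theta(x, y) for w, x in zip(weights, atoms))

def integrate(f, a=-12.0, b=12.0, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

mass = integrate(lambda y: rho_t(y) * gamma_density(y))
first_moment = integrate(lambda y: y * rho_t(y) * gamma_density(y))
expected_mean = math.exp(-lam * t) * sum(w * x for w, x in zip(weights, atoms))
print(mass, first_moment, expected_mean)
```

Because the density is a convex combination of the kernels $\vartheta_{x_i,t}$, the linearity assumption (6.87c) is built into this construction.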
Proof. Let us first extend $S$ by homogeneity to the cone $\mathcal{M}_2(D)$ of nonnegative finite measures with finite second moment
\[
\mathcal{M}_2(D) := \big\{\lambda\mu : \mu\in\mathcal{P}_2(D),\ \lambda\ge0\big\} \tag{6.94}
\]
simply by setting
\[
S_t[\lambda\mu] = \lambda S_t[\mu] \quad\forall\,\mu\in\mathcal{P}_2(D),\ \lambda\ge0. \tag{6.95}
\]
It is easy to check that this extension preserves properties (6.87a,b,c); moreover (6.83) holds for every couple of nonnegative coefficients $\alpha,\beta$:
\[
S_t[\alpha\mu + \beta\nu] = \alpha S_t[\mu] + \beta S_t[\nu] \quad\forall\,\mu,\nu\in\mathcal{M}_2(D),\ \alpha,\beta\ge0. \tag{6.96}
\]
The uniqueness of $\boldsymbol{S}$ is then immediate: if $\rho\gamma\in\mathcal{M}_2(D)$ then by (6.88) and (6.87b)
\[
\boldsymbol{S}_t[\rho] = \frac{S_t[\rho\gamma]}{\gamma}. \tag{6.97}
\]
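Since $\boldsymbol{S}_t$ is continuous, it is sufficient to determine it on the set
\[
C_\gamma := \Big\{\rho\in L^1(\gamma) : \int|x|^2|\rho(x)|\,d\gamma < +\infty\Big\}, \tag{6.98}
\]
which is clearly dense in $L^1(\gamma)$; since each $\rho\in C_\gamma$ can be decomposed as
\[
\rho = \rho_+ - \rho_- \quad\text{where}\quad \rho_+\gamma,\ \rho_-\gamma\in\mathcal{M}_2(D), \tag{6.99}
\]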
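$\boldsymbol{S}_t[\rho]$ should be equal to the difference between $\boldsymbol{S}_t[\rho_+]$ and $\boldsymbol{S}_t[\rho_-]$. Let us check that this representation is independent of the particular decomposition: if $\rho'_+,\rho'_-$ is another admissible couple as in (6.99), then
\[
\rho_+ + \rho'_- = \rho'_+ + \rho_-
\]
and therefore
\[
\boldsymbol{S}_t[\rho_+] + \boldsymbol{S}_t[\rho'_-] = \boldsymbol{S}_t[\rho_+ + \rho'_-] = \boldsymbol{S}_t[\rho'_+ + \rho_-] = \boldsymbol{S}_t[\rho'_+] + \boldsymbol{S}_t[\rho_-],
\]
showing that
\[
\boldsymbol{S}_t[\rho_+] - \boldsymbol{S}_t[\rho_-] = \boldsymbol{S}_t[\rho'_+] - \boldsymbol{S}_t[\rho'_-].
\]
Choosing in particular $\rho_+ := \max[\rho,0]$, $\rho_- := -\min[\rho,0]$ we get the bound
\[
\|\boldsymbol{S}_t[\rho]\|_{L^1(\gamma)}\le\|\boldsymbol{S}_t[\rho_+]\|_{L^1(\gamma)} + \|\boldsymbol{S}_t[\rho_-]\|_{L^1(\gamma)} = \|\rho_+\|_{L^1(\gamma)} + \|\rho_-\|_{L^1(\gamma)} = \|\rho\|_{L^1(\gamma)}, \tag{6.100}
\]
which shows that $\boldsymbol{S}_t$ is non expansive. Therefore, it can also be extended to a non expansive linear operator on $L^1(\gamma)$.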
Let us now check that $\boldsymbol{S}$ is a $C^0$-semigroup in $L^1(\gamma)$; it is well known (see e.g. [59]) that it is sufficient to prove that $\boldsymbol{S}$ is weakly continuous, i.e.
\[
\boldsymbol{S}_t[\rho]\rightharpoonup\rho \quad\text{weakly in } L^1(\gamma) \text{ as } t\downarrow0 \quad\forall\,\rho\in L^1(\gamma). \tag{6.101}
\]
Since $\boldsymbol{S}_t$ is non expansive, it is sufficient to check (6.101) on the dense subset $C_\gamma$; by linearity we can further reduce our analysis to $\rho\in B_\gamma$. Since $\boldsymbol{S}_t$ is induced by $S_t$ through (6.88) on this set, we know that
\[
\boldsymbol{S}_t[\rho]\gamma\to\rho\gamma \quad\text{in } \mathcal{P}_2(\mathbb{R}^d) \text{ as } t\downarrow0, \tag{6.102}
\]
and this yields the required weak convergence in $L^1(\gamma)$ thanks to the next Lemma.
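By the same argument one can show that
\[
\mu_n\to\mu \text{ in } \mathcal{P}_2(\mathbb{R}^d) \quad\Longrightarrow\quad \frac{S_t[\mu_n]}{\gamma}\rightharpoonup\frac{S_t[\mu]}{\gamma} \quad\text{weakly in } L^1(\gamma). \tag{6.103}
\]
In particular, for every $t > 0$
\[
\text{the map } x\in D\mapsto\vartheta_{x,t} \text{ is weakly continuous in } L^1(\gamma), \tag{6.104}
\]
therefore it is also a strongly measurable $L^1(\gamma)$-valued map, with
\[
\int_D\int_D|\vartheta_{x,t}(y)|\,d\gamma(y)\,d\gamma(x) = 1. \tag{6.105}
\]
It follows that the map (6.104) belongs to $L^1(\gamma;L^1(\gamma))$, which is isomorphic to $L^1(\gamma\times\gamma)$, thus showing (6.66). (6.104) also yields
\[
\text{the map } x\mapsto\int_D\varphi(y)\,\vartheta_{x,t}(y)\,d\gamma(y) \text{ is continuous} \quad\forall\,\varphi\in L^\infty(\gamma). \tag{6.106}
\]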
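In order to prove the representation formula (6.91) we observe that for every initial measure $\nu = \sum_i\alpha_i\delta_{x_i}\in\mathcal{P}_2(D)$ and every $\varphi\in C_b^0(D)$, $\nu_t = S_t[\nu]$ satisfies
\[
\int_D\varphi(y)\,d\nu_t(y) = \sum_i\alpha_i\int_D\varphi(y)\,\vartheta_{x_i,t}(y)\,d\gamma(y) = \int_D\Big(\int_D\varphi(y)\,\vartheta_{x,t}(y)\,d\gamma(y)\Big)d\nu(x). \tag{6.107}
\]
Therefore, by approximating an arbitrary measure $\mu$ by a sequence of concentrated measures $\nu^k = \sum_i\alpha_i^k\delta_{x_i^k}$ converging in $\mathcal{P}_2(D)$, since $S_t[\nu^k]\to S_t[\mu] = \mu_t = \rho_t\gamma$ in $\mathcal{P}_2(\mathbb{R}^d)$, (6.106) yields
\[
\int_D\varphi(y)\,\rho_t(y)\,d\gamma(y) = \int_D\Big(\int_D\varphi(y)\,\vartheta_{x,t}(y)\,d\gamma(y)\Big)d\mu(x), \tag{6.108}
\]
and therefore (6.91) by Fubini’s Theorem.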
Finally, if $\gamma$ is an invariant measure, then $\boldsymbol{S}_t[1] = 1$; the order preserving property shows that $\|\boldsymbol{S}_t[\rho]\|_{L^\infty(\gamma)}\le\|\rho\|_{L^\infty(\gamma)}$. By interpolation, the same property holds for every space $L^p(\gamma)$.
Lemma 6.11 (Narrow and weak $L^1$-convergence) Let $\mu_n = \rho_n\gamma$, $\mu = \rho\gamma\in\mathcal{P}(\mathbb{R}^d)$ be Borel probability measures absolutely continuous w.r.t. the Radon measure $\gamma$. Then
\[
\mu_n\to\mu \text{ narrowly in } \mathcal{P}(\mathbb{R}^d) \quad\Longleftrightarrow\quad \rho_n\rightharpoonup\rho \text{ weakly in } L^1(\gamma). \tag{6.109}
\]
The proof is an immediate consequence of the equiintegrability criterion stated in [7, Remark 5.33].
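For every function $\zeta\in L^\infty(\gamma)$ we can thus define
\[
\zeta_t(x) = \boldsymbol{S}_t^*[\zeta](x) := \int_{\mathbb{R}^d}\zeta(y)\,\vartheta_{x,t}(y)\,d\gamma(y). \tag{6.110}
\]
The next result shows that $\boldsymbol{S}_t^*$ is the adjoint semigroup of $\boldsymbol{S}_t$ and that it exhibits the Feller regularizing property.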
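Theorem 6.12 (The adjoint semigroup) Under the same assumptions of the previous Theorem, the maps $\boldsymbol{S}_t^*$ defined by (6.110) are the weakly$^*$-continuous, non expansive, adjoint semigroup on $L^\infty(\gamma)$ induced by $\boldsymbol{S}_t$, i.e. they satisfy
\[
\int_{\mathbb{R}^d}\boldsymbol{S}_t^*[\zeta]\,\rho\,d\gamma = \int_{\mathbb{R}^d}\zeta\,\boldsymbol{S}_t[\rho]\,d\gamma \quad\forall\,\rho\in L^1(\gamma). \tag{6.111}
\]
Moreover, for every $t > 0$
\[
\boldsymbol{S}_t^*\big(L^\infty(\gamma)\big)\subset C_b(D),\qquad \boldsymbol{S}_t^*\operatorname{Lip}(D)\subset\operatorname{Lip}(D). \tag{6.112}
\]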
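Proof. Let us denote by $\tilde{\boldsymbol{S}}_t^*$ the adjoint semigroup, defined as in (6.111), and by $\zeta_t$ the image of $\zeta\in L^\infty(\gamma)$ by $\tilde{\boldsymbol{S}}_t^*$; we introduce the measures
\[
\gamma_{x_0,r} := \frac{1}{\gamma(B_r(x_0))}\chi_{B_r(x_0)}\cdot\gamma\in\mathcal{P}_2(\mathbb{R}^d) \quad\forall\, x_0\in D = \operatorname{supp}(\gamma),\ r > 0, \tag{6.113}
\]
satisfying
\[
\gamma_{x_0,r}\to\delta_{x_0} \quad\text{in } \mathcal{P}_2(\mathbb{R}^d) \text{ as } r\downarrow0 \quad\forall\, x_0\in D. \tag{6.114}
\]
By (6.103) we know that for every $x_0\in D$ the limit
\[
\tilde\zeta_t(x_0) := \lim_{r\downarrow0}\frac{1}{\gamma(B_r(x_0))}\int_{B_r(x_0)}\zeta_t(x)\,d\gamma(x) \tag{6.115}
\]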
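exists since
\[
\frac{1}{\gamma(B_r(x_0))}\int_{B_r(x_0)}\zeta_t(x)\,d\gamma(x) = \int_{\mathbb{R}^d}\zeta_t\,\frac{\chi_{B_r(x_0)}}{\gamma(B_r(x_0))}\,d\gamma = \int_{\mathbb{R}^d}\zeta\,\boldsymbol{S}_t\Big[\frac{\chi_{B_r(x_0)}}{\gamma(B_r(x_0))}\Big]\,d\gamma = \int_{\mathbb{R}^d}\zeta\,dS_t(\gamma_{x_0,r})
\]
and therefore
\[
\tilde\zeta_t(x_0) = \int_{\mathbb{R}^d}\zeta\,dS_t(\delta_{x_0}) = \int_{\mathbb{R}^d}\zeta\,\vartheta_{x_0,t}\,d\gamma. \tag{6.116}
\]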
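Lebesgue differentiation Theorem yields $\tilde\zeta_t(x) = \zeta_t(x) = \boldsymbol{S}_t^*[\zeta](x)$ for $\gamma$-a.e. $x\in D$, thus showing that $\tilde{\boldsymbol{S}}^* = \boldsymbol{S}^*$.
(6.103) and the fact that $\lim_{x\to x_0}\delta_x = \delta_{x_0}$ in $\mathcal{P}_2(\mathbb{R}^d)$ show that
\[
\vartheta_{x,t}\rightharpoonup\vartheta_{x_0,t} \quad\text{weakly in } L^1(\gamma) \text{ as } x\to x_0 \quad\forall\, t > 0, \tag{6.117}
\]
and therefore $\tilde\zeta_t$ is the continuous representative of $\zeta_t$; this also shows the first inclusion of (6.112).
The second inclusion of (6.112) follows easily, since for each $\zeta\in\operatorname{Lip}(D)$ with $\zeta_t = \boldsymbol{S}_t^*(\zeta)$ and each couple of points $x,y\in D$ we have
\[
|\zeta_t(x) - \zeta_t(y)|\le\Big|\int_{\mathbb{R}^d}\zeta\,d\nu_{x,t} - \int_{\mathbb{R}^d}\zeta\,d\nu_{y,t}\Big|\le\operatorname{Lip}(\zeta;D)\,W_2(\nu_{x,t},\nu_{y,t})\le\operatorname{Lip}(\zeta;D)\,e^{-\lambda t}W_2(\delta_x,\delta_y) = e^{-\lambda t}\operatorname{Lip}(\zeta;D)\,|x - y|.
\]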
6.4 Nonlinear diffusion equations
In this section we consider the case of nonlinear diffusion equations in Rd .
Let us consider a convex differentiable function F : [0, +∞) → R which satisfies (4.71), (4.76), and (4.78): F is the density of the internal energy functional
F defined in (4.70).
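Setting $L_F(z) := zF'(z) - F(z)$, we are looking for nonnegative solutions of the evolution equation
\[
\partial_t u_t - \Delta(L_F(u_t)) = 0 \quad\text{in } \mathbb{R}^d\times(0,+\infty), \tag{6.118a}
\]
satisfying the (normalized) mass conservation
\[
u_t\in L^1(\mathbb{R}^d),\qquad \int_{\mathbb{R}^d}u_t(x)\,dx = 1 \quad\forall\, t > 0, \tag{6.118b}
\]
the finiteness of the quadratic moment
\[
\int_{\mathbb{R}^d}|x|^2u_t(x)\,dx < +\infty \quad\forall\, t > 0, \tag{6.118c}
\]
the integrability condition $L_F(u)\in L^1_{loc}(\mathbb{R}^d\times(0,+\infty))$, and the initial Cauchy condition
\[
\lim_{t\downarrow0}u_t\cdot\mathcal{L}^d = \mu_0 \quad\text{in } \mathcal{P}_2(\mathbb{R}^d). \tag{6.118d}
\]
Therefore (6.118a) has the usual distributional meaning
\[
-\int_0^{+\infty}\!\!\int_{\mathbb{R}^d}\Big(u_t\,\partial_t\zeta + L_F(u_t)\,\Delta\zeta\Big)\,dx\,dt = 0 \quad\forall\,\zeta\in\mathcal{D}(\mathbb{R}^d\times(0,+\infty)).
\]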
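Theorem 6.13 Suppose that $F$ has a superlinear growth as in (4.72). Then for every $\mu_0\in\mathcal{P}_2(\mathbb{R}^d)$ there exists a unique solution
\[
u\in AC^2_{loc}\big((0,+\infty);\mathcal{P}_2(\mathbb{R}^d)\big)
\]
of (6.118a,b,c,d) among those satisfying
\[
L_F(u)\in L^1_{loc}\big((0,+\infty);W^{1,1}_{loc}(\mathbb{R}^d)\big),\qquad \int_{\mathbb{R}^d}\frac{|\nabla L_F(u)|^2}{u}\,dx\in L^1_{loc}(0,+\infty). \tag{6.119}
\]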
The map $t\mapsto S_t[\mu_0] = \mu_t = u_t\mathcal{L}^d$ is the unique gradient flow in $\mathcal{P}_2(\mathbb{R}^d)$ of the functional $\mathcal{F}$ defined in (4.70), which is geodesically convex (and also satisfies (5.60) with $\lambda = 0$). The gradient flow satisfies all the properties of Theorem 5.7 for $\lambda = 0$. In particular, it is characterized by the system of E.V.I.
\[
\frac12\frac{d}{dt}W_2^2(\mu_t,\sigma)\le\mathcal{F}(\sigma) - \mathcal{F}(\mu_t) \quad\forall\,\sigma\in D(\mathcal{F}), \tag{6.120}
\]
it is non expansive
\[
W_2(S_t[\mu_0],S_t[\nu_0])\le W_2(\mu_0,\nu_0) \quad\forall\,\mu_0,\nu_0\in\mathcal{P}_2(\mathbb{R}^d),\ t\ge0, \tag{6.121}
\]
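and regularizing
\[
\sup_{0<t\le1}\,t\int_{\mathbb{R}^d}F(u_t(x))\,dx < +\infty,\qquad
\sup_{0<t\le1}\,t^2\int_{\mathbb{R}^d}\frac{|\nabla L_F(u_t(x))|^2}{u_t(x)}\,dx < +\infty,\qquad
t\mapsto\int_{\mathbb{R}^d}\frac{|\nabla L_F(u_t(x))|^2}{u_t(x)}\,dx \text{ is not increasing,} \tag{6.122}
\]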
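and for every $t > 0$
\[
\exists\lim_{h\downarrow0}\frac{t_{\mu_t}^{\mu_{t+h}} - i}{h} = -\frac{\nabla L_F(u_t)}{u_t} \quad\text{in } L^2(\mu_t;\mathbb{R}^d),\qquad
\exists\lim_{h\downarrow0}\frac{\mathcal{F}(\mu_{t+h}) - \mathcal{F}(\mu_t)}{h} = -\int_{\mathbb{R}^d}\frac{|\nabla L_F(u_t(x))|^2}{u_t(x)}\,dx. \tag{6.123}
\]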
Proof. The proof is a simple combination of Theorems 5.3, 5.7, and 5.8(ii) (see [8, Theorem 9.3.9]) and of the results of Section 4.5.3 for the functional F, noticing that the domain of F is dense in P2 (Rd ).
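Two structural features of (6.118a,b), conservation of mass and preservation of nonnegativity, can be reproduced by a very simple numerical scheme. The sketch below is not the construction used in the text (which relies on the Wasserstein gradient flow); it is an explicit finite-volume discretization in one space dimension with zero flux at the ends of the computational interval, under the assumption $L_F(z) = z^2$ (corresponding to $F(z) = z^2$, of porous medium type). The grid, the time step and the initial datum are illustrative choices.

```python
def pressure(z):
    # Assumed nonlinearity L_F(z) = z^2.
    return z * z

def step(u, dx, dt):
    """One explicit finite-volume step for du/dt = (L_F(u))_xx with zero-flux ends."""
    p = [pressure(v) for v in u]
    # flux[i] is the flux through the left interface of cell i (zero at both ends).
    flux = [0.0] + [-(p[i + 1] - p[i]) / dx for i in range(len(u) - 1)] + [0.0]
    return [u[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(len(u))]

n, length = 100, 4.0
dx = length / n
centers = [-length / 2.0 + (i + 0.5) * dx for i in range(n)]

# Compactly supported initial datum, normalized to unit mass.
u = [max(0.0, 1.0 - x * x) for x in centers]
mass0 = sum(u) * dx
u = [v / mass0 for v in u]
moment0 = sum(x * x * v for x, v in zip(centers, u)) * dx

dt = 2e-4   # small enough for stability with max(u) below 1 on this grid
for _ in range(500):
    u = step(u, dx, dt)

mass = sum(u) * dx
moment = sum(x * x * v for x, v in zip(centers, u)) * dx
print(mass, min(u), moment0, moment)
```

The telescoping sum of the interface fluxes is what makes the discrete mass exactly conserved, mirroring the role of the continuity equation; the second moment grows as the profile spreads.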
Remark 6.14 When $F$ has a sublinear growth and satisfies
\[
\lim_{z\to+\infty}\frac{F(z)}{z} = 0,\qquad \lim_{z\to+\infty}\frac{F(z)}{z^{1-1/d}} = -\infty, \tag{6.124}
\]
then it is possible to prove [8, Thm. 10.4.8] that $\mathcal{F}$ still satisfies (5.1c) and the Wasserstein semigroup generated by $\mathcal{F}$ provides the unique solution of (6.118a) in the above precise meaning: for, even if $\mu_0$ is not regular (e.g. a Dirac mass), the regularizing effect of the Wasserstein semigroup shows that $\mu_t := S[\mu_0](t)$ is absolutely continuous w.r.t. the Lebesgue measure $\mathcal{L}^d$ for all $t > 0$: its density $u_t$ w.r.t. $\mathcal{L}^d$ is therefore well defined and solves (6.118a).
Remark 6.15 Equation (6.118a) is a very classical problem: it has been studied by many authors from different points of view, which it is impossible to recall in detail here.
We only mention that in the case of homogeneous Dirichlet boundary conditions in a bounded domain, H. Brézis showed that the equation is the gradient flow (see [15]) of the convex functional (since $L_F$ is monotone)
\[
\psi(u) := \int_{\mathbb{R}^d}G_F(u)\,dx,\qquad\text{where } G_F(u) := \int_0^u L_F(r)\,dr,
\]
in the space $H^{-1}(\Omega)$. We refer to the paper of Otto [57] for a detailed comparison of the two notions of solutions and for a physical justification of the interest of the Wasserstein approach.
It is also possible to prove that the differential operator $-\Delta(L_F(u))$ is m-accretive in $L^1(\mathbb{R}^d)$ and therefore it induces a (nonlinear) contraction semigroup in $L^1(\mathbb{R}^d)$. Notice that here we allow for more general initial data (an arbitrary probability measure), whereas in the $H^{-1}$ (or $L^1$) formulation Dirac masses are not allowed (but see [60, 22]).
6.5 Drift diffusion equations with non local terms
Let us consider, as in [20, 21], a functional φ which is the sum of internal,
potential, and interaction energy:
\[
\phi(\mu) := \int_{\mathbb{R}^d}F(u)\,dx + \int_{\mathbb{R}^d}V\,d\mu + \frac12\int_{\mathbb{R}^d\times\mathbb{R}^d}W\,d\mu\times\mu \quad\text{if } \mu = u\mathcal{L}^d.
\]
Here F, V, W satisfy the assumptions considered in Section 4.5.7; as usual we
set φ(µ) = +∞ if µ ∈ P2 (Rd ) \ P2a (Rd ). The gradient flow of φ in P2 (Rd )
leads to the equation
\[
\partial_t u_t - \nabla\cdot\Big(\nabla L_F(u_t) + u_t\nabla V + u_t\,(\nabla W)*u_t\Big) = 0, \tag{6.125}
\]
coupled with conditions (6.118b), (6.118c), (6.118d).
Theorem 6.16 For every $\mu_0\in\mathcal{P}_2(\mathbb{R}^d)$ there exists a unique distributional solution $u_t$ of (6.125) among those satisfying $u_t\mathcal{L}^d\to\mu_0$ in $\mathcal{P}_2(\mathbb{R}^d)$ as $t\downarrow0$, $L_F(u_t)\in L^1_{loc}((0,+\infty);W^{1,1}_{loc}(\Omega))$, and
\[
\Big\|\frac{\nabla L_F(u_t)}{u_t} + \nabla V + \nabla W*u_t\Big\|_{L^2(\mu_t;\mathbb{R}^d)}\in L^2_{loc}(0,+\infty). \tag{6.126}
\]
Furthermore, this solution is the unique gradient flow in $\mathcal{P}_2(\mathbb{R}^d)$ of the functional $\phi$, which is λ-geodesically convex, and therefore satisfies all the properties stated in Theorem 5.7. In particular, when $\lambda > 0$ there exists a unique minimizer $\mu$ of $\phi$ and the gradient flow generates a λ-contracting and regularizing semigroup which exhibits the asymptotic behaviour of (5.22a,b,c,d).
Proof. The existence of ut follows by Theorem 5.8 (besides (5.1), φ satisfies
(5.60), see [8, Theorem 9.3.5]) and by the characterization, given in Section 4.5.7,
of the (minimal) subdifferential of φ. The same characterization proves that any
ut as in the statement of the theorem is a gradient flow; therefore the uniqueness
Theorem 5.5 can be applied.
In the limiting case $F, V = 0$, the generated semigroup loses its regularizing
effect and its existence and main properties follow from the more general theory
of [8]. In this way it is possible to study a model equation for the evolution of
granular flows (see e.g. [17]).
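In the case $F, V = 0$ the equation reduces to a pure interaction equation, and a common way to get a feeling for its behaviour is to replace $\mu_t$ by $N$ equally weighted particles. The sketch below is an illustration only and is not taken from the text: the particle discretization, the explicit Euler time stepping, the restriction to one space dimension, and the names `particle_flow`, `grad_V`, `grad_W` are all assumptions made for this example.

```python
def particle_flow(x0, grad_V, grad_W, dt, n_steps):
    """Explicit Euler steps for the (hypothetical) particle system
        dx_i/dt = -grad_V(x_i) - (1/N) * sum_j grad_W(x_i - x_j),
    a formal discretization of the drift and interaction terms in one dimension.
    """
    x = list(x0)
    n = len(x)
    for _ in range(n_steps):
        # Velocity of particle i: potential drift plus averaged interaction.
        drift = [grad_V(xi) + sum(grad_W(xi - xj) for xj in x) / n for xi in x]
        x = [xi - dt * di for xi, di in zip(x, drift)]
    return x


# Example: V = 0 and W(z) = z**2 / 2, so grad_W(z) = z.
x0 = [-1.0, 0.5, 2.0]
x = particle_flow(x0, grad_V=lambda y: 0.0, grad_W=lambda z: z, dt=0.01, n_steps=100)
```

With this quadratic choice of $W$ the centre of mass is conserved and every particle moves towards it, so that each Euler step multiplies the deviation from the mean by $1 - dt$; this is consistent with the contraction expected for a $\lambda$-convex functional with $\lambda > 0$, although the example makes no claim about convergence of the discretization.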
Notice that, as we did in Section 6.3, we can also consider evolution equations in convex (bounded or unbounded) domains $\Omega \subset \mathbb{R}^d$ with homogeneous
Neumann boundary conditions, simply by setting $V(x) \equiv +\infty$ for $x \in \mathbb{R}^d \setminus \Omega$.
6.6 Gradient flow of $-W^2/2$ and geodesics
For a fixed reference measure $\sigma \in \mathcal{P}_2(\mathbb{R}^d)$ let us now consider the functional
$\phi(\mu) := -\frac{1}{2} W_2^2(\mu, \sigma)$, as in Theorem 4.19. Being $\phi$ $(-1)$-convex along generalized geodesics, we can apply Theorem 5.7 to show that $\phi$ generates an evolution
semigroup on $\mathcal{P}_2(\mathbb{R}^d)$.
When $\Gamma_o(\sigma, \mu_0)$ contains a plan $\gamma$ such that
\[
\big(\pi^1, \pi^1 + T(\pi^2 - \pi^1)\big)_\# \gamma \ \text{is optimal for some } T > 1, \tag{6.127}
\]
then the semigroup moves $\mu_0$ along the geodesics induced by $\gamma$. Lemma ??
shows that in this case $\gamma$ admits the representation $\gamma = (r \times i)_\# \mu_0$ for some
transport map $r$ and $\gamma$ is the unique element of $\Gamma_o(\sigma, \mu_0)$.
[Margin note: For this last application I am not sure what to do: translate it to the case of a.c. measures, only to show that the extended geodesic is a gradient flow?]
Theorem 6.17 Let two measures $\sigma, \mu_0 \in \mathcal{P}_2(\mathbb{R}^d)$ be given and suppose that
$\gamma \in \Gamma_o(\sigma, \mu_0)$ satisfies (6.127), i.e. the constant speed geodesic
\[
\gamma(s) := \big((1 - s)\pi^1 + s\pi^2\big)_\# \gamma
\]
can be extended to an interval $[0, T]$, with $T > 1$. Then the formula
\[
t \mapsto \mu(t) := \gamma(e^t), \quad \text{for } 0 \le t \le \log(T), \tag{6.128}
\]
gives the gradient flow of $\mu \mapsto -\frac{1}{2} W_2^2(\mu, \sigma)$ starting from $\mu_0$.
Proof. Lemma ?? shows that $\big((1 - e^t)\pi^1 + e^t\pi^2,\ \pi^1\big)_\# \gamma$ is the unique optimal
plan in $\Gamma_o(\mu(t), \sigma)$; therefore
\[
W_2^2(\mu(t), \mu(s)) = |e^t - e^s|^2\, W_2^2(\mu_0, \sigma), \qquad W_2^2(\mu(t), \sigma) = e^{2t}\, W_2^2(\mu_0, \sigma), \tag{6.129}
\]
so that
\[
\frac{d}{dt}\phi(\mu(t)) = -e^{2t}\, W_2^2(\mu_0, \sigma) = -|\mu'|^2(t).
\]
On the other hand, the characterization of $|\partial\phi|$ given in (4.104) gives
\[
|\partial\phi|^2(\mu(t)) = e^{2t}\, W_2^2(\mu_0, \sigma). \tag{6.130}
\]
This shows that $\mu(t)$ is a curve of maximal slope; combining Theorem ?? with
the uniqueness Theorem 5.5, we conclude.
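The two distance identities used in the proof can be checked numerically in a simple discrete situation. The sketch below is an illustration under assumptions that are not part of the text: $\sigma$ and $\mu_0$ are uniform empirical measures on the real line with the same number of atoms, coupled monotonically, so that the squared Wasserstein distance is given by the sorted matching; the helper names `w2_squared`, `extended_geodesic` and `flow` are hypothetical. In this setting the extension condition (6.127) amounts to requiring that $i \mapsto a_i + T(b_i - a_i)$ stays nondecreasing.

```python
import math


def w2_squared(xs, ys):
    """Squared 2-Wasserstein distance between two uniform empirical measures
    on the real line with equally many atoms (sorted matching is optimal)."""
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def extended_geodesic(a, b, s):
    """Atoms of ((1 - s) pi^1 + s pi^2)_# gamma, where gamma pairs a[i] with b[i]."""
    return [(1 - s) * ai + s * bi for ai, bi in zip(a, b)]


def flow(a, b, t):
    """Atoms of mu(t) := gamma(e^t)."""
    return extended_geodesic(a, b, math.exp(t))


# Atoms of sigma (a) and mu_0 (b); b[i] - a[i] is increasing in i, so the
# monotone coupling stays optimal along the extended geodesic for every s >= 0.
a = [0.0, 1.0, 3.0]
b = [-1.0, 1.5, 5.0]
d0 = w2_squared(b, a)

t, s = 0.7, 0.2
lhs_sigma = w2_squared(flow(a, b, t), a)              # W_2^2(mu(t), sigma)
rhs_sigma = math.exp(2 * t) * d0                      # e^{2t} W_2^2(mu_0, sigma)
lhs_pair = w2_squared(flow(a, b, t), flow(a, b, s))   # W_2^2(mu(t), mu(s))
rhs_pair = (math.exp(t) - math.exp(s)) ** 2 * d0      # |e^t - e^s|^2 W_2^2(mu_0, sigma)
```

The pairs `lhs_sigma`, `rhs_sigma` and `lhs_pair`, `rhs_pair` agree up to rounding, which is the discrete counterpart of the identities used to compute the metric derivative and the energy along $\mu(t)$.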
6.7 Bibliographical notes
a) The gradient flow formulation (5.3) suggests a general variational scheme
(the Minimizing Movement approach, which we discussed in the first part of
this book and which we will apply in the next sections) to approximate the
solution of (6.4a,b,c): proving its convergence is interesting both from the
theoretical (cf. the papers quoted at the beginning of the Chapter) and the
numerical point of view [47].
b) The variational scheme exhibits solutions which are a priori nonnegative,
even if the equation does not satisfy any maximum principle, as in the
fourth order case [55, 40].
c) Working in Wasserstein spaces allows for weak assumptions on the data:
initial values which are general measures (as for fundamental solutions, in
the linear cases) fit quite naturally in this framework.
d) The gradient flow structure suggests new contraction and energy estimates,
which may be useful to study the asymptotic behaviour of solutions to
(6.4a,b,c) [57, 11, 17, 21, 2, 63, 31], or to prove uniqueness under weak
assumptions on the data.
e) The interplay with the theory of Optimal Transportation provides a novel
point of view to get new functional inequalities with sharp constants [58, 64,
3, 23, 10, 28].
f) The variational structure provides an important tool in the study of the
dependence of solutions on perturbations of the functional.
g) The setting in spaces of measures is particularly well suited when one considers
evolution equations in infinite dimensions and tries to “pass to the limit” as
the dimension $d$ goes to $\infty$.
References
[1] M. Agueh, Existence of solutions to degenerate parabolic equations via the
Monge-Kantorovich theory, Adv. Differential Equations, to appear, (2002).
[2] M. Agueh, Asymptotic behavior for doubly degenerate parabolic equations, C. R.
Math. Acad. Sci. Paris, 337 (2003), pp. 331–336.
[3] M. Agueh, N. Ghoussoub, and X. Kang, The optimal evolution of the
free energy of interacting gases and its applications, C. R. Math. Acad. Sci.
Paris, 337 (2003), pp. 173–178.
[4] G. Alberti and L. Ambrosio, A geometrical approach to monotone functions in $\mathbb{R}^n$, Math. Z., 230 (1999), pp. 259–316.
[5] A. D. Aleksandrov, A theorem on triangles in a metric space and some
of its applications, in Trudy Mat. Inst. Steklov., v. 38,
Izdat. Akad. Nauk SSSR, Moscow, 1951, pp. 5–23.
[6] L. Ambrosio, Lecture notes on optimal transport problem, in Mathematical aspects of evolving interfaces, CIME summer school in Madeira (Pt),
P. Colli and J. Rodrigues, eds., vol. 1812, Springer, 2003, pp. 1–52.
[7] L. Ambrosio, N. Fusco, and D. Pallara, Functions of bounded variation and free discontinuity problems, Oxford Mathematical Monographs,
Clarendon Press, Oxford, 2000.
[8] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces
and in the spaces of probability measures, Lectures in Mathematics ETH,
Birkhäuser, Zürich, 2005.
[9] L. Ambrosio and P. Tilli, Topics on Analysis in Metric Spaces, Oxford
University Press, Oxford, 2004.
[10] A. Arnold and J. Dolbeault, Refined convex Sobolev inequalities, Tech.
Rep. 431, Ceremade, 2004.
[11] A. Arnold, P. Markowich, G. Toscani, and A. Unterreiter, On
convex Sobolev inequalities and the rate of convergence to equilibrium for
Fokker-Planck type equations, Comm. Partial Differential Equations, 26
(2001), pp. 43–100.
[12] J.-D. Benamou and Y. Brenier, A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numer. Math., 84
(2000), pp. 375–393.
[13] V. I. Bogachev, Gaussian measures, vol. 62 of Mathematical Surveys and
Monographs, American Mathematical Society, Providence, RI, 1998.
[14] V. I. Bogachev, G. Da Prato, and M. Röckner, Existence of solutions to weak parabolic equations for measures, Proc. London Math. Soc.
(3), 88 (2004), pp. 753–774.
[15] H. Brézis, Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert, North-Holland Publishing Co., Amsterdam, 1973. North-Holland Mathematics Studies, No. 5. Notas de
Matemática (50).
[16] G. Buttazzo, Semicontinuity, relaxation and integral representation in
the calculus of variations, vol. 207 of Pitman Research Notes in Mathematics Series, Longman Scientific & Technical, Harlow, 1989.
[17] E. Caglioti and C. Villani, Homogeneous cooling states are not always
good approximations to granular flows, Arch. Ration. Mech. Anal., 163
(2002), pp. 329–343.
[18] E. A. Carlen and W. Gangbo, Constrained steepest descent in the 2-Wasserstein metric, Ann. of Math. (2), 157 (2003), pp. 807–846.
[19] E. A. Carlen and W. Gangbo, Solution of a model Boltzmann equation via steepest descent in the
2-Wasserstein metric, Arch. Ration. Mech. Anal., 172 (2004), pp. 21–64.
[20] J. A. Carrillo, R. J. McCann, and C. Villani, Kinetic equilibration rates for granular media and related equations: entropy dissipation
and mass transportation estimates, Rev. Mat. Iberoamericana, 19 (2003),
pp. 971–1018.
[21] J. A. Carrillo, R. J. McCann, and C. Villani, Contractions in the 2-Wasserstein space and thermalization of granular media, To appear, (2004).
[22] E. Chasseigne and J. L. Vazquez, Theory of extended solutions for fast-diffusion equations in optimal classes of data. Radiation from singularities,
Arch. Ration. Mech. Anal., 164 (2002), pp. 133–187.
[23] D. Cordero-Erausquin, B. Nazaret, and C. Villani, A mass-transportation approach to sharp Sobolev and Gagliardo-Nirenberg inequalities, Adv. Math., 182 (2004), pp. 307–332.
[24] G. Da Prato and A. Lunardi, Elliptic operators with unbounded drift
coefficients and Neumann boundary condition, J. Differential Equations,
198 (2004), pp. 35–52.
[25] E. De Giorgi, New problems on minimizing movements, in Boundary
Value Problems for PDE and Applications, C. Baiocchi and J. L. Lions,
eds., Masson, 1993, pp. 81–98.
[26] E. De Giorgi, A. Marino, and M. Tosques, Problems of evolution in
metric spaces and maximal decreasing curve, Atti Accad. Naz. Lincei Rend.
Cl. Sci. Fis. Mat. Natur. (8), 68 (1980), pp. 180–187.
[27] M. Degiovanni, A. Marino, and M. Tosques, Evolution equations
with lack of convexity, Nonlinear Anal., 9 (1985), pp. 1401–1443.
[28] M. Del Pino, J. Dolbeault, and I. Gentil, Nonlinear diffusions, hypercontractivity and the optimal $L^p$-Euclidean logarithmic Sobolev inequality, J. Math. Anal. Appl., 293 (2004), pp. 375–388.
[29] C. Dellacherie and P.-A. Meyer, Probabilities and potential, vol. 29 of
North-Holland Mathematics Studies, North-Holland Publishing Co., Amsterdam, 1978.
[30] R. J. DiPerna and P.-L. Lions, Ordinary differential equations, transport theory and Sobolev spaces, Invent. Math., 98 (1989), pp. 511–547.
[31] J. Dolbeault, D. Kinderlehrer, and M. Kowalczyk, Remarks about
the Flashing Rachet, Tech. Rep. 406, Ceremade, 2004.
[32] L. C. Evans, W. Gangbo, and O. Savin, Diffeomorphisms and nonlinear heat flows, (2004).
[33] L. C. Evans and R. F. Gariepy, Measure theory and fine properties of
functions, Studies in Advanced Mathematics, CRC Press, Boca Raton, FL,
1992.
[34] H. Federer, Geometric measure theory, Die Grundlehren der mathematischen Wissenschaften, Band 153, Springer-Verlag New York Inc., New
York, 1969.
[35] J. Feng and M. Katsoulakis, A Hamilton Jacobi theory for controlled
gradient flows in infinite dimensions, tech. rep., 2003.
[36] W. Gangbo, The Monge mass transfer problem and its applications, in
Monge Ampère equation: applications to geometry and optimization (Deerfield Beach, FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc.,
Providence, RI, 1999, pp. 79–104.
[37] R. Gardner, The Brunn-Minkowski inequality, Bull. Amer. Math. Soc.,
39 (2002), pp. 355–405.
[38] L. Giacomelli and F. Otto, Variational formulation for the lubrication approximation of the Hele-Shaw flow, Calc. Var. Partial Differential
Equations, 13 (2001), pp. 377–403.
[39] L. Giacomelli and F. Otto, Rigorous lubrication approximation, Interfaces Free Bound., 5
(2003), pp. 483–529.
[40] U. Gianazza, G. Toscani, and G. Savaré, A fourth order parabolic
equation and the Wasserstein distance, tech. rep., IMATI-CNR, Pavia,
2004. To appear.
[41] M. Giaquinta and S. Hildebrandt, Calculus of Variations I, vol. 310 of
Grundlehren der mathematischen Wissenschaften, Springer, Berlin, 1996.
[42] K. Glasner, A diffuse interface approach to Hele-Shaw flow, Nonlinearity,
16 (2003), pp. 49–66.
[43] C. Goffman and J. Serrin, Sublinear functions of measures and variational integrals, Duke Math. J., 31 (1964), pp. 159–178.
[44] C. Huang and R. Jordan, Variational formulations for Vlasov-Poisson-Fokker-Planck systems, Math. Methods Appl. Sci., 23 (2000), pp. 803–843.
[45] R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal., 29 (1998),
pp. 1–17 (electronic).
[46] J. Jost, Nonpositive curvature: geometric and analytic aspects, Lectures
in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 1997.
[47] D. Kinderlehrer and N. J. Walkington, Approximation of parabolic
equations using the Wasserstein metric, M2AN Math. Model. Numer.
Anal., 33 (1999), pp. 837–852.
[48] Z.-M. Ma and M. Röckner, Introduction to the Theory of (Nonsymmetric) Dirichlet Forms, Springer, New York, 1992.
[49] A. Marino, C. Saccon, and M. Tosques, Curves of maximal slope and
parabolic variational inequalities on nonconvex constraints, Ann. Scuola
Norm. Sup. Pisa Cl. Sci. (4), 16 (1989), pp. 281–330.
[50] R. McCann, Polar factorization of maps on Riemannian manifolds, Geometric and Functional Analysis, 11 (2001), pp. 589–608.
[51] R. J. McCann, A convexity principle for interacting gases, Adv. Math.,
128 (1997), pp. 153–179.
[52] T. Mikami, Dynamical systems in the variational formulation of the
Fokker-Planck equation by the Wasserstein metric, Appl. Math. Optim.,
42 (2000), pp. 203–227.
[53] F. Otto, Doubly degenerate diffusion equations as steepest descent,
Manuscript, (1996).
[54] F. Otto, Dynamics of labyrinthine pattern formation in magnetic fluids: a
mean-field theory, Arch. Rational Mech. Anal., 141 (1998), pp. 63–103.
[55] F. Otto, Lubrication approximation with prescribed nonzero contact angle,
Comm. Partial Differential Equations, 23 (1998), pp. 2077–2164.
[56] F. Otto, Evolution of microstructure in unstable porous media flow: a relaxational approach, Comm. Pure Appl. Math., 52 (1999), pp. 873–915.
[57] F. Otto, The geometry of dissipative evolution equations: the porous medium
equation, Comm. Partial Differential Equations, 26 (2001), pp. 101–174.
[58] F. Otto and C. Villani, Generalization of an inequality by Talagrand
and links with the logarithmic Sobolev inequality, J. Funct. Anal., 173
(2000), pp. 361–400.
[59] A. Pazy, Semigroups of linear operators and applications to partial differential equations, Springer-Verlag, New York, 1983.
[60] M. Pierre, Uniqueness of the solutions of $u_t - \Delta\varphi(u) = 0$ with initial
datum a measure, Nonlinear Anal., 6 (1982), pp. 175–187.
[61] A. Pratelli, On the equality between Monge’s infimum and Kantorovich’s
minimum in optimal mass transportation, To appear, (2004).
[62] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, Springer-Verlag, Berlin, 1998.
[63] C. Sparber, J. A. Carrillo, J. Dolbeault, and P. A. Markowich,
On the long-time behavior of the quantum Fokker-Planck equation,
Monatsh. Math., 141 (2004), pp. 237–257.
[64] C. Villani, Optimal transportation, dissipative PDE’s and functional inequalities, in Optimal transportation and applications (Martina Franca,
2001), vol. 1813 of Lecture Notes in Math., Springer, Berlin, 2003, pp. 53–89.
[65] C. Villani, Topics in optimal transportation, vol. 58 of Graduate Studies in
Mathematics, American Mathematical Society, Providence, RI, 2003.