
BV ESTIMATES IN OPTIMAL TRANSPORTATION AND APPLICATIONS
GUIDO DE PHILIPPIS, ALPÁR RICHÁRD MÉSZÁROS, FILIPPO SANTAMBROGIO,
AND BOZHIDAR VELICHKOV
Abstract. In this paper we study BV regularity for solutions of variational problems in
Optimal Transportation. As an application, we recover BV estimates for solutions of some
non-linear parabolic PDEs by means of optimal transportation techniques. We also prove that
the Wasserstein projection of a measure with BV density on the set of measures with density
bounded by a given BV function f is of bounded variation as well. In particular, in the case
f = 1 (projection onto the set of densities with an L∞ bound) we prove precisely that the total
variation of the projection does not exceed the total variation of the projected measure. This
estimate can be iterated, and is therefore very useful in some evolutionary PDEs
(crowd motion, ...). We also establish some properties of the Wasserstein projection which are
interesting in their own right, and which allow us, for instance, to prove uniqueness of such a
projection in a very general framework.
1. Introduction
Among variational problems involving optimal transportation and Wasserstein distances, a
very recurrent one is the following:

(1.1)   min_{% ∈ P₂(Ω)}  ½ W₂²(%, g) + τ F(%),

where F is a given functional on probability measures, τ > 0 a parameter which can possibly be
small, and g is a given probability in P₂(Ω) (the space of probability measures on Ω ⊆ R^d with
finite second moment ∫ |x|² d%(x) < +∞). This very instance of the problem is exactly the one
we face in the time-discretization of the gradient flow of F in P₂(Ω), where g = %^τ_k is the measure
at step k, and the optimal % will be the next measure %^τ_{k+1}. Under suitable assumptions, in the
limit as τ → 0, this sequence converges to a curve of measures which is the gradient flow of
F (see [2, 1] for a general description of this theory).
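The time-discretization scheme just described becomes completely explicit in 1D, where ½W₂² is half the squared L² distance between quantile functions. Below is a minimal numerical sketch (our own illustration, not from the paper) of one such step for the hypothetical choice F(%) = ∫ (x²/2) %(x) dx, for which the minimization decouples across quantile levels.

```python
import numpy as np

def jko_step_quantiles(q_g, tau):
    """One step of min_% (1/2) W2^2(%, g) + tau * F(%) in 1D, with the
    illustrative potential energy F(%) = ∫ (x^2/2) %(x) dx.
    In 1D, (1/2) W2^2(%, g) = (1/2) ∫_0^1 |q(s) - q_g(s)|^2 ds over quantile
    functions, so the problem decouples: for each s we minimize
    (1/2)(q - q_g(s))^2 + tau * q^2 / 2, whose pointwise solution
    q = q_g(s) / (1 + tau) is monotone in s, hence a valid quantile function."""
    return q_g / (1.0 + tau)

s = np.linspace(0.005, 0.995, 100)   # grid of quantile levels
q_g = 2.0 * s - 1.0                  # quantiles of the uniform density on [-1, 1]
q_next = jko_step_quantiles(q_g, tau=0.1)

# the step contracts the measure toward the minimum of the potential x^2/2
print(bool(np.all(np.abs(q_next) <= np.abs(q_g))))  # True
```

Iterating this map reproduces the discrete gradient-flow dynamics for this particular (toy) functional; general F of course require a genuine minimization at each step.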
But the same problem also appears in other frameworks as well, for fixed τ . For instance in
image processing, if F is a smoothing functional, this is a model to find a better (smoother)
image % which is not so far from the original g (and the choice of the distance W2 is justified by
robustness arguments), see [15]. In urban planning (see [5, 23]) g can represent the distribution
of some resources and % that of the population, which wants to be close to g but also to guarantee
enough space to each individual. In this case the functional F favors diffuse measures, for
instance F(%) = ∫ h(%(x)) dx for a convex and superlinear function h, which gives a higher cost
to high densities of % (note that in this case, by Jensen's inequality, F is minimized by the
uniform measure on Ω). Conversely, g could instead represent the distribution of population,
and % that of services, to be chosen so that they are close enough to g but more concentrated.
In this case F will favor concentrated measures, instead.
When F takes only the values 0 and +∞, the above problem becomes a projection problem. Recently, the projection onto the set K₁ of densities bounded above by the constant 1¹ has received
a lot of attention because of its applications in the time-discretization of evolution problems with
density constraints, in particular for crowd motion (see [22, 16], where a crowd is described as a
population of particles which cannot overlap and cannot exceed a certain threshold density).
In this paper we concentrate on the case where F(%) = ∫ h(%) dx for a convex integrand h :
R⁺ → R ∪ {+∞}. The case of the projection onto K₁ is obtained by taking the following function:
h(%) = 0    if 0 ≤ % ≤ 1,
h(%) = +∞   if % > 1.
We are interested in the estimates that one can give on the minimizer %̄, which can be roughly
divided into two categories: those which are independent of g but depend on τ , and those which
are uniform in τ but require similar bounds on g. For instance, by writing down the optimality
conditions for (1.1) in the case F(%) = ∫ h(%), we get ϕ + τ h′(%̄) = const, where ϕ is the
Kantorovich potential in the transport from %̄ to g (this equality holds %̄-a.e., but
we will neglect this detail in the present heuristic discussion). On a bounded domain, ϕ is Lipschitz
continuous, and so is τ h′(%̄). If h is strictly convex and C¹, this allows one to get continuity of %̄,
but the bounds obviously degenerate as τ → 0. On the other hand, they do not really depend
on g.
Another bound that one can prove is k%̄kL∞ ≤ kgkL∞ (see [7, 23]), which, on the contrary, is
independent of τ .
In this paper we are mainly concerned with BV estimates. As we also expect some uniform
estimate, we get rid of the parameter τ that we only introduced for the sake of this presentation.
We recall that for every function % ∈ L¹ and every open set A the total variation of ∇% in A is
defined as

TV(%, A) = ∫_A |∇%| := sup { ∫_A % div ξ dx : ξ ∈ C_c¹(A), |ξ| ≤ 1 }.
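For intuition, the total variation can be tested numerically on a grid; in 1D and for a monotone profile the supremum is just the total increase. A minimal sketch (our own illustration; the grid and the sample density are arbitrary choices):

```python
import numpy as np

def total_variation_1d(rho):
    """Discrete total variation of a 1D function sampled on a uniform grid:
    sum of |rho[i+1] - rho[i]|, a Riemann-type approximation of ∫ |∇%| dx
    (the grid spacing from the integral and from the difference quotient cancel)."""
    return float(np.abs(np.diff(rho)).sum())

x = np.linspace(0.0, 1.0, 1001)
rho = np.minimum(2.0 * x, 1.0)          # increases from 0 to 1, then is constant
print(round(total_variation_1d(rho), 12))  # 1.0 (total increase of a monotone profile)
```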
Our main theorem reads as follows:
Theorem 1.1. Let Ω ⊂ Rd be a (possibly unbounded) convex set, h : R+ → R ∪ {+∞} be a
convex function and g ∈ P2 (Ω)∩BV (Ω). If %̄ is a minimizer of the following variational problem
min_{% ∈ P₂(Ω)}  ½ W₂²(%, g) + ∫_Ω h(%(x)) dx,

then

∫_Ω |∇%̄| dx ≤ ∫_Ω |∇g| dx.
As we said, this covers the case of the Wasserstein projection of g onto the subset K₁ of P₂(Ω)
given by the measures with density less than or equal to 1. When dealing with Wasserstein
projections we are actually able to establish BV bounds in the more general case in which we
project onto the set of measures with density less than or equal to a prescribed BV function f.
More precisely, we have the following theorem.
¹Here and in the sequel we denote by K_f the set of absolutely continuous measures with density bounded by f,
i.e.
K_f := {% ∈ P(Ω) : % ≤ f dx}.
Theorem 1.2. Let Ω ⊂ R^d be a (possibly unbounded) convex set, g ∈ P₂(Ω) ∩ BV(Ω) and let
f ∈ BV_loc(Ω) be a function with ∫_Ω f dx ≥ 1. If

(1.2)   %̄ = argmin { W₂²(%, g) : % ∈ P₂(Ω), % ≤ f a.e. },

then

(1.3)   ∫_Ω |∇%̄| dx ≤ ∫_Ω |∇g| dx + 2 ∫_Ω |∇f| dx.
We would like to spend some words on this BV estimate for the projection, at least in the case
of the projection onto K₁, which is the original motivation for this paper. This BV estimate is
indeed natural, at least in the 1D case: all the oscillations beyond the level 1 are replaced by a
possible jump between a value smaller than 1 and 1, as in the picture below.

[Figure: in 1D, %̄ = 1 on the regions where g oscillates above the threshold, and %̄ = g elsewhere.]

For the higher-dimensional case, the situation is trickier: for instance, the projection of a
measure g = (1 + ε)𝟙_{B(0,R)} (with (1 + ε)|B(0, R)| = 1) is the indicator function of a bigger
ball. The total variation involves two opposite effects: the perimeter of the ball increases, but
the height of the jump passes from 1 + ε to 1. It is not difficult to see that the combination of
the two effects is such that the total variation decreases. It is also possible to adapt this
argument to the case of a radially symmetric density g, but the general case is
not evident.
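For the reader's convenience, here is the computation behind the ball example (our own elaboration of the argument sketched above; ω_d denotes the volume of the unit ball, so that Per(B(0, r)) = dω_d r^{d−1}):

```latex
\[
(1+\varepsilon)\,|B(0,R)| = |B(0,R')|
\;\Longrightarrow\;
R' = (1+\varepsilon)^{1/d} R,
\]
\[
TV(\bar\varrho) = \mathrm{Per}\big(B(0,R')\big) = d\omega_d (R')^{d-1}
= (1+\varepsilon)^{\frac{d-1}{d}}\, d\omega_d R^{d-1}
\;\le\; (1+\varepsilon)\, d\omega_d R^{d-1}
= TV(g).
\]
```

The perimeter indeed grows, by the factor (1+ε)^{(d−1)/d} > 1, but the jump height decreases by the larger factor (1+ε), so the total variation decreases.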
This kind of BV estimate is useful when the projection is treated as one time-step of a
discretized evolution process. For instance, a BV bound allows one to turn weak convergence
in the sense of measures into strong L¹ convergence (see Section 6.3). Also, if we consider a
PDE mixing a smooth evolution, such as the Fokker-Planck evolution, with some projection
steps (in order to impose a density constraint, as in crowd motion models), one may wonder
which regularity bounds on the solution are preserved in time. Since the discontinuities created
by the projection destroy any kind of W^{1,p} norm, it is natural to look for
BV bounds. Notice, by the way, that for this kind of application proving ∫_Ω |∇%̄| ≤ ∫_Ω |∇g|
(with no multiplicative coefficient nor additional term) is crucial in order to iterate this estimate
at every step.
The paper is structured as follows. In Section 2 we recall some preliminary results in optimal transportation; in Section 3 we establish our key "mother" inequality; in Section 4 we prove
Theorem 1.1, while in Section 5 we collect some properties of the solutions of (1.2) which may be
interesting in their own right, and we prove Theorem 1.2. Eventually, in Section 6 we present some
applications of the above results, connections with other variational and evolution problems, and
some open questions.
2. Notations and preliminaries
In this section we collect some facts about optimal transport that we will need in the sequel,
referring the reader to [25] for more details. We will denote by P(Ω) the set of probability
measures on Ω and by P₂(Ω) the subset of P(Ω) given by those with finite second moment (i.e.
µ ∈ P₂(Ω) if and only if ∫ |x|² dµ < ∞). We will also use the space M(Ω) of finite measures
on Ω and the set L¹₊(Ω) of non-negative functions in L¹. Notice that {f ∈ L¹₊(Ω) : ∫ f(x) dx = 1} =
L¹₊(Ω) ∩ P(Ω). In the sequel we will always identify an absolutely continuous measure with its
density (for instance writing T# f for T# (f dx), and so on).
Theorem 2.1. Let Ω ⊂ R^d be a given convex set and let %, g ∈ L¹₊(Ω) be two probability densities
on Ω. Then the following hold:
(i) The problem

(2.1)   ½ W₂²(%, g) := min { ∫_{Ω×Ω} ½ |x − y|² dγ : γ ∈ Π(%, g) },

where Π(%, g) is the set of the so-called transport plans, i.e. Π(% dx, g dx) := {γ ∈ P(Ω ×
Ω) : (π^x)# γ = %, (π^y)# γ = g}, has a unique solution, which is of the form γ_T̂ :=
(id, T̂)# %, where T̂ : Ω → Ω is a solution of the problem

(2.2)   min_{T# % = g} ∫_Ω ½ |x − T(x)|² %(x) dx.

(ii) The map T̂ : {% > 0} → {g > 0} is a.e. invertible and its inverse Ŝ := T̂⁻¹ is a solution
of the problem

(2.3)   min_{S# g = %} ∫_Ω ½ |x − S(x)|² g(x) dx.

(iii) W₂(·, ·) is a distance on the space P₂(Ω) of probabilities over Ω with finite second moment.
(iv) We have

(2.4)   ½ W₂²(%, g) = max { ∫_Ω ϕ(x) %(x) dx + ∫_Ω ψ(y) g(y) dy : ϕ(x) + ψ(y) ≤ ½ |x − y|², ∀x, y ∈ Ω }.

(v) The optimal functions ϕ̂, ψ̂ in (2.4) are continuous, differentiable almost everywhere,
Lipschitz if Ω is bounded, and such that:
• T̂(x) = x − ∇ϕ̂(x) and Ŝ(x) = x − ∇ψ̂(x) for a.e. x ∈ Ω; in particular, the
gradients of the optimal functions are uniquely determined (even in case of non-uniqueness of ϕ̂ and ψ̂) a.e. on {% > 0} and {g > 0}, respectively;
• the functions x ↦ |x|²/2 − ϕ̂(x) and x ↦ |x|²/2 − ψ̂(x) are convex in Ω, and hence ϕ̂ and ψ̂ are semi-concave;
• ϕ̂(x) = max_{y∈Ω} { ½ |x − y|² − ψ̂(y) } and ψ̂(y) = max_{x∈Ω} { ½ |x − y|² − ϕ̂(x) };
• if we denote by χᶜ the so-called c-transform of a function χ : Ω → R, defined through
χᶜ(y) = inf_{x∈Ω} ½ |x − y|² − χ(x), then the maximal value in (2.4) is also equal to

(2.5)   max { ∫_Ω ϕ(x) %(x) dx + ∫_Ω ϕᶜ(y) g(y) dy : ϕ ∈ C⁰(Ω) },

and the optimal ϕ is the same ϕ̂ as above, and is such that ϕ̂ = (ϕ̂ᶜ)ᶜ a.e. on
{% > 0}.
(vi) The functional W : M(Ω) → R ∪ {+∞} defined through

W(%) := max { ∫_Ω ϕ(x) %(x) dx + ∫_Ω ϕᶜ(y) g(y) dy : ϕ ∈ C⁰(Ω) } = { ½ W₂²(%, g) if % ∈ P₂(Ω);  +∞ otherwise }

is convex and its subdifferential is given by

∂W(%) = { ϕ ∈ C⁰(Ω) : ϕ is optimal in (2.5) }.
The only non-standard point is the last one (the computation of the sub-differential of W ):
it is sketched in [5], and a more detailed presentation will be part of [24].
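In dimension one the content of Theorem 2.1 is completely explicit: the optimal map is the monotone rearrangement. For two empirical measures with the same number of atoms this reduces to matching sorted points; here is a minimal sketch (our own illustration, not part of the paper's setting, which deals with densities):

```python
import numpy as np

def w2_squared_1d(x, y):
    """W2^2 between the empirical measures (1/n) Σ δ_{x_i} and (1/n) Σ δ_{y_j}
    on the real line. The optimal plan matches the i-th smallest x to the
    i-th smallest y (the monotone rearrangement), so
    W2^2 = (1/n) Σ |x_(i) - y_(i)|^2."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean((xs - ys) ** 2))

x = np.array([3.0, 1.0, 2.0])
y = np.array([2.0, 4.0, 6.0])
# monotone matching: 1 -> 2, 2 -> 4, 3 -> 6, with squared costs 1, 4, 9
print(round(w2_squared_1d(x, y), 6))  # 4.666667
```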
We also need some regularity results on optimal transport maps, see [8, 9].
Theorem 2.2. Let Ω ⊂ R^d be a bounded strictly convex set with smooth boundary and let
%, g ∈ L¹₊(Ω) be two probability densities on Ω away from zero and infinity². Then, using the
notations from Theorem 2.1, we have:
(i) T̂ ∈ C 0,α (Ω) and Ŝ ∈ C 0,α (Ω).
(ii) If % ∈ C k,β (Ω) and g ∈ C k,β (Ω), then T̂ ∈ C k+1,β (Ω) and Ŝ ∈ C k+1,β (Ω).
Most of our proofs will be done by approximation; to this end, we need a stability result.
Theorem 2.3. Let Ω ⊂ R^d be a bounded convex set and let %n ∈ L¹₊(Ω) and gn ∈ L¹₊(Ω) be two
sequences of probability densities on Ω. Then, using the notations from Theorem 2.1, if %n ⇀ %
and gn ⇀ g weakly as measures, then we have:
(i) W₂(%, g) = lim_{n→∞} W₂(%n, gn);
(ii) there exist two semi-concave functions ϕ, ψ such that ∇ϕ̂n → ∇ϕ and ∇ψ̂n → ∇ψ a.e.,
with ∇ϕ = ∇ϕ̂ a.e. on {% > 0} and ∇ψ = ∇ψ̂ a.e. on {g > 0}.
If Ω is unbounded (for instance Ω = Rd ), then the convergence %n * % and gn * g weakly as
measures is not enough to guarantee (i) but only implies W2 (%, g) ≤ lim inf n→∞ W2 (%n , gn ). Yet,
(i) is satisfied if W2 (%n , %), W2 (gn , g) → 0, which is a stronger condition.
Proof. The proof of (i) can be found in [25]. We prove (ii). (Actually, this is a consequence of
Theorem 3.3.3 in [10], but for the sake of completeness we sketch its simple proof.)
We first note that due to Theorem 2.1 (v) the sequences ϕ̂n and ψ̂n are equi-continuous.
Moreover, since the Kantorovich potentials are uniquely determined up to a constant we may
suppose that there is x0 ∈ Ω such that ϕ̂n (x0 ) = ψ̂n (x0 ) = 0 for every n ∈ N. Thus, ϕ̂n and ψ̂n
are locally uniformly bounded in Ω and, by the Ascoli-Arzelà Theorem, they converge uniformly
up to a subsequence,

ϕ̂n → ϕ∞   and   ψ̂n → ψ∞   as n → ∞,
²We say that % and g are away from zero and infinity if there is some ε > 0 such that ε ≤ % ≤ 1/ε and
ε ≤ g ≤ 1/ε a.e. in Ω.
to some continuous functions ϕ∞, ψ∞ ∈ C(Ω), satisfying

ϕ∞(x) + ψ∞(y) ≤ ½ |x − y|²   for every x, y ∈ Ω.

In order to show that ϕ∞ and ψ∞ are precisely Kantorovich potentials, we use the characterization of the potentials as solutions to the problem (2.4). Indeed, let ϕ and ψ be such that
ϕ(x) + ψ(y) ≤ ½ |x − y|² for every x, y ∈ Ω. Then, for every n ∈ N we have
∫_Ω ϕ̂n(x) %n(x) dx + ∫_Ω ψ̂n(y) gn(y) dy ≥ ∫_Ω ϕ(x) %n(x) dx + ∫_Ω ψ(y) gn(y) dy,
and passing to the limit we obtain
∫_Ω ϕ∞(x) %(x) dx + ∫_Ω ψ∞(y) g(y) dy ≥ ∫_Ω ϕ(x) %(x) dx + ∫_Ω ψ(y) g(y) dy,
which proves that ϕ∞ and ψ∞ are optimal. In particular, the gradients of these functions coincide
with those of ϕ̂ and ψ̂ on the sets where the densities are strictly positive.
We now prove that ∇ϕ̂n → ∇ϕ∞ a.e. in Ω. We denote by N ⊂ Ω the set of points x ∈ Ω at
which at least one of the functions ϕ̂ and ϕ̂n, n ∈ N, fails to be differentiable. We note that
by Theorem 2.1 (v) the set N has Lebesgue measure zero. Let now x₀ ∈ Ω \ N and suppose,
without loss of generality, x₀ = 0. Setting

αn(x) := |x|²/2 − ϕ̂n(x) + ϕ̂n(0) + x · ∇ϕ∞(0)   and   α(x) := |x|²/2 − ϕ∞(x) + ϕ∞(0) + x · ∇ϕ∞(0),

we have that the αn are all convex and satisfy αn(0) = 0, and hence αn(x) ≥ ∇αn(0) · x. Moreover,
αn → α locally uniformly and ∇α(0) = 0. Suppose by contradiction that lim_{n→∞} ∇αn(0) ≠ 0.
Then there are a unit vector p ∈ R^d and a constant δ > 0 such that, up to a subsequence,
p · ∇αn(0) ≥ δ for every n. Then, for every t > 0, we have

α(pt)/t = lim_{n→∞} αn(pt)/t ≥ lim inf_{n→∞} p · ∇αn(0) ≥ δ,

which contradicts the fact that ∇α(0) = 0. □
In order to handle our approximation procedures, we also need to spend some words on the
notion of Γ-convergence (see [11]).
Definition 2.1. On a metric space X let Fn : X → R ∪ {+∞} be a sequence of functions. We
define the two lower semicontinuous functions F⁻ and F⁺ (called the Γ-lim inf and the Γ-lim sup of
this sequence, respectively) by

F⁻(x) := inf { lim inf_{n→∞} Fn(xn) : xn → x },
F⁺(x) := inf { lim sup_{n→∞} Fn(xn) : xn → x }.

Should F⁻ and F⁺ coincide, then we say that Fn actually Γ-converges to the common value
F = F⁻ = F⁺.
This means that, when one wants to prove Γ-convergence of Fn towards a given functional
F, one actually has to prove two distinct facts: first that F⁻ ≥ F (i.e. lim inf_n Fn(xn) ≥ F(x)
for any approximating sequence xn → x; moreover, it is sufficient to prove this when Fn(xn) is
bounded), and then that F⁺ ≤ F (i.e. for every x one finds a recovery sequence xn → x
such that lim sup_n Fn(xn) ≤ F(x)). The definition of Γ-convergence for a continuous parameter
ε → 0 obviously passes through the convergence to the same limit for any subsequence εn → 0.
Among the properties of Γ-convergence we have the following:
• if there exists a compact set K ⊂ X such that inf_X Fn = inf_K Fn for every n, then F
attains its minimum and inf Fn → min F;
• if (xn)n is a sequence of minimizers of Fn admitting a subsequence converging to x,
then x minimizes F (in particular, if F has a unique minimizer x and the sequence of
minimizers (xn)n is compact, then xn → x);
• if Fn Γ-converges to F, then Fn + G Γ-converges to F + G for any
continuous function G : X → R ∪ {+∞}.
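A classical example, standard in the Γ-convergence literature (see e.g. [11]) and recalled here only for intuition, shows that the Γ-limit can differ from any pointwise limit:

```latex
F_n(x) := \sin(nx)
\qquad\Longrightarrow\qquad
\big(\Gamma\text{-}\lim_{n\to\infty} F_n\big)(x) = -1
\quad\text{for every } x\in\mathbb{R}.
```

Indeed F_n ≥ −1 gives F⁻ ≥ −1, while the recovery sequence x_n := (−π/2 + 2π⌊nx/(2π)⌋)/n → x satisfies F_n(x_n) = −1; note that F_n has no pointwise limit at all.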
In the sequel we will need the following two easy criteria to guarantee Γ−convergence.
Proposition 2.4. If each Fn is l.s.c. and Fn → F uniformly, then Fn Γ−converges to F .
If each Fn is l.s.c., Fn ≤ Fn+1 and F (x) = limn Fn (x) for all x, then Fn Γ−converges to F .
We will essentially apply the notion of Γ-convergence in the space X = P(Ω) endowed with
the weak convergence³ (which is indeed metrizable on this bounded subset of the Banach space of
measures), since the space P₂(Ω) endowed with the W₂ convergence lacks compactness whenever
Ω is not compact.
We conclude this section with the following simple lemma concerning properties of the functional

M(Ω) ∋ % ↦ H(%) = ∫_Ω h(%(x)) dx   if % is absolutely continuous (% = % dx),   H(%) = +∞ otherwise.
Lemma 2.5. Let Ω be an open set and h : R → R ∪ {+∞} be convex and superlinear at +∞;
then the functional H : M(Ω) → R ∪ {+∞} is convex and lower semicontinuous with respect to
the weak convergence of measures. Moreover, if h ∈ C¹, then we have

lim_{ε→0} ( H(% + εχ) − H(%) ) / ε = ∫ h′(%) dχ

whenever χ = χ dx is absolutely continuous, H(%) < +∞ and H(% + εχ) < +∞ at least for small ε. As a consequence,
h′(%) is the first variation of H and we have

∂H(%) = {h′(%)}.
For this classical fact, and in particular for the semicontinuity, we refer to [4] and [3].
3. The “mother” inequality
In this section we establish the key inequality needed in the proof of Theorems 1.1 and 1.2.
³We say that a family of probability measures µn weakly converges to a probability measure µ in Ω if

∫ ϕ dµn → ∫ ϕ dµ   for all ϕ ∈ C_b(Ω),

where C_b(Ω) is the space of continuous and bounded functions on Ω.
Lemma 3.1. Suppose that %, g ∈ L¹₊ are smooth probability densities, which are away from 0
and infinity, and let H ∈ C²(R^d) be a convex function. Then we have the following inequality

(3.1)   ∫_Ω [ % ∇·(∇H(∇ϕ)) − g ∇·(∇H(−∇ψ)) ] dx ≤ 0,

where ϕ and ψ are the corresponding Kantorovich potentials.
Proof. We first note that, since % and g are smooth and away from zero and infinity in Ω, Theorem
2.2 implies that ϕ and ψ are smooth as well. Now, using the identity S(T(x)) ≡ x and the fact
that S# g = %, we get

∫_Ω %(x) ∇·(∇H(∇ϕ(x))) dx = ∫_Ω g(x) [∇·(∇H(∇ϕ))](S(x)) dx
   = ∫_Ω g(x) [∇·(∇H(∇ϕ) ◦ S)](x) dx + ∫_Ω g(x) { [∇·(∇H(∇ϕ))](S(x)) − [∇·(∇H(∇ϕ) ◦ S)](x) } dx,

and, by the equality

−∇ψ(x) = S(x) − x = S(x) − T(S(x)) = ∇ϕ(S(x)),

we obtain

(3.2)   ∫_Ω [ % ∇·(∇H(∇ϕ)) − g ∇·(∇H(−∇ψ)) ] dx
   = ∫_Ω g(x) { [∇·(∇H(∇ϕ))](S(x)) − [∇·(∇H(∇ϕ) ◦ S)](x) } dx
   = ∫_Ω %(x) { ∇·(∇H(∇ϕ)) − [∇·(∇H(∇ϕ) ◦ S)] ◦ T } dx.

For simplicity we set

(3.3)   E := ∇·(∇H(∇ϕ)) − [∇·(∇H(∇ϕ) ◦ S)] ◦ T = ∇·ξ − [∇·(ξ ◦ S)] ◦ T,

where by ξ we denote the continuously differentiable function

ξ(x) = (ξ¹, …, ξ^d) := ∇H(∇ϕ(x)),

whose derivative is given by

Dξ = D(∇H(∇ϕ)) = D²H(∇ϕ) · D²ϕ.

We now calculate

(3.4)   [∇·(ξ ◦ S)] ◦ T = Σ_{i=1}^d ∂(ξⁱ ◦ S)/∂x_i ◦ T = Σ_{i,j=1}^d [ ∂ξⁱ/∂x_j (S(T(x))) · ∂Sʲ/∂x_i (T(x)) ]
   = tr( Dξ · (DT)⁻¹ ) = tr( D²H(∇ϕ) · D²ϕ · (Id − D²ϕ)⁻¹ ),

where the last two equalities follow from DS ◦ T = (DT)⁻¹ and from (DT)⁻¹ = (Id − D²ϕ)⁻¹,
Id being the d-dimensional identity matrix.
By (3.3) and (3.4) we have

E = tr( D²H(∇ϕ) · D²ϕ · [Id − (Id − D²ϕ)⁻¹] ) = −tr( D²H(∇ϕ) · (D²ϕ)² · (Id − D²ϕ)⁻¹ ).

Since we have that

Id − D²ϕ ≥ 0,

the matrix (D²ϕ)² · (Id − D²ϕ)⁻¹ is symmetric and positive semi-definite, and since the trace of
the product of two positive semi-definite matrices is non-negative, we obtain E ≤ 0, which
together with (3.2) concludes the proof. □
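The last step relies on a standard linear-algebra fact, which we recall for completeness (not from the paper):

```latex
A, B \succeq 0 \;\Longrightarrow\;
\operatorname{tr}(AB) = \operatorname{tr}\!\big(A^{1/2} B A^{1/2}\big) \;\ge\; 0,
```

since A^{1/2} B A^{1/2} is symmetric positive semi-definite, so its trace (the sum of its non-negative eigenvalues) is non-negative. Here it is applied with A = D²H(∇ϕ) and B the positive semi-definite matrix built from D²ϕ and (Id − D²ϕ)⁻¹.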
Lemma 3.2. Let Ω ⊂ R^d be bounded and convex, let %, g ∈ W^{1,1}(Ω) be two probability densities
and let H ∈ C²(R^d) be a radially symmetric convex function. Then the following inequality holds

(3.5)   ∫_Ω [ ∇% · ∇H(∇ϕ) + ∇g · ∇H(∇ψ) ] dx ≥ 0,

where ϕ and ψ are the corresponding Kantorovich potentials.
Proof. Let us start by noticing that, due to the radial symmetry of H,

(3.6)   ∇H(∇ψ) = −∇H(−∇ψ).
Step 1. Proof in the smooth case. Suppose that the probability densities % and g are smooth
and bounded away from zero and infinity. As in Lemma 3.1, we note that under these assumptions
on % and g the Kantorovich potentials are smooth; hence, after an integration by parts, the left-hand
side of (3.5) becomes

∫_Ω [ ∇% · ∇H(∇ϕ) + ∇g · ∇H(∇ψ) ] dx
   = ∫_{∂Ω} [ % ∇H(∇ϕ) · n + g ∇H(∇ψ) · n ] dH^{d−1} − ∫_Ω [ % ∇·(∇H(∇ϕ)) + g ∇·(∇H(∇ψ)) ] dx
   ≥ ∫_{∂Ω} [ % ∇H(∇ϕ) + g ∇H(∇ψ) ] · n dH^{d−1},

where we used Lemma 3.1 and (3.6). Moreover, the radial symmetry of H gives ∇H(z) =
c(z)z for some c(z) > 0. Since the gradients of the Kantorovich potentials ∇ϕ and ∇ψ, computed
at boundary points, point outward from Ω (since T(x) = x − ∇ϕ(x) ∈ Ω and S(x) = x −
∇ψ(x) ∈ Ω), we have

∇H(∇ϕ(x)) · n(x) ≥ 0   and   ∇H(∇ψ(x)) · n(x) ≥ 0   for all x ∈ ∂Ω,

which concludes the proof of (3.5) when % and g are smooth.
Step 2. Proof for generic %, g ∈ W^{1,1}(Ω). We first note that for every ε > 0 there are smooth
non-negative functions %ε ∈ C¹(Ω) and gε ∈ C¹(Ω) such that

%ε → %   and   gε → g   in W^{1,1}(Ω) as ε → 0.

Moreover, by adding a positive constant and then multiplying by another one, we may assume
that %ε and gε are probability densities away from zero:

%ε ≥ ε²,   gε ≥ ε²   and   ∫_Ω %ε dx = ∫_Ω gε dx = 1.

Let ϕε ∈ C^{2,β}(Ω) and ψε ∈ C^{2,β}(Ω) be the Kantorovich potentials corresponding to the optimal
transport maps between %ε and gε. By Step 1 we have

(3.7)   ∫_Ω [ ∇%ε · ∇H(∇ϕε) + ∇gε · ∇H(∇ψε) ] dx ≥ 0.

On the other hand, by Theorem 2.3 and by the fact that Ω is bounded, we have

|∇ϕε|, |∇ψε| ≤ C,   ∇ϕε → ∇ϕ a.e.   and   ∇ψε → ∇ψ a.e.   as ε → 0,

and so, passing to the limit as ε → 0 in (3.7) (by dominated convergence, since ∇H is locally
bounded and we can suppose that the convergences %ε → % and gε → g hold a.e. and are
dominated), we obtain (3.5), which concludes the proof. □
Remark 3.1. In Lemma 3.2 we can drop the convexity assumption on Ω if % and g have compact
support: indeed, it is enough to choose a ball Ω′ ⊃ Ω containing the supports of % and g.
Remark 3.2. Lemma 3.2 also remains true in the case of compactly supported densities g
and %, even if we drop the assumption that H is radial, i.e. that H(z) = H(|z|). In this case the
inequality becomes

∫_{R^d} [ ∇% · ∇H(∇ϕ) − ∇g · ∇H(−∇ψ) ] dx ≥ 0.
Proof. The proof follows the same scheme as that of Lemma 3.2: first the smooth case, then
approximation. We select a convex domain Ω large enough to contain the supports of %
and g in its interior; all the integrations and integrations by parts are performed on Ω. The only
difficulty is that we cannot guarantee the boundary term to be positive. Yet, we first take %, g
to be smooth and we approximate them by taking %ε := ε/|Ω| + (1 − ε)% and gε := ε/|Ω| + (1 − ε)g.
For these densities and their corresponding potentials ϕε, ψε, we obtain the inequality

∫_Ω [ ∇%ε · ∇H(∇ϕε) + ∇gε · ∇H(∇ψε) ] dx ≥ ∫_{∂Ω} [ %ε ∇H(∇ϕε) + gε ∇H(∇ψε) ] · n dH^{d−1}.

We can pass to the limit (by dominated convergence, as before) in this inequality, and notice
that the right-hand side tends to 0, since |∇H(∇ϕε)|, |∇H(∇ψε)| ≤ C and %ε = gε = ε/|Ω| on ∂Ω. Once
the inequality is proven for smooth %, g, a new approximation gives the desired result. □
By approximating H(z) = |z| with H(z) = √(ε² + |z|²), Lemma 3.2 has the following important
corollary, where we use the convention z/|z| = 0 for z = 0.

Corollary 3.3. Let Ω ⊂ R^d be a given bounded convex set and %, g ∈ W^{1,1}(Ω) be two probability
densities. Then the following inequality holds

(3.8)   ∫_Ω [ ∇% · ∇ϕ/|∇ϕ| + ∇g · ∇ψ/|∇ψ| ] dx ≥ 0,

where ϕ and ψ are the corresponding Kantorovich potentials.
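To see how the approximation behind Corollary 3.3 works, one can record the routine computation (spelled out here for convenience):

```latex
H_\varepsilon(z) := \sqrt{\varepsilon^2 + |z|^2},
\qquad
\nabla H_\varepsilon(z) = \frac{z}{\sqrt{\varepsilon^2 + |z|^2}},
\qquad
|\nabla H_\varepsilon(z)| \le 1,
```

so H_ε is smooth, radially symmetric and convex; applying (3.5) with H = H_ε and letting ε → 0, dominated convergence gives (3.8). The convention z/|z| = 0 for z = 0 is consistent, since ∇H_ε(0) = 0 for every ε.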
4. BV estimates for minimizers
In this section we prove Theorem 1.1. Since we will need to perform several approximation
arguments, and we want to use Γ−convergence, we need to provide uniqueness of the minimizers.
Lemma 4.1. Let g ∈ P(Ω) ∩ L¹₊(Ω); then the functional µ ↦ W₂²(µ, g) is strictly convex on
P₂(Ω).
Proof. Suppose by contradiction that µ₀ ≠ µ₁ and t ∈ ]0, 1[ are such that

W₂²(µt, g) = (1 − t)W₂²(µ₀, g) + tW₂²(µ₁, g),

where µt = (1 − t)µ₀ + tµ₁. Let γ₀ be the optimal transport plan in the transport from µ₀ to g
(pay attention to the direction: it is induced by a transport map if we see it backward, from g to µ₀). As the
starting measure g is absolutely continuous, by Brenier's Theorem γ₀ is of the form (T₀, id)# g.
Analogously, take γ₁ = (T₁, id)# g optimal from µ₁ to g. Set γt := (1 − t)γ₀ + tγ₁ ∈ Π(µt, g). We
have

(1 − t)W₂²(µ₀, g) + tW₂²(µ₁, g) = W₂²(µt, g) ≤ ∫ |x − y|² dγt = (1 − t) ∫ |x − y|² dγ₀ + t ∫ |x − y|² dγ₁
   = (1 − t)W₂²(µ₀, g) + tW₂²(µ₁, g),

which implies that γt is actually optimal in the transport from g to µt. Yet γt is not induced
by a transport map, unless T₀ = T₁ a.e. on {g > 0}. This is a contradiction with µ₀ ≠ µ₁ and
proves strict convexity. □
Let us denote by C the class of convex l.s.c. functions h : R⁺ → R ∪ {+∞}, finite in a
neighborhood of 0, with finite right derivative h′(0) at 0, and superlinear at +∞.
Lemma 4.2. If h ∈ C, there exists a sequence of C² convex functions hn, superlinear at ∞, with
h″n > 0, hn ≤ hn+1 and h(x) = limn hn(x) for every x ∈ R⁺.
Moreover, if h : R⁺ → R ∪ {+∞} is a convex l.s.c. superlinear function, there exists a
sequence of functions hn ∈ C with hn ≤ hn+1 and h(x) = limn hn(x) for every x ∈ R⁺.
Proof. Let us start from the case h ∈ C. Set ℓ⁺ := sup{x : h(x) < +∞} ∈ R⁺ ∪ {+∞}. Let us
define an increasing function ξn : R → R in the following way:

ξn(x) := h′(0)                        for x ∈ ]−∞, 0],
ξn(x) := h′(x)                        for x ∈ [0, ℓ⁺ − 1/n],
ξn(x) := h′(ℓ⁺ − 1/n)                 for ℓ⁺ − 1/n ≤ x < ℓ⁺,
ξn(x) := h′(ℓ⁺ − 1/n) + n(x − ℓ⁺)     for x ≥ ℓ⁺,

where, if the derivative of h does not exist somewhere, we just replace it with the right derivative.
(Notice that when ℓ⁺ = +∞, the last two cases do not apply.)
Let q ≥ 0 be a C¹ function with spt(q) ⊂ [−1, 0] and ∫ q(t) dt = 1, and let us set qn(t) = nq(nt).
We define hn as the primitive of the C¹ function

h′n(x) := ∫ ( ξn(t) − (1/n) e^{−t} ) qn(t − x) dt,

with hn(0) = h(0). It is easy to check that all the required properties are satisfied: we have
h″n(x) ≥ (1/n) e^{−x}, hn is superlinear because lim_{x→∞} ξn(x) = +∞, and we have increasing
convergence hn → h.
For the case of a generic function h, it is possible to approximate it with functions in C if we
define ℓ⁻ := inf{x : h(x) < +∞} ∈ R⁺ and take

hn(x) = h(ℓ⁻ + 1/n) + h′(ℓ⁻ + 1/n)(x − ℓ⁻ − 1/n) + n|x − ℓ⁻|   for x ≤ ℓ⁻,
hn(x) = h(ℓ⁻ + 1/n) + h′(ℓ⁻ + 1/n)(x − ℓ⁻ − 1/n)               for x ∈ ]ℓ⁻, ℓ⁻ + 1/n],
hn(x) = h(x)                                                    for x ≥ ℓ⁻ + 1/n.

In this case as well, it is easy to check that all the required properties are satisfied. □
Proof of Theorem 1.1. Let us start from the case where g is W^{1,1}, h is C² and superlinear with h″ > 0,
and Ω is a bounded convex set. A minimizer %̄ exists (by the semicontinuity of the functional and the
compactness of P₂(Ω)). Thanks to Theorem 2.1 (vi) and Lemma 2.5, the optimality conditions
read as follows: there exists a Kantorovich potential ϕ for the transport from %̄ to g such
that 0 = ϕ + h′(%̄). This shows that h′(%̄) is Lipschitz continuous; hence %̄ is bounded. On
bounded sets h′ is a diffeomorphism with Lipschitz inverse, thanks to h″ > 0, which proves that
%̄ itself is Lipschitz. Then we can apply Corollary 3.3 and get
∫_Ω [ ∇%̄ · ∇ϕ/|∇ϕ| + ∇g · ∇ψ/|∇ψ| ] dx ≥ 0.
Yet, from ϕ = −h′(%̄) and h″ > 0, we get that ∇ϕ and ∇%̄ are vectors with opposite directions,
so that ∇%̄ · ∇ϕ/|∇ϕ| = −|∇%̄|. Hence we have

∫_Ω |∇%̄| dx ≤ ∫_Ω ∇g · ∇ψ/|∇ψ| dx ≤ ∫_Ω |∇g| dx,

which is the desired estimate.
We can generalize to h ∈ C by using the previous lemma and approximating h with a sequence
hn. Thanks to monotone convergence, we have Γ-convergence for the minimization problems that
we consider. We also have compactness, since P₂(Ω) is compact, and uniqueness of the minimizer.
Hence the minimizers %̄n corresponding to hn satisfy ∫_Ω |∇%̄n| ≤ ∫_Ω |∇g| and converge to the
minimizer %̄ corresponding to h. By the semicontinuity of the total variation we conclude the
proof in this case.
Similarly, we can generalize to other convex functions h, approximating them with functions
in C (notice that this is only interesting if the function h allows the existence of at least one
probability density with finite cost, i.e. if h(1/|Ω|) < +∞). Also, we can take g ∈ BV and
approximate it with W^{1,1} functions. If the approximation is done, for instance, by convolution,
then we have a sequence with W₂(gn, g) → 0, which guarantees uniform convergence of the
functionals, and hence Γ-convergence.
We can also handle the case Ω = R^d, by first taking g compactly supported and h ∈ C. In
this case the same arguments as above hold, since the optimality condition 0 = ϕ + h′(%̄) imposes
that %̄ is compactly supported. Indeed, on {%̄ > 0} we have ϕ = ψᶜ, where ψ is the Kantorovich
potential defined on spt(g), which is bounded. Hence ϕ grows quadratically at infinity, from
ϕ(x) = inf_{y∈spt(g)} ½ |x − y|² − ψ(y), while h′ is bounded from below. As a consequence, it is not
possible to have points with %̄ > 0 too far away. Once we know that the densities are compactly
supported, the same arguments as above apply. Then one passes to the limit, obtaining the
result for a generic convex function h, and then we can also approximate g (as above, we
select a sequence gn of compactly supported densities converging to g in W₂). Notice that in
this case the convergence is no longer uniform on P₂(Ω), but it is uniform on a bounded set
{W₂(%, g) ≤ C}, which is the only relevant one in the minimization. □
5. Projected measures under density constraints
5.1. Existence, uniqueness, characterization, stability of the projected measure. In
this section we will take Ω ⊂ R^d to be a given closed set with negligible boundary, f : Ω → [0, +∞[
a measurable function in L¹_loc(Ω) with ∫_Ω f dx > 1, and µ ∈ P(Ω) a given probability measure on
Ω. We will consider the following projection problem

(5.1)   min_{% ∈ K_f} W₂²(%, µ),

where we set K_f = {% ∈ L¹₊(Ω) : ∫_Ω % dx = 1, % ≤ f}.
This section is devoted to the study of the above projection problem. We first want to
summarize the main known results. Most of these results are only available in the case f = 1.
Existence. The existence of a solution to Problem (5.1) is a consequence of the direct
method of calculus of variations. Indeed, take a minimizing sequence %n ; it is tight thanks to
the bound W2 (%n , µ) ≤ C; it admits a weakly converging subsequence and the limit minimizes
the functional W2 (·, µ) because of its semicontinuity and of the fact that the inequality % ≤ f is
preserved. We note that from the existence point of view, the case f ≡ 1 and the general case
do not show any significant difference.
Characterization. The optimality conditions, derived in [22] by exploiting the strategy developed in [16] (in the case f = 1, but easy to adapt to the general case), state the following:
if % is a solution to the above problem and ϕ is a Kantorovich potential in the transport from %
to µ, then there exists a threshold ℓ ∈ R such that

%(x) = f(x)          if ϕ(x) < ℓ,
%(x) = 0             if ϕ(x) > ℓ,
%(x) ∈ [0, f(x)]     if ϕ(x) = ℓ.
In particular, this shows that ∇ϕ = 0 %-a.e. on {% < f} and, since T(x) = x − ∇ϕ(x), that the
optimal transport T from % to µ is the identity on this set. If µ = g dx is absolutely continuous,
then one can write the Monge-Ampère equation

det(DT(x)) = %(x)/g(T(x))

and deduce %(x) = g(T(x)) = g(x) a.e. on {% < f}. This suggests a sort of saturation result for
the optimal %, i.e. %(x) is either equal to g(x) or to f(x) (but one has to pay attention to the
case % = 0, and also to assume that g is absolutely continuous).
Uniqueness. For absolutely continuous measures µ = g dx and generic f the uniqueness of
the projection follows by Lemma 4.1. In the specific case f = 1 and Ω convex the uniqueness
was proved in [16, 22] by a completely different method. In this case, as observed by A. Figalli,
one can use displacement convexity along generalized geodesics. This means that if %0 and %1
are two solutions, one can take for every t ∈ [0, 1] the convex combination T t = (1 − t)T 0 + tT 1
of the optimal transport maps T i from g to %i and the curve t 7→ %t := ((1 − t)T 0 + tT 1 )# µ in
P2 , interpolating from %0 to %1 . It can be proven that %t still satisfies %t ≤ 1 (but this cannot
be adapted to a general f , unless f is concave) and that W22 (%t , g) < (1 − t)W22 (%0 , g) + tW22 (%1 , g) for t ∈ (0, 1),
which contradicts minimality. The assumption on µ can be relaxed, but we need
to ensure the existence of optimal transport maps: what we need to assume is that µ gives
no mass to “small” sets (i.e. (d − 1)−dimensional); see [14] for the sharp assumptions and
notions about this issue. Thanks to this uniqueness result, we can define a projection operator
PK1 : P2 (Ω) ∩ L1 (Ω) → P2 (Ω) ∩ L1 (Ω) through
PK1 [g] := argmin{W22 (%, g) : % ∈ K1 }.
Stability. From the same displacement interpolation idea, A. Roudneff-Chupin also proved
([22]) that the projection is Hölder continuous with exponent 1/2 for the W2 distance whenever
Ω is a compact convex set. We do not develop the proof here; we refer to Proposition 2.3.4
of [22]. Notice that the constant in the Hölder continuity depends a priori on the diameter of
Ω. However, to be more precise, the following estimate is obtained (for g 0 and g 1 absolutely
continuous)
(5.2)
W22 (PK1 [g 0 ], PK1 [g 1 ]) ≤ W22 (g 0 , g 1 ) + W2 (g 0 , g 1 )(dist(g 0 , K1 ) + dist(g 1 , K1 )),
which shows that, even on unbounded domains, we have a local Hölder behavior.
In the rest of the section, we want to recover similar results in the largest possible generality,
i.e. for general f , and without the assumptions on µ and Ω.
We will first get a saturation characterization for the projections, which will allow for a general
uniqueness result. Continuity will be an easy corollary.
In order to proceed, we first need the following lemma.
Lemma 5.1. Let % be a solution of Problem (5.1) and let γ ∈ Π(%, µ) be an optimal plan
from % to µ. If (x0 , y0 ) ∈ spt(γ) then % = f a.e. in B(y0 , R), where R = |y0 − x0 |.
Proof. Suppose by contradiction that the conclusion fails; then there exists a compact set K ⊂ B(y0 , R)
with positive Lebesgue measure such that % < f a.e. in K. Let ε := dist(∂B(y0 , R), K) > 0.
By the definition of the support, for all r > 0 we have

0 < γ(B(x0 , r) × B(y0 , r)) ≤ ∫B(x0 ,r) % dx ≤ ∫B(x0 ,r) f dx.
By the absolute continuity of the integral, for r > 0 small enough there exists 0 < α ≤ 1 such
that

γ(B(x0 , r) × B(y0 , r)) = α ∫K (f − %) dx =: αm.
Now we construct the measures γ̃, η ∈ P(Ω × Ω) as

γ̃ := γ − γ⌞(B(x0 , r) × B(y0 , r)) + η,
where η := (αm)−1 (α(f − %)dx⌞K) ⊗ ((π y )# γ⌞(B(x0 , r) × B(y0 , r)))

(the normalization by the common mass αm makes the two marginals of η equal to the two factors).
It is immediate to check that (π y )# γ̃ = µ. On the other hand

%̃ := (π x )# γ̃ = % − (π x )# γ⌞(B(x0 , r) × B(y0 , r)) + α(f − %)⌞K ≤ f
is an admissible competitor in Problem (5.1) and we have

W22 (%̃, µ) ≤ ∫Ω×Ω |x − y|2 dγ̃(x, y)
  ≤ W22 (%, µ) − ∫B(x0 ,r)×B(y0 ,r) |x − y|2 dγ(x, y) + ∫K×B(y0 ,r) |x − y|2 dη(x, y)
  ≤ W22 (%, µ) − (R − 2r)2 αm + (R − ε + r)2 αm,

where the two distance bounds follow from the triangle inequality, since |x0 − y0 | = R and
dist(K, ∂B(y0 , R)) = ε. Now, if we choose r > 0 small enough to have R − 2r > R − ε + r, i.e.
r < ε/3, we get

W22 (%̃, µ) < W22 (%, µ),
which is clearly a contradiction, hence the result follows.
The following proposition establishes uniqueness of the projection on Kf as well as a very
precise description of it. For a given measure µ we are going to denote by µac the density of its
absolutely continuous part with respect to the Lebesgue measure, i.e.
µ = µac dx + µs ,
with µs ⊥ dx.
Proposition 5.2. Let Ω ⊂ Rd be a convex set and let f ∈ L1loc (Ω), f ≥ 0, be such that ∫Ω f ≥ 1.
Then, for every probability measure µ ∈ P(Ω), there is a unique solution % of problem (5.1).
Moreover, % is of the form

(5.3)    % = µac 1B + f 1B c ,

for a measurable set B ⊂ Ω.
Proof. We first note that by setting f = 0 on Ωc we can assume that Ω = Rd . Existence of a
solution of Problem (5.1) follows by the direct method in the calculus of variations, by noticing
that the set Kf is closed with respect to the weak convergence of measures.
Let us prove now the saturation result (5.3). Let us first premise the following fact: if
µ, ν ∈ P(Ω), γ ∈ Π(µ, ν) and we define the set
A(γ) := {x ∈ Ω : the only point (x, y) ∈ spt(γ) is (x, x)},
then

(5.4)    µ⌞A(γ) ≤ ν⌞A(γ).

In particular µac ≤ ν ac for a.e. x ∈ A(γ). To prove (5.4), let φ ≥ 0 and write

∫A(γ) φ dµ = ∫ φ(x)1A(γ) (x) dγ(x, y) = ∫ φ(x)1A(γ) (x)2 dγ(x, y)
  = ∫ φ(y)1A(γ) (y)1A(γ) (x) dγ(x, y)
  ≤ ∫ φ(y)1A(γ) (y) dγ(x, y) = ∫A(γ) φ dν,
where we used the fact that γ−a.e. 1A(γ) (x) > 0 implies x = y.
Now, for an optimal transport plan γ ∈ Π(%, µ), let us define
B := Leb(f ) ∩ Leb(µac ) ∩ Leb(%) ∩ {% < f }(1) ∩ {% ≠ µac }(1) ∩ A(γ)(1) ∩ A(γ̃)(1) .
Here γ̃ ∈ Π(µ, %) is the transport plan obtained by seeing γ “the other way around”, i.e. γ̃ is
the image of γ through the maps (x, y) 7→ (y, x) while Leb(h) is the set of Lebesgue points of h
and for a set A we denote by A(1) := Leb(1A ) the set of its density one points.
Let now x0 ∈ B and let us consider the following two cases:
Case 1. %(x0 ) < µac (x0 ). Since, in particular, µac (x0 ) > 0 and x0 ∈ Leb(µac ), we have that
x0 ∈ spt(µ). From Lemma 5.1 we see that (y0 , x0 ) ∈ spt(γ) implies y0 = x0 . Indeed, if this were
not the case there would exist a ball centered at x0 where % = f a.e.;
from x0 ∈ Leb(f ) ∩ Leb(%) we would get %(x0 ) = f (x0 ), a contradiction with x0 ∈ B. Hence, if
we use the set A(γ̃) defined above with ν = %, we have x0 ∈ A(γ̃). From x0 ∈ Leb(µac ) ∩ Leb(%)
we get µac (x0 ) ≤ %(x0 ), which is a contradiction.
Case 2. µac (x0 ) < %(x0 ). Exactly as in the previous case we have that x0 ∈ spt(%) and, by
Lemma 5.1, we have again that (x0 , y0 ) ∈ spt(γ) implies y0 = x0 . Indeed, otherwise x0 would
be on the boundary of a ball where % = f , a contradiction with x0 ∈ {% < f }(1) . Hence, we get
x0 ∈ A(γ) and %(x0 ) ≤ µac (x0 ), again a contradiction.
Hence we get that µac = % for x ∈ B. By the definition of B,
B c ⊂a.e. {% = f } ∪ A(γ)c ∪ A(γ̃)c ,
where a.e. refers to the Lebesgue measure. By applying Lemma 5.1, this implies that % = f a.e.
on B c , and concludes the proof of (5.3).
Uniqueness of the projection is now an immediate consequence of the saturation property
(5.3). Indeed, suppose that %0 and %1 were two different projections of the same measure µ. Define
%1/2 = 21 %0 + 12 %1 . Then, by convexity of W22 (·, µ), we get that %1/2 is also optimal. But its
density is not saturated on the set where the densities of %0 and %1 differ, in contradiction with
(5.3).
Corollary 5.3. For fixed f , the map PKf : P2 (Ω) → P2 (Ω) defined through
PKf [µ] := argmin{W22 (%, µ) : % ∈ Kf }
is continuous in the following sense: if µn → µ for the W2 distance, then PKf [µn ] * PKf [µ] in
the weak convergence.
Moreover, in the case where f = 1 and Ω is a convex set, the projection is also locally
1/2−Hölder continuous for W2 on the whole P(Ω) and satisfies (5.2).
Proof. This is just a matter of compactness and uniqueness. Indeed, take a sequence µn → µ
and look at PKf [µn ]. It is a tight sequence of measures since
(5.5)
W2 (PKf [µn ], µ) ≤ W2 (PKf [µn ], µn ) + W2 (µn , µ) ≤ W2 (%, µ) + 2W2 (µn , µ) ,
where % ∈ Kf is any admissible measure. Hence we can extract a weakly converging subsequence
to some measure %̃ ∈ Kf (recall that Kf is weakly closed). Moreover, by the lower semicontinuity
of W2 with respect to the weak convergence and since W2 (µn , µ) → 0, passing to the limit in
(5.5) we get
W2 (%̃, µ) ≤ W2 (%, µ)
∀ % ∈ Kf .
Uniqueness of the projection implies %̃ = PKf (µ) and thus that the limit is independent of the
extracted subsequence; this proves the desired continuity.
Concerning the second part of the statement, we take arbitrary µ0 and µ1 (not necessarily
absolutely continuous) and approximate them in the W2 distance with absolutely continuous
measures gni (i = 0, 1; for instance by convolution); then, from (5.2), we have

W22 (PK1 [gn0 ], PK1 [gn1 ]) ≤ W22 (gn0 , gn1 ) + W2 (gn0 , gn1 )(dist(gn0 , K1 ) + dist(gn1 , K1 )),
and we can pass to the limit as n → ∞.
The following technical lemma will be used in the next section; it establishes the continuity
of the projection with respect to f . To state it, for given f ∈ L1loc and µ ∈ P2 (Ω) let us
consider the following functional

Ff (%) := (1/2) W22 (µ, %) if % ∈ Kf ,    and Ff (%) := +∞ otherwise.
Proposition 5.2 can be restated by saying that the functional Ff has a unique minimizer in
P2 (Ω).
Lemma 5.4. Let fn , f ∈ L1loc (Ω) with ∫Ω fn dx ≥ 1 and ∫Ω f dx ≥ 1, and let us assume that
fn → f in L1loc (Ω) and almost everywhere. Also assume fn ∈ P2 (Ω) if ∫Ω fn dx = 1, and
f ∈ P2 (Ω) if ∫Ω f dx = 1. Then, for every µ ∈ P2 (Ω),
(i) The sequence (PKfn (µ))n is tight.
(ii) We have PKfn (µ) * PKf (µ).
(iii) If ∫Ω f > 1, then Ffn Γ−converges to Ff with respect to the weak convergence of measures.
Proof. Let us denote by %̄n the projection PKfn (µ) and let us start by proving its tightness,
i.e. (i). We fix ε > 0: there exists a radius R0 such that µ(B(0, R0 )) > 1 − ε/2 and
∫B(0,R0 ) f > 1 − ε/2. By L1loc convergence, there exists n0 such that ∫B(0,R0 ) fn > 1 − ε for
n > n0 . Now, take R > 3R0 and suppose %̄n (B(0, R)c ) > ε for some n ≥ n0 . Then the optimal
transport T from %̄n to µ should move some mass from B(0, R)c to B(0, R0 ). Let us take a
point x0 ∈ B(0, R)c such that T (x0 ) ∈ B(0, R0 ). From Lemma 5.1, this means that %̄n = fn
on the ball B(T (x0 ), |x0 − T (x0 )|) ⊃ B(T (x0 ), 2R0 ) ⊃ B(0, R0 ). But then ∫B(0,R0 ) %̄n =
∫B(0,R0 ) fn > 1 − ε, and hence %̄n (B(0, R)c ) ≤ ε, which is a contradiction. This shows that
(%̄n ) is tight.
Now, if ∫Ω f = 1, then the weak limit of %̄n up to subsequences can only be f itself, since it
must be a probability density bounded above by f ; and f = PKf (µ). This proves (ii) in the
case ∫Ω f = 1. In the case ∫Ω f > 1, it will be a consequence of (iii). Notice that in this case
we necessarily have ∫Ω fn > 1 for n large enough.
Let us prove (iii). Since %n ≤ fn , the conditions %n * % and fn → f in L1loc immediately imply
that % ≤ f ; the Γ−liminf inequality then simply follows by the lower semicontinuity of W2 .
Concerning the Γ−limsup, we need to prove that every density % ∈ P2 (Ω) with % ≤ f a.e. can
be approximated by a sequence %n ≤ fn a.e. with W2 (%n , µ) → W2 (%, µ). In order to do this let
us define %̃n := min{%, fn }. Note that %̃n is not admissible since it is not a probability: in general
∫ %̃n < 1. Yet, we have ∫ %̃n → 1, since %̃n → min{%, f } = % and this convergence
is dominated by %. We want to “complete” %̃n so as to get a probability, stay admissible, and
converge to % in W2 , since this will imply that W2 (%n , µ) → W2 (%, µ).
Let us select a ball B such that ∫B∩Ω f > 1 and note that we can find ε > 0 such that the set
{f > % + ε} ∩ B has positive measure, i.e. m := |{f > % + ε} ∩ B| > 0. Since fn → f a.e., the
set Bn := {fn > % + ε/2} ∩ B has measure larger than m/2 for large n. Now take Bn0 ⊂ Bn with
|Bn0 | = (2/ε)(1 − ∫ %̃n ) → 0, and define

%n := %̃n + (ε/2) 1Bn0 .
By construction, ∫ %n = 1 and %n ≤ fn a.e., since on Bn0 we have %̃n = % and % + ε/2 < fn , while
on the complement of Bn0 , %̃n ≤ fn a.e. by definition. To conclude the proof we only need to
check W2 (%n , %) → 0. This is equivalent (see, for instance, [2] or [25]) to

(5.6)    ∫ φ%n → ∫ φ%

for all continuous functions φ such that |φ| ≤ C(1 + |x|2 ). Since % ∈ P2 (Ω) and %̃n ≤ %,
thanks to the dominated convergence theorem it is enough to show that ∫ φ(%n − %̃n ) → 0. But
%n − %̃n converges to 0 in L1 and it is supported in Bn0 ⊂ B. Since φ is bounded on B we obtain
the desired conclusion.
Remark 5.1. Let us conclude this section with the following open question: in a Hilbert space,
the only fact that the projection onto a set K is uniquely determined for every starting point
implies that K is convex and thus that the projection is 1-Lipschitz continuous. Here we are
in a metric space which has a sort of Hilbertian manifold structure (see [2, 14]), and we could
wonder if the same stays true. The set Kf is always convex in the usual sense, but it
is also geodesically convex (which seems to be more pertinent in this setting) when f = 1, and
also convex w.r.t. generalized geodesics.
For f = 1 the projection is continuous and we can even provide Hölder bounds on PK1 . The
question whether PK1 is 1-Lipschitz is, as far as we know, open. Let us underline that 1-Lipschitz
results of a similar sort have been proven in [6] for solutions of related variational problems, but
they seem impossible to adapt to this framework.
For the case f 6= 1 even the continuity of the projection with respect to the Wasserstein
distance seems delicate.
5.2. BV estimates for PKf . In this section, we prove Theorem 1.2. Notice that the case f = 1
has already been proven as a particular case of Theorem 1.1. To handle the general case, we
develop a slightly different strategy, based on the standard idea to approximate L∞ bounds with
Lp penalizations.
Let m ∈ N and let us assume that inf f > 0. For µ ∈ P2 (Ω), we define the approximating
functionals Fm : L1+ (Ω) → R ∪ {+∞} by

Fm (%) := (1/2) W22 (µ, %) + 1/(m + 1) ∫Ω (%/f )^(m+1) dx + (εm /2) ∫Ω (%/f )² dx

and the limit functional F as

F(%) := (1/2) W22 (µ, %) if % ∈ Kf ,    and F(%) := +∞ otherwise.
Here εm ↓ 0 is a small parameter to be chosen later.
Lemma 5.5. Let Ω ⊂ Rd and f : Ω → (0, +∞) be a measurable function, bounded from below
and from above by positive constants and let µ ∈ P2 (Ω). Then:
(i) There are unique minimizers %, %m in L1 (Ω) for each of the functionals F and Fm ,
respectively.
(ii) The family of functionals Fm Γ-converges, for the weak convergence of probability
measures, to F, and the minimizers %m weakly converge to %, as m → ∞.
(iii) The minimizers %m of Fm satisfy

(5.7)    ϕm + (%m /f )^m (1/f ) + εm (%m /f ) (1/f ) = 0,
for a suitable Kantorovich potential ϕm in the transport from %m to µ.
Proof. Existence and uniqueness of minimizers of F has been established in Proposition 5.2.
Existence of minimizers of Fm is again a simple application of the direct methods in the calculus
of variations and uniqueness follows from strict convexity.
Let us prove the Γ−convergence in (ii). In order to prove the Γ−liminf inequality, let %m * %.
If Fm (%m ) ≤ C, then for every m0 ≤ m and every set A ⊂ Ω of finite measure we have

‖%m /f ‖Lm0 (A) ≤ |A|^(1/m0 − 1/(m+1)) (C(m + 1))^(1/(m+1)) .

If we pass to the limit m → ∞, from %m /f * %/f , we get ‖%/f ‖Lm0 (A) ≤ |A|^(1/m0 ) . Letting m0
go to infinity we obtain ‖%/f ‖L∞ ≤ 1, i.e. % ∈ Kf . Since

Fm (%m ) ≥ (1/2) W22 (µ, %m ),

the lower semicontinuity of W22 with respect to weak convergence proves the Γ−liminf inequality.
In order to prove Γ−limsup, we use the constant sequence %m = % as a recovery sequence.
Since we can assume % ≤ f (otherwise there is nothing to prove, since F(%) = +∞), it is clear
that the second and third parts of the functional tend to 0, thus proving the desired inequality.
The last part of the statement finally follows from Theorem 2.1 (vi) and Lemma 2.5.
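For the reader's convenience, (5.7) can also be recovered by a formal first-variation computation (a sketch only: it uses the standard fact that the first variation of % ↦ ½W22 (µ, %) at %m is a Kantorovich potential ϕm from %m to µ):

```latex
\frac{\delta}{\delta\varrho}\,\tfrac12 W_2^2(\mu,\varrho)\Big|_{\varrho_m}=\varphi_m,\qquad
\frac{\delta}{\delta\varrho}\,\frac{1}{m+1}\int_\Omega\Big(\frac{\varrho}{f}\Big)^{m+1}dx\,\Big|_{\varrho_m}
  =\Big(\frac{\varrho_m}{f}\Big)^{m}\frac1f,\qquad
\frac{\delta}{\delta\varrho}\,\frac{\varepsilon_m}{2}\int_\Omega\Big(\frac{\varrho}{f}\Big)^{2}dx\,\Big|_{\varrho_m}
  =\varepsilon_m\,\frac{\varrho_m}{f}\,\frac1f.
```

Summing the three terms and equating to zero on {%m > 0} (with the usual additive normalization of the potential) gives exactly (5.7).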
Proof of Theorem 1.2
Proof. Clearly we can assume that T V (g, Ω) and T V (f, Ω) are finite and that ∫Ω f > 1, since
otherwise the conclusion is trivial.
Step 1. Assume that the support of g is compact, that f ∈ C ∞ (Ω) is bounded from above and
below by positive constants, and let %m be the minimizer of Fm . As in the proof of Theorem
1.1, we can use the optimality condition (5.7) to prove that %m is compactly supported. The
same condition also implies that %m is Lipschitz continuous. Indeed, we can write (5.7) as

ϕm f + H′m (%m /f ) = 0,

where Hm (t) = t^(m+1) /(m + 1) + (εm /2) t². Since Hm is smooth and convex and H″m is bounded
from below by a positive constant, H′m is invertible and

%m = f · (H′m )−1 (−ϕm f ),

where (H′m )−1 is Lipschitz continuous. Since ϕm and f are locally Lipschitz, this gives Lipschitz
continuity for %m on a neighborhood of its support.
Taking the derivative of the optimality condition (5.7) we obtain

∇ϕm + ( m (%m /f )^(m−1) + εm ) (f ∇%m − %m ∇f )/f ³ − ( (%m /f )^m + εm (%m /f ) ) ∇f /f ² = 0.

Rearranging the terms we have

∇ϕm + A∇%m − B∇f = 0,

where by A and B we denote the (positive!) functions

A := ( m (%m /f )^(m−1) + εm ) / f ²    and    B := ( m (%m /f )^(m−1) + εm ) (%m /f ³) + ( (%m /f )^m + εm (%m /f ) ) / f ².
Now we will use the inequality from Corollary 3.3 for %m and g, in the form

∫Ω |∇%m | dx ≤ ∫Ω |∇g| dx + ∫Ω ∇%m · ( ∇%m /|∇%m | + ∇ϕm /|∇ϕm | ) dx.
In order to estimate the second integral on the right-hand side we use the inequality

(5.8)    | a/|a| − b/|b| | ≤ | a/|a| − b/|a| | + | b/|a| − b/|b| | = |a − b|/|a| + ||b| − |a||/|a| ≤ (2/|a|) |a − b|,
for all non-zero a, b ∈ Rd (that we apply to a = A∇%m and b = −∇ϕm ), and we obtain
∫Ω |∇%m | dx ≤ ∫Ω |∇g| dx + ∫Ω |∇%m | · | A∇%m /(A|∇%m |) + ∇ϕm /|∇ϕm | | dx
  ≤ ∫Ω |∇g| dx + 2 ∫Ω (1/A) | A∇%m + ∇ϕm | dx
  ≤ ∫Ω |∇g| dx + 2 ∫Ω (B/A) |∇f | dx.
We must now estimate the ratio B/A. If we denote by λ the ratio %m /f we may write

B/A = λ + λ (εm + λ^(m−1) )/(εm + mλ^(m−1) ) ≤ λ (1 + 1/m) + εm λ/(εm + mλ^(m−1) ).
Now, consider that

max λ∈R+  εm λ/(εm + mλ^(m−1) ) = (m − 2)/(m − 1) · ( εm /(m(m − 2)) )^(1/(m−1)) =: δm

is a quantity depending on m and tending to 0 if εm is chosen small enough (for instance
εm = 2^(−m²) ). This allows us to write
∫Ω |∇%m | dx ≤ ∫Ω |∇g| dx + 2 (1 + 1/m) ∫Ω (%m /f ) |∇f | dx + 2δm ∫Ω |∇f | dx.
In the limit, as m → +∞, we obtain

∫Ω |∇%| dx ≤ ∫Ω |∇g| dx + 2 ∫Ω (%/f ) |∇f | dx.

Using the fact that % ≤ f , we get

∫Ω |∇%| dx ≤ ∫Ω |∇g| dx + 2 ∫Ω |∇f | dx.
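For completeness, the value of δm used above comes from an elementary maximization (a routine computation we spell out; gm is our own auxiliary notation):

```latex
g_m(\lambda):=\frac{\varepsilon_m\lambda}{\varepsilon_m+m\lambda^{m-1}},\qquad
g_m'(\lambda)=0
\;\Longleftrightarrow\; \varepsilon_m+m\lambda^{m-1}=m(m-1)\lambda^{m-1}
\;\Longleftrightarrow\; \varepsilon_m=m(m-2)\lambda^{m-1},
```

so the maximum over λ ∈ R+ is attained at λ∗ = (εm /(m(m − 2)))^(1/(m−1)) ; there εm + mλ∗^(m−1) = εm (m − 1)/(m − 2), whence gm (λ∗ ) = λ∗ (m − 2)/(m − 1) = δm , which indeed tends to 0 as εm → 0.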
Step 2. To treat the case g, f ∈ BVloc (Ω) we proceed by approximation, as in the proof of
Theorem 1.1. To do this we just note that Corollary 5.3 and Lemma 5.4 give the desired
continuity property of the projection with respect to both g and f ; the lower semicontinuity of
the total variation with respect to the weak convergence then implies the conclusion.
Remark 5.2. We conclude this section by noticing that the constant 2 in Equation (1.3) cannot
be replaced by any smaller constant. Indeed, if Ω = R, f = 1R+ and g = (1/n) 1[−n,0] , then
% = PKf (g) = 1[0,1] and ∫ |∇%| = 2, ∫ |∇f | = 1, ∫ |∇g| = 2/n.
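This one-dimensional example is easy to explore numerically, using the quantile (inverse-CDF) formula for W2 on the real line. The sketch below (our own illustration, not a proof: the grid and the restriction to the translates 1[t,t+1] are ad hoc choices) is consistent with PKf (g) = 1[0,1] and with the sharpness of the constant 2:

```python
import numpy as np

# Numerical illustration of Remark 5.2: Omega = R, f = 1_{R+}, g = (1/n) 1_{[-n,0]}.
# On the real line, W_2^2(mu, nu) = int_0^1 |Q_mu(p) - Q_nu(p)|^2 dp, where Q
# denotes the quantile (inverse-CDF) function.
n = 50.0
p = (np.arange(10_000) + 0.5) / 10_000        # midpoint rule on (0, 1)
Q_g = -n + n * p                              # quantile function of g

def W22_translate(t):
    """W_2^2 between the admissible density 1_{[t, t+1]} (t >= 0) and g."""
    return float(np.mean((t + p - Q_g) ** 2))

# among the translates 1_{[t, t+1]}, the cost is minimized at t = 0,
# consistently with the claim PK_f(g) = 1_{[0, 1]}
ts = np.linspace(0.0, 3.0, 61)
best_t = float(ts[np.argmin([W22_translate(t) for t in ts])])

# total variations: TV(rho) = 2 (two unit jumps), TV(f) = 1, TV(g) = 2/n,
# so the bound TV(rho) <= TV(g) + 2 TV(f) becomes an equality as n -> infinity
tv_rho, tv_f, tv_g = 2.0, 1.0, 2.0 / n
```
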
6. Applications
In this section we discuss some applications of Theorems 1.1 and 1.2 and we present some
open problems.
6.1. Partial transport. The projection problem on Kf is a particular case of the so called
partial transport problem, see [12, 13]. Indeed, the problem is to transport µ to a part of the
measure f , which is a measure with mass larger than 1. As is typical in the partial transport
problem, the solution has an active region, which is given by f restricted to a certain set. This
set satisfies a sort of interior ball condition, with a radius depending on the distance between
each point and its image. In the partial transport case some regularity (C 1,α ) is known for the
optimal map away from the intersection of the supports of the two measures.
A natural question is how to apply the technique that we developed here in the framework
of more general partial transport problems (in general, both measures could have mass larger
than 1 and could be transported only partially), and/or whether results or ideas from partial
transport could be translated into the regularity of the free boundary in the projection.
6.2. Shape optimization. If we take a set A ⊂ Rd with |A| < 1 and finite second moment
∫A |x|2 dx < +∞, a natural question is which set B of volume 1 is such that the uniform
probability density on B is closest to that on A. This means solving a shape optimization
problem of the form

min{ W22 (1B , (1/|A|) 1A ) : |B| = 1 }.

The considerations in Section 5.1 show that solving such a problem is equivalent to solving
the projection problem

min{ W22 (%, (1/|A|) 1A ) : % ∈ K1 ∩ P2 (Rd ) }

and that the optimal % is of the form % = 1B , B ⊃ A. Also, from our Theorem 1.2 (with f = 1),
we deduce that if A is of finite perimeter, then the same is true for B, and Per(B) ≤ (1/|A|) Per(A)
(i.e. the perimeter of B is bounded by the Cheeger ratio of A).
It is interesting to compare this problem and this perimeter bound with the problem studied
in [19], which involves the same ingredients but in a different order: here we minimize the
Wasserstein distance and try to get information on the perimeter, while in [19] the functional
to be minimized is a combination of W2 and the perimeter. Hence, the techniques used to prove
any kind of result are different, because here W2 cannot be considered as a lower-order
perturbation of the perimeter.
As a consequence, many natural questions arise: if A is a nice closed set, can we say that B
contains A in its interior? If A is convex, is B convex? What about the regularity of ∂B?
6.3. Set evolution problems. Consider the following problem. For a given set A ⊂ Rd we
define %0 = 1A . For a time interval [0, T ] and a time step τ > 0 (and N + 1 := T /τ ) we consider
the following scheme: %τ0 := %0 and

(6.1)    %τk+1 := PK1 [(1 + τ )%τk ] ,  k ∈ {0, . . . , N − 1}

(here we extend the notion of Wasserstein distance and projection to measures with the same
mass, even if different from 1: in particular, the mass of %τk will be |A|(1 + τ )k and at every step
we project %τk on the set of finite positive measures with the same mass as %τk and with density
bounded by 1; we still denote this set by K1 and the projection operator, in the sense of the
quadratic Wasserstein distance, onto this set by PK1 ). We want to study the convergence of this
algorithm as τ → 0. This is a very simplified model for the growth of a biological population,
which increases exponentially in size (supposing that there is enough food: see [17] for a more
sophisticated model) but is subject to a density constraint because each individual needs a
certain amount of space. Notice that this scheme formally follows the same evolution as in the
Hele-Shaw flow (this can be justified by the fact that, close to the uniform density, the W2 distance
and the H −1 distance are asymptotically the same).
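In the symmetric one-dimensional case the scheme admits a cheap consistency check of this exponential, Hele-Shaw-like behavior: if A is a centered interval, the saturation property of Section 5.1 suggests that every %τk is again the indicator of a centered interval, so the projection step reduces to multiplying the interval length by (1 + τ ). A minimal sketch of this heuristic (our own illustration; all parameters are arbitrary):

```python
import math

# Sketch of scheme (6.1) for A = [-L0/2, L0/2] in 1D: assuming each iterate
# stays the indicator of a centered interval, the step rho -> PK1[(1+tau) rho]
# just multiplies the interval length (= mass) by (1 + tau).
def final_length(L0, T, n_steps):
    tau = T / n_steps
    length = L0
    for _ in range(n_steps):
        length *= 1.0 + tau        # density stays equal to 1, mass grows
    return length

L0, T = 0.5, 1.0
coarse = final_length(L0, T, 10)
fine = final_length(L0, T, 10_000)
limit = L0 * math.exp(T)           # expected continuum limit as tau -> 0
```

As τ → 0 the iterated lengths |A|(1 + τ )k converge to the exponential growth |A| e^t , consistent with the continuum description.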
Independently of the compactness arguments that we need to prove the convergence of the
scheme, we notice that, for fixed τ > 0, all the densities %τk are indeed indicator functions (this
comes from the consideration in Section 5.1). Thus we have an evolution of sets. A natural
question is whether this stays true when we pass to the limit as τ → 0. Indeed, we generally
prove convergence of the scheme in the weak sense of measures, and it is well-known that,
in general, a weak limit of indicator functions is not necessarily an indicator itself. However,
Theorem 1.2 provides an a priori bound on the perimeter of these sets. This BV bound allows
us to turn weak convergence of measures into strong L1 convergence, and to preserve the fact
that these densities are indicator functions.
Notice on the other hand that the same result could not be applied in the case where the
projection is performed onto Kf , for a non-constant f . The reason lies in the term 2 ∫ |∇f |
in the estimate we provided. This means that, a priori, instead of being decreasing, the total
variation could increase at each step by a fixed amount 2 ∫ |∇f |. When τ → 0, the number of
iterations diverges and this does not allow us to prove any BV estimate on the solution. Yet, a
natural question would be to prove that the set evolution is well-defined as well, using maybe
the fact that these sets are increasing in time.
6.4. Crowd movement with diffusion. In [16, 22] the authors study crowd motion models where
a density % evolves according to a given vector field v, but subject to a density constraint % ≤ 1.
This means that, without the density constraint, the equation would be ∂t % + ∇ · (%v) = 0, and
a natural way to discretize the constrained equation would be to set %̃τk+1 = (id + τ v)# %τk and
then %τk+1 = PK1 [%̃τk+1 ].
What happens if we want to add some diffusion, i.e. if the continuity equation is replaced
by a Fokker-Planck equation ∂t % − ∆% + ∇ · (%v) = 0? Among other possible methods, one
discretization idea is the following: define %̃τk+1 by following the unconstrained Fokker-Planck
equation for time τ starting from %τk , and then project. In order to get some compactness of the
discrete curves we need to estimate the distance between %τk and %̃τk+1 . It is not difficult to see
that the speed of the solution of the Heat Equation (and also of the Fokker-Planck equation) for
the distance Wp is related to k∇%kLp . It is well known that these parabolic equations regularize
and so the Lp norm of the gradient will not blow up in time, but we have to take into account
the projections that we perform at every time step τ . Because of the discontinuities that appear
in the projected measures, one cannot expect W 1,p bounds on % to be preserved. The only
reasonable bound is for p = 1, i.e. a BV bound, which is exactly what is provided in this paper.
The application to crowd motion with diffusion is a matter of current study by the second
and third authors [20].
6.5. BV estimates for some degenerate diffusion equations. In this subsection we apply
our main Theorem 1.1 to establish BV estimates for some degenerate diffusion equations.
BV estimates for these equations are usually known and can be derived by looking at the
evolution in time of the BV norm of the solution. Theorem 1.1 allows us to give an
optimal transport proof of these estimates. Let h : R+ → R be a given super-linear convex
function and let us consider the problem

(6.2)    ∂t %t = ∇ · (h′′(%t ) %t ∇%t ) in (0, T ] × Rd ,    %(0, ·) = %0 in Rd ,

where %0 is a non-negative BV probability density. We remark that, along the evolution, %t
remains a non-negative probability density for every t ∈ (0, T ]. In the case h(%) = %^m /(m − 1),
equation (6.2) becomes precisely the porous medium equation ∂t % = ∆(%^m ) (see [26]).
Since the seminal work of F. Otto ([21]) we know that problem (6.2) can be seen as the
gradient flow of the functional

F(%) := ∫Rd h(%) dx
in the space (P(Rd ), W2 ). As a gradient flow, this equation can be discretized in time through
an implicit Euler scheme. More precisely, let us take a time step τ > 0 and consider the
following scheme: %τ0 := %0 and

(6.3)    %τk+1 := argmin% { 1/(2τ ) W22 (%, %τk ) + ∫ h(%) } ,  k ∈ {0, . . . , N − 1},

where N := T /τ . Defining piecewise constant and geodesic interpolations between the %τk 's,
with the corresponding velocities and momenta, it is possible to show that as τ → 0 we obtain
a curve %t , t ∈ [0, T ], in (P(Rd ), W2 ) which solves
∂t %t + ∇ · (%t vt ) = 0,    vt = −h′′(%t )∇%t ,

hence

∂t %t − ∇ · (h′′(%t ) %t ∇%t ) = 0,
that is %t is a solution to (6.2), see [2] for a rigorous presentation of these facts.
We now note that Theorem 1.1 implies

∫Rd |∇%τk+1 | dx ≤ ∫Rd |∇%τk | dx,

hence the total variation decreases along the sequence %τ0 , . . . , %τN . As these estimates do not
depend on τ > 0, this remains true also in the limit τ → 0. Hence (assuming uniqueness for the
limit equation) we get that for any t, s ∈ [0, T ] with t > s
T V (%t , Rd ) ≤ T V (%s , Rd ),
and in particular for any t ∈ [0, T ]
T V (%t , Rd ) ≤ T V (%0 , Rd ).
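The decay of the total variation along (6.2) can also be observed numerically. The following sketch (our own illustration: the explicit scheme, the grid and the heuristic stability restriction on the time step are ad hoc choices) integrates the porous medium case ∂t % = ∆(%²) and checks that the discrete total variation does not increase:

```python
import numpy as np

# Explicit finite-difference sketch for d_t rho = Lap(rho^m) (porous medium,
# m = 2), illustrating that the total variation of rho_t does not increase.
m = 2
N = 400
L = 10.0
dx = L / N
x = -L / 2 + dx * (np.arange(N) + 0.5)

rho = np.where(np.abs(x) < 1.0, 0.5, 0.0)      # BV initial probability density

def tv(u):
    return float(np.abs(np.diff(u)).sum())      # discrete total variation

tv0 = tv(rho)
# heuristic CFL-type restriction: dt well below dx^2 / (2 m max(rho)^(m-1))
dt = 0.1 * dx**2 / (m * rho.max() ** (m - 1))
for _ in range(2000):
    u = rho ** m
    lap = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2
    rho = rho + dt * lap                        # support stays far from boundary

tv_final = tv(rho)                              # expected: tv_final <= tv0
```

Under the stability restriction this scheme is monotone, hence total-variation diminishing, which mirrors the continuous estimate T V (%t , Rd ) ≤ T V (%0 , Rd ).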
References
[1] L. Ambrosio, Movimenti minimizzanti, Rend. Accad. Naz. Sci. XL Mem. Mat. Sci. Fis. Natur., 113 (1995),
191-246.
[2] L. Ambrosio, N. Gigli, G. Savaré, Gradient flows in metric spaces and in the space of probability measures,
Lectures in Math., ETH Zürich, (2005).
[3] G. Bouchitté, G. Buttazzo, New lower semicontinuity results for nonconvex functionals defined on measures,
Nonlinear Anal., 15 (1990), No. 7, 679-692.
[4] G. Buttazzo, Semicontinuity, relaxation, and integral representation in the calculus of variations, Longman
Scientific & Technical, 1989.
[5] G. Buttazzo, F. Santambrogio, A model for the optimal planning of an urban area, SIAM J. Math. Anal. 37
(2005), No. 2, 514-530.
[6] E. Carlen, K. Craig, Contraction of the proximal map and generalized convexity of the Moreau-Yosida
regularization in the 2-Wasserstein metric, Math. and Mech. Compl. Syst., Vol. 1 (2013), No. 1, 33–65.
[7] G. Carlier, F. Santambrogio, A variational model for urban planning with traffic congestion, ESAIM Contr.
Opt. Calc. Var. Vol. 11, No. 4, (2005), 595-613.
[8] L. A. Caffarelli, The regularity of mappings with a convex potential, J. Amer. Math. Soc., 5 (1992), No. 1,
99-104.
[9] L. A. Caffarelli, Boundary regularity of maps with convex potentials, Comm. Pure Appl. Math., 45 (1992),
No. 9, 1141-1151.
[10] P. Cannarsa, C. Sinestrari, Semiconcave functions, Hamilton-Jacobi equations, and optimal control,
Birkhäuser, (2004).
[11] G. Dal Maso: An Introduction to Γ−convergence. Birkhäuser, Basel, (1992).
[12] A. Figalli, A note on the regularity of the free boundaries in the optimal partial transport problem, Rendiconti
del Circolo Matematico di Palermo, 58 (2009), 283-286.
[13] A. Figalli, The optimal partial transport problem, Arch. Rat. Mech. Anal., 195 (2010), 533-560.
[14] N. Gigli, On the inverse implication of Brenier-McCann theorems and the structure of (P2 (M ), W2 ), Meth.
Appl. of Anal., Vol 18 (2011), no 2, 127-158.
[15] J. Lellmann, D.A. Lorenz, C. Schönlieb, T. Valkonen, Imaging with Kantorovich-Rubinstein discrepancy,
preprint (available at http://arxiv.org/pdf/1407.0221v1.pdf).
[16] B. Maury, A. Roudneff-Chupin, F. Santambrogio, A macroscopic crowd motion model of gradient flow type,
Math. Models and Methods in Appl. Sciences, Vol. 20 (2010), No. 10, 1787-1821.
[17] B. Maury, A. Roudneff-Chupin, F. Santambrogio, Congestion-driven dendritic growth, Discr. Cont. Dyn.
Syst., Vol. 34 (2014), No. 4, 1575-1604.
[18] R. J. McCann, A convexity principle for interacting gases. Adv. Math. 128 (1997), No. 1, 153-159.
[19] E. Milakis, On the regularity of optimal sets in mass transfer problems, Comm. Partial Differential Equations,
31 (2006), no. 4-6, 817-826.
[20] A. R. Mészáros, F. Santambrogio, A diffusive model for macroscopic crowd movements with congestion, in
preparation.
[21] F. Otto, The geometry of dissipative evolution equations: the porous medium equation, Commun. in PDE,
26 (2001), No. 1-2, 101-174.
[22] A. Roudneff-Chupin, Modélisation macroscopique de mouvements de foule, PhD Thesis, Université Paris-Sud,
(2011), available at http://www.math.u-psud.fr/∼roudneff/Images/these roudneff.pdf
[23] F. Santambrogio, Transport and concentration problems with interaction effects. J. Global Optim., 38 (2007),
no. 1, 129–141.
[24] F. Santambrogio, Optimal Transport for Applied Mathematicians. To be published by Birkhäuser. An incomplete version is available at http://www.math.u-psud.fr/∼santambr/OTAM.pdf
[25] Cédric Villani, Optimal Transport. Old and New. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 338. Springer-Verlag, Berlin, 2009.
[26] J. L. Vázquez, The porous medium equation. Mathematical theory, The Clarendon Press, Oxford University
Press, (2007).
Institut für Mathematik, Universität Zürich, Zürich, Switzerland
E-mail address: [email protected]
Laboratoire de Mathématiques d’Orsay, Université Paris-Sud, 91405 Orsay Cedex, France
E-mail address: [email protected]
Laboratoire de Mathématiques d’Orsay, Université Paris-Sud, 91405 Orsay Cedex, France
E-mail address: [email protected]
Dipartimento di Matematica, Università di Pisa, Pisa, Italy
E-mail address: [email protected]