Gradient Flows in Spaces of Probability Measures Luigi Ambrosio∗ Scuola Normale Superiore, Pisa Giuseppe Savaré† Dipartimento di Matematica, Università di Pavia January 20, 2006 Contents 1 Notation and measure-theoretic results 3 2 Metric and differentiable structure of the Wasserstein space 2.1 Absolutely continuous maps and metric derivative . . . . . . . 2.2 The change of variables formula . . . . . . . . . . . . . . . . . . 2.3 The quadratic optimal transport problem . . . . . . . . . . . . 2.4 Geodesics in P2 (Rd ) . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Existence of optimal transport maps . . . . . . . . . . . . . . . 2.6 The continuity equation with locally Lipschitz velocity fields . . 2.7 The tangent bundle to the Wasserstein space . . . . . . . . . . . . . . . . . 5 5 6 7 9 10 12 20 3 Convex functionals in P2 (Rd ) 3.1 λ-geodesically convex functionals in Pp (X) . . . . . 3.2 Examples of convex functionals in P2 (Rd ) . . . . . . 3.3 Relative entropy and convex functionals of measures 3.4 Log-concavity and displacement convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 30 32 37 40 4 Subdifferential calculus in P2 (Rd ) 4.1 Definition of the subdifferential for a.c. measures 4.2 Subdifferential calculus in P2a (Rd ) . . . . . . . . 4.3 The case of λ-convex functionals along geodesics 4.4 Regular functionals . . . . . . . . . . . . . . . . . 4.5 Examples of subdifferentials . . . . . . . . . . . . 4.5.1 Variational integrals: the smooth case . . 4.5.2 The potential energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 47 49 50 53 56 56 57 ∗ [email protected] † [email protected] 1 . . . . . . . . . . . . . . 4.5.3 4.5.4 4.5.5 4.5.6 4.5.7 The The The The The internal energy . . . . . . . . . . . . . . relative internal energy . . . . . . . . . . interaction energy . . . . . . . . . . . . . opposite Wasserstein distance . . . . . . sum of internal, potential and interaction . . . . . . . . . . . . . . . . . . . . energy . . . . . . . . . . 58 62 64 67 68 5 Gradient flows of λ-geodesically convex functionals in P2 (Rd ) 71 5.1 Uniqueness and various characterizations of gradient flows . . . . 72 5.2 Main properties of Gradient Flows . . . . . . . . . . . . . . . . . 75 5.3 Existence of Gradient Flows by convergence of the “Minimizing Movement” scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.4 Bibliographical notes . . . . . . . . . . . . . . . . . . . . . . . . . 86 6 Applications to Evolution PDE’s 6.1 Gradient flows and evolutionary PDE’s of diffusion type . . . . . 6.1.1 Changing the reference measure . . . . . . . . . . . . . . . 6.2 The linear transport equation for λ-convex potentials . . . . . . . 6.3 Kolomogorov-Fokker-Planck equation . . . . . . . . . . . . . . . 6.3.1 Relative Entropy and Fisher Information . . . . . . . . . 6.3.2 Wasserstein formulation of the Kolmogorov-Fokker-Planck equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.3 The construction of the Markovian semigroup . . . . . . . 6.4 Nonlinear diffusion equations . . . . . . . . . . . . . . . . . . . . 6.5 Drift diffusion equations with non local terms . . . . . . . . . . . 6.6 Gradient flow of −W 2 /2 and geodesics . . . . . . . . . . . . . . . 6.7 Bibliographical notes . . . . . . . . . . . . . . . . . . . . . . . . . 2 89 89 90 92 94 94 95 100 105 107 108 109 1 Notation and measure-theoretic results In this section we recall the main notation used in this paper and some basic measure-theoretic terminology and results. Given a metric space (X, d), we denote by P(X) the set of probability measures µ : B(X) → [0, 1], where B(X) is the Borel σ-algebra. When X is a Borel subset of an euclidean space Rd , we set Z m2 (µ) := |x|2 dµ X and we denote by P2 (X) the subspace of X made by measures with finite quadratic moment: P2 (X) := {µ ∈ P(X) : m2 (µ) < ∞} . We denote by L d the Lebesgue measure in Rd and set P2a (X) := µ ∈ P2 (X) : µ L d , whenever X ∈ B(Rd ). If µ ∈ P(X1 ), and r : X1 → X2 is a Borel (or, more generally, µ-measurable) map, we denote by r # µ ∈ P(X2 ) the push-forward of µ through r, defined by r # µ(B) := µ(r −1 (B)) ∀ B ∈ B(X2 ). More generally we have Z (1.1) Z f (r(x)) dµ(x) = X1 f (y) d r # µ(y) (1.2) X2 for every bounded (or r # µ-integrable) Borel function f : X2 → R. It is easy to check that ∀ µ, ν ∈ P(X1 ). (1.3) ν µ =⇒ r # ν r # µ Notice also the natural composition rule (r ◦ s)# µ = r # (s# µ) where s : X1 → X2 , r : X2 → X3 , µ ∈ P(X1 ). (1.4) We denote by π i , i = 1, 2, the projection operators defined on a product space X := X1 × X2 , defined by π 1 : (x1 , x2 ) 7→ x1 ∈ X1 , π 2 : (x1 , x2 ) 7→ x2 ∈ X2 . (1.5) If X is endowed with the canonical product metric and the Borel σ-algebra and µ ∈ P(X), the marginals of µ are the probability measures i µi := π# µ ∈ P(Xi ), i = 1, 2. (1.6) Given µ1 ∈ P(X1 ) and µ2 ∈ P(X2 ) the class Γ(µ1 , µ2 ) of transport plans between µ1 and µ2 is defined by n o i Γ(µ1 , µ2 ) := µ ∈ P(X1 × X2 ) : π# µ = µi , i = 1, 2 . (1.7) 3 Notice also that Γ(µ1 , µ2 ) = {µ1 × µ2 } if either µ1 or µ2 is a Dirac mass. (1.8) To each couple of measures µ1 ∈ P(X1 ), µ2 = r # µ1 ∈ P(X2 ) linked by a Borel transport map r : X1 → X2 we can associate the transport plan µ := (i × r)# µ1 ∈ Γ(µ1 , µ2 ), i being the identity map on X1 . (1.9) If µ is representable as in (1.9) then we say that µ is induced by r. Each transport plan µ concentrated on a µ-measurable graph in X1 × X2 admits the representation (1.9) for some µ1 -measurable map r, which therefore transports µ1 to µ2 (see, e.g., [6]). Conformally to the probabilistic terminology, we say that a sequence (µn ) ⊂ P(X) is narrowly convergent to µ ∈ P(X) as n → ∞ if Z Z lim f (x) dµn (x) = f (x) dµ(x) (1.10) n→∞ for every function f ∈ tions defined on X. X Cb0 (X), X the space of continuous and bounded real func- Theorem 1.1 (Prokhorov, [29, III-59]) If a set K ⊂ P(X) is tight, i.e. ∀ε > 0 ∃ Kε compact in X such that µ(X \ Kε ) ≤ ε ∀ µ ∈ K, (1.11) then K is relatively compact in P(X). When one needs to pass to the limit in expressions like (1.10) w.r.t. unbounded or lower semicontinuous functions f , the following two properties are quite useful. The first one is a lower semicontinuity property: Z Z lim inf g(x) dµn (x) ≥ g(x) dµ(x) (1.12) n→∞ X X for every sequence (µn ) ⊂ P(X) narrowly convergent to µ and any l.s.c. function g : X → (−∞, +∞] bounded from below: it follows easily by a monotone approximation argument of g by continuous and bounded functions. Changing g in −g one gets the corresponding “lim sup” inequality for upper semicontinuous functions bounded from above. In particular, choosing as g the characteristic functions of open and closed subset of X, we obtain lim inf µn (G) ≥ µ(G) ∀ G open in X, (1.13) lim sup µn (F ) ≤ µ(F ) ∀ F closed in X. (1.14) n→∞ n→∞ The statement of the second property requires the following definitions: we say that a Borel function g : X → [0, +∞] is uniformly integrable w.r.t. a given set K ⊂ P(X) if Z lim g(x) dµ(x) = 0 uniformly w.r.t. µ ∈ K. (1.15) k→∞ {x:g(x)≥k} 4 In the particular case of g(x) := d(x, x̄)p , for some (and thus any) x̄ ∈ X and a given p > 0, i.e. if Z lim dp (x̄, x) dµ(x) = 0 uniformly w.r.t. µ ∈ K, (1.16) k→∞ X\Bk (x̄) we say that the set K ⊂ P(X) has uniformly integrable p-moments. The following lemma (see for instance Lemma 5.1.7 of [8] for its proof) provides a characterization of p-uniformly integrable families, extending the validity of (1.10) to unbounded but with p-growth functions, i.e. functions f : X → R such that |f (x)| ≤ A + B dp (x̄, x) ∀x ∈ X, (1.17) for some A, B ≥ 0 and x̄ ∈ X. Lemma 1.2 Let (µn ) be a sequence in P(X) narrowly convergent to µ ∈ P(X). If f : X → R is continuous, g : X → (−∞, +∞] is lower semicontinuous, and |f | and g − are uniformly integrable w.r.t. the set {µn }n∈N , then Z Z lim inf g(x) dµn (x) ≥ g(x) dµ(x) > −∞, (1.18a) n→∞ ZX ZX lim f (x) dµn (x) = f (x) dµ(x). (1.18b) n→∞ X X Conversely, if f : X → [0, ∞) is continuous, µn -integrable, and Z Z lim sup f (x) dµn (x) ≤ f (x) dµ(x) < +∞, n→∞ X (1.19) X then f is uniformly integrable w.r.t. {µn }n∈N . In particular, a family {µn }n∈N ⊂ P(X) has uniformly integrable p-moments iff (1.10) holds for every continuous function f : X → R with p-growth. 2 2.1 Metric and differentiable structure of the Wasserstein space Absolutely continuous maps and metric derivative Let (E, d) be a metric space. Definition 2.1 (Absolutely continuous curves) Let I ⊂ R be an interval and let v : I → E. We say that v is absolutely continuous if there exists m ∈ L1 (I) such that Z d(v(s), v(t)) ≤ t m(τ ) dτ s 5 ∀s, t ∈ I, s ≤ t. (2.1) Any absolutely continuous curve is obviously uniformly continuous, and therefore it can be uniquely extended to the closure of I. It is not difficult to show (see for instance Theorem 1.1.2 in [8] or [9]) that the metric derivative d(v(t + h), v(t)) h→0 |h| |v 0 |(t) := lim (2.2) exists at L 1 -a.e. t ∈ I for any absolutely continuous curve v(t). Furthermore, |v 0 | ∈ L1 (I) and is the minimal m fulfilling (2.1) (i.e. |v 0 | fulfils (2.1) and m ≥ |v 0 | L 1 -a.e. in I for any m with this property). 2.2 The change of variables formula Let f : A ⊂ Rd → Rd be a function, with A open. Then, denoting by Σf = D(∇f ) the Borel set where f is differentiable, there is a sequence of sets Σn ↑ Σ such that f |Σn is a Lipschitz function for any n (see [34, 3.1.8]). Therefore the well-known area formula for Lipschitz maps (see for instance [33, 34]) extends to this general class of maps and reads as follows: Z Z X h(x) dy (2.3) h(x)|det∇f |(x) dx = Rd x∈Σ ∩f −1 (y) f Σf for any Borel function h : Rd → [0, +∞]. This formula leads to a simple rule for computing the density of the push-forward of measures absolutely continuous w.r.t. L d . Lemma 2.2 (Density of the push-forward) Let ρ ∈ L1 (Rd ) be a nonnegative function and assume that there exists a Borel set Σ ⊂ Σf such that f |Σ is injective and the difference {ρ > 0} \ Σ is L d -negligible. Then f# ρL d L d if and only if |det∇f | > 0 L d -a.e. on Σ and in this case f# ρL d = ρ ◦ f −1 |f (Σ) L d . |det∇f | Proof. If |det∇f | > 0 L d -a.e. on Σ we can put h = ρχf −1 (B)∩Σ /|det∇f | in (2.3), with B ∈ B(Rd ), to obtain Z Z Z ρ(f −1 (y)) ρ dx = ρ dx = dy. ˜−1 (y))| f −1 (B) f −1 (B)∩Σ B∩f (Σ) |det∇f (f Conversely, if there is a Borel set B ⊂ Σ with L d (B) > 0 and |det∇f | = 0 on B, the area formula gives L d (f˜(B)) = 0. On the other hand Z ρ dx > 0 f# (ρL d )(f (B)) = f −1 (f (B)) because at L d -a.e. x ∈ B we have ρ(x) > 0. Hence f# (ρL d ) is not absolutely continuous with respect to L d . 6 By applying the area formula again we obtain the rule for computing integrals of the densities: Z Z f# (ρL d ) ρ F dx = |det∇f | dx (2.4) F Ld |det∇f | Rd Rd for any Borel function F : [0, +∞) → [0, +∞] with F (0) = 0. Notice that in this formula the set Σf does not appear anymore (due to the fact that F (0) = 0 and ρ = 0 out of Σ), so it holds provided f is differentiable ρL d -a.e., it is ρL d essentially injective (i.e. there exists a Borel set Σ such that f |Σ is injective and ρ = 0 L d -a.e. out of Σ) and |det∇f | > 0 ρL d -a.e. in Rd . We will apply mostly these formulas when f is the gradient of a convex function g. In this specific case the following result holds (see for instance [4, 33]): Theorem 2.3 (Aleksandrov) Let Ω ⊂ Rd be a convex open set and let g : Ω → R be a convex function. Then g is a locally Lipschitz function, ∇g is differentiable, its gradient ∇2 g(x) is a symmetric matrix and g has the second order Taylor expansion 1 g(y) = g(x) + h∇g(x), y − xi + h∇2 g(x), y − xi + o(|y − x|2 ) 2 as y → x (2.5) for L d -a.e. x ∈ Ω. Notice that ∇g is also monotone h∇g(x1 ) − ∇g(x2 ), x1 − x2 i ≥ 0 x1 , x2 ∈ D(∇g), and that the above inequality is strict if g is strictly convex: in this case, it is immediate to check that ∇g is injective on D(∇g), and that |det∇2 g| > 0 on the differentiability set of ∇g if g is uniformly convex. 2.3 The quadratic optimal transport problem Let X, Y be complete and separable metric spaces such that and let c : X ×Y → [0, +∞] be a Borel cost function. Given µ ∈ P(X), ν ∈ P(Y ) the optimal transport problem, in Monge’s formulation, is given by Z inf c(x, t(x)) dµ(x) : t# µ = ν . (2.6) X This problem can be ill posed because sometimes there is no transport map t such that t# µ = ν (this happens for instance when µ is a Dirac mass and ν is not a Dirac mass). Kantorovich’s formulation Z min c(x, y) dγ(x, y) : γ ∈ Γ(µ, ν) (2.7) X×Y 7 circumvents this problem (as µ × ν ∈ Γ(µ, ν)). The existence of an optimal transport plan, when c is l.s.c., is provided by (1.12) and by Theorem 1.1, taking into account that Γ(µ, ν) is tight (this follows easily by the fact that the marginals of the measures in Γ(µ, ν) are fixed, and by the fact that according to Ulam’s theorem any finite measure in a complete and separable metric space is tight, see also Chapter 6 in [8] for more general formulations). The problem (2.7) is truly a weak formulation of (2.6) in the following sense: if c is bounded and continuous, and if µ has no atom, then the “min” in (2.7) is equal to the “inf” in (2.6), see [36], [6]. This result can also be extended to classes of unbounded cost functions, see [61]. In the sequel we consider the case when X = Y and c(x, y) = d2 (x, y), where d is the distance in X, and denote by Γo (µ, ν) the optimal plans in (2.7) corresponding to this choice of the cost function. In this case we ue the minimum value to define the Kantorovich-Rubinstein-Wasserstein distance Z 1/2 2 W2 (µ, ν) := d (x, y) dγ γ ∈ Γo (µ, ν). (2.8) X×X Theorem 2.4 Let X be a complete and separable metric space. Then W2 defines a distance in P2 (X) and P2 (X), endowed with this distance, is a complete and separable metric space. Furthermore, for a given sequence (µn ) ⊂ P2 (X) we have ( µn narrowly converge to µ, 2 lim W2 (µn , µ) = 0 ⇐⇒ (2.9) n→∞ (µn ) has uniformly integrable 2-moments. Proof. We just prove that W2 is a distance. The complete statement is proved for instance in Proposition 7.1.5 of [8] or, in the locally compact case, in [65]. Let µ, ν, σ ∈ P2 (X) and let γ ∈ Γo (µ, ν) and η ∈ Γo (ν, σ). General results of probability theory (see the above mentioned references) ensure the existence of λ ∈ P(X × X × X) such that (π 1 , π 2 )# λ = γ, (π 2 , π 3 )# λ = η. Then, as 1 1 1 π# (π 1 , π 3 )# λ = π# λ = π# γ = µ, 2 3 2 π# (π 1 , π 3 )# λ = π# λ = π# η = σ, we obtain that (π 1 , π 3 )# λ ∈ Γ(µ, σ), hence Z W2 (µ, ν) ≤ 1/2 d (x1 , x2 ) d(π , π )# λ = kd(x1 , x3 )kL2 (λ) . 2 1 3 X×X As d(x1 , x3 ) ≤ d(x1 , x2 ) + d(x2 , x3 ) and kd(x1 , x2 )kL2 (λ) = kd(x1 , x2 )kL2 (γ) = W2 (µ, ν), kd(x2 , x3 )kL2 (λ) = kd(x2 , x3 )kL2 (γ) = W2 (ν, σ), the triangle inequality W2 (µ, σ) ≤ W2 (µ, ν) + W2 (ν, σ) follows by the strandard triangle inequality in L2 (λ). 8 Working with Monge’s formulation the proof above is technically easier, as an admissible transport map between µ and σ can be obtained just composing transport maps between µ and ν with transport maps between ν and σ. However, in order to give a complete proof one needs to know either that optimal plans are induced by maps, or that the infimum in Monge’s formulation coincides with the minimum in Kantorovich’s one, and none of these results is trivial, even in Euclidean spaces. Although in many situations that we consider in this paper the optimal plans are induced by maps, still the Kantorovich formulation of the optimal transport problem is quite useful to provide estimates from above on W2 . For instance: Z 2 W2 (µ, ν) ≤ d2 (t(x), s(x)) dσ(x) whenever t# σ = µ, s# σ = ν. (2.10) X This follows by the fact that (t, s)# σ ∈ Γ(µ, ν) and by the identity Z Z d2 (t(x), s(x)) dσ(x) = d2 (x, y) d(t, s)# σ. X 2.4 X×X Geodesics in P2 (Rd ) Let (X, d) be a metric space. Recall that a constant speed geodesic γ : [0, T ] → X is a map satisfying d(γ(s), γ(t)) = (t − s) d(γ(0), γ(T )) T whenever 0 ≤ s ≤ t ≤ T . Actually only the inequality d(γ(s), γ(t)) ≤ T −1 (t − s)d(γ(0), γ(T )) needs to be checked for all 0 ≤ s ≤ t ≤ T . Indeed, if the strict inequality occurs for some s < t, then the triangle inequality provides d(γ(0), γ(T )) ≤ d(γ(0), γ(s)) + d(γ(s), γ(T )) + d(γ(t), γ(T )) 1 < (s + (t − s) + (T − t))d(γ(0), γ(T )) = d(γ(0), γ(T )), T a contradiction. Using this elementary fact one can show that, for any choice of µ, ν ∈ P2 (Rd ), and γ ∈ Γo (µ, ν), the map µt := (1 − t)π 1 + tπ 2 # γ t ∈ [0, 1] (2.11) is a constant speed geodesic. Indeed, γ st := ((1 − s)π 1 + sπ 2 ), ((1 − t)π 1 + tπ 2 ) # γ ∈ Γ(µs , µt ) and this plan provides the estimate W2 (µs , µt ) ≤ (t − s)W2 (µ, ν), 9 (2.12) as Z 2 Rd ×Rd Z |x1 − x2 | dγ st = |(1 − s)x1 + sx2 − (1 − t)x1 − tx2 |2 dγ Z 2 (s − t) |x1 − x2 |2 dγ. Rd ×Rd = Rd ×Rd It has been proved in Theorem 7.2.2 of [8] that any constant speed geodesic joining µ to ν can be built in this way. We discuss additional regularity properties of the geodesics in the next section. Here we just mention that, in the case when γ is induced by a transport map t (i.e. γ = (i, t)# µ), then (2.11) reduces to µt = ((1 − t)i + tt)# µ t ∈ [0, 1]. (2.13) 2.5 Existence of optimal transport maps The following result has been first proved in []. See [], [] and of [8] for several extensions of it (more general cost functions as well as more general classes of initial measures), and for additional bibliographical references. Here we just briefly mention the proof of (iii). Theorem 2.5 (Existence and uniqueness of optimal transport maps) For any µ ∈ P2a (Rd ), ν ∈ P2 (Rd ) Kantorovich’s optimal transport problem (2.7) with c(x, y) = |x − y|2 has a unique solution γ. Moreover: (i) γ is induced by a transport map t, i.e. γ = (i, t)# µ. In particular t is the unique solution of Monge’s optimal transport problem (2.6). (ii) The map t coincides µ-a.e. with the gradient of a convex function ϕ : Rd → (−∞, +∞], whose finiteness domain D(φ) satisfies µ(Rd \ D(φ)) = 0. (iii) If ν = ρ0 L d ∈ P2a (Rd ) as well, then s ◦ t = i µ-a.e. in Rd and t ◦ s = i ν-a.e. in Rd . In particular t is µ-essentially injective, i.e. there exists a µ-negligible set N ⊂ Rd such that, setting Ω = Rd \ N , t|Ω is injective. Finally ρ ◦ (t|Ω )−1 ν-a.e. in Rd . det ∇2 ϕ Proof. Let s be the unique optimal transport plan between ν and µ. Since (i, t)# µ and (s, i)# ν are both optimal plans between µ and ν, they coincide. Testing this identity between plans on |s(t(x)−x| (resp. |t(s(y))−y|) we obtain that s ◦ t = i µ-a.e. in Rd (resp. t ◦ s = i ν-a.e. in Rd ): Z Z Z |x − s(t(x))| dµ(x) = |x − s(y)| d(i, t)# µ = |x − s(y)| d(s, i)# ν d d Rd Rd ×Rd ZR ×R |s(y) − s(y)| dν(y) = 0. = ρ0 := Rd 10 The formula for the density of ν with respect to L d follows by Lemma 2.2, taking into account the µ-essential injectivity of t. In the following we shall denote by tνµ the unique optimal map given by Theorem 2.5. Notice that t = ∇ϕ is uniquely determined only µ-a.e., hence ϕ is not uniquely determined, not even up to additive constants, unless µ = ρL d with ρ > 0 L d -a.e. in Rd . However, the existence proof (at least the one achieved through a duality argument), yields some “canonical” ϕ, given by ϕ(x) = sup hx, yi + ψ(y) y∈supp ν x ∈ Rd (2.14) for a suitable function ψ : supp ν → [−∞, +∞). This explicit expression is sometimes technically useful: for instance, it shows that when supp ν is bounded we can always find a globally convex and Lipschitz map ϕ whose gradient is the optimal transport map. Theorem 2.6 (Regularity in the interior of geodesics) Let µ, ν ∈ P2 (Rd ) and let µt := (1 − t)π 1 + tπ 2 # γ be a constant speed geodesic induced by γ ∈ Γo (µ, ν). Then the following properties hold: (i) For any t ∈ [0, 1) there exists a unique optimal plan between µt and µ, and this plan is induced by a map st with Lipschitz constant less than 1/(1−t); (ii) If µ = ρL d ∈ P2a (Rd ) then µt ∈ P2a (Rd ) for all t ∈ [0, 1). Proof. (i) The necessary optimality conditions at the level of plans (see for instance §6.2.3 of [8], or [65]) imply that the support of γ is contained in the graph {(x, y) : y ∈ Γ(x)} of a monotone operator Γ(x). On the other hand, the same argument used in the proof of (2.12) shows that the plan γ t := π 1 , (1 − t)π 1 + tπ 2 # γ is optimal between µ and µt . As the support of γ t is contained in the graph of the monotone operator (1 − t)I + tΓ, whose inverse Γ−1 (y) := x ∈ Rd : y ∈ Γ(x) is single-valued and 1/(1 − t)-Lipschitz continuous. Therefore the graph of Γ−1 is the graph of a 1/(1 − t)-Lipschitz map st pushing µt to µ. The uniqueness of this map, even at the level of plans, is proved in Lemma 7.2.1 of [8]. (ii) If A ∈ B(Rd ) is L d -negligible, then st (A) is also L d -negligible, hence µ-negligible. The identity st ◦ tt = i µ-a.e. then gives µt (A) = µ(t−1 t (A)) ≤ µ(st (A)) = 0. This proves that µt L d . 11 2.6 The continuity equation with locally Lipschitz velocity fields In this section we collect some results on the continuity equation in Rd × (0, T ), ∂t µt + ∇ · (vt µt ) = 0 (2.15) which we will need in the sequel. Here µt is a Borel family of probability measures on Rd defined for t in the open interval I := (0, T ), v : (x, t) 7→ vt (x) ∈ Rd is a Borel velocity field such that T Z Z |vt (x)| dµt (x) dt < +∞, 0 (2.16) Rd and we suppose that (2.15) holds in the sense of distributions, i.e. Z T Z ∂t ϕ(x, t) + hvt (x), ∇x ϕ(x, t)i dµt (x) dt = 0, Rd 0 ∀ϕ ∈ Cc∞ (Rd (2.17) × (0, T )). Remark 2.7 (More general test functions) By a simple regularization ar- gument via convolution, it is easy to show that (2.17) holds if ϕ ∈ Cc1 Rd × (0, T ) as well. Moreover, under condition (2.16), we can also consider bounded test functions ϕ, with bounded gradient, whose support has a compact projection in (0, T ) (that is, the support in x need not be compact): it suffices to approximate ϕ by ϕχR where χR ∈ Cc∞ (Rd ), 0 ≤ χR ≤ 1, |∇χR | ≤ 2 and χR = 1 on BR (0). First of all we recall some technical preliminaries. Lemma 2.8 (Continuous representative) Let µt be a Borel family of probability measures satisfying (2.17) for a Borel vector field vt satisfying (2.16). Then there exists a narrowly continuous curve t ∈ [0, T ] 7→ µ̃t ∈ P(Rd ) such that µt = µ̃t for L 1 -a.e. t ∈ (0, T ). Moreover, if ϕ ∈ Cc1 (Rd × [0, T ]) and t1 ≤ t2 ∈ [0, T ] we have Z Z ϕ(x, t1 ) dµ̃t1 (x) ϕ(x, t2 ) dµ̃t2 (x) − Rd Rd Z t2 Z (2.18) ∂t ϕ + h∇ϕ, vt i dµt (x) dt. = t1 Rd Proof. Let us take ϕ(x, t) = η(t)ζ(x), η ∈ Cc∞ (0, T ) and ζ ∈ Cc∞ (Rd ); we have Z − T Z η (t) 0 0 Z ζ(x) dµt (x) dt = Rd T Z η(t) h∇ζ(x), vt (x)i dµt (x) dt, Rd 0 so that the map Z t 7→ µt (ζ) = ζ(x) dµt (x) Rd 12 belongs to W 1,1 (0, T ) with distributional derivative Z µ̇t (ζ) = h∇ζ(x), vt (x)i dµt (x) for L 1 -a.e. t ∈ (0, T ) (2.19) Rd with Z |µ̇t (ζ)| ≤ V (t) sup |∇ζ|, |vt (x)| dµt (x), V (t) := Rd V ∈ L1 (0, T ). (2.20) Rd If Lζ is the set of its Lebesgue points, we know that L 1 ((0, T ) \ Lζ ) = 0. Let us now take a countable set Z which is dense in Cc1 (Rd ) with respect the usual C 1 norm kζkC 1 = supRd (|ζ|, |∇ζ|) and let us set LZ := ∩ζ∈Z Lζ . The restriction of the curve µ to LZ provides a uniformly continuous family of bounded functionals on Cc1 (Rd ), since (2.20) shows Z t |µt (ζ) − µs (ζ)| ≤ kζkC 1 V (λ) dλ ∀ s, t ∈ LZ . s Therefore, it can be extended in a unique way to a continuous curve {µ̃t }t∈[0,T ] in [Cc1 (Rd )]0 . If we show that {µt }t∈LZ is also tight, the extension provides a continuous curve in P(Rd ). For, let us consider nonnegative, smooth functions ζk : Rd → [0, 1], k ∈ N, such that if |x| ≤ k, ζk (x) = 1 ζk (x) = 0 if |x| ≥ k + 1, |∇ζk (x)| ≤ 2. It is not restrictive to suppose that ζk ∈ Z. Applying the previous formula (2.19), for t, s ∈ LZ we have Z TZ |µt (ζk ) − µs (ζk )| ≤ ak := 2 |vλ (x)| dµλ (x) dλ, 0 k<|x|<k+1 P+∞ with k=1 ak < +∞. For a fixed s ∈ LZ and ε > 0, being µs tight, we can find k ∈ N such that µs (ζk ) > 1 − ε/2 and ak < ε/2. It follows that µt (Bk+1 (0)) ≥ µt (ζk ) ≥ 1 − ε ∀ t ∈ LZ . Now we show (2.18). Let us choose ϕ ∈ Cc1 (Rd × [0, T ]) and set ϕε (x, t) = ηε (t)ϕ(x, t), where ηε ∈ Cc∞ (t1 , t2 ) such that 0 ≤ ηε (t) ≤ 1, lim ηε (t) = χ(t1 ,t2 ) (t) ∀ t ∈ [0, T ], ε↓0 lim ηε0 = δt1 − δt2 ε↓0 in the duality with continuous functions in [0, T ]. We get Z TZ 0= ∂t (ηε ϕ) + h∇x (ηε ϕ), vt i dµt (x) dt Rd 0 T Z = Z ηε (t) ∂t ϕ(x, t) + hvt (x), ∇x ϕ(x, t)i dµt (x) dt Rd 0 Z + 0 T ηε0 (t) Z ϕ(x, t) dµ̃t (x) dt. Rd 13 Passing to the limit as ε vanishes and invoking the continuity of µ̃t , we get (2.18). Lemma 2.9 (Time rescaling) Let t : s ∈ [0, T 0 ] → t(s) ∈ [0, T ] be a strictly increasing absolutely continuous map with absolutely continuous inverse s := t−1 . Then (µt , vt ) is a distributional solution of (2.15) if and only if µ̂ := µ ◦ t, v̂ := t0 v ◦ t, is a distributional solution of (2.15) on (0, T 0 ). Proof. By an elementary smoothing argument we can assume that s is continuously differentiable and s0 > 0. We choose ϕ̂ ∈ Cc1 (Rd × (0, T 0 )) and let us set ϕ(x, t) := ϕ̂(x, s(t)); since ϕ ∈ Cc1 (Rd × (0, T )) we have Z TZ s0 (t)∂s ϕ̂(x, s(t)) + h∇ϕ̂(x, s(t)), v̂t (x)i dµt (x) dt 0= Z Rd 0 T s0 (t) = 0 Z T0 Z = 0 Z vt (x) i dµt (x) dt s0 (t) Rd ∂s ϕ̂(x, s) + h∇x ϕ̂(x, s), t0 (s)vt(s) (x)i dµ̂s (x) ds. ∂s ϕ̂(x, s(t)) + h∇x ϕ̂(x, s(t)), Rd When the velocity field vt is more regular, the classical method of characteristics provides an explicit solution of (2.15). First we recall an elementary result of the theory of ordinary differential equations. Lemma 2.10 (The characteristic system of ODE) Let vt be a Borel vector field such that for every compact set B ⊂ Rd Z T sup |vt | + Lip(vt , B) dt < +∞. (2.21) B 0 Then, for every x ∈ Rd and s ∈ [0, T ] the ODE Xs (x, s) = x, d Xt (x, s) = vt (Xt (x, s)), dt (2.22) admits a unique maximal solution defined in an interval I(x, s) relatively open in [0, T ] and containing s as (relatively) internal point. Furthermore, if t 7→ |Xt (x, s)| is bounded in the interior of I(x, s) then I(x, s) = [0, T ]; finally, if v satisfies the global bounds analogous to (2.21) Z T S := sup |vt | + Lip(vt , Rd ) dt < +∞, (2.23) 0 Rd then the flow map X satisfies Z T sup |∂t Xt (x, s)| dt ≤ S, 0 x∈Rd sup Lip(Xt (·, s), Rd ) ≤ eS . t,s∈[0,T ] 14 (2.24) For simplicity, we set Xt (x) := Xt (x, 0) in the particular case s = 0 and we denote by τ (x) := sup I(x, 0) the length of the maximal time domain of the characteristics leaving from x at t = 0. Remark 2.11 (The characteristics method for backward first order linear PDE’s) Characteristics provide a useful representation formula for classical solutions of the backward equation (formally adjoint to (2.15)) in Rd × (0, T ), ∂t ϕ + hvt , ∇ϕi = ψ ϕ(x, T ) = ϕT (x) x ∈ Rd , (2.25) when, e.g., ψ ∈ Cb1 (Rd × (0, T )), ϕT ∈ Cb1 (Rd ) and v satisfies the global bounds (2.23), so that maximal solutions are always defined in [0, T ]. A direct calculation shows that Z T ϕ(x, t) := ϕT (XT (x, t)) − ψ(Xs (x, t), s) ds (2.26) t solve (2.25). For Xs (Xt (x, 0), t) = Xs (x, 0) yields Z T ψ(Xs (x, 0), s) ds, ϕ(Xt (x, 0), t) = ϕT (XT (x, 0)) − t and differentiating both sides with respect to t we obtain ∂ϕ + hvt , ∇ϕi (Xt (x, 0), t) = ψ(Xt (x, 0), t). ∂t Since x (and then Xt (x, 0)) is arbitrary we conclude that (2.32) is fulfilled. Now we use characteristics to prove the existence, the uniqueness, and a representation formula of the solution of the continuity equation, under suitable assumption on v. Lemma 2.12 Let vt be a Borel velocity field satisfying (2.21), (2.16), let µ0 ∈ P(Rd ), and let Xt be the maximal solution of the ODE (2.22) (corresponding to s = 0). Suppose that for some t̄ ∈ (0, T ] τ (x) > t̄ for µ0 -a.e. x ∈ Rd . (2.27) Then t 7→ µt := (Xt )# µ0 is a continuous solution of (2.15) in [0, t̄]. Proof. The continuity of µt follows easily since lims→t Xs (x) = Xt (x) for µ0 a.e. x ∈ Rd : thus for every continuous and bounded function ζ : Rd → R the dominated convergence theorem yields Z Z Z Z lim ζ dµs = lim ζ(Xs (x)) dµ0 (x) = ζ(Xt (x)) dµ0 (x) = ζ dµt . s→t Rd s→t Rd Rd Rd For any ϕ ∈ Cc∞ (Rd × (0, t̄)) and for µ0 -a.e. x ∈ Rd the maps t 7→ φt (x) := ϕ(Xt (x), t) are absolutely continuous in (0, t̄), with φ̇t (x) = ∂t ϕ(Xt (x), t) + h∇ϕ(Xt (x), t), vt (Xt (x))i = Λ(·, t) ◦ Xt , 15 where Λ(x, t) := ∂t ϕ(x, t) + h∇ϕ(x, t), vt (x)i. We thus have T Z Z Z T Z T Z |φ̇t (x)| dµ0 (x) dt = 0 |Λ(Xt (x), t)| dµ0 (x) dt Rd Rd 0 Z |Λ(x, t)| dµt (x) dt = Rd 0 T Z ≤ Lip(ϕ) T + and therefore Z Z 0= ϕ(x, t̄) dµt̄ (x) − Rd Z = Rd t̄ |vt (x)| dµt (x) dt < +∞ 0 Rd Z ϕ(x, 0) dµ0 (x) = Rd Z Z ϕ(Xt̄ (x), t̄) − ϕ(x, 0) dµ0 (x) Rd Z t̄ Z ∂t ϕ + h∇ϕ, vt i dµt dt, φ̇t (x) dt dµ0 (x) = 0 0 Rd by a simple application of Fubini’s theorem. We want to prove that, under reasonable assumptions, in fact any solution of (2.15) can be represented as in Lemma 2.12. The first step is a uniqueness theorem for the continuity equation under minimal regularity assumptions on the velocity field. Notice that the only global information on vt is (2.28). The proof is based on a classical duality argument (see for instance [30, 6]). Proposition 2.13 (Uniqueness and comparison for the continuity equation) Let σt be a narrowly continuous family of signed measures solving ∂t σt + ∇ · (vt σt ) = 0 in Rd × (0, T ), with σ0 ≤ 0, T Z Z |vt | d|σt |dt < +∞, 0 and Z T (2.28) Rd |σt |(B) + sup |vt | + Lip(vt , B) dt < +∞ B 0 for any bounded closed set B ⊂ Rd . Then σt ≤ 0 for any t ∈ [0, T ]. Proof. Fix ψ ∈ Cc∞ (Rd × (0, T )) with 0 ≤ ψ ≤ 1, R > 0, and a smooth cut-off function χR (·) = χ(·/R) ∈ Cc∞ (Rd ) such that 0 ≤ χR ≤ 1, |∇χR | ≤ 2/R, χR ≡ 1 on BR (0), and χR ≡ 0 on Rd \ B2R (0). (2.29) We define wt so that wt = vt on B2R (0) × (0, T ), wt = 0 if t ∈ / [0, T ] and sup |wt | + Lip(wt , Rd ) ≤ sup |vt | + Lip(vt , B2R (0)) ∀ t ∈ [0, T ]. Rd B2R (0) 16 (2.30) Let wtε be obtained from wt by a double mollification with respect to the space and time variables: notice that wtε satisfy T Z sup ε∈(0,1) sup |wtε | + Lip(wtε , Rd ) dt < +∞. (2.31) Rd 0 We now build, by the method of characteristics described in Remark 2.11, a smooth solution ϕε : Rd × [0, T ] → R of the PDE ∂ϕε + hwtε , ∇ϕε i = ψ ∂t in Rd × (0, T ), ϕε (x, T ) = 0 x ∈ Rd . (2.32) Combining the representation formula (2.26), the uniform bound (2.31), and the estimate (2.24), it is easy to check that 0 ≥ ϕε ≥ −T and |∇ϕε | is uniformly bounded with respect to ε, t and x. We insert now the test function ϕε χR in the continuity equation and take into account that σ0 ≤ 0 and ϕε ≤ 0 to obtain Z ≥ 0 Rd Z T Z T Z = Rd 0 Z ≥ 0 Rd T ∂ϕε + hvt , χR ∇ϕε + ϕε ∇χR i dσt dt ∂t 0 Rd Z TZ χR (ψ + hvt − wtε , ∇ϕε i) dσt dt + ϕε h∇χR , vt i dσt dt ϕε χR dσ0 = − Z Z χR Rd 0 Z χR (ψ + hvt − wtε , ∇ϕε i) dσt dt − T Z |∇χR ||vt | d|σt | dt. Rd 0 Letting ε ↓ 0 and using the uniform bound on |∇ϕε | and the fact that wt = vt on supp χR × [0, T ], we get Z 0 T Z χR ψ dσt dt ≤ Rd T Z Z |∇χR ||vt | d|σt | dt ≤ Rd 0 Eventually letting R → ∞ we obtain that arbitrary the proof is achieved. 2 R RT R 0 Rd Z T Z |vt | d|σt | dt. 0 R≤|x|≤2R ψ dσt dt ≤ 0. Since ψ is Proposition 2.14 (Representation formula for the continuity equation) Let µt , t ∈ [0, T ], be a narrowly continuous family of Borel probability measures solving the continuity equation (2.15) w.r.t. a Borel vector field vt satisfying (2.21) and (2.16). Then for µ0 -a.e. x ∈ Rd the characteristic system (2.22) admits a globally defined solution Xt (x) in [0, T ] and µt = (Xt )# µ0 ∀ t ∈ [0, T ]. (2.33) Moreover, if Z 0 T Z |vt (x)|2 dµt (x) dt < +∞ Rd 17 (2.34) then the velocity field vt is the time derivative of Xt in the L2 -sense 2 Z T −h Z Xt+h (x) − Xt (x) lim − vt (Xt (x)) dµ0 (x) dt = 0, (2.35) h↓0 0 h Rd Xt+h (x, t) − x lim = vt (x) in L2 (µt ; Rd ) for L 1 -a.e. t ∈ (0, T ). (2.36) h→0 h Proof. Let Es = {τ > s} and let us use the fact that, proved in Lemma 2.12, that t 7→ Xt# (χEs µ0 ) is the solution of (2.15) in [0, s]. By Proposition 2.13 we get also Xt# (χEs µ0 ) ≤ µt whenever 0 ≤ t ≤ s. Using the previous inequality with s = t we can estimate: Z Z Z τ (x) |Ẋt (x)| dµ0 (x) sup |Xt (x) − x| dµ0 (x) ≤ Rd (0,τ (x)) Rd Z 0 τ (x) Z |vt (Xt (x))| dµ0 (x) = Rd Z T 0 Z |vt (Xt (x))| dµ0 (x) dt = 0 Z Et T Z ≤ |vt | dµt dt. 0 Rd It follows that Xt (x) is bounded on (0, τ (x)) for µ0 -a.e. x ∈ Rd and therefore Xt is globally defined in [0, T ] for µ0 -a.e. in Rd . Applying Lemma 2.12 and Proposition 2.13 we obtain (2.33). Now we observe that the differential quotient Dh (x, t) := h−1 (Xt+h (x) − Xt (x)) can be bounded in L2 (µ0 × L 1 ) by Z T −h Z Xt+h (x) − Xt (x) 2 dµ0 (x) dt h 0 Rd 2 Z T −h Z Z h 1 = vt+s (Xt+s (x)) ds dµ0 (x) dt h d 0 R 0 Z T −h Z Z h 1 2 ≤ |vt+s (Xt+s (x))| ds dµ0 (x) dt h d 0 R 0 Z TZ 2 ≤ |vt (Xt (x))| dµ0 (x) dt < +∞. 0 Rd Since we already know that Dh is pointwise converging to vt ◦ Xt µ0 × L 1 -a.e. in Rd × (0, T ), we obtain the strong convergence in L2 (µ0 × L 1 ), i.e. (2.35). Finally, we can consider t 7→ Xt (·) and t 7→ vt (Xt (·) as maps from (0, T ) to L2 (µ0 ; Rd ); (2.35) is then equivalent to 2 Z T −h Xt+h − Xt − vt (Xt ) dt = 0, lim 2 h↓0 0 h L (µ0 ;Rd ) 18 and it shows that t 7→ Xt (·) belongs to AC 2 (0, T ; L2 (µ0 ; Rd )). General results for absolutely continuous maps with values in Hilbert spaces yield that Xt is differentiable L 1 -a.e. in (0, T ), so that 2 Z Xt+h (x) − Xt (x) dµ0 (x) = 0 for L 1 -a.e. t ∈ (0, T ). lim − v (X (x)) t t h→0 Rd h Since Xt+h (x) = Xh (Xt (x), t), we obtain (2.36). Now we state an approximation result for general solution of (2.15) with more regular ones, satisfying the conditions of the previous Proposition 2.14. Lemma 2.15 (Approximation by regular curves) Let µt be a time-continuous solution of (2.15) w.r.t. a velocity field satisfying the integrability condition Z TZ |vt (x)|2 dµt (x) dt < +∞. (2.37) 0 Rd Let (ρε ) ⊂ C ∞ (Rd ) be a family of strictly positive mollifiers in the x variable, (e.g. ρε (x) = (2πε)−d/2 exp(−|x|2 /2ε)), and set µεt := µt ∗ ρε , Etε := (vt µt ) ∗ ρε , vtε := Etε . µεt (2.38) Then µεt is a continuous solution of (2.15) w.r.t. vtε , which satisfies the local regularity assumptions (2.21) and the uniform integrability bounds Z Z |vtε (x)|2 dµεt (x) ≤ |vt (x)|2 dµt (x) ∀ t ∈ (0, T ). (2.39) Rd Rd Moreover, Etε → vt µt narrowly and lim kvtε kL2 (µεt ;Rd ) = kvt kL2 (µt ;Rd ) ε↓0 ∀t ∈ (0, T ). (2.40) Proof. With a slight abuse of notation, we are denoting the measure µεt and its density w.r.t. L d by the same symbol. Notice first that |E ε |(t, ·) and its spatial gradient are uniformly bounded in space by the product of kvt kL1 (µt ) with a constant depending on ε, and the first quantity is integrable in time. Analogously, |µεt |(t, ·) and its spatial gradient are uniformly bounded in space by a constant depending on ε. Therefore, as vtε = Etε /µεt , the local regularity assumptions (2.21) is fulfilled if inf |x|≤R, t∈[0,T ] µεt (x) > 0 for any ε > 0, R > 0. This property is immediate, since µεt are continuous w.r.t. t and equi-continuous w.r.t. x, and therefore continuous in both variables. Lemma 2.16 below shows that (2.39) holds. Notice also that µεt solve the continuity equation ∂t µεt + ∇ · (vtε µεt ) = 0 19 in Rd × (0, T ), (2.41) because, by construction, ∇·(vtε µεt ) = ∇·((vt µt )∗ρε ) = (∇·(vt µt ))∗ρε . Finally, general lower semicontinuity results on integral functionals defined on measures of the form Z 2 E dµ (E, µ) 7→ Rd µ (see for instance Theorem 2.34 and Example 2.36 in [7]) provide (2.40). Lemma 2.16 Let µ ∈ P(Rd ) and let E be a Rm -valued measure in Rd with finite total variation and absolutely continuous with respect to µ. Then Z Z 2 E ∗ ρ 2 E µ ∗ ρ dx ≤ dµ Rd µ ∗ ρ Rd µ for any convolution kernel ρ. Proof. We use Jensen inequality in the following form: if Φ : Rm+1 → [0, +∞] is convex, l.s.c. and positively 1-homogeneous, then Z Z Φ ψ(x) dθ(x) ≤ Φ(ψ(x)) dθ(x) Rd Rd for any Borel map ψ : Rd → Rm+1 and any positive and finite measure θ in Rd (by rescaling θ to be a probability measure and looking at the image measure ψ# θ the formula reduces to the standard Jensen inequality). Fix x ∈ Rd and apply the inequality above with ψ := (E/µ, 1), θ := ρ(x − ·)µ and 2 |z| if t > 0 t Φ(z, t) := 0 if (z, t) = (0, 0) +∞ if either t < 0 or t = 0, z 6= 0, to obtain E ∗ ρ(x) 2 µ ∗ ρ(x) µ ∗ ρ(x) Z Z E = Φ (y)ρ(x − y) dµ(y), ρ(x − y)dµ(y) Rd µ Z E Φ( (y), 1)ρ(x − y) dµ(y) ≤ µ d R Z 2 E (y)ρ(x − y) dµ(y). = µ d R An integration with respect to x leads to the desired inequality. 2.7 The tangent bundle to the Wasserstein space In this section we endow P2 (Rd ) with a kind of differential structure, consistent with the metric structure introduced in Section 2.3. Our starting point is the 20 analysis of absolutely continuous curves µt : (a, b) → P2 (Rd ): recall that this concept depends only on the metric structure of P2 (Rd ), by Definition 2.1. We show in Theorem 2.17 that this class of curves coincides with (distributional) solutions of the continuity equation ∂ µt + ∇ · (vt µt ) = 0 ∂t in Rd × (a, b). More precisely, given an absolutely continuous curve µt , one can find a Borel time-dependent velocity field vt : Rd → Rd such that kvt kLp (µt ) ≤ |µ0 |(t) for L 1 -a.e. t ∈ (a, b) and the continuity equation holds. Here |µ0 |(t) is the metric derivative of µt , defined in (2.2). Conversely, if µt solve the continuity equaRb tion for some Borel velocity field vt with a kvt kL2 (µt ) dt < +∞, then µt is an absolutely continuous curve and kvt kL2 (µt ) ≥ |µ0 |(t) for L 1 -a.e. t ∈ (a, b). As a consequence of Theorem 2.17 we see that among all velocity fields vt which produce the same flow µt , there is a unique optimal one with smallest L2 (µt ; Rd )-norm, equal to the metric derivative of µt ; we view this optimal field as the “tangent” vector field to the curve µt . To make this statement more precise, one can show that the minimality of the L2 norm of vt is characterized by the property vt ∈ {∇ϕ : ϕ ∈ Cc∞ (Rd ))} L2 (µt ;Rd ) for L 1 -a.e. t ∈ (a, b). (2.42) The characterization (2.42) of tangent vectors strongly suggests to consider the following tangent bundle to P2 (Rd ) L2 (µ;Rd ) Tanµ P2 (Rd ) := {∇ϕ : ϕ ∈ Cc∞ (Rd )} ∀µ ∈ P2 (Rd ), (2.43) endowed with the natural L2 metric. Moreover, as a consequence of the characterization of absolutely continuous curves in P2 (Rd ), we recover the Benamou– Brenier (see [12], where the formula was introduced for numerical purposes) formula for the Wasserstein distance: Z 1 d 2 2 W2 (µ0 , µ1 ) = min kvt kL2 (µt ;Rd ) dt : µt + ∇ · (vt µt ) = 0 . (2.44) dt 0 Indeed, for any admissible curve we use the inequality between L2 norm of vt and metric derivative to obtain: Z 1 Z 1 2 kvt kL2 (µt ;Rd ) dt ≥ |µ0 |2 (t) dt ≥ W22 (µ0 , µ1 ). 0 0 Conversely, since we know that P2 (Rd ) is a length space, we can use a geodesic µt and its tangent vector field vt to obtain equality in (2.44). We also show that optimal transport maps belong to Tanµ P2 (Rd ) under quite general conditions. In this way we recover in a more general framework the Riemannian interpretation of the Wasserstein distance developed by Otto in [57] (see also [56], 21 [45]) and used to study the long time behaviour of the porous medium equation. In the original paper [57], (2.44) is derived using formally the concept of Riemannian submersion and the family of maps φ 7→ φ# µ (indexed by µ L d ) from Arnold’s space of diffeomorphisms into the Wasserstein space. In Otto’s d formalism tangent vectors are rather thought as s = dt µt and these vectors are identified, via the continuity equation, with −D · (vs µt ). Moreover vs is chosen to be the gradient of a function ψs , so that D · (∇ψs µt ) = −s. Then the metric tensor is induced by the identification s 7→ ∇φs as follows: Z 0 hs, s iµt := h∇ψs , ∇ψs0 i dµt . Rd As noticed in [57], both the identification between tangent vectors and gradients and the scalar product depend on µt , and these facts lead to a non trivial geometry of the Wasserstein space. We prefer instead to consider directly vt as the tangent vectors, allowing them to be not necessarily gradients: this leads to (2.43). Another consequence of the characterization of absolutely continuous curves is a result, given in Proposition 2.22, concerning the infinitesimal behaviour of the Wasserstein distance along absolutely continuous curves µt : given the tangent vector field vt to the curve, we show that lim h→0 W2 (µt+h , (i + hvt )# µt ) =0 |h| for L 1 -a.e. t ∈ (a, b). Moreover the rescaled optimal transport maps between µt and µt+h converge to the transport plan (i × vt )# µt associated to vt (see (2.57)). As a consequence, we will obtain in Theorem 2.23 a key formula for the derivative of the map t 7→ W22 (µt , ν). Theorem 2.17 (Absolutely continuous curves and the continuity equation) Let I be an open interval in R, let µt : I → P2 (Rd ) be an absolutely continuous curve and let |µ0 | ∈ L1 (I) be its metric derivative, given by (2.2). Then there exists a Borel vector field v : (x, t) 7→ vt (x) such that vt ∈ L2 (µt ; Rd ), kvt kL2 (µt ;Rd ) ≤ |µ0 |(t) for L 1 -a.e. t ∈ I, (2.45) and the continuity equation ∂t µt + ∇ · (vt µt ) = 0 in Rd × I holds in the sense of distributions, i.e. Z Z ∂t ϕ(x, t)+hvt (x), ∇x ϕ(x, t)i dµt (x) dt = 0 I Rd (2.46) ∀ ϕ ∈ Cc∞ (Rd ×I). (2.47) Moreover, for L 1 -a.e. t ∈ I vt belongs to the closure in L2 (µt , Rd ) of the subspace generated by the gradients ∇ϕ with ϕ ∈ Cc∞ (Rd ). 22 Conversely, if a narrowly continuous curve µt : I → P2 (Rd ) satisfies the continuity equation for some Borel velocity field vt with kvt kL2 (µt ;Rd ) ∈ L1 (I) then µt : I → P2 (Rd ) is absolutely continuous and |µ0 |(t) ≤ kvt kL2 (µt ;Rd ) for L 1 -a.e. t ∈ I. Proof. Taking into account that any absolutely continuous curve can be reparametrized by arc length (see for instance [9]) and Lemma 2.9, we will assume with no loss of generality that |µ0 | ∈ L∞ (I) in the proof of the first statement. To fix the ideas, we also assume that I = (0, 1). First of all we show that for every ϕ ∈ Cc∞ (Rd ) the function t 7→ µt (ϕ) is absolutely continuous, and its derivative can be estimated with the metric derivative of µt . Indeed, for s, t ∈ I and µst ∈ Γo (µs , µt ) we have, using the Hölder inequality, Z |µt (ϕ) − µs (ϕ)| = ϕ(y) − ϕ(x) dµst ≤ Lip(ϕ)W2 (µs , µt ), Rd whence the absolute continuity follows. In order to estimate more precisely the derivative of µt (ϕ) we introduce the upper semicontinuous and bounded map |∇ϕ(x)| if x = y, H(x, y) := |ϕ(x) − ϕ(y)| if x 6= y, |x − y| and notice that, setting µh = µ(s+h)s , we have Z |µs+h (ϕ) − µs (ϕ)| 1 ≤ |x − y|H(x, y) dµh |h| |h| Rd ×Rd Z 1/2 W2 (µs+h , µs ) 2 ≤ H (x, y) dµh . |h| Rd ×Rd If t is a point where s 7→ µs is metrically differentiable, using the fact that µh → (x, x)# µt narrowly (because their marginals are narrowly converging, any limit point belongs to Γo (µt , µt ) and is concentrated on the diagonal of Rd × Rd ) we obtain lim sup h→0 |µt+h (ϕ) − µt (ϕ)| ≤ |µ0 |(t) |h| Z H 2 (x, x) dµt Rd 1/2 = |µ0 |(t)k∇ϕkL2 (µt ;Rd ) . (2.48) R Set Q = Rd ×I and let µ = µt dt ∈ P(Q) be the measure whose disintegration is {µt }t∈I . For any ϕ ∈ Cc∞ (Q) we have Z Z ϕ(x, s) − ϕ(x, s − h) dµ(x, s) ∂s ϕ(x, s) dµ(x, s) = lim h↓0 h Q Q Z Z Z 1 ϕ(x, s) dµs (x) − ϕ(x, s) dµs+h (x) ds. = lim h↓0 I h Rd Rd 23 Taking into account (2.48), Fatou’s Lemma yields Z Z Z 1/q 0 ∂s ϕ(x, s) dµ(x, s) ≤ |µ |(s) |∇ϕ(x, s)|q dµs (x) ds Q J Rd Z 1/p Z 1/q ≤ |µ0 |p (s) ds |∇ϕ(x, s)|q dµ(x, s) , J Q (2.49) d where J ⊂ I is any interval such that supp ϕ ⊂ J × R . If V denotes the closure n o in L2 (µ; Rd ) of the subspace V := ∇ϕ, ϕ ∈ Cc∞ (Q) , the previous formula says that the linear functional L : V → R defined by Z ∂s ϕ(x, s) dµ(x, s) L(∇ϕ) := − Q can be uniquely extended to a bounded functional on V . Therefore the minimum problem Z 1 2 |w(x, s)| dµ(x, s) − L(w) : w ∈ V (2.50) min 2 Q admits a unique solution w ∈ V such that v := jq (w) satisfies Z hv(x, s), ∇ϕ(x, s)i dµ(x, s) = hL, ∇ϕi ∀ϕ ∈ Cc∞ (Q). (2.51) Q Setting vt (x) = v(x, t) and using the definition of L we obtain (2.47). Moreover, choosing a sequence (∇ϕn ) ⊂ V converging to w in Lq (µ; Rd ), it is easy to show that for L 1 -a.e. t ∈ I there exists a subsequence n(i) (possibly depending on t) such that ∇ϕn(i) (·, t) ∈ Cc∞ (Rd ) converge in L2 (µt ; Rd ) to w(·, t) = jp (v(·, t)). Finally, choosing an interval J ⊂ I and η ∈ Cc∞ (J) with 0 ≤ η ≤ 1, (2.51) and (2.49) yield Z Z Z 2 η(s)|v(x, s)| dµ(x, s) = ηhv, wi dµ = lim ηhv, ∇ϕn i dµ Q n→∞ Q Z = lim hL, ∇(ηϕn )i ≤ n→∞ 1/2 Z Z 0 2 = |µ | (s) ds |µ0 |2 (s) ds J |w|2 dµ 1/2 Rd ×J J Q Z 1/2 |∇ϕn |2 dµ lim n→∞ Rd ×J 1/2 Z 1/2 Z 0 2 |v|2 dµ . |µ | (s) ds = 1/2 J Rd ×J Taking a sequence of smooth approximations of the characteristic function of J we obtain Z Z Z 2 |vs (x)| dµs (x) ds ≤ |µ0 |2 (s) ds, (2.52) J Rd J and therefore kvt kL2 (µt ,Rd ) ≤ |µ0 |(t) for L 1 -a.e. t ∈ I. Now we show the converse implication. We apply the regularization Lemma 2.15, finding approximations µεt , vtε satisfying the continuity equation, the uniform integrability condition (2.16) and the local regularity assumptions (2.21). Therefore, we can apply Proposition 2.14, obtaining the representation formula µεt = 24 (Ttε )# µε0 , where Ttε is the maximal solution of the ODE Ṫtε = vtε (Ttε ) with the initial condition T0ε = x (see Lemma 2.10). Now, taking into account Lemma 2.16, we estimate Z Z Z t2 |Ttε2 (x) − Ttε1 (x)|2 dµε0 ≤ (t2 − t1 ) |Ṫtε (x)|2 dt dµε0 (2.53) Rd Rd t1 t2 Z Z = (t2 − t1 ) t1 Z t2 Rd Z |vtε (x)|2 dµεt dt |vt |2 dµt dt, ≤ (t2 − t1 ) Rd t1 therefore the transport plan γ ε := (Ttε1 × Ttε2 )# µε0 ∈ Γ(µεt1 , µεt2 ) satisfies W22 (µεt1 , µεt2 ) Z 2 ≤ ε t2 Z Z |vt |2 dµt dt. |x − y| dγ ≤ (t2 − t1 ) R2d t1 Rd Since for every t ∈ I µεt converges narrowly to µt as ε → 0, a compactness argument (see Lemma 5.2.2 or Proposition 7.1.3 of [8]) gives W22 (µt1 , µt2 ) Z ≤ Z 2 t2 Z |x − y| dγ ≤ (t2 − t1 ) R2d t1 |vt |2 dµt dt Rd for some optimal transport plan γ between µt1 and µt2 . Since t1 and t2 are arbitrary this implies that µt is absolutely continuous and that its metric derivative is less than kvt kLp (µt ;Rd ) for L 1 -a.e. t ∈ I. Notice that the continuity equation (2.46) involves only the action of vt on ∇ϕ with ϕ ∈ Cc∞ (Rd ). Moreover, Theorem 2.17 shows that the minimal norm among all possible velocity fields vt is the metric derivative and that vt belongs to the L2 closure of gradients of functions in Cc∞ (Rd ). These facts suggest a “canonical” choice of vt and the following definition of tangent bundle to P2 (Rd ). Definition 2.18 (Tangent bundle) Let µ ∈ P2 (Rd ). We define L2 (µ;Rd ) Tanµ P2 (Rd ) := {∇ϕ : ϕ ∈ Cc∞ (Rd )} . This definition is motivated by the following variational selection principle: Lemma 2.19 (Variational selection of the tangent vectors) A vector v ∈ L2 (µ; Rd ) belongs to the tangent space Tanµ P2 (Rd ) iff kv + wkL2 (µ;Rd ) ≥ kvkL2 (µ;Rd ) ∀ w ∈ L2 (µ; Rd ) such that ∇ · (wµ) = 0. (2.54) In particular, for every v ∈ L2 (µ; Rd ), denoting by Π(v) its orthogonal projection on Tanµ P2 (Rd ), we have ∇ · ((v − Π(v))µ) = 0. 25 Proof. By the convexity of the L2 norm, (2.54) holds iff Z hv, wi dµ = 0 for any w ∈ L2 (µ; Rd ) such that ∇ · (wµ) = 0 (2.55) Rd 2 and, by the Hahn-Banach theorem, this is true iff v belongs to the L closure ∞ d of ∇φ : φ ∈ Cc (R ) . Therefore (2.54) holds iff v belongs to Tanµ P2 (Rd ). The remarks above lead also to the following characterization of divergencefree vector fields (we skip the elementary proof of this statement): Proposition 2.20 Let w ∈ L2 (µ; Rd ). Then ∇ · (wµ) = 0 iff kv − wkL2 (µ;Rd ) ≥ kvkL2 (µ;Rd ) ∀v ∈ Tanµ P2 (Rd ). Moreover equality holds for some v iff w = 0. By the characterization (2.55) of Tanµ P2 (Rd ) we obtain also d 2 d Tan⊥ µ P2 (R ) = v ∈ L (µ, R ) : ∇ · (vµ) = 0 . (2.56) The following two propositions show that the notion of tangent space is consistent with the metric structure, with the continuity equation, and with optimal transport maps (if any). Proposition 2.21 (Tangent vector to a.c. curves) Let µt : I → P2 (Rd ) be an absolutely continuous curve and let vt ∈ Lp (µt ; Rd ) be such that (2.46) holds. Then vt satisfies (2.45) as well if and only if vt = Π(vt ) ∈ Tanµt P2 (Rd ) for L 1 -a.e. t ∈ I. The vector vt is uniquely determined L 1 -a.e. in I by (2.45) and (2.46). Proof. The uniqueness of vt is a straightforward consequence of the linearity with respect to the velocity field of the continuity equation and of the strict convexity of the Lp norm. In the proof of Theorem 2.17 we built vector fields vt ∈ Tanµt P2 (Rd ) satisfying (2.45) and (2.46). By uniqueness, it follows that conditions (2.45) and (2.46) imply vt ∈ Tanµt P2 (Rd ) for L 1 -a.e. t. In the following proposition we recover the tangent vector field to a curve (µt ) ⊂ P2a (Rd ) through the infinitesimal behaviour of optimal transport maps along the curve. See Proposition 8.4.6 of [8] for a more general result in the case of curves (µt ) ⊂ P2 (Rd ). Proposition 2.22 (Optimal plans along a.c. curves) Let µt : I → P2a (Rd ) be an absolutely continuous curve and let vt ∈ Tanµt P2 (Rd ) be characterized by Proposition 2.21. Then, for L 1 -a.e. t ∈ I the following properties hold: lim h→0 1 µt+h (t − i) = vt h µt 26 in L2 (µt ; Rd ), (2.57) µ where tµtt+h is the unique optimal transport map between µt and µt+h , and W2 (µt+h , (i + hvt )# µt ) = 0. |h| lim h→0 (2.58) Proof. Let D ⊂ Cc∞ (Rd ) be a countable set with the following property: for any integer R > 0 and any ψ ∈ Cc∞ (Rd ) with supp ψ ⊂ BR there exist (ϕn ) ⊂ D with supp ϕn ⊂ BR and ϕn → ϕ in C 1 (Rd ). We fix t ∈ I such that W2 (µt+h , µt )/|h| → |µ0 |(t) = kvt kL2 (µt ) and Z µt+h (ϕ) − µt (ϕ) = lim h∇ϕ, vt i dµt ∀ϕ ∈ D. (2.59) h→0 h Rd Since D is countable, the metric differentiation theorem implies that both conditions are fulfilled for L 1 -a.e. t ∈ I. Set µ sh := tµt+h −i t h and fix ϕ ∈ D and a weak limit point s0 of sh as h → 0. We use the identity Z 1 µt+h (ϕ) − µt (ϕ) = ϕ(tµµt+h (x)) − ϕ(x) dµt t h h Rd Z Z 1 = ϕ(x + hsh (x)) − ϕ(x) dµh = h h∇ϕ(x), sh (x)i + ωx (h) dµh h Rd Rd with ωx (h) bounded and infinitesimal as h → 0, to obtain Z Z h∇ϕ, vt i dµt = h∇ϕ, s0 i dµt (x). Rd Rd It follows that ∇ · ((s0 − vt )µt ) = 0. We now claim that Z (2.60) 2 |s0 |2 dµt (x) ≤ [|µ0 |(t)] . (2.61) Rd Indeed Z Rd |s0 |2 dµt (x) ≤ lim inf Z |sh |2 dµt Z 1 = lim inf 2 |tµµt+h (x) − x|2 dµt t h→0 h Rd W2 (µt+h , µt ) = lim inf = |µ0 |2 (t). h→0 h2 h→0 Rd From (2.61) we obtain that ks0 kL2 (µt ;Rd ) ≤ [|µ0 |(t)] = kvt kLp (µt ;Rd ) . Therefore Proposition 2.20 entails that s0 = vt . Moreover, the first inequality above is strict if sh converge weakly, but not strongly, to s0 . Therefore (2.57) holds. Now we show (2.58). By (2.10) we can estimate the distance between µt+h µ kL2 (µt ;Rd ) , and because of (2.57) this norm and (i + hvt )# µt with ki + hvt − tµt+h t tends to 0 faster than h. 27 As an application of (2.58) we are now able to show the L 1 -a.e. differentiability of t 7→ W2 (µt , σ) along absolutely continuous curves µt , wuth µt ∈ P2a (Rd ). Theorem 2.23 (Generic differentiability of W2 (µt , σ)) Let µt : I → P2a (Rd ) be an absolutely continuous curve, let σ ∈ P2a (Rd ) and let vt ∈ Tanµt P2 (Rd ) be its tangent vector field, characterized by Proposition 2.21. Then Z d 2 W (µt , σ) = 2 hx − tσµt (x), vt (x)i dµt (x) for L 1 -a.e. t ∈ I. (2.62) dt 2 Rd Proof. We show that the stated property is true at any t̄ where (2.58) holds and the derivative of t 7→ W2 (µt , σ) exists (recall that this map is absolutely continuous). Due to (2.58), we know that the limit W22 ((i + hvt̄ )# µt̄ , σ) − W22 (µt̄ , σ) h→0 h L := lim d exists and coincides with dt W22 (µt , σ) evaluated at t = t̄, and we have to show that it is equal to the left hand side in (2.62). Using the transport s := (i + hvt̄ ) ◦ tµσt̄ to estimate from above W2 ((i + hvt̄ )# µt̄ , σ), we get Z W22 ((i + hvt̄ )# µt̄ , σ) ≤ |(i + hvt̄ ) ◦ tµσt̄ − i|2 dσ d ZR Z = |i + hvt̄ − tσµt̄ |2 dµt̄ = 2 hi − tµt̄ , vt̄ i dµt̄ + o(h). Rd Rd Dividing both sides by h and taking limits as h ↓ 0 or h ↑ 0 we obtain Z L≤2 hx − tσµt (x), vt (x)i dµt (x) ≤ L. Rd The argument in the previous proof leads to the so-called superdifferentiability property of the Wasserstein distance, a theme used in many papers on this subject (see in particular [50] and Chapter 10 of [8]). Finally, we compare the tangent space arising from the closure of gradients of smooth compactly supported function with the tangent space built using optimal maps. Proposition 2.22 suggests indeed another possible definition of tangent cone to a measure µ ∈ P2a (Rd ): we define L2 (µ;Rd ) Tanrµ P2 (Rd ) := λ(tνµ − i) : ν ∈ P2 (Rd ), λ > 0 . (2.63) As a matter of fact, the two concepts coincide (see also §8.5 of [8] for a more general statement). Theorem 2.24 For any µ ∈ P2a (Rd ) we have Tanµ P2 (Rd ) = Tanrµ P2 (Rd ). 28 Proof. We show first that optimal transport maps t = tσµt belong to Tanµ P2 (Rd ). Assume that supp σ is contained in B R (0) for some R > 0. We know that we can represent t = ∇ϕ, where ϕ is a Lipschitz convex function. We consider now the mollified functions ϕε . A truncation argument enabling an approximation by gradients with compact support gives that ∇ϕε belong to Tanµ P2 (Rd ). Due to the absolute continuity of µ it is immediate to check using the dominated convergence theorem that ∇ϕε converge to ∇ϕ in L2 (µ; Rd ), therefore ∇ϕ ∈ Tanµ P2 (Rd ) as well. In the case when the support of σ is not bounded we approximate σ in P2 (Rd ) by measures with compact support (details are worked out in Lemma 8.5.3 of [8]). Now we show the opposite inclusion: if ϕ ∈ Cc∞ (Rd ) it is always possible to choose λ > 0 such that x 7→ 21 |x|2 + λ−1 φ(x) is convex. Therefore r := i + λ−1 ∇ϕ is the optimal map between µ and ν := r # µ; by (2.63) we obtain that ∇φ = λ(r − i) belongs to Tanrµ P2 (Rd ). 29 Convex functionals in P2 (Rd ) 3 The importance of geodesically convex functionals in Wasserstein spaces was firstly pointed out by McCann [51], who introduced the three basic examples we will discuss in detail in 3.4, 3.6, 3.8. His original motivation was to prove the uniqueness of the minimizer of an energy functional which results from the sum of the above three contributions. Applications of this idea have been given to (im)prove many deep functional (Brunn-Minkowski, Gaussian, (logarithmic) Sobolev, Isoperimetric, etc.) inequalities: we refer to Villani’s book [65, Chap. 6] (see also the survey [37]) for a detailed account on this topic. Connections with evolution equations have also been exploited [53, 57, 58, 1, 21], mainly to study the asymptotic decay of the solution to the equilibrium. From our point of view, convexity is a crucial tool to study the well posedness and the basic regularity properties of gradient flows. Thus in this section we discuss the basic notions and properties related to this concept: the first part of Section 3.1 is devoted to fixing the notion of convexity along geodesics in P2 (Rd ). Section 3.2 discusses in great generality the main examples of geodesically convex functionals: potential, interaction and internal energy. We consider also the convexity properties of the map µ 7→ −W22 (µ, ν) and its geometric implications. In the last section we give a closer look to the convexity properties of general Relative Entropy functionals, showing that they are strictly related to the logconcavity of the reference measures. 3.1 λ-geodesically convex functionals in Pp (X) In McCann’s approach, a functional φ : P2a (Rd ) → (−∞, +∞] is displacement convex if 2 setting µ1→2 := i + t(t − i) # µ1 , with t = tµµ1 , t (3.1) the map t ∈ [0, 1] 7→ φ µ1→2 is convex, ∀ µ1 , µ2 ∈ P2a (Rd ). t We have seen that the curve µ1→2 is the unique constant speed geodesic connectt ing µ1 to µ2 ; therefore the following definition seems natural, when we consider functionals whose domain contains general probability measures. Definition 3.1 (λ-convexity along geodesics) Let φ : P2 (Rd ) → (−∞, +∞]. Given λ ∈ R, we say that φ is λ-geodesically convex in P2 (Rd ) if for every couple µ1 , µ2 ∈ P2 (Rd ) there exists µ ∈ Γo (µ1 , µ2 ) such that φ(µ1→2 ) ≤ (1 − t)φ(µ1 ) + tφ(µ2 ) − t λ t(1 − t)W22 (µ1 , µ2 ) 2 ∀ t ∈ [0, 1], (3.2) where µ1→2 = (1 − t)π 1 + tπ 2 # µ, π 1 , π 2 being the projections onto the first t and the second coordinate in Rd × Rd , respectively. 30 Remark 3.2 (The map t 7→ φ(µ1→2 ) is λ-convex) The standard definition t of λ-convexity for a map ϕ : Rn → R requires λ ϕ(tx+(1−t)y) ≤ tϕ(x)+(1−t)ϕ(y)− t(1−t)|x−y|2 2 ∀t ∈ [0, 1], x, y ∈ Rn (3.3) (equivalently, if ϕ is continuous, one might ask that D2 ϕ ≥ λI in the sense of distributions). The definition of λ-convexity expressed through (3.2) implies that the map t ∈ [0, 1] 7→ φ(µ1→2 ) is λ-convex, (3.4) t thus recovering an (apparently) stronger and more traditional form. This equivalence follows easily bythe fact that for t1 < t2 in [0, 1] with {t1 , t2 } = 6 {0, 1} 1→2 1→2 1→2 the plan πt1→2 × π µ is the unique element of Γ (µ , µ ). o t1 t2 t2 1 # Let us discuss now the convexity properties of the squared Wasserstein distence. In the 1-dimensional case it can be easily shown (see Theorem 6.0.2 of [8]) that P2 (R) is isometrically isomorphic to a closed convex subset of an Hilbert space: precisely the space of nondecreasing functions in (0, 1) (the inverses of distribution functions), viewed as a subset of L2 (0, 1). Thus the 2-Wasserstein distance in R satisfies the generalized parallelogram rule W22 (µ1 , µ2→3 ) = (1 − t)W22 (µ1 , µ2 ) + tW22 (µ1 , µ3 ) − t(1 − t)W22 (µ2 , µ3 ) t ∀ t ∈ [0, 1], µ1 , µ2 , µ3 ∈ P2 (R1 ). (3.5) On the other hand, if the ambient space has dimension ≥ 2 the following example shows that there is no constant λ such that W22 (·, µ1 ) is λ-convex along geodesics. Example 3.3 (The squared distance function is not λ-convex along geodesics) Let d = 2 and 1 1 δ(0,0) + δ(2,1) , µ3 := δ(0,0) + δ(−2,1) . µ2 := 2 2 It is easy to check that the unique optimal map r pushing µ2 to µ3 maps (0, 0) in (−2, 1) and (2, 1) in (0, 0), therefore there is a unique constant speed geodesic joining the two measures, given by 1 µ2→3 := δ(−2t,t) + δ(2−2t,1−t) t ∈ [0, 1]. t 2 Choosing µ1 := 12 δ(0,0) + δ(0,−2) , there are two maps r t , st pushing µ1 to µ2→3 , given by t r t (0, 0) = (−2t, t), r t (0, −2) = (2 − 2t, 1 − t), st (0, 0) = (2 − 2t, 1 − t), st (0, −2) = (−2t, t). Therefore 13 2 9 1 2 W22 (µ2→3 , µ ) = min 5t − 7t + , 5t − 3t + t 2 2 has a concave cusp at t = 1/2 and therefore is not λ-convex along the geodesic µ2→3 for any λ ∈ R. t 31 3.2 Examples of convex functionals in P2 (Rd ) In this section we introduce the main classes of geodesically convex functionals. Example 3.4 (Potential energy) Let V : Rd → (−∞, +∞] be a proper, lower semicontinuous function whose negative part has a quadratic growth, i.e. V (x) ≥ −A − B|x|2 ∀ x ∈ Rd In P2 (Rd ) we define for some A, B ∈ R+ . (3.6) Z V(µ) := V (x) dµ(x). (3.7) Rd Evaluating V on Dirac’s masses we check that V is proper; since V − has at most quadratic growth it is not hard to show that V is lower semicontinuous in P2 (Rd ). If V is bounded from below we have even lower semicontinuity w.r.t. narrow convergence. The following simple proposition shows that V is convex along all interpolating curves induced by admissible plans; choosing optimal plans one obtains in particular that V is convex along geodesics. Proposition 3.5 (Convexity of V) If V is λ-convex then for every µ1 , µ2 ∈ D(V) and µ ∈ Γ(µ1 , µ2 ) we have Z λ 1 2 V(µ1→2 ) ≤ (1 − t)V(µ ) + tV(µ ) − t(1 − t) |x1 − x2 |2 dµ(x1 , x2 ). (3.8) t 2 Rd ×Rd In particular V is λ-convex along geodesics. Proof. Since V is bounded from below either by a continuous affine functional (if λ ≥ 0) or by a quadratic function (if λ < 0) its negative part satisfies (3.6); therefore the definition (3.7) makes sense. Integrating (3.3) along any admissible transport plan µ ∈ Γ(µ1 , µ2 ) with 1 µ , µ2 ∈ D(V) we obtain (3.8), since Z V ((1 − t)x1 + tx2 ) dµ(x1 , x2 ) V(µ1→2 ) = t Rd ×Rd Z λ (1 − t)V (x1 ) + tV (x2 ) − t(1 − t)|x1 − x2 |2 dµ(x1 , x2 ) ≤ 2 Rd ×Rd Z λ |x1 − x2 |2 dµ(x1 , x2 ). = (1 − t)V(µ1 ) + tV(µ2 ) − t(1 − t) 2 Rd ×Rd Since V(δx ) = V (x), it is easy to check that the conditions on V are also necessary for the validity of the previous proposition. Example 3.6 (Interaction energy) Let us fix an integer k > 1 and let us consider a lower semicontinuous function W : Rkd → (−∞, +∞], whose negative part satisfies the usual quadratic growth condition. Denoting by µ×k the measure µ × µ × · · · × µ on Rkd , we set Z Wk (µ) := W (x1 , x2 , . . . , xk ) dµ×k (x1 , x2 , . . . , xk ). (3.9) Rkd 32 If ∃ x ∈ Rkd : W (x, x, . . . , x) < +∞, (3.10) then Wk is proper; its lower semicontinuity follows from the fact that µn → µ in P2 (Rd ) =⇒ ×k µ×k n →µ in P2 (Rkd ). (3.11) Here the typical example is k = 2 and W (x1 , x2 ) := W̃ (x1 − x2 ) for some W̃ : Rd → (−∞, +∞] with W̃ (0) < +∞. Proposition 3.7 (Convexity of W) If W is convex then the functional Wk is convex along any interpolating curve µ1→2 , µ ∈ Γ(µ1 , µ2 ), in P2 (Rd ). t Proof. Observe that Wk is the restriction to the subset n o P2× (Rkd ) := µ×k : µ ∈ P2 (Rd ) of the potential energy functional W on P2 (Rkd ) given by Z W(µ) := W (x1 , · · · , xk ) dµ(x1 , . . . , xk ). Rkd We consider the linear permutation of coordinates P : (R2d )k → (Rkd )2 defined by P (x1 , y1 ), (x2 , y2 ), . . . , (xk , yk ) := (x1 , . . . xk ), (y1 , . . . yk ) . ×k kd 2 If µ ∈ Γ(µ1 , µ2 ) then it is easy to check that P# µ×k ∈ Γ(µ×k 1 , µ2 ) ⊂ P((R ) ) and ×k (πt1→2 )# P# (µ×k ) = P# (πt1→2 )# µ . Therefore all the convexity properties for Wk follow from the corresponding ones of W. Example 3.8 (Internal energy) Let F : [0, +∞) → (−∞, +∞] be a proper, lower semicontinuous convex function such that F (0) = 0, lim inf s↓0 F (s) d > −∞ for some α > . α s d+2 We consider the functional F : P2 (Rd ) → (−∞, +∞] defined by (R F (ρ(x)) dL d (x) if µ = ρ · L d ∈ P2a (Rd ), Rd F(µ) := +∞ otherwise. (3.12) (3.13) Remark 3.9 (The meaning of condition (3.12)) Condition (3.12) simply guarantees that the negative part of F (µ) is integrable in Rd . For, let us observe that there exist nonnegative constants c1 , c2 such that the negative part of F satisfies F − (s) ≤ c1 s + c2 sα ∀ s ∈ [0, +∞), 33 and it is not restrictive to suppose α ≤ 1. Since µ = ρ L d ∈ P2 (Rd ) and 2α 1−α > d we have Z ρα (x) dL d (x) = ρα (x)(1 + |x|)2α (1 + |x|)−2α dL d (x) d d R R Z α Z 1−α 2 ≤ ρ(x)(1 + |x|) dL d (x) (1 + |x|)−2α/(1−α) dL d (x) < +∞ Z Rd Rd and therefore F − (ρ) ∈ L1 (Rd ). Remark 3.10 (Lower semicontinuity of F) General results on integral functionals (see for instance [7]) show that F is narrowly lower semicontinuous if F has a superlinear growth at infinity. Indeed, under this assumption sequences µn = ρn L d on which F is bounded have the property that (ρn ) is sequentially weakly relatively compact in L1 (Rd ), and the convexity of F together with the lower semicontinuity of F ensure the sequential lower semicontinuity with respect to the weak L1 topology. Proposition 3.11 (Convexity of F) If the map s 7→ sd F (s−d ) is convex and non increasing in (0, +∞), (3.14) then the functional F is convex along geodesics in P2 (Rd ). Proof. We consider two measures µi = ρi L d ∈ D(F), i = 1, 2 and the optimal transport map r such that r # µ1 = µ2 . Setting r t := (1 − t)i + tr, by the characterization of constant speed geodesics we know that r t is the optimal transport map between µ1 and µt := r t# µ1 for any t ∈ [0, 1], and µt = ρt L d ∈ P2a (Rd ), with ρt (r t (x)) = ρ1 (x) ˜ t (x) det ∇r By (2.4) it follows that Z Z F(µt ) = F (ρt (y)) dy = Rd F for µ1 -a.e. x ∈ Rd . Rd ρ(x) det ∇r t (x) dx. det ∇r t (x) Since for a diagonalizable map D with nonnegative eigenvalues t 7→ det((1 − t)I + tD)1/d is concave in [0, 1], (3.15) the integrand above may be seen as the composition of the convex and nonincreasing map s 7→ sd F (ρ(x)/sd ) and of the concave map in (3.15), so that the resulting map is convex in [0, 1] for µ1 -a.e. x ∈ Rd . Thus we have F ρ1 (x) ˜ t (x) ≤ (1 − t)F (ρ1 (x)) + tF (ρ2 (x)) det ∇r ˜ det ∇r t (x) 34 and the thesis follows by integrating this inequality in Rd . In order to express (3.14) in a different way, we introduce the function LF (z) := zF 0 (z) − F (z) which satisfies − LF (e−z )ez = d F (e−z )ez ; (3.16) dz denoting by F̂ the modified function F (e−z )ez we have the simple relation L̂F (z) = − 2 d c2 (z) = − d L̂ (z) = d F̂ (z), F̂ (z), L F F dz dz dz 2 2 0 LF (z) := LLF (z) = zLF (z) − LF (z). where (3.17) The nonincreasing part of condition (3.14) is equivalent to say that LF (z) ≥ 0 ∀ z ∈ (0, +∞), (3.18) and it is in fact implied by the convexity of F . A simple computation in the case F ∈ C 2 (0, +∞) shows d2 d2 d2 d −d d 2 F (s )s = F̂ (d · log s) = L̂ (d · log s) + L̂F (d · log s) 2 , F ds2 ds2 s2 s and therefore 1 (3.14) is equivalent to L2F (z) ≥ − LF (z) ∀ z ∈ (0, +∞), d (3.19) i.e. zL0F (z) ≥ 1− 1 LF (z), d the map z 7→ z 1/d−1 LF (z) is non increasing. (3.20) Observe that the bigger is the dimension d, the stronger are the above conditions, which always imply the convexity of F . Remark 3.12 (A “dimension free” condition) The weakest condition on F yielding the geodesic convexity of F in any dimension is therefore L2F (z) = zL0F (z) − LF (z) ≥ 0 ∀ z ∈ (0, +∞). (3.21) Taking into account (3.17), this is also equivalent to ask that the map s 7→ F (e−s )es is convex and nonincreasing in (0, +∞). (3.22) Among the functionals F satisfying (3.14) we quote: the entropy functional: F (s) = s log s, 1 the power functional: F (s) = sm m−1 (3.23) 1 for m ≥ 1 − . d (3.24) Observe that the entropy functional and the power functional with m > 1 have a superlinear growth. In order to deal with the power functional with m ≤ 1, 35 due to the failure of the lower semicontinuity property one has to introduce a suitable relaxation F ∗ of it, defined by [43, 16] Z 1 ∗ F (µ) := F (ρ(x)) dL d (x) with µ = ρ · L d + µs , µs ⊥ L d . (3.25) m − 1 Rd In this case the functional takes only account of the density of the absolutely continuous part of µ w.r.t. L d and the domain of F ∗ is the whole P2 (Rd ). The functional F ∗ retains the convexity properties of F, see [8]. Example 3.13 (The opposite Wasserstein distance) Let us fix a base measure µ1 ∈ P2 (Rd ) and let us consider the functional 1 φ(µ) := − W22 (µ1 , µ). 2 (3.26) Proposition 3.14 For each couple µ2 , µ3 ∈ P2 (Rd ) and each transfer plan µ2 3 ∈ Γ(µ2 , µ3 ) we have W22 (µ1 , µ2→3 ) ≥ (1 − t)W22 (µ1 , µ2 ) + tW22 (µ1 , µ3 ) t Z − t(1 − t) |x2 − x3 |2 dµ2 3 (x2 , x3 ) ∀ t ∈ [0, 1]. (3.27) Rd ×Rd In particular the map φ : µ 7→ − 12 W22 (µ1 , µ) is (−1)-convex along geodesics. Proof. For µ2 3 ∈ Γ(µ2 , µ3 ), we can find (see Proposition 7.3.1 of [8]) µ ∈ P(Rd × Rd × Rd ) whose projection on the second and third variable is µ2 3 and such that (π 1 , (1 − t)π 2 + tπ 3 )# µ ∈ Γo (µ1 , µ2→3 ), (3.28) t with µ2→3 := ((1 − t)π 2 + tπ 3 )# µ2 3 . Therefore t Z 2 1 2→3 W2 (µ , µt ) = |(1 − t)x2 + tx3 − x1 |2 dµ(x1 , x2 , x3 ) R3d Z = (1 − t)|x2 − x1 |2 + t|x3 − x1 |2 − t(1 − t)|x2 − x3 |2 dµ(x1 , x2 , x3 ) R3d Z ≥ (1 − t)W22 (µ1 , µ2 ) + tW22 (µ1 , µ3 ) − t(1 − t) |x2 − x3 |2 dµ2 3 (x2 , x3 ). R2d In particular, choosing optimal plans in (3.27), we obtain the semiconcavity inequality of the Wasserstein distance from a fixed measure µ3 along the constant speed geodesics µ1→2 connecting µ1 to µ2 : t W22 (µ1→2 , µ3 ) ≥ (1 − t)W22 (µ1 , µ3 ) + tW22 (µ2 , µ3 ) − t(1 − t)W22 (µ1 , µ2 ). (3.29) t According to Aleksandrov’s metric notion of curvature (see [5], [46]), this inequality can be interpreted by saying that the Wasserstein space is a positively 36 curved metric space (in short, a P C-space). This was already pointed out by a formal computation in [57], showing also that generically the inequality is strict. An example where strict inequality occurs can be obtained as follows: let d = 2 and µ1 := 1 δ(1,1) + δ(5,3) , 2 µ2 := 1 δ(−1,1) + δ(−5,3) , 2 µ3 := 1 δ(0,0) + δ(0,−4) . 2 Then, it is immediate to check that W22 (µ1 , µ2 ) = 40, W22 (µ1 , µ3 ) = 30, and W22 (µ2 , µ3 ) = 30. On the other hand, the unique constant speed geodesic joining µ1 to µ2 is given by µt := 1 δ(1−6t,1+2t) + δ(5−6t,3−2t) 2 and a simple computation gives 24 = W22 (µ1/2 , µ3 ) > 3.3 30 30 40 + − . 2 2 4 Relative entropy and convex functionals of measures In this section we study in detail the relative entropy functional; although we confine the discussion to a finite-dimensional situation, the formalism used in this section is well adapted to the extension to an infinite-dimensional context, see [8]. Definition 3.15 (Relative entropy) Let γ, µ be Borel probability measures on Rd ; the relative entropy of µ w.r.t. γ is Z dµ dµ log dγ if µ γ, dγ dγ d H(µ|γ) := (3.30) R +∞ otherwise. As in Example 3.8 we introduce the nonnegative, l.s.c., (extended) real, (strictly) convex function s(log s − 1) + 1 if s > 0, H(s) := 1 (3.31) if s = 0, +∞ if s < 0, and we observe that Z dµ H(µ|γ) = H dγ ≥ 0; dγ Rd H(µ|γ) = 0 ⇔ µ = γ. (3.32) Remark 3.16 (Changing γ) Let γ be a Borel measure on Rd and let V : Rd → (−∞, +∞] a Borel map such that V+ has at most quadratic growth γ̃ := e−V · γ 37 is a probability measure. (3.33) Then for measures in P2 (Rd ) the relative entropy w.r.t. γ is well defined by the formula Z H(µ|γ) := H(µ|γ̃) − V (x) dµ(x) ∈ (−∞, +∞] ∀ µ ∈ P2 (Rd ). (3.34) Rd In particular, when γ is the d-dimensional Lebesgue measure, we find the standard entropy functional introduced in (3.23). More generally, we can consider a proper, l.s.c., convex function F : [0, +∞) → [0, +∞] with superlinear growth (3.35) and the related functional F(µ|γ) := Z F dµ dγ d R +∞ dγ if µ γ, (3.36) otherwise. Lemma 3.17 (Joint lower semicontinuity) Let γ n , µn ∈ P(Rd ) be two sequences narrowly converging to γ, µ in P(Rd ). Then lim inf H(µn |γ n ) ≥ H(µ|γ), lim inf F(µn |γ n ) ≥ F(µ|γ). n→∞ n→∞ (3.37) The proof of this lemma follows easily from the next representation formula; before stating it, we need to introduce the conjugate function of F F ∗ (s∗ ) := sup s · s∗ − F (s) < +∞ ∀ s∗ ∈ R, (3.38) s≥0 so that F (s) = sup s∗ · s − F ∗ (s∗ ); (3.39) s∗ ∈R if s0 ≥ 0 is a minimizer of F then F ∗ (s∗ ) ≥ s∗ s0 − F (s0 ), s ≥ s0 ⇒ F (s) = sup s∗ · s − F ∗ (s∗ ). (3.40) s∗ ≥0 ∗ In the case of the entropy functional, we have H ∗ (s∗ ) = es − 1. Now we recall a classical duality formula for functionals defined on measures. Lemma 3.18 (Duality formula) For any γ, µ ∈ P(Rd ) we have Z nZ o F(µ|γ) = sup S ∗ (x) dµ(x) − F ∗ (S ∗ (x)) dγ(x) : S ∗ ∈ Cb0 (Rd ) . (3.41) Rd Rd Proof. Up to an addition of a constant, we can always assume F ∗ (0) = − mins≥0 F (s) = −F (s0 ) = 0. Let us denote by F 0 (µ|γ) the right hand side of (3.41). It is obvious that F 0 (µ|γ) ≤ H(µ|γ), so that we have to prove only the converse inequality. 38 First of all we show that F 0 (µ|γ) < +∞ yields that µ γ. For, let us fix s , ε > 0 and a Borel set A with γ(A) ≤ ε/2. Since µ, γ are finite measures we can find a compact set K ⊂ A, an open set G ⊃ A and a continuous function ζ : Rd → [0, s∗ ] such that ∗ µ(G \ K) ≤ ε, γ(G) ≤ ε, ζ(x) = s∗ on K, ζ(x) = 0 on Rd \ G. Since F ∗ is increasing (by the definition (3.38)) and F ∗ (0) = 0, we have Z Z s∗ µ(K) − F ∗ (s∗ )ε ≤ ζ(x) dµ(x) − F ∗ (ζ(x)) dγ(x) K G Z Z ≤ ζ(x) dµ(x) − F ∗ (ζ(x)) dγ(x) ≤ F 0 (µ|γ) Rd Rd Taking the supremum w.r.t. K ⊂ A and s∗ ≥ 0, and using (3.40) we get εF µ(A)/ε ≤ F 0 (µ|γ) if µ(A) ≥ εs0 . Since F (s) has a superlinear growth as s → +∞, we conclude that µ(A) → 0 as ε ↓ 0. Now we can suppose that µ = ρ · γ for some Borel function ρ ∈ L1 (γ), so that o nZ S ∗ (x)ρ(x) − F ∗ (S ∗ (x)) dγ(x) : S ∗ ∈ Cb0 (Rd ) F 0 (µ|γ) = sup Rd and, for a suitable dense countable set C = {s∗n }n∈N ⊂ R Z F(µ|γ) = sup s∗ ρ(x) − F ∗ (s∗ ) dγ(x) ∗ d R s ∈C Z = lim sup s∗ ρ(x) − F ∗ (s∗ ) dγ(x) k→∞ Rd s∗ ∈Ck where Ck = {s∗1 , · · · , s∗k }. Our thesis follows if we show that for every k Z max s∗ ρ(x) − F ∗ (s∗ ) dγ(x) ≤ F 0 (µ|γ). (3.42) ∗ Rd s ∈Ck For we call n o Aj = x ∈ Rd : s∗j ρ(x) − F ∗ (s∗j ) ≥ s∗i ρ(x) − F ∗ (s∗i ) ∀ i ∈ {1, . . . , k} , and A01 = A1 , A0j+1 = Aj+1 \ j [ Ai . i=1 We find compact sets Kj ⊂ A0j , open sets Gj ⊃ Aj with Gj ∩ Ki = ∅ if i 6= j, and continuous functions ζj such that k X γ(Gj \ Kj ) + µ(Gj \ Kj ) ≤ ε, ζj ≡ s∗j on Kj , j=1 39 ζj ≡ 0 on Rd \ Gj . Pk Pk Denoting by ζ := j=1 ζj , M := j=1 |s∗j |, since the negative part of F ∗ (s∗ ) is bounded above by |s∗ |s0 we have Z k X ∗ ∗ ∗ s ρ(x) − F (s ) dγ(x) = max ∗ Rd s ∈Ck ≤ j=1 k Z X j=1 = j=1 A0j s∗j ρ(x) − F ∗ (s∗j ) dγ(x) s∗j ρ(x) − F ∗ (s∗j ) dγ(x) + ε(M + M s0 ) Kj k Z X Z Z ζ(x)ρ(x) − F ∗ (ζ(x)) dγ(x) + ε(M + M s0 ) Kj ζ(x)ρ(x) − F ∗ (ζ(x)) dγ(x) + ε(M + M s0 + M + F ∗ (M )). ≤ Rd Passing to the limit as ε ↓ 0 we get (3.42). 3.4 Log-concavity and displacement convexity We want to characterize the probability measures γ inducing a geodesically convex relative entropy functional H(·|γ) in P2 (Rd ). The following lemma provides the first crucial property; the argument is strictly related to the proof of the Brunn-Minkowski inequality for the Lebesgue measure, obtained via optimal transportation inequalities [65]. See also [13] for the link between log-concavity and representation formulae like (3.50). Lemma 3.19 (γ is log-concave if H(·|γ) is displacement convex) Suppose that for each couple of probability measures µ1 , µ2 ∈ P(Rd ) with bounded support there exists µ ∈ Γ(µ1 , µ2 ) such that H(·|γ) is convex along the interpolating curve µ1→2 = (1 − t)π 1 + tπ 2 # µ, t ∈ [0, 1]. Then for each couple of open sets t A, B ⊂ Rd and t ∈ [0, 1] we have log γ((1 − t)A + tB) ≥ (1 − t) log γ(A) + t log γ(B). (3.43) Proof. We can obviously assume that γ(A) > 0, γ(B) > 0 in (3.43); we consider µ1 := γ(·|A) = 1 χA · γ, γ(A) µ2 := γ(·|B) = 1 χB · γ, γ(B) observing that H(µ1 |γ) = − log γ(A), H(µ2 |γ) = − log γ(B). (3.44) If µ1→2 is induced by a transfer plan µ ∈ Γ(µ1 , µ2 ) along which the relative t entropy is displacement convex, we have H(µ1→2 |γ) ≤ (1 − t)H(µ1 |γ) + tH(µ2 |γ) = −(1 − t) log γ(A) − t log γ(B). t 40 On the other hand the measure µ1→2 is concentrated on (1 − t)A + tB = t πt1→2 (A × B) and the next lemma shows that − log γ((1 − t)A + tB) ≤ H(µ1→2 |γ). t Lemma 3.20 (Relative entropy of concentrated measures) Let γ, µ ∈ P(Rd ); if µ is concentrated on a Borel set A, i.e. µ(X \ A) = 0, then H(µ|γ) ≥ − log γ(A). (3.45) Proof. It is not restrictive to assume µ γ and γ(A) > 0; denoting by γA the probability measure γ(·|A) := γ(A)−1 χA · γ, we have Z Z dµ dµ 1 · H(µ|γ) = log dµ = dµ log dγ dγA γ(A) d A ZR Z dµ dµ − log γ(A) dµ = H(µ|γA ) − log γ(A) = log dγA A A ≥ − log γ(A) . The previous results justifies the following definition: Definition 3.21 (log-concavity of a measure) We say that a Borel probability measure γ ∈ P(Rd ) is log-concave if for every couple of open sets A, B ⊂ Rd we have log γ((1 − t)A + tB) ≥ (1 − t) log γ(A) + t log γ(B). (3.46) In Definition 3.21 and also in the previous theorem we confined ourselves to pairs of open sets, to avoid the non trivial issue of the measurability of (1 − t)A + tB when A and B are only Borel (in fact, it is an open set whenever A and B are open). Observe that a log-concave measure γ in particular satisfies log γ(Br ((1 − t)x0 + tx1 )) ≥ (1 − t) log γ(Br (x0 )) + t log γ(Br (x1 )), (3.47) for every couple of points x0 , x1 ∈ Rd , r > 0, t ∈ [0, 1]. We want to show that in fact log concavity is equivalent to the geodesic convexity of the Relative Entropy functional H(·|γ). Let us first recall some elementary properties of convex sets in Rd . Let C ⊂ Rd be a convex set; the affine dimension dim C of C is the linear dimension of its affine envelope n o aff C = (1 − t)x0 + tx1 : x0 , x1 ∈ C, t ∈ R , (3.48) which is an affine subspace of Rd . We denote by int C the relative interior of C as a subset of aff C: it is possible to show that int C 6= ∅, int C = C, H k (C \ int C) = 0 if k = dim C, (3.49) where H k is the k-dimensional Hausdorff measure in Rd . The previous theorem shows that log-concavity of γ is equivalent to the convexity of H(µ|γ) along (even generalized) geodesics of the Wasserstein space P2 (Rd ): the link between these two concepts is provided by the representation formula (3.50). 41 Theorem 3.22 Let us suppose that γ ∈ P(Rd ) satisfies the log-concavity assumptions on balls (3.47). Then supp γ is convex and there exists a convex l.s.c. function V : Rd → (∞, +∞] such that γ = e−V · H k |aff(supp γ) , where k = dim(supp γ). (3.50) Conversely, if γ admits the representation (3.50) then γ is log-concave and the relative entropy functional H(·|γ) is convex along any (generalized) geodesic of P2 (Rd ). Proof. Let us suppose that γ satisfies the log-concave inequality on balls and let k be the dimension of aff(supp γ). Observe that the measure γ satisfies the same inequality (3.47) for the balls of aff(supp γ): up to an isometric change of coordinates it is not restrictive to assume that k = d and aff(supp γ) = Rd . Let us now introduce the set n o γ(Br (x)) D := x ∈ Rd : lim inf > 0 . (3.51) r↓0 rd Since (3.47) yields γ(Br (xt )) ≥ rk γ(Br (x0 )) rk 1−t γ(Br (x1 )) rk t t ∈ (0, 1), (3.52) it is immediate to check that D is a convex subset of Rd with D ⊂ supp γ. General results on derivation of Radon measures in Rd (see for instance Theorem 2.56 in [7]) show that lim sup r↓0 γ(Br (x)) < +∞ rd and lim sup r↓0 for L d -a.e. x ∈ Rd (3.53) for γ-a.e. x ∈ Rd . (3.54) rd < +∞ γ(Br (x)) Using (3.54) we see that actually γ is concentrated on D (so that supp γ ⊂ D) and therefore, being d the dimension of aff(supp γ), it follows that d is also the dimension of aff(D). If a point x̄ ∈ Rd exists such that lim sup r↓0 γ(Br (x̄)) = +∞, rd then (3.52) forces every point of int(D) to verify the same property, but this would be in contradiction with (3.53), since we know that int(D) has strictly positive L d -measure. Therefore lim sup r↓0 γ(Br (x)) < +∞ rd 42 for all x ∈ Rd (3.55) and we obtain that γ L d , again by the theory of derivation of Radon measures in Rd . In the sequel we denote by ρ the density of γ w.r.t. L d and notice that by Lebesgue differentiation theorem ρ > 0 L d -a.e. in D and ρ = 0 L d -a.e. in Rd \ D. By (3.47) the maps γ(B (x)) r Vr (x) = − log ωd r d are convex on Rd , and (3.55) gives that the family Vr (x) is bounded as r ↓ 0 for any x ∈ D. Using the pointwise boundedness of Vr on D and the convexity of Vr it is easy to show that Vr are locally equi-bounded (hence locally equicontinuous) on int(D) as r ↓ 0. Let W be a limit point of Vr , with respect to the local uniform convergence, as r ↓ 0: W is convex on int(D) and Lebegue differentiation theorem shows that ∃ lim Vr (x) = − log ρ(x) = W (x) r↓0 for L d -a.e. x ∈ int(D), (3.56) so that γ = ρL d = e−W χint(D) L d . In order to get a globally defined convex and l.s.c function V we extend W with the +∞ value out of int(D) and define V to be its convex and l.s.c. envelope. It turns out that V coincides with W on int(D), so that still the representation γ = e−V L d holds. Conversely, let us suppose that γ admits the representation (3.50) for a given convex l.s.c. function V and let µ1 , µ2 ∈ P2 (Rd ); if their relative entropies are finite then they are absolutely continuous w.r.t. γ and therefore their supports are contained in aff(supp γ). It follows that the support of any optimal plan µ ∈ Γo (µ1 , µ2 ) in P2 (Rd ) is contained in aff(supp γ)×aff(supp γ): up to a linear isometric change of coordinates, it is not restrictive to suppose aff(supp γ) = Rd , µ1 , µ2 ∈ P2a (Rd ), γ = e−V · L d ∈ P(Rd ). In this case we introduce the densities ρi of µi w.r.t. L d , observing that dµi = ρi eV dγ i = 1, 2, where we adopted the convention 0 · (+∞) = 0 (recall that ρi (x) = 0 for L d -a.e. x ∈ Rd \ D(V )). Therefore the entropy functional can be written as Z Z H(µi |γ) = ρi (x) log ρi (x) dx + V (x) dµi (x), (3.57) Rd Rd i.e. the sum of two geodesically convex functionals, as we proved discussing Examples 3.4 and Examples 3.8. Lemma 3.19 yields the log-concavity of γ. If γ is log-concave and F satisfies (3.22), then all the integral functionals F(·|γ) introduced in (3.36) are geodesically convex in P2 (Rd ). Theorem 3.23 (Geodesic convexity for relative integral functionals) Suppose that γ is log-concave and F : [0, +∞) → [0, +∞] satisfies conditions (3.35) and (3.22). Then the integral functional F(·|γ) is geodesically convex in P2 (Rd ). 43 Proof. Arguing as in the final part of the proof of Theorem 3.22 we can assume that γ := e−V L d for a convex l.s.c. function V : Rd → (−∞, +∞] whose domain has not empty interior. For every couple of measures µ1 , µ2 ∈ D(F(·|γ)) we have Z µi = ρi eV · γ, F(µi |γ) = F (ρi (x)eV (x) )e−V (x) dx i = 1, 2. (3.58) Rd We denote by r the optimal transport map for the Wasserstein distance pushing µ1 to µ2 and we set r t := (1 − t)i + tr, µt := (r t )# µ1 ; arguing as in Proposition 3.11, we get Z ρ(x)eV (rt (x)) F(µt |γ) = F det ∇r t (x)e−V (rt (x)) dx, (3.59) det ∇r t (x) Rd and the integrand above may be seen as the composition of the convex and nonincreasing map s 7→ F (ρ(x)e−s )es with the concave curve t 7→ −V (r t (x)) + log(det ∇r t (x)), since D(x) := ∇r(x) is a diagonalizable map with nonnegative eigenvalues and t 7→ log det (1 − t)I + tD(x) is concave in [0, 1]. 44 4 Subdifferential calculus in P2 (Rd ) Let X be an Hilbert space. In the classical theory of subdifferential calculus for proper, lower semicontinuous functionals φ : X → (−∞, +∞], the Fréchet Subdifferential ∂φ : X → 2X of φ is a multivalued operator defined as ξ ∈ ∂φ(v) ⇐⇒ v ∈ D(φ), lim inf w→v φ(w) − φ(v) − hξ, w − vi ≥ 0, |w − v| which we will also write in the equivalent form for v ∈ D(φ) ξ ∈ ∂φ(v) ⇐⇒ φ(w) ≥ φ(v) + hξ, w − vi + o |w − v| as w → v. (4.1) (4.2) As usual in multivalued analysis, the proper domain D(∂φ) ⊂ D(φ) is defined as the set of all v ∈ X such that ∂φ(v) 6= ∅; we will use this convention for all the multivalued operators we will introduce. The metric counterpart of the Fréchet Subdifferential is represented by the metric slope of φ, which for every v ∈ D(φ) is defined by |∂φ|(v) = lim sup w→v (φ(v) − φ(w))+ , |w − v| (4.3) and can also be characterized by an asymptotic expansion similar to (4.2) for s≥0 s ≥ |∂φ|(v) ⇐⇒ φ(w) ≥ φ(v) − s|w − v| + o |w − v| as w → v. (4.4) It is then immediate to check that ξ ∈ ∂φ(v) =⇒ |∂φ|(v) ≤ |ξ|. (4.5) The Fréchet subdifferential and the metric slope occur quite naturally in the Euler equations for minima of (smooth perturbation of) φ: A. Euler equation for quadratic perturbations. If vτ is a minimizer of w 7→ Φ(τ, v; w) := φ(w) + 1 |w − v|2 2τ then vτ ∈ D(∂φ) and − for some τ > 0, v ∈ X (4.6) vτ − v ∈ ∂φ(vτ ); τ (4.7) |v − vτ | . τ (4.8) concerning the slope we easily get vτ ∈ D(|∂φ|) and |∂φ|(v) ≤ For λ-convex functionals the Fréchet subdifferential enjoys at least two other simple but fundamental properties, which play a crucial role in the corresponding variational theory of evolution equations: 45 B. Characterization by variational inequalities and monotonicity. If φ is λ-convex, then ξ ∈ ∂φ(v) λ φ(w) ≥ φ(v)+hξ, w−vi+ |w−v|2 2 ⇐⇒ ∀ w ∈ D(φ); (4.9) in particular, ξi ∈ ∂φ(vi ) =⇒ hξ1 − ξ2 , v1 − v2 i ≥ λ|v1 − v2 |2 s ≥ |∂φ|(v) ⇐⇒ φ(w) ≥ φ(v) − s|w − v| + ∀ v1 , v2 ∈ D(∂φ). (4.10) As in (4.9), the slope of a λ-convex functional can also be characterized by a system of inequalities for s ≥ 0 λ |w − v|2 2 ∀ w ∈ D(φ), (4.11) which can equivalently reformulated as |∂φ|(v) = sup w6=v + φ(v) − φ(w) λ + |v − w| . |v − w| 2 (4.12) C. Convexity and strong-weak closure. [15, Chap. II, Ex. 2.3.4, Prop. 2.5] If φ is λ-convex, then ∂φ(v) is closed and convex, and for every sequences (vn ) ⊂ X, (ξn ) ⊂ X we have ξn ∈ ∂φ(vn ), vn → v, ξn * ξ =⇒ ξ ∈ ∂φ(v), φ(vn ) → φ(v). (4.13) The slope is l.s.c. vn → v =⇒ lim inf |∂φ|(vn ) ≥ |∂φ|(v). n→∞ (4.14) Modeled on the last property C, and following a terminology introduced by F.H. Clarke, see e.g. [62, Chap. 8], we say that a functional φ is regular if ) ξn ∈ ∂φ(vn ), ϕn = φ(vn ) =⇒ ξ ∈ ∂φ(v), ϕ = φ(v). (4.15) vn → v, ξn * ξ, ϕn → ϕ D. Minimal selection and slope. (see *.*.* of [8]) If φ is regular (in particular if φ is λ-convex) |∂φ|(v) is finite if and only if ∂φ(v) 6= ∅ and n o |∂φ|(v) = min |ξ| : ξ ∈ ∂φ(v) . (4.16) E. Chain rule. If v : (a, b) → D(φ) is a curve in X then d φ(v(t)) = hξ, v 0 (t)i dt 46 ∀ ξ ∈ ∂φ(v(t)), (4.17) at each point t where v and φ ◦ v are differentiable and ∂φ(v(t)) 6= ∅. In particular (see [15, Chap. III, Lemma 3.3] and Corollary 2.4.10 in [8]) if φ is also λ-convex, v ∈ AC(a, b; X), and Z b |∂φ|(v(t))|v 0 (t)| dt < +∞, (4.18) a then φ ◦ v is absolutely continuous in (a, b) and (4.17) holds for L 1 -a.e. t ∈ (a, b). The aim of this section is to extend the notion of Fréchet subdifferentiability and these properties to the Wasserstein framework (see also [21] for related results). 4.1 Definition of the subdifferential for a.c. measures In this section we focus our attention to functionals φ defined on P2 (Rd ). The formal mechanism for translating statements from the euclidean framework to the Wasserstein formalism is simple: if µ ↔ v is the reference point, scalar products h·, ·i have to be intended in the reference Hilbert space L2 (µ; Rd ) (which contains the tangent space Tanµ P2 (Rd )) and displacement vectors w − v corresponds to transport maps tνµ −i, which is well defined if µ ∈ P2a (Rd ). According to these two natural rules, the transposition of (4.1) yields: Definition 4.1 (Fréchet subdifferential and metric slope) Let us consider a functional φ : P2 (Rd ) → (−∞, +∞] and a measure µ ∈ D(φ) ∩ P2a (Rd ). We say that ξ ∈ L2 (µ; Rd ) belongs to the Fréchet subdifferential ∂φ(µ) if Z φ(ν) − φ(µ) ≥ hξ(x), tνµ (x) − xi dµ(x) + o W2 (µ, ν) . (4.19) Rd When ξ ∈ ∂φ(µ) also satisfies Z hξ(x), t(x) − xi dµ(x) + o kt − ikL2 (µ;Rd ) , φ(t# µ) − φ(µ) ≥ (4.20) Rd then we will say that ξ is a strong subdifferential. It is obvious that ∂φ(µ) is a closed convex subset of L2 (µ; Rd ); in fact, we could also impose that it is contained in the tangent space Tanµ P2 (Rd ), since the vector ξ in (4.19) acts only on tangent vectors (see Theorem 2.24): for, if Π denotes the orthogonal projection onto Tanµ P2 (Rd ) in L2 (µ; Rd ), ξ ∈ ∂φ(µ) =⇒ Πξ ∈ ∂φ(µ). (4.21) It is interesting to note that elements in ∂φ(µ) ∩ Tanµ P2 (Rd ) are in fact strong subdifferentials. Proposition 4.2 (Subdifferentials in Tanµ P2 (Rd ) are strong) Let µ ∈ D(φ)∩ P2a (Rd ) and let ξ ∈ ∂φ(µ) ∩ Tanµ P2 (Rd ). Then ξ is a strong subdifferential. 47 Proof. We argue by contradiction, and we assume that a constant δ > 0, a sequence sn ∈ L2 (µ; Rd ) with εn := ksn − ikL2 (µ;Rd ) → 0 as n → ∞ exist such that Z hξ, sn − ii dµ ≤ −δ εn , µn := (sn )# µ. (4.22) φ(µn ) − φ(µ) − Rd Let us denote by tn the optimal transport pushing µ onto µn : we know that ktn − ikL2 (µ;Rd ) = W2 (µ, µn ) ≤ εn → 0. (4.23) By the definition of subdifferential, there exists n0 ∈ N such that for every n ≥ n0 Z δ φ(µn ) − φ(µ) ≥ hξ, tn − ii dµ − εn ; 2 d R combining with (4.22) we obtain Z δ hξ, tn − sn i dµ ≤ − εn 2 d R ∀ n ≥ n0 . (4.24) Up to an extraction of a suitable subsequence, we can assume that sn − i * s̃, εn tn − i * t̃ εn weakly in L2 (µ; Rd ) as n → ∞; (4.25) δ < 0. 2 (4.26) by (4.24) we get tn − sn * t̃ − s̃s, εn Z hξ, t̃ − s̃i dµ ≤ − Rd On the other hand, for every test function ζ ∈ Cc∞ (Rd ), the second order Taylor expansion ζ(y)−ζ(x) ≤ hDζ(x), y −xi+C|y −x|2 ∀ x, y ∈ Rd , for some constant C ≥ 0 yields Z ζ(tn (x)) − ζ(sn (x)) dµ(x) ≤ hDζ(x), sn (x) − tn (x)i dµ(x) Rd Rd Z +C |sn (x) − x|2 + |tn (x) − x|2 dµ(x) Rd Z ≥ Dζ(x), tn (x) − sn (x)i dµ(x) − 2Cε2n . Z 0= Rd Dividing by εn and passing to the limit as n → ∞ we get Z hDζ, t̃ − s̃i dµ ≥ 0 ∀ ζ ∈ Cc∞ (Rd ). (4.27) Rd Since the gradients of Cc∞ (Rd ) functions are dense in Tanµ P2 (Rd ), (4.27) contradicts (4.26). 48 The definition of the metric slope of φ is in fact common to functionals defined in arbitrary metric spaces. Definition 4.3 (Metric slope) Let us consider a functional φ : P2 (Rd ) → (−∞, +∞] and a measure µ ∈ D(φ). The metric slope of φ at µ is defined by |∂φ|(µ) = lim sup ν→µ (φ(ν) − φ(µ))+ , W2 (ν, µ) (4.28) or, equivalently, by n |∂φ|(µ) := inf s ≥ 0 : φ(ν) ≥ φ(µ) − sW2 (ν, µ) + o W2 (ν, µ) o as W2 (ν, µ) → 0 . 4.2 (4.29) Subdifferential calculus in P2a (Rd ) We now try to reproduce in the Wasserstein framework the calculus properties for the subdifferential, we briefly discussed at the beginning of the present section. In order to simplify some technical point, we are supposing that φ : P2 (Rd ) → (−∞, +∞] is proper and lower semicontinuous, with D(|∂φ|) ⊂ P2a (Rd ), (4.30a) and that for some τ∗ > 0 the functional 1 2 W (µ, ν) + φ(ν) admits at least 2τ 2 a minimum point µτ , for all τ ∈ (0, τ∗ ) and µ ∈ P2 (X). ν 7→ Φ(τ, µ; ν) = (4.30b) Notice that D(φ) ⊂ P2r (X) is a sufficient but not necessary condition for (4.30a): the internal energy functionals induced by a class of sublinear functions F satisfy (4.30a), but have a domain strictly larger than P2a (Rd ) (see Theorem 10.4.8 of [8]). A. Euler equation for quadratic perturbations. When we want to minimize the perturbed functional (4.30b) we get a result completely analogous to the euclidean one: Lemma 4.4 Let φ be satisfying (4.30a,b). Each minimizer µτ of (4.30b) belongs to µτ ∈ D(|∂φ|) and 1 µ tµτ − i ∈ ∂φ(µτ ) τ is a strong subdifferential. 49 (4.31) Proof. The minimality of µτ gives for every ν ∈ P2 (Rd ) 1 2 φ(ν) − φ(µτ ) = Φ(τ, µ; ν) − Φ(τ, µ; µτ ) + W2 (µτ , µ) − W22 (ν, µ) 2τ 1 2 2 ≥ W2 (µτ , µ) − W2 (ν, µ) (4.32) 2τ 1 ≥ − W2 (µτ , ν) W2 (µτ , µ) + W2 (ν, µ) . (4.33) 2τ Letting ν converge to µτ , (4.33) yields |∂φ|(µτ ) ≤ W2 (µτ , ν) . τ (4.34) By (4.30a) we get µτ ∈ P2a (Rd ); if ν = t# µτ we have Z Z W22 (µτ , µ) = |tµµτ (x) − x|2 dµτ (x), W22 (ν, µ) ≤ Rd Rd |t(x) − tµµτ (x)|2 dµτ (x), and therefore the elementary identity 21 |a|2 − 12 |b|2 = ha, a − bi − 12 |a − b|2 and (4.32) yield Z 1 φ(ν) − φ(µτ ) ≥ |tµµτ (x) − x|2 − |tµµτ (x) − t(x)|2 dµτ (x) 2τ d Z R 1 µ 1 = htµτ (x) − x, t(x) − xi − |t(x) − x|2 dµτ (x) τ 2τ d ZR 1 µ 1 = htµτ (x) − x, t(x) − xi dµτ (x) − kt − ik2L2 (µτ ;Rd ) . τ 2τ d R We deduce τ1 tµµτ − i ∈ ∂φ(µτ ) and the strong subdifferentiability condition. The above result, though simple, is very useful and usually provides the first crucial information when one looks for the properties of solutions of the variational problem (4.30b). The nice argument which combines the minimality of µτ and the possibility to use any “test” transport map t to estimate W22 (t# ν, µ) was originally introduced by F. Otto. 4.3 The case of λ-convex functionals along geodesics Let us now focus our attention to the case of a λ-convex functional: φ is λ-convex on geodesics, according to Definition 3.1. (4.35) B. Characterization by Variational inequalities and monotonicity. Suppose that φ satisfies (4.30a,b) and (4.35). Then a vector ξ ∈ L2 (µ; X) belongs to the Fréchet subdifferential of φ at µ iff Z λ φ(ν) − φ(µ) ≥ hξ(x), tνµ (x) − xi dµ(x) + W22 (µ, ν) ∀ ν ∈ D(φ). (4.36) 2 d R 50 In particular if ξ i ∈ ∂φ(µi ), i = 1, 2, and t = tµµ21 is the optimal transport map, then Z hξ 2 (t(x)) − ξ 1 (x), t(x) − xi dµ1 (x) ≥ λW22 (µ1 , µ2 ). (4.37) Rd Concerning the slope of φ we have for every s ≥ 0 s ≥ |∂φ|(µ) ⇐⇒ φ(ν) ≥ φ(µ) − sW2 (ν, µ) + λ 2 W (ν, µ) 2 2 ∀ ν ∈ D(φ), (4.38) or, equivalently, |∂φ|(µ) = sup ν6=µ + φ(µ) − φ(ν) λ . + W2 (ν, µ) W2 (ν, µ) 2 (4.39) Proof. One implication of (4.36) and of (4.38) is trivial. To prove the other one, in the case of (4.36) suppose that ξ ∈ ∂φ(µ) and ν ∈ D(φ); for t ∈ [0, 1] we set µt := (i + t(tνµ − i))# µ and we recall that the λ-convexity yields φ(µt ) − φ(µ) λ ≤ φ(ν) − φ(µ) − (1 − t)W22 (µ, ν). t 2 (4.40) On the other hand, since W2 (µ, µt ) = tW2 (µ, ν), Fréchet differentiability yields Z 1 φ(µt ) − φ(µ) ≥ lim inf hξ(x), tµµt (x) − xi dµ(x) lim inf t↓0 t t→0+ t Rd Z ≥ hξ(x), tνµ (x) − xi dµ(x), Rd since tµµt (x) = x + t(tνµ (x) − x). In the case of the slope (4.38), (4.40) and the fact that lim inf t↓0 φ(µt ) − φ(µ) ≥ −|∂φ|(µ) W2 (µ, ν) t yield (4.38). (4.41) C. Convexity and strong-weak closure. The next step is to show the closure of the graph of ∂φ: here one has to be careful in the meaning of the convergence of vectors ξ n ∈ L2 (µn ; Rd ), which belongs to different L2 -spaces, and we will adopt the following one. Definition 4.5 Let (µn ) ⊂ P(Rd ) be narrowly converging to µ in P(Rd ) and let v n ∈ L1 (µn ; Rd ). We say that v n weakly converge to v ∈ L1 (µ; Rd ) if Z Z ζ(x)v n (x) dµn (x) = ζ(x)v(x) dµ(x) ∀ζ ∈ Cc∞ (Rd ). (4.42) lim n→∞ Rd Rd We say that v n strongly converge to v in L2 if (4.42) holds and lim sup kv n kL2 (µn ;Rd ) ≤ kvkL2 (µ;Rd ) . n→∞ 51 (4.43) We state without proof (see Theorem 5.4.4 of [8] for it) some basic properties of this convergence. Theorem 4.6 Let (µn ) ⊂ P2 (Rd ) be narrowly converging to µ in P(Rd ) and let v n ∈ L2 (µn ; Rd ) be such that Z (4.44) sup |x|2 + |v n (x)|2 dµn (x) < +∞. n∈N Rd Then the following statements hold: (i) The sequence (v n ) has weak limit points as n → ∞. If v is any limit point, along some subsequence n(k), then Z Z lim hv n(k) , ϕi dµn(k) (x) = hv(x), ϕi dµ(x), (4.45) k→∞ Rd Rd for every continuous function ϕ : Rd → Rd with p-growth for some p < 2. (ii) If v n strongly converge to v in L2 and W2 (µn , µ) → 0, then Z Z lim f (x, v n (x)) dµn (x) = f (x, v(x)) dµ(x), n→∞ Rd (4.46) Rd holds for every continuous function f : Rd ×Rd → R with quadratic growth. Lemma 4.7 (Closure of the subdifferential) Let φ be a λ-convex functional satisfying (4.30a), let (µn ) be converging to µ ∈ D(φ) in P2 (Rd ), let ξ n ∈ ∂φ(µn ) be satisfying Z sup |ξ n (x)|2 dµn (x) < +∞, (4.47) n Rd and converging to ξ according to Definition 4.5. Then ξ ∈ ∂φ(µ). Proof. Let ν ∈ D(φ) and let C be the constant in (4.47). We have to pass to the limit as n → ∞ in the subdifferential inequality Z λ φ(ν) − φ(µn ) ≥ hξ n (x), tνµn (x) − xi dµn (x) + W22 (µn , ν). (4.48) 2 d R By the lower semicontinuity of φ the upper limit of φ(ν) − φ(µn ) is less than φ(ν)−φ(µ). Passing to the right hand side, given ε > 0 we choose t̄ ∈ Cb0 (Rd ; Rd ) such that ktνµ − t̄kL2 (µ;Rd ) < ε2 and split the integrals as Z Z ν hξ n (x), tµn (x) − t̄(x)i dµn (x) + hξ n (x), t̄(x) − xi dµn (x). (4.49) Rd Rd By the Young inequality, the first integrals can be estimated with Z Z Cε 1 Cε 1 + lim sup |tνµn − t̄|2 dµn = + lim sup |y − t̄(x)|2 dγ n , 2 2ε n→∞ Rd 2 2ε n→∞ Rd ×Rd 52 mi sembra occorra un controllo anche sul momento delle µn where γ n = (i × tνµn )# µn are the optimal plans induced by tνµn . Now, by Proposition 7.1.3 of [8] (showing that optimal plans are stable under narrow convergence), we know that γ n narrowly converge to the plan γ = (i × tνµ )# µ induced by tνµ ; moreover, as |y|2 is uniformly integrable with respect to {γ n } (because the second marginal of γ n is constant), Lemma 1.2 gives that the upper limits above are less than Z Z |y − t̄(x)|2 dγ = |tνµ − t̄|2 dµ ≤ ε2 . Rd ×Rd Rd ×Rd Summing up, we proved that the limsup of the first integrals in (4.49) is less than (C + 1)ε/2. The convergence of the second integrals in (4.49) to Z hξ(x), t̄(x) − xi dµ(x) Rd follows directly from Theorem 4.6(i). As a consequence Z Z lim inf hξ n (x), tνµn (x) − xi dµn (x) ≥ hξ(x), tνµ (x) − xi dµ(x) n→∞ Rd Rd Z ε − (C + 1) − |ξ(x)| · |t̄(x) − tνµ | dµ(x). 2 Rd As ε is arbitrary, the variational inequality (4.48) passes to the limit. The lower semicontinuity of the slope µn → µ in P2 (Rd ) =⇒ lim inf |∂φ|(µn ) ≥ |∂φ|(µ) n→∞ (4.50) is an immediate consequence of (4.38) and of the lower semicontinuity of φ. 4.4 Regular functionals Definition 4.8 A functional φ : P2 (Rd ) → (−∞, +∞] satisfying (4.30a) is regular if, whenever the strong subdifferentials ξ n ∈ ∂φ(µn ), ϕn = φ(µn ) satisfy µn → µ in P2 (Rd ), ϕn → ϕ, sup kξ n kL2 (µn ;Rd ) < +∞ n (4.51) ξ n → ξ weakly, according to Definition 4.5, then ξ ∈ ∂φ(µ) and ϕ = φ(µ). We just proved that λ-convex functionals are indeed regular. D. Minimal selection and slope. Lemma 4.9 Let φ be a regular functional satisfying (4.30a,b). µ ∈ D(|∂φ|) if and only if ∂φ(µ) is not empty and n o |∂φ|(µ) = min kξkL2 (µ;Rd ) : ξ ∈ ∂φ(µ) , (4.52) 53 where the metric slope |∂φ|(µ) is defined in (4.3). By the convexity of ∂φ(µ) there exists a unique vector ξ ∈ ∂φ(µ) which attains the minimum in (4.52): we will denote it by ∂ ◦ φ(µ), it belongs to Tanµ P2 (Rd ) and it is also a strong subdifferential. Proof. It is clear from the very definition of Fréchet subdifferential that |∂φ|(µ) ≤ kξkL2 (µ;Rd ) ∀ ξ ∈ ∂φ(µ); thus we should prove that if |∂φ|(µ) < +∞ there exists ξ ∈ ∂φ(µ) such that kξkL2 (µ;Rd ) ≤ |∂φ|(µ). We argue by approximation: for µ ∈ D(|∂φ|) and τ ∈ (0, τ∗ ), let µτ be a minimizer of (4.30b); by Lemma 4.4 we know that Z W 2 (µ, µτ ) 1 µ tµτ − i ∈ ∂φ(µτ ), , |ξ τ (x)|2 dµτ (x) = 2 2 ξτ = τ τ Rd and ξ τ is a strong subdifferential. Furthermore, it is proved in Lemma 3.1.5 of [8] (in a general metric space setting) that there exists a sequence (τn ) ↓ 0 such that W22 (µτn , µ) = |∂φ|2 (µ). (4.53) lim n→∞ τ2 By Theorem 4.6(i) we know that ξ τ has some limit point ξ ∈ L2 (µ; Rd ) as τ ↓ 0, according to Definition 4.5. By (4.51) we get ξ ∈ ∂φ(µ) with kξkL2 (µ;Rd ) ≤ |∂φ|(µ), so that ξ is the (unique) element of minimal norm in ∂φ(µ). By (4.21) we also deduce that ξ ∈ Tanµ P2 (Rd ) and Proposition 4.2 shows that ξ is a strong subdifferential. Remark 4.10 (The λ-convex case) When φ satisfies the λ-convexity assumption (4.35), the proof of Property (4.53) is considerably easier, since µτ satisfies the a priori bound [8, Thm. 3.1.6] (1 + λτ ) W2 (µτ , µ) ≤ |∂φ|(µ). τ (4.54) For, we choose µt := (i+t(tµµτ −i))# µ and we recall that λ-convexity of φ yields 1 2 1 2 W2 (µ, µτ ) + φ(µτ ) ≤ W (µ, µt ) + φ(µt ) 2τ 2τ 2 t t − λτ (1 − t) W22 (µ, µτ ) + (1 − t)φ(µ) + tφ(µτ ). ≤ 2τ Since the right hand quadratic function has a minimum for t = 1, taking the left derivative we obtain λ 1 + W22 (µ, µτ ) + φ(µτ ) − φ(µ) ≤ 0, 2 τ 54 and therefore, by (4.38) 1 W 2 (µ, µτ ) φ(µ) − φ(µτ ) W22 (µ, µτ ) (1 + λτ ) 2 2 ≤ − 2 τ τ 2τ 2 W2 (µτ , µ) W 2 (µτ , µ) ≤ |∂φ|(µ) − (1 + λτ ) 2 2 τ 2τ 1 ≤ |∂φ|2 (µ), 2(1 + λτ ) which yields (4.54). E. Chain rule. Let φ : P2 (Rd ) → (−∞, +∞] be a regular functional satisfying (4.30a,b), and let µ : (a, b) 7→ µt ∈ D(φ) ⊂ P2 (Rd ) be an absolutely continuous curve with tangent velocity vector v t . Let Λ ⊂ (a, b) be the set of points t ∈ (a, b) such that (a) |∂φ|(µt ) < +∞; (b) φ ◦ µ is differentiable at t; (c) condition (2.57) of Proposition 2.22 holds. Then d φ(µt ) = dt Z Rd hξ t (x), v t (x)i dµt (x) ∀ ξ t ∈ ∂φ(µt ), ∀t ∈ Λ. Moreover, if φ is λ-convex (4.35) and Z b |∂φ|(µt )|µ0 |(t) dt < +∞, (4.55) (4.56) a then the map t 7→ φ(µt ) is absolutely continuous, and (a, b) \ Λ is L 1 -negligible. Proof. Let t̄ ∈ Λ; observing that 1 µt̄+h tµt̄ − i → vt̄ in L2 (µt̄ ; Rd ), v h := h we have Z φ(µt̄+h ) − φ(µt̄ ) ≥ h hv h (x), ξ t̄ (x)i dµt̄ (x) + o(h). (4.57) (4.58) Rd Dividing by h and taking the right and left limits as h → 0 we obtain that the left and right derivatives d/dt± φ(µt ) satisfy Z d φ(µt )|t=t̄ ≥ hvt̄ (x), ξ t̄ (x)i dµt̄ (x), dt+ Rd Z d hvt̄ (x), ξ t̄ (x)i dµt̄ (x) φ(µt )|t=t̄ ≤ dt− Rd and therefore we find (4.55). In the λ-convex case, it can be shown (see Corollary 2.4.10 in [8]) that (4.56) implies that t 7→ φ(µt ) is absolutely continuous in (a, b) and thus conditions (a,b,c) hold L 1 -a.e. in (a, b). 55 4.5 Examples of subdifferentials In this section we consider in the detail the subdifferential of the convex functionals presented in Section 3.2 (potential energy, interaction energy, internal energy, negative Wasserstein distance), with a particular attention to the characterization of the elements with minimal norm. We start by considering a general, but smooth, situation. 4.5.1 Variational integrals: the smooth case In order to clarify the underlying structure of many examples and the link between the notion of Wasserstein subdifferential and the standard variational calculus for integral functionals, we first consider the case of a variational integral of the type Z F (x, ρ(x), ∇ρ(x)) dx if µ = ρ · L d with ρ ∈ C 1 (Rd ) d F (µ) := (4.59) R +∞ otherwise. Since we are not claiming any generality and we are only interested in the form of the subdifferential, we will assume enough regularity to justify all the computations; therefore, we suppose that F : Rd × [0, +∞) × Rd → [0, +∞) is a C 2 function with F (x, 0, p) = 0 for every x, p ∈ Rd and we consider the case of a smooth and strictly positive density ρ: as usual, we denote by (x, z, p) ∈ Rd × R × Rd the variables of F and by δF /δρ the first variation density δF (x) := Fz (x, ρ(x), ∇ρ(x)) − ∇ · Fp (x, ρ(x), ∇ρ(x)). δρ (4.60) Lemma 4.11 If µ = ρ · L d ∈ P2a (Rd ) with ρ ∈ C 1 (Rd ) satisfies F (µ) < +∞ and w ∈ L2 (µ; Rd ) belongs to the strong subdifferential of F at µ (in particular, by Proposition 4.2, if w ∈ ∂φ(µ) ∩ Tanµ P2 (Rd )), then w(x) = ∇ δF (x) δρ for µ-a.e. x ∈ Rd , and for every vector field ξ ∈ Cc∞ (Rd ; Rd ) we have Z Z δF w(x) · ξ(x) dµ(x) = − (x) ∇ · ρ(x)ξ(x) dx. Rd Rd δρ (4.61) (4.62) Proof. We take a smooth vector field ξ ∈ Cc∞ (Rd ; Rd ) and we set for ε ∈ R sufficiently small µε := (i + εξ)# µ. If w is a strong subdifferential, we know that Z F (µε ) − F (µ) F (µε ) − F (µ) ≤ w(x) · ξ(x) dµ(x) ≤ lim inf ; lim sup ε↓0 ε ε d ε↑0 R (4.63) 56 on the other hand, by the change of variables formula we know that µε = ρε L d with −1 ρ ρε (y) = ◦ i + εξ (y) ∀ y ∈ Rd . (4.64) det(I + ε∇ξ) The map (x, ε) 7→ ρε (x) is of class C 2 with ρε (x) = ρ(x) outside a compact set and ∂ρε (x) = −∇ · ρ(x)ξ(x) . (4.65) ρε (x)|ε=0 = ρ(x), | ε=0 ∂ε Standard variational formulae (see e.g. [41, Vol. I, 1.2.1]) yield Z δF F (µε ) − F (µ) =− (x) ∇ · ρ(x)ξ(x) dx, (4.66) lim ε→0 ε Rd δρ which shows (4.62). Let us now suppose that w ∈ ∂F (µ)∩Tanµ P2 (Rd ); then (4.66) holds whenever i + εξ is, an optimal transport map for |ε| small enough, and in particular for gradient vector fields ξ = ∇ζ with ζ ∈ Cc∞ (Rd ). Since Tanµ P2 (Rd ) is the closure in L2 (µ; Rd ) of the space of such gradients, we have Z Z δF w(x)·ξ(x) dµ(x) = − ∇ (x)·ξ(x) dµ(x) ∀ ξ ∈ Tanµ P2 (Rd ). (4.67) δρ Rd Rd We obtain (4.61) noticing that δF /δρ ∈ Tanµ P2 (Rd ), by the assumption that ρ ∈ Cc2 (Rd ). 4.5.2 The potential energy Let V : RRd → (−∞, +∞] be a proper, l.s.c. and λ-convex functional and let V(µ) = Rd V dµ be defined on P2 (Rd ). We denote by graph ∂V the graph of the Fréchet subdifferential of V in Rd × Rd , i.e. the subset of the couples (x1 , x2 ) ∈ Rd × Rd satisfying V (x3 ) ≥ V (x1 ) + hx2 , x3 − x1 i + λ |x1 − x2 |2 2 ∀ x3 ∈ Rd . (4.68) As usual, ∂ o V (x) denotes the element of minimal norm in ∂V (x). Notice that the potential energy functional (as well as the interaction energy functional) fails to satisfy (4.30a), and for this reason it would be more appropriate to consider a more general notion of subdifferential, involving plans and not only maps as elements of the subdifferential, and, at the same time, takes onto account transport plans and not only transport maps (see §10.3 of [8]). In the present case, we choose an intermediate generalization, and say that ξ ∈ L2 (µ; Rd ) belongs to the Fréchet subdifferential ∂V(µ) at µ ∈ D(V) if Z (4.69) V(ν) − V(µ) ≥ inf hξ(x), y − xi dγ(x) + o W2 (µ, ν) . γ∈Γo (µ,ν) Rd ×Rd The following result is proved in Proposition 10.4.2 of [8]. 57 Proposition 4.12 Let µ ∈ P2 (Rd ) and ξ ∈ L2 (µ; Rd ). Then (i) ξ is a subdifferential of V at µ iff ξ(x) ∈ ∂V (x) for µ-a.e. x. (ii) ∂ ◦ V(µ) = ∂ ◦ V (x) for µ-a.e. x. 4.5.3 The internal energy Let F be the functional Z F (ρ(x)) dL d (x) if µ = ρ · L d ∈ P2a (Rd ), F(µ) := Rd +∞ otherwise, (4.70) for a convex differentiable function satisfying F (0) = 0, lim inf s↓0 d F (s) > −∞ for some α > sα d+2 (4.71) as in Example 3.8. Recall that if F has superlinear growth at infinity then the functional F is l.s.c. with respect to the narrow convergence (indeed, under this growth condition the lower semicontinuity can be checked w.r.t. to the stronger weak L1 convergence, by Dunford-Pettis theorem, and lower semicontinuity w.r.t. weak L1 convergence is a direct consequence of the convexity of F). We confine our discussion to the case when F has a more than linear growth at infinity, i.e. F (z) lim = +∞, (4.72) z→+∞ z see Theorem 10.4.6 and Theorem 10.4.8 of [8] for a discussion of the (sub)linear case. We set LF (z) = zF 0 (z) − F (z) : [0, +∞) → [0, +∞) and we observe that LF is strictly related to the convex function G(z, s) := sF (z/s), z ∈ [0, +∞), s ∈ (0, +∞), (4.73) since ∂ z G(z, s) = − F 0 (z/s) + F (z/s) = −LF (z/s). ∂s s In particular (recall that F (0) = 0, by (4.71)) G(z, s) ≤ F (z) for s ≥ 1, F (z) − G(z, s) ↑ LF (z) s−1 as s ↓ 1. (4.74) (4.75) We will also suppose that F satisfies the condition the map s 7→ sd F (s−d ) is convex and non increasing in (0, +∞), (4.76) yielding the geodesic convexity of F. The following lemma shows the existence of the directional derivative of F along a suitable class of directions including all optimal transport maps. 58 Lemma 4.13 (Directional derivative of F) Suppose that F : [0, +∞) → R is a convex differentiable function satisfying (4.71), (4.72) and (4.76). Let µ = ρL d ∈ D(F), r ∈ L2 (µ; Rd ) and t̄ > 0 be such that (i) r is differentiable ρL d -a.e. and r t := (1 − t)i + tr is ρL d -injective with | det ∇r t (x)| > 0 ρL d -a.e., for any t ∈ [0, t̄]; (ii) ∇r t̄ is diagonalizable with positive eigenvalues; (iii) F((r t̄ )# µ) < +∞. Then the map t 7→ t−1 F((r t )# µ) − F ∗ (µ) is nondecreasing in [0, t̄] and Z F((r t )# µ) − F(µ) ˜ − i) dx. =− LF (ρ)tr ∇(r (4.77) +∞ > lim t↓0 t Rd The identity above still holds when assumptions (ii) on r is replaced by ˜ − i)kL∞ (ρL d ;Rd×d ) < +∞ (in particular if r − i ∈ Cc∞ (Rd ; Rd )), (ii’) k∇(r and F satisfies in addition the “doubling” condition ∃C > 0 : F (z + w) ≤ C 1 + F (z) + F (w) ∀ z, w. (4.78) Proof. By assumptions (i) and (ii), taking into account Lemma 2.2 we have Z Z ρ(x) det ∇r t (x) dx − F (ρ(x)) dx F((r t )# µ) − F(µ) = F det ∇r t (x) Rd Rd Z = G(ρ(x), det ∇r t (x)) − F (ρ(x)) dx Rd for any t ∈ (0, t̄]. Assumption (4.76), together with the concavity of the map t 7→ [det ((1 − t)I + t∇r)]1/d , implies that the function G(ρ(x), det ∇r t ) − F (ρ(x)) t ∈ (0, t̄] (4.79) t is nondecreasing w.r.t. t and bounded above by an integrable function (take t = t̄ and apply (iii)). Therefore the monotone convergence theorem gives Z d F((r t )# µ) − F(µ) = G(ρ(x), det ∇r t (x))t=0 dx lim t↓0 t dt d R and the expansion det ∇r t = 1 + t tr ∇(r − i) + o(t) together with (4.74) give the result. In the case when (ii’) holds, the argument is analogous but, since condition (ii) fails, we cannot rely anymore on the monotonicity of the function in (4.79). However, using the inequalities F (w) − F (0) ≤ wF 0 (w) ≤ F (2w) − F (w) and the doubling condition we easily see that the derivative w.r.t. s of the function G(z, s) can be bounded by C(1+F + (z)) for |s−1| ≤ 1/2. Therefore we can use the dominated convergence theorem instead of the monotone convergence theorem to pass to the limit. 59 The next technical lemma shows that we can “integrate by parts” in (4.77) preserving the inequality, if LF (ρ) is locally in W 1,1 . Lemma 4.14 (A “weak” integration by parts formula) Under the same assumptions of Lemma 4.13, let us suppose that (i) supp µ ⊂ Ω, Ω being a convex open subset of Rd (not necessarily bounded); 1,1 (ii) LF (ρ) ∈ Wloc (Ω); (iii) supp((r t̄ )# µ) is a compact subset of Ω for some t̄ ∈ [0, 1]; (iv) r ∈ BVloc (Rd ; Rd ) and D · r ≥ 0. Then we can find an increasing family of nonnegative Lipschitz functions χk : Rd → [0, 1] with compact support in Ω such that χk ↑ χΩ and Z Z − LF (ρ(x))tr ∇(r − i) dx ≥ lim sup h∇LF (ρ), r − iiχk dx. (4.80) k→∞ Rd Rd Proof. Possibly replacing r by r t̄ , we can assume that t̄ = 1 in (iii). Let us first recall that by Calderon-Zygmund theorem (see for instance [7]) the pointwise divergence tr (∇r) is the absolutely continuous part of the distributional divergence D · r; therefore we have Z Z v tr (∇r) dx ≤ − h∇v, ri dx, (4.81) Rd Rd provided v ∈ Cc∞ (Rd ) is nonnegative. As r is bounded, by approximation the same inequality remains true for every nonnegative function v ∈ W 1,1 (Rd ). For every Lipschitz function η : Rd → [0, 1] with compact support in Ω, choosing v := ηLF (ρ) ∈ W 1,1 (Rd ) we get Z Z ηLF (ρ) tr (∇r) dx ≤ − h∇ ηLF (ρ) , ri dx. (4.82) Rd Rd On the other hand, a standard integration by parts yields Z Z ηLF (ρ) tr (∇i) dx = − h∇ ηLF (ρ) , ii dx; (4.83) Ω Ω summing up with (4.82) and inverting the sign we find Z Z − ηLF (ρ) tr (∇(r − i)) dx ≥ h∇ ηLF (ρ) , r − ii dx. Rd (4.84) Rd Now we choose carefully the test function η. We consider an increasing family bounded open convex sets Ωk such that Ωk ⊂⊂ Ω, Ω= ∞ [ k=1 60 Ωk and for each convex set Ωk we consider the function χk (x) := k d(x, Rd \ Ωk ) ∧ 1. (4.85) χk is an increasing family of nonnegative Lipschitz functions which take their values in [0, 1] and satisfy χk (x) ≡ 1 if d(x, Rd \ Ωk ) ≥ k1 ; in particular, χk ≡ 1 in K for k sufficiently large. Moreover χk is concave in Ωk , since the distance function d(·, Rd \ Ωk ) is concave. Choosing η := χk in (4.84) we get Z Z χk LF (ρ) tr (∇(r − i)) dx ≥ − h∇LF (ρ), r − ii χk dx Rd Rd Z + h∇χk , r − ii LF (ρ) dx (4.86) Ωk Z ≥ h∇LF (ρ), r − ii χk dx Rd since the integrand of (4.86) is nonnegative: in fact, for L d -a.e. x ∈ Ωk where LF (ρ(x)) is strictly positive, the concavity of χk and r(x) ∈ K yields h∇χk (x), r(x) − i(x)i ≥ χk (r(x)) − χk (x) = 1 − χk (x) ≥ 0. Passing to the limit as k → ∞ in the previous integral inequality, we obtain (4.80) (recall that the function in the left hand side of (4.80) is semiintegrable by (4.77)). In the following theorem we characterize the minimal selection in the subdifferential of F and give, under the doubling condition, a formula for the slope of the functional. Theorem 4.15 (Slope and subdifferential of F) Suppose that F : [0, +∞) → R is a convex differentiable function satisfying (4.71), (4.72), (4.76) and (4.78). Assume that F has finite slope at µ = ρL d ∈ P2a (Rd ). Then LF (ρ) ∈ W 1,1 (Rd ), ∇LF (ρ) = wρ for some function w ∈ L2 (ρL d ; Rd ) and Z 1/2 |w(x)|2 ρ(x) dx = |∂F|(µ) < +∞. (4.87) Rd 1,1 Conversely, if ∇LF (ρ) ∈ Wloc (Rd ) and ∇LF (ρ) = wρ for some w ∈ L2 (µ; Rd ), then F has a finite slope at µ = ρL d and w = ∂ ◦ F(µ). Proof. (a) We apply first (4.77) with r = 0 ρL d -a.e. and r = i and take into account that W2 (µ, ((1 − t)i + tr)# µ) ≤ tkikL2 (ρL d ;Rd ) to obtain Z LF (ρ) dx ≤ |∂F|(µ)kikL2 (ρL d ;Rd ) , d Rd 61 so that LF (ρ) ∈ L1 (Rd ). Next, we apply (4.77) with r −i equal to a Cc∞ (Rd ; Rd ) function t (notice that condition (i) holds with t̄ < sup |∇t|) and use again the inequality W2 (µ, ((1 − t)i + tr)# µ) ≤ tkr − ikL2 (ρL d ) to obtain Z LF (ρ)tr (∇t) dx ≤ |∂F ∗ |(µ)ktkLp (ρL d ) ≤ |∂F ∗ |(µ) sup |t|. Rd Rd As t is arbitrary, Riesz theorem gives that LF (ρ) is a function of bounded variation (i.e. its distributional derivative DLF (ρ) is a finite Rd -valued measure in Rd ), so that we can rewrite the inequality as d Z X ti dDi LF (ρ) ≤ |∂F|(µ)ktkL2 (ρL d ;Rd ) . Rd i=1 By L2 duality theory there exists w ∈ L2 (ρL d ; Rd ) with kwk2 ≤ |∂F|(µ) such that Z d Z X ti dDi LF (ρ) = hw, ti dρL d ∀t ∈ Cc∞ (Rd , Rd ). i=1 Rd Rd Therefore LF (ρ) ∈ W 1,1 (Rd ) and ∇LF (ρ) = wρ. This leads to the inequality ≤ in (4.87). In order to show that equality holds in (4.87) we will prove that (i × w)# µ belongs to ∂F(µ). We have to show that (4.36) holds for any ν ∈ D(F). Using the doubling condition it is also easy to find a sequence of measures νh with compact support converging to ν in P2 (Rd ) and such that F(νh ) converges to F(ν), hence we can also assume that supp ν is compact. As tνµ is induced by the gradient of a Lipschitz and convex map ϕ, we know that all the conditions of Lemma 4.13 are fulfilled with r = ∇ϕ, and also Lemma 4.14 holds; therefore, by applying (4.77), the geodesic convexity of F, and (4.80) we obtain Z F(ν) − F(µ) ≥ lim sup h∇LF (ρ), (r − i))χh dx d h→∞ Z ZR χ = lim sup hw, (r − i)) h ρ dx = hw, r − ii dµ, h→∞ Rd Rd proving that w ∈ ∂F(µ). Finally, we notice that our proof that w = ∇LF (ρ)/ρ ∈ ∂F(µ) does not use the finiteness of slope, but only the assumption w ∈ L2 (µ; Rd ), therefore these conditions imply that the subdifferential is not empty and that the slope is finite. 4.5.4 The relative internal energy In this section we briefly discuss the modifications which should be apported to the previous results, when one consider a relative energy functional as in Section 3.3. 62 We thus consider a log-concave probability measure γ = e−V · L d ∈ P(Rd ) induced by a convex l.s.c. potential V : Rd → (−∞, +∞], with Ω = int D(V ) 6= ∅. (4.88) We are also assuming that the energy density F : [0, +∞) → [0, +∞] is convex and l.s.c., it satisfies the doubling property (4.78), (4.89) and the geodesic convexity condition (3.22), which yield that the map s 7→ F̂ (s) := F (e−s )es is convex and non increasing in R. The functional Z Z F(µ|γ) := F (σ) dγ = F ρ/e−V e−V dx, µ = σ · γ = ρL d (4.90) Rd Ω is therefore geodesically convex in P2 (Rd ), by Theorem 3.23. It is easy to check that whenever F̂ is not constant (case which corresponds to a linear F and a constant functional F), F has a superlinear growth and therefore F is lower semicontinuous in P2 (Rd ). Theorem 4.16 (Subdifferential of F(·|γ)) The functional F(·|γ) has finite 1,1 slope at µ = σ · γ ∈ D(F) if and only if LF (σ) ∈ Wloc (Ω) and ∇LF (σ) = σw for some function w ∈ L2 (µ; Rd ). In this case Z 1/2 = |∂F|(µ), (4.91) |w(x)|2 dµ(x) Rd and w = ∂ ◦ F(µ). Proof. We argue as in Theorem 4.15: in the present case the directional derivative formula (4.77) becomes F((r t )# µ|γ) − F(µ|γ) +∞ > lim t↓0 t Z ˜ − i) − e−V h∇V, r − ii dx =− LF (ρ/e−V ) e−V tr ∇(r d ZR ˜ e−V (r − i) dx =− LF (σ)tr ∇ (4.92) Rd for every vector field r satisfying the assumptions of Lemma 4.13 and F(r # µ|γ) < +∞. Choosing as before r = i + eV t, t ∈ Cc∞ (Ω; Rd ), since V is bounded in each compact subset of Ω, we get Z LF (σ)tr ∇t dx ≤ |∂F|(µ) sup |eV t|, Rd Ω 63 so that LF (σ) ∈ BVloc (Ω). Choosing now r = i + t with t ∈ Cc∞ (Ω; Rd ) we get d Z X ti dDi LF (σ) dγ ≤ |∂F|(µ)ktkL2 (µ;Rd ) Ω i=1 so that there exists w ∈ L2 (µ; Rd ) such that d Z X i=1 Ω Z Z hw, ti dµ = ti dDi LF (σ) dγ = Rd hρw, tie−V dx Rd ∀ t ∈ Cc∞ (Ω; Rd ), 1,1 thus showing that LF (σ) ∈ Wloc (Ω) and ∇LF (σ) = ρe−V w = σw. 1,1 Conversely, if LF (σ) ∈ Wloc (Ω) with ∇LF (σ) = σw and w ∈ L2 (µ; Rd ), arguing as in Lemma 4.14 we have for every measure ν = r # µ with compact support in Ω Z ˜ e−V (r − i) χk dx F(ν|γ) − F(µ|γ) ≥ lim sup − LF (σ)tr ∇ k→∞ Z Ω hχk ∇LF (σ) + LF (σ)∇χk , r − ii dγ ≥ lim sup k→∞ ZΩ h∇LF (σ), r − iiχk dγ ≥ lim sup k→∞ Ω Z Z χ hw, r − ii k dµ = hw, r − ii dµ, ≥ lim sup k→∞ Ω Ω which shows, through a density argument, that w ∈ ∂F(µ). 4.5.5 The interaction energy In this section we consider the interaction energy functional W : P2 (Rd ) → [0, +∞] defined by Z 1 W(µ) := W (x − y) dµ × µ(x, y). 2 Rd ×Rd Without loss of generality we shall assume that W : Rd → [0, +∞) is an even function; our main assumption, besides the convexity of Rd , is the doubling condition ∃ CW > 0 : W (x + y) ≤ CW 1 + W (x) + W (y) ∀ x, y ∈ Rd . (4.93) Let us first state a preliminary result: we are denoting by µ̄ the barycenter of the measure µ: Z µ̄ := x dµ(x). (4.94) Rd 64 Lemma 4.17 Assume that W : Rd → [0, +∞) is convex, Gateaux differentiable, even, and satisfies the doubling condition (4.93). Then for any µ ∈ D(W) we have Z W (x) dµ(x) ≤ CW 1 + W(µ) + W (µ̄) < +∞, (4.95) Rd Z |∇W (x − y)| dµ × µ(x, y) ≤ CW 1 + SW + W(µ) < +∞, (4.96) Rd ×Rd where SW := sup|y|≤1 W (y). In particular w := (∇W ) ∗ µ is well defined for µ-a.e. x ∈ Rd , it belongs to L1 (µ; Rd ), and it satisfies Z h∇W (x1 − x2 ), y1 − x1 i dγ(x1 , y1 ) dµ(x2 ) R2d ×Rd Z (4.97) hw(x1 ), y1 − x1 i dγ(x1 , y1 ), = R2d for every plan γ ∈ Γ(µ, ν) with ν ∈ D(W). In particular, choosing γ := (i × r)# µ, we have Z Z h∇W (x − y), r(x)i dµ × µ(x, y) = hw(x), r(x)i dµ(x) (4.98) Rd ×Rd Rd for every vector field r ∈ L∞ (µ; Rd ) and for r := λi, λ ∈ R. Proof. By Jensen inequality we have Z W (x − y) dµ(y) ∀ x ∈ Rd , W (x − µ̄) ≤ (4.99) Rd so that a further integration yields Z W (x − µ̄) dµ(x) ≤ W(µ); (4.100) Rd (4.95) follows directly from (4.100) and the doubling condition (4.93), since W (x) ≤ CW 1 + W (x − µ̄) + W (µ̄) . Combining the doubling condition and the convexity of W we also get |∇W (x)| = sup h∇W (x), yi ≤ sup W (x + y) − W (x) |y|≤1 |y|≤1 ≤ CW 1 + W (x) + sup W (y) , (4.101) |y|≤1 which yields (4.96). If now ν ∈ D(W) and γ ∈ Γ(µ, ν), then the positive part of the map (x1 , y1 , x2 ) 7→ h∇W (x1 − x2 ), y1 − x1 i belongs to L1 (γ × µ) since convexity yields h∇W (x1 − x2 ), y1 − x1 i ≤ W (y1 − x2 ) − W (x1 − x2 ), 65 and the right hand side of this inequality is integrable: Z Z W (y1 −x2 ) dγ×µ = W (y1 −x2 ) dν×µ ≤ C 1+W(ν)+W(µ)+W (ν̄−µ̄) , R3d R2d Z Z W (x1 − x2 ) dγ × µ = R3d W (x1 − x2 ) dµ × µ = W(µ). R2d Therefore we can apply Fubini-Tonelli theorem to obtain Z h∇W (x1 − x2 ), y1 − x1 i dγ × µ(x1 , y1 , x2 ) R3d Z Z = h∇W (x1 − x2 ), y1 − x1 i dµ(x2 ) dγ(x1 , y1 ) 2d ZR ZX = h ∇W (x1 − x2 ) dµ(x2 ) , y1 − x1 i dγ(x1 , y1 ) 2d X ZR = hw(x1 ), y1 − x1 i dγ(x1 , y1 ), R2d which yields (4.97). As the interaction energy fails to satisfy (4.30a), as we did for the potential enegy functional we say that ξ ∈ L2 (µ; Rd ) belongs to the Fréchet subdifferential ∂W(µ) at µ ∈ D(W) if Z W(ν) − W(µ) ≥ inf hξ(x), y − xi dγ(x) + o W2 (µ, ν) . (4.102) γ∈Γo (µ,ν) Rd ×Rd Theorem 4.18 (Minimal subdifferential of W) Assume that W : Rd → [0, +∞) is convex, Gateaux differentiable, even, and satisfies the doubling condition (4.93). Then µ ∈ P2 (Rd ) belongs to D(|∂W|) if and only if w = (∇W )∗ρ ∈ L2 (µ; Rd ). In this case w = ∂ ◦ W(µ). Proof. As we did for the internal energy functional, we start by computing the directional derivative of W along a direction induced by a transport map r = i + t, with t bounded and with a compact support (by the growth condition on W , this ensures that W(r # µ) < +∞). Since the map t 7→ W ((x − y) + t(t(x) − t(y))) − W (x − y) t is nondecreasing w.r.t. t, the monotone convergence theorem and (4.98) give (taking into account that ∇W is an odd function) +∞ W((i + tt)# µ − W(µ) > lim t↓0 t Z Z 1 h∇W (x − y), (t(x) − t(y))i dµ × µ = hw, ti dµ. = 2 Rd ×Rd Rd 66 On the other hand, since |∂W|(µ) < +∞, using the inequality W2 ((i+tt)# µ, µ) ≤ ktkL2 (µ;Rd ) we get Z hw, ti dµ ≥ −|∂W|(µ)ktkL2 (µ;Rd ) ; Rd changing the sign of t we obtain Z ≤ |∂W|(µ)ktkL2 (µ;Rd ) , hw, ti dµ d R and this proves that w ∈ L2 (µ; Rd ) and that kwkL2 ≤ |∂W|(µ). Now we prove that if w = (∇W ) ∗ µ ∈ L2 (µ; X), then it belongs to ∂W(µ). Let us consider a test measure ν ∈ D(W), a plan γ ∈ Γ(µ, ν), and the directional derivative of W along the direction induced by γ. Since the map t 7→ W ((1 − t)(x1 − x2 ) + t(y1 − y2 )) − W (x1 − x2 ) t is nondecreasing w.r.t. t, the monotone convergence theorem, the fact that ∇W is an odd function, and (4.98) give W(((1 − t)π 1 + tπ 2 )# γ − W(µ) W(ν) − W(µ) ≥ lim t↓0 t Z 1 = h∇W (x1 − x2 ), (y1 − x1 ) − (y2 − x2 )i dγ × γ 2 2d 2d Z R ×R hw(x1 ), y1 − x1 i dγ(x1 , y1 ), = R2d and this proves that (i × w)# µ ∈ ∂W(µ). 4.5.6 The opposite Wasserstein distance In this section we compute the (metric) slope of the function ψ(·) := − 21 W22 (·, µ2 ), i.e. the limit 1 W 2 (ν, µ2 ) − W22 (µ, µ2 ) lim sup 2 = |∂ψ|(µ); (4.103) 2 ν→µ W2 (ν, µ) observe that the triangle inequality shows that the “lim sup” above is always less than W2 (µ, µ2 ); however this inequality is always strict when optimal plans are not induced by transports, as the following theorem shows; the right formula for the slope involves the minimal L2 norm of the barycentric projection of the optimal plans and gives that the minimal selection is always induced by a map. Theorem 4.19 (Minimal subdifferential of the opposite Wasserstein distance) Let ψ(µ) = − 21 W22 (µ, µ2 ). Then Z 2 2 2 |∂ψ| (µ) = min |γ̄ − i| dµ : γ ∈ Γo (µ, µ ) X 67 ∀µ ∈ P2 (X), (4.104) and ∂ ◦ ψ(µ) = γ̄ −i is a strong subdifferential, where γ is the unique minimizing plan above. Moreover µ 7→ |∂ψ|(µ) is lower semicontinuous with respect to narrow convergence in P(X), along sequences bounded in P2 (X). 4.5.7 The sum of internal, potential and interaction energy In this section we consider, as in [21], the functional φ : P2 (Rd ) → (−∞, +∞] given by the sum of internal, potential and interaction energy: Z Z Z 1 F (ρ) dx + V dµ + W dµ × µ if µ = ρL d , (4.105) φ(µ) := 2 Rd ×Rd Rd Rd setting φ(µ) = +∞ if µ ∈ P2 (Rd )\P2a (Rd ). Recalling the “doubling condition” stated in (4.78), we make the following assumptions on F , V and W : (F) F : [0, +∞) → R is a doubling, convex differentiable function with superlinear growth satisfying (4.71) (i.e. the bounds on F − ) and (4.76) (yielding the geodesic convexity of the internal energy). (V) V : Rd → (−∞, +∞] is a l.s.c. λ-convex function with proper domain D(V ) with nonempty interior Ω ⊂ Rd . (W) W : Rd → [0, +∞) is a convex, differentiable, even function satisfying the doubling condition (4.93). The finiteness of φ yields supp µ ⊂ Ω = D(V ), µ(∂Ω) = 0, (4.106) so that its density ρ w.r.t. L d can be considered as a function of L1 (Ω). The same monotonicity argument used in the proof of Lemma 4.13 gives R R Z V d((1 − t)i + tr))# µ − Rd V dµ Rd = h∇V, r − ii dµ, (4.107) +∞ > lim t↓0 t Rd R R whenever both Rd V dµ < +∞ and Rd V dr # µ < +∞. Analogously, denoting by W the interaction energy functional induced by W/2, arguing as in the first part of Theorem 4.18 we have Z W(((1 − t)i + tr)# µ) − W(µ) +∞ > lim = h(∇W ) ∗ µ), r − ii dµ, (4.108) t↓0 t Rd whenever W(µ) + W(r # µ) < +∞. The growth condition on W ensures that µ ∈ D(W) implies r # µ ∈ D(W) if either r − i is bounded or r = 2i (here we use the doubling condition). We have the following characterization of the minimal selection in the subdifferential ∂ ◦ φ(µ): 68 Theorem 4.20 (Minimal subdifferential of φ) A measure µ = ρL d ∈ D(φ) ⊂ 1,1 P2 (Rd ) belongs to D(|∂φ|) if and only if LF (ρ) ∈ Wloc (Ω) and ρw = ∇LF (ρ) + ρ∇V + ρ(∇W ) ∗ ρ for some w ∈ L2 (µ; Rd ). (4.109) In this case the vector w defined µ-a.e. by (4.109) is the minimal selection in ∂φ(µ), i.e. w = ∂ ◦ φ(µ). Proof. We argue exactly as in the proof of Theorem 4.15, computing the Gateaux derivative of φ in several directions r, using Lemma 4.13 for the internal energy and (4.107), (4.108) respectively for the potential and interaction energy. Choosing r = i + t, with t ∈ Cc∞ (Ω; Rd ), we obtain Z Z Z − LF (ρ)∇ · t dx + h∇V, ti dµ + h(∇W ) ∗ ρ, ti dµ ≥ −|∂φ|(µ)ktkL2 (µ) . Rd Rd Rd (4.110) Since V is locally Lipschitz in Ω and ∇W ∗ ρ is locally bounded, following the same argument of Theorem 4.15, we obtain from (4.110) first that LF (ρ) ∈ 1,1 BVloc (Rd ) and then that LF (ρ) ∈ Wloc (Rd ), with ∇LF (ρ) + ρ∇V + ρ(∇W ) ∗ ρ = wρ for some w ∈ L2 (µ; Rd ) (4.111) with kwkL2 ≤ |∂φ|(µ). In order to show that the vector w is in the subdifferential (and then, by the previous estimate, it is the minimal selection) we choose eventually a test measure ν ∈ D(φ) with compact support contained in Ω and the associated optimal transport map r = tνµ ; Lemma 4.13, (4.107), (4.108), and Lemma 4.14 yield d φ(ν) − φ(µ) ≥ φ (((1 − t)i + tr)# µ) t=0+ dt Z Z Z =− LF (ρ)∇ · (r − i) dx + h∇V, r − ii dµ + h(∇W ) ∗ ρ, r − ii dµ Ω Ω Ω Z Z ≥ lim sup h∇LF (ρ), r − iiχh dx + h∇V + (∇W ) ∗ ρ, r − ii dµ h→∞ Ω Ω Z = lim sup h∇LF (ρ) + ρ∇V + ρ(∇W ) ∗ ρ, r − iiχh dx Zh→∞ Ω Z hρw, r − ii dx = hw, r − ii dµ. = Ω Ω Finally, we notice that the proof that w belongs to the subdifferential did not use the finiteness of slope, but only the assumption (previously derived by the 1,1 finiteness of slope) that LF (ρ) ∈ Wloc (Ω), (4.109), and φ(µ) < +∞; therefore these conditions imply that the subdifferential is not empty, hence the slope is finite and the vector w is the minimal selection in ∂φ(µ). An interesting particular case of the above result is provided by the relative entropy functional: let us choose W ≡ 0 and F (s) := s log s, γ := 1 −V e · L d = e−(V (x)+log Z) · L d , Z 69 with Z > 0 chosen so that γ(Rd ) = 1. Recalling Remark 3.16, the functional φ can also be written as φ(µ) = H(µ|γ) − log Z. (4.112) Since in this case LF (ρ) = ρ, a vector w ∈ L2 (µ; Rd ) is the minimal selection ∂ ◦ φ(µ) if and only if Z Z Z − ∇ · ζ(x) dµ(x) = hw(x), ζ(x)i dµ(x) − h∇V (x), ζ(x)i dµ(x), Rd Rd Rd (4.113) for every test function ζ ∈ Cc∞ (Rd ; Rd ); (4.113) can also be written in terms of σ = dµ dγ as Z − σ∇ · (e−V (x) ζ(x)) dx = Rd Z hσw(x), e−V (x) ζ(x)i dx, Rd which shows that σw = ∇σ. 70 (4.114) 5 Gradient flows of λ-geodesically convex functionals in P2 (Rd ) In this chapter we state some structural results, concerning existence, uniqueness, approximation, and qualitative properties of gradient flows in P2 (Rd ) generated by a proper and l.s.c. functional φ : P2 (Rd ) → (−∞, +∞]. (5.1a) We will also assume that φ is λ-geodesically convex, according to Definition 3.1. (5.1b) Since we are mostly concnerned with absolutely continuous measures, some technical details will be simpler assuming that D(|∂φ|) ⊂ P2a (Rd ); (5.1c) finally, the (simplified) existence theory we are presenting here will also require that for some τ∗ > 0 1 2 W (µ, ν) + φ(ν) admits at least 2τ 2 a minimum point µτ , for all τ ∈ (0, τ∗ ) and µ ∈ P2 (X). the map ν 7→ Φ(τ, µ; ν) = (5.1d) Notice that (5.1b) gives that any minimizer µτ in (5.1d) belongs to P2a (Rd ), due to Lemma 4.4. Remark 5.1 (5.1d) is slightly more restrictive than lower semicontinuity in P2 (Rd ); by the standard direct method in Calculus of Variations, it surely holds if φ satisfies the following coerciveness-l.s.c. conditions: inf µ∈P2 (Rd ) φ(µ) + ) µn → µ narrowly in P(Rd ) sup m2 (µn ) < +∞ 1 2 m (µ) > −∞, 2τ∗ 2 =⇒ lim inf φ(µn ) ≥ φ(µ). n→∞ (5.2a) (5.2b) n Another sufficient condition yielding (5.1d) and satisfied by our main examples is (5.60): it will be introduced in the “existence” Theorem 5.8. The inclusion (5.1c) is a simplifying assumption, which ensures that the flows stay inside the absolutely continuous measures, thus avoiding more complicated notions of subdifferentials (see Chapter 11 of [8], where this restriction is completely removed). 2 Definition 5.2 (Gradient flows) We say that a map µt ∈ ACloc (0, +∞); P2 (Rd ) is a solution of the gradient flow equation v t ∈ −∂φ(µt ) 71 t > 0, (5.3) if, for L 1 -a.e. t > 0, µt ∈ P2a (Rd ) and its velocity vector field v t ∈ Tanµt P2 (Rd ) belongs to the subdifferential (4.19) of φ at µt . Recalling the characterization of the tangent velocity field to an absolutely continuous curve, the above definition is equivalent to the requirement that there exists a Borel vector field v t such that v t ∈ Tanµt P2 (Rd ) for L 1 -a.e. t > 0, kv t kL2 (µt ;Rd ) ∈ L2loc (0, +∞), (5.4a) the continuity equation ∂t µt + ∇ · v t µt = 0 in Rd × (0, +∞) (5.4b) holds in the sense of distributions according to (2.47), and finally −v t ∈ ∂φ(µt ) for L 1 -a.e. t > 0. (5.4c) Before studying the question of existence of solutions to (5.3), which we will postpone to the next sections, we want to discuss some preliminary issues. 5.1 Uniqueness and various characterizations of gradient flows Theorem 5.3 (Gradient flows, E.V.I., and curves of Maximal Slope) Let φ : P2 (Rd ) → (−∞, +∞] be as in (5.1a,b). An absolutely continuous curve 2 (0, +∞); P2 (Rd ) with µt ∈ P2a (Rd ) for L 1 -a.e. t ∈ (0, +∞) is a µ ∈ ACloc gradient flow of φ according to Definition 5.2 if and only if it satisfies one of the following equivalent characterizations: i) There exists a Borel vector field ṽ t with kṽ t kL2 (µt ;Rd ) in L2loc (0, +∞) such that ∂t µt + ∇ · (ṽ t µt ) = 0 in Rd × (0, +∞), in the sense of distributions, and Z λ − hṽ t , tσµt − ii dµt ≤ φ(σ) − φ(µt ) − W22 (σ, µt ) 2 Rd (5.5a) ∀σ ∈ D(φ), (5.5b) L 1 -a.e. in (0, +∞). ii) Every Borel vector field ṽ t with kṽ t kL2 (µt ;Rd ) in L2loc (0, +∞) (in particular the velocity vector field v t ∈ Tanµt P2 (Rd )) satisfying the continuity equation ∂t µt + ∇ · (ṽ t µt ) = 0 in Rd × (0, +∞), (5.6a) in the sense of distributions, satisfies the variational inequality Z λ − hṽ t , tσµt − ii dµt ≤ φ(σ) − φ(µt ) − W22 (σ, µt ) ∀σ ∈ D(φ), (5.6b) 2 d R for t ∈ (0, +∞) \ N , N being a L 1 -negligible set. 72 iii) The metric Evolution Variational Inequalities 1 d 2 λ W (µt , σ) + W22 (µt , σ) ≤ φ(σ) − φ(µt ) 2 dt 2 2 for L 1 -a.e. t > 0 (5.7) hold for every σ ∈ D(φ). iv) The map t 7→ φ(µt ) is locally absolutely continuous in (0, +∞) and − 1 1 d φ(µt ) ≥ kv t k2L2 (µt ;Rd ) + |∂φ|2 (µt ) dt 2 2 L 1 -a.e. in (0, +∞). (5.8) v) The map t 7→ φ(µt ) is locally absolutely continuous in (0, +∞) and − d φ(µt ) = kv t k2L2 (µt ;Rd ) = |∂φ|2 (µt ) dt L 1 -a.e. in (0, +∞). (5.9) In particular, (5.3) and v) yield −v t = ∂ ◦ φ(µt ) for L 1 -a.e. t > 0. (5.10) Proof. i) If µt is a gradient flow according to Definition 5.2, recalling the property of the subdifferential (4.36), it is immediate that µt and its velocity vector field v t satisfy (5.5a,b). Conversely, suppose that ṽ t satisfies (5.5a,b) and let us denote by v t ∈ Tanµt P2 (Rd ) the tangent velocity vector of µt . Since, by (2.56), for L 1 -a.e. t > 0 v t is the orthogonal projection of ṽ t on Tanµt P2 (Rd ), the difference ṽ t − v t is orthogonal to the tangent space, and therefore by Theorem 2.24 we have Z hṽ t − v t , tσµt − ii dµt = 0 ∀ σ ∈ P2 (Rd ), for L 1 -a.e. t > 0. (5.11) Rd As a consequence, v t fulfils (5.5b) for L 1 -a.e. t, and this property characterizes the elements of the subdifferential. ii) follows by the same argument, thanks to (5.11). iii) Assume that (5.7) holds for all σ ∈ D(φ). For any σ ∈ D(φ) fixed, the differentiability of W22 stated in Lemma 2.23 gives Z 1 d 2 W2 (µt , σ) = hv t , i − tσµt i dµt for L 1 -a.e. t ∈ (0, +∞). 2 dt d R Therefore we can find, for any countable set D ⊂ D(φ), a L 1 -negligble set of times N such that Z λ (5.12) − hv t , tσµt − ii dµt ≤ φ(σ) − φ(µt ) − W22 (σ, µt ) 2 Rd holds for all t ∈ (0, +∞) \ N and all φ ∈ D. Choosing D to be dense relative to the distance W2 (µ, ν) + |φ(µ) − φ(ν)| in D(φ), we obtain that (5.5b) holds for all t ∈ (0, +∞) \ N . The converse implication is analogous. 73 L’argomento di densita’ qui mi sembra inutile iv) If µt is a gradient flow in the sense of (5.3), taking into account that |mu0t | = kv t kL2 (µt ;Rd ) and that |∂φ(µt )| ≤ kv t kL2 (µt ;Rd ) (by (4.52)) we obtain |∂φ(µt )||µ0t | ∈ L1loc ((0, +∞)) . Thanks to the λ-convexity and the lower semicontinuity of φ, this implies (see Corollary 2.4.10 in [8]) that t 7→ φ(µt ) is locally absolutely continuous in (0, +∞). Then, the chain rule (4.55) easily yields Z d − φ(µt ) = |v t |2 dµt ≥ |∂φ|2 (µt ) (5.13) dt Rd and therefore (5.8). Conversely, if t 7→ φ(µt ) is locally absolutely continuous and µt satisfies (5.8), we know that ∂φ(µt ) 6= ∅ for L 1 -a.e. t > 0; thus the chain rule (4.55) shows that Z d ϕ(t) = hξ, v t i dµt ∀ ξ ∈ ∂φ(µt ), for L 1 -a.e. t > 0. (5.14) dt Rd Choosing in particular ξ t = ∂ ◦ φ(µt ), for L 1 -a.e. t > 0 we get Z 1 1 |v t |2 + |ξ t |2 + hξ t , v t i dµt ≤ 0. 2 Rd 2 (5.15) It follows that ξ t (x) = −v t (x) for µt -a.e. x ∈ Rd , i.e. v t = −∂ ◦ φ(µt ). v) is equivalent to iv) by the previous argument. Remark 5.4 The “purely metric” formulations (5.7) or (5.8) do not require that µt is an absolutely continuous measure at L 1 -a.e. t ∈ (0, +∞) and do not depend on an explicit expression of the subdifferential of φ, as only the metric slope is involved; therefore they can be used to define the gradient flow of φ under more general assumptions: again, we refer to [8] for a complete development of this approach. Theorem 5.5 (Uniqueness of gradient flows) If µit : (0, +∞) → P2 (Rd ), i = 1, 2, are gradient flows satisfying µit → µi ∈ P2 (Rd ) as t ↓ 0 in P2 (Rd ), then W2 (µ1t , µ2t ) ≤ e−λt W2 (µ1 , µ2 ) ∀ t > 0. (5.16) In particular, for any µ0 ∈ P2 (Rd ) there is at most one gradient flow µt satisfying the initial Cauchy condition µt → µ0 as t ↓ 0. Proof. If µ1t , µ2t are two gradient flows satisfying the initial Cauchy condition µit → µi as t ↓ 0, i = 1, 2, by the E.V.I. formulation (5.7) we can apply the next Lemma 5.6 with the choices d(s, t) := W22 (µ1s , µ2t ), δ(t) := d(t, t), thus obtaining δ 0 ≤ −2λδ. Since δ(0+ ) = W22 (µ1 , µ2 ) we obtain (5.16). 74 Lemma 5.6 Let d(s, t) : (a, b)2 → R be a map satisfying |d(s, t) − d(s0 , t)| ≤ |v(s) − v(s0 )|, |d(s, t) − d(s, t0 )| ≤ |v(t) − v(t0 )| for any s, t, s0 , t0 ∈ (a, b), for some locally absolutely continuous map v : (a, b) → R and let δ(t) := d(t, t). Then δ is locally absolutely continuous in (a, b) and d d(t, t) − d(t − h, t) d(t, t + h) − d(t, t) δ(t) ≤ lim sup +lim sup dt h h h↓0 h↓0 L 1 -a.e. in (a, b). Proof. Since |δ(s) − δ(t)| ≤ 2|v(s) − v(t)| the function δ is locally absolutely continuous. We fix a nonnegative function ζ ∈ Cc∞ (a, b) and h > 0 such that ±h + supp ζ ⊂ (a, b). We have then Z b Z b ζ(t + h) − ζ(t) d(t, t) − d(t − h, t − h) δ(t) dt = ζ(t) dt − h h a a Z b Z b d(t, t) − d(t − h, t) d(t, t + h) − d(t, t) = ζ(t) dt + ζ(t + h) dt, h h a a where the last equality follows by adding and subtracting d(t − h, t) and then making a change of variables in the last integral. Since h−1 |d(t, t) − d(t − h, t)| ≤ h−1 |v(t) − v(t − h)| → |v 0 (t)| in L1loc (a, b) as h ↓ 0 and an analogous inequality holds for the other difference quotient, we can apply (an extended version of) Fatou’s Lemma and pass to the upper limit in the integrals as h ↓ 0 (recall that Fatou’s lemma with the limsup holds even for sequences bounded above by a sequence strongly convergent in L1 ); denoting byR α and β Rthe two upper derivatives in the statement of the Lemma we get − δζ 0 dt ≤ (α + β)ζ dt, whence the inequality between distributions follows. 5.2 Main properties of Gradient Flows In this section we collect the main properties of the gradient flow generated by a functional φ : P2 (Rd ) → (−∞, +∞] satisfying the assumptions (5.1a,b,c). Theorem 5.7 (Main properties of gradient flows) Let us suppose that φ : P2 (Rd ) → (−∞, +∞] satisfies (5.1a,b) and let us suppose that its gradient flow µt (according to the E.V.I. formulation (5.7)) exists for every initial value µ0 ∈ D, D being a dense subset of D(φ). λ-contractive semigroup. For every µ0 ∈ D(φ) there exists a unique solution µ := S[µ0 ] of the Cauchy problem associated to (5.3) with limt↓0 µt = µ0 . The map µ0 7→ St [µ0 ] is a λ-contracting semigroup on D(φ), i.e. W2 (S[µ0 ](t), S[ν0 ](t)) ≤ e−λt W2 (µ0 , ν0 ) 75 ∀ µ0 , ν0 ∈ D(φ). (5.17) Regularizing effect. St maps D(φ) into D(∂φ) ⊂ D(φ) for every t > 0, the map t 7→ eλt |∂φ|(µt ) is non increasing, (5.18) and each solution µt = St [µ0 ] satisfies the following regularization estimates: 1 2 if λ ≥ 0 φ(µt ) ≤ W2 (µ0 , σ) + φ(σ) 2t (5.19) λ 2 W (µ , σ) + φ(σ) if λ = 6 0 φ(µt ) ≤ 0 2 2(eλt − 1) − 1 e−2λ t |∂φ|2 (µt ) ≤ |∂φ|2 (σ) + 2 W22 (µ0 , σ) t Z (5.20) λ λ t 2 − W22 (µt , σ) − 2 W2 (µs , σ) ds 2t t 0 for every σ ∈ D(∂φ). Energy identity. If v t ∈ Tanµt P2 (Rd ) is the tangent velocity field of a gradient flow µt = St [µ0 ], then the energy identity hlods: Z a b Z |v t (x)|2 dµt (x) dt + φ(µb ) = φ(µa ) ∀ 0 ≤ a < b < +∞. (5.21) Rd Asymptotic behaviour. If λ > 0, then φ admits a unique minimum point µ and for t ≥ t0 we have λ 2 1 W (µt , µ̄) ≤ φ(µt ) − φ(µ̄) ≤ |∂φ|2 (µt ) ∀ t ≥ 0, 2 2 2λ W2 (µt , µ) ≤ W2 (µt0 , µ)e−λ(t−t0 ) , φ(µt ) − φ(µ) ≤ φ(µt0 ) − φ(µ) e−2λ(t−t0 ) , |∂φ|(µt ) ≤ |∂φ|(µt0 )e−λ(t−t0 ) . (5.22a) (5.22b) (5.22c) (5.22d) If λ = 0 and µ is any minimum point of φ then we have W2 (µ0 , µ) W 2 (µ0 , µ) , φ(µt ) − φ(µ) ≤ 2 , t 2t the map t 7→ W2 (µt , µ) is not increasing. |∂φ|(µt ) ≤ (5.23) Right and left limits, precise pointwise formulation of the equation. If moreover φ satisfies (5.1c), then for every t > 0 the right limit µ v t+ := lim h↓0 tµt+h −i t h exists in L2 (µt ; Rd ) (5.24) and satisfies −v t+ = ∂ ◦ φ(µt ) 76 ∀ t > 0, (5.25) d φ(µt ) = − dt+ Z |v t+ |2 dµt = −|∂φ|2 (µt ) ∀ t > 0. (5.26) Rd (5.24), (5.25), and (5.26) hold at t = 0 iff µ0 ∈ D(∂φ) = D(|∂φ|). Moreover, there exists an at most countable set C ⊂ (0, +∞) such that the analogous identities for the left limits hold for every t ∈ (0, +∞) \ C: µ t t−h − i v t− = lim µt = −∂ ◦ φ(µt ), h↓0 h ‘ ∀ t ∈ (0, +∞) \ C. (5.27) d 2 φ(µt ) = −|∂φ| (µt ). dt− Proof. Regularizing effect. We first observe that for every h > 0 the map t 7→ µt+h is still a gradient flow, and therefore estimate (5.16) yields W2 (µt+h , µt ) ≤ e−λ(t−t0 ) W2 (µt0 +h , µt0 ) ∀ 0 ≤ t0 < t < +∞. (5.28) Setting δ(t) := lim sup h↓0 W2 (µt+h , µt ) h t ≥ 0, (5.29) (5.28) yields the map t 7→ eλt δ(t) is nonincreasing. (5.30) We denote by N the subset of (0, +∞) whose points t0 satisfies µt0 ∈ D(∂φ) ⊂ P2a (Rd ), the metric derivative of µt coincides with kv t kL2 (µt ;Rd ) and −v t0 = ∂ ◦ φ(µt0 ): by the definition of gradient flow, Theorem 2.17, and point v) of Theorem 5.3, L 1 (0, +∞) \ N = 0 and δ(t) = kv t kL2 (µt0 ;Rd ) = |∂φ|(µt ) < +∞ ∀ t ∈ (0, +∞) \ N ; (5.31) in particular, (5.30) yields δ(t) < +∞ for every t > 0. We want to show now that δ(t) = |∂φ|(µt ) ∀ t ≥ 0. (5.32) Integrating the E.V.I. (5.7) in the interval (t, t + h) and dividing by h we get for every σ ∈ D(φ) 1 h Z 0 h λ 2 W2 (µt+s , σ) ds − φ(σ) 2 1 2 1 2 ≤ W2 (µt , σ) − W (µt+h , σ) 2h 2h 2 W2 (µt+h , µt ) ≤ W2 (µt , σ) + W2 (µt+h , σ) . 2h φ(µt+s ) + 77 (5.33) Passing to the limit as h ↓ 0 and recalling that the map t 7→ φ(µt ) is (absolutely) continuous, we obtain φ(µt ) − φ(σ) + λ 2 W (µt , σ) ≤ δ(t)W2 (µt , σ), 2 2 (5.34) which yields |∂φ|(µt ) ≤ δ(t) ∀ t ≥ 0. (5.35) Choosing σ := µt in (5.33), and rescaling the integrand, we can use ?? to obtain Z 1 1 1 λ 2 2 W (µ , µ ) ≤ φ(µ ) − φ(µ ) − W (µ , µ ) ds t+h t t t+hs t+hs t 2 2 2h2 h 0 2 Z 1 Z 1 2 W2 (µt+hs , µt ) W2 (µt+hs , µt ) ≤ |∂φ|(µt ) s ds − λ ds. hs h 0 0 Passing to the limit as h ↓ 0 we obtain Z 1 1 1 2 δ (t) ≤ |∂φ|(µt ) δ(t)s ds = |∂φ|(µt )δ(t), 2 2 0 (5.36) which yields (5.32) and in particular (5.18). The estimates (5.19) follow easily by integrating in the interval (0, t) the following form of (5.7) d eλs 2 W (µs , σ) + eλs φ(µs ) ≤ eλs φ(σ) ds 2 2 and recalling that t 7→ φ(µt ) is nonincreasing; when λ 6= 0 we get Z t Z t d eλs 2 eλt − 1 φ(µt ) ≤ W2 (µs , σ) ds + eλs φ(σ) ds λ 0 ds 2 0 1 eλt − 1 ≤ W22 (µ0 , σ) + φ(σ). 2 λ (5.37) d φ(µt ) = |∂φ|2 (µt ) and In order to show (5.20) we apply (5.18), the fact that − dt finally the E.V.I. to obtain Z t Z t − 0 e−2λ t t2 2 −2λ− s 2 |∂φ| (µt ) ≤ se |∂φ| (µs ) ds ≤ − s φ(µs ) ds 2 0 0 Z t = φ(µs ) ds − tφ(µt ) 0 Z 1 1 λ t 2 ≤ t(φ(σ) − φ(µt )) + W22 (µ0 , σ) − W22 (µt , σ) − W2 (µs , σ) ds. 2 2 2 0 If σ ∈ D(∂φ), using ?? we can bound the right hand side by Z 1 1 2 λ t 2 2 t|∂φ|(σ)W2 (µt , σ) − (tλ + 1)W2 (µt , σ) + W2 (µ0 , σ) − W2 (µs , σ) ds 2 2 2 0 Z t2 tλ 1 λ t 2 W2 (µs , σ) ds, ≤ |∂φ|2 (σ) − W22 (µt , σ) + W22 (µ0 , σ) − 2 2 2 2 0 which yields (5.20). 78 λ-contractive semigroup. Thanks to the λ-contraction estimate of Theorem 5.5, it is now easy to extend the semigroup S defined on D to its closure, which coincides with D(φ). Observe that each trajectory µt of the extended semigroup still satisfies the E.V.I. formulation (5.7); moreover, the previous regularization estimates show that t 7→ µt is locally Lipschitz and µt ∈ D(|∂φ|) for every t > 0, in particular µt ∈ P2a (Rd ) for every t > 0. Theorem 5.3 then shows that µt is a gradient flow for φ. Energy identity. It is an immediate consequence of (5.9). Asymptotic behaviour. When λ > 0 (5.28) shows that for every gradient flow µt the sequence k 7→ µk satisfies the Cauchy condition in P2 (Rd ), since W2 (µk+1 , µk ) ≤ e−λ W2 (µk , µk−1 ). (5.38) Therefore it is convergent to some limit µ̄; (5.19) and the lower semicontinuity of φ show that µ̄ is a minimum point for φ; in particular, the constant curve t 7→ µ̄ is a gradient flow. (5.22b) is a particular case of the λ-contraction property (5.17) and in particular it shows that the minimum point µ̄ is unique, when λ > 0. The inequality (5.22d) is simply (5.30), while (5.22a) is a general property of λ-geodesically convex functions (even in metric spaces, see Theorem 2.4.14 of [8]): for, if µ ∈ D(∂φ), property ?? of the slope and Young inequality yield 1 λ 2 W (µ, µ̄) ≤ |∂φ|2 (µ). 2 2 2λ For the opposite inequality, being 0 ∈ ∂φ(µ̄), from (4.36) we easily get φ(µ) − φ(µ̄) ≤ |∂φ|(µ)W2 (µ, µ̄) − λ 2 W (µ, µ̄). 2 2 The estimate (5.22c) now follows by observing that (5.39) yields φ(µ) − φ(µ̄) ≥ d φ(µt ) − φ(µ̄) = −|∂φ|2 (µt ) ≤ −2λ φ(µt ) − φ(µ̄) . dt (5.39) (5.40) (5.41) Right and left limits, precise pointwise formulation of the equation. Here, for the sake of simplicity, we are assuming that λ ≥ 0. We already know that ∂φ(µt ) is not empty for t > 0: we set ξ t = ∂ ◦ φ(µt ); since the slope |∂φ| is lower semicontinuous ?? and the map t 7→ |∂φ|(µt ) is nonincreasing, we obtain |∂φ|(µt ) = lim |∂φ|(µt+h ). h↓0 (5.42) Moreover, the map t 7→ φ(µt ) is absolutely continuous, nonincreasing, and its time derivative coincides L 1 -a.e. with the nondecreasing map −|∂φ|2 (µt ); it follows that t 7→ φ(µt ) is continuous and convex, so that ∃ φ(µt+h ) − φ(µt ) d φ(µt ) = lim = −|∂φ|2 (µt ) = −δ 2 (t) ∀ t > 0. h↓0 dt+ h 79 (5.43) Let now fix t > 0 and an infinitesimal sequence hn such that µ tµtt+hn − i * ṽ t hn weakly in L2 (µt ; Rd ). (5.44) By the definition of subdifferential, it is immediate to check that Z d −|∂φ|2 (µt ) = −kξ t k2L2 (µt ;Rd ) = φ(µt ) ≥ hξ t , ṽ t i dµt . dt+ Rd (5.45) On the other hand kṽ t kL2 (µt ;Rd ) ≤ δ(t) = kξ t kL2 (µt ;Rd ) . (5.46) It follows that ṽ t = −ξ t ; since the limit is uniquely determined independently of the subsequence hn , we obtain that µ lim h↓0 tµtt+h − i = −ξ t h weakly in L2 (µt ; Rd ). (5.47) On the other hand µt+h tµt − i W2 (µt , µt+h ) = δ(t) = kξ t kL2 (µt ;Rd ) lim sup = lim sup h h h↓0 h↓0 L2 (µt ;Rd ) and therefore the limit in (5.47) is also strong in L2 (µt ; Rd ). The same argument can be applied for the left limit at each continuity point of the map t 7→ |∂φ|(µt ) (whose complement C in (0, +∞) is at most countable) i.e. for every t such that lim |∂φ|(µt−h ) = |∂φ|(µt ), (5.48) d φ(µt ) == −|∂φ|2 (µt ) dt (5.49) h↓0 observing that in this case ∃ and (by the L 1 -a.e. equality of |µt |0 and |∂φ(µt )| and the monotonicity of eλt |∂φ(µt )|) Z Z W2 (µt−h , µt ) 1 t 1 t = |µ0s | ds = |∂φ|(µs ) ds h h t−h h t−h ≤ 1 − e−λh |∂φ|(µt−h ), λh and therefore µt−h tµt − i W2 (µt−h , µt ) lim sup ≤ |∂φ|(µt ) = lim sup 2 h h d h↓0 h↓0 L (µt ;R ) Anche qui ho inserito una correzione dipendente da lambda if t ∈ (0, +∞)\C. (5.50) 80 5.3 Existence of Gradient Flows by convergence of the “Minimizing Movement” scheme The existence of solutions to the Cauchy problem for (5.3) will be obtained as limit of a variational approximation scheme (the “Minimizing Movement” scheme, in De Giorgi’s terminology [25], which we will briefly recall. The variational approximation scheme Let us introduce a uniform partition Pτ of (0, +∞) by intervals Iτn of size τ > 0 Pτ := 0 < t1τ = τ < t2τ = 2τ < · · · < tnτ = nτ < · · · , Iτn := ((n − 1)τ, nτ ], and a given family of “discrete” values Mτ0 approximating the initial value µ0 ∈ D(φ) so that in P2 (Rd ), Mτ0 → µ0 φ(Mτ0 ) → φ(µ0 ) as τ ↓ 0. (5.51) If (5.1c) and (5.1d) are satisfied, for every τ ∈ (0, τ∗ ) we can find sequences (Mτn )n∈N ⊂ P2a (Rd ) recursively defined by solving the variational problem Mτn minimizes µ 7→ Φ(τ, Mτn−1 ; µ) = 1 2 W (µ, Mτn−1 ) + φ(µ). 2τ 2 (5.52) We call “discrete solution” the piecewise constant interpolant M τ (t) :=Mτn if t ∈ ((n − 1)τ, nτ ], (5.53) and we say that a curve µt is a Minimizing Movement of Φ starting from µ0 , writing µt ∈ M M (Φ; µ0 ), if there exists a family of discrete solutions M τ such that M τ (t) → µt in P2 (Rd ) for every t > 0, as τ ↓ 0. (5.54) In order to clarify why this variational scheme provides an approximation of the Gradient Flow equation (5.3), we introduce the optimal transport maps M n−1 tnτ = tMττn pushing Mτn to Mτn−1 , and we define the discrete velocity vector Vτn as (i − tnτ )/τ . By Lemma 4.4 and Theorem 4.19 −Vτn = tnτ − i ∈ ∂φ(Mτn ), τ (5.55) which can be considered as an Euler implicit discretization of (5.3). By introducing the piecewise constant interpolant Vτ (t) := Vτn if t ∈ ((n − 1)τ, nτ ], (5.56) the identity (5.55) reads −Vτ (t) ∈ ∂φ(M τ (t)) 81 for t > 0. (5.57) By general compactness argument it would not be difficult to show that, up to subsequences, Vτ M τ * vµ in the distribution sense in Rd × (0, +∞), for some vector field v(t, x) = v t (x) satisfying in Rd × (0, +∞), ∂t µt + ∇ · (v t µt ) = 0 kv t kL2 (µt ;Rd ) ∈ L2loc (0, +∞). (5.58) The main difficulty is to show that the nonlinear equation (5.57) is preserved in the limit. Theorem 5.8 (Existence and approximation of Gradient Flows) Let us assume that φ : P2 (Rd ) → (−∞, +∞] satisfy (5.1a,b,c) and at least one of the following two conditions i) (Coercivity) For every C > 0 the sublevels n o µ ∈ P2 (Rd ) : φ(µ) ≤ C, m2 (µ) ≤ C are compact in P2 (Rd ). (5.59) ii) (Generalized convexity) For every µ ∈ D(|∂φ|) and σ0 , σ1 ∈ D(φ) s 7→ φ σs − λ W 2 (σ0 , σ1 )s2 2 2 is convex in [0, 1]. the map σ := (1 − s)tσ0 + stσ1 µ s µ µ (5.60) # Then for every µ0 ∈ D(φ) there exists a unique solution µt of the gradient flow (according to Definition 5.2) satisfying the Cauchy condition lim µt = µ0 t↓0 in P2 (Rd ). (5.61) Moreover, for every choice of the discrete initial values Mτ0 satisfying (5.51), the discrete solutions M τ (t) converge to µt in P2 (Rd ), uniformly in each bounded time interval. Finally, if condition ii) holds with λ ≥ 0 and Mτ0 = µ0 ∈ D(φ), for every t = kτ ∈ Pτ we have the apriori error estimate τ2 W22 (µt , M τ (t)) ≤ τ φ(µ0 ) − φτ (µ0 ) ≤ |∂φ|2 (µ0 ), 2 (5.62) where we set φτ (µ) := inf ν∈P2 (Rd ) φ(ν) + 1 2 W (µ, ν) = inf Φ(τ, µ; ν). 2τ 2 ν∈P2 (Rd ) (5.63) Proof. We shall only give a brief sketch of the proof (showing a rough error estimate, still sufficient to prove convergence) in a rather simplified setting, by assuming that the generalized convexity assumption (5.60) holds for λ ≥ 0, φ is nonnegative, and µ0 , Mτ0 ∈ D(φ). 82 Aggiungere qualcosa sul caso coercivo ? As a preliminary remark, let us observe that if σs is defined as in (5.60) we have Z 2 W22 (µ, σs ) = (1 − s)tσµ0 + stσµ1 − i dµ Rd Z 2 2 σ1 σ0 σ0 σ1 = (1 − s)tµ − i + stµ − i − s(1 − s)tµ − tµ dµ Rd Z 2 σ0 2 2 = (1 − s)W2 (µ, σ0 ) + sW2 (µ, σ1 ) − s(1 − s) tµ − tσµ1 dµ ≤ (1 − s)W22 (µ, σ0 ) + sW22 (µ, σ1 ) − s(1 − Rd s)W22 (σ0 , σ1 ). (5.64) This inequality reflects a nice convexity property of the functional Φ defined in (5.52) and provides the starting point of our estimates. A “metric variational inequality” for Mτn . The first step consists in writing a variational inequality for the discrete solution, analogous to (5.7): here we will use in a crucial way (5.60) and (5.64). In fact, it is easy to see that they yield the following strong convexity property for the functionals s 7→ Φ(τ, µ; σs ) Φ(τ, µ; σs ) ≤ (1 − s)Φ(τ, µ; σ0 ) + sΦ(τ, µ; σ1 ) − 1 s(1 − s)W22 (σ0 , σ1 ). (5.65) 2τ Starting from the minimum property (5.52) and applying (5.65) with µ := Mτn−1 , σ0 := Mτn , σ := σ1 ∈ D(φ), we get Φ(τ, Mτn−1 ; Mτn ) ≤ Φ(τ, Mτn−1 ; σs ) ≤(1 − s)Φ(τ, Mτn−1 ; Mτn ) + sΦ(τ, Mτn−1 ; σ) − 1 s(1 − s)W22 (Mτn , σ). 2τ The minimum condition says that the right derivative at s = 0 of the right hand side is nonnegative; thus we find Φ(τ, Mτn−1 ; σ) − Φ(τ, Mτn−1 ; Mτn ) − 1 2 W (Mτn , σ) ≥ 0 2τ 2 ∀ σ ∈ D(φ), which can also be written as 11 2 1 W2 (Mτn , σ) − W22 (Mτn−1 , σ) ≤ φ(σ) − φ(Mτn ) τ 2 2 1 2 W (Mτn , Mτn−1 ). − 2τ 2 (5.66) (5.67) A continuous formulation of (5.67). We want to write (5.67) as a true differential evolution inequality for the discrete solution M τ , in order to compare two discrete solutions corresponding to different time steps τ, η > 0, and to try to reproduce the same comparison argument which we used in Theorem 5.5. Therefore, we set φτ (t) := “the linear interpolant of φ(Mτn−1 ) and φ(Mτn )” if t ∈ (tn−1 , tnτ ], τ 83 i.e. φτ (t) := tnτ − t t − tn−1 τ φ(Mτn−1 ) + φ(Mτn ) τ τ t ∈ (tn−1 , tnτ ]. τ (5.68) Analogously, for any σ ∈ D(φ) we set Wτ2 (t; σ) := t − tn−1 tnτ − t 2 τ W2 (Mτn−1 , σ) + W22 (Mτn , σ) τ τ t ∈ (tn−1 , tnτ ]. τ (5.69) Since d 2 1 2 W2 (Mτn , σ) − W22 (Mτn−1 , σ) Wτ (t; σ) = dt τ t ∈ (tn−1 , tnτ ], τ neglecting the last negative term, (5.67) becomes d 1 2 1 Wτ (t; σ) ≤ φ(σ) − φτ (t) + Rτ (t) dt 2 2 ∀ t ∈ (0, T ) \ Pτ , (5.70) where we set, for t ∈ (tn−1 , tnτ ], τ 1 tn − t Rτ (t) := φτ (t) − φ(Mτn ) = τ φ(Mτn−1 ) − φ(Mτn ) ≥ 0. 2 τ (5.71) The comparison argument. We consider now another time step η > 0 inducing the partition Pη , a corresponding discrete solution (Mηk ), and the piecewise linear interpolating functions 2 Wτ,η (t, s) := s − tk−1 tkη − s 2 η Wτ (t, Mηk ) + Wτ2 (t, Mηk−1 ) s ∈ (tk−1 , tkη ], (5.72) η η η observing that Forse qui ci vorrebbe una spiegazione in piu’ 2 2 Wτ,η (t, s) = Wη,τ (s, t) ∀ s, t ≥ 0, 2 Wτ,η (tnτ , skη ) = W22 (Mτn , Mηk ). (5.73) Taking a convex combination w.r.t. the variable s ∈ Iτk of (5.70) written for σ := Mηk−1 and σ := Mηk , we easily get ∂ 1 2 1 W (t, s) ≤ φη (s) − φτ (t) + Rτ (t) ∂t 2 τ,η 2 t ∈ (0, +∞) \ Pτ , s > 0. (5.74) Reversing the rôles of η and τ , and recalling (5.73), we also find ∂ 1 2 1 Wτ,η (t, s) ≤ φτ (t) − φη (s) + Rη (s) ∂s 2 2 t > 0, s ∈ (0, +∞) \ Pη . (5.75) Summing up (5.74) and (5.75) we end up with ∂ 2 ∂ 2 Wτ,η (t, s)+ Wτ,η (s, t) ≤ Rτ (t)+Rη (s) ∂t ∂s 84 t ∈ (0, +∞)\Pτ , s ∈ (0, +∞)\Pη . (5.76) Choosing s = t we eventually find d 1 2 W (t, t) ≤ Rτ (t) + Rη (t) dt 2 τ,η t ∈ (0, ∞) \ (Pτ ∪ Pη ), (5.77) 2 and therefore, being t 7→ Wτ,η (t, t) continuous, 2 2 Wτ,η (T, T ) ≤ Wτ,η (0, 0) + 1 2 Z T Rτ (t) + Rη (t) dt ∀ T > 0. (5.78) 0 Observe now that Z +∞ Rτ (t) dt = 0 +∞ Z X j=1 tjτ tj−1 τ Rτ (t) dt = +∞ X τ φ(Mτj−1 ) − φ(Mτj ) j=1 (5.79) ≤ τ φ(Mτ0 ), so that (5.78) yields 2 Wτ,η (T, T ) ≤ W22 (Mτ0 , Mη0 ) + τ η φ(Mτ0 ) + φ(Mη0 ) 2 2 ∀ T > 0. (5.80) Convergence and rough error estimates. Recalling that W22 (Mτn , Mτn−1 ) ≤ τ φ(Mτ0 ), W22 (Mηk , Mηk−1 ) ≤ ηφ(Mη0 ), and 2 W22 (M τ (t), M η (t)) ≤ 3 Wτ,η (t, t)+W22 (Mτn , Mτn−1 )+W22 (Mηk , Mηk−1 ) t ∈ Iτn ∩Iηk , we get sup W22 (M τ (t), M η (t)) ≤ 3 W22 (Mτ0 , Mη0 ) + 2τ φ(Mτ0 ) + 2ηφ(Mη0 ) , (5.81) t≥0 thus showing that τ 7→ M τ (t) is a Cauchy sequence in P2 (Rd ) for every t ≥ 0. Denoting by µt its limit, we can pass to the limit in (5.80) as η ↓ 0 by taking τ fixed and choosing t ∈ Pτ , thus obtaining the error estimate sup W22 (M τ (t), µt ) ≤ W22 (Mτ0 , µ0 ) + τ φ(Mτ0 ). (5.82) t∈Pτ µt is the gradient flow. To this aim, it suffices to check that µt satisfies the metric Evolution Variational Inequality (5.7) with λ = 0 for every σ ∈ D(φ). Starting from the integrated form of (5.70) and recalling (5.79), we get for every 0 < a < b < +∞ Z b 1 1 2 Wτ (b, σ) − Wτ2 (a, σ) + φτ (t) dt ≤ (b − a)φ(σ) + τ φ(Mτ0 ). (5.83) 2 2 a 85 Since lim Wτ2 (t, σ) = W22 (µt , σ), τ ↓0 lim inf φτ (t) ≥ φ(µt ), τ ↓0 lim φ(Mτ0 ) = φ(µ0 ) < +∞, τ ↓0 we easily get 1 1 2 W (µb , σ) − W22 (µa , σ) + 2 2 2 Z b φ(µt ) dt ≤ (b − a)φ(σ) ∀ σ ∈ D(φ), (5.84) a which yields (5.7). The regularization estimates of Theorem 5.7 (which depend only on the metric E.V.I. formulation) show then that µt ∈ P2a (Rd ) for t > 0. 5.4 Bibliographical notes There are at least four possible approaches to gradient flows which can be adapted to the framework of Wasserstein spaces: 1. The “Minimizing Movement” approximation. We can simply consider any limit curve of the variational approximation scheme we introduced in Section 5.3, a “Generalized minimizing movement” in the terminology suggested by E. De Giorgi in [25]. In the context of P2 (Rd ) this procedure has been first used in [45, 54, 55, 53, 56] and subsequently it has been applied in many different contexts, e.g. by [44, 52, 57, 38, 39, 42, 35, 18, 19, 1, 40, 32, ?]. It has the advantage to allow for the greatest generality of functionals φ, it provides a simple constructive method for proving existence of gradient flows, and it can be applied to arbitrary metric spaces, in particular to Pp (Rd ), the space of probability measures endowed with the p-Wasserstein distance. 2. Curves of Maximal Slope. We can look for absolutely continuous curves 2 ((0, +∞); P2 (Rd )) which satisfy the differential form of the µt ∈ ACloc Energy inequality d 1 1 φ(µt ) ≤ − |µ0 |2 (t) − |∂φ|2 (µt ) ≤ −|∂φ|(µt ) · |µ0 |(t) dt 2 2 (5.85) for L 1 -a.e. t ∈ (0, +∞). This definition has been introduced (in a slight different form) in [26] and further developed in [27, 49, ?], it is still purely metric and it provides a general strategy to deduce differential properties satisfied by the limit curves of the Minimizing Movement scheme. 3. The pointwise differential formulation. It is the notion we adopted in Definition 5.2 and which requires the richest structure: since we have at our disposal a notion of tangent space and the related concepts of velocity vector field v t and (sub)differential ∂φ(µt ), we can reproduce the simple definition of gradient flow modeled on smooth Riemannian manifold, i.e. v t ∈ −∂φ(µt ). 86 (5.86) Questa sezione e’ da completare, anche inserendo parte del materiale finale. The a priori assumption that µt ∈ P2a (Rd ) avoids subtle technical complications arising from the introduction of “plan” (or measure valued) subdifferentials instead of the simpler vector fields. The general theory, also covering the case of an underlying Hilbert space of infinite dimension, has been presented in [?]. Similar approach has been developed in [?]. 4. Systems of Evolution Variational Inequalities (E.V.I.). In the case of λ-convex functionals along geodesics in P2 (Rd ), one can try to find solutions of the family of “metric” variational inequalities 1 d 2 λ W (µt , ν) ≤ φ(ν) − φ(µt ) − W22 (µt , ν) ∀ ν ∈ D(φ). 2 dt 2 2 (5.87) This formulation provides the best kind of solutions, for which in particular one can prove not only uniqueness, but also error estimates and the regularization effect, follwoing the same argument of Theorem 5.7. In any case, the variational approximation scheme is the basic tool for proving existence of gradient flows: at the highest level of generality, when the functional φ does not satisfy any convexity or regularity assumption, one can only hope to prove the existence of a limit curve which will satisfy a sort of “relaxed” differential equation: this point of view is developed in [8, Sec. 11.1.3]. When φ satisfies λ-convexity/compactness condition Theorems 5.3 and 5.8 showed that all these notions essentially coincide. We shall see three different kinds of arguments which give some insights for this problem and reflect different properties of the functional. 1. The first one is a direct development of the compactness method: passing to the limit in the discrete equation satisfied at each step by the approximating sequence Mτn , one tries to write a relaxed form of the limit differential equation, assuming only narrow convergences of weak type. It may happen that under suitable closure and convexity assumptions on the sections of the subdifferential, which should be checked in each particular situation, this relaxed version coincides with the stronger one, and therefore one gets an effective solution to (5.3). In general, however, even under some simplifying assumptions (p = 2, all the measures are regular), the results of this direct approach are not completely satisfactory: it could be considered as a first basic step, which should be common to each attempt to apply the Wasserstein formalism for studying a gradient flow. In order to clarify the basic arguments of this preliminary strategy, we will try to explain it at the end of this section (in a simplified setting, to keep the presentation easier) without invoking all the abstract results of the first part of this book. 87 Materiale temare da risis- 2. The second approach (see Section ??) involves the regularity of the functional according to Definition ??, and works for every p > 1; in particular it can applied to λ-convex functionals. In this case, thanks to Theorem ??, the gradient flow equation is equivalent to the maximal slope condition, which is of purely metric nature. We can then apply the abstract theory we presented in Chapter 2 and therefore we can prove that any limit curve µ of (5.56) is a solution to (5.3). The key ingredient, which allows to pass to the limit in the “doubly nonlinear” differential inclusion (without any restrictions on the regularity of µt ) and to gain a better insight on the limit than the previous simpler method, is the refined discrete energy estimate (??) (related to De Giorgi’s variational interpolation (??)) and the lower semicontinuity of the slope, which follows from the regularity of the functional. 3. The last approach, presented in Section 5.2, is based on the general estimates of Theorem ?? of Chapter 4: it can be performed only in P2 (Rd ) and imposes on the functionals the strongest condition of λ-convexity along generalized geodesics. Despite the strong convexity requirements on φ, which are nevertheless satisfied by all the examples of Section 4.5 in P2 (Rd ), this approach has many nice features: • it does not require compactness assumptions of the sublevels of φ in P2 (Rd ): the convergence of the “Minimizing movement” scheme is proved by a Cauchy-type estimate. • The gradient flow equation (5.3) is satisfied in the limit, since Theorem ?? provides directly the system of evolution variational inequalities (5.7). • It provides our strongest results in terms of regularity, asymptotic behaviour, and error estimates for the continuous solution, which can be directly derived from the general metric setting. • It allows for general initial data µ0 which belong to the closure of the domain of φ: in particular, one can often directly consider a sort of “nonlinear fundamental solution” for initial values which are concentrated in one point. • The Γ-convergence of functionals, in the sense of Lemma ??, induces the uniform convergence of the corresponding gradient flows. 88 6 Applications to Evolution PDE’s First of all we mention the basic (but formal, at this level) example, which provides one of the main motivations to study this kind of gradient flows. 6.1 Gradient flows and evolutionary PDE’s of diffusion type In the space-time open cylinder Rd × (0, +∞) we look for nonnegative solutions u : Rd × (0, +∞) of a parabolic equation of the type δF =0 ∂t u − ∇ · u∇ δu in Rd × (0, +∞), (6.1) where δF (u) = −∇ · Fp (x, u, ∇u) + Fz (x, u, ∇u). δu is the first variation of a typical integral functional as in (4.59a,b). Z F (u) = F (x, u(x), ∇u(x)) dx (6.2) (6.3) Rd associated to a (smooth) Lagrangian F = F (x, z, p) : Rd × [0, +∞) × Rd → R. Observe that (6.1) has the following structure: ∂t u + ∇ · (uv) = 0 uv = u∇ψ ψ=− δF (u) δu (continuity equation), (6.4a) (gradient condition), (6.4b) (nonlinear relation). (6.4c) Observe that in the case when F depends only on z = u then we have δF (u) = Fz (u), δu u∇Fz (x, u) = ∇LF (u), LF (z) := zF 0 (z) − F (z). (6.5) Since we look for nonnegative solutions having (constant, by (6.4a), normalized) finite mass Z u(x, t) ≥ 0, u(x, t) dx = 1 ∀t ≥ 0, (6.6) Rd and finite quadratic momentum Z |x|2 u(x, t) dx < +∞ ∀ t ≥ 0, (6.7) Rd recalling Example 4.5.1, we can identify u with the measures µt := u(·, t) · L d , 89 (6.8) and we consider F as a functional defined in P2 (Rd ). Then any smooth positive function u is a solution of the system (6.4a,b,c) if and only if µ is a solution in P2 (Rd ) of the Gradient Flow equation (5.3) for the functional F . Observe that (6.4a) coincides with (5.4b), the gradient constraint (6.4b) corresponds to the tangent condition v t ∈ Tanµt P2 (Rd ) of (5.4c), and the nonlinear coupling ψ = −δF (u)/δu is equivalent to the differential inclusion v t ∈ −∂F (µt ) of (5.4c). At this level of generality the equivalence between the system (6.4a,b,c) and the evolution equation (5.3) is known only for smooth solution (which, by the way, may not exist); nevertheless, the point of view of gradient flow in the Wasserstein spaces, which was introduced by F. Otto in a series of pioneering and enlightening papers [54, 45, 56, 57], still presents some interesting features, whose role should be discussed in each concrete case. 6.1.1 Changing the reference measure In many situations the choice of the Lebesgue measure L d as a reference measure, thus inducing the identification (6.8), looks quite natural; nevertheless there are some interesting cases where a different measure γ plays a crucial role (see e.g. the example of Section 4.5.4 and the next Section 6.3) and it may happen that an evolution PDE takes a simpler form by an appropriate choice of γ. From the Wasserstein point of view, an integral functional φ inducing the gradient flow is defined on measures µ but its explicit form depends on the reference γ, so that different PDE’s involving the density of µ w.r.t. γ could arise from the same functional. Let us suppose, e.g., that φ takes the integral form Z φ(µ) = Fγ (ρ) = F̃ (x, ρ(x), ∇ρ(x)) dγ(x) if µ = ργ, (6.9) Rd where γ is a Radon measure induced by the (smooth) potential V , i.e. γ := e−V L d ∈ P2 (Rd ). (6.10) Since dµ = eV ρ, ∇u = eV (ρ∇V + ∇ρ), (6.11) dL d the integrand F̃ (x, z̃, p̃) of (6.9) is related to the integrand F of the representation (6.3) by the relation u= z̃ := eV (x) z, p̃ = eV (x) (z∇V (x) + p) (6.12) F (x, z, p) = e−V F̃ (x, z̃, p̃) = e−V (x) F̃ x, eV (x) z, eV (x) (z∇V (x) + p) . In this case it could be better to write the solution of the gradient flow µt generated by φ in terms of the density ρt := dµt dµt = eV , dγ dL d 90 (6.13) and to use the differential operators associated with γ ∇γ ρ :=eV ∇ e−V ρ = ∇ρ − ρ∇V, V −V ∇γ · ξ :=e ∇ · (e (6.14a) ξ) = ∇ · ξ − ξ · ∇V, (6.14b) which satisfy the “integration by parts formulae” with respect to the measure γ Z Z Z Z ξ · ∇ζ dγ = − ζ∇γ · ξ dγ, ∇γ · ξ ζ dγ = − ∇γ ζ · ξ dγ, (6.15) Rd Rd Rd Rd when ζ ∈ Cc∞ (Rd ), ξ ∈ Cc∞ (Rd ; Rd ). The system (6.4a,b,c) preserves the same structure and takes the form ∂t ρ + ∇γ · (ρv) = 0 ρv = ρ∇ψ ψ=− where δFγ (ρ) δρ (continuity equation), (6.16a) (gradient condition), (6.16b) (nonlinear relation). (6.16c) δFγ (ρ) = −∇γ · F̃p̃ (x, ρ, ∇ρ) + F̃z̃ (x, ρ, ∇ρ). δρ (6.17) For, (6.16a) (resp. (6.16b)) can be transformed into (6.4a) (resp. (6.4b)), simply by multiplying the equation by e−V and recalling (6.14b). The equivalence of (6.16c) and (6.4c) follows by a direct computation starting from (6.12): by (6.11) we get (with the obvious convention to evaluate F in (x, u, ∇u) and F̃ in (x, ρ, ∇ρ)) δF (u) = Fz − ∇ · Fp = F̃z̃ + ∇V · F̃p̃ − ∇ · F̃p̃ δu δFγ (ρ) = F̃z̃ − ∇γ · F̃p̃ = . δρ Remark 6.1 (Equations in bounded sets and Neumann B.C.) The possibility to change the reference measure is also useful to study evolution equations in a bounded open set Ω ⊂ Rd : they correspond to a measure γ whose support is included in Ω, e.g. γ := L d |Ω Observe that in any case the family of time-dependent measures µt = ut L d |Ω , which solves of the gradient flow equation according to Definition 5.2, still satisfies the continuity equation (5.4b) in Rd × (0, +∞). This can be seen as a weak formulation of the continuity equation for ut in Ω × (0, +∞) with Neumann boundary conditions on ∂Ω × (0, +∞) ∂t ut + ∇ · (ut v t ) = 0 in Ω × (0, +∞), 91 ut v t · n = 0 on ∂Ω × (0, +∞). (6.18) For, since q := uv ∈ L1 (Ω × (a, b)) for every bounded interval with 0 < a < b < +∞, by choosing a test function ζ(x, t) = η(t)ϕ(x) with η ∈ Cc∞ (0, ∞), ϕ ∈ Cc∞ (Rd ), we obtain Z Z +∞ Z Z +∞ η(t)q(x, t) dt · ∇ϕ(x) dx = − η 0 (t)ρ(x, t) dt ϕ(x) dx. Ω 0 Ω 0 Selecting a family ηε converging to characteristic function of a time interval Rb (a, b) b (0, +∞), we easily see that q̃ a,b (x) := a q(x, t) dt satisfies Z Z ρ(x, b) − ρ(x, a) ϕ(x) dx ∀ ϕ ∈ Cc∞ (Rd ), (6.19) q̃ a,b (x) · ∇ϕ(x) dx = Ω Ω which shows that ∇ · q̃ a,b ∈ L1 (Ω) and ∂n q̃ a,b = 0 on ∂Ω for every 0 < a < b < +∞. 6.2 The linear transport equation for λ-convex potentials Let V : Rd → (−∞, +∞] be a proper, l.s.c. and λ-convex potential. We are looking for curves t 7→ µt ∈ P2 (Rd ) which solve the evolution equation ∂ µt + ∇ · (µt v t ) = 0, ∂t with −v t (x) ∈ ∂V (x) for µt -a.e. x ∈ Rd , (6.20) which is the gradient flow in P2 (Rd ) of the potential energy functional discussed in Example 3.4: Z V(µ) := V (x) dµ(x). (6.21) Rd If V is differentiable, (6.20) can also be written as ∂ µt (x) = ∇ · (µt (x)∇V (x)) ∂t in the distribution sense. (6.22) In the statement of the following theorem we denote by T the λ-contractive semigroup on D(V ) ⊂ Rd induced by the differential inclusion d Tt (x) ∈ −∂V (Tt (x)), dt T0 (x) = x ∀ x ∈ D(V ). Recall also that, according to Brezis theorem, each point of differentiability t > 0. d dt Tt (x) (6.23) equals −∂ ◦ V (Tt (x)) at Theorem 6.2 For every µ0 ∈ P2 (Rd ) with supp µ0 ⊂ D(V ) there exists a unique solution (µt , v) of (6.20) satisfying Z lim µt = µ0 , |v t (x)|2 dµt (x) ∈ L1loc (0, +∞); (6.24) t↓0 Rd this solution is the gradient flow of V in the sense of the E.V.I. formulation (6.20) and the Energy Identity (5.9) of Theorem 5.3, it induces a λ-contractive 92 semigroup on µ ∈ P2 (Rd ) : supp(µ) ⊂ D(V ) , it exhibits the regularizing effect and the asymptotic behaviour as in Theorem 5.7. Moreover, for every t > 0 we have the representation formula v t (x) = −∂ ◦ V (x) µt = (Tt )# µ0 , for µt -a.e. x ∈ Rd . (6.25) Proof. Proposition 3.5 shows that the functional V satisfies (5.1a), (5.1b), (5.1d); it is also easy to check that (5.60) holds. On the other hand, V does not satisfy (5.1c), thus our simplified existence results can not be directly applied. Nevertheless, the more general theory of [8] covers also this case and yields the present result. In any case, the solution to (6.20) can also be directly constructed by the representation formula (6.25). For, it is immediate to check directly that if we choose µ0 of the type µ0 := K X K X αk ≥ 0, αk δxk , k=1 αk = 1, xk ∈ D(V ), (6.26) k=1 then µt = K X αk δTt (xk ) = (Tt )# µ0 (6.27) k=1 solves (6.20) (see also Section 2.6, where the connection between characteristics and solutions of the continuity equation is studied in detail), whereas (6.24) follows by the energy identity Z b |∂ ◦ V (Tt (x))|2 dt + φ(Tb (x)) = φ(Ta (x)) ∀ x ∈ D(V ). a Arguing as in the proof of Theorem 2.23 we also get for every σ ∈ D(V) and every γ ∈ Γo (µt , σ) Z 1 d 2 W (µt , σ) = hv t (x), x − yi dγ t (x, y) 2 dt 2 d d ZR ×R λ V (y) − V (x) − |x − y|2 dγ t (x, y) ≤ 2 Rd ×Rd λ = V(σ) − V(µt ) − W22 (µt , σ). (6.28) 2 µt = (Tt )# µ0 thus solves the E.V.I. formulation (5.7) of the gradient flow for every initial datum µ0 which is a convex combination of Dirac masses in D(V ). A standard approximation argument via (5.17) and Theorem 5.7 yields the same result for µt = (Tt )# µ0 and every admissible initial measure µ0 ∈ D(V): for, being supp µ0 ⊂ D(V ), we can find a sequence (νn ) ⊂ D(V) of convex combination of Dirac masses νn := Kn X αn,k δxn,k , αn,k ≥ 0, k=1 Kn X αn,k = 1, xn,k ∈ D(V ), (6.29) k=1 such that νn → µ0 in P2 (Rd ). 93 6.3 6.3.1 Kolomogorov-Fokker-Planck equation Relative Entropy and Fisher Information Let us consider a l.s.c. λ−convex potential V : Rd → (−∞, +∞] with Ω := Int D(V ) 6= ∅; (6.30) for the sake of simplicity, we assume that the reference measure induced by the potential V is a probability measure with finite quadratic moment, i.e. γ := e−V L d ∈ P2 (Rd ). (6.31) This condition, up to a renormalization, is always satified if, e.g., λ > 0. Observe that the density e−V of γ with respect to L d is 0 outside Ω = D(V ). We adopt the convention to write a measure µ ∈ P2a (Rd ) supported in Ω as µ = uL d |Ω = ργ, u = e−V ρ; the Relative Entropy (see Section 3.3) of µ w.r.t. γ is defined as Z Z H(µ|γ) = ρ log ρ dγ = u log u + V dx, Ω (6.32) (6.33) Ω whereas the Relative Fisher Information is defined as 2 Z Z 2 Z ∇ρ 2 ∇ρ ∇u + u∇V I(µ|γ) := dγ = dx ρ dµ = ρ u Ω Ω Ω (6.34) 1,1 whenever u, ρ ∈ Wloc (Ω) (recall that V is locally Lipschitz in Ω); as usual, we set H(µ|γ) = +∞ if µ is not absolutely continuous, and I(µ|γ) = +∞ if 1,1 ρ 6∈ Wloc (Ω). Let us collect in the following proposition the main properties of these two functionals, we already discussed in Sections 3 and 4. Proposition 6.3 (Entropy and Fisher information) Let V, γ as in (6.30) and (6.31). i) λ-convexity of the Relarive Entropy: The functional µ 7→ H(µ|γ) is λ-displacement convex and it also satisfies the “generalized” convexity assumption of (5.60). ii) Subdifferential and slope of the Entropy: A measure µ = ρ γ = uL d |Ω belongs to D(∂H) = D(|∂H|) iff I(µ|γ) < +∞, i.e. 1,1 ρ, u ∈ Wloc (Ω) and ∇ρ ∇u = + ∇V ∈ L2 (µ; Rd ); ρ u 94 (6.35) Questa proposizione si potrebbe spostare anche alla fine del Capitolo 4. in this case ξ = ∂ ◦ H(µ|γ) ⇐⇒ ξ= ∇ρ ∈ L2 (µ; Rd ), ρ (6.36) so that Z I(µ|γ) = 2 ξ dµ = |∂H|2 (µ). (6.37) Ω iii) Variational inequality for the logarithmic gradient: If I(µ|γ) < +∞, the logarithmic gradient ξ = ∇ρ ρ satisfies Z λ tσµ −x ·ξ+ |tσµ −x|2 dµ ≤ H(σ|γ)−H(µ|γ) ∀ σ ∈ P2a (Rd ). (6.38) 2 Ω iv) Log-Sobolev inequality: If λ > 0 then H(µ|γ) ≤ 1 I(µ|γ) 2λ ∀ µ ∈ P2a (Rd ). (6.39) v) Derivative of the Entropy along curves: Let µ : t ∈ [0, T ] 7→ µt = ρt γ ∈ P2 (Rd ) be a continuous family of measures satisfying the continuity equation ∂t µ + ∇ · v µ = 0 in D0 (Rd × (0, T )) (6.40) for a Borel vector field v with Z TZ 2 v t dµt dt < +∞, 0 Ω Z T I(µt |γ) dt < +∞. (6.41) 0 Then the map t 7→ H(µt |γ) is absolutely continuous in [0, T ] and for L 1 a.e. t ∈ (0, T ) its derivative is Z Z ∇ρt d dµt = v t · ∇ρt dγ. (6.42) H(γ|µt ) = vt · dt ρt Ω Ω Proof. i) follows from Proposition 3.5 and 3.11. The generalized convexity property (5.60) follows by analogous arguments (see [8, Prop. 9.3.9]). ii) and iii) have been proved in 4.20 and (4.36). (6.39) follows from (5.39) v) follows from the general Chain rule (4.55). 6.3.2 Wasserstein formulation of the Kolmogorov-Fokker-Planck equation Under the same assumption (6.30), (6.31) of the previous section, and recalling the differential operators of (6.14a,b), let us introduce the Laplacian operator ∆γ induced by γ: ∆γ ρ := ∇γ · (∇ρ) = eV ∇ · e−V ∇ρ = ∆ρ − ∇ρ · ∇V, (6.43) 95 and its formal adjoint (with respect to the Lebesgue measure) Fokker-Planck operator ∆∗γ u := e−V ∆γ eV u = ∇ · ∇u + u∇V . (6.44) For smooth function with compact support in Ω they satisfy Z Z Z − ∆γ ρ ζ dγ = ∇ρ · ∇ζ dγ = − ρ ∆γ ζ dγ, Ω Ω Ω Z Z − ∆∗γ u ζ dx = − u ∆γ ζ dx. Ω (6.45) (6.46) Ω In the case of the centered Gaussian measure with variance λ−1 . 2 λ 1 1 e− 2 |x| L d V (x) = λ|x|2 − log(λ/2π) , γ = 2 (2π/λ)d/2 (6.47) ∆γ is the Ornstein-Uhlenbeck operator ∆ − λx · ∇. Definition 6.4 (“Wasserstein” solutions of K.F.P. equations) A continuous family µt = ρt γ = ut L d |Ω ∈ C 0 (0, +∞; P2 (Rd )) is a Wasserstein solution of the Kolmogorov-Fokker-Plank equation if t 7→ I(µt |γ) belongs to L2loc (0, +∞) so that for L 1 -a.e. t ∈ (0, +∞) 1,1 ρt , ut ∈ Wloc (Ω), ξt = and ∂t µt − ∇ · µt i.e. Z +∞ Z o − ∂t ζ + Ω ∇ρt ∇ut = + ∇V ∈ L2 (µt ; Rd ), ρt ut ∇ρt =0 ρt in D 0 (Rd × (0 + ∞)), ∇ρt · ∇ζ dµt dt = 0 ρt ∀ ζ ∈ Cc∞ (Rd × (0, +∞)). Equivalently, ρt satisfy the weak formulation Z +∞ Z − ρ ∂t ζ + ∇ρ · ∇ζ dγ dt = 0 ∀ ζ ∈ Cc∞ (Rd × (0, +∞) 0 (6.48) (6.49) (6.50) (6.51) Ω of ∂t ρt − ∆γ ρt = 0 in Ω × (0, +∞), e−V ∂n ρt = 0 on ∂Ω × (0, +∞) (6.52) Remark 6.5 In terms of the Lebesgue density ut , (6.49) reads Z +∞ Z − u ∂t ζ + ∇u + u∇V · ∇ζ dx dt = 0 ∀ ζ ∈ Cc∞ (Rd × (0, +∞), 0 Ω (6.53) corresponding to the Fokker-Planck equation ∂t u − ∆∗γ u = ∂t u − ∇ · ∇u + u∇V ) = 0 in Ω × (0, +∞), (6.54) with variational boundary conditions ∇u + u∇V · n = 0 on ∂Ω × (0, +∞). 96 We introduce the (narrowly) closed (geodsically and linearly) convex subset of P2 (Rd ) n o P2 (Ω) := µ ∈ P2 (Rd ) : supp(µ) ⊂ Ω . (6.55) Theorem 6.6 For every µ0 ∈ P2 (Ω) there exists a unique Wasserstein solution µt = ρt γ = ut L d |Ω of the Kolmogorov-Fokker-Planck equation (6.49) satisfying limt↓0 µt = µ0 in P2 (Rd ) and it coincides with the Wasserstein gradient flow generated by the functional φ(µ) := H(µ|γ). The maps St : µ0 7→ µt , t ≥ 0, define a continuous λ-contractive semigroup in P2 (Ω) which can be chacterized by the system of E.V.I. 1 d 2 λ W (µt , σ) + W22 (µt , σ) ≤ H(σ|γ) − H(µt |γ) 2 dt 2 2 ∀ σ ∈ P2 (Ω); (6.56) it exhibits the regularizing effect H(µt |γ) < +∞, I(µt |γ) < +∞ ∀ t > 0, (6.57) with, for λ ≥ 0, H(µt |γ) ≤ 1 2 W (µt , γ), 2t 2 I(µt |γ) ≤ 1 2 W (µt , γ). t2 2 (6.58) The map t 7→ e2λt I(µt |γ) is non increasing and it satisfies the Energy Identity Z H(µb |γ) + b I(µt |γ) dt = H(µa |γ) ∀ 0 ≤ a ≤ b ≤ +∞. (6.59) a When λ > 0 the Asymptotic behaviour of µt as t0 ≤ t → +∞ is governed by W2 (µt , γ) ≤ e−λ(t−t0 ) W2 (µt0 , γ), H(µt |γ) ≤ e−2λ(t−t0 ) H(µt0 |γ), I(µt |γ) ≤ e−2λ(t−t0 ) I(µt0 |γ), (6.60) Moreover, for every t > 0 (and also for t = 0 provided I(µ0 |γ) < +∞) µ tµtt+h − i ∇ρt = in L2 (µt ; Rd ), h↓0 h ρt H(µt+h |γ) − H(µt |γ) ∃ lim = I(µt |γ). h↓0 h ∃ lim (6.61) Proof. Is is not difficult to check that D(φ) = P2 (Ω). For, D(φ) contains all the measures of the type µx0 ,ρ := 1 χB (x ) · γ γ(Bρ (x0 )) ρ 0 97 with x0 ∈ supp γ, ρ > 0, (6.62) and their convex combinations, so that X αi δxi ∈ D(φ) if xi ∈ supp γ, αi ≥ 0, X i αi = 1. i Since the subset of all the finite convex combinations of δ-measures concntrated in Ω is dense in P2 (Ω), we get (6.62). By Proposition 6.3 the Relative Entropy functional µ 7→ H(µ|γ) satisfies all the assumptions of Theorems 5.3, 5.7, and 5.8(ii). Theorem 6.6 is a simple transposition of the results of Section 4, taking also into account the particular form of the subdifferential of H expressed by (6.36) and the fact that γ is the unique minimum of H with H(γ|γ) = 0. We conclude this section by briefly discussing some further properties of the semigroup constructed by Theorem 6.6. First of all we set n o Bγ := ρ ∈ L1 (γ) : ργ ∈ P2 (Rd ) , (6.63) we introduce the weighted Sobolev space n o 1,2 Wγ1,2 (Ω) := ρ ∈ Wloc (Ω) : ∇ρ ∈ L2 (γ; Rd ) , (6.64) and the “transition probabilities” νx,t = ϑx,t γ νx,t := St [δx ] with densities ϑx,t := dνx,t ∈ L1 (γ) ∀ x ∈ Ω, t > 0. (6.65) dγ As we shall see in the next section, it is possible to find densities ϑx,t such that for every t > 0 the map (x, y) ∈ Ω × Ω → ϑx,t (y) is Borel. (6.66) Theorem 6.7 (The associated Markovian semigroup) Let (St )t≥0 be the semigroup constructed in the previous Theorem 6.6. Extension to a contraction semigroup in Lp (γ) There exists a unique strongly continuous semigroup of linear contraction operators (St )t≥0 in L1 (γ) such that St [ρ0 ] = ρt ⇐⇒ St [ρ0 γ] = ρt γ ∀ ρ0 ∈ Bγ . (6.67) For every p ∈ [1, +∞] (the restriction to Lp (γ) of ) St is a C 0 (only weakly∗ continuous, if p = +∞) contraction semigroup in Lp (γ) St [ρ] p ≤ ρ p ∀ ρ ∈ Lp (γ), (6.68) L (γ) L (γ) it is order preserving ρ 0 ≤ ρ1 ⇒ 98 St [ρ0 ] ≤ St [ρ1 ], (6.69) and regularizing, since St (L∞ (γ)) ⊂ Cb (Ω) Moreover St Lip(Ω) ⊂ Lip(Ω) −λt Lip(St [ρ]; Ω) ≤ e Lip(ρ, Ω) ∀ t > 0. ∀ t > 0, ∀ ρ ∈ Lip(Ω). (6.70) (6.71) Representation formula The semigroups S, S admit the representation formula for γ-a.e. x ∈ Ω Z St [µ] = ρt γ with ρt (x) = ϑy,t (x) dµ(y), (6.72) d ZR St [ρ0 ] = ρt with ρt (x) = ϑy,t (x)ρt (y) dγ(y) (6.73) Ω Dirichlet form In L2 (γ) St coincides with the (analytic) semigroup S̃ associated to the symmetric Dirichlet form with domain Wγ1,2 (Ω) ⊂ L2 (γ) Z aγ (ρ, η) := ∇ρ · ∇η dγ ∀ ρ, η ∈ Wγ1,2 (Ω). (6.74) Ω In particular, if ρ0 ∈ L2 (γ) then ρt = St [ρ0 ] ∈ Wγ1,2 (Ω) ∀ t > 0, ∂t ρ ∈ 0 (0, +∞; L2 (γ)), and Cloc Z Z ∂t ρ η dγ + ∇ρ · ∇η dγ = 0 ∀ η ∈ Wγ1,2 (Ω) ∀ t > 0. (6.75) Ω Ω Symmetry of the transition densities For every t > 0 the transition densities ϑx,t satisfies for γ × γ-a.e. (x, y) ∈ Ω × Ω, (6.76) so that the “adjoint” representation formula holds Z ϑx,t (y)ρt (y) dγ(y), St [ρ0 ] = ρt with ρt (x) = (6.77) ϑx,t (y) = ϑy,t (x) Ω which provides the continuous representative of ρt when ρ0 ∈ L∞ (γ). Proof. Most of the results of the previous theorem are a direct consequence of the “linearity” of the semigroup St and of its regularizing effect; therefore, we postpone their proof to the next section, where we will discuss from a general point of view the construction of a Markov semigroup starting from a “linear” Wasserstein semigroup. Here we only consider the last two properties, establishing the link with the “Dirichlet form” approach. In order to check the equivalence, we observe that for every initial datum ρ0 ∈ L∞ (γ) with ρ0 (x) ≥ r > 0 for γ-a.e. x ∈ Ω, 99 Si puo’ dimostrare a partire dalla formula di rappresentazione (6.72) che St è λcontractive anche per la distanza di Wasserstein Wp , 1 ≤ p ≤ 2: aggiungere? Il caso p > 2 pero’ non riesco a dadurlo. the unique solution ρt of (6.75) still satifies the lower bound ρt ≥ r and the estimates Z Z +∞ Z Z 2 2 2 sup t |∂t ρt | dγ < +∞, |∇ρt | dγ dt = |ρ0 |2 dγ < +∞ (6.78) t>0 Ω 0 Ω Ω In particular, Z +∞ I(ρt γ|γ) dt ≤ r −1 0 Z +∞ Z 0 |∇ρt |2 dγ dt < +∞, (6.79) Ω so that the measures µt = ρt γ provides the (unique) Wasserstein solution of (6.51) (since γ is a finite measure, Cc∞ (Rd ) is a subset of Wγ1,2 (Ω)). Therefore the semigroups S and S̃ coincide on L2 (γ)-densities bounded away from 0: a simple density argument shows that they coincide on L2 (γ). Since aγ is a symmetric form, St is selfadjoint in L2 (γ) and we have Z Z ρ St [η] dγ = St [ρ] η dγ ∀ ρ, η ∈ L2 (γ). (6.80) Ω Ω For every bounded nonnegative ρ, η ∈ L∞ (γ) (6.73) yields Z Z Z Z ρ(x) ϑy,t (x)η(y) dγ(y) dγ(x) = ϑx,t (y)ρ(x) dγ(x) η(y) dγ(y). Ω Ω Ω Ω By (6.66) and Fubini’s Theorem, we get (6.76). (6.81) Remark 6.8 The measures (νx,t )t≥0 are a Markovian semigroup of kernels .associated with (St )t≥0 [48, II-4] 6.3.3 The construction of the Markovian semigroup Among general λ-contracting semigroups in P2 (Rd ) the Kolmogorov-FokkerPlanck equation enjoys several other interesting features, due to its linearity. As we will see in the next Lemma, this is a direct consequence of the following “linearity condition” ) ξ i = ∂ ◦ φ(µi ), αi ≥ 0, α1 + α2 = 1 =⇒ ξ ∈ ∂φ(α1 µ1 + α1 µ2 ). (6.82) ξ(α1 µ1 + α2 µ2 ) = α1 ξ 1 µ1 + α2 ξ 2 µ2 satisfied by the Wasserstein subdifferential of φ(µ) := H(µ|γ). The aim of this section is to show how easily one can deduce contraction and regularizing estimates starting from a “linear” Wasserstein semigroup; in paricular, the construction of the fundamental solutions is particularly simple. It should not be too difficult to extend the following results to infninite dimensional underlying spaces. 100 Lemma 6.9 (Linearity of the gradient flow) Let φ : P2 (Rd ) → (−∞, +∞] be a functional satisfying (5.1a,b,c,d) and let St be the λ-contractive semigroup generated by its gradient flow on D(φ) as in Theorem 5.7. If φ satisfies (6.82), then the semigroup St satisfies the “linearity” property St [α1 µ1 +α2 µ2 ] = α1 St [µ1 ]+α2 St [µ2 ] ∀ µ1 , µ2 ∈ D(φ), α1 , α2 ≥ 0, α1 +α2 = 1. (6.83) Proof. Take two initial data µ1 , µ2 ∈ D(φ) and set µi,t := St [µi ], v i,t = −∂ ◦ φ(µi,t ) their velocity vector fields, µt = α1 µ1,t + α2 µ2,t , and define the vector field v t so that v t µt := α1 v 1,t µ1,t + α2 v 2,t µ2,t . (6.84) Assuming αi > 0 and introducing the densities ρi,t := dµi,t , dµt so that v t = α1 ρ1,t v 1,t + α2 ρ2,t v 2,t , α1 ρ1,t + α2 ρ2,t = 1, it is easy to check that for every t > 0 Z Z |v t |2 dµt = |α1 ρ1,t v 1,t + α2 ρ2,t v 2,t |2 dµt Rd Rd Z Z 2 ≤ α1 |v 1,t | ρ1,t dµt + α2 |v 2,t |2 ρ2,t dµt Rd Rd Z Z = α1 |v 1,t |2 dµ1,t + α2 |v 2,t |2 dµ2,t . Rd (6.85) Rd It follows that the map t 7→ kv t kL2 (µt ;Rd ) belongs to L2loc (0, +∞) and, by linearity, µt satisfies the continuity equation ∂t µt + ∇ · (v t µt ) = 0 in Rd × (0, +∞). (6.86) Since v t ∈ ∂φ(µt ) by (6.82), µt is the unique gradient flow with initial datum α1 µ1 + α2 µ2 , that is µt = St (α1 µ1 + α2 µ2 ). Let γ be a nonnegative Borel measure on Rd , with support D and let Bγ be defined as in (6.63). Theorem 6.10 Let St , t ≥ 0, be a continuous λ-contracting semigroup defined in P2 (D) satisfying the follwing assumptions: S0 [µ] = µ, St Ss [µ] = St+s [µ] ∀ µ ∈ P2 (D), s, t ≥ 0. (6.87a) St [µ] γ St [αµ + βν] = αSt [µ] + βSt [ν] ∀ µ ∈ P2 (D), t > 0. (6.87b) ∀ µ, ν ∈ P2 (D), α, β ≥ 0, α + β = 1. (6.87c) Extension to L1 (γ) There exists a unique C 0 semigroup (denoted by St ) of bounded linear operators on L1 (γ) such that St [ργ] = St [ρ]γ 101 ∀ ρ ∈ Bγ . (6.88) Contraction and order preserving properties St is in fact a contraction and order preserving semigroup, i.e. St [ρ] 1 ≤ ρ 1 , ρ1 ≤ ρ2 ⇒ St [ρ1 ] ≤ St [ρ2 ]. (6.89) L (γ) L (γ) Representation formula Denoting by νt,x = θt,x γ the “transition probabilities” νx,t := St [δx ] with densities ϑx,t := dνx,t ∈ L1 (γ) dγ ∀ x ∈ D, t > 0, (6.90) the semigroup S admits the representation formula Z St [µ] = ρt γ with ρt (x) = ϑy,t (x) dµ(y) for γ-a.e. x ∈ D. (6.91) Rd Invariant measure and Markov property If γ ∈ P2 (Rd ) is an invariant measure, i.e. St [γ] = γ ∀ t ≥ 0, (6.92) then St Lp (γ) ⊂ Lp (γ), p (6.93) 0 and the restriction of St to L (γ) is a C contraction semigroup. Proof. Let us first extend S by homogeneity to the cone M2 (D) of nonnegative finite measures with finite second moment n o M2 (D) := λµ : µ ∈ P2 (D), λ ≥ 0 (6.94) simply by setting St [λµ] = λSt [µ] ∀ µ ∈ P2 (D), λ ≥ 0. (6.95) It is easy to check that this extension preserves properties (6.87a,b,c) moreover (6.83) holds for every couple of nonnegative coefficients α, β: St [αµ + βν] = αSt [µ] + βSt [ν] ∀ µ, ν ∈ M2 (D), α, β ≥ 0. (6.96) The uniqueness of S is then immediate: if ργ ∈ M2 (D) then by (6.88) and (6.87b) St [ργ] St [ρ] = . (6.97) γ Being St coontinuous, it is sufficient to determine it on the set Z n o Cγ := ρ ∈ L1 (γ) : |x|2 |ρ(x)| dγ < +∞ , (6.98) which is clearly dense in L1 (γ); since each ρ ∈ Cγ can be decomposed as ρ = ρ+ − ρ− where 102 ρ+ γ, ρ− γ ∈ M2 (D) (6.99) St [ρ] should be equal to the difference between St [ρ+ ] and St [ρ− ]. Let us check that this representation is independent of the particular decomposition: if ρ0= , ρ0− is another admissible couple as in (6.99), then ρ+ + ρ0− = ρ0+ + ρ− and therefore St [ρ+ ] + St [ρ0− ] = St [ρ+ + ρ0− ] = St [ρ0+ + ρ− ] = St [ρ0+ ] + St [ρ− ] showing that St [ρ+ ] − St [ρ− ] = St [ρ0+ ] − St [ρ0− ]. Choosing in particular ρ+ := max[ρ, 0] ρ− := − min[ρ, 0] we get the bound St [ρ] 1 ≤ St [ρ+ ] 1 +St [ρ− ] 1 = ρ+ 1 +ρ− 1 = ρ 1 L (γ) L (γ) L (γ) L (γ) L (γ) L (γ) (6.100) which shows that St is non expansive. Therefore, it can also be extended to a non expansive linear operator on L1 (γ). Let us now check check that S is a C 0 -semigroup in L1 (γ); it is well known (see e.g. [59]) that it is sufficient to prove that S is weakly continuous, i.e. St [ρ] * ρ weakly in L1 (γ) as t ↓ 0 ∀ ρ ∈ L1 (γ). (6.101) Since St is non expansive, it is sufficient to check (6.101) on the dense subset Cγ ; by linearity we can further reduce our analysis to ρ ∈ Bγ . Since St is induced by St through (6.88) on this set, we know that St [ρ]γ → ργ in P2 (Rd ) as t ↓ 0, (6.102) and this yields the required weak convergence in L1 (γ) thanks to the next Lemma. By the same argument one can show that µn → µ in P2 (Rd ) =⇒ St [µn ] St [µ] * γ γ weakly in L1 (γ). (6.103) In particular, for every t > 0 the map x ∈ D 7→ ϑx,t is weakly continuous in L1 (γ), therefore it is also a strongly measurable L1 (γ)-valued map, with Z Z |ϑx,t (y)| dγ(y) dγ(x) = 1. D (6.104) (6.105) D It follows that the map (6.104) belongs to L1 (γ; L1 (γ)) which is isomorphic to L1 (γ × γ), thus showing (6.66). (6.104) also yields Z ϕ(y)ϑx,t (y) dγ(y) is continuous ∀ ϕ ∈ L∞ (γ). (6.106) the map x 7→ D 103 In order to prove the P representation formula (6.91) we observe that for every initial measure ν = i αi δxi ∈ P2 (D) and every ϕ ∈ Cb0 (D), νt = St [ν] satisfies Z X Z ϕ(y) dνt (y) = αi ϕ(y)ϑxi ,t (y) dγ(y) = D D i = Z Z D (6.107) ϕ(y)ϑx,t (y) dγ(y) dν(x). D Therefore, by approximating an arbitrary measure µ by a sequence of conceP trated measures ν k = i αik δxki converging in P2 (D), since St [ν k ] → St [µ] = µt = ρt γ in P2 (Rd ) (6.106) yields Z Z Z ϕ(y) ρt (y) dγ(y) = ϕ(y)ϑx,t (y) dγ(y) dµ(x). (6.108) D D D and therefore (6.91) by Fubini’s Theorem. Finally, if γ is an invariant measure, then St [1] = 1; the order preserving property shows that kSt [ρ]kL∞ (γ) ≤ kρkL∞ (γ) . By interpolation, the same property holds for every space Lp (γ). Lemma 6.11 (Narrow and weak L1 -convergence) Let µn = ρn γ, µ = ργ ∈ P(Rd ) be Borel probability measures absolutely continuous w.r.t. the Radon measure γ. Then µn → µ narrowly in P(Rd ) ⇐⇒ ρn * ρ weakly in L1 (γ). (6.109) The proof is an immediate consequence of the equiintegrability criterium stated in [7, Remark 5.33]. For every function ζ ∈ L∞ (γ) we can thus define Z ζt (x) = St∗ [ζ](x) := ζ(y)ϑx,t (y) dγ(y). (6.110) Rd The next result show that St∗ is the adjoint semigroup of St and that is exhibits the Feller regularizing property. Theorem 6.12 (The adjoint semigroup) Under the same assumption of the previous Theorem, the maps St∗ defined by (6.110) are the weakly∗ -continuous, non expansive, adjoint semigroup on L∞ (γ) induced by St , i.e. they satisfy Z Z ∗ St [ζ]ρ dγ = ζSt [ρ] dγ ∀ ρ ∈ L1 (γ). (6.111) Rd Rd Moreover, for every t > 0 St∗ L∞ (γ) ⊂ Cb (D), St∗ Lip(D) ⊂ Lip(D). 104 (6.112) Proof. Let us denote by S̃t∗ the adjoint semigroup, defined as in (6.111), and by ζt the image of ζ ∈ L∞ (γ) by S̃t∗ ; we introduce the measures γx0 ,r := 1 χB (x ) · γ ∈ P2 (Rd ), γ(Br (x0 )) r 0 ∀ x0 ∈ D = supp(γ), r > 0, (6.113) satisfying in P2 (Rd ) γx0 ,r → δx0 as r ↓ 0 ∀ x0 ∈ D. By (6.103) we know that that for every x0 ∈ D the limit Z 1 ζ̃t (x0 ) := lim ζt (x) dγ(x) r↓0 γBr (x0 ) B (x ) r 0 (6.114) (6.115) exists since 1 γBr (x0 ) Z Z ζt (x) dγ(x) = ζt χBr (x0 ) dγ γBr (x0 ) Z χ Br (x0 ) dγ = ζ d St (γx0 ,r ) = ζSt γBr (x0 ) Rd Rd Rd Br (x0 ) Z and therefore Z Z ζ d St (δx0 ,r ) = ζ̃t (x0 ) = Rd ζ θx0 ,r dγ. (6.116) Rd Lebesgue differentiation Theorem yields ζ̃t (x) = ζt (x) = St∗ [ζ](x) for γ-a.e. x ∈ D, thus showing that S̃ ∗ = S ∗ . (6.117) and the fact that limx→x0 δx = δx0 in P2 (Rd ) shows that θx,t * θx0 ,t weakly in L1 (γ) as x → x0 ∀ t > 0, (6.117) and therefore ζ̃t is the continuous representative of ζt ; this also show the first inclusion of (6.112). The second inclusion of (6.53) follows easily, since for each ζ ∈ Lip(D) with ζt = St∗ (ζ) and each couple of points x, y ∈ D we have Z Z ζt dνx,t − ζt dνy,t ≤ Lip(ζ; D)W2 (νx,t , νy,t ) |ζt (x) − ζt (y)| ≤ Rd Rd −λt ≤ Lip(ζ; D)e W2 (δx , δy ) = e−λt Lip(ζ; D)|x − y|. 6.4 Nonlinear diffusion equations In this section we consider the case of nonlinear diffusion equations in Rd . Let us consider a convex differentiable function F : [0, +∞) → R which satisfies (4.71), (4.76), and (4.78): F is the density of the internal energy functional F defined in (4.70). 105 Setting LF (z) := zF 0 (z) − F (z), we are looking for nonnegative solution of the evolution equation in Rd × (0, +∞), (6.118a) satisfying the (normalized) mass conservation Z ut ∈ L1 (Rd ), ut (x) dx = 1 ∀ t > 0, (6.118b) ∂t ut − ∆(LF (ut )) = 0 Rd the finiteness of the quadratic moment Z |x|2 ut (x) dx < +∞ ∀ t > 0, (6.118c) Rd the integrability condition LF (u) ∈ L1loc (Rd × (0, +∞)), and the initial Cauchy condition lim ut · L d = µ0 in P2 (Rd ). (6.118d) t↓0 Therefore (6.118a) has the usual distributional meaning Z +∞ Z − ut ∂t ζ + LF (ut )∆ζ dx dt = 0 ∀ ζ ∈ D(Rd × (0, +∞). 0 Rd Theorem 6.13 Suppose that F has a superlinear growth as in (4.72). Then for every µ0 ∈ P2 (Rd ) there exists a unique solution 2 u ∈ ACloc ((0, +∞); P2 (Rd )) of (6.118a,b,c,d) among those satisfying Z |∇LF (u)|2 1,1 1 d LF (u) ∈ Lloc ((0, +∞); Wloc (R )), dx ∈ L1loc (0, +∞). (6.119) u Rd The map t 7→ St [µ0 ] = µt = ut L d is the unique gradient flow in P2 (Rd ) of the functional F defined in (4.70), which is geodesically convex (and also satisfies (5.60) with λ = 0). The gradient flow satisfies all the properties of Theorem 5.7 for λ = 0. In particular, it is characterized by the system of E.V.I. 1 d 2 W (µt , σ) ≤ F(σ) − F(µt ) 2 dt 2 it is non expansive W2 (St [µ0 ], St [ν0 ]) ≤ W2 (µ0 , ν0 ) ∀ σ ∈ D(F), ∀ µ0 , ν0 ∈ P2 (Rd ), t ≥ 0, (6.120) (6.121) and regularizing Z sup t 0<t≤1 sup t2 0<t≤1 Z t 7→ Rd F (ut (x)) dx < +∞, Rd Z Rd |∇LF (ut (x))|2 dx < +∞, ut (x) |∇LF (ut (x))|2 dx ut (x) 106 is not increasing, (6.122) and for every t > 0 µ ∇LF (ut ) tµtt+h − i = in L2 (µt ; Rd ), h↓0 h ut Z F(µt+h ) − F(µt ) |∇LF (ut (x))|2 dx. ∃ lim = h↓0 h ut (x) Rd ∃ lim (6.123) Proof. The proof is a simple combination of theorems 5.3, 5.7, and 5.8(ii). (see [8, Theorem 9.3.9]) and of the results of Section 4.5.3 for the functional F, noticing that the domain of F is dense in P2 (Rd ). Remark 6.14 When F has a sublinear growth and satisfies lim z→+∞ F (z) = 0, z lim z→+∞ F (z) = −∞, z 1−1/d (6.124) then it is possible to prove [8, Thm. 10.4.8] that F still satisfies (5.1c) and the Wasserstein semigroup generated by F provides the unique solution of (6.118a) in the above precise meaning: for, even if µ0 is not regular (e.g. a Dirac mass), the regularizing effect of the Wasserstein semigroup show that µt := S[µ0 ](t) is absolutely continuous w.r.t. the Lebesgue measure L d for all t > 0: its density ut w.r.t. L d is therefore well defined and solves (6.118a). Remark 6.15 Equation (6.118a) is a very classical problem: it has been studied by many authors from different points of view, which is impossibile to recall in detail here. We only mention that in the case of homogeneous Dirichlet boundary conditions in a bounded domain, H. Brézis showed that the equation is the gradient flow (see [15]) of the convex functional (since LF is monotone) Z Z u ψ(u) := GF (u) dx, where GF (u) := LF (r) dr, Rd 0 −1 in the space H (Ω). We refer to the paper of Otto [57] for a detailed comparison of the two notions of solutions and for a physical justification of the interest of the Wasserstein approach. It is also possible to prove that the differential operator −∆(LF (u)) is maccretive in L1 (Rd ) and therefore it induces a (nonlinear) contraction semigroup in L1 (Rd ). Notice that here we allow for more general initial data (an arbitrary probability measure), whereas in the H −1 (or L1 ) formulation Dirac masses are not allowed (but see [60, 22]). 6.5 Drift diffusion equations with non local terms Let us consider, as in [20, 21], a functional φ which is the sum of internal, potential, and interaction energy: Z Z Z 1 φ(µ) := F (u) dx + V dµ + W dµ × µ if µ = uL d . 2 d d d d R ×R R R 107 Here F, V, W satisfy the assumptions considered in Section 4.5.7; as usual we set φ(µ) = +∞ if µ ∈ P2 (Rd ) \ P2a (Rd ). The gradient flow of φ in P2 (Rd ) leads to the equation ∂t ut − ∇ · ∇LF (ut ) + ut ∇V + ut (∇W ) ? ut = 0, (6.125) coupled with conditions (6.118b), (6.118c), (6.118d). Theorem 6.16 For every µ0 ∈ P2 (Rd ) there exists a unique distributional solution ut of (6.125) among those satisfying ut L d → µ0 in P2 (Rd ) as t ↓ 0, 1,1 LF (ut ) ∈ L1loc ((0, +∞); Wloc (Ω)), and ∇LF (ut ) + ∇V + ∇W ? ut (6.126) ∈ L2loc (0, +∞). 2 ut L (µt ;Rd ) Furthermore, this solution is the unique gradient flow in P2 (Rd ) of the functional φ, which is λ-geodesically convex, and therefore satisfies all the properties stated in Theorem 5.7. In particular, when λ > 0 there exists a unique minimizer µ of φ and the gradient flow generates a λ-contracting and regularizing semigroup which exhibit the asymptotic behaviour of (5.22a,b,c,d). Proof. The existence of ut follows by Theorem 5.8 (besides ( (5.1)) φ satisfies (5.60), see [8, Theorem 9.3.5]) and by the characterization, given in Section 4.5.7, of the (minimal) subdifferential of φ. The same characterization proves that any ut as in the statement of the theorem is a gradient flow; therefore the uniqueness Theorem 5.5 can be applied. In the limiting case F, V = 0, the generated semigroup looses its regularizing effect and its existence and main properties follows from the more general theory of [8]. In this way it is possibile to study a model equation for the evolution of granular flows (see e.g. [17]); Notice that, as we did in Section 6.3, we can also consider evolution equations in convex (bounded or unbounded) domains Ω ⊂ Rd with homogeneous Neumann boundary conditions, simply by setting V (x) ≡ +∞ for x ∈ Rd \ Ω. 6.6 Gradient flow of −W 2 /2 and geodesics For a fixed reference measure σ ∈ P2 (Rd ) let us now consider the functional φ(µ) := − 21 W22 (µ, σ), as in Theorem 4.19. Being φ (−1)-convex along generalized geodesics, we can apply Theorem 5.7 to show that φ generates an evolution semigroup on P2 (Rd ). When Γo (σ, µ0 ) contains a plan γ such that π 1 , π 1 + T (π 2 − π 1 ) # γ is optimal for some T > 1, (6.127) then the semigroup moves µ0 along the geodesics induced by γ. Lemma ?? shows that in this case γ admits the representation γ = (r × i)# µ0 for some transport map r and γ is the unique element of Γo (σ, µ0 ). 108 Per quest’ultima applicazione non saprei bene cosa fare: tradurla nel caso di misure a.c. solo per mostrare che la geodetica estesa e’ un gradient flow? Theorem 6.17 Let be given two measures σ, µ0 ∈ P2 (Rd ) and suppose that γ ∈ Γo (σ, µ0 ) satisfies (6.127), i.e. the constant speed geodesic γ(s) := (1 − s)π 1 + sπ 2 # γ can be extended to an interval [0, T ], with T > 1. Then the formula t → µ(t) := γ(et ), for 0 ≤ t ≤ log(T ), (6.128) gives the gradient flow of µ 7→ − 21 W22 (µ, σ) starting from µ0 . Proof. Lemma ?? shows that (1 − et )π 1 + et π 2 , π 1 # γ is the unique optimal plan in Γo (µ(t), σ); therefore W22 (µ(t), µ(t)) = |et − et |2 W22 (µ0 , σ), W22 (µ(t), σ) = e2t W22 (µ0 , σ), (6.129) so that d φ(µ(t)) = −e2t W22 (µ0 , σ) = −|µ0 |2 (t). dt On the other hand, the characterization of |∂φ| given in (4.104) gives (6.130) |∂φ|2 (µ(t)) = e2t W22 (µ0 , σ). This shows that µ(t) is a curve of maximal slope; combining Theorem ?? with the uniqueness Theorem 5.5, we conclude. 6.7 Bibliographical notes a) The gradient flow formulation (5.3) suggests a general variational scheme (the Minimizing Movement approach, which we discussed in the first part of this book and which we will apply in the next sections) to approximate the solution of (6.4a,b,c): proving its convergence is interesting both from the theoretical (cf. the papers quoted at the beginning of the Chapter) and the numerical point of view [47]. b) The variational scheme exhibits solutions which are a priori nonnegative, even if the equation does not satisfies any maximum principle as in the fourth order case [55, 40]. c) Working in Wasserstein spaces allows for weak assumptions on the data: initial values which are general measures (as for fundamental solutions, in the linear cases) fit quite naturally in this framework. d) The gradient flow structure suggests new contraction and energy estimates, which may be useful to study the asymptotic behaviour of solutions to (6.4a,b,c) [57, 11, 17, 21, 2, 63, 31], or to prove uniqueness under weak assumptions on the data. 109 e) The interplay with the theory of Optimal Transportation provides a novel point of view to get new functional inequalities with sharp constants [58, 64, 3, 23, 10, 28]. f) The variational structure provides an important tool in the study of the dependence of solutions from perturbation of the functional. g) The setting in space of measures is particularly well suited when one considers evolution equations in infinite dimensions and tries to “pass to the limit” as the dimension d goes to ∞. 110 References [1] M. Agueh, Existence of solutions to degenerate parabolic equations via the Monge-Kantorovich theory, Adv. Differential Equations, to appear, (2002). [2] , Asymptotic behavior for doubly degenerate parabolic equations, C. R. Math. Acad. Sci. Paris, 337 (2003), pp. 331–336. [3] M. Agueh, N. Ghoussoub, and X. Kang, The optimal evolution of the free energy of interacting gases and its applications, C. R. Math. Acad. Sci. Paris, 337 (2003), pp. 173–178. [4] G. Alberti and L. Ambrosio, A geometrical approach to monotone functions in Rn , Math. Z., 230 (1999), pp. 259–316. [5] A. D. Aleksandrov, A theorem on triangles in a metric space and some of its applications, in Trudy Mat. Inst. Steklov., v 38, Trudy Mat. Inst. Steklov., v 38, Izdat. Akad. Nauk SSSR, Moscow, 1951, pp. 5–23. [6] L. Ambrosio, Lecture notes on optimal transport problem, in Mathematical aspects of evolving interfaces, CIME summer school in Madeira (Pt), P. Colli and J. Rodrigues, eds., vol. 1812, Springer, 2003, pp. 1–52. [7] L. Ambrosio, N. Fusco, and D. Pallara, Functions of bounded variation and free discontinuity problems, Oxford Mathematical Monographs, Clarendon Press, Oxford, 2000. [8] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces and in the spaces of probability measures, Lectures in Mathematics ETH, Birkhäuser, Zürich, 2005. [9] L. Ambrosio and P. Tilli, Topics on Analysis in Metric Spaces, Oxford University Press, Oxford, 2004. [10] A. Arnold and J.Dolbeault, Refined convex Sobolev inequalities, Tech. Rep. 431, Ceremade, 2004. [11] A. Arnold, P. Markowich, G. Toscani, and A. Unterreiter, On convex Sobolev inequalities and the rate of convergence to equilibrium for Fokker-Planck type equations, Comm. Partial Differential Equations, 26 (2001), pp. 43–100. [12] J.-D. Benamou and Y. Brenier, A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numer. Math., 84 (2000), pp. 375–393. [13] V. I. Bogachev, Gaussian measures, vol. 62 of Mathematical Surveys and Monographs, American Mathematical Society, Providence, RI, 1998. 111 [14] V. I. Bogachev, G. Da Prato, and M. Röckner, Existence of solutions to weak parabolic equations for measures, Proc. London Math. Soc. (3), 88 (2004), pp. 753–774. [15] H. Brézis, Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert, North-Holland Publishing Co., Amsterdam, 1973. North-Holland Mathematics Studies, No. 5. Notas de Matemática (50). [16] G. Buttazzo, Semicontinuity, relaxation and integral representation in the calculus of variations, vol. 207 of Pitman Research Notes in Mathematics Series, Longman Scientific & Technical, Harlow, 1989. [17] E. Caglioti and C. Villani, Homogeneous cooling states are not always good approximations to granular flows, Arch. Ration. Mech. Anal., 163 (2002), pp. 329–343. [18] E. A. Carlen and W. Gangbo, Constrained steepest descent in the 2Wasserstein metric, Ann. of Math. (2), 157 (2003), pp. 807–846. [19] , Solution of a model Boltzmann equation via steepest descent in the 2-Wasserstein metric, Arch. Ration. Mech. Anal., 172 (2004), pp. 21–64. [20] J. A. Carrillo, R. J. McCann, and C. Villani, Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates, Rev. Mat. Iberoamericana, 19 (2003), pp. 971–1018. [21] J. A. Carrillo, R. J. McCann, and C. Villani, Contractions in the 2Wasserstein space and thermalization of granular media, To appear, (2004). [22] E. Chasseigne and J. L. Vazquez, Theory of extended solutions for fastdiffusion equations in optimal classes of data. Radiation from singularities, Arch. Ration. Mech. Anal., 164 (2002), pp. 133–187. [23] D. Cordero-Erausquin, B. Nazaret, and C. Villani, A masstransportation approach to sharp Sobolev and Gagliardo-Nirenberg inequalities, Adv. Math., 182 (2004), pp. 307–332. [24] G. Da Prato and A. Lunardi, Elliptic operators with unbounded drift coefficients and Neumann boundary condition, J. Differential Equations, 198 (2004), pp. 35–52. [25] E. De Giorgi, New problems on minimizing movements, in Boundary Value Problems for PDE and Applications, C. Baiocchi and J. L. Lions, eds., Masson, 1993, pp. 81–98. [26] E. De Giorgi, A. Marino, and M. Tosques, Problems of evolution in metric spaces and maximal decreasing curve, Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8), 68 (1980), pp. 180–187. 112 [27] M. Degiovanni, A. Marino, and M. Tosques, Evolution equations with lack of convexity, Nonlinear Anal., 9 (1985), pp. 1401–1443. [28] M. Del Pino, J. Dolbeault, and I. Gentil, Nonlinear diffusions, hypercontractivity and the optimal Lp -Euclidean logarithmic Sobolev inequality, J. Math. Anal. Appl., 293 (2004), pp. 375–388. [29] C. Dellacherie and P.-A. Meyer, Probabilities and potential, vol. 29 of North-Holland Mathematics Studies, North-Holland Publishing Co., Amsterdam, 1978. [30] R. J. DiPerna and P.-L. Lions, Ordinary differential equations, transport theory and Sobolev spaces, Invent. Math., 98 (1989), pp. 511–547. [31] J. Dolbeault, D. Kinderlehrer, and M. Kowalczyk, Remarks about the Flashing Rachet, Tech. Rep. 406, Ceremade, 2004. [32] L. C. Evans, W. Gangbo, and O. Savin, Diffeomorphisms and nonlinear heat flows, (2004). [33] L. C. Evans and R. F. Gariepy, Measure theory and fine properties of functions, Studies in Advanced Mathematics, CRC Press, Boca Raton, FL, 1992. [34] H. Federer, Geometric measure theory, Die Grundlehren der mathematischen Wissenschaften, Band 153, Springer-Verlag New York Inc., New York, 1969. [35] J. Feng and M. Katsoulakis, A Hamilton Jacobi theory for controlled gradient flows in infinite dimensions,, tech. rep., 2003. [36] W. Gangbo, The Monge mass transfer problem and its applications, in Monge Ampère equation: applications to geometry and optimization (Deerfield Beach, FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc., Providence, RI, 1999, pp. 79–104. [37] R. Gardner, The Brunn-Minkowski inequality, Bull. Amer. Math. Soc., 39 (2002), pp. 355–405. [38] L. Giacomelli and F. Otto, Variational formulation for the lubrication approximation of the Hele-Shaw flow, Calc. Var. Partial Differential Equations, 13 (2001), pp. 377–403. [39] , Rigorous lubrication approximation, Interfaces Free Bound., 5 (2003), pp. 483–529. [40] U. Gianazza, G. Toscani, and G. Savaré, A fourth order parabolic equation and the Wasserstein distance, tech. rep., IMATI-CNR, Pavia, 2004. to appear. 113 [41] M. Giaquinta and S. Hildebrandt, Calculus of Variations I, vol. 310 of Grundlehren der mathematischen Wissenschaften, Springer, Berlin, 1996. [42] K. Glasner, A diffuse interface approach to Hele-Shaw flow, Nonlinearity, 16 (2003), pp. 49–66. [43] C. Goffman and J. Serrin, Sublinear functions of measures and variational integrals, Duke Math. J., 31 (1964), pp. 159–178. [44] C. Huang and R. Jordan, Variational formulations for Vlasov-PoissonFokker-Planck systems, Math. Methods Appl. Sci., 23 (2000), pp. 803–843. [45] R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal., 29 (1998), pp. 1–17 (electronic). [46] J. Jost, Nonpositive curvature: geometric and analytic aspects, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 1997. [47] D. Kinderlehrer and N. J. Walkington, Approximation of parabolic equations using the Wasserstein metric, M2AN Math. Model. Numer. Anal., 33 (1999), pp. 837–852. [48] Z.-M. Ma and M. Röckner, Introduction to the Theory of (Nonsymmetric) Dirichlet Forms, Springer, New York, 1992. [49] A. Marino, C. Saccon, and M. Tosques, Curves of maximal slope and parabolic variational inequalities on nonconvex constraints, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4), 16 (1989), pp. 281–330. [50] R. McCann, Polar factorization of maps on riemannian manifolds, Geometric and Functional Analysis, 11 (2001), pp. 589–608. [51] R. J. McCann, A convexity principle for interacting gases, Adv. Math., 128 (1997), pp. 153–179. [52] T. Mikami, Dynamical systems in the variational formulation of the Fokker-Planck equation by the Wasserstein metric, Appl. Math. Optim., 42 (2000), pp. 203–227. [53] F. Otto, Doubly degenerate diffusion equations as steepest descent, Manuscript, (1996). [54] , Dynamics of labyrinthine pattern formation in magnetic fluids: a mean-field theory, Arch. Rational Mech. Anal., 141 (1998), pp. 63–103. [55] , Lubrication approximation with prescribed nonzero contact angle, Comm. Partial Differential Equations, 23 (1998), pp. 2077–2164. [56] , Evolution of microstructure in unstable porous media flow: a relaxational approach, Comm. Pure Appl. Math., 52 (1999), pp. 873–915. 114 [57] , The geometry of dissipative evolution equations: the porous medium equation, Comm. Partial Differential Equations, 26 (2001), pp. 101–174. [58] F. Otto and C. Villani, Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality, J. Funct. Anal., 173 (2000), pp. 361–400. [59] A. Pazy, Semigroups of linear operators and applications to partial differential equations, Springer-Verlag, New York, 1983. [60] M. Pierre, Uniqueness of the solutions of ut − ∆ϕ(u) = 0 with initial datum a measure, Nonlinear Anal., 6 (1982), pp. 175–187. [61] A. Pratelli, On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation, To appear, (2004). [62] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, SpringerVerlag, Berlin, 1998. [63] C. Sparber, J. A. Carrillo, J. Dolbeault, and P. A. Markowich, On the long-time behavior of the quantum Fokker-Planck equation, Monatsh. Math., 141 (2004), pp. 237–257. [64] C. Villani, Optimal transportation, dissipative PDE’s and functional inequalities, in Optimal transportation and applications (Martina Franca, 2001), vol. 1813 of Lecture Notes in Math., Springer, Berlin, 2003, pp. 53– 89. [65] , Topics in optimal transportation, vol. 58 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 2003. 115
© Copyright 2026 Paperzz