
f-Divergence for convex bodies

Elisabeth M. Werner∗†

∗ Keywords: f-divergence, relative entropy, affine surface area. 2010 Mathematics Subject Classification: 52A20, 53A15.
† Partially supported by an NSF grant, a FRG-NSF grant and a BSF grant.

Abstract
We introduce f-divergence, a concept from information theory and statistics, for convex bodies in R^n. We prove that f-divergences are SL(n) invariant valuations and we establish an affine isoperimetric inequality for these quantities. We show that generalized affine surface area and in particular the Lp affine surface area from the Lp Brunn-Minkowski theory are special cases of f-divergences.
1 Introduction.
In information theory, probability theory and statistics, an f-divergence is a function D_f(P, Q) that measures the difference between two probability distributions P and Q. The divergence is intuitively an average, weighted by the function f, of the odds ratio given by P and Q. These divergences were introduced independently by Csiszár [2], Morimoto [37] and Ali & Silvey [1]. Special cases of f-divergences are the Kullback-Leibler divergence or relative entropy and the Rényi divergences (see Section 2).
Due to a number of highly influential works (see, e.g., [4]-[11], [14], [15], [19], [20], [22]-[27], [29], [31], [34]-[36], [38], [42], [43]-[54], [56]-[58]), the Lp-Brunn-Minkowski theory is now a central part of modern convex geometry. A fundamental notion within this theory is Lp affine surface area, introduced by Lutwak in the groundbreaking paper [26].
It was shown in [52] that Lp affine surface areas are entropy powers of Rényi divergences of the cone measures of a convex body and its polar, thus establishing further connections between information theory and convex geometric analysis. Further examples of such connections are, e.g., several papers by Lutwak, Yang, and Zhang [28, 30, 32, 33] and the recent article [39] where it is shown how relative entropy appears in convex geometry.
In this paper we introduce f-divergences to the theory of convex bodies and thus strengthen the already existing ties between information theory and convex geometric analysis. We show that generalizations of the Lp affine surface areas, the Lφ and Lψ affine surface areas introduced in [23] and [21], are in fact f-divergences for special functions f. We show that f-divergences are SL(n) invariant valuations and establish an affine isoperimetric inequality for these quantities. Finally, we give geometric characterizations of f-divergences.
Usually, in the literature, f-divergences are considered for convex functions f. A similar theory, with the obvious modifications, can be developed for concave functions. Here, we restrict ourselves to the convex setting.
Further Notation.
We work in R^n, which is equipped with a Euclidean structure ⟨·, ·⟩. We write B_2^n for the Euclidean unit ball centered at 0 and S^{n−1} for the unit sphere. Volume is denoted by | · | or, if we want to emphasize the dimension, by vol_d(A) for a d-dimensional set A.
Let K_0 be the space of convex bodies K in R^n that contain the origin in their interiors. Throughout the paper, we will only consider such K. For K ∈ K_0, K° = {y ∈ R^n : ⟨x, y⟩ ≤ 1 for all x ∈ K} is the polar body of K. For a point x ∈ ∂K, the boundary of K, N_K(x) is the outer unit normal to K at x and κ_K(x), or, in short, κ, is the (generalized) Gauss curvature at x. We write K ∈ C_+^2 if K has C^2 boundary ∂K with everywhere strictly positive Gaussian curvature κ_K. By µ or µ_K we denote the usual surface area measure on ∂K and by σ the usual surface area measure on S^{n−1}.
Let K be a convex body in R^n and let u ∈ S^{n−1}. Then h_K(u) is the support function of K in direction u ∈ S^{n−1}, and f_K(u) is the curvature function, i.e. the reciprocal of the Gauss curvature κ_K(x) at the point x ∈ ∂K that has u as outer normal.
2 f-divergences.
Let (X, µ) be a measure space and let dP = p dµ and dQ = q dµ be probability measures on X that are absolutely continuous with respect to the measure µ. Let f : (0, ∞) → R be a convex function. The ∗-adjoint function f^* : (0, ∞) → R of f is defined by (e.g. [17])
\[
f^*(t) = t\, f(1/t), \quad t \in (0, \infty). \tag{1}
\]
It is obvious that (f^*)^* = f and that f^* is again convex if f is convex. Csiszár [2], and independently Morimoto [37] and Ali & Silvey [1], introduced the f-divergence D_f(P, Q) of the measures P and Q which, for a convex function f : (0, ∞) → R, can be defined as (see [17])
\[
D_f(P, Q) = \int_{\{pq>0\}} f\!\left(\frac{p}{q}\right) q \, d\mu + f(0)\, Q\big(\{x \in X : p(x) = 0\}\big) + f^*(0)\, P\big(\{x \in X : q(x) = 0\}\big), \tag{2}
\]
where
\[
f(0) = \lim_{t \downarrow 0} f(t) \quad \text{and} \quad f^*(0) = \lim_{t \downarrow 0} f^*(t). \tag{3}
\]
We make the convention that 0 · ∞ = 0.
Please note that
\[
D_f(P, Q) = D_{f^*}(Q, P). \tag{4}
\]
With (3), and as
\[
f^*(0)\, P\big(\{x \in X : q(x) = 0\}\big) = \int_{\{q=0\}} f^*\!\left(\frac{q}{p}\right) p \, d\mu = \int_{\{q=0\}} f\!\left(\frac{p}{q}\right) q \, d\mu,
\]
we can write in short
\[
D_f(P, Q) = \int_X f\!\left(\frac{p}{q}\right) q \, d\mu. \tag{5}
\]
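As a concrete illustration of (2) and (5), the following small numerical sketch (an editorial addition, not part of the original text; it assumes NumPy and finite discrete distributions, and the helper name is our own) evaluates D_f(P, Q) for two probability vectors, treating the boundary terms f(0) Q({p = 0}) and f^*(0) P({q = 0}) exactly as in (2) and using the convention 0 · ∞ = 0.

```python
import numpy as np

def f_divergence(p, q, f, f_at_0, f_star_at_0):
    """D_f(P,Q) for discrete distributions p, q (arrays summing to 1),
    following (2): a sum over {pq > 0} plus the boundary terms
    f(0)*Q({p=0}) and f*(0)*P({q=0}), with the convention 0*inf = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    both = (p > 0) & (q > 0)
    main = np.sum(f(p[both] / q[both]) * q[both])
    q_p0 = q[p == 0].sum()          # Q({p = 0})
    p_q0 = p[q == 0].sum()          # P({q = 0})
    boundary = 0.0
    if q_p0 > 0:                    # skip when the set is null (0 * inf = 0)
        boundary += f_at_0 * q_p0
    if p_q0 > 0:
        boundary += f_star_at_0 * p_q0
    return main + boundary

# Example: f(t) = t*ln(t), f(0) = 0, f*(0) = lim t*f(1/t) = +inf,
# gives the Kullback-Leibler divergence (6).
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.25, 0.25, 0.25, 0.25])
kl = f_divergence(p, q, lambda t: t * np.log(t), f_at_0=0.0, f_star_at_0=np.inf)
print(kl, np.sum(p[p > 0] * np.log(p[p > 0] / q[p > 0])))  # the two values agree
```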
For particular choices of f we get many common divergences. E.g. for f(t) = t ln t, with ∗-adjoint function f^*(t) = − ln t, the f-divergence is the classical information divergence, also called Kullback-Leibler divergence or relative entropy from P to Q (see [3]),
\[
D_{KL}(P\|Q) = \int_X p \ln \frac{p}{q}\, d\mu. \tag{6}
\]
For the convex or concave functions f(t) = t^α we obtain the Hellinger integrals (e.g. [17])
\[
H_\alpha(P, Q) = \int_X p^\alpha q^{1-\alpha}\, d\mu. \tag{7}
\]
Those are related to the Rényi divergence of order α, α ≠ 1, introduced by Rényi [41] (for α > 0) as
\[
D_\alpha(P\|Q) = \frac{1}{\alpha - 1} \ln \int_X p^\alpha q^{1-\alpha}\, d\mu = \frac{1}{\alpha - 1} \ln\big(H_\alpha(P, Q)\big). \tag{8}
\]
The case α = 1 is the relative entropy D_{KL}(P‖Q).
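A quick numerical check of (7) and (8) (again an illustrative editorial sketch, assuming NumPy and two ad hoc discrete distributions): the Rényi divergence D_α computed from the Hellinger integral H_α approaches the relative entropy D_KL(P‖Q) as α → 1.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

def hellinger(alpha):            # H_alpha(P,Q) = sum p^alpha q^(1-alpha), cf. (7)
    return np.sum(p**alpha * q**(1 - alpha))

def renyi(alpha):                # D_alpha(P||Q) = ln(H_alpha) / (alpha - 1), cf. (8)
    return np.log(hellinger(alpha)) / (alpha - 1)

kl = np.sum(p * np.log(p / q))   # relative entropy (6)
for alpha in [0.5, 0.9, 0.99, 0.999]:
    print(alpha, renyi(alpha))
print("alpha -> 1 limit should approach D_KL =", kl)
```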
3 f-divergences for convex bodies.
We will now consider f-divergences for convex bodies K ∈ K_0. Let
\[
p_K(x) = \frac{\kappa_K(x)}{\langle x, N_K(x)\rangle^n \, n|K^\circ|}, \qquad q_K(x) = \frac{\langle x, N_K(x)\rangle}{n|K|}. \tag{9}
\]
Usually, in the literature, the measures under consideration are probability measures. Therefore we have normalized the densities. Thus
\[
P_K = p_K\, \mu_K \quad \text{and} \quad Q_K = q_K\, \mu_K \tag{10}
\]
are measures on ∂K that are absolutely continuous with respect to µ_K. Q_K is a probability measure, and P_K is one if K is in C_+^2.
Recall that the normalized cone measure cm_K on ∂K is defined as follows: for every measurable set A ⊆ ∂K,
\[
\mathrm{cm}_K(A) = \frac{1}{|K|}\, \Big|\big\{ t a : a \in A,\ t \in [0, 1] \big\}\Big|. \tag{11}
\]
The next proposition is well known. See e.g. [39] for a proof. It shows that the measures P_K and Q_K defined in (10) are the cone measures of K and K°. Here N_K : ∂K → S^{n−1}, x ↦ N_K(x), is the Gauss map.
Proposition 3.1. Let K be a convex body in R^n. Let P_K and Q_K be the probability measures on ∂K defined by (10). Then
\[
Q_K = \mathrm{cm}_K,
\]
or, equivalently, for every measurable subset A in ∂K, Q_K(A) = cm_K(A). If K is in addition in C_+^2, then
\[
P_K = N_K^{-1} N_{K^\circ}\big(\mathrm{cm}_{K^\circ}\big),
\]
or, equivalently, for every measurable subset A in ∂K,
\[
P_K(A) = \mathrm{cm}_{K^\circ}\Big( N_{K^\circ}^{-1}\big( N_K(A) \big) \Big). \tag{12}
\]
It is in the sense (12) that we understand P_K to be the “cone measure” of K° and we write P_K = cm_{K°}.
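Proposition 3.1 can be checked numerically in a simple case (an illustrative editorial sketch, not from the original text; it assumes NumPy and works in dimension n = 2 with an ellipse E of semi-axes a, b, all chosen ad hoc): for a boundary arc A = {x(t) : t_1 ≤ t ≤ t_2}, the cone measure cm_E(A) from (11) and Q_E(A) = ∫_A q_E dµ_E, with q_E as in (9), agree.

```python
import numpy as np

a, b = 3.0, 1.0                      # semi-axes of the ellipse E
area_E = np.pi * a * b               # |E|
t1, t2 = 0.3, 1.7                    # boundary arc A = {x(t) : t1 <= t <= t2}
t = np.linspace(t1, t2, 20001)

x  = np.stack([a * np.cos(t), b * np.sin(t)], axis=1)        # boundary points x(t)
dx = np.stack([-a * np.sin(t), b * np.cos(t)], axis=1)       # tangent x'(t)
speed = np.linalg.norm(dx, axis=1)                           # |x'(t)|, arclength density
normal = np.stack([b * np.cos(t), a * np.sin(t)], axis=1) / speed[:, None]  # outer unit normal

# Q_E(A) = int_A <x, N(x)> / (n |E|) dmu, with n = 2 and dmu = |x'(t)| dt
q_density = np.einsum('ij,ij->i', x, normal) / (2 * area_E)
Q_A = np.trapz(q_density * speed, t)

# cm_E(A) = |{s*x : x in A, s in [0,1]}| / |E|, cone area via (1/2)|x cross x'|
cone_area = np.trapz(0.5 * np.abs(x[:, 0] * dx[:, 1] - x[:, 1] * dx[:, 0]), t)
cm_A = cone_area / area_E

print(Q_A, cm_A)   # both equal (t2 - t1)/(2*pi) up to quadrature error
```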
We now define the f-divergences of K ∈ K_0. Note that ⟨x, N_K(x)⟩ > 0 for all x ∈ ∂K and therefore {x ∈ ∂K : q_K(x) = 0} = ∅. Hence, possibly also using our convention 0 · ∞ = 0,
\[
f^*(0)\, P_K\big(\{x \in \partial K : q_K(x) = 0\}\big) = 0.
\]
Definition 3.2. Let K be a convex body in K_0 and let f : (0, ∞) → R be a convex function. The f-divergence of K with respect to the cone measures P_K and Q_K is
\[
D_f(P_K, Q_K) = \int_{\partial K} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu_K = \int_{\partial K} f\!\left(\frac{|K|\,\kappa_K(x)}{|K^\circ|\,\langle x, N_K(x)\rangle^{n+1}}\right) \frac{\langle x, N_K(x)\rangle}{n|K|}\, d\mu_K. \tag{13}
\]
Remarks.
By (4) and (13),
\[
D_f(Q_K, P_K) = \int_{\partial K} f\!\left(\frac{q_K}{p_K}\right) p_K \, d\mu_K = D_{f^*}(P_K, Q_K) = \int_{\partial K} f^*\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu_K
= \int_{\partial K} f\!\left(\frac{|K^\circ|\,\langle x, N_K(x)\rangle^{n+1}}{|K|\,\kappa_K(x)}\right) \frac{\kappa_K(x)}{n|K^\circ|\,\langle x, N_K(x)\rangle^n}\, d\mu_K. \tag{14}
\]
f-divergences can also be expressed as integrals over S^{n−1}:
\[
D_f(P_K, Q_K) = \int_{S^{n-1}} f\!\left(\frac{|K|}{|K^\circ|\, f_K(u)\, h_K(u)^{n+1}}\right) \frac{f_K(u)\, h_K(u)}{n|K|}\, d\sigma \tag{15}
\]
and
\[
D_f(Q_K, P_K) = \int_{S^{n-1}} f\!\left(\frac{|K^\circ|\, f_K(u)\, h_K(u)^{n+1}}{|K|}\right) \frac{1}{n|K^\circ|\, h_K(u)^n}\, d\sigma. \tag{16}
\]
Examples.
If K is a polytope, the Gauss curvature κ_K of K is 0 a.e. on ∂K. Hence
\[
D_f(P_K, Q_K) = f(0) \quad \text{and} \quad D_f(Q_K, P_K) = f^*(0). \tag{17}
\]
For every ellipsoid E,
\[
D_f(P_E, Q_E) = D_f(Q_E, P_E) = f(1) = f^*(1). \tag{18}
\]
Denote by Conv(0, ∞) the set of functions ψ : (0, ∞) → (0, ∞) such that ψ is convex, lim_{t→0} ψ(t) = ∞, and lim_{t→∞} ψ(t) = 0. For ψ ∈ Conv(0, ∞), Ludwig [21] introduces the L_ψ affine surface area of a convex body K in R^n,
\[
\Omega_\psi(K) = \int_{\partial K} \psi\!\left(\frac{\kappa_K(x)}{\langle x, N_K(x)\rangle^{n+1}}\right) \langle x, N_K(x)\rangle\, d\mu_K. \tag{19}
\]
Thus, L_ψ affine surface areas are special cases of (non-normalized) f-divergences for f = ψ.
For ψ ∈ Conv(0, ∞), the ∗-adjoint function ψ^* is convex, lim_{t→0} ψ^*(t) = 0, and lim_{t→∞} ψ^*(t) = ∞. Thus ψ^* is an Orlicz function (see [18]), and gives rise to the corresponding Orlicz divergences D_{ψ^*}(P_K, Q_K) and D_{ψ^*}(Q_K, P_K).
Let p ≤ 0. Then the function f : (0, ∞) → (0, ∞), f(t) = t^{p/(n+p)}, is convex. The corresponding (non-normalized) f-divergence (which is also an Orlicz divergence) is the L_p affine surface area, introduced by Lutwak [26] for p > 1 and by Schütt and Werner [47] for p < 1, p ≠ −n. See also [12].
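To make this identification concrete, here is a small numerical sketch (an editorial illustration for n = 2 with NumPy; the closed form quoted in the comment is computed ad hoc for the ellipse and is not taken from the text): for f(t) = t^{p/(n+p)} with p = −1, the non-normalized f-divergence of the form (19) for an ellipse with semi-axes a, b coincides with the L_p affine surface area integral and evaluates to 2π(ab)^{(2−p)/(2+p)}.

```python
import numpy as np

a, b, n, p = 3.0, 0.5, 2, -1.0
f = lambda t: t**(p / (n + p))            # f(t) = t^(p/(n+p)), convex for p <= 0

t = np.linspace(0, 2 * np.pi, 200001)
w = np.sqrt(b**2 * np.cos(t)**2 + a**2 * np.sin(t)**2)
kappa, support, dmu = a * b / w**3, a * b / w, w

# non-normalized f-divergence, as in (19): int f(kappa / <x,N>^(n+1)) <x,N> dmu
div = np.trapz(f(kappa / support**(n + 1)) * support * dmu, t)

# L_p affine surface area written out directly
as_p = np.trapz(kappa**(p / (n + p)) / support**(n * (p - 1) / (n + p)) * dmu, t)

closed_form = 2 * np.pi * (a * b)**((2 - p) / (2 + p))   # ad hoc closed form for the ellipse
print(div, as_p, closed_form)                            # all three agree
```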
It was shown in [52] that all Lp affine surface areas are entropy powers
of Rényi divergences.
For p ≥ 0, the function f : (0, ∞) → (0, ∞), f(t) = t^{p/(n+p)}, is concave. The corresponding L_p affine surface areas
\[
\int_{\partial K} \frac{\kappa_K^{\frac{p}{n+p}}}{\langle x, N_K(x)\rangle^{\frac{n(p-1)}{n+p}}}\, d\mu_K
\]
are examples of L_φ affine surface areas which were considered in [23] and [21]. Those, in turn, are special cases of (non-normalized) f-divergences for concave functions f.
Let f(t) = t ln t. Then the ∗-adjoint function is f^*(t) = − ln t. The corresponding f-divergence is the Kullback-Leibler divergence or relative entropy D_{KL}(P_K‖Q_K) from P_K to Q_K,
\[
D_{KL}(P_K\|Q_K) = \int_{\partial K} \frac{\kappa_K(x)}{n|K^\circ|\,\langle x, N_K(x)\rangle^n}\, \ln\!\left(\frac{|K|\,\kappa_K(x)}{|K^\circ|\,\langle x, N_K(x)\rangle^{n+1}}\right) d\mu_K. \tag{20}
\]
The relative entropy D_{KL}(Q_K‖P_K) from Q_K to P_K is
\[
D_{KL}(Q_K\|P_K) = D_{f^*}(P_K, Q_K) \tag{21}
\]
\[
= \int_{\partial K} \frac{\langle x, N_K(x)\rangle}{n|K|}\, \ln\!\left(\frac{|K^\circ|\,\langle x, N_K(x)\rangle^{n+1}}{|K|\,\kappa_K(x)}\right) d\mu_K. \tag{22}
\]
Those were studied in detail in [39].
Equations (15) and (16) of the above remark lead us to define f-divergences for several convex bodies, or mixed f-divergences.
Let K_1, . . . , K_n be convex bodies in K_0. Let u ∈ S^{n−1}. For 1 ≤ i ≤ n, define
\[
p_{K_i}(u) = \frac{1}{n|K_i^\circ|\, h_{K_i}(u)^n}, \qquad q_{K_i}(u) = \frac{f_{K_i}(u)\, h_{K_i}(u)}{n|K_i|}, \tag{23}
\]
and measures on S^{n−1} by
\[
P_{K_i} = p_{K_i}\, \sigma \quad \text{and} \quad Q_{K_i} = q_{K_i}\, \sigma. \tag{24}
\]
Let f_i : (0, ∞) → R, 1 ≤ i ≤ n, be convex functions. Then we define the mixed f-divergences for convex bodies K_1, . . . , K_n in K_0 by

Definition 3.3.
\[
D_{f_1 \ldots f_n}(P_{K_1} \times \cdots \times P_{K_n},\, Q_{K_1} \times \cdots \times Q_{K_n}) = \int_{S^{n-1}} \prod_{i=1}^n \left[ f_i\!\left(\frac{p_{K_i}}{q_{K_i}}\right) q_{K_i} \right]^{\frac{1}{n}} d\sigma
\]
and
\[
D_{f_1 \ldots f_n}(Q_{K_1} \times \cdots \times Q_{K_n},\, P_{K_1} \times \cdots \times P_{K_n}) = \int_{S^{n-1}} \prod_{i=1}^n \left[ f_i\!\left(\frac{q_{K_i}}{p_{K_i}}\right) p_{K_i} \right]^{\frac{1}{n}} d\sigma.
\]
Note that
\[
D_{f_1^* \ldots f_n^*}(P_{K_1} \times \cdots \times P_{K_n},\, Q_{K_1} \times \cdots \times Q_{K_n}) = D_{f_1 \ldots f_n}(Q_{K_1} \times \cdots \times Q_{K_n},\, P_{K_1} \times \cdots \times P_{K_n}).
\]
Here, we concentrate on f-divergences for one convex body. Mixed f-divergences are treated similarly. We also refer to [55], where they have been investigated for functions in Conv(0, ∞).
The observation (17) about polytopes holds more generally.
Proposition 3.4. Let K be a convex body in K_0 and let f : (0, ∞) → R be a convex function. If K is such that µ_K({p_K > 0}) = 0, then
\[
D_f(P_K, Q_K) = f(0) \quad \text{and} \quad D_f(Q_K, P_K) = f^*(0).
\]
Proof. µ_K({p_K > 0}) = 0 iff Q_K({p_K > 0}) = 0. Hence the assumption implies that Q_K({p_K = 0}) = 1. Therefore,
\[
D_f(P_K, Q_K) = \int_{\partial K} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu_K = \int_{\{p_K>0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu_K + \int_{\{p_K=0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu_K = f(0).
\]
By (4), D_f(Q_K, P_K) = D_{f^*}(P_K, Q_K) = f^*(0).
The next proposition complements the previous one. In view of (18) and (27), it corresponds to the affine isoperimetric inequality for f-divergences. It was proved in [17] in a different setting, and in the special case of f ∈ Conv(0, ∞) by Ludwig [21]. We include a proof for completeness.

Proposition 3.5. Let K be a convex body in K_0 and let f : (0, ∞) → R be a convex function. If K is such that µ_K({p_K > 0}) > 0, then
\[
D_f(P_K, Q_K) \ge f\!\left(\frac{P_K(\{p_K > 0\})}{Q_K(\{p_K > 0\})}\right) Q_K(\{p_K > 0\}) + f(0)\, Q_K(\{p_K = 0\}) \tag{25}
\]
and
\[
D_f(Q_K, P_K) \ge f^*\!\left(\frac{P_K(\{p_K > 0\})}{Q_K(\{p_K > 0\})}\right) Q_K(\{p_K > 0\}) + f^*(0)\, Q_K(\{p_K = 0\}). \tag{26}
\]
If K is in C_+^2, or if f is decreasing, then
\[
D_f(P_K, Q_K) \ge f(1) \quad \text{and} \quad D_f(Q_K, P_K) \ge f^*(1) = f(1). \tag{27}
\]
Equality holds in (25) and (26) iff f is linear or K is an ellipsoid. If K is in C_+^2, equality holds in both inequalities (27) iff f is linear or K is an ellipsoid. If f is decreasing, equality holds in both inequalities (27) iff K is an ellipsoid.
Remark. It is possible for f to be decreasing and linear without having equality in (27). To see that, let f(t) = at + b, a < 0, b > 0. Then, for polytopes K (for which µ_K({p_K > 0}) = 0), D_f(P_K, Q_K) = f(0) = b > f(1) = a + b. But also in the case when 0 < P_K({p_K > 0}) < 1, strict inequality may hold.
Indeed, let ε > 0 be sufficiently small and let K = B_∞^n(ε) be a “rounded” cube, where we have “rounded” the corners of the cube B_∞^n with side length 2 centered at 0 by replacing each corner with Euclidean balls εB_2^n. Then
\[
D_f(P_K, Q_K) = b + a\, P_K(\{p_K > 0\}) > b + a = f(1).
\]
Proof of Proposition 3.5. Let K be such that µ_K({p_K > 0}) > 0, which is equivalent to Q_K({p_K > 0}) > 0. Then, by Jensen's inequality,
\[
D_f(P_K, Q_K) = Q_K(\{p_K > 0\}) \int_{\{p_K > 0\}} f\!\left(\frac{p_K}{q_K}\right) \frac{q_K\, d\mu_K}{Q_K(\{p_K > 0\})} + f(0)\, Q_K(\{p_K = 0\})
\ge Q_K(\{p_K > 0\})\, f\!\left(\frac{P_K(\{p_K > 0\})}{Q_K(\{p_K > 0\})}\right) + f(0)\, Q_K(\{p_K = 0\}).
\]
Inequality (26) follows by (4), as D_f(Q_K, P_K) = D_{f^*}(P_K, Q_K).
If K is in C_+^2, then Q_K({p_K > 0}) = 1, Q_K({p_K = 0}) = 0, P_K({p_K > 0}) = 1 and P_K({p_K = 0}) = 0. Thus we get that D_f(P_K, Q_K) ≥ f(1) and D_f(Q_K, P_K) ≥ f^*(1) = f(1).
If f is decreasing, then, by Jensen's inequality,
\[
D_f(P_K, Q_K) = \int_{\partial K} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu_K \ge f\!\left(\int_{\partial K} p_K \, d\mu_K\right) \ge f(1).
\]
The last inequality holds as ∫_{∂K} p_K dµ_K ≤ 1 and as f is decreasing.
Equality holds in Jensen's inequality iff either f is linear or p_K/q_K is constant. Indeed, if f(t) = at + b, then
\[
D_f(P_K, Q_K) = \int_{\{p_K > 0\}} \left( a\,\frac{p_K}{q_K} + b \right) q_K \, d\mu_K + f(0)\, Q_K(\{p_K = 0\}) = a\, P_K(\{p_K > 0\}) + f(0).
\]
If f is not linear, equality holds iff p_K/q_K = c, c a constant. As by assumption µ_K({p_K > 0}) > 0, c ≠ 0. By a theorem of Petty [40], this holds iff K is an ellipsoid.
The next proposition can be found in [17] in a different setting. Again, we include a proof for completeness.

Proposition 3.6. Let K be a convex body in K_0 and let f : (0, ∞) → R be a convex function. Then
\[
D_f(P_K, Q_K) \le f(0) + f^*(0) + f(1)\Big( Q_K(\{0 < p_K \le q_K\}) + P_K(\{0 < q_K \le p_K\}) \Big)
\]
and
\[
D_f(Q_K, P_K) \le f(0) + f^*(0) + f(1)\Big( Q_K(\{0 < p_K \le q_K\}) + P_K(\{0 < q_K \le p_K\}) \Big).
\]
If f is decreasing, the inequalities reduce to D_f(P_K, Q_K) ≤ f(0) and D_f(Q_K, P_K) ≤ f^*(0), respectively.
Proof. It is enough to prove the first inequality. The second one follows immediately from the first by (4).
\[
\begin{aligned}
D_f(P_K, Q_K) &= \int_{\partial K} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu \\
&= \int_{\{p_K>0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu + f(0)\, Q_K(\{p_K = 0\}) \\
&= f(0)\, Q_K(\{p_K = 0\}) + \int_{\{0 < p_K\}\cap\{f' \ge 0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu + \int_{\{0 < p_K\}\cap\{f' \le 0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu \\
&\le f(0)\Big( Q_K(\{p_K = 0\}) + Q_K\big(\{p_K > 0\}\cap\{f' \le 0\}\big) \Big) \\
&\qquad + \int_{\{0 < p_K \le q_K\}\cap\{f' \ge 0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu + \int_{\{0 < q_K \le p_K\}\cap\{f' \ge 0\}} f\!\left(\frac{p_K}{q_K}\right) q_K \, d\mu \\
&\le f(0) + f(1)\, Q_K\big(\{0 < p_K \le q_K\}\cap\{f' \ge 0\}\big) + \int_{\{0 < q_K \le p_K\}\cap\{f' \ge 0\}} f^*\!\left(\frac{q_K}{p_K}\right) p_K \, d\mu \\
&= f(0) + f(1)\, Q_K\big(\{0 < p_K \le q_K\}\cap\{f' \ge 0\}\big) \\
&\qquad + \int_{\{0 < q_K \le p_K\}\cap\{f' \ge 0\}\cap\{(f^*)' \ge 0\}} f^*\!\left(\frac{q_K}{p_K}\right) p_K \, d\mu + \int_{\{0 < q_K \le p_K\}\cap\{f' \ge 0\}\cap\{(f^*)' \le 0\}} f^*\!\left(\frac{q_K}{p_K}\right) p_K \, d\mu \\
&\le f(0) + f(1)\, Q_K\big(\{0 < p_K \le q_K\}\cap\{f' \ge 0\}\big) \\
&\qquad + f^*(1)\, P_K\big(\{0 < q_K \le p_K\}\cap\{f' \ge 0\}\cap\{(f^*)' \ge 0\}\big) + f^*(0)\, P_K\big(\{0 < q_K \le p_K\}\cap\{f' \ge 0\}\cap\{(f^*)' \le 0\}\big) \\
&\le f(0) + f^*(0)\, P_K\big(\{0 < q_K \le p_K\}\cap\{f' \ge 0\}\big) \\
&\qquad + f(1)\Big( Q_K\big(\{0 < p_K \le q_K\}\cap\{f' \ge 0\}\big) + P_K\big(\{0 < q_K \le p_K\}\cap\{f' \ge 0\}\big) \Big).
\end{aligned}
\]
It follows from the last expression that, if f is decreasing, the inequality reduces to D_f(P_K, Q_K) ≤ f(0).
The next proposition shows that f-divergences are GL(n) invariant and that non-normalized f-divergences are SL(n) invariant valuations. For functions in Conv(0, ∞), this was proved by Ludwig [21].
For functions in Conv(0, ∞) the expressions are also lower semicontinuous, as was shown in [21]. However, this need not be the case anymore if we assume just convexity of f. Indeed, let f(t) = t² and let K = B_2^n be the Euclidean unit ball. Let (K_j)_{j∈N} be a sequence of polytopes that converges to B_2^n. As observed above, D_f(P_{K_j}, Q_{K_j}) = f(0) = 0 for all j. But D_f(P_{B_2^n}, Q_{B_2^n}) = f(1) = 1.
Let
\[
\tilde P_K = \frac{\kappa_K}{\langle x, N_K(x)\rangle^n}\, \mu_K \quad \text{and} \quad \tilde Q_K = \langle x, N_K(x)\rangle\, \mu_K.
\]
Then we will denote by D_f(P̃_K, Q̃_K) and D_f(Q̃_K, P̃_K) the non-normalized f-divergences. We will also use the following lemma from [47] in the proof of Proposition 3.8.
Lemma 3.7. Let K be a convex body in K_0. Let h : ∂K → R be an integrable function, and T : R^n → R^n an invertible linear map. Then
\[
\int_{\partial K} h(x)\, d\mu_K = |\det(T)|^{-1} \int_{\partial T(K)} \frac{h(T^{-1}(y))}{\|T^{-1t}(N_K(T^{-1}(y)))\|}\, d\mu_{T(K)}.
\]
Proposition 3.8. Let K be a convex body in K_0 and let f : (0, ∞) → R be a convex function. Then D_f(P_K, Q_K) and D_f(Q_K, P_K) are GL(n) invariant, and D_f(P̃_K, Q̃_K) and D_f(Q̃_K, P̃_K) are SL(n) invariant valuations.

Proof. We use (e.g. [47]) that
\[
\langle T(x), N_{T(K)}(T(x))\rangle = \frac{\langle x, N_K(x)\rangle}{\|T^{-1t}(N_K(x))\|}
\]
and
\[
\kappa_K(x) = \|T^{-1t}(N_K(x))\|^{n+1} \det(T)^2\, \kappa_{T(K)}(T(x)),
\]
and Lemma 3.7, to get that
\[
D_f(P_K, Q_K) = \int_{\partial K} f\!\left(\frac{p_K(x)}{q_K(x)}\right) q_K(x)\, d\mu(x)
= \frac{1}{|\det(T)|} \int_{\partial T(K)} f\!\left(\frac{p_K(T^{-1}(y))}{q_K(T^{-1}(y))}\right) \frac{q_K(T^{-1}(y))}{\|T^{-1t}(N_K(T^{-1}(y)))\|}\, d\mu_{T(K)}
= D_f(P_{T(K)}, Q_{T(K)}).
\]
The formula for D_f(Q_K, P_K) follows immediately from this one and (4). The SL(n) invariance for the non-normalized f-divergences is shown in the same way.
Now we show that D_f(P̃_K, Q̃_K) and D_f(Q̃_K, P̃_K) are valuations, i.e. for convex bodies K and L in K_0 such that K ∪ L ∈ K_0,
\[
D_f(\tilde P_{K\cup L}, \tilde Q_{K\cup L}) + D_f(\tilde P_{K\cap L}, \tilde Q_{K\cap L}) = D_f(\tilde P_K, \tilde Q_K) + D_f(\tilde P_L, \tilde Q_L). \tag{28}
\]
Again, it is enough to prove this formula; the one for D_f(Q̃_K, P̃_K) then follows with (4). To prove (28), we proceed as in Schütt [44]. For completeness, we include the argument. We decompose
\[
\begin{aligned}
\partial(K \cup L) &= (\partial K \cap \partial L) \cup (\partial K \cap L^c) \cup (K^c \cap \partial L), \\
\partial(K \cap L) &= (\partial K \cap \partial L) \cup (\partial K \cap \mathrm{int}\, L) \cup (\mathrm{int}\, K \cap \partial L), \\
\partial K &= (\partial K \cap \partial L) \cup (\partial K \cap L^c) \cup (\partial K \cap \mathrm{int}\, L), \\
\partial L &= (\partial K \cap \partial L) \cup (K^c \cap \partial L) \cup (\mathrm{int}\, K \cap \partial L),
\end{aligned}
\]
where all unions on the right hand side are disjoint. Note that for x such that the curvatures κ_K(x), κ_L(x), κ_{K∪L}(x) and κ_{K∩L}(x) exist,
\[
\langle x, N_K(x)\rangle = \langle x, N_L(x)\rangle = \langle x, N_{K\cap L}(x)\rangle = \langle x, N_{K\cup L}(x)\rangle \tag{29}
\]
and
\[
\kappa_{K\cup L}(x) = \min\{\kappa_K(x), \kappa_L(x)\}, \qquad \kappa_{K\cap L}(x) = \max\{\kappa_K(x), \kappa_L(x)\}. \tag{30}
\]
To prove (28), we split the involved integrals using the above decompositions and (29) and (30).
4 Geometric characterization of f-divergences.
In [52], geometric characterizations were proved for Rényi divergences. Now we want to establish such geometric characterizations for f-divergences as well. We use the surface body [47], but the illumination surface body [54] or the mean width body [13] can also be used.
Let K be a convex body in R^n. Let g : ∂K → R be a nonnegative, integrable function. Let s ≥ 0. The surface body K_{g,s}, introduced in [47], is the intersection of all closed half-spaces H^+ whose defining hyperplanes H cut off a set of g µ_K-measure less than or equal to s from ∂K. More precisely,
\[
K_{g,s} = \bigcap_{\int_{\partial K \cap H^-} g\, d\mu_K \le s} H^+.
\]
For x ∈ ∂K and s > 0, let
\[
x_s = [0, x] \cap \partial K_{g,s}.
\]
The minimal function M_g : ∂K → R,
\[
M_g(x) = \inf_{0 < s} \frac{\int_{\partial K \cap H^-(x_s,\, N_{K_{g,s}}(x_s))} g\, d\mu_K}{\mathrm{vol}_{n-1}\big( \partial K \cap H^-(x_s,\, N_{K_{g,s}}(x_s)) \big)}, \tag{31}
\]
was introduced in [47]. Here H(x, ξ) is the hyperplane through x and orthogonal to ξ, H^-(x, ξ) is the closed halfspace containing the point x + ξ, and H^+(x, ξ) is the other halfspace.
For x ∈ ∂K, we define r(x) as the maximum of all real numbers ρ so that B_2^n(x − ρN_K(x), ρ) ⊆ K. Then we formulate an integrability condition for the minimal function:
\[
\int_{\partial K} \frac{d\mu_K(x)}{\big(M_g(x)\big)^{\frac{2}{n-1}}\, r(x)} < \infty. \tag{32}
\]
The following theorem was proved in [47].

Theorem 4.1. Let K be a convex body in R^n. Suppose that g : ∂K → R is an integrable, almost everywhere strictly positive function that satisfies the integrability condition (32). Then
\[
c_n \lim_{s \to 0} \frac{|K| - |K_{g,s}|}{s^{\frac{2}{n-1}}} = \int_{\partial K} \frac{\kappa_K^{\frac{1}{n-1}}}{g^{\frac{2}{n-1}}}\, d\mu_K,
\]
where c_n = 2 |B_2^{n-1}|^{\frac{2}{n-1}}.
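Theorem 4.1 can be checked by hand, and numerically, for the unit disc B_2^2 with a constant weight g ≡ c (an editorial sketch, not taken from [47]; the constant c is ad hoc): a hyperplane cuts off g µ_K-measure at most s exactly when it cuts off an arc of length at most s/c, so K_{g,s} is the disc of radius cos(s/(2c)) and |K| − |K_{g,s}| = π sin²(s/(2c)). With n = 2, c_n = 2|B_2^1|² = 8, and both sides of the theorem equal 2π/c².

```python
import numpy as np

c = 1.5                                  # constant weight g = c on the unit circle
cn = 2 * 2.0**2                          # c_n = 2 |B_2^{n-1}|^{2/(n-1)} = 2 * 2^2 = 8 for n = 2
rhs = 2 * np.pi / c**2                   # int_{dK} kappa^{1/(n-1)} / g^{2/(n-1)} dmu = 2*pi/c^2

for s in [0.5, 0.1, 0.01, 0.001]:
    # for g = c, hyperplanes cutting off g-measure <= s cut off arcs of length <= s/c,
    # so K_{g,s} is the disc of radius cos(s/(2c)) (assuming s/c < pi)
    vol_diff = np.pi * (1 - np.cos(s / (2 * c))**2)
    print(s, cn * vol_diff / s**2)       # tends to rhs as s -> 0
print("limit predicted by Theorem 4.1:", rhs)
```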
Theorem 4.1 was used in [47] to give geometric interpretations of Lp affine surface area and in [52] to give geometric interpretations of Rényi divergences. Now we use this theorem to give geometric interpretations of f-divergences for cone measures of convex bodies.
For a convex function f : (0, ∞) → R, let g_f, h_f : ∂K → R be defined as
\[
g_f(x) = \left( n|K^\circ|\, n^n |K|^n \; \frac{p_K\, q_K}{f\!\big(\tfrac{p_K}{q_K}\big)^{\,n-1}} \right)^{\frac{1}{2}} \tag{33}
\]
and
\[
h_f(x) = g_{f^*}(x) = \left( n|K^\circ|\, n^n |K|^n \; \frac{q_K^{\,n}}{p_K^{\,n-2}\, f\!\big(\tfrac{q_K}{p_K}\big)^{\,n-1}} \right)^{\frac{1}{2}}. \tag{34}
\]
Corollary 4.2. Let K be a convex body in K_0 and let f : (0, ∞) → R be convex. Let g_f, h_f : ∂K → R be defined as in (33) and (34). If g_f and h_f are integrable, almost everywhere strictly positive functions that satisfy the integrability condition (32), then
\[
c_n \lim_{s \to 0} \frac{|K| - |K_{g_f,s}|}{s^{\frac{2}{n-1}}} = D_f(P_K, Q_K)
\]
and
\[
c_n \lim_{s \to 0} \frac{|K| - |K_{h_f,s}|}{s^{\frac{2}{n-1}}} = D_f(Q_K, P_K).
\]
Proof. The proof of the corollary follows immediately from Theorem 4.1.
References
[1] M. S. Ali and D. Silvey, A general class of coefficients of divergence
of one distribution from another, Journal of the Royal Statistical Society,
Series B 28 (1966), 131-142.
[2] I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Publ.
Math. Inst. Hungar. Acad. Sci. ser. A 8 (1963), 84-108.
[3] T. Cover and J. Thomas, Elements of information theory, second
ed., Wiley-Interscience, (John Wiley and Sons), Hoboken, NJ, (2006).
[4] R. J. Gardner, A positive answer to the Busemann-Petty problem in
three dimensions, Ann. of Math. (2) 140 (1994), 435-47.
[5] R. J. Gardner, The dual Brunn-Minkowski theory for bounded Borel
sets: Dual affine quermassintegrals and inequalities, Adv. Math. 216
(2007), 358-386.
[6] R. J. Gardner, A. Koldobsky, and T. Schlumprecht, An analytical solution to the Busemann-Petty problem on sections of convex
bodies, Ann. of Math. (2) 149 (1999), 691-703.
[7] R. J. Gardner and G. Zhang, Affine inequalities and radial mean
bodies. Amer. J. Math. 120, no.3, (1998), 505-528.
[8] E. Grinberg and G. Zhang, Convolutions, transforms, and convex
bodies, Proc. London Math. Soc. (3) 78 (1999), 77-115.
[9] C. Haberl, Blaschke valuations, Amer. J. of Math., in press
[10] C. Haberl and F. Schuster, General Lp affine isoperimetric inequalities. J. Differential Geometry 83 (2009), 1-26.
[11] C. Haberl, E. Lutwak, D. Yang and G. Zhang, The even Orlicz
Minkowski problem, Adv. Math. 224 (2010), 2485-2510
[12] D. Hug, Curvature Relations and Affine Surface Area for a General
Convex Body and its Polar. Results in Mathematics V. 29 233-248,
(1996).
[13] J. Jenkinson and E. Werner, Relative entropies for convex bodies, to appear in Transactions of the AMS.
[14] D. Klain, Star valuations and dual mixed volumes, Adv. Math. 121
(1996), 80-101.
[15] D. Klain, Invariant valuations on star-shaped sets, Adv. Math. 125
(1997), 95-113.
[16] F. Liese and I. Vajda, Convex Statistical Distances, Leipzig, Germany, Teubner, (1987).
[17] F. Liese and I. Vajda, On Divergences and Information in Statistics
and Information Theory, IEEE Transactions on Information Theory 52
(2006), 4394-4412.
[18] J. Lindenstrauss, L. Tzafriri, Classical Banach spaces. I. Sequence
spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete, Vol. 92.
Springer-Verlag, Berlin-New York, (1977).
[19] M. Ludwig, Ellipsoids and matrix valued valuations, Duke Math. J.
119 (2003), 159-188.
[20] M. Ludwig, Minkowski areas and valuations, J. Differential Geometry,
86 (2010), 133-162.
[21] M. Ludwig, General affine surface areas, Adv. Math., 224 (2010),
2346-2360.
[22] M. Ludwig and M. Reitzner, A Characterization of Affine Surface
Area, Adv. Math. 147 (1999), 138-172.
[23] M. Ludwig and M. Reitzner, A classification of SL(n) invariant
valuations. Ann. of Math. 172 (2010), 1223-1271.
[24] E. Lutwak, Dual mixed volumes, Pacific J. Math. 58 (1975) 531-538.
[25] E. Lutwak, The Brunn-Minkowski-Firey theory I : Mixed volumes and
the Minkowski problem, J. Differential Geom. 38 (1993), 131-150.
[26] E. Lutwak, The Brunn-Minkowski-Firey theory II : Affine and geominimal surface areas, Adv. Math. 118 (1996), 244-294.
[27] E. Lutwak, Intersection bodies and dual mixed volumes, Adv. Math.
71 (1988), 232-261.
[28] E. Lutwak, D. Yang and G. Zhang, A new ellipsoid associated with
convex bodies, Duke Math. J. 104 (2000), 375-390.
[29] E. Lutwak, D. Yang and G. Zhang, Sharp Affine Lp Sobolev inequalities, J. Differential Geometry 62 (2002), 17-38.
[30] E. Lutwak, D. Yang and G. Zhang, The Cramer–Rao inequality
for star bodies, Duke Math. J. 112 (2002), 59-81.
[31] E. Lutwak, D. Yang and G. Zhang, Volume inequalities for subspaces of Lp , J. Differential Geometry 68 (2004), 159-184.
[32] E. Lutwak, D. Yang and G. Zhang, Moment-entropy inequalities,
Ann. Probab. 32 (2004), 757-774.
[33] E. Lutwak, D. Yang and G. Zhang, Cramer-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information, IEEE Transactions on Information Theory 51 (2005), 473-478.
[34] E. Lutwak and G. Zhang, Blaschke-Santaló inequalities, J. Differential Geom. 47 (1997), 1-16.
[35] M. Meyer and E. Werner, The Santaló-regions of a convex body,
Transactions of the AMS 350, no.11 (1998), 4569-4591.
[36] M. Meyer and E. Werner, On the p-affine surface area, Adv. Math.
152 (2000), 288-313.
[37] T. Morimoto, Markov processes and the H-theorem, J. Phys. Soc. Jap.
18 (1963), 328-331.
[38] F. Nazarov, F. Petrov, D. Ryabogin and A. Zvavitch, A remark
on the Mahler conjecture: local minimality of the unit cube, Duke Math.
J. 154 (2010), 419-430.
[39] G. Paouris and E. Werner, Relative entropy of cone measures and
Lp -centroid bodies, Proc. London Math. Soc. DOI 10.1112/plms/pdr030
(2011).
[40] C. Petty, Affine isoperimetric problems, Discrete Geometry and Convexity. Ann. New York Acad. Sci. 441 (1985), 113-127.
[41] A. Rényi, On measures of entropy and information, Proceedings of
the 4th Berkeley Symposium on Probability Theory and Mathematical
Statistics, vol.1 (1961), 547-561.
[42] B. Rubin and G. Zhang, Generalizations of the Busemann-Petty
problem for sections of convex bodies, J. Funct. Anal. 213 (2004), 473-501.
[43] F. Schuster, Crofton measures and Minkowski valuations, Duke
Math. J. 154 (2010), 1-30.
[44] C. Schütt, On the affine surface area, Proc. Amer. Math. Soc. 118
(1990), 275-290.
[45] C. Schütt and E. Werner, The convex floating body. Math. Scand.
66 275-290, (1990).
[46] C. Schütt and E. Werner, Random polytopes of points chosen from
the boundary of a convex body. GAFA Seminar Notes, Lecture Notes in
Mathematics 1807, Springer-Verlag (2002), 241-422.
[47] C. Schütt and E. Werner, Surface bodies and p-affine surface area.
Adv. Math. 187 (2004), 98-145.
[48] A. Stancu, The Discrete Planar L0 -Minkowski Problem. Adv. Math.
167 (2002), 160-174.
[49] A. Stancu, On the number of solutions to the discrete two-dimensional
L0 -Minkowski problem. Adv. Math. 180 (2003), 290-323.
[50] E. Werner, Illumination bodies and affine surface area, Studia Math.
110 (1994), 257-269.
[51] E. Werner, On Lp -affine surface areas, Indiana Univ. Math. J. 56 No.
5 (2007), 2305-2324.
[52] E. Werner, Rényi Divergence and Lp -affine surface area for convex bodies, Adv. Math. 230 (2012), 1040-1059.
[53] E. Werner and D. Ye, New Lp affine isoperimetric inequalities, Adv.
Math. 218 (2008), no. 3, 762-780.
[54] E. Werner and D. Ye, Inequalities for mixed p-affine surface area,
Math. Ann. 347 (2010), 703-737.
[55] D. Ye, Inequalities for general mixed affine surface areas, J. London
Math. Soc. 85 (2012), 101-120.
[56] G. Zhang, Intersection bodies and Busemann-Petty inequalities in R4 ,
Ann. of Math. 140 (1994), 331-346.
[57] G. Zhang, A positive answer to the Busemann-Petty problem in four
dimensions, Ann. of Math. 149 (1999), 535-543.
[58] G. Zhang, New Affine Isoperimetric Inequalities, ICCM 2007, Vol. II,
239-267.
Elisabeth Werner
Department of Mathematics
Case Western Reserve University
Cleveland, Ohio 44106, U. S. A.
[email protected]
Université de Lille 1
UFR de Mathématique
59655 Villeneuve d’Ascq, France