
UDC 519.722
Weighted entropy: basic facts and properties
M. Kelbert∗, Y. Suhov†‡, I. Stuhl§

∗ Moscow Higher School of Economics, RF
† Math Dept, Penn State University, PA, USA
‡ DPMMS, University of Cambridge, UK
§ Math Dept, University of Denver, USA
Abstract. The concept of weighted entropy takes into account the values of different outcomes, i.e., makes entropy context-dependent, through the weight function. We analyse an analog of the entropy-power inequality for the weighted entropy and discuss connections with the weighted Lieb splitting inequality.
Keywords: weighted entropy, Gibbs inequality, Ky-Fan inequality, entropy power
inequality, Lieb splitting inequality.
1. Introduction
We all know that the Shannon entropy of a probability distribution $p$,
\[
h(p) = -\sum_i p(x_i)\log p(x_i),
\]
is context-free, i.e., it does not depend on the nature of the outcomes $x_i$, only upon the probabilities $p(x_i)$. However, imagine two equally rare medical conditions, occurring with probability $p \ll 1$, one of which carries a major health risk while the other is just a peculiarity. Formally, they provide the same amount of information, $-\log p$, but the value of this information can be very different. So, we may modify the definition to make it context-dependent. The weighted entropy is defined as
\[
h^w_\varphi(p) = -\sum_i \varphi(x_i)\,p(x_i)\log p(x_i).
\]
A positive function $x_i \mapsto \varphi(x_i) \in \mathbb{R}_+$ represents the weight of outcome $x_i$. A popular example is $\varphi(x) = \mathbf{1}(x \in A)$, where $A$ is a particular subcollection of outcomes.
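To make the contrast concrete, here is a minimal Python sketch (natural logarithms; the probabilities and weight values are illustrative, not taken from the text): the two rare conditions contribute equally to the Shannon entropy, while the weighted entropy separates them once the risky condition is weighted up.

    import numpy as np

    def weighted_entropy(p, phi):
        # h^w_phi(p) = -sum_i phi(x_i) p(x_i) log p(x_i), natural log
        p, phi = np.asarray(p, float), np.asarray(phi, float)
        mask = p > 0  # outcomes with zero probability contribute nothing
        return -np.sum(phi[mask] * p[mask] * np.log(p[mask]))

    # two equally rare conditions; the risky one gets a large weight
    p = [1e-3, 1e-3, 1 - 2e-3]
    print(weighted_entropy(p, [1.0, 1.0, 1.0]))   # plain Shannon entropy
    print(weighted_entropy(p, [10.0, 0.1, 1.0]))  # context-dependent version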
A similar approach can be proposed for the differential entropy of the probability density function (PDF) $f_Z$ of a random variable (RV) $Z$. Define the weighted differential entropy (WDE) as
\[
h^w_\varphi(f_Z) = h^w_\varphi(Z) = -\mathbf{E}\big[\varphi(Z)\log f_Z(Z)\big] = -\int \varphi(x)\,f_Z(x)\log f_Z(x)\,dx. \qquad (1)
\]
We say that the WDE is finite if the RV $Z$ has a density and the integral in (1) is absolutely convergent. Basic properties of the WDE are studied in [4].
As an example, take $f(x) = f^{No}_C(x)$, where $f^{No}_C$ stands for the $d$-dimensional Gaussian density with mean $0$ and covariance matrix $C$. Then
\[
h^w_\varphi(f^{No}_C) = \frac{\alpha_\varphi(C)}{2}\log\big((2\pi)^d\det(C)\big) + \frac{\log e}{2}\,{\rm tr}\big(C^{-1}\Phi_{C,\varphi}\big),
\]
where
\[
\alpha_\varphi(C) = \int_{\mathbb{R}^d}\varphi(x)\,f^{No}_C(x)\,dx, \qquad \Phi_{C,\varphi} = \int_{\mathbb{R}^d} x\,x^T\varphi(x)\,f^{No}_C(x)\,dx.
\]
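As a sanity check, the following Python sketch compares this closed form with a direct Monte Carlo evaluation of (1) in the simplest case $d = 1$, $C = \sigma^2$, for the exponential weight $\varphi(x) = e^{tx}$; here $\alpha_\varphi(C) = e^{t^2\sigma^2/2}$ and $\Phi_{C,\varphi} = e^{t^2\sigma^2/2}(\sigma^2 + t^2\sigma^4)$ follow from the Gaussian moment generating function. Natural logarithms are used ($\log e = 1$), and the parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, t = 1.5, 0.3                      # illustrative values

    # Monte Carlo estimate of (1) with phi(x) = exp(t x)
    x = rng.normal(0.0, sigma, 10**6)
    phi = np.exp(t * x)
    log_f = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
    h_mc = -np.mean(phi * log_f)

    # closed form: alpha_phi(C) and Phi_{C,phi} via the Gaussian MGF
    alpha = np.exp(t**2 * sigma**2 / 2)
    Phi = alpha * (sigma**2 + t**2 * sigma**4)
    h_exact = 0.5 * alpha * np.log(2 * np.pi * sigma**2) + 0.5 * Phi / sigma**2
    print(h_mc, h_exact)                     # agree up to Monte Carlo error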
2. The weighted Gibbs inequality
Given two non-negative functions $f$, $g$, define the weighted Kullback--Leibler divergence (or relative WDE) as
\[
D^w_\varphi(f\|g) = \int \varphi(x)\,f(x)\log\frac{f(x)}{g(x)}\,dx.
\]
Proposition 1. Suppose that
\[
\int \varphi(x)\big[f(x) - g(x)\big]\,dx \ge 0.
\]
Then $D^w_\varphi(f\|g) \ge 0$.
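A quick grid-based illustration in Python (the two densities, the indicator weight and the grid are all illustrative choices of ours): whenever the printed condition is non-negative, Proposition 1 guarantees that the printed divergence is non-negative too.

    import numpy as np

    x = np.linspace(-10, 10, 20001)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)           # N(0, 1)
    g = np.exp(-(x - 1)**2 / 4) / np.sqrt(4 * np.pi)     # N(1, 2)
    phi = (x < 0).astype(float)                          # indicator weight

    condition = np.sum(phi * (f - g)) * dx               # int phi (f - g) dx
    divergence = np.sum(phi * f * np.log(f / g)) * dx    # D^w_phi(f || g)
    print(condition, divergence)                         # here both are positive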
3. Concavity and convexity of weighted entropy
Theorem 2. (a) The function $f \mapsto h^w_\varphi(f)$ is concave in the argument $f$. Namely, for any PDFs $f_1(x)$, $f_2(x)$, any non-negative function $x \mapsto \varphi(x)$, and $\lambda_1, \lambda_2 \in [0,1]$ such that $\lambda_1 + \lambda_2 = 1$,
\[
h^w_\varphi(\lambda_1 f_1 + \lambda_2 f_2) \ge \lambda_1 h^w_\varphi(f_1) + \lambda_2 h^w_\varphi(f_2).
\]
This inequality is strict unless $\varphi(x)\big[f_1(x) - f_2(x)\big] = 0$ for $(\lambda_1 f_1 + \lambda_2 f_2)$-almost all $x$.
(b) However, the relative WDE is convex: given two pairs of PDFs $(f_1, f_2)$ and $(g_1, g_2)$,
\[
\lambda_1 D^w_\varphi(f_1\|g_1) + \lambda_2 D^w_\varphi(f_2\|g_2) \ge D^w_\varphi(\lambda_1 f_1 + \lambda_2 f_2\,\|\,\lambda_1 g_1 + \lambda_2 g_2).
\]
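Both parts of Theorem 2 are easy to probe numerically. The Python sketch below evaluates $h^w_\varphi$ and $D^w_\varphi$ on a grid for Gaussian densities and a positive weight; all parameter choices are illustrative.

    import numpy as np

    x = np.linspace(-10, 10, 20001)
    dx = x[1] - x[0]
    gauss = lambda m, s: np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    f1, f2, g1, g2 = gauss(0, 1), gauss(2, 1.5), gauss(1, 2), gauss(-1, 1)
    phi = 1 + 0.5 * np.tanh(x)               # an illustrative positive weight
    lam1, lam2 = 0.3, 0.7

    hw = lambda f: -np.sum(phi * f * np.log(f)) * dx        # weighted entropy
    Dw = lambda f, g: np.sum(phi * f * np.log(f / g)) * dx  # relative WDE

    print(hw(lam1 * f1 + lam2 * f2) >= lam1 * hw(f1) + lam2 * hw(f2))   # (a): True
    print(lam1 * Dw(f1, g1) + lam2 * Dw(f2, g2)
          >= Dw(lam1 * f1 + lam2 * f2, lam1 * g1 + lam2 * g2))          # (b): True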
4. Ky-Fan type inequalities
It is well known that $C \mapsto \delta(C) = \log\det(C)$ is a concave function of a (strictly) positive-definite matrix $C$:
\[
\delta(\lambda_1 C_1 + \lambda_2 C_2) \ge \lambda_1\delta(C_1) + \lambda_2\delta(C_2),
\]
where $\lambda_1 + \lambda_2 = 1$, $\lambda_i \ge 0$. This is the Ky-Fan inequality. In terms of differential entropies it is equivalent to the inequality
\[
h(f^{No}_C) \ge \lambda_1 h(f^{No}_{C_1}) + \lambda_2 h(f^{No}_{C_2}),
\]
where $C = \lambda_1 C_1 + \lambda_2 C_2$. Theorem 3 below presents a previously unknown series of bounds of Ky-Fan type. The most explicit results are available for $\varphi(x) = \exp(x^T t)$, $t \in \mathbb{R}^d$, in view of the identity $h^w_\varphi(f^{No}) = \exp\big(\tfrac12 t^T C t\big)\,h(f^{No})$. Introduce the set
\[
S = \big\{t \in \mathbb{R}^d :\ F^{(1)}(t) \ge 0,\ F^{(2)}(t) \le 0\big\},
\]
where
\[
F^{(1)}(t) = \Big[\sum_{i=1}^{2}\lambda_i\exp\big(\tfrac12 t^T C_i t\big) - \exp\big(\tfrac12 t^T C t\big)\Big]\log\big((2\pi)^d\det(C)\big),
\]
\[
F^{(2)}(t) = \sum_{i=1}^{2}\lambda_i\exp\big(\tfrac12 t^T C_i t\big) - \exp\big(\tfrac12 t^T C t\big) + \sum_{i=1}^{2}\lambda_i\Big[\exp\big(\tfrac12 t^T C t\big)\,{\rm tr}\big(C^{-1}C_i\big) - d\exp\big(\tfrac12 t^T C_i t\big)\Big].
\]
Theorem 3. Given positive-definite matrices $C_1$, $C_2$ and $\lambda_1, \lambda_2 \in [0,1]$ with $\lambda_1 + \lambda_2 = 1$, set $C = \lambda_1 C_1 + \lambda_2 C_2$ and assume $t \in S$. Then
\[
h(f^{No}_C)\exp\big(\tfrac12 t^T C t\big) - \lambda_1 h(f^{No}_{C_1})\exp\big(\tfrac12 t^T C_1 t\big) - \lambda_2 h(f^{No}_{C_2})\exp\big(\tfrac12 t^T C_2 t\big) \ge 0,
\]
with equality iff $\lambda_1\lambda_2 = 0$ or $C_1 = C_2$.
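Under the reconstruction of $F^{(1)}$, $F^{(2)}$ displayed above, membership in $S$ and the bound of Theorem 3 can be checked numerically. The Python sketch below does so for a pair of illustrative positive-definite matrices (natural logarithms, so $h(f^{No}_C) = \tfrac12\log\big((2\pi e)^d\det C\big)$).

    import numpy as np

    lam1, lam2, d = 0.4, 0.6, 2
    C1 = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative matrices
    C2 = np.array([[1.5, -0.2], [-0.2, 2.5]])
    C = lam1 * C1 + lam2 * C2

    h = lambda M: 0.5 * np.log((2 * np.pi * np.e)**d * np.linalg.det(M))
    e = lambda M, t: np.exp(0.5 * t @ M @ t)

    def in_S(t):
        s = lam1 * e(C1, t) + lam2 * e(C2, t) - e(C, t)
        F1 = s * np.log((2 * np.pi)**d * np.linalg.det(C))
        F2 = s + sum(l * (e(C, t) * np.trace(np.linalg.solve(C, Ci)) - d * e(Ci, t))
                     for l, Ci in ((lam1, C1), (lam2, C2)))
        return F1 >= 0 and F2 <= 0

    t = np.array([0.2, -0.1])
    lhs = h(C) * e(C, t) - lam1 * h(C1) * e(C1, t) - lam2 * h(C2) * e(C2, t)
    print(in_S(t), lhs >= 0)   # Theorem 3: if t is in S, lhs must be >= 0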
5. Weighted entropy-power inequality (WEPI)
Let $X_1$, $X_2$ be independent RVs with PDFs $f_1$, $f_2$, and let $X = X_1 + X_2$. The famous Shannon entropy-power inequality states that
\[
h(X_1 + X_2) \ge h(N_1 + N_2),
\]
where $N_1$, $N_2$ are independent Gaussian RVs such that $h(X_i) = h(N_i)$, $i = 1, 2$. Equivalently,
\[
e^{2h(X_1 + X_2)} \ge e^{2h(X_1)} + e^{2h(X_2)}; \qquad (2)
\]
see, e.g., [1]. We are interested in the weighted entropy-power inequality (WEPI)
\[
\kappa := \exp\Big(\frac{2h^w_\varphi(X_1)}{\mathbf{E}\varphi(X_1)}\Big) + \exp\Big(\frac{2h^w_\varphi(X_2)}{\mathbf{E}\varphi(X_2)}\Big) \le \exp\Big(\frac{2h^w_\varphi(X)}{\mathbf{E}\varphi(X)}\Big). \qquad (3)
\]
Note that (3) coincides with (2) when $\varphi \equiv 1$. We set
\[
\alpha = \tan^{-1}\exp\Big[\frac{h^w_\varphi(X_2)}{\mathbf{E}\varphi(X_2)} - \frac{h^w_\varphi(X_1)}{\mathbf{E}\varphi(X_1)}\Big], \qquad Y_1 = \frac{X_1}{\cos\alpha}, \quad Y_2 = \frac{X_2}{\sin\alpha}. \qquad (4)
\]
Theorem 4. Given independent RVs $X_1$, $X_2$ with PDFs $f_1$, $f_2$ and a weight function $\varphi$, set $X = X_1 + X_2$. Assume the following conditions:
(i)
\[
\mathbf{E}\varphi(X_i) \ge \mathbf{E}\varphi(X) \text{ if } \kappa \ge 1, \qquad \mathbf{E}\varphi(X_i) \le \mathbf{E}\varphi(X) \text{ if } \kappa \le 1, \qquad i = 1, 2; \qquad (5)
\]
(ii) with $Y_1$, $Y_2$ and $\alpha$ as defined in (4),
\[
(\cos\alpha)^2\,h^w_{\varphi_c}(Y_1) + (\sin\alpha)^2\,h^w_{\varphi_s}(Y_2) \le h^w_\varphi(X), \qquad (6)
\]
where $\varphi_c(x) = \varphi(x\cos\alpha)$, $\varphi_s(x) = \varphi(x\sin\alpha)$ and
\[
h^w_{\varphi_c}(Y_1) = -\mathbf{E}\big[\varphi_c(Y_1)\log f_{Y_1}(Y_1)\big], \qquad h^w_{\varphi_s}(Y_2) = -\mathbf{E}\big[\varphi_s(Y_2)\log f_{Y_2}(Y_2)\big].
\]
Then the WEPI holds.
Paying homage to [3], we call (6) the weighted Lieb splitting inequality (WLSI). In some cases the WLSI can be checked effectively.
Example 5. Let $d = 1$ and $X_1 \sim N(0, \sigma_1^2)$, $X_2 \sim N(0, \sigma_2^2)$. Then the WLSI (6) takes the following form:
\[
\log\big(2\pi(\sigma_1^2 + \sigma_2^2)\big)\,\mathbf{E}\varphi(X) + \frac{\log e}{\sigma_1^2 + \sigma_2^2}\,\mathbf{E}\big[X^2\varphi(X)\big]
\]
\[
\ge\ (\cos\alpha)^2\Big[\log\frac{2\pi\sigma_1^2}{(\cos\alpha)^2}\,\mathbf{E}\varphi(X_1) + \frac{\log e}{\sigma_1^2}\,\mathbf{E}\big[X_1^2\varphi(X_1)\big]\Big]
\]
\[
+\ (\sin\alpha)^2\Big[\log\frac{2\pi\sigma_2^2}{(\sin\alpha)^2}\,\mathbf{E}\varphi(X_2) + \frac{\log e}{\sigma_2^2}\,\mathbf{E}\big[X_2^2\varphi(X_2)\big]\Big].
\]
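This explicit form lends itself to a direct numerical check. The Python sketch below picks illustrative variances and a weight close to a constant, computes $\alpha$ from (4) by numerical integration (natural logarithms, so $\log e = 1$), and prints both sides; for $\varphi \equiv 1$ the two sides coincide exactly, and for small perturbations the left side should dominate (cf. Proposition 6 below).

    import numpy as np

    s1, s2 = 1.0, 1.5                         # illustrative variances
    x = np.linspace(-12, 12, 24001)
    dx = x[1] - x[0]
    gauss = lambda s: np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    phi = 1 + 0.1 * np.exp(-(x - 1)**2)       # a weight close to a constant

    Ephi = lambda s: np.sum(phi * gauss(s)) * dx
    EX2phi = lambda s: np.sum(x**2 * phi * gauss(s)) * dx
    hw = lambda s: -np.sum(phi * gauss(s) * np.log(gauss(s))) * dx

    s = np.sqrt(s1**2 + s2**2)                # X = X1 + X2 ~ N(0, s1^2 + s2^2)
    alpha = np.arctan(np.exp(hw(s2) / Ephi(s2) - hw(s1) / Ephi(s1)))
    c2, n2 = np.cos(alpha)**2, np.sin(alpha)**2

    lhs = np.log(2 * np.pi * s**2) * Ephi(s) + EX2phi(s) / s**2
    rhs = (c2 * (np.log(2 * np.pi * s1**2 / c2) * Ephi(s1) + EX2phi(s1) / s1**2)
           + n2 * (np.log(2 * np.pi * s2**2 / n2) * Ephi(s2) + EX2phi(s2) / s2**2))
    print(lhs, rhs)                           # WLSI (6): lhs >= rhs expected here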
5.1. WLSI for a weight function close to a constant
Proposition 6. Let $d = 1$ and let $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2$, be independent, with $X = X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$. Suppose that the WF $x \mapsto \varphi(x)$ is twice continuously differentiable and
\[
|\varphi''(x)| \le \epsilon\,\varphi(x), \qquad |\varphi(x) - \bar\varphi| \le \epsilon, \qquad (7)
\]
where $\epsilon > 0$ and $\bar\varphi > 0$ are constants. Then there exists $\epsilon_0 > 0$ such that the WLSI holds true for every WF $\varphi$ satisfying (7) with $0 < \epsilon < \epsilon_0$. Hence, checking the WEPI reduces to condition (5).
For a RV $Z$, $\gamma > 0$ and an independent Gaussian RV $N \sim N(0, I_d)$, define
\[
M(Z; \gamma) = \mathbf{E}\Big[\big\|Z - \mathbf{E}\big[Z \,\big|\, Z\sqrt{\gamma} + N\big]\big\|^2\Big],
\]
where $\|\cdot\|$ stands for the Euclidean norm. According to [2, 5], the differential entropy satisfies
\[
h(Z) = h(N) + \frac{1}{2}\int_0^\infty \big(M(Z; \gamma) - \mathbf{1}_{\{\gamma < 1\}}\big)\,d\gamma.
\]
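For a Gaussian $Z \sim N(0, \sigma^2)$ (taking $d = 1$), the conditional expectation $\mathbf{E}[Z \mid Z\sqrt{\gamma} + N]$ is linear in the observation and $M(Z; \gamma) = \sigma^2/(1 + \gamma\sigma^2)$, which makes a Monte Carlo evaluation easy to validate; here is a Python sketch with illustrative parameters.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma, gamma, n = 1.3, 0.7, 10**6         # illustrative values

    z = rng.normal(0.0, sigma, n)
    y = np.sqrt(gamma) * z + rng.normal(0.0, 1.0, n)   # observation Z*sqrt(gamma) + N
    z_hat = np.sqrt(gamma) * sigma**2 / (gamma * sigma**2 + 1) * y  # E[Z | y]
    print(np.mean((z - z_hat)**2),                     # Monte Carlo M(Z; gamma)
          sigma**2 / (1 + gamma * sigma**2))           # closed-form value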
For $Z = Y_1, Y_2, X_1 + X_2$ assume the following conditions:
\[
\mathbf{E}\big[|\log f_Z(Z)|\big] < \infty, \qquad \mathbf{E}\big[\|Z\|^2\big] < \infty. \qquad (8)
\]
Theorem 7. Assume conditions (8). Let $\gamma_0$ be a point of continuity of $M(Z; \gamma)$ for $Z = Y_1, Y_2, X_1 + X_2$. Suppose that there exists $\delta > 0$ such that
\[
M(X_1 + X_2; \gamma_0) \ge M(Y_1; \gamma_0)(\cos\alpha)^2 + M(Y_2; \gamma_0)(\sin\alpha)^2 + \delta.
\]
Suppose also that for some $\bar\varphi > 0$ the WF satisfies
\[
|\varphi(x) - \bar\varphi| < \epsilon. \qquad (9)
\]
Then there exists $\epsilon_0 = \epsilon_0(\gamma_0, \delta, f_1, f_2)$ such that the WLSI holds true for all WFs satisfying (9) with $\epsilon < \epsilon_0$.
References
1. Kelbert M., Suhov Y. Information Theory and Coding by Example. Cambridge: Cambridge University Press, 2013.
2. Kelbert M., Suhov Y. Continuity of mutual entropy in the limiting signal-to-noise ratio regimes. In: Stochastic Analysis. Berlin: Springer-Verlag, 2010, 281-299.
3. Lieb E. Proof of an entropy conjecture of Wehrl. Commun. Math. Phys., 62 (1978), 35-41.
4. Suhov Y., Stuhl I., Sekeh S., Kelbert M. Basic inequalities for weighted entropies. Aequationes Mathematicae, 90 (2016), 4, 817-848.
5. Verdú S., Guo D. A simple proof of the entropy-power inequality. IEEE Transactions on Information Theory, 52, No. 5 (2006), 2165-2166.