SOME INEQUALITIES FOR THE CSISZÁR Φ-DIVERGENCE
S.S. DRAGOMIR
Abstract. Some inequalities for the Csiszár Φ-divergence and applications
for the Kullback-Leibler, Rényi, Hellinger and Bhattacharyya distances in Information Theory are given.
1. Introduction
Given a convex function Φ : R+ → R+, the Φ-divergence functional
(1.1) $I_\Phi(p, q) := \sum_{i=1}^{n} q_i \,\Phi\!\left(\frac{p_i}{q_i}\right)$
was introduced in Csiszár [1], [2] as a generalized measure of information, a "distance function" on the set of probability distributions Pn. The restriction here to discrete distributions is only for convenience; similar results hold for general distributions.
As in Csiszár [2], we interpret undefined expressions by
$\Phi(0) = \lim_{t \to 0^+} \Phi(t), \quad 0\,\Phi\!\left(\frac{0}{0}\right) = 0,$
$0\,\Phi\!\left(\frac{a}{0}\right) = \lim_{\varepsilon \to 0^+} \varepsilon\,\Phi\!\left(\frac{a}{\varepsilon}\right) = a \lim_{t \to \infty} \frac{\Phi(t)}{t}, \quad a > 0.$
The following results were essentially given by Csiszár and Körner [3].
Theorem 1. If Φ : R+ → R is convex, then IΦ (p, q) is jointly convex in p and q.
The following lower bound for the Φ−divergence functional also holds.
Theorem 2. Let Φ : R+ → R+ be convex. Then for every p, q ∈ R^n_+, we have the inequality
(1.2) $I_\Phi(p, q) \ge \left(\sum_{i=1}^{n} q_i\right) \Phi\!\left(\frac{\sum_{i=1}^{n} p_i}{\sum_{i=1}^{n} q_i}\right).$
If Φ is strictly convex, equality holds in (1.2) iff
(1.3) $\frac{p_1}{q_1} = \frac{p_2}{q_2} = \dots = \frac{p_n}{q_n}.$
Date: November 16, 1999.
1991 Mathematics Subject Classification. Primary 94Xxx; Secondary 26D15.
Key words and phrases. The Csiszár Φ-Divergence, Kullback-Leibler distance, Rényi
α−entropy, Hellinger discrimination, Bhattacharyya distance.
Corollary 1. Let Φ : R+ → R be convex and normalized, i.e.,
(1.4) $\Phi(1) = 0.$
Then for any p, q ∈ R^n_+ with
(1.5) $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i,$
we have the inequality
(1.6) $I_\Phi(p, q) \ge 0.$
If Φ is strictly convex, equality holds in (1.6) iff pi = qi for all i ∈ {1, ..., n}.
In particular, if p, q are probability vectors, then (1.5) is assured. Corollary 1
then shows, for strictly convex and normalized Φ : R+ → R, that
(1.7)
IΦ (p, q) ≥ 0 for all p, q ∈ Pn .
The equality holds in (1.7) iff p = q.
These are "distance properties". However, IΦ is not a metric: it violates the triangle inequality and is asymmetric, i.e., for general p, q ∈ R^n_+, $I_\Phi(p, q) \ne I_\Phi(q, p)$.
In the examples below we obtain, for suitable choices of the kernel Φ, some of the
best known distance functions IΦ used in mathematical statistics [4]-[5], information
theory [6]-[8] and signal processing [9]-[10].
Example 1. (Kullback-Leibler) For
(1.8) $\Phi(t) := t \log t, \quad t > 0,$
the Φ-divergence is
(1.9) $I_\Phi(p, q) = \sum_{i=1}^{n} p_i \log\left(\frac{p_i}{q_i}\right),$
the Kullback-Leibler distance [11]-[12].
Example 2. (Hellinger) Let
(1.10) $\Phi(t) = \frac{1}{2}\left(1 - \sqrt{t}\right)^2, \quad t > 0.$
Then IΦ gives the Hellinger distance [13]
(1.11) $I_\Phi(p, q) = \frac{1}{2} \sum_{i=1}^{n} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2,$
which is symmetric.
Example 3. (Rényi) For α > 1, let
(1.12) $\Phi(t) = t^\alpha, \quad t > 0.$
Then
(1.13) $I_\Phi(p, q) = \sum_{i=1}^{n} p_i^\alpha\, q_i^{1-\alpha},$
which is the α-order entropy [14].
Example 4. (χ²-distance) Let
(1.14) $\Phi(t) = (t - 1)^2, \quad t > 0.$
Then
(1.15) $I_\Phi(p, q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}$
is the χ²-distance between p and q.
Finally, we have
Example 5. (Variation distance) Let $\Phi(t) = |t - 1|$, $t > 0$. The corresponding divergence, called the variation distance, is symmetric:
$I_\Phi(p, q) = \sum_{i=1}^{n} |p_i - q_i|.$
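The examples above can be spot-checked numerically. The sketch below (the helper `csiszar_divergence` is ours, not from the paper) evaluates each kernel on a pair of probability vectors and verifies the positivity property (1.7); note that Φ(t) = t² is convex but not normalized (Φ(1) = 1), so for it inequality (1.2) gives I_Φ ≥ 1 instead.

```python
import math

def csiszar_divergence(phi, p, q):
    """I_Phi(p, q) = sum_i q_i * phi(p_i / q_i), assuming all p_i, q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]           # probability vectors, so (1.5) holds
q = [0.4, 0.4, 0.2]

kl        = csiszar_divergence(lambda t: t * math.log(t), p, q)                  # (1.8)-(1.9)
hellinger = csiszar_divergence(lambda t: 0.5 * (1.0 - math.sqrt(t)) ** 2, p, q)  # (1.10)-(1.11)
renyi_2   = csiszar_divergence(lambda t: t ** 2, p, q)                           # (1.12)-(1.13), alpha = 2
chi2      = csiszar_divergence(lambda t: (t - 1.0) ** 2, p, q)                   # (1.14)-(1.15)
variation = csiszar_divergence(lambda t: abs(t - 1.0), p, q)                     # Example 5

# Normalized convex Phi (Phi(1) = 0): inequality (1.7) gives I_Phi >= 0.
for value in (kl, hellinger, chi2, variation):
    assert value >= 0.0
# Phi(t) = t^2 is not normalized (Phi(1) = 1); here (1.2) gives I_Phi >= 1.
assert renyi_2 >= 1.0
```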
For other examples of divergence measures, see the paper [22] by J.N. Kapur,
where further references are given.
2. Other Inequalities for the Csiszár Φ-Divergence
We start with the following result.
Theorem 3. Let Φ : R+ → R be differentiable convex. Then for all p, q ∈ R^n_+ we have the inequality
(2.1) $\Phi'(1)(P_n - Q_n) \le I_\Phi(p, q) - Q_n \Phi(1) \le I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) - I_{\Phi'}(p, q),$
where $P_n := \sum_{i=1}^{n} p_i > 0$, $Q_n := \sum_{i=1}^{n} q_i > 0$, Φ' : (0, ∞) → R is the derivative of Φ, and $\frac{p^2}{q}$ denotes the vector with components $\frac{p_i^2}{q_i}$.
If Φ is strictly convex and $p_i, q_i > 0$ (i = 1, ..., n), then equality holds in (2.1) iff p = q.
Proof. As Φ is differentiable convex on R+, we have the inequality
(2.2) $\Phi'(y)(y - x) \ge \Phi(y) - \Phi(x) \ge \Phi'(x)(y - x)$
for all x, y ∈ R+.
Choosing $y = \frac{p_i}{q_i}$ and $x = 1$ in (2.2), we obtain
(2.3) $\Phi'\!\left(\frac{p_i}{q_i}\right)\left(\frac{p_i}{q_i} - 1\right) \ge \Phi\!\left(\frac{p_i}{q_i}\right) - \Phi(1) \ge \Phi'(1)\left(\frac{p_i}{q_i} - 1\right)$
for all i ∈ {1, ..., n}.
Now, if we multiply (2.3) by $q_i \ge 0$ (i = 1, ..., n) and sum over i from 1 to n, we can deduce
$\sum_{i=1}^{n} (p_i - q_i)\,\Phi'\!\left(\frac{p_i}{q_i}\right) \ge I_\Phi(p, q) - Q_n \Phi(1) \ge \Phi'(1)(P_n - Q_n),$
and as
$\sum_{i=1}^{n} (p_i - q_i)\,\Phi'\!\left(\frac{p_i}{q_i}\right) = I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) - I_{\Phi'}(p, q),$
the inequality (2.1) is obtained.
Equality holds in (2.2) for a strictly convex mapping iff x = y, and so equality holds in (2.1) iff $\frac{p_i}{q_i} = 1$ for all i ∈ {1, ..., n}; the theorem is proved.
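The two-sided bound (2.1) is easy to verify numerically. The sketch below is ours (not from the paper): it takes Φ(t) = t log t and a pair of positive, non-probability vectors and evaluates all three members.

```python
import math

def I(phi, a, b):
    """Csiszar functional I_phi(a, b) = sum_i b_i * phi(a_i / b_i), a_i, b_i > 0."""
    return sum(bi * phi(ai / bi) for ai, bi in zip(a, b))

phi  = lambda t: t * math.log(t)     # strictly convex on (0, oo)
dphi = lambda t: math.log(t) + 1.0   # its derivative Phi'

p = [0.3, 1.1, 0.6]                  # general positive vectors
q = [0.5, 0.8, 0.7]
Pn, Qn = sum(p), sum(q)

lower  = dphi(1.0) * (Pn - Qn)
middle = I(phi, p, q) - Qn * phi(1.0)
p2_over_q = [pi * pi / qi for pi, qi in zip(p, q)]   # the vector p^2/q, componentwise
upper  = I(dphi, p2_over_q, p) - I(dphi, p, q)

assert lower <= middle <= upper      # inequality (2.1)
```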
Remark 1. If we wish to drop the differentiability condition in the above theorem, we may choose, instead of Φ'(x), any number $l = l(x) \in \left[\Phi'_-(x), \Phi'_+(x)\right]$, and the inequality remains valid. This follows from the fact that for a convex mapping Φ : R+ → R+ we have
$l_2(x)(x - y) \ge \Phi(x) - \Phi(y) \ge l_1(y)(x - y), \quad x, y \in (0, \infty),$
where $l_1(y) \in \left[\Phi'_-(y), \Phi'_+(y)\right]$ and $l_2(x) \in \left[\Phi'_-(x), \Phi'_+(x)\right]$, with $\Phi'_-$ the left and $\Phi'_+$ the right derivative of Φ, respectively. We omit the details.
The following corollary is a natural consequence of the above theorem.
Corollary 2. Let Φ : R+ → R+ be convex and normalized. If $\Phi'(1)(P_n - Q_n) \ge 0$, then we have the positivity inequality
(2.4) $0 \le I_\Phi(p, q) \le I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) - I_{\Phi'}(p, q).$
Equality holds in (2.4) for a strictly convex mapping Φ iff p = q.
Remark 2. Corollary 2 shows that the positivity inequality (1.6) holds for a larger class of (p, q) ∈ R^n_+ × R^n_+ than the class {(p, q) ∈ R^n_+ × R^n_+ : P_n = Q_n} considered in Corollary 1.
We have the following theorem as well.
Theorem 4. Assume that Φ is differentiable convex on (0, ∞). If $p^{(j)}, q^{(j)}$ (j = 1, 2) are probability distributions, then for all λ ∈ [0, 1] we have the inequality
(2.5) $0 \le \lambda I_\Phi\!\left(p^{(1)}, q^{(1)}\right) + (1 - \lambda) I_\Phi\!\left(p^{(2)}, q^{(2)}\right) - I_\Phi\!\left(\lambda p^{(1)} + (1 - \lambda) p^{(2)}, \lambda q^{(1)} + (1 - \lambda) q^{(2)}\right)$
$\le \lambda(1 - \lambda) \sum_{i=1}^{n} \frac{q_i^{(1)} q_i^{(2)}}{\lambda q_i^{(1)} + (1 - \lambda) q_i^{(2)}} \left(\frac{p_i^{(1)}}{q_i^{(1)}} - \frac{p_i^{(2)}}{q_i^{(2)}}\right) \left[\Phi'\!\left(\frac{p_i^{(1)}}{q_i^{(1)}}\right) - \Phi'\!\left(\frac{p_i^{(2)}}{q_i^{(2)}}\right)\right],$
where Φ' is the derivative of Φ. Note that each term in the last sum is nonnegative, since Φ' is nondecreasing.
Proof. Set $r_i^{(j)} := \frac{p_i^{(j)}}{q_i^{(j)}}$ (j = 1, 2) and $s_i := \frac{\lambda p_i^{(1)} + (1 - \lambda) p_i^{(2)}}{\lambda q_i^{(1)} + (1 - \lambda) q_i^{(2)}}$. Using the inequality (2.2), we may state
(2.6) $\Phi'(s_i)\left(s_i - r_i^{(1)}\right) \ge \Phi(s_i) - \Phi\!\left(r_i^{(1)}\right) \ge \Phi'\!\left(r_i^{(1)}\right)\left(s_i - r_i^{(1)}\right)$
and
(2.7) $\Phi'(s_i)\left(s_i - r_i^{(2)}\right) \ge \Phi(s_i) - \Phi\!\left(r_i^{(2)}\right) \ge \Phi'\!\left(r_i^{(2)}\right)\left(s_i - r_i^{(2)}\right).$
Multiply (2.6) by $\lambda q_i^{(1)}$ and (2.7) by $(1 - \lambda) q_i^{(2)}$, add the obtained inequalities and sum over i from 1 to n, to get
(2.8) $\sum_{i=1}^{n} \Phi'(s_i)\left[\lambda q_i^{(1)}\left(s_i - r_i^{(1)}\right) + (1 - \lambda)\, q_i^{(2)}\left(s_i - r_i^{(2)}\right)\right]$
$\ge I_\Phi\!\left(\lambda p^{(1)} + (1 - \lambda) p^{(2)}, \lambda q^{(1)} + (1 - \lambda) q^{(2)}\right) - \lambda I_\Phi\!\left(p^{(1)}, q^{(1)}\right) - (1 - \lambda) I_\Phi\!\left(p^{(2)}, q^{(2)}\right)$
$\ge \sum_{i=1}^{n} \left[\lambda q_i^{(1)}\, \Phi'\!\left(r_i^{(1)}\right)\left(s_i - r_i^{(1)}\right) + (1 - \lambda)\, q_i^{(2)}\, \Phi'\!\left(r_i^{(2)}\right)\left(s_i - r_i^{(2)}\right)\right].$
However, a direct computation gives
$\lambda q_i^{(1)}\left(s_i - r_i^{(1)}\right) = -\frac{\lambda(1 - \lambda)\, q_i^{(1)} q_i^{(2)}}{\lambda q_i^{(1)} + (1 - \lambda) q_i^{(2)}}\left(r_i^{(1)} - r_i^{(2)}\right)$
and
$(1 - \lambda)\, q_i^{(2)}\left(s_i - r_i^{(2)}\right) = \frac{\lambda(1 - \lambda)\, q_i^{(1)} q_i^{(2)}}{\lambda q_i^{(1)} + (1 - \lambda) q_i^{(2)}}\left(r_i^{(1)} - r_i^{(2)}\right),$
so the two terms cancel, which shows that the first member in (2.8) is zero. In addition, by the same identities, the last member in (2.8) equals
$-\lambda(1 - \lambda) \sum_{i=1}^{n} \frac{q_i^{(1)} q_i^{(2)}}{\lambda q_i^{(1)} + (1 - \lambda) q_i^{(2)}}\left(r_i^{(1)} - r_i^{(2)}\right)\left[\Phi'\!\left(r_i^{(1)}\right) - \Phi'\!\left(r_i^{(2)}\right)\right],$
which proves the theorem.
Remark 3. The first inequality in (2.5) is actually the joint convexity property of
IΦ (·, ·) which has been proven here in a different manner than in [3].
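Both inequalities in (2.5) can be spot-checked numerically. The following sketch is ours (hypothetical variable names), using Φ(t) = t log t and two pairs of probability distributions.

```python
import math

def I(phi, a, b):
    # Csiszar functional I_phi(a, b) = sum_i b_i * phi(a_i / b_i)
    return sum(bi * phi(ai / bi) for ai, bi in zip(a, b))

phi  = lambda t: t * math.log(t)
dphi = lambda t: math.log(t) + 1.0

p1, q1 = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
p2, q2 = [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]
lam = 0.4

p_mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
q_mix = [lam * a + (1 - lam) * b for a, b in zip(q1, q2)]

# middle member of (2.5): convex combination minus divergence of the mixture
gap = lam * I(phi, p1, q1) + (1 - lam) * I(phi, p2, q2) - I(phi, p_mix, q_mix)

# right-hand member of (2.5)
bound = lam * (1 - lam) * sum(
    (u * v) / (lam * u + (1 - lam) * v)
    * (a / u - b / v) * (dphi(a / u) - dphi(b / v))
    for a, u, b, v in zip(p1, q1, p2, q2)
)

assert 0.0 <= gap <= bound           # inequality (2.5)
```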
3. Applications for Some Particular Φ−Divergences
Let us consider the Kullback-Leibler distance given by (1.9):
(3.1) $KL(p, q) := \sum_{i=1}^{n} p_i \log\left(\frac{p_i}{q_i}\right).$
Consider the convex mapping Φ(t) = −log t, t > 0. For this mapping we have the Csiszár Φ-divergence
(3.2) $I_\Phi(p, q) = \sum_{i=1}^{n} q_i\left[-\log\left(\frac{p_i}{q_i}\right)\right] = \sum_{i=1}^{n} q_i \log\left(\frac{q_i}{p_i}\right) = KL(q, p).$
The following inequality holds.
Proposition 1. Let p, q ∈ R^n_+. Then we have the inequality
(3.3) $Q_n - P_n \le KL(q, p) \le \sum_{i=1}^{n} \frac{q_i^2}{p_i} - Q_n.$
The case of equality holds iff p = q.
Proof. Since Φ(t) = −log t, we have Φ'(t) = −1/t, t > 0. Then
$I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) = \sum_{i=1}^{n} p_i\, \Phi'\!\left(\frac{p_i}{q_i}\right) = \sum_{i=1}^{n} p_i\left(-\frac{q_i}{p_i}\right) = -Q_n,$
$I_{\Phi'}(p, q) = \sum_{i=1}^{n} q_i\left(-\frac{q_i}{p_i}\right) = -\sum_{i=1}^{n} \frac{q_i^2}{p_i},$
and then, from (2.1), we get
$-(P_n - Q_n) \le KL(q, p) \le -Q_n + \sum_{i=1}^{n} \frac{q_i^2}{p_i},$
which is the desired inequality (3.3).
The case of equality is obvious, taking into account that −log is a strictly convex mapping on (0, ∞).
The following result for the Kullback-Leibler distance also holds.
Proposition 2. Let p, q ∈ R^n_+. Then we have the inequality
(3.4) $P_n - Q_n \le KL(p, q) \le P_n - Q_n + KL(q, p) - KL\!\left(p, \frac{p^2}{q}\right).$
The case of equality holds iff p = q.
Proof. As Φ(t) = t log t, we have Φ'(t) = log t + 1. Then
$I_\Phi(p, q) = KL(p, q),$
$I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) = I_{\log(\cdot)+1}\!\left(\frac{p^2}{q}, p\right) = P_n + I_{\log(\cdot)}\!\left(\frac{p^2}{q}, p\right).$
As we know that $I_{-\log(\cdot)}(p, q) = KL(q, p)$ (see (3.2)), we have
$I_{\log(\cdot)}\!\left(\frac{p^2}{q}, p\right) = -KL\!\left(p, \frac{p^2}{q}\right).$
In addition,
$I_{\Phi'}(p, q) = I_{\log(\cdot)+1}(p, q) = Q_n + I_{\log(\cdot)}(p, q) = Q_n - KL(q, p),$
and then, by (2.1), we can state that
$P_n - Q_n \le KL(p, q) \le P_n - KL\!\left(p, \frac{p^2}{q}\right) - Q_n + KL(q, p),$
and the inequality (3.4) is obtained.
The case of equality holds from the fact that the mapping Φ(t) = t log t is strictly convex on (0, ∞).
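The bounds (3.3) and (3.4) admit a quick numerical check; the sketch below is ours, with KL defined for arbitrary positive vectors as in (3.1).

```python
import math

def kl(a, b):
    """KL(a, b) = sum_i a_i * log(a_i / b_i) for positive vectors, as in (3.1)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

p = [0.3, 1.1, 0.6]
q = [0.5, 0.8, 0.7]
Pn, Qn = sum(p), sum(q)

# Proposition 1, inequality (3.3)
assert Qn - Pn <= kl(q, p) <= sum(qi ** 2 / pi for pi, qi in zip(p, q)) - Qn

# Proposition 2, inequality (3.4); p^2/q is the componentwise vector
p2_over_q = [pi * pi / qi for pi, qi in zip(p, q)]
assert Pn - Qn <= kl(p, q) <= Pn - Qn + kl(q, p) - kl(p, p2_over_q)
```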
Now, let us consider the α-order entropy of Rényi (see (1.13)):
(3.5) $D_\alpha(p, q) := \sum_{i=1}^{n} p_i^\alpha\, q_i^{1-\alpha}, \quad \alpha > 1,$
for p, q ∈ R^n_+.
We know that Rényi's entropy is actually the Csiszár Φ-divergence for the convex mapping Φ(t) = t^α, α > 1, t > 0 (see Example 3).
The following proposition holds.
Proposition 3. Let p, q ∈ R^n_+. Then we have the inequality
(3.6) $\alpha(P_n - Q_n) \le D_\alpha(p, q) - Q_n \le \alpha\left[D_\alpha(p, q) - D_\alpha\!\left(q^{\frac{2-\alpha}{\alpha}}, p^{-1}\right)\right].$
The case of equality holds iff p = q.
Proof. Since Φ(t) = t^α, we have Φ'(t) = α t^{α−1}. Then
$I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) = \sum_{i=1}^{n} p_i \cdot \alpha\left(\frac{p_i}{q_i}\right)^{\alpha-1} = \alpha \sum_{i=1}^{n} p_i^\alpha\, q_i^{1-\alpha} = \alpha D_\alpha(p, q)$
and
$I_{\Phi'}(p, q) = \sum_{i=1}^{n} q_i \cdot \alpha\left(\frac{p_i}{q_i}\right)^{\alpha-1} = \alpha \sum_{i=1}^{n} p_i^{\alpha-1}\, q_i^{2-\alpha} = \alpha D_\alpha\!\left(q^{\frac{2-\alpha}{\alpha}}, p^{-1}\right).$
Using the inequality (2.1) (note that here Φ(1) = 1 and Φ'(1) = α), we have
$\alpha(P_n - Q_n) \le D_\alpha(p, q) - Q_n \le \alpha\left[D_\alpha(p, q) - D_\alpha\!\left(q^{\frac{2-\alpha}{\alpha}}, p^{-1}\right)\right],$
and the inequality (3.6) is proved.
The case of equality holds since the mapping Φ(t) = t^α is strictly convex on (0, ∞) for α > 1.
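Numerically, for α = 2 the middle argument $D_\alpha(q^{(2-\alpha)/\alpha}, p^{-1})$ reduces to $\sum_i p_i = P_n$, so (3.6) reads $2(P_n - Q_n) \le D_2(p, q) - Q_n \le 2[D_2(p, q) - P_n]$. The sketch below (our notation) checks the general form.

```python
def d_alpha(a, b, alpha):
    """D_alpha(a, b) = sum_i a_i^alpha * b_i^(1 - alpha), as in (3.5)."""
    return sum(ai ** alpha * bi ** (1.0 - alpha) for ai, bi in zip(a, b))

p = [0.3, 1.1, 0.6]
q = [0.5, 0.8, 0.7]
alpha = 2.0
Pn, Qn = sum(p), sum(q)

# q^((2 - alpha)/alpha) and p^(-1) are taken componentwise
q_pow = [qi ** ((2.0 - alpha) / alpha) for qi in q]
p_inv = [1.0 / pi for pi in p]

lower  = alpha * (Pn - Qn)
middle = d_alpha(p, q, alpha) - Qn
upper  = alpha * (d_alpha(p, q, alpha) - d_alpha(q_pow, p_inv, alpha))

assert lower <= middle <= upper      # inequality (3.6)
```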
Consider now the Hellinger discrimination (see for example [22])
$h^2(p, q) = \frac{1}{2} \sum_{i=1}^{n} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2,$
where p, q ∈ R^n_+.
We know that the Hellinger discrimination is actually the Csiszár Φ-divergence for the convex mapping $\Phi(t) = \frac{1}{2}\left(\sqrt{t} - 1\right)^2$.
We may state the following proposition.
Proposition 4. Let p, q ∈ R^n_+. Then we have the inequality
(3.7) $0 \le h^2(p, q) \le \frac{1}{2}(P_n - Q_n) + \frac{1}{2} \sum_{i=1}^{n} q_i\left(\sqrt{\frac{q_i}{p_i}} - \sqrt{\frac{p_i}{q_i}}\right).$
The equality holds iff p = q.
Proof. As $\Phi(t) = \frac{1}{2}\left(\sqrt{t} - 1\right)^2$, we have $\Phi'(t) = \frac{1}{2} - \frac{1}{2\sqrt{t}}$ and $\Phi''(t) = \frac{1}{4} \cdot \frac{1}{\sqrt{t^3}} > 0$ (t ∈ (0, ∞)), which shows that Φ is indeed strictly convex on (0, ∞).
We also have
$I_\Phi(p, q) = h^2(p, q),$
$I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) = \sum_{i=1}^{n} p_i\left(\frac{1}{2} - \frac{1}{2\sqrt{p_i/q_i}}\right) = \frac{1}{2}\left[P_n - \sum_{i=1}^{n} \sqrt{p_i q_i}\right],$
$I_{\Phi'}(p, q) = \sum_{i=1}^{n} q_i\left(\frac{1}{2} - \frac{1}{2\sqrt{p_i/q_i}}\right) = \frac{1}{2}\left[Q_n - \sum_{i=1}^{n} q_i\sqrt{\frac{q_i}{p_i}}\right],$
so that, using $q_i\sqrt{p_i/q_i} = \sqrt{p_i q_i}$,
$I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) - I_{\Phi'}(p, q) = \frac{1}{2}(P_n - Q_n) + \frac{1}{2}\sum_{i=1}^{n} q_i\left(\sqrt{\frac{q_i}{p_i}} - \sqrt{\frac{p_i}{q_i}}\right),$
and as Φ'(1) = 0 and Φ(1) = 0, then by (2.1), applied for Φ as above, we deduce (3.7). The case of equality is obvious by the strict convexity of Φ.
Consider now the Bhattacharyya distance (see for example [22])
$B(p, q) = \sum_{i=1}^{n} \sqrt{p_i q_i},$
where p, q ∈ R^n_+.
We know that for the convex mapping $\Phi(t) = -\sqrt{t}$ we have
$I_\Phi(p, q) = -\sum_{i=1}^{n} q_i\sqrt{\frac{p_i}{q_i}} = -B(p, q).$
We may state the following proposition.
Proposition 5. Let p, q ∈ R^n_+. Then we have the inequality
(3.8) $\frac{1}{2}(Q_n - P_n) \le Q_n - B(p, q) \le \frac{1}{2} \sum_{i=1}^{n} q_i\left(\sqrt{\frac{q_i}{p_i}} - \sqrt{\frac{p_i}{q_i}}\right).$
The case of equality holds iff p = q.
Proof. As $\Phi(t) = -\sqrt{t}$, t > 0, we have $\Phi'(t) = -\frac{1}{2\sqrt{t}}$ and $\Phi''(t) = \frac{1}{4\sqrt{t^3}} > 0$, t > 0, which also shows that Φ(·) is strictly convex on (0, ∞). We also have
$I_{\Phi'}\!\left(\frac{p^2}{q}, p\right) = -\sum_{i=1}^{n} p_i \frac{1}{2\sqrt{p_i/q_i}} = -\frac{1}{2}\sum_{i=1}^{n} \sqrt{p_i q_i} = -\frac{1}{2} B(p, q),$
$I_{\Phi'}(p, q) = -\frac{1}{2}\sum_{i=1}^{n} q_i \frac{1}{\sqrt{p_i/q_i}} = -\frac{1}{2}\sum_{i=1}^{n} q_i\sqrt{\frac{q_i}{p_i}},$
and as $\Phi'(1) = -\frac{1}{2}$ and Φ(1) = −1, then by (2.1), applied for the mapping Φ as defined above, we deduce (3.8).
The case of equality is obvious by the strict convexity of Φ.
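Both (3.7) and (3.8) share the same right-hand sum, which makes a joint numerical check (ours, with hypothetical variable names) straightforward:

```python
import math

p = [0.3, 1.1, 0.6]
q = [0.5, 0.8, 0.7]
Pn, Qn = sum(p), sum(q)

h2 = 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
B  = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
# common right-hand sum of (3.7) and (3.8)
S  = 0.5 * sum(qi * (math.sqrt(qi / pi) - math.sqrt(pi / qi))
               for pi, qi in zip(p, q))

assert 0.0 <= h2 <= 0.5 * (Pn - Qn) + S      # Proposition 4, (3.7)
assert 0.5 * (Qn - Pn) <= Qn - B <= S        # Proposition 5, (3.8)
```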
4. Further Bounds for the Case when Pn = Qn
The following inequality of Grüss type is well known in the literature as the
Biernacki, Pidek and Ryll-Nardzewski inequality (see for example [23]).
Lemma 1. Let $a_i, b_i$ (i = 1, ..., n) be real numbers such that
(4.1) $a \le a_i \le A$, $b \le b_i \le B$ for all i ∈ {1, ..., n}.
Then we have the inequality
(4.2) $\left|\frac{1}{n}\sum_{i=1}^{n} a_i b_i - \frac{1}{n^2}\sum_{i=1}^{n} a_i \sum_{i=1}^{n} b_i\right| \le \frac{1}{n}\left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(A - a)(B - b),$
where [x] denotes the integer part of x.
The following inequality holds.
Theorem 5. Let Φ : R+ → R+ be differentiable convex. If p, q ∈ R^n_+ are such that P_n = Q_n,
(4.3) $m \le p_i - q_i \le M$, i = 1, ..., n,
and
(4.4) $0 < r \le \frac{p_i}{q_i} \le R < \infty$, i = 1, ..., n,
then we have the inequality
(4.5) $0 \le I_\Phi(p, q) - Q_n \Phi(1) \le \left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\left(\Phi'(R) - \Phi'(r)\right).$
Proof. From (2.1) we have
(4.6) $0 \le I_\Phi(p, q) - Q_n \Phi(1) \le \sum_{i=1}^{n} (p_i - q_i)\,\Phi'\!\left(\frac{p_i}{q_i}\right).$
As the mapping Φ' is monotonic nondecreasing, we have
$\Phi'(r) \le \Phi'\!\left(\frac{p_i}{q_i}\right) \le \Phi'(R)$ for all i ∈ {1, ..., n},
and so, applying (4.2) with $a_i = p_i - q_i$ and $b_i = \Phi'\!\left(\frac{p_i}{q_i}\right)$, we obtain
(4.7) $\left|\frac{1}{n}\sum_{i=1}^{n} (p_i - q_i)\,\Phi'\!\left(\frac{p_i}{q_i}\right) - \frac{1}{n^2}\sum_{i=1}^{n} (p_i - q_i) \sum_{i=1}^{n} \Phi'\!\left(\frac{p_i}{q_i}\right)\right| \le \frac{1}{n}\left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\left(\Phi'(R) - \Phi'(r)\right).$
As $\sum_{i=1}^{n} (p_i - q_i) = 0$, multiplying (4.7) by n, we deduce by (4.6) and (4.7) the desired result (4.5).
The following inequalities for particular distances are valid.
1. If p, q ∈ R^n_+ are such that the conditions (4.3) and (4.4) hold, then we have the inequalities
(4.8) $0 \le KL(q, p) \le \left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\,\frac{R - r}{rR}$
and
(4.9) $0 \le KL(p, q) \le \left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\log\left(\frac{R}{r}\right).$
2. If p, q are as in (4.3) and (4.4), we have the inequality (α ≥ 1)
(4.10) $0 \le D_\alpha(p, q) - Q_n \le \alpha\left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\left(R^{\alpha-1} - r^{\alpha-1}\right).$
3. If p, q are as in (4.3) and (4.4), we have the inequality
(4.11) $0 \le h^2(p, q) \le \frac{1}{2}\left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\,\frac{\sqrt{R} - \sqrt{r}}{\sqrt{rR}}.$
4. Under the above assumptions for p and q, we have
(4.12) $0 \le Q_n - B(p, q) \le \frac{1}{2}\left[\frac{n}{2}\right]\left(1 - \frac{1}{n}\left[\frac{n}{2}\right]\right)(M - m)\,\frac{\sqrt{R} - \sqrt{r}}{\sqrt{rR}}.$
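The Grüss-type bounds above are easy to evaluate; the sketch below is ours, checking (4.8) and (4.9), and uses integer division for the integer part [n/2].

```python
import math

p = [0.3, 1.1, 0.6]
q = [0.5, 0.8, 0.7]
n = len(p)
assert abs(sum(p) - sum(q)) < 1e-12          # Pn = Qn, as required

m = min(pi - qi for pi, qi in zip(p, q))     # bounds (4.3)
M = max(pi - qi for pi, qi in zip(p, q))
r = min(pi / qi for pi, qi in zip(p, q))     # bounds (4.4)
R = max(pi / qi for pi, qi in zip(p, q))

c = (n // 2) * (1.0 - (n // 2) / n)          # [n/2] * (1 - [n/2]/n)

# (4.8), from Phi(t) = -log t
kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
assert 0.0 <= kl_qp <= c * (M - m) * (R - r) / (r * R)

# (4.9), from Phi(t) = t log t
kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
assert 0.0 <= kl_pq <= c * (M - m) * math.log(R / r)
```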
We shall also use the following weighted Grüss inequality.
Lemma 2. Assume that $a_i, b_i$ (i = 1, ..., n) are as in Lemma 1. If $q_i \ge 0$ with $\sum_{i=1}^{n} q_i = 1$, then we have the inequality
(4.13) $\left|\sum_{i=1}^{n} q_i a_i b_i - \sum_{i=1}^{n} q_i a_i \sum_{i=1}^{n} q_i b_i\right| \le \frac{1}{4}(A - a)(B - b).$
We may prove the following converse inequality as well.
Theorem 6. Let Φ : R+ → R+ be differentiable convex. If p, q ∈ R^n_+ are such that P_n = Q_n and
(4.14) $0 < r \le \frac{p_i}{q_i} \le R < \infty$, i = 1, ..., n,
then we have the inequality
(4.15) $0 \le I_\Phi(p, q) - Q_n \Phi(1) \le \frac{1}{4}(R - r)\left[\Phi'(R) - \Phi'(r)\right].$
Proof. From (2.1) we have
(4.16) $0 \le I_\Phi(p, q) - Q_n \Phi(1) \le \sum_{i=1}^{n} (p_i - q_i)\,\Phi'\!\left(\frac{p_i}{q_i}\right) = \sum_{i=1}^{n} q_i\left(\frac{p_i}{q_i} - 1\right)\Phi'\!\left(\frac{p_i}{q_i}\right).$
As Φ'(·) is monotonic nondecreasing, we have
$\Phi'(r) \le \Phi'\!\left(\frac{p_i}{q_i}\right) \le \Phi'(R)$ for all i ∈ {1, ..., n}.
Applying (4.13) for $a_i = \frac{p_i}{q_i} - 1$ and $b_i = \Phi'\!\left(\frac{p_i}{q_i}\right)$, we obtain
(4.17) $\left|\sum_{i=1}^{n} q_i\left(\frac{p_i}{q_i} - 1\right)\Phi'\!\left(\frac{p_i}{q_i}\right) - \sum_{i=1}^{n} q_i\left(\frac{p_i}{q_i} - 1\right)\sum_{i=1}^{n} q_i\,\Phi'\!\left(\frac{p_i}{q_i}\right)\right| \le \frac{1}{4}(R - r)\left[\Phi'(R) - \Phi'(r)\right],$
and as
$\sum_{i=1}^{n} q_i\left(\frac{p_i}{q_i} - 1\right) = 0,$
then by (4.16) and (4.17) we deduce (4.15).
The following inequalities for particular distances are valid.
1. If p, q are such that P_n = Q_n and (4.14) holds, then
(4.18) $0 \le KL(q, p) \le \frac{(R - r)^2}{4rR}$
and
(4.19) $0 \le KL(p, q) \le \frac{1}{4}(R - r)\log\left(\frac{R}{r}\right).$
2. With the same assumptions for p, q, we have
(4.20) $0 \le D_\alpha(p, q) - Q_n \le \frac{\alpha}{4}(R - r)\left(R^{\alpha-1} - r^{\alpha-1}\right) \quad (\alpha \ge 1);$
(4.21) $0 \le h^2(p, q) \le \frac{1}{8}(R - r)\,\frac{\sqrt{R} - \sqrt{r}}{\sqrt{Rr}}$
and
(4.22) $0 \le Q_n - B(p, q) \le \frac{1}{8}(R - r)\,\frac{\sqrt{R} - \sqrt{r}}{\sqrt{Rr}}.$
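Since Lemma 2 is stated for weights summing to one, the sketch below (ours) checks (4.18), (4.21) and (4.22) on probability vectors, for which Q_n = 1:

```python
import math

p = [0.2, 0.5, 0.3]                          # probability vectors, Pn = Qn = 1
q = [0.4, 0.4, 0.2]

r = min(pi / qi for pi, qi in zip(p, q))     # condition (4.14)
R = max(pi / qi for pi, qi in zip(p, q))

# (4.18)
kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
assert 0.0 <= kl_qp <= (R - r) ** 2 / (4.0 * r * R)

# (4.21)
h2 = 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
bound = (R - r) * (math.sqrt(R) - math.sqrt(r)) / (8.0 * math.sqrt(R * r))
assert 0.0 <= h2 <= bound

# (4.22); here Qn = 1
B = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
assert 0.0 <= 1.0 - B <= bound
```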
Remark 4. Any other Grüss type inequality can be used to provide different bounds for the difference
$\Delta := \sum_{i=1}^{n} (p_i - q_i)\,\Phi'\!\left(\frac{p_i}{q_i}\right).$
We omit the details.
References
[1] I. CSISZÁR, Information measures: A critical survey, Trans. 7th Prague Conf. on Info. Th.,
Statist. Decis. Funct., Random Processes and 8th European Meeting of Statist., Volume B,
Academia Prague, 1978, 73-86.
[2] I. CSISZÁR, Information-type measures of difference of probability distributions and indirect
observations, Studia Sci. Math. Hungar, 2 (1967), 299-318.
[3] I. CSISZÁR and J. KÖRNER, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[4] J.H. JUSTICE (editor), Maximum Entropy and Bayesian Methods in Applied Statistics, Cambridge University Press, Cambridge, 1986.
[5] J.N. KAPUR, On the roles of maximum entropy and minimum discrimination information
principles in Statistics, Technical Address of the 38th Annual Conference of the Indian Society
of Agricultural Statistics, 1984, 1-44.
[6] I. BURBEA and C.R. RAO, On the convexity of some divergence measures based on entropy
functions, IEEE Transactions on Information Theory, 28 (1982), 489-495.
[7] R.G. GALLAGER, Information Theory and Reliable Communications, J. Wiley, New York,
1968.
[8] C.E. SHANNON, A mathematical theory of communication, Bell Syst. Tech. J., 27 (1948), 370-423 and 623-656.
[9] B.R. FRIEDEN, Image enhancement and restoration, Picture Processing and Digital Filtering (T.S. Huang, Editor), Springer-Verlag, Berlin, 1975.
[10] R.M. LEAHY and C.E. GOUTIS, An optimal technique for constraint-based image restoration and mensuration, IEEE Trans. on Acoustics, Speech and Signal Processing, 34 (1986),
1692-1642.
[11] S. KULLBACK, Information Theory and Statistics, J. Wiley, New York, 1959.
[12] S. KULLBACK and R.A. LEIBLER, On information and sufficiency, Annals Math. Statist.,
22 (1951), 79-86.
[13] R. BERAN, Minimum Hellinger distance estimates for parametric models, Ann. Statist., 5
(1977), 445-463.
[14] A. RENYI, On measures of entropy and information, Proc. Fourth Berkeley Symp. Math.
Statist. Prob., Vol. 1, University of California Press, Berkeley, 1961.
[15] S.S. DRAGOMIR and N.M. IONESCU, Some converse of Jensen’s inequality and applications, Anal. Num. Theor. Approx., 23 (1994), 71-78.
[16] S.S. DRAGOMIR and C.J. GOH, A counterpart of Jensen’s discrete inequality for differentiable convex mappings and applications in information theory, Math. Comput. Modelling,
24 (2) (1996), 1-11.
[17] S.S. DRAGOMIR and C.J. GOH, Some counterpart inequalities for a functional associated with Jensen's inequality, J. of Ineq. & Appl., 1 (1997), 311-325.
[18] S.S. DRAGOMIR and C.J. GOH, Some bounds on entropy measures in information theory,
Appl. Math. Lett., 10 (1997), 23-28.
[19] S.S. DRAGOMIR and C.J. GOH, A counterpart of Jensen’s continuous inequality and applications in information theory, RGMIA Preprint,
http://matilda.vu.edu.au/~rgmia/InfTheory/Continuse.dvi
[20] M. MATIĆ, Jensen’s inequality and applications in Information Theory (in Croatian), Ph.D.
Thesis, Univ. of Zagreb, 1999.
[21] S.S. DRAGOMIR, J. ŠUNDE and M. SCHOLZ, Some upper bounds for relative entropy and
applications, Comp. and Math. with Appl. (in press).
[22] J.N. KAPUR, A comparative assessment of various measures of directed divergence, Advances
in Management Studies, 3 (1984), No. 1, 1-16.
[23] D.S. MITRINOVIĆ, J.E. PEČARIĆ and A.M. FINK, Classical and New Inequalities in
Analysis, Kluwer Academic Publishers, 1993.
School of Communications and Informatics, Victoria University of Technology, PO
Box 14428, Melbourne City MC 8001, Victoria, Australia.
E-mail address: [email protected]
URL: http://rgmia.vu.edu.au/SSDragomirWeb.html