Rényi Divergence Measures for Commonly Used
Univariate Continuous Distributions
M. Gil1,∗, F. Alajaji, T. Linder
Department of Mathematics and Statistics, Queen’s University, Kingston, ON, K7L 3N6
Abstract
Probabilistic ‘distances’ (also called divergences), which in some sense assess how
‘close’ two probability distributions are to one another, have been widely employed in probability, statistics, information theory, and related fields. Of particular importance due to their generality and applicability are the Rényi divergence measures. This paper presents closed-form expressions for the Rényi and
Kullback-Leibler divergences for nineteen commonly used univariate continuous
distributions as well as those for multivariate Gaussian and Dirichlet distributions. In addition, a table summarizing four of the most important information measure rates for zero-mean stationary Gaussian processes, namely Rényi
entropy, differential Shannon entropy, Rényi divergence, and Kullback-Leibler
divergence, is presented. Lastly, a connection between the Rényi divergence
and the variance of the log-likelihood ratio of two distributions is established,
thereby extending a previous result by Song [J. Stat. Plan. Infer. 93 (2001)]
on the relation between Rényi entropy and the log-likelihood function. A table with the corresponding variance expressions for the univariate distributions
considered here is also included.
Keywords: Rényi divergence, Rényi divergence rate, Kullback divergence,
probabilistic distances, divergences, continuous distributions, log-likelihood
ratio.
1. Introduction
In his 1961 work [27], Rényi introduced generalized information and divergence measures which naturally extend Shannon entropy [29] and Kullback-Leibler divergence (KLD) [19]. For a probability density $f$ on $\mathbb{R}^n$, the Rényi entropy of order $\alpha$ is defined via the integral $h_\alpha(f) := (1-\alpha)^{-1}\ln\int f(x)^\alpha\,dx$
∗ Corresponding author
Email addresses: [email protected] (M. Gil), [email protected] (F. Alajaji),
[email protected] (T. Linder)
1 Present address: McMaster University, Department of Engineering Physics, 1280 Main
Street West, John Hodgins Engineering Building, Room A315, Hamilton, Ontario, Canada
L8S 4L7, Phone: (905) 525-9140 ext 24545, Fax: (905) 527-8409.
Preprint submitted to Information Sciences
July 14, 2013
for $\alpha > 0$ and $\alpha \neq 1$. Under appropriate conditions, in the limit as $\alpha \to 1$, $h_\alpha(f)$ converges to the differential Shannon entropy $h(f) := -\int f(x)\ln f(x)\,dx = -E_f[\ln f]$. If $g$ is another probability density on $\mathbb{R}^n$, the Rényi divergence of order $\alpha$ between $f$ and $g$ is given by $D_\alpha(f\|g) := (\alpha-1)^{-1}\ln\int f(x)^\alpha\, g(x)^{1-\alpha}\,dx$ for $\alpha > 0$ and $\alpha \neq 1$. Under appropriate conditions, in the limit $\alpha \to 1$, $D_\alpha(f\|g)$ converges to the KLD between $f$ and $g$, $D(f\|g) := \int f(x)\ln(f(x)/g(x))\,dx = E_f[\ln(f/g)]$.
While a significant number of other divergence measures have since been introduced [1, 3, 9], Rényi divergences are especially important because of their generality, applicability, and the fact that they possess operational definitions in the context of hypothesis testing [7, 2] as well as coding [14]. Additional applications of Rényi divergences include the derivation of a family of test statistics for the hypothesis that the coefficients of variation of k normal populations are equal [23], their use in problems of classification, indexing and retrieval [15], and test statistics for partially observed diffusion processes [28], to name a few.
The ubiquity of Rényi divergences suggests the importance of establishing
their general mathematical properties as well as having a compilation of readily
available analytical expressions for commonly used distributions. The mathematical properties of the Rényi information measures have been studied both directly, e.g., in [27, 32, 13], and indirectly through their relation to f -divergences
[6, 21]. On the other hand, a compilation of important Rényi divergence and
KLD expressions does not seem to be currently available in the literature. Some
isolated results can be found in separate works, e.g.: Rényi divergence for Dirichlet distributions (via the Chernoff distance expression) [26]; Rényi divergence
for two multivariate Gaussian distributions [16, 15], a result that can be traced
back to [5] and [31]; Rényi divergence and KLD for a special case of univariate Pareto distributions [4]; and KLD for multivariate normal and univariate
Gamma [25, 24], as well as Dirichlet and Wishart [24].
The need for an analogous compilation for Rényi and Shannon entropies has
already been addressed in the literature. Closed-form expressions for differential
Shannon and Rényi entropies for several univariate continuous distributions are
presented in the work by Song [30]. The author also introduces an ‘intrinsic
log-likelihood-based distribution measure,’ $G_f$, derived from the Rényi entropy. Song’s work was followed by [22], where the differential Shannon and Rényi entropies, together with Song’s intrinsic measure, are presented for an additional 26 continuous univariate distribution families. The same authors then expanded
these results for several multivariate families in [34].
The main objective of this work is to extend these compilations to include
expressions for the KLD and Rényi divergence of commonly used univariate
continuous distributions. Since most of the applications revolve around two
distributions within the same family, this was the focus of the calculations as
well. The complete derivations can be found in [11], where it is also shown
that the independently derived expressions are in agreement with the aforementioned existing results in the literature. We present tables containing these
expressions for nineteen univariate continuous distributions as well as those for
Dirichlet and multivariate Gaussian distributions in Section 2.2. In Section 2.3
we provide a table summarizing the expressions for the Rényi entropy, Shannon
entropy, Rényi divergence, and KLD rates for zero-mean stationary Gaussian
sources. Finally, in Section 3, we establish a relationship between the Rényi
divergence and the variance of the log-likelihood ratio and present a table of
the corresponding variance expressions for the various univariate distributions
considered in this paper.
2. Closed-Form Expressions for Rényi and Kullback Divergences of
Continuous Distributions
We consider commonly used families of univariate continuous distributions
and present Rényi and Kullback divergence expressions between two members
of a given family. While all the expressions given in this section have been
independently derived in [11], we note that in the cases of distributions belonging
to exponential families, the results can be computed via a closed-form formula
originally obtained by Liese and Vajda [20] presented below. This result seems
to be largely unknown in the literature. For example, the works [26, 4, 25, 24] do
not reference it, while similar work by Vajda and Darbellay [8] on differential
entropy for exponential families is cited in some of the works compiling the
corresponding expressions such as [22, 34], and even in the entropy section of
[4].
Some expressions for Rényi divergences and/or Kullback divergences for
cases where the formula by Liese and Vajda is not applicable are also derived
in [11], namely the Rényi and Kullback divergence for general univariate Laplacian, general univariate Pareto, Cramér, and uniform distributions, as well as
the KLD for general univariate Gumbel and Weibull densities.
The integration calculations are carried out using standard techniques such
as substitution and the method of integration by parts, reparametrization of
some of the integrals so as to express the integrand as a known probability distribution scaled by some factor, and applying integral representations of special
functions, in particular the Gamma and related functions.
2.1. Rényi Divergence for Natural Exponential Families
In Chapter 2 of their 1987 book Convex Statistical Distances [20], Liese
and Vajda derive a closed-form expression for the Rényi divergence between
two members of a canonical exponential family. Note that their definition of
Rényi divergence, here denoted by Rα (fi ||fj ), differs by a factor of α from
the one considered in this work; that is Dα (fi ||fj ) = αRα (fi ||fj ). Also, the
authors allow the parameter α to be any real number as opposed to the more
standard definitions where $\alpha > 0$. Consider a natural exponential family of probability measures $P_\tau$ on $\mathbb{R}^n$ having densities $p_\tau(x) = [C(\tau)]^{-1}\exp\langle\tau, T(x)\rangle$, where $T: \mathbb{R}^n \to \mathbb{R}^k$, $\tau$ is a $k$-dimensional parameter vector, and $\langle\cdot,\cdot\rangle$ denotes the standard inner product in $\mathbb{R}^k$. Denote the natural parameter space by $\Theta$. Then $R_\alpha(f_i\|f_j)$ is given by the following cases:
• If $\alpha \notin \{0,1\}$ and $\alpha\tau_i + (1-\alpha)\tau_j \in \Theta$,
$$R_\alpha(f_i\|f_j) = \frac{1}{\alpha(\alpha-1)}\ln\frac{C(\alpha\tau_i + (1-\alpha)\tau_j)}{C(\tau_i)^{\alpha}\, C(\tau_j)^{1-\alpha}}\,;$$
• If $\alpha \notin \{0,1\}$ and $\alpha\tau_i + (1-\alpha)\tau_j \notin \Theta$,
$$R_\alpha(f_i\|f_j) = +\infty\,;$$
• If $\alpha = 0$,
$$R_\alpha(f_i\|f_j) = \Delta(\tau_i, \tau_j)\,;$$
• If $\alpha = 1$,
$$R_\alpha(f_i\|f_j) = \Delta(\tau_j, \tau_i)\,;$$
with $\Delta(\tau_i, \tau_j) := \lim_{\alpha\downarrow 0}\alpha^{-1}\left[\alpha D(\tau_i) + (1-\alpha)D(\tau_j) - D(\alpha\tau_i + (1-\alpha)\tau_j)\right]$, and $D(\tau) = \ln C(\tau)$.
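As an illustration of how this formula can be evaluated numerically, the following Python sketch (the function names and parameter values are ours, not from the paper) applies it to two exponential densities, for which the natural parameter is $\tau = -\lambda$ with $T(x) = x$ and $C(\tau) = -1/\tau$, and compares $D_\alpha = \alpha R_\alpha$ with the independently derived exponential entry of Table 2; it assumes $\alpha\tau_i + (1-\alpha)\tau_j \in \Theta$.

```python
import numpy as np

def renyi_divergence_nef(C, tau_i, tau_j, alpha):
    """Liese-Vajda formula for a natural exponential family with normalizer C(tau):
    R_alpha = ln[ C(a*tau_i + (1-a)*tau_j) / (C(tau_i)^a * C(tau_j)^(1-a)) ] / (a*(a-1)),
    and the paper's D_alpha equals alpha * R_alpha."""
    tau_mix = alpha * tau_i + (1.0 - alpha) * tau_j
    R = np.log(C(tau_mix) / (C(tau_i) ** alpha * C(tau_j) ** (1.0 - alpha)))
    R /= alpha * (alpha - 1.0)
    return alpha * R  # D_alpha

# Exponential(lam): density lam*exp(-lam*x), natural parameter tau = -lam,
# T(x) = x, and C(tau) = integral of exp(tau*x) over x > 0 = -1/tau.
C_exp = lambda tau: -1.0 / tau

lam_i, lam_j, alpha = 2.0, 0.5, 1.7
d_formula = renyi_divergence_nef(C_exp, -lam_i, -lam_j, alpha)

# Closed form from Table 2: ln(lam_i/lam_j) + ln(lam_i/lam_alpha)/(alpha - 1)
lam_alpha = alpha * lam_i + (1.0 - alpha) * lam_j
d_table = np.log(lam_i / lam_j) + np.log(lam_i / lam_alpha) / (alpha - 1.0)

print(d_formula, d_table)  # the two values should agree
```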
2.2. Tables
The results presented in the following tables are derived in [11], except for
the derivation of the KLD for Dirichlet distributions, which can be found in [24].
A few of these results can also be found in the references mentioned in Section 1.
Table 2 and Table 3 present the expressions for Rényi and Kullback divergences,
respectively. We follow the convention 0 ln 0 = 0, which is justified by continuity.
The densities associated with the distributions are given in Table 1.
The table of Rényi divergences includes a finiteness constraint for which the
given expression is valid. For all other cases (and α > 0), Dα (fi ||fj ) = ∞. In
the cases where the closed-form expression is a piecewise-defined function, the
conditions for each case are presented alongside the corresponding formula, and
it is implied that for all other cases Dα (fi ||fj ) = ∞. The expressions for the
Rényi divergence of Laplace and Cramér distributions are still continuous at
α = λi /(λi + λj ) (where λ is the scale parameter in the Laplace distribution)
and α = 1/2, respectively (this can be easily verified using l’Hospital’s rule).
One important property of Rényi divergence is that Dα (T (X)||T (Y )) =
Dα (X||Y ) for any invertible transformation T . This follows from the more
general data processing inequality (see, e.g., [20, 21]). For example, the Rényi
divergence between two lognormal densities is the same as that between two
normal densities. Also, applying the appropriate T and reparametrizing, one can
find similar relationships between other distributions, such as between Weibull
and Gumbel, Gamma and Chi (or the relevant special cases), and the half-normal and Lévy under equal supports.
Note that $\Gamma(x)$, $\psi(x)$, and $B(\cdot)$ denote the Gamma, digamma, and multivariate Beta functions, respectively. Also, $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. Lastly, note the direction of the divergence: $i \mapsto j$.
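Before turning to the tables, a simple way to spot-check any of the entries is to compare the closed form against direct numerical integration of $\int f_i^\alpha f_j^{1-\alpha}$. The Python sketch below does this for the univariate Gaussian entry of Table 2; the parameter values and function names are illustrative.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def renyi_gauss(mu_i, s_i, mu_j, s_j, alpha):
    """Closed form from Table 2 for two univariate Gaussians (valid when s2_star > 0)."""
    s2_star = alpha * s_j**2 + (1.0 - alpha) * s_i**2
    return (np.log(s_j / s_i)
            + np.log(s_j**2 / s2_star) / (2.0 * (alpha - 1.0))
            + 0.5 * alpha * (mu_i - mu_j)**2 / s2_star)

def renyi_numeric(pdf_i, pdf_j, alpha, lo=-50.0, hi=50.0):
    """Direct numerical evaluation of D_alpha = ln(int f_i^a f_j^(1-a)) / (a - 1)."""
    integrand = lambda x: pdf_i(x)**alpha * pdf_j(x)**(1.0 - alpha)
    val, _ = integrate.quad(integrand, lo, hi)
    return np.log(val) / (alpha - 1.0)

mu_i, s_i, mu_j, s_j, alpha = 0.0, 1.0, 1.5, 2.0, 0.6
f_i, f_j = norm(mu_i, s_i).pdf, norm(mu_j, s_j).pdf
print(renyi_gauss(mu_i, s_i, mu_j, s_j, alpha))  # closed form
print(renyi_numeric(f_i, f_j, alpha))            # numerical check
```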
Table 1: Common Continuous Distributions (each entry lists the density followed by the parameter and support restrictions)

Beta: $\dfrac{x^{a-1}(1-x)^{b-1}}{B(a,b)}$; $a, b > 0$; $x \in (0,1)$
Chi: $\dfrac{2^{1-k/2}\, x^{k-1} e^{-x^2/(2\sigma^2)}}{\sigma^k\,\Gamma(k/2)}$; $\sigma > 0$, $k \in \mathbb{N}$; $x > 0$
$\chi^2$: $\dfrac{x^{d/2-1} e^{-x/2}}{2^{d/2}\,\Gamma(d/2)}$; $d \in \mathbb{N}$; $x > 0$
Cramér: $\dfrac{\theta}{2(1+\theta|x|)^2}$; $\theta > 0$; $x \in \mathbb{R}$
Dirichlet: $\dfrac{1}{B(\mathbf{a})}\prod_{k=1}^{d} x_k^{a_k-1}$; $\mathbf{a} \in \mathbb{R}^d$, $a_k > 0$, $d \ge 2$; $x \in \mathbb{R}^d$, $x_k \ge 0$, $\sum_{k=1}^{d} x_k = 1$
Exponential: $\lambda e^{-\lambda x}$; $\lambda > 0$; $x > 0$
Gamma: $\dfrac{x^{k-1} e^{-x/\theta}}{\theta^k\,\Gamma(k)}$; $\theta > 0$, $k > 0$; $x > 0$
Multivariate Gaussian: $\dfrac{e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}$; $\mu \in \mathbb{R}^n$, $\Sigma$ symmetric positive definite; $x \in \mathbb{R}^n$
Univariate Gaussian: $\dfrac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$; $\sigma > 0$, $\mu \in \mathbb{R}$; $x \in \mathbb{R}$
Special Bivariate Gaussian: $\dfrac{e^{-\frac{1}{2}x'\Phi^{-1}x}}{2\pi(1-\rho^2)^{1/2}}$; $\rho \in (-1,1)$, $\Phi = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$; $x \in \mathbb{R}^2$
Gumbel: $\dfrac{e^{-(x-\mu)/\beta}\, e^{-e^{-(x-\mu)/\beta}}}{\beta}$; $\mu \in \mathbb{R}$, $\beta > 0$; $x \in \mathbb{R}$
Half-Normal: $\sqrt{\dfrac{2}{\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}$; $\sigma > 0$; $x > 0$
Inverse Gaussian: $\left(\dfrac{\lambda}{2\pi x^3}\right)^{1/2}\exp\!\left(\dfrac{-\lambda(x-\mu)^2}{2\mu^2 x}\right)$; $\lambda > 0$, $\mu > 0$; $x > 0$
Laplace: $\dfrac{1}{2\lambda}\, e^{-|x-\theta|/\lambda}$; $\lambda > 0$, $\theta \in \mathbb{R}$; $x \in \mathbb{R}$
Lévy: $\sqrt{\dfrac{c}{2\pi}}\,\dfrac{e^{-\frac{c}{2(x-\mu)}}}{(x-\mu)^{3/2}}$; $c > 0$, $\mu \in \mathbb{R}$; $x > \mu$
Log-normal: $\dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$; $\sigma > 0$, $\mu \in \mathbb{R}$; $x > 0$
Maxwell-Boltzmann: $\sqrt{\dfrac{2}{\pi}}\,\dfrac{x^2 e^{-x^2/(2\sigma^2)}}{\sigma^3}$; $\sigma > 0$; $x > 0$
Pareto: $a\, m^a\, x^{-(a+1)}$; $a, m > 0$; $x > m$
Rayleigh: $\dfrac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}$; $\sigma > 0$; $x > 0$
Uniform: $\dfrac{1}{b-a}$; $a < x < b$
Weibull: $k\lambda^{-k} x^{k-1} e^{-(x/\lambda)^k}$; $k, \lambda > 0$; $x > 0$
Table 2: Rényi Divergences for Common Continuous Distributions (each entry gives $D_\alpha(f_i\|f_j)$ together with the finiteness condition under which the expression is valid)

Beta: $\ln\dfrac{B(a_j,b_j)}{B(a_i,b_i)} + \dfrac{1}{\alpha-1}\ln\dfrac{B(a_\alpha,b_\alpha)}{B(a_i,b_i)}$, with $a_\alpha = \alpha a_i + (1-\alpha)a_j$ and $b_\alpha = \alpha b_i + (1-\alpha)b_j$; finite for $a_\alpha, b_\alpha > 0$.

Chi: $\ln\dfrac{\sigma_j^{k_j}\,\Gamma(k_j/2)}{\sigma_i^{k_i}\,\Gamma(k_i/2)} + \dfrac{1}{\alpha-1}\ln\!\left[\dfrac{\Gamma(k_\alpha/2)}{\sigma_i^{k_i}\,\Gamma(k_i/2)}\left(\dfrac{\sigma_i^2\sigma_j^2}{(\sigma^2)^*_\alpha}\right)^{k_\alpha/2}\right]$, with $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$ and $k_\alpha = \alpha k_i + (1-\alpha)k_j$; finite for $(\sigma^2)^*_\alpha > 0$ and $k_\alpha > 0$.

$\chi^2$: $\ln\dfrac{\Gamma(d_j/2)}{\Gamma(d_i/2)} + \dfrac{1}{\alpha-1}\ln\dfrac{\Gamma(d_\alpha/2)}{\Gamma(d_i/2)}$, with $d_\alpha = \alpha d_i + (1-\alpha)d_j$; finite for $d_\alpha > 0$.

Cramér: for $\alpha = 1/2$, $\ln\dfrac{\theta_i}{\theta_j} + 2\ln\dfrac{\theta_i-\theta_j}{\theta_i\ln(\theta_i/\theta_j)}$; for $\alpha \neq 1/2$, $\ln\dfrac{\theta_i}{\theta_j} + \dfrac{1}{\alpha-1}\ln\dfrac{\theta_i\left(1-(\theta_j/\theta_i)^{2\alpha-1}\right)}{(\theta_i-\theta_j)(2\alpha-1)}$.

Dirichlet: $\ln\dfrac{B(\mathbf{a}_j)}{B(\mathbf{a}_i)} + \dfrac{1}{\alpha-1}\ln\dfrac{B(\mathbf{a}_\alpha)}{B(\mathbf{a}_i)}$, with $\mathbf{a}_\alpha = \alpha\mathbf{a}_i + (1-\alpha)\mathbf{a}_j$; finite for $a_{\alpha k} > 0$ for all $k$.

Exponential: $\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{1}{\alpha-1}\ln\dfrac{\lambda_i}{\lambda_\alpha}$, with $\lambda_\alpha = \alpha\lambda_i + (1-\alpha)\lambda_j$; finite for $\lambda_\alpha > 0$.

Gamma: $\ln\dfrac{\Gamma(k_j)\,\theta_j^{k_j}}{\Gamma(k_i)\,\theta_i^{k_i}} + \dfrac{1}{\alpha-1}\ln\!\left[\dfrac{\Gamma(k_\alpha)}{\theta_i^{k_i}\,\Gamma(k_i)}\left(\dfrac{\theta_i\theta_j}{\theta^*_\alpha}\right)^{k_\alpha}\right]$, with $\theta^*_\alpha = \alpha\theta_j + (1-\alpha)\theta_i$ and $k_\alpha = \alpha k_i + (1-\alpha)k_j$; finite for $\theta^*_\alpha > 0$ and $k_\alpha > 0$.

Multivariate Gaussian: $\dfrac{\alpha}{2}\,(\mu_i-\mu_j)'\left((\Sigma_\alpha)^*\right)^{-1}(\mu_i-\mu_j) - \dfrac{1}{2(\alpha-1)}\ln\dfrac{|(\Sigma_\alpha)^*|}{|\Sigma_i|^{1-\alpha}\,|\Sigma_j|^{\alpha}}$, with $(\Sigma_\alpha)^* = \alpha\Sigma_j + (1-\alpha)\Sigma_i$; finite when $\alpha\Sigma_i^{-1} + (1-\alpha)\Sigma_j^{-1}$ is positive definite.

Univariate Gaussian: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha} + \dfrac{1}{2}\,\dfrac{\alpha(\mu_i-\mu_j)^2}{(\sigma^2)^*_\alpha}$, with $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$; finite for $(\sigma^2)^*_\alpha > 0$.

Special Bivariate Gaussian: $\dfrac{1}{2}\ln\dfrac{1-\rho_j^2}{1-\rho_i^2} - \dfrac{1}{2(\alpha-1)}\ln\dfrac{1-(\rho^*_\alpha)^2}{1-\rho_j^2}$, with $\rho^*_\alpha = \alpha\rho_j + (1-\alpha)\rho_i$; finite when $\alpha\Phi_i^{-1} + (1-\alpha)\Phi_j^{-1}$ is positive definite.

Gumbel, Fixed Scale ($\beta_i = \beta_j = \beta$): $\dfrac{\mu_i-\mu_j}{\beta} + \dfrac{1}{\alpha-1}\ln\dfrac{e^{\mu_i/\beta}}{\left(e^{\mu_i/\beta}\right)_\alpha}$, with $\left(e^{\mu_i/\beta}\right)_\alpha = \alpha e^{\mu_i/\beta} + (1-\alpha)e^{\mu_j/\beta}$; finite for $\left(e^{\mu_i/\beta}\right)_\alpha > 0$.

Half-Normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{\alpha-1}\ln\!\left(\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha}\right)^{1/2}$, with $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$; finite for $(\sigma^2)^*_\alpha > 0$.

Inverse Gaussian: $\dfrac{1}{2}\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{\lambda_i}{\lambda_\alpha} + \dfrac{1}{\alpha-1}\left\{\left(\dfrac{\lambda}{\mu}\right)_\alpha - \left[\left(\dfrac{\lambda}{\mu^2}\right)_\alpha \lambda_\alpha\right]^{1/2}\right\}$, with $\left(\dfrac{\lambda}{\mu^2}\right)_\alpha = \alpha\dfrac{\lambda_i}{\mu_i^2} + (1-\alpha)\dfrac{\lambda_j}{\mu_j^2}$, $\left(\dfrac{\lambda}{\mu}\right)_\alpha = \alpha\dfrac{\lambda_i}{\mu_i} + (1-\alpha)\dfrac{\lambda_j}{\mu_j}$, and $\lambda_\alpha = \alpha\lambda_i + (1-\alpha)\lambda_j$; finite for $\left(\dfrac{\lambda}{\mu^2}\right)_\alpha \ge 0$ and $\lambda_\alpha > 0$.

Laplace: for $\alpha = \lambda_i/(\lambda_i+\lambda_j)$, $\dfrac{|\theta_i-\theta_j|}{\lambda_j} + \ln\dfrac{2\lambda_j}{\lambda_i+\lambda_j+|\theta_i-\theta_j|} + \dfrac{\lambda_i}{\lambda_j}\ln\dfrac{2\lambda_i}{\lambda_i+\lambda_j+|\theta_i-\theta_j|}$; for $\alpha \neq \lambda_i/(\lambda_i+\lambda_j)$ and $\alpha\lambda_j + (1-\alpha)\lambda_i > 0$, $\ln\dfrac{\lambda_j}{\lambda_i} + \dfrac{1}{\alpha-1}\ln\dfrac{\lambda_i\lambda_j^2\, g(\alpha)}{\alpha^2\lambda_j^2 - (1-\alpha)^2\lambda_i^2}$, where $g(\alpha) = \dfrac{\alpha}{\lambda_i}\exp\!\left(\dfrac{-(1-\alpha)|\theta_i-\theta_j|}{\lambda_j}\right) - \dfrac{1-\alpha}{\lambda_j}\exp\!\left(\dfrac{-\alpha|\theta_i-\theta_j|}{\lambda_i}\right)$.

Lévy, Equal Supports ($\mu_i = \mu_j$): $\dfrac{1}{2}\ln\dfrac{c_i}{c_j} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{c_i}{c_\alpha}$, with $c_\alpha = \alpha c_i + (1-\alpha)c_j$; finite for $c_\alpha > 0$.

Log-normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha} + \dfrac{1}{2}\,\dfrac{\alpha(\mu_i-\mu_j)^2}{(\sigma^2)^*_\alpha}$, with $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$; finite for $(\sigma^2)^*_\alpha > 0$.

Maxwell-Boltzmann: $3\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{\alpha-1}\ln\!\left(\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha}\right)^{3/2}$, with $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$; finite for $(\sigma^2)^*_\alpha > 0$.

Pareto: for $\alpha \in (0,1)$, $\ln\dfrac{m_i^{a_i}}{m_j^{a_j}} + \ln\dfrac{a_i}{a_j} + \dfrac{1}{\alpha-1}\ln\dfrac{a_i\, m_i^{a_i}}{a_\alpha\, M^{a_\alpha}}$, with $M = \max\{m_i, m_j\}$ and $a_\alpha = \alpha a_i + (1-\alpha)a_j$; for $\alpha > 1$, $m_i \ge m_j$, and $a_\alpha > 0$, $\ln\!\left(\dfrac{m_i}{m_j}\right)^{a_j} + \ln\dfrac{a_i}{a_j} + \dfrac{1}{\alpha-1}\ln\dfrac{a_i}{a_\alpha}$.

Rayleigh: $2\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{\alpha-1}\ln\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha}$, with $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$; finite for $(\sigma^2)^*_\alpha > 0$.

Uniform: for $\alpha \in (0,1)$ and $b_m = \min\{b_i,b_j\} > a_M = \max\{a_i,a_j\}$, $\ln\dfrac{b_j-a_j}{b_i-a_i} + \dfrac{1}{\alpha-1}\ln\dfrac{b_m-a_M}{b_i-a_i}$; for $\alpha > 1$ and $(a_i,b_i) \subset (a_j,b_j)$, $\ln\dfrac{b_j-a_j}{b_i-a_i}$.

Weibull, Fixed Shape ($k_i = k_j = k$): $\ln\dfrac{\lambda_j^k}{\lambda_i^k} + \dfrac{1}{\alpha-1}\ln\dfrac{\lambda_i^k}{(\lambda^k)^*_\alpha}$, with $(\lambda^k)^*_\alpha = \alpha\lambda_j^k + (1-\alpha)\lambda_i^k$; finite for $(\lambda^k)^*_\alpha > 0$.
Table 3: Kullback Divergences for Common Continuous Distributions (each entry gives $D(f_i\|f_j)$)

Beta: $\ln\dfrac{B(a_j,b_j)}{B(a_i,b_i)} + (a_i-a_j)\,\psi(a_i) + (b_i-b_j)\,\psi(b_i) + \left[a_j+b_j-(a_i+b_i)\right]\psi(a_i+b_i)$

Chi: $\dfrac{\psi(k_i/2)}{2}(k_i-k_j) + \ln\dfrac{\sigma_j^{k_j}\,\Gamma(k_j/2)}{\sigma_i^{k_j}\,\Gamma(k_i/2)} + \dfrac{k_i\left(\sigma_i^2-\sigma_j^2\right)}{2\sigma_j^2}$

$\chi^2$: $\ln\dfrac{\Gamma(d_j/2)}{\Gamma(d_i/2)} + \dfrac{d_i-d_j}{2}\,\psi(d_i/2)$

Cramér: $\dfrac{\theta_i+\theta_j}{\theta_i-\theta_j}\ln\dfrac{\theta_i}{\theta_j} - 2$

Dirichlet: $\ln\dfrac{B(\mathbf{a}_j)}{B(\mathbf{a}_i)} + \displaystyle\sum_{k=1}^{d}\left(a_{ik}-a_{jk}\right)\left[\psi(a_{ik}) - \psi\!\left(\textstyle\sum_{k=1}^{d} a_{ik}\right)\right]$

Exponential: $\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{\lambda_j-\lambda_i}{\lambda_i}$

Gamma: $\ln\dfrac{\Gamma(k_j)\,\theta_j^{k_j}}{\Gamma(k_i)\,\theta_i^{k_i}} + (k_i-k_j)\left(\ln\theta_i + \psi(k_i)\right) + \dfrac{k_i\left(\theta_i-\theta_j\right)}{\theta_j}$

Multivariate Gaussian: $\dfrac{1}{2}\left[\ln\dfrac{|\Sigma_j|}{|\Sigma_i|} + \mathrm{tr}\!\left(\Sigma_j^{-1}\Sigma_i\right) + \left(\mu_i-\mu_j\right)'\Sigma_j^{-1}\left(\mu_i-\mu_j\right) - n\right]$

Univariate Gaussian: $\dfrac{1}{2\sigma_j^2}\left[(\mu_i-\mu_j)^2 + \sigma_i^2 - \sigma_j^2\right] + \ln\dfrac{\sigma_j}{\sigma_i}$

Special Bivariate Gaussian: $\dfrac{1}{2}\ln\dfrac{1-\rho_j^2}{1-\rho_i^2} + \dfrac{\rho_j^2-\rho_j\rho_i}{1-\rho_j^2}$

General Gumbel: $\ln\dfrac{\beta_j}{\beta_i} + \gamma\left(\dfrac{\beta_i}{\beta_j}-1\right) + \dfrac{\mu_i-\mu_j}{\beta_j} + e^{(\mu_j-\mu_i)/\beta_j}\,\Gamma\!\left(\dfrac{\beta_i}{\beta_j}+1\right) - 1$

Half-Normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{\sigma_i^2-\sigma_j^2}{2\sigma_j^2}$

Inverse Gaussian: $\dfrac{1}{2}\left[\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{\lambda_j}{\lambda_i} + \dfrac{\lambda_j(\mu_i-\mu_j)^2}{\mu_i\mu_j^2} - 1\right]$

Laplace: $\ln\dfrac{\lambda_j}{\lambda_i} + \dfrac{|\theta_i-\theta_j|}{\lambda_j} + \dfrac{\lambda_i}{\lambda_j}\exp\!\left(-|\theta_i-\theta_j|/\lambda_i\right) - 1$

Lévy, Equal Supports ($\mu_i = \mu_j$): $\dfrac{1}{2}\ln\dfrac{c_i}{c_j} + \dfrac{c_j-c_i}{2c_i}$

Log-normal: $\dfrac{1}{2\sigma_j^2}\left[(\mu_i-\mu_j)^2 + \sigma_i^2 - \sigma_j^2\right] + \ln\dfrac{\sigma_j}{\sigma_i}$

Maxwell-Boltzmann: $3\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{3\left(\sigma_i^2-\sigma_j^2\right)}{2\sigma_j^2}$

Pareto: $\ln\!\left(\dfrac{m_i}{m_j}\right)^{a_j} + \ln\dfrac{a_i}{a_j} + \dfrac{a_j-a_i}{a_i}$, for $m_i \ge m_j$ and $\infty$ otherwise

Rayleigh: $2\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{\sigma_i^2-\sigma_j^2}{\sigma_j^2}$

Uniform: $\ln\dfrac{b_j-a_j}{b_i-a_i}$, for $(a_i,b_i) \subseteq (a_j,b_j)$ and $\infty$ otherwise

General Weibull: $\ln\dfrac{k_i\,\lambda_j^{k_j}}{k_j\,\lambda_i^{k_j}} + \gamma\,\dfrac{k_j-k_i}{k_i} + \left(\dfrac{\lambda_i}{\lambda_j}\right)^{k_j}\Gamma\!\left(1+\dfrac{k_j}{k_i}\right) - 1$
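The matrix-valued entries translate directly into a few lines of linear algebra. The following Python sketch (helper names and parameter values are ours, not from the paper) evaluates the multivariate Gaussian rows of Tables 2 and 3 and illustrates that $D_\alpha$ approaches the KLD as $\alpha \to 1$.

```python
import numpy as np

def renyi_gauss_mv(mu_i, cov_i, mu_j, cov_j, alpha):
    """Multivariate Gaussian entry of Table 2, assuming the mixed covariance
    S* = alpha*cov_j + (1-alpha)*cov_i is positive definite."""
    dm = mu_i - mu_j
    s_star = alpha * cov_j + (1.0 - alpha) * cov_i
    quad = 0.5 * alpha * dm @ np.linalg.solve(s_star, dm)
    log_term = (np.log(np.linalg.det(s_star))
                - (1.0 - alpha) * np.log(np.linalg.det(cov_i))
                - alpha * np.log(np.linalg.det(cov_j)))
    return quad - log_term / (2.0 * (alpha - 1.0))

def kld_gauss_mv(mu_i, cov_i, mu_j, cov_j):
    """Multivariate Gaussian entry of Table 3."""
    n = len(mu_i)
    dm = mu_i - mu_j
    cov_j_inv = np.linalg.inv(cov_j)
    return 0.5 * (np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i))
                  + np.trace(cov_j_inv @ cov_i)
                  + dm @ cov_j_inv @ dm - n)

mu_i, cov_i = np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 1.0]])
mu_j, cov_j = np.array([0.5, -0.2]), np.array([[2.0, 0.1], [0.1, 1.5]])
# D_alpha should approach the KLD as alpha -> 1.
print(renyi_gauss_mv(mu_i, cov_i, mu_j, cov_j, 0.999))
print(kld_gauss_mv(mu_i, cov_i, mu_j, cov_j))
```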
2.3. Information Rates for Stationary Gaussian Processes
It is also of interest to extend divergence measures to stochastic processes,
in which case one considers the rate of the given divergence measure. Given two
real-valued processes $X = \{X_n\}_{n\in\mathbb{N}}$ and $Y = \{Y_n\}_{n\in\mathbb{N}}$, the expression for the Rényi divergence rate between $X$ and $Y$ is $D_\alpha(X\|Y) := \lim_{n\to\infty}(1/n)\,D_\alpha(f_{X^n}\|f_{Y^n})$, whenever the limit exists, and where $f_{X^n}$ and $f_{Y^n}$ are the densities of $X^n = (X_1, \ldots, X_n)$ and $Y^n = (Y_1, \ldots, Y_n)$, respectively. The KLD, Rényi entropy, and differential Shannon entropy rates are similarly defined. In Table 4 we summarize these expressions for stationary zero-mean Gaussian processes with the appropriate reference. Note that $\varphi(\lambda)$ and $\psi(\lambda)$ are the power spectral densities of $X$ and $Y$ on $[-\pi,\pi]$, respectively.
Table 4: Information Rates for Stationary Zero-Mean Gaussian Processes

Differential Entropy: $\dfrac{1}{2}\ln(2\pi e) + \dfrac{1}{4\pi}\displaystyle\int_{-\pi}^{\pi}\ln\varphi(\lambda)\,d\lambda$ [18, 17]. Conditions: $\ln\varphi(\lambda) \in L^1[-\pi,\pi]$; also valid for nonzero-mean stationary Gaussian processes.

Rényi Entropy: $\dfrac{1}{2}\ln\alpha^{\frac{1}{\alpha-1}} + \dfrac{1}{4\pi}\displaystyle\int_{-\pi}^{\pi}\ln\!\left(2\pi\varphi(\lambda)\right)d\lambda$ [12]. Conditions: $\ln\varphi(\lambda) \in L^1[-\pi,\pi]$; also valid for nonzero-mean stationary Gaussian processes.

KLD, $D(X\|Y)$: $\dfrac{1}{4\pi}\displaystyle\int_{-\pi}^{\pi}\left[\dfrac{\varphi(\lambda)}{\psi(\lambda)} - 1 - \ln\dfrac{\varphi(\lambda)}{\psi(\lambda)}\right]d\lambda$ [17]. Conditions: $\varphi(\lambda)/\psi(\lambda)$ is bounded, or $\psi(\lambda) > a > 0$ for all $\lambda \in [-\pi,\pi]$ and $\varphi \in L^2[-\pi,\pi]$.

Rényi Divergence, $D_\alpha(X\|Y)$: $\dfrac{1}{4\pi(1-\alpha)}\displaystyle\int_{-\pi}^{\pi}\ln\dfrac{h(\lambda)}{\varphi(\lambda)^{1-\alpha}\,\psi(\lambda)^{\alpha}}\,d\lambda$ [31], where $h(\lambda) := \alpha\psi(\lambda) + (1-\alpha)\varphi(\lambda)$. Conditions: $\psi(\lambda)$ and $\varphi(\lambda)$ are essentially bounded and essentially bounded away from zero, and $\alpha \in (0,1)$.
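As an illustration of the Rényi divergence rate entry in Table 4, the following Python sketch evaluates the spectral integral numerically for two Gaussian AR(1) sources; the AR(1) spectra and the parameter values are our own illustrative assumptions.

```python
import numpy as np
from scipy import integrate

def ar1_spectrum(a, s2):
    """Power spectral density of the AR(1) process X_n = a*X_{n-1} + W_n,
    W_n ~ N(0, s2), evaluated on [-pi, pi]."""
    return lambda lam: s2 / (1.0 - 2.0 * a * np.cos(lam) + a**2)

def renyi_divergence_rate(phi, psi, alpha):
    """Renyi divergence rate between zero-mean stationary Gaussian processes
    with spectra phi (for X) and psi (for Y), alpha in (0,1) (Table 4, [31])."""
    def integrand(lam):
        h = alpha * psi(lam) + (1.0 - alpha) * phi(lam)
        return np.log(h / (phi(lam)**(1.0 - alpha) * psi(lam)**alpha))
    val, _ = integrate.quad(integrand, -np.pi, np.pi)
    return val / (4.0 * np.pi * (1.0 - alpha))

phi = ar1_spectrum(0.5, 1.0)   # process X
psi = ar1_spectrum(-0.2, 2.0)  # process Y
print(renyi_divergence_rate(phi, psi, 0.5))
```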
3. Rényi Divergence and the Log-likelihood Ratio
Song [30] pointed out a relationship between the derivative of the Rényi
entropy with respect to the parameter α and the variance of the log-likelihood
function of a distribution. Let f be a probability density, then
$$\lim_{\alpha\to 1}\frac{d}{d\alpha}h_\alpha(f) = -\frac{1}{2}\,\mathrm{Var}(\ln f(X))\,,$$
assuming the integrals involved are well-defined and differentiation operations
are legitimate. Extending Song's idea, we note that the variance of the log-likelihood ratio (LLR) between two densities can be similarly derived from an
analytic formula of their Rényi divergence of order α. Let fi and fj be two
probability densities such that Dα (fi || fj ) is n times continuously differentiable
with respect to α (n ≥ 2). Assuming differentiation and integration can be
interchanged, one obtains
$$\lim_{\alpha\to 1}\frac{d}{d\alpha}D_\alpha(f_i\|f_j) = \frac{1}{2}\,\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right).$$
Considering the integral $G(\alpha) := \int f_i(x)^\alpha f_j(x)^{1-\alpha}\,dx$ for $\alpha > 0$, $\alpha \neq 1$, we have
$$\lim_{\alpha\to 1}\frac{d^n}{d\alpha^n}G(\alpha) = E_{f_i}\!\left[\left(\ln\frac{f_i(X)}{f_j(X)}\right)^{\!n}\right].$$
Hence, from the definition of Dα (fi ||fj ), we find
$$\lim_{\alpha\to 1}\frac{d}{d\alpha}D_\alpha(f_i\|f_j) = \lim_{\alpha\to 1}\frac{1}{(\alpha-1)^2}\left[(\alpha-1)\,G(\alpha)^{-1}\frac{dG(\alpha)}{d\alpha} - \ln G(\alpha)\right]$$
$$= \frac{1}{2}\left(-\left(E_{f_i}\!\left[\ln\frac{f_i(X)}{f_j(X)}\right]\right)^{\!2} + E_{f_i}\!\left[\left(\ln\frac{f_i(X)}{f_j(X)}\right)^{\!2}\right]\right),$$
where we used l’Hospital’s rule to evaluate the limit. This connection between
the Rényi divergence and the LLR variance becomes practically useful in light
of the Rényi divergence expressions presented in Section 2.2. Table 5 presents
these quantities for the continuous univariate distributions considered in this
paper. Note that $\psi^{(1)}(x)$ refers to the polygamma function of order 1.
Table 5: Var(LLR) for Common Continuous Univariate Distributions (each entry gives $\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right)$)

Beta: $(a_i-a_j)^2\,\psi^{(1)}(a_i) + (b_i-b_j)^2\,\psi^{(1)}(b_i) - (a_i-a_j+b_i-b_j)^2\,\psi^{(1)}(a_i+b_i)$

Chi: $\dfrac{2\left(\sigma_i^2-\sigma_j^2\right)\left[k_i\left(\sigma_i^2+\sigma_j^2\right) - 2k_j\sigma_j^2\right] + \sigma_j^4\,(k_i-k_j)^2\,\psi^{(1)}\!\left(\frac{k_i}{2}\right)}{4\sigma_j^4}$

$\chi^2$: $\dfrac{1}{4}\,(d_i-d_j)^2\,\psi^{(1)}\!\left(\frac{d_i}{2}\right)$

Cramér: $\dfrac{4\left[(\theta_i-\theta_j)^2 - \theta_i\theta_j\ln^2(\theta_i/\theta_j)\right]}{(\theta_i-\theta_j)^2}$

Exponential: $\dfrac{(\lambda_j-\lambda_i)^2}{\lambda_i^2}$

Gamma: $\dfrac{(\theta_i-\theta_j)\left[k_i(\theta_i+\theta_j) - 2k_j\theta_j\right] + \theta_j^2\,(k_i-k_j)^2\,\psi^{(1)}(k_i)}{\theta_j^2}$

Univariate Gaussian: $\dfrac{2\sigma_i^2(\mu_i-\mu_j)^2 + \left(\sigma_i^2-\sigma_j^2\right)^2}{2\sigma_j^4}$

Gumbel, Fixed Scale ($\beta_i = \beta_j = \beta$): $e^{-2\mu_i/\beta}\left(e^{\mu_i/\beta} - e^{\mu_j/\beta}\right)^2$

Half-Normal: $\dfrac{\left(\sigma_j^2-\sigma_i^2\right)^2}{2\sigma_j^4}$

Inverse Gaussian: $\dfrac{\lambda_i\lambda_j^2\,\mu_i^4 - 2\lambda_i\lambda_j^2\,\mu_i^2\mu_j^2 + \mu_j^4\left[\lambda_i\lambda_j^2 + 2\mu_i(\lambda_i-\lambda_j)^2\right]}{4\lambda_i^2\,\mu_i\,\mu_j^4}$

Laplace: $\dfrac{1}{\lambda_j^2}\left\{\lambda_i^2\left(2 - e^{-2|\theta_i-\theta_j|/\lambda_i}\right) - 2\lambda_i\lambda_j\, e^{-|\theta_i-\theta_j|/\lambda_i} - 2(\lambda_i+\lambda_j)\,|\theta_i-\theta_j|\, e^{-|\theta_i-\theta_j|/\lambda_i} + \lambda_j^2\right\}$

Lévy, Equal Supports ($\mu_i = \mu_j$): $\dfrac{(c_i-c_j)^2}{2c_i^2}$

Log-normal: $\dfrac{2\sigma_i^2(\mu_i-\mu_j)^2 + \left(\sigma_i^2-\sigma_j^2\right)^2}{2\sigma_j^4}$

Maxwell-Boltzmann: $\dfrac{3\left(\sigma_j^2-\sigma_i^2\right)^2}{2\sigma_j^4}$

Pareto: $\dfrac{(a_i-a_j)^2}{a_i^2}$

Rayleigh: $\dfrac{\left(\sigma_j^2-\sigma_i^2\right)^2}{\sigma_j^4}$

Weibull, Fixed Shape ($k_i = k_j = k$): $\dfrac{\left(\lambda_i^k-\lambda_j^k\right)^2}{\lambda_j^{2k}}$
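A quick numerical check of the relation between the derivative of $D_\alpha$ at $\alpha = 1$ and the LLR variance is possible by combining Table 2 and Table 5. The Python sketch below does this for the univariate Gaussian case, approximating the derivative by a central finite difference; the parameter values and function names are illustrative.

```python
import numpy as np

def renyi_gauss(mu_i, s_i, mu_j, s_j, alpha):
    # Closed form for two univariate Gaussians (Table 2).
    s2_star = alpha * s_j**2 + (1.0 - alpha) * s_i**2
    return (np.log(s_j / s_i)
            + np.log(s_j**2 / s2_star) / (2.0 * (alpha - 1.0))
            + 0.5 * alpha * (mu_i - mu_j)**2 / s2_star)

mu_i, s_i, mu_j, s_j = 0.0, 1.0, 0.8, 1.5
h = 1e-4
# Central difference approximation of d D_alpha / d alpha at alpha = 1.
slope = (renyi_gauss(mu_i, s_i, mu_j, s_j, 1.0 + h)
         - renyi_gauss(mu_i, s_i, mu_j, s_j, 1.0 - h)) / (2.0 * h)

# Half of Var(LLR) from Table 5 for the Gaussian case.
var_llr = (2.0 * s_i**2 * (mu_i - mu_j)**2 + (s_i**2 - s_j**2)**2) / (2.0 * s_j**4)
print(slope, 0.5 * var_llr)  # the two numbers should be close
```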
Song [30] also proposed a nonparametric estimate of $\mathrm{Var}(\ln f(X))$ based on separately estimating $\Delta_1 := E[\ln f(X)]$ and $\Delta_2 := E[(\ln f(X))^2]$ by the plug-in estimates $\hat{\Delta}_{l,n} := \int f_n(x)\left[\ln f_n(x)\right]^l dx$, $l = 1, 2$, where $f_n$ is a kernel density estimate of $f$ from the independent and identically distributed (i.i.d.) samples $(X_1, X_2, \ldots, X_n)$ that are drawn according to $f$. Theorem 3.1 in [30] establishes precise conditions under which $\hat{\Delta}_{2,n} - \hat{\Delta}_{1,n}^2$ converges with probability one to $\mathrm{Var}(\ln f(X))$ as $n \to \infty$ (the estimate is strongly consistent).
Assuming one has access to i.i.d. samples $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ drawn according to $f_i$ and $f_j$, respectively, we can propose a similar estimate of $\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right)$. Let
$$\tilde{\Delta}_{l,n} := \int f_{i,n}(x)\left[\ln\frac{f_{i,n}(x)}{f_{j,n}(x)}\right]^l dx, \qquad l = 1, 2,$$
where $f_{i,n}$ and $f_{j,n}$ are kernel density estimates of $f_i$ and $f_j$ based on $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$, respectively. It is an interesting problem to find exact conditions on $f_i$ and $f_j$ such that appropriate choices of $f_{i,n}$ and $f_{j,n}$ make $\tilde{\Delta}_{2,n} - \tilde{\Delta}_{1,n}^2$ a strongly consistent estimate of $\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right)$. Investigating this problem is out of the scope of this paper, but it is likely that the consistency proof in [30] can be extended to this case under appropriate restrictions on $f_i$ and $f_j$. Alternatively, the partitioning, $k(n)$ nearest-neighbor, and minimum spanning tree based methods reviewed in [33] and the references therein should also provide consistent estimates.
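For concreteness, the following Python sketch implements the plug-in quantity $\tilde{\Delta}_{2,n} - \tilde{\Delta}_{1,n}^2$ with Gaussian kernel density estimates and compares it, on simulated Gaussian data, with the exact entry of Table 5. This is only an illustration under assumed sample sizes and integration limits; as noted above, no consistency claim is made for this estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy import integrate

def var_llr_plugin(x_samples, y_samples, lo, hi):
    """Plug-in analogue of Song's estimator: build kernel density estimates
    f_{i,n} and f_{j,n} and return Delta_2,n - Delta_1,n^2, where
    Delta_l,n = int f_{i,n}(x) [ln(f_{i,n}(x)/f_{j,n}(x))]^l dx, l = 1, 2.
    Numerical integration is restricted to [lo, hi]."""
    f_in = gaussian_kde(x_samples)
    f_jn = gaussian_kde(y_samples)
    def moment(l):
        integrand = lambda x: f_in(x)[0] * np.log(f_in(x)[0] / f_jn(x)[0]) ** l
        val, _ = integrate.quad(integrand, lo, hi)
        return val
    d1, d2 = moment(1), moment(2)
    return d2 - d1 ** 2

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)   # samples from f_i
y = rng.normal(0.5, 1.3, size=2000)   # samples from f_j
est = var_llr_plugin(x, y, lo=-8.0, hi=8.0)
# Exact value from Table 5 (univariate Gaussian row), for comparison:
exact = (2 * 1.0**2 * (0.0 - 0.5)**2 + (1.0**2 - 1.3**2)**2) / (2 * 1.3**4)
print(est, exact)
```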
4. Conclusion
In this work we presented closed-form expressions for Rényi and Kullback-Leibler divergences for nineteen commonly used univariate continuous distributions, as well as those expressions for Dirichlet and multivariate Gaussian
distributions. We also presented a table summarizing four of the most important information measure rates for zero-mean stationary Gaussian processes,
as well as the relevant references. Lastly, we established a connection between
the log-likelihood ratio between two distributions and their Rényi divergence,
extending the work of Song [30], who considers the log-likelihood function and
its relation to Rényi entropy. Following this connection, we have provided the
corresponding expressions for the nineteen univariate distributions considered
here. The present compilation is of course not complete. Particularly valuable
future additions would be the Rényi divergence expressions for Student-t and
Kumaraswamy distributions, as well as for the exponentiated and beta-normal
distributions [10].
5. Acknowledgements
The authors are grateful to NSERC of Canada whose funds partially supported this work.
[1] J. Aczél, Z. Daróczy, On Measures of Information and their Characterization, Academic Press, 1975.
[2] F. Alajaji, P. N. Chen, Z. Rached, Csiszár’s cutoff rates for the general hypothesis testing problem, IEEE Transactions on Information Theory 50 (4)
(2004) 663 – 678.
[3] C. Arndt, Information Measures: Information and its Description in Science
and Engineering, Signals and Communication Technology, Springer, 2004.
[4] M. Asadi, N. Ebrahimi, G. G. Hamedani, E. S. Soofi, Information measures
for Pareto distributions and order statistics, in: Advances in Distribution
Theory, Order Statistics, and Inference, Springer, 2006, pp. 207–223.
[5] J. Burbea, The convexity with respect to Gaussian distributions of divergences of order α, Utilitas Mathematica 26 (1984) 171–192.
[6] I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung
auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud.
Akad. Mat. Kutató Int. Közl (8) (1963) 85–108.
[7] I. Csiszár, Generalized cutoff rates and Rényi’s information measures, IEEE Transactions on Information Theory 41 (1) (1995) 26–34.
[8] G. A. Darbellay, I. Vajda, Entropy expressions for multivariate continuous
distributions, IEEE Transactions on Information Theory (2000) 709–712.
[9] P. A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach,
Prentice - Hall International, 1982.
[10] N. Eugene, C. Lee, F. Famoye, Beta-normal distribution and its applications, Communications in Statistics - Theory and Methods 31 (4) (2002)
497–512.
[11] M. Gil, On Rényi divergence measures for continuous alphabet sources,
Master’s thesis, Queen’s University (2011).
[12] L. Golshani, E. Pasha, Rényi entropy rate for Gaussian processes, Information Sciences 180 (8) (2010) 1486–1491.
[13] L. Golshani, E. Pasha, G. Yari, Some properties of Rényi entropy and Rényi
entropy rate, Information Sciences 179 (14) (2009) 2426 – 2433.
[14] P. Harremoës, Interpretations of Rényi entropies and divergences, Physica
A: Statistical Mechanics and its Applications 365 (1) (2006) 57 – 62.
[15] A. O. Hero, B. Ma, O. Michel, J. Gorman, Alpha-divergence for classification, indexing and retrieval, Tech. rep., University of Michigan (2001).
[16] T. Hobza, D. Morales, L. Pardo, Rényi statistics for testing equality of
autocorrelation coefficients, Statistical Methodology 6 (4) (2009) 424 – 436.
[17] S. Ihara, Information Theory for Continuous Systems, World Scientific,
1993.
[18] A. N. Kolmogorov, A new metric invariant of transient dynamical systems and automorphisms in Lebesgue spaces, Dokl. Akad. Nauk SSSR 119 (1958).
[19] S. Kullback, R. A. Leibler, On information and sufficiency, The Annals of
Mathematical Statistics 22 (1) (1951) 79–86.
[20] F. Liese, I. Vajda, Convex Statistical Distances, Teubner, 1987.
[21] F. Liese, I. Vajda, On divergences and informations in statistics and information theory, IEEE Transactions on Information Theory 52 (10) (2006)
4394 –4412.
[22] S. Nadarajah, K. Zografos, Formulas for Rényi information and related
measures for univariate distributions, Information Sciences 155 (1-2) (2003)
119 – 138.
[23] M. C. Pardo, J. A. Pardo, Use of Rényi’s divergence to test for the equality of the coefficients of variation, Journal of Computational and Applied
Mathematics 116 (1) (2000) 93 – 104.
[24] W. D. Penny, Kullback-Leibler divergences of normal, Gamma, Dirichlet and Wishart densities, Tech. rep., Wellcome Department of Cognitive Neurology (2001).
[25] W. D. Penny, S. Roberts, Bayesian methods for autoregressive models, in:
Neural Networks for Signal Processing X, 2000. Proceedings of the 2000
IEEE Signal Processing Society Workshop, vol. 1, 2000.
[26] T. W. Rauber, T. Braun, K. Berns, Probabilistic distance measures of the
Dirichlet and Beta distributions, Pattern Recognition 41 (2) (2008) 637 –
645.
[27] A. Rényi, On measures of entropy and information, in: Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability,
vol. 1, 1961.
[28] M. J. Rivas, M. T. Santos, D. Morales, Rényi test statistics for partially
observed diffusion processes, Journal of Statistical Planning and Inference
127 (1-2) (2005) 91 – 102.
[29] C. E. Shannon, A mathematical theory of communication, Bell System
Technical Journal 27 (1948) 379 - 423.
[30] K. Song, Rényi information, loglikelihood and an intrinsic distribution measure, Journal of Statistical Planning and Inference 93 (1-2) (2001) 51 – 69.
[31] I. Vajda, Theory of Statistical Inference and Information, Kluwer, Dordrecht, 1989.
[32] T. van Erven, P. Harremoës, Rényi divergence and majorization, in: Information Theory Proceedings (ISIT), 2010 IEEE International Symposium
on, 2010.
[33] Q. Wang, S. Kulkarni, S. Verdú, Universal estimation of information measures for analog sources, Foundations and Trends in Communications and
Information Theory, 5 (3) (2009), 265 – 353.
[34] K. Zografos, S. Nadarajah, Expressions for Rényi and Shannon entropies
for multivariate distributions, Statistics & Probability Letters 71 (1) (2005)
71 – 84.