Rényi Divergence Measures for Commonly Used Univariate Continuous Distributions

M. Gil¹,*, F. Alajaji, T. Linder
Department of Mathematics and Statistics, Queen's University, Kingston, ON, K7L 3N6

* Corresponding author. Email addresses: [email protected] (M. Gil), [email protected] (F. Alajaji), [email protected] (T. Linder).
¹ Present address: McMaster University, Department of Engineering Physics, 1280 Main Street West, John Hodgins Engineering Building, Room A315, Hamilton, Ontario, Canada L8S 4L7. Phone: (905) 525-9140 ext. 24545, Fax: (905) 527-8409.

Preprint submitted to Information Sciences, July 14, 2013.

Abstract

Probabilistic 'distances' (also called divergences), which in some sense assess how 'close' two probability distributions are to one another, have been widely employed in probability, statistics, information theory, and related fields. Of particular importance due to their generality and applicability are the Rényi divergence measures. This paper presents closed-form expressions for the Rényi and Kullback-Leibler divergences for nineteen commonly used univariate continuous distributions, as well as those for multivariate Gaussian and Dirichlet distributions. In addition, a table summarizing four of the most important information measure rates for zero-mean stationary Gaussian processes, namely Rényi entropy, differential Shannon entropy, Rényi divergence, and Kullback-Leibler divergence, is presented. Lastly, a connection between the Rényi divergence and the variance of the log-likelihood ratio of two distributions is established, thereby extending a previous result by Song [J. Stat. Plan. Infer. 93 (2001)] on the relation between Rényi entropy and the log-likelihood function. A table with the corresponding variance expressions for the univariate distributions considered here is also included.

Keywords: Rényi divergence, Rényi divergence rate, Kullback divergence, probabilistic distances, divergences, continuous distributions, log-likelihood ratio.

1. Introduction

In his 1961 work [27], Rényi introduced generalized information and divergence measures which naturally extend Shannon entropy [29] and Kullback-Leibler divergence (KLD) [19]. For a probability density $f$ on $\mathbb{R}^n$, the Rényi entropy of order $\alpha$ is defined via the integral
$$h_\alpha(f) := \frac{1}{1-\alpha} \ln \int f(x)^\alpha \, dx$$
for $\alpha > 0$ and $\alpha \neq 1$. Under appropriate conditions, in the limit as $\alpha \to 1$, $h_\alpha(f)$ converges to the differential Shannon entropy $h(f) := -\int f(x) \ln f(x)\, dx = -E_f[\ln f]$. If $g$ is another probability density on $\mathbb{R}^n$, the Rényi divergence of order $\alpha$ between $f$ and $g$ is given by
$$D_\alpha(f\|g) := \frac{1}{\alpha-1} \ln \int f(x)^\alpha g(x)^{1-\alpha} \, dx$$
for $\alpha > 0$ and $\alpha \neq 1$. Under appropriate conditions, in the limit as $\alpha \to 1$, $D_\alpha(f\|g)$ converges to the KLD between $f$ and $g$, $D(f\|g) := \int f(x) \ln\frac{f(x)}{g(x)}\, dx = E_f[\ln(f/g)]$.

While a significant number of other divergence measures have since been introduced [1, 3, 9], Rényi divergences are especially important because of their generality, applicability, and the fact that they possess operational definitions in the context of hypothesis testing [7, 2] as well as coding [14]. Additional applications of Rényi divergences include the derivation of a family of test statistics for the hypothesis that the coefficients of variation of $k$ normal populations are equal [23], their use in problems of classification, indexing and retrieval, for example [15], and test statistics for partially observed diffusion processes [28], to name a few.
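As a concrete illustration of these definitions, the short Python sketch below (not part of the original derivations; the two gamma densities and parameter values are arbitrary choices) evaluates $D_\alpha(f\|g)$ and $D(f\|g)$ by numerical quadrature and shows $D_\alpha$ approaching the KLD as $\alpha \to 1$.

```python
# Illustrative numerical check: evaluate the Renyi divergence D_alpha(f||g) and the
# KLD D(f||g) by quadrature for two gamma densities, and observe that D_alpha
# approaches the KLD as alpha -> 1. Densities and parameters are arbitrary choices.
import numpy as np
from scipy import integrate
from scipy.stats import gamma

f = gamma(a=2.0, scale=1.5).pdf   # density f (illustrative)
g = gamma(a=3.0, scale=1.0).pdf   # density g (illustrative)

def renyi_divergence(f, g, alpha, support=(0.0, np.inf)):
    """D_alpha(f||g) = (alpha-1)^{-1} ln of the integral of f^alpha g^(1-alpha)."""
    integrand = lambda x: f(x) ** alpha * g(x) ** (1.0 - alpha)
    val, _ = integrate.quad(integrand, *support)
    return np.log(val) / (alpha - 1.0)

def kl_divergence(f, g, support=(0.0, np.inf)):
    """D(f||g) = integral of f ln(f/g); the guard avoids log(0) from tail underflow."""
    def integrand(x):
        fx, gx = f(x), g(x)
        return fx * np.log(fx / gx) if fx > 0 and gx > 0 else 0.0
    val, _ = integrate.quad(integrand, *support)
    return val

for alpha in (0.5, 0.9, 0.99, 0.999):
    print(f"alpha = {alpha:6.3f}:  D_alpha = {renyi_divergence(f, g, alpha):.6f}")
print(f"KLD           :  D       = {kl_divergence(f, g):.6f}")
```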
The ubiquity of Rényi divergences suggests the importance of establishing their general mathematical properties as well as having a compilation of readily available analytical expressions for commonly used distributions. The mathematical properties of the Rényi information measures have been studied both directly, e.g., in [27, 32, 13], and indirectly through their relation to f-divergences [6, 21]. On the other hand, a compilation of important Rényi divergence and KLD expressions does not seem to be currently available in the literature. Some isolated results can be found in separate works, e.g.: the Rényi divergence for Dirichlet distributions (via the Chernoff distance expression) [26]; the Rényi divergence for two multivariate Gaussian distributions [16, 15], a result that can be traced back to [5] and [31]; the Rényi divergence and KLD for a special case of univariate Pareto distributions [4]; and the KLD for multivariate normal and univariate Gamma [25, 24], as well as Dirichlet and Wishart [24], distributions.

The need for an analogous compilation for Rényi and Shannon entropies has already been addressed in the literature. Closed-form expressions for the differential Shannon and Rényi entropies of several univariate continuous distributions are presented in the work by Song [30]. The author also introduces an 'intrinsic log-likelihood-based distribution measure,' $G_f$, derived from the Rényi entropy. Song's work was followed by [22], where the differential Shannon and Rényi entropies, as well as Song's intrinsic measure, are presented for an additional 26 continuous univariate distribution families. The same authors then extended these results to several multivariate families in [34].

The main objective of this work is to extend these compilations to include expressions for the KLD and Rényi divergence of commonly used univariate continuous distributions. Since most applications revolve around two distributions within the same family, this was the focus of the calculations as well. The complete derivations can be found in [11], where it is also shown that the independently derived expressions are in agreement with the aforementioned existing results in the literature. We present tables containing these expressions for nineteen univariate continuous distributions, as well as those for the Dirichlet and multivariate Gaussian distributions, in Section 2.2. In Section 2.3 we provide a table summarizing the expressions for the Rényi entropy, Shannon entropy, Rényi divergence, and KLD rates for zero-mean stationary Gaussian sources. Finally, in Section 3, we establish a relationship between the Rényi divergence and the variance of the log-likelihood ratio and present a table of the corresponding variance expressions for the various univariate distributions considered in this paper.

2. Closed-Form Expressions for Rényi and Kullback Divergences of Continuous Distributions

We consider commonly used families of univariate continuous distributions and present Rényi and Kullback divergence expressions between two members of a given family. While all the expressions given in this section have been independently derived in [11], we note that in the case of distributions belonging to exponential families, the results can be computed via a closed-form formula originally obtained by Liese and Vajda [20], presented below. This result seems to be largely unknown in the literature.
For example, the works [26, 4, 25, 24] do not reference it, while the related work by Darbellay and Vajda [8] on differential entropy for exponential families is cited in some of the works compiling the corresponding expressions, such as [22, 34], and even in the entropy section of [4]. Some expressions for Rényi and/or Kullback divergences for cases where the formula by Liese and Vajda is not applicable are also derived in [11], namely the Rényi and Kullback divergences for general univariate Laplace, general univariate Pareto, Cramér, and uniform distributions, as well as the KLD for general univariate Gumbel and Weibull densities. The integration calculations are carried out using standard techniques such as substitution and integration by parts, reparametrization of some of the integrals so as to express the integrand as a known probability density scaled by some factor, and integral representations of special functions, in particular the Gamma and related functions.

2.1. Rényi Divergence for Natural Exponential Families

In Chapter 2 of their 1987 book Convex Statistical Distances [20], Liese and Vajda derive a closed-form expression for the Rényi divergence between two members of a canonical exponential family. Note that their definition of Rényi divergence, here denoted by $R_\alpha(f_i\|f_j)$, differs by a factor of $\alpha$ from the one considered in this work; that is, $D_\alpha(f_i\|f_j) = \alpha R_\alpha(f_i\|f_j)$. Also, the authors allow the parameter $\alpha$ to be any real number, as opposed to the more standard definitions where $\alpha > 0$.

Consider a natural exponential family of probability measures $P_\tau$ on $\mathbb{R}^n$ having densities $p_\tau(x) = [C(\tau)]^{-1}\exp\langle\tau, T(x)\rangle$, where $T: \mathbb{R}^n \to \mathbb{R}^k$, $\tau$ is a $k$-dimensional parameter vector, and $\langle\cdot,\cdot\rangle$ denotes the standard inner product in $\mathbb{R}^k$. Denote the natural parameter space by $\Theta$. Then $R_\alpha(f_i\|f_j)$ is given by the following cases:

• If $\alpha \notin \{0,1\}$ and $\alpha\tau_i + (1-\alpha)\tau_j \in \Theta$,
$$R_\alpha(f_i\|f_j) = \frac{1}{\alpha(\alpha-1)}\,\ln\frac{C(\alpha\tau_i + (1-\alpha)\tau_j)}{C(\tau_i)^{\alpha}\, C(\tau_j)^{1-\alpha}}\,;$$

• If $\alpha \notin \{0,1\}$ and $\alpha\tau_i + (1-\alpha)\tau_j \notin \Theta$, $R_\alpha(f_i\|f_j) = +\infty$;

• If $\alpha = 0$, $R_\alpha(f_i\|f_j) = \Delta(\tau_i, \tau_j)$;

• If $\alpha = 1$, $R_\alpha(f_i\|f_j) = \Delta(\tau_j, \tau_i)$;

with $\Delta(\tau_i, \tau_j) := \lim_{\alpha\downarrow 0}\,\alpha^{-1}\big[\alpha D(\tau_i) + (1-\alpha)D(\tau_j) - D(\alpha\tau_i + (1-\alpha)\tau_j)\big]$ and $D(\tau) = \ln C(\tau)$.
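To illustrate how the Liese-Vajda formula is used, the following Python sketch (an illustrative addition, not taken from [20] or [11]) writes the exponential density $\lambda e^{-\lambda x}$ in natural form with $\tau = -\lambda$, $T(x) = x$, and $C(\tau) = -1/\tau$ on $\Theta = (-\infty, 0)$, evaluates $D_\alpha = \alpha R_\alpha$ from the formula above, and checks the result against direct quadrature and against the exponential entry of Table 2 below.

```python
# A minimal sketch applying the Liese-Vajda formula to exponential densities
# lambda*exp(-lambda*x), written as C(tau)^{-1} exp(tau*x) with tau = -lambda.
import numpy as np
from scipy import integrate

def C(tau):
    # Normalizing constant of the natural exponential family on (0, inf).
    assert tau < 0, "tau must lie in the natural parameter space Theta = (-inf, 0)"
    return -1.0 / tau

def renyi_exponential_liese_vajda(lam_i, lam_j, alpha):
    # D_alpha = alpha*R_alpha = (alpha-1)^{-1} ln[ C(a*tau_i+(1-a)*tau_j) / (C(tau_i)^a C(tau_j)^(1-a)) ]
    tau_i, tau_j = -lam_i, -lam_j
    tau_mix = alpha * tau_i + (1.0 - alpha) * tau_j
    return np.log(C(tau_mix) / (C(tau_i) ** alpha * C(tau_j) ** (1.0 - alpha))) / (alpha - 1.0)

def renyi_exponential_quadrature(lam_i, lam_j, alpha):
    f = lambda x: lam_i * np.exp(-lam_i * x)
    g = lambda x: lam_j * np.exp(-lam_j * x)
    val, _ = integrate.quad(lambda x: f(x) ** alpha * g(x) ** (1.0 - alpha), 0.0, np.inf)
    return np.log(val) / (alpha - 1.0)

lam_i, lam_j, alpha = 2.0, 0.5, 0.7          # illustrative parameter values
print(renyi_exponential_liese_vajda(lam_i, lam_j, alpha))
print(renyi_exponential_quadrature(lam_i, lam_j, alpha))
# Both agree with the exponential entry of Table 2:
lam_alpha = alpha * lam_i + (1.0 - alpha) * lam_j
print(np.log(lam_i / lam_j) + np.log(lam_i / lam_alpha) / (alpha - 1.0))
```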
2.2. Tables

The results presented in the following tables are derived in [11], except for the derivation of the KLD for Dirichlet distributions, which can be found in [24]. A few of these results can also be found in the references mentioned in Section 1. Table 2 and Table 3 present the expressions for the Rényi and Kullback divergences, respectively. We follow the convention $0\ln 0 = 0$, which is justified by continuity. The densities associated with the distributions are given in Table 1.

Table 2 includes a finiteness condition under which the given Rényi divergence expression is valid; for all other cases (and $\alpha > 0$), $D_\alpha(f_i\|f_j) = \infty$. In the cases where the closed-form expression is a piecewise-defined function, the conditions for each case are presented alongside the corresponding formula, and it is implied that for all other cases $D_\alpha(f_i\|f_j) = \infty$. The expressions for the Rényi divergence of the Laplace and Cramér distributions remain continuous at $\alpha = \lambda_i/(\lambda_i+\lambda_j)$ (where $\lambda$ is the scale parameter of the Laplace distribution) and at $\alpha = 1/2$, respectively; this can be verified using l'Hôpital's rule.

One important property of the Rényi divergence is that $D_\alpha(T(X)\|T(Y)) = D_\alpha(X\|Y)$ for any invertible transformation $T$. This follows from the more general data processing inequality (see, e.g., [20, 21]). For example, the Rényi divergence between two log-normal densities is the same as that between two normal densities. Also, applying the appropriate $T$ and reparametrizing, one can find similar relationships between other distributions, such as between Weibull and Gumbel, Gamma and Chi (or the relevant special cases), and the half-normal and Lévy under equal supports.

Note that $\Gamma(x)$, $\psi(x)$, and $B(\cdot)$ denote the Gamma, digamma, and (multivariate) Beta functions, respectively. Also, $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. Lastly, note the direction of the divergence: $i \mapsto j$.

Table 1: Common Continuous Distributions (Name: density; restrictions)

Beta: $\dfrac{x^{a-1}(1-x)^{b-1}}{B(a,b)}$;  $a, b > 0$, $x \in (0,1)$

Chi: $\dfrac{2^{1-k/2}\,x^{k-1}e^{-x^2/(2\sigma^2)}}{\sigma^{k}\,\Gamma(k/2)}$;  $\sigma > 0$, $k \in \mathbb{N}$, $x > 0$

$\chi^2$: $\dfrac{x^{d/2-1}e^{-x/2}}{2^{d/2}\Gamma(d/2)}$;  $d \in \mathbb{N}$, $x > 0$

Cramér: $\dfrac{\theta}{2(1+\theta|x|)^2}$;  $\theta > 0$, $x \in \mathbb{R}$

Dirichlet: $\dfrac{1}{B(\mathbf{a})}\prod_{k=1}^{d} x_k^{a_k-1}$;  $\mathbf{a} \in \mathbb{R}^d$, $a_k > 0$, $d \ge 2$; $x \in \mathbb{R}^d$, $\sum_k x_k = 1$

Exponential: $\lambda e^{-\lambda x}$;  $\lambda > 0$, $x > 0$

Gamma: $\dfrac{x^{k-1}e^{-x/\theta}}{\theta^{k}\Gamma(k)}$;  $\theta > 0$, $k > 0$, $x > 0$

Multivariate Gaussian: $\dfrac{e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}$;  $\mu \in \mathbb{R}^n$, $\Sigma$ symmetric positive definite, $x \in \mathbb{R}^n$

Univariate Gaussian: $\dfrac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$;  $\sigma > 0$, $\mu \in \mathbb{R}$, $x \in \mathbb{R}$

Special Bivariate Gaussian: $\dfrac{e^{-\frac{1}{2}x'\Phi^{-1}x}}{2\pi(1-\rho^2)^{1/2}}$, with $\Phi = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$;  $\rho \in (-1,1)$, $x \in \mathbb{R}^2$

Gumbel: $\dfrac{e^{-(x-\mu)/\beta}\,e^{-e^{-(x-\mu)/\beta}}}{\beta}$;  $\mu \in \mathbb{R}$, $\beta > 0$, $x \in \mathbb{R}$

Half-Normal: $\sqrt{\dfrac{2}{\pi\sigma^2}}\;e^{-x^2/(2\sigma^2)}$;  $\sigma > 0$, $x > 0$

Inverse Gaussian: $\left(\dfrac{\lambda}{2\pi x^3}\right)^{1/2}\exp\!\left(\dfrac{-\lambda(x-\mu)^2}{2\mu^2 x}\right)$;  $\lambda > 0$, $\mu > 0$, $x > 0$

Laplace: $\dfrac{1}{2\lambda}\,e^{-|x-\theta|/\lambda}$;  $\lambda > 0$, $\theta \in \mathbb{R}$, $x \in \mathbb{R}$

Lévy: $\sqrt{\dfrac{c}{2\pi}}\;\dfrac{e^{-\frac{c}{2(x-\mu)}}}{(x-\mu)^{3/2}}$;  $c > 0$, $\mu \in \mathbb{R}$, $x > \mu$

Log-normal: $\dfrac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$;  $\sigma > 0$, $\mu \in \mathbb{R}$, $x > 0$

Maxwell-Boltzmann: $\sqrt{\dfrac{2}{\pi}}\;\dfrac{x^2 e^{-x^2/(2\sigma^2)}}{\sigma^3}$;  $\sigma > 0$, $x > 0$

Pareto: $a\,m^{a}\,x^{-(a+1)}$;  $a, m > 0$, $x > m$

Rayleigh: $\dfrac{x}{\sigma^2}\,e^{-x^2/(2\sigma^2)}$;  $\sigma > 0$, $x > 0$

Uniform: $\dfrac{1}{b-a}$;  $a < x < b$

Weibull: $\dfrac{k}{\lambda^{k}}\,x^{k-1}e^{-(x/\lambda)^{k}}$;  $k, \lambda > 0$, $x > 0$

Table 2: Rényi Divergences for Common Continuous Distributions (Name: $D_\alpha(f_i\|f_j)$; finiteness condition)

Beta: $\ln\dfrac{B(a_j,b_j)}{B(a_i,b_i)} + \dfrac{1}{\alpha-1}\ln\dfrac{B(a_\alpha,b_\alpha)}{B(a_i,b_i)}$, where $a_\alpha = \alpha a_i + (1-\alpha)a_j$, $b_\alpha = \alpha b_i + (1-\alpha)b_j$;  finite for $a_\alpha, b_\alpha > 0$

Chi: $\ln\dfrac{\sigma_j^{k_j}\Gamma(k_j/2)}{\sigma_i^{k_i}\Gamma(k_i/2)} + \dfrac{1}{\alpha-1}\ln\!\left[\left(\dfrac{\sigma_i^2\sigma_j^2}{(\sigma^2)^*_\alpha}\right)^{k_\alpha/2}\dfrac{\Gamma(k_\alpha/2)}{\sigma_i^{k_i}\Gamma(k_i/2)}\right]$, where $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$, $k_\alpha = \alpha k_i + (1-\alpha)k_j$;  finite for $(\sigma^2)^*_\alpha > 0$ and $k_\alpha > 0$

$\chi^2$: $\ln\dfrac{\Gamma(d_j/2)}{\Gamma(d_i/2)} + \dfrac{1}{\alpha-1}\ln\dfrac{\Gamma(d_\alpha/2)}{\Gamma(d_i/2)}$, where $d_\alpha = \alpha d_i + (1-\alpha)d_j$;  finite for $d_\alpha > 0$

Cramér: for $\alpha = 1/2$: $\ln\dfrac{\theta_i}{\theta_j} + 2\ln\dfrac{\theta_i-\theta_j}{\theta_i\ln(\theta_i/\theta_j)}$; for $\alpha \neq 1/2$: $\ln\dfrac{\theta_i}{\theta_j} + \dfrac{1}{\alpha-1}\ln\dfrac{\theta_i\left[1-(\theta_j/\theta_i)^{2\alpha-1}\right]}{(\theta_i-\theta_j)(2\alpha-1)}$

Dirichlet: $\ln\dfrac{B(\mathbf{a}_j)}{B(\mathbf{a}_i)} + \dfrac{1}{\alpha-1}\ln\dfrac{B(\mathbf{a}_\alpha)}{B(\mathbf{a}_i)}$, where $\mathbf{a}_\alpha = \alpha\mathbf{a}_i + (1-\alpha)\mathbf{a}_j$;  finite for $a_{\alpha k} > 0$ for all $k$

Exponential: $\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{1}{\alpha-1}\ln\dfrac{\lambda_i}{\lambda_\alpha}$, where $\lambda_\alpha = \alpha\lambda_i + (1-\alpha)\lambda_j$;  finite for $\lambda_\alpha > 0$

Gamma: $\ln\dfrac{\Gamma(k_j)\theta_j^{k_j}}{\Gamma(k_i)\theta_i^{k_i}} + \dfrac{1}{\alpha-1}\ln\!\left[\dfrac{\Gamma(k_\alpha)}{\Gamma(k_i)\theta_i^{k_i}}\left(\dfrac{\theta_i\theta_j}{\theta^*_\alpha}\right)^{k_\alpha}\right]$, where $\theta^*_\alpha = \alpha\theta_j + (1-\alpha)\theta_i$, $k_\alpha = \alpha k_i + (1-\alpha)k_j$;  finite for $\theta^*_\alpha > 0$ and $k_\alpha > 0$

Multivariate Gaussian: $\dfrac{\alpha}{2}(\mu_i-\mu_j)'\big[(\Sigma_\alpha)^*\big]^{-1}(\mu_i-\mu_j) - \dfrac{1}{2(\alpha-1)}\ln\dfrac{|(\Sigma_\alpha)^*|}{|\Sigma_i|^{1-\alpha}|\Sigma_j|^{\alpha}}$, where $(\Sigma_\alpha)^* = \alpha\Sigma_j + (1-\alpha)\Sigma_i$;  finite when $\alpha\Sigma_i^{-1} + (1-\alpha)\Sigma_j^{-1}$ is positive definite

Univariate Gaussian: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha} + \dfrac{1}{2}\dfrac{\alpha(\mu_i-\mu_j)^2}{(\sigma^2)^*_\alpha}$, where $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$;  finite for $(\sigma^2)^*_\alpha > 0$

Special Bivariate Gaussian: $\dfrac{1}{2}\ln\dfrac{1-\rho_j^2}{1-\rho_i^2} - \dfrac{1}{2(\alpha-1)}\ln\dfrac{1-(\rho^*_\alpha)^2}{1-\rho_j^2}$, where $\rho^*_\alpha = \alpha\rho_j + (1-\alpha)\rho_i$;  finite when $\alpha\Phi_i^{-1} + (1-\alpha)\Phi_j^{-1}$ is positive definite

Gumbel, fixed scale ($\beta_i = \beta_j = \beta$): $\dfrac{\mu_i-\mu_j}{\beta} + \dfrac{1}{\alpha-1}\ln\dfrac{e^{\mu_i/\beta}}{\left(e^{\mu/\beta}\right)_\alpha}$, where $\left(e^{\mu/\beta}\right)_\alpha = \alpha e^{\mu_i/\beta} + (1-\alpha)e^{\mu_j/\beta}$;  finite for $\left(e^{\mu/\beta}\right)_\alpha > 0$

Half-Normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{\alpha-1}\ln\!\left(\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha}\right)^{1/2}$, where $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$;  finite for $(\sigma^2)^*_\alpha > 0$

Inverse Gaussian: $\dfrac{1}{2}\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{\lambda_i}{\lambda_\alpha} + \dfrac{1}{\alpha-1}\left\{\left(\dfrac{\lambda}{\mu}\right)_\alpha - \left[\lambda_\alpha\left(\dfrac{\lambda}{\mu^2}\right)_\alpha\right]^{1/2}\right\}$, where $\lambda_\alpha = \alpha\lambda_i + (1-\alpha)\lambda_j$, $\left(\dfrac{\lambda}{\mu}\right)_\alpha = \alpha\dfrac{\lambda_i}{\mu_i} + (1-\alpha)\dfrac{\lambda_j}{\mu_j}$, $\left(\dfrac{\lambda}{\mu^2}\right)_\alpha = \alpha\dfrac{\lambda_i}{\mu_i^2} + (1-\alpha)\dfrac{\lambda_j}{\mu_j^2}$;  finite for $\left(\dfrac{\lambda}{\mu^2}\right)_\alpha \ge 0$ and $\lambda_\alpha > 0$

Laplace: for $\alpha = \dfrac{\lambda_i}{\lambda_i+\lambda_j}$: $\dfrac{\lambda_i+\lambda_j}{\lambda_j}\ln\dfrac{2\lambda_i}{\lambda_i+\lambda_j+|\theta_i-\theta_j|} + \dfrac{|\theta_i-\theta_j|}{\lambda_j} + \ln\dfrac{\lambda_j}{\lambda_i}$; for $\alpha \neq \dfrac{\lambda_i}{\lambda_i+\lambda_j}$ and $\alpha\lambda_j + (1-\alpha)\lambda_i > 0$: $\ln\dfrac{\lambda_j}{\lambda_i} + \dfrac{1}{\alpha-1}\ln\dfrac{\lambda_i\lambda_j^2\, g(\alpha)}{\alpha^2\lambda_j^2 - (1-\alpha)^2\lambda_i^2}$, where $g(\alpha) = \dfrac{\alpha}{\lambda_i}\exp\!\left(\dfrac{-(1-\alpha)|\theta_i-\theta_j|}{\lambda_j}\right) - \dfrac{1-\alpha}{\lambda_j}\exp\!\left(\dfrac{-\alpha|\theta_i-\theta_j|}{\lambda_i}\right)$

Lévy, equal supports ($\mu_i = \mu_j$): $\dfrac{1}{2}\ln\dfrac{c_i}{c_j} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{c_i}{c_\alpha}$, where $c_\alpha = \alpha c_i + (1-\alpha)c_j$;  finite for $c_\alpha > 0$

Log-normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{2(\alpha-1)}\ln\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha} + \dfrac{1}{2}\dfrac{\alpha(\mu_i-\mu_j)^2}{(\sigma^2)^*_\alpha}$, where $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$;  finite for $(\sigma^2)^*_\alpha > 0$

Maxwell-Boltzmann: $3\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{\alpha-1}\ln\!\left(\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha}\right)^{3/2}$, where $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$;  finite for $(\sigma^2)^*_\alpha > 0$

Pareto: for $\alpha \in (0,1)$: $\ln\dfrac{a_i m_i^{a_i}}{a_j m_j^{a_j}} + \dfrac{1}{\alpha-1}\ln\dfrac{a_i m_i^{a_i}}{a_\alpha M^{a_\alpha}}$, where $M = \max\{m_i, m_j\}$ and $a_\alpha = \alpha a_i + (1-\alpha)a_j$; for $\alpha > 1$, $m_i \ge m_j$, and $a_\alpha > 0$: $a_j\ln\dfrac{m_i}{m_j} + \ln\dfrac{a_i}{a_j} + \dfrac{1}{\alpha-1}\ln\dfrac{a_i}{a_\alpha}$

Rayleigh: $2\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{1}{\alpha-1}\ln\dfrac{\sigma_j^2}{(\sigma^2)^*_\alpha}$, where $(\sigma^2)^*_\alpha = \alpha\sigma_j^2 + (1-\alpha)\sigma_i^2$;  finite for $(\sigma^2)^*_\alpha > 0$

Uniform: for $\alpha \in (0,1)$ and $b_m := \min\{b_i,b_j\} > a_M := \max\{a_i,a_j\}$: $\ln\dfrac{b_j-a_j}{b_i-a_i} + \dfrac{1}{\alpha-1}\ln\dfrac{b_m-a_M}{b_i-a_i}$; for $\alpha > 1$ and $(a_i,b_i) \subset (a_j,b_j)$: $\ln\dfrac{b_j-a_j}{b_i-a_i}$

Weibull, fixed shape ($k_i = k_j = k$): $\ln\dfrac{\lambda_j^{k}}{\lambda_i^{k}} + \dfrac{1}{\alpha-1}\ln\dfrac{\lambda_j^{k}}{(\lambda^k)^*_\alpha}$, where $(\lambda^k)^*_\alpha = \alpha\lambda_j^{k} + (1-\alpha)\lambda_i^{k}$;  finite for $(\lambda^k)^*_\alpha > 0$
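The closed forms in Table 2 are straightforward to verify numerically. As one illustrative example (a sketch with arbitrary parameter values, not part of the paper's derivations), the following Python snippet compares the univariate Gaussian entry of Table 2 with direct quadrature of $\int f_i^\alpha f_j^{1-\alpha}\,dx$.

```python
# Illustrative check of the univariate Gaussian entry of Table 2 against direct
# numerical quadrature of the defining integral. Parameter values are arbitrary.
import numpy as np
from scipy import integrate
from scipy.stats import norm

def renyi_gaussian_closed_form(mu_i, s_i, mu_j, s_j, alpha):
    v_star = alpha * s_j**2 + (1 - alpha) * s_i**2      # (sigma^2)*_alpha, must be > 0
    return (np.log(s_j / s_i)
            + np.log(s_j**2 / v_star) / (2 * (alpha - 1))
            + 0.5 * alpha * (mu_i - mu_j)**2 / v_star)

def renyi_by_quadrature(f, g, alpha):
    integrand = lambda x: f(x)**alpha * g(x)**(1 - alpha)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return np.log(val) / (alpha - 1)

mu_i, s_i, mu_j, s_j, alpha = 0.0, 1.0, 1.5, 2.0, 0.8
print(renyi_gaussian_closed_form(mu_i, s_i, mu_j, s_j, alpha))
print(renyi_by_quadrature(norm(mu_i, s_i).pdf, norm(mu_j, s_j).pdf, alpha))
```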
Table 3: Kullback Divergences for Common Continuous Distributions (Name: $D(f_i\|f_j)$)

Beta: $\ln\dfrac{B(a_j,b_j)}{B(a_i,b_i)} + (a_i-a_j)\psi(a_i) + (b_i-b_j)\psi(b_i) + \left[a_j+b_j-(a_i+b_i)\right]\psi(a_i+b_i)$

Chi: $\dfrac{k_i-k_j}{2}\,\psi(k_i/2) + \ln\!\left[\left(\dfrac{\sigma_j}{\sigma_i}\right)^{k_j}\dfrac{\Gamma(k_j/2)}{\Gamma(k_i/2)}\right] + \dfrac{k_i(\sigma_i^2-\sigma_j^2)}{2\sigma_j^2}$

$\chi^2$: $\ln\dfrac{\Gamma(d_j/2)}{\Gamma(d_i/2)} + \dfrac{d_i-d_j}{2}\,\psi(d_i/2)$

Cramér: $\dfrac{\theta_i+\theta_j}{\theta_i-\theta_j}\ln\dfrac{\theta_i}{\theta_j} - 2$

Dirichlet: $\ln\dfrac{B(\mathbf{a}_j)}{B(\mathbf{a}_i)} + \sum_{k=1}^{d}\left(a_{ik}-a_{jk}\right)\left[\psi(a_{ik}) - \psi\!\left(\sum_{k=1}^{d} a_{ik}\right)\right]$

Exponential: $\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{\lambda_j-\lambda_i}{\lambda_i}$

Gamma: $\ln\dfrac{\Gamma(k_j)\theta_j^{k_j}}{\Gamma(k_i)\theta_i^{k_i}} + (k_i-k_j)\left(\ln\theta_i + \psi(k_i)\right) + \dfrac{k_i(\theta_i-\theta_j)}{\theta_j}$

Multivariate Gaussian: $\dfrac{1}{2}\left[\ln\dfrac{|\Sigma_j|}{|\Sigma_i|} + \mathrm{tr}\!\left(\Sigma_j^{-1}\Sigma_i\right) + (\mu_i-\mu_j)'\Sigma_j^{-1}(\mu_i-\mu_j) - n\right]$

Univariate Gaussian: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{(\mu_i-\mu_j)^2 + \sigma_i^2 - \sigma_j^2}{2\sigma_j^2}$

Special Bivariate Gaussian: $\dfrac{1}{2}\ln\dfrac{1-\rho_j^2}{1-\rho_i^2} + \dfrac{\rho_j^2 - \rho_j\rho_i}{1-\rho_j^2}$

General Gumbel: $\ln\dfrac{\beta_j}{\beta_i} + \gamma\left(\dfrac{\beta_i}{\beta_j}-1\right) + \dfrac{\mu_i-\mu_j}{\beta_j} + e^{(\mu_j-\mu_i)/\beta_j}\,\Gamma\!\left(\dfrac{\beta_i}{\beta_j}+1\right) - 1$

Half-Normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{\sigma_i^2-\sigma_j^2}{2\sigma_j^2}$

Inverse Gaussian: $\dfrac{1}{2}\left[\ln\dfrac{\lambda_i}{\lambda_j} + \dfrac{\lambda_j}{\lambda_i} - 1\right] + \dfrac{\lambda_j(\mu_i-\mu_j)^2}{2\mu_i\mu_j^2}$

Laplace: $\ln\dfrac{\lambda_j}{\lambda_i} + \dfrac{|\theta_i-\theta_j|}{\lambda_j} + \dfrac{\lambda_i}{\lambda_j}\exp\!\left(-|\theta_i-\theta_j|/\lambda_i\right) - 1$

Lévy, equal supports ($\mu_i=\mu_j$): $\dfrac{1}{2}\ln\dfrac{c_i}{c_j} + \dfrac{c_j-c_i}{2c_i}$

Log-normal: $\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{(\mu_i-\mu_j)^2 + \sigma_i^2 - \sigma_j^2}{2\sigma_j^2}$

Maxwell-Boltzmann: $3\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{3(\sigma_i^2-\sigma_j^2)}{2\sigma_j^2}$

Pareto: $a_j\ln\dfrac{m_i}{m_j} + \ln\dfrac{a_i}{a_j} + \dfrac{a_j-a_i}{a_i}$ for $m_i \ge m_j$, and $\infty$ otherwise

Rayleigh: $2\ln\dfrac{\sigma_j}{\sigma_i} + \dfrac{\sigma_i^2-\sigma_j^2}{\sigma_j^2}$

Uniform: $\ln\dfrac{b_j-a_j}{b_i-a_i}$ for $(a_i,b_i) \subseteq (a_j,b_j)$, and $\infty$ otherwise

General Weibull: $\ln\dfrac{k_i}{k_j} + k_j\ln\dfrac{\lambda_j}{\lambda_i} + \gamma\,\dfrac{k_j-k_i}{k_i} + \left(\dfrac{\lambda_i}{\lambda_j}\right)^{k_j}\Gamma\!\left(\dfrac{k_j}{k_i}+1\right) - 1$
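The invariance property noted in Section 2.2 can also be checked numerically. The sketch below (illustrative only; parameter values are arbitrary) computes the KLD between two log-normal densities by quadrature and compares it with the univariate Gaussian entry of Table 3 evaluated at the parameters of the underlying normals; by invariance under the invertible map $T(x) = e^x$, the two quantities coincide.

```python
# Illustrative check: the KLD between two log-normal densities equals the Gaussian
# closed form of Table 3 applied to the underlying normal parameters (arbitrary values).
import numpy as np
from scipy import integrate
from scipy.stats import lognorm

mu_i, s_i, mu_j, s_j = 0.2, 0.7, 1.0, 1.3        # parameters of the underlying normals

f = lognorm(s=s_i, scale=np.exp(mu_i)).pdf        # log-normal densities in scipy's parametrization
g = lognorm(s=s_j, scale=np.exp(mu_j)).pdf

def integrand(x):
    fx, gx = f(x), g(x)
    return fx * np.log(fx / gx) if fx > 0 and gx > 0 else 0.0   # guard against tail underflow

kld_quad, _ = integrate.quad(integrand, 0, np.inf)
kld_closed = np.log(s_j / s_i) + ((mu_i - mu_j) ** 2 + s_i ** 2 - s_j ** 2) / (2 * s_j ** 2)

print(kld_quad, kld_closed)   # the two values should agree to quadrature accuracy
```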
2.3. Information Rates for Stationary Gaussian Processes

It is also of interest to extend divergence measures to stochastic processes, in which case one considers the rate of the given divergence measure. Given two real-valued processes $X = \{X_n\}_{n\in\mathbb{N}}$ and $Y = \{Y_n\}_{n\in\mathbb{N}}$, the Rényi divergence rate between $X$ and $Y$ is $D_\alpha(X\|Y) := \lim_{n\to\infty}(1/n)\,D_\alpha(f_{X^n}\|f_{Y^n})$, whenever the limit exists, where $f_{X^n}$ and $f_{Y^n}$ are the densities of $X^n = (X_1,\dots,X_n)$ and $Y^n = (Y_1,\dots,Y_n)$, respectively. The KLD, Rényi entropy, and differential Shannon entropy rates are defined similarly. Table 4 summarizes these expressions for stationary zero-mean Gaussian processes, with the appropriate references. Here $\varphi(\lambda)$ and $\psi(\lambda)$ denote the power spectral densities of $X$ and $Y$ on $[-\pi,\pi]$, respectively.

Table 4: Information Rates for Stationary Zero-Mean Gaussian Processes (measure: rate; conditions)

Differential entropy [18, 17]: $\dfrac{1}{2}\ln(2\pi e) + \dfrac{1}{4\pi}\displaystyle\int_{-\pi}^{\pi}\ln\varphi(\lambda)\,d\lambda$;  requires $\ln\varphi(\lambda) \in L^1[-\pi,\pi]$; also valid for nonzero-mean stationary Gaussian processes.

Rényi entropy [12]: $\dfrac{1}{2}\ln\!\left(2\pi\,\alpha^{\frac{1}{\alpha-1}}\right) + \dfrac{1}{4\pi}\displaystyle\int_{-\pi}^{\pi}\ln\varphi(\lambda)\,d\lambda$;  requires $\ln\varphi(\lambda) \in L^1[-\pi,\pi]$; also valid for nonzero-mean stationary Gaussian processes.

KLD $D(X\|Y)$ [17]: $\dfrac{1}{4\pi}\displaystyle\int_{-\pi}^{\pi}\left[\dfrac{\varphi(\lambda)}{\psi(\lambda)} - 1 - \ln\dfrac{\varphi(\lambda)}{\psi(\lambda)}\right] d\lambda$;  requires $\varphi(\lambda)/\psi(\lambda)$ bounded, or $\psi(\lambda) > a > 0$ for all $\lambda \in [-\pi,\pi]$ and $\varphi \in L^2[-\pi,\pi]$.

Rényi divergence $D_\alpha(X\|Y)$ [31]: $\dfrac{1}{4\pi(1-\alpha)}\displaystyle\int_{-\pi}^{\pi}\ln\dfrac{h(\lambda)}{\varphi(\lambda)^{1-\alpha}\psi(\lambda)^{\alpha}}\,d\lambda$, where $h(\lambda) := \alpha\psi(\lambda) + (1-\alpha)\varphi(\lambda)$;  requires $\psi(\lambda)$ and $\varphi(\lambda)$ essentially bounded and essentially bounded away from zero, and $\alpha \in (0,1)$.

3. Rényi Divergence and the Log-likelihood Ratio

Song [30] pointed out a relationship between the derivative of the Rényi entropy with respect to the parameter $\alpha$ and the variance of the log-likelihood function of a distribution. Let $f$ be a probability density; then
$$\lim_{\alpha\to 1}\frac{d}{d\alpha}h_\alpha(f) = -\frac{1}{2}\,\mathrm{Var}(\ln f(X)),$$
assuming the integrals involved are well defined and the differentiation operations are legitimate. Extending Song's idea, we note that the variance of the log-likelihood ratio (LLR) between two densities can be similarly derived from an analytic formula for their Rényi divergence of order $\alpha$. Let $f_i$ and $f_j$ be two probability densities such that $D_\alpha(f_i\|f_j)$ is $n$ times continuously differentiable with respect to $\alpha$ ($n \ge 2$). Assuming differentiation and integration can be interchanged, one obtains
$$\lim_{\alpha\to 1}\frac{d}{d\alpha}D_\alpha(f_i\|f_j) = \frac{1}{2}\,\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right).$$
Indeed, considering the integral $G(\alpha) := \int f_i(x)^\alpha f_j(x)^{1-\alpha}\,dx$ for $\alpha > 0$, $\alpha \neq 1$, we have
$$\lim_{\alpha\to 1}\frac{d^n}{d\alpha^n}G(\alpha) = E_{f_i}\!\left[\left(\ln\frac{f_i(X)}{f_j(X)}\right)^{\!n}\right].$$
Hence, from the definition of $D_\alpha(f_i\|f_j)$, we find
$$\lim_{\alpha\to 1}\frac{d}{d\alpha}D_\alpha(f_i\|f_j) = \lim_{\alpha\to 1}\frac{1}{(\alpha-1)^2}\left[(\alpha-1)\,G(\alpha)^{-1}\frac{dG(\alpha)}{d\alpha} - \ln G(\alpha)\right] = \frac{1}{2}\left(E_{f_i}\!\left[\left(\ln\frac{f_i(X)}{f_j(X)}\right)^{\!2}\right] - \left(E_{f_i}\!\left[\ln\frac{f_i(X)}{f_j(X)}\right]\right)^{\!2}\right),$$
where we used l'Hôpital's rule to evaluate the limit.
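This limiting relation is easy to check numerically from the closed forms of Table 2. For example, the following sketch (illustrative only, with arbitrary rate parameters) estimates $\frac{d}{d\alpha}D_\alpha$ at $\alpha = 1$ for two exponential densities by a central difference and compares it with one half of the corresponding LLR variance listed in Table 5 below.

```python
# Illustrative numerical check: for two exponential densities, the derivative of
# D_alpha at alpha = 1 (central difference on the closed form of Table 2) should
# equal half the LLR variance (lambda_j - lambda_i)^2 / (2*lambda_i^2).
import numpy as np

def renyi_exponential(lam_i, lam_j, alpha):
    lam_alpha = alpha * lam_i + (1 - alpha) * lam_j
    return np.log(lam_i / lam_j) + np.log(lam_i / lam_alpha) / (alpha - 1)

lam_i, lam_j, h = 1.0, 2.5, 1e-4                      # arbitrary rates, small step
derivative_at_one = (renyi_exponential(lam_i, lam_j, 1 + h)
                     - renyi_exponential(lam_i, lam_j, 1 - h)) / (2 * h)
half_llr_variance = (lam_j - lam_i) ** 2 / (2 * lam_i ** 2)

print(derivative_at_one, half_llr_variance)           # should agree up to O(h^2)
```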
This connection between the Rényi divergence and the LLR variance becomes practically useful in light of the Rényi divergence expressions presented in Section 2.2. Table 5 presents these quantities for the continuous univariate distributions considered in this paper. Note that $\psi^{(1)}(x)$ denotes the polygamma function of order 1.

Table 5: Var(LLR) for Common Continuous Univariate Distributions (Name: $\mathrm{Var}_{f_i}\!\left[\ln\frac{f_i(X)}{f_j(X)}\right]$)

Beta: $(a_i-a_j)^2\psi^{(1)}(a_i) + (b_i-b_j)^2\psi^{(1)}(b_i) - (a_i-a_j+b_i-b_j)^2\psi^{(1)}(a_i+b_i)$

Chi: $\dfrac{2(\sigma_i^2-\sigma_j^2)\left[k_i(\sigma_i^2+\sigma_j^2) - 2k_j\sigma_j^2\right] + \sigma_j^4(k_i-k_j)^2\psi^{(1)}(k_i/2)}{4\sigma_j^4}$

$\chi^2$: $\dfrac{1}{4}(d_i-d_j)^2\psi^{(1)}(d_i/2)$

Cramér: $\dfrac{4\left[(\theta_i-\theta_j)^2 - \theta_i\theta_j\ln^2(\theta_i/\theta_j)\right]}{(\theta_i-\theta_j)^2}$

Exponential: $\dfrac{(\lambda_j-\lambda_i)^2}{\lambda_i^2}$

Gamma: $\dfrac{(\theta_i-\theta_j)\left[k_i(\theta_i+\theta_j) - 2k_j\theta_j\right] + \theta_j^2(k_i-k_j)^2\psi^{(1)}(k_i)}{\theta_j^2}$

Univariate Gaussian: $\dfrac{2\sigma_i^2(\mu_i-\mu_j)^2 + (\sigma_i^2-\sigma_j^2)^2}{2\sigma_j^4}$

Gumbel, fixed scale ($\beta_i=\beta_j=\beta$): $e^{-2\mu_i/\beta}\left(e^{\mu_i/\beta} - e^{\mu_j/\beta}\right)^2$

Half-Normal: $\dfrac{(\sigma_j^2-\sigma_i^2)^2}{2\sigma_j^4}$

Inverse Gaussian: $\dfrac{\lambda_i\lambda_j^2(\mu_i^2-\mu_j^2)^2 + 2\mu_i\mu_j^4(\lambda_i-\lambda_j)^2}{4\lambda_i^2\mu_i\mu_j^4}$

Laplace: $\dfrac{1}{\lambda_j^2}\left\{\lambda_i^2\left(2 - e^{-2|\theta_i-\theta_j|/\lambda_i}\right) - 2\lambda_i\lambda_j\, e^{-|\theta_i-\theta_j|/\lambda_i} - 2(\lambda_i+\lambda_j)\,|\theta_i-\theta_j|\, e^{-|\theta_i-\theta_j|/\lambda_i} + \lambda_j^2\right\}$

Lévy, equal supports ($\mu_i=\mu_j$): $\dfrac{(c_i-c_j)^2}{2c_i^2}$

Log-normal: $\dfrac{2\sigma_i^2(\mu_i-\mu_j)^2 + (\sigma_i^2-\sigma_j^2)^2}{2\sigma_j^4}$

Maxwell-Boltzmann: $\dfrac{3(\sigma_j^2-\sigma_i^2)^2}{2\sigma_j^4}$

Pareto: $\dfrac{(a_i-a_j)^2}{a_i^2}$

Rayleigh: $\dfrac{(\sigma_j^2-\sigma_i^2)^2}{\sigma_j^4}$

Weibull, fixed shape ($k_i=k_j=k$): $\dfrac{\left(\lambda_i^{k} - \lambda_j^{k}\right)^2}{\lambda_j^{2k}}$

Song [30] also proposed a nonparametric estimate of $\mathrm{Var}(\ln f(X))$ based on separately estimating $\Delta_1 := E[\ln f(X)]$ and $\Delta_2 := E[(\ln f(X))^2]$ by the plug-in estimates $\hat\Delta_{l,n} := \int f_n(x)\left[\ln f_n(x)\right]^l dx$, $l = 1, 2$, where $f_n$ is a kernel density estimate of $f$ from the independent and identically distributed (i.i.d.) samples $(X_1, X_2, \dots, X_n)$ drawn according to $f$. Theorem 3.1 in [30] establishes precise conditions under which $\hat\Delta_{2,n} - \hat\Delta_{1,n}^2$ converges with probability one to $\mathrm{Var}(\ln f(X))$ as $n \to \infty$ (i.e., the estimate is strongly consistent). Assuming one has access to i.i.d. samples $(X_1,\dots,X_n)$ and $(Y_1,\dots,Y_n)$ drawn according to $f_i$ and $f_j$, respectively, we can propose a similar estimate of $\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right)$. Let
$$\tilde\Delta_{l,n} := \int f_{i,n}(x)\left[\ln\frac{f_{i,n}(x)}{f_{j,n}(x)}\right]^l dx, \qquad l = 1, 2,$$
where $f_{i,n}$ and $f_{j,n}$ are kernel density estimates of $f_i$ and $f_j$ based on $(X_1,\dots,X_n)$ and $(Y_1,\dots,Y_n)$, respectively. It is an interesting problem to find exact conditions on $f_i$ and $f_j$ such that appropriate choices of $f_{i,n}$ and $f_{j,n}$ make $\tilde\Delta_{2,n} - \tilde\Delta_{1,n}^2$ a strongly consistent estimate of $\mathrm{Var}_{f_i}\!\left(\ln\frac{f_i(X)}{f_j(X)}\right)$. Investigating this problem is outside the scope of this paper, but it is likely that the consistency proof in [30] can be extended to this case under appropriate restrictions on $f_i$ and $f_j$. Alternatively, the partitioning, $k(n)$-nearest-neighbor, and minimum-spanning-tree based methods reviewed in [33] and the references therein should also provide consistent estimates.
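A minimal sketch of such a plug-in estimate is given below, assuming Gaussian kernel density estimates with scipy's default bandwidth and numerical integration over a truncated range. It is only an illustration of the construction of $\tilde\Delta_{2,n} - \tilde\Delta_{1,n}^2$, not an implementation whose consistency is claimed, and the sampling densities are arbitrary choices.

```python
# Illustrative sketch of the plug-in estimate tilde-Delta_{2,n} - tilde-Delta_{1,n}^2,
# with Gaussian kernel density estimates and quadrature over a truncated range.
import numpy as np
from scipy import integrate
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n = 5000
x_samples = rng.normal(0.0, 1.0, size=n)    # i.i.d. samples from f_i (illustrative choice)
y_samples = rng.normal(1.0, 2.0, size=n)    # i.i.d. samples from f_j (illustrative choice)

f_i_n = gaussian_kde(x_samples)             # kernel density estimates f_{i,n}, f_{j,n}
f_j_n = gaussian_kde(y_samples)

def delta(l, lo=-8.0, hi=8.0):
    def integrand(x):
        fx, gx = f_i_n(x)[0], f_j_n(x)[0]
        return fx * np.log(fx / gx) ** l if fx > 0 and gx > 0 else 0.0
    val, _ = integrate.quad(integrand, lo, hi, limit=200)
    return val

var_llr_estimate = delta(2) - delta(1) ** 2

# Compare with the exact Gaussian entry of Table 5 for these sampling densities.
mu_i, s_i, mu_j, s_j = 0.0, 1.0, 1.0, 2.0
var_llr_exact = (2 * s_i**2 * (mu_i - mu_j)**2 + (s_i**2 - s_j**2)**2) / (2 * s_j**4)
print(var_llr_estimate, var_llr_exact)
```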
4. Conclusion

In this work we presented closed-form expressions for the Rényi and Kullback-Leibler divergences of nineteen commonly used univariate continuous distributions, as well as those for the Dirichlet and multivariate Gaussian distributions. We also presented a table summarizing four of the most important information measure rates for zero-mean stationary Gaussian processes, together with the relevant references. Lastly, we established a connection between the log-likelihood ratio of two distributions and their Rényi divergence, extending the work of Song [30], who considers the log-likelihood function and its relation to the Rényi entropy. Following this connection, we provided the corresponding variance expressions for the nineteen univariate distributions considered here.

The present compilation is of course not complete. Particularly valuable future additions would be the Rényi divergence expressions for the Student-t and Kumaraswamy distributions, as well as for the exponentiated and beta-normal distributions [10].

5. Acknowledgements

The authors are grateful to NSERC of Canada, whose funds partially supported this work.

References

[1] J. Aczél, Z. Daróczy, On Measures of Information and their Characterization, Academic Press, 1975.
[2] F. Alajaji, P. N. Chen, Z. Rached, Csiszár's cutoff rates for the general hypothesis testing problem, IEEE Transactions on Information Theory 50 (4) (2004) 663–678.
[3] C. Arndt, Information Measures: Information and its Description in Science and Engineering, Signals and Communication Technology, Springer, 2004.
[4] M. Asadi, N. Ebrahimi, G. G. Hamedani, E. S. Soofi, Information measures for Pareto distributions and order statistics, in: Advances in Distribution Theory, Order Statistics, and Inference, Springer, 2006, pp. 207–223.
[5] J. Burbea, The convexity with respect to Gaussian distributions of divergences of order α, Utilitas Mathematica 26 (1984) 171–192.
[6] I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud. Akad. Mat. Kutató Int. Közl. 8 (1963) 85–108.
[7] I. Csiszár, Generalized cutoff rates and Rényi's information measures, IEEE Transactions on Information Theory 41 (1) (1995) 26–34.
[8] G. A. Darbellay, I. Vajda, Entropy expressions for multivariate continuous distributions, IEEE Transactions on Information Theory (2000) 709–712.
[9] P. A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall International, 1982.
[10] N. Eugene, C. Lee, F. Famoye, Beta-normal distribution and its applications, Communications in Statistics – Theory and Methods 31 (4) (2002) 497–512.
[11] M. Gil, On Rényi divergence measures for continuous alphabet sources, Master's thesis, Queen's University (2011).
[12] L. Golshani, E. Pasha, Rényi entropy rate for Gaussian processes, Information Sciences 180 (8) (2010) 1486–1491.
[13] L. Golshani, E. Pasha, G. Yari, Some properties of Rényi entropy and Rényi entropy rate, Information Sciences 179 (14) (2009) 2426–2433.
[14] P. Harremoës, Interpretations of Rényi entropies and divergences, Physica A: Statistical Mechanics and its Applications 365 (1) (2006) 57–62.
[15] A. O. Hero, B. Ma, O. Michel, J. Gorman, Alpha-divergence for classification, indexing and retrieval, Tech. rep., University of Michigan (2001).
[16] T. Hobza, D. Morales, L. Pardo, Rényi statistics for testing equality of autocorrelation coefficients, Statistical Methodology 6 (4) (2009) 424–436.
[17] S. Ihara, Information Theory for Continuous Systems, World Scientific, 1993.
[18] A. N. Kolmogorov, A new metric invariant of transient dynamical systems and automorphisms in Lebesgue spaces, Dokl. Akad. Nauk SSSR 119 (1958).
[19] S. Kullback, R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1) (1951) 79–86.
[20] F. Liese, I. Vajda, Convex Statistical Distances, Teubner, 1987.
[21] F. Liese, I. Vajda, On divergences and informations in statistics and information theory, IEEE Transactions on Information Theory 52 (10) (2006) 4394–4412.
[22] S. Nadarajah, K. Zografos, Formulas for Rényi information and related measures for univariate distributions, Information Sciences 155 (1–2) (2003) 119–138.
[23] M. C. Pardo, J. A. Pardo, Use of Rényi's divergence to test for the equality of the coefficients of variation, Journal of Computational and Applied Mathematics 116 (1) (2000) 93–104.
Pardo, Use of Rényi’s divergence to test for the equality of the coefficients of variation, Journal of Computational and Applied Mathematics 116 (1) (2000) 93 – 104. [24] W. D. Penny, Kullback-Leibler divergences of normal, gamma, dirichlet and wishart densities., Tech. rep., Wellcome Department of Cognitive Neurology (2001). [25] W. D. Penny, S. Roberts, Bayesian methods for autoregressive models, in: Neural Networks for Signal Processing X, 2000. Proceedings of the 2000 IEEE Signal Processing Society Workshop, vol. 1, 2000. [26] T. W. Rauber, T. Braun, K. Berns, Probabilistic distance measures of the Dirichlet and Beta distributions, Pattern Recognition 41 (2) (2008) 637 – 645. 14 [27] A. Rényi, On measures of entropy and information, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1961. [28] M. J. Rivas, M. T. Santos, D. Morales, Rényi test statistics for partially observed diffusion processes, Journal of Statistical Planning and Inference 127 (1-2) (2005) 91 – 102. [29] C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379 - 423. [30] K. Song, Rényi information, loglikelihood and an intrinsic distribution measure, Journal of Statistical Planning and Inference 93 (1-2) (2001) 51 – 69. [31] I. Vajda, Theory of Statistical Inference and Information, Kluwer, Dordrecht, 1989. [32] T. van Erven, P. Harremoës, Rényi divergence and majorization, in: Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010. [33] Q. Wang, S. Kulkarni, S. Verdú, Universal estimation of information measures for analog sources, Foundations and Trends in Communications and Information Theory, 5 (3) (2009), 265 – 353. [34] K. Zografos, S. Nadarajah, Expressions for Rényi and Shannon entropies for multivariate distributions, Statistics & Probability Letters 71 (1) (2005) 71 – 84. 15