Rényi Entropies of Marginal Distributions

Peter Harremoës 1
C.W.I., Centrum voor Wiskunde en Informatica, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
email: [email protected]

Christophe Vignat
Equipe Signal et Communications, Institut Gaspard Monge, 5 Boulevard Descartes, Cité Descartes, 77454 Marne la Vallée Cedex 2, France
email: [email protected]

Abstract

In this paper we are interested in n-dimensional uniform distributions on a p-sphere. We show that their marginal distributions are maximizers of Rényi entropy under a moment constraint. Moreover, using an example, we show that a distribution on a triangle with (uniform) maximum entropy marginals may have arbitrarily small entropy. As a last result, we address the asymptotic behavior of these results and provide a link to the de Finetti theorem.

PACS: 02.50.-r; 05.90.+m; 65.40.Gr.

Keywords: De Finetti theorem, Marginal distribution, p-moment, p-sphere, Rényi entropy, uniform distribution.

1 Introduction

Define the p-sphere S_{n,p} in R^n by

    S_{n,p} = { X ∈ R^n | Σ_{i=1}^n |X_i|^p = nµ },

where µ is a positive constant. In the literature the k-dimensional marginals of the "uniform" distribution on S_{n,p} have been studied, see [?], [?] and [?]. The "uniform" distribution can be defined geometrically by the normalized surface area of the hyper-surface. This definition of uniformity is highly problematic: in practice, a condition Σ_{i=1}^n |X_i|^p = nµ can never be observed. Instead, one may empirically verify a condition such as

    (1/n) Σ_{i=1}^n |X_i|^p ∈ [µ − ε, µ + ε]

for some small value of ε. If we take the uniform distribution on the set

    { X ∈ R^n | (1/n) Σ_{i=1}^n |X_i|^p ∈ [µ − ε, µ + ε] }

and project it onto S_{n,p}, we get the disintegration measure. The disintegration measure equals the normalized surface area in the cases p = 1, 2 and ∞.

1 Supported by the Danish Natural Science Research Council.
Preprint submitted to Elsevier Science, 13 November 2006.
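The empirical condition above is exactly the kind of event one can simulate: rejection sampling from a box keeps the points whose empirical p-moment falls in the prescribed interval. A minimal sketch (the dimension n = 3, the box half-width c and all other parameter values are our own illustrative choices):

```python
import random

def sample_thick_shell(n, p, mu, eps, c, rng):
    """Rejection-sample a point X in [-c, c]^n whose empirical p-moment
    (1/n) * sum |X_i|^p lies in [mu - eps, mu + eps]."""
    while True:
        x = [rng.uniform(-c, c) for _ in range(n)]
        if mu - eps <= sum(abs(t) ** p for t in x) / n <= mu + eps:
            return x

rng = random.Random(0)
pts = [sample_thick_shell(n=3, p=2, mu=1.0, eps=0.1, c=2.0, rng=rng)
       for _ in range(200)]
# every accepted point satisfies the observable moment condition
assert all(0.9 <= sum(t * t for t in x) / 3 <= 1.1 for x in pts)
```

Letting ε → 0 concentrates the accepted points on S_{n,p}, and the limiting projected distribution is the disintegration measure discussed above.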
The case p = +∞ is somewhat degenerate, but the cases p = 1 and p = 2 shall be discussed in more detail.

For a distribution in R^k with density f the Rényi entropy of order q is defined by

    h_q = log( ∫_{R^k} f^q ) / (1 − q).    (1)

An interesting property is that the k-dimensional marginal distribution (k ≤ n) of a uniform distribution on a sphere maximizes the Rényi entropy under a moment constraint for some index q related to k and n. In this paper we shall study this phenomenon. We should note that maximizers of Rényi entropy are at the same time maximizers of Tsallis entropies. Maximizers of Rényi entropy were studied in [?] and [?]. Moreover, we provide a natural link that allows us to extend this result to uniform distributions on a simplex in R^n_+ where the sum X_1 + X_2 + ... + X_n is assumed to be constant. Finally, we mention links between these results and the de Finetti theorem.

2 Marginals of the uniform distribution on the sphere and the ball

For p > 0 we introduce the p-ball as the set

    B_{n,p}(0, r) = { (x_1, x_2, ..., x_n) ∈ R^n | Σ_{i=1}^n |x_i|^p ≤ r^p }.

First we note that for any k ∈ N the maximizer of the differential Rényi entropy h_q under the moment constraint

    E[ (1/k) Σ_{i=1}^k |X_i|^p ] = µ

is

    f(x) ∼ ( 1 − (|x_1|^p + |x_2|^p + ... + |x_k|^p)/(kµ) )_+^{1/(q−1)}

(see [?] and [?]). The case q = ∞ is of special interest since we get the uniform distribution on the p-ball. The uniform distribution on B_{n,p}(0, (nµ)^{1/p}) is projected onto a distribution with support B_{k,p}(0, (nµ)^{1/p}). The set of points in B_{n,p}(0, (nµ)^{1/p}) with projection (x_1, x_2, ..., x_k) ∈ B_{k,p}(0, (nµ)^{1/p}) is the set

    { (x_1, x_2, ..., x_n) ∈ B_{n,p}(0, (nµ)^{1/p}) | Σ_{i=k+1}^n |x_i|^p ≤ nµ − Σ_{i=1}^k |x_i|^p }.

This is a ball of radius ( nµ − Σ_{i=1}^k |x_i|^p )^{1/p}, so it has volume proportional to

    ( nµ − Σ_{i=1}^k |x_i|^p )^{(n−k)/p}    (2)

    = (nµ)^{(n−k)/p} ( 1 − Σ_{i=1}^k |x_i|^p/(nµ) )^{(n−k)/p}.    (3)

We see that this distribution maximizes the differential Rényi entropy of order q if

    1/(q − 1) = (n − k)/p.

This is equivalent to

    q = (n − k + p)/(n − k).    (4)

Now we consider a two-step projection. First we take the marginal distribution in k < n dimensions. Then we take the marginal in l < k dimensions. The first marginal distribution maximizes Rényi entropy of order q_1 = (n − k + p)/(n − k). The other marginal distribution maximizes the Rényi entropy of order q_2 = (n − l + p)/(n − l). Thus the maximizer of the Rényi entropy of order q_1 has a marginal that maximizes Rényi entropy of order q_2. The dimension n can be isolated in the expression for q_1 and plugged into the expression for q_2, leading to the following theorem.

Theorem 1 Assume l < k. The distribution in R^k that maximizes the Rényi entropy of order q_1 ≥ 1 under the moment condition E[(1/k) Σ_{i=1}^k |X_i|^p] = µ has a marginal in R^l that maximizes the differential Rényi entropy of order q_2 under the moment condition E[(1/l) Σ_{i=1}^l |X_i|^p] = µ, where

    q_2 = ( k − l + p q_1/(q_1 − 1) ) / ( k − l + p/(q_1 − 1) ).    (5)

We note that Equation (4) can be recovered by letting q_1 tend to infinity in (5). Strictly speaking the proof only works when q_1 = (n − k + p)/(n − k) for some integer n, but this implies that the formula holds for infinitely many values of p, and so it holds for all (positive) values by analytic continuation. The formula does not hold for negative values of q_1. The p-sphere corresponds to q_1 = −∞, so we have to treat this case separately.

We shall equip the sphere S_{n,p} with the disintegration measure. Thus we shall consider the projection of the thin shell lying between the spheres with p-moments nµ and n(µ + ε), each equipped with the uniform distribution. The projection of the (unnormalized) uniform distribution on the ball with p-moment at most nµ is

    ( nµ − Σ_{i=1}^k |x_i|^p )_+^{(n−k)/p},

and the derivative with respect to µ is

    (n(n − k)/p) ( nµ − Σ_{i=1}^k |x_i|^p )_+^{(n−k)/p − 1}.

Thus the projection of the sphere with the disintegration measure has density proportional to

    ( 1 − Σ_{i=1}^k |x_i|^p/(nµ) )_+^{(n−k)/p − 1},

and this distribution maximizes the differential Rényi entropy of order q given by the equation

    1/(q − 1) = (n − k)/p − 1,

which has the solution

    q = (n − k)/(n − k − p).    (6)

3 Uniform Distribution on the Euclidean Sphere and the Simplex

Consider the n-dimensional sphere

    S_n = { X ∈ R^n | Σ_{i=1}^n X_i^2 = nµ }.

The n-dimensional ball B_n is the convex hull of the n-dimensional sphere. Uniform distributions on spheres play an important role in statistics since they appear as the extreme points of the convex set of orthogonally invariant probabilities [?]. The sphere is highly symmetric, and the uniform distribution U_n on S_n is characterized as the unique probability distribution invariant under the group of rotations. The rotations form a compact group, and such a group has a unique Haar measure. The uniform distribution on the sphere can be obtained from the Haar measure via the map that sends a rotation to the image of a fixed vector under that rotation. None of the disintegration measures on p-spheres with p ≠ 2 can be obtained from a Haar measure.

Some distributions are of special interest. First of all, the case q = 1 leads to the Gaussian distribution. The case q = −∞ leads to the uniform distribution on the sphere, and q = ∞ leads to the uniform distribution on the ball. The uniform distribution on the disc in C is the so-called Girko law that appears as the asymptotic eigenvalue distribution of a large random matrix. Its projection on the real axis is Wigner's semicircular law, the asymptotic eigenvalue distribution of a large Hermitian matrix. From Equation (4) with n = 2 and k = 1 we see that the semicircular law maximizes Rényi entropy of order 3.

We are interested in the k-dimensional marginal distribution of a vector [X_1, ..., X_n] uniformly distributed on the sphere, i.e. the distribution of the vector [X_1, ..., X_k] where k ≤ n. From Equation (6) we see that the marginal distribution is a maximizer of the Rényi entropy of order

    q = (n − k)/(n − k − 2).    (7)

For p = 1 we shall restrict to the simplex

    T_n = { (z_1, ..., z_n) ∈ R^n_+ | Σ_{i=1}^n z_i = nλ }

instead of the 1-sphere.
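Equations (4), (5) and (7) are rational identities in n, k, l and p, so the consistency claimed in Theorem 1 can be checked mechanically in exact arithmetic. A sketch (the specific values of n, k, l and p are arbitrary choices):

```python
from fractions import Fraction as F

def q_marginal(n, k, p):
    # Equation (4): order maximized by the k-dimensional marginal of the
    # uniform distribution on the n-dimensional p-ball
    return F(n - k + p, n - k)

def q_compose(q1, k, l, p):
    # Equation (5): order maximized by the l-dimensional marginal of the
    # order-q1 maximizer in R^k
    return (k - l + F(p) * q1 / (q1 - 1)) / (k - l + F(p) / (q1 - 1))

n, k, l, p = 10, 6, 2, 3
q1 = q_marginal(n, k, p)
# the two-step projection agrees with the direct projection
assert q_compose(q1, k, l, p) == q_marginal(n, l, p)

# Equation (4) with n = 2, k = 1, p = 2: the semicircular law
# maximizes Rényi entropy of order 3
assert q_marginal(2, 1, 2) == 3
```

The last assertion reproduces the remark about Wigner's semicircular law as the one-dimensional marginal of the uniform distribution on the disc.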
We note that the disintegration measure equals the surface area. The simplex with the uniform distribution is invariant under permutations of the extreme points, but contrary to the sphere there are many other probability measures that are invariant under permutations of the extreme points. The surface area is "locally" invariant under translations of the simplex, but such "local" translations cannot be extended to an affine map of the simplex into itself.

Now we will transform Equation (7) into a result on the projection of the uniform distribution on the simplex. Consider the sphere S_{2n} and the vector

    X = [X_1, Y_1, X_2, Y_2, ..., X_n, Y_n].

Put Z_i = X_i^2 + Y_i^2 and Z = [Z_1, ..., Z_n]. Then for X ∈ S_{2n} the vector Z belongs to the simplex T_n, where λ = 2µ. If the components X_i and Y_i are independent Gaussian, then each component Z_i is exponentially distributed. The condition Z ∈ T_n is equivalent to the condition X ∈ S_{2n}. The conditional distribution of Z given that Z ∈ T_n is uniform on T_n, and the conditional distribution of X given that X ∈ S_{2n} is uniform on S_{2n}. Thus the mapping from X to Z transforms the uniform distribution on S_{2n} into the uniform distribution on T_n.

Consider the projection of S_{2n} into B_{2k} and the corresponding projection of T_n into the simplex T̃_k. According to our previous results, the uniform distribution on S_{2n} is transformed into the distribution on B_{2k} that maximizes Rényi entropy of order q = (2n − 2k)/(2n − 2k − 2) = (n − k)/(n − k − 1). Thus the uniform distribution on T_n is projected into a distribution Q_n on T̃_k that maximizes Rényi entropy of order q = (n − k)/(n − k − 1) under the condition E(X) = λ. For k = 1 the density f(x) of Q_n is given by the formula

    f(x) = ((n − 1)/(nλ)) (1 − x/(nλ))^{n−2}  for x ∈ [0, nλ],
    f(x) = 0                                  otherwise.

Fig. 1. Density of projections for λ = 1 and k = 1 and n = 2, 3, 4, 5, 10 and the exponential distribution (n = ∞).

One should note that all these distributions are Dirichlet distributions and that it is well known that Dirichlet distributions project into Dirichlet distributions. One should also note that the uniform distribution on the set

    { (x_1, x_2, ..., x_n) ∈ R^n_+ | Σ_{i=1}^n x_i^2 = 1 }

is transformed into a Dirichlet distribution with parameters 1/2 via (x_1, x_2, ..., x_n) → (x_1^2, x_2^2, ..., x_n^2), and this distribution is the Jeffreys prior on the set of multinomial distributions. In the special case n = 2 the Dirichlet distributions coincide with the well-known beta distributions.

4 A Counterexample on the Triangle

We have seen that a distribution on the sphere or simplex with high entropy projects into a distribution with high Rényi entropy. The converse is not true. There exist distributions with low entropy such that their marginal distributions have high entropy. We shall provide such examples on the 2-dimensional simplex, i.e. a triangle.

First we decompose the triangle into six small triangles as illustrated in Figure 2. We consider the uniform distribution P_1 on three of these six smaller triangles and note that this distribution has Rényi entropy h_q = −1 bit for any order q, and that it has the same 1-dimensional marginal distributions as the uniform distribution on the whole triangle. Thus the joint distribution may have small entropy although the marginals have high entropy. The reason is that the triangle is only projected in three directions. If the distributions of the projections in all directions were known, the original distribution could be reconstructed from the projected distributions according to a theorem by Radon.

The above construction can be iterated. We divide the triangle into smaller
") */2()-,/;'3 "(*, *4' 3")*/"78*",( ,( "!" *42* ;25";"?') @A(+" '(*/,.+ ,- ,/3'/ !!!!" !!" $ " !!!!"!! " !!"!" % B48) *4' 8("-,/; 3")*/"78*",( ,( #! ") ./,0'1*'3 "(*, 2 3")*/"78*",( &! ,( #!" *42* ;25";"?') @A(+" !!" '(*/,.+ ,- ,/3'/ $ " !!"!" 8(3'/ *4' 1,(3"*",( ' #( " )$ % !,/ * " % *4' 3'()"*+ + #,$ ,- &! ") #"='( 7+ *4' -,/;862 ! "!! ! !!" #"! "# $ -,/ , ! &'- .)( + #,$ " ! # ' '6)'$ CD$ < 9 EFGBH@HI<JKLH EG BMH B @C<GNLH O' 42=' )''( *42* 2 3")*/"78*",( ,( *4' ).4'/' ,/ */"2(#6' P"*4 4"#4 '(*/,.+ ./,0'1*) "(*, 2 3")*/"78*",( P"*4 4"#4 @A(+" '(*/,.+$ B4' 1,(='/)' ") (,* */8'$ B4'/' '5")* 3")*/"78*",() P"*4 6,P '(*/,.+ )814 *42* *4'"/ ;2/#"(26 3")*/"78*",() 42=' 4"#4 '(*/,.+$ O' )4266 ./,="3' )814 '52;.6') ,( *4' Q: 3";'()",(26 )";.6'5$ !"/)* P' 3'1,;.,)' *4' )";.6'5 "(*, )"5 );266 */"2(#6') 2) "668)*/2*'3 "( !"#8/' Q$ O' 1,()"3'/ *4' 8("-,/; 3")*/"78*",( /" ,( *4/'' ,- *4')' )"5 );266'/ */"2(#6') 2(3 (,*' *42* *4") 3")*/"78*",( 42) @A(+" '(*/,.+ 0$ " "% -,/ 2(+ ,/3'/ $ 2(3 "* 42) *4' )2;' %:3";'()",(26 ;2/#"(26 3")*/"78*",() 2) *4' 8("-,/; 3")*/"78*",( ,( *4' P4,6' )";.6'5$ B48) *4' 0,"(* 3")*/"78*",( ;2+ 42=' );266 '(*/,.+ 26*4,8#4 *4' ;2/#"(26) 42=' 4"#4 '(*/,.+$ B4' 1 /'2),( ") *42* *4' )";.6'5 ") ,(6+ ./,0'1*'3 "( *4/'' 3"/'1*",()$ C- *4' 3")*/"78*",( ,- *4' ./,0'1*",() "( 266 3"/'1*",() P'/' R(,P( *4' ,/"#"(26 3")*/"78*",( 1,863 7' /'1,()*/81*'3 -/,; *4' ./,0'1*'3 3")*/"78*",() 211,/3"(# *, 2 *4',/'; 7+ @23,($ B4' 27,=' 1,()*/81*",( 12( 7' "*'/2*'3$ O' 3"="3' *4' )";.6'5 "(*, );266'/ )";.6"1') P"*4 )"3') .2/266'6 *, ,/"#"(26 )";.6'5$ C- 2 );266 )";.6'5 ") "()"3' *4' )8..,/* ,- /" "* ") 3"="3'3 "( )"5 */"2(#6') 2(3 *4' ./,727"6"*+ ;2)) ,( *4' 2 2) )";.6'5 ") 3")*/"78*'3 8("-,/;6+ ,( *4/'' ,- *4' */"2(#6') "668)*/2*'3 2* *4' 7,**,; "( !"#8/' Q$ B4' 1,()*/81*",( "(=,6=') 2( "(S("*' (8;7'/ ,- )";.6"1') 78* 7+ )*2(32/3 2/#8;'(*) ,(' 12( .'/-,/; *4") 1,()*/81*",( *, ,7*2"( /! % B4' 3")*/"78*",( /! 
!"#$ Q$ !,/ 2(+ =268' ,- & *4' 8("-,/; 3")*/"78*",( ,( 8..'/ )'* 42) '(*/,.+ '! ! "" 7"* 2(3 *4' 8("-,/; 3")*/"78*",( ,( *4' 6,P'/ )'* 42) '(*/,.+ '! ! "#$ Fig. 2. For any value of q the uniform distribution on left set has entropy hq = −1 bit and the uniform distribution on the right set has entropy hq = −2. 2#2"( 42) %:3";'()",(26 ;2/#"(26) P"*4 ;25";26 '(*/,.+ 78* *4' 0,"(* 3")*/"78*",( 42) '(*/,.+ 0 " ")$ C( *4") P2+ 2 )'T8'(1' /" - /! - /# - %%% ") 1,()*/81*'3> 2(3 *4' )'T8'(1' 1,(='/#') P'2R6+ *, 2 3")*/"78*",( /" P"*4 '(*/,.+ 0 " "# 2(3 ;2/#"(26) P"*4 ;25";26 '(*/,.+$ simplices with sides parallel to the original triangle. If a small triangle is inside the support of P it is divided in six triangles !and the probability mass on the D$ B triangle is distributed uniformly on three of the triangles as illustrated at the O' )4266 (,P 1,()"3'/ *4' 2)+;.*,*"1 7'42=",/ ,- *4' ./,0'1*",( / ,- *4' 8("-,/; 3")*/"78*",( ,( ).4'/' an ! ,(*, bottom in Figure 2. The construction involves infinite number of simplices ! -,/ . *'(3"(# *, "(S("*+ 2(3 P"*4 S5'3 =2/"2(1' 1,()*/2"(*$ B4") ./,76'; P2) )*83"'3 "( XYZ 2(3 X%Z$ B4' 2)+;.*,*"1 but by standard arguments one can perform this construction to obtain P2 . 7'42=",/ 42) 26), 7''( )*83"'3 "( *4' ;,/' #'('/26 )'*8. ,- *4' 1,(3"*",(26 6";"* *4',/';> )'' X[Z 2(3 X\Z$ L'* $ The distribution P again has with maximal entropy 3'(,*' *4' ,/3'/ 1-dimensional ,- *4' @A(+" '(*/,.+ *42* / marginals ;25";"?')> "$'$ $ " #. " *$ 1 #. " * " )$ % E8/ S/)* ,7)'/=2*",( ") but the joint distribution = −2. *42* $ has $ % -,/entropy . $ #% L'* /h 3'(,*' *4' 3")*/"78*",( MH <UVJKBEBC9 WHM<DCE@ <G& BMH &H CGHBBC BMHE@HJ ! " ! ! ! ! ! " *42* ;25";"?') *4' 3"--'/'(*"26 '(*/,.+ 0" 8(3'/ *4' =2/"2(1' In this way a sequence P1 , P2 , P3 , ... is constructed, and the sequence con!"#$ verges weakly to a distribution P∞ with entropy h = −∞ and marginals with maximal entropy. 
5 The asymptotic behavior and the de Finetti theorem

We shall now consider the asymptotic behavior of the projection P_n of the uniform distribution on the 2-sphere S_n onto R^k for n tending to infinity and with a fixed variance constraint. This problem was studied in [?] and [?]. The asymptotic behavior has also been studied in the more general setup of the conditional limit theorem, see [1] and [2]. Let q_n denote the order of the Rényi entropy that P_n maximizes, i.e. q_n = (n − k)/(n − k − 2). Our first observation is that q_n → 1 for n → ∞. Let P_∞ denote the distribution that maximizes the differential entropy h_1 under the variance constraint, i.e. P_∞ is the Gaussian distribution satisfying the variance constraint. Note that the Gaussian distribution has independent components. Note also that the Rényi entropy h_q is decreasing in the order q. Hence, since q_n ≥ 1,

    h_1(P_∞) ≥ h_1(P_n) ≥ h_{q_n}(P_n) ≥ h_{q_n}(P_∞).

Thus

    D(P_n ‖ P_∞) = h_1(P_∞) − h_1(P_n) ≤ h_1(P_∞) − h_{q_n}(P_∞).    (8)

Therefore P_n is asymptotically Gaussian, as first observed by Poincaré.

Sometimes it is useful to split the inequality (8) into two bounds describing respectively the deviation from having Gaussian components and the deviation from having independent components. Let P_n^(1) denote the one-dimensional projection. Then

    h_1(P_n) = k · h_1(P_n^(1)) − I,

where I denotes the mutual information between the components, i.e.

    I = D(P_n ‖ P_n^(1) ⊗ P_n^(1) ⊗ ... ⊗ P_n^(1)).

Thus

    k · h_1(P_∞^(1)) ≥ k · h_1(P_n^(1)) − I ≥ h_{q_n}(P_∞).

Thus

    D(P_n^(1) ‖ P_∞^(1)) = h_1(P_∞^(1)) − h_1(P_n^(1)) ≤ h_1(P_∞^(1)) − h_{q_n}(P_∞)/k → 0  for n → ∞

and

    I/k ≤ h_1(P_∞^(1)) − h_{q_n}(P_∞)/k → 0  for n → ∞.

The first inequality implies that P_n asymptotically has Gaussian marginals. The second inequality implies that P_n asymptotically has independent components.

We shall now formulate the de Finetti theorem related to 1-dimensional marginals of the uniform distributions on spheres, see also [?]. Let a random process X_1, X_2, ... be given and assume that the (marginal) distribution of X_1, X_2, ..., X_n is invariant under rotations in R^n for all n. Then the random process is a mixture of i.i.d. Gaussian processes.

In order to prove the de Finetti theorem one needs a bound on the total variation between P_n and P_∞. Such bounds are provided in [?], [?] and recently in [?]. We shall use Pinsker's inequality [3] together with Inequality (8) to give a simple derivation of such a bound. For simplicity we restrict to the case k = 1 and µ = 1, so that q = q_n = (n − 1)/(n − 3). Then

    D(P_n ‖ P_∞) ≤ h_1(P_∞) − h_q(P_∞)
                = (1/2) log(2πe) − log( ∫_{−∞}^{∞} ( e^{−x^2/2}/(2π)^{1/2} )^q dx ) / (1 − q)
                = (1/2) log(2πe) − (1/2) log(2π) − log(q)/(2(q − 1))
                = (1/2) (q − 1 − log q)/(q − 1)
                ≤ (q − 1)/4 = 1/(2(n − 3)),

where we used q − 1 − log q ≤ (q − 1)^2/2 for q ≥ 1. According to Pinsker's inequality this implies that

    ‖P_n − P_∞‖ ≤ (n − 3)^{−1/2}.

Although this bound is weaker than the bounds found in [?] and [?], it is strong enough for the proof of the de Finetti theorem. Similar results can be obtained if the quadratic constraints are replaced by linear constraints and the Gaussian distributions are replaced by exponential distributions. The above method then gives the bound ‖Q_n − Q_∞‖ ≤ (2n − 4)^{−1/2}.

6 Discussion

In this paper we have proved that marginals of Rényi maximizers are Rényi maximizers under moment constraints. We have not linked our results to any particular interpretation of Rényi entropies. We should note that the results presented in this paper involve Rényi entropies of orders q that are either greater than 1 or negative. Rényi entropies of order between 0 and 1 have a natural interpretation related to coding [?]. Rényi entropies are also related to Rényi's siege problem as described in [?], but we have not been able to relate our results to the siege problem. Rényi entropies, or more precisely Rényi divergences, are related to cutoff rates, but our results seem to have little to do with cutoff rates. Many applications of Rényi entropies focus on maximizing the Rényi entropy.
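The closed-form steps in the bound D(P_n ‖ P_∞) ≤ 1/(2(n − 3)) derived in Section 5 (the Rényi entropy of the standard Gaussian and the final inequality) can be verified numerically; a sketch, assuming k = 1, µ = 1 and an illustrative value of n:

```python
import math

def h_q_gauss_numeric(q, lo=-12.0, hi=12.0, m=100_000):
    # midpoint-rule approximation of the integral of phi(x)^q
    # for the standard Gaussian density phi
    dx = (hi - lo) / m
    s = sum(math.exp(-q * (lo + (i + 0.5) * dx) ** 2 / 2.0) for i in range(m))
    integral = (2.0 * math.pi) ** (-q / 2.0) * s * dx
    return math.log(integral) / (1.0 - q)

def h_q_gauss_closed(q):
    # log of the integral of phi^q is ((1-q)/2) log(2 pi) - (1/2) log q
    return 0.5 * math.log(2.0 * math.pi) + math.log(q) / (2.0 * (q - 1.0))

n = 10
q = (n - 1) / (n - 3)                 # order of the marginal for k = 1
assert abs(h_q_gauss_numeric(q) - h_q_gauss_closed(q)) < 1e-5

h1 = 0.5 * math.log(2.0 * math.pi * math.e)
gap = h1 - h_q_gauss_closed(q)        # upper bound on D(P_n || P_inf)
assert abs(gap - 0.5 * (q - 1 - math.log(q)) / (q - 1)) < 1e-12
assert gap <= (q - 1) / 4             # since q - 1 - log q <= (q-1)^2 / 2
assert abs((q - 1) / 4 - 1 / (2 * (n - 3))) < 1e-12
assert math.sqrt(2 * gap) <= (n - 3) ** -0.5
```

The last assertion is the Pinsker step giving ‖P_n − P_∞‖ ≤ (n − 3)^{−1/2}.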
Our results show that maximizing Rényi entropy under moment constraints is equivalent to taking a uniform distribution in a space of higher dimension. If the bigger space has many more dimensions than the original space, one should use Shannon's differential entropy. Thus Rényi entropies should appear when the systems under consideration are essentially finite.

7 Acknowledgement

The authors want to thank Oliver Johnson and Ioannis Kontoyiannis for stimulating discussions, and Emre Telatar and Ecole Polytechnique Fédérale de Lausanne for hosting a workshop where these ideas were developed. We also thank Andrew Barron and Tim van Erven for useful comments.

References

[1] I. Csiszár, "Sanov property, generalized I-projection and a conditional limit theorem," Ann. Probab., vol. 12, pp. 768–793, 1984.

[2] O. Johnson, "Entropy and a generalisation of Poincaré's observation," Mathematical Proceedings of the Cambridge Philosophical Society, vol. 135, no. 2, pp. 375–384, 2003.

[3] T. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 1991.