1 Cover’s Open Problem: “The Capacity of the Relay Channel” Xiugang Wu, Leighton Pate Barnes and Ayfer Özgür Abstract Consider a memoryless relay channel, where the channel from the relay to the destination is an isolated bit pipe of capacity C0 . Let C(C0 ) denote the capacity of this channel as a function of C0 . What is the critical value of C0 such that C(C0 ) first equals C(∞)? This is a long-standing open problem posed by Cover and named “The Capacity of the Relay Channel,” in Open Problems in Communication and Computation, Springer-Verlag, 1987. In this paper, we answer this question in the Gaussian case and show that C(C0 ) can not equal to C(∞) unless C0 = ∞, regardless of the SNR of the Gaussian channels, while the cut-set bound would suggest that C(∞) can be achieved at finite C0 . Our approach is geometric and relies on a strengthening of the isoperimetric inequality on the sphere by using the Riesz rearrangement inequality. I. P ROBLEM S ETUP AND M AIN R ESULT In 1987, Thomas M. Cover asked the following question which he called “The Capacity of the Relay Channel” [2]. A. The Capacity of the Relay Channel1 Consider the following seemingly simple discrete memoryless relay channel: Here Z and Y are condi- C0 tionally independent and conditionally identically distributed given X, that is, p(z, y|x) = p(z|x)p(y|x). Also, the channel from Z to Y does not interfere with Y . A (2nR , n) code for this channel is a map X n : [1 : 2nR ] → X n , a relay function fn : Z n → [1 : 2nC0 ] and a decoding function gn : Y n × [1 : 2nC0 ] → [1 : 2nR ]. The probability of error is given by Pe(n) = Pr(gn (Y n , fn (Z n )) 6= M ), where the message M is uniformly distributed over [1 : 2nR ] and n n −nR p(m, y , z ) = 2 n Y i=1 p(yi |xi (m)) n Y i=1 p(zi |xi (m)). Let C(C0 ) be the supremum of achievable rates R for a given C0 , that is, the supremum of the rates R (n) for which Pe can be made to tend to zero. We note the following facts: The work was supported in part by NSF award CCF-1514538 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370. This paper was presented in part at the 2016 Allerton Conference on Communication, Control, and Computing [1]. X. Wu, L. P. Barnes and A. Özgür are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA (e-mail: [email protected]; [email protected]; [email protected]). 1 This subsection is taken verbatim from [2] with a few notation changes. 2 1. C(0) = supp(x) I(X; Y ). 2. C(∞) = supp(x) I(X; Y, Z). 3. C(C0 ) is a nondecreasing function of C0 . What is the critical value of C0 such that C(C0 ) first equals C(∞)? B. Main Result In this paper, we answer this long-standing open question in the Gaussian case. In particular, consider the symmetric Gaussian relay channel as depicted in Fig. 1, where Z = X + W1 Y = X + W2 with the transmitted signal being constrained to average power P , i.e.2 , nR E[kX n k2 ] = 2 1 X 2nR m=1 kxn (m)k2 ≤ nP, (1) and W1 , W2 ∼ N (0, N ) representing Gaussian noises that are independent of each other and X. Z W1 ⇠ N (0, N ) W2 ⇠ N (0, N ) C0 X Y Fig. 1. Symmetric Gaussian relay channel. For this channel it is easy to observe that 1 2P C(∞) = log 1 + . 2 N Let C0∗ := inf{C0 : C(C0 ) = C(∞)}. (2) The cut-set bound [8] yields the following lower bound on C0∗ : 1 2P 1 P ∗ C0 ≥ log 1 + − log 1 + , 2 N 2 N which may lead one to suspect that C(∞) could be achieved at finite C0 . 
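As a quick numerical reference (not part of the original analysis), the three quantities just introduced can be evaluated directly: C(∞) = ½ log(1 + 2P/N), the standard AWGN capacity ½ log(1 + P/N) of the direct link (which equals C(0) in the Gaussian case), and the resulting cut-set lower bound on C0*. A minimal Python sketch follows; the SNR values are chosen to match those used in Fig. 2 below, and logarithms are base 2 (bits per channel use).

```python
import numpy as np

def c_infinity(snr):
    # C(inf) = 1/2 * log2(1 + 2P/N): relay observation fully available at the destination
    return 0.5 * np.log2(1 + 2 * snr)

def c_zero(snr):
    # C(0) = 1/2 * log2(1 + P/N): standard AWGN capacity of the direct link alone
    return 0.5 * np.log2(1 + snr)

def cutset_lower_bound_on_c0_star(snr):
    # cut-set bound:  C0* >= 1/2 log(1 + 2P/N) - 1/2 log(1 + P/N)
    return c_infinity(snr) - c_zero(snr)

for snr_db in (-15, 0, 15):          # SNR values used in Fig. 2
    snr = 10.0 ** (snr_db / 10.0)    # SNR = P/N
    print(f"SNR = {snr_db:3d} dB:  C(0) = {c_zero(snr):.4f},  "
          f"C(inf) = {c_infinity(snr):.4f},  "
          f"cut-set bound on C0* >= {cutset_lower_bound_on_c0_star(snr):.4f}  [bits/channel use]")
```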
The main result of our paper is to show that C0* = ∞ regardless of the parameters of the problem.

Theorem 1.1: For the symmetric Gaussian relay channel depicted in Fig. 1, C0* = ∞.

[Footnote 2: This constraint is less stringent than requiring ||x^n(m)||^2 ≤ nP for all m ∈ [1 : 2^{nR}], which is a more standard way of expressing the average power constraint for the AWGN channel. Note that the capacity can only be larger under (1), and therefore the conclusions of Theorems 1.1 and 1.2 are also valid under the individual power constraint on the codewords, i.e., ||x^n(m)||^2 ≤ nP for all m ∈ [1 : 2^{nR}].]

This theorem follows immediately from the following theorem, which establishes an upper bound on the capacity of this channel for any C0.

Theorem 1.2: For the symmetric Gaussian relay channel depicted in Fig. 1, the capacity C(C0) satisfies
$$C(C_0) \le \frac{1}{2}\log\left(1+\frac{P}{N}\right) + C_0 + \log\sin\theta, \qquad (3)$$
$$C(C_0) \le \frac{1}{2}\log\left(1+\frac{P}{N}\right) + \min_{\omega\in\left(\frac{\pi}{2}-\theta,\,\frac{\pi}{2}\right]} h_\theta(\omega), \qquad (4)$$
for some θ ∈ [arcsin 2^{-C0}, π/2], where
$$h_\theta(\omega) = \frac{1}{2}\log\frac{4\sin^2\frac{\omega}{2}\left(P+N-N\sin^2\frac{\omega}{2}\right)\sin^2\theta}{(P+N)\left(\sin^2\theta-\cos^2\omega\right)}.$$

In Fig. 2 we plot this upper bound (label: New bound) under three different values of the SNR = P/N of the Gaussian channels, together with the celebrated cut-set bound [8] and an upper bound on the capacity of this channel that we have previously derived in [6] (label: Old bound). For reference, we also provide the rate achieved by a compress-and-forward relay strategy (label: C-F), which employs a Gaussian input distribution at the source combined with Gaussian quantization and Wyner-Ziv binning at the relay.3 Note that from these figures one can visually observe that the new upper bound reaches the value C(∞) only as C0 → ∞, which leads to the conclusion in Theorem 1.1. This is formally proved in the next section.

[Fig. 2. Upper bounds and achievable rates for the Gaussian relay channel. Three panels, for SNR = -15 dB, 0 dB and 15 dB, plot the cut-set bound, the C-F rate, the old bound and the new bound, in bits/channel use, against C0 (bit/channel use).]

C. Approach

Our approach builds on the method we developed in our earlier work [3]–[7] for characterizing information tensions in a Markov chain by using high-dimensional geometry. The main idea is to study the geometry of the high-dimensional typical sets associated with the random variables in the Markov chain and then translate this high-dimensional geometry into information inequalities for the random variables. This idea applies equally well to single-letter and multi-letter random variables. The main geometric tool employed in our previous work [3]–[7] was the so-called blowing-up lemma. In the current paper, our main geometric ingredient is a strengthening of the isoperimetric inequality on a high-dimensional sphere, which we develop by building on the Riesz rearrangement inequality [15]. The classical isoperimetric inequality on the sphere states that among all sets on the sphere with a given volume, the spherical cap has the smallest boundary or, more generally, the smallest volume of neighborhood [9].
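To make this last statement concrete in a low-dimensional special case, the following sketch compares, on the 2-sphere, a spherical cap with an equatorial band of the same normalized measure: the t-neighborhood of the cap is never larger than that of the band. This is only an illustrative instance consistent with the inequality (the measure value 0.2 and the angles t are arbitrary choices), not a proof, and the closed-form measures used are specific to S².

```python
import numpy as np

def cap_measure(theta):
    # normalized Haar measure of Cap(z0, theta) on the 2-sphere S^2
    return (1 - np.cos(np.minimum(theta, np.pi))) / 2

def band_measure(w):
    # normalized measure of the band {z : |angle(z, z0) - pi/2| <= w} on S^2
    return np.sin(np.minimum(w, np.pi / 2))

a = 0.2                                # common normalized measure of the two sets (illustrative)
theta = np.arccos(1 - 2 * a)           # cap of measure a
w = np.arcsin(a)                       # equatorial band of measure a

for t in np.deg2rad([5.0, 10.0, 20.0, 40.0]):
    nbhd_cap = cap_measure(theta + t)  # the t-neighborhood of a cap is a cap of angle theta + t
    nbhd_band = band_measure(w + t)    # the t-neighborhood of the band is a band of half-width w + t
    print(f"t = {np.rad2deg(t):4.1f} deg:  cap neighborhood {nbhd_cap:.3f}  "
          f"<=  band neighborhood {nbhd_band:.3f}")
```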
In this paper, we show 0.7 0.65 0.6 0.55 0.5 0 0.1 0.2 0.3 0.4 C0 (bit/channel use) 0.5 0.6 3 This is not the best state-of-the-art achievable rate as this rate can be improved, for example, by using bursty transmissions and time-sharing at low SNR. 4 that the spherical cap is the extremal set not only in terms of minimizing the volume of its neighborhood, but roughly speaking also in terms of minimizing its total intersection volume with a ball drawn around a randomly chosen point on the sphere. It may be a priori surprising that the isoperimetric inequality appears as the main technical ingredient in the solution of a network information theory problem. However, a converse can be thought of as characterizing the extremal configuration of the typical sets of the random variables associated with an information theory problem, i.e., the configuration that is induced by the (extremal) capacity-achieving strategy. In this sense, it is quite natural that a tool, such as the isoperimetric inequality, which characterizes extremal sets in a certain geometric sense, turns out to be useful. Formulating the problem of determining the communication capacity of channels as a problem in high-dimensional geometry is one of Shannon’s most important insights that has led to the conception of information theory. His second paper [10], which appears couple of months after his classical paper “A Mathematical Theory of Communication” [11] but is cited in this first paper, develops a geometric representation of any point-to-point communication system. It then provides an elegant and intuitive geometric proof of the coding theorem for the AWGN channel, where the converse is based on a spherepacking argument in high-dimensional space and achievability is proved by a geometric random coding argument. However, to the best of our knowledge such techniques have not been used effectively for solving network problems. Our approach is similar to Shannon’s approach in [10] in that the key step in our proof is a packing argument on a spherical cap. However, it is also different from Shannon’s approach as we do not directly study the geometry of the codewords but rather use high-dimensional geometry to characterize information tensions in a Markov chain by a lifting step to a high-dimensional space. We believe this approach can be useful for solving other open problems in network information theory. II. P ROOFS OF T HEOREMS 1.1 AND 1.2 The proofs of both theorems follow from the below lemma, which is the main technical focus of this paper. The proof idea for this lemma is outlined in Section II-C and a formal proof is given in Section IV. We now state this lemma and show how it leads to the conclusion in Theorem 1.2, which is then used to prove Theorem 1.1. Lemma 2.1: Let In be an integer random variable and X n , Y n and Z n be n-length random vectors which form the Markov chain In − Z n − X n − Y n . Assume moreover that Z n and Y n are i.i.d. white Gaussian vectors given X n , i.e. Z n , Y n ∼ N (X n , N In×n ), where In×n denotes the identity matrix, and E[kX n k2 ] = nP , and In = fn (Z n ) is a deterministic mapping of Z n to a set of integers. Let H(In |X n ) be denoted by −n log sin θn , i.e., define4 1 n θn := arcsin 2− n H(In |X ) . Then the following inequality holds for any n, 1 H(In |Y n ) ≤ n · min log ω∈( π2 −θn , π2 ] 2 4sin2 ω2 (P + N − N sin2 ω2 ) (P + N )(sin2 θn − cos2 ω) (5) ! . 
(6) Note that the lemma provides an upper bound on H(In |Y n ) in terms of H(In |X n ) for a Markov chain that satisfies the conditions of the lemma. A. Proof of Theorem 1.2 Suppose a rate R is achievable. Then there exists a sequence of (2nR , n) codes such that the average (n) probability of error Pe → 0 as n → ∞. Let the relay’s transmission be denoted by In = fn (Z n ). By 4 Note that values of θn between 0 and π/2 span all possible values for H(In |X n ) between 0 and ∞. 5 standard information theoretic arguments, for this sequence of codes we have nR = H(M ) = I(M ; Y n , In ) + H(M |Y n , In ) ≤ I(X n ; Y n , In ) + nµ = I(X n ; Y n ) + I(X n ; In |Y n ) + nµ = I(X n ; Y n ) + H(In |Y n ) − H(In |X n ) + nµ ≤ nI(XQ ; YQ ) + H(In |Y n ) − H(In |X n ) + nµ P n + H(In |Y n ) − H(In |X n ) + nµ, ≤ log 1 + 2 N (7) (8) (9) (10) for any µ > 0 and n sufficiently large. In the above, (7) follows from applying the data processing inequality to the Markov chain M −X n −(Y n , In ) and Fano’s inequality, (8) uses the fact that In −X n −Y n form a Markov chain and thus H(In |X n , Y n ) = H(In |X n ), (9) follows by defining the time sharing random variable Q to be uniformly distributed over [1 : n], and (10) follows because nR E[XQ2 ] nR 2 n 2 1 X1X 2 1 1 X n kx (m)k2 ≤ P. xi (m) = = nR nR 2 m=1 n i=1 n 2 m=1 (11) Given (10), the standard way to proceed would be to upper bound the first entropy term by H(In |Y n ) ≤ H(In ) ≤ nC0 and lower bound the second entropy term H(In |X n ) simply by 0. This would lead to the socalled multiple-access bound in the well-known cut-set bound on the capacity of this channel [8]. However as we already point out in our previous works [3]–[7], this leads to a loose bound since it does not capture the inherent tension between how large the first entropy term can be and how small the second one can be. Instead, we can use Lemma 2.1 to more tightly upper bound the difference H(In |Y n ) − H(In |X n ) in (10). We start by verifying that the random variables In , X n , Z n and Y n associated with a code of blocklength n satisfy the conditions in the lemma. It is trivial to observe that they satisfy the required Markov chain condition and Z n and Y n are i.i.d. Gaussian given X n due to the channel structure. Note also that without loss of generality we can assume that the code satisfies the average power constraint in (1) with equality, i.e., 2nR 1 X n 2 E[kX k ] = nR kX n (m)k2 = nP. 2 m=1 (n) This is because given a (2nR , n) code with average probability of error Pe and E[kX n k2 ] = nP 0 < nP , p we can always scale up the codewords by a factor of nP/nP 0 and achieve an average probability of (n) error smaller than or equal to Pe .5 Therefore, applying Lemma 2.1 to the random variables associated with a code for the relay channel, we can bound the difference of the two entropy terms in (10) and conclude that for any achievable rate R, 1 P R ≤ log 1 + + min hθn (ω) + µ, (12) 2 N ω∈( π2 −θn , π2 ] where hθn (ω) is defined as 1 hθn (ω) = log 2 4sin2 ω2 (P + N − N sin2 ω2 )sin 2 θn (P + N )(sin2 θn − cos2 ω) ! , (13) 5 This can be done for example by adding additional independent Gaussian noise at the relay and the destination to emulate the transmission of the original codeword. 6 1 in which θn := arcsin 2− n H(In |X n) satisfies 1 θ0 := arcsin 2−C0 ≤ arcsin 2− n H(In |X n) = θn ≤ π . 
2 At the same time, for any achievable rate R, we also have P 1 + C0 + log sin θn + µ, R ≤ log 1 + 2 N (14) (15) which simply follows from (10) by upper bounding H(In |Y n ) with nC0 and plugging in the definition of θn . Therefore, if a rate R is achievable then for any µ > 0 and n sufficiently large it should simultaneously satisfy both (12) and (15) for some θn that satisfies the condition in (14). This concludes the proof of the theorem. B. Proof of Theorem 1.1 In order to prove that Theorem 1.1 follows from Theorem 1.2, consider the second bound (4) on C(C0 ) in Theorem 1.2. Since the θ in (4) satisfies θ ≥ arcsin 2−C0 := θ0 , we can upper bound the right-hand side of (4) to obtain P 1 C(C0 ) ≤ log 1 + + min hθ (ω), 2 N ω∈( π2 −θ0 , π2 ] for some θ ∈ [θ0 , π/2]. Also because given any fixed ω ∈ π2 − θ0 , π2 , hθ (ω) ≤ hθ0 (ω), ∀θ ∈ [θ0 , π/2], we further have P 1 (16) C(C0 ) ≤ log 1 + + min hθ0 (ω). 2 N ω∈( π2 −θ0 , π2 ] The significance of the function hθ0 (ω) is that for any θ0 > 0, π 1 2P + N hθ0 = log , 2 2 P +N and hθ0 (ω) is increasing at π2 , or more precisely, π P = > 0. h0θ0 2 (2P + N ) ln 2 (17) Therefore, as long as θ0 > 0, which is the case when C0 is finite, the minimization of hθ0 (ω) with respect to ω in (16) yields a value strictly smaller than hθ0 π2 in (17). This would allow us to conclude that the capacity C(C0 ) for any finite C0 is strictly smaller than 12 log 1 + 2P . N We now formalize the above argument. Using the definition of the derivative, one obtains π π π h − h − ∆ θ θ 0 0 2 2 h0θ0 = lim . ∆→0 2 ∆ Therefore, there exists a sufficiently small ∆1 > 0 such that 0 < ∆1 < θ0 and h π − h π − ∆ π h0 π θ0 2 1 θ0 2 θ − h0θ0 ≤ 0 2 . ∆1 2 2 For such ∆1 we have hθ0 π 2 − ∆1 ∆1 h0θ0 π2 ≤ hθ0 − 2 2 2P + N P ∆1 1 , = log − 2 P +N 2(2P + N ) ln 2 π 7 Zn Set of Z n /Y n jointly typical with X n Xn p nN Yn 2P + N P +N Fig. 3. Jointly typical set with X n . which further implies that 1 min hθ0 (ω) ≤ log 2 ω∈( π2 −θ0 , π2 ] − P ∆1 . 2(2P + N ) ln 2 (18) Combining (16) and (18) we obtain that for any finite C0 , there exists some ∆1 > 0 such that 1 2P P ∆1 C(C0 ) ≤ log 1 + . − 2 N 2(2P + N ) ln 2 This proves Theorem 1.1. C. Proof Outline for Lemma 2.1 Recall that Lemma 2.1 bounds H(In |Y n ) in terms of H(In |X n ) in a Markov chain In − Z n − X n − Y n , where Z n and Y n are i.i.d. Gaussian vectors given X n , E[kX n k2 ] = nP and In = fn (Z n ) is a deterministic mapping of Z n to a set of integers. As a preliminary exercise to check that fixing H(In |X n ) indeed induces an upper bound on H(In |Y n ), one can verify that if H(In |X n ) = 0 then H(In |Y n ) = 0. One (heuristic) way to see this (that can be made precise) is as follows: H(In |X n ) = 0 implies that given the transmitted codeword X n , there is no ambiguity about In , or equivalently all Z n sequences jointly typical with X n are mapped to the same In . See Figure 3. However, since Y n and Z n are statistically equivalent given X n (they share the same typical set given X n ) this would imply that In can be also determined based on Y n and therefore H(In |Y n ) = 0. Following a similar line of thought, if H(In |X n ) is fixed to a certain non-zero value, say H(In |X n ) = −n log sin θn , this roughly speaking implies that the typical Z n ’s surrounding an X n are now mapped to multiple In values. This argument can be made precise as follows: Consider the following B-length i.i.d. 
sequence {(X n (b), Y n (b), Z n (b), In (b))}B b=1 , (19) where for any b ∈ [1 : B], (X n (b), Y n (b), Z n (b), In (b)) has the same distribution as (X n , Y n , Z n , In ). For notational convenience, in the sequel we write the B-length sequence [X n (1), X n (2), . . . , X n (B)] as X and similarly define Y, Z and I; note that here we have I = [fn (Z n (1)), fn (Z n (2)), . . . , fn (Z n (B))] =: f (Z). Now we can apply a standard typicality argument to say that for any typical (x, i) pair,6 . p(i|x) = Pr(f (Z) = i|x) = 2nB log sinθn . 6 . Following [12], we say am = bm if limm→0 1 m log am bm . . = 0. Notations “≥” and “≤” are similarly defined. (20) 8 z0 ✓n Cap(z0 , ✓n ) p nBN x Fig. 4. A spherical cap with angle θn . This probabilistic statement can be translated into the following geometric picture: Given x, typical y and z sequences √ will be approximately uniformly distributed on an -thin spherical shell centered at x and of radius nBN , denoted as p n hp io p p nB Shell x, nB(N − ), nB(N + ) := a ∈ R : ka − xk ∈ nB(N − ), nB(N + ) where → 0 as B → ∞. The relation (20) can then be used to argue that the set of z’s jointly typical with x that are mapped to the given i, denoted by p p Ax (i) = z ∈ Shell x, nB(N − ), nB(N + ) : f (z) = i , will occupy a volume 1 2 . |Ax (i)| = 2nB ( 2 log 2πeN sin θn ) , (21) on this thin shell. This translation between probabilities and volumes of sets is immediate since y and z are distributed approximately uniformly on the shell. Assume that the set Ax (i) were a spherical cap as illustrated in Fig. 4. In general, a spherical cap now p p on Shell x, nB(N − ), nB(N + ) can be defined as a ball in terms of the geodesic metric, or simply the angle: y·z ∠(y, z) = arccos kykkzk on the shell, i.e., p p Cap(z0 , φ) = z ∈ Shell x, nB(N − ), nB(N + ) : ∠(z0 , z) ≤ φ , where we will refer to z0 as the pole and φ as the angle of the cap. Using the volume formula for the hyperspherical cap and characterizing the exponent of such a volume (c.f. Appendix A), it can be shown that the volume in (21) would correspond to an angle of θn for the spherical cap as the thickness of the shell tends to zero. Now, a straightforward computation would yield the following result: Let Vn = |Cap(z0 , θn ) ∩ Cap(y0 , ωn )| where ∠(z0 , y0 ) = π/2 and θn + ωn > π/2. Then, . Pr |Ax (i) ∩ Cap(Y, ωn )| ≥ Vn x → 1 as B → ∞. (22) In words, if we take a y uniformly at random on the shell and draw a spherical cap centered at y with angle ωn > π/2−θn , then with high probability the intersection volume of this cap with the cap Ax (i) will be approximately lower bounded by Vn . This statement follows from the (unthinkable in low-dimensions) 9 Cap(z0 , ✓n ) p z0 nBN x Cap(z0 , ✓n ) \ Cap(y0 , !n ) ✓n !n y0 Cap(y0 , !n ) Fig. 5. Intersection of two spherical caps. fact that in high dimensions most of the volume of the shell is concentrated around the equator (any equator), and in particular the equator at angle π/2 from the pole of Ax (i). Therefore, as the dimension nB gets large, for almost all y’s, the intersection volume of the two spherical caps will be approximately given by Vn (see Fig. 5), which can be shown to be 1 2 2 . V = 2nB ( 2 log 2πeN (sin θn −cos ωn )) , n by using the volume formula for the intersection of two hyperspherical caps and characterizing the exponent of this volume (c.f. 
Appendix B).7 One of the main technical steps in our proof is to show that the statement (22) holds for any arbitrary set Ax (i) with volume given in (21), not only when Ax (i) is a spherical cap as we assumed above. Note that this can be regarded as an extension of the classical isoperimetric inequality on the sphere, which states that among all sets on the sphere with a given volume, the spherical cap has the smallest boundary, or more generally the smallest volume of neighborhood. Another way to interpret the classical isoperimetric inequality is the following: given an arbitrary set A on the sphere, if we take a random point on the same sphere and draw a ball of certain radius around it, the probability that this ball touches A is at least as large as the same probability when A is a spherical cap of the same volume. Proving that (22) holds for any set amounts to saying that if we take a random point on the sphere and draw a ball of given radius, with high probability the intersection of the ball with the set A would be at least as large as the intersection we would get if A were a spherical cap. Roughly speaking, it identifies the spherical cap as the extremal set, not only for minimizing the volume of its neighborhood as done by the classical isoperimetric inequality, but also the extremal set when one is interested in minimizing the total intersection volume with A at given distance. We provide a more detailed discussion of this technical step in Section III. The above statement allows us to reach the following conclusion regarding the random vectors (I, X, Y) with high probability: if we take Y and draw a Euclidean ball of radius r ωn nBN 4sin2 (23) 2 around it, since a Euclidean ball of this radius includes the spherical cap of angle ωn in (22); see Fig. 6, the volume of the intersection of the set A(I) with this ball is lower bounded by r . A(I) ∩ Ball Y, nBN 4sin2 ωn ≥ 2nB ( 21 log 2πeN (sin2 θn −cos2 ωn )) , (24) 2 7 Precisely speaking, Appendix B provides the intersection area of two hyperspherical caps on a sphere rather than the intersection volume of two caps on a shell. Note however that as the dimension grows the intersection volume and area, computed on the shell and on the sphere respectively, approach the same in the exponent. 10 Y Cap(Y, !n ) Fig. 6. Euclidean ball contains the cap. x sphere p 0 nBP p p nBN nB(P + N ) y/z sphere Fig. 7. x sphere and y/z sphere. where A(I) is defined as A(I) = z ∈ RnB : f (z) = I and Ball(c, r) denotes a ball centered at c with radius r. This follows from (22), since (22) says that this property holds with high probability conditioned on any x which is typical with (I, Y). In words, if we take a typical realization (i, y) of (I, Y) and draw a ball of radius (23) around y, the volume of the set of points that are mapped to i in this ball is lower bounded as in (24). This puts an upper limit on the number of possible values of i given y. To get a tighter bound, we can incorporate the fact that mostpof the x’s lie on a thin shell centered at 0 of radius √ nBP , and y and z lie on a thin shell of radius nB(P + N ). See Fig. 7. Therefore the number of possible values for I given Y can be bounded by the ratio of the spherical cap volume r p p p ω n 2 Shell 0, nB(P + N − ), nB(P + N + ) ∩ Ball , nB(P + N )e, nBN 4sin 2 where e is any arbitrary unit vector, to the volume each possible i occupies from this cap 1 2nB ( 2 log 2πeN (sin 2θ 2 n −cos ωn )) . See Figure 8. This ratio can be shown to be . 
nB ≤2 1 2 log 4sin2 ω2n (P +N −N sin2 ω2n ) (P +N )(sin2 θn −cos2 ωn ) which in turn imposes the following bound on H(In |Y n ): H(In |Y n ) ≤ n , 4sin2 ω2n (P + N − N sin2 ω2n ) 1 log 2 (P + N )(sin2 θn − cos2 ωn ) ! . The upper bound (6) in Lemma 2.1 follows by noting that the above argument holds for any ωn > π/2−θn . 11 p Fig. 8. A spherical cap with angle φ = 2 arcsin q nB(P + N ) N sin2 ω2n P +N r nBN 4 sin2 !n 2 . III. S TRENGTHENING THE I SOPERIMETRIC I NEQUALITY We now provide a more detailed discussion of the main geometric tool we develop for proving Lemma 2.1. As mentioned in the previous section, a key ingredient of our result is a strengthening of the isoperimetric inequality on the sphere, which we obtain by building on the Riesz rearrangement inequality. The classical isoperimetric inequality on the sphere states that among all sets on the sphere with a given volume, the spherical cap has the smallest boundary, or more generally the smallest volume of neighborhood. This can be formally stated as follows: Let Sm−1 ⊆ Rm denote the (m − 1)-sphere of radius R, i.e. Sm−1 = {z ∈ Rm : kzk = R}, equipped with the rotation invariant (Haar) measure µ = µm−1 normalized such that m 2π 2 m−1 m−1 µ(S )= m R , Γ( 2 ) i.e. the usual surface area, and let P(A) denote the probability of a set or event A with respect to the corresponding Haar probability measure, i.e. normalized Haar measure such that P(Sm−1 ) = 1. A spherical cap can be defined as a ball in the metric ∠(z, y) = arccos(hz/R, y/Ri) of Sm−1 , i.e., Cap(z0 , θ) = z ∈ Sm−1 : ∠(z0 , z) ≤ θ . The isoperimetric inequality states that for any arbitrary set A ⊆ Sm−1 such that µ(A) = µ(C), where C = Cap(z0 , θ) ⊆ Sm−1 is a spherical cap, it holds that µ(At ) ≥ µ(Ct ), ∀t ≥ 0, where At is the t-neighborhood of A, defined as m−1 0 At = z ∈ S : min ∠(z, z ) ≤ t , 0 z ∈A and similarly m−1 0 Ct = z ∈ S : min ∠(z, z ) ≤ t = Cap(z0 , θ + t). 0 z ∈C The isoperimetric inequality above, combined with a standard computation, immediately yields the following measure concentration result: Let A ⊆ Sm−1 be an arbitrary set and C = Cap(z0 , θ) ⊆ Sm−1 be a spherical cap such that µ(A) = µ(C). If this is the case, we will say that A has an effective angle of θ. Then for any > 0 and m sufficiently large, P(A π2 −θ+ ) ≥ 1 − . (25) 12 This is known as the blowing-up lemma on the sphere. Note that when A is a spherical cap, this result follows from an elementary computation which reveals that most of the volume of an (m − 1)-sphere is concentrated around the boundary of any half-sphere, i.e. any equator, therefore blowing-up the spherical cap to be slightly bigger than a half-sphere is sufficient to capture most of the volume or probability mass on the sphere. The isoperimetric inequality allows to extend this statement to any arbitrary set A and make the much more general statement that the volume is concentrated around the boundary of any set with probability 1/2. An equivalent way to view this result is the following: let A ⊆ Sm−1 be an arbitrary set with effective angle θ > 0, then π (26) P µ A ∩ Cap Y, − θ + > 0 > 1 − , 2 for any > 0 and sufficiently large m. In words, if we take a y uniformly at random on the sphere and draw a spherical cap of angle slightly larger than π2 − θ around it (or equivalently a Euclidean ball centered at y of radius equal to the corresponding Euclidean distance), this cap will intersect the set A with high probability. 
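A finite-dimensional Monte Carlo illustration of this statement for a non-cap set is sketched below: A is taken to be the union of two antipodal caps, its effective angle θ is computed numerically from the angular density sin^{m-2}(t), and the probability that a cap of angular radius π/2 − θ ± ε around a uniformly drawn Y meets A is estimated and compared with the corresponding probability when A is replaced by a single cap of the same measure. The dimension m, the cap angle, ε and the particular choice of A are illustrative assumptions, and the finite-m estimates only indicate the asymptotic behavior described above.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 400                       # ambient dimension (the sphere is S^{m-1}); illustrative
theta1 = np.deg2rad(60.0)     # angle of each of the two caps forming A
eps = 0.1                     # slack in radians

# Effective angle theta of A = Cap(p, theta1) U Cap(-p, theta1) (two disjoint caps):
# solve mu(Cap(., theta)) = 2 * mu(Cap(., theta1)) using the angular density sin^{m-2}(t).
t = np.linspace(0.0, np.pi, 200_001)
cdf = np.cumsum(np.sin(t) ** (m - 2))
cdf /= cdf[-1]
theta = np.interp(2 * np.interp(theta1, t, cdf), cdf, t)

# Sample alpha = angle(Y, p) for Y uniform on S^{m-1}; note angle(Y, -p) = pi - alpha.
n = 200_000
g1 = rng.standard_normal(n)
rest = rng.chisquare(m - 1, n)                    # squared norm of the remaining m-1 Gaussian coordinates
alpha = np.arccos(g1 / np.sqrt(g1 ** 2 + rest))

def hit_prob_A(radius):
    # Cap(Y, radius) meets A  iff  min(angle(Y, p), angle(Y, -p)) <= theta1 + radius
    return np.mean(np.minimum(alpha, np.pi - alpha) <= theta1 + radius)

def hit_prob_cap(radius):
    # for comparison: A replaced by a single cap of the same measure (angle theta) about p
    return np.mean(alpha <= theta + radius)

print(f"effective angle of A: {np.rad2deg(theta):.2f} deg")
for radius, tag in ((np.pi / 2 - theta + eps, "pi/2 - theta + eps"),
                    (np.pi / 2 - theta - eps, "pi/2 - theta - eps")):
    print(f"radius = {tag}:  P[cap around Y meets A] ~ {hit_prob_A(radius):.3f},  "
          f"P[it meets a cap of equal measure] ~ {hit_prob_cap(radius):.3f}")
```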
This statement is almost equivalent to (25) since the y’s for which the intersection is non-zero lie in the π2 − θ + -neighborhood of A. Note again that this statement would be trivial if A were known to be a spherical cap, and it holds for any A due to the isoperimetric inequality. By building on the Riesz rearrangement inequality in the next subsection, we show the following stronger result: Lemma 3.1: Let A ⊆ Sm−1 be any arbitrary set with effective angle θ > 0 (i.e. µ(A) = µ(C) with C = Cap(z0 , θ) for some z0 ∈ Sm−1 ), and let V = µ(Cap(z0 , θ) ∩ Cap(y0 , ω)) for some z0 ∈ Sm−1 and y0 ∈ Sm−1 such that ∠(z0 , y0 ) = π/2 and θ + ω > π/2. Then for any > 0 and m sufficiently large, P (µ(A ∩ Cap(Y, ω + )) > V ) ≥ 1 − . Note that if A itself is a cap the statement is straightforward and follows from the fact that y with high probability will be concentrated around the equator at angle π/2 from the pole of A. Therefore, as m gets large for almost all y’s, the intersection of the two spherical caps will be given by V . The statement however is much stronger than this and holds for any arbitrary set A, analogous to (26). It states that no matter what the set A is, if we take a random point on the sphere and draw a cap of angle slightly larger than ω, then with high probability the intersection of the cap with the set A would be at least as large as the intersection we would get if A were a spherical cap. Roughly speaking, it identifies the spherical cap as the extremal set, not only for minimizing the volume of its t-neighborhood as done by the isoperimetric inequality, i.e. when one is interested in touching a single point in A, but also the extremal set when one is interested in minimizing the total intersection volume with A at certain distance. As we saw in the previous section, what we actually need to prove Lemma 2.1 is a version of Lemma 3.1 on a thin spherical shell instead of on the sphere, since the typical z and y sequences in our problem concentrate on a thin shell, which can be made thinner and thinner as the dimension increases, but the sphere itself has measure zero compared to a shell. However, it is instructive to first prove Lemma 3.1, which can be interesting in its own right. We will discuss the extension to a spherical shell in Section V-B. A. Proof of Lemma 3.1 via Spherical Rearrangements We prove Lemma 3.1 via spherical rearrangements in two steps. We first verify below that the claim in Lemma 3.1 holds when A is known to be a spherical cap (with the intersection volume actually larger than 2V instead of just V ). We then show that if it holds for a spherical cap, then it holds for any arbitrary set A. Assume A = Cap(z0 , θ) for some z0 ∈ Sm−1 . Due to measure concentration, for any > 0 and m sufficiently large, we have P (∠(z0 , Y) ∈ [π/2 − /2, π/2 + /2]) ≥ 1 − /2. 13 Due to the triangle inequality for the geodesic metric (angle), for y such that ∠(z0 , y) ∈ [π/2−/2, π/2+ /2] we have A ∩ Cap(y0 , ω) ⊆ A ∩ Cap(y, ω + /2) where y0 is such that ∠(z0 , y0 ) = π/2. Therefore, P (µ(A ∩ Cap(Y, ω + /2)) > V ) ≥ 1 − /2. Indeed, because the intersection volume is growing exponentially in the angle ω + /2, we can also conclude that for any > 0 and m sufficiently large, P (µ(A ∩ Cap(Y, ω + )) > 2V ) ≥ 1 − /2. (27) Our main tool for extending the above result to arbitrary A is the symmetric decreasing rearrangement of functions on the sphere, along with a version of the Riesz rearrangement inequality on the sphere due to Baernstein and Taylor [15]. 
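Before turning to the formal definitions, the rearrangement inequality of [15] can be sanity-checked numerically in a low-dimensional instance with indicator functions. The Monte Carlo sketch below, on S² ⊂ R³, compares the bilinear integral for a two-cap set A and a cap C against its value after both sets are rearranged into caps about a common pole. All set choices, angles and the sample size are illustrative assumptions, and such a check is of course not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000                                   # Monte Carlo sample size (illustrative)

def sample_sphere(n, dim=3):
    g = rng.standard_normal((n, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def in_cap(points, pole, angle):
    return points @ pole >= np.cos(angle)

Z = sample_sphere(n)
Y = sample_sphere(n)

# Kernel K(<z, y>) = 1{angle(z, y) <= omega}: bounded, measurable, nondecreasing in <z, y>.
omega = np.deg2rad(70.0)
K = np.sum(Z * Y, axis=1) >= np.cos(omega)

# f = 1_A with A a union of two disjoint antipodal caps, g = 1_C with C a cap at a third pole.
th = np.deg2rad(50.0)
A = in_cap(Z, np.array([1.0, 0.0, 0.0]), th) | in_cap(Z, np.array([-1.0, 0.0, 0.0]), th)
C = in_cap(Y, np.array([0.0, 1.0, 0.0]), np.deg2rad(60.0))

# Symmetric decreasing rearrangements: caps about a common pole with the same normalized measure.
# On S^2 a cap of angle t has normalized measure (1 - cos t)/2, and mu(A) = 1 - cos(th), so:
north = np.array([0.0, 0.0, 1.0])
A_star = in_cap(Z, north, np.arccos(2 * np.cos(th) - 1))
C_star = in_cap(Y, north, np.deg2rad(60.0))

lhs = np.mean(A & K & C)            # Monte Carlo estimate of the normalized bilinear integral
rhs = np.mean(A_star & K & C_star)
print(f"original sets:   {lhs:.4f}")
print(f"rearranged sets: {rhs:.4f}   (the inequality predicts original <= rearranged)")
```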
For any function f : Sm−1 → R and pole z0 , the symmetric decreasing rearrangement of f about z0 is defined to be the function f ∗ : Sm−1 → R such that f ∗ (y) depends only on the angle ∠(y, z0 ), is nonincreasing in ∠(y, z0 ), and has super-level sets of the same size as f , i.e. µ {y : f ∗ (y) > d} = µ {y : f (y) > d} for all d. The function f ∗ is unique except for on sets of measure zero. One important special case is when the function f = 1A is the characteristic function for a subset A. The function 1A is just the function such that ( 1 y∈A 1A (y) = 0 otherwise. In this case 1∗A is equal to the characteristic function associated with a spherical cap of the same size as A. In other words, if A∗ is a spherical cap about the pole z0 such that µ(A∗ ) = µ(A), then 1∗A = 1A∗ . Lemma 3.2 (Baernstein and Taylor [15]): Let K be a nondecreasing bounded measurable function on the interval [−1, 1]. Then for all functions f, g ∈ L1 (Sm−1 ), Z Sm−1 ≤ Z Z Sm−1 m−1 S Z f (z)K (hz/R, y/Ri) dz g(y)dy ∗ f (z)K (hz/R, y/Ri) dz g ∗ (y)dy. Sm−1 In order to prove Lemma 3.1, we will apply Lemma 3.2 by choosing f, K such that the inner integral Z f (z)K (hz/R, y/Ri) dz = µ(A ∩ Cap(y, ω + )). Sm−1 To do this, given an arbitrary set A, we set f = 1A and ( 1 0≤α≤ω+ K(cos α) = 0 ω + < α ≤ π. Note that K is nondecreasing, bounded, and measurable. Furthermore, the product f (z)K (hz/R, y/Ri) is one precisely when z ∈ A and ∠(z, y) ≤ ω + , and it is zero otherwise. Thus the integral Z ψ(y) = f (z)K (hz/R, y/Ri) dz (28) Sm−1 14 is exactly the measure of the set A ∩ Cap(y, ω + ). Recall that the spherical cap Cap(y, ω + ) is centered around the pole y with angle ω + . We will use test functions g which are also characteristic functions. Let g = 1C for some measurable subset C ⊆ Sm−1 . For a fixed measure µ(C), the left-hand side of the inequality from Lemma 3.2 will be maximized by picking C to correspond to a super-level set of ψ (this can be shown easily by contradiction, although we don’t actually use this fact and it just provides the motivation). The claim is that when using g = 1C where C = {y : ψ(y) > d} for some d, we have the following equality: Z Z ψ(y)1C (y)dy = ψ ∗ (y)1∗C (y)dy m−1 m−1 S ZS = ψ ∗ (y)dy. C∗ This follows from the layer-cake decomposition in that Z Z ψ(y)1C (y)dy = ψ(y)dy Sm−1 C Z Z ∞ = 1{ψ(y)>t} dtdy C 0 Z ∞Z = 1{ψ(y)>t} dydt 0 C Z ∞Z Z dZ 1{ψ(y)>t} dydt 1{ψ(y)>t} dydt + = d C 0 C Z dZ Z ∞Z = 1{ψ∗ (y)>t} dydt + 1{ψ∗ (y)>t} dydt ∗ ∗ 0 C d C Z ψ ∗ (y)dy . = (29) C∗ Using this equality and our choices for g, f, K above we will rewrite the inequality from Lemma 3.2 as Z Z ∗ ψ (y)dy ≤ ψ 0 (y)dy (30) C∗ where 0 ψ (y) = Z Sm−1 C∗ f ∗ (z)K (hz/R, y/Ri) dz = µ(A∗ ∩ Cap(y, ω + )). Note that both ψ ∗ (y) and ψ 0 (y) are spherically symmetric and nonincreasing. More concretely, they both depend only on the angle ∠(y, z0 ) and are nonincreasing in the angle ∠(y, z0 ). In an abuse of notation we will write ψ 0 (α) and ψ ∗ (α) where α = ∠(y, z0 ). For convenience we will define a measure ν by dν(φ) = Am−2 (Rsin φ)Rdφ where Am (R) denotes the Haar measure of the m-sphere with radius R. We do this so that an integral like Z Z π ∗ ψ dy = ψ ∗ (φ)Am−2 (Rsin φ)Rdφ Sm−1 can be expressed as 0 Z 0 π ψ ∗ dν . 15 We are now ready to prove Lemma 3.1. Define ξ to be such that π + ξ = 2V . ψ0 2 It is easy to see that ξ is uniquely defined and will be greater than zero for sufficiently large m. We also get from (27) that Z π +ξ 2 1 0 dν ≥ 1 − /2 (31) P (ψ (Y) > 2V ) = Am−1 (R) 0 for any > 0 and m sufficiently large. 
Suppose there is a subsequence mk such that for each m = mk , there exists a measurable subset A with ψ ∗ (β) = V for a 0 < β < π/2 + ξ. More specifically, let β be the smallest value such that ψ ∗ (β) = V . This quantity is well-defined, i.e. the minimum is achieved, because of the continuity of ψ ∗ . Our goal will be to show that even for such a subsequence, Z π +ξ 2 1 0 ∗ dν ≤ /2 P (ψ (Y) > 2V and ψ (Y) ≤ V ) = Am−1 (R) β as m = mk increases. Together with (31), this would imply that P (ψ ∗ (Y) ≤ V ) ≤ , and because we have P (ψ ∗ (Y) > V ) = P (ψ(Y) > V ) , we would have P (µ(A ∩ Cap(Y, ω + )) > V ) ≥ 1 − , for any arbitrary set A. To this end, we note that 1 Am−1 (R) Z π π +ξ 2 dν ≤ /2 (32) as a consequence of (31). However, we have the following chain of (in)equalities which will be justified below. Z π Z π 1 1 dν ≥ (ψ ∗ − ψ 0 )dν (33) Am−1 (R) π2 +ξ V Am−1 (R) π2 +ξ Z π +ξ 2 1 = (ψ 0 − ψ ∗ )dν (34) V Am−1 (R) 0 Z π +ξ 2 1 ≥ (ψ 0 − ψ ∗ )dν (35) V Am−1 (R) β Z π +ξ 2 1 dν (36) ≥ Am−1 (R) β From these inequalities, (36) must also be bounded by /2 and we are done. The first inequality (33) is a consequence of the fact that over the range of the integral, ψ ∗ is less than or equal to V and ψ 0 is non-negative. The equality in (34) follows from Z π Z π ∗ ψ dν = ψ 0 dν , 0 0 16 which is itself a consequence of (29) with C = Sm−1 and Z Z f (z)K(hz/R, y/Ri)dzdy Sm−1 Sm−1 Z Z = K(hy, zi)dyf (z)dz Z = µ(Cap(y, ω + ))f (z)dz = µ(Cap(y, ω + ))µ(A) Z = µ(Cap(y, ω + ))f ∗ (z)dz Z Z = f ∗ (z)K(hz/R, y/Ri)dzdy . Next we have (35) which is due to the rearrangement inequality (30) when C ∗ is a spherical cap with angle β. Finally, the inequality (36) is because ψ 0 (y) ≥ 2V and ψ ∗ ≤ V over the range of the integral. IV. P ROOF OF L EMMA 2.1 We are now in a position to prove Lemma 2.1. For this we will look at B-length i.i.d. sequences of the random vectors X n , Y n , Z n , and In , and derive some typicality properties for these sequences which hold with high probability when B is large. Specifically, consider the following B-length i.i.d. sequence {(X n (b), Y n (b), Z n (b), In (b))}B b=1 , (37) where for any b ∈ [1 : B], (X n (b), Y n (b), Z n (b), In (b)) has the same distribution as (X n , Y n , Z n , In ). For notational convenience, in the sequel we write the B-length sequence [X n (1), X n (2), . . . , X n (B)] as X and similarly define Y, Z and I; note that here we have I = [fn (Z n (1)), fn (Z n (2)), . . . , fn (Z n (B))] =: f (Z). We now introduce two lemmas that will be used to establish Lemma 2.1. The proof of the first lemma will be provided at the end of this section after we finish the proof of Lemma 2.1, while the proof of the second one contains significant novelty both conceptually and technically and deserves a dedicated treatment in Section V. Lemma 4.1: Let Shell (c, r1 , r2 ) denote a spherical shell defined as Shell (c, r1 , r2 ) := a ∈ RnB : r1 ≤ ka − ck ≤ r2 . For any δ > 0 and B sufficiently large, we have and where and Pr(E1 ) ≥ 1 − δ Pr(E2 ) ≥ 1 − δ, o n p p E1 := Z ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) n p o p E2 := Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) . Lemma 4.2: Let Ball(c, r) denote a Euclidean ball centered at c and of radius r, i.e., Ball (c, r) := a ∈ RnB : ka − ck ≤ r . 
For any δ > 0 and B sufficiently large, we have Pr(E3 ) ≥ 1 − δ, (38) (39) 17 where E3 is defined to be the following event: ) ( r p 1 ω 2 2 −1 f (I) ∩ Ball 0, nB(P + N + δ ∩ Ball Y, nBN 4sin 2 + δ ≥ 2nB[ 2 log(2πeN (sin θn −cos ω))−δ] 2 (40) 1 n in which f −1 (I) := {a ∈ RnB : f (a) = I}, θn is given by arcsin 2− n H(In |X ) as in (5) and ω ∈ (π/2 − θn , π/2]. With the above two lemmas, we now upper bound H(I|Y). Consider the indicator function F = I(E1 , E2 , E3 ) where I(·) is defined as I(A) = 1 0 if A holds otherwise, and the events E1 , E2 and E3 are as given by (38)–(40). Obviously, by the union bound, we have Pr(F = 1) ≥ 1 − 3δ for any δ > 0 and B sufficiently large, and therefore H(I|Y) ≤ H(I, F |Y) = H(F |Y) + H(I|Y, F ) ≤ H(I|Y, F ) + 1 = Pr(F = 1)H(I|Y, F = 1) + Pr(F = 0)H(I|Y, F = 0) + 1 ≤ H(I|Y, F = 1) + 3δnBC0 + 1. (41) To bound apply a packing argument as follows. Consider a ball centered at Y and q H(I|Y, F = 1), we of radius nBN 4sin 2 ω2 + δ , i.e., r ω , Ball Y, nBN 4sin 2 + δ 2 where and ω satisfies p p Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) , π/2 − θn < ω ≤ π/2. p We now upper bound the volume of the intersection between this ball and Ball 0, nB(P + N + δ) , i.e., r p ω Ball Y, nBN 4sin 2 + δ ∩ Ball 0, nB(P + N + δ) . 2 For this, first consider the intersection volume r p ω Ball Y, nBN 4sin 2 ∩ Ball 0, nB(P + N ) , 2 for p Y ∈ Sphere 0, nB(P + N ) , (42) 18 where Sphere(c, r) denotes a sphere centered at c of radius r, i.e., Sphere (c, r) := a ∈ RnB : ka − ck = r . See Fig. 8. Note that the intersection in (42) actually p consists oftwo spherical caps that are respectively p ω 2 located on Ball Y, nBN 4sin 2 and Ball 0, nB(P + N ) . Since these two caps have essentially p the same volume, it is sufficient to calculate the volume for the cap on Ball 0, nB(P + N ) , i.e. the shaded area in Fig. 8. In particular, observe that q nBN sin2 ω2 φ = 2 arcsin p nB(P + N ) s N sin2 ω2 = 2 arcsin P +N = 2φ1 q N sin2 ω 2 . Using the volume formula for the hyperspherical cap and characterizing the where φ1 := arcsin P +N p exponent of such a volume (c.f. Appendix A), we have that for any Y ∈ Sphere 0, nB(P + N ) and B sufficiently large, r p Ball Y, nBN 4sin 2 ω ∩ Ball 0, nB(P + N ) ≤ 2nB[ 12 log(2πe(P +N )sin2 φ)+δ] . 2 Since sin2 φ = (2sin φ1 cos φ1 )2 = 4sin 2 φ1 cos2 φ1 N sin 2 ω2 N sin 2 ω2 1− , =4 P +N P +N p we have for any Y ∈ Sphere 0, nB(P + N ) and B sufficiently large, r 2ω 8πeN sin2 ω p 2 (P +N −N sin 2 ) +δ nB 12 log ω P +N Ball Y, nBN 4sin 2 . ∩ Ball 0, nB(P + N ) ≤ 2 2 p p By continuity, this would further imply that for any Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) , r 2ω 8πeN sin2 ω p 2 (P +N −N sin 2 ) +δ nB 12 log ω 1 P +N Ball Y, nBN 4sin 2 + δ ∩ Ball 0, nB(P + N + δ) ≤ 2 , 2 where δ1 → 0 as δ → 0 and B → ∞. On the other hand, the condition F = 1 (c.f. the definition of E3 in Lemma 4.2) also ensures that given Y, each possible I occupies a volume of at least 1 2 θ −cos2 n 2nB[ 2 log(2πeN (sin from the set ω))−δ] r p ω 2 Ball Y, nBN 4sin +δ ∩ Ball 0, nB(P + N + δ) . 2 19 p p Therefore, given F = 1 and Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) , the number of different possibilities for I is upper bounded by the ratio between r p ω Ball Y, nBN 4sin 2 + δ ∩ Ball 0, nB(P + N + δ) 2 and 1 2 θ −cos2 n 2nB[ 2 log(2πeN (sin ω))−δ] , which can be further upper bounded by nB 2 nB =2 1 2 log 2ω 8πeN sin2 ω 2 (P +N −N sin 2 ) − 1 P +N 2 1 2 log 2ω 4sin2 ω 2 (P +N −N sin 2 ) +δ 2 (P +N )(sin2 θn −cos2 ω) log(2πeN (sin2 θn −cos2 ω))+δ+δ1 , where δ2 → 0 as δ → 0 and B → ∞. 
This immediately implies the following upper bound on H(I|Y, F = 1), # " 4sin2 ω2 (P + N − N sin2 ω2 ) 1 log + δ2 , H(I|Y, F = 1) ≤ nB 2 (P + N )(sin2 θn − cos2 ω) which combined with (41) yields that " # 4sin2 ω2 (P + N − N sin2 ω2 ) 1 log H(I|Y) ≤ nB + δ2 + 3δnBC0 + 1. 2 (P + N )(sin2 θn − cos2 ω) Dividing B at both sides of the above inequality and noting that H(I|Y) = B X b=1 we have H(In (b)|Y n (b)) = BH(In |Y n ), 4sin2 ω2 (P + N − N sin2 ω2 ) 1 1 H(In |Y ) ≤ n log + δ2 + 3δC0 + . 2 nB (P + N )(sin2 θn − cos2 ω) n (43) 1 nB in (43) can all be made arbitrarily small by choosing B sufficiently large, we obtain ! 2ω 2ω 4sin (P + N − N sin ) 1 2 2 H(In |Y n ) ≤ n . (44) log 2 (P + N )(sin2 θn − cos2 ω) Noting that (44) holds for any ω ∈ π2 − θn , π2 , this completes the proof of Lemma 2.1. Since δ, δ2 and A. Proof of Lemma 4.1 We now present the proof of Lemma 4.1. Proof of Lemma 4.1: Recalling that Z = [Z n (1), Z n (2), . . . , Z n (B)], we have 2 kZk = B X b=1 kZ n (b)k2 . Therefore by the weak law of large numbers, for any δ > 0 and B sufficiently large we have 1 2 n 2 Pr kZk − E[kZ k ] ≤ δ ≥ 1 − δ. B (45) 20 Since Z n = X n + W1n , we have E[kZ n k2 ] = E[kX n + W1n k2 ] = E[kX n k2 ] + 2E[X n · W1n ] + E[kW1n k2 ]. By assumption E[kX n k2 ] = nP and E[kW1n k2 ] = nN . Also we have " n # X Xi W1i E [X n · W1n ] = E i=1 = n X E[Xi W1i ] i=1 = n X E[Xi ]E[W1i ] i=1 = 0. Therefore, E[kZ n k2 ] = n(P + N ), and this combined with (45) yields that Pr(kZk2 ∈ [nB(P + N − δ), nB(P + N + δ)]) ≥ 1 − δ, for any δ > 0 and B sufficiently large. Since Z and Y are identically distributed, the above relation also holds with kZk2 replaced by kYk2 . This completes the proof of the lemma. V. P ROOF OF L EMMA 4.2 To prove Lemma 4.2, we introduce two lemmas as follows. The proofs of these two lemmas, which will be presented in Sections V-A and V-B, are based on typicality argument and strengthening the isoperimetric inequality on a spherical shell respectively. Lemma 5.1: Given any pair of (x, i), let S (Z n |x, i) be a set of z defined as n √ √ √ √ S (Z n |x, i) := z ∈ f −1 (i) : kx − zk ∈ [ nB( N − ), nB( N + )] p z ∈ Ball 0, nB(P + N + ) o 2nB(log sinθn −) ≤ p(f (z)|x) ≤ 2nB(log sinθn +) 1 n where θn = arcsin 2− n H(In |X ) . Then there exists a set S (X n , In ) of (x, i) pairs, such that √ Pr((X, I) ∈ S (X n , In )) ≥ 1 − , and for any (x, i) ∈ S (X n , In ), Pr(Z ∈ S (Z n |x, i)|x) ≥ 2nB(log sinθn −2) , (46) (47) for B sufficiently large. Lemma 5.2: Let Lm ⊆ Rm be a spherical shell defined as Lm = {y ∈ Rm : Rm − ∆ ≤ kyk ≤ Rm + ∆} √ √ √ where Rm := mR for some R > 0 and ∆ := mδ for some 0 < δ ≤ R. Let A ⊆ Lm be an arbitrary subset on this shell with volume m 2 |A| = 2 2 log 2πeRsin θ . For any ω ∈ (π/2 − θ, π/2], ε > 0 and m sufficiently large, we have 1 ω 2 2 Pr A ∩ Ball Y, 2(Rm + ∆)sin + 2∆ ≥ 2m[ 2 log(2πeR(sin θ−cos ω))−ε] ≥ 1 − ε, 2 (48) 21 where Y is uniformly drawn from Lm . Equipped with the above two lemmas, we are now ready to prove Lemma 4.2. Proof of Lemma 4.2: From Lemma 5.1, we have for any pair (x, i) ∈ S (X n , In ), Pr(Z ∈ S (Z n |x, i)|x) ≥ 2nB(log sinθn −2) , for B sufficiently large. We also have Pr(Z ∈ S (Z n |x, i)|x) ≤ |S (Z n |x, i)| sup f (z|x) z∈S (Z n |x,i) 1 ≤ |S (Z n |x, i)|2−nB ( 2 log 2πeN −1 ) , for some 1 → 0 as → 0, where the second inequality follows because i) Z is Gaussian distributed given x, and ii) for any z ∈ S (Z n |x, i), we have √ √ √ √ kx − zk ∈ [ nB( N − ), nB( N + )]. 
Therefore, the volume of S (Z n |x, i) can be lower bounded by 1 |S (Z n |x, i)| ≥ 2nB ( 2 log(2πeN sin 2 θ )−2− n 1 ). Noting that S (Z n |x, i) is a subset of √ √ √ √ Shell x, nB( N − ), nB( N + ) , by Lemma 5.2, we have for any ω ∈ (π/2 − θn , π/2], r 1 2 θ −cos2 ω))− ω nB log(2πeN (sin 2 n n 3] Pr S (Z |x, i) ∩ Ball U, nBN 4sin + 2 ≥ 2 [ 2 ≥ 1 − 3 (49) 2 √ √ √ √ where U is uniformly distributed over Shell x, nB( N − ), nB( N + ) , 2 is defined such that r √ √ √ ω ω nBN 4sin2 + 2 = 2 nBN + nB sin + 2 nB, 2 2 which tends to zero as goes to zero, and 3 → 0 as B → ∞. Now we translate the bound (49) on the probability involving U to a bound on the probability involving Y. Specifically, let Y(x,i) be defined as ( √ √ √ √ Y(x,i) = y ∈ Shell x, nB( N − ), nB( N + ) : ) r 1 2 ω 2 S (Z n |x, i) ∩ Ball U, nBN 4sin2 + 2 ≥ 2nB [ 2 log(2πeN (sin θn −cos ω))−3 ] . 2 By (49), the volume of Y(x,i) is lower bounded by √ √ √ √ |Y(x,i) | ≥ (1 − 3 ) Shell x, nB( N − ), nB( N + ) 1 ≥ 2nB ( 2 log 2πeN −4 ) , where 4 → 0 as → 0 and B → ∞. Therefore, we have Pr(Y ∈ Y(x,i) |x) ≥ |Y(x,i) | inf f (y|x) y∈Y(x,i) ≥ 2nB ( 1 2 log 2πeN −4 ) −nB ( 12 log 2πeN +1 ) −nB5 ≥2 2 , 22 where 5 → 0 as → 0 and B → ∞. We then apply the following blowing-up lemma to the set Y(x,i) ; for a proof of this lemma, see [6]. Lemma 5.3: Let V1 , V2 , . . . , Vn be n i.i.d. Gaussian random variables with Vi ∼ N (0, N ), ∀i ∈ {1, 2, . . . , n}. Then, for any A ⊆ Rn with Pr(V n ∈ A) ≥ 2−nan , nr 2 Pr(V n ∈ Γ√n(√2N an ln 2+r) (A)) ≥ 1 − 2− 2N , ∀r > 0, where Γl (A) := {v n ∈ Rn : ∃ v1n ∈ A s.t. kv n − v1n k ≤ l}. In particular, noting that Y is Gaussian distributed given x, the above lemma immediately yields that for √ 6 := 2N 5 ln 2 + , Pr(Y ∈ Γ√nB6 (Y(x,i) )|x) ≥ 1 − 2− ≥ 1 − , nB2 2N for B sufficiently large. Let 7 be defined such that r r √ ω 2ω nBN 4sin + 7 = nBN 4sin2 + 2 + nB6 . 2 2 Then we have for any y ∈ Γ√nB6 (Y(x,i) ), there exists a y0 ∈ Y(x,i) such that ky − y0 k ≤ r r 2ω 2ω 0 Ball y , nBN 4sin + 2 ⊆ Ball y, nBN 4sin + 7 . 2 2 √ nB6 and This implies that for any y ∈ Γ√nB6 (Y(x,i) ), r ω 2 n S (Z |x, i) ∩ Ball y, nBN 4sin + 7 ≥ 2nB [ 12 log(2πeN (sin2 θn −cos2 ω))−3 ] . 2 p By the definition S (Z n |x, i) is a subset of f −1 (i) ∩ Ball 0, nB(P + N + ) , therefore we in turn have r p −1 f (i) ∩ Ball 0, nB(P + N + ) ∩ Ball y, nBN 4sin2 ω + 7 ≥ 2nB [ 12 log(2πeN (sin2 θn −cos2 ω))−3 ] . 2 for any y ∈ Γ√nB6 (Y(x,i) ). Therefore, we have for B sufficiently large, r p −1 ω 2 Pr f (I) ∩ Ball 0, nB(P + N + ) ∩ Ball Y, nBN 4sin + 7 2 ! 1 2 2 ≥ 2nB [ 2 log(2πeN (sin θn −cos ω))−3 ] ≥ ≥ X (x,i) Pr(Y ∈ Γ√nB6 (Y(x,i) )|x)p(x, i) X (x,i)∈S (X n ,In ) ≥ (1 − )(1 − √ ≥ 1 − 2 . Pr(Y ∈ Γ√nB6 (Y(x,i) )|x)p(x, i) √ ) √ Finally, choosing δ = max{2 , 3 , 7 } concludes the proof of Lemma 4.2. 23 A. Proof of Lemma 5.1 Via Typicality Argument We now present the proof of Lemma 5.1. Recall that H(In |X n ) = −n log sin θn for the n-channel use code. By the law of large numbers and Lemma 4.1, we have for any > 0 and sufficiently large B, Pr((X, Z) ∈ S (X n , Z n )) ≥ 1 − where n √ √ √ √ S (X n , Z n ) := (x, z) : kx − zk ∈ [ nB( N − ), nB( N + )] p z ∈ Ball 0, nB(P + N + ) o 2nB(log sinθn −) ≤ p(f (z)|x) ≤ 2nB(log sinθn +) . Note that in terms of S (X n , Z n ), the set S (Z n |x, i) in Lemma 5.1 can be simply written as S (Z n |x, i) = {z : f (z) = i, (x, z) ∈ S (X n , Z n )}. Therefore, for B sufficiently large, we have Pr(Z ∈ / S (Z n |X, I)) = Pr(f (Z) = I, (X, Z) ∈ / S(X n , Z n )) ≤ . 
√ On the other hand, defining S (X n , In ) := {(x, i) : Pr(Z ∈ S (Z n |x, i)|x, i) ≥ 1 − }, we have X Pr(Z ∈ / S (Z n |X, I)) = Pr(Z ∈ / S (Z n |x, i)|x, i)p(x, i) (x,i)∈S (X n ,In ) + ≥ √ X Pr(Z ∈ / S (Z n |x, i)|x, i)p(x, i) (x,i)∈S / (X n ,In ) · Pr(Sc (X n , In )). Therefore, we have for B sufficiently large, √ Pr(Sc (X n , In )) ≤ √ = , and thus Pr(S (X n , In )) ≥ 1 − √ , which proves (46). To√prove (47), consider any (x, i) ∈ S (X n , In ). From the definition of S (X n , In ), Pr(S (Z n |x, i)|x, i) ≥ 1 − . Therefore, S (Z n |x, i) must be nonempty, i.e., there exists at least one z ∈ S (Z n |x, i). Consider any z ∈ S (Z n |x, i). By the definition of S (Z n |x, i), we have f (z) = i and (x, z) ∈ S (X n , Z n ). Then, it follows from the definition of S (X n , Z n ) that 2nB(log sinθn −) ≤ p(f (z)|x) = p(i|x) ≤ 2nB(log sinθn +) . This further implies that Pr(Z ∈ S (Z n |x, i)|x) Pr(f (Z) = i|x)Pr(Z ∈ S (Z n |x, i)|x, f (Z) = i) = Pr(f (Z) = i|Z ∈ S (Z n |x, i), x) = p(i|x)Pr(S (Z n |x, i)|x, i) √ ≥ 2nB(log sinθn −) (1 − ) ≥ 2nB(log sinθn −2) for sufficiently large B, which concludes the proof of (47) and Lemma 5.1. 24 B. Proof of Lemma 5.2 Via Strengthening Isoperimetric Inequality We now prove Lemma 5.2 by extending our result in Lemma 3.1 on a sphere to a spherical shell. Let Lm = {y ∈ Rm : Rm − ∆ ≤ kyk ≤ Rm + ∆} √ √ √ be this shell where Rm := mR for some R > 0 and ∆ := mδ for some 0 < δ ≤ R; note that one can also write this shell as [ Lm = Sm−1 r r∈[Rm −∆,Rm +∆] where Sm−1 := {y ∈ Rm : kyk = r} r denotes the (m − 1)-sphere of radius r. Let A ⊆ Lm be an arbitrary subset on this shell with volume m |A| = 2 2 log 2πeRsin2 θ . We say A is spherically symmetric about a pole z0 ∈ Sm−1 Rm if for any r ∈ [Rm − ∆, Rm + ∆], the intersection A ∩ Sm−1 yields a spherical cap about the pole rz0 /Rm . In this case, denote by θr the angle r m−1 associated with each cap A ∩ Sr . Similarly as the proof of Lemma 3.1, we prove Lemma 5.2 in two steps. In particular, we will first demonstrate that the claim in Lemma 5.2 holds when A is known to be spherically symmetric, and then show that if it holds for a spherically symmetric A, then it holds for any arbitrary set A. i) Spherically symmetric A: Suppose A is spherically symmetric about pole z0 . With a slight abuse of notation, denote by Cap(y, ω) := {y0 ∈ Lm : ∠(y, y0 ) ≤ ω} a shell cap in Lm with pole y and angle ω. To show (48), it suffices to show that for any ω > π/2 − θ, ε > 0 and m sufficiently large, 1 2 2 Pr |A ∩ Cap(Y, ω)| ≥ 2m( 2 log 2πeR(sin θ−cos ω)−ε) > 1 − ε where Y is uniformly distributed over Lm . This is because the shell Cap(Y, ω) is always contained by the Euclidean Ball Y, 2(Rm + ∆)sin ω2 + 2∆ and thus ω A ∩ Cap(Y, ω) ⊆ A ∩ Ball Y, 2(Rm + ∆)sin + 2∆ . 2 Since the intersection volume |A ∩ Cap(y, ω)| depends only on the angle ∠(y, y0 ) and not on the magnitude of y, we can instead consider y to be uniformly drawn from the sphere Sm−1 Rm . For each m−1 y ∈ SRm , the intersection volume |A ∩ Cap(y, ω)| can be written as an integral over each sphere in the shell, i.e., Z |A ∩ Cap(y, ω)| = 1A∩Cap(y,ω) (z) dz Rm Z Rm +∆ Z = 1A∩Cap(y,ω) (z) dz dr. Rm −∆ Sm−1 r Recall that due to measure concentration, for any > 0 and m sufficiently large, we have Pr (∠(z0 , Y) ∈ [π/2 − /2, π/2 + /2]) ≥ 1 − . (50) This allows us to concentrate on characterizing |A ∩ Cap(y, ω)| for y’s in Sm−1 Rm with ∠(z0 , y) ∈ [π/2 − /2, π/2 + /2]. 
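For reference, the probability in (50) can be evaluated numerically from the fact that the angle ∠(z0, Y) of a uniformly drawn point on the sphere (and hence on the shell, whose angular marginal is the same) has density proportional to sin^{m-2}(t) on [0, π]. A minimal sketch with an illustrative value of the slack follows; it shows the probability approaching 1 as m grows.

```python
import numpy as np

def band_probability(m, eps, npts=200_001):
    # Pr(|angle(z0, Y) - pi/2| <= eps/2) for Y uniform on S^{m-1}: the angle has density
    # proportional to sin^{m-2}(t) on [0, pi]; integrate on a uniform grid (the step cancels).
    t = np.linspace(0.0, np.pi, npts)
    dens = np.sin(t) ** (m - 2)
    mask = np.abs(t - np.pi / 2) <= eps / 2
    return dens[mask].sum() / dens.sum()

eps = 0.2   # illustrative value of the slack in (50)
for m in (10, 50, 200, 1000, 5000):
    print(f"m = {m:5d}:  Pr(angle within pi/2 +/- eps/2) = {band_probability(m, eps):.6f}")
```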
25 In particular, for y0 ∈ Sm−1 Rm with ∠(z0 , y0 ) = π/2 and any r ∈ [Rm − ∆, Rm + ∆], it can be shown that Z 1A∩Cap(y0 ,ω) (z) dz (51) Sm−1 r 1 2 0 2 ≥ 2m( 2 log 2πeRr (sin θr −cos ω)−) 1 2 0 2 ≥ 2m( 2 log 2πeR (sin θr −cos ω)−) (52) (53) where ω > π/2 − θr and can be made arbitrarily small by choosing m to be sufficiently large. In the above, (52) follows from characterizing the surface area of the intersection of two spherical caps (see p 0 0 such that mRr = r, and (53) follows from lower bounding Rr0 by Appendix B), where Rr is defined √ 0 0 R with R being defined such that mR0 = Rm − ∆. We would like to integrate (53) over r in order to get a lower bound on the total intersection volume |A ∩ Cap(y0 , ω)|. However, (53) is only valid when θr + ω ≥ π2 , and when θr + ω < π2 , the slice intersection area (51) will be empty and does not contribute to the total intersection volume. This motivates us to set π θr0 = max θr , − ω 2 and write Z 1 2 0 0 2 1A∩Cap(y0 ,ω) (z)dz ≥ 2m( 2 log 2πeR (sin θr −cos ω)−) . (54) Sm−1 r We now rewrite the right-hand side of (54) in the following way. Let the function h : [0, 1] → R be defined by 2 m 0 2 h(t) = 2 2 log 2πeR (t m −cos ω) so that m h (sin m θr0 ) = 2 2 log 2πeR0 (sin2 θr0 −cos2 ω) . 2 By differentiating h(t) twice it can be seen that h is convex for t n ≥ cos2 ω. Integrating both sides of (54) yields that for any > 0 and m sufficiently large, Z Rm +∆ Z 1A∩Cap(y0 ,ω) (z)dz dr Rm −∆ Sm−1 r Z Rm +∆ ≥ h(sin m θr0 )2−m dr (55) Rm −∆ Z Rm +∆ 1 m 0 sin θr dr 2−m (56) ≥ 2∆ h 2∆ Rm −∆ Z Rm +∆ 1 −m m 0 ≥2 h sin θr dr (57) 2∆ Rm −∆ where (56) is due to Jensen’s inequality. On the other hand, we have 1 2 |A| = 2m( 2 log 2πeRsin θ) Z Rm +∆ 1 2 0 2m( 2 log 2πeRr sin θr +) dr ≤ R −∆ Z Rmm +∆ 1 2 00 ≤ 2m( 2 log 2πeR sin θr +) dr Rm −∆ Z Rm +∆ m( 12 log 2πeR00 +) =2 sinm θr dr Rm −∆ 26 where the first inequality follows from characterizing the surface area of a hyperspherical cap (see √ 00 00 Appendix A), and the second inequality follows by defining R to be such that mR = Rm + ∆. Therefore, we have Z Rm +∆ 2θ 1 1 m 12 log Rsin − m 00 R sin θr dr ≥ 2 2∆ Rm −∆ 2∆ 1 2 ≥ 2m( 2 log sin θ−2) , for any > 0 and m sufficiently large. Combining this with (57), we have for m sufficiently large, |A ∩ Cap(y0 , ω)| Z Rm +∆ 1 −m m 0 ≥2 h sin θr dr 2∆ Rm −∆ Z Rm +∆ 1 m −m sin θr dr ≥2 h 2∆ Rm −∆ m( 12 log sin2 θ−2) −m ≥2 h 2 m 0 2 −4 2 = 2−m 2 2 log 2πeR (sin θ·2 −cos 1 2 2 ≥ 2m( 2 log 2πeR(sin θ−cos ω)−1 ) ω) where , 1 → 0 as m → ∞. This together with (50) then yields that 1 2 2 Pr |A ∩ Cap(Y, ω)| ≥ 2m( 2 log 2πeR(sin θ−cos ω)−ε) ≥ 1 − ε, (58) where ε → 0 as m → ∞. ii) Arbitrary A: To extend the above result to an arbitrary A, we again use the Riesz rearrangement inequality in Lemma 3.2, similarly as in the proof of Lemma 3.1. In particular, to apply Lemma 3.2 to the shell situation we must find appropriate functions on the sphere. Note that Z |A ∩ Cap(y, ω)| = 1A∩Cap(y,ω) (z) dz Rm Z Rm +∆ Z m−1 = r 1A∩Cap(y,ω) (rz) dr dz Sm−1 1 so that if fA (z) = Rm −∆ Z Rm +∆ rm−1 1A (rz) dr Rm −∆ and then ( 0 ω<α≤π K(cos α) = 1 0≤α≤ω ψ(y) = |A ∩ Cap(y, ω)| = Z Sm−1 1 fA (z)K(hz, y/kyki)dz. Let ψ ∗ be the symmetric decreasing rearrangement of ψ about a pole z0 . The claim is that there exists a subset A0 ⊆ Lm with |A| = |A0 | which is spherically symmetric about z0 and such that fA0 = fA∗ . Explicitly, one example would be [ A0 = A0r r∈[Rm −∆,Rm +∆] Rr where A0r ⊆ Sm−1 is a spherical cap with the same area as {z : f (z) > d} for d = rm−1 dr. A r Rm −∆ 0 0 0 Let ψ (y) = |A ∩ Cap(y, ω)|. 
Both ψ and ψ depend only on the angle ∠(z0 , y) and are nonincreasing in that angle. Following the same argument from the proof of Lemma 3.1, one can establish (58) for any arbitrary A ⊆ Lm , which finishes the proof of Lemma 5.2. 27 A PPENDIX A S URFACE A REA AND VOLUME OF A H YPERSPHERICAL C AP In this Appendix we derive the surface area and volume formulas for a hyperspherical cap (see also [13]) and then characterize the exponents of these expressions. √ Let C ⊆ Sm−1 be a spherical cap with angle θ on the (m − 1)-sphere of radius Rm = mR. The area µ(C) of C can be written as Z θ Am−2 (Rm sin ρ)Rm dρ µ(C) = 0 where Am−2 (Rm sin ρ) is the total surface area of the (m − 2)-sphere of radius Rm sin ρ. Plugging in the expression for the surface area of an (m − 2)-sphere leads to Z θ m−1 m−2 2π 2 (mR) 2 µ(C) = sin m−2 ρ dρ. Γ m−1 0 2 m−1 2 We now characterize the exponent of µ(C). First, by Stirling’s approximation, Γ2πm−1 (mR) ( 2 ) above can be bounded as m−2 2 in the m−1 2 m [log(2πeR)−1 ] 2 m−2 m 2π 2 (mR) 2 ≤ 2 2 [log(2πeR)+1 ] ≤ m−1 Γ 2 for some 1 → 0 as m → ∞. Also for m sufficiently large, we have Z θ Z θ m−2 2 m−2 sin ρdρ = 2 2 log sin ρ dρ 0 Z0 θ m−2 2 2 2 log sin ρ dρ ≥ 1 θ− m 1 m−2 log sin2 (θ− 1 ) m 2 2 m m 2 ≥ 2 2 (log sin θ−2 ) ≥ and Z θ m−2 sin ρdρ = Z θ 2 m−2 2 log sin2 ρ dρ 0 0 m−2 2 ≤ θ · 2 2 log sin θ m 2 ≤ 2 2 (log sin θ+2 ) for some 2 → 0 as m → ∞. Therefore, the area µ(C) can be bounded as m 2 θ)−] 2 2 [log(2πeRsin m 2 θ)+] ≤ µ(C) ≤ 2 2 [log(2πeRsin (59) for some → 0 as m → ∞. To find the volume |C| of C, we integrate over the surface area of the spheres at each radius, which leads to Z θ Z Rm m−1 2π 2 m−2 |C| = sin ρ dρ rm−2 dr Γ m−1 0 0 2 Z θ m−1 2(πmR) 2 = sin m−2 ρ dρ. (m − 1) Γ m−1 0 2 28 Following the similar approach as above, we can also bound the volume |C| as m 2 θ)−] 2 2 [log(2πeRsin m 2 θ)+] ≤ |C| ≤ 2 2 [log(2πeRsin (60) for some → 0 as m → ∞. Let Sm−1 A PPENDIX B S URFACE A REA OF THE I NTERSECTION OF T WO H YPERSPHERICAL C APS √ ⊆ Rm be an (m − 1)-sphere of radius Rm = mR. Let Ci = Cap(vi , θi ) = {v ∈ Sm−1 |∠(v, vi ) ≤ θi }, i = 1, 2 be two spherical caps on Sm−1 such that ∠(v1 , v2 ) = π2 , θi < π2 , and θ1 + θ2 > π2 . We have the following lemma which characterizes the intersection area µ(C1 ∩ C2 ) of these two caps. Lemma B.1: There exists some which tends to 0 as m goes to infinity, such that m 2 θ −cos2 θ ))−] 1 2 m 2 θ −cos2 θ ))+] 1 2 µ(C1 ∩ C2 ) ≥ 2 2 [log(2πeR(sin and µ(C1 ∩ C2 ) ≤ 2 2 [log(2πeR(sin . To prove this lemma, below we will first derive the surface area formula for the intersection of the above two caps (see also [14]) and then characterize the exponent of this area. A. Deriving the Surface Area Formula Consider the points v ∈ Sm−1 such that ∠(v1 , v) = θ1 and ∠(v2 , v) = θ2 . These points satisfy the linear relations 2 hv1 , vi = Rm cos θ1 and 2 hv2 , vi = Rm cos θ2 , and therefore all such v lie in the unique m − 1 dimensional subspace H defined by v1 v2 − , v = 0. cos θ1 cos θ2 The angle between the hyperplane H and the vector v2 is π 1 v1 v2 q φ = − arccos − , v2 1 2 cos θ1 cos θ2 R + 1 m cos2 θ1 cos2 θ2 and because v1 and v2 are orthogonal and kv2 k = Rm , π 1 q φ = − arccos 1 2 cos θ + 1 2 cos2 θ1 cos2 θ2 = arctan cos θ1 cos θ2 . 29 The approach will be as follows. Divide the intersection C1 ∩ C2 into two parts C + and C − that are on either side of the hyperplane H. More concretely, v2 v1 + − ≥0 C = v ∈ C1 ∩ C2 v, cos θ1 cos θ2 and − C = v1 v2 − <0 . 
APPENDIX B
SURFACE AREA OF THE INTERSECTION OF TWO HYPERSPHERICAL CAPS

Let $S^{m-1}_{R_m} \subseteq \mathbb{R}^m$ be an $(m-1)$-sphere of radius $R_m = \sqrt{mR}$. Let
$$
C_i = \mathrm{Cap}(v_i, \theta_i) = \left\{v \in S^{m-1}_{R_m} : \angle(v, v_i) \leq \theta_i\right\}, \quad i = 1, 2,
$$
be two spherical caps on $S^{m-1}_{R_m}$ such that $\angle(v_1, v_2) = \frac{\pi}{2}$, $\theta_i < \frac{\pi}{2}$, and $\theta_1 + \theta_2 > \frac{\pi}{2}$. We have the following lemma, which characterizes the intersection area $\mu(C_1 \cap C_2)$ of these two caps.

Lemma B.1: There exists some $\epsilon$ which tends to $0$ as $m$ goes to infinity, such that
$$
\mu(C_1 \cap C_2) \geq 2^{\frac{m}{2}\left[\log\left(2\pi e R\left(\sin^2\theta_1 - \cos^2\theta_2\right)\right) - \epsilon\right]}
$$
and
$$
\mu(C_1 \cap C_2) \leq 2^{\frac{m}{2}\left[\log\left(2\pi e R\left(\sin^2\theta_1 - \cos^2\theta_2\right)\right) + \epsilon\right]}.
$$
To prove this lemma, below we first derive the surface area formula for the intersection of the above two caps (see also [14]) and then characterize the exponent of this area.

A. Deriving the Surface Area Formula

Consider the points $v \in S^{m-1}_{R_m}$ such that $\angle(v_1, v) = \theta_1$ and $\angle(v_2, v) = \theta_2$. These points satisfy the linear relations
$$
\langle v_1, v\rangle = R_m^2\cos\theta_1 \quad\text{and}\quad \langle v_2, v\rangle = R_m^2\cos\theta_2,
$$
and therefore all such $v$ lie in the unique $(m-1)$-dimensional subspace $H$ defined by
$$
\left\langle \frac{v_1}{\cos\theta_1} - \frac{v_2}{\cos\theta_2},\, v\right\rangle = 0.
$$
The angle between the hyperplane $H$ and the vector $v_2$ is
$$
\phi = \frac{\pi}{2} - \arccos\left(-\frac{1}{R_m^2\sqrt{\frac{1}{\cos^2\theta_1} + \frac{1}{\cos^2\theta_2}}}\left\langle \frac{v_1}{\cos\theta_1} - \frac{v_2}{\cos\theta_2},\, v_2\right\rangle\right),
$$
and because $v_1$ and $v_2$ are orthogonal and $\|v_2\| = R_m$,
$$
\phi = \frac{\pi}{2} - \arccos\left(\frac{1}{\cos\theta_2\sqrt{\frac{1}{\cos^2\theta_1} + \frac{1}{\cos^2\theta_2}}}\right) = \arctan\frac{\cos\theta_1}{\cos\theta_2}.
$$
The approach will be as follows. Divide the intersection $C_1 \cap C_2$ into two parts $C^+$ and $C^-$ that lie on either side of the hyperplane $H$. More concretely,
$$
C^+ = \left\{v \in C_1\cap C_2 : \left\langle v, \frac{v_1}{\cos\theta_1} - \frac{v_2}{\cos\theta_2}\right\rangle \geq 0\right\}
$$
and
$$
C^- = \left\{v \in C_1\cap C_2 : \left\langle v, \frac{v_1}{\cos\theta_1} - \frac{v_2}{\cos\theta_2}\right\rangle < 0\right\}.
$$
Each part $C^+$ and $C^-$ can be written as a union of lower dimensional spherical caps, and we will find the measure of each part by integrating the measures of these lower dimensional caps.

The measure of the cap $C_2$ can be expressed as the integral
$$
\mu(C_2) = \int_0^{\theta_2} A_{m-2}(R_m\sin\rho)\, R_m\, d\rho
$$
where $A_{m-2}(R_m\sin\rho)$ is the surface area of the $(m-2)$-sphere with radius $R_m\sin\rho$. If we consider a single $(m-2)$-sphere at some angle $\rho$, then the hyperplane $H$ divides that $(m-2)$-sphere into two spherical caps. The claim is that each of these $(m-2)$-dimensional caps that is on the side of $H$ containing $v_1$ is contained in $C^+$ (and those on the side containing $v_2$ are contained in $C^-$). Furthermore, all points in $C^+$ are in one of these $(m-2)$-dimensional caps. The claim follows because
$$
\left\langle v, \frac{v_1}{\cos\theta_1} - \frac{v_2}{\cos\theta_2}\right\rangle \geq 0
$$
implies
$$
\cos\theta_2\cos\angle(v, v_1) \geq \cos\theta_1\cos\angle(v, v_2),
$$
and since $\angle(v, v_2) \leq \theta_2$ and hence $\cos\angle(v, v_2) \geq \cos\theta_2$, this implies
$$
\cos\theta_2\cos\angle(v, v_1) \geq \cos\theta_1\cos\theta_2.
$$
Finally, this implies $\angle(v, v_1) \leq \theta_1$, $v \in C_1$, and therefore $v \in C^+$.

Note that for $\rho < \phi$, the $(m-2)$-sphere at angle $\rho$ is entirely on the $v_2$ side of $H$. This establishes the fact that
$$
\mu(C^+) = \int_\phi^{\theta_2} C_{m-2}^{\theta_\rho}(R_m\sin\rho)\, R_m\, d\rho
$$
where $C_{m-2}^{\theta_\rho}(R_m\sin\rho)$ is the surface area of an $(m-2)$-dimensional spherical cap defined by angle $\theta_\rho$ on the $(m-2)$-sphere of radius $R_m\sin\rho$. Writing
$$
\cos\theta_\rho = \frac{h}{R_m\sin\rho},
$$
note that $h$ is the distance from the center of the $(m-2)$-sphere at angle $\rho$ to the $(m-2)$-dimensional hyperplane that divides the sphere into two caps. Furthermore, since the $(m-2)$-sphere at angle $\rho$ has center $(R_m\cos\rho)\, v_2/\|v_2\|$, we have
$$
\tan\phi = \frac{h}{R_m\cos\rho}.
$$
Therefore, $\theta_\rho = \arccos\!\left(\frac{\tan\phi}{\tan\rho}\right)$.

Combining this with the corresponding result for $\mu(C^-)$ yields
$$
\mu(C_1\cap C_2) = \mu(C^+) + \mu(C^-) = \int_\phi^{\theta_2} C_{m-2}^{\arccos\left(\frac{\tan\phi}{\tan\rho}\right)}(R_m\sin\rho)\, R_m\, d\rho + \int_{\frac{\pi}{2}-\phi}^{\theta_1} C_{m-2}^{\arccos\left(\frac{\tan(\pi/2-\phi)}{\tan\rho}\right)}(R_m\sin\rho)\, R_m\, d\rho.
$$
This expression can be rewritten using known expressions for the area of a hyperspherical cap in terms of the regularized incomplete beta function as
$$
\mu(C_1\cap C_2) = J(\phi, \theta_2) + J(\pi/2 - \phi, \theta_1),
$$
where $J(\phi, \theta_2)$ is defined as
$$
J(\phi, \theta_2) = \frac{(\pi m R)^{\frac{m-1}{2}}}{\Gamma\!\left(\frac{m-1}{2}\right)}\int_\phi^{\theta_2} \left(\sin^{m-1}\rho\right) I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\frac{m-1}{2}, \frac{1}{2}\right) d\rho \qquad (61)
$$
and $J(\pi/2 - \phi, \theta_1)$ is defined similarly. Here in (61), $I_x(a, b)$ is the regularized incomplete beta function, given by
$$
I_x(a, b) = \frac{B(x; a, b)}{B(a, b)}, \qquad (62)
$$
where $B(x; a, b)$ and $B(a, b)$ are the incomplete beta function and the complete beta function, respectively:
$$
B(x; a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\, dt, \qquad B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.
$$
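Before characterizing the exponent, Lemma B.1 can be illustrated numerically directly from (61)–(62). The following sketch is not part of the proof: the values of $R$, $\theta_1$, $\theta_2$ are arbitrary test choices satisfying $\theta_i < \pi/2$ and $\theta_1 + \theta_2 > \pi/2$, and SciPy's `betainc` is used for $I_x(a,b)$. It evaluates $\frac{2}{m}\log_2\big(J(\phi,\theta_2)+J(\pi/2-\phi,\theta_1)\big)$ and compares it with $\log_2\big(2\pi e R(\sin^2\theta_1-\cos^2\theta_2)\big)$ for increasing $m$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betainc, gammaln

# Numerical sketch of Lemma B.1: evaluate J(phi, theta2) + J(pi/2 - phi, theta1)
# through (61)-(62) and compare the normalized exponent (2/m) * log2(.) with
# log2(2*pi*e*R*(sin(theta1)^2 - cos(theta2)^2)) as m grows.
# R, theta1, theta2 are arbitrary illustrative values with theta1 + theta2 > pi/2.
R, theta1, theta2 = 1.0, 1.2, 1.0
phi = np.arctan(np.cos(theta1) / np.cos(theta2))  # phi = arctan(cos(theta1)/cos(theta2))

def log2_J(m, phi_lo, theta_hi):
    # log2 of the prefactor (pi*m*R)^((m-1)/2) / Gamma((m-1)/2), computed in the log domain
    log2_pref = (0.5 * (m - 1) * np.log(np.pi * m * R) - gammaln(0.5 * (m - 1))) / np.log(2.0)

    # normalize the integrand by sin^(m-1)(theta_hi) so that it stays in [0, 1]
    def integrand(rho):
        x = 1.0 - (np.tan(phi_lo) / np.tan(rho)) ** 2
        x = min(max(x, 0.0), 1.0)  # guard against tiny negative values near rho = phi_lo
        return (np.sin(rho) / np.sin(theta_hi)) ** (m - 1) * betainc(0.5 * (m - 1), 0.5, x)

    norm_int, _ = quad(integrand, phi_lo, theta_hi)
    return log2_pref + (m - 1) * np.log2(np.sin(theta_hi)) + np.log2(norm_int)

target = np.log2(2 * np.pi * np.e * R * (np.sin(theta1) ** 2 - np.cos(theta2) ** 2))
for m in [20, 100, 400, 1600]:
    total = np.logaddexp2(log2_J(m, phi, theta2), log2_J(m, np.pi / 2 - phi, theta1))
    print(m, (2.0 / m) * total, "->", target)
```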
B. Characterizing the Exponent

We now lower and upper bound $J(\phi, \theta_2)$ by exponential functions. First, using Stirling's approximation, the factor $\frac{(\pi m R)^{\frac{m-1}{2}}}{\Gamma\left(\frac{m-1}{2}\right)}$ on the R.H.S. of (61) can be bounded as
$$
2^{\frac{m}{2}\left[\log(2\pi e R) - \epsilon_1\right]} \leq \frac{(\pi m R)^{\frac{m-1}{2}}}{\Gamma\!\left(\frac{m-1}{2}\right)} \leq 2^{\frac{m}{2}\left[\log(2\pi e R) + \epsilon_1\right]} \qquad (63)
$$
for some $\epsilon_1 \to 0$ as $m \to \infty$. Now consider
$$
I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\frac{m-1}{2}, \frac{1}{2}\right)
$$
inside the integral on the R.H.S. of (61). In light of (62), it can be written as
$$
I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\frac{m-1}{2}, \frac{1}{2}\right) = \frac{B\!\left(1-\left(\frac{\tan\phi}{\tan\rho}\right)^2; \frac{m-1}{2}, \frac{1}{2}\right)}{B\!\left(\frac{m-1}{2}, \frac{1}{2}\right)}. \qquad (64)
$$
For the denominator in (64), by Stirling's approximation we have
$$
B\!\left(\frac{m-1}{2}, \frac{1}{2}\right) \sim \Gamma\!\left(\frac{1}{2}\right)\left(\frac{m-1}{2}\right)^{-\frac{1}{2}}.
$$
For the numerator in (64), we have
$$
\begin{aligned}
B\!\left(1-\left(\tfrac{\tan\phi}{\tan\rho}\right)^2; \tfrac{m-1}{2}, \tfrac{1}{2}\right)
&= \int_0^{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2} t^{\frac{m-3}{2}}(1-t)^{-\frac{1}{2}}\, dt\\
&\geq \int_0^{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2} t^{\frac{m-3}{2}}\, dt\\
&= \frac{2}{m-1}\left[1-\left(\frac{\tan\phi}{\tan\rho}\right)^2\right]^{\frac{m-1}{2}}\\
&\geq 2^{\frac{m}{2}\left[\log\left(1-\left(\frac{\tan\phi}{\tan\rho}\right)^2\right) - \epsilon_2\right]}
\end{aligned}
$$
for some $\epsilon_2 \to 0$ as $m \to \infty$, and
$$
\begin{aligned}
B\!\left(1-\left(\tfrac{\tan\phi}{\tan\rho}\right)^2; \tfrac{m-1}{2}, \tfrac{1}{2}\right)
&= \int_0^{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2} t^{\frac{m-3}{2}}(1-t)^{-\frac{1}{2}}\, dt\\
&\leq \int_0^{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2} t^{\frac{m-3}{2}}\left(1-\left(1-\left(\frac{\tan\phi}{\tan\rho}\right)^2\right)\right)^{-\frac{1}{2}} dt\\
&= \frac{\tan\rho}{\tan\phi}\int_0^{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2} t^{\frac{m-3}{2}}\, dt\\
&\leq \frac{\tan\theta_2}{\tan\phi}\int_0^{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2} t^{\frac{m-3}{2}}\, dt\\
&= \frac{2\tan\theta_2}{(m-1)\tan\phi}\left[1-\left(\frac{\tan\phi}{\tan\rho}\right)^2\right]^{\frac{m-1}{2}}\\
&\leq 2^{\frac{m}{2}\left[\log\left(1-\left(\frac{\tan\phi}{\tan\rho}\right)^2\right) + \epsilon_3\right]}
\end{aligned}
$$
for some $\epsilon_3 \to 0$ as $m \to \infty$. Also, noting that
$$
\sin^{m-1}\rho = 2^{\frac{m-1}{2}\log\sin^2\rho}
$$
with $\rho \in [\phi, \theta_2]$, we can bound the integrand in (61) as
$$
\left(\sin^{m-1}\rho\right) I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\frac{m-1}{2}, \frac{1}{2}\right) \geq 2^{\frac{m}{2}\left[\log\left(\sin^2\rho\left(1-\left(\frac{\tan\phi}{\tan\rho}\right)^2\right)\right) - \epsilon_4\right]} = 2^{\frac{m}{2}\left[\log\left(\sin^2\rho - \tan^2\phi\cos^2\rho\right) - \epsilon_4\right]}
$$
and
$$
\left(\sin^{m-1}\rho\right) I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\frac{m-1}{2}, \frac{1}{2}\right) \leq 2^{\frac{m}{2}\left[\log\left(\sin^2\rho - \tan^2\phi\cos^2\rho\right) + \epsilon_4\right]}
$$
for some $\epsilon_4 \to 0$ as $m \to \infty$. For sufficiently large $m$,
$$
\begin{aligned}
\int_\phi^{\theta_2}\left(\sin^{m-1}\rho\right) I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\tfrac{m-1}{2}, \tfrac{1}{2}\right) d\rho
&\geq \int_{\theta_2-\frac{1}{m}}^{\theta_2}\left(\sin^{m-1}\rho\right) I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\tfrac{m-1}{2}, \tfrac{1}{2}\right) d\rho\\
&\geq \int_{\theta_2-\frac{1}{m}}^{\theta_2} 2^{\frac{m}{2}\left[\log\left(\sin^2\rho - \tan^2\phi\cos^2\rho\right) - \epsilon_4\right]} d\rho\\
&\geq \frac{1}{m}\, 2^{\frac{m}{2}\left[\log\left(\sin^2\left(\theta_2-\frac{1}{m}\right) - \tan^2\phi\cos^2\left(\theta_2-\frac{1}{m}\right)\right) - \epsilon_4\right]}\\
&\geq 2^{\frac{m}{2}\left[\log\left(\sin^2\theta_2 - \tan^2\phi\cos^2\theta_2\right) - \epsilon_5\right]}\\
&= 2^{\frac{m}{2}\left[\log\left(\sin^2\theta_2 - \cos^2\theta_1\right) - \epsilon_5\right]}
\end{aligned}
$$
and
$$
\int_\phi^{\theta_2}\left(\sin^{m-1}\rho\right) I_{1-\left(\frac{\tan\phi}{\tan\rho}\right)^2}\!\left(\frac{m-1}{2}, \frac{1}{2}\right) d\rho \leq 2^{\frac{m}{2}\left[\log\left(\sin^2\theta_2 - \cos^2\theta_1\right) + \epsilon_5\right]}
$$
for some $\epsilon_5 \to 0$ as $m \to \infty$. Combining this with (63), we can bound $J(\phi, \theta_2)$ as
$$
2^{\frac{m}{2}\left[\log 2\pi e R\left(\sin^2\theta_2 - \cos^2\theta_1\right) - \epsilon_6\right]} \leq J(\phi, \theta_2) \leq 2^{\frac{m}{2}\left[\log 2\pi e R\left(\sin^2\theta_2 - \cos^2\theta_1\right) + \epsilon_6\right]}
$$
for some $\epsilon_6 \to 0$ as $m \to \infty$. Due to symmetry, we can also bound $J(\pi/2 - \phi, \theta_1)$ as
$$
2^{\frac{m}{2}\left[\log 2\pi e R\left(\sin^2\theta_1 - \cos^2\theta_2\right) - \epsilon_6\right]} \leq J(\pi/2 - \phi, \theta_1) \leq 2^{\frac{m}{2}\left[\log 2\pi e R\left(\sin^2\theta_1 - \cos^2\theta_2\right) + \epsilon_6\right]}.
$$
Noting that $\sin^2\theta_2 - \cos^2\theta_1 = \sin^2\theta_1 - \cos^2\theta_2$, we have
$$
\mu(C_1\cap C_2) = J(\phi, \theta_2) + J(\pi/2 - \phi, \theta_1) \geq 2^{\frac{m}{2}\left[\log 2\pi e R\left(\sin^2\theta_1 - \cos^2\theta_2\right) - \epsilon\right]}
$$
and
$$
\mu(C_1\cap C_2) \leq 2^{\frac{m}{2}\left[\log 2\pi e R\left(\sin^2\theta_1 - \cos^2\theta_2\right) + \epsilon\right]}
$$
for some $\epsilon \to 0$ as $m \to \infty$. This completes the proof of the lemma.

ACKNOWLEDGEMENT

The authors would like to acknowledge inspiring discussions with Liang-Liang Xie within a preceding collaboration [4].

REFERENCES

[1] X. Wu, L. Barnes, and A. Ozgur, "Cover's open problem: 'The capacity of the relay channel'," in Proc. 54th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2016.
[2] T. M. Cover, "The capacity of the relay channel," in Open Problems in Communication and Computation, T. M. Cover and B. Gopinath, Eds. New York: Springer-Verlag, 1987, pp. 72–73.
[3] X. Wu and A. Ozgur, "Improving on the cut-set bound via geometric analysis of typical sets," in Proc. 2016 International Zurich Seminar on Communications.
[4] X. Wu, A. Ozgur, and L.-L. Xie, "Improving on the cut-set bound via geometric analysis of typical sets," submitted to IEEE Trans. Inform. Theory. Available: http://arxiv.org/abs/1602.08540
[5] X. Wu and A. Ozgur, "Cut-set bound is loose for Gaussian relay networks," in Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sept. 29–Oct. 1, 2015.
[6] X. Wu and A. Ozgur, "Cut-set bound is loose for Gaussian relay networks," submitted to IEEE Trans. Inform. Theory. Available: http://arxiv.org/abs/1606.01374
[7] X. Wu and A. Ozgur, "Improving on the cut-set bound for general primitive relay channels," in Proc. IEEE Int. Symposium on Information Theory, 2016.
[8] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inform. Theory, vol. 25, pp. 572–584, 1979.
[9] G. Schechtman, "Concentration, results and applications," in Handbook of the Geometry of Banach Spaces, vol. 2, pp. 1603–1634, North-Holland, Amsterdam, 2003.
[10] C. E. Shannon, "Communication in the presence of noise," Proc. IRE, vol. 37, pp. 10–21, Jan. 1949.
[11] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pt. I, pp. 379–423, and pt. II, pp. 623–656, 1948.
[12] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. New York, NY, USA: Wiley, 2006.
[13] S. Li, "Concise formulas for the area and volume of a hyperspherical cap," Asian Journal of Mathematics and Statistics, vol. 4, pp. 66–70, 2011.
[14] Y. Lee and W. C. Kim, "Concise formulas for the surface area of the intersection of two hyperspherical caps," KAIST Technical Report, 2014.
[15] A. Baernstein II and B. A. Taylor, "Spherical rearrangements, subharmonic functions, and ∗-functions in n-space," Duke Mathematical Journal, vol. 43, no. 2, pp. 245–268, 1976.
[16] J. A. Thomas, "Feedback can at most double Gaussian multiple access channel capacity," IEEE Trans. Inform. Theory, vol. IT-33, no. 5, pp. 711–716, Sep. 1987.