Cover`s Open Problem: “The Capacity of the

1
Cover’s Open Problem:
“The Capacity of the Relay Channel”
Xiugang Wu, Leighton Pate Barnes and Ayfer Özgür
Abstract
Consider a memoryless relay channel, where the channel from the relay to the destination is an isolated bit
pipe of capacity C0 . Let C(C0 ) denote the capacity of this channel as a function of C0 . What is the critical value
of C0 such that C(C0 ) first equals C(∞)? This is a long-standing open problem posed by Cover and named “The
Capacity of the Relay Channel,” in Open Problems in Communication and Computation, Springer-Verlag, 1987.
In this paper, we answer this question in the Gaussian case and show that C(C0 ) can not equal to C(∞) unless
C0 = ∞, regardless of the SNR of the Gaussian channels, while the cut-set bound would suggest that C(∞) can
be achieved at finite C0 . Our approach is geometric and relies on a strengthening of the isoperimetric inequality
on the sphere by using the Riesz rearrangement inequality.
I. P ROBLEM S ETUP AND M AIN R ESULT
In 1987, Thomas M. Cover asked the following question which he called “The Capacity of the Relay
Channel” [2].
A. The Capacity of the Relay Channel1
Consider the following seemingly simple discrete memoryless relay channel: Here Z and Y are condi-
C0
tionally independent and conditionally identically distributed given X, that is, p(z, y|x) = p(z|x)p(y|x).
Also, the channel from Z to Y does not interfere with Y . A (2nR , n) code for this channel is a map
X n : [1 : 2nR ] → X n , a relay function fn : Z n → [1 : 2nC0 ] and a decoding function gn : Y n × [1 :
2nC0 ] → [1 : 2nR ]. The probability of error is given by
Pe(n) = Pr(gn (Y n , fn (Z n )) 6= M ),
where the message M is uniformly distributed over [1 : 2nR ] and
n
n
−nR
p(m, y , z ) = 2
n
Y
i=1
p(yi |xi (m))
n
Y
i=1
p(zi |xi (m)).
Let C(C0 ) be the supremum of achievable rates R for a given C0 , that is, the supremum of the rates R
(n)
for which Pe can be made to tend to zero. We note the following facts:
The work was supported in part by NSF award CCF-1514538 and by the Center for Science of Information (CSoI), an NSF Science
and Technology Center, under grant agreement CCF-0939370. This paper was presented in part at the 2016 Allerton Conference on
Communication, Control, and Computing [1].
X. Wu, L. P. Barnes and A. Özgür are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
(e-mail: [email protected]; [email protected]; [email protected]).
1
This subsection is taken verbatim from [2] with a few notation changes.
2
1. C(0) = supp(x) I(X; Y ).
2. C(∞) = supp(x) I(X; Y, Z).
3. C(C0 ) is a nondecreasing function of C0 .
What is the critical value of C0 such that C(C0 ) first equals C(∞)?
B. Main Result
In this paper, we answer this long-standing open question in the Gaussian case. In particular, consider
the symmetric Gaussian relay channel as depicted in Fig. 1, where
Z = X + W1
Y = X + W2
with the transmitted signal being constrained to average power P , i.e.2 ,
nR
E[kX n k2 ] =
2
1 X
2nR
m=1
kxn (m)k2 ≤ nP,
(1)
and W1 , W2 ∼ N (0, N ) representing Gaussian noises that are independent of each other and X.
Z
W1 ⇠ N (0, N )
W2 ⇠ N (0, N )
C0
X
Y
Fig. 1. Symmetric Gaussian relay channel.
For this channel it is easy to observe that
1
2P
C(∞) = log 1 +
.
2
N
Let
C0∗ := inf{C0 : C(C0 ) = C(∞)}.
(2)
The cut-set bound [8] yields the following lower bound on C0∗ :
1
2P
1
P
∗
C0 ≥ log 1 +
− log 1 +
,
2
N
2
N
which may lead one to suspect that C(∞) could be achieved at finite C0 . The main result of our paper
is to show that C0∗ = ∞ regardless of the parameters of the problem.
Theorem 1.1: For the symmetric Gaussian relay channel depicted in Fig. 1, C0∗ = ∞.
2
This constraint is less stringent than requiring kxn (m)k2 ≤ nP, ∀m ∈ [1 : 2nR ], which is a more standard way of expressing the average
power constraint for the AWGN channel. Note that the capacity can be only larger under (1) and therefore conclusions of Theorems 1.1 and
1.2 are also valid under the individual power constraint for the codewords, i.e. kxn (m)k2 ≤ nP, ∀m ∈ [1 : 2nR ].
3
This theorem follows immediately from the following theorem which establishes an upper bound on
the capacity of this channel for any C0 .
Theorem 1.2: For the symmetric Gaussian relay channel depicted in Fig. 1, the capacity C(C0 ) satisfies

P
1


(3)

 C(C0 ) ≤ 2 log 1 + N + C0 + log sin θ
1
P


hθ (ω),
(4)

 C(C0 ) ≤ 2 log 1 + N + ω∈ min
( π2 −θ, π2 ]
for some θ ∈ arcsin 2−C0 , π2 , where
SNR = -15 dB
!
2ω
4sin 2 (P + N − N sin2 ω2 )sin 2 θ
1
hθ (ω) = log
.
2
(P + N )(sin2 θ − cos2 ω)
0.045
Cut-set bound
C-F
Old bound
New bound
0.04
0.035
P
of
In Fig. 2 we plot this upper bound (label: New bound) under three different values of the SNR = N
the Gaussian channels together with the celebrated cut-set bound [8] and an upper bound on the capacity
of this channel we have previously derived in [6] (label: Old bound). For reference, we also provide
the rate achieved by a compress-and-forward relay strategy (label: C-F), which employs Gaussian input
distribution at the source combined with Gaussian quantization and Wyner-Ziv binning at the relay.3 Note
that from these figures one can visually observe that the new upper bound reaches the value C(∞) only
as C0 → ∞, which leads to the conclusion in Theorem 1.1. This is formally proved in the next section.
0.03
0.025
0.02
SNR = -15 dB
0.045
0.005
0.01
0.015 0.02 0.025 0.03
C0 (bit/channel use)
0.035
0.04
0.045
SNR = 0 dB
0.8
Cut-set bound
C-F
Old bound
New bound
0.04
0
0.75
SNR = 15 dB
3.1
Cut-set bound
C-F
Old bound
New bound
Cut-set bound
C-F
Old bound
New bound
3
0.7
2.9
0.65
2.8
0.6
2.7
0.55
2.6
0.035
0.03
0.025
0.02
0
0.005
0.01
0.015 0.02 0.025 0.03
C0 (bit/channel use)
0.035
0.04
0.045
0.5
0
0.1
0.2
0.3
0.4
C0 (bit/channel use)
0.5
0.6
2.5
0
0.1
0.2
0.3
0.4
0.5
0.6
C0 (bit/channel use)
0.7
0.8
0.9
1
Fig. 2. Upper bounds and achievable rates for the Gaussian relay channel.
SNR = 0 dB
0.8
Cut-set bound
C-F
Old bound
New bound
0.75
C. Approach
Our approach builds on the method we developed in our earlier work [3]–[7] for characterizing
information tensions in a Markov chain by using high-dimensional geometry. The main idea is to study the
geometry of the high-dimensional typical sets associated with the random variables in the Markov chain
and then translate this high-dimensional geometry to information inequalities for the random variables.
This idea applies equally well to single-letter and multi-letter random variables. The main geometric tool
employed in our previous work [3]–[7] was the so-called blowing-up lemma. In the current paper, our main
geometric ingredient is a strengthening of the isoperimetric inequality on a high-dimensional sphere, which
we develop by building on the Riesz rearrangement inequality [15]. The classical isoperimetric inequality
on the sphere states that among all sets on the sphere with a given volume the spherical cap has the
smallest boundary or more generally the smallest volume of neighborhood [9]. In this paper, we show
0.7
0.65
0.6
0.55
0.5
0
0.1
0.2
0.3
0.4
C0 (bit/channel use)
0.5
0.6
3
This is not the best state-of-the-art achievable rate as this rate can be improved, for example, by using bursty transmissions and time-sharing
at low SNR.
4
that the spherical cap is the extremal set not only in terms of minimizing the volume of its neighborhood,
but roughly speaking also in terms of minimizing its total intersection volume with a ball drawn around
a randomly chosen point on the sphere.
It may be a priori surprising that the isoperimetric inequality appears as the main technical ingredient
in the solution of a network information theory problem. However, a converse can be thought of as
characterizing the extremal configuration of the typical sets of the random variables associated with an
information theory problem, i.e., the configuration that is induced by the (extremal) capacity-achieving
strategy. In this sense, it is quite natural that a tool, such as the isoperimetric inequality, which characterizes
extremal sets in a certain geometric sense, turns out to be useful.
Formulating the problem of determining the communication capacity of channels as a problem in
high-dimensional geometry is one of Shannon’s most important insights that has led to the conception
of information theory. His second paper [10], which appears couple of months after his classical paper
“A Mathematical Theory of Communication” [11] but is cited in this first paper, develops a geometric
representation of any point-to-point communication system. It then provides an elegant and intuitive
geometric proof of the coding theorem for the AWGN channel, where the converse is based on a spherepacking argument in high-dimensional space and achievability is proved by a geometric random coding
argument. However, to the best of our knowledge such techniques have not been used effectively for
solving network problems. Our approach is similar to Shannon’s approach in [10] in that the key step in
our proof is a packing argument on a spherical cap. However, it is also different from Shannon’s approach
as we do not directly study the geometry of the codewords but rather use high-dimensional geometry to
characterize information tensions in a Markov chain by a lifting step to a high-dimensional space. We
believe this approach can be useful for solving other open problems in network information theory.
II. P ROOFS OF T HEOREMS 1.1 AND 1.2
The proofs of both theorems follow from the below lemma, which is the main technical focus of this
paper. The proof idea for this lemma is outlined in Section II-C and a formal proof is given in Section
IV. We now state this lemma and show how it leads to the conclusion in Theorem 1.2, which is then
used to prove Theorem 1.1.
Lemma 2.1: Let In be an integer random variable and X n , Y n and Z n be n-length random vectors
which form the Markov chain In − Z n − X n − Y n . Assume moreover that Z n and Y n are i.i.d. white
Gaussian vectors given X n , i.e. Z n , Y n ∼ N (X n , N In×n ), where In×n denotes the identity matrix, and
E[kX n k2 ] = nP , and In = fn (Z n ) is a deterministic mapping of Z n to a set of integers.
Let H(In |X n ) be denoted by −n log sin θn , i.e., define4
1
n
θn := arcsin 2− n H(In |X ) .
Then the following inequality holds for any n,
1
H(In |Y n ) ≤ n ·
min
log
ω∈( π2 −θn , π2 ] 2
4sin2 ω2 (P + N − N sin2 ω2 )
(P + N )(sin2 θn − cos2 ω)
(5)
!
.
(6)
Note that the lemma provides an upper bound on H(In |Y n ) in terms of H(In |X n ) for a Markov chain
that satisfies the conditions of the lemma.
A. Proof of Theorem 1.2
Suppose a rate R is achievable. Then there exists a sequence of (2nR , n) codes such that the average
(n)
probability of error Pe → 0 as n → ∞. Let the relay’s transmission be denoted by In = fn (Z n ). By
4
Note that values of θn between 0 and π/2 span all possible values for H(In |X n ) between 0 and ∞.
5
standard information theoretic arguments, for this sequence of codes we have
nR = H(M )
= I(M ; Y n , In ) + H(M |Y n , In )
≤ I(X n ; Y n , In ) + nµ
= I(X n ; Y n ) + I(X n ; In |Y n ) + nµ
= I(X n ; Y n ) + H(In |Y n ) − H(In |X n ) + nµ
≤ nI(XQ ; YQ ) + H(In |Y n ) − H(In |X n ) + nµ
P
n
+ H(In |Y n ) − H(In |X n ) + nµ,
≤ log 1 +
2
N
(7)
(8)
(9)
(10)
for any µ > 0 and n sufficiently large. In the above, (7) follows from applying the data processing
inequality to the Markov chain M −X n −(Y n , In ) and Fano’s inequality, (8) uses the fact that In −X n −Y n
form a Markov chain and thus H(In |X n , Y n ) = H(In |X n ), (9) follows by defining the time sharing
random variable Q to be uniformly distributed over [1 : n], and (10) follows because
nR
E[XQ2 ]
nR
2
n
2
1 X1X 2
1 1 X n
kx (m)k2 ≤ P.
xi (m) =
= nR
nR
2 m=1 n i=1
n 2 m=1
(11)
Given (10), the standard way to proceed would be to upper bound the first entropy term by H(In |Y n ) ≤
H(In ) ≤ nC0 and lower bound the second entropy term H(In |X n ) simply by 0. This would lead to the socalled multiple-access bound in the well-known cut-set bound on the capacity of this channel [8]. However
as we already point out in our previous works [3]–[7], this leads to a loose bound since it does not capture
the inherent tension between how large the first entropy term can be and how small the second one can
be. Instead, we can use Lemma 2.1 to more tightly upper bound the difference H(In |Y n ) − H(In |X n ) in
(10).
We start by verifying that the random variables In , X n , Z n and Y n associated with a code of blocklength
n satisfy the conditions in the lemma. It is trivial to observe that they satisfy the required Markov chain
condition and Z n and Y n are i.i.d. Gaussian given X n due to the channel structure. Note also that without
loss of generality we can assume that the code satisfies the average power constraint in (1) with equality,
i.e.,
2nR
1 X
n 2
E[kX k ] = nR
kX n (m)k2 = nP.
2 m=1
(n)
This is because given a (2nR , n) code with average probability
of error Pe and E[kX n k2 ] = nP 0 < nP ,
p
we can always scale up the codewords by a factor of nP/nP 0 and achieve an average probability of
(n)
error smaller than or equal to Pe .5
Therefore, applying Lemma 2.1 to the random variables associated with a code for the relay channel,
we can bound the difference of the two entropy terms in (10) and conclude that for any achievable rate
R,
1
P
R ≤ log 1 +
+
min hθn (ω) + µ,
(12)
2
N
ω∈( π2 −θn , π2 ]
where hθn (ω) is defined as
1
hθn (ω) = log
2
4sin2 ω2 (P + N − N sin2 ω2 )sin 2 θn
(P + N )(sin2 θn − cos2 ω)
!
,
(13)
5
This can be done for example by adding additional independent Gaussian noise at the relay and the destination to emulate the transmission
of the original codeword.
6
1
in which θn := arcsin 2− n H(In |X
n)
satisfies
1
θ0 := arcsin 2−C0 ≤ arcsin 2− n H(In |X
n)
= θn ≤
π
.
2
At the same time, for any achievable rate R, we also have
P
1
+ C0 + log sin θn + µ,
R ≤ log 1 +
2
N
(14)
(15)
which simply follows from (10) by upper bounding H(In |Y n ) with nC0 and plugging in the definition of
θn . Therefore, if a rate R is achievable then for any µ > 0 and n sufficiently large it should simultaneously
satisfy both (12) and (15) for some θn that satisfies the condition in (14). This concludes the proof of the
theorem.
B. Proof of Theorem 1.1
In order to prove that Theorem 1.1 follows from Theorem 1.2, consider the second bound (4) on C(C0 )
in Theorem 1.2. Since the θ in (4) satisfies θ ≥ arcsin 2−C0 := θ0 , we can upper bound the right-hand
side of (4) to obtain
P
1
C(C0 ) ≤ log 1 +
+ min hθ (ω),
2
N
ω∈( π2 −θ0 , π2 ]
for some θ ∈ [θ0 , π/2]. Also because given any fixed ω ∈ π2 − θ0 , π2 , hθ (ω) ≤ hθ0 (ω), ∀θ ∈ [θ0 , π/2],
we further have
P
1
(16)
C(C0 ) ≤ log 1 +
+ min hθ0 (ω).
2
N
ω∈( π2 −θ0 , π2 ]
The significance of the function hθ0 (ω) is that for any θ0 > 0,
π 1
2P + N
hθ0
= log
,
2
2
P +N
and hθ0 (ω) is increasing at π2 , or more precisely,
π P
=
> 0.
h0θ0
2
(2P + N ) ln 2
(17)
Therefore, as long as θ0 > 0, which is the case when C0 is finite, the minimization of hθ0 (ω) with respect
to ω in (16) yields a value strictly smaller than hθ0 π2 in (17). This would
allow us to conclude that the
capacity C(C0 ) for any finite C0 is strictly smaller than 12 log 1 + 2P
.
N
We now formalize the above argument. Using the definition of the derivative, one obtains
π
π
π h
−
h
−
∆
θ
θ
0
0
2
2
h0θ0
= lim
.
∆→0
2
∆
Therefore, there exists a sufficiently small ∆1 > 0 such that 0 < ∆1 < θ0 and
h π − h π − ∆ π h0 π θ0 2
1
θ0 2
θ
− h0θ0
≤ 0 2 .
∆1
2
2
For such ∆1 we have
hθ0
π
2
− ∆1
∆1 h0θ0 π2
≤ hθ0
−
2
2
2P + N
P ∆1
1
,
= log
−
2
P +N
2(2P + N ) ln 2
π 7
Zn
Set of Z n /Y n jointly
typical with X n
Xn
p
nN
Yn
2P + N
P +N
Fig. 3. Jointly typical set with X n .
which further implies that
1
min hθ0 (ω) ≤ log
2
ω∈( π2 −θ0 , π2 ]
−
P ∆1
.
2(2P + N ) ln 2
(18)
Combining (16) and (18) we obtain that for any finite C0 , there exists some ∆1 > 0 such that
1
2P
P ∆1
C(C0 ) ≤ log 1 +
.
−
2
N
2(2P + N ) ln 2
This proves Theorem 1.1.
C. Proof Outline for Lemma 2.1
Recall that Lemma 2.1 bounds H(In |Y n ) in terms of H(In |X n ) in a Markov chain In − Z n − X n − Y n ,
where Z n and Y n are i.i.d. Gaussian vectors given X n , E[kX n k2 ] = nP and In = fn (Z n ) is a deterministic
mapping of Z n to a set of integers. As a preliminary exercise to check that fixing H(In |X n ) indeed induces
an upper bound on H(In |Y n ), one can verify that if H(In |X n ) = 0 then H(In |Y n ) = 0. One (heuristic)
way to see this (that can be made precise) is as follows: H(In |X n ) = 0 implies that given the transmitted
codeword X n , there is no ambiguity about In , or equivalently all Z n sequences jointly typical with X n
are mapped to the same In . See Figure 3. However, since Y n and Z n are statistically equivalent given
X n (they share the same typical set given X n ) this would imply that In can be also determined based on
Y n and therefore H(In |Y n ) = 0.
Following a similar line of thought, if H(In |X n ) is fixed to a certain non-zero value, say H(In |X n ) =
−n log sin θn , this roughly speaking implies that the typical Z n ’s surrounding an X n are now mapped to
multiple In values. This argument can be made precise as follows: Consider the following B-length i.i.d.
sequence
{(X n (b), Y n (b), Z n (b), In (b))}B
b=1 ,
(19)
where for any b ∈ [1 : B], (X n (b), Y n (b), Z n (b), In (b)) has the same distribution as (X n , Y n , Z n , In ). For
notational convenience, in the sequel we write the B-length sequence [X n (1), X n (2), . . . , X n (B)] as X
and similarly define Y, Z and I; note that here we have
I = [fn (Z n (1)), fn (Z n (2)), . . . , fn (Z n (B))] =: f (Z).
Now we can apply a standard typicality argument to say that for any typical (x, i) pair,6
.
p(i|x) = Pr(f (Z) = i|x) = 2nB log sinθn .
6
.
Following [12], we say am = bm if limm→0
1
m
log
am
bm
.
.
= 0. Notations “≥” and “≤” are similarly defined.
(20)
8
z0
✓n
Cap(z0 , ✓n )
p
nBN
x
Fig. 4. A spherical cap with angle θn .
This probabilistic statement can be translated into the following geometric picture: Given x, typical y
and z sequences
√ will be approximately uniformly distributed on an -thin spherical shell centered at x
and of radius nBN , denoted as
p
n
hp
io
p
p
nB
Shell x, nB(N − ), nB(N + ) := a ∈ R : ka − xk ∈
nB(N − ), nB(N + )
where → 0 as B → ∞. The relation (20) can then be used to argue that the set of z’s jointly typical
with x that are mapped to the given i, denoted by
p
p
Ax (i) = z ∈ Shell x, nB(N − ), nB(N + ) : f (z) = i ,
will occupy a volume
1
2
.
|Ax (i)| = 2nB ( 2 log 2πeN sin θn ) ,
(21)
on this thin shell. This translation between probabilities and volumes of sets is immediate since y and z
are distributed approximately uniformly on the shell.
Assume
that the set Ax (i) were a spherical cap as illustrated in Fig. 4. In general, a spherical cap
now
p
p
on Shell x, nB(N − ), nB(N + ) can be defined as a ball in terms of the geodesic metric, or
simply the angle:
y·z
∠(y, z) = arccos
kykkzk
on the shell, i.e.,
p
p
Cap(z0 , φ) = z ∈ Shell x, nB(N − ), nB(N + ) : ∠(z0 , z) ≤ φ ,
where we will refer to z0 as the pole and φ as the angle of the cap. Using the volume formula for
the hyperspherical cap and characterizing the exponent of such a volume (c.f. Appendix A), it can be
shown that the volume in (21) would correspond to an angle of θn for the spherical cap as the thickness
of the shell tends to zero. Now, a straightforward computation would yield the following result: Let
Vn = |Cap(z0 , θn ) ∩ Cap(y0 , ωn )| where ∠(z0 , y0 ) = π/2 and θn + ωn > π/2. Then,
.
Pr |Ax (i) ∩ Cap(Y, ωn )| ≥ Vn x → 1 as B → ∞.
(22)
In words, if we take a y uniformly at random on the shell and draw a spherical cap centered at y with
angle ωn > π/2−θn , then with high probability the intersection volume of this cap with the cap Ax (i) will
be approximately lower bounded by Vn . This statement follows from the (unthinkable in low-dimensions)
9
Cap(z0 , ✓n )
p
z0
nBN
x
Cap(z0 , ✓n ) \ Cap(y0 , !n )
✓n
!n
y0
Cap(y0 , !n )
Fig. 5. Intersection of two spherical caps.
fact that in high dimensions most of the volume of the shell is concentrated around the equator (any
equator), and in particular the equator at angle π/2 from the pole of Ax (i). Therefore, as the dimension
nB gets large, for almost all y’s, the intersection volume of the two spherical caps will be approximately
given by Vn (see Fig. 5), which can be shown to be
1
2
2
.
V = 2nB ( 2 log 2πeN (sin θn −cos ωn )) ,
n
by using the volume formula for the intersection of two hyperspherical caps and characterizing the
exponent of this volume (c.f. Appendix B).7
One of the main technical steps in our proof is to show that the statement (22) holds for any arbitrary
set Ax (i) with volume given in (21), not only when Ax (i) is a spherical cap as we assumed above.
Note that this can be regarded as an extension of the classical isoperimetric inequality on the sphere,
which states that among all sets on the sphere with a given volume, the spherical cap has the smallest
boundary, or more generally the smallest volume of neighborhood. Another way to interpret the classical
isoperimetric inequality is the following: given an arbitrary set A on the sphere, if we take a random
point on the same sphere and draw a ball of certain radius around it, the probability that this ball touches
A is at least as large as the same probability when A is a spherical cap of the same volume. Proving
that (22) holds for any set amounts to saying that if we take a random point on the sphere and draw a
ball of given radius, with high probability the intersection of the ball with the set A would be at least
as large as the intersection we would get if A were a spherical cap. Roughly speaking, it identifies the
spherical cap as the extremal set, not only for minimizing the volume of its neighborhood as done by the
classical isoperimetric inequality, but also the extremal set when one is interested in minimizing the total
intersection volume with A at given distance. We provide a more detailed discussion of this technical step
in Section III.
The above statement allows us to reach the following conclusion regarding the random vectors (I, X, Y)
with high probability: if we take Y and draw a Euclidean ball of radius
r
ωn
nBN 4sin2
(23)
2
around it, since a Euclidean ball of this radius includes the spherical cap of angle ωn in (22); see Fig. 6,
the volume of the intersection of the set A(I) with this ball is lower bounded by
r
.
A(I) ∩ Ball Y, nBN 4sin2 ωn ≥ 2nB ( 21 log 2πeN (sin2 θn −cos2 ωn )) ,
(24)
2 7
Precisely speaking, Appendix B provides the intersection area of two hyperspherical caps on a sphere rather than the intersection volume
of two caps on a shell. Note however that as the dimension grows the intersection volume and area, computed on the shell and on the sphere
respectively, approach the same in the exponent.
10
Y
Cap(Y, !n )
Fig. 6. Euclidean ball contains the cap.
x sphere
p
0
nBP
p
p
nBN
nB(P + N )
y/z sphere
Fig. 7. x sphere and y/z sphere.
where A(I) is defined as A(I) = z ∈ RnB : f (z) = I and Ball(c, r) denotes a ball centered at c with
radius r. This follows from (22), since (22) says that this property holds with high probability conditioned
on any x which is typical with (I, Y). In words, if we take a typical realization (i, y) of (I, Y) and draw
a ball of radius (23) around y, the volume of the set of points that are mapped to i in this ball is lower
bounded as in (24). This puts an upper limit on the number of possible values of i given y. To get a
tighter
bound, we can incorporate the fact that mostpof the x’s lie on a thin shell centered at 0 of radius
√
nBP , and y and z lie on a thin shell of radius nB(P + N ). See Fig. 7. Therefore the number of
possible values for I given Y can be bounded by the ratio of the spherical cap volume
r
p
p
p
ω
n
2
Shell 0, nB(P + N − ), nB(P + N + ) ∩ Ball
,
nB(P + N )e, nBN 4sin
2 where e is any arbitrary unit vector, to the volume each possible i occupies from this cap
1
2nB ( 2 log 2πeN (sin
2θ
2
n −cos
ωn ))
.
See Figure 8. This ratio can be shown to be
.
nB
≤2
1
2
log
4sin2 ω2n (P +N −N sin2 ω2n )
(P +N )(sin2 θn −cos2 ωn )
which in turn imposes the following bound on H(In |Y n ):
H(In |Y n ) ≤ n
,
4sin2 ω2n (P + N − N sin2 ω2n )
1
log
2
(P + N )(sin2 θn − cos2 ωn )
!
.
The upper bound (6) in Lemma 2.1 follows by noting that the above argument holds for any ωn > π/2−θn .
11
p
Fig. 8. A spherical cap with angle φ = 2 arcsin
q
nB(P + N )
N sin2 ω2n
P +N
r
nBN 4 sin2
!n
2
.
III. S TRENGTHENING THE I SOPERIMETRIC I NEQUALITY
We now provide a more detailed discussion of the main geometric tool we develop for proving Lemma
2.1.
As mentioned in the previous section, a key ingredient of our result is a strengthening of the isoperimetric
inequality on the sphere, which we obtain by building on the Riesz rearrangement inequality. The classical
isoperimetric inequality on the sphere states that among all sets on the sphere with a given volume,
the spherical cap has the smallest boundary, or more generally the smallest volume of neighborhood.
This can be formally stated as follows: Let Sm−1 ⊆ Rm denote the (m − 1)-sphere of radius R, i.e.
Sm−1 = {z ∈ Rm : kzk = R}, equipped with the rotation invariant (Haar) measure µ = µm−1 normalized
such that
m
2π 2 m−1
m−1
µ(S
)= m R
,
Γ( 2 )
i.e. the usual surface area, and let P(A) denote the probability of a set or event A with respect to the
corresponding Haar probability measure, i.e. normalized Haar measure such that P(Sm−1 ) = 1. A spherical
cap can be defined as a ball in the metric ∠(z, y) = arccos(hz/R, y/Ri) of Sm−1 , i.e.,
Cap(z0 , θ) = z ∈ Sm−1 : ∠(z0 , z) ≤ θ .
The isoperimetric inequality states that for any arbitrary set A ⊆ Sm−1 such that µ(A) = µ(C), where
C = Cap(z0 , θ) ⊆ Sm−1 is a spherical cap, it holds that
µ(At ) ≥ µ(Ct ), ∀t ≥ 0,
where At is the t-neighborhood of A, defined as
m−1
0
At = z ∈ S
: min
∠(z, z ) ≤ t ,
0
z ∈A
and similarly
m−1
0
Ct = z ∈ S
: min
∠(z, z ) ≤ t = Cap(z0 , θ + t).
0
z ∈C
The isoperimetric inequality above, combined with a standard computation, immediately yields the
following measure concentration result: Let A ⊆ Sm−1 be an arbitrary set and C = Cap(z0 , θ) ⊆ Sm−1
be a spherical cap such that µ(A) = µ(C). If this is the case, we will say that A has an effective angle
of θ. Then for any > 0 and m sufficiently large,
P(A π2 −θ+ ) ≥ 1 − .
(25)
12
This is known as the blowing-up lemma on the sphere. Note that when A is a spherical cap, this result
follows from an elementary computation which reveals that most of the volume of an (m − 1)-sphere is
concentrated around the boundary of any half-sphere, i.e. any equator, therefore blowing-up the spherical
cap to be slightly bigger than a half-sphere is sufficient to capture most of the volume or probability
mass on the sphere. The isoperimetric inequality allows to extend this statement to any arbitrary set A
and make the much more general statement that the volume is concentrated around the boundary of any
set with probability 1/2.
An equivalent way to view this result is the following: let A ⊆ Sm−1 be an arbitrary set with effective
angle θ > 0, then
π
(26)
P µ A ∩ Cap Y, − θ + > 0 > 1 − ,
2
for any > 0 and sufficiently large m. In words, if we take a y uniformly at random on the sphere
and draw a spherical cap of angle slightly larger than π2 − θ around it (or equivalently a Euclidean ball
centered at y of radius equal to the corresponding Euclidean distance), this cap will intersect the set A
with high probability. This statement is almost equivalent to (25) since the y’s for which the intersection
is non-zero lie in the π2 − θ + -neighborhood of A. Note again that this statement would be trivial if A
were known to be a spherical cap, and it holds for any A due to the isoperimetric inequality.
By building on the Riesz rearrangement inequality in the next subsection, we show the following
stronger result:
Lemma 3.1: Let A ⊆ Sm−1 be any arbitrary set with effective angle θ > 0 (i.e. µ(A) = µ(C) with
C = Cap(z0 , θ) for some z0 ∈ Sm−1 ), and let V = µ(Cap(z0 , θ) ∩ Cap(y0 , ω)) for some z0 ∈ Sm−1 and
y0 ∈ Sm−1 such that ∠(z0 , y0 ) = π/2 and θ + ω > π/2. Then for any > 0 and m sufficiently large,
P (µ(A ∩ Cap(Y, ω + )) > V ) ≥ 1 − .
Note that if A itself is a cap the statement is straightforward and follows from the fact that y with high
probability will be concentrated around the equator at angle π/2 from the pole of A. Therefore, as m
gets large for almost all y’s, the intersection of the two spherical caps will be given by V . The statement
however is much stronger than this and holds for any arbitrary set A, analogous to (26). It states that no
matter what the set A is, if we take a random point on the sphere and draw a cap of angle slightly larger
than ω, then with high probability the intersection of the cap with the set A would be at least as large as
the intersection we would get if A were a spherical cap. Roughly speaking, it identifies the spherical cap
as the extremal set, not only for minimizing the volume of its t-neighborhood as done by the isoperimetric
inequality, i.e. when one is interested in touching a single point in A, but also the extremal set when one
is interested in minimizing the total intersection volume with A at certain distance.
As we saw in the previous section, what we actually need to prove Lemma 2.1 is a version of Lemma 3.1
on a thin spherical shell instead of on the sphere, since the typical z and y sequences in our problem
concentrate on a thin shell, which can be made thinner and thinner as the dimension increases, but the
sphere itself has measure zero compared to a shell. However, it is instructive to first prove Lemma 3.1,
which can be interesting in its own right. We will discuss the extension to a spherical shell in Section
V-B.
A. Proof of Lemma 3.1 via Spherical Rearrangements
We prove Lemma 3.1 via spherical rearrangements in two steps. We first verify below that the claim
in Lemma 3.1 holds when A is known to be a spherical cap (with the intersection volume actually larger
than 2V instead of just V ). We then show that if it holds for a spherical cap, then it holds for any arbitrary
set A.
Assume A = Cap(z0 , θ) for some z0 ∈ Sm−1 . Due to measure concentration, for any > 0 and m
sufficiently large, we have
P (∠(z0 , Y) ∈ [π/2 − /2, π/2 + /2]) ≥ 1 − /2.
13
Due to the triangle inequality for the geodesic metric (angle), for y such that ∠(z0 , y) ∈ [π/2−/2, π/2+
/2] we have
A ∩ Cap(y0 , ω) ⊆ A ∩ Cap(y, ω + /2)
where y0 is such that ∠(z0 , y0 ) = π/2. Therefore,
P (µ(A ∩ Cap(Y, ω + /2)) > V ) ≥ 1 − /2.
Indeed, because the intersection volume is growing exponentially in the angle ω + /2, we can also
conclude that for any > 0 and m sufficiently large,
P (µ(A ∩ Cap(Y, ω + )) > 2V ) ≥ 1 − /2.
(27)
Our main tool for extending the above result to arbitrary A is the symmetric decreasing rearrangement
of functions on the sphere, along with a version of the Riesz rearrangement inequality on the sphere due
to Baernstein and Taylor [15].
For any function f : Sm−1 → R and pole z0 , the symmetric decreasing rearrangement of f about z0
is defined to be the function f ∗ : Sm−1 → R such that f ∗ (y) depends only on the angle ∠(y, z0 ), is
nonincreasing in ∠(y, z0 ), and has super-level sets of the same size as f , i.e.
µ {y : f ∗ (y) > d} = µ {y : f (y) > d}
for all d. The function f ∗ is unique except for on sets of measure zero.
One important special case is when the function f = 1A is the characteristic function for a subset A.
The function 1A is just the function such that
(
1 y∈A
1A (y) =
0 otherwise.
In this case 1∗A is equal to the characteristic function associated with a spherical cap of the same size as
A. In other words, if A∗ is a spherical cap about the pole z0 such that µ(A∗ ) = µ(A), then 1∗A = 1A∗ .
Lemma 3.2 (Baernstein and Taylor [15]): Let K be a nondecreasing bounded measurable function on
the interval [−1, 1]. Then for all functions f, g ∈ L1 (Sm−1 ),
Z
Sm−1
≤
Z
Z
Sm−1
m−1
S Z
f (z)K (hz/R, y/Ri) dz g(y)dy
∗
f (z)K (hz/R, y/Ri) dz g ∗ (y)dy.
Sm−1
In order to prove Lemma 3.1, we will apply Lemma 3.2 by choosing f, K such that the inner integral
Z
f (z)K (hz/R, y/Ri) dz = µ(A ∩ Cap(y, ω + )).
Sm−1
To do this, given an arbitrary set A, we set f = 1A and
(
1 0≤α≤ω+
K(cos α) =
0 ω + < α ≤ π.
Note that K is nondecreasing, bounded, and measurable. Furthermore, the product f (z)K (hz/R, y/Ri)
is one precisely when z ∈ A and ∠(z, y) ≤ ω + , and it is zero otherwise. Thus the integral
Z
ψ(y) =
f (z)K (hz/R, y/Ri) dz
(28)
Sm−1
14
is exactly the measure of the set A ∩ Cap(y, ω + ). Recall that the spherical cap Cap(y, ω + ) is centered
around the pole y with angle ω + .
We will use test functions g which are also characteristic functions. Let g = 1C for some measurable
subset C ⊆ Sm−1 . For a fixed measure µ(C), the left-hand side of the inequality from Lemma 3.2 will be
maximized by picking C to correspond to a super-level set of ψ (this can be shown easily by contradiction,
although we don’t actually use this fact and it just provides the motivation).
The claim is that when using g = 1C where C = {y : ψ(y) > d} for some d, we have the following
equality:
Z
Z
ψ(y)1C (y)dy =
ψ ∗ (y)1∗C (y)dy
m−1
m−1
S
ZS
=
ψ ∗ (y)dy.
C∗
This follows from the layer-cake decomposition in that
Z
Z
ψ(y)1C (y)dy =
ψ(y)dy
Sm−1
C
Z Z ∞
=
1{ψ(y)>t} dtdy
C
0
Z ∞Z
=
1{ψ(y)>t} dydt
0
C
Z ∞Z
Z dZ
1{ψ(y)>t} dydt
1{ψ(y)>t} dydt +
=
d
C
0
C
Z dZ
Z ∞Z
=
1{ψ∗ (y)>t} dydt +
1{ψ∗ (y)>t} dydt
∗
∗
0
C
d
C
Z
ψ ∗ (y)dy .
=
(29)
C∗
Using this equality and our choices for g, f, K above we will rewrite the inequality from Lemma 3.2 as
Z
Z
∗
ψ (y)dy ≤
ψ 0 (y)dy
(30)
C∗
where
0
ψ (y) =
Z
Sm−1
C∗
f ∗ (z)K (hz/R, y/Ri) dz = µ(A∗ ∩ Cap(y, ω + )).
Note that both ψ ∗ (y) and ψ 0 (y) are spherically symmetric and nonincreasing. More concretely, they
both depend only on the angle ∠(y, z0 ) and are nonincreasing in the angle ∠(y, z0 ). In an abuse of
notation we will write ψ 0 (α) and ψ ∗ (α) where α = ∠(y, z0 ). For convenience we will define a measure
ν by
dν(φ) = Am−2 (Rsin φ)Rdφ
where Am (R) denotes the Haar measure of the m-sphere with radius R. We do this so that an integral
like
Z
Z π
∗
ψ dy =
ψ ∗ (φ)Am−2 (Rsin φ)Rdφ
Sm−1
can be expressed as
0
Z
0
π
ψ ∗ dν .
15
We are now ready to prove Lemma 3.1. Define ξ to be such that
π
+ ξ = 2V .
ψ0
2
It is easy to see that ξ is uniquely defined and will be greater than zero for sufficiently large m. We also
get from (27) that
Z π +ξ
2
1
0
dν ≥ 1 − /2
(31)
P (ψ (Y) > 2V ) =
Am−1 (R) 0
for any > 0 and m sufficiently large.
Suppose there is a subsequence mk such that for each m = mk , there exists a measurable subset A with
ψ ∗ (β) = V for a 0 < β < π/2 + ξ. More specifically, let β be the smallest value such that ψ ∗ (β) = V .
This quantity is well-defined, i.e. the minimum is achieved, because of the continuity of ψ ∗ . Our goal
will be to show that even for such a subsequence,
Z π +ξ
2
1
0
∗
dν ≤ /2
P (ψ (Y) > 2V and ψ (Y) ≤ V ) =
Am−1 (R) β
as m = mk increases. Together with (31), this would imply that
P (ψ ∗ (Y) ≤ V ) ≤ ,
and because we have
P (ψ ∗ (Y) > V ) = P (ψ(Y) > V ) ,
we would have
P (µ(A ∩ Cap(Y, ω + )) > V ) ≥ 1 − ,
for any arbitrary set A.
To this end, we note that
1
Am−1 (R)
Z
π
π
+ξ
2
dν ≤ /2
(32)
as a consequence of (31). However, we have the following chain of (in)equalities which will be justified
below.
Z π
Z π
1
1
dν ≥
(ψ ∗ − ψ 0 )dν
(33)
Am−1 (R) π2 +ξ
V Am−1 (R) π2 +ξ
Z π +ξ
2
1
=
(ψ 0 − ψ ∗ )dν
(34)
V Am−1 (R) 0
Z π +ξ
2
1
≥
(ψ 0 − ψ ∗ )dν
(35)
V Am−1 (R) β
Z π +ξ
2
1
dν
(36)
≥
Am−1 (R) β
From these inequalities, (36) must also be bounded by /2 and we are done.
The first inequality (33) is a consequence of the fact that over the range of the integral, ψ ∗ is less than
or equal to V and ψ 0 is non-negative. The equality in (34) follows from
Z π
Z π
∗
ψ dν =
ψ 0 dν ,
0
0
16
which is itself a consequence of (29) with C = Sm−1 and
Z
Z
f (z)K(hz/R, y/Ri)dzdy
Sm−1 Sm−1
Z Z
=
K(hy, zi)dyf (z)dz
Z
= µ(Cap(y, ω + ))f (z)dz
= µ(Cap(y, ω + ))µ(A)
Z
= µ(Cap(y, ω + ))f ∗ (z)dz
Z Z
=
f ∗ (z)K(hz/R, y/Ri)dzdy .
Next we have (35) which is due to the rearrangement inequality (30) when C ∗ is a spherical cap with
angle β. Finally, the inequality (36) is because ψ 0 (y) ≥ 2V and ψ ∗ ≤ V over the range of the integral.
IV. P ROOF OF L EMMA 2.1
We are now in a position to prove Lemma 2.1. For this we will look at B-length i.i.d. sequences of
the random vectors X n , Y n , Z n , and In , and derive some typicality properties for these sequences which
hold with high probability when B is large.
Specifically, consider the following B-length i.i.d. sequence
{(X n (b), Y n (b), Z n (b), In (b))}B
b=1 ,
(37)
where for any b ∈ [1 : B], (X n (b), Y n (b), Z n (b), In (b)) has the same distribution as (X n , Y n , Z n , In ). For
notational convenience, in the sequel we write the B-length sequence [X n (1), X n (2), . . . , X n (B)] as X and
similarly define Y, Z and I; note that here we have I = [fn (Z n (1)), fn (Z n (2)), . . . , fn (Z n (B))] =: f (Z).
We now introduce two lemmas that will be used to establish Lemma 2.1. The proof of the first lemma
will be provided at the end of this section after we finish the proof of Lemma 2.1, while the proof of
the second one contains significant novelty both conceptually and technically and deserves a dedicated
treatment in Section V.
Lemma 4.1: Let Shell (c, r1 , r2 ) denote a spherical shell defined as
Shell (c, r1 , r2 ) := a ∈ RnB : r1 ≤ ka − ck ≤ r2 .
For any δ > 0 and B sufficiently large, we have
and
where
and
Pr(E1 ) ≥ 1 − δ
Pr(E2 ) ≥ 1 − δ,
o
n
p
p
E1 := Z ∈ Shell 0, nB(P + N − δ), nB(P + N + δ)
n
p
o
p
E2 := Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) .
Lemma 4.2: Let Ball(c, r) denote a Euclidean ball centered at c and of radius r, i.e.,
Ball (c, r) := a ∈ RnB : ka − ck ≤ r .
For any δ > 0 and B sufficiently large, we have
Pr(E3 ) ≥ 1 − δ,
(38)
(39)
17
where E3 is defined to be the following event:
)
(
r
p
1
ω
2
2
−1
f (I) ∩ Ball 0, nB(P + N + δ ∩ Ball Y, nBN 4sin 2 + δ ≥ 2nB[ 2 log(2πeN (sin θn −cos ω))−δ]
2
(40)
1
n
in which f −1 (I) := {a ∈ RnB : f (a) = I}, θn is given by arcsin 2− n H(In |X ) as in (5) and ω ∈
(π/2 − θn , π/2].
With the above two lemmas, we now upper bound H(I|Y). Consider the indicator function
F = I(E1 , E2 , E3 )
where I(·) is defined as
I(A) =
1
0
if A holds
otherwise,
and the events E1 , E2 and E3 are as given by (38)–(40). Obviously, by the union bound, we have
Pr(F = 1) ≥ 1 − 3δ
for any δ > 0 and B sufficiently large, and therefore
H(I|Y) ≤ H(I, F |Y)
= H(F |Y) + H(I|Y, F )
≤ H(I|Y, F ) + 1
= Pr(F = 1)H(I|Y, F = 1) + Pr(F = 0)H(I|Y, F = 0) + 1
≤ H(I|Y, F = 1) + 3δnBC0 + 1.
(41)
To bound
apply a packing argument as follows. Consider a ball centered at Y and
q H(I|Y, F = 1), we
of radius nBN 4sin 2 ω2 + δ , i.e.,
r
ω
,
Ball Y, nBN 4sin 2 + δ
2
where
and ω satisfies
p
p
Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) ,
π/2 − θn < ω ≤ π/2.
p
We now upper bound the volume of the intersection between this ball and Ball 0, nB(P + N + δ) ,
i.e.,
r
p
ω
Ball Y, nBN 4sin 2 + δ
∩ Ball 0, nB(P + N + δ) .
2
For this, first consider the intersection volume
r
p
ω
Ball Y, nBN 4sin 2
∩ Ball 0, nB(P + N ) ,
2
for
p
Y ∈ Sphere 0, nB(P + N ) ,
(42)
18
where Sphere(c, r) denotes a sphere centered at c of radius r, i.e.,
Sphere (c, r) := a ∈ RnB : ka − ck = r .
See Fig. 8. Note that the intersection in (42) actually
p consists oftwo spherical caps that are respectively
p
ω
2
located on Ball Y, nBN 4sin 2 and Ball 0, nB(P + N ) . Since these two caps have essentially
p
the same volume, it is sufficient to calculate the volume for the cap on Ball 0, nB(P + N ) , i.e. the
shaded area in Fig. 8. In particular, observe that
q
nBN sin2 ω2
φ = 2 arcsin p
nB(P + N )
s
N sin2 ω2
= 2 arcsin
P +N
= 2φ1
q
N sin2 ω
2
. Using the volume formula for the hyperspherical cap and characterizing the
where φ1 := arcsin
P +N
p
exponent of such a volume (c.f. Appendix A), we have that for any Y ∈ Sphere 0, nB(P + N ) and
B sufficiently large,
r
p
Ball Y, nBN 4sin 2 ω ∩ Ball 0, nB(P + N ) ≤ 2nB[ 12 log(2πe(P +N )sin2 φ)+δ] .
2
Since
sin2 φ = (2sin φ1 cos φ1 )2
= 4sin 2 φ1 cos2 φ1
N sin 2 ω2
N sin 2 ω2
1−
,
=4
P +N
P +N
p
we have for any Y ∈ Sphere 0, nB(P + N ) and B sufficiently large,
r
2ω
8πeN sin2 ω
p
2 (P +N −N sin 2 ) +δ
nB 12 log
ω
P
+N
Ball Y, nBN 4sin 2
.
∩ Ball 0, nB(P + N ) ≤ 2
2
p
p
By continuity, this would further imply that for any Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) ,
r
2ω
8πeN sin2 ω
p
2 (P +N −N sin 2 ) +δ
nB 12 log
ω
1
P +N
Ball Y, nBN 4sin 2 + δ
∩ Ball 0, nB(P + N + δ) ≤ 2
,
2
where δ1 → 0 as δ → 0 and B → ∞.
On the other hand, the condition F = 1 (c.f. the definition of E3 in Lemma 4.2) also ensures that given
Y, each possible I occupies a volume of at least
1
2 θ −cos2
n
2nB[ 2 log(2πeN (sin
from the set
ω))−δ]
r
p
ω
2
Ball Y, nBN 4sin
+δ
∩ Ball 0, nB(P + N + δ) .
2
19
p
p
Therefore, given F = 1 and Y ∈ Shell 0, nB(P + N − δ), nB(P + N + δ) , the number of
different possibilities for I is upper bounded by the ratio between
r
p
ω
Ball Y, nBN 4sin 2 + δ
∩ Ball 0, nB(P + N + δ) 2
and
1
2 θ −cos2
n
2nB[ 2 log(2πeN (sin
ω))−δ]
,
which can be further upper bounded by
nB
2
nB
=2
1
2
log
2ω
8πeN sin2 ω
2 (P +N −N sin 2 ) − 1
P +N
2
1
2
log
2ω
4sin2 ω
2 (P +N −N sin 2 ) +δ
2
(P +N )(sin2 θn −cos2 ω)
log(2πeN (sin2 θn −cos2 ω))+δ+δ1
,
where δ2 → 0 as δ → 0 and B → ∞. This immediately implies the following upper bound on H(I|Y, F =
1),
#
"
4sin2 ω2 (P + N − N sin2 ω2 )
1
log
+ δ2 ,
H(I|Y, F = 1) ≤ nB
2
(P + N )(sin2 θn − cos2 ω)
which combined with (41) yields that
"
#
4sin2 ω2 (P + N − N sin2 ω2 )
1
log
H(I|Y) ≤ nB
+ δ2 + 3δnBC0 + 1.
2
(P + N )(sin2 θn − cos2 ω)
Dividing B at both sides of the above inequality and noting that
H(I|Y) =
B
X
b=1
we have
H(In (b)|Y n (b)) = BH(In |Y n ),
4sin2 ω2 (P + N − N sin2 ω2 )
1
1
H(In |Y ) ≤ n
log
+ δ2 + 3δC0 +
.
2
nB
(P + N )(sin2 θn − cos2 ω)
n
(43)
1
nB
in (43) can all be made arbitrarily small by choosing B sufficiently large, we obtain
!
2ω
2ω
4sin
(P
+
N
−
N
sin
)
1
2
2
H(In |Y n ) ≤ n
.
(44)
log
2
(P + N )(sin2 θn − cos2 ω)
Noting that (44) holds for any ω ∈ π2 − θn , π2 , this completes the proof of Lemma 2.1.
Since δ, δ2 and
A. Proof of Lemma 4.1
We now present the proof of Lemma 4.1.
Proof of Lemma 4.1: Recalling that Z = [Z n (1), Z n (2), . . . , Z n (B)], we have
2
kZk =
B
X
b=1
kZ n (b)k2 .
Therefore by the weak law of large numbers, for any δ > 0 and B sufficiently large we have
1
2
n
2
Pr kZk − E[kZ k ] ≤ δ ≥ 1 − δ.
B
(45)
20
Since Z n = X n + W1n , we have
E[kZ n k2 ] = E[kX n + W1n k2 ]
= E[kX n k2 ] + 2E[X n · W1n ] + E[kW1n k2 ].
By assumption E[kX n k2 ] = nP and E[kW1n k2 ] = nN . Also we have
" n
#
X
Xi W1i
E [X n · W1n ] = E
i=1
=
n
X
E[Xi W1i ]
i=1
=
n
X
E[Xi ]E[W1i ]
i=1
= 0.
Therefore, E[kZ n k2 ] = n(P + N ), and this combined with (45) yields that
Pr(kZk2 ∈ [nB(P + N − δ), nB(P + N + δ)]) ≥ 1 − δ,
for any δ > 0 and B sufficiently large. Since Z and Y are identically distributed, the above relation also
holds with kZk2 replaced by kYk2 . This completes the proof of the lemma.
V. P ROOF OF L EMMA 4.2
To prove Lemma 4.2, we introduce two lemmas as follows. The proofs of these two lemmas, which will
be presented in Sections V-A and V-B, are based on typicality argument and strengthening the isoperimetric
inequality on a spherical shell respectively.
Lemma 5.1: Given any pair of (x, i), let S (Z n |x, i) be a set of z defined as
n
√
√
√
√
S (Z n |x, i) := z ∈ f −1 (i) : kx − zk ∈ [ nB( N − ), nB( N + )]
p
z ∈ Ball 0, nB(P + N + )
o
2nB(log sinθn −) ≤ p(f (z)|x) ≤ 2nB(log sinθn +)
1
n
where θn = arcsin 2− n H(In |X ) . Then there exists a set S (X n , In ) of (x, i) pairs, such that
√
Pr((X, I) ∈ S (X n , In )) ≥ 1 − ,
and for any (x, i) ∈ S (X n , In ),
Pr(Z ∈ S (Z n |x, i)|x) ≥ 2nB(log sinθn −2) ,
(46)
(47)
for B sufficiently large.
Lemma 5.2: Let Lm ⊆ Rm be a spherical shell defined as
Lm = {y ∈ Rm : Rm − ∆ ≤ kyk ≤ Rm + ∆}
√
√
√
where Rm := mR for some R > 0 and ∆ := mδ for some 0 < δ ≤ R. Let A ⊆ Lm be an arbitrary
subset on this shell with volume
m
2
|A| = 2 2 log 2πeRsin θ .
For any ω ∈ (π/2 − θ, π/2], ε > 0 and m sufficiently large, we have
1
ω
2
2
Pr A ∩ Ball Y, 2(Rm + ∆)sin + 2∆ ≥ 2m[ 2 log(2πeR(sin θ−cos ω))−ε] ≥ 1 − ε,
2
(48)
21
where Y is uniformly drawn from Lm .
Equipped with the above two lemmas, we are now ready to prove Lemma 4.2.
Proof of Lemma 4.2: From Lemma 5.1, we have for any pair (x, i) ∈ S (X n , In ),
Pr(Z ∈ S (Z n |x, i)|x) ≥ 2nB(log sinθn −2) ,
for B sufficiently large. We also have
Pr(Z ∈ S (Z n |x, i)|x) ≤ |S (Z n |x, i)|
sup
f (z|x)
z∈S (Z n |x,i)
1
≤ |S (Z n |x, i)|2−nB ( 2 log 2πeN −1 ) ,
for some 1 → 0 as → 0, where the second inequality follows because i) Z is Gaussian distributed given
x, and ii) for any z ∈ S (Z n |x, i), we have
√
√
√
√
kx − zk ∈ [ nB( N − ), nB( N + )].
Therefore, the volume of S (Z n |x, i) can be lower bounded by
1
|S (Z n |x, i)| ≥ 2nB ( 2 log(2πeN sin
2 θ )−2−
n
1
).
Noting that S (Z n |x, i) is a subset of
√
√
√
√
Shell x, nB( N − ), nB( N + ) ,
by Lemma 5.2, we have for any ω ∈ (π/2 − θn , π/2],
r
1
2 θ −cos2 ω))−
ω
nB
log(2πeN
(sin
2
n
n
3]
Pr S (Z |x, i) ∩ Ball U, nBN 4sin + 2 ≥ 2 [ 2
≥ 1 − 3 (49)
2
√
√
√
√
where U is uniformly distributed over Shell x, nB( N − ), nB( N + ) , 2 is defined such that
r
√
√
√
ω
ω
nBN 4sin2 + 2 = 2
nBN + nB sin + 2 nB,
2
2
which tends to zero as goes to zero, and 3 → 0 as B → ∞.
Now we translate the bound (49) on the probability involving U to a bound on the probability involving
Y. Specifically, let Y(x,i) be defined as
(
√
√
√
√
Y(x,i) = y ∈ Shell x, nB( N − ), nB( N + ) :
)
r
1
2
ω
2
S (Z n |x, i) ∩ Ball U, nBN 4sin2 + 2 ≥ 2nB [ 2 log(2πeN (sin θn −cos ω))−3 ] .
2
By (49), the volume of Y(x,i) is lower bounded by
√
√
√
√
|Y(x,i) | ≥ (1 − 3 ) Shell x, nB( N − ), nB( N + ) 1
≥ 2nB ( 2 log 2πeN −4 ) ,
where 4 → 0 as → 0 and B → ∞. Therefore, we have
Pr(Y ∈ Y(x,i) |x) ≥ |Y(x,i) | inf f (y|x)
y∈Y(x,i)
≥ 2nB (
1
2
log 2πeN −4 ) −nB ( 12 log 2πeN +1 )
−nB5
≥2
2
,
22
where 5 → 0 as → 0 and B → ∞. We then apply the following blowing-up lemma to the set Y(x,i) ;
for a proof of this lemma, see [6].
Lemma 5.3: Let V1 , V2 , . . . , Vn be n i.i.d. Gaussian random variables with Vi ∼ N (0, N ), ∀i ∈ {1, 2, . . . , n}.
Then, for any A ⊆ Rn with Pr(V n ∈ A) ≥ 2−nan ,
nr 2
Pr(V n ∈ Γ√n(√2N an ln 2+r) (A)) ≥ 1 − 2− 2N , ∀r > 0,
where
Γl (A) := {v n ∈ Rn : ∃ v1n ∈ A s.t. kv n − v1n k ≤ l}.
In particular,
noting that Y is Gaussian distributed given x, the above lemma immediately yields that for
√
6 := 2N 5 ln 2 + ,
Pr(Y ∈ Γ√nB6 (Y(x,i) )|x) ≥ 1 − 2−
≥ 1 − ,
nB2
2N
for B sufficiently large.
Let 7 be defined such that
r
r
√
ω
2ω
nBN 4sin + 7 = nBN 4sin2 + 2 + nB6 .
2
2
Then we have for any y ∈ Γ√nB6 (Y(x,i) ), there exists a y0 ∈ Y(x,i) such that ky − y0 k ≤
r
r
2ω
2ω
0
Ball y , nBN 4sin + 2
⊆ Ball y, nBN 4sin + 7
.
2
2
√
nB6 and
This implies that for any y ∈ Γ√nB6 (Y(x,i) ),
r
ω
2
n
S (Z |x, i) ∩ Ball y, nBN 4sin + 7 ≥ 2nB [ 12 log(2πeN (sin2 θn −cos2 ω))−3 ] .
2
p
By the definition S (Z n |x, i) is a subset of f −1 (i) ∩ Ball 0, nB(P + N + ) , therefore we in turn
have
r
p
−1
f (i) ∩ Ball 0, nB(P + N + ) ∩ Ball y, nBN 4sin2 ω + 7 ≥ 2nB [ 12 log(2πeN (sin2 θn −cos2 ω))−3 ] .
2
for any y ∈ Γ√nB6 (Y(x,i) ).
Therefore, we have for B sufficiently large,
r
p
−1
ω
2
Pr f (I) ∩ Ball 0, nB(P + N + ) ∩ Ball Y, nBN 4sin + 7 2
!
1
2
2
≥ 2nB [ 2 log(2πeN (sin θn −cos ω))−3 ]
≥
≥
X
(x,i)
Pr(Y ∈ Γ√nB6 (Y(x,i) )|x)p(x, i)
X
(x,i)∈S (X n ,In )
≥ (1 − )(1 −
√
≥ 1 − 2 .
Pr(Y ∈ Γ√nB6 (Y(x,i) )|x)p(x, i)
√
)
√
Finally, choosing δ = max{2 , 3 , 7 } concludes the proof of Lemma 4.2.
23
A. Proof of Lemma 5.1 Via Typicality Argument
We now present the proof of Lemma 5.1. Recall that H(In |X n ) = −n log sin θn for the n-channel use
code. By the law of large numbers and Lemma 4.1, we have for any > 0 and sufficiently large B,
Pr((X, Z) ∈ S (X n , Z n )) ≥ 1 − where
n
√
√
√
√
S (X n , Z n ) := (x, z) : kx − zk ∈ [ nB( N − ), nB( N + )]
p
z ∈ Ball 0, nB(P + N + )
o
2nB(log sinθn −) ≤ p(f (z)|x) ≤ 2nB(log sinθn +) .
Note that in terms of S (X n , Z n ), the set S (Z n |x, i) in Lemma 5.1 can be simply written as
S (Z n |x, i) = {z : f (z) = i, (x, z) ∈ S (X n , Z n )}.
Therefore, for B sufficiently large, we have
Pr(Z ∈
/ S (Z n |X, I)) = Pr(f (Z) = I, (X, Z) ∈
/ S(X n , Z n ))
≤ .
√
On the other hand, defining S (X n , In ) := {(x, i) : Pr(Z ∈ S (Z n |x, i)|x, i) ≥ 1 − }, we have
X
Pr(Z ∈
/ S (Z n |X, I)) =
Pr(Z ∈
/ S (Z n |x, i)|x, i)p(x, i)
(x,i)∈S (X n ,In )
+
≥
√
X
Pr(Z ∈
/ S (Z n |x, i)|x, i)p(x, i)
(x,i)∈S
/ (X n ,In )
· Pr(Sc (X n , In )).
Therefore, we have for B sufficiently large,
√
Pr(Sc (X n , In )) ≤ √ = ,
and thus
Pr(S (X n , In )) ≥ 1 −
√
,
which proves (46).
To√prove (47), consider any (x, i) ∈ S (X n , In ). From the definition of S (X n , In ), Pr(S (Z n |x, i)|x, i) ≥
1 − . Therefore, S (Z n |x, i) must be nonempty, i.e., there exists at least one z ∈ S (Z n |x, i). Consider
any z ∈ S (Z n |x, i). By the definition of S (Z n |x, i), we have f (z) = i and (x, z) ∈ S (X n , Z n ). Then,
it follows from the definition of S (X n , Z n ) that
2nB(log sinθn −) ≤ p(f (z)|x) = p(i|x) ≤ 2nB(log sinθn +) .
This further implies that
Pr(Z ∈ S (Z n |x, i)|x)
Pr(f (Z) = i|x)Pr(Z ∈ S (Z n |x, i)|x, f (Z) = i)
=
Pr(f (Z) = i|Z ∈ S (Z n |x, i), x)
= p(i|x)Pr(S (Z n |x, i)|x, i)
√
≥ 2nB(log sinθn −) (1 − )
≥ 2nB(log sinθn −2)
for sufficiently large B, which concludes the proof of (47) and Lemma 5.1.
24
B. Proof of Lemma 5.2 Via Strengthening Isoperimetric Inequality
We now prove Lemma 5.2 by extending our result in Lemma 3.1 on a sphere to a spherical shell. Let
Lm = {y ∈ Rm : Rm − ∆ ≤ kyk ≤ Rm + ∆}
√
√
√
be this shell where Rm := mR for some R > 0 and ∆ := mδ for some 0 < δ ≤ R; note that one
can also write this shell as
[
Lm =
Sm−1
r
r∈[Rm −∆,Rm +∆]
where
Sm−1
:= {y ∈ Rm : kyk = r}
r
denotes the (m − 1)-sphere of radius r.
Let A ⊆ Lm be an arbitrary subset on this shell with volume
m
|A| = 2 2
log 2πeRsin2 θ
.
We say A is spherically symmetric about a pole z0 ∈ Sm−1
Rm if for any r ∈ [Rm − ∆, Rm + ∆], the
intersection A ∩ Sm−1
yields
a
spherical
cap
about
the
pole
rz0 /Rm . In this case, denote by θr the angle
r
m−1
associated with each cap A ∩ Sr .
Similarly as the proof of Lemma 3.1, we prove Lemma 5.2 in two steps. In particular, we will first
demonstrate that the claim in Lemma 5.2 holds when A is known to be spherically symmetric, and then
show that if it holds for a spherically symmetric A, then it holds for any arbitrary set A.
i) Spherically symmetric A: Suppose A is spherically symmetric about pole z0 . With a slight abuse of
notation, denote by
Cap(y, ω) := {y0 ∈ Lm : ∠(y, y0 ) ≤ ω}
a shell cap in Lm with pole y and angle ω. To show (48), it suffices to show that for any ω > π/2 − θ,
ε > 0 and m sufficiently large,
1
2
2
Pr |A ∩ Cap(Y, ω)| ≥ 2m( 2 log 2πeR(sin θ−cos ω)−ε) > 1 − ε
where Y is uniformly distributed over Lm . This
is because the shell Cap(Y, ω) is always contained by
the Euclidean Ball Y, 2(Rm + ∆)sin ω2 + 2∆ and thus
ω
A ∩ Cap(Y, ω) ⊆ A ∩ Ball Y, 2(Rm + ∆)sin + 2∆ .
2
Since the intersection volume |A ∩ Cap(y, ω)| depends only on the angle ∠(y, y0 ) and not on the
magnitude of y, we can instead consider y to be uniformly drawn from the sphere Sm−1
Rm . For each
m−1
y ∈ SRm , the intersection volume |A ∩ Cap(y, ω)| can be written as an integral over each sphere in the
shell, i.e.,
Z
|A ∩ Cap(y, ω)| =
1A∩Cap(y,ω) (z) dz
Rm
Z Rm +∆ Z
=
1A∩Cap(y,ω) (z) dz dr.
Rm −∆
Sm−1
r
Recall that due to measure concentration, for any > 0 and m sufficiently large, we have
Pr (∠(z0 , Y) ∈ [π/2 − /2, π/2 + /2]) ≥ 1 − .
(50)
This allows us to concentrate on characterizing |A ∩ Cap(y, ω)| for y’s in Sm−1
Rm with ∠(z0 , y) ∈ [π/2 −
/2, π/2 + /2].
25
In particular, for y0 ∈ Sm−1
Rm with ∠(z0 , y0 ) = π/2 and any r ∈ [Rm − ∆, Rm + ∆], it can be shown
that
Z
1A∩Cap(y0 ,ω) (z) dz
(51)
Sm−1
r
1
2
0
2
≥ 2m( 2 log 2πeRr (sin θr −cos ω)−)
1
2
0
2
≥ 2m( 2 log 2πeR (sin θr −cos ω)−)
(52)
(53)
where ω > π/2 − θr and can be made arbitrarily small by choosing m to be sufficiently large. In the
above, (52) follows from characterizing the surface
area of the intersection of two spherical caps (see
p
0
0
such that mRr = r, and (53) follows from lower bounding Rr0 by
Appendix B), where Rr is defined √
0
0
R with R being defined such that mR0 = Rm − ∆. We would like to integrate (53) over r in order to
get a lower bound on the total intersection volume |A ∩ Cap(y0 , ω)|. However, (53) is only valid when
θr + ω ≥ π2 , and when θr + ω < π2 , the slice intersection area (51) will be empty and does not contribute
to the total intersection volume. This motivates us to set
π
θr0 = max θr , − ω
2
and write
Z
1
2 0
0
2
1A∩Cap(y0 ,ω) (z)dz ≥ 2m( 2 log 2πeR (sin θr −cos ω)−) .
(54)
Sm−1
r
We now rewrite the right-hand side of (54) in the following way. Let the function h : [0, 1] → R be
defined by
2
m
0
2
h(t) = 2 2 log 2πeR (t m −cos ω)
so that
m
h (sin m θr0 ) = 2 2
log 2πeR0 (sin2 θr0 −cos2 ω)
.
2
By differentiating h(t) twice it can be seen that h is convex for t n ≥ cos2 ω. Integrating both sides of
(54) yields that for any > 0 and m sufficiently large,
Z Rm +∆ Z
1A∩Cap(y0 ,ω) (z)dz dr
Rm −∆
Sm−1
r
Z Rm +∆
≥
h(sin m θr0 )2−m dr
(55)
Rm −∆
Z Rm +∆
1
m 0
sin θr dr 2−m
(56)
≥ 2∆ h
2∆ Rm −∆
Z Rm +∆
1
−m
m 0
≥2
h
sin θr dr
(57)
2∆ Rm −∆
where (56) is due to Jensen’s inequality.
On the other hand, we have
1
2
|A| = 2m( 2 log 2πeRsin θ)
Z Rm +∆
1
2
0
2m( 2 log 2πeRr sin θr +) dr
≤
R −∆
Z Rmm +∆
1
2
00
≤
2m( 2 log 2πeR sin θr +) dr
Rm −∆
Z Rm +∆
m( 12 log 2πeR00 +)
=2
sinm θr dr
Rm −∆
26
where the first inequality follows from characterizing the surface area of a hyperspherical
cap (see
√
00
00
Appendix A), and the second inequality follows by defining R to be such that mR = Rm + ∆.
Therefore, we have
Z Rm +∆
2θ
1
1 m 12 log Rsin
−
m
00
R
sin θr dr ≥
2
2∆ Rm −∆
2∆
1
2
≥ 2m( 2 log sin θ−2) ,
for any > 0 and m sufficiently large. Combining this with (57), we have for m sufficiently large,
|A ∩ Cap(y0 , ω)|
Z Rm +∆
1
−m
m 0
≥2
h
sin θr dr
2∆ Rm −∆
Z Rm +∆
1
m
−m
sin θr dr
≥2
h
2∆ Rm −∆
m( 12 log sin2 θ−2)
−m
≥2
h 2
m
0
2
−4
2
= 2−m 2 2 log 2πeR (sin θ·2 −cos
1
2
2
≥ 2m( 2 log 2πeR(sin θ−cos ω)−1 )
ω)
where , 1 → 0 as m → ∞. This together with (50) then yields that
1
2
2
Pr |A ∩ Cap(Y, ω)| ≥ 2m( 2 log 2πeR(sin θ−cos ω)−ε) ≥ 1 − ε,
(58)
where ε → 0 as m → ∞.
ii) Arbitrary A: To extend the above result to an arbitrary A, we again use the Riesz rearrangement
inequality in Lemma 3.2, similarly as in the proof of Lemma 3.1. In particular, to apply Lemma 3.2 to
the shell situation we must find appropriate functions on the sphere. Note that
Z
|A ∩ Cap(y, ω)| =
1A∩Cap(y,ω) (z) dz
Rm
Z Rm +∆
Z
m−1
=
r
1A∩Cap(y,ω) (rz) dr dz
Sm−1
1
so that if
fA (z) =
Rm −∆
Z
Rm +∆
rm−1 1A (rz) dr
Rm −∆
and
then
(
0 ω<α≤π
K(cos α) =
1 0≤α≤ω
ψ(y) = |A ∩ Cap(y, ω)| =
Z
Sm−1
1
fA (z)K(hz, y/kyki)dz.
Let ψ ∗ be the symmetric decreasing rearrangement of ψ about a pole z0 . The claim is that there exists
a subset A0 ⊆ Lm with |A| = |A0 | which is spherically symmetric about z0 and such that fA0 = fA∗ .
Explicitly, one example would be
[
A0 =
A0r
r∈[Rm −∆,Rm +∆]
Rr
where A0r ⊆ Sm−1
is
a
spherical
cap
with
the
same
area
as
{z
:
f
(z)
>
d}
for
d
=
rm−1 dr.
A
r
Rm −∆
0
0
0
Let ψ (y) = |A ∩ Cap(y, ω)|. Both ψ and ψ depend only on the angle ∠(z0 , y) and are nonincreasing
in that angle. Following the same argument from the proof of Lemma 3.1, one can establish (58) for any
arbitrary A ⊆ Lm , which finishes the proof of Lemma 5.2.
27
A PPENDIX A
S URFACE A REA AND VOLUME OF A H YPERSPHERICAL C AP
In this Appendix we derive the surface area and volume formulas for a hyperspherical cap (see also
[13]) and then characterize the exponents of these expressions.
√
Let C ⊆ Sm−1 be a spherical cap with angle θ on the (m − 1)-sphere of radius Rm = mR. The area
µ(C) of C can be written as
Z
θ
Am−2 (Rm sin ρ)Rm dρ
µ(C) =
0
where Am−2 (Rm sin ρ) is the total surface area of the (m − 2)-sphere of radius Rm sin ρ. Plugging in the
expression for the surface area of an (m − 2)-sphere leads to
Z θ
m−1
m−2
2π 2
(mR) 2
µ(C) =
sin m−2 ρ dρ.
Γ m−1
0
2
m−1
2
We now characterize the exponent of µ(C). First, by Stirling’s approximation, Γ2πm−1
(mR)
( 2 )
above can be bounded as
m−2
2
in the
m−1
2
m
[log(2πeR)−1 ]
2
m−2
m
2π 2
(mR) 2 ≤ 2 2 [log(2πeR)+1 ]
≤
m−1
Γ 2
for some 1 → 0 as m → ∞. Also for m sufficiently large, we have
Z θ
Z θ
m−2
2
m−2
sin
ρdρ =
2 2 log sin ρ dρ
0
Z0 θ
m−2
2
2 2 log sin ρ dρ
≥
1
θ− m
1 m−2 log sin2 (θ− 1 )
m
2 2
m
m
2
≥ 2 2 (log sin θ−2 )
≥
and
Z
θ
m−2
sin
ρdρ =
Z
θ
2
m−2
2
log sin2 ρ
dρ
0
0
m−2
2
≤ θ · 2 2 log sin θ
m
2
≤ 2 2 (log sin θ+2 )
for some 2 → 0 as m → ∞. Therefore, the area µ(C) can be bounded as
m
2 θ)−]
2 2 [log(2πeRsin
m
2 θ)+]
≤ µ(C) ≤ 2 2 [log(2πeRsin
(59)
for some → 0 as m → ∞.
To find the volume |C| of C, we integrate over the surface area of the spheres at each radius, which
leads to
Z θ
Z Rm
m−1
2π 2
m−2
|C| =
sin
ρ dρ
rm−2 dr
Γ m−1
0
0
2
Z θ
m−1
2(πmR) 2
=
sin m−2 ρ dρ.
(m
−
1)
Γ m−1
0
2
28
Following the similar approach as above, we can also bound the volume |C| as
m
2 θ)−]
2 2 [log(2πeRsin
m
2 θ)+]
≤ |C| ≤ 2 2 [log(2πeRsin
(60)
for some → 0 as m → ∞.
Let Sm−1
A PPENDIX B
S URFACE A REA OF THE I NTERSECTION OF T WO H YPERSPHERICAL C APS
√
⊆ Rm be an (m − 1)-sphere of radius Rm = mR. Let
Ci = Cap(vi , θi ) = {v ∈ Sm−1 |∠(v, vi ) ≤ θi }, i = 1, 2
be two spherical caps on Sm−1 such that ∠(v1 , v2 ) = π2 , θi < π2 , and θ1 + θ2 > π2 . We have the following
lemma which characterizes the intersection area µ(C1 ∩ C2 ) of these two caps.
Lemma B.1: There exists some which tends to 0 as m goes to infinity, such that
m
2 θ −cos2 θ ))−]
1
2
m
2 θ −cos2 θ ))+]
1
2
µ(C1 ∩ C2 ) ≥ 2 2 [log(2πeR(sin
and µ(C1 ∩ C2 ) ≤ 2 2 [log(2πeR(sin
.
To prove this lemma, below we will first derive the surface area formula for the intersection of the
above two caps (see also [14]) and then characterize the exponent of this area.
A. Deriving the Surface Area Formula
Consider the points v ∈ Sm−1 such that
∠(v1 , v) = θ1
and
∠(v2 , v) = θ2 .
These points satisfy the linear relations
2
hv1 , vi = Rm
cos θ1
and
2
hv2 , vi = Rm
cos θ2 ,
and therefore all such v lie in the unique m − 1 dimensional subspace H defined by
v1
v2
−
, v = 0.
cos θ1 cos θ2
The angle between the hyperplane H and the vector v2 is


π
1
v1
v2
q
φ = − arccos 
−
, v2 
1
2
cos θ1 cos θ2
R
+ 1
m
cos2 θ1
cos2 θ2
and because v1 and v2 are orthogonal and kv2 k = Rm ,

π
1
q
φ = − arccos 
1
2
cos θ
+ 1
2
cos2
θ1
cos2

θ2
 = arctan
cos θ1
cos θ2
.
29
The approach will be as follows. Divide the intersection C1 ∩ C2 into two parts C + and C − that are
on either side of the hyperplane H. More concretely,
v2
v1
+
−
≥0
C = v ∈ C1 ∩ C2 v,
cos θ1 cos θ2
and
−
C =
v1
v2
−
<0 .
v ∈ C1 ∩ C2 v,
cos θ1 cos θ2
Each part C + and C − can be written as a union of lower dimensional spherical caps. We will find the
measure of each part by integrating the measures of these lower dimensional caps.
The measure of the cap C2 can be expressed as the integral
Z θ2
Am−2 (Rm sin ρ)Rdρ
µ(C2 ) =
0
where Am−2 (Rm sin ρ) is the surface area of the (m − 2)-sphere with radius Rm sin ρ. If we consider a
single (m − 2)-sphere at some angle ρ, then the hyperplane H divides that (m − 2)-sphere into two
spherical caps. The claim is that each of these m − 2 dimensional caps that is on the side of H with v1
is contained in C + (and those on the side with v2 are contained in C − ). Furthermore, all points in C +
are in one of these m − 2 dimensional caps. The claim follows because
v1
v2
v,
−
≥0
cos θ1 cos θ2
implies
cos θ2 cos ∠(v, v1 ) ≥ cos θ1 cos ∠(v, v2 )
and since ∠(v, v2 ) ≤ θ2 and cos ∠(v, v2 ) ≥ cos θ2 , this implies
cos θ2 cos ∠(v, v1 ) ≥ cos θ1 cos θ2 .
Finally, this implies ∠(v, v1 ) ≤ θ1 , v ∈ C1 , and v ∈ C + .
Note that for ρ < φ, the (m − 2)-sphere at angle ρ is entirely on the v2 side of H. This establishes the
fact that
Z θ
2
µ(C + ) =
θ
ρ
Cm−2
(Rm sin ρ)Rdρ
φ
θρ
Cm−2
(Rm sin ρ)
where
is the surface area of an m − 2 dimensional spherical cap defined by angle θρ on
the (m − 2)-sphere of radius Rm sin ρ. Writing
cos θρ =
h
Rm sin ρ
note that h is the distance from the center of the (m − 2)-sphere at angle ρ to the m − 2 dimensional
hyperplane that divides the sphere into two caps. Furthermore, since the (m − 2)-sphere has center
(Rm cos ρ)v2 , we have
h
tan φ =
.
Rm cos ρ
Therefore,
θρ = arccos
tanθφ
tanρ
.
30
Combining this with the corresponding result for µ(C − ) yields
µ(C1 ∩ C2 ) = µ(C + ) + µ(C − )
Z θ2
arccos( tanφ )
Cm−2 tanρ (Rm sin ρ)Rm dρ
=
φ
+
Z
θ1
arccos(
π
−φ
2
Cm−2
tan(π/2−φ)
tanρ
)
(Rm sin ρ)Rm dρ.
This expression can be rewritten using known expressions for the area of a hyperspherical cap in terms
of the regularized incomplete beta function as
µ(C1 ∩ C2 ) = J(φ, θ2 ) + J(π/2 − φ, θ1 ),
where J(φ, θ2 ) is defined as
m−1
(πmR) 2
J(φ, θ2 ) =
Γ m−1
2
Z
θ2
(sin
φ
m−1
ρ)I1−
2
( tanφ
tanρ )
m−1 1
,
2
2
dρ
(61)
and J(π/2 − φ, θ1 ) is defined similarly. Here in (61), Ix (a, b) is the regularized incomplete beta function,
given by
Ix (a, b) =
B(x; a, b)
,
B(a, b)
(62)
where B(x; a, b) and B(a, b) are the incomplete beta function and the complete beta function respectively:
Z x
B(x; a, b) =
ta−1 (1 − t)b−1 dt
0
B(a, b) =
Γ(x)Γ(y)
.
Γ(x + y)
B. Characterizing the Exponent
We now lower and upper bound J(φ, θ2 ) with exponential functions. First, using Stirling’s approximam−1
tion,
(πmR) 2
Γ( m−1
2 )
on the R.H.S. of (61) can be bounded as
m−1
2
for some 1 → 0 as m → ∞.
Now consider
m
[log(2πeR)−1 ]
2
m
(πmR) 2
≤ 2 2 [log(2πeR)+1 ]
≤
m−1
Γ 2
I1− tanφ 2
( tanρ )
m−1 1
,
2
2
(63)
inside the integral on the R.H.S. of (61). In light of (62), it can be written as
2
tanφ
m−1
1
B 1 − tanρ ; 2 , 2
m−1 1
I1− tanφ 2
,
=
.
( tanρ )
2
2
B m−1
, 12
2
For the denominator in (64), by Stirling’s approximation, we have
− 1
m−1 1
1
m−1 2
B
,
∼Γ
.
2
2
2
2
(64)
31
For the numerator in (64), we have
1−
B
Z
=
tanφ
tanρ
2
1−( tanφ
tanρ )
2
m−1 1
,
;
2
2
t
m−3
2
t
m−3
2
0
Z
≥
!
1
(1 − t)− 2 dt
2
1−( tanφ
tanρ )
dt
0
2
2 m−1 1−( tanφ
)
t 2 0 tanρ
n−1
"
2 # m−1
2
tanφ
2
1−
=
n−1
tanρ
=
n
2
≥2
for some 2 → 0 as m → ∞, and
B
=
1−
Z
tanφ
tanρ
2
1−( tanφ
tanρ )
h
i
2
log 1−( tanφ
−2
tanρ )
2
m−1 1
,
;
2
2
t
m−3
2
t
m−3
2
≤
!
1
(1 − t)− 2 dt
0
Z
2
1−( tanφ
tanρ )
1−
0
tanρ
=
tanφ
tanθ2
≤
tanφ
Z
,
1−
2
1−( tanφ
tanρ )
t
m−3
2
tanφ
tanρ
2 !!− 12
dt
0
Z
2
1−( tanφ
tanρ )
t
m−3
2
dt
0
"
2 # m−1
2
2tanθ2
tanφ
=
1−
(m − 1)tanφ
tanρ
≤2
m
2
h
i
2
log 1−( tanφ
+3
tanρ )
,
for some 3 → 0 as m → ∞. Also noting that
sin m−1 ρ = 2
m−1
2
with ρ ∈ [φ, θ2 ], we can bound the integrand in (61) as
log sin2 ρ
m−1 1
sin
ρ I1− tanφ 2
,
( tanρ )
2
2
h i
tanφ 2
m
2
log (sin ρ) 1−( tanρ )
−4
≥22
m−1
m
= 2 2 [log(sin
2 ρ−tan2 φ cos2
ρ)−4 ]
dt
32
and
ρ I1−
m−1 1
2
,
sin
)
( tanφ
2
2
tanρ
m
2 ρ−tan2 φ cos2 ρ +
log
sin
) 4]
≤22[ (
m−1
for some 4 → 0 as m → ∞. For sufficiently large m,
Z θ2
m−1 1
m−1
(sin
ρ)I1− tanφ 2
,
dρ
( tanρ )
2
2
φ
Z θ2
m−1 1
m−1
,
dρ
≥
(sin
ρ)I1− tanφ 2
( tanρ )
1
2
2
θ2 − m
Z θ2
m
2
2
2
≥
2 2 [log(sin ρ−tan φ cos ρ)−4 ] dρ
1
θ2 − m
1 n2 [log(sin2 (θ2 − m1 )−tan2 φ cos2 (θ2 − m1 ))−4 ]
2
m
m
2
2
2
≥ 2 2 [log(sin θ2 −tan φ cos θ2 )−5 ]
m
2
2
= 2 2 [log(sin θ2 −cos θ1 )−5 ] ,
≥
and
Z
θ2
(sin
m−1
φ
m
ρ)I1− tanφ 2
( tanρ )
≤ 2 2 [log(sin
2θ
2
2 −cos θ1
m−1 1
,
2
2
)+5 ]
dρ
for some 5 → 0 as m → ∞.
Combining this with (63), we can bound J(φ, θ2 ) as
m
2 2 [log 2πeR(sin
2θ
2 −cos
2
θ1 )−6 ]
m
≤ J(φ, θ2 ) ≤ 2 2 [log 2πeR(sin
2 θ −cos2 θ
2
1
)+6 ]
for some 6 → 0 as m → ∞.
Due to symmetry, we can also bound J(π/2 − φ, θ1 ) as
m
2 2 [log 2πeR(sin
2θ
2
1 −cos θ2
)−6 ] ≤ J(π/2 − φ, θ ) ≤ 2 m2 [log 2πeR(sin2 θ1 −cos2 θ2 )+6 ] .
1
Noting that sin2 θ2 − cos2 θ1 = sin2 θ1 − cos2 θ2 , we have
µ(C1 ∩ C2 ) ≥ J(φ, θ2 ) + J(π/2 − φ, θ1 )
m
2
2
≥ 2 2 [log 2πeR(sin θ1 −cos θ2 )−]
and
m
µ(C1 ∩ C2 ) ≤ 2 2 [log 2πeR(sin
2 θ −cos2 θ
1
2
)+]
for some → 0 as m → ∞. This completes the proof of the lemma.
ACKNOWLEDGEMENT
The authors would like to acknowledge inspiring discussions with Liang-Liang Xie within a preceding
collaboration [4].
33
R EFERENCES
[1] X. Wu, L. Barnes, A. Ozgur, “Cover’s open problem: “The capacity of the relay channel”,” Proc. of 54th Annual Allerton Conference
on Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, 2016.
[2] T. M. Cover, “The capacity of the relay channel,” Open Problems in Communication and Computation, edited by T. M. Cover and B.
Gopinath, Eds. New York: Springer-Verlag, 1987, pp. 72–73.
[3] X. Wu and A. Ozgur, “Improving on the cut-set bound via geometric analysis of typical sets,” in Proc. of 2016 International Zurich
Seminar on Communications.
[4] X. Wu, A. Ozgur, L.-L. Xie, “Improving on the cut-set bound via geometric analysis of typical sets,” submitted to IEEE Trans. Inform.
Theory. Available: http://arxiv.org/abs/1602.08540
[5] X. Wu and A. Ozgur, “Cut-set bound is loose for Gaussian relay networks,” in Proc. of 53rd Annual Allerton Conference on
Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, Sept. 29–Oct. 1, 2015.
[6] X. Wu and A. Ozgur, “Cut-set bound is loose for Gaussian relay networks,” submitted to IEEE Trans. Inform. Theory. Available:
http://arxiv.org/abs/1606.01374
[7] X. Wu and A. Ozgur, “Improving on the cut-set bound for general primitive relay channels,” in Proc. of IEEE Int. Symposium on
Information Theory, 2016.
[8] T. Cover and A. El Gamal, “Capacity theorems for the relay channel,” IEEE Trans. Inform. Theory, vol. 25, pp. 572–584, 1979.
[9] G. Schechtman, “Concentration, results and applications,” Handbook of the geometry of Banach spaces, Vol. 2, 1603–1634, NorthHolland, Amsterdam, 2003.
[10] C. E. Shannon, “Communication in the presence of noise,” Proc. IRE, vol. 37, pp. 10–21, Jan. 1949.
[11] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pt. I, pp. 379–423, 1948; pt. II, pp. 623–656,
1948.
[12] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. New York, NY, USA: Wiley, 2006.
[13] S. Li, “Concise formulas for the area and volume of a hyperspherical cap,” Asian Journal of Mathematics and Statistics, vol. 4, pp.
66–70, 2011.
[14] Y. Lee and W. C. Kim, “Concise formulas for the surface area of the intersection of two hyperspherical caps,” KAIST Technical Report,
2014.
[15] A. Baernstein II and B. A. Taylor, “Spherical rearrangements, subharmonic functions, and ∗-functions in n-space,” Duke Mathematical
Journal, vol. 43, no. 2, pp. 245-268, 1976.
[16] J. A. Thomas, “Feedback can at most double Gaussian multiple access channel capacity,” IEEE Trans. Inf. Theory, vol. IT-33, no. 5,
pp. 711–716, Sep. 1987.