SOLUTIONS TO SELECTED PROBLEMS IN CHAPTER 2

..
Prove Fano’s inequality.
Solution: Let
E = 1 if X = Y, and E = 0 otherwise.
Then
H(X|Y) = H(X, E|Y)
= H(E|Y) + H(X|E, Y)
≤ H(E) + P{E = 0}⋅H(X|E = 0, Y) + P{E = 1}⋅H(X|E = 1, Y)
≤ H(P_e) + P_e log|X|
≤ 1 + P_e log|X|,
where the first equality holds since E is a function of (X, Y), the first inequality uses H(E|Y) ≤ H(E), and the second inequality follows since H(E) = H(P_e) with P_e = P{X ≠ Y} = P{E = 0}, H(X|E = 1, Y) = 0 (E = 1 means X = Y), and H(X|E = 0, Y) ≤ log|X|.
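As a quick numerical sanity check (our addition, not part of the original solution; it assumes Python with numpy), the following sketch draws a random joint pmf on a small alphabet and verifies the bound H(X|Y) ≤ 1 + P_e log|X|.

    import numpy as np

    rng = np.random.default_rng(0)

    def entropy(p):
        """Entropy in bits of a probability vector (0 log 0 = 0)."""
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    k = 4                                   # |X| = |Y| = k
    p = rng.random((k, k))
    p /= p.sum()                            # random joint pmf p(x, y)

    H_XgY = entropy(p.ravel()) - entropy(p.sum(axis=0))   # H(X|Y) = H(X,Y) - H(Y)
    Pe = 1.0 - np.trace(p)                  # P{X != Y}

    print(H_XgY, 1 + Pe * np.log2(k))       # Fano: the first value is <= the second
    assert H_XgY <= 1 + Pe * np.log2(k) + 1e-9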
..
Prove the Csiszár sum identity.
Solution: The identity can be proved by induction, or more simply using the chain rule of mutual information. Notice that by the chain rule, the mutual information between Z^n and W can be expanded in n! ways, depending on how we order the elements of Z^n. For instance, these are both valid expansions:
I(Z^n; W) = ∑_{i=1}^n I(Z_i; W|Z^{i−1}) = ∑_{j=1}^n I(Z_j; W|Z_{j+1}^n),
where the two orderings Z^n = (Z_1, . . . , Z_n) and Z^n = (Z_n, . . . , Z_1) have been used.
Therefore, we have
∑_{i=1}^n I(X_{i+1}^n; Y_i|Y^{i−1}, U)
(a) = ∑_{i=1}^n ∑_{j=i+1}^n I(X_j; Y_i|Y^{i−1}, X_{j+1}^n, U)
(b) = ∑_{j=2}^n ∑_{i=1}^{j−1} I(X_j; Y_i|Y^{i−1}, X_{j+1}^n, U)
(c) = ∑_{j=2}^n I(X_j; Y^{j−1}|X_{j+1}^n, U)
(d) = ∑_{j=1}^n I(X_j; Y^{j−1}|X_{j+1}^n, U)
= ∑_{i=1}^n I(Y^{i−1}; X_i|X_{i+1}^n, U),
where (a) and (c) follow from the chain rule of mutual information, (b) is obtained by switching the order of the summations, and finally (d) follows from the fact that Y^0 = ∅, so the j = 1 term I(X_1; Y^0|X_2^n, U) is zero.
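The identity can also be verified numerically. The sketch below (an added illustration, assuming Python with numpy; all helper names are ours) builds a random joint pmf over (U, X^3, Y^3) and evaluates both sides of the Csiszár sum identity for n = 3, computing each conditional mutual information from marginal entropies.

    import numpy as np

    rng = np.random.default_rng(1)

    def H(p, axes):
        """Entropy (bits) of the marginal of the joint pmf array p over the given axes."""
        drop = tuple(i for i in range(p.ndim) if i not in axes)
        m = p.sum(axis=drop) if drop else p
        m = m[m > 0]
        return -np.sum(m * np.log2(m))

    def I(p, A, B, C):
        """Conditional mutual information I(A; B | C) in bits."""
        return H(p, A + C) + H(p, B + C) - H(p, A + B + C) - H(p, C)

    # Random joint pmf over (U, X1, X2, X3, Y1, Y2, Y3), all binary; here n = 3.
    # Axis indices: 0 = U, 1..3 = X1..X3, 4..6 = Y1..Y3.
    p = rng.random((2,) * 7)
    p /= p.sum()

    # Left side:  sum_i I(X_{i+1}^n ; Y_i | Y^{i-1}, U)   (the i = n term is zero)
    lhs = I(p, [2, 3], [4], [0]) + I(p, [3], [5], [4, 0])
    # Right side: sum_i I(Y^{i-1} ; X_i | X_{i+1}^n, U)    (the i = 1 term is zero)
    rhs = I(p, [4], [2], [3, 0]) + I(p, [4, 5], [3], [0])

    print(lhs, rhs)                         # the two sums agree for any joint pmf
    assert abs(lhs - rhs) < 1e-9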
.. Prove the properties of jointly typical sequences with δ(ε) constants explicitly specified.
Solution:
(a) If (x^n, y^n) ∈ T_ε^{(n)}, then for all (x, y) ∈ X × Y,
|π(x, y|x^n, y^n) − p(x, y)| ≤ ε p(x, y),
or equivalently,
p(x, y) − ε p(x, y) ≤ π(x, y|x^n, y^n) ≤ p(x, y) + ε p(x, y).
By summing over x ∈ X, we get
∑_{x∈X} (p(x, y) − ε p(x, y)) ≤ ∑_{x∈X} π(x, y|x^n, y^n) ≤ ∑_{x∈X} (p(x, y) + ε p(x, y)),
and therefore
p(y) − ε p(y) ≤ π(y|y^n) ≤ p(y) + ε p(y),
i.e., y^n ∈ T_ε^{(n)}(Y). Similarly, by summing over y ∈ Y, we get x^n ∈ T_ε^{(n)}(X).
(b) By definition,
log p(y^n|x^n) = ∑_{i=1}^n log p_{Y|X}(y_i|x_i) = ∑_{(x,y)∈X×Y} n π(x, y|x^n, y^n) log p(y|x).
But since (x^n, y^n) ∈ T_ε^{(n)}(X, Y), for any (x, y) ∈ X × Y,
p(x, y) − ε p(x, y) ≤ π(x, y|x^n, y^n) ≤ p(x, y) + ε p(x, y).
Therefore,
log p(y^n|x^n) ≤ ∑_{(x,y)∈X×Y} n(p(x, y) − ε p(x, y)) log p(y|x) = −n(H(Y|X) − δ(ε)),
where δ(ε) = εH(Y|X). (Note that log p(y|x) ≤ 0, so the lower end of the range of π gives the upper bound.) Similarly, log p(y^n|x^n) ≥ −n(H(Y|X) + δ(ε)). Hence
p(y^n|x^n) ≐ 2^{−nH(Y|X)}.
(c) First, we establish the upper bound on |T_ε^{(n)}(Y|x^n)|. Since
∑_{y^n : (x^n, y^n)∈T_ε^{(n)}} 2^{−n(H(Y|X)+δ(ε))} ≤ ∑_{y^n : (x^n, y^n)∈T_ε^{(n)}} p(y^n|x^n) ≤ 1,
we have
|T_ε^{(n)}(Y|x^n)| ≤ 2^{n(H(Y|X)+δ(ε))}.
For the lower bound, let {k(x, y) : (x, y) ∈ X × Y} be any set of nonnegative integers such that if π(x, y|x^n, y^n) = k(x, y)/n for all (x, y), then (x^n, y^n) ∈ T_ε^{(n)}. Now consider
(1/n) log|T_ε^{(n)}(Y|x^n)| ≥ (1/n) ∑_{x∈X} ( log (∑_{y∈Y} k(x, y))! − ∑_{y∈Y} log k(x, y)! )
≥ ((1 − ε_n)/n) ∑_{x∈X} ( ∑_{y∈Y} k(x, y) log ∑_{y∈Y} k(x, y) − ∑_{y∈Y} k(x, y) log k(x, y) )
= (1 − ε_n) ∑_{x∈X} ∑_{y∈Y} (k(x, y)/n) log ( ∑_{y'∈Y} k(x, y') / k(x, y) ),
where the last inequality comes from Stirling's formula for some ε_n such that ε_n → 0 as n → ∞. Note that
|k(x, y)/n − p(x, y)| ≤ ε p(x, y), i.e., p(x, y) − ε p(x, y) ≤ k(x, y)/n ≤ p(x, y) + ε p(x, y).
Substituting in the above inequality, we obtain
(1/n) log|T_ε^{(n)}(Y|x^n)| ≥ (1 − ε_n) ∑_{p(x,y)>0} (p(x, y) − ε p(x, y)) log ( ∑_{y∈Y} (p(x, y) − ε p(x, y)) / (p(x, y) + ε p(x, y)) )
= (1 − ε_n) ∑_{p(x,y)>0} (p(x, y) − ε p(x, y)) log ( (p(x) − ε p(x)) / (p(x, y) + ε p(x, y)) )
= (1 − ε_n) ∑_{p(x,y)>0} (p(x, y) − ε p(x, y)) log ( (1 − ε) p(x) / ((1 + ε) p(x, y)) )
= (1 − ε_n) ( ∑_{p(x,y)>0} p(x, y) log (p(x)/p(x, y)) − ε ∑_{p(x,y)>0} p(x, y) log (p(x)/p(x, y)) + (1 − ε) log ((1 − ε)/(1 + ε)) )
= (1 − ε_n)(1 − ε) ( H(Y|X) + log ((1 − ε)/(1 + ε)) )
→ H(Y|X) as ε → 0 and n → ∞.
(d) Consider
P{(X̃^n, Ỹ^n) ∈ T_ε^{(n)}} = ∑_{(x̃^n, ỹ^n)∈T_ε^{(n)}} ∏_{i=1}^n p_X(x̃_i) p_Y(ỹ_i)
≐ ∑_{(x̃^n, ỹ^n)∈T_ε^{(n)}} 2^{−nH(X)} 2^{−nH(Y)}
= |T_ε^{(n)}| 2^{−nH(X)} 2^{−nH(Y)}
≐ 2^{nH(X,Y)} 2^{−nH(X)} 2^{−nH(Y)}
= 2^{−nI(X;Y)}.
Alternative solution:
(a) i. Since (x^n, y^n) ∈ T_ε^{(n)}(X, Y), we have |π(x, y|x^n, y^n) − p(x, y)| ≤ ε p(x, y) for all (x, y) ∈ X × Y. Hence, for every x ∈ X,
∑_{y∈Y} |π(x, y|x^n, y^n) − p(x, y)| ≤ ∑_{y∈Y} ε p(x, y)
⇒ |π(x|x^n) − p(x)| ≤ ε p(x) (by the triangle inequality)
⇒ x^n ∈ T_ε^{(n)}(X).
Similarly, we can prove y^n ∈ T_ε^{(n)}(Y).
ii. Since x^n ∈ T_ε^{(n)}(X), we have |π(x|x^n) − p(x)| ≤ ε p(x) for all x ∈ X.
By setting g(x) = −log p_X(x) in the Typical Average Lemma, we have
(1 − ε) E[−log p_X(X)] ≤ (1/n) ∑_{i=1}^n −log p_X(x_i) ≤ (1 + ε) E[−log p_X(X)],
i.e.,
(1 − ε) H(X) ≤ −(1/n) log p(x^n) ≤ (1 + ε) H(X),
since p(x^n) = ∏_{i=1}^n p_X(x_i).
Thus, 2^{−n(H(X)+δ(ε))} ≤ p(x^n) ≤ 2^{−n(H(X)−δ(ε))}, where δ(ε) = εH(X). Similarly, we have 2^{−n(H(Y)+δ(ε))} ≤ p(y^n) ≤ 2^{−n(H(Y)−δ(ε))}, where δ(ε) = εH(Y).
iii. From property iv below, we have 2^{−n(H(X,Y)+εH(X,Y))} ≤ p(x^n, y^n) ≤ 2^{−n(H(X,Y)−εH(X,Y))}, and from property ii, 2^{−n(H(Y)+εH(Y))} ≤ p(y^n) ≤ 2^{−n(H(Y)−εH(Y))}. Dividing these bounds, we obtain
2^{−n(H(X|Y)+δ(ε))} ≤ p(x^n|y^n) = p(x^n, y^n)/p(y^n) ≤ 2^{−n(H(X|Y)−δ(ε))},
where δ(ε) = ε(H(X, Y) + H(Y)).
iv. Since (x^n, y^n) ∈ T_ε^{(n)}(X, Y), we have (1 − ε)p(x, y) ≤ π(x, y|x^n, y^n) ≤ (1 + ε)p(x, y) for all (x, y) ∈ X × Y. Hence, for any nonnegative function g on X × Y,
∑_{(x,y)∈X×Y} (1 − ε)p(x, y)g(x, y) ≤ ∑_{(x,y)∈X×Y} π(x, y|x^n, y^n)g(x, y) ≤ ∑_{(x,y)∈X×Y} (1 + ε)p(x, y)g(x, y),
i.e.,
(1 − ε) E[g(X, Y)] ≤ (1/n) ∑_{i=1}^n g(x_i, y_i) ≤ (1 + ε) E[g(X, Y)].
Taking g(x, y) = −log p(x, y) gives
(1 − ε) E[−log p(X, Y)] ≤ −(1/n) ∑_{i=1}^n log p(x_i, y_i) ≤ (1 + ε) E[−log p(X, Y)],
and therefore
2^{−n(H(X,Y)+εH(X,Y))} ≤ p(x^n, y^n) ≤ 2^{−n(H(X,Y)−εH(X,Y))}.
(b) From the lower bound on p(x^n, y^n),
1 = ∑_{(x^n, y^n)∈X^n×Y^n} p(x^n, y^n) ≥ ∑_{(x^n, y^n)∈T_ε^{(n)}(X,Y)} p(x^n, y^n) ≥ |T_ε^{(n)}(X, Y)| 2^{−n(H(X,Y)+εH(X,Y))}.
Thus, |T_ε^{(n)}(X, Y)| ≤ 2^{n(H(X,Y)+δ(ε))}, where δ(ε) = εH(X, Y).
For the other direction, for n sufficiently large,
1 − ε ≤ P{(X^n, Y^n) ∈ T_ε^{(n)}(X, Y)} ≤ |T_ε^{(n)}(X, Y)| 2^{−n(H(X,Y)−εH(X,Y))}.
Thus, |T_ε^{(n)}(X, Y)| ≥ (1 − ε) 2^{n(H(X,Y)−εH(X,Y))} = 2^{n(H(X,Y)−δ(ε))}, where δ(ε) = εH(X, Y) − (1/n) log(1 − ε).
Therefore, from the above, |T_ε^{(n)}(X, Y)| ≐ 2^{nH(X,Y)}.
(c) Since
1 = ∑_{y^n∈Y^n} p(y^n|x^n) ≥ ∑_{y^n∈T_ε^{(n)}(Y|x^n)} p(y^n|x^n) ≥ |T_ε^{(n)}(Y|x^n)| 2^{−n(H(Y|X)+εH(Y|X))},
we have |T_ε^{(n)}(Y|x^n)| ≤ 2^{n(H(Y|X)+δ(ε))}, where δ(ε) = εH(Y|X).
(d) Since Y = g(X), we have p(x, y) = p(x) if y = g(x) and p(x, y) = 0 otherwise.
If y_i = g(x_i) for all i ∈ [1 : n], then for every (x, y) with y = g(x), |π(x, y|x^n, y^n) − p(x, y)| = |π(x|x^n) − p(x)| ≤ ε p(x) = ε p(x, y), while π(x, y|x^n, y^n) = 0 = p(x, y) whenever y ≠ g(x); hence y^n ∈ T_ε^{(n)}(Y|x^n).
Conversely, if y^n ∈ T_ε^{(n)}(Y|x^n) and there exists i ∈ [1 : n] such that y_i ≠ g(x_i), then p(x_i, y_i) = 0 and
|π(x_i, y_i|x^n, y^n) − p(x_i, y_i)| ≤ ε p(x_i, y_i) = 0
⇒ π(x_i, y_i|x^n, y^n) = 0,
which contradicts the fact that the pair (x_i, y_i) occurs in (x^n, y^n). Hence y_i = g(x_i) for all i ∈ [1 : n].
Therefore, y^n ∈ T_ε^{(n)}(Y|x^n) if and only if y_i = g(x_i) for all i ∈ [1 : n].
(e) Since x^n ∈ T_{ε'}^{(n)}(X), we have |π(x|x^n) − p(x)| ≤ ε'p(x) for all x ∈ X.
Since Y^n ∼ p(y^n|x^n) = ∏_{i=1}^n p_{Y|X}(y_i|x_i), by the law of large numbers, for each x the empirical distribution of the y_i in the positions i with x_i = x converges to p(y|x). Hence, for any ε'' > 0 and n large enough, with probability approaching one,
|π(x, y|x^n, Y^n)/π(x|x^n) − p(y|x)| ≤ ε'' p(y|x) for all (x, y),
and therefore
(1 − ε')(1 − ε'')p(x)p(y|x) ≤ π(x, y|x^n, Y^n) ≤ (1 + ε')(1 + ε'')p(x)p(y|x).
Choosing ε'' small enough that ε' + ε'' + ε'ε'' ≤ ε (which is possible since ε' < ε), this gives
(1 − ε)p(x, y) ≤ π(x, y|x^n, Y^n) ≤ (1 + ε)p(x, y) for all (x, y)
with probability approaching one, i.e.,
lim_{n→∞} P{(x^n, Y^n) ∈ T_ε^{(n)}(X, Y)} = 1.
(f) If x^n ∈ T_{ε'}^{(n)} and ε' < ε, then by the Conditional Typicality Lemma, for n sufficiently large,
1 − ε ≤ P{(x^n, Y^n) ∈ T_ε^{(n)}(X, Y)}
= P{Y^n ∈ T_ε^{(n)}(Y|x^n)}
≤ |T_ε^{(n)}(Y|x^n)| 2^{−n(H(Y|X)−δ(ε))}
⇒ |T_ε^{(n)}(Y|x^n)| ≥ (1 − ε) 2^{n(H(Y|X)−δ(ε))}.
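As an added numerical illustration of property (d) (not part of the solutions; it assumes Python with numpy and uses a small doubly symmetric binary source), the sketch below computes P{(X̃^n, Ỹ^n) ∈ T_ε^{(n)}} exactly by summing over joint types and compares the exponent −(1/n) log P with I(X;Y).

    import numpy as np
    from math import lgamma, log2, exp

    # Joint pmf of (X, Y): a doubly symmetric binary source with crossover 0.1.
    p = np.array([[0.45, 0.05],
                  [0.05, 0.45]])
    px, py = p.sum(axis=1), p.sum(axis=0)
    I_XY = sum(p[x, y] * log2(p[x, y] / (px[x] * py[y]))
               for x in range(2) for y in range(2))

    def log_prob_typical(n, eps):
        """log2 P{(X~^n, Y~^n) in T_eps} for X~^n ~ p_X and Y~^n ~ p_Y drawn independently,
        computed exactly by summing over epsilon-typical joint types."""
        q = np.outer(px, py)                          # product pmf p_X(x) p_Y(y)
        total = 0.0
        for n00 in range(n + 1):
            for n01 in range(n + 1 - n00):
                for n10 in range(n + 1 - n00 - n01):
                    n11 = n - n00 - n01 - n10
                    cnt = np.array([[n00, n01], [n10, n11]])
                    if np.any(np.abs(cnt / n - p) > eps * p):
                        continue                      # this joint type is not eps-typical
                    logmult = lgamma(n + 1) - sum(lgamma(c + 1) for c in cnt.ravel())
                    total += exp(logmult + float(np.sum(cnt * np.log(q))))
        return log2(total)

    for n in (20, 40, 80):
        print(n, -log_prob_typical(n, 0.25) / n, I_XY)
        # the exponent -(1/n) log2 P approaches I(X;Y) (up to a delta(eps) gap
        # and polynomial factors), in line with property (d)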
.. Inequalities. Label each of the following statements with =, ≤, or ≥. Justify each answer.
(a) H(X|Z) vs. H(X|Y) + H(Y|Z).
(b) h(X + Y) vs. h(X), if X and Y are independent continuous random variables.
(c) h(X + aY) vs. h(X + Y), if Y ∼ N(0, 1) is independent of X and a ≥ 1.
(d) I(X^2; Y^2) vs. I(X_1; Y_1) + I(X_2; Y_2), if p(y^2|x^2) = p(y_1|x_1)p(y_2|x_2).
(e) I(X^2; Y^2) vs. I(X_1; Y_1) + I(X_2; Y_2), if p(x^2) = p(x_1)p(x_2).
(f) I(aX + Y; bX) vs. I(X + Y/a; X), if Y ∼ N(0, 1) is independent of X and a, b ≠ 0.
Solution:
(a) Consider
H(X|Z) ≤ H(X, Y|Z) = H(Y|Z) + H(X|Y, Z) ≤ H(Y|Z) + H(X|Y).
(b) Consider
h(X + Y) ≥ h(X + Y|Y) = h(X|Y) = h(X).
(c) Let aY = Y_1 + Y_2, where Y_1 ∼ N(0, 1) and Y_2 ∼ N(0, a^2 − 1) are independent. Then from part (b), h(X + aY) = h(X + Y_1 + Y_2) ≥ h(X + Y_1) = h(X + Y).
(d) Since Y_1 → X_1 → X_2 → Y_2 form a Markov chain,
I(X_1, X_2; Y_1, Y_2) = H(Y_1, Y_2) − H(Y_1, Y_2|X_1, X_2)
= H(Y_1, Y_2) − H(Y_1|X_1, X_2) − H(Y_2|X_1, X_2, Y_1)
= H(Y_1, Y_2) − H(Y_1|X_1) − H(Y_2|X_2)
= I(X_1; Y_1) + I(X_2; Y_2) − I(Y_1; Y_2)
≤ I(X_1; Y_1) + I(X_2; Y_2).
(e) Since X_1 and X_2 are independent,
I(X_1, X_2; Y_1, Y_2) = H(X_1, X_2) − H(X_1, X_2|Y_1, Y_2)
= H(X_1) + H(X_2) − H(X_1|Y_1, Y_2) − H(X_2|X_1, Y_1, Y_2)
= I(X_1; Y_1, Y_2) + I(X_2; X_1, Y_1, Y_2)
≥ I(X_1; Y_1) + I(X_2; Y_2),
since I(X_1; Y_1, Y_2) ≥ I(X_1; Y_1) and I(X_2; X_1, Y_1, Y_2) ≥ I(X_2; Y_2).
(f) Since a, b ≠ 0,
I(aX + Y; bX) = h(aX + Y) − h(aX + Y|bX)
= h(aX + Y) − h(Y)
= (h(aX + Y) + log|1/a|) − (h(Y) + log|1/a|)
= h(X + Y/a) − h(Y/a)
= h(X + Y/a) − h(X + Y/a|X)
= I(X + Y/a; X),
where the second equality holds since b ≠ 0 and Y is independent of X.
..
Mrs. Gerber’s Lemma. Let H −1 : [0, 1] → [0, 1/2] be the inverse of the binary
entropy function.
(a) Show that H(H −1 (u) ∗ p) is convex in u for every p ∈ [0, 1].
(b) Use part (a) to prove the scalar MGL
H −1 (H(Y |U )) ≥ H −1 (H(X |U )) ∗ p.
(c) Use part (b) and induction to prove the vector MGL
H −1 󶀥
H(Y n |U )
H(X n |U )
󶀵 ≥ H −1 󶀥
󶀵 ∗ p.
n
n
Solution:
(b) We have the following chain of inequalities:
H(Y|U) = H(X + Z|U)
= E_U[H(X + Z|U = u)]
= E_U[H(H^{−1}(H(X|U = u)) ∗ p)]
≥ H(H^{−1}(E_U[H(X|U = u)]) ∗ p)
= H(H^{−1}(H(X|U)) ∗ p),
where the second equality follows from the definition of conditional entropy, the third equality follows from the fact that Z ∼ Bern(p) is independent of (X, U), and the inequality is obtained using the convexity of H(H^{−1}(u) ∗ p) in u (part (a)) together with Jensen's inequality. Since H^{−1}: [0, 1] → [0, 1/2] is an increasing function, applying H^{−1} to both sides of the above inequality yields
H^{−1}(H(Y|U)) ≥ H^{−1}(H(X|U)) ∗ p.
(c) We use induction. The base case n = 1 follows from part (b). Assume the inequality holds for n − 1. Then we have the following chain of inequalities:
H(Y^n|U)/n = H(Y^{n−1}|U)/n + H(Y_n|Y^{n−1}, U)/n
= ((n − 1)/n) ⋅ H(Y^{n−1}|U)/(n − 1) + (1/n) ⋅ H(Y_n|Y^{n−1}, U)
≥ ((n − 1)/n) ⋅ H(H^{−1}(H(X^{n−1}|U)/(n − 1)) ∗ p) + (1/n) ⋅ H(Y_n|Y^{n−1}, U)
≥ ((n − 1)/n) ⋅ H(H^{−1}(H(X^{n−1}|U)/(n − 1)) ∗ p) + (1/n) ⋅ H(X_n + Z_n|Y^{n−1}, X^{n−1}, U)
= ((n − 1)/n) ⋅ H(H^{−1}(H(X^{n−1}|U)/(n − 1)) ∗ p) + (1/n) ⋅ H(X_n + Z_n|X^{n−1}, U)
≥ ((n − 1)/n) ⋅ H(H^{−1}(H(X^{n−1}|U)/(n − 1)) ∗ p) + (1/n) ⋅ H(H^{−1}(H(X_n|X^{n−1}, U)) ∗ p)
≥ H(H^{−1}((H(X^{n−1}|U) + H(X_n|X^{n−1}, U))/n) ∗ p)
= H(H^{−1}(H(X^n|U)/n) ∗ p).
Here the first inequality follows from the induction hypothesis; the second inequality follows from the fact that conditioning reduces entropy; the next equality follows from the fact that X_n + Z_n is independent of Y^{n−1} conditioned on (X^{n−1}, U); the third inequality is the scalar MGL of part (b); the last inequality follows from the convexity of H(H^{−1}(u) ∗ p) in u (part (a)), applied to the convex combination with weights (n − 1)/n and 1/n; and the final equality is the chain rule H(X^n|U) = H(X^{n−1}|U) + H(X_n|X^{n−1}, U).
Applying the increasing function H^{−1} to both sides of the above inequality, we have
H^{−1}(H(Y^n|U)/n) ≥ H^{−1}(H(X^n|U)/n) ∗ p.
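As an added numerical check (not part of the original solution; it assumes Python with numpy, and the helper names are ours), the sketch below spot-checks the convexity of u ↦ H(H^{−1}(u) ∗ p) and verifies the scalar MGL for a randomly generated (U, X) with Z ∼ Bern(p).

    import numpy as np

    def Hb(q):
        """Binary entropy in bits."""
        if q <= 0.0 or q >= 1.0:
            return 0.0
        return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    def Hb_inv(u):
        """Inverse of the binary entropy function on [0, 1/2], by bisection."""
        lo, hi = 0.0, 0.5
        for _ in range(60):
            mid = (lo + hi) / 2
            if Hb(mid) < u:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def star(a, b):
        """Binary convolution a * b = a(1 - b) + (1 - a)b."""
        return a * (1 - b) + (1 - a) * b

    p = 0.2
    f = lambda u: Hb(star(Hb_inv(u), p))

    # Part (a): spot-check midpoint convexity of f(u) = H(H^{-1}(u) * p) on a grid.
    us = np.linspace(0.0, 1.0, 21)
    for u1 in us:
        for u2 in us:
            assert f((u1 + u2) / 2) <= (f(u1) + f(u2)) / 2 + 1e-9

    # Part (b): scalar MGL for a random (U, X), X binary, Y = X + Z mod 2, Z ~ Bern(p).
    rng = np.random.default_rng(3)
    pu = rng.random(3); pu /= pu.sum()       # pmf of U on 3 values
    qx = rng.random(3)                       # P{X = 1 | U = u}
    HXgU = sum(pu[u] * Hb(qx[u]) for u in range(3))
    HYgU = sum(pu[u] * Hb(star(qx[u], p)) for u in range(3))

    print(Hb_inv(HYgU), star(Hb_inv(HXgU), p))   # MGL: first >= second
    assert Hb_inv(HYgU) >= star(Hb_inv(HXgU), p) - 1e-6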
.. Differential entropy and MSE. Let (X, Y) = (X^n, Y^k) ∼ f(x^n, y^k) be a pair of random sequences with covariance matrices K_X and K_Y, respectively, and cross-covariance matrix K_XY = E[(X − E(X))(Y − E(Y))^T]. Let X̂ be an estimate of X given Y and K be the covariance matrix of the error (X − X̂).
(a) Using the fact that for any U ∼ f(u^n),
h(U) ≤ (1/2) log((2πe)^n |K_U|),
show that
h(X|Y) ≤ (1/2) log((2πe)^n |K|).
(b) In particular, show that
h(X|Y) ≤ (1/2) log((2πe)^n |K_X − K_XY K_Y^{−1} K_YX|).
Solution:
(a) Since X̂ is a function of Y,
h(X|Y) = h(X − X̂|Y) ≤ h(X − X̂) ≤ (1/2) log((2πe)^n |K|).
(b) Let X̂ be the minimum mean square error linear estimate of X given Y, i.e., X̂ = K_XY K_Y^{−1} Y (assuming zero means for both X and Y). Then the error covariance is K = K_X − K_XY K_Y^{−1} K_YX, and by part (a),
h(X|Y) ≤ (1/2) log((2πe)^n |K|) = (1/2) log((2πe)^n |K_X − K_XY K_Y^{−1} K_YX|).
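A small numerical companion (our addition, assuming Python with numpy): for jointly Gaussian (X, Y), the error covariance of the linear estimate K_XY K_Y^{−1} Y equals the Schur complement K_X − K_XY K_Y^{−1} K_YX, which the sketch below confirms by simulation.

    import numpy as np

    rng = np.random.default_rng(4)
    n, k, N = 3, 2, 200_000

    A = rng.standard_normal((n + k, n + k))
    K = A @ A.T                              # random covariance of the stacked vector (X, Y)
    KX, KY, KXY = K[:n, :n], K[n:, n:], K[:n, n:]

    S = rng.multivariate_normal(np.zeros(n + k), K, size=N)
    X, Y = S[:, :n], S[:, n:]

    W = KXY @ np.linalg.inv(KY)              # linear MMSE estimator: X_hat = W Y
    E = X - Y @ W.T                          # estimation error
    K_err_hat = E.T @ E / N                  # empirical error covariance
    K_schur = KX - KXY @ np.linalg.inv(KY) @ KXY.T

    print(np.round(K_err_hat, 3))
    print(np.round(K_schur, 3))              # agree up to sampling error, so
    # h(X|Y) <= (1/2) log((2 pi e)^n det(K_schur)), with equality in the Gaussian case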
.. Maximum differential entropy. Let X ∼ f(x) be a zero-mean random variable and X* ∼ f(x*) be a zero-mean Gaussian random variable with the same variance as X.
(a) Show that
−∫ f(x) log f_{X*}(x) dx = −∫ f_{X*}(x) log f_{X*}(x) dx = h(X*).
(b) Using part (a) and the nonnegativity of the relative entropy, conclude that
h(X) = −D(f_X || f_{X*}) − ∫ f(x) log f_{X*}(x) dx ≤ h(X*)
with equality iff X is Gaussian.
(c) Following similar steps, show that if X ∼ f(x) is a zero-mean random vector and X* ∼ f(x*) is a zero-mean Gaussian random vector with the same covariance matrix, then
h(X) ≤ h(X*)
with equality iff X is Gaussian.
Solution:
(a) Let X* ∼ N(0, σ^2). Since f_{X*}(x) = (1/√(2πσ^2)) exp(−x^2/(2σ^2)), we have
−∫ f_X(x) log f_{X*}(x) dx = −∫ f_X(x) (−(1/2) log(2πσ^2) − x^2/(2σ^2)) dx
(a) = −∫ f_{X*}(x) (−(1/2) log(2πσ^2) − x^2/(2σ^2)) dx
= −∫ f_{X*}(x) log f_{X*}(x) dx = h(X*),
where equality (a) follows since E[(X*)^2] = E[X^2].
(b) By the nonnegativity of relative entropy,
h(X) = −∫ f_X(x) log f_X(x) dx
= −∫ f_X(x) (log(f_X(x)/f_{X*}(x)) + log f_{X*}(x)) dx
= −D(f_X || f_{X*}) − ∫ f_X(x) log f_{X*}(x) dx
= −D(f_X || f_{X*}) + h(X*)
≤ h(X*),
with equality iff D(f_X || f_{X*}) = 0, i.e., iff X is Gaussian.
(c) Let X* ∼ N(0, K). Since f_{X*}(x) = (2π)^{−n/2} |K|^{−1/2} exp(−(1/2) x^T K^{−1} x), we have
−∫ f_X(x) log f_{X*}(x) dx = −∫ f_X(x) (−log((2π)^{n/2} |K|^{1/2}) − (1/2) x^T K^{−1} x) dx
= −∫ f_{X*}(x) (−log((2π)^{n/2} |K|^{1/2}) − (1/2) x^T K^{−1} x) dx
= −∫ f_{X*}(x) log f_{X*}(x) dx = h(X*),
where the second equality follows since X and X* have the same covariance matrix K, so E[X^T K^{−1} X] = E[(X*)^T K^{−1} X*].
Hence, following similar steps as in part (b), we have
h(X) = −D(f_X || f_{X*}) + h(X*) ≤ h(X*).
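As a concrete one-dimensional instance (added for illustration): if X ∼ Unif[−√3, √3], then E(X) = 0, Var(X) = 1, and h(X) = log(2√3) ≈ 1.79 bits, whereas the Gaussian X* ∼ N(0, 1) with the same variance has h(X*) = (1/2) log(2πe) ≈ 2.05 bits, consistent with h(X) ≤ h(X*).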
..
Maximum conditional differential entropy. Let (X, Y) = (X n , Y k ) ∼ f (x n , y k ) be
a pair of random sequences with covariance matrices KX = E[(X − E(X))(X −
E(X))T ] and KY = E[(Y − E(Y))(Y − E(Y))T ], respectively, and cross-covariance
matrix KXY = E[(X− E(X))(Y− E(Y))T ]. Let KX|Y = E[(X− E(X|Y))(X− E(X|Y))T ]
be the covariance matrix of the error vector of the minimum mean square error
(MMSE) estimate of X given Y.
(a) Show that
h(X|Y) ≤
1
log 󶀡(2πe)n |KX|Y |󶀱
2
with equality if (X, Y) are jointly Gaussian.
(b) Show that
h(X|Y) ≤ (1/2) log((2πe)^n |K_X − K_XY K_Y^{−1} K_YX|)
with equality if (X, Y) are jointly Gaussian.
Solution:
(a) Consider
h(X|Y) = h(X − E(X|Y)|Y) ≤ h(X − E(X|Y)) ≤ (1/2) log((2πe)^n |K_{X|Y}|),
with equality if (X, Y) are jointly Gaussian, where the last inequality follows from the maximum differential entropy lemma.
(b) Let X̂ = AY be the linear MMSE estimate of X given Y with A = K_XY K_Y^{−1} (assuming zero means). Then
h(X|Y) = h(X − AY|Y)
≤ h(X − AY)
≤ (1/2) log((2πe)^n |E[(X − AY)(X − AY)^T]|)
= (1/2) log((2πe)^n |K_X − K_XY K_Y^{−1} K_YX|),
with equality if (X, Y) are jointly Gaussian.
.. Hadamard inequality. Let Y^n ∼ N(0, K). Use the fact that
h(Y^n) ≤ (1/2) log((2πe)^n ∏_{i=1}^n K_ii)
to prove Hadamard's inequality
det(K) ≤ ∏_{i=1}^n K_ii.
Solution: Hadamard's inequality follows immediately since
(1/2) log((2πe)^n |K|) = h(Y^n) ≤ (1/2) log((2πe)^n ∏_{i=1}^n K_ii).
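A one-line numerical spot check (our addition, assuming Python with numpy):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 5))
    K = A @ A.T                                     # a random positive semidefinite matrix
    print(np.linalg.det(K), np.prod(np.diag(K)))    # det(K) <= product of diagonal entries
    assert np.linalg.det(K) <= np.prod(np.diag(K)) + 1e-9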
.. Conditional entropy power inequality. Let X ∼ f(x) and Z ∼ f(z) be independent random variables and Y = X + Z. Then by the EPI,
2^{2h(Y)} ≥ 2^{2h(X)} + 2^{2h(Z)}
with equality iff both X and Z are Gaussian.
(a) Show that log(2^x + 2^y) is convex in (x, y).
(b) Let X^n and Z^n be conditionally independent given an arbitrary random variable U, with conditional densities f(x^n|u) and f(z^n|u), respectively, and let Y^n = X^n + Z^n. Use part (a), the scalar EPI, and induction to prove the conditional EPI
2^{2h(Y^n|U)/n} ≥ 2^{2h(X^n|U)/n} + 2^{2h(Z^n|U)/n}.
Solution:
(a) The Hessian matrix of g(u, v) = log(2^u + 2^v) is
∇²g(u, v) = [ ∂²g/∂u²  ∂²g/∂u∂v ; ∂²g/∂v∂u  ∂²g/∂v² ] = (ln 2)² ⋅ (2^{u+v}/(2^u + 2^v)²) ⋅ [ 1  −1 ; −1  1 ].
Since ∇²g(u, v) is positive semidefinite, g(u, v) is convex.
(b) For n = 1, the result is immediate from the scalar EPI. We use induction to prove the vector case. (The same argument, with every entropy below additionally conditioned on U, yields the conditional EPI stated in the problem.) Assuming the vector EPI holds for n − 1, we show that it also holds for n using part (a) and the scalar EPI. Consider
2h(Y^n)/n = ((n − 1)/n) ⋅ 2h(Y^{n−1})/(n − 1) + (1/n) ⋅ 2h(Y_n|Y^{n−1})
≥ ((n − 1)/n) ⋅ log(2^{2h(X^{n−1})/(n−1)} + 2^{2h(Z^{n−1})/(n−1)}) + (1/n) ⋅ 2h(Y_n|Y^{n−1}),
where the inequality comes from the induction hypothesis. If we show that
2h(Y_n|Y^{n−1}) ≥ log(2^{2h(X_n|X^{n−1})} + 2^{2h(Z_n|Z^{n−1})}),    (∗)
then combining the two inequalities we have
2h(Y^n)/n ≥ ((n − 1)/n) ⋅ log(2^{2h(X^{n−1})/(n−1)} + 2^{2h(Z^{n−1})/(n−1)}) + (1/n) ⋅ log(2^{2h(X_n|X^{n−1})} + 2^{2h(Z_n|Z^{n−1})})
≥ log(2^{2h(X^n)/n} + 2^{2h(Z^n)/n}),
where the last inequality comes from the convexity of g(u, v) = log(2^u + 2^v) together with the chain rule. Now it remains to show inequality (∗).
2h(Y_n|Y^{n−1}) ≥ 2h(Y_n|X^{n−1}, Z^{n−1})
= E[2h(Y_n|X^{n−1} = x^{n−1}, Z^{n−1} = z^{n−1})]
≥ E[log(2^{2h(X_n|X^{n−1} = x^{n−1}, Z^{n−1} = z^{n−1})} + 2^{2h(Z_n|X^{n−1} = x^{n−1}, Z^{n−1} = z^{n−1})})]
= E[log(2^{2h(X_n|X^{n−1} = x^{n−1})} + 2^{2h(Z_n|Z^{n−1} = z^{n−1})})]
≥ log(2^{2h(X_n|X^{n−1})} + 2^{2h(Z_n|Z^{n−1})}),
where
∙ the first inequality holds since conditioning reduces entropy (Y^{n−1} is a function of (X^{n−1}, Z^{n−1})),
∙ the second inequality is the scalar EPI, applied for each (x^{n−1}, z^{n−1}) since X_n and Z_n are conditionally independent given (X^{n−1}, Z^{n−1}),
∙ the intermediate equality uses the independence of X^n and Z^n, and
∙ the last inequality comes from the convexity of g(u, v) = log(2^u + 2^v) and Jensen's inequality.
Now assume that for n − 1, equality holds in all the above inequalities iff X^{n−1} and Z^{n−1} are Gaussian with K_{X^{n−1}} = aK_{Z^{n−1}}. We show that the same is true for n. First, we find the equality condition for each of the above inequalities. Equality holds
∙ in the convexity step of the main chain iff (h(X^{n−1}) − h(Z^{n−1}))/(n − 1) = h(X_n|X^{n−1}) − h(Z_n|Z^{n−1}),
∙ in the conditioning step iff the distribution of Y_n depends on (X^{n−1}, Z^{n−1}) only through the sum Y^{n−1} = X^{n−1} + Z^{n−1}; in particular,
E(Y_n|X^{n−1}, Z^{n−1}) = E(X_n|X^{n−1}) + E(Z_n|Z^{n−1})
has to be a function of X^{n−1} + Z^{n−1}, which happens only if E(X_n|X^{n−1}) and E(Z_n|Z^{n−1}) are the same linear function up to an additive constant,
∙ in the scalar EPI step iff the conditional distributions F(x_n|x^{n−1}) and F(z_n|z^{n−1}) are Gaussian, and
∙ in the Jensen step iff Var(X_n|X^{n−1} = x^{n−1}) and Var(Z_n|Z^{n−1} = z^{n−1}) do not depend on x^{n−1} and z^{n−1}, respectively.
Note that if X^n and Z^n are Gaussian with K_X = aK_Z, then all of these conditions are satisfied.
To prove the necessity, we see from the equality conditions for the scalar EPI and Jensen steps that X^n and Z^n must be Gaussian. To show K_{X^n} = aK_{Z^n}, write
K_{X^n} = [ A  B^T ; B  C ],  K_{Z^n} = [ Ã  B̃^T ; B̃  C̃ ],
where A and Ã are (n − 1) × (n − 1), B and B̃ are 1 × (n − 1), and C and C̃ are scalars. By the induction hypothesis, A = K_{X^{n−1}} = aK_{Z^{n−1}} = aÃ for some a. Since X^n and Z^n are Gaussian, we have
E(X_n|X^{n−1}) = BA^{−1}X^{n−1},  E(Z_n|Z^{n−1}) = B̃Ã^{−1}Z^{n−1},
and from the equality constraint in the conditioning step and the fact that A = aÃ, we conclude that B = aB̃. Lastly, from the equality condition relating the entropy differences, we conclude that C = aC̃, and hence K_{X^n} = aK_{Z^n}.
..
Entropy rate of a stationary source. Let X = {Xi } be a discrete stationary random
process.
(a) Show that
H(X n ) H(X n−1 )
≤
n
n−1
for n = 2, 3, . . . .
(b) Conclude that the entropy rate
H(X n )
n→∞
n
H(X) = lim
is well-defined.
(c) Show that for a continuous stationary ergodic process Y = {Yi },
h(Y n ) h(Y n−1 )
≤
n
n−1
for n = 2, 3, . . . .
Solution:
(a) We first show that H(X_n|X^{n−1}) is nonincreasing in n. By stationarity, we have
H(X_{n+1}|X^n) ≤ H(X_{n+1}|X_2^n) = H(X_n|X^{n−1}).
Now we have
H(X^n)/n = (H(X^{n−1}) + H(X_n|X^{n−1}))/n
= H(X^{n−1})/(n − 1) − (1/n)(H(X^{n−1})/(n − 1) − H(X_n|X^{n−1}))
= H(X^{n−1})/(n − 1) − (1/n)((1/(n − 1)) ∑_{i=1}^{n−1} H(X_i|X^{i−1}) − H(X_n|X^{n−1}))
≤ H(X^{n−1})/(n − 1) − (1/n)((1/(n − 1)) ∑_{i=1}^{n−1} H(X_n|X^{n−1}) − H(X_n|X^{n−1}))
= H(X^{n−1})/(n − 1),
where the inequality holds since H(X_i|X^{i−1}) ≥ H(X_n|X^{n−1}) for i ≤ n.
(b) Since H(X^n)/n is nonincreasing in n and bounded below by 0, the limit H(X) = lim_{n→∞} H(X^n)/n exists, i.e., the entropy rate is well-defined; moreover, H(X) ≤ H(X^n)/n for all n.
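To see part (a) numerically (an added illustration, assuming Python with numpy), consider a stationary binary symmetric Markov chain with flip probability a; then H(X^n) = H(X_1) + (n − 1)H(X_2|X_1) by the chain rule and Markovity, so H(X^n)/n decreases to the entropy rate H(X_2|X_1):

    import numpy as np

    def Hb(q):
        """Binary entropy in bits."""
        return 0.0 if q <= 0 or q >= 1 else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    a = 0.1                  # P{X_{i+1} != X_i}; the stationary distribution is uniform
    H1 = 1.0                 # H(X_1) = 1 bit
    Hcond = Hb(a)            # H(X_{i+1} | X_i) = H(a)

    for n in range(1, 9):
        Hn = H1 + (n - 1) * Hcond        # chain rule + Markov property
        print(n, Hn / n)                 # nonincreasing in n, converging to Hb(a) ~ 0.469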
.. Worst noise for estimation. Let X ∼ N(0, P) and Z be independent of X with zero mean and variance N. Show that the minimum MSE of estimating X given X + Z is upper bounded as
E[(X − E(X|X + Z))^2] ≤ PN/(P + N)
with equality if Z is Gaussian. Thus, Gaussian noise is the worst noise if the input to the channel is Gaussian.
Solution: Since the (nonlinear) MMSE is upper bounded by the MSE of the linear MMSE estimate,
E[(X − E(X|X + Z))^2] ≤ E[(X − (P/(P + N))(X + Z))^2] = E[((N/(P + N))X − (P/(P + N))Z)^2] = PN/(P + N),
with equality if Z is Gaussian.
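A Monte Carlo illustration (our addition, assuming Python with numpy): for X ∼ N(0, P) and a non-Gaussian Z taking values ±√N with equal probability, the posterior mean E(X|X + Z) has a closed form, and the resulting MMSE is indeed below PN/(P + N):

    import numpy as np

    rng = np.random.default_rng(7)
    P, N, trials = 4.0, 1.0, 200_000

    X = rng.normal(0.0, np.sqrt(P), trials)
    Z = np.sqrt(N) * rng.choice([-1.0, 1.0], trials)    # zero mean, variance N, non-Gaussian
    Y = X + Z

    # E[X | Y]: given Z = z we have X = Y - z, and P{Z = z | Y = y} is proportional
    # to (1/2) * exp(-(y - z)^2 / (2P)).
    phi = lambda t: np.exp(-t ** 2 / (2 * P))
    wp, wm = phi(Y - np.sqrt(N)), phi(Y + np.sqrt(N))
    Xhat = ((Y - np.sqrt(N)) * wp + (Y + np.sqrt(N)) * wm) / (wp + wm)

    mmse = np.mean((X - Xhat) ** 2)
    print(mmse, P * N / (P + N))          # nonlinear MMSE <= P N / (P + N) = 0.8
    assert mmse <= P * N / (P + N) + 0.01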
.. Worst noise for information. Let X and Z be independent, zero-mean random variables with variances P and Q, respectively.
(a) Show that
h(X|X + Z) ≤ (1/2) log(2πe PQ/(P + Q))
with equality iff both X and Z are Gaussian. (Hint: Use the maximum differential entropy lemma or the EPI or Problem ..)
(b) Let X* and Z* be independent zero-mean Gaussian random variables with variances P and Q, respectively. Use part (a) to show that
I(X*; X* + Z*) ≤ I(X*; X* + Z)
with equality iff Z is Gaussian.
Solution:
(a) By the maximum conditional differential entropy bound and the previous problem (worst noise for estimation), we have
h(X|X + Z) ≤ (1/2) log(2πe E[(X − E(X|X + Z))^2]) ≤ (1/2) log(2πe PQ/(P + Q))
with equality if both X and Z are Gaussian.
(b) We have
I(X*; X* + Z) = h(X*) − h(X*|X* + Z)
≥ (1/2) log(2πeP) − (1/2) log(2πe PQ/(P + Q))
= h(X*) − h(X*|X* + Z*)
= I(X*; X* + Z*),
where the inequality follows from part (a) applied with X = X*, with equality iff Z is Gaussian.
.. Variations on the joint typicality lemma. Let (X, Y, Z) ∼ p(x, y, z) and 0 < ε' < ε. Prove the following statements.
(a) Let (X^n, Y^n) ∼ ∏_{i=1}^n p_{X,Y}(x_i, y_i) and Z̃^n ∼ ∏_{i=1}^n p_{Z|X}(z̃_i|x_i), conditionally independent of Y^n given X^n. Then
P{(X^n, Y^n, Z̃^n) ∈ T_ε^{(n)}(X, Y, Z)} ≐ 2^{−nI(Y;Z|X)}.
(b) Let (x^n, y^n) ∈ T_{ε'}^{(n)}(X, Y) and Z̃^n ∼ Unif(T_ε^{(n)}(Z|x^n)). Then
P{(x^n, y^n, Z̃^n) ∈ T_ε^{(n)}(X, Y, Z)} ≐ 2^{−nI(Y;Z|X)}.
(c) Let x^n ∈ T_{ε'}^{(n)}(X), ỹ^n be an arbitrary sequence, and Z̃^n ∼ p(z̃^n|x^n), where
p(z̃^n|x^n) = ∏_{i=1}^n p_{Z|X}(z̃_i|x_i) / P{Z^n ∈ T_ε^{(n)}(Z|x^n)} if z̃^n ∈ T_ε^{(n)}(Z|x^n), and p(z̃^n|x^n) = 0 otherwise.
Then
P{(x^n, ỹ^n, Z̃^n) ∈ T_ε^{(n)}(X, Y, Z)} ≤ 2^{−n(I(Y;Z|X)−δ(ε))}.
Solution:
(a)
P{(X^n, Y^n, Z̃^n) ∈ T_ε^{(n)}(X, Y, Z)} = ∑_{(x^n, y^n, z̃^n)∈T_ε^{(n)}(X,Y,Z)} p(x^n, y^n) p(z̃^n|x^n)
≐ |T_ε^{(n)}(X, Y, Z)| 2^{−nH(X,Y)} 2^{−nH(Z|X)}
≐ 2^{nH(X,Y,Z)} 2^{−nH(X,Y)} 2^{−nH(Z|X)}
= 2^{−nI(Y;Z|X)}.
(b)
P{(x^n, y^n, Z̃^n) ∈ T_ε^{(n)}(X, Y, Z)} = ∑_{z̃^n∈T_ε^{(n)}(Z|x^n, y^n)} p(z̃^n|x^n)
= ∑_{z̃^n∈T_ε^{(n)}(Z|x^n, y^n)} 1/|T_ε^{(n)}(Z|x^n)|
≐ 2^{nH(Z|X,Y)} 2^{−nH(Z|X)}
= 2^{−nI(Y;Z|X)}.
(c) Since P{Z^n ∈ T_ε^{(n)}(Z|x^n)} ≥ 1 − ε for n sufficiently large,
P{(x^n, ỹ^n, Z̃^n) ∈ T_ε^{(n)}(X, Y, Z)} = ∑_{z̃^n∈T_ε^{(n)}(Z|x^n, ỹ^n)} p(z̃^n|x^n)
≤ ∑_{z̃^n∈T_ε^{(n)}(Z|x^n, ỹ^n)} 2^{−n(H(Z|X)−δ(ε))}/(1 − ε)
≤ (1/(1 − ε)) 2^{nH(Z|X,Y)} 2^{−n(H(Z|X)−δ(ε))}
≤ 2^{−n(I(Y;Z|X)−δ'(ε))}
for n sufficiently large, where δ'(ε) absorbs δ(ε) and the factor 1/(1 − ε).
.. Jointly typical triples. Given (X, Y, Z) ∼ p(x, y, z), let
A_n = {(x^n, y^n, z^n) : (x^n, y^n) ∈ T_ε^{(n)}(X, Y), (y^n, z^n) ∈ T_ε^{(n)}(Y, Z), (x^n, z^n) ∈ T_ε^{(n)}(X, Z)}.
(a) Show that |A_n| ≤ 2^{n(H(X,Y)+H(Y,Z)+H(X,Z)+δ(ε))/2}. (Hint: First show that |A_n| ≤ 2^{n(H(X,Y)+H(Z|Y)+δ(ε))}.)
(b) Does a corresponding lower bound hold?
Solution:
(a) Let
B_n := {(x^n, y^n, z^n) : (x^n, y^n) ∈ T_ε^{(n)}(X, Y), (y^n, z^n) ∈ T_ε^{(n)}(Y, Z)}.
Since A_n ⊆ B_n, |A_n| ≤ |B_n|. Consider
|B_n| = |{(x^n, y^n, z^n) : y^n ∈ T_ε^{(n)}(Y), x^n ∈ T_ε^{(n)}(X|y^n), z^n ∈ T_ε^{(n)}(Z|y^n)}|
= |∪_{y^n∈T_ε^{(n)}(Y)} {(x^n, y^n, z^n) : x^n ∈ T_ε^{(n)}(X|y^n), z^n ∈ T_ε^{(n)}(Z|y^n)}|
≤ |T_ε^{(n)}(Y)| ⋅ max_{y^n} |T_ε^{(n)}(X|y^n)| ⋅ max_{y^n} |T_ε^{(n)}(Z|y^n)|
≤ 2^{n(H(Y)+δ(ε))} 2^{n(H(X|Y)+δ(ε))} 2^{n(H(Z|Y)+δ(ε))}
= 2^{n(H(X,Y)+H(Z|Y)+3δ(ε))}.
Similarly, by defining C_n := {(x^n, y^n, z^n) : (y^n, z^n) ∈ T_ε^{(n)}(Y, Z), (x^n, z^n) ∈ T_ε^{(n)}(X, Z)}, we have |A_n| ≤ |C_n| ≤ 2^{n(H(X,Z)+H(Y|Z)+3δ(ε))}. Combining these two upper bounds on |A_n| yields
|A_n|^2 ≤ 2^{n(H(X,Y)+H(Z|Y)+3δ(ε))} 2^{n(H(X,Z)+H(Y|Z)+3δ(ε))}
= 2^{n(H(X,Y)+H(X,Z)+H(Z|Y)+H(Y|Z)+6δ(ε))},
which implies that
|A_n| ≤ 2^{n(H(X,Y)+H(X,Z)+H(Z|Y)+H(Y|Z)+6δ(ε))/2}
≤ 2^{n(H(X,Y)+H(X,Z)+H(Y,Z)+6δ(ε))/2}
= 2^{n(H(X,Y)+H(X,Z)+H(Y,Z)+δ'(ε))/2},
where δ'(ε) = 6δ(ε) and the second inequality uses H(Z|Y) + H(Y|Z) ≤ H(Z|Y) + H(Y) = H(Y, Z).
(b) No; the bound in part (a) is not tight in general. For random variables satisfying X = Y = Z, we have |A_n| ≤ 2^{n(H(X)+δ(ε))} from the upper bound on the size of the typical set, whereas the bound in part (a) reduces to |A_n| ≤ 2^{n((3/2)H(X)+δ(ε))}.
.. ε and ε'. Let (X, Y) be a pair of independent Bern(1/2) random variables. Let k = ⌊(n/2)(1 + ε)⌋ and x^n be a binary sequence with k 1s followed by n − k 0s.
(a) Check that x^n ∈ T_ε^{(n)}(X).
(b) Let Y^n be an i.i.d. Bern(1/2) sequence, independent of x^n. Show that
P{(x^n, Y^n) ∈ T_ε^{(n)}(X, Y)} ≤ P{∑_{i=1}^k Y_i < (k + 1)/2},
which converges to 1/2 as n → ∞. Thus, the fact that x^n ∈ T_ε^{(n)}(X) and Y^n ∼ ∏_{i=1}^n p_{Y|X}(y_i|x_i) does not necessarily imply that P{(x^n, Y^n) ∈ T_ε^{(n)}(X, Y)} → 1.
Remark: This problem illustrates that in general we need ε > ε' in the conditional typicality lemma.
Solution:
(a) We have π(1|x^n) = k/n = ⌊(n/2)(1 + ε)⌋/n ≤ (1/2)(1 + ε) = p_X(1)(1 + ε), and π(1|x^n) = k/n ≥ (1/2)(1 + ε) − 1/n ≥ (1/2)(1 − ε) for n ≥ 1/ε. Hence |π(1|x^n) − 1/2| ≤ ε/2, and the same holds for π(0|x^n) = 1 − π(1|x^n). Thus, x^n ∈ T_ε^{(n)}(X).
(b) For (x^n, Y^n) to be jointly typical, there must be approximately half 1s and half 0s among the first k positions of Y^n and approximately half 1s and half 0s among the last n − k positions. In particular, since p(x, y) = 1/4 for every pair (x, y), joint typicality requires that the number of (1, 1) pairs, ∑_{i=1}^k Y_i, lie in [(n/4)(1 − ε), (n/4)(1 + ε)], so
P{(x^n, Y^n) ∈ T_ε^{(n)}(X, Y)} ≤ P{(n/4)(1 − ε) ≤ ∑_{i=1}^k Y_i ≤ (n/4)(1 + ε)}
≤ P{∑_{i=1}^k Y_i ≤ (n/4)(1 + ε)}
≤ P{∑_{i=1}^k Y_i < (k + 1)/2}.
The last inequality comes from (k + 1)/2 > (n/2)(1 + ε)/2 = (n/4)(1 + ε). Since ∑_{i=1}^k Y_i ∼ Binomial(k, 1/2), the event {∑_{i=1}^k Y_i < (k + 1)/2} is the event that at most half of the first k bits are 1s, whose probability converges to 1/2 as n → ∞. Thus, the fact that x^n ∈ T_ε^{(n)}(X) and Y^n ∼ ∏_{i=1}^n p_{Y|X}(y_i|x_i) does not necessarily imply that P{(x^n, Y^n) ∈ T_ε^{(n)}(X, Y)} → 1.
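The behavior in part (b) can also be seen by simulation (our addition, assuming Python with numpy): the typicality probability stays bounded away from 1 while the upper bound P{∑_{i=1}^k Y_i < (k + 1)/2} approaches 1/2.

    import numpy as np

    rng = np.random.default_rng(8)
    eps, trials = 0.1, 2000

    for n in (100, 400, 1600):
        k = int(np.floor((n / 2) * (1 + eps)))
        y = rng.integers(0, 2, (trials, n))
        s1 = y[:, :k].sum(axis=1)                  # 1s among the first k positions
        s2 = y[:, k:].sum(axis=1)                  # 1s among the last n - k positions
        counts = np.stack([s1, k - s1, s2, (n - k) - s2], axis=1) / n
        typical = np.all(np.abs(counts - 0.25) <= eps * 0.25, axis=1)
        bound = np.mean(s1 < (k + 1) / 2)          # the upper bound from part (b)
        print(n, typical.mean(), bound)
    # typical.mean() stays bounded away from 1, while the bound tends to 1/2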