Homework 2
Problem 2.10
(a)
Define a new random variable as follows:
E=
0
1
if X = X1 ,
if X = X2 .
(1)
Then Pr(E = 0) = α = 1 − Pr(E = 1) and H(E|X) = 0. We can get H(X) as
H(X)
=
H(X, E) − H(E|X)
(2)
=
H(X, E)
(3)
=
H(E) + H(X|E)
X
h(α) +
Pr(E = e)H(X|E = e)
(4)
=
(5)
e∈{0,1}
h(α) + αH(X1 ) + (1 − α)H(X2 )
=
(6)
(7)
4
where h(α) = −α log α − (1 − α) log(1 − α).
(b)
Start from Eqn.(6)
H(X) = h(α) + αH(X1 ) + (1 − α)H(X2 )
= −α log α − (1 − α) log(1 − α) + αH(X1 ) + (1 − α)H(X2 )
= −α log
α
2H(X1 )
− (1 − α) log
(1 − α)
2H(X2 )
1
≤ − log H(X )
1
2
+ 2H(X2 )
= log 2H(X1 ) + 2H(X2 )
(6)
(8)
(9)
(10)
(11)
where Eqn.(10) is due to log sum inequality. Then, Eqn.(11) yields
2H(X) ≤ 2H(X1 ) + 2H(X2 )
(12)
Problem 2.18
If A or B wins after Y = 5 turns, then the possible sequences of X are :
i.e., there are in total 2 ×
Therefore, P (Y = 5) = 2 ×
4
3
4
3
BAAAA, ABAAA, AABAA, AAABA,
(13)
ABBBB, BABBB, BBABB, BBBAB
(14)
sequences, and the probability of the event one of them happens is 1/25 .
× ( 21 )5 .
1
Then, we can calculate H(X), H(Y ), H(X|Y ) and H(Y |X) :
7
X
n
n
n−1
1
1
H(X) = −
2∗
log
3
2
2
n=4
(15)
≈ 5.813
(bits)
n n
n−1
1
n−1
1
log 2 ∗
H(Y ) = −
2∗
3
3
2
2
n=4
7
X
≈ 1.924
(bits)
(16)
(17)
(18)
H(Y |X) = 0
(19)
H(X|Y ) = H(X) + H(Y |X) − H(Y ) = 3.889
(bits)
(20)
Problem 2.26
(a)
Start from the Problem 2.13, i.e., ∀x > 0
1
x
(21)
1
≥1−x
x
(22)
ln x ≥ 1 −
which yields
− ln x = ln
i.e.,
ln x ≤ x − 1
(23)
where equality holds iff x = 1.
(b) If there exists x such that p(x) > 0 and q(x) = 0, by the convention (Textbook, Page 19), D(p||q) =
∞ > 0. Then we are done.
4
q(x)
Otherwise, let A = {x ∈ X : p(x) > 0}. Based on this definition, ∀x ∈ A, p(x)
is well-defined and
positive. Therefore, we can derive the upper bound −D(p||q) as
−D(p||q) =
X
p(x) ln
q(x)
p(x)
(24)
p(x) ln
q(x)
p(x)
(25)
p(x) ln
q(x)
p(x)
(26)
x∈X
=
X
x∈A
=
X
x∈A
≤
X
x∈A
=
X
q(x)
p(x)
−1
p(x)
[q(x) − p(x)]
(27)
(28)
x∈A
=
X
q(x) −
x∈A
=
X
X
p(x)
(29)
x∈X
q(x) − 1
(30)
x∈A
≤0
(31)
P
P
The inequality (27) holds iff p(x) = q(x) for all x ∈ A. Since 1 =
x∈A p(x) =
x∈X p(x) =
x∈X q(x), the condition p(x) = q(x) for all x ∈ A implies that q(x) = 0 for all x ∈ X \A, which is the
sufficient and necessary condition when the inequality (31) holds. In sum, p = q.
(c)
P
2
Problem 2.29
(1)
Start from the left hand side (LHS)
LHS = H(X|Z) + H(Y |X, Z)
(32)
≥ H(X|Z) = RHS
(33)
where the equality holds iff Y is a deterministic function of X, Z.
(2)
Similarly, start from the left hand side (LHS)
LHS = I(X; Z) + I(Y ; Z|X)
(34)
≥ I(X; Z) = RHS
(35)
where the equality holds iff Z → X → Y forms a Markov chain.
(3)
Start from the left hand side (LHS)
LHS = H(Z|X, Y )
(36)
= H(Z|X) − I(Z; Y |X)
(37)
≤ H(Z|X)
(38)
= RHS
(39)
where the equality holds iff Z → X → Y forms a Markov chain.
(4)
This is actually an equality.
LHS = I(X, Y ; Z) − I(Z; Y )
(40)
= I(X; Z) + I(Y ; Z|X) − I(Z; Y )
(41)
= RHS
(42)
Problem 2.30
(a)
The formal way is based on Lagrangian multiplier method, which is pretty straight-forward.
(b) Here we give another approach. Let q(n) = abn for n ≥ 0. Then, by the nonnegative property of
information divergence,
D(p||q) =
X
p(n) log
n
p(n)
≥0
q(n)
(43)
X
(44)
which means
H(X) = −
X
p(n) log p(n) ≤ −
X
n
p(n) log q(n) = −
n
p(n) log (abn ) = − log a − A log b
n
which is clearly an upper bound (and is a constant). Equality holds iff p = q and q is a pmf.
The next step is to find such a pmf achieving the upper bound. By solving the following equation array,
X
a
abn =
1=
1−b
n
(45)
X
ab
n
nab =
A=
2
(1 − b)
n
which gives the optimal pmf p∗ (n) =
1
1+A
A
1+A
n
for n ≥ 0 with max H(X) = − log
3
1
1+A
− A log
A
1+A .
Problem 3.3
Let Xi be the proportion of piece at time t. Note that the sequences {Xi } are i.i.d. with the distribution
2
3
3
1
Pr Xi =
=
and Pr Xi =
=
(46)
3
4
5
4
Qn
Then, after n cuts, the size of the case X is X = i=1 Xi . Take the log and use the law of large numbers
n
1
1X
log X =
log Xi
n
n i=1
i.e., the first order in exponent is
3
4
log
2
3
(47)
→ E [log Xi ]
2 1
3
3
= log + log
4
3 4
5
+ 41 log 35 n.
(48)
(49)
Problem 3.4
By AEP and the weak law of large numbers, both probabilities Pr(X n ∈ An ) and Pr(X n ∈ B n ) go to the
limit 1, i.e., for 0 > 0, Pr(X n ∈ An ) ≥ 1 − 0 and Pr(X n ∈ B n ) ≥ 1 − 0 for sufficient large n. So (a) is true.
For (b),
1 − Pr(X n ∈ An ∩ B n ) = Pr(X n ∈ An ∪ B n )
n
(50)
n
≤ Pr(X ∈ An ) + Pr(X ∈ B n )
4
≤ 0 + 0 = which implies that Pr(X n ∈ An ∩ B n ) → 1.
For (c), for all n
|An ∩ B n | ≤ |An | ≤ 2n(H+)
(51)
(52)
(53)
For (d), for sufficient large n, we have Eqn.(52), i.e.,
1 − ≤ Pr(X n ∈ An ∩ B n )
X
≤
Pr(xn )
(54)
(55)
xn ∈An ∩B n
X
2−n(H−)
(56)
= |An ∩ B n |2−n(H−)
(57)
≤
xn ∈An ∩B n
(58)
which means
|An ∩ B n | ≥ (1 − )2n(H−)
1
≥ 2n(H−)
2
(59)
(60)
where the last inequality is simply choosing < 1/2.
Problem II
4
4
Define p = P (X 6= Y ) and r = |X ||Y|. By Fano’s inequality,
H(X|Y ) ≤ H(p) + p log |X |
(61)
H(Y |X) ≤ H(p) + p log |Y|
(62)
4
one upper bound of ρ(X, Y ) is given as follows
ρ(X, Y ) ≤ 2H(p) + p log r
(63)
= −2p log p − 2(1 − p) log(1 − p) + p log r
1 =
− 2p ln p − 2(1 − p) ln(1 − p) + p ln r
ln 2
(64)
(65)
To find an explicit expression of p such that ρ(X, Y ) < , we need to get the upper bounds for each part of
Eqn.(65).
The third item p ln r is a linear function of p. The second item −2(1 − p) ln(1 − p) satisfies
1
1
≤ 2(1 − p)
− 1 = 2p
(66)
2(1 − p) ln
1−p
1−p
For the first item −2p ln p, we need the following inequality
√
−p ln p ≤ 2 p
(67)
There are several ways to prove this. One method based on Eqn.(23) is that for all x > 0
ln x ≤ x − 1
(68)
which is equivalent to say
1
≤
x
r
1 1
ln ≤
2 x
r
r
ln
1
−1
x
(69)
1
−1
x
(70)
i.e.,
which implies
r
1
x ln ≤ 2x
x
1
−1
x
!
√
≤2 x
(71)
In sum, the Eqn.(65) can be
1 − 2p ln p − 2(1 − p) ln(1 − p) + p ln r
ln 2
1
√
=
(4 p + 2p + p ln r)
ln 2
1
√
√
√
≤
(4 p + 2 p + ln r p)
ln 2
6 + ln r √
=
p
ln 2
ρ(X, Y ) =
(65)
(72)
(73)
(74)
(75)
where p < 1 guarantees p <
√
p. If we choose
p≤
ln 2
6 + ln r
2
(76)
then ρ(X, Y ) ≤ .
Problem III
Please note that the log function is the binary logarithm, which uses base 2.
(a) Let p, q be any general pmfs defined on a finite set X . Consider the set M = {x : p(x) ≥ q(x)}. Then
construct the Bernoulli pmfs p̃ = (p(M ), 1 − p(M )), q̃ = (q(M ), 1 − q(M )).
5
Firstly, we have
D(p||q) =
X
p(x) log
p(x)
q(x)
p(x) log
X
p(x)
p(x)
+
p(x) log
q(x)
q(x)
(78)
p(M )
1 − p(M )
+ [1 − p(M )] log
q(M )
1 − q(M )
(79)
x∈X
=
X
(77)
x6∈M
x∈M
≥ p(M ) log
= D(p̃||q̃)
(80)
where the inequality is due to log sum inequality.
On the other hand,
X
d(p, q) =
|p(x) − q(x)|
(81)
x∈X
X
=
|p(x) − q(x)| +
X
|p(x) − q(x)|
(82)
x6∈M
x∈M
= d(p̃, q̃)
(83)
Therefore, if we could prove
D(p̃||q̃) ≥
1 2
d (p̃, q̃)
2 ln 2
(84)
then
1 2
1 2
d (p̃, q̃) =
d (p, q)
2 ln 2
2 ln 2
The equivalent form of Eqn.(84) is shown as follows
D(p||q) ≥ D(p̃||q̃) ≥
p log
p
1−p
2
+ (1 − p) log
−
(p − q)2 ≥ 0
q
1−q
ln 2
(85)
(86)
where for convenience we change the notation (p̃, q̃) → (p, q).
It suffices to prove that for any 0 ≤ p, q ≤ 1 Eqn.(86) always satisfies. Fix p, denote
4
fp (q) = p log
1−p
2
p
+ (1 − p) log
−
(p − q)2
q
1−q
ln 2
(87)
and take the derivative
fp0 (q)
Since
1
q(1−q)
4
≥ 4, ∆ =
h
1
q(1−q)
1
p 1−p
=
− +
+ 4(p − q)
ln 2
q
1−q
q−p
1
=
−4
ln 2 q(1 − q)
q−p
=
∆
ln 2
(88)
(89)
(90)
i
− 4 ≥ 0, which implies that
0
fp (q) = 0
f 0 (q) ≥ 0
p0
fp (q) ≤ 0
if q = p
if q > p
if q < p
(91)
i.e., fp (q)q=p = 0 is a minimum point, which is valid for all p. Therefore, Eqn.(86) is true for all 0 ≤ p, q ≤ 1.
(b)
To find the p, q such that D(p||q)
d2 (p,q) −
1 2 ln 2 < for small > 0, it suffices to consider Bernoulli pmfs.
6
Note that, for any p 6= q, d(p, q) 6= 0. By Eqn.(84), it is equivalent to solve
0≤
After choosing q =
1
2
and p =
D(p||q)
1
−
<
d2 (p, q) 2 ln 2
1
2
(92)
+ δ, (δ > 0), it is easy to check that
1 − h 21 + δ
D(p||q)
1
1
lim 2
−
−
= lim
=0
(93)
δ→0 d (p, q)
2 ln 2 δ→0
4δ 2
2 ln 2
The idea is that, since limδ→0 1 − h 12 + δ = 0, we can use l’Hospital’s rule. The limit of the first item
1−h( 12 +δ )
D(p||q)
1
1 is
exactly
.
This
means
that
we
can
choose
sufficient
small
δ
to
make
−
2
2
4δ
ln 2
d (p,q)
2 ln 2 < .
7
© Copyright 2026 Paperzz