EE5139R: Problem Set 4
Assigned: 31/08/16, Due: 07/09/16
1. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set
\[
C_n(t) = \{ x^n : P_{X^n}(x^n) \ge 2^{-nt} \}.
\]
(a) We have
\[
1 = \sum_{x^n} P_{X^n}(x^n) \ge \sum_{x^n \in C_n(t)} P_{X^n}(x^n) \ge \sum_{x^n \in C_n(t)} 2^{-nt} = |C_n(t)|\, 2^{-nt},
\]
from which the desired result $|C_n(t)| \le 2^{nt}$ follows.
(b) We want to find the set of values t for which
\[
\Pr\bigl(X^n \in C_n(t)\bigr) \to 1.
\]
Consider the probability therein:
\[
\Pr\bigl(X^n \in C_n(t)\bigr) = \Pr\bigl(P_{X^n}(X^n) \ge 2^{-nt}\bigr)
= \Pr\Bigl(-\tfrac{1}{n}\log P_{X^n}(X^n) \le t\Bigr)
= \Pr\Bigl(-\tfrac{1}{n}\sum_{i=1}^{n}\log P_X(X_i) \le t\Bigr).
\]
Now note that the mean of $-\log P_X(X_i)$ is $H(X)$ and this random variable has finite variance. If $t = H(X) + \delta$ for any $\delta > 0$, then by the law of large numbers the probability converges to one; if $t < H(X)$, the same argument shows that it converges to zero instead. Hence the required set is the open interval $(H(X), \infty)$.
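As a numerical illustration (a sketch added here, with a Bernoulli(0.3) source chosen arbitrarily and base-2 logarithms), one can estimate $\Pr(X^n \in C_n(t))$ and observe the transition around $t = H(X)$:

```python
import numpy as np

# Sketch: estimate Pr(X^n in C_n(t)) for an i.i.d. Bernoulli(p) source and three
# values of t straddling H(X); the probability should tend to 1 only for t > H(X).
rng = np.random.default_rng(0)
p, n, trials = 0.3, 1000, 2000
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)          # H(X) in bits

for t in [H - 0.05, H, H + 0.05]:
    x = rng.random((trials, n)) < p                      # i.i.d. Bernoulli(p) samples
    log_prob = np.where(x, np.log2(p), np.log2(1 - p)).sum(axis=1)  # log2 P_{X^n}(X^n)
    inside = log_prob >= -n * t                          # event {P_{X^n}(X^n) >= 2^{-nt}}
    print(f"t = {t:.3f}: Pr(X^n in C_n(t)) ≈ {inside.mean():.3f}")
```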
2. (Optional) Cover and Thomas: Problem 3.7 AEP and Source Coding:
(a) The number of 100-bit binary sequences with three or fewer ones is:
\[
\binom{100}{0} + \binom{100}{1} + \binom{100}{2} + \binom{100}{3} = 1 + 100 + 4950 + 161700 = 166751,
\]
so the required codelength is $\lceil \log_2 166751 \rceil = 18$.
(b) The probability that a 100-bit sequence has three or fewer ones is:
\[
\sum_{i=0}^{3} \binom{100}{i} (0.005)^i (0.995)^{100-i} = 0.99833.
\]
Thus, the probability that the generated sequence cannot be encoded is approximately $1 - 0.99833 = 0.00167$.
(c) If $S_n$ is the sum of $n$ i.i.d. random variables $X_1, \dots, X_n$, Chebyshev's inequality states that
\[
\Pr(|S_n - n\mu| \ge \epsilon) \le \frac{n\sigma^2}{\epsilon^2},
\]
where $\mu$ and $\sigma^2$ are the mean and variance of the $X_i$'s. In this problem, $n = 100$, $\mu = 0.005$, and $\sigma^2 = 0.005 \times 0.995$. Note that $S_{100} \ge 4$ if and only if $|S_{100} - 100 \times 0.005| \ge 3.5$, so we should choose $\epsilon = 3.5$. Then
\[
\Pr(S_{100} \ge 4) \le \frac{100 \times 0.005 \times 0.995}{3.5^2} \approx 0.04061.
\]
This bound is much larger than the actual probability 0.00167.
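The three numbers above can be reproduced with a few lines of code (a sketch using only the standard library; the figures are those already computed in parts (a)–(c)):

```python
from math import comb, log2, ceil

n, p = 100, 0.005

# Part (a): number of 100-bit sequences with at most three ones, and bits to index them.
count = sum(comb(n, i) for i in range(4))
print(count, ceil(log2(count)))                  # 166751, 18

# Part (b): probability that the source output has at most three ones.
p_ok = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(4))
print(round(p_ok, 5), round(1 - p_ok, 5))        # 0.99833, 0.00167

# Part (c): Chebyshev bound on Pr(S_100 >= 4) with eps = 3.5.
var = n * p * (1 - p)
print(round(var / 3.5**2, 5))                    # 0.04061
```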
3. Cover and Thomas: Problem 3.9 AEP and Divergence:
(a) We have that X1 , . . . , Xn are iid according to p(x). We are asked to evaluate the limit in probability of
\[
L = -\frac{1}{n}\log q(X_1, \dots, X_n).
\]
First note from memorylessness (independence) that
\[
L = -\frac{1}{n}\sum_{i=1}^{n}\log q(X_i).
\]
We may also write
\[
L = \underbrace{\frac{1}{n}\sum_{i=1}^{n}\log\frac{p(X_i)}{q(X_i)}}_{=L_1}\;\underbrace{-\,\frac{1}{n}\sum_{i=1}^{n}\log p(X_i)}_{=L_2}.
\]
We know by the usual AEP that the second term L2 converges to H(p) = H(X) in probability.
The first term L1 converges to
\[
E[L_1] = \sum_{x} p(x)\log\frac{p(x)}{q(x)} = D(p\|q)
\]
in probability so L converges to
\[
E[L] = D(p\|q) + H(p)
\]
in probability.
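This convergence can be illustrated numerically (a sketch with an arbitrary pair of pmfs $p$ and $q$ on a binary alphabet, using base-2 logarithms):

```python
import numpy as np

# Sketch: -(1/n) log2 q(X_1,...,X_n), for X_i i.i.d. ~ p, should concentrate
# around D(p||q) + H(p); p and q below are arbitrary example pmfs on {0, 1}.
rng = np.random.default_rng(1)
p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])

D = np.sum(p * np.log2(p / q))          # D(p||q)
H = -np.sum(p * np.log2(p))             # H(p)
print("D(p||q) + H(p) =", round(D + H, 4))

n = 100_000
x = rng.choice(2, size=n, p=p)          # i.i.d. samples from p
L = -np.mean(np.log2(q[x]))             # -(1/n) log2 q(x^n)
print("empirical L    =", round(L, 4))
```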
(b) The limit of the log-likelihood ratio in probability is
\[
E\Bigl[\log\frac{q(X)}{p(X)}\Bigr] = \sum_{x} p(x)\log\frac{q(x)}{p(x)} = -D(p\|q).
\]
4. From Last Year’s Exam:
(a) One value for c is
\[
c = \frac{H(X) + H(X')}{2}.
\]
This is because
\[
-\frac{1}{n}\log \Pr(X^n = x^n) = -\frac{1}{n}\sum_{\substack{i=1 \\ i\ \mathrm{odd}}}^{n}\log \Pr(X_i = x_i) \;-\; \frac{1}{n}\sum_{\substack{i=1 \\ i\ \mathrm{even}}}^{n}\log \Pr(X_i = x_i).
\]
By the law of large numbers, the first sum tends to $H(X)/2$ since the distribution of $X_i$ for odd $i$ is $p_X$, while the second sum tends to $H(X')/2$ since the distribution of $X_i$ for even $i$ is $p_{X'}$.
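A small simulation (a sketch assuming two arbitrary binary pmfs $p_X$ and $p_{X'}$ and base-2 logarithms) illustrates this convergence:

```python
import numpy as np

# Sketch: for a source alternating between pmf pX (odd positions) and pX' (even
# positions), -(1/n) log2 Pr(X^n = x^n) should approach (H(X) + H(X'))/2.
rng = np.random.default_rng(2)
pX, pXp = np.array([0.9, 0.1]), np.array([0.5, 0.5])    # arbitrary example pmfs

def entropy(p):
    return -np.sum(p * np.log2(p))

n = 200_000                                              # even length for simplicity
x_odd = rng.choice(2, size=n // 2, p=pX)                 # symbols at odd positions
x_even = rng.choice(2, size=n // 2, p=pXp)               # symbols at even positions
log_prob = np.sum(np.log2(pX[x_odd])) + np.sum(np.log2(pXp[x_even]))

print("c              =", round((entropy(pX) + entropy(pXp)) / 2, 4))
print("empirical rate =", round(-log_prob / n, 4))
```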
(b) The optimal compression rate is $c$. We code as follows. If the encoder observes a sequence in $T_\epsilon^n(c)$, represent it using $c + \epsilon + 2/n$ bits per symbol (with a single-bit prefix to indicate that the sequence belongs to the typical set). If the encoder observes a sequence not in $T_\epsilon^n(c)$, encode it with an arbitrary string of length $nc + n\epsilon + 2$. This ensures that the compression rate is no more than $c + \epsilon + 2/n$ and the probability of decoding error goes to zero as $n$ becomes large. Since $c + \epsilon + 2/n$ is arbitrarily close to $c$, the claim is proved.
(c) The minimum compression rate is $H(X|Y)$. We code as follows, assuming $\mathcal{Y} = \{0, 1\}$. Let $I_0 := \{i = 1, \dots, n : y_i = 0\}$ be the indices at which the side information is $y_i = 0$ and let $I_1 := \{1, \dots, n\} \setminus I_0$. Then we use an optimum length-$|I_0|$ source code for the source $\Pr(X = x \mid Y = 0)$ and an optimum length-$|I_1|$ source code for the source $\Pr(X = x \mid Y = 1)$. The code rate is thus
\[
\frac{|I_0|}{n} H(X \mid Y = 0) + \frac{|I_1|}{n} H(X \mid Y = 1).
\]
The decoder also knows the indices $I_0$ and $I_1$, so it can partition its observations into these two subblocks and decode each as usual. By the law of large numbers, $|I_j|/n \to p_Y(j)$ for $j = 0, 1$. Thus, for large enough $n$, with very high probability the code rate above is close to
\[
p_Y(0) H(X \mid Y = 0) + p_Y(1) H(X \mid Y = 1),
\]
which is the conditional entropy $H(X \mid Y)$.
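This rate can be evaluated for any joint pmf; the sketch below (with an arbitrary example pmf on $\{0,1\}\times\{0,1\}$) computes $\sum_y p_Y(y) H(X \mid Y = y)$ and checks it against the chain-rule expression $H(X, Y) - H(Y)$:

```python
import numpy as np

# Sketch: H(X|Y) as the pY-weighted average of H(X|Y=y), for an arbitrary
# example joint pmf p(x, y); rows index x, columns index y.
pXY = np.array([[0.4, 0.1],
                [0.2, 0.3]])
pY = pXY.sum(axis=0)

H_X_given_Y = 0.0
for y in range(2):
    pX_given_y = pXY[:, y] / pY[y]
    H_X_given_Y += pY[y] * -np.sum(pX_given_y * np.log2(pX_given_y))

H_XY = -np.sum(pXY * np.log2(pXY))
H_Y = -np.sum(pY * np.log2(pY))
print(round(H_X_given_Y, 4), round(H_XY - H_Y, 4))   # the two values agree
```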
(d) The source has uncertainty (entropy) $H(X)$. The side information reduces the uncertainty from $H(X)$ to $H(X \mid Y)$. The difference is the mutual information $I(X;Y)$, which quantifies the reduction in the uncertainty about $X$ obtained by knowing $Y$.
(e) Yes. The code rate will be increased to $\frac{9}{10} H(X) + \frac{1}{10} H(X \mid Y)$ since the side-information is only available one-tenth of the time.
5. (Optional) 2015/6 Quiz 1: (10 points) Weighted Source Coding: In class, we saw that the
minimum rate of compression for an i.i.d. source $X^n$ with distribution $P_X$ is
\[
H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x).
\]
Now suppose that there are costs to encoding each symbol. Consider a cost function $c : \mathcal{X} \to [0, \infty)$. For any length-$n$ string, let
\[
c^{(n)}(x^n) := \prod_{i=1}^{n} c(x_i),
\]
and let the “size” of any set $A \subset \mathcal{X}^n$ be
\[
c^{(n)}(A) := \sum_{x^n \in A} c^{(n)}(x^n).
\]
We say that a rate $R$ is achievable if there exists a sequence (in $n$) of sets $A_n \subset \mathcal{X}^n$ with “sizes” $c^{(n)}(A_n)$ that satisfy
\[
\frac{1}{n}\log c^{(n)}(A_n) \to R \quad\text{and}\quad \Pr(X^n \notin A_n) \to 0 \quad\text{as } n \to \infty.
\]
We also define the optimal weighted source coding rate to be
\[
R^*(X; c) := \inf\{R \in \mathbb{R} : R \text{ is achievable}\}.
\]
Define
\[
H(P \| c) := \sum_{x \in \mathcal{X}} P_X(x) \log \frac{c(x)}{P_X(x)},
\]
and for a small $\epsilon > 0$, the set
\[
B_\epsilon^{(n)}(X; c) := \Bigl\{ x^n : H(P\|c) - \epsilon \le \frac{1}{n}\log\frac{c^{(n)}(x^n)}{P_{X^n}(x^n)} \le H(P\|c) + \epsilon \Bigr\}.
\]
(a) What is $R^*(X; c)$ when $c(x) = 1$ for all $x \in \mathcal{X}$? No justification needed; only an information quantity needs to be stated.
The answer is $H(X)$, the entropy.
(b) Now for general $c : \mathcal{X} \to [0, \infty)$, is it true that
\[
\Pr\bigl(X^n \in B_\epsilon^{(n)}(X; c)\bigr) \to 1 \quad \text{as } n \to \infty?
\]
Prove or argue why it is not true.
Yes, it is true, by the (weak) law of large numbers, which we prove here via Chebyshev's inequality. Consider
\[
\begin{aligned}
\Pr\bigl(X^n \in B_\epsilon^{(n)}(X; c)\bigr)
&= \Pr\Biggl( \biggl| \frac{1}{n}\sum_{i=1}^{n} \log\frac{c(X_i)}{P_X(X_i)} - H(P\|c) \biggr| \le \epsilon \Biggr) \\
&= 1 - \Pr\Biggl( \biggl| \frac{1}{n}\sum_{i=1}^{n} \log\frac{c(X_i)}{P_X(X_i)} - H(P\|c) \biggr| > \epsilon \Biggr) \\
&\ge 1 - \frac{\operatorname{Var}\bigl(\log\frac{c(X)}{P_X(X)}\bigr)}{n\epsilon^2} \\
&\to 1.
\end{aligned}
\]
(c) Show carefully that
\[
c^{(n)}\bigl(B_\epsilon^{(n)}(X; c)\bigr) \le 2^{n(H(P\|c) + \epsilon)}.
\]
We have
\[
\begin{aligned}
c^{(n)}\bigl(B_\epsilon^{(n)}(X; c)\bigr)
&= \sum_{x^n \in B_\epsilon^{(n)}(X; c)} c^{(n)}(x^n) \\
&\le \sum_{x^n \in B_\epsilon^{(n)}(X; c)} P_{X^n}(x^n)\, 2^{n(H(P\|c) + \epsilon)} \\
&= 2^{n(H(P\|c) + \epsilon)} \sum_{x^n \in B_\epsilon^{(n)}(X; c)} P_{X^n}(x^n) \\
&\le 2^{n(H(P\|c) + \epsilon)},
\end{aligned}
\]
where the inequality in the second line holds because, by the definition of $B_\epsilon^{(n)}(X; c)$, every $x^n$ in the set satisfies $c^{(n)}(x^n) \le P_{X^n}(x^n)\, 2^{n(H(P\|c) + \epsilon)}$.
(d) Using part (c), find the best possible upper bound for $R^*(X; c)$. You need to prove an achievability result, i.e., specify the sets $A_n$ and provide a clear reason for your upper bound. You do not need to prove any converse.
An upper bound is
\[
R^*(X; c) \le H(P\|c) + \epsilon.
\]
Take $A_n := B_\epsilon^{(n)}(X; c)$: by part (b), $\Pr(X^n \notin A_n) \to 0$, and by part (c), $\frac{1}{n}\log c^{(n)}(A_n) \le H(P\|c) + \epsilon$.
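As a sanity check (a sketch with an arbitrary three-letter alphabet, an arbitrary cost function, and base-2 logarithms), one can estimate $\Pr(X^n \in B_\epsilon^{(n)}(X; c))$ and watch it approach one as $n$ grows:

```python
import numpy as np

# Sketch: estimate Pr(X^n in B_eps^{(n)}(X; c)) for an arbitrary small example.
rng = np.random.default_rng(3)
P = np.array([0.5, 0.3, 0.2])          # source pmf P_X
c = np.array([1.0, 2.0, 0.5])          # cost function c(x)
H_Pc = np.sum(P * np.log2(c / P))      # H(P||c) in bits

eps, trials = 0.05, 1000
for n in [100, 1000, 10000]:
    x = rng.choice(3, size=(trials, n), p=P)
    stat = np.mean(np.log2(c[x] / P[x]), axis=1)   # (1/n) log2 (c^(n)(x^n) / P_{X^n}(x^n))
    print(f"n = {n:5d}: Pr ≈ {np.mean(np.abs(stat - H_Pc) <= eps):.3f}")
```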
6. Bad Huffman Codes: Which of these codes cannot be Huffman codes for any probability assignment?
(a) {0, 10, 11}.
Solution: {0, 10, 11} is a Huffman code for the distribution (1/2, 1/4, 1/4).
(b) {00, 01, 10, 110}.
Solution: {00, 01, 10, 110} is not a Huffman code because there is a unique longest codeword: the sibling 111 of the codeword 110 is unused, so 110 could be shortened to 11 while keeping the code prefix-free, and the original code cannot be optimal.
(c) {01, 10}.
Solution: The code {01, 10} can be shortened to {0, 1} without losing its instantaneous property,
and therefore is not optimal and not a Huffman code.
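These claims can also be checked mechanically. The sketch below (an illustration, not part of the original solution) builds a Huffman code with a standard heap-based construction, so one can inspect the codeword lengths produced for any chosen pmf; the example reproduces part (a):

```python
import heapq

def huffman_code(probs):
    """Return a dict symbol -> binary codeword for the given probability dict."""
    # Heap entries: (probability, tie-breaking counter, list of (symbol, partial codeword)).
    heap = [(p, i, [(sym, "")]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)             # two least probable subtrees
        p1, _, group1 = heapq.heappop(heap)
        merged = [(s, "0" + w) for s, w in group0] + [(s, "1" + w) for s, w in group1]
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return dict(heap[0][2])

# Part (a): the distribution (1/2, 1/4, 1/4) yields codeword lengths (1, 2, 2).
print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.25}))
```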
7. (Optional) Suffix-Free Codes: Define a suffix-free code as a code in which no codeword is a suffix
of any other codeword.
(a) Show that suffix-free codes are uniquely decodable. Use the definition of unique decodability,
rather than the intuitive but vague idea of decodability with initial synchronization.
Solution: Assume the contrary, i.e., that some suffix-free code is not uniquely decodable. Then there must exist two distinct sequences of source letters, say $(x_1, x_2, \dots, x_n)$ and $(x'_1, x'_2, \dots, x'_m)$, such that
\[
C(x_1) C(x_2) \cdots C(x_n) = C(x'_1) C(x'_2) \cdots C(x'_m).
\]
Then one of the following must hold: (i) $C(x_n) = C(x'_m)$, (ii) $C(x_n)$ is a proper suffix of $C(x'_m)$, or (iii) $C(x'_m)$ is a proper suffix of $C(x_n)$. In the last two cases we arrive at a contradiction since our code is suffix-free. In the first case, simply delete the last source letter from each sequence and repeat the argument until one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable.
Alternatively, the fact that such codes are uniquely decodable can be seen easily by reversing the order of the code. For any received sequence, we work backwards from the end and look for the reversed codewords. Since the codewords satisfy the suffix condition, the reversed codewords satisfy the prefix condition, and we can uniquely decode the reversed code.
(b) Find an example of a suffix-free code with codeword lengths (1, 2, 2) that is not a prefix-free code. Can a codeword be decoded as soon as its last bit arrives at the decoder? Show that a decoder might have to wait for an arbitrarily long time before decoding (this is why a careful definition of unique decodability is required).
Solution: The {0, 01, 11} code discussed in the lecture is an example of a suffix-free code with
codeword lengths (1, 2, 2) that is not a prefix-free code. Clearly, a codeword cannot be decoded
as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the
following output produced by the encoder, 0111111111 . . .. Assuming that source letters {a, b, c}
map to {0, 01, 11}, we cannot distinguish between the two possible source sequences, acccccccc . . .
and bcccccccc . . . till the end of the string is reached. Hence, in this case the decoder might have
to wait for an arbitrarily long time before decoding.
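The backwards-decoding idea from part (a) can be made concrete; the sketch below (an illustration using the {0, 01, 11} code above) reverses the received string and greedily matches reversed codewords, which form a prefix-free set:

```python
# Sketch: decode a suffix-free code by scanning the received string from the end.
code = {"a": "0", "b": "01", "c": "11"}           # suffix-free but not prefix-free
rev = {cw[::-1]: sym for sym, cw in code.items()} # reversed codewords are prefix-free

def decode(bits):
    out, buf = [], ""
    for bit in reversed(bits):
        buf += bit
        if buf in rev:                            # prefix-free, so match instantly
            out.append(rev[buf])
            buf = ""
    assert buf == "", "not a concatenation of codewords"
    return "".join(reversed(out))

print(decode("0" + "11" * 4))                     # "a" followed by four "c"s -> "acccc"
```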
8. (Optional) Kraft for Uniquely Decodable Codes: Assume a uniquely decodable code has lengths
l1 , . . . , lM .
(a) Prove the following identity (this is easy):
\[
\Biggl(\sum_{j=1}^{M} 2^{-l_j}\Biggr)^{n} = \sum_{j_1=1}^{M}\sum_{j_2=1}^{M}\cdots\sum_{j_n=1}^{M} 2^{-(l_{j_1} + l_{j_2} + \cdots + l_{j_n})}.
\]
Solution: This is trivial. Simply expand the $n$-th power of the sum on the left.
(b) Show that there is one term on the right for each concatenation of $n$ codewords (i.e., for the encoding of the $n$-tuple $x = (x_1, \dots, x_n)$), where $l_{j_1} + l_{j_2} + \cdots + l_{j_n}$ is the aggregate length of that concatenation.
Solution: Each term on the right corresponds to one choice of indices $(j_1, \dots, j_n)$, that is, to one concatenation of $n$ codewords, and its exponent $l_{j_1} + l_{j_2} + \cdots + l_{j_n}$ is the aggregate length of that concatenation.
(c) Let $A_i$ be the number of concatenations that have overall length $i$ and show that
\[
\Biggl(\sum_{j=1}^{M} 2^{-l_j}\Biggr)^{n} = \sum_{i=n}^{n l_{\max}} A_i\, 2^{-i}.
\]
j=1
Solution: The smallest value this exponent can take is n, which would happen if all code words
had the length 1. The largest value the exponent can take is nlmax where lmax is the maximal
codeword length. The summation can then be written as above.
(d) Using unique decodability, show that $A_i \le 2^i$ and hence
\[
\Biggl(\sum_{j=1}^{M} 2^{-l_j}\Biggr)^{n} \le n\, l_{\max}.
\]
Solution: The number of possible binary sequences of length $i$ is $2^i$. Since the code is uniquely decodable, distinct concatenations of codewords must produce distinct binary sequences, so $A_i \le 2^i$. Plugging this into the above bound yields
\[
\Biggl(\sum_{j=1}^{M} 2^{-l_j}\Biggr)^{n} \le \sum_{i=n}^{n l_{\max}} 2^{i}\, 2^{-i} = n(l_{\max} - 1) + 1 \le n\, l_{\max}.
\]
(e) By taking n-th root and letting n → ∞, recover Kraft’s inequality for uniquely decodable codes.
Solution: We have
\[
\sum_{j=1}^{M} 2^{-l_j} \le \bigl(n\, l_{\max}\bigr)^{1/n} = \exp\Bigl(\frac{1}{n}\log\bigl(n\, l_{\max}\bigr)\Bigr).
\]
The exponent goes to zero as $n \to \infty$ and hence $\sum_{j=1}^{M} 2^{-l_j} \le 1$, which is Kraft's inequality for uniquely decodable codes.
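The counting identity of parts (a)–(c) can be verified by brute force for a small example; the sketch below (with an arbitrary choice of codeword lengths) enumerates all concatenations of $n$ codewords by their aggregate length:

```python
from collections import Counter
from itertools import product

# Sketch: check (sum_j 2^{-l_j})^n = sum_i A_i 2^{-i} for arbitrary example lengths.
lengths = [1, 2, 3, 3]      # codeword lengths of a small prefix-free code
n = 3

A = Counter(sum(combo) for combo in product(lengths, repeat=n))   # A_i by aggregate length i
lhs = sum(2.0 ** -l for l in lengths) ** n
rhs = sum(count * 2.0 ** -i for i, count in A.items())
print(lhs, rhs)             # equal; both are 1.0 since these lengths meet Kraft with equality
```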
9. (Optional) Infinite Alphabet Optimal Code: Let $X$ be an i.i.d. source with the infinite alphabet $\mathcal{X} = \{1, 2, 3, \dots\}$, and let $P(X = i) = 2^{-i}$.
(a) What is the entropy of X?
Solution: By direct calculation,
\[
H(X) = \sum_{i=1}^{\infty} -2^{-i}\log(2^{-i}) = \sum_{i=1}^{\infty} i\, 2^{-i} = 2.
\]
This is because
\[
\sum_{i=1}^{\infty} i x^{i-1} = \frac{1}{(1-x)^2}
\]
for all $|x| < 1$, which can be shown by differentiating the geometric series term by term; setting $x = 1/2$ gives $\sum_{i=1}^{\infty} i\, 2^{-i} = \frac{1}{2}\cdot\frac{1}{(1 - 1/2)^2} = 2$.
(b) Find an optimal variable length code, and show that it is indeed optimal.
Solution: Take the codelengths to be $-\log(2^{-1}), -\log(2^{-2}), -\log(2^{-3}), \dots = 1, 2, 3, \dots$. The codewords can be
\[
C(1) = 0, \quad C(2) = 10, \quad C(3) = 110, \quad \dots
\]
i.e., $C(i)$ consists of $i-1$ ones followed by a zero. The expected codeword length is $\sum_{i=1}^{\infty} i\, 2^{-i} = 2 = H(X)$ bits; since the expected length of any uniquely decodable code is at least the entropy, this code is optimal.
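A quick numerical check (a sketch that truncates the infinite sums, using base-2 logarithms) confirms both the entropy value and the matching expected codeword length:

```python
from math import log2

# Sketch: truncate the infinite sums to check that H(X) = 2 bits and that the code
# with codeword lengths l_i = i has expected length 2 when P(X = i) = 2^{-i}.
N = 60                                             # truncation point; the tail is negligible
probs = {i: 2.0 ** -i for i in range(1, N + 1)}
entropy = -sum(p * log2(p) for p in probs.values())
avg_len = sum(p * i for i, p in probs.items())     # codeword C(i) has length i
print(round(entropy, 6), round(avg_len, 6))        # both ≈ 2.0
```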