GI12/4C59 – Homework #2 (Due 12am, November 8, 2005)
Aim: To gain familiarity with the basic concepts of Information Theory (entropy,
mutual information, etc.) and some coding principles. Presentation, clarity, and
synthesis of exposition will be taken into account in the assessment of these exercises.
This document is available from
http://www.cs.ucl.ac.uk/staff/Rob.Smith/Internal/information_theory.htm
1. [30 pts] Let X and Y be two random variables with values in the sets
X = {0, 1, 2} and Y = {0, 1} respectively. Define the probability distribution p
on X × Y by the table

            X = 0    X = 1    X = 2
   Y = 0     1/3      1/12     1/6
   Y = 1     1/12     1/6      1/6
(a) Compute the joint entropy of X and Y, H(X,Y).
(b) Find the marginal distribution of X and the conditional distribution of
Y given X. Then, use these quantities to compute the entropy of X,
H(X), and the conditional entropy of Y given X, H(Y|X).
(c) Verify the entropy results above by using the chain rule which relates
H(X,Y) to H(X) and H(Y|X).
(d) Compute the mutual information between X and Y.
2. [30 pts] This one might require you to use a computer, but only in the most
basic way. The frequency pn of the nth most frequent word in English (and in
many other languages) is roughly approximated by:

   p_n = 0.1/n   for n = 1, ..., 12,367
   p_n = 0       for n > 12,367
(this is known as Zipf’s Law). Assuming English is generated by selecting words
at random from this distribution, what is the information entropy of English?
3. [10 pts] The following strings are received in a (7,4) Hamming code. Decode,
please:
(a) r = 1101011
(b) r = 0110110
(c) r = 0100111
(d) r = 1111111
4. [30 pts] Imagine constructing a Huffman code for symbols made up of blocks
of n bits. We will call each of these codes Xn. If the probability of 0 is 0.8 and
the probability of 1 is 0.2, determine the optimal Huffman codes X2 and X3.
Calculate the expected length of the codewords, and the information entropy
for the codes. Please show your work.
Model answers:
1.
a)
H(X,Y) = -Σ_{x,y} P(x,y) log P(x,y)
       = -[ (1/3) log(1/3) + (1/12) log(1/12) + (1/6) log(1/6)
            + (1/12) log(1/12) + (1/6) log(1/6) + (1/6) log(1/6) ]
       ≈ 2.418 bits
b)
The marginal of X is P(x) = Σ_y P(x,y):
P(X=0) = P(0,0) + P(0,1) = 1/3 + 1/12 = 5/12
P(X=1) = P(1,0) + P(1,1) = 1/12 + 1/6 = 1/4
P(X=2) = P(2,0) + P(2,1) = 1/6 + 1/6 = 1/3
The conditional distribution of Y given X is P(y|x) = P(x,y) / P(x):

            X = 0    X = 1    X = 2
   Y = 0     4/5      1/3      1/2
   Y = 1     1/5      2/3      1/2
H(X) = -Σ_x P(x) log P(x)
     = -[ (5/12) log(5/12) + (1/4) log(1/4) + (1/3) log(1/3) ]
     ≈ 1.555 bits
H(Y|X) = -Σ_{x,y} P(x,y) log P(y|x)
       = -[ (1/3) log(4/5) + (1/12) log(1/3) + (1/6) log(1/2)
            + (1/12) log(1/5) + (1/6) log(2/3) + (1/6) log(1/2) ]
       ≈ 0.864 bits
c)
By the chain rule, H(X,Y) = H(X) + H(Y|X) = 1.555 + 0.864 = 2.418 (forgive the
rounded results), which agrees with the value computed in (a).
d)
I(X;Y) = H(X) - H(X|Y), or equivalently
I(X;Y) = H(X) + H(Y) - H(X,Y)
The marginal of Y is P(y) = Σ_x P(x,y):
P(Y=0) = P(0,0) + P(1,0) + P(2,0) = 7/12
P(Y=1) = P(0,1) + P(1,1) + P(2,1) = 5/12
H(Y) ≈ 0.980
I(X;Y) = 1.555 + 0.980 - 2.418 ≈ 0.117 bits
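These results are easy to double-check numerically. Below is a short Python sketch (an
optional check, not part of the required hand calculation) that recomputes every quantity
in Problem 1 directly from the joint table; it should reproduce 2.418, 1.555, 0.980, 0.864,
and 0.117 up to rounding.

from math import log2

# Joint distribution P(x, y) from the table in Problem 1.
p = {(0, 0): 1/3,  (1, 0): 1/12, (2, 0): 1/6,
     (0, 1): 1/12, (1, 1): 1/6,  (2, 1): 1/6}

def H(dist):
    # Entropy in bits of an iterable of probabilities.
    return -sum(q * log2(q) for q in dist if q > 0)

px = {x: sum(p[(x, y)] for y in (0, 1)) for x in (0, 1, 2)}   # marginal of X
py = {y: sum(p[(x, y)] for x in (0, 1, 2)) for y in (0, 1)}   # marginal of Y

H_xy = H(p.values())          # joint entropy, ~2.418
H_x  = H(px.values())         # ~1.555
H_y  = H(py.values())         # ~0.980
H_y_given_x = H_xy - H_x      # chain rule, ~0.864
I_xy = H_x + H_y - H_xy       # mutual information, ~0.117
print(H_xy, H_x, H_y, H_y_given_x, I_xy)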
2. H = -Σ_{n=1}^{12,367} (0.1/n) log(0.1/n) ≈ 9.717 bits per word.
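The question suggests a basic computer calculation; one possible sketch in Python is below.
It simply sums over the truncated Zipf distribution (the weights 0.1/n add up to roughly 1
at n = 12,367, so the distribution is essentially normalised).

from math import log2

N = 12367
p = [0.1 / n for n in range(1, N + 1)]   # Zipf probabilities p_n = 0.1/n
print(sum(p))                            # ~1.0: the distribution is (approximately) normalised
print(-sum(q * log2(q) for q in p))      # entropy, ~9.72 bits per word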
3.
a) 1100
b) 0100
c) 0100
d) 1111
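These answers come from single-error syndrome decoding. The Python sketch below assumes
the common (7,4) convention in which bits t1..t4 of the codeword are the source bits and
the parity bits satisfy t5 = t1+t2+t3, t6 = t2+t3+t4, t7 = t1+t3+t4 (mod 2); that
parity-check matrix is an assumption on my part, but under it the sketch reproduces the
four decodings above.

# Parity-check matrix for the assumed (7,4) Hamming convention;
# each row is one parity check over bit positions 1..7.
H = [
    [1, 1, 1, 0, 1, 0, 0],   # t1 + t2 + t3 + t5 = 0 (mod 2)
    [0, 1, 1, 1, 0, 1, 0],   # t2 + t3 + t4 + t6 = 0 (mod 2)
    [1, 0, 1, 1, 0, 0, 1],   # t1 + t3 + t4 + t7 = 0 (mod 2)
]

def decode(r):
    # Correct at most one flipped bit in the 7-bit string r, return the 4 source bits.
    bits = [int(c) for c in r]
    syndrome = [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]
    if any(syndrome):
        # The syndrome equals the column of H at the errored position: flip that bit.
        pos = [list(col) for col in zip(*H)].index(syndrome)
        bits[pos] ^= 1
    return ''.join(str(b) for b in bits[:4])

for r in ("1101011", "0110110", "0100111", "1111111"):
    print(r, "->", decode(r))   # 1100, 0100, 0100, 1111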
4.
X2: P(00)=0.64, P(01)=P(10)=0.16, P(11)=0.04
X3: P(000)=0.512, P(001)=P(010)=P(100)=0.128,
P(011)=P(101)=P(110)=0.032, P(111)=0.008
Shannon limits to code efficiency:
H(X2)=1.443856
H(X3)=2.165784
For X2, let's choose the codewords 0, 10, 110, and 111. Following the Huffman
coding procedure, let's assign input word 11 to 111, 10 to 110, 01 to 10, and 00 to 0.
The resulting average number of code bits per input word is
(3*0.04)+(3*0.16)+(2*0.16)+(1*0.64)=1.56.
For X3, the Huffman coding procedure gives codeword lengths 1, 3, 3, 3, 5, 5, 5, 5; for
example the codewords 0, 100, 101, 110, 11100, 11101, 11110, 11111. Let's assign input
word 000 to 0; the input words 001, 010, and 100 to the length-3 codewords 100, 101, and
110; and the input words 011, 101, 110, and 111 to the length-5 codewords.
The resulting average number of code bits per input word is
(5*0.008)+(5*0.032)+(5*0.032)+(5*0.032)+(3*0.128)+(3*0.128)+(3*0.128)+
(1*0.512)=2.184
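As a further check, the Python sketch below builds Huffman codes for X2 and X3 with a heap
and reports the block entropy and the expected codeword length; the particular codewords
it picks may differ from the ones chosen above, but the lengths, and hence the averages
(1.56 and 2.184), agree.

import heapq
from itertools import product
from math import log2

def huffman_lengths(probs):
    # Return {symbol: codeword length} for an optimal binary Huffman code.
    # Heap entries are (probability, tie-breaker, symbols in that subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:            # each merge adds one bit to every symbol below it
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

for n in (2, 3):
    probs = {''.join(b): 0.8 ** b.count('0') * 0.2 ** b.count('1')
             for b in product('01', repeat=n)}
    lengths = huffman_lengths(probs)
    entropy = -sum(p * log2(p) for p in probs.values())
    avg_len = sum(probs[s] * lengths[s] for s in probs)
    print(n, entropy, avg_len)       # ~1.4439 / 1.56 for n=2, ~2.1658 / 2.184 for n=3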