
ECE 255AN
Fall 2016
Homework set 3 - solutions
Solutions to Chapter 4 problems
11. Stationary processes.
(a) H(Xn |X0 ) = H(X−n |X0 ).
This statement is true, since
H(Xn|X0) = H(Xn, X0) − H(X0)
H(X−n|X0) = H(X−n, X0) − H(X0)
and H(Xn, X0) = H(X−n, X0) by stationarity.
(b) H(Xn |X0 ) ≥ H(Xn−1 |X0 ).
This statement is not true in general, though it is true for first order Markov chains.
A simple counterexample is a periodic process with period n. Let X0 , X1 , X2 , . . . , Xn−1
be i.i.d. uniformly distributed binary random variables and let Xk = Xk−n for k ≥
n. In this case, H(Xn |X0 ) = 0 and H(Xn−1 |X0 ) = 1, contradicting the statement
H(Xn |X0 ) ≥ H(Xn−1 |X0 ).
(c) H(Xn|X1^(n−1), Xn+1) is non-increasing in n.
This statement is true, since by stationarity H(Xn|X1^(n−1), Xn+1) = H(Xn+1|X2^n, Xn+2) ≥ H(Xn+1|X1^n, Xn+2), where the inequality follows from the fact that conditioning reduces entropy.
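The counterexample in part (b) can also be checked exactly. A minimal sketch (added here as a sanity check, taking n = 3) enumerates the eight equally likely values of (X0, X1, X2) and computes the two conditional entropies:

from collections import defaultdict
from itertools import product
from math import log2

def cond_entropy(joint):
    """H(A | B) for a joint pmf given as a dict {(a, b): probability}."""
    pb = defaultdict(float)
    for (a, b), pr in joint.items():
        pb[b] += pr
    return sum(pr * log2(pb[b] / pr) for (a, b), pr in joint.items() if pr > 0)

n = 3
joint_n_0 = defaultdict(float)    # joint pmf of (X_n, X_0)
joint_nm1_0 = defaultdict(float)  # joint pmf of (X_{n-1}, X_0)
for x0, x1, x2 in product((0, 1), repeat=n):  # X_0, X_1, X_2 i.i.d. uniform bits
    x3 = x0                                   # periodicity: X_3 = X_0
    joint_n_0[(x3, x0)] += 1 / 2 ** n
    joint_nm1_0[(x2, x0)] += 1 / 2 ** n

print(cond_entropy(joint_n_0))    # 0.0 bits = H(X_n | X_0)
print(cond_entropy(joint_nm1_0))  # 1.0 bit  = H(X_{n-1} | X_0)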
18. (a) Since the coin tosses are independent conditional on the coin chosen, I(Y1; Y2|X) = 0.
(b) The key point is that if we did not know the coin being used, then Y1 and Y2 are not
independent. The joint distribution of Y1 and Y2 can be easily calculated from the
following table
X   Y1   Y2   Probability (given X)
1   H    H    p^2
1   H    T    p(1 − p)
1   T    H    p(1 − p)
1   T    T    (1 − p)^2
2   H    H    (1 − p)^2
2   H    T    (1 − p)p
2   T    H    p(1 − p)
2   T    T    p^2
Thus the joint distribution of (Y1, Y2) is ((1/2)(p^2 + (1 − p)^2), p(1 − p), p(1 − p), (1/2)(p^2 + (1 − p)^2)),
and we can now calculate
I(X; Y1, Y2) = H(Y1, Y2) − H(Y1, Y2|X)
= H(Y1, Y2) − H(Y1|X) − H(Y2|X)
= H(Y1, Y2) − 2H(p)
= H((1/2)(p^2 + (1 − p)^2), p(1 − p), p(1 − p), (1/2)(p^2 + (1 − p)^2)) − 2H(p)
= 1 + H(2p(1 − p)) − 2H(p),
where the last step follows from the grouping rule for entropy.
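As a numerical sanity check of this expression (added, not part of the derivation above): for a sample bias p, build the joint pmf of (Y1, Y2) directly, with coin 1 of bias p, coin 2 of bias 1 − p, and the coin X chosen uniformly, and compare I(X; Y1, Y2) against the closed form 1 + H(2p(1 − p)) − 2H(p):

from math import log2

def Hb(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def H(dist):
    """Entropy of a probability vector, in bits."""
    return -sum(q * log2(q) for q in dist if q > 0)

p = 0.3
# Conditional pmfs over (Y1, Y2) in the order (HH, HT, TH, TT).
coin1 = [p * p, p * (1 - p), (1 - p) * p, (1 - p) ** 2]
coin2 = [(1 - p) ** 2, (1 - p) * p, p * (1 - p), p * p]
joint_y = [0.5 * a + 0.5 * b for a, b in zip(coin1, coin2)]

mi = H(joint_y) - 2 * Hb(p)                       # I(X; Y1, Y2) = H(Y1, Y2) - H(Y1, Y2 | X)
closed_form = 1 + Hb(2 * p * (1 - p)) - 2 * Hb(p)
print(mi, closed_form)                            # both ~0.219 bits for p = 0.3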
(c)
H(Y) = lim_{n→∞} (1/n) H(Y1, Y2, . . . , Yn)
= lim (1/n) [H(X, Y1, Y2, . . . , Yn) − H(X|Y1, Y2, . . . , Yn)]
= lim (1/n) [H(X) + H(Y1, Y2, . . . , Yn|X) − H(X|Y1, Y2, . . . , Yn)]
Since 0 ≤ H(X|Y1, Y2, . . . , Yn) ≤ H(X) ≤ 1, we have lim (1/n) H(X) = 0 and similarly lim (1/n) H(X|Y1, Y2, . . . , Yn) = 0. Also, H(Y1, Y2, . . . , Yn|X) = nH(p), since the Yi's are i.i.d. given X. Combining these terms, we get
H(Y) = lim (1/n) nH(p) = H(p).
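The convergence can also be seen numerically. The sketch below (an added illustration) computes the exact per-symbol entropy (1/n) H(Y1, . . . , Yn) for several n by grouping sequences according to their number of heads; it decreases toward H(p), since the uncertainty about the coin contributes at most one bit in total:

from math import comb, log2

def Hb(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

p = 0.3
for n in (1, 5, 20, 100):
    block_entropy = 0.0
    for k in range(n + 1):
        # probability of one particular length-n sequence with k heads, averaged over the coin
        q = 0.5 * (p ** k * (1 - p) ** (n - k) + (1 - p) ** k * p ** (n - k))
        block_entropy += comb(n, k) * (-q * log2(q))
    print(n, block_entropy / n)   # decreases toward Hb(0.3)
print(Hb(p))                      # ~0.881 bits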
19. (a) The stationary distribution for a connected graph of undirected edges with equal weight is given as µi = Ei/(2E), where Ei denotes the number of edges emanating from node i and E is the total number of edges in the graph. Hence, the stationary distribution is [3/16, 3/16, 3/16, 3/16, 4/16]; i.e., the first four (exterior) nodes have steady state probability of 3/16, while node 5 has steady state probability of 1/4.
(b) Thus, the entropy rate of the random walk on this graph is
4 × (3/16) log2(3) + (1/4) log2(4) = (3/4) log2(3) + 1/2 = log2(16) − H(3/16, 3/16, 3/16, 3/16, 1/4) ≈ 1.69 bits.
(c) The mutual information
I(Xn+1; Xn) = H(Xn+1) − H(Xn+1|Xn)
= H(3/16, 3/16, 3/16, 3/16, 1/4) − (log 16 − H(3/16, 3/16, 3/16, 3/16, 1/4))
= 2H(3/16, 3/16, 3/16, 3/16, 1/4) − log 16
= 2((3/4) log(16/3) + (1/4) log 4) − log 16
= 3 − (3/2) log 3.
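Both values can be confirmed with a few lines of Python (an added check, using the stationary distribution and node degrees given above):

from math import log2

def H(dist):
    return -sum(q * log2(q) for q in dist if q > 0)

mu = [3/16, 3/16, 3/16, 3/16, 1/4]   # stationary distribution
deg = [3, 3, 3, 3, 4]                # node degrees

entropy_rate = sum(m * log2(d) for m, d in zip(mu, deg))   # sum_i mu_i log(deg_i)
print(entropy_rate, log2(16) - H(mu))                      # both ~1.689 bits
print(2 * H(mu) - log2(16), 3 - 1.5 * log2(3))             # I(X_{n+1}; X_n), both ~0.623 bits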
22. 3D Maze.
The entropy rate of a random walk on a graph with equal weights is given by equation 4.41
in the text:
H(X) = log(2E) − H(E1/2E, E2/2E, . . . , Em/2E)
There are 8 corner cells, 12 edge cells, 6 face cells, and 1 center cell. Corner cells have 3 edges each, edge cells have 4, face cells have 5, and the center cell has 6. Therefore, the total number of edges is E = 54.
So,
H(X) = log(108) + 8 × (3/108) log(3/108) + 12 × (4/108) log(4/108) + 6 × (5/108) log(5/108) + 1 × (6/108) log(6/108)
= 2.03 bits
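As an added check, the cell degrees of the 3 × 3 × 3 maze and the resulting entropy rate can be computed directly:

from itertools import product
from math import log2

cells = list(product(range(3), repeat=3))
def degree(c):
    # number of axis-aligned neighbours inside the 3x3x3 cube
    return sum(1 for axis in range(3) for step in (-1, 1) if 0 <= c[axis] + step <= 2)

degs = [degree(c) for c in cells]
two_E = sum(degs)                                                       # 2E = 108, so E = 54
entropy_rate = log2(two_E) + sum((d / two_E) * log2(d / two_E) for d in degs)
print(two_E, entropy_rate)                                              # 108, ~2.03 bits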
30. (a) Let µn denote the probability mass function at time n. Since µ1 = (1/3, 1/3, 1/3) and µ2 = µ1 P = µ1, we have µn = µ1 = (1/3, 1/3, 1/3) for all n, and {Xn} is stationary.
Alternatively, the observation that P is doubly stochastic leads to the same conclusion.
(b) Since {Xn} is stationary Markov,
lim_{n→∞} (1/n) H(X1, . . . , Xn) = H(X2|X1)
= Σ_{k=0}^{2} P(X1 = k) H(X2|X1 = k)
= 3 × (1/3) × H(1/2, 1/4, 1/4)
= 3/2.
(c) Since (X1, . . . , Xn) and (Z1, . . . , Zn) are one-to-one, by the chain rule of entropy and the Markovity,
H(Z1, . . . , Zn) = H(X1, . . . , Xn)
= Σ_{k=1}^{n} H(Xk|X1, . . . , Xk−1)
= H(X1) + Σ_{k=2}^{n} H(Xk|Xk−1)
= H(X1) + (n − 1)H(X2|X1)
= log 3 + (3/2)(n − 1).
Alternatively, we can use the results of parts (d), (e), and (f). Since Z1, . . . , Zn are independent and Z2, . . . , Zn are identically distributed with the probability distribution (1/2, 1/4, 1/4),
H(Z1, . . . , Zn) = H(Z1) + H(Z2) + · · · + H(Zn)
= H(Z1) + (n − 1)H(Z2)
= log 3 + (3/2)(n − 1).
(d) Since {Xn} is stationary with µn = (1/3, 1/3, 1/3),
H(Xn) = H(X1) = H(1/3, 1/3, 1/3) = log 3.
For n ≥ 2, Zn = 0 with probability 1/2, Zn = 1 with probability 1/4, and Zn = 2 with probability 1/4.
Hence, H(Zn) = H(1/2, 1/4, 1/4) = 3/2.
(e) Due to the symmetry of P, P(Zn|Zn−1) = P(Zn) for n ≥ 2. Hence, H(Zn|Zn−1) = H(Zn) = 3/2.
Alternatively, using the result of part (f), we can trivially reach the same conclusion.
(f) Let k ≥ 2. First observe that by the symmetry of P, Zk+1 = Xk+1 − Xk is independent of Xk. Since
H(Zk+1|Xk, Xk−1) = H(Xk+1 − Xk|Xk, Xk−1)
= H(Xk+1 − Xk|Xk)    (Markovity)
= H(Xk+1 − Xk)    (independence of Zk+1 and Xk)
= H(Zk+1),
Zk+1 is independent of (Xk, Xk−1) and hence independent of Zk = Xk − Xk−1.
For k = 1, again by the symmetry of P , Z2 is independent of Z1 = X1 trivially.
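The transition matrix P is not restated in these solutions; assuming, consistently with the calculations above, that P is the circulant matrix with rows (1/2, 1/4, 1/4) (i.e., the chain stays in place with probability 1/2 and moves +1 or +2 mod 3 with probability 1/4 each), the sketch below checks parts (a), (b), and (f) numerically:

from itertools import product
from math import isclose, log2

P = [[1/2, 1/4, 1/4],
     [1/4, 1/2, 1/4],
     [1/4, 1/4, 1/2]]   # assumed transition matrix
mu = [1/3, 1/3, 1/3]

def H(dist):
    return -sum(q * log2(q) for q in dist if q > 0)

# (a) the uniform distribution is stationary (P is doubly stochastic)
print(all(isclose(sum(mu[i] * P[i][j] for i in range(3)), 1/3) for j in range(3)))

# (b) entropy rate H(X2 | X1) = sum_k P(X1 = k) H(X2 | X1 = k) = 3/2
print(sum(mu[k] * H(P[k]) for k in range(3)))   # 1.5

# (f) Z2 = X2 - X1 (mod 3) is independent of Z1 = X1
joint = {}
for x1, x2 in product(range(3), repeat=2):
    z1, z2 = x1, (x2 - x1) % 3
    joint[(z1, z2)] = joint.get((z1, z2), 0.0) + mu[x1] * P[x1][x2]
pz1 = [sum(v for (a, b), v in joint.items() if a == z) for z in range(3)]
pz2 = [sum(v for (a, b), v in joint.items() if b == z) for z in range(3)]
print(all(isclose(joint[(a, b)], pz1[a] * pz2[b]) for a in range(3) for b in range(3)))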
Solutions to Chapter 2 problems
29. Inequalities.
(a) Using the chain rule for conditional entropy,
H(X, Y |Z) = H(X|Z) + H(Y |X, Z) ≥ H(X|Z),
with equality iff H(Y |X, Z) = 0, that is, when Y is a function of X and Z.
(b) Using the chain rule for mutual information,
I(X, Y ; Z) = I(X; Z) + I(Y ; Z|X) ≥ I(X; Z),
with equality iff I(Y ; Z|X) = 0, that is, when Y and Z are conditionally independent
given X.
(c) Using first the chain rule for entropy and then the definition of conditional mutual
information,
H(X, Y, Z) − H(X, Y ) = H(Z|X, Y ) = H(Z|X) − I(Y ; Z|X)
≤ H(Z|X) = H(X, Z) − H(X) ,
with equality iff I(Y ; Z|X) = 0, that is, when Y and Z are conditionally independent
given X.
(d) Using the chain rule for mutual information,
I(X; Z|Y ) + I(Z; Y ) = I(X, Y ; Z) = I(Z; Y |X) + I(X; Z) ,
and therefore
I(X; Z|Y ) = I(Z; Y |X) − I(Z; Y ) + I(X; Z) .
We see that this inequality is actually an equality in all cases.
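The identity in part (d) can also be verified numerically. A minimal sketch (added here; the joint pmf over three binary variables is generated at random and is purely illustrative) evaluates both sides through joint entropies:

import random
from math import log2

random.seed(0)
# random joint pmf p(x, y, z) over three binary variables
p = {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p[(x, y, z)] = random.random()
total = sum(p.values())
for key in p:
    p[key] /= total

def H(idx):
    """Joint entropy of the chosen coordinates (0 = X, 1 = Y, 2 = Z)."""
    marg = {}
    for xyz, pr in p.items():
        key = tuple(xyz[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(q * log2(q) for q in marg.values() if q > 0)

I_XZ_given_Y = H((0, 1)) + H((1, 2)) - H((0, 1, 2)) - H((1,))
I_ZY_given_X = H((0, 2)) + H((0, 1)) - H((0, 1, 2)) - H((0,))
I_XZ = H((0,)) + H((2,)) - H((0, 2))
I_ZY = H((2,)) + H((1,)) - H((1, 2))
print(I_XZ_given_Y - I_ZY_given_X, I_XZ - I_ZY)   # the two sides agree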
48. (a)
I(X^N; N) = H(N) − H(N|X^N)
= H(N) − 0    (N is determined by X^N, namely by its length)
= E(N)
= 2,
where the step H(N) = E(N) follows from the fact that the entropy (in bits) of a Geometric(1/2) random variable equals its mean.
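As a quick numerical check of that fact (added here), the entropy in bits and the mean of a Geometric(1/2) random variable agree; the sums are truncated, which changes nothing visible:

from math import log2

pmf = [(n, 0.5 ** n) for n in range(1, 200)]         # P(N = n) = 2^{-n}
entropy = sum(pr * log2(1 / pr) for n, pr in pmf)
mean = sum(n * pr for n, pr in pmf)
print(entropy, mean)                                  # both ~2.0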
(b) Since given N we know that Xi = 0 for all i < N and XN = 1,
H(X^N|N) = 0.
(c)
H(X^N) = I(X^N; N) + H(X^N|N)
= I(X^N; N) + 0
= 2.
(d)
I(X^N; N) = H(N) − H(N|X^N)
= H(N) − 0
= HB(1/3).
(e)
H(X^N|N) = (1/3) H(X^6|N = 6) + (2/3) H(X^12|N = 12)
= (1/3) H(X^6) + (2/3) H(X^12)
= (1/3) × 6 + (2/3) × 12
= 10.
(f)
H(X^N) = I(X^N; N) + H(X^N|N)
= I(X^N; N) + 10
= HB(1/3) + 10.
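Finally, parts (e) and (f) can be verified by direct enumeration of the possible values of X^N (an added check): with probability 1/3 the observed string is one of the 2^6 equally likely length-6 strings, and with probability 2/3 one of the 2^12 equally likely length-12 strings.

from math import log2

def Hb(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

atoms = [(1/3) * 2 ** -6] * 2 ** 6 + [(2/3) * 2 ** -12] * 2 ** 12
H_XN = -sum(pr * log2(pr) for pr in atoms)
print(H_XN, Hb(1/3) + 10)              # both ~10.918 bits
print((1/3) * 6 + (2/3) * 12)          # H(X^N | N) = 10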