ECE 534 Information Theory - MIDTERM 1
09/29/2010, LH 305.
• This exam has 4 questions, each of which is worth 25 points.
• You will be given the full 1.25 hours. Use it wisely! Many of the problems have short answers; try
to find shortcuts. Do questions that you think you can answer correctly first.
• You may bring and use one 8.5x11” double-sided crib sheet.
• No other notes or books are permitted.
• No calculators are permitted.
• Talking, passing notes, copying (and all other forms of cheating) are forbidden.
• Make sure you explain your answers in a way that illustrates your understanding of the problem. Ideas
are important, not just the calculation.
• Partial marks will be given.
• Write all answers directly on this exam.
Your name:
Your UIN:
Your signature:
The exam has 4 questions, for a total of 100 points.
Question:    1    2    3    4    Total
Points:     25   25   25   25      100
Score:
1. A dog looking for a bone.
A dog walks on the integers, possibly reversing direction at each step with probability p = 0.1. Let
X0 = 0. The first step is equally likely to be positive or negative. A typical walk might look like
this:
(X0 , X1 , X2 , · · · ) = (0, −1, −2, −3, −4, −3, −2, −1, 0, 1, 2, 1, 0, · · · )
(a) (10 points) Find H(X1 , X2 , · · · Xn ) for any n.
Solution: Cover & Thomas, Problem 4.12. (Since X_0 = 0 deterministically, H(X_1, ..., X_n) = H(X_0, X_1, ..., X_n), so we compute the latter.)

By the chain rule,

H(X_0, X_1, ..., X_n) = ∑_{i=0}^{n} H(X_i | X^{i-1})
                      = H(X_0) + H(X_1 | X_0) + ∑_{i=2}^{n} H(X_i | X_{i-1}, X_{i-2}),

since, for i > 1, the next position depends only on the previous two positions (i.e., the dog's walk is second-order Markov if the dog's position is the state). Since X_0 = 0 deterministically, H(X_0) = 0, and since the first step is equally likely to be positive or negative, H(X_1 | X_0) = 1. Furthermore, for i > 1,

H(X_i | X_{i-1}, X_{i-2}) = H(0.1, 0.9).

Therefore,

H(X_0, X_1, ..., X_n) = 1 + (n - 1) H(0.1, 0.9).
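As an illustrative numerical check (not part of the original solution; the helper names are ours), the block entropy from part (a) and its per-symbol value can be evaluated directly, assuming base-2 logarithms:

```python
# Numerical check of parts (a)-(b): H(X_0,...,X_n) = 1 + (n-1) H(0.1, 0.9)
# and the per-symbol entropy -> H(0.1, 0.9). All logs are base 2 (bits).
from math import log2

def binary_entropy(p: float) -> float:
    """H(p, 1-p) in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def block_entropy(n: int, p: float = 0.1) -> float:
    """H(X_0, X_1, ..., X_n) from the chain-rule formula in part (a)."""
    return 1 + (n - 1) * binary_entropy(p)

if __name__ == "__main__":
    h = binary_entropy(0.1)
    print(f"H(0.1, 0.9) = {h:.4f} bits")           # about 0.4690
    for n in (10, 100, 1000):
        rate = block_entropy(n) / (n + 1)          # per-symbol entropy of (X_0,...,X_n)
        print(f"n = {n:5d}: H(X_0..X_n)/(n+1) = {rate:.4f}")
    # The per-symbol values approach H(0.1, 0.9), the entropy rate of part (b).
```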
(b) (10 points) Find the entropy rate of this browsing dog.

Solution: From (a),

H(X_0, X_1, ..., X_n) / (n + 1) = [1 + (n - 1) H(0.1, 0.9)] / (n + 1) → H(0.1, 0.9),

so the entropy rate is H(0.1, 0.9) bits per step.

(c) (5 points) What is the expected number of steps the dog takes before reversing direction?

Solution: The dog must take at least one step to establish the direction of travel from which it ultimately reverses. Letting S be the number of steps taken between reversals, we have

E(S) = ∑_{s=1}^{∞} s (0.9)^{s-1} (0.1) = 10.

Starting at time 0, the expected number of steps to the first reversal is therefore 11.
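As a further illustrative check of part (c) (our own sketch, not the exam's): the reversal time S is geometric with parameter 0.1, so the series sums to 10, and a simple simulation of the walk gives about 11 steps to the first reversal.

```python
# Numerical check of part (c): S ~ Geometric(0.1) on {1, 2, ...}, so E[S] = 10,
# and the expected number of steps until the first reversal is 1 + E[S] = 11.
import random

def expected_reversal_time_series(p: float = 0.1, terms: int = 10_000) -> float:
    """Truncated evaluation of E[S] = sum_{s>=1} s (1-p)^(s-1) p."""
    return sum(s * (1 - p) ** (s - 1) * p for s in range(1, terms + 1))

def simulate_first_reversal(p: float = 0.1, trials: int = 100_000) -> float:
    """Monte Carlo estimate of the number of steps until the dog first reverses."""
    total = 0
    for _ in range(trials):
        steps = 1                    # the first step fixes the direction of travel
        while random.random() >= p:  # keep walking while no reversal occurs
            steps += 1
        total += steps + 1           # +1 for the reversing step itself
    return total / trials

if __name__ == "__main__":
    print("series value of E[S]:", round(expected_reversal_time_series(), 4))         # ~10
    print("simulated steps to first reversal:", round(simulate_first_reversal(), 2))  # ~11
```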
Points earned:
out of a possible 25 points
2. AEP.
Let X_i be i.i.d. distributed according to p(x) for x ∈ {1, 2, ..., m}. Let µ = E[X], and H = -∑_x p(x) log p(x). Let A^n be the (weakly) typical set as defined in class. Let B^n = {x^n ∈ X^n : |(1/n) ∑_{i=1}^{n} x_i - µ| ≤ ε}.
(a) (4 points) Rigorously define the typical set A^n.

Solution: Cover & Thomas, Problem 3.4.

A^n = {x^n ∈ X^n : |-(1/n) log p(x^n) - H| ≤ ε}.
(b) (3 points) Does Pr{X^n ∈ A^n} → 1? If so, sketch a proof; if not, show why not.

Solution: Yes. By the AEP for discrete random variables, the probability that X^n is typical goes to 1.
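A hedged illustration of parts (a) and (b) (and a preview of part (c)); the pmf and function names below are our own example, not from the exam. It tests membership in A^n and B^n directly and estimates the probabilities by simulation:

```python
# Illustration of parts (a)-(b): membership test for the weakly typical set A^n
# and Monte Carlo estimates of Pr{X^n in A^n} and Pr{X^n in A^n ∩ B^n}.
import random
from math import log2

pmf = {1: 0.5, 2: 0.25, 3: 0.25}                     # example p(x) on {1, 2, 3}
H = -sum(p * log2(p) for p in pmf.values())          # entropy H in bits
mu = sum(x * p for x, p in pmf.items())              # mean, used for B^n

def in_A(xs, eps):
    """x^n in A^n  iff  | -(1/n) log p(x^n) - H | <= eps."""
    n = len(xs)
    neg_log_p = -sum(log2(pmf[x]) for x in xs)
    return abs(neg_log_p / n - H) <= eps

def in_B(xs, eps):
    """x^n in B^n  iff  | (1/n) sum x_i - mu | <= eps."""
    return abs(sum(xs) / len(xs) - mu) <= eps

def estimate(eps=0.1, n=500, trials=2000):
    symbols, weights = list(pmf), list(pmf.values())
    hits_A = hits_AB = 0
    for _ in range(trials):
        xs = random.choices(symbols, weights=weights, k=n)
        hits_A += in_A(xs, eps)
        hits_AB += in_A(xs, eps) and in_B(xs, eps)
    return hits_A / trials, hits_AB / trials

if __name__ == "__main__":
    pA, pAB = estimate()
    print(f"estimated Pr{{X^n in A^n}}        = {pA:.3f}")   # close to 1 for large n
    print(f"estimated Pr{{X^n in A^n ∩ B^n}}  = {pAB:.3f}")  # also close to 1 (part (c))
```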
(c) (6 points) Does Pr{X^n ∈ A^n ∩ B^n} → 1? If so, sketch a proof; if not, show why not.

Solution: Yes. By the Strong Law of Large Numbers, Pr(X^n ∈ B^n) → 1. So for any ε > 0 there exists N_1 such that Pr(X^n ∈ A^n) > 1 - ε/2 for all n > N_1, and there exists N_2 such that Pr(X^n ∈ B^n) > 1 - ε/2 for all n > N_2. So for all n > max(N_1, N_2),

Pr(X^n ∈ A^n ∩ B^n) = Pr(X^n ∈ A^n) + Pr(X^n ∈ B^n) - Pr(X^n ∈ A^n ∪ B^n)
                    > 1 - ε/2 + 1 - ε/2 - 1
                    = 1 - ε.

So for any ε > 0 there exists N = max(N_1, N_2) such that Pr(X^n ∈ A^n ∩ B^n) > 1 - ε for all n > N; therefore Pr(X^n ∈ A^n ∩ B^n) → 1.
(d) (6 points) Show that |A^n ∩ B^n| ≤ 2^{n(H+ε)} for all n.

Solution: By the law of total probability, ∑_{x^n ∈ A^n ∩ B^n} p(x^n) ≤ 1. Also, for x^n ∈ A^n, Theorem 3.1.2 in the text gives p(x^n) ≥ 2^{-n(H+ε)}. Combining these two facts gives

1 ≥ ∑_{x^n ∈ A^n ∩ B^n} p(x^n) ≥ ∑_{x^n ∈ A^n ∩ B^n} 2^{-n(H+ε)} = |A^n ∩ B^n| 2^{-n(H+ε)}.

Multiplying through by 2^{n(H+ε)} gives the result |A^n ∩ B^n| ≤ 2^{n(H+ε)}.
(e) (6 points) Show that |A^n ∩ B^n| ≥ (1/2) 2^{n(H-ε)} for n sufficiently large.

Solution: Since, from (c), Pr{X^n ∈ A^n ∩ B^n} → 1, there exists N such that Pr{X^n ∈ A^n ∩ B^n} ≥ 1/2 for all n > N. From Theorem 3.1.2 in the text, for x^n ∈ A^n, p(x^n) ≤ 2^{-n(H-ε)}. Combining these two facts gives, for all n > N,

1/2 ≤ ∑_{x^n ∈ A^n ∩ B^n} p(x^n) ≤ ∑_{x^n ∈ A^n ∩ B^n} 2^{-n(H-ε)} = |A^n ∩ B^n| 2^{-n(H-ε)}.

Multiplying through by 2^{n(H-ε)} gives the result |A^n ∩ B^n| ≥ (1/2) 2^{n(H-ε)} for n sufficiently large.
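As an illustrative brute-force check of the cardinality bounds in parts (d) and (e) (feasible only for small n; the pmf is an arbitrary example of ours):

```python
# Brute-force check of parts (d)-(e): enumerate all x^n for a small alphabet and n,
# count |A^n ∩ B^n|, and compare with 2^{n(H±eps)}. Exponential in n, so keep n small.
from itertools import product
from math import log2

pmf = {1: 0.7, 2: 0.2, 3: 0.1}                        # example distribution
H = -sum(p * log2(p) for p in pmf.values())
mu = sum(x * p for x, p in pmf.items())

def count_intersection(n, eps):
    count = 0
    for xs in product(pmf, repeat=n):                 # all m^n sequences
        neg_log_p = -sum(log2(pmf[x]) for x in xs)
        typical = abs(neg_log_p / n - H) <= eps       # x^n in A^n
        mean_ok = abs(sum(xs) / n - mu) <= eps        # x^n in B^n
        count += typical and mean_ok
    return count

if __name__ == "__main__":
    n, eps = 10, 0.2
    c = count_intersection(n, eps)
    upper = 2 ** (n * (H + eps))          # part (d) bound, holds for every n
    lower = 0.5 * 2 ** (n * (H - eps))    # part (e) bound, only for n large enough
    print(f"|A^n ∩ B^n| = {c},  upper bound = {upper:.1f},  lower bound = {lower:.1f}")
```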
Points earned:
out of a possible 25 points
3. Short answers.

(a) (6 points) A source produces a character x from the alphabet A = {0, 1, 2, ..., 9, a, b, c, ..., y, z}: with probability 1/3, x is a numeral {0, 1, 2, ..., 9}; with probability 1/3, x is a vowel {a, e, i, o, u}; and with probability 1/3 it is one of the 21 consonants. All numerals are equiprobable, and the same goes for vowels and consonants. Determine the entropy of X.
Solution: By the grouping rule, H(X) = log(3) + (1/3)(log(10) + log(5) + log(21)).
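A quick numerical check of this value (illustrative only; base-2 logarithms assumed):

```python
# Numerical check of part (a): H(X) = log2(3) + (1/3)(log2(10) + log2(5) + log2(21)).
from math import log2

# Direct evaluation of the closed form from the solution.
closed_form = log2(3) + (log2(10) + log2(5) + log2(21)) / 3

# Same value from the full distribution: each numeral has mass 1/30,
# each vowel 1/15, each consonant 1/63.
probs = [1/30] * 10 + [1/15] * 5 + [1/63] * 21
direct = -sum(p * log2(p) for p in probs)

print(round(closed_form, 4), round(direct, 4))   # both ≈ 4.9303 bits
```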
(b) (7 points) Three squares have average area 100 m². The average of the lengths of their sides is 10 m. What can be said about the size of the largest of the three squares? (HINT: Use Jensen's inequality.)
Solution: Let x be the length of the side of a square, and let the probability of x be (1/3, 1/3, 1/3) over the three lengths (l_1, l_2, l_3). Then the information we have is that E[x] = 10 and E[f(x)] = 100, where f(x) = x² is the function mapping lengths to areas. This is a strictly convex function. We notice that the equality E[f(x)] = f(E[x]) holds; therefore x is a constant, and the three lengths must all be equal. The area of the largest square is 100 m².
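A tiny numerical illustration of the Jensen argument (example side lengths are ours): any spread around the mean side length of 10 m pushes the mean area above 100 m², so the given averages force all three sides to be equal.

```python
# Illustration of part (b): for side lengths with mean 10, the mean area
# exceeds 100 unless all sides are equal (Jensen / variance argument).
def mean_area(sides):
    return sum(s * s for s in sides) / len(sides)

print(mean_area([10, 10, 10]))   # 100.0   -> consistent with the given averages
print(mean_area([8, 10, 12]))    # 102.67  -> mean side is still 10, but mean area > 100
print(mean_area([5, 10, 15]))    # 116.67  -> larger spread, larger excess
```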
(c) (6 points) State and rigorously prove the chain rule for entropy (for 2 random variables only) without using the definition of mutual information.

Solution: The chain rule states that H(X, Y) = H(X) + H(Y|X). Proof:

H(X, Y) = -∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x, y)
        = -∑_{x∈X} ∑_{y∈Y} p(x, y) log [p(x) p(y|x)]
        = -∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x) - ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(y|x)
        = -∑_{x∈X} p(x) log p(x) - ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(y|x)
        = H(X) + H(Y|X).
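A hedged numerical check of the chain rule on a randomly generated joint pmf (helper names are ours, not from the exam):

```python
# Numerical check of part (c): H(X, Y) = H(X) + H(Y|X) for a random joint pmf.
import random
from math import log2

def entropy(ps):
    return -sum(p * log2(p) for p in ps if p > 0)

# Random 3x4 joint distribution p(x, y).
random.seed(0)
raw = [[random.random() for _ in range(4)] for _ in range(3)]
total = sum(sum(row) for row in raw)
p = [[v / total for v in row] for row in raw]

H_XY = entropy([p[x][y] for x in range(3) for y in range(4)])
p_x = [sum(p[x]) for x in range(3)]
H_X = entropy(p_x)
# H(Y|X) = sum_x p(x) H(Y | X = x)
H_Y_given_X = sum(p_x[x] * entropy([p[x][y] / p_x[x] for y in range(4)]) for x in range(3))

print(round(H_XY, 10), round(H_X + H_Y_given_X, 10))   # the two numbers agree
```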
(d) (6 points) Provide an example of random variables X, Y, Z such that I(X; Y) < I(X; Y|Z).

Solution: Let X and Y be independent fair binary random variables and let Z = X + Y. In this case,

I(X; Y) = 0   and   I(X; Y | Z) = H(X | Z) = 1/2,

so I(X; Y) < I(X; Y | Z). Note that in this case X, Y, Z do not form a Markov chain.
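A quick check of this example by exhaustive computation over the four equally likely (x, y) pairs (function names are ours):

```python
# Check of part (d): X, Y independent fair bits, Z = X + Y.
# Then I(X; Y) = 0 while I(X; Y | Z) = 1/2 bit.
from collections import defaultdict
from math import log2

# Joint distribution over (x, y, z); each (x, y) pair has probability 1/4.
joint = defaultdict(float)
for x in (0, 1):
    for y in (0, 1):
        joint[(x, y, x + y)] += 0.25

def marginal(keep):
    m = defaultdict(float)
    for (x, y, z), p in joint.items():
        m[tuple(v for k, v in zip("xyz", (x, y, z)) if k in keep)] += p
    return m

def mutual_information():
    px, py, pxy = marginal("x"), marginal("y"), marginal("xy")
    return sum(p * log2(p / (px[(x,)] * py[(y,)])) for (x, y), p in pxy.items())

def conditional_mutual_information():
    pz, pxz, pyz = marginal("z"), marginal("xz"), marginal("yz")
    return sum(p * log2(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in joint.items())

print(mutual_information())              # 0.0
print(conditional_mutual_information())  # 0.5
```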
Points earned:
out of a possible 25 points
4. Coding theory.
(a) (10 points) Can (1, 2, 2) be the codeword lengths of a binary Huffman code? What about (2, 2, 3, 3)? Explain why or why not.

Solution: The codeword lengths of a binary Huffman code must satisfy the Kraft inequality with equality, i.e. ∑_i 2^{-l_i} = 1. An easy way to see this is the following: every node in the tree has a sibling (a property of an optimal binary code), and if we assign each node the "weight" 2^{-l_i}, then 2 × 2^{-l_i} is the weight of its parent node. Thus, collapsing the tree back to the root, we see that ∑_i 2^{-l_i} = 1. So for this problem all we need to do is check whether the given codeword lengths satisfy the Kraft inequality with equality. For (1, 2, 2): 1/2 + 1/4 + 1/4 = 1, so these can be the lengths of a Huffman code. For (2, 2, 3, 3): 1/4 + 1/4 + 1/8 + 1/8 = 3/4 < 1, so these cannot be Huffman codeword lengths.

(b) (5 points) Find the Shannon code of the distribution (0.5, 0.25, 0.125, 0.125).
Solution: Cover & Thomas, Problem 5.28. We build the following table:

Symbol | Probability | F_i in decimal | F_i in binary | l_i | Codeword
   1   |    0.5      |     0.0        |    0.0        |  1  |    0
   2   |    0.25     |     0.5        |    0.10       |  2  |    10
   3   |    0.125    |     0.75       |    0.110      |  3  |    110
   4   |    0.125    |     0.875      |    0.111      |  3  |    111

The Shannon code in this case achieves the entropy bound (1.75 bits) and is optimal.
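A sketch of the Shannon-code construction used in the table above, under the usual convention l_i = ⌈log2(1/p_i)⌉ with codeword i equal to the first l_i bits of the cumulative probability F_i (variable names are ours):

```python
# Shannon code for part (b): codeword i is F_i (the cumulative probability of the
# symbols before i) written in binary and truncated to l_i = ceil(log2(1/p_i)) bits.
from math import ceil, log2

def shannon_code(probs):
    codewords, F = [], 0.0
    for p in probs:
        length = ceil(log2(1 / p))
        # binary expansion of F to `length` bits
        bits, frac = "", F
        for _ in range(length):
            frac *= 2
            bit, frac = divmod(frac, 1)
            bits += str(int(bit))
        codewords.append(bits)
        F += p
    return codewords

print(shannon_code([0.5, 0.25, 0.125, 0.125]))   # ['0', '10', '110', '111']
```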
(c) (10 points) Find the codeword lengths of the optimal binary encoding of the distribution p = (1/100, 1/100, ..., 1/100). Note: just find the optimal codeword lengths (how many codewords of various lengths); no need to specify the actual code.
Solution: Cover & Thomas, Problem 5.44. Since the distribution is uniform, the Huffman tree will consist of word lengths ⌈log(100)⌉ = 7 and ⌊log(100)⌋ = 6. There are 64 nodes at depth 6, of which (64 - k) will be leaf nodes, and the remaining k nodes at depth 6 will form 2k leaf nodes at depth 7. Since the total number of leaf nodes is 100, we have

(64 - k) + 2k = 100  ⇒  k = 36.

So there are 64 - 36 = 28 codewords of length 6 and 2 × 36 = 72 codewords of length 7.
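A hedged cross-check of part (c) with a standard heap-based Huffman construction (our own helper, not part of the exam solution):

```python
# Check of part (c): Huffman codeword lengths for the uniform distribution on 100 symbols.
import heapq
from collections import Counter

def huffman_lengths(probs):
    """Return the codeword length assigned to each symbol by a Huffman code."""
    heap = [(p, [i]) for i, p in enumerate(probs)]   # (probability, symbols in subtree)
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for sym in s1 + s2:                          # merging adds one bit to every symbol below
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

lengths = huffman_lengths([1 / 100] * 100)
print(Counter(lengths))   # expected: 28 codewords of length 6 and 72 of length 7
```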
Points earned:
out of a possible 25 points