Chapter 8: Differential entropy
University of Illinois at Chicago ECE 534, Fall 2009, Natasha Devroye
Chapter 8 outline
• Motivation
• Definitions
• Relation to discrete entropy
• Joint and conditional differential entropy
• Relative entropy and mutual information
• Properties
• AEP for Continuous Random Variables
Motivation
• Our goal is to determine the capacity of an AWGN channel
[Figure: block diagram of a fading wireless channel — input X, channel gain h, additive Gaussian noise N ~ N(0, P_N), output Y = hX + N; sample waveforms of X and Y over time.]
Motivation
• Our goal is to determine the capacity of an AWGN channel
[Figure: the same fading channel — Y = hX + N, with Gaussian noise N ~ N(0, P_N).]

C = (1/2) log( (|h|² P + P_N) / P_N ) = (1/2) log(1 + SNR)   (bits/channel use)
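As a quick numeric illustration of this formula (a minimal sketch, not part of the original slides; the gain, power, and noise values below are arbitrary choices):

# Evaluate C = (1/2) log2(1 + SNR) for the fading AWGN channel above.
# h, P, and P_N are illustrative values, not numbers from the slides.
import math

h, P, P_N = 0.8, 10.0, 1.0                 # channel gain, transmit power, noise power
snr = abs(h) ** 2 * P / P_N
C = 0.5 * math.log2(1 + snr)
print(f"SNR = {snr:.2f}  ->  C = {C:.3f} bits/channel use")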
Motivation
• We need to define entropy of, and mutual information between, CONTINUOUS random variables
• Can you guess?
• Discrete X, p(x):
• Continuous X, f(x):
Definitions - densities
Properties - densities
Properties - densities
Properties - densities
...an interpretation of the differential entropy: It is the logarithm of the equivalent side length of the smallest set that contains most of the probability. Hence low entropy implies that the random variable is confined to a small effective volume and high entropy indicates that the random variable is widely dispersed.

Note. Just as the entropy is related to the volume of the typical set, there is a quantity called Fisher information which is related to the surface area of the typical set. We discuss Fisher information in more detail in Sections 11.10 and 17.8.

8.3 RELATION OF DIFFERENTIAL ENTROPY TO DISCRETE ENTROPY

Quantized random variables

Consider a random variable X with density f(x) illustrated in Figure 8.1. Suppose that we divide the range of X into bins of length Δ. Let us assume that the density is continuous within the bins. Then, by the mean value theorem, there exists a value x_i within each bin such that

f(x_i) Δ = ∫_{iΔ}^{(i+1)Δ} f(x) dx.   (8.23)

Consider the quantized random variable X^Δ, which is defined by

X^Δ = x_i   if iΔ ≤ X < (i + 1)Δ.   (8.24)

FIGURE 8.1. Quantization of a continuous random variable (density f(x), bin width Δ).
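As a numerical sanity check on this construction (a minimal sketch, not from the slides): the entropy of the quantized variable X^Δ satisfies H(X^Δ) + log₂ Δ ≈ h(X), illustrated here for a standard Gaussian, for which h(X) = ½ log₂(2πe) ≈ 2.05 bits; the bin range and widths are arbitrary choices.

import numpy as np
from math import erf, sqrt

def Phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

h_true = 0.5 * np.log2(2 * np.pi * np.e)                      # differential entropy of N(0,1), in bits
for delta in (0.5, 0.1, 0.01):
    edges = np.arange(-10.0, 10.0, delta)                     # bins covering essentially all the probability
    p = np.array([Phi(a + delta) - Phi(a) for a in edges])    # bin probabilities, i.e. f(x_i) * Δ
    p = p[p > 0]
    H_quant = -np.sum(p * np.log2(p))                         # discrete entropy of X^Δ
    print(f"Δ = {delta}:  H(X^Δ) + log2 Δ = {H_quant + np.log2(delta):.4f}   (h(X) = {h_true:.4f})")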
Differential entropy - definition
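For reference (this matches (8.81) in the chapter summary at the end of these slides), the definition written out:

h(X) = h(f) = −∫_S f(x) log f(x) dx,

where S is the support set of the density f; h depends only on the density, not on the particular realization of X.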
Examples
[Figure: density f(x) supported on the interval [a, b] (uniform example).]
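A worked version of the standard uniform example (assuming the figure shows X uniform on [a, b]):

h(X) = −∫_a^b (1/(b − a)) log(1/(b − a)) dx = log(b − a).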
Examples
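The other standard example, the Gaussian (consistent with (8.85) in the summary): for X ~ N(0, σ²),

h(X) = −E[log f(X)] = ½ log(2πσ²) + (E[X²]/(2σ²)) log e = ½ log(2πeσ²).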
Differential entropy - the good the bad and the ugly
Differential entropy - the good the bad and the ugly
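One concrete instance of the "bad" behaviour (exactly what these two slides cover is not in the extracted text, but the fact itself is standard): for X uniform on [0, ½],

h(X) = log(½) = −1 bit,

so differential entropy, unlike discrete entropy, can be negative, and it is not invariant under a change of variables (see h(aX) = h(X) + log|a| later).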
Differential entropy - multiple RVs
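For reference, the joint and conditional definitions and the chain rule (the chain rule is (8.88) in the summary):

h(X_1, . . . , X_n) = −∫ f(x_1, . . . , x_n) log f(x_1, . . . , x_n) dx_1 · · · dx_n,
h(X|Y) = −∫ f(x, y) log f(x|y) dx dy,
h(X_1, X_2, . . . , X_n) = ∑_{i=1}^n h(X_i | X_1, . . . , X_{i−1}).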
Differential entropy of a multi-variate Gaussian
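A minimal numerical sketch of the result this slide derives, h(N_n(μ, K)) = ½ log((2πe)^n |K|), consistent with (8.86) in the summary; the 2×2 covariance matrix below is an arbitrary illustrative choice, not a value from the slides.

import numpy as np

def gaussian_entropy_bits(K):
    # h(N_n(mu, K)) = 0.5 * log2((2*pi*e)^n * det(K)); independent of the mean mu
    K = np.atleast_2d(np.asarray(K, dtype=float))
    n = K.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))

K = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(gaussian_entropy_bits(K))              # joint entropy h(X1, X2)
print(2 * gaussian_entropy_bits([[1.0]]))    # sum of marginal entropies: larger, since correlation lowers h(X1, X2)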
Parallels with discrete entropy....

(Proof of the inequality Pr(X = X′) ≥ 2^{−H(p)−D(p‖r)} for independent X ∼ p(x) and X′ ∼ r(x), from the end of Chapter 2.)

Proof: We have
2^{−H(p)−D(p‖r)} = 2^{∑ p(x) log p(x) + ∑ p(x) log(r(x)/p(x))}   (2.151)
= 2^{∑ p(x) log r(x)}   (2.152)
≤ ∑ p(x) 2^{log r(x)}   (2.153)
= ∑ p(x) r(x)   (2.154)
= Pr(X = X′),   (2.155)
where the inequality follows from Jensen's inequality and the convexity of the function f(y) = 2^y.

The following telegraphic summary omits qualifying conditions.

SUMMARY

Definition. The entropy H(X) of a discrete random variable X is defined by
H(X) = −∑_{x∈X} p(x) log p(x).   (2.156)

Properties of H
1. H(X) ≥ 0.
2. H_b(X) = (log_b a) H_a(X).
3. (Conditioning reduces entropy) For any two random variables X and Y, we have
H(X|Y) ≤ H(X)   (2.157)
with equality if and only if X and Y are independent.
4. H(X_1, X_2, . . . , X_n) ≤ ∑_{i=1}^n H(X_i), with equality if and only if the X_i are independent.
5. H(X) ≤ log |X|, with equality if and only if X is distributed uniformly over X.
6. H(p) is concave in p.
Parallels with discrete entropy....

Definition. The relative entropy D(p‖q) of the probability mass function p with respect to the probability mass function q is defined by
D(p‖q) = ∑_x p(x) log ( p(x)/q(x) ).   (2.158)

Definition. The mutual information between two random variables X and Y is defined as
I(X;Y) = ∑_{x∈X} ∑_{y∈Y} p(x, y) log ( p(x, y) / (p(x)p(y)) ).   (2.159)

Alternative expressions
H(X) = E_p log (1/p(X)),   (2.160)
H(X, Y) = E_p log (1/p(X, Y)),   (2.161)
H(X|Y) = E_p log (1/p(X|Y)),   (2.162)
I(X;Y) = E_p log ( p(X, Y) / (p(X)p(Y)) ),   (2.163)
D(p‖q) = E_p log ( p(X)/q(X) ).   (2.164)

Properties of D and I
1. I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y).
2. D(p‖q) ≥ 0 with equality if and only if p(x) = q(x) for all x ∈ X.
3. I(X;Y) = D(p(x, y)‖p(x)p(y)) ≥ 0, with equality if and only if p(x, y) = p(x)p(y) (i.e., X and Y are independent).
4. If |X| = m and u is the uniform distribution over X, then D(p‖u) = log m − H(p).
5. D(p‖q) is convex in the pair (p, q).

Chain rules
Entropy: H(X_1, X_2, . . . , X_n) = ∑_{i=1}^n H(X_i | X_{i−1}, . . . , X_1).
Mutual information: I(X_1, X_2, . . . , X_n; Y) = ∑_{i=1}^n I(X_i; Y | X_1, X_2, . . . , X_{i−1}).
Relative entropy: D(p(x, y)‖q(x, y)) = D(p(x)‖q(x)) + D(p(y|x)‖q(y|x)).

Jensen's inequality. If f is a convex function, then Ef(X) ≥ f(EX).

Log sum inequality. For n positive numbers a_1, a_2, . . . , a_n and b_1, b_2, . . . , b_n,
∑_{i=1}^n a_i log (a_i/b_i) ≥ ( ∑_{i=1}^n a_i ) log ( ∑_{i=1}^n a_i / ∑_{i=1}^n b_i )   (2.165)
with equality if and only if a_i/b_i = constant.

Data-processing inequality. If X → Y → Z forms a Markov chain, then I(X;Y) ≥ I(X;Z).

Sufficient statistic. T(X) is sufficient relative to {f_θ(x)} if and only if I(θ; X) = I(θ; T(X)) for all distributions on θ.

Fano's inequality. Let P_e = Pr{X̂(Y) ≠ X}. Then
H(P_e) + P_e log |X| ≥ H(X|Y).   (2.166)

Inequality. If X and X′ are independent and identically distributed, then
Pr(X = X′) ≥ 2^{−H(X)}.   (2.167)
Relative entropy and mutual information
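The slide body is not in the extracted text; the continuous definitions themselves, matching (8.87) and (8.91) in the chapter summary, are:

D(f‖g) = ∫ f(x) log ( f(x)/g(x) ) dx,   I(X;Y) = ∫ f(x, y) log ( f(x, y) / (f(x) f(y)) ) dx dy.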
Properties
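Again the slide body is missing from the extraction; for reference, the key properties it presumably covers, as they appear later in the chapter summary ((8.87), (8.89), (8.91)):

I(X;Y) = D(f(x, y)‖f(x)f(y)) ≥ 0, with equality iff X and Y are independent;   h(X|Y) ≤ h(X);   I(X;Y) = h(X) − h(X|Y) = h(Y) − h(Y|X).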
ASIDE: A general definition of mutual information
Definition. The mutual information between two random variables X and Y is given by
I(X;Y) = sup_{P,Q} I([X]_P; [Y]_Q),   (8.54)
where the supremum is over all finite partitions P and Q.
This is the master definition of mutual information that always applies, even to joint distributions with atoms, densities, and singular parts. Moreover, by continuing to refine the partitions P and Q, one finds a monotonically increasing sequence I([X]_P; [Y]_Q) ↑ I.
By arguments similar to (8.52), we can show that this definition of mutual information is equivalent to (8.47) for random variables that have a density. For discrete random variables, this definition is equivalent to the usual (Chapter 2) definition of mutual information.
A quick example
• Find the mutual information between the correlated Gaussian random variables with correlation coefficient ρ
• What is I(X;Y)?
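A worked answer (assuming X and Y are zero-mean jointly Gaussian, each with variance σ², and correlation coefficient ρ):

h(X) = h(Y) = ½ log(2πeσ²),   h(X, Y) = ½ log( (2πe)² σ⁴ (1 − ρ²) ),
I(X;Y) = h(X) + h(Y) − h(X, Y) = −½ log(1 − ρ²),

which is 0 for ρ = 0 (independence) and grows without bound as |ρ| → 1.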
More properties of differential entropy
More properties of differential entropy
Examples of changes in variables
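A minimal worked instance of the translation and scaling rules h(X + c) = h(X) and h(aX) = h(X) + log|a| ((8.90) in the summary): if X is uniform on [0, 1], then h(X) = 0; Y = aX with a > 0 is uniform on [0, a], so h(Y) = log a = h(X) + log|a|. For a random vector X and an invertible matrix A, h(AX) = h(X) + log|det A|.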
Concavity and convexity
• Same as in the discrete entropy and mutual information....
Maximum entropy distributions
• For a discrete random variable taking on K values, what distribution maximizes the entropy?
• Can you think of a continuous counter-part?
[Look ahead to Ch.12, pg. 409-412]
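The answers, as a hedged sketch (the discrete case is from Chapter 2; the continuous case matches (8.92) in the summary and the Chapter 12 treatment): for a discrete variable on K values, the uniform distribution maximizes entropy, giving H(X) = log K. For a continuous variable with a covariance constraint E[XX^t] = K, the zero-mean Gaussian N(0, K) maximizes differential entropy, giving max h(X) = ½ log((2πe)^n |K|). With only a support constraint [a, b] the maximizer is the uniform density; with only a mean constraint on [0, ∞) it is the exponential density.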
Maximum entropy distributions
[Look ahead to Ch.12, pg. 409-412]
Maximum entropy examples
Maximum entropy examples
[Look ahead to Ch.12, pg. 409-412]
Estimation error and differential entropy
• A counterpart to Fano's inequality for discrete RVs...
Why can't we use Fano's?

(Recall Section 2.10:) ...calculate a function g(Y) = X̂, where X̂ is an estimate of X and takes on values in X̂. We will not restrict the alphabet X̂ to be equal to X, and we will also allow the function g(Y) to be random. We wish to bound the probability that X̂ ≠ X. We observe that X → Y → X̂ forms a Markov chain. Define the probability of error
P_e = Pr{X̂ ≠ X}.   (2.129)

Theorem 2.10.1 (Fano's Inequality). For any estimator X̂ such that X → Y → X̂, with P_e = Pr(X ≠ X̂), we have
H(P_e) + P_e log |X| ≥ H(X|X̂) ≥ H(X|Y).   (2.130)
This inequality can be weakened to
1 + P_e log |X| ≥ H(X|Y)   (2.131)
or
P_e ≥ ( H(X|Y) − 1 ) / log |X|.   (2.132)

Remark. Note from (2.130) that P_e = 0 implies that H(X|Y) = 0, as intuition suggests.

Proof: We first ignore the role of Y and prove the first inequality in (2.130). We will then use the data-processing inequality to prove the more traditional form of Fano's inequality, given by the second inequality in (2.130). Define an error random variable
E = 1 if X̂ ≠ X;   E = 0 if X̂ = X.   (2.133)
Then, using the chain rule for entropies to expand H(E, X|X̂) in two different ways, we have
H(E, X|X̂) = H(X|X̂) + H(E|X, X̂)   (2.134)
[excerpt ends here]
Estimation error and differential entropy
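The extracted slide body is missing here; the counterpart result itself, which also appears (unnumbered) in the chapter summary below, is:

E(X − X̂(Y))² ≥ (1/(2πe)) e^{2h(X|Y)}

for any estimator X̂(Y), with equality when X given Y is Gaussian and X̂(Y) is its conditional mean. So instead of Fano's bound through the alphabet size, a continuous X gets a lower bound on mean-squared estimation error in terms of the conditional differential entropy h(X|Y).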
(Background, from Chapter 3:) ...probability distribution. Here it turns out that p(X_1, X_2, . . . , X_n) is close to 2^{−nH} with high probability.
We summarize this by saying, "Almost all events are almost equally surprising." This is a way of saying that
Pr{ (X_1, X_2, . . . , X_n) : p(X_1, X_2, . . . , X_n) = 2^{−n(H±ε)} } ≈ 1   (3.1)
if X_1, X_2, . . . , X_n are i.i.d. ∼ p(x).
In the example just given, where p(X_1, X_2, . . . , X_n) = p^{∑ X_i} q^{n−∑ X_i}, we are simply saying that the number of 1's in the sequence is close to np (with high probability), and all such sequences have (roughly) the same probability 2^{−nH(p)}. We use the idea of convergence in probability, defined as follows:

Definition (Convergence of random variables). Given a sequence of random variables X_1, X_2, . . ., we say that the sequence X_1, X_2, . . . converges to a random variable X:
1. In probability if for every ε > 0, Pr{|X_n − X| > ε} → 0
2. In mean square if E(X_n − X)² → 0
3. With probability 1 (also called almost surely) if Pr{lim_{n→∞} X_n = X} = 1
The AEP for continuous RVs
• The AEP for discrete RVs said.....

The asymptotic equipartition property is formalized in the following theorem.
Theorem 3.1.1 (AEP). If X_1, X_2, . . . are i.i.d. ∼ p(x), then
−(1/n) log p(X_1, X_2, . . . , X_n) → H(X)   in probability.   (3.2)
Proof: Functions of independent random variables are also independent random variables. Thus, since the X_i are i.i.d., so are log p(X_i). Hence, by the weak law of large numbers,
−(1/n) log p(X_1, X_2, . . . , X_n) = −(1/n) ∑_i log p(X_i)   (3.3)
→ −E log p(X)   in probability   (3.4)
= H(X),   (3.5)
which proves the theorem. □

• The AEP for continuous RVs says.....
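A minimal Monte Carlo sketch of the continuous statement (not from the slides): for X_i i.i.d. N(0, σ²), the normalized log-density −(1/n) log₂ f(X_1, . . . , X_n) should settle near h(X) = ½ log₂(2πeσ²); σ and the block lengths n are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
h_true = 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)          # differential entropy of N(0, sigma^2), in bits

def log2_density(x, sigma):
    # log2 of the N(0, sigma^2) density evaluated at x
    return -0.5 * np.log2(2 * np.pi * sigma ** 2) - (x ** 2) / (2 * sigma ** 2) * np.log2(np.e)

for n in (10, 1_000, 100_000):
    x = rng.normal(0.0, sigma, size=n)
    estimate = -np.mean(log2_density(x, sigma))                # -(1/n) log2 f(X_1, ..., X_n)
    print(f"n = {n}:  estimate = {estimate:.4f}   (h(X) = {h_true:.4f})")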
Typical sets
• One of the points of the AEP is to define typical sets.
• Typical set for discrete RVs...
• Typical set of continuous RVs....
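For reference, the continuous-RV version (consistent with (8.82)–(8.83) in the summary), with volume playing the role of cardinality:

A_ε^(n) = { (x_1, . . . , x_n) ∈ S^n : | −(1/n) log f(x_1, . . . , x_n) − h(f) | ≤ ε },   Vol(A_ε^(n)) ≈ 2^{n h(f)}.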
Typical sets and volumes
Summary

SUMMARY

h(X) = h(f) = −∫_S f(x) log f(x) dx   (8.81)
f(X^n) ≐ 2^{−nh(X)}   (8.82)
Vol(A_ε^(n)) ≐ 2^{nh(X)}   (8.83)
H([X]_{2^{−n}}) ≈ h(X) + n   (8.84)
h(N(0, σ²)) = ½ log 2πeσ²   (8.85)
h(N_n(μ, K)) = ½ log (2πe)^n |K|   (8.86)
D(f‖g) = ∫ f log (f/g) ≥ 0   (8.87)
h(X_1, X_2, . . . , X_n) = ∑_{i=1}^n h(X_i | X_1, X_2, . . . , X_{i−1})   (8.88)
h(X|Y) ≤ h(X)   (8.89)
h(aX) = h(X) + log |a|   (8.90)
I(X;Y) = ∫ f(x, y) log ( f(x, y) / (f(x)f(y)) ) ≥ 0   (8.91)
max_{E XX^t = K} h(X) = ½ log (2πe)^n |K|   (8.92)
E(X − X̂(Y))² ≥ (1/(2πe)) e^{2h(X|Y)}

2^{nH(X)} is the effective alphabet size for a discrete random variable.
2^{nh(X)} is the effective support set size for a continuous random variable.
2^C is the effective alphabet size of a channel of capacity C.

PROBLEMS
8.1 Differential entropy. Evaluate the differential entropy h(X) = −∫ f ln f for the following:
(a) The exponential density, f(x) = λe^{−λx}, x ≥ 0.