
*This research was supported in part by the Air Force Office of Scientific
Research under Grant No. AFOSR-75-2796.
BOUNDS AND THE EVALUATION OF
RATE DISTORTION FUNCTIONS*
Hoi Ming Leung
Department of Statistics
University of North Carolina
Chapel Hill
Institute of Statistics Mimeo Series #1057
February 1976
ABSTRACT
LEUNG, HOI MING
Bounds and The Evaluation of Rate
Distortion Functions. (Under the direction of STAMATIS
CAMBANIS.)
This work is concerned with calculating (Chapters III
and IV) and bounding (Chapter V) rate distortion functions
and with some of their properties (Chapter II). A review of
the basic notions and results in the rate distortion theory
is included in Chapter I.
In Chapter II, we consider a memoryless Gaussian vector
source and assume that the variances of the components of its
i.i.d. vectors are known and that no further information on
their covariance structure is available. We study the dependence of the rate distortion function of such a source on the
covariance structure of its i.i.d. vectors. In particular
we give upper and lower bounds of the rate distortion function
with respect to the square-error fidelity criterion, both expressed only in terms of the known variances, and we show
that the lower bound is tighter than the Shannon lower bound
over a certain distortion region.
Spherically invariant distributions, which are mixtures
of Gaussian distributions, are considered in Chapter III. For
spherically invariant random vectors and sequences we find
the Shannon lower bound of their rate distortion functions,
and we show that if the mixtures do not include Gaussian
distributions with small variances then the lower bound is
tight over a certain range of distortions. Under this condition some simple upper bounds are also obtained which are
valid over only a certain range of distortions.
Chapter IV gives the rate distortion function with respect to the magnitude-error criterion of a certain class of
continuous densities, including continuous concave densities
with finite support, and extending a recent result of Tan and
Yao by substantially weakening the conditions on the densities.
As a by-product of Chapter IV, we obtain in Chapter V
a family of lower bounds of rate distortion functions which
cannot be evaluated by the method presented in Chapter IV,
and we compare them with the Shannon lower bounds. Further
bounds are obtained by using the Vasershtein distance of
distributions.
ACKNOWLEDGEMENTS
I am indebted to my advisor Professor S. Cambanis for
his constant encouragement and helpful suggestions throughout the entire research. He has always shown great enthusiasm
and devotion in exploring various ideas. It was those in-depth
discussions that made this dissertation a reality.
I am grateful to Professor C.R. Baker for introducing
me to the area of statistical communications and also for his
personal concern for me during my years in Chapel Hill.
I would also like to express my appreciation to Professors
N.L. Johnson, M.R. Leadbetter, I.M. Chakravarti and E.J.
Wegman for their advice on various academic and personal
matters.
Thanks are due to the Department of Statistics for providing financial support for the many years of my graduate
study.
Last but not least, I wish to acknowledge my indebtedness to my wife, who provided me the time, love and understanding that made all this possible.
CONTENTS

Acknowledgements
I. INTRODUCTION TO RATE DISTORTION FUNCTIONS
II. THE RATE DISTORTION FUNCTIONS WITH RESPECT TO THE SQUARE-ERROR CRITERION OF A MEMORYLESS GAUSSIAN VECTOR SOURCE WHOSE COMPONENTS HAVE FIXED VARIANCES
III. THE RATE DISTORTION FUNCTIONS WITH RESPECT TO THE SQUARE-ERROR CRITERION OF SPHERICALLY INVARIANT SOURCES
IV. RATE DISTORTION FUNCTIONS OF CERTAIN MEMORYLESS SOURCES WITH RESPECT TO THE MAGNITUDE-ERROR CRITERION
V. BOUNDS TO RATE DISTORTION FUNCTIONS WITH RESPECT TO THE MAGNITUDE-ERROR CRITERION
APPENDIX
BIBLIOGRAPHY
I. INTRODUCTION TO RATE DISTORTION FUNCTIONS
Rate distortion theory is the branch of information
theory devoted to situations
in
which the entropy of the
source exceeds the capacity of the channel. The name derives
from C.E. Shannon's rate distortion function of an information source with respect to a fidelity criterion. Rate distortion theory provides a mathematical basis for data compression.
The problem of information transmission can be divided
into two categories [Shannon (1959)]:
1. What information should be transmitted?
2. How should it be transmitted?
The first problem pertains to the source coding theory, whereas the second problem pertains to the channel coding theory.
Historically, the second problem was treated and developed
long before the first problem was even considered seriously.
The original work on problem 1 was done by Shannon in his
1959 article "Coding Theorems for a Discrete Source with a
Fidelity Criterion". Since then, the rate distortion theory
has grown steadily both in theory and application.
Unless otherwise stated, all results that are described
here can be found in [Berger(1971)].
§ 1.1 RATE DISTORTION FUNCTION OF A RANDOM VARIABLE
For simplicity, we define the rate distortion function
only for real random variables with probability density functions; the definition for discrete as well as for general
random variables will be similar.
Let X and Y be real random variables. Let \rho(x,y) be a measurable mapping from R^1 \times R^1 to R^1 such that \rho(x,y) \ge 0; \rho is called a distortion measure. Let p(x) be the probability density function of X and q(y|x) the conditional probability density of Y given X. The mutual information between X and Y is

I(X,Y) = \int_{R^1} \int_{R^1} p(x) q(y|x) \ln \frac{q(y|x)}{g(y)} \, dx \, dy

where

g(y) = \int_{R^1} p(x) q(y|x) \, dx

is the probability density of Y.
In the following, we will keep X, and hence also p(x),
fixed and consider all possible pairs of random variables X
and Y, hence all possible conditional probability density
functions q of Y given X. In order to emphasize the dependence on q, we will denote I(X,Y) by I(q). Let
d(q) = E\rho(X,Y) = \int_{R^1} \int_{R^1} p(x) q(y|x) \rho(x,y) \, dx \, dy.

Then the Rate Distortion Function of X with respect to the distortion measure \rho is defined by

R(D) = \inf_{q \in Q_D} I(q)   for all D > 0,

where Q_D = \{q : d(q) \le D\}.
There is an alternative and perhaps more natural way of defining this curve in the (D,R) plane. That is, we let R be the variable, and define D(R), the Distortion Rate Function, rather than the rate distortion function, by

D(R) = \inf_{q \in Q_R} d(q)   for all R > 0,

where Q_R = \{q : I(q) \le R\}.

§ 1.2 PROPERTIES OF R(D)
The following analytical expression of R(D) is useful.
THEOREM 1.2.1  Let \Lambda_s be the set of all non-negative functions \lambda_s satisfying

c_s(y) = \int_{-\infty}^{\infty} \lambda_s(x) p(x) e^{s\rho(x,y)} \, dx \le 1   for all y.   (1.1)

Then

R(D) = \sup_{s \le 0,\ \lambda_s \in \Lambda_s} \left( sD + \int_{-\infty}^{\infty} p(x) \ln \lambda_s(x) \, dx \right).   (1.2)

For each s \le 0, a necessary and sufficient condition for \lambda_s to realize the supremum in (1.2) is the existence of a probability distribution G_s which is related to \lambda_s by

[\lambda_s(x)]^{-1} = \int_{-\infty}^{\infty} e^{s\rho(x,y)} \, dG_s(y)

and is such that c_s(y) = 1 a.e. [dG_s]. Moreover, for such \lambda_s and G_s, the rate distortion function R(D) is given parametrically in s by

R(D_s) = sD_s + \int_{-\infty}^{\infty} p(x) \ln \lambda_s(x) \, dx

D_s = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \lambda_s(x) p(x) \rho(x,y) e^{s\rho(x,y)} \, dx \, dG_s(y),   s \le 0.
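The parametric form in Theorem 1.2.1 is also the basis of numerical procedures for computing R(D). The following sketch, which is not part of the original text, illustrates one such procedure, the alternating-minimization algorithm of Blahut (1972), on a finite discretization of a source; the grid, the Gaussian source and the value of s are illustrative assumptions, and each run produces one point (D_s, R(D_s)) of the curve.

```python
import numpy as np

def blahut_rate_distortion(p, rho, s, n_iter=500):
    """One point (D_s, R(D_s)) of the rate distortion curve of a discrete
    source, by Blahut's alternating minimization (rates in nats).

    p   : source probabilities on a finite grid (positive, sums to 1)
    rho : distortion matrix, rho[i, j] = rho(x_i, y_j)
    s   : slope parameter, s < 0
    """
    q = np.full(rho.shape[1], 1.0 / rho.shape[1])   # output marginal
    A = np.exp(s * rho)                             # e^{s rho(x,y)}
    for _ in range(n_iter):
        q = q * ((p / (A @ q)) @ A)                 # Blahut update of q(y)
    Q = A * q / (A @ q)[:, None]                    # conditional q(y|x)
    joint = p[:, None] * Q
    D = float(np.sum(joint * rho))
    R = float(np.sum(joint * np.log(Q / q[None, :])))
    return D, R

# Illustrative example: discretized N(0,1) source, square-error distortion.
x = np.linspace(-4.0, 4.0, 81)
p = np.exp(-x**2 / 2); p /= p.sum()
rho = (x[:, None] - x[None, :])**2
D, R = blahut_rate_distortion(p, rho, s=-2.0)
print(D, R, max(0.0, 0.5 * np.log(1.0 / D)))  # close to the Gaussian closed form
```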
There is of course an analogous expression of R(D) for
discrete random variables.
The following theorem summarizes the basic properties
of rate distortion functions.
THEOREM 1.2.2  R(D) is a convex, non-negative, monotonically decreasing function, strictly positive on (0, D_{max}) and R(D) = 0 for D \ge D_{max}, where D_{max} is the smallest value of D such that R(D) = 0. The derivative R'(D) exists and is continuous in the open interval 0 < D < D_{max} and tends to -\infty as D approaches 0. A discontinuity in R'(D) can occur only at D = D_{max}.
The most commonly used distortion measures \rho(x,y) depend only on the difference between x and y. Typical examples are the square-error \rho(x,y) = (x-y)^2 and the magnitude-error \rho(x,y) = |x-y|. Whenever \rho(x,y) = \rho(x-y), we say that \rho(\cdot) is a difference distortion measure.
In general the calculation of R(D) for a given random variable is very difficult; it is thus useful to derive lower and upper bounds of R(D) and, if possible, conditions under which these bounds are tight.
In the important case of a difference distortion measure, Theorem 1.2.1 can be specialized to permit the calculation of an interesting lower bound to R(D), known as the Shannon Lower Bound. The Shannon lower bound is given by the following parametric expressions:

R_{SL}(D_s) = h(p) - h(g_s),   s < 0

D_s = \int_{-\infty}^{\infty} \rho(x) g_s(x) \, dx

where for each s \le 0, g_s is the probability density function

g_s(x) = e^{s\rho(x)} \Big/ \int_{-\infty}^{\infty} e^{s\rho(z)} \, dz

and R'(D_s) = s. Here h(p) denotes the entropy of a random variable with the indicated probability density function. We always have R_{SL}(D) \le R(D) for all D > 0, and necessary and sufficient conditions for equality are given in the following theorem.
THEOREM 1.2.3  Given any s \le 0, R_{SL}(D_s) = R(D_s) if and only if there is a probability density function q such that the probability density function p of X is given by

p(x) = \int_{-\infty}^{\infty} q(y) g_s(x-y) \, dy.

Again, D_s is such that R'(D_s) = s.
Using the Shannon lower bound technique, the rate distortion function of a Gaussian random variable with zero mean and variance \sigma^2 with respect to the square-error criterion is found to be

R(D) = \max\left(0, \frac{1}{2} \ln \frac{\sigma^2}{D}\right),   D > 0.
§ 1.3 RATE DISTORTION FUNCTION OF A RANDOM VECTOR
Now let X = (X_1,...,X_n) be an n-dimensional random vector. Let also \rho be a distortion measure and define the single-letter distortion measure \rho_n on R^n \times R^n by

\rho_n(x,y) = \frac{1}{n} \sum_{k=1}^{n} \rho(x_k, y_k)

for all x = (x_1,...,x_n), y = (y_1,...,y_n) \in R^n. The rate distortion function of X is defined by

R^{(n)}(D) = \frac{1}{n} \inf_{q \in Q_D} I(q)   for all D > 0

where Y is an n-dimensional random vector with conditional probability density function q(y|x) given X,

I(q) = I(X,Y) = \int_{R^n} \int_{R^n} p(x) q(y|x) \ln \frac{q(y|x)}{g(y)} \, dx \, dy,

g(y) = \int_{R^n} p(x) q(y|x) \, dx

and

Q_D = \{ q(y|x) : d(q) = E\rho_n(X,Y) = \int_{R^n} \int_{R^n} p(x) q(y|x) \rho_n(x,y) \, dx \, dy \le D \}.
For a vector of independent random variables we have the following:

THEOREM 1.3.1  Let X = (X_1,...,X_n) be such that the random variables X_1,...,X_n are independent, and denote by R_k the rate distortion function of each X_k, k = 1,...,n. Then the rate distortion function R^{(n)} of X is given by the parametric expression

R^{(n)}(D_s) = \frac{1}{n} \sum_{k=1}^{n} R_k(D_s^{(k)}),   D_s = \frac{1}{n} \sum_{k=1}^{n} D_s^{(k)},   s < 0,

where D_s is the value of D such that R'(D) = s, and similarly for D_s^{(k)}.
For an n-dimensional Gaussian vector we have the following:

THEOREM 1.3.2  The rate distortion function with respect to the square-error distortion measure of an n-dimensional Gaussian vector with mean zero has the following parametric expression:

R^{(n)}(D_\theta) = \frac{1}{n} \sum_{k=1}^{n} \max\left(0, \frac{1}{2} \ln \frac{\lambda_k}{\theta}\right)

D_\theta = \frac{1}{n} \sum_{k=1}^{n} \min(\theta, \lambda_k),   \theta > 0,

where \{\lambda_k\}_{k=1}^{n} are the eigenvalues of \Sigma, the covariance matrix of X.
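The following brief numerical sketch (not in the original) evaluates this parametric expression; sweeping \theta traces the (D,R) curve directly from the eigenvalues of the covariance matrix. The specific covariance matrix is an illustrative assumption.

```python
import numpy as np

def gaussian_vector_rd(cov, thetas):
    """(D_theta, R_theta) pairs of Theorem 1.3.2, in nats per component."""
    lam = np.linalg.eigvalsh(cov)            # eigenvalues of the covariance
    n = len(lam)
    return [(np.minimum(th, lam).sum() / n,
             np.maximum(0.0, 0.5 * np.log(lam / th)).sum() / n)
            for th in thetas]

cov = np.array([[2.0, 0.8, 0.0],
                [0.8, 1.0, 0.3],
                [0.0, 0.3, 0.5]])            # an illustrative covariance matrix
for D, R in gaussian_vector_rd(cov, thetas=(0.05, 0.2, 0.5)):
    print(f"D = {D:.3f}, R = {R:.3f} nats")
```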
§ 1.4 RATE DISTORTION FUNCTION OF A SEQUENCE OF RANDOM VARIABLES
Let \{X_k\}_{k=1}^{\infty} be a sequence of random variables. Let also \rho be a distortion measure and for each n = 1,2,... define \rho_n : R^n \times R^n \to [0,\infty) by

\rho_n(x,y) = \frac{1}{n} \sum_{k=1}^{n} \rho(x_k, y_k)

for all x = (x_1,...,x_n), y = (y_1,...,y_n) \in R^n. The family F_\rho = \{\rho_n : n = 1,2,...\} is called the single-letter fidelity criterion generated by \rho. The rate distortion function R(D) of the sequence \{X_k\}_{k=1}^{\infty} is defined by

R(D) = \liminf_{n \to \infty} R^{(n)}(D)

where for each n = 1,2,..., R^{(n)}(D) is the rate distortion function of X^{(n)} = (X_1,...,X_n). If the sequence \{X_k\}_{k=1}^{\infty} is (strictly) stationary, then \lim_{n\to\infty} R^{(n)}(D) always exists and thus for a stationary sequence R(D) = \lim_{n\to\infty} R^{(n)}(D). For stationary Gaussian sequences, we have the following expression for R(D).
THEOREM 1.4.1  Let \{X_n, n = 0, \pm 1, ...\} be a stationary Gaussian sequence with mean zero and spectral density function

f(\lambda) = \sum_{k=-\infty}^{\infty} \phi_k e^{-jk\lambda}

where \phi_k = E X_0 X_k. Then the rate distortion function of \{X_n\} with respect to the square-error fidelity criterion has the parametric representation

R(D_\theta) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \max\left(0, \ln \frac{f(\lambda)}{\theta}\right) d\lambda

D_\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \min(\theta, f(\lambda)) \, d\lambda,   \theta > 0.

The non-zero portion of the R(D) curve is generated for \theta in the interval 0 < \theta < ess.sup f(\lambda) (\le +\infty).
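As a concrete illustration (not in the original), the parametric pair of Theorem 1.4.1 can be evaluated numerically for any given spectral density; the Gauss-Markov spectrum below, with \phi_k = a^{|k|}, is an assumed example.

```python
import numpy as np

def stationary_gaussian_rd(f, theta, n_grid=20001):
    """Numerical evaluation of the parametric pair of Theorem 1.4.1 (nats)."""
    lam = np.linspace(-np.pi, np.pi, n_grid)
    fv = f(lam)
    R = np.trapz(np.maximum(0.0, np.log(fv / theta)), lam) / (4 * np.pi)
    D = np.trapz(np.minimum(theta, fv), lam) / (2 * np.pi)
    return D, R

# Spectral density of the sequence with phi_k = a^|k| (0 < a < 1).
a = 0.7
f = lambda lam: (1 - a**2) / (1 - 2 * a * np.cos(lam) + a**2)
for theta in (0.05, 0.2, 0.6):
    D, R = stationary_gaussian_rd(f, theta)
    print(f"theta = {theta}: D = {D:.3f}, R = {R:.3f}")
```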
An interesting and useful relationship between the distortion rate functions of two stationary ergodic sequences has been developed recently [Gray et al. (1975)]. Let \mu and \lambda be the distributions of the stationary ergodic sequences X = \{X_n\}_{n=1}^{\infty} and Y = \{Y_n\}_{n=1}^{\infty} respectively, i.e., the probability measures induced on R^{\infty} by X and Y respectively. The \bar{\rho} distance between \mu and \lambda, which is a generalization of the \bar{d} distance [Ornstein (1973)], is defined by

\bar{\rho}(\mu,\lambda) = \sup_n \bar{\rho}_n(\mu,\lambda)

\bar{\rho}_n(\mu,\lambda) = \inf \frac{1}{n} \sum_{i=1}^{n} E|X_i - Y_i|

where the infimum is taken over all possible joint distributions of (X_1,...,X_n) and (Y_1,...,Y_n) with those fixed marginals determined by \mu and \lambda respectively [Gray et al. (1975)].
THEOREM 1.4.2 [Gray et al. (1975)]  Let D_\mu(R) and D_\lambda(R) be the distortion rate functions of X and Y with respect to the single-letter fidelity criterion \rho(x,y) = |x-y|. Then for all R > 0,

|D_\mu(R) - D_\lambda(R)| \le \bar{\rho}(\mu,\lambda).
Notice that when the stationary sequences X and Y are i.i.d. with distribution functions F and G respectively, then \bar{\rho}(\mu,\lambda) = \bar{\rho}_1(\mu,\lambda) = \bar{\rho}(F,G), and the latter has been evaluated by Vallender (1973):

\bar{\rho}(F,G) = \int_{-\infty}^{\infty} |F(x) - G(x)| \, dx.

\bar{\rho}(\mu,\lambda) has not yet been evaluated for any non-i.i.d. stationary sequences X and Y. For Gaussian (non-i.i.d.) stationary sequences X and Y, upper and lower bounds are obtained in [Gray et al. (1975)]. Expressions for \bar{\rho}_n(\mu,\lambda) are not available either (for upper and lower bounds see Section 3 of Chapter V).
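Vallender's formula is straightforward to evaluate numerically; the sketch below (not in the original) integrates |F - G| for two assumed Gaussian marginals, where a pure mean shift makes the answer easy to check.

```python
import numpy as np
from math import erf, sqrt

def rho_bar(F, G, lo=-20.0, hi=20.0, n=200001):
    """Vallender (1973): rho_bar(F,G) = integral of |F(x) - G(x)| dx."""
    x = np.linspace(lo, hi, n)
    return float(np.trapz(np.abs(F(x) - G(x)), x))

Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
F = lambda x: Phi(x)          # distribution function of N(0,1)
G = lambda x: Phi(x - 1.0)    # distribution function of N(1,1)
print(rho_bar(F, G))          # a pure mean shift gives exactly the shift: 1.0
```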
§ 1.5 RATE DISTORTION FUNCTION OF A STOCHASTIC PROCESS
Let X = \{X(t,\omega), -\infty < t < \infty\} be a real stochastic process defined on the probability space (\Omega, \mathcal{F}, P). For each T > 0, let X_T = \{X(t,\omega), -T \le t \le T\}, let R^T be the set of all real functions on [-T,T] and \mathcal{U}_T the \sigma-field generated by the cylinder sets of R^T. Denote by \mu_{X_T} the probability measure induced by the measurable transformation \omega \to X(\cdot,\omega) from (\Omega,\mathcal{F},P) to (R^T,\mathcal{U}_T), i.e., the distribution of X_T. If Y is another stochastic process defined on the same probability space, the distribution \mu_{Y_T} of Y_T is defined similarly, and the joint distribution \mu_{X_T,Y_T} of X_T and Y_T is the probability measure induced by the measurable transformation \omega \to (X(\cdot,\omega), Y(\cdot,\omega)) from (\Omega,\mathcal{F},P) to (R^T \times R^T, \mathcal{U}_T \times \mathcal{U}_T). Define the mutual information between X_T and Y_T as follows: I(X_T,Y_T) = \infty if \mu_{X_T,Y_T} is not absolutely continuous with respect to \mu_{X_T} \times \mu_{Y_T}, and

I(X_T,Y_T) = \int_{R^T \times R^T} \frac{d\mu_{X_T,Y_T}}{d(\mu_{X_T} \times \mu_{Y_T})} \ln \frac{d\mu_{X_T,Y_T}}{d(\mu_{X_T} \times \mu_{Y_T})} \, d(\mu_{X_T} \times \mu_{Y_T})

if \mu_{X_T,Y_T} \ll \mu_{X_T} \times \mu_{Y_T}. Let

Q_D = \{ \mu_{X_T,Y_T} : \mu_{X_T} fixed, \int \rho_T(u,v) \, d\mu_{X_T,Y_T}(u,v) \le D \}

where D > 0 and \rho_T is a measurable map from R^T \times R^T to [0,\infty). If \rho : R^1 \times R^1 \to [0,\infty) is a distortion measure and

\rho_T(u,v) = \frac{1}{2T} \int_{-T}^{T} \rho(u(t), v(t)) \, dt,

then F_\rho = \{\rho_T, T > 0\} is called a single-letter fidelity criterion; if \rho(x,y) = (x-y)^2 (respectively |x-y|) then F_\rho is called a square-error (respectively magnitude-error) fidelity criterion. Now define the rate distortion function of the stochastic process X = \{X(t), -\infty < t < \infty\} with respect to the fidelity criterion F_\rho by

R(D) = \liminf_{T\to\infty} R_T(D)   for all D > 0

where

R_T(D) = \frac{1}{2T} \inf_{\mu_{X_T,Y_T} \in Q_D} I(X_T,Y_T).

For a stationary process X, the limit always exists and R(D) = \lim_{T\to\infty} R_T(D).
The rate distortion functions have been calculated for the following stochastic processes.

Let X = \{X(t,\omega), -\infty < t < \infty\} be a Gaussian process with zero mean and continuous covariance R(t,s). Let R_T be the integral-type operator on L^2[-T,T] with kernel R, and \lambda_k(T), \phi_k(t) its eigenvalues and eigenfunctions respectively. Assuming it exists, the rate distortion function of X with respect to the square-error fidelity criterion has the following parametric expression:

R(D_\theta) = \lim_{T\to\infty} \frac{1}{2T} \sum_{k=1}^{\infty} \max\left(0, \frac{1}{2} \ln \frac{\lambda_k(T)}{\theta}\right)

D_\theta = \lim_{T\to\infty} \frac{1}{2T} \sum_{k=1}^{\infty} \min(\theta, \lambda_k(T)),   \theta > 0,

where

0 < D < D_{max} = \lim_{T\to\infty} \frac{1}{2T} \sum_{k=1}^{\infty} \lambda_k(T) = \lim_{T\to\infty} \frac{1}{2T} E\left[ \int_{-T}^{T} X^2(t) \, dt \right] = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} R(t,t) \, dt.

When the Gaussian process X is stationary and has spectral density f(\lambda), its rate distortion function with respect to the square-error fidelity criterion exists and has the following form:

R(D_\theta) = \frac{1}{4\pi} \int_{-\infty}^{\infty} \max\left(0, \ln \frac{f(\lambda)}{\theta}\right) d\lambda

D_\theta = \frac{1}{2\pi} \int_{-\infty}^{\infty} \min(\theta, f(\lambda)) \, d\lambda,   \theta > 0.
If X is the Wiener process, i.e., R(t,s) = \sigma^2 \min(t,s), its rate distortion function with respect to the square-error fidelity criterion is given by [Berger (1970)]

R(D) = \frac{2\sigma^2}{\pi^2 D},   0 < D < \infty.
For a Poisson process with rate \mu, the rate distortion function with respect to the magnitude-error fidelity criterion is given by [Rubin (1974)]

R(D) = \lim_{T\to\infty} R_T(D)

where

R_T(D) = \mu \left[ R^{(U)}\left(\frac{D}{T}\right) - \ln \frac{\mu T}{e} \right]

and R^{(U)}(D) is the rate distortion function with respect to the magnitude-error fidelity criterion of a sequence of i.i.d. random variables each uniformly distributed over [0,1].
§ 1.6 RELATION OF RATE DISTORTION THEORY TO SOURCE CODING AND INFORMATION TRANSMISSION
The importance of the rate distortion function R(D) resides in certain source coding theorems which establish that R(D) specifies the minimum rate at which one must receive information about the source output in order to be able to reproduce it with an average distortion of D or less. Also, if the channel capacity C is such that C \ge R(D), then it is possible to recover the source with fidelity of D or less, while if R(D) > C, then it is impossible to recover the source with fidelity D. We will describe briefly the main results without going into some details, like the ergodic theory, etc.
Source Coding Theorems
A source of information is typically a sequence of
random variables (or random vectors) or a continuous-parameter stochastic process; the source is called discrete-time in the first case and continuous-time in the second
sequence. If the sequence is stationary, ergodic, Gaussian
etc., then the source is called stationary, ergodic,Gaussian
etc. respectively. If the sequence consists of i.i.d. random
variables then the source is called memoryless. The purpose
of source coding is to make the source suitable to transmit
through the channel.
We will from now on consider discrete-time sources X = \{X_n\}_{n=1}^{\infty} whose random variables take values in the space A_0 equipped with the \sigma-field of subsets a_0. We call A_0 the source alphabet. We remark that R(D), which was defined when A_0 is the real line, can be similarly defined for general spaces (A_0, a_0). Let A^{(n)} = \prod_1^n A_0 and A = \prod_1^{\infty} A_0. A word of length n is an element of A^{(n)}.

Let \hat{A}_0 be a reproducing alphabet and (\hat{A}_0, \hat{a}_0) a measurable space which is determined by the channel to be used for transmission. Similarly define \hat{A}^{(n)} = \prod_1^n \hat{A}_0. A and \hat{A} are called the source space and channel space respectively.
A code on A is a map from A into \hat{A}. A code of size k and length n is a map from A^{(n)} to B \subset \hat{A}^{(n)}, where B = \{y_1^{(n)},...,y_k^{(n)}\}. Each y_i^{(n)} is called a code word of length n. Let \rho_n be a map from A^{(n)} \times \hat{A}^{(n)} to [0,\infty) such that for each fixed y \in \hat{A}^{(n)}, \rho_n(\cdot,y) is a measurable map from A^{(n)} to [0,\infty). \rho_n(x,y) expresses the cost or loss of coding x by y. Each source word x \in A^{(n)} is mapped into that y \in B which minimizes \rho_n(x,y). The minimum is denoted by \rho_n(x|B) = \min_{y \in B} \rho_n(x,y). The average distortion of the code determined by B is defined by \rho_n(B) = E\rho_n(X^{(n)}|B), where X^{(n)} = (X_1,...,X_n), and its rate by \bar{R} = n^{-1} \ln k. If \rho_n(B) \le D, then we say that the code determined by B is D-admissible. The most general source coding theorem for a discrete-time, stationary, ergodic source is as follows:
THEOREM 1.6.1 (Source Coding Theorem)  Let X = \{X_n\}_{n=1}^{\infty} be a discrete-time, stationary, ergodic source having rate distortion function R(D) with respect to the single-letter fidelity criterion F_\rho, and assume that there exists a y^* \in \hat{A}_0 such that E\rho_1(X_1,y^*) < \infty. Then, for any \epsilon > 0 and any D \ge 0 such that R(D) < \infty, there exists a (D+\epsilon)-admissible code with rate less than R(D)+\epsilon. Also there is no D-admissible code with rate less than R(D).
Source coding theorems on discrete-time, stationary sources without the ergodic assumption were given by Gray and Davisson (1974). The solution was given in terms of a weighted distortion rate function rather than the conventional rate distortion or distortion rate function.

Source coding theorems similar to Theorem 1.6.1 have been proved for discrete memoryless sources and for the following continuous-time sources:
(1) stationary Gaussian processes [Gallager (1968)]
(2) the Wiener process [Berger (1970)]
(3) the Poisson process [Rubin (1974)].
Information Transmission Theorems

For simplicity, we consider only a memoryless channel and its capacity. Other types of channels are described in [Gallager (1968)]. A memoryless channel with input alphabet (A_0, a_0) and output alphabet (\hat{A}_0, \hat{a}_0) is defined as follows. If the input to the channel is the sequence of random variables X = \{X_n\}_{n=1}^{\infty}, with each random variable taking values in A_0, then its output is the sequence of random variables Y = \{Y_n\}_{n=1}^{\infty}, with each random variable taking values in \hat{A}_0, which is determined by its (regular) conditional distribution given X = x, which is of the form

Q(E,x) = \prod_{k=1}^{\infty} Q_k(E_k, x_k)

where E = \prod_{k=1}^{\infty} E_k \in \prod_1^{\infty} \hat{a}_0, x = \{x_k\}_{k=1}^{\infty} \in \prod_1^{\infty} A_0 and each Q_k is a (regular) conditional distribution on (\hat{A}_0, \hat{a}_0) (and hence the conditional distribution of Y_k given X_k = x_k). The capacity of the memoryless channel is defined by

C = \sup I(X,Y)

where the supremum is taken over all distributions of X.
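The supremum defining the capacity can also be computed numerically for a finite-alphabet memoryless channel. The sketch below (not in the original) applies the Blahut-Arimoto iteration to an assumed binary symmetric channel; it assumes a strictly positive transition matrix.

```python
import numpy as np

def channel_capacity(W, n_iter=1000):
    """Blahut-Arimoto iteration for C = sup I(X,Y) of a discrete memoryless
    channel with strictly positive transition matrix W[x, y] = Q(y | x)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])      # input distribution
    for _ in range(n_iter):
        d = np.sum(W * np.log(W / (p @ W)[None, :]), axis=1)  # D(W(.|x) || q)
        p = p * np.exp(d)
        p /= p.sum()
    d = np.sum(W * np.log(W / (p @ W)[None, :]), axis=1)
    return float(p @ d)                            # capacity in nats

eps = 0.1                                          # illustrative BSC
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
print(channel_capacity(W))                         # ln 2 - H(eps) ~ 0.368 nats
```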
We wish to know what is the maximum rate of transmission allowed within a given fidelity. The answer to this question is given by the following theorem, in which the rate distortion function plays an important role.

THEOREM 1.6.2 (Information Transmission Theorem)  Let \epsilon > 0 and D > 0 be given. Let R(D) be the rate distortion function of a discrete-time, stationary, ergodic source with respect to the single-letter fidelity criterion F_\rho generated by a bounded distortion measure \rho. Then the source output can be reproduced with fidelity D by the output of any channel with capacity C > R(D). Conversely, fidelity D is unattainable over sufficiently long time intervals at the output of any channel with capacity C < R(D).
A similar theorem holds for discrete memoryless
sources.
II.
THE RATE DISTORTION FUNCTIONS WITH RESPECT TO THE
SQUARE-ERROR CRITERION OF A MEMORYLESS GAUSSIAN
VECTOR SOURCE WHOSE COMPONENTS HAVE FIXED VARIANCES
§ 2.1 INTRODUCTION
We consider a Gaussian n-dimensional vector X whose
components have zero means and given variances, and we study
the dependence of its rate distortion function (with respect
to the square-error criterion) on its covariance structure.
Theorems 2.2.1 and 2.3.1 give upper and lower bounds of the rate distortion function respectively. These bounds are defined only in terms of the variances of the components of X, and the lower bound is tighter than the Shannon lower bound over a certain region. It is shown in Theorem 2.4.1 that the space between the lower and upper bounds is filled up by the rate distortion functions of such X's under all possible covariance structures. Finally, Theorem 2.5.1 shows that the upper bound of Theorem 2.2.1 remains true even if X is not Gaussian, and this is true with respect to any single-letter fidelity criterion.
These results apply to the rate distortion function of
a memoryless Gaussian vector source whose components have
zero means and fixed variances, since this is equal to the
rate distortion function of a Gaussian vector whose distribution is that of the i.i.d. vectors of the source.
All properties of rate distortion functions used here
can be found in Berger (1971). A square-error fidelity criterion is used throughout § 2.2 to §2.4 without any further
reminder.
§ 2.2 THE UPPER BOUND
We begin with a property given in Lemma 2.2.1 in its
general form and in its Corollary for the special case of
Gaussian vectors.
LEMMA 2.2.1  Let R_1(D) and R_2(D) be the rate distortion functions of the random variables X_1 and X_2 respectively, and let D_{max}^{(i)} be the least D such that R_i(D) = 0, i = 1,2. If D_{max}^{(1)} = D_{max}^{(2)} = D_{max}, then

R_1(D) \le R_2(D)   for all 0 < D

if and only if

R_1'(D) \ge R_2'(D)   for all 0 < D < D_{max},

with equality in the first relationship if and only if equality holds in the second relationship.
Proof:  Sufficiency. Suppose R_1'(D) \ge R_2'(D) for all 0 < D < D_{max}. Since R'(D) is a continuous function on (0, D_{max}), and the left derivative at D_{max} is finite (and \le 0), R'(D) is integrable and thus absolutely continuous over each interval [D_0, D_{max}] where 0 < D_0 < D_{max}. Hence R_1'(D) \ge R_2'(D) implies

\int_{D_0}^{D_{max}} R_1'(D) \, dD \ge \int_{D_0}^{D_{max}} R_2'(D) \, dD.

Since R_1(D_{max}) = R_2(D_{max}) = 0 by assumption, we have R_1(D') \le R_2(D') for all 0 < D' < D_{max}. If R_1'(D) = R_2'(D), 0 < D, equality holds throughout each step.

Necessity. Suppose R_1(D) \le R_2(D), 0 < D. Then R_1(D_{max}) - R_1(D_0) \ge R_2(D_{max}) - R_2(D_0) and as in the proof of sufficiency

\int_{D_0}^{D_{max}} [R_1'(D) - R_2'(D)] \, dD \ge 0.

Since this holds for all 0 < D_0 < D_{max}, and since R_1', R_2' are continuous on (0, D_{max}), it follows that for all 0 < D < D_{max}, R_1'(D) - R_2'(D) \ge 0, i.e., R_1'(D) \ge R_2'(D). Clearly, R_1'(D) = R_2'(D) if R_1(D) = R_2(D).
Lemma 2.2.1 holds also for n-dimensional random vectors
whose rate distortion functions have equal Dmax points, since
its proof uses only the basic properties of a rate distortion
function and does not involve the dimensionality.
The parametric expression of R_X(D) for an n-dimensional zero mean Gaussian random vector X has the following form:

R_X(\theta) = \frac{1}{n} \sum_{k=1}^{n} \max\left(0, \frac{1}{2} \ln \frac{\lambda_k^{(X)}}{\theta}\right)

D_X(\theta) = \frac{1}{n} \sum_{k=1}^{n} \min(\theta, \lambda_k^{(X)}),   \theta > 0,

where \lambda_k^{(X)} are the eigenvalues of the covariance matrix of X. Note that the function D_X(\theta) is one-to-one for 0 < \theta < \max_{1\le k\le n} \lambda_k^{(X)}; its inverse function is denoted by \theta_X(D) for 0 < D < \frac{1}{n} \sum_{k=1}^{n} \lambda_k^{(X)}. Clearly D_{max}^{(X)} = \frac{1}{n} \sum_{k=1}^{n} \lambda_k^{(X)}.
COROLLARY TO LEMMA 2.2.1  Let X, Y be two Gaussian n-dimensional random vectors whose components have zero means and the same set of variances \sigma_1^2,...,\sigma_n^2. Then

R_X(D) \le R_Y(D)   for all 0 < D

if and only if

\theta_X(D) \ge \theta_Y(D)   for all 0 < D < D_{max} = \frac{1}{n} \sum_{k=1}^{n} \sigma_k^2,

with equality in the first relationship if and only if equality holds in the second relationship.

Proof:  From the parametric expression of R_X we have [Berger (1971), p.111]

R_X'(D) = -\frac{1}{2\theta_X(D)}   for all 0 < D < D_{max}^{(X)},

and also D_{max}^{(X)} = D_{max}^{(Y)} = \frac{1}{n} \sum_{k=1}^{n} \sigma_k^2 since the components of X and Y have the same set of variances. Thus \theta_X \ge \theta_Y if and only if R_X'(D) \ge R_Y'(D). Now the result follows from Lemma 2.2.1.
We will make use of the following result in the form of its corollary.

LEMMA 2.2.2 [Fan (1949)]  Let the eigenvalues \lambda_i of a symmetric positive definite N \times N matrix A be so arranged that \lambda_1 \le \lambda_2 \le ... \le \lambda_N. For any positive integer q, 1 \le q \le N,

\sum_{j=1}^{q} \lambda_j = \min \sum_{j=1}^{q} x_j^T A x_j

where the minimum is over all sets \{x_j\}_{j=1}^{q} of orthonormal vectors in R^N.
COROLLARY TO LEMMA 2.2.2  If \sigma_i^2 are the diagonal elements of A, then for all 1 \le q \le N,

\sum_{i=1}^{q} \lambda_i \le \sum_{i=1}^{q} \sigma_i^2.

Proof:  Take x_j = (0,...,0,1,0,...,0), i.e., 1 in the j-th place and zero elsewhere. Then x_j^T A x_j = \sigma_j^2 and the result follows immediately from Lemma 2.2.2.
We now show that among all zero mean Gaussian vectors
X whose components have the same set of variances, the one
with independent components has the highest rate distortion
function and is thus hardest to transmit (i.e., it requires
the highest channel capacity). This provides an upper bound
for RX which is expressed only in terms of the variances of
its components.
THEOREM 2.2.1  Let X and Y be two n-dimensional Gaussian vectors whose components have zero means and the same set of variances, and suppose that the components of Y are independent. Then

R_X(D) \le R_Y(D)   for all 0 < D,

and equality holds if and only if the components of X are also independent.

Proof:  Let

D_X(\theta) = \frac{1}{n} \sum_{k=1}^{n} \min(\theta, \lambda_k)   and   D_Y(\theta) = \frac{1}{n} \sum_{k=1}^{n} \min(\theta, \sigma_k^2),   \theta > 0,

together with R_X(\theta) = \frac{1}{n} \sum_{k=1}^{n} \max(0, \frac{1}{2} \ln \frac{\lambda_k}{\theta}) and the corresponding expression for R_Y, determine parametrically the rate distortion functions of X and Y respectively, where \{\lambda_k\}_{k=1}^{n} are the eigenvalues and \{\sigma_k^2\}_{k=1}^{n} the diagonal elements of the covariance matrix of X. By the Corollary to Lemma 2.2.1, it suffices to show that \theta_X(D) \ge \theta_Y(D) for all 0 < D < D_{max} = \frac{1}{n} \sum_{k=1}^{n} \sigma_k^2.
Arrange the eigenvalues \lambda_k and the variances \sigma_k^2 in increasing order, i.e., 0 \le \lambda_1 \le \lambda_2 \le ... \le \lambda_n and 0 \le \sigma_1^2 \le \sigma_2^2 \le ... \le \sigma_n^2, and fix D \in (0, D_{max}). Assuming \sigma_j^2 < \theta_Y(D) \le \sigma_{j+1}^2 and \lambda_p < \theta_X(D) \le \lambda_{p+1}, we have

nD = \sigma_1^2 + \sigma_2^2 + ... + \sigma_j^2 + (n-j)\theta_Y(D) = \lambda_1 + \lambda_2 + ... + \lambda_p + (n-p)\theta_X(D).

Case (i): Suppose 0 \le p \le j \le n, and define \sigma_0^2 = \lambda_0 = 0 for convenience. By the Corollary to Lemma 2.2.2,

\sum_{k=1}^{j} \lambda_k \le \sum_{k=1}^{j} \sigma_k^2.

Also \lambda_k \ge \theta_X(D) for k = p+1,...,j. Thus

\sum_{k=1}^{j} \lambda_k + (n-j)\theta_Y(D) \le \sum_{k=1}^{j} \sigma_k^2 + (n-j)\theta_Y(D) = nD = \sum_{k=1}^{p} \lambda_k + (n-p)\theta_X(D) \le \sum_{k=1}^{j} \lambda_k + (n-j)\theta_X(D).

This implies \theta_Y(D) \le \theta_X(D). Equality holds if and only if \sum_{k=1}^{j} \lambda_k = \sum_{k=1}^{j} \sigma_k^2, 1 \le j \le n, i.e., if and only if \lambda_k = \sigma_k^2, 1 \le k \le n.

Case (ii): Suppose 0 \le j < p \le n. Then similarly we have

\sum_{k=1}^{j} \sigma_k^2 + (n-j)\theta_Y(D) = nD = \sum_{k=1}^{p} \lambda_k + (n-p)\theta_X(D) \le \sum_{k=1}^{j} \lambda_k + (n-j)\theta_X(D) \le \sum_{k=1}^{j} \sigma_k^2 + (n-j)\theta_X(D).

Hence \theta_Y(D) \le \theta_X(D) and equality holds if and only if \sum_{k=1}^{j} \sigma_k^2 = \sum_{k=1}^{j} \lambda_k, 1 \le j \le n, i.e., if and only if \lambda_k = \sigma_k^2, 1 \le k \le n.

We have thus shown that R_X(D) \le R_Y(D) for all 0 < D and that equality holds if and only if \lambda_k = \sigma_k^2, k = 1,...,n. We now complete the proof of the theorem by showing that \lambda_k = \sigma_k^2, k = 1,...,n, if and only if the components of X are independent, i.e., if and only if its covariance matrix \Sigma is diagonal. The "if" part of the statement is obvious. For the "only if" part we assume that (\lambda_1,...,\lambda_n) = (\sigma_1^2,...,\sigma_n^2), the equality being between sets of real numbers in their original order (and not rearranged in increasing order). Let \Sigma = U D U^T be the Jordan canonical form of \Sigma, where U is a unitary matrix and D is the diagonal matrix diag(\lambda_1,...,\lambda_n). Denote the k-th row of U by e_k^T = (u_{k1},...,u_{kn}). Then we have

\sigma_k^2 = \sum_{j=1}^{n} \lambda_j u_{kj}^2.

On the other hand, by assumption, each \sigma_k^2 is equal to one of the \lambda_i's, i.e.,

\sigma_k^2 = \lambda_{i(k)}

where i maps (1,...,n) onto itself and is such that i(k) \ne i(j) for k \ne j. It then follows from \sigma_k^2 = \sum_{j=1}^{n} \lambda_j u_{kj}^2 that each vector e_k has all components 0 except for the i(k)-th component, which equals 1 or -1. Now if \Sigma = \{a_{kj}\}, it follows from \Sigma = U D U^T that

a_{kj} = \sum_{m=1}^{n} \lambda_m u_{km} u_{jm} = \lambda_k if k = j, and a_{kj} = 0 if k \ne j,

and thus \Sigma is the diagonal matrix diag(\lambda_1,...,\lambda_n).
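A quick numerical check of Theorem 2.2.1 (not part of the original text): for an assumed covariance matrix, the rate computed from its eigenvalues never exceeds the rate computed from its diagonal alone, i.e., from the variances of an independent-component Y.

```python
import numpy as np

def rd_point(vals, theta):
    """Parametric (D, R) from a set of eigenvalues or variances (nats)."""
    vals = np.asarray(vals, float)
    return (np.minimum(theta, vals).mean(),
            np.maximum(0.0, 0.5 * np.log(vals / theta)).mean())

def rate_at(vals, D, tol=1e-10):
    """Invert D(theta) by bisection and return R(D)."""
    lo, hi = 1e-12, float(np.max(vals))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if rd_point(vals, mid)[0] < D else (lo, mid)
    return rd_point(vals, 0.5 * (lo + hi))[1]

cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.5, 0.4],
                [0.2, 0.4, 0.8]])          # an illustrative covariance for X
lam = np.linalg.eigvalsh(cov)              # spectrum of X
sig2 = np.diag(cov)                        # variances: spectrum of independent Y
for D in (0.1, 0.3, 0.6):
    print(f"D = {D}: R_X = {rate_at(lam, D):.4f} <= R_Y = {rate_at(sig2, D):.4f}")
```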
As a consequence of Theorem 2.2.1 we have the following ordering of the rate distortion functions for different covariance structures.

COROLLARY TO THEOREM 2.2.1  Let X and Y be two n-dimensional zero mean Gaussian vectors and let \{\lambda_i^{(X)}\}_{i=1}^{n} and \{\lambda_i^{(Y)}\}_{i=1}^{n} be the eigenvalues of their covariance matrices respectively, arranged in increasing order. Then R_X(D) \le R_Y(D) for all 0 < D if and only if

\sum_{i=1}^{p} \lambda_i^{(X)} \le \sum_{i=1}^{p} \lambda_i^{(Y)},   1 \le p \le n.

Equality holds if and only if \lambda_i^{(X)} = \lambda_i^{(Y)} for each i = 1,...,n.

Proof:  The "if" part follows from Theorem 2.2.1 by replacing \sigma_i^2 by \lambda_i^{(Y)}. The "only if" part follows by a similar argument as in the proof of Theorem 2.2.1.
§ 2.3 THE LOWER BOUND
Among all zero mean Gaussian vectors X whose components have the same set of variances, the easiest to transmit, and thus the one with the least rate distortion function, should be one whose components are completely correlated. This is shown in the following theorem, which thus provides a lower bound for R_X. This lower bound turns out to compare favorably with the Shannon lower bound.
THEOREM 2.3.1  If X is an n-dimensional Gaussian vector whose components have zero means and variances \sigma_1^2,...,\sigma_n^2, then

R_X(D) \ge \frac{1}{2n} \ln \frac{\sum_{i=1}^{n} \sigma_i^2}{nD}   for all 0 < D \le \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2.

Proof:  Let Y = (Y_1,...,Y_n) be a degenerate Gaussian vector whose components have zero means and E Y_i Y_j = \sigma_i \sigma_j for all i,j = 1,2,...,n, and let \Sigma be its covariance matrix. It is well known that Rank(\Sigma) = 1. Thus \Sigma has a zero eigenvalue of multiplicity n-1 and one positive eigenvalue of multiplicity 1. Since \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \sigma_i^2, where \lambda_i are the eigenvalues of \Sigma, it follows that the simple non-zero eigenvalue is \sum_{i=1}^{n} \sigma_i^2. Let \Sigma = U D U^{-1} be the Jordan canonical form of \Sigma, where U is a unitary matrix and D is the diagonal n \times n matrix diag(0,...,0,\sum_{i=1}^{n} \sigma_i^2). If Z = U^{-1} Y, then Z = (Z_1,...,Z_n) is Gaussian with independent components, where Z_1, Z_2,...,Z_{n-1} are degenerate with zero variances, i.e., Z_i = 0 a.s., 1 \le i \le n-1, and Z_n is Gaussian with mean zero and variance \sum_{i=1}^{n} \sigma_i^2. Since H(Z_i) = -p(0) \ln p(0) = -1 \ln 1 = 0 and 0 \le R_{Z_i}(D) \le H(Z_i), it follows that R_{Z_i}(D) = 0 for i = 1,2,...,n-1. Since the components of Z are independent we have (see p.57 of Berger (1971))

R_Z(D_s) = \frac{1}{n} \sum_{i=1}^{n} R_{Z_i}(D_s^i),   D_s = \frac{1}{n} \sum_{i=1}^{n} D_s^i,

and since R_{Z_i} \equiv 0 for i = 1,...,n-1 we obtain D_s = \frac{1}{n} D_s^n and R_Z(D_s) = \frac{1}{n} R_{Z_n}(D_s^n) = \frac{1}{n} R_{Z_n}(nD_s). Hence

R_Z(D) = \frac{1}{n} R_{Z_n}(nD) = \frac{1}{2n} \ln \frac{\sum_{i=1}^{n} \sigma_i^2}{nD}   for 0 < D \le \frac{1}{n} D_{max}^{(Z_n)} = \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2,

since D_{max}^{(Z_i)} = 0 for i = 1,...,n-1 and D_{max}^{(Z_n)} = \sum_{i=1}^{n} \sigma_i^2. It now follows from the Corollary to Theorem 2.2.1 that R_Z(D) \le R_X(D) for all D > 0, since the eigenvalues in increasing order of the covariance matrices of Z and X are (0,...,0,\sum_{i=1}^{n} \sigma_i^2) and (\lambda_1,...,\lambda_n) respectively, with \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \sigma_i^2. Also R_Z(D) = R_Y(D), since the unitary transformation Z = U^{-1} Y preserves mutual information as well as the Euclidean norm \rho_n(y,z) = \frac{1}{n} \sum_{k=1}^{n} (y_k - z_k)^2 in the definition of R(D) (see p.110 of Berger (1971)). Thus R_Y(D) \le R_X(D).
The lower bound R_L(D) of Theorem 2.3.1,

R_L(D) = \frac{1}{2n} \ln \frac{\sum_{i=1}^{n} \sigma_i^2}{nD},   0 < D \le \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2,

should be compared with the Shannon lower bound R_{SL}(D) (see p.111 of Berger (1971)):

R_{SL}(D) = \frac{1}{2n} \ln \frac{|\Sigma|}{D^n},   0 < D \le |\Sigma|^{1/n},

where \Sigma is the covariance matrix of X. For n = 1 we have R_L(D) = R_{SL}(D) for all 0 < D \le D_{max} = \sigma^2. We thus restrict our attention to the case where n \ge 2. The following relationship will serve as the basis for the comparison (see Bellman (1960)):

tr(\Sigma) = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \sigma_i^2.

Since |\Sigma| = \prod_{i=1}^{n} \lambda_i, it follows that

|\Sigma|^{1/n} \le \frac{1}{n} tr(\Sigma)

with equality if and only if \lambda_1 = ... = \lambda_n (see p.17 of Hardy, Littlewood and Polya (1952)). In the following we assume that \Sigma has at least two distinct eigenvalues and thus |\Sigma|^{1/n} < \frac{1}{n} tr(\Sigma).

One sees immediately from the expressions of R_L, R_{SL} and from the above inequality that R_L is defined and strictly positive on the entire interval (0, D_{max} = \frac{1}{n} tr(\Sigma)) where R_X is strictly positive, while R_{SL} is defined and strictly positive only over the subinterval (0, |\Sigma|^{1/n}) of (0, \frac{1}{n} tr(\Sigma)). Thus R_L provides a lower bound over the interval (|\Sigma|^{1/n}, \frac{1}{n} tr(\Sigma)), where none was available up to now.

Also R_L depends only on the variances of the components of X, while R_{SL} depends on their covariances as well, and R_L is much easier to compute. Therefore R_L can be useful as a lower bound even if it is smaller than R_{SL}. At first glance, since R_{SL} depends on the full covariance structure of X, one would expect it to be a better lower bound than R_L. However, it turns out that (except when \lambda_1 = ... = \lambda_n) there is always an interval of distortion where R_L is better than R_{SL}. This is seen as follows. It is clear from the expressions of R_L and R_{SL} that

R_L(D) > R_{SL}(D)   if and only if   D > \left( \frac{|\Sigma|}{\frac{1}{n} tr(\Sigma)} \right)^{\frac{1}{n-1}}.

This interval is non-empty if and only if |\Sigma|^{1/n} < \frac{1}{n} tr(\Sigma), i.e., if and only if \Sigma has at least two distinct eigenvalues. Hence if \Sigma has at least two distinct eigenvalues we have the following properties:

for 0 < D \le \left( \frac{|\Sigma|}{\frac{1}{n} tr(\Sigma)} \right)^{\frac{1}{n-1}} :   R_{SL}(D) \ge R_L(D);

for \left( \frac{|\Sigma|}{\frac{1}{n} tr(\Sigma)} \right)^{\frac{1}{n-1}} < D < |\Sigma|^{1/n} :   R_L(D) > R_{SL}(D);

for |\Sigma|^{1/n} \le D < D_{max} = \frac{1}{n} tr(\Sigma) :   R_L(D) is defined while R_{SL}(D) is not defined.

If \Sigma has one eigenvalue of multiplicity n, then the last two intervals are empty and the Shannon lower bound is everywhere better than the one given in Theorem 2.3.1.
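The three distortion regions above are easy to exhibit numerically. The sketch below (not part of the original) evaluates R_L and R_{SL} for an assumed 2 x 2 covariance matrix with two distinct eigenvalues.

```python
import numpy as np

cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])               # illustrative; eigenvalues 1.7 and 0.3
n = cov.shape[0]
tr, det = np.trace(cov), np.linalg.det(cov)

D_max = tr / n                             # (1/n) tr(Sigma)
G = det ** (1.0 / n)                       # |Sigma|^(1/n), where R_SL reaches 0
t = (det / (tr / n)) ** (1.0 / (n - 1))    # crossover distortion

R_L  = lambda D: np.log(tr / (n * D)) / (2 * n)
R_SL = lambda D: np.log(det / D**n) / (2 * n)

print(f"t = {t:.3f} < |Sigma|^(1/n) = {G:.3f} < D_max = {D_max:.3f}")
for D in (0.5 * t, 0.5 * (t + G), 0.5 * (G + D_max)):
    rsl = R_SL(D) if D < G else float("nan")   # R_SL undefined beyond G
    print(f"D = {D:.3f}: R_L = {R_L(D):.4f}, R_SL = {rsl:.4f}")
```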
§ 2.4 THE AREA BETWEEN THE TWO BOUNDS
We now show that all values between the upper and lower
bounds of Theorems 2.2.1 and 2.3.1 are obtained for appropriate covariance structures.
THEOREM 2.4.1  The family of the rate distortion functions of all n-dimensional Gaussian vectors whose components have zero means and the set of variances \sigma_1^2,...,\sigma_n^2 fills up the space between the upper and lower bounds of Theorems 2.2.1 and 2.3.1 respectively.
Proof:  It suffices to prove that the space between the upper and lower bounds is filled up by the subfamily of n-dimensional Gaussian vectors X_\rho = (X_1^{(\rho)},...,X_n^{(\rho)}) which satisfy E X_i^{(\rho)2} = \sigma_i^2 for all i = 1,...,n and E X_i^{(\rho)} X_j^{(\rho)} = \rho \sigma_i \sigma_j for all i \ne j = 1,2,...,n, where 0 \le \rho \le 1.

Recall that the rate distortion function R_\rho of X_\rho is given by

R_\rho(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \max\left(0, \ln \frac{\lambda_i(\rho)}{\theta}\right)

D_\rho(\theta) = \frac{1}{n} \sum_{i=1}^{n} \min(\theta, \lambda_i(\rho))

where \lambda_i(\rho), i = 1,...,n, are the eigenvalues of the covariance matrix of X_\rho.
We give the proof in the following three steps.
(1) \lambda_i(\rho) is a continuous function of \rho for each i = 1,...,n.

Proof:  \lambda_i(\rho) is a solution of a polynomial equation of degree n with coefficients multiples of powers of \rho. Let us denote this polynomial by F(\rho,\lambda); then F(\rho,\lambda_i) = 0. F is a polynomial in \rho as well as in \lambda. Consequently F, \partial F/\partial \rho and \partial F/\partial \lambda exist and are continuous functions of \rho and \lambda_i. Moreover, the degree of F as a polynomial in \lambda_i is at least 1. Thus, by a well known result in implicit function theory, \lambda_i(\rho) is a continuous function of \rho.
(2) For every fixed D \in (0, D_{max} = \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2), \theta_\rho(D) (the inverse of D_\rho(\theta)) is a continuous function of \rho \in [0,1].
Proof:  Fix \rho_0 \in [0,1]. Since \lambda_i(\rho) is a continuous function of \rho for each i = 1,...,n, given \epsilon > 0 there exists \delta_i > 0 such that |\lambda_i(\rho) - \lambda_i(\rho_0)| < \epsilon/n whenever |\rho - \rho_0| < \delta_i. Take \delta = \min_{1\le i\le n} \delta_i. Then for every q, 1 \le q \le n,

\left| \sum_{i=1}^{q} (\lambda_i(\rho) - \lambda_i(\rho_0)) \right| < \epsilon.

Now fix D \in (0, D_{max}) and let \theta_0 = \theta_{\rho_0}(D).
Case I. Suppose \lambda_p(\rho_0) < \theta_0 < \lambda_{p+1}(\rho_0) for some p, 1 \le p \le n-1. Since each \lambda_i(\rho) is continuous, we can choose \delta such that \lambda_p(\rho) < \theta_0 < \lambda_{p+1}(\rho) for |\rho - \rho_0| < \delta. Let \theta_\rho = \theta_\rho(D) and suppose \lambda_k(\rho) < \theta_\rho \le \lambda_{k+1}(\rho). Then

nD = \sum_{i=1}^{p} \lambda_i(\rho_0) + (n-p)\theta_0 = \sum_{i=1}^{k} \lambda_i(\rho) + (n-k)\theta_\rho.

If k = p, clearly |\theta_\rho - \theta_0| < \epsilon. We now consider the cases k > p and k < p.

(A) p < k \le n-1. We now have

\sum_{i=1}^{p} (\lambda_i(\rho_0) - \lambda_i(\rho)) = \sum_{i=p+1}^{k} \lambda_i(\rho) - (k-p)\theta_0 + (n-k)(\theta_\rho - \theta_0).

Thus

\left| \sum_{i=p+1}^{k} (\lambda_i(\rho) - \theta_0) + (n-k)(\theta_\rho - \theta_0) \right| = \left| \sum_{i=1}^{p} (\lambda_i(\rho_0) - \lambda_i(\rho)) \right| < \epsilon.

Now since \lambda_p(\rho) < \theta_0 < \lambda_{p+1}(\rho), \sum_{i=p+1}^{k} (\lambda_i(\rho) - \theta_0) > 0. Also \theta_0 < \lambda_{p+1}(\rho) \le \lambda_k(\rho) < \theta_\rho, since k > p, and thus \theta_\rho - \theta_0 > 0. It then follows that 0 < \theta_\rho - \theta_0 < \frac{\epsilon}{n-k} \le \epsilon.

(B) 1 \le k < p \le n. We now have

\sum_{i=1}^{k} (\lambda_i(\rho) - \lambda_i(\rho_0)) = \sum_{i=k+1}^{p} \lambda_i(\rho_0) - (p-k)\theta_\rho + (n-p)(\theta_0 - \theta_\rho)   (2.1)

= \sum_{i=k+1}^{p} (\lambda_i(\rho_0) - \theta_\rho) + (n-p)(\theta_0 - \theta_\rho).

Since k < p, we have \lambda_k(\rho) < \theta_\rho < \lambda_{k+1}(\rho) \le \lambda_p(\rho) < \theta_0, and thus \theta_0 - \theta_\rho > 0. Now let \lambda_j(\rho_0) < \theta_\rho \le \lambda_{j+1}(\rho_0) for some j. If k < j \le p, we can write

\sum_{i=k+1}^{p} (\lambda_i(\rho_0) - \theta_\rho) = \sum_{i=k+1}^{j} (\lambda_i(\rho_0) - \theta_\rho) + \sum_{i=j+1}^{p} (\lambda_i(\rho_0) - \theta_\rho)   (2.2)

where each term of the first sum is negative and each term of the second sum is positive. Then from (2.1) and (2.2) we have

\sum_{i=j+1}^{p} (\lambda_i(\rho_0) - \theta_\rho) + (n-p)(\theta_0 - \theta_\rho) = \sum_{i=1}^{k} (\lambda_i(\rho) - \lambda_i(\rho_0)) + \sum_{i=k+1}^{j} (\theta_\rho - \lambda_i(\rho_0))

and since \theta_\rho \le \lambda_i(\rho), i = k+1,...,j, we obtain

\sum_{i=j+1}^{p} (\lambda_i(\rho_0) - \theta_\rho) + (n-p)(\theta_0 - \theta_\rho) \le \sum_{i=1}^{j} (\lambda_i(\rho) - \lambda_i(\rho_0)) < \epsilon.

It follows that (n-p)(\theta_0 - \theta_\rho) < \epsilon and finally 0 < \theta_0 - \theta_\rho < \epsilon.

If 1 \le j \le k, then \lambda_j(\rho_0) < \theta_\rho \le \lambda_{j+1}(\rho_0) \le \lambda_{k+1}(\rho_0). Thus \sum_{i=k+1}^{p} (\lambda_i(\rho_0) - \theta_\rho) \ge 0 and thus from (2.1)

(n-p)(\theta_0 - \theta_\rho) \le \sum_{i=k+1}^{p} (\lambda_i(\rho_0) - \theta_\rho) + (n-p)(\theta_0 - \theta_\rho) = \sum_{i=1}^{k} (\lambda_i(\rho) - \lambda_i(\rho_0)) < \epsilon.

It follows that 0 < \theta_0 - \theta_\rho < \epsilon.

Case II. Suppose \theta_0 = \lambda_p(\rho_0) \ne 0. Again, by the continuity of the \lambda_i's, we may choose \delta such that \lambda_{p-1}(\rho) < \theta_0 < \lambda_{p+1}(\rho) for |\rho - \rho_0| < \delta. Suppose that \lambda_k(\rho) \le \theta_\rho < \lambda_{k+1}(\rho). We then have

nD = \sum_{i=1}^{p-1} \lambda_i(\rho_0) + (n-p+1)\theta_0 = \sum_{i=1}^{k} \lambda_i(\rho) + (n-k)\theta_\rho.

If k = p-1, then

(n-p+1)|\theta_\rho - \theta_0| = \left| \sum_{i=1}^{p-1} (\lambda_i(\rho) - \lambda_i(\rho_0)) \right| < \epsilon

and hence |\theta_\rho - \theta_0| < \epsilon. If k = p, then

(n-p)(\theta_\rho - \theta_0) = \min\{\theta_0, \lambda_p(\rho)\} - \lambda_p(\rho)

and since \theta_0 = \lambda_p(\rho_0) we have (n-p)|\theta_\rho - \theta_0| \le |\lambda_p(\rho) - \lambda_p(\rho_0)| < \epsilon and hence |\theta_\rho - \theta_0| < \epsilon. Now for p < k \le n-1 the proof follows as in (A) of Case I, and for 1 \le k < p the proof follows as in (B) of Case I.
(3) For every fixed D \in (0, D_{max} = \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2), R_\rho(D) is a continuous function of \rho \in [0,1].

Proof:  The parametric expression of R_\rho can be written as follows by using the function \theta_\rho(D):

R_\rho(D) = \frac{1}{2n} \sum_{i=1}^{n} \max\left(0, \ln \frac{\lambda_i(\rho)}{\theta_\rho(D)}\right),   0 < D < \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2.

Since by (1) each \lambda_i(\rho) is continuous in \rho, and by (2) for each fixed D, \theta_\rho(D) is continuous in \rho, it follows that for each fixed D, R_\rho(D) is continuous in \rho.
Now Theorem 2.4.1 is an immediate consequence of (3), if we notice that R_1(D) is the lower bound of Theorem 2.3.1 and R_0(D) is the upper bound of Theorem 2.2.1.
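The construction in the proof also suggests a direct numerical illustration (not in the original): sweeping \rho from 0 to 1 in the covariance E X_i X_j = \rho \sigma_i \sigma_j moves R_\rho(D) continuously from the upper bound down to the lower bound. The variances and the distortion level are assumed for the example.

```python
import numpy as np

def rate_at(vals, D, tol=1e-10):
    """R(D) from eigenvalues via the parametric form (bisection on theta)."""
    lo, hi = 1e-12, float(np.max(vals))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.minimum(mid, vals).mean() < D else (lo, mid)
    theta = 0.5 * (lo + hi)
    return np.maximum(0.0, 0.5 * np.log(vals / theta)).mean()

sig = np.array([1.0, 1.2, 0.7])            # illustrative standard deviations
D = 0.3
for rho in np.linspace(0.0, 1.0, 6):
    C = rho * np.outer(sig, sig)
    np.fill_diagonal(C, sig**2)
    lam = np.clip(np.linalg.eigvalsh(C), 1e-300, None)
    print(f"rho = {rho:.1f}: R = {rate_at(lam, D):.4f}")
# rho = 0 gives the upper bound (independent components);
# rho = 1 gives the lower bound R_L(D) of Theorem 2.3.1.
```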
§ 2.5 AN UPPER BOUND IN THE GENERAL CASE
It is intuitively clear that the property of Theorem
2.2.1 should be true in general and not just for Gaussian
random vectors. This is now shown in Theorem 2.5.1, where a general single-letter fidelity criterion is also considered. For its proof, we will need the property stated in the following lemma.

LEMMA 2.5.1  Let X = (X_1,...,X_n) and Y = (Y_1,...,Y_n) be two random vectors, and let p(x) be the probability density function of X and q(y|x) the conditional probability density function of Y given X, where x = (x_1,...,x_n), y = (y_1,...,y_n). If q(y|x) = \prod_{i=1}^{n} q_i(y_i|x_i), where each q_i is a conditional probability density function, then q_i is the conditional probability density function of Y_i given X_i and

I(X,Y) \le \sum_{i=1}^{n} I(X_i,Y_i).
Proof:  The joint probability density function f(x,y) of X and Y is given by f(x,y) = p(x) q(y|x) = p(x) \prod_{i=1}^{n} q_i(y_i|x_i). Let f_k(x_k,y_k) be the joint p.d.f. of X_k and Y_k, and p_k(x_k) the p.d.f. of X_k. Then

f_k(x_k,y_k) = p_k(x_k) q_k(y_k|x_k),

since \int_R q_i(y_i|x_i) \, dy_i = 1 for all x_i \in R. This equality implies that q_k is the conditional p.d.f. of Y_k given X_k.

Now let g be the p.d.f. of Y and g_i the p.d.f. of Y_i, i = 1,...,n. We will use the well known property of entropy

H(Y) = -\int_{R^n} g(y) \ln g(y) \, dy \le -\sum_{i=1}^{n} \int_R g_i(y_i) \ln g_i(y_i) \, dy_i = \sum_{i=1}^{n} H(Y_i).

We have

I(X,Y) = \int_{R^n} \int_{R^n} f(x,y) \ln q(y|x) \, dx \, dy - \int_{R^n} \int_{R^n} f(x,y) \ln g(y) \, dx \, dy

= \sum_{i=1}^{n} \int_{R^n} \int_{R^n} f(x,y) \ln q_i(y_i|x_i) \, dx \, dy - \int_{R^n} g(y) \ln g(y) \, dy

\le \sum_{i=1}^{n} \int_{R^1} \int_{R^1} f_i(x_i,y_i) \ln q_i(y_i|x_i) \, dx_i \, dy_i - \sum_{i=1}^{n} \int_{R^1} g_i(y_i) \ln g_i(y_i) \, dy_i

= \sum_{i=1}^{n} \int_{R^1} \int_{R^1} f_i(x_i,y_i) \ln \frac{q_i(y_i|x_i)}{g_i(y_i)} \, dx_i \, dy_i = \sum_{i=1}^{n} I(X_i,Y_i).

This calculation is justified when the right hand side is finite, and otherwise it is trivial.
35
THEOREM 2.5.1
Let X
= (Xl' ••• ,Xn ) be a random vector with
joint probability density p(x l , •.• ,xn ) and marginal densities
Pl· (X. ). Let X = (Xl' •.• ,X ) be a random vector with joint
n
1
density
for D >
n
n p.1 (Xl.)' Then
.1
1=
° and a
single-letter fidelity criterion.
1
R-(D ) = X s
n
.
~ R. (D l
n i=l
1
s
n.
D = -I ~ nl
s n i=l s
),
where Ds ' respectively D~, is the value of the distortion at
which the derivative (slope) of RX' respectively Ri equals
s <
° (see
p.57 of Berger (1971)).
For each D >
°
let QD be the set of all conditional
p.d.f.'s q(ylx) such that
d(q) = J
J Pn(x,y)p(x)q(y\x)dXdy < D
n n
R R
where by assumption the distortion measure is given by
Pn(x,y)=
n
1
~
n i=l
P(x. ,Yo )
1
1
where P maps R2 into [0,00). If we denote by EqF(X,Y) the
expected value of F(X,Y) when the joint distribution of X
and Y is determined by p and q: p(x)q(Ylx), then
1
=-
n
~ E P(X.,Y.).
n i=l q
1
1
Similarly, for each D > 0 and i = 1,...,n, let Q_D^i be the set of all conditional p.d.f.'s q_i(y_i|x_i) such that

d_i(q_i) = \int_R \int_R \rho(x_i,y_i) p_i(x_i) q_i(y_i|x_i) \, dx_i \, dy_i \le D

and with similar notation as above we have d_i(q_i) = E_{q_i} \rho(X_i,Y_i).

We will show now that for each fixed s < 0, if q_i \in Q^i_{D_s^i}, i = 1,...,n, and if q_0(y|x) = \prod_{i=1}^{n} q_i(y_i|x_i), then q_0 \in Q_{D_s}. It is clear that q_0 is a conditional p.d.f. Thus by Lemma 2.5.1, if the joint p.d.f. of X and Y is determined by p and q_0, then q_i is the conditional p.d.f. of Y_i given X_i, and the joint p.d.f. of X_i and Y_i is determined by p_i and q_i. It follows that

d(q_0) = \frac{1}{n} \sum_{i=1}^{n} E_{q_0} \rho(X_i,Y_i) = \frac{1}{n} \sum_{i=1}^{n} E_{q_i} \rho(X_i,Y_i) = \frac{1}{n} \sum_{i=1}^{n} d_i(q_i) \le \frac{1}{n} \sum_{i=1}^{n} D_s^i = D_s,

i.e., q_0 \in Q_{D_s}. Now from the definition of the rate distortion function and Lemma 2.5.1 we have

R_X(D_s) = \inf_{q \in Q_{D_s}} \frac{1}{n} I_q(X,Y) \le \frac{1}{n} I_{q_0}(X,Y) \le \frac{1}{n} \sum_{i=1}^{n} I_{q_i}(X_i,Y_i)

where the subscripts are used as above to indicate the joint p.d.f.'s. Since this is true for all q_i \in Q^i_{D_s^i}, we have

R_X(D_s) \le \frac{1}{n} \sum_{i=1}^{n} \inf_{q_i \in Q^i_{D_s^i}} I_{q_i}(X_i,Y_i) = \frac{1}{n} \sum_{i=1}^{n} R_i(D_s^i) = R_{\bar{X}}(D_s),

and since this is true for every s < 0, it follows that R_X(D) \le R_{\bar{X}}(D) for all D > 0.
III. THE RATE DISTORTION FUNCTIONS WITH RESPECT TO THE
SQUARE-ERROR CRITERION OF SPHERICALLY INVARIANT SOURCES
§ 3.1 INTRODUCTION
When the distribution of a source is a mixture of distributions whose rate distortion functions are known, it would be of interest to express the rate distortion function of this source (or some of its properties) in terms of the rate distortion functions of the mixing distributions (or their properties). No results of this type seem to be currently available in the literature and this chapter is a small contribution in this direction.

A square-error fidelity criterion is considered throughout with no further reminder.

We consider spherically invariant distributions, which are mixtures of Gaussian distributions. For spherically invariant random vectors and sequences we find the Shannon lower bound of their rate distortion functions, and we show that if the mixtures do not include Gaussian distributions with small variances then the lower bound is tight over a certain range of distortions. Under this condition some simple upper bounds are also obtained which are valid over only a certain range of distortions.
These results do not seem to be extendable to continuous-parameter spherically invariant processes. Also no source coding theorem is given here for non-ergodic spherically invariant sequences, the main difficulty being the lack of an analytic expression of the rate distortion function over the entire range of distortions (notice that an ergodic spherically invariant sequence - or process - is Gaussian [Vershik (1964)]). The results are clearly applicable to memoryless spherically invariant vector sources and to memoryless sources whose components are spherically invariant sequences, since in both cases a source coding theorem is of course known. Also, for a discrete spherically invariant source, with no source coding theorem presently available, the rate distortion function can be used along with the channel capacity to provide a lower bound for the minimum attainable distortion.
We now define the class of spherically invariant processes, originally introduced and studied in [Vershik (1964)], and we introduce the related terminology and notation which is used throughout. Let X = \{X(t), t \in T\}, T an arbitrary index set, be a zero mean (for simplicity) random process. Let R(t,s) be a non-negative definite function defined on T \times T and F(v) a distribution function such that F(0+) = 0 and

a = \int_0^{\infty} v \, dF(v) < \infty.
X is called spherically invariant with parameters (R,F) if its n-dimensional characteristic functions have the form

\phi_t^{(n)}(u) = E e^{i(u_1 X(t_1) + ... + u_n X(t_n))} = \int_0^{\infty} \exp\left[ -\frac{v}{2} u A_t^{(n)} u^T \right] dF(v)

where t = (t_1,...,t_n) \in T^n, u = (u_1,...,u_n) \in R^n and the n \times n matrix A_t^{(n)} is defined by A_t^{(n)} = \{R(t_i,t_j)\}_{i,j=1}^{n}. For simplicity we will assume throughout that each A_t^{(n)} is non-singular.

Notice that \phi_t^{(n)}(u) is indeed a characteristic function, since the integrand is the characteristic function of N(0, vA_t^{(n)}), a zero mean Gaussian vector with covariance matrix vA_t^{(n)}, and it is well known that if (\Lambda, \mathcal{L}, \mu) is a probability space and \phi_\lambda(u) a characteristic function for every \lambda \in \Lambda such that \phi_\lambda(u) is \mathcal{L}-measurable for each u \in R^n, then \phi(u) = \int_\Lambda \phi_\lambda(u) \, d\mu(\lambda) is again a characteristic function. Also if p_\lambda(x) is a probability density function corresponding to \phi_\lambda(u), then p(x) = \int_\Lambda p_\lambda(x) \, d\mu(\lambda) is a probability density function corresponding to \phi(u). Thus a spherically invariant random process can be considered as a mixture of Gaussian processes. The probability density function p_t^{(n)} of (X(t_1),...,X(t_n)) can be calculated easily. Let

p_{t,v}^{(n)}(x) = (2\pi)^{-n/2} v^{-n/2} |A_t^{(n)}|^{-1/2} \exp\left[ -\frac{1}{2v} x A_t^{(n)-1} x^T \right],   x \in R^n,

be the probability density function of N(0, vA_t^{(n)}) (notice that we have assumed that each A_t^{(n)} is non-singular). Then

p_t^{(n)}(x) = \int_0^{\infty} p_{t,v}^{(n)}(x) \, dF(v) = (2\pi)^{-n/2} |A_t^{(n)}|^{-1/2} g_n(x A_t^{(n)-1} x^T),   x \in R^n,

where

g_n(r) = \int_0^{\infty} v^{-n/2} e^{-r/(2v)} \, dF(v),   0 < r.
40
Notice that if F puts all its mass on a single point v, then
X is Gaussian with covariance function vR. From the expressions for n=2 we can easily see that for all t,s E T
E X(t)X(s) = aR(t,s).
If T ={1,2, .•• } , then X is called a spherically invariant
random sequence with parameters (~,F) where
f
=\R(i,j)}~ '-I
1,J-
is an infinite dimensional non-negative definite matrix. If
T ={1,2, ••. ,n}, then X is called a spherically invariant
random vector with parameters (A(n),F) where A(n)=IR(i,j~~ '-I
1,J-
is an n x n non-negative definite matrix. If T = {I}, then
X is called a spherically invariant random variable with
parameters (a 2 ,F) where a 2= R(l,l).
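The mixture representation gives a direct way to simulate such vectors, sketched below (not in the original): draw a scale V from F and an independent N(0, A^{(n)}) vector G, and set X = V^{1/2} G. The two-point F and the matrix A^{(n)} are illustrative assumptions; the empirical covariance should approximate a A^{(n)}.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                         # illustrative A^(n)
v_vals, v_probs = np.array([0.5, 2.0]), np.array([0.5, 0.5])  # two-point F
a = float(v_vals @ v_probs)                        # a = integral of v dF(v)

L = np.linalg.cholesky(A)
V = rng.choice(v_vals, size=100_000, p=v_probs)    # scale, V ~ F
G = rng.standard_normal((100_000, 2)) @ L.T        # G ~ N(0, A)
X = np.sqrt(V)[:, None] * G                        # spherically invariant samples

print(np.cov(X.T))                                 # approximately a * A
print(a * A)
```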
§ 3.2 THE RATE DISTORTION FUNCTION OF A SPHERICALLY INVARIANT RANDOM VARIABLE
For a spherically invariant random variable we calculate
the Shannon lower bound of its rate distortion function and
we find necessary and sufficient conditions for this bound
to be tight over a certain range of distortions.
THEOREM 3.2.1  Let X be a zero mean spherically invariant random variable with parameters (\sigma^2, F). The Shannon lower bound R_{SL}(D) of the rate distortion function R(D) of X with respect to the square-error fidelity criterion is given by

R_{SL}(D) = \frac{1}{2} \ln \frac{a\sigma^2}{D} - C(1,F)   for D > 0,

where

C(1,F) = \frac{1}{2} + \frac{1}{2} \ln a + \sqrt{\frac{2}{\pi}} \int_0^{\infty} g_1(r^2) \ln g_1(r^2) \, dr.

Also R_{SL}(D) = R(D) for all 0 < D \le A\sigma^2 if and only if F(A-) = 0, where 0 < A \le a and F(A-) is the left limit of F at A.
at A.
Note that RSL(D) = RG(D) - C(I,F), where RG(D) is the
rate
distortion function of a Gaussian random variable with
zero mean and variance acr 2 • Since Var(X) = acr 2 , it is known
that RG(D) is an upper bound of R(D) for all D > 0 [Berger
(1971),p.101]. We thus have
I
2 In
acr 2
-n-
acr 2
-C(I,F) ~ R(D) ~ 2 In D
1
for 0 < D < acr 2
and as a by-product we also obtain that C(I,F)
also that if A
~
o. Note
= a in the theorem, i.e., if F(a-) = 0, it
00
follows from
a =
l
vdF(v)
that F assigns all its mass at
a and then X is Gaussian and C(l,F) = O.
Proof:  The Shannon lower bound with respect to the square-error fidelity criterion is given by [Berger (1971), p.98]

R_{SL}(D) = H(X) - \frac{1}{2} \ln 2\pi e D

where H(X) = -\int_{-\infty}^{\infty} p(x) \ln p(x) \, dx. Since for a spherically invariant random variable with parameters (\sigma^2, F) we have

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} g_1\left(\frac{x^2}{\sigma^2}\right),   -\infty < x < \infty,

the expressions for R_{SL} and C(1,F) follow immediately.

Also for fixed D \in (0, a\sigma^2] we have R_{SL}(D) = R(D) if and only if

\psi(u) = \exp\left(\frac{Du^2}{2}\right) \phi(u)

is a characteristic function, where \phi(u) is the characteristic function of X [Berger (1971), p.99]. Since

\phi(u) = \int_0^{\infty} \exp\left[ -\frac{v\sigma^2 u^2}{2} \right] dF(v)

we have

\psi(u) = \int_0^{\infty} \exp\left[ -\frac{(v\sigma^2 - D) u^2}{2} \right] dF(v).

We now show that \psi is a characteristic function if and only if F(D\sigma^{-2}-) = 0.

First assume that F(D\sigma^{-2}-) = 0. Then

\psi(u) = \int_{D\sigma^{-2}}^{\infty} \exp[-\tfrac{1}{2}(v\sigma^2 - D)u^2] \, dF(v)

and the integrand is the characteristic function of a Gaussian random variable, since v\sigma^2 - D \ge 0. Thus \psi is a characteristic function, since it is a mixture of Gaussian characteristic functions.

Now suppose that F(D\sigma^{-2}-) \ne 0. Then

\psi(u) = \int_0^{D\sigma^{-2}-} \exp[-\tfrac{1}{2}(v\sigma^2 - D)u^2] \, dF(v) + \int_{D\sigma^{-2}}^{\infty} \exp[-\tfrac{1}{2}(v\sigma^2 - D)u^2] \, dF(v).

The second term on the right is bounded, since it is a multiple of a characteristic function. However the first term is unbounded, since for each fixed v \in (0, D\sigma^{-2}) we have \exp[-\tfrac{1}{2}(v\sigma^2 - D)u^2] \uparrow \infty as u \uparrow \infty, and thus, from a standard result on integrals of non-negative functions,

\int_0^{D\sigma^{-2}-} \exp[-\tfrac{1}{2}(v\sigma^2 - D)u^2] \, dF(v) \uparrow \infty   as u \uparrow \infty.

Hence \psi is not a characteristic function.

It follows that for each fixed D \in (0, a\sigma^2], R_{SL}(D) = R(D) if and only if F(D\sigma^{-2}-) = 0, which is equivalent to the condition stated in the theorem. To see this, suppose that R_{SL}(D) = R(D) for all D \in (0, A\sigma^2] where 0 < A \le a. Then taking D = A\sigma^2 we have F(A-) = 0. Conversely if F(A-) = 0, then F(D\sigma^{-2}-) = 0 for all D \in (0, A\sigma^2], and thus R_{SL}(D) = R(D) for all D \in (0, A\sigma^2].
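For a mixing distribution F with finite support, C(1,F) can be computed by direct numerical integration. The sketch below (not in the original) does this for an assumed two-point F; the result is non-negative, consistent with the fact noted above that C(1,F) \ge 0, and reduces to 0 when F is a point mass.

```python
import numpy as np

v_vals  = np.array([0.5, 2.0])                 # illustrative two-point mixing F
v_probs = np.array([0.5, 0.5])
a = float(v_vals @ v_probs)

def g1(u):
    """g_1(u) = integral of v^(-1/2) exp(-u / (2v)) dF(v)."""
    return np.sum(v_probs * v_vals**-0.5 * np.exp(-u[:, None] / (2 * v_vals)),
                  axis=1)

r = np.linspace(1e-6, 60.0, 400_001)
g = g1(r**2)
C1 = 0.5 + 0.5 * np.log(a) + np.sqrt(2 / np.pi) * np.trapz(g * np.log(g), r)
print(C1)                                      # C(1,F) >= 0
```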
§ 3.3 THE RATE DISTORTION FUNCTION OF A SPHERICALLY INVARIANT RANDOM VECTOR
In this section we consider a spherically invariant
random vector. We find the Shannon lower bound of its rate
distortion function, necessary and sufficient conditions for
this lower bound to be tight over a certain range of distortions, and also two simple upper bounds.
44
THEOREM 3.3.1
R~~)(D)
The Shannon lower bound
of the rate
distortion function R(n)(D) of an n-dimensional zero mean
spherically invariant random vector X with parameters (A(n: F )
is given by
1
]
alA(n)ln
- 2 In ~ - C(n,F)
for D > 0 where
1
n-2
1
---
C(n,F)= - + -21n a + (2 2 r(rr )n)-lJ rn-lg (r 2 )lng (r 2 )dr.
2
2
Also R~~)(D) = R(n)(D) for all
°<
00
°
n
n
D < AA if and only if
F(A-)=O, where 0 < A < a, A = min Ak and the Ak'S are the
l<k<n
eigenvalues of A(n ) .
The n-dimensional version of the Shannon lower bound is given by

R_{SL}^{(n)}(D_s) = \frac{1}{n} H(X) - \frac{1}{n} H(g_{ns})

where

g_{ns}(x) = \exp\left[ s \sum_{i=1}^{n} x_i^2 \right] \Big/ \int_{R^n} \exp\left[ s \sum_{i=1}^{n} z_i^2 \right] dz,   s < 0

[Gerish & Schultheis (1964)]. It is easily seen that

H(g_{ns}) = \frac{n}{2} \ln 2\pi e D_s.

Also

H(X) = -\ln\left[ (2\pi)^{-n/2} |A^{(n)}|^{-1/2} \right] - (2\pi)^{-n/2} |A^{(n)}|^{-1/2} \int_{R^n} g_n(x A^{(n)-1} x^T) \ln g_n(x A^{(n)-1} x^T) \, dx.

It follows that

R_{SL}^{(n)}(D) = \frac{1}{2} \ln \frac{a |A^{(n)}|^{1/n}}{D} - C(n,F)

where C(n,F) = \frac{1}{2} + \frac{1}{2} \ln a + B(n,F), and

B(n,F) = \frac{1}{n} (2\pi)^{-n/2} |A^{(n)}|^{-1/2} \int_{R^n} g_n(x A^{(n)-1} x^T) \ln g_n(x A^{(n)-1} x^T) \, dx.

Since A^{(n)} is non-singular, there exists a unitary matrix U such that

U A^{(n)} U^T = Diag(\lambda_1,...,\lambda_n).

Let w = x U^{-1} and v = (v_1,...,v_n) where v_i = \lambda_i^{-1/2} w_i, i = 1,...,n. Then dx = dw = |A^{(n)}|^{1/2} dv and

B(n,F) = \frac{1}{n} (2\pi)^{-n/2} \int_{R^n} g_n(v v^T) \ln g_n(v v^T) \, dv.

Now let r = (v v^T)^{1/2}. A simple calculation shows that the Jacobian of this transformation is given by

\frac{2 \pi^{n/2}}{\Gamma(\frac{n}{2})} r^{n-1},

where \Gamma is the gamma function, and thus

B(n,F) = \left( 2^{\frac{n-2}{2}} \Gamma(\tfrac{n}{2}) n \right)^{-1} \int_0^{\infty} r^{n-1} g_n(r^2) \ln g_n(r^2) \, dr.
It is also shown in [Gerish & Schultheis (1964)] that for fixed D > 0, R_{SL}^{(n)}(D) = R^{(n)}(D) if and only if

\psi(u) = \exp\left(\frac{D}{2} u u^T\right) \phi(u)

is a characteristic function, where \phi is the characteristic function of X. Since in this case

\phi(u) = \int_0^{\infty} \exp\left[ -\frac{v}{2} u A^{(n)} u^T \right] dF(v),

it can be shown as in the proof of Theorem 3.2.1 that \psi is a characteristic function if and only if vA^{(n)} - DI is non-negative definite a.e. [dF], i.e., all eigenvalues of vA^{(n)} - DI are non-negative a.e. [dF]. Let \{\lambda_i\}_{i=1}^{n} be the eigenvalues of A^{(n)}. Then for each fixed v the eigenvalues of vA^{(n)} - DI are \{v\lambda_i - D\}_{i=1}^{n} and thus vA^{(n)} - DI \ge 0 a.e. [dF] if and only if v \ge D/\lambda a.e. [dF], where \lambda = \min_{1\le i\le n} \lambda_i. It follows that R_{SL}^{(n)}(D) = R^{(n)}(D) if and only if

F\left(\frac{D}{\lambda}-\right) = 0.

Following an argument similar to that at the end of the proof of Theorem 3.2.1, we see that the above condition is equivalent to the condition stated in the theorem.
As in the one-dimensional case, R^{(n)}(D) is upper-bounded by the rate distortion function R_G^{(n)}(D) of a Gaussian n-dimensional random vector with mean zero and covariance matrix aA^{(n)}, i.e., the covariance matrix of the spherically invariant vector X [Binia et al. (1974)].
We now derive two simpler upper bounds, valid only over a certain range of distortions, for the rate distortion function R_X^{(n)}(D) of an n-dimensional zero mean spherically invariant random vector X = (X_1,...,X_n) with parameters (A^{(n)}, F) and with F(A-) = 0, where 0 < A \le a.

The first upper bound.  Let \bar{X} = (\bar{X}_1,...,\bar{X}_n) be a random vector with the same marginals as X and with independent components \bar{X}_i, i = 1,...,n. We know from Theorem 2.5.1 that R_X(D) \le R_{\bar{X}}(D) for every D > 0. R_{\bar{X}}(D) can be calculated for 0 < D \le A \min_{1\le k\le n} \sigma_k^2, where the \sigma_k^2's are the diagonal elements of A^{(n)}, as follows. Since F(A-) = 0 and 0 < A \le a, each \bar{X}_i is a spherically invariant random variable with parameters (\sigma_i^2, F) and, by Theorem 3.2.1,

R_{\bar{X}_i}(D_s^i) = \frac{1}{2} \ln \frac{a \sigma_i^2}{D_s^i} - C(1,F)   for 0 < D_s^i \le A\sigma_i^2.

Further, since the random variables \bar{X}_i are independent, we have that

R_{\bar{X}}(D_s) = \frac{1}{n} \sum_{i=1}^{n} R_{\bar{X}_i}(D_s^i),   D_s = \frac{1}{n} \sum_{i=1}^{n} D_s^i.

Thus

R_{\bar{X}}(D) = \frac{1}{2n} \sum_{k=1}^{n} \ln \frac{a \sigma_k^2}{D} - C(1,F)   for 0 < D \le A \min_{1\le k\le n} \sigma_k^2.

The second upper bound.  A better upper bound, which is valid over a smaller region of distortion (the same region for which an expression for R_{SL}^{(n)}(D) is given in Theorem 3.3.1), is obtained as follows. Let \tilde{X} = (\tilde{X}_1,...,\tilde{X}_n) be a zero mean spherically invariant random vector with parameters (B^{(n)}, F), where B^{(n)} is the diagonal matrix diag(\sigma_1^2,...,\sigma_n^2) and the \sigma_k^2's are the diagonal elements of A^{(n)}. Then

R_X^{(n)}(D) \le R_{\tilde{X}}^{(n)}(D) \le R_{\bar{X}}^{(n)}(D)   for 0 < D \le A\lambda,

where \lambda is as in Theorem 3.3.1 and, by Theorem 3.3.1,

R_{\tilde{X}}^{(n)}(D) = \frac{1}{2n} \sum_{k=1}^{n} \ln \frac{a \sigma_k^2}{D} - C(n,F)   for 0 < D \le A\lambda.

The first inequality follows from Theorem 3.3.1 and the fact that |A^{(n)}| \le \prod_{k=1}^{n} \sigma_k^2 [Bellman (1960), p.126]. The second inequality follows from Theorem 2.5.1.
Notice that both upper bounds depend only on F and the diagonal elements of A^{(n)}, i.e., on F and the variances of the components of X, and as such they are useful when the variances of the components of X are known but the entire covariance structure of X, i.e. A^{(n)}, is not known. Since \min_{1\le k\le n} \lambda_k \le \min_{1\le k\le n} \sigma_k^2 (this follows from \min_{1\le k\le n} \lambda_k = \min_u u A^{(n)} u^T, the minimum being over all unit vectors u \in R^n, by considering the vector u whose i-th component is 1 and all remaining components 0, where i is determined by \sigma_i^2 = \min_{1\le k\le n} \sigma_k^2), the first upper bound is valid over a larger range of distortions. Also from R_{\bar{X}}^{(n)}(D) \ge R_{\tilde{X}}^{(n)}(D) we have C(1,F) \le C(n,F).

Finally, for all 0 < D \le A\lambda the difference

R_{\tilde{X}}^{(n)}(D) - R_X^{(n)}(D) = \frac{1}{2n} \ln \frac{\prod_{k=1}^{n} \sigma_k^2}{|A^{(n)}|}

and thus it does not depend on F.
49
§
3.4
THE RATE DISTORTION FUNCTION OF A SPHERICALLY
INVARIANT RANDOM SEQUENCE.
In this section we express the Shannon lower bound of
the rate distortion function of a spherically invariant sequence in terms of the Shannon lower bound of the rate distortion function of a Gaussian sequence with the same covariance structure. We also find conditions under which this
lower bound is tight, and we show how to find simple upper
bounds under certain conditions.
THEOREM 3.4.1
Let {Xn}~=l
be a zero mean spherically in-
variant random sequence with parameters (t,F). If the Shannon
lower bound of the rate distortion function of a Gaussian
sequence with zero mean and covariance
t exists and if the
limit C(F) = lim C(n,F) exists, then the Shannon lower bound
n-.o
RSL(D) of the rate distortion function R(D) of {Xn\~=l is
RSL(D) = R~L(D) - C(F)
for D > 0, where R~L(D) is the Shannon lower bound of a
Gaussian sequence with zero mean and covariance
RSL(D) = R(D) for all
°< D
~
a~.
Also
Ao if F(A-)=O for some 0 < A
<
a, where 0 = lim min A(n) and the A~n),S are the eigenn-.o l<k<n k
values of the n-dimensional non-negative definite sub-matrix
~(n)={ i(i, j )}l!1., J'-1 of t.
Proof:
From Theorem 3.3.1 the Shannon lower bound R~~)(D)
of an n-dimensional spherically invariant random vector with
50
parameters (f(n),F) is given by
1
R(n)(D) - 1 In
SL
- 2
~
D
- C(n,F)
for D > O. The first term on the right side is the Shannon
lower bound of the rate distortion function of an n-dimensional Gaussian vector with zero mean and covariance matrix
ai(n). Thus, if
and
lim C(n,F) = C(F)
n-ooo
exist, we have
= lim
n-ooo
R~~)(D) = R~L(D) - C(F).
The second part of the theorem also follows from Theorem 3.3·1.
It is clear from its definition that a spherically invariant sequence with parameters (f,F) is wide sense stationary, if and only if it is strictly stationary, if and only if
~
is stationary,i.e. f(i,j) = <f>(i-j).
COROLLARY 3.4.1
If the zero mean spherically invariant
random sequence {XJ~=l with parameters (t,F) is stationary
with spectral density af(A) and F(A-)=O for some 0 < A
then for 0 < D
~
ao
a,
its rate distortion function is given
by
where C(F) = lim C(n,F)
n-ooo
~
and
0 = ess.inf f(A).
4It
51
Proof:
Since F(A-) = 0 we have from Theorem ].].1 that
for
Since {Xn}~=l
fore
o < D < A min A(n) .
l<k<n k
is stationary, lim R(n)(D) exists and there-
R(D) = lim RS(Ln)(D) for On;OOD
~
A6, where 6=lim min A(n)
n-ool<k<n k
= esse inf f(A) [Berger(1971),p.112]. Thus
n~
1
R(D) = lim [21 1n
~
D
-
C(n,F)].
n~
But it is known [Grenander
lim 12 In
n-.oo
1
al t(n)t n
D
&
Szego(1958)] that
1 11' In af(A) dA.
= [h;J
D
-11'
Therefore the limit C(F) = lim C(n,F) exists and the desired
n-.oo
expression of R(D) follows.
Note that C(n,F) does not depend on
~
, and thus on
the stationarity of {Xn}~=l. Hence the limit C(F)= lim C(n,F)
n-xJ
always exists provided F(A-)=O for some 0 < A < a.
COROLLARY 3.4.2
If the spherically invariant random se-
quence {Xnl~=l with zero mean has parameters (f,p) where
2
t={a min(i,j)}i,j=1
for
0 < D <
Aa 2 /4
and F(A-) = 0 for some 0 < A ~ a,
its rate distortion function is given
I
aa 2
R(D) = 2 In If"" - C(F)
where
C(F) = lim C(n,F).
n~
thNl
by
52
Proof:
Since F(A-) = 0, Theorem 3.4.1 and the remark 1'01-
G
lowing Corollary 3.4.1 imply that R(D) = RSL(D)
- C(F) for
a < D < Ao. Since ~ =ta2min(i,j)}~,j=1 the eigenvalues A~n)
can be calculated easily and we have [Berger(1970)l
a
A(n)_
k
2
k=l, .•• , n.
- 4sin2 (2k-1 T1')'
2n+1 2
Thus
o = lim min A( n) = lim
n~ l<k<n k
n-xl
\(n)_ a 2
~n
- 4
and since
I f(n)\ = ~
fn )
=
k=l k
cr
2n 4 -n
n
n
k=l
. -2(2k-1 T1')
Sln
2n+l
2 '
we have for D > 0,
eI
= 2 In
aa 2
I n . (2k-1 T1')
~ - lim n ~ In Sln 2n+1
n-.oo
k=l
=
I
I In aa 2
In sin(¥.u)du
J
~2
a
=
1 In aa 2 + In 2 = -I In aa 2
1)
~
2
2
2
G
Hence the corollary. Notice that the calculation of RSL(D)
is essentially the same as the calculation of the rate distortion function RG(D) of the Wiener sequence with variance
2
aa [Berger(197 0 )] and it could be reduced to this by noting
that Theorem 3.3.1 implies
for
a < D<
A
min A(n)
l:£k:£n k
and hence
for a < D < Ao.
53
2
2
Aa <
aa , the desired expression
Since Ao = ~
_ ~
G
I
aa 2
RSL(D) = 2 In
-n-
for
0 < D < Ao
follows from the expression of RG(D) derived in [Berger(1970)].
Again by [Binia,et al.(1974)],the rate distortion function
of a spherically invariant sequence
is upper-bounded by the
rate distortion function of a Gaussian sequence with the
same covariance structure. In some cases simpler upper bounds
valid only over a certain range of distortions can be obtained
by means of the upper bounds described in
.~
3.3. Whenever ap-
plicable these upper bounds have the following two advantages:
they are simple and they depend only on the variances of the
random variables of the sequence and not on their covariances.
An example of such an upper bound is the following.
THEOREM 3.4.2
Let {Xn}~=l
be a spherically invariant ran-
dom sequence with zero mean and parameters (t,F). Suppose
that E~ = aa 2 for n > 1 (i.e., each diagonal element of t
is equal to a 2 ) and F(A-) = 0 for some o < A $ a, and that
the rate distortion function R(D) of X exists. Then
R(D) < ~ In a~
Proof:
2
- C(I,F)
for
o < D<
2
Aa •
Consider the n-dimensional random vector
X(n) = ( XI, ••• ,Xn ) • Then the first upper bound for the rate
distortion function R(n)(D) of X(n) in § 3.3 gives
54
R
(n)
(D) < 2n
2
n
1
~
k=l
In
aO k
-n- -
2
1
C(l,F) = 2 In
ao
-n-
C(l,F)
2
min o~ = Ao • Taking the limit as n
l<k<n
obtain the upper bound for R(D).
for
<
0
D
<
A
~
00
we
Notice that the first term in the upper bound is the
rate distortion function of a Gaussian random variable with
zero mean and variance ao 2 • In general, the limit as n ~ 00 of
1
n
2n
L:
k=l
aO k2
In -D-
may not exist. However, whenever this limit exists, we can
,
n
( ) For example, 1f
find an upper bound for RD.
TIk=l
ok2 ~ e Sn
as n
~
00
(f~g
gf
means
~
e-
1), then
n
ao 2
lim -l L: In
k
n~ 2n k=l
-n- = lime ~
n~
=
In
TI
+
2~
In
~ o~J
k=l
1
a
~
2 In D + 2 '
and thus
R(D) ~ ~ In ~ + ~ - C(l,F)
h
were
0
"
= 1 1m
m1n Ok2 •
for
0 < D < Ao
Of course, similarly, one can obtain
n~l~k<n
better upper bounds for R(D) by using the second upper bound
of § 3.3. In this case C(l,F) is replaced in the final expression by C(F)
smaller.
(~
C(l,F)
and the range of distortions is
IV.
RATE DISTORTION FUNCTIONS OF CERTAIN MEMORYLESS SOURCES
WITH RESPECT TO THE MAGNITUDE-ERROR CRITERION.
The rate distortion function of an independent and
identically distributed source is clearly equal to the rate
distortion function of each random variable of the source.
In this chapter we will calculate the rate distortion function with respect to the magnitude-error criterion p(x,y)=
lx-yl of a random variable X with probability density function p(x) satisfying certain conditions. In this case Theorem
1.2.1 specializes as follows:
Let X be a random variable with probability
THEOREM 4.1
density function p(x) and rate distortion function R(D). For
each s < 0, let As be the set of all non-negative functions
As satisfying
00
J As(x)p(x)eslx-yldx
Cs(y)=
< 1
for all y.
(4.1)
_00
Then
00
R(D)=
sup
[sD +
s<O,A
fA
s~~
For each s
~
J p(x)lnAs(x}dX].
(4.2)
-00
0, a necessary and sufficient condition
for As to realize the supremum in (4.2) is the existence of
a probability distribution Gs which is related to As by
-1
[As(x)]
00
= ~oo eSlx-YldGs(y)
(4.3)
and is such that Cs(y)=l a.e.[dGs ]' Moreover, for such As
56
and Gs ' the rate distortion function R(D) is given parametrically in s by
4It
00
R(D s )= sDs + J p(x)lnAs(x)dx
_00
(4.4)
s <
00
00
°
Ds = J J As(x)p(x),x-y\eslX-YldX dGs(y)·
_00 -00
Recently,an ingeneous procedure to search for As satisfying (4.1) and (4.3) was given by Tan and YaoL1975]. Using
this procedure, the rate distortion function of an i.i.d.
Gaussian source with respect to the magnitude-error criterion
was calculated explicitly,as well as the rate distortion functions of a certain class of i.i.d. sources. We first describe
this procedure and then use it to calculate the rate distortion functions of certain classes of i.i.d. sources. In Theorem 4.2 and 4.3, the density of t:he source has finite support. In Theorem 4.4, the support of the source density may
be the entire real line, and the result is an extension (by
a substantial weakening of the conditions on the density) of
that of Tan and Yao[1975]. As a by-product of these results,
a family of lower bounds of rate distortion functions is developed and compared with the Shannon lower bound in Chapter
V.
The procedure suggested by Tan and Yao[1975] is the
following immediate corollary of Theorem 4.1.
COROLLARY TO THEOREM 4.1
Let X be a random variable with
probability density function p(x) which vanishes outside the
interval (a,b),
_00
< a < b ~
00.
For each s < O,let V be a
s
4It-
57
subinterval of (a,b) and assume that the distribution function Gs(y),yE Vs ' whose total probability is concentrated on
Vs and As(x), xE[a,b] satisfy
[ As(x)]
-1
,
= J e S x-Y\dG (y),
V
s
s
(4.6)
xEla,b]
and
If As satisfies (4.1), then the rate distortion function of
X with respect to the magnitude-error criterion is given by
(4.4) and (4.5).
The significance of this rather obvious corollary of
Theorem 4.1 lies in the fact that for some densities p, intelligent (or appropriate) choices of Vs can be made such
that (4.6) and (4.7) can be solved and the solutions satisfy
the properties stated in the Corollary.
We first consider continuous densities which vanish
outside a finite interval.
THEOREM 4.2
Let X be a random variable with probability
density function p(x) which vanishes outside the interval
[a,b],
_00
< a < b <
00.
Assume the following:
(1) p is continuous with median
~
and there is an at most
finite set of points a=d o < d < ••• < dm < dm+ =b (m~O)
1
1
such that on each [d j ,d j + ], j=O,l, ••• ,m, p(x) is differ1
entiable and its derivative p'(x) is absolutely continuous
and satisfies
p~(dj) ~
p+(d j ), j=l, ••• ,m where
p~(dj)
and p+(d j ) are the left and right limits of p' at d
j
58
respectively. Also
x
J p(t)dt
b
> a
for x > a;
a
J' p(t)dt
a for x
>
<
b.
x
(2) The function
b
(4.8)
K (x)= p(x)/ J p(t)dt
for XE[~,b)
1
x
diverges to +00 as x increases to b; and the function
x
K2 (x)= p(x)/ J p(t)dt
for xE
a
diverges to +00 as x decreases to a.
Then for each s E
exist unique as > a
(-00,-2p(~»,there
and b s > a such that as t
b s are determined by
~-a
(a,~J
and b s 1'b-~ as s
~
- 00 and as and
~-as= min { y E (a,~): K 2 (y)= lst}
(4.10)
~+bs= max{y E(~,b): K 1 (y)=l s l}.
(4.11)
Suppose in addition that
(3) for each s E(-
e-
00,-2p(~»
p(x)- s-2p "(x) >
0
a.e. [Leb.] on l~-as,~+bsJ.
Then the magnitude-error criterion rate distortion function
R(D), 0 < D < Dmax ' of X is given parametrically in s by
~+b
R(D s )= lnl~ - J
~-a
s
p(x)ln(ep(x»dx - In(p(~-as))J
~-a
a
s
s
p(x)dx
(4.12)
b
+ J
~+b
(x-~-bs)p(x)dx
s
(4.13)
where
- 00 < s <
-2p(~)
b
and Dmax = Jlx-~I p(x)dx.
a
"
59
Proof:
We will show that all conditions of Corollary to
Theorem 4.1 are satisfied with Vs = [~-as,~+bsJ where the dependence of as,b s on s will be specified later.
Substituting (4.6) into (4.7) and dividing [a,b] into
the three subintervals [a,~l-asJ,[~-as,~+bsJ and [~+bs,b] we
have
A e SY + A e- sy + J a (x)eS\x-Y\dx = 1
1,s
2,s
V s
s
for yEV s ' where A1 ,s,A 2 ,s and as are given by
~-a
s
A1 ,s= J
a
p(t)dt
p(x)
J
e-stdGs(t)
(4.16)
J eslx-tldG (t).
V
s
(4.17)
p(t)dt /
~+bs
as(x)=
Jest dGs(t)
V
s
b
A2 ,s= J
I
(4.14)
Vs
I
s
By differentiating (4.14) with respect to y, we find that it
is necessary that
a (x)= I.~J
s
2 '
and these are also sufficient for (4.14) (which can be verified by substituting into (4.14)).
Substituting the solutions into (4.15),(4.16) and (4.17)
we obtain
J es(t-~+as)dG (t)
V
s
s
~-a
= 2J
s
p(t)dt
(4.18 )
a
b
= 2 J p(t)dt
~+bs
(4.19)
60
(4.20)
XEV S •
Clearly (4.18) and (4.19) are consistent with (4.20) if and
only if
~-a
J s p(t)dt =I~ p(~-aJ)
(4.21 )
a
and
b
J
~+bs
p(t)dt = I~ p(~+bs)·
Now (4.21) is equivalent to
(4.22)
K2(~-as)=fsl
• Note that conditions
(1) and (2) imply that K2 (x) is continuous on (a,~J,differentiable on (a,~J except at those d.'s which belong to (a,~J
J
at which left and right derivatives exist, and satisfies
K2(~)=2p(~)
and lim X1a K2 (x)= + 00. It follows that given any
sE(- 00,-2p(~» the equation K2(~-as)=lsl has at least one
solution. For reasons whcih will become clear later on in
this proof we will choose the smallest solution:
(4.23 )
which is clearly such that
~-as ~
a as s l -
00
and has the fol-
lowing properties (to be used later on):
K2,+(~-as) ~
0, and K2 (y) > lsI for all YE(a,~-as).
Similarly, by the properties of Kl(x), b s is uniquely determined by
(4.24)
and has the following properties: I..l. +b s t b as s
~
- 00,
Ki,_{I..l.+b s ) ~ 0, and Kl(y) > 'sl for all yE{I..l.+bs,b).
We next show that for each sE(- 00,-2p(I..l.», the distribution
function Gs(x) which has absolutely continuous part with
e-
61
density p(x)-s-2p "{x) on Vs and zero elsewhere, discrete part
with atoms at the points ~-as,~+bs and the dj'S which are in
~-as,~+bs
and masses to be determined, and zero continuous
singular part, is a solution of (4.20). For notational convenience we will work with the "density" gs of the above described distribution function Gs which is thus of the form
£+n-l
g (t) =p ( t )- s - 2P " ( t ) +C 1 SO ( t -~ +a s ) +C 2 SO ( t -~ - b ) + L: D. sO ( t - d . )
s
,
,
S
j=£ J,
J
(4.25)
for tEV s and zero elsewhere, where
d£_l <
~-as
< d£< ••• <
d£+n_l~ ~+bs
< d£+n
and o{.) is the Dirac Delta function.
·The masses Cl ,s,C 2 ,s and Dj,s can now be determined so
that (4.20) will be satisfied. We find (see Appendix 1)
2
Cl,s=lsr 0sJ p{~-as)-p~{~-as)J
C2 ,s= Isr 2 0si p{~+bs)+p~{~+bs)J
Dj,s= ISr2[p~{dj) - P~{dj)J
(4.26)
£ ~ j ~e+n-l.
Having determined gs so as to satisfy (4.20) it remains to
be shown that gs is a probability Udensity" function, i.e.
that Gs is a probability distribution function. Since
pet) - s-2p ·,{t) > 0 a.e.[Leb] on Vs by assumption (3) and
p~{dj)
-
P~{dj) >
0 by assumption (I), it is clear from the
expressions (4.25) and (4.26) that Gs is a distribution
function if and only if
lSI p{~-as) - p~{~-as) > 0
(4.27)
lsi P (Il +b s ) + P ~ (Il +bs ) > 0
(4.28)
62
and
I g (t)dt = I dGs(t) = 1.
V
s
s
V
s
To show (4.27),we proceed as follows. Since K2' , +(~-a s ) <
we have from (4.9)
p~(~-as) I
~-a
a
sp(t)c,t. p2(~-as) <
°
°
and using (4.21) obtain
~-a
[~
s
p(t)dtJ[p~(~-as)-lsl p(~-as)J < 0.
Now since a < ~-as we have
~-a
J
s p(t)dt >
a
°
by condition
(1) and thus (4.27) follows. (4.28) can be proved similarly,
and (4.29) is verified in Appendix 2.
Next we need to show that Cs(Y)
~
1 for Y
f Vs ' As(x) is
found by substituting gs into (4.6) and we have (see Appendix
3)
~
2p ~-as
AS (x)=
e -s(~-as )+sx
xE[a,~-asJ
2p l~l)
,sl
2P(~+bs)
xE L~ -a s f ~ +b s ]
e s(~+bs )-sx
Now suppose that a
~
Y<
~-as'
(4.30)
x E[fl +bs ' b ] •
Then
C (Y)= IYA (x)p(x)es(y-x)dx + IbA (x)p(x)es(x-Y)dx
S
a S
Y S
Differentiating Cs(Y) with respect to y, we have
C'(Y)=-ISIIYA (x)p(x)es(y-x)dX+ISIIbA (x)p(x)es(x-Y)dx
S
S
S
a
Y
(4.31)
where
e·
6J
h (y) = 2e sy jYA (x)p(x)e-sxdx.
s
a S
(4.J2)
SUbstituting (4.JO) into (4.J2), we have
and finally, because of (4.21)
hS(y)= e SY jYp(X)dX / es(~-as) j~-aS p(x)dx
a
a
We now show that
f(y)= e SY jYp(t)dt
s.
a:s.Y<~-a
a
is increasing. Indeed we have
Y
f'(y)= eSY(p(y) -lsi J p(t)dt ).
a
Now (4.2J) and (4.9) imply, as was remarked, that
It then follows from (4.9) that f'(y)
~
0, a < y
~ ~-as'
and
thus f is increasing on (a,~-asJ. Hence hs(y) ~ 1 for YE(a,~-asJ
and since hs(a)=O, it follows that hs(y) ~ 1 for YE[a,~-asJ.
Now since As in (4.JO) is a solution of (4.6) and (4.7),
it is obvious that
Cs(~-as)=
1. Suppose there exists a
YoE[a.~-as) such that Cs(Yo) > 1. Let
y' = sup { yE [a. ~ -as): Cs (y) > I} •
Since Cs(y) is continuous on [a.~-as)' there exists Yl'Y2
in [a,y') such that
Yl< Y2 < y'
and
Cs(Yl) > Cs (Y2) > 1
for every yE[Yl'Y2 J.
64
Since C (y) is differentiable,we have, by the mean-value
s
theorem
C~(y*) <
for some y*E (Yl'Y2)' This implies that
O. But
Cs(y*) > 1 implies that
C'(y*)=IS\(C
s
s (y*)-h s (y*)) >
0
since hs(y*) 5 1. This is a contradiction. Therefore
The proof of Cs(y) 5 1 for
YE(~+bs,b]
is similar and is omitted.
Thus,by Corollary to Theorem 4.1, the rate distortion
function R(D) of X is given parametrically by (4.4) and (4.5).
The calculations of (4.4) and (4.5) are given in Appendix 4
and the final expressions are given in (4.12) and (4.13).
This complete the proof of the The0rem.
EXAMPLE 4.1
Let p(x) be a truncated double-exponential
density function defined on [-c,c], c >
o.
lxl ~ c,
a > O.
Assumptions in Theorem 4.2 are satisfied and are verified in
the following:
(1) p(x) is clearly continuous with median
~=O
since p(x) is
symmetric about x=O. p(x) has continuous second derivatives everywhere except at x=O at which the first derivative is discontinuous. Hence
4It -
c
Kl(x)= e-ax/ J e-atdt = ae-ax/(e-ax_e-ac)
x
lim xtc Kl(x)=
00.
Similarly,
lim x.-c
I
K2 (x)= 00 •
In fact, Kl(x) is easily seen to be monotonically in-
creasing for xE[O,c] and K2 (x) is monotonically decreasing for xE[-c,O]. Thus (4.10) and (4.11) give
a
s
=
c + 1 In ( 1- ~).
a
lSI
Now lsi takes on its minimum when as=O. This implies that
a
> a.
l_e- ac
Thus
2
(3) p(x) - s-2p "(x)= (1- a 2 )p(x) >
s
for s E(- 00,-
°
a
) and for all
l-e -ac
x E [-as,a s ].
Therefore the magnitude-error criterion rate distortion
function R(D) for
° < D < Dmax
is given by (4.12) and (4.13).
Calculation (routine and thus omitted) shows that
R (D )=
s
c
ae- ac
1
a
lnUsl(l-e -a )/aJ - 1 _ e -ac Cc + -a In(l- lsi
- )]
D=
1
s 1- e -ac
[ 1 + 1 e- ac In(l- ~)J
,sl
a
lsi
(4.33)
(4.34)
(4.35)
66
It will be shown in Chapter V that if a sequence
REMARK
of distributions converges weakly to a distribution and if
all distributions have finite means, then the corresponding
sequence of rate distortion functions converges to the rate
distortion function ofthr limiting distribution. Thus if we
take the limit as c
~
00
in Example 4.1, we will find the rate
distortion function for the double exponential on the entire
real
line.(sinc~
all distributions involved have finite means.)
i.e.
R(D)= - lnaD
o
1
< D < -
= Dmax
- a
Of course, the above rate distortion function has been found
by using the Shannon lower bound method.[Berger(197l),p.9S]
Let X be a random variable with density p(x)
THEOREM 4.3
which vanishes outside the interval [a,bl,-
00
<a < b <
00.
Suppose p(x) is a continuous concave function on [a,b] and
there is an at most finite set of points a=d o< d l <· •• < dm<
< dm+l = b (m ~ 0) such that on each [dj,d j +l ], j=O, ..• ,m,
p(x) is differentiable and its derivative is absolutely continuous. Then the rate distortion function of X with respect
to the magnitude-error criterion is given by (4.12) and
(4.13) of Theorem 4.2.
0 and p~(x) ~ p~(x).
x
Also J p(t)dt > 0 for x > a. For suppose J °p(t)dt = 0 for
a
a
some xo > a. Then p(t)=O for each tE[a,xoJ by continuity of
p. Thus p'(t)= 0 for each tE [a,x o ]. Since p~(t) ~ p~(t),we
Proof:
Since p(x) is concave, p"(x)
x
have p'(t) ~ 0
~
for each tE[a,b]. This implies p(t)= 0
for
e-
67
J
tE[a,b] which is a contradiction. Similarly
b
p(t)dt
>
0
x
for x < b. Thus the only assumption left to be verified in
Theorem 4.2 is (2). In this case we will show that
b
~ < x < b
KI(x)= p(x)/ J p(t)dt
x
is strictly increasing. The proof that K2 (x) is strictly
decreasing is similar and is omitted here.
Let d j ,j=I,2, ... ,n be points in [a,b] where p'(x) is
discontinuous. Assume a < d l < .•• < dn < b. Differentiating
KI(x) with respect to x (with derivatives at the end points
d k _l and d k , I < k ~ n+l where d o = a and dn+l=b, taken to be
p~(dk_l) and p~(dk) respectively), we have
b
b
Ki(x)=[p'(x) J p(t)dt + p2(x)]/[ J P(t)dt]2.
x
x
We will show that Ki(x) > 0 for every xE(a,b). If
p'(x) > 0, since p(x) > 0 on (a,b), then clearly Ki(x) > O.
Now suppose that p'(x) < O. Consider the tangent at (x,p(x».
Let y be the distance between x and the horizontal intercept
of the tangent. Since p(x) is concave, it is clear that
b
J p(t)dt
x
~
fP(x).y •
But
y= p(x)/tan(n-e)
=-
p(x)/p'(x)
where e is the angle between the tangent and the horizontal
axis. Thus
J
x
b
p(t)dt < - p2(x)/2p'(x)
which implies
68
b
p'(x) J p(t)dt +
x
1
2
2
p (x)
o.
~
Since p(x) > 0 on (a,b), it follows that Ki(x) > O. The results of Theorem 4.2 can now be applied.
COROLLARY TO THEOREM 4.3
Let X be a random variable with
continuous probability density function p consisting of line
segments and vanishing outside a finite interval. Then the
rate distortion function of X with respect to
the magnitude-
error criterion is given by the parametric expressions (4.12)
and (4.13) if and only if p is concave.
It is clear from Theorem 4.3 that the rate dis-
Proof:
tortion function of such convex polygon is given parametrically by (4.12) and (4.13).
Now suppose that p(x) is not concave. Then there exist
two adjacent line segments such that the left derivative at
their common point is smaller than the right derivative.Hence
for each s, Gs(y) in the proof of Theorem 4.2 is not a probability distribution function and thus, by Theorem 4.1, the
parametric expressions (4.12) and (4.13) do not give the rate
distortion function of X.
EXAMPLE 4.2
Rate distortion function of a trapezoid
(see Appendix 5 for the calculation).
Let
(a+c )-1
p(x)= {
Ixl:s.
c
O<c<a. (4.36)
69
Then
2cos
2[4Tt 1
-l(
3+ 3co s
-
)J. - a+3c
2 (a +c) -
Ja 3D
2 2
-c
_ In(2 ~a-c cos[4Tt + 1cos- l (_
a+c
3
3
o <
for
R(D)=
D <
)J)
Ja 3D
2 -c.2
1. (a - c ) (a +2 c )
3
a+c
4D
1- a+c
for
(4.37)
+ ~_c)2
3 (a +c)
2
l.
1. (a-c )(a+2c) < D < D =
3
a+c
- max
Since weak convergence of distributions with finite
means implies convergence of the corresponding rate distortion functions (see Chapter V), we can obtain from (4.37)
the rate distortion function of the triangular distribution
and the uniform distribution as follows:
(1) Triangular distribution:
In (4.36), let c
.
l~mc~o
~
0 along any fixed sequence, then
p (x) -- 1
a 1
- &
2
a
J
o ~'x,~
a
which is a triangular density. The rate distortion function
of this triangular density is found by taking the limit in
(4.37) as c
~
O. Thus
R(D)=2cos 2[4Tt + lcos- l (- lQ)]_ 1 _ In(2cos[4Tt~cos-l(_1Q)J)
3
3
a
2
3 3
a
(4.38 )
for
(2) Uniform distribution:
In (4.36) J let c t a, then
70
lim cta p(x) = 2~
which is a uniform distribution and its rate distortion function is found by taking the limit in (4.37) as c+a. We find
R(D)= -
ji-
2D
/12D)
a - In(l- ,
a
for 0 < D <
~2=Dmax . (4.39)
Note that the rate distortion functions of the triangular
and uniform distributions can also be calculated by (4.12)
and (4.13) since all assumptions in Theorem 4.2 are satisfied.
Theorem 4.2 can be extended to the case where p(x) is defined
on the entire real line. The proof is the same and is omitted
here. The result is stated in the following:
THEOREM 4.4
Let X be a random variable with density p(x)
defined on the real line. Suppose p(x) satisfies all assumptions in Theorem 4.2 (here a= - oo,b = + 00), then the rate
distortion function of X with respect to the magnitude-error
criterion is given by (4.12) and (4.13) (with a=- oo,b=+ 00).
e·
v.
BOUNDS TO RATE DISTORTION FUNCTIONS
WITH RESPECT TO THE MAGNITUDE-ERROR CRITERION
We will study various bounds for rate distortion functions which cannot be evaluated by previous results. Unless
stated otherwise,all rate distortion functions discussed in
this chapter are with respect to the magnitude-error criterion.
§ 5.1
BOUNDS USING AUXILIARY PROBABILITY DENSITY FUNCTIONS
THEOREM 5.1.1
Let X be a random variable with proba-
bility density function p(x) satisfying the assumptions in
Theorem 4.2. Let Xl be another random variable whose probability density function Pl(x) vanishes outside the interval
[a,b], and Pl(x) has at most a finite number of simple discontinuities. Then a lower bound for the rate distortion
function of Xl is given parametrically in s by
.§.I
IJ. -as
RL(D s )= -Hp(Pl) + In'2 - In(p(lJ.-a s »!
Pl(x)dxIJ.~
b
- J s Pl(x)ln(ep(x»dx - In(p(lJ.+bs»J Pl(x)dx
IJ.-a s
lJ.+b s
IJ.-a
lJ.+b
b
Ds=J
S(IJ.-aS-x)Pl (x)dX-fs-, J sP l (x)dx +J (x-lJ.-bs)Pl (x)dx
a
IJ.-a
lJ.+b
b
Pl(x)s
s
(5.2)
where Hp(Pl)= ~ Pl(x)ln p(x) dx is the generalized entropy
of PI with respect to P [Pinsker (1964),p.lS]ilJ. is the median
of p and as and b s are related to s by (4.10) and (4.11).
Proof:
Since p(x) satisfies the assumptions of Theorem 4.2,
AS(X) given by (4.Jo) satisfies
C (y) =
s
b
Ja
A (x)p(x)eslx-yl dx < 1
s
v yE[a, b].
A(1 ) (x )p (x )= AS (x )p (x ) •
s
1
also satisfies
Then A(i)(x)
s
Cs(y) = J
a
b
A~l)(x)pi(x)esIX-YldX ~ 1
VY E
[a,b].
According to Theorem 4.1, A~l)(x) yields a lower bound to the
rate distortion function of Xl' i.e.
sup (sD + J
s<o
a
b
Pl(x)ln A~l)(X)dX).
b
Let RL(D,s) = sD + J Pi (x)ln A~l)(X)dX and let d., j=l, ···,n,
a
J
a=d o < d i < ••• < dn < dn + =b be the points where Pl (d j ) has
1
simple discontinuities. Then from (5.3),(4.30) and (A.5),
p (x)ln A(l)(x) is continuous both in s and x and its partial
1
s
derivative with respect to s exists for each x E [a,b] and
s
~
0 and is bounded by a constant. Thus
oR L (D, s)
= D
oS
Setting
d
+
J
a
1
n-l
+
l;
j=l d.
oR L (D, s) = 0
oS
J d j +1
J
b p (x)
OA(l)(x)
s
dx.
+ J
as
dn At1)(x)
s
and substituting (5.3) in the above
4It
73
expression, we have
b
Ds =
(5.4)
- J
a
Thus for each fixed D, if
for all s <
°
then RL(D,s) as a function of s is concave and its supremum
is achieved by the point sD satisfying (5.4), i.e.
Whether or not (5.5) is satisfied, RL(Ds ) = RL(Ds'S) along
with (5.4) provide the parametric expressions (in s) of a
lower bound of the rate distortion function of X .Substituting
l
(A.5) into (5.4), we have
b
+ J (X-fl- b )P1 (x)dx .
s
fl +b s
(5.6)
SUbstituting (5.6) and (4.30) into (4.4), we have
RL(Ds )= -Hp (P1)+ ln~ - In(p(fl- a s
fl-a s
))!
Pl (x)dx -
fl~s
-J
fl-a s
b
P1(x)ln(ep(x))dx- In(p(fl+bs))J'
~l
Pl(x)dx
+b s
b
P1 (x)
where Hp (P ) = J Pl (x)ln p(x) dx
is the generalized en1
a
tropy of P1 with respect to p and has the property H (Pl»O
p
-
74
with equality if and only if p(x) ~ Pi (x) a.s.LPinsker(1964)
4It
P.19].
as and b s are related to s by (4.10) and (4.11). It
should be clear from (5.7) that RL is useful only when
Hp (Pl) <
00.
~
0, RL(D s ) given by
Theorem 5.1.1 is equal to R(D s ) if and only if there exists
a probability djstribution function Qs whose total probability
THEOREM
For eacn fixed s
5. 1 • 2
is concentrated on (a subset of) [a,b] and is such that
(i)
~
Jb eslx-yl dQs(y)
a
4Itb
J eslx-yldQ (y)
a
s
x E (i1 +b s ' b]
(5.8)
where as and b s are given by (4.10) and (4.11 ), and
(ii)
J b ~s(x)p(x)e s Ix -yI dx
1
a.e.LdQs]·
a
~
Proof:
(5.9)
Suppose the assumptions are satisfied for a given
s < 0. SUbstituting (4.30) into (5.8), we have
b
Pi (x)~ AS(X)p (x)J eslx-yldQs(y)·
a
SUbstituting
(~.3)
into the above expression, we have
b
Pl(x) = ~~1)(x)P1(x)J eslx-yldQs(y)·
a
•
75
If Pl (x)
I
0, then we have
[A (1) (x) ]
-1
s
b
= J eslx-YI dQ (y) .
s
a
If Pl (xJ=o for some xoE[a,b], then from (5.8) Pl (xo)=p(xo)=o
for some XoE[a,~-as)U(~+bs,bJ. In this case, we can define
A~l)(x) by (5.10). Therefore RL(Ds)=R(D s ) by Theorem 4.1.
Conversely, for a given s
~
0, suppose RL(Ds)=R(D s )' Then
A~l)(x) achieves the supremum in (4.2) and hence it satisfies
(4.3),i.e. (5.10) and is such that Cs (Y)=l a.e.[dQs(Y)] for
some probability distribution Qs' SUbstituting (5.3) and
(4.30) into (5.10), we obtain (5.8).
It would be of interest to compare the lower bound of
Theorem 5.1.1 to the Shannon lower bound which is now computed for densities which vanish outside the interval [a,b],
-00
< a < b <
00.
THEOREM 5.1.3
Let p(x) be a probability density function
of a random variable X which vanishes outside
< b <
00.
[a,b],~~ <
Then the Shannon lower bound to the rate distortion
function of X is given parametrically in s by
R (D )=h(p)- lsi
SL s
~b-a)
[l_exp(J§l(b_a))]-l +
2
lsi
+ In--------2e[1-exp(- 1~(b-a))J
Ds
=
-1lsi
a <
b - a
+ ------,.----
2[l-exP(~(b-a))J
b
where h(p) - - J p(x)ln p(x)dx
a
is the entropy of p(x).
76
Moreover RSL(D) < R(D) for all D with R(D) > 0 unless
p(x) =
(I
IS\
+bl
a
.exp s x- --)
2[1-exp(- 1~(b-a»J
2
xE[a,bl
in which case RSL(Do)=R(Dc,) at the point Do with slope s.
Proof:
Let
p(x) > 0
otherwise.
From the condition
Cs(y) = J
a
b
Kseslx-yldX ~ 1
'rJ
yE[a,b]
one can take
sup JbK eslx-yl dx = 1.
yE[a,b]a s
Thus
K
s
= [
By a simple
b
sup
J eSlx-yldx
yE[a,b] a
e·
l-l
calculation, the supremum is attained at
y= !(a+b) and thus
s
K
s
= -------2[exP(~(b-a»-1]
By Theorem 4.1
b
Ks
RSL(D)= sup(sD + J p(x)lnPTiJ dx )
s<O
a
= sup (h(p) + sD + ln Ks )
s~O
lSI
)
= sup (h(p)+sD+ln
s
s<o
2[1-exp('2(b-a»]
(5.14)
.-
b
where h(p) = - J p(x)ln p(x)dx
a
is the entropy of p(x).
e
77
To find the supremum of the function within the parentheses in (5.14), we differentiate it with respect to sand
set the partial derivative to zero. i.e.
The corresponding extremum point corresponds to a unique
maximum if and only if the second partial derivative with
respect to s is
~
0 for all s
~
0, which reduces to
(5.16)
Now (5.16) is shown to hold as follows:
Let x = I~I (b-a). Then (5.16) is equivalent to
for all x > O.
Differentiating f(x) with respect to x, we have
,
f (x)= -2 ( l-e x) e x -2xe x - x 2e x
x2
= 2e x [e x _ (1 +x+'2)].
,
x2
x 3 +••• ,
Since e X= 1 + x +2T+3T
we have f (x) >- 0
Therefore f(x) > f(O) = 0 for every x > O.
V
x >
- O.
-
Thus the supremum is achieved in (5.14) when (5.15) holds.
Substituting (5.15) into (5.14), we have the desired result.
Now from Theorem 4.1, RSL(D)=R(D) at a point D with
slope s,if and only if Cs (y)=l a.e.[dGsJ. But from the derivation of the Shannon lower bound, for each s < 0, Cs(y)= 1
78
if and only if y=!(a+b). Hence RSL(D)=R(D) at a point D with
slope s if and only if Gs puts its total probability mass at
!(a+b), in which case
v
xE[a,b]
and from
for p(x) > 0
we have
.!.
I
lsi exp(s \x-!(a+b)\ )
p(x)=K e S I x-2(a+b) = -----~---s
2[1- exp(- I~~b-a))]
V xE[a,b].
It is easy to verify that p(x) is a density. Thus it follows
that if p(x) is given by the above expression, then there is
one s < 0 such that at the point Do with slope s, we have
RSL(Do)=R(Do )' For all other densities p which vanish outside [a,b], we have RSL(D) < R(D) for all D >
0
such that
R(D) > O.
EXAMPLE 5.1.1
Let p(x) be the uniform density defined on
[a,b]. i.e. p(x)= (b_a)-l, a < x ~ b. Then p(x) satisfies
all the assumptions in Theorem 4.2. Let Pl(x) be a piecewise continuous density defined on [a,b]. Then by applying
Theorem 5.1.1, a lower bound for the rate distortion function
of Pl (x) can be found. Calculation shows (see Appendix 6)
lsi
RL(Ds)=h(pl)+ln~
Ds
=
1.
--1lSI
1
+ J [Pl (a+t)+Pl (b-t)]dt
lSI
0
lSI
-
J t[P i (a+t) + Pi (b-t)]dt
o
(5.17)
1
S<- b-a'
(5.18)
e
79
In (5.18), if for a given D, \sl
is not single-valued, and if
a branch of Is\ can be chosen such that (5.5) is satisfied,
then f'or this branch of I s\ , RL (D) is the best possible lower
bound achieved by the method of Theorem 5.1.1. Note that condition (5.5) is equivalent to
P1 (a+\~ ) + Pi (b- I~) < lsi
in this example. (see Appendix 6)
The lower bound of Theorem 5.1.1 is of course useful when
Theorem 4.2 in not applicable to P1 (x). As an illustration we
now calculate the lower bound of Theorem 5.1.1 given by (5.17)
and (5.18) when P1 is the truncated double exponential density.
In this case the rate distortion function of Pi has been calculated in Chapter IV and therefore we can see how tight is
the lower bound determined by (5.17) and (5.18).
Thus
a. >
o.
Calculations show the following:
h(p)
=
1 + In
a.
2(1- e- 2)
~----o.~~~-
a.
2(eo.!2_ 1 )
2(1_e-o./2) + In ~
a.
2
Jsl > 2
and
-a.
D
=
1
[(_..1 _ L)e 2
max 1- e -0./2
2
a.
For the Shannon lower bound we have
+ _1_J.
a. -
(5.20 )
80
lsi> 0
Curves are plotted for a
0.1, 2,
5,
10. In general, RL is
a better lower bound than RSL ' except in a small neighborhood
of Dmax where the Shannon lower bound is better. As a-O, the
difference between RL and RSL becomes larger and as a--+oo, the
difference becomes smaller. It can be seen also that RL(D) is
a very good approximation to R(D).(see Figures
p.118-121)
5.1,5.2,5.3,5.~
D
Figure 5.5
L..---J..---L..--.L..-------.. lsi
O
2
IS 01
ISmaxl
If, for a > 0 fixed, we plot (5.20) (D as a function of lsi ),
we obtain a curve as shown in Figure 5.5 .No te -eha tat lSI = 2,
Ds = Dmax • Also Ds achieves its maximum at some point ISol> 2,
and Ds is a decreasing function for all \sl ~ ISol • It follows
(as it is easily checked analytically) that for all /sl>ls
- maxI,
condition (5.19) is satisfied and thus the branch of /sl :
lsi>
~maxl
gives the tightest possible lower bound RL(Ds ).
Another lower bound for the rate distortion function of
the truncated double exponential density can be obtained by
79
In (5.18), if for a given D, \sl is not single-valued, and if
a branch of 1st can be chosen such that (5.5) is satisfied,
then f'or this branch of lsi, RL (D) is the best possible lower
bound achieved by the method of Theorem 5.1.1. Note that condition (5.5) is equivalent to
Pl (a+
I~I
I~I)
) + Pl (b-
< lsi
in this example. (see Appendix 6)
The lower bound of Theorem 5.1.1 is of course useful when
Theorem 4.2 in not applicable to Pl (x). As an illustration we
now calculate the lower bound of Theorem 5.1.1 given by (5.17)
and (5.18) when Pl is the truncated double exponential density.
In this case the rate distortion function of Pl has been calculated in Chapter IV and therefore we can see how tight is
the lower bound determined by (5.17) and (5.18).
Thus
e -a. Ixl
a. >
o.
Calculations show the following:
a.
e
h(P) = 1 + In 2(1-a. - 2)
a.
- 2(ea./2_ 1 )-
a.
+ In 2(1_e-a./2) + In i2L
- 2(ea./2_ 1 )
a.
2
-lL
-1- ) e lsi +
a.
and
lsi> 2
_l_J
a.
-a.
D =
1
[(max 1- e-a.7 2
1
2
_1_)e--2--+
a.
For the Shannon lower bound we have
--l--J.
a. -
(5.20)
80
lsi> 0
(5.21 )
1
Curves are plotted for a . - 0.1, 2, 5, 10. In general, RL is
a better lower bound than RSL ' except ln a small neighborhood
of Dmax where the Shannon lower bound is better. As a -0, the
difference between RL and RSL becomes larger and as a-+oo, the
difference becomes smaller. It can be seen also that RL(D) is
a very good approximation to R(D).(see Figures
p.118-121)
5.1,5.2,5.J,5.~
D
Figure 5.5
o
If, for a > 0 fixed, we plot (5.20) (D as a function of lsi ),
we
0
btain a curve as shown in Figure 5.5.No te that at lSI = 2,
Ds = Dmax ' Also Ds achieves its maximum at some point ISol> 2,
and Ds is a decreasing function for all \s/2: ISol . It follows
(as it is easily checked analytically) that for all ISI>ls
- maxI,
condition (5.19) is satisfied and thus the branch of rsl :
lsi>
~maxl
gives the tightest possible lower bound RL(Ds )'
Another lower bound for the rate distortion function of
the truncated double exponential density can be obtained by
e-
81
using the truncated Gaussian density in stead of the uniform
density. Let
_ X 2/(2a 2
)
p(x) = Ke
where
~(x)=
J
x
Since p(x) satisfies all assumptions in Theorem 4.2, a lower
bound for the rate distortion function of Pi (x) is obtained
by Theorem 5.1.1. Calculation shows:
+
1-e
~a./2 [(~
-(1-
8a
2 1 2)(1+
a. a
~)+lnK)e-a./2-lnKJ +
2
+
Ds =
1
1-e
-0./2
1
[_ as e-a./2 +(1- ~(~s+ 1
a. ))e-a. a s ]
1_e-a./2
2a 2
a.a
[1
!Sf
+
-0./2 ( 1
ase
- lSI
The relationship between as and s is given by
a
f(7)]
-1
lsi> o.
Numerical calculations show that this lower bound is slightly
better than the one obtained via the uniform distribution.
Another class of lower bounds can be constructed by
considering
t~e
class of densities
82
for Ixl:s.
c,
where S > 0 and C2 :s. 1, The rate distortion function of p(x)
can be found by Theorem 4.2 for some values of Sand C2 , For
C=~,
example if
S=l, C2=O, then all the assumptions in
Theorem 4.2 are satisfied. (see Appendix 7) The resulting
lower bound has the following form:
\s\ > 0
.l.
where
.2
J e
X
2
dx
e-
as
If the method used in Theorem 4.2 is applied to a discontinuous probability density function, a As(x) .:::
° may be
found satisfying condition (4.1) whereas a distribution
function Gs(y) satisfying (4.6) and (4.7) may not exist. In
this case, using the above mentioned As' a lower bound for
the rate distortion function of the discontinuous density
can be obtained by (4.2). The following example illustrates
this point.
EXAMPLE 5.1.2
(For calculations, see Appendix 8).
Let
-1 < x < 0
o < x < 2.
83
Then for 0 < D ~
It ' we
have
12
24 <- D <- 12
24 = Dmax' the rate distortion function
and for
itself can be found and is given by:
§ 5.2
BOUNDS USING AN APPROXIMATING PROBABILITY DENSITY
FUNCTION
The following property is useful in finding upper and
lower bounds for the distortion rate function of a random
variable.
LEMMA 5.2.1
If the random variable Xi' has distribution
function Fi , and distortion rate function Di (R),i=1,2, then
for all R > 0
00
J IF 1 (t)- F2 (t)ldt.
-CO
Thus clearly if Fn - F - 0 in L , then Dn(R) - D(R) uniformly.
1
Also, if Fn - F weakly and Fn,F have finite means, then
p(Fn,F) - 0
Proof:
and Dn(R) - D(R) uniformly.
The inequality in (5.22) follows from the result
of [Gray,et.al.(1975)]
applied to i.i.d. sources with dis-
tributions F and F2 The expression for p on the right side
1
of (5.22) is a result of Vallender(1973)o Now if Fn converges
0
to F weakly and all distributions involved have finite means,
then by a theorem from Dorbrushin(1970), we have
p(F ,F)-0
n
84
and hence Dn (R) - D(R) uniformly.
The following well-known property should be noted in
connection with Lemma 5.2.1: If a sequence of probability
density functions Pn converges to a probability density
function p almost everywhere, then the corresponding sequence
of distributions Fn converges to the distribution F of p
weakly.
Using Lemma 5.2.1, one obtains the following lower and
upper bounds for the distortion rate function of any random
variable:
We can choose a random variable X2 whose rate distortion
function can be calculated by means of Theorem 4.2 and whose
p
distribution F 2 is close to F in the sense of the
distance.
1
The resulting bounds are especially useful if D (R) cannot
1
be calculated using Theorem 4.2,e.g. the case of discontinuous density of Example 5.1.2.
EXAMPLE 5.2.1
In order to derive lower and upper bounds
for the rate distortion function of the random variable X
with density
p(x)
={
~
-1 < x < 0
8
we consider for each 0 < E < 1
o
< x
~
2,
a random variable XE with
probability density function P E defined by
e-
85
1
-1 < x < -E
4
x 2 + ---L + -L
8E
16E 2
16
x 2 + ---L + -L
- 16E 2
8E
16
PE(x)=
-E < x < 0
0 < x <
E
1
8
Note that PE(x) converges to p(x) almost everywhere as E
(i.e.
~
o.
Vx/O).
If FE and F are distributions of XE and X respectively, then we have (see Appendix 9)
_
E2
9b .
p(FE'F) =
For E >
12
--~~---
'" 0.593 67, the distortion rate function
'"
15 -J2 - 1
of XE can be calculated using Theorem 4.2 (see Appendix 9).
The evaluation of DE(R) is routine but lengthy and tedious
and is omitted here. The result is the following:
(i) -1 -<
~-a
<-E
s-
u-a
R
i
+
t - ~tan-1 ~
-
~ ln~ + -~ - i(~+bs) + ~ln4 - i ln
+Ecilni -
~ln4
-
~lnrt
+
(ii) -E
R
(~-as )
=
16
In[
(~-as
16E
2
)
2
~tanh-1~J
86
(iii)
a
<
II-a S < E
_!""
+
(Il- a
s
)2
1(Il- a )
_ _..:=.s_ - 28(Il+b ) +
48
48E
s
r
IIW
+
.
e1
D= - [
lsi
(ll- a s )3
----.;;~
48E 2
2
E
1
+ 6 + 192 +
(/l-a )3
s
+ ----:::.-48
D= sisr[(Il+b s ) - (Il-a s )] +
+
i
[1 - t(/l+b s )]2 +
It (/l-as )2
k+ ~
•
..
87
The relations between lsI and as,b s are given by:
1
lsI
-1 <
<-E
- fl-a s -
=
-E<
- fl-a s < 0
lsi =
E <
-
Is I = 2 _
(t +bs )
where fl=
fl-a s<
fl
fl ~ fl +bs < 2
2
J'
If we take E=0.59367, then p(F,FE)~ 0.00367 and numerical
computations show that DE(R)-0.00367
~
DL(R)
~
DE(R) where
DL(R) isthe inverse function of RL(D) in Example 5.1.2.
Thus we have
§ 5.3
BOUNDS FOR A GAUSSIAN VECTOR SOURCE
In this section we derive two kinds of upper and lower
bounds of the distortion rate function with respect to the
magnitude-error criterion of a Gaussian vector. These bounds
are expressed in terms of the distortion rate function of a
Gaussian vector with independent components, which can be
evaluated using the result of Tan and Yao(1975) for a Gaussian
random variable.
88
Consider an n-dimensional Gaussian vector X with zero
means. Its distortion rate function is defined by
D{R) =
1n
inf E \Ix- YII
qEQR
tl." is the ,£~n) nGrm and QR= {q: ~ I{X, y) ::s. R J where
where
q is the conditional distribution of Y given X.
Let A be the unitary matrix in the Jordan canonical
form of the covariance matrix of X. It is well-known that
U= A- 1 X is an n-dimensional Gaussian vector with independent
components. Then we have the following relationship between
the distortion rate functions of X and U.
5.3.1
LEMIVIA
where D*{R) is the distortion rate function of an n-dimensional Gaussian vector with independent components whose
variances are the eigenvalues of the covariance matrix of X
n
and \lAII = sup ·~lla .. \ .
i
Proof:
J-
lJ
"
1 I{U, V) ::s. R}
Let V=A- 1 Y and QR={q
: n
with q
the
conditional distribution of V given U. Since A is invertible,
the conditional distribution of Y given X determines uniquely
the conditional distribution of V given U, and vice versa.
Then
I\x-YII = IIAu-AvlI ::: IIAII.llu-vll
IIAII
ln
inf
q'EQR
implies
E\lu-vil
=I\AI!.D*{R)
tt
89
where D*(R) is the distortion rate function of U,i.e. of an
n-dimensional Gaussian vector with independent components
whose variances are the eigenvalues of the covariance matrix
of X. Similarly
IIA- 1 (X-Y)II ~IIA-1IHlx-Y'1
implies D*(R) ~ IlA- 1I1D(R) and the result follows from
!lA- 1 11 == \lATI' since A is unitary.
Ilu-vll =
It should be noted that D*(R) can be evaluated since its
components are independent and the distortion rate function
of each component is known. [Tan and Yao(1975)]. Thus the
lower and upper bounds of Lemma 5.3.1 can be evaluated. Numerical computations show that in general these bounds are
not good. But if X is "nearly independent
1\ I-A II < 2~E'
E >
II
in the sense that
0 I then the ratio r of the upper and lower
bound is close to 1,i.e.
1~r:::1+E
(see Appendix 10). This implies that D*(R)
~
D(R). Notice
that if A is the matrix resulting from the orthonormalization
of X, then the Gaussian vector U has i.i.d. standard normal
components, and the inequality of Lemma 5.3.1 is valid with
n*(R) now the distortion rate function of the standard normal
random variable (since the distortion rate function of the
vector with i.i.d. components equals the distortion rate
function of one of its components).
We now derive upper and lower bounds of the distortion
rate function
Dx(n)(R) of an n-dimensional random vector
x(n)=(X11 ••• ~), in terms of the distortion rate function
9C
Dy(n)(R) of an auxiliary n-dimensional random variable y(n)=
tt
(Yl' ""Yn) and the Vasershtein distance p(F(n),G(n)) between
the distribution functions F(n) and G(n) of X and Y respectively. P is defined by
p(F(n),G(n)) = inf E
n
L:
. 1
1=
IX.1 -Y·I1
where the infimum is tak8n over all possible joint distributions of the two random vectors X(n) and y(n)(having the
fixed marginals of F(n) and G(n)), and the bounds follow
from the n-dimensional analog of Lemma 5.2.1:
These bounds are useful only when the distortion rate function of the auxiliary random vector y(n) as well as
p(F(n),G(n)) can be evaluated. When the component random
variables of y(n) are independent and their distributions
e-
are such that their distortion rate functions can be evaluated (for instance by Theorem 4.2) then the distortion rate
function of y(n) can be evaluated by Theorem 1.3.1. Moreover,
when the components of y(n) are not independent,no result on
the evaluation of the distortion rate function of y(n) is
known at present. p(F(n),G(n)) has been evaluated by Vallender
(1973) for n=l (see Lemma 5.2.1),but again no results are
known for n
~
2. Theorem 5.3.2 below gives an upper bound
U(F(n),G(n)) to p(F(n),G(n)) which can be used to obtain the
following bounds for Dx(n)(R):
D (n)(R)-U(F(n),G(n))
y
~ D (n)(R)
X
< D (n)(R)+U(F(n),G(n)).
y
e
91
Let X(n)= (X ' ••• ,X ) be a random vector with joint disn
1
tribution function F(n),let K.(.) be the distribution function
1
of each Xi' i=l, ... ,n, and for i=2, ... ,n,let K i (X , ... ,x i _ ,·)
1
1
be the regular conditional distribution function of X.1 given
X1 =x 1 '···'X i _1 = Xi-i' Let also Y(n) = ( Y1 , ""Yn ) be another
random vector (defined on the same probability space with X)
with joint distribution function G(n), marginal distribution
functions Qi(') and regular conditional distribution functions
Qi(Yl' ""Yi-l")'
THEOREM 5.3.2
where
[Cambanis(1975)]
() ()
L(F n ,G n )
n
00
= r.:
i=l
-00
00
= J IK1 (u 1 )-Ql(u 1 )ldU 1
+
-00
+
f.1- 1 (vi' • • . , v.1- 1)' u.1 ) -Q.1 (g1 (v ) , ... ,
1
I
g.1- 1 (vi' • • • , v.1- 1)' u.1 ) du.1
1
f. (v 1 ' . • • , v. ) = K:- f ( )
f
(
) (v. )
1
1
l ; 1 vi , ••• , i-l vi"'" vi-l
1
-1
g:l' (vi' • • • , v l'
)
= Q,1
i=2, ... ,n.
()
(
;g1 v 1 , .. ·,gi-l v 1 " " ' Vi _1
) (v. )
l'
92
These upper and lower bounds are equal when n=l. For
tt
n > 2, notice that the lower bound is symmetric, while the
upper bound clearly depends on the order in which the conditional distribution functions were considered. Hence a
better upper bound is the Jinimum of all upper bounds obtained
as in Theorem 5.3.2 over all (n!)2 orderings of Xl' ""Xn
and Yl""Yn' Of course when the distribution functions
F(n) and G(n) are symmetric then so is the upper bound. It
is conjectured that in most interesting cases a strict inequali ty holds in L <
f .
When the vector Y(n) = ( Y , ""Yn ) has independent coml
ponents we have for all i=2, .•• ,n and all Yl' ""Yi E R,
Q.1 (Y ,··· ,y.1- 1'Y·1 )=Q.1 (y.)
1
l
and then the expression of the upper bound can be simplifjed
as follows:
U(F(n) G(n))=
,
Joo'K1 (u 1 )-Q 1 (u 1 )!dU 1 +
-00
n
l:
.
J . 1 F.1- 1(dU 1 , •.. ,du.1- 1)'
1=2 R1-
00
'J \K.1 (u l ' ••• ,u.1- l'u.1
-00
=
)-Q. (u. )/dU.
1
1
1
n
P(K ,Ql)+ i: EP(K.(X ,···,X. 1,·),Q.(.)).
1
1
i=2
1
11
We now consider the case where X(n)and y(n) are zero
mean Gaussian vectors and the components of y(n) are independent ( so that Dy(n)(R) can be evaluated) and we determine the distribution of y(n) which minimizes the upper bound
of Theorem 5.].2. The result is given in Theorem 5.].]. In
its proof we will need the expression of the Vasershtein
93
distance between two Gaussian random variables,given in Lemma
5·3·2.
LEMMA
S.g
Let U 'U 2 be two Gaussian random variables
1
with distributions G1 ,G 2 ,means ~1'~2 and variances o~, o~
respectively. Then
the minimum value of P(G ,G 2 ) over all
1
°
2 o~ ~ 0 is achieved when 01=02 and this
POSSlobI e varlances
01'
For fixed means
~1'~2'
minimum value is equal to I~ 1 -~ 2\ •
Proof:
From the result of Vallender(1973), we have
00
P(G ,G )=
1 2
J
-00
IG1 (x)-G 2 (x)ldX.
(5·26)
Thus
P(G 1 ,G 2 )= J00' J x 1
exp[-00
-oo.j 2TfO 1
(t-~ 1 )
x
J
-00
x-1.:t
00
-- J
-00
°1
J
-00
....L e
.j2tf
1
exp[-
.j2tf° 2
t2
2 dt
J
-00
.j2tf
]dt -
20 2
1
(t-~ 2 )
2
2°2
x-~2
°2 _1_
2
e
2
]dt dx
t2
2 dt dx .
CS. 27)
The expression inside the absolute va].ue in (5.27) is positive
if and only if
94
a
i.e.
Therefore
P(G ,G
1
2
x-~12
)=J
a
dx
J
0"2
t2
__1__
2 dt +
e
J
a
X-1J.1 -/2tr
-00
00
dx
J
x-u_
_
1
0"1 __1__
0"1
X-1J.2 .J2:rr
0"2
.
')
tt:..
e
2 dt •
We now have
a
J dx
_00
-00
00
=J
dt e
-00
t
2
= JC [ (0" 1-0" 2 )t + 1J.1 -IJ. 2 J e -2dt
-00
e·
where
since 0"2t + 1J.2 < 0"1 t + 1J. 1 if and only if t < c and in this
case 0"1t +1J.1 < a. Similarly we find
X-IJ.
=-:.l
0"1
00
J dx J
a
X-1J.2
0"2
e
t2
- 2
t2
00
dt - J [(0"2-0"1 )t+ 1J.2-1J. ]e
1
c
c2
= (0"2- cr 1)e
and thus we have
-2
2 dt
00
(1J.1 -IJ. 2) J e
c
_
t2
2 dt
.
95
J
lei e
t
2
2 dt]
- lei
t
2
2 dt].
When 01> 02 we obtain similarly the same expression with
02-01 replaced by 01-02' Hence (5.25) follows when °1 / °2'
Now if 01=02=0 going back to (5.27) we have, assuming 1J.2~1'
X-II
_1"'_1
°
00
P(G1 ' G2 ) = J
_00
t
2
2
dx J
X-1J.2
ef2=.....
~G'IT
L2
2
00
00
J X x -IJ.
dt = J dt e
_00.j2:;
-00
X
-11
2 __
1)
(t
( __
° ' °
°
00
=J
dt
_00
we have P(G 1 ,G 2 )=
\1J. 1 -1J. 2 1 which is consistent with (5.25) since
Similarly for
1J. 1 > 1J.2" Hence when 01=02
00
J
o
e
Now let IJ. =11J.1-1J.21~ 0 and
and for all 1J.,0
2
~
_-lL
of (U 10)
20 2+ oe
-- - - =e
00
0 we have
0=lol-021~
O. Then from (5.25)
)dx
96
2
-.JL
2
2
= e 20 (1+ bL..2
> O.
0
Thus for each fixed
function of
0
~ ~
0, f(~,') is a strictly increasing
> 0 and hence for all
00
_t 2/2
f(~,o)~ f(~,O)= ~
J e
~,a ~
dt =
o
JTI72
0
.
~
i.e. minp(G 1 ,G 2 )= ~=1~1-~21
If X(n) is a zero mean Gaussian vector then for each
i=2, ••• ,n, K.1 (Xl' ••• ,X.1-1") is a Gaussian distribution
function with mean M.1 (Xl' •.• ,X.1- 1) and variance V~1 given
by
( 1 I Xl"'" X.1-J.1 )= X(i-l)f-l
M.1 (Xl"'" X.1- 1 )= EX.
. 1T.1- 1
1-
e·
where
Ii is the covariance matrix of x(i)=(X , ••• ,X i ), Ti
1
is defined by the following partition
ti
== (
ti -1
T!
1
Ti )
o~
1
and 0i is the variance of Xi [Miller(1964)].
THEOREM 5.3.3
Let X(n) be a zero mean Gaussian random
vector with distribution function F(n) and let
i.,
T., V.,
111
0i be defined as above. Let y(n) be a zero mean Gaussian
vector with distribution function G(n), whose components are
independent with variances
Then
97
sl,···,sn
and the minimum is achieved when sl=ol and si=V i ,i=2, ... ,n.
Proof:
We want to determine s. so that U(F(n),G(n)) is
1
minimized. Since the components of yen) are independent, we
have (from (5.24))
U(F(n),G(n))=f'(K ,Ql)+ ~ EP(K.(X ,···,X. 1,·),Q·(·))·
1
i=2
1 1
11
It is then clear that U(F(n),G(n)) is minimized if and only
if each term on the right hand side is minimized. Since
by taking sl=ol' we have P(K 1 ,Ql) = O. For i=2, ..• ,n, by
Lemma 5.].2 we have
P(K i (X 1 , .•• ,X i - 1 , ·),Qi(·))
M~1
and this is minimized when si=V i (which does not depend on
Xl' ••. ,X i _ )· The resulting minimum value is
1
and since
E M~ = E T!
1
we have
:t:- 1
1-1 %1-1
E IM·I
=
1
X ( i -1 ) , X ( i -1 )
i:- 1
~(T!1- 1 ~%1:-1 1T1. 1)!
T.
1-1 1-1
=
T'
i-1
:;[ -1 T
~i-1
i-1
and the desired expres-
sian of the minimum value of U follows.
96
For n=2 it is easily seen that the minimum value of U
.
given in Theorem 5.3.3 is .J27i1'IPI0 2 • Thus the least possible
upper bound is .J2TrfIP\min(ol' 02) and it is achieved when
~2- . (
sl= max(ol,02 ) and s2=vl-P
mln 01'02 ) •
We now show that the expression of the upper bound of
Theorem 5.3.2 in the general Gaussian case when n=2.
THEOREM 5.3.4
Gaussian vectors with distribution functions F(2) and G(2)
and variances of and sf, i=1,2, and correlation coefficients
p and r, respectively. Then
Proof:
The proof is lengthy but is a straight forward
e·
application of Lemma 5.3.2 and hence is omitted.
COROLLARY TO THEOREM 5.3.4
Proof:
Letting
If p=r in Theorem 5.3.4, then
p=r in Theorem 5.3.4, we have
By Theorem 5.3.2, it is easily seen that
Hence the Corollary.
This corollary gives an example when L=f
in Theorem
5.3.2. However this example is clearly of no interest since
the vectors (Xl' X2 ) and
« 0l/s l )Y1 , (02/s 2 )Y2)
have the same
~
99
distribution.
The following is a different approach in calculating an
upper bound to p(F(n),G(n»
when F(n) and G(n) are Gaussian.
Let X and Y be Gaussian vectors and assume their components are linearly independent. Let A,B be the triangular
matrices obtained from the Gram-Schmidt orthonormalization
procedure which are such that
and
have the identity as covariance matrix.
Consider now the joint distribution function of X and Y
which is such that U=V, i.e. taking any Gaussian vector U
with covariance matrix I and consider
X'= AU
y'= BU.
and
Then X' and Y' have the same distributions as X and Y respectively and thus
p(F(n),G(n»
n
n
< I; EIX!-y!!= I;
-i=l
l
l
i=l
=
EXAMPLE
Proof:
i
EI j=l
I; (a .. -b .. )U·I
lJ
lJ
J
n
i
2 J..
~ I; [ I; (a .. -b .. )
i=l j=l
lJ
lJ
J?=
W(F
(n)
,G
).
5.H
With the notation of Theorem 5.3.4, we have (by
performing the Gram-Schmidt orthonormalization)
o
)
°2~1-p2Thus
(n)
and
100
2
W(F (2) , G(2) )=.J2Tn { lOl- S l l+[( p02-rs2 )2+( 02.)1- p2- -s 2.)1-r -)2 J~J
=
by
U(F(2),G(2»
Theorem 5.3.4.
..
101
APPENDIX 1
Proof of (4.26)
SUbstituting (4.25) into (4.20) we have
e-tn-l
J[p(t)-S-2p ,,(t)+C
6(t-r1+a )+C
6(t-fl-b)+ 2: D. 6(t-d·)l·
2 s
V
-1 ,s
s,
S
j==,.e
J,S
J
s
• eS\x-tl dt ==,~ p(x)
x E Vs.
Suppose x E (dk,d k +l ], where fl-as ~ d k < fl+b s ' Then
k
2:
+
j=,.e
Now
fl +b
J
D. es(x-d.) +
J ,s
J
,.e+n-l
~
j==k+l
(A.l)
fl +b s
d e e-+n-2 d j +1
s(p(t)_s-2 p ,,(t))e s Ix-tl dt ==
J +
fl- a s
fl.-as
2:: J
j=-=,e d j
+
J
de +n-l
(p(t)_S-2p"(t))eslx-tl dt.
(p(t)_s-2 p ,,(t))e s (x-t)
Since for each sE(- oo,-2p(fl)),
(A.2)
is
a.e.[Leb.] on [fl-as,d£] equal to the derivative of
which by assumption
(1) is absolutely continuous, we have
dn
J
N
fl-a s
SX
s
-8
=-~--[e
s
d
£p~ (d£ )+se
-sd
£p(d,.e )-e
-s (fl-a )
-S
s p~(rl-as )-se
Similarly,
Jdj+l(p(t)_s-2p"(t))es(x-t)dt ==
d.
J
d
Sx
e [ e -st p , (t ) +s e -st p ( t
(p ( t ) - s - 2 p" ( t ) ) e s (x t
- ) d t == - -2-
)l. e
fl-a s
(lL
-a )
s P(fl-a
s
n.
102
sx
~ --;-[ e
s
-sd'+
-8d'+ 1
sd.
-sd.
l
J
p ~ ( d J' +1 ) +s e
,l p ( d ,]' +1 ) - e
,l p • (d . ) - s e
J p (d . ) + J
J
.e
=
2
1
s- p' (x)+ - p(x)
lsi
2
- '"c.- [lsi p(d k +1 )
.t
p
'(d
-
k +1
•
~ j
~ k-l
)] e S(dk+1-x)
•
k +1
J
ft +b
(p(t)_s-2 p l/(t))e s t--x
r
s
( )
:S- j
~e
+n - 2
dt
d.e+n-l
-
e -sx[_ s (/1 +b s )
sdp + -1
s (rt +b )
- -2-.e
p~(f1+bs)- se
s P(IJ.+b s )- e "n p~(de+n-l)
s
+
S
e
Sd,e +n-l
p ( d ,e +n -1 )] •
SUbstituting the above expressions in (A.2) and combining
similar terms, we have
]
-2asl p(11-a s )-p~(fl-as )Je
s
1
-7
.
s(x-fl+a )
s -
[IS\ P(fl+b S ) + P~(fl+bS)Je
s (1 1 +be. - X )
u
1 03
k
- 2:: s._
J -£
2
[p'(d.)-p+'(d.)]e
J
J .
s(x-d.) £+n-l 2
s(d.-x)
J - 2:: s- [p'(d.)-p+'(d.)]e
J •
J
J .
.
,]=k+l
Equating coefficients in (A.l) we obtain (4.26).
it +b
APPENDIX 2
Proof of J
sdGs(t) = 1
~-a
s
By (4.25) and (4.26), we have
~+bs
J
~-a
~+bs
dGs(t)=J (p(t)-s-2p "(t))dt + s-2 us1 p(~-as)-p~(rl-as)J
Il-a s
s
I
s
12DslP(~+b )+p'(~+b) ~
s
~-as
-
s
£+n-l
2::
s2 j=£
(p'(d.)-p+'(d.))
- J
J
b
£+n-l
2
- 1- J p(t)dt- J p(t)dt-s- [p' (~l+b )+ 2:: (p' (d. )-p~(d.))
s
a
~l +b
j =.Q
J
J
s
+ s-2
2,
+n-l
2:
j:::£
(p , (d. ) - P ~ ( d J. ) ) •
-
J
Using (4.21) and (4.22), all terms except the first one
cancel out and we have the desired result.
APPENDIX J
Calculation of AS(X) (i.e.(4.JO))
From (4.6), we have, for xE [a dl -as J
[A s (x)]
-1
=
J
~
+b
~l-a
(
)
s e S t-x dG (t)
s
s
and sUbstituting (4.25) and (4.26), we obtain
104
[AS (x
n
-1
de.e +n _2 d j ·t 1
[1 +bs
= J + L: J
+ J
(p (t )[l-a s j=.e d j
d.e+n-l
+
s ([I-as -x)
..;..e
_
s2
+
[lsi p([1-a s )-P+' ([1-a s )J- +
.e+n-l
s(d.-x)
E s-2(p~(d')-P~(d.»e
J
+
j=.e
J
J
The integrals in the above expression have been calculated in
Appendix 1 (except for minor adjustments in the exponents).
Thus, using results similar to those in Appendix 1, we have
ei.e.
'Ehe calculation of AS (x) for xE [_[1 +b s ' b J is similar and gives
lsi
e S ([1+b s )-SX
()
As x = 2p ([1 +b )
s
from the solution of (4.17).
APPENDIX 4
Proof of the expressions (4.4) and (4. 5)
The relation Ds = -
b p (x) oAs (x)
~As(x)'as
is proven as in
[Tan and Yao(1975),Appendix B] and thus its proof is omitted
e
105
here. The calculation of
o~s(x)/as
is given in the following:
xE(a,~-as)
From (4.30), we have for all
-s(~-a
s -x)
e
(A.3 )
-
~--::-T [
oa
as
_s -
S
2P\~-as J
Differentiating (4.21):
(II-a -x)] e
s
t-"
J
fl.-a
a
-s(~-a
.
s
-x)
sp(t)dt = - ~ p(fl.-as)
with respect to s, we have
s -2[ s
and thus
das
ds
=
1 . ap(~-as)
sp (fl.-as ) • as
s -2
(A.4)
SUbstituting (A.4) into (A.3), we obtain
v
The calculation for XE(fl.+bs,b) is similar. Thus
-(fl.-as-x)AS(X)
1
- lsi AS(X)
xE(fl.-as,wtbs)
-(x-fl.-bS)AS(X)
XE(ll+bs,b) .
From (A.5) and Ds =
-
b.2ixL d~ (x)
J ~. a~
a
s
fl. +b
(A. 5)
dx, we have
fl.-a
b
s
sl
Ds = J (fl.-as-x)p(x)dx+ J
I~ p(x)dx+ J (x-fl.-bs)p(x)dx. (A.6)
a
fl.-as
fl. +b s
From (4.4) and (A.5) we have

$$R(D_s)=sD_s+\ln\frac{|s|}{2}+\int_a^{\mu-a_s}s(x-\mu+a_s)\,p(x)\,dx-\ln\bigl(p(\mu-a_s)\bigr)\int_a^{\mu-a_s}p(x)\,dx$$
$$-\int_{\mu-a_s}^{\mu+b_s}p(x)\ln p(x)\,dx+\int_{\mu+b_s}^{b}s(\mu+b_s-x)\,p(x)\,dx-\ln\bigl(p(\mu+b_s)\bigr)\int_{\mu+b_s}^{b}p(x)\,dx.$$

Substituting (A.6) into the above expression, we obtain

$$R(D_s)=\ln\frac{|s|}{2}-\int_{\mu-a_s}^{\mu+b_s}p(x)\ln\bigl(e\,p(x)\bigr)\,dx-\ln\bigl(p(\mu-a_s)\bigr)\int_a^{\mu-a_s}p(x)\,dx-\ln\bigl(p(\mu+b_s)\bigr)\int_{\mu+b_s}^{b}p(x)\,dx.$$
Finally, since $D_{\max}=\inf_y\int_a^b|x-y|\,p(x)\,dx$, it is easy to verify that the infimum is attained at $y=\mu$, the median of $p(x)$. Thus

$$D_{\max}=\int_a^b|x-\mu|\,p(x)\,dx.$$
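To illustrate (A.6), here is a worked instance of ours, using the uniform density of Appendix 6: for $p(x)=(b-a)^{-1}$ one has $\mu-a_s=a+1/|s|$ and $\mu+b_s=b-1/|s|$, so

$$D_s=\frac{1}{2s^{2}(b-a)}+\frac{1}{|s|}\Bigl(1-\frac{2}{|s|(b-a)}\Bigr)+\frac{1}{2s^{2}(b-a)}=\frac{1}{|s|}-\frac{1}{s^{2}(b-a)},$$

while $D_{\max}=\int_a^b|x-\mu|(b-a)^{-1}\,dx=(b-a)/4$, attained at $|s|=2/(b-a)$.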
APPENDIX 5

Calculations in Example 4.2

Here we only sketch the calculations in Example 4.2 and omit the lengthy details. Since $p(x)$ is symmetric about $x=0$, we have $a_s=b_s$, and (4.11) gives the relationship between $|s|$ and $a_s$:

$$|s|=\frac{2}{a-a_s}\quad\text{for }a_s\in[c,a];\qquad|s|=\frac{2}{a+c-2a_s}\quad\text{for }a_s\in[0,c].\qquad(\mathrm{A.8})$$
We now distinguish two cases, (i) and (ii).

(i) If $|s|\ge2/(a-c)$, then using (A.7), (4.12) and (4.13) we obtain

$$R(D_s)=\ln\frac{|s|(a+c)}{2}+\frac{2}{s^{2}(a^{2}-c^{2})}-\frac{a+3c}{2(a+c)}.\qquad(\mathrm{A.9})$$

The parameter $s$ in (A.9) can be eliminated as follows. Substituting $x=2/(|s|(a+c))$, we obtain the following cubic equation:

$$x^{3}-\frac{3(a-c)}{a+c}\,x+\frac{6(a-c)D}{(a+c)^{2}}=0.$$
The solutions of this cubic equation consist of three unequal real roots:

$$x_m=2\sqrt{\frac{a-c}{a+c}}\,\cos\Bigl[\frac{2\pi(m-1)}{3}+\frac13\cos^{-1}\Bigl(-\frac{3D}{\sqrt{a^{2}-c^{2}}}\Bigr)\Bigr],\qquad m=1,2,3.$$

It can be shown that only $x_3$ satisfies the condition $|s|\ge2/(a-c)$. Therefore

$$R(D)=-\ln x_3+\frac{a+c}{2(a-c)}\,x_3^{2}-\frac{a+3c}{2(a+c)}$$
$$=2\cos^{2}\Bigl[\frac{4\pi}{3}+\frac13\cos^{-1}\Bigl(-\frac{3D}{\sqrt{a^{2}-c^{2}}}\Bigr)\Bigr]-\frac{a+3c}{2(a+c)}-\ln\Bigl(2\sqrt{\frac{a-c}{a+c}}\,\cos\Bigl[\frac{4\pi}{3}+\frac13\cos^{-1}\Bigl(-\frac{3D}{\sqrt{a^{2}-c^{2}}}\Bigr)\Bigr]\Bigr)$$

for $0<D\le\dfrac{(a-c)(a+2c)}{3(a+c)}$.

(ii) If $|s|<2/(a-c)$, then
using (A.8), (4.12) and (4.13), we obtain

$$R(D_s)=\ln\frac{|s|(a+c)}{2}+\frac{2}{|s|(a+c)}-1,\qquad D_s=\frac{1}{|s|}-\frac{1}{s^{2}(a+c)}+\frac{(a-c)^{2}}{12(a+c)}.\qquad(\mathrm{A.10})$$

Eliminating $s$ in (A.10), we have

$$R(D)=-\sqrt{1-\frac{4D}{a+c}+\frac{(a-c)^{2}}{3(a+c)^{2}}}-\ln\Bigl(1-\sqrt{1-\frac{4D}{a+c}+\frac{(a-c)^{2}}{3(a+c)^{2}}}\,\Bigr)$$

for $\dfrac{(a-c)(a+2c)}{3(a+c)}<D<D_{\max}$. Thus (4.37) is proven.
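As a check on (4.37) (ours, not in the text): the two branches agree at the changeover distortion $D_0=(a-c)(a+2c)/(3(a+c))$. There $|s|=2/(a-c)$, so in case (i) $x_3=2/(|s|(a+c))=(a-c)/(a+c)$, while in case (ii) the square root equals $2c/(a+c)$, and both expressions reduce to

$$R(D_0)=-\ln\frac{a-c}{a+c}-\frac{2c}{a+c}.$$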
APPENDIX 6

Calculations in Example 5.1.1

In Theorem 5.1.1, let $p(x)=(b-a)^{-1}$ for $x\in[a,b]$. Then from (4.10) and (4.11) we have

$$\mu-a_s=a+\frac{1}{|s|},\qquad\mu+b_s=b-\frac{1}{|s|}.$$

Now

$$-H_p(P_1)=-\int_a^bP_1(x)\ln\bigl(P_1(x)(b-a)\bigr)\,dx=-\int_a^bP_1(x)\ln P_1(x)\,dx-\ln(b-a).$$
Thus (5.1) becomes

$$R_L(D_s,s)=h(P_1)-\ln(b-a)+sD_s+\ln\frac{|s|(b-a)}{2}+\int_0^{1/|s|}\bigl(1-|s|t\bigr)\bigl[P_1(a+t)+P_1(b-t)\bigr]\,dt,$$

where

$$D_s=\frac{1}{|s|}-\int_0^{1/|s|}t\,\bigl[P_1(a+t)+P_1(b-t)\bigr]\,dt,\qquad(5.18)$$

so that

$$R_L(D_s,s)=h(P_1)+\ln\frac{|s|}{2}-1+\int_0^{1/|s|}\bigl[P_1(a+t)+P_1(b-t)\bigr]\,dt.\qquad(5.17)$$
This proves (5.17) and (5.18).
Now condition (5.5) is equivalent to

$$\int_a^bP_1(x)\,\frac{\partial}{\partial s}\Bigl[\frac{1}{A_s(x)}\,\frac{\partial A_s(x)}{\partial s}\Bigr]\,dx<0,$$

which is equivalent to (5.19) when $A_s(x)$ is replaced by (4.30) for $p(x)=(b-a)^{-1}$.
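A quick sanity check of (5.17) and (5.18), worked out by us rather than in the text: taking $P_1=p=(b-a)^{-1}$, so that $h(P_1)=\ln(b-a)$, the two integrals equal $2/(|s|(b-a))$ and $1/(s^{2}(b-a))$ respectively, and

$$R_L=\ln\frac{|s|(b-a)}{2}-1+\frac{2}{|s|(b-a)},\qquad D_s=\frac{1}{|s|}-\frac{1}{s^{2}(b-a)},$$

which is exactly the rate distortion function of the uniform source itself; the lower bound is therefore tight when $P_1=p$.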
APPENDIX 7

Calculations in Example 5.1.1 (continued)

$$K_2(x)=e^{x^{2}}\Bigl/\int_x^{1/2}e^{t^{2}}\,dt$$

is clearly increasing for $x\in[0,\tfrac12]$, since $e^{x^{2}}$ is increasing while $\int_x^{1/2}e^{t^{2}}\,dt$ is positive and decreasing for every $x\in[0,\tfrac12]$. By symmetry, $K_1(x)$ is decreasing for $x\in[-\tfrac12,0]$.

We now show that $p(x)-s^{-2}p''(x)>0$ for each $s$. Since $p(x)$ is proportional to $e^{x^{2}}$,

$$p(x)-s^{-2}p''(x)=\frac{p(x)}{s^{2}}\bigl[s^{2}-2-4x^{2}\bigr]>0\qquad(\mathrm{A.11})$$

if and only if $4x^{2}<s^{2}-2$ for all $|x|\le a_s$. A sufficient condition for (A.11) to hold is that $4a_s^{2}<s^{2}-2$. But the maximum of $a_s$ is $\tfrac12$; thus if $s^{2}>3$, (A.11) will hold. We show that $s^{2}>3$ as follows.

Since

$$|s|=e^{a_s^{2}}\Bigl/\int_{a_s}^{1/2}e^{x^{2}}\,dx$$

and $K_2(x)$ is monotonically increasing, we have, by letting $a_s=0$,

$$|s|\ge\Bigl[\int_0^{1/2}e^{x^{2}}\,dx\Bigr]^{-1}.$$

Since $e^{x^{2}}$ is convex,

$$2\bigl(e^{1/4}-1\bigr)x+1\ \ge\ e^{x^{2}}\qquad\text{for every }x\in[0,\tfrac12]$$

(the left side of the inequality is the line segment joining the points of $e^{x^{2}}$ at $x=0$ and $x=\tfrac12$). Therefore

$$\int_0^{1/2}e^{x^{2}}\,dx\ \le\ \frac12+\frac{e^{1/4}-1}{4}=\frac{1+e^{1/4}}{4},$$

i.e.

$$|s|\ \ge\ \frac{4}{1+e^{1/4}}\ >\ \sqrt3.$$

Hence $s^{2}>3$.
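Numerically (our check of the last step): $e^{1/4}\approx1.2840$, so

$$|s|\ \ge\ \frac{4}{1+e^{1/4}}\approx1.7514\ >\ \sqrt3\approx1.7321,$$

and hence $s^{2}\ge3.068>3$, with a little room to spare.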
APPENDIX 8

Calculations in Example 5.1.2

$$p(x)=\begin{cases}\tfrac14, & -1<x<0\\[4pt]\tfrac38, & 0<x<2.\end{cases}$$

Calculation shows that the median $\mu$ is $2/3$. Let $V_s=[\tfrac23-a_s,\ \tfrac23+b_s]$ be the support of the distribution $G_s(y)$.

(i) Suppose $\tfrac23-a_s>0$. This case is similar to the case of the uniform distribution, and one can apply Theorem 4.2 directly. The rate distortion function can be expressed parametrically as:
$$R(D_s)=\frac{3}{4|s|}-1+\ln\frac{4|s|}{3},\qquad D_s=\frac{1}{|s|}-\frac{3}{8s^{2}}+\frac{1}{24}.$$

Eliminating $|s|$, one obtains

$$R(D)=-\sqrt{1-\frac{24D-1}{16}}-\ln\Bigl(1-\sqrt{1-\frac{24D-1}{16}}\,\Bigr).$$
(ii) Suppose $-1\le\tfrac23-a_s<0$. Then $g_s(y)$ has the form (writing $\delta$ for the unit point mass)

$$g_s(y)=p(y)+c_{1,s}\,\delta\bigl(y-(\tfrac23-a_s)\bigr)+c_{2,s}\,\delta(y)+c_{3,s}\,\delta\bigl(y-(\tfrac23+b_s)\bigr),\qquad y\in V_s,$$

where

$$c_{1,s}=\frac{1}{4|s|},\qquad c_{2,s}=-\frac{1}{8|s|},\qquad c_{3,s}=\frac{3}{8|s|}.$$

Also

$$a_s=\frac53-\frac{1}{|s|},\qquad b_s=\frac43-\frac{1}{|s|}.$$
From these expressions, $A_s(x)$ can be calculated and has the following form:

$$A_s(x)=\begin{cases}2|s|\,e^{|s|\left(\frac23-a_s\right)-|s|x}, & -1<x\le\tfrac23-a_s\\[4pt]2|s|, & \tfrac23-a_s\le x<0\\[4pt]\tfrac43\,|s|, & 0<x\le\tfrac23+b_s\\[4pt]\tfrac43\,|s|\,e^{-|s|\left(\frac23+b_s\right)+|s|x}, & \tfrac23+b_s\le x<2.\end{cases}$$
It remains to show that $C_s(y)\le1$ for $y\notin V_s$. We will give the proof of this when $y\ge\tfrac23+b_s$; the case when $y\le\tfrac23-a_s$ is similar. For $\tfrac23+b_s\le y<2$,

$$C_s(y)=\int_{-1}^{2}A_s(x)\,p(x)\,e^{s|x-y|}\,dx,$$

and splitting the range of integration at $\tfrac23-a_s$, $0$, $\tfrac23+b_s$ and $y$, substituting the four pieces of $A_s(x)$, and carrying out the integrations, we obtain

$$C_s(y)=\frac14\,e^{-|s|\left(y-\frac23-b_s\right)}+\Bigl[\frac14+\frac{|s|}{2}\,(2-y)\Bigr]e^{|s|\left(y-\frac23-b_s\right)}.$$
Differentiating $C_s(y)$ with respect to $y$, we have

$$C_s'(y)=|s|\,e^{|s|y-|s|\left(\frac23+b_s\right)}\Bigl[-\frac14+|s|-\frac{|s|}{2}\,y-\frac14\,e^{-2|s|y+2|s|\left(\frac23+b_s\right)}\Bigr].$$

Let

$$f(y)=-\frac14+|s|-\frac{|s|}{2}\,y-\frac14\,e^{-2|s|y+2|s|\left(\frac23+b_s\right)}.$$

Then

$$f'(y)=-\frac{|s|}{2}+\frac{|s|}{2}\,e^{-2|s|y+2|s|\left(\frac23+b_s\right)},$$

and $f'(y_0)=0$ if and only if $y_0=\tfrac23+b_s$. Now $f''(y_0)=-s^{2}<0$, i.e. $f(y_0)$ is a maximum. But $f(y_0)=0$ (recall $b_s=\tfrac43-\tfrac1{|s|}$, so $y_0=2-\tfrac1{|s|}$), and thus $f(y)\le0$ and $C_s'(y)\le0$, which implies that $C_s(y)$ is non-increasing. Since $C_s(y_0)=1$, we conclude that $C_s(y)\le1$ for $y\ge\tfrac23+b_s$.
Since $g_s(y)$ is not a probability density function (the mass $c_{2,s}$ is negative), the $A_s(x)$ found above can be used only to provide a lower bound for the rate distortion function of $p(x)$. Thus, by Theorem 4.1,

$$R(D)\ \ge\ R_L(D,s)=sD+\int_{-1}^{2}p(x)\ln A_s(x)\,dx=-|s|\,D+\ln|s|+\frac14\ln2+\frac34\ln\frac43+\frac{5}{16|s|}.$$

Differentiating $R_L(D,s)$ with respect to $|s|$ and setting $R_L'(D,s)=0$, we obtain

$$D=\frac{1}{|s|}-\frac{5}{16s^{2}}.$$

Also

$$R_L''(D,s)=\frac{5}{8|s|^{3}}-\frac{1}{s^{2}}<0\qquad\text{for }|s|>1.$$
Eliminating $|s|$ we obtain

$$R_L(D)=-\frac12\sqrt{4-5D}-\ln\bigl(2-\sqrt{4-5D}\,\bigr)+\ln5-\frac34\ln3-\frac14\ln2,\qquad0<D<\frac{19}{36}.$$
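As a sanity check of case (i) above (ours): setting $|s|=3/4$ in the parametric equations gives $D_s=\tfrac43-\tfrac23+\tfrac1{24}=\tfrac{17}{24}$ and $R(D_s)=1-1+\ln1=0$; and indeed

$$\int_{-1}^{2}\bigl|x-\tfrac23\bigr|\,p(x)\,dx=\frac{17}{24}=D_{\max},$$

so the curve reaches $R=0$ exactly at maximum distortion. Equivalently, $24D-1=16$ makes the square root in the closed-form expression vanish.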
APPENDIX 9
Calculations in Example 5.2.1

Let

$$F(x)=\int_{-1}^{x}p(t)\,dt\qquad\text{and}\qquad F_E(x)=\int_{-1}^{x}p_E(t)\,dt.$$

It is clear that $F(x)=F_E(x)$ for $-1<x<-E$ and $E<x<2$, and $F(x)\ne F_E(x)$ for $|x|\le E$. Thus

$$\rho(F,F_E)=\int_{-E}^{0}\bigl(F_E(x)-F(x)\bigr)\,dx+\int_{0}^{E}\bigl(F_E(x)-F(x)\bigr)\,dx.$$
Now, for $-E\le x\le0$,

$$F_E(x)=\frac{1-E}{4}+\int_{-E}^{x}\Bigl(\frac{t^{2}}{16E^{2}}+\frac{t}{8E}+\frac{5}{16}\Bigr)\,dt=\frac14+\frac{E}{48}+\frac{5x}{16}+\frac{x^{2}}{16E}+\frac{x^{3}}{48E^{2}},$$

and, for $0\le x\le E$,

$$F_E(x)=F_E(0)+\int_{0}^{x}\Bigl(-\frac{t^{2}}{16E^{2}}+\frac{t}{8E}+\frac{5}{16}\Bigr)\,dt=\frac14+\frac{E}{48}+\frac{5x}{16}+\frac{x^{2}}{16E}-\frac{x^{3}}{48E^{2}}.$$

Also

$$F(x)=\frac{1+x}{4}\quad\text{for }-E\le x<0,\qquad F(x)=\frac14+\frac{3x}{8}\quad\text{for }0\le x\le E.$$
Therefore

$$\rho(F,F_E)=\int_{-E}^{0}\Bigl(\frac{E}{48}+\frac{x}{16}+\frac{x^{2}}{16E}+\frac{x^{3}}{48E^{2}}\Bigr)\,dx+\int_{0}^{E}\Bigl(\frac{E}{48}-\frac{x}{16}+\frac{x^{2}}{16E}-\frac{x^{3}}{48E^{2}}\Bigr)\,dx$$
$$=\int_{0}^{E}\Bigl(\frac{E}{24}-\frac{x}{8}+\frac{x^{2}}{8E}-\frac{x^{3}}{24E^{2}}\Bigr)\,dx=\frac{E^{2}}{96}.$$
Next we find a certain range of $E$ so that the rate distortion function can be calculated by Theorem 4.2. Condition (2) of Theorem 4.2 is clearly satisfied; thus we require $p_E(x)-s^{-2}p_E''(x)>0$.

For $|x|\ge E$ the above condition is clearly satisfied, since there $p_E''(x)=0$ and $p_E(x)>0$. Now for $0<x<E$,

$$p_E(x)-s^{-2}p_E''(x)=p_E(x)+\frac{1}{8E^{2}s^{2}}>0.$$

For $-E<x\le0$,

$$p_E(x)-s^{-2}p_E''(x)=\frac{x^{2}}{16E^{2}}+\frac{x}{8E}+\frac{5}{16}-\frac{1}{8E^{2}s^{2}}.\qquad(\mathrm{A.12})$$

By considering the derivative with respect to $x$, we see that (A.12) is an increasing function of $x$ for fixed $E>0$ and $|s|>0$. Thus a sufficient condition for (A.12) to be nonnegative is

$$p_E(-E)-s^{-2}p_E''(-E)=\frac14-\frac{1}{8E^{2}s^{2}}\ \ge\ 0,\qquad\text{i.e.}\qquad E^{2}\ \ge\ \frac{1}{2s^{2}}.$$

Now $|s|$ is a decreasing function of $\mu-a_s$ by (4.10), so that $(2s^{2})^{-1}\le(2s_0^{2})^{-1}$, where $|s_0|$ is the value of $|s|$ when $\mu-a_s=0$, i.e. $|s_0|=15/(12+E)$. Thus we can choose $E>0$ such that

$$E^{2}\ \ge\ \frac{1}{2s_0^{2}}=\frac{(12+E)^{2}}{2\cdot15^{2}},$$

i.e.

$$E\ \ge\ \frac{12}{15\sqrt2-1}\approx0.59367.$$
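For instance (our illustration), $E=0.6$ satisfies this requirement, and then the Vasershtein distance computed above is

$$\rho(F,F_E)=\frac{(0.6)^{2}}{96}=0.00375,$$

so the smoothed density $p_E$ is indeed a small perturbation of $p$.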
APPENDIX 10

Show: If $\|I-A\|\le\dfrac{\epsilon}{2+\epsilon}$, $\epsilon>0$, then $1\le\|A\|\,\|A^{-1}\|\le1+\epsilon$.

Proof: Let $B=I-A$. Then $\|B\|=\|I-A\|\le\epsilon/(2+\epsilon)<1$ (since $\epsilon>0$). It follows from $\|B\|\ge1-\|A\|$ that

$$\|A\|\le1+\|B\|.\qquad(\mathrm{A.13})$$

Now since $\|B\|<1$, we have

$$A^{-1}=(I-B)^{-1}=I+B+B^{2}+\cdots.\qquad(\mathrm{A.14})$$

Hence $\|A^{-1}-I\|\le\|B\|+\|B\|^{2}+\cdots=\|B\|/(1-\|B\|)$, and as in (A.13) we have

$$\|A^{-1}\|\le1+\|A^{-1}-I\|\le1+\frac{\|B\|}{1-\|B\|}=\frac{1}{1-\|B\|}.\qquad(\mathrm{A.15})$$

Multiplying the corresponding sides of (A.13) and (A.15), we have

$$\|A\|\,\|A^{-1}\|\le\frac{1+\|B\|}{1-\|B\|}.\qquad(\mathrm{A.16})$$

The right side of (A.16) is easily seen to be an increasing function of $\|B\|$. Thus, by assumption, we have

$$\|A\|\,\|A^{-1}\|\le\frac{1+\epsilon/(2+\epsilon)}{1-\epsilon/(2+\epsilon)}=1+\epsilon.\qquad(\mathrm{A.17})$$

On the other hand we have

$$1=\|I\|=\|AA^{-1}\|\le\|A\|\,\|A^{-1}\|.\qquad(\mathrm{A.18})$$

Combining (A.17) and (A.18), we obtain the desired result.
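As a numerical instance (ours): for $\epsilon=1$ the hypothesis reads $\|I-A\|\le\tfrac13$, and (A.16) gives

$$\|A\|\,\|A^{-1}\|\ \le\ \frac{1+\tfrac13}{1-\tfrac13}=2=1+\epsilon,$$

so a matrix within $1/3$ of the identity in operator norm has condition number at most $2$.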
[Figure 5.1: R(D) in nats versus D, for α = 0.1; D_max ≈ 0.248.]

[Figure 5.2: R(D) in nats versus D, for α = 2; D_max ≈ 0.209.]

[Figure 5.3: R(D) in nats versus D, for α = 5.]

[Figure 5.4: R(D) in nats versus D, for α = 10.]
BIBLIOGRAPHY

Bellman, R. (1960) Introduction to Matrix Analysis, McGraw-Hill Inc., New York.

Berger, T. (1970) Information Rates of Wiener Processes, IEEE Transactions on Information Theory, IT-16: pp. 134-139.

Berger, T. (1971) Rate Distortion Theory, Prentice-Hall Inc., Englewood Cliffs, New Jersey.

Binia, J.; Zakai, M. and Ziv, J. (1974) On the ε-Entropy and the Rate-Distortion Function of Certain Non-Gaussian Processes, IEEE Transactions on Information Theory, IT-20: pp. 517-524.

Cambanis, S. (1975) Private Communication.

Dobrushin, R.L. (1970) Prescribing a System of Random Variables by Conditional Distributions, Theory of Probability and Its Applications, 15: pp. 458-486.

Fan, K. (1949) On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations, I, Proceedings of the National Academy of Sciences of the United States of America, 35: pp. 652-655.

Gallager, R.G. (1968) Information Theory and Reliable Communication, Wiley, New York.

Gerrish, A.M. and Schultheiss, P.M. (1964) Information Rates of Non-Gaussian Processes, IEEE Transactions on Information Theory, IT-10: pp. 265-271.

Gray, R.M. and Davisson, L.D. (1974) Source Coding Theorems Without the Ergodic Assumption, IEEE Transactions on Information Theory, IT-20: pp. 502-516.

Gray, R.M.; Neuhoff, D.L. and Shields, P.C. (1975) A Generalization of Ornstein's d̄ Distance with Applications to Information Theory, Annals of Probability, 3: pp. 315-328.

Grenander, U. and Szegö, G. (1958) Toeplitz Forms and Their Applications, University of California Press, Berkeley and Los Angeles.

Hardy, G.H.; Littlewood, J.E. and Pólya, G. (1967) Inequalities, Cambridge University Press.

Miller, K.S. (1964) Multidimensional Gaussian Distributions, John Wiley and Sons, Inc., New York.

Ornstein, D.S. (1973) An Application of Ergodic Theory to Probability Theory, Annals of Probability, 1: pp. 43-58.

Pinsker, M.S. (1964) Information and Information Stability of Random Variables and Processes, Holden-Day, Inc., San Francisco.

Rubin, I. (1974) Information Rates and Data Compression Schemes for Poisson Processes, IEEE Transactions on Information Theory, IT-20: pp. 200-210.

Shannon, C.E. (1959) Coding Theorems for a Discrete Source with a Fidelity Criterion, IRE National Convention Record, Part 4: pp. 142-163.

Tan, H.H. and Yao, K. (1975) Evaluation of Rate-Distortion Functions for a Class of Independent Identically Distributed Sources Under an Absolute-Magnitude Criterion, IEEE Transactions on Information Theory, IT-21: pp. 59-64.

Vallender, S.S. (1973) Computing the Wasserstein Distance between Probability Distributions on the Line, Theory of Probability and Its Applications, 18: pp. 824-827.

Vershik, A.M. (1964) Some Characteristic Properties of Gaussian Stochastic Processes, Theory of Probability and Its Applications, 9: pp. 353-356.