Information Rates for a Discrete-Time
Gaussian Channel with Intersymbol
Interference and Stationary Inputs
Shlomo Shamai (Shitz), Senior Member, IEEE, Lawrence H. Ozarow, Member, IEEE,
and Aaron D. Wyner, Fellow, IEEE
Abstract - Bounds are presented on I_{i.i.d.}, the achievable information rate for a discrete Gaussian channel with intersymbol interference (ISI) present and i.i.d. channel input symbols governed by an arbitrary predetermined distribution p_x(x). Upper bounds on I, the achievable information rate with the symbol independence demand relaxed, are given as well. The bounds are formulated in terms of the average mutual information of a memoryless Gaussian channel with scaled i.i.d. input symbols governed by the same symbol distribution p_x(x), where the scaling value is interpreted as an enhancement (upper bounds) or degradation (lower bounds) factor. The bounds apply for channel symbols with an arbitrary symbol distribution p_x(x), discrete as well as continuous, and thus facilitate bounding the capacity of the ISI (dispersive) Gaussian channel under a variety of constraints imposed on the identically distributed channel symbols.

The use of the bounds is demonstrated for binary (two-level) i.i.d. symmetric symbols and a channel with causal ISI. In particular, a channel with two and three ISI coefficients, that is, ISI memory of degree one and two, respectively, is examined. The bounds on I_{i.i.d.} are compared to the approximated (by Monte Carlo methods) known value of I_{i.i.d.} and their tightness is considered. An application of the new lower bound on I_{i.i.d.} yields an improvement on previously reported lower bounds for the capacity of the continuous-time strictly bandlimited (or bandpass) Gaussian channel with either peak power or simultaneously peak power and bandlimiting constraints imposed on the channel's input waveform.

Index Terms - ISI, additive Gaussian channel, capacity, average mutual information.
Manuscript received July 19, 1990; revised February 18, 1991. This work was done at AT&T Bell Laboratories, Murray Hill, NJ.
S. Shamai (Shitz) is with the Electrical Engineering Department, Technion-Israel Institute of Technology, Haifa 32000, Israel.
L. H. Ozarow is with the General Electric Corporate Research and Development Center, Room KWC 611, P.O. Box 8, Schenectady, NY 12301.
A. D. Wyner is with AT&T Bell Laboratories, Room 2C-365, 600 Mountain Ave., Murray Hill, NJ 07974.
IEEE Log Number 9101891.

I. INTRODUCTION

CONSIDER the discrete-time Gaussian channel (DTGC) with intersymbol interference (ISI) described by

    y_k = Σ_l h_l x_{k-l} + n_k,    (1)

where {x_k} are stationary identically distributed real-valued channel input symbols, {y_k} are the corresponding channel output observables, {h_k} are real ISI coefficients¹, and {n_k} are independent identically distributed (i.i.d.) zero-mean Gaussian noise samples with variance E(n_k²) = σ².
A convenient way to describe the channel (1) using matrix notation is

    y^N = H^N x^N + n^N,    (2)

and it resides on the notion of the N-block DTGC [1], [2]. Here, y^N = (y_0, y_1, ..., y_{N-1})^T, x^N = (x_0, x_1, ..., x_{N-1})^T, and n^N = (n_0, n_1, ..., n_{N-1})^T are column vectors with N components standing, respectively, for the output samples, channel symbols and noise samples, and superscript T denotes the transpose operation. The equivalence between (2) and (1) is evident for N → ∞ [1] and in this case, which is of interest here, "end effects" are suppressed [1] and the rows of H = H^N are specified by circular shifts of the ISI coefficients {h_i}. We assume throughout finite energy ‖h‖² < ∞, where h stands for the ISI vector (h_0, h_1, h_2, ...) and ‖·‖ denotes the l_2 norm. Note that, as far as information is concerned, the model in (2) can also be used when the stationary noise samples are correlated with an invertible correlation matrix Γ^N = E[n^N(n^N)^T]. The conclusion follows by employing an information-lossless linear orthogonalizing transformation on the channel output vector y^N.
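For readers who wish to experiment with the model, the following is a minimal numerical sketch of the channel (1); the function and variable names are ours, not from the paper, and the matrix form (2) simply corresponds to stacking N such outputs.

```python
import numpy as np

def isi_channel(x, h, sigma, rng=None):
    """Pass symbols x through the discrete-time Gaussian ISI channel (1):
    y_k = sum_l h_l * x_{k-l} + n_k, with i.i.d. N(0, sigma^2) noise samples."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.convolve(x, h)[: len(x)]            # causal ISI part of (1)
    return y + rng.normal(0.0, sigma, len(x))  # add the white Gaussian noise

# Example: i.i.d. binary symbols over the duobinary channel h = (1, 1)/sqrt(2).
rng = np.random.default_rng(0)
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
x = rng.choice([-1.0, 1.0], size=10)
print(isi_channel(x, h, sigma=0.5, rng=rng))
```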
This classic model has been extensively used in the information-theoretic and communications literature [1], [3]-[6] (and references therein), and thoroughly examined from a variety of aspects. The DTGC is well adapted to describe pulse amplitude modulated (PAM) signaling through a dispersive Gaussian channel, encountered often in telephone lines [5] and magnetic recording [6], when optimal matched filters [4], sample-whitened matched filters [7] or mean-square whitened matched filters [8], as well as linear suboptimal prefilters [9], are used by the detector.

¹For the sake of brevity we refer to h_0 as the null ISI coefficient. If there are only M nonzero ISI coefficients, say h_0, h_1, ..., h_{M-1}, we refer to the channel as having ISI memory of order M - 1.
The average mutual information (in nats unless otherwise stated) per channel use,

    I = lim_{N→∞} (1/N) I(y^N; x^N),    (3)

evaluated for a given input distribution p_{x^N}(a^N), is of interest since it determines the achievable rate of reliable communication through this channel with this specific input distribution [10]-[12].² We are also interested in I_{i.i.d.}, the resultant average mutual information where i.i.d. input symbols are used, that is, p_{x^N}(a^N) = Π_l p_x(a_l). Capacity,

    C = lim_{N→∞} (1/N) sup_{p_{x^N}(a^N)} I(y^N; x^N),    (4)

where the supremum is taken over all admissible distributions p_{x^N}(a^N) satisfying certain constraints, is well known only in the case of block average power [1], [2], [10]-[12] or symbol average power [1] constraints imposed on the channel symbols {x_i}. In this case the capacity is achieved by letting {x_i} be dependent Gaussian random variables [1], [10]. The average mutual information I_{i.i.d.} for i.i.d. Gaussian symbols {x_i} is also well known [1] and of interest in a variety of cases [1], [13].

In several problems of primary theoretical importance, the constraints imposed on x_i preclude the use of Gaussian channel symbols. Peak limited [9], or both simultaneously peak and band limited channels [14], [15], are such specific examples. Furthermore, in most practical cases where coding over the ISI channel is employed [16]-[20] the channel symbols are discrete, taking on values from a finite alphabet. It is therefore important to estimate either the capacity C or the average mutual information I and I_{i.i.d.} for non-Gaussian symbols as well.

The main lower bounding technique on C, I, or I_{i.i.d.} dates back to Shannon [14] and was extensively applied in numerous interesting channels with various constraints imposed on the channel input waveform [9], [15], [21]. This technique relies heavily on the convolutional inequality of entropy powers [22] and the asymptotic properties of log determinants of Toeplitz matrices [23]. The use of the convolutional entropy power inequality precludes the application of these techniques to discretely distributed channel symbols {x_i}. Other lower bounds on I_{i.i.d.}, based on the cut-off rate R_0 [24] for these channels, are also adapted to continuous channel symbol alphabets. Binary i.i.d. symbols are considered in [25]-[27], and even for this special case no general analytical methods for computing I_{i.i.d.} are known; the difficulties in undertaking this task are pointed out in [25], where Monte Carlo techniques were applied to approximate I_{i.i.d.} for certain channels with relatively few nonzero ISI coefficients. The cut-off rate, however, for binary i.i.d. channel symbols is determined in terms of the maximum eigenvalue of an ISI related matrix [4], [25], the evaluation of which is formidable for channels with large memory. Closed-form bounds on this cut-off rate, which can evidently be employed as lower bounds on the information rate I_{i.i.d.}, were recently reported in [27]. Unfortunately, the results of [27] do not extend to other discrete alphabets and the bounds are not always tight.

Orthogonalizing transformations [10], [20], [28] are applicable only in cases where the constraints imposed on the channel inputs {x_l} can be translated into another set of constraints imposed on {x̃_l}, where x^N = 𝒮x̃^N and where 𝒮 is an N × N shaping matrix which orthogonalizes the channel. This is easily done for block average power constraints [1] but is a subtle problem for other sets of constraints (i.e., peak limit, equispaced discrete symbols). The Tomlinson precoding approach [17], [19], [29] introduces similar obstacles since it is in general unknown how to characterize the inputs of the Tomlinson precoder such that the outputs {x_l} (which form the channel inputs) will satisfy a given set of constraints.

Upper bounds on either C, I, or I_{i.i.d.} are found by replacing the actual channel symbols with Gaussian symbols having the same second-order moments and using "water-pouring" arguments whenever needed [8], [10]-[12]. In certain cases where continuous-time constraints of the peak power [15] or constant magnitude (envelope) [30]³ type are imposed, improvement on the Gaussian-based upper bounds was achieved. Unfortunately, it seems that no other general bounding techniques, applicable for arbitrary symbol distributions, either continuous or discrete, are available.

The information rate I_{i.i.d.}, which corresponds to i.i.d. channel symbols {x_l} with a given arbitrary symbol distribution p_x(x), evidently forms a lower bound on C in the case when i.i.d. symbols, the distribution of which is governed by p_x(a), are permissible but are not necessarily the optimal, capacity achieving selection. Nevertheless, the information rate I_{i.i.d.} deserves also attention for its own sake [1], [25], mainly since most practical coding schemes [31] approximate closely the statistics of random codes, that is, i.i.d. inputs {x_l} which are not necessarily uniformly distributed [4], [10], [31]. In a variety of interesting cases, however, specializing to i.i.d. input symbols does limit generality; therefore the information rate I with the independence restriction relaxed is also addressed. Upper bounds on I, maximized over all individual symbol distributions p_x(a) under the relevant constraints, yield corresponding upper bounds on the capacity C.

In this paper, we derive lower and upper bounds on I_{i.i.d.} and upper bounds on I formulated in terms of the average mutual information for ISI-less (memoryless) scalar channels with i.i.d. inputs governed by the same symbol distribution p_x(a). These are easily evaluated

²The proof of the direct part of the coding theorem requires in certain cases more stringent assumptions [11].
³Also see S. Shamai (Shitz) and I. Bar-David, "On the capacity penalty due to input-bandwidth restrictions with application to rate-limited binary signalling," IEEE Trans. Inform. Theory, vol. 36, pp. 623-627, May 1990.
either numerically [32] or bounded once again to give closed-form expressions using techniques which are mainly applicable for scalar memoryless channels (see [33] for an example). The simple upper bounds based on Gaussian input symbols are also mentioned.

Though we have specialized here to the real-valued ISI channel, the main results reported carry over with mainly notational changes⁴ to the complex ISI channel for which {x_i}, {n_i}, and {h_i} are complex valued. The complex representation is adapted to describe passband systems with quadrature amplitude modulation (QAM).

The lower and upper bounds on I_{i.i.d.} and I are formulated in the next section. In Section III, the bounds on I_{i.i.d.} are calculated for independent equiprobable binary channel symbols and for causal channels with ISI memory of degree one and two (h_i ≠ 0 for i = 0, 1 and i = 0, 1, 2, respectively). The bounds are compared with the approximated value of I_{i.i.d.} calculated in [25] using Monte Carlo techniques, and their tightness is addressed. The lower bound on I_{i.i.d.} is also applied to the continuous-time strictly bandlimited and bandpass channels with inputs constrained to be peak power limited (PPL) [9] or simultaneously band and peak power limited (BPPL) [15]. Improved lower bounds, especially at low regions of the signal-to-noise ratio, over those previously reported [9], [14], [15], are found by incorporating the optimized discrete symbol distribution in the lower bound derived here. The paper concludes with a discussion. Appendix A includes the proofs, and in Appendix B some upper bounds on I_{i.i.d.} are presented in addition to those appearing in Section II.
II. BOUNDS ON THE INFORMATION RATES

In this section, we present lower and upper bounds on I_{i.i.d.}, that is, I of (3) evaluated for i.i.d. channel input symbols, p_{x^N}(a^N) = Π_{l=1}^{N} p_x(a_l), where p_x(a) is an arbitrary, either discrete or continuous, known probability function. An upper bound on I (3) for identically distributed symbols (any individual symbol is governed by the probability function p_x(a)), not necessarily independent, is also derived. The bounds are formulated in terms of average mutual information values for scalar memoryless Gaussian channels with i.i.d. inputs.
A. Lower Bounds

The following theorem, proven in Appendix A, establishes a lower bound on I_{i.i.d.} (3).

Theorem 1: A lower bound on I_{i.i.d.}, denoted by I_L, is specified by

    I_{i.i.d.} ≥ I_L,    (5)

where

    I_L = I(ρx + v; x),    (6)

with x being a random variable with the probability function p_x(a) and v being a zero-mean Gaussian random variable with the same distribution as that of n_k in (1) (variance σ²). The degradation factor ρ equals

    ρ = exp{ (1/2π) ∫₀^π ln|H(λ)|² dλ },    (7)

where

    H(λ) = Σ_{l=-∞}^{∞} h_l exp(-ilλ)    (8)

is the ISI "transfer" function, having a 2π period.

The lower bound I_L is given in terms of the average mutual information of a scalar memoryless channel with input x having the same probability function as the original x_i and output ρx + v, where v is a Gaussian random variable with the same distribution as that of n_k in (1) (variance σ²). The factor ρ² is, therefore, interpreted as a power degradation factor that arises due to the memory introduced by the ISI coefficients {h_i}.

It is realized, using classical results of spectral factorization theory [3, Section 8.6], [20], that ρ equals exactly the leading (zero) coefficient of the discrete-time ISI channel at the output of the feed-forward filter that yields an equivalent representation of the channel in (1) having only causal ISI coefficients⁵. If this is already the case, that is, h_l = 0 for l < 0, and the discrete ISI channel is minimum phase, i.e., the channel in (1) can be interpreted as modeling the output of a sample-whitened matched filter [7], then ρ = |h_0|. The lower bound (6) is interpreted, therefore, as the average mutual information that corresponds to the ideal decision feedback equalizer (DFE) [3] with errorless past decisions, which are used to fully neutralize the causal ISI effect [19], [20], [29], [34]. Note, however, that no assumptions of errorless past decisions were incorporated in the derivation of the lower bound I_L (see Appendix A).

For no ISI, that is, only h_0 ≠ 0, ρ = |h_0| as it should; in this case the bound is exact, I_L = I_{i.i.d.}. Note that no restriction whatsoever was imposed on p_x(a), making the results applicable to a wide class of problems, as is further discussed and demonstrated in the next section.

Tightening the bound I_L, and a comparison with the "interleaved" straightforward lower bound, are discussed in Appendix A.
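Assuming the form of (7) reconstructed above, the degradation factor is straightforward to evaluate numerically; the following sketch (the helper names are ours) checks, for the minimum-phase duobinary taps used later in Section III-A, that ρ reduces to |h_0|.

```python
import numpy as np

def degradation_factor(h, num=200_000):
    """rho of (7): exp{(1/(2*pi)) * int_0^pi ln|H(lam)|^2 dlam}, H as in (8)."""
    lam = (np.arange(num) + 0.5) * np.pi / num   # midpoints avoid |H| = 0 endpoints
    l = np.arange(len(h))
    H = np.exp(-1j * np.outer(lam, l)) @ np.asarray(h, dtype=float)
    integral = np.sum(np.log(np.abs(H) ** 2)) * (np.pi / num)
    return float(np.exp(integral / (2.0 * np.pi)))

h = np.array([1.0, 1.0]) / np.sqrt(2.0)          # minimum-phase duobinary taps
print(degradation_factor(h))                     # ~ 0.7071 = |h0| = 1/sqrt(2)
```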
B. Upper Bounds - i.i.d. Symbols

Several upper bounds on I_{i.i.d.}, where the input symbols are governed by the probability function p_x(a), are summarized in Theorem 2 and Lemma 1, which are proved in Appendix A. Additional upper bounds for this case appear in Appendix B.

Theorem 2 - Matched Filter Bound: For p_{x^N}(a^N) = Π_{l=1}^{N} p_x(a_l),

    I_{i.i.d.} ≤ I_UM = I(‖h‖x + v; x),    (9)

where x is a random variable with the probability function p_x(a), v is a zero-mean Gaussian variable with the same distribution as that of n_k in (1) (variance σ²), and the power enhancement factor is the squared norm

    ‖h‖² = Σ_l h_l².    (10)

The notion "matched filter bound" stems from the fact, evidenced in Appendix A, that I_UM corresponds to a single-shot transmission, meaning that only one symbol is transmitted. For uncoded communication this assumption leads to the matched filter lower bound on error probability [3]. Again, the upper bound I_UM (9) is formulated in terms of the mutual information of a memoryless channel with i.i.d. inputs, where ‖h‖² (10) takes on the interpretation of a power enhancement factor as opposed to the power degradation factor in (7), ρ² ≤ ‖h‖², appearing in the lower bound I_L.

The Gaussian upper bound I_UG to follow results immediately by invoking standard arguments (see Appendix A) and is stated in the following lemma.

Lemma 1 - Gaussian Bound:

    I_{i.i.d.} ≤ I_UG = (1/2π) ∫₀^π ln(1 + (P_A/σ²)|H(λ)|²) dλ,    (11)

where P_A = E|x|². This upper bound on I_{i.i.d.} equals the mutual information I_UG = lim_{N→∞} (1/N) I(y^N; x^N) in the case where {x_i} are assumed to be i.i.d. Gaussian random variables with variance (average power) P_A. For symmetric binary symbols this kind of bound was mentioned and used in [25].

Two additional upper bounds, I_U1(i.i.d.) and I_U2(i.i.d.), are stated respectively in Lemmas B1 and B2 in Appendix B.

C. Upper Bounds - Identically Distributed Symbols

We relax now the independence demand and assume that the symbols {x_i}, which are not necessarily i.i.d., are identically distributed, where each symbol is governed by the probability function p_x(a). The upper bounds stated in Theorem 3 and Lemma 2 are proved in Appendix A.

Theorem 3 - Maximal Gain Bound:

    I = lim_{N→∞} (1/N) I(y^N; x^N) ≤ I_Uξ = I(ξx + v; x),    (12)

where x is a random variable governed by the probability function p_x(a), v is a zero-mean Gaussian variable with the same distribution as that of n_k in (1) (variance σ²), and the power enhancement factor is

    ξ = max_{0≤λ≤π} |H(λ)|,    (13)

where H(λ) is given by (8). The enhancement factor ξ is interpreted as the maximal gain of the ISI "transfer function" H(λ). Note that ‖h‖ ≤ ξ ≤ Σ_l |h_l|.

The Gaussian-based upper bound I_UG', stated next, is specified by the average mutual information over this channel, taking {x_l} to be Gaussian symbols with the same correlation as that corresponding to the actual symbols.

Lemma 2 - Gaussian Information Bound:

    I = lim_{N→∞} (1/N) I(y^N; x^N) ≤ I_UG' = (1/2π) ∫₀^π ln(1 + S_x(λ)|H(λ)|²/σ²) dλ,    (14)

where H(λ) is given by (8) and where

    S_x(λ) = Σ_{l=-∞}^{∞} r_x(l) e^{-ilλ}    (15)

stands for the discrete power spectral density of the sequence {x_l}, for which r_x(l) = E(x_{k+l}x_k) denotes the correlation coefficients. For i.i.d. symbols the bound I_UG' (14) reduces to I_UG (11). Clearly,

    I_UG' ≤ C_w = (1/2π) ∫₀^π ln[max(Θ|H(λ)|², 1)] dλ,    (16)

with Θ being the solution of

    ∫₀^π max(Θ - |H(λ)|⁻², 0) dλ = π P_A/σ².    (17)

The value C_w is interpreted as the capacity under average power constraints [1] that results by maximizing (14) over all S_x(λ) that satisfy a symbol average power constraint, that is, r_x(0) = E(x²) = (1/π)∫₀^π S_x(λ) dλ = P_A.

First, note that for no ISI, ‖h‖ = |h_0|, and i.i.d. symbols, we have that I = I_L = I_UM, while I ≤ I_UG = I_UG' = C_w. For Gaussian symbols {x_l} with correlation coefficients r_x(l), and ISI present, we have I = I_UG'.

Assume now that {x_l} are i.i.d. and each symbol x is a discrete symmetrically distributed random variable that takes on N possible values and satisfies E(|x|²) = P_A. It is clear that for P_A/σ² → ∞ both I_{i.i.d.} and I_UM approach the entropy of the discrete random variable x, 𝓗(x), where 𝓗 stands for the standard entropy function [10], resembling thus the correct behaviour of I_{i.i.d.}, while I_UG → ∞. For low SNR (P_A/σ² → 0), it can be shown, in a similar way to that used in [25] for binary symbols, that I_{i.i.d.}, I_UM → (1/2)‖h‖²P_A/σ²; see also [35] for similar arguments.

⁴Conjugate transpose and complex norms are introduced wherever needed.
⁵It is assumed that the Z-transform of the causal ISI coefficients at the output of the feed-forward filter has no zeros at the origin. In this case ρ is interpreted also as the exponent of the null coefficient of the complex cepstrum associated with these causal ISI coefficients; see A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975, ch. 10.
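Both the Gaussian bound (11) and the water-pouring rate (16)-(17) are one-dimensional integrals and are easy to evaluate numerically. The sketch below uses our own helper names and a simple bisection on Θ to satisfy (17); it is only an illustration under the reconstructed forms of (11), (16), and (17) above, evaluated for the duobinary channel of Section III-A.

```python
import numpy as np

def H_abs2(h, lam):
    """|H(lambda)|^2 for the ISI 'transfer' function (8), causal taps h[0], h[1], ..."""
    l = np.arange(len(h))
    return np.abs(np.exp(-1j * np.outer(lam, l)) @ np.asarray(h, dtype=float)) ** 2

def gaussian_iid_bound(h, snr, num=100_000):
    """I_UG of (11) in nats/symbol, with snr = P_A / sigma^2."""
    lam = (np.arange(num) + 0.5) * np.pi / num        # midpoint grid on (0, pi)
    return np.sum(np.log1p(snr * H_abs2(h, lam))) * (np.pi / num) / (2.0 * np.pi)

def waterpouring_rate(h, snr, num=100_000):
    """C_w of (16)-(17) in nats/symbol: bisect for the water level Theta, then integrate."""
    lam = (np.arange(num) + 0.5) * np.pi / num
    inv_g = 1.0 / H_abs2(h, lam)                      # |H(lambda)|^{-2}
    power = lambda th: np.sum(np.maximum(th - inv_g, 0.0)) * (np.pi / num)
    lo, hi = 0.0, 1.0
    while power(hi) < np.pi * snr:                    # bracket the water level
        hi *= 2.0
    for _ in range(80):                               # bisection on the constraint (17)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) < np.pi * snr else (lo, mid)
    theta = 0.5 * (lo + hi)
    return np.sum(np.log(np.maximum(theta / inv_g, 1.0))) * (np.pi / num) / (2.0 * np.pi)

h = [1.0 / np.sqrt(2.0), 1.0 / np.sqrt(2.0)]          # duobinary taps of Section III-A
print(gaussian_iid_bound(h, snr=10.0), waterpouring_rate(h, snr=10.0))
```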
In Appendix B, another two upper bounds on I_{i.i.d.}, denoted by I_U1(i.i.d.) (B.1) and I_U2(i.i.d.) (B.4), are derived. These bounds may turn out, for certain cases, to be tighter as compared to I_UM (9) and I_UG (11) presented here; see the further discussion in Appendix B.

III. APPLICATIONS

We apply here several of the bounds presented in the previous section to some interesting examples. In Section III-A, we address the binary symmetric case, that is, {x_l} are i.i.d. binary symmetrically distributed [25] symbols, with causal minimum phase ISI the memory order of which is L - 1, that is, h_l = 0 for l < 0 and l ≥ L. In particular we examine the cases of L = 2 and L = 3.

In Section III-B, we specialize to lower bounds for the continuous-time bandlimited baseband channel with either a peak power limit (PPL) [9] or simultaneous bandwidth limit and PPL (BPPL) [14], [15] constraints imposed on the continuous-time channel input signal. The relevant results for the bandpass case are also mentioned.

A. Binary Symmetric Symbols with Causal Finite Minimum Phase ISI

Consider the binary symmetric case, that is, x_l are i.i.d. binary symbols taking on the values ±√P_M with equal probability 1/2. This is an interesting application since in several communication problems the transmitter is restricted to use only binary alphabets [9], [10], [16], [18], [25], [27], [32]. We specialize here to the causal minimum phase ISI representation, as is the case at the output of the sample-whitened matched filter (or the feed-forward part of the DFE equalizer [3]), and assume that the ISI memory is of degree L - 1, that is, h_l = 0 for l < 0 and l ≥ L.
The lower bound

    I_L = C_b(h_0² P_M/σ²)    (18a)

and the upper bound

    I_UM = C_b(‖h‖² P_M/σ²)    (18b)

are given in terms of C_b(R) = I(√R a + β; a), the capacity of a Gaussian scalar channel with binary inputs, where a is a binary random variable taking on the values ±1 with equal probability 1/2 and β is a normalized Gaussian random variable. The argument R is, therefore, interpreted as the signal-to-noise ratio. The notation C_b(R) is used since it is actually the capacity of the memoryless Gaussian channel with binary inputs, and it is determined by a single integral [10, Problem 4.22], [25, eq. (4.14)]; one convenient form is

    C_b(R) = R - E[ln cosh(R + √R β)]  (nats),    (19)

where the expectation is over the normalized Gaussian β. Equivalent forms appear in [4, p. 153] and [32, p. 274]. The function C_b(R) has been evaluated numerically, for example, in [25] and [32], while closed-form bounds [33] are further discussed in the next section.
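The single-integral form of C_b(R) noted above can be evaluated, for instance, by Gauss-Hermite quadrature. The following sketch (the function names are ours) also evaluates the bounds (18a) and (18b) for the duobinary example treated below, where h_0² = 1/2 and ‖h‖² = 1.

```python
import numpy as np

def C_b(R, nodes=101):
    """Binary-input Gaussian channel capacity (nats): C_b(R) = R - E[ln cosh(R + sqrt(R)*Z)],
    Z ~ N(0,1), evaluated with probabilists' Gauss-Hermite quadrature."""
    z, w = np.polynomial.hermite_e.hermegauss(nodes)
    t = R + np.sqrt(R) * z
    ln_cosh = np.logaddexp(t, -t) - np.log(2.0)     # overflow-safe ln cosh
    return R - np.dot(w, ln_cosh) / np.sqrt(2.0 * np.pi)

snr = 4.0                                           # P_M / sigma^2
print(C_b(0.5 * snr))                               # I_L  of (18a), duobinary case (h0^2 = 1/2)
print(C_b(1.0 * snr))                               # I_UM of (18b), since ||h||^2 = 1
```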
For the sake of simplicity we turn now to the case of only two nonzero ISI coefficients (ISI memory of degree 1) having the values h_0 = (1+α²)^{-1/2}, h_1 = α(1+α²)^{-1/2}, and h_i = 0 for i ≠ 0, 1. We use here the convenient normalization [25] ‖h‖² = h_0² + h_1² = 1. The parameter -1 ≤ α ≤ 1 determines the amount of ISI present; α = 0 corresponds to no ISI, while α = +1 (-1) corresponds to the duobinary (dicode) case [3], [5] that yields the maximum ISI possible with memory of degree 1. This very simple model is important in some practical cases encountered in magnetic recording [6], [16], [18], [25], [27].

For this case,

    I_L = C_b([1/(1+α²)] P_M/σ²),    (20a)

    I_UM = C_b(P_M/σ²),    (20b)

and

    I_UG = (1/2) ln{ (1/2)[ 1 + P_M/σ² + √((1 + P_M/σ²)² - (2α/(1+α²))²(P_M/σ²)²) ] },    (20c)

where I_UG follows from (11) by substituting in (8)

    |H(λ)|² = h_0² + h_1² + 2h_0h_1 cos λ = 1 + [2α/(1+α²)] cos λ

and using the integral [36, Section 4.224, p. 329]. The upper bounds I_U1(i.i.d.) and I_U2(i.i.d.), specialized for this binary case (given, respectively, by equations (B.5a) and (B.5b) in Appendix B), were found to be less tight as compared to min(I_UM, I_UG).

The bounds I_L (20a), I_UM (20b) and I_UG (20c) for α = 1 (the duobinary case) are shown in Fig. 1 (in bits/channel use) versus the signal-to-noise ratio P_M/σ² and are compared to Î_{i.i.d.}, the approximated value of I_{i.i.d.} calculated in [25, Fig. 4.4] using Monte Carlo techniques. The bounds I_L and I_UM are 3 dB apart, and the matched-filter upper bound I_UM (20b) is tighter for low and moderate values of the signal-to-noise ratio P_M/σ², while the lower bound I_L (20a) is found to be tighter for high values of the signal-to-noise ratio. Note that the Gaussian upper bound I_UG is remarkably tight for small values of the signal-to-noise ratio, P_M/σ² ≤ 0 dB, and it is the preferred upper bound in the region P_M/σ² ≤ 2.5 (4 dB).
Fig. 1. Bounds on the information rate I_{i.i.d.} (bits/channel symbol) for symmetric i.i.d. binary symbols and two equal ISI coefficients h_0 = h_1 = 1/√2 (ISI memory of degree one) versus signal-to-noise ratio P_M/σ² (dB). I_L: lower bound (20a). I_UM: matched-filter upper bound (20b). I_UG: Gaussian upper bound (20c). Î_{i.i.d.}: approximated value of I_{i.i.d.} by Monte Carlo techniques [25].

Fig. 2. Bounds on the information rate I_{i.i.d.} (bits/channel symbol) for symmetric i.i.d. binary symbols and three ISI coefficients h_0 = h_2 = 1/2, h_1 = 1/√2 (ISI memory of degree two) versus signal-to-noise ratio P_M/σ² (dB). I_L: lower bound (21a). I_UM: matched-filter upper bound (21b). I_UG: Gaussian upper bound (21c). Î_{i.i.d.}: approximated value of I_{i.i.d.} by Monte Carlo techniques [25].
We turn now to examine the ISI case where L = 3 and h_0 = 1/2, h_1 = 1/√2, h_2 = 1/2 (h_0² + h_1² + h_2² = 1), which was considered also in [25, Figs. 2.3 and 3.1]. For this channel |H(λ)|² = (cos λ + √(1/2))², and the bounds

    I_L = C_b(h_0² P_M/σ²) = C_b(P_M/(4σ²)),    (21a)

    I_UM = C_b(‖h‖² P_M/σ²) = C_b(P_M/σ²),    (21b)

    I_UG = (1/2π) ∫₀^π ln[1 + (P_M/σ²)(cos λ + √(1/2))²] dλ    (21c)

are shown in Fig. 2 (in bits/channel use) along with Î_{i.i.d.}, the Monte Carlo approximated value of I_{i.i.d.} [25, Fig. 4.6]. Indeed the bounds are looser in this case, with a larger ISI memory (L - 1 = 2), when compared to the previous case with unit (L - 1 = 1) ISI memory. However, the lower bound I_L seems to capture the behavior of I_{i.i.d.} for large signal-to-noise ratio P_M/σ² values (which is determined basically by h_0), while the upper bound is tight for asymptotically low values of P_M/σ², for which I_UM (as well as I_UG) → (1/2)P_M/σ², in agreement with the exact asymptotic behavior of I_{i.i.d.} [25, Corollary 4.2]. In midrange values of the signal-to-noise ratio, the bounds I_L and I_UM seem not to be tight. This observation is believed to hold in general, and it is further supported by the Gaussian case (i.e., x_l are i.i.d. Gaussian), for which I_UG → (1/2)P_M/σ² for P_M/σ² → 0 and C_w → I_UG → (1/2)ln(ρ²P_M/σ²) for P_M/σ² → ∞. Note that for the Gaussian case and for asymptotically high signal-to-noise ratio P_M/σ² → ∞, C_w → I_UG [20], [34], evidencing that no loss in capacity under a symbol average power constraint is incurred by using i.i.d. Gaussian inputs. We conjecture that the same holds for non-Gaussian continuous symbols as well.

B. Lower Bounds on the Capacity of the Bandlimited Continuous-Time Channel with PPL or BPPL Constraints

We turn our attention to the strictly bandlimited continuous-time channel, for which the channel filter's transfer function is D(f) = 1 for |f| ≤ W and 0 otherwise. The transmitted channel input, s(t) = Σ_k x_k g(t - kT), is taken to be a PAM signal, where g(t) stands for the pulse shape and T is the symbol duration. The signal s(t) is constrained either to be peak power limited to P_M [9] (abbreviated here as the PPL constraint) or to satisfy both a PPL constraint and a strict bandwidth constraint [14], [15], that is, s(t) is of bandwidth no larger than W (these joint constraints are abbreviated by BPPL). We specialize here to lower bounds on the capacity of this channel under the PPL and BPPL constraints⁶. Following [9], [15], we restrict the signal to the PAM class for the baseband case considered here. The channel symbols are chosen to be i.i.d. digits x_l satisfying the peak constraint |x_l| ≤ √P_M (where the subscript M stands for maximum). The pulse shape g(t) is rectangular, g(t) = 1 for |t| ≤ T/2 [9], for the PPL constraint and a spectral cosine⁷, g(t) = (π²/4)[1 - (2t/T)²]^{-1} cos(πt/T), for the BPPL case, respectively, while the symbol duration is T = (2W)^{-1} [9], [15]. It has been verified that the signals s(t) so constructed satisfy the respective PPL [9] and BPPL [15] constraints.

⁶For upper bounds on capacity under these constraints, see [30] and [15].
⁷In [14], a spectral triangular pulse g(t) = [sin(πt/T)/(πt/T)]² was selected.
The frequency response of the receiver filter is chosen to match d(t) = ℱ^{-1}D(f), where ℱ and ℱ^{-1} stand for the Fourier transform pair. Evaluating ρ using the calculations reported in [9] and [15] gives ρ_PPL = e/π and ρ_BPPL = π/8 for the PPL and BPPL cases, respectively. Thus, by Theorem 1 and with proper scaling, the lower bound on capacity (per channel symbol of duration T = (2W)^{-1}) is

    C ≥ I_L = I(ρx + v; x),    (22)

where ρ equals ρ_PPL or ρ_BPPL for the PPL and BPPL constraints, respectively, and where σ² = N₀W, with N₀/2 standing for the two-sided spectral density of the additive white Gaussian noise. The random variable x may take any probability function satisfying |x| ≤ √P_M, and the random variable β stands, as usual, for a normalized zero-mean Gaussian variable with unit variance. In [9], [14], [15], x had to be chosen continuously distributed, otherwise the convolutional inequality of entropy powers [22], upon which the derivation of [9], [14], [15] relies, collapses. Here, free from such restrictions, we choose the channel symbol distribution to maximize the bound in (22). This maximizing distribution is well known and reported in [37]. Denote by C_p(R) the capacity derived in [37], that is,

    C_p(R) = sup_{|a| ≤ √R} I(a + β; a),    (23)

where the supremum is taken over all distributions of the real random variable a satisfying |a| ≤ √R and where β is a zero-mean, real, unit-variance (E(β²) = 1) Gaussian random variable. The optimized bound I_LO for this channel equals the optimized I_L multiplied by 2W (measured in nats per second), and thus takes the form

    I_LO = 2W C_p[(e/π)² P_M/(N₀W)],   PPL constraint,
    I_LO = 2W C_p[(π/8)² P_M/(N₀W)],   BPPL constraint.    (24)

It has been shown [37] that the distribution of the random variable a in (23) achieving C_p(R) is discrete and, further, for R ≲ 6.25 [37, Fig. 3] it is binary symmetric, while for R → ∞ it approaches a uniform distribution. It follows that [37] C_p(R) coincides with the symmetric binary capacity C_b(R) of (19) for R ≲ 6.25, while C_p(R) → (1/2) ln[2R/(πe)] as R → ∞.

The lower bounds reported here (24) are strictly tighter than those reported in [9], [15], since in [9] and [15] a uniform distribution for x in [-√P_M, √P_M] was applied, and here the optimizing distribution is used. However, the improvement, measured with respect to the signal-to-noise ratio P_M/(N₀W), decreases from 10 log₁₀(eπ/2) = 6.3 dB, achieved for asymptotically low signal-to-noise ratios P_M/(N₀W) → 0, until it completely vanishes for asymptotically high signal-to-noise ratios P_M/(N₀W) → ∞.

The bandpass case with either the PPL [9] or BPPL [15] constraints can be treated in a similar manner, since Theorem 1 applies also for the complex case (see Appendix A). In this case, where QAM signalling is employed, the channel symbols {x_l} are i.i.d. complex, satisfying |x_l| ≤ √P_M, and {n_k} stand for i.i.d. complex Gaussian random variables with independent, zero-mean real and imaginary components, each of variance σ² = 2N₀W. The analysis yields

    I_LO = 2W C_pc[(e/π)² P_M/(N₀W)],   bandpass and PPL constraint,
    I_LO = 2W C_pc[(π/8)² P_M/(N₀W)],   bandpass and BPPL constraint,    (25)

where C_pc(R) stands for the capacity found in [38], which is also defined by (23); however, a is now a complex random variable and β is a zero-mean complex Gaussian random variable with normalized i.i.d. components [E(Re β)² = E(Im β)² = 1, E((Re β)(Im β)) = 0].
In [38] it has been proved that the distribution of the complex random variable a achieving C_pc(R) is uniform in arg(a) and independently discrete in |a|. For R ≤ 6, the constant envelope distribution [38], that is, |a| = √R with probability 1, is optimal, while for R → ∞ the optimal distribution approaches the one that is uniform over a disk with radius √R. This observation yields, therefore, C_pc(R) = C_ce(R) for R ≤ 6 and C_pc(R) → ln[(2e)^{-1}R] as R → ∞, where C_ce(R), the constant-envelope capacity, is given [39] by

    C_ce(R) = -∫₀^∞ ψ(τ) ln[ψ(τ)/τ] dτ + ln(2R) - 1,

with ψ(τ) = 2Rτ exp[-R(1 + τ²)] I₀(2Rτ), where I₀(·) stands for the zero-order modified Bessel function.

The improvement of the lower bounds I_LO (25) over the bounds reported in [9] and [15], which were derived for complex input symbols uniformly distributed over a disk of radius √P_M, measured with respect to the signal-to-noise ratio P_M/(N₀W), decreases from 10 log₁₀(2e) = 7.35 dB, achieved for asymptotically low signal-to-noise ratios P_M/(N₀W) → 0, until it completely vanishes for asymptotically high signal-to-noise ratios P_M/(N₀W) → ∞.
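Since, as noted above, the maximizing distribution in (23) is binary symmetric for R ≲ 6.25, C_p(R) coincides there with the binary capacity C_b(R) of (19). The sketch below (our own helper names, with C_b redefined for self-containment) evaluates the baseband PPL entry of (24) in that low-SNR regime; outside that regime C_b(R) still lower bounds C_p(R), so the value would remain a valid, if weaker, lower bound.

```python
import numpy as np

def C_b(R, nodes=101):
    """Binary-input Gaussian channel capacity in nats (same helper as in Section III-A)."""
    z, w = np.polynomial.hermite_e.hermegauss(nodes)
    t = R + np.sqrt(R) * z
    return R - np.dot(w, np.logaddexp(t, -t) - np.log(2.0)) / np.sqrt(2.0 * np.pi)

def ppl_lower_bound(P_M, N0, W):
    """Baseband PPL entry of (24): 2*W*C_p((e/pi)^2 * P_M/(N0*W)) in nats per second,
    using C_p = C_b, which holds in the binary-optimal regime R <~ 6.25 [37]."""
    R = (np.e / np.pi) ** 2 * P_M / (N0 * W)
    if R > 6.25:
        raise ValueError("outside the regime where C_p(R) = C_b(R); result would only lower bound (24)")
    return 2.0 * W * C_b(R)

print(ppl_lower_bound(P_M=1.0, N0=1.0, W=1.0))
```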
IV. DISCUSSION AND CONCLUSION

We focus here on the achievable information rates for the classical discrete-time Gaussian channel with ISI present and with identically distributed, not necessarily
Gaussian, symbols. Lower and upper bounds on I_{i.i.d.} (the information rate for i.i.d. but otherwise arbitrary channel input symbols) as well as upper bounds on I (the information rate for identically distributed input symbols, not necessarily independent) are derived. The bounds are formulated in terms of the average mutual information between the output and input of a scalar memoryless Gaussian channel. This formulation enables a unified treatment of discrete as well as continuous channel symbol distributions using the same underlying framework. These bounds are therefore easily calculated, either analytically or bounded again by applying the extensive results and techniques developed for memoryless channels. To demonstrate this, we turn back to Section III-A, where binary symbols are considered, and note that C_b(R), which is expressed in an integral form (19), can be further lower bounded as in (26a)-(26c), where h_b(a) is the binary entropy function [10]. Equations (26a) and (26b) were taken from [33] (where N = 2 in the notation of [33] was substituted), while (26c) is the cut-off rate for the binary case [4, p. 142], which clearly lower bounds C_b(R). Evident upper bounds on I_{i.i.d.} and I based on a Gaussian assumption are also mentioned.

The lower bound on I_{i.i.d.} (6) can be used to lower bound the capacity of the dispersive (ISI) Gaussian channel under a variety of constraints imposed on the input symbols which do not preclude the use of i.i.d. symbols. Upper bounds on capacity are constructed by supremizing the upper bounds on I over the relevant constraints induced on an individual input symbol.

Incorporating the convolutional inequality of entropy powers [22]⁸ with the lower bound I_L (6) reduces exactly to the lower bounds derived using the standard technique described in detail in [9].

Assuming a causal ISI channel (as observed, for example, at the output of a sample-whitened matched filter or a feedforward equalizer), the lower bound in Theorem 1 is interpreted as the average mutual information of a zero-forcing decision-feedback equalizer having ideal errorless feedback decisions. Note, however, that the errorless past decision assumption has not been employed here to derive this lower bound. We conclude, therefore, that as far as the average mutual information I_{i.i.d.} is concerned, ignoring the information carried by the rest of the ISI coefficients {h_i, i > 0} overcompensates for the optimistic assumption of errorless past decisions, yielding thus an overall lower bound I_L on I_{i.i.d.}. Indeed this lower bound seems to capture the exact asymptotic behavior of I_{i.i.d.} for high signal-to-noise ratio values and continuous inputs, as has been demonstrated by the examples in Section III and by the Gaussian case, for which I_{i.i.d.} is given by (11).

Other approaches which lead to memoryless channel representations, such as the orthogonalization method [10], [20], [28] and the Tomlinson nonlinear filter [29], cannot be directly used due to the difficulties in translating the constraints imposed on {x_l}, the channel inputs, to a corresponding set of constraints imposed on {x̃_l}, the inputs of the resultant memoryless channels. This translation is straightforward for block average power constraints. For example, if {x_l}, the outputs of a Tomlinson filter, are demanded to be i.i.d. with a given probability function, it is not at all clear how to restrict {x̃_l}, the input of the Tomlinson filter, to satisfy this demand. For the special case of uniformly distributed (within the extreme levels) i.i.d. {x̃_l}, it is readily verified that the outputs {x_l} are also uniformly distributed i.i.d. random variables. Also in this special case the bound in Theorem 1 is superior over the Tomlinson based bound, and that is due to the information destroying modulo operation at the Tomlinson receiver. The information loss incurred by the modulo operation is diminished with the increase of the signal-to-noise ratio.

The matched filter upper bound I_UM (9) shows that under a given average power constraint at the channel output, that is, ‖h‖² kept constant, ISI cannot improve on the information rate I_{i.i.d.} over that of an ISI-less channel (that is, h = h_0). This is attributed mainly to the fact that the symbols {x_i} were chosen i.i.d., as is also concluded in [25] for the binary and Gaussian cases. This feature is not necessarily true if optimal statistical dependence (induced by the capacity achieving statistics) is introduced into the channel symbols, as has been demonstrated for Gaussian symbols in [1]. This is clearly evidenced by the upper bound I_Uξ (12) on I, which shows that the increase in the information rate cannot exceed the corresponding information rate for an ISI-less channel with h_0 taken to be the maximal value of the ISI "transfer" function, that is, h_0 = max_{0≤λ≤π} |H(λ)| (13).

It was concluded (see also [25] for the binary case) that I_UG is an asymptotically tight bound on I_{i.i.d.} for signal-to-noise ratios approaching zero.

Since I_L, I_UM, I_Uξ are formulated in terms of the mutual information of an ISI-less (memoryless) scalar channel, with a power enhancement factor for I_UM, I_Uξ and a power degradation factor for I_L, we conclude that if i.i.d. channel symbols are permissible, the introduction of ISI does not drastically modify the underlying functional dependence of I on a properly defined measure of signal-to-noise ratio.

⁸Whenever the channel symbols are continuous random variables.
The application of the bounds was demonstrated for i.i.d. binary symmetric symbols and channels with ISI memory of degree 1 and 2, that is, a two- or three-component ISI vector. The bounds were compared to Î_{i.i.d.}, the Monte Carlo based [25] approximation of the exact value of I_{i.i.d.}, and the asymptotic tightness of I_L and of I_UM, I_UG, for respectively high and low values of the signal-to-noise ratio, has been verified.

By using the lower bounding technique I_L (6) in Theorem 1, we have been able to improve on the previously known lower bounds [9], [14], [15] for the capacity of a continuous-time strictly bandlimited or bandpass Gaussian channel with either peak power limiting (PPL) or simultaneously band and peak power limiting (BPPL) imposed on the channel input waveform. This relative improvement, which increases as the signal-to-noise ratio diminishes, is attributed to the possibility of incorporating here the optimized discrete symbol distribution that maximizes the lower bound I_L (6). This lower bound (6) has been recently used to derive lower bounds on the capacity of the peak- and slope-limited magnetization model with binary signalling [40].

For the sake of simplicity the results were specialized to real Gaussian channels; however, the techniques used here can be extended to account for complex Gaussian channels describing passband systems. The same basic structure of the bounds as compared to those appearing in Section II is maintained.

We have specialized here to discrete-time ISI channels and mentioned that these are well adapted to characterize PAM and QAM signaling in additive Gaussian noise. The process of translating the continuous waveform channel to the discrete-time channel has not been explicitly addressed; rather, a few alternatives, such as the matched filter [4] or the sample-whitened matched filter [7], were mentioned. Other alternatives of linear prefiltering, such as the minimum mean-square linear equalizer combined with matched sampling [8], which are also modeled by the discrete-time ISI channel (1), may, in certain cases, turn out advantageous.

ACKNOWLEDGMENT

The authors are grateful to A. Dembo for interesting discussions and to an anonymous reviewer for his careful reading of the manuscript and useful suggestions.

APPENDIX A
PROOFS

In this appendix, we prove Theorems 1, 2, and 3 and Lemmas 1 and 2, which appear in Section II. We assume here that the symbols x_l and the Gaussian noise samples n_l are i.i.d. real random variables. Extension to the complex case is shortly discussed at the end of this appendix. We further assume a nonsingular channel, that is, H^N is a nonsingular matrix. The assumption incurs no loss in generality when optimal filters (i.e., the sample-whitened matched filter) are employed, since the resultant H^N is lower triangular, being invertible as h_0 > 0. Nevertheless, in the context of this paper, it is only a technical assumption that permits simple proofs. All the results are still valid provided |H(λ)| is integrable, which is guaranteed since |H(λ)|² was assumed integrable (finite power). Note, however, that if H(λ) equals zero over a region (not isolated zeros), then ρ = 0 (7), yielding thus a trivial lower bound in Theorem 1.

In the proofs of Theorems 1 and 2 and Lemma 1 it is assumed that {x_l} is an i.i.d. sequence, while this assumption is relaxed in the proofs of Theorem 3 and Lemma 2.

Proof of Theorem 1: Since H^N is nonsingular,

    I(y^N; x^N) = I(z^N; x^N),    (A.1)

where

    z^N = x^N + m^N,    (A.2)

z^N = (H^N)^{-1} y^N, and m^N = (H^N)^{-1} n^N is a Gaussian vector with correlation matrix Γ_m^N = E[m^N(m^N)^T] = σ²(H^N)^{-1}(H^N)^{-1T}. The function h_d(·) stands here for the differential entropy [10]. The chain rule [10, ch. 2] yields

    (1/N) I(z^N; x^N) = (1/N) Σ_{l=0}^{N-1} [h_d(x_l + m_l | z^{l-1}) - h_d(m_l | m^{l-1})].    (A.3)

Conditioning, which does not increase the differential entropy [10, ch. 2], gives

    h_d(x_l + m_l | z^{l-1}) ≥ h_d(x_l + m_l | z^{l-1}, x^{l-1}) = h_d(x_l + m_l | x^{l-1}, m^{l-1}),    (A.4)

where the right-hand-side equality in (A.4) follows since z^{l-1} = x^{l-1} + m^{l-1} (A.2). Express m_l by

    m_l = E(m_l | m^{l-1}) + ν_l,    (A.5)

where ν_l is an innovation Gaussian random variable statistically independent of the Gaussian vector m^{l-1}. The function E(m_l | m^{l-1}), denoting conditional expectation, is a linear function of m^{l-1} since the random variables involved are jointly Gaussian. Now, since x_l, m^{l-1}, x^{l-1}, and ν_l are all statistically independent, by (A.4), (A.5),

    h_d(x_l + m_l | z^{l-1}) ≥ h_d(x_l + m_l | x^{l-1}, m^{l-1}) = h_d(x_l + ν_l | m^{l-1}) = h_d(x_l + ν_l).    (A.6)

Using the entropy chain rule and (A.5) yields

    h_d(m_l | m^{l-1}) = h_d(ν_l).    (A.7)
Inserting now (A.7) and (A.6) into (A.3) and using (A.1) gives

    (1/N) I(z^N; x^N) ≥ (1/N) Σ_{l=0}^{N-1} I(x_l + ν_l; x_l).    (A.8)

The limit innovation variable ν = lim_{l→∞} ν_l, which takes also the interpretation of the stationary innovation variable in the noise prediction process [34], is a well-defined Gaussian random variable with variance

    σ_ν² = E(ν²) = σ² [ lim_{N→∞} (det H^N)^{2/N} ]^{-1}.    (A.9)

Invoking the asymptotic properties of log determinants of Toeplitz matrices [23] specifies σ_ν² in terms of the ISI "transfer function" H(λ) (8), which is given by the discrete Fourier transform of {h_l}. The proof of Theorem 1 is completed by following standard arguments [10] (see also [1] for more details) to show that lim_{N→∞} (1/N) I(z^N; x^N) ≥ I(x + ν; x), and then rescaling, I(x + ν; x) = I(ρx + ρν; x), letting ρ = (σ_ν/σ)^{-1} = σ/σ_ν. It can readily be verified that ρ so defined coincides with (7), while ρν has the same distribution as the noise samples n_k in (1).
The bound can be further tightened by relaxing the conditioning in (A.4), that is, conditioning on x^{l-k}, k = 2, 3, ..., rather than on x^{l-1} (k = 1). This yields an improvement on the bound I_L by a factor, expressed as a conditional mutual information, the evaluation of which is given in terms of a (k - 1)-fold integral.

Another straightforward lower bound for i.i.d. symbols {x_l} results by applying the inequality (1/N) I(z^N; x^N) ≥ I(z; x) = I(x + m; x) [10], which is interpreted also as the mutual information achieved by employing ideal interleaving in (A.2), which effectively cancels the correlation present among the components of the vector m^N. The resultant bound is given by (6), where ρ in (7) is replaced here by

    ρ̃ = [ (1/π) ∫₀^π |H(λ)|⁻² dλ ]^{-1/2},

following from

    E(m²) = (σ²/π) ∫₀^π |H(λ)|⁻² dλ.

This lower bound is found to be inferior when compared to the one given in Theorem 1, as is evidenced by the Jensen inequality⁹

    ρ̃ = [ (1/π) ∫₀^π |H(λ)|⁻² dλ ]^{-1/2} ≤ [ exp{ (1/π) ∫₀^π ln|H(λ)|⁻² dλ } ]^{-1/2} = ρ.

Proof of Theorem 2: By the chain rule,

    (1/N) I(y^N; x^N) = (1/N) Σ_{l=0}^{N-1} I(y^N; x_l | x^{l-1}).    (A.10)

For independent symbols, conditioning each term on the remaining symbols can only increase it, so that

    (1/N) I(y^N; x^N) ≤ (1/N) Σ_{l=0}^{N-1} I(y^N; x_l | x^{l-1}, x_{l+1}, ..., x_{N-1}).    (A.11)

For i.i.d. symbols I(x^{l-1}, x_{l+1}, ..., x_{N-1}; x_l) = 0 and, therefore, I(y^N; x_l | x^{l-1}, x_{l+1}, ..., x_{N-1}) = I(y^N, x^{l-1}, x_{l+1}, ..., x_{N-1}; x_l), which is evaluated by a straightforward calculation since the ISI effect of x^{l-1}, x_{l+1}, ..., x_{N-1} is fully neutralized¹⁰. In other words, to evaluate I(y^N; x_l | x^{l-1}, x_{l+1}, ..., x_{N-1}) we may use x_k = 0 for k = 0, 1, ..., l-1, l+1, l+2, ..., N-1 in (2), which leaves us with the formulation

    y_k = h_{k-l} x_l + n_k,    k = 0, 1, ..., N - 1.    (A.12)

It can readily be verified that

    I(y^N; x_l) = I(ỹ_l; x_l),    (A.13)

where ỹ_l = Σ_{k=0}^{N-1} h_{k-l} y_k is the maximal ratio combining that arises from the maximum likelihood rule when applied to (A.12). It is clearly seen, using power rescaling, that

    I(ỹ_l; x_l) = I(‖h‖ x_l + v; x_l),    (A.14)

which, upon substitution in (A.11) and taking the limit for N → ∞, yields Theorem 2.

Proof of Lemma 1: It is well known that the average mutual information (1/N) I(y^N; x^N) is upper bounded by that corresponding to a Gaussian distribution of the vector x^N under a given correlation matrix E(x^N x^{NT}) constraint [10]. Thus, in our case, letting x_i be i.i.d. Gaussian random variables with E(x_i²) = P_A yields I_{i.i.d.} = I_UG (11) [1], which sets the upper bound stated in Lemma 1.

⁹See [3, ch. 8], where this inequality is stated in the context of the output signal-to-noise ratio superiority of the zero-forcing DFE over the linear zero-forcing equalizers.
¹⁰This is identified as the mutual information with errorless past and future "decisions" that are provided as side information.
Proof of Theorem 3: We use (A.1) and (A.2), which stay valid also for non-i.i.d. {x_l}, the case examined here. Let now

    m^N = ψ^N + φ^N,    (A.15)

where ψ^N and φ^N are independent zero-mean Gaussian vectors, with ψ^N having the covariance matrix

    Γ_ψ^N = ξ^{-2} σ² I_N.    (A.16)

In the above expression I_N stands for the N × N unit matrix and ξ is a nonnegative scaling factor to be determined. This representation (A.15) is possible if

    Γ_φ^N = Γ_m^N - ξ^{-2} σ² I_N    (A.17)

is a nonnegative definite matrix, where Γ_m^N and Γ_φ^N stand for the covariance matrices of the Gaussian vectors m^N and φ^N, respectively. The minimum value of ξ² (maximum value of ξ^{-2}) for which this is satisfied is

    ξ² = [minimal eigenvalue of (H^{NT} H^N)^{-1}]^{-1} = maximal eigenvalue of (H^{NT} H^N).    (A.18)

Now

    (1/N) I(z^N; x^N) = (1/N) I(x^N + ψ^N + φ^N; x^N) ≤ (1/N) I(x^N + ψ^N; x^N),    (A.19)

since we have ignored the additional noise component φ^N, which is independent of both ψ^N and x^N. Now, since ψ^N is composed of i.i.d. Gaussian components, the variance of which is ξ^{-2}σ²,

    (1/N) I(x^N + ψ^N; x^N) ≤ I(x + ψ; x) = I(ξx + ξψ; x),    (A.20)

where the right-hand side equality in (A.20) results by scaling by the factor ξ. Invoking the well-known result [23]

    lim_{N→∞} ξ² = lim_{N→∞} [maximal eigenvalue of (H^{NT} H^N)] = max_{0≤λ≤π} |H(λ)|²,    (A.21)

where H(λ) is given by (8), and noting that the variance of ξψ is σ², concludes the proof of the Theorem.

Proof of Lemma 2: Lemma 2 is well known [10] and the proof follows that of Lemma 1, noting again that replacement of the original symbols {x_i} by Gaussian symbols which satisfy the same second-order moments r_x(l) increases the average mutual information and thus yields the upper bound I_UG'. The bound I_UG' is again upper bounded by C_w, which is the supremum of I_UG' over all the possible correlation matrices satisfying the symbolwise average power constraint E(x_l²) ≤ P_A.

A. Extensions and Comments

Throughout the paper, for the sake of simplicity, we have considered only the real case, that is, the channel symbols x_i, the noise samples n_i and the ISI coefficients h_i are real valued. However, all the proofs are easily extended to the complex case where x_i, n_i, and h_i are complex valued. This is possible since the basic relation

    h_d(Au) = ln|det A| + h_d(u)

extends also to a complex vector u composed of continuous random variables (in our case a Gaussian noise vector) and an arbitrary complex nonsingular matrix A, where we interpret differential entropies of continuously distributed complex random variables according to h_d(u) = h_d(Re(u), Im(u)). Note also that the asymptotic properties of log determinants of Toeplitz matrices extend to the complex case [23] as well. The main modification required is introducing conjugate and transpose-conjugate operations⁴ wherever needed and taking care of dimensionality issues, that is, dimension 2 for the complex case and 1 for the real case. Thus the results of Section II, with minor notational modifications, extend also to complex valued {x_i}, {h_i}, and {n_i}, accounting for bandpass systems.

The technique used in the proof of Theorem 3 could have been used also to derive an alternative lower bound on I_{i.i.d.} to that stated in Theorem 1. This is accomplished by adding an additional independent Gaussian vector η^N to m^N such that η^N + m^N forms a Gaussian vector with i.i.d. components. This yields the same structure of a lower bound as stated in Theorem 1, with ρ (7) replaced by min_{0≤λ≤π} |H(λ)|. This lower bound falls short as compared to the one given in Theorem 1, since ρ ≥ min_{0≤λ≤π} |H(λ)|.

APPENDIX B
UPPER BOUNDS ON I_{i.i.d.}

In this appendix, we formulate the upper bounds I_U1(i.i.d.) and I_U2(i.i.d.) on I_{i.i.d.} and compare them with the upper bound I_UM given by Theorem 2. The proofs of these upper bounds, detailed in [41, Appendix B], are somewhat lengthy and are, therefore, omitted.

Lemma B1: For i.i.d. symbols {x_i}, I_{i.i.d.} is upper bounded by I_U1(i.i.d.), given by (B.1), where the random variables u_0 and u_1 are defined by u_0 = (x_0 + x_1)/√2 and u_1 = (x_0 - x_1)/√2, having the respective probability functions p_{u_0}(a) and p_{u_1}(a) given by (B.2). Again, p_x(a) stands for the probability function of x_0 and x_1, which by our assumptions are i.i.d. random variables, and ∗ denotes convolution. The random variable v is a zero-mean Gaussian variable, independent of u_0, u_1, and x, having the same probability function as that of n_k in (1) (that is, variance σ²), and ρ_h, defined in (B.3), is the correlation coefficient induced by the ISI coefficients. For a symmetric distribution of x it follows that p_{u_0}(a) = p_{u_1}(a).
Lemma B2: For i.i.d. symbols {x_i}, I_{i.i.d.} is upper bounded by I_U2(i.i.d.), given by (B.4), where x assumes the probability function p_x(a) and v is an independent Gaussian variable having the same distribution as that of n_k in (1) (variance σ²).

First note that when no ISI is present, ‖h‖ = |h_0| and ρ_h = 0; thus I_{i.i.d.} = I_U2(i.i.d.), while I_{i.i.d.} ≤ I_U1(i.i.d.). For the Gaussian case, that is, {x_i} chosen to be i.i.d. Gaussian (with E(|x|²) = P_A), with ISI present, I_{i.i.d.} = I_UG ≤ I_U1(i.i.d.) ≤ I_UM. This relation between I_U1(i.i.d.) and I_UM in the Gaussian case is established by noting that u_0 and u_1 are i.i.d. Gaussian random variables having the same variance P_A and using the concavity of ln(·). For high signal-to-noise ratio, P_A/σ² ≫ 1, it is readily seen that I_U1(i.i.d.) ≅ I_U2(i.i.d.) < I_UM. For low signal-to-noise ratio, assuming still Gaussian i.i.d. symbols with ISI present, we find that I_{i.i.d.} = I_UG ≅ I_U1(i.i.d.) ≅ I_UM ≅ (1/2)‖h‖²P_A/σ², while I_U2(i.i.d.) is useless due to the nonnegative term -(1/2)ln(1 - |ρ_h|²), independent of the signal-to-noise ratio P_A/σ², which is present in the bound (B.4).

Assume now that x is a discrete symmetrically distributed random variable that takes on J possible values and satisfies E(|x|²) = P_A. It is clear that, for P_A/σ² → ∞, both I_{i.i.d.} and I_UM approach 𝓗(x) ≤ ln J, where 𝓗 stands for the standard entropy function [10], while I_UG → ∞, I_U1(i.i.d.) → 2𝓗(u_0) - 𝓗(x) ≥ 𝓗(x), and I_U2(i.i.d.) → 𝓗(x) - (1/2)ln(1 - |ρ_h|²) > 𝓗(x). We notice that also for low SNR, that is, P_A/σ² → 0, I_UM ≅ I_U1(i.i.d.) → (1/2)‖h‖²P_A/σ², while I_U2(i.i.d.), as was already mentioned, is useless. From this short discussion we conclude that the relative tightness of the four upper bounds on I_{i.i.d.} presented here depends on the specific case, that is, the actual distribution of x, the ISI vector h, and the region of the signal-to-noise ratio of interest. For discrete symbols having a bounded entropy 𝓗(x) and high signal-to-noise ratios, the bounds I_UG, I_U1(i.i.d.), and I_U2(i.i.d.) do not approach the correct limit 𝓗(x), while I_UM does.

When specialized to the first example in Section III-A, with ISI memory of degree 1, the previous upper bounds (in nats/channel use) assume the form

    I_U1(i.i.d.) = I_t([1 - |α|/(1+α²)] P_M/σ²) + I_t([1 + |α|/(1+α²)] P_M/σ²) - C_b(P_M/σ²),    (B.5a)

and

    I_U2(i.i.d.) = 2C_b([1 - α²/(1+α²)²] P_M/σ²) - C_b(P_M/σ²) - (1/2) ln(1 - α²/(1+α²)²),    (B.5b)

where the function I_t(R) = I(√R a + β; a), in which a is a ternary random variable with the probability function Prob(a = 0) = 1/2, Prob(a = ±1) = 1/4, and β is a normalized Gaussian random variable. Following [32, p. 274], I_t(R) is readily evaluated numerically.
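The ternary-input mutual information I_t(R) appearing in (B.5a) is, like C_b(R), a one-dimensional integral. A small numerical sketch (our own helper names, not from [32]), computing it as h_d(Y) minus the noise entropy:

```python
import numpy as np

def I_t(R, num=20001, span=12.0):
    """I_t(R) = I(sqrt(R)*a + beta; a) for the ternary input of (B.5a):
    P(a=0)=1/2, P(a=+/-1)=1/4, beta ~ N(0,1).  Computed as h(Y) - h(beta), in nats."""
    s = np.sqrt(R)
    y = np.linspace(-span - s, span + s, num)
    phi = lambda t: np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)
    f = 0.5 * phi(y) + 0.25 * phi(y - s) + 0.25 * phi(y + s)   # output density
    hY = -np.trapz(f * np.log(f), y)                           # differential entropy of Y
    return hY - 0.5 * np.log(2.0 * np.pi * np.e)

print(I_t(4.0))   # approaches the input entropy 1.5*ln(2) ~ 1.04 nats as R grows
```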
REFERENCES

[1] W. Hirt and J. L. Massey, "Capacity of the discrete-time Gaussian channel with intersymbol interference," IEEE Trans. Inform. Theory, vol. 34, pp. 380-388, May 1988.
[2] B. S. Tsybakov, "Capacity of a discrete Gaussian channel with a filter," Probl. Peredach. Inform., vol. 6, pp. 78-82, 1970.
[3] E. A. Lee and D. G. Messerschmitt, Digital Communication. Norwell, MA: Kluwer Academic Publishers, 1989.
[4] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.
[5] R. W. Lucky, J. Salz, and E. J. Weldon, Principles of Data Communication. New York: McGraw-Hill, 1968.
[6] K. A. S. Immink, "Coding techniques for the noisy magnetic recording channel; a state-of-the-art report," IEEE Trans. Commun., vol. 37, no. 5, pp. 413-419, May 1989.
[7] I. N. Andersen, "Sample-whitened matched filters," IEEE Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972.
[8] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. Forney, Jr., "MMSE decision-feedback equalizers and coding, Part I: General results," to appear in IEEE Trans. Commun.
[9] L. H. Ozarow, A. D. Wyner, and J. Ziv, "Achievable rates for a constrained Gaussian channel," IEEE Trans. Inform. Theory, vol. 34, pp. 365-370, May 1988.
[10] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[11] L. H. Brandenburg and A. D. Wyner, "Capacity of the Gaussian channel with memory: The multivariate case," BSTJ, vol. 53, no. 5, pp. 745-778, May-June 1974.
[12] W. Toms and T. Berger, "Capacity and error exponent of a channel modeled as a linear dynamic system," IEEE Trans. Inform. Theory, vol. IT-19, pp. 124-126, Jan. 1973.
[13] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.
[14] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, IL: Univ. of Illinois Press, 1949.
[15] S. Shamai (Shitz), "On the capacity of a Gaussian channel with peak power and bandlimited input signals," Archiv für Elektronik und Übertragungstechnik, Band 42, Heft 6, pp. 340-346, 1988.
[16] J. K. Wolf and G. Ungerboeck, "Trellis coding for partial-response channels," IEEE Trans. Commun., vol. 34, pp. 765-773, Aug. 1988.
[17] G. D. Forney, Jr. and A. R. Calderbank, "Coset codes for partial response channels, or, coset codes with spectral nulls," IEEE Trans. Inform. Theory, vol. 35, pp. 925-943, Sept. 1989.
[18] A. R. Calderbank, C. Heegard, and T. A. Lee, "Binary convolutional codes with application to magnetic recording," IEEE Trans. Inform. Theory, vol. IT-32, pp. 797-815, Nov. 1986.
[19] M. V. Eyuboglu and G. D. Forney, "Trellis precoding: Combined coding, precoding and shaping for intersymbol interference channels," to appear in IEEE Trans. Inform. Theory.
[20] S. Kasturia, J. T. Aslanis, and J. M. Cioffi, "Vector coding for partial response channels," IEEE Trans. Inform. Theory, vol. 36, pp. 741-762, July 1990.
[21] I. Bar-David and S. Shamai (Shitz), "Information rates for magnetic recording with a slope-limited magnetization model," IEEE Trans. Inform. Theory, vol. 35, pp. 956-962, Sept. 1989.
[22] N. M. Blachman, "The convolutional inequality for entropy powers," IEEE Trans. Inform. Theory, vol. IT-11, pp. 267-271, Apr. 1965.
[23] U. Grenander and G. Szegő, Toeplitz Forms and Their Applications. New York: Chelsea, 1984.
[24] S. Shamai (Shitz) and I. Bar-David, "A lower bound on the cut-off rate for dispersive Gaussian channels with peak-limited inputs," IEEE Trans. Commun., vol. 39, pp. 1058-1064, July 1991.
[25] W. Hirt, "Capacity and information rates of discrete-time channels with memory," Doctoral dissertation (Diss. ETH No. 8671), Swiss Federal Inst. of Technol. (ETH), Zurich, Switzerland, 1988.
[26] H. Sasano, M. Kasahara, and T. Namekawa, "Evaluation of the exponent function E(R) for channels with intersymbol interference," Electron. and Commun. in Japan, vol. 65-A, no. 2, pp. 28-37, 1982.
[27] S. Shamai (Shitz) and A. Dembo, "Bounds on the symmetric binary cut-off rate for dispersive Gaussian channels," submitted to IEEE Trans. Commun.
[28] J. W. Lechleider, "The optimum combination of block codes and receivers for arbitrary channels," IEEE Trans. Commun., vol. 38, pp. 615-621, May 1990.
[29] J. R. Price, "Nonlinearly feedback-equalized PAM vs. capacity for noisy filter channels," in Proc. Int. Conf. Commun., June 1972, pp. 22-12 to 22-17.
[30] S. Shamai (Shitz) and I. Bar-David, "Upper bounds on the capacity for a constrained Gaussian channel," IEEE Trans. Inform. Theory, vol. 35, pp. 1079-1084, Sept. 1989.
[31] A. R. Calderbank and L. H. Ozarow, "Nonequiprobable signaling on the Gaussian channel," IEEE Trans. Inform. Theory, vol. 36, pp. 726-740, July 1990.
[32] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[33] L. H. Ozarow and A. D. Wyner, "On the capacity of the Gaussian channel with a finite number of input levels," IEEE Trans. Inform. Theory, vol. 36, pp. 1426-1428, Nov. 1990.
[34] M. V. Eyuboglu, "Detection of coded modulation signals on linear, severely distorted channels using decision-feedback noise prediction with interleaving," IEEE Trans. Commun., vol. 36, pp. 401-409, Apr. 1988.
[35] J. L. Massey, "All signal sets centered about the origin are optimal at low energy-to-noise ratios on the AWGN channel," in Abstracts of Papers, IEEE Int. Symp. Inform. Theory, Ronneby Brunn, Ronneby, Sweden, June 1976, pp. 80-81.
[36] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series, and Products. New York: Academic Press, 1980.
[37] J. G. Smith, "The information capacity of amplitude- and variance-constrained scalar Gaussian channels," Inform. Contr., vol. 18, pp. 203-219, 1971.
[38] S. Shamai (Shitz) and I. Bar-David, "Capacity of peak and average-power constrained quadrature Gaussian channels," in Abstracts of Papers, IEEE Int. Symp. Inform. Theory, Ann Arbor, MI, Oct. 1986, p. 66.
[39] A. D. Wyner, "Bounds on communication with polyphase coding," BSTJ, vol. 45, no. 4, pp. 523-559, April 1966.
[40] S. Shamai (Shitz), "Information rates for the peak- and slope-limited magnetization model with binary signaling," EE Pub. No. 761, Technion, Haifa, Israel, 1990.
[41] S. Shamai (Shitz), A. D. Wyner, and L. H. Ozarow, "Information rates for a Gaussian channel with intersymbol interference and stationary inputs," Internal Tech. Memo., AT&T Bell Laboratories, 1990.