CHAPTER 7
Channel Capacity of N Identical Cascaded Channels
with Utility
7.1 Introduction
Cascaded channels are very often used in practice. They become necessary because some carriers, such as microwave and other electromagnetic waves, do not follow the curvature of the earth, and because the attenuation suffered by the signal becomes prohibitive when the distance becomes large. Under such conditions the designer is forced to break up the whole channel into a cascade of channels; each link of such a cascade is also called an intermediate channel.
The performance of channels in cascade with themselves has been discussed by various authors. Some of them consider the case of a binary symmetric channel cascaded with itself n times (refer to Ash (1965), Cover and Thomas (1991), McEliece (1977)). Gallager (1968) shows that for long cascades of
fixed channels, the capacity tends to zero at least as fast as a certain exponentially
decreasing function. An early paper of Silverman(1955) compares BSC’s and Z
channels of the same capacity and examines which would have higher capacity when
cascaded with itself (i.e. a cascade of length 2). Majani (1988) considers such cascades
for the class of very noisy channels, and also examines the effect of an inverter placed
between the channels. Simon(1970) gives (under certain assumptions) a formula for
the capacity of a k × k channel that is cascaded with itself n times. The capacity is
expressed in terms of eigenvalues and eigenvectors of the individual transition matrix.
From the viewpoint of information theory, the large number of microwave links built in recent years makes the discussion of channels in cascade particularly relevant. Designers know that in cascaded channels it is important to use modulation systems exhibiting noise-reducing properties, such as Frequency Modulation (F.M.) and Pulse Code Modulation (P.C.M.).
There is a lot of difference between the problem of transmitting information
through a single channel and that of transmitting information through a cascade of
channels. In the first case, the transmitter has all the information to be transmitted;
whereas, in the second case, (except for the first transmitter) the information which is
available to each transmitter (to be precise information about what was transmitted by
the first transmitter) is no longer in the form of a symbol but rather in the form of a set of a posteriori probabilities. We should therefore expect the manner in which the intermediate channels operate to be very important for the performance of cascaded channels.
Channels can be cascaded in different ways depending on the conditions at the transmitter and the receiver. In the present chapter we discuss properties of cascaded channels in section 7.2. In section 7.3 we give a classification of cascaded channels. In section 7.4 we define mutual information with utility and prove that it is a concave function of the input probabilities. In section 7.5 we study the channel capacity of N identical cascaded channels with utility.
7.2 Basic Properties of Cascades
Let C1, C2, ..., Cn be a list of n k × k discrete memoryless channels (DMC's). There are n! ways to form a new DMC by cascading the Ci in different orders. Initially we can think of each Ci as a stochastic matrix, so that the stochastic matrix of the new DMC will be equal to a matrix product of the individual DMC's. Since matrix multiplication is not commutative, we expect that the order in which the channels are cascaded will affect the capacity of the overall cascade. Our first observation is that, since the determinant of the overall cascade is the same no matter how the channels are arranged, it suffices to analyze the behaviour of the capacity function for channels of constant determinant.
Let us first consider binary channels, i.e. 2 × 2 stochastic matrices. If C is a binary DMC having transition matrix
$$\begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix},$$
then it is convenient to think of C as the column vector (α, β), i.e. the point in the unit square whose coordinates are the transition error probabilities of the channel.
If C denotes the channel (α, β), then Silverman (1955) gives its capacity in bits/channel use as
$$c(\alpha,\beta)=\frac{\beta\,H(\alpha)-(1-\alpha)\,H(\beta)}{1-\alpha-\beta}+\log_{2}\!\left(1+2^{\frac{H(\beta)-H(\alpha)}{1-\alpha-\beta}}\right),\qquad \text{if }\alpha+\beta\neq 1,$$
and c(α, β) = 0 otherwise. (Here H(x) = -x log x - (1 - x) log(1 - x) is the binary entropy function measured in bits.)
The unique optimizing input distribution (p, 1 - p) is also given by Silverman (1955):
$$p(\alpha,\beta)=\frac{1}{1-\alpha-\beta}\left[(1-\beta)-\left(1+2^{\frac{H(\beta)-H(\alpha)}{1-\alpha-\beta}}\right)^{-1}\right],\qquad \text{for }\alpha+\beta\neq 1.$$
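As a quick numerical illustration of the closed form above (not part of the original development; the function names and test values below are ours), the following sketch evaluates c(α, β) and the optimizing p with base-2 logarithms; for a symmetric channel (α = β) it reproduces the familiar 1 - H(α).

```python
import numpy as np

def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0 by convention."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def capacity(alpha, beta):
    """Silverman's capacity (bits/use) of the binary channel (alpha, beta)."""
    if abs(1 - alpha - beta) < 1e-12:        # output independent of input
        return 0.0
    e = (H(beta) - H(alpha)) / (1 - alpha - beta)
    return (beta * H(alpha) - (1 - alpha) * H(beta)) / (1 - alpha - beta) + np.log2(1 + 2 ** e)

def optimal_p(alpha, beta):
    """Optimizing input probability p, defined for alpha + beta != 1."""
    e = (H(beta) - H(alpha)) / (1 - alpha - beta)
    return ((1 - beta) - 1 / (1 + 2 ** e)) / (1 - alpha - beta)

print(capacity(0.1, 0.1), 1 - H(0.1))            # BSC check: both ~0.531
print(capacity(0.0, 0.5), optimal_p(0.0, 0.5))   # Z-channel: capacity ~0.322, p = 0.6
```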
7.2.1 Cascaded Pairs of Channels
If C1 and C2 are binary channels with transition error probabilities (a, b) and (α, β) respectively, we let C1 ⊗ C2 denote the vector corresponding to the stochastic matrix obtained from the matrix product C1C2. Since matrix multiplication is associative, so is ⊗, which operates on vectors. Since the determinant of C1 is 1 - a - b, we define m(C1) = 1 - a - b. Then the channel vectors C1 and C2 satisfy
$$C_1 \otimes C_2 = C_2 + m(C_2)\,C_1. \qquad (7.2.1.1)$$
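The identity (7.2.1.1) is easy to check numerically. The sketch below (with illustrative channel vectors of our own choosing) builds the 2 × 2 matrices from the channel vectors, multiplies them, and compares the error-probability vector of the product with C2 + m(C2)C1.

```python
import numpy as np

def mat(c):
    """2x2 stochastic matrix of the binary channel with error-probability vector c = (a, b)."""
    a, b = c
    return np.array([[1 - a, a], [b, 1 - b]])

def cascade(c1, c2):
    """Channel vector of C1 followed by C2, read off the matrix product."""
    P = mat(c1) @ mat(c2)
    return np.array([P[0, 1], P[1, 0]])       # error probabilities of the cascade

c1, c2 = np.array([0.10, 0.30]), np.array([0.05, 0.20])
m2 = 1 - c2.sum()                             # determinant of C2
print(cascade(c1, c2))                        # via the matrix product: [0.125 0.425]
print(c2 + m2 * c1)                           # via (7.2.1.1): the same vector
```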
Now observe that the distance from the vector C1 = (a, b) to the BSC line is |a - b|/√2; if we let r(C1) = a - b, then this distance is |r(C1)|/√2. The capacity of a cascade C is maximized (minimized) when |r(C)| is maximized (minimized). We now discuss some simple properties of r. From the definition of r and from (7.2.1.1) it is easy to show that
$$r(C_1 \otimes C_2) = r(C_2) + m(C_2)\,r(C_1). \qquad (7.2.1.2)$$
Next, suppose C1′ is a channel vector with r(C1′) ≥ r(C1). Then if m(C2) > 0, i.e. if the matrix C2 has a positive determinant, we have
$$r(C_1' \otimes C_2) = r(C_2) + m(C_2)\,r(C_1') \ge r(C_2) + m(C_2)\,r(C_1) = r(C_1 \otimes C_2). \qquad (7.2.1.3)$$
Similarly, if r(C2′) ≥ r(C2) and m(C2′) = m(C2), then
$$r(C_1 \otimes C_2') = r(C_2') + m(C_2')\,r(C_1) \ge r(C_2) + m(C_2)\,r(C_1) = r(C_1 \otimes C_2). \qquad (7.2.1.4)$$
The angle that the channel vector C1 = (a, b) makes with the BSC line is θ(C1) = arctan[(a - b)/(a + b)]. By a simple calculation it can be verified that
$$r(C_1 \otimes C_2) \ge r(C_2 \otimes C_1) \iff \theta(C_1) \le \theta(C_2). \qquad (7.2.1.5)$$
By combining these results, we can characterize the optimal cascade under certain conditions.
7.2.2 Cascades of n Channels
Let us consider the locations of the n! points corresponding to each possible cascade of the channels C1, C2, ..., Cn. Since the determinant of the channel matrix of such a cascade is independent of the order of the Ci, all of these points lie on a line in the unit square having slope -1. From the convexity and symmetry of the capacity function, the cascade having highest capacity will be the one that lies farthest from the BSC line. Hence we need only consider the sequences which maximize |r| for the overall cascade.
Definition 7.2.1: A cascade C1 ⊗ C2 ⊗ ... ⊗ Cn is said to be r_max (r_min) optimal if no other cascade of the Ci produces a channel with larger (smaller) r. Hence if |r_max| ≥ |r_min| then the sequence achieving r_max yields the cascade of highest capacity, while if |r_max| < |r_min| then the sequence achieving r_min yields the cascade of highest capacity.
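Since n is small in most applications, an r_max or r_min optimal ordering can simply be found by enumerating all n! cascades. The sketch below (with hypothetical channel vectors) applies (7.2.1.1) repeatedly to each ordering and reports the one with the largest |r|, which by the discussion above has the highest capacity.

```python
import numpy as np
from itertools import permutations

def cascade_vector(channels):
    """Error-probability vector of the cascade taken in the given order, via (7.2.1.1)."""
    c = channels[0].copy()
    for nxt in channels[1:]:
        c = nxt + (1 - nxt.sum()) * c        # cascade so far, followed by the next channel
    return c

channels = [np.array(v) for v in [(0.10, 0.30), (0.05, 0.20), (0.25, 0.15)]]
orders = list(permutations(channels))
r = [cascade_vector(list(o))[0] - cascade_vector(list(o))[1] for o in orders]
best = max(range(len(orders)), key=lambda i: abs(r[i]))
print("r of each ordering:", np.round(r, 4))
print("|r|-maximizing ordering:", [tuple(c) for c in orders[best]])
```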
7.3 Classification of Cascaded Channels
Cascaded channels can be classified into four categories, as discussed below:
7.3.1 Equivalent Channels
It is often convenient to consider a cascade of channels as a unit, that is, to consider the cascade only in terms of its input and output. This unit will be called the cascaded channel. The equivalent channel is then the channel whose statistical properties are identical to those of the cascade, at least as far as its input-output relations are concerned.
The statistical properties of the equivalent channel depend very much on the
assumed operation of the intermediate channels. The change in the operation of the
intermediate station produces very drastic changes in the performance of the
equivalent channel.
All the properties of the equivalent channel can be determined from its transition probability matrix; the transition probability matrix of the equivalent channel is equal to the product of the transition probability matrices of the individual channels of the cascade, the order of the factor matrices being identical to the order of the channels in the cascade.
The channel capacity of the equivalent channel is always smaller than or equal to the smallest channel capacity among the cascaded channels. When the transition probability matrices of all channels are non-singular, equality holds only if all but one of them are noiseless channels.
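Both the product rule and the capacity bound can be checked numerically. The sketch below is illustrative only: it forms the equivalent transition matrix of a hypothetical two-link cascade and estimates each capacity with a standard Blahut-Arimoto iteration (a generic tool, not a technique used in this chapter), confirming that the cascade capacity does not exceed the smaller of the two link capacities.

```python
import numpy as np

def mutual_information(p, P):
    """I(X;Y) in bits for input distribution p and a row-stochastic channel matrix P
    whose entries are all strictly positive."""
    q = p @ P                                    # output distribution
    return float(p @ (P * np.log2(P / q)).sum(axis=1))

def capacity_ba(P, iters=2000):
    """Blahut-Arimoto estimate of the capacity (bits per use) of the DMC P."""
    p = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        q = p @ P
        d = (P * np.log(P / q)).sum(axis=1)      # D(P_i || q) in nats, one value per input
        w = p * np.exp(d)
        p = w / w.sum()
    return mutual_information(p, P)

A1 = np.array([[0.90, 0.10], [0.20, 0.80]])
A2 = np.array([[0.85, 0.15], [0.05, 0.95]])
A_eq = A1 @ A2                                   # equivalent channel of the cascade A1 -> A2
print(capacity_ba(A_eq))                         # never exceeds the value printed next
print(min(capacity_ba(A1), capacity_ba(A2)))
```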
7.3.2 Discrete Channels
Consider a cascade of n discrete channels. Since each of these channels must transmit the same message, we may assume that they have a common alphabet of m symbols. In each channel, appropriate signals are associated with each symbol. We assume that within a particular channel all the signals have the same duration, and that each intermediate channel operates as a "maximum a posteriori probability detector". Under these conditions, in addition to the propagation time, a delay at least equal to Ti will occur in the ith channel, because the receiver must have received the complete signal before being able to compute the a posteriori probabilities.
For each channel, on the basis of the noise statistics and the decoding procedure, it is possible to obtain the transition probabilities, that is, the probability that, a particular symbol xi having been transmitted, some other symbol yj will be received. Let this probability, for the ith channel, be denoted by pi(yj | xi).
7.3.3 Cascade of Repeaters
The intermediate-channel operation discussed in the previous case may introduce additional delay in each channel, and in certain cases this cumulative delay becomes undesirable. It is therefore of interest to consider a case where this delay is reduced to a minimum. In particular, we wish to consider here the case where the signals are retransmitted exactly as they are received.
Let us assume that all channels are band-limited and have the same bandwidth W. Thus the signals are completely defined by a sequence of equidistant samples taken at a rate of 2W samples per second. For simplicity, let us assume that the intermediate channels are repeaters that retransmit the signal exactly as received. Thus, in order to obtain the input-output statistics of the cascade, we need only consider the signal one sample at a time.
Let x be a sample of the first transmitter with its corresponding probability p(x). The sample x is transmitted down the first channel and is received as y1 by the first intermediate station, as y2 by the second intermediate station, and finally as yn by the last receiver.
Each channel is represented by a conditional probability density; for the ith channel, pi(yi | yi-1) gives the probability distribution of the sample yi. The cascade will then be completely defined by the transition probability density p(yn | x).
7.3.4 Cascading of Identical Channels
Quite often in practice one is concerned with transmitting data over a channel that is composed of a cascade of identical sub-channels, e.g. a repeatered telephone line. Calculation of the channel capacity via the standard technique of first finding the overall channel transition matrix becomes extremely tedious as the number N of sub-channels becomes very large (typically N = 3000 for a long-haul telephone link). An alternative, simplified approach to finding the channel capacity combines an eigenvalue technique with the channel capacity calculation. It has been explained in a standard text on information theory by Robert Ash (1965), under the conditions that the sub-channels be discrete, memoryless, and have a non-singular transition matrix.
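The computational saving behind the eigenvalue technique can be seen in a short sketch (the channel values and N below are hypothetical): instead of multiplying the transition matrix N - 1 times, one diagonalizes it once and raises only the eigenvalues to the Nth power.

```python
import numpy as np

A = np.array([[0.95, 0.05],
              [0.02, 0.98]])                   # one sub-channel (row-stochastic)
N = 3000                                       # e.g. a long-haul telephone link
pX = np.array([0.4, 0.6])                      # source distribution

# Direct approach: N - 1 matrix multiplications.
pY_direct = np.linalg.matrix_power(A.T, N) @ pX

# Eigenvalue approach: diagonalize A^T once, then raise only the eigenvalues to the Nth power.
lam, V = np.linalg.eig(A.T)
coeff = np.linalg.solve(V, pX)                 # expand p(X) in the eigenvectors of A^T
pY_eig = (V * lam ** N) @ coeff
print(pY_direct, pY_eig)                       # the two output distributions agree
```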
7.4 Mutual Information with Utility
Let X = (x1, x2, ..., xn) be the input alphabet of the source X and YN = (y1, y2, ..., yn) be the output alphabet of the Nth destination YN. Let p(xi), i = 1, 2, ..., n, and p(yj), j = 1, 2, ..., n, be the probability distributions defined on X and YN respectively.
Let Ai denote the transition matrix of the ith sub-channel. As Ai is assumed non-singular, it must be square; since n letters are to be transmitted, Ai is n × n. The input and output state column vectors for the ith sub-channel are p(Yi-1) and p(Yi), respectively, and are related by
$$p(Y_i) = A_i^{T}\, p(Y_{i-1}). \qquad (7.4.1)$$
It is also true that p(Y0) = p(X), where p(X) denotes the probability distribution vector of the source X. For N identical sub-channels, the source distribution is related to the output state by
$$p(X) = \left[(A^{T})^{-1}\right]^{N} p(Y_N), \qquad (7.4.2)$$
where the subscript i on A^T is dropped because we are assuming that all the sub-channels are identical. Following Kerline (1966), if B denotes A^{-1}, then p(X) can be expressed as
$$p(X) = \sum_{i=1}^{n} a_i\,\lambda_i^{N}\, v_i, \qquad (7.4.3)$$
where λi and vi are, respectively, the ith eigenvalue and the corresponding eigenvector of the matrix B^T. The coefficients ai are determined from the set of equations
$$p(Y_N) = \sum_{i=1}^{n} a_i\, v_i. \qquad (7.4.4)$$
Since the eigenvalues of A^T are always less than or equal to 1 in magnitude, the eigenvalues of B^T = (A^{-1})^T, say λi, i = 1, 2, ..., n, satisfy |λi| ≥ 1. Also, λi = 1 is always one of the eigenvalues, and a repeated eigenvalue 1 will occur if and only if the matrix A^T is reducible. The eigenvectors of B^T are identical to those of A^T. Thus, the eigenvectors corresponding to the non-unit eigenvalues have components that sum to zero, while the elements of the eigenvector corresponding to the eigenvalue 1 are all positive or zero, since they must correspond to probabilities.
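These structural facts are easy to confirm numerically for any concrete non-singular stochastic matrix; the sketch below does so for an arbitrary 3 × 3 example of our own choosing.

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])            # non-singular, row-stochastic sub-channel
lam, V = np.linalg.eig(A.T)                # eigen-structure of A^T (shared with B^T)
print(np.abs(lam))                         # magnitudes <= 1, exactly one eigenvalue equals 1
print(np.abs(1.0 / lam))                   # eigenvalues of B^T = (A^T)^(-1): magnitudes >= 1
print(V.sum(axis=0))                       # column sums ~0 except for the unit eigenvalue
unit = np.argmin(np.abs(lam - 1.0))
print(V[:, unit] / V[:, unit].sum())       # unit eigenvector rescales to a probability vector
```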
Now, the mutual information between the source X and the destination YN can be given as
$$I(X, Y_N) = -\sum_{j=1}^{n} p(y_j)\log p(y_j) - H^{T} p(X), \qquad (7.4.5)$$
where the p(yj) are the elements of p(YN) and H is the column vector whose ith component is
$$H(Y_N \mid x_i) = -\sum_{j=1}^{n} p(y_j \mid x_i)\log p(y_j \mid x_i), \qquad (7.4.6)$$
where p(yj | xi) is the (i, j)th element of the N-step transition matrix of the cascade.
Substituting (7.4.3) in (7.4.5), we obtain
$$I(X, Y_N) = -\sum_{j=1}^{n} p(y_j)\log p(y_j) - \sum_{i=1}^{n} a_i\,\lambda_i^{N}\, H^{T} v_i. \qquad (7.4.7)$$
Let ui be the utility attached to the symbol xi of the common alphabet, so that the same utility uj attaches to the output symbol yj. Here we are assuming that the utilities are the same for all of the N cascaded sub-channels.
Then the 'useful' mutual information for N identical cascaded channels is defined as
$$I(X, Y_N; U) = H(Y_N; U) - \sum_{i=1}^{n} a_i\,\lambda_i^{N}\, r_i, \qquad (7.4.8)$$
where ri = H^T vi, H^T being the transpose of the column vector H whose ith component is now
$$H(Y_N \mid x_i; U) = \frac{-\sum_{j=1}^{n} u_j\, p(y_j \mid x_i)\log p(y_j \mid x_i)}{\sum_{i=1}^{n}\sum_{j=1}^{n} u_j\, p(x_i, y_j)}. \qquad (7.4.9)$$
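Under the definitions (7.4.8)-(7.4.9), I(X, YN; U) can be evaluated directly from the sub-channel matrix, since by (7.4.3) the second term of (7.4.8) equals H^T p(X). The sketch below does this for a hypothetical channel, utility vector and N; with all utilities equal to 1 it reduces to the ordinary mutual information.

```python
import numpy as np

def useful_mutual_information(pX, A, u, N):
    """'Useful' mutual information I(X, Y_N; U) of an N-fold cascade of the row-stochastic
    sub-channel A, with symbol utilities u, following (7.4.8)-(7.4.9)."""
    PN = np.linalg.matrix_power(A, N)              # p(y_j | x_i) for the whole cascade
    pY = pX @ PN                                   # output distribution p(Y_N)
    k = float(u @ pY)                              # normalizer: sum_j u_j p(y_j)
    H_out = -(u * pY * np.log2(pY)).sum() / k      # 'useful' output entropy H(Y_N; U)
    cond = -(u * PN * np.log2(PN)).sum(axis=1) / k # H(Y_N | x_i; U) as in (7.4.9)
    return H_out - float(pX @ cond)                # (7.4.8), second term evaluated as H^T p(X)

A = np.array([[0.9, 0.1], [0.2, 0.8]])
u = np.array([1.0, 2.0])                           # utilities attached to the two symbols
print(useful_mutual_information(np.array([0.5, 0.5]), A, u, N=5))
print(useful_mutual_information(np.array([0.5, 0.5]), A, np.array([1.0, 1.0]), N=5))
```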
Theorem 7.4.1 The average 'useful' mutual information processed by N identical cascaded channels is a concave function of the input probabilities.
Proof. Let b1, b2, ..., bl be non-negative numbers such that $\sum_{k=1}^{l} b_k = 1$. Let us define the input distribution
$$P_0(x) = \sum_{k=1}^{l} b_k\, P_k(x); \qquad (7.4.10)$$
then we shall prove that the average 'useful' mutual information corresponding to P0(x) satisfies
$$I_0(X, Y_N; U) \ge \sum_{k=1}^{l} b_k\, I_k(X, Y_N; U), \qquad (7.4.11)$$
where Ik(X, YN; U) is the average 'useful' mutual information corresponding to the input distribution Pk(x).
Let
$$\Delta I = I_0(X, Y_N; U) - \sum_{k=1}^{l} b_k\, I_k(X, Y_N; U);$$
then
$$\Delta I = H_0(Y_N; U) - \sum_{i=1}^{n} a_i^{0}\,\lambda_i^{N}\, r_i^{0} - \sum_{k=1}^{l} b_k\left[H_k(Y_N; U) - \sum_{i=1}^{n} a_i^{k}\,\lambda_i^{N}\, r_i^{k}\right] \qquad (7.4.12)$$
$$= -\frac{\sum_{j=1}^{n} u_j\, p_0(y_j)\log p_0(y_j)}{\sum_{j=1}^{n} u_j\, p_0(y_j)} - \sum_{i=1}^{n} a_i^{0}\,\lambda_i^{N}\, r_i^{0} + \sum_{k=1}^{l} b_k\,\frac{\sum_{j=1}^{n} u_j\, p_k(y_j)\log p_k(y_j)}{\sum_{j=1}^{n} u_j\, p_k(y_j)} + \sum_{k=1}^{l} b_k \sum_{i=1}^{n} a_i^{k}\,\lambda_i^{N}\, r_i^{k},$$
where the superscripts 0 and k indicate quantities evaluated under the input distributions P0(x) and Pk(x) respectively.
Since p0(yj | xi) is the bk-weighted sum of the channel probabilities pk(yj | xi), k = 1, 2, ..., l, we have
$$\sum_{k=1}^{l} b_k\,\frac{-\sum_{j=1}^{n} u_j\, p_k(y_j \mid x_i)\log p_k(y_j \mid x_i)}{\sum_{i=1}^{n}\sum_{j=1}^{n} u_j\, p(y_j \mid x_i)} = H_0(Y_N \mid x_i; U),$$
or
$$H_0(Y_N \mid x_i; U) = \sum_{k=1}^{l} b_k\, H_k(Y_N \mid x_i; U).$$
It implies that
$$H_0^{T} = \sum_{k=1}^{l} b_k\, H_k^{T},$$
and hence
$$\sum_{i=1}^{n} a_i^{0}\,\lambda_i^{N}\, r_i^{0} = \sum_{k=1}^{l} b_k \sum_{i=1}^{n} a_i^{k}\,\lambda_i^{N}\, r_i^{k}. \qquad (7.4.13)$$
Equation (7.4.12) together with (7.4.13) reduces to
$$\Delta I = H_0(Y_N; U) - \sum_{k=1}^{l} b_k\, H_k(Y_N; U).$$
It implies that
$$\Delta I = -\frac{\sum_{j=1}^{n} u_j\, p_0(y_j)\log p_0(y_j)}{\sum_{j=1}^{n} u_j\, p_0(y_j)} + \sum_{k=1}^{l} b_k\,\frac{\sum_{j=1}^{n} u_j\, p_k(y_j)\log p_k(y_j)}{\sum_{j=1}^{n} u_j\, p_k(y_j)}. \qquad (7.4.14)$$
Now, since $P_0 = \sum_{k=1}^{l} b_k P_k$, (7.4.14) can be written as
$$\Delta I = \frac{\sum_{k=1}^{l} b_k\left[\sum_{j=1}^{n} u_j\, p_k(y_j)\log p_k(y_j) - \sum_{j=1}^{n} u_j\, p_k(y_j)\log p_0(y_j)\right]}{\sum_{j=1}^{n} u_j\, p_0(y_j)}.$$
By the generalized Shannon inequality, we have
$$-\sum_{j=1}^{n} u_j\, p_k(y_j)\log p_0(y_j) \;\ge\; -\sum_{j=1}^{n} u_j\, p_k(y_j)\log p_k(y_j),$$
with equality only if pk(yj) = p0(yj) for each j. It implies
$$\sum_{j=1}^{n} u_j\, p_k(y_j)\log p_k(y_j) - \sum_{j=1}^{n} u_j\, p_k(y_j)\log p_0(y_j) \;\ge\; 0. \qquad (7.4.15)$$
Hence (7.4.15) together with (7.4.14) gives ΔI ≥ 0, since $\sum_{j=1}^{n} u_j\, p_0(y_j)$ is always positive. This completes the proof of Theorem 7.4.1.
7.5 Channel Capacity of Cascaded Channels
Channel capacity, as developed by Shannon(1948) is a key concept in
information theory. It is an important measure of channel-performance but it entails a
free choice of input distribution. In practice, however, several unrelated factors for
example, time, energy and money constraints, may restrict this choice and channel
performance has then to be considered under these restrictions.
A general method for determining the channel capacity of N cascaded identical discrete memoryless non-singular channels was studied by Simon (1970). We discuss the same technique and extend its application to the case in which the source alphabet has utilities in addition to probabilities.
From (7.4.8), we have
$$I(X, Y_N; U) = H(Y_N; U) - \sum_{i=1}^{n} a_i\,\lambda_i^{N}\, r_i, \qquad (7.5.1)$$
so the channel capacity of N identical cascaded channels with utility is given by
$$C_N(U) = \max_{\{p(x_i)\}} I(X, Y_N; U). \qquad (7.5.2)$$
Theorem 7.5.1 Let Ai be the stochastic matrix of the ith sub-channel, which is non-singular and square, and let qij be the elements of A^{-1} for i, j = 1, 2, ..., n. Then the channel capacity of N identical cascaded channels with utilities is given by
$$C(U) = H(Y_N; U) - \sum_{i=1}^{n} q_{ij}\,\lambda_i^{N}\, r_i\,\frac{\partial a_i}{\partial p(x_i)}, \qquad (7.5.3)$$
provided $\sum_{j=1}^{n} u_j\, p(y_j) = k$, a constant.
Proof. As we know, the 'useful' mutual information for N identical cascaded channels is given by
$$I(X, Y_N; U) = H(Y_N; U) - \sum_{i=1}^{n} a_i\,\lambda_i^{N}\, r_i. \qquad (7.5.4)$$
The equation (7.5.4) may be regarded as being defined for all non-negative real values of the probabilities p(xi), together with the corresponding positive utilities ui. We have to maximize (7.5.4) subject to the condition $\sum_{i=1}^{n} p(x_i) = 1$. For this we assume that the solution does not involve any p(xi) = 0 and apply the method of Lagrange multipliers to
$$I(X, Y_N; U) - \mu N\left(\sum_{i=1}^{n} p(x_i) - 1\right), \qquad (7.5.5)$$
where μ is the Lagrange multiplier.
Differentiating with respect to p(xi) and equating to zero gives
$$\frac{\partial I(X, Y_N; U)}{\partial p(x_i)} - \mu N = 0. \qquad (7.5.6)$$
As we know,
$$H(Y_N; U) = \frac{-\sum_{j=1}^{n} u_j\, p(y_j)\log p(y_j)}{\sum_{j=1}^{n} u_j\, p(y_j)},$$
which implies
$$H(Y_N; U) = -\frac{1}{k}\sum_{j=1}^{n} u_j\, p(y_j)\log p(y_j), \qquad (7.5.7)$$
where $k = \sum_{j=1}^{n} u_j\, p(y_j)$.
Partially differentiating both sides of (7.5.7) with respect to p(xi), we have
$$\frac{\partial H(Y_N; U)}{\partial p(x_i)} = \sum_{j=1}^{n}\frac{\partial H(Y_N; U)}{\partial p(y_j)}\,\frac{\partial p(y_j)}{\partial p(x_i)} = -\frac{1}{k}\sum_{j=1}^{n} u_j\,[1 + \log p(y_j)]\, p(y_j \mid x_i). \qquad (7.5.8)$$
The equation (7.5.6) together with (7.5.8) reduces to
$$-\sum_{j=1}^{n} u_j\,[1 + \log p(y_j)]\, p(y_j \mid x_i) - k\sum_{i=1}^{n}\frac{\partial\left(a_i\,\lambda_i^{N}\, r_i\right)}{\partial p(x_i)} - k\mu N = 0.$$
It implies that
$$-\sum_{j=1}^{n} u_j\,[1 + \log p(y_j)]\, p(y_j \mid x_i) - k\mu N = k\sum_{i=1}^{n}\frac{\partial\left(a_i\,\lambda_i^{N}\, r_i\right)}{\partial p(x_i)}. \qquad (7.5.9)$$
Since $\sum_{j=1}^{n} p(y_j \mid x_i) = 1$, (7.5.9) can be written as
$$-\sum_{j=1}^{n}\left[u_j\{1 + \log p(y_j)\} + k\mu N\right] p(y_j \mid x_i) = k\sum_{i=1}^{n}\frac{\partial\left(a_i\,\lambda_i^{N}\, r_i\right)}{\partial p(x_i)}.$$
Since A is a non-singular matrix, its inverse exists. Multiplying both sides by $\left[(A^{T})^{-1}\right]^{N} = \left[(A^{-1})^{T}\right]^{N}$, we have
$$-\sum_{j=1}^{n} u_j\,[1 + \log p(y_j)] - k\mu N = k\sum_{i,j=1}^{n} q_{ij}\,\lambda_i^{N}\, r_i\,\frac{\partial a_i}{\partial p(x_i)}. \qquad (7.5.10)$$
Multiplying both sides of (7.5.10) by p(yj) and summing over j, we obtain
$$1 + \mu N = H(Y_N; U) - \sum_{i=1}^{n} q_{ij}\,\lambda_i^{N}\, r_i\,\frac{\partial a_i}{\partial p(x_i)}. \qquad (7.5.11)$$
By the previous theorem, I(X, YN; U) is the sum of a concave and a linear function and is therefore concave on the set of non-negative numbers whose sum is unity. It follows that for a given N we can find an absolute maximum of the function over the domain p(xi) ≥ 0, $\sum_{i=1}^{n} p(x_i) = 1$; thus the solution yields an absolute maximum for the information processed.
If we multiply both sides of (7.5.9) by p(xi) and sum over i, we have
$$-\sum_{j=1}^{n} u_j\, p(y_j) - \sum_{j=1}^{n} u_j\, p(y_j)\log p(y_j) - k\mu N = k\sum_{i=1}^{n} a_i\,\lambda_i^{N}\, r_i,$$
or
$$\sum_{i=1}^{n} a_i\,\lambda_i^{N}\, r_i + 1 - H(Y_N; U) + \mu N = 0,$$
or
$$H(Y_N; U) - \sum_{i=1}^{n} a_i\,\lambda_i^{N}\, r_i = 1 + \mu N.$$
Hence
$$C(U) = \max I(X, Y_N; U) = 1 + \mu N. \qquad (7.5.12)$$
From (7.5.11) and (7.5.12) together, we get
$$C(U) = H(Y_N; U) - \sum_{i=1}^{n} q_{ij}\,\lambda_i^{N}\, r_i\,\frac{\partial a_i}{\partial p(x_i)},$$
which is the channel capacity of N identical cascaded channels. This completes the proof of Theorem 7.5.1.
7.6 Conclusion
The characteristic difference between the problem of communication through channels in cascade and that of communication through a single channel is that, in the latter case, the transmitter possesses complete knowledge of what it should transmit. In this chapter we have studied the different types of cascaded channels. We have also defined the channel capacity of N identical cascaded channels and proved two theorems about it.
The general problem of determining the optimal ordering of n arbitrarily chosen binary channels remains open. However, we have introduced a class of information-theoretic problems that deals with channel performance in cascade and estimates the channel capacity with utility.