
On the Impact of Information Theory
on Today’s Communication Technology
Johannes B. Huber
Robert F.H. Fischer
Lehrstuhl für Informationsübertragung, Friedrich–Alexander–Universität Erlangen–Nürnberg,
Cauerstraße 7/LIT, 91058 Erlangen, Germany, Email: {huber,fischer}@LNT.de
Dedicated to our esteemed colleague Prof. Dr. Heinz Gerhäuser on the occasion of his 60th birthday
Abstract
The impact of results from Information Theory on today's information technology is illustrated by means of derivations from the capacity formula and the chain rule. In this way, the huge possible increase in power and bandwidth efficiency for the transmission of analog source signals via modern digital communication techniques is highlighted. Moreover, it is pointed out how modern approaches like OFDM, CDMA, coded modulation, precoding for equalization, signaling over MIMO channels, etc. are founded on theorems of Information Theory, especially the chain rule.
1. Introduction
The year of birth of the "Age of Information" can be dated to 1948. The almost unbelievable story of success and progress of information technology was initiated by the invention of the transistor and, of comparable importance, by its theoretical foundation, which was developed within the same company, almost at the same place, and at the same time by Claude E. Shannon (1916–2001). The publication of his epoch-making paper "A Mathematical Theory of Communication" [19], reprinted one year later as a textbook and then entitled "The Mathematical Theory of Communication", together with the thereby initiated new research area "Information Theory", had an enormous influence on the subsequent development. But this aspect is usually far less recognized than the success story of microelectronics.
The purpose of this paper is to broaden the recognition of the role of this mathematical theory for today's technology using two exemplary theorems of Information Theory. In doing so, the authors also intend to emphasize the enormous importance of mathematical background knowledge in general, and of Information Theory in particular, within the curriculum for students in electrical and communication engineering.
Information Theory provides an objective measure of information, which is verified by the fundamental theorems on the possibilities and limitations of source and channel coding. These results immediately initiated important progress in source coding, channel coding, digital communication schemes, cryptography, and further fields, which characterize today's information technology.
2. The Capacity Formula

The most important model for communication channels is the discrete-time additive white Gaussian noise (AWGN) channel (see Fig. 1). This appraisal is justified not only by the model's simplicity, but also by the central limit theorem of statistics when noise emerges from many independent sources (white thermal noise). Moreover, it is a worst-case model for a given signal-to-noise ratio (SNR): As the Gaussian probability density function (pdf) yields maximum differential entropy for fixed variance, white Gaussian noise is the "most random" of all wide-sense stationary processes with fixed variance, i.e., the "most noisy" noise possible.

Figure 1. Discrete-time AWGN channel model: Y = X + N, where an i.i.d. Gaussian random variable N with variance σ_N² = N is added to the signal variable X with variance σ_X² = S.
The derivation of the capacity formula of the discrete-time AWGN channel, i.e., adding an i.i.d. Gaussian random variable N with variance σ_N² = N to the signal variable X with variance limited to σ_X² = S, is rather involved and not as simple as the result

C = \frac{1}{2} \log_2\!\left(1 + \frac{S}{N}\right) \left[\frac{\text{bit}}{\text{channel use}}\right] \quad \text{for } X, N, Y \in \mathbb{R}. \quad (1)
This capacity is achieved, i.e., mutual information I(X; Y) is maximized, for an i.i.d. random input variable X with a Gaussian pdf, too. Eq. (1) holds for real random variables, whereas, when modeling a passband channel by means of the complex baseband signal representation, the capacity reads

C = \log_2\!\left(1 + \frac{S}{N}\right) \left[\frac{\text{bit}}{\text{channel use}}\right] \quad \text{for } X, N, Y \in \mathbb{C}. \quad (2)
Employing the sampling theorem for band-limited signals with single-sided bandwidth B_RF, i.e., at most 2B_RF independent channel uses are possible per second, the famous capacity formula for the band-limited continuous-time AWGN channel results in both cases in

C_T = B_{RF} \log_2\!\left(1 + \frac{S}{N_0 B_{RF}}\right) \left[\frac{\text{bit}}{\text{s}}\right]. \quad (3)

Here, N_0 denotes the one-sided power spectral density (psd) of the continuous-time white Gaussian noise.
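To make the formulas concrete, here is a minimal Python sketch (our addition, not part of the original paper); the function names and the numerical example are illustrative assumptions.

import math

def capacity_real(snr):
    """Capacity (1) of the real-valued discrete-time AWGN channel [bit/channel use]."""
    return 0.5 * math.log2(1.0 + snr)

def capacity_complex(snr):
    """Capacity (2) for the complex baseband model [bit/channel use]."""
    return math.log2(1.0 + snr)

def capacity_bandlimited(S, N0, B_RF):
    """Capacity (3) of the band-limited continuous-time AWGN channel [bit/s]."""
    return B_RF * math.log2(1.0 + S / (N0 * B_RF))

# Example: S/N = 15 per channel use; B_RF = 1 MHz with S/(N0*B_RF) = 15
print(capacity_real(15.0))                        # 2.0 bit/channel use
print(capacity_complex(15.0))                     # 4.0 bit/channel use
print(capacity_bandlimited(15e-6, 1e-12, 1e6))    # 4.0 Mbit/s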
2.1 Power–Bandwidth Plane for Transmission of Analog Signals
Let an analog wide-sense stationary random source signal q(t) ∈ ℝ, t ∈ ℝ, be specified by its one-sided bandwidth B_LF and its peak-to-average power ratio PAPR = max_{∀t} |q(t)|² / E{|q(t)|²}, which expresses the spikiness of q(t).
The transmitter transforms q(t) into a transmit signal s(t) with average power S, which occupies an interval of bandwidth B_RF on the positive frequency axis. A modulation scheme q(t) → s(t) is characterized by the bandwidth efficiency Γ_a for transmitting an analog signal and by the power efficiency, defined as the minimum normalized signal-to-noise ratio sufficient to achieve a desired quality of the communication scheme, specified by a signal-to-noise ratio SNR_LF,min for the receiver output signal:
\Gamma_a \overset{\text{def}}{=} \frac{B_{LF}}{B_{RF}}\,, \qquad \min \text{SNR}_0 \ \text{ for } \ \text{SNR}_{LF} \geq \text{SNR}_{LF,\min}\,, \quad (4)

\text{with } \ \text{SNR}_0 \overset{\text{def}}{=} \frac{S}{N_0 B_{LF}} = \frac{\text{SNR}_{RF}}{\Gamma_a}\,, \qquad \text{SNR}_{RF} = \frac{S}{N_0 B_{RF}}\,.
The normalized signal-to-noise ratio SNR_0 characterizes the channel quality independently of the actual bandwidth efficiency Γ_a; it equals the SNR which would be valid for transmission without applying any modulation.
Each modulation scheme is marked by a point or a curve (if there is some free parameter) in the power–bandwidth plane (min SNR_0, Γ_a). High efficiency corresponds to a point towards the upper left corner. Fig. 2 shows this power–bandwidth plane for different amplitude modulation formats (AM), frequency modulation (FM), and for simple pulse code modulation (PCM, 12 bit/sample) combined with uncoded digital quadrature-amplitude modulation (QAM) schemes. Additionally, several theoretical curves are depicted which will be discussed subsequently. The desired quality 10 log_10(SNR_LF,min) = 60 dB is chosen, corresponding to a medium-quality audio signal transmission. The choice of the parameter 10 log_10(PAPR) = 13 dB corresponds to an overload probability of less than 10^{−5}, assuming a Gaussian random process for the source signal. The equations and parameters specifying the results in Fig. 2 are summarized in Table I.
Figure 2. Power–bandwidth plane for analog modulation formats (abscissa: 10 log_10(SNR_0) [dB]; ordinate: Γ_a). Shown are AM SSB, AM DSB with and without carrier, FM (parameterized by Δf), PCM with uncoded digital QAM (M = 16, 32, ..., 1024), the Shannon bound (transparent transmission), and source coding with n = 2 plus ideal channel coding (non-transparent).
2.2 Digital Representation of Analog Signals
Digital representation of analog signals offers a lot of important advantages, like quasi noise-free and perfectly reproducible signal processing, perfect signal regeneration as long as symbol errors are avoided, noise-free reproduction, etc. For digitizing, some signal distortion (quantization noise) has to be accepted. But as a perfect, noise-free observation of an analog random variable is not possible in principle, the information per analog sample is nevertheless restricted to a finite amount, cf. (1).
A principal limit for the accuracy in digitizing a real-valued analog signal can immediately be derived from the capacity formula (1). Imagine a digitizer for a sequence of analog samples producing n binary symbols per sample on average, which are transmitted to a reconstruction unit reproducing the samples with some quantization error. The signal path from the digitizer input to the reconstruction unit output forms a noisy channel, providing some capacity C per sample. Obviously, this capacity cannot exceed the average number of binary symbols actually being transmitted: C ≤ n (data processing theorem). Using (1), the bound

\text{SNR}_{LF} \leq 2^{2n} - 1 \quad (5)

on the signal-to-noise ratio due to the digital representation of analog samples results, which is the so-called distortion-rate function for a Gaussian random variable [2] (version for unbiased signals). This bound can be approached as closely as desired by vector quantization, i.e., by combining K samples into vectors and performing suitable quantization in the space ℝ^K, as long as K is chosen large enough. A reasonable practical solution is described in [13], [16], using spherical coordinates and uniform quantization of the angles. Here, the bound (5) is missed by a factor of 1.42 (1.53 dB) in favor of an extreme reduction in complexity, while a very high dynamic range for the input samples is gained for free.
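As a quick numeric illustration (our addition, assuming the Gaussian distortion-rate bound (5) and the 1.53 dB gap quoted above), the following sketch tabulates the maximum SNR per n and the value reached by spherical logarithmic quantization:

import math

def snr_bound_db(n):
    """Distortion-rate bound (5): maximum SNR_LF for n bit/sample, in dB."""
    return 10.0 * math.log10(2.0 ** (2 * n) - 1.0)

# Spherical logarithmic quantization [13], [16] misses (5) by about 1.53 dB:
for n in (8, 12, 16):
    print(n, round(snr_bound_db(n), 1), round(snr_bound_db(n) - 1.53, 1))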
2.3 Power–Bandwidth Plane for Digital Transmission
Efficient digital transmission over the AWGN channel is provided by digital pulse-amplitude modulation (PAM). Digital symbols are mapped to amplitude coefficients, also called signal points, a_i ∈ A, i ∈ {1, 2, ..., M}, out of the signal constellation A with M elements, i.e., |A| = M. Per modulation interval, an impulse g(t) is scaled by one of these coefficients to form the PAM transmit signal
s(t) = \sum_{k=-\infty}^{\infty} a_m[k]\, g(t - kT)\,. \quad (6)
Here, T denotes the modulation interval; usually square-root Nyquist impulses g(t) are used [17], providing intersymbol-interference-free detection at the output of the corresponding matched filter at the receiver.
Table I. Parameters for the analog modulation formats in the power–bandwidth plane, Fig. 2.

AM DSB w/o carrier: Γ_a = 0.5, SNR_LF = SNR_0

AM DSB w/ carrier (modulation index m = 0.8): Γ_a = 0.5, SNR_LF = SNR_0 · (m²/PAPR)/(1 + m²/PAPR)

AM SSB: Γ_a = 1.0, SNR_LF = SNR_0

FM, max. frequency deviation Δf: B_RF = 2(Δf + 2B_LF) (Carson; high quality); Γ_a = 1/(2Δf/B_LF + 4); SNR_LF = 1/(1/SNR_C + 1/SNR_L), with SNR_C = 3(Δf/B_LF)² SNR_0/PAPR (high-SNR regime) and SNR_L = Γ_a (Δf/B_LF)² / Q(√(2 SNR_0 Γ_a)) / PAPR (FM clicks), where Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt

PCM, uniform quantization: SNR_LF = 3 · 2^{2n}/PAPR

PCM, logarithmic quantization: SNR_LF = 0.1 · 2^{2n} (A-law, A = 88)
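The rows of Table I are directly executable. The following sketch (our addition; parameter values taken from the text: PAPR = 13 dB, target quality 60 dB) evaluates the AM DSB and FM rows; the chosen SNR_0 values are illustrative assumptions.

import math

PAPR = 10 ** (13 / 10)                    # 13 dB (Gaussian source signal)

def Q(x):                                  # Gaussian tail integral
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr_lf_am_dsb_carrier(snr0, m=0.8):
    """AM DSB with carrier row of Table I (Gamma_a = 0.5)."""
    f = (m * m / PAPR) / (1.0 + m * m / PAPR)
    return snr0 * f

def snr_lf_fm(snr0, df_over_blf=5.0):
    """FM row of Table I (Carson bandwidth, including the click-noise term)."""
    gamma_a = 1.0 / (2.0 * df_over_blf + 4.0)
    snr_c = 3.0 * df_over_blf ** 2 * snr0 / PAPR
    snr_l = gamma_a * df_over_blf ** 2 / Q(math.sqrt(2.0 * snr0 * gamma_a)) / PAPR
    return 1.0 / (1.0 / snr_c + 1.0 / snr_l)

# AM DSB with carrier needs about 75 dB of SNR_0 for 60 dB output quality:
print(10 * math.log10(snr_lf_am_dsb_carrier(10 ** 7.51)))   # ~60 dB
for snr0_db in (10.0, 20.0, 30.0):                           # FM threshold effect
    print(snr0_db, 10 * math.log10(snr_lf_fm(10 ** (snr0_db / 10))))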
The bandwidth efficiency of digital signaling is defined as

\Gamma_d \overset{\text{def}}{=} \frac{\varphi_b}{B_{RF}} = \frac{R_c \log_2(M)}{1 + \alpha}\,, \quad \text{with } \varphi_b = \frac{R_c \log_2(M)}{T}\,, \; B_{RF} = \frac{1+\alpha}{T}\,. \quad (7)
Here, φ_b = 1/T_b [bit/s] denotes the transmitted data rate, T_b the bit interval, R_c the rate of the binary channel code, and α the roll-off factor (also called bandwidth extension factor) of the square-root Nyquist impulse.
The power efficiency is specified by the minimum quotient E_b/N_0 sufficient to achieve a desired data reliability, i.e., bit error rate BER ≤ BER_T (tolerable bit error rate). The signal energy per information bit is given by E_b = S · T_b, independent of the specific coding and modulation scheme. Fig. 3 shows the power–bandwidth plane for digital transmission. The locations of several M-PSK schemes, i.e., a_i = e^{j2π(i−1)/M}, and M-QAM schemes, i.e., a_i ∈ (2ℤ+1) + j(2ℤ+1), without channel coding for α = 0 and BER_T = 10^{−8} are depicted.

Figure 3. Power–bandwidth plane for digital modulation formats (abscissa: 10 log_10(E_b/N_0) [dB]; ordinate: Γ_d). Shown are uncoded 2 PSK, 4 QAM = 4 PSK, 8 PSK, 16 PSK, 16 QAM, 32 QAM, 64 QAM, and 128 QAM for a tolerated bit error ratio BER_T ≤ 10^{−8}, the Shannon bound, and state-of-the-art digital communication (Turbo/LDPC codes, signal shaping).

For PCM with n bit/sample, i.e., φ_b = 2B_LF · n, these points are translated into the power–bandwidth plane for the transmission of analog signals using
\text{SNR}_0 = \frac{S}{N_0 B_{LF}} = \frac{E_b}{N_0} \cdot 2n\,, \qquad \Gamma_a = \frac{B_{LF}}{B_{RF}} = \frac{\Gamma_d}{2n}\,, \quad (8)
cf. Fig. 2.
The channel coding theorem of Information Theory states that reliable digital communication over noisy channels is possible in principle as long as redundant channel coding with code rate R_c is applied, such that the total transmission rate R [bit/channel symbol] does not exceed the channel capacity, R ≤ C, and codewords of sufficient length N are used (N → ∞ for BER → 0, R → C). For R > C, reliable communication is impossible in principle.
Assuming an optimum, capacity-achieving coded modulation scheme,

\varphi_b \overset{!}{=} C_T = B_{RF} \log_2\!\left(1 + \frac{S \cdot T_b}{N_0 B_{RF} T_b}\right) = B_{RF} \log_2\!\left(1 + \frac{E_b}{N_0 B_{RF} T_b}\right) \quad (9)
is valid, from which the Shannon bound for digital communication over the AWGN channel immediately results:

\Gamma_d = \log_2\!\left(1 + \frac{E_b}{N_0} \Gamma_d\right) \quad \text{or} \quad \frac{E_b}{N_0} = \frac{2^{\Gamma_d} - 1}{\Gamma_d}\,, \quad (10)
which is also plotted in Fig. 3. This bound has initiated extensive research activities in channel coding and digital communications since 1948, but only in 1993 were solutions found that approach this bound sufficiently closely for practice. The success in signaling near the Shannon bound started with the invention of Turbo codes [3], which also led to a revival of low-density parity-check (LDPC) codes [10], [12] and finally to the design of multiple turbo codes [15] and irregular Turbo and LDPC codes [18].
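The Shannon bound (10) is easy to evaluate; the following sketch (our addition) prints the minimum E_b/N_0 over a few illustrative bandwidth efficiencies:

import math

def eb_n0_min_db(gamma_d):
    """Shannon bound (10): minimum Eb/N0 [dB] for bandwidth efficiency Gamma_d."""
    return 10.0 * math.log10((2.0 ** gamma_d - 1.0) / gamma_d)

for g in (0.001, 1.0, 2.0, 4.0, 6.0):
    print(g, round(eb_n0_min_db(g), 2))
# For Gamma_d -> 0 the bound approaches the ultimate limit ln(2), i.e., -1.59 dB.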
Additionally, for M > 2-ary modulation schemes, a transmit signal with a near-Gaussian pdf is necessary in order to approach the Shannon bound. For this purpose, several methods for signal shaping were developed [6]. If the data delay due to a large code length is no problem, these methods nowadays can indeed provide digital communication very close to the Shannon bound, cf. Fig. 3.
2.4 Optimum Transmission of Analog Signals

Combining (5) for optimum digitizing of analog signals and (10) for optimum digital communication by means of (8) gives the following result for the optimum trade-off between power and bandwidth efficiency for transparent transmission of analog signals via digital communications:

\text{SNR}_{LF} = (1 + \text{SNR}_0 \cdot \Gamma_a)^{1/\Gamma_a} - 1\,, \quad (11)
see Fig. 2. Here, the expression "transparent" means that the transmission scheme imposes no restriction on the source signal q(t) besides the bandwidth limitation to B_LF. (As (5) holds for any amplification of the source signal, not even a restriction on its amplitude range is imposed.) Please notice that for Γ_a = 1.0, eq. (11) yields SNR_LF = SNR_0, i.e., AM SSB is an optimum transparent transmission scheme.
Transmitter, channel, and receiver form a chain of data processors, cf. Fig. 4.

Figure 4. Basic communication model: X → transmitter → S → AWGN channel → R → receiver → Y; the RF channel S → R has capacity C_T,RF, the overall LF channel X → Y has capacity C_T,LF.

Transmitter and receiver can at best be
lossless data processors. Only in this case is the capacity C_T,LF of the LF channel X → Y equal to the capacity C_T,RF of the RF channel S → R, i.e.,
C_{T,LF} = B_{LF} \log_2(1 + \text{SNR}_{LF}) \leq C_{T,RF} = B_{RF} \log_2(1 + \text{SNR}_{RF})\,, \quad (12)
which implies
\text{SNR}_{LF} \leq (1 + \text{SNR}_{RF})^{B_{RF}/B_{LF}} - 1\,. \quad (13)
Using definitions (4),
\text{SNR}_{LF} \leq (1 + \text{SNR}_0 \cdot \Gamma_a)^{1/\Gamma_a} - 1 \quad (14)
results, which coincides with (11). This shows that digitizing the source signal and subsequent digital transmission is indeed an optimum approach for the transmission of analog signals. Moreover, the procedures of digitizing and digital communication can be optimized quite independently, which usually is referred to as the separability of source and channel coding. Therefore, this approach, which is usually applied in practice, is well founded on results from Information Theory.
2.5 Non–Transparent Transmission

The efficiency of communication can be increased further by data compression, exploiting redundancy and irrelevancy. But in this case the feature of transparency is lost: transmission is restricted to typical sources and typical sinks. W.l.o.g., let us here consider sequences of discrete symbols from an L-ary alphabet and with length K. Irrelevancy means that the sink is not able, or not willing, to distinguish all L^K = 2^{K log_2(L)} possible messages, i.e., the set of all different messages can be grouped into 2^{K H_r(Y)} disjoint subsets. On the one hand, for each subset one representative sequence suffices without any performance loss being noticeable or relevant for the imperfect sink. On the other hand, out of all 2^{K log_2(L)} possible messages only 2^{K H(X)} need to be encoded.
Here, H(X) denotes the entropy of the source. Assuming independent source and sink properties, only a comparably small number 2^{K H(X)} · 2^{K H_r(Y)} / 2^{K \log_2(L)} of the different sequences has to be encoded, for which on average

n = H(X) + H_r(Y) - \log_2(L) \quad [\text{bit/source symbol}] \quad (15)

binary symbols suffice per source symbol. For K → ∞, the actual number approaches this expectation.
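Eq. (15) is plain arithmetic; a toy evaluation (our addition, with hypothetical entropy values) illustrates it:

import math

def mean_bits_per_symbol(H_source, H_r_sink, L):
    """Eq. (15): average number of binary symbols per source symbol."""
    return H_source + H_r_sink - math.log2(L)

# Hypothetical example: 8-bit samples (L = 256), source entropy 6 bit/symbol,
# sink only resolves subsets worth 7 bit:
print(mean_bits_per_symbol(6.0, 7.0, 256))   # 5.0 bit/source symbol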
Usually, encoding of audio signals with n = 1.5 to 2 bit/sample yields very high quality for the human listener, whereas for video signals, encoding down to n ≈ 0.1 bit/pixel is reasonable. In Fig. 2, the curve for n = 2 bit/sample combined with the results of (8) and (10) (optimum digital communications) is depicted.
Eqs. (8) and (10) indicate that gains from source encoding count threefold for the performance increase: no energy and no bandwidth are wasted for the transmission of redundant or irrelevant symbols, and the power efficiency is increased due to the decreased demands on bandwidth efficiency.
Example: Comparison of AM DSB with optimum transmission (Γ_a = 0.5): If a medium-quality receiver output signal with 10 log_10(SNR_LF) = 60 dB has to be guaranteed, optimum digital signaling offers a gain of about 67 dB in power efficiency while preserving the signal bandwidth. This corresponds to a possible replacement of a 500 kW transmitter station by a 0.1 W transmitter without loss in quality or range. The gain separates into 42 dB due to digital transmission and channel coding (assuming transparent transmission) and 25 dB due to audio source coding (non-transparent transmission).
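The numbers of this example follow directly from Table I, (8), (10), and (11); the following sketch (our addition) reproduces them:

import math

db = lambda x: 10.0 * math.log10(x)
PAPR = 10 ** (13 / 10)
snr_lf = 1e6                      # 60 dB target quality
gamma_a = 0.5

# AM DSB with carrier (Table I, m = 0.8): required SNR_0
snr0_am = snr_lf / ((0.64 / PAPR) / (1 + 0.64 / PAPR))

# Optimum transparent digital transmission, eq. (11) solved for SNR_0:
snr0_tr = ((1 + snr_lf) ** gamma_a - 1) / gamma_a

# Non-transparent, n = 2 bit/sample: Gamma_d = 2n*Gamma_a, eqs. (8) and (10)
n = 2.0
gamma_d = 2 * n * gamma_a
snr0_nt = (2 ** gamma_d - 1) / gamma_d * 2 * n

print(round(db(snr0_am) - db(snr0_tr), 1), "dB gain, transparent digital")  # ~42 dB
print(round(db(snr0_tr) - db(snr0_nt), 1), "dB gain, source coding")        # ~25 dB
print(round(db(snr0_am) - db(snr0_nt), 1), "dB total")                      # ~67 dB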
3. The Chain Rule
The chain rule of Information Theory states that the entropy of a vector random variable can be expanded into a series of conditional entropies:

H(X_1 X_2 \ldots X_k) = H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_k \mid (X_1 X_2 \ldots X_{k-1}))\,. \quad (16)
Thus, it is sufficient to detect or process each variable step by step, while in each step the knowledge of the already detected variables is used in an optimum way. The chain rule also holds for mutual information and channel capacity:

I((X_1 X_2 \ldots X_k); Y) = I(X_1; Y) + I(X_2; Y \mid X_1) + \cdots + I(X_k; Y \mid (X_1 X_2 \ldots X_{k-1}))\,. \quad (17)
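A minimal numerical check of (16) (our addition) for a randomly drawn joint pmf of three discrete variables:

import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 3, 4)); p /= p.sum()      # joint pmf of (X1, X2, X3)

def H(pmf):                                   # entropy in bit
    q = pmf[pmf > 0]
    return float(-(q * np.log2(q)).sum())

# Chain rule (16): H(X1 X2 X3) = H(X1) + H(X2|X1) + H(X3|X1 X2)
lhs = H(p)
h1 = H(p.sum(axis=(1, 2)))                    # H(X1)
h2_given_1 = H(p.sum(axis=2)) - h1            # H(X1,X2) - H(X1)
h3_given_12 = H(p) - H(p.sum(axis=2))         # H(X1,X2,X3) - H(X1,X2)
print(np.isclose(lhs, h1 + h2_given_1 + h3_given_12))   # True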
3.1 Multiple Access Channel
The situation where several non-cooperating transmitters use the same communication medium for information transmission to one central receiver is denoted as the multiple access channel (MAC), see Fig. 5.
Figure 5. Multiple access channel: sources 1, 2, ..., K with channel inputs X_1, X_2, ..., X_K share a common noisy channel with output Y.
In order to exploit the capacity I((X_1, X_2, ..., X_K); Y) of the common channel, the chain rule (17) indicates that the strategy of "successive interference cancellation" (SIC) is optimum. The procedure is illustrated for two users in Fig. 6, where the region of achievable rate pairs (R_1, R_2) for reliable communication is sketched.
Figure 6. Rate region of the two-user multiple access channel, with corner points (I(X_1; Y), I(X_2; Y | X_1)) and (I(X_1; Y | X_2), I(X_2; Y)). Dashed: orthogonal signaling (time-division multiple access).
For user 1, the signal of user 2 is regarded as additional noise, providing capacity I(X_1; Y). After decoding the message of user 1, its influence on the received signal is eliminated, and user 2 can exploit the higher capacity of the channel impaired by the noise only. Thus, the point (I(X_1; Y), I(X_2; Y | X_1)) is identified as an achievable pair of the two code rates (R_1, R_2). Moreover, the chain rule (17) shows that in this point the bound on the sum rate, R_1 + R_2 = I((X_1 X_2); Y), valid for cooperating users, is met. Of course, the same arguments hold for decoding user 2 first, leading to the point (I(X_1; Y | X_2), I(X_2; Y)) in the rate region. Because one may switch between both solutions in different time slots at any ratio, any point on the boundary line with R_1 + R_2 ≤ I((X_1 X_2); Y) is achievable on average. Additionally, the single-user bounds R_1 ≤ I(X_1; Y | X_2) and R_2 ≤ I(X_2; Y | X_1) complete the well-known pentagon of the rate region of the two-user MAC.
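For the Gaussian MAC, the corner points can be written down explicitly. A small sketch (our addition; real-valued signals with illustrative powers S_1, S_2 and noise power N) confirms that the SIC corner meets the sum-rate bound:

import math

def C(snr):                        # real-valued AWGN capacity, eq. (1)
    return 0.5 * math.log2(1.0 + snr)

def mac_corner(S1, S2, N):
    """Corner of the two-user Gaussian MAC rate region: decode user 1 first (SIC)."""
    R1 = C(S1 / (S2 + N))          # user 2's signal treated as noise
    R2 = C(S2 / N)                 # user 1 cancelled before decoding user 2
    return R1, R2

S1, S2, N = 10.0, 5.0, 1.0
R1, R2 = mac_corner(S1, S2, N)
print(R1 + R2, C((S1 + S2) / N))   # both equal the sum-rate bound, here 2.0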
In Fig. 6, pairs of maximum achievable rates for orthogonal use of the channel by both users are illustrated, too, e.g., by means of time-division or frequency-division multiple access. The line of maximum sum rate (the "dominant face") of the MAC rate region is met in one point only, whereas considerable rate losses have to be tolerated if the signal powers/rates of the users are not adjusted accordingly. The superiority of
multiple access by means of channel coding over orthogonal signaling—here unveiled by the chain rule of Information Theory—is one of the reasons for the application of CDMA (with non-orthogonal spreading sequences) in modern mobile communication schemes. Additionally, the chain rule immediately proves that SIC is a capacity-achieving, low-complexity multiuser detection scheme for CDMA.

3.2 Coded Modulation by Multilevel Coding

For bandwidth-efficient modulation formats with M > 2 signal points, the combination of a channel code with a modulation scheme is a nontrivial task, since redundancy should not be wasted for protection against unlikely error events. In 1977, Imai [11] proposed a solution to this coded modulation problem, where individual binary codes are used for the components of the binary address vectors of the signal points. This method is called multilevel coding, see Fig. 7.

Figure 7. Coded modulation by multilevel coding: binary data is demultiplexed (DEMUX) onto individual binary encoders (Enc 1, Enc 2, ...), whose outputs X_1, X_2, ... select the signal points X via binary address vectors; X is transmitted over the AWGN channel (noise N, output Y).

For the design of the binary component codes, Imai suggested a rule for achieving maximum Euclidean distance between codewords in the multidimensional signal space spanned jointly by modulation and coding (design rule of "balanced distances"). Unfortunately, this approach yields rather poor performance when low-complexity successive multistage decoding is applied.
However, the chain rule (17) indicates that multilevel coding is a capacity-achieving solution to the coded modulation problem when combined with successive decoding of the individual codes, using the decoding results from lower levels (multistage decoding with successive interference cancellation) [26]. For this, however, the rates of the individual codes have to be designed according to the individual terms on the right-hand side of (17). These terms correspond to the capacities of the equivalent channels at the different levels, when taking correct decisions from lower levels into account (possible in principle because of R_i ≤ C_i). Applying this capacity design rule [14], performance close to the Shannon limit can be achieved at any desired bandwidth efficiency of a coded modulation scheme [25], cf. Fig. 3.
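To illustrate the capacity design rule, here is a Monte Carlo sketch (our addition, not the procedure of [14]): it estimates the level capacities C_1 = I(B_1; Y) and C_2 = I(B_0; Y | B_1) of a 4-ASK constellation with natural labeling on the discrete-time AWGN channel; by the chain rule (17), the two estimates sum to I(X; Y).

import numpy as np

rng = np.random.default_rng(7)
sigma = 0.8                                   # noise standard deviation (assumed)
M = 200_000

# 4-ASK with natural labeling: index = 2*b1 + b0 -> amplitude
amps = np.array([-3.0, -1.0, 1.0, 3.0])
b1 = rng.integers(0, 2, M)
b0 = rng.integers(0, 2, M)
x = amps[2 * b1 + b0]
y = x + sigma * rng.standard_normal(M)

def pdf(y, a):                                # Gaussian kernel; constants cancel in ratios
    return np.exp(-(y - a) ** 2 / (2 * sigma ** 2))

p_all = sum(pdf(y, a) for a in amps) / 4.0                        # p(y)
p_b1 = (pdf(y, amps[2 * b1]) + pdf(y, amps[2 * b1 + 1])) / 2.0    # p(y | b1)
p_x = pdf(y, x)                                                    # p(y | b1, b0)

C1 = np.mean(np.log2(p_b1 / p_all))    # I(B1; Y): rate for the lowest level
C2 = np.mean(np.log2(p_x / p_b1))      # I(B0; Y | B1): rate given level 1 decoded
print(C1, C2, C1 + C2)                 # C1 + C2 = I(X; Y), the chain rule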
The chain rule for mutual information proves that modulation and channel coding can be optimized separately and that binary channel codes suffice in principle for the design of capacity-achieving coded modulation schemes (theorem of the separability of coding and modulation in digital communications).
3.3 Decision–Feedback Equalization

By means of the chain rule it is easy to show that decision-feedback equalization (DFE) is an optimum equalization method for signaling over linear dispersive (frequency-selective) channels.
The decision is based on the first non-zero sample of the minimum-phase impulse at the output of a whitened matched filter (WMF) as the receiver front-end, while the post-cursors of previous impulses are canceled by means of decision feedback. When combined with capacity-achieving channel codes, DFE provides the same performance as the far more complex maximum-likelihood sequence detection. Unfortunately, DFE cannot easily be combined with coding due to causality problems, but the dual structure using precoding overcomes this problem, see Section 3.4.

Decision-feedback equalization is also a very efficient approach for signaling over MIMO channels, in order to cancel the interference from already detected signals radiated from different transmitters. The pioneering work, BLAST [9], was based on DFE.
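A minimal sketch of the feedback loop (our addition, assuming an exemplary minimum-phase impulse h after the WMF and uncoded binary signaling):

import numpy as np

rng = np.random.default_rng(3)
h = np.array([1.0, 0.5, 0.2])         # assumed minimum-phase impulse (after WMF)
K = 10_000
a = rng.choice([-1.0, 1.0], K)         # 2-PSK data symbols
r = np.convolve(a, h)[:K] + 0.1 * rng.standard_normal(K)

# DFE: decide on the first (non-zero) tap, cancel post-cursors of past decisions
a_hat = np.zeros(K)
for k in range(K):
    isi = sum(h[m] * a_hat[k - m] for m in range(1, len(h)) if k - m >= 0)
    a_hat[k] = np.sign(r[k] - isi)     # slicer after feedback cancellation

print("symbol error rate:", np.mean(a_hat != a))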
3.4 The Uplink–Downlink Duality

A very fundamental recent result in Information Theory is the uplink–downlink duality principle [21], [23], [24], [27]: To any detection/decoding strategy for a multiple access channel ("uplink" situation) there exists an equivalent encoding/precoding strategy for the broadcast channel ("downlink" situation), yielding exactly the same performance for the same channel quality.

The dual to successive interference cancellation in the MAC situation is superposition coding or "Costa precoding"—also known as "dirty paper coding" [4]—for the downlink (broadcast channel, see [5]).

Costa's theorem states that noise or interference known at the transmitter does not impair the capacity of a channel at all. There are precoding methods applicable at the transmitter to combat multiuser interference—which otherwise would occur at the receiver side—in a successive fashion and without any increase in average transmit power. The achievable rates for the users are again given by expanding the mutual information via the chain rule.

The concept of Costa precoding may be implemented as a multidimensional generalization of Tomlinson–Harashima precoding (THP) [6], i.e., by imposing the information X_k by means of "codeword clouds" around the codewords specifying the already encoded information X_1, ..., X_{k−1} in a high-dimensional signal space. All vertices of the rate region of the corresponding dual MAC are achievable by Costa precoding. However, for fixed sum transmit power, a flexible power allocation to the signals of the different users is now additionally possible. Variation of the power distribution results in the rate region of the broadcast channel as the union over all rate regions of the corresponding MACs, see Fig. 8. Again, the coding approach from Information Theory provides an increased rate region when compared to orthogonal channel access, e.g., via time sharing.

Figure 8. Rate region of the two-user Gaussian broadcast channel. Dashed: orthogonal signaling (time-division multiple access). Dash-dotted: rate regions of the corresponding MACs for three exemplary power distributions. Circle: point of maximum sum rate.

Via the chain rule, the concept of Costa precoding, and the uplink–downlink duality, Information Theory has induced a lot of very important new approaches in information technology, like interference-suppression precoding for the CDMA downlink, THP for time-dispersive linear channels, or signaling over MIMO channels [7]. Multidimensional generalizations of THP are now of wide interest in current research. Hierarchical source/channel coding for multi-resolution reception and modern schemes for digital watermarking [1], [8] are also based on the chain rule and the uplink–downlink duality.
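A scalar toy version (our addition; not the multidimensional generalization discussed here) shows the modulo trick of Tomlinson–Harashima precoding against interference known only at the transmitter:

import numpy as np

rng = np.random.default_rng(5)
M = 2.0                                    # modulo interval (-M, M] (assumed)
def mod2M(x):                              # symmetric modulo reduction
    return ((x + M) % (2 * M)) - M

K = 10_000
a = rng.choice([-1.0, 1.0], K)             # data symbols
s = 3.0 * rng.standard_normal(K)           # interference known at the transmitter only

x = mod2M(a - s)                           # THP: pre-subtract, fold back into (-M, M]
y = x + s + 0.1 * rng.standard_normal(K)   # channel adds known interference + noise
a_hat = np.sign(mod2M(y))                  # receiver: same modulo, then slicer

print("tx power:", np.var(x))              # ~uniform on (-M, M]: no power increase
print("symbol error rate:", np.mean(a_hat != a))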
4. The Capacity Formula Again: MIMO Channels
A flat (frequency non-selective) multiple-input/multiple-output (MIMO) additive white Gaussian noise channel is characterized by the equation

y = Hx + n\,. \quad (18)
Here, x = [x_1, ..., x_{N_T}]^T denotes the vector of the N_T input signals of the MIMO channel at one time instant, y = [y_1, ..., y_{N_R}]^T the corresponding vector of the N_R output signals, H = [h_{ji}] the channel matrix, where h_{ji} is the channel gain from input i to output j, and n = [n_1, ..., n_{N_R}]^T the vector of Gaussian noise variables, which are assumed to be spatially and temporally i.i.d., each with variance σ_n².
The capacity of the AWGN MIMO channel (for equivalent complex baseband signals) is given by [22]

C_{MIMO} = \log_2 \det\!\left(I_{N_R} + \frac{1}{\sigma_n^2}\, H Q_x H^H\right) \left[\frac{\text{bit}}{\text{MIMO use}}\right] \quad (19)
with the N_R × N_R identity matrix I_{N_R} and the covariance matrix Q_x = E{x x^H} of the transmit signal vector, which also has to be Gaussian distributed for achieving capacity. By means of the eigenvalue decomposition of the Gramian of the channel matrix,

H^H H = U\, \text{diag}(\lambda_i)\, U^H, \quad \lambda_i \in \mathbb{R}^{+}, \; U \text{ unitary}, \quad (20)

the optimum covariance matrix Q_x for given average sum power, trace(Q_x) ≤ S, results as

Q_x = U\, \text{diag}(P_i)\, U^H, \qquad P_i = \begin{cases} \mu - 1/\lambda_i\,, & \text{if } \mu > 1/\lambda_i \\ 0\,, & \text{else,} \end{cases} \quad (21)

with \mu chosen such that \sum_{i=1}^{N_T} P_i = S.
The eigenvalue decomposition leads to N_T equivalent non-interfering parallel channels with capacities C_i = \log_2(1 + \lambda_i P_i/\sigma_n^2) and

C_{MIMO} = \sum_{i=1}^{N_T} C_i\,. \quad (22)
This equation shows that a MIMO channel allows signaling over parallel channels using space multiplexing, despite the very strong coupling of all paths from the different inputs to the different outputs. Using space, the bandwidth efficiency of digital communication schemes is multiplied by a factor of up to min(N_T, N_R). This realization of the potential of MIMO signaling from Information Theory has inspired intensive research activities in digital communications, which will soon initiate a revolution in wireless data links like WLANs etc. An increase in data rate by a factor of 10 or even more for next-generation wireless communication will be made possible by utilizing the resource space.
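The water-filling solution (21) and the capacity (22) can be computed in a few lines. The following sketch (our addition) keeps σ_n² explicit (for σ_n² = 1 the allocation reduces to (21) as printed) and assumes a real-valued random channel matrix for simplicity:

import numpy as np

rng = np.random.default_rng(9)
NT, NR, S, sigma2 = 4, 4, 10.0, 1.0
H = rng.standard_normal((NR, NT))

lam = np.linalg.eigvalsh(H.T @ H)[::-1]          # eigenvalues of the Gramian, eq. (20)

def waterfill(lam, S, sigma2):
    """Power allocation: P_i = mu - sigma2/lam_i where positive, sum P_i = S."""
    for k in range(len(lam), 0, -1):              # try k active eigenmodes
        mu = (S + sigma2 * np.sum(1.0 / lam[:k])) / k
        P = mu - sigma2 / lam[:k]
        if P[-1] > 0:                              # weakest active mode still positive
            return np.concatenate([P, np.zeros(len(lam) - k)])

P = waterfill(lam, S, sigma2)
C = np.sum(np.log2(1.0 + lam * P / sigma2))       # eq. (22)
print(P, C)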
Although these results seem to be very recent, MIMO signaling is already included in Shannon's basic approach to channel capacity. Shannon used a much more general definition of bandwidth: the number of dimensions in a signal space available per second [20]. Using multiple antennas is nothing other than providing more dimensions for signaling; the problem here, however, is that because of mutual interference these dimensions are far from orthogonal. But equalization methods well known from dispersive channels apply in the same way to non-orthogonal signaling in space, i.e., DFE, which results in BLAST, or spatial THP [7]. After the reconstruction of orthogonality, the well-known formula (2) applies again, cf. also (22).
5. Conclusions
In this paper we have pointed out that Information Theory, founded contemporaneously with microelectronics, has had an important impact on today's information technology. Both fields, dating back to 1948, together form the foundations of the information age. Many of the basic principles in modern information and communication technology can be derived directly from theorems of Information Theory. Here, the capacity formula and the chain rule have been used as the most prominent examples. Recent developments in communications like OFDM, CDMA, and MIMO transmission are clearly inspired by results from Information Theory.
The efficiency of communication schemes has been enhanced enormously, which was illustrated for the transmission of analog signals over the AWGN channel. Without violating bandwidth constraints, signal power reductions of several orders of magnitude are now possible when compared to the beginnings, when AM DSB with carrier was used. The progress in system design based on Information Theory is comparable to that in microelectronics. Against this background, it is a major concern of the authors to emphasize that a profound education of students in electrical, electronics, and communication engineering in Information Theory is essential for their ability and success in development and research, as well as for continuing the progress of information technology.
References
[1] R. Bäuml, R. Tzschoppe, A. Kaup, J.B. Huber. Optimality of SCS
Watermarking. In SPIE Vol. 5020, Security and Watermarking of
Multimedia Contents V, Santa Clara, CA, USA, Jan. 2003.
[2] T. Berger. Rate Distortion Theory. Prentice-Hall, Englewood Cliffs, NJ, 1971.
[3] C. Berrou, A. Glavieux, P. Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. In Intern. Conf. on Communications (ICC), pp. 1064–1070, Geneva, Switzerland, May 1993.
[4] M.H.M. Costa. Writing on dirty paper. IEEE Transactions on
Information Theory, pp. 439–441, May 1983.
[5] T.M. Cover. Broadcast channels. IEEE Transactions on Information
Theory, pp. 2–14, Jan. 1972.
[6] R.F.H. Fischer. Precoding and Signal Shaping for Digital Transmission,
John Wiley & Sons, New York, 2002.
[7] R.F.H. Fischer, C. Windpassinger, A. Lampe, J.B. Huber. Space-Time
Transmission using Tomlinson-Harashima Precoding. In 4th Intern. ITG
Conf. on Source and Channel Coding, pp. 139–147, Berlin, Jan. 2002.
[8] R.F.H. Fischer, R. Tzschoppe, R. Bäuml. Lattice Costa Schemes using
Subspace Projection for Digital Watermarking. European Transactions
on Telecommunications (ETT), pp. 351–362, July/Aug. 2004.
[9] G. Foschini. Layered Space-Time Architecture for Wireless Communication in a Fading Environment When Using Multiple Antennas. Bell
Laboratories Technical Journal, pp. 41–59, Autumn 1996.
[10] R.G. Gallager. Low-Density Parity-Check Codes. MIT Press Classic
Series, Cambridge, MA, 1963
[11] H. Imai and S. Hirakawa. A New Multilevel Coding Method Using
Error Correcting Codes. IEEE Transactions on Information Theory,
pp. 371–377, May 1977.
[12] D.J.C. MacKay, R.M. Neal. Near Shannon limit performance of low-density parity-check codes. Electronics Letters, pp. 1645–1646, Aug. 1996.
[13] J.B. Huber, B. Matschkal. Spherical logarithmic Quantization and its
Application for DPCM. In 5th Intern. ITG Conf. on Source and Channel
Coding, pp. 349–356, Erlangen, Germany, Jan. 2004.
[14] J.B. Huber, U. Wachsmann. Capacities of Equivalent Channels in Multilevel Coding Schemes. Electronics Letters, pp. 557–558, March 1994.
[15] S. Hüttinger, J.B. Huber. Design of “Multiple-Turbo-Codes” with
Transfer Characteristics of Component Codes. In Conf. on Information
Sciences and Systems (CISS), Princeton, March 2002.
[16] B. Matschkal, F. Bergner, J.B. Huber. Joint Signal Processing for Spherical Logarithmic Quantization and DPCM. In 4th Intern. Symposium
on Turbo Codes/6th Intern. ITG Conf. on Source and Channel Coding,
München, Germany, April 2006.
[17] J.G. Proakis. Digital Communications. McGraw-Hill, New York,
4th edition, 2001.
[18] T.J. Richardson, M.A. Shokrollahi, R.L. Urbanke. Design of capacity-approaching irregular low-density parity-check codes. IEEE Transactions on Information Theory, pp. 619–637, Feb. 2001.
[19] C.E. Shannon. A mathematical theory of communication. Bell Syst.
Tech. J., vol. 27, pt. I, pp. 379–423, 1948; pt. II, pp. 623–656, 1948.
[20] C.E. Shannon. Communication in the presence of noise. Proceedings
of IRE, pp. 10–21, Jan. 1949.
[21] M. Schubert, H. Boche. A unifying theory for uplink and downlink
multi-user beamforming. In Intern. Zurich Seminar on Broadband
Communications, Zurich, Switzerland, Feb. 2002.
[22] E. Telatar. Capacity of Multi-Antenna Gaussian Channels. European
Transactions on Telecommunications (ETT), pp. 585–596, Nov. 1999.
[23] P. Viswanath, D.N. Tse. Sum Capacity of the Vector Gaussian
Broadcast Channel and Uplink-Downlink Duality. IEEE Transactions
on Information Theory, pp. 1912–1921, Aug. 2003.
[24] S. Vishwanath, N. Jindal, A. Goldsmith. Duality, Achievable Rates,
and Sum-Rate Capacity of Gaussian MIMO Broadcast Channels. IEEE
Transactions on Information Theory, pp. 2658–2668, Oct. 2003.
[25] U. Wachsmann, J.B. Huber. Power and Bandwidth Efficient Digital Communication Using Turbo Codes in Multilevel Codes. European Transactions on Telecommunications (ETT), pp. 557–567,
Sept./Oct. 1995.
[26] U. Wachsmann, R.F.H. Fischer, J.B. Huber. Multilevel Codes: Theoretical Concepts and Practical Design Rules. IEEE Transactions on
Information Theory, pp. 1361–1391, July 1999.
[27] W. Yu, J.M. Cioffi. Sum Capacity of Gaussian Vector Broadcast
Channels. IEEE Transactions on Information Theory, pp. 1875–1892,
Sep. 2004.