Nonlinear Sparse-Graph Codes for Lossy Compression

Ankit Gupta, Student Member, IEEE, and Sergio Verdú, Fellow, IEEE
Abstract—We propose a scheme for lossy compression of discrete
memoryless sources: The compressor is the decoder of a nonlinear
channel code, constructed from a sparse graph. We prove asymptotic optimality of the scheme for any separable (letter-by-letter)
bounded distortion criterion. We also present a suboptimal compression algorithm, which exhibits near-optimal performance for
moderate block lengths.
Index Terms—Discrete memoryless sources, lossy data compression, rate–distortion theory, source–channel coding duality,
sparse-graph codes.
I. INTRODUCTION
Even for simple sources and distortion criteria, such as
Bernoulli processes with bit-error-rate distortion, the
construction of compression–decompression algorithms that
perform near the rate–distortion function with reasonable complexity lags well behind the construction of capacity-achieving
error-correcting codes. One reason for this is the fact that while
linear codes achieve capacity for discrete channels with additive noise [1] (and the minimum lossless compression rate for
arbitrary sources [2]), linear compressors cannot approach the
rate–distortion function [3] (see also [4]). However, suppose
that for a binary-symmetric source with bit-error-rate distortion,
the codewords of a linear code for a binary symmetric channel
are used as reconstruction codewords. Then, if the compressor
is the maximum-likelihood channel decoder it is possible to
find a sequence of linear codes that attain the rate–distortion
function [5]. More generally, using nonbinary linear codes it
is possible to approach the rate–distortion function of discrete
memoryless sources arbitrarily closely as long as the distortion
function is separable [6].
The advances in sparse-graph codes that perform close to
capacity with low encoding–decoding complexity have spurred
a number of recent works in the lossy data compression literature where a decoder for a low-density parity-check (LDPC) code or a low-density generator-matrix (LDGM) code is
used as the compressor. A sequence of LDPC codes is constructed in [7] that attains the rate–distortion function of the
binary symmetric source with bit-error-rate distortion when
the maximum-likelihood channel decoder is used as the lossy
compressor. Unfortunately, the belief propagation decoder fails
when used as a lossy encoder for this code. Furthermore, a polynomial-complexity encoder with near-optimal performance
has not been found for this code. LDGM codes were proposed
for this problem in [8]. In [9], generalized LDGM codes were
constructed by substituting modulo addition by other Boolean
operations. Both [8] and [9] also propose low-complexity
compressors based on the survey propagation algorithm [10]
that show excellent empirical performance. However, the
asymptotic optimality of LDGM codes for this problem is still
open. Another approach using an LDPC–LDGM hybrid code
with bounded check degrees is proposed in [11] and proven
to be asymptotically optimal (with the computationally intensive maximum-likelihood decoder used as the compressor).
Sparse-graph lossy compression systems for more general
rate–distortion problems have been studied in [12] and [13]. In
[12], asymptotically optimal LDPC codes for compressing the
nonredundant (i.e., memoryless and equiprobable) $q$-ary source
with a Hamming distortion criterion were proposed. In [13],
an asymptotically optimal lossy compressor (based on LDPC
codes) for compressing the Bernoulli source with Hamming
distortion is proposed, but no computationally feasible compression algorithm is known for the codes in [12] and [13]. A
sparse-graph-based lossy compressor for compressing discrete
memoryless sources with an arbitrary separable distortion criterion has not been found in the literature yet. In fact, no code
(linear or nonlinear; sparse-graph-based or not) is known that
exhibits both asymptotic optimality and computationally feasible compression algorithms with near-optimal performance
in the finite block length regime.
In the literature, various types of matrix sparsity are referred
to as “low-density.” In the strong sense, this means that the
nonzero entries per column (or row) in the (parity check, or
generator) matrix remain bounded as the block length grows
[9], [8], [11], [14]. A weaker notion is that they are allowed to scale sublinearly with the block length $n$ [7], [12], [13]. It was shown in [15] that
any LDGM code with bounded ones per column cannot achieve
the optimal rate–distortion tradeoff, for the binary symmetric
source with Hamming distortion.
In this paper, we propose a new construction of nonlinear codes based on LDGM matrices, which is asymptotically optimal for compressing discrete memoryless sources with a separable distortion criterion. This construction is low-density in the weaker sense that the number of nonzero entries per row in the generator matrix grows sublinearly with the block length $n$. We also provide suboptimal compressors for these codes, which have excellent empirical performance, even at moderate block lengths. Our code design can be viewed as an intermediate point on a continuum of block codes, with the linear codebook and the random nonlinear codebook as the two extremes.
The remainder of this paper is organized as follows. Section II
presents the code design and proof of the asymptotic optimality
of the construction for compressing the binary symmetric source
with a Hamming distortion criterion. Section III extends the
codes presented in Section II for compressing discrete memoryless sources with a separable (i.e., letter-by-letter) and bounded
distortion criterion. Section IV proposes suboptimal compression algorithms whose performance is illustrated in Section V.
II. CODE CONSTRUCTION AND ANALYSIS FOR THE
BINARY SOURCE
A. Code Construction
A binary codebook with block length $n$ is a collection of binary $n$-vectors. If there is no underlying structure to this set (for example, a random codebook), then exponential complexity is required for channel decoding (or lossy compression). A binary linear codebook, on the other hand, is a much more restricted set of codewords: all the binary $n$-vectors that can be written as

$$\mathbf{x} = \mathbf{u}G \qquad (1)$$

for all possible choices of a binary $k$-vector $\mathbf{u}$, for a given binary $k \times n$ matrix $G$. Note that if $\mathbf{u}$ is not allowed to range over all $2^k$ choices, then the ensuing codebook is, in general, nonlinear. In fact, any codebook (linear or nonlinear) can be described by (1) if $\mathbf{u}$ is allowed to range over only the vectors with unit Hamming weight, and $G$ has as many rows as codewords.

In this paper, we propose a class of nonlinear codebooks that has some convenient structure, obtained by letting $\mathbf{u}$ range over the $k$-vectors with a given Hamming weight $w$, where $w$ and $k$ grow with the block length. Further, we let

(2)

and

(3)

We denote by $\mathbf{u}_1, \dots, \mathbf{u}_M$ the binary $k$-strings of Hamming weight $w$ in lexicographic order. The codebook is given by

(4)

where $\mathbf{c}_j = \mathbf{u}_j G$ for $j = 1, \dots, M$. The number of codewords in the codebook is equal to

$$M = \binom{k}{w}. \qquad (5)$$

A convenient low-density choice of the $k \times n$ binary matrix $G$ is by independent and identically distributed generation of its coefficients, where

(6)

The lossy compressor is the minimum Hamming distance decoder, and the decompressor is simply the encoder.
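To make the construction concrete, the following sketch instantiates a toy version of the codebook and runs the minimum Hamming distance compressor by exhaustive search. The parameter values and the density of $G$ are illustrative assumptions only; the paper ties $w$, $k$, and the entry distribution of $G$ to the block length through (2), (3), and (6), whose exact forms are not reproduced here.

```python
import itertools
import numpy as np

# Toy parameters (illustrative; the paper couples w, k, and the density
# of G to the block length n through (2), (3), and (6)).
n, k, w = 16, 10, 3
delta = 0.2  # assumed P{G[i][j] = 1}, standing in for (6)

rng = np.random.default_rng(0)
G = (rng.random((k, n)) < delta).astype(int)  # sparse k x n generator

# Nonlinear codebook: u G (mod 2) over all k-vectors u of weight w,
# in lexicographic order of supports, as in Section II-A.
codebook = []
for support in itertools.combinations(range(k), w):
    u = np.zeros(k, dtype=int)
    u[list(support)] = 1
    codebook.append(u @ G % 2)
codebook = np.array(codebook)  # binom(k, w) codewords of length n

# Compressor: index of the nearest codeword in Hamming distance.
x = rng.integers(0, 2, size=n)           # source realization
dists = (codebook != x).sum(axis=1)
j = int(np.argmin(dists))                # index stored by the encoder
x_hat = codebook[j]                      # decompressor output
print(f"rate = {np.log2(len(codebook)) / n:.3f} bit/symbol, "
      f"distortion = {dists[j] / n:.3f}")
```

With these toy values the codebook has $\binom{10}{3} = 120$ codewords, i.e., rate about $0.43$ bit/symbol; the exhaustive search is of course exponential in general, which is what the suboptimal algorithm of Section IV avoids.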
B. Code Analysis

We now turn to the analysis of the code introduced in Section II-A. We show that, with high probability, the lossy compressor described in Section II-A asymptotically attains the rate–distortion function of the memoryless binary symmetric source with Hamming distortion. More formally, we show the following result.

Theorem 1: Let a codebook be constructed as specified in Section II-A with block length $n$ and asymptotic rate $R$ exceeding $1 - h(D)$, the rate–distortion function of the binary symmetric source at distortion $D$. As $n \to \infty$, the Hamming distortion obtained by representing an arbitrary $n$-length source realization with the nearest codeword in the codebook is less than $D$ almost surely.

Proof: Pick a random codebook as outlined in Section II-A. Label the codewords as $\mathbf{c}_j$, $j = 1, \dots, M$. Denote

(7)

and

(8)

The event $\{K > 0\}$ is equivalent to the event that at least one codeword in the codebook is within Hamming distortion $D$ from the given source. Thus, if we show that $\Pr\{K = 0\}$ goes to $0$ as $n \to \infty$, the theorem will be proved. However, we will see later that, using martingale arguments, it is sufficient to show that

(9)

to claim that

(10)

Therefore, we first show (9), and then we prove that (9) implies (10), to complete the proof of Theorem 1. The proof is structured as a sequence of intermediate lemmas.

Lemma 1: The asymptotic rate of the code converges to

(11)

with the choice of parameters in (2) and (3).

Proof: See Appendix I.

The Cauchy–Schwarz inequality gives a lower bound on $\Pr\{K > 0\}$ in terms of the first and second moments of the nonnegative random variable $K$:

$$\Pr\{K > 0\} \ge \frac{\left(\mathbb{E}[K]\right)^2}{\mathbb{E}[K^2]}\,. \qquad (12)$$
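The inequality (12) is the standard second-moment bound and holds for any nonnegative random variable with finite, nonzero second moment; it follows in one line from the Cauchy–Schwarz inequality applied to the factorization $K = K \cdot \mathbf{1}\{K > 0\}$:

$$\left(\mathbb{E}[K]\right)^2 = \left(\mathbb{E}\left[K\,\mathbf{1}\{K > 0\}\right]\right)^2 \le \mathbb{E}\left[K^2\right]\,\mathbb{E}\left[\mathbf{1}\{K > 0\}^2\right] = \mathbb{E}\left[K^2\right]\,\Pr\{K > 0\}.$$

The code-dependent work lies entirely in evaluating the two moments, which is where the following lemmas come in.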
To compute these moments, we make use of the following result.

Lemma 2:

(13)

Proof of Lemma 2: Although this result is similar to Lemma 3 in [11], we cannot use the proof therein because it requires the code to be linear:

(14)

Therefore

(15)

(16)

(17)

where (17) follows from (16) because, by symmetry, the probability that a given codeword is within distortion $D$ of the source does not depend on the codeword index.

We now show that asymptotically each codeword behaves like a sequence of fair coin flips, as formalized by the following result.

Lemma 3: For every $j$, the bits $\mathbf{c}_j[1], \dots, \mathbf{c}_j[n]$ of the codeword $\mathbf{c}_j$ are independent and identically distributed. If $\mathbf{u}_{j_1}$ and $\mathbf{u}_{j_2}$ have supports that are disjoint subsets of $\{1, \dots, k\}$, then $\mathbf{c}_{j_1}$ and $\mathbf{c}_{j_2}$ are independent. Furthermore

(18)

for every $i$.

Proof of Lemma 3: Recall that

(19)

By definition, $\mathbf{c}_j[i] = 1$ if and only if the positions corresponding to the ones in $\mathbf{u}_j$ select an odd number of ones in the $i$th column of $G$. These events are independent, with identical probabilities for different $i$, because the coefficients of $G$ are independent and identically distributed. Thus, if the supports of $\mathbf{u}_{j_1}$ and $\mathbf{u}_{j_2}$ are disjoint, then $\mathbf{c}_{j_1}$ and $\mathbf{c}_{j_2}$ are independent, and the bits $\mathbf{c}_j[1], \dots, \mathbf{c}_j[n]$ are independent and identically distributed. Furthermore

(20)

(21)

(22)

(23)

(24)

where (21) follows from (20) through application of the binomial expansion

(25)

Therefore

(26)

To compute probabilities of the form $\Pr\{\mathbf{c}_{j_1} \in A,\ \mathbf{c}_{j_2} \in B\}$, we need the joint statistics of $(\mathbf{c}_{j_1}, \mathbf{c}_{j_2})$. To that end, we have the following result.

Lemma 4: Let $j_1 \neq j_2$, and let a sequence of sets be given such that the limits below exist. Then

(27)

whenever the limits exist.

Proof of Lemma 4:

(28)

(29)

(30)

where (29) is obtained from Lemmas 21 and 22, and (30) is obtained from Lemma 21.

Returning to the proof of Theorem 1, let

(31)

and

(32)

In order to compute the quantity in the right-hand side of (12), we write

(33)
The first term in (33) is less than or equal to a quantity that grows subexponentially with $n$, as the following result shows.

Lemma 5: Let the quantity be defined as in (31); then

(34)

Proof of Lemma 5: Let

(35)

and let

(36)

be the set of $k$-vectors defined as

(37)

Clearly

(38)

(39)

where we used the fact that

(40)

Next, using (39), we show that the quantity in (31) grows subexponentially in $n$, thus proving (34):

(41)

(42)

(43)

(44)

Substituting into (42) and using the assumption of the lemma, we get (43). From (44) and the lower bound above, we get (34).

We give the asymptotic rate of decay of $\mathbb{E}[K]$ in the following lemma.

Lemma 6: Let the codebook be chosen at random as in Section II-A. Then

(45)

and

(46)

Proof of Lemma 6: Although a similar result is given in [14, Lemma 3], we give a self-contained proof due to the different code construction. From Lemma 3, the bits of each codeword are independent and identically distributed with

(47)

Let $\mathbf{0}$ and $\mathbf{1}$ be the all-zero and all-one $n$-vectors, respectively. Then, using Lemma 14 (Appendix II) and Sanov's theorem (e.g., [16, Theorem 11.4.1]), we obtain

(48)

and let

(49)

and

(50)

Clearly

(51)

Together with (47), we obtain the desired result from (49)–(51).

Using Lemma 2 in [11], we have the following result.

Lemma 7: For the probability defined in (8),

(52)

Combined with inequality (11) and Lemma 7, the following result gives a lower bound to $\mathbb{E}[K]$.

Lemma 8: Under the conditions of Theorem 1, for an arbitrary $\epsilon > 0$,

(53)

Proof of Lemma 8: For arbitrary $\epsilon > 0$,

(54)

Let the quantities be defined in (31) and (32), respectively; then

(55)

(56)

where (56) follows from Lemma 5 and (54). Continuing,

(57)

(58)
(59)

(60)

(61)

(62)

where (60) is obtained by using Lemma 4. Combining (56) and (62), we obtain (53).

Now we show that $\Pr\{K > 0\}$ does not decay exponentially in $n$.

Lemma 9: Let $K$ be defined as in (7). Then

(63)

Proof of Lemma 9: Using inequality (11) and Lemma 2, we have

(64)

(65)

The desired result follows from the asymptotic behavior of both terms in the right side of (65), found in Lemmas 7 and 8.

To finalize the proof of Theorem 1, we use an argument that is virtually identical to the proof of Theorem 2 in [11]; we spell out the details because our code construction is different from the one presented in [11]. To prove Theorem 1, we will also use the following auxiliary bound.

Lemma 10 ([17]): For a martingale $X_0, X_1, \dots, X_m$, if

$$|X_i - X_{i-1}| \le c_i \qquad (66)$$

for all $i$, then for all $\lambda > 0$

$$\Pr\{|X_m - X_0| \ge \lambda\} \le 2\exp\!\left(-\frac{\lambda^2}{2\sum_{i=1}^{m} c_i^2}\right). \qquad (67)$$

Define the martingale

(68)

where the conditioning is on successively revealed rows of the matrix $G$. For this martingale, (66) is satisfied according to Lemma 15. Note that the initial element of the martingale is the average (over all the codebooks) of the Hamming distance between the source realization and the closest codeword. Furthermore, there is no averaging with respect to the codebook in the definition of the final element, as it is the distance between the source realization and the closest codeword in the codebook defined by $G$, and

(69)

If

(70)

then there exists a convergent subsequence such that

(71)

From Lemma 10, along this subsequence we have

(72)

(73)

which contradicts Lemma 9. Thus

(74)

which, according to Lemma 10, implies that

(75)

Therefore, (10) follows in view of (69).

III. CODE CONSTRUCTION AND PROOF OF OPTIMALITY FOR THE DISCRETE MEMORYLESS SOURCE

A. Code Construction for the Nonredundant Source

We begin by generalizing the construction in Section II-A to the discrete nonredundant (i.e., memoryless and equiprobable) source taking values over an alphabet of $q$ symbols, which we label as $0, 1, \dots, q-1$. The codebook is defined through a $k \times n$ matrix $G$ (where addition is over the group $\mathbb{Z}_q$), as

(76)

for all $j$, where $\mathbf{u}_1, \dots, \mathbf{u}_M$ are binary $k$-vectors of Hamming weight $w$ in lexicographic order, and $w$ and $k$ are chosen from (2) and (3). The asymptotic rate of the code satisfies (see Lemma 1)

(77)

The matrix $G$ is obtained by random independent and identically distributed generation of its coefficients such that

(78)

and

(79)

For this code construction, we can show a general version of Lemma 3 (with the binary alphabet replaced by the $q$-ary alphabet in (18)) and of Lemma 4. In addition, this code construction achieves
the ideal rate–distortion tradeoff asymptotically (see [18] for details).

Theorem 2: Construct a codebook as given above with block length $n$ and asymptotic rate $R > R(D)$, where $R(D)$ is the rate–distortion function for the equiprobable $q$-ary source with Hamming distortion. Let $\mathbf{x}$ be the source realization. If $\hat{\mathbf{x}}$ is the nearest codeword in Hamming distance to $\mathbf{x}$, then

(80)

B. Code Construction for Discrete Memoryless Sources

Next, we bootstrap the code design for the nonredundant $q$-ary source with Hamming distortion to obtain asymptotically optimal codes for compressing general discrete memoryless sources with bounded and separable¹ distortion criteria. The idea for this code construction is similar to the one given in [19] for channel coding and [6] for lossy source coding.

Consider a discrete memoryless source taking values over an alphabet $\mathcal{A}$ with distribution $P_X$. Let $\hat{\mathcal{A}}$ be the reproduction alphabet, and let $d$ be the per-letter distortion measure. Consider the variational problem corresponding to the rate–distortion function for a given distortion $D$:

(81)

For brevity, we fix $D$ and denote by $P_{\hat{X}}$ the marginal distribution resulting from the minimization in (81). We will assume that $P_{\hat{X}}$ is a rational distribution, i.e., we can write it as

(82)

where the numerators and the common denominator $q$ are integers. This is not a very restrictive condition, as we can design codes to operate arbitrarily closely to any given point on the rate–distortion tradeoff curve. Thus, for every distortion level such that

(83)

(84)

the distribution corresponding to it is of the form given in (82).

Given a rate $R$ and $P_{\hat{X}}$ of the form (82), we construct a $q$-ary codebook $\mathcal{C}$ using LDGM matrices with block length $n$ and asymptotic rate $R$, and a deterministic mapping $\phi\colon \{0, \dots, q-1\} \to \hat{\mathcal{A}}$ such that $\phi$ maps the equiprobable probability distribution over $\{0, \dots, q-1\}$ to $P_{\hat{X}}$ over $\hat{\mathcal{A}}$. The codebook $\hat{\mathcal{C}}$ is obtained by applying $\phi$ to each symbol in the codebook $\mathcal{C}$. The compressor selects the codeword closest to the source realization (according to the distortion criterion defined by $d$). Again, since the number of codewords has not changed after applying the deterministic transformation $\phi$, from Lemma 1 the asymptotic rate remains $R$.

¹ Separable distortion means that $d_n(x^n, y^n) = \frac{1}{n}\sum_{i=1}^{n} d(x[i], y[i])$.

C. Code Analysis

The code construction in Section III-B achieves the ideal rate–distortion performance asymptotically, as stated in the following result.

Theorem 3: Consider a memoryless source distributed according to $P_X$ and per-letter distortion $d$, with rate–distortion function $R(D)$. Let $\hat{\mathbf{x}}$ be the output of the compressor–decompressor in Section III-B designed for distortion $D$, when the source realization is $\mathbf{x}$ and the asymptotic rate is $R > R(D)$. Then

(85)

Proof: We begin by showing the following result for the distribution of the codeword symbols.

Lemma 11: For every $j$, the symbols of the codeword $\hat{\mathbf{c}}_j$ are independent and identically distributed. If $\mathbf{u}_{j_1}$ and $\mathbf{u}_{j_2}$ have supports that are disjoint subsets of $\{1, \dots, k\}$, then $\hat{\mathbf{c}}_{j_1}$ and $\hat{\mathbf{c}}_{j_2}$ are independent. Furthermore

(86)

for all symbols of the reproduction alphabet.

Proof of Lemma 11: Let $\mathbf{c}_1, \dots, \mathbf{c}_M$ be the codewords in Section III-B before the mapping $\phi$ is applied. From Lemma 3 (generalized to the $q$-ary case), if $\mathbf{u}_{j_1}$ and $\mathbf{u}_{j_2}$ have disjoint supports, then $\mathbf{c}_{j_1}$ and $\mathbf{c}_{j_2}$ are independent. Therefore, $\hat{\mathbf{c}}_{j_1}$ and $\hat{\mathbf{c}}_{j_2}$ are independent, since $\phi$ is a deterministic mapping. Further, from Lemma 3 (generalized to the $q$-ary case), for

(87)

Therefore

(88)

(89)

(90)

where the set appearing in (89) is the subset of $\{0, \dots, q-1\}$ which is mapped to the given reproduction symbol. We get (90) because $\phi$ maps the equiprobable probability distribution to $P_{\hat{X}}$.

Returning to the proof of Theorem 3, define

(91)

and let $\hat{\mathbf{x}}$ be such that

(92)
then

(93)

(94)

Therefore, to prove Theorem 3, all we need to show is that there exists a set over which (92) holds and

(95)

For each source realization, let

(96)

and define

(97)

for a vanishing sequence which will be specified later. We will now show that the set defined in (97) satisfies (92) and (95) in the following two lemmas, completing the proof of Theorem 3.

Lemma 12: For the set defined in (97), (92) holds.

Proof of Lemma 12: Fix a point of the set. We have

(98)

(99)

(100)

(101)

(102)

where (102) follows from Lemma 1 and (97). Define the quantities in (31) and (32). Using Lemma 5 and (54)–(56), we get

(103)

(104)

Using Lemma 4, generalized to the $q$-ary case, for a sequence of sets we have

(105)

whenever the limits exist. Therefore, for a sequence of sets we have

(106)

(107)

(108)

whenever the limits exist (where $\phi^{-1}(\cdot)$ denotes the pre-image of a set under $\phi$). Using (108) and (57)–(60), we have

(109)

(110)

(111)

Therefore

(112)

Using (112), (102), and (11),

(113)

Now we proceed as in the proof of Theorem 1 (see also [11]). Define a martingale as

(114)

where the conditioning is on successively revealed columns of the matrix $G$. For this martingale, (66) is satisfied according to Lemma 17. The remainder of the proof proceeds as in the proof of Theorem 1.

Now, to complete the proof of Theorem 3, all we need to show is the following result.

Lemma 13: There exists a vanishing sequence such that, for the set defined in (97), (95) holds.

Proof of Lemma 13: From Lemma 11, for a fixed codebook, each codeword is a sequence of independent and identically distributed random variables with distribution $P_{\hat{X}}$, such that

(115)

Fix a source realization and define the random variables as follows. Let

(116)

let

(117)

and let

(118)
Define

(119)

Using the Chebyshev inequality,

(120)

From Lemma 16, for any fixed argument,

(121)

which implies, since the argument is arbitrary,

(122)

Using the Gärtner–Ellis theorem [20, Theorem 2.3.6],

(123)

(124)

where the rate function is the Fenchel–Legendre transform of the limiting logarithmic moment generating function, i.e.,

(125)

From [21, Theorem 2], the exponent in (124) is positive. Therefore, for a suitably chosen vanishing sequence, (120) implies that (95) is satisfied.

Finally, Theorem 3 follows from Lemmas 12 and 13, as well as (92) and (95).
IV. SUBOPTIMAL ALGORITHMS FOR COMPRESSION

In this section, we describe a suboptimal algorithm (and its variants) to encode a source using the codebooks described in Sections II and III. This algorithm attempts to locate a codeword in the codebook such that the distortion between the source and the codeword is minimized. We note that optimal selection of such a codeword is NP-complete [22]; therefore, any polynomial-complexity algorithm (like the one presented here) is necessarily suboptimal. However, it should be noted that NP-completeness implies hardness in the sense of worst case inputs; it does not rule out the existence of a polynomial-time algorithm that is able to locate the minimum-distance codeword for most source sequences. Empirical results in Section V demonstrate that our algorithm attains near-optimum performance. The compressor/decompressor work as follows.

Recall that the codebook is specified by the $k \times n$ matrix $G$ and consists of all the codewords of length $n$ that can be written as

(126)

where $\phi$ is the identity mapping for the binary and nonredundant cases, and is a deterministic many-to-one mapping (see Section III-B) otherwise; with a slight abuse of notation, each component is mapped with the function $\phi$ defined in Section III-B. The $k$-vector $\mathbf{u}$ satisfies

(127)

The algorithm attempts to find a good approximation to the source string of length $n$ among the codewords in an iterative manner. At each step in the iteration, we select a string of length $k$ by flipping one and only one bit in $\mathbf{u}$. The algorithm starts with the all-zero $\mathbf{u}$, and at the completion of the algorithm $\mathbf{u}$ has Hamming weight $w$. The choice of the bit to flip in $\mathbf{u}$ is such that the distortion between the source and the corresponding codeword is minimized. To that end, we perform an exhaustive search over the $k$ rows of $G$, computing the distortion metric resulting from each candidate flip, and selecting the index that leads to the lowest distortion metric. This procedure is repeated until no flip reduces the distortion. At that point, if the Hamming weight of $\mathbf{u}$ is smaller than $w$, the algorithm is constrained to flip only bits which are zero in $\mathbf{u}$ that lead to minimum distortion, and vice versa when the Hamming weight of $\mathbf{u}$ exceeds $w$. We halt when the Hamming weight of $\mathbf{u}$ equals $w$.

It is immaterial how ties are broken by the compressor. At the final configuration of $\mathbf{u}$, the encoder stores the value of $\mathbf{u}$ in the form of an index, using an enumerative encoding scheme [23]. The decoder then uses this index to recover $\mathbf{u}$, and outputs the corresponding codeword. A pseudocode description of the encoder is given in Algorithm 1.
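Algorithm 1 itself is not reproduced here; the following sketch is one possible rendering of the greedy encoder just described, simplified (as an assumption) to perform exactly $w$ zero-to-one flips starting from the all-zero $\mathbf{u}$, with ties broken by lowest index and $\phi$ taken to be the identity.

```python
import numpy as np

def greedy_encode(x, G, w, dist=lambda a, b: float(np.mean(a != b))):
    """Greedy bit-flipping encoder (a sketch of the procedure above).

    x: length-n source realization; G: k x n binary generator matrix;
    w: required Hamming weight of the message u.  Returns (u, u G mod 2).
    """
    k, n = G.shape
    u = np.zeros(k, dtype=int)
    c = np.zeros(n, dtype=int)           # running codeword u G mod 2
    for _ in range(w):                   # flip exactly w bits in total
        best_j, best_d = -1, float("inf")
        for j in range(k):               # exhaustive search over rows of G
            if u[j]:
                continue                 # only 0 -> 1 flips in this sketch
            d = dist(x, (c + G[j]) % 2)  # flipping bit j adds row j of G
            if d < best_d:
                best_j, best_d = j, d
        u[best_j] = 1
        c = (c + G[best_j]) % 2
    return u, c
```

The encoder then stores the enumerative index of $\mathbf{u}$ [23], and the decoder recomputes the codeword from that index.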
Some other variations of this algorithm are also possible. For example, we can fix a recursion depth. We run multiple copies of the algorithm whenever there are ties for the element with the maximum gain, flipping each maximum-gain position in a different copy. After the fixed number of bits have been flipped, the algorithm proceeds as described above in each of the multiple copies, and finally we choose the winner out of all these copies. Another possible variation is to flip pairs of bits simultaneously, selecting the pair which leads to the best approximation to the source.

For the core algorithm, the complexity analysis may be performed as follows. Over the course of the encoding we add a total of $w$ rows of $G$ to the running codeword, and each row contains a sublinear number of nonzero entries on average, so updating the distortion metric for a candidate flip is cheap on average; each iteration performs an exhaustive search over the $k$ candidate positions. Thus, the average complexity of the algorithm compares favorably with various message-passing-based approaches such as survey propagation [9] and its variants [8].
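For the index stored by the encoder, any enumerative scheme for fixed-weight vectors works [23]; the following is a minimal sketch of lexicographic ranking (the particular ordering is our assumption, not prescribed by the paper):

```python
from math import comb

def enum_index(u):
    """Lexicographic rank of a binary vector u among all vectors of the
    same length and Hamming weight (enumerative encoding, cf. [23])."""
    idx, r = 0, sum(u)
    for i, bit in enumerate(u):
        if bit:
            # all weight-r completions starting with a 0 here come first
            idx += comb(len(u) - i - 1, r)
            r -= 1
    return idx
```

The index occupies $\lceil \log_2 \binom{k}{w} \rceil$ bits, matching the rate accounted for in Lemma 1.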
V. EXPERIMENTS

In this section, we show empirical results obtained with the codes given in Sections II and III and the encoding/decoding algorithms in Section IV, for a variety of rate–distortion problems. For each rate, we fix a randomly generated codebook and average the distortion obtained for compressing a random source (over 1000 iterations).

LDGM codes and message-passing algorithms perform very close to the rate–distortion function for compressing the binary symmetric source with Hamming distortion for block lengths of the order of thousands, as demonstrated by the empirical results in [8] and [9]. However, these algorithms perform far from optimal for short block lengths, due to the effect of cycles in the graph. On the other hand, our scheme performs well even for short block lengths (such as $n = 400$), as shown in Fig. 1. For block length $n = 1000$, both schemes are very close to the rate–distortion function, without any discernible difference in performance, as seen in Figs. 2 and 3.
In Figs. 4 and 5, we plot the rate–distortion tradeoff for an equiprobable 4-ary source for block lengths $n = 100$ and $n = 400$, respectively, with a Hamming distortion criterion. These figures show that the performance of our codes and encoding algorithm is very close to optimal at short block lengths for the general $q$-ary source with Hamming distortion.

In Fig. 6, we show results obtained with codes from Section III for compressing the Bernoulli $(p = 0.4)$ source with a Hamming distortion criterion and block length $n = 1000$. We now provide an illustration of the code construction in Section III-B for this problem. In this case, the reproduction distribution corresponding to the chosen rate–distortion point is a Bernoulli distribution with a rational parameter; thus, a reasonable choice of $q$ in (82) is the common denominator of that distribution. The mapping $\phi$ should map the equiprobable probability distribution over $\{0, \dots, q-1\}$ to the reproduction distribution over $\{0, 1\}$; a possible choice is $\phi(x) = 1$ for the appropriate number of symbols and $\phi(x) = 0$ otherwise. Thus, to obtain the codebook for compressing the Bernoulli $(0.4)$ source with Hamming distortion, we construct a $q$-ary codebook over the alphabet $\{0, \dots, q-1\}$ with rate $R$, and apply the mapping $\phi$ to each of its codewords.
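As a concrete instance of this recipe (with an illustrative distortion value of our choosing, since the operating point of Fig. 6 is not legible here): for a Bernoulli$(p)$ source under Hamming distortion with $D \le p \le 1/2$, the minimizing reproduction distribution in (81) is Bernoulli$\bigl(\tfrac{p-D}{1-2D}\bigr)$. Taking $p = 0.4$ and, say, $D = 0.1$,

$$P_{\hat{X}}(1) = \frac{p - D}{1 - 2D} = \frac{0.4 - 0.1}{1 - 0.2} = \frac{3}{8}$$

so (82) holds with $q = 8$, and a valid mapping sets $\phi(x) = 1$ for three of the eight symbols and $\phi(x) = 0$ for the remaining five, sending the uniform distribution on $\{0, \dots, 7\}$ to Bernoulli$(3/8)$.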
In Fig. 7, we show results obtained with the codes in Section III, when used for compressing the binary symmetric source (with block length $n = 1000$), where the distortion criterion satisfies

(128)

These experiments demonstrate the near-optimal performance of the proposed codes for simple memoryless sources and separable distortion criteria, even for short block lengths. Furthermore, the low complexity of the proposed suboptimal compression algorithm makes the new codes particularly appealing.
APPENDIX I
PROOF OF LEMMA 1

The rate of the code as a function of the block length is given as

$$R_n = \frac{1}{n} \log_2 \binom{k}{w}. \qquad (129)$$

From [16, eq. (11.40)],

$$\frac{2^{k\,h(w/k)}}{k + 1} \le \binom{k}{w} \le 2^{k\,h(w/k)} \qquad (130)$$

where $h(\cdot)$ denotes the binary entropy function. Therefore

(131)

(132)

(133)

(134)

(135)

as we wanted to show.
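As a quick numeric sanity check of the bound in (130) (the parameter pairs below are arbitrary test values, not the choices in (2) and (3)):

```python
from math import comb, log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Check  2^{k h(w/k)} / (k + 1)  <=  C(k, w)  <=  2^{k h(w/k)}  in log form.
for k, w in [(20, 5), (200, 30), (2000, 120)]:
    lower = k * h(w / k) - log2(k + 1)
    exact = log2(comb(k, w))
    upper = k * h(w / k)
    assert lower <= exact <= upper
    print(f"k={k:5d} w={w:4d}: {lower:9.2f} <= {exact:9.2f} <= {upper:9.2f}")
```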
Fig. 1. Empirical performance of the code in Section II-A with the suboptimal encoding algorithm in Section IV, compared with LDGM codes and the message-passing heuristic from [8], for the binary symmetric source with bit-error-rate distortion and block length n = 400.

Fig. 2. Empirical performance of the code in Section II-A with the suboptimal encoding algorithm in Section IV, compared with LDGM codes and the message-passing heuristic from [8], for the binary symmetric source with bit-error-rate distortion and block length n = 1000.
APPENDIX II
AUXILIARY RESULTS

Lemma 14: Let $X_1, X_2, \dots$ be independent and identically distributed binary random variables such that

(136)

and let an arbitrary binary sequence be given. Then

(137)

(138)

Proof: Let $Y_1, Y_2, \dots$ be independent binary random variables with

(139)

If we show that, for an arbitrary index,

(140)

then (138) follows by induction. Let

(141)
Fig. 3. Empirical performance of the code in Section II-A with the suboptimal encoding algorithm in Section IV, compared with LDGM codes and the message-passing heuristic from [8], for the binary symmetric source with bit-error-rate distortion and block length n = 2000.

Fig. 4. Empirical performance of the code in Section III for compressing the 4-ary source with block length n = 100 and Hamming distortion criterion.
and

(142)

(143)

(144)

Similarly,

(145)

Clearly, (140) holds. Using (140) and induction, we have (137) and (138).

Lemma 15: Let the martingale be defined as in (68); then

(146)
Fig. 5. Empirical performance of the new code for compressing the 4-ary source with block length n = 400 and Hamming distortion criterion.

Proof: For a given source realization, let

(147)

and note further that the distribution function of this quantity satisfies

(148)

because, by construction, the rows of the matrix are independent. Conditioning, we obtain

(149)

(150)

A change in one column of the matrix can change the minimum distance by at most one. Therefore

(151)

Choosing the parameters appropriately,

(152)

(153)

regardless of the conditioning. Therefore, (146) holds. Note that Lemma 15 was shown for the code construction in [11]; however, we cannot use the proof therein due to the difference in code construction.

Lemma 16: For every choice of the constants there exists an index such that

(154)

where the quantities involved are defined in (119), (116), and (118), respectively.

Proof:

(155)

From Lemma 11, for every choice there exists an index such that

(156)

Therefore,

(157)

(158)

(159)
Fig. 6. Empirical performance of the code for compressing the Bernoulli (p = 0.4) source with block length n = 1000 and bit-error-rate distortion criterion.

Fig. 7. Empirical performance of the new code for compressing the binary symmetric source with block length n = 1000 and the asymmetric distortion criterion in (128).

Substituting from (116) in (159) and using (119),

(160)

Choosing the constants appropriately, we get the required result.

Lemma 17: Let the martingale be defined as in (114); then

(161)

Proof: For a fixed source realization, let

(162)

Using (149)–(150),

(163)

(164)

(165)

We get (164) because a change in one column of the matrix can change the quantity by at most a bounded amount.
Lemma 18: Let $X_1, X_2, \dots$ be independent binary random variables with

(166)

Then, for all indices,

(167)

where the right-hand side does not depend on the index. Proof: Identifying terms, we get (167).

Lemma 19: For a sequence of vectors satisfying (166) and a sequence of integers,

(168)

Proof: Define

(169)

and

(170)

where $\wedge$ denotes the logical AND operation and the overbar denotes the complement. The three vectors so defined are nonoverlapping in the sense that

(171)

Further

(172)

and

(173)

where $\oplus$ denotes the logical XOR operation. Further

(174)

Therefore

(175)

These vectors are mutually independent, since they are nonoverlapping. Further, the relevant bit equals one if and only if the ones in the corresponding vector select an odd number of ones in the row of the matrix under consideration. The probability of this event satisfies (see (20)–(22))

(176)

(177)

We get (177) from (176). Obviously

(178)

(179)

(180)

From (166),

(181)

(182)

but, for any index, Lemma 3 yields

(183)

Therefore, (180) follows from Lemma 18.

Lemma 20: For any sequence of deterministic vectors,

(184)

Proof: The proof is similar to the proof of Lemma 19. Denoting the common distribution of the bits involved, we have

(185)

(186)

(187)

and, using Lemma 3,

(188)

Lemma 21:

(189)

whenever the limits exist.

Proof: Let

(190)

and analogously for the companion quantities. Then

(191)
We have

(192)

(193)

Taking limits and using Lemma 20, the result follows.

Lemma 22: For a sequence of sets satisfying (166) and a sequence,

(194)

whenever the limits exist.

Proof: For each index, let

(195)

and

(196)

Therefore

(197)

(198)

Taking limits and using Lemma 19, we get (194).
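Several of the lemmas above rest on the behavior of modulo-2 sums of independent biased bits; the exact identity involved (the piling-up formula, which we believe is what the binomial expansion in (25) evaluates, although the displays themselves are not reproduced here) can be checked numerically:

```python
from functools import reduce
from itertools import product

def xor_one_prob(ps):
    """Exact P{X_1 xor ... xor X_m = 1} for independent bits with
    P{X_i = 1} = ps[i]: the piling-up identity (1 - prod(1-2p_i)) / 2."""
    return (1 - reduce(lambda a, p: a * (1 - 2 * p), ps, 1.0)) / 2

def xor_one_prob_bruteforce(ps):
    """Same probability by exhaustive enumeration, for verification."""
    total = 0.0
    for bits in product((0, 1), repeat=len(ps)):
        if sum(bits) % 2 == 1:
            pr = 1.0
            for b, p in zip(bits, ps):
                pr *= p if b else (1 - p)
            total += pr
    return total

ps = [0.2, 0.05, 0.3, 0.1]
assert abs(xor_one_prob(ps) - xor_one_prob_bruteforce(ps)) < 1e-12
# As the number of summed biased bits grows, the XOR tends to a fair
# coin flip, which is the mechanism behind Lemma 3.
print(xor_one_prob([0.1] * 50))  # approximately 0.5
```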
ACKNOWLEDGMENT
We wish to thank the referees for their help in improving the
presentation.
REFERENCES
[1] I. Csiszár and J. Körner, Information Theory, Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[2] G. Caire, S. Shamai (Shitz), and S. Verdú, “Lossless data compression with error correcting codes,” in Advances in Network Information
Theory. Providence, RI: Amer. Math. Soc., 2004, vol. 66, DIMACS
Series in Discrete Mathematics and Theoretical Computer Science, pp.
263–284.
[3] T. Ancheta, “Bounds and Techniques for Linear Source Coding,” Ph.D.
dissertation, Dep. Elec. Eng., Univ. Notre Dame, Notre Dame, IN,
1977.
[4] J. L. Massey, “Joint source and channel coding,” Commun. Syst.
Random Process Theory, vol. 11, pp. 279–293, 1978.
[5] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[6] J. Chen, D. He, and A. Jagmohan, “Achieving the rate-distortion bound
with linear codes,” in Proc. 2007 IEEE Information Theory Workshop,
Lake Tahoe, CA, Sep. 2007, pp. 662–667.
[7] Y. Matsunaga and H. Yamamoto, “A coding theorem for lossy data
compression by LDPC codes,” IEEE Trans. Inf. Theory, vol. 49, no. 9,
pp. 2225–2229, Sep. 2003.
[8] M. J. Wainwright and E. Maneva, “Lossy source encoding via message
passing and decimation over generalized codewords of LDGM codes,”
in Proc. 2005 IEEE Int. Symp. Information Theory, Adelaide, Australia,
Sep. 2005, pp. 1493–1497.
[9] S. Ciliberti, M. Mézard, and R. Zecchina, “Message passing algorithms for non-linear nodes and data compression,” ComPlexUs, vol. 3, pp. 58–65, Aug. 2006.
[10] A. Braunstein, M. Mézard, and R. Zecchina, “Survey propagation: An algorithm for satisfiability,” Random Structures and Algorithms, vol. 27, pp. 201–226, Mar. 2005.
[11] E. Martinian and M. J. Wainwright, “Low-density codes achieve the
rate-distortion bound,” in Proc. 2006 Data Compression Conf., Snowbird, UT, Mar. 2006, pp. 153–162.
[12] S. Miyake, “Lossy data compression over Z_q by LDPC code,” in Proc.
2006 IEEE Int. Symp. Information Theory, Seattle, WA, Jul. 2006, pp.
813–816.
[13] S. Miyake and J. Muramatsu, “Construction of a lossy source code
using LDPC matrices,” in Proc. 2007 IEEE Int. Symp. Information
Theory, Nice, France, Jun. 2007, pp. 1106–1110.
[14] E. Martinian and M. J. Wainwright, “Analysis of LDGM and compound
codes for lossy compression and binning,” in Proc. 2006 Workshop on
Information Theory and its Applications, La Jolla, CA, Feb. 2006.
[15] S. Kudekar and R. Urbanke, “Lower bounds on the rate-distortion function of individual LDGM codes,” in Proc. 5th Int. Symp. Turbo Codes
and Related Topics, Lausanne, Switzerland, Sep. 2008, pp. 379–384.
[16] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd
ed. New York: Wiley Interscience, 2006.
[17] K. Azuma, “Weighted sums of certain dependent random variables,”
Tohoku Math. J., vol. 19, pp. 357–367, 1967.
[18] A. Gupta and S. Verdú, “Nonlinear sparse-graph codes for lossy compression of discrete nonredundant sources,” in Proc. 2007 IEEE Information Theory Workshop, Lake Tahoe, CA, Sep. 2007, pp. 541–546.
[19] R. G. Gallager, Information Theory and Reliable Communication.
New York: Wiley, 1968.
[20] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. New York: Springer, 2004.
[21] A. Dembo and I. Kontoyiannis, “Source coding, large deviations and
approximate pattern matching,” IEEE Trans. Inf. Theory, vol. 48, no.
6, pp. 1590–1615, Jun. 2002.
[22] E. R. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg, “On the
intractability of certain coding problems,” IEEE Trans. Inf. Theory, vol.
IT-24, no. 3, pp. 384–386, May 1978.
[23] T. M. Cover, “Enumerative source encoding,” IEEE Trans. Inf. Theory, vol. IT-19, no. 1, pp. 73–77, Jan. 1973.
Ankit Gupta (S’07) received the B.Tech. degree in 2003 from Indian Institute
of Technology, Delhi, India, and the M.A. degree in 2006 from Princeton University, Princeton, NJ, both in electrical engineering.
He is currently pursuing the Ph.D. degree in electrical engineering at
Princeton University.
Sergio Verdú (S’80–M’84–SM’88–F’03) received the Telecommunications
Engineering degree from the Universitat Politècnica de Barcelona, Barcelona,
Spain, in 1980 and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, in 1984.
Since 1984, he has been a member of the faculty of Princeton University,
Princeton, NJ, where he is the Eugene Higgins Professor of Electrical Engineering.
Sergio Verdú is the recipient of the 2007 Claude E. Shannon Award and
the 2008 IEEE Richard W. Hamming Medal. He is a member of the National
Academy of Engineering and was awarded a Doctorate Honoris Causa from
the Universitat Politècnica de Catalunya in 2005. He is a recipient of several
paper awards from the IEEE: the 1992 Donald Fink Paper Award, the 1998 Information Theory Outstanding Paper Award, an Information Theory Golden Jubilee Paper Award, the 2002 Leonard Abraham Prize Award, and the 2006 Joint
Communications/Information Theory Paper Award. In 1998, Cambridge University Press published his book Multiuser Detection, for which he received the
2000 Frederick E. Terman Award from the American Society for Engineering
Education. He served as President of the IEEE Information Theory Society in
1997 and as Associate Editor for Shannon Theory of the IEEE TRANSACTIONS
ON INFORMATION THEORY. He is currently Editor-in-Chief of Foundations and
Trends in Communications and Information Theory.