IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 5, MAY 2009

Nonlinear Sparse-Graph Codes for Lossy Compression

Ankit Gupta, Student Member, IEEE, and Sergio Verdú, Fellow, IEEE

Abstract—We propose a scheme for lossy compression of discrete memoryless sources: the compressor is the decoder of a nonlinear channel code, constructed from a sparse graph. We prove asymptotic optimality of the scheme for any separable (letter-by-letter) bounded distortion criterion. We also present a suboptimal compression algorithm, which exhibits near-optimal performance at moderate block lengths.

Index Terms—Discrete memoryless sources, lossy data compression, rate–distortion theory, source–channel coding duality, sparse-graph codes.

I. INTRODUCTION

EVEN for simple sources and distortion criteria, such as Bernoulli processes with bit-error-rate distortion, the construction of compression–decompression algorithms that perform near the rate–distortion function with reasonable complexity lags well behind the construction of capacity-achieving error-correcting codes. One reason for this is the fact that while linear codes achieve capacity for discrete channels with additive noise [1] (and the minimum lossless compression rate for arbitrary sources [2]), linear compressors cannot approach the rate–distortion function [3] (see also [4]). However, suppose that for a binary symmetric source with bit-error-rate distortion, the codewords of a linear code for a binary symmetric channel are used as reconstruction codewords. Then, if the compressor is the maximum-likelihood channel decoder, it is possible to find a sequence of linear codes that attain the rate–distortion function [5]. More generally, using nonbinary linear codes it is possible to approach the rate–distortion function of discrete memoryless sources arbitrarily closely as long as the distortion function is separable [6].

The advances in sparse-graph codes that perform close to capacity with low encoding–decoding complexity have spurred a number of recent works in the lossy data compression literature where a decoder for a low-density parity-check (LDPC) code or low-density generator-matrix (LDGM) code is used as the compressor. A sequence of LDPC codes is constructed in [7] that attains the rate–distortion function of the binary symmetric source with bit-error-rate distortion when the maximum-likelihood channel decoder is used as the lossy compressor. Unfortunately, the belief propagation decoder fails when used as a lossy encoder for this code. Furthermore, a polynomial-complexity encoder with near-optimal performance has not been found for this code. LDGM codes were proposed for this problem in [8]. In [9], generalized LDGM codes were constructed by substituting modulo addition by other Boolean operations.

Manuscript received June 24, 2007; revised January 08, 2009. Current version published April 22, 2009. This work was supported in part by the National Science Foundation under Grant CCR-0312839. The material in this paper was presented in part at the IEEE Information Theory Workshop, Lake Tahoe, CA, September 2007. The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA. Communicated by M. Effros, Associate Editor for Source Coding. Color versions of Figures 1–7 in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIT.2009.2016040
Both [8] and [9] also propose low-complexity compressors based on the survey propagation algorithm [10] that show excellent empirical performance. However, the asymptotic optimality of LDGM codes for this problem is still open. Another approach, using an LDPC–LDGM hybrid code with bounded check degrees, is proposed in [11] and proven to be asymptotically optimal (with the computationally intensive maximum-likelihood decoder used as the compressor).

Sparse-graph lossy compression systems for more general rate–distortion problems have been studied in [12] and [13]. In [12], asymptotically optimal LDPC codes for compressing the nonredundant (i.e., memoryless and equiprobable) q-ary source with a Hamming distortion criterion were proposed. In [13], an asymptotically optimal lossy compressor (based on LDPC codes) for compressing the Bernoulli source with Hamming distortion is proposed, but no computationally feasible compression algorithm is known for the codes in [12] and [13]. A sparse-graph-based lossy compressor for compressing discrete memoryless sources with an arbitrary separable distortion criterion has not yet been found in the literature. In fact, no code (linear or nonlinear; sparse-graph-based or not) is known that exhibits both asymptotic optimality and a computationally feasible compression algorithm with near-optimal performance in the finite-block-length regime.

In the literature, various types of matrix sparsity are referred to as “low-density.” In the strong sense, this means that the number of nonzero entries per column (or row) of the (parity-check or generator) matrix remains bounded as the block length grows [9], [8], [11], [14]. A weaker notion is that they are allowed to scale sublinearly with the block length n [7], [12], [13]. It was shown in [15] that any LDGM code with a bounded number of ones per column cannot achieve the optimal rate–distortion tradeoff for the binary symmetric source with Hamming distortion.

In this paper, we propose a new construction of nonlinear codes based on LDGM matrices, which are asymptotically optimal for compressing discrete memoryless sources with a separable distortion criterion. This construction is low-density in the weaker sense that the number of nonzero entries per row of the generator matrix scales sublinearly with the block length n. We also provide suboptimal compressors for these codes, which have excellent empirical performance, even at moderate block lengths. Our code design can be viewed as an intermediate point on a continuum of block codes, with the linear codebook and the random nonlinear codebook as the two extremes.

The remainder of this paper is organized as follows. Section II presents the code design and the proof of asymptotic optimality of the construction for compressing the binary symmetric source with a Hamming distortion criterion. Section III extends the codes presented in Section II to compressing discrete memoryless sources with a separable (i.e., letter-by-letter) and bounded distortion criterion. Section IV proposes suboptimal compression algorithms whose performance is illustrated in Section V.
II. CODE CONSTRUCTION AND ANALYSIS FOR THE BINARY SOURCE

A. Code Construction

A binary codebook of block length n is a set of binary n-vectors (the codewords). If there is no underlying structure to this set (for example, a random codebook), then exponential complexity is required for channel decoding (or lossy compression). A binary linear codebook, on the other hand, is a much more restricted set of codewords: all the binary n-vectors that can be written as

x = uG (mod 2)   (1)

for all possible choices of a binary m-vector u, for a given binary matrix G. Note that if u is not allowed to range over all choices, then the ensuing codebook is, in general, nonlinear. In fact, any codebook (linear or nonlinear) can be described by (1) if u is allowed to range over only the vectors with unit Hamming weight, and G has as many rows as codewords.

In this paper, we propose a class of nonlinear codebooks that has some convenient structure, obtained by letting u range over the m-vectors with a given Hamming weight w, where w and m are chosen according to

(2)

and

(3)

We denote by u_1, ..., u_K the binary m-strings of Hamming weight w in lexicographic order. The codebook is given by

(4)

The number of codewords in the codebook is equal to

K = C(m, w)   (5)

where C(·, ·) denotes the binomial coefficient. A convenient low-density choice of the binary matrix G is by independent and identically distributed generation of its coefficients, where

(6)

The lossy compressor is the minimum-Hamming-distance decoder, and the decompressor is simply the encoder.
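As a concrete illustration, the following sketch builds a toy codebook of this form and runs the optimal (exhaustive) minimum-distance compressor. It is only a hypothetical instantiation: the values of m, w, and the matrix density are illustrative stand-ins for the choices prescribed by (2), (3), and (6), and we adopt the convention that a codeword is the modulo-2 sum of the rows of G selected by the support of u.

```python
import itertools
import numpy as np

def build_codebook(n, m, w, density, rng):
    """Sample a sparse binary matrix G (one row per component of u) and
    form the nonlinear codebook {uG mod 2 : wt(u) = w}.  Each codeword is
    the mod-2 sum of the w rows of G selected by the support of u."""
    G = (rng.random((m, n)) < density).astype(np.uint8)
    supports = itertools.combinations(range(m), w)  # weight-w u's, lexicographic
    codebook = np.array([np.bitwise_xor.reduce(G[list(s)], axis=0)
                         for s in supports])
    return G, codebook

def compress(x, codebook):
    """Optimal compressor: index of the nearest codeword in Hamming distance."""
    dists = np.count_nonzero(codebook != x, axis=1)
    return int(np.argmin(dists)), int(dists.min())

# Toy usage; all parameter values are illustrative only.
rng = np.random.default_rng(0)
G, codebook = build_codebook(n=16, m=10, w=3, density=0.3, rng=rng)
x = rng.integers(0, 2, size=16, dtype=np.uint8)
index, distance = compress(x, codebook)
print(len(codebook), index, distance / 16)  # C(10, 3) = 120 codewords
```

Enumerating all C(m, w) codewords is of course infeasible at realistic block lengths; the point of Section IV is precisely to avoid this exhaustive search.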
B. Code Analysis

We now turn to the analysis of the code introduced in Section II-A. We show that with high probability the lossy compressor described in Section II-A asymptotically attains the rate–distortion function of the memoryless binary symmetric source with Hamming distortion. More formally, we show the following result.

Theorem 1: Let a codebook be constructed as specified in Section II-A with block length n and rate R > 1 − h(D), where h(·) denotes the binary entropy function. As n → ∞, the Hamming distortion obtained by representing an arbitrary n-length source realization with the nearest codeword in the codebook is less than D almost surely.

Proof: Pick a random codebook as outlined in Section II-A. Label the codewords as c_1, ..., c_K. Denote

(7)

and

(8)

The event in question is equivalent to the event that at least one codeword in the codebook is within Hamming distortion D of the given source. Thus, if we show that the probability of this event goes to one as n → ∞, the theorem will be proved. However, we will see later that, using martingale arguments, it is sufficient to show that

(9)

to claim that

(10)

Therefore, we first show (9), and then we prove (10) to complete the proof of Theorem 1. The proof is structured as a sequence of intermediate lemmas.

The Cauchy–Schwarz inequality gives a lower bound on the probability of this event in terms of the first and second moments of the nonnegative random variable defined in (8): for a nonnegative integer-valued Z, E[Z] = E[Z 1{Z ≥ 1}] ≤ (E[Z²] P(Z ≥ 1))^{1/2}, so that P(Z ≥ 1) ≥ (E[Z])² / E[Z²].

Lemma 1: The asymptotic rate of the code converges to the design rate:

(1/n) log₂ C(m, w) → R.   (11)

Proof: See Appendix I.

To compute the moments above, we make use of the following result.

Lemma 2: With the choice of parameters in (2) and (3),

(12)

Proof of Lemma 2: Although this result is similar to Lemma 3 in [11], we cannot use the proof therein because it requires the code to be linear:

(13)
(14)
(15)
(16)
(17)

where (17) follows from (16) because, by symmetry, the inner quantity does not depend on the codeword index.

We now show that, asymptotically, each codeword behaves like a sequence of fair coin flips, as formalized by the following result.

Lemma 3: For every i, the bits of the codeword c_i are independent and identically distributed. If S and S' are disjoint subsets of {1, ..., n}, then the restrictions of the codewords to S and S' are independent. Furthermore,

(18)

Proof of Lemma 3: Recall that

(19)

By definition, a given bit of c_i equals 1 if and only if the positions corresponding to the ones in u_i select an odd number of ones in the corresponding row of G. These events are independent, with identical probabilities across positions, because the coefficients of G are independent and identically distributed. Thus, the restrictions to disjoint subsets are independent, and the bits are independent and identically distributed. Furthermore,

(20)
(21)
(22)
(23)
(24)

where (21) follows from (20) through application of the binomial expansion

(25)

Therefore

(26)

To compute probabilities involving pairs of codewords, we need their joint statistics. To that end, we have the following result.

Lemma 4: Let the quantities and the sequence be chosen as specified therein. Then

(27)

whenever the limits exist.

Proof of Lemma 4:

(28)
(29)
(30)

where (29) is obtained from Lemmas 21 and 22, and (30) is obtained from Lemma 21.

Returning to the proof of Theorem 1, let

(31)

and

(32)

In order to compute the quantity on the right-hand side of (12), we write

(33)

The first term in (33) is less than or equal to a quantity that grows subexponentially with n, as the following result shows.

Lemma 5: Let the set be defined as in (31); then

(34)

Proof of Lemma 5: Let x_0 and x_1 be the all-zero and all-one n-vectors, respectively. Then, using Lemma 14 (Appendix II) and Sanov's theorem (e.g., [16, Theorem 11.4.1]), we obtain

(35)

and let

(36)

be the corresponding set of n-vectors. Clearly,

(37)
(38)
(39)

where we used the fact that

(40)

Next, using (39), we show that the quantity of interest grows subexponentially in n, thus proving (34):

(41)
(42)
(43)
(44)

Substituting and noticing that, by assumption, the relevant exponent vanishes, we get (43). From (44) and the lower bound above, we get (34).

We give the asymptotic rate of decay of the relevant probability in the following lemma.

Lemma 6: Let the codebook be a random codebook as chosen in Section II-A. Then

(45)

Proof of Lemma 6: Although a similar result is given in [14, Lemma 3], we give a self-contained proof due to the different code construction. From Lemma 3, the bits of each codeword are independent and identically distributed with

(46)

If

(47)

then

(48)
(49)
(50)
(51)

Together with (47), we obtain the desired result from (49)–(51).

Using Lemma 2 in [11], we have the following result.

Lemma 7: For the random variable defined in (8),

(52)

Combined with inequality (11) and Lemma 7, the following result gives a lower bound to the second term in (33).

Lemma 8: If the stated condition holds for an arbitrary constant, then

(53)

Proof of Lemma 8: For an arbitrary constant,

(54)

Let the sets be defined in (31) and (32), respectively; then

(55)
(56)

where (56) follows from Lemma 5 and (54). Further,

(57)
(58)
(59)
(60)
(61)
(62)

where (60) is obtained by using Lemma 4. Combining (56) and (62), we obtain (53).
To finalize the proof of Theorem 1 we use an argument that is virtually identical to the proof of Theorem 2 in [11]; we spell out the details because our code construction is different from the one presented in [11]. To prove Theorem 1 we will also use the following auxiliary bound. Lemma 10: [17] For a martingale if III. CODE CONSTRUCTION AND PROOF OF OPTIMALITY FOR THE DISCRETE MEMORYLESS SOURCE A. Code Construction for the Nonredundant Source We begin by generalizing the construction in Section II-A to the discrete nonredundant (i.e., memoryless and equiprobable) source taking values over an alphabet . We label the symbols . The codebook in the source alphabet as is defined through a matrix (where addition is over the group ), as (76) (66) , then for all (67) where are binary -vectors of Hamming in lexicographic order and and are chosen weight from (2) and (3). The asymptotic rate of the code satisfies (see Lemma 1) Define the martingale (77) (68) are the rows of the matrix. For this where martingale, (66) is satisfied with according to Lemma 15. is the average (over all the codebooks) of the HamNote that ming distance between the source realization and the closest codeword. Furthermore, there is no averaging with respect to the , as it is the distance between codebook in the definition of the source realization and the closest codeword in the codebook defined by , and (69) The matrix is obtained by random independent and identically distributed generation of its coefficients such that (78) and (79) for . For this code construction we can show a general by in (18), and of Lemma version of Lemma 3 replacing 4 where . In addition, this code construction achieves Authorized licensed use limited to: Princeton University. Downloaded on July 28, 2009 at 12:23 from IEEE Xplore. Restrictions apply. 1966 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 5, MAY 2009 the ideal rate–distortion tradeoff asymptotically (see [18] for details). Theorem 2: Construct a codebook as given above with a , where is the rate–distorblock length and tion function for the equiprobable -ary source with Hamming is the nearest distortion. Let be the source realization. If codeword in Hamming distance to , then (80) . for all C. Code Analysis The code construction in Section III-B achieves the ideal rate distortion performance asymptotically, as stated in the following result. Theorem 3: Consider a memoryless source distributed according to and per-letter distortion , with . Let be the output of the rate–distortion function compressor–decompressor in Section III-B designed for distor, when the source realization and asymptotic rate tion is . Then (85) B. Code Construction for Discrete Memoryless Sources Next, we bootstrap the code design for the nonredundant -ary source with Hamming distortion, to obtain asymptotically optimal codes for compressing general discrete memoryless sources with bounded and separable1 distortion criteria. The idea for this code construction is similar to the one given in [19] for channel coding and [6] for lossy source coding. Consider a discrete memoryless source taking values over an alphabet with distribution . Let be the reproduction albe the per-letter distortion phabet and measure. Consider the variational problem corresponding to the rate-distortion function for a given distortion (81) For brevity, we fix , and denote by the marginal distribution resulting from the minimization in (81). 
B. Code Construction for Discrete Memoryless Sources

Next, we bootstrap the code design for the nonredundant q-ary source with Hamming distortion to obtain asymptotically optimal codes for compressing general discrete memoryless sources with bounded and separable¹ distortion criteria. The idea for this code construction is similar to the one given in [19] for channel coding and in [6] for lossy source coding.

Consider a discrete memoryless source taking values in an alphabet A with distribution P_X. Let Â be the reproduction alphabet and let d be the per-letter distortion measure. Consider the variational problem corresponding to the rate–distortion function for a given distortion D:

(81)

For brevity, we fix D, and denote by P'_Y the marginal reproduction distribution resulting from the minimization in (81). We will assume that P'_Y is a rational distribution, i.e., we can write it as

(82)

where the numerators and the common denominator q are integers. This is not a very restrictive condition, as we can design codes to operate arbitrarily close to any given point on the rate–distortion tradeoff curve. Thus, for every target point there exists a nearby point such that

(83)

and

(84)

and the reproduction distribution corresponding to it is of the form given in (82).

Given a rate R and a P'_Y of the form (82), we construct a q-ary codebook C using LDGM matrices with block length n and asymptotic rate R, and a deterministic mapping φ: {0, ..., q − 1} → Â, such that φ maps the equiprobable probability distribution over {0, ..., q − 1} to P'_Y over Â. The codebook C_φ is obtained by applying φ to each symbol of each codeword in C. The compressor selects the codeword closest to the source realization (according to the distortion criterion d). Again, since the number of codewords has not changed after applying the deterministic transformation φ, from Lemma 1 the asymptotic rate remains R.

¹Separable distortion means that d(xⁿ, yⁿ) = (1/n) Σ_{i=1}^{n} d(x[i], y[i]).
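Because P'_Y is rational, each probability is an integer multiple of 1/q, so a valid φ simply assigns to each reproduction letter y a number of q-ary symbols proportional to P'_Y(y). A minimal sketch under that reading (the ordering of the symbols assigned to each letter is an arbitrary choice, not prescribed above):

```python
import math
from fractions import Fraction

def build_phi(p_prime):
    """p_prime maps each reproduction letter y to a Fraction P'_Y(y).
    Returns (q, phi), where phi: {0, ..., q-1} -> reproduction alphabet
    pushes the uniform distribution on Z_q forward to p_prime."""
    q = 1
    for p in p_prime.values():  # q = least common denominator, as in (82)
        q = q * p.denominator // math.gcd(q, p.denominator)
    phi, symbol = {}, 0
    for y, p in p_prime.items():
        for _ in range(int(p * q)):  # c_y of the q symbols are mapped to y
            phi[symbol] = y
            symbol += 1
    assert symbol == q               # the probabilities sum to one
    return q, phi

# Example: P'_Y(0) = 3/4, P'_Y(1) = 1/4 gives q = 4 and a 4-to-2 mapping.
q, phi = build_phi({0: Fraction(3, 4), 1: Fraction(1, 4)})
print(q, phi)  # 4 {0: 0, 1: 0, 2: 0, 3: 1}
```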
Using Lemma 4 generalized to the -ary case: for a sequence of sets , we have (117) and let (105) Authorized licensed use limited to: Princeton University. Downloaded on July 28, 2009 at 12:23 from IEEE Xplore. Restrictions apply. (118) 1968 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 5, MAY 2009 Define each component is mapped with the function Section III-B. The -vector satisfies defined in (127) (119) Using the Chebyshev inequality (120) From Lemma 16 for and any fixed (121) which implies, since is arbitrary (122) Using the Gartner–Ellis theorem [20, Theorem 2.3.6], for where is the Fenchel–Legendre transform of uated at , i.e., (123) (124) eval(125) From [21, Theorem 2], . Therefore, for a suitably and (120) implies that (95) is satisfied. chosen , Finally, Theorem 3 follows from Lemmas 12, 13 as well as (92) and (95). IV. SUBOPTIMAL ALGORITHMS FOR COMPRESSION In this section, we describe a suboptimal algorithm (and its variants) to encode a source using the codebooks described in Sections II and III. This algorithm attempts to locate a codeword in the codebook such that the distortion between the source and codeword is minimum. We note that an optimal algorithm to select such a codeword is NP-complete [22], therefore any polynomial complexity algorithm (like the one presented here) is necessarily suboptimal. However, it should be noted that NP-completeness implies hardness in the sense of worst case input; it does not rule out the existence of a polynomial time algorithm that is able to locate the minimum-distance codeword for most source sequences. Empirical results in Section V demonstrate that our algorithm attains near-optimum performance. The compressor/decompressor work as follows. matrix Recall that the codebook is specified by the and . It consists of all the codewords of length that can be written as (126) where is the identity mapping for the binary and nonredundant case and is a deterministic many to one mapping (see Section III-B) otherwise, where with slight abuse of notation The algorithm attempts to find a good approximation to the source string of length among the codewords in an iterative manner. At each step in the iteration, we select a string of length , by flipping one and only one bit in . The algorithm and at the completion of the algostarts with rithm we have . Let . The choice of the bit to flip in is such that is minimized. To that end, we perform an exhaustive search for all columns of enumerated as , , computing , and selecting the index that leads to the lowest distortion metric. This procedure is repeated till . If , then if , the algorithm is now constrained to flip only bits which are zero , and vice versa for in that lead to minimum the case when . We halt when . It is immaterial how ties are broken by the compressor. At the final configuration of , the encoder then stores the value of in the form of an index, using an enumerative encoding scheme [23]. The decoder then uses this index to recover , and outputs . A pseudocode description of the encoder is given in Algorithm 1 at the top of the following page. Some other variations in this algorithm are also possible. For example, we can fix a recursion depth . We run multiple copies of this algorithm whenever we have ties for the element with the maximum gain by flipping each maximum gain position in different copies. After bits have been flipped, the algorithm proceeds as described above in each of the multiple copies. Finally, we choose the winner out of all these multiple copies. 
Another possible variation is to flip pairs of bits simultaneously, selecting the pair which leads to the best approximation to the source.

For the core algorithm, the complexity analysis may be performed as follows. Computing the distortion metric at each iteration requires, on average, a number of operations governed by the total number of entries in the w columns of G that are added to form the running codeword; since each column is sparse, this cost is small, and the number of iterations is bounded. Thus, the average complexity of the algorithm compares favorably with various message-passing-based approaches, such as survey propagation [9] and its variants [8].

V. EXPERIMENTS

In this section, we show empirical results obtained with the codes given in Sections II and III and the encoding/decoding algorithms in Section IV for a variety of rate–distortion problems. For each rate, we fix a randomly generated codebook and average the distortion obtained for compressing a random source (over 1000 iterations).

LDGM codes and message-passing algorithms perform very close to the rate–distortion function for compressing the binary symmetric source with Hamming distortion for block lengths of the order of thousands, as demonstrated by the empirical results in [8] and [9]. However, these algorithms perform far from optimal for short block lengths due to the effect of cycles in the graph. On the other hand, our scheme performs well even for short block lengths (such as n = 400), as shown in Fig. 1. For block lengths n = 1000 and n = 2000, both schemes are very close to the rate–distortion function without any discernible difference in performance, as seen in Figs. 2 and 3.

In Figs. 4 and 5, we plot the rate–distortion tradeoff for an equiprobable 4-ary source for block lengths n = 100 and n = 400, respectively, with a Hamming distortion criterion. These figures show that the performance of our codes and encoding algorithm is very close to the optimum at short block lengths (for the general q-ary source with Hamming distortion).

In Fig. 6 we show results obtained with codes from Section III for compressing the Bernoulli(p = 0.4) source with a Hamming distortion criterion and block length n = 1000. We now provide an illustration of the code construction in Section III-B for this problem. In this case, the reproduction distribution corresponding to the chosen rate–distortion point is rational, so a suitable q can be read off from (82). The mapping φ should map the equiprobable probability distribution over {0, ..., q − 1} to the reproduction distribution over {0, 1}; one possible choice maps a proportional share of the q-ary symbols to 1 and the remaining symbols to 0. Thus, to obtain the codebook for compressing the Bernoulli(0.4) source with Hamming distortion, we construct a q-ary codebook over the alphabet {0, ..., q − 1} with rate R, and apply the mapping φ to each of its codewords.
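As background for this illustration, the reproduction distribution follows from the classical reverse-test-channel characterization of the Bernoulli rate–distortion function: for a Bernoulli(p) source under bit-error-rate distortion D, with D < p ≤ 1/2,

```latex
P'_Y(1) = \frac{p - D}{1 - 2D}, \qquad P'_Y(0) = \frac{1 - p - D}{1 - 2D}.
```

For instance, at p = 0.4 a target distortion of D = 0.1 (a value chosen here purely for illustration, not necessarily the one used in Fig. 6) gives P'_Y(1) = 3/8, which is of the rational form (82) with q = 8.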
In Fig. 7, we show results obtained with the codes in Section III, when used for compressing the binary symmetric source (with block length n = 1000) where the distortion criterion is asymmetric and satisfies

(128)

These experiments demonstrate the near-optimal performance of the proposed codes for simple memoryless sources and separable distortion criteria, even at short block lengths. Furthermore, the low complexity of the proposed suboptimal compression algorithm makes the new codes particularly appealing.

APPENDIX I
PROOF OF LEMMA 1

The rate of the code as a function of the block length is given by

R_n = (1/n) log₂ C(m, w).   (129)

From [16, eq. (11.40)],

(1/(m + 1)) 2^{m h(w/m)} ≤ C(m, w) ≤ 2^{m h(w/m)}.   (130)

Therefore

(131)
(132)
(133)
(134)
(135)

as we wanted to show.

Fig. 1. Empirical performance of the code in Section II-A with the suboptimal encoding algorithm in Section IV, compared with LDGM codes and the message-passing heuristic from [8], for the binary symmetric source with bit-error-rate distortion and block length n = 400.

Fig. 2. Empirical performance of the code in Section II-A with the suboptimal encoding algorithm in Section IV, compared with LDGM codes and the message-passing heuristic from [8], for the binary symmetric source with bit-error-rate distortion and block length n = 1000.

Fig. 3. Empirical performance of the code in Section II-A with the suboptimal encoding algorithm in Section IV, compared with LDGM codes and the message-passing heuristic from [8], for the binary symmetric source with bit-error-rate distortion and block length n = 2000.

Fig. 4. Empirical performance of the code in Section III for compressing the 4-ary source with block length n = 100 and Hamming distortion criterion.

Fig. 5. Empirical performance of the new code for compressing the 4-ary source with block length n = 400 and Hamming distortion criterion.

APPENDIX II
AUXILIARY RESULTS

Lemma 14: Let the given variables be independent and identically distributed binary random variables such that

(136)

and let the stated quantities be defined accordingly. Then

(137)
(138)

Proof: Let the comparison variables be independent binary random variables with

(139)

If we show that, for an arbitrary index,

(140)

then (138) follows by induction. Let

(141)

and

(142)

Similarly,

(143)
(144)

Clearly,

(145)

and hence (140) holds. Using (140) and induction, we have (137) and (138).

Lemma 15: Let the martingale be defined as in (68); then

(146)

Proof: For a given source realization, let

(147)

Further, note that the relevant distribution function satisfies

(148)

because, by construction, the rows of the matrix are independent. Conditioning, we obtain

(149)
(150)

Therefore,

(151)

A change in one column of the matrix can change the minimum distance by at most one. Therefore,

(152)
(153)

regardless of the conditioning values, and (146) holds. Note that Lemma 15 was shown for the code construction in [11]; however, we cannot use the proof therein due to the difference in code construction.

Lemma 16: For every positive constant there exists an index such that

(154)

where the quantities involved are defined in (119), (116), and (118), respectively.

Proof:

(155)

From Lemma 11, for every positive constant there exists an index such that

(156)

Therefore,

(157)
(158)

Choosing the index accordingly,

(159)
Fig. 6. Empirical performance of the code for compressing the Bernoulli(p = 0.4) source with block length n = 1000 and bit-error-rate distortion criterion.

Fig. 7. Empirical performance of the new code for compressing the binary symmetric source with block length n = 1000 and the asymmetric distortion criterion in (128).

Substituting from (116) in (159) and using (119), we have

(160)

and, choosing the constant appropriately, we get the required result.

Lemma 17: Let the martingale be defined as in (114); then

(161)

Proof: For a fixed source realization, let

(162)

Using (149)–(150) with the appropriate replacements, we have

(163)
(164)
(165)

We get (164) because a change in one column of the matrix can change the relevant distance by at most a bounded amount.

Lemma 18: Let the given variables be independent binary random variables with

(166)

Then, for all indices,

(167)

where the constant does not depend on the index.

Proof: Identifying the corresponding terms, we get (167).

Lemma 19: For a sequence of vectors satisfying (166) and a sequence of integers,

(168)

Proof: Define

(169)

and

(170)

where ∧ denotes the logical AND operation and an overbar denotes the complement. The vectors so defined are nonoverlapping in the sense that their supports are disjoint, but for any pair, Lemma 3 yields

(171)

Further,

(172)

and

(173)

where ⊕ denotes the logical XOR operation. Further,

(174)

Therefore,

(175)

These vectors are mutually independent since they are nonoverlapping. Further, the event of interest occurs if and only if the ones in the corresponding vector select an odd number of ones in the corresponding row of the matrix. The probability of this event satisfies (see (20)–(22), with the appropriate substitution)

(176)
(177)

and analogously for the remaining terms; we get (177) from (176) by the stated assumptions. If

(178)

then

(179)

From (166),

(180)

Obviously,

(181)
(182)

and, using Lemma 3,

(183)

Therefore, (180) follows from Lemma 18.

Lemma 20: For any sequence of deterministic vectors,

(184)

Proof: The proof is similar to the proof of Lemma 19. Denoting the common distribution by its law, we have

(185)

Therefore,

(186)

and

(187)
(188)

Lemma 21:

(189)

whenever the limits exist.

Proof: Let

(190)

and

(191)

We have

(192)
(193)

Taking limits and using Lemma 20, the result follows.

Lemma 22: For a sequence of sets satisfying (166) and a sequence of indices,

(194)

whenever the limits exist.

Proof: For each set, let

(195)

and

(196)

Therefore,

(197)
(198)

Taking limits and using Lemma 19, we get (194).

ACKNOWLEDGMENT

We wish to thank the referees for their help in improving the presentation.

REFERENCES

[1] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[2] G. Caire, S. Shamai (Shitz), and S. Verdú, “Lossless data compression with error correcting codes,” in Advances in Network Information Theory, vol. 66, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Providence, RI: Amer. Math. Soc., 2004, pp. 263–284.
[3] T. Ancheta, “Bounds and Techniques for Linear Source Coding,” Ph.D. dissertation, Dep. Elec. Eng., Univ. Notre Dame, Notre Dame, IN, 1977.
[4] J. L. Massey, “Joint source and channel coding,” Commun. Syst. Random Process Theory, vol. 11, pp. 279–293, 1978.
[5] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[6] J. Chen, D. He, and A. Jagmohan, “Achieving the rate-distortion bound with linear codes,” in Proc. 2007 IEEE Information Theory Workshop, Lake Tahoe, CA, Sep. 2007, pp. 662–667.
[7] Y. Matsunaga and H. Yamamoto, “A coding theorem for lossy data compression by LDPC codes,” IEEE Trans. Inf. Theory, vol. 49, no. 9, pp. 2225–2229, Sep. 2003.
[8] M. J. Wainwright and E. Maneva, “Lossy source encoding via message passing and decimation over generalized codewords of LDGM codes,” in Proc. 2005 IEEE Int. Symp. Information Theory, Adelaide, Australia, Sep. 2005, pp. 1493–1497.
[9] S. Ciliberti, M. Mézard, and R. Zecchina, “Message passing algorithms for non-linear nodes and data compression,” ComPlexUs, vol. 3, pp. 58–65, Aug. 2006.
[10] A. Braunstein, M. Mézard, and R. Zecchina, “Survey propagation: An algorithm for satisfiability,” Random Structures and Algorithms, vol. 27, pp. 201–226, Mar. 2005.
[11] E. Martinian and M. J. Wainwright, “Low-density codes achieve the rate-distortion bound,” in Proc. 2006 Data Compression Conf., Snowbird, UT, Mar. 2006, pp. 153–162.
[12] S. Miyake, “Lossy data compression over Z by LDPC code,” in Proc. 2006 IEEE Int. Symp. Information Theory, Seattle, WA, Jul. 2006, pp. 813–816.
[13] S. Miyake and J. Muramatsu, “Construction of a lossy source code using LDPC matrices,” in Proc. 2007 IEEE Int. Symp. Information Theory, Nice, France, Jun. 2007, pp. 1106–1110.
[14] E. Martinian and M. J. Wainwright, “Analysis of LDGM and compound codes for lossy compression and binning,” in Proc. 2006 Workshop on Information Theory and its Applications, La Jolla, CA, Feb. 2006.
[15] S. Kudekar and R. Urbanke, “Lower bounds on the rate-distortion function of individual LDGM codes,” in Proc. 5th Int. Symp. Turbo Codes and Related Topics, Lausanne, Switzerland, Sep. 2008, pp. 379–384.
[16] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley-Interscience, 2006.
[17] K. Azuma, “Weighted sums of certain dependent random variables,” Tohoku Math. J., vol. 19, pp. 357–367, 1967.
[18] A. Gupta and S. Verdú, “Nonlinear sparse-graph codes for lossy compression of discrete nonredundant sources,” in Proc. 2007 IEEE Information Theory Workshop, Lake Tahoe, CA, Sep. 2007, pp. 541–546.
[19] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[20] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. New York: Springer, 2004.
[21] A. Dembo and I. Kontoyiannis, “Source coding, large deviations and approximate pattern matching,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1590–1615, Jun. 2002.
[22] E. R. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg, “On the inherent intractability of certain coding problems,” IEEE Trans. Inf. Theory, vol. IT-24, no. 3, pp. 384–386, May 1978.
[23] T. M. Cover, “Enumerative source encoding,” IEEE Trans. Inf. Theory, vol. IT-19, no. 1, pp. 73–77, Jan. 1973.

Ankit Gupta (S'07) received the B.Tech. degree in 2003 from the Indian Institute of Technology, Delhi, India, and the M.A. degree in 2006 from Princeton University, Princeton, NJ, both in electrical engineering. He is currently pursuing the Ph.D. degree in electrical engineering at Princeton University.

Sergio Verdú (S'80–M'84–SM'88–F'03) received the Telecommunications Engineering degree from the Universitat Politècnica de Barcelona, Barcelona, Spain, in 1980 and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, in 1984.
Since 1984, he has been a member of the faculty of Princeton University, Princeton, NJ, where he is the Eugene Higgins Professor of Electrical Engineering. Sergio Verdú is the recipient of the 2007 Claude E. Shannon Award and the 2008 IEEE Richard W. Hamming Medal. He is a member of the National Academy of Engineering and was awarded a Doctorate Honoris Causa from the Universitat Politècnica de Catalunya in 2005. He is a recipient of several paper awards from the IEEE: the 1992 Donald Fink Paper Award, the 1998 Information Theory Outstanding Paper Award, an Information Theory Golden Jubilee Paper Award, the 2002 Leonard Abraham Prize Award, and the 2006 Joint Communications/Information Theory Paper Award. In 1998, Cambridge University Press published his book Multiuser Detection, for which he received the 2000 Frederick E. Terman Award from the American Society for Engineering Education. He served as President of the IEEE Information Theory Society in 1997 and as Associate Editor for Shannon Theory of the IEEE TRANSACTIONS ON INFORMATION THEORY. He is currently Editor-in-Chief of Foundations and Trends in Communications and Information Theory.