Low Bit Rate Image Compression with Orthogonal Projection Pursuit Neural Networks

S. R. Safavian¹, Hamid R. Rabiee², M. Fardanesh³, and R. L. Kashyap³
¹ LCC Inc., Cellular Institute, Arlington, VA 22201
² Intel Corporation, Media & Interconnect Technology Lab, Hillsboro, OR 97124
³ School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN 47907

Abstract

A new multiresolution algorithm for image compression based on projection pursuit neural networks is presented. High-quality low bit-rate image compression is achieved by first segmenting an image into regions of different sizes, based on the perceptual variation in each region, and then constructing a distinct code for each block using orthogonal projection pursuit neural networks. The algorithm adaptively constructs a better approximation for each block by optimally selecting basis functions from a universal set. Convergence is guaranteed by orthogonalizing the selected bases at each iteration. The coefficients of the approximations are obtained by back-projection with convex combinations. Our experimental results show that at rates below 0.5 bits/pixel this algorithm achieves excellent performance, both in terms of PSNR and subjective image quality.

1. Introduction

Image compression has been a major area of research for the past few decades. More recently it has attracted considerable attention due to the increasing demand for visual communications in home entertainment and business applications over existing bandlimited channels. Current major trends in image compression [1] include the traditional transform-based methods (mainly block-based DCT), vector quantization, and the more recent fractal- and wavelet-based approaches. In many of these image coding techniques it is necessary to divide an image into smaller regions of fixed or variable sizes.
Fixed-size, fixed-shape segmentation has the advantage of simplicity and low overhead, but usually fails to capture information about the homogeneity of the regions. In variable-size segmentation, the size of a region is controlled by the amount of activity within it. Popular approaches to variable size/shape segmentation include Gauss-Markov random fields [2], binary space partitioning (BSP) trees [3][4], horizontal-vertical binary trees [4], and quadtree decomposition [4]-[6]. Quadtree decomposition is particularly attractive because of its ease of implementation. The recent interest in artificial neural networks (ANNs) has motivated a large number of applications covering a wide range of research fields, including digital image compression [7]-[12]. These efforts have mostly concentrated on ANN implementations of vector quantization [7] and of orthogonal transforms such as the Discrete Cosine Transform (DCT) and the Karhunen-Loeve Transform (KLT) [8]-[12]. The latter methods are based on a three-layer linear or nonlinear perceptron, as shown in Figure 1, in which the first two layers act as an encoder and the third (output) layer acts as a decoder.

Figure 1. A three-layer perceptron for image compression (input layer, hidden layer feeding the channel, and output layer).

More specifically, every pixel of an n×n block in an image is fed into the input of an ANN with h hidden units, and the network is trained by setting the desired output equal to the input. The parameters of the network (weights and biases) can be found via a gradient-based method such as back-propagation, using an error metric such as the mean square error to reduce the discrepancy between the actual and the desired output. Typically, the network is trained on a set of small image blocks (8×8 or smaller) from several training images and then tested on the actual image. If h < n², a compressed version of the input image is available at the output of the hidden nodes.
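The auto-associative training just described can be sketched as follows. This is a minimal illustration, not the paper's network: a purely linear three-layer perceptron trained by plain gradient descent on synthetic zero-mean "blocks"; all sizes, data, and the learning rate are assumptions of the example.

```python
import numpy as np

# Hypothetical sketch: 8x8 blocks (n^2 = 64 inputs), h = 16 hidden units,
# trained so the output reproduces the input. The hidden activations are
# the compressed code sent over the channel.
rng = np.random.default_rng(0)
n2, h = 64, 16                        # input size and bottleneck width (h < n^2)
blocks = rng.random((500, n2)) - 0.5  # stand-in zero-mean training blocks
W_enc = rng.normal(0, 0.1, (n2, h))   # layers 1 -> 2 (encoder)
W_dec = rng.normal(0, 0.1, (h, n2))   # layer 2 -> 3 (decoder)

def recon_mse():
    return (((blocks @ W_enc) @ W_dec - blocks) ** 2).mean()

mse_before = recon_mse()
lr = 0.1
for _ in range(500):                  # gradient descent on the squared error
    code = blocks @ W_enc             # hidden activations = compressed code
    err = code @ W_dec - blocks       # actual output minus desired output
    g_dec = code.T @ err / len(blocks)
    g_enc = blocks.T @ (err @ W_dec.T) / len(blocks)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
mse_after = recon_mse()
print(code.shape, mse_after < mse_before)
```

Each block is represented by only h = 16 hidden values instead of 64 pixels, which is the source of the compression.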
More recent papers in this area [10]-[12] discuss the effects of hidden-layer size, error metrics, and other ANN parameters on the performance of this class of image coders. In this paper we propose a new compression algorithm that fits different ANN models to different segmented image blocks by utilizing the theory of projection pursuit neural networks [13]-[15]. Previous work in image modeling includes Leonardi and Kunt [16] and Kocher and Leonardi [17], who approximated image blocks with polynomials of different orders. Our models are easily implemented in terms of conventional neural networks and produce high-quality images at very low bit rates. The organization of this paper is as follows. Section 2 is devoted to image segmentation. Section 3 presents the theory of the orthogonal projection pursuit neural networks. Section 4 briefly discusses the quantization and coding of the ANN parameters. Finally, the experimental results and conclusions are provided in Section 5.

2. Image Segmentation

Natural gray-level images can usually be divided into regions of different sizes with variable amounts of detail and information. Such segmentation of the image is useful for efficient coding of the image data. Quadtree decomposition is a simple technique for image representation at different resolution levels; it is a powerful technique that divides the image into 2-D homogeneous regions. Quadtree decomposition [5] is attractive for a number of reasons:

• It is relatively simple to implement compared to other methods of image representation.
• It adaptively decomposes the image into square regions whose sizes depend on the activity in the blocks. This leads to variable-rate image coders that change the coding resolution (in bits per unit area) according to the local characteristics of the region being coded.
• The decomposition directly yields an image segmentation.

Our segmentation algorithm divides the image into variable-size square blocks based on a measure of activity within each block. The size of the finest block is dictated by the combination of the desired bit rate and the desired peak signal-to-noise ratio. Several measures of activity, such as visual entropy [5] and image variation, were examined, with only minor differences in the final segmented image. The segmentation map obtained for the test image Lena is shown in Figure 2.

Figure 2. Quadtree segmentation map of the test image Lena (Figure 5(a)), generated by using the visual entropy.

3. The Projection Pursuit Neural Networks

Each block in the segmentation map of an image can be modeled by using the Projection Pursuit (PP) image approximation technique [13],[14]. Projection pursuit is an iterative function approximation method that can be efficiently implemented with ANNs. In every iteration of this algorithm, a function is selected, from a set of fixed pre-determined basis functions called the Universal Basis Function Set (UBFS), that best approximates the current image in the given block. In the first iteration, the current image is the original image; at step k, the current image is the residual (error) image that results from subtracting from the original image the linear combination of the previous (k-1) approximations. Various measures of error can be used to assess the quality of the approximation, but due to its mathematical tractability the L2 norm is the most widely used. A typical UBFS could include Gaussian kernels, triangular functions, Laplacian functions, sinc squares, Daubechies wavelets, Meyer wavelets, splines, sigmoidal functions, etc.
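The variable-size quadtree segmentation of Section 2 can be sketched as follows. This is an illustrative example only: it uses block variance as the activity measure in place of the paper's visual entropy, and the threshold, minimum block size, and synthetic image are assumptions.

```python
import numpy as np

def quadtree(img, x, y, size, min_size, thresh, blocks):
    """Recursively split block (x, y, size) while its activity is high."""
    region = img[y:y + size, x:x + size]
    if size <= min_size or region.var() <= thresh:
        blocks.append((x, y, size))          # homogeneous enough: keep block
        return
    half = size // 2
    for dy in (0, half):                     # otherwise split into quadrants
        for dx in (0, half):
            quadtree(img, x + dx, y + dy, half, min_size, thresh, blocks)

# Synthetic 64x64 test image: flat background with one high-activity corner.
rng = np.random.default_rng(1)
img = np.zeros((64, 64))
img[:16, :16] = rng.random((16, 16))

blocks = []
quadtree(img, 0, 0, 64, 4, 1e-4, blocks)
areas = sum(s * s for _, _, s in blocks)
print(len(blocks), areas)                    # blocks tile the image exactly
```

The busy corner is covered by many small blocks while the flat regions remain large, which is exactly the variable-rate behavior the segmentation map in Figure 2 exhibits.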
Clearly, to obtain an efficient representation for each block (a minimum number of parameters in the ANN model), the UBFS should include both continuous and discontinuous functions with different degrees of smoothness. Two possible disadvantages of a large UBFS are a slight increase in coding overhead and an increase in the computation required to select the optimal basis function at every iteration. One possible approach to ease the computational burden is to estimate the degree of smoothness of the current image and then select a basis function from the UBFS with a similar degree of smoothness.

Let f(x,y) denote the intensity of the image at location (x,y), f̂_k(x,y) its estimated value, and r_k(x,y) the residual image at iteration k. Let Θ_k = {g_k, α_k, β_k, γ_k} denote the set of parameters at iteration k, where g_k corresponds to one of the basis functions in the UBFS. Let d(·,·) be the desired error metric, typically the mean square error, and z^T = (x, y). In our algorithm, optimal values of the ANN model are indicated by the superscript "*"; A^T denotes the transpose of the vector A, and A·B the scalar product of the vectors A and B. Our proposed algorithm can be summarized as follows. First, select a UBFS, Φ = {φ_1, ..., φ_N}, where the φ_i are the basis functions. Then, for every block B in the segmentation map, let r_0(z) = f(z) and compute the optimal parameter vector of the ANN model, Θ_k* = {g_k*, α_k*, β_k*, γ_k*}.
To compute the optimal parameters at iteration k, let

J(z) = d( r_{k-1}(z), γ_k g_k( α_k^T · z + β_k ) )    (3.1)

then

Θ_k* = argmin_{g_k ∈ Φ} min_{α_k, β_k, γ_k} Σ_{z ∈ B} J(z)    (3.2)

The reconstructed image at iteration k can be written as

f̂_k*(z) = Σ_{l=1}^{k} γ_l* g_l*( α_l*^T · z + β_l* )    (3.3)

and the residual image at iteration k is given by

r_k(z) = f(z) − f̂_k*(z)    (3.4)

Then compute the Peak Signal-to-Noise Ratio (PSNR) and the bit rate at this iteration. If these values do not satisfy the desired limits, iterate through equations (3.1) to (3.4); otherwise, the final approximation with L* terms is

f̂*(z) = Σ_{l=1}^{L*} γ_l* g_l*( α_l*^T · z + β_l* )    (3.5)

Note that after selection of a particular basis at each iteration, the selected function is orthogonalized with respect to the span of all previously selected bases from the UBFS. The orthogonalization step is necessary for fast convergence of the projection pursuit algorithm [18]. Clearly, this type of ANN produces a more efficient model for the image data because nodes can be adaptively added to the model until the desired level of representation accuracy is achieved. The ANN implementation of the above algorithm is illustrated in Figure 3.

Figure 3. The projection pursuit neural network, with inputs x and y, hidden nodes g_1*, ..., g_{L*}*, and output weights γ_1*, ..., γ_{L*}*. The bias β is absorbed into the node function g.

4. Quantization and Coding

The optimum parameters for each block must be quantized before encoding. The optimum quantizer, in the mean square error sense, is a non-uniform quantizer matched to the probability density function of its input signal.
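The greedy selection and back-projection loop of equations (3.1)-(3.5) can be illustrated in one dimension. The tiny dictionary below (Gaussian, sigmoid, and triangle atoms) and the target signal are assumptions of this sketch; the paper optimizes the continuous parameters α, β, γ per atom, whereas here the atoms are fixed samples and only the coefficients γ are solved by least squares (the back-projection step, which makes the residual orthogonal to all previously selected bases).

```python
import numpy as np

t = np.linspace(-1, 1, 64)
# Toy UBFS: three Gaussians, two sigmoids, two triangle functions.
ubfs = [np.exp(-((t - c) ** 2) / 0.1) for c in (-0.5, 0.0, 0.5)]
ubfs += [1 / (1 + np.exp(-8 * (t - c))) for c in (-0.3, 0.3)]
ubfs += [np.maximum(0, 1 - np.abs(t - c) / 0.4) for c in (-0.4, 0.4)]
D = np.array(ubfs)                       # dictionary rows = atoms

f = 0.7 * ubfs[1] + 0.3 * ubfs[3]        # "image" to approximate
resid, picked = f.copy(), []
norms = [float(np.linalg.norm(resid))]
for _ in range(3):                       # add one node per iteration
    # (3.1)-(3.2): pick the atom whose best scaling is closest to the residual
    errs = [np.linalg.norm(resid - (resid @ a) / (a @ a) * a) for a in D]
    picked.append(int(np.argmin(errs)))
    A = D[picked].T                      # selected atoms as columns
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)   # back-projection
    resid = f - A @ coef                 # (3.3)-(3.4): new residual
    norms.append(float(np.linalg.norm(resid)))
print(norms[0] > norms[-1])
```

Because the coefficients are recomputed over the whole selected set at every step, the residual norm can never increase, mirroring the convergence property cited for the orthogonalized algorithm.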
Based on the distribution of the weights and biases and their dynamic ranges for each block, separate Lloyd-Max quantizers [18] were designed for each set of parameters at each iteration of the algorithm. By quantizing the coefficients at each iteration, we force the algorithm to correct the quantization errors in subsequent iterations, improving the resulting PSNR. The resulting quantized parameters can be coded in a number of ways. Various coding schemes such as Huffman coding, Shannon-Fano coding, and arithmetic coding [19] were considered, and arithmetic coding was chosen for the following reasons:

• Arithmetic coding can approach the entropy limit in coding efficiency.
• Arithmetic coding requires only one pass through the data.
• Source modeling and information encoding are separated.
• Arithmetic coding is generally faster than Huffman coding.
• In arithmetic coding, the encoder and the decoder can work on-line.

Furthermore, an arithmetic coder requires no a priori analysis of the data set to determine the bit allocation.

5. Experimental Results and Conclusions

The proposed algorithm was used to encode different types of test images. The test image Lena (512×512×8) is shown in Figure 5(a), and the test image Peppers is shown in Figure 5(d). Once an image was segmented, the ANN model for each block was constructed by the following procedure. Every function in the Universal Basis Function Set (UBFS) was used to obtain the best ANN model for each block by optimizing the parameters of the function in the mean square error sense. The optimal network parameters were selected by performing a greedy search over all the functions in the UBFS. This provided the first-level approximation for each block. The next levels of approximation were obtained by repeating the above process on the residual errors. The process was terminated when the overall error dropped below the desired threshold or when the desired bit rate was reached.
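A pdf-matched non-uniform quantizer of the Lloyd-Max type, as used in Section 4, can be designed from samples by alternating the two optimality conditions: partition each sample to its nearest reconstruction level, then move each level to the centroid of its cell. The Laplacian-like sample data below is an assumption of the example, standing in for the histograms of the ANN parameters.

```python
import numpy as np

def lloyd_max(samples, n_levels, iters=50):
    """Design an n_levels scalar quantizer matched to the sample distribution."""
    # initialize levels at evenly spaced quantiles of the data
    levels = np.quantile(samples, np.linspace(0, 1, n_levels + 2)[1:-1])
    for _ in range(iters):
        # partition: assign each sample to its nearest reconstruction level
        idx = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        # centroid condition: each level moves to the mean of its cell
        for j in range(n_levels):
            if np.any(idx == j):
                levels[j] = samples[idx == j].mean()
    return np.sort(levels)

rng = np.random.default_rng(2)
weights = rng.laplace(0, 1, 5000)            # stand-in for ANN parameters
levels = lloyd_max(weights, 2 ** 5)          # 5-bit quantizer, 32 levels
idx = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
q_mse = ((weights - levels[idx]) ** 2).mean()
print(len(levels), q_mse < weights.var())
```

The resulting levels are dense near zero and sparse in the tails, matching the peaked, heavy-tailed distributions typical of network weights; the quantized level indices are then the symbols handed to the arithmetic coder.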
The UBFS considered in our experiments included the following functions: sigmoidal functions, splines, and the Daubechies wavelet (W4). Although, theoretically, a greedy search over a large UBFS should produce a better approximation of the image, our experimental results showed that a few basis functions (3 to 5, with various degrees of smoothness) are sufficient to produce almost identical results. The parameters of the optimal ANN for each block were quantized using Lloyd-Max quantizers, designed from the histograms of the parameters for each block. For our experiments, Gaussian and Laplacian Lloyd-Max quantizers with 5 or 6 bits, providing signal-to-quantization-noise ratios in the range of 33-35 dB, were used for all the ANN parameter sets. Finally, the quantized parameters for each block were separately arithmetic encoded to form the compressed bit stream.

The decoded test image Lena at a bit rate of 0.125 bit/pixel, with a PSNR of 30.2 dB, is shown in Figure 5(b). For comparison, the JPEG algorithm [20] was used to encode Lena at the same bit rate of 0.125 bit/pixel; the PSNR of the corresponding JPEG-decoded image was 26.2 dB. The severe blockiness artifact present in the JPEG-decoded image can be seen in Figure 5(c). The results for the test image Peppers with the same UBFS are shown in Figures 5(e) and 5(f). Our experimental results showed that at rates below 0.5 bit/pixel, our method is superior to JPEG both in terms of PSNR and visual image quality. Moreover, the performance of our algorithm is superior to other neural network algorithms [7]-[12], and is comparable to wavelet-based algorithms such as the EZW encoder [21]. Although the speed of our orthogonal projection pursuit ANN encoder can be considerably slower than the JPEG or EZW encoders for a large UBFS, the speed of our projection pursuit decoder is comparable to the JPEG and EZW decoders.

6. References

[1] Rabbani M.
(Ed.), Selected Papers on Image Coding and Compression, SPIE Optical Engineering Press, 1992.
[2] Geman S. and Geman D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. on Pattern Anal. and Machine Intell., PAMI-6, pp. 721-741, 1984.
[3] Radha H., Leonardi R., Vetterli M., and Naylor B., Binary space partitioning (BSP) tree representation of images, J. Visual Comm. and Image Rep., 1991.
[4] Rabiee H.R., Kashyap R.L., and Radha H., Multiresolution image compression with BSP trees and multilevel block truncation coding, Proceedings of the IEEE ICIP'95, Washington D.C., Oct. 23-26, 1995.
[5] Vaisey J. and Gersho A., Image compression with variable block size segmentation, IEEE Trans. on Signal Proc., SP-40, pp. 2040-2060, 1992.
[6] Shusterman E. and Feder M., Image compression via improved quadtree decomposition algorithms, IEEE Trans. on Image Proc., IP-3, pp. 207-215, 1994.
[7] Chen O., Sheu B.J., and Fang W., Image compression using self-organization networks, IEEE Trans. on Circuits and Systems for Video Tech., CSVT-4, pp. 480-489, 1994.
[8] Carrato S., Premoli A., and Sicuranza G.L., Linear and nonlinear neural networks for image compression, Proc. of the IEEE Conf. on Digital Signal Processing, pp. 526-531, September 1991.

Figure 5. The performance of our coding algorithm on the test images Lena and Peppers: (a) the original test image Lena (512×512×8); (b) decoded Lena, our method, at 0.125 bit/pixel, PSNR = 30.2 dB; (c) decoded Lena, JPEG, at 0.125 bit/pixel, PSNR = 26.2 dB; (d) the original test image Peppers (512×512×8); (e) decoded Peppers, our method, at 0.125 bit/pixel, PSNR = 30.35 dB; (f) decoded Peppers, JPEG, at 0.125 bit/pixel, PSNR = 23.75 dB.

[9] Cottrell G.W., Munro P., and Zipser D., Image compression by backpropagation: an example of extensional programming, in Sharkey N.E. (Ed.), Models of Cognition: A Review of Cognition Science, Norwood, NJ, 1989.
[10] Mougeot M., Azencott R., and Angeniol B., A study of image compression with backpropagation, in Herault J. (Ed.), NATO ASI Series, Vol. F68, Neurocomputing, Springer-Verlag, Berlin Heidelberg, 1990.
[11] Carrato S. and Marsi S., Parallel structure based on neural networks for image compression, Electronics Lett., 28, 1992.
[12] Pinho A.J., Image compression based on quadtree segmentation and artificial neural networks, Electronics Lett., 29, 1993.
[13] Safavian S.R. and Tenorio M.F., Projection pursuit neural networks can avoid the curse of dimensionality, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, October 1994.
[14] Huber P., Projection pursuit, Ann. Statist., vol. 13, pp. 435-525, 1985.
[15] Safavian S.R., Rabiee H.R., and Fardanesh M., Projection pursuit image compression with variable block size segmentation, accepted for publication in IEEE Signal Processing Letters, Oct. 1996.
[16] Leonardi R. and Kunt M., Adaptive split and merge for image analysis and coding, Proc. SPIE Conf. on Visual Commun. and Image Processing, Cannes, France, December 1985.
[17] Kocher M. and Leonardi R., Adaptive region growing technique using polynomial functions for image approximation, Signal Process., 11, (1), 1986.
[18] Rabiee H.R., Kashyap R.L., and Safavian S.R., Multiresolution image coding with adapted wavelet packet bases using segmented orthogonal matching pursuit, to be submitted to IEEE Trans. on Image Processing, April 1997.
[19] Jayant N.S. and Noll P., Digital Coding of Waveforms, Prentice-Hall, 1984.
[20] Pennebaker W.B. and Mitchell J., JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1993.
[21] Shapiro J.M., Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Processing, vol. 41, no. 12, Dec. 1993.