
Low Bit Rate Image Compression with Orthogonal Projection Pursuit Neural Networks
S. R. Safavian¹, Hamid R. Rabiee², M. Fardanesh³, and R. L. Kashyap³
¹LCC Inc., Cellular Institute, Arlington, VA 22201
²Intel Corporation, Media & Interconnect Technology Lab, Hillsboro, OR 97124
³School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN 47907
Abstract
A new multiresolution algorithm for image compression
based on projection pursuit neural networks is presented.
High quality low bit-rate image compression is achieved
first by segmenting an image into regions of different sizes
based on perceptual variation in each region and then
constructing a distinct code for each block by using the
orthogonal projection pursuit neural networks. This
algorithm allows one to adaptively construct a better
approximation for each block by optimally selecting the
basis functions from a universal set. The convergence is
guaranteed by orthogonalizing the selected bases at each
iteration. The coefficients of the approximations are
obtained by back-projection with convex combinations.
Our experimental results show that at rates below 0.5 bits/
pixel this algorithm achieves excellent performance both in
terms of PSNR and subjective image quality.
1. Introduction
Image compression has been a major area of research
for the past few decades. More recently it has attracted a
lot of attention due to the increasing demand for visual
communications in home entertainment and business
applications over the existing bandlimited channels.
Current major trends in image compression [1] include
the traditional transform-based methods (mainly block-based DCT), vector quantization, and the more recent fractal-
and wavelet-based approaches. In many of these image
coding techniques it is necessary to divide an image into
smaller regions of fixed or variable sizes. Fixed-size,
fixed-shape segmentation has the advantage of simplicity
and lower overhead but usually fails to capture the
information regarding the homogeneity of the regions. In
variable size segmentation, the size of the region is
controlled by the amount of the activity within the
region. Popular approaches to variable size/shape
segmentation include Gauss-Markov random fields [2],
binary space partitioning (BSP) trees [3][4], horizontal-vertical
binary trees [4], and quadtree decomposition [4]-[6].
Quadtree decomposition is particularly attractive because
of its ease of implementation.
The recent interest in artificial neural networks
(ANNs) has motivated a large number of applications
covering a wide range of research fields such as digital
image compression [7]-[12]. These efforts have mostly
concentrated in ANN implementation of the vector
quantization [7] and orthogonal transforms such as
Discrete Cosine Transform (DCT) and Karhunen-Loeve
Transform (KLT) [8]-[12]. The latter methods are based
on a three-layer linear or nonlinear perceptron, as shown
in Figure 1, in which the first two layers act as an encoder
and the third layer (the output layer) acts as a decoder.
Figure 1. A three-layer perceptron for image
compression.
More specifically, every pixel of an n×n block in an
image is fed into the input of an ANN with h hidden
units and the network is trained by setting the desired
output equal to the input. The parameters of the network
(weights and biases) can be found via a gradient based
method such as back-propagation and some error metric
such as the mean square error, to reduce the error
between the actual and the desired output. Typically, the
network is trained by using a set of small size image
blocks (8×8 or smaller) from several training images and
then is tested on the actual image. If h < n², then a
compressed version of the input image will be available
at the output of the hidden nodes. More recent papers in
this area of research [10]-[12] discuss the effects of the
hidden node size, error metrics and other parameters of
the ANNs on the performance of this class of image
coders.
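As a rough illustration of this training setup (not the authors' code), a linear three-layer block autoencoder can be sketched with NumPy; the hidden size, learning rate, and epoch count below are arbitrary assumptions made only for the sketch.

```python
import numpy as np

def train_block_autoencoder(blocks, h=16, lr=1e-3, epochs=200, seed=0):
    """Train a three-layer linear autoencoder: n^2 -> h -> n^2.

    blocks: (num_blocks, n*n) array of vectorized image blocks.
    The desired output is set equal to the input, so the h hidden
    activations form the compressed representation when h < n^2.
    Returns the encoder weights W1 and decoder weights W2.
    """
    rng = np.random.default_rng(seed)
    d = blocks.shape[1]
    W1 = rng.normal(0, 0.1, (d, h))   # encoder: input -> hidden
    W2 = rng.normal(0, 0.1, (h, d))   # decoder: hidden -> output
    for _ in range(epochs):
        code = blocks @ W1            # hidden activations (compressed code)
        out = code @ W2               # reconstruction
        err = out - blocks            # error against the desired (= input)
        # gradient descent on the mean-square reconstruction error
        gW2 = code.T @ err / len(blocks)
        gW1 = blocks.T @ (err @ W2.T) / len(blocks)
        W2 -= lr * gW2
        W1 -= lr * gW1
    return W1, W2
```

In practice back-propagation with nonlinear hidden units and biases is used, as the papers cited above discuss; the linear, bias-free version here only shows the encoder/decoder split across the hidden layer.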
In this paper, we propose a new compression
algorithm that fits different ANN models to different
segmented image blocks by utilizing the theory of
projection pursuit neural networks [13][14][15]. Previous
work in image modeling includes that of Leonardi and Kunt
[16] and Kocher and Leonardi [17] who attempted to
approximate the image blocks with polynomials of
different orders. Our models are easily implemented in
terms of conventional neural networks and produce high
quality images at very low bit rates. The organization of
this paper is as follows. Section 2 is devoted to the image
segmentation. Section 3 presents the theory of the
orthogonal projection pursuit neural networks. Section 4
briefly discusses the quantization and coding of the ANN
parameters. Finally, the experimental results and
conclusions are provided in Section 5.
2. Image Segmentation
Natural gray-level images can usually be divided into
regions of different sizes with variable amounts of detail
and information. Such a segmentation of the image is
useful for efficient coding of the image data. Quadtree
decomposition is a simple technique for representing an
image at different resolution levels. It is a powerful
technique that divides the image into 2-D homogeneous
regions. The quadtree decomposition [5] is attractive for
a number of reasons:
• It is relatively simple to implement compared to the
other methods of image representation.
• It adaptively decomposes the image into square regions
whose sizes depend on the activity within the blocks.
This leads to variable-rate image coders that change the
coding resolution (in bits per unit area) according to the
local characteristics of the region being coded.
• The decomposition directly results in an image
segmentation.
Our segmentation algorithm divides the image into
variable-size square blocks based on a measure of
activity within each block. The size of the finest block is
dictated by the combination of the desired bit rate and the
desired peak signal-to-noise ratio. Several measures of
activity, such as visual entropy [5] and image variation,
were examined, with only minor differences in the final
segmented image. The segmentation map obtained for
the test image Lena is shown in Figure 2.
Figure 2. Quadtree segmentation map of the test
image Lena (Figure 5(a)), generated by using the
visual entropy.
3. The Projection Pursuit Neural Networks
Each block in the segmentation map of an image can
be modeled by using the Projection Pursuit (PP) image
approximation technique [13],[14]. Projection pursuit is
an iterative function approximation method that can be
efficiently implemented with ANNs.
In every iteration of this algorithm, a function is
selected, from a set of fixed pre-determined basis
functions called the Universal Basis Function Set
(UBFS), that best approximates the current image in the
given block. In the first iteration the current image is the
original image; at iteration k, the current image is the
residual (error) image that results from subtracting the
linear combination of the previous (k-1) approximations
from the original image. Various measures of error can
be used to assess the quality of the approximation, but
due to its mathematical tractability, the L2 norm is the
most widely used.
A typical UBFS could include the following functions:
Gaussian kernels, triangular functions, Laplacian
functions, sinc squares, Daubechies wavelets, Meyer
wavelets, splines, sigmoidal functions, etc. Clearly, to
obtain an efficient representation for each block
(minimum number of parameters in the ANN model), the
UBFS should include both continuous and discontinuous
functions with different degrees of smoothness. Two
possible disadvantages of a large UBFS are a slight
increase in coding overhead and an increase in the
computation required to select the optimal basis function
at every iteration. One possible approach to ease the
computational burden is to estimate the degree of
smoothness of the current image and then select a basis
function from the UBFS with a similar degree of
smoothness.
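One iteration of this search over the UBFS can be sketched as follows; the tiny three-function UBFS, the coarse grid over the ridge direction α and offset β, and the closed-form least-squares choice of γ are simplifying assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

# A toy UBFS of 1-D node functions applied to the ridge variable t = alpha.z + beta.
UBFS = {
    "sigmoid":  lambda t: 1.0 / (1.0 + np.exp(-t)),
    "gaussian": lambda t: np.exp(-t * t),
    "triangle": lambda t: np.maximum(0.0, 1.0 - np.abs(t)),
}

def pp_iteration(residual, coords):
    """One projection pursuit step: pick (g, alpha, beta, gamma) minimizing
    the L2 error against `residual` sampled at `coords` (N x 2 pixel positions).
    Coarse grid search over g, alpha, beta; gamma is least-squares optimal."""
    best = None
    angles = np.linspace(0.0, np.pi, 8, endpoint=False)
    betas = np.linspace(-2.0, 2.0, 9)
    for name, g in UBFS.items():
        for th in angles:
            alpha = np.array([np.cos(th), np.sin(th)])
            t = coords @ alpha                    # ridge projection alpha . z
            for beta in betas:
                gt = g(t + beta)
                denom = gt @ gt
                if denom < 1e-12:
                    continue
                gamma = (residual @ gt) / denom   # least-squares coefficient
                err = np.sum((residual - gamma * gt) ** 2)
                if best is None or err < best[0]:
                    best = (err, name, alpha, beta, gamma)
    return best
```

Iterating this step on the residual `residual - gamma * gt` yields the greedy approximation described above.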
Let f(x,y) denote the intensity of the image at location
(x,y), f̂_k(x,y) its estimated value, and r_k(x,y) the residual
image at iteration k. Let Θ_k = {g_k, α_k, β_k, γ_k} denote the
set of parameters at iteration k, where g_k ranges over the
basis functions in the UBFS. Let d(.,.) be the desired error
metric, typically the mean square error, and z^T = (x, y).
In our algorithm the optimal values of the ANN model are
indicated by the superscript "*", and vectors by underbars.
A^T represents the transpose of the vector A, and A · B the
scalar product of the vectors A and B. Our proposed
algorithm can be summarized as follows. First, select a
UBFS, Φ = {φ_1, ..., φ_N}, where the φ_i's are the basis
functions. Then, for every block B in the segmentation map,
let r_0(z) = f(z) and compute the optimal parameter vector
of the ANN model, Θ_k^* = {g_k^*, α_k^*, β_k^*, γ_k^*}. To
compute the optimal parameters at iteration k, let

J(z) = d( r_{k-1}(z), γ_k g_k( α_k^T · z + β_k ) )    (3.1)

then

Θ_k^* = arg min_{g_k ∈ Φ} min_{α_k, β_k, γ_k} Σ_{z ∈ B} J(z)    (3.2)

and the reconstructed image at iteration k can be written as

f̂_k^*(z) = Σ_{l=1}^{k} γ_l^* g_l^*( α_l^{*T} · z + β_l^* )    (3.3)

and the residual image at iteration k is given by

r_k(z) = f(z) − f̂_k^*(z)    (3.4)

Then compute the Peak Signal-to-Noise Ratio (PSNR) and
the bit rate at this iteration. If these values do not satisfy
the desired limits, iterate through equations (3.1) to (3.4);
otherwise

f̂^*(z) = Σ_{l=1}^{L^*} γ_l^* g_l^*( α_l^{*T} · z + β_l^* )    (3.5)

where L^* is the total number of selected basis functions.
Note that after the selection of a particular basis at each
iteration, the selected function is orthogonalized with
respect to the span of all the previously selected bases
from the UBFS. The orthogonalization step is necessary
for fast convergence of the projection pursuit algorithm
[18]. Clearly, this type of ANN will produce a more
efficient model for the image data because nodes can be
adaptively added to the model until the desired level of
accuracy of representation is achieved. The ANN
implementation of the above algorithm is illustrated in
Figure 3.
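The orthogonalization step can be illustrated on basis functions sampled over a block; this modified Gram-Schmidt sketch on discretized vectors is an illustrative stand-in for the step, not the authors' implementation.

```python
import numpy as np

def orthogonalize_against(selected, new_vec, tol=1e-10):
    """Orthogonalize a newly selected (sampled) basis against the span of
    previously selected ones via modified Gram-Schmidt.

    selected: list of orthonormal 1-D arrays (previous bases sampled on the block).
    new_vec:  1-D array of samples of the newly selected basis function.
    Returns the orthonormalized vector, or None if new_vec already lies in
    the span of the previous bases (it contributes nothing new)."""
    v = new_vec.astype(float).copy()
    for q in selected:
        v -= (q @ v) * q          # subtract projection onto each previous basis
    norm = np.linalg.norm(v)
    if norm < tol:
        return None               # basis is already representable
    return v / norm
```

With orthogonalized bases the coefficient of each new node can be computed independently of the earlier ones, which is what makes the greedy iteration converge quickly.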
Figure 3. The projection pursuit neural network.
The bias parameter β is included in the node
function g.
4. Quantization and Coding
The optimum parameters for each block must be
quantized before encoding. The optimum quantizer, in
the mean square error sense, is a non-uniform quantizer
that matches the probability density function of its input
signal. Based on the distribution of the weights and the
biases and their dynamic ranges for each block, separate
Lloyd-Max quantizers [19] were designed for each set of
parameters at each iteration of the algorithm. By
quantizing the coefficients at each iteration, we force the
algorithm to correct the quantization errors of the previous
iterations, thereby improving the resulting PSNR.
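A scalar Lloyd-Max quantizer matched to an empirical parameter histogram can be sketched with Lloyd's iteration; the level count, iteration count, and quantile-based initialization below are assumptions of this sketch rather than details from the paper.

```python
import numpy as np

def lloyd_max(samples, n_levels=32, iters=50):
    """Design a scalar Lloyd-Max quantizer from training samples.
    Alternates the two optimality conditions: decision boundaries at the
    midpoints of adjacent reconstruction levels, and each level at the
    centroid (mean) of the samples in its cell. Returns (bounds, levels)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # initialize levels at equally spaced sample quantiles
    qs = (np.arange(n_levels) + 0.5) / n_levels
    levels = np.quantile(samples, qs)
    for _ in range(iters):
        bounds = 0.5 * (levels[:-1] + levels[1:])   # midpoint boundaries
        idx = np.searchsorted(bounds, samples)      # cell index of each sample
        for j in range(n_levels):
            cell = samples[idx == j]
            if cell.size:
                levels[j] = cell.mean()             # centroid condition
    bounds = 0.5 * (levels[:-1] + levels[1:])       # final boundaries
    return bounds, levels

def quantize(x, bounds, levels):
    """Map values to their reconstruction levels."""
    return levels[np.searchsorted(bounds, x)]
```

Designing the levels from the histogram of each parameter set is what makes the quantizer non-uniform and matched to the input distribution, as described above.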
The resulting quantized parameters can be coded in a
number of ways. Various coding schemes such as
Huffman coding, Shannon-Fano coding, and arithmetic
coding [19] were considered and arithmetic coding was
chosen based on the following:
• Arithmetic coding can approach the entropy limit in
coding efficiency.
• Arithmetic coding requires only one pass through the data.
• The source modeling and information encoding are
separated.
• Arithmetic coding is generally faster than the Huffman
coding.
• In arithmetic coding the encoder and the decoder can
work on-line.
Furthermore, an arithmetic coder has the advantage that
it requires no a priori analysis of the data set to
determine the bit allocation.
5. Experimental Results and Conclusions
The proposed algorithm was used to encode different
types of test images. The test image Lena (512x512x8) is
shown in Figure 5(a) and the test image Peppers is shown
in Figure 5(d). Once the image was segmented, the ANN
model for each block was constructed by the following
procedure. Every function in the Universal Basis
Function Set (UBFS) was used to obtain the best ANN
model for each block by optimizing the parameters of the
function in the mean square error sense. The optimal
network parameters were selected by performing a
greedy search over all the functions in the UBFS. This
provided the first level approximation for each block.
The next levels of approximations were obtained by
repeating the above process on the residual errors. The
process was terminated when the overall error dropped
below the desired threshold or if the desired bit rate was
achieved. The UBFS considered in our experiments
included the following functions: sigmoidal function,
splines, and Daubechies Wavelet (W4). Although
theoretically, a greedy search on a large UBFS should
produce a better approximation of the image, our
experimental results showed that a few basis functions (3
to 5 with various degrees of smoothness) are sufficient to
produce almost identical results. The parameters of the
optimal ANN for each block were quantized using the
Lloyd-Max quantizers. We used the histograms of the
parameters for each block to design the quantizer. For
our experiments, Gaussian and Laplacian Lloyd-Max
quantizers with 5 or 6 bits, which provided signal-to-quantization-noise ratios in the range of 33-35 dB, were used
for all the ANN parameter sets. Finally, the quantized
parameters for each block were separately arithmetic
encoded to form the compressed bit stream. The decoded
test image Lena at the bit rate of 0.125 bit/pixel with
PSNR of 30.2 dB is shown in Figure 5(b). For
comparison, the JPEG algorithm [20] was used to encode
Lena at the same bit rate of 0.125 bit/pixel. The PSNR of
the corresponding JPEG decoded image was found to be
26.2 dB. The severe blockiness artifacts present in the
JPEG decoded image can be seen in Figure 5(c). The
results for the test image Peppers with the same UBFS
are shown in Figures 5(e) and 5(f). Our experimental
results showed that at rates below 0.5 bit/pixel, our
method is superior to JPEG both in terms of PSNR and
the visual image quality. Moreover, the performance of
our algorithm is superior to other neural network
algorithms [7]-[12], and is comparable to wavelet-based
algorithms such as the EZW encoder [21]. Although the
speed of our orthogonal projection pursuit ANN encoder
could be considerably slower than the JPEG or EZW
encoders for a large UBFS, the speed of our projection
pursuit decoder is comparable to the JPEG and EZW
decoders.
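The stopping rule used throughout (add nodes until the PSNR target or the bit budget is reached) can be sketched as follows; the 8-bit peak value of 255 and the example targets are assumptions of this sketch.

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((np.asarray(original, float) - np.asarray(decoded, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)

def keep_adding_nodes(original, decoded, bits_used, n_pixels,
                      target_psnr=30.0, max_bpp=0.5):
    """True while neither the PSNR target nor the bit budget has been met,
    i.e. while the encoder should keep adding basis functions to the model."""
    return (psnr(original, decoded) < target_psnr
            and bits_used / n_pixels < max_bpp)
```

This captures the per-block termination test of Section 3: iterate through equations (3.1)-(3.4) while the check returns True, then emit the final approximation (3.5).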
6. References
[1] Rabbani M. (Ed.), Selected Papers on Image Coding and
Compression, SPIE Optical Engineering Press, 1992.
[2] Geman S. and Geman D., Stochastic relaxation, Gibbs
distributions, and the Bayesian restoration of images, IEEE
Trans. on Pattern Anal. and Machine Intell., pp. 721-741, PAMI-6, 1984.
[3] Radha H., Leonardi R., Vetterli M., and Naylor B., Binary
space partitioning (BSP) tree representation of images, J. Visual
Comm. and Image Rep., 1991.
[4] Rabiee H.R., Kashyap R.L., and Radha H., Multiresolution
Image Compression with BSP trees and Multilevel Block
truncation Coding, Proceedings of the IEEE ICIP’95,
Washington D.C., Oct. 23-26, 1995.
[5] Vaisey J., and Gersho A., Image compression with variable
block size segmentation, IEEE Trans. on Signal Proc., pp. 2040-2060, SP-40, 1992.
[6] Shusterman E., and Feder, M., Image compression via
improved quadtree decomposition algorithms, IEEE Trans. on
Image Proc., pp. 207-215, IP-3, 1994.
[7] Chen O., Sheu B.J., and Fang W., Image compression using
self-organization networks, IEEE Trans. on Circuits and Systems
for Video Tech., pp. 480-489, CSVT-4, 1994.
[8] Carrato S., Premoli A., and Sicuranza G. L., Linear and
nonlinear neural networks for image compression, Proc. of the IEEE
Conf. on Digital Signal Processing, pp. 526-531, September 1991.
Figure 5. The performance of our coding algorithm on test images Lena and Peppers: (a) The
original test image Lena (512x512x8), (b) Decoded Lena; our method, at 0.125 bit/pixel, PSNR =
30.2 dB, (c) Decoded Lena; JPEG, at 0.125 bit/pixel, PSNR = 26.2 dB, (d) The original test image
Peppers (512x512x8), (e) Decoded Peppers; our method, at 0.125 bit/pixel, PSNR = 30.35 dB, (f)
Decoded Peppers; JPEG, at 0.125 bit/pixel, PSNR = 23.75 dB.
[9] Cottrell G. W., Munro P., and Zipser D., Image compression
by backpropagation: an example of extensional programming,
in SHARKEY N. E. (Ed.): Models of cognition: a review of
cognitive science, Norwood, NJ, 1989.
[10] Mougeot M., Azencott R., and Angeniol B., A study of
image compression with backpropagation, in HERAULT J.
(Ed.), NATO ASI Series, Vol. F 68, Neurocomputing, Springer-Verlag, Berlin Heidelberg, 1990.
[11] Carrato S., and Marsi S., Parallel structure based on neural
networks for image compression, Electronic Lett. 28, 1992.
[12] Pinho A. J., Image compression based on quadtree segmentation
and artificial neural networks, Electronic Lett. 29, 1993.
[13] Safavian S.R. and Tenorio M.F., Projection pursuit neural
networks can avoid the curse of dimensionality, Proceedings of
the IEEE International Conference on Systems, Man and
Cybernetics, October 1994.
[14] Huber P., Projection pursuit, Ann. Statist., vol. 13, 435-525, 1985.
[15] S. R. Safavian, H. R. Rabiee, and M. Fardanesh, Projection
Pursuit Image Compression with Variable Block Size Segmentation,
Accepted for publication in IEEE SPL letters, Oct. 1996.
[16] Leonardi, R., and Kunt, M., Adaptive split and merge for
image analysis and coding, Proc. SPIE Conf. on Visual
Commun. Image Processing, Cannes, France, December 1985.
[17] Kocher M., and Leonardi R., Adaptive region growing
technique using polynomial functions for image approximation,
Signal Process. 11, (1), 1986.
[18] H. R. Rabiee, R. L. Kashyap, and S. R. Safavian,
Multiresolution Image Coding with Adapted Wavelet Packet
Bases Using Segmented Orthogonal Matching Pursuit, To be
submitted to IEEE Trans. on Image Processing, April 1997.
[19] Jayant N.S. and Noll P., Digital Coding of Waveforms,
Prentice-Hall, 1984.
[20] Pennebaker W.B. and Mitchell J., JPEG Still Image Data
Compression Standards, Van Nostrand Reinhold, 1993.
[21] J. M. Shapiro, Embedded Image Coding Using Zerotrees of
Wavelet Coefficients, IEEE Trans. Signal Processing, vol. 41, no.
12, Dec. 1993.