PSYCHOVISUALLY-BASED MULTIRESOLUTION IMAGE SEGMENTATION
Marcia G. Ramos, Sheila S. Hemami, Michael A. Tamburro
School of Electrical Engineering
Cornell University
Ithaca, NY 14853
ABSTRACT
Psychophysical studies have shown that there are three
image components of distinct perceptual significance to
human observers: strong edges, smooth regions, and textured or detailed regions. A psychophysical test was performed to evaluate the perceptual role of each region, and
a segmentation algorithm was developed to segment an
image into the three regions. The segmentation algorithm identifies blocks of an image as belonging to one
of the perceptual regions by analyzing high-frequency
coefficients of a 3-level hierarchical subband/wavelet decomposition. The segmentation algorithm performs well
on a wide range of image content, including natural images and mixed images containing text, graphics, and
natural scenes.
1. INTRODUCTION
Psychophysical studies have shown that there are three
image components of distinct perceptual significance to
human observers: strong edges, smooth regions, and textured or detailed regions [1]–[3]. Since the human visual
system responds to each region differently, perceptually-based image processing applications can benefit from
identifying these regions in an image and treating them
accordingly. A psychophysical test performed on the
World Wide Web with a large number of images and
subjects clearly demonstrated that the abilities of the human visual system (HVS) to recognize objects and to tolerate distortion in images differ for each perceptual region. Identifying these regions is thus fundamental to
any perceptually-based image processing application.
This paper presents a segmentation algorithm that classifies blocks of an image into one of the perceptually significant regions by using the high-frequency bands of a 3-level hierarchical subband/wavelet decomposition. The
decay of the high-frequency wavelet coefficients from the
coarsest to the finest scale, as well as the distributions
of these coefficients, are used to determine the classification. The segmentation algorithm performs well on a
wide range of image content, including natural images
and mixed images containing text, graphics, and natural scenes. The algorithm can be easily incorporated
in multiresolution-based coding schemes or used as a tool in
perceptually-based image compression [4].
This work was supported in part by a grant from the Department of Energy and a gift from the Eastman Kodak Company.
This paper is organized as follows. Section 2 describes the psychophysical test used to evaluate the perceptual importance of each region, and Section 3 explains
the segmentation algorithm. Section 4 presents experimental results, and Section 5 concludes the paper.
2. CONTENT-BASED PSYCHOPHYSICAL TEST
To evaluate the perceptual role of
each region (smooth, edge, and detailed) in more detail, a psychophysical test was developed with the aid of a psychologist
specializing in visual perception. The purpose of this test
was to rank the three perceptually significant regions according to their importance in recognizing images and
their noise-masking ability. The test consisted of two
sections, recognition and comparison, and contained 30
images in which the subject matter was familiar and
the foreground objects clearly stood out from the background. Seventy-five subjects participated in the test.
The goal of the recognition section was to evaluate
how well human observers can recognize an image represented by a single perceptual region (smooth, edge, or
detailed) and to rank the regions in order of importance
for recognition tasks. The recognition section included
10 specially composited images, for each of which the subject was
asked whether he/she could recognize the object in the image [yes/no]. The composited images were created by
distorting two of the perceptual regions with Gaussian
noise while leaving the third region untouched, so that
recognition had to rely on the undistorted region alone.
The goal of the comparison section was to rank the
different perceptual regions of an image according to
the sensitivity of the HVS to noise in these regions. The test
included 22 questions, each showing two images, in which
the subject was asked to identify the image that looked
better. Each question showed two copies
of the same image, with Gaussian noise added to a
different perceptual region in each copy, such that the
distortions in that region were clearly noticeable to a
human observer.
The recognition test confirmed previous results
showing that edges play a significant role in recognizing objects in images, while smooth and detailed regions
are less important in recognition tasks [1]–[3]. Images represented by edge regions were successfully recognized in 94% of the responses, while images represented
by smooth regions were recognized in 81% of responses.
Images represented by detailed regions were recognized
in only 75% of responses.
The comparison test results demonstrated that humans are very sensitive to noise in the smooth regions.
When comparing an image with noise added to its smooth
regions to an image with noise added to its edge regions, the noisy-edge image was preferred to the noisy-smooth
image in 70% of the comparisons. The noisy-detailed images were also preferred to noisy-smooth images
in 86% of the comparisons, and to noisy-edge images in
97% of the comparisons.
These results motivated the development of a segmentation algorithm to identify the three regions.
3. MULTIRESOLUTION IMAGE SEGMENTATION
This section introduces a segmentation algorithm that
classifies blocks of an image into one of the three perceptual regions based on a 3-level hierarchical subband
decomposition. Previous work on identifying edges using
subband decompositions has relied on processing
the lowest-frequency subband to relate activity in the
low- and high-frequency bands [5], [6]. The segmentation algorithm presented here uses only the LH and HL
bands and is based on the decay of the high-frequency
coefficients from the coarsest to the finest scale.
3.1. Theoretical Motivation
The theoretical motivation for measuring visual activity using only the high-frequency subbands is given in
[7], which shows that multiscale Canny edge detection
is equivalent to finding local maxima of a wavelet transform. The separation of the three perceptual regions
is achieved by examining the high-frequency coefficient
decay across the scales of the subband decomposition.
The decay of the local maxima of the wavelet coefficients of a function $f \in L^2(\mathbb{R})$ across scales indicates
the number of continuous derivatives (or Hölder regularity) of $f$ [8]. Since an image is a discrete signal, it is
not meaningful to strictly consider continuity or singularities, but the theorem can still be used as a predictor
of visual smoothness or sharp intensity variations in the
signal [9]. The wavelet coefficients that correspond to
smooth areas decrease rapidly as the resolution varies
from coarse to fine. A slow coefficient decay indicates
the presence of activity in the image, corresponding to
an edge or detailed area. In particular, strong edges
are indicated by isolated features correlated over several
scales, while the behavior of the coefficients in detailed
areas appears random and exhibits no such correlation.
Hence, analyzing the high-frequency coefficients for a
given spatial location provides a good measure of activity. A drawback is the occurrence of false alarms in
noisy signals, which can be avoided with a denoising
strategy.
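The decay argument can be checked numerically on 1-D signals. The sketch below is an illustration only: it assumes an undecimated Haar transform with L1 normalization (a simplification; the paper's experiments use QMF, Daubechies, and biorthogonal filters, and [7] uses a different wavelet). With this normalization, the maxima for a sharp step stay constant across dyadic scales, while for a linear ramp they halve at each step from coarse to fine, so the slope of log2(maximum) against the scale index gives a rough regularity exponent (capped at 1, because the Haar wavelet has a single vanishing moment). The names `haar_maxima` and `regularity_estimate` are hypothetical.

```python
import numpy as np

def haar_maxima(x, levels):
    """Max |W_j x| for scales j = 1..levels.

    W_j x[n] is the mean of x over [n, n+h) minus the mean over [n-h, n),
    with h = 2**(j-1): an undecimated, L1-normalized Haar transform
    (assumed here for illustration).
    """
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))   # prefix sums: c[k] = sum(x[:k])
    maxima = []
    for j in range(1, levels + 1):
        h = 2 ** (j - 1)
        n = np.arange(h, len(x) - h + 1)         # positions where both windows fit
        right = (c[n + h] - c[n]) / h            # mean of x[n : n+h]
        left = (c[n] - c[n - h]) / h             # mean of x[n-h : n]
        maxima.append(np.max(np.abs(right - left)))
    return np.array(maxima)

def regularity_estimate(x, levels=5):
    """Slope of log2(max |W_j x|) versus j: a rough regularity exponent."""
    m = haar_maxima(x, levels)
    j = np.arange(1, levels + 1)
    slope, _ = np.polyfit(j, np.log2(m), 1)
    return slope

n = np.arange(256)
step = (n >= 101).astype(float)   # sharp edge: maxima flat across scales
ramp = n / 256.0                  # smooth signal: maxima halve at each finer scale

print(haar_maxima(step, 5))       # [1. 1. 1. 1. 1.]
print(regularity_estimate(step), regularity_estimate(ramp))  # approximately 0 and 1
```

A detailed (textured) region would instead give maxima whose positions and sizes vary erratically from scale to scale, which is why the paper relies on per-block statistics rather than on a single decay slope.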
3.2. Segmentation Algorithm
The full-resolution image is divided into $N \times N$ blocks
($N = 8$ or $16$), and each block is classified into one of the
perceptual regions by examining the high-frequency coefficients of the LH and HL bands of a 3-level hierarchical subband/wavelet decomposition. The HH bands are
not used since they contain primarily high-frequency noise.
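For experimentation, a 3-level decomposition that keeps only the LH and HL bands can be sketched with an orthonormal 2-D Haar filter bank. Haar is an assumption made to keep the example dependency-free; the paper reports results with QMF, Daubechies, and biorthogonal filters. Which band is called LH and which HL varies between authors; here LH is lowpass horizontally and highpass vertically (it responds to horizontal edges). The helper names are hypothetical, and the image sides must be divisible by $2^3$.

```python
import numpy as np

def haar_level(a):
    """One level of an orthonormal 2-D Haar analysis (array sides must be even)."""
    tl, tr = a[0::2, 0::2], a[0::2, 1::2]  # top-left / top-right of each 2x2 cell
    bl, br = a[1::2, 0::2], a[1::2, 1::2]  # bottom-left / bottom-right
    ll = (tl + tr + bl + br) / 2.0         # lowpass in both directions
    lh = (tl + tr - bl - br) / 2.0         # horizontal lowpass, vertical highpass
    hl = (tl - tr + bl - br) / 2.0         # horizontal highpass, vertical lowpass
    hh = (tl - tr - bl + br) / 2.0         # highpass in both directions
    return ll, lh, hl, hh

def lh_hl_bands(image, levels=3):
    """Return [(LH, HL)] ordered from the finest (level 1) to the coarsest scale."""
    a = np.asarray(image, dtype=float)
    bands = []
    for _ in range(levels):
        a, lh, hl, _hh = haar_level(a)     # HH is discarded, as in the text
        bands.append((lh, hl))
    return bands

img = np.zeros((32, 32))
img[:, 5:] = 1.0                           # a purely vertical edge
for lh, hl in lh_hl_bands(img):
    # LH stays zero for a purely vertical edge; HL carries the edge energy.
    print(lh.shape, np.abs(lh).max(), np.abs(hl).max())
```

With an $N \times N$ image block, the corresponding block at scale $j$ has $N/2^j$ coefficients per side, which is why the text recommends $N = 8$ or $16$ for a 3-level decomposition.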
The segmentation algorithm consists of two parts. In
the first part, the high-frequency coefficients are thresholded into two levels of activity on a band-by-band basis,
and in the second part, classification is performed on a
block-by-block basis to label each block as belonging to
one of the perceptual regions.
The two-level activity thresholding is based on the
distribution of the high-frequency coefficients inside a
given band. For each high-frequency band, coefficients
with absolute value larger than the standard deviation of
their band are labelled as active and set
to 1, while the others are labelled as non-active and set
to 0, thus forming a binary band for each high-frequency
band. This has the effect of denoising the coefficients.
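A minimal sketch of the band-by-band binarization, assuming the population standard deviation (`ddof=0`) of each band; the text does not specify the normalization. `binarize_band` is a hypothetical helper that would be applied to each of the six LH/HL bands.

```python
import numpy as np

def binarize_band(band):
    """Mark coefficients whose magnitude exceeds the band's standard deviation."""
    band = np.asarray(band, dtype=float)
    threshold = band.std()                               # population std (ddof=0), an assumption
    return (np.abs(band) > threshold).astype(np.uint8)  # 1 = active, 0 = non-active

band = np.array([[0.1, -0.2, 0.05, 3.0],
                 [-2.5, 0.0, 0.15, -0.1]])
print(binarize_band(band))   # only 3.0 and -2.5 exceed the std (about 1.38)
```

Because wavelet detail coefficients are concentrated around zero, the small-magnitude coefficients (mostly noise and weak texture) fall below the threshold, which is the denoising effect described above.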
Each image block is then classified into one of the
perceptual regions based on the following characteristics:
smooth blocks have very few active coefficients in any of
the three scales, edge blocks have groupings of active coefficients occurring around the edge, and detailed blocks
have approximately 50% active coefficients. The segmentation criterion for each image block is based on an
activity measure $\alpha$, which is computed across the scales of
the subband/wavelet decomposition. Each image block
has six corresponding high-frequency blocks in the binary bands, three for the LH bands and three for the
HL bands, one block per scale. The variance of each
of the six blocks in the binary bands is computed, and
the six variances are averaged to form $\alpha$. The final segmentation is
achieved by thresholding $\alpha$ as follows:
$$\text{class} = \begin{cases} \text{smooth}, & \alpha < T_{low} \\ \text{edge}, & T_{low} \le \alpha \le T_{high} \\ \text{detailed}, & \alpha > T_{high} \end{cases}$$
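The block classification can be sketched as follows, assuming the binary bands are ordered from finest (scale 1) to coarsest, that the variance is the population variance, and that the block size $N$ is divisible by $2^3$, so each image block maps onto a whole sub-block of $N/2^j$ coefficients per side at scale $j$. The name `classify_blocks` and the label constants are hypothetical.

```python
import numpy as np

SMOOTH, EDGE, DETAILED = 0, 1, 2   # hypothetical label encoding

def classify_blocks(binary_bands, block_size, t_low, t_high):
    """Label each block_size x block_size image block.

    binary_bands[j-1] = (lh, hl) binary arrays at scale j (1 = finest).
    """
    lh0 = binary_bands[0][0]
    rows = lh0.shape[0] * 2 // block_size    # full-resolution height / N
    cols = lh0.shape[1] * 2 // block_size    # full-resolution width / N
    labels = np.empty((rows, cols), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            variances = []
            for j, (lh, hl) in enumerate(binary_bands, start=1):
                s = block_size >> j          # N / 2**j coefficients per side at scale j
                for band in (lh, hl):
                    blk = band[r * s:(r + 1) * s, c * s:(c + 1) * s]
                    variances.append(blk.astype(float).var())
            activity = np.mean(variances)    # average of the six block variances
            if activity < t_low:
                labels[r, c] = SMOOTH
            elif activity > t_high:
                labels[r, c] = DETAILED
            else:
                labels[r, c] = EDGE
    return labels
```

For example, with $N = 16$ a block whose binary sub-blocks are half active at every scale gets an activity of $0.25$ and is labelled detailed for an upper threshold of $0.19$, while a block with no active coefficients gets $0$ and is labelled smooth.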
Using the variance as an activity measure is closely related to counting the number of active coefficients in the
binary bands: for blocks in which fewer than half of the coefficients are active, the variance increases monotonically with the count. However, since the blocks in the coarsest bands are relatively small, both edge and detailed
blocks can have a high number of active coefficients, and
a counting measure is inadequate to distinguish between
these blocks at the coarsest scale. By using the variance
instead of a counting measure, only the blocks that have
around 50% active coefficients in the coarsest scale
will have a high variance and be classified as detailed,
while blocks containing more than 50% active coefficients will be classified as either smooth or edge blocks
depending on the variance at the finer scales.
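The difference between the two measures follows from a short calculation, assuming the population variance is used. For a binary block of $M$ coefficients $b_i \in \{0, 1\}$ of which a fraction $p$ is active, the mean is $p$ and $b_i^2 = b_i$, so

```latex
\sigma^2 = \frac{1}{M}\sum_{i=1}^{M} b_i^2 - p^2 = p - p^2 = p(1 - p)
```

The variance is therefore $0$ for an empty or a fully active block and reaches its maximum of $1/4$ at $p = 1/2$. For example, a fully active $2 \times 2$ block at the coarsest scale of a $16 \times 16$ image block contributes $0$, whereas a counting measure would give it the maximum score.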
4. EXPERIMENTAL RESULTS
Several parameters need to be specified for the segmentation algorithm, namely, the block size, the subband/wavelet filters, and the lower and upper thresholds. Since
a 3-level decomposition is used, the recommended block
sizes are $8 \times 8$ and $16 \times 16$. The lower threshold used in
the segmentation criterion is image independent, while
the upper threshold is image dependent and is determined according to the distribution of the LH and HL
bands at the three scales. The lower threshold is set
so that blocks with up to 5% active coefficients are classified as smooth. The upper threshold is determined by finding
the generalized Gaussian (with shape parameter $\nu$) that best
matches each high-frequency subband and computing $T_{high}$ as
a linear function of the average $\bar{\nu}$ of all the $\nu$'s. The results
shown in this section are for $T_{high} = -0.16\,\bar{\nu} + 0.28$, which
was determined via linear regression and optimized for
$16 \times 16$ blocks. In general, $T_{high}$ as given by this equation
should be slightly decreased when used to classify
$8 \times 8$ blocks to provide a better delineation between the
edge and detailed regions. The upper threshold selected
for the lena image ($T_{high} = 0.19$) corresponds to classifying as detailed the blocks having between 30% and 70%
active coefficients.
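The threshold computation can be sketched as below. The text does not say how the generalized Gaussian is fitted, so this sketch swaps in a standard moment-matching estimator, which solves $\Gamma(1/\nu)\Gamma(3/\nu)/\Gamma(2/\nu)^2 = E[x^2]/(E|x|)^2$ for the shape $\nu$ by bisection. Mapping the 5% rule to $T_{low} = 0.05 \times 0.95$ assumes the activity measure is a population variance of binary blocks. Only the regression $T_{high} = -0.16\,\bar{\nu} + 0.28$ (with $\bar{\nu}$ the average shape over the LH/HL bands) comes from the text; the function names are hypothetical.

```python
import math
import numpy as np

def gg_shape(coeffs, lo=0.05, hi=10.0, iters=100):
    """Moment-matching estimate of the generalized Gaussian shape parameter."""
    x = np.asarray(coeffs, dtype=float).ravel()
    # Empirical log(E[x^2] / (E|x|)^2); always >= 0 by Cauchy-Schwarz.
    target = math.log(np.mean(x ** 2) / np.mean(np.abs(x)) ** 2)

    def log_ratio(nu):
        # log of Gamma(1/nu) * Gamma(3/nu) / Gamma(2/nu)^2, decreasing in nu
        return math.lgamma(1.0 / nu) + math.lgamma(3.0 / nu) - 2.0 * math.lgamma(2.0 / nu)

    for _ in range(iters):                    # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if log_ratio(mid) > target:
            lo = mid                          # model tails too heavy: increase nu
        else:
            hi = mid
    return 0.5 * (lo + hi)

def segmentation_thresholds(bands, p_low=0.05):
    """Return (T_low, T_high, mean shape) for a list of LH/HL bands."""
    nu_bar = float(np.mean([gg_shape(b) for b in bands]))
    t_low = p_low * (1.0 - p_low)             # variance of a binary block with 5% active (assumed)
    t_high = -0.16 * nu_bar + 0.28            # linear regression reported in the text
    return t_low, t_high, nu_bar

rng = np.random.default_rng(0)
laplacian = rng.laplace(size=200_000)         # generalized Gaussian with shape 1
gaussian = rng.normal(size=200_000)           # generalized Gaussian with shape 2
print(round(gg_shape(laplacian), 2), round(gg_shape(gaussian), 2))  # close to 1 and 2
```

Inverting the regression, the lena threshold $T_{high} = 0.19$ corresponds to an average shape of $(0.28 - 0.19)/0.16 \approx 0.56$, i.e. subbands more sharply peaked than a Laplacian.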
The algorithm was tested using different filters (QMF
filters, Daubechies filters, and biorthogonal wavelet filters) and performed well in all cases. The segmentation
algorithm was evaluated both visually and against subjective hand labelling: image blocks ($16 \times 16$) for images in the USC
database were hand-labelled as smooth, edge, or detailed, and the labels were compared to those produced by the
algorithm. Table I gives the performance for several images in the USC database.
Image      Smooth Blocks   Edge Blocks   Detailed Blocks
lena       88.5%           92.5%         83.5%
couple     87.9%           88.2%         75.8%
peppers    91.2%           96.7%         88.1%
lax        85.6%           85.5%         73.3%
Table I - Percentage of Hand-labelled Blocks Classified
Correctly by the Algorithm
The discrepancies mainly occur when hand-labelled
detailed blocks are classified as edges by the algorithm.
Figure 1 presents the results for the lena image, and Figure
2 presents the results for a mixed image, both using a $16 \times 16$
block size and a length-8 QMF filter.
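The percentages in Table I can be read as per-class agreement rates: for each hand-labelled class, the share of those blocks to which the algorithm assigns the same label. A minimal sketch of that computation, under this interpretation and with a hypothetical 0/1/2 encoding for smooth/edge/detailed:

```python
import numpy as np

def per_class_agreement(hand_labels, algo_labels, classes=("smooth", "edge", "detailed")):
    """Percentage of hand-labelled blocks of each class that the algorithm labels identically."""
    hand = np.asarray(hand_labels).ravel()
    algo = np.asarray(algo_labels).ravel()
    result = {}
    for k, name in enumerate(classes):
        mask = hand == k                       # blocks hand-labelled as class k
        result[name] = 100.0 * np.mean(algo[mask] == k) if mask.any() else float("nan")
    return result

hand = [0, 0, 1, 1, 2, 2, 2, 2]
algo = [0, 0, 1, 0, 2, 1, 1, 2]                # two detailed blocks mislabelled as edges
print(per_class_agreement(hand, algo))
```

In this toy example the detailed class scores lowest because two of its blocks are labelled as edges, the same kind of discrepancy described above.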
5. CONCLUSIONS
This paper presents an algorithm to segment an image into three perceptually significant regions, namely,
smooth regions, edge regions, and textured or detailed regions, using only the high-frequency information of a
wavelet decomposition. Results show that the algorithm
matches a hand-labelled segmentation for approximately
80% of the blocks and can be easily incorporated into
perceptually-based coding techniques.
6. REFERENCES
[1] D. Marr, Vision, W.H. Freeman and Company, 1982.
[2] I. Rock, The Perceptual World, Readings from Scientific American Magazine, W.H. Freeman and Company, 1990.
[3] X. Ran and N. Farvardin, "A Perceptually Motivated Three-Component Image Model - Part I: Description of the Model", IEEE Trans. Image Processing, vol. 4, no. 4, pp. 401-415, April 1995.
[4] M.G. Ramos and S.S. Hemami, "Robust Image Coding With Perceptually-Based Scalability", Proc. Data Compression Conference, Data Compression Industry Workshop, Salt Lake City, UT, March 1996.
[5] O. Johnsen, O.V. Shentov, and S.K. Mitra, "A Technique for the Efficient Coding of the Upper Bands in Subband Coding of Images", Proc. IEEE ICASSP'90, Albuquerque, NM, vol. 4, pp. 2097-2100.
[6] M. Mohsenian and N.M. Nasrabadi, "Edge-based Subband VQ Techniques for Images and Video", IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 1, pp. 53-67, Feb. 1994.
[7] S. Mallat and S. Zhong, "Characterization of Signals from Multiscale Edges", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 7, pp. 710-732, July 1992.
[8] Y. Meyer, Wavelets and Operators, Cambridge University Press, 1992.
[9] W.K. Carey, S.S. Hemami, and P.N. Heller, "Smoothness-Constrained Wavelet Image Compression", Proc. ICIP '96, Lausanne, Switzerland.
Figure 1: Segmentation results for the lena image: (a)
Original Image; (b) Smooth Regions; (c) Edge Regions;
(d) Detailed Regions.
Figure 2: Segmentation results for the mixed image: (a)
Original Image; (b) Smooth Regions; (c) Edge Regions;
(d) Detailed Regions.