Hyperspectral Image Compression Using Three

1
Hyperspectral Image Compression Using
Three-Dimensional Wavelet Coding
Xaoli Tang, William A. Pearlman and James W. Modestino
Center for Image Processing Research
Rensselaer Polytechnic Institute
Troy, NY 12180-3590
Corresponding author. Voice: (518) 276-6082; fax: (518) 276-8715; e-mail: [email protected]
This work was performed at Rensselaer Polytechnic Institute and was supported in part by National Science Foundation Grant No. EEC-
981276. The government has certain rights in this material.
November 7, 2002
DRAFT
2
Abstract
Hyperspectral image is a sequence of images generated by hundreds of detectors. Each detector is sensitive
only to a narrow range of wavelengths. One can view such an image sequence as a three-dimensional array of
intensity values (pixels) within a rectangular prism. This image prism reveals contiguous spectrum information about
the composition of the area being viewed by the instrument. The price paid for these high resolution, contiguous
spectrum images is an extremely large set of data. These data cause processing, storage and transmission problems.
Therefore, some application-specific data compression techniques should be applied before we process, store or
transmit hyperspectral images. To solve this problem, we present a Three-Dimensional Set Partitioned Embedded
bloCK (3DSPECK) algorithm based on the observation that hyperspectral images are contiguous in the spectrum axis
(this implies large inter-band correlations) and there is no motion between bands. Therefore, three-dimensional discrete
wavelet transform can fully exploit the inter-band correlations. A modified SPECK [5] partitioning algorithm is used to
sort important information (significant pixels). Rate distortion (Peak Signal-to-Noise Ratio vs. bit rate) performances
were plotted by comparing 3DSPECK against 3DSPIHT on several sets of hyperspectral images. Results show that
3DSPECK is comparable to 3DSPIHT in hyperspectral image compression. 3DSPECK can achieve compression ratios
in the approximate range of 16 to 27 while providing very high quality reconstructed images. It guarantees over 3
dB PSNR improvement at all rates or rate savings at least a factor of 2 over 2D coding of separate spectral bands
without axial transformation.
Index Terms
Three Dimensional Image Compression, Discrete Wavelet Transform (DWT), Hyperspectral Imaging, AVIRIS Imaging, SPIHT
I. I NTRODUCTION
Hyperspectral imaging is a powerful technique and has been widely used in a large number of applications, such
as detection and identification of the surface and atmospheric constituents present, analysis of soil type, monitoring
agriculture and forest status, environmental studies, and military surveilance. Hyperspectral images are generated
by collecting hundreds of narrow and contiguously spaced spectral bands of data such that a complete reflectance
spectrum can be obtained for the region being viewed by the instrument.
All substances, including living things, have their own spectrum characteristics or diagnostic absorption features.
Hyperspectral images have high enough spectrum resolution that by comparing their resulting spectrum features
with those of known substances, we can reveal the information about the composition of the area being imaged.
However, at the time we gain high resolution spectrum information, we generate massively large image data sets.
Access and transport of these data sets will stress existing processing, storage and transmission capabilities. As an
example, the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS), a typical hyperspectral imaging system,
has 224 sensors. In order to provide sufficient resolution to detect most absorption features, each sensor has a
wavelength sensitive range of approximately 10 nanometers. All sensors together yield contiguous spectral bands
covering the entire range between 380 nm and 2500 nm. If each band is 614 512 scans (pixels), with one byte
per pixel, the whole data set will be over 70 Mbytes. Actually, AVIRIS instrument can yield about 16 Gigabytes of
November 7, 2002
DRAFT
3
data per day. Therefore, efficient compression should be applied to these data sets before storage and transmission
[10].
The compression schemes used in the data sets can be classified as lossless or lossy. As what the name suggests,
lossless compression (lossless coding) reduces the redundancy of data sets without losing any information. We
can reconstruct the original images exactly. This is a reversible process. However, this kind of compression can
only provide a compression ratio of about 2 3:1. Obviously, the low compression ratio is not enough to cope
with serious data management issues such as hyperspectral image transmission. However, if we are willing to lose
some information, we can gain significantly higher compression ratio than that of lossless compression. This is
so-called lossy compression (lossy coding). We want to achieve as high as possible ratio without losing important
information.
For applications such as progressive transmission, we also want the compressed or coded data be embedded. We
can truncate the embedded information sequence at any point and reconstruct the data to available resolution.
To incorporate these requirements, many promising image compression algorithms based on wavelet transform
(WT) [9] were proposed recently. They are simple, efficient and have been widely used in many applications. One
of the them is Shapiro’s embedded zerotree wavelet (EZW) [16]. Said and Pearlman [14] refined and extended EZW
subsequently to SPIHT. Islam and Pearlman [5] proposed another low complexity image encoder – Set Partitioned
Embedded bloCK (SPECK). SPECK requires low dynamic memory. Related in various degrees to these earlier
works on scalable image compression, the EBCOT [17] algorithm also uses a wavelet transform to generate the
subband samples which are to be quantized and coded. EBCOT stands for Embedded Block Coding with Optimized
Truncation, which identifies some of the major contributions of the algorithm. It is resolution and SNR scalable
and with a random access property. EBCOT partitions each subband into relatively small blocks (typically, or pixels are used), and generates a separate highly scalable (or embedded) bit stream for each block. The
bit streams can be independently truncated to a given set of rates, which results in finer embedding.
The state-of-the-art encoder, SPIHT, has many attractive properties. It is an efficient embedded technique. The
original SPIHT was proposed for 2-dimensional image compression, and it has been extended to 3D applications by
Kim and Pearlman [6]. 3D-SPIHT is the modern-day benchmark for three dimensional image compression. It is a
powerful tool to compress image sequences. It has been applied on multispectral image compression by Dragotti et
al. [2]. They use vector quantization (VQ) and Karhunen-Loeve transform (KLT) on the spectral dimension to explore
the correlation between multispectral bands. In the spatial domain, they use discrete wavelet transform, and the
3DSPIHT sorting algorithm is applied on the transformed coefficients. Dragotti et al.’s algorithms are comparable to
3DSPIHT in multispectral image compression in rate distortion performance. This is because VQ is theoretically the
optimal block coding strategy and KLT is theoretically the optimal transform to decorrelate the data. However, VQ
and KLT are computationally complex. One needs to either design code book or update statistics on time. Applying
VQ and KLT on the spectral dimension of multispectral images is acceptable because multispectral images only
have a small number of bands. However, on the contiguous spectrum applications such as hyperspectral imagery, it
is not feasible for real-time computation. Fry [3] adopts 3DSPIHT directly on hyperspectral image compression. His
November 7, 2002
DRAFT
4
work demonstrates the speed and computational complexity advantages of 3DSPIHT. Results show that 3D-DWT
is a fast and efficient means to exploit the correlations between hyperspectral bands.
The EBCOT algorithm has also been extended to 3D applications. Three Dimensional Cube Splitting EBCOT
(3D CS-EBCOT) [15] initially partitions the wavelet coefficient prism into equally sized small code cubes of
elements. The cube splitting technique is applied on each code cube to generate separate scalable bit
streams. Like EBCOT, the bit streams may be independently truncated to any of a collection of different lengths
to optimize the rate distortion criteria. Xu et al. [21] used a different method to extend EBCOT on video coding –
Three-Dimensional Embedded Subband Coding with Optimized Truncation (3-D ESCOT). They treat each subband
as a code cube and generate embedded bit streams for each code cube independently by using fractional bit-plane
coding. Candidate truncation points are formed at the end of each fractional bit-plane. Again, the bit streams can
be truncated independently into a layered stream by optimizing a rate distortion function.
In this paper, we extend 2D SPECK to 3D sources such as hyperspectral images. We call this new 3D image
encoding technique Three-Dimensional Set Partitioned Embedded bloCK (3DSPECK). For an image sequence,
three dimensional discrete wavelet transform (3D-DWT) is applied to obtain a wavelet coefficient prism. Since our
applications are hyperspectral images, there is not motion, but tight statistical dependency along the wavelength
axis of this prism. Therefore, the 3D-DWT can exploit the consequent correlation along the wavelength axis, as
well as along the spatial axes. To start the algorithm, the wavelet coefficient prism is partitioned into small (threedimensional) code blocks1 with different sizes, and each subband is treated as a code block. Next, an extended and
modified version of the SPECK sorting algorithm is applied to these code blocks to sort the significance of pixels.
A block splitting algorithm similar to the cube splitting algorithm of 3D CS-EBCOT is used on the individual
code blocks to test their significance. If a code block contains significant coefficients, it is split into several smaller
sub-blocks. The descendant “significant” blocks are then further split until the significant coefficients are isolated.
This block splitting algorithm can zoom in quickly to areas of high energy and code them first and therefore can
exploit the presence of significant high frequency intra-band components. 3DSPECK exploits detailed underlying
physical modeling properties of hyperspectral images.
This paper is organized as following: We will first briefly review the discrete wavelet transform, and the SPIHT
and SPECK algorithms in Section II. The proposed technique is described in detail in Section III. Section IV
presents experimental results, and Section V concludes the paper.
II. MATERIAL AND METHODS
A. Discrete Wavelet Analysis
Wavelet transform coding provides many attractive advantages over other transform methods. This motivates intense
research on this area. Some widely known coding schemes such as EZW, SPIHT, SPECK and EBCOT are all
1 It
is customary to call the unit to be coded a block. This terminology is adopted here for the three-dimensional blocks in the shape of
rectangular prisms.
November 7, 2002
DRAFT
5
wavelet based schemes. One major advantage of wavelet analysis is its ability to perform local analysis – to
analyze a localized area of a larger signal. This implies that we can analyze signals in both time and frequency
domains at different resolutions. Moreover, wavelets tend to be irregular and asymmetric, and they are defined in
finite duration. Therefore, signals which are not periodic or have sharp local changes might be better analyzed with
an irregular wavelet than with a smooth sinusoid (like DCT). For natural non-stationary sources like hyperspectral
images, wavelet transform provides an elegant tool with which to analyze them.
Different kinds of wavelets have different trade-offs between the smoothness of the basis wavelet and the
localization in space. The 9/7 tap biorthogonal filters [1] are widely used in image compression techniques [14][5][6]
to generate a wavelet transform. This transform has proved to provide excellent performance for image compression.
Here in this paper, 9/7 tap biorthogonal filters will provide the transform for our implementation of the new algorithm.
The most frequently used wavelet decomposition structure is dyadic decomposition. All coding algorithms
mentioned above support dyadic decomposition. Occasionally, we need the wavelet packet decomposition which
provides a richer range of possibilities for signal analysis. As stated above, hyperspectral images have high interband dependence. Therefore, one may want to apply wavelet packet instead of dyadic wavelet transform to the
images. Xiong et al. [20] modified 3DSPIHT on lossless medical image compression in the sense that they used
DWT on the spatial domain and wavelet packet on the third dimension. Their results show a small coding gain upon
the original 3DSPIHT. Our experiments show that applying wavelet packet either on all three dimensions or only
on the spectral dimension of the hyperspectral images will decrease the performance of our algorithm. Therefore,
this paper will present only the DWT based (dyadic decomposition) coding technique for hyperspectral images.
B. The SPIHT and SPECK Algorithm
Both SPIHT and SPECK algorithms are highly efficient encoders. They are low in complexity and fast in encoding
and decoding. They are fully embedded; the earlier bits convey higher magnitudes than later ones. We can truncate
the embedded bit stream at any point, and use the available information to reconstruct the image to the allowable
resolution. This makes progressive transmission available.
Set partitioning in hierarchical trees (SPIHT) is the modern-day benchmark for an efficient, low complexity
image encoder. It is essentially a wavelet transform-based embedded bit-plane encoding technique. SPIHT partitions
transformed coefficients into spatial orientation tree sets (Fig. 1) based on the structure of the multi-resolution wavelet
decomposition [14].
A transformed coefficient with larger magnitude has larger information content and therefore should be transmitted
first [14][16]. SPIHT sorts coefficients and sends them in order of decreasing magnitude. As an example, if the largest
coefficient magnitude in the source is 100, SPIHT begins sorting from the root coefficients to their descendants (as
shown in Fig. 1, pixels in the root can either have no or four direct descendants) to find out significant coefficients
against the highest bit plane with
November 7, 2002
"!#$ "& ' )
%&(' )+*-, . , /10
DRAFT
6
Fig. 1.
2-D Spatial orientation tree
2
3456798;:=<<>8 ?
2
@
(1)
where ACB"D E stands for the coefficient located on the coordinate F"GIHKJML . For example, those coefficients with magnitudes
greater than or equal to NO are said to be significant with respect to P 2Q@ and will be transmitted. After finishing
the transmission of all significant pixels with respect to NO , SPIHT decreases P by one (P 2 S
P R
8 Y
the significance of the next lower bit plane. It finds and transmits coefficients with NMVXW 8 A B"D E Z
:2UT ) to test
NO . Each time
(except the first time) SPIHT finishes sorting one bit plane, it checks whether it found any significant coefficients
on the higher planes. If so, SPIHT outputs the P\["] most significant bits of those coefficients. This process will be
repeated recursively until either the desired rate is reached or all coefficients have been transmitted. As a result,
SPIHT generates bit streams that are progressively refined.
One of the main features of SPIHT is that the encoder and decoder have the same sorting algorithm. Therefore,
the encoder does not need to transmit the way it sorts coefficients. The decoder can recover the ordering information
from the execution path, which is put into the bit-stream through the binary outputs of the significance tests. The
details about the algorithm can be found in [14].
The SPECK algorithm has its roots primarily in the ideas developed in SPIHT. Therefore, SPECK has many
features and properties similar to or the same as SPIHT. Both algorithms apply wavelet transform on images. The
transformed images all have a hierarchical pyramidal structure. The difference in methods is in the partitioning of
wavelet transform domain. We learned that SPIHT uses spatial orientation tree sets and partitions into descendant
sets according to threshold tests on magnitude. SPECK uses intra-subband sets and splits into ^ quadrants through
the threshold test. As shown in Fig. 2, the pyramidal structure in SPECK is defined by the levels of decomposition.
SPECK starts by partitioning the image of transformed coefficients into two sets – the rectangular type _ set and
the rest part – type ` set (defined by the solid lines in the figure). The size of set _ depends on the original image
and _
is the root (topmost level of the pyramidal structure) of the pyramid. The purpose of the sorting algorithm
November 7, 2002
DRAFT
7
O(S)
S
S
S
S
I
Fig. 2.
Set Partitioning rules for SPECK algorithm
of SPECK is the same as SPIHT, that is to find significant coefficients with respect to the current a . However, the
path that SPECK tests coefficients is different.
First, SPECK applies quadtree partitioning algorithm to test type b
set. If this set is found to be significant
compare to the current bit plane a (there is at least one significant coefficient in this set), it will be partitioned into
four subsets cd(bfe (dotted lines in Fig. 2), with each subset having approximately one-fourth the size of the parent
set b . Then, SPECK treats each of these four subsets as type b
set and applies the quadtree partitioning algorithm
recursively to each of them to find the significant pixels. After finishing testing the type b
partitioning algorithm is applied to test type g
three type b
sets and one type g
sets. If g
sets, the octave band
set is found to be significant, it will be partitioned into
set as shown in Fig. 2 with dashed lines. The quadtree partitioning and octave
band partitioning will be applied to these new sets respectively to find significant pixels. The motivation for both
quadtree and octave band partitioning of different kind of sets is to zoom in quickly to areas of high energy and
code them first.
The previous paragraph described how SPECK works for the initial bit plane a . At the end of the first pass,
many type b
sets of varying sizes were generated. In the next pass SPECK checks their significance against the
next bit plane (aXhji ), SPECK will test the sets according to their sizes. Smaller size sets are more likely to have
significant pixels than larger size sets [5]. Therefore, these type b
sets will be tested in increasing order of size.
III. T HREE -D IMENSIONAL S ET PARTITIONED E MBEDDED
BLO CK
(3DSPECK)
Many researchers already extended 2-D SPIHT or SPIHT-like techniques to three-dimensional sources [6][2][20][15][7].
To do this, people usually take a 3-D wavelet transform on the image sequence, and apply the extended sorting
and partitioning algorithm to encode the source. As an example, 3-D SPIHT sorts coefficients along the paths of
3-D trees (one pixel corresponds to eight direct descendant pixels) instead of 2-D trees (one pixel to four direct
November 7, 2002
DRAFT
8
descendants). The inter-band dependence or correlation can be exploited automatically, as was done in the spatial
domain.
Following the similar idea, we extend and modify 2-D SPECK for three dimensional hyperspectral image
applications. The first step is to decompose images into wavelet coefficients. Discrete wavelet transform is first
applied on the spectral dimension, followed by on the horizontal (rows) and vertical (columns) axis. The resulting
coefficients have a pyramid structure, and the structure has three levels of decomposition. Figure 3 illustrates this
structure.
Spectrum
High
Low
Low
Low 1 5 2 6
12
7
8
9
4
3
High
13
20
19
10
14
11
15
16
21
High 17
Horizontal
22
18
Vertical
Fig. 3.
Structure for 3DSPECK. The numbers on the front lower left corners for each subband are marked to indicate the sorting order.
The next step is to design a sorting algorithm to encode wavelet coefficients based on their magnitudes. The
motivation comes from the underlying physics of hyperspectral images. Most hyperspectral images have more
high frequency content than other images in spatial domain. We illustrate this by comparing the spatial energy
distribution of a hyperspectral image “coast” and an non-hyperspectral image “chess” (Figure 4) for one level of
wavelet decomposition.
The energy is defined as the sum of squared value of wavelet coefficients. The energy distributions in spatial
frequency of these two images are depicted on Figure 5. The proportion is calculated by dividing the energy in
one sub-band over the total energy. We can see that the “coast” image has more energy distributed over the HH,
HL and LH sub-bands than that of “chess” image. This means that hyperspectral images tend to have more large
(significant) coefficients on finer spatial sub-bands by comparing to non-hyperspectral images.
On the temporal axis, it is also true that less energy tends to locate on the lowest subband of the hyperspectral
image sequences than that of other image sequences, such as medical image sequences. This is demonstrated by
a picture of the energy distribution on the temporal subband axis of “coast” sequence and a CT image sequence
for three levels of wavelet decomposition. Size of 16 slices per segment is used in our example, and the result is
obtained by averaging over several 16-slice segments. Refer to Figure 6, sequences are decomposed to three levels.
November 7, 2002
DRAFT
9
(a)
(b)
Fig. 4.
(a) hyperspectral image “coast”; (b) non-hyperspectral image “chess”
Subband LLL belongs to the first two slices, LLH to the third and the fourth slices, LH to slices 5 to 8, and the
remaining slices are for subband H. Over klm energy is located in the LLL subband for the CT sequence and only
nnpo qq
m for our hyperspectral sequence. Hyperspectral image sequences tend to distribute more energy over finer
subbands than other image sequences.
We want our sorting algorithm to be designed according to these observations. 3DSPIHT searches significant
coefficients from the root of the tree structure down to its leaves. One 3D orientation tree includes a pixel on
the root and its descendants. If all descendants are found to be insignificant against a certain threshold, 3DSPIHT
needs to send only one bit to indicate this information. This kind of tree structure performs excellently if energy
concentrates near the root of the tree. Like EZW and SPIHT, they all take advantage of the tree structure to design
their algorithms, and they are all proved to be very efficient in general cases. However, in our case of hyperspectral
images, energy is not so concentrated near the root of the tree. With higher probabilities than the case of regular
image sequences, the descendants of root pixels may be found to be significant during the first several passes.
3DSPIHT will send extra bits to represent the sorting path to locate the significant pixel. In order to locate the large
coefficients in the high frequency bands quickly, we extend the quadri-section partitioning algorithm of 2DSPECK
November 7, 2002
DRAFT
10
Fig. 5.
76.47%
9.82%
84.73%
3.45%
8.21%
5.51%
7.20%
4.61%
Energy distribution of “coast” (left) and “chess” (right)
LLL LLH
LH
H
77.55% 12.48% 4.93%
LLL LLH
5.04%
LH
H
90.22% 5.87% 2.81%
Fig. 6.
1.1%
Energy distribution of “coast” sequence (upper one) and a CT medical image sequence (lower one)
to octo-section partitioning in our 3D application. The wavelet coefficient prism is partitioned into code blocks of
different sizes. Each subband is treated as a code block. Then, the block splitting into eight is applied on each
code block following a certain order to sort significant pixels. This splitting of code blocks allows zooming in
quickly to areas of high energy and coding them first. This can be proved by comparing the rate distortion curves
of 2DSPECK and 2DSPIHT. The results in [5] show that 2DSPECK out-performs 2DSPIHT for images with much
high frequency content. It is expected that the 3DSPECK would have the same property. 2DSPECK maintains an
array of lists to process code units of varying sizes. This is simplified by maintaining only one list in 3DSPECK.
3DSPECK does not utilize the octave band partitioning in 2DSPECK. Details of the implementation are presented
in the following. Later, the experimental results will show the improvements of our algorithm.
We follow all notations developed in [5] when we use the structure or terminology of SPECK.
For each small (3D) block in Figure 3, we call it type r
set. These include 8 sub-bands under the coarsest level
low-low-low (LLL) sub-band and the 7 sub-bands at the next finer level and the remaining 7 sub-bands at the finest
level (e.g. HLL, LHH). All together, there are ss type r
sets. We are going to use these sets when we initialize
the algorithm.
3DSPECK maintains two linked lists:
t LIS – List of Insignificant Sets. This list contains sets of type r
of varying sizes.
t LSP – List of Significant Pixels. This list contains pixels that have been found significant against threshold u .
November 7, 2002
DRAFT
11
We say that the set v
is significant with respect to w , if
xy{z
}( €I 
|~}( €I ƒ‚K„…‡† ˆ
†f‰‹ŠŒ
(2)
Where ˆ }" €+  denotes the transformed coefficients at coordinate "ސ’‘I“•” . Otherwise it is insignificant. For convenience,
we can write the significance of a set – as a function of w and the set – :
š
xy{z |~}" €+ ƒ‚K
› œž Ÿ¢¡;£
„ ¥ † ˆ }" €+  †p§
Š Œ¤
¦ Š Œ¨ª©
v —–˜”9™
«
Ÿ¢¬®­°¯±¬
Œ
(3)
The size of a set is the number of elements in the set.
Below we present the new encoding algorithm:
²
1) Initialization
² Output w³™µ´ ­~¶·¸  xyz †{ˆ }" €+  † ”K¹
² Set LSP = º
Set LIS = » all small blocks in Figure 3 (total there are 22 sets) ¼
2) Sorting Pass
In increasing order of size of sets, for each set v¾½ LIS, ProcessS(v )
ProcessS(v )
»¿²
² Output À
if À
Œ
(v ) (Whether the set is significant respect to current w or not)
(v ) = 1
Œ
– if v is a pixel, output sign of v and add v
to LSP
– else CodeS(v )
– if v¾½ LIS, remove v
from LIS
¼
CodeS(v )
»¿²
Partition v into eight equal subsets Á (v ). For the situation that the original set size is odd  odd  odd,
we can partition this kind of sets into different but approximate equal sizes of subsets (see Fig 7). For the
situation that the size of the third dimension of the set is 1, we can partition the set into 4 approximately
² equal sizes of subsets.
For each Á (v )
– Output À
– if À
Ã
Ã
November 7, 2002
Œ
( Á (v ))
( Á (v )) = 1
Œ
if Á (v ) is a pixel, output sign of Á (v ) and add Á (v ) to LSP
else CodeS( Á (v ))
DRAFT
12
– else
Ä
add Å (Æ ) to LIS
Ç
3) Refinement Pass
For each entry È(ɐʒËÊIÌÎÍfÏ LSP, except those included in the last sorting pass output the ЪÑ"Ò MSB of Ó{ÔCÕ"Ö ×+Ö ØÓ .
4) Quantization Step
Decrement Ð by 1 and go to step 2.
Fig. 7.
Partitioning of set
Ù
After initialization, there are ÚÚ sets in the LIS. We treat each sub-band on the pyramid as an initial set. All
sets are put on the list in the order of extended “Z” scan. An example of the extended “Z” scan is illustrated in
Figure 3, and the order of the sets is marked by numbers. We can see from Figure 3 that the list is a tree-like list.
The underlying idea is based on the fact that though energy of hyperspectral images does not tend to concentrate
in the LLL sub-band as much as in the case of non-hyperspectral images, a large portion of the energy still resides
in the coarsest level sub-bands. In other words, the coarsest level sub-bands always convey the most important
information of the original image sequence. If these bands are treated as individual sets, we know with extremely
high probability that the algorithm will output 1 when testing the significance of these sets at the first iteration. Since
the sizes of the coarsest level sub-bands are small, the 3DSPECK algorithm zooms into the significant coefficients
quickly. On the other hand, the probability of finding significant coefficients in the sets during the early iterations
decreases when going down the subbands of the pyramid. Our experiments show that sometimes no significant
coefficient can be found against the first one or two bit planes on the finest sub-bands. Therefore, leaving those
larger size sub-bands as individual sets can avoid using extra bit budget to present non-significance. Hence, when
we test the significance of sets, we scan sets by following the path from the top of the pyramid down to its bottom.
In this way, we can find significant coefficients quickly in the area of high energy and send the most important
November 7, 2002
DRAFT
13
information first. This satisfies the requirement of embedded coding and progressive transmission.
During the sorting pass, we claim that sets should be tested in increasing order of size. This follows the argument
in [5]. After the first pass, many sets of type Û of varying sizes are generated and added to the LIS. The algorithm
sends those sets to the LIS because it found some immediate neighbors of those sets are significant against some
threshold Ü . Since coefficients in the same energy cluster have close magnitudes, those not found to be significant
in the current pass are very likely to be found as significant against the next lower threshold. Furthermore, our
experiments show that a large number of sets with size of 1 will be generated after the first iteration. Therefore,
testing sets in increasing order of size can test pixel level first and locate new significant pixels immediately.
We do not use any sorting mechanism to process sets of type Û
in increasing order of their size. Even the
fastest sorting algorithm will slow the coding procedure significantly. This is not desirable in fast implementation
of coders. However, there are simple ways of completely avoiding this sorting procedure. 2DSPECK uses an array
of lists [5] to avoid sorting, whereas we use a different and simpler method achieve this goal.
Note that the way sets Û
are constructed, they lie completely within a sub-band. Thus, every set Û
is located at
a particular level of the pyramidal structure. Partitioning a set Û into eight/four subsets is equivalent to going down
the pyramid one level at the corresponding finer resolution. Hence, the size of a set Û
corresponds to a particular
level of the pyramid. Therefore, we only need to search the same LIS several times at each iteration. Each time
we only test sets with sizes corresponding to a particular level of the pyramid. This implementation completely
eliminates the need for any sorting mechanism for processing the sets Û . Thus, we can test sets in increasing order
of size while keep our algorithm running fast.
The idea of ProcessS and CodeS is an extension of 2-D ProcessS and CodeS in SPECK. We extend from a 2-D
to a 3-D structure. 3DSPECK processes a type Û
set by calling ProcessS to test its significance with respect to
a threshold Ü . If this set is significant, ProcessS will call CodeS to partition set Û
equal subsets Ý (Û ) (Fig 7). We treat each of these subsets as new type Û
into eight/four approximately
set, and in turn, test their significance.
This process will be executed recursively until reaching pixel level where the significant pixel in the original set Û
is located. The algorithm then sends the significant pixel to the LSP and codes the significance.
After finishing the processing of all sets in the LIS, we execute the refinement pass. During the refinement pass,
the algorithm outputs the ÜßÞ"à most significant bit of the absolute value of each entry á(âã’äãå•æ in the LSP, except those
included in the just-completed sorting pass. The purpose of the refinement pass of 3DSPECK is the same as that
of SPIHT and SPECK. The refinement pass refines significant pixels found during previous passes progressively.
The last step of this algorithm is decreasing Ü by ç and returning to the sorting pass. This step makes the whole
process run recursively.
The decoder is designed to have the same mechanism as the encoder. Based on the outcome of the significance
tests in the received bit stream, the decoder can follow exactly the same execution path as the encoder and therefore
reconstruct the image sequence progressively.
As with other coding methods such as SPIHT and SPECK, the efficiency of our algorithm can be improved by
entropy-coding its output [8][11][13][17]. We use the adaptive arithmetic coding algorithm of Witten et al. [18] to
November 7, 2002
DRAFT
14
code the significance map. Referring to the function CodeS in our algorithm, instead of coding the significance test
results of the eight/four subsets separately, we code them together first before further processing the subsets. Simple
context is applied for conditional coding of the significance test result of this subset group. More specifically, we
encode the significance test result of the first subset without any context, but encode the significance test result of
the second subset by using the context (whether the first subset is significant or not) of the first coded subset and so
on. Other outputs from the encoder are entropy coded by applying simple arithmetic coding without any context.
Also, we make the same argument as 2DSPECK that if a set è
is significant and its first seven/three subsets
are insignificant, then this means that the last subset must be significant. Therefore, we do not need to test the
significance of the last subset. This reduces the bit rate somewhat and provides corresponding gains.
We have chosen here not to entropy-code the sign and magnitude refinement bits, as small coding gains are
achieved only with substantial increase in complexity. The SPECK variant, EZBC [4], has chosen this route, along
with a more complicated context for the significance map quadtree coding. The application will dictate whether the
increase in coding performance is worth the added complexity.
IV. N UMERICAL R ESULTS
AVIRIS image sequences “coast”, “moffett” and “jasper” will be tested for 3DSPECK. Among these three image
sequences, “coast” and “moffett” are typical hyperspectral images with higher spatial frequency content, while the
“jasper” sequence is selected to be very smooth in the spatial domain. Later experimental results will show the
advantages of our algorithm when applying it on image sequences with much high frequency content.
The description of AVIRIS can be found in the introduction. Image “coast” is taken at a coastal region in Puerto
Rico at Mayaguez. There are éêë channels (bands) in the range 400 to 2500 nm, each consisting of 397 lines of
268 pixels quantized to 8 bits per pixel (bpp). It is saved as BIS (band interleaved by sample) format. “moffett” is
pictured above the Moffett Field area in CA. Its format is band interleaved by sample (channel, sample, line) with
dimensions (224, 614, 512) and 8 bits per pixel. “jasper” portrays Jasper Ridge in CA. It has the same format and
dimensions as “moffett”. In all cases, a square region of 256 ì 256 pixels which exhibits all key features present
in the images is selected and used as a test set. Band 101 of “coast” image is shown in Figure 4 (a), band 101 of
“moffett” image is shown in Figure 8, and band 116 of “jasper” image is shown in Figure 9.
The coding unit size of 16 slices is used in our implementation. Coding 16 slices per time instead of coding
the whole set of image sequence (i.e. all 204 channels for whole coast sequence) in one time saves considerable
memory and coding delay. The final result is obtained by averaging over several 16-slice segments.
However, even if we compromise the coding delay and try to process as many bands as possible per time, the
performance cannot be significantly improved. Xiong et al. [20] proved that we cannot gain much by increasing
the coding unit size. If we did that, we would pay much more in nearly intolerable memory requirement and
computation delay.
The coding performance of 3DSPECK is compared with that of 3D-SPIHT and 2D-SPIHT algorithms on the
November 7, 2002
DRAFT
15
Fig. 8.
Band 101 of “moffett” test image
Fig. 9.
Band 116 of “jasper” test image
same sets of data, by means of the Peak Signal-to-Noise Ratio (PSNR):
ò
ífîpï-ðQñóò=ôöõ÷øMùûú˜ü’ý{þ®ÿ î where
(4)
denotes the maximum bit depth, and MSE is the mean squared-error between the original and recon-
structed images.
A. Comparison with Conventional Methods
Rate distortion curves are plotted in Figure 10, 11, and 12 for “coast”, “moffett”, and “jasper” sequences respectively.
The PSNR results for 2DSPIHT are obtained by first processing each band separately, then averaging all the results.
By comparing the rate distortion curves of 2DSPIHT to 3DSPIHT and 3DSPECK, we can see that 3D versions
guarantee over 3 dB improvement at all rates. This implies that the hyperspectral image sequences are highly
correlated, and taking 3D DWT can fully exploit these inter band correlations. Thus, 3D compression techniques
achieve much better performance than the 2D version which encodes images separately.
November 7, 2002
DRAFT
16
Fig. 10.
Comparative evaluation the rate distortions of 3DSPECK, 3DSPIHT and 2DSPIHT for the “coast” image
Fig. 11.
Comparative evaluation the rate distortions of 3DSPECK, 3DSPIHT and 2DSPIHT for the “moffett” image
Figure 10 and 11 show that 3DSPECK performs excellently in the case when high spatial frequency content is
present. If the bit rate is very low (i.e. less than 0.4 bpp), 3DSPECK out-performs the well known 3DSPIHT. As
the bit rate increases, 3DSPECK achieves higher PSNR than 3DSPIHT for the “coast” image, but lower PSNR for
“moffett”.
When the image sequence is smooth and does not have much high frequency content, the traditional 3DSPIHT
performs better than 3DSPECK. The reason is that if energy is highly concentrated on the highest pyramid level of
wavelet coefficients, the sorting algorithm of 3DSPIHT will locate the important information very quickly and use
the minimum number of bits to present the unimportance of the lower levels of the pyramid. However, 3DSPECK
has to use extra bit budget to describe whether each set on the list is important. Hence, for this situation, 3DSPIHT
November 7, 2002
DRAFT
17
Fig. 12.
Comparative evaluation the rate distortions of 3DSPECK, 3DSPIHT and 2DSPIHT for the “jasper” image
will achieve better result. Again, 3DSPECK performs better compared to 3DSPIHT at low bit rate than at higher
bit rate.
The rate distortion results suggest that if the sources have much high frequency content, and the compression
rate is required to be very high (bpp very low), 3DSPECK is an excellent candidate. Overall, it is comparable to
3DSPIHT.
B. Assessing the Image Quality
Since our application is hyperspectral imagery, the visual effect is very important. Therefore, it is necessary to
further assess the quality of the compressed images.
To show the visual effect, a single band of each of the test images is shown at different bit rates in Figures 13,
14 and 15.
Images (b) in Figures 13 and 14 show “coast” and “moffett” images decoded at 0.1 bpp. Many fine details are
lost and the borders between different regions are blurred. Such image quality can be used for quickly browsing
image to check whether it is the sequence of interest. Since our coded sequences are embedded, higher quality
images can be obtained by gradually decoding more of the available bits.
If the original image is smooth, one can obtain good compressed image quality even at 0.1 bpp. Figure 15 (b) is
such an example. Only strong edges are blurred. However, it is easy for people to tell many details on the picture.
All the structural information has been correctly reproduced.
As the bit rate is increased, we can get finer and finer images. The noise presented in the original image can
also be refined. In other words, air dust and other sources rendered noise during capturing images. However, lossy
compression has the effect of removing such noise. This makes the compressed images “look” better than the
original ones. Images (c) and (d) in Figure 13, Figure 14 and Figure 15 illustrate high quality of compressed
images at a bit rate as low as 0.3 bpp.
November 7, 2002
DRAFT
18
C. Transform and Coding Gain
We wish to establish the effectiveness of extension of SPECK from 2D to 3D separate from the 3D transform. An
experiment called “3D-DWT + 2DSPECK” is conducted to accomplish this task.
For each of a group of 16 images, 3D-DWT is applied to decompose them into a wavelet coefficient prism. The
prism is saved into 16 frame segments. For given overall rate
temporal subband
to minimize the overall rate distortion
, we assign different rate
to different frames in
. This rate allocation procedure is not novel, and the
details can be found in [12][19]. Then, 2DSPECK is applied on these frames separately according to the assigned
bit rates. The decoded coefficient frames will be concatenated into a coefficient prism. Inverse 3D-DWT is applied
on this prism to reconstruct images and compute the rate distortion performance.
The purpose of doing this is to test whether 3D-DWT provides the whole coding gain. “3D-DWT + 2DSPECK”
does 3D transform and 2D coding, therefore can assess separately gain from 3D transform and 3D coding.
Figures 16, 17, and 18 show the rate distortions curves for our test image sequences by comparing the results
of 3DSPECK, 2DSPECK, and “3D-DWT + 2DSPECK”. For all three test sequences, the rate distortion curves
of “3D-DWT + 2DSPECK” fall between the curves of 3DSPECK and 2DSPECK. This suggests that 3D-DWT
contributes only part of the coding gain, while our 3D coding accounts for the rest. For the “coast” and “moffett”
sequences, which contain more high frequency content, the “3D-DWT + 2DSPECK” curves are closer to 2DSPECK
curves, and the gap between “3D-DWT + 2DSPECK” and 2DSPECK is about 1
2 dB, while the “jasper” has
a “3D-DWT + 2DSPECK” curve that is closer to 3DSPECK, and the gap between “3D-DWT + 2DSPECK” and
2DSPECK is about 2.5 dB. The reason for this can be illustrated by using the same example of energy distributions
as shown in section III. As demonstrated in Figure 5 and 6, the 3D-DWT does a better job in localizing energy
to the low frequency subbands when the image sequence has less high frequency content, so that the 3D-DWT
provides a larger gap between 2DSPECK and “3D-DWT + 2DSPECK” for the “jasper” sequence.
If we go back to compare the PSNR results of 2DSPECK and 2DSPIHT, we can see that 2DSPECK performs
slightly better than 2DSPIHT for the sequences of “coast” and “moffett” but slightly worse than 2DSPIHT for
“jasper” sequence. This matches the results in the 2DSPECK paper [5]. In [5], the rate distortion curves of 2DSPECK
and 2DSPIHT are compared for three images “lena”, “goldhill” and “barbara”. 2DSPECK gives slightly higher PSNR
curve than that of 2DSPIHT for “barbara” and gives slightly lower PSNR curve than that of 2DSPIHT for the other
two images. The “barbara” image has much high frequency content, while the other two are just regular images.
These results suggest that 2DSPECK performs slightly better than 2DSPIHT in the case that much high spatial
frequency content is present. This is proved again in our results of 3D sources.
V. C ONCLUSION
AND
F UTURE W ORK
This paper proposed a three dimensional set partitioned embedded block coder for hyperspectral image compression.
The three dimensional wavelet transform automatically exploits inter-band dependence. The modified SPECK sorting
algorithm can find the most important information efficiently and therefore provides a corresponding coding gain
against two dimensional compression algorithm.
November 7, 2002
DRAFT
19
The proposed 3DSPECK is completely embedded and can be used for progressive transmission. These features
make the proposed coder a good candidate to compress (encode) hyperspectral images before transmission and to
decompress (decode) them at another end for image storage.
Compressed hyperspectral bit streams require protection from channel errors during transmission. For efficient
end-to-end transport, a suitable forward-error-correcting (FEC) algorithm should be applied on the embedded bit
streams before transmission. In the future, we will investigate this topic and implement appropriate channel coding
algorithms.
R EFERENCES
[1] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, Image coding using wavelet transform, IEEE Trans. Image Processing, vol. 1,
pp.205-220, 1992.
[2] P.L. Dragotti, G. Poggi, and A.R.P. Ragozini, Compression of multispectral images by three-dimensional SPIHT algorithm, IEEE Trans.
on Geoscience and remote sensing, vol. 38, No. 1, Jan 2000.
[3] Thomas W. Fry, Hyperspectral image compression on reconfigurable platforms, Master Thesis, Electrical Engineering, University of
Washington, 2001.
[4] S-T. Hsiang and J.W. Woods, Embedded image coding using zeroblocks of subband/wavelet coefficients and context modeling, IEEE Int.
Conf. on Circuits and Systems (ISCAS2000), vol. 3, pp.662-665, May 2000.
[5] A. Islam and W.A. Pearlman, An embedded and efficient low-complexity hierarchical image coder, in Proc. SPIE Visual Comm. and Image
Processing, vol. 3653, pp. 294-305, 1999.
[6] B. Kim and W.A. Pearlman, An embedded wavelet video coder using three-dimensional set partitioning in hierarchical tree, IEEE Data
Compression Conference, pp.251-260, March 1997.
[7] Y. Kim and W.A. Pearlman, Lossless volumetric medical image compression, Ph.D Dissertation, Department of Electrical, Computer, and
Systems Engineering, Rensselaer Polytechnic Institute, Troy, 2001.
[8] J. Li and S. Lei, Rate-distortion optimized embedding, in Proc. Picture Coding Symp., Berlin, Germany, pp. 201-206, Sept. 10-12, 1997.
[9] S. Mallat, Multifrequency channel decompositions of images and wavelet models, IEEE Trans. Acoust., Speech, Signal Processing, vol. 37,
pp.2091-2110, Dec. 1989.
[10] A.N. Netravali and B.G. Haskell, Digital pictures, representation and compression, in Image Processing, Proc. of Data Compression
Conference, pp.252-260, 1997.
[11] E. Ordentlich, M. Weinberger, and G. Seroussi, A low-complexity modeling approach for embedded coding of wavelet coefficients, in
Proc. IEEE Data Compression Conf., Snowbird, UT, pp. 408-417, Mar. 1998.
[12] W.A. Pearlman, Performance bounds for subband codes, Chapter 1 in Subband Image Coding, J. W. Woods and Ed. Klvwer. Academic
Publishers, 1991.
[13] Proposal of the arithmetic coder for JPEG2000, ISO/IEC/JTC1/SC29/WG1 N762, Mar. 1998.
[14] A. Said and W.A. Pearlman, A new, fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. on Circuits
and Systems for Video Technology 6, pp. 243-250, June 1996.
[15] P. Schelkens, Multi-dimensional wavelet coding algorithms and implementations, Ph.D dissertation, Department of Electronics and
Information Processing, Vrije Universiteit Brussel, Brussels, 2001.
[16] J.M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Processing, vol. 41, pp.3445-3462, Dec.
1993.
[17] D. Taubman, High performance scalable image compression with EBCOT, IEEE Trans. on Image Processing, vol. 9, pp.1158-1170, July,
2000.
[18] I.H. Witten, R.M. Neal, and J.G. Cleary, Arithmetic coding for data compression, Commun. ACM, vol. 30, pp. 520-540, June 1987.
[19] J.W. Woods and T. Naveen, A filter based bit allocation scheme for subband compression of HDTV, IEEE Transactions on Image
Processing, IP-1:436-440, July 1992.
November 7, 2002
DRAFT
20
[20] Z. Xiong, X. Wu, D.Y. Tun, and W.A. Pearlman, Progressive coding of medical volumetric data using three-dimensional integer wavelet
packet transform, Medical Technology Symposium, 1998. Proceedings. Pacific, PP. 384 -387, 1998.
[21] J. Xu, Z. Xiong, S. Li, and Y. Zhang, Three-dimensional embedded subband coding with optimized truncation (3-D ESCOT), J. Applied
and Computational Harmonic Analysis: Special Issue on Wavelet Applications in Engineering. vol. 10, pp. 290-315, May 2001.
November 7, 2002
DRAFT
21
(a)
(b)
(c)
(d)
Fig. 13.
Results for “coast” hyperspectral image at channel 101: (a) original image (b) 0.1 bit/pixel (c) 0.4 bit/pixel (d) 0.8 bit/pixel
November 7, 2002
DRAFT
22
(a)
(b)
(c)
(d)
Fig. 14.
Results for “moffett” hyperspectral image at channel 116: (a) original image (b) 0.1 bit/pixel (c) 0.3 bit/pixel (d) 0.5 bit/pixel
November 7, 2002
DRAFT
23
(a)
(b)
(c)
(d)
Fig. 15.
Results for “jasper” hyperspectral image at channel 116: (a) original image (b) 0.1 bit/pixel (c) 0.3 bit/pixel (d) 0.5 bit/pixel
November 7, 2002
DRAFT
24
34
33
32
31
PSNR (dB)
30
29
28
27
3DSPECK
2DSPECK
3D−DWT + 2DSPECK
26
25
24
0.1
Fig. 16.
0.2
0.3
0.4
Rate (bpp)
0.5
0.6
0.7
0.8
Comparative evaluation the rate distortions of 3DSPECK, 2DSPECK and “3D-DWT + 2DSPECK” for the “coast” image
50
48
PSNR (dB)
46
44
42
3DSPECK
2DSPECK
3D−DWT + 2DSPECK
40
38
0.1
Fig. 17.
0.15
0.2
0.25
0.3
Rate (bpp)
0.35
0.4
0.45
0.5
Comparative evaluation the rate distortions of 3DSPECK, 2DSPECK and “3D-DWT + 2DSPECK” for the “moffett” image
November 7, 2002
DRAFT
25
42
40
PSNR (dB)
38
36
34
32
3DSPECK
2DSPECK
3D−DWT + 2DSPECK
30
28
0.1
Fig. 18.
0.15
0.2
0.25
Rate (bpp)
0.3
0.35
0.4
Comparative evaluation the rate distortions of 3DSPECK, 2DSPECK and “3D-DWT + 2DSPECK” for the “jasper” image
November 7, 2002
DRAFT