1 Hyperspectral Image Compression Using Three-Dimensional Wavelet Coding Xaoli Tang, William A. Pearlman and James W. Modestino Center for Image Processing Research Rensselaer Polytechnic Institute Troy, NY 12180-3590 Corresponding author. Voice: (518) 276-6082; fax: (518) 276-8715; e-mail: [email protected] This work was performed at Rensselaer Polytechnic Institute and was supported in part by National Science Foundation Grant No. EEC- 981276. The government has certain rights in this material. November 7, 2002 DRAFT 2 Abstract Hyperspectral image is a sequence of images generated by hundreds of detectors. Each detector is sensitive only to a narrow range of wavelengths. One can view such an image sequence as a three-dimensional array of intensity values (pixels) within a rectangular prism. This image prism reveals contiguous spectrum information about the composition of the area being viewed by the instrument. The price paid for these high resolution, contiguous spectrum images is an extremely large set of data. These data cause processing, storage and transmission problems. Therefore, some application-specific data compression techniques should be applied before we process, store or transmit hyperspectral images. To solve this problem, we present a Three-Dimensional Set Partitioned Embedded bloCK (3DSPECK) algorithm based on the observation that hyperspectral images are contiguous in the spectrum axis (this implies large inter-band correlations) and there is no motion between bands. Therefore, three-dimensional discrete wavelet transform can fully exploit the inter-band correlations. A modified SPECK [5] partitioning algorithm is used to sort important information (significant pixels). Rate distortion (Peak Signal-to-Noise Ratio vs. bit rate) performances were plotted by comparing 3DSPECK against 3DSPIHT on several sets of hyperspectral images. Results show that 3DSPECK is comparable to 3DSPIHT in hyperspectral image compression. 3DSPECK can achieve compression ratios in the approximate range of 16 to 27 while providing very high quality reconstructed images. It guarantees over 3 dB PSNR improvement at all rates or rate savings at least a factor of 2 over 2D coding of separate spectral bands without axial transformation. Index Terms Three Dimensional Image Compression, Discrete Wavelet Transform (DWT), Hyperspectral Imaging, AVIRIS Imaging, SPIHT I. I NTRODUCTION Hyperspectral imaging is a powerful technique and has been widely used in a large number of applications, such as detection and identification of the surface and atmospheric constituents present, analysis of soil type, monitoring agriculture and forest status, environmental studies, and military surveilance. Hyperspectral images are generated by collecting hundreds of narrow and contiguously spaced spectral bands of data such that a complete reflectance spectrum can be obtained for the region being viewed by the instrument. All substances, including living things, have their own spectrum characteristics or diagnostic absorption features. Hyperspectral images have high enough spectrum resolution that by comparing their resulting spectrum features with those of known substances, we can reveal the information about the composition of the area being imaged. However, at the time we gain high resolution spectrum information, we generate massively large image data sets. Access and transport of these data sets will stress existing processing, storage and transmission capabilities. As an example, the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS), a typical hyperspectral imaging system, has 224 sensors. In order to provide sufficient resolution to detect most absorption features, each sensor has a wavelength sensitive range of approximately 10 nanometers. All sensors together yield contiguous spectral bands covering the entire range between 380 nm and 2500 nm. If each band is 614 512 scans (pixels), with one byte per pixel, the whole data set will be over 70 Mbytes. Actually, AVIRIS instrument can yield about 16 Gigabytes of November 7, 2002 DRAFT 3 data per day. Therefore, efficient compression should be applied to these data sets before storage and transmission [10]. The compression schemes used in the data sets can be classified as lossless or lossy. As what the name suggests, lossless compression (lossless coding) reduces the redundancy of data sets without losing any information. We can reconstruct the original images exactly. This is a reversible process. However, this kind of compression can only provide a compression ratio of about 2 3:1. Obviously, the low compression ratio is not enough to cope with serious data management issues such as hyperspectral image transmission. However, if we are willing to lose some information, we can gain significantly higher compression ratio than that of lossless compression. This is so-called lossy compression (lossy coding). We want to achieve as high as possible ratio without losing important information. For applications such as progressive transmission, we also want the compressed or coded data be embedded. We can truncate the embedded information sequence at any point and reconstruct the data to available resolution. To incorporate these requirements, many promising image compression algorithms based on wavelet transform (WT) [9] were proposed recently. They are simple, efficient and have been widely used in many applications. One of the them is Shapiro’s embedded zerotree wavelet (EZW) [16]. Said and Pearlman [14] refined and extended EZW subsequently to SPIHT. Islam and Pearlman [5] proposed another low complexity image encoder – Set Partitioned Embedded bloCK (SPECK). SPECK requires low dynamic memory. Related in various degrees to these earlier works on scalable image compression, the EBCOT [17] algorithm also uses a wavelet transform to generate the subband samples which are to be quantized and coded. EBCOT stands for Embedded Block Coding with Optimized Truncation, which identifies some of the major contributions of the algorithm. It is resolution and SNR scalable and with a random access property. EBCOT partitions each subband into relatively small blocks (typically, or pixels are used), and generates a separate highly scalable (or embedded) bit stream for each block. The bit streams can be independently truncated to a given set of rates, which results in finer embedding. The state-of-the-art encoder, SPIHT, has many attractive properties. It is an efficient embedded technique. The original SPIHT was proposed for 2-dimensional image compression, and it has been extended to 3D applications by Kim and Pearlman [6]. 3D-SPIHT is the modern-day benchmark for three dimensional image compression. It is a powerful tool to compress image sequences. It has been applied on multispectral image compression by Dragotti et al. [2]. They use vector quantization (VQ) and Karhunen-Loeve transform (KLT) on the spectral dimension to explore the correlation between multispectral bands. In the spatial domain, they use discrete wavelet transform, and the 3DSPIHT sorting algorithm is applied on the transformed coefficients. Dragotti et al.’s algorithms are comparable to 3DSPIHT in multispectral image compression in rate distortion performance. This is because VQ is theoretically the optimal block coding strategy and KLT is theoretically the optimal transform to decorrelate the data. However, VQ and KLT are computationally complex. One needs to either design code book or update statistics on time. Applying VQ and KLT on the spectral dimension of multispectral images is acceptable because multispectral images only have a small number of bands. However, on the contiguous spectrum applications such as hyperspectral imagery, it is not feasible for real-time computation. Fry [3] adopts 3DSPIHT directly on hyperspectral image compression. His November 7, 2002 DRAFT 4 work demonstrates the speed and computational complexity advantages of 3DSPIHT. Results show that 3D-DWT is a fast and efficient means to exploit the correlations between hyperspectral bands. The EBCOT algorithm has also been extended to 3D applications. Three Dimensional Cube Splitting EBCOT (3D CS-EBCOT) [15] initially partitions the wavelet coefficient prism into equally sized small code cubes of elements. The cube splitting technique is applied on each code cube to generate separate scalable bit streams. Like EBCOT, the bit streams may be independently truncated to any of a collection of different lengths to optimize the rate distortion criteria. Xu et al. [21] used a different method to extend EBCOT on video coding – Three-Dimensional Embedded Subband Coding with Optimized Truncation (3-D ESCOT). They treat each subband as a code cube and generate embedded bit streams for each code cube independently by using fractional bit-plane coding. Candidate truncation points are formed at the end of each fractional bit-plane. Again, the bit streams can be truncated independently into a layered stream by optimizing a rate distortion function. In this paper, we extend 2D SPECK to 3D sources such as hyperspectral images. We call this new 3D image encoding technique Three-Dimensional Set Partitioned Embedded bloCK (3DSPECK). For an image sequence, three dimensional discrete wavelet transform (3D-DWT) is applied to obtain a wavelet coefficient prism. Since our applications are hyperspectral images, there is not motion, but tight statistical dependency along the wavelength axis of this prism. Therefore, the 3D-DWT can exploit the consequent correlation along the wavelength axis, as well as along the spatial axes. To start the algorithm, the wavelet coefficient prism is partitioned into small (threedimensional) code blocks1 with different sizes, and each subband is treated as a code block. Next, an extended and modified version of the SPECK sorting algorithm is applied to these code blocks to sort the significance of pixels. A block splitting algorithm similar to the cube splitting algorithm of 3D CS-EBCOT is used on the individual code blocks to test their significance. If a code block contains significant coefficients, it is split into several smaller sub-blocks. The descendant “significant” blocks are then further split until the significant coefficients are isolated. This block splitting algorithm can zoom in quickly to areas of high energy and code them first and therefore can exploit the presence of significant high frequency intra-band components. 3DSPECK exploits detailed underlying physical modeling properties of hyperspectral images. This paper is organized as following: We will first briefly review the discrete wavelet transform, and the SPIHT and SPECK algorithms in Section II. The proposed technique is described in detail in Section III. Section IV presents experimental results, and Section V concludes the paper. II. MATERIAL AND METHODS A. Discrete Wavelet Analysis Wavelet transform coding provides many attractive advantages over other transform methods. This motivates intense research on this area. Some widely known coding schemes such as EZW, SPIHT, SPECK and EBCOT are all 1 It is customary to call the unit to be coded a block. This terminology is adopted here for the three-dimensional blocks in the shape of rectangular prisms. November 7, 2002 DRAFT 5 wavelet based schemes. One major advantage of wavelet analysis is its ability to perform local analysis – to analyze a localized area of a larger signal. This implies that we can analyze signals in both time and frequency domains at different resolutions. Moreover, wavelets tend to be irregular and asymmetric, and they are defined in finite duration. Therefore, signals which are not periodic or have sharp local changes might be better analyzed with an irregular wavelet than with a smooth sinusoid (like DCT). For natural non-stationary sources like hyperspectral images, wavelet transform provides an elegant tool with which to analyze them. Different kinds of wavelets have different trade-offs between the smoothness of the basis wavelet and the localization in space. The 9/7 tap biorthogonal filters [1] are widely used in image compression techniques [14][5][6] to generate a wavelet transform. This transform has proved to provide excellent performance for image compression. Here in this paper, 9/7 tap biorthogonal filters will provide the transform for our implementation of the new algorithm. The most frequently used wavelet decomposition structure is dyadic decomposition. All coding algorithms mentioned above support dyadic decomposition. Occasionally, we need the wavelet packet decomposition which provides a richer range of possibilities for signal analysis. As stated above, hyperspectral images have high interband dependence. Therefore, one may want to apply wavelet packet instead of dyadic wavelet transform to the images. Xiong et al. [20] modified 3DSPIHT on lossless medical image compression in the sense that they used DWT on the spatial domain and wavelet packet on the third dimension. Their results show a small coding gain upon the original 3DSPIHT. Our experiments show that applying wavelet packet either on all three dimensions or only on the spectral dimension of the hyperspectral images will decrease the performance of our algorithm. Therefore, this paper will present only the DWT based (dyadic decomposition) coding technique for hyperspectral images. B. The SPIHT and SPECK Algorithm Both SPIHT and SPECK algorithms are highly efficient encoders. They are low in complexity and fast in encoding and decoding. They are fully embedded; the earlier bits convey higher magnitudes than later ones. We can truncate the embedded bit stream at any point, and use the available information to reconstruct the image to the allowable resolution. This makes progressive transmission available. Set partitioning in hierarchical trees (SPIHT) is the modern-day benchmark for an efficient, low complexity image encoder. It is essentially a wavelet transform-based embedded bit-plane encoding technique. SPIHT partitions transformed coefficients into spatial orientation tree sets (Fig. 1) based on the structure of the multi-resolution wavelet decomposition [14]. A transformed coefficient with larger magnitude has larger information content and therefore should be transmitted first [14][16]. SPIHT sorts coefficients and sends them in order of decreasing magnitude. As an example, if the largest coefficient magnitude in the source is 100, SPIHT begins sorting from the root coefficients to their descendants (as shown in Fig. 1, pixels in the root can either have no or four direct descendants) to find out significant coefficients against the highest bit plane with November 7, 2002 "!#$ "& ' ) %&(' )+*-, . , /10 DRAFT 6 Fig. 1. 2-D Spatial orientation tree 2 3456798;:=<<>8 ? 2 @ (1) where ACB"D E stands for the coefficient located on the coordinate F"GIHKJML . For example, those coefficients with magnitudes greater than or equal to NO are said to be significant with respect to P 2Q@ and will be transmitted. After finishing the transmission of all significant pixels with respect to NO , SPIHT decreases P by one (P 2 S P R 8 Y the significance of the next lower bit plane. It finds and transmits coefficients with NMVXW 8 A B"D E Z :2UT ) to test NO . Each time (except the first time) SPIHT finishes sorting one bit plane, it checks whether it found any significant coefficients on the higher planes. If so, SPIHT outputs the P\["] most significant bits of those coefficients. This process will be repeated recursively until either the desired rate is reached or all coefficients have been transmitted. As a result, SPIHT generates bit streams that are progressively refined. One of the main features of SPIHT is that the encoder and decoder have the same sorting algorithm. Therefore, the encoder does not need to transmit the way it sorts coefficients. The decoder can recover the ordering information from the execution path, which is put into the bit-stream through the binary outputs of the significance tests. The details about the algorithm can be found in [14]. The SPECK algorithm has its roots primarily in the ideas developed in SPIHT. Therefore, SPECK has many features and properties similar to or the same as SPIHT. Both algorithms apply wavelet transform on images. The transformed images all have a hierarchical pyramidal structure. The difference in methods is in the partitioning of wavelet transform domain. We learned that SPIHT uses spatial orientation tree sets and partitions into descendant sets according to threshold tests on magnitude. SPECK uses intra-subband sets and splits into ^ quadrants through the threshold test. As shown in Fig. 2, the pyramidal structure in SPECK is defined by the levels of decomposition. SPECK starts by partitioning the image of transformed coefficients into two sets – the rectangular type _ set and the rest part – type ` set (defined by the solid lines in the figure). The size of set _ depends on the original image and _ is the root (topmost level of the pyramidal structure) of the pyramid. The purpose of the sorting algorithm November 7, 2002 DRAFT 7 O(S) S S S S I Fig. 2. Set Partitioning rules for SPECK algorithm of SPECK is the same as SPIHT, that is to find significant coefficients with respect to the current a . However, the path that SPECK tests coefficients is different. First, SPECK applies quadtree partitioning algorithm to test type b set. If this set is found to be significant compare to the current bit plane a (there is at least one significant coefficient in this set), it will be partitioned into four subsets cd(bfe (dotted lines in Fig. 2), with each subset having approximately one-fourth the size of the parent set b . Then, SPECK treats each of these four subsets as type b set and applies the quadtree partitioning algorithm recursively to each of them to find the significant pixels. After finishing testing the type b partitioning algorithm is applied to test type g three type b sets and one type g sets. If g sets, the octave band set is found to be significant, it will be partitioned into set as shown in Fig. 2 with dashed lines. The quadtree partitioning and octave band partitioning will be applied to these new sets respectively to find significant pixels. The motivation for both quadtree and octave band partitioning of different kind of sets is to zoom in quickly to areas of high energy and code them first. The previous paragraph described how SPECK works for the initial bit plane a . At the end of the first pass, many type b sets of varying sizes were generated. In the next pass SPECK checks their significance against the next bit plane (aXhji ), SPECK will test the sets according to their sizes. Smaller size sets are more likely to have significant pixels than larger size sets [5]. Therefore, these type b sets will be tested in increasing order of size. III. T HREE -D IMENSIONAL S ET PARTITIONED E MBEDDED BLO CK (3DSPECK) Many researchers already extended 2-D SPIHT or SPIHT-like techniques to three-dimensional sources [6][2][20][15][7]. To do this, people usually take a 3-D wavelet transform on the image sequence, and apply the extended sorting and partitioning algorithm to encode the source. As an example, 3-D SPIHT sorts coefficients along the paths of 3-D trees (one pixel corresponds to eight direct descendant pixels) instead of 2-D trees (one pixel to four direct November 7, 2002 DRAFT 8 descendants). The inter-band dependence or correlation can be exploited automatically, as was done in the spatial domain. Following the similar idea, we extend and modify 2-D SPECK for three dimensional hyperspectral image applications. The first step is to decompose images into wavelet coefficients. Discrete wavelet transform is first applied on the spectral dimension, followed by on the horizontal (rows) and vertical (columns) axis. The resulting coefficients have a pyramid structure, and the structure has three levels of decomposition. Figure 3 illustrates this structure. Spectrum High Low Low Low 1 5 2 6 12 7 8 9 4 3 High 13 20 19 10 14 11 15 16 21 High 17 Horizontal 22 18 Vertical Fig. 3. Structure for 3DSPECK. The numbers on the front lower left corners for each subband are marked to indicate the sorting order. The next step is to design a sorting algorithm to encode wavelet coefficients based on their magnitudes. The motivation comes from the underlying physics of hyperspectral images. Most hyperspectral images have more high frequency content than other images in spatial domain. We illustrate this by comparing the spatial energy distribution of a hyperspectral image “coast” and an non-hyperspectral image “chess” (Figure 4) for one level of wavelet decomposition. The energy is defined as the sum of squared value of wavelet coefficients. The energy distributions in spatial frequency of these two images are depicted on Figure 5. The proportion is calculated by dividing the energy in one sub-band over the total energy. We can see that the “coast” image has more energy distributed over the HH, HL and LH sub-bands than that of “chess” image. This means that hyperspectral images tend to have more large (significant) coefficients on finer spatial sub-bands by comparing to non-hyperspectral images. On the temporal axis, it is also true that less energy tends to locate on the lowest subband of the hyperspectral image sequences than that of other image sequences, such as medical image sequences. This is demonstrated by a picture of the energy distribution on the temporal subband axis of “coast” sequence and a CT image sequence for three levels of wavelet decomposition. Size of 16 slices per segment is used in our example, and the result is obtained by averaging over several 16-slice segments. Refer to Figure 6, sequences are decomposed to three levels. November 7, 2002 DRAFT 9 (a) (b) Fig. 4. (a) hyperspectral image “coast”; (b) non-hyperspectral image “chess” Subband LLL belongs to the first two slices, LLH to the third and the fourth slices, LH to slices 5 to 8, and the remaining slices are for subband H. Over klm energy is located in the LLL subband for the CT sequence and only nnpo qq m for our hyperspectral sequence. Hyperspectral image sequences tend to distribute more energy over finer subbands than other image sequences. We want our sorting algorithm to be designed according to these observations. 3DSPIHT searches significant coefficients from the root of the tree structure down to its leaves. One 3D orientation tree includes a pixel on the root and its descendants. If all descendants are found to be insignificant against a certain threshold, 3DSPIHT needs to send only one bit to indicate this information. This kind of tree structure performs excellently if energy concentrates near the root of the tree. Like EZW and SPIHT, they all take advantage of the tree structure to design their algorithms, and they are all proved to be very efficient in general cases. However, in our case of hyperspectral images, energy is not so concentrated near the root of the tree. With higher probabilities than the case of regular image sequences, the descendants of root pixels may be found to be significant during the first several passes. 3DSPIHT will send extra bits to represent the sorting path to locate the significant pixel. In order to locate the large coefficients in the high frequency bands quickly, we extend the quadri-section partitioning algorithm of 2DSPECK November 7, 2002 DRAFT 10 Fig. 5. 76.47% 9.82% 84.73% 3.45% 8.21% 5.51% 7.20% 4.61% Energy distribution of “coast” (left) and “chess” (right) LLL LLH LH H 77.55% 12.48% 4.93% LLL LLH 5.04% LH H 90.22% 5.87% 2.81% Fig. 6. 1.1% Energy distribution of “coast” sequence (upper one) and a CT medical image sequence (lower one) to octo-section partitioning in our 3D application. The wavelet coefficient prism is partitioned into code blocks of different sizes. Each subband is treated as a code block. Then, the block splitting into eight is applied on each code block following a certain order to sort significant pixels. This splitting of code blocks allows zooming in quickly to areas of high energy and coding them first. This can be proved by comparing the rate distortion curves of 2DSPECK and 2DSPIHT. The results in [5] show that 2DSPECK out-performs 2DSPIHT for images with much high frequency content. It is expected that the 3DSPECK would have the same property. 2DSPECK maintains an array of lists to process code units of varying sizes. This is simplified by maintaining only one list in 3DSPECK. 3DSPECK does not utilize the octave band partitioning in 2DSPECK. Details of the implementation are presented in the following. Later, the experimental results will show the improvements of our algorithm. We follow all notations developed in [5] when we use the structure or terminology of SPECK. For each small (3D) block in Figure 3, we call it type r set. These include 8 sub-bands under the coarsest level low-low-low (LLL) sub-band and the 7 sub-bands at the next finer level and the remaining 7 sub-bands at the finest level (e.g. HLL, LHH). All together, there are ss type r sets. We are going to use these sets when we initialize the algorithm. 3DSPECK maintains two linked lists: t LIS – List of Insignificant Sets. This list contains sets of type r of varying sizes. t LSP – List of Significant Pixels. This list contains pixels that have been found significant against threshold u . November 7, 2002 DRAFT 11 We say that the set v is significant with respect to w , if xy{z }( I |~}( I K f (2) Where }" + denotes the transformed coefficients at coordinate "I . Otherwise it is insignificant. For convenience, we can write the significance of a set as a function of w and the set : xy{z |~}" + K ¢¡;£ ¥ }" + p§ ¤ ¦ ¨ª© v 9 « ¢¬®°¯±¬ (3) The size of a set is the number of elements in the set. Below we present the new encoding algorithm: ² 1) Initialization ² Output w³µ´ ~¶·¸ xyz { }" + K¹ ² Set LSP = º Set LIS = » all small blocks in Figure 3 (total there are 22 sets) ¼ 2) Sorting Pass In increasing order of size of sets, for each set v¾½ LIS, ProcessS(v ) ProcessS(v ) »¿² ² Output À if À (v ) (Whether the set is significant respect to current w or not) (v ) = 1 – if v is a pixel, output sign of v and add v to LSP – else CodeS(v ) – if v¾½ LIS, remove v from LIS ¼ CodeS(v ) »¿² Partition v into eight equal subsets Á (v ). For the situation that the original set size is odd  odd  odd, we can partition this kind of sets into different but approximate equal sizes of subsets (see Fig 7). For the situation that the size of the third dimension of the set is 1, we can partition the set into 4 approximately ² equal sizes of subsets. For each Á (v ) – Output À – if À à à November 7, 2002 ( Á (v )) ( Á (v )) = 1 if Á (v ) is a pixel, output sign of Á (v ) and add Á (v ) to LSP else CodeS( Á (v )) DRAFT 12 – else Ä add Å (Æ ) to LIS Ç 3) Refinement Pass For each entry È(ÉÊËÊIÌÎÍfÏ LSP, except those included in the last sorting pass output the ЪÑ"Ò MSB of Ó{ÔCÕ"Ö ×+Ö ØÓ . 4) Quantization Step Decrement Ð by 1 and go to step 2. Fig. 7. Partitioning of set Ù After initialization, there are ÚÚ sets in the LIS. We treat each sub-band on the pyramid as an initial set. All sets are put on the list in the order of extended “Z” scan. An example of the extended “Z” scan is illustrated in Figure 3, and the order of the sets is marked by numbers. We can see from Figure 3 that the list is a tree-like list. The underlying idea is based on the fact that though energy of hyperspectral images does not tend to concentrate in the LLL sub-band as much as in the case of non-hyperspectral images, a large portion of the energy still resides in the coarsest level sub-bands. In other words, the coarsest level sub-bands always convey the most important information of the original image sequence. If these bands are treated as individual sets, we know with extremely high probability that the algorithm will output 1 when testing the significance of these sets at the first iteration. Since the sizes of the coarsest level sub-bands are small, the 3DSPECK algorithm zooms into the significant coefficients quickly. On the other hand, the probability of finding significant coefficients in the sets during the early iterations decreases when going down the subbands of the pyramid. Our experiments show that sometimes no significant coefficient can be found against the first one or two bit planes on the finest sub-bands. Therefore, leaving those larger size sub-bands as individual sets can avoid using extra bit budget to present non-significance. Hence, when we test the significance of sets, we scan sets by following the path from the top of the pyramid down to its bottom. In this way, we can find significant coefficients quickly in the area of high energy and send the most important November 7, 2002 DRAFT 13 information first. This satisfies the requirement of embedded coding and progressive transmission. During the sorting pass, we claim that sets should be tested in increasing order of size. This follows the argument in [5]. After the first pass, many sets of type Û of varying sizes are generated and added to the LIS. The algorithm sends those sets to the LIS because it found some immediate neighbors of those sets are significant against some threshold Ü . Since coefficients in the same energy cluster have close magnitudes, those not found to be significant in the current pass are very likely to be found as significant against the next lower threshold. Furthermore, our experiments show that a large number of sets with size of 1 will be generated after the first iteration. Therefore, testing sets in increasing order of size can test pixel level first and locate new significant pixels immediately. We do not use any sorting mechanism to process sets of type Û in increasing order of their size. Even the fastest sorting algorithm will slow the coding procedure significantly. This is not desirable in fast implementation of coders. However, there are simple ways of completely avoiding this sorting procedure. 2DSPECK uses an array of lists [5] to avoid sorting, whereas we use a different and simpler method achieve this goal. Note that the way sets Û are constructed, they lie completely within a sub-band. Thus, every set Û is located at a particular level of the pyramidal structure. Partitioning a set Û into eight/four subsets is equivalent to going down the pyramid one level at the corresponding finer resolution. Hence, the size of a set Û corresponds to a particular level of the pyramid. Therefore, we only need to search the same LIS several times at each iteration. Each time we only test sets with sizes corresponding to a particular level of the pyramid. This implementation completely eliminates the need for any sorting mechanism for processing the sets Û . Thus, we can test sets in increasing order of size while keep our algorithm running fast. The idea of ProcessS and CodeS is an extension of 2-D ProcessS and CodeS in SPECK. We extend from a 2-D to a 3-D structure. 3DSPECK processes a type Û set by calling ProcessS to test its significance with respect to a threshold Ü . If this set is significant, ProcessS will call CodeS to partition set Û equal subsets Ý (Û ) (Fig 7). We treat each of these subsets as new type Û into eight/four approximately set, and in turn, test their significance. This process will be executed recursively until reaching pixel level where the significant pixel in the original set Û is located. The algorithm then sends the significant pixel to the LSP and codes the significance. After finishing the processing of all sets in the LIS, we execute the refinement pass. During the refinement pass, the algorithm outputs the ÜßÞ"à most significant bit of the absolute value of each entry á(âãäãåæ in the LSP, except those included in the just-completed sorting pass. The purpose of the refinement pass of 3DSPECK is the same as that of SPIHT and SPECK. The refinement pass refines significant pixels found during previous passes progressively. The last step of this algorithm is decreasing Ü by ç and returning to the sorting pass. This step makes the whole process run recursively. The decoder is designed to have the same mechanism as the encoder. Based on the outcome of the significance tests in the received bit stream, the decoder can follow exactly the same execution path as the encoder and therefore reconstruct the image sequence progressively. As with other coding methods such as SPIHT and SPECK, the efficiency of our algorithm can be improved by entropy-coding its output [8][11][13][17]. We use the adaptive arithmetic coding algorithm of Witten et al. [18] to November 7, 2002 DRAFT 14 code the significance map. Referring to the function CodeS in our algorithm, instead of coding the significance test results of the eight/four subsets separately, we code them together first before further processing the subsets. Simple context is applied for conditional coding of the significance test result of this subset group. More specifically, we encode the significance test result of the first subset without any context, but encode the significance test result of the second subset by using the context (whether the first subset is significant or not) of the first coded subset and so on. Other outputs from the encoder are entropy coded by applying simple arithmetic coding without any context. Also, we make the same argument as 2DSPECK that if a set è is significant and its first seven/three subsets are insignificant, then this means that the last subset must be significant. Therefore, we do not need to test the significance of the last subset. This reduces the bit rate somewhat and provides corresponding gains. We have chosen here not to entropy-code the sign and magnitude refinement bits, as small coding gains are achieved only with substantial increase in complexity. The SPECK variant, EZBC [4], has chosen this route, along with a more complicated context for the significance map quadtree coding. The application will dictate whether the increase in coding performance is worth the added complexity. IV. N UMERICAL R ESULTS AVIRIS image sequences “coast”, “moffett” and “jasper” will be tested for 3DSPECK. Among these three image sequences, “coast” and “moffett” are typical hyperspectral images with higher spatial frequency content, while the “jasper” sequence is selected to be very smooth in the spatial domain. Later experimental results will show the advantages of our algorithm when applying it on image sequences with much high frequency content. The description of AVIRIS can be found in the introduction. Image “coast” is taken at a coastal region in Puerto Rico at Mayaguez. There are éêë channels (bands) in the range 400 to 2500 nm, each consisting of 397 lines of 268 pixels quantized to 8 bits per pixel (bpp). It is saved as BIS (band interleaved by sample) format. “moffett” is pictured above the Moffett Field area in CA. Its format is band interleaved by sample (channel, sample, line) with dimensions (224, 614, 512) and 8 bits per pixel. “jasper” portrays Jasper Ridge in CA. It has the same format and dimensions as “moffett”. In all cases, a square region of 256 ì 256 pixels which exhibits all key features present in the images is selected and used as a test set. Band 101 of “coast” image is shown in Figure 4 (a), band 101 of “moffett” image is shown in Figure 8, and band 116 of “jasper” image is shown in Figure 9. The coding unit size of 16 slices is used in our implementation. Coding 16 slices per time instead of coding the whole set of image sequence (i.e. all 204 channels for whole coast sequence) in one time saves considerable memory and coding delay. The final result is obtained by averaging over several 16-slice segments. However, even if we compromise the coding delay and try to process as many bands as possible per time, the performance cannot be significantly improved. Xiong et al. [20] proved that we cannot gain much by increasing the coding unit size. If we did that, we would pay much more in nearly intolerable memory requirement and computation delay. The coding performance of 3DSPECK is compared with that of 3D-SPIHT and 2D-SPIHT algorithms on the November 7, 2002 DRAFT 15 Fig. 8. Band 101 of “moffett” test image Fig. 9. Band 116 of “jasper” test image same sets of data, by means of the Peak Signal-to-Noise Ratio (PSNR): ò ífîpï-ðQñóò=ôöõ÷øMùûúüý{þ®ÿ î where (4) denotes the maximum bit depth, and MSE is the mean squared-error between the original and recon- structed images. A. Comparison with Conventional Methods Rate distortion curves are plotted in Figure 10, 11, and 12 for “coast”, “moffett”, and “jasper” sequences respectively. The PSNR results for 2DSPIHT are obtained by first processing each band separately, then averaging all the results. By comparing the rate distortion curves of 2DSPIHT to 3DSPIHT and 3DSPECK, we can see that 3D versions guarantee over 3 dB improvement at all rates. This implies that the hyperspectral image sequences are highly correlated, and taking 3D DWT can fully exploit these inter band correlations. Thus, 3D compression techniques achieve much better performance than the 2D version which encodes images separately. November 7, 2002 DRAFT 16 Fig. 10. Comparative evaluation the rate distortions of 3DSPECK, 3DSPIHT and 2DSPIHT for the “coast” image Fig. 11. Comparative evaluation the rate distortions of 3DSPECK, 3DSPIHT and 2DSPIHT for the “moffett” image Figure 10 and 11 show that 3DSPECK performs excellently in the case when high spatial frequency content is present. If the bit rate is very low (i.e. less than 0.4 bpp), 3DSPECK out-performs the well known 3DSPIHT. As the bit rate increases, 3DSPECK achieves higher PSNR than 3DSPIHT for the “coast” image, but lower PSNR for “moffett”. When the image sequence is smooth and does not have much high frequency content, the traditional 3DSPIHT performs better than 3DSPECK. The reason is that if energy is highly concentrated on the highest pyramid level of wavelet coefficients, the sorting algorithm of 3DSPIHT will locate the important information very quickly and use the minimum number of bits to present the unimportance of the lower levels of the pyramid. However, 3DSPECK has to use extra bit budget to describe whether each set on the list is important. Hence, for this situation, 3DSPIHT November 7, 2002 DRAFT 17 Fig. 12. Comparative evaluation the rate distortions of 3DSPECK, 3DSPIHT and 2DSPIHT for the “jasper” image will achieve better result. Again, 3DSPECK performs better compared to 3DSPIHT at low bit rate than at higher bit rate. The rate distortion results suggest that if the sources have much high frequency content, and the compression rate is required to be very high (bpp very low), 3DSPECK is an excellent candidate. Overall, it is comparable to 3DSPIHT. B. Assessing the Image Quality Since our application is hyperspectral imagery, the visual effect is very important. Therefore, it is necessary to further assess the quality of the compressed images. To show the visual effect, a single band of each of the test images is shown at different bit rates in Figures 13, 14 and 15. Images (b) in Figures 13 and 14 show “coast” and “moffett” images decoded at 0.1 bpp. Many fine details are lost and the borders between different regions are blurred. Such image quality can be used for quickly browsing image to check whether it is the sequence of interest. Since our coded sequences are embedded, higher quality images can be obtained by gradually decoding more of the available bits. If the original image is smooth, one can obtain good compressed image quality even at 0.1 bpp. Figure 15 (b) is such an example. Only strong edges are blurred. However, it is easy for people to tell many details on the picture. All the structural information has been correctly reproduced. As the bit rate is increased, we can get finer and finer images. The noise presented in the original image can also be refined. In other words, air dust and other sources rendered noise during capturing images. However, lossy compression has the effect of removing such noise. This makes the compressed images “look” better than the original ones. Images (c) and (d) in Figure 13, Figure 14 and Figure 15 illustrate high quality of compressed images at a bit rate as low as 0.3 bpp. November 7, 2002 DRAFT 18 C. Transform and Coding Gain We wish to establish the effectiveness of extension of SPECK from 2D to 3D separate from the 3D transform. An experiment called “3D-DWT + 2DSPECK” is conducted to accomplish this task. For each of a group of 16 images, 3D-DWT is applied to decompose them into a wavelet coefficient prism. The prism is saved into 16 frame segments. For given overall rate temporal subband to minimize the overall rate distortion , we assign different rate to different frames in . This rate allocation procedure is not novel, and the details can be found in [12][19]. Then, 2DSPECK is applied on these frames separately according to the assigned bit rates. The decoded coefficient frames will be concatenated into a coefficient prism. Inverse 3D-DWT is applied on this prism to reconstruct images and compute the rate distortion performance. The purpose of doing this is to test whether 3D-DWT provides the whole coding gain. “3D-DWT + 2DSPECK” does 3D transform and 2D coding, therefore can assess separately gain from 3D transform and 3D coding. Figures 16, 17, and 18 show the rate distortions curves for our test image sequences by comparing the results of 3DSPECK, 2DSPECK, and “3D-DWT + 2DSPECK”. For all three test sequences, the rate distortion curves of “3D-DWT + 2DSPECK” fall between the curves of 3DSPECK and 2DSPECK. This suggests that 3D-DWT contributes only part of the coding gain, while our 3D coding accounts for the rest. For the “coast” and “moffett” sequences, which contain more high frequency content, the “3D-DWT + 2DSPECK” curves are closer to 2DSPECK curves, and the gap between “3D-DWT + 2DSPECK” and 2DSPECK is about 1 2 dB, while the “jasper” has a “3D-DWT + 2DSPECK” curve that is closer to 3DSPECK, and the gap between “3D-DWT + 2DSPECK” and 2DSPECK is about 2.5 dB. The reason for this can be illustrated by using the same example of energy distributions as shown in section III. As demonstrated in Figure 5 and 6, the 3D-DWT does a better job in localizing energy to the low frequency subbands when the image sequence has less high frequency content, so that the 3D-DWT provides a larger gap between 2DSPECK and “3D-DWT + 2DSPECK” for the “jasper” sequence. If we go back to compare the PSNR results of 2DSPECK and 2DSPIHT, we can see that 2DSPECK performs slightly better than 2DSPIHT for the sequences of “coast” and “moffett” but slightly worse than 2DSPIHT for “jasper” sequence. This matches the results in the 2DSPECK paper [5]. In [5], the rate distortion curves of 2DSPECK and 2DSPIHT are compared for three images “lena”, “goldhill” and “barbara”. 2DSPECK gives slightly higher PSNR curve than that of 2DSPIHT for “barbara” and gives slightly lower PSNR curve than that of 2DSPIHT for the other two images. The “barbara” image has much high frequency content, while the other two are just regular images. These results suggest that 2DSPECK performs slightly better than 2DSPIHT in the case that much high spatial frequency content is present. This is proved again in our results of 3D sources. V. C ONCLUSION AND F UTURE W ORK This paper proposed a three dimensional set partitioned embedded block coder for hyperspectral image compression. The three dimensional wavelet transform automatically exploits inter-band dependence. The modified SPECK sorting algorithm can find the most important information efficiently and therefore provides a corresponding coding gain against two dimensional compression algorithm. November 7, 2002 DRAFT 19 The proposed 3DSPECK is completely embedded and can be used for progressive transmission. These features make the proposed coder a good candidate to compress (encode) hyperspectral images before transmission and to decompress (decode) them at another end for image storage. Compressed hyperspectral bit streams require protection from channel errors during transmission. For efficient end-to-end transport, a suitable forward-error-correcting (FEC) algorithm should be applied on the embedded bit streams before transmission. In the future, we will investigate this topic and implement appropriate channel coding algorithms. R EFERENCES [1] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, Image coding using wavelet transform, IEEE Trans. Image Processing, vol. 1, pp.205-220, 1992. [2] P.L. Dragotti, G. Poggi, and A.R.P. Ragozini, Compression of multispectral images by three-dimensional SPIHT algorithm, IEEE Trans. on Geoscience and remote sensing, vol. 38, No. 1, Jan 2000. [3] Thomas W. Fry, Hyperspectral image compression on reconfigurable platforms, Master Thesis, Electrical Engineering, University of Washington, 2001. [4] S-T. Hsiang and J.W. Woods, Embedded image coding using zeroblocks of subband/wavelet coefficients and context modeling, IEEE Int. Conf. on Circuits and Systems (ISCAS2000), vol. 3, pp.662-665, May 2000. [5] A. Islam and W.A. Pearlman, An embedded and efficient low-complexity hierarchical image coder, in Proc. SPIE Visual Comm. and Image Processing, vol. 3653, pp. 294-305, 1999. [6] B. Kim and W.A. Pearlman, An embedded wavelet video coder using three-dimensional set partitioning in hierarchical tree, IEEE Data Compression Conference, pp.251-260, March 1997. [7] Y. Kim and W.A. Pearlman, Lossless volumetric medical image compression, Ph.D Dissertation, Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, 2001. [8] J. Li and S. Lei, Rate-distortion optimized embedding, in Proc. Picture Coding Symp., Berlin, Germany, pp. 201-206, Sept. 10-12, 1997. [9] S. Mallat, Multifrequency channel decompositions of images and wavelet models, IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp.2091-2110, Dec. 1989. [10] A.N. Netravali and B.G. Haskell, Digital pictures, representation and compression, in Image Processing, Proc. of Data Compression Conference, pp.252-260, 1997. [11] E. Ordentlich, M. Weinberger, and G. Seroussi, A low-complexity modeling approach for embedded coding of wavelet coefficients, in Proc. IEEE Data Compression Conf., Snowbird, UT, pp. 408-417, Mar. 1998. [12] W.A. Pearlman, Performance bounds for subband codes, Chapter 1 in Subband Image Coding, J. W. Woods and Ed. Klvwer. Academic Publishers, 1991. [13] Proposal of the arithmetic coder for JPEG2000, ISO/IEC/JTC1/SC29/WG1 N762, Mar. 1998. [14] A. Said and W.A. Pearlman, A new, fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. on Circuits and Systems for Video Technology 6, pp. 243-250, June 1996. [15] P. Schelkens, Multi-dimensional wavelet coding algorithms and implementations, Ph.D dissertation, Department of Electronics and Information Processing, Vrije Universiteit Brussel, Brussels, 2001. [16] J.M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Processing, vol. 41, pp.3445-3462, Dec. 1993. [17] D. Taubman, High performance scalable image compression with EBCOT, IEEE Trans. on Image Processing, vol. 9, pp.1158-1170, July, 2000. [18] I.H. Witten, R.M. Neal, and J.G. Cleary, Arithmetic coding for data compression, Commun. ACM, vol. 30, pp. 520-540, June 1987. [19] J.W. Woods and T. Naveen, A filter based bit allocation scheme for subband compression of HDTV, IEEE Transactions on Image Processing, IP-1:436-440, July 1992. November 7, 2002 DRAFT 20 [20] Z. Xiong, X. Wu, D.Y. Tun, and W.A. Pearlman, Progressive coding of medical volumetric data using three-dimensional integer wavelet packet transform, Medical Technology Symposium, 1998. Proceedings. Pacific, PP. 384 -387, 1998. [21] J. Xu, Z. Xiong, S. Li, and Y. Zhang, Three-dimensional embedded subband coding with optimized truncation (3-D ESCOT), J. Applied and Computational Harmonic Analysis: Special Issue on Wavelet Applications in Engineering. vol. 10, pp. 290-315, May 2001. November 7, 2002 DRAFT 21 (a) (b) (c) (d) Fig. 13. Results for “coast” hyperspectral image at channel 101: (a) original image (b) 0.1 bit/pixel (c) 0.4 bit/pixel (d) 0.8 bit/pixel November 7, 2002 DRAFT 22 (a) (b) (c) (d) Fig. 14. Results for “moffett” hyperspectral image at channel 116: (a) original image (b) 0.1 bit/pixel (c) 0.3 bit/pixel (d) 0.5 bit/pixel November 7, 2002 DRAFT 23 (a) (b) (c) (d) Fig. 15. Results for “jasper” hyperspectral image at channel 116: (a) original image (b) 0.1 bit/pixel (c) 0.3 bit/pixel (d) 0.5 bit/pixel November 7, 2002 DRAFT 24 34 33 32 31 PSNR (dB) 30 29 28 27 3DSPECK 2DSPECK 3D−DWT + 2DSPECK 26 25 24 0.1 Fig. 16. 0.2 0.3 0.4 Rate (bpp) 0.5 0.6 0.7 0.8 Comparative evaluation the rate distortions of 3DSPECK, 2DSPECK and “3D-DWT + 2DSPECK” for the “coast” image 50 48 PSNR (dB) 46 44 42 3DSPECK 2DSPECK 3D−DWT + 2DSPECK 40 38 0.1 Fig. 17. 0.15 0.2 0.25 0.3 Rate (bpp) 0.35 0.4 0.45 0.5 Comparative evaluation the rate distortions of 3DSPECK, 2DSPECK and “3D-DWT + 2DSPECK” for the “moffett” image November 7, 2002 DRAFT 25 42 40 PSNR (dB) 38 36 34 32 3DSPECK 2DSPECK 3D−DWT + 2DSPECK 30 28 0.1 Fig. 18. 0.15 0.2 0.25 Rate (bpp) 0.3 0.35 0.4 Comparative evaluation the rate distortions of 3DSPECK, 2DSPECK and “3D-DWT + 2DSPECK” for the “jasper” image November 7, 2002 DRAFT
© Copyright 2026 Paperzz