Multimedia Systems
Sharif University of Technology
INTRODUCTION AND BACKGROUND
1 What Is Image Compression
Image compression is the art/science of finding efficient representations for digital images in order to:
• Reduce the memory required for their storage,
• Reduce the bandwidth or time required for their transfer across communication channels,
• Increase the effective data transfer rate when reading from storage devices.
2 Digital Image Acquisition
Digital images can be acquired in a variety of ways:
• Image digitization or scanning (Photo CD, facsimile, etc.)
• Direct digitization or acquisition (digital cameras, most medical imaging modalities such as MRI)
• Synthetic or computer-generated imagery
When referring to digital images, the following terminology is useful:
• Pixel or pel: picture element.
• Spatial resolution: a measure of the number of image pixels per unit length in the horizontal or vertical direction, expressed in pixels/in, dots/in, samples/in, etc.
• Monochrome or grayscale: an image consisting of only gray values and no color.
• Bit depth: the number of bits used to represent each pixel.
• Codevalue: a pixel's gray-level value, e.g., in an 8-bit/pixel image codevalues range from 0 to 255.
3 Need for Image Compression: Storage
• Digital Camera
An image captured with a single-chip CCD sensor (with a color filter array) requires 768 pixels × 512 lines × 8 bits/pixel ≈ 375 KB per image. A solid-state memory card with a 1 MB capacity would hold fewer than 3 such images.
• Hybrid Image Capture
A 24 × 36-mm (35-mm) negative photograph scanned at 12 μm requires 3072 pixels × 2048 lines × 24 bits/pixel ≈ 18 MB per image. A CD-ROM with a 600 MB storage capacity would hold about 30 images.
• VHS-Quality Full-Motion Video
Full-motion video in SIF format (CD-I) requires 352 pixels × 240 lines/frame × 24 bits/pixel × 30 frames/s ≈ 7.6 MB for one second of video. A CD-ROM with a 600 MB storage capacity would hold only about 80 seconds of full-motion video. Furthermore, the CD-ROM data transfer rate (1X speed) is 150 KB/s, which means it would take 72 minutes to play back those 80 seconds of video!
4 Need for Image Compression: Transmission
• Facsimile Transmission
A low-resolution facsimile image (e.g., an 8 1/2 × 11-inch page scanned at 200 pixels/inch horizontally and 100 pixels/inch vertically) transmitted over ordinary dial-up telephone lines. Using a 4.8 Kb/s modem, the transmission takes approximately 6.5 minutes per uncompressed image.
• Digital Broadcasting (HDTV)
Digital HDTV at a progressive-scan resolution of 1280 × 720 pixels/frame × 24 bits/pixel × 60 frames/s. The data rate is about 166–186 MB/s. The aim of HDTV is to broadcast at about 20 Mbit/s using existing TV channels. This requires approximately 60:1 to 80:1 compression.
• Videophone
A QCIF color video image at 176 × 144 pixels × 24 bits/pixel, 15 frames/s, transmitted over ordinary dial-up telephone lines with a 28.8 Kb/s modem. The compression ratio needed is in excess of 300:1.
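A quick back-of-the-envelope check of the figures quoted above (a minimal sketch; the only inputs are the resolutions and rates stated in this section, and the small discrepancies come from the usual KB/KiB rounding):

```python
# Back-of-the-envelope check of the storage/transmission figures quoted above.

def raw_size_bytes(pixels, lines, bits_per_pixel):
    """Uncompressed image size in bytes."""
    return pixels * lines * bits_per_pixel / 8

print(raw_size_bytes(768, 512, 8) / 1024, "KB")            # digital camera frame, ~384 KB
print(raw_size_bytes(3072, 2048, 24) / 2**20, "MB")        # scanned 35-mm negative, ~18 MB
print(raw_size_bytes(352, 240, 24) * 30 / 2**20, "MB/s")   # SIF video at 30 frames/s, ~7 MB/s
print(1700 * 1100 / 4800 / 60, "minutes")                  # bilevel fax page over a 4.8 Kb/s modem, ~6.5 min
```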
5 Need for Digital Image Compression
Application      Pixels  Lines  f/s  Size       CR
Digital Camera    768     512    —   375 KB       4
Photo CD         3072    2048    —   18 MB       50
Facsimile        1728    1100    —   240 KB      10
CD-I              352     240   30   7.6 MB/s    20
HDTV             1280     720   60   166 MB/s    80
Video Phone       176     144   15   1.1 MB/s   300
6 Digital Image Compression Applications
• Consumer Imaging (e.g., digital cameras, Photo CD)
• Multimedia Products (e.g., CD-I, DV-I)
• Medical Imaging (e.g., computed radiography, CT, MRI)
• Remote Sensing (e.g., LANDSAT imaging)
• Photojournalism (e.g., transceiver systems)
• Graphic Arts (e.g., electronic pre-press)
• Computer-Generated Images (e.g., advertising, entertainment, scientific visualization)
• Facsimile Transmission (e.g., documents, digital half-tones, color fax)
• Broadcasting (e.g., HDTV)
• Many more, such as photo-videotex, image archiving, digital copiers, desktop and electronic publishing, teleconferencing, video phone, education, games, video mail, etc.
7 Compressibility of Images
Consider the following 8 × 8 block taken from the image LENA. How difficult is it to guess the correct value of the missing pixel (marked "?")? How critical is it to reproduce the exact value?
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160   ? 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158
8 Why Can Images Be Compressed?
Image compression can be achieved primarily because image data are often highly redundant and/or irrelevant. Redundancy relates to the statistical properties of images, while irrelevancy relates to an observer viewing an image.
• Redundancy (statistical):
- Spatial correlation between neighboring pixel values. The amount of correlation depends on many factors such as resolution, bit depth, scene content, and noise.
- Spectral correlation between color planes (e.g., RGB color images) or spectral bands (e.g., remote sensing).
- Temporal correlation between neighboring frames in an image sequence.
• Irrelevancy (perceptual):
- The human visual system (HVS) exhibits sensitivity variations as a function of spatial frequency, orientation, light level, surrounding signals (masking), etc. Among the factors affecting irrelevancy are scene content and noise, image size and viewing distance, display characteristics (e.g., noise, light level, nonlinearities), and the observer.
9 Compression of Color Images
A digitized color image is typically represented by the tristimulus red, green, and blue signals R(i,j), G(i,j), and B(i,j), where (i,j) denotes the pixel location. In general, the R, G, and B signals are highly correlated and thus redundant. This correlation can be exploited by transforming the RGB values into a new set of values that are uncorrelated (or significantly less correlated). Many transformations, depending on the application, have been suggested to accomplish this task. The JPEG standard does not provide a definition for the color space to be used; the choice has intentionally been left to the user.
The goal of the color transform is to pack most of the image spectral energy into one component and significantly less energy into the remaining two components. Loosely speaking, this is similar to the way the human visual system (HVS) encodes color. Psychophysical experiments suggest that the HVS encodes color by transforming the output of the three cone color mechanisms into an achromatic channel (such as luminance) and two chromatic channels (opponent channels).
For the purposes of image compression, since the HVS is much more sensitive to variations in the achromatic channel than in the chromatic channels, the chromatic components are often subsampled by a factor of 2:1 or 4:1 in both the horizontal and vertical directions prior to compression.
10 Compression of Color Images (Cont'd)
An example of a transformation used for color image compression is the one used in the NTSC national color television standard. The RGB values are transformed into a new set of values known as Y, I, and Q (luminance, in-phase, and quadrature, respectively) according to:
Y(i,j) = .299 R(i,j) + .587 G(i,j) + .114 B(i,j)
I(i,j) = .596 R(i,j) − .274 G(i,j) − .322 B(i,j)
Q(i,j) = .212 R(i,j) − .523 G(i,j) + .311 B(i,j)
The transformation to convert YIQ back to RGB values is:
R(i,j) = Y(i,j) + .956 I(i,j) + .621 Q(i,j)
G(i,j) = Y(i,j) − .272 I(i,j) − .647 Q(i,j)
B(i,j) = Y(i,j) − 1.106 I(i,j) + 1.703 Q(i,j)
A transform similar to YIQ, but easier to implement in hardware, is the color difference transform, which generates the Y, R − Y, and B − Y components. The perfect transform has yet to be agreed on, but with respect to data compression the performance differences are insignificant.
11 Types of Image Compression
• Lossless (Reversible, Bit-preserving):
- The image after compression/decompression is numerically identical to the original image on a pixel-by-pixel basis.
- Only the statistical redundancy is exploited to achieve compression.
- A modest amount of compression, generally in the range of 1.5:1 to 4:1 (depending on the original image characteristics such as image detail, noise, resolution, and bit depth), can be achieved.
• Lossy (Irreversible):
- The reconstructed image contains degradations relative to the original image. As a result, much higher compression can be achieved. In general, lower bit rates are achieved by allowing more degradation.
- The statistical redundancy as well as the perceptual irrelevancy of the image data are exploited.
- The term visually lossless is used to describe lossy compression schemes that result in no visible degradation. This definition is subjective and highly dependent on the viewing conditions.
12 Digital Image Compression Standardization Activities
The adoption of image compression standards allows for:
• the exchange of images across application boundaries, as well as a means for controlling image quality;
• a significant reduction in the cost of the specialized hardware required in real-time image compression systems, thus stimulating product development.
ITU-T: The International Telecommunications Union (formerly CCITT) is concerned with telecommunications and transmission applications (e.g., fax standards). Standards originating from the ITU are referred to as Recommendations.
ISO: The International Standardization Organization is concerned with information processing applications such as storage. Standards originating from ISO are referred to as International Standards.
• Bilevel images (ITU-T Recs. T.4 (G3) and T.6 (G4), JBIG)
• Still-frame, continuous-tone images (JPEG)
• Continuous-tone image sequences (ITU-T Recommendations H.261 and H.263, MPEG family of standards)
13 Bilevel Image Standards
JBIG (Joint Bilevel Imaging Group) was formed under the joint auspices of ISO and ITU-T in 1988 to work on an international standard for the compression and decompression of bilevel images. The group's primary focus was to seek an algorithm that outperforms the existing ITU-T Group 3 and Group 4 facsimile standards within the scope of their applicability, and to extend the utility of the standard to other applications.
These extensions include:
• Progressive representation of images, so that the compressed data can be made available to a number of display or transmission devices with varying degrees of resolution and/or image quality requirements.
• Adaptivity, so that images with different characteristics can be compressed efficiently. For example, the current fax standards can result in data expansion when digital half-tones are used as the input. This is because these standards use predefined coding tables based on the statistics of binary documents and text.
Current status: The committee issued a Standard document, ITU-T T.82 — ISO/IEC 11544, in December of 1993.
14 Still-frame, Continuous-tone Image Standards
JPEG (Joint Photographic Experts Group) was formed under the joint auspices of ISO and ITU-T in 1986 to work on an international standard for the compression and decompression of still-frame, continuous-tone images. The JPEG standard consists of the following components:
• Baseline system: Provides a simple and efficient lossy DCT-based algorithm that is adequate for most applications.
• Extended system features: Allow the baseline system to satisfy a broader range of applications with features such as up to 12-bit/sample input, arithmetic coding, adaptive quantization, and progressive modes.
• Independent lossless method: A DPCM-based lossless algorithm for applications that require an exact reconstruction of the original image.
Current status: Three Standard documents, ITU-T T.81 — ISO/IEC 10918, were issued in 1994 and 1995.
15 Continuous-tone Image Sequence Standards
ITU-T standardized a coding algorithm in 1990 (Recommendation H.261) for video telephony and teleconferencing over ISDN at bit rates ranging from 64 to 1920 (30 × 64) Kb/s, and another algorithm in 1995 (Recommendation H.263) for lower-bit-rate applications (8–32 Kb/s) over POTS.
MPEG (Motion Picture Experts Group) has also been working under the auspices of ISO and ITU-T since 1988 to develop standards for the digital storage, retrieval, and transmission of moving images and sound. The MPEG committee supports several concurrent activities that address the various needs and bit rate requirements of different applications:
MPEG I has focused on compressing moving pictures and sound at consumer VCR quality at a combined bit rate of 1.0–1.5 Mb/s for applications such as CD-ROM storage. A Standard, ISO/IEC 11172, was issued in 1994.
MPEG II has focused on bit rates higher than 4 Mb/s, up to and exceeding the bit rates required for HDTV. A Standard, ITU-T H.262 — ISO/IEC 13818, was issued in 1994.
MPEG IV has focused on a content-based interactive audio-visual standard for delivery in 1998.
16 Generalized Image Coding Scheme
[Figure: the original image data (e.g., 24 bits/pixel full-color RGB) pass through a transformation or decomposition that reduces the dynamic range or allows for more efficient coding (e.g., DPCM prediction error; DCT; subband decomposition), then through quantization (lossy systems only), and finally through symbol encoding (e.g., Huffman; arithmetic; Lempel-Ziv) to produce the encoded data. Lossless systems omit the quantization step.]
17 Decomposition or Transformation
• A reversible process (or near-reversible, due to finite-precision arithmetic) that provides an alternate image representation that is more amenable to the efficient extraction and coding of relevant information.
• Examples:
- Block-based linear transformations, e.g., the Fourier transform, discrete cosine transform (DCT), Hadamard transform, etc., aimed at removing redundancy by decorrelating the image blocks.
- Prediction/residual formation, e.g., differential pulse code modulation (DPCM), motion-compensated frame differencing.
- Subband decomposition.
- Color space transformations, e.g., RGB → YIQ, aimed at decorrelating the color components or exploiting the HVS perception of color.
18 SYMBOL ENCODING
19 Definition of Information
Let E be an event that occurs with probability p(E). By the occurrence of the event E, one receives
I(E) = log(1 / p(E))
units of information. Thus, the more unlikely an event, the more information is revealed by its occurrence. The base of the logarithm determines the units of information:
• −log2 p(E)  bits
• −ln p(E)  nats
• −log10 p(E)  Hartleys
20 Discrete Memoryless Source (DMS)
Consider a source S that emits a sequence of symbols si, sj, … from a fixed finite source alphabet of size n, S = {s1, s2, …, sn}, where the probability of occurrence of the symbol si is given by p(si). Furthermore, assume that successive symbols emitted from the source are statistically independent.
• If symbol si occurs, an amount of information equal to
I(si) = log2(1 / p(si))  bits
is provided.
• The probability of occurrence of a source symbol is p(si), so the average amount of information obtained per source symbol from the source is
H(S) = \sum_{i=1}^{n} p(s_i) I(s_i) = \sum_{i=1}^{n} p(s_i) \log_2 (1 / p(s_i)).
This is called the entropy of the source.
21 Example
Consider a source with an alphabet consisting of four letters, S = {s1, s2, s3, s4}, where p(s1) = 0.60, p(s2) = 0.30, p(s3) = 0.05, and p(s4) = 0.05. The entropy of this source is
H(S) = 1.40 bits/symbol.
The entropy of a source attains its maximum value when all the source symbols are equally probable, and that value is log2 n bits, where n is the size of the source alphabet. In our example, if all the source symbols had the same probability (p = 0.25), the entropy of the source would be 2 bits/symbol.
Interpreting the entropy:
• The average amount of information per source symbol provided by the source.
• Consider encoding this source with a binary alphabet for transmission or storage. According to Shannon's noiseless source coding theorem, no uniquely decodable code can have an average length less than H(S).
22 Markov Information Source
• An mth-order Markov source is a source in which the occurrence of a source symbol si depends upon a finite number m of preceding symbols. It is specified by the set of conditional probabilities
P(s_i | s_{j1}, s_{j2}, …, s_{jm})  for i, j_p = 1, 2, …, n.
• An mth-order Markov source has n^m possible states.
• The entropy of the mth-order Markov source is given by
H(S) = \sum_{S^m} P(s_{j1}, s_{j2}, …, s_{jm}) H(S | s_{j1}, …, s_{jm}).
• Because of the structure present in the Markov source, its entropy is less than that of a DMS with the same source alphabet and stationary symbol probabilities (known as the adjoint source and denoted by S̄). The output of the Markov source is more predictable than that of its adjoint source, and thus less information is revealed by the occurrence of its source symbols.
23 Markov Source Example
Consider a binary image represented by the values 0 and 1. The image may be assumed to have been generated by a Markov source where the probability of a given pixel value being 0 or 1 is determined by the combination of its m neighboring pixels. In general, m binary-valued pixels constitute 2^m different combinations, where each combination represents a certain state or context. An example of a 7-pixel neighborhood (128 contexts) is shown below.
[Figure: a 7-pixel causal neighborhood, with the neighboring pixels labeled A–G around the current pixel x, together with several example 0/1 context patterns.]
Let pj denote the probability of the pixel value being zero in the jth context. The entropy of the jth context, Hj, is given by
H_j = −p_j \log_2 p_j − (1 − p_j) \log_2 (1 − p_j)  bits.
The entropy of the image H(S), based on this model, is the average of the entropies of the different contexts, i.e.,
H(S) = \sum_{j=1}^{2^m} H_j P(j),
where P(j) is the probability of occurrence of the jth context.
24 Entropy of the English Language¹
Zeroth Approximation
Assume all symbols are equiprobable:
H(S) = log2(27) = 4.75 bits/symbol
A typical text generated by such a model would be:
ZEWRTZYNSADXESYJRQY-WGECI JJ-OBVKRBQPOZBYMBUAWVLBTQ CNIKFMP-KMVUUGBSAXHLHSIE-M
1 N. Abramson, Information Theory and Coding, McGraw-Hill, 1968
25 Probability of Symbols in English
Symbol  Probability    Symbol  Probability
Space   0.1859         N       0.0574
A       0.0642         O       0.0632
B       0.0127         P       0.0152
C       0.0218         Q       0.0008
D       0.0317         R       0.0484
E       0.1031         S       0.0514
F       0.0208         T       0.0796
G       0.0152         U       0.0228
H       0.0467         V       0.0083
I       0.0575         W       0.0175
J       0.0008         X       0.0013
K       0.0049         Y       0.0164
L       0.0321         Z       0.0005
M       0.0198
Probability of the letters in the English alphabet
26 Entropy of the English Language
First Approximation
Symbols are chosen according to their probabilities:
H(S) = \sum_i p_i \log_2 (1 / p_i) = 4.03 bits/symbol
A typical text generated by such a model would be:
AI-NGAE--ITF-NNR-ASAEV-OIE-BAINTHA-HYR-OOPOER-SETRYGAIETRWCO--EHDUARU-EU-C-FTNSREM-DIY-EESE--F-O-SRIS-R--UNNAS
Second Approximation
Symbols are chosen according to their 1st-order conditional probabilities:
H(S) = \sum_i \sum_j P(i, j) \log_2 (1 / P(i | j)) = 3.32 bits/symbol
A typical text generated by such a model would be:
URTESHETHING-AD-E-AT-FOULEITHALIORTWACT-D-STE-MINTSAN-OLINS-TWID-OULY-TETHIGHE-CO-YS-TH-HR-UPAVIDE-PAD-CTAVED
27 Entropy of the English Language
Third Approximation
Symbols are chosen according to their 2nd-order conditional probabilities:
H(S) = \sum_i \sum_j \sum_k P(i, j, k) \log_2 (1 / P(i | j, k)) = 3.1 bits/symbol
A typical text generated by such a model would be:
IANKS-CAN-OU-ANG-RLER-THATTEDOF-TO-SHOR-OF-TO-HAVEMEM-A-IMAND-AND-BUT-WHISSITABLY-THER VEREER-EIGHTS-TAKILLIS-TA
nth (n → ∞) Approximation
Symbols are chosen according to their nth-order conditional probabilities. As n → ∞, the entropy of the model approaches that of the English language. Based on Shannon's experiments, H(S) is estimated to be between 0.6 and 1.3 bits.
28 Variable-Length Codes
Consider a source S with four symbols, S = {s1, s2, s3, s4}:
Symbol  Probability  Code I  Code II
s1      0.60         00      0
s2      0.30         01      10
s3      0.05         10      110
s4      0.05         11      111
• Average length of Code I = 2.0 bits/symbol
• Average length of Code II = 1.5 bits/symbol
• Code II is a prefix condition code, i.e., no codeword is a prefix of any other codeword. This allows a bit stream generated using Code II to be uniquely decodable. No uniquely decodable code can have an average length less than H(S).
Example:
Transmitted bit stream: 0011010111010 …
Decoded bit stream: 0/0/110/10/111/0/10/…
Decoded message: s1, s1, s3, s2, s4, s1, s2, …
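The entropy and average code lengths quoted above are easy to verify numerically; this minimal sketch uses the same four-symbol source and Code II.

```python
import math

# Four-symbol source from the examples above, with the prefix code "Code II".
probs = {"s1": 0.60, "s2": 0.30, "s3": 0.05, "s4": 0.05}
code_II = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}

entropy = sum(p * math.log2(1 / p) for p in probs.values())
avg_len = sum(probs[s] * len(code_II[s]) for s in probs)

print(f"H(S)           = {entropy:.2f} bits/symbol")   # ~1.40
print(f"average length = {avg_len:.2f} bits/symbol")   # 1.50
```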
29 Huffman Coding
A code is compact (for a given source) if its average length is less than or equal to the average length of all other instantaneous codes for the same source and the same code alphabet. A general method for constructing compact codes is due to Huffman and is based on the following two principles:
• Consider a source with an alphabet of size α. By combining the two least probable symbols of this source, a new source with α − 1 symbols is formed. It can be shown that the codewords for the original source are identical to the reduced-source codewords for all symbols that have not been combined. Furthermore, the codewords for the two least probable symbols of the original source are formed by appending (to the right) a '0' or '1' to the codeword corresponding to the combined symbol in the reduced source.
• The Huffman code for a source with only two symbols consists of the trivial codewords '0' and '1'.
Thus, to construct the Huffman code for a source S, the original source is repeatedly reduced by combining the two least probable symbols at each stage until a source with only two symbols is obtained. The Huffman code for this reduced source is known ('0' and '1'). Then, the codewords for the previous reduced stage are found by appending a '0' or '1' to the codeword corresponding to the two least probable symbols. This process is continued until the Huffman code for the original source is found.
30 Huffman Coding Example
a) Source reduction process:
Original source   Stage 1       Stage 2        Stage 3
S1  0.40          S1'  0.40     S1''  0.40     S1'''  0.60
S2  0.20          S2'  0.25     S2''  0.35     S2'''  0.40
S3  0.15          S3'  0.20     S3''  0.25
S4  0.15          S4'  0.15
S5  0.10
b) Codeword construction process:
Original source   Stage 1       Stage 2        Stage 3
S1  1             S1'  1        S1''  1        S1'''  0
S2  000           S2'  01       S2''  00       S2'''  1
S3  001           S3'  000      S3''  01
S4  010           S4'  001
S5  011
31 Code Efficiency
A compact code for S may still have an average length far greater than H(S). In our example, L̄ = 1.50 bits/symbol, while H(S) = 1.40 bits/symbol. The code efficiency is defined as
η = H(S) / L̄.
In order to achieve η close to one, it is sometimes necessary to encode blocks of source symbols instead of individual source symbols. In our example, a compact code for blocks of two source symbols results in an average length per original source symbol of 1.43 bits, which is reasonably close to the entropy.
Symbol  Probability  Word length  Codeword
s1s1    0.3600       1            0
s1s2    0.1800       3            111
s1s3    0.0300       5            10100
s1s4    0.0300       5            10000
s2s1    0.1800       3            110
s2s2    0.0900       4            1011
s2s3    0.0150       6            100010
s2s4    0.0150       6            101010
s3s1    0.0300       5            10011
s3s2    0.0150       6            100011
s3s3    0.0025       9            101011001
s3s4    0.0025       9            101011000
s4s1    0.0300       5            10010
s4s2    0.0150       7            1010111
s4s3    0.0025       9            101011011
s4s4    0.0025       9            101011010
32 Codeword Decoding: Binary Huffman Codes
• Sequential: By comparing the first received bit against the first bit of every entry in the table, the number of possible transmitted symbols is reduced. This process is repeated sequentially until the transmitted symbol is uniquely identified.
• Table-Lookup Procedure: A decoding table of size 2^lmax is constructed, where lmax is the maximum codeword length in the Huffman table. A block of lmax bits is parsed from the received sequence and is used to address the contents of the decoding table, which identifies the transmitted symbol and the corresponding codeword length. A decoding table for an example where A → 0, B → 10, C → 110, and D → 111 would look like the following:
Address  Codeword Length  Decoded Symbol
000      1                A
001      1                A
010      1                A
011      1                A
100      2                B
101      2                B
110      3                C
111      3                D
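A compact way to carry out the source-reduction procedure described above is with a priority queue. The sketch below builds a Huffman code for the five-symbol example; the particular 0/1 assignments may differ from the table above, but the codeword lengths (and hence the average length) are optimal either way.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Repeatedly merge the two least probable (groups of) symbols,
    prepending '0' to one group and '1' to the other."""
    tiebreak = count()                       # keeps the heap from comparing dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)  # least probable
        p2, _, group2 = heapq.heappop(heap)  # second least probable
        merged = {s: "0" + w for s, w in group1.items()}
        merged.update({s: "1" + w for s, w in group2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"S1": 0.40, "S2": 0.20, "S3": 0.15, "S4": 0.15, "S5": 0.10}
code = huffman_code(probs)
avg = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(f"average length = {avg:.2f} bits/symbol")   # 2.20 for this source
```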
33 Modified Huffman Code²
Frequently, some of the symbols in a symbol set that is being Huffman encoded are highly improbable. These symbols would take a disproportionately large share of the codeword memory, since the codeword length is proportional to −log2 p(si). It is advantageous to lump the less probable symbols into a symbol called ELSE (or ESCAPE) and design a Huffman code for the reduced symbol set including the ELSE symbol. Whenever a symbol belonging to the ELSE category needs to be encoded, the encoder transmits the codeword for ELSE followed by the actual message.
Example: Consider a source S with three symbols with probabilities p(s1) = 0.70 and p(s2) = p(s3) = 0.15:
H(S) = −\sum_{i=1}^{3} p(s_i) \log_2 p(s_i) = 1.18 bits/symbol
The Huffman coding procedure yields the codewords 0, 10, and 11, with an average wordlength of 1.30 bits/symbol. Coding this source in blocks of two symbols each results in a codebook with 9 entries, where the average codeword length is 1.20 bits per original source symbol.
Consider a modified Huffman code for encoding the symbol pairs which consists of only two entries: s1s1 with a codeword of 0, and "all the other pairs" with a codeword of 1 followed by three extra bits. The average length of this code is 1.26 bits/symbol, which is smaller than 1.30!
2 Hankamer, IEEE Trans. Communications, Vol. COM-27, p. 930, 1979
Ideal Codelength
Entropy: H(S) = \sum_i p_i \log_2 (1 / p_i)
Average codelength: L̄ = \sum_i p_i L_i
L̄ ≥ H(S), with equality iff L_i = \log_2 (1 / p_i).
The quantity log2(1/p_i) is called the ideal codelength of the symbol s_i.
Example: Consider a binary source for which p0 = 0.001 and p1 = 0.999. The ideal codelengths are L0 = 9.965 bits and L1 = 0.0014 bits, respectively.
Limitations of Huffman Coding
• Since the Huffman codeword lengths have to be integers, the Huffman code becomes inefficient if the symbol probabilities deviate significantly from negative powers of two.
• Given the minimum codeword length of one bit, Huffman coding becomes inefficient for low-entropy sources (e.g., a binary source with large skew).
• Performance can only be improved by encoding blocks of symbols, increasing the complexity and memory requirements.
• The code becomes inefficient (and can even result in data expansion) if a mismatch between the actual source statistics and the assumed code statistics exists (one-pass vs. two-pass systems).
• There is no adaptivity, i.e., performance suffers if there is a large variation of symbol statistics from one region of the data to another (unless dynamic Huffman coding is used).
36 Arithmetic Coding
• In arithmetic coding, an entire source sequence is mapped into a single codeword (unlike Huffman coding, where each codeword corresponds to a certain symbol).
• In theory, arithmetic coding can encode the source sequence at its ideal codelength. In practice, most implementations exceed the ideal codelength by about 6%.
• The symbols in a source sequence are input to the arithmetic coder one at a time, along with their probability estimates. For a given input, depending on the internal states of the arithmetic coder and the previous inputs, output bits may or may not be generated. In the long run, the output sequence is the codeword for the entire input sequence and has a length close to the ideal codelength of that sequence.
[Figure: the source symbol sequence and its probability estimates enter the arithmetic coder, which emits the codeword sequence.]
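The interval-narrowing idea behind arithmetic coding can be sketched with ordinary floating point. This toy encoder is only meant to show why the code length tracks the ideal codelength; practical coders such as the Q-coder and QM-coder described next use finite-precision integer registers and adaptive probability estimates instead.

```python
import math

# Toy arithmetic encoder: each symbol narrows [low, high) in proportion to its
# probability.  Floating point limits this to short sequences; real coders use
# finite-precision integer arithmetic with renormalization.
probs = {"a": 0.70, "b": 0.15, "c": 0.15}
cum, acc = {}, 0.0
for s, p in probs.items():
    cum[s] = (acc, acc + p)
    acc += p

def encode_interval(sequence):
    low, high = 0.0, 1.0
    for s in sequence:
        span = high - low
        lo_frac, hi_frac = cum[s]
        low, high = low + span * lo_frac, low + span * hi_frac
    return low, high

seq = "aabacaaab"
low, high = encode_interval(seq)
ideal = sum(math.log2(1 / probs[s]) for s in seq)
bits = math.ceil(-math.log2(high - low))
print(f"interval width {high - low:.3e}: ~{bits} bits vs ideal {ideal:.2f} bits")
```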
37 Practical Implementations of Arithmetic Coding
In many practical implementations of arithmetic coding, such as the IBM Q-Coder or the JBIG/JPEG standard QM-Coder, the following features exist:
• Only binary decisions are encoded. A nonbinary symbol is converted into a series of binary symbols by using an appropriate decision tree.
• Finite-precision registers are used for the internal arithmetic operations. Multiplications are approximated by adds and shifts.
• The probability of a particular symbol being 0 or 1 is selected from a finite set of values (i.e., it is quantized).
• The probability of a particular symbol being 0 or 1 is estimated from the encoded sequence and is adaptively updated after the encoding of each binary symbol using a finite state machine.
• The adaptive probability estimation is part of the arithmetic coding process and, as such, has no effect on its complexity or processing speed.
38 Q-Coder and the QM-Coder
Two of the most popular adaptive binary arithmetic coders are the Q-Coder, invented by IBM, and the QM-Coder, used in the JBIG and JPEG international standards:
Q-Coder (IBM):
• 12-bit registers
• 29-state probability table
• Bit stuffing to resolve carry propagation
QM-Coder (IBM/AT&T/Mitsubishi):
• 16-bit registers (smallest probability estimate is 2^−15)
• 113-state fast-attack probability table (46 nontransient states)
• Uses conditional exchange, resulting in up to 0.50% better compression on some images
• Delayed output to resolve carry propagation
• Inexpensive chip set available, driven by the JBIG facsimile standard
39 LZW Compression Algorithm
• The LZW (Lempel-Ziv-Welch) algorithm maps variable-size strings of source symbols into fixed-length codewords. It is based on progressively building a dictionary that contains the symbol strings (of varying sizes) encountered in the message being compressed. The entries in the table reflect the statistics of the message. The LZW algorithm is particularly suitable for the encoding of one-dimensional data (such as computer source files) where certain patterns (strings of source symbols) occur frequently. In most applications, the size of the table is chosen to be 12 bits. The UNIX utility "compress" is a variation of LZW coding. The following describes the encoder and decoder operations for the LZW algorithm.
• Encoder: The table (dictionary) is first initialized to contain strings of single source symbols. The input string is then examined one symbol at a time and the longest input string for which an entry in the table exists is parsed off. The codeword for this string (a fixed-length code addressing the location of the recognized string in the table) is transmitted to the receiver. Each parsed input string is extended by its next input source symbol to form a new string, which is then added to the table. Each added string is identified by a new codeword, namely its address in the table.
40 LZW Compression Algorithm
• Decoder: The decoder is initialized to the same string table as the encoder. Each received codeword is translated by way of the string table into a source string. Except for the first symbol, every time a codeword is received an update to the string table is made in the following way: when a codeword has been translated, its first source symbol is appended to the prior string to add a new string to the table. In this way the decoder incrementally reconstructs the same table used at the encoder.
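A minimal sketch of the encoder just described (greedy longest-match parsing with incremental dictionary growth); the alphabet and message are the ones used in the example that follows.

```python
def lzw_encode(message, alphabet):
    """LZW encoder: emit the index of the longest parsed string already in the
    table, then add that string extended by the next input symbol."""
    table = {s: i for i, s in enumerate(alphabet)}   # single-symbol strings
    w, codes = "", []
    for ch in message:
        if w + ch in table:
            w += ch                      # keep extending the current match
        else:
            codes.append(table[w])       # emit codeword for the longest match
            table[w + ch] = len(table)   # new string gets the next free address
            w = ch
    if w:
        codes.append(table[w])
    return codes, table

codes, table = lzw_encode("ABAAAAAACAABAAAAACABAAAAB", "ABC")
print(codes)   # starts 0, 1, 0, 5, 6, 2, 5, 4, ... matching the example below
```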
A potential problem arises when the input contains the sequence KωKωK, where K denotes a single source symbol, ω denotes a string of source symbols, and the string Kω already appears in the encoder string table. The compressor will parse out Kω, send the codeword for Kω, and add KωK to the table. Next, it will parse out KωK and send the just-generated codeword for KωK. However, the decoder, on receiving the codeword for KωK, will not yet have added that string to its table, because it does not yet know an extension symbol for the prior string. In such cases, the decoder should continue decoding based on the fact that this codeword's string must be an extension of the prior string. Thus, the first symbol of the new string is the same as the first symbol of the prior string, which now also becomes the extension symbol of the prior string.
41 LZW Example
Input: ABAAAAAACAABAAAAACABAAAAB...
String table built by the encoder:
Index   String
0       A
1       B
2       C
3       AB
4       BA
5       AA
6       AAA
7       AAAC
8       CA
9       AAB
10      BAA
11      AAAA
12      AC
13      CAB
14      BAAA
...
Parsed sequence:  A / B / A / AA / AAA / C / AA / BA / AAA / A / CA / BAA / AAB / ...
Output codewords: 0   1   0   5    6     2   5    4    6     0   8    10    ...
42 Rice Code
When the probability distribution of the symbols to be encoded is exponential, near-optimal encoding can be performed by using a series of parameterized variable-length codes known as Rice codes. Consider encoding a source S with 2^n symbols, labeled {0, 1, …, 2^n − 1} and arranged in decreasing probability order. In a Rice code with parameter k, the n − k most significant bits are encoded as a unary number while the least significant k bits are fixed-length coded. Depending on the variance of the exponential distribution, a certain value of the parameter k will result in the best performance. Following is an example with n = 3:
Symbol  k=0        k=1    k=2   k=3
0       1          10     100   000
1       01         11     101   001
2       001        010    110   010
3       0001       011    111   011
4       00001      0010   0100  100
5       000001     0011   0101  101
6       0000001    00010  0110  110
7       00000001   00011  0111  111
43 Pixel Representation in Different Resolutions
[Figure: the same bilevel image represented at several successively halved resolutions.]
44 Resolution Reduction (RR)
The high-resolution image is divided into blocks of 2 × 2 pixels, and each block is mapped into one pixel in the low-resolution image. Although subsampling is the most straightforward method of resolution reduction (RR), it creates many artifacts with text and line drawings (e.g., it deletes thin lines), as well as with dithered images (e.g., it does not preserve average gray). As a result, the RR technique adopted by the standard is a carefully designed method that uses information from the neighboring high-resolution pixels, in addition to previous low-resolution pixels, to determine the mapping. It is aimed at preserving edges, lines, periodic patterns, and dither patterns, thus achieving high performance with both text and dithered images. The concatenation of the color values of the numbered pixels ("0" for white and "1" for black) is used as an address into the RR table, where the pixel labeled "0" specifies the least significant bit, to determine the color of the pixel marked with "?".
[Figure: the numbered high- and low-resolution neighboring pixels used to determine the color of a low-resolution pixel; "?" marks the pixel being determined.]
3 JBIG Early Draft, WG9-S1R2, December 1990
45 Resolution Reduction Table⁴
[Figure: the default resolution reduction table.]
4 JBIG Early Draft, WG9-S1R2, December 1990
46 Typical Prediction (TP)
A given low-resolution pixel is called "not typical" if that pixel and all of its eight low-resolution nearest neighbors have the same color, but one of the four high-resolution pixels associated with it differs from this common color. A certain low-resolution line is "not typical" if it contains any "not typical" pixels. For each low-resolution line, the parameter LNTP determines whether that line is typical. If LNTP = 1, the line is "not typical" and TP is disabled. If LNTP = 0, then during the coding of that line the coder looks for typical pixels.
In general, the TP block looks for regions of solid color (white or black). Once it is determined that the pixel to be encoded belongs to such a region, none of the processing normally performed by the other blocks is required, thus resulting in a substantial gain in processing speed. On typical text images, TP avoids the coding of over 95% of the pixels⁵. The savings are significantly smaller for dithered images. Although the primary purpose of TP is to increase implementation speed, it also provides some coding gain.
5 JBIG Early Draft, WG9-S1R2, December 1990
47 Deterministic Prediction (DP)
Consider a low-resolution image that has been generated from a high-resolution image according to a particular set of resolution reduction (RR) rules. In the decoding of the high-resolution image, at times, a certain pixel value may be inferable from the available low-resolution pixels in addition to the high-resolution pixels already decoded. In such a case, the pixel value can be deterministically predicted and there is no need for it to be encoded. A trivial example of this is the subsampling RR rule, where the high-resolution pixel at the top left of each 2 × 2 block can always be predicted, as it has the same value as the low-resolution pixel representing that block.
The values of the low-resolution and high-resolution neighboring pixels that are available to both the encoder and the decoder at the time of the encoding of a certain pixel are used as an address into a table to determine whether that pixel is deterministically predictable. DP tables are highly dependent on the RR rules employed. The JBIG Standard makes provisions for an encoder to download DP tables to a decoder if it is using a private RR algorithm.
The primary purpose of DP is to improve the coding efficiency. Using the RR rules adopted by the JBIG Standard, DP provided about a 7% coding gain (which is thought to be typical) on a particular set of test images⁶.
6 JBIG Early Draft, WG9-S1R2, December 1990
48 Deterministic Prediction (DP)⁷
[Figure: the four possible spatial phases (0–3) for high-resolution pixels, and the labeling (0–12) of the pixels used by DP.]
Phase   Target pixel   Reference pixels            Number of hits (default RR rules)
0       8              0,1,2,3,4,5,6,7             20
1       9              0,1,2,3,4,5,6,7,8           108
2       11             0,1,2,3,4,5,6,7,8,9,10      526
3       12             0,1,2,3,4,5,6,7,8,9,10,11   1044
7 JBIG Early Draft, WG9-S1R2, December 1990
56 Model Templates for Arithmetic Coding
Model templates define a neighborhood around the pixel that is being coded. The values of the pixels within the model template, plus the spatial phase of the high-resolution pixel in the differential layers, specify a context for the arithmetic coder. Each pixel in the template contributes one bit to the context. The arithmetic coder needs to maintain a separate probability estimate for the LPS symbol within each context.
For the lowest-resolution layer, the model template consists of 10 pixels (including the adaptive-template pixel, whose position is allowed to change). This results in 1024 different contexts. For the differential layers, four different templates are used, depending on the spatial phase of the pixel to be encoded. The templates reference pixels from both the high- and low-resolution images. There are ten pixels in each differential-layer template. This, along with the two bits required to specify the spatial phase of the pixel, gives rise to 4096 different possible contexts for the encoding of the differential layers.
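The context formation just described is essentially a bit-packing operation; the sketch below shows one way it could be done. The template geometry is not reproduced here, so the pixel ordering and the example values are illustrative assumptions, not the exact JBIG templates.

```python
def context_index(template_bits, phase=None):
    """Pack the 0/1 values of the template pixels into an integer context index;
    for differential layers, append two bits for the spatial phase."""
    ctx = 0
    for b in template_bits:
        ctx = (ctx << 1) | (b & 1)
    if phase is not None:
        ctx = (ctx << 2) | phase
    return ctx

bits = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]     # ten template pixels (illustrative values)
print(context_index(bits))                 # lowest-resolution layer: 0..1023
print(context_index(bits, phase=3))        # differential layer: 0..4095
```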
57 Model Template Configurations⁸
[Figure: the model template for the lowest-resolution layer (nine fixed pixels plus the adaptive pixel A around the target pixel "?"), and the four differential-layer model templates, one for each spatial phase (0–3).]
8 JBIG Early Draft, WG9-S1R2, December 1990
58 Adaptive Templates (AT)
The various templates used to form the contexts for the arithmetic coder are allowed to change somewhat during the coding process. In particular, the location of a designated pixel (called the AT pixel) within the template is allowed to change according to certain rules to provide a higher predictive value. Generally speaking, the function of the AT is to find a horizontal periodicity in the image and change the template in such a way that the pixel preceding the pixel being encoded by exactly this periodicity is incorporated into the template. The AT block provides substantial coding gain (by as much as a factor of two)⁹ on images containing ordered dither.
The figure on the next page shows the possible locations for the AT pixel in both the lowest-resolution layer and the differential layers. The initial location of the AT pixel is denoted by "0". At any time there can only be one AT pixel in the model template, and its location should be chosen from either the location labeled "0" or the locations labeled by an integer in the range 3 to M, where M is a free parameter. For simplicity, when encoding the differential layers, any change in the AT pixel location takes effect simultaneously for all four spatial phases.
9 JBIG Early Draft, WG9-S1R2, December 1990
59 Adaptive Templates (AT) Locations¹⁰
[Figure: the possible locations (labeled 0 and 3…M) for the AT pixel in the lowest-resolution layer and in the differential layers.]
10 JBIG Early Draft, WG9-S1R2, December 1990
60 LOSSLESS IMAGE COMPRESSION SCHEMES
61 Lossless (Bit-preserving) Encoding
• Lossless predictive coding (e.g., Differential Pulse Code Modulation, DPCM).
• Bit-plane encoding.
• Lossy encoding followed by lossless coding of the difference image.
62 Bit-plane Encoding
Consider an image of size n × m where each pixel has a depth of k bits. This image may be represented by k bit-planes, each of size n × m. Since each bit-plane is a binary image, binary compression techniques can be used to encode each bit-plane separately.
[Figure: the k bits of a pixel value f(i,j) = 10011001, from the most significant bit-plane to the least significant bit-plane.]
63 The Gray Code (GC)
The Gray code has the property that if two pixel values differ by one, their GC representations differ by only one bit. This results in more coherent bit-planes compared with the binary representation, and in more efficient compression. In fact, the coherence of each GC bit-plane is roughly equivalent to that of the next most significant binary bit-plane. A method of generating the GC is
GC = BC ⊕ (BC » 1),
where ⊕ denotes the bit-wise exclusive-OR operation and » denotes the bit-wise logical right-shift operation.
Example:
127: (B) 01111111, (GC) 01000000
128: (B) 10000000, (GC) 11000000
[Figure: the bit-planes (3 = MSB down to 0 = LSB) of the values 0–15 in binary code and in Gray code.]
64 Gray Code and Binary Code Representation of Bit-Planes 7 (MSB) and 6
[Figure]
65 Gray Code and Binary Code Representation
[Figure]
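A minimal sketch of the binary-to-Gray mapping given above and its inverse, reproducing the 127/128 example:

```python
def binary_to_gray(value):
    """GC = BC xor (BC >> 1)."""
    return value ^ (value >> 1)

def gray_to_binary(gray):
    """Invert the mapping: each binary bit is the XOR of all Gray bits above it."""
    value, shift = gray, 1
    while (gray >> shift) > 0:
        value ^= gray >> shift
        shift += 1
    return value

for v in (127, 128):
    print(f"{v}: B = {v:08b}, GC = {binary_to_gray(v):08b}")
# 127: B = 01111111, GC = 01000000
# 128: B = 10000000, GC = 11000000
```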
66 Bit-Plane Entropy Table
LENA (bits/pixel):
Bit-plane     BC entropy (7-pel context)  GC entropy (7-pel context)  GC Q-Coder (7-pel context)
BP 7 (MSB)    0.142                       0.141                       0.144
BP 6          0.273                       0.145                       0.150
BP 5          0.436                       0.227                       0.235
BP 4          0.707                       0.473                       0.490
BP 3          0.918                       0.672                       0.698
BP 2          0.997                       0.918                       0.957
BP 1          0.999                       0.996                       1.039
BP 0 (LSB)    0.999                       0.999                       1.042
Total         5.471                       4.571                       4.755
BOOTS (bits/pixel):
Bit-plane     BC entropy (7-pel context)  GC entropy (7-pel context)  GC Q-Coder (7-pel context)
BP 7 (MSB)    0.182                       0.182                       0.186
BP 6          0.360                       0.222                       0.228
BP 5          0.590                       0.442                       0.454
BP 4          0.771                       0.614                       0.631
BP 3          0.901                       0.783                       0.808
BP 2          0.948                       0.907                       0.938
BP 1          0.953                       0.947                       0.981
BP 0 (LSB)    0.956                       0.952                       0.987
Total         5.661                       5.049                       5.213
• Entropies are for 2-D, 7-pel Markov contexts.
• The Q-Coder is the adaptive binary arithmetic coder developed by IBM.
67 Differential Pulse Code Modulation (DPCM)
[Diagram: previous line contains pixels B, C, D; current line contains pixel A immediately to the left of the current pixel X.]
• The N pixels prior to X are used to form a linear prediction (estimate) of X, denoted by X̂:
X̂ = \sum_{i=0}^{N−1} α_i X_i,
where X̂ is represented by the same number of bits as X and is truncated to the same range (e.g., 0 to 255 for an 8-bit image).
• The error (differential) signal e is formed as
e = X − X̂.
The differential signal typically has a largely reduced variance compared to X, is significantly less correlated than X, and has a stable histogram well approximated by a Laplacian (double-sided exponential) distribution. A global Huffman code table may be designed for its encoding, which results in good performance for a wide variety of images.
68 [Figure]
69 Huffman Coding of the Differential Signal
Typically the number of entries in the Huffman table is large (e.g., ~512). This number may be substantially reduced at only a slight decrease in coding efficiency. All the symbols that represent large differences (and are thus less probable) are combined into one symbol, and the Huffman code associated with it is used as a prefix to the actual binary representation of the pixel value to be encoded.
Modified Huffman Table for LENA:
Difference value   Histogram   Bits   Code
−16                623         9      110111000
−15                729         8      00110010
 ⋮                  ⋮           ⋮       ⋮
−7                 5376        6      110110
−6                 7183        5      01111
−5                 9893        5      11010
−4                 13361       4      0110
−3                 16813       4      1001
−2                 21176       4      1110
−1                 24830       3      000
 0                 25931       3      010
 1                 24366       4      1111
 2                 20556       4      1100
 3                 16639       4      1000
 4                 12578       4      0010
 5                 9387        5      10110
 6                 7134        5      01110
 7                 5190        6      101111
 ⋮                  ⋮           ⋮       ⋮
 15                792         8      00110011
 16                717         9      110111001
|Mag| > 16         9347        5      10101
• Entropy of the difference signal = 4.56 bits/pixel.
• Average length of the Huffman code (512 entries) = 4.60 bits/pixel.
• Average length of the modified Huffman code (34 entries) = 4.78 bits/pixel.
70 Lossless DPCM Results
Description                                        LENA    BOOTS
Zeroth-order image entropy                         7.45    7.49
Zeroth-order differential entropy                  4.56    5.23
Original image standard deviation                  47.94   59.55
Differential image SD                              6.94    13.38
Reduction in variance                              48      20
Huffman Coding
Local Huffman Code (LHC)                           4.60    5.26
Modified LHC (−64,64)                              4.60    5.27
Modified LHC (−32,32)                              4.62    5.35
Modified LHC (−16,16)                              4.69    5.51
Modified LHC (−8,8)                                4.96    5.76
Global Huffman Code (GHC)                          4.67    5.34
Modified GHC (−64,64)                              4.67    5.35
Modified GHC (−32,32)                              4.67    5.43
Modified GHC (−16,16)                              4.72    5.61
Modified GHC (−8,8)                                5.09    5.81
Arithmetic Coding
Context-based differential entropy (45 contexts)   4.45    4.94
IBM Q-coder bit rate                               4.60    4.98
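A small sketch of how the differential-entropy figures in the table above could be measured: form the prediction error with the 2-D predictor 0.75A − 0.5B + 0.75C and compute its zeroth-order entropy. The image array is left to the reader; only the predictor coefficients come from the text, the rest is an assumption of the sketch.

```python
import numpy as np

def dpcm_residual_entropy(img, a=0.75, b=-0.5, c=0.75):
    """Zeroth-order entropy of the prediction error for the 2-D predictor
    Xhat = a*A + b*B + c*C (A = left, B = upper-left, C = above)."""
    img = img.astype(np.int32)
    A = img[1:, :-1]      # left neighbor
    B = img[:-1, :-1]     # upper-left neighbor
    C = img[:-1, 1:]      # above neighbor
    X = img[1:, 1:]
    pred = np.clip(np.round(a * A + b * B + c * C), 0, 255)
    err = (X - pred).ravel()
    _, counts = np.unique(err, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Usage on any 8-bit grayscale array `img`; for LENA this comes out near the
# 4.56 bits/pixel zeroth-order differential entropy quoted above.
# print(dpcm_residual_entropy(img))
```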
71 QUANTIZATION STRATEGIES
72 Quantization
A quantizer is essentially a staircase function that maps the possible input values into a smaller number of output levels. For the purposes of image compression, quantization reduces the number of symbols that need to be encoded at the expense of introducing errors in the reconstructed image, i.e., it acts as a control knob that trades off image quality against bit rate. The individual quantization of each signal value is called scalar quantization (SQ), while the joint quantization of a block of signal values is called vector quantization (VQ).
• Scalar quantization (SQ)
  − Uniform quantizer
  − Lloyd-Max or MMSE quantizer
  − Entropy-constrained quantizer
• Trellis-coded quantization (TCQ)
• Vector quantization (VQ)
73 Uniform Scalar Quantizer
An N-level scalar quantizer maps the signal x into a discrete variable x* that belongs to a finite set {r_i, i = 0, …, N − 1} of real numbers referred to as reconstruction levels or quantizer output levels. The ranges of values of x that map to a particular x* are defined by a set of points {d_i, i = 0, …, N}, referred to as decision levels. The quantization rule states that if x lies in the interval (d_i, d_{i+1}], it is mapped (quantized) to r_i, which also lies in the same interval. The simplest form of a scalar quantizer is a uniform quantizer, in which the quantizer decision intervals are all of equal length, referred to as the quantizer step size.
[Figure: the quantizer staircase function, showing the decision levels, the reconstruction levels, and the step size.]
74 Optimum MMSE Nonuniform Quantizer
Consider a signal x with a probability density function (pdf) p(x). In some applications, it is desirable to find the quantizer that minimizes the quantization distortion for a given number N of quantizer output levels. For a minimum mean-square error (MMSE) distortion criterion, this quantizer is known as the Lloyd-Max quantizer.
[Figure: the pdf p(x), with the decision levels d_i and d_{i+1} and the reconstruction level r_i inside the interval.]
75 Lloyd-Max Quantizer
For a given number of quantizer output levels (reconstruction levels) N and a signal pdf p(x), the optimum MMSE quantizer has reconstruction levels r_i and decision levels d_i that minimize
ε = \sum_{i=1}^{N} \int_{d_i}^{d_{i+1}} (x − r_i)^2 p(x) dx.
The solution to the above minimization problem is
d_j = (r_{j−1} + r_j) / 2,
r_j = \int_{d_j}^{d_{j+1}} x p(x) dx / \int_{d_j}^{d_{j+1}} p(x) dx.
Each decision level is halfway between the two neighboring reconstruction levels, and each reconstruction level is at the centroid of the portion of the pdf enclosed by the decision region, i.e., the conditional mean of the input signal in the given interval.
Lloyd iterative design algorithm: An initial set of values for {d_i} is chosen; the optimum {r_i} for the given set of {d_i} are solved for; the optimum {d_i} for the calculated {r_i} are then determined; the iterations proceed until satisfactory convergence is achieved.
Max, IRE Trans. Info. Theory, Vol. IT-6, p. 7, 1960.
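A minimal data-driven sketch of the Lloyd iteration just described, run on samples rather than on an analytic pdf (the Laplacian test data and the 8-level choice are assumptions for illustration):

```python
import numpy as np

def lloyd_max(samples, n_levels, iters=50):
    """Lloyd iteration: decision levels are midpoints between reconstruction
    levels; reconstruction levels are the centroids (means) of the samples
    falling inside each decision interval."""
    r = np.quantile(samples, np.linspace(0.05, 0.95, n_levels))   # initial levels
    for _ in range(iters):
        d = np.concatenate(([-np.inf], (r[:-1] + r[1:]) / 2, [np.inf]))
        cell = np.searchsorted(d, samples, side="right") - 1
        for j in range(n_levels):
            members = samples[cell == j]
            if members.size:
                r[j] = members.mean()
    return r

# Example: an 8-level quantizer for a Laplacian-like differential signal.
rng = np.random.default_rng(0)
x = rng.laplace(scale=7.0, size=100_000)
print(np.round(lloyd_max(x, 8), 1))
```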
76 Entropy-Constrained Quantizer
The Lloyd-Max quantizer aims at minimizing the MSE for a given number N of output quantization levels, and is useful when the output levels are coded with fixed-length codes. When variable-length coding is permitted, the final bit rate is determined by the entropy of the quantizer levels rather than by the number of levels. For example, a uniform quantizer may have a lower entropy, despite a larger number of output levels, compared to a Lloyd-Max quantizer operating at the same distortion. This leads to another criterion for quantizer design, known as entropy-constrained quantization, where the quantization error is minimized subject to the constraint that the entropy of the quantizer output levels has a prescribed value.
77 Entropy-Constrained Quantizer (ECQ)
• It has been theoretically established that for memoryless sources and at high output bit rates, the optimal entropy-constrained scalar quantizer is a uniform quantizer. With mild restrictions, these results are applicable to all pdfs and distortion measures of practical interest at high bit rates.
• For the MSE distortion, the output bit rate of the optimal ECQ is within 0.255 bits/sample of the theoretical rate-distortion bound, which is achievable by VQ only in the limit of very large block sizes.
• It has been experimentally shown that even at low bit rates, for the MSE distortion and for a variety of memoryless sources, the performance of the uniform quantizer is virtually indistinguishable from that of the optimal ECQ and, interestingly, it is even closer to the rate-distortion bound than 0.255 bits, especially for more peaked pdfs such as the Laplacian and gamma distributions.
• Based on the above results, a uniform scalar quantizer followed by entropy coding should be considered when: i) the source is memoryless; ii) the bit rate is moderate to large; and iii) variable-length coding of the quantizer levels is permitted.
Makhoul et al., Proc. IEEE, Vol. 73, p. 1551, 1985.
78 Vector Quantization (VQ)
In VQ, an n-dimensional input vector X = [x1, x2, …, xn], whose components represent the discrete or continuous signal values, is mapped (quantized) into one of Nc possible reconstruction vectors Y_i, i = 1, 2, …, Nc. The distortion in approximating (quantizing) X with Y_i is denoted by d(X, Y_i) and, for an MSE distortion measure, is defined as
d_MSE(X, Y) = (1/n) \sum_{i=1}^{n} (x_i − y_i)^2.
The set Y is referred to as the reconstruction codebook and its members are called codevectors or templates. The codebook design problem is to find the optimal codebook for a given input statistics, distortion measure, and codebook size Nc. Given the codebook, the quantization process (the determination of the decision regions) is straightforward and is based on the minimum distortion rule: an input vector that needs to be quantized is compared to all the entries in the codebook and is quantized to the codevector that results in the smallest distortion.
An important theoretical result states that as n → ∞, the quantizer output distortion can get arbitrarily close to the rate-distortion bound. Furthermore, for a given codebook size Nc and for very large n, the quantizer output entropy approaches log2 Nc, i.e., the need for entropy coding is eliminated.
Gray, IEEE ASSP Magazine, p. 4, April 1984
79 VQ Example
[Figure]
80 Codebook Generation (LBG Algorithm)
1. Initialization: Given Nc = codebook size, ε = distortion threshold, n = vector dimension, and A0 = {Y_k, k = 1, …, Nc} as an initial codebook, set m = 0 and D_{−1} = ∞.
2. Given the codebook A_m, find its minimum distortion partition P(A_m) = {S_i, i = 1, …, Nc}, such that X ∈ S_i if d(X, Y_i) ≤ d(X, Y_j) for all j.
3. Compute the resulting average distortion D_m; if (D_{m−1} − D_m) / D_m ≤ ε, stop.
4. Find the optimum codebook A_{m+1} = {Y_k, k = 1, …, Nc} by calculating the centroids of the partition P(A_m); set m ← m + 1; go to step 2.
Linde et al., IEEE Trans. Communications, Vol. COM-28, p. 84, 1980.
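A compact sketch of the LBG iteration above for the MSE distortion, operating on a set of training vectors (the random initial codebook and the 2 × 2-block usage example are assumptions):

```python
import numpy as np

def lbg(train, codebook_size, eps=1e-3, seed=0):
    """Generalized Lloyd (LBG) codebook design for the MSE distortion.
    train: (num_vectors, n) array of training vectors."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), codebook_size, replace=False)].astype(float)
    prev_dist = np.inf
    while True:
        # Minimum-distortion partition: assign each vector to its nearest codevector.
        d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        dist = d2[np.arange(len(train)), nearest].mean()
        if np.isfinite(prev_dist) and (prev_dist - dist) / dist <= eps:
            return codebook
        prev_dist = dist
        # New codevectors are the centroids of the partition cells.
        for k in range(codebook_size):
            members = train[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)

# Example: a 16-entry codebook for 2x2 image blocks (4-dimensional vectors).
# blocks = image_blocks.reshape(-1, 4).astype(float)
# codebook = lbg(blocks, 16)
```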
81 Tree-Search VQ
• In finding the minimum-distortion codevector, a full search of the codebook can be performed at a computational cost of O(n·Nc). The storage cost is n·Nc.
• If a tree structure is imposed on the codebook, where each node has m branches and there are p = log_m Nc levels in the tree, the computational cost reduces to O(n·m·log_m Nc), but the storage cost increases to n·m·(Nc − 1)/(m − 1). The codebook is designed by successive applications of the LBG algorithm, at each node, for a codebook of size m.
• Example: A binary tree (m = 2) for Nc = 8.
[Figure]
Gray, IEEE ASSP Magazine, p. 4, April 1984.
82 Trellis-Coded Quantization (TCQ)
This technique combines linear error-correcting codes and low-complexity scalar quantizers to achieve a performance similar to VQ, but with significantly less complexity. TCQ can effectively close the 0.255-bit/sample gap that exists between the best scalar and vector quantizers.
[Figure]
83 LOSSY IMAGE COMPRESSION SCHEMES
84 Lossy Compression Schemes
• Predictive Coding (DPCM, Photo CD)
• Transform Coding (DCT and the JPEG international standard)
• Subband and Wavelet Coding
• Fractal Compression
• Vector Quantization
• Progressive Transmission
85 DIFFERENTIAL PULSE CODE MODULATION (DPCM)
86 Differential Pulse Code Modulation (DPCM)
[Diagram: previous line contains pixels B, C, D; current line contains pixel A immediately to the left of the current pixel X.]
• The N pixels prior to X are used to form a linear prediction (estimate) of X, denoted by X̂:
X̂ = \sum_{i=0}^{N−1} a_i X_i.
• The error (differential) signal e is formed as
e = X − X̂.
• The differential signal e is quantized and encoded.
The design of a DPCM system consists of two parts:
1. Predictor optimization (finding the optimum set of a_i).
2. Quantizer optimization.
87 DPCM Example
Predictor: X̂ = 0.75A − 0.5B + 0.75C
Original image variance = 2300, differential image variance = 48, reduction in variance = 2300/48 ≈ 48.
i   (d_i, d_{i+1}] → r_i   Probability   Huffman code
0   (−255, −16] → −20      0.025         111111
1   (−16, −8]   → −11      0.047         11110
2   (−8, −4]    → −6       0.145         110
3   (−4, 0]     → −2       0.278         00
4   (0, 4]      → 2        0.283         10
5   (4, 8]      → 6        0.151         01
6   (8, 16]     → 11       0.049         1110
7   (16, 255]   → 20       0.022         111110
• Fixed-length binary code average bit rate = 3 bits/symbol
• Huffman code average bit rate = 2.57 bits/symbol
• Entropy H(S) = 2.52 bits/symbol
88 DPCM Example
Predictor: X̂ = 0.5A + 0.5C
Original image:          Reconstructed image:
100 115 107 110          100 115 107 110
105 120  45   ·          105 121  94   ·
     X1   X2                  X1R  X2R
Encoding pixel X1:
Prediction: X̂1 = (105 + 115)/2 = 110
Error signal: e1 = X1 − X̂1 = 120 − 110 = 10
Quantized error signal: e1* = 11
Reconstructed pixel value: X1R = X̂1 + e1* = 110 + 11 = 121
Encoding pixel X2:
Prediction: X̂2 = (121 + 107)/2 = 114
Error signal: e2 = X2 − X̂2 = 45 − 114 = −69
Quantized error signal: e2* = −20
Reconstructed pixel value: X2R = X̂2 + e2* = 114 − 20 = 94
89 Predictor Optimization
[Figure]
Habibi, IEEE Trans. Communications Technology, Vol. COM-19, p. 948, 1971.
90 Predictor Optimization
[Diagram: neighbors B, C, D on the previous line and A on the current line, around the current pixel X.]
• The stationarity assumption may not hold for some images. Further, optimizing the predictor coefficients for each image may not be feasible for many real-time applications. Instead, a set of typical predictor values can be used which result in good performance for most images. Examples are:
X̂ = A, or B, or C          (1st-order, 1-D predictor)
X̂ = ½A + ½C               (2nd-order, 2-D predictor)
X̂ = 0.75A − 0.5B + 0.75C   (3rd-order, 2-D predictor)
X̂ = A − B + C              (plane predictor)
• Two-dimensional prediction usually results in a large performance improvement compared to 1-D prediction. Although the improvement in signal-to-noise ratio (SNR) is only around 3 dB, the subjective quality improvement of the encoded images is usually greater. This is mainly due to the elimination of the jaggedness around non-horizontal edges. Two-dimensional prediction, however, requires buffering of the previous line.
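A small sketch of the DPCM encoding loop illustrated above; for brevity it uses a 1-D previous-reconstructed-pixel predictor rather than the 2-D predictor of the example, and the quantizer is the 8-level table from the first DPCM example.

```python
import numpy as np

# Quantizer from the DPCM example above: decision intervals -> reconstruction levels.
DECISIONS = [-255, -16, -8, -4, 0, 4, 8, 16, 255]
LEVELS    = [-20, -11, -6, -2, 2, 6, 11, 20]

def quantize(e):
    for lo, hi, r in zip(DECISIONS[:-1], DECISIONS[1:], LEVELS):
        if lo < e <= hi:
            return r
    return LEVELS[0] if e <= DECISIONS[0] else LEVELS[-1]

def dpcm_encode_row(row, left0=128):
    """1-D DPCM: predict each pixel from the previously *reconstructed* pixel,
    quantize the error, and track the reconstruction exactly as the decoder will."""
    recon_prev, out = left0, []
    for x in row:
        eq = quantize(int(x) - recon_prev)
        out.append(eq)
        recon_prev = int(np.clip(recon_prev + eq, 0, 255))
    return out

print(dpcm_encode_row([120, 45, 110], left0=110))
```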
91 DPCM Artifacts
[Figure]
Netravali, Proceedings of the IEEE, Vol. 68, p. 366, 1980
92 Adaptive Image Compression
A compression scheme is adaptive if the structure of the encoder or the parameters used in encoding vary locally within an image.
1. Causal Adaptivity: Adaptivity at the encoder is based only on the previously reconstructed values and is fully repeatable at the decoder. The main advantage is that encoding does not require overhead bits. The disadvantages are: (a) the encoder fails to adapt to abrupt changes in the input statistics that cannot be inferred from previously reconstructed values; (b) the decoder must be as complex as the encoder, since it has to repeat the same decision-making process.
2. Noncausal Adaptivity: Adaptivity at the encoder is based on previously reconstructed values as well as on future input values. Since the latter are not available at the decoder, the encoder must provide additional overhead bits to inform the decoder of the type of adaptivity selected. Systems of this kind usually display superior performance and require less decoder complexity, but at the expense of extra overhead bits.
93 Adaptive Prediction in DPCM
[Diagram: neighbors B, C, D on the previous line and A on the current line, around the current pixel X.]
• Determine contour lines and use only the pels along contours to form the prediction.
• Scale the prediction in high-contrast areas based on previous reconstructions.
• Median Predictor: This predictor, which is used in the new JPEG lossless proposal, first performs a simple test to detect horizontal or vertical edges. If an edge is not detected, then the predicted value is A + C − B, as this would be the value of the missing pixel if a plane were passed through the A, B, and C sample locations. If a vertical edge is detected, C is used for the prediction, while for a horizontal edge, A is used:
X̂ = min(A, C)   if B ≥ max(A, C)
X̂ = max(A, C)   if B ≤ min(A, C)
X̂ = A + C − B   otherwise
94 Adaptive Quantization
• Scale the quantizer according to the quantized value of the previous differential signal.
• Block the scan line and, for each block, switch among a finite number of quantizers based on the block variance.
• Choose among a finite number of quantizers for each image block according to the RMS error resulting at the encoder¹²:
— Partition each scan line into blocks of N pixels.
— Encode the block using each of the M quantizers.
— Measure the distortion resulting from each quantizer.
— Select the quantizer with minimum distortion.
— Transmit overhead information to identify the quantizer to the receiver.
— Transmit the encoded signal for the block.
12 Habibi, IEEE Trans. Communications, Vol. COM-26, p. 1671, 1978.
95 Block Diagram of Switched Quantizer
[Figure]
96 DPCM Bitrate/Error Table (LENA)
Technique   Bitrate (bits/pixel)   Entropy (bits/pixel)   RMSE (0–255)   SNR (dB)
1-D DPCM    1.00                   1.00                   18.67          22.71
1-D DPCM    2.00                   1.65                   9.44           28.63
1-D DPCM    3.00                   2.39                   5.11           33.96
2-D DPCM    1.00                   1.00                   14.58          24.86
2-D DPCM    2.00                   1.72                   6.93           31.32
2-D DPCM    3.00                   2.52                   3.71           36.74
1-D ADPCM   1.20                   1.17                   10.91          27.37
1-D ADPCM   2.20                   1.92                   4.37           35.32
1-D ADPCM   3.20                   2.78                   2.16           41.44
2-D ADPCM   1.20                   1.18                   7.84           30.24
2-D ADPCM   2.20                   2.03                   2.87           38.97
2-D ADPCM   3.20                   2.91                   1.37           45.40
• The predictor for 1-D DPCM and ADPCM is X̂ = .97A. The predictor for 2-D DPCM and ADPCM is X̂ = .75A − .50B + .75C. The quantizers are Lloyd-Max quantizers for a Laplacian distribution with the same variance as the differential signal. The frequency of usage of each quantizer level is used as its probability to calculate the entropy.
• The ADPCM scheme switches among 4 quantizers based on a minimum mean-square-error (MMSE) criterion on a block size of 10 pixels. The resulting overhead is .2 bits/pixel. The quantizers are scaled versions of the Max quantizer for the global differential signal, with scaling factors of .50, 1.00, 1.75, and 2.50.
• SNR is defined as 10 log10(255²/mse) dB.
The resulting overhead is .2 bits/pixel. Quantizers are scaled versions of the Max quantizer for the global differential signal with scaling factors of .50, 1.00, 1.75, and 2.50.
• SNR is defined as 10 log10(255²/mse) dB. 97
DPCM Bitrate/Error Table (BOOTS)
Technique    Bitrate (bits/pixel)   Entropy (bits/pixel)   RMSE (0-255)   SNR (dB)
1-D DPCM     1.00                   1.00                   22.79          20.98
1-D DPCM     2.00                   1.71                   10.96          27.33
1-D DPCM     3.00                   2.41                   5.33           33.60
2-D DPCM     1.00                   1.00                   15.91          24.10
2-D DPCM     2.00                   1.69                   7.70           30.40
2-D DPCM     3.00                   2.41                   4.01           36.07
1-D ADPCM    1.20                   1.19                   14.12          25.13
1-D ADPCM    2.20                   1.89                   5.97           32.61
1-D ADPCM    3.20                   2.74                   3.16           38.14
2-D ADPCM    1.20                   1.18                   10.20          27.96
2-D ADPCM    2.20                   1.89                   4.75           34.60
2-D ADPCM    3.20                   2.74                   2.56           39.97
• Predictor for 1-D DPCM and ADPCM is X̂ = .97A. Predictor for 2-D DPCM and ADPCM is X̂ = .75A − .50B + .75C. Quantizers are Lloyd-Max for a Laplacian distribution with the same variance as the differential signal. The frequency of usage of each quantizer level is used as its probability to calculate the entropy.
• The ADPCM scheme switches among 4 quantizers based on a minimum mean-square-error (MMSE) criterion on a block size of 10 pixels. The resulting overhead is .2 bits/pixel. Quantizers are scaled versions of the Max quantizer for the global differential signal with scaling factors of .50, 1.00, 1.75, and 2.50.
• SNR is defined as 10 log10(255²/mse) dB. 98
TRANSFORM CODING AND THE JPEG INTERNATIONAL STANDARD 99
Transform Coding
• The original image is first divided into a number of smaller subimages or blocks, which are usually processed independently of one another. A reversible transformation is performed on each block to reduce the correlation between pixels. The transform coefficients are then quantized and coded.
• Example: Consider a simple 1-D transform encoder operating on 1 × 2 subimages. The vector x = (x1, x2) represents two adjacent picture elements, and y = (y1, y2) denotes the transform, which in this case is a 45° rotation of the axes. A scatter plot of the values of x1 and x2 demonstrates that large values of x1 − x2 are unlikely. Although σ²x1 = σ²x2, σ²y2 << σ²y1. The transformation only redistributes the energy (variance) while keeping the total energy unchanged. 100
Image Transforms13
• Karhunen-Loève Transform (KLT): It is the optimum transform in an energy-packing sense, i.e., if a mean-square (energy) criterion is employed. However, it is image-dependent, requires estimation of the image covariance matrix, and cannot be implemented by a fast algorithm (it takes O(N²) operations to transform 1-D, N-point data).
• Discrete Fourier Transform (DFT): Well known for its connection to spectral analysis and filtering. Extensive study has been done on its fast implementation (O(N log2 N) for an N-point DFT). It has the disadvantage of requiring storage and manipulation of complex quantities, and it creates spurious spectral components due to the assumed periodicity of image blocks.
• Discrete Cosine Transform (DCT): For conventional image data having reasonably high inter-element correlation, its performance is virtually indistinguishable from the KLT. It avoids the generation of the spurious spectral components that are a problem with the DFT and has a fast implementation which avoids complex algebra.
• Walsh-Hadamard Transform (WHT): Although far from optimum in an energy-packing sense for typical imagery, its simple implementation (basis functions are either −1 or +1) has made it widely popular.
• Haar Transform (HT): Not very popular due to its local sensitivity to image detail.
13 Pratt, Digital Image Processing, Wiley, 1978.
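The 1 × 2 example above can be checked numerically. The short sketch below (with made-up, synthetic data) applies the 45° rotation to correlated pixel pairs and confirms that the total variance is unchanged while almost all of it moves into the first transform coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(128.0, 30.0, 100_000)      # hypothetical pixel values
x2 = x1 + rng.normal(0.0, 5.0, 100_000)    # highly correlated right-hand neighbors

y1 = (x1 + x2) / np.sqrt(2)                # 45-degree rotation: "sum" coordinate
y2 = (x2 - x1) / np.sqrt(2)                # "difference" coordinate

print(np.var(x1) + np.var(x2))             # total energy before the transform ...
print(np.var(y1) + np.var(y2))             # ... is preserved by the rotation,
print(np.var(y1), np.var(y2))              # but var(y2) << var(y1)
```

Because the rotation is orthonormal, no information is lost; the gain comes from being able to quantize y2 much more coarsely than y1.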
101
Discrete Cosine Transform (DCT)
F(u,v) = (4 C(u) C(v) / n²) Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} f(j,k) cos[(2j+1)uπ / 2n] cos[(2k+1)vπ / 2n]
f(j,k) = Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} C(u) C(v) F(u,v) cos[(2j+1)uπ / 2n] cos[(2k+1)vπ / 2n]
C(w) = 1/√2 for w = 0; C(w) = 1 for w = 1, 2, …, n − 1. 102
The JPEG Standard “Toolkit”
The JPEG standard provides a “toolkit” of techniques for the compression of continuous-tone, still, color and monochrome images from which applications can select the elements that satisfy their particular requirements:
• The baseline system provides a simple and efficient DCT-based algorithm adequate for most image compression applications. It uses Huffman coding, operates only in sequential mode, and is restricted to 8 bits/sample source image precision.
• The extended system enhances the baseline system to satisfy a broader range of applications. These optional features include:
— 12-bit/sample input
— Sequential and/or hierarchical progressive build-up
— Arithmetic coding
— Adaptive quantization
— Tiling
— Still picture interchange file format (SPIFF)
— Selective refinement
• A simple DPCM-based lossless method independent of the DCT (using either Huffman or arithmetic coding) to accommodate applications requiring lossless compression. 103
The JPEG Baseline Algorithm
• The original image is partitioned into 8 × 8 pixel blocks. In order to facilitate the implementation of the DCT, for a b-bit image, the value 2^(b−1) (e.g., 128 for an 8-bit image) is subtracted from each pixel value. Each block is independently transformed using the forward DCT.
• All the transform coefficients are normalized (weighted) according to a user-defined normalization array that is fixed for all blocks. The objective is to weight each coefficient according to its perceptual importance. The human visual system (HVS) contrast sensitivity function can be used to determine this array. Each component of the normalization array is an 8-bit integer and is passed to the receiver as part of the header information. Up to four normalization arrays can be specified (e.g., different normalization arrays may be used for the luminance and chrominance components of a color image). The bit rate of an encoded image can be conveniently varied by changing this array, e.g., by scaling the array by a constant factor. The normalized coefficients are then uniformly quantized by rounding to the nearest integer. The normalization array effectively scales the quantizer.
• The top-left coefficient in the 2-D DCT array is referred to as the DC coefficient and is proportional to the average brightness of the spatial block. The difference between the quantized DC coefficient of the current block and the quantized DC coefficient of the previous block is variable-length coded. This difference is first assigned to one of twelve categories (k = 0, …, 11), where category 0 contains only the value 0 and the values in category k ≥ 1 lie in the range [2^(k−1), 2^k − 1] or [−2^k + 1, −2^(k−1)]. A set of Huffman codes with a maximum codeword length of 16 bits is used to specify the different categories. For each category, it is necessary to send an additional k bits to completely specify the sign and magnitude of a difference value within that category. For the baseline system, up to two separate DC Huffman tables can be specified in the header information. 104
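To tie these steps together, here is a minimal sketch (not from the original notes) of the per-block processing just described: level shift, 8 × 8 forward DCT, normalization by the user-defined array, and rounding, plus the matching decoder steps. It uses the orthonormal 8 × 8 DCT convention common in JPEG implementations; the quantization table in the usage comment is a flat placeholder, not the recommended luminance table.

```python
import numpy as np

N = 8
C = np.array([1 / np.sqrt(2)] + [1.0] * (N - 1))
# Orthonormal 8x8 DCT basis: row u of B holds the u-th cosine basis vector.
B = np.array([[np.sqrt(2.0 / N) * C[u] * np.cos((2 * j + 1) * u * np.pi / (2 * N))
               for j in range(N)] for u in range(N)])

def encode_block(block, qtable, bitdepth=8):
    """Level shift, forward DCT, normalize by qtable, round to nearest integer."""
    shifted = block.astype(float) - 2 ** (bitdepth - 1)   # e.g., subtract 128
    F = B @ shifted @ B.T                                  # 2-D DCT; F[0, 0] is the DC term
    return np.rint(F / qtable).astype(int)

def decode_block(Fq, qtable, bitdepth=8):
    """Denormalize, inverse DCT, add the level shift back."""
    rec = B.T @ (Fq * qtable) @ B                          # B is orthonormal, so B^-1 = B^T
    return np.rint(rec + 2 ** (bitdepth - 1)).astype(int)

# Example call with a flat placeholder table (step size 16 everywhere):
# Fq = encode_block(block8x8, np.full((N, N), 16.0))
```

Varying qtable (or scaling it by a constant) is exactly the rate/quality control knob described above; everything after this point (zigzag, run-length, Huffman coding) operates on the integer array Fq.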
The JPEG Baseline Algorithm
• The quantization of AC coefficients creates many zeros, especially at higher frequencies. The 2-D array of the DCT coefficients is formatted into a 1-D vector using a zigzag reordering. This rearranges the coefficients in approximately decreasing order of their average energy (as well as in order of increasing spatial frequency), with the aim of creating large runs of zero values.
• To encode the AC coefficients, each nonzero coefficient is first described by a composite 8-bit value (0 → 255), denoted by I, of the form (in binary notation):
I = ’NNNNSSSS’
The four least significant bits, ‘SSSS’, define a category k for the coefficient amplitude in a way similar to the DC difference values. For the baseline system, 1 ≤ k ≤ 10. The four most significant bits in the composite value, ’NNNN’, give the position of the current coefficient relative to the previous nonzero coefficient, i.e., the runlength of zero coefficients between nonzero coefficients. The runlengths specified by ’NNNN’ can range from 0 to 15, and a separate symbol, I = 240, is defined to represent a runlength of 16 zero coefficients. If the runlength exceeds 16 zero coefficients, it is coded by using multiple symbols. In addition, a special symbol, I = 0, is used to code the end of block (EOB), which signals that all the remaining coefficients in the block are zero. Therefore, the total symbol set contains 162 members (10 categories × 16 runlength values + 2 additional symbols). The output symbols for each block are then Huffman coded (maximum codeword length of 16) and are followed by the additional bits (assumed to be uniformly distributed and thus fixed-length coded) required to specify the sign and exact amplitude of the coefficient in each of the categories. Up to two separate AC Huffman tables can be specified in the baseline system. 105
The JPEG Baseline Algorithm
• Each component of a color (or multi-spectral) image is encoded independently. For example, an image represented in the YIQ luminance and chrominance space is encoded essentially as three separate images.
• At the decoder, after the encoded bit stream is Huffman decoded and the 2-dimensional array of the quantized DCT coefficients is recovered, each coefficient is denormalized, i.e., multiplied by the corresponding component of the normalization matrix. The resultant array is inverse DCT transformed, and the value 2^(b−1), which was subtracted prior to the forward DCT, is added back to each pixel value. This yields an approximation to the original image block. The resulting error depends on the amount of quantization, which is controlled by the normalization matrix. 106
Encoder Simplified Block Diagram
[Figure: DCT-based compression encoder: the source image data is split into 8×8 blocks and passed through the FDCT, the quantizer (driven by the quantization tables), and the entropy encoder (driven by the Huffman tables), producing the header data and the compressed image data.] 107
Decoder Simplified Block Diagram
[Figure: DCT-based compression decoder: the header data and compressed image data are passed through the entropy decoder (Huffman tables), the dequantizer (quantization tables), and the IDCT to produce the reconstructed image data.] 108
Original 8 × 8 Image Block
The following example represents an 8 × 8 block of the original image LENA.
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158 109
Mean Subtracted Image Block
The value of 128 is subtracted from each pixel prior to the application of the discrete cosine transform (DCT).
This places the DC coefficient (the top-left corner coefficient that is proportional to the average brightness of the block) in the range (–1024, +1016). 11 16 22 31 31 33 34 34 16 23 27 33 32 33 34 34 21 25 32 34 33 33 33 33 25 28 35 32 34 33 35 33 110 27 31 30 32 34 32 34 35 27 28 28 31 27 29 29 30 27 28 28 31 27 29 29 30 27 28 28 31 27 29 29 30 DCTof 8 × 8 Image Block The JPEG forward DCT is defined as 4C ( u ) C ( v ) 7 F (u , v ) = ∑ n2 j =0 12 c (w) = 1 ( 2 j + 1)uπ ( 2k + 1)vπ cos 16 16 7 ∑ f ( j , k ) cos k =0 forw = 0 otherwise . 235.6 − 1.0 − 12.1 − 22.6 − 17.5 − 6.2 − 10.9 − 9.3 − 1.6 − 7.1 − 1.9 0.2 − 0.6 − 0.8 1.5 − 0.2 1.6 1.8 − 1.3 − 0.4 − 0.3 1.6 − 3.8 − 2.6 − 5.2 − 3.2 1.5 1.5 1.6 − 0.3 − 1.5 − 1.8 111 2.1 − 2.9 0.2 0.9 − 0.1 − 0.8 − 0.5 1.9 − 1.7 − 2.7 − 0.1 0.4 − 0.9 − 0.6 − 0.1 0.0 − 0.7 0.6 1.5 1.0 1.7 1.1 1.2 − 0.6 1.3 − 1.2 − 0.1 0.3 1.3 − 1.0 − 0.8 − 0.4 Normalization Matrix The DCT coefficients are scaled using a user-defined normalization array that is fixed for all blocks. For the baseline system, four different normalization arrays are allowed (e.g., to accommodate luminance and chrominance components of an image). Each component of the normalization array, Q(u,v), is an 8-bit integer that in effect determines the quantization step size. The quality and bit rate of an encoded image can be varied by changing this array. The normalization matrix may be designed according to the perceptual importance of the DCT coefficients under the intended viewing conditions. The following is an example obtained by measuring the DCT coefficient “visibility threshold” using CCIR-601 images and display: 16 12 14 14 18 24 49 72 11 12 13 17 22 35 64 92 10 14 16 22 37 55 78 95 16 24 40 19 26 58 24 40 57 29 51 87 56 68 109 64 81 104 87 103 121 98 112 100 112 51 60 69 80 103 113 120 103 61 55 56 62 77 92 101 99 Quantization Example Quantization step size = 20 DCT unquantized coefficient = 13.56 Scaled coefficient = 13.56 = 0.68 20 Scaled, quantized (rounded) coefficient = 1 Dequantized coefficient (Decoder) = 1 × 20 = 20 113 Normalized/Quantized Coefficients After normalization, the coefficients are quantized by rounding off to the nearest integer: F (u , v) F * (u , v ) = NINT Q ( u , v ) The normalization/quantization process typically results in many zero-valued coefficients which can be coded efficiently. 15 − 2 −1 0 0 0 0 0 0 −1 −1 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Zigzag Scan Prior to the entropy encoding, the 2-D coefficients are scanned into a 1-D format according to the following zigzag pattern. This rearranges the coefficients in approximately decreasing order of their average energy (as well as approximately in order of increasing spatial frequency) with the aim of creating large runs of zero values. 0 2 3 9 10 20 21 35 1 5 4 7 8 12 11 18 19 23 22 33 34 37 36 48 6 13 17 24 32 38 47 49 14 16 25 31 39 46 50 57 115 15 26 30 40 45 51 56 58 27 29 41 44 52 55 59 62 28 42 43 53 54 60 61 63 Huffman Coding Example Quantized DCT AC coefficients in zigzag order: 15 0 –2 –1 –1 –1 0 0 –1 All Zeros. Huffman Symbols: (Zero run, Category) DC Coefficient (1,2) (0,1) (0,1) (0,1) (2,1) EOB. Codeword Assignment Final codeword = (Huffman codeword) append (k additional bits for category k coefficient) –2 preceded by one zero: (k=2,run=1) 111001 append 0 (sign bit) append 1 = 11100101. Huffman code for the block: DC diff. codeword/ 11100101/ 000/ 000/ 000/ 110110/ 1010. 
Total bits assuming 8 bits for the DC difference codeword= 35, 35 Bit rate = = 0.55 bits/pixel 64 116 JPEG AC Huffman Table Example Zero Run Category Codelenght Codeword 0 2 1 00 0 2 2 01 0 3 3 100 0 4 4 1011 0 5 5 11010 0 6 6 111000 0 7 7 1111000 . . . . 1 4 1 1100 1 6 2 111001 1 7 3 1111001 1 9 4 111110110 . . . . 11011 2 5 1 11111000 2 8 2 . . . . 111011 3 6 1 111110111 3 9 2 . . . . 111011 4 6 1 1111010 5 7 1 1111011 6 7 1 11111001 7 8 1 11111010 8 8 1 111111000 9 9 1 111111001 10 9 1 111111010 11 9 1 . . . . . . . . 1010 4 End of Block (EOB) 117 Zigzag Scan and Coefficient Categories Zig-zag scan AC ordering category Coeffcient Rang 0 1 2 3 4 5 6 7 8 9 10 11 0 −1,1 −3, −2, 2, 3 −7,…, −4, 4, …,7 −15,…, −8, 8, …,15 −31,…, −16, 16, …,31 −63,…, −32, 32, …,63 −127,…, −64, 64, …,127 −255,…, −128, 128, …,255 −511,…, −256, 256, …,511 −1023,…, −512, 512, …,1023 −2047,…, −1024, 1024, ,2047 118 Denormalized DCT Coefficients At the decoder, the quantized DCT coefficients are dequantized (denormalized) by multiplying by the normalization matrix. 0 240 − 24 − 12 − 14 − 13 0 0 0 0 0 0 0 0 0 0 − 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Reconstructed 8 × 8 Image Block The dequantized DCT coefficients are inverse transformed according to 1 fˆ ( j, k ) = ∑ 4 u =0 7 (2 j + 1)uπ (2k + 1)vπ ˆ ( ) ( ) ( , ) cos cos C u C v F u v ∑= 16 16 v 0 7 forw = 0 otherwise 12 c (w) = 1 The value 128 is added back to each pixel value. 144 148 155 160 163 163 160 158 146 150 156 161 163 163 161 159 149 152 157 161 164 164 162 161 152 154 158 162 163 164 162 161 154 156 158 161 162 162 162 162 120 156 156 157 159 160 160 161 161 156 156 156 157 158 158 159 159 156 156 155 155 156 157 158 158 Error: (Original - Reconstructed) The root mean-squared error (RMSE) between the original image block and the reconstructed image block is [ ] 2 1 7 7 ˆ RMSE = ∑∑ f ( j, k ) − f ( j, k ) = 2.26. 64 j =0 k =0 − 5 − 4 − 5 −1 − 4 − 2 2 4 1 1 −2 0 1 1 2 3 −1 3 5 0 0 1 − 2 −1 − 3 − 3 −1 0 − 2 −3 −3 − 2 1 −1 1 0 3 0 0 1 121 −1 0 −1 0 −5 −3 −4 −3 − 1 − 1 0 0 0 1 2 4 − 3 − 1 −1 0 − 2 − 1 −1 0 JPEG Baseline DCT Results LENA Technique JPEG DCT JPEG DCT JPEG DCT JPEG DCT JPEG DCT Technique JPEG DCT JPEG DCT JPEG DCT JPEG DCT JPEG DCT Bitrate RMSE SNR bits/pixel (0-255) (db) 0.25 7.34 30.81 0.50 4.70 34.69 0.75 3.78 36.58 1.00 3.23 37.93 1.50 2.56 39.95 BOOTS Bitrate bits/pixel 0.25 0.50 0.75 1.00 1.50 RMSE (0-255) 14.78 10.07 8.20 6.92 5.02 SNR (db) 24.73 28.07 29.85 31.33 34.12 • The JPEG recommended normalization matrix (for CCIR-601 images and display) has been scaled to achieve the desired bit rate in each case. Huffman tables matched to the image are used for the entropy encoding. • SNR is defined as 10 log10(2552/mse) db. 122 JPEG Baseline System Summary • Image quality vs. bit rate: The baseline JPEG DCT is a lossy technique. The image quality can be traded for lower bit rate by manipulating the normalization matrix components. Larger values correspond to coarser quantization, lower bit rate, and lower reconstructed image quality. • Implementation complexity: Software implementation around 1 MB/s on a Pentium-based machine, ASIC implementation at video rates. • Constant bit rate vs. constant quality: Using the same compression parameters (i.e., normalization matrix) with differ-ent images would generally result in different per pixel bit rates, depending on image resolution, detail, and noise. 
For example, bit rate variations as high as 5× have been obtained on video images that have the same resolution (512 × 512) but contain varying image detail.
• Encoder/decoder asymmetry: The encoder and the decoder have roughly the same complexity. This might raise a concern for some storage applications where the data needs to be encoded only once, but would be decoded many times. 123
JPEG Baseline System Summary (Cont’d)
• Effect of multiple coding: The baseline JPEG DCT is fairly robust when subjected to multiple cycles of compression and decompression. Although in certain test cases additional degradations have resulted after the first compression and decompression cycle, the losses have been significantly less than the loss caused by the first stage (usually less than 5% after 10 or more stages). Also, in most cases, no additional loss has been observed after just 2 or 3 cycles.
• Channel error tolerance: Due to the variable-length codes (Huffman) used in the encoding of the DCT coefficients, a channel error can potentially propagate through the whole image and have a catastrophic effect. The JPEG standard, through the use of restart codes, provides the means for recovering from errors at a small cost in compression efficiency.
• Artifacts: At low bit rates, the DCT results in artifacts that generally appear as visible blocking and/or ringing around the edges. 124
Normalization Matrix Design
1. Image characteristics
• Noise
• Resolution
• Image detail
2. Display characteristics
• Dynamic range
• MTF
• Noise
3. Viewing conditions
• Viewing distance
• Ambient light level
4. Color
• HVS color encoding
• HVS color perception 125
Fixed-Rate JPEG
Two mechanisms exist for creating a JPEG compressed file with a fixed (prespecified) filesize:
• Multiple-pass: In this iterative approach, the image is first compressed using a default Q-table, e.g., one that has been empirically optimized for the target rate. If the resulting filesize is not within the acceptable range, the Q-table elements are rescaled and the image is compressed again. This process is repeated until the desired filesize is achieved.
• Single-pass: The image is compressed to a given size by using:
— i) A (virtual) rate buffer that can accept a variable-rate input while having a fixed-rate output equal to the target compressed bit rate; and
— ii) The adaptive quantization feature in the extended JPEG, which allows the scaling of the Q-table by a 5-bit factor from one DCT block to another.
The image is subdivided into smaller subimages, e.g., 8 × 8 stripes, and the first segment is compressed using a default Q-table. The resulting bits are fed into the buffer while reading bits from the buffer at a constant rate. Before compressing each segment, the buffer status is monitored and the Q-table scaling is adjusted to compensate for deviations from the desired rate. The scale factor is chosen so as to minimize quality variations between segments while maintaining the approximate target rate. 126
The JPEG Extended System
The JPEG extended system enhances the baseline system to satisfy a broader range of applications. Its features include:
• 12-bit/pixel input image precision: In certain applications, such as medical imaging, the source image has a higher precision than 8 bits. It should be noted that the 12-bit codecs require greater computational resources to achieve the required DCT or IDCT accuracy.
• Sequential progressive build-up: The image is encoded in multiple scans instead of a single scan.
This provides successively higher levels of image quality and is useful for human interaction over low data rate communication channels such as dial-up telephone lines. Two complementary techniques for use in this mode have been specified: spectral selection and successive approximation. • Hierarchical progressive build-up: The image is encoded at multiple resolutions. This is useful for applications where a high-resolution image must also be accessed by lowerresolution devices. • Arithmetic coding: For many of the images tested, arithmetic coding has produced on the average 5-10% better compression than Huffman. However, many believe that it is more complex than Huffman for both software and hardware imple-mentations. 127 The JPEG Extended System • Adaptive quantization: Allows for a 5-bit scale change of the quantization matrix from one block to another. This feature is useful for: — — — Fixed-rate JPEG through rate-buffer instrumentation, Better image quality for the same bit rate, Transcodability with the MPEG standard intraframe mode. • Tiling: Provides a means for tiling an image into a set of smaller subimages. In particular: — — — Simple tiling is useful for providing random access into the middle of a compressed image. Pyramidal tiling is useful for accessing the lower resolution versions of an image. Composite tiling is useful for relating diverse subimages into an image collage. • SPIFF: A file format that provides for the interchange of the compressed image files between application environments. • Selective refinement: Provides for selecting a region of an image for further refinement. 128 Lossless Predictive Mode C B A X • A combination of the three pixels A, B, and C, in the neighborhood of the pixel being encoded are used to form a prediction (estimate) of the pixel X, denoted X̂ by . • The error (differential) signal, e, is formed as e = X − Xˆ • The differential signal, e, typically has a largely reduced variance compared to X, is significantly less correlated than X, and has a stable histogram well approximated by a doublesided exponential (Laplacian) distribution. It may be encoded by either Huffman or arithmetic coding. 129 Predictor Selection For a given image, any of the eight predictors listed below can be used (selection zero is used in the hierarchical mode) to form a prediction. Selections 1-3 correspond to one-dimensional predictors, while selections 4-7 correspond to two-dimensional predictors. The original source image can have any precision from 2 to 16 bits/pixel. Selection Value 0 1 2 3 4 5 6 7 Prediction no prediction A B C A+B–C A + (B – C) / 2 B + (A – C) / 2 (A + B) / 2 Experiments with JPEG images show that the predictor selection 7, on the average, results in slightly lower bit rate (higher compression) compared to the other selections. However, the performance of a certain predictor is highly dependent on the image characteristics, and for a given image, any predictor may outperform its alternatives. 130 Lossless Predictive Coding Results The results of the lossless encoding of the nine JPEG images are summarized below. 14 The images are 720 pixels/line by 576 lines, 8 bits/sample for the luminance; 360 pixels/line by 576 lines, 8 bits/sample for each chrominance component (average of 16 bits/pixel per image). Arithmetic coding has been used to encode the prediction errors. Results indicate that an average compression of about 2:1 can be expected when encoding color images of moderate complexity. 
Image Bits/pixel Predictor BOATS 6.9 7 BOARD 7.0 7 ZELDA 7.2 7 8.3 7 BARBARA1 8.7 7 BARBARA2 8.1 7 HOTEL 5.7 7 BALLOONS 8.2 7 GOLDHILL 7.3 7 GIRL Average 7.5 7 Average 6 +1.8% Average 5 +4.8% Average 4 +7.1% Average 3 +11.6% Average 2 +1.5% Average 1 +6.9% 14 J• Mitchell, Proceedings of SPIE, CR37, 1991 131 SUBBAND AND WAVELET CODING 132 Subband Coding • The image is decomposed into a number of subimages (subbands), each representing a different spatial frequency content. • The decomposition is performed using a bank of low-pass and high-pass filters known as analysis filters, followed by downsampling. The filters are chosen so that minimal or no information is lost in the decimation process. • This filtering process is equivalent to a basis function decomposition, where the basis functions are the impulse responses of the filters, and the filtered subimages are the coefficients for the corresponding basis functions. • The subbands are independently or jointly quantized and encoded using conventional techniques. • At the decoder, the reconstructed subbands are upsampled and interpolated using a bank of low-pass and high-pass filters known as synthesis filters. To minimize (or completely eliminate) any artifacts resulting from the analysis and synthesis stages, the synthesis filters are usually derived from the analysis filters depending on the desired filter characterisitics . 133 134 Wavelet Coding • The wavelet transform uses basis functions that are scaled and translated versions of a single prototype function (the basic or mother wavelet). • The wavelet transform can be viewed as a bank of bandpass filters, where the filters have increasing bandwidth as the center frequency increases. For practical purposes, a wavelet decomposition is identical to a nonuniform subband decomposition, where the bandwidth of a subband is a function of its center frequency. • The wavelet filter bank can be implemented by recursively filtering with a low-pass/high-pass filter pair followed by downsampling. • The wavelet basis functions are the impulse responses of the cascaded filters. 135 Wavelet Analysis/Synthesis Filter Banks 136 Wavelet Filter Banks There are three general classes of wavelet basis functions, as defined by the type of filter bank. Each class has characteristics that may make it more or less suitable for a specific application. • Quadrature mirror filters (QMFs) • Conjugate quadrature filters (CQFs) • Bi-orthogonal filters Let’s denote by h0(n) and h1(n), the analysis low-pass and high-pass filters, and by g0(n) and g1(n), the synthesis low-pass and high-pass filters, respectively. 137 Quadrature Mirror Filters One set of desirable basis functions consists of those that are symmetrical and have the same length. Given a low-pass analysis filter h0(n), aliasing can be eliminated by choosing: h1(n) = (– 1)n h0(n) g0(n) = 2 h0(n) g1(n) = – 2(– 1)n h0(n) Unfortunately, the symmetry constraint on the basis functions makes it impossible to design a system with a gain of unity for all frequencies. Properties of QMFs: • Equal-length, linear phase filters (symmetrical). • Although not perfect reconstruction, but very close to, • • because the basis functions are almost orthogonal. Symmetric boundary extension (minimizes discontinuities). Both even and odd filter lengths allowed. 138 Conjugate Quadrature Filters Perfect reconstruction can be achieved with equal-length filters by relaxing the symmetrical filter constraint. 
Given a low-pass analysis filter h0(n), aliasing can be eliminated by choosing: h1(n) = (– 1)n h0(N – 1– n) g0(n) = 2 h0(N – 1– n) g1(n) = – 2(– 1)n h0(n) The proper choice of h0(n) will result in a system gain of unity for all frequencies. Properties of CQFs: • Equal-length, nonlinear phase filters. • Perfect reconstruction (orthogonal basis functions). • Periodic boundary extension (introduces discontinuities). • Even filter lengths only. • Possible regularity, vanishing moments properties. 139 Biorthogonal Filters Perfect reconstruction can be achieved with symmetrical basis functions by relaxing the constraint of equal length. Given a low-pass analysis filter h0(n), and a low-pass synthesis filter g0(n), aliasing can be eliminated if: h1(n) = 1 (– 1)n g0(n) 2 g1(n) = 2 (– 1)n h0(n) The proper choices of h0(n) and g0(n) will result in a system gain of unity all for frequencies. Properties of biorthogonal filters: • Unequal-length, linear phase filters (symmetrical). • Perfect reconstruction. The basis fucntions for h0(n) and g1(n) are orthogonal; and the basis functions for g0(n) and h1(n) are orthogonal. • Symmetric boundary extension. • Odd or even filter lengths that differ by a multiple of two (e.g., 7/9, 6/10). • Possible regularity, vanishing moments properties. 140 Biorthogonal Filters For Compression Length In a recent paper (IEEE Trans. IP, Vol 4, pp. 1053, Aug.’95), Villasenor et al. evaluated 4300 candidate wavelet biorthogonal filter banks. The following table summarizes their choice of the filters best suited to image compression. 1 2 3 4 5 6 H0 G0 H0 G0 H0 G0 H0 G0 H0 G0 H0 G0 9 7 13 11 6 10 5 3 2 6 9 3 Filter coefficients (origin is leftmost coefficient, coefs for negative indices follow by symmetry) 852699, .377402, -.110624, -.023849, .037828 .788486, .418092, -.040689, -.064539 .767245, .383269, -.068878, -.033475, .047282, .003759, -.008473 .832848, .448109, -.069163, -.108737, .006292, .014182 .788486, .047699, -.129078 .615051, .33389, -.067237, .006989, .018914 1.060660, .353553, -.176777 .707107,.353553 .707107,.707107 .707107,.088388,-.088388 .994369, .419845, -.176777, -.066291, .033145 .707107,.353553 141 Embedded Zerotree Wavelet (EZW)15 The embedded zerotree wavelet coding algorithm uses the following building blocks: • A discrete wavelet transform that decorrelates most images fairly effectively; • Zerotree coding, a procedure whereby insignificance is predicted across multiple scales using a practical image model, providing substantial coding gains over the first-order entropy of the wavelet coefficient significance maps; •A uniform embedded quantizer, which uses successive approximations, and allows the encoding or the decoding process to stop at any point (i.e., progressive transmission) without a penalty in performance • Adaptive arithmetic coding, which allows the entropy coder to incorporate learning into the bit stream itself. 15 J. Shapiro, IEEE Trans. Signal Proc., 41(12), pp. 3445-3462, Dec. 1993 142 Zerotrees and Significance Maps Zerotree coding is based on the assumption that if a wavelet coefficient at a coarse scale (called a parent) is insignificant with respect to a threshold T (i.e., its magnitude is less than T), then all the wavelet coefficients of the same orientation in the same spatial location at finer scales (called the descendants) are also likely to be insignificant with respect to T. 
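This parent–descendant significance test is easy to state in code. The sketch below is illustrative only and assumes the caller supplies the descendant values for a coefficient; how the resulting symbols are used in a full dominant pass is described on the following slides, and the symbol names follow Shapiro's paper.

```python
def dominant_pass_symbol(coeff, descendants, T):
    """Symbol emitted for one coefficient in an EZW dominant pass.
    descendants: values of all same-orientation coefficients at finer scales
    that descend from this one (empty for the finest scale)."""
    if abs(coeff) >= T:
        return "POS" if coeff >= 0 else "NEG"   # significant: code its sign
    if any(abs(d) >= T for d in descendants):
        return "IZ"                             # insignificant, but a descendant is not
    return "ZTR"                                # the whole tree can be skipped

# For instance, with T = 32 a coefficient of -31 whose descendants include 47
# is classified "IZ", while 23 with only small descendants gives "ZTR"
# (compare the worked example on the slides that follow).
```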
[Figure: three-level subband decomposition used for zerotree coding: LL3, HL3, LH3, HH3 at the coarsest scale, HL2, LH2, HH2 at the middle scale, and HL1, LH1, HH1 at the finest scale; each coarse-scale coefficient is the parent of the corresponding coefficients at the finer scales.] 143
Significance Maps
The symbols to be encoded at the first pass of the EZW method are generated in the following way:
• An initial quantization threshold T is chosen (usually set at about half the magnitude of the largest wavelet coefficient).
• A raster scanning of the coefficients in each band is performed by starting at the lowest-frequency subband, scanning the bands from left to right and top to bottom, and then moving on to the next scale and repeating this process.
• Each wavelet coefficient is compared to the threshold T, resulting in one of four possibilities:
— If the coefficient is significant and positive, the symbol encoded is POS.
— If the coefficient is significant and negative, the symbol encoded is NEG.
— If the coefficient is insignificant, but at least one of its descendants is significant, an isolated zero (IZ) is encoded.
— Finally, if the coefficient is insignificant and all of its descendants are also insignificant (and it is not itself the descendant of an already-coded zerotree root), the symbol encoded is a zerotree root (ZTR).
• These 4 symbols are encoded with adaptive arithmetic coding. 144
Successive Approximations
To perform embedded coding, successive approximation quantization is applied, where both the significance of a coefficient and its magnitude are further refined by applying a sequence of thresholds {T_n}, such that T_i = T_{i−1}/2. Two separate lists of wavelet coefficients are maintained, and at each pass, which corresponds to a given threshold T_j, these lists are updated:
• A dominant list, which contains the locations of those coefficients that have so far been found to be insignificant. At each pass of this list, its coefficients are compared to the current threshold T_j to determine their significance and, if significant, their sign. This information is encoded with a procedure similar to the one outlined for the first pass. Note that no descendants of a zerotree need to be encoded.
• A subordinate list, which contains the magnitudes of those coefficients that have been found to be significant. At each pass of this list, the magnitudes of the coefficients are refined to an additional bit of precision using two symbols. This is similar to bit-plane encoding of the magnitudes.
Adaptive arithmetic coding is used to encode all the symbols generated in each pass. The encoding can be stopped whenever the bit budget has been exhausted or a certain distortion has been reached. 145
Example: 3-Stage Wavelet Transformed Image
Following is an array of values corresponding to a 3-stage wavelet decomposition of a simple 8 × 8 image (the example used in Shapiro's paper). The initial quantization threshold is chosen to be nearly half of the largest coefficient magnitude, i.e., T0 = 32.
 63  −34   49   10    7   13  −12    7
−31   23   14  −13    3    4    6   −1
 15   14    3  −12    5   −7    3    9
 −9   −7  −14    8    4   −2    3    2
 −5    9   −1   47    4    6   −2    2
  3    0   −3   −2    3   −2    0    4
  2   −3    6   −4    3    6    3    6
  5   11    5    6    0    3   −4    4    146
Zerotree Symbol Encoding Example
Following is the result of processing the first dominant pass for T = 32. The reconstruction magnitudes are taken at the center of the uncertainty interval, i.e., 48.
Subband   Coefficient value   Symbol   Reconstruction value
LL3        63                 POS       48
HL3       −34                 NEG      −48
LH3       −31                 IZ         0
HH3        23                 ZTR        0
HL2        49                 POS       48
HL2        10                 ZTR        0
HL2        14                 ZTR        0
HL2       −13                 ZTR        0
LH2        15                 ZTR        0
LH2        14                 IZ         0
LH2        −9                 ZTR        0
LH2        −7                 ZTR        0
HL1         7                 Z          0
HL1        13                 Z          0
HL1         3                 Z          0
HL1         4                 Z          0
LH1        −1                 Z          0
LH1        47                 POS       48
LH1        −3                 Z          0
LH1        −2                 Z          0    147
FRACTAL CODING 148
Contractive Mappings
Consider the expression f(x) = x/2 + 1/x. For an arbitrary initial value x0, let's compute f(x0).
Next, compute f (f (x0)) and f (f (f (x0))), …, etc. It can be shown that for any value of x0, this series will converge to the same value, namely, √(2). f (x) as defined above is a contractive mapping, and √(2) is its fixed point. A transformation τ is called contractive, if for all values of x and y, there exists a value s < 1, such that d(τ (x), τ (y) ) ≤ sd(x, y). The value s is called the contractivity of the transformation τ . If s = 1, the transformation is nonexpansive. If s > 1, the transformation is expansive. Now consider transmitting the value of √(2) over a communication channel. If the transmission of the function f (x) as defined above takes less bandwidth than the transmission of √(2) itself, the function can be transmitted and the value of √(2) can be recovered at the other end by starting with any initial guess and iterating the computation until the desired accuracy is achieved. Similarly, for the purposes of image compression, if the different regions of an image are represented as the fixed points of a set of contractive mappings, the storage of the parameters of those contractive mappings is all that is needed for the reconstruction of the image. 149 Jacquin’s Fractal Compression Technique Jacquin’s technique is predicated on the assumption that any section of an image can be represented as a transformation of another section of the same image. To encode a certain region of an image, first a search through the image is conducted, looking for another part of the image which when transformed by a cascade of one or more (overall contractive) transformations, would produce the best approximation to the original region. The compressed bits consist of the parameters needed to represent these contractive transformations in addition to the coordinates of the region that is being transformed. In order to simplify the coding operation, the image is segmented into nonoverlapping contiguous n × n blocks, known as range blocks, and each 2n × 2n block of the image, known as a domain block, is tried out through a series of contractive transformations to find a best match. Due to the intensive computational requirements of the best match search, the fractal encoding is usually several orders of magnitude slower than fractal decoding. 150
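The search loop just described can be sketched as follows. This is a toy illustration of the idea, not Jacquin's full scheme: it assumes a grayscale NumPy image, searches domain blocks on a coarse grid, fits only a scale and an offset by least squares, and omits the isometries (rotations/flips) and the quantization of the map parameters that a real fractal coder would add.

```python
import numpy as np

def encode_range_block(image, r0, c0, n):
    """For one n x n range block at (r0, c0), search 2n x 2n domain blocks (on a
    coarse grid) for the best least-squares match after 2x2 averaging, and return
    the winning domain position together with the affine map (scale, offset).
    A drastically simplified, illustrative version of Jacquin-style encoding."""
    target = image[r0:r0 + n, c0:c0 + n].astype(float).ravel()
    H, W = image.shape
    best = None
    for dr in range(0, H - 2 * n + 1, n):
        for dc in range(0, W - 2 * n + 1, n):
            dom = image[dr:dr + 2 * n, dc:dc + 2 * n].astype(float)
            dom = dom.reshape(n, 2, n, 2).mean(axis=(1, 3)).ravel()  # spatial contraction
            A = np.column_stack([dom, np.ones_like(dom)])
            coef, *_ = np.linalg.lstsq(A, target, rcond=None)        # [scale, offset]
            err = float(np.sum((A @ coef - target) ** 2))
            # A real coder would also clamp |scale| < 1 so that the overall mapping
            # stays contractive, and would try the eight square isometries of dom.
            if best is None or err < best[0]:
                best = (err, dr, dc, coef[0], coef[1])
    return best  # (squared error, domain row, domain col, scale, offset)
```

The decoder stores only these per-block parameters and recovers the image by iterating the transformations from an arbitrary starting image, exactly as in the √2 fixed-point example above.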