Lossy Coding
Hoda Roodaki

Vector Quantization (VQ)
• Divide the image into blocks of N pixels, each treated as an N×1 vector.
• VQ maps each input vector to one member of W, a finite collection of vectors of the same length.
• W is called the codebook.
• Each member of W is a codeword or code vector.

VQ
• If the codebook contains K codewords, then sending one vector requires $\log_2 K$ bits (rounded up when K is not a power of 2).

[Figures: Structure of a VQ; How?!; Example]

Complexity of VQ
• The encoder is computationally complex:
  • It must select the codewords (codebook design).
  • It must check every codeword to find the one closest to each input vector.
• The decoder is easy and fast (a table lookup).

Distortion
• Given $x$ (the original data) and $\hat{x}$ (its approximation), the distortion is defined as $d(x, \hat{x})$.
• Common distortion measures are:
  • Hamming distortion: $d(x, \hat{x}) = 0$ if $x = \hat{x}$, and $d(x, \hat{x}) = 1$ if $x \neq \hat{x}$
  • Absolute error: $d(x, \hat{x}) = |x - \hat{x}|$
  • Squared error: $d(x, \hat{x}) = (x - \hat{x})^2$
• Averaging the absolute and squared errors over all samples gives the Mean Absolute Error (MAE) and the Mean Square Error (MSE), respectively.

Signal-to-noise ratio
• Signal-to-noise ratio (abbreviated SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise.
• It is defined as the ratio of the power of the signal to the power of the noise: $\mathrm{SNR} = P_{\text{signal}} / P_{\text{noise}}$, often expressed in decibels as $10\log_{10}(P_{\text{signal}} / P_{\text{noise}})$.

Peak signal-to-noise ratio
• Peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of the corrupting noise.
• PSNR is usually expressed on the logarithmic decibel scale.
• PSNR is most commonly used to measure the reconstruction quality of lossy compression codecs (e.g., for image compression).
• Here the signal is the original data, and the noise is the error introduced by compression.
• When comparing compression codecs, PSNR serves as an approximation to human perception of reconstruction quality.
• Although a higher PSNR generally indicates a higher-quality reconstruction, in some cases it does not.
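To make the asymmetry between the VQ encoder (full search) and decoder (table lookup), and the distortion measures above, concrete, here is a minimal pure-Python sketch. It assumes 1-D lists of 8-bit samples, blocks of N = 2 pixels, a hand-picked codebook of K = 4 codewords, and squared error as the matching criterion; all function names are illustrative and not taken from any library.

```python
import math

# Per-sample distortion measures d(x, x_hat)
def hamming(a, b):
    return 0 if a == b else 1

def absolute_error(a, b):
    return abs(a - b)

def squared_error(a, b):
    return (a - b) ** 2

def mean_distortion(original, approx, d):
    """Average a per-sample distortion over all samples (gives MAE, MSE, ...)."""
    return sum(d(a, b) for a, b in zip(original, approx)) / len(original)

def vector_distance(x, y):
    """Squared-error distortion between two vectors."""
    return sum(squared_error(a, b) for a, b in zip(x, y))

def vq_encode(vectors, codebook):
    """Full search: map each vector to the index of its closest codeword (the costly part)."""
    return [min(range(len(codebook)), key=lambda k: vector_distance(v, codebook[k]))
            for v in vectors]

def vq_decode(indices, codebook):
    """Decoding is only a table lookup, hence fast."""
    return [codebook[k] for k in indices]

def bits_per_vector(num_codewords):
    """Bits needed to send one codeword index."""
    return math.ceil(math.log2(num_codewords))

def snr_db(original, approx):
    signal_power = sum(a * a for a in original) / len(original)
    noise_power = mean_distortion(original, approx, squared_error)
    return math.inf if noise_power == 0 else 10 * math.log10(signal_power / noise_power)

def psnr_db(original, approx, peak=255):
    mse = mean_distortion(original, approx, squared_error)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Usage: 8 samples grouped into vectors of N = 2, codebook with K = 4 codewords
pixels = [10, 12, 200, 198, 100, 104, 12, 11]
vectors = [pixels[i:i + 2] for i in range(0, len(pixels), 2)]
codebook = [[0, 0], [11, 11], [100, 100], [200, 200]]
indices = vq_encode(vectors, codebook)
decoded = [p for vec in vq_decode(indices, codebook) for p in vec]
```

Here `indices` is `[1, 3, 2, 1]`, so each 2-pixel vector costs `bits_per_vector(4) == 2` bits, and the MSE of the reconstruction is 23 / 8 = 2.875.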
Peak signal-to-noise ratio
• For 8-bit images, where the maximum sample value is 255: $\mathrm{PSNR} = 10\log_{10}\dfrac{255^2}{\mathrm{MSE}}$ dB
• Why use dB?
  • Smaller numbers can be used.
  • There is an approximately linear relation between the measurement and the perceived sensation.

Rate-distortion Theory
• Rate-distortion theory says that for a given source and a given distortion measure, there exists a function $R(D)$, called the rate-distortion function, which gives the minimum rate (bits per sample) needed to represent the source with average distortion no greater than $D$.

LBG Algorithm of VQ
• Proposed by Linde, Buzo and Gray in 1980.
• It is an iterative algorithm.
• It requires an initial codebook, obtained by the splitting method:
  • An initial codevector $C$ is set to the average of the entire training sequence.
  • This codevector is then split into two, where $\varepsilon$ is a small perturbation: $y_1 = (1+\varepsilon)C$, $y_2 = (1-\varepsilon)C$

LBG Algorithm of VQ (Cont.)
1. Given the codevectors $y_k$, choose each region $R_k$ as $R_k = \{x : d(x, y_k) \le d(x, y_j)\ \forall j \ne k\}$, i.e., the set of $x$ for which $y_k$ is the nearest codevector.
2. Replace each $y_k$ with the centroid of $R_k$.
3. If the overall distortion $D$ is lower than a threshold, stop.
4. Otherwise, go to step 1, repeating until the desired number of codewords and the required distortion are obtained.

JPEG
• JPEG, named after the Joint Photographic Experts Group that ISO and ITU formed in 1986, is a lossy compression standard.
• It was voted an international standard in 1992.
• JPEG works best on natural images, which, unlike graphics images, do not contain much redundancy to exploit (typical compression ratio 10:1 to 50:1).

JPEG
• JPEG uses transform coding and is largely based on the following observations:
1. Most useful image content changes relatively slowly across an image.
  • Lower spatial-frequency components contain more information than the high-frequency components, which often correspond to less useful details and noise.
  • Global and local information: one way to describe them is in terms of spatial frequencies. From the visual system's perspective, the world is made up of different spatial frequencies.

JPEG
• The top panel shows a typical real-world scene.
• The two bottom panels show the same picture filtered to pass only the low spatial frequencies (left) or only the high spatial frequencies (right).
• It is evident that the two spatial-frequency components carry different kinds of information, which accord well with the general notion of global and local information.
• The bottom-left picture carries a global representation of the scene, whereas the bottom-right picture conveys information about edges and details.

JPEG
• JPEG uses transform coding and is largely based on the following observations:
2. Humans are much less likely to notice the loss of higher spatial-frequency components than the loss of lower-frequency components.

JPEG Color Space
• JPEG uses a YUV-type color space (in practice, YCbCr).
• The human eye is more sensitive to luminance than to chrominance, so JPEG encoders typically throw out ¾ of the chrominance information before any other compression takes place.

Chroma Subsampling
4:2:0 Subsampling
• Cb and Cr are subsampled by 2 both horizontally and vertically, so each 2×2 group of luminance samples shares one Cb and one Cr sample; this is where the ¾ reduction comes from.

Discrete Cosine Transform (DCT)
• The DCT helps separate the image into parts (sub-bands) of differing importance with respect to the image's visual quality.
• The DCT transforms a signal or image from the spatial domain to the frequency domain.
• In JPEG, the DCT is applied to blocks of 8×8 pixels.

DCT Basis Functions
• Each basis function is multiplied by its coefficient, and the products are summed to form the final image.
• An 8×8 image that contains just one shade of gray consists only of a weighted value of the upper-left (DC) basis function.

DCT
• Low spatial frequencies appear in the upper-left corner of the DCT block. The lower-right values represent higher frequencies and are often small, small enough to be neglected with little visible distortion.
• In theory the transform itself is lossless. In practice, floating-point values cannot be stored exactly, so there is always some loss.

Why DCT and not FFT?
• The DCT is similar to the Fourier transform computed by the Fast Fourier Transform (FFT), but its coefficients are real and it implicitly extends each block symmetrically rather than periodically, avoiding artificial jumps at the block edges; as a result it can approximate lines well with fewer coefficients.

Quantization
• Each coefficient is divided by an integer between 1 and 255 and rounded to the nearest integer.
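The DCT and divide-and-round steps can be sketched for a single 8×8 block as follows. This is a slow, direct evaluation of the DCT-II formula from the JPEG standard, not an optimized implementation. The level shift by 128 (for 8-bit samples), the example luminance table from Annex K of ITU-T T.81, and rounding ties away from zero are assumptions beyond what the slides state; the helper names are hypothetical.

```python
import math

N = 8  # JPEG block size

# Example luminance quantization table (Annex K of ITU-T T.81); an assumption here.
LUMA_QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def _c(k):
    """Normalization factor C(k) from the JPEG DCT formula."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _basis(i, k):
    """1-D cosine basis value for sample i and frequency k."""
    return math.cos((2 * i + 1) * k * math.pi / 16)

def dct_8x8(block):
    """Forward 2-D DCT-II; out[0][0] is the DC coefficient (upper-left)."""
    return [[0.25 * _c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                        for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_8x8(coeffs):
    """Inverse DCT: each basis function weighted by its coefficient, then summed."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def round_half_away(x):
    """Round to the nearest integer, ties away from zero."""
    return int(math.floor(abs(x) + 0.5)) * (1 if x >= 0 else -1)

def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round."""
    return [[round_half_away(coeffs[u][v] / table[u][v]) for v in range(N)] for u in range(N)]

def dequantize(q, table):
    """Decoder side: multiply each quantized value by the same table entry."""
    return [[q[u][v] * table[u][v] for v in range(N)] for u in range(N)]

def encode_block(pixels, table=LUMA_QTABLE):
    shifted = [[p - 128 for p in row] for row in pixels]  # assumed level shift for 8-bit data
    return quantize(dct_8x8(shifted), table)

def decode_block(q, table=LUMA_QTABLE):
    rec = idct_8x8(dequantize(q, table))
    return [[min(255, max(0, round_half_away(p + 128))) for p in row] for row in rec]
```

For a flat block where every pixel is 136, `encode_block` yields a single non-zero coefficient, DC = 4 (that is, 64 / 16), and `decode_block` restores the block exactly, matching the "one shade of gray" remark above.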
• The DCT block is quantized using a quantization table.
• The quantization table is chosen to reduce the precision of each coefficient to no more than necessary.
• The choice of quantization table is a trade-off between compression and image quality.
• Example: $53 = 110101_2$ (6 bits); with $q[u,v] = 4$, $53/4 = 13.25$ rounds to $13 = 1101_2$ (4 bits), i.e., the two least significant bits are dropped.
• Quantization error is the main source of loss in lossy compression.

Quantization
• The quantization table is stored in the file header for use by the decoder.
• The decoder multiplies each quantized coefficient by the same table entry (in the example above, $13 \times 4 = 52$, so the error is 1).

Quantization
• Uniform quantization
  • Each transform coefficient is divided by the same constant N.
• Non-uniform quantization: quantization tables
  • The eye is most sensitive to low frequencies (upper-left corner) and less sensitive to high frequencies (lower-right corner).
• Custom quantization
  • Quantization tables can be put in the image header.

Quantization Tables
[Figures: the luminance quantization table; the chrominance quantization table]

Zigzag Scan
• Rearrange the 8×8 block into a 1×64 vector in zigzag order, so that the low-frequency coefficients are grouped at the start of the vector.
[Figure: Zigzag scan example]

DCT Coefficient Encoding
• Method 1:
  • DC component: Differential Pulse Code Modulation (DPCM)
    • The DC component is large and varied, but often close to the previous block's value.
    • Encode the difference from the previous 8×8 block.
  • AC components: Run-Length Encoding (RLE)
    • The 1×64 vector contains many zeros.
    • RLE keeps (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component.
[Figure: DCT coefficient example]

Entropy Coding
• Method 2:
  • Categorize DC values into SIZE (the number of bits needed to represent the value) and the actual amplitude bits.

Entropy Coding-DC
• SIZE is Huffman encoded.
• The actual amplitude bits are not Huffman encoded.

Entropy Coding-AC
• Two symbols are used:
• Symbol 1: (skip, SIZE)
  • skip is the number of zeros, and SIZE is the number of bits of the non-zero component.
  • Example: (0010, 0100) means an AC coefficient needing 4 bits, preceded by 2 zeros.
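The zigzag scan, DC differencing (DPCM), and run-length/(skip, SIZE) symbol formation described above can be sketched as follows. The Huffman stage itself is omitted. The EOB (0, 0) and ZRL (15, 0) symbols come from the baseline JPEG standard rather than from these slides (ZRL is needed because skip has only 4 bits), taking the first block's DC relative to 0 is likewise an assumption, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions in zigzag order: walk anti-diagonals, alternating direction."""
    def key(rc):
        r, c = rc
        s = r + c
        return (s, r if s % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

ZIGZAG = zigzag_order()

def zigzag_scan(block):
    """Flatten an 8x8 block into a 1x64 list, low frequencies first."""
    return [block[r][c] for r, c in ZIGZAG]

def dpcm_dc(dc_values):
    """Encode each block's DC as the difference from the previous block's DC."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

def size_category(v):
    """SIZE = number of bits needed for |v| (0 for v == 0)."""
    return abs(v).bit_length()

def run_length_ac(ac):
    """(skip, value) pairs for the 63 AC coefficients, with ZRL and EOB."""
    last_nz = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    pairs, run = [], 0
    for v in ac[:last_nz + 1]:
        if v == 0:
            run += 1
            if run == 16:              # skip has 4 bits: ZRL stands for 16 zeros
                pairs.append((15, 0))
                run = 0
        else:
            pairs.append((run, v))
            run = 0
    if last_nz < len(ac) - 1:
        pairs.append((0, 0))           # EOB: the rest of the block is zero
    return pairs

def ac_symbols(ac):
    """Symbol 1 = (skip, SIZE), to be Huffman coded; Symbol 2 = the amplitude."""
    return [((skip, size_category(v)), v) for skip, v in run_length_ac(ac)]
```

For example, `ac_symbols([0, 0, 9] + [0] * 60)` gives `[((2, 4), 9), ((0, 0), 0)]`: the (0010, 0100) symbol from the slide followed by an end-of-block marker.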
  • Symbol 1 is Huffman encoded.
• Symbol 2: the actual AC coefficient amplitude
  • It is not Huffman coded.
[Figure: Entropy Coding-AC example]

Entropy Coding
• Huffman tables (for both DC and AC coefficients) can be:
  • Custom (sent in the file header)
  • Default

[Figure: JPEG encoder block diagram]

JPEG Bitstream
• A frame is a picture.
• A scan is a pass through the pixels (e.g., the Y component).
• A segment is a group of blocks.
• A block is an 8×8 group of pixels.

JPEG Modes-Sequential
• Sequential mode
  • Baseline sequential (described so far)
    • Supports only 8 bits per pixel (per color component).
    • Supports only Huffman coding.
  • Extended sequential
    • Supports 8 or 12 bits per pixel (per color component).
    • Supports Huffman and arithmetic coding.

JPEG Modes-Progressive
• Goal: start with a low-quality image and successively improve it.
• Spectral selection: send the DC coefficients and the first few AC coefficients, then gradually more ACs.
• Successive approximation: send the DCT coefficients from the most significant bit (MSB) to the least significant bit (LSB).

JPEG Mode-Lossless
• It does not use the DCT-based method.
• Instead, it uses a predictive (differential coding) method:
  • A predictor combines the values of up to three neighboring pixels (not blocks, as in the Sequential mode) into a predicted value for the current pixel, indicated by "X" in the figure below. The encoder then compares this prediction with the actual pixel value at position "X" and encodes the difference (prediction residual) losslessly.

JPEG Mode-Lossless
• Since the predictor uses only previously encoded neighbors, the very first pixel I(0, 0) has no neighbors and must be coded using its own value. Other pixels in the first row always use predictor P1 (the left neighbor), and pixels in the first column always use P2 (the neighbor above).
[Figure: JPEG Mode-Lossless]

JPEG Mode-Hierarchical
• Pyramid coding:
  • Down-sample by a factor of 2 in each dimension, e.g., reduce 640 × 480 to 320 × 240.
  • Code the smaller image using another JPEG mode (Progressive, Sequential, or Lossless).
  • Decode and up-sample the encoded image.
  • Encode the difference between the up-sampled image and the original using Progressive, Sequential, or Lossless.
  • This can be repeated multiple times.
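A toy version of the pyramid coding steps above shows why the difference is taken against the decoded (not the original) smaller image: the decoder can then rebuild every level exactly. In this sketch the "another JPEG mode" step is replaced by a hypothetical identity function `code_image`, so the whole pipeline is lossless; down-sampling by 2×2 averaging and up-sampling by pixel replication are assumptions, and image sides must be divisible by 2 to the power of `levels`.

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 neighborhoods (integer mean)."""
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) // 4
             for c in range(len(img[0]) // 2)]
            for r in range(len(img) // 2)]

def upsample(img):
    """Double each dimension by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def combine(a, b, sign):
    """Element-wise a + sign * b for two equally sized images."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def code_image(img):
    """Hypothetical stand-in for 'another JPEG mode': identity, i.e. lossless."""
    return [list(row) for row in img]

def hierarchical_decode(layers):
    """layers[0] is the smallest image; each later layer is a residual one level up."""
    img = layers[0]
    for residual in layers[1:]:
        img = combine(upsample(img), residual, +1)
    return img

def hierarchical_encode(img, levels):
    """Return [smallest coded image, residual, ..., full-size residual]."""
    if levels == 0:
        return [code_image(img)]
    lower = hierarchical_encode(downsample(img), levels - 1)
    # Predict from the decoded smaller image so encoder and decoder stay in sync.
    prediction = upsample(hierarchical_decode(lower))
    return lower + [code_image(combine(img, prediction, -1))]
```

Decoding only the first layers, e.g. `hierarchical_decode(layers[:2])`, yields a lower-resolution version of the image, which is what makes this mode convenient for low-resolution displays.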
• Good for viewing high-resolution images on a low-resolution display.
[Figure: JPEG Mode-Hierarchical]

References
• W. B. Pennebaker, J. L. Mitchell, "The JPEG Still Image Data Compression Standard", Van Nostrand Reinhold, 1993.
• http://www-i6.informatik.rwth-aachen.de/web/Misc/Coding/365/li/material/notes/Chap4/Chap4.2/Chap4.2.html