Lossy Coding

Hoda Roodaki
Vector Quantization (VQ)
• Divide the image into blocks of N pixels, which are treated as Nx1
vectors.
• VQ maps each input vector to one member of a finite collection W of
vectors.
• W is called the codebook.
• Each member of W is a codeword or a code vector.
VQ
• If the number of codewords is K, then the number of bits required to
send one vector is $\log_2 K$.
Structure of a VQ
How?!
Example
Complexity of VQ
• The encoder is computationally complex:
• It must select the n cores (codewords).
• It must check all codebook members to find the closest core.
• The decoder is easy and fast: a table lookup.
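The asymmetry between encoder and decoder can be illustrated with a minimal sketch. It assumes squared Euclidean distance as the closeness measure and a small hand-made codebook; the function names and the codebook values are hypothetical and only serve the illustration.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def vq_encode(vector, codebook):
    """Exhaustive search: return the index of the nearest codeword."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(vector, codebook[k]))

def vq_decode(index, codebook):
    """Decoding is a single table lookup."""
    return codebook[index]

def bits_per_vector(codebook):
    """Bits needed to send one index with a fixed-length code."""
    return math.ceil(math.log2(len(codebook)))

# Hypothetical codebook with K = 4 codewords of dimension N = 2.
codebook = [(0, 0), (0, 255), (255, 0), (255, 255)]
index = vq_encode((10, 240), codebook)
print(index, vq_decode(index, codebook), bits_per_vector(codebook))
```

The encoder touches every codeword for every input vector, while the decoder only indexes into the codebook.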
Distortion
• Given $x$ (the original data) and $\hat{x}$ (the approximation)
• The distortion is defined as: $d(x, \hat{x})$
• Common distortion measures are:
• Hamming distortion measure: $d(x, \hat{x}) = 0$ if $x = \hat{x}$, and $d(x, \hat{x}) = 1$ if $x \neq \hat{x}$
• Absolute error: $d(x, \hat{x}) = |x - \hat{x}|$
• Square error: $d(x, \hat{x}) = (x - \hat{x})^2$
• Averaging the absolute error and the square error over all samples gives
the Mean Absolute Error (MAE) and the Mean Square Error (MSE), respectively.
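A short sketch of these measures on scalar samples. The function names and the sample values are made up for illustration.

```python
def hamming(x, x_hat):
    """Hamming distortion: 0 if the values match, 1 otherwise."""
    return 0 if x == x_hat else 1

def mean_absolute_error(xs, x_hats):
    """MAE: average of |x - x_hat| over all samples."""
    return sum(abs(a - b) for a, b in zip(xs, x_hats)) / len(xs)

def mean_square_error(xs, x_hats):
    """MSE: average of (x - x_hat)^2 over all samples."""
    return sum((a - b) ** 2 for a, b in zip(xs, x_hats)) / len(xs)

# Hypothetical original samples and their approximations.
original = [52, 55, 61, 66]
approx = [50, 55, 64, 66]
print(mean_absolute_error(original, approx))  # 1.25
print(mean_square_error(original, approx))    # 3.25
```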
Signal-to-noise ratio
• Signal-to-noise ratio (abbreviated SNR or S/N) is a measure used in
science and engineering that compares the level of a desired signal to
the level of background noise.
• Signal-to-noise ratio is defined as the ratio of the power of a signal
to the power of noise: $\mathrm{SNR} = P_{\text{signal}} / P_{\text{noise}}$
Peak signal-to-noise ratio
• Peak signal-to-noise ratio, often abbreviated PSNR, is an engineering term
for the ratio between the maximum possible power of a signal and the
power of corrupting noise.
• PSNR is usually expressed in terms of the logarithmic decibel scale.
• PSNR is most commonly used to measure the quality of reconstruction of
lossy compression codecs (e.g., for image compression).
• The signal in this case is the original data, and the noise is the error introduced by
compression.
• When comparing compression codecs, PSNR is an approximation to human
perception of reconstruction quality.
• Although a higher PSNR generally indicates that the reconstruction is of higher
quality, in some cases it may not.
Peak signal-to-noise ratio
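• For 8-bit images (peak value 255): $\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\mathrm{MSE}} \right)$ dB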
• Using dB
• smaller numbers can be used
• there's an approximately linear relation between measurement and
perceived sensation
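A minimal PSNR sketch, assuming 8-bit samples (peak value 255) and the usual definition PSNR = 10 log10(peak^2 / MSE). The function name and the `peak` parameter are illustrative choices.

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equal-length sample sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical samples; the MSE here is 3.25.
print(psnr([52, 55, 61, 66], [50, 55, 64, 66]))
```

Because of the logarithm, a huge range of power ratios is mapped to a small range of decibel values.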
Rate-distortion Theory
• Rate-distortion theory says that for a given source and a given
distortion measure, there exists a function, R(D), called the rate-distortion
function, which gives the minimum rate needed to represent the source with
distortion no greater than D.
LBG Algorithm of VQ
• Proposed by Linde, Buzo and Gray, 1980.
• It is an iterative algorithm.
• The algorithm requires an initial codebook obtained by the splitting
method.
• An initial codevector $C$ is set as the average of the entire training sequence. This
codevector is then split into two:
• $y_1 = (1 + \varepsilon) C$
• $y_2 = (1 - \varepsilon) C$
LBG Algorithm of VQ (Cont.)
• Given $y_k$, $R_k$ should be chosen such that
• $R_k = \{x : d(x, y_k) \le d(x, y_j) \ \forall j \ne k\}$
= the set of $x$ for which $y_k$ is the nearest point.
• Replace each $y_k$ with the centroid of $R_k$.
• If the overall distortion $D$ is lower than a threshold, stop!
• Otherwise, go back to the first step and repeat until the desired number
of codewords or the required distortion is obtained.
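The splitting and refinement steps can be sketched as follows. This is a simplified illustration, not the reference algorithm: it assumes squared-error distortion, stops refining when the relative drop in distortion falls below a threshold (a common variant of the stopping rule), and lets empty regions keep their old codevector. All names and the training data are hypothetical.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(vectors):
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def refine(training, codebook, threshold=1e-3, max_iter=100):
    """Alternate nearest-neighbor partitioning and centroid updates."""
    prev = float("inf")
    for _ in range(max_iter):
        regions = [[] for _ in codebook]
        total = 0.0
        for x in training:
            k = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
            regions[k].append(x)          # x belongs to region R_k
            total += sq_dist(x, codebook[k])
        distortion = total / len(training)
        # Replace each codevector with the centroid of its region.
        # Assumption: an empty region keeps its old codevector.
        codebook = [centroid(r) if r else y for r, y in zip(regions, codebook)]
        # Assumption: stop when the relative improvement is small.
        if prev - distortion <= threshold * max(distortion, 1e-12):
            break
        prev = distortion
    return codebook

def lbg(training, num_codewords, eps=0.01):
    """Grow the codebook by splitting; num_codewords should be a power of 2."""
    codebook = [centroid(training)]       # average of the training sequence
    while len(codebook) < num_codewords:
        # Split every codevector C into (1 + eps) C and (1 - eps) C.
        codebook = [tuple(c * (1 + s * eps) for c in y)
                    for y in codebook for s in (+1, -1)]
        codebook = refine(training, codebook)
    return codebook

# Hypothetical training set with two obvious clusters.
training = [(0, 0), (2, 0), (10, 10), (12, 10)]
print(sorted(lbg(training, 2)))
```

Note that the multiplicative split produces two identical codevectors when C is the zero vector; an additive perturbation is a common workaround in that case.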
JPEG
• JPEG stands for Joint Photographic Experts Group, a committee formed in
1986 by ISO and ITU; the name also refers to its lossy compression standard.
• Voted as an international standard in 1992.
• JPEG works best on natural images where, unlike graphics images, there
is not much redundancy to exploit (compression ratios of 10-50).
JPEG
• JPEG uses transform coding, largely based on the following
observations:
1. A large majority of useful image contents change relatively slowly across
images.
• Lower spatial frequency components contain more information than the high frequency
components which often correspond to less useful details and noise.
• Global and local information: one way to describe them is in terms of spatial
frequencies. From the visual system’s perspective, the world is made up of
different spatial frequencies.
JPEG
• The top panel shows a typical real-world scene.
• The two bottom panels show the same picture filtered to pass only the
low spatial frequencies or only the high spatial frequencies.
• It is evident that the two spatial-frequency components carry different
kinds of information that accord well with the general conceptualization
of global information and local information.
• The bottom left picture carries a global representation of the scene,
whereas the bottom right picture conveys information about edges and
details.
JPEG
• JPEG uses transform coding, largely based on the following
observations (continued):
2. Humans are much less sensitive to the loss of higher spatial frequency
components than to the loss of lower frequency components.
JPEG Color Space
• JPEG uses the YUV color space (YCbCr in practice).
• The human eye is more sensitive to luminance than to chrominance.
Typically, JPEG encoders discard ¾ of the chrominance information before
any other compression takes place.
Chroma Subsampling
4:2:0 Subsampling
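A rough sketch of the color conversion and 4:2:0 subsampling. It assumes the JFIF RGB-to-YCbCr equations and subsamples by averaging each 2x2 block of chroma samples, which is one possible choice among several. Planes are plain nested lists with even dimensions, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF-style conversion of one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_420(plane):
    """Keep one chroma sample per 2x2 block (assumes even width and height).

    Assumption: the retained sample is the block average.
    """
    height, width = len(plane), len(plane[0])
    return [[(plane[r][c] + plane[r][c + 1] +
              plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, width, 2)]
            for r in range(0, height, 2)]

# Hypothetical 2x4 chroma plane: 8 samples in, 2 samples out.
cb_plane = [[100, 102, 110, 114],
            [104, 106, 112, 116]]
print(subsample_420(cb_plane))  # [[103.0, 113.0]]
```

Each chroma plane shrinks to a quarter of its samples, which corresponds to discarding ¾ of the chrominance information, while the luminance plane is left at full resolution.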
Discrete Cosine Transform (DCT)
• DCT helps separate the image into parts (sub-bands) of different
importance. (with respect to the image’s visual quality.)
• The DCT transforms a signal or image from the spatial domain to the
frequency domain.
• In JPEG, the DCT is applied to blocks of 8x8 pixels.
DCT basis Function
• Each basis function is multiplied by its coefficient and then this
product is added to the final image.
• An 8x8 image that just contains one shade of gray will consist only of a
weighted value for the upper left hand DCT basis.
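The claim about a uniformly gray block can be checked with a direct, unoptimized 2-D DCT. This sketch assumes the orthonormal DCT-II scaling, which matches the JPEG definition for 8x8 blocks; real codecs use fast factorizations and usually level-shift the samples first, which is omitted here. The helper names are hypothetical.

```python
import math

N = 8  # JPEG block size

def _alpha(k):
    # Normalization that makes the transform orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(k, n):
    # Sample n of the 1-D cosine basis function with frequency index k.
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2-D DCT of an 8x8 block (direct O(N^4) evaluation)."""
    return [[_alpha(u) * _alpha(v) *
             sum(block[x][y] * _basis(u, x) * _basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT: a weighted sum of the 64 basis functions."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] *
                 _basis(u, x) * _basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# A block with a single shade of gray: only the upper-left (DC)
# coefficient is non-zero, up to floating-point round-off.
gray = [[100] * N for _ in range(N)]
coeffs = dct_2d(gray)
print(round(coeffs[0][0], 6))
```

The inverse transform makes the basis-function view explicit: every output pixel is the sum of all basis functions, each scaled by its coefficient.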
DCT
• Low spatial frequencies in DCT appear in the upper left corner of the
DCT. The lower right values represent higher frequencies, and are
often small.
• Small enough to be neglected with little visible distortion.
• In theory there should be no loss in this transform. In reality, the
floating point values can’t be stored exactly so there is always some
loss.
Why DCT not FFT?
• DCT is similar to the Fast Fourier Transform (FFT), but can
approximate lines well with fewer coefficients.
Quantization
• Quantization:
• Each coefficient is divided by an integer between 1 and 255 and rounded to
the nearest integer.
• The DCT block should be quantized using a quantization table.
• The quantization table is chosen to reduce the precision of each
coefficient to no more than necessary.
• The choice of the quantization table is a trade off between
compression and image quality.
• Example: 110101 = 53 (6 bits)
q[u,v] = 4 => 53 / 4 = 13.25, rounded to 13 = 1101 (4 bits), i.e., the two
least significant bits are dropped.
• Quantization error is the main source of loss in lossy compression.
Quantization
• The quantization table is stored in the file header for use by the
decoder.
• The decoder multiplies each quantized coefficient by the same table entry.
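Quantization and its inverse can be sketched as element-wise division and multiplication. The table below is the example luminance table given in Annex K of the JPEG standard; the function names are illustrative, and ties are rounded away from zero as an assumption about what "nearest integer" means here.

```python
# Example luminance quantization table (JPEG standard, Annex K).
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def _round_nearest(x):
    # Round to the nearest integer, ties away from zero (an assumption;
    # Python's built-in round() would send ties to the even neighbor).
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)

def quantize(coeffs, table):
    """Encoder side: divide each coefficient by its table entry and round."""
    return [[_round_nearest(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

def dequantize(levels, table):
    """Decoder side: multiply each level by the same table entry."""
    return [[l * q for l, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]

# The slide's example: 53 with a step of 4 becomes 13, and 52 on decoding.
print(quantize([[53]], [[4]]), dequantize([[13]], [[4]]))
```

The difference between 53 and the reconstructed 52 is the quantization error; small high-frequency coefficients divided by large table entries typically round to zero.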
Quantization
• Uniform Quantization
• Each transform coefficient is divided by the same constant N.
• Non-uniform Quantization – Quantization Tables
• The eye is most sensitive to low frequencies (upper left corner) and less
sensitive to high frequencies (lower right corner).
• Custom quantization
• Quantization tables can be put in image header.
Quantization Tables
• The luminance quantization table
• The chrominance quantization table
Zigzag Scan
• Re-arrange the 8x8 block into a 1x64 vector in zigzag order to group
the low-frequency coefficients at the top of the vector.
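One compact way to generate the zigzag order is to walk the anti-diagonals of the block and alternate direction on each one. The sketch below does this by sorting the index pairs; the function names are illustrative, and production codecs normally use a precomputed lookup table instead.

```python
def zigzag_order(n=8):
    """Return the (row, col) pairs of an n x n block in zigzag order."""
    def key(rc):
        r, c = rc
        d = r + c                      # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left (row increasing),
        # even diagonals run bottom-left to top-right (row decreasing).
        return (d, r if d % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block into a list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Small 3x3 block whose values are numbered in zigzag order.
print(zigzag_scan([[1, 2, 6],
                   [3, 5, 7],
                   [4, 8, 9]]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```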
Zigzag scan Example
DCT Coefficients encoding
• Method 1:
• DC component
• Differential Pulse Code Modulation (DPCM)
• The DC component is large and varied, but often close to the previous value.
• Encode the difference from the DC value of the previous 8x8 block.
• AC component
• Run Length Encode (RLE)
• 1x64 vector has lots of zeros in it.
• Keeps skip and value.
• skip is the number of zeros and value is the next non-zero component.
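Both steps of Method 1 can be sketched in a few lines. Two details here are assumptions rather than something stated above: the predictor for the first block starts at 0, and a (0, 0) pair is appended as an end-of-block marker meaning "the rest is zeros". The function names and sample values are illustrative.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    diffs, prev = [], 0   # assumption: the predictor starts at 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def rle_encode(ac_values):
    """Turn the 63 AC coefficients into (skip, value) pairs."""
    pairs, skip = [], 0
    for v in ac_values:
        if v == 0:
            skip += 1                 # count zeros before the next value
        else:
            pairs.append((skip, v))
            skip = 0
    pairs.append((0, 0))              # assumption: end-of-block marker
    return pairs

# Hypothetical DC values of five consecutive blocks.
print(dpcm_encode([150, 155, 149, 152, 144]))  # [150, 5, -6, 3, -8]

# Hypothetical AC vector: a few non-zero values, then a long run of zeros.
ac = [32, 6, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 1] + [0] * 50
print(rle_encode(ac))
```

The long tail of zeros collapses into the single end-of-block pair, which is where most of the saving comes from.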
DCT Coefficient Example
Entropy Coding
• Method 2:
• Categorize DC values into SIZE (number of bits needed to represent)
and actual amplitude bits.
Entropy coding-DC
• SIZE is Huffman encoded.
• The actual amplitude bits are not Huffman encoded; they are sent as-is.
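The split of a DC difference into SIZE and amplitude bits can be sketched as below. It follows the usual JPEG convention in which a negative value is represented by the one's complement of its magnitude; the Huffman coding of SIZE itself is left out, and the function names are illustrative.

```python
def size_category(value):
    """SIZE: number of bits needed to represent |value| (0 for value 0)."""
    return abs(value).bit_length()

def amplitude_bits(value):
    """Amplitude as a bit string of length SIZE.

    Negative values are stored as the one's complement of |value|,
    so the leading bit tells positive (1) from negative (0).
    """
    size = size_category(value)
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1
    return format(value, "0{}b".format(size))

# Hypothetical DC differences.
for diff in [150, 5, -6, 3, -8]:
    print(diff, size_category(diff), amplitude_bits(diff))
```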
Entropy Coding-AC
• Two symbols are used:
• Symbol 1: (skip, SIZE)
• skip is the number of zeros and SIZE is the number of bits of the non-zero component.
• (0010, 0100): an AC coefficient of 4 bits with 2 zero values before it.
• Huffman Encoded
• Symbol 2: actual AC coefficient
• Is not Huffman coded
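A small sketch of how (skip, value) pairs might be turned into the two symbols. It assumes every run of zeros is shorter than 16, so skip fits in 4 bits; longer runs need an extra symbol that is not handled here. Negative amplitudes use the one's complement convention, and the function name is hypothetical.

```python
def ac_symbols(pairs):
    """Map (skip, value) pairs to ((skip, SIZE), amplitude_bits) symbols."""
    symbols = []
    for skip, value in pairs:
        size = abs(value).bit_length()
        if size == 0:
            bits = ""                     # e.g. an end-of-block pair (0, 0)
        else:
            # One's complement of the magnitude for negative values.
            coded = value if value >= 0 else value + (1 << size) - 1
            bits = format(coded, "0{}b".format(size))
        # Symbol 1 = (skip, size) would be Huffman coded;
        # Symbol 2 = bits would be appended unchanged.
        symbols.append(((skip, size), bits))
    return symbols

# Two zeros followed by the value 9 gives symbol 1 = (2, 4),
# i.e. (0010, 0100) in 4-bit binary fields.
print(ac_symbols([(2, 9), (0, -1), (0, 0)]))
```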
Entropy Coding-AC - Example
Entropy Coding
• Huffman tables (both for DC and AC coefficients) can be:
• Custom -> Sent in file header
• Default
JPEG Encoder Block Diagram
JPEG bitstream
• A frame is a picture.
• A scan is a pass through the pixels (e.g., the Y component).
• A segment is a group of blocks.
• A block is an 8x8 group of pixels.
JPEG-Modes-Sequential
• Sequential mode
• Baseline Sequential (described so far)
• Supports only 8 bits per sample.
• Supports only Huffman coding.
• Extended Sequential
• Supports 8 or 12 bits per sample.
• Supports Huffman and arithmetic coding.
JPEG Modes-Progressive
• Goal: Start with a low quality image and successively improve!
• Spectral selection: send the DC coefficients and the first few AC coefficients,
then gradually some more ACs.
• Successive approximation: send the DCT coefficients from MSB to LSB.
JPEG Mode-Lossless
• It does not use DCT-based method!
• Instead, it uses a predictive (differential coding) method:
• A predictor combines the values of up to three neighboring pixels (not blocks
as in the Sequential mode) as the predicted value for the current pixel,
indicated by "X" in the figure below. The encoder then compares this
prediction with the actual pixel value at the position "X", and encodes the
difference (prediction residual) losslessly.
JPEG Mode-Lossless
• Since it uses only previously encoded neighbors, the very first pixel
I(0, 0) will have to use itself.
• Other pixels in the first row always use P1; those in the first column
always use P2.
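The predictive scheme can be sketched as below. It assumes the usual neighbor naming (A = left, B = above, C = upper-left) and the seven predictors of lossless JPEG, and it models "the first pixel uses itself" by predicting 0 so that its residual is the pixel value. The entropy coding of the residuals is omitted, and the function names are illustrative.

```python
def predict(img, r, c, predictor=4):
    """Predict pixel (r, c) from already-coded neighbors."""
    if r == 0 and c == 0:
        return 0                      # assumption: first pixel is sent as-is
    if r == 0:
        return img[r][c - 1]          # first row: P1 = A (left)
    if c == 0:
        return img[r - 1][c]          # first column: P2 = B (above)
    a, b, d = img[r][c - 1], img[r - 1][c], img[r - 1][c - 1]  # d is C
    return {1: a, 2: b, 3: d,
            4: a + b - d,
            5: a + (b - d) // 2,
            6: b + (a - d) // 2,
            7: (a + b) // 2}[predictor]

def encode(img, predictor=4):
    """Return the prediction residuals, pixel by pixel."""
    return [[img[r][c] - predict(img, r, c, predictor)
             for c in range(len(img[0]))] for r in range(len(img))]

def decode(residuals, predictor=4):
    """Rebuild the image in raster order from the residuals."""
    img = [[0] * len(residuals[0]) for _ in residuals]
    for r in range(len(residuals)):
        for c in range(len(residuals[0])):
            img[r][c] = residuals[r][c] + predict(img, r, c, predictor)
    return img

# Hypothetical 3x3 image; the residuals are small except for the first pixel.
image = [[100, 102, 103],
         [101, 104, 106],
         [99, 103, 108]]
print(encode(image))
```

Decoding reproduces the input exactly, since each prediction uses only pixels that were reconstructed earlier in raster order.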
JPEG Mode-Hierarchical
• Pyramid Coding
• Down-sample by factors of 2 in each dimension, e.g., reduce 640 x 480 to 320
x 240
• Code smaller image using another JPEG mode (Progressive, Sequential, or
Lossless).
• Decode and up-sample encoded image
• Encode difference between the up-sampled and the original using
Progressive, Sequential, or Lossless.
• Can be repeated multiple times.
• Good for viewing high resolution image on low resolution display.
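A two-level pyramid can be sketched as follows. To keep the example short, the inner JPEG codec is replaced by an identity placeholder (the base layer and the difference are kept as-is), down-sampling averages 2x2 blocks, and up-sampling repeats pixels. All of these are simplifying assumptions, as are the function names; a real encoder would up-sample the decoded base layer and code both layers with another JPEG mode.

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (even sizes assumed)."""
    return [[(img[r][c] + img[r][c + 1] +
              img[r + 1][c] + img[r + 1][c + 1]) // 4
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

def upsample(img):
    """Double each dimension by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def encode(img):
    """Return the low-resolution base layer and the difference layer."""
    base = downsample(img)            # would be coded with another JPEG mode
    predicted = upsample(base)
    diff = [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(img, predicted)]
    return base, diff

def decode(base, diff):
    """Up-sample the base layer and add the difference back."""
    predicted = upsample(base)
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(predicted, diff)]

# Hypothetical 4x4 image.
image = [[10, 12, 20, 22],
         [14, 16, 24, 26],
         [30, 32, 40, 42],
         [34, 36, 44, 46]]
base, diff = encode(image)
print(base)
```

A low-resolution display can stop after decoding the base layer, while a full-resolution viewer adds the difference layer on top.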