
Multimedia Systems
Sharif University of Technology
INTRODUCTION AND
BACKGROUND
What Is Image Compression?
Image compression is the art/science of finding efficient representations for digital images in order to:
• Reduce the memory required for their storage,
• Reduce the bandwidth or time required for their transfer across communication channels,
• Increase the effective data transfer rate when reading from storage devices.
Digital Image Acquisition
Digital images can be acquired in a variety of ways:
• Image digitization or scanning (Photo CD, facsimile, etc.)
• Direct digital acquisition (digital cameras, most medical imaging modalities such as MRI)
• Synthetic or computer-generated imagery
When referring to digital images, the following terminology is useful:
• Pixel or pel: picture element.
• Spatial resolution: a measure of the number of image pixels per unit length in the horizontal or vertical direction, expressed in pixels/in, dots/in, samples/in, etc.
• Monochrome or grayscale: an image consisting of only gray values and no color.
• Bit-depth: the number of bits used to represent each pixel.
• Code value: a pixel's gray-level value; e.g., in an 8-bit/pixel image, code values range from 0 to 255.
Need for Image Compression: Storage
• Digital Camera
An image captured with a single-chip CCD sensor (with a color
filter array) requiring 768 pixels × 512 lines × 8 bits/pixel ≈
375 KB per image.
A solid-state memory card with a 1 MB capacity would hold
less than 3 images.
• Hybrid Image Capture
A 24 × 36-mm (35-mm) negative photograph scanned at 12 μm
requires 3072 pixels × 2048 lines × 24 bits/pixel ≈ 18 MB per
image.
A CD-ROM with a 600 MB storage capacity would hold about
30 images.
• VHS-Quality Full-Motion Video
Full-motion video using SIF format (CD-I) requires 352 pixels ×
240 lines/frame × 24 bits/pixel × 30 frames/s ≈ 7.6 MB for one
second of video.
A CD-ROM with a 600 MB storage capacity would only hold about 80 seconds of full-motion video. Furthermore, the CD-ROM data transfer rate (1X speed) is 150 KB/s, which means it would take 72 minutes to play back 80 seconds of full-motion video!
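The SIF video numbers above are easy to reproduce (a quick arithmetic check in Python; 1 MB is taken as 10^6 bytes, consistent with the slide's rates):

```python
# SIF full-motion video: 352 pixels x 240 lines, 24 bits/pixel, 30 frames/s
bytes_per_frame = 352 * 240 * 3
rate = bytes_per_frame * 30        # bytes per second
print(rate)                        # 7603200, i.e., about 7.6 MB/s

seconds_on_cd = 600e6 / rate       # 600 MB CD-ROM
print(round(seconds_on_cd))        # 79, i.e., roughly the 80 seconds quoted
```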
Need for Image Compression: Transmission
• Facsimile Transmission
A low-resolution facsimile image (e.g., an 8 1/2 × 11-inch page scanned at 200 pixels/inch horizontally and 100 pixels/inch vertically) transmitted over ordinary dial-up telephone lines.
Using a 4.8 Kb/s modem, the transmission takes approximately
6.5 minutes per uncompressed image.
• Digital Broadcasting (HDTV)
Digital HDTV at a progressive scan resolution of 1280 × 720 pixels/frame × 24 bits/pixel × 60 frames/s.
The data rate is about 166-186 MB/s. The aim of HDTV is to
broadcast at about 20 Mbit/s using existing TV channels. This
requires approximately 60:1 to 80:1 compression.
• Videophone
A QCIF color video image at 176 × 144 pixels × 24 bits/pixel,
15 frames/s, transmitted over ordinary dial-up telephone lines
with a 28.8 Kb/s modem.
The compression ratio needed is in excess of 300:1.
Need For Digital Image Compression

Application      Pixels  Lines  f/s  Size       CR
Digital Camera     768    512    -   375 KB       4
Photo CD          3072   2048    -    18 MB      20
Facsimile         1728   1100    -   240 KB      10
CD-I               352    240   30   7.6 MB/s    50
HDTV              1280    720   60   166 MB/s    80
Video Phone        176    144   15   1.1 MB/s   300
Digital Image Compression Applications
• Consumer Imaging (e.g., digital cameras, Photo CD)
• Multimedia Products (e.g., CD-I, DV-I)
• Medical Imaging (e.g., computed radiography, CT, MRI)
• Remote Sensing (e.g., LANDSAT imaging)
• Photojournalism (e.g., transceiver systems)
• Graphic Arts (e.g., electronic pre-press)
• Computer-Generated Images (e.g., advertising, entertainment,
scientific visualization)
• Facsimile Transmission (e.g., documents, digital half-tones, color
fax)
• Broadcasting (e.g., HDTV)
• Many more such as: photo-videotex, image archiving, digital
copiers, desktop and electronic publishing, teleconferencing, video
phone, education, games, video mail, etc.
Compressibility of Images
Consider the following 8 × 8 block example of the image LENA.
How difficult is it to guess the correct value of the missing pixel?
How critical is it to reproduce the exact value?
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162  ?  160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158
Why Can Images Be Compressed?
Image compression can be achieved primarily because image data are
often highly redundant and/or irrelevant. Redundancy relates to the
statistical properties of images while irrelevancy relates to an
observer viewing an image.
• Redundancy (statistical):
  - Spatial: correlation between neighboring pixel values. The amount of correlation depends on many factors such as resolution, bit depth, scene content, and noise.
  - Spectral: correlation between color planes (e.g., RGB color images) or spectral bands (e.g., remote sensing).
  - Temporal: correlation between neighboring frames in an image sequence.
• Irrelevancy (perceptual):
  - The human visual system (HVS) exhibits sensitivity variations as a function of spatial frequency, orientation, light level, surrounding signals (masking), etc. Among the factors affecting irrelevancy are scene content and noise, image size and viewing distance, display characteristics (e.g., noise, light level, nonlinearities), and the observer.
Compression of Color Images
A digitized color image is typically represented by the tristimulus red,
green, and blue signals R(i,j), G(i,j), and B(i,j), where (i,j) denotes the
pixel location. In general, the R, G, and B signals are highly
correlated and thus redundant. This correlation can be exploited by
transforming the RGB values into a new set of values that are
uncorrelated (or significantly less correlated). Many transformations,
depending on the application, have been suggested to accomplish this
task. The JPEG standard does not provide a definition for the color
space to be used; the choice has intentionally been left to the user.
The goal of the color transform is to pack most of the image spectral
energy into one component and significantly less energy into the
remaining two components. Loosely speaking, this is similar to the
way the human visual system (HVS) encodes color. Psychophysical
experiments suggest that the HVS encodes color by transforming the
output of the three cone color mechanisms into an achromatic channel
(such as luminance) and two chromatic channels (opponent
channels). For the purposes of image compression, since the HVS is
much more sensitive to the variations in the achromatic channel than
in the chromatic channels, the chromatic components are often
subsampled by a factor of 2:1 or 4:1 in both horizontal and vertical
directions prior to compression.
Compression of Color Images (Cont'd)
An example of a transformation used for color image compression is
that used in the NTSC national color television standard. The RGB
values are transformed into a new set of values known as Y, I, and Q,
(luminance, in-phase, and quadrature, respectively) according to the
following
    Y(i,j) =  .299 R(i,j) + .587 G(i,j) + .114 B(i,j)
    I(i,j) =  .596 R(i,j) − .274 G(i,j) − .322 B(i,j)
    Q(i,j) =  .212 R(i,j) − .523 G(i,j) + .311 B(i,j)

The transformation to convert YIQ back to RGB values is:

    R(i,j) = 1.000 Y(i,j) +  .956 I(i,j) +  .621 Q(i,j)
    G(i,j) = 1.000 Y(i,j) −  .272 I(i,j) −  .647 Q(i,j)
    B(i,j) = 1.000 Y(i,j) − 1.106 I(i,j) + 1.703 Q(i,j)
A transform similar to the YIQ, but easier to implement in hardware
is the color difference transform, which generates the Y, R – Y, and B
– Y components.
The perfect transform has yet to be agreed on, but with respect to data
compression, the performance differences are insignificant.
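The forward and inverse transforms above can be checked numerically; the sketch below hard-codes the two matrices from this slide and verifies that a round trip reproduces the input to within the rounding error of the truncated coefficients:

```python
# Forward and inverse transforms with the coefficients quoted above
RGB2YIQ = [[0.299, 0.587, 0.114],
           [0.596, -0.274, -0.322],
           [0.212, -0.523, 0.311]]
YIQ2RGB = [[1.000, 0.956, 0.621],
           [1.000, -0.272, -0.647],
           [1.000, -1.106, 1.703]]

def apply3(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

rgb = [200.0, 120.0, 40.0]           # an arbitrary test pixel
yiq = apply3(RGB2YIQ, rgb)
back = apply3(YIQ2RGB, yiq)
print([round(x, 1) for x in back])   # close to the original [200, 120, 40]
```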
Types of Image Compression
●Lossless (Reversible, Bit-preserving):
– The image after compression/decompression is numerically identical to the original image on a pixel-by-pixel basis.
– Only the statistical redundancy is exploited to achieve compression.
– A modest amount of compression, generally in the range of
1.5:1 to 4:1 (depending on the original image characteristics
such as image detail, noise, resolution and bit-depth), can be
achieved.
●Lossy (Irreversible):
– The reconstructed image contains degradations relative to the
original image. As a result, much higher compression can be
achieved. In general, lower bit rates are achieved by allowing
more degradations.
– The statistical redundancy as well as the perceptual irrelevancy
of the image data are exploited.
– The term visually lossless is used to describe lossy compression
schemes that result in no visible degradation. This definition is
subjective and highly dependent on the viewing conditions.
Digital Image Compression
Standardization Activities
The adoption of image compression standards allows for:
● the exchange of images across application boundaries as well as
a means for controlling image quality.
● a significant reduction in the cost of specialized hardware required in real-time image compression systems, thus stimulating
product development.
ITU-T: The International Telecommunication Union (formerly CCITT) is concerned with telecommunications and transmission applications (e.g., fax standards). Standards originating from ITU-T are referred to as Recommendations.
ISO: The International Organization for Standardization is concerned with information processing applications such as storage. Standards originating from ISO are referred to as International Standards.
• Bilevel images (ITU-T Recs. T.4 (G3) and T.6 (G4), JBIG)
• Still-frame, continuous-tone images (JPEG)
• Continuous-tone image sequences (ITU-T Recommendations H.261 and H.263, MPEG family of standards)
Bilevel Image Standards
JBIG (Joint Bilevel Imaging Group) was formed under the joint
auspices of ISO and ITU-T in 1988 to work on an international standard for the compression and decompression of bilevel images.
The group’s primary focus was to seek an algorithm that outperforms
the existing ITU-T Group 3 and Group 4 facsimile standards within
the scope of their applicability and to extend the utility of the
standard to other applications. These extensions include:
• Progressive representation of images so that the compressed data
can be made available to a number of display or transmission
devices with varying degrees of resolution and/or image quality
requirements.
• Adaptivity so that images with different characteristics can be
compressed efficiently. For example, the current fax standards
can result in a data expansion when digital half-tones are used as
the input. This is because these standards use predefined coding
tables based on the statistics of binary documents and text.
Current status: The committee has issued a Standard document
ITU-T T.82 — ISO/IEC 11544 in December of 1993.
Still-frame, Continuous-tone Image Standards
JPEG (Joint Photographic Experts Group) was formed under the joint auspices of ISO and ITU-T in 1986 to work on an international standard for the compression and decompression of still-frame, continuous-tone images.
The JPEG standard consists of the following components:
• Baseline system: Provides a simple and efficient lossy DCT-based algorithm that is adequate for most applications.
• Extended system features: Allow the baseline system to satisfy a broader range of applications, such as up to 12-bit/sample input, arithmetic coding, adaptive quantization, and progressive modes.
• Independent lossless method: A DPCM-based lossless algorithm for applications that require an exact reconstruction of the original image.
Current status: Three Standard documents ITU-T T.81- ISO/IEC
10918 were issued in 1994 and 1995.
Continuous-tone Image Sequence Standards
ITU-T standardized a coding algorithm in 1990 (Recommendation H.261) for video telephony and teleconferencing over ISDN at bit rates ranging from 64 to 1920 (30 × 64) Kb/s, and another algorithm in 1995 (Recommendation H.263) for lower bit rate applications (8-32 Kb/s) over POTS.
MPEG (Moving Picture Experts Group) has also been working under
the auspices of ISO and ITU-T since 1988 to develop standards for
the digital storage and retrieval and transmission of moving images
and sound. The MPEG committee supports several concurrent
activities that address the various needs and bit rate requirements of
different applications:
MPEG I committee has focused on compressing moving pictures
and sound at consumer VCR quality at a combined bit rate of 1.0-1.5
Mb/s for applications such as CD-ROM storage. A Standard
ISO/IEC 11172 was issued in 1994.
MPEG II committee has focused on bit rates higher than 4 Mb/s up to and exceeding the bit rates required for HDTV. A Standard ITU-T H.262 | ISO/IEC 13818 was issued in 1994.
MPEG IV committee has focused on a content-based interactive audio-visual standard for delivery in 1998.
Generalized Image Coding scheme
Original image data
(e.g., 24 bits/pixel full-color RGB)
        |
        v
Transformation or decomposition to reduce the dynamic range, or allow for more efficient coding (e.g., DPCM prediction error; DCT; subband decomposition)
        |
        v
Quantization                                  [lossy]
        |
        v
Symbol encoding                               [lossless]
(e.g., Huffman; arithmetic; Lempel-Ziv)
        |
        v
Encoded data
Decomposition or Transformation
• A reversible process (or near-reversible, due to finite-precision arithmetic) that provides an alternate image representation that is more amenable to the efficient extraction and coding of relevant information.
• Examples:
  – Block-based linear transformations, e.g., the Fourier transform, the discrete cosine transform (DCT), the Hadamard transform, etc., aimed at removing redundancy by decorrelating the image blocks.
  – Prediction/residual formation, e.g., differential pulse code modulation (DPCM), motion-compensated frame differencing.
  – Subband decomposition.
  – Color space transformations, e.g., RGB → YIQ, aimed at decorrelating color components or utilizing the HVS perception of color.
SYMBOL ENCODING
Definition of Information
Let E be an event that occurs with probability p(E). By the occurrence of the event E, one receives

    I(E) = log(1 / p(E))

units of information. Thus, the more unlikely an event, the more information is revealed by its occurrence.
The base of the logarithm determines the units for information:
• −log2 p(E)    bits
• −ln p(E)      nats
• −log10 p(E)   Hartleys
Discrete Memoryless Source (DMS)
Source S → si, sj, ...

Consider a source that emits a sequence of symbols from a fixed finite source alphabet of size n, S = {s1, s2, ..., sn}, where the probability of the occurrence of the symbol si is given by p(si). Furthermore, assume that successive symbols emitted from the source are statistically independent.
• If symbol si occurs, an amount of information equal to

    I(si) = log2(1 / p(si))   bits

is provided.
• The probability of occurrence of a source symbol is p(si), so the average amount of information obtained per source symbol from the source is

    H(S) = Σ(i=1..n) p(si) I(si) = Σ(i=1..n) p(si) log2(1 / p(si)).

This is called the entropy of the source.
Example
Consider a source with an alphabet consisting of four letters S = {s1, s2, s3, s4}, where p(s1) = 0.60, p(s2) = 0.30, p(s3) = 0.05, and p(s4) = 0.05.
The entropy of this source is
H(S) = 1.40 bits/symbol
The entropy of a source attains its maximum value when all the
source symbols are equally probable and that value is log2 n bits,
where n is the size of the source alphabet. In our example, if all the
source symbols had the same probability (p = 0.25), the entropy of the
source would be 2 bits/symbol.
Interpreting the entropy:
• The average amount of information per source symbol provided
by the source.
• Consider encoding this source with a binary alphabet for transmission or storage. According to Shannon’s noiseless source
coding theorem, no uniquely decodable code can have an
average length less than H(S).
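The entropy figures quoted in the example can be verified with a few lines of Python:

```python
from math import log2

def entropy(probs):
    """Average information per source symbol, in bits."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(round(entropy([0.60, 0.30, 0.05, 0.05]), 2))  # 1.4 (the 1.40 quoted)
print(entropy([0.25] * 4))                          # 2.0 = log2(4)
```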
Markov Information Source
• An mth-order Markov source is a source in which the occurrence of
a source symbol si depends upon a finite number m of preceding
symbols. It is specified by the set of conditional probabilities:
    P(si | sj1, sj2, ..., sjm)   for i, jp = 1, 2, ..., n  (p = 1, ..., m).

• An mth-order Markov source has n^m possible states.
• The entropy of the mth-order Markov source is given by

    H(S) = Σ(over all n^m states sj1, ..., sjm) P(sj1, sj2, ..., sjm) H(S | sj1, ..., sjm)

• Because of the structure present in the Markov source, its entropy is less than that of a DMS with the same source alphabet and stationary symbol probabilities (known as the adjoint source and denoted by S̄). The output of the Markov source is more predictable than its adjoint source, and thus less information is revealed by the occurrence of its source symbols.
Markov Source Example
Consider a binary image represented by the values 0 and 1. The
image may be assumed to have been generated by a Markov source
where the probability of a given pixel value being 0 or 1 is determined by the combination of its m neighboring pixels. In general, m binary-valued pixels constitute 2^m different combinations, where each combination represents a certain state or context. An example of a 7-pixel neighborhood (128 contexts) is shown below.

    C D E F G
      B A x

(The pixels labeled A-G are previously scanned neighbors of the current pixel x; each of the 2^7 = 128 joint values of these seven binary pixels defines one context.)

Let pj denote the probability of the pixel value being zero in the jth context. The entropy of the jth context, Hj, is given by

    Hj = −pj log2 pj − (1 − pj) log2(1 − pj)   bits.

The entropy of the image, H(S), based on this model, is the average of the entropies of these different contexts, i.e.,

    H(S) = Σ(j=1..2^m) Hj P(j),

where P(j) is the probability of the occurrence of the jth context.
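The context-model entropy above is straightforward to estimate from an actual binary image. The sketch below uses a smaller 3-pixel causal context (m = 3, so 8 contexts) rather than the 7-pixel template, to keep it short; the counting and averaging are the same:

```python
from math import log2
from collections import defaultdict

def h_bin(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def context_entropy(img):
    """Estimate H(S) of a binary image from the 8 contexts formed
    by the west, north, and north-west neighbors (an m = 3 model)."""
    zeros = defaultdict(int)
    total = defaultdict(int)
    for r in range(1, len(img)):
        for c in range(1, len(img[0])):
            ctx = (img[r][c - 1], img[r - 1][c], img[r - 1][c - 1])
            total[ctx] += 1
            zeros[ctx] += (img[r][c] == 0)
    n = sum(total.values())
    # average the per-context entropies, weighted by context probability
    return sum(total[ctx] / n * h_bin(zeros[ctx] / total[ctx])
               for ctx in total)

# A vertical-stripe image is fully predictable from its context:
stripes = [[(c // 2) % 2 for c in range(16)] for _ in range(16)]
print(context_entropy(stripes))  # 0.0
```

For a structured image such as the stripes, every context fully determines the pixel, so the model entropy drops to zero, mirroring the high compressibility of bilevel documents.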
Entropy of English Language [1]
Zeroth Approximation
Assume all symbols are equiprobable:

    H(S) = log2(27) = 4.75   bits/symbol

A typical text generated by such a model would be:
ZEWRTZYNSADXESYJRQY-WGECI
JJ-OBVKRBQPOZBYMBUAWVLBTQ
CNIKFMP-KMVUUGBSAXHLHSIE-M

[1] N. Abramson, Information Theory and Coding, McGraw-Hill, 1963.
Probability of Symbols in English

Symbol   Probability      Symbol   Probability
Space      0.1859           N        0.0574
A          0.0642           O        0.0632
B          0.0127           P        0.0152
C          0.0218           Q        0.0008
D          0.0317           R        0.0484
E          0.1031           S        0.0514
F          0.0208           T        0.0796
G          0.0152           U        0.0228
H          0.0467           V        0.0083
I          0.0575           W        0.0175
J          0.0008           X        0.0013
K          0.0049           Y        0.0164
L          0.0321           Z        0.0005
M          0.0198

Probability of the letters in the English alphabet
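Plugging the table above into the entropy formula reproduces the first-approximation figure (the rounded table probabilities give a value slightly above the quoted 4.03 bits/symbol):

```python
from math import log2

p = {' ': 0.1859, 'A': 0.0642, 'B': 0.0127, 'C': 0.0218, 'D': 0.0317,
     'E': 0.1031, 'F': 0.0208, 'G': 0.0152, 'H': 0.0467, 'I': 0.0575,
     'J': 0.0008, 'K': 0.0049, 'L': 0.0321, 'M': 0.0198, 'N': 0.0574,
     'O': 0.0632, 'P': 0.0152, 'Q': 0.0008, 'R': 0.0484, 'S': 0.0514,
     'T': 0.0796, 'U': 0.0228, 'V': 0.0083, 'W': 0.0175, 'X': 0.0013,
     'Y': 0.0164, 'Z': 0.0005}

assert abs(sum(p.values()) - 1.0) < 1e-9   # the table is a valid distribution
H = sum(pi * log2(1 / pi) for pi in p.values())
print(round(H, 2))  # close to the quoted 4.03 bits/symbol
```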
Entropy of English Language
First Approximation
Symbols are chosen according to their probabilities:

    H(S) = Σi pi log2(1 / pi) = 4.03   bits/symbol

A typical text generated by such a model would be:
AI-NGAE--ITF-NNR-ASAEV-OIE-BAINTHA-HYR-OOPOER-SETRYGAIETRWCO--EHDUARU-EU-C-FTNSREM-DIY-EESE--F-O-SRIS-R--UNNAS
Second Approximation
Symbols are chosen according to their 1st-order conditional probabilities:

    H(S) = Σi Σj P(i, j) log2(1 / P(i | j)) = 3.32   bits/symbol

A typical text generated by such a model would be:
URTESHETHING-AD-E-AT-FOULEITHALIORTWACT-D-STE-MINTSAN-OLINS-TWID-OULY-TETHIGHE-CO-YS-TH-HR-UPAVIDE-PAD-CTAVED
Entropy of English Language
Third Approximation
Symbols are chosen according to their 2nd-order conditional probabilities:

    H(S) = Σi Σj Σk P(i, j, k) log2(1 / P(i | j, k)) = 3.1   bits/symbol

A typical text generated by such a model would be:
IANKS-CAN-OU-ANG-RLER-THATTEDOF-TO-SHOR-OF-TO-HAVEMEM-A-IMAND-AND-BUT-WHISSITABLY-THER
VEREER-EIGHTS-TAKILLIS-TA
nth (n → ∞) Approximation
Symbols are chosen according to their nth-order conditional probabilities; as n → ∞, the entropy of the model approaches that of the English language. Based on Shannon's experiments, H(S) is estimated to be between 0.6 and 1.3 bits/symbol.
Variable-Length Codes
Consider a source S with four symbols: S = {s1, s2, s3, s4}

Symbol   Probability   Code I   Code II
s1          0.60         00        0
s2          0.30         01        10
s3          0.05         10        110
s4          0.05         11        111

• Average length of Code I = 2.0 bits/symbol
• Average length of Code II = 1.5 bits/symbol
• Code II is a prefix condition code, i.e., no codeword is a prefix of any other codeword. This allows a bit stream generated using Code II to be uniquely decodable. No uniquely decodable code can have an average length less than H(S).
Example:
Transmitted bit stream: 0011010111010 ...
Decoded bit stream: 0/0/110/10/111/0/10/...
Decoded message: s1, s1, s3, s2, s4, s1, s2, ...
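The prefix property of Code II makes greedy decoding trivial; a minimal sketch:

```python
# Code II from the table above, decoded greedily: because no codeword
# is a prefix of another, the first match is always the right one.
code = {'0': 's1', '10': 's2', '110': 's3', '111': 's4'}

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in code:
            out.append(code[buf])
            buf = ''
    return out

print(decode('0011010111010'))  # ['s1', 's1', 's3', 's2', 's4', 's1', 's2']
```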
Huffman Coding
A code is compact (for a given source) if its average length is less than or equal to the average length of all other instantaneous codes for the same source and the same code alphabet. A general method for constructing compact codes is due to Huffman and is based on the following two principles:
• Consider a source with an alphabet of size α. By combining the two least probable symbols of this source, a new source with α − 1 symbols is formed. It can be shown that the codewords for the original source are identical to the reduced source codewords for all symbols that have not been combined. Furthermore, the codewords for the two least probable symbols of the original source are formed by appending (to the right) '0' or '1' to the codeword corresponding to the combined symbol in the reduced source.
• The Huffman code for a source with only two symbols consists of the trivial codewords '0' and '1'.
Thus, to construct the Huffman code for a source S, the original source is repeatedly reduced by combining the two least probable symbols at each stage until a source with only two symbols is obtained. The Huffman code for this reduced source is known ('0' and '1'). Then, the codewords for each previous stage are found by appending a '0' or '1' to the codeword of the corresponding combined symbol. This process is continued until the Huffman code for the original source is found.
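The source-reduction procedure maps naturally onto a priority queue. Below is a minimal sketch (standard-library Python, applied to the five-symbol source of the following example); the tuple tie-breaker exists only so the heap never compares the dictionaries:

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a compact code by repeatedly combining the two least
    probable entries and appending '0'/'1' on the way back up."""
    tick = count()                       # tie-breaker for equal probabilities
    heap = [(p, next(tick), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # the two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

probs = {'S1': 0.40, 'S2': 0.20, 'S3': 0.15, 'S4': 0.15, 'S5': 0.10}
code = huffman(probs)
lengths = sorted(len(w) for w in code.values())
avg = sum(probs[s] * len(w) for s, w in code.items())
print(lengths, round(avg, 2))  # [1, 3, 3, 3, 3] 2.2
```

Ties may be broken differently from the tables that follow, so the individual codewords can differ, but the codeword lengths (and hence the average length) are the same.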
Huffman Coding Example

a) Source reduction process

  Original source     Reduced source     Reduced source     Reduced source
                      Stage 1            Stage 2            Stage 3
  Si     P(Si)        Si'    P(Si')      Si''   P(Si'')     Si'''  P(Si''')
  S1     0.40         S1'    0.40        S1''   0.40        S1'''  0.60
  S2     0.20         S2'    0.25        S2''   0.35        S2'''  0.40
  S3     0.15         S3'    0.20        S3''   0.25
  S4     0.15         S4'    0.15
  S5     0.10

b) Codeword construction process

  Original source     Reduced source     Reduced source     Reduced source
                      Stage 1            Stage 2            Stage 3
  Si     Codeword     Si'    Codeword    Si''   Codeword    Si'''  Codeword
  S1     1            S1'    1           S1''   1           S1'''  0
  S2     000          S2'    01          S2''   00          S2'''  1
  S3     001          S3'    000         S3''   01
  S4     010          S4'    001
  S5     011
Code Efficiency

A compact code for S may still have an average length far greater than H(S). In our example, L̄ = 1.50 bits/symbol, while H(S) = 1.40 bits/symbol. The code efficiency is defined as

    η = H(S) / L̄

In order to achieve η close to one, it is sometimes necessary to encode blocks of source symbols instead of individual source symbols. In our example, a compact code for blocks of two source symbols results in an average length per original source symbol of 1.43 bits, which is reasonably close to the entropy.

Symbol   Probability   Word Length   Codeword
s1s1       0.3600           1        0
s1s2       0.1800           3        111
s1s3       0.0300           5        10100
s1s4       0.0300           5        10000
s2s1       0.1800           3        110
s2s2       0.0900           4        1011
s2s3       0.0150           6        100010
s2s4       0.0150           6        101010
s3s1       0.0300           5        10011
s3s2       0.0150           6        100011
s3s3       0.0025           9        101011001
s3s4       0.0025           9        101011000
s4s1       0.0300           5        10010
s4s2       0.0150           7        1010111
s4s3       0.0025           9        101011011
s4s4       0.0025           9        101011010
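The claimed 1.43 bits per original source symbol can be checked directly from the table:

```python
# (probability, codeword length) for the 16 two-symbol blocks above
table = [(0.3600, 1), (0.1800, 3), (0.0300, 5), (0.0300, 5),
         (0.1800, 3), (0.0900, 4), (0.0150, 6), (0.0150, 6),
         (0.0300, 5), (0.0150, 6), (0.0025, 9), (0.0025, 9),
         (0.0300, 5), (0.0150, 7), (0.0025, 9), (0.0025, 9)]

avg_per_pair = sum(p * l for p, l in table)
print(round(avg_per_pair / 2, 4))  # 1.4325 bits per original source symbol
```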
Decoding Binary Huffman Codes

• Sequential: By comparing the first received bit against the first bit of every entry in the table, the number of possible transmitted symbols is reduced. This process is sequentially repeated until the transmitted symbol is uniquely identified.
• Table-Lookup Procedure: A decoding table of size 2^lmax is constructed, where lmax is the maximum codeword length in the Huffman table. A block of lmax bits is parsed from the received sequence and is used to address the contents of the decoding table, which identifies the transmitted symbol and the corresponding codeword length. A decoding table for our example, where A → 0, B → 10, C → 110, and D → 111, would look like the following:

Address   Codeword Length   Decoded Symbol
000              1                A
001              1                A
010              1                A
011              1                A
100              2                B
101              2                B
110              3                C
111              3                D
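The table-lookup procedure is easy to sketch; the snippet below builds the 2^lmax-entry table for the A/B/C/D code above and uses it to decode a short stream:

```python
def build_table(code):
    """Build the 2**lmax lookup table described above: every lmax-bit
    address beginning with a codeword maps to (symbol, codeword length)."""
    lmax = max(len(w) for w in code.values())
    table = {}
    for sym, w in code.items():
        pad = lmax - len(w)
        for i in range(2 ** pad):                 # every padding of w
            suffix = format(i, 'b').zfill(pad) if pad else ''
            table[w + suffix] = (sym, len(w))
    return table, lmax

def decode(bits, table, lmax):
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + lmax].ljust(lmax, '0')  # pad at the tail
        sym, length = table[window]
        out.append(sym)
        pos += length                             # consume only the codeword
    return ''.join(out)

code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
table, lmax = build_table(code)
print(table['011'])                    # ('A', 1): only the leading bit matters
print(decode('0110111', table, lmax))  # 'ACD'
```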
Modified Huffman Code [2]

Frequently, some of the symbols in a symbol set that is being Huffman encoded are highly improbable. These symbols would take a disproportionately large share of the codeword memory, since the codeword length is proportional to −log2 p(si). It is advantageous to lump the less probable symbols into a symbol called ELSE (or ESCAPE) and design a Huffman code for the reduced symbol set including the ELSE symbol. Whenever a symbol belonging to the ELSE category needs to be encoded, the encoder transmits the codeword for ELSE followed by the actual message.

Example: Consider a source S with 3 symbols with probabilities p(s1) = 0.70 and p(s2) = p(s3) = 0.15.

    H(S) = −Σ(i=1..3) p(si) log2 p(si) = 1.18   bits/symbol

The Huffman coding procedure yields the codewords 0, 10, and 11, where the average wordlength is 1.30 bits/symbol. Coding this source in blocks of two symbols each results in a codebook with 9 entries, where the average codeword length is 1.20 bits per original source symbol.
Consider a modified Huffman code for encoding the symbol pairs which consists of only two entries: s1s1 with a codeword of 0, and "all the other pairs" with a codeword of 1 followed by three extra bits. The average length of this code is 1.26 bits/symbol, which is smaller than 1.30!

[2] Hankamer, IEEE Trans. Communications, Vol. COM-27, p. 930, 1979.
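The 1.26 bits/symbol figure follows from a two-line calculation (p(s1s1) = 0.7^2 = 0.49; every other pair costs 1 + 3 = 4 bits):

```python
p1 = 0.70                      # p(s1)
p_11 = p1 ** 2                 # probability of the pair s1s1
p_else = 1 - p_11              # "all the other pairs"

# 1 bit for s1s1; the 1-bit ELSE codeword plus 3 extra bits otherwise
avg_per_pair = p_11 * 1 + p_else * 4
print(round(avg_per_pair / 2, 3))  # 1.265, the 1.26 bits/symbol quoted
```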
Ideal Codelength

Entropy:

    H(S) = Σi pi log2(1 / pi)

Average codelength:

    L̄ = Σi pi Li

    L̄ ≥ H(S)

The equality occurs iff Li = log2(1 / pi).
The quantity log2(1 / pi) is called the ideal codelength of the symbol si.

Example: Consider a binary source for which p0 = 0.001 and p1 = 0.999. The ideal codelengths are L0 = 9.965 bits and L1 = 0.0014 bits, respectively.
Limitations of Huffman Coding

• Since the Huffman codeword lengths have to be integers, the Huffman code becomes inefficient if the symbol probabilities significantly deviate from negative powers of two.
• Given the minimum codeword length of one bit, Huffman coding becomes inefficient for low-entropy sources (e.g., a binary source with large skew).
• Performance can only be improved by encoding blocks of symbols, increasing the complexity and memory requirements.
• The code becomes inefficient (and can even result in data expansion) if a mismatch between the actual source statistics and the assumed code statistics exists (one-pass vs. two-pass systems).
• There is no adaptivity, i.e., performance suffers if there is a large variation of symbol statistics from one region of the data to another (unless dynamic Huffman coding is used).
Arithmetic Coding
• In arithmetic coding, an entire source sequence is mapped into a single codeword (unlike Huffman coding, where each codeword corresponds to a certain symbol).
• In theory, arithmetic coding can encode the source sequence at its ideal codelength. In practice, most implementations exceed the ideal codelength by about 6%.
• The symbols in a source sequence are input to the arithmetic coder one at a time, along with their probability estimates. For a given input, depending on the internal states of the arithmetic coder and the previous inputs, output bits may or may not be generated. In the long run, the output sequence is the codeword for the entire input sequence and has a length close to the ideal codelength of that sequence.

    Source symbol sequence + probability estimates → [Arithmetic Coder] → codeword sequence
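The interval-narrowing idea can be illustrated with a toy floating-point coder (a sketch only: the alphabet and probabilities below are invented for the example, and practical coders such as the QM-Coder use finite-precision integer registers and emit bits incrementally):

```python
# Toy arithmetic coder: each symbol narrows [low, high) in proportion to
# its probability.  Floats limit this to short sequences.
PROBS = {'a': 0.7, 'b': 0.2, 'c': 0.1}
SYMS = list(PROBS)

def encode(seq):
    low, high = 0.0, 1.0
    for s in seq:
        span = high - low
        cum = 0.0
        for t in SYMS:
            if t == s:
                low, high = low + span * cum, low + span * (cum + PROBS[t])
                break
            cum += PROBS[t]
    return (low + high) / 2      # any number inside the final interval

def decode(x, n):
    out, low, high = [], 0.0, 1.0
    for _ in range(n):
        span = high - low
        cum = 0.0
        for t in SYMS:           # find the subinterval containing x
            if low + span * (cum + PROBS[t]) > x:
                out.append(t)
                low, high = low + span * cum, low + span * (cum + PROBS[t])
                break
            cum += PROBS[t]
    return ''.join(out)

x = encode('aabac')
print(decode(x, 5))  # 'aabac'
```

Probable symbols barely shrink the interval (few output bits), while improbable ones shrink it sharply, which is how the code length approaches the ideal codelength of the sequence.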
Practical Implementations of Arithmetic Coding
In many practical implementations of arithmetic coding, such as the
IBM Q-Coder or the JBIG/JPEG standard QM-Coder, the following
features exist:
• Only binary decisions are encoded. A nonbinary symbol is converted into a series of binary symbols by using an appropriate decision tree.
• Finite-precision registers are used for internal arithmetic operations. Multiplications are approximated by adds and shifts.
• The probability of a particular symbol being 0 or 1 is selected from a finite set of values (i.e., is quantized).
• The probability of a particular symbol being 0 or 1 is estimated from the encoded sequence and is adaptively updated after the encoding of each binary symbol using a finite state machine.
• The adaptive probability estimation is part of the arithmetic coding process and, as such, has no effect on its complexity or processing speed.
Q-Coder and the QM-Coder
Two of the most popular adaptive binary arithmetic coders are the
Q-Coder invented by IBM, and the QM-Coder used in the JBIG and
JPEG international standards:
Q-Coder (IBM):
• 12-bit registers
• 29-state probability table
• Bit stuffing to resolve carry propagation
QM-Coder (IBM/AT&T/Mitsubishi):
• 16-bit registers (smallest probability estimate is 2^−15)
• 113-state fast-attack probability table (46 nontransient states)
• Uses conditional exchange, resulting in up to 0.50% better compression on some images
• Delayed output to resolve carry propagation
• Inexpensive chip set available, driven by the JBIG facsimile standard
LZW Compression Algorithm
• The LZW (Lempel-Ziv-Welch) algorithm maps variable-size strings of source symbols into fixed-length codewords. It is based on progressively building a dictionary which contains the symbol strings (of varying sizes) encountered in the message being compressed. The entries in the table reflect the statistics of the message. The LZW algorithm is particularly suitable for the encoding of one-dimensional data (such as computer source files) where certain patterns (strings of source symbols) occur frequently. In most applications, the size of the table is chosen to be 12 bits. The UNIX utility "compress" is a variation of LZW coding. The following describe the encoder and the decoder operations for the LZW algorithm.
• Encoder: The table (dictionary) is first initialized to contain strings of single source symbols. The input string is then examined one symbol at a time, and the longest input string for which an entry in the table exists is parsed off. The codeword for this string (a fixed-length code addressing the location of the recognized string in the table) is transmitted to the receiver. Each parsed input string is extended by its next input source symbol to form a new string, which is then added to the table. Each added string is identified by a new codeword, namely, its address in the table.
LZW Compression Algorithm
• Decoder: The decoder is initialized to the same string table as the encoder. Each received codeword is translated by way of the string table into a source string. Except for the first symbol, every time a codeword is received, an update to the string table is made in the following way: when a codeword has been translated, its first source symbol is appended to the prior string to add a new string to the table. In this way the decoder incrementally reconstructs the same table used at the encoder. A potential problem arises when an input source string contains the sequence KωKω, where K denotes a single source symbol, ω denotes a string of source symbols, and the string Kω already appears in the encoder string table. The compressor will parse out Kω, send the codeword for Kω, and add KωK to the table. Next, it will parse out KωK and send the just-generated codeword for KωK. However, the decoder, on receiving the codeword for KωK, will not yet have added that string to its table because it does not yet know an extension symbol for the prior string. In such cases, the decoder should continue decoding based on the fact that this codeword's string must be an extension of the prior string. Thus, the first symbol in the new string is the same as the first symbol of the prior string, which now also becomes the extension symbol of the prior string.
LZW Example

Input: ABAAAAAACAABAAAAACABAAAAB...

String table built by the encoder:

Index   String
0       A
1       B
2       C
3       AB
4       BA
5       AA
6       AAA
7       AAAC
8       CA
9       AAB
10      BAA
11      AAAA
12      AC
13      CAB
14      BAAA
15      ...

Parsed sequence:  A / B / A / AA / AAA / C / AA / BA / AAA / A / CA / BAA / AAB ...
Codewords sent:   0   1   0   5    6     2   5    4    6     0   8    10    ...
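The encoder walk-through above can be reproduced with a short sketch:

```python
def lzw_encode(data, alphabet):
    """LZW encoder as described above: parse off the longest string
    already in the table, emit its index, then add that string
    extended by the next input symbol."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    out, w = [], ''
    for ch in data:
        if w + ch in table:
            w += ch                      # keep extending the match
        else:
            out.append(table[w])         # emit the longest match
            table[w + ch] = len(table)   # new entry: match + next symbol
            w = ch
    if w:
        out.append(table[w])
    return out

codes = lzw_encode('ABAAAAAACAABAAAAACABAAAAB', 'ABC')
print(codes)  # [0, 1, 0, 5, 6, 2, 5, 4, 6, 0, 8, 10, 9]
```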
Rice Code

When the probability distribution of the symbols to be
encoded is exponential, near-optimal encoding can be performed by using a series of parameterized variable-length
codes known as Rice codes.

Consider encoding a source S with 2^n symbols, labeled
{0, 1, ..., 2^n − 1}, and arranged in a decreasing probability
order. In a Rice code with parameter k, the n − k most
significant bits are encoded as a unary number while the
least significant k bits are fixed-length coded. Depending
on the variance of the exponential distribution, a certain
value of the parameter k will result in the best
performance. Following is an example with n = 3:
Symbol   k = 0       k = 1    k = 2   k = 3
0        1           10       100     000
1        01          11       101     001
2        001         010      110     010
3        0001        011      111     011
4        00001       0010     0100    100
5        000001      0011     0101    101
6        0000001     00010    0110    110
7        00000001    00011    0111    111

43
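The encoding rule above can be sketched as follows. The convention that the unary part is a run of q zeros terminated by a one follows the table, and the k = n column degenerates to a plain fixed-length code:

```python
def rice_encode(v, k, n):
    """Rice code of an n-bit symbol v with parameter k (0 <= k <= n):
    the n-k most significant bits as a unary number (q zeros then a 1),
    the k least significant bits as a fixed-length field.  For k == n
    the code is a plain n-bit fixed-length code."""
    q, r = v >> k, v & ((1 << k) - 1)
    unary = "0" * q + "1" if k < n else ""
    lsb = format(r, "0{}b".format(k)) if k else ""
    return unary + lsb
```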
Pixel Representation in Different Resolutions
44
Resolution Reduction (RR)
The high-resolution image is divided into blocks of 2 × 2 pixels, and
each block is mapped into one pixel in the low-resolution image.
Although subsampling is the most straightforward method of resolution reduction (RR), it creates many artifacts with text and line
drawings (e.g., deletes thin lines), as well as with dithered images
(e.g., does not preserve average gray). As a result, the RR technique
adopted by the standard is a carefully designed method which uses
information from the neighboring high-resolution pixels in addition to
previous low-resolution pixels to determine the mapping. It is aimed
at preserving edges, lines, periodic patterns and dither patterns, thus
achieving high performance with both text and dithered images.
The concatenation of the color values of the numbered pixels ("0" for
white and "1" for black) is used as an address to the RR table, where
the pixel labeled "0" specifies the least significant bit, to determine
the color of the pixel marked with "?".

[Figure: the numbered high- and low-resolution pixels used to determine the color of a low-resolution pixel, arranged around the target pixel "?"]
3
JBIG Early Draft, WG9-S1R2, December 1990
45
Resolution Reduction Table4
4
JBIG Early Draft, WG9-S1R2, December 1990
46
Typical Prediction (TP)
A given low-resolution pixel is called “not typical” if that pixel and
all of its eight low-resolution nearest neighbors have the same color,
but one of the four high-resolution pixels associated with it differs
from this common color. A certain low-resolution line is “not typical”
if it contains any “not typical” pixels. For each low-resolution line,
the parameter LNTP determines if that line is typical. If LNTP = 1,
the line is “not typical”, and TP is disabled. If LNTP = 0, during the
process of coding that line, the coder would look for typical pixels.
In general, the TP block looks for regions of solid color (white or
black). Once it is determined that the pixel to be encoded belongs to
such a region, none of the processing normally performed by the
other blocks is required, thus resulting in a substantial gain in
processing speed. On typical text images, TP results in avoiding
the coding of over 95% of the pixels5. The savings are significantly
smaller for dithered images. Although the primary purpose of TP is to
increase implementation speed, it also provides some coding gain.
5
JBIG Early Draft, WG9-S1R2, December 1990
47
Deterministic Prediction (DP)
Consider a low-resolution image that has been generated from a high-resolution image according to a particular set of resolution reduction
(RR) rules. In the decoding of the high-resolution image, at times, a
certain pixel value may be inferable from the available low-resolution
pixels in addition to the high-resolution pixels already decoded. In
such a case, the pixel value can be deterministically predicted and
there is no need for it to be encoded. A trivial example of this is the
subsampling RR rules, where the high-resolution pixel at the top-left
of the 2 × 2 block can always be predicted as it has the same value as
the low-resolution pixel representing that block.
The values of the low-resolution and high-resolution neighboring pixels which are available to both the encoder and the decoder at the
time of the encoding of a certain pixel are used as an address into a
table to determine if that pixel is deterministically predictable. DP
tables are highly dependent on the RR rules employed. The JBIG
Standard makes provisions for an encoder to download DP tables to a
decoder if it is using a private RR algorithm.
The primary purpose of DP is to improve the coding efficiency.
Using the RR rules adopted by the JBIG Standard, DP provided about
a 7% coding gain (which is thought to be typical) on a particular set
of test images6.
6
JBIG Early Draft, WG9-S1R2, December 1990
48
Deterministic Prediction (DP)7

[Figure: the four possible spatial phases (0–3) for high-resolution pixels, and the labeling (0–12) of the pixels used by DP]

DP pixels for each spatial phase:

Phase   Target pixel   Reference pixels              Hits with default RR
0       8              0,1,2,3,4,5,6,7               20
1       9              0,1,2,3,4,5,6,7,8             108
2       11             0,1,2,3,4,5,6,7,8,9,10        526
3       12             0,1,2,3,4,5,6,7,8,9,10,11     1044

7
JBIG Early Draft, WG9-S1R2, December 1990
56
Model Templates for Arithmetic coding
Model templates define a neighborhood around the pixel which is
being coded. The values of the pixels within the model template, plus
the spatial phase of the high-resolution pixel in the differential layers,
specify a context for the arithmetic coder. Each pixel in the template
contributes one bit to the context. The arithmetic coder needs to
maintain a separate probability estimate for the LPS symbol within
each context.
For the lowest-resolution layer, the model template consists of 10
pixels (including the adaptive template whose position is allowed to
change). This results in 1024 different contexts.
For the differential layers, four different templates are used which
depend on the spatial phase of the pixel to be encoded. The templates
reference pixels from both the high- and low-resolution images.
There are ten pixels in each differential-layer template. This, along
with the two bits required to specify the spatial phase of the pixel,
gives rise to 4096 different possible contexts for the encoding of the
differential layers.
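The context formation described above can be sketched as follows. The exact bit ordering of the template pixels is fixed by the standard; the packing order used here is only an illustrative assumption:

```python
def differential_context(template_bits, phase):
    """Context for coding a differential-layer pixel: the 10 template
    pixels contribute one bit each and the spatial phase contributes
    two bits, giving 2**12 = 4096 possible contexts."""
    assert len(template_bits) == 10 and 0 <= phase <= 3
    ctx = 0
    for b in template_bits:
        ctx = (ctx << 1) | b       # one bit per template pixel
    return (ctx << 2) | phase      # two bits for the spatial phase
```

The arithmetic coder keeps one LPS probability estimate per context value.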
57
Model Template Configurations8

[Figure: model template for the lowest-resolution layer — a 10-pixel neighborhood (X) around the pixel to be coded (?), including the adaptive (A) pixel]

[Figure: model templates for differential-layer coding, one per spatial phase (0–3) — each combines high- and low-resolution neighbors (X) and an adaptive (A) pixel around the pixel to be coded (?)]

8
JBIG Early Draft, WG9-S1R2, December 1990
58
Adaptive Templates (AT)
The various templates used to form the contexts for the arithmetic
coder are allowed to change somewhat during the coding process. In
particular, the location of a designated pixel (called the AT pixel)
within the template is allowed to change according to certain rules to
provide a higher predictive value. Generally speaking, the function of
the AT is to find a horizontal periodicity in the image and change the
template in such a way that the pixel preceding the pixel being
encoded by exactly this periodicity is incorporated into the template.
The AT block provides substantial coding gain (by as much as a
factor of two)9 on images containing ordered dither.
The figure in the next page shows the possible locations for the AT
pixel in both the lowest-resolution layer and the differential layers.
The initial location of the AT pixel is denoted by “0”. At any time
there can only be one AT pixel in the model template and its location
should be chosen from either the location labeled “0”, or the locations
labeled by an integer in the range 3 to M, where M is a free
parameter. For simplicity, when encoding the differential layers, any
changes in the AT pixel location shall be effective simultaneously for
all four spatial phases.
9
JBIG Early Draft, WG9-S1R2, December 1990
59
Adaptive Templates (AT) Locations10

[Figure: possible locations for the AT pixel in the lowest-resolution-layer template — initial position "0" adjacent to the pixel "?", with alternative positions 3, 4, 5, ..., M]

[Figure: possible locations for the AT pixel in the differential-layer templates]

10
JBIG Early Draft, WG9-S1R2, December 1990
60
LOSSLESS IMAGE
COMPRESSION SCHEMES
61
Lossless (Bit-preserving) Encoding
• Lossless predictive coding (e.g. Differential Pulse Code
Modulation, DPCM).
• Bit-plane encoding.
• Lossy encoding followed by lossless coding of the
difference image.
62
Bit-plane Encoding

Consider an image of size n × m where each pixel has a depth of k
bits. This image may be represented by k bit-planes, each of size
n × m. Since each bit-plane is a binary image, binary compression
techniques can be used to separately encode each bit-plane.

[Figure: decomposition of a pixel value f(i, j) = 10011001 into bit-planes, from the most significant to the least significant bit-plane]
63
The Gray Code (GC)
Gray code has the property that if two pixel values differ by one, their
GC representations would differ by only one bit. This results in more
coherent bit planes compared with the binary representation and more
efficient compression. In fact, the coherence of each GC bit plane is
roughly equivalent to that of the next most significant binary bit
plane. A method of generating GC is

    GC = BC ⊕ (BC ≫ 1),

where ⊕ denotes the bit-wise exclusive-OR operation, and ≫ denotes the bit-wise logical right-shift operation.

Example:
127: (B) 01111111 → (GC) 01000000
128: (B) 10000000 → (GC) 11000000
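The forward mapping and its inverse can be sketched as:

```python
def to_gray(bc):
    """Convert a binary-code value to its Gray-code representation."""
    return bc ^ (bc >> 1)

def from_gray(gc):
    """Invert the Gray code by folding in successively shifted copies."""
    bc = 0
    while gc:
        bc ^= gc
        gc >>= 1
    return bc
```

For any two values differing by one, the Gray-code representations differ in exactly one bit, which is what makes the GC bit-planes more coherent.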
[Figure: bit-planes 3–0 (MSB to LSB) of the binary-code and Gray-code representations of the values 0–15]
64
Gray Code and Binary Code Representation
of Bit-Planes 7 (MSB) and 6
65
Gray Code and Binary Code Representation
66
Bit-Plane Entropy Table

LENA
Bit-Plane     BC Entropy      GC Entropy      GC Q-Coder
              7-pel context   7-pel context   7-pel context
              bits/pixel      bits/pixel      bits/pixel
BP 7 (MSB)    0.142           0.141           0.144
BP 6          0.273           0.145           0.150
BP 5          0.436           0.227           0.235
BP 4          0.707           0.473           0.490
BP 3          0.918           0.672           0.698
BP 2          0.997           0.918           0.957
BP 1          0.999           0.996           1.039
BP 0 (LSB)    0.999           0.999           1.042
Total         5.471           4.571           4.755

BOOTS
Bit-Plane     BC Entropy      GC Entropy      GC Q-Coder
              7-pel context   7-pel context   7-pel context
              bits/pixel      bits/pixel      bits/pixel
BP 7 (MSB)    0.182           0.182           0.186
BP 6          0.360           0.222           0.228
BP 5          0.590           0.442           0.454
BP 4          0.771           0.614           0.631
BP 3          0.901           0.783           0.808
BP 2          0.948           0.907           0.938
BP 1          0.953           0.947           0.981
BP 0 (LSB)    0.956           0.952           0.987
Total         5.661           5.049           5.213
• Entropies are for 2-D, 7-pel Markov contexts.
• Q-Coder is the adaptive binary arithmetic coder developed by
IBM.
67
Differential Pulse Code Modulation (DPCM)

[Figure: prediction neighborhood — pixels B, C on the previous line and A, Xn, D on the current line]

• The N pixels prior to Xn are used to form a linear prediction
(estimate) of Xn, denoted by X̂n:

    X̂n = Σ_{i=0}^{N−1} αi Xi,

where X̂n is represented by the same number of bits as Xn and is
truncated to the same range (e.g., 0 to 255 for an 8-bit image).

• The error (differential) signal, e, is formed as

    e = Xn − X̂n.

The differential signal typically has a largely reduced variance
compared to Xn, is significantly less correlated than Xn, and has a
stable histogram well approximated by a Laplacian (double-sided
exponential) distribution. A global Huffman code table may be
designed for its encoding which would result in good performance
for a wide variety of images.
68
69
Huffman Coding of Differential Signal
Typically the number of entries in the Huffman table is large (e.g.,
~512). This number may be substantially reduced with only a slight
decrease in coding efficiency. All the symbols which represent large
differences (and are thus less probable) are combined into one symbol,
and the Huffman code associated with it is used as a prefix to the
actual binary representation of the pixel value to be encoded.

Modified Huffman Table for LENA

Difference value   Histogram   Bits   Code
−16                623         9      110111000
−15                729         8      00110010
 ...               ...         ...    ...
−7                 5376        6      110110
−6                 7183        5      01111
−5                 9893        5      11010
−4                 13361       4      0110
−3                 16813       4      1001
−2                 21176       4      1110
−1                 24830       3      000
0                  25931       3      010
1                  24366       4      1111
2                  20556       4      1100
3                  16639       4      1000
4                  12578       4      0010
5                  9387        5      10110
6                  7134        5      01110
7                  5190        6      101111
 ...               ...         ...    ...
15                 792         8      00110011
16                 717         9      110111001
Mag > 16           9347        5      10101
• Entropy of the difference signal = 4.56 bits/pixel.
• Average length of Huffman code (512 entries) = 4.60 bits/pixel
• Average length of modified Huffman (34 entries) = 4.78 bits/pixel
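The escape mechanism of the modified table can be sketched as follows. The 8-bit two's-complement field for out-of-range differences is an illustrative assumption (the slides only say the escape code prefixes the actual binary representation):

```python
def encode_difference(d, table, escape_code, bits=8):
    """Modified Huffman coding of a prediction difference: in-range
    differences use their own Huffman code; all large-magnitude
    differences share one escape symbol whose code prefixes the
    raw value."""
    if d in table:
        return table[d]
    # Escape: Huffman prefix + fixed-length two's-complement field
    # (the field width/format is an illustrative assumption).
    return escape_code + format(d & ((1 << bits) - 1), "0{}b".format(bits))
```

With the codes from the table above (e.g. "010" for 0 and "10101" for Mag > 16), small differences cost 3–6 bits while rare large ones cost 5 + 8 bits.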
70
Lossless DPCM Results

Description                               LENA    BOOTS
Zeroth-order image entropy                7.45    7.49
Zeroth-order differential entropy         4.56    5.23
Original image standard deviation         47.94   59.55
Differential image SD                     6.94    13.38
Reduction in variance                     48      20

Huffman Coding
Local Huffman Code (LHC)                  4.60    5.26
Modified LHC (−64,64)                     4.60    5.27
Modified LHC (−32,32)                     4.62    5.35
Modified LHC (−16,16)                     4.69    5.51
Modified LHC (−8,8)                       4.96    5.76
Global Huffman Code (GHC)                 4.67    5.34
Modified GHC (−64,64)                     4.67    5.35
Modified GHC (−32,32)                     4.67    5.43
Modified GHC (−16,16)                     4.72    5.61
Modified GHC (−8,8)                       5.09    5.81

Arithmetic Coding
Context-based differential
entropy (45 contexts)                     4.45    4.94
IBM Q-coder bit rate                      4.60    4.98

71
QUANTIZATION STRATEGIES
72
Quantization
A quantizer is essentially a staircase function that maps the
possible input values into a smaller number of output levels. For the
purposes of image compression, quantization reduces the number of
symbols that need to be encoded at the expense of introducing
errors in the reconstructed image, i.e., it acts as a control knob
that trades off image quality for bit rate. The individual quantization
of each signal value is called scalar quantization (SQ), while the
joint quantization of a block of signal values is called vector
quantization (VQ).
• Scalar quantization (SQ)
− Uniform quantizer
− Lloyd-Max or MMSE quantizer
− Entropy-constrained quantizer
• Trellis-coded quantization (TCQ)
• Vector quantization (VQ)
73
Uniform Scalar Quantizer

An N-level scalar quantizer maps the signal x into a discrete variable
x* that belongs to a finite set {ri, i = 0, ..., N − 1} of real numbers
referred to as reconstruction levels or quantizer output levels. The
range of values of x that map to a particular x* is defined by a set of
points {di, i = 0, ..., N}, referred to as decision levels. The
quantization rule states that if x lies in the interval (di, di+1], it is
mapped or quantized to ri, which also lies in the same interval.

The simplest form of quantizer is the uniform scalar quantizer, where
the decision intervals are all of equal length, referred to as the
quantizer step size.

[Figure: staircase quantizer characteristic — decision levels on the input axis, reconstruction levels on the output axis, separated by the step size]
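A midtread uniform quantizer with a given step size can be sketched as:

```python
def uniform_quantize(x, step):
    """Midtread uniform quantizer: decision intervals all of equal
    length `step`; each input is mapped to the reconstruction level
    at the centre of its interval."""
    index = round(x / step)       # integer symbol to be encoded
    return index, index * step    # reconstruction level x*
```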
74
Optimum MMSE Nonuniform Quantizer
Consider a signal x with a probability distribution
function (pdf) p(x). In some applications, it is desirable
to find the quantizer that minimizes the quantization
distortion for a given number, N, of quantizer output
levels. For a minimum-mean-square error (MMSE)
distortion criterion, this quantizer is known as the
Lloyd-Max quantizer.
[Figure: a pdf p(x) with decision levels di, di+1 and reconstruction level ri]
75
Lloyd-Max Quantizer

For a given number of quantizer output levels (reconstruction
levels) N, and a signal pdf p(x), the optimum MMSE quantizer has
reconstruction levels ri and decision levels di that minimize

    є = Σ_{i=1}^{N} ∫_{di}^{di+1} (x − ri)² p(x) dx.

The solution to the above minimization problem is

    dj = (rj−1 + rj) / 2,
    rj = ∫_{dj}^{dj+1} x p(x) dx / ∫_{dj}^{dj+1} p(x) dx.

Each decision level is half-way between the two neighboring reconstruction levels, and each reconstruction level is at the centroid of the portion of the pdf enclosed by the decision region, i.e., the
conditional mean of the input signal in the given interval.

Lloyd iterative design algorithm
An initial set of values for {di} is chosen; the optimum {ri} for
the given set of {di} are solved for; the optimum {di} for the
calculated {ri} are then determined; the iterations proceed
until satisfactory convergence is achieved.

Max, IRE Trans. Info. Theory, Vol IT-6, pp. 7, 1960.
76
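The Lloyd iteration can be sketched on a training set of samples, using sample means in place of the pdf centroids:

```python
import numpy as np

def lloyd_max(samples, n_levels, iters=100):
    """Lloyd iterative design on a training set: alternately place the
    decision levels half-way between neighbouring reconstruction levels
    and move each reconstruction level to the centroid (sample mean) of
    its decision region."""
    x = np.sort(np.asarray(samples, dtype=float))
    r = np.linspace(x[0], x[-1], n_levels)     # initial {r_i}
    for _ in range(iters):
        d = (r[:-1] + r[1:]) / 2               # optimum {d_i} for {r_i}
        idx = np.searchsorted(d, x)            # region of each sample
        for j in range(n_levels):              # optimum {r_i} for {d_i}
            if np.any(idx == j):
                r[j] = x[idx == j].mean()
    return r, d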
Entropy-Constrained Quantizer

The Lloyd-Max quantizer aims at minimizing the MSE for a given
number N of output quantization levels, and is useful when the output levels are coded with fixed-length codes. When variable-length coding is permitted, the final bit rate is determined by the
entropy of the quantizer levels rather than by the number of levels.
For example, a uniform quantizer may have a lower entropy, despite
a larger number of output levels, compared to a Lloyd-Max quantizer
operating with the same distortion. This leads to another criterion
for quantizer design known as entropy-constrained quantization,
where the quantization error is minimized subject to the constraint
that the entropy of the quantizer output levels has a prescribed
value.
77
Entropy-Constrained Quantizer (ECQ)

• It has been theoretically established that for memoryless
sources and at high output bit rates, the optimal entropy-constrained scalar quantizer is a uniform quantizer. With mild
restrictions, these results are applicable to all pdfs and
distortion measures of practical interest at high bit rates.

• For the MSE distortion, the output bit rate of the optimal ECQ
is within 0.255 bits/sample of the theoretical rate-distortion
bound, achievable by VQ only in the limit of very large block
sizes.

• It has been experimentally shown that even at low bit rates, for
the MSE distortion and for a variety of memoryless sources,
the performance of the uniform quantizer is virtually
indistinguishable from that of the optimal ECQ, and
interestingly, it is even closer to the rate-distortion bound than
0.255 bits, especially for more peaked pdfs such as Laplacian
and gamma distributions.

• Based on the above results, a uniform scalar quantizer
followed by entropy coding should be considered when i) the
source is memoryless; ii) the bit rate is moderate to large; and
iii) variable-length coding of the quantizer levels is permitted.

Makhoul et al., Proc. IEEE, Vol 73, pp. 1551, 1985.
78
Vector Quantization (VQ)

In VQ, an n-dimensional input vector X = [x1, x2, ..., xn], whose
components represent the discrete or continuous signal values, is
mapped or quantized into one of Nc possible reconstruction
vectors, Yi, i = 1, 2, ..., Nc. The distortion in approximating
(quantizing) X with Yi is denoted by d(X, Yi) and for a MSE
distortion measure is defined as

    dMSE(X, Y) = (1/n) Σ_{i=1}^{n} (xi − yi)².

The set Y is referred to as the reconstruction codebook and its
members are called codevectors or templates. The codebook design
problem is to find the optimal codebook for a given input
statistics, distortion measure, and codebook size Nc. Given the
codebook, the quantization process (the determination of the
decision regions) is straightforward and is based on the minimum-distortion rule: an input vector that needs to be quantized is
compared to all the entries in the codebook and is quantized to
the codevector that results in the smallest distortion.

An important theoretical result states that as n → ∞, the quantizer
output distortion can get arbitrarily close to the rate-distortion
bound. Furthermore, for a given codebook size Nc and for very
large n, the quantizer output entropy approaches log2 Nc, i.e., the
need for entropy coding is eliminated.

Gray, IEEE ASSP Magazine, pp. 4, April 1984
79
VQ Example
80
Codebook Generation (LBG Algorithm)

1. Initialization: Given Nc = codebook size, є = distortion
threshold, n = vector dimension, and A0 = {Yk, k = 1, ..., Nc} as
an initial codebook, set m = 0 and D−1 = ∞.

2. Given the codebook Am, find its minimum distortion partition
P(Am) = {Si, i = 1, ..., Nc} such that X ∈ Si if d(X, Yi) ≤ d(X, Yj)
for all j.

3. Compute the resulting average distortion Dm; if
(Dm−1 − Dm)/Dm ≤ є, stop.

4. Find the optimum codebook Am+1 = {Yk, k = 1, ..., Nc} by
calculating the centroids of the partition P(Am); set m ← m + 1;
go to step 2.

Linde et al., IEEE Trans. Communications, Vol COM-28, pp. 84, 1980.
81
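The four steps above can be sketched as follows; initializing A0 with the first Nc training vectors is an arbitrary choice made here for illustration (the algorithm allows any initial codebook):

```python
import numpy as np

def lbg(training, Nc, eps=1e-3):
    """LBG codebook design sketch on a training set."""
    X = np.asarray(training, dtype=float)       # shape (num_vectors, n)
    A = X[:Nc].copy()                           # step 1: initial codebook A0
    D_prev = np.inf
    while True:
        # Step 2: minimum-distortion partition of the training set.
        d2 = ((X[:, None, :] - A[None, :, :]) ** 2).mean(axis=2)
        nearest = d2.argmin(axis=1)
        # Step 3: average distortion; stop when the relative improvement
        # (D_{m-1} - D_m) / D_m falls to the threshold eps.
        D = d2[np.arange(len(X)), nearest].mean()
        if D == 0 or (D_prev - D) / D <= eps:
            return A
        D_prev = D
        # Step 4: replace each codevector by the centroid of its partition.
        for k in range(Nc):
            if np.any(nearest == k):
                A[k] = X[nearest == k].mean(axis=0)
```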
Tree-Search VQ

• In finding the minimum distortion codevector, a full search of
the codebook can be performed at a computational cost of
O(n·Nc). The storage cost is n·Nc.

• If a tree structure is imposed on the codebook, where each
node has m branches with p = logm Nc levels to the tree, the
computational cost reduces to O(n·m·logm Nc), but the storage
cost increases to n·m·(Nc − 1)/(m − 1).
The codebook is designed by using successive applications of
the LBG algorithm at each node for a codebook of size m.

• Example: A binary tree (m = 2), for Nc = 8:
Gray, IEEE ASSP Magazine ,pp. 4, April 1984.
82
Trellis-Coded Quantization (TCQ)

This technique combines linear error-correcting codes and low-complexity scalar quantizers to achieve a performance similar to
VQ, but with significantly less complexity. TCQ can effectively
close the 0.255-bits/sample gap that exists between the best scalar
and vector quantizers.
Super Q
83
LOSSY IMAGE
COMPRESSION SCHEMES
84
Lossy Compression Schemes
• Predictive Coding (DPCM, Photo CD)
• Transform Coding (DCT and the JPEG international standard)
• Subband and Wavelet Coding
• Fractal Compression
• Vector Quantization Image Compression
• Progressive Transmission
85
DIFFERENTIAL PULSE CODE
MODULATION (DPCM)
86
Differential Pulse Code Modulation (DPCM)

[Figure: prediction neighborhood — pixels B, C on the previous line and A, Xn, D on the current line]

• The N pixels prior to XN are used to form a linear prediction
(estimate) of XN, denoted by X̂N:

    X̂N = Σ_{i=0}^{N−1} ai Xi.

• The error (differential) signal, e, is formed as

    e = XN − X̂N.

• The differential signal, e, is quantized and encoded.
The design of a DPCM system consists of two parts:
1. Predictor Optimization (Finding the optimum set of ai.)
2. Quantizer Optimization.
87
DPCM Example

Predictor: X̂ = 0.75A − 0.5B + 0.75C
Original image variance = 2300,
Differential image variance = 48,
Reduction in variance = 2300/48 ≈ 48.

i   (di, di+1) → ri     Probability   Huffman code
0   (−255,−16) → −20    0.025         111111
1   (−16,−8) → −11      0.047         11110
2   (−8,−4) → −6        0.145         110
3   (−4,0) → −2         0.278         00
4   (0,4) → 2           0.283         10
5   (4,8) → 6           0.151         01
6   (8,16) → 11         0.049         1110
7   (16,255) → 20       0.022         111110

• Fixed-length binary code average bit rate = 3 bits/symbol
• Huffman code average bit rate = 2.57 bits/symbol
• Entropy H(S) = 2.52 bits/symbol

88
DPCM Example

Predictor: X̂ = 0.5A + 0.5C

[Original and reconstructed image blocks: the original contains the pixels X1 = 120 and X2 = 45 among neighbors 100, 105, 107, 110, 115; the reconstructed block holds X1R = 121 and X2R = 94.]

Encoding pixel X1:
Prediction: X̂1 = (105 + 115)/2 = 110,
Error signal: e1 = X1 − X̂1 = 120 − 110 = 10,
Quantized error signal: e1* = 11,
Reconstructed pixel value: X1R = X̂1 + e1* = 110 + 11 = 121.

Encoding pixel X2:
Prediction: X̂2 = (121 + 107)/2 = 114,
Error signal: e2 = X2 − X̂2 = 45 − 114 = −69,
Quantized error signal: e2* = −20,
Reconstructed pixel value: X2R = X̂2 + e2* = 114 − 20 = 94.
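The encode loop of this example can be sketched as follows; the quantizer q below is a stand-in that snaps to the reconstruction levels of the table on the previous slide:

```python
def dpcm_encode_pixel(x, a, c, quantize):
    """One DPCM step with the predictor X^ = 0.5A + 0.5C, where a and c
    are previously *reconstructed* neighbours (left and above).  Returns
    the quantized differential (the coded symbol) and the reconstructed
    value that the decoder will also compute."""
    pred = (a + c) / 2
    e = x - pred                 # differential signal
    eq = quantize(e)             # quantized differential
    return eq, pred + eq         # (symbol, reconstruction)

def q(e):
    # Stand-in quantizer: nearest of the 8 reconstruction levels
    # from the quantizer table of the previous slide.
    levels = [-20, -11, -6, -2, 2, 6, 11, 20]
    return min(levels, key=lambda r: abs(r - e))
```

Feeding X1 = 120 and then X2 = 45 (using X1's reconstruction as a neighbour) reproduces the worked numbers above, including the large error on X2 caused by the coarse outer quantizer level.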
89
Predictor Optimization
Habibi, IEEE Trans. Communications Technology, Vol COM-19, pp. 948, 1971.
90
Predictor Optimization
[Figure: prediction neighborhood — B, C, D on the previous line, A to the left of X]

• The stationary assumption may not hold for some images. Further, optimizing the predictor coefficients for each image may
not be feasible for many real-time applications. Instead, a set
of typical predictor values can be used which result in good
performance for most images. Examples are:

    X̂ = A, or B, or C           1st-order, 1-D predictor,
    X̂ = ½A + ½C                 2nd-order, 2-D predictor,
    X̂ = 0.75A − 0.5B + 0.75C    3rd-order, 2-D predictor,
    X̂ = A − B + C               plane predictor.
• Two-dimensional prediction usually results in a large
performance improvement compared to 1-D prediction.
Although the improvement in signal-to-noise ratio (SNR) is
only around 3 dB, the subjective quality improvement of the
encoded images is usually greater. This is mainly due to the
elimination of the jaggedness around non-horizontal edges.
Two-dimensional prediction, however, requires the buffering
of the previous line.
91
DPCM Artifacts
Netravali, Proceedings of IEEE, Vol 68, pp. 366, 1980
92
Adaptive Image Compression
A compression scheme is adaptive if the structure of the encoder
or the parameters used in encoding vary locally within an image.
1. Causal Adaptivity: Adaptivity at the encoder is based
only on the previously reconstructed values and is fully
repeatable at the decoder. The main advantage is that
encoding does not require overhead bits. The disadvantages
are: (a) the encoder would fail to adapt to abrupt changes
in the input statistics which cannot be inferred from previously
reconstructed values; (b) the decoder should be as complex
as the encoder since it should repeat the same decision
making process.
2. Noncausal Adaptivity: Adaptivity at the encoder is based
on previously reconstructed values as well as the future input
values. Since the latter are not available at the decoder, the
encoder should provide additional overhead bits to inform
the decoder of the type of adaptivity selected. Systems of
this kind usually display superior performance and require
less decoder complexity but at the expense of extra overhead
bits.
93
Adaptive Prediction in DPCM

[Figure: prediction neighborhood — B, C on the previous line and A, X, D on the current line]
• Determine contour lines and use only the pels along contours
to form the prediction.
• Scale the prediction in high contrast areas based on previous
reconstructions.
• Median Predictor: This predictor, which is used in the new
JPEG lossless proposal, first performs a simple test to detect
horizontal or vertical edges. If an edge is not detected, then the
predicted value is A + C − B, as this would be the value of the
missing pixel if a plane is passed through the A, B, and C
sample locations. If a vertical edge is detected, C is used for
prediction, while for a horizontal edge, A is used.
X̂ =  min(A, C)    if B ≥ max(A, C)
     max(A, C)    if B ≤ min(A, C)
     A + C − B    otherwise
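The median predictor can be sketched directly from the three cases above (with A = left, B = upper-left, C = above, as in the neighborhood figure):

```python
def med_predict(a, b, c):
    """Median/edge-detecting predictor of the JPEG lossless proposal:
    a = A (left), b = B (upper-left), c = C (above)."""
    if b >= max(a, c):
        return min(a, c)    # edge detected: predict with the smaller neighbour
    if b <= min(a, c):
        return max(a, c)    # edge the other way: predict with the larger
    return a + c - b        # otherwise: plane through A, B, and C
```

When B is large, A and C sit on the dark side of an edge and the smaller of the two is the safer prediction; otherwise the plane value A + C − B is used.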
94
Adaptive Quantization
• Scale the quantizer according to the quantized value of the previous differential signal.
• Block the scan line and for each block switch among a finite
number of quantizers based on the block variance.
• Choose among a finite number of quantizers for each image
block according to the RMS error resulting at the encoder12:
— Partition each scan line into blocks of N pixels.
— Encode the block using each of M quantizers.
— Measure the distortion resulting from each quantizer.
— Select the quantizer with minimum distortion.
— Transmit overhead information to identify the quantizer to the receiver.
— Transmit the encoded signal for the block.

12
Habibi, IEEE Trans. Communications, Vol COM-26, pp. 1671, 1978.
95
Block Diagram Of Switched Quantizer
96
DPCM Bitrate/Error Table (LENA)

Technique     Bitrate      Entropy      RMSE      SNR
              bits/pixel   bits/pixel   (0–255)   (dB)
1-D DPCM      1.00         1.00         18.67     22.71
1-D DPCM      2.00         1.65         9.44      28.63
1-D DPCM      3.00         2.39         5.11      33.96
2-D DPCM      1.00         1.00         14.58     24.86
2-D DPCM      2.00         1.72         6.93      31.32
2-D DPCM      3.00         2.52         3.71      36.74
1-D ADPCM     1.20         1.17         10.91     27.37
1-D ADPCM     2.20         1.92         4.37      35.32
1-D ADPCM     3.20         2.78         2.16      41.44
2-D ADPCM     1.20         1.18         7.84      30.24
2-D ADPCM     2.20         2.03         2.87      38.97
2-D ADPCM     3.20         2.91         1.37      45.40

• Predictor for 1-D DPCM and ADPCM is X̂ = .97A.
Predictor for 2-D DPCM and ADPCM is X̂ = .75A − .50B + .75C.
Quantizers are Lloyd-Max for a Laplacian distribution with the
same variance as the differential signal. The frequency of
usage of each quantizer level is used as its probability to
calculate the entropy.
• The ADPCM scheme switches among 4 quantizers based on a
minimum mean-square-error (MMSE) criterion on a block size
of 10 pixels. The resulting overhead is .2 bits/pixel. Quantizers
are scaled versions of the Max quantizer for the global
differential signal with scaling factors of .50, 1.00, 1.75, and
2.50.
• SNR is defined as 10 log10(255²/mse) dB.
97
DPCM Bitrate/Error Table (BOOTS)

Technique     Bitrate      Entropy      RMSE      SNR
              bits/pixel   bits/pixel   (0–255)   (dB)
1-D DPCM      1.00         1.00         22.79     20.98
1-D DPCM      2.00         1.71         10.96     27.33
1-D DPCM      3.00         2.41         5.33      33.60
2-D DPCM      1.00         1.00         15.91     24.10
2-D DPCM      2.00         1.69         7.70      30.40
2-D DPCM      3.00         2.41         4.01      36.07
1-D ADPCM     1.20         1.19         14.12     25.13
1-D ADPCM     2.20         1.89         5.97      32.61
1-D ADPCM     3.20         2.74         3.16      38.14
2-D ADPCM     1.20         1.18         10.20     27.96
2-D ADPCM     2.20         1.89         4.75      34.60
2-D ADPCM     3.20         2.74         2.56      39.97

• Predictor for 1-D DPCM and ADPCM is X̂ = .97A.
Predictor for 2-D DPCM and ADPCM is X̂ = .75A − .50B + .75C.
Quantizers are Lloyd-Max for a Laplacian distribution with the same
variance as the differential signal. The frequency of usage of each
quantizer level is used as its probability to calculate the entropy.
• The ADPCM scheme switches among 4 quantizers based on a
minimum mean-square-error (MMSE) criterion on a block size of 10
pixels. The resulting overhead is .2 bits/pixel. Quantizers are scaled
versions of the Max quantizer for the global differential signal with scaling
factors of .50, 1.00, 1.75, and 2.50.
• SNR is defined as 10 log10(255²/mse) dB.
98
TRANSFORM CODING
AND THE
JPEG INTERNATIONAL
STANDARD
99
Transform Coding
• The original image is first divided into a number of smaller
subimages or blocks which usually are processed independently
of one another. A reversible transformation is performed on each
block to reduce the correlation between pixels. The transform
coefficients are then quantized and coded.
• Example: Consider a simple 1-D transform encoder operating on
1 × 2 subimages. The vector x = (x1, x2) represents two adjacent
picture elements and y = (y1, y2) denotes the transform, which in
this case is a 45° rotation of the axes. A scatter plot of the
values of x1 and x2 demonstrates that large values of x1 − x2 are
unlikely. Although σ²x1 = σ²x2, σ²y2 ≪ σ²y1.
The transformation only redistributes the energy (variance)
while keeping the total energy unchanged.

100
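The energy redistribution can be demonstrated numerically with synthetic correlated pixel pairs; the rotation matrix below is the orthonormal sum/difference form of the 45° rotation:

```python
import numpy as np

# Orthonormal 45-degree rotation: y1 = (x1 + x2)/sqrt(2), y2 = (x1 - x2)/sqrt(2)
R = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 10.0, 10000)
x2 = x1 + rng.normal(0.0, 1.0, 10000)   # strongly correlated neighbour

X = np.vstack([x1, x2])
Y = R @ X

var_x = X.var(axis=1)   # variances of x1 and x2 (nearly equal)
var_y = Y.var(axis=1)   # almost all of the energy moves into y1
```

The sum of the variances is the same before and after the rotation, but nearly all of it is packed into y1, which is exactly what makes the difference component cheap to code.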
Image Transforms13
• Karhunen-Loève Transform (KLT): It is the optimum transform in an energy-packing sense, i.e., if a mean-square (energy)
criterion is employed. However, it is image-dependent, requires
estimation of the image covariance matrix, and it cannot be
implemented by a fast algorithm (it takes O(N²) operations to
transform 1-D, N-point data).
• Discrete Fourier Transform (DFT): Well-known for its connection to spectral analysis and filtering. Extensive study done on
its fast implementation (O(N log2 N) for N-point DFT). Has the
disadvantage of storage and manipulation of complex quantities
and creation of spurious spectral components due to the assumed
periodicity of image blocks.
•
Discrete Cosine Transform (DCT): For conventional image
data having reasonably high inter-element correlation, its performance is virtually indistinguishable from KLT. Avoids the
generation of the spurious spectral components which is a
problem with DFT and has a fast implementation which avoids
complex algebra.
•
Walsh-Hadamard Transform (WHT): Although far from
optimum in an energy-packing sense for typical imagery, its
simple implementation (basis functions are either −1 or +1) has
made it widely popular.
•
Haar Transform (HT): Not very popular due to its local sensitivity to image detail.
13
Pratt, Digital Image Processing, Wiley, 1978
101
Discrete Cosine Transform (DCT)

    F(u, v) = (4 C(u) C(v) / n²) Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} f(j, k) cos[(2j + 1)uπ / 2n] cos[(2k + 1)vπ / 2n]

    f(j, k) = Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} C(u) C(v) F(u, v) cos[(2j + 1)uπ / 2n] cos[(2k + 1)vπ / 2n]

    C(w) = 1/√2   for w = 0
           1      for w = 1, 2, ..., n − 1
102
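The transform pair above can be implemented directly; with C(0) = 1/√2, the inverse formula reproduces the input exactly:

```python
import numpy as np

def dct2(f):
    """Forward 2-D DCT of an n x n block with the slide's normalization:
    F = (4 C(u) C(v) / n^2) * sum_j sum_k f(j,k) cos(...) cos(...)."""
    n = f.shape[0]
    i = np.arange(n)
    # Basis matrix B[u, j] = cos((2j + 1) u pi / (2n))
    B = np.cos((2 * i[None, :] + 1) * i[:, None] * np.pi / (2 * n))
    c = np.where(i == 0, 1 / np.sqrt(2), 1.0)
    return (4 / n**2) * np.outer(c, c) * (B @ f @ B.T)

def idct2(F):
    """Inverse: f(j,k) = sum_u sum_v C(u) C(v) F(u,v) cos(...) cos(...)."""
    n = F.shape[0]
    i = np.arange(n)
    B = np.cos((2 * i[None, :] + 1) * i[:, None] * np.pi / (2 * n))
    c = np.where(i == 0, 1 / np.sqrt(2), 1.0)
    return B.T @ (np.outer(c, c) * F) @ B
```

With this normalization the (0, 0) coefficient is proportional to the block mean, which is the DC coefficient used by the JPEG baseline algorithm described below.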
The JPEG Standard “Toolkit”
The JPEG standard provides a “toolkit” of techniques for the
compression of continuous-tone, still, color and monochrome
images from which applications can select the elements that
satisfy their particular requirements:
• The baseline system provides a simple and efficient DCTbased algorithm adequate for most image compression
applications. It uses Huffman coding, operates only in
sequential mode, and is restricted to 8 bits/sample source
image precision.
• The extended system enhances the baseline system to satisfy
a broader range of applications. These optional features
include:
— 12-bit/sample input
— Sequential and/or hierarchical progressive build-up
— Arithmetic coding
— Adaptive quantization
— Tiling
— Still picture interchange file format (SPIFF)
— Selective refinement
• A simple DPCM-based lossless method independent of the DCT
(using either Huffman or arithmetic coding) to accommodate
applications requiring lossless compression.
The JPEG Baseline Algorithm
• The original image is partitioned into 8 × 8 pixel blocks. In order to facilitate
the implementation of the DCT, for a b-bit image, the value 2^(b−1) (e.g., 128 for
an 8-bit image) is subtracted from each pixel value. Each block is
independently transformed using the forward DCT.
• All the transform coefficients are normalized (weighted) according to a user-defined normalization array that is fixed for all blocks. The objective is to
weight each coefficient according to its perceptual importance. The human
visual system (HVS) contrast sensitivity function can be used to determine this
array. Each component of the normalization array is an 8-bit integer and is
passed to the receiver as part of the header information. Up to four
normalization arrays can be specified (e.g., different normalization arrays may
be used for the luminance and chrominance components of a color image).
The bit rate of an encoded image can be conveniently varied by changing this
array, e.g., by scaling the array by a constant factor. The normalized
coefficients are then uniformly quantized by rounding to the nearest integer.
The normalization array effectively scales the quantizer.
• The top-left coefficient in the 2-D DCT array is referred to as the DC
coefficient and is proportional to the average brightness of the spatial block.
The difference between the quantized DC coefficient of the current block and
the quantized DC coefficient of the previous block is variable-length coded.
This difference is first assigned to one of twelve categories, where the
values in category k lie in the range (2^(k−1), …, 2^k − 1) or (−2^k + 1, …, −2^(k−1)),
with 0 ≤ k ≤ 11. A set of Huffman codes with a maximum codeword length of 16 bits is
used to specify the different categories. For each category, it is necessary to
send an additional k bits to completely specify the sign and magnitude of a
difference value within that category. For the baseline system, up to two
separate DC Huffman tables can be specified in the header information.
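As an illustration (a sketch of the category convention described above, not code from the standard), the category and the k additional magnitude bits of a DC difference can be computed as follows; the one's-complement handling of negative values is the usual JPEG convention:

```python
def dc_category(diff):
    """Category k of a DC difference: the number of bits needed for
    |diff|, so category k covers 2**(k-1) <= |diff| <= 2**k - 1
    (category 0 is reserved for diff == 0)."""
    return abs(diff).bit_length()

def dc_extra_bits(diff):
    """The k additional bits that follow the category's Huffman code:
    the binary magnitude for positive differences and its one's
    complement for negative differences."""
    k = abs(diff).bit_length()
    if k == 0:
        return ""                      # category 0 sends no extra bits
    v = diff if diff >= 0 else diff + (1 << k) - 1
    return format(v, "0%db" % k)
```

For example, a difference of −2 falls in category 2 and is refined with the two bits 01.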
The JPEG Baseline Algorithm
• The quantization of AC coefficients creates many zeros, especially at higher
frequencies. The 2-D array of the DCT coefficients is formatted into a 1-D
vector using a zigzag reordering. This rearranges the coefficients in
approximately decreasing order of their average energy (as well as in order
of increasing spatial frequency) with the aim of creating large runs of zero
values.
• To encode the AC coefficients, each nonzero coefficient is first described by
a composite 8-bit value (0 → 255), denoted by I, of the form (in binary
notation):
I = ’NNNNSSSS’
The four least significant bits, ‘SSSS’, define a category k for the coefficient
amplitude in a way similar to the DC difference values. For the baseline
system 10 ≥ k ≥ 1. The four most significant bits in the composite value,
’NNNN’, give the position of the current coefficient relative to the previous
nonzero coefficient, i.e., the runlength of zero coefficients between nonzero
coefficients. The runlengths specified by ’NNNN’ can range from 0 to 15,
and a separate symbol, I = 240, is defined to represent a runlength of 16 zero
coefficients. If the runlength exceeds 16 zero coefficients, it is coded by
using multiple symbols. In addition, a special symbol, I = 0, is used to code
the end of block (EOB) which signals that all the remaining coefficients in
the block are zero. Therefore, the total symbol set contains 162 members (10
categories × 16 runlength values +2 additional symbols). The output
symbols for each block are then Huffman coded (maximum codeword
length of 16) and are followed by the additional bits (assumed to be
uniformly distributed and thus fixed-length coded) required to specify the
sign and exact amplitude of the coefficient in each of the categories. Up to
two separate AC Huffman tables can be specified in the baseline system.
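The (runlength, category) symbol formation just described can be sketched in Python (a simplified illustration, not the standard's reference code; Huffman coding of the resulting symbols is omitted):

```python
def ac_symbols(zigzag_ac):
    """Map the zigzag-ordered AC coefficients of one block to
    (runlength, category) symbols; (15, 0) stands for the I = 240
    run-of-16-zeros symbol and "EOB" for the I = 0 end-of-block."""
    last = max((i for i, c in enumerate(zigzag_ac) if c != 0), default=-1)
    symbols, run = [], 0
    for c in zigzag_ac[:last + 1]:
        if c == 0:
            run += 1
            if run == 16:              # emit the special 16-zero symbol
                symbols.append((15, 0))
                run = 0
        else:
            k = abs(c).bit_length()    # amplitude category (1..10 in baseline)
            symbols.append((run, k))
            run = 0
    symbols.append("EOB")              # all remaining coefficients are zero
    return symbols
```

For instance, `ac_symbols([0, -2, -1, -1, -1, 0, 0, -1] + [0] * 55)` yields (1,2), (0,1), (0,1), (0,1), (2,1), EOB.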
The JPEG Baseline Algorithm
• Each component of a color (or multi-spectral) image is encoded independently. For example, an image represented in the YIQ luminance and chrominance space is encoded essentially as three separate images.
• At the decoder, after the encoded bit stream is Huffman decoded and the 2-dimensional array of the quantized DCT coefficients is recovered, each
coefficient is denormalized, i.e., multiplied by the corresponding component
of the normalization matrix. The resultant array is inverse DCT transformed,
and the value 2^(b−1), which was subtracted prior to the forward DCT, is added
back to each pixel value. This yields an approximation to the original image
block. The resulting error depends on the amount of quantization which is
controlled by the normalization matrix.
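The complete block pipeline (level shift, forward DCT, normalize/quantize, then denormalize and inverse DCT at the decoder) can be sketched with NumPy; the orthonormal 8-point DCT matrix below is a standard construction, and the normalization array is passed in by the caller:

```python
import numpy as np

# Orthonormal 8-point DCT-II matrix: C[u, j] = a(u) cos((2j+1)u*pi/16),
# with a(0) = sqrt(1/8) and a(u > 0) = sqrt(2/8) = 1/2.
j = np.arange(8)
C = 0.5 * np.cos((2 * j[None, :] + 1) * j[:, None] * np.pi / 16)
C[0, :] = 1 / np.sqrt(8)

def encode_block(block, Q):
    """Level-shift by 128, forward DCT, then normalize by Q and round."""
    F = C @ (block - 128.0) @ C.T
    return np.rint(F / Q)

def decode_block(Fq, Q):
    """Denormalize by Q, inverse DCT, then undo the level shift."""
    return C.T @ (Fq * Q) @ C + 128.0
```

Because C is orthogonal, the only loss in a round trip comes from the rounding step, whose size is set by the entries of Q.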
Encoder Simplified Block Diagram
DCT-Based Compression Encoder:

Source Image Data → [8 × 8 blocks] → FDCT → Quantizer → Entropy Encoder → Compressed Image Data

The Quantizer is driven by the Quantization Tables and the Entropy Encoder by the Huffman Tables; both sets of tables accompany the compressed image data as Header Data.
Decoder Simplified Block Diagram
DCT-Based Compression Decoder:

Compressed Image Data (with Header Data) → Entropy Decoder → Dequantizer → IDCT → Reconstructed Image Data

The Entropy Decoder uses the Huffman Tables and the Dequantizer uses the Quantization Tables, both recovered from the header data.
Original 8 × 8 Image Block
The following example represents an 8 × 8 block of the original
image LENNA.
139
144

150
159

159

161
162

162
144
151
155
161
160
161
162
162
149
153
160
162
161
161
161
161
153
156
163
160
162
161
163
161
155
159
158
160
162
160
162
163
109
155
156
156
159
155
157
157
158
155
156
156
159
155
157
157
158
155
156

156
159
155

157
157

158
Mean Subtracted Image Block
The value of 128 is subtracted from each pixel prior to the
application of the discrete cosine transform (DCT). This places
the DC coefficient (the top-left corner coefficient that is
proportional to the average brightness of the block) in the range
(–1024, +1016).
11
16

22
 31

 31

33
34

34
16
23
27
33
32
33
34
34
21
25
32
34
33
33
33
33
25
28
35
32
34
33
35
33
110
27
31
30
32
34
32
34
35
27
28
28
31
27
29
29
30
27
28
28
31
27
29
29
30
27
28

28
31
27

29
29

30
DCT of 8 × 8 Image Block
The JPEG forward DCT is defined as

F(u,v) = [C(u) C(v) / 4] Σ_{j=0}^{7} Σ_{k=0}^{7} f(j,k) cos[(2j+1)uπ / 16] cos[(2k+1)vπ / 16]

C(w) = 1/√2 for w = 0;  1 otherwise.
 235.6 − 1.0 − 12.1
− 22.6 − 17.5 − 6.2

 − 10.9 − 9.3 − 1.6
 − 7.1 − 1.9
0.2

 − 0.6 − 0.8
1.5

− 0.2
1.6
 1.8
 − 1.3 − 0.4 − 0.3

1.6
− 3.8
 − 2.6
− 5.2
− 3.2
1.5
1.5
1.6
− 0.3
− 1.5
− 1.8
111
2.1
− 2.9
0.2
0.9
− 0.1
− 0.8
− 0.5
1.9
− 1.7 − 2.7
− 0.1 0.4
− 0.9 − 0.6
− 0.1 0.0
− 0.7 0.6
1.5
1.0
1.7
1.1
1.2 − 0.6
1.3 
− 1.2 

− 0.1
0.3 
1.3 

− 1.0 
− 0.8

− 0.4
Normalization Matrix
The DCT coefficients are scaled using a user-defined
normalization array that is fixed for all blocks. For the baseline
system, four different normalization arrays are allowed (e.g., to
accommodate luminance and chrominance components of an
image). Each component of the normalization array, Q(u,v), is an
8-bit integer that in effect determines the quantization step size.
The quality and bit rate of an encoded image can be varied by
changing this array. The normalization matrix may be designed
according to the perceptual importance of the DCT coefficients
under the intended viewing conditions. The following is an
example obtained by measuring the DCT coefficient “visibility
threshold” using CCIR-601 images and display:
 16
 12

 14
 14

 18

 24
 49

 72
11
12
13
17
22
35
64
92
10
14
16
22
37
55
78
95
16 24 40
19 26 58
24 40 57
29 51 87
56 68 109
64 81 104
87 103 121
98 112 100
112
51
60
69
80
103
113
120
103
61 
55 

56 
62 
77 

92 
101

99 
Quantization Example
Quantization step size = 20
DCT unquantized coefficient = 13.56
Scaled coefficient = 13.56 / 20 = 0.68
Scaled, quantized (rounded) coefficient = 1
Dequantized coefficient (decoder) = 1 × 20 = 20
Normalized/Quantized Coefficients
After normalization, the coefficients are quantized by rounding
off to the nearest integer:
F*(u,v) = NINT[ F(u,v) / Q(u,v) ]
The normalization/quantization process typically results in many
zero-valued coefficients which can be coded efficiently.
 15
− 2

−1
 0

 0

 0
 0

 0
0
−1
−1
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
114
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0












Zigzag Scan
Prior to the entropy encoding, the 2-D coefficients are scanned
into a 1-D format according to the following zigzag pattern. This
rearranges the coefficients in approximately decreasing order of
their average energy (as well as approximately in order of
increasing spatial frequency) with the aim of creating large runs
of zero values.
0
2

3
9

10

 20
 21

 35
1
5
4 7
8 12
11 18
19 23
22 33
34 37
36 48
6
13
17
24
32
38
47
49
14
16
25
31
39
46
50
57
115
15
26
30
40
45
51
56
58
27
29
41
44
52
55
59
62
28 
42 

43
53 
54 

60 
61

63 
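The zigzag ordering can be generated programmatically rather than stored as a table; a small sketch (assuming the usual JPEG scan, which starts rightward along the top row):

```python
def zigzag_positions(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):           # s indexes the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # odd diagonals run top-right to bottom-left; even ones reverse
        order.extend(diag if s % 2 else diag[::-1])
    return order
```

Writing the scan index of each position back into an 8 × 8 array reproduces the pattern above.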
Huffman Coding Example
Quantized DCT AC coefficients in zigzag order:
15  0  −2  −1  −1  −1  0  0  −1  (all remaining coefficients zero)

Huffman symbols (zero run, category):
DC coefficient, (1,2), (0,1), (0,1), (0,1), (2,1), EOB

Codeword assignment:
Final codeword = (Huffman codeword) append (k additional bits for a category-k coefficient)
−2 preceded by one zero (run = 1, k = 2): 111001 append 0 (sign bit) append 1 = 11100101.

Huffman code for the block:
DC diff. codeword / 11100101 / 000 / 000 / 000 / 110110 / 1010

Total bits, assuming 8 bits for the DC difference codeword, = 35.
Bit rate = 35 / 64 ≈ 0.55 bits/pixel
JPEG AC Huffman Table Example
Zero Run   Category   Codelength   Codeword
0          1          2            00
0          2          2            01
0          3          3            100
0          4          4            1011
0          5          5            11010
0          6          6            111000
0          7          7            1111000
…          …          …            …
1          1          4            1100
1          2          6            111001
1          3          7            1111001
1          4          9            111110110
…          …          …            …
2          1          5            11011
2          2          8            11111000
…          …          …            …
3          1          6            111010
3          2          9            111110111
…          …          …            …
4          1          6            111011
5          1          7            1111010
6          1          7            1111011
7          1          8            11111001
8          1          8            11111010
9          1          9            111111000
10         1          9            111111001
11         1          9            111111010
…          …          …            …
End of Block (EOB)    4            1010
Zigzag Scan and Coefficient Categories
Zig-zag scan AC ordering

Category   Coefficient Range
0          0
1          −1, 1
2          −3, −2, 2, 3
3          −7, …, −4, 4, …, 7
4          −15, …, −8, 8, …, 15
5          −31, …, −16, 16, …, 31
6          −63, …, −32, 32, …, 63
7          −127, …, −64, 64, …, 127
8          −255, …, −128, 128, …, 255
9          −511, …, −256, 256, …, 511
10         −1023, …, −512, 512, …, 1023
11         −2047, …, −1024, 1024, …, 2047
Denormalized DCT Coefficients
At the decoder, the quantized DCT coefficients are dequantized
(denormalized) by multiplying by the normalization matrix.
240    0  −10   0   0   0   0   0
−24  −12    0   0   0   0   0   0
−14  −13    0   0   0   0   0   0
  0    0    0   0   0   0   0   0
  0    0    0   0   0   0   0   0
  0    0    0   0   0   0   0   0
  0    0    0   0   0   0   0   0
  0    0    0   0   0   0   0   0
Reconstructed 8 × 8 Image Block
The dequantized DCT coefficients are inverse transformed according to

f̂(j,k) = (1/4) Σ_{u=0}^{7} Σ_{v=0}^{7} C(u) C(v) F̂(u,v) cos[(2j+1)uπ / 16] cos[(2k+1)vπ / 16]

C(w) = 1/√2 for w = 0;  1 otherwise.
The value 128 is added back to each pixel value.
144
148

155
160

163

163
160

158
146
150
156
161
163
163
161
159
149
152
157
161
164
164
162
161
152
154
158
162
163
164
162
161
154
156
158
161
162
162
162
162
120
156
156
157
159
160
160
161
161
156
156
156
157
158
158
159
159
156
156

155
155
156

157
158

158
Error: (Original - Reconstructed)
The root mean-squared error (RMSE) between the original image
block and the reconstructed image block is

RMSE = sqrt[ (1/64) Σ_{j=0}^{7} Σ_{k=0}^{7} ( f(j,k) − f̂(j,k) )² ] = 2.26

−5  −2   0   1   1  −1  −1  −1
−4   1   1   2   3   0   0   0
−5  −1   3   5   0  −1   0   1
−1   0   1  −2  −1   0   2   4
−4  −3  −3  −1   0  −5  −3  −1
−2  −2  −3  −3  −2  −3  −1   0
 2   1  −1   1   0  −4  −2  −1
 4   3   0   0   1  −3  −1   0
JPEG Baseline DCT Results
LENA
Technique   Bitrate (bits/pixel)   RMSE (0-255)   SNR (dB)
JPEG DCT    0.25                    7.34          30.81
JPEG DCT    0.50                    4.70          34.69
JPEG DCT    0.75                    3.78          36.58
JPEG DCT    1.00                    3.23          37.93
JPEG DCT    1.50                    2.56          39.95

BOOTS
Technique   Bitrate (bits/pixel)   RMSE (0-255)   SNR (dB)
JPEG DCT    0.25                   14.78          24.73
JPEG DCT    0.50                   10.07          28.07
JPEG DCT    0.75                    8.20          29.85
JPEG DCT    1.00                    6.92          31.33
JPEG DCT    1.50                    5.02          34.12
• The JPEG recommended normalization matrix (for CCIR-601 images and
display) has been scaled to achieve the desired bit rate in each
case. Huffman tables matched to the image are used for the
entropy encoding.
• SNR is defined as 10 log10(255²/MSE) dB.
JPEG Baseline System Summary
• Image quality vs. bit rate: The baseline JPEG DCT is a lossy
technique. The image quality can be traded for lower bit rate
by manipulating the normalization matrix components. Larger
values correspond to coarser quantization, lower bit rate, and
lower reconstructed image quality.
• Implementation complexity: Software implementations run at around 1 MB/s on a Pentium-based machine; ASIC implementations operate at video rates.
• Constant bit rate vs. constant quality: Using the same compression parameters (i.e., normalization matrix) with different images would generally result in different per-pixel bit rates, depending on image resolution, detail, and noise. For example, bit rate variations as high as 5× have been obtained on video images that have the same resolution (512 × 512) but contain varying image detail.
• Encoder/decoder asymmetry: The encoder and the decoder have roughly the same complexity. This might raise a concern for some storage applications where the data needs to be encoded only once, but would be decoded many times.
JPEG Baseline System Summary (Cont’d)
• Effect of multiple coding: The baseline JPEG DCT is fairly
robust when subjected to multiple cycles of compression and
decompression. Although in certain test cases additional
degradations have resulted after the first compression and
decompression cycle, the losses have been significantly less
than the loss caused by the first stage (usually less than 5%
after 10 or more stages). Also, in most cases, no additional loss
has been observed after just 2 or 3 cycles.
• Channel error tolerance: Due to the variable-length (Huffman) codes used in the encoding of the DCT coefficients, a channel error can potentially propagate through the whole image and have a catastrophic effect. The JPEG standard, through the use of restart codes, provides the means for recovering from errors at a small cost in compression efficiency.
• Artifacts: At low bit rates, the DCT results in artifacts that
generally appear as visible blocking and/or ringing around the
edges.
Normalization Matrix Design
1. Image characteristics
• Noise
• Resolution
• Image detail
2. Display characteristics
• Dynamic Range
• MTF
• Noise
3. Viewing conditions
• Viewing distance
• Ambient light level
4. Color
• HVS color encoding
• HVS color perception
Fixed-Rate JPEG
Two mechanisms exist for creating a JPEG compressed file with a
fixed (prespecified) filesize:
• Multiple-pass: In this iterative approach, the image is first compressed using a default Q-table, e.g., one that has been empirically optimized for the target rate. If the resulting filesize is not within the acceptable range, the Q-table elements are rescaled and the image is compressed again. This process is repeated until the desired filesize is achieved.
• Single-pass: The image is compressed to a given size by using:
— i) A (virtual) rate buffer that can accept a variable-rate input while having a fixed-rate output equal to the target compressed bit rate; and
— ii) The adaptive quantization feature in the extended JPEG, which allows the scaling of the Q-table by a 5-bit factor from one DCT block to another.
The image is subdivided into smaller subimages, e.g., 8 × 8 stripes, and the first segment is compressed using a default Q-table. The resulting bits are fed into the buffer while reading bits from the buffer at a constant rate. Before compressing each segment, the buffer status is monitored and the Q-table scaling is adjusted to compensate for deviations from the desired rate. The scale factor is chosen so as to minimize quality variations between segments while maintaining the approximate target rate.
The JPEG Extended System
The JPEG extended system enhances the baseline system to
satisfy a broader range of applications. Its features include
• 12-bit/pixel input image precision: In certain applications,
such as medical imaging, the source image has a higher
precision than 8 bits. It should be noted that the 12-bit
codecs require greater computational resources to achieve
the required DCT or IDCT accuracy.
• Sequential progressive build-up: The image is encoded in
multiple scans instead of a single scan. This provides
successively higher levels of image quality and is useful for
human interaction over low data rate communication
channels such as dial-up telephone lines. Two
complementary techniques for use in this mode have been
specified: spectral selection and successive approximation.
• Hierarchical progressive build-up: The image is encoded at multiple resolutions. This is useful for applications where a high-resolution image must also be accessed by lower-resolution devices.
• Arithmetic coding: For many of the images tested, arithmetic coding has produced on average 5-10% better compression than Huffman coding. However, many believe that it is more complex than Huffman coding for both software and hardware implementations.
The JPEG Extended System
• Adaptive quantization: Allows a 5-bit scale change of the quantization matrix from one block to another. This feature is useful for:
— Fixed-rate JPEG through rate-buffer instrumentation,
— Better image quality at the same bit rate,
— Transcodability with the MPEG standard intraframe mode.
• Tiling: Provides a means for tiling an image into a set of smaller subimages. In particular:
— Simple tiling is useful for providing random access into the middle of a compressed image.
— Pyramidal tiling is useful for accessing the lower resolution versions of an image.
— Composite tiling is useful for relating diverse subimages into an image collage.
• SPIFF: A file format that provides for the interchange of the
compressed image files between application environments.
• Selective refinement: Provides for selecting a region of an
image for further refinement.
Lossless Predictive Mode
C B
A X
• A combination of the three pixels A, B, and C in the neighborhood of the pixel being encoded is used to form a prediction (estimate) of the pixel X, denoted by X̂.
• The error (differential) signal, e, is formed as e = X − X̂.
• The differential signal e typically has a greatly reduced variance compared to X, is significantly less correlated than X, and has a stable histogram well approximated by a double-sided exponential (Laplacian) distribution. It may be encoded by either Huffman or arithmetic coding.
Predictor Selection
For a given image, any of the eight predictors listed below can be
used (selection zero is used in the hierarchical mode) to form a
prediction. Selections 1-3 correspond to one-dimensional
predictors, while selections 4-7 correspond to two-dimensional
predictors. The original source image can have any precision
from 2 to 16 bits/pixel.
Selection Value   Prediction
0                 no prediction
1                 A
2                 B
3                 C
4                 A + B − C
5                 A + (B − C) / 2
6                 B + (A − C) / 2
7                 (A + B) / 2
Experiments with JPEG images show that the predictor selection
7, on the average, results in slightly lower bit rate (higher
compression) compared to the other selections. However, the
performance of a certain predictor is highly dependent on the
image characteristics, and for a given image, any predictor may
outperform its alternatives.
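The predictor table above can be transcribed directly (a sketch; integer arithmetic is assumed, as in typical implementations):

```python
def jpeg_predict(selection, A, B, C):
    """The eight JPEG lossless predictors. A is the pixel to the left
    of X, B the pixel above, and C the pixel above-left."""
    predictors = {
        0: lambda: 0,                 # no prediction (hierarchical mode)
        1: lambda: A,
        2: lambda: B,
        3: lambda: C,
        4: lambda: A + B - C,
        5: lambda: A + (B - C) // 2,
        6: lambda: B + (A - C) // 2,
        7: lambda: (A + B) // 2,
    }
    return predictors[selection]()
```

The prediction error e = X − jpeg_predict(...) is what gets entropy coded.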
Lossless Predictive Coding Results
The results of the lossless encoding of the nine JPEG test images are
summarized below. The images are 720 pixels/line by 576 lines,
8 bits/sample for the luminance; 360 pixels/line by 576 lines,
8 bits/sample for each chrominance component (an average of
16 bits/pixel per image). Arithmetic coding has been used to
encode the prediction errors. Results indicate that an average
compression of about 2:1 can be expected when encoding color
images of moderate complexity.

Image      Bits/pixel   Predictor
BOATS        6.9           7
BOARD        7.0           7
ZELDA        7.2           7
BARBARA1     8.3           7
BARBARA2     8.7           7
HOTEL        8.1           7
BALLOONS     5.7           7
GOLDHILL     8.2           7
GIRL         7.3           7
Average      7.5           7
Average     +1.8%          6
Average     +4.8%          5
Average     +7.1%          4
Average    +11.6%          3
Average     +1.5%          2
Average     +6.9%          1

(Percentages are the increase in average bit rate relative to predictor 7.)

J. Mitchell, Proceedings of SPIE, CR37, 1991
SUBBAND AND WAVELET
CODING
Subband Coding
• The image is decomposed into a number of subimages (subbands), each representing a different spatial frequency content.
• The decomposition is performed using a bank of low-pass and
high-pass filters known as analysis filters, followed by downsampling. The filters are chosen so that minimal or no information is lost in the decimation process.
• This filtering process is equivalent to a basis function decomposition, where the basis functions are the impulse responses
of the filters, and the filtered subimages are the coefficients for
the corresponding basis functions.
• The subbands are independently or jointly quantized and encoded using conventional techniques.
• At the decoder, the reconstructed subbands are upsampled and interpolated using a bank of low-pass and high-pass filters known as synthesis filters. To minimize (or completely eliminate) any artifacts resulting from the analysis and synthesis stages, the synthesis filters are usually derived from the analysis filters depending on the desired filter characteristics.
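A one-level version of this analysis/synthesis chain, using the orthonormal Haar pair as the analysis filters (chosen here only for illustration; any suitable filter bank works), demonstrates the perfect-reconstruction property:

```python
import numpy as np

S = np.sqrt(2.0)

def analyze(a, axis):
    """Low-pass/high-pass filter with the Haar pair along one axis,
    then downsample by 2."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / S
    hi = (a[0::2] - a[1::2]) / S
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def synthesize(lo, hi, axis):
    """Upsample the two subbands and merge; exact inverse of analyze."""
    lo = np.moveaxis(lo, axis, 0)
    hi = np.moveaxis(hi, axis, 0)
    a = np.empty((2 * lo.shape[0],) + lo.shape[1:])
    a[0::2] = (lo + hi) / S
    a[1::2] = (lo - hi) / S
    return np.moveaxis(a, 0, axis)

def subband_split(x):
    """One 2-D decomposition level: returns LL, LH, HL, HH."""
    L, H = analyze(x, 1)               # horizontal filtering
    LL, LH = analyze(L, 0)             # vertical filtering
    HL, HH = analyze(H, 0)
    return LL, LH, HL, HH

def subband_merge(LL, LH, HL, HH):
    return synthesize(synthesize(LL, LH, 0), synthesize(HL, HH, 0), 1)
```

Because the Haar pair is orthonormal, subband_merge(*subband_split(x)) reproduces x exactly (up to floating-point error), i.e., no information is lost in the decimation.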
Wavelet Coding
• The wavelet transform uses basis functions that are scaled and translated versions of a single prototype function (the basic or mother wavelet).
• The wavelet transform can be viewed as a bank of band-pass filters, where the filters have increasing bandwidth as the center frequency increases. For practical purposes, a wavelet decomposition is identical to a nonuniform subband decomposition, where the bandwidth of a subband is a function of its center frequency.
• The wavelet filter bank can be implemented by recursively filtering with a low-pass/high-pass filter pair followed by downsampling.
• The wavelet basis functions are the impulse responses of the cascaded filters.
Wavelet Analysis/Synthesis Filter Banks
Wavelet Filter Banks
There are three general classes of wavelet basis functions,
as defined by the type of filter bank. Each class has characteristics that may make it more or less suitable for a
specific application.
• Quadrature mirror filters (QMFs)
• Conjugate quadrature filters (CQFs)
• Bi-orthogonal filters
Let’s denote by h0(n) and h1(n), the analysis low-pass
and high-pass filters, and by g0(n) and g1(n), the synthesis
low-pass and high-pass filters, respectively.
Quadrature Mirror Filters
One set of desirable basis functions consists of those that
are symmetrical and have the same length.
Given a low-pass analysis filter h0(n), aliasing can be eliminated by choosing:

h1(n) = (−1)^n h0(n)
g0(n) = 2 h0(n)
g1(n) = −2 (−1)^n h0(n)

Unfortunately, the symmetry constraint on the basis functions makes it impossible to design a system with a gain of unity for all frequencies.

Properties of QMFs:
• Equal-length, linear phase filters (symmetrical).
• Not perfect reconstruction, but very close to it, because the basis functions are almost orthogonal.
• Symmetric boundary extension (minimizes discontinuities).
• Both even and odd filter lengths allowed.
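The three derived filters can be computed mechanically from h0; a sketch of the relations above:

```python
def qmf_bank(h0):
    """Build h1, g0, g1 of a QMF bank from the low-pass analysis
    filter h0, using h1(n) = (-1)^n h0(n), g0(n) = 2 h0(n),
    g1(n) = -2 (-1)^n h0(n)."""
    h1 = [(-1) ** n * c for n, c in enumerate(h0)]
    g0 = [2 * c for c in h0]
    g1 = [-2 * (-1) ** n * c for n, c in enumerate(h0)]
    return h1, g0, g1
```

With the two-tap average h0 = [0.5, 0.5], this gives h1 = [0.5, −0.5], g0 = [1.0, 1.0], g1 = [−1.0, 1.0].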
Conjugate Quadrature Filters
Perfect reconstruction can be achieved with equal-length
filters by relaxing the symmetrical filter constraint.
Given a low-pass analysis filter h0(n), aliasing can be
eliminated by choosing:
h1(n) = (−1)^n h0(N − 1 − n)
g0(n) = 2 h0(N − 1 − n)
g1(n) = −2 (−1)^n h0(n)
The proper choice of h0(n) will result in a system gain of
unity for all frequencies.
Properties of CQFs:
• Equal-length, nonlinear phase filters.
• Perfect reconstruction (orthogonal basis functions).
• Periodic boundary extension (introduces discontinuities).
• Even filter lengths only.
• Possible regularity, vanishing moments properties.
Biorthogonal Filters
Perfect reconstruction can be achieved with symmetrical basis
functions by relaxing the constraint of equal length.
Given a low-pass analysis filter h0(n), and a low-pass synthesis
filter g0(n), aliasing can be eliminated if:
h1(n) = (1/2) (−1)^n g0(n)
g1(n) = 2 (−1)^n h0(n)

The proper choices of h0(n) and g0(n) will result in a system gain of unity for all frequencies.

Properties of biorthogonal filters:
• Unequal-length, linear phase filters (symmetrical).
• Perfect reconstruction. The basis functions for h0(n) and g1(n) are orthogonal, and the basis functions for g0(n) and h1(n) are orthogonal.
• Symmetric boundary extension.
• Odd or even filter lengths that differ by a multiple of two (e.g., 7/9, 6/10).
• Possible regularity, vanishing moments properties.
Biorthogonal Filters For Compression
In a recent paper (IEEE Trans. IP, Vol. 4, pp. 1053, Aug. '95), Villasenor et al. evaluated 4300 candidate biorthogonal wavelet filter banks. The following table summarizes their choice of the filters best suited to image compression. (Filter coefficients: origin is the leftmost coefficient; coefficients for negative indices follow by symmetry.)

1  H0  length 9:   .852699, .377402, −.110624, −.023849, .037828
   G0  length 7:   .788486, .418092, −.040689, −.064539
2  H0  length 13:  .767245, .383269, −.068878, −.033475, .047282, .003759, −.008473
   G0  length 11:  .832848, .448109, −.069163, −.108737, .006292, .014182
3  H0  length 6:   .788486, .047699, −.129078
   G0  length 10:  .615051, .33389, −.067237, .006989, .018914
4  H0  length 5:   1.060660, .353553, −.176777
   G0  length 3:   .707107, .353553
5  H0  length 2:   .707107, .707107
   G0  length 6:   .707107, .088388, −.088388
6  H0  length 9:   .994369, .419845, −.176777, −.066291, .033145
   G0  length 3:   .707107, .353553
Embedded Zerotree Wavelet (EZW)
The embedded zerotree wavelet coding algorithm uses the following building blocks:
• A discrete wavelet transform that decorrelates most images fairly effectively;
• Zerotree coding, a procedure whereby insignificance is predicted across multiple scales using a practical image model, providing substantial coding gains over the first-order entropy of the wavelet coefficient significance maps;
• A uniform embedded quantizer, which uses successive approximations and allows the encoding or decoding process to stop at any point (i.e., progressive transmission) without a penalty in performance;
• Adaptive arithmetic coding, which allows the entropy coder to incorporate learning into the bit stream itself.

J. Shapiro, IEEE Trans. Signal Proc., 41(12), pp. 3445-3462, Dec. 1993
Zerotrees and Significance Maps
Zerotree coding is based on the assumption that if a wavelet coefficient at a coarse scale (called a parent) is insignificant
with respect to a threshold T (i.e., its magnitude is less than T),
then all the wavelet coefficients of the same orientation in the
same spatial location at finer scales (called the
descendants) are also likely to be insignificant with respect to
T.
(Figure: subband layout of a 3-level decomposition — LL3, HL3, LH3, HH3 at the coarsest scale; HL2, LH2, HH2 at the middle scale; HL1, LH1, HH1 at the finest scale.)
Significance Maps
The symbols to be encoded at the first pass of the EZW method
are generated in the following way:
• An initial quantization threshold T is chosen (usually set at
about half the magnitude of the largest wavelet coefficient).
•A
raster scanning of the coefficients in each band is
performed by starting at the lowest frequency subband,
scanning the bands from left to right and top to bottom, and
then moving on to the next scale and repeating this process.
• Each wavelet coefficient is compared to the threshold T, resulting in one of four possibilities:
— If the coefficient is significant and positive, the symbol encoded is POS.
— If the coefficient is significant and negative, the symbol encoded is NEG.
— If the coefficient is insignificant, but at least one of its descendants is significant, an isolated zero (IZ) is encoded.
— Finally, if the coefficient is insignificant, its parent is significant, and all of its descendants are also insignificant, the symbol encoded is a zerotree root (ZTR).
• These 4 symbols are encoded with adaptive arithmetic coding.
Successive Approximations
To perform embedded coding, successive approximation quantization is applied, where both the significance of a coefficient and its magnitude are further refined by applying a sequence of thresholds {T_n} such that T_n = T_{n−1} / 2.
Two separate lists of wavelet coefficients are maintained, and at each pass, which corresponds to a given threshold, these lists are updated:
• A dominant list, which contains the locations of those coefficients that have so far been found to be insignificant. At each pass of this list, its coefficients are compared to the current threshold to determine their significance and, if significant, their sign. This information is encoded with a procedure similar to the one outlined for the first pass. Note that no descendants of a zerotree need to be encoded.
• A subordinate list, which contains the magnitudes of those
coefficients that have been found to be significant. At each
pass of this list, the magnitudes of the coefficients are
refined to an additional bit of precision using two symbols.
This is similar to bit plane encoding of the magnitudes.
Adaptive arithmetic coding is used to encode all the symbols
generated in each pass. The encoding can be stopped whenever
the bit budget has been exhausted or a certain distortion has been
reached.
Example: 3-Stage Wavelet Transformed Image
Following is an array of values corresponding to a 3-stage
wavelet decomposition of a simple 8 × 8 image. The initial
quantization threshold is chosen to be nearly half of the largest
coefficient magnitude, i.e., T0 = 32.
 63  −34   49   10    7   13  −12    7
−31   23   14  −13    3    4    6   −1
 15   14    3  −12    5   −7    3    9
 −9   −7  −14    8    4   −2    3    2
 −5    9   −1   47    4    6   −2    2
  3    0   −3    2    3   −2    0    4
  2   −3    6   −4    3    6    3    6
  5   11    5    6    0    3   −4    4
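The threshold choice and the first significance test can be sketched directly (the power-of-two choice below is the conventional one for this example):

```python
coeffs = [63, -34, 49, 10, 7, 13, -12, 7,
          -31, 23, 14, -13, 3, 4, 6, -1,
          15, 14, 3, -12, 5, -7, 3, 9,
          -9, -7, -14, 8, 4, -2, 3, 2,
          -5, 9, -1, 47, 4, 6, -2, 2,
          3, 0, -3, 2, 3, -2, 0, 4,
          2, -3, 6, -4, 3, 6, 3, 6,
          5, 11, 5, 6, 0, 3, -4, 4]

# Largest power of two not exceeding the largest magnitude (63) -> 32
T0 = 1 << (max(abs(c) for c in coeffs).bit_length() - 1)

# Coefficients significant at the first dominant pass, and the center
# of the uncertainty interval [T0, 2*T0) used as reconstruction value
significant = [c for c in coeffs if abs(c) >= T0]
recon = T0 + T0 // 2
```

Only 63, −34, 49, and 47 are significant at T0 = 32, which is exactly what the dominant-pass table that follows encodes as POS/NEG symbols.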
Zerotree Symbol Encoding Example
Following is the result of processing the first dominant pass for T = 32.
The reconstruction magnitudes are taken at the center of the uncertainty interval, i.e., 48.

Subband   Coefficient value   Symbol   Reconstruction value
LL3         63                POS        48
HL3        −34                NEG       −48
LH3        −31                IZ          0
HH3         23                ZTR         0
HL2         49                POS        48
HL2         10                ZTR         0
HL2         14                ZTR         0
HL2        −13                ZTR         0
LH2         15                ZTR         0
LH2         14                IZ          0
LH2         −9                ZTR         0
LH2         −7                ZTR         0
HL1          7                Z           0
HL1         13                Z           0
HL1          3                Z           0
HL1          4                Z           0
LH1         −1                Z           0
LH1         47                POS        48
LH1         −3                Z           0
LH1         −2                Z           0
FRACTAL CODING
Contractive Mappings
Consider the expression f(x) = x/2 + 1/x. For an arbitrary positive initial
value x0, let's compute f(x0). Next, compute f(f(x0)) and f(f(f(x0))),
…, etc. It can be shown that for any positive value of x0, this series will
converge to the same value, namely √2. f(x) as defined above is a
contractive mapping, and √2 is its fixed point.
A transformation τ is called contractive, if for all values of x and y,
there exists a value s < 1, such that d(τ (x), τ (y) ) ≤ sd(x, y). The
value s is called the contractivity of the transformation τ . If s = 1, the
transformation is nonexpansive. If s > 1, the transformation is
expansive.
Now consider transmitting the value of √(2) over a communication
channel. If the transmission of the function f (x) as defined above
takes less bandwidth than the transmission of √(2) itself, the function
can be transmitted and the value of √(2) can be recovered at the other
end by starting with any initial guess and iterating the computation
until the desired accuracy is achieved.
Similarly, for the purposes of image compression, if the different
regions of an image are represented as the fixed points of a set of
contractive mappings, the storage of the parameters of those
contractive mappings is all that is needed for the reconstruction of
the image.
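The fixed-point iteration can be demonstrated in a few lines (any positive starting guess works):

```python
def fixed_point(f, x0, iterations=60):
    """Iterate x <- f(x); for a contractive f this converges to the
    unique fixed point regardless of the starting value."""
    x = x0
    for _ in range(iterations):
        x = f(x)
    return x

f = lambda x: x / 2 + 1 / x   # fixed point is sqrt(2)
```

Starting from 17.0 or from 0.3, the iteration settles on 1.41421356… either way, which is the sense in which transmitting f can stand in for transmitting √2.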
Jacquin’s Fractal Compression Technique
Jacquin’s technique is predicated on the assumption that any section
of an image can be represented as a transformation of another section
of the same image.
To encode a certain region of an image, first a search through the
image is conducted, looking for another part of the image which
when transformed by a cascade of one or more (overall contractive)
transformations, would produce the best approximation to the
original region.
The compressed bits consist of the parameters needed to represent
these contractive transformations in addition to the coordinates of the
region that is being transformed.
In order to simplify the coding operation, the image is segmented
into nonoverlapping contiguous n × n blocks, known as range
blocks, and each 2n × 2n block of the image, known as a domain
block, is tried out through a series of contractive transformations to
find a best match.
Due to the intensive computational requirements of the best match
search, the fractal encoding is usually several orders of magnitude
slower than fractal decoding.
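The inner loop of the encoder — spatially shrinking a 2n × 2n domain block and fitting a contrast/brightness map to an n × n range block — can be sketched as follows (a simplified illustration; real coders also search over the eight block isometries and quantize s and o):

```python
import numpy as np

def shrink(domain):
    """Average 2x2 neighborhoods: maps a 2n x 2n domain block to n x n."""
    return (domain[0::2, 0::2] + domain[0::2, 1::2]
            + domain[1::2, 0::2] + domain[1::2, 1::2]) / 4.0

def fit_transform(domain, range_block):
    """Least-squares contrast s and brightness o minimizing
    || s * shrink(domain) + o - range_block ||."""
    d = shrink(domain).ravel()
    r = range_block.ravel()
    A = np.column_stack([d, np.ones_like(d)])
    (s, o), *_ = np.linalg.lstsq(A, r, rcond=None)
    return s, o
```

The encoder would evaluate this fit for every candidate domain block, keep the best match, and store only (domain coordinates, s, o); clamping |s| below 1 keeps the grey-level map contractive.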