Lecture notes

Self-entropy of symbol
H(x) = -\log_2 p_x
Entropy of source
H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i
Example of a prefix code
a = 0, b = 10, c = 110, d = 111
Example of non-prefix code
a = 0, b = 01, c = 011, d = 111
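To see why the prefix property matters, here is a minimal sketch (the helper name greedy_decode is hypothetical, chosen for illustration): a prefix code can be decoded greedily bit by bit, whereas in the non-prefix code above the codeword of a (0) is a prefix of the codeword of b (01), so greedy decoding can go wrong.

```python
# Greedy bit-by-bit decoding: emit a symbol as soon as the
# accumulated bits match a codeword.
def greedy_decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out), cur          # non-empty cur = undecoded leftover

prefix = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(greedy_decode("010110111", prefix))    # ('abcd', ''): unambiguous

nonprefix = {"a": "0", "b": "01", "c": "011", "d": "111"}
print(greedy_decode("011", nonprefix))       # ('a', '11'): but '011' also encodes c
```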
Probability distribution
p(1)    H(1)    p(1)·H(1)    p(0)    H(0)    p(0)·H(0)
0.50    1.00    0.50         0.50    1.00    0.50
0.60    0.74    0.44         0.40    1.32    0.53
0.70    0.51    0.36         0.30    1.74    0.52
0.80    0.32    0.26         0.20    2.32    0.46
0.90    0.15    0.14         0.10    3.32    0.33
0.98    0.03    0.03         0.02    5.64    0.11
[Figure: self-entropy of the less probable and the more probable symbol as a function of the probability of the more probable symbol (50 % ... 98 %).]

Entropy of binary model
[Figure: total entropy of the binary model as a function of the probability of the more probable symbol (50 % ... 98 %).]
Static or adaptive model?
Static:
+ No side information
+ One pass over the data is enough
- Fails if the model is incorrect
Semi-adaptive:
+ Optimizes model to the input data
- Two passes over the image are needed
- Model must also be stored in the file
Adaptive:
+ Optimizes model to the input data
+ One pass over the data is enough
- Must have time to adapt to the data
Penalty of using wrong model
BINARY DATA

ESTIMATED MODEL: \tilde{p}_w, \tilde{p}_b
CORRECT MODEL: p_w, p_b

AVERAGE CODE LENGTH:
CL = -p_w \log_2 \tilde{p}_w - p_b \log_2 \tilde{p}_b

INEFFICIENCY:
d = CL - H = p_w \log_2 (p_w / \tilde{p}_w) + p_b \log_2 (p_b / \tilde{p}_b)
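A minimal numeric sketch of these formulas (the probability values are hypothetical; the estimated \tilde{p} is written q in the code):

```python
from math import log2

def penalty(pw, qw):
    """Extra bits per symbol when a source with correct white probability pw
    is coded with a model that assumes qw instead."""
    pb, qb = 1 - pw, 1 - qw
    cl = -pw * log2(qw) - pb * log2(qb)   # average code length, wrong model
    h  = -pw * log2(pw) - pb * log2(pb)   # entropy = code length, correct model
    return cl - h                         # d = pw*log2(pw/qw) + pb*log2(pb/qb)

print(penalty(pw=0.9, qw=0.5))            # ~0.53 bits/symbol wasted
```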
Context model - example
PIXEL ABOVE:
Context   Pixel   Freq   p       H      f·H
white     white   48     86 %    0.22   10.56
white     black   8      14 %    2.84   22.72
black     black   8      100 %   0.00   0.00
black     white   0      0 %     ∞      0.00

PIXEL TO LEFT:
Context   Pixel   Freq   p       H      f·H
white     white   56     98 %    0.03   1.68
white     black   1      2 %     5.64   5.64
black     white   0      0 %     ∞      0.00
black     black   7      100 %   0.00   0.00
Summary of context model example
NO CONTEXT:
fw = 56, fb = 8,
pw = 87.5 %, pb = 12.5 %
Total bit rate = 10.79 + 24 = 34.79
Entropy = 34.79 / 64 = 0.544 bpp
PIXEL ABOVE:
Total bit rate = 33.28
Entropy = 33.28 / 64 = 0.520 bpp
PIXEL TO LEFT:
Total bit rate = 7.32
Entropy = 7.32 / 64 = 0.114 bpp
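A minimal sketch of the same computation, assuming the (context, pixel) frequencies read from the "pixel above" table; exact logarithms are used, so the total differs slightly from the rounded table values (about 33.1 vs. 33.28 bits):

```python
from math import log2

def total_bits(freq_per_context):
    """Sum f * -log2(p(pixel | context)) over all contexts and pixels."""
    bits = 0.0
    for counts in freq_per_context.values():   # one frequency table per context
        n = sum(counts.values())
        for f in counts.values():
            if f:
                bits += f * -log2(f / n)
    return bits

above = {"white": {"white": 48, "black": 8},
         "black": {"white": 0, "black": 8}}
bits = total_bits(above)
print(bits, bits / 64)    # ~33.1 bits, ~0.52 bpp
```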
Example of using wrong model

MODEL (pixel to left):
Context   Pixel   Freq   p       H
white     white   56     98 %    0.03
white     black   1      2 %     5.64
black     white   0      0 %     ∞
black     black   7      100 %   0.00

INPUT:
Context   Pixel   Freq   H      f·H
white     white   48     0.03   1.44
white     black   8      5.64   45.12
black     white   8      ∞      ∞
black     black   0      0.00   0.00

RESULT:
Total bit rate = 1.44 + 45.12 + ∞ = ∞ bits

A symbol to which the model assigns zero probability occurs in the input, so its code length is unbounded: coding with the wrong model fails completely.
Predictive modeling
[Figure: original image, prediction residuals, histogram of the original image, and histogram of the residual values.]
Predictive modeling - example
[Figure: prediction residuals when predicting by the pixel above and by the pixel to the left.]
Predictive modeling - statistics
Without prediction:
Freq   p        H      f·H
56     87.5 %   0.19   10.64
8      12.5 %   3.00   24.00
Total bit rate = 34.64

Predicted by pixel above:
Freq   p      H      f·H
48     75 %   0.42   20.16
16     25 %   2.00   32.00
Total bit rate = 20.16 + 32.00 = 52.16

Predicted by pixel to the left:
Freq   p      H      f·H
63     98 %   0.03   1.89
1      2 %    5.64   5.64
Total bit rate = 7.53
Coding
Huffman coding
- David Huffman, 1952
- Prefix code
- Bottom-up algorithm for construction of the code tree
- Optimal when the symbol probabilities are of the form 2^{-n}
Arithmetic coding
- Rissanen, 1976
- General: applies to any source
- Suitable for dynamic models (no explicit code table)
- Optimal for any probability model
- The whole input file is coded as one code word
Golomb codes
- Solomon Golomb, 1966
- Sub-optimal but fast prefix code.
- Fast to construct (bitwise operations only)
- Code can be calculated ad hoc: no explicit code table
- Applies only to a predefined probability distribution
Huffman coding
Constructing the Huffman tree: leaf nodes
[Figure: leaf nodes a (0.05), b (0.05), c (0.1), d (0.2), e (0.3), f (0.2), g (0.1).]
Combining nodes at the 1st step
[Figure: the two least probable nodes, a (0.05) and b (0.05), are combined into a new node of weight 0.1.]
The complete Huffman tree
[Figure: the complete Huffman tree; the root has weight 1.0 with internal nodes 0.6, 0.4, 0.3, 0.2 and 0.1, each left branch is labeled 0 and each right branch 1, and the leaves a...g hang at depths matching the codeword lengths below.]
Symbols, their probabilities, and the corresponding Huffman codes
Symbol        a      b      c     d     e     f     g
Probability   0.05   0.05   0.1   0.2   0.3   0.2   0.1
Codeword      0000   0001   001   01    10    110   111
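The bottom-up construction is easy to express with a priority queue. A minimal sketch; ties between equal weights are broken arbitrarily, so the individual codewords may differ from the table, but the average code length (2.6 bits) is the same:

```python
import heapq

def huffman_codes(probs):
    # Heap of (weight, tie-breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(p, i, s) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)      # the two least probable nodes...
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next_id, (t1, t2)))   # ...are combined
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: branch 0 / branch 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix             # leaf: record the codeword
    walk(heap[0][2], "")
    return codes

p = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
print(huffman_codes(p))   # an optimal code with average length 2.6 bits
```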
Arithmetic coding
Interval [0,1] divided into 8 parts
[Figure: the interval [0,1] divided into 8 equal parts of length 0.125, labeled 000 ... 111 from bottom to top; the two halves carry the 1-bit codes 0 and 1, and the four quarters the 2-bit codes 00, 01, 10, 11.]
Coding of any interval takes \lceil -\log_2 A \rceil bits, where A is the length of the interval.
Arithmetic coding divides the interval according to the probability distribution.
The lengths of the subintervals sum up to 1.
Arithmetic coding of sequence 'aab'
[Figure: coding 'aab' with p(a) = 0.7, p(b) = 0.2, p(c) = 0.1; the interval [0,1] narrows to [0, 0.7) after 'a', to [0, 0.49) after 'aa', and to [0.343, 0.441) after 'aab'.]
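A minimal sketch of the narrowing, assuming the probabilities p(a) = 0.7, p(b) = 0.2, p(c) = 0.1 read from the figure:

```python
# Each symbol selects its own slice of the current interval.
cum = {"a": (0.0, 0.7), "b": (0.7, 0.9), "c": (0.9, 1.0)}
low, high = 0.0, 1.0
for s in "aab":
    rng = high - low
    low, high = low + rng * cum[s][0], low + rng * cum[s][1]
    print(f"after '{s}': [{low:.3f}, {high:.3f})")
# after 'a': [0.000, 0.700); after 'aa': [0.000, 0.490); after 'aab': [0.343, 0.441)
```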
The length of the final interval
A_{final} = p_1 \cdot p_2 \cdot p_3 \cdots p_n = \prod_{i=1}^{n} p_i

This interval can be coded by

C_A = -\log_2 \prod_{i=1}^{n} p_i = -\sum_{i=1}^{n} \log_2 p_i

The code length can be described with respect to the source alphabet:

C_A = -\sum_{i=1}^{m} p_i \log_2 p_i
Optimality of arithmetic coding
The length of the final interval is not exactly a power of ½, as was assumed. The final
interval, however, can be approximated by any of its subintervals that meets the requirement
A' = 2^{-n}. The approximation can thus be bounded by

A/2 \le A' \le A    (2.12)

yielding the upper bound of the code length:

C_A = -\log_2 (A/2) = -\log_2 A + 1 = H + 1    (2.13)

The upper bound for the coding deficiency is thus 1 bit for the entire file. For comparison,
the corresponding bound of Huffman coding holds per symbol (p_{s1} denotes the probability
of the most probable symbol):

C \le H + p_{s1} + \log_2 \frac{2 \log_2 e}{e} = H + p_{s1} + 0.086 (bits per symbol)    (2.14)
Implementation aspects
Half point zooming
[Figure: half-point zooming; the interval [0.2, 0.3], lying entirely below the half point 0.5, is rescaled to [0.4, 0.6] by doubling.]
When the interval still contains the half point, no bit can be output yet. It is, however,
known that the following bit stream is either "01xxx", if the final interval is below the half
point, or "10xxx", if the final interval is above the half point (here xxx refers to the rest of
the code stream). In general, it can be shown that if the next bit due to a half-point zooming
is b, it is followed by as many opposite bits of b as there were quarter-point zoomings before
that half-point zooming.
Quarter-point zooming
[Figure: quarter-point zooming; an interval straddling the half point, e.g. [0.4, 0.6], is rescaled to [0.3, 0.7] and further to [0.1, 0.9] by repeatedly mapping [0.25, 0.75] onto [0, 1]; each zooming increments the count of pending bits.]
Arithmetic coding
/* Initialize lower and upper bounds, and the pending-bit counter */
low ← 0; high ← 1; buffer ← 0;

/* Calculate cumulative probabilities */
cum[0] ← 0;
FOR i ← 1 TO k DO
    cum[i] ← cum[i-1] + p_i;

WHILE <symbols left> DO
    /* Select the interval corresponding to symbol c */
    c ← READ(input);
    range ← high - low;
    high ← low + range * cum[c];
    low ← low + range * cum[c-1];

    /* Half-point zooming: lower half */
    WHILE high < 0.5 DO
        high ← 2 * high;
        low ← 2 * low;
        WRITE(0);
        FOR buffer TIMES DO WRITE(1);
        buffer ← 0;

    /* Half-point zooming: upper half */
    WHILE low > 0.5 DO
        high ← 2 * (high - 0.5);
        low ← 2 * (low - 0.5);
        WRITE(1);
        FOR buffer TIMES DO WRITE(0);
        buffer ← 0;

    /* Quarter-point zooming */
    WHILE (low > 0.25) AND (high < 0.75) DO
        high ← 2 * (high - 0.25);
        low ← 2 * (low - 0.25);
        buffer ← buffer + 1;
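A runnable Python version of the same encoder, as a sketch: floating point is used for clarity, whereas a practical coder works with fixed-point integers to avoid precision loss. The termination step (one extra buffered bit that pins down the final interval) is an assumption in the common Witten-Neal-Cleary style; it is not spelled out in the pseudocode above.

```python
def arith_encode(symbols, probs):
    # Cumulative probability interval for each symbol.
    cum, c = {}, 0.0
    for s, p in probs.items():
        cum[s] = (c, c + p)
        c += p

    low, high, out = 0.0, 1.0, []
    buffer = 0                                # pending quarter-zoom bits

    def emit(bit):
        nonlocal buffer
        out.append(bit)
        out.extend([1 - bit] * buffer)        # opposite bits follow, as noted above
        buffer = 0

    for s in symbols:
        rng = high - low
        low, high = low + rng * cum[s][0], low + rng * cum[s][1]
        while True:
            if high < 0.5:                    # half-point zooming: lower half
                low, high = 2 * low, 2 * high
                emit(0)
            elif low > 0.5:                   # half-point zooming: upper half
                low, high = 2 * (low - 0.5), 2 * (high - 0.5)
                emit(1)
            elif low > 0.25 and high < 0.75:  # quarter-point zooming
                low, high = 2 * (low - 0.25), 2 * (high - 0.25)
                buffer += 1
            else:
                break

    buffer += 1                               # terminate with one more buffered bit
    emit(0 if low < 0.25 else 1)              # identifies the final interval
    return out

print(arith_encode("aab", {"a": 0.7, "b": 0.2, "c": 0.1}))   # [0, 1, 1, 0]
```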
QM-coder
QM-coder:
- The input alphabet of the QM-coder must be binary.
- To gain speed, all multiplications in the QM-coder have been eliminated.
- The QM-coder includes its own modelling procedures.

For a multi-symbol alphabet:
- A multi-symbol source can still be used.
- Code each number one bit at a time, using a binary decision tree.
- The probability of a symbol is the product of the probabilities of the node decisions.
In the QM-coder the multiplication operations have been replaced by fast approximations or by
shift-left operations in the following way. Denote the more probable symbol of the model by
MPS, and the less probable symbol by LPS; in other words, the MPS is always the symbol
with the higher probability. The interval in the QM-coder is always divided so that the LPS
subinterval lies above the MPS subinterval. If the interval is A and the LPS probability estimate
is Qe, the MPS probability estimate should ideally be (1-Qe). The lengths of the respective
subintervals are then A·Qe and A·(1-Qe). This ideal subdivision and symbol ordering is
shown in Figure 2.11.
Illustration of symbol ordering and ideal interval subdivision
[Figure 2.11: the current interval of length A starts at base C; the MPS subinterval of length A·(1-Qe) lies at the bottom, [C, C + A·(1-Qe)), and the LPS subinterval of length A·Qe on top, [C + A·(1-Qe), C + A).]
- Instead of operating in the scale [0, 1], the QM-coder operates in the scale [0, 1.5].
- Zooming (or renormalization, as it is called in the QM-coder) is performed every time the
  length of the interval gets below half of the scale, i.e. 0.75.
- The interval length is therefore always in the range 0.75 \le A \le 1.5, and the following
  approximation is made:

A \approx 1 \implies A \cdot Q_e \approx Q_e    (2.15)

After MPS:

C is unchanged
A \leftarrow A \cdot (1 - Q_e) = A - A \cdot Q_e \approx A - Q_e    (2.16)

After LPS:

C \leftarrow C + A \cdot (1 - Q_e) = C + A - A \cdot Q_e \approx C + A - Q_e
A \leftarrow A \cdot Q_e \approx Q_e    (2.17)

- All multiplications are eliminated!
- The renormalization involves only multiplications by 2, i.e. bit-shifting operations.
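A minimal sketch of the approximate update, assuming Qe is the current LPS probability estimate (the state-machine update of Qe and the carry handling of the code stream are omitted):

```python
def qm_update(C, A, Qe, is_mps):
    if is_mps:
        A = A - Qe             # A*(1-Qe) approximated by A - Qe
    else:
        C = C + (A - Qe)       # LPS subinterval lies above the MPS one
        A = Qe                 # A*Qe approximated by Qe
    while A < 0.75:            # renormalization: doublings only (bit shifts)
        A *= 2
        C *= 2
    return C, A
```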
Differences between the optimal arithmetic coding and QM-coder

Compression with arithmetic coding:
- Modelling: processing the image → determining the context → determining the probability distribution → updating the model
- Coding: arithmetic coding

Compression with QM-coder:
- Modelling: processing the image → determining the context
- QM-coder: determining the probability distribution → arithmetic coding → updating the model
Finite-state automaton in QM-coder
[Figure: the probability estimation automaton; states 1-12 are laid out on a logarithmic LPS-probability axis (from 1 down to 0.00001), with a zero-state and its mirrored state, fast-attack states, transient and non-transient states, and separate MPS and LPS transitions between the states.]
Golomb-Rice codes
Properties:
- Prefix codes
- Suboptimal
- Easy to implement
- Symbols are arranged in descending probability order

How to do it:
To code integer x, divide it into two components:

x_M = \lfloor x / m \rfloor
x_L = x \bmod m

Inverse transform:

x = x_M \cdot m + x_L

How to code the two parts:
- x_M is output using a unary code
- x_L is output using an adjusted binary code

Rice coding:
- Rice coding is a special case of Golomb coding in which m = 2^k.
- Simpler to implement, as
  o x_M := x >> k
  o x_L := x AND 00...011...11_2 (masking out all but the k lowest bits)
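A minimal sketch of Rice coding, following the conventions of the examples below (unary part: x_M ones terminated by a zero; binary part: the k lowest bits of x):

```python
def rice_encode(x, k):
    x_m = x >> k                      # x_M = x div 2^k
    x_l = x & ((1 << k) - 1)          # x_L = k lowest bits of x
    unary = "1" * x_m + "0"
    binary = format(x_l, "b").zfill(k) if k > 0 else ""
    return unary + binary

for x in range(9):
    print(x, rice_encode(x, 2))       # reproduces the m = 4 example below
```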
Probability distribution function assumed by Golomb and Rice codes
[Figure: a geometrically decaying probability distribution p(x) over x = 0, 1, 2, ...]
An example of the Golomb coding with the parameter m=4
x            0    1    2    3    4    5    6    7    8    ...
x_M          0    0    0    0    1    1    1    1    2    ...
x_L          0    1    2    3    0    1    2    3    0    ...
Code of x_M  0    0    0    0    10   10   10   10   110  ...
Code of x_L  00   01   10   11   00   01   10   11   00   ...
Golomb and Rice codes for the parameters m=1 to 5
x    m=1 (k=0)    m=2 (k=1)    m=3      m=4 (k=2)    m=5
0    0            00           00       000          000
1    10           01           010      001          001
2    110          100          011      010          010
3    1110         101          100      011          0110
4    11110        1100         1010     1000         0111
5    111110       1101         1011     1001         1000
6    1111110      11100        1100     1010         1001
7    11111110     11101        11010    1011         1010
8    111111110    111100       11011    11000        10110
9    1111111110   111101       11100    11001        10111
...

For m = 2^k the Golomb code coincides with the Rice code with parameter k.
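For m that is not a power of two, the "adjusted binary" part can be read from the table above: with b = \lceil \log_2 m \rceil and u = 2^b - m, the first u remainders get b-1 bits and the remaining ones get b bits (a truncated binary code). A minimal sketch of the general Golomb code under this assumption, reproducing the m = 3 and m = 5 columns:

```python
from math import ceil, log2

def golomb_encode(x, m):
    x_m, x_l = divmod(x, m)           # unary part and remainder
    unary = "1" * x_m + "0"
    if m == 1:
        return unary                  # m = 1 degenerates to pure unary code
    b = ceil(log2(m))
    u = 2 ** b - m
    if x_l < u:                       # first u remainders: b-1 bits
        return unary + format(x_l, "b").zfill(b - 1)
    return unary + format(x_l + u, "b").zfill(b)   # the rest: b bits

for x in range(10):
    print(x, golomb_encode(x, 3), golomb_encode(x, 5))
```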