
ELEC 515
Information Theory
Distortionless Source Coding
1
Source Coding
[Block diagram: source encoder mapping source symbols to codewords over the output alphabet Y = {y1, …, yJ}, with codeword lengths l1, …, lK]
2
Typical Sequences
• Consider the following discrete memoryless
binary source:
p(1) = 1/4
p(0) = 3/4
• Sequences of 20 symbols
1. 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
2. 1,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1
3. 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3
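As an aside (an illustrative Python sketch, not part of the original slides), the probability of each of the three 20-symbol sequences above can be computed and compared against H(X) ≈ 0.811 bit. The sequence with roughly Np ones is the one whose per-symbol log-probability is close to the entropy.

from math import log2

p1, p0 = 0.25, 0.75                       # p(1) = 1/4, p(0) = 3/4
H = -(p1 * log2(p1) + p0 * log2(p0))      # source entropy, about 0.811 bit

seqs = {
    "1. all ones":  [1] * 20,
    "2. mixed":     [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
    "3. all zeros": [0] * 20,
}

for name, s in seqs.items():
    prob = p1 ** s.count(1) * p0 ** s.count(0)
    print(f"{name:12s} p(x) = {prob:.2e}   -(1/N)log2 p(x) = {-log2(prob) / len(s):.3f}   H(X) = {H:.3f}")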
Tchebycheff Inequality
For a random variable X with mean μ and variance σ²,
P(|X − μ| ≥ ε) ≤ σ²/ε²  for any ε > 0
4
Weak Law of Large Numbers
• Sequence of N i.i.d. RVs X1, X2, …, XN, each with mean μ and variance σ²
• Define a new RV, the sample average Y = (1/N) ∑_{n=1}^{N} Xn
5
Weak Law of Large Numbers
The average approaches the statistical mean:
P(|Y − E[X]| ≥ ε) ≤ σ²/(Nε²) → 0 as N → ∞
6
Asymptotic Equipartition Property
• N i.i.d. random variables X1, …, XN
p(X1, X2, …, XN) = p(X1) p(X2) ⋯ p(XN)

−(1/N) log p(X1, X2, …, XN) = −(1/N) ∑_{n=1}^{N} log p(Xn) → −E[log p(X)] = H(X)

as N → ∞
7
Typical Sequences
• RV X where p(xk) = pk
• Consider a sequence x where xk appears
Npk times
p(x) = p1^(Np1) p2^(Np2) ⋯ pK^(NpK)
     = ∏_{k=1}^{K} pk^(Npk)
     = ∏_{k=1}^{K} (2^(log2 pk))^(Npk)
     = ∏_{k=1}^{K} 2^(Npk log2 pk)
     = 2^(N ∑_{k=1}^{K} pk log2 pk)
     = 2^(−NH(X))
8
Summary
• The Tchebycheff inequality was used to prove the weak
law of large numbers (WLLN)
– the sample average approaches the statistical mean as N → ∞
• The WLLN was used to prove the AEP
−(1/N) ∑_{n=1}^{N} log p(Xn) → H(X) as N → ∞
• A typical sequence has probability p(x1, x2, …, xN) ≈ 2^(−NH(X))
• There are about 2^(NH(X)) typical sequences of length N
9
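A minimal simulation sketch (illustrative only, not from the slides) of the statement above: draw N i.i.d. symbols and check that the sample average of −log2 p(Xn) approaches H(X) as N grows.

import random
from math import log2

probs = {1: 0.25, 0: 0.75}
H = -sum(q * log2(q) for q in probs.values())   # about 0.811 bit

random.seed(1)
for N in (10, 100, 1000, 10000):
    sample = random.choices(list(probs), weights=list(probs.values()), k=N)
    avg = -sum(log2(probs[x]) for x in sample) / N
    print(f"N = {N:6d}   -(1/N) sum log2 p(Xn) = {avg:.3f}   H(X) = {H:.3f}")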
Typical Sequences
10
Typical Sequences
[Figure: the set of all sequences of length N, with the set of typical sequences as a subset; sequences outside this subset are nontypical (atypical)]
11
Interpretation
• Although there are very many results that may be
produced by a random process, the one actually
produced is most probably from a set of
outcomes that all have approximately the same
chance of being the one actually realized.
• Although there are individual outcomes which
may have a higher probability than outcomes in
this set, the vast number of outcomes in the set
almost guarantees that the outcome will come
from the set.
• "Almost all events are almost equally surprising"
– Cover and Thomas
12
Typical Sequences
• From the definition, the probability of
occurrence of a typical sequence x satisfies
2^(−N[H(X)+δ]) ≤ p(x) ≤ 2^(−N[H(X)−δ])
13
Example
• p(x1) = 1/4, p(x2) = 3/4
• H(X) = 0.811 bit
• N = 3
• p(x1,x1,x1) = 1/64
• p(x1,x1,x2) = p(x1,x2,x1) = p(x2,x1,x1) = 3/64
• p(x1,x2,x2) = p(x2,x2,x1) = p(x2,x1,x2) = 9/64
• p(x2,x2,x2) = 27/64
14
Example
• If δ = 0.2 the typical sequences are
– (x1,x2,x2), (x2,x1,x2), (x2,x2,x1)
with probability 0.422
(1,0,0), (0,1,0), (0,0,1)
• If δ = 0.4 the typical sequences are
– (x1,x2,x2), (x2,x1,x2), (x2,x2,x1), (x2,x2,x2)
with probability 0.844
(1,0,0), (0,1,0), (0,0,1), (0,0,0)
15
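The δ-typicality condition used on the previous slide can be checked directly. The sketch below (illustrative, not from the slides) enumerates all length-3 sequences, keeps those with |−(1/N) log2 p(x) − H(X)| ≤ δ, and reproduces the probabilities 0.422 and 0.844.

from itertools import product
from math import log2

p = {"x1": 0.25, "x2": 0.75}
H = -sum(q * log2(q) for q in p.values())        # about 0.811 bit
N = 3

for delta in (0.2, 0.4):
    typical, total = [], 0.0
    for seq in product(p, repeat=N):
        prob = 1.0
        for s in seq:
            prob *= p[s]
        if abs(-log2(prob) / N - H) <= delta:    # typicality test
            typical.append(seq)
            total += prob
    print(f"delta = {delta}: {len(typical)} typical sequences, total probability {total:.3f}")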
16
Typical Sequences
• Random variable X
• Alphabet size K
• Entropy H(X)
• Arbitrary number δ > 0
• Sequences x of blocklength N ≥ N0 and probability p(x)
• |T_X(δ)| + |T_X^c(δ)| = K^N, where T_X(δ) is the set of typical sequences and T_X^c(δ) the set of atypical sequences
17
Shannon-McMillan Theorem
For any ε > 0 and δ > 0 there exists an N0 such that, for all blocklengths N ≥ N0, the probability that a source sequence is not typical satisfies P(x ∉ T_X(δ)) < ε.
18
• The essence of source coding or data compression is
that as N → ∞, atypical sequences almost never
appear as the output of the source.
• Therefore, one can focus on representing typical
sequences with codewords and ignore atypical
sequences.
• Since there are only 2^(NH(X)) typical sequences of length
N, and they are approximately equiprobable, it takes
about NH(X) bits to represent them.
• On average it takes H(X) bits per source output
symbol.
19
Variable Length Codes
[Block diagram: source encoder producing variable-length codewords over the output alphabet Y = {y1, …, yJ}, with codeword lengths l1, …, lK]
20
21
Variable Length Codes
• Prefix code (also prefix-free or instantaneous)
C1={0,10,110,111}
• Uniquely decodable code (which is not prefix)
C2={0,01,011,0111}
• Non-singular code (which is not uniquely
decodable)
C3={0,1,00,11}
• Singular code
C4={0,10,11,10}
22
Instantaneous Codes
• Definition:
A uniquely decodable code is said to be
instantaneous if it is possible to decode each
codeword in a sequence without reference to
succeeding codewords.
A necessary and sufficient condition for a code to be
instantaneous is that no codeword is a prefix of
some other codeword.
23
24
Average Codeword Length
L(C) = ∑_{k=1}^{K} p(xk) lk
25
Two Binary Prefix Codes
• Five source symbols: x1, x2, x3, x4, x5
• K = 5, J = 2
• c1 = 0, c2 = 10, c3 = 110, c4 = 1110, c5 = 1111
– codeword lengths 1,2,3,4,4
• c1 = 00, c2 = 01, c3 = 10, c4 = 110, c5 = 111
– codeword lengths 2,2,2,3,3
26
Kraft Inequality for Prefix Codes
A prefix code with codeword lengths l1, l2, …, lK over a code alphabet of J symbols exists if and only if
∑_{k=1}^{K} J^(−lk) ≤ 1
27
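As a quick check (an illustrative helper, not from the slides), the Kraft sums for the two binary prefix codes of the five-symbol example both equal 1, so both sets of lengths are admissible.

def kraft_sum(lengths, J=2):
    """Return the sum of J**(-l) over the codeword lengths."""
    return sum(J ** (-l) for l in lengths)

for lengths in ([1, 2, 3, 4, 4], [2, 2, 2, 3, 3]):
    s = kraft_sum(lengths)
    print(lengths, "Kraft sum =", s, "-> prefix code exists" if s <= 1 else "-> impossible")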
Code Tree
28
Five Binary Codes
[Table: codewords assigned to source symbols x1–x4 by five binary codes A–E; the individual codewords are not reproduced here]
29
Ternary Code Example
• Ten source symbols: x1, x2, …, x9, x10
• K = 10, J = 3
• lk = 1,2,2,2,2,2,3,3,3,3
• lk = 1,2,2,2,2,2,3,3,3
• lk = 1,2,2,2,2,2,3,3,4,4
30
Variable Length Codes
[Block diagram: source encoder producing variable-length codewords over the output alphabet Y = {y1, …, yJ}, with codeword lengths l1, …, lK]
31
Average Codeword Length Bound
L(C) ≥ H(X) / log_b J
32
Four Symbol Source
• p(x1) = 1/2, p(x3) = 1/4, p(x2) = p(x4) = 1/8
• H(X) = 1.75 bits
Code 1: x1 → 0, x2 → 110, x3 → 10, x4 → 111; L(C) = 1.75 bits
Code 2: x1 → 00, x2 → 01, x3 → 10, x4 → 11; L(C) = 2 bits
33
Code Efficiency
ζ = H(X) / (L(C) log_b J)
• First code
ζ = 1.75/1.75 = 100%
• Second code ζ = 1.75/2.0 = 87.5%
34
Compact Codes
• A code C is called compact for a source X if its
average codeword length L(C) is less than or
equal to the average length of all other
uniquely decodable codes for the same source
and alphabet J.
35
Upper and Lower Bounds
H(X) / log_b J ≤ L(C) < H(X) / log_b J + 1
36
The Shannon Algorithm
• Order the symbols from largest to smallest
probability
• Choose the codeword lengths according to
lk = ⌈−log_J p(xk)⌉
• Construct the codewords according to the
cumulative probability Pk
Pk = ∑_{i=1}^{k−1} p(xi)
37
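A compact binary (J = 2) sketch of the algorithm above (an illustrative implementation, not the course's reference code): lengths lk = ⌈−log2 p(xk)⌉, codewords taken from the first lk bits of the binary expansion of the cumulative probability Pk. The probabilities are assumed to be sorted in decreasing order.

from math import ceil, log2

def shannon_code(probs):
    """probs: list of symbol probabilities, assumed sorted in decreasing order."""
    code, P = [], 0.0
    for p in probs:
        l = ceil(-log2(p))
        # first l bits of the binary expansion of the cumulative probability P
        bits, frac = "", P
        for _ in range(l):
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        code.append(bits)
        P += p
    return code

# The four-symbol source used in the later example: p = .4, .3, .2, .1
print(shannon_code([0.4, 0.3, 0.2, 0.1]))   # expected ['00', '01', '101', '1110']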
Example
• K = 10, J = 2
• p(x1) = p(x2) = 1/4
• p(x3) = p(x4) = 1/8
• p(x5) = p(x6) = 1/16
• p(x7) = p(x8) = p(x9) = p(x10) = 1/32
38
[Table: cumulative probabilities Pk and the resulting Shannon codewords for this source]
39
Shannon Algorithm
• p(x1) = .4 p(x2) = .3 p(x3) = .2 p(x4) = .1
• H(X) = 1.85 bits
Shannon Code: x1 → 00, x2 → 01, x3 → 101, x4 → 1110
L(C) = 2.4 bits, ζ = 77.1%
Alternate Code: x1 → 0, x2 → 10, x3 → 110, x4 → 111
L(C) = 1.9 bits, ζ = 97.4%
40
Shannon’s Noiseless Coding Theorem
By encoding blocks of N source symbols, the average codeword length per source symbol can be made arbitrarily close to the lower bound:
H(X) / log_b J ≤ L(C)/N < H(X) / log_b J + 1/N
where L(C) is the average codeword length for a block of N source symbols.
41
Robert M. Fano (1917-2016)
42
The Fano Algorithm
• Arrange the symbols in order of decreasing
probability
• Divide the symbols into J approximately
equally probable groups
• Each group receives one of the J code symbols
as the first symbol
• This division process is repeated within the
groups as many times as possible
43
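One possible rendering of the binary (J = 2) Fano procedure above in Python (illustrative only; it assumes the probabilities are already sorted in decreasing order):

def fano_code(probs, prefix=""):
    """probs: list of probabilities sorted in decreasing order; returns codewords in the same order."""
    if len(probs) == 1:
        return [prefix]
    if len(probs) == 2:
        return [prefix + "0", prefix + "1"]
    # find the split point that makes the two groups as equiprobable as possible
    total, running, split, best_diff = sum(probs), 0.0, 1, float("inf")
    for i in range(1, len(probs)):
        running += probs[i - 1]
        diff = abs(2 * running - total)     # |top group - bottom group|
        if diff < best_diff:
            best_diff, split = diff, i
    return fano_code(probs[:split], prefix + "0") + fano_code(probs[split:], prefix + "1")

print(fano_code([0.4, 0.3, 0.2, 0.1]))   # expected ['0', '10', '110', '111']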
David Huffman (1925-1999)
44
• "It was the most singular moment in my life.
There was the absolute lightning of sudden
realization."
– David Huffman
• "Is that all there is to it!"
– Robert Fano
45
The Binary Huffman Algorithm
1. Arrange the symbols in order of decreasing probability.
2. Assign a 1 to the last digit of the Kth codeword cK and a 0
to the last digit of the (K−1)th codeword cK−1. Note that
this assignment is arbitrary.
3. Form a new source X′ with x′k = xk, k = 1, …, K−2, and
p(x′K−1) = p(xK−1) + p(xK)
x′K−1 = xK−1 ∪ xK
4. Rearrange the new set of probabilities, set K = K-1.
5. Repeat Steps 2 to 4 until all symbols have been
combined.
To obtain the codewords, trace back to the original
symbols.
46
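One way to implement the procedure above is with a priority queue instead of explicit re-sorting; the sketch below (illustrative, not the course's reference code) produces a Huffman code with the same average length.

import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability; returns dict symbol -> binary codeword."""
    tiebreak = count()
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)     # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

# Five-symbol source from the next slide
probs = {"x1": .35, "x2": .22, "x3": .18, "x4": .15, "x5": .10}
code = huffman_code(probs)
L = sum(p * len(code[s]) for s, p in probs.items())
print(code, " L(C) =", L)    # L(C) = 2.25 bits for this source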
Five Symbol Source
• p(x1)=.35 p(x2)=.22 p(x3)=.18 p(x4)=.15 p(x5)=.10
• H(X) = 2.2 bits
47
48
Huffman Code for the English Alphabet
49
Six Symbol Source
• p(x1)=.4 p(x2)=.3 p(x3)=.1 p(x4)=.1 p(x5)=.06
p(x6)=.04
• H(X) = 2.1435 bits
50
Second Five Symbol Source
• p(x1)=.4 p(x2)=.2 p(x3)=.2 p(x4)=.1 p(x5)=.1
• H(X) = 2.1219 bits
51
Two Huffman Codes
[Figure: two Huffman tree constructions for the source x1, …, x5, yielding codes C1 and C2]
52
Two Huffman Codes
Symbol  C1     C2
x1      0      11
x2      10     01
x3      111    00
x4      1101   101
x5      1100   100
53
Second Five Symbol Source
• p(x1)=.4 p(x2)=.2 p(x3)=.2 p(x4)=.1 p(x5)=.1
• H(X) = 2.1219 bits
L(C) = 2.2 bits
• variance of code C1
0.4(1−2.2)² + 0.2(2−2.2)² + 0.2(3−2.2)² + 0.2(4−2.2)² = 1.36
• variance of code C2
0.8(2−2.2)² + 0.2(3−2.2)² = 0.16
• which code is preferable?
54
Nonbinary Codes
• The Huffman algorithm for nonbinary codes
(J>2) follows the same procedure as for binary
codes except that J symbols are combined at
each stage.
• This requires that the number of symbols in
the source X is K′ = J + c(J−1) for some integer c ≥ 0, with K′ ≥ K
55
Nonbinary Example
• J=3 K=6
• Require K’=J+c(J-1) → c=2 so K’=7
• Add an extra symbol x7 with p(x7)=0
• p(x1)=1/3 p(x2)=1/6 p(x3)=1/6 p(x4)=1/9 p(x5)=1/9
p(x6)=1/9 p(x7)=0
• H(X) = 1.54 trits
56
Codes for Different Output Alphabets
• K=13
• p(x1)=1/4 p(x2)=1/4
p(x3)=1/16 p(x4)=1/16 p(x5)=1/16 p(x6)=1/16
p(x7)=1/16 p(x8)=1/16 p(x9)=1/16
p(x10)=1/64 p(x11)=1/64 p(x12)=1/64 p(x13)=1/64
• J=2 to 13
57
Codes for Different Output Alphabets
[Table: codewords for symbols x1–x13 and average codeword lengths L(C) for each output alphabet size J from 2 to 13; the individual entries are not reproduced here]
58
Code Efficiency
[Figure: code efficiency ζ versus output alphabet size J]
59
The Binary and Quaternary Codes
Symbol   Binary code (J=2)   Quaternary code (J=4)
x1       00                  0
x2       01                  1
x3       1000                20
x4       1001                21
x5       1010                22
x6       1011                23
x7       1100                30
x8       1101                31
x9       1110                32
x10      111100              330
x11      111101              331
x12      111110              332
x13      111111              333
60
Huffman Codes
• Symbol probabilities must be known a priori
• The redundancy of the code L(C)-H(X) is
typically nonzero
• Error propagation can occur
• Codewords have variable length
61
Fixed Length Source Compaction Codes
Lengths l1=l2=…=lK=L
62
Fixed Length Source Compaction Codes
• If J^L < K^N we cannot uniquely encode all
sourcewords with length-L codewords
• Two questions
1. How small can JL be such that performance is
acceptable?
2. Which sourcewords should be encoded uniquely
with length L codewords?
63
The number of typical sequences satisfies
|T_X(δ)| < b^(N[H(X)+δ])
so encoding all typical sequences with length-L
codewords requires that
J^L ≥ b^(N[H(X)+δ])
64
• Although the set of atypical sequences may
be large, the Shannon-McMillan Theorem
ensures that their total probability can be made arbitrarily small
• Thus it is possible to encode sourcewords
with an arbitrarily small probability of error Pe
provided that
– L log_b J > NH(X)
– N is sufficiently large
65
Example
• K=J=2
• p(x1) = 0.1, p(x2) = 0.9, H(X) = 0.469 bit
• Choose N=4, L=3
R = L/N = 3/4 > H(X)
• Partition the 16 sourcewords into 7 typical
sequences and 9 atypical sequences
66
67
The Code
Typical Sequence   Codeword
x2x2x2x2           000
x1x2x2x2           100
x2x1x2x2           010
x2x2x1x2           001
x2x2x2x1           110
x1x1x2x2           101
x1x2x1x2           011
68
The Code
Atypical Sequence   Codeword
x1x2x2x1            111 0000
x2x1x1x2            111 1000
x2x1x2x1            111 0100
x2x2x1x1            111 0010
x1x1x1x2            111 0001
x1x1x2x1            111 1100
x1x2x1x1            111 1010
x2x1x1x1            111 1001
x1x1x1x1            111 0110
69
Code Rate
• The actual code rate is
R = (.9639 × 3 + .0361 × 7) / 4 = 3/4 + .0361 = .7861
70
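The numbers on this slide can be reproduced with a short sketch (illustrative only; it assumes the 7 most probable sourcewords form the typical set and that each atypical sourceword costs 3 + 4 = 7 bits, as in the code tables above):

from itertools import product

p = {"x1": 0.1, "x2": 0.9}
N, L = 4, 3

def prob(word):
    q = 1.0
    for s in word:
        q *= p[s]
    return q

# the 2^L - 1 = 7 most probable sourcewords are encoded uniquely ("typical")
words = sorted(product(p, repeat=N), key=prob, reverse=True)
typical = words[:2 ** L - 1]
P_typ = sum(prob(w) for w in typical)
# typical sourcewords cost L bits, the rest cost L + N bits (prefix 111 plus the raw word)
R = (P_typ * L + (1 - P_typ) * (L + N)) / N
print(f"P(typical) = {P_typ:.4f}   R = {R:.4f} bits/symbol")   # expected 0.9639 and 0.7861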
71
72
73
74
Fixed Length Source Compaction Codes
• If R>H(X), as N →∞ Pe→0
• If R<H(X), as N →∞ Pe→1
75
Variable to Fixed Length Codes
[Block diagram: variable-to-fixed length encoder mapping M sourcewords (M ≤ J^L) of lengths m1, m2, …, mM to codewords of fixed length L]
76
Variable to Fixed Length Codes
• Two questions:
1. What is the best mapping from sourcewords to
codewords?
2. How to ensure unique encodability?
77
Average Bit Rate
ABR = average codeword length / average sourceword length = L / L(S)

L(S) = E[sourceword length] = ∑_{i=1}^{M} p(si) mi

M – number of sourcewords
si – sourceword i
mi – length of sourceword i
78
Variable to Fixed Length Codes
• Design criteria: minimize the Average Bit Rate
ABR = L / L(S)
• ABR ≥ H(X) (vs. L(C) ≥ H(X) for fixed to
variable length codes)
• L(S) should be as large as possible so that the
ABR is close to H(X)
79
Variable to Fixed Length Codes
• Fixed to variable length codes
ζ = H(X) / L(C)
• Variable to fixed length codes
ζ = H(X) / ABR
80
Binary Tunstall Code K=3, L=3
Let x1 = a, x2 = b and x3 = c
Unused codeword is 111
81
Binary Tunstall Code Construction
• Source X with K symbols
• Choose a codeword length L where 2^L > K
1. Form a tree with a root and K branches labelled
with the symbols
2. If the number of leaves is greater than 2^L − (K−1),
go to Step 4
3. Find the leaf with the highest probability and
extend it to have K branches, go to Step 2
4. Assign codewords to the leaves
82
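A heap-based sketch of the construction above (illustrative, not the course's reference code; it assumes 2^L > K and keeps the leaves in a max-heap keyed by probability instead of an explicit tree):

import heapq
from itertools import count

def tunstall(probs, L):
    """probs: dict symbol -> probability; L: codeword length in bits. Returns the sourcewords."""
    K = len(probs)
    tiebreak = count()
    # each leaf is (-probability, tiebreak, sourceword string)
    leaves = [(-p, next(tiebreak), s) for s, p in probs.items()]
    heapq.heapify(leaves)
    # extend while the extension still fits within 2^L codewords (same test as Step 2 above)
    while len(leaves) + (K - 1) <= 2 ** L:
        negp, _, word = heapq.heappop(leaves)        # most probable leaf
        for s, p in probs.items():                   # extend it with K branches
            heapq.heappush(leaves, (negp * p, next(tiebreak), word + s))
    return sorted(word for _, _, word in leaves)

# K = 3, L = 3 example: p(a) = .7, p(b) = .2, p(c) = .1
print(tunstall({"a": 0.7, "b": 0.2, "c": 0.1}, 3))
# expected ['aaa', 'aab', 'aac', 'ab', 'ac', 'b', 'c'] (7 sourcewords, one codeword unused)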
K=3, L=3
p(a) = .7, p(b) = .2, p(c) = .1
83
ζ = H(X)/ABR = 84.7%
84
Tunstall Code for a Binary Source
• N = 3, K = 2, J = 2, p(x1) = 0.7, p(x2) = 0.3
• J^N = 8 codewords: 000, 001, 010, 011, 100, 101, 110, 111
Seven sourcewords: x1x1x1x1x1, x1x1x1x1x2, x1x1x1x2, x1x1x2, x1x2, x2x1, x2x2 (one codeword unused)
Eight sourcewords: x1x1x1x1x1, x1x1x1x1x2, x1x1x1x2, x1x1x2, x1x2x1, x1x2x2, x2x1, x2x2
85
Huffman Code for a Binary Source
• N = 3, K = 2, p(x1) = 0.7, p(x2) = 0.3
• Eight sourcewords
A = x1x1x1   p(A) = .343   00
B = x1x1x2   p(B) = .147   11
C = x1x2x1   p(C) = .147   010
D = x2x1x1   p(D) = .147   011
E = x2x2x1   p(E) = .063   1000
F = x2x1x2   p(F) = .063   1001
G = x1x2x2   p(G) = .063   1010
H = x2x2x2   p(H) = .027   1011
86
Code Comparison
• H(X) = .8813
• Tunstall Code (7 codewords)
ABR = .9762
ζ = 90.3%
• Tunstall Code (8 codewords)
ABR = .9138
ζ = 96.4%
• Huffman Code
L(C) = .9087
ζ = 97.0%
87
Error Propagation
• Received Huffman codeword sequence
00 11 00 11 00 11 …
A B A B A B
• Sequence with one bit error
011 1001 1001 1 …
D   F    F
88
Huffman Coding
• The length of Huffman codewords has to be an
integer number of symbols, while the self-information
of the source symbols is almost always a non-integer.
• Thus the theoretical minimum message compression
cannot always be achieved.
• For a binary source with p(x1) = .1 and p(x2) = .9
– An x1 symbol should be encoded to 3.32 bits and an x2 symbol
to .152 bits
– H(X) = .469 so the optimal average codeword length is
.469 bits.
89
Improving Huffman Coding
• One way to overcome the redundancy limitation is to
encode blocks of several symbols.
In this way the per-symbol inefficiency is spread over
an entire block.
– N = 1: ζ = 46.9%
N = 2: ζ = 72.7%
• However, using blocks is difficult to implement as
there is a block for every possible combination of
symbols, so the number of blocks increases
exponentially with their length.
90
Peter Elias (1923 – 2001)
91
Arithmetic Coding
• Arithmetic coding bypasses the idea of
replacing an input symbol (or groups of
symbols) with a specific codeword.
• Instead, a stream of input symbols is replaced
with a single floating-point number in [0,1).
• Useful when dealing with sources with small
alphabets, such as binary sources, and
alphabets with highly skewed probabilities.
92
Arithmetic Coding Applications
• JPEG, MPEG-1, MPEG-2
– Huffman and arithmetic coding
• JPEG2000, MPEG-4
– Arithmetic coding only
• ZIP
– prediction by partial matching (PPMd) algorithm
• H.263, H.264
93
Arithmetic Coding
• The output of an arithmetic encoder is a stream of
bits
• However we can think that there is a prefix 0, and
the stream represents a fractional binary number
between 0 and 1
01101010 → 0.01101010
• To explain the algorithm, numbers will be shown as
decimal, but they are always binary in practice
94
Example 1
• Encode string bccb from the alphabet {a,b,c}
• p(a) = p(b) = p(c) = 1/3
• The arithmetic coder maintains two numbers, low
and high, which represent a subinterval [low,high) of
the range [0,1)
• Initially low=0 and high=1
95
Example 1
• The range between low and high is divided between
the symbols of the source alphabet according to
their probabilities
[Figure: the interval [0, 1) divided among the symbols: a → [0, 0.3333), b → [0.3333, 0.6667), c → [0.6667, 1)]
96
Example 1
Encoding the first symbol b: the interval narrows to low = 0.3333, high = 0.6667
97
Example 1
Encoding c: the interval [0.3333, 0.6667) is again divided among a, b, c; c occupies the top third, so low = 0.5556, high = 0.6667
98
Example 1
Encoding the second c: low = 0.6296, high = 0.6667
99
Example 1
Encoding the final b: low = 0.6420, high = 0.6543
100
Arithmetic Coding Algorithm
Set low to 0.0
Set high to 1.0
While there are still input symbols Do
get next input symbol
range = high – low
high = low + range × symbol_high_range
low = low + range × symbol_low_range
End While
output number between high and low
101
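A direct floating-point rendering of the pseudocode above (illustrative only; practical coders use the finite-precision renormalization discussed later):

def arithmetic_encode(symbols, ranges):
    """ranges: dict symbol -> (low_range, high_range) partitioning [0, 1)."""
    low, high = 0.0, 1.0
    for s in symbols:
        span = high - low
        high = low + span * ranges[s][1]
        low = low + span * ranges[s][0]
    return low, high          # any number in [low, high) identifies the sequence

# Example 2 from the next slides: p(x1) = .5, p(x2) = .3, p(x3) = .2
ranges = {"x1": (0.0, 0.5), "x2": (0.5, 0.8), "x3": (0.8, 1.0)}
print(arithmetic_encode(["x1", "x2", "x3", "x2"], ranges))   # expected about (0.385, 0.394)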
Arithmetic Coding Example 2
• p(x1) = 0.5, p(x2) = 0.3, p(x3) = 0.2
• Symbol ranges: 0 ≤ x1 < .5,  .5 ≤ x2 < .8,  .8 ≤ x3 < 1
• low = 0.0, high = 1.0
• Symbol sequence x1x2x3x2…
• Iteration 1, x1: range = 1.0 − 0.0 = 1.0
high = 0.0 + 1.0 × 0.5 = 0.5
low = 0.0 + 1.0 × 0.0 = 0.0
• Iteration 2, x2: range = 0.5 − 0.0 = 0.5
high = 0.0 + 0.5 × 0.8 = 0.40
low = 0.0 + 0.5 × 0.5 = 0.25
• Iteration 3, x3: range = 0.4 − 0.25 = 0.15
high = 0.25 + 0.15 × 1.0 = 0.40
low = 0.25 + 0.15 × 0.8 = 0.37
102
Arithmetic Coding Example 2
• Iteration 3, x3: range = 0.4 − 0.25 = 0.15
high = 0.25 + 0.15 × 1.0 = 0.40
low = 0.25 + 0.15 × 0.8 = 0.37
• Iteration 4, x2: range = 0.4 − 0.37 = 0.03
high = 0.37 + 0.03 × 0.8 = 0.394
low = 0.37 + 0.03 × 0.5 = 0.385
• 0.385 ≤ x1x2x3x2 < 0.394
0.385 = 0.0110001…
0.394 = 0.0110010…
• The first 5 bits of the codeword are 01100
• If there are no additional symbols to be encoded the codeword is 011001
103
Arithmetic Coding Example 3
Suppose that we want to encode the message
BILL GATES
Character   Probability   Range
SPACE       1/10          0.00 ≤ r < 0.10
A           1/10          0.10 ≤ r < 0.20
B           1/10          0.20 ≤ r < 0.30
E           1/10          0.30 ≤ r < 0.40
G           1/10          0.40 ≤ r < 0.50
I           1/10          0.50 ≤ r < 0.60
L           2/10          0.60 ≤ r < 0.80
S           1/10          0.80 ≤ r < 0.90
T           1/10          0.90 ≤ r < 1.00
104
[Figure: successive narrowing of the coding interval as BILL GATES is encoded, from [0, 1) down to [0.2572167752, 0.2572167756)]
105
Arithmetic Coding Example 3
New Symbol   Low Value       High Value
(start)      0.0             1.0
B            0.2             0.3
I            0.25            0.26
L            0.256           0.258
L            0.2572          0.2576
SPACE        0.25720         0.25724
G            0.257216        0.257220
A            0.2572164       0.2572168
T            0.25721676      0.2572168
E            0.257216772     0.257216776
S            0.2572167752    0.2572167756
106
Binary Codeword
• 0.2572167752 in binary is
0.01000001110110001111010101100101
• 0.2572167756 in binary is
0.01000001110110001111010101100111
• The transmitted binary codeword is then
0100000111011000111101010110011
• 31 bits long
107
Decoding Algorithm
get encoded number
Do
find symbol whose range contains the encoded
number
output the symbol
subtract symbol_low_range from the encoded
number
divide by the probability of the output symbol
Until no more symbols
108
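The corresponding decoder as a Python sketch (illustrative only; it assumes the number of symbols to decode is known and is passed as n):

def arithmetic_decode(value, ranges, n):
    """ranges: dict symbol -> (low_range, high_range); n: number of symbols to decode."""
    out = []
    for _ in range(n):
        for sym, (lo, hi) in ranges.items():
            if lo <= value < hi:               # find the symbol whose range contains the value
                out.append(sym)
                value = (value - lo) / (hi - lo)   # (hi - lo) is the symbol probability
                break
    return out

ranges = {"x1": (0.0, 0.5), "x2": (0.5, 0.8), "x3": (0.8, 1.0)}
print(arithmetic_decode(0.39, ranges, 4))    # expected ['x1', 'x2', 'x3', 'x2']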
Decoding BILL GATES
Encoded Number   Output Symbol   Low   High   Probability
0.2572167752     B               0.2   0.3    0.1
0.572167752      I               0.5   0.6    0.1
0.72167752       L               0.6   0.8    0.2
0.6083876        L               0.6   0.8    0.2
0.041938         SPACE           0.0   0.1    0.1
0.41938          G               0.4   0.5    0.1
0.1938           A               0.1   0.2    0.1
0.938            T               0.9   1.0    0.1
0.38             E               0.3   0.4    0.1
0.8              S               0.8   0.9    0.1
0.0
109
Finite Precision
Symbol   Probability (fraction)   Interval (8-bit precision, fraction)   Interval (8-bit precision, binary)   Range in binary
a        1/3                      [0, 85/256)                            [0.00000000, 0.01010101)             00000000 – 01010100
b        1/3                      [85/256, 171/256)                      [0.01010101, 0.10101011)             01010101 – 10101010
c        1/3                      [171/256, 1)                           [0.10101011, 1.00000000)             10101011 – 11111111
110
Renormalization
Symbol   Probability (fraction)   Range                  Digits that can be output   Range after renormalization
a        1/3                      00000000 – 01010100    0                           00000000 – 10101001
b        1/3                      01010101 – 10101010    none                        01010101 – 10101010
c        1/3                      10101011 – 11111111    1                           01010110 – 11111111
111
Terminating Symbol
Symbol   Probability (fraction)   Interval (8-bit precision, fraction)   Interval (8-bit precision, binary)   Range in binary
a        1/3                      [0, 85/256)                            [0.00000000, 0.01010101)             00000000 – 01010100
b        1/3                      [85/256, 170/256)                      [0.01010101, 0.10101010)             01010101 – 10101001
c        1/3                      [170/256, 255/256)                     [0.10101010, 0.11111111)             10101010 – 11111110
term     1/256                    [255/256, 1)                           [0.11111111, 1.00000000)             11111111
112
Huffman vs Arithmetic Codes
• X = {a,b,c,d}
• p(a) = .5, p(b) = .25, p(c) =.125, p(d) = .125
• Huffman code
a → 0
b → 10
c → 110
d → 111
113
Huffman vs Arithmetic Codes
• X = {a,b,c,d}
• p(a) = .5, p(b) = .25, p(c) =.125, p(d) = .125
• Arithmetic code ranges
a → [0, .5)
b → [.5, .75)
c → [.75, .875)
d → [.875, 1)
114
Huffman vs Arithmetic Codes
• X = {a,b,c,d}
• p(a) = .7, p(b) = .12, p(c) =.10, p(d) = .08
• Huffman code
a → 0
b → 10
c → 110
d → 111
115
Huffman vs Arithmetic Codes
• X = {a,b,c,d}
• p(a) = .7, p(b) = .12, p(c) =.10, p(d) = .08
• Arithmetic code ranges
a → [0, .7)
b → [.7, .82)
c → [.82, .92)
d → [.92, 1)
116
Robustness of Huffman Coding
pi = p(xi) (actual)      p̂i = pi + εi (estimated)
∑_{i=1}^{K} pi = 1 and ∑_{i=1}^{K} p̂i = 1,  ∴ ∑_{i=1}^{K} εi = 0
117
Robustness of Huffman Coding
L(C) = ∑_{i=1}^{K} pi li      L(Ĉ) = ∑_{i=1}^{K} p̂i l̂i

ΔL = L(Ĉ) − L(C) = ∑_{i=1}^{K} p̂i l̂i − ∑_{i=1}^{K} pi li = ∑_{i=1}^{K} pi (l̂i − li) + ∑_{i=1}^{K} εi l̂i
118
Robustness of Huffman Coding
ΔL = L(Ĉ) − L(C) ≈ ∑_{i=1}^{K} εi l̂i
• Let the variance of εi be σ²; then in the worst case
(ΔL)² ≈ [ K ∑_{i=1}^{K} l̂i² − (∑_{i=1}^{K} l̂i)² ] σ²
119
Gadsby by Ernest Vincent Wright
If youth, throughout all history, had had a champion to stand up for it;
to show a doubting world that a child can think; and, possibly, do it
practically; you wouldn’t constantly run across folks today who claim
that “a child don’t know anything.” A child’s brain starts functioning at
birth; and has, amongst its many infant convolutions, thousands of
dormant atoms, into which God has put a mystic possibility for noticing
an adult’s act, and figuring out its purport. Up to about its primary
school days a child thinks, naturally, only of play. But many a form
of play contains disciplinary factors. “You can’t do this,” or “that puts
you out,” shows a child that it must think, practically or fail. Now, if,
throughout childhood, a brain has no opposition, it is plain that it will
attain a position of “status quo,” as with our ordinary animals. Man
knows not why a cow, dog or lion was not born with a brain on a par
with ours; why such animals cannot add, subtract, or obtain from books
and schooling, that paramount position which Man holds today.
120
Lossless Compression Techniques
1 Model and code
The source is modelled as a random process. The
probabilities (or statistics) are given or acquired.
2 Dictionary-based
There is no explicit model and no explicit statistics
gathering. Instead, a codebook (or dictionary) is used
to map sourcewords into codewords.
121
Model and Code
• Huffman code
• Tunstall code
• Fano code
• Shannon code
• Arithmetic code
122
Dictionary-based Techniques
• Lempel-Ziv
– LZ77 – sliding window
– LZ78 – explicit dictionary
• Adaptive Huffman coding
• Due to patents, LZ77 and LZ78 led to many variants:
– LZ77 variants: LZR, LZSS, DEFLATE, LZH
– LZ78 variants: LZW, LZC, LZT, LZMW, LZJ, LZFG
• Zip methods use LZH and LZR among other techniques
• UNIX compress uses LZC (a variant of LZW)
123
Lempel-Ziv Compression
• Source symbol sequences are replaced by
codewords that are dynamically determined.
• The code table is encoded into the
compressed data so it can be reconstructed
during decoding.
124
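The course's Lempel-Ziv example slides are not reproduced here; as an illustration of the dictionary idea (an assumption: this is a simplified LZ78-style sketch, not the slides' exact variant), each output token below is a pair (index of the longest previously seen phrase, next symbol):

def lz78_encode(text):
    """Return a list of (dictionary index, next symbol) tokens for the input string."""
    dictionary, tokens = {"": 0}, []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the current phrase
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)   # dictionary grows as the input is scanned
            phrase = ""
    if phrase:                                # flush any unfinished phrase
        tokens.append((dictionary[phrase], ""))
    return tokens

print(lz78_encode("abababbba"))   # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'b'), (2, 'a')]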
Lempel-Ziv Example
125
126
127
Lempel-Ziv Codeword
128
Compression Comparison
Compression as a percentage of the original file size
File Type     UNIX Compact (Adaptive Huffman)   UNIX Compress (Lempel-Ziv-Welch)
ASCII File    66%                               44%
Speech File   65%                               64%
Image File    94%                               88%
129
Compression Comparison
130