
exercise in the previous class
binary Huffman code?
average codeword length?
0.363×1 + 0.174×3 + ... + 0.021×5 = 2.660

symbol   prob.   codeword
A        0.363   0
B        0.174   100
C        0.143   110
D        0.098   1010
E        0.087   1011
F        0.069   1110
G        0.045   11110
H        0.021   11111

[figure: binary Huffman code tree; branches labeled 0/1, internal node probabilities 1.000, 0.637, 0.359, 0.278, 0.185, 0.135, 0.066]
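As a cross-check (not part of the original slides), here is a small Python sketch that builds the binary Huffman code with heapq and recomputes the ACL; tie-breaking may produce codewords different from the tree above, but the average codeword length is the same 2.660 bit.

```python
import heapq

def huffman_code(probs):
    """Binary Huffman code: {symbol: codeword}."""
    # heap entries: (subtree probability, tie-breaking counter, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)          # the two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.363, "B": 0.174, "C": 0.143, "D": 0.098,
         "E": 0.087, "F": 0.069, "G": 0.045, "H": 0.021}
code = huffman_code(probs)
acl = sum(probs[s] * len(code[s]) for s in probs)
for s in probs:
    print(s, probs[s], code[s])
print("average codeword length =", round(acl, 3))   # 2.66
```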
1
exercise in the previous class
4-ary Huffman code?
[basic idea] join four trees at a time
note: we may have #trees < 4 in the final round;
with one “join”, 4 − 1 = 3 trees disappear,
so add dummy symbols and start with 3k + 1 nodes.

symbol    prob.   codeword
A         0.363   a
B         0.174   b
C         0.143   c
D         0.098   da
E         0.087   db
F         0.069   dc
G         0.045   dda
H         0.021   ddb
(dummy)   0       *
(dummy)   0       *

[figure: 4-ary code tree; branches labeled a/b/c/d, internal node probabilities 1.000, 0.320, 0.066]
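A Python sketch of the same idea (the helper name huffman_4ary and the padding rule are my own illustration, not taken from the slides): pad with zero-probability dummy symbols until the number of nodes has the form 3k + 1, then repeatedly join the four least probable trees. The branch labels a/b/c/d may be assigned differently than in the figure, but the codeword lengths agree.

```python
import heapq

def huffman_4ary(probs, labels="abcd"):
    """4-ary Huffman code: pad with dummies so that #nodes = 3k + 1."""
    items = list(probs.items())
    while (len(items) - 1) % 3 != 0:                 # add zero-probability dummies
        items.append((f"dummy{len(items)}", 0.0))
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(items)]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        merged, total = {}, 0.0
        for label in labels:                         # join the four smallest trees
            p, _, code = heapq.heappop(heap)
            total += p
            merged.update({s: label + w for s, w in code.items()})
        heapq.heappush(heap, (total, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.363, "B": 0.174, "C": 0.143, "D": 0.098,
         "E": 0.087, "F": 0.069, "G": 0.045, "H": 0.021}
for symbol, word in sorted(huffman_4ary(probs).items()):
    print(symbol, word)
```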
2
today’s class
basic properties needed for source coding
uniquely decodable
immediately decodable
Huffman code
construction of Huffman code
extensions of Huffman code
theoretical limit of the “compression”
related topics
today
3
today’s class (detail)
Huffman codes are good, but how good are they?
Huffman codes for extended information sources
possible means to improve the efficiency
Shannon’s source coding theorem
the theoretical limit of efficiency
some more variations of Huffman codes
blocks of symbols with variable block length
4
how should we evaluate Huffman codes?
good code:
  immediately decodable ... “use code trees”
  small average codeword length (ACL)
[figure: small code trees with branches labeled 0 and 1]

⇒ It seems that Huffman’s algorithm gives a good solution.
To see that Huffman codes are really good,
we discuss a mathematical limit of the ACL
...under a certain assumption (up to slide 11)
...in the general case (Shannon’s theorem)
5
theoretical limit under an assumption
assumption
the encoding is done in a symbol-by-symbol manner
define one codeword for each symbol of the source S
S produces M symbols with probabilities p1, ..., pM
Lemma (restricted Shannon’s theorem):
1. for any code, the ACL ≥ H1(S)
2. a code with ACL ≤ H1(S)+1 is constructible
H1(S) is the borderline of “possible” and “impossible”.
6
Shannon’s lemma (bad naming...)
To prove the restricted Shannon’s theorem
a small technical lemma (Shannon’s lemma) is needed.
Shannon’s lemma (Shannon’s auxiliary theorem)
For any non-negative numbers q1, ..., qM with q1 + ... + qM ≤ 1,
\[ \sum_{i=1}^{M} -p_i \log_2 q_i \;\ge\; \sum_{i=1}^{M} -p_i \log_2 p_i \;(= H_1(S)), \]
with equality if and only if pi = qi for every i.
reminder: p1, ..., pM are symbol probabilities and p1 + ... + pM = 1
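A quick numerical illustration (not a proof) in Python: for the fixed p above and randomly drawn q with sum at most 1, the left-hand side never drops below H1(S).

```python
import math, random

def cross_term(p, q):
    """sum_i -p_i log2 q_i"""
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q))

p = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
H1 = cross_term(p, p)                                  # = H1(S), about 2.59 bit
print("H1(S) =", round(H1, 4))

random.seed(0)
for _ in range(5):
    q = [random.uniform(0.01, 1.0) for _ in p]
    shrink = random.uniform(0.5, 1.0) / sum(q)         # force sum(q) <= 1
    q = [qi * shrink for qi in q]
    print(round(cross_term(p, q), 4), ">=", round(H1, 4))
```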
7
proof (sketch)
left hand side − right hand side
\[
\sum_{i=1}^{M} -p_i \log_2 q_i + \sum_{i=1}^{M} p_i \log_2 p_i
= \sum_{i=1}^{M} -p_i \log_2 \frac{q_i}{p_i}
= \frac{1}{\log_e 2} \sum_{i=1}^{M} p_i \left( -\log_e \frac{q_i}{p_i} \right)
\]
\[
\ge \frac{1}{\log_e 2} \sum_{i=1}^{M} p_i \left( 1 - \frac{q_i}{p_i} \right)
= \frac{1}{\log_e 2} \left( \sum_{i=1}^{M} p_i - \sum_{i=1}^{M} q_i \right)
= \frac{1}{\log_e 2} \left( 1 - \sum_{i=1}^{M} q_i \right) \ge 0
\]
(using \( -\log_e x \ge 1 - x \); see the figure)
[figure: the curves y = −log_e x and y = 1 − x; −log_e x ≥ 1 − x, with equality only at x = 1]
the equality holds iff qi/pi = 1 for every i
8
proof of the restricted Shannon’s theorem: 1
for any code, the average codeword length ≥ H1(S)
Let l1, ..., lM be the lengths of the codewords, and define \( q_i = 2^{-l_i} \), so that \( l_i = -\log_2 q_i \).
Kraft’s inequality: \( 1 \ge 2^{-l_1} + \cdots + 2^{-l_M} = q_1 + \cdots + q_M \)
Shannon’s lemma:
\[ \sum_{i=1}^{M} -p_i \log_2 q_i \ge \sum_{i=1}^{M} -p_i \log_2 p_i = H_1(S). \]
the ACL: \( L = \sum_{i=1}^{M} p_i l_i = \sum_{i=1}^{M} -p_i \log_2 q_i \)
We have shown that L ≥ H1(S).
9
proof of the restricted Shannon’s theorem: 2
a code with average codeword length ≤ H1(S)+1 is constructible
Choose integers l1, ..., lM so that \( -\log_2 p_i \le l_i < -\log_2 p_i + 1 \).
The choice makes \( 2^{-l_i} \le 2^{\log_2 p_i} = p_i \), and
\[ 2^{-l_1} + \cdots + 2^{-l_M} \le \sum_{i=1}^{M} p_i = 1 \quad \text{... Kraft’s inequality holds.} \]
We can construct a code with codeword lengths l1, ..., lM, whose ACL is
\[ L = \sum_{i=1}^{M} p_i l_i < \sum_{i=1}^{M} p_i (-\log_2 p_i + 1) = \sum_{i=1}^{M} -p_i \log_2 p_i + \sum_{i=1}^{M} p_i = H_1(S) + 1. \]
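The construction of step 2 can be tried directly; the following Python sketch uses the probabilities from the first slide, takes li = ceil(−log2 pi), and checks Kraft’s inequality and H1(S) ≤ L < H1(S) + 1.

```python
import math

p = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
lengths = [math.ceil(-math.log2(pi)) for pi in p]      # -log2 pi <= li < -log2 pi + 1

kraft = sum(2.0 ** -li for li in lengths)
H1 = sum(-pi * math.log2(pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))

print("lengths   =", lengths)
print("Kraft sum =", round(kraft, 4), "(must be <= 1)")
print("H1(S)     =", round(H1, 3))
print("ACL L     =", round(L, 3))                      # H1(S) <= L < H1(S) + 1
print("H1(S) + 1 =", round(H1 + 1, 3))
```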
10
the lemma and the Huffman code
Lemma (restricted Shannon’s theorem):
1. for any code, the ACL ≥ H1(S)
2. a code with ACL ≤ H1(S)+1 is constructible
We can show that, for a Huffman code,
L ≤ H1(S) + 1
there is no symbol-by-symbol code whose
ACL is smaller than L.
proof ... by induction on the size of code trees
A Huffman code is said to be a compact code.
11
coding for extended information sources
The Huffman code is the best symbol-by-symbol code, but...
the ACL ≥ 1
→ not good for encoding binary information sources

symbol        prob.   C1    C2
A             0.8     0     1
B             0.2     1     0
average ACL           1.0   1.0

If we encode several symbols in a block, then...
the ACL per symbol can be < 1
→ good for binary sources also

[illustration: a message such as “A B A C C A” encoded symbol by symbol (“0 10 0 11 11 0”) vs. in blocks (“10 110 01”)]
12
block Huffman coding
message:          ABCBCBBCAA...
   ↓  “block” operation
      (fixed-length (equal-, constant-) or variable-length (unequal-): block partition, run-length)
blocked message:  AB CBC BB CAA...
   ↓  Huffman encoding
codewords:        01 10 001 1101...
13
fixed-length block Huffman coding
one symbol per codeword:
symbol   prob.   codeword
A        0.6     0
B        0.3     10
C        0.1     11
ACL: 0.6×1 + 0.3×2 + 0.1×2 = 1.4 bit for one symbol

blocks with two symbols:
block   prob.   codeword
AA      0.36    0
AB      0.18    100
AC      0.06    1100
BA      0.18    101
BB      0.09    1110
BC      0.03    11110
CA      0.06    1101
CB      0.03    111110
CC      0.01    111111
ACL: 0.36×1 + ... + 0.01×6 = 2.67 bit, but this is for two symbols
2.67 / 2 = 1.335 bit for one symbol → improved!
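The pair-block computation above can be reproduced with a short Python sketch (huffman_acl is my own helper: the ACL of an optimal binary code equals the sum of the merged-node probabilities).

```python
import heapq
from itertools import product

def huffman_acl(probs):
    """ACL of an optimal binary Huffman code = sum of all merged-node probabilities."""
    heap = sorted(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b             # every merge adds one bit to all leaves below it
        heapq.heappush(heap, a + b)
    return total

single = {"A": 0.6, "B": 0.3, "C": 0.1}
print("1 symbol :", round(huffman_acl(single.values()), 3), "bit")   # 1.4

pairs = {x + y: single[x] * single[y] for x, y in product(single, repeat=2)}
acl2 = huffman_acl(pairs.values())
print("2 symbols:", round(acl2, 3), "bit per block,",
      round(acl2 / 2, 3), "bit per symbol")                          # 2.67 and 1.335
```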
14
block coding for binary sources
one symbol per codeword:
symbol   prob.   codeword
A        0.8     0
B        0.2     1
ACL: 0.8×1 + 0.2×1 = 1.0 bit for one symbol

blocks with two symbols:
block   prob.   codeword
AA      0.64    0
AB      0.16    10
BA      0.16    110
BB      0.04    111
ACL: 0.64×1 + ... + 0.04×3 = 1.56 bit for two symbols
1.56 / 2 = 0.78 bit for one symbol → improved!
15
the block length
blocks with three symbols:
block   prob.    codeword
AAA     0.512    0
AAB     0.128    100
ABA     0.128    101
ABB     0.032    11100
BAA     0.128    110
BAB     0.032    11101
BBA     0.032    11110
BBB     0.008    11111
ACL: 0.512×1 + ... + 0.008×5 = 2.184 bit for three symbols
2.184 / 3 = 0.728 bit for one symbol

block size   ACL per symbol
1            1.0
2            0.78
3            0.728
:            :

larger block size → more compact
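The table can be regenerated with the sketch below (same huffman_acl idea as before, assumed rather than taken from the slides); larger n brings the ACL per symbol closer to the entropy of the source.

```python
import heapq
import math
from itertools import product

def huffman_acl(probs):
    """ACL of an optimal binary Huffman code = sum of all merged-node probabilities."""
    heap = sorted(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

source = {"A": 0.8, "B": 0.2}
print("H(S) =", round(-sum(p * math.log2(p) for p in source.values()), 4))
for n in (1, 2, 3, 4, 8):
    # probability of each block of n symbols (memoryless source)
    blocks = [math.prod(source[s] for s in blk) for blk in product(source, repeat=n)]
    print("block size", n, "-> ACL per symbol:", round(huffman_acl(blocks) / n, 4))
```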
16
block code and extension of information source
What happens if we increase the block length further?
Observe that...
a block code defines a codeword for each block pattern.
one block = a sequence of n symbols of S
= one symbol of S^n, the n-th order extension of S
→ restricted Shannon’s theorem is applicable:
\[ H_1(S^n) \le L_n < H_1(S^n) + 1 \qquad (L_n = \text{the ACL for } n \text{ symbols}) \]
for one symbol of S,
\[ \frac{H_1(S^n)}{n} \le \frac{L_n}{n} < \frac{H_1(S^n)}{n} + \frac{1}{n} \]
17
Shannon’s source coding theorem
\[ \frac{H_1(S^n)}{n} \le \frac{L_n}{n} < \frac{H_1(S^n)}{n} + \frac{1}{n} \]
H1(S^n) / n ... the n-th order entropy of S (→ Apr. 12)
If n goes to infinity...
\[ H(S) \le \frac{L_n}{n} < H(S) + \varepsilon \]
Shannon’s source coding theorem:
1. for any code, the ACL ≥ H (S)
2. a code with ACL ≤ H (S) + ε is constructible
18
what the theorem means
Shannon’s source coding theorem:
1. for any code, the ACL ≥ H (S)
2. a code with ACL ≤ H (S) + ε is constructible

Use block Huffman codes, and you can approach the limit.
You can never beat the limit, however.
symbol   prob.
A        0.8
B        0.2
H(S) = 0.722

block size   ACL per symbol
1            1.0
2            0.78
3            0.728
:            :
(approaches H(S) + ε = 0.722 + ε)
19
remark 1
Why do block codes give a smaller ACL?
fact 1: the ACL is minimized by a real-number solution
if P(A) = 0.8, P(B) = 0.2, then we want l1 and l2 with
\[ 0.8\,l_1 + 0.2\,l_2 \to \min \quad \text{s.t. } 2^{-l_1} + 2^{-l_2} \le 1 \]
⇒ l1 = −log2 0.8 ≈ 0.322, l2 = −log2 0.2 ≈ 2.322

fact 2: the length of a codeword must be an integer
\[ 0.8\,l_1 + 0.2\,l_2 \to \min \quad \text{s.t. } 2^{-l_1} + 2^{-l_2} \le 1, \ l_1, l_2 \text{ integers} \]
⇒ l1 = 1 > 0.322 ... loss!   l2 = 1 < 2.322 ... gain!
frequent loss, seldom gain...
20
remark 1 (cnt’d)
the gap between the ideal and the real codeword lengths:
\[ \lceil -\log_2 p \rceil - (-\log_2 p) \qquad \text{...} \ \lceil x \rceil \text{ is an integer approximation of } x \]
the gap is weighted by the probability…
\[ p\,(\lceil -\log_2 p \rceil - (-\log_2 p)) \]

[figure: the weighted gap plotted against p, for 0 ≤ p ≤ 1]

long block
→ many symbols
→ small probabilities
→ small weighted gaps
→ close to the ideal ACL
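A small Python sketch of the weighted gap, taking the integer approximation to be the ceiling as in the construction on slide 10 (an assumption for illustration):

```python
import math

def weighted_gap(p):
    """p * (ceil(-log2 p) - (-log2 p)): the loss caused by rounding the length."""
    ideal = -math.log2(p)
    return p * (math.ceil(ideal) - ideal)

for p in (0.9, 0.8, 0.5, 0.3, 0.1, 0.01, 0.001):
    print(f"p = {p:<6}  weighted gap = {weighted_gap(p):.4f}")
```

The gap is not monotone (it vanishes whenever −log2 p is an integer), but it is always at most p, so block patterns with small probabilities contribute only small losses.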
21
today’s class (detail)
Huffman codes are good, but how good are they?
Huffman codes for extended information sources
possible means to improve the efficiency
Shannon’s source coding theorem
the theoretical limit of efficiency
some more variations of Huffman codes
blocks of symbols with variable block length
22
practical issues of block coding
Theoretically speaking, block Huffman codes are the best.
From a practical viewpoint, there are several problems:
We need to know the probability distribution in advance.
(this will be discussed in the next class)
We need a large table for the encoding/decoding.
if one byte is needed to record one entry of the table...
  256-byte table, if block length = 8
  64-Kbyte table, if block length = 16
  4-Gbyte table, if block length = 32
23
use blocks with variable-length
If we define blocks so that they have the same length, then...
  some blocks have small probabilities
  those blocks also need codewords

block   prob.    codeword
AAA     0.512    0
AAB     0.128    100
ABA     0.128    101
ABB     0.032    11100
BAA     0.128    110
BAB     0.032    11101
BBA     0.032    11110
BBB     0.008    11111

If we define blocks so that they have similar probabilities, then...
  lengths differ from block to block
  the table has few useless blocks

block   prob.    codeword
AAA     0.512    0
AAB     0.128    100
AB      0.16     101
B       0.2      11
24
definition of block patterns
Block patterns must be defined so that...
the patterns can represent (almost) all symbol sequences.
bad example: block pattern = {AAA, AAB, AB}
  AABABAAB → AAB AB AAB
  AABBBAAB → AAB ? (cannot be partitioned)
Two different approaches are well known:
block partition approach
run-length approach
25
define patterns with block partition approach
1. prepare all blocks with length one
2. partition the block with the largest probability by appending one more symbol
3. go to 2

Example: P(A) = 0.8, P(B) = 0.2
  A 0.8, B 0.2
  → AA 0.64, AB 0.16, B 0.2
  → AAA 0.512, AAB 0.128, AB 0.16, B 0.2

block   prob.    codeword
AAA     0.512    0
AAB     0.128    100
AB      0.16     101
B       0.2      11
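A Python sketch of this procedure (the function names are my own, not from the slides): it grows the block set exactly as in the example, assigns binary Huffman codewords, and also computes the bits-per-source-symbol figure derived on the next slide.

```python
import heapq

def build_blocks(source, num_splits):
    """Block-partition approach: repeatedly split the most probable block."""
    blocks = dict(source)                        # 1. all blocks of length one
    for _ in range(num_splits):
        top = max(blocks, key=blocks.get)        # 2. block with the largest probability
        p = blocks.pop(top)
        for s, ps in source.items():             #    ... is split by appending one symbol
            blocks[top + s] = p * ps
    return blocks

def huffman_code(probs):
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

source = {"A": 0.8, "B": 0.2}
blocks = build_blocks(source, num_splits=2)              # {AAA, AAB, AB, B}
code = huffman_code(blocks)
bits = sum(blocks[b] * len(code[b]) for b in blocks)     # expected bits per block
symbols = sum(blocks[b] * len(b) for b in blocks)        # expected source symbols per block
for b in blocks:
    print(b, round(blocks[b], 3), code[b])
print("ACL per source symbol =", round(bits / symbols, 3))   # about 0.728
```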
26
how good is this?
block   prob.    codeword
AAA     0.512    0
AAB     0.128    100
AB      0.16     101
B       0.2      11

to determine the average codeword length, assume that n blocks are produced from S:
  S → AAA AB AAA B AB ...  → encode →  0 101 0 11 101 ...
  0.512n×3 + 0.128n×3 + ... = 2.44n symbols
  0.512n×1 + 0.128n×3 + ... = 1.776n bits
2.44n symbols are encoded to 1.776n bits
→ the average codeword length is 1.776n / 2.44n = 0.728 bit
(almost the same ACL as the block codes on p. 16, but with a much smaller table)
27
define patterns with run-length approach
run = a sequence of consecutive identical symbols
Example: divide a message into runs of “A”:
  ABBAAAAABAAAB = (run of length 1)(run of length 0)(run of length 5)(run of length 3)
The message can be reconstructed if the lengths of the runs are given.
→ define blocks as runs of various lengths
28
upper-bound the run-length
small problem? ... there can be a very long run
→ put an upper bound on the run length: run-length limited (RLL) coding
upper-bound = 3

run length   representation
0            0
1            1
2            2
3            3+0
4            3+1
5            3+2
6            3+3+0
7            3+3+1
:            :
ABBAAAAABAAAB is represented as:
  one “A” followed by B,
  zero “A”s followed by B,
  three or more “A”s (the run continues),
  two “A”s followed by B,
  three or more “A”s (the run continues),
  zero “A”s followed by B
29
run-length Huffman code
Huffman code defined to encode the lengths of runs
effective when there is a strong bias in the symbol probabilities
p(A) = 0.9, p(B) = 0.1
run length   block pattern   prob.    codeword
0            B               0.1      10
1            AB              0.09     110
2            AAB             0.081    111
3 or more    AAA             0.729    0
ABBAAAAABAAAB: 1, 0, 3+, 2, 3+, 0 ⇒ 110 10 0 111 0 10
AAAABAAAAABAAB: 3+, 1, 3+, 2, 2 ⇒ 0 110 0 111 111
AAABAAAAAAAAB: 3+, 0, 3+, 3+, 2 ⇒ 0 10 0 0 111
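The three examples can be reproduced with the following Python sketch (the parsing loop is my own illustration; it assumes the message ends with a completed run, as in the examples above):

```python
# Huffman codewords for the run lengths (table above, p(A) = 0.9, p(B) = 0.1)
RUN_CODE = {0: "10", 1: "110", 2: "111", "3+": "0"}

def rll_encode(message, bound=3):
    """Run-length-limited encoding of a message over {A, B}.

    A run of `bound` A's is emitted as the '3+' block and counting continues;
    shorter runs are emitted when the terminating B is seen.
    """
    out, run = [], 0
    for ch in message:
        if ch == "A":
            run += 1
            if run == bound:                   # upper bound reached: emit '3+', keep counting
                out.append(RUN_CODE["3+"])
                run = 0
        else:                                  # 'B' closes the current run of A's
            out.append(RUN_CODE[run])
            run = 0
    return " ".join(out)

print(rll_encode("ABBAAAAABAAAB"))     # 110 10 0 111 0 10
print(rll_encode("AAAABAAAAABAAB"))    # 0 110 0 111 111
print(rll_encode("AAABAAAAAAAAB"))     # 0 10 0 0 111
```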
30
example of various block coding
S: memoryless & stationary, P(A) = 0.9, P(B) = 0.1
the entropy of S is H(S) = −0.9 log₂ 0.9 − 0.1 log₂ 0.1 = 0.469 bit
code 1: a naive Huffman code
symbol   prob.   codeword
A        0.9     0
B        0.1     1
average codeword length = 1 bit / symbol
code 2: fixed-length blocks (3 symbols per block)
block   prob.    codeword
AAA     0.729    0
AAB     0.081    100
ABA     0.081    110
ABB     0.009    1010
BAA     0.081    1110
BAB     0.009    1011
BBA     0.009    11110
BBB     0.001    11111
average codeword length = 1.661 bit / 3 symbols ≈ 0.55 bit / symbol
31
example of various block coding (cnt’d)
code 3: run-length Huffman (upper-bound = 8)
run length   prob.    codeword
0            0.1      110
1            0.09     1000
2            0.081    1001
3            0.073    1010
4            0.066    1011
5            0.059    1110
6            0.053    1111
7+           0.478    0

with n blocks...
  0.1n×1 + ... + 0.478n×7 = 5.215n symbols
  0.1n×3 + ... + 0.478n×1 = 2.466n bits
the average codeword length per symbol = 2.466 / 5.215 = 0.47
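The 0.47 figure can be checked with a few lines of Python (probabilities and codeword lengths taken from the table above; small differences from the slide come only from rounding):

```python
# run length k (k = 0..6): block "A"*k + "B", prob 0.9**k * 0.1
# run length 7+          : block "A"*7,       prob 0.9**7
code_lengths = {0: 3, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, "7+": 1}   # from the table above

probs = {k: 0.9 ** k * 0.1 for k in range(7)}
probs["7+"] = 0.9 ** 7
symbols = {k: k + 1 for k in range(7)}                  # k A's followed by one B
symbols["7+"] = 7                                       # seven A's, the run continues

avg_symbols = sum(probs[k] * symbols[k] for k in probs)     # about 5.22 symbols per block
avg_bits = sum(probs[k] * code_lengths[k] for k in probs)   # about 2.47 bits per block
print(round(avg_bits, 3), "bits /", round(avg_symbols, 3), "symbols =",
      round(avg_bits / avg_symbols, 3), "bit per symbol")   # about 0.47
```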
32
summary of today’s class
Huffman codes are good, but how good are they?
Huffman codes for extended information sources
possible means to improve the efficiency
Shannon’s source coding theorem
the theoretical limit of efficiency
some more variations of Huffman codes
blocks of symbols with variable block length
33
exercise
Write a computer program to construct a Huffman code for a
given probability distribution.
Modify the above program so that it can handle fixed-length
block coding.
Give a distribution, change the block length, and observe how
the average codeword length changes.
34