
Coding and Data Compression
Mathias Winther Madsen
[email protected]
Institute for Logic, Language, and Computation
University of Amsterdam
March 2015
Information Theory
All the ε's and δ's implied by the words “small” and “about” in these statements approach zero as we allow T to increase and S₀ to approach the maximizing source.

The situation is summarized in Fig. 10 where the input sequences are points on the left and output sequences points on the right. The fan of cross lines represents the range of possible causes for a typical output.

[Fig. 10 — Schematic representation of the relations between inputs and outputs in a channel: the 2^(H(x)T) high probability messages M on the left, the 2^(H(y)T) high probability received signals E on the right; each E has 2^(H_y(x)T) reasonable causes, and each M has 2^(H_x(y)T) reasonable effects.]

Now suppose we have another source producing information at rate R with R < C. In the period T this source will have 2^(TR) high probability messages. We wish to associate these with a selection of the possible channel inputs in such a way as to get a small frequency of errors. We will set up this association in all possible ways …

Claude Shannon: “A Mathematical Theory of Communication,”
Bell System Technical Journal, 1948.
Information Theory
THE COIEF DIFFIOULTY ALOCE FOUOD OT FIRST WAS IN
OAOAGING HER FLAOINGO: SHE SUCCEODEO ON
GO OTIOG IOS BODY OUOKEO AOAO, COMFOROABLY
EOOOGO, UNDER OER O OM, WITO OTS O O OS HANGIOG
DOO O, BOT OENEOAO OY, OUST AS SO O HOD OOT OTS
O OCK NOCEO O SOROIGHTEOEO O OT, ANO WOS O O ONG
TO OIOE TO O HEDGEHOG O OLOW WOTH ITS O OAD, O O
WOULO TWOST O OSEOF OOUO O ANO O O OK OP IN HOR
OACO, O OTO OUO O A O O OZOED EO OREOSOOO O O O O
SHO COUOD O O O O O O O O O OSO O OG O O O OAO OHO O O:
AOD WHON O O O OAO OOO O O O O O O O DOO O, O OD
O OS GOIOG O O BO O ON O O OIO, O O O OS O O OY
O OOOOO O O O O O O O O O O O OT TO O OEOGO O O O O OD
O OROLO O O O O O O OF, O O O O O O O O OHO O O O O O
OOOOOOOO OOOOO
The Hartley Measure
Definition: The Hartley Measure of Uncertainty
H = log2 |Ω| .
Ralph V. L. Hartley: “Transmission of Information,”
Bell System Technical Journal, 1928.
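To make the definition concrete, here is a minimal sketch in Python (the helper name hartley is mine, not from the slides):

from math import log2

def hartley(n_outcomes):
    """Hartley measure: the base-2 logarithm of the number of outcomes."""
    return log2(n_outcomes)

print(hartley(24))  # 4.584962500721156, the card example on the next slides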
The Hartley Measure
The 24 possible orderings of the four suits:

♠♣♥♦  ♠♣♦♥  ♠♥♣♦  ♠♥♦♣  ♠♦♣♥  ♠♦♥♣
♣♠♥♦  ♣♠♦♥  ♣♥♠♦  ♣♥♦♠  ♣♦♠♥  ♣♦♥♠
♥♠♣♦  ♥♠♦♣  ♥♣♠♦  ♥♣♦♠  ♥♦♠♣  ♥♦♣♠
♦♠♣♥  ♦♠♥♣  ♦♣♠♥  ♦♣♥♠  ♦♥♠♣  ♦♥♣♠

H = log2 24 ≈ 4.58
The Hartley Measure
The same 24 outcomes labelled with five-bit strings:

00000  00001  00010  00011  00100  00101
00110  00111  01000  01001  01010  01011
01100  01101  01110  01111  10000  10001
10010  10011  10100  10101  10110  10111

H = log2 24 ≈ 4.58
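The pairing of the two slides above can be reproduced directly: enumerate the orderings, then hand out fixed-length binary labels. A sketch, with the letters S, C, H, D standing in for the four suit symbols:

from itertools import permutations
from math import ceil, log2

suits = "SCHD"  # stand-ins for ♠, ♣, ♥, ♦
orderings = ["".join(p) for p in permutations(suits)]

# 24 outcomes carry log2(24) ~ 4.58 bits of uncertainty, so
# fixed-length binary labels need ceil(4.58) = 5 bits each.
width = ceil(log2(len(orderings)))
codes = {o: format(i, f"0{width}b") for i, o in enumerate(orderings)}

print(len(orderings), width)  # 24 5
print(codes["SCHD"])          # 00000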
The Hartley Measure
Ruling out all but the six orderings that begin with ♠:

♠♣♥♦  ♠♣♦♥  ♠♥♣♦  ♠♥♦♣  ♠♦♣♥  ♠♦♥♣

H = log2 6 ≈ 2.58
The Hartley Measure
The six remaining outcomes labelled with three-bit strings:

000  001  010  011  100  101

H = log2 6 ≈ 2.58
The Hartley Measure
With only two orderings remaining:

♠♥♣♦  ♠♥♦♣

H = log2 2 = 1.00
The Hartley Measure
With a single ordering remaining:

♠♥♣♦

H = log2 1 = 0.00
The Hartley Measure
H = log(k)?
H = log(k + 1)?
H = log(∞)?!?
Entropy
The Shannon Entropy
H = E[ log 1/p(X) ] = Σ_x p(x) log 1/p(x).
[Figure: a distribution p(x) over x = 1, 2, 3 (left), and the corresponding surprisals −log p(x) (right).]
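The definition translates directly into code. A minimal sketch in Python; the three-outcome distribution is a hypothetical stand-in for the one sketched in the figure:

from math import log2

def entropy(p):
    """Shannon entropy H = sum of p(x) * log2(1/p(x)), in bits."""
    return sum(px * log2(1 / px) for px in p if px > 0)

p = [0.2, 0.6, 0.2]  # hypothetical p(x) over x = 1, 2, 3
print(entropy(p))    # ~1.37 bits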
Entropy
[Figure: the entropy H of a binary distribution as a function of the probability p, rising from 0 at p = 0 to 1 bit at p = 0.5 and back to 0 at p = 1.]
Entropy
[Figure: a repeated binary branching with probabilities p and 1 − p over the outcomes 1, 2, 3, …, and the entropy H of the resulting distribution plotted as a function of p.]
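If the tree is read as the distribution over how long the branching continues, with outcome n having probability (1 − p)^(n−1) · p (an assumption on my part), the curve can be reproduced numerically:

from math import log2

def branching_entropy(p, max_n=10_000):
    """Entropy (bits) of the stopping time: P(n) = (1 - p)**(n - 1) * p."""
    total = 0.0
    for n in range(1, max_n):
        q = (1 - p) ** (n - 1) * p
        if q > 0:
            total += q * log2(1 / q)
    return total

for p in (0.9, 0.5, 0.1):
    print(p, round(branching_entropy(p), 2))  # 0.52, 2.0, 4.69 bits

The entropy grows without bound as p approaches 0, matching the steep left end of the plot.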
Entropy
Properties of the entropy
1. Positive: H ≥ 0.
2. Decomposes: H(X × Y) = H(X) + H(Y | X).
3. Reduced (on average) by information: H(X) ≥ H(X | Y).
Definition: Conditional Entropy
H(X | Y) = E_Y[ H(X | Y) ] = Σ_y p(y) H(X | Y = y)
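Property 2 (the chain rule) can be checked on a small example. A sketch with a hypothetical joint distribution p(x, y), invented here purely for illustration:

from math import log2

def H(dist):
    """Entropy of a dict mapping outcomes to probabilities."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

joint = {("a", 0): 0.3, ("a", 1): 0.2, ("b", 0): 0.1, ("b", 1): 0.4}

# Marginal p(x).
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

# H(Y | X) = sum over x of p(x) * H(Y | X = x).
H_Y_given_X = sum(
    p_x * H({y: p / p_x for (x2, y), p in joint.items() if x2 == x})
    for x, p_x in px.items()
)

print(H(joint))             # ~1.846 bits
print(H(px) + H_Y_given_X)  # ~1.846 bits: H(X × Y) = H(X) + H(Y | X)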
Huffman Coding
x           a     b     c     d     e
Pr{X = x}   .05   .15   .20   .25   .35
David A. Huffman: “A Method for the Construction of
Minimum-Redundancy Codes,” Proceedings of the
Institute of Radio Engineers, 1952.
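Huffman's procedure repeatedly merges the two least probable subtrees until a single tree remains. A minimal sketch using Python's heapq, applied to the five-symbol source above (the variable names are mine):

import heapq

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable
    subtrees, prefixing 0/1 to the codewords of their leaves."""
    heap = [(p, [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    while len(heap) > 1:
        p0, lo = heapq.heappop(heap)
        p1, hi = heapq.heappop(heap)
        for s in lo:
            code[s] = "0" + code[s]
        for s in hi:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p0 + p1, lo + hi))
    return code

probs = {"a": 0.05, "b": 0.15, "c": 0.20, "d": 0.25, "e": 0.35}
print(huffman(probs))
# One optimal assignment has lengths 3, 3, 2, 2, 2 for a..e:
# expected length 2.20 bits against an entropy of about 2.12 bits.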
Huffman Coding
x    Code           p       −log p   k
A    1001           .0634    3.98    4
B    011101         .0135    6.21    6
C    00011          .0242    5.37    5
D    10100          .0321    4.96    5
E    001            .0980    3.35    3
F    101111         .0174    5.84    6
G    101011         .0165    5.92    6
H    11011          .0438    4.51    5
I    0110           .0552    4.18    4
J    011100000      .0009   10.17    9
K    0111001        .0061    7.35    7
L    10110          .0336    4.89    5
M    101110         .0174    5.85    6
N    0101           .0551    4.18    4
O    1000           .0622    4.01    4
P    110100         .0180    5.80    6
Q    0111000100     .0008   10.33   10
R    0000           .0470    4.41    4
S    0100           .0502    4.32    4
T    1100           .0729    3.78    4
U    00010          .0234    5.42    5
V    0111110        .0075    7.06    7
W    011110         .0156    6.00    6
X    011100001      .0014    9.46    9
Y    101010         .0160    5.97    6
Z    01110001011    .0005   11.04   11
¶    0111111        .0084    6.89    7
_    111            .1741    2.52    3
'    011100011      .0019    9.06    9
,    1101011        .0117    6.42    7
.    1101010        .0109    6.52    7
?    01110001010    .0003   11.56   11
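From the p and k columns one can check how close the expected codeword length comes to the entropy of the source. A sketch reading the 32 (probability, length) pairs off the table above:

from math import log2

rows = [  # (p, k) pairs for A..P, then Q..?
    (0.0634, 4), (0.0135, 6), (0.0242, 5), (0.0321, 5),
    (0.0980, 3), (0.0174, 6), (0.0165, 6), (0.0438, 5),
    (0.0552, 4), (0.0009, 9), (0.0061, 7), (0.0336, 5),
    (0.0174, 6), (0.0551, 4), (0.0622, 4), (0.0180, 6),
    (0.0008, 10), (0.0470, 4), (0.0502, 4), (0.0729, 4),
    (0.0234, 5), (0.0075, 7), (0.0156, 6), (0.0014, 9),
    (0.0160, 6), (0.0005, 11), (0.0084, 7), (0.1741, 3),
    (0.0019, 9), (0.0117, 7), (0.0109, 7), (0.0003, 11),
]

entropy = sum(p * log2(1 / p) for p, _ in rows)
mean_len = sum(p * k for p, k in rows)
print(f"H = {entropy:.2f} bits, E[k] = {mean_len:.2f} bits per symbol")
# H = 4.24 bits, E[k] = 4.28 bits: the Huffman code stays within
# one bit of the entropy, as the theory guarantees.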