COMP2610/6261 — Information Theory - Lecture 13

COMP2610/6261 — Information Theory
Lecture 13: Symbol Codes for Lossless Compression
Mark Reid and Aditya Menon
Research School of Computer Science
The Australian National University
September 2nd, 2014
1 Variable-Length Codes
    Unique Decodeability
    Prefix Codes
2 The Kraft Inequality
3 Summary
Codes: A Review
Notation:
If $A$ is a finite set then $A^N$ is the set of all strings of length $N$ over $A$.
$A^+ = \bigcup_N A^N$ is the set of all finite strings over $A$.
Examples:
$\{0,1\}^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}$
$\{0,1\}^+ = \{0, 1, 00, 01, 10, 11, 000, 001, 010, \ldots\}$
Binary Symbol Code
Let $X$ be an ensemble with $\mathcal{A}_X = \{a_1, \ldots, a_I\}$.
A function $c : \mathcal{A}_X \to \{0,1\}^+$ is a code for $X$.
The binary string $c(x)$ is the codeword for $x \in \mathcal{A}_X$.
The length of the codeword for $x$ is denoted $\ell(x)$.
Shorthand: $\ell_i = \ell(a_i)$ for $i = 1, \ldots, I$.
The extension of $c$ assigns codewords to any sequence $x_1 x_2 \ldots x_N$ from $\mathcal{A}_X^+$ by $c(x_1 \ldots x_N) = c(x_1) \ldots c(x_N)$.
Codes: A Review
Examples
$X$ is an ensemble with $\mathcal{A}_X = \{a, b, c, d\}$.
Example 1 (Uniform Code):
Let $c(a) = 0001$, $c(b) = 0010$, $c(c) = 0100$, $c(d) = 1000$.
Shorthand: $C_1 = \{0001, 0010, 0100, 1000\}$
All codewords have length 4. That is, $\ell_1 = \ell_2 = \ell_3 = \ell_4 = 4$.
The extension of $c$ maps $aba \in \mathcal{A}_X^3 \subset \mathcal{A}_X^+$ to $000100100001$.
Example 2 (Variable-Length Code):
Let $c(a) = 0$, $c(b) = 10$, $c(c) = 110$, $c(d) = 111$.
Shorthand: $C_2 = \{0, 10, 110, 111\}$
In this case $\ell_1 = 1$, $\ell_2 = 2$, $\ell_3 = \ell_4 = 3$.
The extension of $c$ maps $aba \in \mathcal{A}_X^3 \subset \mathcal{A}_X^+$ to $0100$.
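The extension of a code is plain concatenation, so the two examples can be reproduced in a few lines. The following is a minimal Python sketch; the dictionary representation and the names `encode` and `lengths` are illustrative assumptions, not part of the lecture.

```python
# Symbol codes as dictionaries from symbols to bit strings (an assumed representation).
C1 = {"a": "0001", "b": "0010", "c": "0100", "d": "1000"}  # uniform code
C2 = {"a": "0", "b": "10", "c": "110", "d": "111"}         # variable-length code


def encode(code, symbols):
    """Extension of the code: c(x_1 ... x_N) = c(x_1) ... c(x_N)."""
    return "".join(code[s] for s in symbols)


def lengths(code):
    """Codeword lengths, listed in alphabetical order of the symbols."""
    return [len(code[s]) for s in sorted(code)]


print(encode(C1, "aba"))  # 000100100001
print(encode(C2, "aba"))  # 0100
print(lengths(C1))        # [4, 4, 4, 4]
print(lengths(C2))        # [1, 2, 3, 3]
```

The variable-length code needs 4 bits for aba where the uniform code needs 12.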
Unique Decodeability
A code $c$ for $X$ is uniquely decodeable if no two strings from $\mathcal{A}_X^+$ have the same codeword. That is, for all $x, y \in \mathcal{A}_X^+$,
$x \neq y \implies c(x) \neq c(y)$
Examples:
$C_1 = \{0001, 0010, 0100, 1000\}$ is uniquely decodeable. Why?
$C_2 = \{0, 10, 110, 111\}$ is uniquely decodeable
$C_2' = \{1, 10, 110, 111\}$ is not uniquely decodeable because $c(aaa) = c(d) = 111$ and $c(ab) = c(c) = 110$
Why is unique decodeability useful for compression?
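The definition can be probed directly by encoding every short source string and looking for two that collide. The sketch below is a brute-force search written for illustration (the name `find_collisions` and the length bound are assumptions). A collision proves that a code is not uniquely decodeable; finding none up to a given length is only evidence, not a proof.

```python
from itertools import product


def encode(code, symbols):
    """Extension of the code: concatenate the codewords of the symbols."""
    return "".join(code[s] for s in symbols)


def find_collisions(code, max_len):
    """Map each bit string to the source strings (length 1..max_len) that
    encode to it, keeping only bit strings reached by more than one source."""
    seen = {}
    for n in range(1, max_len + 1):
        for symbols in product(sorted(code), repeat=n):
            seen.setdefault(encode(code, symbols), []).append("".join(symbols))
    return {bits: xs for bits, xs in seen.items() if len(xs) > 1}


C2 = {"a": "0", "b": "10", "c": "110", "d": "111"}
C2_prime = {"a": "1", "b": "10", "c": "110", "d": "111"}

collisions = find_collisions(C2_prime, 3)
print(collisions["111"])       # ['d', 'aaa']
print(collisions["110"])       # ['c', 'ab']
print(find_collisions(C2, 4))  # {}
```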
Prefix Codes
a.k.a. prefix-free or instantaneous codes
There is a simple property of codes that guarantees unique decodeability.
Prefix
A codeword $c \in \{0,1\}^+$ is said to be a prefix of another codeword $c' \in \{0,1\}^+$ if there exists a string $t \in \{0,1\}^+$ such that $c' = ct$.
Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code $C = \{c_1, \ldots, c_I\}$ is a prefix code if for every codeword $c_i \in C$ there is no prefix of $c_i$ in $C$.
Examples:
$C_1 = \{0001, 0010, 0100, 1000\}$ is prefix-free
$C_2 = \{0, 10, 110, 111\}$ is prefix-free
$C_2' = \{1, 10, 110, 111\}$ is not prefix-free since $c_3 = 110 = c_1 c_2$
$C_2'' = \{1, 01, 110, 111\}$ is not prefix-free since $c_3 = 110 = c_1 10$
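The prefix-free property is easy to check mechanically. A minimal Python sketch follows, using a pairwise comparison chosen for clarity rather than speed; the function name `is_prefix_free` is an assumption made for this example.

```python
def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(
        u != v and v.startswith(u)
        for u in codewords
        for v in codewords
    )


print(is_prefix_free(["0001", "0010", "0100", "1000"]))  # True  (C1)
print(is_prefix_free(["0", "10", "110", "111"]))         # True  (C2)
print(is_prefix_free(["1", "10", "110", "111"]))         # False (C2')
print(is_prefix_free(["1", "01", "110", "111"]))         # False (C2'')
```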
Prefix Codes as Trees
[Figure: the binary tree of all strings of length 1 to 4, labelled "The total symbol code budget", shown in turn for $C_1 = \{0001, 0010, 0100, 1000\}$, $C_2 = \{0, 10, 110, 111\}$, and $C_2' = \{1, 10, 110, 111\}$.]
Prefix Codes are Uniquely Decodeable
[Figure: the binary tree of all strings of length 1 to 4, labelled "The total symbol code budget".]
If $\ell^* = \max\{\ell_1, \ldots, \ell_I\}$ then a symbol is decodeable after seeing at most $\ell^*$ bits.
Consider $C_2 = \{0, 10, 110, 111\}$:
    If $c(x) = 0\ldots$ then $x_1 = a$
    If $c(x) = 1\ldots$ then $x_1 \in \{b, c, d\}$
    If $c(x) = 10\ldots$ then $x_1 = b$
    If $c(x) = 11\ldots$ then $x_1 \in \{c, d\}$
However, not all uniquely decodeable codes are prefix codes.
$C_3 = \{0, 01, 011, 111\}$ is not prefix-free but is uniquely decodeable. Why?
Hint: Notice $c_3(bdca) = 011110110$ and $c_2(acdb) = 011011110$.
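The bit-by-bit reasoning for $C_2$ can be turned into an instantaneous decoder: read bits until the buffer matches a codeword, emit the symbol, and start again. This is a minimal Python sketch under the assumption that the code is prefix-free; the name `decode` and the dictionary representation are illustrative choices. The last line shows one possible reading of the hint, namely that each codeword of $C_3$ is a codeword of $C_2$ written backwards.

```python
C2 = {"a": "0", "b": "10", "c": "110", "d": "111"}


def decode(code, bits):
    """Decode a bit string symbol by symbol, assuming `code` is prefix-free."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols = []
    buffer = ""
    for bit in bits:
        buffer += bit
        # For a prefix code the first match is the only possible match.
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(symbols)


print(decode(C2, "0100"))       # aba
print(decode(C2, "011011110"))  # acdb

# One reading of the hint: reverse a C3 encoding, decode it with C2,
# then reverse the decoded symbols.
print(decode(C2, "011110110"[::-1])[::-1])  # bdca
```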
Lengths for Prefix Codes
Suppose someone said "I want codes with codeword lengths":
$L_1 = \{4, 4, 4, 4\}$: $C_1 = \{0001, 0010, 0100, 1000\}$
$L_2 = \{1, 2, 3, 3\}$: $C_2 = \{0, 10, 110, 111\}$
$L_3 = \{2, 2, 3, 4, 4\}$: $C_3 = \{00, 01, 100, 1010, 1011\}$
$L_4 = \{1, 3, 3, 3, 3, 4\}$: Impossible!
Could you construct such codes? Uniquely Decodeable? Prefix-free?
[Figure: the binary tree of all strings of length 1 to 4, labelled "The total symbol code budget".]
Prefixes Exclude Codes
Choosing a prefix codeword of length 1 (e.g., $c(a) = 0$) excludes:
    2 x 2-bit codewords: $\{00, 01\}$
    4 x 3-bit codewords: $\{000, 001, 010, 011\}$
    8 x 4-bit codewords: $\{0000, 0001, \ldots, 0111\}$
[Figure: the binary tree of all strings of length 1 to 4, labelled "The total symbol code budget".]
In general, an $\ell$-bit codeword excludes $2^{k-\ell}$ x $k$-bit codewords.
For lengths $L = \{\ell_1, \ldots, \ell_I\}$ and $\ell^* = \max\{\ell_1, \ldots, \ell_I\}$, there will be
$\sum_{i=1}^{I} 2^{\ell^* - \ell_i}$
excluded $\ell^*$-bit codewords. Since no codeword is a prefix of another, the sets excluded by different codewords do not overlap. But there are only $2^{\ell^*}$ possible $\ell^*$-bit codewords, so
$\sum_{i=1}^{I} 2^{\ell^* - \ell_i} \leq 2^{\ell^*}$
Dividing both sides by $2^{\ell^*}$ gives
$\frac{1}{2^{\ell^*}} \sum_{i=1}^{I} 2^{\ell^* - \ell_i} \leq 1$, that is, $\sum_{i=1}^{I} 2^{-\ell_i} \leq 1$
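The counting argument can be checked on the four length sets from the previous slide. The sketch below counts, for each set, how many $\ell^*$-bit strings the requested codewords would exclude and compares this with the $2^{\ell^*}$ strings available; the function name `excluded_leaves` is an assumption made for this example.

```python
def excluded_leaves(lengths):
    """Return (excluded l*-bit strings, available l*-bit strings)."""
    longest = max(lengths)
    excluded = sum(2 ** (longest - length) for length in lengths)
    return excluded, 2 ** longest


for lengths in ([4, 4, 4, 4], [1, 2, 3, 3], [2, 2, 3, 4, 4], [1, 3, 3, 3, 3, 4]):
    excluded, available = excluded_leaves(lengths)
    verdict = "fits" if excluded <= available else "over budget"
    print(lengths, excluded, available, verdict)
# [4, 4, 4, 4] 4 16 fits
# [1, 2, 3, 3] 8 8 fits
# [2, 2, 3, 4, 4] 12 16 fits
# [1, 3, 3, 3, 3, 4] 17 16 over budget
```

Only the last set asks for more 4-bit strings than exist, which matches the "Impossible!" verdict for $L_4$.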
The Kraft Inequality
a.k.a. The Kraft-McMillan Inequality
Kraft Inequality
For any prefix (binary) code $C$, its codeword lengths $\{\ell_1, \ldots, \ell_I\}$ satisfy
$\sum_{i=1}^{I} 2^{-\ell_i} \leq 1$    (1)
Conversely, if the set $\{\ell_1, \ldots, \ell_I\}$ satisfies (1) then there exists a prefix code $C$ with those codeword lengths.
Examples:
1 $C_1 = \{0001, 0010, 0100, 1000\}$ is prefix and $\sum_{i=1}^{4} 2^{-4} = \frac{1}{4} \leq 1$
2 $C_2 = \{0, 10, 110, 111\}$ is prefix and $\sum_{i=1}^{4} 2^{-\ell_i} = \frac{1}{2} + \frac{1}{4} + \frac{2}{8} = 1$
3 Lengths $\{1, 2, 2, 3\}$ give $\sum_{i=1}^{4} 2^{-\ell_i} = \frac{1}{2} + \frac{2}{4} + \frac{1}{8} > 1$ so no prefix code exists
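Both directions of the statement can be explored numerically. The sketch below computes the Kraft sum exactly and, when (1) holds, builds one prefix code with the requested lengths. The construction (sort the lengths and hand out the next unused node at each depth) is a standard one assumed here for illustration; the slides do not specify a construction, and the function names are hypothetical.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum_i 2^(-l_i)."""
    return sum(Fraction(1, 2 ** length) for length in lengths)


def prefix_code_from_lengths(lengths):
    """Return codewords with the given lengths, or None if (1) fails."""
    if kraft_sum(lengths) > 1:
        return None
    codewords = []
    value = 0     # next unused node at the current depth, read as an integer
    previous = 0  # depth of the previously assigned codeword
    for length in sorted(lengths):
        value <<= length - previous  # descend to the new depth
        codewords.append(format(value, "0{}b".format(length)))
        value += 1                   # skip the subtree just used
        previous = length
    return codewords


print(kraft_sum([1, 2, 3, 3]))                    # 1
print(prefix_code_from_lengths([1, 2, 3, 3]))     # ['0', '10', '110', '111']
print(prefix_code_from_lengths([2, 2, 3, 4, 4]))  # ['00', '01', '100', '1010', '1011']
print(kraft_sum([1, 2, 2, 3]))                    # 9/8
print(prefix_code_from_lengths([1, 2, 2, 3]))     # None
```

Using `Fraction` keeps the comparison with 1 exact, which matters when the sum equals 1, as it does for $C_2$.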
Summary
Key ideas from this lecture:
Prefix and Uniquely Decodeable variable-length codes
Prefix codes are tree-like
Every Prefix code is Uniquely Decodeable but not vice versa
The Kraft Inequality:
    Code lengths satisfying $\sum_i 2^{-\ell_i} \leq 1$ imply that a Prefix/U.D. code exists
    Prefix/U.D. code implies $\sum_i 2^{-\ell_i} \leq 1$
Relevant Reading Material:
MacKay: §5.1 and §5.2
Cover & Thomas: §5.1, §5.2, and §5.5