Uniquely Decodable Code
1. Related Notions
2. Determining UDC
3. Kraft Inequality
 2004 SDU
A Resulting Problem
Given a coding scheme of the source symbols, how to verify
whether it is uniquely decodable or not?
 2004, 2009 SDU
Related Notions
alphabet: Σ = {0, 1, …, σ−1}, where σ is the cardinality of the alphabet
symbol or letter: an element of the alphabet Σ
word: a sequence of symbols of finite length
code: a collection of words on a specified alphabet
codeword: a word in a code
message: a sequence of codewords
Uniquely decodable code C: every message can be uniquely
decomposed into the codewords in C
Example: {0, 10, 01} is not uniquely decodable (010 = 0·10 = 01·0), whereas {0, 10, 11} is.
Related Notions
prefix and suffix: if w = ps, then p is a prefix of w and s is a suffix of w
empty word: a word with length 0
suffix word: a non-empty word t is called a suffix word if there
exist two messages C1C2…Cm and C1’C2’…Cn’ such that
- Ci and Cj’ are codewords for all 1 ≤ i ≤ m, 1 ≤ j ≤ n, and C1 ≠ C1’,
- t is a suffix of Cn’,
- C1C2…Cm t = C1’C2’…Cn’.
A Key Lemma for Determining UDC
Lemma.
A code C is uniquely decodable if and only if no suffix word
is a codeword in C.
Proof.
(⇒) Suppose that C is uniquely decodable but some suffix word t is a
codeword in C. By the definition of suffix word, there exist two messages
C1C2…Cm and C1’C2’…Cn’ such that C1 ≠ C1’ and C1C2…Cm t
= C1’C2’…Cn’. Since t is a codeword, the message C1’C2’…Cn’ has two
different decompositions, so C is not uniquely decodable. This
contradicts the assumption that C is a UDC.
Proof
(⇐) Suppose that C is not uniquely decodable. Then there exists
some message that can be decomposed in more than one way.
Let w be such a message of the least length, w = C1C2…Ck =
C1’C2’…Cn’, where Ci (1 ≤ i ≤ k) and Cj’ (1 ≤ j ≤ n) are all
codewords. By the minimality of w, C1 ≠ C1’. Without loss of generality, assume that
Ck is a suffix of Cn’; then Ck is a suffix word. This contradicts the
assumption that no suffix word is a codeword in C.
UDC Verification
By the key lemma, suppose we can generate all the suffix words of a code C:
- If no suffix word is a codeword in C, then C is uniquely
decodable.
- If some suffix word is a codeword, then C is not uniquely
decodable.
The following algorithm follows directly from the key
lemma.
The Determining Algorithm
UDC-Verification(C)
T ← ∅
Step 1. for each pair of codewords Ci, Cj ∈ C (i ≠ j) do
  1.1 if Ci = Cj, then return NO. (C is not uniquely decodable)
  1.2 if there exists a word s such that Cis = Cj or Ci = Cjs, then T ← T ∪ {s}
endfor
Step 2. for each pair of suffix word t ∈ T and codeword Ck ∈ C do (including words added to T during this loop)
  2.1 if t = Ck, then return NO. (C is not uniquely decodable)
  2.2 if there exists a word s such that ts = Ck or Cks = t, then T ← T ∪ {s}
endfor
Step 3. return YES. (C is uniquely decodable)
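The steps of UDC-Verification can be sketched in Python as follows. This is an illustrative sketch rather than the slides' own implementation: it assumes codewords are non-empty strings, and the names `is_uniquely_decodable` and `leftovers` are hypothetical. The set `pending` holds suffix words not yet compared with the codewords, and `seen` holds those already processed, so together they play the role of T.

```python
def is_uniquely_decodable(codewords):
    """Return True iff no suffix word of the code is itself a codeword."""
    words = list(codewords)
    if len(set(words)) < len(words):
        return False  # Step 1.1: two identical codewords
    code = set(words)

    def leftovers(a, b):
        """Set containing the non-empty s with a + s == b or b + s == a, if any."""
        if a != b and b.startswith(a):
            return {b[len(a):]}
        if a != b and a.startswith(b):
            return {a[len(b):]}
        return set()

    # Step 1: suffix words produced by pairs of distinct codewords.
    pending = set()
    for a in code:
        for b in code:
            pending |= leftovers(a, b)

    # Step 2: compare each suffix word with each codeword until T stops growing.
    seen = set()
    while pending:
        t = pending.pop()
        if t in code:
            return False  # Step 2.1: a suffix word is a codeword
        seen.add(t)
        for c in code:
            pending |= leftovers(t, c) - seen
    return True  # Step 3: no suffix word is a codeword

print(is_uniquely_decodable(["0", "10", "01"]))  # False
print(is_uniquely_decodable(["0", "10", "11"]))  # True
```

The loop terminates because every suffix word is a suffix of some codeword, so only finitely many distinct words can ever enter `pending`.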
Correctness of Algorithm
Theorem.
The algorithm UDC-Verification correctly verifies whether a
code C is uniquely decodable or not.
Proof.
We must prove: (1) each word s put into T in Step 1.2 or
Step 2.2 is a suffix word; (2) if the algorithm stops at Step 3,
then it has computed all the suffix words of code C
and ensured that none of them is a codeword.
Proof
(1). A word s put into T in Step 1.2 is a suffix word directly by the
definition (take m = n = 1). We next consider a word s put into T in
Step 2.2. As t is a suffix word, there exist codewords C1, C2, …, Cm and
C1’, C2’, …, Cn’ such that C1 ≠ C1’ and C1C2…Cmt = C1’C2’…Cn’.
If ts = Ck, then C1C2…CmCk = C1’C2’…Cn’s, indicating that s is a
suffix word.
If Cks = t, then C1C2…CmCks = C1’C2’…Cn’, indicating that s is a
suffix word.
Proof
(2). For each suffix word t of C, let m(t) = C1C2…Cm be the shortest
message satisfying C1C2…Cmt = C1’C2’…Cn’ with t a suffix
of Cn’. We prove by induction on the length of m(t) that t is
generated by the algorithm.
Basic Step: |m(t)| = 1. Then n = m = 1, so t is generated in Step 1.2.
Inductive Step: Suppose every suffix word p with |m(p)| < |m(t)|
has been generated by the algorithm; we now prove that t is
also generated by the algorithm. Because t is a suffix of Cn’,
there is a word p with pt = Cn’, and then C1C2…Cm = C1’C2’…Cn-1’p.
Proof
(i). If p = Cm, then Cmt = Cn’, so t is generated in Step 1.2.
(ii). If p is a suffix of Cm, then by C1C2…Cm = C1’C2’…Cn-1’p,
p is a suffix word. Since |m(p)| < |m(t)|, the inductive hypothesis
indicates that p has been generated by the algorithm. So when
Step 2 processes suffix word p and codeword Cn’, Step 2.2
puts t into T since pt = Cn’.
(iii). If Cm is a suffix of p, then Cmt is a suffix of Cn’, so Cmt is a
suffix word because C1C2…Cmt = C1’C2’…Cn’, and |m(Cmt)| ≤
|C1C2…Cm-1| < |m(t)|. The inductive hypothesis indicates that Cmt has
been generated by the algorithm. So when Step 2 processes suffix word
Cmt and codeword Cm, Step 2.2 puts t into T since Cm · t
= Cmt.
Time Complexity Analysis
Suppose there are n codewords in C, and the length of the
longest codeword is l. Then
- Step 1: O(n²l) comparisons.
- Step 2: the number of suffix words is at most O(nl), so O(n²l²)
comparisons and O(n²l²) insertions of suffix words into T.
- Total: O(n²l²).
Property of UDC: Kraft Inequality
1. Let C = {C1, C2, …, Cn} be a uniquely decodable code on an alphabet of
cardinality σ, and let li = |Ci| for 1 ≤ i ≤ n. Then
$$\sum_{i=1}^{n} \sigma^{-l_i} \le 1 \qquad \text{(Kraft Inequality)}$$
2. Conversely, if a set of integers {l1, l2, ..., ln} satisfies the Kraft inequality, then a
prefix code C = {C1, C2, …, Cn} can be found with codeword lengths {l1, l2, ...,
ln}.
Note:
- A prefix code C = {C1, C2, …, Cn} means that neither Ci nor Cj is a prefix of the other, for
each pair of codewords Ci and Cj (i ≠ j). Strictly speaking, such a code is called a prefix-free code.
- A prefix-free code is a UDC.
- Example: {00, 10, 11, 100, 111} is not prefix-free (10 is a prefix of 100), whereas
{00, 10, 11, 010, 011} is prefix-free with the same codeword lengths.
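The Kraft sum for a list of codeword lengths can be evaluated exactly with rational arithmetic. This is a small sketch; the function name `kraft_sum` and the default `sigma=2` (binary alphabet) are assumptions made for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths, sigma=2):
    """Return the sum of sigma**(-l) over the given codeword lengths, exactly."""
    return sum(Fraction(1, sigma ** l) for l in lengths)

print(kraft_sum([2, 2, 2, 3, 3]))  # 1
print(kraft_sum([1, 2, 2]))        # 1
print(kraft_sum([1, 1, 2]))        # 5/4
```

The lengths 1, 2, 2 belong both to the uniquely decodable code {0, 10, 11} and to the code {0, 10, 01}, which is not uniquely decodable. Satisfying the inequality therefore says only that some prefix code with these lengths exists, not that a given code with these lengths is a UDC.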
Proof of Property 1
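(in textbook, page 246):
- Let m be an arbitrary positive integer and let σ be the cardinality of the alphabet. Then
$$\left(\sum_{i=1}^{n} \sigma^{-l_i}\right)^{m} = \sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\cdots\sum_{i_m=1}^{n} \sigma^{-(l_{i_1}+l_{i_2}+\cdots+l_{i_m})}$$
- For each of the n^m messages consisting of m codewords, there is a unique
corresponding term in the above formula. Let N(m, j) be the number of
messages of length j consisting of m codewords. Then
$$\sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\cdots\sum_{i_m=1}^{n} \sigma^{-(l_{i_1}+l_{i_2}+\cdots+l_{i_m})} = \sum_{j=m}^{m\hat{l}} N(m,j)\,\sigma^{-j}$$
where $\hat{l}$ is the length of the longest codeword in C.
- Since C is uniquely decodable, there are no identical messages, so N(m, j) ≤ σ^j. We have
$$\sum_{j=m}^{m\hat{l}} N(m,j)\,\sigma^{-j} \le \sum_{j=m}^{m\hat{l}} \sigma^{j}\,\sigma^{-j} \le m\hat{l}$$
- So, for any positive integer m,
$$\left(\sum_{i=1}^{n} \sigma^{-l_i}\right)^{m} \le m\hat{l}$$
- If the sum exceeded 1, the left side would grow exponentially in m while the right side grows only linearly, so the bound would fail for large m. Hence the Kraft Inequality holds.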
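The counting step of this proof can be checked numerically on small codes. The sketch below assumes a binary alphabet by default; the function name `check_counting_step` is hypothetical. It builds all messages of m codewords, counts the distinct ones by length to obtain N(m, j), and compares the two sides of the identity.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def check_counting_step(code, m, sigma=2):
    """Return (lhs, rhs, bounded) for the identity used in the proof.

    lhs     = (sum of sigma**(-l_i)) ** m
    rhs     = sum over j of N(m, j) * sigma**(-j), counting distinct messages
    bounded = True iff N(m, j) <= sigma**j for every j
    """
    messages = {"".join(parts) for parts in product(code, repeat=m)}
    n_by_len = Counter(len(w) for w in messages)
    lhs = sum(Fraction(1, sigma ** len(c)) for c in code) ** m
    rhs = sum(Fraction(n_j, sigma ** j) for j, n_j in n_by_len.items())
    bounded = all(n_j <= sigma ** j for j, n_j in n_by_len.items())
    return lhs, rhs, bounded

print(check_counting_step(["0", "10", "11"], 3))  # lhs == rhs for a UDC
print(check_counting_step(["0", "10", "01"], 2))  # rhs < lhs: 0·10 and 01·0 collide
```

For the code {0, 10, 01} the two sides differ because two codeword sequences yield the same message, which is exactly where the proof uses unique decodability.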
Proof of Property 2
Let 1< 2 < … < m be m integers such that {l1, l2, …, ln} = {1,
2, …, m} when ignoring repeats. Let kj is the number of li’s
that equals to j. We should prove that, there exists a prefix
code C such that the number of codewords in C with length j is
kj. The Kraft Inequality becomes
Prove by induction that: For each 1 r  m, there exists prefix
code Cr such that for any 1  j  r, the number of codewords in
Cr with length j is kj.
n
k 
i 1
 2004, 2009 SDU
j
 i
1
16
Proof of Property 2
 Basic Step: r = 1, the above inequality means k1-1  1, which is k1 
1. Obviously there exist 1different words of length 1, we can
arbitrarily select k1 of them to form C1.
 Inductive Step: Suppose that
Cr exists for r < m, rwe
prove that Cr+1
1
r 1
exist for r +1  m. From  k j    1 , we have  k j      ,
r 1
i
which means
kr 1  
r 1
i 1
i 1
r 1
i
r
  k j r 1 i
i 1
Among the r+1 different words with length r+1, there are k  r 1  j
j
r
codewords with length j in C . So we can select kr+1different words
with length r+1, and the codewords in Cr are not prefix of them. So
we extend Cr to Cr+1.
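The inductive construction can be illustrated with a short sketch that assigns codewords in order of increasing length. It uses a counter-based selection (each new codeword is the next unused word of its length), which is one valid way of making the arbitrary choices in the proof, not necessarily the one intended on the slides. The function name `prefix_code_from_lengths` is hypothetical, and the sketch assumes positive integer lengths and an alphabet of at most 10 digit symbols.

```python
from fractions import Fraction

def prefix_code_from_lengths(lengths, sigma=2):
    """Build a prefix-free code over digits 0..sigma-1 with the given lengths.

    Assumes positive integer lengths and sigma <= 10. Codewords are
    returned in order of increasing length.
    """
    if sum(Fraction(1, sigma ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    digits = "0123456789"[:sigma]
    code = []
    value, prev = 0, 0  # next free word, read as a base-sigma number of `prev` digits
    for l in sorted(lengths):
        value *= sigma ** (l - prev)  # pad the counter out to length l
        word, v = "", value
        for _ in range(l):            # write `value` with exactly l base-sigma digits
            word = digits[v % sigma] + word
            v //= sigma
        code.append(word)
        value += 1                    # skip every word that has `word` as a prefix
        prev = l
    return code

print(prefix_code_from_lengths([2, 2, 2, 3, 3]))
# ['00', '01', '10', '110', '111']
```

Processing the shorter lengths first mirrors the induction on r: when a word of the next length is chosen, all words that extend an earlier codeword have already been skipped by the counter.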
Thanks for your attention!