Uniquely Decodable Code
1. Related Notions
2. Determining UDC
3. Kraft Inequality
2004 SDU
A Resulting Problem
Given a coding scheme of the source symbols, how to verify
whether it is uniquely decodable or not?
2004, 2009 SDU
Related Notions
alphabet: Σ = {0, 1, …, σ−1}
symbol or letter: an element of the alphabet
word: a sequence of symbols of finite length
code: a collection of words on a specified alphabet
codeword: a word in a code
message: a sequence of codewords
Uniquely decodable code C: every message can be uniquely
decomposed into the codewords in C
Example: {0, 10, 01} is not uniquely decodable (010 = 0·10 = 01·0), whereas {0, 10, 11} is.
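The definition can be checked on small cases by counting how many ways a given string splits into codewords. The sketch below is only an illustration: the function name `count_decompositions` is made up here, and codewords are assumed to be non-empty strings.

```python
def count_decompositions(message, code):
    """Count the ways `message` splits into a sequence of codewords from `code`."""
    # ways[i] = number of decompositions of the prefix message[:i]
    ways = [0] * (len(message) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) decomposition
    for i in range(1, len(message) + 1):
        for word in code:
            # A decomposition of message[:i] ends with some codeword `word`
            if message[:i].endswith(word):
                ways[i] += ways[i - len(word)]
    return ways[len(message)]


print(count_decompositions("010", ["0", "10", "01"]))  # 2
print(count_decompositions("010", ["0", "10", "11"]))  # 1
```

A count above 1 for any string shows that the code is not uniquely decodable. A count of 1 for a single string proves nothing about the code as a whole.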
Related Notions
prefix and suffix: if w = ps, then p is a prefix of w and s is a suffix of w
empty word: a word of length 0
suffix word: a non-empty word t is called a suffix word if there
exist two messages C1C2…Cm and C1’C2’…Cn’ such that
Ci, Cj’ are all codewords for 1 ≤ i ≤ m, 1 ≤ j ≤ n, and C1 ≠ C1’,
t is a suffix of Cn’,
C1C2…Cm t = C1’C2’…Cn’.
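To make the definition concrete, the following sketch searches for suffix words directly from the definition by enumerating pairs of short messages. It is a brute-force illustration, not an efficient method: the name `suffix_words_bruteforce` and the bound `max_codewords` are assumptions introduced here, and suffix words that need longer messages are missed.

```python
from itertools import product


def suffix_words_bruteforce(code, max_codewords=2):
    """Suffix words witnessed by messages of at most `max_codewords` codewords."""
    found = set()
    for m in range(1, max_codewords + 1):
        for n in range(1, max_codewords + 1):
            for left in product(code, repeat=m):        # C1 C2 ... Cm
                for right in product(code, repeat=n):   # C1' C2' ... Cn'
                    if left[0] == right[0]:
                        continue                        # the definition needs C1 != C1'
                    a, b = "".join(left), "".join(right)
                    if len(a) < len(b) and b.startswith(a):
                        t = b[len(a):]                  # a + t == b, t non-empty
                        if len(t) <= len(right[-1]):    # t is a suffix of Cn'
                            found.add(t)
    return found


print(suffix_words_bruteforce(["0", "10", "01"]))
print(suffix_words_bruteforce(["0", "10", "11"]))
```

For {0, 10, 01} the search finds the suffix word 1 (from 0·1 = 01) and the suffix word 0 (from 01·0 = 0·10). The second code has no suffix words at all, because none of its codewords is a prefix of another.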
A Key Lemma for Determining UDC
Lemma.
A code C is uniquely decodable if and only if no suffix word
is a codeword in C.
Proof.
Suppose that C is uniquely decodable but some suffix word t is a
codeword in C. By the definition of suffix word, there exist two
messages C1C2…Cm and C1’C2’…Cn’ such that C1 ≠ C1’ and C1C2…Cm t
= C1’C2’…Cn’. Since t is a codeword, both sides are decompositions
into codewords, and they differ in the first codeword. Hence there
are two ways to decompose the message C1’C2’…Cn’, which contradicts
the assumption that C is a UDC.
Proof
Conversely, suppose that C is not uniquely decodable. Then there exists
some message that can be decomposed in more than one way.
Let w be such a message of least length, w = C1C2…Ck =
C1’C2’…Cn’, where Ci (1 ≤ i ≤ k) and Cj’ (1 ≤ j ≤ n) are all
codewords, and C1 ≠ C1’. Without loss of generality, assume that
Ck is a suffix of Cn’. By minimality of w, Ck ≠ Cn’, so Ck is a
suffix word (take t = Ck with the messages C1C2…Ck−1 and C1’C2’…Cn’).
This contradicts the assumption that no suffix word is a codeword in C.
UDC Verification
By the key lemma, if we can generate all the suffix words of a code C, then:
If none of the suffix words is a codeword in C, then C is uniquely
decodable.
If some suffix word is a codeword, then C is not uniquely
decodable.
The following determining algorithm follows directly from the key
lemma.
The Determining Algorithm
UDC-Verification(C)
Step 0. T ← ∅
Step 1. for each pair of codewords Ci, Cj ∈ C (i ≠ j) do
  1.1 if Ci = Cj, then return NO. (C is not uniquely decodable)
  1.2 if there exists a non-empty word s such that Ci s = Cj or Ci = Cj s, then
      T ← T ∪ {s}
endfor
Step 2. for each pair of suffix word t ∈ T and codeword Ck ∈ C do
        (including suffix words added to T during this loop)
  2.1 if t = Ck, then return NO. (C is not uniquely decodable)
  2.2 if there exists a non-empty word s such that ts = Ck or Ck s = t, then
      T ← T ∪ {s}
endfor
Step 3. return YES. (C is uniquely decodable)
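The procedure above (commonly known as the Sardinas-Patterson test) can be sketched in Python as follows. This is one possible rendering, not the slides' own code: the names `dangling` and `is_uniquely_decodable` are made up here, codewords are assumed to be non-empty strings, and Step 2 is run as a worklist so that suffix words added late are still processed.

```python
def dangling(a, b):
    """Return the non-empty word s with a + s == b, or None if there is none."""
    if len(a) < len(b) and b.startswith(a):
        return b[len(a):]
    return None


def is_uniquely_decodable(code):
    """Return True for YES and False for NO."""
    words = list(code)
    if len(set(words)) != len(words):
        return False                      # two codewords coincide
    codewords = set(words)

    # Step 1: suffixes left over when one codeword is a prefix of another.
    suffix_words = set()
    for ci in codewords:
        for cj in codewords:
            s = dangling(ci, cj)          # ordered pairs cover both directions
            if s is not None:
                suffix_words.add(s)

    # Step 2: compare every suffix word with every codeword until T stops growing.
    pending = list(suffix_words)
    while pending:
        t = pending.pop()
        if t in codewords:
            return False                  # a suffix word is a codeword
        for ck in codewords:
            for s in (dangling(t, ck), dangling(ck, t)):
                if s is not None and s not in suffix_words:
                    suffix_words.add(s)
                    pending.append(s)

    return True                           # Step 3


print(is_uniquely_decodable(["0", "10", "01"]))   # False
print(is_uniquely_decodable(["0", "10", "11"]))   # True
```

The loop terminates because every suffix word is a suffix of some codeword, so only finitely many distinct words can ever enter the set.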
Correctness of Algorithm
Theorem.
The algorithm UDC-Verification correctly verifies whether a
code C is uniquely decodable or not.
Proof.
We should prove: (1) Each word s put into T in Step 1.2 or
Step 2.2 is a suffix word. (2) If the algorithm stops at Step 3,
then it has computed all the suffix words of code C
and ensured that none of them is a codeword.
Proof
(1). The word s put into T in Step 1.2 is obviously a suffix word. We
next consider the word s put into T in Step 2.2. As t is a suffix
word, there exist codewords C1, C2, …, Cm and C1’, C2’, …, Cn’
such that C1 ≠ C1’ and C1C2…Cm t = C1’C2’…Cn’.
If ts = Ck, then C1C2…CmCk = C1’C2’…Cn’s, indicating s is a
suffix word (s is a suffix of Ck).
If Cks = t, then C1C2…CmCks = C1’C2’…Cn’, indicating s is a
suffix word (s is a suffix of t, hence of Cn’).
Proof
(2). For each suffix word t of C, let m(t) = C1C2…Cm be the shortest
message satisfying C1C2…Cm t = C1’C2’…Cn’, where C1 ≠ C1’ and t is a
suffix of Cn’. We prove by induction on the length of m(t) that t is
generated by the algorithm.
Basic Step: |m(t)| = 1, then n = m = 1, so t is generated in Step 1.2.
Inductive Step: Suppose every suffix word p with |m(p)| < |m(t)|
has been generated by the algorithm; we now prove that t is
also generated. Because t is a suffix of Cn’, we can write
Cn’ = pt for some word p, and then C1C2…Cm = C1’C2’…Cn−1’p.
Proof
(i). If p = Cm, then Cm t = Cn’, so t is generated in Step 1.2.
(ii). If p is a proper suffix of Cm, then by C1C2…Cm = C1’C2’…Cn−1’p,
p is a suffix word. Since |m(p)| < |m(t)|, the inductive hypothesis
indicates that p has been generated by the algorithm. So when
suffix word p and codeword Cn’ are considered in Step 2, Step 2.2
puts t into T since pt = Cn’.
(iii). If Cm is a proper suffix of p, then Cm t is a suffix of Cn’, and
Cm t is a suffix word because C1C2…Cm−1(Cm t) = C1’C2’…Cn’. Since
|m(Cm t)| ≤ |C1C2…Cm−1| < |m(t)|, the inductive hypothesis indicates
that Cm t has been generated by the algorithm. So when suffix word
Cm t and codeword Cm are considered in Step 2, Step 2.2 puts t into T,
since the suffix word Cm t is the codeword Cm followed by t.
Time Complexity Analysis
Suppose there are n codewords in C, and the length of the
longest codeword is l. Then
Step 1: O(n²l) comparisons.
Step 2: the number of suffix words is at most O(nl), so O(n²l²)
comparisons and O(n²l²) insertions of suffix words into T.
Total: O(n²l²).
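The O(nl) bound on the number of suffix words comes from the fact that every suffix word is a non-empty suffix of some codeword. A small sketch of that counting argument, with the helper name `candidate_suffix_words` chosen here purely for illustration:

```python
def candidate_suffix_words(code):
    """All non-empty suffixes of codewords; every suffix word is among them."""
    return {c[i:] for c in code for i in range(len(c))}


code = ["00", "10", "11", "100", "111"]
n = len(code)                      # number of codewords
l = max(len(c) for c in code)      # length of the longest codeword
candidates = candidate_suffix_words(code)
print(len(candidates), "<=", n * l)
```

Each codeword contributes at most l suffixes, so the set has at most nl elements, and T can never grow beyond that.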
Property of UDC: Kraft Inequality
1. Let C = {C1, C2, …, Cn} be a uniquely decodable code on an alphabet of
cardinality σ, and let li = |Ci| for 1 ≤ i ≤ n. Then we have the Kraft inequality
$$\sum_{i=1}^{n} \sigma^{-l_i} \le 1$$
2. Conversely, if a set of positive integers {l1, l2, ..., ln} satisfies the Kraft
inequality, then a prefix code C = {C1, C2, …, Cn} can be found with codeword
lengths {l1, l2, ..., ln}.
Note:
A prefix code C = {C1, C2, …, Cn} is one in which neither Ci nor Cj is a prefix of the other, for
each pair of codewords Ci and Cj (i ≠ j). Strictly speaking, it should be called a prefix-free code.
Every prefix-free code is a UDC.
Example: {00, 10, 11, 100, 111} vs {00, 10, 11, 010, 011}. Both have lengths {2, 2, 2, 3, 3},
but only the second is prefix-free (in the first, 10 is a prefix of 100).
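The Kraft sum is easy to evaluate exactly with rational arithmetic. A minimal sketch, assuming the alphabet cardinality is passed as `sigma` (a parameter name chosen here) and defaults to the binary case:

```python
from fractions import Fraction


def kraft_sum(lengths, sigma=2):
    """Exact value of the sum of sigma^(-l) over the given codeword lengths."""
    return sum(Fraction(1, sigma ** l) for l in lengths)


lengths = [len(c) for c in ["00", "10", "11", "010", "011"]]
print(kraft_sum(lengths))      # 1
print(kraft_sum([1, 1, 2]))    # 5/4
```

The second value exceeds 1, so by Property 1 no uniquely decodable binary code has codeword lengths 1, 1, 2. A sum of at most 1 only guarantees that some prefix code with those lengths exists, not that a particular code with those lengths is uniquely decodable.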
Proof of Property 1
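(in the textbook, page 246):
Let σ be the cardinality of the alphabet and let m be an arbitrary positive integer. Then
$$\left(\sum_{i=1}^{n} \sigma^{-l_i}\right)^{m} = \sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\cdots\sum_{i_m=1}^{n} \sigma^{-(l_{i_1}+l_{i_2}+\cdots+l_{i_m})}$$
For each of the n^m messages consisting of m codewords, there is a unique
corresponding term in the above formula. Let N(m, j) be the number of
messages of length j consisting of m codewords, and let $\hat{l}$ be the
length of the longest codeword in C. Then
$$\sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\cdots\sum_{i_m=1}^{n} \sigma^{-(l_{i_1}+l_{i_2}+\cdots+l_{i_m})} = \sum_{j=m}^{m\hat{l}} N(m,j)\,\sigma^{-j}$$
Since C is uniquely decodable, there are no identical messages, so N(m, j) ≤ σ^j.
We have
$$\sum_{j=m}^{m\hat{l}} N(m,j)\,\sigma^{-j} \le \sum_{j=m}^{m\hat{l}} \sigma^{j}\,\sigma^{-j} \le m\hat{l}$$
So, for any positive integer m,
$$\left(\sum_{i=1}^{n} \sigma^{-l_i}\right)^{m} \le m\hat{l}$$
If the sum were greater than 1, the left side would grow exponentially in m while
the right side grows only linearly, which is impossible for large m.
So the Kraft inequality holds.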
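The counting step of this proof can be checked numerically on a small code. The sketch below enumerates all sequences of m codewords, groups them by total length to obtain N(m, j), and compares both sides of the identity. The function name `counting_identity` and the parameter `sigma` are illustrative choices, and the enumeration is only feasible for small n and m.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def counting_identity(code, m, sigma=2):
    """Return both sides of the identity and the counts N(m, j)."""
    # N[j] = number of sequences of m codewords with total length j
    N = Counter(sum(len(c) for c in seq) for seq in product(code, repeat=m))
    lhs = sum(Fraction(1, sigma ** len(c)) for c in code) ** m
    rhs = sum(Fraction(count, sigma ** j) for j, count in N.items())
    return lhs, rhs, N


lhs, rhs, N = counting_identity(["0", "10", "11"], m=3)
print(lhs == rhs)                                        # both sides agree
print(all(count <= 2 ** j for j, count in N.items()))   # N(m, j) <= sigma^j
print(lhs <= 3 * 2)                                      # bound m * (longest length)

# For lengths 1, 1, 2 the Kraft sum is 5/4, and the bound fails once m is large:
print(Fraction(5, 4) ** 20 > 20 * 2)
```

The last line illustrates the final step of the argument: a Kraft sum above 1 eventually outgrows the linear bound, so such lengths cannot belong to a uniquely decodable code.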
Proof of Property 2
Let α1 < α2 < … < αm be m integers such that {l1, l2, …, ln} = {α1,
α2, …, αm} when ignoring repeats. Let kj be the number of li’s
that equal αj. We should prove that there exists a prefix
code C such that the number of codewords in C with length αj is
kj. With σ the cardinality of the alphabet, the Kraft inequality becomes
$$\sum_{j=1}^{m} k_j\,\sigma^{-\alpha_j} \le 1$$
Prove by induction that: for each 1 ≤ r ≤ m, there exists a prefix
code Cr such that for any 1 ≤ j ≤ r, the number of codewords in
Cr with length αj is kj.
Proof of Property 2
Basic Step: r = 1. The above inequality implies $k_1\sigma^{-\alpha_1} \le 1$, that is,
$k_1 \le \sigma^{\alpha_1}$. There exist $\sigma^{\alpha_1}$ different words of length α1, so we can
arbitrarily select k1 of them to form C1.
Inductive Step: Suppose that Cr exists for some r < m; we
prove that Cr+1 exists. From
$$\sum_{j=1}^{r+1} k_j\,\sigma^{-\alpha_j} \le 1$$
we have
$$\sum_{j=1}^{r+1} k_j\,\sigma^{\alpha_{r+1}-\alpha_j} \le \sigma^{\alpha_{r+1}}$$
which means
$$k_{r+1} \le \sigma^{\alpha_{r+1}} - \sum_{j=1}^{r} k_j\,\sigma^{\alpha_{r+1}-\alpha_j}$$
Among the $\sigma^{\alpha_{r+1}}$ different words of length αr+1, for each 1 ≤ j ≤ r there are
$k_j\,\sigma^{\alpha_{r+1}-\alpha_j}$ words that have a codeword of length αj in Cr as a prefix.
So we can select kr+1 different words of length αr+1 such that no codeword
in Cr is a prefix of any of them. Adding these words
extends Cr to Cr+1.
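The induction is constructive, and one way to carry it out is sketched below. Instead of selecting arbitrary unused words at each length, this version always takes the lexicographically smallest available one, which is one particular choice among those the proof allows. The function name `prefix_code_from_lengths` is made up for this sketch, and symbols are written as decimal digits, so it assumes an alphabet of at most 10 symbols.

```python
from fractions import Fraction


def prefix_code_from_lengths(lengths, sigma=2):
    """Build prefix-free codewords over {0, ..., sigma-1} with the given lengths."""
    if not 2 <= sigma <= 10:
        raise ValueError("this sketch writes symbols as single decimal digits")
    if sum(Fraction(1, sigma ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")

    code = [None] * len(lengths)
    value, prev = 0, 0   # next free word, read as a base-sigma number with `prev` digits
    # Process lengths in increasing order, as in the induction on r.
    for idx in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        l = lengths[idx]
        value *= sigma ** (l - prev)          # extend the next free word to length l
        digits, v = [], value
        for _ in range(l):                    # write `value` with exactly l digits
            v, d = divmod(v, sigma)
            digits.append(str(d))
        code[idx] = "".join(reversed(digits))
        value += 1                            # skip every word extending this codeword
        prev = l
    return code


print(prefix_code_from_lengths([2, 2, 2, 3, 3]))
```

Codewords are returned in the same order as the input lengths. Because each new codeword is numerically larger than every earlier codeword padded to the same length, no earlier codeword can be a prefix of it.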
Thanks for attention!