Lecture 16∗

∗© Himanshu Tyagi. Feel free to use with acknowledgement.
Agenda for the lecture
• Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code
• Variable-length source codes with error
16.1 Error-free coding schemes

16.1.1 The Shannon-Fano-Elias code
Our next coding scheme is Elias’s twist on the Shannon-Fano code of the previous class,
which gets rid of the requirement to sort the probabilities. However, this algorithmic
efficiency comes at the cost of 1 bit: the average length is now at most H(X) + 2 bits
instead of H(X) + 1 as before.
Input: Source distribution P = (p1 , ..., pm )
Output: Code C = {c1 , ..., cm }.
1. for i = 1, ..., m
(i) Let l(i) = ⌈− log pi⌉ + 1.
(ii) Compute Fi = Σ_{j<i} pj + pi/2 (so that F1 = p1/2). Let c denote the infinite
sequence corresponding to the binary representation of Fi. If Fi has a terminating
binary representation, append 0s at the end to make it an infinite sequence.
(iii) The codeword ci is given by the first l(i) bits of c, i.e., by the approximation of
c to l(i) bits.
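For concreteness, here is a minimal Python sketch of this encoder (the function name sfe_code is our own; it uses floating-point arithmetic, which is exact for the dyadic probabilities of the example below but may need more care in general):

    from math import ceil, log2

    def sfe_code(probs):
        # Shannon-Fano-Elias code: one codeword per symbol, in the given (unsorted) order.
        codewords = []
        F_prev = 0.0                         # cumulative probability of the earlier symbols
        for p in probs:
            length = ceil(-log2(p)) + 1      # l(i) = ceil(-log p_i) + 1
            F = F_prev + p / 2               # F_i = sum_{j<i} p_j + p_i/2
            bits = []
            for _ in range(length):          # first l(i) bits of the binary expansion of F_i
                F *= 2
                bit = int(F)
                bits.append(str(bit))
                F -= bit
            codewords.append("".join(bits))
            F_prev += p
        return codewords

    # Example from the last class: P(a, b, c, d) = (1/8, 1/2, 1/4, 1/8)
    print(sfe_code([1/8, 1/2, 1/4, 1/8]))    # ['0001', '01', '110', '1111'], matching Table 1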
For illustration, consider the example from the last class once more:

Alphabet   PX     Fi in binary   l(i)   codeword
a          1/8    0.0001         4      0001
b          1/2    0.011          2      01
c          1/4    0.11           3      110
d          1/8    0.1111         4      1111

Table 1: An illustration of the Shannon-Fano-Elias code.

From the length
assignments it is clear that the average length of this code is less than H(X) + 2. It only
remains to verify that this code is prefix-free.
Theorem 16.1. A Shannon-Fano-Elias code is prefix-free.
Proof. Consider Fi and Fj such that i < j. Then, as noted in the analysis of Shannon-Fano
codes, the codeword cj satisfies

0.cj ≤ Fj ≤ 0.cj + 2^{−l(j)} ≤ 0.cj + 2^{log pj − 1} = 0.cj + pj/2.

Thus,

0.ci ≤ Fi ≤ Fj − (pi + pj)/2 ≤ 0.cj − pi/2.
In particular, 0.cj > 0.ci and so cj cannot be a prefix of ci. On the other hand,
0.cj ≥ 0.ci + pi/2 ≥ 0.ci + 2^{−l(i)}. Note that since both ci and cj are of finite lengths
and ci is of length l(i), ci can be a prefix of cj only if

0.cj < 0.ci + 2^{−l(i)},

which does not hold.
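As a quick sanity check, the following short Python snippet (ours, not part of the notes) verifies directly that no codeword of Table 1 is a prefix of another:

    codewords = ["0001", "01", "110", "1111"]       # Table 1
    assert not any(u != v and v.startswith(u) for u in codewords for v in codewords)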
16.1.2 Huffman code
We next present the Huffman code, which still requires us to sort the probabilities, but has
average length exactly equal to L^p(X), i.e., it is of optimal average length. Unlike the two
coding schemes earlier, we now represent the prefix-free code by a binary tree.
Input: Source distribution P = (p1 , ..., pm )
Output: Code C = {c1 , ..., cm }.
1. Associate with each symbol a leaf node.
2. while m ≥ 2, do
(i) If m = 2, assign the two symbols as the left and the right child of the root.
Update m to 1.
(ii) Sort the probabilities of the symbols in descending order. Let p1 ≥ p2 ≥ ... ≥ pm
be the sorted sequence of probabilities.
(iii) If m > 2, assign the (m − 1)th and the mth symbols as the left and the right child
of a new node. Treat this node and its subtree as a new symbol which occurs with
probability (pm−1 + pm), with the new node playing the role of its leaf. Update
m → m − 1 and P → (p1, ..., pm−2, pm−1 + pm).
3. Generate a binary code from the tree by putting a 0 over each edge to a left child
and a 1 over each edge to a right child.
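The following Python sketch implements the same merging procedure, using a min-heap instead of repeated sorting (the function name huffman_code and the tie-breaking counter are our own choices; the exact codewords depend on how ties and left/right assignments are resolved):

    import heapq
    from itertools import count

    def huffman_code(pmf):
        # Huffman code for a pmf given as {symbol: probability}; returns {symbol: bit string}.
        code = {s: "" for s in pmf}
        tiebreak = count()                   # breaks ties so symbol lists are never compared
        heap = [(p, next(tiebreak), [s]) for s, p in pmf.items()]
        heapq.heapify(heap)
        while len(heap) >= 2:
            p1, _, left = heapq.heappop(heap)    # the two least probable (super)symbols
            p2, _, right = heapq.heappop(heap)
            for s in left:                       # 0 on every edge to a left child
                code[s] = "0" + code[s]
            for s in right:                      # 1 on every edge to a right child
                code[s] = "1" + code[s]
            heapq.heappush(heap, (p1 + p2, next(tiebreak), left + right))
        return code

    pmf = {"a": 1/8, "b": 1/2, "c": 1/4, "d": 1/8}
    code = huffman_code(pmf)
    print(code)       # one optimal code, e.g. {'a': '110', 'b': '0', 'c': '10', 'd': '111'}
    print(sum(p * len(code[s]) for s, p in pmf.items()))   # average length 1.75 = 7/4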
For illustration, consider our foregoing example once more. The algorithm proceeds as
follows (the reader can easily decode our representation of the evolving tree):

((a, 0.125); (b, 0.5); (c, 0.25); (d, 0.125))
((b, 0.5); (c, 0.25); (ad, 0.25))
((b, 0.5); (c(ad), 0.5))
(b(c(ad)), 1).
In summary, we have the following code:

Alphabet   PX     codeword
a          1/8    110
b          1/2    0
c          1/4    10
d          1/8    111

Table 2: An illustration of the Huffman code.

The average length for this example is 7/4, which is
equal to the entropy. Therefore, since the entropy is a lower bound for the average length
of any prefix-free code, the average length must be optimal. In fact, the Huffman code
always attains the optimal average length.
Theorem 16.2. A Huffman code has optimal average length.
We only provide a sketch of the proof. An interested reader can find the proof in Cover
and Thomas.
Proof sketch. The proof relies on the following observation:
There exists a prefix-free code of optimal average length which assigns the two least
probable symbols to two longest codewords of length lmax such that the first lmax − 1
bits of the two codewords are the same.
Next, we show the optimality of the Huffman code by induction on the number of symbols.
Denote by L(P) the minimum average length of a prefix-free code for P. Suppose the Huffman
code attains the optimal average length for every probability distribution over an alphabet of
size m − 1. For P = (p1, ..., pm), consider the prefix-free code C of average length L(P)
guaranteed by the observation above. Let LH(P) denote the average length of the Huffman
code for P. Then,
L((p1, ..., pm)) ≤ LH((p1, ..., pm))
= (pm−1 + pm) + LH((p1, ..., pm−2, pm−1 + pm))
= (pm−1 + pm) + L((p1, ..., pm−2, pm−1 + pm)),

where the first equality holds since the Huffman code for (p1, ..., pm) is obtained from the
Huffman code for (p1, ..., pm−2, pm−1 + pm) by extending the codeword of the merged symbol
by one bit, and the second equality is by the induction hypothesis. On the other hand, by the
property of the optimal code C, it also yields a prefix-free code for (p1, ..., pm−2, pm−1 + pm)
of average length L((p1, ..., pm)) − (pm−1 + pm). But then

L((p1, ..., pm)) ≥ (pm−1 + pm) + L((p1, ..., pm−2, pm−1 + pm)).
Thus, by combining all the bounds above, all inequalities must hold with equality. In
particular,
L((p1 , ..., pm )) = LH ((p1 , ..., pm )).
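Since, by the Kraft inequality, a prefix-free code with codeword lengths l1, ..., lm exists if and only if Σ_i 2^{−li} ≤ 1, optimality can also be checked by brute force over length assignments for small alphabets. The sketch below (our own illustration, not from the notes) confirms that 7/4 is indeed the minimum average length for the running example:

    from itertools import product

    def optimal_avg_length(probs, max_len=None):
        # Brute-force minimum of sum_i p_i * l_i over integer lengths satisfying Kraft's inequality.
        m = len(probs)
        max_len = max_len if max_len is not None else m   # lengths up to m suffice here
        best = float("inf")
        for lengths in product(range(1, max_len + 1), repeat=m):
            if sum(2.0 ** (-l) for l in lengths) <= 1:    # Kraft: a prefix-free code exists
                best = min(best, sum(p * l for p, l in zip(probs, lengths)))
        return best

    print(optimal_avg_length([1/8, 1/2, 1/4, 1/8]))       # 1.75, the Huffman average length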
16.2 Variable-length source codes with error
Thus far we have seen performance bounds for error-free codes and specific schemes which
come close to these bounds. We now examine what we have to gain if we allow a small
probability of error ε. For fixed-length codes, we have already addressed this question.
There the gain depends on the upper bound for the entropy density h(X) that holds with
large probability. However, asymptotically the gain due to error was negligible, since the
optimal asymptotic rate is independent of ε (the strong converse). In the case of
variable-length codes the situation is quite different: allowing error results in significant
gains when we are trying to compress a single symbol, as well as a rate gain asymptotically.
To make our exposition simple, we allow randomized encoding, where in addition to the
source symbol the encoder also has access to a biased coin toss. We show that Lε(X), the
optimal average length when a probability of error of at most ε is allowed, satisfies
Lε(X) ≤ (1 − ε)L(X). In fact, we prove a stronger statement.
Lemma 16.3. Consider a source with pmf P over a discrete alphabet X. Given an error-free
code of average length l and minimum codeword length lmin, there exists a variable-length
code which allows a probability of error of at most ε and has average length no more than
(1 − ε)l + ε·lmin.
Proof. Consider a code C with average length l, and let c0 be a codeword of minimum
length lmin. We define a new code where for each symbol x ∈ X the encoder flips a coin
which shows heads with probability 1 − ε, and outputs the codeword in C for x if the coin
shows heads or the codeword c0 if it shows tails. The decoder, upon observing a codeword
in C, simply outputs the corresponding symbol under C. Note that an error can occur only
if the coin used in encoding showed tails. Therefore, the probability of error is at most ε.
Also, the average length of the code must now be averaged over both the coin toss and the
source distribution. Given that the coin showed heads, the average length of the code is
l. Given that the coin showed tails, the average length of the code is lmin. Therefore, the
overall average length equals (1 − ε)l + ε·lmin.
Note that lmin for an optimal nonsingular code is 0. Therefore, there exists a code with
probability of error at most ε and average length no more than (1 − ε)L(X).
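To illustrate the construction numerically, here is a small Python sketch (the helper randomized_code_stats is our own, not from the notes) that evaluates the average length and the error probability of the randomized code built from the Huffman code of Table 2:

    def randomized_code_stats(pmf, code, eps):
        # Average length and error probability of the randomized code in the proof of
        # Lemma 16.3, built from an error-free code given as {symbol: codeword}.
        s0 = min(code, key=lambda s: len(code[s]))            # symbol whose codeword c0 is shortest
        l_min = len(code[s0])
        l = sum(p * len(code[s]) for s, p in pmf.items())     # error-free average length
        avg_len = (1 - eps) * l + eps * l_min                 # (1 - eps) l + eps l_min
        p_err = eps * (1 - pmf[s0])      # error iff the coin shows tails and the symbol is not s0
        return avg_len, p_err

    pmf = {"a": 1/8, "b": 1/2, "c": 1/4, "d": 1/8}
    code = {"a": "110", "b": "0", "c": "10", "d": "111"}      # Huffman code of Table 2
    print(randomized_code_stats(pmf, code, eps=0.1))
    # approximately (1.675, 0.05): length (1 - eps) l + eps l_min, error probability <= eps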