Optimal Unequal Letter Cost Prefix-Free Codes
Quanquan Liu
1 Introduction
Variable-length codes are important in applications such as data compression. They were introduced in the seminal work of Shannon [4] in 1948. Variable-length codes differ from fixed-length codes in that they map source symbols to codewords of varying lengths, whereas fixed-length codes map all symbols to codewords of the same length. Variable-length codes often assign shorter codewords to symbols that occur with higher frequency and longer codewords to symbols that occur less frequently.
In mathematical terms, suppose our source alphabet is S and our code alphabet is T . A code,
C : S → T ∗ , is a total function that maps a symbol in S to a sequence of letters in T . This
sequence of letters from T is a codeword. An extended code, C ∗ : S ∗ → T ∗ , is a code that
maps sequences of source symbols to sequences of code letters. A code is uniquely decodable
if the extended code mapping is one-to-one. In other words, a code is uniquely decodable if no sequence of code letters can be decoded in more than one way. A prefix-free code is a uniquely decodable code with the additional restriction that no codeword is a prefix of another codeword. A prefix-free code is instantaneously decodable, which means that a codeword can be decoded immediately once the entire codeword has been seen.
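To make the prefix-free condition concrete, here is a small Python sketch (a toy example of our own, with codewords written over a two-letter alphabet '.' and '-') that checks whether no codeword is a prefix of another:

def is_prefix_free(codewords):
    """Return True if no codeword is a proper prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free([".", "-.", "--"]))   # True: no codeword begins another
print(is_prefix_free([".", ".-", "--"]))   # False: "." is a prefix of ".-"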
Given a code alphabet where all letters have equal costs, Huffman coding is a greedy algorithm
that finds an optimal uniquely decodable code for any probability distribution over a set of source
symbols or words. The uniquely decodable code offered by Huffman coding is also prefix-free.
However, when we consider code alphabets where letters have unequal costs, finding an optimal code is less obvious. One example of an unequal letter cost code is Morse code. In Morse code, a dash is twice as long as a dot; the cost of the dash is therefore twice the cost of the dot, since more time is needed to transmit a dash over a communication channel. The problem of finding an optimal code for an arbitrary set of unequal letter costs in polynomial time has been studied in several works, including [1] and [2], but remains an open question.
In this paper, we consider the simplified problem of finding an optimal code for unequal letter costs of 1 and 2 (as in Morse code) and focus on results that prove that this simplified problem can be solved in polynomial time. First, in Section 2 we define the notation and definitions we use throughout the paper. Then, in Section 3 we present some original insights into the simplified problem with letters of costs 1 and 2 only. Furthermore, in Section 4, we present a result that solves the unequal letter cost problem with integer costs in time polynomial in the number of source words when the maximum letter cost is a constant. The result thus implies a polynomial-time solution for the simplified, Morse-code case. Finally, in Section 5 we present some open questions relating to our observations on the simplified unequal letter cost problem.
2 Notations and Definitions
Let us first define the unequal letter cost code problem in more concrete terms.
Definition 1. Unequal Letter Costs Problem (ULC)[1]
Let Σ = {σ1 , σ2 , . . . , σn } be a set of letters in a coding alphabet. Let cost(σi ) denote the cost
of letter σi , where cost(σi ) ≥ 0 for all i. Given a probability distribution P = {p1 , p2 , . . . , pm }
on a set of m words, we may construct a uniquely decodable code χ where χj is the codeword for
the j-th word. Then, cost(χ_j) is the sum of the costs of all letters in codeword χ_j.
An optimal code is one of minimum cost, i.e., one for which ∑_{i=1}^{m} p_i · cost(χ_i) is minimized. The ULC problem seeks to find an optimal code in polynomial time.
We also define the simplification of ULC by the following definition.
Definition 2. Simplified Unequal Letter Costs Problem (SULC)
Let Σ = {σ_1, σ_2} consist of two letters with costs cost(σ_1) = 1 and cost(σ_2) = 2. Using the same notation as in Definition 1, an optimal code is one for which the cost ∑_{i=1}^{m} p_i · cost(χ_i) is minimized. We would like to find this minimum-cost code in polynomial time.
Henceforth, we refer to the general unequal letter costs problem as ULC and the simplified
version as SULC.
Furthermore, we define some terminology that is used throughout the rest of this paper. We
define cost(w) to be the cost of a codeword w. The probability that a source symbol w′ occurs is denoted p(w′).
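To make this cost function concrete, the following Python sketch (the codewords and probabilities are illustrative choices of ours) computes ∑ p_i · cost(χ_i) for a small prefix-free code over a two-letter alphabet with letter costs 1 and 2, as in SULC:

LETTER_COST = {".": 1, "-": 2}          # SULC letter costs: dot = 1, dash = 2

def codeword_cost(w):
    """cost(w): the sum of the costs of the letters in codeword w."""
    return sum(LETTER_COST[ch] for ch in w)

def expected_cost(codewords, probs):
    """The quantity minimized in Definitions 1 and 2: the sum of p_i * cost(chi_i)."""
    return sum(p * codeword_cost(w) for w, p in zip(codewords, probs))

# A toy prefix-free code for three source words.
print(expected_cost([".", "-.", "--"], [0.5, 0.25, 0.25]))   # 0.5*1 + 0.25*3 + 0.25*4 = 2.25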
3 SULC
There are several ways of approaching the SULC problem, some of which are more intuitive than
others. We first present in Section 3.1 an example of a very intuitive Huffman coding variation
adapted for SULC that fails to yield an optimal code. Then, using insights gained from this variation, we present in Section 3.2 a proof that no greedy algorithm of the simple form we define can produce an optimal code.
3.1 Simple Huffman Coding Variation
The first obvious Huffman coding scheme for SULC is the following. It is similar to the Huffman
coding scheme for equal cost letter codes because it builds a code tree from the bottom up. We
call this scheme the Simple Huffman Coding Variation (SHCV) scheme. Suppose we have a list
of probabilities, P = {p1 , . . . , pn }. At each step, it performs the following procedure:
1. Pick the probability p_i with the smallest value from P and assign it to the branch with cost 2.
2. Pick the second smallest probability p_j from P and assign it to the branch with cost 1.
3. Create a parent node and set its probability to p_i + p_j. Create an edge from this parent node to p_i and let that edge have cost 2. Create an edge from the parent to p_j and let that edge have cost 1. Remove p_i and p_j from P and add p_i + p_j to P. We call this step the Huffman reduction step.
4. Repeat this procedure until only one probability remains.
We see that the above algorithm is equivalent to the Huffman coding scheme for letters of
equal costs except that we choose to assign probabilities of smaller values to branches that have
greater costs.
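The following Python sketch implements SHCV as described above (the function name and the tie-breaking by insertion order are our own choices). On the distribution used in Fig. 1 it yields an expected cost of 27/8, matching the figure, although ties may permute which symbol receives which codeword cost:

import heapq
from fractions import Fraction

def shcv_costs(probs):
    """SHCV: repeatedly merge the two smallest probabilities, putting the smallest on the
    cost-2 branch and the second smallest on the cost-1 branch.  Returns the codeword cost
    of each input probability, indexed as in probs."""
    # Heap entries: (probability, tie-breaker, list of (original index, cost so far)).
    heap = [(p, i, [(i, 0)]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p_small, _, leaves_small = heapq.heappop(heap)   # smallest -> edge of cost 2
        p_next, _, leaves_next = heapq.heappop(heap)     # second smallest -> edge of cost 1
        merged = [(i, c + 2) for i, c in leaves_small] + [(i, c + 1) for i, c in leaves_next]
        heapq.heappush(heap, (p_small + p_next, counter, merged))   # Huffman reduction step
        counter += 1
    costs = [0] * len(probs)
    for i, c in heap[0][2]:
        costs[i] = c
    return costs

probs = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
costs = shcv_costs(probs)
print(costs, sum(p * c for p, c in zip(probs, costs)))   # expected cost 27/8, as in Fig. 1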
We look at an example to show that this coding scheme does not produce an optimal code in all cases. Suppose we have the probabilities {1/8, 1/8, 1/4, 1/4, 1/4} on a set of words. Let {A, B, C, D, E}
be the codewords for each of the 5 source words. Using SHCV given above, we create the following prefix-free code tree.
Figure 1: Let the probability distribution on a set of words be {1/8, 1/8, 1/4, 1/4, 1/4}. Let a dash, −, have cost 2 and a dot, ., have cost 1. Using SHCV, we devise the code tree above. The cost of each codeword is: A = 4, B = 3, C = 3, D = 4, E = 3. This gives an expected cost of 27/8.
Figure 2: A better code tree with the same source probability distribution as Fig. 1 can be found by switching codewords until the least frequent words are assigned the costliest codewords. Here, we switched A and E. The cost of each codeword is: A = 3, B = 3, C = 3, D = 4, E = 4. This gives an expected cost of 26/8.
Fig. 1 shows the code, C_m, obtained from using SHCV. The code produced by this scheme has cost cost(C_m) = 27/8. Fig. 2 shows the code, C′_m, obtained from a better code tree; the cost of the code it produces is cost(C′_m) = 26/8. Therefore, SHCV cannot be optimal. We use this concept of switching branches in Section 3.2 when we prove that any greedy strategy of the form we define cannot be optimal for the unequal letter cost problem.
3.2 Proof of Simple Greedy Non-Optimality
In this section, we prove that no simple greedy strategy that picks values based only on the relative ordering of word frequencies can produce an optimal code for SULC (and, hence, ULC). We first provide the definition of the greedy strategies we consider. Then, we prove that no such strategy is optimal.
Definition 3. A greedy strategy to the SULC problem first orders the source probabilities by values
to produce an ordered list of probabilities, P = [p1 , p2 , . . . , pk ] where pi represents the i-th
smallest probability. Then, it deterministically chooses the i-th smallest (resp. largest) from the
list to assign to the branch with cost 2 and the j-th smallest (resp. largest) for the branch of
cost 1. The greedy strategy deterministically chooses i and j independent of k and the values
of the probabilities. For example, given two sets of probabilities a1 ≤ a2 ≤ · · · ≤ ak and
b1 ≤ b2 ≤ · · · ≤ bk , it greedily chooses the probabilities after each Huffman reduction, using
the same i and j regardless of the values of the individual probabilities and regardless of k. After
picking i and j, Huffman reduction is performed and the code tree is built from the bottom up. If
either i or j falls outside of the range of values (i.e. i > |P | or j > |P |), we resolve the issue
using the following cases:
1. If i > |P |, set i = |P |. Similarly, if j > |P |, set j = |P |.
2. If i = j due to step 1, we set
(a) i = |P | − 1 and j = |P | if we are picking the i-th smallest and j-th smallest values
and i < j originally
(b) j = |P | − 1 and i = |P | if we are picking the i-th smallest and j-th smallest values
and i > j originally
(c) i = |P | − 1 and j = |P | if we are picking the i-th largest and j-th largest values and
i > j originally
(d) j = |P | − 1 and i = |P | if we are picking the i-th largest and j-th largest values and
i < j originally
(e) Otherwise, k = |P | and l = |P | − 1 where k = max(i, j) and l = min(i, j), breaking
ties arbitrarily.
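The following Python sketch (our own illustration) implements this greedy family for the "i-th and j-th smallest" variant; it clamps out-of-range indices to the largest available positions and omits the finer tie-breaking of cases 2(a)-(e):

def greedy_costs(probs, i, j):
    """Greedy family of Definition 3 with 1-indexed i and j, both referring to smallest
    values: at every Huffman reduction the i-th smallest remaining probability goes on the
    cost-2 branch and the j-th smallest on the cost-1 branch."""
    items = [(p, [(idx, 0)]) for idx, p in enumerate(probs)]
    while len(items) > 1:
        items.sort(key=lambda t: t[0])
        a, b = min(i, len(items)) - 1, min(j, len(items)) - 1
        if a == b:                                   # both indices clamped to the last position
            a, b = len(items) - 2, len(items) - 1
        (pa, la), (pb, lb) = items[a], items[b]
        merged = [(idx, c + 2) for idx, c in la] + [(idx, c + 1) for idx, c in lb]
        for pos in sorted((a, b), reverse=True):
            items.pop(pos)
        items.append((pa + pb, merged))              # Huffman reduction step
    costs = [0] * len(probs)
    for idx, c in items[0][1]:
        costs[idx] = c
    return costs

print(greedy_costs([1/8, 1/8, 1/4, 1/4, 1/4], 1, 2))   # [4, 3, 4, 3, 3]: an SHCV-style tree, expected cost 27/8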
Using Definition 3 (of which SHCV is the instance with i = 1 and j = 2, choosing smallest values), we prove that any greedy strategy defined in this way cannot produce an optimal code tree.
Theorem 1. There exists no greedy strategy as defined by Definition 3 that produces an optimal
prefix-free code for SULC.
Proof. Suppose that for a probability distribution, p1 , p2 , . . . , pk , the element with the i-th order
statistic is picked to occupy the branch of cost 2 and the element with the j-th order statistic is
picked to occupy the branch with cost 1. We assume all probabilities are distinct, otherwise, we
can create small perturbations so that the probabilities are distinct. Let p_l = ∞ when l > k. (This designation is only used as a placeholder in the proofs; we never place a probability value p_l with l > k in our code tree.) We show that for every pair of i and j, there exists a set of probabilities p_1, p_2, . . . , p_k such that the code tree produced is not optimal.
We do not have to consider the case where i = j since we pick 2 distinct elements. We
consider the other cases below. Suppose that for all the cases below, pi refers to the probability
picked from P during the first round of the procedure.
1. If p_i > p_j, then we may switch the codewords for the j-th and the i-th probabilities (from the first round) to produce a code with smaller expected cost. Suppose that L is the cost of the codeword for the i-th probability and L′ is the cost of the codeword for the j-th probability. Then L ≥ L′ + 1, so L − L′ ≥ 1. Suppose the cost of the code produced is C. Because p_i > p_j, if we switch the codewords, then C − p_i(L − L′) + p_j(L − L′) < C.
2. The case where pi < pj is more complex. The basic idea is that we want to perform
Huffman reduction 3 times each with a probability that is smaller than pj . Then, we would
have a probability, pu < pj , that is assigned to a codeword of smaller cost, which we can
switch with the codeword for p_j, resulting in a smaller expected cost for the code. See
Fig. 3 for an illustration of the proof procedure.
Figure 3: i and j refer to the i-th smallest and the j-th smallest probabilities in P. The procedure for proving a greedy algorithm non-optimal is to construct a set of probabilities such that there exists a subtree in which a greater probability is assigned to a branch of greater cost and a smaller probability is assigned to a branch of smaller cost. Then, we may switch the codewords associated with the two probabilities to achieve a code of smaller cost. In this figure, we perform the greedy algorithm as defined by Definition 3. In step c), we see that probability 0.1 is assigned to a codeword of cost c + 3 while probability 0.07 is assigned to a codeword of cost c + 2. Therefore, switching the codewords for the two probabilities results in a lower-cost code.
We have to consider four different cases:
(a) i and j refer to the i-th smallest and j-th smallest elements in P :
The case when j ≤ 2 is proven by the counterexample given in Fig. 2. The case
when j = 3 is proven by Fig. 4.
Therefore, we only consider the case when j > 3. First, suppose that |j − i| > 2. Let X = {i, i + 1, i + 2, j, j + 1, j + 2, j + 3}, and for each x ∈ X let p_x denote the x-th smallest element in the original list P. We need to pick probabilities that satisfy the following inequalities:
p_{j+1} < p_i + p_j < p_{j+2}    (1)
p_{j+2} < p_i + p_j + p_{i+1} < p_{j+3}    (2)
For k ≥ j + 3, we can choose probabilities that satisfy the above inequalities.
When the above inequalities are satisfied, then pi+2 will be assigned to a codeword of
less cost than the codeword assigned to pj , in which case, we may switch the codewords for the two probabilities and produce a code of smaller cost.
When |j − i| ≤ 2, we need to perform a slightly different analysis, but with the same
conclusion. We set k = j + 1. We need to find probabilities that satisfy the following
inequality:
p_i + p_j > p_{j+1}.    (3)
Then, for the case when |j − i| = 1, the probability p_{i−1} would be assigned to a cheaper codeword than the codeword assigned to probability p_j, so we may switch the two codewords to produce a code of smaller cost. When |j − i| = 2, the probability p_{i−2} will be assigned to a cheaper codeword than the codeword for p_j.
Figure 4: The list of probabilities P = [0.01, 0.02, 0.03, 0.94] admits the code tree shown. Any code tree created with either j = 3, i = 1 or j = 3, i = 2, where j and i index the j-th and i-th smallest values in P, results in a suboptimal code tree.
(b) i and j refer to the i-th and j-th largest values:
The cases |j − i| > 2 and |j − i| ≤ 2 reduce symmetrically to the cases given in part 2a of the proof. However, the setting in which both indices refer to largest elements is considerably easier than the one presented in part 2a. Suppose k > i + 2. Let Y = {i, i + 1, i + 2, j, j − 1}, and for each y ∈ Y let p_y denote the y-th largest element in the original list P of ordered probabilities. For both |j − i| > 2 and |j − i| ≤ 2, we need probabilities that satisfy the following inequalities:
p_i + p_j < p_{j−1}    (4)
p_i + p_j + p_{i+1} < p_{j−1}    (5)
p_i + p_j + p_{i+1} + p_{i+2} < p_{j−1}.    (6)
This case is simpler than 2a because the number of probabilities greater than pj never
decreases. Therefore, we only need to maintain the current sum as the j-th largest element in order to ensure that a source word with smaller frequency than pj is assigned
to a codeword that also has a smaller cost.
Here, p_{i+2} will be assigned to a codeword with smaller cost than the codeword for p_j. Therefore, we may switch the two codewords to obtain a code of smaller cost.
(c) i-th largest and j-th smallest elements: This case reduces to case 1 above, because for any i and j we may choose k such that p_i > p_j; it suffices to pick k with k − i > j, i.e., k > j + i. The switching argument of case 1 then applies.
(d) i-th smallest and j-th largest elements: This case is very simple. We pick k such that
(k − j) − i > 3 or k > 3 + j + i. Let Z = {i, i + 1, i + 2, k − j, k − j + 1}, and
z ∈ Z be the z-th smallest element in the original list P .
Then, we need to satisfy the following inequalities:
p_i + p_j < p_{k−j+1}    (7)
p_i + p_j + p_{i+1} < p_{k−j+1}    (8)
If we satisfy the above inequalities, p_{i+2} will be assigned to a codeword of smaller cost than the codeword assigned to p_j, so the two codewords may be switched to get a code with a smaller expected cost. See Fig. 5 for an example of this case.
Figure 5: Here, we take the i-th smallest probability and assign it to a branch of cost 2, and the j-th largest probability and assign it to the branch of cost 1. For the given set of probabilities, the greedy algorithm produces a pair of codewords that, when switched, yield a code of smaller cost.
Theorem 1 can be used to prove that a variety of strategies are sub-optimal; example strategies include picking the smallest and largest values each round. In the future, it might be fruitful to look at adaptive greedy strategies and see whether they can either yield an optimal tree or be proven sub-optimal. By adaptive strategies, we mean strategies that perform some computation on the values in P and choose i and j based on that computation.
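The switching argument used throughout this section can also be stated operationally. The following Python sketch (the function name and the particular cost assignment, which matches the 27/8 cost of Fig. 1, are our own) repeatedly swaps codewords whenever a more probable word holds a costlier codeword; each swap strictly lowers the expected cost:

def improve_by_switching(probs, costs):
    """Repeatedly apply the switching step of Section 3: whenever a more probable word has a
    costlier codeword than a less probable word, swap the two codewords.  Each swap strictly
    decreases the expected cost, so the loop terminates."""
    costs = list(costs)
    improved = True
    while improved:
        improved = False
        for a in range(len(probs)):
            for b in range(len(probs)):
                if probs[a] > probs[b] and costs[a] > costs[b]:
                    costs[a], costs[b] = costs[b], costs[a]
                    improved = True
    return costs

probs = [1/8, 1/8, 1/4, 1/4, 1/4]
costs = [3, 4, 4, 3, 3]                       # a 27/8 assignment of the Fig. 1 codeword costs
better = improve_by_switching(probs, costs)   # the two 1/8 words end up with the cost-4 codewords
print(sum(p * c for p, c in zip(probs, better)))   # 3.25 = 26/8, the cost achieved in Fig. 2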
4 Dynamic Programming Algorithm for Integer ULC [3]
In this section, we discuss the O(n^{C+2}) algorithm given in [3]. For the sake of clarity, we omit certain proofs from our discussion; for the complete proof of correctness of the algorithm, please refer to [3]. This algorithm optimally encodes n words in O(n^{C+2}) time if the costs of the code letters are integers between 1 and C.
4.1 Overview and Terminology
Suppose we are given a source with n symbols and a code alphabet A = {α_1, α_2, . . . , α_r} with r letters. Let c_i = cost(α_i) denote the cost (e.g., the transmission length) of letter α_i. Suppose that all c_i are integers and lie in the range [1, C]. Let P = {p_1, . . . , p_n} be the frequencies with which the source symbols occur.
All trees used in this algorithm will have edge lengths corresponding to the cost of the letter
designated by that edge. The main idea of the dynamic programming algorithm presented in [3]
is to construct a full tree from the top down. As the full tree is being constructed, we compute
the cost of the codewords that have been assigned so far. Then, we pick the tree that optimally
assigns the codewords to the frequencies among all possible full trees. We define a full tree as
one where all internal nodes of the tree have a full set of r children. Since a full tree may have
more than n leaves, we pad the probability sequence with 0's to create a new probability sequence P′ = {p_1, . . . , p_n, 0, 0, 0, . . .}.
Suppose we are given a tree T that may or may not be full. We define Fill(T) to be the full tree created from a code tree T by filling the internal nodes of T so that each internal node has r children. Since each of the nodes used to fill T has probability 0, we conclude that

cost(Fill(T)) ≤ cost(T).    (9)
To bound the maximum number of iterations of our dynamic programming algorithm, we need
to consider the maximum size of F ill(T ). Each internal node of T must have at least 2 children.
Suppose an internal node has only one child. Then, we merge the child with its parent to result
in a code of smaller expected cost, a contradiction. Since each internal node of T has at least two
children, T has at most n − 1 internal nodes: the worst case is a full binary tree with n leaves, which has ∑_{i=1}^{log n} n/2^i ≤ n − 1 internal nodes. Suppose that
Fill(T) has I internal nodes. A full tree with I internal nodes has at most

1 + Ir − I = 1 + (r − 1)I    (10)

leaves. When we create a full tree from T, we do not add any internal nodes; thus, the total number of leaves in Fill(T) is at most

1 + (r − 1)(n − 1) ≤ n(r − 1).    (11)
We have now transformed the problem from finding an optimal code tree for a set of n source symbols to finding an optimal full tree for a set of probabilities P′ with p_1 ≥ p_2 ≥ · · · ≥ p_n ≥ p_{n+1} ≥ · · ·, where p_i = 0 for i > n. The number of leaves, m, that this tree contains is bounded by n ≤ m ≤ n(r − 1). The lower bound is n because the tree must have enough leaves to encode all of the source symbols; the upper bound is given by Eq. 11. After we find an optimal full tree for P′, we can remove the 0-probability branches, and the tree that remains is an optimal code tree for the n source symbols.
We must introduce one more concept relating to levels in a tree before we explain in detail
the algorithm for finding an optimal full tree. We define the truncation of a tree to the i-th level
to be the portion of T that consists of the root and all nodes whose parents have depth at most i.
Formally, we let Trunc_i(T) be the truncation of T to level i:

Trunc_i(T) = root(T) ∪ {u ∈ T : depth(parent(u)) ≤ i}.    (12)
The purpose of introducing the concept of truncation is to facilitate the discussion of building
the tree level by level. We define level as the distance from the root. For example, a child of the
root with cost 2 will be represented by an edge of length 2 and will be at level 2. When we are at a
particular level, we must decide whether to make a certain subset of the nodes at the level leaves.
We let the signature of a level indicate the information about which nodes in the level are leaves
and which nodes have children. Let
sig_i(T) = (m; l_1, l_2, . . . , l_C)    (13)

be the signature for level i, where m indicates the number of leaves at level less than or equal to i in Trunc_i(T) and l_j is the number of leaves at level i + j. Given Trunc_i(T), the deepest level at which a child of a node at level i may appear is i + C, since C is the maximum letter cost. In mathematical terms,

m = |{v ∈ T : v is a leaf, depth(v) ≤ i}|    (14)

and

l_k = |{u ∈ Trunc_i(T) : depth(u) = i + k}|.    (15)
Note that by definition m + l_1 + · · · + l_C ≤ n(r − 1), since the maximum number of leaves in the full tree Fill(T) is n(r − 1).
Furthermore, we may express the cost of a truncation tree Trunc_i(T) in terms of its signature:

cost_i(T) = cost(Trunc_i(T)) = ∑_{k=1}^{m} depth(v_k) p_k + i · ∑_{k=m+1}^{n} p_k.    (16)

We do not have to consider p_k for k > n since all of those probabilities equal 0.
Finally, let OPT[m; l_1, . . . , l_C] be the minimum cost of a truncation tree with signature (m; l_1, . . . , l_C). Formally,

OPT[m; l_1, . . . , l_C] = min {cost_i(T) : i ≥ 0, sig_i(T) = (m; l_1, . . . , l_C)}.    (17)

4.2 Algorithm Design
In this section, we use the terminology introduced in Section 4.1 to describe the dynamic programming algorithm for integer letter costs.
The concept that remains to be explained is how to expand the tree level by level. Suppose that Trunc_i(T) is the truncation of the tree at level i. The question is how many different expansions to level i + 1 we have to compute.
Suppose that Trunc_i(T) has signature sig_i(T) = (m; l_1, . . . , l_C). The maximum number of leaves at level i + 1 is l_1. Therefore, we may choose any number q, with 0 ≤ q ≤ l_1, of these l_1 nodes to expand into internal nodes, each with a full set of r children, and keep the remaining l_1 − q nodes as leaves. The total number of possible expansions we need to consider is therefore l_1 + 1, one for each value of q.
Let d_k denote the number of code letters of cost k, so that each node expanded at level i + 1 gains d_k children at level i + 1 + k. Suppose we choose l_1 − q of the nodes at level i + 1 to remain as leaves; then the signature of the next level is

sig_{i+1}(T) = (m + l_1; l_2, . . . , l_C, 0) + q(−1; d_1, d_2, . . . , d_C).    (18)
With the new signature, we also need to update our cost function to reflect the cost added by the new level. Suppose that our current level i has signature sig_i(T) = (m; l_1, . . . , l_C). The cost at the new level is

cost_{i+1}(T) = cost_i(T) + ∑_{m<k≤n} p_k.    (19)
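As a small illustration, the following Python sketch (the function name and argument layout are ours) applies Eqs. 18 and 19 to perform one expansion step; probs is assumed to be sorted in non-increasing order:

def expand_level(sig, cost, q, d, probs):
    """One level-by-level expansion: q of the l_1 nodes at the next level become internal
    nodes with a full set of children, the other l_1 - q stay leaves.  sig is
    (m, l_1, ..., l_C) and d[k] is the number of code letters of cost k."""
    C = len(sig) - 1
    m, ls = sig[0], list(sig[1:])
    nxt = [m + ls[0] - q] + ls[1:] + [0]        # (m + l_1; l_2, ..., l_C, 0) ...
    for k in range(1, C + 1):
        nxt[k] += q * d[k]                      # ... + q * (-1; d_1, ..., d_C)   (Eq. 18)
    return tuple(nxt), cost + sum(probs[m:])    # Eq. 19: the still-pending symbols get deeper

# Letters of costs 1 and 2 (so d = [0, 1, 1]); expanding one of the root's pending children.
print(expand_level((0, 1, 1), 0.0, 1, [0, 1, 1], [0.25, 0.25, 0.25, 0.125, 0.125]))
# ((0, 2, 1), 1.0)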
Because of the importance of this step in reducing the runtime of the algorithm, we prove that
Eq. 19 holds.
Lemma 1. Suppose Trunc_i(T) is a truncation tree at level i with signature sig_i(T) = (m; l_1, l_2, . . . , l_C). If we expand the tree to level i + 1, the cost of the new Trunc_{i+1}(T) is

cost_{i+1}(T) = cost_i(T) + ∑_{m<k≤n} p_k.
Proof. The cost of the tree at level i + 1 can be determined from the cost contributed by the m leaves of Trunc_i(T), plus extra terms for all other symbols. Suppose that m′ = min{m + l_1 − q, n}. We may write the detailed cost function as follows:

cost_{i+1}(T) = ∑_{1≤k≤m} depth(v_k) p_k + ∑_{m<k≤m′} (i + 1) p_k + (i + 1) ∑_{m′<k≤n} p_k.    (20)
The first summation term accounts for the m leaves that were already present in Trunc_i(T) at level at most i. The second term comes from the leaves formed at level i + 1, because we chose l_1 − q of the nodes at level i + 1 to be leaves. The third term accounts for the remaining probabilities, whose leaves lie below level i + 1 in Trunc_{i+1}(T). We can simplify Eq. 20 to the following:
cost_{i+1}(T) = ∑_{1≤k≤m} depth(v_k) p_k + i · ∑_{m<k≤n} p_k + ∑_{m<k≤n} p_k    (21)
             = cost_i(T) + ∑_{m<k≤n} p_k.    (22)

Eq. 22 proves our lemma.
The process grows the levels of the tree sequentially. At every level, it considers the q values that lead to tree expansions at that iteration and computes the resulting OPT[m; l_1, . . . , l_C] value for each q. After computing these values, it moves on to the next level. We terminate when m ≥ n, because that means all of the n source symbols have been assigned to leaves in the code tree. We provide pseudocode for this procedure in Algorithm 1. Fig. 6 shows an example of
the algorithm for building the tree level by level.
Figure 6: Example of level by level expansion. The levels are indicated by the horizontal lines. [3]
Algorithm 1 Dynamic Programming for Integer Unequal Letter Costs
1: procedure TreeExpansion
2:   Set OPT[m; l_1, . . . , l_C] := ∞ for all entries where m + l_1 + · · · + l_C ≤ n(r − 1), and set OPT[0; d_1, . . . , d_C] := 0 (the signature of the expanded root). Let S be the list of entries sorted in lexicographically increasing order of the key (m + l_1 + · · · + l_C, m + l_1 + · · · + l_{C−1}, . . . , m + l_1, m).
3:   While the list S is not empty:
     • Remove the next entry OPT[m; l_1, . . . , l_C] from S. Set new_cost := OPT[m; l_1, . . . , l_C] + ∑_{m<k≤n} p_k.
     • For 0 ≤ q ≤ l_1:
       – Let (m′; l′_1, . . . , l′_C) := (m + l_1; l_2, . . . , l_C, 0) + q(−1; d_1, . . . , d_C).
       – If m′ + l′_1 + · · · + l′_C ≤ n(r − 1), then set OPT[m′; l′_1, . . . , l′_C] := min{OPT[m′; l′_1, . . . , l′_C], new_cost}.
4:   Return min{OPT[m; l_1, . . . , l_C] : m ≥ n} as the cost of an optimal code.
4.3 Algorithm Analysis and Implementation
In this section, we provide an analysis of the runtime of the algorithm introduced in Section 4.2 as
well as an implementation of the algorithm using standard graph search principles.
The algorithm described in Algorithm 1 can be performed by constructing a dynamic programming graph where the nodes are signature values and the edges represent a transition with value
q. For example, a possible vertex is given by v = (m; l_1, . . . , l_C). A transition of value q would connect v to the vertex v′ = (m + l_1; l_2, . . . , l_C, 0) + q(−1; d_1, . . . , d_C). Then, searching this graph allows us to determine an optimal value for each signature OPT[m; l_1, . . . , l_C]. We may maintain
a pointer, for each signature, to the expansion that achieves its optimal value. An optimal code tree can then be recovered by following these pointers back from the final signature.
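The following Python sketch (function and variable names are ours, not from [3]) implements this graph view of the dynamic program. It explores the signature graph with a Dijkstra-style search keyed on cost instead of the lexicographic order of Algorithm 1; since all edge weights are non-negative, the first settled signature with m ≥ n has the optimal cost:

import heapq

def optimal_code_cost(probs, letter_costs):
    """Cost of an optimal prefix-free code for integer letter costs, following the
    signature DP of Section 4: vertices are signatures (m, l_1, ..., l_C), an edge
    expands q of the pending l_1 nodes, and its weight is the Eq. 19 cost increment."""
    n, r, C = len(probs), len(letter_costs), max(letter_costs)
    probs = sorted(probs, reverse=True)              # p_1 >= p_2 >= ... >= p_n
    suffix = [0.0] * (n + 1)                         # suffix[m] = sum of p_k for k > m
    for m in range(n - 1, -1, -1):
        suffix[m] = suffix[m + 1] + probs[m]
    d = [0] * (C + 1)                                # d[k] = number of letters of cost k
    for c in letter_costs:
        d[c] += 1
    limit = n * (r - 1)                              # bound on the leaves of Fill(T)

    start = (0,) + tuple(d[1:])                      # signature of the expanded root
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, sig = heapq.heappop(heap)
        if cost > best.get(sig, float("inf")):
            continue                                 # stale heap entry
        m, ls = sig[0], list(sig[1:])
        if m >= n:
            return cost                              # every source symbol is a leaf
        new_cost = cost + suffix[m]                  # Eq. 19
        for q in range(ls[0] + 1):                   # expand q of the l_1 pending nodes
            nxt = [m + ls[0] - q] + ls[1:] + [0]     # Eq. 18: (m + l_1; l_2, ..., l_C, 0)
            for k in range(1, C + 1):
                nxt[k] += q * d[k]                   #         + q * (-1; d_1, ..., d_C)
            nxt = tuple(nxt)
            if sum(nxt) <= limit and new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf")

# SULC / Morse-like letter costs {1, 2} on the distribution of Figs. 1 and 2.
print(optimal_code_cost([1/8, 1/8, 1/4, 1/4, 1/4], [1, 2]))   # 3.25 = 26/8, the cost of Fig. 2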
We prove that the time it takes to run the TreeExpansion algorithm is O(C(n(r − 1))^{C+2}).
Theorem 2. TreeExpansion runs in O(C(n(r − 1))^{C+2}) time.
Proof. The initialization step enumerates the nodes of the dynamic programming graph. There are a total of |V| = O((n(r − 1))^{C+1}) nodes in the graph. Each vertex has l_1 + 1 = O(n(r − 1)) outgoing edges. Therefore, there are |E| = O((n(r − 1))^{C+2}) edges in total. Each time we traverse an edge, we have to update a signature, which takes O(C) time since we have to recompute m as well as the C values l_i. Therefore, the total time of the algorithm is O(C(n(r − 1))^{C+2}), because searching the graph takes O(|V| + |E|) time and recomputing a signature takes O(C) time per edge.
4.4 Algorithm Extension to Shorter Runtime by Pruning
In this section, we describe an improvement to the algorithm presented in Section 4.2 that achieves the O(n^{C+2}) runtime. For the sake of brevity, we only present a brief description of the improvement over the dynamic programming approach described above.
The idea of the runtime reduction is to only look at trees with at most n leaves (instead of trees with at most n(r − 1) leaves). This reduces the runtime by a factor of O((r − 1)^{C+2}). Therefore, instead of looking at all entries where m + l_1 + · · · + l_C ≤ n(r − 1) in step 2 of Algorithm 1, we look at entries where m + l_1 + · · · + l_C ≤ n. However, we can no longer specify an edge by q alone, since expanded nodes may end up with unequal numbers of children; we must also specify how many children each expanded node keeps.
Intuitively, it makes sense to keep only the shallowest pending leaves, because we can always make a codeword cost smaller by replacing a child of higher cost with a child of lower cost. A more rigorous proof is presented in [3]. Therefore, it is possible to enforce the restriction to at most n leaves by performing a reduce step each time we traverse an edge in our dynamic programming graph.
Suppose that v = (28; 8, 2, 4, 4) is the signature of one node in our dynamic programming
graph with n = 40. When we perform a reduction, reduce(v), we eliminate the leaves with the
greatest costs until we have n leaves. Therefore, reduce(v) = (28; 8, 2, 2, 0).
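A small Python sketch of the reduce step (the helper name is ours) reproduces this example; leaves are dropped from the deepest levels first until at most n remain:

def reduce_signature(sig, n):
    """Drop pending leaves from the deepest levels of a signature until at most n leaves remain."""
    m, ls = sig[0], list(sig[1:])
    excess = m + sum(ls) - n
    for k in range(len(ls) - 1, -1, -1):     # deepest level first
        if excess <= 0:
            break
        drop = min(ls[k], excess)
        ls[k] -= drop
        excess -= drop
    return (m,) + tuple(ls)

print(reduce_signature((28, 8, 2, 4, 4), 40))   # (28, 8, 2, 2, 0), matching reduce(v) above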
Algorithm 2 illustrates the process with the reduction step.

Algorithm 2 Dynamic Programming for Integer Unequal Letter Costs with Reduction
1: procedure TreeExpansionWithReduction
2:   Set OPT[m; l_1, . . . , l_C] := ∞ for all entries where m + l_1 + · · · + l_C ≤ n, and set OPT[0; d_1, . . . , d_C] := 0 (the signature of the expanded root). Let S be the list of entries sorted in lexicographically increasing order of the key (m + l_1 + · · · + l_C, m + l_1 + · · · + l_{C−1}, . . . , m + l_1, m).
3:   While the list S is not empty:
     • Remove the next entry OPT[m; l_1, . . . , l_C] from S. Set new_cost := OPT[m; l_1, . . . , l_C] + ∑_{m<k≤n} p_k.
     • For 0 ≤ q ≤ l_1:
       – Let (m′; l′_1, . . . , l′_C) := (m + l_1; l_2, . . . , l_C, 0) + q(−1; d_1, . . . , d_C).
       – If m′ + l′_1 + · · · + l′_C > n, replace (m′; l′_1, . . . , l′_C) with reduce(m′; l′_1, . . . , l′_C).
       – Set OPT[m′; l′_1, . . . , l′_C] := min{OPT[m′; l′_1, . . . , l′_C], new_cost}.
4:   Return OPT[n; 0, . . . , 0] as the cost of an optimal code.
From Algorithm 2, we may reduce the runtime of our dynamic programming approach to O(n^{C+2}).
Theorem 3. TreeExpansionWithReduction runs in O(n^{C+2}) time.
We present Theorem 3 without proof. For a detailed proof, please refer to [3].
5 Discussion
In this paper, we discussed the problem of unequal letter cost codes. Specifically, we presented
(what we believe are) some new results in the case of unequal letter cost where the letter costs are
1 and 2. Furthermore, we discussed a polynomial time algorithm for integer letter costs where the
costs are bounded by a constant C.
One open problem for future consideration is whether arbitrary greedy strategies (beyond the form discussed here) can be optimal. The algorithm presented in Section 4 relies on dynamic programming. Another question to consider is whether a deterministic algorithm that
does not rely on dynamic programming can be used to find an optimal code in the SULC case.
References
[1] Mordecai J. Golin, Claire Kenyon, and Neal E. Young. Huffman coding with unequal letter
costs. In Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing,
STOC ’02, pages 785–791, New York, NY, USA, 2002. ACM.
[2] Mordecai J. Golin and Jian Li. More efficient algorithms and analyses for unequal letter cost
prefix-free coding. CoRR, abs/0705.0253, 2007.
[3] Mordecai J. Golin and Günter Rote. A dynamic programming algorithm for constructing optimal prefix-free codes with unequal letter costs. IEEE Transactions on Information Theory,
44(5):1770–1781, 1998.
[4] C. E. Shannon. A mathematical theory of communication. SIGMOBILE Mob. Comput. Commun. Rev., 5(1):3–55, January 2001.