
An Approximation Algorithm for Binary Searching in Trees ∗
Eduardo Laber†
Marco Molinaro‡
Abstract
We consider the problem of computing efficient strategies for searching in trees. As a
generalization of the classical binary search for ordered lists, suppose one wishes to find a
(unknown) specific node of a tree by asking queries to its arcs, where each query indicates
the endpoint closer to the desired node. Given the likelihood of each node being the one
searched, the objective is to compute a search strategy that minimizes the expected number
of queries. Practical applications of this problem include file system synchronization and
software testing. Here we present a linear time algorithm which is the first constant factor
approximation for this problem. This represents a significant improvement over the previous
O(log n)-approximation.
Keywords: Binary search, partially ordered sets, trees, approximation algorithms
1 Introduction
Searching in ordered structures is a fundamental problem in theoretical computer science. In
one of its most basic variants, the objective is to find a special element of a totally ordered set
by making queries which iteratively narrow the possible locations of the desired element. This
can be generalized to searching in more general structures which have only a partial order for
their elements instead of a total order [5, 20, 4, 23, 21].
In this work, we focus on searching in structures that lay between totally ordered sets and
the most general posets: we wish to efficiently locate a particular node in a tree. More formally,
as input we are given a rooted tree T = (V, E) which has a ‘hidden’ marked node and a function
w : V → R that gives the likelihood of a node being the one marked. For example, T could be
modeling a network with one defective unit. In order to discover which node of T is marked,
we can perform edge queries: after querying the arc (i, j) of T (j being a child of i)1 , we receive
an answer stating that either the marked node is a descendant2 of j (called a yes answer) or
that the marked node is not a descendant of j (called a no answer).
A search strategy is a procedure that decides the next query to be posed based on the
outcome of previous queries. As an example, consider the strategy for searching the tree T of
Figure 1.a represented by the decision tree D of Figure 1.b. A decision tree can be interpreted
as a strategy in the following way: at each step we query the arc indicated by the node of
D that we are currently located. In case of a yes answer, we move to the right child of the
current node and we move to its left child otherwise. We proceed with these operations until
the marked node is found. Let us assume that 4 is the marked node in Figure 1.a. We start at
∗ Preliminary version in ICALP 2008.
† [email protected]. Informatics Department, PUC-Rio. Corresponding address: Informatics Department of PUC-Rio, Rua Marquês de São Vicente 225, RDC, 4o andar, CEP 22453-900, Rio de Janeiro - RJ, Brazil.
‡ [email protected]. Tepper School of Business, Carnegie Mellon University.
1 Henceforth, when we refer to the arc (i, j), j is a child of i.
2 We assume that a node is a descendant of itself.
the root of D and query the arc (3, 4) of T , asking if the marked node is a descendant of node
4 in T . Since the answer is yes, we move to the right child of (3, 4) in D and we query the arc
(4, 6) in T . In this case, the outcome of the query (4, 6) is no and then we move to node (4, 5)
of D. By querying this node we conclude that the marked node of T is indeed 4.
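The walkthrough above can be traced in code. The representations below are our own illustrative choice, not notation from the paper: the tree T is a dict of children lists, and a decision-tree node is a dict with either a "leaf" key or an "arc" key plus "yes"/"no" children; the decision tree encodes Figure 1.b as we read it.

```python
def is_descendant(children, ancestor, node):
    """True if `node` lies in the subtree rooted at `ancestor`
    (a node is a descendant of itself, per footnote 2)."""
    stack = [ancestor]
    while stack:
        u = stack.pop()
        if u == node:
            return True
        stack.extend(children.get(u, []))
    return False

def search(decision_tree, children, marked):
    """Follow the decision tree, answering each arc query (i, j) with
    'yes' iff `marked` is a descendant of j; returns (found, #queries)."""
    node, queries = decision_tree, 0
    while "leaf" not in node:
        i, j = node["arc"]
        queries += 1
        node = node["yes"] if is_descendant(children, j, marked) else node["no"]
    return node["leaf"], queries

# Tree T of Figure 1.a and (our reading of) the decision tree D of Figure 1.b.
children = {1: [2, 3], 3: [4], 4: [5, 6]}
D = {"arc": (3, 4),
     "yes": {"arc": (4, 6), "yes": {"leaf": 6},
             "no": {"arc": (4, 5), "yes": {"leaf": 5}, "no": {"leaf": 4}}},
     "no": {"arc": (1, 2), "yes": {"leaf": 2},
            "no": {"arc": (1, 3), "yes": {"leaf": 3}, "no": {"leaf": 1}}}}

print(search(D, children, 4))  # the walkthrough above: (4, 3), i.e. 3 queries
```

Running it for the marked node 4 reproduces the three queries of the walkthrough: (3, 4) yes, (4, 6) no, (4, 5) no.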
Figure 1: (a) Tree T. (b) Example of a decision tree for T; internal nodes correspond to arcs of T and leaves to nodes of T.
We define the expected number of queries of a strategy S as ∑_{v∈V} s_v·w(v), where s_v is the number of queries needed by S to find the marked node when v is the marked node. Therefore,
our optimization problem is to find the strategy with minimum expected number of queries.
We make a more formal definition of a strategy by means of decision trees in Section 2.1.
Besides generalizing a fundamental problem in theoretical computer science, searching in
posets (and in particular in trees) also has practical applications in file system synchronization
and software testing [21]. We remark that although these applications were considered in the
‘worst case’ version of this problem, taking into account the likelihood that the elements are
marked (for instance via code complexity measures in the former example) may lead to improved
searches.
Apart from the above mentioned applications, strategies for searching in trees have a potential application in the context of asymmetric communication protocols [1, 2, 10, 16, 25]. In
this scenario, a client has to send a binary string x ∈ {0, 1}t to the server, where x is drawn
from a probability distribution D that is only available to the server. The asymmetry comes
from the fact that the client has much larger bandwidth for downloading than for uploading.
In order to benefit from this discrepancy, both parties agree on a protocol to exchange bits
until the server learns the string x. Such a protocol typically aims at minimizing the number of
bits sent by the client (although other factors such as the number of bits sent by the server
and the number of communication rounds should also be taken into account). In one of the
first proposed protocols [2, 16], at each round the server sends to the client a binary string y
and the client replies with a single bit depending on whether y is a prefix of x or not. Based
on the client’s answer, the server updates its knowledge about x and sends another string if it
has not learned x yet. It is not difficult to realize that this protocol corresponds to a strategy
for searching a marked leaf in a complete binary tree of height t, where only the leaves of the
tree have non-zero probability. In fact, the binary strings in {0, 1}t can be represented by a
complete binary tree of height t where every edge that connects a node to its left (right) child
is labeled with 0 (1). This generates a one-to-one correspondence between binary strings with length
at most t and the edges of the tree and, as a consequence, the message y sent by the server
naturally corresponds to an edge query.
Statement of the results. Our main result is a linear-time algorithm that provides the
first constant-factor approximation for the problem of searching in trees where the goal is to
minimize the expected number of queries. The algorithm is based on the decomposition of the
input tree into special paths. A search strategy is computed for each of these paths and then
combined to form a strategy for searching the original tree. This decomposition is motivated
by the fact that the problem of binary searching in paths is easily reduced to the well-solved
problem of searching in ordered lists with access probabilities [14].
We note that the complexity of this problem remains open, which contrasts with
its ‘worst case’ version that is polynomially solvable [4, 23, 21]. In fact, the ‘average case’
version studied here is one more example of problems related to searching and coding whose
complexities are unknown [8, 3, 12, 11].
Related work. Searching in totally ordered sets is a very well-studied problem [14]. In
addition, many variants have also been considered, such as when there is information about the
likelihood of each element being the one marked [7], or where each query has a different fixed
cost and the objective is to find a strategy with least total cost [13, 17, 18]. As a generalization
of the latter, [22, 24] considered the variant when the cost of each query depends on the queries
posed previously.
The most general version of our problem when the input is a poset instead of a tree was first
considered by Lipman and Abrahams [20]. Apart from introducing the problem, they present
an optimized exponential time algorithm for solving it. In [15], Kosaraju et al. present a greedy
O(log n)-approximation algorithm. In fact, their algorithm handles more general searches, see
[19, 6] for other more general results. To the best of our knowledge, this O(log n)-approximation
algorithm is the only available result, with theoretical approximation guarantee, for the average
case version of searching in trees. Therefore, our constant approximation represents a significant
advance for this problem.
The variant of our problem where the goal is to minimize the number of queries in the
worst case for searching in trees, instead of minimizing the average case, was first considered
by Ben-Asher et al. [4]. They have shown that it can be solved in O(n⁴ log³ n) time via dynamic
programming. Recent advances [23, 21] have reduced the time complexity to O(n³) and then to
O(n). In contrast, the more general version of the worst case minimization where the input is
a poset instead of a tree is NP-hard [5].
2 Preliminaries

2.1 Notation
For any tree T and node j ∈ T , we denote by Tj the subtree of T composed by all descendants
of j. In addition, we denote the root of a tree T by r(T ). We also extend the set difference
operation to trees: given trees T¹ = (V¹, E¹) and T² = (V², E²), T¹ − T² is the forest of T¹ induced by the nodes V¹ − V². Furthermore, we extend the weight function to a subtree T′ in the standard way, w(T′) = ∑_{v∈T′} w(v). Finally, for any multiset W of non-negative real values, we define its entropy as:

    H(W) = − ∑_{w∈W} w·log₂ (w / ∑_{w′∈W} w′)
We remark that this seemingly unnatural extension of the definition of entropy for unnormalized multisets greatly simplifies the notation in the proofs. Its homogeneity is a particularly
important property. We should also mention that, although not explicit in some passages, all
the logarithms in this text are in base 2.
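As a quick sanity check of this definition and of the homogeneity property mentioned above (scaling all weights by c scales H by c), here is a direct translation into code; this is our own sketch, with the convention that zero weights contribute nothing.

```python
import math

def entropy(weights):
    """H(W) = -sum_{w in W} w * log2(w / sum(W)) for an unnormalized
    multiset of non-negative weights; zero weights contribute 0."""
    total = sum(weights)
    return -sum(w * math.log2(w / total) for w in weights if w > 0)

# For normalized weights this is the usual Shannon entropy:
print(entropy([0.25, 0.25, 0.5]))   # 1.5
# Homogeneity: scaling the multiset by 4 scales the entropy by 4:
print(entropy([1, 1, 2]))           # 6.0
```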
Every query strategy for a tree T can be represented by a binary decision tree D such that
a path of D indicates what queries should be made at each step. In a decision tree D, each
internal node corresponds to a query for a different arc of T and each leaf of D corresponds to a
different node of T (these correspondences shall become clear later). In addition, each internal
node u of D satisfies a search property that can be described as follows. If u corresponds to a
query for the arc (i, j), then: (i) u has exactly one right subtree and one left subtree; (ii) all
nodes of the right subtree of u correspond to either a query for an arc in Tj or to a node in Tj ;
(iii) all nodes of the left subtree of u correspond to either a query for an arc in T − Tj or to a
node in T − Tj .
For any decision tree D, we use u(i,j) to denote the internal node of D which corresponds
to a query for the arc (i, j) of T and uv to denote the leaf of D which corresponds to the node
v of T .
From the example presented in the introduction, we can infer an important property of the
decision trees. Consider a tree T and a search strategy given by a decision tree D for T . If v is
the marked node of T , the number of queries posed to find v is the distance (in arcs) from the
root of D to uv .
For any decision tree D, we define d(u, v, D) as the distance between nodes u and v in D
(when the decision tree is clear from the context, we omit the last parameter of this function).
Thus, the expected (with respect to w) number of queries it takes to find the marked node using
the strategy given by the decision tree D, or simply the cost of D, is given by:
    cost(D, w) = ∑_{v∈T} d(r(D), u_v, D)·w(v)
Therefore, the problem of computing a search strategy for (T, w) which minimizes the expected number of queries can be recast as the problem of finding a decision tree for T with
minimum cost, that is, that minimizes cost(D, w) among all decision trees D for T . The cost
of such minimum cost decision tree is denoted by OPT(T, w).
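Since cost(D, w) is just a weighted sum of leaf depths, it is easy to compute. The sketch below uses a hypothetical dict encoding of our own (internal nodes carry "arc", "yes", "no"; leaves carry "leaf") and evaluates the decision tree of Figure 1.b, as we read it, under uniform weights.

```python
def cost(decision_tree, w):
    """cost(D, w) = sum over nodes v of T of d(r(D), u_v, D) * w(v):
    each leaf's depth in D, weighted by the likelihood of its node of T."""
    total, stack = 0.0, [(decision_tree, 0)]
    while stack:
        node, depth = stack.pop()
        if "leaf" in node:
            total += depth * w.get(node["leaf"], 0.0)
        else:
            stack.append((node["no"], depth + 1))
            stack.append((node["yes"], depth + 1))
    return total

# Decision tree of Figure 1.b (our reading) with uniform weights:
D = {"arc": (3, 4),
     "yes": {"arc": (4, 6), "yes": {"leaf": 6},
             "no": {"arc": (4, 5), "yes": {"leaf": 5}, "no": {"leaf": 4}}},
     "no": {"arc": (1, 2), "yes": {"leaf": 2},
            "no": {"arc": (1, 3), "yes": {"leaf": 3}, "no": {"leaf": 1}}}}
w = {v: 1.0 for v in range(1, 7)}
print(cost(D, w))  # leaf depths 2+3+3 (right) plus 2+3+3 (left) = 16.0
```

With normalized weights w(v) = 1/6 this would be the expected number of queries, 16/6.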
2.2 Basic Properties
Now we present two properties of decision trees which are crucial for the analysis of the algorithm
proposed in the next section. Consider a subtree T ′ of T ; we say that a node u is a representative
of T ′ in a decision tree D if the following conditions hold: (i) u is a node of D that corresponds
to either an arc or a node of T′; (ii) u is an ancestor of all other nodes of D which correspond to
arcs or nodes of T ′ . For instance, in the decision tree of Figure 1.b, u(3,4) is the representative
of T3 and u(4,6) is the representative of T4 .
The next lemma asserts the existence of a unique representative for each subtree of T (proof
in the Appendix).
Lemma 1. Consider a tree T and a decision tree D for T . For each subtree T ′ of T , there is
a unique node u ∈ D which is the representative of T ′ in D.
We denote the representative of T ′ (with respect to some decision tree) by u(T ′ ).
The second property is given by the following lemma:
Lemma 2. Consider a tree T, a weight function w and a decision tree D for T. Then for every subtree T′ of T, ∑_{x∈T′} d(u(T′), u_x, D)·w(x) ≥ OPT(T′, w).

Proof. It suffices to argue that for every subtree T′ of T there is a decision tree for T′ which costs at most ∑_{x∈T′} d(u(T′), u_x, D)·w(x). This is done by induction on the number of nodes of T′.
When T ′ is a single node the result trivially holds, so suppose T ′ has at least two nodes.
Because T ′ has more than one node, its representative u(T ′ ) in D corresponds to an arc of T ′ , say
(i, j). Let D1 and D2 be respectively decision trees for Tj′ and T ′ − Tj′ that satisfy the inductive
hypothesis. Then consider the following decision tree DT ′ for T ′ : its root r′ corresponds to a
query for the arc (i, j), the right subtree of r′ equals D1 and the left subtree of r′ equals D2 .
Clearly D_{T′} has cost:

    cost(D_{T′}, w) = w(T′) + cost(D1, w) + cost(D2, w)
                    ≤ ∑_{x∈T′_j} [1 + d(u(T′_j), u_x, D)]·w(x) + ∑_{x∈T′−T′_j} [1 + d(u(T′ − T′_j), u_x, D)]·w(x)   (1)
Now consider a path in D from its root to the node u_x, where x is an arbitrary node in T′_j. As T′_j ⊆ T′ ⊆ T, the definition of a representative implies that this path has the form (r(D) ⇝ u(T′) ⇝ u(T′_j) ⇝ u_x). Since u(T′) corresponds to the arc (i, j), which does not belong to T′_j, u(T′_j) must be a proper descendant of u(T′). Therefore, the length of the path from u(T′) to u_x satisfies:

    d(u(T′), u_x, D) ≥ 1 + d(u(T′_j), u_x, D)   (2)
The same argument shows that the path in D from its root to the node u_x, where x is an arbitrary node in T′ − T′_j, satisfies the analogous inequality:

    d(u(T′), u_x, D) ≥ 1 + d(u(T′ − T′_j), u_x, D)   (3)
Employing the bounds provided by (2) and (3) on inequality (1), we have that the cost of D_{T′} is at most ∑_{x∈T′} d(u(T′), u_x, D)·w(x). This completes the inductive step and consequently the proof of the lemma.
3 An Algorithm for Searching in Trees
In this section we present our algorithm for searching in trees. A crucial observation is that this
problem can be efficiently solved when the input tree is a path. This is true because it can be
easily reduced to the well-known problem of searching for a hidden marked element of a totally
ordered set U in a sorted list L ⊆ U, where each element of U has a given probability
of being the marked one (see Section B of the appendix). Due to this correspondence, we can
employ an approximation algorithm for the latter problem in order to obtain an approximation
(with the same guarantee) for searching in path-like trees. We remark that one can use the
exact algorithm from [14] to obtain optimal strategies for searching in path-like trees. However,
we rely on the faster approximation algorithm from [7] in order to obtain a linear running time
for our algorithm.
Motivated by this observation, the algorithm decomposes the input tree into special paths,
finds decision trees for each of these paths (with modified weight functions) and combines them
into a decision tree for the original tree. In our analysis, we obtain a lower bound on the
optimal solution and an upper bound on the returned solution in terms of the costs of the
decision trees for the paths. Thus, the approximation guarantee of the algorithm is basically a
constant times the guarantee of the approximation used to compute the decision trees for the
paths. Throughout the text, we present the execution and analysis of the algorithm over an
instance (T, w), where T is rooted at node r. Moreover, we assume that T has more than one
node, which excludes trivial instances.
For every node u ∈ T , we define the cumulative weight of u as the sum of the weights of its
descendants, namely w(Tu ). A heavy path Q of T is defined recursively as follows: r belongs to
Q; for every node u in Q, the non-leaf child of u with greatest cumulative weight also belongs
to Q. This construction guarantees that a heavy path of T has the following properties:
1. If u is an internal node of T that does not belong to the heavy path then w(Tu ) ≤ w(T )/2.
2. If u is the last node of the heavy path then all of its children are leaves of T .
Let Q = (q₁ → … → q_{|Q|}) be a heavy path of T. We define T^{q_i} = T_{q_i} − T_{q_{i+1}} for i < |Q|, and T^{q_{|Q|}} = T_{q_{|Q|}}. In addition, we define T^j_{q_i} as the jth heaviest maximal subtree rooted at a child of q_i not in Q (Figure 2). Finally, let n_i denote the number of children of q_i that do not belong to Q, and define e^j_i as the arc of T connecting the node q_i to the subtree T^j_{q_i}.
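A heavy path and the cumulative weights it relies on can be computed in linear time. The sketch below is ours (dict-based tree encoding); it follows the recursive definition above: starting from the root, always descend to the non-leaf child of largest cumulative weight.

```python
def cumulative_weights(children, w, root):
    """w(T_u) for every node u: its own weight plus its descendants' weights."""
    order, stack = [], [root]
    while stack:                     # preorder; reversed, children precede parents
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    cum = {}
    for u in reversed(order):
        cum[u] = w.get(u, 0.0) + sum(cum[c] for c in children.get(u, []))
    return cum

def heavy_path(children, w, root):
    """Start at the root; repeatedly descend to the non-leaf child of greatest
    cumulative weight.  Stops when every child of the last node is a leaf."""
    cum = cumulative_weights(children, w, root)
    path = [root]
    while True:
        internal = [c for c in children.get(path[-1], []) if children.get(c)]
        if not internal:
            return path, cum
        path.append(max(internal, key=lambda c: cum[c]))

# The tree of Figure 1.a with unit weights:
children = {1: [2, 3], 3: [4], 4: [5, 6]}
path, cum = heavy_path(children, {v: 1.0 for v in range(1, 7)}, 1)
print(path)  # [1, 3, 4]: all children of the last node (5 and 6) are leaves
```

On this instance the path illustrates both properties: the only internal node off the path is none, and the last node 4 has only leaf children.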
Figure 2: Example of the structures T^{q_i} and T^j_{q_i}, with the nodes of the heavy path in black.
Now we explain the high-level structure of the solution returned by the algorithm. The main observation is that we can break the task of finding the marked node into three stages: first finding the node q_i of Q such that T^{q_i} contains the marked node, then querying the arcs {e^j_i}_j to discover which tree T^{j′}_{q_i} contains the marked node or that the marked node is q_i, and finally (if needed) locating the marked node in T^{j′}_{q_i}. The algorithm follows this reasoning: it computes a decision tree for the heavy path Q, then a decision tree for querying the arcs {e^j_i}, and recursively computes a decision tree for each tree T^j_{q_i}.
Now we present the algorithm itself, which consists of the following five steps:
(i) Find a heavy path Q of T and then for each q_i ∈ Q define w′(q_i) = w(T^{q_i})/w(T).

(ii) Calculate a decision tree D_Q for the instance (Q, w′) using the approximation algorithm presented in [7].

(iii) Calculate recursively a decision tree D^j_i for each instance (T^j_{q_i}, w).

(iv) Build a decision tree D_i for each T^{q_i} as follows. The leftmost path of D_i consists of nodes corresponding to the arcs e^1_i, …, e^{n_i}_i, with a node u_{q_i} appended at the end. In addition, for every j ∈ {1, …, n_i}, D^j_i is the right subtree of the node corresponding to e^j_i in D_i (Figure 3).

(v) Construct the decision tree D for T by replacing the leaf u_{q_i} of D_Q by D_i, for each q_i ∈ Q (Figure 4).
It is not difficult to see that the tree D computed by the algorithm is a decision tree for
T . In the next sections we present an upper bound on the cost of D and lower bounds on the
cost of an optimal decision tree for (T, w). Together these bounds will give an approximation
guarantee for the algorithm.
3.1 Upper Bound
As the trees {Tqji } and Q form a partition of the nodes of T , we analyze the distance of the root
of D to the nodes in each of these structures separately in order to upper bound the cost of D.
First, consider a tree Tqji and let x be a node in Tqji . Noticing that ux is a leaf of the tree Dij ,
the path from r(D) to ux in D must contain the node r(Dij ). Then, by the construction made
Figure 3: Illustration of Step (iv) of the algorithm. (a) Tree T with heavy path in bold. (b) Decision tree D_i for T^{q_i}.
Figure 4: Illustration of Step (v) of the algorithm. (a) Decision tree D_Q built at Step (ii). (b) Decision tree D constructed by replacing the leaves {u_{q_i}} by the decision trees {D_i}.
in Step (iv), the path from r(D) to u_x in D has the form (r(D) ⇝ r(D^j_i) ⇝ u_x). Notice that the path (r(D) ⇝ u_{e^1_i}) in D is the same as the path (r(D) ⇝ u_{q_i}) in D_Q. In addition, the path from r(D^j_i) to u_x is the same in D and in D^j_i. Employing the previous observations, we have that the length of the path (r(D) ⇝ u_{e^1_i} → u_{e^2_i} → … → u_{e^j_i} → r(D^j_i) ⇝ u_x) is:

    d(r(D), u_x, D) = d(r(D), u_{q_i}, D_Q) + j + d(r(D^j_i), u_x, D^j_i)   (4)
Now we consider a node q_i ∈ Q. Again due to the construction made in Step (iv) of the algorithm, it is not difficult to see that the path from r(D) to u_{q_i} in D traverses the leftmost path of D_i, that is, this path has the form (r(D) ⇝ u_{e^1_i} → u_{e^2_i} → … → u_{e^{n_i}_i} → u_{q_i}). Because the path (r(D) ⇝ u_{e^1_i}) in D is the same as the path (r(D) ⇝ u_{q_i}) in D_Q, it follows that the length of the path from the root of D to u_{q_i} is:

    d(r(D), u_{q_i}, D) = d(r(D), u_{q_i}, D_Q) + n_i   (5)
Weighting the distances to reach nodes in {T^j_{q_i}} and in Q and employing equalities (4) and (5), we find the cost of D:

    cost(D, w) = ∑_{q_i∈Q} ∑_j ∑_{x∈T^j_{q_i}} d(r(D), u_x, D)·w(x) + ∑_{q_i∈Q} d(r(D), u_{q_i}, D)·w(q_i)
               = ∑_{q_i∈Q} ∑_j d(r(D), u_{q_i}, D_Q)·w(T^j_{q_i}) + ∑_{q_i∈Q} d(r(D), u_{q_i}, D_Q)·w(q_i)
                 + ∑_{q_i∈Q} ∑_j j·w(T^j_{q_i}) + ∑_{q_i∈Q} ∑_j ∑_{x∈T^j_{q_i}} d(r(D^j_i), u_x, D^j_i)·w(x)
                 + ∑_{q_i∈Q} n_i·w(q_i)   (6)
Recall we defined w′(q_i) = w(T^{q_i})/w(T) at Step (i) of the algorithm. Then the combined sum of the first two summations of equation (6) can be written as:

    ∑_{q_i∈Q} ∑_j d(r(D), u_{q_i}, D_Q)·w(T^j_{q_i}) + ∑_{q_i∈Q} d(r(D), u_{q_i}, D_Q)·w(q_i)
      = ∑_{q_i∈Q} d(r(D), u_{q_i}, D_Q)·w(T^{q_i}) = w(T) ∑_{q_i∈Q} d(r(D), u_{q_i}, D_Q)·w′(q_i)
      = w(T)·cost(D_Q, w′)   (7)
Moreover, the fourth summation of equation (6) can be expressed more concisely as the sum of the costs of the decision trees {D^j_i}:

    ∑_{q_i∈Q} ∑_{j=1}^{n_i} ∑_{x∈T^j_{q_i}} d(r(D^j_i), u_x, D^j_i)·w(x) = ∑_{q_i∈Q} ∑_{j=1}^{n_i} cost(D^j_i, w)   (8)
Therefore, employing equalities (7) and (8) in (6) leads to the following expression for the cost of D:

    cost(D, w) = w(T)·cost(D_Q, w′) + ∑_{q_i∈Q} ∑_{j=1}^{n_i} j·w(T^j_{q_i}) + ∑_{q_i∈Q} ∑_{j=1}^{n_i} cost(D^j_i, w) + ∑_{q_i∈Q} n_i·w(q_i)
Now all we need is to upper bound the first term of the previous equality. Notice that
cost(DQ , w′ ) is exactly the cost of the approximation computed at Step (ii) of the algorithm.
As mentioned previously, in this step we use the result from [7] which guarantees an upper bound
of H({w′ (qi )}) + 2 on cost(DQ , w′ ). Substituting this bound on the last displayed equality and
observing that w(T ) · H({w′ (qi )}) = H({w(T qi )}), we complete the bound for the cost of D:
    cost(D, w) ≤ H({w(T^{q_i})}) + 2w(T) + ∑_{q_i∈Q} ∑_{j=1}^{n_i} j·w(T^j_{q_i}) + ∑_{q_i∈Q} ∑_{j=1}^{n_i} cost(D^j_i, w) + ∑_{q_i∈Q} n_i·w(q_i)   (9)

3.2 Entropy Lower Bound
In this section we present a lower bound on the cost of an optimal decision tree for (T, w).
Hence, let D∗ be a minimum cost decision tree for (T, w), and let r∗ be the root of D∗ .
Consider a tree T^j_{q_i} and let x be a node in T^j_{q_i}. By definition, the representative of T^j_{q_i} in D* (the node u(T^j_{q_i})) is an ancestor of the node u_x in D*. Because T^j_{q_i} ⊆ T^{q_i}, it also follows from the definition of the representative that u(T^{q_i}) is an ancestor of u(T^j_{q_i}) in D*. Combining these observations, we have that the path in D* from r* to u_x has the form (r* ⇝ u(T^{q_i}) ⇝ u(T^j_{q_i}) ⇝ u_x).
Now consider a node q_i ∈ Q; again by the definition of representative, the path from r* to u_{q_i} can be written as (r* ⇝ u(T^{q_i}) ⇝ u_{q_i}). Then, adding the weighted paths from r* to nodes
in {T^j_{q_i}} and in Q, we can write the cost of D*, which is OPT(T, w), as:

    OPT(T, w) = ∑_{q_i∈Q} ∑_j ∑_{x∈T^j_{q_i}} d(r*, u_x, D*)·w(x) + ∑_{q_i∈Q} d(r*, u_{q_i}, D*)·w(q_i)
              = ∑_{q_i∈Q} ∑_j ∑_{x∈T^j_{q_i}} [d(r*, u(T^{q_i})) + d(u(T^{q_i}), u(T^j_{q_i})) + d(u(T^j_{q_i}), u_x)]·w(x)
                + ∑_{q_i∈Q} [d(r*, u(T^{q_i})) + d(u(T^{q_i}), u_{q_i})]·w(q_i)
              = ∑_{q_i∈Q} d(r*, u(T^{q_i}))·w(T^{q_i}) + ∑_{q_i∈Q} ∑_j d(u(T^{q_i}), u(T^j_{q_i}))·w(T^j_{q_i})
                + ∑_{q_i∈Q} ∑_j ∑_{x∈T^j_{q_i}} d(u(T^j_{q_i}), u_x)·w(x) + ∑_{q_i∈Q} d(u(T^{q_i}), u_{q_i})·w(q_i)   (10)
Now we lower bound each term of the last equation. The idea to analyze the first term is
the following: it can be seen as the cost of the decision tree D∗ under a cost function where the
representative of T qi , for every i, has weight w(T qi ) and all other nodes have weight zero. Since
D∗ is a binary tree, we can use Shannon’s Coding Theorem to guarantee that the first term of
(10) is lower bounded by H({w(T qi )})/ log 3 − w(T ) (see Section C of the appendix for a more
formal proof).
Now we bound the second term of (10). Fix a node q_i ∈ Q and consider two different trees T^j_{q_i} and T^{j′}_{q_i} such that d(r*, u(T^j_{q_i})) = d(r*, u(T^{j′}_{q_i})). We claim that u(T^j_{q_i}) and u(T^{j′}_{q_i}) are siblings in D*. In fact, because their distances to r* are the same, u(T^j_{q_i}) is neither a descendant nor an ancestor of u(T^{j′}_{q_i}). Therefore, they have a common ancestor, say u_{(v,z)}, and one of these nodes is in the right subtree of u_{(v,z)} and the other in the left subtree of u_{(v,z)}. It is not difficult to see that u_{(v,z)} can only correspond to either e^j_i or e^{j′}_i. Without loss of generality suppose the latter; then the right child of u_{(v,z)} corresponds to an arc/node of T^{j′}_{q_i}, and therefore u(T^{j′}_{q_i}) must be this child. Due to their distance to r*, it follows that u(T^j_{q_i}) must be the other child of u_{(v,z)}.
As a consequence, we have that for any level ℓ of D* there are at most two representatives of the trees {T^j_{q_i}} located at ℓ. This, together with the fact that T^j_{q_i} is the jth heaviest tree rooted at a child of q_i, guarantees that

    ∑_{j=1}^{n_i} d(u(T^{q_i}), u(T^j_{q_i}))·w(T^j_{q_i}) ≥ ∑_{j=1}^{n_i} (j/2)·w(T^j_{q_i}) ≥ ∑_{j=1}^{n_i} ((j − 1)/2)·w(T^j_{q_i})   (11)

and the last inequality gives a bound for the second term of (10).
Directly employing Lemma 2, we can lower bound the third term of (10) by:

    ∑_{q_i∈Q} ∑_{j=1}^{n_i} ∑_{x∈T^j_{q_i}} d(u(T^j_{q_i}), u_x, D*)·w(x) ≥ ∑_{q_i∈Q} ∑_{j=1}^{n_i} OPT(T^j_{q_i}, w)
Finally, for the fourth term we fix q_i and note that the path in D* connecting u(T^{q_i}) to u_{q_i} must contain the nodes corresponding to the arcs e^1_i, …, e^{n_i}_i (otherwise, when traversing D* and reaching u_{q_i}, we would not have enough knowledge to infer that q_i is the marked node, contradicting the validity of D*). Applying this reasoning for each q_i, we conclude that the last term of (10) is lower bounded by ∑_{q_i} n_i·w(q_i).
Therefore, applying the previous discussion to lower bound the terms of (10), we obtain:

    OPT(T, w) ≥ H({w(T^{q_i})})/log 3 − 3w(T)/2 + ∑_{q_i∈Q} ∑_{j=1}^{n_i} (j/2)·w(T^j_{q_i}) + ∑_{q_i∈Q} ∑_{j=1}^{n_i} OPT(T^j_{q_i}, w) + ∑_{q_i∈Q} n_i·w(q_i)   (12)

3.3 Alternative Lower Bounds
When the value of the entropy H({w(T qi )}) is large enough, it dominates the term −3w(T )/2
in inequality (12) and leads to a sufficiently strong lower bound. However, when this entropy
assumes a small value we need to adopt a different strategy to devise an effective bound. For
this purpose we present two different lower bounds which do not contain an entropy term but
have improved c · w(T ) terms.
First alternative lower bound. In order to increase the term −3w(T)/2 that appears in (12), we use almost the same derivation that leads from (10) to (12); however, we simply lower bound the first summation of (10) by zero. This gives:

    OPT(T, w) ≥ −w(T)/2 + ∑_{q_i∈Q} ∑_{j=1}^{n_i} (j/2)·w(T^j_{q_i}) + ∑_{q_i∈Q} ∑_{j=1}^{n_i} OPT(T^j_{q_i}, w) + ∑_{q_i∈Q} n_i·w(q_i)   (13)
Second alternative lower bound. Now we devise a lower bound without the −w(T)/2 term; however, it also does not contain the important term ∑_{q_i∈Q} n_i·w(q_i).
Consider a tree T^j_{q_i} and a node x ∈ T^j_{q_i}. By the definition of representative, u(T^j_{q_i}) is an ancestor of u_x in D*, thus d(r*, u_x, D*) = d(r*, u(T^j_{q_i}), D*) + d(u(T^j_{q_i}), u_x, D*). Because {T^j_{q_i}} and Q form a partition of the nodes of T, the cost OPT(T, w) can be written as:

    OPT(T, w) = ∑_{q_i,j} d(r*, u(T^j_{q_i}))·w(T^j_{q_i}) + ∑_{q_i∈Q} d(r*, u_{q_i})·w(q_i) + ∑_{q_i,j} ∑_{x∈T^j_{q_i}} d(u(T^j_{q_i}), u_x)·w(x)   (14)
Now we analyze the first two summations of the previous equation. Because the trees {T^j_{q_i}} and the nodes {q_i} are pairwise disjoint, the nodes {u(T^j_{q_i})} and {u_{q_i}} are all different, and consequently at most one of them can be equal to r*. Since we assumed T has more than one node, r* must correspond to an arc of T. Therefore d(r*, u(T^j_{q_i}), D*) ≥ 1 when T^j_{q_i} is a leaf of T, and also d(r*, u_{q_i}, D*) ≥ 1 for all q_i ∈ Q. Then the sum of the first two terms of equation (14) is at least

    ∑_{q_i,j} d(r*, u(T^j_{q_i}))·w(T^j_{q_i}) + ∑_{q_i∈Q} d(r*, u_{q_i})·w(q_i) ≥ w(T) − w(T^j_{q_i}),

where T^j_{q_i} is a subtree of T that is not a leaf. It follows from the first presented property of heavy paths that this sum can be further lower bounded by w(T) − w(T)/2 = w(T)/2. Combining this fact with a lower bound for the last summation of equation (14) provided by Lemma 2 (with T′ = T^j_{q_i}) we have:
    OPT(T, w) ≥ w(T)/2 + ∑_{q_i∈Q} ∑_{j=1}^{n_i} OPT(T^j_{q_i}, w)   (15)
3.4 Approximation Guarantee
We proceed by induction over the number of nodes of T , where the base case is the trivial
one when T has only one node. Notice that because each tree Tqji is properly contained in
T, the inductive hypothesis asserts that D^j_i is an approximate decision tree for T^j_{q_i}, namely cost(D^j_i, w) ≤ α·OPT(T^j_{q_i}, w) for some constant α. The analysis needs to be carried out in two different cases depending on the value of the entropy (henceforth we use H as a shorthand for H({w(T^{q_i})})).
Case 1: H/log 3 ≥ 2w(T). Applying this bound to inequality (12) we have:

    OPT(T, w) ≥ H/(4 log 3) + ∑_{q_i,j} (j/2)·w(T^j_{q_i}) + ∑_{q_i,j} OPT(T^j_{q_i}, w) + ∑_{q_i∈Q} n_i·w(q_i)   (16)
Employing the entropy hypothesis to the first term of inequality (9) and using the inductive hypothesis, we have:

    cost(D, w) ≤ H·(log 3 + 1)/log 3 + ∑_{q_i,j} j·w(T^j_{q_i}) + ∑_{q_i,j} α·OPT(T^j_{q_i}, w) + ∑_{q_i∈Q} n_i·w(q_i)
Setting α ≥ 4(log 3 + 1) it follows from the previous inequality and from inequality (16) that
cost(D, w) ≤ αOPT(T, w).
Case 2: H/log 3 ≤ 2w(T). Applying the entropy hypothesis and the inductive hypothesis to inequality (9), we have:

    cost(D, w) ≤ (2 log 3 + 2)·w(T) + ∑_{q_i,j} j·w(T^j_{q_i}) + ∑_{q_i,j} α·OPT(T^j_{q_i}, w) + ∑_{q_i∈Q} n_i·w(q_i)
Adding together (α − 2) times inequality (15) and two times inequality (13), we have:

    α·OPT(T, w) ≥ (α − 4)·w(T)/2 + ∑_{q_i,j} j·w(T^j_{q_i}) + ∑_{q_i,j} α·OPT(T^j_{q_i}, w) + 2 ∑_{q_i∈Q} n_i·w(q_i)
Setting α ≥ 2(2 log 3 + 2) + 4 we have that cost(D, w) ≤ αOPT(T, w). Therefore, the
inductive step holds for both cases when α ≥ 2(2 log 3 + 2) + 4.
In fact, a more careful choice for the threshold that separates the cases of the analysis achieves a slightly improved approximation guarantee. Using H/log 3 ≥ 1.86·w(T) and H/log 3 ≤ 1.86·w(T) as the hypotheses for Cases 1 and 2 respectively, straightforward calculations show that the approximation factor of the algorithm is at most 14.
Theorem 1. The proposed algorithm is a 14-approximation for the problem of binary searching
in trees.
4 Efficient Implementation
Notice that in order to implement Step (iv) of the algorithm, we need to sort the trees {T^j_{q_i}}. As we argue in Section 4.3, all other steps of the algorithm can be performed in linear time, and consequently this operation is the algorithm’s bottleneck. Nonetheless, we resort to an ‘approximate sorting’ procedure which allows a linear time implementation of the algorithm.
4.1 Approximate Sorting
Informally, given a set of nonnegative real numbers S = {s1 , s2 , . . . , s|S| } we want to sort them
such that they are roughly in non-increasing order. In the next paragraph we define more
formally the concept of an approximate sorting of S.
Without loss of generality, assume that S is nonempty and that s₁ ≥ s₂ ≥ … ≥ s_{|S|}. Let ŝ = s₁ be the largest number in S. Then we define a set of intervals {I₁, …, I_{t+1}} for t = ⌈log |S|²⌉ as follows: I₁ = [0, ŝ/2^t), I_i = [ŝ/2^{t−i+2}, ŝ/2^{t−i+1}) for i = 2, …, t, and I_{t+1} = [ŝ/2, ŝ]. We say that a permutation σ of {1, …, |S|} approximately sorts S iff for every i > j the elements of interval I_i appear before those from I_j in the sequence induced by σ.
Proposition 1. Let σ be an approximate sorting for S. For every 1 ≤ i ≤ |S|, if si belongs to
an interval Ij then sσ(i) also belongs to Ij .
The next lemma shows that we do not lose too much by employing an approximately sorted sequence instead of an exactly sorted one in Step (iv) of our algorithm.
Lemma 3. Let σ be an approximate sorting for S. Then:
$$\sum_{i=1}^{|S|} i \cdot s_{\sigma(i)} \;\le\; 3 \sum_{i=1}^{|S|} i \cdot s_i$$
Proof. Let P_i be the set of indexes of the {s_j} which belong to interval I_i, namely P_i = {j : s_j ∈ I_i}. Because each element of S belongs to exactly one interval I_i, we can recast the inequality of the lemma as:
$$\sum_{i=1}^{t+1} \sum_{j \in P_i} j \cdot s_{\sigma(j)} \;\le\; 3 \sum_{i=1}^{t+1} \sum_{j \in P_i} j \cdot s_j$$
Thus, it suffices to prove the validity of this inequality.
We analyze the sums Σ_{j ∈ P_i} j · s_{σ(j)} separately for each interval I_i. First consider P_i with 2 ≤ i ≤ t + 1. It follows from Proposition 1 that if an index j belongs to P_i then both s_j and s_{σ(j)} belong to I_i. Therefore, making use of the boundaries of the interval I_i, we have:
$$\sum_{j \in P_i} j \cdot s_{\sigma(j)} \;\le\; \frac{\hat{s}}{2^{t-i+1}} \sum_{j \in P_i} j \;=\; 2 \cdot \frac{\hat{s}}{2^{t-i+2}} \sum_{j \in P_i} j \;\le\; 2 \sum_{j \in P_i} j \cdot s_j \qquad (17)$$
Now consider the indexes in P_1. Again, for all j ∈ P_1 we have that s_{σ(j)} ∈ I_1 and consequently s_{σ(j)} ≤ ŝ/2^t. Therefore:
$$\sum_{j \in P_1} j \cdot s_{\sigma(j)} \;\le\; \frac{\hat{s}}{2^t} \sum_{j \in P_1} j \;\le\; \frac{\hat{s}}{2^t} \sum_{j=1}^{|S|} j \;=\; \frac{\hat{s}}{2^t} \cdot \frac{|S|^2 + |S|}{2} \;\le\; \frac{\hat{s} \cdot |S|^2}{2^t}$$
Recalling that t = ⌈log |S|²⌉ we have:
$$\sum_{j \in P_1} j \cdot s_{\sigma(j)} \;\le\; \frac{\hat{s} \cdot |S|^2}{2^{\lceil \log |S|^2 \rceil}} \;\le\; \hat{s} \;\le\; \sum_{i=1}^{t+1} \sum_{j \in P_i} j \cdot s_j \qquad (18)$$
Thus, employing inequalities (17) and (18) we complete the proof:
$$\sum_{i=1}^{t+1} \sum_{j \in P_i} j \cdot s_{\sigma(j)} \;\le\; \sum_{i=1}^{t+1} \sum_{j \in P_i} j \cdot s_j \;+\; 2 \sum_{i=2}^{t+1} \sum_{j \in P_i} j \cdot s_j \;\le\; 3 \sum_{i=1}^{t+1} \sum_{j \in P_i} j \cdot s_j$$
Approximate Sorting in Linear Time. We say that an element s ∈ S has rank i if it
belongs to the interval Ii . Once we know the rank of each element of S, we can employ a simple
bucketing strategy to output an approximately sorted sequence in linear time. Thus, it suffices
to show how to compute the ranks of all elements of S in O(|S|) time.
For simplicity of notation, define m = |S|. Also recall that ŝ is the maximum of S and that t = ⌈log m²⌉. Define T[j] = ⌈log(j + 1)⌉ + 1 for j ≥ 0. The first step towards finding the ranks is to compute T[j] for j = 0, …, 2m − 1. This can be done in linear time using the following strategy: tabulate all powers 2^j for j = 0, …, ⌈log 2m⌉, then set T[0] = 1 and fill the entries T[2^j], T[2^j + 1], …, T[2^{j+1} − 1] with the value j + 2. In order to calculate the rank of a number s ∈ S, we check whether s < 2^{⌈log m⌉−t} ŝ. In the affirmative case, s has rank T[⌊2^t s/ŝ⌋]. Otherwise, if s < ŝ then s has rank T[⌊2^{t−⌈log m⌉} s/ŝ⌋] + ⌈log m⌉. Finally, if s = ŝ then s has rank t + 1.
It is easy to verify that this procedure computes the right rank for elements of S smaller than ŝ/2^t or equal to ŝ. For all other values of s (i.e., s belonging to one of the intervals {I_2, …, I_{t+1}}), the correctness of the procedure follows from two observations: if s ≥ 2^{⌈log m⌉−t} ŝ then T[⌊2^{t−⌈log m⌉} s/ŝ⌋] + ⌈log m⌉ = T[⌊2^t s/ŝ⌋], and also, for j = 2, …, t + 1:
$$s \in I_j \;\Leftrightarrow\; \frac{\hat{s}}{2^{t-j+2}} \le s < \frac{\hat{s}}{2^{t-j+1}} \;\Leftrightarrow\; 2^{j-2} \le \frac{2^t s}{\hat{s}} < 2^{j-1}$$
$$\Leftrightarrow\; 2^{j-2} < \left\lfloor \frac{2^t s}{\hat{s}} \right\rfloor + 1 \le 2^{j-1} \;\Leftrightarrow\; \left\lceil \log\left( \left\lfloor \frac{2^t s}{\hat{s}} \right\rfloor + 1 \right) \right\rceil = j - 1 \;\Leftrightarrow\; T\left[ \left\lfloor \frac{2^t s}{\hat{s}} \right\rfloor \right] = j$$
Thus, by making use of this procedure we can find in linear time an approximately sorted
sequence for any set of non-negative real numbers.
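The complete procedure (compute each element's rank, then bucket by rank) can be sketched as follows. This is a minimal illustration with our own function names; for brevity it calls math.log2 directly on each element rather than using the precomputed table T[·] described above:

```python
import math

def approximate_sort(s):
    # Bucket each element of a nonempty list of nonnegative reals into one
    # of t + 1 geometric intervals, t = ceil(log m^2), then emit the buckets
    # from the largest interval down to the smallest.  This is a sketch of
    # the paper's procedure; the direct log2 call replaces the table T[.].
    m = len(s)
    s_hat = max(s)
    if s_hat == 0:                      # all-zero input: any order works
        return list(s)
    t = max(1, math.ceil(math.log2(m * m)))

    def rank(x):
        # I_1 = [0, s_hat/2^t); I_i = [s_hat/2^(t-i+2), s_hat/2^(t-i+1))
        # for i = 2..t; I_(t+1) = [s_hat/2, s_hat].  For x >= s_hat/2^t,
        # x in I_i  <=>  2^(i-2) <= 2^t x / s_hat < 2^(i-1).
        if x < s_hat / 2 ** t:
            return 1
        return min(t + 1, math.floor(math.log2(2 ** t * x / s_hat)) + 2)

    buckets = [[] for _ in range(t + 2)]
    for x in s:
        buckets[rank(x)].append(x)
    out = []
    for i in range(t + 1, 0, -1):       # higher-rank intervals come first
        out.extend(buckets[i])
    return out
```

Within a bucket the elements keep their input order, which is allowed: the definition only constrains the relative order of elements from different intervals, and Lemma 3 then bounds the resulting cost by three times that of an exact sort.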
4.2 Modified Algorithm
Based on the previous remarks, we propose the following modification to our algorithm: in Step (iv), the algorithm now uses an approximate sorting of the trees {T_{q_i}^j}_j to build the tree D_i, for each q_i ∈ Q. In order to clarify this modification, we illustrate it for a particular q_i ∈ Q. In Step (iv), the modified algorithm finds a permutation σ_i which approximately sorts the weights {w(T_{q_i}^1), w(T_{q_i}^2), …, w(T_{q_i}^{n_i})}. Then it constructs the decision tree D_i using this permutation: the leftmost path of D_i consists of the nodes corresponding to e_i^{σ_i(1)}, …, e_i^{σ_i(n_i)}, u_{q_i}, and D_i^{σ_i(j)} is the right subtree of the node corresponding to e_i^{σ_i(j)}.
An upper bound for this modified version of the algorithm can be derived by the same analysis used previously. In fact, the introduction of the approximate sorting only alters the third term of the upper bound presented in inequality (9), leading to the following bound:
$$\mathrm{cost}(D, w) \le H(\{w(\overline{T}_{q_i})\}) + 2w(T) + \sum_{q_i \in Q} \sum_{j=1}^{n_i} j \cdot w(T_{q_i}^{\sigma_i(j)}) + \sum_{q_i \in Q} \sum_{j=1}^{n_i} \mathrm{cost}(D_i^j, w) + \sum_{q_i \in Q} n_i\, w(q_i)$$
Using the result from Lemma 3 we can bound the third term of the previous inequality and reach the following expression:
$$\mathrm{cost}(D, w) \le H(\{w(\overline{T}_{q_i})\}) + 2w(T) + 3 \sum_{q_i \in Q} \sum_{j=1}^{n_i} j \cdot w(T_{q_i}^j) + \sum_{q_i \in Q} \sum_{j=1}^{n_i} \mathrm{cost}(D_i^j, w) + \sum_{q_i \in Q} n_i\, w(q_i) \qquad (19)$$
Again considering separate cases for low and high entropy values (i.e., H/ log 3 ≤ 1.72 w(T) and H/ log 3 ≥ 1.72 w(T)) and performing the same analysis as before, it can be verified that the approximation guarantee of the modified algorithm is at most 21.5.
4.3 Running Time
Finally, we argue that the modified algorithm presented in the previous section runs in O(n)
time, where n is the number of nodes of the input tree T .
Preprocessing. The preprocessing step computes, in one linear pass, the cumulative weights w(T_u) = Σ_{v ∈ T_u} w(v) for every vertex u ∈ T. This can be accomplished using a bottom-up approach. Notice that the input tree of every recursive call launched during the execution of the algorithm is a maximal subtree of T. Therefore, the cumulative weight of a node u with respect to one such tree is exactly w(T_u). As a consequence, any cumulative weight used during the algorithm can be found in constant time after the preprocessing step.
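The bottom-up pass can be sketched as follows. This is a minimal illustration with our own encoding (the tree is given as child lists indexed by node, with node 0 as the root); none of these names come from the paper:

```python
def cumulative_weights(children, w):
    # Compute w(T_u) = sum of w(v) over the subtree rooted at u, for every
    # node u, in one linear bottom-up pass over the tree.
    cum = list(w)
    order, stack = [], [0]
    while stack:                       # iterative DFS yields a top-down order
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    for u in reversed(order):          # children are processed before parents
        for v in children[u]:
            cum[u] += cum[v]
    return cum
```

Each node is pushed, popped, and accumulated into its parent exactly once, so the pass runs in O(n) time.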
Algorithm. A heavy path Q can be found by starting at the root of T and, at each step, moving to the non-leaf child with greatest cumulative weight. Thus, this procedure takes O(deg(Q)) time, where deg(Q) is the sum of the degrees of the nodes in Q. The rest of Step (i) can also be done in O(deg(Q)) time. Step (ii) takes O(|Q|) time. Because T_{q_i}^j = T_j, it is also easy to construct the instances for Step (iii) in time linearly proportional to the number of such trees, which is again O(deg(Q)). Assuming that the decision trees {D_i^j} have already been computed recursively, building each tree D_i takes time linearly proportional to the degree of q_i, already including the time to approximately sort the trees {T_{q_i}^j}_j. Therefore, Step (iv) can also be computed in O(deg(Q)) time. Finally, Step (v) can be easily implemented in O(deg(Q)) time.
Thus, excluding the time spent in recursive calls, an execution of the algorithm takes O(deg(Q)) time, where Q is the heavy path of the input tree of that execution. Adding this running time over every call launched during the execution of the algorithm gives the total time complexity, which is therefore linearly proportional to the sum of the degrees of the nodes in the heavy paths found during the execution. As these paths are pairwise disjoint, the total running time is linear in the number of edges of the input tree T, which is O(n).
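The heavy path extraction described above admits a direct sketch (using the same hypothetical child-list encoding and precomputed cumulative weights as before; this illustrates the descent rule, not the authors' code):

```python
def heavy_path(children, cum):
    # Descend from the root, at each step moving to the non-leaf child of
    # greatest cumulative weight; stop once every child is a leaf.
    # cum[u] is the precomputed cumulative weight w(T_u).
    path, u = [0], 0
    while True:
        internal = [v for v in children[u] if children[v]]
        if not internal:
            break
        u = max(internal, key=lambda v: cum[v])
        path.append(u)
    return path
```

The work at each step is proportional to the degree of the current node, giving the O(deg(Q)) bound claimed above.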
Theorem 2. There is a linear time algorithm which provides a constant factor approximation
for the problem of binary searching in trees.
5 Conclusions
We presented a linear time algorithm that achieves a constant approximation ratio for the problem of minimizing the expected cost of binary searching in trees. This constitutes a significant improvement over the O(log n)-approximation algorithm available in the literature. The main question that remains open is whether the problem studied in this paper admits an exact polynomial time algorithm.
References
[1] M. Adler, E. Demaine, N. Harvey, and M. Patrascu. Lower bounds for asymmetric communication channels and distributed source coding. In SODA, pages 251–260, 2006.
[2] M. Adler and B. Maggs. Protocols for asymmetric communication channels. Journal of
Computer and System Sciences, 63(4):573–596, 2001.
[3] A. Barkan and H. Kaplan. Partial alphabetic trees. Journal of Algorithms, 58(2):81–103,
2006.
[4] Y. Ben-Asher, E. Farchi, and I. Newman. Optimal search in trees. SIAM Journal on
Computing, 28(6):2090–2102, 1999.
[5] R. Carmo, J. Donadelli, Y. Kohayakawa, and E. Laber. Searching in random partially
ordered sets. Theoretical Computer Science, 321(1):41–57, 2004.
[6] V. Chakaravarthy, V. Pandit, S. Roy, P. Awasthi, and M. Mohania. Decision trees for
entity identification: Approximation algorithms and hardness results. In PODS, pages
53–62, 2007.
[7] R. de Prisco and A. de Santis. On binary search trees. Information Processing Letters,
45(5):249–253, 1993.
[8] K. Douïeb and S. Langerman. Near-entropy hotlink assignments. In ESA, pages 292–303,
2006.
[9] R. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, Inc.,
New York, NY, USA, 1968.
[10] S. Ghazizadeh, M. Ghodsi, and A. Saberi. A new protocol for asymmetric communication
channels: Reaching the lower bounds. Scientia Iranica, 8(4), 2001.
[11] M. Golin, C. Kenyon, and N. Young. Huffman coding with unequal letter costs. In STOC,
pages 785–791, 2002.
[12] R. Karp. Minimum-redundancy coding for the discrete noiseless channel. IRE Transactions
on Information Theory, 7(1):27–38, 1961.
[13] W. Knight. Search in an ordered array having variable probe cost. SIAM Journal on
Computing, 17(6):1203–1214, 1988.
[14] D. Knuth. The art of computer programming, volume 3: sorting and searching. Addison
Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1998.
[15] R. Kosaraju, T. Przytycka, and R. Borgstrom. On an optimal split tree problem. In WADS,
pages 157–168, 1999.
[16] E. Laber and L. Holanda. Improved bounds for asymmetric communication protocols.
Information Processing Letters, 83(4):205–209, 2002.
[17] E. Laber, R. Milidiú, and A. Pessoa. Strategies for searching with different access costs.
In ESA, pages 236–247, 1999.
[18] E. Laber, R. Milidiú, and A. Pessoa. On binary searching with non-uniform costs. In
SODA, pages 855–864, 2001.
[19] E. Laber and L. Nogueira. On the hardness of the minimum height decision tree problem.
Discrete Applied Mathematics, 144(1-2):209–212, 2004.
[20] M. Lipman and J. Abrahams. Minimum average cost testing for partially ordered components. IEEE Transactions on Information Theory, 41(1):287–291, 1995.
[21] S. Mozes, K. Onak, and O. Weimann. Finding an optimal tree searching strategy in linear
time. In SODA, pages 1096–1105, 2008.
[22] G. Navarro, R. Baeza-Yates, E. Barbosa, N. Ziviani, and W. Cunto. Binary searching with
nonuniform costs and its application to text retrieval. Algorithmica, 27(2):145–169, 2000.
[23] K. Onak and P. Parys. Generalization of binary search: Searching in trees and forest-like
partial orders. In FOCS, pages 379–388, 2006.
[24] J. Szwarcfiter, G. Navarro, R. Baeza-Yates, J. de S. Oliveira, W. Cunto, and N. Ziviani.
Optimal binary search trees with costs depending on the access paths. Theoretical Computer
Science, 290(3):1799–1814, 2003.
[25] J. Watkinson, M. Adler, and F. Fich. New protocols for asymmetric communication channels. In SIROCCO, pages 337–350, 2001.
Appendix
A Proof of Lemma 1
Given any tree T′ and two nodes u, v ∈ T′, a lowest common ancestor of u and v is a node x ∈ T′ such that: (i) x is an ancestor of both u and v; (ii) there is no proper descendant of x which is an ancestor of both u and v. The following proposition is easily verified:
Proposition 2. Consider a tree T′ and two nodes u and v of it such that u is neither an ancestor nor a descendant of v. Then the lowest common ancestor x of u and v is a proper ancestor of both of them. Furthermore, if u′ (v′) is the child of x which is an ancestor of u (v), then u′ ≠ v′.
Proof of Lemma 1. Consider a subtree T′ of T. We define u(T′) as the node of D closest to r(D) which corresponds to an arc or node of T′. More formally, u(T′) = u_α, where α = argmin_{α′} {d(r(D), u_{α′}, D) : α′ is a node or an arc of T′}. We claim that u(T′) is a representative of T′; to prove this, it suffices to argue that u(T′) is an ancestor of all nodes of D that correspond to nodes or arcs of T′.
By means of contradiction, let β be a node or arc of T′ such that u_β is not a descendant of u(T′) = u_α. The node u_β cannot be a proper ancestor of u_α, as that would contradict the choice of u_α. Thus, by Proposition 2, u_α and u_β lie in subtrees rooted at different children of their lowest common ancestor, say u_{(i,j)}. Without loss of generality, assume that u_β belongs to the right subtree of u_{(i,j)} and that u_α belongs to the left subtree of u_{(i,j)}. By definition, β is an arc/node of T_j and α is an arc/node of T − T_j. Thus, the root of T′ must be a proper ancestor of j so that both α and β belong to T′. But this implies that (i, j) ∈ T′. Because u_{(i,j)} is a proper ancestor of u_α, it is closer to the root of D, which contradicts the choice of u_α. Therefore, u(T′) must be an ancestor of all nodes of D that correspond to arcs/nodes of T′.
Notice that u(T′) is the unique representative of T′: any other representative would have to be a proper ancestor of u(T′), contradicting the fact that u(T′) is itself a representative.
B Searching in Path-like Trees
We argue that the problem of searching in path-like trees can be reduced to the well-known problem of searching in ordered lists with different access probabilities. The latter problem is defined as follows. As input we have a list L = {l_1, …, l_n} of numbers in increasing order and probabilities {p_1, …, p_n, q_1, …, q_{n+1}}. One number, which may not belong to L, is marked; p_i is the probability that l_i is the marked number and q_i is the probability that the marked number lies in the interval (l_{i−1}, l_i) (using the convention that l_0 = −∞ and l_{n+1} = ∞). The goal is to find the location of the marked number, that is, to identify which l_i equals the marked number (in case it belongs to L) or which interval (l_{i−1}, l_i) contains it. For that, one may query elements of L, where querying l_i reveals whether the marked number is greater than l_i, less than l_i, or l_i itself.
The reduction from the problem of searching in path-like trees to the above problem goes as follows. Given an instance (T, w) where T = (t_1 → … → t_n) is a path, we create a list L with the first n − 1 natural numbers, and associate the arcs of T with the elements of L and the nodes of T with the intervals delimited by L. That is, we set p_i = 0 for all 1 ≤ i ≤ n − 1 and q_i = w(t_i) for all 1 ≤ i ≤ n. It is not difficult to see that a search strategy for L gives a search strategy for T (with the same expected number of queries) and vice-versa. Due to this equivalence, approximation algorithms for the problem of searching in lists with access probabilities also approximate the problem of searching in paths with the same guarantee.
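The encoding above can be sketched in a few lines. This is an illustration with our own names; normalizing by the total weight is our choice, so that the resulting q is a probability distribution:

```python
def path_to_list_instance(weights):
    # Encode a path t_1 -> ... -> t_n (node t_i carries likelihood
    # weights[i-1]) as an instance of searching in an ordered list:
    # element i of L stands for the arc (t_i, t_{i+1}), and the interval
    # (l_{i-1}, l_i) stands for node t_i.
    n = len(weights)
    total = sum(weights)
    L = list(range(1, n))               # the first n - 1 natural numbers
    p = [0.0] * (n - 1)                 # the marked number is never an element
    q = [wi / total for wi in weights]  # node t_i's likelihood -> interval i
    return L, p, q
```

A strategy for the list instance translates query-for-query into a strategy for the path, since querying element i corresponds to querying the arc (t_i, t_{i+1}).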
C Proofs for the Entropy Lower Bound
Lemma 4.
$$\sum_{q_i \in Q} d(r^*, u(\overline{T}_{q_i}), D^*)\, w(\overline{T}_{q_i}) \;\ge\; H(\{w(\overline{T}_{q_i})\}) / \log 3 - w(T)$$
Proof. Construct the tree D′ by adding new leaves to D* as follows: for each node q_i ∈ Q, add a leaf l_i to u(T̄_{q_i}) (the relative position of siblings can be ignored in this analysis). Now define the probability distribution w′ for D′ such that w′(l_i) = w(T̄_{q_i})/w(T), and all other nodes of D′ have zero probability.
Clearly the distance from r(D′) to l_i in D′ equals d(r*, u(T̄_{q_i}), D*) + 1. Because only the nodes {l_i} have nonzero probability, the cost of D′ is:
$$\mathrm{cost}(D', w') = \sum_{l_i} d(r(D'), l_i, D')\, w'(l_i) = \sum_{q_i} \frac{w(\overline{T}_{q_i})}{w(T)} \left( d(r^*, u(\overline{T}_{q_i}), D^*) + 1 \right)$$
Because the trees {T̄_{q_i}} are pairwise disjoint, for q_i ≠ q_j we have that u(T̄_{q_i}) ≠ u(T̄_{q_j}). Therefore, at most one new leaf l_i is added to each node u(T̄_{q_i}), and D′ is at most a ternary tree. We can then use Shannon's coding theorem [9] to bound the cost of D′ as cost(D′, w′) ≥ H({w′(l_i)})/log 3. Substituting this bound into the previous displayed equality and noticing that H({w(T̄_{q_i})}) = H({w′(l_i)}) · w(T), we have:
$$\frac{H(\{w(\overline{T}_{q_i})\})}{\log 3} \;\le\; \sum_{q_i} \left( d(r^*, u(\overline{T}_{q_i}), D^*) + 1 \right) w(\overline{T}_{q_i})$$
The result follows by rearranging the previous inequality and noticing that w(T) = Σ_{q_i} w(T̄_{q_i}), as the trees {T̄_{q_i}} define a partition of the nodes of T.