
Decomposing and Concatenating Binary Search Trees
and Applications in Making Data Structures Dynamic
Hamid Alaei
Alireza Bagheri
Department of Computer Engineering and IT
Amirkabir University of Technology
Tehran, Iran
[email protected]
[email protected]
Abstract—Consider two balanced binary search trees T1 and T2 such that all the data-items stored in T1 are less than every data-item stored in T2. Concatenating these two trees means building a balanced tree, say T, that contains all the data of T1 and T2 and satisfies the height constraints. Likewise, decomposing a balanced binary search tree T by a given splitter value s means partitioning T into two balanced search trees T1 and T2 such that all the items in T1 are less than or equal to s and all the items in T2 are greater than s. In this paper, we introduce efficient algorithms for decomposing and concatenating trees, which take logarithmic time in the worst case. Finally, we briefly introduce a general technique for making data structures dynamic as an application of these algorithms.
Keywords—binary search tree; AVL+ tree; decompose; concatenate; static/dynamic data structure.
I. INTRODUCTION
Binary search trees (BSTs) are important data structures with several applications. A BST can represent order in a totally ordered set and facilitates operations such as searching, insertion and deletion. Therefore, BSTs and their extensions have natural applications in common commercial and database systems. However, they are also fruitful for solving more complicated problems such as the longest matching prefix problem [7], range searching and windowing queries in multidimensional spaces [8].
The efficiency of operations on BSTs depends on the height of the tree. Since the height of a binary search tree can be as large as the number of nodes or records stored in it, in the worst case each of these operations can take O(n) time, where n is the number of records stored in the tree. To guarantee O(log n) height, self-balanced binary search trees were introduced. During their construction by insertions and deletions, these BSTs automatically keep their height small using self-balancing operations. Although the rules used in these operations vary from type to type, most of them rotate nodes to reduce the height. In a rotation, three nodes, say u, v and w, where u is the parent of v and v is the parent of w, change their positions in the tree so that the height of some part of the tree is reduced without violating the basic properties of the BST.
For instance, the height-balanced binary search tree or AVL tree [1] and the Red-Black tree [2], [3] are two well-known self-balancing BSTs. In an AVL tree, for each node the heights of its left and right subtrees differ by at most one. In a Red-Black tree, every node is colored red or black, and two properties bound the height of the tree. The first is the black property: the number of black nodes is the same on every path from the root to a leaf. The second is the red property: if a node is red, then both its children are black. Both AVL and Red-Black trees with n nodes guarantee a height of O(log n), with different constant factors.
In this paper, we use a kind of height-balanced tree that we call the AVL+ tree and introduce in section II. In addition, we introduce two new kinds of operations on binary search trees, called decomposition and concatenation. In the decomposition of a BST T by a given splitter value s, we are interested in partitioning T into two BSTs T1 and T2: all data-records of T with keys less than or equal to s are placed in T1 and the remaining data-records in T2. The reverse operation is the concatenation of two binary search trees T1 and T2: building a BST T that contains the data-records of both T1 and T2. In sections III and IV, we introduce optimal methods for performing these two operations in logarithmic time while guaranteeing the height of the resulting tree(s).
Although tree decomposition and concatenation have natural applications in database systems, in section V we use these operations to design a novel technique for constructing dynamic data structures from static ones. Finally, we conclude the article in section VI.
II. AVL+ TREE
In an AVL+ tree, data-items are stored in leaf nodes (data nodes). Non-leaf nodes (index nodes) contain index-items, which direct the search path from the root to the appropriate leaf node. In addition, as in an AVL tree [1], every node stores a number indicating its balance factor (bf), which is equal to the height of the node's left subtree minus the height of its right subtree. A node with a balance factor of –1, 0 or +1 is considered balanced, and all nodes have to be balanced. Moreover, we store in each node a pointer to its parent. Although this parent pointer seems redundant, it helps simplify the update operations; it is also necessary for the application we mention later. Finally, we need to memorize the height of the tree, so we store the height of the root in the data structure. The heights of other nodes can be computed using the balance factors of their ancestors.
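To make this layout concrete, the following C sketch shows one possible representation of an AVL+ node and tree; the field and type names are ours, not the paper's. The helper derives a child's height from its parent's height and balance factor, which is the rule used implicitly throughout the paper.

    /* A minimal sketch of an AVL+ node and tree, assuming integer keys;
       all names are illustrative, not taken from the paper. */
    typedef struct Node {
        int item;             /* data-item in a leaf, index-item in an index node */
        int bf;               /* balance factor: height(left) - height(right), in {-1, 0, +1} */
        struct Node *left;    /* null in leaf (data) nodes */
        struct Node *right;   /* null in leaf (data) nodes */
        struct Node *parent;  /* enables bottom-up rebalancing and the
                                 inverse-pointer-tree application of section V */
    } Node;

    typedef struct {
        Node *root;
        int height;           /* stored explicitly; heights of inner nodes are
                                 recovered from bf values along a path */
    } Tree;

    /* Height of a child, derived from its parent's height and balance factor:
       going left, the subtree is 1 shorter (bf >= 0) or 2 shorter (bf = -1);
       symmetrically for the right child. */
    static int child_height(int parent_height, int bf, int go_left) {
        if (go_left)
            return parent_height - (bf == -1 ? 2 : 1);
        return parent_height - (bf == +1 ? 2 : 1);
    }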
The search, insert and delete operations in an AVL+ tree are similar to those in an AVL tree, with some small differences. For insertion, when we reach a leaf node u while searching for the appropriate place for the new node w, we have to create a new index node v as the parent of both u and w and replace u with v. Moreover, when deleting a leaf node or performing a rotation, if we face an index node p with only one child node c, we simply delete p from the tree and replace it with c. Obviously, every search, insert or delete operation can be done in O(log n) time, where n is the number of data-items in the AVL+ tree.
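The insertion step described above can be sketched as follows in C, reusing the Node and Tree types from the previous sketch; all names are ours, the AVL-style rebalancing that follows the structural change is omitted, and we assume the convention (consistent with Tables I and II) that keys less than or equal to an index-item go to the left subtree.

    /* Replace leaf u with a fresh index node v whose children are u and the
       new leaf w; a minimal sketch, rebalancing and height update omitted. */
    static void replace_leaf_with_index(Tree *t, Node *u, Node *w, Node *v) {
        Node *p  = u->parent;
        Node *lo = (w->item <= u->item) ? w : u;  /* smaller key goes left */
        Node *hi = (lo == w) ? u : w;
        v->item = lo->item;    /* index-item = greatest key in the left subtree */
        v->bf   = 0;           /* both children are leaves of equal height */
        v->left  = lo; lo->parent = v;
        v->right = hi; hi->parent = v;
        v->parent = p;
        if (p == 0)            /* u was the root: v becomes the new root */
            t->root = v;
        else if (p->left == u)
            p->left = v;
        else
            p->right = v;
        /* here the balance factors on the path upward from p must be updated
           and rotations applied, exactly as in an AVL tree */
    }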
III. CONCATENATE TWO AVL+ TREES

Let T1 and T2 be two AVL+ search trees and assume that all data-items of T1 are smaller than every data-item in T2. Without loss of generality, assume the height of T1 is less than or equal to the height of T2. As mentioned before, by concatenating T1 and T2 we mean constructing an AVL+ tree T1 | T2 containing all data-items of T1 and T2. We do this by attaching T1 at the appropriate location in T2 and rebalancing the resulting tree. The Procedure Concatenate in Table I receives as input two AVL+ trees T1 and T2 with the above properties, concatenates them and puts the result into T2. In the following, we describe this procedure.
First, using the stored heights of the trees and the balance factors of the nodes in T2, we start from the root of T2 and go down through the left pointers until we reach the first node, say cur, whose subtree is not higher than T1. Then, we create a new index node u with the root of T1 as its left child. The index value of u can be any value in the range [lb, ub), where lb is the key stored in the rightmost leaf of T1 and ub is the key of the leftmost leaf of T2. To determine the right child of u and construct the AVL+ tree T1 | T2, we face one of the following three cases.
Case 1: The node cur is the root of T2 itself. In this case, we simply set cur as the right child of u and create T1 | T2 by making u its root. The height of this tree and the balance factor of its root u can be computed from the stored heights of T1 and T2.
For the remaining cases, let h1 be the height of T1 and h2 be the height of the subtree rooted at cur. The height-balance property of T2 forces h2 to be either h1 or h1 – 1.
Case 2: The parent p of cur has a balance factor of –1 and h1 equals h2. In other words, the right sibling of cur is higher than cur, and T1 is as high as cur. In this case, we choose cur as the right child of the new node u and replace cur with u as the left child of p. This turns the balance factors of u and p to zero without affecting the height of p. The resulting tree is the concatenation of T1 and T2.
Case 3: If neither case 1 nor case 2 applies, we set the right pointer of u to cur's parent, p. From the descent above, we know that the height of T1 is one less than the height of p's subtree, so we set u's balance factor to –1. After that, we replace p in T2 with the new node u, whose subtree is one higher than p's. As a result, we must perform the same rebalancing operations as in an AVL tree, from the new node u up to (in the worst case) the root of T2. The resulting height-balanced tree is the desired T1 | T2.
TABLE I. PROCEDURE CONCATENATE

Procedure Concatenate(tree T1, tree T2)
  h1 ← T1.height
  h2 ← T2.height
  cur ← T2.root
  while (h1 < h2)
    if (cur.bf = –1) h2 ← h2 – 2
    else h2 ← h2 – 1
    cur ← cur.left
  end while
  parent ← cur.parent
  create a new index node u
  u.item ← greatest key stored in T1
  SetLeftChild(u, T1.root)
  if (parent = null)
    SetRightChild(u, cur)
    u.bf ← h1 – h2
    T2.root ← u
    T2.height ← T2.height + 1
    return
  else if (h1 = h2 and parent.bf = –1)
    SetRightChild(u, cur)
    SetLeftChild(parent, u)
    u.bf ← 0
    parent.bf ← 0
    return
  end if
  grand ← parent.parent
  SetRightChild(u, parent)
  u.bf ← –1
  if (grand = null)
    T2.root ← u
    T2.height ← T2.height + 1
    return
  end if
  SetLeftChild(grand, u)
  Balance the path from u to the root as is done in a simple AVL tree

Procedure SetLeftChild(node u, node left)
  u.left ← left
  left.parent ← u

Procedure SetRightChild(node u, node right)
  u.right ← right
  right.parent ← u
This procedure directly yields the following theorem.
Theorem 1: Let T1 and T2 be two AVL+ trees of heights h1 and h2, respectively, such that all keys stored in T1 are less than or equal to those in T2. The two trees can be concatenated into a balanced AVL+ tree in O(h1 + h2) time. Moreover, given a value in the range [lb, ub), where lb is the greatest key stored in T1 and ub is the smallest key in T2, the concatenation can be done in O(|h1 – h2| + 1) worst-case time.
IV. DECOMPOSE AN AVL+ TREE

Let T be an AVL+ tree and let s be a given value. We are going to decompose T by s: partition T into two AVL+ trees T1 and T2 such that all data-items of T less than or equal to s are placed in T1 and the other data-items are placed in T2.
Lemma 1: Let T be an AVL+ tree containing n data-items and let s be an arbitrary value. One can decompose T into two lists of subtrees, L1 and L2, of length O(log n) in O(log n) worst-case time, such that all data-items of T less than or equal to s are placed in the subtrees of L1 and the others are placed in the subtrees of L2.
Proof: The proof is based on the Procedure DecomposeToLists in Table II. Note that in Table II we do not write down the details of computing the heights of subtrees. In this procedure, we travel down from the root of T along the search path for s; hence, we visit at most O(log n) nodes. For each visited node u, we add a subtree rooted at u or at one of its children to one of the lists L1 and L2. To make a subtree, we create a tree rooted at a node of known height and record that height in the tree. Since we know the height of T and the balance factors of the nodes on the search path, we can easily compute these heights. The proof proceeds by an induction that we sketch implicitly in the following.
If node u is a leaf with a key less than or equal to s, we simply add the subtree of u to the list L1 and finish the procedure. Similarly, when the item of a leaf node u is greater than s, we add the subtree of u to the list L2 and terminate the procedure. This can be done in O(1) time.
Otherwise, when u is not a leaf node, we compare the index-item of u with s. If s is less than or equal to the value stored in u, then all data-items in the right subtree of u are greater than s. Therefore, we add the subtree of the right child of u to the list L2 and move to the left child of u to continue the procedure. Likewise, if s is greater than the index-item of u, we add the left subtree of u to L1 and move to the right child of u. Each of these steps takes O(1) time.
Since we visit O(log n) nodes and spend O(1) time on each, the total running time of the procedure is O(log n). Also, the lengths of the lists L1 and L2 cannot exceed O(log n) at the end.
□
Lemma 2: We can construct the lists L1 = <t1, t2, …, tk> and L2 = <tk+1, tk+2, …, tm> of Lemma 1 such that they are sorted by both the values and the heights of the subtrees. More precisely:
For each i, 1 ≤ i < m, all data-items in ti have keys less than every data-item in ti+1.
For each i, 1 ≤ i < k, the height of ti is greater than or equal to the height of ti+1. Moreover, for each i, 1 ≤ i < k – 1, the height of ti is greater than the height of ti+2.
For each i, k < i < m, the height of ti is less than or equal to the height of ti+1. Moreover, for each i, k < i < m – 1, the height of ti is less than the height of ti+2.
Proof: In the procedure constructing the lists L1 and L2, we travel down the tree from the root, and since the tree T is balanced, every subtree that we visit and add to one of the lists is not higher than the previously added subtrees; moreover, every subtree is strictly higher than the subtree added two steps after it. Therefore, we only need to add each subtree destined for L1 at the tail of that list and each subtree destined for L2 at the head of that list.
Furthermore, when we add a subtree t to L1, we move to the right child. Hence, all subtrees subsequently appended to L1 or added to L2 contain data-items greater than those in t. The situation is symmetric when we add a subtree at the head of L2.
□
TABLE II. PROCEDURE DECOMPOSETOLISTS

Procedure DecomposeToLists(Tree T, value s, output List L1, output List L2)
  cur ← T.root
  while (cur ≠ null)
    if (s < cur.item and cur is leaf)
      L2.addToHead(subtree rooted by cur)
      return
    else if (cur.item ≤ s and cur is leaf)
      L1.addToTail(subtree rooted by cur)
      return
    else if (s ≤ cur.item)
      L2.addToHead(subtree rooted by cur.right)
      cur ← cur.left
    else
      L1.addToTail(subtree rooted by cur.left)
      cur ← cur.right
    end-if
  end-while
Now we are ready to convert the lists L1 and L2, with the properties mentioned in Lemma 2, into two AVL+ trees. Since the operation is similar for both lists, we only describe how to convert L2 into an AVL+ tree.
Let L2 = <t1, t2, …, tk> and let hi be the height of ti for each ti in this list. First, we concatenate t1 and t2, the two subtrees of lowest height in L2. Then, we concatenate the resulting tree with t3, and so on until all subtrees of L2 have been concatenated. The resulting tree is the desired T2. To concatenate two subtrees, we use the parent, in the original tree T, of the root of a subtree as the new node that must be constructed during the concatenation; its index value lies in the range required by Theorem 1. This reduces the time of concatenating two trees of heights h1 and h2 from O(h1 + h2) to O(|h1 – h2| + 1).
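This folding step can be sketched as follows in C, under the assumption of a concatenate() routine with the interface of Table I (smaller keys in the first argument, result left in the second) and the Tree type from the earlier sketch; TreeList and all other names are ours.

    /* Fold the list L2 of Lemma 2 into a single AVL+ tree; a minimal sketch. */
    typedef struct TreeList {
        Tree tree;
        struct TreeList *next;
    } TreeList;

    extern void concatenate(Tree *lo, Tree *hi);  /* Table I; result in *hi */

    static Tree fold_l2(TreeList *l2) {  /* l2 non-empty, ordered as in Lemma 2 */
        Tree acc = l2->tree;             /* start with the lowest subtree */
        for (TreeList *p = l2->next; p != 0; p = p->next) {
            concatenate(&acc, &p->tree); /* costs O(|h(acc) - h(t)| + 1) */
            acc = p->tree;               /* the result was stored in p->tree */
        }
        return acc;
    }

The loop mirrors the second while loop of Table III below, where the accumulated tree is passed as the first argument because its keys are smaller than those of the next subtree.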
Let Si be the subtree made by concatenating t1, …, ti, and suppose we are going to concatenate Si with ti+1. If the height of Si is lower than the height of ti+1, this operation takes O(hi+1 – hi + 1) worst-case time. Otherwise, due to the properties mentioned above, one can easily see that the height of Si is at most hi+1 + 2, so the concatenation takes O(1) time in this case. Hence, the concatenation of Si with ti+1 takes O(hi+1 – hi + 1) worst-case time, and the total cost is O((h2 – h1 + 1) + (h3 – h2 + 1) + ⋯ + (hk – hk–1 + 1)) = O(hk – h1 + k) = O(log n).
We can now introduce the Procedure Decompose (Table III), which decomposes an AVL+ tree T by a given value s into two trees T1 and T2. An illustration of this procedure is depicted in Fig. 1.
TABLE III. PROCEDURE DECOMPOSE

Procedure Decompose(Tree T, value s, output Tree T1, output Tree T2)
  DecomposeToLists(T, s, L1, L2)
  let T1 and T2 be two empty trees
  while (L1 is not empty)
    let t be the last tree in L1
    remove t from L1
    Concatenate(t, T1)
  end-while
  while (L2 is not empty)
    let t be the first tree in L2
    remove t from L2
    Concatenate(T2, t)
    T2 ← t
  end-while
This yields the following theorem.
Theorem 2: Let T be an AVL+ tree storing n data-items and let s be an arbitrary value. One can decompose T into two AVL+ trees T1 and T2 in O(log n) time, in such a way that all data-items of T less than or equal to s are placed in T1 and the other data-items are placed in T2.
V. BOUNDING THE IN-DEGREE IN DATA STRUCTURES

In this section, we describe, in abstract terms, a general technique for making static data structures dynamic. Consider a data structure S that is optimal for searching and querying a set of static data objects U. This means that by spending a rather long pre-processing time on U, we can construct S so that it answers future queries quickly. However, we cannot update S fast enough when changes such as inserting a new item into, or deleting an existing item from, the set U occur. We are therefore interested in designing a dynamic data structure D that employs the techniques of S but can handle updates of U faster. Of course, there can be several difficulties in designing D, some of which are mentioned in [4]. The problem we focus on is the number of pointers that point to the same memory location (the in-degree).
Consider a situation in which we need to merge two nodes A and B of S with in-degrees dA and dB, respectively (Fig. 2). To do so, we could append the data of B to node A. Then, we need to redirect all pointers that point to B so that they hold the address of A. This pointer-refinement operation takes O(dB) time. If S guarantees a constant upper bound on the in-degree, then the pointer refinement takes no more than O(1) time, which is tolerable. However, some kinds of data structures do not guarantee such a constant bound, so our attempt to build a fast dynamic data structure fails. On the other hand, consider splitting a node A that is pointed to by dA pointers. In this case, a similar problem arises when we want to redirect some of the pointers from A to a new node B.
Figure 1. An example of tree decomposition: (a) a height-balanced binary search tree with a separator value s; (b) the result of the decomposition. Highlighted nodes denote the roots of the subtrees mentioned in Lemma 1 and their parents.

Figure 2. When the in-degree of nodes is large, the cost of merging two nodes or splitting a node would be high: (a) two nodes A and B with large in-degree; (b) the result of merging A and B into a single node A|B; (c) splitting the node into two nodes A' and B'. Red arrows show the pointers that have been changed.
To solve the problem, we have to reduce the in-degree of the nodes to a constant. Assume that we have an order over the pointers that point to the same node; this order may come from the intrinsic order of the data stored in the nodes that contain these pointers.
Let P be a set of pointers in S that point to the same node u. In the dynamic data structure D, we use a balanced binary search tree T representing P, which we call the inverse pointer tree of the set P. Every leaf of T corresponds to one pointer of P, and the order used to define this binary search tree is the order defined on the set P. The root of T holds a pointer to the node u. Using the parent pointers of the nodes in the inverse pointer tree T, one can traverse D just as S can be traversed.
Now consider two sets of pointers PA and PB in S that point to nodes A and B, and let TA and TB be the inverse pointer trees in D corresponding to PA and PB, respectively. To merge nodes A and B during an update operation, we can update the pointers of PA and PB by concatenating TA and TB in O(log |PA| + log |PB|) worst-case time. Similarly, to update the pointers during the split of a node A into two nodes, we can simply decompose TA into two trees by an appropriate splitter value in O(log |PA|) worst-case time.
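The following C sketch makes the bookkeeping concrete, reusing the Node and Tree types from section II; the layout and all names are ours, a sketch under stated assumptions rather than a definitive implementation.

    /* Each logical pointer to a node of S becomes a leaf of that node's
       inverse pointer tree; for brevity we keep the real address beside the
       tree rather than in its root node. */
    typedef struct {
        Tree inv;      /* inverse pointer tree over the pointer set P */
        void *target;  /* the node u of S that every pointer in P refers to */
    } SharedRef;

    /* Following one logical pointer costs O(log |P|): climb the parent links
       to the root, which identifies the referenced node. */
    static void *deref(const Node *leaf, const SharedRef *ref) {
        const Node *n = leaf;
        while (n->parent != 0)
            n = n->parent;   /* n is now ref->inv.root */
        (void)n;             /* the climb only demonstrates the access cost */
        return ref->target;
    }

    extern void concatenate(Tree *lo, Tree *hi);  /* Table I; result in *hi */

    /* Merging nodes A and B of S reduces to one concatenation (Theorem 1). */
    static void merge_refs(SharedRef *a, SharedRef *b, void *merged_node) {
        concatenate(&a->inv, &b->inv);  /* O(log|PA| + log|PB|); a is retired */
        b->target = merged_node;        /* the merged tree lives in b->inv */
    }

Splitting a node is symmetric: decompose the inverse pointer tree by an appropriate splitter value and attach each half to one of the two new nodes.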
Although this method imposes a logarithmic worst-case penalty on each node access, it allows us to perform update operations faster. Moreover, the amortized cost of these node-access penalties may be low enough that it does not deteriorate the asymptotic worst-case search time. Fig. 3 illustrates this technique by bounding the in-degree of the data structure shown in Fig. 2.
Figure 3. (a) Using an inverse pointer tree for the pointers that point to the same node, we can reduce the in-degree to a constant. The low cost of (b) merging two nodes by concatenating their inverse pointer trees and (c) splitting a node into two nodes by decomposing its inverse pointer tree.
VI. CONCLUSION

In this paper, we introduced two new operations on self-balanced binary search trees: the decomposition of a tree into two trees and the concatenation of two trees into a single one. In addition, we presented algorithms that perform each of these operations optimally, in logarithmic worst-case time. Moreover, we used these algorithms to design a novel method for making static data structures dynamic. Using this technique, one may design novel dynamic data structures for solving problems such as most-specific range matching [6] and point location in planar subdivisions [9], [10] based on optimal static data structures [5]. The authors leave the application of this technique to concrete problems for future work.
REFERENCES
[1] G. M. Adelson-Velskii and E. M. Landis, "An algorithm for the organization of information," English translation by Myron J. Ricci in Soviet Math. Doklady, 3:1259–1263, 1962 [Proceedings of the USSR Academy of Sciences, 146:263–266 (Russian)].
[2] R. Bayer, "Symmetric binary B-trees: Data structure and maintenance algorithms," Acta Informatica, 1:290–306, 1972.
[3] L. J. Guibas and R. Sedgewick, "A dichromatic framework for balanced trees," in Proceedings of the 19th Annual Symposium on Foundations of Computer Science, pages 8–21, IEEE Computer Society, 1978.
[4] M. H. Overmars, The Design of Dynamic Data Structures, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1983.
[5] H. Edelsbrunner, L. J. Guibas, and J. Stolfi, "Optimal point location in a monotone subdivision," SIAM J. Comput., 15:317–340, 1986.
[6] H. Lu and S. Sahni, "O(log n) dynamic router-tables for prefixes and ranges," IEEE Trans. Comput., 53(10):1217–1230, October 2004.
[7] M. Behdadfar, H. Saidi, H. Alaei, and B. Samari, "Scalar prefix search: A new route lookup algorithm for next generation Internet," in Proceedings of IEEE INFOCOM, 2009.
[8] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications, Springer, third edition, 2008.
[9] L. Arge, G. S. Brodal, and L. Georgiadis, "Improved dynamic planar point location," in Proceedings of the 47th IEEE Symposium on Foundations of Computer Science, pages 305–314, 2006.
[10] Y. Giora and H. Kaplan, "Optimal dynamic vertical ray shooting in rectilinear planar subdivisions," ACM Trans. Algorithms, 5(3), Article 28, July 2009, 51 pages.