CHAPTER 10 Search Structures

All the programs in this file are selected from
Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed
“Fundamentals of Data Structures in C”,
Computer Science Press, 1992.
AVL Trees
• Dynamic tables may also be maintained as binary search trees.
• Depending on the order in which the symbols are put into the table, the resulting binary search trees differ, and so does the average number of comparisons needed to access a symbol.
Binary Search Tree for The Months of The Year
Input Sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG, SEPT, OCT, NOV, DEC
[Figure: binary search tree produced by this input sequence]
Max comparisons: 6
Average comparisons: 3.5
A Balanced Binary Search Tree For The Months of The Year
Input Sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR, JAN, JUNE, SEPT, NOV
[Figure: balanced binary search tree produced by this input sequence]
Max comparisons: 4
Average comparisons: 3.1
Degenerate Binary Search Tree
Input Sequence: APR, AUG, DEC, FEB, JAN, JULY, JUNE, MAR, MAY, NOV, OCT, SEPT
[Figure: degenerate binary search tree produced by this input sequence; every node has only a right child]
Max comparisons: 12
Average comparisons: 6.5
Minimize The Search Time of Binary
Search Tree In Dynamic Situation
• The three examples above show that the average and maximum search times are minimized when the binary search tree is kept as a complete binary tree at all times.
• However, in a dynamic situation, restructuring the tree into a complete binary tree after every change is expensive.
• In 1962, Adelson-Velskii and Landis introduced a binary tree structure that is balanced with respect to the heights of its subtrees. Because the tree is balanced, dynamic retrievals can be performed in O(log n) time if the tree has n nodes. After an insertion or deletion, the resulting tree remains height-balanced. This is called an AVL tree.
AVL Tree
• Definition: An empty tree is height-balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees respectively, then T is height-balanced iff
(1) TL and TR are height-balanced, and
(2) |hL – hR| ≤ 1 where hL and hR are the heights of TL and TR, respectively.
• Definition: The balance factor, BF(T), of a node T in a binary tree is defined to be hL – hR, where hL and hR, respectively, are the heights of the left and right subtrees of T. For any node T in an AVL tree, BF(T) = -1, 0, or 1.
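The two definitions above can be checked directly in code. The following is a minimal sketch, assuming a hypothetical node type `bst_node` with an integer key (the slides do not give a declaration for AVL nodes), and recomputing heights on every call rather than caching them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not taken from the textbook. */
typedef struct bst_node {
    int key;
    struct bst_node *left, *right;
} bst_node;

/* Height of a tree: 0 for the empty tree, else 1 + height of the taller subtree. */
int height(const bst_node *t)
{
    if (t == NULL) return 0;
    int hl = height(t->left);
    int hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* BF(T) = hL - hR; only meaningful for a nonempty tree. */
int balance_factor(const bst_node *t)
{
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* Returns 1 if t is height-balanced according to the definition, else 0. */
int is_height_balanced(const bst_node *t)
{
    if (t == NULL) return 1;              /* the empty tree is height-balanced */
    int bf = balance_factor(t);
    return bf >= -1 && bf <= 1
        && is_height_balanced(t->left)
        && is_height_balanced(t->right);
}
```

Recomputing heights keeps the sketch close to the definition; a practical AVL implementation would normally store the height or balance factor in each node.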
Balanced Trees Obtained for The Months of The Year
[Figure: the AVL tree after each insertion, with the balance factor shown at every node and the rebalancing rotation, if any, applied after the insertion]
(a) Insert MARCH
(b) Insert MAY
(c) Insert NOVEMBER: RR rotation
(d) Insert AUGUST
(e) Insert APRIL: LL rotation
(f) Insert JANUARY: LR rotation
(g) Insert DECEMBER
(h) Insert JULY
(i) Insert FEBRUARY: RL rotation
(j) Insert JUNE: LR rotation
(k) Insert OCTOBER: RR rotation
(l) Insert SEPTEMBER
Rebalancing Rotation of
Binary Search Tree
• Let A be the nearest ancestor of the newly inserted node Y whose balance factor becomes ±2.
• LL: Y is inserted in the left subtree of the left subtree of A.
• LR: Y is inserted in the right subtree of the left subtree of A.
• RR: Y is inserted in the right subtree of the right subtree of A.
• RL: Y is inserted in the left subtree of the right subtree of A.
• If a height-balanced binary tree becomes unbalanced as a result of an insertion, then these are the only four cases possible for rebalancing.
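The four cases can be sketched as a recursive insertion routine. This is an illustrative sketch rather than the textbook's program: the node type `avl_node`, the cached `height` field, and the function names are assumptions, and keys are compared with `strcmp` so that the month names order alphabetically.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; each node caches the height of its subtree. */
typedef struct avl_node {
    const char *key;
    int height;
    struct avl_node *left, *right;
} avl_node;

static int h(const avl_node *t) { return t ? t->height : 0; }
static int bf(const avl_node *t) { return h(t->left) - h(t->right); }

static void update(avl_node *t)
{
    int hl = h(t->left), hr = h(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

/* LL case: the left child B of A becomes the subtree root. */
static avl_node *rotate_ll(avl_node *a)
{
    avl_node *b = a->left;
    a->left = b->right;
    b->right = a;
    update(a);
    update(b);
    return b;
}

/* RR case: the right child B of A becomes the subtree root. */
static avl_node *rotate_rr(avl_node *a)
{
    avl_node *b = a->right;
    a->right = b->left;
    b->left = a;
    update(a);
    update(b);
    return b;
}

/* Inserts key and returns the (possibly new) root of the subtree. */
avl_node *avl_insert(avl_node *t, const char *key)
{
    if (t == NULL) {
        avl_node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    int c = strcmp(key, t->key);
    if (c < 0)      t->left = avl_insert(t->left, key);
    else if (c > 0) t->right = avl_insert(t->right, key);
    else            return t;                 /* keys are distinct: ignore duplicates */
    update(t);
    if (bf(t) == 2) {                         /* t plays the role of A */
        if (bf(t->left) < 0)                  /* LR: first rotate the left child */
            t->left = rotate_rr(t->left);
        return rotate_ll(t);                  /* LL, or second half of LR */
    }
    if (bf(t) == -2) {
        if (bf(t->right) > 0)                 /* RL: first rotate the right child */
            t->right = rotate_ll(t->right);
        return rotate_rr(t);                  /* RR, or second half of RL */
    }
    return t;
}
```

Inserting MAR, MAY, NOV, AUG, APR, JAN, DEC, JULY, FEB, JUNE, OCT, SEPT in that order with this sketch ends with JAN at the root, DEC and MAR as its children, and a tree of height 5, which agrees with the month example. Expressing LR and RL as two single rotations is a common design choice; it gives the same result as the dedicated double rotations.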
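Rebalancing Rotation LL
[Figure: LL rotation. Before the insertion, BF(A) = +1 and BF(B) = 0, where B is the left child of A. The height of BL increases to h+1, so BF(A) = +2 and BF(B) = +1. After the rotation, B is the root of the subtree with BF(B) = 0, A is its right child with BF(A) = 0, and the subtrees are BL, BR, AR from left to right.]
Rebalancing Rotation RR
[Figure: RR rotation. Before the insertion, BF(A) = -1 and BF(B) = 0, where B is the right child of A. The height of BR increases to h+1, so BF(A) = -2 and BF(B) = -1. After the rotation, B is the root of the subtree with BF(B) = 0, A is its left child with BF(A) = 0, and the subtrees are AL, BL, BR from left to right.]
Rebalancing Rotation LR(a)
[Figure: LR(a) rotation. Before the insertion, BF(A) = +1 and BF(B) = 0. The new node C becomes the right child of B, so BF(A) = +2, BF(B) = -1, and BF(C) = 0. After the rotation, C is the root with B as its left child and A as its right child, all with balance factor 0.]
Rebalancing Rotation LR(b)
[Figure: LR(b) rotation. Before the insertion, BF(A) = +1, BF(B) = 0, and BF(C) = 0, where C is the right child of B. The insertion goes into CL, so BF(A) = +2, BF(B) = -1, and BF(C) = +1. After the rotation, C is the root with BF(C) = 0, B is its left child with BF(B) = 0, A is its right child with BF(A) = -1, and the subtrees are BL, CL, CR, AR from left to right.]
Rebalancing Rotation LR(c)
[Figure: LR(c) rotation. The insertion goes into CR, so BF(A) = +2, BF(B) = -1, and BF(C) = -1. After the rotation, C is the root with BF(C) = 0, B is its left child with BF(B) = +1, A is its right child with BF(A) = 0, and the subtrees are BL, CL, CR, AR from left to right.]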
AVL Trees (Cont.)
• Once rebalancing has been carried out on the subtree in question, examining the remaining tree is unnecessary.
• Insertion into an ordinary binary search tree with n nodes can take O(n) time in the worst case. For an AVL tree, the insertion time is O(log n).
AVL Insertion Complexity
• Let $N_h$ be the minimum number of nodes in a height-balanced tree of height h. In the worst case, the height of one of the subtrees is h-1 and that of the other is h-2. Both subtrees must also be height-balanced, so $N_h = N_{h-1} + N_{h-2} + 1$, with $N_0 = 0$, $N_1 = 1$, and $N_2 = 2$.
• This recursive definition is similar to that of the Fibonacci numbers: $F_n = F_{n-1} + F_{n-2}$, $F_0 = 0$, $F_1 = 1$.
• It can be shown that $N_h = F_{h+2} - 1$. Since $F_n \approx \phi^n/\sqrt{5}$ with $\phi = (1+\sqrt{5})/2$, we can derive that $N_h \approx \phi^{h+2}/\sqrt{5} - 1$. A height-balanced tree with n nodes therefore has height O(log n), so the worst-case insertion time for a height-balanced tree with n nodes is O(log n).
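The identity between the minimum node count and the Fibonacci numbers can be checked numerically. The sketch below simply evaluates both recurrences as stated above; the function names `min_avl_nodes` and `fib` are chosen here for illustration.

```c
#include <assert.h>

/* N_h: minimum number of nodes in a height-balanced tree of height h,
   using N_0 = 0, N_1 = 1 and N_h = N_{h-1} + N_{h-2} + 1. */
unsigned long min_avl_nodes(int h)
{
    assert(h >= 0 && h <= 40);
    unsigned long prev = 0, cur = 1;      /* N_0 and N_1 */
    if (h == 0) return prev;
    for (int i = 2; i <= h; i++) {
        unsigned long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* F_n with F_0 = 0, F_1 = 1. */
unsigned long fib(int n)
{
    assert(n >= 0 && n <= 42);
    unsigned long a = 0, b = 1;           /* F_0 and F_1 */
    for (int i = 0; i < n; i++) {
        unsigned long t = a + b;
        a = b;
        b = t;
    }
    return a;
}
```

For example, a height-balanced tree of height 5 needs at least `min_avl_nodes(5)` = 12 nodes, which equals `fib(7) - 1`.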
Probability of Each Type of
Rebalancing Rotation
• Empirical studies indicate that a random insertion requires no rebalancing, a rebalancing rotation of type LL or RR, or a rebalancing rotation of type LR or RL, with probabilities 0.5349, 0.2327, and 0.2324, respectively.
Comparison of Various Structures
Operation             Sequential List   Linked List   AVL Tree
Search for x          O(log n)          O(n)          O(log n)
Search for kth item   O(1)              O(k)          O(log n)
Delete x              O(n)              O(1) (1)      O(log n)
Delete kth item       O(n - k)          O(k)          O(log n)
Insert x              O(n)              O(1) (2)      O(log n)
Output in order       O(n)              O(n)          O(n)
(1) Doubly linked list and position of x known.
(2) Position for insertion known.
2-3 Trees
• If search trees of degree greater than 2 are used, the insertion and deletion algorithms are simpler than those for AVL trees. Their complexity is still O(log n).
• Definition: A 2-3 tree is a search tree that either is empty or satisfies the following properties:
(1) Each internal node is a 2-node or a 3-node. A 2-node has one element; a 3-node has two elements.
(2) Let LeftChild and MiddleChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3 subtree with root LeftChild have keys less than dataL.key, whereas all elements in the 2-3 subtree with root MiddleChild have keys greater than dataL.key.
(3) Let LeftChild, MiddleChild, and RightChild denote the children of a 3-node. Let dataL and dataR be the two elements in this node. Then dataL.key < dataR.key; all keys in the 2-3 subtree with root LeftChild are less than dataL.key; all keys in the 2-3 subtree with root MiddleChild are less than dataR.key and greater than dataL.key; and all keys in the 2-3 subtree with root RightChild are greater than dataR.key.
(4) All external nodes are at the same level.
2-3 Tree Example
[Figure: a 2-3 tree. The root A holds 40; its left child B holds 10 and 20; its middle child C holds 80.]
The Height of A 2-3 Tree
• As with leftist trees, external nodes are introduced only to make it easier to define and talk about 2-3 trees. External nodes are not physically represented inside a computer.
• The number of elements in a 2-3 tree with height h is between $2^h - 1$ and $3^h - 1$. Hence, the height of a 2-3 tree with n elements is between $\log_3(n+1)$ and $\log_2(n+1)$.
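The bounds can be turned into a small calculation. The sketch below is an illustration with function names chosen here; it returns integer heights, so the lower bound is rounded up and the upper bound is rounded down.

```c
#include <assert.h>

/* Integer power, used instead of floating-point pow to keep results exact. */
static unsigned long long ipow(unsigned base, unsigned e)
{
    unsigned long long r = 1;
    while (e-- > 0) r *= base;
    return r;
}

/* Fewest elements in a 2-3 tree of height h (every node is a 2-node). */
unsigned long long min_elements23(unsigned h)
{
    assert(h <= 30);
    return ipow(2, h) - 1;
}

/* Most elements in a 2-3 tree of height h (every node is a 3-node). */
unsigned long long max_elements23(unsigned h)
{
    assert(h <= 30);
    return ipow(3, h) - 1;
}

/* Smallest height that can hold n elements: least h with 3^h - 1 >= n. */
unsigned min_height23(unsigned long long n)
{
    assert(n <= 1000000000ULL);
    unsigned h = 0;
    while (max_elements23(h) < n) h++;
    return h;
}

/* Largest height possible with n elements: greatest h with 2^h - 1 <= n. */
unsigned max_height23(unsigned long long n)
{
    assert(n <= 1000000000ULL);
    unsigned h = 0;
    while (min_elements23(h + 1) <= n) h++;
    return h;
}
```

For the four-element example tree (40; 10, 20; 80), both bounds give height 2.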
2-3 Tree Data Structure
typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;        /* a 2-node uses only data_l */
    two_three_ptr left_child,
                  middle_child,
                  right_child;     /* a 2-node uses only the left and middle children */
};
Searching A 2-3 Tree
• The search algorithm for binary search trees can be easily extended to obtain the search function of a 2-3 tree (search23).
• The search function calls a function compare that compares a key x with the keys in a given node p. It returns the value 1, 2, 3, or 4, depending on whether x is less than the first key, between the first key and the second key, greater than the second key, or equal to one of the keys in node p.
Program 10.4: Function to search a 2-3 tree
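The search described above might be sketched as follows. This is not the textbook's Program 10.4: it assumes that `element` is a struct with an integer `key`, that no valid key equals `INT_MAX`, and that a 2-node is marked by `data_r.key == INT_MAX`; the function name `search23` and its signature are also assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct { int key; } element;      /* assumed element layout */

typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;               /* assumed: data_r.key == INT_MAX in a 2-node */
    two_three_ptr left_child, middle_child, right_child;
};

/* Returns 1 if x is less than the first key, 2 if it lies between the two keys,
   3 if it is greater than the second key, 4 if it equals one of the keys. */
int compare(element x, two_three_ptr p)
{
    assert(p != NULL && x.key != INT_MAX);
    if (x.key == p->data_l.key || x.key == p->data_r.key) return 4;
    if (x.key < p->data_l.key) return 1;
    if (x.key < p->data_r.key) return 2;  /* always taken in a 2-node when x > data_l */
    return 3;
}

/* Returns the node containing x, or NULL if x is not in the tree. */
two_three_ptr search23(two_three_ptr t, element x)
{
    while (t != NULL) {
        switch (compare(x, t)) {
        case 1: t = t->left_child;   break;
        case 2: t = t->middle_child; break;
        case 3: t = t->right_child;  break;
        case 4: return t;
        }
    }
    return NULL;
}
```

With the `INT_MAX` convention, a 2-node never yields case 3, so its unused right child is never followed.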
Insertion Into A 2-3 Tree
• First, we use the search function to search the 2-3 tree for the key that is to be inserted.
• If the key being searched for is already in the tree, then the insertion fails, as all keys in a 2-3 tree are distinct. Otherwise, we will encounter a unique leaf node U. The node U may be in one of two states:
– The node U has only one element: the key can be inserted in this node.
– The node U already contains two elements: a new node is created. The newly created node will contain the element with the largest key from among the two elements initially in U and the new element x. The element with the smallest key will be in the original node, and the element with the median key, together with a pointer to the newly created node, will be inserted into the parent of U.
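Insertion to A 2-3 Tree Example
[Figure: two 2-3 trees]
(a) 70 inserted: root A holds 40; B holds 10 and 20; C holds 70 and 80.
(b) 30 inserted: root A holds 20 and 40; B holds 10; D holds 30; C holds 70 and 80.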
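Insertion of 60 Into Figure 10.15(b)
[Figure: the 2-3 tree after inserting 60. The new root G holds 40; its children A and F hold 20 and 70; the leaves hold 10 (B), 30 (D), 60, and 80 (nodes C and E).]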
Node Split
• From the above examples, we find
that each time an attempt is made to
add an element into a 3-node p, a new
node q is created. This is referred to
as a node split.
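The key redistribution performed by a node split can be sketched in isolation. The helper below is hypothetical (the names `split_result` and `split_keys` are not from the textbook) and works on bare integer keys; a full insertion routine would also move child pointers and pass the median up to the parent.

```c
#include <assert.h>

/* Outcome of splitting a 3-node when a third key arrives. */
typedef struct {
    int smallest;   /* stays in the original node p */
    int median;     /* moves up into the parent of p */
    int largest;    /* goes into the newly created node q */
} split_result;

/* data_l < data_r are the keys already in the 3-node; x is the new key. */
split_result split_keys(int data_l, int data_r, int x)
{
    assert(data_l < data_r && x != data_l && x != data_r);
    split_result s;
    if (x < data_l) {
        s.smallest = x;      s.median = data_l; s.largest = data_r;
    } else if (x < data_r) {
        s.smallest = data_l; s.median = x;      s.largest = data_r;
    } else {
        s.smallest = data_l; s.median = data_r; s.largest = x;
    }
    return s;
}
```

Inserting 30 into the node holding 10 and 20 leaves 10 in place, sends 20 to the parent, and puts 30 in the new node, as in panel (b) of the insertion example. Inserting 60 into the node holding 70 and 80 sends 70 up, and splitting the parent (20, 40) with 70 then sends 40 up to a new root.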
Program 10.5: Insertion into a 2-3 tree (P.501)
Deletion From a 2-3 Tree
• If the element to be deleted is not in a leaf node, the deletion can be transformed into a deletion from a leaf node. The deleted element can be replaced by either the element with the largest key in its left subtree or the element with the smallest key in its right subtree.
• Now we can focus on deletion from a leaf node.
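Deletion From A 2-3 Tree Example
[Figure: a sequence of 2-3 trees showing successive deletions]
(a) Initial 2-3 tree: root A holds 50 and 80; B holds 10 and 20; C holds 60 and 70; D holds 90 and 95.
(b) 70 deleted: C holds 60.
(c) 90 deleted: D holds 95.
(d) 60 deleted: root A holds 20 and 80; B holds 10; C holds 50; D holds 95.
(e) 95 deleted: root A holds 20; B holds 10; C holds 50 and 80.
(f) 50 deleted: root A holds 20; B holds 10; C holds 80.
(g) 10 deleted: the single node B holds 20 and 80.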
Rotation and Combine
• As shown in the example, a deletion may invoke a rotation or a combine operation.
• For a rotation, there are three cases:
– the leaf node p is the left child of its parent r.
– the leaf node p is the middle child of its parent r.
– the leaf node p is the right child of its parent r.
Three Rotation Cases
[Figure: before and after trees for each rotation case, with parent r, deficient node p, and sibling q]
(a) p is the left child of r
(b) p is the middle child of r
(c) p is the right child of r
Steps in Deletion From a
Leaf Of a 2-3 Tree
• Step 1: Modify node p as necessary to reflect its status after the desired element has been deleted.
• Step 2:
    while (p has zero elements && p is not the root) {
        let r be the parent of p;
        let q be the left or right sibling of p (as appropriate);
        if (q is a 3-node)
            rotate;
        else
            combine;
        p = r;
    }
• Step 3: If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.
Combine When p is the Left Child of r
[Figure: before and after trees for the combine operation with parent r, deficient node p, and sibling q; panels (a) and (b)]
M-Way Search Tree
Definition: An m-way search tree either is empty or satisfies the following properties:
(1) The root has at most m subtrees and has the following structure:
n, A0, (K1, A1), (K2, A2), …, (Kn, An)
where the Ai, 0 ≤ i ≤ n < m, are pointers to subtrees, and the Ki, 1 ≤ i ≤ n < m, are key values.
(2) Ki < Ki+1, 1 ≤ i < n
(3) All key values in the subtree Ai are less than Ki+1 and greater than Ki, 0 < i < n
(4) All key values in the subtree An are greater than Kn, and those in A0 are less than K1.
(5) The subtrees Ai, 0 ≤ i ≤ n, are also m-way search trees.
Searching an m-Way
Search Tree
• Suppose we wish to search an m-way search tree T for the key value x. Assume T resides on a disk. By searching the keys of the root, we determine i such that Ki ≤ x < Ki+1 (for convenience, take K0 = -∞ and Kn+1 = +∞).
– If x = Ki, the search is complete.
– If x ≠ Ki, x must be in the subtree Ai if x is in T.
– We then retrieve the root of the subtree Ai and continue the search until we find x or determine that x is not in T.
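The search loop can be sketched for an in-memory tree. The node layout below (`mway_node`, with `key[1..n]` and `child[0..n]`) and the fixed order `MWAY_ORDER` are assumptions made for illustration; following a child pointer stands in for the disk read of the next node.

```c
#include <assert.h>
#include <stddef.h>

#define MWAY_ORDER 4                      /* assumed order m for this sketch */

/* Hypothetical in-memory node: n keys K_1..K_n and subtrees A_0..A_n. */
typedef struct mway_node {
    int n;                                /* number of keys, n < MWAY_ORDER */
    int key[MWAY_ORDER];                  /* key[1..n] are used; key[0] is unused */
    struct mway_node *child[MWAY_ORDER];  /* child[i] is the subtree A_i */
} mway_node;

/* Returns the node holding x and stores its index in *pos,
   or returns NULL if x is not in the tree. */
const mway_node *mway_search(const mway_node *t, int x, int *pos)
{
    assert(pos != NULL);
    while (t != NULL) {
        int i = 0;
        /* Find the largest i with K_i <= x, treating K_0 as minus infinity. */
        while (i < t->n && t->key[i + 1] <= x) i++;
        if (i >= 1 && t->key[i] == x) {
            *pos = i;
            return t;
        }
        t = t->child[i];                  /* continue in subtree A_i */
    }
    return NULL;
}
```

The inner loop is a sequential scan; for large m a binary search over the keys of a node would be the more usual choice.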
Searching an m-Way
Search Tree
• The maximum number of nodes in a tree of degree m and height h is
$\sum_{0 \le i \le h-1} m^i = (m^h - 1)/(m - 1)$
• Since each node has at most m - 1 keys, the maximum number of keys in an m-way search tree of height h is $m^h - 1$.
• To achieve a performance close to that of the best m-way search trees for a given number of keys n, the search tree must be balanced.
B-Tree
Definition: A B-tree of order m is an m-way search tree that either is empty or satisfies the following properties:
(1) The root node has at least two children.
(2) All nodes other than the root node and failure nodes have at least $\lceil m/2 \rceil$ children.
(3) All failure nodes are at the same level.
B-Tree (Cont.)
• Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is a B-tree of order 4.
• Also, all B-trees of order 2 are full binary trees.
• A B-tree of order m and height l has at most $m^l - 1$ keys.
• For a B-tree of order m and height l, the number of keys N in such a tree satisfies $N \ge 2\lceil m/2 \rceil^{l-1} - 1$, $l \ge 1$.
• If there are N key values in a B-tree of order m, then all nonfailure nodes are at levels less than or equal to l, where $l \le \log_{\lceil m/2 \rceil}\{(N+1)/2\} + 1$. The maximum number of accesses that have to be made for a search is l.
• For example, for a B-tree of order m = 200, an index with $N \le 2 \times 10^6 - 2$ will have l ≤ 3.
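The bound on the number of levels can be evaluated with integer arithmetic. The sketch below uses function names chosen here for illustration and finds the largest l whose minimum key count does not exceed N, which is equivalent to the logarithmic bound above.

```c
#include <assert.h>

/* Minimum number of keys in a B-tree of order m with l levels:
   2 * ceil(m/2)^(l-1) - 1. */
unsigned long long btree_min_keys(unsigned m, unsigned l)
{
    assert(m >= 3 && l >= 1);
    unsigned long long half = (m + 1) / 2;        /* ceil(m/2) */
    unsigned long long p = 1;
    for (unsigned i = 1; i < l; i++) p *= half;
    return 2 * p - 1;
}

/* Largest number of levels a B-tree of order m with n_keys keys can have. */
unsigned btree_max_levels(unsigned m, unsigned long long n_keys)
{
    assert(m >= 3 && n_keys >= 1 && n_keys <= 1000000000000ULL);
    unsigned l = 1;
    while (btree_min_keys(m, l + 1) <= n_keys) l++;
    return l;
}
```

For m = 200, a tree with four levels needs at least 2 × 100^3 - 1 = 1,999,999 keys, so `btree_max_levels(200, 1999998)` is 3, in agreement with the example.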
The Choice of m
• B-trees of high order are desirable since they
result in a reduction in the number of disk
accesses.
• If the index has N entries, then a B-tree of order m = N+1 has only one level. But this choice is not reasonable, since the N entries cannot all fit in internal memory.
• In selecting a reasonable choice for m, we need to
keep in mind that we are really interested in
minimizing the total amount of time needed to
search the B-tree for a value x. This time has two
components:
(1) the time for reading in the node from the disk
(2) the time needed to search this node for x.
The Choice of m (Cont.)
• Assume a node of a B-tree of order m is of a fixed size and is large enough to accommodate n, A0, and m-1 triples (Ki, Ai, Bi), 1 ≤ i < m.
• If the Ki are at most α characters long and the Ai and Bi are each β characters long, then the size of a node is about m(α+2β) characters. The time to access a node is then
$t_s + t_l + m(\alpha + 2\beta) t_c = a + bm$
where $a = t_s + t_l$ = seek time + latency time, $b = (\alpha + 2\beta) t_c$, and $t_c$ = transmission time per character.
• If binary search is used to search each node of the B-tree, then the internal processing time per node is $c \log_2 m + d$ for some constants c and d.
• The total processing time per node is $\tau = a + bm + c \log_2 m + d$.
• The maximum search time is
$f \cdot \log_2\{(N+1)/2\} \cdot \left\{ \frac{a+d}{\log_2 m} + \frac{bm}{\log_2 m} + c \right\}$
where f is some constant.
Figure 10.36: Values of (35+0.06m)/log2m
m       Search time (sec)
2       35.12
4       17.62
8       11.83
16      8.99
32      7.38
64      6.47
128     6.10
256     6.30
512     7.30
1024    9.64
2048    14.35
4096    23.40
8192    40.50
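The entries of Figure 10.36 can be recomputed with a few lines of code. The sketch below assumes the constants 35 and 0.06 from the figure title are passed as `a` and `b`, and restricts m to powers of two (m = 2^k) so that log2 m is exactly k; the function names are chosen here for illustration.

```c
#include <assert.h>

/* Evaluates (a + b*m) / log2(m) for m = 2^k. */
double search_time_factor(int k, double a, double b)
{
    assert(k >= 1 && k < 31);
    double m = (double)(1L << k);
    return (a + b * m) / (double)k;
}

/* Returns the exponent k in 1..k_max that minimizes the factor above. */
int best_power_of_two(int k_max, double a, double b)
{
    assert(k_max >= 1 && k_max < 31);
    int best = 1;
    for (int k = 2; k <= k_max; k++)
        if (search_time_factor(k, a, b) < search_time_factor(best, a, b))
            best = k;
    return best;
}
```

With a = 35 and b = 0.06, `best_power_of_two(13, 35.0, 0.06)` returns 7, that is m = 128, the row with the smallest value in the figure.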
Figure 10.37: Plot of (35+0.06m)/log2m
[Plot: total maximum search time against m. Marked values on the m axis: 50, 125, 400. Marked values on the search time axis: 5.7, 6.8.]
Insertion into a B-Tree
• Instead of using the top-down insertion of 2-3-4 trees, we generalize the two-pass insertion algorithm for 2-3 trees. Top-down insertion splits many nodes, and each time a node is changed it has to be written back to disk, which increases the number of disk accesses.
• The insertion algorithm for B-trees of order m first performs a search to determine the leaf node p into which the new key is to be inserted.
– If the insertion of the new key into p results in p having m keys, the node p is split.
– Otherwise, the new p is written to the disk, and the insertion is complete.
• Assume that the h nodes read in during the top-down pass can be saved in memory so that they need not be retrieved from disk during the bottom-up pass. Then the number of disk accesses for an insertion is at most h (downward pass) + 2(h-1) (nonroot splits) + 3 (root split) = 3h+1.
• The average number of disk accesses is approximately h+1 for large m.
Figure 10.38: B-Trees of Order 3
p is the number of nonfailure nodes in the final B-tree with N entries. s is the number of splits.
[Figure: three B-trees of order 3]
(a) p = 1, s = 0
(b) p = 3, s = 1
(c) p = 4, s = 2
Deletion from a B-Tree
• The deletion algorithm for B-trees is also a generalization of the deletion algorithm for 2-3 trees.
• First, we search for the key x to be deleted.
– If x is found in a node z that is not a leaf, then the position occupied by x in z is filled by a key from a leaf node of the B-tree.
– Suppose that x is the ith key in z (x = Ki). Then x may be replaced by either the smallest key in the subtree Ai or the largest key in the subtree Ai-1. Since both of these keys are in leaf nodes, the deletion of x from a nonleaf node is transformed into a deletion from a leaf.
Deletion from a B-Tree
(Cont.)
• There are four possible cases when deleting from a leaf node p.
– In the first case, p is also the root. If the root is left with at least one key, the changed root is written back to disk. Otherwise, the B-tree is empty following the deletion.
– In the second case, following the deletion, p has at least $\lceil m/2 \rceil - 1$ keys. The modified leaf is written back to disk.
– In the third case, p has $\lceil m/2 \rceil - 2$ keys, and its nearest sibling, q, has at least $\lceil m/2 \rceil$ keys. Only one of p's nearest siblings is checked. p is deficient, as it has one less than the minimum number of keys required, and q has more keys than the minimum required. As in the case of a 2-3 tree, a rotation is performed. In this rotation, the number of keys in q decreases by one, and the number in p increases by one.
– In the fourth case, p has $\lceil m/2 \rceil - 2$ keys, and q has $\lceil m/2 \rceil - 1$ keys. p is deficient and q has the minimum number of keys permissible for a nonroot node. Nodes p and q and the key Ki that separates them in their parent are combined to form a single node.
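The case analysis can be written as a small classification function. This sketch only decides which case applies and performs no restructuring; the enum `del_case`, the function name, and the convention that `p_keys` is the key count of p after the deletion are assumptions made for illustration.

```c
#include <assert.h>

typedef enum {
    DEL_ROOT    = 1,   /* case 1: p is the root */
    DEL_OK      = 2,   /* case 2: p still has enough keys */
    DEL_ROTATE  = 3,   /* case 3: p is deficient, q can spare a key */
    DEL_COMBINE = 4    /* case 4: p is deficient, q is at the minimum */
} del_case;

/* m: order of the B-tree; p_keys: keys left in leaf p after the deletion;
   q_keys: keys in the nearest sibling q (ignored when p is the root). */
del_case classify_leaf_deletion(unsigned m, int p_is_root,
                                unsigned p_keys, unsigned q_keys)
{
    assert(m >= 3);
    unsigned min_keys = (m + 1) / 2 - 1;          /* ceil(m/2) - 1 */
    if (p_is_root) return DEL_ROOT;
    if (p_keys >= min_keys) return DEL_OK;
    assert(p_keys + 1 == min_keys);               /* exactly one key short */
    return (q_keys > min_keys) ? DEL_ROTATE : DEL_COMBINE;
}
```

For a B-tree of order 5 the minimum is 2 keys per nonroot node, so deleting from a leaf that held 2 keys leads to a rotation when its sibling holds 3 keys and to a combine when its sibling holds 2.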
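Figure 10.39: B-Tree of Order 5
[Figure: each node is shown as its key count n followed by its keys. Root: n = 2, keys 20, 35. Leaves: n = 2, keys 10, 15; n = 2, keys 25, 30; n = 3, keys 40, 45, 50.]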