SearchTrees - Erode Sengunthar Engineering College

CSCE 3110
Data Structures &
Algorithm Analysis
Rada Mihalcea
http://www.cs.unt.edu/~rada/CSCE3110
Search Trees
Reading: Chap. 4, Weiss
Sorting with BST
Use binary search trees for sorting
Start with unsorted sequence
Insert all elements in a BST
Traverse the tree…. how ?
Running time?
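The sorting idea above can be sketched in a few lines. This is a minimal illustration, not code from the course: the names `Node`, `insert`, `inorder` and `bst_sort` are assumptions, and duplicates are (arbitrarily) sent to the right subtree. An in-order traversal answers the "how?" question; with an unbalanced BST the running time is O(n log n) on average but O(n^2) in the worst case (e.g. already sorted input).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:  # assumption: duplicates go to the right
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def inorder(root):
    """Yield keys in ascending order (left subtree, node, right subtree)."""
    stack, node = [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right


def bst_sort(seq):
    """Sort seq by inserting every element into a BST and traversing it."""
    root = None
    for key in seq:
        root = insert(root, key)
    return list(inorder(root))
```

For example, `bst_sort([44, 17, 78, 32, 50])` inserts the five keys and reads them back in ascending order.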
Better Binary Search Trees
Prevent the degeneration of the BST :
A BST can be set up to maintain balance during
updating operations (insertions and removals)
Types of BST which maintain the optimal performance:
splay trees
AVL trees
Red-Black trees
B-trees
AVL Trees
Balanced binary search
trees
An AVL Tree is a binary
search tree such that
for every internal node v
of T, the heights of the
children of v can
differ by at most 1.
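[Figure: example AVL tree with keys 44, 17, 78, 32, 50, 88, 48, 62; the height of each node is shown next to it (44 has height 4, 78 has height 3, 17 and 50 have height 2, the remaining nodes have height 1)]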
Height of an AVL Tree
Proposition: The height of an AVL tree T storing n keys is
O(log n).
Justification: The easiest way to approach this problem is to find
n(h): the minimum number of internal nodes of an AVL tree of
height h.
n(1) = 1 and n(2) = 2
for h ≥ 3, an AVL tree of height h with the minimum number of nodes contains the root node, one AVL
subtree of height h-1 and the other AVL subtree of height h-2.
⇒ $n(h) = 1 + n(h-1) + n(h-2)$
given $n(h-1) > n(h-2)$ ⇒ $n(h) > 2n(h-2)$
$n(h) > 2n(h-2)$
$n(h) > 4n(h-4)$
…
$n(h) > 2^i n(h-2i)$
pick $i = h/2 - 1$ ⇒ $n(h) \ge 2^{h/2-1}$
taking logarithms, it follows that $h < 2\log n(h) + 2$
⇒ height of an AVL tree is O(log n)
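As a quick numerical check of the justification above, the recurrence for n(h) can be evaluated directly and compared with the bound h < 2 log n(h) + 2. This is only a sketch; the function name `min_nodes` is an assumption, and the logarithm is taken base 2.

```python
import math


def min_nodes(h):
    """n(h): minimum number of internal nodes of an AVL tree of height h >= 1."""
    a, b = 1, 2  # n(1) and n(2)
    if h == 1:
        return a
    for _ in range(h - 2):
        a, b = b, 1 + a + b  # n(h) = 1 + n(h-1) + n(h-2)
    return b


# The bound from the proof holds for every height we try.
for h in range(1, 31):
    assert h < 2 * math.log2(min_nodes(h)) + 2
```

The first values are n(3) = 4, n(4) = 7 and n(5) = 12, so the minimum size grows exponentially with the height.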
Insertion
A binary search tree T is called balanced if for every node v, the
height of v’s children differ by at most one.
Inserting a node into an AVL tree involves performing an
expandExternal(w) on T, which changes the heights of some of
the nodes in T.
If an insertion causes T to become unbalanced, we travel up the
tree from the newly created node until we find the first node x
such that its grandparent z is an unbalanced node.
Since z became unbalanced by an insertion in the subtree rooted
at its child y, height(y) = height(sibling(y)) + 2
Need to rebalance...
Insertion: Rebalancing
To rebalance the subtree rooted at z, we must perform a
restructuring
we rename x, y, and z to a, b, and c based on the order
of the nodes in an in-order traversal.
z is replaced by b, whose children are now a and c
whose children, in turn, consist of the four other
subtrees formerly children of x, y, and z.
Insertion (cont’d)
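[Figure "unbalanced...": after inserting 54, the tree rooted at 44 is unbalanced at z = 78, with y = 50 and x = 62; the subtrees T0, T1, T2, T3 of x, y, and z are labeled]
[Figure "...balanced": after restructuring, x = 62 is the root of the subtree, with children y = 50 and z = 78; T0, T1, T2, T3 are reattached in in-order]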
Restructuring
The four ways to rotate nodes in an AVL tree, graphically represented
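Single rotations:
[Figure: single rotation with a = z, b = y, c = x; b = y becomes the root of the subtree with children a = z and c = x, and T0, T1, T2, T3 keep their in-order positions]
[Figure: single rotation with a = x, b = y, c = z; b = y becomes the root of the subtree with children a = x and c = z, and T0, T1, T2, T3 keep their in-order positions]
Restructuring (cont’d)
Double rotations:
[Figure: double rotation with a = z, b = x, c = y; b = x becomes the root of the subtree with children a = z and c = y, and T0, T1, T2, T3 keep their in-order positions]
[Figure: double rotation with a = y, b = x, c = z; b = x becomes the root of the subtree with children a = y and c = z, and T0, T1, T2, T3 keep their in-order positions]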
Restructure Algorithm
Algorithm restructure(x):
Input: A node x of a binary search tree T that has both a parent y and a
grandparent z
Output: Tree T restructured by a rotation (either
single or double) involving nodes x, y, and z.
1. Let (a, b, c) be an inorder listing of the nodes x, y, and z, and let (T0, T1,
T2, T3) be an inorder listing of the four subtrees of x, y, and z, not
rooted at x, y, or z.
2. Replace the subtree rooted at z with a new subtree rooted at b
3. Let a be the left child of b and let T0, T1 be the left and right subtrees of
a, respectively.
4. Let c be the right child of b and let T2, T3 be the left and right subtrees of
c, respectively.
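The four steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the course's implementation: the `Node` class with `parent` pointers and the helpers `set_left`/`set_right` are hypothetical, and the inorder listing in step 1 is obtained by checking which of the four shapes x, y, z form.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.parent = None
        self.left = self.right = None
        self.set_left(left)
        self.set_right(right)

    def set_left(self, child):
        self.left = child
        if child is not None:
            child.parent = self

    def set_right(self, child):
        self.right = child
        if child is not None:
            child.parent = self


def restructure(x):
    """Restructure x, its parent y and grandparent z; return the new subtree root b."""
    y = x.parent
    z = y.parent
    # Step 1: inorder listing (a, b, c) and subtrees (T0, T1, T2, T3).
    if y is z.left and x is y.left:        # single rotation
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif y is z.left and x is y.right:     # double rotation
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    elif y is z.right and x is y.right:    # single rotation
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    else:                                  # y is z.right, x is y.left: double rotation
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    # Step 2: replace the subtree rooted at z with one rooted at b.
    top = z.parent
    if top is None:
        b.parent = None
    elif top.left is z:
        top.set_left(b)
    else:
        top.set_right(b)
    # Step 3: a is b's left child, with subtrees T0 and T1.
    b.set_left(a)
    a.set_left(t0)
    a.set_right(t1)
    # Step 4: c is b's right child, with subtrees T2 and T3.
    b.set_right(c)
    c.set_left(t2)
    c.set_right(t3)
    return b


# Usage: the tree from the insertion example after 54 has been inserted.
n54 = Node(54)
n62 = Node(62, left=n54)
n50 = Node(50, Node(48), n62)
n78 = Node(78, n50, Node(88))
n44 = Node(44, Node(17, None, Node(32)), n78)
new_root = restructure(n62)   # x = 62, y = 50, z = 78
```

After the call, 62 is the right child of 44, with children 50 and 78, which matches the balanced tree in the insertion example.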
Cut/Link Restructure Algorithm
Let’s go into a little more detail on this algorithm...
Any tree that needs to be balanced can be grouped into 7 parts: x, y, z,
and the 4 trees anchored at the children of those nodes (T0-3)
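[Figure: unbalanced tree with z = 44, y = 62, x = 78 and the four subtrees T0 (rooted at 17), T1 (rooted at 50, with children 48 and 54), T2, and T3 (rooted at 88)]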
Cut/Link Restructure Algorithm
Make a new tree which is balanced and put the 7 parts from the
old tree into the new tree so that the numbering is still correct
when we do an in-order-traversal of the new tree.
This works regardless of how the tree is originally unbalanced.
Let’s see how it works!
Cut/Link Restructure Algorithm
Number the 7 parts by doing an in-order-traversal. (note that x,y, and z are
now renamed based upon their order within the traversal)
[Figure: the 7 parts numbered by in-order rank: 1 = T0 (17), 2 = z (a) = 44, 3 = T1 (50, 48, 54), 4 = y (b) = 62, 5 = T2, 6 = x (c) = 78, 7 = T3 (88)]
Cut/Link Restructure Algorithm
Now create an Array, numbered 1 to 7 (the 0th element can be ignored
with minimal waste of space)
[Array: slots 1 2 3 4 5 6 7, all empty]
• Cut() the 4 T trees and place them in their inorder rank in the array
Rank:     1    2    3    4    5    6    7
Content:  T0   -    T1   -    T2   -    T3
Cut/Link Restructure Algorithm
Now cut x,y, and z in that order (child,parent,grandparent)
and place them in their inorder rank in the array.
Rank:     1    2       3    4       5    6       7
Content:  T0   a = 44  T1   b = 62  T2   c = 78  T3
• Now we can re-link these subtrees to the main tree.
• Link in rank 4 (b) where the subtree’s root formerly was.
[Figure: b = 62 (rank 4) placed at the former position of the subtree’s root]
Cut/Link Restructure Algorithm
Link in ranks 2 (a) and 6 (c) as 4’s children.
[Figure: b = 62 (rank 4) with left child a = 44 (rank 2) and right child c = 78 (rank 6)]
Cut/Link Restructure Algorithm
Finally, link in ranks 1,3,5, and 7 as the children of 2 and 6.
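[Figure: final tree: y = 62 (rank 4) is the root of the subtree; its left child z = 44 (rank 2) has subtrees T0 (17) and T1 (50, with children 48 and 54); its right child x = 78 (rank 6) has subtrees T2 and T3 (88)]
Now you have a balanced tree!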
Cut/Link Restructure Algorithm
This algorithm for restructuring has the exact same
effect as using the four rotation cases discussed earlier.
Advantages: no case analysis, more elegant
Disadvantage: can be more code to write
Same time complexity
Removal
We can easily see that performing a
removeAboveExternal(w) can cause T to become
unbalanced.
Let z be the first unbalanced node encountered while
traveling up the tree from w. Also, let y be the child of z with
the larger height, and let x be the child of y with the larger
height.
We can perform operation restructure(x) to restore balance
at the subtree rooted at z.
As this restructuring may upset the balance of another node
higher in the tree, we must continue checking for balance
until the root of T is reached
Removal (cont’d)
example of deletion from an AVL tree:
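[Figure: removing 32 leaves z = 44 unbalanced; y = 62 is the taller child of z and x = 78 is chosen as the child of y. After restructure(x), y = 62 is the root, with children z = 44 and x = 78; subtrees T0, T1, T2, T3 are labeled]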
Removal (cont’d)
example of deletion from an AVL tree:
[Figure: the same removal of 32 with x = 50 chosen as the child of y = 62 instead. After restructure(x), x = 50 is the root, with children z = 44 and y = 62; subtrees T0, T1, T2, T3 are labeled]
Multi-way Search Trees
Each internal node of a multi-way search tree T:
has at least two children
stores a collection of items of the form (k, x), where k is a key
and x is an element
contains d - 1 items, where d is the number of children (such a node is a d-node)
“contains” 2 pseudo-items: $k_0 = -\infty$, $k_d = +\infty$
Children of each internal node are “between” items
all keys in the subtree rooted at the child fall between
keys of those items
Multi-way Searching
Similar to binary searching
If search key $s < k_1$, search the leftmost child
If $s > k_{d-1}$, search the rightmost child
That’s it in a binary tree; what about if d > 2?
Find two keys $k_{i-1}$ and $k_i$ between which s falls, and search the child $v_i$.
What would an in-order traversal look like?
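[Figure: multi-way search tree with root 22 and nodes including (5 10), (6 8), (14), (11 13), (17 18 19 20 21), (25), (23 24), (27); searching for s = 8 succeeds, searching for s = 12 ends at an external node: not found!]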
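The search rule above can be sketched as follows. The class name `MNode`, the function name `search`, and the small example tree are assumptions made for illustration (the tree is only loosely modeled on the figure); an empty `children` list stands for a node whose children are all external.

```python
from bisect import bisect_left


class MNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys k1 .. k(d-1)
        self.children = children or []   # d children, or [] if all are external


def search(node, s):
    """Return True if key s is stored in the multi-way search tree."""
    while node is not None:
        i = bisect_left(node.keys, s)    # first index i with keys[i] >= s
        if i < len(node.keys) and node.keys[i] == s:
            return True
        if not node.children:            # fell off the tree: not found
            return False
        node = node.children[i]          # child between keys[i-1] and keys[i]
    return False


# Hypothetical example tree.
tree = MNode([22], [
    MNode([5, 10], [MNode([3, 4]), MNode([6, 8]), MNode([14])]),
    MNode([25], [MNode([23, 24]), MNode([27])]),
])
found_8 = search(tree, 8)     # True
found_12 = search(tree, 12)   # False
```

With d = 2 every node holds one key and the loop reduces to ordinary binary search tree lookup.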
2-4 Trees
a. Nodes may contain 1, 2 or 3 items.
b. A node with k items has k + 1 children
c. All leaves are on same level.
Example
[Figure: 2-4 tree with root (10 45) and leaves (3 8), (25 38), (70 90 100)]
Insertion
Insertion:
Find the appropriate leaf. If the leaf has room (one or
two items), just add the item to the leaf.
If there is no room (the leaf would hold four items), move
a middle item to the parent and split the remaining
items between two nodes.
Insertion
insert 80
[Figure: root (10 45) with leaves (3 8), (25 38), (70 80 90 100); the last leaf now holds four items: Overflow!]
Insertion
Split & move middle element to parent
[Figure: root (10 45 80) with leaves (3 8), (25 38), (70), (90 100)]
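The overflow handling in this example can be sketched on a single leaf represented as a sorted list of keys. This is a simplified illustration: the function name `insert_into_leaf` is an assumption, and promoting the second of the four items follows the slide example (80 moves up); other presentations promote the third item instead.

```python
from bisect import insort

MAX_ITEMS = 3  # a 2-4 tree node holds 1, 2 or 3 items


def insert_into_leaf(leaf, key):
    """Insert key into a leaf given as a sorted list of keys.

    Returns (leaf, None, None) when the leaf has room, otherwise
    (left, promoted, right): the two halves of the split leaf and the
    key that moves up to the parent.
    """
    items = list(leaf)
    insort(items, key)
    if len(items) <= MAX_ITEMS:
        return items, None, None
    # Overflow: four items. Promote the second one, as in the example.
    return items[:1], items[1], items[2:]


left, promoted, right = insert_into_leaf([70, 90, 100], 80)
# left == [70], promoted == 80, right == [90, 100]
```

If the parent itself overflows after receiving the promoted key, the same split is applied one level up.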
Removal
First : find the key with a simple multi-way
search
If the item to delete has non-external
children, reduce to the case where item is at
the bottom of the tree:
Find item which precedes it in in-order traversal
• which one?
Swap them
Remove the item
Alternative?
Removal
Not enough items in the node
Underflow!
Pull an item from the parent, replace it with
an item from a sibling - transfer
Still not good enough! What happens if
siblings are 2-nodes?
Could we just pull one item from the parent?
Removal
Remove 3
move 10 down into the subtree
move 25 up into the parent
[Figure: 2-4 tree with root (10 45 80) and leaves (3), (25 38), (70), (90 100) before the removal]
Removal
If siblings are 2-nodes (i.e. contain only one
key)
cannot ‘steal’ from them
Do node merging
Remove 3
move 10 down into the subtree
merge 10 with 25
[Figure: 2-4 tree with root (10 45 80) and leaves (3), (25), (70), (90) before the removal]
2-4 Trees
More on removal:
What if parent is a 2-node?
Propagate underflow up the tree
2-4 trees are easy to maintain
Insertion and deletion take O(log n)
Balanced trees
B-Trees
Up to now, all data that has been stored in the tree
has been in memory.
If data gets too big for main memory, what do we do?
If we keep a pointer to the tree in main memory, we
could bring in just the nodes that we need.
For instance, to do an insert with a BST, if we need
the left child, we do a disk access and retrieve the left
child.
If the left child is NIL, then we can do the insert, and
store the child node on the disk.
Not too good for a BST
B-Trees
The problem with BST: storing the data requires disk
accesses, which are expensive compared to the execution
of machine instructions.
If we can reduce the number of disk accesses, then
the procedures run faster.
The only way to reduce the number of disk accesses is
to increase the number of keys in a node.
The BST allows only one key per node.
B-trees are very good and often used for search engines!
(when the collection size gets very big, the index does not fit in
memory)
B-Trees
If we increase the number of keys in the nodes, how
will we do any tree operations effectively?
10 20 30 40 50 60 70
• Above is a node with 7 keys. How do we add
children?
[Figure: the node 10 20 30 40 50 60 70 with candidate child keys 1 2 3 4 5 6 7 8 9 and 11 22 33 44 55 66 77 below it]
B-Trees
Clearly, the tree below is useless.
[Figure: the node 10 20 30 40 50 60 70 with child keys 1 2 3 4 5 6 7 8 9 and 11 22 33 44 55 66 77 attached without regard to key order]
• How many pointers do we need?
• Using the idea of BST, we need to be able to put nodes into
the tree that have smaller, same and larger values than the
node we are currently examining.
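B-Trees: A General Case of Multi-Way Search Trees
[Figure: a node with keys 10 20 30 40 50 60 70 whose child pointers lead to nodes holding the keys 1 to 9, 11, 22, 33, 44, 55, 66, 77, each placed between the parent keys that bound it]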
• We can easily find any value.
• We need to create operations, which require rules on what
makes a tree a B-Tree.
• Clearly, having one key per node would be very bad.
• We need a mechanism to increase the height of the tree (since
the number of keys in any node can get very high) so we can
shift keys out of a node, making the nodes smaller.
B-Trees: Fields in a Node
A B-Tree is a rooted tree (whose root is root[T])
having the following properties:
1. Every internal node x has the following fields:
n[x]  key1[x] key2[x] key3[x] key4[x] key5[x] key6[x] key7[x] key8[x]  leaf[x]
c1[x] c2[x] c3[x] c4[x] c5[x] c6[x] c7[x] c8[x] c9[x]
n[x] is the number of keys in the node. n[x] = 8 above.
leaf[x] = false for internal nodes, since x is not a leaf.
The $key_i[x]$ are the values of the keys, where $key_i[x] \le key_{i+1}[x]$.
$c_i[x]$ are pointers to child nodes. All the keys in $c_i[x]$ have values
that are between $key_{i-1}[x]$ and $key_i[x]$.
B-Trees
Leaf nodes have no child pointers
leaf[x] = true for leaf nodes.
All leaf nodes are at the same level
n[x]  key1[x] key2[x] key3[x] key4[x] key5[x] key6[x] key7[x] key8[x]  leaf[x]
B-Trees
There are lower and upper bounds on the number of
keys a node can contain. These depend on the
“minimum degree” t ≥ 2, which we must specify
for any given B-Tree.
a. Every node other than the root must have at least t-1
keys. Every internal node other than the root thus
has at least t children. If the tree is nonempty, the root must
have at least one key.
b. Every node can contain at most 2t-1 keys. Therefore,
an internal node can have at most 2t children. A
node is full if it contains exactly 2t-1 keys.
Height of B-Tree
If n ≥ 1, then for any n-key B-tree of height h and
minimum degree t ≥ 2,
$h \le \log_t \frac{n+1}{2}$
The important thing to notice is that the height of the tree
is logarithmic in base t. So, as t increases, the height, for any number
of keys n, will decrease.
Using the formula $\log_a x = (\log_b x)/(\log_b a)$, we can see that
$\log_2 10^6 = (\log_{10} 10^6)/(\log_{10} 2) \approx 6/0.30103 \approx 19.93$
$\log_{10} 10^6 = 6$
So, about 14 fewer disk accesses to get to the leaves!
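The effect of the minimum degree on the height can be checked numerically. This is a small sketch; the function name `max_height` is an assumption, and it simply evaluates the bound log_t((n+1)/2) stated above.

```python
import math


def max_height(n, t):
    """Upper bound log_t((n + 1) / 2) on the height of an n-key B-tree."""
    return math.log((n + 1) / 2, t)


n = 10**6
log2_n = math.log2(n)     # about 19.93
log10_n = math.log10(n)   # 6
bound_t2 = max_height(n, 2)     # height bound for minimum degree 2
bound_t10 = max_height(n, 10)   # height bound for minimum degree 10
```

For one million keys the bound drops from just under 19 levels at t = 2 to under 6 levels at t = 10, which is the saving in disk accesses the slide describes.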
Basic Operations
The root of the B-tree is always in main
memory, so that a Disk-Read on the root is
never required; a Disk-Write of the root is
required, however, whenever the root node is
changed.
Any nodes that are passed as parameters
must already have had a Disk-Read operation
performed on them.
Searching a B-Tree
Start at the leftmost key
in the node, and go to the
right until you go too far.
If it is a leaf node, then you
are done, as there is no child
to inspect
Otherwise, retrieve the child node
from the disk, and put it into
memory
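The search procedure above can be sketched as follows. The names `BTreeNode` and `btree_search` and the example tree are assumptions for illustration; everything is kept in memory, so the point where a real implementation would perform a Disk-Read is only marked by a comment.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys of this node
        self.children = children or []   # empty list for a leaf

    @property
    def leaf(self):
        return not self.children


def btree_search(x, k):
    """Return (node, index) locating key k, or None if k is absent."""
    while True:
        i = 0
        # Start at the leftmost key and go right until we go too far.
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if i < len(x.keys) and k == x.keys[i]:
            return x, i
        if x.leaf:
            return None                  # no child to inspect
        x = x.children[i]                # a disk-based tree would Disk-Read here


# Hypothetical two-level example.
root = BTreeNode([20, 40], [
    BTreeNode([5, 10]),
    BTreeNode([25, 30]),
    BTreeNode([50, 60]),
])
hit = btree_search(root, 30)    # (second child, 1)
miss = btree_search(root, 45)   # None
```

The linear scan inside a node could be replaced by a binary search; the number of disk reads, one per level, stays the same.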
Inserting into B-trees
Really very easy. Very similar to insertion in (2,4) trees.
Just keep in mind that you are starting at the
root, and then finding the subtree where the
key should be inserted, and following the
pointer.
A deletion may eventually occur, and
sometimes deletions force keys into their
parents. So, if we encounter a full node on our
way to the node where the insertion will take
place, we must split that node into two.
Inserting into B-trees (cont’d)
B-Tree-Insert
If the node has 2t-1 keys, it can’t accept any more keys, so you need to split it into 2 nodes before doing the insert.
Otherwise, call Nonfull()
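The split-on-the-way-down insertion described above can be sketched as follows. This is a minimal in-memory sketch in the style of the standard textbook procedure, not the course's code: the names `BTree`, `_split_child` and `_insert_nonfull` are assumptions, and disk reads and writes are omitted.

```python
class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        self.t = t                        # minimum degree, t >= 2
        self.root = BTreeNode()

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]        # upper t-1 keys
        full.keys = full.keys[:t - 1]     # lower t-1 keys
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)     # median moves up to the parent
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, x, k):
        """Insert k below x, assuming x itself is not full."""
        while not x.leaf:
            i = len(x.keys)
            while i > 0 and k < x.keys[i - 1]:
                i -= 1
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)   # split full nodes before descending
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        i = len(x.keys)
        while i > 0 and k < x.keys[i - 1]:
            i -= 1
        x.keys.insert(i, k)

    def insert(self, k):
        if len(self.root.keys) == 2 * self.t - 1:
            old_root = self.root          # a full root is split first,
            self.root = BTreeNode(leaf=False)   # which grows the tree by one level
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        self._insert_nonfull(self.root, k)


tree = BTree(t=2)
for key in [10, 20, 30, 40, 50, 60, 70, 5, 15, 25]:
    tree.insert(key)
```

Because every full node met on the way down is split before descending, the parent always has room for the median key and no split ever has to propagate back up.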
Deleting Keys from Nodes