m/2 - Rogelio Davila

+
Data Structures
B-tree
Jibrael Jos : Sep 2009
+
Agenda
Introduction
Multiway Trees
B Tree
Application
Structure
Algo : Insert / Delete
Avoid Taking Printout : Use RTF Outline in case needed
+
B Tree
Critic
Summation
Maths
Series
Variations
B*, B+
Application
Industry
+
Binary Search Tree

What happens if data is loaded into a binary search tree in this order?

23, 32, 45, 11, 43, 41

1, 2, 3, 4, 5, 6, 7, 8

What is an AVL tree?
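The two load orders on this slide can be compared with a minimal sketch. It assumes a plain, unbalanced binary search tree; `bst_height` is a name chosen for this illustration and does not come from the slides.

```python
def bst_height(keys):
    """Insert keys in the given order into an unbalanced BST; return its height in levels."""
    root = None                      # each node is a list: [key, left, right]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2   # 1 = left child, 2 = right child
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height


print(bst_height([23, 32, 45, 11, 43, 41]))   # 5
print(bst_height([1, 2, 3, 4, 5, 6, 7, 8]))   # 8
```

The sorted input produces one level per key, so the tree degenerates into a linked list. This is the problem that balanced trees are meant to avoid.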
+
Multiway Trees
[Figure: a node holding keys K1 and K2 with three subtrees: keys < K1; keys >= K1 and < K2; keys >= K2]
m-way trees
+

Reduce the depth of the tree to O(log_m n) with m-way trees
[Figure: m-way tree in which every node holds the keys K1, K2, K3]

m children, m-1 keys per node

m = 10 : 10^6 keys in 6 levels vs 20 for a binary tree

but ........
+
m-way trees
But you have to search through the m keys in each node!
Reduces your gain from having fewer levels!
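The trade-off can be checked with a little arithmetic. This sketch follows the slides in approximating the capacity of h levels as m^h keys, and it assumes a plain linear scan inside each node. `levels_needed` is a hypothetical helper name.

```python
def levels_needed(n_keys, m):
    """Smallest number of levels h such that m**h >= n_keys."""
    levels, capacity = 0, 1
    while capacity < n_keys:
        capacity *= m
        levels += 1
    return levels


n = 10**6
for m in (2, 10):
    h = levels_needed(n, m)
    # Worst case with a linear scan: up to m - 1 key comparisons in every node visited.
    print(m, h, h * (m - 1))
# prints "2 20 20" and then "10 6 54"
```

Under these assumptions the 10-way tree visits fewer nodes (6 instead of 20) but may perform more key comparisons in total (54 instead of 20).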
m-way trees
+
[Figure: example m-way tree with keys 50, 35, 45, 60, 85, 70, 100, 95, 90, 75, 150, 125, 135, 110, 175, 120]
+
B-trees
All leaves are on the same level
All nodes except for the root and the leaves have
    at least m/2 children
    at most m children
Each node is at least half full of keys
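These properties can be expressed as a small checker. It is a sketch under stated assumptions: the `Node` class with `keys` and `children` lists is invented for the illustration, m/2 is rounded up, and the ordering of the keys is not verified.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []      # an empty list marks a leaf


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur below node."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_btree(root, m):
    """Check the shape rules of a B-tree of order m (key ordering is not checked)."""
    min_children = -(-m // 2)               # ceil(m / 2)

    def ok(node, is_root):
        if node.children:
            n = len(node.children)
            if n != len(node.keys) + 1 or n > m:
                return False
            if not is_root and n < min_children:
                return False
        else:
            if len(node.keys) > m - 1:
                return False
            if not is_root and len(node.keys) < min_children - 1:
                return False
        return all(ok(child, False) for child in node.children)

    return len(leaf_depths(root)) == 1 and ok(root, True)


good = Node([21, 102], [Node([11, 14]), Node([74, 78, 85, 97]), Node([125, 135])])
bad = Node([21], [Node([11, 14]), Node([50], [Node([30, 40]), Node([60, 70])])])
print(is_btree(good, 5), is_btree(bad, 5))   # True False
```

The second tree fails because its leaves sit on two different levels.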
Anand B
BTREE
+
[Figure: B-tree with keys 21, 11, 14, 74, 102, 78, 85, 97, 125, 135]
+
Multiway Tree

M-ary tree

3 levels: Cylinder, Track, Record (Index Seq, RDBMS)

Tables with less change
+
BTree

If the level is 3 and m = 199, then what is N?

How many splits per insertion?
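One way to approach the first question is to bound N from both sides. This sketch assumes that N counts keys, that "level 3" means three levels of nodes including the root, and that every non-root node has at least ceil(m/2) children. The function names are illustrative only.

```python
def max_keys(m, levels):
    """Every node full: m**levels - 1 keys in total."""
    return m**levels - 1


def min_keys(m, levels):
    """Root with one key and two children, every other node at the minimum."""
    if levels == 1:
        return 1
    half = -(-m // 2)                # ceil(m / 2), the minimum number of children
    return 2 * half**(levels - 1) - 1


print(min_keys(199, 3), max_keys(199, 3))   # 19999 7880598
```

Under these assumptions a three-level tree of order 199 holds between 19,999 and 7,880,598 keys.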
+
Multiway Trees : Application

NDPL, Delhi: Electricity Billing

3 lakh consumers

Table indexed as BTREE

UCO Bank, Jaipur

One DD takes 10 minutes to print

Saviour : BTREE
+
B-trees - Insertion

Insertion

B-tree property : block is at least half-full of keys

Insertion into a block that would then hold m keys (the maximum is m-1)

block overflows

split block

promote one key

split parent if necessary

if root is split, tree becomes one level deeper
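The insertion steps listed here can be sketched as follows. This is one possible implementation, not the one used in the course: the `Node` class, the `order` parameter (maximum number of children) and the choice of the middle key as the promoted key are assumptions made for the illustration.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # an empty list marks a leaf


def _insert(node, key, order):
    """Insert key below node; return (median, new_right_node) if node was split."""
    i = bisect.bisect_right(node.keys, key)
    if not node.children:                   # leaf: place the key directly
        node.keys.insert(i, key)
    else:
        promoted = _insert(node.children[i], key, order)
        if promoted is None:
            return None
        median, right = promoted            # child was split: absorb its median
        node.keys.insert(i, median)
        node.children.insert(i + 1, right)
    if len(node.keys) < order:              # at most order - 1 keys: no overflow
        return None
    mid = len(node.keys) // 2               # block overflows: split it
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right                    # promote one key to the parent


def insert(root, key, order):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, order)
    if promoted is None:
        return root
    median, right = promoted                # root was split: tree grows one level
    return Node([median], [root, right])


root = Node([21, 102], [Node([11, 14]), Node([74, 78, 85, 97]), Node([125, 135])])
root = insert(root, 63, 5)
print(root.keys, [child.keys for child in root.children])
# [21, 78, 102] [[11, 14], [63, 74], [85, 97], [125, 135]]
```

The example tree uses the key values that appear in the insertion slides, arranged as a root (21, 102) over three leaves. That arrangement is a reading of the figures, not something the slides state.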
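Insert Node
+
[Figure: inserting 63 into the B-tree with keys 21, 11, 14, 74, 102, 78, 85, 97, 125, 135]
After Insert 63
+
[Figure: the tree after inserting 63, with keys 21, 11, 14, 63, 78, 74, 102, 85, 97, 125, 135]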
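Insert Node
+
[Figure: inserting 99 into the B-tree with keys 21, 11, 14, 74, 102, 78, 85, 97, 125, 135]
After Insert 99
+
[Figure: the tree after inserting 99, with keys 21, 11, 14, 74, 85, 78, 102, 97, 99, 125, 135]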
Split Node
+
[Figure: splitting the full node (74, 78, 85, 97) when entry 63 is inserted]
+
Structure of Btree
node
    firstPtr
    numEntries
    Entries[1.. M-1]
End
Entry
    key
    rightPtr
End Entry
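The node layout on this slide might be written in Python roughly as follows. The field names mirror firstPtr, numEntries, key and rightPtr. The `search` function is an added illustration and assumes that firstPtr leads to keys smaller than the first entry and that each rightPtr leads to keys greater than or equal to its entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    key: int
    right_ptr: Optional["Node"] = None      # subtree with keys >= key


@dataclass
class Node:
    first_ptr: Optional["Node"] = None      # subtree with keys < entries[0].key
    entries: List[Entry] = field(default_factory=list)

    @property
    def num_entries(self):
        return len(self.entries)


def search(node, target):
    """Return True if target occurs in the tree rooted at node."""
    while node is not None:
        subtree = node.first_ptr
        for entry in node.entries:
            if target == entry.key:
                return True
            if target < entry.key:
                break
            subtree = entry.right_ptr       # target belongs to the right of this entry
        node = subtree
    return False


leaf_a = Node(entries=[Entry(11), Entry(14)])
leaf_b = Node(entries=[Entry(74), Entry(78), Entry(85), Entry(97)])
leaf_c = Node(entries=[Entry(125), Entry(135)])
root = Node(first_ptr=leaf_a, entries=[Entry(21, leaf_b), Entry(102, leaf_c)])
print(search(root, 85), search(root, 63))   # True False
```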
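Split Node : Final
+
[Figure: final state of the split when entry 63 is inserted into node (74, 78, 85, 97), showing the variables median, fromNdx and toNdx and the new rightPtr node]
Split Node : Final
+
[Figure: final state of the split when entry 99 is inserted into node (74, 78, 85, 97), showing the variables median, fromNdx and toNdx and the new rightPtr node]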
Traversal
+
[Figure: B-tree with keys 21, 11, 14, 42, 45, 78, 63, 74, 85, 95]
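An in-order traversal of a B-tree visits the subtree before each key, then the key, and finally the last subtree. The sketch below is an illustration only: the `Node` class is invented, and the shape of the example tree (root (21, 78) over three leaves) is an assumption, since only the key values can be read from the figure.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []      # an empty list marks a leaf


def traverse(node):
    """Yield every key of the tree in ascending order."""
    for i, key in enumerate(node.keys):
        if node.children:
            yield from traverse(node.children[i])    # subtree to the left of key
        yield key
    if node.children:
        yield from traverse(node.children[-1])       # rightmost subtree


root = Node([21, 78], [Node([11, 14]), Node([42, 45, 63, 74]), Node([85, 95])])
print(list(traverse(root)))
# [11, 14, 21, 42, 45, 63, 74, 78, 85, 95]
```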
+
Agenda
Delete
Delete Walk Through
Reflow
Borrow Left
Borrow Right
Combine
Delete Mid
Delete : For 78
+
Btree Delete
Delete()
Delete()
Delete Mid()
Reflow()
Reflow()
If shorter delete root
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
Btree Delete
+
If (root null)
    print (“Attempt to delete from null tree”)
Else
    shorter = delete (root, target)
    if shorter
        delete root
Return root
Call stack: Btree Delete
Target = 78
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
Delete(root, deleteKey)
+
If (root null)
    data does not exist
Else
    entryNdx = searchNode(root, deleteKey)
    if found entry to be deleted
        if leaf node
            underflow = deleteEntry()
        else
            underflow = deleteMid (left)
            if underflow
                underflow = reflow()
Call stack: Btree Delete > Delete
Target = 78
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
Delete Else Part
+
Else
    if deleteKey less than first entry
        subtree = firstPtr
    else
        subtree = rightPtr
    underflow = delete (subtree, deleteKey)
    if underflow
        underflow = reflow()
Return underflow
Call stack: Btree Delete > Delete
Target = 78
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
Delete(root, deleteKey)
+
If (root null)
    data does not exist
Else
    entryNdx = searchNode(root, deleteKey)
    if found entry to be deleted
        if leaf node
            underflow = deleteEntry()
        else
            underflow = deleteMid (root, entryNdx, left)
            if underflow
                underflow = reflow(root, entryNdx)
Call stack: Btree Delete > Delete > Delete > Delete Mid
Target = 78
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
Delete(root, deleteKey)
+
If (root null)
    data does not exist
Else
    entryNdx = searchNode(root, deleteKey)
    if found entry to be deleted
        if leaf node
            underflow = deleteEntry()
        else
            underflow = deleteMid (root, entryNdx, left)
            if underflow
                underflow = reflow(root, entryNdx)
Call stack: Btree Delete > Delete > Delete
74 replaces 78
[Figure: root (42), second-level nodes (16, 21) and (57, 74), and leaves (45, 52), (63), (85, 97) under (57, 74)]
Delete(root, deleteKey)
+
If (root null)
    data does not exist
Else
    entryNdx = searchNode(root, deleteKey)
    if found entry to be deleted
        if leaf node
            underflow = deleteEntry()
        else
            underflow = deleteMid (root, entryNdx, left)
            if underflow
                underflow = reflow(root, entryNdx)
Call stack: Btree Delete > Delete > Delete
After Reflow
[Figure: root (42), second-level nodes (16, 21) and (57), and leaves (45, 52), (63, 74, 85, 97) under (57)]
Delete Else Part
+
Else
    if deleteKey less than first entry
        subtree = firstPtr
    else
        subtree = rightPtr
    underflow = delete (subtree, deleteKey)
    if underflow
        underflow = reflow(root, entryNdx)
Return underflow
Call stack: Btree Delete > Delete
Before Reflow
[Figure: root (42), second-level nodes (16, 21) and (57), and leaves (45, 52), (63, 74, 85, 97) under (57)]
Delete Else Part
+
Else
    if deleteKey less than first entry
        subtree = firstPtr
    else
        subtree = rightPtr
    underflow = delete (subtree, deleteKey)
    if underflow
        underflow = reflow(root, entryNdx)
Return underflow
Call stack: Btree Delete > Delete
After Reflow
[Figure: root with 0 entries above the node (16, 21, 42, 57), with leaves (45, 52) and (63, 74, 85, 97)]
BTREE Delete
+
If (root null)
    print (“Attempt to delete from null tree”)
Else
    shorter = delete (root, target)
    if shorter
        delete root
Return root
Call stack: Btree Delete
[Figure: root with 0 entries above the node (16, 21, 42, 57), with leaves (45, 52) and (63, 74, 85, 97)]
BTREE Delete
+
If (root null)
    print (“Attempt to delete from null tree”)
Else
    shorter = delete (root, target)
    if shorter
        delete root
Return root
Call stack: Btree Delete
[Figure: the empty root has been deleted; the new root is (16, 21, 42, 57), with leaves (45, 52) and (63, 74, 85, 97)]
Delete : For 78
+
Btree Delete
Delete()
Delete()
Delete Mid()
Reflow()
Reflow()
If shorter delete root
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
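The whole walk-through for deleting 78 can be reproduced with a compact sketch. It is not the course's code: the `Node` class, the function names and the `order` parameter are assumptions. The minimum number of keys is taken as ceil(order/2) - 1, and the three leaves under (16, 21) are invented because the slides do not show them.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []      # an empty list marks a leaf


def min_keys(order):
    return (order + 1) // 2 - 1             # ceil(order / 2) - 1


def reflow(parent, i, order):
    """Repair the underflowing child parent.children[i]; report parent underflow."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if right is not None and len(right.keys) > min_keys(order):    # borrow right
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    elif left is not None and len(left.keys) > min_keys(order):    # borrow left
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    else:                                                          # combine
        if right is None:                   # no right sibling: merge into the left one
            i, child, right = i - 1, left, child
        child.keys += [parent.keys.pop(i)] + right.keys
        child.children += right.children
        parent.children.pop(i + 1)
    return len(parent.keys) < min_keys(order)


def delete_mid(node, order):
    """Remove the largest key below node; return (key, underflow flag)."""
    if not node.children:
        key = node.keys.pop()
        return key, len(node.keys) < min_keys(order)
    key, underflow = delete_mid(node.children[-1], order)
    if underflow:
        underflow = reflow(node, len(node.children) - 1, order)
    return key, underflow


def delete(node, key, order):
    """Delete key from the subtree at node; return True if node underflows."""
    i = bisect.bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                   # leaf node
        if found:
            node.keys.pop(i)
        return len(node.keys) < min_keys(order)
    if found:                               # internal node: predecessor replaces the key
        node.keys[i], underflow = delete_mid(node.children[i], order)
    else:
        underflow = delete(node.children[i], key, order)
    if underflow:
        underflow = reflow(node, i, order)
    return underflow


def btree_delete(root, key, order):
    if root is None:
        print("Attempt to delete from null tree")
        return None
    delete(root, key, order)
    if not root.keys and root.children:     # root emptied by a combine: tree is shorter
        root = root.children[0]
    return root


left = Node([16, 21], [Node([5, 9]), Node([17, 19]), Node([25, 30])])   # leaves invented
right = Node([57, 78], [Node([45, 52]), Node([63, 74]), Node([85, 97])])
root = btree_delete(Node([42], [left, right]), 78, 5)
print(root.keys, root.children[-1].keys)    # [16, 21, 42, 57] [63, 74, 85, 97]
```

With order 5 the sketch follows the same path as the slides: 74 replaces 78, the leaf (63) is combined with (85, 97), the node (57) is combined with (16, 21) through 42, and the empty root is removed.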
+
Delete : Reflow
1: Try to borrow right.
2: If 1 failed, try to borrow from left.
3: Cannot borrow (1, 2 failed): combine.
+
Delete Reflow
Underflow = false
If RT->no > min Entries
    BorrowRight (root, entryNdx, LT, RT)
Else
    If LT->no > min Entries
        BorrowLeft (root, entryNdx, LT, RT)
    Else
        combine (root, entryNdx, LT, RT)
        if root->no < min entries
            underflow = True
Return underflow
Borrow Left
+
[Figure: borrow left, with left sibling (45, 63, 74), parent key 78 and underflowing node (85); the subtrees are labelled "Node >= 74 < 78" and "Node >= 78 < 85"]
Combine
+
[Figure: combine, start: parent (21, 57, 78); left node (42, 45); underflowing node (63) with subtrees (59, 61) and (65, 71)]
Combine
+
[Figure: combine, step 2: the parent key 57 is copied into the left node, which becomes (42, 45, 57)]
Combine
+
[Figure: combine, step 3: 63 is moved into the left node, which becomes (42, 45, 57, 63)]
Combine
+
[Figure: combine, final: parent (21, 78); combined node (42, 45, 57, 63) with subtrees (59, 61) and (65, 71)]
+
Delete Mid
If leaf
    exchange data and delete leaf entry
Else
    traverse right to locate predecessor
    deleteMid (right)
    if underflow
        reflow
Delete Mid
+
Case 1: To delete 78 we replace it with 74
[Figure: B-tree with root (42), second-level nodes (16, 21) and (57, 78), and leaves (45, 52), (63, 74), (85, 97) under (57, 78)]
Delete Mid
+
Case 2: To delete 78 we replace it with 76
Hence a recursive call of Delete Mid to locate the predecessor
[Figure: the same tree with a further node (75, 76) in the rightmost subtree of (63, 74)]
+
Order
Order   3   4   5   6   …   m
Min     2   2   3   3   …   ceil(m/2)
Max     3   4   5   6   …   m
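The table can be reproduced with two one-line functions. This assumes that the Min and Max rows count subtrees, that the minimum is m/2 rounded up, and that a node always holds one key fewer than it has subtrees. The function names are made up for the illustration.

```python
def subtree_limits(order):
    """Return (min, max) number of subtrees for a non-root internal node."""
    return -(-order // 2), order            # ceil(order / 2), order


def key_limits(order):
    """Return (min, max) number of keys: always one fewer than the subtrees."""
    low, high = subtree_limits(order)
    return low - 1, high - 1


for order in (3, 4, 5, 6):
    print(order, subtree_limits(order), key_limits(order))
# 3 (2, 3) (1, 2)
# 4 (2, 4) (1, 3)
# 5 (3, 5) (2, 4)
# 6 (3, 6) (2, 5)
```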
Get the Order Right
+
Keys are 4
Subtrees: max is 5, so order is 5
Minimum = 3 (which is subtrees)
Min keys is 2
[Figure: node (16, 21, 42, 57) with leaves (45, 52) and (63, 74, 85, 97)]
+
2-3 Tree
Order 3 ….. So how many keys in a node?
This rule is valid for non root leaf
Root can have 0, 2, 3 subtrees
2-3 Tree
+
[Figure: example 2-3 tree]
+
2-3-4 Tree
Order 4 ….. So how many keys in a node?
This rule is valid for non root leaf
Root can have 0, 2, 3 or 4 subtrees
+
Structure of B+ tree
Non leaf node
    firstPtr
    numEntries
    Entries[1.. M-1]
End
Leaf node
    firstPtr
    numEntries
    Entries[1.. M-1]
    Next Leaf Node
End
Entry
    key
    rightPtr
End Entry
B+ Tree
+
[Figure: example B+ tree; one node is annotated "Implies there are more nodes"]
+
B* Tree

Space Usage

BTREE nodes can be 50% empty (1/2)

So the rule is modified to two thirds (2/3)
Also, when a node overflows, instead of being split immediately its data is redistributed with siblings

And even when a split happens, the data is distributed equally among all siblings (pg 462)
+
B+-trees
B+ trees
All the keys in the internal nodes are dummies
Only the keys in the leaves point to “real” data
Linking the leaves
    Ability to scan the collection in order without passing through the higher nodes
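The benefit of linking the leaves can be shown with a short sketch. The `Leaf` class and its `next_leaf` field are hypothetical names modelled on the "Next Leaf Node" field of the B+ tree structure. The internal nodes are left out because the scan never touches them.

```python
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next_leaf = next_leaf          # link to the next leaf in key order


def scan(first_leaf):
    """Yield every key in order by following the leaf links only."""
    leaf = first_leaf
    while leaf is not None:
        yield from leaf.keys
        leaf = leaf.next_leaf


third = Leaf([85, 97])
second = Leaf([63, 74], third)
first = Leaf([45, 52], second)
print(list(scan(first)))                    # [45, 52, 63, 74, 85, 97]
```

A range query would descend once from the root to the leaf holding the lower bound and then continue along the links in the same way.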