Insertion in a B-Tree

CPS216: Advanced Database Systems
Notes 05: Operators for Data Access (contd.)
Shivnath Babu
Insertion in a B-Tree
n=2
49
15
36
49
Insert: 62
Insertion in a B-Tree
n=2
49
15
36
49
62
Insert: 62
Insertion in a B-Tree
n=2
49
15
36
49
62
Insert: 50
Insertion in a B-Tree
49
15
36
n=2
62
49
50
62
Insert: 50
Insertion in a B-Tree
49
15
36
n=2
62
49
50
62
Insert: 75
Insertion in a B-Tree
49
15
36
n=2
62
49
50
62
75
Insert: 75
Insertion (figure-only slides; diagrams not recoverable)
Insertion: Primitives
Inserting into a leaf node
Splitting a leaf node
Splitting an internal node
Splitting root node
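The first two primitives can be sketched as follows. This is a minimal illustration, not the course's reference code: the function name `insert_into_leaf` is hypothetical, a leaf is modeled as a plain sorted list, and the split point (the first ceil((n+1)/2) keys stay in the old leaf) is an assumed convention that happens to agree with the n=2 example slides.

```python
from math import ceil

def insert_into_leaf(keys, key, n):
    """Insert `key` into a leaf that may hold at most n keys.

    Returns (left, right, separator). If the leaf does not overflow,
    `right` and `separator` are None. If it overflows, the n + 1 keys are
    split and `separator` is the first key of the new right leaf, which
    the caller must add to the parent (assumed convention).
    """
    keys = sorted(keys + [key])
    if len(keys) <= n:
        return keys, None, None          # plain insertion, no split needed
    mid = ceil(len(keys) / 2)            # first ceil((n + 1) / 2) keys stay
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

# Replaying the n = 2 example: leaf [49] receives 62, then 50.
print(insert_into_leaf([49], 62, 2))
print(insert_into_leaf([49, 62], 50, 2))
```

The second call overflows the leaf, so the caller would add the returned separator to the parent, possibly triggering an internal-node split in turn.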
Inserting into a Leaf Node
58
54
57
60
62
Inserting into a Leaf Node
58
54
57
58
60
62
Splitting a Leaf Node
61
54
54
57
58
60
66
62
Splitting a Leaf Node
61
54
54
57
58
66
60
61
62
Splitting a Leaf Node
59
61
54
54
57
58
66
60
61
62
Splitting a Leaf Node
61
54
54
57
58
59
66
60
61
62
Splitting an Internal Node
…
21
99
…
59
40
54
[54, 59)
66
74
[ 59, 66)
84
[66,74)
Splitting an Internal Node
66
…
40
54
[54, 59)
21
99
…
[21,66)
[66, 99)
59
74
[ 59, 66)
84
[66,74)
Splitting the Root
59
40
54
[54, 59)
66
74
[ 59, 66)
84
[66,74)
Splitting the Root
66
40
54
[54, 59)
59
74
[ 59, 66)
84
[66,74)
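Splitting an internal node and splitting the root differ from a leaf split in that the middle key moves up instead of being copied. The sketch below assumes a node is a pair (keys, children), that an overflowing node holds n + 1 keys and n + 2 child pointers, and that the first ceil(n/2) keys stay on the left. The function names and the key values in the usage lines are illustrative only.

```python
from math import ceil

def split_internal(keys, children, n):
    """Split an overflowing internal node (n + 1 keys, n + 2 children).

    Returns (left, up, right): `left` and `right` are (keys, children)
    pairs and `up` is the middle key, which moves to the parent.
    """
    assert len(keys) == n + 1 and len(children) == n + 2
    k = ceil(n / 2)                       # assumed: ceil(n/2) keys stay left
    left = (keys[:k], children[:k + 1])
    up = keys[k]                          # moved up, kept in neither half
    right = (keys[k + 1:], children[k + 1:])
    return left, up, right

def split_root(keys, children, n):
    """Splitting the root creates a new root with one key and two children."""
    left, up, right = split_internal(keys, children, n)
    return ([up], [left, right])

# Illustrative values for n = 4: five keys, six child pointers.
left, up, right = split_internal([54, 59, 66, 74, 84],
                                 ["c0", "c1", "c2", "c3", "c4", "c5"], 4)
print(left, up, right)
```

Only a root split increases the height of the tree; every other split leaves the height unchanged.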
Deletion (figure-only slides; step label: redistribute)
Deletion - II (figure-only slides; step labels: merge; merge "Not needed")
Deletion: Primitives
Delete key from a leaf
Redistribute keys between sibling leaves
Merge a leaf into its sibling
Redistribute keys between two sibling internal nodes
Merge an internal node into its sibling
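The three leaf-level primitives can be sketched together. The assumptions here are not from the slides: a leaf must keep at least floor((n+1)/2) keys, the sibling used is the right one, and the parent's separator keys are updated by the caller. `delete_from_leaf` and `min_leaf_keys` are hypothetical names.

```python
def min_leaf_keys(n):
    """Assumed minimum occupancy of a leaf that holds at most n keys."""
    return (n + 1) // 2

def delete_from_leaf(leaf, right_sibling, key, n):
    """Delete `key` from `leaf`, fixing underflow with the right sibling.

    Returns (leaf, right_sibling, action). After a merge the sibling is
    None and the caller must remove its pointer from the parent, which
    may in turn make the parent underflow.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_leaf_keys(n):
        return leaf, right_sibling, "none"
    if len(right_sibling) > min_leaf_keys(n):
        # Redistribute: borrow the sibling's smallest key.
        return leaf + [right_sibling[0]], right_sibling[1:], "redistribute"
    # Merge: the sibling has no key to spare, so combine the two leaves.
    return leaf + right_sibling, None, "merge"

print(delete_from_leaf([67, 68], [72, 75, 80], 67, 3))
print(delete_from_leaf([67, 68], [72, 75], 67, 3))
```

A merge always fits in one leaf under this occupancy rule, because an underflowing leaf plus a minimally full sibling hold at most n keys.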
Merge Leaf into Sibling
72
…
54
58
64
67
85
68
72
75
Merge Leaf into Sibling
72
…
54
58
64
67
85
68
75
Merge Leaf into Sibling
72
…
54
58
64
68
67
85
75
Merge Leaf into Sibling
72
…
54
58
64
68
85
75
Merge Internal Node into Sibling
…
41
48
59
…
63
52
[52, 59)
74
[59,63)
Merge Internal Node into Sibling
…
41
48
52
59
59
…
63
[52, 59)
[59,63)
B-Tree Roadmap
B-Tree
 Recap
 Insertion (recap)
 Deletion
 Construction
 Efficiency
B-Tree variants
Hash-based Indexes
Question
How does insertion-based construction perform?
B-Tree Construction
Sort
48
57
41
15
75
21
62
34
81
11
97
13
B-Tree Construction
11
13
15
21
34
41
48
57
11
13
15
21
34
41
48
57
Scan
62
62
75
81
75
81
97
97
B-Tree Construction
21
11
13
15
21
34
48
41
Scan
75
48
57
62
75
81
97
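The sort-then-scan construction shown in the figures can be sketched as a bottom-up bulk load. This is a simplified illustration under stated assumptions: leaves are packed completely full, each internal node takes up to n + 1 children, and minimum-occupancy rules for the last node on a level are ignored. `bulk_load` is a hypothetical name.

```python
def bulk_load(keys, n):
    """Build a B-tree level by level from unsorted keys.

    Returns a list of levels, leaves first. A leaf is a list of keys;
    an internal node is the list of its separator keys.
    """
    keys = sorted(keys)                                   # step 1: sort
    leaves = [keys[i:i + n] for i in range(0, len(keys), n)]
    levels = [leaves]
    mins = [leaf[0] for leaf in leaves]    # smallest key under each node
    while len(mins) > 1:                   # step 2: scan, one level at a time
        groups = [mins[i:i + n + 1] for i in range(0, len(mins), n + 1)]
        # Separators are the smallest keys of every child except the first.
        levels.append([g[1:] for g in groups])
        mins = [g[0] for g in groups]
    return levels

levels = bulk_load([48, 57, 41, 15, 75, 21, 62, 34, 81, 11, 97, 13], 3)
print(levels[0])
print(levels[1])
```

Each block is written once in a sequential pass, whereas inserting the keys one at a time descends the tree from the root for every key.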
B-Tree Construction
Why is sort-based construction better than the insertion-based one?
Cost of B-Tree Operations
Height of B-Tree: H
Assume no duplicates
Question: what is the random I/O cost of:
 Insertion:
 Deletion:
 Equality search:
 Range Search:
Height of B-Tree
Number of keys: N
B-Tree parameter: n
Height $\approx \log_n N = \frac{\log N}{\log n}$
In practice: 2-3 levels
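The height estimate can be checked numerically. The sketch below uses integer arithmetic rather than floating-point logarithms and treats the fanout as exactly n, which is a simplification. The key counts and the value n = 340 are illustrative assumptions.

```python
def estimated_height(num_keys, n):
    """Smallest h with n**h >= num_keys, i.e. roughly ceil(log_n N)."""
    assert n >= 2 and num_keys >= 1
    height, capacity = 1, n
    while capacity < num_keys:
        capacity *= n
        height += 1
    return height

print(estimated_height(10**6, 100))   # one million keys, n = 100
print(estimated_height(10**7, 340))   # ten million keys, n = 340
```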
Question: How do you pick parameter n?
1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates
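One common answer, offered here as an assumption rather than as the slide's intended solution, is to choose the largest n for which a node with n keys and n + 1 pointers still fits in one disk block, so that each level costs one I/O. The block, key, and pointer sizes below are hypothetical.

```python
def max_keys_per_node(block_size, key_size, pointer_size):
    """Largest n with n * key_size + (n + 1) * pointer_size <= block_size."""
    return (block_size - pointer_size) // (key_size + pointer_size)

# Hypothetical sizes: 4096-byte blocks, 4-byte keys, 8-byte pointers.
print(max_keys_per_node(4096, 4, 8))
```

A larger n lowers the height of the tree, but a node that spills past one block costs extra I/O per level, which is why the block size is the natural bound.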
Roadmap
B-Tree
B-Tree variants
 Sparse Index
 Duplicate Keys
Hash-based Indexes
Roadmap
B-Tree
B-Tree variants
Hash-based Indexes
 Static Hash Table
 Extensible Hash Table
 Linear Hash Table
Hash-Based Indexes
Adaptations of main memory hash tables
Support equality searches
No range searches
Indexing Problem (recap)
Index Keys
record pointers
a1
a2
A = val
ai
an
Main Memory Hash Table
h (key) = key % 8
bucket   keys
0        32, 48
1        (null)
2        10
3        27, 75
4        (null)
5        21
6        (null)
7        55
64
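The main-memory table in the figure can be reproduced with a few lines. This is a minimal sketch: the class name `ChainedHashTable` is hypothetical and each bucket's chain is modeled as a Python list rather than a linked list.

```python
class ChainedHashTable:
    """Main-memory hash table with one chain of keys per bucket."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        return key % len(self.buckets)     # h(key) = key % 8 by default

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        if key not in chain:               # assume no duplicates
            chain.append(key)

    def contains(self, key):
        return key in self.buckets[self.h(key)]

table = ChainedHashTable()
for key in [32, 48, 10, 27, 75, 21, 55]:
    table.insert(key)
print(table.buckets)
```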
Adapting to disk

1 Hash Bucket = 1 Block
All keys that hash to bucket stored in the block
 Intuition: keys in a bucket usually accessed together
 No need for linked lists of keys …

Adapting to Disk
How do we handle this?
Adapting to disk

1 Hash Bucket = 1 Block
All keys that hash to bucket stored in the block
 Intuition: keys in a bucket usually accessed together
 No need for linked lists of keys …
 … but need linked list of blocks (overflow blocks)
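The bucket-per-block layout with overflow blocks can be modeled in memory to see how lookups cost more than one I/O once a bucket overflows. The sketch assumes a block is a small list with a fixed capacity and counts one I/O per block visited. `StaticHashFile` and the capacity of 2 keys per block are hypothetical.

```python
class StaticHashFile:
    """Static hash table: each bucket is a primary block plus overflow blocks."""

    def __init__(self, num_buckets, block_capacity=2):
        self.block_capacity = block_capacity
        # Each bucket is a chain of blocks; the first is the primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[key % len(self.buckets)]
        if len(chain[-1]) == self.block_capacity:
            chain.append([])               # allocate an overflow block
        chain[-1].append(key)

    def lookup(self, key):
        """Return (found, number_of_block_reads)."""
        ios = 0
        for block in self.buckets[key % len(self.buckets)]:
            ios += 1
            if key in block:
                return True, ios
        return False, ios

f = StaticHashFile(num_buckets=4)
for key in [0, 4, 8]:
    f.insert(key)                          # all three hash to bucket 0
print(f.lookup(4), f.lookup(8), f.lookup(12))
```

With three keys in a bucket whose blocks hold two, the third key already needs a second block read, so the one-I/O figure holds only while chains stay short.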

Adapting to Disk
[Figure: buckets 0, 1, 2]
Is there any other issue?
Map ‘bucket id’ to disk location
Adapting to disk
1 Hash Bucket = 1 Block
Bucket Id → Disk Address mapping
 Contiguous blocks
 Store mapping in main memory
 Too large?
Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!
Adapting to disk
1 Hash Bucket = 1 Block (or more than one contiguous block)
Bucket Id → Disk Address mapping
Number of buckets
 ≈ Number of keys (main memory version)
 ≈ Number of blocks (disk version)
Textbook: Static Hash Table
Assigned Reading
Insertion and Deletion on Static Hash Table
Section 13.4
Roadmap
B-Tree
B-Tree variants
Hash-based Indexes
 Static Hash Table
 Extensible Hash Table
 Linear Hash Table
Dynamic Hash Indexes
Static Hash Table:
 Fixed number of buckets
 Waste space / inefficient
Dynamic Hash Tables:
 Number of buckets can increase / decrease dynamically
Extensible Hash Table: Main Ideas (Abstract)
Hash Function: {Keys} → {Large space of hash values}
Buckets dynamically partition space of hash values
Insertions: partitioning grows finer
 i.e., more buckets
Deletions: partitioning grows coarser
 i.e., fewer buckets
Extensible Hash Table: Main Ideas (concrete)
Hash Function: {Keys} → bit string of length b
 Example: 01110100
Bucket: prefix of bit string
 All (keys with) hash values having that prefix fall into that bucket
0
01100110
01011010
Hash Value → bucket?
10
10110001
10011010
11
11011110
prefixes
i = max length of prefix
i=2
00
01
10
11
0
01100110
01011010
10
10110001
10011010
11
11011110
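The directory lookup in the figure amounts to indexing an array of 2^i entries with the first i bits of the hash value. In the sketch below, hash values are written as bit strings, the buckets are named by their prefixes ("0", "10", "11") as in the figure, and `directory_index` is a hypothetical helper.

```python
def directory_index(hash_bits, i):
    """Index of the directory entry for a hash value given as a bit string."""
    return int(hash_bits[:i], 2) if i > 0 else 0

# i = 2: entries 00 and 01 share the bucket with prefix "0".
directory = ["0", "0", "10", "11"]

for value in ["01100110", "01011010", "10110001", "10011010", "11011110"]:
    print(value, "->", directory[directory_index(value, 2)])
```

Two directory entries pointing to the same bucket is what lets a bucket use a prefix shorter than i.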
Insertion
.
i=0
Insertion
.
10110001
i=0
Insertion
.
10110001
10110001
i=0
Insertion
.
00110101
10110001
i=0
00110101
Insertion
.
11010010
10110001
i=0
00110101
Insertion
0
11010010
10110001
i=0
00110101
1
Insertion
0
11010010
i=0
00110101
1
10110001
Insertion
0
11010010
i=1
0
1
00110101
1
10110001
Insertion
0
11010010
i=1
0
1
00110101
1
10110001
11010010
Insertion
0
11001101
i=1
0
1
00110101
1
10110001
11010010
Insertion
0
11001101
i=1
0
1
00110101
10
10110001
11010010
11
Insertion
0
11001101
i=1
0
1
00110101
10
10110001
11
11010010
Insertion
0
11001101
i=2
00
01
00110101
10
10110001
10
11
11
11010010
Insertion
0
11001101
i=2
00
01
00110101
10
10110001
10
11
11
11010010
11001101
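The insertion sequence in the frames above can be replayed with a small model. It is a sketch under stated assumptions: hash values are 8-bit integers, each bucket holds at most 2 values, every bucket records the length of its own prefix (`depth`), and overflow blocks are not modeled. The class and attribute names are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth     # length of the prefix this bucket covers
        self.keys = []

class ExtensibleHashTable:
    def __init__(self, capacity=2, bits=8):
        self.capacity, self.bits = capacity, bits
        self.i = 0                         # max prefix length in use
        self.directory = [Bucket(0)]       # always 2**i entries

    def _index(self, h):
        return h >> (self.bits - self.i)   # first i bits of the hash value

    def insert(self, h):
        bucket = self.directory[self._index(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        if bucket.depth == self.bits:
            raise OverflowError("no bits left; overflow blocks would be needed")
        if bucket.depth == self.i:
            # Double the directory: every old entry becomes two entries.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on the next bit of the prefix.
        bucket.depth += 1
        new = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for idx, b in enumerate(self.directory):
            if b is bucket and (idx >> (self.i - bucket.depth)) & 1:
                self.directory[idx] = new
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
        self.insert(h)                     # retry; may split again

t = ExtensibleHashTable()
for h in [0b10110001, 0b00110101, 0b11010010, 0b11001101]:
    t.insert(h)
print(t.i, [[format(k, "08b") for k in b.keys] for b in t.directory])
```

The directory doubles only when the bucket being split already uses all i bits; otherwise the split just redirects some existing directory entries.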
Deletion
Inverse of insertion: work out details
Textbook Notation
i=2
00
01
0
1
10
11
Number of bits in prefix
Extensible Hash Table
One Issue:
Directory doubles in size during some inserts
Roadmap
B-Tree
B-Tree variants
Hash-based Indexes
 Static Hash Table
 Extensible Hash Table
 Linear Hash Table
Linear Hash Table
Differences from Extensible Hash Table:
 Bucket: suffix of the hash value
 Grows linearly (avoids doubling of directory)
Linear Hash Table
00
01100100
01011000
1
10110001
10011001
10
11011110
suffixes
Linear Growth
0
1
Linear Growth
00
1
10
redistribute
Linear Growth
00
01
10
redistribute
11
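The growth steps in the figures (one new bucket at a time, with one existing bucket redistributed) can be sketched as follows. The assumptions are not from the slides: buckets are unbounded lists, so overflow blocks are not modeled, and growth is triggered by an explicit `grow()` call because the slides do not state a trigger policy. The class name is hypothetical.

```python
class LinearHashTable:
    def __init__(self):
        self.i = 0                 # number of suffix bits in use
        self.buckets = [[]]        # buckets numbered 0 .. n-1

    def _addr(self, h):
        m = h & ((1 << self.i) - 1)            # last i bits of the hash value
        n = len(self.buckets)
        # Bucket m may not exist yet; fall back to m without its top bit.
        return m if m < n else m - (1 << (self.i - 1))

    def insert(self, h):
        self.buckets[self._addr(h)].append(h)

    def grow(self):
        """Add bucket number n and redistribute the one bucket it splits."""
        n = len(self.buckets)
        if n == 1 << self.i:
            self.i += 1                        # bucket n needs one more bit
        self.buckets.append([])
        source = n - (1 << (self.i - 1))       # bucket whose keys may move
        old, self.buckets[source] = self.buckets[source], []
        for h in old:
            self.buckets[self._addr(h)].append(h)

t = LinearHashTable()
for h in [0b01100100, 0b01011000, 0b10110001, 0b10011001, 0b11011110]:
    t.insert(h)
t.grow()   # buckets 0, 1
t.grow()   # buckets 00, 1 (= 01), 10
print([[format(k, "08b") for k in b] for b in t.buckets])
```

Each `grow()` touches exactly one existing bucket, which is the sense in which growth is linear rather than doubling.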
What does linear growth buy?
i=3
000
000
001
010
011
01
100
101
110
111
10
11
100
Redundant if we know # buckets = 5
What does linear growth buy?
i=3
000
000
001
010
011
01
100
10
11
i=3
n=3
100
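The point that the directory is redundant once the number of buckets is known can be illustrated by computing the bucket number directly from i and the bucket count. The function name `bucket_for` is hypothetical, and the addressing rule (drop the leading bit of the suffix when that bucket does not exist yet) is an assumed convention for linear hashing.

```python
def bucket_for(h, i, n):
    """Bucket number for hash value h, using i suffix bits and n buckets."""
    m = h & ((1 << i) - 1)                 # last i bits
    return m if m < n else m - (1 << (i - 1))

# With i = 3 and 5 buckets, the whole 8-entry directory can be recomputed:
directory = [bucket_for(m, 3, 5) for m in range(8)]
print(directory)
```

Suffixes 101, 110, and 111 have no bucket of their own yet and map to buckets 01, 10, and 11, so nothing beyond i and the bucket count needs to be stored.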