B+-Tree Index Files - COW :: Ceng

Physical Level of Databases:
B+-Trees
Adnan YAZICI
Computer Engineering Department
METU
DBMS , A. Yazıcı
1
B+ Tree
DBMS , A. Yazıcı
2
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
l
l
l
l
Disadvantage of indexed-sequential files:
performance degrades as file grows, since many
overflow blocks get created. Periodic reorganization
of entire file is required.
Advantage of B+-tree index files: automatically
reorganizes itself with small and local changes, in
the face of insertions and deletions. Reorganization
of entire file is not required to maintain performance.
Disadvantage of B+-trees: extra insertion and
deletion overhead, space overhead.
Advantages of B+-trees outweigh disadvantages,
and they are used extensively.
DBMS , A. Yazıcı
3
B+ Tree
l B+
Tree Properties
l B+ Tree Searching
l B+ Tree Insertion
l B+ Tree Deletion
DBMS , A. Yazıcı
4
B+ Tree
– Balanced Tree
Same height for paths from root to leaf
l Given a search-key K, nearly same access time for
different K values
l
– B+ Tree is constructed by parameter n
Each Node (except root) has ⎡n/2⎤ to n pointers
l Each Node (except root) has ⎡n/2⎤-1 to n-1 searchkey values
l
General case for n
Case for n=3
K1
P1
P2
DBMS , A. Yazıcı
K1
K2
P3
P1
P2
K2
Kn-1
Pn-1
Pn
5
B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
l All
paths from root to leaf are of the same
length
l Each node that is not a root or a leaf has
between [n/2] and n children.
l Special cases:
– If the root is not a leaf, it has at least 2
children.
– If the root is a leaf (that is, there are no
other nodes in the tree), it can have
between 0 and (n–1) values.
DBMS , A. Yazıcı
6
B+ Tree
l
Search keys are sorted in order
– K1 < K2 < … <Kn-1
•Non-leaf Node
–Each key-search values in subtree Si
pointed by Pi < Ki, >=Ki-1
Key values in S1 < K1
K1 <= Key values in S2 < K2
•Leaf Node
–Pi points record or bucket with
search key value Ki
–Pn points to the neighbor leaf
node
DBMS , A. Yazıcı
P1
P1
K1 K2
P2 P3
S1
S2
K1 K2
Record of K1
S3
P3
…
P2
Record of K2
Record of K2
…
7
Leaf Nodes in B+-Trees
Properties of a leaf node:
l
l
l
For i = 1, 2, . . ., n, pointer Pi either points to a file record with
search-key value Ki, or to a bucket of pointers to file records,
each record having search-key value Ki. Only need bucket
structure if search-key does not form a primary key.
If Li, Lj are leaf nodes and i < j, Li’s search-key values are less
than Lj’s search-key values
Pn points to next leaf node in search-key order
DBMS , A. Yazıcı
8
Non-Leaf Nodes in B+-Trees
l Non
leaf nodes form a multi-level sparse
index on the leaf nodes. For a non-leaf node
with m pointers:
– All the search-keys in the subtree to which P1
points are less than K1
– For 2 ≤ i ≤ n – 1, all the search-keys in the subtree
to which Pi points have values greater than or
equal to Ki–1 and less than Ki+1
DBMS , A. Yazıcı
9
Example of a B+-tree
B+-tree for account file (n = 3)
DBMS , A. Yazıcı
10
Example of B+-tree
B+-tree for account file (n = 5)
l
l
All nodes must have between 2 and 4 key values
(⎣ (n)/2⎦ and n, with n = 5).
Root must have at least 2 children.
DBMS , A. Yazıcı
11
B+ Tree: Search
l
Given a search-value k
– Start from the root, look for the largest search-key
value (Kl) in the node <= k
– Follow pointer Pl+1 to next level, until reach a leaf
node
K <=k<K
l
P1
K1 K2
P2
P3
…
l+1
Kl
Kl+1
Pl+1
Kn-1
Pn-1
Pn
– If k is found to be equal to Kl in the leaf, follow Pl to
search the record or bucket
K
l
Record of Kl
Pl
Kl+1
k = Kl
Record of Kl
…
DBMS , A. Yazıcı
12
B+ Tree:Insertion
l
Overflow
– When the number of search-key values exceed n-1
7
–Leaf Node
9
13
15
Insert 8
•Split into two nodes:
–1st node contains ⎡(n-1)/2 ⎤ values
–2nd node contains remaining values
–Copy the smallest search-key value of the 2nd node
to parent node
9
7
DBMS , A. Yazıcı
8
9
13
15
13
B+ Tree: Insertion
l
Overflow
– When number of search-key values exceed n-1
7
9
13
15
Insert 8
–Non-Leaf Node
•Split into two nodes:
–1st node contains ⎡n/2 ⎤ -1 values
–Move the smallest of the remaining values, together
with pointer, to the parent
–2nd node contains the remaining values
9
7
DBMS , A. Yazıcı
8
13
15
14
B+ Tree: Insertion
1: Construct a B+ tree for (1, 4, 7,
10, 17, 21, 31, 25, 19, 20, 28, 42) with n=4.
l Example
7
1
4
7
1
7
1
DBMS , A. Yazıcı
4
4
7
10
17
7
10
17
21
15
B+ Tree: Insertion
l
B+ Tree Insertion
1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42
7
1
4
17
7
25
10
17
21
25
31
17
7
1
4
DBMS , A. Yazıcı
7
20
10
17
19
25
20
21
25
31
16
B+ Tree: Insertion
l
1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42
17
7
1
4
7
20
17
10
DBMS , A. Yazıcı
25
31
19
20
31
21
25
42
28
17
B+ Tree: Insertion
l Example
2: n=3, insert 4 into the following
B+Tree
9
7
2
8
5
Leaf
A
10
Subtree
C
Subtree
D
Leaf
B
9
7
10
4
2
DBMS , A. Yazıcı
8
4
5
C
A
D
B
18
B+ Tree: Deletion
l
Underflow
– When number of search-key values < ⎡n/2⎤-1
9
Delete 10
10
–Leaf Node
•Redistribute to sibling
•Right node not less than left node
•Replace the between-value in parent
by their smallest value of the right
node
•Merge (contain too few entries)
•Move all values, pointers to left node
•Remove the between-value in parent
13
9
10
DBMS , A. Yazıcı
18
13
9
10
13
14
9
14
18
14
9
13
14
16
18
13
22
13
18
16
22
14
19
B+ Tree: Deletion
–Non-Leaf Node
•Redistribute to sibling
•Through parent
•Right node not less than left node
•Merge (contain too few entries)
•Bring down parent
•Move all values, pointers to left
node
•Delete the right node, and pointers
in parent
13
9
10
DBMS , A. Yazıcı
18
9
13
9
14
14
9
9
15
13
14
15
16
18
13
18
16
18
10
22
14
Delete 10
10
16
22
16
20
B+ Tree: Deletion
l Example
3: n=3, delete 3
5
20
3
8
1
Subtree
A
3
20
5
8
Subtree
A
1
DBMS , A. Yazıcı
21
B+ Tree: Deletion
l Example
4: Delete 28, 31, 21, 25, 19
20
7
1
7
4
17
25
10
17
19
20
31
50
21
25
28
31
42
20
7
1
4
DBMS , A. Yazıcı
7
17
10
25
17
19
20
21
50
25
31
42
22
B+ Tree: Deletion
l Example
4: Delete 28, 31, 21, 25, 19
7
1
4
7
10
7
1
DBMS , A. Yazıcı
4
17
7
17
10
20
50
17
19
20
25
42
50
17
20
42
23
Queries on B+-Trees
l
Find all records with a search-key value of k.
1. Start with the root node
1.
2.
3.
Examine the node for the smallest search-key value > k.
If such a value exists, assume it is Kj. Then follow Pi to
the child node
Otherwise k ≥ Km–1, where there are m pointers in the
node. Then follow Pm to the child node.
2. If the node reached by following the pointer above
is not a leaf node, repeat step 1 on the node
3. Else we have reached a leaf node.
1.
2.
DBMS , A. Yazıcı
If for some i, key Ki = k follow pointer Pi to the desired
record or bucket.
Else no record with search-key value k exists.
24
Queries on B+-Trees (Cont.)
l
l
l
l
l
In processing a query, a path is traversed in the tree from
the root to some leaf node.
If there are K search-key values in the file, the path is no
longer than ⎡ log⎡n/2⎤(K)⎤.
A node is generally the same size as a disk block,
typically 4 kilobytes, and n is typically around 100 (40
bytes per index entry).
With 1 million search key values and n = 100, at most
log50(1,000,000) = 4 nodes are accessed in a lookup.
Contrast this with a balanced binary free with 1 million
search key values — around 20 nodes are accessed in a
lookup
– above difference is significant since every node
access may need a disk I/O, costing around 20
milliseconds!
DBMS , A. Yazıcı
25
B+-Tree File Organization
l
l
l
l
l
Index file degradation problem is solved by using B+Tree indices. Data file degradation problem is solved
by using B+-Tree File Organization.
The leaf nodes in a B+-tree file organization store
records, instead of pointers.
Since records are larger than pointers, the maximum
number of records that can be stored in a leaf node is
less than the number of pointers in a nonleaf node.
Leaf nodes are still required to be half full.
Insertion and deletion are handled in the same way
as insertion and deletion of entries in a B+-tree index.
DBMS , A. Yazıcı
26
B+-Tree File Organization (Cont.)
Example of B+-tree File Organization
l
l
Good space utilization important since records use
more space than pointers.
To improve space utilization, involve more sibling
nodes in redistribution during splits and merges
– Involving 2 siblings in redistribution (to avoid split /
merge where possible) results in each node having
at least entries
DBMS , A. Yazıcı
27
B-Tree Index Files
n Similar to B+-tree, but B-tree allows search-key
values to appear only once; eliminates redundant
storage of search keys.
n Search keys in nonleaf nodes appear nowhere else
in the B-tree; an additional pointer field for each
search key in a nonleaf node must be included.
n Generalized B-tree leaf node
l
Nonleaf node – pointers Bi are the bucket or
file record pointers.
DBMS , A. Yazıcı
28
B-Tree Index File Example
B-tree (above) and B+-tree (below) on same data
DBMS , A. Yazıcı
29
B-Tree Index Files (Cont.)
l
l
l
Advantages of B-Tree indices:
– May use less tree nodes than a corresponding B+-Tree.
– Sometimes possible to find search-key value before
reaching leaf node.
Disadvantages of B-Tree indices:
– Only small fraction of all search-key values are found
early
– Non-leaf nodes are larger, so fan-out is reduced. Thus,
B-Trees typically have greater depth than corresponding
B+-Tree
– Insertion and deletion more complicated than in B+-Trees
– Implementation is harder than B+-Trees.
Typically, advantages of B-Trees do not outweigh
disadvantages.
DBMS , A. Yazıcı
30
Index Definition in SQL
l
l
l
Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch_name)
Use create unique index to indirectly specify and
enforce the condition that the search key is a candidate
key.
– Not really required if SQL unique integrity constraint is
supported
To drop an index
drop index <index-name>
DBMS , A. Yazıcı
31