COSC242 Lecture 17
B-trees
Problem - to maintain a growing search tree without
allowing it to get tall and thin like a linked list.
We’ve used rotations to keep red-black trees (RBTs) short and fat.
Another approach is to allow more children:
But how can one decide which subtree to search?
Idea - allow more than one key in a node. Two keys
allow us to decide which of 3 subtrees to search, and
in general m keys which of m + 1 subtrees to search.
Figure: root [D L] with three children [A], [J], [S].
What is a B-tree?
A B-tree is a rooted tree with an associated integer t >= 2
called its minimum degree.
Every node except the root contains at least t-1 and at most
2t-1 keys, e.g. if t = 2, between 1 and 3 keys. (The root may
have fewer than t-1 keys.)
So every internal node (non-leaf) other than the root has at
least t children, and no internal node has more than 2t children.
Keys are placed in nodes in a way that satisfies the
search tree property —
if node x has keys key_1[x], key_2[x], ..., key_m[x], then
key_1[x] < key_2[x] < ... < key_m[x], and
if k_i is any key in the i-th subtree of x (i = 1, ..., m+1), then
k_1 < key_1[x] < k_2 < key_2[x] < ... < key_m[x] < k_{m+1}.
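The search tree property is what lets a search pick one subtree at each node. A minimal sketch in Python follows; the class name BTreeNode and the function search are illustrative names, not from the lecture. It uses the small tree with root [D L] and children [A], [J], [S] as its example.

```python
from bisect import bisect_left


class BTreeNode:
    """One B-tree node: sorted keys, and len(keys) + 1 children unless it is a leaf."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

    @property
    def is_leaf(self):
        return not self.children


def search(node, key):
    """Return the node that holds key, or None if key is not in the tree."""
    while node is not None:
        # First position i with node.keys[i] >= key.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node
        if node.is_leaf:
            return None
        # Every key in subtree i lies between keys[i - 1] and keys[i].
        node = node.children[i]
    return None


# Root [D L] with children [A], [J], [S].
root = BTreeNode("DL", [BTreeNode("A"), BTreeNode("J"), BTreeNode("S")])
print(search(root, "J").keys)   # ['J']
print(search(root, "K"))        # None
```

With m keys in a node, the binary search inside the node costs O(log m) comparisons, and only one child is visited per level.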
If t = 2, internal nodes may have 2, 3, or 4 children. This
is called a 2-3-4 tree.
Figure: root [5 9] with leaves [2], [7], [12].
All leaves are always on the same level.
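The rules on this slide (key counts, ordering, and all leaves on one level) can be checked mechanically. The following is a sketch under assumptions: Node and leaf_depth are hypothetical names, and a node is taken to be a leaf exactly when its child list is empty.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty for a leaf


def leaf_depth(node, t, lo=None, hi=None, is_root=True):
    """Return the common depth of the leaves below node, or None if a rule is broken."""
    n = len(node.keys)
    # The root is exempt from the t - 1 minimum, but needs a key if it has children.
    min_keys = (1 if node.children else 0) if is_root else t - 1
    if not (min_keys <= n <= 2 * t - 1):
        return None
    # Keys inside a node must be strictly increasing.
    if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
        return None
    # Keys must lie strictly between the separators inherited from the ancestors.
    if n and lo is not None and node.keys[0] <= lo:
        return None
    if n and hi is not None and node.keys[-1] >= hi:
        return None
    if not node.children:
        return 0
    if len(node.children) != n + 1:
        return None
    bounds = [lo] + node.keys + [hi]
    depths = {
        leaf_depth(child, t, bounds[i], bounds[i + 1], False)
        for i, child in enumerate(node.children)
    }
    # All subtrees must be valid and must agree on the leaf depth.
    if len(depths) != 1 or None in depths:
        return None
    return depths.pop() + 1


tree = Node([5, 9], [Node([2]), Node([7]), Node([12])])
print(leaf_depth(tree, t=2))   # 1
print(leaf_depth(tree, t=3))   # None
```

The same tree fails for t = 3 because each leaf would then need at least 2 keys.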
How to build a B-tree
The root may have 0, 1, ..., 2t-1 keys, otherwise we
could never build a B-tree from nothing.
We insert a key by searching for an existing leaf into
which it should go. If the leaf is already full, we split it,
creating a new leaf in the process.
So starting with the empty B-tree:
•  Create an empty root node.
•  Insert keys into root until it is full — has 2t - 1 keys.
•  When a node is full and there is a new key to be
inserted,
- split the node around the middle key (median)
- put the middle key into the parent
- leave the smallest t - 1 keys in the existing node
- put the biggest t - 1 keys in a new sibling node
- if the full node was the root, create a new root containing
  only the median, with the two halves as its 2 children
- then continue the insertion in whichever of the two nodes
  the new key belongs to.
Figure: the full node [A D L]; insert J if t = 2.
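The splitting steps above can be sketched in Python. This sketch assumes the one-pass variant in which every full node met on the way down is split before descending into it, so a split never has to propagate back up. The names Node, split_child, insert and levels are illustrative.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf


def split_child(parent, i, t):
    """Split the full child parent.children[i] around its median key."""
    full = parent.children[i]
    median = full.keys[t - 1]
    right = Node(full.keys[t:], full.children[t:])   # biggest t - 1 keys
    full.keys = full.keys[:t - 1]                     # smallest t - 1 keys
    full.children = full.children[:t]
    parent.keys.insert(i, median)                     # median moves up
    parent.children.insert(i + 1, right)


def insert(root, key, t):
    """Insert key and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:                   # full root: tree grows upwards
        root = Node([], [root])
        split_child(root, 0, t)
    node = root
    while node.children:                              # descend to a leaf
        i = sum(k < key for k in node.keys)
        if len(node.children[i].keys) == 2 * t - 1:
            split_child(node, i, t)
            if key > node.keys[i]:                    # key belongs in the new sibling
                i += 1
        node = node.children[i]
    node.keys.insert(sum(k < key for k in node.keys), key)
    return root


def levels(root):
    """Keys of every node, grouped level by level from the root down."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


root = Node()
for k in "ADLJ":
    root = insert(root, k, t=2)
print(levels(root))   # [[['D']], [['A'], ['J', 'L']]]
```

Inserting J finds the root [A D L] full, so D moves into a new root and J joins L in the right-hand leaf.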
Inserting into a B-tree
If t = 2, show the B-trees resulting from the successive
insertion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Draw configurations just before some node must split,
and also draw the final configuration.
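Configurations just before a node must split:
[1 2 3]
[2] with leaves [1] [3 4 5]
[2 4] with leaves [1] [3] [5 6 7]
[2 4 6] with leaves [1] [3] [5] [7 8]
[4] with children [2] (leaves [1] [3]) and [6] (leaves [5] [7 8 9])
Deleting from a B-tree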
Deletion from a B-tree is a bit more complicated than
insertion because a key may be deleted from any node,
not just a leaf. Deletion from an internal node requires
that the node’s children be rearranged.
Just as we had to ensure that nodes didn’t get too big
due to insertion, we have to ensure that nodes don’t get
too small due to deletion. (Only the root may have
fewer than t - 1 keys.)
This is done by ensuring that, before a key is deleted
from a node, that node has at least t keys — which may
mean that we have to move an extra key into a node
before we can delete anything from the node.
There are two ways of moving in an extra key: we
may borrow a key from a sibling node that has more
than it needs, or if we can’t borrow then we may merge
two nodes that have no keys to spare.
To delete key k, our one-pass strategy is to search from
the root for the node containing k, and to strengthen
each node we visit on the way if it has fewer than t
keys.
B-tree deletion (Case 1)
What we do with leaves:
Suppose the minimum degree t is 3, so that nodes
(other than the root) have to contain at least 2 keys.
Case 1: Imagine we have found the leaf [A D L]
and want to delete A. Then we simply delete A,
resulting in the leaf [D L], which is fine.
But if the key A lived in the leaf [A D],
then before deleting we would move an extra key into
the leaf or merge the leaf with a sibling (Case 3).
B-tree deletion (Case 2)
Case 2: the key we want to delete is in an internal node.
For example, suppose t = 3 and we want to delete L in
Root: [P]
Children of [P]: [C G L] [T X]
Leaves under [C G L]: [A B] [D E] [H J K] [M N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
If the child that precedes key L has at least t (i.e. 3)
keys, replace L by its in-order predecessor in that child,
and recursively delete that predecessor.
Root: [P]
Children of [P]: [C G K] [T X]
Leaves under [C G K]: [A B] [D E] [H J] [M N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
Here K is the predecessor, and K is in a leaf having
more than t - 1 keys, so we simply delete K from that
leaf after replacing L by K in the internal node.
B-tree deletion (Case 2 continued)
Suppose t = 3 and we want to delete K in the B-tree
below. Again the key we want is in an internal node.
Root: [P]
Children of [P]: [C G K] [T X]
Leaves under [C G K]: [A B] [D E] [H J] [M N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
The child that precedes key K has only t - 1 (i.e. 2) keys,
but we can replace K by its successor in the child that
follows K, and then recursively delete that successor.
Root: [P]
Children of [P]: [C G M] [T X]
Leaves under [C G M]: [A B] [D E] [H J] [N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
Here M is the successor of K, and M is in a leaf having
more than t - 1 keys, so we borrow M for the parent
node and delete it from the child node.
B-tree deletion (Case 2 end)
Suppose t = 3 and we want to delete G in the B-tree
below. Again the key we want is in an internal node.
Root: [P]
Children of [P]: [C G M] [T X]
Leaves under [C G M]: [A B] [D E] [H J] [N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
We cannot borrow G’s predecessor E or its successor H
because both children have only t - 1 keys. So we merge
the two children, [D E] and [H J], together with G into
one node. This removes G from the parent node.
Root: [P]
Children of [P]: [C M] [T X]
Leaves under [C M]: [A B] [D E G H J] [N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
Now G is recursively deleted from the merged node. In
this case, G is in a leaf with enough keys so we just
delete G.
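The merge step can be written as one small operation on the parent. A minimal sketch, assuming nodes hold Python lists of keys and children; Node and merge_children are hypothetical names. It replays the slide's example on the internal node [C G M].

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty for a leaf


def merge_children(parent, i):
    """Merge parent.children[i], parent.keys[i] and parent.children[i + 1] into one node."""
    left = parent.children[i]
    right = parent.children.pop(i + 1)
    # The separating key moves down and becomes the median of the merged node.
    left.keys += [parent.keys.pop(i)] + right.keys
    left.children += right.children
    return left


# The internal node [C G M] with its four leaves (t = 3).
node = Node("CGM", [Node("AB"), Node("DE"), Node("HJ"), Node("NO")])
merged = merge_children(node, 1)      # G is node.keys[1]
print(merged.keys)                    # ['D', 'E', 'G', 'H', 'J']
merged.keys.remove("G")               # G is now in a leaf with 2t - 1 = 5 keys
print(node.keys, merged.keys)         # ['C', 'M'] ['D', 'E', 'H', 'J']
```

Two nodes with t - 1 keys each plus the separator give exactly 2t - 1 keys, so the merged node is full but never too big.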
B-tree deletion (Case 3)
Suppose t = 3 and we want to delete D from the B-tree
below.
Root: [P]
Children of [P]: [C M] [T X]
Leaves under [C M]: [A B] [D E H J] [N O]
Leaves under [T X]: [Q R] [U V] [Y Z]
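In our search for D we visit the internal node with keys
C and M only. We must strengthen this node so that it
has at least t keys instead of t - 1. If its sibling had
enough keys we could borrow, but it doesn’t, so we merge
the two nodes. The parent supplies the median key P of the
merged node. Since P was the root’s only key, the merged
node becomes the new root and the tree loses one level.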
Root: [C M P T X]
Leaves: [A B] [D E H J] [N O] [Q R] [U V] [Y Z]
Now we continue the search past the merged node and
recursively delete D. In this case, D is in the next node
we meet, which is a leaf with enough keys.
B-tree deletion (Case 3 cont.)
Suppose t = 3 and we want to delete A below.
Root: [C M P T X]
Leaves: [A B] [D E H J] [N O] [Q R] [U V] [Y Z]
To strengthen the node containing A, we bring down C
from its parent and push up D from its sibling, to give:
Root: [D M P T X]
Leaves: [A B C] [E H J] [N O] [Q R] [U V] [Y Z]
Now we recursively delete A. How do we delete Z?
Figure (partly shown): root [D M P T], first leaves [B C] [E H J] [N O].
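Cases 1 to 3 fit together into a single downward pass. The sketch below is one possible Python rendering, not the lecture's own code. Node, merge, delete and delete_key are hypothetical names, and delete assumes that every node it is called on has at least t keys unless that node is the root.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty for a leaf


def merge(x, i):
    """Merge x.children[i], x.keys[i] and x.children[i + 1]; return the merged node."""
    left, right = x.children[i], x.children.pop(i + 1)
    left.keys += [x.keys.pop(i)] + right.keys
    left.children += right.children
    return left


def delete(x, k, t):
    """Delete k from the subtree rooted at x."""
    i = sum(key < k for key in x.keys)
    if i < len(x.keys) and x.keys[i] == k:
        if not x.children:                        # Case 1: k is in a leaf
            x.keys.pop(i)
        elif len(x.children[i].keys) >= t:        # Case 2: replace by predecessor
            node = x.children[i]
            while node.children:
                node = node.children[-1]
            x.keys[i] = node.keys[-1]
            delete(x.children[i], x.keys[i], t)
        elif len(x.children[i + 1].keys) >= t:    # Case 2: replace by successor
            node = x.children[i + 1]
            while node.children:
                node = node.children[0]
            x.keys[i] = node.keys[0]
            delete(x.children[i + 1], x.keys[i], t)
        else:                                     # Case 2: merge, then delete below
            delete(merge(x, i), k, t)
    elif x.children:                              # Case 3: k can only be further down
        child = x.children[i]
        if len(child.keys) < t:                   # strengthen before descending
            if i > 0 and len(x.children[i - 1].keys) >= t:
                left = x.children[i - 1]          # borrow from the left sibling
                child.keys.insert(0, x.keys[i - 1])
                x.keys[i - 1] = left.keys.pop()
                if left.children:
                    child.children.insert(0, left.children.pop())
            elif i < len(x.keys) and len(x.children[i + 1].keys) >= t:
                right = x.children[i + 1]         # borrow from the right sibling
                child.keys.append(x.keys[i])
                x.keys[i] = right.keys.pop(0)
                if right.children:
                    child.children.append(right.children.pop(0))
            else:                                 # nothing to borrow: merge
                child = merge(x, i if i < len(x.keys) else i - 1)
        delete(child, k, t)


def delete_key(root, k, t):
    """Delete k and return the (possibly new) root."""
    delete(root, k, t)
    if not root.keys and root.children:           # the root was emptied by a merge
        return root.children[0]
    return root


root = Node("P", [
    Node("CGL", [Node("AB"), Node("DE"), Node("HJK"), Node("MNO")]),
    Node("TX", [Node("QR"), Node("UV"), Node("YZ")]),
])
for k in "LKGD":
    root = delete_key(root, k, t=3)
print(root.keys)                          # ['C', 'M', 'P', 'T', 'X']
print([c.keys for c in root.children])
```

Deleting L, K, G and D in that order from the first Case 2 tree walks through the predecessor, successor, merge and root-merge situations from the slides.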
Why B-trees?
Databases! Large tables and persistent data both need
to be stored externally, on disk.
Disk tracks are partitioned into pages each containing
many records.
Figure: a disk track divided into pages B1, B2, B3, ...; a record is
addressed as the k’th record of a page, e.g. of page B3.
Disk seeks are slow. Every page access (disk-read
or disk-write) takes much more time than operations
on data in internal memory. Schemes to search data
in external storage try to reduce the number of
page accesses. So we take an entire page to be a
node in a B-tree. (This tells us what t has to be.)
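How the page size fixes t can be worked through with some arithmetic. The figures below are assumptions for illustration only: a 4096-byte page, 8-byte keys, 8-byte child pointers, and no allowance for a page header or for the records themselves. The function names are hypothetical. The height calculation uses the standard bound that a B-tree of height h and minimum degree t holds at least 2t^h - 1 keys.

```python
def min_degree(page_bytes, key_bytes, pointer_bytes):
    """Largest t such that 2t - 1 keys and 2t child pointers fit in one page."""
    # (2t - 1) * key + 2t * pointer <= page  =>  t <= (page + key) / (2 * (key + pointer))
    return (page_bytes + key_bytes) // (2 * (key_bytes + pointer_bytes))


def max_height(n, t):
    """Largest h with 2 * t**h - 1 <= n: the greatest height an n-key B-tree can have."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


t = min_degree(page_bytes=4096, key_bytes=8, pointer_bytes=8)
print(t)                        # 128
print(max_height(10**9, t))     # 4
```

Under these assumptions a search among a billion keys touches at most 5 pages (one per level), whereas a balanced binary tree over the same keys has a height of about 30.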
(More details: textbook Chapter 18.)
Exercises on B-trees
1.  Find all legal B-trees of minimum degree 2 that
contain the keys 1, 2, 3, 4, 5.
2.  Show the results of inserting the keys F, S, Q, K,
C, L, H, T, V, W, M, R, N, P, A, B, X, Y, D, Z, E in
that order into an empty B-tree with minimum
degree t = 2. Then show the results of deleting A,
B, C, and D, in that order.
3.  Show the results of successively inserting the keys
10, 9, 8, 7, 6, 5, 4, 3, 2, 1 into an initially empty
B-tree of minimum degree t = 2. Then show the
results of deleting 1, 2, 3, 4 in that order.
4.  What is t in the case of the B-tree below? Show the
results of deleting C, P, and V, in that order.
Root: [E L P T X]
Leaves: [A C] [J K] [N O] [Q R S] [U V] [Y Z]