B-Tree

Lecture 11 : B-Tree
Bong-Soo Sohn
Assistant Professor
School of Computer Science and Engineering
Chung-Ang University
Lecture notes : courtesy of David Matuszek
Binary Search Tree (BST) : Problem

Consider disk access for BST

Disk access is much slower than memory access

Disk access


Seek time >> rotational delay > transfer time
Reducing the number of seeks significantly improves overall performance

If we adopt a naive method for storing a BST on disk,
each visit to a child node involves one disk access.
That is inefficient.

We want to reduce the height of the tree by using a multiway search tree


m-way search tree

A non-empty node has M subtrees (2 <= M <= m)

Therefore, it has M-1 keys (elements)
The values in a node are stored in ascending order, V1 < V2 < ... < Vk
(k = M-1)
Subtrees are placed between adjacent values, with one
additional subtree at each end.
We can thus associate with each value a `left' and `right' subtree;
the right subtree of Vi is the same as the left subtree of V(i+1).
All the values in V1's left subtree are less than V1; all the values
in Vk's right subtree are greater than Vk; and all the values in the
subtree between V(i) and V(i+1) are greater than V(i) and less
than V(i+1).
3-way search tree
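The ordering rule above can be checked mechanically: an in-order walk (subtree, V1, subtree, V2, ..., Vk, subtree) must produce a strictly increasing sequence. The following is a minimal Python sketch, not part of the original slides; the names `MWayNode`, `leaf`, `in_order`, and `is_search_tree`, and the example values, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class MWayNode:
    values: List[int]                       # V1 < V2 < ... < Vk
    subtrees: List[Optional["MWayNode"]]    # k+1 entries; None marks an empty subtree


def leaf(*values: int) -> MWayNode:
    """Build a node whose subtrees are all empty."""
    return MWayNode(list(values), [None] * (len(values) + 1))


def in_order(node: Optional[MWayNode]) -> Iterator[int]:
    """Visit subtree, V1, subtree, V2, ..., Vk, subtree."""
    if node is None:
        return
    for i, v in enumerate(node.values):
        yield from in_order(node.subtrees[i])   # left subtree of Vi
        yield v
    yield from in_order(node.subtrees[-1])      # right subtree of Vk


def is_search_tree(root: Optional[MWayNode]) -> bool:
    """True if an in-order walk is strictly increasing."""
    seq = list(in_order(root))
    return all(a < b for a, b in zip(seq, seq[1:]))


# A 3-way search tree: each node has at most 2 values and 3 subtrees.
root = MWayNode([20, 40], [leaf(5, 10), leaf(30), leaf(50, 60)])
print(list(in_order(root)))      # [5, 10, 20, 30, 40, 50, 60]
print(is_search_tree(root))      # True
```

A tree that places 25 in the left subtree of 20 would fail the check, because 25 would appear before 20 in the walk.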
B-Tree

A B-Tree of order m has the following properties

It is an m-way search tree

Keys in a node are in increasing order.
The root node (if not a leaf node) has at least
two children
All nodes other than the root node have at
least ceil(m/2) - 1 keys. (how many children?)
All external nodes are at the same level
Mostly used in Database systems
B-Tree

A variation on binary search trees that allows quick
searching in files on disk

Instead of storing one key and having two children,
B-tree nodes have (up to) n keys and n+1 children,
where n can be large

This shortens the tree (in terms of height) and
requires much less disk access than a binary search
tree would

The algorithms are more complex and require more computation,
but computation is much cheaper than disk access
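To get a feeling for how much shorter the tree becomes, we can count the fewest levels needed to hold a given number of keys. This is a rough sketch under a best-case assumption (every node completely full, so a tree with h levels holds m^h - 1 keys); the function name `levels_needed` and the numbers are chosen only for illustration.

```python
def levels_needed(n_keys: int, m: int) -> int:
    """Fewest levels of an m-way tree that can hold n_keys keys.

    A completely full m-way tree with h levels holds m**h - 1 keys.
    """
    levels = 0
    while m ** levels - 1 < n_keys:
        levels += 1
    return levels


n = 1_000_000
print(levels_needed(n, 2))     # 20 levels for a perfectly balanced binary tree
print(levels_needed(n, 128))   # 3 levels when each node has up to 128 children
```

If each level costs one disk access, that is roughly 20 accesses versus 3 for the same data.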
Disk Access

Platter
 - Track
 - Sector (typical size : 512B)
 - Block : read/write unit, several
   consecutive sectors

Store related data in one block
 - Locality?
 - B-Trees exploit (spatial) locality
B-Tree


B-tree nodes have a variable number of
keys and children, subject to some
constraints.
In many respects, they work just like binary
search trees, but are considerably "fatter."
B-Tree

Every node has the following fields:





x.n, the number of keys currently in node x. For example, |40|50|.n
in the above example B-tree is 2. |70|80|90|.n is 3.
The x.n keys themselves, stored in nondecreasing order: x.key[1]
<= x.key[2] <= ... <= x.key[x.n] For example, the keys in
|70|80|90| are ordered.
x.leaf, a boolean value that is True if x is a leaf and False if x is an
internal node.
If x is an internal node, it contains x.n+1 pointers
x.c[1], x.c[2], ... , x.c[x.n], x.c[x.n+1] to its children.
Leaf nodes have no children so their c[i] fields are
undefined.
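The fields listed above map directly onto a small record type. This is a sketch only, assuming integer keys and in-memory child references instead of disk addresses; the class name `BTreeNode` is an assumption, and x.n is derived from the key list instead of being stored separately. Note that Python lists are 0-indexed, so x.key[1] on the slide corresponds to `key[0]` here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    key: List[int] = field(default_factory=list)        # keys in nondecreasing order
    leaf: bool = True                                    # True for a leaf
    c: List["BTreeNode"] = field(default_factory=list)   # children; unused in a leaf

    @property
    def n(self) -> int:
        """Number of keys currently stored in the node."""
        return len(self.key)


a = BTreeNode(key=[40, 50])
b = BTreeNode(key=[70, 80, 90])
print(a.n, b.n)   # 2 3
```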
B-Tree


The keys x.key[i] separate the ranges of keys stored
in each subtree: if k[i] is any key stored in the
subtree with root x.c[i], then k[1] <= x.key[1] <=
k[2] <= x.key[2] <= ... <= x.key[x.n] <= k[x.n+1].
Every leaf has the same depth, which is the tree's
height h.
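Both properties (keys separate the subtree ranges, and all leaves have the same depth) can be verified with one recursive pass. The sketch below is an illustration under assumptions: `Node` and `leaf_depth` are hypothetical names, keys are integers, and the function returns the common leaf depth or None when a property is violated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    key: List[int]
    c: List["Node"] = field(default_factory=list)   # empty for a leaf


def leaf_depth(x: Node, lo=float("-inf"), hi=float("inf")) -> Optional[int]:
    """Return the depth of the leaves below x, or None if a property fails."""
    # Keys must be ordered and lie inside the range allowed by the parent.
    if x.key != sorted(x.key) or any(k < lo or k > hi for k in x.key):
        return None
    if not x.c:
        return 0
    if len(x.c) != len(x.key) + 1:
        return None
    bounds = [lo] + x.key + [hi]
    # Child i may only hold keys between bounds[i] and bounds[i + 1].
    depths = {leaf_depth(child, bounds[i], bounds[i + 1])
              for i, child in enumerate(x.c)}
    if len(depths) != 1 or None in depths:
        return None                      # leaves at different depths, or a bad subtree
    return depths.pop() + 1


good = Node([40], [Node([10, 20, 30]), Node([50, 60, 70])])
print(leaf_depth(good))   # 1
```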
B-Tree Search

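Performed just like search in a binary search tree,
except that a node may hold several keys, so at each node
we choose among x.n+1 children instead of two.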
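A minimal sketch of the search, assuming in-memory nodes with integer keys. The names `BTreeNode` and `btree_search` are assumptions, and `leaf` is derived from the child list here for brevity. On disk, each step down to a child would cost one block read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    key: List[int]
    c: List["BTreeNode"] = field(default_factory=list)

    @property
    def n(self) -> int:
        return len(self.key)

    @property
    def leaf(self) -> bool:
        return not self.c


def btree_search(x: BTreeNode, k: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, index) with node.key[index] == k, or None if k is absent."""
    i = 0
    while i < x.n and k > x.key[i]:     # find the first key >= k
        i += 1
    if i < x.n and x.key[i] == k:
        return x, i
    if x.leaf:
        return None
    return btree_search(x.c[i], k)      # on disk, this step would be one block read


root = BTreeNode([40], [BTreeNode([10, 20, 30]), BTreeNode([50, 60, 70])])
print(btree_search(root, 60)[1])   # 1  (second key of the right child)
print(btree_search(root, 45))      # None
```

The linear scan inside a node could be replaced by a binary search when nodes hold many keys; the number of nodes visited stays the same.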
Insert value X into a B-tree
1. Using the SEARCH procedure for M-way trees
(described above), find the leaf node to which X
should be added
2. Add X to this node in the appropriate place among
the values already there
3. If there are M-1 or fewer values in the node after
adding X, then we are finished
4. If there are M values after adding X, we say the node
has overflowed
When overflowed during insertion




Split the M values of the overflowed node into three parts (M odd):
Left: the first (M-1)/2 values
Middle: the middle value (position 1+((M-1)/2))
Right: the last (M-1)/2 values
Notice that Left and Right have just enough values to be made
into individual nodes. That's what we do... they become the left
and right children of Middle, which we add in the appropriate
place in this node's parent.
What if there is no room in the parent? If it overflows we do the
same thing again: split it into Left-Middle-Right, make Left and
Right into new nodes and add Middle (with Left and Right as its
children) to the node above.
We continue doing this until no overflow occurs, or until the root
itself overflows. If the root overflows, we split it, as usual, and
create a new root node with Middle as its only value and Left
and Right as its children (as usual).
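The insert-then-split procedure can be sketched as follows. This is an illustration under assumptions, not the lecture's own code: keys are distinct integers, nodes live in memory, and the names `Node`, `_insert`, and `insert` are hypothetical. The recursive helper returns the promoted Middle value together with the new Right node whenever a node overflows.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    values: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)   # empty for a leaf


def _insert(node: Node, x: int, M: int) -> Optional[Tuple[int, Node]]:
    """Insert x below node; return (Middle, Right) if node had to be split."""
    i = bisect.bisect_left(node.values, x)       # position of x among the values
    if not node.children:                        # leaf: add X in place
        node.values.insert(i, x)
    else:
        promoted = _insert(node.children[i], x, M)
        if promoted is None:
            return None
        middle, right = promoted                 # child i overflowed and was split
        node.values.insert(i, middle)
        node.children.insert(i + 1, right)
    if len(node.values) < M:                     # M-1 or fewer values: finished
        return None
    half = (M - 1) // 2                          # node now holds exactly M values
    middle = node.values[half]
    right = Node(node.values[half + 1:], node.children[half + 1:])
    node.values = node.values[:half]             # this node becomes Left
    node.children = node.children[:half + 1]
    return middle, right


def insert(root: Node, x: int, M: int) -> Node:
    """Insert x and return the (possibly new) root."""
    promoted = _insert(root, x, M)
    if promoted is None:
        return root
    middle, right = promoted                     # the root itself overflowed
    return Node([middle], [root, right])


root = Node()
for x in [10, 20, 30, 40, 50, 60, 70]:
    root = insert(root, x, M=3)
print(root.values)                               # [40]
print([c.values for c in root.children])         # [[20], [60]]
```

With M = 3, inserting 70 overflows a leaf, then the root, so the tree grows by one level at the top and all leaves stay at the same depth.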
Example : Insert 17, 6, 21, 67
[Figures: the tree after inserting 17, then 6, then 21, then 67]
B-Tree Deletion

Not covered here.
B-Tree Summary

B-Tree

Perfectly balanced

Every leaf node is at the same depth
Every node except the root node is at least half full
Rebalancing is not so frequent
Reduced disk accesses when the tree is stored on disk

Make the size of one node one or more disk blocks to
improve the efficiency of disk accesses.
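As a rough illustration of sizing a node to a block, we can compute the largest order m for which m-1 keys and m child pointers fit in one block. All sizes below are assumptions chosen for the example (a 4096-byte block, i.e. eight 512B sectors, 8-byte keys, 8-byte pointers, and a hypothetical 16-byte node header); `max_order` is an illustrative name.

```python
def max_order(block_size: int, key_size: int, pointer_size: int,
              header_size: int = 0) -> int:
    """Largest m such that m-1 keys, m child pointers, and a header fit in one block."""
    # (m - 1) * key_size + m * pointer_size + header_size <= block_size
    return (block_size - header_size + key_size) // (key_size + pointer_size)


print(max_order(4096, 8, 8, 16))   # 255
```

Under these assumptions a single block read brings in up to 254 keys, which is where the large branching factor comes from.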

B-Tree height :

search/insert/delete : O(log N)
[amortized]