International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 6, June 2012)

How Can Manage Balance Tree (B-Tree)

Mangesh Umak1, Vijaya Sawarkar2, Hemant Gulhane3
1,2 M-Tech in Computer Science & Engineering, Thakral College of Technology (RGPV University), Bhopal.
3 M-Tech in Computer Science & Engineering, SSGBCOET, Bhusawal.

Abstract— In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted into or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need rebalancing as frequently as other self-balancing search trees, but they may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

I. INTRODUCTION

A B-tree is kept balanced by requiring that all leaf nodes be at the same depth. This depth increases slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and it results in all leaf nodes being one more node further away from the root.

A B-tree is a method of placing and locating files (called records or keys) in a database. (The meaning of the letter B has not been explicitly defined.) The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

B-trees are preferred when decision points, called nodes, are on hard disk rather than in random-access memory (RAM). It takes thousands of times longer to access a data element from hard disk than from RAM, because a disk drive has mechanical parts, which read and write data far more slowly than purely electronic media. B-trees save time by using nodes with many branches (called children), compared with binary trees, in which each node has only two children. When there are many children per node, a record can be found by passing through fewer nodes than when there are only two children per node. A simplified example of this principle is shown below.

Fig. 1

In a tree, records are stored in locations called leaves. This name derives from the fact that records always exist at end points; there is nothing beyond them. The maximum number of children per node is the order of the tree, and the number of required disk accesses is the depth. The image at left in Fig. 1 shows a binary tree for locating a particular record in a set of eight leaves. The image at right shows a B-tree of order three for locating a particular record in a set of eight leaves (the ninth leaf is unoccupied and is called a null). The binary tree at left has a depth of four; the B-tree at right has a depth of three. Clearly, the B-tree allows a desired record to be located faster, assuming all other system parameters are identical. The tradeoff is that the decision process at each node is more complicated in a B-tree than in a binary tree, so a more sophisticated program is required to execute the operations; but this program is stored in RAM, so it runs fast.

In a practical B-tree, there can be thousands, millions, or billions of records. Not all leaves necessarily contain a record, but at least half of them do. The difference in depth between binary-tree and B-tree schemes is greater in a practical database than in the example illustrated here, because real-world B-trees are of higher order (32, 64, 128, or more). Depending on the number of records in the database, the depth of a B-tree can and often does change: adding a large enough number of records will increase the depth, and deleting a large enough number of records will decrease it. This ensures that the B-tree functions optimally for the number of records it contains.
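To make the effect of a higher order on depth concrete, here is a rough, illustrative Python calculation (not taken from the paper) that assumes an idealized tree in which every node is completely full:

import math

def depth(n_records, order):
    # levels needed to reach any of n_records leaves when every node
    # of the tree has `order` children (an idealized, completely full tree)
    return math.ceil(math.log(n_records, order))

for order in (2, 32, 128):
    print(order, depth(1_000_000_000, order))
# a binary tree (order 2) needs about 30 levels for a billion records,
# an order-32 B-tree about 6, and an order-128 B-tree about 5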
B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes, because the cost of accessing a node can then be amortized over multiple operations within the node. This usually occurs when the nodes are in secondary storage such as disk drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and on the size of a full disk block or an analogous size in secondary storage. While 2-3 B-trees are easier to explain, practical B-trees using secondary storage need a large number of child nodes to improve performance.

B-Trees
A B-tree provides an efficient index organization for data sets of structured records [2]. Introduced by R. Bayer and E. McCreight in 1972, it extends the idea of the 2-3 tree by permitting more than a single key in the same node of a search tree. In a B-tree of order m, all data records (or record keys) are stored at the leaves, in increasing order of the keys, and the parental nodes are used for indexing (a tree organized this way is usually called a B+ tree). When used for storing a large data file on a disk, the nodes of a B-tree usually correspond to the disk pages (blocks). Each parental node has between ⌈m/2⌉ − 1 and m − 1 keys, and the tree is perfectly balanced: all its leaves are at the same level.

Parental Node of a B-Tree
Each B-tree node contains n − 1 ordered keys K1 < K2 < ... < Kn−1. The keys are interposed with n pointers (references) to the node's children, so that all the keys in subtree T0 are smaller than K1; all the keys in subtree T1 are greater than or equal to K1 and smaller than K2, with K1 being equal to the smallest key in T1; and so on. The keys of the last subtree Tn−1 are greater than or equal to Kn−1, with Kn−1 being equal to the smallest key in Tn−1. An n-node is shown in Fig. 2.

Fig. 2

Properties of B-Trees
The root is either a leaf or has between 2 and m children. A leaf has between 1 and m − 1 entries (keys). Each node, except the root and the leaves, has between ⌈m/2⌉ and m children.

II. THE DATABASE PROBLEM

1) Time to search a sorted file
Usually, sorting and searching algorithms have been characterized by the number of comparison operations that must be performed, using order notation. A binary search of a sorted table with N records, for example, can be done in O(log₂ N) comparisons. If the table had 1,000,000 records, then a specific record could be located with about 20 comparisons: log₂ 1,000,000 ≈ 19.93.

Large databases have historically been kept on disk drives. The time to read a record on a disk drive can dominate the time needed to compare keys once the record is available. The time to read a record from a disk drive involves a seek time and a rotational delay. The seek time may be 0 to 20 or more milliseconds, and the rotational delay averages about half the rotation period; for a 7200 RPM drive, the rotation period is 8.33 milliseconds. For a drive such as the Seagate ST3500320NS, the track-to-track seek time is 0.8 milliseconds and the average reading seek time is 8.5 milliseconds. For simplicity, assume reading from disk takes about 10 milliseconds.

Naively, then, the time to locate one record out of a million would be 20 disk reads at 10 milliseconds per read, or 0.2 seconds. The time won't be quite that bad, because individual records are grouped together in a disk block. A disk block might be 16 kilobytes; if each record is 160 bytes, then 100 records can be stored in each block. The disk read time above was actually for an entire block, and once the disk head is in position, one or more disk blocks can be read with little delay. With 100 records per block, the last 6 or so comparisons don't need any disk reads: the comparisons are all within the last disk block read. To speed the search further, the first 13 to 14 comparisons (each of which required a disk access) must be sped up.
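The arithmetic above can be checked with a few lines of Python; this is only a back-of-the-envelope sketch reusing the assumptions stated in the text (a 10 millisecond read and 100 records per block):

import math

N = 1_000_000            # records in the sorted table
READ_MS = 10             # assumed cost of one disk read
RECORDS_PER_BLOCK = 100  # 16 KB block / 160-byte records

comparisons = math.ceil(math.log2(N))               # about 20 probes in total
in_block = math.ceil(math.log2(RECORDS_PER_BLOCK))  # the last ~7 probes stay inside one block
disk_reads = comparisons - in_block                 # roughly the "13 to 14" reads in the text

print(comparisons, disk_reads, disk_reads * READ_MS, "ms")   # 20 13 130 ms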
2) An index speeds the search
A significant improvement can be made with an index. In the example above, each of the initial disk reads narrowed the search range only by a factor of two. That can be improved substantially by creating an auxiliary index that contains the first record in each disk block (sometimes called a sparse index). This auxiliary index would be 1% of the size of the original database, but it can be searched more quickly.

Finding an entry in the auxiliary index tells us which block to search in the main database; after searching the auxiliary index, we have to search only that one block of the main database, at a cost of one more disk read. The index holds 10,000 entries, so it takes at most 14 comparisons. Like the main database, the last 6 or so comparisons in the auxiliary index are on the same disk block, so the index can be searched in about 8 disk reads, and the desired record can be accessed in 9 disk reads.

The trick of creating an auxiliary index can be repeated to make an auxiliary index to the auxiliary index. That aux-aux index would need only 100 entries and would fit in one disk block. Instead of reading 14 disk blocks to find the desired record, we only need to read 3 blocks: reading and searching the first (and only) block of the aux-aux index identifies the relevant block in the aux index, and reading and searching that aux-index block identifies the relevant block in the main database. Instead of 150 milliseconds, we need only 30 milliseconds to get the record.

The auxiliary indices have turned the search problem from a binary search requiring roughly log₂ N disk reads into one requiring only log_b N disk reads, where b is the blocking factor (the number of entries per block): with b = 100 entries per block, log_b 1,000,000 = 3 reads. In practice, if the main database is being frequently searched, the aux-aux index and much of the aux index may reside in a disk cache, so they would not incur a disk read.
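The blocking-factor argument can be sketched the same way; the helper function below is an illustrative assumption of ours, not code from the paper:

import math

def block_reads(n_records, entries_per_block):
    # One read per index level (aux, aux-aux, ...) plus one read of the data block.
    reads, entries = 1, n_records
    while entries > entries_per_block:
        entries = math.ceil(entries / entries_per_block)
        reads += 1
    return reads

print(block_reads(1_000_000, 100))       # 3 reads: aux-aux index, aux index, data block
print(math.ceil(math.log2(1_000_000)))   # ~20 probes for a plain binary search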
III. INSERTION AND DELETION CAUSE TROUBLE

If the database does not change, then compiling the index is simple to do, and the index need never be changed. If there are changes, then managing the database and its index becomes more complicated.

Deleting records from a database doesn't cause much trouble: the index can stay the same, and the record can simply be marked as deleted. The database stays in sorted order. If there are a lot of deletions, however, searching and storage become less efficient.

Insertions are a disaster in a sorted sequential file, because room for the inserted record must be made. Inserting a record before the first record in the file requires shifting all of the records down by one; such an operation is far too expensive to be practical. A trick is to leave some space lying around to be used for insertions: instead of densely storing all the records in a block, the block can keep some free space to allow for subsequent insertions. Those spare slots would be marked as if they were "deleted" records.

Both insertions and deletions are fast as long as space is available on a block. If an insertion won't fit on the block, then some free space on a nearby block must be found and the auxiliary indices adjusted. The hope is that enough space is nearby that not many blocks need to be reorganized. Alternatively, some out-of-sequence disk blocks may be used.

1) The B-tree uses all those ideas
The B-tree uses all of the above ideas:
It keeps records in sorted order for sequential traversing.
It uses a hierarchical index to minimize the number of disk reads.
It uses partially full blocks to speed insertions and deletions.
The index is elegantly adjusted with a recursive algorithm.
In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.

IV. EXAMPLE OF B-TREE INSERTION AND DELETION

Insertion
All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added, then insert the new element into that node with the following steps:
1. If the node contains fewer than the maximum legal number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered.
2. Otherwise the node is full, so split it evenly into two nodes:
a. A single median is chosen from among the leaf's elements and the new element.
b. Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value.
c. The separation value is inserted into the node's parent, which may cause the parent to be split in turn, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node, increasing the height of the tree.
If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children, which is why the lower bound on the size of internal nodes does not apply to the root.
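As a minimal sketch of steps 1 and 2 above, the following illustrative Python function (not the paper's code) handles the leaf level only and leaves the promotion of the separator into the parent to the caller:

def insert_into_leaf(keys, new_key, max_keys):
    # Step 1: if there is room, keep the leaf's keys ordered and stop.
    keys = sorted(keys + [new_key])
    if len(keys) <= max_keys:
        return keys, None, None
    # Step 2: the leaf is full, so split evenly around the median;
    # the caller must insert `median` into the parent as a separator.
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

print(insert_into_leaf([15, 16, 17], 14, max_keys=4))
# ([14, 15, 16, 17], None, None)  -> the key fits, no split needed
print(insert_into_leaf([74, 75, 76, 78], 77, max_keys=4))
# ([74, 75], 76, [77, 78])        -> left node, promoted median, right node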
The maximum number of elements per node is U − 1. When a node is split, one element moves to the parent, but one element is added, so it must be possible to divide the maximum number U − 1 of elements into two legal nodes. If this number is odd, then U = 2L and one of the new nodes contains (U − 2)/2 = L − 1 elements, and hence is a legal node, and the other contains one more element, and hence it is legal too. If U − 1 is even, then U = 2L − 1, so there are 2L − 2 elements in the node; half of this number is L − 1, which is the minimum number of elements allowed per node.

An improved algorithm (Mond & Raz 1985) supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way. This avoids the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this improved algorithm, we must be able to send one element to the parent and split the remaining U − 2 elements into two legal nodes, without adding a new element. This requires U = 2L rather than U = 2L − 1, which accounts for why some textbooks impose this requirement in defining B-trees.

Fig. 3 A B-tree insertion example with each iteration [1]. The nodes of this B-tree have at most 3 children.

Deletion
There are two popular strategies for deletion from a B-tree:
1. Locate and delete the item, then restructure the tree to regain its invariants; or
2. Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring.
The algorithm described below uses the former strategy.

There are two special cases to consider when deleting an element:
1. The element in an internal node may be a separator for its child nodes.
2. Deleting an element may put its node under the minimum number of elements and children.
The procedures for these cases are given in order below.

a) Deletion from a leaf node
1. Search for the value to delete.
2. If the value is in a leaf node, simply delete it from the node.
3. If underflow happens, check the siblings and either transfer a key or fuse the siblings together.
4. If the deletion happened from the right child, retrieve the maximum value of the left child if it has no underflow.
5. In the vice-versa situation, retrieve the minimum element from the right child.

b) Deletion from an internal node
Each element in an internal node acts as a separation value for two subtrees, and when such an element is deleted, two cases arise.
In the first case, both of the child nodes to the left and right of the deleted element have the minimum number of elements, namely L − 1. They can then be joined into a single node with 2L − 2 elements, a number which does not exceed U − 1 and so is a legal node. Unless it is known that this particular B-tree does not contain duplicate data, we must then also (recursively) delete the element in question from the new node.
In the second case, one of the two child nodes contains more than the minimum number of elements, and a new separator for those subtrees must be found. Note that the largest element in the left subtree is still less than the separator; likewise, the smallest element in the right subtree is the smallest element which is still greater than the separator. Both of those elements are in leaf nodes, and either can be the new separator for the two subtrees.
1. If the value is in an internal node, choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
2. This has deleted an element from a leaf node, and so is now equivalent to the previous case.
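For the second case just described, the new separator can be found by walking down to a leaf. The sketch below is an illustrative Python helper of our own (the dictionary-based node layout is an assumption, not the paper's representation); it locates the in-order predecessor, i.e., the largest element in the left subtree:

def largest_key_in_subtree(node):
    # Keep following the rightmost child until a leaf is reached,
    # then take its last (largest) key as the replacement separator.
    # An empty "children" list marks a leaf.
    while node["children"]:
        node = node["children"][-1]
    return node["keys"][-1]

left_subtree = {"keys": [40],
                "children": [{"keys": [22, 33], "children": []},
                             {"keys": [44, 56], "children": []}]}
print(largest_key_in_subtree(left_subtree))   # 56: the replacement separator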
c) Rebalancing after deletion
If deleting an element from a leaf node has brought it under the minimum size, some elements must be redistributed to bring all nodes up to the minimum. In some cases the rearrangement will move the deficiency to the parent, and the redistribution must be applied iteratively up the tree, perhaps even to the root. Since the minimum element count does not apply to the root, making the root the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:
1. If the right sibling has more than the minimum number of elements:
a. Add the separator to the end of the deficient node.
b. Replace the separator in the parent with the first element of the right sibling.
c. Append the first child of the right sibling as the last child of the deficient node.
2. Otherwise, if the left sibling has more than the minimum number of elements:
a. Add the separator to the start of the deficient node.
b. Replace the separator in the parent with the last element of the left sibling.
c. Insert the last child of the left sibling as the first child of the deficient node.
3. If both immediate siblings have only the minimum number of elements:
a. Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes.
b. Remove the separator from the parent, and replace the two children it separated with the combined node.
c. If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is permitted to be deficient.
The only other case to account for is when the root has no elements and one child; in this case it is sufficient to replace it with its only child.

To illustrate how keys are located, here is a portion of a B-tree of order 2 (nodes have at least 2 keys and 3 pointers). Nodes are delimited with [square brackets]. The keys are city names and are kept sorted in each node. On either side of every key are pointers linking the key to subsequent nodes:

        Start here
            |
            v
    [ Chicago  Hoboken ]
    /         |        \
   v          v         v
[ Aptos  Boston ]  [ Denver ]  [ San-Jose  Seattle ]
                   /
                  X

To find the key "Dallas", we begin searching at the top "root" node. "Dallas" is not in the node but sorts between "Chicago" and "Hoboken", so we follow the middle pointer to the next node. Again, "Dallas" is not in the node but sorts before "Denver", so we follow that node's first pointer down to the next node (marked with an "X"). Eventually, we will either locate the key, or encounter a "leaf" node at the bottom level of the B-tree with no pointers to any lower nodes and without the key we want, indicating the key is nowhere in the B-tree.

Below is another fragment, of an order 1 B-tree (nodes have at least 1 key and 2 pointers). Searching for the key "Chicago" begins at "Marin", follows the first pointer to "Aptos" (since Chicago sorts before Marin), then follows that node's second pointer down to the next level (since Chicago sorts after Aptos), as marked with an "X":

       [ Marin ]
       /        \
      v          v
 [ Aptos ]   [ Seattle ]
  /     \
 v       v
         X
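A literal transcription of this walkthrough might look like the following Python sketch; the tuple-based node encoding and the function name are our own illustrative choices:

# A node is a pair (keys, children); children has len(keys) + 1 entries,
# and an empty children list marks a leaf.
aptos_boston = (["Aptos", "Boston"], [])
denver       = (["Denver"], [])
sanjose_sea  = (["San-Jose", "Seattle"], [])
root         = (["Chicago", "Hoboken"], [aptos_boston, denver, sanjose_sea])

def find(node, key):
    keys, children = node
    i = 0
    while i < len(keys) and key > keys[i]:   # string comparison orders the city names
        i += 1
    if i < len(keys) and keys[i] == key:
        return True
    if not children:                         # reached a leaf without finding the key
        return False
    return find(children[i], key)            # follow the pointer just left of the first larger key

print(find(root, "Denver"))   # True
print(find(root, "Dallas"))   # False: the search bottoms out without finding Dallas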
V. B-TREE ALGORITHMS

A B-tree is a data structure that maintains an ordered set of data and allows efficient operations to find, delete, insert, and browse the data [9]. In this discussion, each piece of data stored in a B-tree will be called a "key", because each key is unique and can occur in the B-tree in only one location.

A B-tree consists of "node" records containing the keys, and pointers that link the nodes of the B-tree together. Every B-tree is of some "order n", meaning that its nodes contain from n to 2n keys and are thereby always at least half full of keys. Keys are kept in sorted order within each node. A corresponding list of pointers is effectively interspersed between the keys to indicate where to search for a key if it isn't in the current node; a node containing k keys always also contains k + 1 pointers.

Searching a B-tree for a key always begins at the root node and follows pointers from node to node until either the key is located or the search fails because a leaf node is reached and there are no more pointers to follow. B-trees grow when new keys are inserted. Since the root node initially begins with just one key, the root node is a special exception: it is the only node allowed to have fewer than n keys in an order n B-tree.

Here is an order 2 B-tree with integer keys. Except for the special root node, order 2 requires every node to have from 2 to 4 keys and 3 to 5 pointers. Empty slots are marked with ".", showing where future keys have not yet been stored in the nodes:

                    [ 57  .  .  . ]
                   /               \
        [ 14  40  .  . ]        [ 72  84  .  . ]
        /       |      \        /       |      \
[01 12 . .][15 16 17 .][47 56 . .][58 60 61 .][74 75 76 78][85 86 99 .]

To insert the key "59", we first simply search for that key. If 59 is found, the key is already in the tree and the insertion is superfluous. Otherwise, we must end up at a leaf node at the bottom level of the tree where 59 would be stored. In the above case, the leaf node contains 58, 60, 61, and room for a fourth key, so 59 is simply inserted in the leaf node in sorted order:

[58 59 60 61]
Now we'll insert the key "77". The initial search leads us to the leaf node where 77 would be inserted, but the node is already full with 4 keys: 74, 75, 76, and 78. Adding another key would violate the rule that order 2 B-trees cannot have more than 4 keys per node. Because of this "overflow" condition, the leaf node is split into two leaf nodes: the leftmost 2 keys are put in the left node, the rightmost 2 keys are put in the right node, and the middle key is "promoted" by inserting it into the parent node above the leaf. Here, inserting 77 causes the 74-75-76-78 node to be split into two nodes, and 76 is moved up to the parent node that contained 72 and 84:

Before inserting 77:

    [ 72  84  .  . ]
           |
           v
    [ 74  75  76  78 ]

After inserting 77:

    [ 72  76  84  . ]
         |      |
         v      v
 [ 74  75  .  . ] [ 77  78  .  . ]

In this case, the parent node contained only 2 keys (72 and 84), leaving room for 76 to be promoted and inserted. But if the parent node were also already full with 4 keys, then it too would have to split. Indeed, splitting may propagate all the way up to the root node. When the root splits, the B-tree grows in height by one level, and a new root with a single promoted key is formed. (This is a situation in which an order n root node has fewer than n keys, just like the situation described earlier when the root node stores the very first key placed in the B-tree.)

If a node underflows, we may be able to "redistribute" keys by borrowing some from a neighboring node. For example, in the order 3 B-tree below, the key 67 is being deleted, which causes a node to underflow since it only has keys 66 and 88 left. So keys from the neighbor on the left are "shifted through" the parent node and redistributed so that both leaf nodes end up with 4 keys:

Before deleting 67:

        [ xx  55  xx ]
           |      |
           v      v
[ 22 24 26 28 33 44 ]  [ 66 67 88 . . . ]

After deleting 67:

        [ xx  33  xx ]
           |      |
           v      v
[ 22 24 26 28 . . ]  [ 44 55 66 88 . . ]

But if the underflow node and its neighbor have fewer than 2n keys to redistribute between them, the two nodes have to be combined. For example, when key 52 is deleted and causes an underflow while the neighbor node cannot afford to give up any keys for redistribution, one node is discarded and the parent key moves down with the other keys to fill up a single node.

VI. OPERATIONS ON B-TREES

The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single pass; in other words, they do not traverse back up the tree. Since B-trees strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass approach reduces the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible.

Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), every reference to a given node must be preceded by a read operation, denoted Disk-Read. Similarly, once a node has been modified and is no longer needed, it must be written out to secondary storage with a write operation, denoted Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating system and implementation dependent.

B-Tree-Split-Child(x, i, y)
  z <- Allocate-Node()
  leaf[z] <- leaf[y]
  n[z] <- t - 1
  for j <- 1 to t - 1
      do keyj[z] <- keyj+t[y]
  if not leaf[y]
      then for j <- 1 to t
               do cj[z] <- cj+t[y]
  n[y] <- t - 1
  for j <- n[x] + 1 downto i + 1
      do cj+1[x] <- cj[x]
  ci+1[x] <- z
  for j <- n[x] downto i
      do keyj+1[x] <- keyj[x]
  keyi[x] <- keyt[y]
  n[x] <- n[x] + 1
  Disk-Write(y)
  Disk-Write(z)
  Disk-Write(x)

If a node becomes "too full", it is necessary to perform a split operation. The split operation moves the median key of node y into its parent x, where y is the ith child of x. A new node, z, is allocated, and all keys in y to the right of the median key are moved to z; the keys to the left of the median key remain in the original node y. The new node z becomes the child immediately to the right of the median key that was moved to the parent x, and the original node y becomes the child immediately to the left of that median key. The split operation transforms a full node with 2t - 1 keys into two nodes with t - 1 keys each. Note that one key is moved into the parent node. The B-Tree-Split-Child algorithm runs in time O(t), where t is constant.
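For intuition, here is a minimal, self-contained Python sketch of the key arithmetic performed by B-Tree-Split-Child; the list-based layout and the function name are illustrative assumptions rather than the paper's code:

def split_full_keys(keys, t):
    # Split a full list of 2t - 1 sorted keys around its median,
    # returning (left_keys, median, right_keys) with t - 1 keys on each side.
    assert len(keys) == 2 * t - 1
    return keys[:t - 1], keys[t - 1], keys[t:]

left, median, right = split_full_keys([10, 20, 30, 40, 50], t=3)
print(left, median, right)   # [10, 20] 30 [40, 50]

B-Tree-Split-Child additionally moves the corresponding child pointers and writes the three affected nodes back to disk, which this sketch deliberately omits.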
B-Tree-Search(x, k)
  i <- 1
  while i <= n[x] and k > keyi[x]
      do i <- i + 1
  if i <= n[x] and k = keyi[x]
      then return (x, i)
  if leaf[x]
      then return NIL
      else Disk-Read(ci[x])
           return B-Tree-Search(ci[x], k)

The search operation on a B-tree is analogous to a search on a binary tree. Instead of choosing between a left and a right child as in a binary tree, a B-tree search must make an n-way choice. The correct child is chosen by performing a linear search of the values in the node: after finding the first value greater than or equal to the desired value, the child pointer to the immediate left of that value is followed, and if all values are less than the desired value, the rightmost child pointer is followed. Of course, the search can be terminated as soon as the desired value is found. Since the running time of the search operation depends on the height of the tree, B-Tree-Search is O(log_t n).

B-Tree-Create(T)
  x <- Allocate-Node()
  leaf[x] <- TRUE
  n[x] <- 0
  Disk-Write(x)
  root[T] <- x

The B-Tree-Create operation creates an empty B-tree by allocating a new root node that has no keys and is a leaf node. Only the root node is permitted to have these properties; all other nodes must meet the criteria outlined previously. B-Tree-Create runs in time O(1).

B-Tree-Insert(T, k)
  r <- root[T]
  if n[r] = 2t - 1
      then s <- Allocate-Node()
           root[T] <- s
           leaf[s] <- FALSE
           n[s] <- 0
           c1[s] <- r
           B-Tree-Split-Child(s, 1, r)
           B-Tree-Insert-Nonfull(s, k)
      else B-Tree-Insert-Nonfull(r, k)

B-Tree-Insert-Nonfull(x, k)
  i <- n[x]
  if leaf[x]
      then while i >= 1 and k < keyi[x]
               do keyi+1[x] <- keyi[x]
                  i <- i - 1
           keyi+1[x] <- k
           n[x] <- n[x] + 1
           Disk-Write(x)
      else while i >= 1 and k < keyi[x]
               do i <- i - 1
           i <- i + 1
           Disk-Read(ci[x])
           if n[ci[x]] = 2t - 1
               then B-Tree-Split-Child(x, i, ci[x])
                    if k > keyi[x]
                        then i <- i + 1
           B-Tree-Insert-Nonfull(ci[x], k)

To perform an insertion on a B-tree, the appropriate node for the key must be located using an algorithm similar to B-Tree-Search. Next, the key must be inserted into the node. If the node is not full prior to the insertion, no special action is required; however, if the node is full, it must be split to make room for the new key. Since splitting the node results in moving one key to the parent node, the parent node must not be full, or another split operation would be required. This process may repeat all the way up to the root and may require splitting the root node.

That approach would require two passes: the first pass locates the node where the key should be inserted, and the second pass performs any required splits on the ancestor nodes. Since each access to a node may correspond to a costly disk access, it is desirable to avoid the second pass by ensuring that the parent node is never full. To accomplish this, the presented algorithm splits any full node encountered while descending the tree. Although this approach may result in unnecessary split operations, it guarantees that the parent never needs to be split and eliminates the need for a second pass up the tree. Since a split runs in linear time, it has little effect on the O(t log_t n) running time of B-Tree-Insert.
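To make the pseudocode concrete, the following compact Python sketch implements the same single-pass strategy (split any full child before descending). The class and method names are our own illustrative choices, the keys are stored in 0-indexed Python lists rather than the 1-indexed arrays above, and the Disk-Read/Disk-Write operations are omitted since everything lives in memory:

class Node:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys, at most 2*t - 1 of them
        self.children = []    # len(children) == len(keys) + 1 for internal nodes
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):
        self.t = t            # minimum degree: every non-root node has >= t - 1 keys
        self.root = Node(leaf=True)

    def search(self, k, node=None):
        x = node or self.root
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if i < len(x.keys) and x.keys[i] == k:
            return (x, i)
        return None if x.leaf else self.search(k, x.children[i])

    def _split_child(self, x, i):
        t, y = self.t, x.children[i]
        z = Node(leaf=y.leaf)
        # the median key y.keys[t-1] moves up into x; the right half moves into z
        z.keys, y.keys, median = y.keys[t:], y.keys[:t - 1], y.keys[t - 1]
        if not y.leaf:
            z.children, y.children = y.children[t:], y.children[:t]
        x.keys.insert(i, median)
        x.children.insert(i + 1, z)

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:      # root is full: grow the tree in height
            s = Node(leaf=False)
            s.children.append(r)
            self.root = s
            self._split_child(s, 0)
            r = s
        self._insert_nonfull(r, k)

    def _insert_nonfull(self, x, k):
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if x.leaf:
            x.keys.insert(i, k)
        else:
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            self._insert_nonfull(x.children[i], k)

tree = BTree(t=2)
for k in (57, 14, 72, 40, 84, 1, 12, 15, 16):
    tree.insert(k)
print(tree.search(15) is not None, tree.search(99) is None)  # True True

With t = 2 each node holds at most 3 keys, which corresponds to a 2-3-4 tree; larger values of t model the wide, disk-resident nodes discussed earlier.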
VII. EXAMPLES

Sample B-Tree
Fig. 4

Searching a B-Tree for Key 21
Fig. 5

Inserting Key 33 into a B-Tree (with a split)
Fig. 6

B-Tree-Delete
Deletion of a key from a B-tree is possible; however, special care must be taken to ensure that the properties of a B-tree are maintained. Several cases must be considered. If the deletion reduces the number of keys in a node below the minimum degree of the tree, this violation must be corrected by combining several nodes and possibly reducing the height of the tree. If the key has children, the children must be rearranged.
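As a rough illustration of the "combining several nodes" case mentioned above, this small Python helper (ours, not the paper's) merges an underfull node with a sibling and the separator key pulled down from the parent:

def merge_siblings(left_keys, separator, right_keys, max_keys):
    # Merge two sibling key lists around the separator taken from the parent.
    # Legal only if the result still fits in one node.
    merged = left_keys + [separator] + right_keys
    assert len(merged) <= max_keys, "too many keys to merge; redistribute instead"
    return merged

# e.g. two minimal siblings of a tree whose nodes hold at most 5 keys
print(merge_siblings([22, 24], 33, [44, 55], max_keys=5))   # [22, 24, 33, 44, 55]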
VIII. APPLICATIONS

Databases
A database is a collection of data organized in a fashion that facilitates updating, retrieving, and managing the data [4]. The data can consist of anything, including, but not limited to, names, addresses, pictures, and numbers. Databases are commonplace and are used every day: an airline reservation system might maintain a database of available flights, customers, and tickets issued, and a teacher might maintain a database of student names and grades. Because computers excel at quickly and accurately manipulating, storing, and retrieving data, databases are often maintained electronically using a database management system. Database management systems are essential components of many everyday business operations. Database products such as Microsoft SQL Server, Sybase Adaptive Server, IBM DB2, and Oracle serve as a foundation for accounting systems, inventory systems, medical record-keeping systems, airline reservation systems, and countless other important aspects of modern businesses.

It is not uncommon for a database to contain millions of records requiring many gigabytes of storage [5]. For example, TELSTRA, an Australian telecommunications company, maintains a customer billing database with 51 billion rows and 4.2 terabytes of data. In order for a database to be useful and usable, it must support the desired operations, such as retrieval and storage, quickly. Because databases cannot typically be maintained entirely in memory, B-trees are often used to index the data and to provide fast access. For example, searching an unindexed and unsorted database containing n key values has a worst-case running time of O(n); if the same data is indexed with a B-tree, the same search operation runs in O(log n). To perform a search for a single key on a set of one million keys (1,000,000), a linear search requires at most 1,000,000 comparisons, whereas if the same data is indexed with a B-tree of minimum degree 10, only 114 comparisons are required in the worst case. Clearly, indexing large amounts of data can significantly improve search performance. Although other balanced tree structures can be used, a B-tree also optimizes the costly disk accesses that are a concern when dealing with large data sets.

Concurrent Access to B-Trees
Databases typically run in multiuser environments where many users can concurrently perform operations on the database. Unfortunately, this common scenario introduces complications. For example, imagine a database storing bank account balances, and assume that someone attempts to withdraw $40 from an account containing $60. First, the current balance is checked to ensure sufficient funds; after the funds are disbursed, the balance of the account is reduced. This approach works flawlessly until concurrent transactions are considered. Suppose that another person simultaneously attempts to withdraw $30 from the same account. At the same time the account balance is checked by the first person, the account balance is also retrieved for the second person. Since neither person is requesting more funds than are currently available, both requests are satisfied, for a total of $70. After the first person's transaction, $20 should remain ($60 − $40), so the new balance is recorded as $20. Next, the account balance after the second person's transaction, $30 ($60 − $30), is recorded, overwriting the $20 balance. Unfortunately, $70 has been disbursed, but the account balance has only been decreased by $30. Clearly, this behavior is undesirable, and special precautions must be taken.
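The lost update described above can be reproduced deterministically in a few lines of Python; the interleaving is written out by hand rather than with real concurrent transactions:

balance = 60                 # shared account balance

# both transactions read the balance before either writes it back
read_1 = balance             # first person sees $60
read_2 = balance             # second person also sees $60

balance = read_1 - 40        # first withdrawal recorded: balance = $20
balance = read_2 - 30        # second withdrawal overwrites it: balance = $30

print(balance)               # 30, although $70 was actually disbursed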
REFERENCES
1. Bowman, M., Debray, S. K., and Peterson, L. L. 1993. Reasoning about naming.
2. Bayer, R. and Schkolnick, M. Concurrency of Operations on B-Trees. In Readings in Database Systems (ed. Michael Stonebraker), pages 216-226, 1994.
3. Cormen, T. H., Leiserson, C. E., and Rivest, R. L. Introduction to Algorithms. MIT Press, Massachusetts, 1998.
4. Gray, J. N., Lorie, R. A., Putzolu, G. R., and Traiger, I. L. Granularity of Locks and Degrees of Consistency in a Shared Data Base. In Readings in Database Systems (ed. Michael Stonebraker), pages 181-208, 1994.
5. Kung, H. T. and Robinson, J. T. On Optimistic Methods of Concurrency Control. In Readings in Database Systems (ed. Michael Stonebraker), pages 209-215, 1994.
6. Section 19.3, pages 395-397, of Cormen, Leiserson, and Rivest.
7. http://www.cs.princeton.edu/introalgsds/44balanced
8. Comer, D. (June 1979). "The Ubiquitous B-Tree". Computing Surveys 11 (2): 123-137. doi:10.1145/356770.356776. ISSN 0360-0300.
9. Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (2001). Introduction to Algorithms (Second ed.). MIT Press and McGraw-Hill, pp. 434-454. ISBN 0-262-03293-7. Chapter 18: B-Trees.
10. Folk, M. J. and Zoellick, B. (1992). File Structures (2nd ed.). Addison-Wesley. ISBN 0-201-55713-4.
11. Knuth, D. (1998). Sorting and Searching. The Art of Computer Programming, Volume 3 (Second ed.). Addison-Wesley. ISBN 0-201-89685-0. Section 6.2.4: Multiway Trees, pp. 481-491. Also, pp. 476-477 of Section 6.2.3 (Balanced Trees) discusses 2-3 trees.