International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 6, June 2012)

How Can Manage Balance Tree (B-Tree)

Mangesh Umak1, Vijaya Sawarkar2, Hemant Gulhane3
1,2 M-Tech in Computer Science & Engineering, Thakral College of Technology (RGPV University), Bhopal.
3 M-Tech in Computer Science & Engineering, SSGBCOET, Bhusawal.

Abstract— In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted into or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need rebalancing as frequently as other self-balancing search trees, but they may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

I. INTRODUCTION

A B-tree is kept balanced by requiring that all leaf nodes be at the same depth. This depth increases slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and it results in all leaf nodes being one more node further away from the root.

A B-tree is a method of placing and locating files (called records or keys) in a database. (The meaning of the letter B has not been explicitly defined.) The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

B-trees are preferred when decision points, called nodes, are on hard disk rather than in random-access memory (RAM). It takes thousands of times longer to access a data element from hard disk than from RAM, because a disk drive has mechanical parts, which read and write data far more slowly than purely electronic media. B-trees save time by using nodes with many branches (called children), compared with binary trees, in which each node has only two children. When there are many children per node, a record can be found by passing through fewer nodes than when there are only two children per node. A simplified example of this principle is shown below.

Fig. 1

In a tree, records are stored in locations called leaves. This name derives from the fact that records always exist at end points; there is nothing beyond them. The maximum number of children per node is the order of the tree, and the number of required disk accesses is the depth. The image at left in Fig. 1 shows a binary tree for locating a particular record in a set of eight leaves. The image at right shows a B-tree of order three for locating a particular record in a set of eight leaves (the ninth leaf is unoccupied and is called a null). The binary tree at left has a depth of four; the B-tree at right has a depth of three. Clearly, the B-tree allows a desired record to be located faster, assuming all other system parameters are identical. The tradeoff is that the decision process at each node is more complicated in a B-tree than in a binary tree, so a more sophisticated program is required to execute the operations; but this program is stored in RAM, so it runs fast.

In a practical B-tree, there can be thousands, millions, or billions of records. Not all leaves necessarily contain a record, but at least half of them do. The difference in depth between binary-tree and B-tree schemes is greater in a practical database than in the example illustrated here, because real-world B-trees are of higher order (32, 64, 128, or more). Depending on the number of records in the database, the depth of a B-tree can and often does change: adding a large enough number of records will increase the depth, and deleting a large enough number of records will decrease it. This ensures that the B-tree functions optimally for the number of records it contains.
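To make the effect of a higher order on depth concrete, here is a rough, illustrative Python calculation (not taken from the paper) that assumes an idealized tree in which every node is completely full:

import math

def depth(n_records, order):
    # levels needed to reach any of n_records leaves when every node
    # of the tree has `order` children (an idealized, completely full tree)
    return math.ceil(math.log(n_records, order))

for order in (2, 32, 128):
    print(order, depth(1_000_000_000, order))
# a binary tree (order 2) needs about 30 levels for a billion records,
# an order-32 B-tree about 6, and an order-128 B-tree about 5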
B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes, because the cost of accessing a node can then be amortized over multiple operations within the node. This usually occurs when the nodes are in secondary storage such as disk drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and on the size of a full disk block or an analogous size in secondary storage. While 2-3 B-trees are easier to explain, practical B-trees using secondary storage need a large number of child nodes to improve performance.

B-Trees
A B-tree provides an efficient index organization for data sets of structured records [2]. Introduced by R. Bayer and E. McCreight in 1972, it extends the idea of the 2-3 tree by permitting more than a single key in the same node of a search tree. In a B-tree of order m, all data records (or record keys) are stored at the leaves, in increasing order of the keys, and the parental nodes are used for indexing (a tree organized this way is usually called a B+ tree). When used for storing a large data file on a disk, the nodes of a B-tree usually correspond to the disk pages (blocks). Each parental node has between ⌈m/2⌉ − 1 and m − 1 keys, and the tree is perfectly balanced: all its leaves are at the same level.

Parental Node of a B-Tree
Each B-tree node contains n − 1 ordered keys K1 < K2 < ... < Kn−1. The keys are interposed with n pointers (references) to the node's children, so that all the keys in subtree T0 are smaller than K1; all the keys in subtree T1 are greater than or equal to K1 and smaller than K2, with K1 being equal to the smallest key in T1; and so on. The keys of the last subtree Tn−1 are greater than or equal to Kn−1, with Kn−1 being equal to the smallest key in Tn−1. An n-node is shown in Fig. 2.

Fig. 2

Properties of B-Trees
The root is either a leaf or has between 2 and m children. A leaf has between 1 and m − 1 entries (keys). Each node, except the root and the leaves, has between ⌈m/2⌉ and m children.

II. THE DATABASE PROBLEM

1) Time to search a sorted file
Usually, sorting and searching algorithms have been characterized by the number of comparison operations that must be performed, using order notation. A binary search of a sorted table with N records, for example, can be done in O(log₂ N) comparisons. If the table had 1,000,000 records, then a specific record could be located with about 20 comparisons: log₂ 1,000,000 ≈ 19.93.

Large databases have historically been kept on disk drives. The time to read a record on a disk drive can dominate the time needed to compare keys once the record is available. The time to read a record from a disk drive involves a seek time and a rotational delay. The seek time may be 0 to 20 or more milliseconds, and the rotational delay averages about half the rotation period; for a 7200 RPM drive, the rotation period is 8.33 milliseconds. For a drive such as the Seagate ST3500320NS, the track-to-track seek time is 0.8 milliseconds and the average reading seek time is 8.5 milliseconds. For simplicity, assume reading from disk takes about 10 milliseconds.

Naively, then, the time to locate one record out of a million would be 20 disk reads at 10 milliseconds per read, or 0.2 seconds. The time won't be quite that bad, because individual records are grouped together in a disk block. A disk block might be 16 kilobytes; if each record is 160 bytes, then 100 records can be stored in each block. The disk read time above was actually for an entire block, and once the disk head is in position, one or more disk blocks can be read with little delay. With 100 records per block, the last 6 or so comparisons don't need any disk reads: the comparisons are all within the last disk block read. To speed the search further, the first 13 to 14 comparisons (each of which required a disk access) must be sped up.
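The arithmetic above can be checked with a few lines of Python; this is only a back-of-the-envelope sketch reusing the assumptions stated in the text (a 10 millisecond read and 100 records per block):

import math

N = 1_000_000            # records in the sorted table
READ_MS = 10             # assumed cost of one disk read
RECORDS_PER_BLOCK = 100  # 16 KB block / 160-byte records

comparisons = math.ceil(math.log2(N))               # about 20 probes in total
in_block = math.ceil(math.log2(RECORDS_PER_BLOCK))  # the last ~7 probes stay inside one block
disk_reads = comparisons - in_block                 # roughly the "13 to 14" reads in the text

print(comparisons, disk_reads, disk_reads * READ_MS, "ms")   # 20 13 130 ms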
2) An index speeds the search
A significant improvement can be made with an index. In the example above, each of the initial disk reads narrowed the search range only by a factor of two. That can be improved substantially by creating an auxiliary index that contains the first record in each disk block (sometimes called a sparse index). This auxiliary index would be 1% of the size of the original database, but it can be searched more quickly.

Finding an entry in the auxiliary index tells us which block to search in the main database; after searching the auxiliary index, we have to search only that one block of the main database, at a cost of one more disk read. The index holds 10,000 entries, so it takes at most 14 comparisons. Like the main database, the last 6 or so comparisons in the auxiliary index are on the same disk block, so the index can be searched in about 8 disk reads, and the desired record can be accessed in 9 disk reads.

The trick of creating an auxiliary index can be repeated to make an auxiliary index to the auxiliary index. That aux-aux index would need only 100 entries and would fit in one disk block. Instead of reading 14 disk blocks to find the desired record, we only need to read 3 blocks: reading and searching the first (and only) block of the aux-aux index identifies the relevant block in the aux index, and reading and searching that aux-index block identifies the relevant block in the main database. Instead of 150 milliseconds, we need only 30 milliseconds to get the record.

The auxiliary indices have turned the search problem from a binary search requiring roughly log₂ N disk reads into one requiring only log_b N disk reads, where b is the blocking factor (the number of entries per block): with b = 100 entries per block, log_b 1,000,000 = 3 reads. In practice, if the main database is being frequently searched, the aux-aux index and much of the aux index may reside in a disk cache, so they would not incur a disk read.
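The blocking-factor argument can be sketched the same way; the helper function below is an illustrative assumption of ours, not code from the paper:

import math

def block_reads(n_records, entries_per_block):
    # One read per index level (aux, aux-aux, ...) plus one read of the data block.
    reads, entries = 1, n_records
    while entries > entries_per_block:
        entries = math.ceil(entries / entries_per_block)
        reads += 1
    return reads

print(block_reads(1_000_000, 100))       # 3 reads: aux-aux index, aux index, data block
print(math.ceil(math.log2(1_000_000)))   # ~20 probes for a plain binary search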
III. INSERTION AND DELETION CAUSE TROUBLE

If the database does not change, then compiling the index is simple to do, and the index need never be changed. If there are changes, then managing the database and its index becomes more complicated.

Deleting records from a database doesn't cause much trouble: the index can stay the same, and the record can simply be marked as deleted. The database stays in sorted order. If there are a lot of deletions, however, searching and storage become less efficient.

Insertions are a disaster in a sorted sequential file, because room for the inserted record must be made. Inserting a record before the first record in the file requires shifting all of the records down by one; such an operation is far too expensive to be practical. A trick is to leave some space lying around to be used for insertions: instead of densely storing all the records in a block, the block can keep some free space to allow for subsequent insertions. Those spare slots would be marked as if they were "deleted" records.

Both insertions and deletions are fast as long as space is available on a block. If an insertion won't fit on the block, then some free space on a nearby block must be found and the auxiliary indices adjusted. The hope is that enough space is nearby that not many blocks need to be reorganized. Alternatively, some out-of-sequence disk blocks may be used.

1) The B-tree uses all those ideas
The B-tree uses all of the above ideas:
It keeps records in sorted order for sequential traversing.
It uses a hierarchical index to minimize the number of disk reads.
It uses partially full blocks to speed insertions and deletions.
The index is elegantly adjusted with a recursive algorithm.
In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.

IV. EXAMPLE OF B-TREE INSERTION AND DELETION

Insertion
All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added, then insert the new element into that node with the following steps:
1. If the node contains fewer than the maximum legal number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered.
2. Otherwise the node is full, so split it evenly into two nodes:
a. A single median is chosen from among the leaf's elements and the new element.
b. Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value.
c. The separation value is inserted into the node's parent, which may cause the parent to be split in turn, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node, increasing the height of the tree.
If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children, which is why the lower bound on the size of internal nodes does not apply to the root.
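As a minimal sketch of steps 1 and 2 above, the following illustrative Python function (not the paper's code) handles the leaf level only and leaves the promotion of the separator into the parent to the caller:

def insert_into_leaf(keys, new_key, max_keys):
    # Step 1: if there is room, keep the leaf's keys ordered and stop.
    keys = sorted(keys + [new_key])
    if len(keys) <= max_keys:
        return keys, None, None
    # Step 2: the leaf is full, so split evenly around the median;
    # the caller must insert `median` into the parent as a separator.
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

print(insert_into_leaf([15, 16, 17], 14, max_keys=4))
# ([14, 15, 16, 17], None, None)  -> the key fits, no split needed
print(insert_into_leaf([74, 75, 76, 78], 77, max_keys=4))
# ([74, 75], 76, [77, 78])        -> left node, promoted median, right node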
The maximum number of elements per node is U − 1. When a node is split, one element moves to the parent, but one element is added, so it must be possible to divide the maximum number U − 1 of elements into two legal nodes. If this number is odd, then U = 2L and one of the new nodes contains (U − 2)/2 = L − 1 elements, and hence is a legal node, and the other contains one more element, and hence it is legal too. If U − 1 is even, then U = 2L − 1, so there are 2L − 2 elements in the node; half of this number is L − 1, which is the minimum number of elements allowed per node.

An improved algorithm (Mond & Raz 1985) supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way. This avoids the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this improved algorithm, we must be able to send one element to the parent and split the remaining U − 2 elements into two legal nodes, without adding a new element. This requires U = 2L rather than U = 2L − 1, which accounts for why some textbooks impose this requirement in defining B-trees.

Fig. 3 A B-tree insertion example with each iteration [1]. The nodes of this B-tree have at most 3 children.

Deletion
There are two popular strategies for deletion from a B-tree:
1. Locate and delete the item, then restructure the tree to regain its invariants; or
2. Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring.
The algorithm described below uses the former strategy.

There are two special cases to consider when deleting an element:
1. The element in an internal node may be a separator for its child nodes.
2. Deleting an element may put its node under the minimum number of elements and children.
The procedures for these cases are given in order below.

a) Deletion from a leaf node
1. Search for the value to delete.
2. If the value is in a leaf node, simply delete it from the node.
3. If underflow happens, check the siblings and either transfer a key or fuse the siblings together.
4. If the deletion happened from the right child, retrieve the maximum value of the left child if it has no underflow.
5. In the vice-versa situation, retrieve the minimum element from the right child.

b) Deletion from an internal node
Each element in an internal node acts as a separation value for two subtrees, and when such an element is deleted, two cases arise.
In the first case, both of the child nodes to the left and right of the deleted element have the minimum number of elements, namely L − 1. They can then be joined into a single node with 2L − 2 elements, a number which does not exceed U − 1 and so is a legal node. Unless it is known that this particular B-tree does not contain duplicate data, we must then also (recursively) delete the element in question from the new node.
In the second case, one of the two child nodes contains more than the minimum number of elements, and a new separator for those subtrees must be found. Note that the largest element in the left subtree is still less than the separator; likewise, the smallest element in the right subtree is the smallest element which is still greater than the separator. Both of those elements are in leaf nodes, and either can be the new separator for the two subtrees.
1. If the value is in an internal node, choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
2. This has deleted an element from a leaf node, and so is now equivalent to the previous case.
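For the second case just described, the new separator can be found by walking down to a leaf. The sketch below is an illustrative Python helper of our own (the dictionary-based node layout is an assumption, not the paper's representation); it locates the in-order predecessor, i.e., the largest element in the left subtree:

def largest_key_in_subtree(node):
    # Keep following the rightmost child until a leaf is reached,
    # then take its last (largest) key as the replacement separator.
    # An empty "children" list marks a leaf.
    while node["children"]:
        node = node["children"][-1]
    return node["keys"][-1]

left_subtree = {"keys": [40],
                "children": [{"keys": [22, 33], "children": []},
                             {"keys": [44, 56], "children": []}]}
print(largest_key_in_subtree(left_subtree))   # 56: the replacement separator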
c) Rebalancing after deletion
If deleting an element from a leaf node has brought it under the minimum size, some elements must be redistributed to bring all nodes up to the minimum. In some cases the rearrangement will move the deficiency to the parent, and the redistribution must be applied iteratively up the tree, perhaps even to the root. Since the minimum element count does not apply to the root, making the root the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:
1. If the right sibling has more than the minimum number of elements:
a. Add the separator to the end of the deficient node.
b. Replace the separator in the parent with the first element of the right sibling.
c. Append the first child of the right sibling as the last child of the deficient node.
2. Otherwise, if the left sibling has more than the minimum number of elements:
a. Add the separator to the start of the deficient node.
b. Replace the separator in the parent with the last element of the left sibling.
c. Insert the last child of the left sibling as the first child of the deficient node.
3. If both immediate siblings have only the minimum number of elements:
a. Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes.
b. Remove the separator from the parent, and replace the two children it separated with the combined node.
c. If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is permitted to be deficient.
The only other case to account for is when the root has no elements and one child; in this case it is sufficient to replace it with its only child.

To illustrate how keys are located, here is a portion of a B-tree of order 2 (nodes have at least 2 keys and 3 pointers). Nodes are delimited with [square brackets]. The keys are city names and are kept sorted in each node. On either side of every key are pointers linking the key to subsequent nodes:

        Start here
            |
            v
    [ Chicago  Hoboken ]
    /         |        \
   v          v         v
[ Aptos  Boston ]  [ Denver ]  [ San-Jose  Seattle ]
                   /
                  X

To find the key "Dallas", we begin searching at the top "root" node. "Dallas" is not in the node but sorts between "Chicago" and "Hoboken", so we follow the middle pointer to the next node. Again, "Dallas" is not in the node but sorts before "Denver", so we follow that node's first pointer down to the next node (marked with an "X"). Eventually, we will either locate the key, or encounter a "leaf" node at the bottom level of the B-tree with no pointers to any lower nodes and without the key we want, indicating the key is nowhere in the B-tree.

Below is another fragment, of an order 1 B-tree (nodes have at least 1 key and 2 pointers). Searching for the key "Chicago" begins at "Marin", follows the first pointer to "Aptos" (since Chicago sorts before Marin), then follows that node's second pointer down to the next level (since Chicago sorts after Aptos), as marked with an "X":

       [ Marin ]
       /        \
      v          v
 [ Aptos ]   [ Seattle ]
  /     \
 v       v
         X
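A literal transcription of this walkthrough might look like the following Python sketch; the tuple-based node encoding and the function name are our own illustrative choices:

# A node is a pair (keys, children); children has len(keys) + 1 entries,
# and an empty children list marks a leaf.
aptos_boston = (["Aptos", "Boston"], [])
denver       = (["Denver"], [])
sanjose_sea  = (["San-Jose", "Seattle"], [])
root         = (["Chicago", "Hoboken"], [aptos_boston, denver, sanjose_sea])

def find(node, key):
    keys, children = node
    i = 0
    while i < len(keys) and key > keys[i]:   # string comparison orders the city names
        i += 1
    if i < len(keys) and keys[i] == key:
        return True
    if not children:                         # reached a leaf without finding the key
        return False
    return find(children[i], key)            # follow the pointer just left of the first larger key

print(find(root, "Denver"))   # True
print(find(root, "Dallas"))   # False: the search bottoms out without finding Dallas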
V. B-TREE ALGORITHMS

A B-tree is a data structure that maintains an ordered set of data and allows efficient operations to find, delete, insert, and browse the data [9]. In this discussion, each piece of data stored in a B-tree will be called a "key", because each key is unique and can occur in the B-tree in only one location.

A B-tree consists of "node" records containing the keys, and pointers that link the nodes of the B-tree together. Every B-tree is of some "order n", meaning that its nodes contain from n to 2n keys and are thereby always at least half full of keys. Keys are kept in sorted order within each node. A corresponding list of pointers is effectively interspersed between the keys to indicate where to search for a key if it isn't in the current node; a node containing k keys always also contains k + 1 pointers.

Searching a B-tree for a key always begins at the root node and follows pointers from node to node until either the key is located or the search fails because a leaf node is reached and there are no more pointers to follow. B-trees grow when new keys are inserted. Since the root node initially begins with just one key, the root node is a special exception: it is the only node allowed to have fewer than n keys in an order n B-tree.

Here is an order 2 B-tree with integer keys. Except for the special root node, order 2 requires every node to have from 2 to 4 keys and 3 to 5 pointers. Empty slots are marked with ".", showing where future keys have not yet been stored in the nodes:

                    [ 57  .  .  . ]
                   /               \
        [ 14  40  .  . ]        [ 72  84  .  . ]
        /       |      \        /       |      \
[01 12 . .][15 16 17 .][47 56 . .][58 60 61 .][74 75 76 78][85 86 99 .]

To insert the key "59", we first simply search for that key. If 59 is found, the key is already in the tree and the insertion is superfluous. Otherwise, we must end up at a leaf node at the bottom level of the tree where 59 would be stored. In the above case, the leaf node contains 58, 60, 61, and room for a fourth key, so 59 is simply inserted in the leaf node in sorted order:

[58 59 60 61]
Now we'll insert the key "77". The initial search leads us to the leaf node where 77 would be inserted, but the node is already full with 4 keys: 74, 75, 76, and 78. Adding another key would violate the rule that order 2 B-trees cannot have more than 4 keys per node. Because of this "overflow" condition, the leaf node is split into two leaf nodes: the leftmost 2 keys are put in the left node, the rightmost 2 keys are put in the right node, and the middle key is "promoted" by inserting it into the parent node above the leaf. Here, inserting 77 causes the 74-75-76-78 node to be split into two nodes, and 76 is moved up to the parent node that contained 72 and 84:

Before inserting 77:

    [ 72  84  .  . ]
           |
           v
    [ 74  75  76  78 ]

After inserting 77:

    [ 72  76  84  . ]
         |      |
         v      v
 [ 74  75  .  . ] [ 77  78  .  . ]

In this case, the parent node contained only 2 keys (72 and 84), leaving room for 76 to be promoted and inserted. But if the parent node were also already full with 4 keys, then it too would have to split. Indeed, splitting may propagate all the way up to the root node. When the root splits, the B-tree grows in height by one level, and a new root with a single promoted key is formed. (This is a situation in which an order n root node has fewer than n keys, just like the situation described earlier when the root node stores the very first key placed in the B-tree.)

If a node underflows, we may be able to "redistribute" keys by borrowing some from a neighboring node. For example, in the order 3 B-tree below, the key 67 is being deleted, which causes a node to underflow since it only has keys 66 and 88 left. So keys from the neighbor on the left are "shifted through" the parent node and redistributed so that both leaf nodes end up with 4 keys:

Before deleting 67:

        [ xx  55  xx ]
           |      |
           v      v
[ 22 24 26 28 33 44 ]  [ 66 67 88 . . . ]

After deleting 67:

        [ xx  33  xx ]
           |      |
           v      v
[ 22 24 26 28 . . ]  [ 44 55 66 88 . . ]

But if the underflow node and its neighbor have fewer than 2n keys to redistribute between them, the two nodes have to be combined. For example, when key 52 is deleted and causes an underflow while the neighbor node cannot afford to give up any keys for redistribution, one node is discarded and the parent key moves down with the other keys to fill up a single node.

VI. OPERATIONS ON B-TREES

The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single pass; in other words, they do not traverse back up the tree. Since B-trees strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass approach reduces the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible.

Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), every reference to a given node must be preceded by a read operation, denoted Disk-Read. Similarly, once a node has been modified and is no longer needed, it must be written out to secondary storage with a write operation, denoted Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating system and implementation dependent.

B-Tree-Split-Child(x, i, y)
  z <- Allocate-Node()
  leaf[z] <- leaf[y]
  n[z] <- t - 1
  for j <- 1 to t - 1
      do keyj[z] <- keyj+t[y]
  if not leaf[y]
      then for j <- 1 to t
               do cj[z] <- cj+t[y]
  n[y] <- t - 1
  for j <- n[x] + 1 downto i + 1
      do cj+1[x] <- cj[x]
  ci+1[x] <- z
  for j <- n[x] downto i
      do keyj+1[x] <- keyj[x]
  keyi[x] <- keyt[y]
  n[x] <- n[x] + 1
  Disk-Write(y)
  Disk-Write(z)
  Disk-Write(x)

If a node becomes "too full", it is necessary to perform a split operation. The split operation moves the median key of node y into its parent x, where y is the ith child of x. A new node, z, is allocated, and all keys in y to the right of the median key are moved to z; the keys to the left of the median key remain in the original node y. The new node z becomes the child immediately to the right of the median key that was moved to the parent x, and the original node y becomes the child immediately to the left of that median key. The split operation transforms a full node with 2t - 1 keys into two nodes with t - 1 keys each. Note that one key is moved into the parent node. The B-Tree-Split-Child algorithm runs in time O(t), where t is constant.
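For intuition, here is a minimal, self-contained Python sketch of the key arithmetic performed by B-Tree-Split-Child; the list-based layout and the function name are illustrative assumptions rather than the paper's code:

def split_full_keys(keys, t):
    # Split a full list of 2t - 1 sorted keys around its median,
    # returning (left_keys, median, right_keys) with t - 1 keys on each side.
    assert len(keys) == 2 * t - 1
    return keys[:t - 1], keys[t - 1], keys[t:]

left, median, right = split_full_keys([10, 20, 30, 40, 50], t=3)
print(left, median, right)   # [10, 20] 30 [40, 50]

B-Tree-Split-Child additionally moves the corresponding child pointers and writes the three affected nodes back to disk, which this sketch deliberately omits.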
B-Tree-Search(x, k)
  i <- 1
  while i <= n[x] and k > keyi[x]
      do i <- i + 1
  if i <= n[x] and k = keyi[x]
      then return (x, i)
  if leaf[x]
      then return NIL
      else Disk-Read(ci[x])
           return B-Tree-Search(ci[x], k)

The search operation on a B-tree is analogous to a search on a binary tree. Instead of choosing between a left and a right child as in a binary tree, a B-tree search must make an n-way choice. The correct child is chosen by performing a linear search of the values in the node: after finding the first value greater than or equal to the desired value, the child pointer to the immediate left of that value is followed, and if all values are less than the desired value, the rightmost child pointer is followed. Of course, the search can be terminated as soon as the desired value is found. Since the running time of the search operation depends on the height of the tree, B-Tree-Search is O(log_t n).

B-Tree-Create(T)
  x <- Allocate-Node()
  leaf[x] <- TRUE
  n[x] <- 0
  Disk-Write(x)
  root[T] <- x

The B-Tree-Create operation creates an empty B-tree by allocating a new root node that has no keys and is a leaf node. Only the root node is permitted to have these properties; all other nodes must meet the criteria outlined previously. B-Tree-Create runs in time O(1).

B-Tree-Insert(T, k)
  r <- root[T]
  if n[r] = 2t - 1
      then s <- Allocate-Node()
           root[T] <- s
           leaf[s] <- FALSE
           n[s] <- 0
           c1[s] <- r
           B-Tree-Split-Child(s, 1, r)
           B-Tree-Insert-Nonfull(s, k)
      else B-Tree-Insert-Nonfull(r, k)

B-Tree-Insert-Nonfull(x, k)
  i <- n[x]
  if leaf[x]
      then while i >= 1 and k < keyi[x]
               do keyi+1[x] <- keyi[x]
                  i <- i - 1
           keyi+1[x] <- k
           n[x] <- n[x] + 1
           Disk-Write(x)
      else while i >= 1 and k < keyi[x]
               do i <- i - 1
           i <- i + 1
           Disk-Read(ci[x])
           if n[ci[x]] = 2t - 1
               then B-Tree-Split-Child(x, i, ci[x])
                    if k > keyi[x]
                        then i <- i + 1
           B-Tree-Insert-Nonfull(ci[x], k)

To perform an insertion on a B-tree, the appropriate node for the key must be located using an algorithm similar to B-Tree-Search. Next, the key must be inserted into the node. If the node is not full prior to the insertion, no special action is required; however, if the node is full, it must be split to make room for the new key. Since splitting the node results in moving one key to the parent node, the parent node must not be full, or another split operation would be required. This process may repeat all the way up to the root and may require splitting the root node.

That approach would require two passes: the first pass locates the node where the key should be inserted, and the second pass performs any required splits on the ancestor nodes. Since each access to a node may correspond to a costly disk access, it is desirable to avoid the second pass by ensuring that the parent node is never full. To accomplish this, the presented algorithm splits any full node encountered while descending the tree. Although this approach may result in unnecessary split operations, it guarantees that the parent never needs to be split and eliminates the need for a second pass up the tree. Since a split runs in linear time, it has little effect on the O(t log_t n) running time of B-Tree-Insert.
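To make the pseudocode concrete, the following compact Python sketch implements the same single-pass strategy (split any full child before descending). The class and method names are our own illustrative choices, the keys are stored in 0-indexed Python lists rather than the 1-indexed arrays above, and the Disk-Read/Disk-Write operations are omitted since everything lives in memory:

class Node:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys, at most 2*t - 1 of them
        self.children = []    # len(children) == len(keys) + 1 for internal nodes
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):
        self.t = t            # minimum degree: every non-root node has >= t - 1 keys
        self.root = Node(leaf=True)

    def search(self, k, node=None):
        x = node or self.root
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if i < len(x.keys) and x.keys[i] == k:
            return (x, i)
        return None if x.leaf else self.search(k, x.children[i])

    def _split_child(self, x, i):
        t, y = self.t, x.children[i]
        z = Node(leaf=y.leaf)
        # the median key y.keys[t-1] moves up into x; the right half moves into z
        z.keys, y.keys, median = y.keys[t:], y.keys[:t - 1], y.keys[t - 1]
        if not y.leaf:
            z.children, y.children = y.children[t:], y.children[:t]
        x.keys.insert(i, median)
        x.children.insert(i + 1, z)

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:      # root is full: grow the tree in height
            s = Node(leaf=False)
            s.children.append(r)
            self.root = s
            self._split_child(s, 0)
            r = s
        self._insert_nonfull(r, k)

    def _insert_nonfull(self, x, k):
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if x.leaf:
            x.keys.insert(i, k)
        else:
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            self._insert_nonfull(x.children[i], k)

tree = BTree(t=2)
for k in (57, 14, 72, 40, 84, 1, 12, 15, 16):
    tree.insert(k)
print(tree.search(15) is not None, tree.search(99) is None)  # True True

With t = 2 each node holds at most 3 keys, which corresponds to a 2-3-4 tree; larger values of t model the wide, disk-resident nodes discussed earlier.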
VII. EXAMPLES

Sample B-Tree
Fig. 4

Searching a B-Tree for Key 21
Fig. 5

Inserting Key 33 into a B-Tree (with a split)
Fig. 6

B-Tree-Delete
Deletion of a key from a B-tree is possible; however, special care must be taken to ensure that the properties of a B-tree are maintained. Several cases must be considered. If the deletion reduces the number of keys in a node below the minimum degree of the tree, this violation must be corrected by combining several nodes and possibly reducing the height of the tree. If the key has children, the children must be rearranged.
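As a rough illustration of the "combining several nodes" case mentioned above, this small Python helper (ours, not the paper's) merges an underfull node with a sibling and the separator key pulled down from the parent:

def merge_siblings(left_keys, separator, right_keys, max_keys):
    # Merge two sibling key lists around the separator taken from the parent.
    # Legal only if the result still fits in one node.
    merged = left_keys + [separator] + right_keys
    assert len(merged) <= max_keys, "too many keys to merge; redistribute instead"
    return merged

# e.g. two minimal siblings of a tree whose nodes hold at most 5 keys
print(merge_siblings([22, 24], 33, [44, 55], max_keys=5))   # [22, 24, 33, 44, 55]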
VIII. APPLICATIONS

Databases
A database is a collection of data organized in a fashion that facilitates updating, retrieving, and managing the data [4]. The data can consist of anything, including, but not limited to, names, addresses, pictures, and numbers. Databases are commonplace and are used every day: an airline reservation system might maintain a database of available flights, customers, and tickets issued, and a teacher might maintain a database of student names and grades. Because computers excel at quickly and accurately manipulating, storing, and retrieving data, databases are often maintained electronically using a database management system. Database management systems are essential components of many everyday business operations. Database products such as Microsoft SQL Server, Sybase Adaptive Server, IBM DB2, and Oracle serve as a foundation for accounting systems, inventory systems, medical record-keeping systems, airline reservation systems, and countless other important aspects of modern businesses.

It is not uncommon for a database to contain millions of records requiring many gigabytes of storage [5]. For example, TELSTRA, an Australian telecommunications company, maintains a customer billing database with 51 billion rows and 4.2 terabytes of data. In order for a database to be useful and usable, it must support the desired operations, such as retrieval and storage, quickly. Because databases cannot typically be maintained entirely in memory, B-trees are often used to index the data and to provide fast access. For example, searching an unindexed and unsorted database containing n key values has a worst-case running time of O(n); if the same data is indexed with a B-tree, the same search operation runs in O(log n). To perform a search for a single key on a set of one million keys (1,000,000), a linear search requires at most 1,000,000 comparisons, whereas if the same data is indexed with a B-tree of minimum degree 10, only 114 comparisons are required in the worst case. Clearly, indexing large amounts of data can significantly improve search performance. Although other balanced tree structures can be used, a B-tree also optimizes the costly disk accesses that are a concern when dealing with large data sets.

Concurrent Access to B-Trees
Databases typically run in multiuser environments where many users can concurrently perform operations on the database. Unfortunately, this common scenario introduces complications. For example, imagine a database storing bank account balances, and assume that someone attempts to withdraw $40 from an account containing $60. First, the current balance is checked to ensure sufficient funds; after the funds are disbursed, the balance of the account is reduced. This approach works flawlessly until concurrent transactions are considered. Suppose that another person simultaneously attempts to withdraw $30 from the same account. At the same time the account balance is checked by the first person, the account balance is also retrieved for the second person. Since neither person is requesting more funds than are currently available, both requests are satisfied, for a total of $70. After the first person's transaction, $20 should remain ($60 − $40), so the new balance is recorded as $20. Next, the account balance after the second person's transaction, $30 ($60 − $30), is recorded, overwriting the $20 balance. Unfortunately, $70 has been disbursed, but the account balance has only been decreased by $30. Clearly, this behavior is undesirable, and special precautions must be taken.
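The lost update described above can be reproduced deterministically in a few lines of Python; the interleaving is written out by hand rather than with real concurrent transactions:

balance = 60                 # shared account balance

# both transactions read the balance before either writes it back
read_1 = balance             # first person sees $60
read_2 = balance             # second person also sees $60

balance = read_1 - 40        # first withdrawal recorded: balance = $20
balance = read_2 - 30        # second withdrawal overwrites it: balance = $30

print(balance)               # 30, although $70 was actually disbursed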
REFERENCES
1. Bowman, M., Debray, S. K., and Peterson, L. L. 1993. Reasoning about naming.
2. Bayer, R. and Schkolnick, M. Concurrency of Operations on B-Trees. In Readings in Database Systems (ed. Michael Stonebraker), pages 216-226, 1994.
3. Cormen, T. H., Leiserson, C. E., and Rivest, R. L. Introduction to Algorithms. MIT Press, Massachusetts, 1998.
4. Gray, J. N., Lorie, R. A., Putzolu, G. R., and Traiger, I. L. Granularity of Locks and Degrees of Consistency in a Shared Data Base. In Readings in Database Systems (ed. Michael Stonebraker), pages 181-208, 1994.
5. Kung, H. T. and Robinson, J. T. On Optimistic Methods of Concurrency Control. In Readings in Database Systems (ed. Michael Stonebraker), pages 209-215, 1994.
6. Section 19.3, pages 395-397, of Cormen, Leiserson, and Rivest.
7. http://www.cs.princeton.edu/introalgsds/44balanced
8. Comer, D. (June 1979). "The Ubiquitous B-Tree". Computing Surveys 11 (2): 123-137. doi:10.1145/356770.356776. ISSN 0360-0300.
9. Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (2001). Introduction to Algorithms (Second ed.). MIT Press and McGraw-Hill, pp. 434-454. ISBN 0-262-03293-7. Chapter 18: B-Trees.
10. Folk, M. J. and Zoellick, B. (1992). File Structures (2nd ed.). Addison-Wesley. ISBN 0-201-55713-4.
11. Knuth, D. (1998). Sorting and Searching. The Art of Computer Programming, Volume 3 (Second ed.). Addison-Wesley. ISBN 0-201-89685-0. Section 6.2.4: Multiway Trees, pp. 481-491. Also, pp. 476-477 of Section 6.2.3 (Balanced Trees) discusses 2-3 trees.