Bulk Loading of a B+ Tree

Index tuning-B+tree
overview
B+-Tree Locking
• Tree Traversal
– Update, Read
– Insert, Delete
• phantom problem: need for range locking
• ARIES KVL (implemented in DB2)
•
•
•
•
Tree Traversal (next page)
Lock on tuples
Lock on key values
Range locking:
– Next key lock
2
© Dennis Shasha, Philippe Bonnet 2001
4
4
B+-Tree Locking
A
B
T1 lock
C
T1 lock
D
E
T1 lock
F
© Dennis Shasha, Philippe Bonnet 2001
Bulk Loading of a B+ Tree
• If we have a large collection of records, and we want to
create a B+ tree on some field, doing so by repeatedly
inserting records is very slow.
• Bulk Loading can be done much more efficiently.
• Initialization: Sort all data entries, insert pointer to first
(leaf) page in a new (root) page.
Bulk Loading (Contd.)
• Add <low key value on page, pointer to page> to
the root page
Bulk Loading (Contd.)
• Split the root and create a new root page.
Bulk Loading (Contd.)
• Index entries for leaf pages always entered into rightmost index
page just above leaf level. When this fills up, it splits. (Split may go
up right-most path to the root.)
• Much faster than repeated inserts, especially
when one considers locking!
Comparison: B-trees vs. static
indexed sequential file
Ref #1: Held & Stonebraker, “B-Trees Reexamined”, CACM, Feb. 1978
Ref # 1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space
For their comparison:
block = 512 bytes
key = pointer = 4 bytes
4 data records per block
Example: 1 block static index
k1
k1
127 keys
1 data
block
k2
k2
k3
(127+1)4 = 512 Bytes
-> pointers in index implicit!
k3
up to 127
blocks
Example: 1 block B-tree
k1
k1
1 data
block
k2
k2
...
63 keys
k63
k3
next
63x(4+4)+8 = 512 Bytes
-> pointers needed in B-tree
blocks because index is
not contiguous
up to 63
blocks
Size comparison
Ref. #1
Static Index
# data
blocks
height
2 -> 127
2
128 -> 16,129
16,130 -> 2,048,383
B-tree
# data
blocks height
2
3
4
2 -> 63
64 -> 3968
3969 -> 250,047
250,048 -> 15,752,961
3
4
5
Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization
Ref. #1 conclusion
Static index better!!
Ref #2: M. Stonebraker, “Retrospective on a
database system,” TODS, June 1980
Ref. #2 conclusion
B-trees better!!
• DBA does not know when to reorganize
• DBA does not know how full to load
pages of new index
• Buffering
– B-tree: has fixed buffer requirements
– Static index: must read several overflow
blocks to be efficient
(large & variable size
buffers needed for this)
• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
 Of course not!
 Should try to keep root in memory
at all times
(and perhaps some nodes from second level)
Interesting problem:
For B+tree, how large should n be?
…
n is number of keys / node
Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S
(3) Assume B+tree is full, i.e.,
nodes to examine is LOGn N
where N = # records
#
Can get:
f(n) = time to find a record
f(n)
nopt
n
 FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”
(see homework for details)
 What happens to nopt as
• Disk gets faster?
• CPU get faster?