Algorithms and Data Structures COMP 261 # 1

COMP261 Lecture 23
B Trees
2
Storing large amounts of data
• File systems:
– Lots of files, each file stored as lots of blocks.
• Databases:
– large tables of data
– each indexed by a key (or perhaps multiple alternative keys)
• How do we access the data efficiently:
– individual items (given key)
– sequence of all items (perhaps in order)
– assume data is stored in files on hard drives (slow access time)
• Use some kind of index structure
– Usually also stored in a file
• B Trees, B+ Trees: data structures and algorithms for
– large data
– stored on disk (ie, slow access)
3
COMP103 approaches:
• Efficient Set and Map implementations:
– Hash Tables – fast, but hard to list in order
– Binary Search Trees
– Intended for in-core data structures
• http://www.cs.usfca.edu/~galles/visualization
• Binary search tree
– Add M H T S R Q B A F D Z
– Search: Log(n)
– What if we add in alphabetical
order?
4
Problems with Binary Search Trees
• Trees can become unbalanced
⇒ lookup times can become linear
B
A
• How can we keep the tree balanced?
C
F
AVL trees: self-balanced by tree rotations
Red-Black trees: node with a colour bit
H
G
K
P
Splay trees: move recently accessed node
to the root
N
M
B/B+ trees: always add levels at the top!
5
Problems with Binary Search Trees
• Lots of pointer following
⇒ slow if each node is stored in a file!
In this case, loading a node to read
A
its value can take far longer than
the comparisons we usually count!
B
C
C
F
F
H
• How can we reduce pointer following?
G
K
• More data in each node
• More children of each node
P
N
⇒ "bushier" trees, fewer steps to leaves
M
6
B Trees
Like binary search trees, but:
• Designed for external data structures, stored in files
• Non-leaf nodes have up to m children, and m-1 data
values (for B tree of order m, or m-1:m tree)
• Non-leaf, non-root nodes always have at least ⌈m/2⌉
children and ⌈m/2⌉-1 data values (ie, at least half full)
• Leaf nodes have up to m-1 data values and no
children
• Adding is done "at the top" rather than "at the
bottom”, so leaves are all at same depth
7
B Trees
• Like binary search trees, but
– Non-leaf nodes have up to m children, and m-1 data values
– Non-leaf, non-root nodes always have at least ⌈m/2⌉ children and ⌈m/2⌉1 data values (ie, at least half full)
– Leaf nodes contain up to m-1 data values and no children
– Adding is done "at the top" rather than "at the bottom” – leaves are all at
the same depth
B tree example:
N
m=3, ("2-3 tree")
3 children, 2 values
2 children, 1 value
E
A
C
G
J
V
R
K
M
P
Q
X
S
T
W
Y
Z
8
2-3 trees
• Every internal node is a 3-node or a 2-node
– 3-node: 3 children, 2 values
– 2-node: 2 children, 1 value
• All leaves have 1 or 2 values
• Data is kept in sorted order
9
B Trees: Search
• Data values might be
– single items (set of values)
– key:value pairs (map)
• Search(key, node):
– Just like binary search, but more comparisons at each node:
if node is null
return fail
for i = 0 to k-1 (k is number of keys in node)
if key < keys[i]
return search(key, children[i])
if key == keys[i]
return value[i]
return search(key, children[k])
A
E
C
J
G
K
M
10
B Trees: Insert
To insert item (eg key:value pair):
Search for leaf where the item should be.
If leaf is not full, add item to the leaf.
If leaf is full:
Identify the middle item (existing item or the new item)
Create a new leaf node:
retain items before middle key in original node
put items after middle key in new leaf node,
push item up to parent, along with pointer to new node
To add new item to parent:
if parent is not full:
add new item to parent, and
add new child pointer just right of new item
else: split parent node into two nodes (like leaf)
push middle item up to grandparent.
add pointer to new child just right of pushed up item
11
2-3 B Tree: Inserting values
• Add M H T S R Q B A F D Z
• http://www.cs.usfca.edu/~galles/visualization/BTree.html
12
2-3 B Tree: Inserting values
• Add 8, 5, 1, 7, 3,12, 9, 6
• http://www.cs.usfca.edu/~galles/visualization/BTree.html
13
2-3 BTree: Deletion
• Opposite of inserting:
if a node becomes empty,
if possible, rotate a value from a sibling through the parent
to ensure minimum number of values per node.
if two siblings, require >= 5 keys in parent and siblings
if one sibling, require >=3 keys in parent and siblings
• else merge nodes:
– if two siblings, merge
– Easier at a leaf.
– Harder if at an internal node.
14
B Tree: Deletion
To delete a value, x:
• Search for the node, say n, containing x.
• If n is a leaf, just delete x from n.
• If n is an internal node, replace x by the next smallest (largest in
right subtree) or next largest (smallest in left subtree).
– E.g. if children are leaves:
[…|x|…| ]
[…|w|…|
]

[…|w| ]
[…| ]
– or:
[…|x|…|
]
[…|y|…|
]

[y|…| ]
[…| ]
15
B Tree: Deletion
• After deleting x, one leaf node has one fewer values.
• If this node now has fewer than the required minimum,
we must rebalance the tree.
• Either:
– Rotate: Move a value from a sibling node to the
parent node, in place of the separator, and move
the separator to the under-full leaf;
• Or:
– Merge: Combine the under-full node with a sibling
and the separator, and delete the empty node and
the separator from the parent.
Does this cover all
cases?
16
B Tree: Deletion - Rotate
If a sibling node has more than the minimum number of values,
move its smallest/largest value to the parent node, in place of the
separator, and move the separator to the under-full leaf.
• If left sibling has a “spare” value:
[…|x|…|
]
[…|w|…|
]

[…|w| ]
[…|
]
• If left sibling has a “spare” value:
• Exercise!
[…| ]
[x|…|
]
17
B Tree: Deletion - Merge
If there is not a sibling with a “spare” value:
merge the under-full leaf with a sibling (which has
minimum number of values)
– Move the separator and all of the values from the
sibling into the under-full node.
– Delete the sibling and delete the separator from the
parent.
• Merge
[…|s|…|
]
[…|…|

[…| ]
[…|
]
[…|s|…]
]
18
Deleting from leaves
E
A
C
E
J
G
C
K
A
M
C
J
G
M
J
C
J
A
E
M
A
A
C
J
M
Only reduce levels
at the root
A
C
19
More deletions:
• When internal node becomes too empty
– propagate the deletion up the tree
N
E
A
C
J
G
V
R
K
P
X
S
W
Y
Z
20
More deletions:
• Deletion from internal node ⇒
– rotate key and child from sibling
J
E
A
C
V
N
G
K
X
R
S
W
Y
Z
21
Analysis
• B trees are balanced:
– A new level is introduced only at the top
– A level is removed only from the top
Therefore:
• all leaves are at the same level.
• Cost of search/add/delete:
• O(log⌈m/2⌉(n)) (at worst)
• O(logm(n)) (at best)
= depth of tree with all half full nodes
= depth of tree with full nodes
• if 100 million items in a B tree with m = 20,
log10(100,000,000) = ? 8
log20(100,000,000) = ? 6.14
• if billion items in a B tree with m = 100,
log50(1,000,000,000) = ? 5.3
log100(1,000,000,000) = ? 4.5