Red Black Trees

Balanced Search Trees
15-211
Fundamental Data Structures and Algorithms
Margaret Reid-Miller
3 February 2005
Plan
- Today
  - 2-3-4 trees
  - Red-Black trees
- Reading:
  - For today: Chapters 13.3-4
- Reminder: HW1 due tonight!!!
  - HW2 will be available soon
AVL-tree Review
AVL-Trees
What is the key restriction on a binary
search tree that keeps an AVL tree
balanced?
[Figure: example binary search trees, labeled "OK" and "not OK".]
AVL-Trees
- Height balanced:
  - For each node, the heights of the left and right subtrees differ by at most 1. This is a representation invariant.
- What is the mechanism to rebalance an out-of-balance AVL tree after an insert?
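The height-balance invariant can be checked directly. The following is a minimal sketch, not code from the lecture: the class `AvlCheck`, its `Node` type, and the convention that an empty tree has height 0 are all assumptions made for illustration.

```java
// Hypothetical types for illustration; not from the lecture.
final class AvlCheck {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the height of t (empty tree = 0), or -1 if some node
    // in t has subtrees whose heights differ by more than 1.
    static int checkedHeight(Node t) {
        if (t == null) return 0;
        int hl = checkedHeight(t.left);
        int hr = checkedHeight(t.right);
        if (hl < 0 || hr < 0 || Math.abs(hl - hr) > 1) return -1;
        return 1 + Math.max(hl, hr);
    }

    static boolean isHeightBalanced(Node t) {
        return checkedHeight(t) >= 0;
    }
}
```

The check visits every node once, so it takes O(N) time; a real AVL tree avoids this by storing the height in each node.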
The single rotation
- Rotate the deepest out-of-balance node. This “pulls” the child up one level.
[Figure: single rotation on nodes X, Y, Z, before and after.]
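A single rotation can be sketched as follows. The names `AvlRotate`, `rotateWithLeftChild`, and `rotateWithRightChild` are hypothetical, and the stored `height` field follows the convention that an empty tree has height 0; treat this as one possible formulation rather than the course's code.

```java
// Hypothetical types for illustration; not from the lecture.
final class AvlRotate {
    static final class Node {
        int key;
        Node left, right;
        int height = 1;                 // height of the subtree rooted here
        Node(int key) { this.key = key; }
    }

    static int height(Node t) { return t == null ? 0 : t.height; }

    static void update(Node t) {
        t.height = 1 + Math.max(height(t.left), height(t.right));
    }

    // Pulls the left child up one level and returns it as the new subtree root.
    static Node rotateWithLeftChild(Node parent) {
        Node child = parent.left;
        parent.left = child.right;      // child's right subtree changes sides
        child.right = parent;
        update(parent);                 // parent is now lower, so update it first
        update(child);
        return child;
    }

    // Mirror image: pulls the right child up one level.
    static Node rotateWithRightChild(Node parent) {
        Node child = parent.right;
        parent.right = child.left;
        child.left = parent;
        update(parent);
        update(child);
        return child;
    }
}
```

The caller must store the returned node in the link that used to point at `parent`.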
The double rotation
- First rotate around the child node, then around the parent node.
[Figure: double rotation on nodes X, Y1, Y2, Z.]
Double rotation cont’d
- Result is to “pull” the grandchild node up two levels.
[Figure: result of the double rotation on nodes X, Y1, Y2, Z.]
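The two steps of a double rotation can be written as two single rotations. This sketch covers only the case where the grandchild is the right child of the left child; the class and method names are hypothetical, and height bookkeeping is left out to keep it short.

```java
// Hypothetical types for illustration; heights are omitted for brevity.
final class AvlDoubleRotate {
    static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Single rotation: pulls the left child up one level.
    static Node rotateRight(Node parent) {
        Node child = parent.left;
        parent.left = child.right;
        child.right = parent;
        return child;
    }

    // Single rotation: pulls the right child up one level.
    static Node rotateLeft(Node parent) {
        Node child = parent.right;
        parent.right = child.left;
        child.left = parent;
        return child;
    }

    // Double rotation for the left-right case.
    static Node doubleRotateLeftRight(Node parent) {
        parent.left = rotateLeft(parent.left);  // first rotate around the child
        return rotateRight(parent);             // then around the parent
    }
}
```

After the call, the former grandchild is the subtree root, two levels higher than before.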
AVL Tree Summary
- Each node maintains a lazy deletion flag and the height of its subtree.
- The height of an AVL tree is at most 45% greater than the minimum.
- Requires at most one single or double rotation to regain balance after an insert.
- Thus, guarantees O(log N) time for search and insert.
2-3-4 Trees
Balanced 2-3-4 Trees
- Maintain height balance in all subtrees. (Depth property.)
- But allow nodes in the tree to expand to accommodate inserts.
  - In particular, nodes can have 2, 3 or 4 children. (Node-size property.)
  - E.g., a 4-node has 3 keys that split the key range into 4 intervals.
2-3-4 tree search
- Search is similar to a binary search.
- E.g., search for B
[Figure: 2-3-4 tree with root node G M Q, used to trace the search for B.]
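Search in a 2-3-4 tree can be sketched as below. The representation is an assumption made for illustration: each hypothetical `Node` holds 1 to 3 sorted keys, and an internal node has one more child link than it has keys.

```java
// Hypothetical node layout for illustration; not from the lecture.
final class Tree234Search {
    static final class Node {
        final char[] keys;      // 1 to 3 keys in sorted order
        final Node[] children;  // keys.length + 1 links, or none at a leaf
        Node(String keys, Node... children) {
            this.keys = keys.toCharArray();
            this.children = children;
        }
        boolean isLeaf() { return children.length == 0; }
    }

    static boolean contains(Node t, char key) {
        while (t != null) {
            // Find the first key that is not smaller than the search key.
            int i = 0;
            while (i < t.keys.length && key > t.keys[i]) i++;
            if (i < t.keys.length && t.keys[i] == key) return true;
            // Otherwise follow the link for the interval containing key.
            t = t.isLeaf() ? null : t.children[i];
        }
        return false;
    }
}
```

The only difference from binary search tree lookup is that each node offers up to three keys to compare against and up to four links to follow.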
2-3-4 Tree Insert
- To insert, first search for a leaf node in which to put the key.
- E.g., insert U
[Figure: the example 2-3-4 tree before and after inserting U into a leaf.]
2-3-4 Tree Insert
- May need to split a node
- E.g., insert T
[Figure: the example 2-3-4 tree before and after inserting T, which forces a node to split.]
2-3-4 Tree Insert
/* Returns either an empty node or a new root */
public Node BUinsert(int key) {
    if (isEmptyNode()) return new Node(key);

    /* Search for the leaf to put the key into */
    Node child = findChild(key);          // down which link?
    Node upNode = child.BUinsert(key);

    /* upNode is empty, the key at a leaf node, or
     * the result of a 4-node split that needs to be
     * propagated up. */
    if (upNode.isEmptyNode()) return upNode;
    else return addToNode(upNode);        // split?
}
Cascading splits
- When inserting a key into a 4-node, the 4-node splits and a key moves up to the parent node.
- This new key may in turn cause the parent to split, moving a key up to the grandparent, and so on up to the root.
- When would this happen?
- Is there a way to avoid these cascading splits?
Bottom-up 2-3-4 trees
- This BUinsert is called a bottom-up version of insert, since splits occur as we go back up the tree after the recursive calls.
- Work occurs before and after the recursive calls.
Preemptive Split
- Every time we find a 4-node while traveling down a search path, we split the 4-node.
  - Note: Two 2-nodes have the same number of children as one 4-node.
  - Changes are local to the split node (no cascading).
- Guaranteed to find a 2-node or 3-node at the leaf.
- Splitting a root node creates a new root.
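The preemptive-split idea can be sketched as an insert that only ever walks down the tree. This is a minimal illustration, not the course's implementation: the class `TopDown234`, its list-based `Node`, and the helper names are hypothetical, and keys are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of top-down 2-3-4 insertion with preemptive splits.
// All names are hypothetical; keys are assumed to be distinct.
final class TopDown234 {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();  // sorted, at most 3
        final List<Node> kids = new ArrayList<>();     // empty at a leaf
        boolean isLeaf() { return kids.isEmpty(); }
        boolean isFourNode() { return keys.size() == 3; }
    }

    Node root = new Node();

    // Index of the link to follow for key (also the insert position in a leaf).
    private static int childIndex(Node t, int key) {
        int i = 0;
        while (i < t.keys.size() && key > t.keys.get(i)) i++;
        return i;
    }

    // Splits the 4-node parent.kids.get(i) into two 2-nodes; its middle
    // key moves up into parent, which must not itself be a 4-node.
    private static void splitChild(Node parent, int i) {
        Node left = parent.kids.get(i);
        Node right = new Node();
        right.keys.add(left.keys.remove(2));   // largest key goes right
        int middle = left.keys.remove(1);      // middle key moves up
        if (!left.isLeaf()) {                  // move the two rightmost links
            right.kids.add(left.kids.remove(2));
            right.kids.add(left.kids.remove(2));
        }
        parent.keys.add(i, middle);
        parent.kids.add(i + 1, right);
    }

    void insert(int key) {
        if (root.isFourNode()) {               // splitting the root creates a new root
            Node newRoot = new Node();
            newRoot.kids.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        Node t = root;
        while (!t.isLeaf()) {
            int i = childIndex(t, key);
            if (t.kids.get(i).isFourNode()) {  // preemptive split on the way down
                splitChild(t, i);
                if (key > t.keys.get(i)) i++;  // continue into the correct half
            }
            t = t.kids.get(i);
        }
        t.keys.add(childIndex(t, key), key);   // the leaf is a 2-node or 3-node
    }

    // In-order traversal, used here only to inspect the result.
    static void inOrder(Node t, List<Integer> out) {
        for (int i = 0; i < t.keys.size(); i++) {
            if (!t.isLeaf()) inOrder(t.kids.get(i), out);
            out.add(t.keys.get(i));
        }
        if (!t.isLeaf()) inOrder(t.kids.get(t.keys.size()), out);
    }
}
```

Because every 4-node on the path is split before it is entered, the parent of a split node always has room for the key that moves up, which is why no split cascades.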
2-3-4 Tree Height
- What is the height of the tree?
  At most log2 N + 1
- Why?
  The maximum depth is when every node is a 2-node. Since every leaf has the same depth, the tree is then a complete binary tree and has depth at most log2 N + 1.
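One way to spell out the bound, assuming that height is counted in levels (written h here) and that N is the number of keys:

```latex
% Assumptions: h = number of levels, N = number of keys, N >= 1.
% The fewest keys for a given h occur when every node is a 2-node
% (one key each), which makes the tree a complete binary tree.
N \;\ge\; 2^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_2 (N + 1) \;\le\; \log_2 (2N) \;=\; \log_2 N + 1 .
```

The middle step uses N + 1 <= 2N, which holds for every N >= 1.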
Number of splits
- How many splits does an insertion require?
  At most log2 N + 1 splits.
- Seems to require less than one split on average when the tree is built from a random permutation. Trees tend to have few 4-nodes.
Top-down 2-3-4 trees
- The second method is called top-down, as splits occur on the way down the tree.
- All the work occurs before the recursive calls and no work occurs after the recursive calls.
  - This is called tail recursion, which is much more efficient because it can run as a simple loop.
- Can AVL trees be made tail recursive?
2-3-4 trees
- Advantages:
  - Guaranteed O(log N) time for search and insert.
- Issues:
  - Awkward to maintain three types of nodes.
  - Need to modify the standard search on binary trees.
  - Splits need to move links between nodes.
  - Code has many cases to handle.
Red Black Trees
Red-Black trees
- A red-black tree is a binary tree representation of a 2-3-4 tree using red and black nodes.
[Figure: 2-3-4 nodes and their equivalent red-black subtrees; one node has two alternative representations ("OR").]
Red-black tree properties
A Red-Black tree is a binary search tree where
- Every node is colored either red or black.
  - Note: Every 2-3-4 node corresponds to one black node.
- The root node is black.
- Red nodes always have black parents (and black children).
- Every path from the root to a leaf has the same number of black nodes.
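The color properties can be checked with one recursive pass. This is a sketch under assumptions: the class `RedBlackCheck` and its `Node` are hypothetical, paths are measured down to empty (null) links rather than to leaf nodes, and the binary-search-tree ordering of keys is not checked.

```java
// Hypothetical types for illustration; checks colors only, not key order.
final class RedBlackCheck {
    static final class Node {
        final int key;
        final boolean red;
        final Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the number of black nodes on every path from t down to an
    // empty link, or -1 if paths disagree or a red node has a red parent.
    static int blackHeight(Node t, boolean parentIsRed) {
        if (t == null) return 0;
        if (t.red && parentIsRed) return -1;
        int l = blackHeight(t.left, t.red);
        int r = blackHeight(t.right, t.red);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (t.red ? 0 : 1);
    }

    static boolean hasRedBlackColoring(Node root) {
        if (root == null) return true;
        return !root.red && blackHeight(root, false) >= 0;
    }
}
```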
Red-black tree height
[Figure: example red-black tree with keys 3, 5, 6, 7, 9.]
- What is the height of a red-black tree?
  It is at most 2 log2 N + 2, since it can be at most twice as high as its corresponding 2-3-4 tree, which has height at most log2 N + 1.
Red-black Tree Search
- Search is the same as for binary search trees.
  - Color is irrelevant.
- Search guaranteed to take O(log N) time.
- Search typically occurs more frequently than insert.
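Since color plays no role, the lookup is the ordinary binary search tree loop. The class `RedBlackSearch` and its `Node` below are hypothetical and serve only to show that the `red` field is never read.

```java
// Hypothetical types for illustration; the red field is never consulted.
final class RedBlackSearch {
    static final class Node {
        final int key;
        final boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static boolean contains(Node t, int key) {
        while (t != null) {
            if (key < t.key) t = t.left;
            else if (key > t.key) t = t.right;
            else return true;
        }
        return false;
    }
}
```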
Red-black Tree Insert
- Simple 4-node test (2 red children?)
- Few splits as most 4-nodes tend to be near the leaves.
- Some 4-node splits require only changing the color of three nodes.
- Rotations needed only when a 4-node has a 3-node parent.
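The 4-node test and the color-only split can be sketched as follows. The class `RedBlackSplit` and its method names are hypothetical. The sketch leaves out what the caller must still do: recolor the root black if it was the node that was split, and rotate when the split leaves a red node under a red parent.

```java
// Hypothetical types for illustration; rotations are not shown.
final class RedBlackSplit {
    static final class Node {
        final int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static boolean isRed(Node t) { return t != null && t.red; }

    // A 4-node is a black node with two red children.
    static boolean isFourNode(Node t) {
        return t != null && !t.red && isRed(t.left) && isRed(t.right);
    }

    // Splits a 4-node by changing the color of three nodes: the two
    // children become separate black 2-nodes, and the middle key (t)
    // turns red, which corresponds to moving it up into the parent.
    static void split(Node t) {
        t.red = true;
        t.left.red = false;
        t.right.red = false;
    }
}
```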
Red-black Tree Summary
- Advantages:
  - Guaranteed O(log N) time for search and insert.
  - Little overhead for balancing.
  - Trees are nearly optimal.
  - Top-down implementation can be made tail-recursive, so very efficient.
B-Trees
B-trees
- A generalization of 2-3-4 trees.
- Used for very large dictionaries where the data are maintained on disks.
- Since disk lookups are very SLOW, want to read as few disk pages as possible.
  - Want really shallow depth trees!
B-trees Key Idea
- Make the nodes in the trees have a huge number of links, k-way.
  - Typically choose k so that a node fills a disk page.
- As with 2-3-4 trees, not all the nodes have k links. Some may have as few as k/2 links.
- When a node overflows, split the node.
B-trees
- Takes O(log_{k/2} N) probes for search and insert.
  - Typically about 2-3 probes (disk accesses)
  - E.g., for N < 125 million and k = 1000, the height of the tree is less than 3.
- As all searches go through the root node, usually keep the root node in memory.
- Many variants
- Common in many large database systems.
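The arithmetic behind the example can be reproduced with a few lines of integer math. The class `BTreeProbes` and the method `probesBound` are hypothetical names. The method assumes every node has the minimum of k/2 links, and it returns the smallest h with (k/2)^h >= N, that is, log_{k/2} N rounded up.

```java
// Hypothetical helper for illustration; assumes the worst case of k/2 links per node.
final class BTreeProbes {
    // Smallest h such that (k/2)^h >= n. Uses integer arithmetic to avoid
    // floating-point rounding; n is assumed small enough not to overflow a long.
    static int probesBound(long n, int k) {
        if (k < 4) throw new IllegalArgumentException("k must be at least 4");
        long minLinks = k / 2;
        int h = 0;
        long reach = 1;              // number of items reachable within h levels
        while (reach < n) {
            reach *= minLinks;
            h++;
        }
        return h;
    }
}
```

For k = 1000 the minimum fan-out is 500, and 500^3 = 125,000,000, so `probesBound(125_000_000L, 1000)` returns 3, in line with the "about 2-3 probes" figure above.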
Conclusion
- AVL trees have the disadvantage that insert is not tail recursive.
- 2-3-4 trees are not practical, but are a good way to think about other approaches.
- Red-black trees are very efficient and have guaranteed O(log N) insert and search.
- B-trees have very shallow depth to minimize the number of disk reads needed for huge databases.