Balanced Search Trees
15-211
Fundamental Data Structures and
Algorithms
Margaret Reid-Miller
3 February 2005
Plan
Today
2-3-4 trees
Red-Black trees
Reading:
For today: Chapters 13.3-4
Reminder: HW1 due tonight!!!
HW2 will be available soon
AVL-tree Review
AVL-Trees
What is the key restriction on a binary
search tree that keeps an AVL tree
balanced?
[Figure: example binary search trees, labeled OK (balanced) and not OK (unbalanced)]
AVL-Trees
Height balanced:
For each node, the heights of the left
and right subtrees differ by at
most 1. This is a representation
invariant.
What is the mechanism to rebalance
an AVL tree that is put out of balance
by an insert?
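The height-balance invariant can be checked directly. The following is a minimal sketch, assuming a hypothetical node type `AvlCheckNode` (the slides do not define one) and the convention that an empty tree has height -1. It recomputes heights on every call for clarity; an actual AVL tree would store the height in each node instead.

```java
// Hypothetical minimal node type; not taken from the slides.
class AvlCheckNode {
    int key;
    AvlCheckNode left, right;

    AvlCheckNode(int key, AvlCheckNode left, AvlCheckNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class AvlCheck {
    // Assumed convention: an empty tree has height -1, a single node has height 0.
    static int height(AvlCheckNode n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // True if, at every node, the left and right subtree heights differ by at most 1.
    static boolean isHeightBalanced(AvlCheckNode n) {
        if (n == null) return true;
        int diff = Math.abs(height(n.left) - height(n.right));
        return diff <= 1 && isHeightBalanced(n.left) && isHeightBalanced(n.right);
    }

    public static void main(String[] args) {
        AvlCheckNode ok = new AvlCheckNode(5,
                new AvlCheckNode(3, null, null),
                new AvlCheckNode(6, null, new AvlCheckNode(7, null, null)));
        AvlCheckNode chain = new AvlCheckNode(1, null,
                new AvlCheckNode(2, null, new AvlCheckNode(3, null, null)));
        System.out.println(isHeightBalanced(ok));    // prints true
        System.out.println(isHeightBalanced(chain)); // prints false
    }
}
```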
The single rotation
Rotate the deepest out-of-balance
node. “Pulls” the child up one level.
[Figure: single rotation on nodes X, Y and Z, before and after]
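A single rotation can be sketched in a few lines. The node type `RotNode` and the method names are assumptions made for illustration, and the sketch omits the height bookkeeping an AVL tree would also need.

```java
// Hypothetical minimal node type; not taken from the slides.
class RotNode {
    int key;
    RotNode left, right;

    RotNode(int key) { this.key = key; }
}

class SingleRotation {
    // Right rotation: the left child is pulled up one level and becomes the subtree root.
    static RotNode rotateRight(RotNode z) {
        RotNode y = z.left;
        z.left = y.right;   // y's right subtree moves under z
        y.right = z;
        return y;           // caller must re-attach y where z used to hang
    }

    // Left rotation: mirror image of rotateRight.
    static RotNode rotateLeft(RotNode z) {
        RotNode y = z.right;
        z.right = y.left;
        y.left = z;
        return y;
    }

    public static void main(String[] args) {
        // Left-leaning chain 3 -> 2 -> 1, out of balance at 3.
        RotNode z = new RotNode(3);
        z.left = new RotNode(2);
        z.left.left = new RotNode(1);
        RotNode root = rotateRight(z);
        System.out.println(root.key + " " + root.left.key + " " + root.right.key); // prints "2 1 3"
    }
}
```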
The double rotation
First rotate around child node, then
around the parent node.
[Figure: first step of the double rotation, with nodes X and Z and subtrees Y1 and Y2]
Double rotation cont’d
Result is to “pull” the grandchild
node up two levels.
[Figure: result of the double rotation, with nodes X and Z and subtrees Y1 and Y2]
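The two steps of a double rotation can be written as two single rotations. This is a sketch under the same assumptions as any textbook version: `DblNode` and the method names are hypothetical, and height updates are left out.

```java
// Hypothetical minimal node type; not taken from the slides.
class DblNode {
    int key;
    DblNode left, right;

    DblNode(int key) { this.key = key; }
}

class DoubleRotation {
    static DblNode rotateRight(DblNode n) {
        DblNode c = n.left;
        n.left = c.right;
        c.right = n;
        return c;
    }

    static DblNode rotateLeft(DblNode n) {
        DblNode c = n.right;
        n.right = c.left;
        c.left = n;
        return c;
    }

    // Left-right case: the tall subtree hangs off the right side of z's left child.
    static DblNode rotateLeftRight(DblNode z) {
        z.left = rotateLeft(z.left);   // first rotate around the child
        return rotateRight(z);         // then rotate around the parent
    }

    // Right-left case: mirror image.
    static DblNode rotateRightLeft(DblNode z) {
        z.right = rotateRight(z.right);
        return rotateLeft(z);
    }

    public static void main(String[] args) {
        // Zig-zag shape: 3 has left child 1, which has right child 2.
        DblNode z = new DblNode(3);
        z.left = new DblNode(1);
        z.left.right = new DblNode(2);
        DblNode root = rotateLeftRight(z);
        // The grandchild 2 has been pulled up two levels.
        System.out.println(root.key + " " + root.left.key + " " + root.right.key); // prints "2 1 3"
    }
}
```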
AVL Tree Summary
Each node maintains a lazy
deletion flag and the height of its
subtree.
The height of an AVL tree is at most
about 44% greater than the minimum
possible height (roughly 1.44 log2 N).
Requires at most one single or
double rotation to regain balance
after an insert.
Thus, guarantees O(log N) time for
search and insert.
2-3-4 Trees
Balanced 2-3-4 Trees
Maintain height balance in all
subtrees. Depth property.
But allow nodes in the tree to
expand to accommodate inserts.
In particular, nodes can have 2, 3 or 4
children. Node-size property.
E.g., a 4-node has 3 keys that
split the key range into 4 intervals.
2-3-4 tree search
Search is similar to a binary search.
E.g., search for B
[Figure: a 2-3-4 tree with root node G M Q and a leaf node A C, among others]
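The search can be sketched as follows. The class `Node234`, its fields, and the sample tree are assumptions for illustration; the tree is only loosely modeled on the slide's example, and its leaf contents are made up where the figure is unclear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical 2-3-4 node: 1 to 3 sorted keys, and either no children (leaf)
// or exactly keys.size() + 1 children.
class Node234 {
    final List<Character> keys = new ArrayList<>();
    final List<Node234> children = new ArrayList<>();

    Node234(Character... ks) { keys.addAll(Arrays.asList(ks)); }

    boolean isLeaf() { return children.isEmpty(); }

    // Like binary search tree lookup, except each node offers up to 4 intervals.
    boolean contains(char key) {
        int i = 0;
        while (i < keys.size() && key > keys.get(i)) i++;       // find the interval
        if (i < keys.size() && key == keys.get(i)) return true; // key is in this node
        return !isLeaf() && children.get(i).contains(key);      // follow link i
    }
}

class Search234Demo {
    // Assumed sample tree: root G M Q with four leaf children.
    static Node234 sample() {
        Node234 root = new Node234('G', 'M', 'Q');
        root.children.add(new Node234('A', 'C'));
        root.children.add(new Node234('H'));
        root.children.add(new Node234('O'));
        root.children.add(new Node234('S', 'W'));
        return root;
    }

    public static void main(String[] args) {
        Node234 root = sample();
        System.out.println(root.contains('B')); // prints false: B falls between A and C in a leaf
        System.out.println(root.contains('H')); // prints true
    }
}
```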
2-3-4 Tree Insert
To insert, first search for a leaf node
in which to put the key.
E.g., insert U
[Figure: the tree before and after inserting U; afterwards a leaf holds U W]
2-3-4 Tree Insert
May need to split a node
E.g., insert T
[Figure: the tree before and after inserting T; a full leaf splits, leaving the leaves S T and W, and U moves up to the parent]
2-3-4 Tree Insert
/* Returns either an empty node or a new root. */
public Node BUinsert(int key) {
    // Fell off the bottom of the tree: hand the key back up in a new node.
    if (isEmptyNode()) return new Node(key);

    /* Search for the leaf to put the key into. */
    Node child = findChild(key);          // down which link?
    Node upNode = child.BUinsert(key);

    /* upNode is empty, the key at a leaf node, or
     * the result of a 4-node split that needs to be
     * propagated up. */
    if (upNode.isEmptyNode()) return upNode;
    else return addToNode(upNode);        // split?
}
Cascading splits
When inserting a key into a 4-node, the
4-node splits and a key moves up to the
parent node.
This new key may in turn cause the
parent to split, moving a key up to the
grandparent, and so on up to the root.
When would this happen?
Is there a way to avoid these cascading
splits?
Bottom-up 2-3-4 trees
This BUinsert is called a bottom-up
version of insert, since splits occur
as we go back up the tree after the
recursive calls.
Work occurs before and after the
recursive calls.
Preemptive Split
Every time we find a 4-node while
traveling down a search path, we split the
4-node.
Note: Two 2-nodes have the same
number of children as one 4-node.
Changes are local to the split node (no
cascading).
Guaranteed to find a 2-node or 3-node at
the leaf.
Splitting a root node creates a new root.
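Preemptive splitting can be sketched as an iterative insert. This is an illustration under stated assumptions, not the course's implementation: `TdNode`, `TopDown234`, and all method names are hypothetical, keys are ints, and duplicate keys are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2-3-4 node: sorted keys, and either no children or keys.size() + 1 children.
class TdNode {
    final List<Integer> keys = new ArrayList<>();
    final List<TdNode> children = new ArrayList<>();

    boolean isLeaf() { return children.isEmpty(); }
    boolean isFourNode() { return keys.size() == 3; }
}

class TopDown234 {
    TdNode root = new TdNode();

    // Split the 4-node parent.children[i]. The parent is never a 4-node here,
    // so it has room for the middle key and the change stays local.
    private void splitChild(TdNode parent, int i) {
        TdNode full = parent.children.get(i);
        TdNode right = new TdNode();
        right.keys.add(full.keys.remove(2));    // largest key goes to the new right node
        int middle = full.keys.remove(1);       // middle key moves up
        if (!full.isLeaf()) {                   // the two rightmost links follow the largest key
            right.children.add(full.children.remove(2));
            right.children.add(full.children.remove(2));
        }
        parent.keys.add(i, middle);
        parent.children.add(i + 1, right);
    }

    void insert(int key) {
        if (root.isFourNode()) {                // splitting the root creates a new root
            TdNode newRoot = new TdNode();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        TdNode node = root;
        while (true) {
            int i = 0;
            while (i < node.keys.size() && key > node.keys.get(i)) i++;
            if (i < node.keys.size() && key == node.keys.get(i)) return; // duplicate: ignore
            if (node.isLeaf()) {                // guaranteed to be a 2-node or 3-node
                node.keys.add(i, key);
                return;
            }
            if (node.children.get(i).isFourNode()) {
                splitChild(node, i);            // the middle key now sits at position i
                if (key == node.keys.get(i)) return;
                if (key > node.keys.get(i)) i++;
            }
            node = node.children.get(i);
        }
    }

    boolean contains(int key) {
        TdNode node = root;
        while (true) {
            int i = 0;
            while (i < node.keys.size() && key > node.keys.get(i)) i++;
            if (i < node.keys.size() && key == node.keys.get(i)) return true;
            if (node.isLeaf()) return false;
            node = node.children.get(i);
        }
    }

    // Number of node levels; all leaves share the same depth, so the leftmost path suffices.
    int height() {
        int h = 1;
        for (TdNode n = root; !n.isLeaf(); n = n.children.get(0)) h++;
        return h;
    }

    public static void main(String[] args) {
        TopDown234 tree = new TopDown234();
        for (int k = 1; k <= 100; k++) tree.insert(k);
        System.out.println("height = " + tree.height());
        System.out.println(tree.contains(42) + " " + tree.contains(101));
    }
}
```

Because every split happens on the way down, the loop never needs to revisit an ancestor, which is why no recursion (and no work after a recursive call) is required.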
2-3-4 Tree Height
What is the height of the tree?
At most log2 N + 1
Why?
The maximum depth is when every
node is a 2-node. Since every leaf has
the same depth, the tree is complete
and has depth log2 N + 1.
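The bound can be checked with a short calculation, assuming that N counts keys and that the height h counts levels of nodes (both conventions are assumptions; the slide does not state them). In the tallest case every node is a 2-node, so the tree is a complete binary tree with h levels:

```latex
N \;=\; \sum_{i=0}^{h-1} 2^{i} \;=\; 2^{h} - 1
\quad\Longrightarrow\quad
h \;=\; \log_2 (N + 1) \;\le\; \log_2 (2N) \;=\; \log_2 N + 1
\qquad (N \ge 1).
```

Any tree with 3-nodes or 4-nodes holds more keys in the same number of levels, so its height for a given N can only be smaller.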
Number of splits
How many splits does an insertion
require?
At most log2 N + 1 splits.
Seems to require fewer than one split
on average when the tree is built from a
random permutation. Trees tend to
have few 4-nodes.
Top-down 2-3-4 trees
The second method is called top-down,
as splits occur on the way
down the tree.
All the work occurs before the
recursive calls and no work occurs
after the recursive calls.
This is called tail recursion. It can
be turned into a loop, which is more
efficient.
Can AVL trees be made tail
recursive?
2-3-4 trees
Advantages:
Guaranteed O(log N) time for search and
insert.
Issues:
Awkward to maintain three types of nodes.
Need to modify the standard search on binary
trees.
Splits need to move links between nodes.
Code has many cases to handle.
Red Black Trees
Red-Black trees
A red-black tree is a binary tree
representation of a 2-3-4 tree using
red and black nodes.
[Figure: a 2-3-4 tree and its red-black representation; a 3-node can be drawn in either of two orientations (OR)]
Red-black tree properties
A Red-Black tree is a binary search tree
where
Every node is colored either red or black.
Note: Every 2-3-4 node corresponds to one
black node.
The root node is black.
Red nodes always have black parents
(and black children).
Every path from the root to a leaf has
the same number of black nodes.
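These properties can be verified with one recursive pass. The sketch below assumes a hypothetical node type `RbNode`, counts black nodes on every path down to an empty link (a slightly stronger form of the path property than the slide states), and does not check the search-tree ordering of the keys.

```java
// Hypothetical minimal red-black node; not taken from the slides.
class RbNode {
    int key;
    boolean red;
    RbNode left, right;

    RbNode(int key, boolean red, RbNode left, RbNode right) {
        this.key = key;
        this.red = red;
        this.left = left;
        this.right = right;
    }
}

class RedBlackCheck {
    // Empty links are treated as black.
    static boolean isRed(RbNode n) { return n != null && n.red; }

    // Returns the number of black nodes on every path below n (inclusive),
    // or -1 if a red node has a red child or two paths disagree.
    static int blackHeight(RbNode n) {
        if (n == null) return 0;
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int lh = blackHeight(n.left);
        int rh = blackHeight(n.right);
        if (lh < 0 || rh < 0 || lh != rh) return -1;
        return lh + (n.red ? 0 : 1);
    }

    static boolean isRedBlack(RbNode root) {
        return !isRed(root) && blackHeight(root) >= 0;  // the root must be black
    }

    public static void main(String[] args) {
        // Black root 5, black 3, red 7 with black children 6 and 9.
        RbNode good = new RbNode(5, false,
                new RbNode(3, false, null, null),
                new RbNode(7, true,
                        new RbNode(6, false, null, null),
                        new RbNode(9, false, null, null)));
        System.out.println(isRedBlack(good)); // prints true
        good.right.left.red = true;           // red 7 now has a red child
        System.out.println(isRedBlack(good)); // prints false
    }
}
```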
Red-black tree height
[Figure: example red-black tree with keys 3, 5, 6, 7 and 9]
What is the height of a red-black
tree?
It is at most 2 log N + 2 since it can be
at most twice as high as its
corresponding 2-3-4 tree, which has
height at most log N + 1.
Red-black Tree Search
Search is the same as for binary
search trees.
Color is irrelevant.
Search guaranteed to take O(log N)
time.
Search typically occurs more
frequently than insert.
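Since color plays no role, the lookup is ordinary binary search tree code. A minimal sketch, assuming a hypothetical node type `SearchNode` whose `red` field is simply never read; it is written as a loop, which is what a tail-recursive search reduces to.

```java
// Hypothetical red-black node; the color field is present but unused by search.
class SearchNode {
    int key;
    boolean red;
    SearchNode left, right;

    SearchNode(int key, boolean red) {
        this.key = key;
        this.red = red;
    }
}

class RbSearch {
    // Standard binary search tree lookup, one comparison per level.
    static boolean contains(SearchNode root, int key) {
        SearchNode n = root;
        while (n != null) {
            if (key < n.key) n = n.left;
            else if (key > n.key) n = n.right;
            else return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SearchNode root = new SearchNode(5, false);
        root.left = new SearchNode(3, false);
        root.right = new SearchNode(7, true);
        root.right.left = new SearchNode(6, false);
        root.right.right = new SearchNode(9, false);
        System.out.println(contains(root, 6)); // prints true
        System.out.println(contains(root, 4)); // prints false
    }
}
```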
Red-black Tree Insert
Simple 4-node test (2 red children?)
Few splits as most 4-nodes tend to be
near the leaves.
Some 4-node splits require only
changing the color of three nodes.
Rotations needed only when a 4-node
has a 3-node parent.
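The 4-node test and the color-only split can be sketched as below. `FlipNode` and the method names are assumptions. The sketch covers only the recoloring step: a caller would still have to recolor the root black, or rotate when the node's parent is also red.

```java
// Hypothetical minimal red-black node; not taken from the slides.
class FlipNode {
    char key;
    boolean red;
    FlipNode left, right;

    FlipNode(char key, boolean red) {
        this.key = key;
        this.red = red;
    }
}

class ColorFlip {
    static boolean isRed(FlipNode n) { return n != null && n.red; }

    // A 4-node appears as a black node with two red children.
    static boolean isFourNode(FlipNode n) {
        return n != null && !n.red && isRed(n.left) && isRed(n.right);
    }

    // Splitting recolors exactly three nodes: the middle key joins the parent's
    // 2-3-4 node (turns red), and the two outer keys become separate 2-nodes (turn black).
    static void split(FlipNode n) {
        n.red = true;
        n.left.red = false;
        n.right.red = false;
    }

    public static void main(String[] args) {
        FlipNode g = new FlipNode('G', false);
        g.left = new FlipNode('D', true);
        g.right = new FlipNode('I', true);
        System.out.println(isFourNode(g));   // prints true
        split(g);
        System.out.println(g.red + " " + g.left.red + " " + g.right.red); // prints "true false false"
    }
}
```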
Red-black Tree Summary
Advantages:
Guaranteed O(log N) time for search
and insert.
Little overhead for balancing.
Trees are nearly optimal.
Top-down implementation can be made
tail-recursive, so very efficient.
B-Trees
B-trees
A generalization of 2-3-4 trees.
Used for very large dictionaries
where the data are maintained on
disks.
Since disk lookups are very SLOW,
want to read as few disk pages as
possible.
Want really shallow depth trees!
B-trees Key Idea
Make the nodes in the trees have a
huge number of links, k-way.
Typically choose k so that a node fills
a disk page.
As with 2-3-4 trees, not all the
nodes have k links. Some may have
as few as k/2 links.
When a node overflows, split the
node.
B-trees
Takes O(log_{k/2} N) probes for search and
insert.
Typically about 2-3 probes (disk accesses)
E.g., for N < 125 million and k = 1000,
the height of the tree is less than 3.
As all searches go through the root node,
usually keep the root node in memory.
Many variants
Common in many large database
systems.
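The figure of 125 million follows from the minimum fan-out: with k = 1000 each node has at least k/2 = 500 links, and 500^3 = 125,000,000. The sketch below computes the number of levels this worst case needs; `BTreeDepth` and `levelsNeeded` are hypothetical names, and the calculation ignores details such as the root being allowed fewer links.

```java
class BTreeDepth {
    // Smallest h with (k/2)^h >= n, i.e. the ceiling of log_{k/2} n,
    // computed with integer arithmetic to avoid floating-point rounding.
    // Assumes n is far below Long.MAX_VALUE so the product cannot overflow.
    static int levelsNeeded(long n, int k) {
        if (k < 4) throw new IllegalArgumentException("k/2 must be at least 2");
        long fanout = k / 2;   // minimum number of links per node
        long reach = 1;        // number of items reachable with h levels
        int h = 0;
        while (reach < n) {
            reach *= fanout;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(levelsNeeded(125_000_000L, 1000)); // prints 3
        System.out.println(levelsNeeded(250_000L, 1000));     // prints 2
    }
}
```

For comparison, a binary tree (fan-out 2) would need 27 levels to reach the same 125 million items, which is why a wide node per disk page pays off.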
Conclusion
AVL trees have the disadvantage that
insert is not tail recursive.
2-3-4 trees are not practical, but are a
good way to think about other
approaches.
Red-black trees are very efficient and
have guaranteed O(log N) insert and
search.
B-trees have very shallow depth to
minimize the number of disk reads
needed for huge databases.