Red-Black Tree

1431227-3
File Organization and Processing
Week 2
Binary Search Tree and Red-Black Tree
Introduction
• Tree
– Consists of a root, nodes (children), and edges
– An ordered array allows quick search using binary search:
running time to find an item is O(logN)
– A linked list allows the fastest insertion and deletion, O(1),
whereas searching is slow, O(N)
• Binary Tree
– Each node has at most 2 children (left and right)
– A binary search tree combines the advantages of 2 structures:
the ordered array (quick search) and the linked list (quick
insertion and deletion)
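A minimal Python sketch of this idea, assuming a hypothetical `Node` class and `insert`/`find` helpers that are not part of the slides: each node holds a key and at most two child references, so inserting only relinks nodes (as in a linked list), while searching discards one subtree per comparison (as in binary search).

```python
class Node:
    """A binary tree node: one key and at most two children."""

    def __init__(self, key):
        self.key = key
        self.left = None    # subtree with smaller keys
        self.right = None   # subtree with larger or equal keys


def insert(root, key):
    """Insert key below root and return the root of the tree."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def find(root, key):
    """Return (found, comparisons) for a search that starts at root."""
    node, comparisons = root, 0
    while node is not None:
        comparisons += 1
        if key == node.key:
            return True, comparisons
        node = node.left if key < node.key else node.right
    return False, comparisons


root = None
for key in [50, 25, 75, 12, 37]:
    root = insert(root, key)

print(find(root, 37))   # (True, 3): visits 50, 25, 37
print(find(root, 60))   # (False, 2): visits 50, 75, then runs out of nodes
```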
Analogy
• A commonly encountered tree is the hierarchical file
structure in a computer system.
– The root directory of a given device is the tree’s root: C:\
– Subdirectories one level below are the children of C:\
– Files represent leaves; they have no children of their own
– A hierarchical file structure is not a binary tree because a
directory can have many children
– Complete path name from the root to the leaf SMITH.DAT:
C:\SALES\EAST\NOVEMBER\SMITH.DAT
• Differences
– Subdirectories contain no data, only references to other
subdirectories or files. Only files contain data.
– In a tree, every node contains data (a personnel record, car-part
specifications, or whatever). In addition to the data, all nodes
except leaves contain references to other nodes.
Unbalanced Tree
• Most of the nodes are on one side of the root or the other
• Individual subtrees can also be unbalanced
• Why unbalanced?
– Because of the order in which data items are inserted
– If they are inserted in random order, the tree will be more or
less balanced
– If an ascending or descending sequence is inserted, all the
children will be on the right or on the left side, and the tree
becomes unbalanced
– Tree efficiency can be seriously degraded
Degenerates to O(N)
• When there are no branches, the tree becomes in effect a
linked list
– Arrangement of data becomes 1D instead of 2D
– Searching becomes O(N) instead of O(logN)
– Example: Searching through 10,000 items in a fully unbalanced
tree requires 5,000 comparisons on average, whereas a balanced
tree requires only about 14 (log2 of 10,000 is roughly 13.3)
– With a realistic amount of random data it's not likely a tree would
become seriously unbalanced. However, there may be runs of
sorted data that will partially unbalance a tree.
– Searching partially unbalanced trees will take time somewhere
between O(N) and O(logN), depending on how badly the tree is
unbalanced.
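The effect of insertion order can be checked with a small experiment. This is only a sketch: `bst_height` is a hypothetical helper that stores child links in two dictionaries, it assumes distinct keys, and it uses 1,000 items (not the 10,000 of the example) so that the sorted case finishes quickly.

```python
import math
import random


def bst_height(keys):
    """Insert distinct keys into a plain BST; return the node count of the longest path."""
    left, right = {}, {}          # child links, indexed by the parent's key
    root, height = None, 0
    for key in keys:
        if root is None:
            root, height = key, 1
            continue
        node, depth = root, 1
        while True:
            links = left if key < node else right
            if node not in links:
                links[node] = key              # attach as a new leaf
                height = max(height, depth + 1)
                break
            node = links[node]
            depth += 1
    return height


n = 1000
sorted_height = bst_height(range(n))           # ascending insertion order

shuffled = list(range(n))
random.Random(0).shuffle(shuffled)             # fixed seed, an arbitrary choice
random_height = bst_height(shuffled)

print(sorted_height)                  # 1000: every node is a right child
print(random_height)                  # far smaller than n
print(math.ceil(math.log2(10_000)))   # 14, the figure quoted for 10,000 items
```

With sorted input the "tree" is a chain of n nodes, so a search visits about n/2 nodes on average; with shuffled input the longest path stays within a small multiple of log2(n).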
Traversing a Binary Tree
• Inorder traversal
– Visits all the nodes of a binary search tree in ascending
key order
– If you want to create a sorted list of the data in a
binary tree, this is one way to do it
• Preorder traversal, Postorder traversal
– These traversals are useful if you're writing
programs that parse or analyze algebraic
expressions (for example, in compiler construction)
Preorder, Postorder Traversal
• A binary tree (not a binary search tree) can be used to
represent an algebraic expression that involves the binary
arithmetic operators +, -, /, and *.
• The root node holds an operator, and each of its subtrees
represents either a variable name (like A, B, or C) or another
expression.
[Figure: expression tree with root *, left child A and right child +;
the + node has children B and C]
– Inorder: A*(B+C) (infix expression)
– Preorder: *A+BC (prefix)
– Postorder: ABC+* (postfix)
• How do you find the minimum and maximum values in a
BST?
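The three traversals of the expression tree can be sketched as follows. The `Node` class and function names are hypothetical, and the inorder version adds parentheses around nested subexpressions (an assumption made here so that the output matches the infix form A*(B+C); a plain inorder walk would print A*B+C).

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Node first, then left subtree, then right subtree."""
    if node is None:
        return ""
    return node.value + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then the node."""
    if node is None:
        return ""
    return postorder(node.left) + postorder(node.right) + node.value


def inorder(node, is_root=True):
    """Left subtree, node, right subtree; nested subexpressions are parenthesized."""
    if node is None:
        return ""
    if node.left is None and node.right is None:
        return node.value                      # a variable name
    text = inorder(node.left, False) + node.value + inorder(node.right, False)
    return text if is_root else "(" + text + ")"


tree = Node("*", Node("A"), Node("+", Node("B"), Node("C")))
print(inorder(tree))    # A*(B+C)
print(preorder(tree))   # *A+BC
print(postorder(tree))  # ABC+*
```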
Traversing a Tree
• Problem:
– Create a BST using the following items:
• 50, 25, 75, 37, 62, 84, 31, 43, 55, 92
– Traverse inorder, preorder and postorder and display the
numbers
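One possible worked solution, as a sketch with hypothetical helper names. It also answers the earlier question about minimum and maximum: in a BST the minimum is the leftmost node and the maximum is the rightmost node.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Recursive BST insert; returns the root of the updated subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def preorder(node):
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)


def postorder(node):
    return [] if node is None else postorder(node.left) + postorder(node.right) + [node.key]


def minimum(node):
    while node.left is not None:     # keep going left
        node = node.left
    return node.key


def maximum(node):
    while node.right is not None:    # keep going right
        node = node.right
    return node.key


root = None
for key in [50, 25, 75, 37, 62, 84, 31, 43, 55, 92]:
    root = insert(root, key)

print(inorder(root))    # [25, 31, 37, 43, 50, 55, 62, 75, 84, 92]
print(preorder(root))   # [50, 25, 37, 31, 43, 75, 62, 55, 84, 92]
print(postorder(root))  # [31, 43, 37, 25, 55, 62, 92, 84, 75, 50]
print(minimum(root), maximum(root))  # 25 92
```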
Summary of BST
• Advantages of BST
– Quick insertion, deletion and search
– Running time O(logN) when the tree is balanced
– Arrays, sorted arrays and linked lists each perform one or another
of these activities slowly
– A BST might appear to be the ideal data storage structure
• Disadvantages
– A BST works well if the data is inserted into the tree in random order
– It becomes much slower if the data is inserted in already sorted order
(ascending or descending)
– The tree becomes unbalanced as the sorted items are inserted
– Running time becomes O(N), whereas for a balanced tree it is O(logN)
• How to solve the unbalanced problem of BST?
• RED-BLACK, 2-3-4 TREES ARE THE SOLUTION!!!
Red-Black Tree- Balance to the Rescue
• To guarantee the quick O(log N) search times a tree is
capable of, we need to ensure that our tree is always
balanced (or at least almost balanced).
• This means that each node in a tree needs to have
roughly the same number of descendants on its left side
as it has on its right.
• In a red-black tree, balance is achieved during insertion.
As an item is being inserted, the insertion routine checks
that certain characteristics of the tree are not violated. If
they are, it takes corrective action, restructuring the tree
as necessary.
• By maintaining these characteristics, the tree is kept
balanced.
Red-Black Rules
• When inserting (or deleting) a new node, certain rules,
which we call the red-black rules, must be followed. If
they're followed, the tree will be balanced.
– 1. Every node is either red or black
– 2. The root is always black
– 3. If a node is red, its children must be black (although the
converse isn't necessarily true)
– 4. Every path from the root to a leaf, or to a null child, must
contain the same number of black nodes
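The four rules can be turned into a small checker. This is a sketch under assumptions: the `Node` class, the color strings and the function `violated_rules` are hypothetical names, and Rule 4 is tested by counting black nodes on every path from the root down to a null child.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def violated_rules(root):
    """Return the sorted list of red-black rule numbers that the tree breaks."""
    found = set()
    if root is not None and root.color != BLACK:
        found.add(2)                           # Rule 2: the root is black
    black_counts = set()

    def walk(node, parent_is_red, blacks):
        if node is None:
            black_counts.add(blacks)           # reached a null child
            return
        if node.color not in (RED, BLACK):
            found.add(1)                       # Rule 1: red or black only
        if parent_is_red and node.color == RED:
            found.add(3)                       # Rule 3: red parent, red child
        if node.color == BLACK:
            blacks += 1
        walk(node.left, node.color == RED, blacks)
        walk(node.right, node.color == RED, blacks)

    walk(root, False, 0)
    if len(black_counts) > 1:
        found.add(4)                           # Rule 4: black counts differ
    return sorted(found)


valid = Node(50, BLACK, Node(25, RED), Node(75, RED))
red_red = Node(50, BLACK,
               Node(25, BLACK, Node(12, RED, Node(6, RED))),
               Node(75, BLACK))
print(violated_rules(valid))     # []
print(violated_rules(red_red))   # [3]
```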
Red-Black Rules (Cont’d)
• The "null child" referred to in Rule 4 is a place where a
child could be attached to a nonleaf node.
• In other words, it's the potential left child of a node with a
right child, or the potential right child of a node with a left
child. This will make more sense as we go along.
• The number of black nodes on a path from root to leaf is
called the black height.
• Another way to state Rule 4 is that the black height must
be the same for all paths from the root to a leaf.
Duplicate Keys
• What happens if there is more than one data item with
the same key?
– Example: for 50, 50, 50, the 2nd 50 goes to the right of the 1st
one and the 3rd 50 goes to the left of the 1st 50. Otherwise, the
tree becomes unbalanced
• What if the insertion process randomizes where duplicates go?
– The search process becomes complicated if all items with the
same key must be found
When the Color Rules Are Violated
• What if the color rules are violated?
• Two possible actions:
– Change the colors of nodes
– Perform rotations: rearrange the nodes to make the tree
balanced
Experiment 1
• RB Tree: 50, 25, 75
[Figure: root 50 (black) with children 25 and 75 (both red)]
• Newly inserted nodes are always colored red (except for the
root). This is not an accident. It's less likely that inserting a
red node will violate the red-black rules than inserting a black
one.
• This is because if the new red node is attached to a black one,
no rule is broken. It doesn't create a situation in which there
are two red nodes together (Rule 3), and it doesn't change the
black height in any of the paths (Rule 4).
Experiment 1 (Cont’d)
• RB Tree: 50, 25, 75
[Figure: root 50 with children 25 and 75]
• If you attach a new red node to a red node, Rule 3 will be
violated. However, with any luck this will only happen half the
time.
• Whereas, if it were possible to add a new black node, it would
always change the black height for its path, violating Rule 4.
• Also, it's easier to fix violations of Rule 3 (parent and child are
both red) than Rule 4 (black heights differ)
Experiment 2- Rotation
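• Rotate right (RoR) with 50 at the top
[Figure: 50 with children 25 and 75; after RoR, 25 is the root, 50
is its right child and 75 is the right child of 50]
• Unbalanced: more nodes to the right of the root than to the
left
• The RB rules are violated (Rule 2): the root must always be
black
• Now rotate left (RoL) and the tree becomes balanced
[Figure: after RoL, 50 is the root again with children 25 and 75]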
Experiment 3- Flip Color
• Insert 12
[Figure: root 50 (black) with red children 25 and 75]
• You cannot insert 12 in the current arrangement
• A color flip is necessary whenever, during the insertion
process, a black node with two red children is encountered.
• The root's two children change from red to black. Ordinarily the
parent would change from black to red, but this is a special case
because it's the root: it remains black to avoid violating Rule 2.
Now all three nodes are black.
[Figure: after the color flip, 50, 25 and 75 are all black; 12 is
then inserted below 25]
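The color flip described on this slide can be sketched as a small function. The names `Node` and `color_flip` and the `is_root` flag are hypothetical; the sketch only covers the flip itself, not a full insertion routine.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color=RED):      # new nodes start out red
        self.key = key
        self.color = color
        self.left = None
        self.right = None


def color_flip(parent, is_root):
    """Flip colors if parent is black with two red children; return True if flipped."""
    left, right = parent.left, parent.right
    if (parent.color != BLACK or left is None or right is None
            or left.color != RED or right.color != RED):
        return False
    left.color = right.color = BLACK          # both children become black
    if not is_root:
        parent.color = RED                    # the root stays black (Rule 2)
    return True


root = Node(50, BLACK)
root.left, root.right = Node(25), Node(75)

flipped = color_flip(root, is_root=True)
root.left.left = Node(12)                     # 12 goes in red, below the now-black 25

print(flipped)                                               # True
print(root.color, root.left.color, root.right.color)         # black black black
print(root.left.left.key, root.left.left.color)              # 12 red
```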
Experiment 3- Flip Color (Cont’d)
• The tree is still red-black: the root is black, there's no
situation in which a parent and child are both red, and all the
paths have the same number of black nodes (2).
• Adding the new red node didn't change the red-black
correctness.
[Figure: 50, 25 and 75 (all black after the color flip), with the
new red node 12 below 25]
Experiment 4- Unbalanced
• One path has one more node than the other. This isn't very
unbalanced, and no red-black rules are violated, so neither we
nor the red-black algorithms need to worry about it.
[Figure: root 50 with children 25 and 75; 12 is the left child of 25]
• However, suppose that one path differs from another by two or
more levels (where level is the same as the number of nodes
along the path).
• In this case the red-black rules will be violated in a tree this
small, and we'll need to rebalance the tree.
Experiment 4- Unbalanced
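• Insert 6
[Figure: root 50 with children 25 and 75; 12 is the left child of 25
and the new red node 6 is the left child of 12]
• Rule 3 is violated: the children of a red node must be black
• How can we fix things so Rule 3 isn't violated?
• An obvious approach is to change one of the offending nodes to
black. Let's try changing the child node, 6, to black.
[Figure: the same tree with 6 recolored black]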
Experiment 4- Unbalanced (Cont’d)
• The good news is we fixed the problem of both parent and child
being red. The bad news is that Rule 4 is now violated: the path
from the root to node 6 has three black nodes in it, while the
path from the root to node 75 has only two. It seems we can't
win.
[Figure: root 50 with children 25 and 75; 12 below 25; 6 (shown
red, then black) below 12]
• This problem can be fixed with a rotation and some color
changes.
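One way this repair can go, sketched under assumptions: the slide does not spell out the steps, so the sequence below (recolor the parent 12 black and the grandparent 25 red, then rotate right with 25 at the top) is a common textbook fix for a red-red conflict on an outside grandchild, not necessarily the exact procedure intended here. `Node`, `rotate_right` and `black_counts` are hypothetical names.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def rotate_right(top):
    """Lift top's left child into top's place and return the new top."""
    new_top = top.left
    top.left = new_top.right      # inside grandchild crosses over (None here)
    new_top.right = top
    return new_top


def black_counts(node, blacks=0):
    """Set of black-node counts over all paths from node down to a null child."""
    if node is None:
        return {blacks}
    if node.color == BLACK:
        blacks += 1
    return black_counts(node.left, blacks) | black_counts(node.right, blacks)


# The tree from the experiment, with a red 6 below the red 12.
x = Node(6, RED)
p = Node(12, RED, left=x)
g = Node(25, BLACK, left=p)
root = Node(50, BLACK, left=g, right=Node(75, BLACK))

p.color, g.color = BLACK, RED     # the color changes
root.left = rotate_right(g)       # the rotation: 12 rises, 25 drops

sub = root.left
print(sub.key, sub.color)                # 12 black
print(sub.left.key, sub.left.color)      # 6 red
print(sub.right.key, sub.right.color)    # 25 red
print(black_counts(root))                # {2}
```

After the repair no red node has a red child, and every path to a null child passes through two black nodes.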
RB Rules and Balanced Tree
• Try to create a small tree like the ones above that is unbalanced
by two or more levels but is red-black correct. As it turns out,
this is impossible.
• That's why the red-black rules keep the tree balanced. Every
path must contain the same number of black nodes (Rule 4), so a
longer path can only be longer because of extra red nodes, and
red nodes can never be adjacent (Rule 3). In general, no path
can have more than twice as many nodes as any other.
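In general, the red-black rules bound how unbalanced a tree can get rather than ruling out every difference in path length. A sketch of the standard argument, where the symbols are introduced here for illustration: b is the number of black nodes on any path from the root to a null child, and N is the number of nodes in the tree.

```latex
\[
\begin{aligned}
\text{shortest path} &\ge b && \text{(a path made of black nodes only)} \\
\text{longest path} &\le 2b && \text{(black and red alternate; Rule 3 forbids two reds in a row)} \\
\text{longest path} &\le 2 \times \text{shortest path} \\
N \ge 2^{b} - 1 \;&\Rightarrow\; \text{longest path} \le 2\log_2(N + 1)
\end{aligned}
\]
```

The last line is why search in a red-black tree stays O(logN) even in the worst case.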
Null Children
• Rule 4: All paths that go from the root to any leaf or to any null
child must have the same number of black nodes.
• A null child is a child that a nonleaf node might have, but
doesn't. Thus in the figure, the path from 50 to 25 to the right
child of 25 (its null child) has only one black node, which is not
the same as the paths to 6 and 75, which have 2.
• This arrangement violates Rule 4, although both paths to leaf
nodes have the same number of black nodes.
[Figure: root 50 with children 25 and 75; 12 is the left child of 25
and 6 is the left child of 12; the missing children are drawn as
NULL, including the right child of 25]
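The black counts for this figure can be listed path by path. This is a sketch with one loud assumption: the node colors are not recoverable from the figure text, so they are inferred from the counts quoted above (50 black, 25 red, 12 black, 6 red, 75 black). `Node` and `null_paths` are hypothetical names.

```python
class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def null_paths(node, path=()):
    """Yield (keys on the path, black count) for every null child in the tree."""
    if node is None:
        keys = [n.key for n in path]
        blacks = sum(1 for n in path if n.color == "black")
        yield keys, blacks
        return
    yield from null_paths(node.left, path + (node,))
    yield from null_paths(node.right, path + (node,))


# Colors below are an assumption inferred from the counts in the text.
tree = Node(50, "black",
            Node(25, "red", Node(12, "black", Node(6, "red"))),
            Node(75, "black"))

results = list(null_paths(tree))
for keys, blacks in results:
    print(keys, blacks)
# The path [50, 25] (to the missing right child of 25) has 1 black node;
# every other path to a null child has 2.
```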
Rotation
• To balance a tree, it is necessary to physically rearrange
the nodes. If all the nodes are on the left of the root, for
example, you need to move some of them over to the right
side. This is done using rotations.
• Rotations are ways to rearrange nodes. They were
designed to do the following two things:
– Raise some nodes and lower others to help balance the tree.
– Ensure that the characteristics of a binary search tree are not
violated.
• Recall that in a binary search tree the left children of any
node have key values less than the node, while its right
children have key values greater than or equal to the node.
Rotation- Importance
• If the rotation didn't maintain a valid binary search tree it
wouldn't be of much use, because the search algorithm
relies on the search-tree arrangement.
• Color rules and node color changes are used only to help
decide when to perform a rotation; fiddling with the colors
doesn't accomplish anything by itself; it's the rotation that's
the heavy hitter.
• Color rules are like rules of thumb for building a house
(such as "exterior doors open inward"), while rotations are
like the hammering and sawing needed to actually build it.
Mind the Children
• During a right rotation, the top node must have a left
child. Otherwise there's nothing to rotate into the
top spot.
• During a left rotation, the top node must have a right
child.
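Both rotations, including the "mind the children" checks, can be sketched as below. The names are hypothetical, and colors are left out because a rotation only rearranges links. The usage reproduces Experiment 2: a right rotation on 50, 25, 75 followed by a left rotation that restores the original tree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(top):
    """Raise top's left child to the top; return the new top node."""
    if top.left is None:
        raise ValueError("right rotation needs a left child")
    new_top = top.left
    top.left = new_top.right
    new_top.right = top
    return new_top


def rotate_left(top):
    """Raise top's right child to the top; return the new top node."""
    if top.right is None:
        raise ValueError("left rotation needs a right child")
    new_top = top.right
    top.right = new_top.left
    new_top.left = top
    return new_top


def shape(node):
    """Nested-tuple view (key, left, right) used only for printing."""
    if node is None:
        return None
    return (node.key, shape(node.left), shape(node.right))


root = Node(50, Node(25), Node(75))
root = rotate_right(root)
after_ror = shape(root)
print(after_ror)    # (25, None, (50, None, (75, None, None)))

root = rotate_left(root)
after_rol = shape(root)
print(after_rol)    # (50, (25, None, None), (75, None, None))

try:
    rotate_right(Node(10))          # no left child: nothing to rotate up
except ValueError as err:
    print(err)                      # right rotation needs a left child
```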
The Weird Crossover Node
• Create a RB Tree: 50, 25, 75, 12, 37
[Figure: 50 with children 25 and 75; adding 12 below 25 violates
Rule 3]
[Figure: flip the colors: the children change color; the parent's
color does not change, because it is the root node]
[Figure: 50 with children 25 and 75; 25 has children 12 and 37]
The Weird Crossover Node- Rotation
[Figure: before RoR, 50 is at the top with children 25 and 75, and
25 has children 12 (outside grandchild) and 37 (inside grandchild).
After RoR, 25 is at the top with children 12 and 50, and 50 has
children 37 (the crossover node) and 75]
The inside grandchild, if it's the child of the node that's going up (which is the
left child of the top node in a right rotation), is always disconnected from its
parent and reconnected to its former grandparent. It's like becoming your own
uncle.
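The crossover can be seen in a compact sketch that represents a tree as a nested tuple `(key, left, right)`. This representation and the helper names are assumptions made for brevity; the rotation builds a new tuple instead of relinking nodes.

```python
def leaf(key):
    """A node with no children."""
    return (key, None, None)


def rotate_right(tree):
    """Return a new tree with the left child of the top node raised to the top."""
    top, left, right = tree
    if left is None:
        raise ValueError("right rotation needs a left child")
    rising, outside, inside = left
    # The outside grandchild stays with the rising node; the inside
    # grandchild crosses over to become the left child of the old top.
    return (rising, outside, (top, inside, right))


before = (50, (25, leaf(12), leaf(37)), leaf(75))
after = rotate_right(before)
print(after)
# (25, (12, None, None), (50, (37, None, None), (75, None, None)))
```

Node 37 starts as the right child of 25 and ends as the left child of 50, which keeps the search-tree ordering intact (25 < 37 < 50).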
Example
• Insert 62, 87, 6, 18, 31, 43
[Figure: 50 with children 25 and 75; 25 has children 12 and 37;
75 has children 62 and 87; 6 is added below 12]
– Violates Rule 3
– Flip the colors: parent and children change color
Example- RB Tree
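• Insert 62, 87, 6, 18, 31, 43
[Figure: 50 with children 25 and 75; 25 has children 12 and 37;
75 has children 62 and 87; 12 has children 6 and 18; 37 has
children 31 and 43]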
Subtree Rotation (RoR)
[Figure: after RoR, 25 is at the top with children 12 and 50; 12 has
children 6 and 18; 50 has children 37 and 75; 37 has children 31
and 43; 75 has children 62 and 87. Caption: violates Rules 2 and 4]
Observations:
• The top node (50) moves down into the position of its right
child.
• The top node's left child (25) goes to the top.
• The entire subtree of which 12 is the root moves up.
• The entire subtree of which 37 is the root moves
across to become the left child of 50.
• The entire subtree of which 75 is the root moves down.
Complete Red Black Tree
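[Figure: the complete red-black tree: 25 at the top with children
12 and 50; 12 has children 6 and 18; 50 has children 37 and 75;
37 has children 31 and 43; 75 has children 62 and 87]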
Inserting a New Node
• Let X, P, and G designate a pattern of related
nodes. X is a node that has caused a rule
violation. (Sometimes X refers to a newly inserted
node, and sometimes to the child node when a
parent and child have a red-red conflict.)
• X is a particular node.
• P is the parent of X.
• G is the grandparent of X (the parent of P).
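With parent references, P and G can be found directly from X. The sketch below assumes a hypothetical `Node` class that stores a `parent` link, and it adds a helper that tells whether X is an outside or an inside grandchild of G, using the terms from the rotation slides.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None


def add_child(parent, key):
    """Attach a new node below parent on the side the search-tree order requires."""
    child = Node(key, parent)
    if key < parent.key:
        parent.left = child
    else:
        parent.right = child
    return child


def parent_and_grandparent(x):
    """Return (P, G) for node X; either may be None near the root."""
    p = x.parent
    g = p.parent if p is not None else None
    return p, g


def is_outside_grandchild(x):
    """True if X is on the same side of P as P is of G."""
    p, g = parent_and_grandparent(x)
    if g is None:
        raise ValueError("X has no grandparent")
    return (x is p.left) == (p is g.left)


g = Node(50)
p = add_child(g, 25)
x_outside = add_child(p, 12)
x_inside = add_child(p, 37)

found_p, found_g = parent_and_grandparent(x_outside)
print(found_p.key, found_g.key)            # 25 50
print(is_outside_grandchild(x_outside))    # True
print(is_outside_grandchild(x_inside))     # False
```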