1431227-3 File Organization and Processing
Week 2: Binary Search Tree and Red-Black Tree

Introduction
• Tree
  – Root, nodes (children), edges
  – An ordered array can be searched quickly using binary search
  – Running time to search for an item: O(log N)
  – Insertion and deletion are fastest in a linked list, O(1), whereas searching is slow, O(N)
• Binary Tree
  – Each node has at most 2 children (left and right)
  – Combines the advantages of two structures: the ordered array and the linked list

Analogy
• A commonly encountered tree is the hierarchical file structure of a computer system.
  – The root directory of a given device is the tree’s root: c:\
  – Subdirectories one level below are the children of c:\
  – Files represent leaves; they have no children of their own
  – A hierarchical file structure is not a binary tree, because a directory may have many children
  – Complete path name from the root to the leaf SMITH.DAT: C:\SALES\EAST\NOVEMBER\SMITH.DAT
• Differences
  – Subdirectories contain no data, only references to other subdirectories or files. Only files contain data.
  – In a tree, every node contains data (a personnel record, car-part specifications, or whatever). In addition to the data, all nodes except leaves contain references to other nodes.

Unbalanced Tree
• Most of the nodes are on one side of the root or the other
• Individual subtrees can also be unbalanced
• Why unbalanced?
  – Because of the order in which the data items are inserted
  – If they are inserted in random order, the tree will be more or less balanced
  – If an ascending or descending sequence is inserted, all the children end up on the right side or on the left side, and the tree becomes unbalanced
  – Tree efficiency can be seriously degraded

Degenerates to O(N)
• When there are no branches, the tree becomes in effect a linked list
  – The arrangement of data becomes one-dimensional instead of two-dimensional
  – Searching becomes O(N) instead of O(log N)
  – Example: searching through 10,000 items in such an unbalanced tree requires 5,000 comparisons on average, whereas a balanced tree built by random insertion requires about 14 (log2 of 10,000 is roughly 13.3, so a balanced tree has 14 levels)
  – With a realistic amount of random data it's not likely a tree would become seriously unbalanced. However, there may be runs of sorted data that will partially unbalance a tree.
  – Searching partially unbalanced trees will take time somewhere between O(N) and O(log N), depending on how badly the tree is unbalanced.

Traversing a Binary Tree
• Inorder traversal
  – All the nodes are visited in ascending order of key
  – If you want to create a sorted list of the data in a binary search tree, this is one way to do it
• Preorder traversal, postorder traversal
  – These traversals are useful if you're writing programs that parse or analyze algebraic expressions (compiler construction)

Preorder, Postorder Traversal
• A binary tree (not a binary search tree) can be used to represent an algebraic expression that involves the binary arithmetic operators +, -, /, and *.
• The root node holds an operator, and each of its subtrees represents either a variable name (like A, B, or C) or another expression.
[Figure: expression tree with * at the root, A as its left child, and + as its right child; B and C are the children of +]
  – Inorder: A*(B+C) (infix expression)
  – Preorder: *A+BC (prefix)
  – Postorder: ABC+* (postfix)
• How can the minimum and maximum values of a BST be found?
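The three traversals of the expression tree above, and the closing question about the minimum and maximum of a BST, can be sketched in Python. This is a minimal illustration, not code from the course: the names Node, bst_insert, bst_min and bst_max are assumptions, and the sample keys are chosen only for the demonstration. The sketch assumes that the minimum is reached by following left links from the root and the maximum by following right links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type: a key plus left and right child links.
    key: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder(node):
    """Left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def preorder(node):
    """The node first, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]


def bst_insert(root, key):
    """Plain, unbalanced BST insertion; keys equal to a node go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root


def bst_min(root):
    """Follow left links until there is no left child."""
    while root.left is not None:
        root = root.left
    return root.key


def bst_max(root):
    """Follow right links until there is no right child."""
    while root.right is not None:
        root = root.right
    return root.key


# Expression tree for A*(B+C): * at the root, A on the left, + on the right.
expr = Node("*", Node("A"), Node("+", Node("B"), Node("C")))
print("".join(preorder(expr)))   # *A+BC
print("".join(postorder(expr)))  # ABC+*
print("".join(inorder(expr)))    # A*B+C (the tree stores no parentheses)

# Sample keys picked for this sketch only.
bst = None
for k in [50, 25, 75, 12, 37]:
    bst = bst_insert(bst, k)
print(inorder(bst), bst_min(bst), bst_max(bst))
```

Note that the inorder output has no parentheses: they are implied by the shape of the tree and have to be added back when an infix expression is printed.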
Traversing a Tree
• Problem:
  – Create a BST using the following items: 50, 25, 75, 37, 62, 84, 31, 43, 55, 92
  – Traverse it inorder, preorder and postorder, and display the numbers

Summary of BST
• Advantages of BST
  – Quick insertion, deletion and search
  – Running time: O(log N)
  – Unsorted arrays, sorted arrays and linked lists each perform one or another of these operations slowly
  – A BST might appear to be the ideal data storage structure
• Disadvantages
  – A BST works well if the data is inserted into the tree in random order
  – It becomes much slower if the data is inserted in already sorted order (ascending or descending)
  – With sorted input, the tree becomes more unbalanced with every item inserted
  – Running time becomes O(N), whereas for a balanced tree it is O(log N)
• How to solve the unbalanced problem of BST?
• Red-black trees and 2-3-4 trees are the solution!

Red-Black Tree: Balance to the Rescue
• To guarantee the quick O(log N) search times a tree is capable of, we need to ensure that our tree is always balanced (or at least almost balanced).
• This means that each node in a tree needs to have roughly the same number of descendants on its left side as it has on its right.
• In a red-black tree, balance is achieved during insertion. As an item is being inserted, the insertion routine checks that certain characteristics of the tree are not violated. If they are, it takes corrective action, restructuring the tree as necessary.
• By maintaining these characteristics, the tree is kept balanced.

Red-Black Rules
• When inserting (or deleting) a node, certain rules, which we call the red-black rules, must be followed. If they're followed, the tree will be balanced.
  – 1.
Every node is either red or black
  – 2. The root is always black
  – 3. If a node is red, its children must be black (although the converse isn't necessarily true)
  – 4. Every path from the root to a leaf, or to a null child, must contain the same number of black nodes

Red-Black Rules (Cont’d)
• The "null child" referred to in Rule 4 is a place where a child could be attached to a nonleaf node.
• In other words, it's the potential left child of a node with a right child, or the potential right child of a node with a left child. This will make more sense as we go along.
• The number of black nodes on a path from root to leaf is called the black height.
• Another way to state Rule 4 is that the black height must be the same for all paths from the root to a leaf.

Duplicate Keys
• What happens if more than one data item has the same key?
  – Example: 50, 50, 50. The second 50 goes to the right of the first one, and the third 50 goes to the left of the first 50. Otherwise the tree becomes unbalanced.
• What about randomizing the side on which duplicates are inserted?
  – The search process becomes complicated if all items with the same key must be found

Color Rules Are Violated
• What if the color rules are violated?
• Two possible actions:
  – Change the colors of nodes
  – Perform rotations: rearrange the nodes to make the tree balanced

Experiment 1
• RB Tree: 50, 25, 75
[Figure: tree with 50 at the root and 25 and 75 as its children]
• Newly inserted nodes are always colored red (except for the root). This is not an accident. Inserting a red node is less likely to violate the red-black rules than inserting a black one.
• This is because if the new red node is attached to a black one, no rule is broken. It doesn't create a situation in which there are two red nodes together (Rule 3), and it doesn't change the black height in any of the paths (Rule 4).

Experiment 1
• RB Tree: 50, 25, 75
• If you attach a new red node to a red node, Rule 3 will be violated. However, with any luck this will only happen half the time.
[Figure: tree with 50 at the root and 25 and 75 as its children]
• Whereas, if it were possible to add a new black node, it would always change the black height for its path, violating Rule 4.
• Also, it's easier to fix violations of Rule 3 (parent and child are both red) than Rule 4 (black heights differ)

Experiment 2: Rotation
• Rotate right (RoR)
• Unbalanced: there are more nodes to the right of the root than to the left
• The RB rules are violated (Rule 2): the root must always be black
• Now rotate left (RoL) and the tree becomes balanced
[Figure: RoR moves 25 to the root, with 50 and 75 below it on the right; RoL restores 50 at the root with 25 and 75 as its children]

Experiment 3: Flip Color
• Insert 12
• You cannot insert 12 in the current arrangement
• A color flip is necessary whenever, during the insertion process, a black node with two red children is encountered.
• The root's two children change from red to black. Ordinarily the parent would change from black to red, but this is a special case because it's the root: it remains black to avoid violating Rule 2. Now all three nodes are black.
[Figure: color flip on the tree 50, 25, 75, followed by the insertion of 12]

Experiment 3: Flip Color (Cont’d)
• The tree is still red-black correct: the root is black, there's no situation in which a parent and child are both red, and all the paths have the same number of black nodes (2).
• Adding the new red node didn't change the red-black correctness.
[Figure: color flip on the tree 50, 25, 75, followed by the insertion of 12]

Experiment 4: Unbalanced
• One path has one more node than the other. This isn't very unbalanced, and no red-black rules are violated, so neither we nor the red-black algorithms need to worry about it.
• However, suppose that one path differs from another by two or more levels (where level is the same as the number of nodes along the path).
[Figure: tree with 50 at the root, 25 and 75 as its children, and 12 below 25]
• In this case the red-black rules will always be violated, and we'll need to rebalance the tree.

Experiment 4: Unbalanced
• Insert 6
• Rule 3 is violated: the children of a red node must be black
[Figure: 6 inserted below 12 in the tree 50, 25, 75, 12]
• How can we fix things so Rule 3 isn't violated?
• An obvious approach is to change one of the offending nodes to black.
Let's try changing the child node, 6, to black.

Experiment 4: Unbalanced (Cont’d)
• The good news is we fixed the problem of both parent and child being red. The bad news is that Rule 4 is now violated: the path from the root to node 6 has three black nodes in it, while the path from the root to node 75 has only two. It seems we can't win.
[Figure: tree 50, 25, 75, 12, 6 with node 6 recolored black]
• This problem can be fixed with a rotation and some color changes.

RB Rules and Balanced Tree
• Try to create a tree that is unbalanced by two or more levels but is red-black correct. As it turns out, this is impossible.
• That's why the red-black rules keep the tree balanced. If one path is more than one node longer than another, then it must either have more black nodes, violating Rule 4, or it must have two adjacent red nodes, violating Rule 3.

Null Children
• Rule 4: All paths that go from the root to any leaf or to any null child must have the same number of black nodes.
• A null child is a child that a nonleaf node might have, but doesn't. Thus in the figure the path from 50 to 25 to the right child of 25 (its null child) has only one black node, which is not the same as the paths to 6 and 75, which have 2.
• This arrangement violates Rule 4, although both paths to leaf nodes have the same number of black nodes.
[Figure: tree 50, 25, 75, 12, 6 with the null children marked NULL]

Rotation
• To balance a tree, it is necessary to physically rearrange the nodes. If all the nodes are on the left of the root, for example, you need to move some of them over to the right side. This is done using rotations.
• Rotations are ways to rearrange nodes. They were designed to do the following two things:
  – Raise some nodes and lower others to help balance the tree.
  – Ensure that the characteristics of a binary search tree are not violated.
• Recall that in a binary search tree the left children of any node have key values less than the node, while its right children have key values greater than or equal to the node.
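Rules 2 to 4, including the null-child reading of Rule 4 discussed in the preceding slides, can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: RBNode, black_height and is_red_black are hypothetical names, and the node colors in the null-children example are inferred from the black-node counts given in the text, since the figure itself is not available.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    # Hypothetical node type: key, color, and two child links.
    key: int
    color: str
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def black_height(node):
    """Count the black nodes on every path from `node` down to a null child.

    Raises ValueError if Rule 3 (red node with a red child) or Rule 4
    (paths with different black counts) is broken below `node`.
    """
    if node is None:
        return 0  # a null child contributes no black nodes
    for child in (node.left, node.right):
        if node.color == RED and child is not None and child.color == RED:
            raise ValueError(f"Rule 3: red {node.key} has red child {child.key}")
    left_height = black_height(node.left)
    right_height = black_height(node.right)
    if left_height != right_height:
        raise ValueError(f"Rule 4: black heights differ below {node.key}")
    return left_height + (1 if node.color == BLACK else 0)


def is_red_black(root):
    """True if the tree satisfies Rules 2, 3 and 4."""
    if root is None:
        return True
    if root.color != BLACK:
        return False  # Rule 2: the root is always black
    try:
        black_height(root)
    except ValueError:
        return False
    return True


# After the color flip and the insertion of 12: all paths have 2 black nodes.
ok = RBNode(50, BLACK, RBNode(25, BLACK, RBNode(12, RED)), RBNode(75, BLACK))

# Inserting a red 6 below the red 12 breaks Rule 3.
red_red = RBNode(50, BLACK,
                 RBNode(25, BLACK, RBNode(12, RED, RBNode(6, RED))),
                 RBNode(75, BLACK))

# Recoloring 6 black removes the red-red pair but breaks Rule 4.
recolored = RBNode(50, BLACK,
                   RBNode(25, BLACK, RBNode(12, RED, RBNode(6, BLACK))),
                   RBNode(75, BLACK))

# Null-children example; colors assumed (25 red, 12 black, 6 red).
null_case = RBNode(50, BLACK,
                   RBNode(25, RED, RBNode(12, BLACK, RBNode(6, RED))),
                   RBNode(75, BLACK))

print(is_red_black(ok), is_red_black(red_red),
      is_red_black(recolored), is_red_black(null_case))
```

Treating a null child as a path end with zero black nodes is what lets the check catch the null-children case, where both leaf paths have the same black count but the path to the missing right child of 25 does not.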
Rotation: Importance
• If the rotation didn't maintain a valid binary search tree it wouldn't be of much use, because the search algorithm relies on the search-tree arrangement.
• Color rules and node color changes are used only to help decide when to perform a rotation; fiddling with the colors doesn't accomplish anything by itself; it's the rotation that's the heavy hitter.
• Color rules are like rules of thumb for building a house (such as "exterior doors open inward"), while rotations are like the hammering and sawing needed to actually build it.

Mind the Children
• During a right rotation, the top node must have a left child. Otherwise there's nothing to rotate into the top spot.
• During a left rotation, the top node must have a right child.

The Weird Crossover Node
• Create an RB tree: 50, 25, 75, 12, 37
• Attaching 12 violates Rule 3
• Flip the colors: the parent and children change color
• The parent's color does not change, because it is the root node
[Figure: the tree after each step, ending with 50 at the root, 25 and 75 as its children, and 12 and 37 below 25]

The Weird Crossover Node: Rotation
[Figure: RoR on the tree 50, 25, 75, 12, 37; 12 is the outside grandchild, and 37 is the inside grandchild and the crossover node]
• The inside grandchild, if it's the child of the node that's going up (which is the left child of the top node in a right rotation), is always disconnected from its parent and reconnected to its former grandparent. It's like becoming your own uncle.

Example
• Insert 62, 87, 6, 18, 31, 43
• Violates Rule 3
• Flip the colors: the parent and children change color
[Figure: the tree 50, 25, 75, 12, 37 with 62, 87 and 6 added]

Example: RB Tree
• Insert 62, 87, 6, 18, 31, 43
[Figure: resulting tree with 50 at the root; 25 and 75 below 50; 12 and 37 below 25; 62 and 87 below 75; 6 and 18 below 12; 31 and 43 below 37]

Subtree Rotation (RoR)
[Figure: the tree after RoR around 50, annotated "Violates Rule 2 and 4"]
• Observations:
  – The top node (50) moves down to become the right child of the new top node.
  – The top node's left child (25) goes to the top.
  – The entire subtree of which 12 is the root moves up.
  – The entire subtree of which 37 is the root moves across to become the left child of 50.
  – The entire subtree of which 75 is the root moves down.
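The observations above translate into a few link assignments. The following Python sketch shows one way to write the two rotations; the names Node, rotate_right and rotate_left are assumptions made for this illustration, and node colors are left out so that only the rearrangement is visible. The inorder check confirms that a rotation keeps the binary search tree ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type; colors are omitted in this sketch.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(top):
    """Rotate right around `top` and return the new top of the subtree."""
    new_top = top.left
    if new_top is None:
        raise ValueError("a right rotation needs a left child")
    top.left = new_top.right  # crossover node: inside grandchild changes parent
    new_top.right = top       # the old top moves down to the right
    return new_top


def rotate_left(top):
    """Mirror image of rotate_right."""
    new_top = top.right
    if new_top is None:
        raise ValueError("a left rotation needs a right child")
    top.right = new_top.left  # crossover node on the other side
    new_top.left = top        # the old top moves down to the left
    return new_top


def inorder(node):
    """Keys in ascending order if the tree is a valid BST."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# The tree 50, 25, 75, 12, 37 from the crossover-node slides.
root = Node(50, Node(25, Node(12), Node(37)), Node(75))
before = inorder(root)

root = rotate_right(root)
print(root.key, root.left.key, root.right.key)  # 25 12 50
print(root.right.left.key)                      # 37, now the left child of 50
print(inorder(root) == before)                  # True

root = rotate_left(root)
print(root.key)                                 # 50 again
```

Both functions return the new top node, so the caller is responsible for linking it back into the parent (or making it the root). In a full red-black insertion the color changes would be applied around these calls.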
Complete Red Black Tree
[Figure: 25 at the root; 12 and 50 below 25; 6 and 18 below 12; 37 and 75 below 50; 31 and 43 below 37; 62 and 87 below 75]

Inserting a New Node
• Let X, P, and G designate a pattern of related nodes. X is a node that has caused a rule violation. (Sometimes X refers to a newly inserted node, and sometimes to the child node when a parent and child have a red-red conflict.)
• X is a particular node.
• P is the parent of X.
• G is the grandparent of X (the parent of P).
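The X, P, G naming can be made concrete by giving each node a link to its parent. The sketch below is an illustration only: the Node class, attach, relatives, red_red_conflict and is_outside_grandchild are hypothetical names, and the outside/inside test assumes that "outside" means X is on the same side of P as P is of G, which is how the terms are used on the crossover-node slide.

```python
class Node:
    """Hypothetical node with a parent link; new nodes are red by default."""

    def __init__(self, key, red=True):
        self.key = key
        self.red = red
        self.parent = None
        self.left = None
        self.right = None


def attach(parent, child, side):
    """Hook `child` under `parent` on 'left' or 'right' and set its parent link."""
    setattr(parent, side, child)
    child.parent = parent
    return child


def relatives(x):
    """Return (P, G) for node X; either may be None near the root."""
    p = x.parent
    g = p.parent if p is not None else None
    return p, g


def red_red_conflict(x):
    """True if X and its parent P are both red (a Rule 3 violation)."""
    p, _ = relatives(x)
    return x.red and p is not None and p.red


def is_outside_grandchild(x):
    """True if X is on the same side of P as P is of G (assumed definition)."""
    p, g = relatives(x)
    if g is None:
        raise ValueError("X has no grandparent")
    return (x is p.left) == (p is g.left)


# The tree from Experiment 4: 50, 25, 75 black, 12 red, then a red 6 is added.
n50 = Node(50, red=False)
n25 = attach(n50, Node(25, red=False), "left")
n75 = attach(n50, Node(75, red=False), "right")
n12 = attach(n25, Node(12), "left")
x = attach(n12, Node(6), "left")

p, g = relatives(x)
print(p.key, g.key)              # 12 25
print(red_red_conflict(x))       # True
print(is_outside_grandchild(x))  # True
```

Here X is the new node 6, P is 12 and G is 25. The conflict check covers only the red-red case described in the text; the corrective color changes and rotations are not shown.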