Binary Search Trees (continued)

Remove:
• Starting at the root, search for the value to be removed.
• If we reach an empty tree, the value is not in the tree.
• Otherwise, at some point the value is at the root of some subtree. Remove that root node, using the in-order predecessor (or alternatively the in-order successor).

There are three cases to consider. (The pictures show the node being removed as the left child of its parent; it could also be the root of the whole tree or the right child of its parent.)

Case 1: There is no left subtree
  Return the right subtree as the new root (could be null).

Case 2: The predecessor is the root of the left subtree (that is, the left subtree has no right child)
  Option 1: Replace the root value with the predecessor value, set the root's left subtree to the predecessor's left subtree, and return the root.
  Option 2: Set the predecessor's right subtree to the root's right subtree and return the predecessor.

Case 3: The predecessor is inside the right subtree of the left child
  Replace the root value with the predecessor value, set the left subtree of the predecessor's parent to the predecessor's left subtree, and return the root.

Example: Which cases are which?
  node:  A  K  D  R  C  E  O  B
  case:  1  3  2  2  2  1  2  1

Example: What is the runtime of contains in a binary search tree?
  best:     O(1), the value is at the root
  worst:    O(n), for an unbalanced tree
  expected: O(log n), when the tree is balanced

What is the runtime of add or remove in a binary search tree? The same as contains.

Suppose n elements are added in random order to a BST. What is the expected height of the tree? O(log n); the tree will be close to balanced.

Java Sets and Maps

Recall:
  A set: a bag of values
  A map: a table of (key: value) pairs
  A map can be implemented as a set of key/value pairs.
  A set can be implemented as a map whose values are ignored (only the keys matter).

The Java TreeSet class implements the Java Set interface.
The Java TreeMap class implements the Java Map interface.
Both TreeSet and TreeMap use a balanced binary search tree called a "red-black tree".
The height of a red-black tree is guaranteed to be less than 2 log n.
Every time you add or remove an element, the tree structure may be modified to maintain the balance. Rebalancing takes worst-case O(log n) time.
Thus:
  contains:   guaranteed worst-case runtime O(log n)
  add/remove: guaranteed worst-case runtime O(log n)

Can we do better than O(log n) for the operations of a Set/Map?
Eg, suppose we want to maintain a set of students, where a student object has a name and a 9-digit id, and we want to be able to access students quickly.
What data structure could we use whose get and set operations are O(1)? An array.
How can we use the student id to find an index into the array?

1. Use the id as the index into a really big array.
   contains: O(1), add: O(1), remove: O(1). Yeh!
   (Picture: an array indexed 0, 1, ..., 999999999, with the record name: Dave, id: 151-21-2010 stored at index 151212010.)
   Problem: memory space. The range of student id values is independent of the number of students (the size of the set), so most of the array would be empty.

2. Key idea: Use the key to compute an index into a medium-size array.
   Want: contains: O(1), add: O(1), remove: O(1), with O(n) memory space.
   Eg, use the last two digits of the student id (see the sketch below).
   (Picture: an array indexed 0, 1, ..., 99, with Dave's record stored at index 10.)
   Problem: Two or more students might have the same last two digits.

Hash Table – the array in which the elements are stored.
Hash Function – the function that maps an object (key) to an index of the array: h(obj) = index.

What properties would an ideal hash function have?
1. The hash function is fast to compute: O(1).
2. It maps each object to a unique index.
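To make the key idea concrete, here is a minimal Java sketch (the class and field names are made up for illustration, not taken from the notes): a student record is stored in a 100-slot array at the index given by the last two digits of its 9-digit id.

```java
// A minimal sketch, assuming students are identified by a 9-digit int id.
public class StudentTable {
    static class Student {
        String name;
        int id;                                   // 9-digit student id
        Student(String name, int id) { this.name = name; this.id = id; }
    }

    private Student[] table = new Student[100];   // medium-size array

    // h(obj) = index: here, the last two digits of the id
    private int hash(int id) {
        return id % 100;
    }

    // O(1): compute the index, then store the student there
    // (this simply overwrites on a collision, the problem discussed next)
    public void add(Student s) {
        table[hash(s.id)] = s;
    }

    // O(1): compute the index, then look only at that slot
    public boolean contains(int id) {
        Student s = table[hash(id)];
        return s != null && s.id == id;
    }
}
```

With Dave's id 151212010 the hash is 10, so his record lands in slot 10; a second student whose id also ends in 10 would be assigned the same slot, which is exactly the collision problem below.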
But if you want to allow for any set of student ids, then we have to deal with the fundamental problem of collisions.

Collision: two different keys map to the same index: x ≠ y, but h(x) = h(y).

Can we prevent collisions when the set of keys is not known in advance? No.
Since the number of possible keys is much greater than the size of the table, there must be two keys that map to the same index, and any set that contains both of those keys will have a collision. This is an example of the famous Pigeonhole Principle: if you put more than k items into k bins, then at least one bin contains more than one item.

How likely is it that two keys hash to the same index? Surprisingly likely!
The Birthday Paradox:
  Probability that no two of n people share a birthday:
    p' = (364/365) * (363/365) * ... * ((365 - n + 1)/365)
  Probability that at least two people share a birthday: p = 1 - p'
  When n = 23, p ≈ 0.5. When n = 30, p ≈ 0.7. When n = 50, p ≈ 0.97.

Desired properties of a hash function:
1. The hash function should be fast to compute: O(1).
   Eg, use the length of a string as the index: fast! But bad if the keys are 10-character codes; all the strings have the same length and map to the same index!
2. A limited number of collisions:
   a. Given two keys, the probability that they hash to the same index is low.
   b. When many keys are added to the table, they should appear "evenly" distributed.
   Eg, use a random number for the index. Problem? You get a new index each time you call contains, add, or remove, so you can't find the element again quickly.

How can we handle collisions?
1. Open addressing / probing (a topic of 15-211)
2. Separate chaining – keep a linked list of the elements whose keys hash to the same index (sketched below)

What is the worst-case runtime for contains, add, and remove?
  O(n) – all the keys hash to the same index.
What is the best-case runtime?
  O(1) – only a few keys map to any one index.
What is the expected runtime?
  O(1) – assuming the hash function is good and the hash table is not too full.
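As an illustration of separate chaining, here is a minimal Java sketch of a hash set of int keys (the class name and table size are assumptions for illustration, not the course's code): each slot of the array holds a linked list of the keys that hash to that index.

```java
import java.util.LinkedList;

// A minimal separate-chaining sketch: h(key) = |key| mod table size,
// and colliding keys share a linked list ("chain") at that index.
public class ChainedHashSet {
    private LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashSet(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int hash(int key) {
        return Math.abs(key) % buckets.length;   // h(key) = index
    }

    // Walk only the one chain the key hashes to.
    public boolean contains(int key) {
        return buckets[hash(key)].contains(key);
    }

    public void add(int key) {
        if (!contains(key)) buckets[hash(key)].add(key);
    }

    public void remove(int key) {
        buckets[hash(key)].remove(Integer.valueOf(key));
    }
}
```

Each operation hashes the key once and then walks a single chain, which is why the expected runtime is O(1) when the hash function spreads keys evenly and the table is not too full, and degrades to O(n) only if every key lands in the same chain.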