CSI402 Systems Programming
Handout 6.2
Overview of BSTs and Hash Tables

Binary Search Trees

The binary search tree (BST) is a data structure that you should be familiar with at this point in your academic career. If not, then now is a great time to learn. The BST is important because it supports efficient insertion and retrieval of elements based on a key, provided the tree stays reasonably balanced.

A BST consists of nodes. Each node has a key, one or more data values, a left child, and a right child (the children are also nodes). At the “top” of the tree is a special node, called the root, which is not the child of any other node. Every node can be considered the root of a subtree that starts at that node and descends downward. Given a node N: the subtree rooted at N's left child is called the left subtree of N, and the subtree rooted at N's right child is called the right subtree of N.

BSTs are maintained in a special order. Given a node N with key k: the left subtree of N contains nodes whose keys precede k, while the right subtree of N contains nodes whose keys succeed k. The key can be any type that can be ordered (e.g., integers with ≤ or strings with lexicographic ordering).

For example, suppose we had a tree to store objects with keys 1, 2, 3, 4, 5, 6, and 7. The tree would appear as:

[Figure: BST containing the keys 1 through 7]

As mentioned above, the keys can also be strings (which they will be in this assignment). Suppose we had a tree to store objects with keys apple, pear, orange, banana, strawberry, grape, and blueberry. The tree would appear as:

[Figure: BST containing the keys apple, pear, orange, banana, strawberry, grape, and blueberry]

Hash Tables

A hash table is another common data structure that supports efficient insertion and retrieval of elements based on a key. In its simplest version, a hash table is simply an array. A special function, called the hash function, takes in a key and returns the index in the array at which to store that object. For example, suppose the hash function is h(x), and the key for an object O is k.
To insert O into the hash table, we would compute y = h(k), and then store O in the array at index y. The integer y is called the hash value of key k.

Unfortunately, in practice, “perfect” hash functions are rare. Most hash functions are not one-to-one functions. This means that for two distinct keys k1 and k2, there is no guarantee that h(k1) and h(k2) will be distinct. In other words, keys k1 and k2 might have the same hash value. This is called a collision.

When collisions occur, we cannot store both objects at the same index in the array. Therefore, we need a collision resolution strategy. In this class, we will use chaining. In the chaining strategy, each element of the array is the head of a linked list. At index y in the array, the linked list contains all of the objects whose keys have hash value y.

Often, the key for a hash table is a string. The object is some other collection of one or more pieces of data. For example, if we were using a hash table to store information about students in this class, the key might be each student's NetID, while the object would be a struct containing their name, major, GPA, etc.

In assembler symbol tables, the objects we are storing are symbol table entries (i.e., a struct containing a symbol and its corresponding LC value). The key will be the symbol itself.

Suppose we have an element O with key k that we want to store in our hash table. Suppose we have a hash function h(x). To insert the element into the hash table, we would follow the process below:

1. Construct a new linked list node N to store O.
2. Compute the hash value y = h(k).
3. If table[y] is null (i.e., the list at location y is empty), then simply set table[y] = N.
4. Otherwise, insert N into the existing linked list stored in table[y].

Note that 'table' is an array of linked lists. In other words, each element of 'table' is a list node that represents the head of a linked list.
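
The four insertion steps above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the interface required by the assignment: the names struct node, symtab_hash, symtab_insert and symtab_lookup, the table size of 5, the 32-character symbol limit, and the character-sum hash function are all hypothetical choices made for illustration. Here each element of 'table' is a pointer to the head node, so an empty list is a NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 5            /* assumed number of array slots */
#define MAX_SYMBOL 32           /* assumed maximum symbol length, including '\0' */

/* One linked list node holding a symbol table entry (hypothetical layout). */
struct node {
    char symbol[MAX_SYMBOL];    /* the key */
    int lc_value;               /* the data stored with the key */
    struct node *next;
};

/* Array of list heads; a NULL element means the list at that index is empty. */
static struct node *table[TABLE_SIZE];

/* Placeholder hash function: sum of the character codes, modulo TABLE_SIZE. */
unsigned symtab_hash(const char *key)
{
    unsigned sum = 0;
    assert(key != NULL);
    while (*key != '\0')
        sum += (unsigned char)*key++;
    return sum % TABLE_SIZE;
}

/* Insert (symbol, lc_value); returns 0 on success, -1 if allocation fails. */
int symtab_insert(const char *symbol, int lc_value)
{
    unsigned y;
    struct node *n = malloc(sizeof *n);           /* step 1: new node N */
    if (n == NULL)
        return -1;
    strncpy(n->symbol, symbol, MAX_SYMBOL - 1);
    n->symbol[MAX_SYMBOL - 1] = '\0';             /* always terminate the copy */
    n->lc_value = lc_value;

    y = symtab_hash(symbol);                      /* step 2: y = h(k) */
    n->next = table[y];   /* NULL when the list is empty, so steps 3 and 4 share code */
    table[y] = n;         /* N becomes the new head of the list */
    return 0;
}

/* Return the node whose key equals symbol, or NULL if it is not present. */
struct node *symtab_lookup(const char *symbol)
{
    struct node *p;
    for (p = table[symtab_hash(symbol)]; p != NULL; p = p->next)
        if (strcmp(p->symbol, symbol) == 0)
            return p;
    return NULL;
}
```

Because a new node always points at the old head (which is NULL for an empty list), steps 3 and 4 collapse into the same two assignments. The sketch does not check for duplicate keys; a symbol table would probably want to call symtab_lookup first and report an error if the symbol is already defined.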

For example, suppose we want to insert the following objects into a hash table:

Object                              Key        Hash Value
Symbol: Start, LC Value: 1000       “Start”    3
Symbol: Loop,  LC Value: 1012       “Loop”     0
Symbol: First, LC Value: 1015       “First”    2
Symbol: Val1,  LC Value: 1018       “Val1”     3
Symbol: Val2,  LC Value: 1021       “Val2”     4
Symbol: Str1,  LC Value: 1024       “Str1”     0

For each object, we obtain the hash value using a hash function (not shown here). For example, if the hash function were called h, then h(“Loop”) = 0.

Shown below are the contents of each linked list in the hash table. Recall that the linked list at index y contains each object whose key has hash value y.

[Figure: contents of each linked list in the hash table]

Note that when inserting into a linked list, we insert at the head (i.e., the “front” or “top” of the list). This allows for a simpler implementation and ensures constant-time insertion into the list.
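
The contents of the lists in this example can be reproduced with a short C sketch. Since the hash function is not shown, the sketch does not compute anything: it copies each hash value directly from the table above. The names struct entry, build_example, format_chain and print_example are hypothetical, and the array size of 5 is an assumption based on the hash values ranging from 0 to 4.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_BUCKETS 5   /* assumed: the hash values in the table range over 0..4 */
#define NUM_ENTRIES 6

struct entry {
    const char *symbol;
    int lc_value;
    unsigned hash_value;   /* copied from the table above, not computed */
    struct entry *next;
};

/* The six objects, in the order they are inserted. */
static struct entry entries[NUM_ENTRIES] = {
    {"Start", 1000, 3, NULL},
    {"Loop",  1012, 0, NULL},
    {"First", 1015, 2, NULL},
    {"Val1",  1018, 3, NULL},
    {"Val2",  1021, 4, NULL},
    {"Str1",  1024, 0, NULL}
};

static struct entry *buckets[NUM_BUCKETS];

/* Insert every entry at the head of the list for its hash value. */
void build_example(void)
{
    int i;
    for (i = 0; i < NUM_BUCKETS; i++)
        buckets[i] = NULL;
    for (i = 0; i < NUM_ENTRIES; i++) {
        struct entry *e = &entries[i];
        e->next = buckets[e->hash_value];
        buckets[e->hash_value] = e;
    }
}

/* Write the list at the given index into buf, head first, e.g. "A -> B". */
void format_chain(unsigned index, char *buf, size_t size)
{
    const struct entry *e;
    assert(index < NUM_BUCKETS && size > 0);
    buf[0] = '\0';
    for (e = buckets[index]; e != NULL; e = e->next) {
        strncat(buf, e->symbol, size - strlen(buf) - 1);
        if (e->next != NULL)
            strncat(buf, " -> ", size - strlen(buf) - 1);
    }
}

/* Print one line per array index. */
void print_example(void)
{
    char buf[128];
    unsigned y;
    for (y = 0; y < NUM_BUCKETS; y++) {
        format_chain(y, buf, sizeof buf);
        printf("table[%u]: %s\n", y, buf[0] != '\0' ? buf : "(empty)");
    }
}
```

Calling build_example() and then print_example() from main lists Str1 followed by Loop at index 0, nothing at index 1, First at index 2, Val1 followed by Start at index 3, and Val2 at index 4. The later insertion comes first in each list because every new node is placed at the head.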