CSI402 Systems Programming
Handout 6.2
Overview of BSTs and Hash Tables
Binary Search Trees
The binary search tree (BST) is a data structure that you should already be familiar with at this
point in your academic career. If not, now is a great time to learn. The BST is important
because it supports efficient insertion and retrieval of elements based on a key.
A BST consists of nodes. Each node has a key, one or more data values, a left child, and a
right child (the children are also nodes). At the “top” of the tree is a special node, called the root,
which is not the child of any other node. Every node can be considered the root of a subtree that
starts at that node and descends downward. Given a node N, the subtree rooted at N’s left child
is called the left subtree of N, and the subtree rooted at the right child of N is called the right
subtree of N.
BSTs are maintained in a special order. Given a node N with key k, the left subtree of N
contains nodes whose keys precede k, while the right subtree of N contains nodes whose keys
follow k. The key can be of any type that can be ordered (e.g., integers with ≤ or strings with
lexicographic ordering).
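The ordering rule above can be sketched in C as follows. This is only an illustrative sketch, not the required implementation: the names bst_node, bst_insert, and bst_find are hypothetical, the key is assumed to be an int, and each node is assumed to carry a single int data value.

```c
#include <stdio.h>
#include <stdlib.h>

/* One BST node: a key, one data value, a left child, and a right child. */
struct bst_node {
    int key;
    int value;                 /* hypothetical payload */
    struct bst_node *left;     /* subtree of keys that precede key */
    struct bst_node *right;    /* subtree of keys that follow key  */
};

/* Insert (key, value) into the tree rooted at root and return the root.
 * If key is already present, its value is overwritten. */
struct bst_node *bst_insert(struct bst_node *root, int key, int value)
{
    if (root == NULL) {
        struct bst_node *n = malloc(sizeof *n);
        if (n == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        n->key = key;
        n->value = value;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key, value);
    else if (key > root->key)
        root->right = bst_insert(root->right, key, value);
    else
        root->value = value;
    return root;
}

/* Return the node holding key, or NULL if key is not in the tree. */
struct bst_node *bst_find(struct bst_node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

With this sketch, inserting the keys 4, 2, 6, 1, 3, 5, 7 in that order into an empty tree (root == NULL) places 4 at the root, with 2 and 6 as its left and right children. A different insertion order generally produces a differently shaped tree.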
For example, suppose we had a tree to store objects with keys 1, 2, 3, 4, 5, 6, and 7. The
tree would appear as:
As mentioned above, the keys can also be strings (which they will be in this assignment). Suppose
we had a tree to store objects with keys apple, pear, orange, banana, strawberry, grape, and blueberry.
The tree would appear as:
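For string keys, the same idea can be sketched in C using strcmp for the lexicographic comparison. The names str_node, str_insert, and str_inorder are hypothetical, and the sketch assumes the key strings outlive the tree (they are not copied).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct str_node {
    const char *key;           /* not copied; must outlive the tree */
    struct str_node *left;
    struct str_node *right;
};

/* Insert key into the tree rooted at root and return the root.
 * Keys are compared lexicographically; duplicates are ignored. */
struct str_node *str_insert(struct str_node *root, const char *key)
{
    if (root == NULL) {
        struct str_node *n = malloc(sizeof *n);
        if (n == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        n->key = key;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    int cmp = strcmp(key, root->key);
    if (cmp < 0)
        root->left = str_insert(root->left, key);
    else if (cmp > 0)
        root->right = str_insert(root->right, key);
    return root;
}

/* In-order traversal: left subtree, then the node, then right subtree.
 * Stores up to max keys in out, starting at index count, and returns
 * the total number of nodes visited so far. */
size_t str_inorder(const struct str_node *root, const char **out,
                   size_t max, size_t count)
{
    if (root == NULL)
        return count;
    count = str_inorder(root->left, out, max, count);
    if (count < max)
        out[count] = root->key;
    count++;
    return str_inorder(root->right, out, max, count);
}
```

Inserting apple, pear, orange, banana, strawberry, grape, and blueberry in that order and then calling str_inorder visits the keys in alphabetical order, which is a direct consequence of the BST ordering rule. The shape of the resulting tree depends on the insertion order, so it need not match any particular drawing.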
Hash Tables
A hash table is another common data structure that supports efficient insertion and retrieval of
elements based on a key.
In its simplest version, a hash table is just an array. A special function, called the hash function,
takes in a key and returns the index in the array at which to store that object. For example, suppose
the hash function is h(x), and the key for an object O is k. To insert O into the hash table, we would
compute y = h(k), and then store O in the array at index y. The integer y is called the hash
value of key k.
Unfortunately, in practice, “perfect” hash functions are rare. Most hash functions are not
one-to-one functions. This means that for two distinct keys k1 and k2, there is no guarantee that h(k1)
and h(k2) will be distinct. In other words, keys k1 and k2 might have the same hash value. This is
called a collision.
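To make the idea of a hash value and a collision concrete, here is a deliberately simple hash function for strings, sketched in C. It is an assumption made purely for illustration (it is not the hash function used in this course): it adds up the character codes of the key and reduces the sum modulo the table size. The name hash_string and the parameter table_size are hypothetical.

```c
#include <stddef.h>

/* Toy hash function: sum of the key's character codes, reduced modulo
 * table_size. table_size must be nonzero. The result is always a valid
 * index into an array of table_size elements. */
unsigned int hash_string(const char *key, unsigned int table_size)
{
    unsigned int sum = 0;
    for (size_t i = 0; key[i] != '\0'; i++)
        sum += (unsigned char)key[i];
    return sum % table_size;
}
```

Because this function only looks at the sum of the characters, any two keys made of the same characters in a different order collide. For instance, hash_string("ab", 5) and hash_string("ba", 5) return the same index even though the keys are distinct.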
When collisions occur, we cannot store both objects at the same index in the array. Therefore, we
need a collision resolution strategy. In this class, we will use chaining. In the chaining strategy,
each element of the array is the head of a linked list. At index y in the array, the linked list contains
all of the objects whose keys have hash value y.
Often, the key for a hash table is a string. The object is some other collection of one or more
pieces of data. For example, if we were using a hash table to store information about students in
this class, the key might be each student’s NetID, while the object would be a struct containing
their name, major, GPA, etc.
In assembler symbol tables, the objects we are storing are symbol table entries (i.e., a struct
containing a symbol and its corresponding LC value). The key will be the symbol itself.
Suppose we have an element O with key k that we want to store in our hash table. Suppose
we have a hash function h(x). To insert the element into the hash table, we would follow the
process below:
1. Construct a new linked list node N to store O
2. Compute the hash value y = h(k)
3. If table[y] is null (i.e., the list at location y is empty), then simply set table[y] = N.
4. Otherwise, insert N into the existing linked list stored in table[y]
Note that ’table’ is an array of linked lists. In other words, each element of ’table’ points to the
list node at the head of a linked list, or is null if that list is empty.
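The four steps above can be sketched in C roughly as follows. Everything named here is an assumption made for illustration: the struct layouts, the constants TABLE_SIZE and MAX_SYMBOL, the placeholder hash function h (a sum of character codes modulo TABLE_SIZE), and the functions table_insert and table_lookup. Symbols are assumed to fit in MAX_SYMBOL - 1 characters.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 5    /* assumed number of array slots */
#define MAX_SYMBOL 32   /* assumed maximum symbol length, including '\0' */

/* A symbol table entry: a symbol and its corresponding LC value. */
struct sym_entry {
    char symbol[MAX_SYMBOL];
    int lc_value;
};

/* A linked list node that stores one entry. */
struct list_node {
    struct sym_entry entry;
    struct list_node *next;
};

/* Placeholder hash function (an assumption, not the course's function). */
static unsigned int h(const char *key)
{
    unsigned int sum = 0;
    while (*key != '\0')
        sum += (unsigned char)*key++;
    return sum % TABLE_SIZE;
}

/* Insert (symbol, lc_value) into table. Returns 0 on success and -1 if
 * memory could not be allocated. */
int table_insert(struct list_node *table[], const char *symbol, int lc_value)
{
    /* Step 1: construct a new list node N to store the object. */
    struct list_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    strncpy(n->entry.symbol, symbol, MAX_SYMBOL - 1);
    n->entry.symbol[MAX_SYMBOL - 1] = '\0';
    n->entry.lc_value = lc_value;
    n->next = NULL;

    /* Step 2: compute the hash value y = h(k). */
    unsigned int y = h(symbol);

    if (table[y] == NULL) {
        /* Step 3: the list at index y is empty. */
        table[y] = n;
    } else {
        /* Step 4: insert N at the head of the existing list. */
        n->next = table[y];
        table[y] = n;
    }
    return 0;
}

/* Return the node whose symbol equals the given one, or NULL if absent. */
struct list_node *table_lookup(struct list_node *table[], const char *symbol)
{
    for (struct list_node *n = table[h(symbol)]; n != NULL; n = n->next)
        if (strcmp(n->entry.symbol, symbol) == 0)
            return n;
    return NULL;
}
```

The table itself would be declared as an array of TABLE_SIZE node pointers, all initially NULL. Steps 3 and 4 are written as separate branches to mirror the list above; when inserting at the head, the second branch alone would also handle the empty case, since n->next would simply become NULL.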
For example, suppose we want to insert the following objects into a hash table:
Object                               Key        Hash Value
Symbol: Start, LC Value: 1000        “Start”    3
Symbol: Loop,  LC Value: 1012        “Loop”     0
Symbol: First, LC Value: 1015        “First”    2
Symbol: Val1,  LC Value: 1018        “Val1”     3
Symbol: Val2,  LC Value: 1021        “Val2”     4
Symbol: Str1,  LC Value: 1024        “Str1”     0
For each object, we obtain the hash value using a hash function (not shown here). For example, if
the hash function were called h, then h(“Loop”) = 0. Shown below are the contents of each
linked list in the hash table. Recall that the linked list at index y contains each object whose key has
hash value y. Note that when inserting into a linked list, we insert at the head (i.e., the “front” or
“top” of the list). This allows for a simpler implementation and ensures that inserting into the list
takes constant time.
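The example can be traced with a short C sketch. Since the hash function is not shown, the sketch takes each hash value directly from the table above instead of computing it. It also assumes that the table has five slots (indices 0 through 4) and that the objects are inserted in the order in which they are listed. The names chain_node, insert_at_head, build_example, and print_slots are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_SLOTS 5   /* assumed: the hash values in the example are 0..4 */

struct chain_node {
    const char *symbol;
    int lc_value;
    struct chain_node *next;
};

/* Insert a new node at the head of the list in slot y. The hash value y
 * is supplied by the caller because the hash function is not shown. */
void insert_at_head(struct chain_node *slots[], unsigned int y,
                    const char *symbol, int lc_value)
{
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->symbol = symbol;
    n->lc_value = lc_value;
    n->next = slots[y];   /* old head (or NULL) becomes the second node */
    slots[y] = n;
}

/* Insert the six example objects, in the order listed in the table. */
void build_example(struct chain_node *slots[])
{
    insert_at_head(slots, 3, "Start", 1000);
    insert_at_head(slots, 0, "Loop",  1012);
    insert_at_head(slots, 2, "First", 1015);
    insert_at_head(slots, 3, "Val1",  1018);
    insert_at_head(slots, 4, "Val2",  1021);
    insert_at_head(slots, 0, "Str1",  1024);
}

/* Print each slot's list from head to tail, one slot per line. */
void print_slots(struct chain_node *slots[])
{
    for (unsigned int y = 0; y < NUM_SLOTS; y++) {
        printf("%u:", y);
        for (const struct chain_node *n = slots[y]; n != NULL; n = n->next)
            printf(" -> (%s, %d)", n->symbol, n->lc_value);
        printf("\n");
    }
}
```

Under these assumptions, after build_example runs on an array of NUM_SLOTS null pointers, slot 0 holds Str1 followed by Loop, slot 1 is empty, slot 2 holds First, slot 3 holds Val1 followed by Start, and slot 4 holds Val2. In each list the most recently inserted object appears first, because every insertion happens at the head.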