Hash Function - Solon City Schools

Searches and Your LAST Data Structure: The Hash Table
Unsorted Array
[0] 66  [1] 5  [2] 90  [3] 55  [4] 1000  [5] 67  [6] 3  [7] 9  [8] 23  [9] 88
What is the most efficient method
of searching for a value in an
unsorted list?
Sequential Search – Start at one
end and work through the list
sequentially
What is the Big O value? O(n)
On average, how many steps does it take to find a value if there are:
• 100 items in the array? 50
• 1,000 items? 500
• 1,000,000 items? 500,000
• n items? n/2
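A minimal Java sketch of the sequential search described above, run against the unsorted array from this slide. The class and method names are made up for illustration.

```java
final class SequentialSearchDemo {
    /** Returns the index of target in data, or -1 if it is not there. */
    static int sequentialSearch(int[] data, int target) {
        // Start at one end and check every slot in order: O(n).
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] unsorted = {66, 5, 90, 55, 1000, 67, 3, 9, 23, 88};
        System.out.println(sequentialSearch(unsorted, 67)); // prints 5
        System.out.println(sequentialSearch(unsorted, 42)); // prints -1
    }
}
```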
Sorted Array
[0] 3  [1] 5  [2] 9  [3] 23  [4] 55  [5] 66  [6] 67  [7] 88  [8] 90  [9] 1000
What is the most efficient method
of searching a sorted list?
Binary Search – Check the middle index and
decide the next search direction based on:
•Higher
•Lower
•Done if found
What is the Big O value? O(log n)
How many steps (at most) if there are:
• 100 items? 7
• 1,000 items? 10
• 1,000,000 items? 20
• n items? about log₂ n
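Illustrating a Binary Search in an array
A Binary Search: Looking for 66
[Figure: the sorted array 3, 5, 9, 23, 55, 66, 67, 88, 90, 1000, with each checked position marked "?" and 66 marked "YAY!" when it is found.]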
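A minimal Java sketch of a binary search, looking for 66 in the sorted array from the earlier slide. The class and method names are made up for illustration.

```java
final class BinarySearchDemo {
    /** Returns the index of target in a sorted array, or -1 if it is absent. */
    static int binarySearch(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // check the middle index
            if (sorted[mid] == target) {
                return mid;                   // done if found
            } else if (sorted[mid] < target) {
                low = mid + 1;                // target is higher
            } else {
                high = mid - 1;               // target is lower
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {3, 5, 9, 23, 55, 66, 67, 88, 90, 1000};
        // Checks 55, then 88, then 66.
        System.out.println(binarySearch(sorted, 66)); // prints 5
    }
}
```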
That was fast. But can we do better?
• Linear search – ok
• Binary search – really good
• But an O(1) search? Is it even possible?
Hash Tables
(a final search)
Remember Maps in Java?
• Key, Value pair
• If the school wants to store your student info,
what is the key? What is the value?
• If the government wants to store all of
someone’s financial records (credit score,
savings, debt, etc.), what is the key? What is
the value?
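As a reminder of what a key, value pair looks like in code, here is a small sketch using java.util.HashMap. It assumes the student number is the key and uses a plain String as a stand-in for the student's record; the record text is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class StudentMapDemo {
    /** Builds a tiny map from student number (key) to record (value). */
    static Map<Integer, String> buildRoster() {
        Map<Integer, String> roster = new HashMap<>();
        roster.put(160234, "hypothetical record A"); // placeholder value
        roster.put(160884, "hypothetical record B"); // placeholder value
        return roster;
    }

    public static void main(String[] args) {
        Map<Integer, String> roster = buildRoster();
        System.out.println(roster.get(160234));         // prints hypothetical record A
        System.out.println(roster.containsKey(999999)); // prints false
    }
}
```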
Space vs. Time Tradeoff
• Uses extra space – roughly 1.5 times the
total number of items you want to store
• Let’s say that there are 1800 students.
• We would need an array of about 2700
Student Records.
[Figure: an array with indexes 0 to 2699; every slot is null.]
The speed of a HashMap
• If you have a student number, you can
instantly (O(1) time) add the StudentRecord
to the array!
• If you have the student number, you can
instantly (O(1) time) get the StudentRecord
from the array!
• Is this really possible?
How does it work?
• Create a “bucket” – key, value pair
• Use the key and some hash function magic
• Each key generates an index value
[Figure: Student IDs 160234 and 160884 go through the hash function, which produces the indexes 14 and 2694. 160234 is stored at index 14 and 160884 at index 2694 of the array; every other slot is null.]
Hash Function Quality
• What happens with a GOOD hash function?
Each key generates a unique index
What is the obvious next question?
Hash Function Quality
• What happens with a BAD hash function?
What if different items map to the same
index value?
[Figure: Student IDs 160234 and 170138 go through the hash function and both produce index 14, so both are sent to the same slot of the array.]
Oh No!!!! COLLISIONS are BAD!!!
Collisions
• A Collision occurs when the hash function
maps two or more different items to the
same index value. 170138 also maps to 14!
Ways to handle collisions:
• Make the array 2-dimensional
• Linear Collision Processing – walk
sequentially to next open location
• Quadratic Collision Processing –
Jump in a predetermined quantity to
the next open location
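Linear Collision Processing could be sketched in Java as below. This is only an illustration under stated assumptions: the table stores bare int IDs, the hash is simply the ID modulo the table size (not the hash used in the slides), and the small IDs in main are invented so the collision is easy to see.

```java
final class LinearProbingDemo {
    private final Integer[] slots; // null means the slot is open

    LinearProbingDemo(int capacity) {
        slots = new Integer[capacity];
    }

    // Hypothetical hash for illustration: ID modulo table size.
    private int home(int id) {
        return Math.floorMod(id, slots.length);
    }

    /** Stores id and returns the index used, or -1 if the table is full. */
    int add(int id) {
        int start = home(id);
        for (int step = 0; step < slots.length; step++) {
            int index = (start + step) % slots.length; // walk to the next slot
            if (slots[index] == null || slots[index] == id) {
                slots[index] = id;
                return index;
            }
        }
        return -1;
    }

    /** Returns the index holding id, or -1 if it is not in the table. */
    int indexOf(int id) {
        int start = home(id);
        for (int step = 0; step < slots.length; step++) {
            int index = (start + step) % slots.length;
            if (slots[index] == null) {
                return -1; // reached an open slot, so id was never stored
            }
            if (slots[index] == id) {
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingDemo table = new LinearProbingDemo(10);
        System.out.println(table.add(14)); // prints 4
        System.out.println(table.add(24)); // collides at 4, prints 5
        System.out.println(table.indexOf(24)); // prints 5
    }
}
```

Quadratic processing would change only how the next index is computed. Note that this sketch has no removal: deleting an item would need extra care so later lookups do not stop early at the emptied slot.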
[Figure: the array partly filled with student IDs (160249, 160330, 160273, 160268, 160234, 160295, plus 160029 at index 2694 and 160192 at index 2698). 170138 collides with 160234 at index 14 and has to be placed somewhere else.]
Collisions
Most common approach: Chaining – Use an
array of linked lists instead of an array of
Student Records
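A rough Java sketch of chaining, where each array slot holds a linked list. The assumptions are the same kind as before: it stores bare int IDs rather than Student Records, uses ID modulo table size as a stand-in hash, and the IDs in main are invented.

```java
import java.util.LinkedList;

final class ChainedTableDemo {
    private final LinkedList<Integer>[] buckets; // one chain per index

    @SuppressWarnings("unchecked")
    ChainedTableDemo(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Hypothetical hash for illustration: ID modulo table size.
    private int indexFor(int id) {
        return Math.floorMod(id, buckets.length);
    }

    /** Adds id to the chain at its index (duplicates are ignored). */
    void add(int id) {
        LinkedList<Integer> chain = buckets[indexFor(id)];
        if (!chain.contains(id)) {
            chain.add(id);
        }
    }

    /** Searches only the one chain the id hashes to. */
    boolean contains(int id) {
        return buckets[indexFor(id)].contains(id);
    }

    int chainLength(int index) {
        return buckets[index].size();
    }

    public static void main(String[] args) {
        ChainedTableDemo table = new ChainedTableDemo(10);
        table.add(14);
        table.add(24); // same index as 14, so it joins the same chain
        System.out.println(table.chainLength(4)); // prints 2
        System.out.println(table.contains(24));   // prints true
    }
}
```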
[Figure: the same array, but each slot now points to a linked list. 160234 and 170138 share the list at index 14.]
Big O Values
What is the best case Big O value of adding or
retrieving from a hash table? O(1)
Worst case? O(n), when every key collides and lands in the same chain
So how does the hash function work?
• The God Object (java.lang.Object) gives every object a
hashCode() method
• It returns an int – we will look at String’s
• “Daniel McKeen” returns: -381446182
• How can I instantly convert that number to
a value from 0 to 2699? Modulus! (and absolute value)
• The hash function uses the hashCode
method and modulus to generate an int in
the proper range.
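The hashCode plus modulus idea might look like this in Java. It is a sketch, not the code behind the slides: the class and method names are invented, and the table size of 2700 comes from the student example.

```java
final class HashIndexDemo {
    /** Turns any key's hashCode() into an index from 0 to tableSize - 1. */
    static int indexFor(Object key, int tableSize) {
        // Take the modulus first, then the absolute value. The remainder is
        // always between -(tableSize - 1) and tableSize - 1, so Math.abs is
        // safe here even when hashCode() is Integer.MIN_VALUE.
        return Math.abs(key.hashCode() % tableSize);
    }

    public static void main(String[] args) {
        String name = "Daniel McKeen";
        System.out.println(name.hashCode());      // the raw int from String
        System.out.println(indexFor(name, 2700)); // a value from 0 to 2699
    }
}
```

Math.floorMod(key.hashCode(), tableSize) is another way to get a non-negative index, although it gives a different index for negative hash codes.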
Hash Coded Data Storage – Key Points:
1. Must have a hash method to convert key data to array index –
Java gives us hashCode() – Use of modulus (%) is common
2. Must know the maximum number of items to be stored
3. Must have a method for dealing with collisions
4. Alternate collision coverage methods:
   a. Linear or Quadratic Collision Processing
   b. Adding more dimensions to the array for collisions
   c. Make the hash table an array of pointers – Chaining
5. Big O value – Best and worst cases
6. Examples: Bar coding, VIN numbers on vehicles
Exercise One
Example – Below is a Hash Table. It is a size 5 array of ListNode
pointers that will be used to store names.
The Hash Function takes the integer value of the middle letter % 5.
If there are an even number of letters, use the letter to the left of
middle. (Treat ‘a’ as 1.)
Question 1: Where would the name ‘Jacob’ be stored?
Middle letter c = 3 → 3 mod 5 = 3
[0]
[1]
[2]
[3] Jacob
[4]
Hash Table using Chaining: Size 5 Array – Each element in
the array is a ListNode pointer
Hash Function: Integer value of middle letter mod 5 – If
even number of letters, use letter to the left of middle.
Add the following names to the hash table:
Ellen, Ann, George, Fred, Susan and draw the table
[0]
[1]
[2]
[3]
[4]
Hash Table: Size 5 Array – Each element is a ListNode pointer.
Hash Function: Integer value of middle letter mod 5 – If even
number of letters, use letter to the left of middle.
Add the following names to the hash table:
Ellen, Ann, George, Fred, Susan and draw the table
[0]
[1]
[2] Ellen
[3]
[4]
Hash Table: Size 5 Array – Each element is a ListNode Pointer
Hash Function: Integer value of middle letter mod 5 – If even
number of letters, use letter to the left of middle.
Added:
Ellen, Ann, George, Fred
Remaining: Susan
[0] George
[1]
[2] Ellen
[3] Fred
[4] Ann
Hash Table: Size 5 Array – Each element is a ListNode pointer
Hash Function: Integer value of middle letter mod 5 – If even
number of letters, use letter to the left of middle.
Add the following names to the hash table:
Ellen, Ann, George, Fred, Susan and draw the table
[0] George
[1]
[2] Ellen
[3] Fred
[4] Susan → Ann
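To check the exercise by machine, the middle-letter hash function can be sketched in Java as follows. The class and method names are invented, and the sketch assumes names made only of the letters a to z.

```java
final class MiddleLetterHash {
    /** Integer value of the middle letter ('a' = 1) mod tableSize. */
    static int hash(String name, int tableSize) {
        String lower = name.toLowerCase();
        // For an even number of letters this picks the one left of middle.
        int middle = (lower.length() - 1) / 2;
        int letterValue = lower.charAt(middle) - 'a' + 1;
        return letterValue % tableSize;
    }

    public static void main(String[] args) {
        String[] names = {"Jacob", "Ellen", "Ann", "George", "Fred", "Susan"};
        for (String name : names) {
            System.out.println(name + " -> [" + hash(name, 5) + "]");
        }
        // Jacob -> [3], Ellen -> [2], Ann -> [4],
        // George -> [0], Fred -> [3], Susan -> [4]
    }
}
```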
Exercise Two
Hash Function: Determine a hash function that could result
in the following table
[0] 24
[1]
[2] 10, 18
[3]
[4]
[5] 21, 5, 45
[6]
[7] 31
Hash Function Solution: Value modulus 8
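The "value modulus 8" solution can be checked with a short Java sketch that drops each value from the exercise into a size 8 table. The names are invented, and the order in which values are added is an assumption, so the order inside each chain may differ from the slide.

```java
import java.util.ArrayList;
import java.util.List;

final class ModEightDemo {
    /** Places each non-negative value in the chain at index value % tableSize. */
    static List<List<Integer>> bucketize(int[] values, int tableSize) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            table.add(new ArrayList<>());
        }
        for (int value : values) {
            table.get(value % tableSize).add(value);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] values = {24, 10, 18, 21, 5, 31, 45};
        List<List<Integer>> table = bucketize(values, 8);
        for (int i = 0; i < table.size(); i++) {
            System.out.println("[" + i + "] " + table.get(i));
        }
        // [0] holds 24, [2] holds 10 and 18,
        // [5] holds 21, 5 and 45, [7] holds 31; the rest are empty.
    }
}
```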
Let’s let you battle it out in your own hash table!
• Great explanation:
http://scientopia.org/blogs/goodmath/2013/10/20/basic-data-structures-hash-tables/
• Image sources:
• https://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html
Unused Slides
Searching Techniques
Worst case: Sequential vs Binary
[Figure: a linear search counting steps 1, 2, 3, 4, 5, … 15, next to a binary search counting steps 1, 2, 3, 4.]
• Picture an array
• Illustrate converting unique ID to array
index number