CSIS 10B
Assignment 15 Maps and Hashing
Due: next Wed
In class Demos (0 points)
We develop a SimpleHashMap class:
1) built from an Array of Associations
2) experiment with different hash methods
3) put, get methods
4) linear probing vs chaining
Assignment (10 points) Do either Basic or Advanced
Basic -- do all these exercises in lab15basic folder inside mod15 download
Part 1 -- Building a SimpleHashSet class using Chaining (LinkedList) to accommodate collisions
As discussed below, a HashSet is simpler than a HashMap because it is not associative. It just stores
objects into a large table at a location according to their hash code. The internal table of your
SimpleHashSet will be an ARRAY of LinkedList. Items arriving at a hash index will simply be added to the
LinkedList at that location.
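The chaining idea described above can be sketched roughly as follows. This is only an illustration, not the required SimpleHashSet: the class name ChainingSketch and the helper indexFor are made up here, and indexFor leans on Java's built-in String.hashCode purely as a stand-in for whatever hash method your own class defines.

```java
import java.util.LinkedList;

// Hypothetical illustration of chaining; not the required SimpleHashSet.
class ChainingSketch {
    private final LinkedList<String>[] table; // one chain (LinkedList) per slot
    private int size = 0;

    @SuppressWarnings("unchecked")
    public ChainingSketch(int capacity) {
        table = new LinkedList[capacity];      // generic arrays need a raw "new"
        for (int i = 0; i < capacity; i++) {
            table[i] = new LinkedList<>();     // start every slot with an empty chain
        }
    }

    // Stand-in index function (assumption): built-in hashCode folded into range.
    private int indexFor(String s) {
        return Math.floorMod(s.hashCode(), table.length);
    }

    // Adds s to the chain at its slot; returns false if it was already there.
    public boolean add(String s) {
        LinkedList<String> chain = table[indexFor(s)];
        if (chain.contains(s)) {
            return false;
        }
        chain.add(s);
        size++;
        return true;
    }

    // Only the one chain at the item's slot has to be searched.
    public boolean contains(String s) {
        return table[indexFor(s)].contains(s);
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainingSketch set = new ChainingSketch(13);
        System.out.println(set.add("CAMELS"));      // true
        System.out.println(set.add("CAMELS"));      // false (duplicate)
        System.out.println(set.contains("CAMELS")); // true
        System.out.println(set.size());             // 1
    }
}
```

Because colliding items simply share a chain, add never has to search for another free slot, which is the main practical difference from linear probing.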
1) Design and implement a SimpleHashSet class following the specifications below, which are a
subset of the actual Java HashSet class. Note that unlike the Java HashSet, your SimpleHashSet
stores Strings only (rather than Object). Either way, no generic mechanisms need to be added to
your code.
2) Also, instead of using the default String hashCode method, your SimpleHashSet will define and
use its own private hash method to locate a position for new values in the table, according to
the following criteria:
private int hash(String s)
return the sum of the ASCII codes of the first 3 letters % table capacity
if there are fewer than 3 letters, use the sum of all the letters instead
(still % table capacity, so the index stays inside the table)
(note you can access the ASCII code of a letter by using (int) s.charAt(k),
where k is the index).
3) Please also define a print method which prints the entire contents of a SimpleHashSet to the
console in tabular form, giving an output that looks something like this:
HashSet
size = 4
table:
0
1
2
3
4
5 [aaab, aaaa]
6 [abax, abaa]
7
8
9
10
11
12
4) Test your new class using the HashSetTester program.
5) When that works, add statements to the bottom of your HashSetTester program that read and
add all words in file marlin.txt into a SimpleHashSet. Your final table will look something like this:
HashSet
size = 28
table:
0 [SINCERELY, THAT, AZORES]
1 [BUT, CAMELS]
2 [BIG, ABOUT]
etc.
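As a rough check on the hash rule and the print format from steps 2 and 3, here is a small sketch. The class name HashAndPrintSketch and the helpers build and format are hypothetical, and the capacity is passed in as a parameter only to keep the example self-contained (your own hash method takes just the String). It assumes a table capacity of 13, which is what the sample output implies, and it applies % capacity to short words as well so the index stays in range.

```java
import java.util.LinkedList;

// Hypothetical helper class; names are illustrative, not part of the assignment.
class HashAndPrintSketch {
    // Sum of the ASCII codes of the first 3 letters (or all letters if fewer),
    // reduced modulo the table capacity.
    public static int hash(String s, int capacity) {
        int sum = 0;
        int n = Math.min(3, s.length());
        for (int k = 0; k < n; k++) {
            sum += (int) s.charAt(k);
        }
        return sum % capacity;
    }

    // Builds a chained table and drops each word into the chain at its hash index.
    @SuppressWarnings("unchecked")
    public static LinkedList<String>[] build(int capacity, String... words) {
        LinkedList<String>[] table = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            table[i] = new LinkedList<>();
        }
        for (String w : words) {
            table[hash(w, capacity)].add(w);
        }
        return table;
    }

    // One row per slot: the index, then the chain if it is not empty.
    public static String format(LinkedList<String>[] table, int size) {
        StringBuilder sb = new StringBuilder("HashSet\nsize = " + size + "\ntable:\n");
        for (int i = 0; i < table.length; i++) {
            sb.append(i);
            if (!table[i].isEmpty()) {
                sb.append(' ').append(table[i]); // LinkedList prints as [a, b]
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so "aaa..." sums to 291 and "aba..." to 292.
        System.out.println(hash("aaab", 13)); // 5
        System.out.println(hash("abax", 13)); // 6
        System.out.print(format(build(13, "aaab", "aaaa", "abax", "abaa"), 4));
    }
}
```

With these four words, rows 5 and 6 hold the two chains and every other row is empty, matching the sample table in step 3.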
Part 2 – Creating a Document Index (a table of words and the line numbers they appear on)
A useful application of HashMaps (see the HashMap Javadocs) is to create a word index (kind of like a book
index) that maps each word in a file (KEY) to a list of the line numbers it appears on (VALUE). The
program shell given in MarlinIndex reads the marlin.txt file one line at a time. Right now it just lists
each word as it is read, together with the line number it was found on. This is done
by changing the delimiter of the first Scanner object to "\n" so that it reads a complete line at a time, then creating
a second Scanner to read that line (a String) one word at a time.
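The two-Scanner technique described above might look something like the sketch below. It is not the MarlinIndex shell itself: the class name TwoScannerSketch and the method wordsWithLines are made up, and the text is read from a String rather than from marlin.txt so the example can run on its own. The sample text is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical illustration of reading lines, then words within each line.
class TwoScannerSketch {
    // Returns entries of the form "WORD lineNumber", in reading order.
    public static List<String> wordsWithLines(String text) {
        List<String> out = new ArrayList<>();
        Scanner lines = new Scanner(text);
        lines.useDelimiter("\n");               // first Scanner: one whole line per token
        int lineNumber = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            lineNumber++;
            Scanner words = new Scanner(line);  // second Scanner: reads from the String
            while (words.hasNext()) {
                out.add(words.next() + " " + lineNumber);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample text (assumption), not the contents of marlin.txt.
        for (String entry : wordsWithLines("DEAR MARLIN\nTHE AARDVARKS")) {
            System.out.println(entry);
        }
    }
}
```

When reading from a file instead, the first Scanner would be constructed from a File object; the inner loop stays the same.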
Our goal is to take this data and organize it into a line index. The final result of this program will be an
output that looks something like this:
AARDVARKS : [2]
ABOUT : [4]
AND : [2, 6, 6, 6, 7, 7]
ANIMALS : [6]
ANTS : [6, 7]
ARE : [6, 7, 7, 7]
AZORES : [3]
BATS : [6, 7]
BETWEEN : [7]
BIG : [7]
BUT : [7]
CAMELS : [2]
CATS : [6, 7]
The output above indicates that the word AND appears on lines 2, 6 and 7, while the word CAMELS
appears only on line 2.
All we need to do is add statements to the inner while loop that add and modify
key/value pairs in the HashMap. The basic steps to create the map are, for each word read:
a) If the map contains a KEY for the word
set wordIndex (our linked list variable, already declared) = the value in map associated with word (this is a LinkedList)
add the current line number to wordIndex (invokes the LinkedList add method)
b) Otherwise
set wordIndex = a NEW LinkedList
add the current line number to wordIndex (invokes the LinkedList add method)
put a new KEY/VALUE item into your map (where KEY is word and VALUE is wordIndex)
c) That's all that needs to be done to create the map. After both loops complete (the file has been
completely read and organized into map), you can print the map (check the HashMap
Javadoc) as follows:
obtain an iterator of Keys ( map.keySet().iterator() )
obtain an iterator of Values ( map.values().iterator() )
as long as the key iterator has another item,
print the next items of both iterators on the same line
d) If all works well, you'll get something like this, which is not alphabetized! The reason is
that HashMaps order their items only by hashCode, which is generally not
alphabetical.
AARDVARKS : [2]
ABOUT : [4]
SHIPPED : [2]
TO : [2]
THE : [2, 2, 2]
WERE : [2]
MARLIN : [1]
MISTAKENLY : [2]
SORRY : [4]
BUT : [7]
AND : [2, 6, 6, 6, 7, 7]
ANIMALS : [6]
e) How to alphabetize the index? Change the data type of map from HashMap to TreeMap!
TreeMaps organize their associations in a type of BST, so the key and value iterators will show
the data in key order.
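Steps a) through e) can be sketched as follows. This is one possible shape under stated assumptions, not the MarlinIndex solution: the class name WordIndexSketch and the methods buildIndex and format are hypothetical, the input is a short made-up String rather than marlin.txt, and the map is passed in as a parameter so the same code can be tried with both a HashMap and a TreeMap.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical illustration of building and printing a word index.
class WordIndexSketch {
    public static void buildIndex(String text, Map<String, LinkedList<Integer>> map) {
        Scanner lines = new Scanner(text);
        lines.useDelimiter("\n");                     // one line per token
        int lineNumber = 0;
        while (lines.hasNext()) {
            Scanner words = new Scanner(lines.next());
            lineNumber++;
            while (words.hasNext()) {
                String word = words.next();
                LinkedList<Integer> wordIndex;
                if (map.containsKey(word)) {          // step a: word already indexed
                    wordIndex = map.get(word);
                    wordIndex.add(lineNumber);
                } else {                              // step b: first time we see it
                    wordIndex = new LinkedList<>();
                    wordIndex.add(lineNumber);
                    map.put(word, wordIndex);
                }
            }
        }
    }

    // Step c: walk the key and value iterators side by side.
    public static String format(Map<String, LinkedList<Integer>> map) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> keys = map.keySet().iterator();
        Iterator<LinkedList<Integer>> values = map.values().iterator();
        while (keys.hasNext()) {
            sb.append(keys.next()).append(" : ").append(values.next()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "THE CAMELS AND THE\nAND BATS"; // made-up sample (assumption)

        Map<String, LinkedList<Integer>> hashed = new HashMap<>();
        buildIndex(text, hashed);
        System.out.print(format(hashed));             // step d: order follows hash codes

        Map<String, LinkedList<Integer>> sorted = new TreeMap<>();
        buildIndex(text, sorted);
        System.out.print(format(sorted));             // step e: keys in alphabetical order
    }
}
```

Declaring the variable as Map rather than HashMap means that switching to a TreeMap in step e) changes only the one line that constructs the map.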
Advanced – do these exercises in the lab15advanced folder
Do Lab 15.8 on page 401 of your text. A file in the lab15advanced folder has been provided with the top
15,000 surnames in the US. The idea is very similar to MarlinIndex above. The key is the soundex code of
a name, and the value is a linked list of surnames that match the soundex value.
A nice app would be to read in the 15,000 names and store them in a TreeMap. Then ask the user to enter
their surname, compute its soundex code, and show the neighboring soundex entries, up to 5
keys before and after the entered surname's soundex.
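The "neighboring entries" part of that app could be approached with TreeMap's lowerKey and higherKey methods, as in the sketch below. Computing the soundex code itself is left to Lab 15.8 and is not shown. The class name SoundexNeighborsSketch and the methods neighborKeys and sampleMap are hypothetical, and the handful of codes and names in sampleMap are small sample data for illustration, not the contents of the provided surname file.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration of finding keys near a given soundex code.
class SoundexNeighborsSketch {
    // Returns up to 'radius' keys before the code, the code itself if present,
    // and up to 'radius' keys after it, all in sorted order.
    public static List<String> neighborKeys(TreeMap<String, LinkedList<String>> map,
                                            String code, int radius) {
        LinkedList<String> keys = new LinkedList<>();
        String k = code;
        for (int i = 0; i < radius && (k = map.lowerKey(k)) != null; i++) {
            keys.addFirst(k);                 // walk backward, keep sorted order
        }
        if (map.containsKey(code)) {
            keys.add(code);
        }
        k = code;
        for (int i = 0; i < radius && (k = map.higherKey(k)) != null; i++) {
            keys.addLast(k);                  // walk forward
        }
        return keys;
    }

    // Small sample map (assumption): soundex code -> surnames sharing that code.
    public static TreeMap<String, LinkedList<String>> sampleMap() {
        TreeMap<String, LinkedList<String>> map = new TreeMap<>();
        map.put("S530", new LinkedList<>(Arrays.asList("SMITH", "SMYTHE")));
        map.put("R163", new LinkedList<>(Arrays.asList("ROBERT", "RUPERT")));
        map.put("A261", new LinkedList<>(Arrays.asList("ASHCRAFT")));
        map.put("P236", new LinkedList<>(Arrays.asList("PFISTER")));
        map.put("T522", new LinkedList<>(Arrays.asList("TYMCZAK")));
        return map;
    }

    public static void main(String[] args) {
        TreeMap<String, LinkedList<String>> map = sampleMap();
        for (String key : neighborKeys(map, "R163", 5)) {
            System.out.println(key + " : " + map.get(key));
        }
    }
}
```

Both methods work even when the entered code is not itself a key, so a surname missing from the file still gets a sensible neighborhood.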