
Hangman Optimization
Kyle Anderson, Sean Barton and Brandyn Deffinbaugh
Abstract
The purpose of this study was to find an algorithm that would complete a game of Hangman using the minimum number of guesses. This study used a slightly modified version of the standard Hangman rules, the only change being that there is no limit to the number of wrong guesses the algorithms can make. Three algorithms were created for this task: a unigram search, a bigram search, and an exhaustive search. The unigram and bigram algorithms used letter frequencies obtained from an 8988-word English dictionary containing no symbols. We ran each word in this word-bank through each of the three algorithms and calculated the average number of guesses and the average run time of each algorithm; this data was used to compare the three algorithms and determine which one was the optimal solution with respect to both the number of guesses and the amount of time needed to complete the Hangman game.
1. Introduction
Hangman is a word game in which one player, the Hangman, selects a word and presents this randomly chosen word to the player of the game as a set of underscores, one for each letter in the chosen word. The player then guesses letters of the alphabet one at a time. If the letter of the current guess is in the word, its position in the word is made known to the player, and they can make more accurate future guesses. If the guessed letter is not in the word, that is also communicated to the player. Two of the algorithms used in this study are referred to as n-gram models, where n is the number of letters. These models can be used in a variety of ways, but see a lot of use in natural language processing. An n-gram of size 1 is referred to as a unigram, and an n-gram of size 2 is referred to as a bigram. The unigram algorithm used in this study was created by finding the letter frequencies as they occur in the words of the Hangman game's dictionary of 8988 words. The letter frequencies were generated in such a way that repeated occurrences of the same letter within a word were not counted toward the total letter frequency. For example, in the word "aardvark" there are three As, but for the purpose of letter occurrence only one A would be noted. The same is true for the bigram algorithm: the bigram combination "ar" appears twice in the word "aardvark" but would only be counted once. The third algorithm used in this study simply guesses the letters of the alphabet in alphabetical order, from A to Z.
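The unique-per-word counting scheme described above can be sketched as follows. This is our own minimal illustration, not the study's code; the function and variable names are ours:

```python
from collections import Counter

def build_frequencies(words):
    """Build descending-sorted unigram and bigram frequency lists.

    Repeated occurrences within a single word count once, e.g.
    "aardvark" contributes one 'a' and one 'ar'.
    """
    unigrams = Counter()
    bigrams = Counter()
    for word in words:
        word = word.lower()
        unigrams.update(set(word))  # unique letters only
        bigrams.update({word[i:i + 2] for i in range(len(word) - 1)})
    # Most frequent first, as the guessing algorithms expect
    return ([letter for letter, _ in unigrams.most_common()],
            [pair for pair, _ in bigrams.most_common()])
```

Sorting by `Counter.most_common()` gives the descending frequency order that both n-gram guessers consume.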
2. Unigram
The unigram algorithm is fairly straightforward: given a word of length x, it tries to find each letter of the word using a descending-sorted list of single-letter frequencies built from the word-bank. With this list it iterates through and guesses at the possible letters in the given word. If the letter is in the word, it is revealed to the player and the guess counter is incremented by one; if it is not in the word, no new letters are revealed, and the guess counter is still incremented by one. The time complexity of the unigram algorithm is M*N^2, where M is a while-loop governed by a Boolean statement; this while-loop can at most iterate to the length of the unigram frequency list. The two Ns represent for-loops over the length of the word. The first N is a list comprehension that creates a list the same size as the string and makes every element an underscore; this is a representation of the hidden word, corresponding to the physical version of the Hangman game where the player is shown underscores matching the length of the word they are to guess at. The second N is an iteration through a list the size of the word, where each element is a single letter of the word chosen for the algorithm to guess at. The algorithm then returns the number of guesses it took to change all of the underscore elements of the list into characters from the English alphabet. The code for this can be found in Fig. 1.
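A minimal reconstruction of this guessing loop (ours, not the actual code in Fig. 1; `letter_freqs` is assumed to be the descending-sorted single-letter frequency list):

```python
def unigram_guess(word, letter_freqs):
    """Guess a word letter by letter, most frequent letters first.
    Every guess, hit or miss, increments the counter by one."""
    hidden = ['_' for _ in word]   # underscore representation (first N)
    guesses = 0
    i = 0
    while '_' in hidden and i < len(letter_freqs):  # the M while-loop
        letter = letter_freqs[i]
        for pos, ch in enumerate(word):             # the second N
            if ch == letter:
                hidden[pos] = ch
        guesses += 1
        i += 1
    return guesses
```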
3. Bigram
Our second algorithm is a bigram algorithm. This algorithm is very similar to the unigram algorithm, except that it uses the frequencies of two-letter combinations instead of the single-letter frequencies that the unigram uses. As it is based on an n-gram model, the bigram behaves in the same fashion as the unigram; because of this, the time complexity of the bigram algorithm is the same as the unigram's, M*N^2. The algorithm creates a list of underscore characters with the same number of elements as the length of the given word; this is one N. The bigram also iterates through a list containing the letters of the given word as single elements, the other N. The M, as with the unigram, corresponds to the iteration through the bigram frequency list. When a bigram frequency is chosen, the algorithm simply breaks the two-letter combination into individual letters and checks whether those two letters are in the list containing the letters of the given word. It also checks whether the two letters are next to each other; if they are not, it continues to the next bigram frequency. If they are next to each other, it changes the elements of the underscore list to the letters of the bigram, at the same locations where those letters are found in the original word. The code for this algorithm can be found in Fig. 2.
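The adjacency check just described might be reconstructed like this (a sketch with our own names, not the study's code in Fig. 2; we also guard against running off the end of the frequency list):

```python
def bigram_guess(word, bigram_freqs):
    """Guess a word using a descending-sorted bigram frequency list.
    Each bigram is split into two letters; underscores are revealed
    only where the pair occurs adjacently in the word."""
    hidden = ['_' for _ in word]   # underscore representation (one N)
    guesses = 0
    i = 0
    while '_' in hidden and i < len(bigram_freqs):  # the M while-loop
        first, second = bigram_freqs[i]  # break the pair into letters
        for pos in range(len(word) - 1):  # the other N: adjacency scan
            if word[pos] == first and word[pos + 1] == second:
                hidden[pos], hidden[pos + 1] = first, second
        guesses += 1
        i += 1
    return guesses
```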
4. Exhaustive Search
Our final approach to this problem was an exhaustive search. This algorithm is much different from the unigram and bigram approaches: here, the given word was transformed into a list where each element corresponded to a letter in the word, and this list was guessed at using the alphabet in alphabetical order. To do this, we used a while-loop associated with a Boolean statement to check whether the list containing the underscore characters had been completely changed into a list containing only letters; if so, the algorithm returned the number of guesses it had made. This algorithm's time complexity is M^2*N^2. One M is the while-loop corresponding to the Boolean statement, as this statement can only be false as long as the list containing the alphabet has not been completely iterated through. The second M is the for-loop iterating through the list containing the alphabet. One of the Ns is the list comprehension used to create a list the same size as the given word with every element an underscore character. The other N is the iteration through the list containing the letters of the word as individual elements. The code for this algorithm can be found in Fig. 3.
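A sketch of this A-to-Z search, assuming (as in the study's word-bank) that words contain only the letters a–z; the names are ours, not those of Fig. 3:

```python
import string

def exhaustive_guess(word):
    """Guess letters in alphabetical order from A to Z and return
    the number of guesses made before the word is fully revealed."""
    hidden = ['_' for _ in word]  # one N: the underscore list
    guesses = 0
    while '_' in hidden:          # one M: the Boolean while-loop
        for letter in string.ascii_lowercase:  # the other M: A..Z
            guesses += 1
            for pos, ch in enumerate(word):    # the other N
                if ch == letter:
                    hidden[pos] = ch
            if '_' not in hidden:
                break
    return guesses
```

Because every word is revealed within one pass through the alphabet, the guess count is bounded above by 26, which matches how close its average of guesses is to the unigram's.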
5. Results and Conclusions
Each algorithm was given every word in our word-bank, and the algorithms were compared on both the number of guesses and the amount of time taken to complete the Hangman game. On average the unigram performed the best in terms of guesses, with an average of 16.63±3.78 guesses; the exhaustive search was very close in performance, needing 20.76±3.09 guesses on average; and the bigram was significantly worse, averaging 130.37±67.76 guesses to complete the game. This data can be found in Fig. 4. We also compared the average run times of the algorithms using the time.time() function from the time module; the code for the main function handling the word inputs and the time management for the algorithms can be found in Fig. 6. The unigram and bigram time complexities were the same, and as such their average completion times are very similar. The average run time of the unigram algorithm was 0.0171±0.0081 seconds and the bigram's average run time was 0.0136±0.0085 seconds; the exhaustive search took quite a bit longer, at 0.0194±0.0184 seconds. The large standard deviation for the exhaustive search was most likely caused by the huge swing in its possible completion times: its maximum run time, at 1.573 seconds, was two orders of magnitude above the average, and its minimum, at 0.0050 seconds, was an order of magnitude below it. The unigram and bigram had much smaller swings between minimum and maximum run times: the unigram's maximum run time, 0.0810 seconds, did not reach much higher than its average, and its minimum run time was 0.0010 seconds; the bigram's maximum run time was 0.3800 seconds and its minimum 0.0010 seconds. This data can be found in Fig. 7. Overall, the unigram algorithm performed the best, having the lowest average number of guesses, a nearly identical average run time to the bigram, and a much faster run time than the exhaustive search option.
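The measurement loop described above might look like the following. This is a hypothetical sketch with our own names; the study's actual main function appears in Fig. 6:

```python
import time

def benchmark(algorithm, words):
    """Run `algorithm` on every word in the bank, recording the
    guesses taken and the wall-clock time via time.time()."""
    guesses, times = [], []
    for word in words:
        start = time.time()
        guesses.append(algorithm(word))
        times.append(time.time() - start)
    return sum(guesses) / len(guesses), sum(times) / len(times)
```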
6. Future Work
In the future we could optimize the exhaustive search algorithm to reduce its time complexity; since it is already quite competitive with the unigram and bigram approaches, it could then possibly outperform them. Another possibility is to have the unigram and bigram frequency lists update based on variables such as which positions in the given word are already filled, making the frequencies contributed by those positions irrelevant. It would also be possible to update the unigram and bigram frequency lists by removing the single-letter or two-letter combinations that have already been guessed. Some holes existed in our work as well: occasionally the time.time() function would return an elapsed time of 0 seconds for an algorithm, which may have skewed the results slightly; in the future we would like to eliminate this problem.
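The zero-second readings are a resolution artifact: on many platforms time.time() has coarser granularity than a single short run. One possible remedy (our suggestion, not something the study implemented) is the higher-resolution time.perf_counter():

```python
import time

# perf_counter is a monotonic, high-resolution clock intended for
# timing short intervals, unlike the wall-clock time.time().
start = time.perf_counter()
_ = sum(range(100_000))  # stand-in for a single algorithm run
elapsed = time.perf_counter() - start
assert elapsed > 0.0     # does not collapse to exactly zero here
```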
7. Questions
1) What was the most effective algorithm with respect to average guesses?
A.) The unigram, with an average of 16.63 guesses.
2) What is a unigram? What is a bigram?
A.) A unigram is the probability of a single token appearing in a set, using either that set or another set to find the initial probability. A bigram is the probability of a combination of two tokens appearing in a given set, using that set or another set to find the initial probability.
3) What is the Big-O of the unigram algorithm? Bigram? Exhaustive search?
A.) Unigram: M*N^2, Bigram: M*N^2, Exhaustive search: M^2*N^2. For the unigram and bigram, M is the time complexity of the while-loop, whose maximum number of iterations is the length of the frequency list. For the exhaustive search, M covers both the while-loop that checks whether the word is completely guessed and the iteration over the alphabet list, as both have the same number of iterations. N is the time complexity of creating the list of underscore characters and of iterating through the list of the given word, as both lists are the same size.
8. Figures and Captions
1. Unigram Code
2. Bigram Code
3. Exhaustive Search Code
4. Average number of guesses for each algorithm
5. Average run time for each algorithm
6. Code for the main function handling the word inputs and the time management for run times
7. Standard deviation, average run time, average guesses and minimum and maximum run time
and guesses for each algorithm
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.
Figure 7.