Hangman Optimization

Kyle Anderson, Sean Barton and Brandyn Deffinbaugh

Abstract
The purpose of this study was to find an algorithm that would complete a game of Hangman using the minimum number of guesses. This study used a slightly modified version of the standard Hangman rules, the only change being that there is no limit to the number of wrong guesses the algorithms can make. Three algorithms were created for this task: a unigram search, a bigram search, and an exhaustive search. The unigram and bigram algorithms used letter frequencies obtained from an 8988-word English dictionary containing no symbols. We ran each of the words in this word-bank through each of the three algorithms and calculated the average number of guesses and the average run time of each algorithm. This data was used to compare the three algorithms and determine which one was the most optimal solution with respect to both the number of guesses and the amount of time taken to complete the hangman game.

1. Introduction
Hangman is a word game in which one player, the Hangman, selects a word and presents this randomly chosen word to the player of the game as a set of underscores, one for each letter in the chosen word. The player then guesses letters of the alphabet one at a time. If the letter of the current guess is in the word, its position in the word is made known to the player, and they can make more accurate future guesses. If the guessed letter is not in the word, then that is also communicated to the player. Two of the algorithms used in this study are referred to as n-gram models, where n is the number of letters. These models can be used in a variety of ways, but see a lot of use in natural language processing. An n-gram of size 1 is referred to as a unigram, and an n-gram of size 2 is referred to as a bigram.
The unigram algorithm used in this study was created by finding the letter frequencies as they occur in the words of the Hangman game's dictionary of 8988 words. The letter frequencies were generated in such a way that repeated occurrences of the same letter in a word were not counted toward the total letter frequency; for example, the word "aardvark" contains three As, but only one A would be noted for the purpose of letter occurrence. The same is true for the bigram algorithm: the bigram combination "ar" appears twice in the word "aardvark" but would only be counted once. The third algorithm used in this study simply guesses the letters of the alphabet in alphabetical order, from A to Z.

2. Unigram
The unigram algorithm is fairly straightforward: given a word of length x, it tries to find each letter of the word using a descending-sorted list of single-letter frequencies, which was built from the word-bank. With this list it iterates through and guesses at the possible letters in the given word. If the letter is in the word, the algorithm reveals it to the player and increments the guess counter by one; if it is not in the word, it reveals no new letters to the player and also increments the guess counter by one. The time complexity of the unigram algorithm is M*N^2, where M is a while-loop that corresponds to a Boolean statement; this while-loop can at most iterate to the length of the unigram frequency list. The two Ns represent for-loops over the length of the word. The first N is a list comprehension that creates a list the same size as the string and makes all of its elements underscores; this is a representation of the hidden word, and corresponds to the physical version of the hangman game in which the player is shown underscores matching the length of the word they are to guess. The second N is an iteration through a list the size of the length of the word, where each element of the list is a single letter of the word that was chosen for the algorithm to guess at.
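The unigram procedure described above can be sketched as follows. This is a hypothetical reconstruction, not the authors' actual code from Fig. 1; the tiny WORD_BANK stands in for the 8988-word dictionary, and the function names are our own.

```python
from collections import Counter

# Toy stand-in for the paper's 8988-word dictionary.
WORD_BANK = ["aardvark", "apple", "banana", "grape", "melon"]

def unigram_frequencies(words):
    """Letters sorted by descending frequency, counting each letter
    at most once per word (as the paper describes for 'aardvark')."""
    counts = Counter()
    for w in words:
        counts.update(set(w))           # repeats within a word ignored
    return [letter for letter, _ in counts.most_common()]

def unigram_solve(word, order):
    """Guess letters from most to least frequent; every guess, right
    or wrong, increments the counter (no wrong-guess limit)."""
    hidden = ["_" for _ in word]        # the N-sized underscore list
    guesses = 0
    for letter in order:
        guesses += 1
        for i, c in enumerate(word):    # reveal every occurrence
            if c == letter:
                hidden[i] = letter
        if "_" not in hidden:           # word fully revealed
            break
    return guesses

print(unigram_solve("apple", unigram_frequencies(WORD_BANK)))
```

Note that the exact guess count for a single word depends on how ties among equally frequent letters are broken; the quantity reported in Section 5 is the average over all words in the bank.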
The algorithm then returns the number of guesses it took to change all of the elements of the underscore list into characters from the English alphabet. The code for this can be found in Fig. 1.

3. Bigram
Our second algorithm is a bigram algorithm. It is very similar to the unigram algorithm, except that it finds the frequencies of two-letter combinations instead of the single-letter frequencies that the unigram finds. As it is based on an n-gram model, the bigram behaves in the same fashion as the unigram as well; because of this, the time complexity of the bigram algorithm is the same as the unigram's, M*N^2. The algorithm creates a list of underscore characters with the same number of elements as the length of the given word; this is one N. The bigram also iterates through a list containing the letters of the given word as single elements; this is the other N. The M time complexity, as with the unigram, corresponds to the iteration through the bigram frequency list. When a bigram frequency is chosen, the algorithm simply breaks the two-letter combination into individual letters and checks whether those two letters are in the list containing the letters of the given word. It also checks whether the two letters are next to each other; if they are not, it continues to the next bigram frequency. If they are next to each other, it changes the elements of the underscore list to the letters of that bigram, in the same locations where those letters appear in the original word. The code for this algorithm can be found in Fig. 2.
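As a minimal sketch of this bigram approach (again a hypothetical reconstruction, not the authors' Fig. 2 code), assuming the guessed word comes from the same word-bank so that every adjacent pair of its letters appears in the frequency list:

```python
from collections import Counter

WORD_BANK = ["aardvark", "apple", "banana", "grape", "melon"]

def bigram_frequencies(words):
    """Two-letter combinations sorted by descending frequency,
    counting each combination at most once per word."""
    counts = Counter()
    for w in words:
        counts.update({w[i:i + 2] for i in range(len(w) - 1)})
    return [bg for bg, _ in counts.most_common()]

def bigram_solve(word, order):
    """Try bigrams from most to least frequent; a bigram only reveals
    letters where its two letters sit adjacently in the word. Assumes
    the word's own bigrams all appear in `order`, so one pass through
    the list is guaranteed to finish the word."""
    hidden = ["_" for _ in word]
    guesses = 0
    while "_" in hidden:                       # loop until fully revealed
        for bg in order:
            guesses += 1
            for i in range(len(word) - 1):
                if word[i:i + 2] == bg:        # adjacent match: reveal pair
                    hidden[i], hidden[i + 1] = bg[0], bg[1]
            if "_" not in hidden:
                break
    return guesses
```

Because each guess only pays off when the pair is adjacent in the hidden word, many guesses are wasted, which is consistent with the much larger average guess counts reported for the bigram in Section 5.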
4. Exhaustive Search
Our final approach to this problem was to use an exhaustive search. This algorithm is much different from the unigram and bigram approaches we used previously: the given word was transformed into a list in which each element corresponded to a letter of the word, and this list was guessed at using the alphabet in alphabetical order. To do this we used a while-loop with a Boolean statement to check whether the list containing the underscore characters had been completely changed into a list containing only letters; if so, the algorithm returned the number of guesses it made. This algorithm's time complexity is M^2*N^2. One M is the while-loop corresponding to the Boolean statement, as this statement can only be false as long as the list containing the alphabet has not been completely iterated through. The second M is the for-loop iterating through the list containing the alphabet. One of the Ns is the list comprehension used to create a list the same size as the length of the given word with every element an underscore character. The other N is the iteration through the list containing the letters of the word, one element per letter. The code for this algorithm can be found in Fig. 3.
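The exhaustive search might be sketched as follows, with the loops annotated to match the M^2*N^2 analysis. This is a hypothetical reconstruction rather than the authors' Fig. 3 code, and it assumes lowercase alphabetic words:

```python
import string

def exhaustive_solve(word):
    """Guess 'a'..'z' in alphabetical order until every position of the
    word is revealed. Assumes a lowercase alphabetic word."""
    hidden = ["_" for _ in word]              # N: underscore list comprehension
    guesses = 0
    while "_" in hidden:                      # M: Boolean-checked while-loop
        for letter in string.ascii_lowercase: # M: iteration through the alphabet
            guesses += 1
            for i, c in enumerate(word):      # N: scan the word's letters
                if c == letter:
                    hidden[i] = letter
            if "_" not in hidden:             # stop once fully revealed
                break
    return guesses

print(exhaustive_solve("dog"))  # prints 15: 'o' is the 15th letter tried
```

The guess count for any word is simply the alphabet position of its alphabetically latest letter, which explains why this approach is competitive on average (20.76 guesses) despite trying letters blindly.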
5. Results and Conclusions
Each algorithm was given every word in our word-bank, and the algorithms were compared on both the number of guesses and the amount of time taken to complete the hangman game. On average the unigram performed the best in terms of guesses, with an average of 16.63±3.78 guesses; the exhaustive search was very close in performance, needing on average 20.76±3.09 guesses; and the bigram was significantly worse, averaging 130.37±67.76 guesses to complete the game. This data can be found in Fig. 4. We also compared the average run times of the algorithms using the time.time() function from the time module; the code for the main function handling the word inputs and the time management for the algorithms can be found in Fig. 6. The unigram and bigram time complexities were the same, and as such their average completion times are very similar: the average run time of the unigram algorithm was 0.0171±0.0081 seconds and the bigram's was 0.0136±0.0085 seconds, while the exhaustive search took quite a bit longer at 0.0194±0.0184 seconds. The large standard deviation for the exhaustive search was most likely caused by the huge swing in its possible completion times: its maximum run time was two orders of magnitude higher than average, taking 1.573 seconds, and its minimum was an order of magnitude lower, taking 0.0050 seconds. The unigram and bigram had much smaller swings between minimum and maximum run times, with the unigram's maximum not reaching much higher than its average at 0.0810 seconds and its minimum run time being 0.0010 seconds. The bigram's maximum run time was 0.3800 seconds and its minimum 0.0010 seconds. This data can be found in Fig. 7. Overall the unigram algorithm performed the best, having the lowest average number of guesses, a nearly identical average run time to the bigram, and a much faster run time than the exhaustive search option.

6. Future Work
In the future we could optimize the exhaustive search algorithm to reduce its time complexity; it could then possibly beat the unigram and bigram approaches, since it is already quite competitive with them. Another possibility is to have the unigram and bigram frequency lists update based on variables such as the positions already filled in the given word, making the frequency contributions of those positions irrelevant. It would also be possible to update the unigram and bigram frequency lists by removing the single-letter or two-letter combinations that have already been guessed. Some holes existed in our work as well: occasionally the time.time() function would return an elapsed time of 0 seconds for an algorithm, which might have skewed the results a little; in the future we would definitely like to eliminate this problem.

7. Questions
1) What was the most effective algorithm with respect to average guesses?
A.)
The unigram, with an average of 16.63 guesses.

2) What is a unigram? What is a bigram?
A.) A unigram is the probability of a single token appearing in a set, using either that set or another set to find the initial probability. A bigram is the probability of a combination of two tokens appearing in a given set, using that set or another set to find the initial probability.

3) What is the Big-O of the unigram algorithm? The bigram? The exhaustive search?
A.) Unigram: M*N^2; Bigram: M*N^2; Exhaustive search: M^2*N^2. For the unigram and bigram, M is the time complexity of the while-loop whose maximum number of iterations equals the length of the frequency list. For the exhaustive search, M is the time complexity of both the while-loop that checks whether the word is completely guessed and the iteration through the alphabet list, as both have the same number of iterations. N is the time complexity of creating the list of underscore characters and of iterating through the list of the given word's letters, as both of these lists are the same size.

8. Figures and Captions
1. Unigram code
2. Bigram code
3. Exhaustive search code
4. Average number of guesses for each algorithm
5. Average run time for each algorithm
6. Code for the main function handling the word inputs and the time management for run times
7. Standard deviation, average run time, average guesses, and minimum and maximum run time and guesses for each algorithm