Conference Session C11
Paper #64

Disclaimer—This paper partially fulfills a writing requirement for first-year (freshman) engineering students at the University of Pittsburgh Swanson School of Engineering. This paper is a student paper, not a professional paper. This paper is based on publicly available information and may not provide complete analyses of all relevant data. If this paper is used for any purpose other than these authors’ partial fulfillment of a writing requirement for first-year (freshman) engineering students at the University of Pittsburgh Swanson School of Engineering, the user does so at his or her own risk.

The Use of Genetic Algorithms in Evolutionary Computing to Improve the Performance of Speech Recognition Technology

Nicholas Buck, [email protected], Budny, 10:00, Carly Hoffman, [email protected], Budny, 10:00
University of Pittsburgh Swanson School of Engineering, 11.02.2017

Abstract—It is no secret that humans are accustomed to speech as their main method of communication. It comes as no surprise, then, that they would want to use that method to communicate with technological devices. However, speech processing for a human being and for a computer are two entirely different processes. A device, for example, cannot immediately distinguish background noise from the speech input it is meant to recognize. However, a neural network applied in conjunction with genetic algorithms can optimize speech recognition so that it processes speech input in a way similar to the human mind. Genetic algorithms are one of many types of optimization techniques used to make processes such as speech recognition more efficient and accurate; they resemble the biological processes of evolution and natural selection. Speech recognition is widespread in modern technology, being implemented in most phones, televisions, and even cars. We want to highlight how these algorithms make technology more usable and simpler to communicate with for everyone who interacts with it. To explain how these algorithms work and benefit humans, we describe the history, components, and inner workings of both genetic algorithms and speech recognition, as well as the sustainability of these technologies. We also discuss how these algorithms will shape the future of our technology, and what impacts they have already made, by referencing past and current experimental results.

Key Words—Discrete Hidden Markov Model, Evolutionary Computing, Genetic Algorithms, Neural Networks, Optimization, Speech Recognition

OPTIMIZATION IS KEY

Many have likely heard of the “language barrier,” the idea that differences in communication are one of the main obstacles keeping humanity largely separated. To remedy this separation, advances in computing and technology have brought about the creation of speech recognition and computerized speech translation. But these advances are far from perfect. What would truly advance the goal of this technology is the ability to translate speech in real time. Recent years have yielded results close to this objective, and while full realization has not yet happened, advances in evolutionary computing could be the key to obtaining the speed necessary to accomplish such a feat. Humans are accustomed to using speech to communicate; it makes sense, then, that they would like to interact with their technology in the same manner.
Currently, we interact with most of our devices through peripherals such as a keyboard and mouse or through touch, but these would become less necessary and more inconvenient if speech recognition were to become more accurate and reliable [1]. There are difficulties that come with the technology, however. At times, it can be difficult for a machine to recognize everything we say due to external factors like background noise or similar sounding phonemes which can cause words to get switched or confused. There are many methods that have been used to optimize many different processes (computer or otherwise) using computer power, a genetic algorithm is a widely-used technique of this sort [2]. A genetic algorithm is a computerized process similar to the biological process of natural selection. It has been shown that the use of these algorithms applied to speech signals used in recognition greatly improve both speed and performance [1]. But this is not the only way these algorithms can be applied to this situation alone. Spoken translation software are becoming increasingly more prevalent, but they can often be inaccurate. These inaccuracies are due to anything from differences in language laws and words having multiple possible translations, to simple background noise. Filtering out background noise can be done using common methods, but few can match the speed of a genetic algorithm. In the sections to follow the methods behind evolutionary computing and therefore genetic algorithms will be explained and outlined, as well as how these can be applied to speech recognition and the world. Nicholas Buck Carly Hoffman makes it highly important to evolution, and as such it also plays an essential role in the operation of genetic algorithms. Crossover (one-point crossover) is simply defined: two chromosomes are lined up, a point along the chromosome is selected at random, then the pieces left of that point are exchanged between the chromosomes, producing a pair of offspring chromosomes. This simplified version of crossover is a close approximation of what occurs in mating. One point crossover causes alleles that are close together on the chromosome to be more likely to be passed on to one of the offspring together, while alleles that are further apart are likely to be separated by crossover. This phenomenon is referred to in genetic terms as linkage. Linkage can be determined experimentally, and assuming one point crossover made gene sequencing possible long before there was any knowledge of DNA. The genetic algorithm following Fisher’s outline uses differing fitness of variants in the current generation to make improvements to the next generation, but the computerized GA places emphasis on the variants produced by crossover. To clarify, a generation in this situation is a collection of data that could be user determined or randomly determined based on what problem they are trying to solve. In the case of speech recognition, an individual could be a string of words or phonemes that could possibly fit the input and the population would be made up of several individuals of that nature. The basic GA subroutine follows a few simple steps to produce each new generation. First, the algorithm will start with a population of individual strings of alleles. These could be specific, or generated at random. Second, two individuals are selected at random from the user determined population with a bias toward individuals with higher fitness. 
Third, crossover is used with the occasional use of mutation to produce two new individuals for the next generation. The fourth step is essentially to repeat steps one and two until all individuals from the initial population have been used, then step five is simply to return to step one to produce the next generation out of the newly created one. There are many ways to modify these steps, but most characteristics of a GA are already shown in the basic outline as described. THE SCIENCE BEHIND GENETIC ALGORITHMS Genetic algorithms (GA) are the computer-executable version of a mathematical algorithm created to model the rate at which a gene might spread through a population [3]. The original algorithm was created by R.A. Fisher, a statistician and biologist who founded mathematical genetics based on the idea that a chromosome consists of a string of genes. GA are routinely used to find solutions for problems that cannot be solved using standard techniques, as well as for their intended purpose of modeling genetics. The GA’s ability to function hinges on a few key concepts. For one, there is a specified set of alternatives for each gene, in biology these are known as alleles. This means that there are specific allowable strings or combinations of genes which will constitute the possible chromosomes [3]. Evolution within the algorithm is viewed as generational, such that at each stage a population of individuals produce a set of offspring which make up the next generation [3]. Each algorithm will have a fitness function which assigns to each string of alleles the number of offspring the individual possessing that chromosome will contribute to the next generation based on some specified criterion [1][3]. And finally, there is a set of genetic operators that modify offspring so that the next generation is unique and differs from the current generation [3]. Fisher’s operators are fewer than those which computer scientists have added since his formulation but his idea of mutation has remained as a common operator to ensure a unique next generation. A genetic algorithm can be thought of as a simplified model of “survival of the fittest.” In computing, these algorithms are a computer-executable version of the model created by Fisher. But, there are a couple of generalizations that were made to the original model to accommodate computerized use. For instance, weight is placed on the interaction of genes instead of making the assumption that they act independently. In addition, there is an expanded set of genetic operators which allow for a model closer to that of actual mating trends [3]. Fisher’s operator of mutation was kept, while other now common operators like crossing-over and inversion were added later to accommodate more problems and their parameters. The first generalization made about the fitness function causes it to become a more complex and nonlinear function that cannot be approximated by simply adding up the parts of individual genes. The second generalization concerning genetic operators allows the algorithm to emphasize more genetic mechanisms, like crossover, that operate regularly on chromosomes. The difference between the two is frequency. Crossover takes place in every mating whereas mutation typically occurs in less than one in a million individuals [3]. Crossover is the integral reason that mammals produce offspring exhibiting a mixture of their parents’ attributes. 
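To make the subroutine above concrete, the following is a minimal sketch of such a GA in Python, using fitness-biased selection, one-point crossover, and occasional mutation. The binary-string representation, population size, and toy fitness function are illustrative assumptions for demonstration only; they are not details taken from the sources.

```python
import random

# Illustrative parameters (not specified in the sources).
POP_SIZE = 20
CHROM_LEN = 16
MUTATION_RATE = 0.01   # mutation is rare compared to crossover
GENERATIONS = 50

def fitness(chrom):
    """Toy fitness function: count of 1-alleles (a stand-in for any criterion)."""
    return sum(chrom)

def select(population):
    """Step 2: pick one individual at random, biased toward higher fitness."""
    weights = [fitness(ind) + 1 for ind in population]  # +1 avoids an all-zero wheel
    return random.choices(population, weights=weights, k=1)[0]

def one_point_crossover(a, b):
    """Step 3: exchange the pieces left of a random point, producing two offspring."""
    point = random.randrange(1, len(a))
    return b[:point] + a[point:], a[:point] + b[point:]

def mutate(chrom):
    """Occasionally flip an allele so the next generation stays unique."""
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

def next_generation(population):
    """Steps 2-4: select, cross over, and mutate until a full new generation exists."""
    offspring = []
    while len(offspring) < len(population):
        parent1, parent2 = select(population), select(population)
        child1, child2 = one_point_crossover(parent1, parent2)
        offspring.extend([mutate(child1), mutate(child2)])
    return offspring[:len(population)]

# Step 1: start from a random population, then repeat (step 5) for several generations.
population = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = next_generation(population)
print(max(fitness(ind) for ind in population))
```

In a speech recognition setting, the chromosome would instead encode a candidate string of phonemes or model parameters, and the fitness function would score how well that candidate fits the input signal.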
This FIGURE 1 [4] Visual representation of the genetic operator crossover GA are particularly helpful for solving problems that present the issues of chaos, chance, nonlinear interactions, and temporality [1]. They are efficient because they are more flexible than other methods of problem solving. That, and they are easy to modify given domain-dependent heuristics 2 Nicholas Buck Carly Hoffman where what works to solve one problem, may fail to solve another [1]. approach, artificial neural networks are commonly used, while a hidden Markov model is used for a statistical approach. Both methods will be explained fully in the following sections, as well as how they interact with genetic algorithms to optimize speech recognition. Pictured in figure 2 is a general model for speech recognition. The parameter of evaluation, and therefore the accuracy of any algorithm, can vary due to many factors. Vocabulary size and confusability for one can have an effect. The larger the vocabulary, the greater the chance of recognition error. Speaker independence or dependence is another factor which may affect accuracy. Speaker independence is a feat difficult to achieve because a system may become used to the speaker it was trained on, meaning that parameters can become speaker-specific and what may work for recognizing one speaker may not work for another [1]. Error rates are usually three to five times higher for speaker independent systems than for speaker dependent ones because of this [1]. A system is normally designed with one of three speech input attributes in mind; isolated, discontinuous, or continuous. Choosing any of these options could cause differing amounts of error. Isolated in terms of speech means that single words are being input, while discontinuous means full sentences are being input with a short silence separating words. Continuous speech input is where the user simply speaks how they normally would. Isolated and discontinuous speech recognition is relatively easy because word boundaries are easily detected and words are clearly pronounced, while continuous speech is more common yet more difficult to recognize. Kind of task and language constraints are the next varying attributes. Even having a fixed vocabulary, performance can vary based on the nature of constraints on the word sequences allowed during recognition. Task difficulty is measured by complexity rather than vocabulary size, so the more difficult the task is the more likely there will be errors. Read or spontaneous speech have an effect on a recognition system’s accuracy due to their nature. Systems can be evaluated on speech that is read from prepared scripts, or spontaneous speech. Spontaneous speech is much more difficult, due to an added element of chaos. And finally, adverse conditions have a large effect on recognition accuracy as previously mentioned. Many things can impact a system’s performance due to adverse conditions. Things like noise, acoustic distortions, differing microphones, limited frequency bandwidth and differing ways of speaking can all affect a system [1]. Essentially, the main issue in the realm of speech recognition is variability. Since there are so many factors that can affect the parameters for recognizing speech, it could prove challenging to eliminate all the possible obstacles. But, this is why GA are a good fit for optimizing speech recognition systems. They can handle elements of chaos and variability unlike most other heuristics. 
And studies have proven that the application of a genetic algorithm in the speech recognition process improves overall performance and SPEECH PROCESSING When you speak, you create vibrations in the air. An analog-to-digital converter (ADC) translates this analog wave into computer-readable data [5]. For this to occur, it digitizes the sound by taking measurements of the wave at frequent intervals. The system filters the sound to remove unwanted noise and separates it into different bands of frequency. The computer then normalizes the sound, or adjusts it to a constant volume level. It may also be corrected for temporal shifts. This is because people do not speak at the same rate, so the sound of their speech must be adjusted to match the speed of what the system has in its’ memory. Then, the signal is divided into many segments as small as hundredths of a second or even thousandths [5]. The program will match these segments to known phonemes in the designated language, a phoneme being the smallest sound elements of a language. This is often done by calculating the similarity of two different speech utterances to determine what has most likely been said [6]. FIGURE 2 [1] Pictured is a flow chart which models the speech recognition process Speech recognition is broken generally into two steps: feature extraction and classification [1]. Feature extraction is a preprocessing procedure in speech recognition. It extracts the specific voice features from the input speech signals. If the environment were noise free, each word or phoneme has a corresponding frequency. But, when the environment is noisy speech signals are impure and it becomes problematic to identify corresponding features. This problem worsens if the speech to be recognized has close phonemes, which is a common problem when recognizing mandarin speech as saying one utterance with a slight variation can impact meaning. Classification is the next procedure used to identify input speech based on what came through feature extraction. This classification can be done in a pattern recognition approach, or a statistical approach. For a pattern recognition 3 Nicholas Buck Carly Hoffman results [1]. These findings reveal a promising path for furthering speech recognition and its associated technology. large. So, A is a matrix containing probabilities of a hot year being followed by either a hot year or a cold year, and the same for a cold year. The B matrix will have rows corresponding to hot and cold, and rows corresponding to small, medium, and large. Thus, each element will be a probability of a certain ring size being observed on a year of either hot or cold average temperature. Finally, the initial state distribution simply contains two elements, the probability of the period of observation starting on either a hot year or a cold year. Because every element of each of these matrices is a probability, the sum of each row must be one. The last element, O, will contain the observations made, meaning each element will be either small, medium, or large. This set will contain T elements, with T being the number of years an observation is made for, in other words the number of rings looked at. There are three widely recognized “fundamental problems” to be solved using the Hidden Markov Model. The first is to determine the probability of obtaining the observed set, O, from the given model of A, B, and Pi. The second is to, given the state sequence O and the model λ, determine the most likely sequence of states. 
In other words, the objective of this problem is to “uncover the hidden part of the Hidden Markov Model” [8]. This is essentially what was being done in the above example. Finally, the third problem is to, given the observation sequence O and the dimensions N and M, find the model of A, B, and Pi that has the highest probability of producing this given observation sequence. This is the problem that pertains to speech recognition, and the next section will explain how it is solved using genetic computing. OPTIMIZATION WITH A DISCRETE HIDDEN MARKOV MODEL The Hidden Markov Model (HMM) has many applications, with speech recognition and acoustic modeling being at the forefront. An HMM is a doubly stochastic process with an underlying stochastic process that is not observable, but can be observed through another set of stochastic processes that produce the sequence of observed symbols [7]. Essentially, the purpose of an HMM is to, given a sequence of observable symbols produced by an unobservable method, create a model that explains and characterizes the occurrence of said symbols. These observable symbols can range from sets of observed data to vectors of linear coefficients to acoustic speech samples. An HMM is composed of several elements: T, N, M, Q, V, A, B, Pi, and O. T represents the number of times the sequence is observed. N is the number of states, and M is the number of possible observations that can be made. Q is a set with N elements that contains each state. Likewise, V is a set of M elements, containing each possible observation, or “symbol” that the state puts out. A is known as the state transition probability matrix- It is a matrix of N rows and N columns that gives the likelihood of each state transitioning into each other state. B is called the observation probability matrix and its dimensions are N by M. Each element is the probability that any of M “symbols” will be observed when the state is in any of N states. Pi is the initial state distribution, and is a matrix with one row and N columns, with each element being the probability that the system starts off in each of N states. The model of A, B, and Pi is commonly represented as λ. Finally, O is the observation sequence. This is a set with T elements, with each element being an observed “symbol”, so each element must be a member of V. Such a system can be difficult to comprehend without a concrete example. Mark Stamp’s “A Revealing Introduction to Hidden Markov Models” [8] provides one that is both intuitive and practical. The example presented in this article is to determine average annual temperature at some arbitrary location over some number of years in the past. This time could be before temperature records were taken- say, several thousand years. Since the desired time period of observation is in the past, the exact temperature cannot be observed. This is what is meant when it is said that the states are “hidden”. Instead, we must turn to something that is directly observable and can be related to what is being observed. In, this example, growth rings inside trees are used, as the amount a tree grows in a given year can be related to that year’s temperature. To set up the model, we must define the A, B, and Pi matrices. First, the states and observations (elements of Q and V) must be defined. 
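To ground these definitions, the short sketch below (a minimal illustration, not code from the cited sources) encodes a two-state, three-symbol model λ = (A, B, Pi) of the kind described next, where the hidden states are hot and cold years and the observed symbols are small, medium, and large tree rings, and scores an observation sequence with the forward algorithm, which addresses the first fundamental problem. The probability values are invented for illustration and are not taken from [8].

```python
import numpy as np

# Hidden states: 0 = hot year, 1 = cold year (N = 2).
# Observable symbols: 0 = small ring, 1 = medium ring, 2 = large ring (M = 3).
# All numbers are illustrative; each row sums to 1, as the stochastic constraints require.
A  = np.array([[0.7, 0.3],          # state transition probability matrix (N x N)
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],     # observation probability matrix (N x M)
               [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])           # initial state distribution (1 x N)

def forward_probability(O, A, B, pi):
    """Problem 1: probability of the observation sequence O given the model (A, B, pi)."""
    alpha = pi * B[:, O[0]]                  # probability of each state after the first symbol
    for symbol in O[1:]:
        alpha = (alpha @ A) * B[:, symbol]   # propagate one step, then weight by the new symbol
    return alpha.sum()

# O holds T = 4 observed ring sizes, e.g. small, medium, small, large.
O = [0, 1, 0, 2]
print(forward_probability(O, A, B, pi))      # P(O | lambda)
```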
In this example the states used are hot and cold, and the tree growth ring observations are small, medium, and COMING TOGETHER When recognizing speech, there is a wide variety of variables that must be taken into consideration including characteristics of the speaker such as sex or age, changes in context, emotion of the speaker, or environmental sounds. These many variations make it difficult to recognize speech using an approach that involves representing individual phonemes in a pre-determined set of training data. Fortunately, a genetic algorithm is capable of passing this obstacle. An expectation maximization (EM) algorithm with a Baum-Welch re-estimation formula is also a valid approach, but will ultimately be less successful as it is more reliant on accurate initial values and is more likely to return a local maximum rather than the global one. The approach used here hybridizes the genetic algorithm and EM approaches, allowing it to take advantage of the benefits of both. The stochastic constraints of the Markov model will allow the genetic algorithm to escape from a case in which the EM returns a local maximum. As stated before, we wish to apply this process to the Hidden Markov Model (HMM). Here, we are given speech samples to represent O, and we wish to solve for the most likely values of the parameters A, B, and Pi (represented as λ) 4 Nicholas Buck Carly Hoffman overcome the EM’s tendency to produce a local maximum, while still benefiting from its advantages, namely its ability to constantly produce a better value for the objective function. Later we will visit some data that proves that a speech recognition that implements a genetic algorithm is more accurate than a conventional one. of the HMM. Before initializing the Genetic Algorithm, an initial population must be selected. This will be done through two processes. The first is segmentation, which divides the observations (speech samples) into states for the HMM. The second process involves using a Gaussian model to cluster the observation vectors. Multiple initial models are created, then EM re-estimation is applied to obtain the initial population for the genetic algorithm. The next population is obtained by applying roulette-wheel selection, recombination, and mutation to this initial population. For this population, each member is checked to ensure it meets the stochastic constraints of the parameters of the HMM (each row of the A, B, and Pi matrices must add up to 1). The individuals of this population are penalized and sorted according to how well each one meets or violates these constraints. If an individual meets every stochastic constraint, it is not penalized and receives top position. Individuals that violate the observation probability (B) constraints, but meets the other two, it is given second priority. Individuals that preserve the observation probability constraints but violate any others are ranked third, with individuals violating all constraints receiving maximum penalty and lowest ranking. Next, each individual receives a fitness value according to its position in the ordered list. Population 4 is now generated by using the roulette-wheel selection algorithm to select parents based on the previously determined fitness values. An intermediate recombination method is applied, and through mutation population 5 is generated. 
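The sources describe rank-based fitness, roulette-wheel selection, and the stochastic constraints in prose only, so the sketch below fills in assumed details purely for illustration: fitness assigned from an individual's position in the penalty-ordered list, fitness-proportional (roulette-wheel) selection of a parent, and re-normalization of a candidate matrix so that each row again sums to one.

```python
import random
import numpy as np

def rank_fitness(ranked_individuals):
    """Assign fitness by position in the penalty-ordered list: best rank, highest value."""
    n = len(ranked_individuals)
    return [n - position for position in range(n)]

def roulette_select(individuals, fitnesses):
    """Roulette-wheel selection: probability of being chosen is proportional to fitness."""
    spin = random.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(individuals, fitnesses):
        running += fit
        if running >= spin:
            return individual
    return individuals[-1]

def normalize_rows(matrix):
    """Repair a candidate A or B matrix so every row sums to 1 (stochastic constraint)."""
    matrix = np.clip(matrix, 1e-12, None)            # keep every probability positive
    return matrix / matrix.sum(axis=1, keepdims=True)

# Usage with placeholder individuals (labels standing in for candidate HMM models).
ranked = ["model_best", "model_2nd", "model_3rd", "model_worst"]
fits = rank_fitness(ranked)
parent = roulette_select(ranked, fits)
repaired_B = normalize_rows(np.array([[0.2, 0.5, 0.5],
                                      [0.9, 0.3, 0.1]]))
```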
Population 5 is then continuously mutated, and each iteration is evaluated according to the objective function (in this case, the function that gives the probability that each λ represents the proper model to produce our initial O). If an iteration returns better objective values than the previous one, the genetic algorithm is paused, and an EM re-estimation process is initiated. In this step, the objective value, P, is calculated for an individual λ from population 5. Then, Baum-Welch re-estimation formulas are applied to obtain new values, λ’, and a new objective value, P’, is calculated. P’ is compared with P, and the EM process is repeated until the difference between P and P’ falls within a certain convergence threshold. Once this entire EM process has been completed for all members of Population 5, Population 6 is obtained. This is the conclusion of the genetic algorithm, so the entire genetic algorithm is then repeated using this Population 6 as its initial population until the algorithm “converges”. The convergence criteria are met when the difference in objective value between the best individual of Population 6 and the last iteration’s best individual from Population 6 meets a certain convergence threshold. If this threshold is not met after a pre-determined number of iterations, the best individual from all previous iterations is used as the final model for the HMM. Finally, the phonemes determined according to the model are strung together and output as words and sentences. This approach combines two optimization methods, the genetic algorithm and expectation maximization, by means of a convergence threshold. Doing so allows the algorithm to SIGNIFICANCE EXPERIMENTAL PROOF Here, data from two experiments shows that the genetic algorithm approach to speech recognition is ultimately superior to other conventional methods. The first experiment compares the previously outlined genetic algorithm and EM hybrid to a purely EM based model. The EM convergence threshold was set to 0.50. Maximum EM iterations was 20 and maximum genetic algorithm iterations was 30. Two different test sets were used, with four different Gaussian Mixtures in each set. The results of all 8 trials showed that the Genetic Algorithm fared better by about 1% in both Percent Correct and Accuracy [7]. Comparing the Avg. Log Probability for each phoneme, the experiment found that the highest difference between the EM and genetic algorithm models was in the aw, ow, and aa phonemes [7]. In the other experiment, the Genetic Algorithm is pitted against a K-mean algorithm, another method of calculating the parameters of the HMM. In this experiment, the Mandarin language was used, and the tests were run in three different environments. In a quiet environment, the genetic algorithm and k-mean algorithm achieved recognition rates of 0.994 and 0.956 respectively [9]. However, the genetic algorithm really shined in environments with more noise. In the loud environment of a supermarket, the genetic algorithm averaged a 0.652 recognition rate, with the k-mean averaging 0.494 [9]. Finally, the tests were run in the noise environment of a road, in which the genetic algorithm and k-mean averaged rates of 0.782 and 0.706, respectively [9]. As proven by these experiments, the genetic algorithm will always return a higher recognition rate and accuracy than other traditional methods, especially in disruptive environments. 
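The convergence-threshold hand-off between the genetic operators and Baum-Welch re-estimation described above can be summarized in a structural sketch. The helpers here (the objective, the re-estimation step, and the generation operator) are toy stand-ins invented so the control flow runs end to end; only the iteration caps (20 EM and 30 GA iterations) follow the experiment reported in [7], while the convergence thresholds are toy values rather than the 0.50 used there.

```python
import random

# Toy hyperparameters: iteration caps follow experiment [7]; thresholds are illustrative.
EM_THRESHOLD = 0.001
GA_THRESHOLD = 0.001
MAX_EM_ITERATIONS = 20
MAX_GA_ITERATIONS = 30

def log_probability(model):
    """Stand-in objective P: higher is better, with its maximum at model = 3.0."""
    return -(model - 3.0) ** 2

def baum_welch_step(model):
    """Stand-in re-estimation step: nudges the model toward the local optimum."""
    return model + 0.5 * (3.0 - model)

def ga_generation(population):
    """Stand-in for selection, recombination, and mutation of a population."""
    return [m + random.gauss(0, 0.1) for m in population]

def refine_with_em(model):
    """Inner loop: repeat re-estimation until successive objectives P and P' converge."""
    p = log_probability(model)
    for _ in range(MAX_EM_ITERATIONS):
        model = baum_welch_step(model)
        p_new = log_probability(model)
        if abs(p_new - p) < EM_THRESHOLD:
            break
        p = p_new
    return model

def hybrid_ga_em(population):
    """Outer loop: evolve the population, refining every member with EM each iteration."""
    best = float("-inf")
    for _ in range(MAX_GA_ITERATIONS):
        population = [refine_with_em(m) for m in ga_generation(population)]
        score = max(log_probability(m) for m in population)
        if abs(score - best) < GA_THRESHOLD:
            break                              # convergence criteria met
        best = score
    return max(population, key=log_probability)

print(hybrid_ga_em([random.uniform(0, 6) for _ in range(8)]))
```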
COMMON APPLICATIONS Continued development and optimization of speech recognition technology is important due to the significant impacts it makes on many people’s daily lives. Learning and healthcare are the two fields in which speech recognition has made the greatest strides. In schools, implementation of speech recognition can not only provide convenience for the general population, but grant a boost in accessibility to students who suffer from motor or learning disabilities. For example, such software can provide hands-free computing to someone affected by a condition that hinders their ability to use a keyboard and mouse. Additionally, these systems can aid students with learning disabilities in their writing by 5 Nicholas Buck Carly Hoffman handling mechanics the student may struggle with. As the National Center for Technology Innovation states, “Often, writers with learning disabilities will skip over words when they are unsure of the correct spelling, leading to pieces of writing that are short, missing key elements, or not reflective of the student’s true abilities.” [10]. Essentially, speech recognition software has the potential to lower learning disabled students’ anxieties about grasping mechanics and provide them with a more fluid way to put their thoughts on paper. Finally, speech recognition can aid learning disabled students by increasing their independence. Traditionally, students who cannot write on their own are accompanied by someone to transcribe for them, but developments in speech recognition software are allowing these students to become more free and able to do work on their own time. Another area that has seen increasing implementation of speech recognition software is healthcare. Precise and properly formatted documentation is essential within healthcare to ensure proper patient care, provide accurate billing, and for legal purposes. Although physicians are generally hesitant to make changes like this to their workflow, speech recognition has been shown to be a huge time saver. In fact, a 2014 KLAS report showed that speech recognition software produced a widespread positive impact by reducing transcription costs, reducing documentation time, and producing more complete patient-narratives [11]. Although many institutions are reluctant to adopt speech recognition, it is seeing increasing use, with the adoption rate increasing from 21% to 47% from 2009 to 2013 [12]. FIGURE 3 [15] Pictured is the trend of Energy Consumption of Data Centers Green computing is a recently expanding field of study in the realm of computer sustainability, it is the practice of using computing resources in an energy efficient and environmentally conscious way [16]. It goes from power, to waste, to application, to education. The idea is to increase awareness of sustainability when it comes to computing, to educate people who are currently working in or considering entry into the field of technology to keep computer technology feasibly sustainable. Since neural networks, genetic algorithms and speech recognition systems run on technological devices, the continuation of sustainable computing is an important concept to consider when working with them as they often require devices which have more processing power and in turn, consume more electrical power. Now, many of the issues pertaining to sustainability on this topic are not directly related to computers themselves. 
For speech recognition to have a significant impact on as many lives as possible, there must be enough medical specialists available. Such professionals are needed to work alongside patients and decide if they would benefit from this type of assistance. Essentially the limit placed on how widespread speech recognition can become stems from the number of specialists such as speech pathologists available and the finite number of hours they are able to work. Another element of sustainability to consider is location. For example, many developing countries lack specialized medical care and/or access to advanced computer based technology. This fact places even more limitation on how widespread technologies like this can become. So, one goal of improving sustainability of computer based technology would be spreading it, along with proper education to these developing locations. One final point to consider when discussing sustainability on speech recognition is direct resistance to adopting it into one’s lifestyle. As previously mentioned, many professionals in the pharmaceutical industry refuse to adopt it simply because they’d rather not make such a drastic SUSTAINABILITY OF COMPUTERS AND THEIR PROCESSES Sustainability is the overall capability of a system to be maintained at a certain level of functionality. Today, it is often associated with the environment, and making the many aspects of human life more sustainable for the Earth. In other words, it is meeting the needs of the current generation, without compromising the ability of future generations to meet their own needs [13]. Neural networks themselves may not directly impact sustainability, since they run on computers which are already a big part of society. Computers however, do impact sustainability because they run on the resource of power. A data center is a group of networked computer servers normally used by large organizations, the amount of power these centers consume is a good model for overall computer efficiency. The power consumption of data centers in the U.S. grew nearly 90% between 2000 and 2005, and by 24% from 2010 to 2014 [14]. But, improvements in efficiency have majorly impacted the growth rate of the data center industry’s energy consumption. Without those improvements, data centers running at the efficiency levels of 2010 would have consumed close to 40 billion kilowatt hours more than they did in 2014 to do the same amount of work (see figure below) [15]. 6 Nicholas Buck Carly Hoffman change to their work flow. Lastly, for a patient who could benefit from use of speech recognition may be prevented from seeking help and visiting a professional by sheer apprehension. In essence, many issues pertaining to sustainability pertain to availability of medical specialists, accessibility to modern technology, and resistance to make lifestyle changes. be made. Despite this, the future looks very promising for the creation and improvement of new technologies that incorporate voice recognition technologies operating in tandem with neural networks and genetic algorithms. The future development of this tech will surely impact the way we interact with not just our technology, but our world, and could help many people in need of non-traditional methods of communication or medicine. THE FUTURE OF SPEECH RECOGNITION WITH GENETIC ALGORITHMS SOURCES So, genetic algorithms are a part of evolutionary computing and neural networks that function to solve problems of optimization that may not be solvable by traditional heuristics. 
They have since become an integral part of modern technologies that we use every day, and provide a promising future for the furthering of such technologies. The use of it in schools and healthcare have been mentioned, but there are so many other areas beyond those where this could be implemented. For example, many have heard of and over three million have even purchased an Amazon echo device since its’ launch in 2015. It has many features including but not limited to playing music, answering questions, delivering news or weather reports, and ordering products directly from Amazon. It is almost entirely voice controlled. But, the voice control that the Echo uses, is different from the methods we’ve seen previously. The difference between the way the Echo recognizes voice and traditional recognition is that you don’t have to be very close to the Echo for it to process your voice. In the past, voice recognition has been based on near-field recognition where the microphone is close the source of the voice. This allows for a clear signal and minimal background noise. The problems that would normally arise from not requiring the user to be a certain distance from the device were easily remedied, using deep neural networks and genetic algorithms. Speech recognition has gotten progressively better, especially with the implementation of evolutionary computing (neural networks and genetic algorithms are defined under evolutionary computing). It has progressed from having the user extremely close to the microphone and pausing between words to being able to shout an order from across the house to a speaker with artificial intelligence. The efficiency and accuracy of it can only improve with time and it has already come so far. An area of implementation that could be seeing some improvements soon is voice translation. Speech recognition, evolutionary computing, and a good amount of computer processing power are required to make computer voice translation work. The goal is to achieve translation between languages in real time, and with a portable device. Neural networks with genetic algorithms are currently very close to being able to translate in real time, but there are other elements that need improvement before such a device could [1] H. Gupta, D. Wadhwa. “Speech Feature Extraction and Recognition using Genetic Algorithm.” International Journal of Emerging Technology and Advanced Engineering Volume 4, Issue 1. January 2014. Accessed 1.9.2017. http://www.ijetae.com/files/Volume4Issue1/IJETAE_0114_ 63.pdf [2] H. Lam, F. Leung, K. Leung, S. Ling. “Application of a modified neural fuzzy network and an improved genetic algorithm to speech recognition.” Neural Computing & Applications. May 2007. Accessed 1.12.2017. http://ieeexplore.ieee.org/document/1209360/ [3] J.H. Holland. “Genetic Algorithms.” Scholarpedia. 2012. Accessed 1.11.2017. http://www.scholarpedia.org/article/Genetic_algorithms [4] “Computational science Genetic algorithm Crossover Cut and Splice.” CreationWiki.org. 2 November 2012. Accessed 3.3.2017. http://commons.wikimedia.org/wiki/File:Computational.scie nce.Genetic.algorithm.Crossover.Cut.and.Splice.svg [5] E. Grabianowski. “How Speech Recognition Works.” HowStuffWorks.com. 10 November 2006. Accessed 3.3.2017. http://electronics.howstuffworks.com/gadgets/high-techgadgets/speech-recognition.htm [6] Q. He, S. Kwong, K.F. Man, K.S. Tang. “Genetic Algorithms and their Applications.” IEEE Signal Processing Magazine. August 2002. Accessed 1.11.2017. 
http://rt4rf9qn2y.scholar.serialssolutions.com/?sid=google&auinit=KS&aulast=Tang&atitle=Genetic+algorithms+and+their+applications&id=doi:10.1109/79.543973&title=IEEE+ASSP+magazine&volume=13&issue=6&date=1996&spage=22&issn=1053-5888
[7] M. Shamsul Huda, J. Yearwood, R. Ghosh. “A Hybrid Algorithm for Estimation of the Parameters of Hidden Markov Model Based Acoustic Modeling of Speech Signals Using Constraint-Based Genetic Algorithm and Expectation Maximization.” IEEE Xplore. 07.11.2007. Accessed 01.26.2017. http://ieeexplore.ieee.org/document/4276421/
[8] M. Stamp. “A Revealing Introduction to Hidden Markov Models.” San Jose State University Department of Computer Science. 12.11.2015. Accessed 2.20.2017. http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf
[9] C. Chen, S. Pan, Y. Tsai. “Genetic Algorithm on Speech Recognition by Using DHMM.” IEEE. 2012. Accessed 1.9.2017. http://rt4rf9qn2y.scholar.serialssolutions.com/?sid=google&auinit=ST&aulast=Pan&atitle=Genetic+algorithm+on+speech+recognition+by+using+DHMM&id=doi:10.1109/ICIEA.2012.6360929
[10] “Speech Recognition for Learning.” The National Center for Technology Innovation. 2010. Accessed 1.26.2017. http://www.brainline.org/content/2010/12/speechrecognition-for-learning_pageall.html
[11] M. Miliard. “Speech Recognition Proving Its Worth.” Healthcare IT News. 6.20.2014. Accessed 2.20.2017. http://www.healthcareitnews.com/news/speech-recognitionproving-its-worth
[12] MTS Team. “Advantages and Disadvantages of Using Speech Recognition for Medical Transcription.” MTS Services. 1.08.2014. Accessed 2.20.2017. http://www.medicaltranscriptionservicecompany.com/blog/2014/01/advantages-disadvantages-using-speech-recognitionsoftware-medical-transcription.html
[13] United Nations. “Report of the World Commission on Environment and Development.” General Assembly Resolution 42/187. 1987.
[14] U.S. Environmental Protection Agency. “Report to Congress on Server and Data Center Energy Efficiency.” August 2007.
[15] Y. Sverdlik. “Here’s How Much Energy All US Data Centers Consume.” Data Center Knowledge. June 27, 2016. Accessed 3.30.2017. http://www.datacenterknowledge.com/archives/2016/06/27/heres-how-much-energy-all-us-data-centers-consume/
[16] C.E. Landwehr. “Green Computing.” IEEE Security & Privacy Magazine, 3(6):3, 2005; S. Murugesan. “Harnessing Green IT: Principles and Practices.” IEEE IT Professional, 10(1):24–33, 2008.

ADDITIONAL SOURCES

S. Omar Caballero Morales, Y. Perez Maldonado, F. Trujillo Romero. “Improvement on Automatic Speech Recognition Using Micro-Genetic Algorithm.” IEEE Xplore. 10.27.2012. Accessed 1.26.2017. http://ieeexplore.ieee.org/document/6387222/

ACKNOWLEDGEMENTS

We would like to thank our families for being so supportive, our writing instructor for providing feedback, and everyone who has gotten us where we are today.