Conference Session C11
Paper #64
Disclaimer—This paper partially fulfills a writing requirement for first year (freshman) engineering students at the
University of Pittsburgh Swanson School of Engineering. This paper is a student, not a professional, paper. This paper
is based on publicly available information and may not provide complete analyses of all relevant data. If this paper is
used for any purpose other than these authors’ partial fulfillment of a writing requirement for first year (freshman)
engineering students at the University of Pittsburgh Swanson School of Engineering, the user does so at his or her
own risk.
The Use of Genetic Algorithms in Evolutionary Computing to Improve the
Performance of Speech Recognition Technology
Nicholas Buck, [email protected], Budny, 10:00, Carly Hoffman, [email protected], Budny, 10:00
University of Pittsburgh Swanson School of Engineering, 11.02.2017
Abstract—It is no secret that humans are accustomed to
speech as their main method of communication. It comes as no surprise, then, that they would want to use that method to
communicate with technological devices. However, speech
processing to a human being and to a computer are two
entirely different processes. A device, for example, cannot
immediately distinguish background noise from the speech
input you want it to recognize. However, the application of a
neural network in conjunction with genetic algorithms can
optimize speech recognition so that it can process speech
input in a way similar to that of the human mind.
Genetic algorithms are one of many types of optimization techniques used to make certain processes, like speech recognition, more efficient and accurate. They resemble the biological processes of evolution and natural selection. The use of speech recognition is widespread in modern technology, being implemented in most phones, televisions, and even cars. We want to highlight how these algorithms work to make technology more usable and simpler to communicate with for all who interact with it.
To explain how these algorithms work and benefit
humans, we intend to describe the history, components and
inner workings of both genetic algorithms and speech
recognition as well as the sustainability of these technologies.
We will communicate how these algorithms will impact the
future of our technology, and what impacts they have already
made by referencing past and current experimental results.
Key Words—Discrete Hidden Markov Model, Evolutionary
Computing, Genetic Algorithms, Neural Networks,
Optimization, Speech Recognition
OPTIMIZATION IS KEY
Many have likely heard of what is known as the “language
barrier,” the idea that differences in communication are one
of the main obstacles keeping humanity largely separated. To
remedy the cause of this separation, advances in computing
and technology have brought about the creation of speech
recognition and computerized speech translation. But these
advances are far from perfect. What would really aid us in
achieving the goal of this technology would be the ability to translate speech in real time. Recent years have yielded results close to this objective, and while full realization has not yet happened, advances in evolutionary computing could be the key to obtaining the speed necessary to accomplish such a feat.
Humans are accustomed to using speech to communicate, so it makes sense that they would like to interact with their technology in the same manner. Currently, we interact with most of our devices through peripherals such as a keyboard and mouse or through touch, but these would become less necessary, and more inconvenient, if speech recognition were to become more accurate and reliable [1]. There are difficulties that come with the technology, however. At times, it can be difficult for a machine to recognize everything we say due to external factors like background noise or similar-sounding phonemes, which can cause words to be switched or confused. Many methods have been used to optimize different processes (computer or otherwise) using computing power, and a genetic algorithm is a widely used technique of this sort [2]. A genetic algorithm is a computerized process similar to the biological process of natural selection. It has been shown that applying these algorithms to the speech signals used in recognition greatly improves both speed and performance [1]. But this is not the only way these algorithms can be applied. Spoken translation software is becoming increasingly prevalent, but it can often be inaccurate. These inaccuracies are due to anything from differences in language rules and words having multiple possible translations, to simple background noise. Filtering out background noise can be done using common methods, but few can match the speed of a genetic algorithm. In the sections to follow, the methods behind evolutionary computing, and therefore genetic algorithms, will be explained and outlined, as well as how they can be applied to speech recognition and the world.
THE SCIENCE BEHIND GENETIC ALGORITHMS

Genetic algorithms (GA) are the computer-executable version of a mathematical algorithm created to model the rate at which a gene might spread through a population [3]. The original algorithm was created by R.A. Fisher, a statistician and biologist who founded mathematical genetics on the idea that a chromosome consists of a string of genes. GA are routinely used to find solutions for problems that cannot be solved using standard techniques, as well as for their intended purpose of modeling genetics.

The GA's ability to function hinges on a few key concepts. First, there is a specified set of alternatives for each gene; in biology these are known as alleles. This means that there are specific allowable strings or combinations of genes which constitute the possible chromosomes [3]. Evolution within the algorithm is viewed as generational, such that at each stage a population of individuals produces a set of offspring which make up the next generation [3]. Each algorithm has a fitness function which assigns to each string of alleles the number of offspring the individual possessing that chromosome will contribute to the next generation, based on some specified criterion [1][3]. Finally, there is a set of genetic operators that modify offspring so that the next generation differs from the current one [3]. Computer scientists have added operators beyond Fisher's original few, but his idea of mutation has remained a common operator used to ensure a unique next generation.

A genetic algorithm can be thought of as a simplified model of "survival of the fittest." In computing, these algorithms are a computer-executable version of the model created by Fisher, with a couple of generalizations made to accommodate computerized use. For instance, weight is placed on the interaction of genes instead of assuming that they act independently. In addition, there is an expanded set of genetic operators which allow for a model closer to actual mating trends [3]. Fisher's operator of mutation was kept, while other now-common operators like crossover and inversion were added later to accommodate more problems and their parameters.

The first generalization makes the fitness function a more complex, nonlinear function that cannot be approximated by simply adding up the contributions of individual genes. The second generalization, concerning genetic operators, allows the algorithm to emphasize genetic mechanisms, like crossover, that operate regularly on chromosomes. The difference between the two operators is frequency: crossover takes place in every mating, whereas mutation typically occurs in fewer than one in a million individuals [3]. Crossover is the integral reason that mammals produce offspring exhibiting a mixture of their parents' attributes. This makes it highly important to evolution, and as such it also plays an essential role in the operation of genetic algorithms. Crossover (one-point crossover) is simply defined: two chromosomes are lined up, a point along the chromosome is selected at random, then the pieces left of that point are exchanged between the chromosomes, producing a pair of offspring chromosomes. This simplified version of crossover is a close approximation of what occurs in mating. One-point crossover causes alleles that are close together on the chromosome to be more likely to be passed on to one of the offspring together, while alleles that are further apart are likely to be separated by crossover. This phenomenon is referred to in genetic terms as linkage. Linkage can be determined experimentally, and, under the assumption of one-point crossover, it made gene mapping possible long before there was any knowledge of DNA.

FIGURE 1 [4]
Visual representation of the genetic operator crossover

The genetic algorithm following Fisher's outline uses the differing fitness of variants in the current generation to make improvements to the next generation, but the computerized GA places emphasis on the variants produced by crossover. To clarify, a generation in this situation is a collection of data that may be user-determined or randomly generated, depending on the problem being solved. In the case of speech recognition, an individual could be a string of words or phonemes that could possibly fit the input, and the population would be made up of several individuals of that nature. The basic GA subroutine follows a few simple steps to produce each new generation. First, the algorithm starts with a population of individual strings of alleles, which may be specific or generated at random. Second, two individuals are selected at random from the population, with a bias toward individuals of higher fitness. Third, crossover is used, with the occasional use of mutation, to produce two new individuals for the next generation. The fourth step is to repeat the second and third steps until all individuals from the initial population have been used, and the fifth is simply to return to the first step to produce the next generation out of the newly created one. There are many ways to modify these steps, but most characteristics of a GA are already shown in the basic outline as described.

GA are particularly helpful for solving problems that present the issues of chaos, chance, nonlinear interactions, and temporality [1]. They are efficient because they are more flexible than other methods of problem solving, and they are easy to modify given domain-dependent heuristics, where what works to solve one problem may fail to solve another [1].
SPEECH PROCESSING

When you speak, you create vibrations in the air. An analog-to-digital converter (ADC) translates this analog wave into computer-readable data [5]. For this to occur, it digitizes the sound by taking measurements of the wave at frequent intervals. The system filters the sound to remove unwanted noise and separates it into different bands of frequency. The computer then normalizes the sound, adjusting it to a constant volume level. It may also be corrected for temporal shifts: people do not speak at the same rate, so the sound of their speech must be adjusted to match the speed of what the system has in its memory. Then, the signal is divided into many segments as small as hundredths or even thousandths of a second [5]. The program matches these segments to known phonemes in the designated language, a phoneme being the smallest sound element of a language. This is often done by calculating the similarity of two different speech utterances to determine what has most likely been said [6].

FIGURE 2 [1]
Pictured is a flow chart which models the speech recognition process

Speech recognition is broken generally into two steps: feature extraction and classification [1]. Feature extraction is a preprocessing procedure in speech recognition that extracts the specific voice features from the input speech signals. If the environment were noise free, each word or phoneme would have a corresponding frequency. But when the environment is noisy, speech signals are impure and it becomes problematic to identify corresponding features. This problem worsens if the speech to be recognized has close phonemes, a common problem when recognizing Mandarin speech, as saying one utterance with a slight variation can change its meaning. Classification is the next procedure, used to identify input speech based on what came through feature extraction. Classification can be done with a pattern recognition approach or a statistical approach. For a pattern recognition approach, artificial neural networks are commonly used, while a hidden Markov model is used for a statistical approach. Both methods will be explained in the following sections, as well as how they interact with genetic algorithms to optimize speech recognition.

Pictured in figure 2 is a general model for speech recognition. The parameters of evaluation, and therefore the accuracy of any algorithm, can vary due to many factors. Vocabulary size and confusability are one: the larger the vocabulary, the greater the chance of recognition error. Speaker independence or dependence is another factor which may affect accuracy. Speaker independence is difficult to achieve because a system may become accustomed to the speaker it was trained on, meaning that parameters can become speaker-specific, and what works for recognizing one speaker may not work for another [1]. Because of this, error rates are usually three to five times higher for speaker-independent systems than for speaker-dependent ones [1]. A system is normally designed with one of three speech input attributes in mind: isolated, discontinuous, or continuous, and the choice affects the amount of error. Isolated speech means that single words are being input, while discontinuous means full sentences are being input with a short silence separating words. Continuous speech input is where the user simply speaks as they normally would. Isolated and discontinuous speech recognition is relatively easy because word boundaries are easily detected and words are clearly pronounced, while continuous speech is more common yet more difficult to recognize. The kind of task and its language constraints are the next varying attributes: even with a fixed vocabulary, performance can vary based on the nature of the constraints on the word sequences allowed during recognition. Task difficulty is measured by complexity rather than vocabulary size, so the more difficult the task, the more likely there will be errors. Whether speech is read or spontaneous also affects a recognition system's accuracy: systems can be evaluated on speech read from prepared scripts or on spontaneous speech, and spontaneous speech is much more difficult due to an added element of chaos. Finally, adverse conditions have a large effect on recognition accuracy, as previously mentioned. Noise, acoustic distortions, differing microphones, limited frequency bandwidth, and differing ways of speaking can all affect a system [1].

Essentially, the main issue in the realm of speech recognition is variability. Since there are so many factors that can affect the parameters for recognizing speech, it can prove challenging to eliminate all the possible obstacles. But this is why GA are a good fit for optimizing speech recognition systems: they can handle elements of chaos and variability unlike most other heuristics. And studies have shown that the application of a genetic algorithm in the speech recognition process improves overall performance and results [1]. These findings reveal a promising path for furthering speech recognition and its associated technology.
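As a rough illustration of the preprocessing just described, the following Python sketch normalizes a digitized signal to a constant level, divides it into segments on the order of hundredths of a second, and matches a segment to the most similar stored template. The template vectors and the Euclidean distance measure are illustrative stand-ins, not the method of any system cited here.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=10.0):
    """Normalize a digitized signal to a constant peak level, then divide
    it into short segments (10 ms here, i.e. hundredths of a second)."""
    peak = max(abs(s) for s in samples) or 1.0
    normalized = [s / peak for s in samples]          # constant-volume step
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    return [normalized[i:i + frame_len]
            for i in range(0, len(normalized), frame_len)]

def nearest_template(segment, templates):
    """Match a segment to the most similar stored unit by Euclidean
    distance, a stand-in for the similarity calculation described above."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda name: distance(segment, templates[name]))

# Toy example: a 0.1 s, 8 kHz tone split into ten 10 ms segments.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate // 10)]
frames = frame_signal(tone, rate)
```

A real recognizer would compare extracted feature vectors rather than raw samples, but the shape of the computation, segment, then score similarity against known units, is the same.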
OPTIMIZATION WITH A DISCRETE HIDDEN MARKOV MODEL

The Hidden Markov Model (HMM) has many applications, with speech recognition and acoustic modeling at the forefront. An HMM is a doubly stochastic process with an underlying stochastic process that is not observable, but can be observed through another set of stochastic processes that produce the sequence of observed symbols [7]. Essentially, the purpose of an HMM is, given a sequence of observable symbols produced by an unobservable method, to create a model that explains and characterizes the occurrence of those symbols. These observable symbols can range from sets of observed data to vectors of linear coefficients to acoustic speech samples.

An HMM is composed of several elements: T, N, M, Q, V, A, B, Pi, and O. T represents the number of times the sequence is observed. N is the number of states, and M is the number of possible observations that can be made. Q is a set with N elements that contains each state. Likewise, V is a set of M elements, containing each possible observation, or "symbol," that the state puts out. A is known as the state transition probability matrix: a matrix of N rows and N columns that gives the likelihood of each state transitioning into each other state. B is called the observation probability matrix, and its dimensions are N by M; each element is the probability that any of the M symbols will be observed when the system is in any of the N states. Pi is the initial state distribution, a matrix with one row and N columns, with each element being the probability that the system starts off in each of the N states. The model of A, B, and Pi is commonly represented as λ. Finally, O is the observation sequence. This is a set with T elements, with each element being an observed symbol, so each element must be a member of V.

Such a system can be difficult to comprehend without a concrete example. Mark Stamp's "A Revealing Introduction to Hidden Markov Models" [8] provides one that is both intuitive and practical. The example presented in that article is to determine the average annual temperature at some arbitrary location over some number of years in the past. This time could be before temperature records were taken, say, several thousand years ago. Since the desired period of observation is in the past, the exact temperature cannot be observed. This is what is meant when it is said that the states are "hidden." Instead, we must turn to something that is directly observable and can be related to what is being observed. In this example, growth rings inside trees are used, as the amount a tree grows in a given year can be related to that year's temperature. To set up the model, we must define the A, B, and Pi matrices. First, the states and observations (elements of Q and V) must be defined. In this example the states used are hot and cold, and the tree growth ring observations are small, medium, and large. So, A is a matrix containing the probabilities of a hot year being followed by either a hot year or a cold year, and the same for a cold year. The B matrix will have rows corresponding to hot and cold, and columns corresponding to small, medium, and large; thus, each element will be the probability of a certain ring size being observed in a year of either hot or cold average temperature. Finally, the initial state distribution simply contains two elements, the probabilities of the period of observation starting on either a hot year or a cold year. Because every element of each of these matrices is a probability, the sum of each row must be one. The last element, O, will contain the observations made, meaning each element will be either small, medium, or large. This set will contain T elements, with T being the number of years an observation is made for, in other words the number of rings looked at.

There are three widely recognized "fundamental problems" to be solved using the Hidden Markov Model. The first is to determine the probability of obtaining the observed sequence, O, from the given model of A, B, and Pi. The second is, given the observation sequence O and the model λ, to determine the most likely sequence of states. In other words, the objective of this problem is to "uncover the hidden part of the Hidden Markov Model" [8]; this is essentially what was being done in the above example. Finally, the third problem is, given the observation sequence O and the dimensions N and M, to find the model of A, B, and Pi that has the highest probability of producing this given observation sequence. This is the problem that pertains to speech recognition, and the next section will explain how it is solved using genetic computing.
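The hot/cold model above can be written out concretely. The sketch below encodes illustrative, made-up A, B, and Pi matrices for the two states and three ring sizes, checks the row-sum constraint, and solves the first fundamental problem, computing P(O | λ), with the standard forward algorithm.

```python
# Illustrative (made-up) parameters for the hot/cold example: two hidden
# states (H, C) and three observable ring sizes (S, M, L).
A = {"H": {"H": 0.7, "C": 0.3},            # state transition probabilities
     "C": {"H": 0.4, "C": 0.6}}
B = {"H": {"S": 0.1, "M": 0.4, "L": 0.5},  # observation probabilities
     "C": {"S": 0.7, "M": 0.2, "L": 0.1}}
pi = {"H": 0.6, "C": 0.4}                  # initial state distribution

# Every row of A, B, and pi must sum to one (each row is a distribution).
for row in (*A.values(), *B.values(), pi):
    assert abs(sum(row.values()) - 1.0) < 1e-9

def forward_probability(obs, A, B, pi):
    """Fundamental problem one: P(O | lambda) via the forward algorithm."""
    alpha = {s: pi[s] * B[s][obs[0]] for s in pi}
    for symbol in obs[1:]:
        alpha = {s: sum(alpha[r] * A[r][s] for r in alpha) * B[s][symbol]
                 for s in pi}
    return sum(alpha.values())

p = forward_probability(["S", "M", "L"], A, B, pi)
```

The forward recursion sums over all possible hidden state sequences in O(N²T) time, rather than enumerating the Nᵀ sequences directly, which is why it is the standard answer to problem one.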
COMING TOGETHER
When recognizing speech, there is a wide variety of
variables that must be taken into consideration including
characteristics of the speaker such as sex or age, changes in
context, emotion of the speaker, or environmental sounds.
These many variations make it difficult to recognize speech
using an approach that involves representing individual
phonemes in a pre-determined set of training data. Fortunately, a genetic algorithm is capable of overcoming this obstacle. An expectation maximization (EM) algorithm with a Baum-Welch re-estimation formula is also a valid approach, but will ultimately be less successful, as it is more reliant on accurate initial values and is more likely to return a local maximum rather than the global one. The approach used here hybridizes the genetic algorithm and EM approaches, allowing it to take advantage of the benefits of both. The stochastic constraints of the Markov model will allow the genetic algorithm to escape from a case in which the EM returns a local maximum.
As stated before, we wish to apply this process to the
Hidden Markov Model (HMM). Here, we are given speech
samples to represent O, and we wish to solve for the most
likely values of the parameters A, B, and Pi (represented as λ)
of the HMM. Before initializing the genetic algorithm, an initial population must be selected. This is done through two processes. The first is segmentation, which divides the observations (speech samples) into states for the HMM. The second process involves using a Gaussian model to cluster the observation vectors. Multiple initial models are created, then EM re-estimation is applied to obtain the initial population for the genetic algorithm. The next population is obtained by applying roulette-wheel selection, recombination, and mutation to this initial population. Each member of this population is checked to ensure it meets the stochastic constraints of the parameters of the HMM (each row of the A, B, and Pi matrices must add up to 1). The individuals of this population are penalized and sorted according to how well each one meets or violates these constraints. If an individual meets every stochastic constraint, it is not penalized and receives top position. Individuals that violate the observation probability (B) constraints, but meet the other two, are given second priority. Individuals that preserve the observation probability constraints but violate any others are ranked third, with individuals violating all constraints receiving the maximum penalty and the lowest ranking. Next, each individual receives a fitness value according to its position in the ordered list. Population 4 is now generated by using the roulette-wheel selection algorithm to select parents based on the previously determined fitness values. An intermediate recombination method is applied, and through mutation Population 5 is generated. Population 5 is then continuously mutated, and each iteration is evaluated according to the objective function (in this case, the function that gives the probability that each λ represents the proper model to produce our initial O). If an iteration returns better objective values than the previous one, the genetic algorithm is paused and an EM re-estimation process is initiated. In this step, the objective value, P, is calculated for an individual λ from Population 5. Then, Baum-Welch re-estimation formulas are applied to obtain new values, λ', and a new objective value, P', is calculated. P' is compared with P, and the EM process is repeated until the difference between P and P' falls within a certain convergence threshold. Once this entire EM process has been completed for all members of Population 5, Population 6 is obtained. This is the conclusion of the genetic algorithm, so the entire genetic algorithm is then repeated using this Population 6 as its initial population until the algorithm "converges." The convergence criteria are met when the difference in objective value between the best individual of Population 6 and the previous iteration's best individual from Population 6 falls within a certain convergence threshold. If this threshold is not met after a pre-determined number of iterations, the best individual from all previous iterations is used as the final model for the HMM. Finally, the phonemes determined according to the model are strung together and output as words and sentences.

This approach combines two optimization methods, the genetic algorithm and expectation maximization, by means of a convergence threshold. Doing so allows the algorithm to overcome the EM's tendency to produce a local maximum, while still benefiting from its advantages, namely its ability to consistently produce a better value for the objective function. Later we will visit data showing that a speech recognition system that implements a genetic algorithm is more accurate than a conventional one.
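The constraint-checking and penalty-ranking step can be sketched as follows. Representing each individual as an (A, B, pi) triple of probability rows is our own simplification for illustration; the ranking mirrors the four priority levels described above.

```python
def row_stochastic(rows, tol=1e-6):
    """True if every row sums to one: the HMM stochastic constraint."""
    return all(abs(sum(row) - 1.0) < tol for row in rows)

def rank_population(models):
    """Order candidate models by the four priority levels described above.
    Each model is a hypothetical (A, B, pi) triple of probability rows."""
    def penalty(model):
        A, B, pi = model
        ok_a, ok_b, ok_pi = row_stochastic(A), row_stochastic(B), row_stochastic([pi])
        if ok_a and ok_b and ok_pi:
            return 0      # meets every constraint: top position
        if ok_a and ok_pi:
            return 1      # violates only B: second priority
        if ok_b:
            return 2      # preserves B but violates others: third
        return 3          # violates all constraints: maximum penalty
    return sorted(models, key=penalty)

# Hypothetical candidates: one valid model, one violating only B,
# and one violating every constraint.
valid   = ([[0.5, 0.5], [0.3, 0.7]], [[0.2, 0.8], [0.6, 0.4]], [0.9, 0.1])
bad_b   = ([[0.5, 0.5], [0.3, 0.7]], [[0.2, 0.9], [0.6, 0.4]], [0.9, 0.1])
bad_all = ([[0.5, 0.6], [0.3, 0.7]], [[0.2, 0.9], [0.6, 0.4]], [0.9, 0.2])
ranked = rank_population([bad_all, bad_b, valid])
```

In the full hybrid algorithm, the position in this ordered list determines each individual's fitness value before roulette-wheel selection is applied.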
SIGNIFICANCE
EXPERIMENTAL PROOF
Here, data from two experiments suggests that the genetic algorithm approach to speech recognition is superior to other conventional methods. The first experiment compares the previously outlined genetic algorithm and EM hybrid to a purely EM-based model. The EM convergence threshold was set to 0.50, the maximum number of EM iterations to 20, and the maximum number of genetic algorithm iterations to 30. Two different test sets were used, with four different Gaussian mixtures in each set. The results of all 8 trials showed that the genetic algorithm fared better by about 1% in both percent correct and accuracy [7]. Comparing the average log probability for each phoneme, the experiment found that the largest difference between the EM and genetic algorithm models was in the aw, ow, and aa phonemes [7]. In the other experiment, the genetic algorithm is pitted against a K-means algorithm, another method of calculating the parameters of the HMM. In this experiment, the Mandarin language was used, and the tests were run in three different environments. In a quiet environment, the genetic algorithm and K-means algorithm achieved recognition rates of 0.994 and 0.956, respectively [9]. However, the genetic algorithm really shone in environments with more noise. In the loud environment of a supermarket, the genetic algorithm averaged a 0.652 recognition rate, with the K-means averaging 0.494 [9]. Finally, the tests were run in the noisy environment of a road, in which the genetic algorithm and K-means averaged rates of 0.782 and 0.706, respectively [9]. In these experiments, the genetic algorithm consistently returned a higher recognition rate and accuracy than the traditional methods, especially in disruptive environments.
COMMON APPLICATIONS
Continued development and optimization of speech
recognition technology is important due to the significant
impacts it makes on many people’s daily lives. Learning and
healthcare are the two fields in which speech recognition has
made the greatest strides. In schools, implementation of
speech recognition can not only provide convenience for the
general population, but grant a boost in accessibility to
students with motor or learning disabilities. For
example, such software can provide hands-free computing to
someone affected by a condition that hinders their ability to
use a keyboard and mouse. Additionally, these systems can
aid students with learning disabilities in their writing by
handling mechanics the student may struggle with. As the
National Center for Technology Innovation states, “Often,
writers with learning disabilities will skip over words when
they are unsure of the correct spelling, leading to pieces of
writing that are short, missing key elements, or not reflective
of the student’s true abilities.” [10]. Essentially, speech
recognition software has the potential to lower learning
disabled students’ anxieties about grasping mechanics and
provide them with a more fluid way to put their thoughts on
paper. Finally, speech recognition can aid students with learning disabilities by increasing their independence. Traditionally, students who cannot write on their own are accompanied by someone who transcribes for them, but developments in speech recognition software are allowing these students to work more freely and on their own time.
Another area that has seen increasing implementation of
speech recognition software is healthcare. Precise and
properly formatted documentation is essential within
healthcare to ensure proper patient care, provide accurate
billing, and for legal purposes. Although physicians are
generally hesitant to make changes like this to their workflow, speech recognition has been shown to be a huge time
saver. In fact, a 2014 KLAS report showed that speech
recognition software produced a widespread positive impact
by reducing transcription costs, reducing documentation time,
and producing more complete patient-narratives [11].
Although many institutions are reluctant to adopt speech
recognition, it is seeing increasing use, with the adoption rate
increasing from 21% to 47% from 2009 to 2013 [12].
SUSTAINABILITY OF COMPUTERS AND THEIR PROCESSES
Sustainability is the overall capability of a system to be maintained at a certain level of functionality. Today, it is often associated with the environment and with making the many aspects of human life more sustainable for the Earth. In other words, it is meeting the needs of the current generation without compromising the ability of future generations to meet their own needs [13]. Neural networks themselves may not directly impact sustainability, since they run on computers, which are already a large part of society. Computers, however, do impact sustainability because they run on electrical power. A data center is a group of networked computer servers normally used by large organizations, and the amount of power these centers consume is a good model for overall computer efficiency. The power consumption of data centers in the U.S. grew nearly 90% between 2000 and 2005, and by 24% from 2010 to 2014 [14]. However, improvements in efficiency have greatly slowed the growth of the data center industry's energy consumption. Without those improvements, data centers running at the efficiency levels of 2010 would have consumed close to 40 billion kilowatt-hours more than they did in 2014 to do the same amount of work (see Figure 3 below) [15].
FIGURE 3 [15]
Pictured is the trend of energy consumption of data centers
Green computing is a recently expanding field of study in the realm of computer sustainability: it is the practice of using computing resources in an energy-efficient and environmentally conscious way [16]. It spans power, waste, application, and education. The idea is to increase awareness of sustainability when it comes to computing, and to educate people who are currently working in, or considering entry into, the field of technology so that computer technology remains feasibly sustainable. Since neural networks, genetic algorithms, and speech recognition systems run on technological devices, sustainable computing is an important concept to consider when working with them, as they often require devices with more processing power, which in turn consume more electrical power.
Many of the issues pertaining to sustainability on this topic are not directly related to computers themselves. For speech recognition to have a significant impact on as many lives as possible, there must be enough medical specialists available. Such professionals are needed to work alongside patients and decide whether they would benefit from this type of assistance. Essentially, the limit on how widespread speech recognition can become stems from the number of specialists, such as speech pathologists, available and the finite number of hours they are able to work. Another element of sustainability to consider is location. For example, many developing countries lack specialized medical care and/or access to advanced computer-based technology, which places even more limitations on how widespread technologies like this can become. So, one goal of improving the sustainability of computer-based technology would be spreading it, along with proper education, to these developing locations.
One final point to consider when discussing the sustainability of speech recognition is direct resistance to adopting it into one's lifestyle. As previously mentioned, many professionals in the pharmaceutical industry refuse to adopt it simply because they'd rather not make such a drastic
change to their workflow. Lastly, a patient who could benefit from the use of speech recognition may be prevented from seeking help and visiting a professional by sheer apprehension. In essence, many issues of sustainability pertain to the availability of medical specialists, access to modern technology, and resistance to making lifestyle changes.
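As a back-of-the-envelope check on the data center figures discussed in this section, the short sketch below turns the reported ~40 billion kWh of avoided consumption [15] into a percentage reduction. The 70 billion kWh figure for actual 2014 consumption is an illustrative assumption, not a number taken from this paper.

```python
# Counterfactual estimate: how much efficiency gains reduced 2014
# data-center energy demand, relative to running at 2010 efficiency.

ACTUAL_2014_KWH = 70e9   # assumed actual 2014 US data-center use (illustrative)
SAVINGS_KWH = 40e9       # extra energy 2010-era efficiency would have used [15]

# What 2014 demand would have been without the efficiency improvements:
counterfactual_kwh = ACTUAL_2014_KWH + SAVINGS_KWH

# Fraction of that counterfactual demand avoided by the efficiency gains:
reduction = SAVINGS_KWH / counterfactual_kwh

print(f"Counterfactual 2014 demand: {counterfactual_kwh / 1e9:.0f} billion kWh")
print(f"Avoided by efficiency gains: {reduction:.0%}")
```

Under these assumed numbers, roughly a third of the counterfactual demand was avoided, consistent with the claim that efficiency improvements substantially slowed the industry's energy growth.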
THE FUTURE OF SPEECH RECOGNITION WITH GENETIC ALGORITHMS
So, genetic algorithms are a part of evolutionary computing and neural networks that function to solve optimization problems that may not be solvable by traditional heuristics. They have since become an integral part of the modern technologies we use every day, and they provide a promising future for the furthering of such technologies. Their use in schools and healthcare has been mentioned, but there are many other areas beyond those where they could be implemented.
For example, many have heard of, and over three million have purchased, an Amazon Echo device since its launch in 2015. It has many features, including but not limited to playing music, answering questions, delivering news or weather reports, and ordering products directly from Amazon, and it is almost entirely voice controlled. The voice control the Echo uses, however, differs from the methods we have seen previously: the user does not have to be very close to the Echo for it to process their voice. In the past, voice recognition has been based on near-field recognition, where the microphone is close to the source of the voice, which allows for a clear signal and minimal background noise. The problems that would normally arise from not requiring the user to be a certain distance from the device were remedied using deep neural networks and genetic algorithms.
Speech recognition has gotten progressively better, especially with the implementation of evolutionary computing, under which neural networks and genetic algorithms are defined. It has progressed from having the user extremely close to the microphone and pausing between words to being able to shout an order from across the house to a speaker with artificial intelligence. The efficiency and accuracy of speech recognition can only improve with time, and it has already come far.
An area of implementation that could see improvements soon is voice translation. Speech recognition, evolutionary computing, and a good amount of computer processing power are required to make computer voice translation work. The goal is to achieve translation between languages in real time with a portable device. Neural networks with genetic algorithms are currently very close to being able to translate in real time, but other elements need improvement before such a device could be made. Despite this, the future looks very promising for the creation and improvement of new technologies that incorporate voice recognition operating in tandem with neural networks and genetic algorithms. The future development of this technology will surely impact the way we interact with not just our technology but our world, and it could help many people in need of non-traditional methods of communication or medicine.
SOURCES
[1] H. Gupta, D. Wadhwa. “Speech Feature Extraction and
Recognition using Genetic Algorithm.” International Journal
of Emerging Technology and Advanced Engineering Volume
4, Issue 1. January 2014. Accessed 1.9.2017.
http://www.ijetae.com/files/Volume4Issue1/IJETAE_0114_
63.pdf
[2] H. Lam, F. Leung, K. Leung, S. Ling. “Application of a
modified neural fuzzy network and an improved genetic
algorithm to speech recognition.” Neural Computing &
Applications.
May 2007. Accessed 1.12.2017.
http://ieeexplore.ieee.org/document/1209360/
[3] J.H. Holland. “Genetic Algorithms.” Scholarpedia. 2012.
Accessed 1.11.2017.
http://www.scholarpedia.org/article/Genetic_algorithms
[4] “Computational science Genetic algorithm Crossover Cut
and Splice.” CreationWiki.org. 2 November 2012. Accessed
3.3.2017.
http://commons.wikimedia.org/wiki/File:Computational.scie
nce.Genetic.algorithm.Crossover.Cut.and.Splice.svg
[5] E. Grabianowski. “How Speech Recognition Works.”
HowStuffWorks.com. 10 November 2006. Accessed
3.3.2017.
http://electronics.howstuffworks.com/gadgets/high-techgadgets/speech-recognition.htm
[6] Q. He, S. Kwong, K.F. Man, K.S. Tang. “Genetic
Algorithms and their Applications.” IEEE Signal Processing
Magazine. August 2002. Accessed 1.11.2017.
http://rt4rf9qn2y.scholar.serialssolutions.com/?sid=google&
auinit=KS&aulast=Tang&atitle=Genetic+algorithms+and+t
heir+applications&id=doi:10.1109/79.543973&title=IEEE+
ASSP+magazine&volume=13&issue=6&date=1996&spage
=22&issn=1053-5888
[7] M. Shamsul Huda, J. Yearwood, R. Ghosh. “A Hybrid Algorithm for Estimation of the Parameters of Hidden Markov Model based Acoustic Modeling of Speech Signals using Constraint-Based Genetic Algorithm and Expectation Maximization.” IEEE Xplore. 07.11.2007. Accessed 01.26.2017. http://ieeexplore.ieee.org/document/4276421/
[8] M. Stamp. “A Revealing Introduction to Hidden Markov Models.” San Jose State University Department of Computer Science. 12.11.2015. Accessed 2.20.2017. http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf
[9] C. Chen, S. Pan, Y. Tsai. “Genetic Algorithm on Speech Recognition by Using DHMM.” IEEE. 2012. Accessed 1.9.2017. http://rt4rf9qn2y.scholar.serialssolutions.com/?sid=google&auinit=ST&aulast=Pan&atitle=Genetic+algorithm+on+speech+recognition+by+using+DHMM&id=doi:10.1109/ICIEA.2012.6360929
[10] “Speech Recognition for Learning” The National Center
for Technology Innovation. 2010. Accessed 1.26.2017
http://www.brainline.org/content/2010/12/speechrecognition-for-learning_pageall.html
[11] M. Miliard “Speech Recognition Proving its Worth”
Healthcare IT News. 6.20.2014. Accessed 2.20.2017
http://www.healthcareitnews.com/news/speech-recognitionproving-its-worth
[12] MTS Team. “Advantages and Disadvantages of Using Speech Recognition for Medical Transcription.” MTS Services. 1.08.2014. Accessed 2.20.2017. http://www.medicaltranscriptionservicecompany.com/blog/2014/01/advantages-disadvantages-using-speech-recognitionsoftware-medical-transcription.html
[13] United Nations. Report of the world commission on
environment and development. In General Assembly
Resolution 42/187, 1987
[14] U.S. Environmental Protection Agency. Report to
Congress on Server and Data Center Energy Efficiency.
August 2007
[15] S. Yevgeniy. “Here’s How Much Energy All US Data
Centers Consume.” Data Center Knowledge. June 27, 2016.
Accessed
3.30.2017
http://www.datacenterknowledge.com/archives/2016/06/27/
heres-how-much-energy-all-us-data-centers-consume/
[16] C.E. Landwehr. “Green Computing.” IEEE Security & Privacy Magazine, 3(6):3–3, 2005; S. Murugesan. “Harnessing Green IT: Principles and Practices.” IEEE IT Professional, 10(1):24–33, 2008.
ADDITIONAL SOURCES
S. Omar Caballero Morales, Y. Perez Maldonado, F. Trujillo Romero. “Improvement on Automatic Speech Recognition Using Micro-Genetic Algorithm.” IEEE Xplore. 10.27.2012. Accessed 1.26.2017. http://ieeexplore.ieee.org/document/6387222/
ACKNOWLEDGEMENTS
We would like to thank our families for being so
supportive, our writing instructor for providing feedback, and
everyone who has gotten us where we are today.