
Volume 4, Issue 08, August 2016, Pages 5756-5763, ISSN(e): 2321-7545
Website: http://ijsae.in
DOI: http://dx.doi.org/10.18535/ijsre/v4i08.28
Analysis of Speech Recognition System Based on Gender/Age Group/Human-Machine
Authors
Amita¹, Abhishek Bhatnagar²
¹M.Tech (CE), Computer Science Department, Indus Institute of Engineering and Technology, Kinana (Jind)
²Asst. Professor, Computer Science Department, Indus Institute of Engineering and Technology, Kinana (Jind)
[email protected], [email protected]
ABSTRACT:
Speech recognition (SR) is the translation of spoken words into text. It is also called automatic speech recognition or just "speech to text" (STT). Some SR systems use training, where each speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech. The term voice recognition or speaker identification refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process. In the health care sector, speech recognition can be implemented in the front end or the back end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document.
1 INTRODUCTION
Speech recognition (SR) is an inter-disciplinary sub-field of computational linguistics that incorporates knowledge and research from the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enable the recognition and translation of spoken language into text by computers and computerized devices, such as those categorized as smart technologies and robotics. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).
Some SR systems use "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker independent" [1] systems. Systems that use training are called "speaker dependent".
Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call home"), call
routing (e.g. "I would like to make a collect call"), domotic appliance control, search (e.g. find a podcast
where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of
structured documents (e.g. a radiology report), speech-to-text processing (e.g., word processors or emails),
and aircraft (usually termed Direct Voice Input).
Fig 1: Speech recognition
High-performance fighter aircraft
Substantial effort has been devoted over the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), a program in France installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, in applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling the flight display.
2 LITERATURE REVIEW
As early as 1932, Bell Labs researchers such as Harvey Fletcher were investigating the science of speech perception. 1950s-era technology was limited to single-speaker systems with vocabularies of around ten words. In 1952, three Bell Labs researchers built a system for single-speaker digit recognition; it worked by locating the formants in the power spectrum of each utterance.
In 1969, John Pierce wrote an open letter that compared speech recognition to "extracting gold from the sea, curing cancer, or going to the moon". After Pierce's letter, funding for speech recognition research at Bell Labs dried up for years.
Raj Reddy was the first person to take on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Previous systems required users to pause after every word. Reddy's system was designed to issue spoken commands for the game of chess. Also around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm, which aligns two utterances of different lengths before comparing them.
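To make the idea concrete, the following is a minimal sketch of a DTW distance between two one-dimensional feature sequences, written in MATLAB to match the code later in this paper (the function name and the absolute-difference local cost are illustrative choices, not part of the historical systems):
function d = dtwdist(a, b)
% Dynamic time warping distance between sequences a and b.
% D(i+1,j+1) holds the cheapest alignment cost of a(1:i) and b(1:j).
n = length(a); m = length(b);
D = inf(n+1, m+1);
D(1,1) = 0;
for i = 1:n
    for j = 1:m
        cost = abs(a(i) - b(j));                % local distance
        D(i+1,j+1) = cost + min([D(i,j+1), ...  % skip a sample of a
                                 D(i+1,j), ...  % skip a sample of b
                                 D(i,j)]);      % advance in both
    end
end
d = D(n+1, m+1);
end
Unlike plain cross-correlation, this lets the two utterances be locally stretched or compressed in time before they are compared.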
In 1971, DARPA funded five years of speech recognition research through its Speech Understanding Research program, with ambitious end goals including a vocabulary size of 1,000 words. BBN, IBM, Carnegie Mellon, and the Stanford Research Institute all participated in the program. This government funding revived speech recognition research, which had been largely abandoned in the United States after John Pierce's letter. Despite the fact that CMU's Harpy system met the goals established at the outset of the program, many of the predictions turned out to be nothing more than hype, disappointing the DARPA administrators. Several innovations happened during this time, such as the invention of beam search for use in the Harpy system. The field also benefited from the discovery of several algorithms in other fields, such as linear predictive coding and cepstral analysis.
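For reference, the real cepstrum behind cepstral analysis is simply the inverse Fourier transform of the log magnitude spectrum of a windowed frame. A one-frame MATLAB sketch (the frame here is random placeholder data; the Signal Processing Toolbox supplies hamming, and also offers rceps for the same computation):
frame = randn(256, 1);                 % placeholder for one speech frame
w = hamming(length(frame));            % taper the frame edges
% Real cepstrum: ifft of the log magnitude spectrum (eps avoids log(0))
c = real(ifft(log(abs(fft(frame .* w)) + eps)));
The low-order coefficients of c describe the vocal-tract envelope, which is why cepstral features became standard inputs for recognizers.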
3 DESIGN METHODOLOGY & PROPOSED IMPLEMENTATION
Dramatic advances have been made in speech recognition technology over the past several years. Despite these advances, commercial recognizers have been successful in only a few constrained application areas. Many researchers believe that recognizers will enjoy widespread use and become commonplace only if their performance approaches that of humans in everyday listening environments. This paper measures how far research has progressed towards this goal. Results from scattered studies that have compared human and machine speech recognition on similar tasks are analyzed to determine how much speech recognizers must improve to match human performance. The speech corpora used in these studies represent everyday listening conditions and span a continuum ranging from isolated words read in quiet to spontaneous telephone speech. The results demonstrate that modern speech recognizers still perform far worse than humans, both on wideband speech read in quiet and on band-limited or noisy spontaneous speech. The human error rate for continuously spoken letters of the alphabet, for example, remains far below the corresponding machine error rate.
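All of these comparisons rest on the word error rate: the Levenshtein (edit) distance between the reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal MATLAB sketch (the function name is illustrative):
function wer = worderrorrate(ref, hyp)
% Word error rate = (substitutions + insertions + deletions) / #reference words
r = strsplit(ref);  h = strsplit(hyp);     % split transcripts into words
n = numel(r);  m = numel(h);
D = zeros(n+1, m+1);
D(:,1) = 0:n;  D(1,:) = 0:m;               % cost of deleting/inserting everything
for i = 1:n
    for j = 1:m
        sub = D(i,j) + ~strcmp(r{i}, h{j});               % substitution (free if equal)
        D(i+1,j+1) = min([sub, D(i,j+1)+1, D(i+1,j)+1]);  % vs deletion, insertion
    end
end
wer = D(n+1, m+1) / n;
end
For example, worderrorrate('call home now', 'call homes') gives 2/3: one substitution plus one deletion against three reference words.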
4 IMPLEMENTATION RESULTS & DISCUSSION
Implementation phase 1 (speech recognition using correlation method)
Step 1. Record voice using a sound recorder application.
Open Sound Recorder: click the Start button, type "sound recorder" in the search box, and then click Sound Recorder in the list of results. In Windows 8, type "sound recorder" on the Start screen and select Sound Recorder from the search results. Sound Recorder will not open if you do not have a microphone attached to your computer. To play back the audio, you must have speakers, or a pair of headphones, installed on your computer.
Fig 2: Converting audio to WAV
Fig 3: Sound recording 1
Fig 4: Sound recording 2
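As an alternative to Sound Recorder (a sketch, assuming a working microphone and MATLAB's built-in audio functions), the utterance can be recorded and saved directly in MATLAB:
% Record 2 seconds of speech at 8 kHz, 16-bit, mono, and save it as WAV
rec = audiorecorder(8000, 16, 1);
disp('Speak now...');
recordblocking(rec, 2);              % block until 2 seconds are captured
x = getaudiodata(rec);               % samples as a double column vector
audiowrite('test.wav', x, 8000);     % file name matches the usage below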
Implementation phase 2
Code to recognize a spoken Hindi word:
function speechrecognition(filename)
% Speech Recognition Using Correlation Method
% Write the following command on the command window:
%   speechrecognition('test.wav')
% (wavread was removed in MATLAB R2015b; use audioread in newer releases.)

% Read the test utterance and keep only the first channel
x = wavread(filename);
x = x(:,1);

% Reference recordings of the Hindi words for one to four
refs = {'ek.wav', 'do.wav', 'teen.wav', 'char.wav'};
m = zeros(1, numel(refs));       % peak correlation for each reference

for k = 1:numel(refs)
    % Read the k-th reference word, first channel only
    y = wavread(refs{k});
    y = y(:,1);

    % Cross-correlate the test utterance with the reference;
    % the peak value measures their similarity
    z = xcorr(x, y);
    m(k) = max(z);

    % Plot the cross-correlation sequence against its lag axis
    l = length(z);
    t = (-(l-1)/2 : (l-1)/2)';
    figure
    plot(t, z);
    title(['Cross-correlation with ' refs{k}]);
end

% The reference with the largest correlation peak is taken as the match
[~, best] = max(m);
fprintf('Recognized word: %s\n', refs{best});
end
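One caveat with this method: the raw cross-correlation peak grows with the energy of the signals, so a loud or long reference recording can win regardless of how well its shape matches. A common remedy (a sketch, not part of the code above) is to normalize each peak by the signal energies, which bounds the score by 1:
% Energy-normalized correlation peak (Cauchy-Schwarz bounds it by 1)
score = max(xcorr(x, y)) / (norm(x) * norm(y));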
Fig 5: Hindi sound recording
Fig 6: Sound recording 3
5 CONCLUSION
Dramatic advances have recently been made in speech recognition technology. Large-vocabulary talker-independent recognizers provide error rates below 10% for read sentences recorded in a quiet environment. Machine performance, however, deteriorates dramatically under degraded conditions. For example, error rates increase to roughly 40% for spontaneous speech and to 23% with channel variability and noise. Human error rates remain below 5% in quiet and under similar degraded conditions. Comparisons using many speech corpora demonstrate that human word error rates are often more than an order of magnitude lower than those of current recognizers, in both quiet and degraded environments. In general, the superiority of human performance increases in noise and for more difficult speech material, such as spontaneous speech. Although current speech recognition technology is well suited to many practical commercial applications, these results suggest that there is much room for improvement.
REFERENCES
1. "Speaker Independent Connected Speech Recognition- Fifth Generation Computer Corporation".
Fifthgen.com. Retrieved 2013-06-15.
2. "British English definition of voice recognition". Macmillan Publishers Limited. Retrieved February
21, 2012.
3. "voice recognition, definition of". WebFinance, Inc. Retrieved February 21, 2012.
4. "The Mailbag LG #114". Linuxgazette.net. Retrieved 2013-06-15.
5. Reynolds, Douglas; Rose, Richard (January 1995). "Robust text-independent speaker identification using Gaussian mixture speaker models" (PDF). IEEE Transactions on Speech and Audio Processing (IEEE) 3 (1): 72–83. doi:10.1109/89.365379. ISSN 1063-6676. OCLC 26108901. Retrieved 21 February 2014.
6. "Speaker Identification (WhisperID)". Microsoft Research. Microsoft. Retrieved 21 February 2014.
When you speak to someone, they don't just recognize what you say: they recognize who you are.
WhisperID would let computers do that, too, figuring out who you are by way you sound.
7. Huffman, Larry. "Stokowski, Harvey Fletcher, and Bell Labs Experimental Recordings". www.stokowski.org. Retrieved February 17, 2014.
8. Juang, B. H.; Rabiner, Lawrence R. "Automatic speech recognition–a brief history of technology
development" (PDF). p. 6. Retrieved 17 January 2015.
9. Pierce, John (1969). "Whither Speech Recognition". Journal of the Acoustical Society of America. doi:10.1121/1.1911801.
10. Benesty, Jacob; Sondhi, M. M.; Huang, Yiteng (2008). Springer Handbook of Speech Processing.
Springer Science & Business Media. ISBN 3540491252.
11. Blechman, R.O.; Blechman, Nicholas (June 23, 2008). "Hello, Hal". The New Yorker. Retrieved 17 January 2015.
12. Funding A Revolution. National Academy Press. 1999. Retrieved 22 January 2015.
13. Lowerre, Bruce. "The Harpy Speech Recognition System", Ph.D. thesis, Carnegie Mellon
University, 1976
14. Huang, Xuedong; Baker, James; Reddy, Raj. "A Historical Perspective of Speech Recognition".
Communications of ACM. Retrieved 20 January 2015.
15. Juang, B. H.; Rabiner, Lawrence R. "Automatic speech recognition–a brief history of technology
development" (PDF). p. 10. Retrieved 17 January 2015.
16. "History of Speech Recognition". Retrieved 17 January 2015.
17. Morgan, Nelson; Cohen, Jordan; Krishnan, Sree Hari; Chang, S; Wegmann, S (2013). Final Report:
OUCH Project (Outing Unfortunate Characteristics of HMMs). CiteSeerX: 10.1.1.395.7249.
18. "Nuance Exec on iPhone 4S, Siri, & Future of Speech". Tech.pinions. October 10, 2011. Retrieved
November 23, 2011.
19. Kincaid, Jason. "The Power of Voice: A Conversation with the Head of Google's Speech Technology". TechCrunch.
20. Froomkin, Dan. "THE COMPUTERS ARE LISTENING". Intercept. Retrieved 20 June 2015.
21. NIPS Workshop: Deep Learning for Speech Recognition & Related Applications, Whistler, BC,
Canada, Dec. 2009 (Organizers: Li Deng, Geoff Hinton, D. Yu).
22. G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, B. Kingsbury (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97.
23. Deng, L.; Hinton, G.; Kingsbury, B. (2013). "New types of deep neural network learning for speech
recognition & related applications: An overview (ICASSP)".
24. Markoff, John (November 23, 2012). "Scientists See Promise in Deep-Learning Programs". New
York Times. Retrieved 20 January 2015.
25. Morgan, Bourlard, Renals, Cohen, Franco (1993) "Hybrid neural network/hidden Markov model
systems for continuous speech recognition. ICASSP/IJPRAI"