Volume 4, Issue 08, August 2016, Pages 5756-5763, ISSN(e): 2321-7545
Website: http://ijsae.in
DOI: http://dx.doi.org/10.18535/ijsre/v4i08.28

Analysis of Speech Recognition System based on Gender / Age Group / Human-Machine

Authors
Amita(1), Abhishek Bhatnagar(2)
(1) M.Tech (CE), Computer Science Department, Indus Institute of Engineering and Technology, Kinana (Jind)
(2) Asst. Professor, Computer Science Department, Indus Institute of Engineering and Technology, Kinana (Jind)
[email protected], [email protected]

ABSTRACT: Speech recognition (SR) is the translation of spoken words into text. It is also called automatic speech recognition or simply "speech to text" (STT). Some SR systems use training, where each speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech. The term voice recognition or speaker identification refers to identifying the speaker rather than what is being said. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process. In the health care sector, speech recognition can be implemented in the front end or back end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document.

1 INTRODUCTION
Speech recognition (SR) is an inter-disciplinary sub-field of computational linguistics that incorporates knowledge and research from the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enable the recognition and translation of spoken language into text by computers and computerized devices, such as those categorized as smart technologies and robotics. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). Some SR systems use "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker independent" [1] systems; systems that use training are called "speaker dependent". Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control, search (e.g. finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text processing (e.g. word processors or e-mail), and aircraft control (usually termed Direct Voice Input).

Fig 1: Speech recognition

High-performance fighter aircraft
Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), a program in France installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, in applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons-release parameters, and controlling the flight display.

2 LITERATURE REVIEW
As early as 1932, Bell Labs researchers such as Harvey Fletcher were investigating the science of speech perception. In 1952, three Bell Labs researchers built a system for single-speaker digit recognition; it worked by locating the formants in the power spectrum of each utterance. 1950s-era technology was limited to single-speaker systems with vocabularies of around ten words. John Pierce's 1969 letter compared speech recognition to "extracting gold from the sea, curing cancer, or going to the moon", and after it was published Bell Labs defunded its speech recognition research. Raj Reddy was the first person to take on continuous speech recognition, at Stanford University in the late 1960s; previous systems required users to pause after every word. Reddy's system was designed to issue spoken commands for the game of chess. Also around this time, Soviet researchers invented the dynamic time warping algorithm. In 1971, DARPA funded five years of speech recognition research through its Speech Understanding Research program, with ambitious goals including a vocabulary size of 1,000 words. BBN, IBM, Carnegie Mellon, and the Stanford Research Institute all participated in the program. This government funding revived speech recognition research, which had been largely abandoned in the United States after John Pierce's letter. Despite the fact that CMU's Harpy system met the goals established at the outset of the program, many of the predictions turned out to be nothing more than hype, disappointing DARPA administrators. Several innovations happened during this time, such as the invention of beam search for use in the Harpy system. The field also benefited from the discovery of several algorithms in other fields, such as linear predictive coding and cepstral analysis.

3 DESIGN METHODOLOGY & PROPOSED IMPLEMENTATION
Dramatic advances have been made in speech recognition technology over the past several years. Despite these advances, commercial recognizers have been successful in only a few constrained application areas. Many researchers believe that recognizers will enjoy widespread use and become commonplace only if their performance matches that of humans under everyday listening conditions. This paper measures how far research has progressed towards this goal. Results from scattered studies that have compared human and machine speech recognition on similar tasks are analyzed to determine how much speech recognizers must improve to match human performance. The speech corpora used in these comparisons represent everyday listening conditions spanning a continuum from quietly read isolated words to spontaneous telephone speech. The results demonstrate that modern speech recognizers still perform much worse than humans, both on wideband speech read in quiet and on band-limited or noisy spontaneous speech.
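The human-versus-machine comparisons above are stated in terms of word error rate. The paper gives no code for this metric, so the following is only an illustrative MATLAB sketch: word error rate computed as the Levenshtein (edit) distance between a reference and a hypothesis transcript, divided by the reference length. The function name and the transcripts are hypothetical, not taken from the paper.

function wer = worderrorrate(ref, hyp)
% Illustrative sketch (not from the paper): word error rate as
% (substitutions + insertions + deletions) / number of reference words,
% computed with the standard edit-distance dynamic program.
r = strsplit(ref);                        % reference -> cell array of words
h = strsplit(hyp);                        % hypothesis -> cell array of words
n = numel(r); m = numel(h);
d = zeros(n+1, m+1);
d(:,1) = 0:n;                             % cost of deleting all reference words
d(1,:) = 0:m;                             % cost of inserting all hypothesis words
for i = 2:n+1
    for j = 2:m+1
        cost = ~strcmp(r{i-1}, h{j-1});   % 0 if the words match, else 1
        d(i,j) = min([d(i-1,j)+1, ...     % deletion
                      d(i,j-1)+1, ...     % insertion
                      d(i-1,j-1)+cost]);  % substitution or match
    end
end
wer = d(n+1, m+1) / n;
end

For example, with reference 'set radio frequency' and hypothesis 'set the radio frequency', the single inserted word gives a word error rate of 1/3.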
4 IMPLEMENTATION RESULT & DISCUSSION
Implementation phase 1 (speech recognition using the correlation method)
Step 1. Record a voice using a sound recorder application. To open Sound Recorder, click the Start button, enter "sound recorder" in the search box, and click Sound Recorder in the list of results. In Windows 8, type "sound recorder" while on the Start screen and select Sound Recorder from the search results. Sound Recorder will not open if you do not have a microphone attached to your computer, and to play back the audio you must have speakers or a pair of headphones installed. Alternatively, the recording can be scripted directly in MATLAB, as sketched below.
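As a hedged alternative to the Sound Recorder GUI described above (not part of the original procedure), the same WAV files can be captured with MATLAB's built-in audiorecorder object. The sample rate, the two-second duration, and the file name are illustrative assumptions; only the file name 'test.wav' echoes the usage example given later.

% Illustrative sketch: record a two-second utterance from the default
% microphone and save it as the WAV file the recognizer expects.
fs = 8000;                          % sample rate in Hz (assumed)
rec = audiorecorder(fs, 16, 1);     % 16-bit, mono
disp('Speak now...');
recordblocking(rec, 2);             % record for 2 seconds
disp('Done.');
y = getaudiodata(rec);              % samples as a double column vector
audiowrite('test.wav', y, fs);      % requires MATLAB R2012b or later

On releases older than R2012b, wavwrite(y, fs, 'test.wav') plays the same role.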
Fig 2: Convert audio to WAV
Fig 3: Sound record 1
Fig 4: Sound record 2

Implementation phase 2 (code to recognize Hindi voice)

function speechrecognition(filename)
% Speech Recognition Using Correlation Method
% Usage (from the Command Window):
%   speechrecognition('test.wav')
% Note: wavread was removed in MATLAB R2015b; on newer releases,
% substitute audioread for wavread below.

% Read the test utterance and keep only the first channel.
voice = wavread(filename);
x = voice(:,1);

% Template recordings of the Hindi digits ek, do, teen, char (1-4).
templates = {'ek.wav', 'do.wav', 'teen.wav', 'char.wav'};
words = {'ek', 'do', 'teen', 'char'};
m = zeros(1, numel(templates));

for k = 1:numel(templates)
    y = wavread(templates{k});
    y = y(:,1);
    z = xcorr(x, y);               % cross-correlate input with template
    m(k) = max(z);                 % peak correlation = similarity score
    l = length(z);
    t = (-(l-1)/2 : (l-1)/2)';     % lag axis for plotting
    figure
    plot(t, z);
    title(['Cross-correlation with ' templates{k}]);
end

% The original listing stops after the plots; the natural completion of
% the method is to report the template with the highest correlation peak.
[~, best] = max(m);
disp(['Recognized word: ' words{best}]);
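One caveat about comparing raw correlation peaks, offered here as a hedged aside rather than as part of the original method: the peak of xcorr grows with a template's overall energy, so a louder template recording can win regardless of similarity. Under that assumption, a minimal refinement is to scale each template to unit energy before correlating:

% Hypothetical refinement (not in the original script): normalize each
% template so the peak correlation scores are comparable across
% templates recorded at different loudness.
y = y / norm(y);    % inside the loop, before calling xcorr(x, y)

Recording all templates and test utterances at the same level and sample rate serves a similar purpose, but the normalization makes the comparison more robust.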
Fig 5: Hindi sound recording
Fig 6: Sound record 3

5 CONCLUSION
Dramatic advances have recently been made in speech recognition technology. Large-vocabulary talker-independent recognizers provide error rates of less than 10% for read sentences recorded in a quiet environment. Machine performance, however, deteriorates dramatically under degraded conditions. For example, error rates increase to roughly 40% for spontaneous speech and to 23% with channel variability and noise. Human error rates remain below 5% in quiet and under similarly degraded conditions. Comparisons using many speech corpora demonstrate that human word error rates are often more than an order of magnitude lower than those of current recognizers, in both quiet and degraded environments. In general, the superiority of human performance increases in noise and for more difficult speech material such as spontaneous speech. Although current speech recognition technology is well suited to many practical commercial applications, these results suggest that there is much room for improvement.

REFERENCES
1. "Speaker Independent Connected Speech Recognition - Fifth Generation Computer Corporation". Fifthgen.com. Retrieved 2013-06-15.
2. "British English definition of voice recognition". Macmillan Publishers Limited. Retrieved February 21, 2012.
3. "voice recognition, definition of". WebFinance, Inc. Retrieved February 21, 2012.
4. "The Mailbag LG #114". Linuxgazette.net. Retrieved 2013-06-15.
5. Reynolds, Douglas; Rose, Richard (January 1995). "Robust text-independent speaker identification using Gaussian mixture speaker models" (PDF). IEEE Transactions on Speech and Audio Processing 3 (1): 72-83. doi:10.1109/89.365379. ISSN 1063-6676. OCLC 26108901. Retrieved 21 February 2014.
6. "Speaker Identification (WhisperID)". Microsoft Research. Retrieved 21 February 2014. "When you speak to someone, they don't just recognize what you say: they recognize who you are. WhisperID will let computers do that, too, figuring out who you are by the way you sound."
7. Huffman, Larry. "Stokowski, Harvey Fletcher, and Bell Labs Experimental Recordings". www.stokowski.org. Retrieved February 17, 2014.
8. Juang, B. H.; Rabiner, Lawrence R. "Automatic speech recognition - a brief history of the technology development" (PDF). p. 6. Retrieved 17 January 2015.
9. Pierce, John (1969). "Whither Speech Recognition". Journal of the Acoustical Society of America. doi:10.1121/1.1911801.
10. Benesty, Jacob; Sondhi, M. M.; Huang, Yiteng (2008). Springer Handbook of Speech Processing. Springer Science & Business Media. ISBN 3540491252.
11. Blechman, R. O.; Blechman, Nicholas (June 23, 2008). "Hello, Hal". The New Yorker. Retrieved 17 January 2015.
12. Funding a Revolution. National Academy Press. 1999. Retrieved 22 January 2015.
13. Lowerre, Bruce. "The Harpy Speech Recognition System". Ph.D. thesis, Carnegie Mellon University, 1976.
14. Huang, Xuedong; Baker, James; Reddy, Raj. "A Historical Perspective of Speech Recognition". Communications of the ACM. Retrieved 20 January 2015.
15. Juang, B. H.; Rabiner, Lawrence R. "Automatic speech recognition - a brief history of the technology development" (PDF). p. 10. Retrieved 17 January 2015.
16. "History of Speech Recognition". Retrieved 17 January 2015.
17. Morgan, Nelson; Cohen, Jordan; Krishnan, Sree Hari; Chang, S.; Wegmann, S. (2013). Final Report: OUCH Project (Outing Unfortunate Characteristics of HMMs). CiteSeerX: 10.1.1.395.7249.
18. "Nuance Exec on iPhone 4S, Siri, and the Future of Speech". Tech.pinions. October 10, 2011. Retrieved November 23, 2011.
19. Kincaid, Jason. "The Power of Voice: A Conversation with the Head of Google's Speech Technology". TechCrunch.
20. Froomkin, Dan. "The Computers Are Listening". The Intercept. Retrieved 20 June 2015.
21. NIPS Workshop: Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, Dec. 2009 (organizers: Li Deng, Geoff Hinton, D. Yu).
22. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B. (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups". IEEE Signal Processing Magazine 29 (6): 82-97.
23. Deng, L.; Hinton, G.; Kingsbury, B. (2013). "New types of deep neural network learning for speech recognition and related applications: An overview". ICASSP.
24. Markoff, John (November 23, 2012). "Scientists See Promise in Deep-Learning Programs". The New York Times. Retrieved 20 January 2015.
25. Morgan, N.; Bourlard, H.; Renals, S.; Cohen, M.; Franco, H. (1993). "Hybrid neural network/hidden Markov model systems for continuous speech recognition". ICASSP/IJPRAI.