Investigations of the Synthesis and Evaluation of Alaryngeal Speech Generated using Ultrasonic Excitations

Romilla Malla Bhat(1), Parveen Lehana(2)
(1) Department of Electronics, Gandhi Memorial Science College, Jammu, Jammu and Kashmir, India
(2) Department of Physics and Electronics, University of Jammu, Jammu, Jammu and Kashmir, India

Abstract
The interference pattern of two simulated ultrasonic waves, generated in MATLAB (free version R2010a) according to the laws of acoustics, has been used to obtain excitation in the audio frequency range. Preliminary experiments were carried out using different high frequencies. The recordings were made in GoldWave 5.1, and Praat was then used to analyze the beat frequency obtained by interfering two high-frequency waves (above 15 kHz). The recorded output was then subjected to a Hilbert transform for envelope detection in MATLAB (free version R2010a) to obtain the audio excitations. These audio excitations are further articulated by the vocal tract for speech synthesis.

Index Terms: speech synthesis, ultrasonic waves, Hilbert transform

Introduction
Speech is a bio-acoustic signal that is easily produced but difficult to decipher. A speech signal is generated when the air stream from the lungs causes the vocal folds in the larynx to vibrate, or creates turbulence at some point in the vocal tract. The shape of the vocal tract determines which sound is produced. The speech production mechanism is lucidly explained by Fant's source-filter model [1], [2]. Persons who suffer from throat cancer, oropharyngeal disease or injury may require the larynx and vocal cords, or part of them, to be surgically removed. Such persons are unable to produce this bio-acoustic signal.
They can be trained, although with difficulty, to use the upper part of the esophagus to provide excitation to the vocal tract, or they can use external devices such as an artificial electrolarynx to communicate with the outside world [3][4][5]. These hand-held electrolarynxes are electromechanical; consequently they are bulky and produce considerable background noise. This background noise degrades speech quality, which can be improved by post-processing filtering, as shown in Figure 1. Such enhancement methods can be software or hardware based [6][7][8][9][10]. Another approach is adequate shielding of the electrolarynx casing. However, these techniques offer only limited improvement: the speech produced remains mechanical or musical in nature, is annoying to the listener, and thus has a social impact on the user (the laryngectomee).

Figure 1: Enhancement of the speech signal using electrolarynx NP1 (courtesy SP Lab, IIT Bombay).

Our motivation is that this background noise can be reduced by generating the audio excitation to the vocal tract through the interference of two inaudible high-frequency ultrasonic signals, as shown in Figure 2.

Figure 2: Schematic diagram of speech synthesis using ultrasonic generators. (Diagram labels: lungs, trachea, stoma, vocal cords removed, pharyngeal cavity, oral cavity, nasal cavity, velum, tongue hump, muscle force, two ultrasonic generators, nose output, mouth output, speech output.)

Figure 3: Proposed schematic diagram of the experimental setup for speech synthesis using ultrasonic generators. (Block labels: microphone, ADC, ARM controller, PWM, pitch control, speaker address, speaker on/off, A/DMUX, low-pass filters LPF1 to LPF3, amplifiers 1 to 4, transducers 1 and 2.)
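The interference and envelope-detection steps summarized in the abstract can be illustrated with a short simulation. By the identity sin(2πf1 t) + sin(2πf2 t) = 2 cos(π(f1 - f2)t) sin(π(f1 + f2)t), the sum of two equal-amplitude tones is a carrier at the mean frequency whose envelope repeats at the difference frequency f2 - f1. A purely linear sum contains no spectral energy at that difference frequency; the audible component appears only after a nonlinear step such as envelope detection. The sketch below is written in Python with NumPy rather than the MATLAB R2010a used in this work, and its analytic-signal construction follows the same FFT approach used by MATLAB's hilbert and scipy.signal.hilbert. The 96 kHz sample rate, the 1 s duration and the function names (two_tone, analytic_signal, envelope_frequency) are illustrative assumptions, not the authors' settings; the 17 kHz base tone and the 200, 300 and 600 Hz differences mirror the spectrogram experiments.

```python
import numpy as np


def two_tone(f1, f2, fs, duration):
    """Sum of two equal-amplitude sinusoids, standing in for the two ultrasonic drive signals."""
    t = np.arange(int(round(fs * duration))) / fs
    return t, np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)


def analytic_signal(x):
    """FFT-based analytic signal x + j*H{x} (same construction as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                  # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0         # keep the Nyquist bin once
        weights[1:n // 2] = 2.0       # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    # negative-frequency bins stay at zero
    return np.fft.ifft(spectrum * weights)


def envelope_frequency(x, fs):
    """Strongest non-DC frequency (Hz) in the Hilbert envelope |analytic_signal(x)|."""
    envelope = np.abs(analytic_signal(x))
    envelope = envelope - envelope.mean()        # remove the envelope's DC offset
    magnitude = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs[np.argmax(magnitude[1:]) + 1]  # skip bin 0 (DC)


if __name__ == "__main__":
    fs = 96_000      # assumed sample rate, well above twice the highest tone
    base = 17_000    # 17 kHz base tone, as in the spectrogram experiments
    for diff in (200, 300, 600):
        _, x = two_tone(base, base + diff, fs, duration=1.0)
        raw = np.abs(np.fft.rfft(x))
        # with fs = 96 kHz and 1 s of signal, rfft bin k corresponds to k Hz
        print(f"difference {diff} Hz: raw level at {diff} Hz = {raw[diff]:.1e}, "
              f"envelope peak = {envelope_frequency(x, fs):.1f} Hz")
```

Running the script shows, for each difference frequency, that the raw two-tone spectrum is essentially empty at that frequency while the Hilbert envelope peaks there, which is the behavior the abstract describes for the recorded ultrasonic beats.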
Discussion
Our investigations showed that when two different frequencies in the audible range, generated in MATLAB (free version R2010a), were applied through (assembled) amplifiers to two tweeters (Pioneer), a beat was produced. The beat, recorded with GoldWave v5.69, was in the audible range, and analysis in Praat revealed the difference frequency (the beat signal). As the two frequencies were moved into the ultrasonic range, the beat recorded with GoldWave v5.69 (free version) was found not to be in the audible range. To obtain an audible signal, the recorded signal was subjected to a Hilbert transform for envelope detection in MATLAB (free version R2010a), which yielded an audio-frequency signal equal to the difference between the two high-frequency signals.

The next stage of the experiment is to apply this audio signal to a vocal tract model fabricated with a 3D printer. Initially, the shape of the vocal tract will be modeled for the three cardinal vowels, with reference to the LF glottal flow model [12], since it is not easy to experiment with an actual laryngectomee. The work can later be extended to other vowels and consonants.

Spectrograms for a 17 kHz base tone with difference frequencies of (a) 200 Hz, (b) 300 Hz and (c) 600 Hz.

Conclusions
Through this Doctoral Workshop paper we have presented some of our results for your consideration and to seek guidance for our future work. The results are part of our investigations, in which both software and hardware techniques have been used to develop our idea.

Acknowledgements
The ISCA Board would like to thank the organizing committees of the past INTERSPEECH conferences for their help and for kindly providing the template files.

References
[1] G. Fant, Acoustic Theory of Speech Production. The Hague: Mouton, 1960.
[2] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, New Jersey: Prentice Hall, 1978.
[3] Y. Lebrun, "History and development of laryngeal prosthetic devices," The Artificial Larynx, Amsterdam: Swets and Zeitlinger, pp. 19-76, 1973.
[4] L. P. Goldstein, "History and development of laryngeal prosthetic devices," Electrostatic Analysis and Enhancement of Alaryngeal Speech, pp. 137-165, year not known.
[5] "Speechaids," http://jkemp.larynxlink.com/speechaids.htm, Jan. 2002.
[6] S. S. Pratapwar, P. C. Pandey, and P. K. Lehana, "Reduction of background noise in alaryngeal speech using spectral subtraction with quantile based noise estimation," in Proc. 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI), Orlando, USA, 2003.
[8] Y. Qi and B. Weinberg, "Low-frequency energy deficit in electrolaryngeal speech," J. Speech and Hearing Research, vol. 34, pp. 1250-1256, 1991.
[9] P. C. Pandey, S. M. Bhatnagar, G. K. Bachher, and P. K. Lehana, "Enhancement of alaryngeal speech using spectral subtraction," in Proc. DSP 2002, Santorini, Greece, 1-3 July 2002, pp. 591-594.
[10] H. Liu, Q. Zhao, M. Wan, and S. Wang, "Application of spectral subtraction method on enhancement of electrolarynx speech," J. Acoust. Soc. Am., vol. 120, no. 1, pp. 398-406, 2006.
[11] P. Mills and J. Zara, "3D simulation of an audible ultrasonic electrolarynx using difference waves," PLoS ONE, Nov. 2014, DOI: 10.1371/journal.pone.0113339.
[12] G. Fant, J. Liljencrants, and Q. Lin, "A four-parameter model of glottal flow," STL-QPSR, no. 4, pp. 1-13, 1985.