Paper Template for INTERSPEECH 2015

Investigations of the Synthesis and Evaluation of Alaryngeal Speech Generated using Ultrasonic Excitations
Romilla Malla Bhat1, Parveen Lehana2
1Department of Electronics, Gandhi Memorial Science College, Jammu, Jammu and Kashmir, India.
2Department of Physics and Electronics, University of Jammu, Jammu, Jammu and Kashmir, India
[email protected], [email protected]
Abstract
Interference patterns of ultrasonic waves, simulated in
MATLAB (free version R2010a) using the laws of acoustics, have
been used to obtain an excitation in the audio frequency range.
Preliminary experiments were carried out at different high
frequencies. The recordings were made in Goldwave version 5.1
and subsequently analyzed in Praat to examine the beat
frequency obtained by interfering two high-frequency (above
15 kHz) waves. The output was then subjected to the Hilbert
transform for envelope detection in MATLAB (free version
R2010a) to obtain the audio excitations. These audio
excitations are to be further articulated by the glottal
passage for speech synthesis.
Index Terms: speech synthesis, ultrasonic waves,
Hilbert transform
Introduction
Speech is a bio-acoustic signal that is very easily produced
but very difficult to decipher. A speech signal is generated
when the air stream from the lungs causes the vocal folds in
the larynx to vibrate, or causes turbulence at some point in
the vocal tract. The shape of the vocal tract is responsible
for generating different types of sounds. The speech production
mechanism has been explained very lucidly by Fant's model [1] [2].
Persons who suffer from throat cancer, oropharyngeal diseases,
or injury may require their larynx and vocal cords, or
sometimes a part of them, to be removed surgically. Such
persons are unable to produce this bio-acoustic signal. They
can be trained, although with difficulty, to use the upper part
of the esophagus to provide excitation to the vocal tract, or
they can use external devices such as an artificial
electrolarynx to communicate with the outside world [3][4][5].
These hand-held electrolarynxes are electromechanical in
nature; they are therefore bulky and produce considerable
background noise. This background noise degrades the speech
quality, which can be enhanced by post-processing filtering, as
shown in Figure 1. These enhancement methods can be software or
hardware based [6] [7] [8][9][10]. Another approach is adequate
shielding of the electrolarynx casing. However, these
techniques do not improve the speech much: the speech signal
produced remains mechanical or musical in nature and is
annoying to the listener, which has a social impact on the user
(the laryngectomee).
Figure 1: Enhancement of the speech signal using
electrolarynx NP1 (courtesy SP Lab, IIT Bombay)
[Schematic diagram; labelled parts: lungs, trachea, stoma,
vocal cords removed, two ultrasonic generators, pharyngeal
cavity, tongue hump, velum, oral cavity, nasal cavity, mouth
output, nose output, speech output]
Our motivation is that this background noise can be reduced
by generating the audio excitation for the vocal tract through
the interference of two inaudible high-frequency ultrasonic
signals, as shown in Figure 2.
Figure 2: Schematic diagram of speech synthesis using
ultrasonic generators.
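The principle behind this excitation can be stated with a
standard trigonometric identity. As a sketch, assume two
equal-amplitude sinusoids at frequencies f_1 and f_2; the
symbols and the equal-amplitude assumption are ours and are not
taken from the experiment:

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2 \cos\!\bigl(\pi (f_1 - f_2)\, t\bigr)\,
      \sin\!\bigl(\pi (f_1 + f_2)\, t\bigr)
```

The slowly varying cosine factor gives an envelope
|2 cos(pi (f_1 - f_2) t)|, which repeats at the difference
frequency |f_1 - f_2|. For illustration, tones at 17.0 kHz and
17.2 kHz would give an envelope that repeats at 200 Hz, while
the carrier at (f_1 + f_2)/2 = 17.1 kHz stays above the
frequencies considered here.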
[Block diagram; labelled blocks: microphone, amplifiers 1 to 4,
transducers 1 and 2, ADC, low-pass filters LPF1 to LPF3,
A/ DMUX, ARM controller, PWM, speaker address, speaker on/off,
pitch control]
Figure 3: Proposed schematic diagram of experimental setup
for speech synthesis using ultrasonic generators.
Discussion
Our investigations showed that when two different frequencies
in the audible range, generated in MATLAB (free version
R2010a), were applied to two tweeters (Pioneer) through
assembled amplifiers, a beat frequency was produced. The beat
signal, recorded using Goldwave v5.69, was in the audible
range, and the difference (beat) frequency could be detected
after analysis in Praat. As the two frequencies approached the
ultrasonic range, the resulting signal was again recorded using
Goldwave v5.69 (free version), but it was found not to be in
the audible range. To obtain the audible signal, the recorded
signal was subjected to the Hilbert transform for envelope
detection in MATLAB (free version R2010a), and an audio
frequency equal to the difference of the high-frequency
signals was obtained.
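The envelope-detection step can be illustrated with a minimal
sketch. It is written in Python with NumPy rather than in the
MATLAB environment used for the experiments, and it is not the
code used for the recordings. The function names, the 96 kHz
sampling rate, the 0.5 s duration, and the equal tone amplitudes
are all assumptions made for illustration. The analytic signal
is built with the usual FFT construction, its magnitude is taken
as the envelope, and the strongest spectral component of the
envelope is reported.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence using the FFT method."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_peak_frequency(f1, f2, fs=96_000, duration=0.5):
    """Sum two tones and return the dominant frequency of their envelope.

    fs and duration are illustrative assumptions, not values from the paper.
    """
    t = np.arange(int(fs * duration)) / fs
    mixed = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

    # Envelope = magnitude of the analytic signal (Hilbert-transform method).
    envelope = np.abs(analytic_signal(mixed))

    # Remove the mean so the DC term does not dominate the spectrum.
    envelope = envelope - envelope.mean()
    magnitude = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs[np.argmax(magnitude)]


if __name__ == "__main__":
    for delta in (200.0, 300.0, 600.0):
        peak = envelope_peak_frequency(17_000.0, 17_000.0 + delta)
        print(f"difference {delta:.0f} Hz -> envelope peak near {peak:.1f} Hz")
```

Under these assumptions the envelope peak should fall at the
difference frequency of the two tones. The linear sum itself
contains energy only at the two tone frequencies, which is
consistent with the observation that an envelope-detection step
was needed to recover an audible component.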
The next part of the experiment is to apply this audio signal
to a glottal tract made with a 3D printer. Initially, the shape
of the vocal tract will be modeled according to the three
cardinal vowels using the LF model [12], as it is not easy to
experiment with actual laryngectomees. The work can later be
extended to other vowels and consonants.
Spectrograms for 17 kHz with differences of (a) 200 Hz,
(b) 300 Hz, and (c) 600 Hz, respectively
Conclusions
Through this Doctoral Workshop paper we have tried to present
some of our results for your consideration and to seek guidance
for our future work. The results are part of our
investigations, in which both software and hardware techniques
have been used to present our idea.
References
[1] G. Fant, Acoustic Theory of Speech Production. The Hague:
Mouton, 1960.
[2] L. R. Rabiner and R. W. Schafer, Digital Processing of
Speech Signals. Englewood Cliffs, New Jersey: Prentice Hall,
1978.
[3] Y. Lebrun, “History and development of laryngeal
prosthetic devices,” The Artificial Larynx, Amsterdam: Swets
and Zeitlinger, pp. 19-76, 1973.
[4] L. P. Goldstein, “History and development of laryngeal
prosthetic devices,” Electrostatic Analysis and Enhancement of
Alaryngeal Speech, pp. 137-165, year not known.
[5] "Speechaids", http://jkemp.larynxlink.com/speechaids.htm,
Jan 2002.
[6] S. S. Pratapwar, P. C. Pandey, and P. K. Lehana,
“Reduction of background noise in alaryngeal speech using
spectral subtraction with quantile based noise estimation,” in
Proc. of 7th World Multiconference on Systemics, Cybernetics
and Informatics SCI, Orlando, USA, 2003.
[8] Q. Yingyong and B. Weinberg, “Low-frequency energy deficit
in electro laryngeal speech,” J. Speech and Hearing Research,
vol. 34, pp. 1250-1256, 1991.
[9] P. C. Pandey, S. M. Bhatnagar, G. K. Bachher, and P. K.
Lehana, “Enhancement of alaryngeal speech using spectral
subtraction,” in Proc. DSP2002 (1-3 July 2002), Santorini,
Greece, pp. 591-594.
[10] Hanjan Liu, Qin Zhao, Mingxi Wan, and Supin Wang,
“Application of spectral subtraction method on enhancement of
electro larynx speech,” J. Acoust. Soc. Am., vol. 120, no. 1,
pp. 398-406, 2006.
[11] P. Mills and J. Zara, “3D simulation of an audible
ultrasonic electrolarynx using difference waves,” PLoS ONE,
published November 17, 2014, DOI: 10.1371/journal.pone.0113339.
[12] G. Fant, J. Liljencrants, and Q. Lin, “A four-parameter
model of glottal flow,” STL-QPSR, vol. 4, pp. 1-13, 1985.