HCS 7367
Speech Perception Lab
Dr. Peter Assmann
Fall 2013
Class web page
http://www.utdallas.edu/~assmann/hcs7367/
• Course information
• Lab details
• Speech demos
• Matlab programs used for class assignments
• Additional resources
Matlab Background
Kermit Sigmon, MATLAB Primer 2nd Edition.
http://www.fi.uib.no/Fysisk/Teori/KURS/WRK/mat/singlemat.html
Getting started with Matlab (The MathWorks):
http://www.mathworks.com/help/techdoc/learn_matlab/bqr_2pl.html
UTD IR – Matlab and Simulink: Resources for Getting Started
http://www.utdallas.edu/ir/how-to/ml_help/index.html
Praat: doing phonetics by computer
Download Praat:
http://www.fon.hum.uva.nl/praat/
Praat tutorial:
http://www.fon.hum.uva.nl/praat/manual/Intro.html
Wavesurfer
Download Wavesurfer:
www.speech.kth.se/wavesurfer
Wavesurfer User Manual
www.speech.kth.se/wavesurfer/man.html
Starting with Matlab
Interactive MATLAB Tutorial
http://www.mathworks.com/help/techdoc/learn_matlab/f0-11759.html
http://www.mathworks.com/academia/student_center/tutorials/ml_onramp/player.html?slide=1
Start Matlab
doc Matlab
Click on “Getting Started”
This launches a video in your browser
Dates for lab assignments
Lab assignment 1: Sept 19
Lab assignment 2: Oct 10
Lab assignment 3: Oct 31
Lab assignment 4: Nov 21
• 3 page reports (with figures) on lab projects
Term project: important dates
Sept 5: Submit project topics
Sept 26: Turn in project outline
Oct 3: Preliminary project presentations
Nov 14/21: Oral presentations
Dec 12: Final project paper due
Examples of topics
• Acoustic analysis and intelligibility of children’s speech
• Neural network models of vowel recognition
• Simulating distortions introduced by hearing loss
• Noise reduction algorithms for hearing aid processors
• Production and perception of foreign accents
• Contribution of prosody to connected speech intelligibility
• Effects of noise, reverberation on speech communication
• Monaural vs. binaural speech understanding in noise
• Development of speech perception in infants
• Models of speech coding in the auditory cortex
Initial stages
• Identify a topic area and read the relevant papers
• Refine your topic; choose a manageable problem
• Set specific goals and define evaluation metric
• Identify the approach to solve the problem
• Start right away.
Finding papers
PubMed search engine:
http://www.ncbi.nlm.nih.gov/entrez/
Finding papers
PubMed search engine
• Find more papers
• Find free full-text articles
Finding papers
Journal of the Acoustical Society of America:
http://scitation.aip.org/jasa/
Fundamental frequency (F0)
Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.
Audio demo: the source signal
Source signal for an adult male voice
Source signal for an adult female voice
Source signal for a 10-year-old child
Harmonicity and Periodicity
• Period: regularly repeating pattern in the waveform
Period duration T0 = 6 ms
F0 = 1 / T0
F0 = 1000 / 6 ≈ 167 Hz
• Harmonics are integer multiples of F0 and are evenly spaced in frequency
[Figure: waveform (amplitude vs. time) and amplitude spectrum (amplitude in dB vs. frequency in kHz)]
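The relation between period and F0 can be checked at the Matlab prompt. The lines below are a minimal sketch using the 6 ms period from the slide; the variable names (T0_ms, harmonics, spacing) are illustrative and not part of the course code.

```matlab
% Fundamental frequency from the period duration (6 ms, as on the slide)
T0_ms = 6;                      % period duration in ms
F0 = 1000 / T0_ms;              % F0 in Hz (1000 ms per second)
harmonics = F0 * (1:5);         % first five harmonics: integer multiples of F0
spacing = diff(harmonics);      % spacing between adjacent harmonics (equals F0)
fprintf('F0 = %.1f Hz\n', F0);
fprintf('Harmonics (Hz): %s\n', mat2str(round(harmonics)));
```

The constant spacing is what makes the harmonics appear as evenly spaced lines in the amplitude spectrum.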
Source properties
In voiced sounds the glottal source spectrum contains
a series of lines called harmonics.
The lowest one is called the fundamental frequency
(F0).
[Figure: amplitude spectrum of the glottal source with F0 marked; relative amplitude (dB) vs. frequency (Hz)]
Filter properties
The vocal tract resonances (called formants) produce peaks in the spectrum envelope.
Formants are labeled F1, F2, F3, ... in order of increasing frequency.
[Figure: amplitude spectrum with superimposed LPC spectral envelope, peaks labeled F1 to F4; amplitude in dB vs. frequency (kHz)]
Demo: harmonic synthesis
Additive harmonic synthesis: vowel /i/ (.wav)
Cumulative sum of harmonics: vowel /i/ (.wav)
Additive synthesis: “wheel” (.wav)
Cumulative sum of partials: (.wav)
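The idea behind additive harmonic synthesis can be sketched in a few lines of Matlab. This is not the code behind the class demos: the sample rate, duration, F0, number of harmonics and the 1/k amplitude rolloff are all assumptions chosen for illustration, and no formant shaping is applied, so the result sounds like a buzz rather than a vowel.

```matlab
% Additive harmonic synthesis sketch (all parameter values are assumptions)
fs = 10000;                          % sample rate in Hz
dur = 0.5;                           % duration in seconds
F0 = 100;                            % fundamental frequency in Hz
nharm = 20;                          % number of harmonics (highest = 2000 Hz < fs/2)
t = (0:round(dur*fs)-1)' / fs;       % time axis as a column vector (s)
y = zeros(size(t));
for k = 1:nharm
    amp = 1 / k;                     % assumed spectral tilt: amplitude falls as 1/k
    y = y + amp * sin(2*pi*k*F0*t);  % add the k-th harmonic to the running sum
end
y = 0.9 * y / max(abs(y));           % normalize to avoid clipping on playback
% sound(y, fs);                      % uncomment to listen
```

Listening to the running sum after each pass through the loop reproduces the "cumulative sum of harmonics" effect: the pitch stays at F0 while the timbre becomes brighter.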
Uniform tube model (schwa)
Vocal tract properties
Resonating tube model
– approximation for neutral vowel (schwa), [ə]
– closed at one end (glottis); open at the other (lips)
– uniform cross-sectional area
– curvature is relatively unimportant
[Figure: uniform tube of length L for /ə/, glottis at the closed end and lips at the open end]
American English vowel space
(Height: F1 increases from high to low vowels. Advancement: F2 increases from back to front vowels.)
        front          center         back
high    i “heed”                      u “who’d”
        ɪ “hid”                       ʊ “hood”
mid     e “hayed”      ə “schwa”      o “hoed”
        ɛ “head”       ʌ “hut”        ɔ “hawed”
low     æ “had”                       ɑ “hod”
Acoustic vowel space
[Figure: second formant, F2 frequency (Hz) vs. first formant, F1 frequency (Hz), showing i “heed”, ə, ɑ “hod” and u “who’d”]
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– Fn is the frequency of formant n in Hz
– c is the velocity of sound in air (about 35000 cm/sec)
– L is the length of the vocal tract (17.5 cm for adult male)
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
– F1 = (2(1) – 1)*35000/(4*17.5) = 500 Hz
– F2 = (2(2) – 1)*35000/(4*17.5) = 1500 Hz
– F3 = (2(3) – 1)*35000/(4*17.5) = 2500 Hz
[Figure: tube of length L]
Helium speech
The speed of sound in a helium/oxygen mixture
at 20°C is about 93000 cm/s, compared to
35000 cm/s in air. This increases the resonance
frequencies but has relatively little effect on F0.
In helium speech, the formants are shifted up
but the pitch stays the same.
Note that the vowel /ə/ (“schwa”) has formants at odd multiples of F1.
Helium speech
Using Matlab as a calculator, find the
frequencies of F1, F2 and F3 for a 17.5 cm
vocal tract producing the vowel /ə/ in a
helium/air mixture (velocity c ≈ 93000 cm/s)
Fn = ( 2n – 1 ) c / 4 L
F1 = (2*(1) - 1)*93000/(4*17.5) ≈ 1329 Hz
F2 = (2*(2) - 1)*93000/(4*17.5) ≈ 3986 Hz
F3 = (2*(3) - 1)*93000/(4*17.5) ≈ 6643 Hz
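The same calculation can be done for all three formants at once with a vector of formant numbers. The sketch below uses the tube length and sound velocities given on the slides; the variable names are illustrative.

```matlab
% Quarter-wave resonator: Fn = (2n - 1) * c / (4 * L)
L = 17.5;                            % vocal tract length in cm (adult male)
n = 1:3;                             % formant numbers
c_air = 35000;                       % velocity of sound in air, cm/s
c_he  = 93000;                       % velocity of sound in the helium mixture, cm/s
F_air = (2*n - 1) * c_air / (4*L);   % formants in air (Hz)
F_he  = (2*n - 1) * c_he  / (4*L);   % formants in helium (Hz)
ratio = F_he ./ F_air;               % upward shift factor, equal to c_he / c_air
disp(round(F_air));
disp(round(F_he));
```

Every formant is scaled by the same factor (about 2.66), which is why helium shifts the whole formant pattern upward while F0, set by the vocal folds, is largely unaffected.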
Helium speech
Audio demos
– Speech in air
– Speech in helium
– Pitch in air
– Pitch in helium
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
Speech in air
[Figure: spectrogram of speech in air; frequency (kHz) vs. time (ms)]
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
Speech in helium
[Figure: spectrogram of speech in helium; frequency (kHz) vs. time (ms)]
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
Vocal tract model
Quarter-wave resonator:
Fn = ( 2n – 1 ) c / 4 L
where Fn is the frequency of formant n
c is the velocity of sound (about 35000 cm/sec)
L is the vocal tract tube length (17.5 cm for adult male)
[Figure: tube of length L]
Perturbation Theory
The first formant (F1) frequency is lowered by a constriction in the front half of the vocal tract (/u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /ɑ/.
[Figure: change in F1 and F2 (delta F1, delta F2) as a function of constriction location between glottis and lips]
Perturbation Theory
The third formant (F3) is lowered by a constriction at the lips or at the back of the mouth or in the lower pharynx. This occurs in /r/ and /r/-colored vowels like American English /ɚ/ (as in “herd”).
F3 is raised when the constriction is behind the lips and teeth or near the upper pharynx.
[Figure: change in F3 (delta F3) as a function of constriction location between glottis and lips]
Perturbation Theory
The second formant (F2) is lowered by a constriction near the lips or just above the pharynx; in /u/ both of these regions are constricted. F2 is raised when the constriction is behind the lips and teeth, as in the vowel /i/.
Perturbation Theory
All formants tend to drop in frequency when the vocal tract length is increased or when a constriction is formed at the lips.
[Figure: vocal tract tube from glottis to lips]
Perturbation Theory
F1 frequency is correlated with jaw opening (and inversely related to tongue height).
[Figure: amplitude spectrum; amplitude in dB vs. frequency (kHz)]
Perturbation Theory
F2 frequency is correlated with tongue advancement (front-back dimension).
[Figure: amplitude spectrum; amplitude in dB vs. frequency (kHz)]
Spectral analysis
Amplitude spectrum: sound pressure levels
associated with different frequency
components of a signal
Power or intensity
Amplitude or magnitude
Log units and decibels (dB)
Phase spectrum: relative phases associated
with different frequency components
Degrees or radians
Spectral analysis of speech
Why perform a frequency analysis of speech?
– Ear+brain carry out a form of frequency analysis
– Relevant features of speech are more readily visible in the amplitude spectrum than in the raw waveform
Spectral analysis of speech
But: the ear is not a spectrum analyzer.
– Auditory frequency selectivity is best at low frequencies and gets progressively worse at higher frequencies.
Short-term amplitude spectrum
F1 = 281 Hz
F2 = 2196 Hz
F3 = 2755 Hz
[Figure: short-term amplitude spectrum; amplitude (dB) vs. frequency (kHz)]
Speech spectrograms
What is a speech spectrogram?
– Display of amplitude spectrum at successive instants in time ("running spectra")
– How can 3 dimensions be represented on a two-dimensional display?
Gray-scale spectrogram
Waterfall plots
Animation
Speech spectrograms
Why are speech spectrograms useful?
– Show dynamic properties of speech
– Incorporate frequency analysis
– Relate to speech production
– Help to visually identify speech cues
[Figure: waveform and spectrogram of “The watchdog”; frequency 0 to 5 kHz, with formants F1, F2 and F3 labeled]
Digital representations of signals
Sampling frequency (e.g. 44.1 kHz)
Quantization rate (16 bits)
[Figure: sampling along the time axis and quantization along the amplitude axis]
Digital representations of signals
Nyquist frequency
Effects of discrete-time sampling on bandwidth
16 bits = 2^16 quantization steps
Effects of discrete-level quantization on dynamic range
In-class assignment
Use the wavesurfer program on your
laptop as a sound recording device.
Left-click red button to record the vowels
“ee”, “ah” and “oo,” in quick sequence.
In-class assignment
Use wavesurfer to make a spectrogram of
the vowels. Right click on waveform plot
to add spectrogram + formant tracks.
In-class assignment
Left-click and drag mouse to select the desired region in the signal. Then right-click and select “Statistics”.
In-class assignment
This will display the formant frequencies (mean and standard deviation across n=13 frames, in this example).
In-class assignment
For this vowel (“ee”) the estimated formant frequencies are F1=174 Hz, F2=1849 Hz, F3=2492 Hz, F4=3392 Hz.
In-class assignment
Now measure the formants in your productions of the vowels “ee”, “ah”, “oo”.
Make a table of F1, F2 and F3 frequencies.
              F1     F2     F3
i “heed”
ɑ “hod”
u “who’d”
In-class assignment
Save the waveform as vowels.wav
Load into Praat and Matlab and repeat the assignment (instructions to follow).
Vector representation of speech
In Matlab speech signals are represented as row or column vectors (e.g., N rows x 1 column, where N is the number of samples in the waveform).
>> [y,fs]=wavread('wheel.wav'); % load waveform (newer Matlab releases use audioread instead)
>> size( y )
ans =
3200 1
The variable 'y' has 3200 rows x 1 column (column vector).
The variable 'fs' has 1 row x 1 column (scalar).
Vector representation of speech
Load the waveform and plot it:
>> [y,fs]=wavread('wheel.wav'); % load waveform
>> t=(1:length(y) ) ./ (fs/1000); % set up time axis (ms)
>> plot( t, y ); % use plot command
>> axis( [ 0 400 -1 1 ] ); % set axis limits
>> xlabel('Time (ms)'); % x-axis label
>> ylabel('Amplitude'); % y-axis label
>> title('Waveform plot'); % axis title
Spectral analysis in Matlab
FFT Discrete Fourier transform.
FFT(X) is the discrete Fourier transform (DFT) of vector X. If the length of X is a power of two, a fast radix-2 fast-Fourier transform algorithm is used. If the length of X is not a power of two, a slower non-power-of-two algorithm is employed. For matrices, the FFT operation is applied to each column.
FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.
Spectral analysis in Matlab
Fourier spectrum of a vector:
>> X= fft (y);
>> help fft
Spectral analysis in Matlab
Log magnitude (amplitude) spectrum:
>> X= fft (y);
>> m = 20 * log10 ( abs ( X ) );
>> help abs
ABS Absolute value.
ABS(X) is the absolute value of the elements of X. When X is complex, ABS(X) is the complex modulus (magnitude) of the elements of X.
Spectral analysis in Matlab
Log magnitude (amplitude) spectrum:
>> plot(20*log10(abs(fft(y))))
[Figure: log magnitude spectrum plotted against FFT bin number]
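Plotting the FFT output directly puts bin numbers, not frequencies, on the x-axis. The sketch below shows one way to build a frequency axis in Hz and keep only the bins up to half the sample rate. It uses a synthetic 1000 Hz sine wave rather than a speech recording, and the sample rate and FFT length are assumptions.

```matlab
% Log magnitude spectrum of a synthetic test signal with a frequency axis
fs = 8000;                         % assumed sample rate in Hz
n = 512;                           % number of samples (a power of two)
t = (0:n-1)' / fs;                 % time axis in seconds
y = sin(2*pi*1000*t);              % 1000 Hz sine wave
X = fft(y);                        % complex spectrum
m = 20 * log10(abs(X) + eps);      % magnitude in dB; eps avoids log10(0)
f = (0:n-1)' * fs / n;             % frequency of each FFT bin in Hz
half = 1:n/2;                      % bins from 0 Hz up to just below fs/2
[~, kmax] = max(m(half));          % index of the largest spectral peak
fpeak = f(kmax);                   % frequency of that peak in Hz
% plot(f(half), m(half)); xlabel('Frequency (Hz)'); ylabel('Amplitude (dB)');
```

Only the first half of the bins is kept because the magnitude spectrum of a real signal is symmetrical about fs/2.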
Plotting amplitude spectra
» help fp
FP: function to compute & plot amplitude spectrum
Usage: [a,f]=fp(wave,rate,window);
wave: input waveform
rate: sample rate in Hz (default 10000 Hz)
window options: 'hann', 'hamm', 'kais', or 'rect' (default=hamming)
[a,f]: log magnitude (dB re:1), frequency (Hz)
Plotting amplitude spectra
» [a,f]=fp(wave,rate,window);
» [a,f]=fp(y,fs,'hann');
[Figure: amplitude spectrum; amplitude (dB) vs. frequency (kHz)]
Assignment 1
Part 1: (Matlab code, plots, brief summary)
• Make a set of digital recordings (WAV files) of the 12 vowels of American English:
/i/ "heed"
/ɪ/ "hid"
/e/ "hayed"
/ɛ/ "head"
/æ/ "had"
/ʌ/ "hud"
/ɑ/ "hod"
/ɔ/ "hawed"
/o/ "hoed"
/ʊ/ "hood"
/u/ "who’d"
/ɚ/ "herd"
Assignment 1
• Load waveforms into Matlab; make 12 subplots of the amplitude spectra of the vowels, sampled near the midpoint.
» [ y, fs ] = wavread ('heed.wav');
» subplot (4,3,1);
» mid = round ( length (y) / 2 ); % midpoint as a whole sample index
» start = mid - 255;
» stop = mid + 256; % start:stop spans 512 samples
» fp ( y ( start : stop ) , fs, 'hamm' );
Assignment 1
• Plot the amplitude spectra of the vowels. Place all 12 plots in a single figure window using the subplot command:
>> subplot ( 3, 4, 1);
>> plot ( x, y );
>> subplot ( 3, 4, 2);
>> plot ( x, y );
Assignment 1
• Step 1: Make a list of the filenames as a character array:
>> filenames = char ( 'heed', 'hid', 'hayed', ...
'head', 'had', 'hud', 'herd', 'hod', ...
'hawed', 'hoed', 'hood', 'whod' ) ;
>> deblank ( filenames ( 3, : ) )
ans =
hayed
Assignment 1
• Step 2: Load the waveform of each vowel from the disk:
>> for i=1:12,
>>   [ y, rate ] = wavread ( deblank ( filenames ( i , : ) ) );
>>   y = y * 2^15; % scale signal to 16-bit range (±2^15)
>>   % insert plot commands here
>> end;
Assignment 1
• Step 2: extract the middle part from the waveform
>> % extract samples that lie between start and stop:
>> y = y( start : stop ); % but how do we select start and stop?
[Figure: waveform with the start and stop points marked]
Exercise 1
• Find out various properties of the waveform:
» length ( y ) % vector length
» min ( y ) % minimum value
» max ( y ) % maximum value
» mean ( y ) % mean value
» plot ( y ) % inspect waveform
» sound ( y, rate ) % listen to waveform
Exercise 1
• Step 3: Find vowel midpoint; define a range of sample points to extract from the waveform.
» nfft = 512;
» start = round ( length (y) / 2 ) - (nfft/2 - 1);
» stop = round ( length (y) / 2 ) + nfft/2;
% y ( start : stop )
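The midpoint calculation can be tried on any vector before applying it to a recorded vowel. The sketch below uses random noise as a stand-in waveform with an odd number of samples (an assumption chosen to show why the midpoint is rounded to a whole sample index).

```matlab
% Extract nfft samples centered on the midpoint of a waveform
y = randn(3201, 1);                 % stand-in waveform, odd number of samples
nfft = 512;                         % number of samples to extract
mid = round(length(y) / 2);         % midpoint rounded to a whole sample index
start = mid - (nfft/2 - 1);         % first sample of the segment
stop  = mid + nfft/2;               % last sample of the segment
seg = y(start:stop);                % segment of exactly nfft samples
```

Without the rounding step, length(y)/2 would be 1600.5 here and the index range start:stop would not contain whole numbers.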
Exercise 1
• Step 4: Use the function fp.m to compute and plot the amplitude spectrum of the vowel segment:
» fp ( y( start : stop ) , fs , 'Hamm' );
Input arguments: input vector (waveform segment), sample rate, type of window function
Function M-file: fp.m
• There are two types of M-files: scripts and functions. To display the contents of an M-file, type the following:
» type fp.m
• Function M-files start with a function statement (see next page) and a series of comment lines. The comment lines are included to provide online help and are optional (but very useful!). The next five slides illustrate and explain the contents of the function fp.m
Function M-file: fp.m
Function statement:
function [ a, f ] = fp ( x, rate, window ) ;
Comment lines:
% FP: function to compute & plot amplitude spectrum
% Usage: [a,f]=fp(wave,rate,window);
% wave: input waveform
% rate: sample rate in Hz (default 10000 Hz)
% window options: 'hann', 'hamm', 'kais', 'rect' (default=hamming)
% [a,f]: log magnitude, frequency
Optional output arguments: a=log magnitude spectrum, f=corresponding frequencies
Function M-file: fp.m
Set defaults:
% set reasonable defaults for optional variables
if ~exist ( 'rate' , 'var' ) ,
rate=10000;
end;
if ~exist ( 'window' , 'var' ),
window = 'hamm' ;
end;
Function M-file: fp.m
x=x(:); % convert x to column vector
window=lower(window); % window must be lower case
n = length ( x ) ; % length of data vector
Variables defined inside a function are “local.” In other words, they are not accessible on the command line, outside the function itself.
Function M-file: fp.m
% illustration of if-else statements:
% (strcmp compares whole strings, so window names of any length are safe)
if strcmp ( window, 'rect' ), % rectangular window = [1 1 1 1 1]
x=x.*ones(n,1); % multiplying x by 1 does nothing!
elseif strcmp ( window, 'hamm' ),
x=x.*hamming(n); % multiply x by Hamming window
elseif strcmp ( window, 'hann' ),
x=x.*hanning(n); % multiply x by Hanning window
else,
x=x.*hamming(n); % default case: Hamming window
end;
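The window functions themselves are simple cosine formulas. The sketch below builds them directly, which can be useful where the hamming and hanning functions are not installed. The Hann formula shown is the variant with zero-valued end points, which may differ slightly at the edges from what hanning(n) returns; treat that choice, the window length and the variable names as assumptions.

```matlab
% Window functions written out as formulas (a sketch, not part of fp.m)
n = 512;                                     % window length in samples
k = (0:n-1)';                                % sample index as a column vector
w_rect = ones(n, 1);                         % rectangular window: all ones
w_hamm = 0.54 - 0.46 * cos(2*pi*k/(n-1));    % Hamming window
w_hann = 0.5  - 0.5  * cos(2*pi*k/(n-1));    % Hann window with zero end points
x = randn(n, 1);                             % stand-in data vector
xw = x .* w_hamm;                            % windowed signal, tapered at both ends
% plot(k, [w_rect w_hamm w_hann]);           % uncomment to compare the shapes
```

Tapering the ends of the segment reduces the spectral smearing that the abrupt edges of a rectangular window would introduce.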
Function M-file: fp.m
m=fft(x,n); % Fast Fourier Transform (fastest if n = power of 2)
no2=round(n/2); % n/2 samples: FFT is symmetrical
a=20*log10( abs ( m ) / n); % convert linear magnitude to dB
f=rate/n*(0:no2)/1000; % frequency scale in kHz: DC = 0 to fs/2
freq = f (1:no2); % retain only the first n/2 samples
amp = a (1:no2); % retain only the first n/2 samples
Function M-file: fp.m
% plot amplitude spectrum: frequency vs. amplitude
plot ( freq , amp ) ; % frequency = x-axis, amplitude=y-axis
axis( [ 0 rate/2000 -Inf Inf ] ) ; % axis range: [ xl xh yl yh ]
Exercise 1
• Annotate graph:
>> xlabel ( 'Frequency (kHz)' ); % x-axis label
>> ylabel ( 'Amplitude (dB)' ); % y-axis label
>> title ( filenames ( i , : ) ); % graph title
Turn off the axis labels by inserting an empty string:
>> ylabel ( ' ' ); % null axis label
Exercise: modify fp.m
% modify fp.m to compute phase spectrum
phase = unwrap ( angle (m) ) ;
p = phase * 180 / pi; % convert from radians to degrees
% plot phase spectrum: frequency vs. phase
plot ( freq , p (1:no2) ) ; % frequency = x-axis, phase=y-axis
axis( [ 0 rate/2000 -180 180 ] ) ;
% ****** End of function fp.m ******
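The radians-to-degrees conversion is easy to check on a signal whose phase spectrum is known in advance. The sketch below uses two four-sample impulses as test inputs (an assumption chosen for illustration): an impulse at the first sample has zero phase at every frequency, and delaying it by one sample produces a phase that falls linearly with frequency.

```matlab
% Phase spectrum in degrees for two short test signals
x1 = [1; 0; 0; 0];                      % unit impulse at the first sample
ph1_deg = angle(fft(x1)) * 180 / pi;    % radians to degrees: all zeros
x2 = [0; 1; 0; 0];                      % same impulse delayed by one sample
ph2_deg = angle(fft(x2)) * 180 / pi;    % second bin is -90 degrees
disp([ph1_deg ph2_deg]);
```

Multiplying by 180/pi is the step that maps the range of angle (plus or minus pi radians) onto plus or minus 180 degrees.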
Annotations
>> xlabel ( 'Frequency (kHz)' ); % x-axis label
>> ylabel ( 'Amplitude (dB)' ); % y-axis label
>> title ( filenames ( i , : ) ); % graph title
Modifying axes properties
• Modify default axes properties:
>> gca % “get current axes” = axes handle
>> set ( gca, 'XLim', [ 0 4 ] ); % x-axis range
>> set ( gca, 'YLim', [ -20 40 ] ); % y-axis range
>> set ( gca, 'TickDir', 'Out' ); % tick mark dir
[Figure: amplitude spectrum (amplitude in dB vs. frequency in kHz) and phase spectrum (phase in degrees vs. frequency in kHz)]
Speech spectrograms in Matlab
» help specgram
SPECGRAM Calculate spectrogram from signal.
B = SPECGRAM(A,NFFT,Fs,WINDOW,NOVERLAP) calculates the spectrogram for the signal in vector A. SPECGRAM splits the signal into overlapping segments, windows each with the WINDOW vector and forms the columns of B with their zero-padded, length NFFT discrete Fourier transforms.
Speech spectrograms in Matlab
» help sp
sp: create gray-scale spectrogram
Usage: h=sp(wave,rate,nfft,nsampf,nhop,pre,drng);
wave: input waveform
rate: sample rate in Hz (default 8000 Hz)
nfft: FFT window length (default: 256 samples)
nsampf: number of samples per frame (default: 60)
nhop: number of samples to hop to next frame (default: 5 samples)
pre: preemphasis factor (0-1) (default: 1)
drng: dynamic range in dB (default: 80)
title: title for graph (default: none)
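The core of a spectrogram routine is a loop that windows successive frames and takes a zero-padded FFT of each one. The sketch below illustrates that idea only; it is not the course's sp.m and does not reproduce its defaults, preemphasis or dynamic range handling. The test signal (a rising chirp), frame length, hop size and variable names are all assumptions.

```matlab
% Minimal short-time spectrum computation (a sketch, not the course's sp.m)
fs = 8000;                              % assumed sample rate in Hz
t = (0:fs-1)' / fs;                     % one second of signal
y = sin(2*pi*(500*t + 750*t.^2));       % test chirp rising from 500 Hz to 2000 Hz
nfft = 256;                             % FFT length
nwin = 64;                              % samples per frame
nhop = 16;                              % samples to hop to the next frame
w = 0.54 - 0.46*cos(2*pi*(0:nwin-1)'/(nwin-1));    % Hamming window
nframes = floor((length(y) - nwin) / nhop) + 1;    % number of whole frames
B = zeros(nfft/2 + 1, nframes);         % one column of dB values per frame
for k = 1:nframes
    idx = (k-1)*nhop + (1:nwin)';       % sample indices for this frame
    X = fft(y(idx) .* w, nfft);         % zero-padded FFT of the windowed frame
    B(:, k) = 20*log10(abs(X(1:nfft/2+1)) + eps);  % keep 0 Hz to fs/2, in dB
end
[~, k1] = max(B(:, 1));                 % peak bin in the first frame
[~, k2] = max(B(:, end));               % peak bin in the last frame
% imagesc(B); axis xy; colormap(gray);  % uncomment to display the result
```

Short frames with small hops give good time resolution at the cost of frequency resolution, which is the usual trade-off behind wide-band speech spectrograms.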
Making spectrograms
>> load wheel % Load pre-recorded waveform
>> sp (wheel, 8000); % Use defaults for other variables
>> colormap(hot); % determines image color scheme
>> axis tight; % extends plot to axis limits
Making spectrograms
[Figure: spectrogram of “hod”; frequency (kHz) vs. time (ms)]
TrackDraw: a graphical speech synthesizer
TrackDraw program
Provides a graphical interface for controlling a
speech synthesizer (cascade formant synthesis,
Klatt, 1980)
Allows for successive iterations of hand-tracking,
synthesizing and listening to the results
Assmann, P., Ballard, W., Bornstein, L., and
Paschall, D. (1994). Track-Draw: A graphical
interface for controlling the parameters of a
speech synthesizer. Behavior Research Methods,
Instruments and Computers 26, 431-436.
Using TrackDraw
» load wheel
» y=wheel;
» specsynth;
The “Spectral Slice” Display
Fundamental Frequency (F0) window
Amplitude of voicing (AV) window
TrackDraw: finished tracks
Saving, printing and re-loading “tracks”
>> specsynth; % when finished tracking click on exit button
>> savetr % save tracks in file; enter name xxheedtr
% savetr will append the .mat extension
>> load xxheedtr.mat % To re-load track files and run statistics
>> plottracks
>> print -Pljhd