LAB 2
Machine Perception of Music
Computer Science 395, Winter Quarter 2005
1.0 Lab overview and objectives
This lab will introduce you to displaying and analyzing sounds with spectrograms, with
an emphasis on getting a feel for the relationship between harmonicity, pitch, and
frequency.
Lab due time/date: 2PM, January 24, 2005
What to hand in: A text file containing short answers (at least a sentence, no more than
a paragraph, except where noted) to each question in each section of this lab.
Format of required submission: an email attachment to the course email address
([email protected]). DON’T “ZIP” YOUR ATTACHMENT!
For some reason, the mail server is giving us problems with zipped attachments.
2.0 The spectrogram
The MATLAB Signal Processing Toolbox provides a function, specgram, that returns the
time-dependent Fourier transform for a sequence, or displays this information as a
spectrogram. For your convenience, I have appended the MATLAB help page for
specgram to the end of this document. You will need to consult this page.
The time-dependent Fourier transform is the discrete-time Fourier transform for a
sequence, computed using a sliding window. This form of the Fourier transform, also
known as the short-time Fourier transform (STFT), has numerous applications in speech,
sonar, and radar processing. You will explore some of these applications in this lab.
The specgram function calculates the spectrogram for a given signal as follows (yes, this
is drawn from the help page):
1. It splits the signal into overlapping sections and applies the window specified by
the window parameter to each section.
2. It computes the discrete-time Fourier transform of each section with a length nfft
FFT to produce an estimate of the short-term frequency content of the signal;
these transforms make up the columns of B (see the MATLAB help page for
specgram). The quantity (length(window) - numoverlap) specifies by how many
samples specgram shifts the window.
3. For real input, specgram truncates the spectrogram to the first nfft/2 + 1 points for
nfft even and (nfft + 1)/2 for nfft odd. All audio signals are real (as opposed to
complex, with real and imaginary parts).
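The three steps above can be sketched by hand with nothing but the core fft function. This is only an illustration of the idea, not the toolbox's actual code: the test tone, the parameter values, all variable names, and the hand-written Hann window are assumptions made for the sketch.

```matlab
% Sketch of the three specgram steps using only core MATLAB (fft).
% All names and parameter values here are illustrative assumptions.
Fs = 8000;                          % sample rate in Hz
t = (0:Fs/2 - 1)' / Fs;             % 0.5 seconds of sample times
x = sin(2*pi*440*t);                % 440 Hz test tone (stand-in signal)

nfft = 256;                         % FFT length
winLen = 256;                       % window length (must be <= nfft)
numoverlap = 128;                   % samples shared by adjacent sections
% Periodic Hann window written out by hand so no toolbox is needed
win = 0.5 * (1 - cos(2*pi*(0:winLen-1)' / winLen));

hop = winLen - numoverlap;          % shift between sections, in samples
numCols = fix((length(x) - numoverlap) / hop);
B = zeros(nfft/2 + 1, numCols);     % nfft is even, so nfft/2 + 1 rows

for k = 1:numCols
    start = (k - 1) * hop + 1;
    section = x(start:start + winLen - 1) .* win;  % step 1: section and window
    X = fft(section, nfft);                         % step 2: length-nfft FFT
    B(:, k) = X(1:nfft/2 + 1);                      % step 3: keep nonnegative frequencies
end

f = (0:nfft/2)' * Fs / nfft;        % frequency of each row, in Hz
[~, peakRow] = max(mean(abs(B), 2));
fprintf('%d columns, strongest row at %.2f Hz\n', numCols, f(peakRow));
```

Each column of B here is one windowed section, which is why a smaller shift (more overlap) produces more columns for the same signal.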
As a first step to using specgram, copy the following into a simple MATLAB file. It
doesn’t have to be a function; just save this as a .m file, put it on the MATLAB path, and
then run the file by typing its name (without the .m extension).
% first, make the pitch using the function you made in lab 1
duration = 0.5;      % the duration, in seconds, of the sound
Fs = 8000;           % the sample frequency, in Hz, of the sound
soundFreq = 440;     % the frequency, in Hz, of the pitch
y = makepitch(duration, Fs, soundFreq);
% now create a spectrogram and display it
numberOfFrequencyBins = 500;   % passed to specgram as the FFT length (nfft)
windowFunction = hanning(numberOfFrequencyBins);
specgram(y, numberOfFrequencyBins, Fs, windowFunction);
When you run the file, you will see an image like the one below. In this image, the
frequency of the signal is indicated by the strong red line.
[Figure: spectrogram of the 440 Hz tone. Horizontal axis: Time, 0 to 0.45 seconds. Vertical axis: Frequency, 0 to 4000 Hz.]
MATLAB has a number of “built in” audio signals that you can play with. Load the
“laughter” signal by typing the following:
load laughter
Try playing it and displaying it with the spectrogram.
Once you have done that, try loading the “chirp” signal in the same way, and seeing what
its spectrogram looks like.
The spectrogram shows the estimate of the relative amplitudes of a set of sinusoids used
to approximate the wave form analyzed. In order to see the information about the set of
sinusoids used to approximate the wave, add the following to your .m file. (Note: the
“...”, typed as three periods, indicates a line continuation.)
% now get the values corresponding to the image
[B,frequencies,times] = specgram(y, numberOfFrequencyBins, Fs, ...
    windowFunction);
amps = 20*log10(abs(B));   % magnitude of each entry of B, in decibels
Once you run the file, you will have four new MATLAB variables in your environment:
B, frequencies, times, and amps.
The variable “B” contains a two dimensional array of complex values, each of which has
a real and an imaginary part. The value B(i,j) contains phase and amplitude information
for the ith sinusoid in the jth window (frequency runs down the rows and time runs across
the columns). Typically, we are interested only in the amplitude, that is, the magnitude
of each complex value in B, and not the phase. This information is contained in “amps,”
expressed in decibels.
The variable “frequencies” contains the frequencies of the sinusoids used to analyze the
sound, and frequencies(i) contains the frequency of the ith sinusoid. Similarly, “times”
contains the start times of the analysis windows. Thus, times(j) gives the time of the jth
window.
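As a rough illustration of how the frequency vector can be searched, the sketch below finds the analysis frequency closest to a frequency of interest. It builds its own frequency vector, on the assumption that for real input and even nfft the analysis frequencies run from 0 to Fs/2 in equal steps. The values of nfft, Fs, and targetFreq are examples only and are deliberately not the settings used in the lab script.

```matlab
% Sketch: locate the analysis frequency nearest a target frequency.
% nfft, Fs, and targetFreq are illustrative values, not the lab's settings.
Fs = 8000;                               % sample rate in Hz
nfft = 256;                              % even FFT length
frequencies = (0:nfft/2)' * Fs / nfft;   % assumed layout for real input
targetFreq = 440;                        % frequency of interest in Hz

[offset, idx] = min(abs(frequencies - targetFreq));
fprintf('Nearest analysis frequency: %.2f Hz (index %d, off by %.2f Hz)\n', ...
        frequencies(idx), idx, offset);
```

With the variables from your own script, the same min(abs(...)) search works directly on “frequencies.” Because the appended help page states that frequency increases down the rows of B, amps(idx,:) would then trace that frequency's amplitude over time.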
QUESTION 2.1 If you look at the contents of “frequencies,” you will notice that 440Hz
is NOT among the frequencies used to analyze the sound you created. The spacing
between the analysis frequencies determines the frequency resolution of the spectrogram.
What is the frequency resolution of the spectrogram?
QUESTION 2.2 Modify the number of frequency bins by modifying the MATLAB
script you were given. Include the modified script as your answer to this question.
QUESTION 2.3 What number of bins seems sufficient to determine the frequency of the
signal within 2 Hz?
3.0 Window functions
The code for spectrogram generation uses a window function. You can plot the window
function by typing the following:
plot(windowFunction)
QUESTION 3.1 What shape is this window function?
QUESTION 3.2 There are a number of other window functions that could be used. The
most obvious is the “boxcar” window. In your MATLAB script, replace “hanning” with
“boxcar.” Now run the script again. What happens to the spectrogram?
QUESTION 3.3 Does the spectrogram using the “boxcar” window look more or less
accurate than the one with the “hanning” window?
QUESTION 3.4 What does the boxcar window function look like?
QUESTION 3.5 Now replace “boxcar” with “triang” and re-run your script. What
happens to the spectrogram? What does the “triang” window function look like?
QUESTION 3.6 Which window function seemed to produce the best result? Which
produced the worst?
4.0 Pitch and the missing fundamental frequency
Create a signal of frequency 262 Hz using the “makepitch” function. Now listen to it,
using soundsc. You should hear Middle C on the piano. Display it with the spectrogram
function, to double check that you have created a signal of the frequency you expect.
Now create two new signals at the frequencies 262*3 = 786 Hz and 262*5 = 1310 Hz. These
are multiples of 262, but not related by a power of two to the frequency 262. Thus, they
are harmonics of Middle C, but are NOT frequencies associated with the pitch class of C.
Play each of these harmonics individually, using soundsc.
Now, create a composite signal by adding the two harmonics together and play the
resulting signal. Compare its pitch to the pitch for a sine wave of 262 Hz. Compare the
pitch of the composite signal to the pitch of each of the signals used to create it.
QUESTION 4.1 Do you hear a single pitch?
QUESTION 4.2 Do you hear a pitch that is not contained in either of the harmonics you
just added together?
QUESTION 4.3 If you do hear an additional pitch, is this pitch higher or lower than the
786 Hz tone?
QUESTION 4.4 Look at the composite sound with a spectrogram. How many
frequencies with strong components are displayed in the spectrogram?
QUESTION 4.5 If you add additional harmonics, what happens to the pitch? (Don’t
guess, TRY it)
5.0 Non harmonic tones
Harmonic sounds are generally considered to be sounds whose primary frequency
components are all integer multiples of a fundamental frequency within the range of
human hearing (20 to 20,000 Hz). Create several sinusoids, using makepitch, at the
following frequencies: 262, 500, 607, 714. Add these together and listen to the result.
QUESTION 5.1 Does the resulting sound seem to have a single pitch?
Now create a random signal (also known as “white noise”) by using the rand function.
y = rand(8000,1);
QUESTION 5.2 Play the “rand” signal. What sounds does it remind you of?
QUESTION 5.3 Does the “rand” signal sound like it has a pitch?
QUESTION 5.4 Now display the “rand” signal with a spectrogram. What does the
spectrogram look like? Describe what you see.
QUESTION 5.5 Now display the “laughter” signal from section 2. What does the
spectrogram look like?
QUESTION 5.6 Does the “laughter” signal sound like it has a pitch?
6.0 Finding the pitch
This section requires several .wav files, downloaded from the web. Go to the course
home page and click on the link labeled “Lab 2 audio files” to reach these files. These
will be made available on Tuesday, January 18, 2005.
QUESTION 6.1 Sounds that have a pitch are harmonic sounds. Harmonic sounds consist
primarily of energy concentrated at frequencies that are simple integer multiples of a
fundamental frequency, f, in the range of human hearing. Write out a series of
instructions to explain to another person how to determine whether a sound is harmonic
or not. Assume they know what a spectrogram is. This explanation should be fairly
detailed, taking about ½ a page.
QUESTION 6.2 For each of the Lab 2 audio files, say whether or not it is a harmonic
sound. Explain how you can determine whether each sound is harmonic, using the
approach described in QUESTION 6.1.
QUESTION 6.3 One method of estimating the fundamental frequency of a sound is to
look at the rate of repetition of the waveform. Another is to look at the spacing (in the
frequency domain) between harmonics to infer the frequency of the fundamental.
Assuming a sound is harmonic, write out a series of instructions to explain to another
person how to determine the fundamental frequency of the sound, based on one of these
two methods (or a third basic approach, if you come up with one). This should be as
detailed as the answer to QUESTION 6.1.
QUESTION 6.4 What are the strengths and weaknesses of the method you describe in
the answer to QUESTION 6.3? In other words: What kind of sound that has a pitch
would cause your system to fail?
QUESTION 6.5 Find and report the fundamental frequency (or frequencies) for each
harmonic sound in the Lab 2 audio files using the method you describe in the answer to
QUESTION 6.3.
QUESTION 6.6 Given your estimate from QUESTION 6.5, what is the distance from
A=440, in semitones, of each harmonic sound? If a sound has multiple fundamental
frequencies, calculate the distance for each fundamental frequency.
Now that you know how many semitones each sound is from A=440, you can determine
the pitch class. The wheel on the following page will help you determine this. It shows
the numerical name assigned to each pitch class by music theorists. It also shows the
common letter names given to each pitch class. Simply start at your reference pitch class
and count around the wheel the appropriate number of semitones. Each step around the
wheel is a semitone. If your pitch is higher than the reference pitch, count clockwise
around the wheel. If the pitch is lower than the reference, go counter-clockwise around
the wheel.
QUESTION 6.7 For each pitch calculated in QUESTION 6.6, give the pitch class (use
the letter name).
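The counting procedure described above can also be written as arithmetic. The sketch below assumes equal temperament (12 semitones per octave, so each semitone is a frequency ratio of 2^(1/12)) and uses the wheel's numbering, in which A is pitch class 9. The frequencies in freqs are example values only, not measurements from the lab audio files, and all variable names are assumptions.

```matlab
% Sketch: semitone distance from A = 440 Hz and the resulting pitch class.
% Assumes equal temperament (12 semitones per octave); freqs holds example values only.
refFreq = 440;                          % reference pitch A, in Hz
refClass = 9;                           % A is pitch class 9 on the wheel
freqs = [262 440 880];                  % example fundamentals, in Hz
names = {'C','Db/C#','D','Eb/D#','E','F','Gb/F#','G','Ab/G#','A','Bb/A#','B'};

semitones = round(12 * log2(freqs / refFreq));   % positive means above A
pitchClass = mod(refClass + semitones, 12);      % counting around the wheel

for k = 1:numel(freqs)
    fprintf('%7.1f Hz: %+d semitones, pitch class %d (%s)\n', ...
            freqs(k), semitones(k), pitchClass(k), names{pitchClass(k) + 1});
end
```

Rounding to the nearest whole semitone is a design choice in this sketch. A measured fundamental that falls between two semitones may deserve a closer look before you commit to a letter name.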
[Figure: PITCH & PITCH CLASS wheel. Higher = clockwise. Every “G” has the same pitch class.]
Pitch classes, listed clockwise around the wheel:
0 C
1 Db/C#
2 D
3 Eb/D#
4 E
5 F
6 Gb/F#
7 G
8 Ab/G#
9 A
10 Bb/A#
11 B
Signal Processing Toolbox
specgram
Time-dependent frequency analysis (spectrogram)
Syntax
B = specgram(a)
B = specgram(a,nfft)
[B,f] = specgram(a,nfft,fs)
[B,f,t] = specgram(a,nfft,fs)
B = specgram(a,nfft,fs,window)
B = specgram(a,nfft,fs,window,numoverlap)
specgram(a)
B = specgram(a,f,fs,window,numoverlap)
Description
specgram computes the windowed discrete-time Fourier transform of a signal using a
sliding window. The spectrogram is the magnitude of this function.
B = specgram(a) calculates the windowed discrete-time Fourier transform for the signal
in vector a. This syntax uses the default values:
nfft = min(256,length(a))
fs = 2
window is a periodic Hann (Hanning) window of length nfft.
numoverlap = length(window)/2
nfft specifies the FFT length that specgram uses. This value determines the frequencies
at which the discrete-time Fourier transform is computed. fs is a scalar that specifies the
sampling frequency. window specifies a windowing function and the number of samples
specgram uses in its sectioning of vector a. numoverlap is the number of samples by
which the sections overlap. Any arguments that you omit from the end of the input
parameter list use the default values shown above.
If a is real, specgram computes the discrete-time Fourier transform at positive frequencies
only. If nfft is even, specgram returns nfft/2+1 rows (including the zero and Nyquist
frequency terms). If nfft is odd, specgram returns (nfft+1)/2 rows. The number of columns in B
is
k = fix((n-numoverlap)/(length(window)-numoverlap))
where n is the length of a.
If a is complex, specgram computes the two-sided discrete-time Fourier transform and
returns a frequency vector of range 0 to fs. In this case, B is a complex matrix with nfft
rows. Time increases linearly across the columns of B, starting with sample 1 in column 1.
Frequency increases linearly down the rows, starting at 0.
B = specgram(a,nfft) uses the specified FFT length nfft in its calculations.
[B,f] = specgram(a,nfft,fs) returns a vector f of frequencies at which the function
computes the discrete-time Fourier transform. fs has no effect on the output B; it is a
frequency scaling multiplier.
[B,f,t] = specgram(a,nfft,fs) returns frequency and time vectors f and t
respectively. t is a column vector of scaled times, with length equal to the number of
columns of B. t(j) is the earliest time at which the jth window intersects a. t(1) is
always equal to 0.
B = specgram(a,nfft,fs,window) specifies a windowing function and the number of
samples per section of the a vector. If you supply a scalar for window, specgram uses a
Hann window of that length. The length of the window must be less than or equal to nfft.
B = specgram(a,nfft,fs,window,numoverlap) overlaps the sections of a by
numoverlap samples.
You can use the empty matrix [] to specify the default value for any input argument. For
example,
B = specgram(x,[],10000)
is equivalent to
B = specgram(x)
but with a sampling frequency of 10,000 Hz instead of the default 2 Hz.
specgram(...) with no output arguments displays the scaled logarithm of the
spectrogram in the current figure window using
imagesc(t,f,20 *log10(abs(b))), axis xy, colormap(jet)
The axis xy mode displays the low-frequency content of the first portion of the signal in the
lower-left corner of the axes. specgram uses fs to label the axes according to true time
and frequency.
B = specgram(a,f,fs,window,numoverlap) computes the spectrogram at the
frequencies specified in f, using either the chirp z-transform (for more than 20 evenly
spaced frequencies) or a polyphase decimation filter bank. f is a vector of frequencies in
hertz; it must have at least two elements.
Examples
Display the spectrogram of a digitized speech signal:
load mtlb
specgram(mtlb,512,Fs,kaiser(500,5),475)
title('Spectrogram')
Note
You can view and manipulate a similar spectrogram using the Signal
Processing Toolbox specgramdemo.
Algorithm
specgram calculates the spectrogram for a given signal as follows:
1. It splits the signal into overlapping sections and applies the window specified by the
window parameter to each section.
2. It computes the discrete-time Fourier transform of each section with a length nfft
FFT to produce an estimate of the short-term frequency content of the signal; these
transforms make up the columns of B. The quantity
(length(window) - numoverlap) specifies by how many samples specgram
shifts the window.
3. For real input, specgram truncates the spectrogram to the first nfft/2 + 1 points
for nfft even and (nfft + 1)/2 for nfft odd.
Diagnostics
An appropriate diagnostic message is displayed when incorrect arguments are used:
Requires window's length to be no greater than the FFT length.
Requires NOVERLAP to be strictly less than the window length.
Requires positive integer values for NFFT and NOVERLAP.
Requires vector input.
See Also
mscohere, cpsd, pwelch, tfestimate