LAB 2 Machine Perception of Music Computer Science 395, Winter Quarter 2005

1.0 Lab overview and objectives

This lab will introduce you to displaying and analyzing sounds with spectrograms, with an emphasis on getting a feel for the relationship between harmonicity, pitch, and frequency.

Lab due time/date: 2PM, January 24, 2005

What to hand in: A text file containing short answers (at least a sentence, no more than a paragraph, except where noted) to each question in each section of this lab.

Format of required submission: an email attachment to the course email address ([email protected]). DON’T “ZIP” YOUR ATTACHMENT! For some reason, the mail server is giving us problems with zipped attachments.

2.0 The spectrogram

The MATLAB Signal Processing Toolbox provides a function, specgram, that returns the time-dependent Fourier transform for a sequence, or displays this information as a spectrogram. For your convenience, I have appended the MATLAB help page for specgram to the end of this document. You will need to consult this page.

The time-dependent Fourier transform is the discrete-time Fourier transform for a sequence, computed using a sliding window. This form of the Fourier transform, also known as the short-time Fourier transform (STFT), has numerous applications in speech, sonar, and radar processing. You will explore some of these applications in this lab.

The specgram function calculates the spectrogram for a given signal as follows (yes, this is drawn from the help page):

1. It splits the signal into overlapping sections and applies the window specified by the window parameter to each section.
2. It computes the discrete-time Fourier transform of each section with a length nfft FFT to produce an estimate of the short-term frequency content of the signal; these transforms make up the columns of B (see the MATLAB help page for specgram). The quantity (length(window) - numoverlap) specifies by how many samples specgram shifts the window.
3.
For real input, specgram truncates the spectrogram to the first nfft/2 + 1 points for nfft even and (nfft + 1)/2 for nfft odd. All audio signals are real-valued (as opposed to complex).

As a first step to using specgram, copy the following into a simple MATLAB file. It doesn’t have to be a function; just save it as a .m file, put it on the MATLAB path, and then run the file by typing its name (without the .m extension).

    % first, make the pitch using the function you made in lab 1
    duration = 0.5;      % the duration, in seconds, of the sound
    Fs = 8000;           % the sample frequency, in Hz, of the sound
    soundFreq = 440;     % the frequency, in Hz, of the pitch
    y = makepitch(duration, Fs, soundFreq);

    % now create a spectrogram and display it
    numberOfFrequencyBins = 500;
    windowFunction = hanning(numberOfFrequencyBins);
    specgram(y, numberOfFrequencyBins, Fs, windowFunction);

When you run the file, you will see an image like the one below. In this image, the frequency of the signal is indicated by the strong red line.

[Figure: spectrogram of the 440 Hz tone. Horizontal axis: Time, 0 to 0.45 seconds. Vertical axis: Frequency, 0 to 4000 Hz.]

MATLAB has a number of “built in” audio signals that you can play with. Load the “laughter” signal by typing the following:

    load laughter

Try playing it and displaying it with the spectrogram. Once you have done that, try loading the “chirp” signal in the same way, and see what its spectrogram looks like.

The spectrogram shows the estimate of the relative amplitudes of a set of sinusoids used to approximate the waveform analyzed. In order to see the information about the set of sinusoids used to approximate the wave, add the following to your .m file.
(Note: the “...” indicates a line continuation)

    % now get the values corresponding to the image
    [B,frequencies,times] = specgram(y, numberOfFrequencyBins, Fs, ...
        windowFunction);
    amps = 20*log10(abs(B));

Once you run the file, you will have four new MATLAB variables in your environment: B, frequencies, times, and amps. The variable “B” contains a two-dimensional array of complex values, each of which has a real and an imaginary part. Following the help page, each column of B holds the transform of one window and each row corresponds to one analysis frequency, so the value B(i,j) contains phase and amplitude information for the ith sinusoid in the jth window. For our purposes, we will ignore the phase. Typically, we are interested only in the magnitude of the values in B. This information, expressed in decibels, is contained in “amps.”

The variable “frequencies” contains the frequencies of the sinusoids used to analyze the sound, and frequencies(i) contains the frequency of the ith sinusoid. Similarly, “times” contains the start times of the analysis windows (see the description of t in the help page). Thus, times(j) gives the time of the jth window.

QUESTION 2.1 If you look at the contents of “frequencies,” you will notice that 440 Hz is NOT among the frequencies used to analyze the sound you created. The spacing between the analysis frequencies determines the frequency resolution of the spectrogram. What is the frequency resolution of the spectrogram?

QUESTION 2.2 Modify the number of frequency bins by modifying the MATLAB script you were given. Include the modified script as your answer to this question.

QUESTION 2.3 What number of bins seems sufficient to determine the frequency of the signal within 2 Hz?

3.0 Window functions

The code for spectrogram generation uses a window function. You can plot the window function by typing the following:

    plot(windowFunction)

QUESTION 3.1 What shape is this window function?

QUESTION 3.2 There are a number of other window functions that could be used. The most obvious is the “boxcar” window.
In your MATLAB script, replace “hanning” with “boxcar.” Now run the script again. What happens to the spectrogram?

QUESTION 3.3 Does the spectrogram using the “boxcar” window look more or less accurate than the one with the “hanning” window?

QUESTION 3.4 What does the boxcar window function look like?

QUESTION 3.5 Now replace “boxcar” with “triang” and re-run your script. What happens to the spectrogram? What does the “triang” window function look like?

QUESTION 3.6 Which window function seemed to produce the best result? Which produced the worst?

4.0 Pitch and the missing fundamental frequency

Create a signal of frequency 262 Hz using the “makepitch” function. Now listen to it, using soundsc. You should hear Middle C on the piano. Display it with the spectrogram function, to double check that you have created a signal of the frequency you expect.

Now create two new signals at the frequencies 262*3 = 786 Hz and 262*5 = 1310 Hz. These are multiples of 262, but not related by a power of two to the frequency 262. Thus, they are harmonics of Middle C, but are NOT frequencies associated with the pitch class of C.

Play each of these harmonics individually, using soundsc. Now, create a composite signal by adding the two harmonics together and play the resulting signal. Compare its pitch to the pitch for a sine wave of 262 Hz. Compare the pitch of the composite signal to the pitch of each of the signals used to create it.

QUESTION 4.1 Do you hear a single pitch?

QUESTION 4.2 Do you hear a pitch that is not contained in either of the harmonics you just added together?

QUESTION 4.3 If you do hear an additional pitch, is this pitch higher or lower than the 786 Hz tone?

QUESTION 4.4 Look at the composite sound with a spectrogram. How many frequencies with strong components are displayed in the spectrogram?

QUESTION 4.5 If you add additional harmonics, what happens to the pitch?
(Don’t guess, TRY it)

5.0 Non-harmonic tones

Harmonic sounds are generally considered to be sounds whose primary frequency components are all integer multiples of a fundamental frequency within the range of human hearing (20 to 20,000 Hz).

Create several sinusoids, using makepitch, at the following frequencies: 262, 500, 607, and 714 Hz. Add these together and listen to the result.

QUESTION 5.1 Does the resulting sound seem to have a single pitch?

Now create a random signal (also known as “white noise”) by using the rand function.

    y = rand(8000,1);

QUESTION 5.2 Play the “rand” signal. What sounds does it remind you of?

QUESTION 5.3 Does the “rand” signal sound like it has a pitch?

QUESTION 5.4 Now display the “rand” signal with a spectrogram. What does the spectrogram look like? Describe what you see.

QUESTION 5.5 Now display the “laughter” signal from section 2. What does the spectrogram look like?

QUESTION 5.6 Does the “laughter” signal sound like it has a pitch?

6.0 Finding the pitch

This section requires several .wav files, downloaded from the web. Go to the course home page and click on the link labeled “Lab 2 audio files” to reach these files. These will be made available on Tuesday, January 18, 2005.

QUESTION 6.1 Sounds that have a pitch are harmonic sounds. Harmonic sounds consist primarily of energy concentrated at frequencies that are simple integer multiples of a fundamental frequency, f, in the range of human hearing. Write out a series of instructions to explain to another person how to determine whether a sound is harmonic or not. Assume they know what a spectrogram is. This explanation should be fairly detailed, taking about half a page.

QUESTION 6.2 For each of the Lab 2 audio files, say whether or not it is a harmonic sound. Explain how you can determine whether each sound is harmonic, using the approach described in QUESTION 6.1.
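As a rough numerical illustration of the integer-multiple definition used in sections 4 and 5, the MATLAB sketch below checks whether a list of component frequencies sits close to integer multiples of a candidate fundamental. The name harmonicError, the 1 percent tolerance, and the choice to supply the candidate fundamental by hand are all assumptions made for this example. It is not a substitute for the spectrogram-based procedure the questions ask you to describe.

```matlab
f0  = 262;     % candidate fundamental, in Hz (Middle C from section 4)
tol = 0.01;    % assumed tolerance: within 1 percent of a harmonic

% Hypothetical helper: relative distance of each frequency from the
% nearest integer multiple of f0 (the multiple is forced to be >= 1).
harmonicError = @(freqs) abs(freqs/f0 - max(round(freqs/f0), 1)) ...
                         ./ max(round(freqs/f0), 1);

section4 = [786 1310];          % the two harmonics added in section 4
section5 = [262 500 607 714];   % the sinusoids added in section 5

% A set counts as harmonic here only if every component passes the test.
isHarmonic4 = all(harmonicError(section4) <= tol);
isHarmonic5 = all(harmonicError(section5) <= tol);
```

With real recordings the component frequencies would first have to be read off a spectrogram, and the tolerance would need to reflect the frequency resolution of that spectrogram.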
QUESTION 6.3 One method of estimating the fundamental frequency of a sound is to look at the rate of repetition of the waveform. Another is to look at the spacing (in the frequency domain) between harmonics to infer the frequency of the fundamental. Assuming a sound is harmonic, write out a series of instructions to explain to another person how to determine the fundamental frequency of the sound, based on one of these two methods (or a third basic approach, if you come up with one). This should be as detailed as the answer to QUESTION 6.1.

QUESTION 6.4 What are the strengths and weaknesses of the method you describe in the answer to QUESTION 6.3? In other words: What kind of sound that has a pitch would cause your system to fail?

QUESTION 6.5 Find and report the fundamental frequency (or frequencies) for each harmonic sound in the Lab 2 audio files using the method you describe in the answer to QUESTION 6.3.

QUESTION 6.6 Given your estimate from QUESTION 6.5, what is the distance from A=440, in semitones, of each harmonic sound? If a sound has multiple fundamental frequencies, calculate the distance for each fundamental frequency.

Now that you know how many semitones each sound is from A=440, you can determine the pitch class. The wheel on the following page will help you determine this. It shows the numerical name assigned to each pitch class by music theorists. It also shows the common letter names given to each pitch class. Simply start at your reference pitch class and count around the wheel the appropriate number of semitones. Each step around the wheel is a semitone. If your pitch is higher than the reference pitch, count clockwise around the wheel. If the pitch is lower than the reference, go counter-clockwise around the wheel.

QUESTION 6.7 For each pitch calculated in QUESTION 6.6, give the pitch class (use the letter name).
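The counting procedure described above can also be done arithmetically. The MATLAB sketch below assumes twelve-tone equal temperament, where each semitone is a frequency ratio of 2^(1/12), so the distance from A=440 is 12*log2(f/440). It then steps around the wheel using the wheel's numbering (C = 0 through B = 11, with A = 9). The variable names are illustrative, and rounding to the nearest semitone is an assumption about how to treat slightly mistuned estimates.

```matlab
f = 262;    % example frequency, in Hz (the Middle C tone from section 4)

% Signed distance from A=440 in semitones (negative means below A=440),
% assuming equal temperament.
semitones = 12 * log2(f / 440);

% Letter names in wheel order, pitch class 0 (C) through 11 (B).
names = {'C','Db/C#','D','Eb/D#','E','F','Gb/F#','G','Ab/G#','A','Bb/A#','B'};

% Start at A (pitch class 9) and count the rounded number of semitones;
% mod wraps the count around the wheel in either direction.
pitchClass = mod(9 + round(semitones), 12);
letterName = names{pitchClass + 1};    % MATLAB indexing starts at 1
```

For the 262 Hz example the distance is about 9 semitones below A=440, which lands on pitch class 0, matching the statement in section 4 that 262 Hz is Middle C.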
[Figure: PITCH & PITCH CLASS wheel. Higher = clockwise. Every “G” has the same pitch class. Pitch classes in clockwise order: 0 C, 1 Db/C#, 2 D, 3 Eb/D#, 4 E, 5 F, 6 Gb/F#, 7 G, 8 Ab/G#, 9 A, 10 Bb/A#, 11 B.]

Signal Processing Toolbox

specgram

Time-dependent frequency analysis (spectrogram)

Syntax

    B = specgram(a)
    B = specgram(a,nfft)
    [B,f] = specgram(a,nfft,fs)
    [B,f,t] = specgram(a,nfft,fs)
    B = specgram(a,nfft,fs,window)
    B = specgram(a,nfft,fs,window,numoverlap)
    specgram(a)
    B = specgram(a,f,fs,window,numoverlap)

Description

specgram computes the windowed discrete-time Fourier transform of a signal using a sliding window. The spectrogram is the magnitude of this function.

B = specgram(a) calculates the windowed discrete-time Fourier transform for the signal in vector a. This syntax uses the default values:

    nfft = min(256,length(a))
    fs = 2
    window is a periodic Hann (Hanning) window of length nfft.
    numoverlap = length(window)/2

nfft specifies the FFT length that specgram uses. This value determines the frequencies at which the discrete-time Fourier transform is computed. fs is a scalar that specifies the sampling frequency. window specifies a windowing function and the number of samples specgram uses in its sectioning of vector a. numoverlap is the number of samples by which the sections overlap. Any arguments that you omit from the end of the input parameter list use the default values shown above.

If a is real, specgram computes the discrete-time Fourier transform at positive frequencies only. If nfft is even, specgram returns nfft/2+1 rows (including the zero and Nyquist frequency terms). If nfft is odd, specgram returns (nfft+1)/2 rows. The number of columns in B is

    k = fix((n-numoverlap)/(length(window)-numoverlap))

where n is the length of a.

If a is complex, specgram computes the two-sided discrete-time Fourier transform and returns a frequency vector of range 0 to fs. In this case, B is a complex matrix with nfft rows.
Time increases linearly across the columns of B, starting with sample 1 in column 1. Frequency increases linearly down the rows, starting at 0.

B = specgram(a,nfft) uses the specified FFT length nfft in its calculations.

[B,f] = specgram(a,nfft,fs) returns a vector f of frequencies at which the function computes the discrete-time Fourier transform. fs has no effect on the output B; it is a frequency scaling multiplier.

[B,f,t] = specgram(a,nfft,fs) returns frequency and time vectors f and t respectively. t is a column vector of scaled times, with length equal to the number of columns of B. t(j) is the earliest time at which the jth window intersects a. t(1) is always equal to 0.

B = specgram(a,nfft,fs,window) specifies a windowing function and the number of samples per section of the x vector. If you supply a scalar for window, specgram uses a Hann window of that length. The length of the window must be less than or equal to nfft.

B = specgram(a,nfft,fs,window,numoverlap) overlaps the sections of x by numoverlap samples.

You can use the empty matrix [] to specify the default value for any input argument. For example,

    B = specgram(x,[],10000)

is equivalent to

    B = specgram(x)

but with a sampling frequency of 10,000 Hz instead of the default 2 Hz.

specgram(...) with no output arguments displays the scaled logarithm of the spectrogram in the current figure window using

    imagesc(t,f,20*log10(abs(b))), axis xy, colormap(jet)

The axis xy mode displays the low-frequency content of the first portion of the signal in the lower-left corner of the axes. specgram uses fs to label the axes according to true time and frequency.

B = specgram(a,f,fs,window,numoverlap) computes the spectrogram at the frequencies specified in f, using either the chirp z-transform (for more than 20 evenly spaced frequencies) or a polyphase decimation filter bank. f is a vector of frequencies in hertz; it must have at least two elements.
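The row and column counts given in this description can be worked out by hand without calling specgram. The sketch below is not part of the original help page. It assumes a real input, an even nfft, and a frequency vector evenly spaced from 0 to fs/2 (consistent with the statement that the rows include the zero and Nyquist terms). The signal length of 10000 samples is an arbitrary value chosen for illustration; nfft = 256 and fs = 10000 come from the defaults and the example above.

```matlab
nfft = 256;      % default FFT length for a long signal
fs   = 10000;    % sampling frequency from the [] example, in Hz
n    = 10000;    % assumed signal length, in samples

% Rows of B for real input and even nfft: 0 Hz through the Nyquist frequency.
numRows    = nfft/2 + 1;
binSpacing = fs / nfft;                  % spacing between rows, in Hz
f = (0:numRows-1)' * binSpacing;         % assumed frequency vector

% Columns of B, using the formula from the description with the
% default window length and overlap.
windowLength = nfft;
numoverlap   = windowLength / 2;
numColumns   = fix((n - numoverlap) / (windowLength - numoverlap));
```

A larger nfft shrinks binSpacing, which is the trade-off explored in section 2 of the lab.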
Examples

Display the spectrogram of a digitized speech signal:

    load mtlb
    specgram(mtlb,512,Fs,kaiser(500,5),475)
    title('Spectrogram')

Note: You can view and manipulate a similar spectrogram using the Signal Processing Toolbox specgramdemo.

Algorithm

specgram calculates the spectrogram for a given signal as follows:

1. It splits the signal into overlapping sections and applies the window specified by the window parameter to each section.
2. It computes the discrete-time Fourier transform of each section with a length nfft FFT to produce an estimate of the short-term frequency content of the signal; these transforms make up the columns of B. The quantity (length(window) - numoverlap) specifies by how many samples specgram shifts the window.
3. For real input, specgram truncates the spectrogram to the first nfft/2 + 1 points for nfft even and (nfft + 1)/2 for nfft odd.

Diagnostics

An appropriate diagnostic message is displayed when incorrect arguments are used:

    Requires window's length to be no greater than the FFT length.
    Requires NOVERLAP to be strictly less than the window length.
    Requires positive integer values for NFFT and NOVERLAP.
    Requires vector input.

See Also

mscohere, cpsd, pwelch, tfestimate

References

[1] Oppenheim, A.V., and R.W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989, pp. 713-718.
[2] Rabiner, L.R., and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, NJ, 1978.