
SGN-24006 Analysis of Audio, Speech and Music Signals
Exercise 1
15.3.2017
Bonus points: By completing the math problems in advance (usually one per
exercise sheet) and by actively participating in the exercise sessions (Matlab
problems are solved during the session), you earn bonus points for the exam as
follows:
• 25% completed (6 exercise points) → 1 bonus point
• 50% completed (12 exercise points) → 2 bonus points
• 75% completed (18 exercise points) → 3 bonus points
During each exercise, you may get 4 exercise points. There will be six exercise
sessions in total. Three bonus points in the exam are worth one mark.
1 Math question (2 points)
Harmonic sounds (musical sounds) consist of frequency components at integer
multiples of the fundamental frequency of the sound:
$$f_h = hF, \tag{1}$$
where $F$ is the fundamental frequency and $h = 1, 2, \ldots$ is the harmonic index.
Let’s assume that you are analyzing such a harmonic sound using the constant-Q
transform (CQT) with B = 36 frequency bins per octave. Remember that
the frequencies of CQT bins are given by
$$f_k = f_{\min} \cdot 2^{k/B}, \tag{2}$$
where $k = 0, 1, 2, \ldots$ is the frequency bin index and $f_{\min}$ is the frequency of
the lowest bin.
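As a quick illustration of Eq. (2), the following minimal sketch lists a few CQT bin frequencies and checks that adjacent bins are separated by a constant frequency ratio. The value fmin = 55 Hz is only an assumption for illustration (the problem statement does not fix it), and the variable names are hypothetical.

```matlab
% Sketch: centre frequencies of CQT bins from Eq. (2).
% fmin = 55 Hz is an illustrative assumption, not given in the problem.
B    = 36;                 % bins per octave (from the problem statement)
fmin = 55;                 % Hz, assumed value for illustration only
k    = 0:(3*B);            % bin indices covering three octaves
fk   = fmin * 2.^(k / B);  % Eq. (2), evaluated element-wise

% Adjacent bins are separated by the constant ratio 2^(1/B),
% i.e. the bins are equally spaced on a logarithmic frequency axis.
ratio = fk(2:end) ./ fk(1:end-1);
fprintf('Ratio between adjacent bins: %.6f\n', ratio(1));
fprintf('First five bin frequencies (Hz): %s\n', mat2str(fk(1:5), 6));
```

Because the bin spacing is logarithmic, a fixed frequency ratio always corresponds to a fixed number of bins, regardless of where on the axis it occurs.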
You have computed the CQT spectrogram of a harmonic sound and by
visual inspection you have located the first partial (h = 1, the fundamental
component) to be at CQT bin index $k_F$.
Question: How many frequency bins above the fundamental component
is the second partial (2F)? So $k_{2F} = k_F + \ldots$? How about the third, fourth,
and the fifth component?
2 Matlab (1 point)
Download the CQT toolbox from http://www.cs.tut.fi/sgn/arg/CQT/
Create a subdirectory and unzip the toolbox in it. Note that the toolbox
contains a couple of example audio signals (wave files). Remember that you
can read wave files into Matlab with the command audioread (wavread in old
Matlab releases; wavread is no longer available in recent ones).
Note also that there are two example scripts that you can run directly,
DEMO.m and PITCHSHIFT.m.
Calculate the CQT of the signal fitzgerald.wav as follows:
[x,fs] = audioread('fitzgerald.wav');  % use wavread in old Matlab releases
% CQT parameters
fmin = 55;   % Hz
fmax = fs/2;
B = 48;      % frequency bins per octave
gamma = 0;   % value 0 corresponds to CQT
% Calculate CQT. 'rasterize', 'full' gives a time-frequency matrix
Xcq = cqt(x, B, fs, fmin, fmax, 'rasterize', 'full', 'gamma', gamma);
% Plot dB-magnitudes (flipud flips matrix upside-down)
imagesc(flipud(20*log10(abs(Xcq.c))))
Look at the CQT spectrogram of the signal. What do you notice at very
low frequencies?
In CQT, the Q-factor (ratio of bin frequency to its bandwidth) is the same
for all frequency bins. In human hearing, the Q-factors get smaller at low
frequencies (less frequency resolution than in CQT at low frequencies). See
Figure 3 in http://www.cs.tut.fi/sgn/arg/CQT/schoerkhuber-aes-2014.pdf
The CQT toolbox allows us to utilize the auditory Q-factors too, by
setting an appropriate value for the parameter gamma. More exactly, set:
$$\gamma = 229 \cdot \left(2^{1/B} - 2^{-1/B}\right)$$
and re-calculate the CQT using the Xcq = ... command shown above. Plot
the result using imagesc as shown above.
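To get a feel for what this value of gamma does, here is a minimal sketch that evaluates it for B = 48 and compares the resulting Q-factors with the constant-Q case. It assumes a bandwidth model of the form B_k = alpha * f_k + gamma with alpha = 2^(1/B) - 2^(-1/B), which is my reading of the paper linked above; treat that model, and the names alpha, Q_cqt and Q_smooth, as assumptions rather than the toolbox's actual internals.

```matlab
% Sketch: effect of gamma on the Q-factor (bin frequency / bandwidth).
% ASSUMPTION: bandwidth model Bk = alpha*fk + gamma; not taken from the
% toolbox code itself.
B     = 48;                          % bins per octave, as in the exercise
fmin  = 55;                          % Hz, as in the exercise
alpha = 2^(1/B) - 2^(-1/B);          % relative bandwidth term
gamma = 229 * alpha;                 % value suggested in the text

fk = fmin * 2.^((0:5*B) / B);        % bin frequencies over five octaves

Q_cqt    = fk ./ (alpha * fk);          % gamma = 0: same Q for every bin
Q_smooth = fk ./ (alpha * fk + gamma);  % gamma > 0: Q shrinks at low fk

fprintf('gamma = %.3f Hz\n', gamma);
fprintf('Q at %.0f Hz: constant-Q %.1f, smooth-Q %.1f\n', ...
        fk(1), Q_cqt(1), Q_smooth(1));
fprintf('Q at %.0f Hz: constant-Q %.1f, smooth-Q %.1f\n', ...
        fk(end), Q_cqt(end), Q_smooth(end));
```

Under this assumed model, gamma adds a fixed bandwidth offset in Hz, so its influence is large at low frequencies and fades as the bin frequency grows.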
Compare the resulting “smooth-Q” spectrogram with the CQT spectrogram that you calculated earlier, especially at low frequencies. What do you
notice? Note that the number of frequency bins is still constant, B bins per
octave, although the Q factors are now smoothly varying along with frequency.
3 Matlab (1 point)
Let’s do a simple pitch detection experiment. Let’s generate a “template” for a
harmonic sound by creating an all-zeros column vector (T = zeros(170,1)).
Then let’s set nonzero values at the bin positions of the first 10 harmonics:
T( 1+round( log2( 1:10) * B)) = 1./(1:10);
Above, we use 1/h as the amplitude value for the h-th partial. That roughly
mimics the typical shape of natural sound spectra.
Then let’s convolve the CQT spectrogram of the original signal with the template
using 2D convolution:
X = conv2( T(end:-1:1), (abs(Xcq.c)).^0.3);
where the template is reversed, because convolution reverses it back, and
cubic-root compression (element-wise power .^, here with exponent 0.3) is applied
to flatten out the formant structure of the voice a bit.
Use imagesc( flipud( X)) to visualize the result of the convolution.
Note that using the harmonic template also generates “subharmonics” in X
of the actual fundamental frequency (F/2, F/3, ...).
Finally, try to detect the strongest pitch value in each frame:
D = X;
for id = 1:size(D,2)
    [~, maxId] = max(D(:,id)); D(:,id) = 0; D(maxId,id) = 1;
end
imagesc( flipud( D))
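The steps above can be tried without the toolbox on a synthetic input. The sketch below is a toy example, not part of the exercise: it builds a hypothetical magnitude matrix S (standing in for abs(Xcq.c)) with ten harmonics of a known fundamental at bin k0, applies the same template convolution, and maps the row of the maximum back to a bin index. The sizes K and N and the value of k0 are arbitrary assumptions.

```matlab
% Toy check of the template-based pitch detection on synthetic data.
% S, K, N and k0 are hypothetical stand-ins for a real CQT magnitude matrix.
B    = 48;                               % bins per octave, as in the exercise
T    = zeros(170, 1);                    % harmonic template
hIdx = 1 + round(log2(1:10) * B);        % template rows of harmonics 1..10
T(hIdx) = 1 ./ (1:10);                   % 1/h amplitudes

K  = 300;                                % number of frequency bins (assumed)
N  = 5;                                  % number of time frames (assumed)
k0 = 40;                                 % true fundamental bin (assumed)
S  = zeros(K, N);
S(k0 + hIdx - 1, :) = repmat((1 ./ (1:10))', 1, N);  % harmonic comb

% Same operation as in the exercise: reversed template, compressed magnitudes.
X = conv2(T(end:-1:1), S .^ 0.3);

% conv2 returns the 'full' result, which has length(T)-1 extra rows, so the
% row of the maximum is offset by length(T)-1 relative to the bin index.
[~, maxId] = max(X, [], 1);
f0bin = maxId - (length(T) - 1);
disp(f0bin)
```

In this noise-free toy case the detected bin should coincide with k0 in every frame. With real signals the maximum can also land on a subharmonic or a higher partial, which is worth keeping in mind when looking at D.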