SGN-24006 Analysis of Audio, Speech and Music Signals
Exercise 1
15.3.2017

Bonus points: By completing the math problems in advance (usually one per exercise sheet) and by participating actively in the exercise sessions (Matlab problems are solved during the session), you get bonus points for the exam as follows:
• 25% completed (6 exercise points) → 1 bonus point
• 50% completed (12 exercise points) → 2 bonus points
• 75% completed (18 exercise points) → 3 bonus points
During each exercise, you may get 4 exercise points. There will be six exercise sessions in total. Three bonus points in the exam are worth one mark.

1 Math question (2 points)

Harmonic sounds (musical sounds) consist of frequency components at integer multiples of the fundamental frequency of the sound:

    $f_h = hF$,    (1)

where $F$ is the fundamental frequency and $h = 1, 2, \ldots$ is the harmonic index. Let’s assume that you are analyzing such a harmonic sound using the constant-Q transform (CQT) with $B = 36$ frequency bins per octave. Remember that the frequencies of the CQT bins are given by

    $f_k = f_{\min} \cdot 2^{k/B}$,    (2)

where $k = 0, 1, 2, \ldots$ is the frequency bin index and $f_{\min}$ is the frequency of the lowest bin. You have computed the CQT spectrogram of a harmonic sound, and by visual inspection you have located the first partial ($h = 1$, the fundamental component) at CQT bin index $k_F$.

Question: How many frequency bins above the fundamental component is the second partial ($2F$)? That is, $k_{2F} = k_F + \ldots$? How about the third, fourth, and fifth partials?

2 Matlab (1 point)

Download the CQT toolbox from http://www.cs.tut.fi/sgn/arg/CQT/
Create a subdirectory and unzip the toolbox into it. Note that the toolbox contains a couple of example audio signals (wave files). Remember that you can read wave files into Matlab with the command wavread (in recent Matlab releases, where wavread has been removed, use audioread instead). Note also that there are two example scripts that you can run directly, DEMO.m and PITCHSHIFT.m.
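Before running the toolbox, the bin-frequency relation (2) from the math question can be checked numerically. The following is only a minimal sketch and does not use the toolbox. The value fmin = 55 Hz is borrowed from the CQT parameters used below purely for illustration, and the variable names k, f and ratio are introduced here as assumptions.

```matlab
% Sketch: CQT bin centre frequencies from Eq. (2), f_k = fmin * 2^(k/B).
% fmin and the range of k are illustrative choices, not prescribed values.
B    = 36;            % bins per octave, as in the math question
fmin = 55;            % Hz, frequency of the lowest bin (assumed here)
k    = 0:(2*B);       % bin indices covering two octaves
f    = fmin * 2.^(k / B);

% Adjacent bins are separated by the constant ratio 2^(1/B),
% so moving B bins upward always doubles the frequency (one octave).
ratio = f(2:end) ./ f(1:end-1);

fprintf('f_0 = %.2f Hz, f_B = %.2f Hz, f_2B = %.2f Hz\n', f(1), f(1+B), f(1+2*B));
fprintf('adjacent-bin ratio: %.6f\n', ratio(1));
```

Because the bin spacing is logarithmic, a fixed frequency ratio always corresponds to a fixed number of bins, regardless of where the fundamental lies.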
Calculate the CQT of the signal fitzgerald.wav as follows:

    [x,fs] = wavread('fitzgerald.wav');   % audioread in recent Matlab releases
    % CQT parameters
    fmin = 55;     % Hz
    fmax = fs/2;
    B = 48;
    gamma = 0;     % value 0 corresponds to CQT
    % Calculate CQT. Rasterize to get a time-frequency matrix
    Xcq = cqt(x, B, fs, fmin, fmax, 'rasterize', 'full', 'gamma', gamma);
    % Plot dB-magnitudes (flipud flips the matrix upside-down)
    imagesc(flipud(20*log10(abs(Xcq.c))))

Look at the CQT spectrogram of the signal. What do you notice at very low frequencies?

In the CQT, the Q-factor (the ratio of a bin's frequency to its bandwidth) is the same for all frequency bins. In human hearing, the Q-factors get smaller at low frequencies (less frequency resolution than in the CQT at low frequencies). See Figure 3 in http://www.cs.tut.fi/sgn/arg/CQT/schoerkhuber-aes-2014.pdf

The CQT toolbox allows us to use the auditory Q-factors too, by setting an appropriate value for the parameter gamma. More exactly, set

    $\texttt{gamma} = 229 \cdot \left(2^{1/B} - 2^{-1/B}\right)$

and re-calculate the CQT using the Xcq = ... command shown above. Plot the result using imagesc as shown above. Compare the resulting “smooth-Q” spectrogram with the CQT spectrogram that you calculated earlier, especially at low frequencies. What do you notice? Note that the number of frequency bins is still constant, B bins per octave, although the Q-factors now vary smoothly with frequency.

3 Matlab (1 point)

Let’s do a simple pitch detection experiment. Let’s generate a “template” for a harmonic sound by creating an all-zeros column vector (T = zeros(170,1)). Then let’s put nonzero values at the positions of the first 10 harmonics:

    T(1 + round(log2(1:10) * B)) = 1./(1:10);

Above, we use 1/h as the amplitude of the h-th partial. That roughly mimics the typical shape of natural sound spectra.
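The template construction can be inspected with a short sketch before using it. It assumes B = 48 as in the CQT parameters above. The names h, pos and err are introduced here only for illustration.

```matlab
% Sketch: build the harmonic template and list where each partial lands.
B   = 48;                          % bins per octave (same as in the CQT call)
h   = 1:10;                        % harmonic indices
T   = zeros(170, 1);               % all-zeros column vector
pos = 1 + round(log2(h) * B);      % 1-based row of each partial, fundamental at row 1
T(pos) = 1 ./ h;                   % amplitude 1/h for the h-th partial

% Harmonics rarely fall exactly on a bin centre; err is the rounding error in bins.
err = log2(h) * B - round(log2(h) * B);

for i = 1:numel(h)
    fprintf('h = %2d  row = %3d  amplitude = %.3f  rounding error = %+.2f bins\n', ...
            h(i), pos(i), T(pos(i)), err(i));
end
```

If B is changed, T needs at least 1 + round(log2(10) * B) rows. Otherwise Matlab silently grows the vector on assignment and the template no longer has the intended length.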
Then let’s convolve the compressed CQT magnitude spectrogram with the template using 2D convolution:

    X = conv2(T(end:-1:1), abs(Xcq.c).^0.3);

Here the template is reversed because convolution reverses it back, and the element-wise power 0.3 (roughly cube-root compression) is applied to flatten out the formant structure of the voice a bit. Use imagesc(flipud(X)) to visualize the result of the convolution. Note that using the harmonic template also generates “subharmonics” in X below the actual fundamental frequency (at F/2, F/3, ...).

Finally, try to detect the strongest pitch value in each frame:

    D = X;
    for id = 1:size(D,2)
        [o, maxId] = max(D(:,id));   % row of the strongest template match in frame id
        D(:,id) = 0;
        D(maxId,id) = 1;             % keep only the winning row
    end
    imagesc(flipud(D))
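The whole detection chain can be tried on synthetic data where the correct answer is known. The sketch below is an illustration under stated assumptions, not part of the exercise solution. The matrix M stands in for the compressed CQT magnitudes, and its size (K, N) and the fundamental row k0 are made-up test values. The final conversion to Hz additionally assumes that row 1 of the CQT matrix corresponds to fmin, which should be checked against the toolbox.

```matlab
% Sketch: template matching and peak picking on a synthetic "spectrogram".
B = 48;
h = 1:10;
T = zeros(170, 1);
T(1 + round(log2(h) * B)) = 1 ./ h;

K = 300;  N = 5;  k0 = 60;           % rows, frames, true fundamental row (1-based)
M = zeros(K, N);                     % stand-in for abs(Xcq.c).^0.3
M(k0 + round(log2(h) * B), :) = repmat((1 ./ h).', 1, N);

X = conv2(T(end:-1:1), M);           % full convolution: (170 + K - 1) rows, N columns

% Vectorised equivalent of the per-frame loop: row of the maximum in each column
[~, maxId] = max(X, [], 1);

% The reversed 170-row template shifts every match down by numel(T) - 1 rows;
% undo that shift to get back to a row index of M.
kDetected = maxId - (numel(T) - 1);

% Assumption: row 1 of the CQT matrix is the bin at fmin (bin index k = 0).
fmin = 55;
fDetected = fmin * 2.^((kDetected - 1) / B);
fprintf('detected row %d (true row %d), about %.1f Hz\n', kDetected(1), k0, fDetected(1));
```

The row offset matters only when reading pitch values off X or D. The picture produced by imagesc looks the same apart from the extra rows added by the convolution.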