spgrambw: Plot Spectrograms in MATLAB

spgrambw: Plot Spectrograms in MATLAB
Mike Brookes
Version: 1639, 15/03/2012
Contents
1 Introduction
1
2 Function call
2
3 Colour maps
2
4 Frequency axis
4.1 Nonlinear frequency scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 Frequency range and stepsize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
3
3
5 Analysis bandwidth
4
6 Time Axis
4
7 Intensity scaling
5
8 Waveform and transcription
5
9 Output Arguments
6
10 MODE string options
6
11 MATLAB Code for figures
7
1
Introduction
This document describes the spgrambw function which is part of the voicebox toolbox available at http://
www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html [Bro11]. We will use as an example, the following
sentence “Six plus three equals nine” for which a spectrogram is shown below inculding the speech waveform
and a time-aligned phonetic annotation.
1: MODE='pJcwat'
60
s
k
s
p
l
s
r
i
i
k
w
z
n
a
n
10
9
55
Frequency (kHz)
45
7
6
40
5
35
4
3
30
2
25
1
0
0.2
0.4
0.6
0.8
1
Time (s)
1
1.2
1.4
1.6
1.8
Power/Decade (dB)
50
8
2
Function call
The basic call to the function is:
[ T, F , B]=spgrambw ( S , FS ,MODE,BW,FMAX,DB, TINC ,ANN)
where all but the first two input arguments are optional. The input arguments are:
S input speech waveform
FS sample rate of speech waveform
MODE text string specifying a large range of options
BW the bandwidth of the spectrogram. This argument determines the tradeoff between time and frequency
resolution.
FMAX specifies the range and resolution of the frequency axis
DB specifies the range of power spectral density displayed
TINC specifies the range and resolution of the time axis
ANN gives an optional annotation file containing words or phonemes.
If all you want to do is draw a spectrogram, then the function should be called without any output arguments.
If output arguments are specified, then no spectrogram will be drawn unless the ’g’ mode option is also given.
The output arguments are
T gives the time of each time-axis sample point
F gives the frequency of each frequency-axis sample point
B a 2-dimensional array giving the spectral density at each time-frequency point.
In the plots shown in this document, the title (above the spectrogram) shows the figure number (written {n}
in the text), the value of the MODE argument and the value of any other arguments that are not null.
3
Colour maps
The default output is a monochrome spectrogram shown as {2}. Specifying the ‘j’ mode option uses the “jet”
colourmap instead which is colourful and intuitive {3}. However it does not reproduce accurately if viewed
or printed in monochrome and so I normally use the ‘J’ option instead which is less aggressive and converts
accurately to monochrome {4}. Notice that I have also used the ‘c’ option in each case in order to include a
colourbar giving the intensity scale in decibels.
2: MODE='pc'
3: MODE='pjc'
4: MODE='pJc'
10
60
9
8
8
50
35
3
45
5
40
4
35
3
0
1
Time (s)
1.5
2: Monochrome
40
4
35
30
2
25
1
0
0.5
45
5
30
2
25
1
6
3
30
2
Frequency (kHz)
4
7
6
Power/Decade (dB)
40
Frequency (kHz)
5
Power/Decade (dB)
7
45
55
8
50
7
6
60
9
55
50
Frequency (kHz)
10
60
9
55
Power/Decade (dB)
10
25
1
0
0.5
1
Time (s)
1.5
3: ‘j’=Jet
0.5
1
Time (s)
1.5
4: ‘J’=Thermal
Adding the ‘i’ option inverts the colour map so that dark areas now correspond to high intensity. For these
examples, I have omitted the ‘c’ option so the colourbar is missing.
2
6: MODE='pji'
9
8
8
8
7
7
7
6
5
4
Frequency (kHz)
10
9
6
5
4
6
5
4
3
3
3
2
2
2
1
1
1
0
0
0.5
1
Time (s)
0
1.5
0.5
5: ‘i’= Inverted Monochrome
4
7: MODE='pJi'
10
9
Frequency (kHz)
Frequency (kHz)
5: MODE='pi'
10
1
Time (s)
1.5
0.5
6: ‘ij’=Inverted Jet
1
Time (s)
1.5
7: ‘iJ’=Inverted Thermal
Frequency axis
4.1
Nonlinear frequency scaling
The default frequency axis is linear in Hz as seen in the examples above. Speech scientists usually prefer a
nonlinear frequency scale in which high frequencies are compressed. There are several widely used frequency
scales and these are plotted below (scaled to coincide at 1 kHz) [MG83, Ghi94, SVN37, Zwi61, ZT80]. The log
scale {8} provides the most compression at high frequencies but it is more usual to use one of the physiological
or psychoacoustical scales: Erb-rate {9}, Mel {10} or Bark {11}. The scale is selected by the MODE options
‘l’, ‘e’, ‘m’ or ‘b’. In all cases, it is possible to add also the ‘f’ option which causes the frequency axis labels to
be written in Hz as in {12}. In all the plots below, I have reduced the bandwidth to 80 Hz (see section 5) to
give better frequency resolution.
8: MODE='pJcl', BW=80
9: MODE='pJce', BW=80
4
60
32
3.8
lin
55
3.6
mel
50
log
1
3
45
2.8
40
2.6
2.4
35
Frequency (Erb-rate)
1.5
24
3.2
Power/Decade (dB)
Frequency (log10Hz)
erb-rate
2
3
4
Frequency (kHz)
5
40
14
12
35
30
4
25
6
0.5
2
1
Time (s)
1.5
25
0.5
8: ‘l’ = Log scaled
10: MODE='pJcm', BW=80
11: MODE='pJcb', BW=80
2.6
12: MODE='pJcbf', BW=80
10k
60
5k
18
55
2.4
55
55
45
1.6
1.4
40
1.2
1
35
0.8
50
Frequency (Bark)
1.8
Power/Decade (dB)
50
2
14
45
12
10
40
8
35
6
Power/Decade (dB)
16
2.2
1.5
60
20
Bark-scaled frequency (Hz)
60
1
Time (s)
9: ‘e’ = Erb-rate scaled
22
3
2.8
Frequency (kMel)
16
0
Frequency scales
50
2k
45
40
1k
35
500
0.6
30
4
30
0.4
30
2
0.2
25
0
25
0
0.5
1
Time (s)
25
0
1.5
10: ‘m’ = Mel scaled
4.2
45
18
6
1.6
1
20
8
30
2
1.8
0
0
50
22
10
2.2
0.5
55
26
3.4
bark
2
60
28
Power/Decade (dB)
2.5
30
Power/Decade (dB)
Frequency scales
3
0.5
1
Time (s)
1.5
11: ‘b’ = Bark scaled
0.5
1
Time (s)
1.5
12: ‘bf’ = Bark + Hz labels
Frequency range and stepsize
By default the frequency axis encompasses the entire range from 0 Hz to the Nyquist frequency, 21 fs , but this
is often too large. The FMAX input parameter allows you to specify the desired frequency range. Setting
FMAX=4000 {13} restricts the frequency range to a maximum of 4 kHz while FMAX=[2000 4000] sets the
1
range to 2 kHz to 4 kHz {14}. Normally the frequency stepsize is 256
of the displayed range, but you can also
specify the stepsize explicitly: FMAX=[2000 200 4000] goes from 2 kHz to 4 kHz in steps of 200 Hz {15}. If a
nonlinear frequency scaling has been selected by the ‘l’, ‘e’, ‘m’ or ‘b’ options, then FMAX must be specified in
scaled units unless the ‘h’ option is given, in which case they are in Hz as normal. Note that selecting a very
small step size does not make the spectrogram any less blurry; the frequency resoulution is determined by the
analysis bandwidth, BW, described in section 5.
3
13: MODE='hpJc', FMAX=4000
14: MODE='hpJc', FMAX=[2000 4000]
15: MODE='hpJc', FMAX=[2000 200 4000]
4
60
60
4
3.8
3.5
55
50
3.6
50
50
2
40
1.5
35
3.2
45
3
40
2.8
35
30
0.5
30
2.4
25
3.2
40
3
35
2.8
2.6
2.6
1
45
3.4
Frequency (kHz)
45
Frequency (kHz)
2.5
Power/Decade (dB)
Frequency (kHz)
3.4
Power/Decade (dB)
3
55
3.8
55
3.6
30
2.4
25
2.2
25
2.2
20
2
0
2
0.5
1
Time (s)
1.5
0.5
13: 0 to 4 kHz
5
Power/Decade (dB)
4
1
Time (s)
1.5
0.5
14: 2 to 4 kHz
1
Time (s)
1.5
15: 200 Hz resolution
Analysis bandwidth
There is an unavoidable tradeoff between time resolution and frequency resolution that is often known as the
“uncertainty principle”. The BW input parameter specifies the −6 dB analysis bandwidth which is the frequency
separation at which two tones will definitely give distinct peaks. From the point of view of frequency resolution,
it follows that the smaller BW the better. However selecting a small value of BW means that rapid amplitude
variations within any single frequency bin will be attenuated and, in particular, amplitude variations faster than
1
2 BW will be attenuated by more than −6 dB resulting in poor time resolution.
16: MODE='pJcwat', BW=50
17: MODE='pJcwat'
18: MODE='pJcwat', BW=400
60
60
n 0
10
s
aI
n 0
10
55
9
9
45
6
40
5
4
45
7
6
40
5
35
3
30
2
1
1
Time (s)
45
7
40
6
5
35
30
2
25
1
20
0
0.5
16: BW=50 Hz
50
3
25
1.5
1
Time (s)
55
4
0
0.5
n 0
8
30
1
0
aI
9
2
25
I k s p l V s T r i: i: k w @ z n
10
50
4
35
3
s
55
8
Frequency (kHz)
7
Power/Decade (dB)
50
8
Frequency (kHz)
I k s p l V s T r i: i: k w @ z n
Power/Decade (dB)
aI
Frequency (kHz)
I k s p l V s T r i: i: k w @ z n
Power/Decade (dB)
s
1.5
0.5
17: BW=200 Hz (default)
1
Time (s)
1.5
18: BW=400 Hz
In this speech example, which is by a female talker, the larynx frequency varies from 300 Hz down to
150 Hz. If BW is chosen to be below the fundamental frequency, e.g. BW=50 Hz in {16}, the harmonics of the
larynx frequency are clearly visible as quasi-horizontal stripes, however the time resolution is relatively poor.
In a broadband spectrogram, in contrast, the bandwidth is chosen to be higher than the larynx frequency, e.g.
BW=400 Hz in {18}, and the individual harmonics are no longer resolved. The time resolution is however much
improved and it is possible to resolve the individual acoustic excitations arising from each larynx pulse; these
are visible as vertical striations during the /aI/ phoneme of “nine” at a time of around 1.5 seconds. The default
bandwidth is BW=200 Hz {17} which is often too large to reslve the larynx frequency harmonics but which
makes the vocal tract resonances, or formants, easy to see.
6
Time Axis
As discussed in section 5, the time resolution is determined by the BW parameter, and modulation frequencies
0.45
above 12 BW are not shown in the spectrogram. For this reason, the default time-step is taken as BW
and, for
small values of BW, this may give a blocky appearance {19}. To avoid this you can explicitly set a smaller time
step using the TINC parameter as shown in {20}; note that although this results in a smoother appearance, it
does not improve the time resolution which is still determined by the BW parameter (see section 5).
19: MODE='pJcwat', BW=20
20: MODE='pJcwat', BW=20, TINC=0.005
21: MODE='pJcwat', BW=20, TINC=[1.1 0.001 1.4]
60
6
40
5
35
4
3
30
2
aI
n 0
k
55
9
45
7
6
40
5
35
4
3
30
2
25
1
0
1
Time (s)
1.5
19: BW=20 Hz
@
z
n
25
1
0.5
1
Time (s)
1.5
20: BW=20, TINC=0.005
4
50
45
9
8
40
7
35
6
5
30
4
25
3
2
0
0.5
w
10
50
8
Frequency (kHz)
45
7
I k s p l V s T r i: i: k w @ z n
10
50
8
Frequency (kHz)
s
55
20
1
0
1.1
15
1.15
1.2
1.25
Time (s)
1.3
1.35
1.4
21: TINC=[1.1 0.001 1.4]
Power/Decade (dB)
9
n 0
Frequency (kHz)
aI
Power/Decade (dB)
I k s p l V s T r i: i: k w @ z n
Power/Decade (dB)
s
10
60
You can restrict the display to a specific time interval by setting TINC = [tmin tmax ] or TINC = [tmin tstep tmax ]
if you want to specifiy the time-step as well {21}. Notice in {21} that the waveform and annotations remain
correctly aligned.
The sample time of S(1) is assumed by default to be T 1 = F1S , but you can set it to any other value by
making the second input argument a vector: [FS T1].
7
Intensity scaling
The default spectrogram shows the spectral density in units of “power per Hz” {22}. Because most speech
energy is concentrated at low frequencies, this can make it difficult to see detail in the display at both low and
high frequencies. To avoid this, you can use the ‘p’ option to display “power per decade” instead: this option
multiplies the power by a value proportional to the frequency and so emphasises high frequencies {23}. If you
are using one of the non-linear frequency scaling options described in section 4.1, you have a third option which
is to show “power per bark/erb/...” {24}.
22: MODE='Jc'
23: MODE='pJc'
10
24: MODE='PJcbf'
10
10k
60
25
45
9
20
8
55
5k
8
40
0
3
6
45
5
40
4
35
Power/Decade (dB)
5
4
Frequency (kHz)
10
5
Power/Hz (dB)
Frequency (kHz)
7
15
6
Bark-scaled frequency (Hz)
50
7
3
30
2
30
25
1k
20
500
2
-5
35
2k
Power/Bark (dB)
9
15
1
25
1
-10
0
0
0.5
1
Time (s)
10
0
1.5
0.5
22: Power/Hz
1
Time (s)
1.5
0.5
23: ‘p’=Power/Decade
1
Time (s)
1.5
1
24: ‘P’=Power per Bark
Normally, the display shows a range of 40 dB from the maximum power anywhere in the spectrrogram
{25}. You can change this to a different range by setting the DB parameter either to the desired range{26}
or alternatively to the minimum and maximum powers to display: DB = [Pmin Pmax ] {27}. This option is
especially useful if you want to have several spectriograms with identical displayed power ranges. Values outside
the selected range will be set to either the minimum or maximum.
25: MODE='Jc'
26: MODE='Jc', DB=60
10
27: MODE='Jc', DB=[-25 0]
10
10
0
25
9
20
5
4
0
2
-5
1
-10
0
6
0
5
4
-10
3
1
Time (s)
1.5
25: 40 dB range (default)
6
-10
5
4
-15
3
-20
2
1
2
-20
1
-30
0
0.5
-5
7
Frequency (kHz)
10
5
8
10
7
Frequency (kHz)
6
3
8
8
15
Power/Hz (dB)
Frequency (kHz)
7
Power/Hz (dB)
8
9
20
Power/Hz (dB)
9
0
0.5
1
Time (s)
1.5
26: DB=60
-25
0.5
1
Time (s)
1.5
27: DB=[-25 0]
Waveform and transcription
It is often helpful to display the time-domain waveform on the spectrogram and you can do so with th ‘w’ option
{25}. If you have a transcription or other time-aligned annotation, you can specify it as the ANN input. Each
row of the ANN cell array is of the form {[tstart tend ] ‘text’}. By default, the annotations are left-aligned within
their time intervals without any time markers {26}. If you want to display phonetic characters, you will need
to install a non-unicode IPA font such as the SIL93 fonts (available for download from the Voicebox website).
You can specify the font of each annotation entry by including a third column; each row of ANN is now of the
form {[tstart tend ] ‘text’ ‘font’}. Example {27} uses the ‘SILDoulos IPA93’ font and also includes the options ‘a’
which centres the annotations in their time interval and ‘t’ which includes time markers.
5
28: MODE='Jcw'
29: MODE='Jc'
s
I k s p lV s
T r i:
30: MODE='Jcwat'
i: k w@ z n aI
n
0
25
25
5
5
4
10
6
5
5
4
0
0
3
3
-5
2
1
-10
0
1
Time (s)
1
1.5
25: ’w’=show waveform
9
n
20
15
7
10
6
5
5
4
0
3
-5
2
-10
0
0.5
a
8
Frequency (kHz)
6
Frequency (kHz)
10
Power/Hz (dB)
7
i kw z n
9
15
7
Power/Hz (dB)
15
k s p l s r i
10
20
8
8
Frequency (kHz)
9
20
9
s
Power/Hz (dB)
10
25
10
-5
2
1
-10
0
0.5
1
Time (s)
1.5
26: ANN input
0.5
1
Time (s)
1.5
27: ‘wat’ + ANN font
Output Arguments
Specifying output arguments normally suppresses the spectrogram plot unless the ‘g’ option is given. Note that,
perhaps unexpectedly, the spectrogram array is the third output rather than the first.
If you save the B output (with a linear frequency scale and without the ‘p’ or ‘P’ options), you can use it as
the input to a subsequent call to spgrambw instead of a time-domain waveform. In this case FS=[FS T1 FINC
F1] where FS is now the frame rate (each frame is one row of B), T1 is the time of the first row of B, FINC is
the frequency increment and F1 is the frequency of the first column in B.
10
MODE string options
a centre-align annotations rather than left-aligning them
b bark scale
c include a colourbar as an intensity scale
d give the B ouput in decibels rather than in power.
D clip the output B array to the limits specified by the ”db” input
e erb scale]
f label frequency axis in Hz rather than mel/bark/...
g draw a graph even if output arguments are present
h units of the FMAX input are in Hz instead of mel/bark/... In this case, the Fstep parameter is used only to
determine the number of filters.
H express the F output in Hz instead of mel/bark/...
i inverted colourmap” (white background)
j jet colourmap
J “thermal”colourmap that is linear in grayscale. Based on Oliver Woodford’s % real2rgb at http://www.mathworks.com/mat
l log10 Hz frequency scale
m mel scale
p calculate “power per decade” rather than “power per Hz”. This effectively increases the power level at high
frequencies and so maes them more visible
P calculate “power per erb/mel/...” rather than power per Hz.
t add time markers with annotations
w draw the speech waveform above the spectrogram
6
11
MATLAB Code for figures
The following code was used to generate all the figures in this document:
% d e m o n s t r a t i o n s f o r t h e spgrambw t u t o r i a l
clear all ;
close all ;
p={1 2 . 5 ’ pJcwat ’ [ ] [ ] [ ] [ ] 2 ;
2 1 . 3 3 ’ pc ’ [ ] [ ] [ ] [ ] 0 ;
3 1 . 3 3 ’ pjc ’ [ ] [ ] [ ] [ ] 0 ;
4 1 . 3 3 ’ pJc ’ [ ] [ ] [ ] [ ] 0 ;
5 1 . 3 3 ’ pi ’ [ ] [ ] [ ] [ ] 0 ;
6 1.33 ’ pji ’ [ ] [ ] [ ] [ ] 0;
7 1 . 3 3 ’ pJi ’ [ ] [ ] [ ] [ ] 0 ;
8 1 . 3 3 ’ p J c l ’ 80 [ ] [ ] [ ] 0 ;
9 1 . 3 3 ’ pJce ’ 80 [ ] [ ] [ ] 0 ;
10 1 . 3 3 ’ pJcm ’ 80 [ ] [ ] [ ] 0 ;
11 1 . 3 3 ’ pJcb ’ 80 [ ] [ ] [ ] 0 ;
12 1 . 3 3 ’ pJcbf ’ 80 [ ] [ ] [ ] 0 ;
13 1 . 3 3 ’ hpJc ’ [ ] [ 4 0 0 0 ] [ ] [ ] 0 ;
14 1 . 3 3 ’ hpJc ’ [ ] [ 2 0 0 0 4 0 0 0 ] [ ] [ ] 0 ;
15 1 . 3 3 ’ hpJc ’ [ ] [ 2 0 0 0 200 4 0 0 0 ] [ ] [ ] 0 ;
16 1 . 3 3 ’ pJcwat ’ [ 5 0 ] [ ] [ ] [ ] 1 ;
17 1 . 3 3 ’ pJcwat ’ [ ] [ ] [ ] [ ] 1 ;
18 1 . 3 3 ’ pJcwat ’ [ 4 0 0 ] [ ] [ ] [ ] 1 ;
19 1 . 3 3 ’ pJcwat ’ [ 2 0 ] [ ] [ ] [ ] 1 ;
20 1 . 3 3 ’ pJcwat ’ [ 2 0 ] [ ] [ ] [ 0 . 0 0 5 ] 1 ;
21 1 . 3 3 ’ pJcwat ’ [ 2 0 ] [ ] [ ] [ 1 . 1 0 . 0 0 1 1 . 4 ] 1 ;
22 1 . 3 3 ’ Jc ’ [ ] [ ] [ ] [ ] 0 ;
23 1 . 3 3 ’ pJc ’ [ ] [ ] [ ] [ ] 0 ;
24 1 . 3 3 ’ PJcbf ’ [ ] [ ] [ ] [ ] 0 ;
25 1 . 3 3 ’ Jc ’ [ ] [ ] [ ] [ ] 0 ;
26 1 . 3 3 ’ Jc ’ [ ] [ ] [ 6 0 ] [ ] 0 ;
27 1 . 3 3 ’ Jc ’ [ ] [ ] [ −25 0 ] [ ] 0 ;
28 1 . 3 3 ’ Jcw ’ [ ] [ ] [ ] [ ] 0 ;
29 1 . 3 3 ’ Jc ’ [ ] [ ] [ ] [ ] 1 ;
30 1 . 3 3 ’ Jcwat ’ [ ] [ ] [ ] [ ] 2 } ;
y f i g =420; % h e i g h t o f f i g u r e s
emf =1;
% s e t to 1 to p r i n t
a r g s ={’BW’ ’FMAX’ ’DB’ ’TINC ’ } ;
% r e a d t h e SFS−format s p e e c h f i l e
f n =’ a t 0 5 f 0 . s f s ’ ;
[ sp , f s ]= r e a d s f s ( fn , 1 , 1 ) ; % s p e e c h s i g n a l
[ pt , fw ]= r e a d s f s ( fn , 5 , 2 ) ;
% phonetic t r a n s c r i p t i o n
ann=[ m a t 2 c e l l ( [ c e l l 2 m a t ( pt ( : , 1 ) ) c e l l 2 m a t ( pt ( : , 1 : 2 ) ) ∗ [ 1 ; 1 ] ] / fw , o n e s ( 1 , s i z e ( pt , 1 ) ) ) pt ( : , 3 )
i p a=ann ( : , [ 1 2 2 ] ) ;
i p a ( : , 2 ) = { ’ s ’ ’ I ’ ’ k ’ ’ s ’ ’ p ’ ’ l ’ ’ Ã’ ’ s ’ ’T’ ’ r ’ ’ i ’ ’ i ’ ’ k ’ ’w’ ’ ’ ’ z ’ ’ n ’ ’ aI ’ ’ n ’ ’ ’
i p a ( : , 3 ) = repmat ( { ’ SILDoulos IPA93 ’ } , s i z e ( ipa , 1 ) , 1 ) ;
f o r i =1: s i z e ( p , 1 )
i f p{ i ,1} >0
f i g u r e ( p{ i , 1 } )
s e t ( g c f , ’ P o s i t i o n ’ , [ 1 0 0 100 round ( y f i g ∗p{ i , 2 } ) y f i g ] , ’ InvertHardcopy ’ , ’ o f f ’ ) ;
s w i t c h p{ i , 8 }
case 0
spgrambw ( sp , f s , p{ i , 3 } , p{ i , 4 } , p{ i , 5 } , p{ i , 6 } , p{ i , 7 } ) ;
case 1
spgrambw ( sp , f s , p{ i , 3 } , p{ i , 4 } , p{ i , 5 } , p{ i , 6 } , p{ i , 7 } , ann ) ;
7
case 2
spgrambw ( sp , f s , p{ i , 3 } , p{ i , 4 } , p{ i , 5 } , p{ i , 6 } , p{ i , 7 } , i p a ) ;
end
s s=s p r i n t f ( ’%d : MODE=’’% s ’ ’ ’ , p{ i , 1 } , p{ i , 3 } ) ;
f o r j =4:7
i f numel ( p{ i , j })==1
s s=s p r i n t f ( ’% s , %s=%g ’ , s s , a r g s { j −3} ,p{ i , j } ) ;
e l s e i f numel ( p{ i , j })>1
s s=s p r i n t f ( ’% s , %s=[%s ’ , s s , a r g s { j −3} , s p r i n t f ( ’% g ’ , p{ i , j } ) ) ;
s s =[ s s ( 1 : end −1) ’ ] ’ ] ;
end
end
t i t l e ( ss );
i f emf , e v a l ( s p r i n t f ( ’ p r i n t −dmeta %s ’ , s p r i n t f ( ’% s%d ’ , mfilename , round ( g c f ) ) ) ) ; end
i f i >1 && i <28
close ( i );
end
end
end
% now p l o t o t h e r g r a p h s
f o r i =201:201
figure ( i )
switch i
c a s e 201
f a x=l i n s p a c e ( 0 , 6 0 0 0 , 2 0 0 ) ’ ;
y=[ f a x [ nan ; l o g 1 0 ( f a x ( 2 : end ) ) ] f r q 2 m e l ( f a x ) f r q 2 b a r k ( f a x ) f r q 2 e r b ( f a x ) ] ;
[ v , i v ]=min ( abs ( fax − 1 0 0 0 ) ) ;
y=y . / repmat ( y ( iv , : ) , l e n g t h ( f a x ) , 1 ) ;
p l o t ( fax /1000 , y ) ;
s e t ( gca , ’ ylim ’ , [ 0 3 ] ) ;
x l a b e l ( ’ Frequency ( kHz ) ’ ) ;
y l a b e l ( ’ S c a l e r e l a t i v e t o 1 kHz ’ ) ;
t i t l e ( ’ Frequency s c a l e s ’ )
t x t ={2.8 2 . 7 ’ l i n ’ ; 5 1 . 1 ’ l o g ’ ; 4 . 7 2 . 5 ’ mel ’ ; 5 . 2 2 . 1 5 ’ bark ’ ; 4 . 5 1 . 7 ’ erb−
f o r j =1:5
text ( txt {j ,1} , txt {j ,2} , txt {j ,3})
end
figbolden
end
i f emf , e v a l ( s p r i n t f ( ’ p r i n t −dmeta %s ’ , s p r i n t f ( ’% s%d ’ , mfilename , round ( g c f ) ) ) ) ; end
close ( i );
end
References
[Bro11] M. Brookes, “VOICEBOX: A speech processing toolbox for MATLAB,” Imperial College, Software
Library, 2011. [Online]. Available: http://www.ee.imperial.ac.uk/hp/staff/dmb/voicebox/voicebox.
html
[Ghi94] O. Ghitza, “Auditory models and human performance in tasks related to speech coding and speech
recognition,” IEEE Trans Speech Audio Processing, vol. 2, no. 1, pp. 115–132, 1994.
[MG83] B. C. J. Moore and B. R. Glasberg, “Suggested formulae for calculating auditory-filter bandwidths
and excitation patterns,” J. Acoust. Soc. Amer., vol. 74, pp. 750–753, 1983.
[SVN37] S. S. Stevens, J. Volkman, and E. B. Newman, “A scale for the measurement of the psychological
magnitude of pitch,” J. Acoust. Soc. Amer., vol. 8, pp. 185–19, 1937.
[ZT80]
E. Zwicker and E. Terhardt, “Analytical expressions for critical-band rate and critical bandwidth as a
function of frequency,” J. Acoust. Soc. Amer., vol. 68, no. 5, pp. 1523–1525, Nov. 1980.
8
[Zwi61] E. Zwicker, “Subdivision of audible frequency range into critical bands,” J. Acoust. Soc. Amer., vol. 33,
p. 248, 1961.
9