Analysis and assessment of musical sounds

Dept. for Speech, Music and Hearing
Quarterly Progress and
Status Report
Analysis and assessment of
musical sounds
Pollard, H. F. and Jansson, E. V.
journal:
volume:
number:
year:
pages:
STL-QPSR
21
4
1980
062-098
http://www.speech.kth.se/qpsr
ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS
C.
H.F.
P o l l a r d * and E.V.
Jansson
Abstract
A ccsnplete description of musical sounds must include the changes
in spectrum w i t h time to describe the four major characteristics: dura-
tion, pitch, loudness and timbre a s we11 a s the microstructure within
notes. To understand the perception of the musical sounds, the operating characteristics of the hearing process should be considered, such
as the c r i t i c a l bands of hearing, masking effects, the t5-w resolution needed, tim constants for integration of signal paramters, and
likely memory functions.
Iong-time-average-spectra of organ ranks are shown t o predict
steady state paramters independent of starting transients, frequency
range of a rank and its roughness. Sampled f i l t e r methods show that
transients of different p a r t i a l s dartinate during different portions
of the total starting transient and that very high r a t e of changes
may occur. By replacing the natural starting transients with expnent i a l ones, the averall e f f e c t is not t o make the sound ccgnpletely different, it is a change in one quality factor of the tone, while lack
of reverberation makes the tone sound unnatural.
The tri-stimulus method, employed in explaining the colour vision,
is introduced a s a method to analyse musical sounds. In this investigation the j3rdmental was used a s the f i r s t coordinate, the folluwing
three p a r t i a l s resolved by the ear a s the second, and the higher part i a l s a s the tkird. The tri-stimulus graphs display differences b e h e m
ranks and between pipes of the same rank, which seems perceptually important.
In the following three appendices analysis methods are reviewd,
r e s u l t s of previous analysis presented, and a synthesis prescription
given t o test the relevance of an analysis.
1.
Introduction
- .
,
7.
'
,'
.
Musical sounds form a special class that d i f f e r markedly in
structure f r m either pure tones o r noise, the two extremes that are
usually considered by acousticians.
Methods of analysis t h a t have
been devised for pure tones o r noise are often unsuited to the analysis
of a musical sound. In particular, changes that occur in the spectrum
of the sound with time must be described i n same way.
In the time damain, a musical sound consists essentially of three
parts:
*
(1) a starting transient in which camplicated interference
guest researcher a t the Dept. of Speech Commmication, Febr.-June,
1979. Address : School of Physics, University of New South Wales,
Kensington, Australia
STL-QPSR 4 / 1980
63.
processes occur between the free vibrations of the modes ccanprising
the instrument and the forced m t i o n imposed by the source of energy,
(2) a m r e o r l e s s steady region corresponding t o the forced m t i o n
of the instrument.
(This 'steady s t a t e ' sound usually consists of a
s e t of p a r t i a l tones that may o r may not be related hanmnically, together with same background instrumntal noise. )
( 3 ) The decay of
free vibrations of the mdes of the instrument when the forcing function
has been remved.
The decay pattern is usually ccanplicated by inter-
ference between the instrument modes and roam modes.
Under normal
music-making conditions, the roam modes predominate during the decay.
A major characteristic of musical sounds are the changes of state that
occur with time. It is rare t o encounter a sound i n music whose descript i v e p a r a t e r s are not varying in scm way with time.
A measurement system suited for analysis of musical tones must be
capable then of providing not only a time description of the sound but
also a detailed frequency-time analysis i n terms of a l l the important
spectral cmpnents.
Furthennore, even the simplest piece of music
consists of a sequence of notes.
Therefore, the analysis procedure
should be capable of dealing w i t h a t l e a s t a simple sequence, such a s
a diatonic scale.
,
The present investigation aims towards such a physical analysis
procedure.
But it is equally important that the analysis procedure is
adequate for the perception by our hearing.
Therefore properties of
the hearing process are discussed i n sarne d e t a i l . Musical sounds have
certain signal properties and t h e i r perception determines which of
these properties are important.
..
The report is the work by one of the authors (HFP, on sabbatical
leave t o the Departmmt of Speech Carrarmnication, KTH) together with
the second author.
The report contains tm main parts:
one part reviewing previously used methods and investigations, and a second part
w i t h an analysis of organ tones done a t the Department of Speech Cammication.
F ' u r t h m r e , simple gating and f i l t e r experiments are
described and a new method, "the mi-stimulus Method", is introduced
a s a possible description for the timbre of musical instrurrrents.
2.
There are four major characteristics that are used to describe
a musical sound: (1) duration, which involves the relative imprtance
of the starting transient, the 'steady1sound and the decay; (2) pitch,
M c h involves an assessment on a one-dimensional scale related to
frequency; (3) loudness, which is a masure of the electric p e r
transmitted by the auditory nerve (Howes, 1979); (4) timbre, which
is a multi-dinuansional entity.
1
I
I
" -
'
'
.ci!::'
. ,,.
I
+<
- .
I
7-'
y
Characteristics of a musical sound .
-
-
In addition to the major characteristics of the sound, there are
the dynamic fluctuations within each note (soarretimescalled the microstructure), examples of which include: (a) variations in the level of
,
;,
-- .
',
1
*?-'<
the time envelope, (b) variations in both the level and frequency of
spectral canpnents, (c) the temprary appearance of hhammnic spectral
r ' j;
canponents, and (d) the presence of noise caponents. All of these
.+xifactors can be vital for a musical sound.
. ..
I
'"
3.
1
-i
.
/ '
'a
1
';.
for higher frequencies (Zwicker & Feldtkeller, 1967). The bandwidths
,' E
i-
soarre as-
The basilar mgnbrane contains about 30,000 receptor hair cells
and behaves as a set of 24 band pass filters (criticalbands) each con1
taining approximately the same n
&
r
of hair cells (about 1300 per
- band)
The bandwidths of the critical bands are approximately 100 Hz
for frequencies below about 450 Hz and approximately one-third octave
-,
s
When it is desired to study loudness relationships and
I
pects of timbre, the operating characteristics of the hearing process
should be considered. While there is detailed physiological and
I
psycho-physical information available concerning processes occurring
in the region of the cochlea, there is much less known concerning the
processing of neural impulses after they leave that region.
,..
r
'
Perception of musical sounds
- " ' measured as pulsation threshold have, however, approximately half the
":
'
.
above bandwidths '(~lcmlp,
1967, Ch. 1) The maximum firing rate of
neurons energised by the sound wave is 1000 per sec; the rate being
dependent on the intensity of the sound. Individual neurons respond
to an intensity range of 40 dB. The recovery time after each firing
I
is greater than 1 ms so t h a t the minimum time required t o detect a
There is sane
change in intensity is probably i n the range 2-10 ms.
agreement that a time constant of 100 ms is involved i n the aural assessment of the loudness of a steady sound (Zwicker & Feldtkeller,
1967).
Details concerning the subsequent auditory coding of neural
impulses arising fran the ear are still somewhat speculative
(Ibederer, 1975)
.
Thus, for loudness determination, the analyser should preferably
be the equivalent of a set of f i l t e r s with bandwidths approximately
the same a s the c r i t i c a l bands of the ear, and with the added capab i l i t y of providing an estimate of the loudness i n each band a f t e r
low pass f i l t e r i n g with a t i n e constant of 100 ms.
The f i r i n g r a t e
implies that the f i l t e r s should be sampled a t l e a s t every 10 ms.
One characteristic of a steady sound is roughness (Terhardt, 1974).
A musical sound w i t h
t m o r m r e p a r t i a l s w i t h i n the same c r i t i c a l
band w i l l give rise to a sensation of roughness.
The partials within
the sarue band are not resolved but produce a beating e f f e c t which can
be unpleasant.
With only one p a r t i a l in each occupied band, the music-
a l e f f e c t is one of m t h n e s s .
The perceived roughness corresponds
to a lm-pass RC-filtering w i t h a 13 ms t i n e constant.
A strong sound canpnent may make another c a t p n e n t impossible
to perceive. This p h e n m o n is called masking. In the frequency domain a low tone may mask a high tone, while the opposite effect is
snall. Therefore, simultaneous masking can be modelled a s a masking
threshold f a l l i n g fran the level of each band a t a r a t e of approximately
10 dB/Bark towads higher pitches (c.f
Fig. 76.2).
. Zwicker & Feldtkeller,
1967,
Masking p h e n m are also present i n the time dcanain.
A suitable labelling pre-masking and post-masking were introduced
i n Zwicker (1977). The p r e l ~ s k i n gfor loudness is short relative
to that of post-masking.
* ' >
A
The relative importance of the starting transient and the steady
state parts of a musical note have been studied by a n m h r of mrkers
(sunanary in Grey, 1975)
.
The starting transient plays an important
the note together with an assessmnt routine for dynamic changes,
(2) a long-term m r y which muld enable the identification of the
instmment being played o r the relationship of one note with another
in a sqknce.
. ' .s?43t,i1
c y,.
C
possible a t a l a t e r date
- . A further r e f i n m t that may beis to use a system of analysis that follows more closely the auditory coding processes used by the brain. These processes have yet
>;
1 1
t l
, I
J
,,
to be delineated with respect t o hearing but advantage may possibly
.l
be taken of the progress that has been made in relation to optical
- t
I
coding.
A
,
c.
i n i t i a l studies for this report are sumoarized i n Appen-
yses, and Appendix C methods of synthesis for finding out the im-
.
"
I
portant sound parameters for the perception.
4.
I
SUE
dices. Thus the Appendix A presents methods previously used for
I
analysis of musical sounds, Appendix B same results of previous anal-
Q-
t
.
,
I +
I
LTApmeasurements and f i l t e r i n g e x p e r i m e n t s
-
Recordings were made of a diatonic scale (C4 t o Cg) played on
three different organ pipe ranks a t Sphga Church, Stockholm, using
four different note durations (very short, 0.5 s, 2 s, continuous
diatonic scale). For each of the ranks of pipes recorded and each
Fig. IV-C-1 a shows
note set, an LTAS was c a t p t e d (see Appendix A)
.
the curves derived for a R i n c i p a l 8' rank (short notes and continuous s c a l e ) , while similar curves are shown i n Figs. IV-C-lb and 1c
for Gedackt 8' and Vox hum^ 8' ranks, respectively.
There is l i t t l e
difference i n the shapes of the curves obtained for each rank, except for the shortest notes played on the Principal 8'.
In the lat-
ter case, the curve for the shortest tirrae shows a reduced level a t
l o w frequencies likely because the fundamental is slaw to develop.
q^'"6-b:
.
It is concluded that where the LTAS is not very sensitive to the char"
:
acter of the start?ing transients of the notes, the LTAS is a representative average for the quasi-steady s t a t e portion of the tones
i .e.
, the physical
parameter defining steady state timbre.
-
I
STL-QPSR4/1980
69.
I
I
I
1
I
I
I
1
I
1
I
I
PITCH ( B A R K )
0
10
20
;*
PITCH ( B A R K )
0
Fig. IV-C-1.
10
PITCH ( B A R K )
20
LTAS-curves for a diatonic scale (C4 to C5) on
(a) Principal 8 ' rank of organ pipes,
(b) a Geciackt 86 rank,
(c) a Vox Humans 8 ' rank.
The organ was a Marcusson in Sphga church, S ~ . o l m .
The curves shown are for a set of very short duration
notes (---) and f o r a continuous diatonic scale (-!.
A pre-exphasis curve was used in the analysis.
I
m can predict p s s i b l e roughThe LTAS:es describe approximately the spectrum
Frm the LTAS:es i n Fig. IV-C-1
ness sensations.
envelope for different ccanplex tones and thus m can estimate the
upper frequency limits for the different musical tones.
I
For the three organ pipes having pitch C4 (262 H z ) , the high
frequency limits are approximately:
Principal C4, 4 kHz; Gedackt Cq,
3 kHz; Vox Humana C4, 8 kHz.
Principal and reed pipes a s Vox Humana produce a f u l l set of
harmonic p a r t i a l tones and for a fundamental frequency of 262 Hz,
which is also the frequency difference betmen partials, t m o r mre
partials w i l l f a l l within the s m c r i t i c a l band above 1.5 kHz.
This
w i l l give rise to possible roughness, which is especially so for the
with strong p a r t i a l s above 1.5 kHz.
Vox H,-
Closed pipes produce only odd-nuhered partials. For instance,
for a Gedackt C4 pipe, only p a r t i a l s above 3 kHz can contribute t o
roughness.
there is l i t t l e radiated sound above
t h i s frequency. Listening to recorded tones reveals that the Gedackt Cq sounds smooth, the Principal Cq sounds slightly rough,
and the Vox Humans Cq very rough. Low-pass f i l t e r i n g a t 1.5 kHz
rem3ves the roughness sensation, which c0nfin-w the previous con?
f
clusions regarding roughness.
5.
A s noted above,
Growth c u r v e measurements and q a t i n g e x p e r i m e n t
Steady signals are frequently analysed with the aid of a s e t
of analog f i l t e r s (either 1/3 octave bandwidth o r narrcw band) whose
outputs are scanned in series o r parallel.
For the analysis of
transient signals, a s i n speech and music, a parallel bank of f i l t e r s
is required whose outputs can be sampled a t short time intervals, a t
least every 10 m s . The f i l t e r bandwidths may be variable o r set t o
1/3 octave values. Digital f i l t e r s muld be an alternative but to
date they do not appear t o have been used in the analysis of musical
sounds.
A pre-emphasis netmrk may be desirable i f the magnitudes
of important high frequency components are lm.
The mighting net-
m r k can also be designed to be an approximation t o loudness mighting functions (Stevens, 1972; Zwicker
&
Feldtkeller, 1967). The
sampled f i l t e r outputs are stored i n a ccarrputer for subsequent analysis and display.
I
I
-
---
- -
--
STL-QPSR 4 /I 980
5.1
71.
Sampled F i l t e r Methcd.
A set of 51 adjustable f i l t e r s was available f o r t h i s analysis.
The f i l t e r s e r e set to have a centre frequency every 100 Hz s t a r t i n g
£ram 100 Hz, the bandwidth was set to 125 H Z , smoothing f i l t e r time
constant 13 ms, sampling r a t e 100 Hz.
ne-k
In addition, a pre-esrqhasis
was used (Fig. IV-C-2) that permitted the recording of great-
er amplitudes a t higher frequencies and a l s o gave a weighting t h a t was
an approximation to loudness weiqhting functions (Stevens, 1972;
Zwicker
&
Feldtkeller, 1967). I f required, unweighted f i l t e r respons-
es could easily be derived f r m which loudness values in sones could
be c q u t e d .
Figs. IV-C-3, IV-C-4,
IV-C-5 show r e s u l t s obtained for a Gedackt
C4, Principal Cq, and Vox Humana C4 pipe, respectively (Sphga church
organ).
In each case the weighted t o t a l level is given together w i t h
curves for
scatre
of the p a r t i a l tones (only the f i r s t six are shown
f o r the Principal and Vox Humana).
For loudness carputations the
f i r s t s i x p a r t i a l s are treated separately, a f t e r which the p a r t i a l s
are grouped since'mre than one of the upper p a r t i a l s w i l l f a l l into
each c r i t i c a l band.
One of the major e f f e c t s produced by the s t a r t i n g transients of
sounds is to d r a w the attention of the l i s t e n e r t o the sound.
It is
of interest therefore to p l o t the derivative of the sound level versus
t h e a s in Fig. IV-C-6, f o r the Gedackt C4. Fram inspection of such
graphs it is possible to ascertain which p a r t i a l tones are likely to
be dcaninant in gaining the attention of the ear. Such overall features
may be shown mre clearly by plotting only the dominant slopes in each
case, a s shcwn i n Fig. IV-C-7, For instance, in the case of the Gedackt Cq pipe, the fundamental tone has its f a s t e s t r a t e of change
between 0 and 11 ms, the f i f t h p a r t i a l betwen 11 and 18 ms, foilby the fundamental again fran 18 t o 42 ms.
For the Principal C4 pipe, the second p a r t i a l is d m h a n t be-
twen 0 and 23 ms, the 11th and 12th p a r t i a l s betwen 23 and 31 m s ,
the t h i r d p a r t i a l betwen 31 and 48 ms, followed by the fundammtal.
I
C
..a0.
*.
.-:,
..-..
-
,u&c;'
-*
t'
1
f
j
i
,
$**
5 .
J,
.:
(a) Pre-emphasis filter characteristic for sampled
filter set.
!b) mudness weighting function (Stevens, 1 972),
(c) Loudness weighting function (Zwicker, 1 967) '
r- :
.
->
*
:
50
i
F?
1
.
L -
i
, 7-
*
1
1
GEDACKT C,
.
'.;,
,.
F.:
\
,
,
x
40
'-tx;
a r .
)'-L
.
A
r
FREWENCY ( HZ 1
L
Fig. IVX-2.
.
... " * \ . .. . .. < , t F : ' "
,.,I 'f. ' r F.T;;;.-:
,
,.
,..
:
.->f:b
5, ;
*
'
. :;
.,=-sfeir
(,f
-. 3,-
..?.
3
Relative
,
+
Loudness
I
,
.
.
-
Level
\
\
k .,
7
j
'.A
.
/--
-
it
,
$
Fig. IV-C-3.
j
-.- . . .
+
TIME (rns )
. -.
--
1 .-.,A
..
,
Analysis of startina transient for a Gedackt Cd
organ pipe. Partials 1-9.
TlME ( m s )
D e r i v a t i ~of loudness level versus centre t i n e
of 10 m s band f o r a Gedackt C4 organ pipe. The
numbers indicate partial tones.
,Fig. IVq-6.
r.
.-,,-. : '
,,-;y,:
Fig. IV-C-7.
..:
TIME ( m s )
-1
.
.-,7-,.
.
....
Derivative of loudness level versus centre time of
10 m s band for (a) Gedackt Cq, (b) Principal Cq,
(c) Vox Humans Cq organ pipes. The p a r t i a l n u h e r
is marked against each graph segmnt.
STL-QPSR 4/1980
In the case of the Vox Humana, the fundamental is daninant
£ram 0 to 17 ms, the third p a r t i a l from 17 t o 32 ms, the 11-13th
partials £ram 32 to 43 ms, etc.
While such graphs do not constitute
the whole picture regarding the aural assessment of transients, they
are a t least useful in revealing the partials which have the greatest r a t e of change of loudness a t each interval of time. It is interesting to note the high r a t e s of change involved, in saw cases
w e r 2000 dB/s.
The above analysis implies that there is an analog of the
I
effect which is applicable t o rates of change of signal level.
It muld be of interest to conduct further psycho-acoustic tests re-
lating to this supposition.
Gating Experiments
5.2
$
Effects of three kinds of gatings were tried.
In the f i r s t one
modifications of the starting transients were investigated.
The
transients were cut w i t h an exponential f a s t gate to avoid click
sounds.
The i n i t i a l portions of tones were cut off i n steps froan 50
to 300 ms.
The effects of the cuttings varied fran small t o con-
siderable (especially for the Vox Humana) but did i n no case make
the sound ccanpletely different. The overall e f f e c t is to change one
quality factor of the tone.
,
I
In the second series the effect of cutting away the roam reverberation was tried.
W i t h the previously used gating system, tones
w i t h and without reverberation were carpared.
makes the tone sound unnatural.
Lack of reverberation
In the t h i r d series t m tones were
added w i t h the start of one delayed relatively t o the second.* The
delays t r i e d were 0 , i 25, i 50 ms.
The effect of this delay was
f a i r l y small even w i t h the 50 ms delays although the difference in
starting transients were perceived i n these cases.
A 50 ms time de-
lay corresponds t o 16 m difference in distance t o the pipes, i.e. a
f a i r l y large distance difference :seldom encountered by a listener.
5.3
Fourier Transform Method
Similar information to that provided by the sampled f i l t e r mthod
m y be obtained by digitising and storing the signal and then using a
*.
.
m e additions were i m p l m t e d by Ake Olofsson on the ccsnputer of
the Department of Technical Audiology, Karolinska Institutet.
1
f a s t Fourier transform t o cosnpute the spectrum of selected segmnts
of the signal.
I f a sampling r a t e of 25 kHz is used, then each 10
m s slice of the signal w i l l contain 256 points and a corresponding
I
I1
frequency spectrum is obtained w i t h a frequency resolution of 100
Hz.
Problems that arise w i t h this mthd include the definition of
the s t a r t i n g point of the analysis and the choice of a suitable
-
winduw function. Sane werlap of time s l i c e s is desirable - a t least
50%is often used. This method needs further dwelopmnt before it
can cclmpete w i t h the sampled f i l t e r mthcd for convenience of operation.
i
,
6.
T r i - s t i m u l u s Method
A basic problem in devising a satisfactory system of measure-
ment for transient sounds is to r e l a t e the acoustic procedures t o
those used by the ear and brain. Unfortunately, there is l i t t l e
detailed information available concerning the processing of neural
impulses a f t e r they leave the region of the cochlea.
,
This situation
is in contrast w i t h optical research where there is considerable in-
..
formtion available concerning the coding of optical signals.
p i t e the e n o m u s n&r
:
Des-
of receptors in the retina, colour vision
depends on only three different types of cone receptors.
Scune of the
possible coding operations that occur betmen the retina and the vi-
-
:
F
..
sual cortex have been discussed by Land (1977).
I
,
Sarrre clues a s t o possible auditory coding procedures cclme froan
psychoacci s t i c experiments relating to the timbre of musical sounds.
3,
-
Grey (1977) describes experimnts on a set of 16 music instrumnt
tones i n which perceptual similarities were determined for a l l pairs
of stimuli.
Application of multidimensional analysis sh-
that
three dimensions were sufficient to give a satisfactory representation of the data (see also Section 3). Plcanp and S-eken
(1973)
describe an experiment in which the relation between timbre and
sound spectrum for selected steady sounds was studied a t various
pints in a diffuse sound field. A dissimilarity matrix was constructed which showed that the ten test stimuli could be represented
satisfactorily by a three-dimensional space. Thus present tentative
information, such a s that provided by the above experiments, suggests
that the f i n a l assessment of auditory siqnals is in terms of a
nunher of derived parameters.
-11
Whether the number of such
parameters can be a s small a s three w i l l depend on the outcame of
m r e extensive experimentation.
On a related topic,
Mller (1979)
finds that six nvtasurement parameters are needed t o obtain reasonable correlation beheen objective and subjective descriptions of
high f i d e l i t y audio systems.
Time variations of amplitude and frequency are important in
the perception of sounds. Associated with these effects are memory processes, both short-term and long-term. A t present there
is no satisfactory method for incorporating such processes into
acoustical measurement procedures.
A s a starting point f o r
the develo-t
of a simplified m
e
w
f o r representing musical sounds, it was decided to use a 3-coordi-
nate methcd based on the spectral information derived f m the
sampled f i l t e r set. This method produces the level of each p a r t i a l
as a function of t i m e . The partials are groupd i f m r e than one
f a l l s within the sarrre c r i t i c a l band.
A choice must nuw be made of
the three spectral regions that are t o represent the three coordi-
nates.
It was decided to use the fun-ntal
a s one coordinate.
A choice must also be made of the unit in which to express the
f i l t e r outputs.
Since a nmber of arithmetic operations are t o be
made, it was decided t o convert the f i l t e r outputs into loudness u n i t s
in sones. I f the unweighted output levels are used, it is convenient
t o ccsnpute loudness values using Stevens (1972) Mark VII method.
When wei'*ted values were used, it was assumed that these values
were the equivalent of PLdB values f r m which sones were easily obtained using the table given i n Stevens ( 1972)
.
A s a f i r s t approximation, the output of each band may be con-
sidered to be independent, so that the total. loudness of the sound
may be expressed a s
I
I
.
-r
" .
I
.
:. .
I f an analogy is drawn w i t h the optical tri-stimulus method, each
?-
c . .
texm on the right-hand side of Eq. (1) may be regarded a s a tri-stimulus value f r m which a s e t of n o m l i s e d coordinates may be found:
." .
i
a
., tsi; !.
Frcm these coordinates, it is now only necessary t o select
Ism in order t o d r a w a graph since x + y + z = 1.
-L:w.
Fig. IV-C-8
shows a graph of x versus y with the significance of the main areas
indicated.
,
-:.I.
<
3,
I
Fig. IV-C-9 shm a tri-stimulus graph for the behaviour of
the s t a r t i n g transients of the following organ pipe sounds:
.1
I
-2
,i)
z:;
<.
3
.; '13:
I
(a) Gedackt 8 ' Cq, (b) Principal 8' C4, and (c) Vox Humans 8' Cq.
Tirrres measured f m the onset of the sound are shown beside each
curve. The f i n a l pint in each case corresponds w i t h the steady
sound. A nurmber of interesting features may be observed. It is
M i a t e l y apparent how the spectral character of the sound changes
.'
.]>
betmen c r ~ s e tand steady state. The Gedackt C4 has stronq high
p a r t i a l s i n i t i a l l y and then progresses t o a mre fun-tal
type
-*31
of tone. The Principal C4 starts with strong high partials and
i
a
then progresses t o an evenly balanced sound.
The Vox Humana Cq
I
starts with a predominance of lower p a r t i a l s and then m e s tawards
.
t3;4;;-:>
a predminance of higher partials.
This methcd may be used t o investigate the transient behavioux of different pipes within the same rank of pipes.
For in-
stance, Fig.I I17-C-10 shows n o m l i s e d tri-stimulus plots for three
notes of the Gedackt 8' rank, while Fig. N-C-11 and 12 show plots
for the same three notes of the Principal 8' rank and Vox Humans 8'
rank, respectively. While the Gedackt notes show a similarity of
behaviour, the Principal aFti! YOx H u m n a notes
show marked differ-
ences i n transient behaviour, although the steady state values are
I
I
Ad
Fig. IV-C-8.
Acoustic tri-stimulus diagram.
Fig. IV-C-9.
Acoustic t r i - s t i m l u s diagram shaving transient
behaviour of a Gedackt C4, Principal C4 and Vox
Humana C4 organ pipe. The nunhers indicate
t i m e in milliseconds a f t e r onset of sound. The
larger circles correspond t o the steady state.
I
'
Gedackt 8'
F . 1
0 Tri-stimulus diagram
s
~ the transient
u
behaviour of three
Gedackt 8 organ pips.
0.5
1.0
-x
>
.
'\
Fia. IV-C-11.
.
Tri-stimulus diagram
show in^ the transient
behaviour of three
Principal 8' organ
pipes.
\\
-
T
-
-..
'3
I
0.5
1.0
x-
i . -
1 2.
Tri-stimulus diagram
showing the transient
behaviour of three
Vox H i m a n a 8 ' organ
pipes.
. .
STL-QPSR 4/1980
in approximately the same area of the diagram.
In listening t o the
recorded sounds, the Principal E4 pipe had a slowly developing fundamental while the Principal Gq pipe had a prominent high-frequency
'chiff' i n the starting transient.
The Vox Humans E4 and Q4 notes
have a similar starting sound whereas the C4 note is markedly different.
7.
Discussion
&thods are now being evolved for the analysis of ccanplex sounds,
such a s the starting transients of musical sounds, that take account
of the known characteristics of the hearing process.
Factors that
are relevant t o such a system of analysis include (a) an i n i t i a l frequency analysis related to the c r i t i c a l bandwidths of the ear, (b)
a knowledge of the response times of the hearing mechanism, (c) masking effects, (dl the possible operation of the precedence effect, and
(e) m r y processes.
In our experiments w have shown that long-time-average-spectra
give a representation of the "steady-state" properties of tones f m
an instrument.
W e have shown t h a t the properties of starting tran-
sients are suitable to analysis i n sampled parallel f i l t e r bands and
a new method "the tri-stimulus method" has been introduced a s a
simple way t o s m i s e these p r o p r t i e s .
Not a l l of the factors (a) t o (e) above are sufficiently well
defined t o be incorporated i n a system of masurement (see Appendices
A and B ) .
For instance, there is a vagueness in the literature con-
cerning the minimum time required t o register a change in loudness
o r of timbre.
A knowledge of such times muld be important in defin-
ing the frequency of sampling and averaging of time-dependent transients. There have been few a t t e r s t o incorporate memory functions
into analysis procedures despite t h e i r importance in subjective evaluation of transient sounds.
Detailed information concerning auditory coding of signals is
not yet available.
Eventually t h i s type of information could lead
t o marked changes in analysis procedures. A satisfactory method of
analysis should be applicable t o sequences of musical notes since the
I
I
analysis of isolated notes is of little interest outside the laboratory. A critical test of any analysis procedm is to use the
analysis data as the basis for canparison by synthesis (c.f. Appen-
!
!
<
.
'
~.
...-..,,
..
,
.
!
, ,
..Zi
.j
L
' 8
,
"
>
:
;; .
' >
.;?.
;
7
,-
' : I",". ':Y%Il
., .,,<..:
..a
;,:,y,:
f
!-T.k3:33ri
y
:
:
;
?I-)
APPENDIX A
METHODS OF ANALYSIS FOR TRANSIENT AND STEADY STATES
Available methcds of analysis are nearly a l l forms of frequen-
cy analysis. L i t t l e progress has been made with analysis procedures
in the time damah.
*
A. 1
.
Modified Fourier Series ~ ~ 1 o d
The Fourier series methcd a s s m s that the signal is periodic
and i n f i n i t e in extent. One period of the signal may then be digitised and a line spectrum c q u t e d consisting of hanmnics of the assumed period. I f the signal is mildly aperiodic o r is varying slawly
with time, a s in a slowly growing o r decaying transient, t h i s method
rnay be approximated by assuming that each period of the transient may
be isolated and treated as i f it were part of an i n f i n i t e t r a i n of
such periods.
Thus, it is assumed that the sound pressure a s a function of
time can be represented by
I
where
, I
&(t)is the amplitude of the rmth p a r t i a l a t tim t ,
em(t)
is the phase angle of the m t h p a r t i a l a t time t.
w1
M
is the fundamntal frequency
is the order of the highest p a r t i a l q t e d
The values of amplitude and phase so computed are assumed t o
e x i s t a t the centre point of the chosen period.
Sources of error
include the definition of the period boundaries, the assmption that
the time function is quasi-periodic and limitations of numerical integration.
The method does not detect the presence of inhanmnic
tones o r give a measure of b a c k g r a d noise.
This methcd has been used by Keeler (1972a, b) for the determination of the growth i n amplitude of the p a r t i a l tones of organ
pipes. As pointed out by Moorer (1975), when this method is used
For extracting phase and frequency information, the values may vary
APPENDIX A
L l
',
dismntinuously £ran one period to another, which gives rise to
problems i f the data is t o be used l a t e r f o r synthesis. In addition, Keeler made use of Sinrpson's rule rather than use direct summation which could introduce aliasing problems f o r sounds having
greater harrnonic developwnt than organ pipes.
A.2.
Discrete Fourier Transform Method
F r ,
a
I
(1967) used the follcwing function to represent a
Frmusical note:
it 2i,
The function \(t) is a simple cascading of exponential attack functions
.
-?
and W,(t)
represents a frequency which changes by a discrete step w h ,
an ata t time t h,.
represents q l i t u d e , ,t
onset time, a
tack t i n e constant, @
i n i t i a l phase angle of the m t h w n e n t , and
,
u ( t - t) ,
is a step function dmse value changes £ran zero t o one a t
5:.
In order to evaluate the parameters, tsm transforms are used. The
D transform, defined by
!.-,
F ,
t
- > '.
;"
1
.a
-
1
STL-QPSR4/1980
Appendix A
I
i
.
}
{
:
A (n) sin nT [mu + 2nFm(n)]
X(n) =
m
m= 1
* <
,I
-
I
(A71
', -,;
I
where X(n) is the signal a t time nT, n is a tinre index, T is the time
between successive samples, w is the fundamental radian frequency of
the note, m is the partial number, Am(n) is the amplitude of partial
m a t time nT, M is the number of partials, and F,(n)
deviation of partial m a t t h nT.
I
i .
is the frequency
I
For the method t o function properly, it is necessary to estimate
the fundamental frequency of the waveform within a 2% deviation.
Each hammnic is extracted in turn, and plots made of amplitude and
,
. r frequency a s functions of tinre.
- - - ' bank of f i l t e r p ,-,
T --
The method is thus equivalent to a
.*.
d -
Sampled Analog F i l t e r Wthod
A.5
I
\
-....-<
- s
I
Steady signals are frequently analysed with the aid of a set of
: analog f i l t e r s (either 1/3 octave bandwidth or narrow band) whose
kt:
il- i 8 s outputs
,. .
i
1
LAL*.
.. - -Ti",:,
*
are scanned in series or i n parallel.
,J
For the analysis of transient signals, as i n speech and music,
a parallel bank of f i l t e r s is required whose outputs can be sampled
-
c.
. .*-
'.*
-
+
-'
-. .
):
3
a t short time intervals, a t least every 10 m s . Preferably, the bandwidth of the f i l t e r s should be variable. For certain analyses a pre-
2
#:I.-
emphasis netmrk may be required i f the m g i t u d e s of important high
f-
frequency components are low.
,.- of
,
.
I
A mdified form of t h i s method is to record
the output of a set
f i l t e r s in turn on a transient waveform recorder.
Although t h i s
-=A'
procedure is slow, it is feasible to record a t least 256 sample points
.
for each 10 m s of the waveform. Subsequent signal processing is per'nc
foby an interfaced ccarrputer.
3,
I
A. 6
Iong-Time-Average-Spectra (LTAS)
Jansson and Sundberg (1975) and Jansson (1976) describe a tech-
nique for deriving an average spectrum of sounds or sequences of
sounds that have a total duration of approximately 20 s. A bank of
. .
i
i
6
23 f i l t e r s is used having bandwidths similar to those of the c r i t i c a l
-- .
-.,.
bapds of the ear.
..c.
1
I
Appendix A
The output fram each filter is rectified, smoothed with a time
constant set to 13 ms, and converted to a logarithmic scale. The
logarithmic output £ram each filter is sampled at a frequency of 160
Hz, which means that several sampling points are obtained even of
tones as short as 50 ms duration. Thereafter, the sampled output
measures are quantized in steps of 1 dB, transfont& into p e r
measures, and added in storage cells in a cmputer, one cell for each
filter. Finally, the contents of every cell are normalized with r e s
pect to the analysing time, transformed into dB measures, and plotted
as function of frequency giving an LTAS. Thus an LTAS provides a
record of sound-poier band-level as function of frequmcy. The procedure allaws the results to be displayed w i t h essentially no delay
after the sampling is caplet&.
As a consequence of the procedure just described, the LTAS is
depndent on the fun-tal
frequencies, the spectrum envelopes, and
the intensity levels contained in the sound analyzed. It is notemrthy, that tones of high sound pressure level will contribute mre
than tones of law, as the LTAS averages the sound-pressure amplitude
M e d times the duration of each frequency cmpnent.
The ccmputer program allows canparison to be made of any of the
stored LTAS records. For instance, a dissimilarity measure may be
ccanputed between t m given LTAS :es
.
STL-QPSR4/1980
APPENDIX B
4
-
\
PREVIOUS ANALYSES OF TRANSIENTS
,
\.'
*.?:
,
I
..
'
-a
.,.
.
I
There are relatively few publications available with detailed
\
analyses of musical notes.
..J'
A
With the advert of d i g i t a l m t h o d s of
analysis an increasing n&r
of papers are appearing.
In the fol-
lowing notes, investigations have been grouped according to tradifif
,
. .,
.
tional classes of instrument.
d l ;
1,
,Y !i '
b.
t
B. 1 Pipe Organ
I
-jr
b a
'
One of the e a r l i e s t investigations was that of Trendelenburg
(1936) who masured the starting transients of a nwrber of organ
pipes by means of oscillograms of the build-up of sound i n successive
one-octave bands.
I
Nolle and Boner (1941) also examined starting tran-
-- i i s i e n t s , without using f i l t e r s , by photographing an oscilloscope display with a m i e camera. Deductions were made £ram the visual appearance of the oscillograms. For instance, it was observed t h a t ,
with open diapason pipes, the second partial was i n i t i a l l y dcaninant
during the starting transient.
daninated i n i t i a l l y .
Reed pipes start p q t l y with the fundamntal
prominent fram the outset.
.Z
" \ & - ,
u
With bourdon pipes, the f i f t h p a r t i a l
They also found that the time required
to reach steady speech required approximately the same nmber of
cycles regardless of the pitch of the pipe. They confirrcred Trendelenburg's observation that inharmonic p a r t i a l s may be present durinu
I
the starting transient state.
I
Richardson (1954) recognised the irrcportance of the starting trans i e n t with respect to i n s w t tone and conducted an investigation
of the i n i t i a l build-up of sound in organ pipes.
The microphone out-
put was displayed on an oscilloscope, together with a timing signal,
which was *tographed
by a w i n g film camera.
A l s o photographed
was a record of pressure changes in the pipe foot. Portions of the
film, a t approximately 10 m s intervals, =re enlarged and subjected
t o harmonic analysis (amplitude and relative phase).
I
Caddy and Pollard (1957) investigated the e f f e c t of the playing
action on the starting transient of a principal pipe.
controlling the starting transient and steady state *re
The factors
delineated.
89.
STL-QPSR 4/1980
Sundberg (1966) made a thorough study of the physical behaviour
of organ pipes including an examination of starting transients.
Oscil-
lograms are shown for principal and f l u t e pipes in which the qrowth
of the fun-tal
and of the upper p a r t i a l s have been separated by
filtering.
Keeler (1979b) applied a mdified Fourier series method to the
cmputer analysis of growth curves for a nLrmber of f l u t e , principal
and reed pipes.
W i t h t h i s method there is the need to identify each
cycle i n the sound emitted by the pipe, which is d i f f i c u l t t o do i n
the i n i t i a l stages of the transient.
In h i s published curves Keeler
shows plots of pressure amplitude versus time. When these are converted into loudness plots, the relative roles of some of the partials
a r e altered. Fig. IV-C-B.l shows loudness plots f o r the f i r s t few
partials of a gedackt, principal and trumrpet pipe.
Fletcher (1976) has developed a theory for the onset of sound in
an or9an pipe which involves non-linear couplin9 between the a i r j e t
system and the set of pipe n o m l modes. A sensitive factor i n developing a solution is the r a t e of r i s e of pressure i n the pipe foot.
In m y cases, the second p a r t i a l of the pipe is found to develop
mre rapidly than the fundamental. The frequencies of the upper part i a l s are not i n i t i a l l y harmonic being p r t l y determined by the resonances of the pipe alone. In the steady s t a t e a l l m d e s b e c m locked
into a hamronic relationship.
B.2
Piano and Harpsichord
Fletcher e t a 1 (1962) examined the decay of sound in a grand
piano using a wave analyser t o isolate the p a r t i a l tones. The rapid
attack precluded any detailed analysis of the starting transients.
Weyer (1976, 1976/77) made a thoroug!~examination of both the
time and frequency behaviour of the starting transients and decay of
various noted fram harpsichords, upright and grand pianos.
Alf redson and Steinke ( 1978) used the Fourier transform mthod
to examine
decay of a whole piano note but did not extend the
mthod to examine temporal dependence.
STL-QPSR4/1980
Appendix B
B.3
Strings
Beauchamp (1974) describes the analysis of violin tones that
were recorded i n an anechoic room and analysed by computer using a
d i f i e d Fourier series technique. The analysis provides fundamenta l frequency, hanmnic amplitudes and h m n i c phases a s a function
of time throughout the duration of each sound.
Grey (1975) describes the application of the heterodyne f i l t e r
method t o analyse variations i n amplitude and frequency of each p a r t i a l
tone of a musical sound a s a function of time.
Examples are given for
a number of different sounds including that of a cello note.
Fran the
original analyses, straight line segnaents are derived to show in a s
simple a m e r as p s s i b l e the time history of each p a r t i a l tone.
It
is found that each of the time functions can be satisfactorily represented by betmen 4 and 8 line segments, a s shown i n Fiq. IV-C-B.2.
This technique significantly reduces the m u n t of data storaqe necessary for l a t e r use i n synthesis of sounds.
B.4
Woodwinds
Grey (1975) also gives m examples of modwind tones reduced
to line segmnts. Fig. IT7<-R.3
a clarinet, f l u t e and oboe.
shows notes of the sanae pitch fran
Using the method described i n section A.3 of t h i s report, Bariaux
e t a1 (1975) give an application of t h e i r method to measmement of the
time histories of the f i r s t and third p a r t i a l s of a clarinet tone.
B.5
.
Brass
Rissett and Mathews (1969) used a d i f i e d Fourier series mthcd
t o analyse trumpet tones.
Since they were interested i n camparing
the original with synthesised tones, line segrmt functions were derived for the time histories of each p a r t i a l tone.
Factors arising
from the analysis t h a t =re found t o be important were:
( 1)
(2)
the spectrum envelope,
the attack transient which l a s t s for approximately 20 m s
f a s t e r build-up of lower order harmonics than
higher-order ones.
wit!!
(3)
a high-rate, quasi-random frequency fluctuation.
It was also found that low-order h m n i c s have a longer decay.
(c l
t
*
d
.!
Fig. n"C-8.2.
-
*
Heterodyne f i l t e r analysis of a
cellonote
frequency 314 Hz).
Each p a r t i a l curve has been approximated by s t r a i g h t l i n e segm t s (Grey, 1975)
(~2,
-
Amplitude
k
?J,
,!
Time
.
-,
l l
1
'
,
,
J
r!
0
.
Frequency
,
Fig. IV+-E3.3.
-
Heterodyne f i l t e r analysis of
notes £ran (a) an aboe, (b) a
f l u t e , (c) a c l a r i n e t . Each
note has the same pitch,
(Grey, 1975)
.
B.6
Percussion
There are few examples of the study of prcussion instruments,
probably because of the ccanplexity of the spectra which are often
mre noise-like than tone-like.
Moorer (1977) aives an example of
the overall analysis of a bass drum note.
'
Fletcher and Bassett (1 978)
describe the analysis of sounds frcnn the bass drum using computermodelled band-pass f i l t e r s with 3 Hz bandwidths.
The sounds were
analysed in 1 Hz steps f m 30-1000 Hz and frequency a s a function
of time, peak sound pressure level and decay r a t e determined f o r each
major cmponent.
-I
*,
STL-QPSR 4/1980
Pressure
-
3
__L___-----------
-
TIME ( m s l
r
.
-.?
.
i
t
- '
Fig. IVX-C.1.
Line segment representation
of s t a r t i n g transient of a
Gedackt C4 organ pipe.
..
..
.
.
.. ,.
Jansson, E.V. & Sundberg, J. (1975): "Long-Time-Average-Spectra
applied to analysis of music. Part I: Methcd and general applications", Acustica 34, pp. 15-19.
Keeler. J.S. (1972a): "Piecewise-periodic analysis of ahnst-periodic
sounds & musical transients", IEEE Trans. Audio & Electroacoustics
338-344. , *
AU-20, PE>.
1 ,
..$
.
, !
>, , .
Keeler. J.S. (1972b): "The attack transients of safe organ pipes",
IEEE Trans. Audio & ElectrQacoustics AU-20, pp. 3787,3$1.
Land, E.H. (1977): "The retinex theory of color vision", Sci.Amer.
237, No. 6, pp. 108-128.
-Land, E.H. & McCann, J.J. (1971): "Lightness and retinex theory", J.
@t. I30c.Am. 61, pp. 1-11. " .
",-,
-
.
,.
7
A
I 1
)a
~
1
#
. .
Lindbl~~t~,
B. & Sundberg, J. (1970): "Towards a generative theory
52, pp. 71-88.
of melody", Swed. J. of Musicology Luce, D. & Clark. M. (1965): "Durations of attack transients on nonpercussive orchestral instruments", J. Aaio Ehg. Soc. -13, pp. 194-199.
e r r H. (1979): "Multidimnsional audio: Parts I and 2", J.
27, pp. 386-393; -27, pp. 496-502.
Audio Ehg. Soc. M r e r , J.A. (I973): "Theheteroch'rtle m e t h d . of analysis of transient
wavefom", Artificial Intelligence Lab., Stanford University.
M r e r , J.A. (1975): "On the secpentation and analysis of continuous
musical sound by digital computer", Report No. STAN-M-3, Dept. of
Music, Stanford University.
M r e r , J.A. (1977): "Signalprocessingaspctsof ccanputermusic:
A SUTE~", Proc. IEEE 65, pp. 1108-1137.
* u.t.
Nolle, A.W. & Boner, C.F. (1941): "The initial transients of organ
pipes", J.Acoust.Soc.Am. 13, pp. 149-155.
Plcanp, R. (1976): Aspects of Tone Sensation, Academic Press, London,
pp. 111-142.
Plomp, R. & Steeneken. H.J.M. (1973): "Place dependence of timbre
in reverberant sound fields", Acustica 28, pp. 50-59.
Rash, R.A. (1978): "The perception of si.multaneous notes such as in
polyphonic music", Acustica 40, pp. 21-33.
Rash, R.A. ( 1 979): "Synchronization in performed ensemble music",
43, pp. 121-131.
Acustica Richardson, E.G. (1954): "The transient tones of wind instruments",
J.Acoust.Soc.Am. 26, pp. 960-962.
Rissett, J.C. & Mathews, M.V. (1969): "Analysis of musical instrurraent
tones", Physics Today 22, pp. 23-30.
L
'
-
.Ir
j
I.
Roedierer, J.G. (1975): Introduction to the Physics and Psychophysics
of Music, Springer-Verlag, New York, 2nd edition. ,
,
Russel, I.J. & Sellick, P. (1978): "Intracellular studies of hair
.- cells in the m l i a n cochlea", J. Physiol. 284, pp. 261-290.
'
Stevens, S.S. (1972): "Perceived level of noise by Mxrk VII ard
decibels (E)", J.Acoust.Soc.Am. 51, pp. 575-601.
,
,
,
Sundberg, J. ( 1 96.6): Mensurens betydelse i 6ppm labialpipor, Almqvist
& Wiksell, Uppsala.
'
Swdberg, J. (1978): "Synthesis of singing", Swed.J. of Musicology 60,
pp. 107-112.
"-'Terhardt, E ( 1974): "On the wrception of periodic sound fluctuations
30, pp. 202-213.
(roughness)",Acustica Trendelenburg, F. (1936): "Beginning of sound in organ pipes", Z f
tech.Physik 16, p. 513.
i
3
Wedin, L. ( 1 970): "Dimensionsanalys av perception av instrumentklancf"'
Inst. of P s ~ c ~ o ~ oStockholm
~Y,
University.
Weyer, R-D. (1976): "Time-frequency-structuresin the attack transients
of piano and harpsichord sounds - I", Acustica 35, pp. 232-252.
I
Weyer , R-D ( 1 976/77): " Time-varying amplitude-frequency-structures
in the attack transients of piano and harpsichord sounds - 11",
Acustica 36, pp. 241-258.
Zwicker, E. (1960): "Ein Verfahren zur Berechnung der LautslSrke",
Acustica 10, pp. 304-308.
Zwicker, E. (1977): "Procedures for calculating loudness of temporally
62, pp. 675-682.
variable sounds", J.Acoust.Soc.Am. Zwicker, E. & Feldtkeller, R. (1967): Das Ohr als Nachrichtenenpf3ngerI
S.Hirzel Verlag, Stuttcmk, 2nd ed.
.
. .
,
.
.