Automatic notation of played music (status report)

Dept. for Speech, Music and Hearing
Quarterly Progress and
Status Report
Automatic notation of played
music (status report)
Askenfelt, A.
journal:
volume:
number:
year:
pages:
STL-QPSR
17
1
1976
001-011
http://www.speech.kth.se/qpsr
MUSICAL ACOUSTICS
A. AUTOMATIC NOTATION OF PLAYED MUSIC (STATUS REPORT)
A. Askenfelt
Abstract
A computer program automatically converting sounding music
into written music is presented. A hardware system i s used for
the identification of r e s t s and for pitch detection. A statistical
treatment of the pitch periods serves a s the basis for the identification of the actual scale used in each individual melody. An analogous
treatment of the durations defines the note values. The transcription obtained i s presented in ordinary notation. The method used
seems promising but further development is needed, especially a s
regards the pitch detection.
1. Introduction
In Sweden, folk music has been collected on tape systematically for
the last 25 years, resulting in a large collection of tape recordings.
The work i s carried out by The Swedish Center for Folksong and Folk
Music Research. Today this collection contains more than 2 5 . 0 0 0 melodies and the number increases with about 3000 m e l ~ d i e s / ~ e a r .
Folk music i s interesting for various reasons.
1)
Some melodies a r e beautiful pieces of music. They a r e still played
and listened to for the same reasons a s the so-called llclassicallt
works.
2) Folk tunes a r e often very old. Therefore they may tell us about the
conditions under which man lived in the past, In this way a collection of folk tunes i s a contribution to the history of culture, comparable with collections of paintings, books, clothe s, et c,
3) A large collection of folk music i s a rich research object for music
theorists.
Only rarely i s the folk music available in written form,
The tunes
are taken over from one player to another directly without the aid of any
notation.
However, the accessibility of the items in a collection of this magnitude requires that the melodies a r e available in ordinary notation.
Other
systems possible for notation of music such a s numerical codes, graphical
representation a s obtained from a "melody writer" et c, have advantages
when used for special purposes; but in most cases the conventional notation i s preferable not least because the users of this collection including
musicelogists a r e familiar with and even tend to think in terms of ordinary
notation.
The material must also be stored in such a way that i t i s pos-
sible to make rapid searches through the melodies, using even rather
cnmplex search criteria.
The requirements mentioned above seem to make the task of notating
and storing the melodies suitable for a computer. Attempts with automatic notation of played music have been made earlier(').
This paper r e -
ports the development of
computer program and hardware equipment
for this purpolse during 1975.
2.
Hardware equipment
The hardware equipment used in the project i s seen in Fig, I-A-I,
The tape signal passes f i r s t a Rest Detector and Oscillator (RDO) which
I
I
I
DISC
STORAGES
TAPE
REST
DETECTOR
PITCH
EXTRACTOR
REST o s c I L L A t o ~
I
COMPUTER
CONSECUTIVE
PERIODTIME
couhrTER
IGRAPHICAL I I
TERMINAL
I
.
123456
VALS
b
-
-
*
GRAPHICAL
PLOTTER
+,
I
FUNDAMENTAL
FREQUENCY
HISTOG RAM
"d
123456 V A B
NOTATION
I
123456
VAL$
3I
21
-
0 -i
Fig. I-A- I.
260
II 400
I! I
DURATION
HlSTOGR A M
800 t msi
Flow diagram of automatic notation of played music.
7-MELOBY
LIBRARY
3.
STL-QPSR 1/1976
attempts to identify pauses.
The signal from the RDO i s fed into a
Pitch Extractor (PE) which extracts the fundamental frequency out of
the complex signal and converts i t into a pulse train.
i s a notorious problem in music and speech research.
Pitch detection
Although sophis-
ticated and reliable methods for pitch detection by means of computer
now a r e avialable, they a r e not perfectly suited in this case, because
they a r e still rather time consuming.
I
It i s important that the computer
can provide a notation of a melody a t least a s rapid a s a skilled professional.
Thus, it is not acceptable to wait 5 o r 10 times r e a l time only
for pitch detection processing.
Therefore two hardware systems for r e a l
time pitch detection a r e being tested a t the moment.
One enhances the
pitch periodicity in the time domain while the other accomplishes
spectrum flattening in the frequency domain.
-2.-1-Pitch
- - - detectors
----a)
TRIO
This method i s based on the well known trigonometrical identity
2
2
sin x cos x = I. Thus the name TRIO ("The Trigonometrical I").
+
Consider one period of a bandpass filtered periodic signal where only
one formant remains, i. e. the output of a pulse excited resonance circuit.
The time function u(t) in Fig. I-A-2 i s then expressed by
u(t) = ~ e - cosm
' ~ t
n
where % = 2nFn
( 1)
Fn = formant frequency
Processing this signal in the network in Fig. I-A-3 gives:
Phase shifting Eq. ( I ) 90'
Squaring ( I ) and (2) and adding forms:
2
2 -2pt
~ ( t =) A e
(sin mnt
+ cos 2m nt)
2 -2pt
=A e
( 3)
Thus, this processing enhances the pitch periodicity of the signal
and ideally only a stressed envelope remains.
of a pulse excited resonator i s not correct.
In reality the assumption
Furthermore i t i s some-
times difficult to isolate one formant in music o r speech with simple
means.
However, despite these difficulties &e method seems nromising.
Typical examples of performance a r e shown in Fig. I-A-4.
The TRIO is followed by an ordinary peak detector, the highest peaks
triggering a mono stable flip-flop, which produces the pulse train.
I
I
F i g . I - A - 5.
Id2alizcd TRIO signal waveforms.
u
= b a n d p a s s filtered signal,
z i t J = p r o c e s s e d signal
Fig. I - A - 3 .
Block d i a g r a m of the TRIO.
R P = b a n d p a s s filter, + 9 0 ~
= phase shifr c r , x = m*lltiplier.
Fig. I - A - 4 .
Typical performance of the TRIO.
Upper t r a c e : Unprocessed signal.
Lower t r a c e : Signal p r o c e s s e d by TRIO and peak detector.
a:
Male voice
[$
1,
F o = 125 Hz.
b:
Violin, pitch D4, F o = 294 Hz.
S P E C T R U M FLATTENER
Fig. I-A-5.
Block d i a g r a m of the s p e c t r u m f l a t t e n e r .
Each of the s i x channels c o n s i s t s of bandpass f i l t e r ,
c l i p p e r , and lowpass f i l t e r . See text.
1
kHz
kHz
Fig. I-A-6,
Spectra of violin tones. The solid line is the
spectrum envelope after the spectrum.,flattening.
a: Pitch D3, Fo = 147 H z (obtained from D4
by halving the tape speed)
b: Pitch B3. Fo = 247 H z
%
I.
COMPUTER
min
1 0 0 0 ~ s
-I
0
500 ns
2 Mc
+
STROBE
-T€-
s
RESET
T T L - C I R C U I T S T E X A S INSTRUMENTS
1
5
-
4 SN7493 4 - b i r B i n a r y C o u n t e r
8 SN7475 4 - b i t B i s t a b l e L a t c h e s
9 SN74123 M o n o s t a b l e M u l t i v i b r a t o r s
10 SN74121 Monostable M u l t i v i b r a t o r s
11 SN7404 H e x I n v e r t e r s
PC0
INTERRUPTSIGNAL
LOGICAL
SIGNALS
12 SN74124 Voltage C o n t r o l l e d O s c .
13 SN7474 D u a l D-type F l i p - F l o p s
Fig. I-A-7.
Circuit diagram and logical rignrlr of the CPTC.
3r
C
7
8
STL-QPSR 1/1976
me n-t
-2. -3 -Computer
- - - - -e q u 9 The computer used i n the p r o j e c t is a CDC 1700.
Basic specifications:
1; 1 p s
16 b i t s
16 k
2x1.5M
Memory cycle
Word length
Core memory
Disc memory
The p r o g r a m s a r e w r i t t e n i n a s s e m b l e r language.
3.
Computer p r o g r a m
The p r o g r a m for automatic notation is m o s t easily d e s c r i b e d by fol-
lowing the p r o c e s s i n g of a specific melody f r o m sound to notes p r i n t e d
on a staff.
The melody shown i n Fig. I-A-8 a s played on a r e c o r d e r will
s e r v e a s a n example.
The o v e r a l l s t r a t e g y i s based on a s t a t i s t i c a l
p r o c e s s i n g of the events i n the m u s i c , f i r s t concerning the frequency
domain and then, utilizing the r e s u l t s t h e r e b y obtained, the t i m e domain.
The s t a t i s t i c a l r e s u l t s s e r v e a s a t e m p l a t e f o r the quantization of the
m u s i c a l events i n the melody.
I
-3.-I -The
- - -Fundamental
- - - - - - -F-r e g u-e n-c y -H-i s-t o g r-a m
The tape with the melody i s played on a tape r e c o r d e r .
All pitch
period t i m e s provided by the h a r d w a r e equipment e a r l i e r d e s c r i b e d
a r e r e c o r d e d on the computer' s d i s c s t o r a g e .
N e a r l y one million w o r d s
a r e saved on the d i s c s t o r a g e for t h i s p u r p o s e , which c o r r e s p o n d s to a
m a x i m u m r e c o r d i n g t i m e of 750 s e c = 12 m i n a t 1000 Hz.
The com-
puter c o n v e r t s the p e r i o d t i m e s into frequency values and a s s o r t s the pitch
p e r i o d s according to t h e i r f r e q u e n c i e s i n a fundamental frequency histog r a m ( F F H ) . The F F H i s p r e s e n t e d to the o p e r a t o r on the s c r e e n of
the graphic t e r m i n a l .
i n F i g . I-A-9.
The F F H of the melody i n the example i s shown
F r e q u e n c i e s o c c u r r i n g frequently i n the melody, i. e.
the s c a l e tone f r e q u e n c i e s , a p p e a r a s m o r e o r l e s s pronounced p e a k s
along the l o g a r i t h m i c a l frequency a x i s .
to t i m e .
The y- coordinate c o r r e s p o n d s
The number of r e c o r d e d p e r i o d s a s well a s the number of
I
I
p e r i o d s falling outside the allowed frequency r a n g e (70-1000 Hz) are a l s o
given i n the FFH.
Any p a r t of this frequency r a n g e c a n be expanded to
fill up the s c r e e n .
T h e c u r v e can be smoothed with a n averaging algo-
r i t h m i n o r d e r to r e m o v e r i p p l e i n the contour.
If so, each h i s t o g r a m
Fig. I-A- 8.
"Fiuke skar rviran" in conventional notation.
STL-QPSR 1/1976
value fn is modified by
c,f.Fig. I-A-lOa and lob.
It i s now possible to define a scale which is perfectly adapted to the
actual melody played.
The operator points a t the scale tones, i. e. the
peaks in the FFH, with a movable cursor line, thereby defining the cente r frequencies of the scale tones.
a r e marked specially.
The tonic and possible accidentals
A pattern corresponding to the scale tone fr e-
1
quencies of an equally tempered major- o r minor scale i s available in
order to a s s i s t in the identification of the peaks, Fig. I-A- I I.
When this h a s been accomplished by the operator, a processed FFH
i s presented, Fig. I-A-12.
T h e frequency a d s i s now divided into parti-
tions, each corresponding to a scale tone.
All frequencies within such
a partition i s considered a s representing a specific scale tone.
When-
ever the fundamental frequency of the melody falls within one such tone
partition, the melody i s assumed to dwell on the corresponding scale
tone.
The limits between these partitions a r e located a t the geometrical
mean of two adjacent center frequencies.
In most cases these limits
agree with the absolute minima between two neighboring peaks.
The FFH also gives the intervals of the scale tones relative to the
The extension index $ tells how great a per cent of the
tonic in cents.
total recorded time the melody spent in that particular scale tone partition, i. e. to what extent the melody used that particular scale tone.
The intonation index a, i s a measure of the distribution of a peak thin
a tone partition.
This index i s obtained by transforming the peak into a
symmetrical triangle peak having the same height and a r e a a s the original peak.
The base of this triangle, expressed in cents, i s called in-
tonation index
It informs about the distribution of the peak, e. g. the
acuity of the intonation. This information mentioned above i s often useful
a,.
in folk music research.
Also, the sounding part of the recorded time a s
well a s the frequency of the tonic and the type of scale pattern used a r e
given in the FFH.
The main purpose of the FFH i s to establish a notation staff which i s
optionally adapted to the melody actually played, i. e. the frequency definition of the lines and spaces of the notation staff a r e determined in
Fig. I-A- 10.
a: FFH not smoothed.
b:
F F H smoothed once.
F r e q u e n c y scale i n H z x 100.
Fig. I-A- 11,
P a t t e r n of a n equally t e m p e r e d G-minor scale matched with
r e s p e c t to t h e tonic (G) peak in the F F H .
7
W
TIHE =
28 SEC
TONICo584 HZ
'"z
4
a UJ
%T
'2
=+Y
8
2
V
-
.
.
8
Ul
)40U
o*m
c.
%?
(0
.
tn
7T
m
cn
2
7i
I:
S
m cf r- r"
#
#
#
C)
m
Y
3
X
J:
27 Jm
2-
4-
C
0"
A
Z6
I)
.
3-
I
2-
1-
.'
..
0
1
5
:
Fig. I-A- 12.
7
:
9
:
I
10
P r o c e s s e d F F H of ttFiskesk%rsvisan". The numbers a r e
explained in the text. Fre2uency scale in H z x 100.
7.
STL-QPSR 1/1976
The i n t e r -
accordance with the distribution of fundamental frequencies.
vals between the lines and spaces of this staff may thus differ m o r e or
l e s s f r o m the corresponding intervals of the equally tempered scale.
However, i t i s essential that the distance between e. g. two lines o r two
spaces r e p r e s e n t s an interval a t l e a s t resembling a third.
This may
necessitate insertion of scale tone frequencies i n the FFH even in places
where there a r e no peaks.
This applies a s soon a s a scale tone r e -
presented on the staff i s missing in the melody.
In detecting such cases,
the operator is efficiently a s s i s t e d by the scale pattern mentioned above.
I
I
I
The F F H presents complete statistics over the frequencies of a l l
pitch periods in the melody processed, and thus gives an average of the
scale used during the melody and nothing more.
If the player changes
the scale during the melody, i. e. i f a certain scale tone i s not always
represented by the same frequency value whenever i t occurs, the peaks
in the F F H will broaden and eventually merge.
I t may be impossible to
define a useful s e t of scale tone frequencies on the basis of such a n F F H .
If, however, the scale changes abruptly o r continuously while the intervals between the scale tone frequencies r e m a i n the same, i t will be possible to track the scale and r e s t o r e i t to the starting pitch.
If, on the
other hand, even the intervals change considerably, e. g. depending on
their position i n the musical context, no scale can be defined on the basis
of the F F H .
However, in such c a s e s , the idea of a musical scale must
be said to be ambiguous, and one would question if i t i s meaningful to
force such music into the f r a m e work of conventional notation.
1
I
The r e m a r k s above have concerned the F F H a s a basis for notation.
However, a s mentioned e a r l i e r , the F F H itself can s e r v e a s a powerful
tool for measuring e . g. scales, intervals, tone distribution in a melody
and other factors of relevance to the field of ethnomusicology.
I
!
-3.-2 -The
- - -Duration
- - - - -Histogram
-- -Given the information contained in the F F H on the frequency l i m i t s
I
between adjacent scale tones, the recorded string of period times can
be transformed into a string of pitch quantified melody tones.
The dura- '
tion of these tones a r e obtained by summing up all period t i m e s belonging to a given tone.
Boundaries between consecutive melody tones i s
detected either a s a change of the scale tone, o r in the c a s e of tone r e petitions, by discontinuities in the period time values due to transition.
STL-QPSR 1/1976
8.
The durations of the melody tones can now be a s s o r t e d into a duration
h i s t o g r a m (DH), analogous to the FFH.
H e r e the l o g a r i t h m i c a l x-axis
r e p r e s e n t s t i m e and the y-axis the number of melody tones.
for the melody i n the example i s shown i n Fig. I-A- 13.
The DH
The melody
tone durations a r e generally distributed i n groups corresponding to half
notes, q u a r t e r notes, eight notes, etc.
The t i m e resolution is 3
Ole and
the allowed duration r a n g e 50- 2500 m s .
The o p e r a t o r now h a s t o define a typical t i m e value f o r one group of
T h i s i s e a s i l y accomplished by m e a n s of the c u r s o r line
durations.
mentioned.
F u r t h e r m o r e , the o p e r a t o r decides which type of note value
this group r e p r e s e n t s .
After t h i s h a s been accomplished, the computer
defines the center t i m e values f o r a l l remaining note values and divides
the t i m e a x i s into duration p a r t i t i o n s , each of which c o r r e s p o n d s to a
note value, Fig. I-A- 14.
T h i s i s done i n a c c o r d a n c e with the nominal
t i m e values, i. e. a half note i s twice a s long a s a q u a r t e r note etc.
Using t h i s information the computer now quantifies the durations of the
melody tones.
A l l \ m e l o d y tones falling into a given duration partition
will be r e p r e s e n t e d by the note value a s s i g n e d to this partition.
The r e -
sult i s a s t r i n g of p a i r s of quantified n u m b e r s , i. e. the played melody
h a s been t r a n s f o r m e d into a s u c c e s s i o n of c e r t a i n s t a t e s with fixed
pitches and durations, i n other words, into a notation.
The m a i n purpose of the DH i s t o e s t a b l i s h a b a s i s f o r the coming
notation.
The DH is a complete s t a t i s t i c a l t r e a t m e n t of the durations
of the melody tones.
However, t h i s t r e a t m e n t i s considerably m o r e sen-
sitive to e r r o r s than the F F H , because of the s m a l l s i z e of the underlying s t a t i s t i c a l m a t e r i a l , i n t h i s example about 30 notes.
player often changes the tempo during the melody.
Also, the
In such c a s e s the
groups corresponding to the different note values will m e r g e .
the DH will a s s u m e the shape of a splint fence.
At w o r s t
This can be avoided if
only a p a r t of the melody with e s s e n t i a l l y constant tempo i s s e l e c t e d for
p r o c e s s i n g , but this l e a d s to a n even s m a l l e r s t a t i s t i c a l m a t e r i a l r e sulting i n i n c r e a s e d uncertainty.
Thus, t h e r e a r e c e r t a i n p r o b l e m s a s -
sociated with the DH a s the b a s i s f o r the quantization of the durations of
the melody tones.
A method which combines the f e a t u r e s of a s t a t i c a l
DH o v e r the whole melody with a m o m e n t a r y DH which dynamically
adapts to tempo changes, s e e m s m o r e promising.
It i s also important
I
OUTGIH TONES SHORT LONE
138
TONES
Fig. I-A- 13.
'
32
239
1
Unprocessed DH of llFiskesk~rsvisan".
Time scale in s e c x 0. 1.
OUTLIM = number of periods falling outside the extreme frequency limits in the FFH.
TONES = number of tones in the DH.
SHORT = number of tones with durations shorter than 50 m s .
LONG = number of tonee with durations longer than 2500 ms.
Fig. I-A- 14..
Proc*eaed DH of "FiakeskXrsvisan",
T i m e scale in sec x 0. 1.
STL-QPSR 1/1976
to remember that there a r e other techniques possible for solving
the problem with duration quantization, e. g. sequential analysis, some
f o r m of hypothesis testing, and adaptive variants of pattern recognition.
One aspect of the DH that must not be forgotten i s that i t shows the
distribution of the durations of the musical events in a melody.
This
can be of interest in comparing different styles of playing and similar
questions in ethnomusicology.
-3.-3 -The
- - -notation
---'
After the melody has been processed a s described above, a string
of p a i r s of quantified numbers i s obtained.
The transformation of that
string into a printed notation i s easily accomplished by the computer.
The notation of the melody in the example i s shown in Fig. I-A- 15.
I
The
operator may decide any type of measure to be used in the notation.
The
Presently, all melodies a r e notated in either G-major or G-minor.
discrepancies in the notation compared with the original notation in Fig.
I-A-8 a r e few and not important to the identification of the melody.
The
discrepancies appear in bar 4 and 6 relative the original notation in t e r m s
of r e s t s due to breathing and phrasing.
The two final notes a r e prolonged
because of a slight ritardando.
These discrepancies illustrate some of the difficulties associated
with automatic computer notation of played music.
The output reflects
details in the actual performance, e. g. prolongings and shortenings of
the tones, insertion of r e s t s etc.
, which i n conventional notation appear
a s diacritic signs o r which a r e simply omitted and left to be realized
by the performed musical understanding.
Such discrepancies may r e -
duce the legibility of an automatic computer notation.
The most serious shortcoming in the notation in Fig. I-A- 15 i s the
absence of bar lines.
Although this melody i s of very simple structure,
it i s not quite trivial to read the notation without the support of such
lines.
Ideally i t would be sufficient to inform the computer about the
location of the f i r s t bar line a s the remaining bar lines can be obtained
simply by adding the note values according to the type of relevant
measure.
In practice, however, the final placing of the bar lines must
be preceded by a routine which adjusts the actual note values of the melody tones to f i t bars.
A possible basis f o r such a routine would be a
1
TIME
NOTATI 0N
CEN
1501
J
s
v
DEVIATION
OSClLLOGRAM
TIME
NOTATION
DEVIATION
OSCILLOGRAM
Fig. I-A- 16.
Notation without quantified durationr, see text.
STL-QPSR 1/1976
11.
agreement with conventional notation.
The melody used a s an example
in this paper i s very simple and much refinement and enlargement of the
computer program and hardware equipment i s needed in order to facilitate
successful processing of m o r e complex melodies.
seems promising.
Still the strategy used
A goal for the future i s a computer system capable of
processing large a r r a y s of melodies in a uniform way.
The developmental work in the near future i s planned a s follows:
1) A closer evaluation of the perfotmance of the pitch extractors
under test.
2 ) Implementation of strategies for digital filtering of the recorded
period time values.
3) Development of a routine for recording the intensity level values
simultaneously with the period time values, in order to obtain a
better recognition of pauses and tone repetitions.
4)
Development of a transpose routine that allows the notation of a
melody i n optional key.
5) Development of the rule systems and manually routines f o r insertion of bar lines.
6)
Connecting MUSSE (Music and Singing Synthesis Equipment) to the
computer in o r d e r to make i t possible to listen to the notated melody.
In the m o r e distant future l i e s the processing of a number of melodies
and the collecting of them in a melody l i b r a r y and also the development
of search routines and associated programs.
Acknowledgments
This project i s supported by the Swedish Humanistic Research Council,
Contract No. 7914.
The author i s indebted to Johan Sundberg, Pekka
Tjernlund, e specially a s regards the idea of the TRIO, Kjell Elenius and
Mats Blomber g for valuable discussions and hints.
References
( I ) SUNDBERG, J . and TJERNLUND, P. : "A computer program for the
notation of played music", STL-QPSR 2-3/1970, p. 46.
(2) SONDHI, M. M. : "New methods of pitch extraction", IEEE Trans.
Audio Electroac. , Vol. Au- I6 (1968), pp. 262-269.
(3) H O L M S T R ~ MS.
, : "Grundtonsdetektering med spektrurnutjZmningtl,
Thesis work, Dept. Elec. Eng. , KTH; to be publ. June 1976.
(4) REMMEL, M . , RUUTEL, 1. , SARV, J . , and SULE, R. : "Automatic
notation of one-voiced song", Acad. of Sciences of the Estonian
SSR, Inst. of language and literature. P r e p r i n t KKT -4, Tallin
1975.