
Identifying the Scale in a Given (Western Classical) Musical Composition using Neural Networks
A project for the class of ECE/CS 539
Under the guidance of Prof. Yu Hen Hu
By Lyndon Quadros, University of Wisconsin-Madison
1/6/2014
Introduction
Scales, or keys as they are also called, play a very important part in almost all forms of
music. They provide the framework for composition and for musical aesthetics. A good musician
can readily recognize the scale/key of a given piece of music; for budding musicians and
music students, however, this is a skill they must work hard to acquire. When learning to
play a particular song just by listening to it, knowing which scale the song is in proves
very helpful. As a musician myself, I struggled with this aspect and was therefore driven to
apply my knowledge of Neural Networks to assist me, and others like me, in identifying the
scales of pieces of music.
Scales and Octaves:
In music, a scale is any set of musical notes ordered by fundamental frequency or pitch. This
is the general definition of a scale. A more specific definition can be given as:
A musical scale is a series of musical notes differing in pitch according to a specific scheme
(usually within an octave). An octave is a series of eight notes occupying the interval
between (and including) two notes, one having twice or half the frequency of vibration of the
other.
Western classical music generally uses heptatonic scales for composition. These scales
consist of seven notes and repeat at the octave. Notes in the commonly used scales are
separated by whole and half step intervals of tones and semitones.
Without delving further into music theory, I will now move on to the problem at hand
and a description of the scales I wish to identify using the trained network.
Scales to be Identified by the network:
The project focuses on identifying the two main classes of scales most frequently used in
Western Classical music viz. Major Scales and Minor Scales.
Each of these classes contains twelve scales. The scales in the class of Major Scales and the
notes they include (beginning from the root note and ending at the seventh note) are as
follows:
Note:
1. The number above each note indicates the note number (place of the note) in the scale.
2. If a particular note is followed by a '#', it is read as 'sharp', and if it is followed by a 'b', it is
read as 'flat'. For example, Eb is read as 'E flat' and C# is read as 'C sharp'.
3. A sharp of a particular note is the same as the flat of the next note. For example: C# is the
same as Db.
Scale Name   1       2       3       4       5       6       7
Amajor       A       B       C#/Db   D       E       F#/Gb   G#/Ab
Bbmajor      A#/Bb   C       D       D#/Eb   F       G       A
Bmajor       B       C#/Db   D#/Eb   E       F#/Gb   G#/Ab   A#/Bb
Cmajor       C       D       E       F       G       A       B
C#major      C#/Db   D#/Eb   F       F#/Gb   G#/Ab   A#/Bb   C
Dmajor       D       E       F#/Gb   G       A       B       C#/Db
Ebmajor      D#/Eb   F       G       G#/Ab   A#/Bb   C       D
Emajor       E       F#/Gb   G#/Ab   A       B       C#/Db   D#/Eb
Fmajor       F       G       A       A#/Bb   C       D       E
F#major      F#/Gb   G#/Ab   A#/Bb   B       C#/Db   D#/Eb   F
Gmajor       G       A       B       C       D       E       F#/Gb
G#major      G#/Ab   A#/Bb   C       C#/Db   D#/Eb   F       G
The minor scales and their notes are as follows:

Scale Name   1       2       3       4       5       6       7
Aminor       A       B       C       D       E       F       G
Bbminor      A#/Bb   C       C#/Db   D#/Eb   F       F#/Gb   G#/Ab
Bminor       B       C#/Db   D       E       F#/Gb   G       A
Cminor       C       D       D#/Eb   F       G       G#/Ab   A#/Bb
C#minor      C#/Db   D#/Eb   E       F#/Gb   G#/Ab   A       B
Dminor       D       E       F       G       A       A#/Bb   C
Ebminor      D#/Eb   F       F#/Gb   G#/Ab   A#/Bb   B       C#/Db
Eminor       E       F#/Gb   G       A       B       C       D
Fminor       F       G       G#/Ab   A#/Bb   C       C#/Db   D#/Eb
F#minor      F#/Gb   G#/Ab   A       B       C#/Db   D       E
Gminor       G       A       A#/Bb   C       D       D#/Eb   F
G#minor      G#/Ab   A#/Bb   B       C#/Db   D#/Eb   E       F#/Gb
If we ignore the note numbers and just look at the notes that make up a scale, we find that for
every major scale there exists a corresponding minor scale containing exactly the same notes
(though not necessarily with the same note numbers).
For example, Cmajor and Aminor contain the same notes, and so do Gmajor and Eminor. In the
terminology of music theory, Aminor is said to be the relative minor of Cmajor, and so on.
Although the note numbers carry a fair amount of importance in music theory, this
observation helps reduce the number of outputs of the neural network, i.e. it makes the
classification problem easier by halving the number of classes (provided we are willing to
give up the note-number information; how this is achieved is discussed in the feature
extraction part).
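As a quick illustration of this equivalence, the following MATLAB fragment (not part of the project code) checks that Cmajor and its relative minor, Aminor, contain the same set of pitch classes:

% Pitch classes are numbered 0 = C, 1 = C#/Db, ..., 11 = B.
cmajor = [0 2 4 5 7 9 11];            % C D E F G A B
aminor = [9 11 0 2 4 5 7];            % A B C D E F G
isequal(sort(cmajor), sort(aminor))   % returns logical 1 (true): same pitch-class set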
Hence, the twelve scales that eventually need to be identified are:
1. Cmajor/Aminor
2. C#major/Bbminor
3. Dmajor/Bminor
4. Ebmajor/Cminor
5. Emajor/C#minor
6. Fmajor/Dminor
7. F#major/Ebminor
8. Gmajor/Eminor
9. G#major/Fminor
10. Amajor/F#minor
11. Bbmajor/Gminor
12. Bmajor/G#minor
Feature Extraction and Pre-Processing
The features and the training and testing sets used for this project were constructed and
chosen by me. A detailed explanation of the feature extraction and of how the training and
testing sets were built is therefore given below.
Octaves and Frequencies:
The physical representation of the notes tabulated with the scales above is given by the
frequencies they correspond to. Each note occurs in every octave of frequency. As stated
earlier, the frequency doubles when we move up by one octave. For example, in the -3 octave
the note A is the sound at a frequency of 55 Hz; when we move one octave higher, to the -2
octave, the note A is the sound at a frequency of 55*2 = 110 Hz.
Although many more octaves are theoretically possible, the project is limited to octaves above
20 Hz, since that is roughly the lowest frequency audible to the human ear.
Key-Profiles:
The following was the single most challenging part of the project:
“What features should one use to possibly represent the scale/key a particular piece of music
is in?”
A very commonly used proposition and result from music cognition literature is Carol
Krumhansl and Mark Schmuckler's key-profile model. It posits that each key/scale has a
key-profile: a vector representing the optimal distribution of pitch-classes for that key/scale.
The key-profile for a major scale is as follows:
Note Number   1      b2     2      b3     3      4      b5     5      b6     6      b7     7
Note Rank     6.35   2.23   3.48   2.33   4.38   4.09   2.52   5.19   2.39   3.66   2.29   2.88
The note rank denotes the importance of each note in the scale/key. 'b' indicates 'flat'.
The key-profile for a minor scale is as follows:
Note Number   1      b2     2      b3     3      4      b5     5      b6     6      b7     7
Note Rank     6.33   2.68   3.52   5.38   2.60   3.53   2.54   4.75   3.98   2.69   3.34   3.17
Thus, the key-profile for C major will be as follows:
Note Name   C      C#     D      Eb     E      F      F#     G      G#     A      Bb     B
Note Rank   6.35   2.23   3.48   2.33   4.38   4.09   2.52   5.19   2.39   3.66   2.29   2.88
The statistical correlation of a key-profile with the probabilities of occurrence of each pitch
class in the piece yields a number called the correlation coefficient. Collecting the coefficients
of the pitch-occurrence probabilities with each of the twelve key-profiles yields a
12-dimensional vector, which acts as the feature vector for this project.
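In MATLAB, a single such coefficient can be obtained with corrcoef. The sketch below uses the C major profile tabulated above; the pitch-class probability vector p is made up purely for illustration:

majorProfile = [6.35 2.23 3.48 2.33 4.38 4.09 2.52 5.19 2.39 3.66 2.29 2.88];
p = [0.20 0.02 0.11 0.03 0.13 0.09 0.02 0.18 0.03 0.10 0.02 0.07];  % hypothetical probabilities for C..B
R = corrcoef(p, majorProfile);   % 2 x 2 correlation matrix
r = R(1, 2)                      % correlation coefficient of the piece with C major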
Pitch and Frequency Extraction:
Since the occurrence probabilities of the pitches in the song are required to build the
feature vector, the pitches and their occurrences must first be extracted. This was done with
the open-source pitch extraction platform Tarsos, which is available at
http://tarsos.0110.be.
This platform works best with the Tarsos-Yin pitch detection algorithm (Six and Cornelis,
2011).
The following is a screenshot of the pitch extraction performed during the course of this project.
The waveform in black is the waveform of the audio file whose pitches are to be detected.
The peaks in green above the waveform are the peaks of the pitch-class histogram, i.e.
the occurrences of each pitch normalised to one octave.
This information cannot be exported from the software in a format that can be used directly for
processing in MATLAB, in which the project was developed and tested. However, the
software does export the frequencies detected at every sampled instant.
These frequencies are normalised to frequencies within the -3 octave using the M-file
bsefreqtry.m. The notes and their corresponding frequencies in this octave are as follows:
MIDI Octave Name   Octave   MIDI Note Number   Note Name   Frequency (Hz)    Absolute Cents
--                 -3       24                 C           32.7031956626     4,500.00
--                 -3       25                 C#/Db       34.6478288721     4,400.00
--                 -3       26                 D           36.7080959897     4,300.00
--                 -3       27                 D#/Eb       38.8908729653     4,200.00
--                 -3       28                 E           41.2034446141     4,100.00
--                 -3       29                 F           43.6535289291     4,000.00
--                 -3       30                 F#/Gb       46.2493028390     3,900.00
Low                -3       31                 G           48.9994294977     3,800.00
Low                -3       32                 G#/Ab       51.9130871975     3,700.00
Low                -3       33                 A           55.0000000000     3,600.00
Low                -3       34                 A#/Bb       58.2704701898     3,500.00
Low                -3       35                 B           61.7354126570     3,400.00
The normalisation procedure is as follows:
First, we find how many semitones the given frequency lies above the base frequency of the
octave. The number of semitones is essentially the number of notes, independent of the
octave. The base frequency in this octave is that of the C note, C_{base} = 32.7031956626 Hz:

n = 12 \log_2(f / C_{base})

where f is the frequency of the pitch and C_{base} is the base frequency of the octave.
We then find the number of semitones this frequency is away from the base frequency within
its own octave. This is done by taking

b = (12 \log_2(f / C_{base})) \bmod 12

where mod denotes the modulo operation. The pitch extraction does not always yield a
frequency that exactly corresponds to a defined note; hence, to estimate the note it is
closest to, we round the value of b to the nearest integer.
Then, the corresponding frequency in the -3 octave is found by

f_{-3} = 2^{b/12} \cdot C_{base}
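A minimal MATLAB sketch of these formulas (not the actual bsefreqtry.m code; the example frequencies are made up) is:

Cbase = 32.7031956626;               % base frequency of the -3 octave (the note C), in Hz
f     = [220.0 329.63 987.77];       % example detected frequencies in Hz

b = mod(12 * log2(f / Cbase), 12);   % semitones above the octave's base note
b = mod(round(b), 12);               % snap to the nearest note; fold b == 12 back to 0
fNorm = 2.^(b / 12) * Cbase;         % corresponding frequencies in the -3 octave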
Once the frequencies are normalised, the probability of occurrence of each of the twelve notes
is computed. This gives a 1x12 vector. The statistical correlation of this vector with a
key-profile gives the correlation coefficient of the musical piece with that scale. Twelve such
coefficients, each representing the correlation of the musical piece with one scale, yield the
final feature vector.
An example of a feature vector is as follows:
0.18   -0.59   0.44   -0.35   0.36   -0.31   -0.04   0.63   -0.62   0.33   -0.29   0.27
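Putting the pieces together, the feature extraction can be sketched as below. This is only an outline under stated assumptions: the detected frequencies f are made up, the scale ordering (k = 0 for Cmajor/Aminor, k = 1 for C#major/Bbminor, and so on) is assumed to follow the list of twelve scales given earlier, and only the major key-profile is rotated to the twelve tonics, since how the minor profile is combined with it is not spelled out here:

Cbase = 32.7031956626;                          % C of the -3 octave, in Hz
f     = [220.0 261.63 329.63 392.0 523.25];     % hypothetical detected frequencies (Hz)
pc    = mod(round(12 * log2(f / Cbase)), 12);   % pitch class of each detected pitch (0 = C ... 11 = B)

counts = accumarray(pc(:) + 1, 1, [12 1]);      % occurrences of each pitch class
prob   = counts / sum(counts);                  % pitch-class probabilities (12 x 1)

majorProfile = [6.35 2.23 3.48 2.33 4.38 4.09 2.52 5.19 2.39 3.66 2.29 2.88]';
feature = zeros(1, 12);
for k = 0:11                                    % k = 0 -> Cmajor/Aminor, k = 1 -> C#major/Bbminor, ...
    R = corrcoef(prob, circshift(majorProfile, k));
    feature(k + 1) = R(1, 2);                   % correlation of the piece with scale k
end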
Training and Testing Sets
In order to build the training and testing sets, audio files of various Western classical
compositions by famous composers such as Beethoven, Bach, Mozart, Vivaldi, Verdi,
Wagner, Chopin, Rossini and many others were obtained or recorded. The scales/keys of
these compositions were provided in their descriptions.
One common feature of these pieces is that none of them contains percussion; they use only
polyphonic melodic/harmonic instruments. This constraint was imposed due to the limitations
of pitch-detection technology.
Then the following steps were performed:
1. The frequencies occurring in the songs were found by running a portion of each
song through the pitch detection algorithm.
2. These frequencies were normalised and their probabilities were found.
3. The correlation coefficients of the songs with the various key-profiles were found.
Step 1 was performed on the Tarsos platform. Steps 2 and 3 were done with the
bsefreqtry.m MATLAB code created for the project.
These features were then concatenated with the one-hot 12-bit binary representation of the
outputs.
An example of the outputs is as follows:
100000000000
This corresponds to the scale Cmajor/Aminor.
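A small sketch of how one training row can be assembled, assuming the class numbering follows the list of twelve scales above (so class 1 is Cmajor/Aminor) and reusing the example feature vector shown earlier:

feature    = [0.18 -0.59 0.44 -0.35 0.36 -0.31 -0.04 0.63 -0.62 0.33 -0.29 0.27];
classIndex = 1;                     % Cmajor/Aminor
target     = zeros(1, 12);
target(classIndex) = 1;             % one-hot output: 1 0 0 0 0 0 0 0 0 0 0 0
trainingRow = [feature, target];    % 24 values: features followed by the one-hot output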
A total of 180 training vectors and 80 testing data vectors were employed.
The Neural Network Employed
Since the problem boils down to a pattern classification problem, a Multi-Layered
Perceptron was trained using the back-propagation algorithm.
At first, a three-layered perceptron with one hidden layer was tested with different numbers of
hidden neurons for the same learning rate α and momentum factor µ, i.e.
for α = 0.1 and µ = 0.8.
The best results obtained were as follows:
No. of Hidden Neurons   Training Classification Rate (%)   Testing Classification Rate (%)
13                      97.17                              48.5322
14                      97.78                              48.5322
16                      97.22                              47.50
21                      98                                 47.0588
24                      100                                51.4713
25                      100                                50
26                      98.33                              47.0588
27                      100                                48.5322
Clearly, the best results are obtained with 24 hidden neurons. Hence, this configuration was
tried with other values of α and µ. Some of the better results obtained are as follows:
With 24 hidden neurons:
α      µ      Training Classification Rate (%)   Testing Classification Rate (%)
0.1    0.8    100                                51.4713
0.15   0.8    100                                33.8235
0.12   0.8    98.8890                            41.7650
0.1    0.7    98.33                              44.1176
0.1    0.6    100                                41.1176
0.1    0.5    100                                47.0588
0.1    0.4    100                                51.4706
0.1    0.3    100                                48.5294
0.1    0.35   100                                42.6471
0.1    0.45   100                                57.3530
Thus, the best classification rate is obtained with α = 0.1 and µ = 0.45.
The optimum weights obtained were used to create a network that can identify the scale of
a given piece of music, with some preliminary processing required. The method for employing
the identifier is elaborated in the appendix.
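For reference, a minimal sketch of such a network in MATLAB (assuming the Neural Network Toolbox is available; this is not the original training code) with 24 hidden neurons, α = 0.1 and µ = 0.45 could look as follows. X and T stand in for the real 12 x N feature and one-hot target matrices; random placeholders are used so the fragment runs on its own:

X = rand(12, 180);                        % placeholder feature vectors, one column per piece
labels = randi(12, 1, 180);               % placeholder class labels
T = eye(12); T = T(:, labels);            % one-hot targets, one column per piece

net               = patternnet(24);       % one hidden layer with 24 neurons
net.trainFcn      = 'traingdm';           % gradient-descent back-propagation with momentum
net.trainParam.lr = 0.1;                  % learning rate (alpha)
net.trainParam.mc = 0.45;                 % momentum factor (mu)
[net, tr]         = train(net, X, T);

Y = net(X);                               % network outputs
[~, predictedClass] = max(Y, [], 1);      % predicted scale index for each piece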
Conclusion
Although a 57% classification rate is not remarkable, it suggests that the problem of key/scale
identification in Music Information Retrieval can be approached in a more cognitive manner
using Neural Networks.
In this direction, Support Vector Machines could be used to work further on this problem and
achieve better classification.
References
1. Pitch Histograms in Audio and Symbolic Music Information Retrieval - George Tzanetakis, Andrey Ermolinsky, Perry Cook
2. Tarsos, a Modular Platform for Precise Pitch Analysis of Western and Non-Western Music - Joren Six, Olmo Cornelis, Marc Leman
3. Cognitive Foundations of Musical Pitch (Oxford University Press, 1990) - Carol Krumhansl
4. MELISMA - The Key Program: http://www.link.cs.cmu.edu/music-analysis/key.html
5. http://www.tonalsoft.com/pub/news/pitch-bend.aspx
APPENDIX
The scale_identify.m MATLAB code submitted along with the report has been developed to
identify the scale of a given piece of music.
In order to use the program, one must first extract the "annotations" of the music piece using
the Tarsos platform mentioned above. The values exported from the software are obtained in
a comma-separated values (.csv) file. These can be copied to a plain text file in Notepad.
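As a hedged sketch, such a text file could be read into MATLAB as below; the column holding the detected frequencies (assumed here to be the second column) depends on the Tarsos export format, so the index may need to be adjusted and any header line removed first:

data  = dlmread('annotations.txt', ',');   % comma-separated annotation values
freqs = data(:, 2);                        % detected frequencies in Hz (assumed column)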
Once the text file containing the annotations of the piece of music is obtained, it can be fed as
input to the scale_identify program. The output of the program looks as follows: