Identifying the Scale in a Given (Western Classical) Musical Composition using Neural Networks

A project for the class of ECE/CS 539
Under the guidance of Prof. Yu Hen Hu
By Lyndon Quadros, University of Wisconsin-Madison
1/6/2014

Introduction

Scales, or keys as they are also called, play a very important part in almost all forms of music: they provide the framework for composition and musical aesthetics. A good musician can readily recognize the scale/key of a given piece of music; for budding musicians and music students, however, this is an ability they yearn for. When learning to play a particular song just by listening to it, knowing which scale the song is in proves very helpful. As a musician myself, I have struggled with this, and was therefore driven to apply my knowledge of neural networks to help me, and others like me, identify the scale of a piece of music.

Scales and Octaves:

In music, a scale is any set of musical notes ordered by fundamental frequency or pitch. This is the general definition; more specifically, a musical scale is a series of musical notes differing in pitch according to a specific scheme, usually within an octave. An octave is a series of eight notes occupying the interval between (and including) two notes, one having twice or half the frequency of vibration of the other.

Western classical music generally uses heptatonic scales for composition. These scales consist of seven notes and repeat at the octave. The notes in the commonly used scales are separated by whole-step and half-step intervals (tones and semitones). Without delving further into music theory, I now move on to the problem at hand and a description of the scales the trained network is meant to identify.

Scales to be Identified by the Network:

The project focuses on identifying the two main classes of scales most frequently used in Western classical music, viz. major scales and minor scales. Each of these classes contains twelve scales. The scales in the class of major scales, and the notes they include (beginning at the root note and ending at the seventh note), are listed below.

Note:
1. The number above each note indicates the note number (the place of the note) in the scale.
2. If a note is followed by a '#', it is read as 'sharp'; if it is followed by a 'b', it is read as 'flat'. For example, Eb is read as 'E flat' and C# as 'C sharp'.
3. The sharp of a particular note is the same as the flat of the next note. For example, C# is the same as Db.
Scale Name   1       2       3       4       5       6       7
Amajor       A       B       C#/Db   D       E       F#/Gb   G#/Ab
Bbmajor      A#/Bb   C       D       D#/Eb   F       G       A
Bmajor       B       C#/Db   D#/Eb   E       F#/Gb   G#/Ab   A#/Bb
Cmajor       C       D       E       F       G       A       B
C#major      C#/Db   D#/Eb   F       F#/Gb   G#/Ab   A#/Bb   C
Dmajor       D       E       F#/Gb   G       A       B       C#/Db
Ebmajor      D#/Eb   F       G       G#/Ab   A#/Bb   C       D
Emajor       E       F#/Gb   G#/Ab   A       B       C#/Db   D#/Eb
Fmajor       F       G       A       A#/Bb   C       D       E
F#major      F#/Gb   G#/Ab   A#/Bb   B       C#/Db   D#/Eb   F
Gmajor       G       A       B       C       D       E       F#/Gb
G#major      G#/Ab   A#/Bb   C       C#/Db   D#/Eb   F       G

The minor scales and their notes are as follows:

Scale Name   1       2       3       4       5       6       7
Aminor       A       B       C       D       E       F       G
Bbminor      A#/Bb   C       C#/Db   D#/Eb   F       F#/Gb   G#/Ab
Bminor       B       C#/Db   D       E       F#/Gb   G       A
Cminor       C       D       D#/Eb   F       G       G#/Ab   A#/Bb
C#minor      C#/Db   D#/Eb   E       F#/Gb   G#/Ab   A       B
Dminor       D       E       F       G       A       A#/Bb   C
Ebminor      D#/Eb   F       F#/Gb   G#/Ab   A#/Bb   B       C#/Db
Eminor       E       F#/Gb   G       A       B       C       D
Fminor       F       G       G#/Ab   A#/Bb   C       C#/Db   D#/Eb
F#minor      F#/Gb   G#/Ab   A       B       C#/Db   D       E
Gminor       G       A       A#/Bb   C       D       D#/Eb   F
G#minor      G#/Ab   A#/Bb   B       C#/Db   D#/Eb   E       F#/Gb

If we ignore the note numbers and just look at the notes that make up a scale, we find that for every major scale there exists a corresponding minor scale containing exactly the same notes (though not necessarily with the same note numbers). For example, Cmajor and Aminor contain the same notes, and so do Gmajor and Eminor. In the terminology of music theory, Aminor is called the relative minor of Cmajor, and so on. Although the note numbers carry a fair amount of importance in music theory, this observation helps reduce the number of outputs of the neural network, i.e. it makes the classification problem easier by halving the number of classes (provided we compromise on the note number; how this is achieved is discussed in the feature extraction part). A short MATLAB sketch illustrating this equivalence is given just before the key-profile tables below. Hence, the twelve scales that eventually need to be identified are:

1. Cmajor/Aminor
2. C#major/Bbminor
3. Dmajor/Bminor
4. Ebmajor/Cminor
5. Emajor/C#minor
6. Fmajor/Dminor
7. F#major/Ebminor
8. Gmajor/Eminor
9. G#major/Fminor
10. Amajor/F#minor
11. Bbmajor/Gminor
12. Bmajor/G#minor

Feature Extraction and Pre-Processing

The features and the training and testing sets used for this project were created by me; hence, a detailed explanation of the feature extraction and of the building of the training and testing sets is given below.

Octaves and Frequencies:

The physical representation of the notes tabulated above is given by the frequencies they correspond to. Each note occurs in every octave of frequency and, as stated earlier, the frequency doubles when we move up by one octave. For example, in the -3 octave the note A is the sound at a frequency of 55 Hz; one octave higher, in the -2 octave, the note A is the sound at a frequency of 55*2 = 110 Hz. Although theoretically many octaves are possible, the project is limited to octaves above 20 Hz, since that is the lowest frequency audible to the human ear.

Key-Profiles:

The following was the single most challenging part of the project: what features can represent the scale/key of a particular piece of music? A commonly used proposition from the music cognition literature is Carol Krumhansl and Mark Schmuckler's key-profile model. It posits that each key/scale has a key-profile: a vector representing the optimal distribution of pitch-classes for that key/scale.
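Before turning to the key-profile values, here is the short MATLAB sketch promised above. It is illustrative only (not part of the submitted project code); it builds a major and a natural minor scale from their standard whole/half-step patterns and confirms that C major and A minor contain the same pitch classes.

```matlab
% Minimal sketch (not part of the submitted code): build the C major and
% A natural minor scales from their step patterns and check that they
% contain the same pitch classes.
noteNames  = {'C','C#','D','Eb','E','F','F#','G','G#','A','Bb','B'};
majorSteps = [2 2 1 2 2 2 1];   % whole/half-step pattern of a major scale
minorSteps = [2 1 2 2 1 2 2];   % pattern of a natural minor scale

% Pitch classes (0..11) of a 7-note scale starting at a given root
scaleOf = @(root, steps) mod(root + cumsum([0 steps(1:6)]), 12);

cMajor = scaleOf(0, majorSteps);      % root C = pitch class 0
aMinor = scaleOf(9, minorSteps);      % root A = pitch class 9

disp(noteNames(cMajor + 1));          % C  D  E  F  G  A  B
disp(noteNames(aMinor + 1));          % A  B  C  D  E  F  G
isequal(sort(cMajor), sort(aMinor))   % true: same pitch-class set
```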
The key-profile for a major scale is as follows:

Note Number   1      b2     2      b3     3      4      b5     5      b6     6      b7     7
Note Rank     6.35   2.23   3.48   2.33   4.38   4.09   2.52   5.19   2.39   3.66   2.29   2.88

The note rank denotes the importance of each note in the scale/key; 'b' indicates 'flat'. The key-profile for a minor scale is as follows:

Note Number   1      b2     2      b3     3      4      b5     5      b6     6      b7     7
Note Rank     6.33   2.68   3.52   5.38   2.60   3.53   2.54   4.75   3.98   2.69   3.34   3.17

Thus, the key-profile for C major is:

Note Name     C      C#     D      Eb     E      F      F#     G      G#     A      Bb     B
Note Rank     6.35   2.23   3.48   2.33   4.38   4.09   2.52   5.19   2.39   3.66   2.29   2.88

A statistical correlation of a key-profile with the probabilities of occurrence of each pitch in the piece yields a number called the correlation coefficient. Collecting the coefficients of the pitch-occurrence probabilities with each key-profile yields a 12-dimensional array, which acts as the feature vector for this project.

Pitch and Frequency Extraction:

Since the occurrence probabilities of the pitches in the song are required to build the feature vector, the pitches and their occurrences must first be extracted. This was done with the open-source pitch extraction platform Tarsos, available at http://tarsos.0110.be. This platform works best with the Tarsos-Yin pitch detection algorithm (Six and Cornelis, 2011).

A screenshot of the pitch extraction performed during the course of this project shows, in black, the waveform of the audio file whose pitches are to be detected; the green peaks above the waveform are the peaks of the pitch-class histogram, i.e. the occurrences of each pitch normalised to one octave. This histogram cannot be exported from the software in a format that can be used for processing in MATLAB, in which the project was developed and tested. However, the software exports the frequencies detected at every sampled instant. These frequencies are normalised to frequencies within the -3 octave using the M-file bsefreqtry.m. The notes and their corresponding frequencies in this octave are as follows:

MIDI Octave   MIDI Note Number   Note Name   Frequency (Hz)
-3            24                 C           32.7031956626
-3            25                 C#/Db       34.6478288721
-3            26                 D           36.7080959897
-3            27                 D#/Eb       38.8908729653
-3            28                 E           41.2034446141
-3            29                 F           43.6535289291
-3            30                 F#/Gb       46.2493028390
-3            31                 G           48.9994294977
-3            32                 G#/Ab       51.9130871975
-3            33                 A           55.0000000000
-3            34                 A#/Bb       58.2704701898
-3            35                 B           61.7354126570

The normalisation procedure is as follows. First, we find the number of semitones the given frequency lies above the base frequency of the octave; the number of semitones essentially identifies the note, independent of the octave. The base frequency of each octave is its C note, which in the -3 octave is Cbase = 32.7031956626 Hz.

    Number of semitones = 12 * log2(f / Cbase)

where f is the frequency of the detected pitch and Cbase is the base frequency of the octave. We then find how many semitones this frequency lies above the base frequency within its own octave by taking

    b = mod(12 * log2(f / Cbase), 12)

where mod denotes the modulo operation. The pitch extraction does not always yield a frequency that exactly matches a defined note frequency; hence, in order to estimate the note it is closest to, the value of b is rounded to the nearest integer. The corresponding frequency f-3 in the -3 octave is then found by

    f-3 = 2^(b/12) * Cbase
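As a minimal sketch of the feature extraction just described (an illustration of the idea behind bsefreqtry.m rather than the submitted code), the pitch-class probabilities and the twelve correlation coefficients can be computed as follows. The input variable freqs and the use of the rotated major profile for every major/relative-minor class are assumptions made for this sketch.

```matlab
% Sketch of the feature extraction described above. 'freqs' is assumed to
% be a vector of pitch frequencies (Hz) exported by Tarsos for one piece.
Cbase = 32.7031956626;                       % C of the -3 octave (Hz)

% 1. Fold every detected frequency onto a pitch class 0..11
%    (round to the nearest semitone, then wrap to one octave)
b = mod(round(12 * log2(freqs / Cbase)), 12);

% 2. Probability of occurrence of each of the 12 pitch classes
counts = histc(b, 0:11);
p = counts(:) / sum(counts);                 % 12x1 probability vector

% 3. Krumhansl major key-profile (the values tabulated above); using only
%    the rotated major profile for each class is an assumption of this
%    sketch, not necessarily the project's exact choice
majorProfile = [6.35 2.23 3.48 2.33 4.38 4.09 2.52 5.19 2.39 3.66 2.29 2.88];

% 4. Correlate the pitch-class distribution with the profile of each of
%    the 12 keys; rotating the template by k semitones gives the profile
%    of the key k semitones above C
feature = zeros(1, 12);
for k = 0:11
    profileK = circshift(majorProfile, [0 k]);
    r = corrcoef(p, profileK(:));            % 2x2 correlation matrix
    feature(k + 1) = r(1, 2);                % correlation coefficient
end
```

(In this sketch the pitch class b is used directly to build the histogram; mapping it back to a frequency in the -3 octave via f-3 = 2^(b/12) * Cbase, as described above, identifies the same note.)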
Once the frequencies are normalised, the probability of occurrence of each of the twelve notes is found, giving a 1x12 vector. The statistical correlation of this vector with a key-profile gives the correlation coefficient of the musical piece with that scale. Twelve such coefficients, each representing the correlation of the musical piece with one scale, yield the final feature vector. An example of a feature vector is:

0.18  -0.59  0.44  -0.35  0.36  -0.31  -0.04  0.63  -0.62  0.33  -0.29  0.27

Training and Testing Sets

To build the training and testing sets, audio files of various Western classical compositions by famous composers such as Beethoven, Bach, Mozart, Vivaldi, Verdi, Wagner, Chopin, Rossini and many others were obtained or recorded. The scales/keys of these compositions were provided in their descriptions. One common feature of these pieces is that none of them contain percussion, only polyphonic melodic/harmonic instruments; this constraint was imposed due to the limitations of pitch-detection technology. Then the following steps were performed:

1. The frequencies occurring in each song were found by running a portion of the song through the pitch detection algorithm.
2. These frequencies were normalised and their probabilities of occurrence were found.
3. The correlation coefficients of the song with the various key-profiles were computed.

Step 1 was performed on the Tarsos platform; steps 2 and 3 were carried out by the bsefreqtry.m MATLAB code written for this project. The resulting feature vectors were then concatenated with the one-hot 12-bit binary representation of the outputs. An example of an output is:

100000000000

This corresponds to the scale Cmajor/Aminor. A total of 180 training vectors and 80 testing vectors were employed.

The Neural Network Employed

Since the problem reduces to a pattern classification problem, a multi-layer perceptron was trained using the back-propagation algorithm (a minimal training sketch is given at the end of this section). At first, a three-layer perceptron with one hidden layer was tested with different numbers of hidden neurons for the same learning rate α and momentum factor µ, namely α = 0.1 and µ = 0.8. The best results obtained were as follows:

No. of Hidden Neurons   Training Classification Rate (%)   Testing Classification Rate (%)
13                      97.17                              48.5322
14                      97.78                              48.5322
16                      97.22                              47.50
21                      98.00                              47.0588
24                      100                                51.4713
25                      100                                50.00
26                      98.33                              47.0588
27                      100                                48.5322

Clearly, the best results are obtained with 24 hidden neurons. Hence, this configuration was tried with other values of α and µ. Some of the better results obtained with 24 hidden neurons are as follows:

α      µ      Training Classification Rate (%)   Testing Classification Rate (%)
0.1    0.8    100                                51.4713
0.15   0.8    100                                33.8235
0.12   0.8    98.8890                            41.7650
0.1    0.7    98.33                              44.1176
0.1    0.6    100                                41.1176
0.1    0.5    100                                47.0588
0.1    0.4    100                                51.4706
0.1    0.3    100                                48.5294
0.1    0.35   100                                42.6471
0.1    0.45   100                                57.3530

Thus, the best classification rate is obtained with α = 0.1 and µ = 0.45. The optimum weights obtained were used to create a network that can identify the scale of a given piece of music, with some preliminary processing required. The method for employing the identifier is elaborated in the appendix.
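The following is a minimal sketch of the classifier configuration described above, written with the MATLAB Neural Network Toolbox (patternnet/traingdm). The submitted project code may implement back-propagation directly, so the toolbox usage here is an assumption rather than a reproduction of it; X is assumed to be a 12 x N matrix of feature vectors and T the corresponding 12 x N one-hot target matrix.

```matlab
% Sketch only: 24 hidden neurons, gradient descent with momentum,
% learning rate 0.1 and momentum 0.45 as reported above.
net = patternnet(24);              % one hidden layer with 24 neurons
net.trainFcn = 'traingdm';         % back-propagation with momentum
net.trainParam.lr = 0.1;           % learning rate (alpha)
net.trainParam.mc = 0.45;          % momentum factor (mu)
net.divideFcn = 'dividetrain';     % train on all supplied samples

net = train(net, X, T);            % X: 12xN features, T: 12xN one-hot targets

% Classification rate: the predicted class is the index of the largest output
Y = net(X);
[~, predicted] = max(Y, [], 1);
[~, actual]    = max(T, [], 1);
classificationRate = 100 * mean(predicted == actual);
```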
Conclusion

Although a 57% classification rate is not remarkable, it suggests that the problem of key/scale identification in Music Information Retrieval can be approached in a more cognitive manner using neural networks. In this direction, Support Vector Machines could be used to work further on this problem and achieve better classification.

References

1. G. Tzanetakis, A. Ermolinskyi, P. Cook, "Pitch Histograms in Audio and Symbolic Music Information Retrieval".
2. J. Six, O. Cornelis, M. Leman, "Tarsos, a Modular Platform for Precise Pitch Analysis of Western and Non-Western Music".
3. C. Krumhansl, Cognitive Foundations of Musical Pitch, Oxford University Press, 1990.
4. MELISMA - The Key Program, http://www.link.cs.cmu.edu/music-analysis/key.html
5. http://www.tonalsoft.com/pub/news/pitch-bend.aspx

APPENDIX

The scale_identify.m MATLAB code submitted along with this report identifies the scale of a given piece of music. To use the program, one must first extract the "annotations" of the music piece using the Tarsos platform mentioned above. The values exported from the software are obtained as a comma-separated-values (.csv) file, which can be copied into a plain text file. Once the text file containing the annotations of the piece is ready, it is fed as input to the scale_identify program, which outputs the identified scale of the piece.
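As a closing illustration of the workflow described in the appendix, the sketch below shows how an identifier of this kind could be driven from a Tarsos annotation export. It is hypothetical: scale_identify.m is not reproduced here, extractKeyFeature stands in for the feature-extraction steps sketched earlier, net is the trained network from the previous sketch, and the assumption that the pitch in Hz sits in the second comma-separated column of the export should be checked against the actual file.

```matlab
% Hypothetical usage sketch (not the submitted scale_identify.m).
data  = dlmread('annotations.txt', ',');   % assumed numeric, comma-separated
freqs = data(:, 2);                        % assumed: column 2 = pitch in Hz

feature = extractKeyFeature(freqs);        % hypothetical wrapper around the
                                           % normalisation/correlation steps
Y = net(feature');                         % trained MLP from the earlier sketch
[~, idx] = max(Y);

scaleNames = {'Cmajor/Aminor','C#major/Bbminor','Dmajor/Bminor', ...
              'Ebmajor/Cminor','Emajor/C#minor','Fmajor/Dminor', ...
              'F#major/Ebminor','Gmajor/Eminor','G#major/Fminor', ...
              'Amajor/F#minor','Bbmajor/Gminor','Bmajor/G#minor'};
fprintf('Identified scale: %s\n', scaleNames{idx});
```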