A Method for Solmization of Melody

Yongwei Zhu, Institute for Infocomm Research
Mohan Kankanhalli, National University of Singapore
Sheng Gao, Institute for Infocomm Research

0-7803-8603-5/04/$20.00 ©2004 IEEE.

Abstract

This paper presents a novel method for automatic solmization of melody, by which a melody (a sequence of MIDI notes) can be transcribed into sol-fa syllables (i.e. do re mi fa sol la ti). Automatic solmization can assist in music skill training, music notation and content-based music retrieval. The proposed method is based on an approach for estimating the music scale of a melody. The key of the major scale (“do”) is estimated using music scale models. Due to the diversity of melody types, models for both diatonic and pentatonic scales are employed to avoid possible key ambiguity for folk songs. The key of a melody is decided from the scale estimation results for an aggregating number of notes, so that the method works for both short and long melodies. Experiments have shown that the technique achieves 95% correct solmization of the melodies of pop songs.

1. Introduction

Due to advances in digital technology, digital music has become pervasive in everyday life, which calls for innovative tools that help people interact effectively with the huge amount of music data available today. One example is content-based music retrieval, such as query-by-humming, which allows a person to find a tune in a large music database [1]. Another example is automatic music notation, by which a sequence of MIDI notes can be transformed into a notation that is appropriate for music reading or learning.

Solmization is a music notation system in which musical notes are denoted by syllables. The solmization syllables that prevail in Europe and many other places in the modern world are the sol-fa syllables, i.e. do re mi fa sol la ti [2,3,4]. This system is widely adopted in music education (solfège and aural training) and in music notation for songs. For instance, the Chinese cipher notation (Jian Pu or Dien Pou) [5], which uses the numbers 1 to 7 to notate the melodies of songs, is directly derived from the sol-fa syllables. If a song or melody can be correctly and unambiguously translated into sol-fa syllables, it becomes possible to perform music retrieval using the sol-fa syllables, which could be more efficient and effective than using note pitch values. Furthermore, automatic solmization of a melody could also assist a music learner in aural skills training.

Although a music piece can be notated in staff notation with any key, say C major, an arbitrary key does not reflect the structure of the music and can be awkward for practical music performance. In our experience, the key signatures encoded in MIDI files are often missing or incorrect. We have not seen any existing method that can automatically translate a melody into sol-fa syllables.

The sol-fa syllables are defined for the diatonic scale, where “do” corresponds to the first degree of the major scale. Solmization of a melody is thus a problem of finding the pitch that is associated with “do” in the diatonic scale. It has been shown [6,7] that many songs are composed in major or minor scales, so that melody matching can be facilitated by estimating the scale root. However, this does not guarantee a solution for transcribing a melody into sol-fa syllables. The diversity of melodies, such as scale types, the usage of accidental notes and differences in song length, poses challenges for automatic solmization. This paper presents a novel method to tackle this problem.
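Once the pitch class of “do” is known, assigning syllables to the remaining notes is mechanical, which is why solmization reduces to finding “do”. The sketch below illustrates that last step only. The function name `solmize` and the choice to return `None` for notes outside the major scale (accidentals) are our own illustrative assumptions, not part of the method described in this paper.

```python
from typing import List, Optional

# Semitone offset above "do" -> sol-fa syllable for the seven diatonic degrees.
SYLLABLES = {0: "do", 2: "re", 4: "mi", 5: "fa", 7: "sol", 9: "la", 11: "ti"}


def solmize(midi_notes: List[int], do_pitch_class: int) -> List[Optional[str]]:
    """Translate MIDI note numbers into sol-fa syllables (hypothetical helper).

    do_pitch_class is the pitch class (0 = C, 1 = C#/Db, ..., 11 = B) that
    plays the role of "do". Notes whose offset from "do" is not one of the
    seven diatonic degrees (accidentals) are returned as None.
    """
    return [SYLLABLES.get((note - do_pitch_class) % 12) for note in midi_notes]


if __name__ == "__main__":
    # C major: middle C (MIDI 60) is "do"; C#4 (MIDI 61) is an accidental.
    print(solmize([60, 62, 64, 65, 67, 69, 71, 72, 61], 0))
    # prints ['do', 're', 'mi', 'fa', 'sol', 'la', 'ti', 'do', None]
```

Because the offset is taken modulo 12, the same syllable is assigned to notes an octave apart, so the octave in which “do” lies does not matter.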
This paper is organized as follows: section 2 presents the relation between solmization and music scales, as well as a model of music scales; section 3 presents the proposed method for solmization of melody based on music scale estimation; section 4 presents the experiments; and section 5 concludes the paper.

2. Solmization and Scale Models

A music scale is a sequence of music notes within an octave from which a music piece is composed. In the de facto tuning system, each octave has 12 equal-tempered notes. The diatonic scale, which has dominated Western music for more than 500 years, is a scale with 7 of these 12 notes. The diatonic scale has 7 modes, each using a different note as the starting note. The major and minor scales are the 2 most widely used modes of the diatonic scale.

The diatonic scale can be modeled as a wheel with 12 spokes, as illustrated in figure 1. The darkened spokes correspond to the notes in the diatonic scale (scale tones), and the dimmed spokes correspond to notes outside the scale (accidental notes). When forming a scale mode, the major scale starts from note number 0, and the minor scale starts from note number 9. In the solmization system, the syllable “do” corresponds to the first note (or degree) of the major scale, and “la” corresponds to the first note of the minor scale. In fact, all modes of the diatonic scale share the same syllable for each note, as shown in figure 1.

Figure 1: Scale model for diatonic scale

Thus, as long as a melody is composed using the diatonic scale, the sol-fa syllables for all notes of the melody can be uniquely determined, regardless of which mode (major or minor) is used. However, certain songs, especially folk songs, do not use all 7 notes of the diatonic scale; as a result, such a song can fit the diatonic scale in more than one key, which makes the sol-fa syllables of the melody ambiguous. We have seen that such cases of ambiguity are usually due to the use of the pentatonic scale. The pentatonic scale (a scale with 5 notes) is an important type of music scale, besides the diatonic scale, in many East Asian cultures [4]. The notes of typical pentatonic scales are a subset of the diatonic scale tones. Among the various pentatonic scales, the major and minor pentatonic scales prevail over the others. The model for the major and minor pentatonic scales is illustrated in figure 2(a). It can be seen that the major and minor pentatonic scale is the diatonic scale with note numbers 5 and 11 omitted, which means that the syllables “fa” and “ti” do not occur in songs written in the major and minor pentatonic scales. A song written in a major or minor pentatonic scale, if explained using a diatonic scale, would be ambiguous in its syllables: “do” in the pentatonic scale can be associated with “do”, “fa” or “sol” in the diatonic scale. However, if the pentatonic scale is used to explain the melody, there is no ambiguity.

Figure 2: Scale models for pentatonic scales ((a) anhemitonic pentatonic scale; (b) hemitonic pentatonic scale)

Besides the major and minor pentatonic scales, there are also other types of pentatonic scale, like the one shown in figure 2(b). This pentatonic scale has semitone steps (hence it is called a hemitonic pentatonic scale) and is used in traditional Japanese music. By contrast, this type of scale does not cause ambiguity with the diatonic scale: “do” in this pentatonic scale is associated only with “do” in the diatonic scale.

Although most songs are composed with a diatonic or pentatonic scale, accidental notes do occur in some songs. From our observation and understanding, the number of accidental notes in a song is usually small.
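The pentatonic ambiguity described in this section can be checked directly by representing each scale model as the set of its darkened spokes (semitone offsets from “do”) and listing every “do” that leaves no note of a melody on a dimmed spoke. This is a minimal sketch; the set encoding and the names `DIATONIC`, `PENTATONIC` and `zero_error_keys` are our own illustrative choices, and the hemitonic pentatonic scale of figure 2(b) is not modeled.

```python
from typing import Iterable, List, Set

# Darkened spokes of each scale model, as semitone offsets from "do" (0-11).
DIATONIC: Set[int] = {0, 2, 4, 5, 7, 9, 11}  # do re mi fa sol la ti
PENTATONIC: Set[int] = {0, 2, 4, 7, 9}       # major/minor pentatonic: no fa (5), no ti (11)


def zero_error_keys(notes: Iterable[int], model: Set[int]) -> List[int]:
    """Return every "do" pitch class (0-11) for which all given notes are scale tones.

    `notes` may be MIDI note numbers or pitch classes; both are reduced mod 12.
    """
    pcs = {n % 12 for n in notes}
    return [do for do in range(12) if all((pc - do) % 12 in model for pc in pcs)]


if __name__ == "__main__":
    # A melody using only C D E G A (the C major pentatonic scale).
    melody = [60, 62, 64, 67, 69]
    print(zero_error_keys(melody, DIATONIC))    # [0, 5, 7]: C, F or G major
    print(zero_error_keys(melody, PENTATONIC))  # [0]: only C
```

With “do” on F (pitch class 5) the note C is “sol”, and with “do” on G (pitch class 7) it is “fa”, which is exactly the “do”/“fa”/“sol” ambiguity described above; the pentatonic model leaves only one candidate.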
We also observed that short songs are less likely to have accidental notes than long songs, while short songs are more likely to be based on the pentatonic scale. Our proposed method for solmization of melody is based on the diatonic scale model and the major/minor pentatonic scale model, and it takes possible accidental notes and different melody lengths into account.

3. Proposed Method for Solmization of Melody

Solmization of a melody is equivalent to finding the absolute note (pitch) of the melody that corresponds to the syllable “do”. In the proposed approach, the note for “do” is estimated by fitting the melody notes into the scale models and then finding the best alignment of the notes in the melody with the notes in the scale models. The best alignment is the one that leaves the minimal number of melody notes outside the scale model. Since there are 12 notes in each octave, “do” could correspond to any one of the 12 notes. Section 3.1 presents the estimation of “do” for a fixed number of notes in a melody using one scale model; section 3.2 presents how to decide “do” for melodies of different lengths using both the diatonic and pentatonic scale models.

3.1. Melody Scale Estimation using One Scale Model

The method for melody scale estimation using one scale model takes 3 steps: (1) constructing the note histogram of the melody; (2) aligning the note histogram to the scale model and computing the model fitting error; and (3) choosing the alignment with the minimal fitting error.

The note histogram is constructed by assigning the notes to bins, where each bin corresponds to a MIDI note. Since there are 128 MIDI note pitches, each note histogram has 128 bins. The note histogram is aligned with a scale model by wrapping the bins around the scale model, so that each bin is matched to a spoke of the model. Notes that are an octave apart are therefore matched to the same spoke.
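The three steps of section 3.1 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, the fitting error is the sum of the bins that land on dimmed spokes (as this section defines it), and we assume that alignment number k in Table 2 places “do” on pitch class k-1 (C = 0), which agrees with the statement that alignment 9 makes note 80 (A-flat) “do”. Recomputing the errors from the Table 1 counts under this assumption reproduces Table 2 for alignments 1 to 11; for alignment 12 (“do” on B) it gives 12 rather than the listed 21.

```python
from typing import Dict, List, Sequence, Set, Tuple

DIATONIC: Set[int] = {0, 2, 4, 5, 7, 9, 11}  # darkened spokes (offsets from "do")


def note_histogram(midi_notes: Sequence[int]) -> List[int]:
    """Step 1: a 128-bin histogram; bin n counts occurrences of MIDI note n."""
    hist = [0] * 128
    for note in midi_notes:
        hist[note] += 1
    return hist


def fitting_errors(hist: Sequence[int], model: Set[int] = DIATONIC) -> List[int]:
    """Step 2: fitting error for each of the 12 alignments.

    errors[do] sums the bins that land on dimmed spokes when spoke 0 ("do")
    is aligned with pitch class `do`. Taking (note - do) % 12 wraps the 128
    bins around the 12-spoke wheel, so octave-related notes share a spoke.
    """
    errors = [0] * 12
    for do in range(12):
        for note, count in enumerate(hist):
            if (note - do) % 12 not in model:
                errors[do] += count
    return errors


def estimate_do(hist: Sequence[int], model: Set[int] = DIATONIC) -> Tuple[int, List[int]]:
    """Step 3: the alignment with minimal fitting error (ties -> lowest pitch class)."""
    errors = fitting_errors(hist, model)
    return min(range(12), key=lambda do: errors[do]), errors


if __name__ == "__main__":
    # Bin counts from Table 1 ("Power of love", first 35 notes); empty bins omitted.
    table1: Dict[int, int] = {70: 2, 72: 5, 73: 3, 75: 16, 77: 5, 79: 2, 80: 2}
    notes = [note for note, count in table1.items() for _ in range(count)]
    do, errors = estimate_do(note_histogram(notes))
    print(errors)  # [23, 2, 30, 3, 14, 21, 7, 28, 0, 30, 5, 12]
    print(do)      # 8, i.e. A-flat: notes 68 and 80 are "do"
```

Only the bin counts matter here, so the order in which the notes are listed does not affect the result.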
Since a scale model has 12 spokes, there are 12 possible alignments in total. The model fitting error for a particular alignment is computed by summing all the bins that are matched to the dimmed spokes (accidental notes). Table 1 shows the note histogram for the first 35 notes of the melody of “Power of love”: the first row contains the MIDI note number, and the second row contains the number of notes in each bin. Table 2 shows the diatonic scale model fitting errors of the 12 alignments: the first row holds the alignment numbers, and the second row holds the model fitting errors.

Table 1: Note histogram for “Power of love”
MIDI note          70  71  72  73  74  75  76  77  78  79  80
Number of notes     2   0   5   3   0  16   0   5   0   2   2

Table 2: Model fitting errors for “Power of love”
Alignment           1   2   3   4   5   6   7   8   9  10  11  12
Fitting error      23   2  30   3  14  21   7  28   0  30   5  21

The alignment with the minimal fitting error can be chosen as the scale estimation result. In the above example, alignment number 9 has the minimal error (0) for the diatonic scale. For this alignment, note 80 matches “do”.

3.2. Melody Solmization Using Diatonic and Pentatonic Scale Models

As mentioned before, a melody based on a pentatonic scale may be ambiguous with respect to the diatonic scale. So, in our approach to melody solmization, both diatonic and pentatonic scale models are employed, and the scale estimation results from the 2 models are combined to produce the final result.

It could be difficult to determine a single best fixed number of notes for scale estimation for all melodies, since different melodies progress in different ways. On one hand, a predetermined number could be too small when many of the notes repeat the same pitch. On the other hand, using all the notes in a melody for scale estimation is also inappropriate, since for some melodies the key may be transposed in the middle, and the scale notes after the transposition could then be treated as accidental notes.

To address this issue, we propose to use an aggregating number of notes for scale estimation, and to base the final result on the sequences of scale estimation results for the 2 scale models. The method is as follows. The melody is first separated into segments with an equal number of notes (e.g. 7 notes). Scale estimation is then conducted on an aggregating number of segments: the first note histogram is constructed from the first segment and the scale is estimated from it; the second histogram is constructed from the first 2 segments and the scale is estimated again; and so on. In this way, the scale model fitting error for each specific key is computed over an increasing number of segments, so 12 fitting error sequences are obtained for the diatonic scale model, and similarly 12 fitting error sequences are obtained for the pentatonic scale model. Each model fitting error sequence never decreases as segments are added. The final scale is decided by choosing the sequence that is consistently minimal for the diatonic scale model. If more than one fitting error sequence stays at zero error, the result of the pentatonic scale model is consulted, and the sequence with the consistently minimal value is chosen.

4. Experiments and Discussion

We have conducted experiments to evaluate the proposed melody solmization method. In the experiments, 20 randomly chosen MIDI songs, including Western and Asian songs, are used. The reference solmization is first obtained manually from expert musicians or song books, and the result of the automatic solmization method is then evaluated against this manual result.

Solmization of the song “Power of love” is given as an example. Figure 3 shows the model fitting errors for the 2 scale models, with the fitting error sequence for each specific key plotted. It can be seen in figure 3(a) that the sequence with the consistently minimal value, zero, should be taken as the correct key.

Figure 3: Scale model fitting errors for “Power of love”
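The aggregating-segment decision of section 3.2, whose per-key fitting error sequences are what figure 3 plots, can be sketched as below. This is a minimal interpretation rather than the authors' implementation: the names are hypothetical, segments default to 7 notes (the example length given in section 3.2), “consistently minimal” is read as “equal to the smallest error among the candidate keys at every aggregation step”, and the pentatonic model is consulted whenever more than one diatonic key passes this test, which covers the paper's case of several sequences staying at zero. A result with more than one key, or an empty result, signals the kind of ambiguity discussed for “Yesterday”.

```python
from typing import List, Sequence, Set

DIATONIC: Set[int] = {0, 2, 4, 5, 7, 9, 11}
PENTATONIC: Set[int] = {0, 2, 4, 7, 9}


def error_sequences(notes: Sequence[int], model: Set[int], seg_len: int = 7) -> List[List[int]]:
    """seqs[do][k] = out-of-scale notes among the first k + 1 segments, for each "do" (0-11)."""
    n_segments = max(1, -(-len(notes) // seg_len))  # ceiling division; last segment may be short
    seqs: List[List[int]] = [[] for _ in range(12)]
    for k in range(1, n_segments + 1):
        prefix = notes[: k * seg_len]  # aggregate the first k segments
        for do in range(12):
            seqs[do].append(sum(1 for n in prefix if (n - do) % 12 not in model))
    return seqs


def consistently_minimal(seqs: List[List[int]], candidates: List[int]) -> List[int]:
    """Candidates whose error equals the smallest candidate error at every step."""
    steps = range(len(seqs[0]))
    lowest = [min(seqs[c][k] for c in candidates) for k in steps]
    return [c for c in candidates if all(seqs[c][k] == lowest[k] for k in steps)]


def decide_do(notes: Sequence[int], seg_len: int = 7) -> List[int]:
    """Candidate "do" pitch classes; exactly one element means an unambiguous key."""
    best = consistently_minimal(error_sequences(notes, DIATONIC, seg_len), list(range(12)))
    if len(best) > 1:  # diatonic model is ambiguous: consult the pentatonic model
        best = consistently_minimal(error_sequences(notes, PENTATONIC, seg_len), best)
    return best


if __name__ == "__main__":
    pentatonic_tune = [60, 62, 64, 67, 69, 67, 64, 62, 60, 69, 72, 67, 64, 60]
    print(decide_do(pentatonic_tune))  # [0]: C, although C, F and G major all fit with zero error
```

Because each sequence accumulates over a growing prefix of the melody, a key that briefly ties early on but picks up errors later is rejected, while a short melody still yields at least one estimate from its first segment.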
Since only one sequence is chosen for the diatonic scale, the results of the pentatonic scale can be ignored. Nevertheless, the chosen sequence is also consistently minimal for the pentatonic scale in this case. Figure 4 illustrates the solmization result for the song “Power of love”.

Figure 4: Solmization result for “Power of love”

Figure 5 shows the scale model fitting results for the Japanese song “Akatombo”. For the diatonic scale model, 3 error sequences stay at zero, while for the pentatonic scale only 1 error sequence stays at zero, which determines the correct key of the melody.

Figure 5: Scale model fitting for “Akatombo”

In the experiments, 19 of the 20 songs have the correct scale root estimated. For the song “Yesterday”, 2 fitting error sequences both have small values for the diatonic scale, while the respective error sequences for the pentatonic scale are all large, so the song should be based on the diatonic scale. However, neither of the 2 sequences is consistently minimal, which leads to ambiguity between 2 keys. We found that the problem is due to the use of a few accidental notes at the very beginning of the melody. In cases of key ambiguity, the pitch range of the melody can additionally be considered to pick the appropriate key: in song books, the lowest note of a melody is usually not lower than “sol” in the low octave, and the highest note is usually not higher than “sol” in the high octave. Using this rule, the song “Yesterday” has a uniquely determined solmization, which is in fact identical to the one in the song book.

5. Conclusion

This paper presented a novel method for solmization of melody. The technique is based on an approach for estimating the music scale of a melody, in which the syllable “do” is associated with the first note of the major scale. Two types of scale (diatonic and pentatonic) are employed to avoid possible scale ambiguity. The scales are estimated by fitting the notes into scale models and finding the fitting error sequence that is consistently minimal. The experiments have shown the effectiveness of the method: 95% of the songs are correctly transcribed into sol-fa syllables.

References

[1] A. Ghias, J. Logan, and D. Chamberlin, “Query By Humming”, Proceedings of ACM Multimedia 95, November 1995, pp. 231-236.
[2] The Columbia Encyclopedia, Sixth Edition, Columbia University Press, 2003.
[3] http://www.bartleby.com/65/gu/GuidodAr.html
[4] G. S. Karpinski, “Lessons from the Past: Music Theory Pedagogy and the Future”, The Online Journal of the Society for Music Theory, Volume 6, Number 3, August 2000.
[5] http://www.wikipedia.org/wiki/Music_notation
[6] J. Pickens, “Key-specific Shrinkage Techniques for Harmonic Models”, Proceedings of ISMIR ’03 Conference, Baltimore, MD, Oct. 26-30, 2003.
[7] Y. Zhu and M. Kankanhalli, “Music Scale Modeling for Melody Matching”, Proceedings of ACM Multimedia 2003, Berkeley, CA, Nov. 2-8, 2003.