
International Journal of Advanced Intelligence
Volume 9, Number 1, pp.63-75, March, 2017.
© AIA International Advanced Information Institute
Development of Singing-by-Onomatopoeia corpus
for Query-by-Singing Music Information Retrieval system
Motoyuki Suzuki
Faculty of Information Science and Technology, Osaka Institute of Technology
1–79–1, Kitayama, Hirakata, Osaka, 573–0196, JAPAN
[email protected]
Akimitsu Hisaoka
Faculty of Information Science and Technology, Osaka Institute of Technology
1–79–1, Kitayama, Hirakata, Osaka, 573–0196, JAPAN
[email protected]
Received (15 Sep. 2016)
Revised (15 Feb. 2017)
A Query-by-Singing Music Information Retrieval (MIR) system has an advantage over a Query-by-Humming MIR system because it can use lyrical content in addition to melodic information. However, it cannot be used for instrumental music, which has no lyrics. Such music is often sung using onomatopoeia. There are many kinds of onomatopoeia, related to characteristics of sounds such as tone and timbre. If such relationships can be identified, then onomatopoeia can be used as retrieval keys instead of lyrics.
In order to investigate the relationship between musical sound and onomatopoeia, we constructed a new singing voice corpus. Forty-nine participants sang using onomatopoeia, and phonetic transcriptions of the data were made by hand. A total of 452 audio data samples and corresponding transcriptions were obtained.
In addition to constructing the corpus, several basic analyses were carried out. First,
we defined the “word” unit of onomatopoeia by using a statistical language model and
compiled a list of 492 such words. “Ta” and “la” were found to be the most commonly
used, but the variation in the usage of onomatopoeia among the singers was very wide. Some singers used many kinds of onomatopoeia, while others used only a few words for any type of music. Some of the musical instruments were associated with a particular
onomatopoeia but these relationships were not strongly defined.
Keywords: Onomatopoeia; Query-by-Singing; Music Information Retrieval System.
1. Introduction
In recent years, a huge number of songs can be stored on a small music device such as an iPod or an MP3 player. Users are able to listen to many songs at random, but it is difficult to search for and retrieve a specific song, as most small players do not have a convenient input device such as a keyboard, large touch pad, or mouse. Therefore, a user cannot easily input retrieval keys (title of the song, singer name, part of the lyrics, etc.).
In order to solve this problem, several content-based music information retrieval
(MIR) systems have been proposed. Many of the traditional content-based MIR
systems1,2,3 use a humming sample as the retrieval key in a procedure known as
Query-by-Humming (QbH). QbH-MIR systems extract melodic information, which consists of the tone and length of notes, from an input humming and use it as a retrieval key. These systems cannot achieve high performance because of errors in extracting melodic information.
Query-by-Singing (QbS) MIR systems4,5 have also been proposed. These systems accept a singing voice as input and extract lyrical content in addition to melodic information by using speech recognition technology6. In general, these systems show higher performance than QbH-MIR systems because both lyrical and melodic information can be used for retrieving the required music piece.
However, a QbS-MIR system cannot use lyrical content for retrieving instrumental music. In general, a user sings an instrumental piece by humming. In that case, what kinds of phones are usually used? Meaningless words such as “ta,” “la,” “cha,” and “tang” are commonly used. These are called onomatopoeia, and many such expressions appear in humming.
Each onomatopoeia relates to characteristics of sound. For example, both “pong”
and “ping” express a piano sound, but “ping” expresses a higher tone than “pong.”
“Dong” expresses a loud drum sound, while “ton” expresses a quiet drum sound.
If such a relationship between an onomatopoeia and a sound characteristic can be
found, then this information can be used to distinguish two pieces of music, both
of which have a similar melody but different tone, played using different musical
instruments.
Several studies have used onomatopoeia as a query for MIR, but only in restricted situations. Ishihara et al.7 extracted only the length of notes from onomatopoeia expressions because their system accepted only one onomatopoeia (“la”) and its variations (“laa,” “laaa,” and so on). Other studies8,9 also used onomatopoeia as input, but these systems dealt only with percussion. In this paper, we investigate the relationships between onomatopoeia and sound characteristics in general music.
To do so, we construct a singing-by-onomatopoeia corpus. We collect many audio clips of singers singing instrumental music with onomatopoeia and make phonetic transcriptions of all the data. Subsequently, some basic relationships are investigated using statistical techniques.
This corpus can also be used for developing a system that automatically constructs an onomatopoeia database for an MIR system. The input of such a system is a piece of music, and it is converted into a corresponding onomatopoeia sequence. The system can be realized by using automatic speech recognition techniques. Srinivasamurthy et al.10 proposed a similar method for translating percussion music into syllable expressions. This method can be applied to general music by using the corpus.
2. Relationship between onomatopoeia and simple sound
Several studies11,12 have investigated relationships between onomatopoeia and simple sounds in Japanese.
First, the tone of a sound mainly corresponds to the vowel. The vowel /o/ is used for lower sounds (under 1 kHz), /i/ is used for higher sounds (above 2 kHz), and the vowel /a/ is used for the middle range (between 1 kHz and 2 kHz).
Second, the length of a sound is expressed by the ending of the onomatopoeia. A long vowel (e.g., /a:/) or a long vowel with a syllabic nasal (e.g., /i:N/) is frequently used for a long sound. On the other hand, onomatopoeia representing a very short sound ends with a geminate consonant (/Q/). A sound of middle length is represented by a short vowel with a syllabic nasal (e.g., /iN/).
When a short sound is played twice, or two similar sounds are played in succession, the structure of the corresponding onomatopoeia becomes /cvtv/, where /c/ is a consonant and /v/ is a vowel. The two vowels in this structure are the same, and the second consonant is fixed to /t/ (e.g., /kata/, /koto/). When the second sound is quieter or of higher tone, /r/ is used instead of the second consonant /t/ (/cvrv/). On the other hand, when the second sound is louder, a syllabic nasal (/N/) follows the onomatopoeia (/cvtvN/). Moreover, when the second sound has a longer duration, the second vowel becomes a long vowel (/cvtv:N/). When similar sounds are played repeatedly, the second consonant and vowel are repeated (e.g., /katata/, /pororo/).
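As an illustration of these rules, the vowel (tone) and ending (length) mappings can be written as a small function. The following Python sketch is our own simplification for illustration only; the function name, thresholds, and the fixed consonant /t/ are assumptions, not part of the cited studies.

def simple_sound_to_onomatopoeia(freq_hz, duration):
    """Map a simple sound to an onomatopoeia using the rules above.

    freq_hz  -- dominant frequency in Hz (tone rule selects the vowel)
    duration -- "short", "middle", or "long" (length rule selects the ending)
    The consonant is fixed to /t/ purely for illustration.
    """
    # Tone rule: /o/ below 1 kHz, /a/ between 1 and 2 kHz, /i/ above 2 kHz.
    if freq_hz < 1000:
        vowel = "o"
    elif freq_hz <= 2000:
        vowel = "a"
    else:
        vowel = "i"

    # Length rule: a very short sound ends with the geminate /Q/,
    # a middle-length sound uses a short vowel plus the syllabic nasal /N/,
    # and a long sound uses a long vowel plus /N/.
    if duration == "short":
        return "t" + vowel + "Q"    # e.g. /toQ/
    if duration == "middle":
        return "t" + vowel + "N"    # e.g. /taN/
    return "t" + vowel + ":N"       # e.g. /ti:N/


# Example: a high, long sound is rendered as /ti:N/.
print(simple_sound_to_onomatopoeia(2500, "long"))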
As can be inferred from the above, some simple sounds can be translated into onomatopoeia. However, we do not know what kinds of onomatopoeia should be used for a complex sound such as music. How are differences among musical instruments represented using onomatopoeia? Is there any individuality in the usage of onomatopoeia? In order to answer these questions, a singing-by-onomatopoeia corpus is needed.
3. Construction of the corpus
3.1. Design policy of the corpus
The purpose of constructing the corpus is to develop a new QbS-MIR system. The
rules followed in the construction are stated below:
• The input music has no lyrics. Specifically, classical music samples are used.
• The length of input is set to about 10 seconds because the input voice to
the MIR system is typically short in length.
• The singing is without any accompaniment. The singer is allowed to listen
to the music in advance but has to sing from memory.
After recording, phonetic transcriptions of all the data are made by hand.
Table 1. Music list used in the database

Music title              Composer
The Marriage of Figaro   W. A. Mozart
Sabre Dance              A. Khachaturian
The Trout                F. Schubert
Carmen                   G. Bizet
The Nutcracker           P. I. Tchaikovsky
The Planets              G. Holst
Wedding March            F. Mendelssohn
Dance of the Knights     S. Prokofiev
Triumphal March          G. Verdi
Heroic Polonaise         F. Chopin
Revolutionary Étude      F. Chopin
For Elise                L. Beethoven
Canon                    J. Pachelbel
Radetzky March           J. Strauss I
Air on the G string      J. S. Bach
Hungarian Dances         J. Brahms
Je te veux               E. Satie
Turkish March            W. A. Mozart
The Four Seasons         A. Vivaldi
Swan Lake                P. I. Tchaikovsky
3.2. Target songs
Twenty popular classical music pieces were used as target songs. Table 1 shows a
list of classical music used here. Some of the music was played by an orchestra,
while other pieces were played on string instruments, the piano, and so on.
Four representative periods (each about 10 to 20 seconds long) were extracted from each piece. A total of 80 musical periods were used for recording.
3.3. Recording procedure
Each singer was asked to sing onomatopoeia without any reference sounds, simulating the input to a QbS-MIR system. Details of the recording procedure are provided next.
(1) A singer listens to a target musical period by using headphones and thinks
about how to sing the music using onomatopoeias. The singer should memorize
the music in order to sing it without any reference sounds (“a cappella”). The
music can be listened to repeatedly until the singer memorizes it.
(2) The singer sings the music with onomatopoeia. If the singer cannot memorize the music, or cannot think of appropriate onomatopoeia, then the singer can skip that piece of music.
Thirty-nine males and 10 females participated as singers. Each singer attempted to sing 10 musical periods, but some periods were skipped because the singer could not memorize them. A total of 452 songs were recorded; on average, each singer sang 9.2 songs.
On the other hand, the number of singers per music piece was not uniform. The maximum number of singers for a piece was 29 and the minimum was 17.
3.4. Making transcriptions
After recording the songs, transcriptions were generated manually for all the audio clips. The sung onomatopoeia were transcribed into a sequence of Japanese syllables by an operator. It was difficult to determine the onomatopoeia for some of the data because of ambiguous pronunciation. In these cases, the operator chose the Japanese syllable most similar to the sound.
Six operators, none of whom were singers, carried out the transcription.
4. Definition of “words”
In order to analyze the relationship between onomatopoeia and music, all transcriptions should be split into “words.” It is difficult to define the scope of a “word” in
onomatopoeia. For example, for the transcription “talilalilalilang,” which splitting
pattern is the most appropriate — “tali lali lali lang,” “ta li la li la li lang,” or
“talilali lalilang?”
In this section, we define “words” for onomatopoeia sequences from a statistical
viewpoint.
4.1. Overview of the algorithm
A word in a natural language is a cluster of phones which appears frequently in
sentences. Our objective is to define a minimum unit (corresponding to the phone)
for onomatopoeia and find clusters appearing frequently in onomatopoeia sequences.
The minimum unit of onomatopoeia is defined based on the Japanese syllable.
In fact, it is defined as /[c]v[Q][N]/, where “c” denotes a consonant, “v” denotes
a vowel (including the long vowel), “Q” denotes a geminate consonant, and “N”
denotes a syllabic nasal. The brackets indicate that the phonemes can be omitted.
We also employ a single /N/ as a unit. Some singers sing music using only /N/; this is similar to humming, and its transcription becomes /NNNNN.../. In general, /N/ does not form a Japanese syllable on its own, but in this study /N/ is also used as one of the minimum units of onomatopoeia.
For example, a transcription “talilalilalilang” is split into “ta li la li la li lang”
according to the definition of the minimum unit. After splitting, some clusters that
appear repeatedly can be used to define a “word.” “lila” or “lali” can be employed
as the “word” in this case because each of these is repeated twice in the example.
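As a rough sketch, the minimum-unit split can be approximated by a regular expression over a romanized transcription. The unit pattern and the consonant/vowel inventory below are our own simplification (the final nasal is written as /N/, as in the rest of this paper); the actual corpus was transcribed into Japanese syllables by hand.

import re

# Simplified /[c]v[Q][N]/ minimum-unit pattern: an optional consonant (up to
# two letters, e.g. "ch"), a vowel that may be long (":"), an optional
# geminate /Q/ and syllabic nasal /N/, or a bare /N/.
UNIT = re.compile(r"(?:[bcdfghjklmnprstwyz]{0,2}[aeiou]:?Q?N?|N)")

def split_units(transcription):
    """Split a romanized onomatopoeia transcription into minimum units."""
    return UNIT.findall(transcription)

print(split_units("talilalilalilaN"))
# ['ta', 'li', 'la', 'li', 'la', 'li', 'laN']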
This problem setting is similar to unsupervised word segmentation from a letter sequence or a phonetic transcription. Several methods have been proposed to address this problem, and we employed the Bayesian unsupervised word segmentation model based on the hierarchical Pitman-Yor language model (HPYLM)13,14. In this method, it is assumed that the set of onomatopoeia sequences X is generated by a statistical language model G, which has a vocabulary and statistical parameters and generates word sequences; X is generated by concatenating the word sequences.
P(G) is the prior probability of G, and the joint probability P(X, G) is calculated using Eq. (1):

P(X, G) = P(X | G) P(G)    (1)

The method finds the most appropriate structure and parameters of G that maximize P(X, G).
This method maximizes P(X, G) for the given onomatopoeia sequences X. This means that over-fitting may occur when the number of onomatopoeia sequences is small. In that case, words become longer because there is less variation in the sequences.
4.2. Statistics of obtained “words”
We applied the word segmentation tool15 to all 452 transcription data samples. A vocabulary of 492 words was obtained, and all the transcriptions were split into 13,602 words. Table 2 shows the ten highest-frequency words. These words are simple and commonly used in onomatopoeia singing. Eight of them have the vowel /a/ or /a:/, and eight have the consonant /t/ or /r/. Hence, it can be concluded that /ta/ and /ra/ are the most popular representations in Japanese onomatopoeia singing.

Table 2. Frequently occurring words

Word     Frequency
/ta/     1579
/ra/     1098
/ra:/    622
/cha/    611
/taN/    521
/ta:/    479
/te/     405
/pa/     368
/ru/     319
/rara/   305

Of the 492 words, 138 consisted of two syllables and 120 consisted of only one syllable. About 80% of the words had fewer than five syllables. On the other hand, 16 words had more than nine syllables, and the longest word consisted of 35 syllables. These longer words were not reasonable and were generated due to over-fitting. Figure 1 shows the vocabulary size per word length.

Fig. 1. Histogram of the number of different words per number of syllables
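For reference, a word-length histogram like that of Figure 1 can be rebuilt from a segmented vocabulary by counting minimum units per word. The snippet below reuses the simplified unit pattern sketched in Section 4.1 and a hypothetical vocabulary; it is only an illustration of the bookkeeping, not the original analysis script.

import re
from collections import Counter

# Same simplified minimum-unit pattern as the sketch in Section 4.1.
UNIT = re.compile(r"(?:[bcdfghjklmnprstwyz]{0,2}[aeiou]:?Q?N?|N)")

# Hypothetical vocabulary; the real one contains 492 entries.
vocabulary = ["ta", "ra", "ra:", "taN", "rara", "tarari", "cha"]

# Histogram: number of different words per number of syllables (cf. Fig. 1).
length_hist = Counter(len(UNIT.findall(word)) for word in vocabulary)
for n_syllables in sorted(length_hist):
    print(n_syllables, length_hist[n_syllables])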
5. Analysis
5.1. Variation of onomatopoeia within a singer
The usage of onomatopoeia words differs among singers. Some singers use various words for a piece of music, while others use only a few types of words. We checked the bias in word selection by calculating the entropy for each singer.
The entropy is calculated from a unigram (the frequency probability distribution over words) and reflects how biased the distribution is. Mathematically, the entropy E can be calculated using Eq. (2):
E = −Σ_{w∈W} P(w) log2 P(w)    (2)
where W denotes the set of words (vocabulary) and P(w) denotes the frequency probability of the word w. As seen from the equation, if all the words are used with equal probability, then the entropy reaches its maximum value of E = log2(N), where N denotes the number of words. On the other hand, if only one word is used, then the entropy is equal to 0, which is the minimum value. A smaller entropy
implies that the distribution is more biased.

Fig. 2. Histogram of the number of singers per entropy

Figure 2 shows the number of singers per entropy value. The maximum entropy was 5.04 and the minimum was 2.34.
Many singers showed similar entropy values; in fact, the entropies of 28 singers were within the range of 4.1 to 4.6. However, several singers showed very low entropy: seventeen singers had an entropy lower than 4.0, and in particular four singers had an entropy lower than 2.8. These singers used only a few words. For example, the singer with the lowest entropy used only 10 words, and just three words accounted for 82% of the onomatopoeia.
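For concreteness, the per-singer entropy of Eq. (2) can be computed directly from the word counts. The following is a minimal Python sketch under our own assumption that each singer's segmented transcription is available as a list of word tokens; it is not the actual analysis script.

import math
from collections import Counter

def unigram_entropy(words):
    """Entropy of Eq. (2), computed from the unigram estimated on a word list."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical example: one singer's segmented transcription.
singer_words = ["ta", "ta", "ra", "ra:", "taN", "ta", "ra", "cha"]
print(round(unigram_entropy(singer_words), 2))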
5.2. Similarity among singers
The entropy indicates the range of word variation. However, it does not mean that two singers who have the same entropy use the same words. In order to investigate how similar one singer's word set is to another's, the distance between singers was calculated. The distance is defined as the Bhattacharyya distance between two unigrams. After calculating the distances between all pairs of singers, all the singers were mapped into a three-dimensional space by using the Multi-Dimensional Scaling (MDS) method.
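As a sketch of this step, the Bhattacharyya distance between two unigram distributions and the three-dimensional MDS embedding can be computed as follows. This assumes scikit-learn and NumPy are available and that each singer's unigram is stored as a word-to-probability dictionary; it is an illustration of the procedure, not the authors' original code.

import math
import numpy as np
from sklearn.manifold import MDS

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two unigram distributions (dict: word -> prob)."""
    bc = sum(math.sqrt(p.get(w, 0.0) * q.get(w, 0.0)) for w in set(p) | set(q))
    return -math.log(bc) if bc > 0 else float("inf")

# Hypothetical unigrams for three singers.
unigrams = [
    {"ta": 0.5, "ra": 0.3, "taN": 0.2},
    {"ta": 0.4, "ra": 0.4, "cha": 0.2},
    {"N": 0.9, "ta": 0.1},
]

# Pairwise distance matrix over all singers.
n = len(unigrams)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = bhattacharyya_distance(unigrams[i], unigrams[j])

# Map the singers into a three-dimensional space from the precomputed distances.
coords = MDS(n_components=3, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(coords.shape)  # one 3-D point per singer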
Figure 3 shows the three-dimensional representation of all singers. As observed, there are four clusters (R1 to R4) and nine solitary singers. These nine singers used distinctive words compared with the other singers. For example, man10 did not use /ta/, /ra/, or any onomatopoeia containing these two units; he frequently used /cha/ and its variations (about 80% of his singing data). Man11 used only /ra/ and its variations (long vowel, added /Q/, and so on). The entropies of these two singers were 2.50 and 2.34, respectively. Some of the other singers (man15, man26, and woman10) used very long words such as /babababa:Nbaba:N/ and /tukutuQtukutuQtuku/. No other singer used such words.

Fig. 3. 3D plot of all singers

Table 3. Statistics for the four groups

Group   #Singers   Vocabulary size   Group entropy   Individual entropy
R1      11         224               4.64            4.46
R2      7          105               4.13            3.37
R3      10         166               5.24            4.09
R4      12         267               6.06            4.32

We also analyzed the statistical characteristics of the four groups, as shown in Table 3. In this table, the “Group entropy” was calculated using all the transcription data in the same group, and the “Individual entropy” was
calculated as the average of the individual entropies, each of which was computed from the transcription data of one singer. If the two entropies are similar, all the singers in the group used a similar onomatopoeia word set. On the other hand, if the group entropy is larger than the individual entropy, then each singer in the group used a different word set.
In group R2, both entropies were small and the vocabulary size was also small. This means that all the singers in the group used a similar, small vocabulary. Singers in group R1 also used similar vocabulary, but the vocabulary size was larger than that of group R2. On the other
hand, singers in groups R3 and R4 used different vocabularies from each other, because the group entropy was bigger than the individual entropy.

Table 4. Entropy for each music piece

Music title              Entropy   Main instruments
Dance of the Knights     6.04      Violin, horn
Radetzky March           5.59      Violin
Revolutionary Étude      5.43      Piano
Sabre Dance              5.42      Xylophone, trumpet
The Marriage of Figaro   5.33      Violin
Carmen                   5.32      Flute, oboe
Turkish March            5.21      Piano
Hungarian Dances         5.02      Violin, flute
The Trout                4.97      Violin
The Nutcracker           4.96      Flute
The Planets              4.92      Violin
Je te veux               4.92      Oboe
Wedding March            4.74      Trumpet
Canon                    4.61      Violin
Heroic Polonaise         4.43      Piano
Swan Lake                4.38      Oboe
The Four Seasons         4.34      Violin
For Elise                4.23      Piano
Triumphal March          4.19      Trumpet
Air on the G string      4.17      Violin
5.3. Analysis focused on musical instruments
The usage of onomatopoeia is strongly related to the music itself. The musical instrument that plays the main melody is especially important. In order to investigate the relationship between onomatopoeia and musical instruments, we calculated the entropy for each piece of music and investigated the similarity between musical pieces using the same analysis techniques as in Sections 5.1 and 5.2.
Table 4 shows the entropy calculated for each piece. From this table, the music with higher entropy is played by many musical instruments and/or at a fast tempo. On the other hand, the music with lower entropy is played by a single instrument and/or at a slow tempo.
In music pieces with a slow tempo, the same onomatopoeia (e.g., /ta:/, /ra:/) was used repeatedly (e.g., /ra:ra:ra:/). However, faster music was sung using various onomatopoeia, because repeating the same onomatopoeia quickly is difficult to verbalize. For example, /tarariraraNrariraN/ is easier to recite quickly than /tatatatatatatata/. Thus, it can be concluded that a music piece with a fast tempo is sung using various onomatopoeia words.
Fig. 4. Three dimensional plot for music
In order to investigate the relationship between musical instruments and onomatopoeia, all the music pieces were plotted in a three-dimensional space using MDS. Figure 4 shows the three-dimensional plot. In this figure, each colored circle corresponds to a music piece, and the color denotes the main musical instrument (violin, piano, trumpet, and so on). Some music pieces are represented by two circles because a single main instrument could not be determined. The number located near each circle denotes the tempo (BPM).
It can be observed that the locations of the music pieces can be roughly classified on the basis of the main musical instrument. The piano pieces are located on the right side of the bottom half, the trumpet pieces on the left side, and the flute pieces in the center. The results imply that the onomatopoeia are strongly related to the main musical instrument. The piano timbre was frequently represented by the “la,” “ta,” and “cha” groups (/ra/, /ra:/, /ra:N/, etc.), and the trumpet by the “pa” and “la” groups.
However, the pieces played on the violin were widely distributed in the space. We will investigate the differences among such pieces and analyze the relationship between the violin timbre and its onomatopoeia representation.
6. Conclusion
A Query-by-Singing MIR system is advantageous over a Query-by-Humming MIR system, but it cannot use lyrical content for retrieving instrumental music. Such pieces are often sung using onomatopoeia. There are many kinds of onomatopoeia, and they are related to sound characteristics such as tone, length, and timbre. If music is converted into a sequence of onomatopoeia, then the onomatopoeia can be used as retrieval keys instead of lyrical content.
In order to investigate the relationship between sounds and onomatopoeia, we constructed a new singing voice corpus. Instrumental pieces were sung with onomatopoeia by 49 singers, and phonetic transcriptions were generated manually. Overall, 452 data samples were recorded.
In addition to constructing the corpus, several basic analyses were carried out.
First, we defined the “word” unit of onomatopoeia by using a statistical language model, and 492 words were obtained. “Ta” and “la” were the most commonly used, but the variation in the usage of onomatopoeia among singers was very wide. Some singers used many kinds of onomatopoeia, while others used only a few words for any type of music. Some of the musical instruments were associated with particular onomatopoeia, but these relationships were not well-defined.
This analysis was based on 452 transcriptions. In future work, we plan to use a larger number of transcriptions. We would then like to investigate the following issues from the viewpoint of the sung data:
• Connectivity of onomatopoeia (bigrams, trigrams, and longer contexts)
• A wider variety of musical instruments (beyond the violin and piano)
• The relationship between a singer's native language and onomatopoeia
Acknowledgement
A part of this work was supported by JSPS KAKENHI Grant Number 25330140.
References
1. N. Kosugi, Y. Nishihara, T. Sakata, M. Yamamoto, and K. Kushima, “A Practical Query-ByHumming System for a Large Music Database,” in ACM Multimedia 2000, 2000, pp. 333–342.
2. A. Ghias, J. Logan, D. Chamberlin, and B. C. Smith, “Query by humming: Musical information retrieval in an audio database,” in Proc. ACM Multimedia, 1995, pp. 231–236.
3. B. Liu, Y. Wu, and Y. Li, “A Linear Hidden Markov Model for Music Information Retrieval
Based on Humming,” in Proc. ICASSP 2003, vol. V, 2003, pp. 533–536.
4. M. Suzuki, T. Hosoya, A. Ito, and S. Makino, “Music information retrieval from a singing voice
using lyrics and melody information,” EURASIP Journal on Advances in Signal Processing,
vol. 2007, Article ID 38727, 8 pages, 2007, doi:10.1155/2007/38727.
5. ——, “Music information retrieval from a singing voice based on verification of recognized
hypotheses,” in Proc. ISMIR, 2006, pp. 168–171.
6. T. Hosoya, M. Suzuki, A. Ito, and S. Makino, “Lyrics recognition from a singing voice based
on finite state automaton for music information retrieval,” in Proc. ISMIR, 2005, pp. 532–535.
7. K. Ishihara, F. Kimura, and A. Maeda, “Music retrieval using onomatopoeic query,” in Proc.
World Congress on Engineering and Computer Science (WCECS), 2013, pp. 437–442.
8. T. Masui, “Music composition by onomatopoeia,” in IFIP Advances in Information and
Communication Technology, 2002, pp. 297–304.
9. T. Nakano, J. Ogata, M. Goto, and Y. Hiraga, “A drum pattern retrieval method by voice
percussion,” in Proc. ISMIR, 2004, pp. 550–553.
10. A. Srinivasamurthy, R. C. Repetto, H. Sundar, and X. Serra, “Transcription and recognition
of syllable based percussion patterns: the case of Beijing opera,” in Proc. ISMIR, 2014, pp.
431–436.
11. K. Tanaka, K. Matsubara, and T. Sato, “Study of onomatopoeia expressing strange sounds:
Cases of impulse sounds and beat sounds,” Transactions of the Japan Society of Mechanical
Engineers Ser.C, vol. 61, no. 592, pp. 4730–4735, 1995, (in Japanese).
12. K. Hiyane, N. Sawabe, and J. Iio, “Study of spectrum structure of short-time sounds and
its onomatopoeia expression,” The Institute of Electronics, Information and Communication
Engineers, Technical Report of IEICE SP97-125, 1998, (in Japanese).
13. D. Mochihashi, T. Yamada, and N. Ueda, “Bayesian unsupervised word segmentation with
nested Pitman-Yor language modeling,” in Proc. ACL, 2009, pp. 100–108.
14. G. Neubig, M. Mimura, S. Mori, and T. Kawahara, “Learning a language model from continuous speech,” in Proc. INTERSPEECH, 2010, pp. 1053–1056.
15. G. Neubig. [Online]. Available: http://www.phontron.com/latticelm/
Motoyuki Suzuki
He received the B.E., M.E., and Ph.D. degrees from Tohoku University, Sendai, Japan, in 1993, 1995, and 2004,
respectively. Since 1996, he worked at Tohoku University as a Research Associate. From 2006 to 2007, he worked at the University of Edinburgh, UK, as a Visiting Researcher. In 2008 he became an Associate Professor at the University of Tokushima, and he currently works at the Osaka Institute of Technology, Osaka, Japan. He
has been engaged in spoken language processing, music
information retrieval, and pattern recognition using statistical modeling. He is a member of the IEICE, IPSJ,
ASJ, and JSAI.
Akimitsu Hisaoka
He received the B.E. degree from the Osaka Institute of Technology, Osaka, Japan, in 2016. He studied music information retrieval and statistical approaches to natural language processing as an undergraduate.