Applying Mining Techniques to Music Information Based on Spectrograms

Master's Thesis, Academic Year 2003 (Heisei 15)
松田卓久（Matsuda Takahisa）
Summary
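In recent years, the remarkable development of computers and other electronic devices,
together with network environments, has made a wide variety of information easy to
obtain. However, although several retrieval techniques for multimedia files such as
images and video have been proposed, they have not yet reached practical use. Where
retrieval is in practical use, it relies on the text and numerical information attached
to the multimedia file. Retrieval by the multimedia file itself, that is, content-based
retrieval, is still a developing technology.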
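This thesis therefore proposes music mining, a method that applies mining techniques to
multimedia files, music files in particular, and uses them directly for retrieval and
analysis, and it discusses the effectiveness of the method. Mining techniques are data
processing techniques that can handle large-scale data and that aim to dig out valuable
information hidden in the data. This study focuses on audio files, which have received
little attention in recent work. Applying the Fourier transform to an audio file yields
a spectrogram, a graph of the file in terms of time, frequency, and sound intensity. In
this thesis, the spectrogram is replaced with a single word according to a fixed rule,
and carrying out this conversion through an entire song turns the audio file into text
data. The occurrence frequency of each word in the text data is counted to define a song
vector, and a musical distance that expresses the dissimilarity between two songs is
then defined. The songs are classified into several groups by applying cluster analysis,
one of the mining techniques, to the musical distances obtained in this way. The
classification results can be used for forecasting the future of the music market and
for marketing. In fact, the results of the cluster analysis agreed with current trends
in the Japanese music market, and the thesis proposed a new axis for marketing.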
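
The conversion from audio to song vectors and distances described above can be sketched
in Python. This is only a minimal illustration under stated assumptions, not the method
of the thesis: the summary gives neither the word rule nor the distance formula, so the
per-frame, band-based rule in `frame_to_word`, the frame size, and the Euclidean
distance in `song_distance` are hypothetical stand-ins, as are all function names.

```python
import cmath
import math
from collections import Counter


def frame_spectrum(frame):
    """Magnitude spectrum of one frame via a naive DFT (positive frequencies only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def frame_to_word(magnitudes, n_bands=4):
    """Hypothetical word rule: name the frame after its strongest frequency band."""
    band_size = len(magnitudes) // n_bands
    energies = [
        sum(magnitudes[b * band_size:(b + 1) * band_size]) for b in range(n_bands)
    ]
    return "b%d" % energies.index(max(energies))


def song_to_words(samples, frame_size=64):
    """Cut the song into non-overlapping frames and turn each frame into one word."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [frame_to_word(frame_spectrum(samples[s:s + frame_size])) for s in starts]


def song_vector(words, vocabulary):
    """Relative occurrence frequency of every vocabulary word in the song text."""
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero for an empty song
    return [counts[w] / total for w in vocabulary]


def song_distance(u, v):
    """Assumed dissimilarity: Euclidean distance between two song vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

With these definitions, two songs whose energy sits in different frequency bands get
different words, different song vectors, and therefore a nonzero distance, while a song
compared with itself has distance zero.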
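In addition, affective value retrieval, which is music retrieval by adjectives such as
"bright" and "gentle", and content-based retrieval, which finds audio files similar to
ones the user already has, can both be realized by using the musical distance defined
above. For this thesis, a music retrieval system supporting both kinds of retrieval was
built, and an evaluation experiment showed its usefulness. Furthermore, anticipating
future changes in human feelings and the registration of new audio files in the
database, the thesis proposed adding a feedback mechanism to the retrieval system.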
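
The two kinds of retrieval and the feedback mechanism can be illustrated with a short
Python sketch. The summary does not say how adjectives are tied to songs or how feedback
is stored, so everything here is an assumption made for illustration: each adjective is
represented by a set of exemplar songs, a song is scored by its distance to the nearest
exemplar, feedback simply adds a user-confirmed song to that set, and the distance is
taken to be Euclidean. All names are hypothetical.

```python
import math


def song_distance(u, v):
    """Assumed dissimilarity: Euclidean distance between two song vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def content_search(query_vector, library, top_k=3):
    """Content-based retrieval: titles ranked by increasing distance to the query."""
    ranked = sorted(
        library, key=lambda title: song_distance(query_vector, library[title])
    )
    return ranked[:top_k]


def affective_search(adjective, exemplars, library, top_k=3):
    """Affective value retrieval (assumed scheme): rank every song by its
    distance to the nearest exemplar song registered for the adjective."""
    seeds = [library[title] for title in exemplars[adjective]]

    def score(title):
        return min(song_distance(library[title], seed) for seed in seeds)

    return sorted(library, key=score)[:top_k]


def add_feedback(adjective, title, exemplars):
    """Feedback step (assumed scheme): a user confirms that a song fits the adjective."""
    exemplars.setdefault(adjective, set()).add(title)


# Toy library of song vectors; titles and values are made up for illustration.
library = {
    "a": [1.0, 0.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0, 0.0],
    "c": [0.0, 0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 0.8, 0.2],
}
exemplars = {"bright": {"a"}}
similar_to_a = content_search(library["a"], library, top_k=2)
bright_before = affective_search("bright", exemplars, library, top_k=2)
add_feedback("bright", "c", exemplars)
bright_after = affective_search("bright", exemplars, library, top_k=2)
```

In this toy run, confirming song "c" as "bright" changes which songs rank highest for
that adjective, which is the kind of adaptation a feedback mechanism is meant to give.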
Abstract
In recent years, the remarkable development of the internet and computers has made a
wide variety of information easy to obtain. Technologies for retrieving multimedia files
such as images and movies have been proposed, but they are still under development.
In this paper, we apply mining technologies to multimedia files, music files in
particular, and propose music mining, a technique that uses them directly for search and
analysis; we also discuss the validity of the technique. Mining technology is data
processing technology that can handle large-scale data and whose purpose is to mine the
valuable information hidden in the data. We focus on audio files, which have received
little attention recently. A spectrogram can be obtained by applying the Fourier
transform to an audio file. Following a rule described in the paper, the spectrogram is
converted into a single word, and applying this conversion over a whole song turns the
song into text data. From the dissimilarity between the text data of two songs, we
define a “musical distance” that expresses the dissimilarity between the songs. The
songs are then classified from their musical distances by cluster analysis, one of the
mining technologies.
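
As an illustration of the classification step, the sketch below groups songs from a
matrix of pairwise musical distances. The abstract does not say which cluster analysis
method is used, so average-linkage agglomerative clustering is an assumption chosen for
brevity, and the function name and the toy distances are hypothetical.

```python
def agglomerative_clusters(distances, n_clusters):
    """Repeatedly merge the two closest clusters (average linkage) until
    n_clusters remain. `distances` is a symmetric matrix of song distances."""
    clusters = [[i] for i in range(len(distances))]
    target = max(n_clusters, 1)

    def linkage(a, b):
        # Average distance over all pairs of songs drawn from the two clusters.
        return sum(distances[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > target:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy distances: songs 0 and 1 are close, songs 2 and 3 are close.
toy_distances = [
    [0.0, 0.1, 1.0, 1.2],
    [0.1, 0.0, 1.1, 1.3],
    [1.0, 1.1, 0.0, 0.2],
    [1.2, 1.3, 0.2, 0.0],
]
groups = agglomerative_clusters(toy_distances, 2)
```

Only the distances are needed here, not the song vectors, so any definition of musical
distance can be plugged in without changing the clustering code.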
The classification results can be applied to forecasting and marketing in the music
industry. In fact, the results of the cluster analysis agreed with the trend of the
music market in Japan. We also propose two types of retrieval. One is affective value
retrieval, which retrieves music by adjectives such as “bright” and “gentle”. The other
is content-based retrieval, which finds audio files similar to ones the user already
owns. Both can be realized by using the previously defined musical distance. We build a
music retrieval system that supports both types of retrieval, and have test participants
use the system to measure its accuracy and show its utility.
