Learning Tags that Vary Within a Song

Learning Tags that Vary Within a Song
Michael Mandel, Douglas Eck, and Yoshua Bengio
LISA Lab, DIRO, Université de Montréal
[email protected]
International Society of Music Information Retrieval Conference
August 11th, 2010
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
1 / 26
Vision
Automatically describe music from its sound (autotag)
I
in terms of genre, instrument, vocals, production, rhythm, . . .
In order to
I
I
I
find music by description
browse music with similar descriptions
summarize many songs simultaneously
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
2 / 26
News
Amazon’s Mechanical Turk can collect useful training data
People agree more when describing “closer” musical excerpts
Correlations over time can be used to “smooth” MTurk tags
Smoothing improves overall classification accuracy from 57% to 61%
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
3 / 26
Overview
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
4 / 26
Outline
1
Collecting tags
2
Tag language model
3
Autotagging experiment
4
Summary
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
5 / 26
Outline
1
Collecting tags
2
Tag language model
3
Autotagging experiment
4
Summary
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
6 / 26
Mechanical Turk
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
7 / 26
MTurk data collected
5 clips from each of 185 random blog tracks = 925 clips
I
each seen by 3 turkers and described with 18 tags on average
Total of 2,500 (user,clip) pairs, 15,500 (user,clip,tag) triples
Paid $0.03–$0.05 per clip, total of about $100
Rejected 11% of responses as spammy or incomplete
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
8 / 26
MTurk data analysis: Tag co-occurrence
How similar are the tag vectors from two different (user, clip) pairs?
How does that vary for pairs with different relationships?
For all pairs of observations (or a random subset)
I
I
count number of shared tags
add this count to a bucket based on the observations’ relationship
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
9 / 26
Tag co-occurrence vs “separation” (MajorMiner)
1.2
Co-occurring tags
1.0
0.8
0.6
0.4
0.2
0.0
Mandel, Eck, Bengio (LISA Lab)
Nothing Artist
Album Track
Shared
Learning tags
Clip
August 11th, 2010
10 / 26
Tag co-occurrence vs “separation” in one track
Co-occurring tags above baseline
0.65
0.60
0.55
0.50
0.45
0.40
Mandel, Eck, Bengio (LISA Lab)
60
40
20
0
20
Separation (% of track)
Learning tags
40
60
August 11th, 2010
11 / 26
Outline
1
Collecting tags
2
Tag language model
3
Autotagging experiment
4
Summary
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
12 / 26
Tag language model
Predict joint distribution of tags for each clip
I
using knowledge of its proximity to other clips
Up-weight false negative tags, down-weight false positives
Conditional restricted Boltzmann machine
I
I
I
unsupervised model of p(tags | user, track, clip)
trained on all tags of all clips
smooth by computing E {tags | user, track, clip} for each clip
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
13 / 26
Restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
14 / 26
Conditional restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
15 / 26
Conditional restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
15 / 26
Conditional restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
15 / 26
Conditional restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
15 / 26
Conditional restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
15 / 26
Conditional restricted Boltzmann machine
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
15 / 26
Outline
1
Collecting tags
2
Tag language model
3
Autotagging experiment
4
Summary
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
16 / 26
Experiment
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
17 / 26
Experiment
Two datasets
I
I
MTurk: 925 clips, 100 tags
MajorMiner: 2500 clips, 100 tags
Each tag is treated as a separate classification problem
I
I
Train using smoothed or aggregated data
Test using smoothed or aggregated data
Evaluate using classification accuracy on a balanced test sets
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
18 / 26
Accuracy on balanced test set, averaged over tags
Mechanical Turk
Trained
Raw
Smoothed
Raw
Tested
Smoothed
56.87 ± 0.52
61.43 ± 0.51
56.56 ± 0.36
63.40 ± 0.35
MajorMiner
Trained
Raw
Smoothed
Mandel, Eck, Bengio (LISA Lab)
Raw
Tested
Smoothed
65.97 ± 0.49
66.67 ± 0.49
Learning tags
60.58 ± 0.35
63.09 ± 0.35
August 11th, 2010
19 / 26
MTurk data, tested on raw tags
hiphop
techno
disco
dance
electronica
rap
angry
acoustic
acousticguitar
party
club
folk
synthesizer
rock
violin
guitar
blues
malevocals
classic
electricguitar
slow
femalevocals
relaxed
loud
80s
metal
country
pop
fast
soft
instrumental
upbeat
sad
female
piano
bass
trance
banjo
jazz
funk
male
indie
happy
and
love
duet
alternative
christmas
classicrock
harmonica
Mandel, Eck, Bengio (LISA Lab)
Smoothed
Raw
0.4
0.5
0.6
Learning tags
0.7
0.8
0.9
August 11th, 2010
20 / 26
MajorMiner data, tested on raw tags
hiphop
rap
quiet
club
loud
folk
silence
rock
jazz
dance
ambient
beat
techno
soft
slow
house
piano
pop
funk
saxophone
drummachine
female
distorted
noise
repetitive
synth
80s
sample
guitar
strings
electronica
male
punk
end
electronic
vocal
voice
country
trumpet
acoustic
bass
horn
british
fast
drum
indie
keyboard
instruments
organ
solo0.4
Mandel, Eck, Bengio (LISA Lab)
Smoothed
Raw
0.5
0.6
0.7
Learning tags
0.8
0.9
1.0
August 11th, 2010
21 / 26
Outline
1
Collecting tags
2
Tag language model
3
Autotagging experiment
4
Summary
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
22 / 26
News
Amazon’s Mechanical Turk can collect useful training data
People agree more when describing “closer” musical excerpts
Correlations over time can be used to “smooth” MTurk tags
Smoothing improves overall classification accuracy from 57% to 61%
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
23 / 26
Future work
Combine audio with joint tag model
Exploit unlabeled and weakly labeled data
Analyze song structure using tags more systematically
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
24 / 26
Thanks!
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
25 / 26
Thanks!
Questions?
Mandel, Eck, Bengio (LISA Lab)
Learning tags
August 11th, 2010
26 / 26