Learning Tags that Vary Within a Song Michael Mandel, Douglas Eck, and Yoshua Bengio LISA Lab, DIRO, Université de Montréal [email protected] International Society of Music Information Retrieval Conference August 11th, 2010 Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 1 / 26 Vision Automatically describe music from its sound (autotag) I in terms of genre, instrument, vocals, production, rhythm, . . . In order to I I I find music by description browse music with similar descriptions summarize many songs simultaneously Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 2 / 26 News Amazon’s Mechanical Turk can collect useful training data People agree more when describing “closer” musical excerpts Correlations over time can be used to “smooth” MTurk tags Smoothing improves overall classification accuracy from 57% to 61% Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 3 / 26 Overview Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 4 / 26 Outline 1 Collecting tags 2 Tag language model 3 Autotagging experiment 4 Summary Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 5 / 26 Outline 1 Collecting tags 2 Tag language model 3 Autotagging experiment 4 Summary Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 6 / 26 Mechanical Turk Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 7 / 26 MTurk data collected 5 clips from each of 185 random blog tracks = 925 clips I each seen by 3 turkers and described with 18 tags on average Total of 2,500 (user,clip) pairs, 15,500 (user,clip,tag) triples Paid $0.03–$0.05 per clip, total of about $100 Rejected 11% of responses as spammy or incomplete Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 8 / 26 MTurk data analysis: Tag co-occurrence How similar are the tag vectors from two different (user, clip) pairs? How does that vary for pairs with different relationships? For all pairs of observations (or a random subset) I I count number of shared tags add this count to a bucket based on the observations’ relationship Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 9 / 26 Tag co-occurrence vs “separation” (MajorMiner) 1.2 Co-occurring tags 1.0 0.8 0.6 0.4 0.2 0.0 Mandel, Eck, Bengio (LISA Lab) Nothing Artist Album Track Shared Learning tags Clip August 11th, 2010 10 / 26 Tag co-occurrence vs “separation” in one track Co-occurring tags above baseline 0.65 0.60 0.55 0.50 0.45 0.40 Mandel, Eck, Bengio (LISA Lab) 60 40 20 0 20 Separation (% of track) Learning tags 40 60 August 11th, 2010 11 / 26 Outline 1 Collecting tags 2 Tag language model 3 Autotagging experiment 4 Summary Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 12 / 26 Tag language model Predict joint distribution of tags for each clip I using knowledge of its proximity to other clips Up-weight false negative tags, down-weight false positives Conditional restricted Boltzmann machine I I I unsupervised model of p(tags | user, track, clip) trained on all tags of all clips smooth by computing E {tags | user, track, clip} for each clip Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 13 / 26 Restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 14 / 26 Conditional restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 15 / 26 Conditional restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 15 / 26 Conditional restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 15 / 26 Conditional restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 15 / 26 Conditional restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 15 / 26 Conditional restricted Boltzmann machine Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 15 / 26 Outline 1 Collecting tags 2 Tag language model 3 Autotagging experiment 4 Summary Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 16 / 26 Experiment Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 17 / 26 Experiment Two datasets I I MTurk: 925 clips, 100 tags MajorMiner: 2500 clips, 100 tags Each tag is treated as a separate classification problem I I Train using smoothed or aggregated data Test using smoothed or aggregated data Evaluate using classification accuracy on a balanced test sets Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 18 / 26 Accuracy on balanced test set, averaged over tags Mechanical Turk Trained Raw Smoothed Raw Tested Smoothed 56.87 ± 0.52 61.43 ± 0.51 56.56 ± 0.36 63.40 ± 0.35 MajorMiner Trained Raw Smoothed Mandel, Eck, Bengio (LISA Lab) Raw Tested Smoothed 65.97 ± 0.49 66.67 ± 0.49 Learning tags 60.58 ± 0.35 63.09 ± 0.35 August 11th, 2010 19 / 26 MTurk data, tested on raw tags hiphop techno disco dance electronica rap angry acoustic acousticguitar party club folk synthesizer rock violin guitar blues malevocals classic electricguitar slow femalevocals relaxed loud 80s metal country pop fast soft instrumental upbeat sad female piano bass trance banjo jazz funk male indie happy and love duet alternative christmas classicrock harmonica Mandel, Eck, Bengio (LISA Lab) Smoothed Raw 0.4 0.5 0.6 Learning tags 0.7 0.8 0.9 August 11th, 2010 20 / 26 MajorMiner data, tested on raw tags hiphop rap quiet club loud folk silence rock jazz dance ambient beat techno soft slow house piano pop funk saxophone drummachine female distorted noise repetitive synth 80s sample guitar strings electronica male punk end electronic vocal voice country trumpet acoustic bass horn british fast drum indie keyboard instruments organ solo0.4 Mandel, Eck, Bengio (LISA Lab) Smoothed Raw 0.5 0.6 0.7 Learning tags 0.8 0.9 1.0 August 11th, 2010 21 / 26 Outline 1 Collecting tags 2 Tag language model 3 Autotagging experiment 4 Summary Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 22 / 26 News Amazon’s Mechanical Turk can collect useful training data People agree more when describing “closer” musical excerpts Correlations over time can be used to “smooth” MTurk tags Smoothing improves overall classification accuracy from 57% to 61% Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 23 / 26 Future work Combine audio with joint tag model Exploit unlabeled and weakly labeled data Analyze song structure using tags more systematically Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 24 / 26 Thanks! Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 25 / 26 Thanks! Questions? Mandel, Eck, Bengio (LISA Lab) Learning tags August 11th, 2010 26 / 26
© Copyright 2026 Paperzz