Abstract/Summary

This report discusses the design and implementation of a real time, computer generated drum accompanist. The report starts off describing the motivation and aim of the project before discussing a selection of related work. This background work focuses on the concepts of tracking the timing of musical signals, deriving information from them, and creating a musical accompaniment. From there, the report gives an overview of the entire system, then proceeds to go into detail on each of the components. Justification for each component and the methods used to create them are discussed alongside these details, as well as how each component contributes to the overall performance of the system. The final portion of the report discusses the outcome of the project and how well it performed the designated task of providing a drum accompaniment to the user.

The system described here makes use of machine listening techniques to analyse an incoming musical audio signal and determine various musical properties such as dynamics, tempo, and style. The system will then generate an expressive drum beat to accompany the audio signal in real time. These drum beats will be created with a set of genetic algorithms which are intended to model basic human creativity in a musical sense. The system is intended as a practice tool as well as a means to observe the musical interaction that may occur between humans and computers.

Contents

Chapter 1: Introduction
1.1 Motivation
1.2 Overview
1.2.1 Aim
1.2.2 Objectives
1.2.3 Requirements
1.2.4 Enhancements
1.2.5 Deliverables
1.3 Methodology
1.3.1 Overview
1.3.2 Development Stages
1.3.3 Schedule
Chapter 2: Related Background
2.1 Audio Analysis and Onset Detection
2.2 Beat Tracking
2.3 Computer Generated Music
Chapter 3: Design and Development
3.1 Key Concepts
3.2 Overall Architecture
3.3 General Approach
3.4 Modelling a Drum Kit
3.5 Input
3.6 Audio Signal Analysis
3.6.1 Onset Detection
3.6.2 Beat Tracking
3.6.3 Volume
3.6.4 Sustain
3.6.5 Complexity
3.7 Drum Beat Generation
3.7.1 Scoring
3.7.2 Evolution
3.7.3 Introducing Cymbals and Toms
3.8 Drum Beat Enhancement
3.8.1 Probability Values
3.8.2 Volume Values
3.8.3 Tone and Position Values
3.9 Output
3.10 Simulink Model
Chapter 4: Results
Chapter 5: Evaluation
5.1 Overview
5.2 Quantitative Evaluation
5.2.1 Beat Tracking
5.2.1.1 Paranoid Android
5.2.1.2 Blitzkrieg Bop
5.2.2 Latency
5.3 Qualitative Evaluation
Chapter 6: Conclusions and Further Work
6.1 Conclusions
6.2 Further Work
References
Appendix A - Personal Reflection
Appendix B - Interim Report
Appendix C - Operation Manual
Appendix D - Resources Used

Chapter 1: Introduction

1.1 Motivation

Many musicians enjoy performing and practising music with other musicians; however, it is not always possible to assemble a group of like-minded musicians for this purpose. A software-based solution providing automatic accompaniment is a good way to practise performing with others, as well as to gauge how a piece of music might sound with additional instruments. For those looking to play their music with a drummer, drum programming software is readily available to anyone with an internet connection. However, most of these programs only allow for the programming of simple looping drumbeats and can take time to perfect. Those who are not familiar with drumming may also have a hard time programming a drumbeat that sounds good and fits with the music being played. These drum machines also do not typically respond to the musician's playing, which forces the musician to follow whatever has been programmed. This is particularly bad for practising as the player is not able to play as expressively as they would with a human drummer.
Various instrument playing robots have also been created, both programmable and improvisational. Unfortunately, most musicians cannot afford to purchase robots to use as bandmates. What's needed is a system which allows a musician to perform material on a live instrument and hear back a drumbeat which can go along with it. This could be useful as a practice tool and as a compositional aid for those who may not have immediate access to a human drummer. The system described in this report is a step towards providing such a practice tool for musicians who may not be able to rehearse with other musicians, allowing them to play their instrument live and have a supporting drum beat provided to them.

Many beginning musicians will practise by playing along with their favourite songs. However, this method does not allow one to take a leading role in the performance. The finished program can be a useful tool for soloists to improve their group performance in an interactive way and without relying on a metronome. The program may also be useful for those trying to write music, as they could get an idea of how a given guitar or piano part would sound with drums supporting it. It will also be interesting to see how human musicians interact with a computer simulated drummer and how it may influence their performance.

This project is relevant to the field of Artificial Intelligence as it is fundamentally a computerised model of human creativity as it relates to drum performance. This system is designed to exhibit the basic musical intelligence of a drum player in such a way that it can react appropriately to different musical cues and properties.

1.2 Overview

1.2.1 Aim

The overall aim of this project is to design and implement a program which uses machine listening and learning to analyse audio produced by a human musician in real time. By analysing beat patterns and rhythms, the program should provide feedback in the form of an accompanying percussion part. The system will receive a musical audio signal as it is played, determine the tempo and beat pattern of the signal, then output a percussion accompaniment to the musical signal in real time.

First, a beat detection feature needs to be implemented. This feature must derive the beat structure from an incoming audio waveform by detecting onset events within the wave. These onsets are defined by sudden energy increases within the waveform and typically coincide with a note or chord being played on the instrument that produced the waveform. Once the beat structure has been determined, a fitting drum part will be generated in a predictive manner so that it will be played along with the human musician in real time. This drum part will be modelled on the components of a standard rock/jazz drum kit consisting of at least a bass drum, snare drum, and hi hats.

Another aim is to provide a practice tool for musicians who may not be able to rehearse with other musicians. It will also be interesting to see how human musicians interact with a computer simulated drummer and how it may influence their performance.

1.2.2 Objectives

The program can be divided into five primary components:

- Audio Input/Output Handling
- Signal Analysis
- Beat Tracker
- Drumbeat Generator
- Artificial Expression Simulator

There are many ways to design an audio input/output system, so it is important to employ a system which is efficient and effective. In order to handle streaming input, a buffer will be implemented so that small samples of the incoming audio can be analysed.
The output will be handled in a similar way, most likely by playing out full measures of drumbeats at a time, though this may be subject to change.

The signal analysis component processes the incoming musical waveform into a form that the rest of the program can process. This step is most vital for the beat tracking aspect. Further analysis will attempt to detect other musical features that may be relevant to the style of the music.

The beat tracker will determine the tempo of a piece of music based on a consistent pattern of onset moments occurring in the input signal. Because most humans do not possess perfect timing, the beat tracker will be able to adapt to slight changes in tempo without skipping notes or going out of time. It must also be responsive to expressive playing, where onsets will not always be consistent and straightforward.

The drumbeat generator makes use of a probability template to generate a simple but varying drumbeat. This is primarily to provide a groundwork from which the artificial expression simulator can create interesting and relevant beats to complement the musician's playing. This aspect will hopefully make further use of the input signal analysis to determine the dynamics and overall style of the music being played.

1.2.3 Requirements

Key requirements of the project include the development of a robust and accurate beat tracking algorithm capable of tracking songs with non-constant tempos. It requires a suitable onset detection method from which the beat tracking algorithm can derive a tempo. The beat tracking process needs to work in real time with a constant input; therefore, it must be able to accurately predict approaching beat locations.

A suitable drum beat template must also be created with a strong reliance on the derived tempo. This template will allow for the creation of basic and relatively complex beats while remaining flexible to changes in tempo. The template does not define the drum beat itself but rather provides an empty structure which allows particular drum hits to be specified. This will prevent the generated drum beat from playing out of time.

Artificial expression is another key aspect of the system. In order to avoid repetitive and emotionless drum beats, an algorithmic approach to mimic human creativity will be implemented. This will allow the drum beat to respond to a musician's dynamic level and style as they change throughout a song. With respect to the basic drum beat, extra hits and drum fills should be included but regulated so they are not overdone; this will lend a more human effect to the beat.

The minimum requirements are as follows:

1. Implement an efficient input/output system

The program will require a buffer system to analyse small samples of an audio signal as it is produced by a human musician. A circular buffer will most likely be implemented for this purpose. The output must also not interfere with the input signal, as this would cause the program to be influenced by its own output rather than the human musician.

2. Analyse beat patterns within the audio signal

Extracting the tempo from input audio is essential to creating a drumbeat which will stay in time with the user's playing. This can be accomplished by performing an onset analysis on the waveform and by finding patterns in the peaks that occur. Any regular interval occurring between peaks (even if other peaks are present during the interval) can produce the tempo. [7]
3. Perform in Real Time

In order to provide a true accompaniment, the program must be able to analyse the audio signal, produce a beat, and play it back in time with the human musician. The program will have to predict what the user will play and when, then play back the drumbeat to go along with what was predicted.

4. Implement a genetic algorithm to produce potential beat candidates

The genetic algorithm will focus on generating patterns for each instrument of the drum kit to produce a whole beat. Each beat is scored according to criteria relating to each instrument, which together determine its overall score. The higher scoring beats will move on to the next generation, while beats meeting certain scoring requirements will be randomly selected for crossover and mutation. The lowest scoring beats will be scrapped and replaced with new randomly generated beats in order to provide new possible beats. This algorithm will require modification as development progresses. A potential beat created by this genetic algorithm could be visualised in the following way, with 1 indicating a hit and 0 representing a rest. The following beat, shown in Figure 1, is for a single 4/4 measure split along 16th notes; it can be programmed as a simple integer matrix in most programming languages:

Figure 1. Drum Beat Matrix Representation (adapted from [5])

5. Simulate a human drummer with feedback based on the previous analysis of tempo and patterns

Using all the previous minimum requirements, the final drumbeat will attempt to emulate a human drummer in the sense that it will adapt to change and provide suitable, non-repetitive drumbeats in real time. The minimum requirement for this virtual drummer's kit is a bass drum, snare drum, and hi hat.

1.2.4 Enhancements

Most rock and jazz drummers possess more than just a snare, kick, and hi hat. Including these additional components would allow for more expressive drumbeats to be created. Each addition will require a new scoring system for the genetic algorithm, as well as the introduction of new rules governing how each component can interact with the others. Acceptable drumbeats should be playable by a competent human drummer possessing two arms and two legs.

Extracting musical features and information from the input can also improve the system. Further waveform analysis can result in information which will affect various characteristics of the generated drumbeat. Analysis may include changes in amplitude, the rate of onsets, and the rate of decay of any peaks, as these can all help influence stylistic decisions which are made during drumbeat creation. [7]

Because most drummers are not just drum machines, it is important to include a degree of variability within a set beat. This allows a single beat to be repeated a few times with enough variation to create a more natural effect. Instead of using 1s and 0s to indicate a hit or rest, values between 0 and 1 can be used to show the probability of a hit occurring at that moment in time. Hits which help define the beat can be weighted so that they will always be a 1 or close to it, while less essential hits can be given lower values so that they are not repeated on every iteration. [5]

Introducing tone values for the cymbals and toms, as well as a position value for the hi hats, may help to increase the realism of the virtual drum kit. This would create the effect of a drum kit possessing multiple toms and cymbals without having to create additional instrument tracks within the drumbeat template.
With the hi hats, this value would represent the distance between the two cymbals. The tone values would also range from 0 to 1, with lower values corresponding to a lower tone/hi hat distance, and higher values indicating a higher tone/hi hat distance. [5] Allowing for volume adjustments would also let the program create accents and ghost notes, again giving the beat a more human feel.

Additional features such as capacity for triplets, drum fills, and swing beats would also greatly increase the realism and musical capabilities of the virtual drum accompanist. Many drummers use drum fills to signify transitions, create tension, and bridge a section of music where not much else is going on. A drum fill in this system would temporarily override the current beat for an appropriate duration so it does not sound like a second drummer has suddenly joined in and dropped out. The fills would need to be regulated so they do not occur too often or at inappropriate moments. This feature would greatly increase the program's versatility if correctly implemented.

In an effort to make the program more user friendly, a simple user interface containing all the functions of the program could be implemented. This would allow any user to simply install the program and begin using it in a fully live situation without having to compile code or perform any unnecessary setup.

1.2.5 Deliverables

The deliverables of this project will include a final report and a program which can take a musical input and output a drumbeat to accompany it in real time.

1.3 Methodology

1.3.1 Overview

Iterative/incremental development [4] allows a developer to gradually incorporate individual features into a program in such a way that each addition results in a fully functional version of the program. Each addition follows a cycle of planning, design, implementation, testing, and evaluation. Every addition also gives the developer a chance to reconfigure other existing design aspects within the program if necessary. This method allows for a great deal of flexibility during architecture construction as well as early and easier bug detection. [4] [15]

Each working implementation should be thoroughly tested and analysed before work on the next version begins. Individual features should also be clearly separable in order to accommodate modification. This process may call for a redesign of the system architecture should the need arise. [4] [15]

A project such as this can benefit greatly from this method of development. Due to the reliance on genetic algorithms and probability based functions, it is difficult to predict exactly how the program will react to any given feature implementation. Utilising an iterative/incremental development method, each addition of a feature can be tested and optimised until the desired result is achieved. This development method also states that features should be easily separable and well organised; this allows the developer to disable a given feature at any point in the development cycle in order to evaluate its usefulness and efficiency. Sub-features may interact with each other but will be designed to be independent of one another.

The project will follow the iterative/incremental [4] [15] method by first carrying out the initial planning phase, which involves heavy research and a basic architecture design. A few starting features will also be considered during this stage in such a way that the next stage will produce workable results.
The next step will be to create the basic beat detection and drumbeat generation structures. These two features make up the foundation of the project and will serve as a starting point for all additional features to be implemented. A series of optimising features will then be gradually incorporated into the program.

1.3.2 Development Stages

The development stages of the project are shown below:

Stage 1 - Initial Planning
- Research
- Broad architecture proposals

Stage 2 - Beat Detection and Drumbeat Generation
- Beat detection: implement as a subsection of the main function and test to ensure detection is accurate.
- Drumbeat generation: create a basic drumbeat template using the genetic algorithm (kick, snare, hi hat). Do not take the input waveform into consideration; create the template independently of the waveform. Check that the top beats are acceptable. The beat generator will only respond to pre-recorded tracks at this point.

Stage 3 - Beat Detector/Generator Syncing
- Link the two features together so that the generated drumbeat will be displayed in time with the song playback.
- Latency analysis will start here for reference.

Stage 4 - Basic Wave Feature Analysis
Incorporate basic wave feature analysis to augment the beat generator (features will be used to determine stylistic changes to the beat). This will have a few different segments, each addition accompanied by latency analysis:
- Dynamics: volume parameters added to the beat template, determined by relative local amplitude.
- Sustain: observes peak trails to determine whether a legato (smooth) or staccato (detached) beat should be used; a hi hat openness parameter is introduced.
- Accents and ghost notes: adjust volume parameters of individual hits to allow for more dynamic drum beats.

Stage 5 - Real Time Implementation
- Real time implementation of the beat generator/detection system. This may occur during the feature implementations of stage 4.

Stage 6 - Hit Probabilities and Additional Tracks
- Hit probabilities: non-essential drum hits can have probability values assigned to them to create the effect of a more varied drumbeat while still retaining its overall feel.
- Additional tracks: cymbals and toms added to the structure.

Stage 7 - User Testing
- Optimisation and deployment.

1.3.3 Schedule

Below, the tentative schedule as of the writing of the interim report is shown. This schedule indicated that many of the styling features associated with the drum beat output would be programmed just after the submission of the interim report, at the same time as the conversion to real time. Before this point the system only accepted pre-recorded audio tracks. This plan, shown in figure 2, allowed for plenty of time at the end for writing the report, evaluation, and optimisation.

Figure 2. Initial Project Schedule

The chart below (figure 3) shows how the process actually occurred.

Figure 3. Final/Actual Schedule

Converting the system to real time proved to be much more difficult than previously anticipated. The beat tracking software used with pre-recorded tracks [LabROSA] was not designed for real time function, so I opted to convert their system for real time use. Unfortunately, when attempting real time processes in MATLAB using Simulink, many of the commands and techniques commonly used in MATLAB are not available. This required an extensive period of time in which all of the lines in the original code which threw errors were substituted with alternate and less efficient blocks of code in order to circumvent the limitations of Simulink. Nearing the end of this process, it became apparent that a significant amount of latency was occurring during test runs and that the clock timer built into Simulink would also slow down.
These factors led to the decision to start over using the aubio [10] beat tracking system which, while not as accurate, was fully ready to operate in real time. Following a series of licensing, installation, and driver issues which were resolved thanks to help from the School of Computing support staff, the aubio system was able to bring the system up to speed in terms of real time execution. This period resulted in other aspects of the project being pushed back, most notably the drum styling features which are considered vital to the emulation of a human drummer. The various obstacles which occurred during this time led to an early start on the final report. The limitations associated with Simulink also prevented the desired method of audio output, utilising a MIDI based drum kit to accompany the input signal.

Chapter 2: Related Background

This section gives an overview of previous work which relates to this project. Various audio analysis techniques are discussed first, followed by different approaches to the problem of beat tracking. The final sub-section looks at a few papers related to the artificial emulation of human creativity.

2.1 Audio Analysis and Onset Detection

Onset detection is a key aspect of the overall beat tracking algorithm. Bello et al. [1] discuss various approaches involving differences in energy and phase within the waveform. Each method has a potential application that is largely based on the type of input that will be most commonly received. One method is to observe the spectral features of the waveform by performing a Fourier transform on the wave [2]. The features that may be derived from a waveform filtered this way are useful in detecting onsets amidst relatively noisy and layered inputs. Temporal features, which relate to a wave's amplitude, may also be taken into consideration. A valid onset typically occurs during a sudden increase in the waveform's amplitude [26]. It is described in [1] how rectifying and smoothing the signal can help to accentuate these onset features. It is also argued that the wave should be filtered so that high amplitudes which are not part of a sudden rise are lowered. This makes it easier to identify where the onsets actually occur. It is stated that analysis of the temporal features is a fast and efficient method of onset detection and is particularly useful when the audio signal is being produced by a single accented instrument such as a guitar or piano.

Ellis [11] has worked to develop a beat tracking program which utilises MATLAB. This program analyses a waveform and generates an audio file which produces clicks corresponding to the detected beat. Ellis discusses using an onset strength envelope, which is essentially a filtered representation of the original waveform. The onset strength envelope used in [11] locates sudden energy increases within the waveform and represents them as individual spikes in the processed waveform. Higher spikes tend to represent valid onsets. The filtering process involves performing a short-time Fourier transform similar to the spectral feature analysis approach described in [1]. The signal is also passed through a high pass filter and is then convolved with a Gaussian envelope [11]. This seems to be an effective approach, though it would need to be modified for real time applications.

Goto and Muraoka [13] attempt to recognise chord changes in a piece of music in order to detect onsets for the purpose of beat tracking.
By focusing on the lower end of the frequency spectrum, the onset events are likely to occur on chord changes rather than on potentially complex melodies, which are typically of a higher frequency. While this method is relevant due to its execution in real time, it may not be robust enough to accommodate pieces of music which do not feature clear and discernible chord changes.

Tools such as Max/MSP and Pd can also be used for various aspects of audio analysis, as described in [16]. The fiddle system outlined in [16] attempts to determine the pitch of an incoming audio stream using sinusoidal decomposition. A rectangular-window discrete Fourier transform is used to obtain the peaks in the audio along with their corresponding frequencies and amplitudes. From there, the fundamental frequency of the input is estimated using a likelihood function in which each individual peak is matched to the nearest frequency corresponding to a musical note. This works for single note inputs as well as inputs consisting of more than one note and will display the note name(s) to the user. If an input is not near enough to any of these fundamental frequencies then the input is determined to not have any pitch at that moment in time. Another system described in [16] is the bonk application. This system is used to detect the onsets of percussive hits which are not pitched and therefore not susceptible to sinusoidal decomposition analysis [17]. The system uses spectral analysis to detect percussive onsets rather than looking for sudden, sharp increases in amplitude, in order to avoid onsets being masked by loudly ringing sounds. This analysis is further used to help identify which instrument produced the onset; this is done by comparing the analysis with pre-stored and identified spectral templates. These systems, most notably the bonk program, may offer interesting features to future implementations of the virtual drum accompanist system.

2.2 Beat Tracking

The concept of beat tracking has been explored in many different ways. Essentially, beat tracking is the process of deriving the tempo and beat pattern of a piece of music in the same way that a human may tap their foot in time with the music. A few approaches to this problem are discussed in this section.

Goto and Muraoka have developed a series of beat trackers, one of which derives the beat based on audible percussive hits [12]. This beat tracker is able to function in real time but relies on a steady drum beat being present, whereas in this project the drum beat is to respond to the tempo, not lead it. Another of Goto and Muraoka's beat trackers [13] detects chord changes in order to determine patterns, specifically root note changes which occur between 10 Hz and 1 kHz.

The beat tracker used by Zhe and Wang [35] detects measures by extrapolating from evenly spaced, pronounced downbeats. Subdivisions are then calculated to fill out each measure. This system assumes a basic pop song in 4/4 time is being played at a constant tempo. Songs which do not feature a pronounced downbeat may confuse the system, therefore limiting its ability to detect tempos in songs that do not follow pop conventions. This system is not likely to have any relevance to this project.

The beat tracking system described by Ellis [11] utilises MATLAB to analyse a waveform and generate an audio file which produces clicks corresponding to the detected beat.
As described in section 2.1, Ellis derives an onset strength envelope from the waveform using a short-time Fourier transform, a high pass filter, and convolution with a Gaussian envelope, following the filtering equations described by Bello et al. [1]. By finding a pattern of recurring and equidistant peaks in the onset envelope, a general tempo can be easily derived. The tempo calculation is weighted to prevent extremely distant and near peaks from forming improbable tempos. The derived tempo is biased towards 120 beats per minute, a design decision which acts as a probable middle ground for human created music. Extremely fast and slow tempos, while still possibly in time, would not represent the common human interpretation of the beat. This system is open source and easy to set up, though it is designed to work only with pre-recorded audio files.

Collins [7] proposes a multi-agent beat tracking algorithm which follows a human musician playing a MIDI keyboard in real time. This method uses a number of agents which predict where the next beat should occur. Each agent, a hypothesis of beat locations, has a score and a weight to determine its accuracy. An agent's score is increased for making correct beat predictions which coincide with a human musician playing a note. Low scoring agents are erased and new ones are constantly created to allow for the tracking of dynamic tempos. Each agent contains a set of values describing that particular agent's current score, weight, and beat estimations. Whenever an onset occurs, each agent is checked to see how well it predicted the onset occurring at that particular time. Poorly performing agents are eliminated and new ones are generated based on the onset time. Scoring is weighted depending on the amount of time since the last onset in order to prevent a subdivision of a beat from counting as an actual beat. Essentially, this weighting prevents overly fast tempos from being derived. Collins' onset detection method is dependent on MIDI onsets rather than a waveform, so it does not utilise any onset detection methods that would be required of a non-MIDI audio signal.

The aubio real-time audio library [10] provides real time beat tracking for streaming audio input captured by a microphone. The library uses real time onset detection methods described in [1]. The real time aspect is possible due to the predictive nature of the system. The next predicted beat is determined by the intervals between the previous few beats with a Gaussian weighting applied, meaning the most recent beat intervals have more influence. The predicted beat location is based on the location of the beat immediately preceding it. This makes the tracker very adaptable to changes in tempo, though it may be prone to error if a complex input rhythm is introduced. The primary advantage of this system is the absence of noticeable lag during the execution of the beat tracking function.

Toiviainen [27] explains how adaptive oscillators can be used for beat tracking purposes.
The even motion of an oscillator can easily map to the pulse of a beat. The peaks during oscillation can correspond to downbeats while the troughs can represent upbeats. In this way, reliable counting during a bar can be modelled. Onsets occurring near the peak of an oscillation can influence the oscillator's speed depending on whether the peak occurs before or after the detected onset. This also ensures that the onsets of complicated rhythms have less influence on the derived tempo, provided that a strong and clearly defined downbeat is present. Toiviainen's adaptive oscillator also takes into account short term and long term changes. If a sudden change in tempo occurs, the oscillator will increase speed in order to catch up before settling back to the original tempo. The long term change tracker looks at how the tempo has changed over time and adjusts the base speed of the oscillation to match. The adaptive and dynamic nature of this system looks to be promising, though it requires a MIDI input for its onset detection aspect.

The B-Keeper system uses the kick drum from a real drum kit to determine the beat in a live setting [24]. The program uses a microphone with a line into a Max/MSP program which performs the onset detection and beat tracking portions. The beat tracking aspect hooks up to a system which plays the backing tracks used in live performances, without the need for the drummer to play to a click track in a pair of headphones. This allows for great expressivity within the band by not constraining them to an unchanging backing track.

2.3 Computer Generated Music

Collins [7] explains how his system creates a melodic accompaniment to a human musician in real time by detecting the chords and notes being played and producing an opposing melody in order to inspire the musician to try something new. The rhythm of the melody is also designed to accent beats which the human musician is overlooking. While many interesting ideas are discussed, the virtual drummer of this project is meant to provide a more supportive role rather than pushing new ideas. Perhaps this idea could be incorporated as an option in a future version of the program for musicians who may be seeking new sources of inspiration or just looking for a challenge.

Collins also touches on the creation of computer generated music in 'Algorithmic Composition Methods for Breakbeat Science' [5]. Collins explains how a series of probability templates can be set up which display the probability of a given instrument being activated at that particular moment in time. The example below (a copy of figure 1) displays a template for a single measure of 4 beats, each divided into a set of 16th notes. When the probability value is 1.0, the note is guaranteed to be played on every iteration of the measure, and a 0.0 indicates no chance of activation.

Figure 1. Drum Beat Matrix Representation (adapted from [5])

This method allows for a non-repetitive drumbeat which will cause the generated part to sound slightly more human. Additional values can be attached to each location which affect volume and pitch if desired. The template will be set to synchronise with the beat pattern and will continuously update the time interval between notes as tempo changes occur in the musician's playing. The exact beat locations must be anticipated if the system is to keep up in real time [6].
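To make the idea concrete, the MATLAB fragment below samples one bar from a probability template of the kind shown in Figure 1. The matrix values here are illustrative placeholders, not figures taken from [5]:

    % Probability template: rows = kick, snare, hi hat; columns = 16th notes.
    % A 1.0 entry is played on every iteration of the measure, a 0.0 entry never.
    template = [1.0 0.0 0.1 0.0  0.0 0.0 0.2 0.0  0.9 0.0 0.1 0.0  0.0 0.3 0.0 0.0;   % kick
                0.0 0.0 0.0 0.0  1.0 0.0 0.0 0.2  0.0 0.0 0.0 0.0  1.0 0.0 0.1 0.3;   % snare
                1.0 0.2 1.0 0.2  1.0 0.2 1.0 0.2  1.0 0.2 1.0 0.2  1.0 0.2 1.0 0.2];  % hi hat

    % Each pass through the measure draws a fresh realisation of the beat.
    hits = rand(size(template)) < template;   % 0/1 matrix of this bar's hits

Because every repetition of the measure is a new random draw, the defining hits (the 1.0 entries) are always present while the low-probability entries come and go, producing the slight variation described above.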
Another method uses a set of pre-defined beats of varying length, where each member of the set has a probability of being executed, rather than the individual hits (which have probabilities of 0 or 1). It is also mentioned that additional probability values could be included that influence an effect or tone for a given hit (these would only apply when p(hit) > 0). This concept could be used for varying volume levels in a way that enables ghost notes and accents within a drumbeat. Having current hits influence the probability values of future hits will also help to create a more interesting drum part by potentially reducing chance based repetition. Collins is continuing work on expanding various machine listening and learning techniques in computer generated musical accompaniment [8].

Weinberg et al. [28] have created a situation in which they can study the interaction between human and robotic musicians. Shimon is a four armed robot which plays the marimba and Haile uses two appendages to play a hand drum. The robots are able to perform along with human musicians in real time, Haile providing percussion accompaniment [31] [30] [29] (with lead/support trading capabilities) and Shimon providing Thelonious Monk inspired marimba parts. Using physical robots to play instruments gives a sense of personality to the robotic performers, making musical interaction with human musicians easier. Shimon is designed with a "head" which moves in time to the music and is able to track and follow fellow performers. The ideas discussed are very relevant to this project, but without the emphasis on physical robotic performers, as this greatly complicates the process. In the future, however, the program could easily be modified to send musical instructions to a robot designed to play a drum kit.

The robot, Haile, can also be adapted to play a xylophone or small marimba [33] [32]. Haile's playing is determined by a genetic algorithm which uses melodic excerpts of a human pianist's playing as its base population. These excerpts set the style for what the robot will play, essentially providing the robot with a natural, human inspired starting point. The robot is able to freely improvise due to the musical instructions it receives from the other instruments. Note densities highly influence the robot's switching between lead and support mode, creating an atmosphere similar to when human musicians improvise together and take turns leading the song. The genetic algorithm used to determine what Haile plays takes roughly 0.1 seconds to run, which enables it to quickly respond to changes in playing if needed. A notable difference between the Haile system and this project is that Haile relies on multiple samples of human playing rather than coming up with its own parts. This is useful for selecting a specific style of playing and may be useful in a future version of this program which enables a musician to select a drumming style for the song he or she will be playing.

Ramirez et al. [22] provide an approach rooted in the concepts of genetic algorithms [18] and machine learning in order to model human musician expressivity [21]. The authors make use of recordings from a professional jazz saxophonist as a training set, which their computer generated composition uses as a basis for deriving creation rules [20]. Because these rules are rooted in a particular style, the program will create a melody that is similar in style while still being relevant to the song it was created for.
The authors attempted to look at the problem from the point of view of a human musician and how they would interpret the music being presented to them before improvising an accompaniment [19]. This method, with its genre based training sets, may be useful for the creation of drumbeats that are intended for a particular style of music, similar to the work of Weinberg et al. [33]

Chapter 3: Design and Development

3.1 Key Concepts

Onset Detection
In musical audio signal processing, an onset refers to the peak that occurs when a note on an instrument is first played. The onset marks the very beginning of the note and should not appear in the middle of a sustained note. Onset detection describes the process of locating these peaks within an audio waveform, typically for the purpose of beat tracking.

Beat/Tempo Tracking
Beat tracking refers to the process of determining the location of the beat of a musical signal, a beat being the pulse within the music that one would normally tap their foot to. The beat can be derived from a set of onsets by finding a consistent pattern between the onsets. The tempo of a song can be derived from these beat locations by looking at the common interval time between them.

Drum Beats
A drum beat is a musical pattern often performed on a drum kit to accompany a piece of music. The drum beat is played in a way which complements the music it is accompanying and can help keep a group of musicians in time with each other.

Modelling Human Expression
While human expression is quite an abstract concept, for the purposes of this report it refers to the way a human drummer plays a drum beat with variations in volume and style, as well as variations on the drum beat itself.

3.2 Overall Architecture

The system follows a circular process in which it first listens to a section of a streaming input which consists of a musician playing their instrument. The signal is then processed in order to determine the beat as well as various musical features. This information is then used to determine the fitness of potential drum beats created by a genetic algorithm. Once a winning drum beat has been chosen, it is then output back to the user with additional expressive features to simulate a realistic drum beat. This is a continuous process so that changes in tempo can be properly accounted for.

This architecture, shown in figure 4, allows the system to be responsive to changes in the musician's playing, as well as having the potential to influence the musician to respond to the drum beat which is created. This back-and-forth interaction is at the heart of many live performances between human musicians; the system therefore attempts to emulate this process in order to provide a more natural feel.

Figure 4. System Architecture Flow

3.3 General Approach

The audio signal analysis is what processes the incoming musical waveform into a form that the rest of the program can recognise. This step is most vital for the beat tracking aspect. The beat tracker will determine the tempo of a song based on a consistent pattern of onset moments occurring in the input signal. As this project is geared toward guitar and piano players, a sampling of both guitar and piano performances will be used to test the beat tracker. Input is captured with a standard, inexpensive computer microphone to ensure that lower quality signals will be effective. For validation, the performances will feature sample songs of varying speeds, volumes, and levels of complexity.
Additional samples will contain non-consistent levels of these features, such as a song that is sped up and slowed down at varying rates. This will be necessary as few musicians are able to continuously play at a perfectly constant tempo. It is also important as some songs are very dynamic in this regard and should be accounted for. Additional analysis will include translating musical concepts into a basic numerical form. These concepts include dynamics, articulation, and complexity.

The drum beat creation aspect makes use of a probability template to generate a simple but varying drum beat. This is primarily to provide a groundwork from which the artificial expression simulator can create interesting and relevant beats to complement the musician's playing. This aspect will make further use of the input signal analysis to determine the dynamics and overall style of the song being played. Because drumming is an art without too many set rules, many of the stylistic decisions made during drum beat creation are based on a few simple conventions which mostly relate to rock and jazz drumming. A wide range of musicians, with backgrounds in styles of music which commonly feature a drum kit, were polled for their opinions on some of the decisions made below. The results of this poll will be discussed as each concept requiring a musical assumption is brought up.

3.4 Modelling a Drum Kit

In order to provide an accurate representation of a human drummer, a standard rock/jazz drum kit will need to be emulated. The image below (figure 5) shows a five-piece drum kit complete with hi hats and cymbals.

Figure 5. Drum Kit Components

The bass and snare drums typically make up the core of a drumbeat, with the bass drum commonly being struck first during a drum beat (known as the down beat) and the snare occurring in between on the backbeats. Depending on how hard the snare is hit, it can provide loud accents or softer filler notes known as ghost notes. The hi hats are used primarily to keep time, but when another drum is being used for time keeping the hi hats may be hit to provide accents. Cymbals will often provide loud accents to a drum beat but can also be used to keep time if desired. Toms range in sound and tone depending on size. They are used for accents and drum fills, and occasionally lower toms will be used to keep time.

3.5 Input

The system receives input from a basic computer microphone picking up the signal of an acoustic instrument. The microphone is connected to a Linux machine running aubio [10]. aubio is an open source package with real time beat tracking and onset detection capabilities. The input signal is analysed and the predicted beats are output as an audio signal in the form of clicks. aubio makes use of JACK (http://jackaudio.org/), an open source program which allows for real time audio interfacing. JACK allows the microphone signal to be sent to aubio and the signal produced by aubio to be sent to the speaker channel. Figure 6 shows a graphical representation of this process.

Figure 6. User/Hardware Information Flow

The click track produced by aubio is then output through the left speaker channel of the computer, while the raw input signal is sent through the right. A stereo speaker cable then leads to the microphone input of a Windows machine running an instance of MATLAB.
The core program is contained within a Simulink model, a feature of MATLAB which allows for real-time handling of signals. The Simulink model receives the signal from the Linux machine and analyses it within two-second windows. The signal is split into two arrays, one containing the click track and one containing the input signal. The tempo of the audio is then derived from the click track while the input signal undergoes further analysis.

A parallel process at the start looks at the input signal to make sure the user is about to start playing. For a piece of music in 4/4, the musician needs to tap out eight hits before the program will begin. This can be done by clapping, tapping on the instrument, or strumming muted guitar strings. Once eight hits have been played, the musician can start playing as normal. Because aubio will continuously output the timing click track, this introduction gives it a chance to readjust itself to the new tempo. The tap-out introduction also helps the program to identify the down beat of the bar so that the drum beat will not only stay in time with the musician, but correctly align itself to the time signature. This technique is often used by musicians in a group for coordination purposes, so it should not be an unfamiliar concept.

3.6 Audio Signal Analysis

3.6.1 Onset Detection

First, a suitable onset detection method must be implemented in order for the beat tracker to successfully derive the tempo of the song. By running the signal through a short-time Fourier transform, passing it through a high pass filter, and then convolving it with a Gaussian envelope, a suitable onset strength envelope can be created [11]. An example of the onset envelope can be seen in figure 7. These methods essentially accentuate possible onsets so that beat tracking becomes a simple matter of peak selection and pattern detection. These methods are also utilised by the real time aubio system, so the onset detection incorporated into its beat tracking system will be used. The input signal analysis must be executed in real time as there is no guarantee that a musician will be playing the same sample repeatedly and at a perfectly consistent tempo.

Figure 7. Waveform Onset Envelope Derivation [11]

3.6.2 Beat Tracking

Accurate beat tracking is vital to the performance of the virtual drum accompanist. If it cannot play in time with a human musician, then it would be considered a very poor accompanist. The beat tracking aspect of this system provides the tempo for the drum beat playback aspect by analysing the interval between beats. Figure 8 shows the stages of beat detection used in the LabROSA Cover Song ID package [11].

Figure 8. The top graph shows the raw waveform of a 17 second piano part. The graph below it displays the waveform after it has been processed to display the onset strength envelope. The bottom plot displays the beat derived from the onset envelope [11].

Because this system is to be run in real time, the aubio beat tracking system [10] discussed in section 2.2 will be used to set the tempo for the output.
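As a rough illustration of how the tempo can be read off the click channel, the MATLAB fragment below thresholds the click track and takes the median interval between clicks. The threshold value and variable names are assumptions for the sketch, not the exact code used in the Simulink model:

    % clickTrack: click signal from aubio (left channel) as a column vector.
    fs = 44100;                                       % sample rate in Hz
    env = abs(clickTrack);                            % rectify the click channel
    isPeak = env > 0.5;                               % crude click threshold (assumed)
    idx = find(isPeak & ~[false; isPeak(1:end-1)]);   % keep rising edges only
    if numel(idx) >= 2
        beatPeriod = median(diff(idx)) / fs;          % seconds between clicks
        bpm = 60 / beatPeriod;                        % tempo estimate in BPM
    end

Using the median interval rather than the mean keeps a single missed or spurious click from skewing the tempo estimate within a window.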
3.6.3 Volume

A proficient drummer knows when to play loudly and when to play softly, often taking cues from the leading musicians as to what volume level is appropriate. The automatic volume control of this system bases the output drum volume on the amplitude of the input signal. The original waveform is monitored and tracked for gradual changes in amplitude, as these will determine the overall volume of the drum beat. The program looks at the maximum amplitude within the current window and retains the value in order to set the base output volume. Because the amplitude of the input signal ranges from -1 to 1, the absolute value is observed, giving a volume range of 0 to about 1.3. This method for volume control was chosen for its simplicity and fast computing time. When a selection of musicians were asked if this was a valid musical assumption, all respondents agreed.

The image below (figure 9) shows the first section of the piano part from The Great Gig in the Sky by Pink Floyd [34]. This song features very soft playing at first with a gradual crescendo (increase in amplitude) into a louder section with a guitar accompaniment. These volume changes are noted within the system to dictate the output volume of the drum beat.

Figure 9. Waveform volume change

3.6.4 Sustain

Articulation is a musical term which refers to how the space between notes is handled: with silence, sustain, or a degree of both. In the case of guitar and piano, sustained chords will often correspond to a smoother playing style, while notes with a noticeable silence between them can mean something a bit more disjointed is being performed. While there are no strict rules dictating how a drummer should react to differences in articulation, it is fairly common to see a drummer matching his or her articulations to those of the other musicians. Regardless of what direction a drummer takes in regard to articulation or sustain, it is a factor that should not be ignored.

Sustain will be represented as a rating from 1 to 3, with 1 meaning little to no sustain, 2 representing a moderate level of sustain, and 3 meaning full sustain. The rate of amplitude decay between peaks is observed in order to determine the sustain value. Quick, drastic drops in amplitude will give a sustain value of 1, while slow and mild drops in amplitude between peaks will produce a value of 3. Any decay rate in between these is given a 2. At the end of the window, the average sustain value is sent forward to determine drum articulation. The sustain value does not carry over from window to window, as this would greatly hinder the system's ability to quickly adapt to changing styles.

Testing has shown that quick amplitude drops to below 20% of the maximum amplitude indicate a staccato (heavily disjointed) style of playing. Drops to this value over a longer period of time (typically greater than .4 seconds) are mainly due to the natural amplitude decay of a musical signal and are therefore not considered to be staccato. Testing has also shown that if a signal retains at least 30% of its maximum amplitude, a high level of sustain is present. These values have been gathered with the use of inexpensive microphones; the use of compressors in any recording device would not be compatible with this system.
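A minimal sketch of this classification in MATLAB, using the 20% and 30% thresholds and the .4 second period described above (the peak-trail extraction is omitted and the variable names are illustrative):

    % peakTrail: amplitude envelope following one onset, normalised so that
    % the peak value is 1.0; fs: sample rate. Thresholds follow the values above.
    earlySamples = round(0.4 * fs);                        % decay observed over .4 s
    tail = peakTrail(1:min(earlySamples, numel(peakTrail)));
    if min(tail) < 0.2
        sustain = 1;        % quick drop below 20% of the peak: staccato
    elseif min(peakTrail) > 0.3
        sustain = 3;        % signal holds above 30% of the peak: full sustain
    else
        sustain = 2;        % anything in between: moderate sustain
    end

The per-peak values would then be averaged over the window before being sent forward, as described above. The three sample signals below illustrate the kinds of decay behaviour these thresholds are intended to separate.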
The first image (figure 10) is from a slower, acoustic version of "Blitzkrieg Bop" by the Ramones [23]. The slower version of the song features long, sustained chords, which accounts for the slow rate of decay in the signal. The signal holds around .4 in amplitude before the next onset occurs, further evidence of the sustained chord.

Figure 10. Gradual amplitude decay with balance around .4 on the y axis.

The next image (figure 11) is taken from the second movement of "Paranoid Android" by Radiohead [14]. This part features slightly quicker decay as only a single string is played at a time as opposed to full chords. This results in a slightly less smooth guitar part, but still with some degree of sustain, dropping only slightly below .4.

Figure 11. Quick initial amplitude decay with slow latent decay and/or balance.

The final image (figure 12) is taken from the song "Hanuman" by Rodrigo y Gabriela [25]. This is a much more disjointed guitar part than the previous samples, as evidenced by the quick drops in amplitude as well as the drops below .2, which indicate brief silences from the guitar (aside from background noise, including the sound created by fingers sliding across the muted strings to new positions).

Figure 12. Quick initial decay with occasional balances below .2.

3.5.5 Complexity

Rhythm is a very important aspect of music; the way a particular part is played on a melodic instrument can greatly influence how a human drummer will perform their part. Typically, rhythmically simple guitar or piano parts will feature an equally simple drum part. One can look at the classic punk band, the Ramones, to hear an example of this. For musicians who are practicing simple parts, a complex drum beat may be detrimental to the feel of the song and may also confuse novices who are not accustomed to intricate rhythms. Musicians who are able to play slightly more complex rhythms should therefore be able to handle increasingly complex drum parts. While some may argue that higher level playing does not always necessitate an elaborate drum beat, it may be useful for practice purposes to help push a musician to work with and around drum beats of varying difficulty.

The complexity value is determined by comparing the number of offbeat onsets with the number of detected beats:

complexity = (Ot - Ob) / B

where Ot is the total number of onsets, Ob is the number of onsets which lie on a detected beat, and B is the total number of beats. This number typically ranges from 0 to anywhere above 2.0, where 0 indicates a very straightforward rhythm, while higher values correspond to an increased complexity. Ob is determined by looking at the time location of a given onset and comparing it with the time locations from the beat array. If the time locations are within 1/40th of a second of each other, it is assumed that the onset is on the beat. While onsets and beats which occur simultaneously normally have the same time value, occasional discrepancies between the two waveforms may result in slightly offset onset index locations. This may also be caused by the musician playing a note slightly ahead of or behind the beat. Once found, the complexity value is sent forward to the genetic algorithm fitness function.
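A minimal sketch of this measure, assuming onsetTimes and beatTimes are vectors of event times in seconds from the onset detector and beat tracker respectively (names are illustrative, saved as complexityValue.m):

    function c = complexityValue(onsetTimes, beatTimes)
        % Offbeat onsets per detected beat: c = (Ot - Ob) / B, with a
        % 1/40 s tolerance for deciding that an onset lies on a beat.
        tol = 1/40;
        Ob = 0;
        for k = 1:numel(onsetTimes)
            if any(abs(beatTimes - onsetTimes(k)) <= tol)
                Ob = Ob + 1;
            end
        end
        c = (numel(onsetTimes) - Ob) / numel(beatTimes);
    end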
The justification for this method comes from the observation that simple rhythms will follow the beat of the song fairly strictly, or at least within even subdivisions of the beat. Complex rhythms will deviate from the beat and typically do not feature consistently spaced notes. In figure 13, the graph on the left (from [34]) shows how each onset occurs at the same time as a beat indicator; this is representative of a very simple input and will be given a complexity value of 0. The graph on the right (from [14]) shows a number of onsets which do not correspond to any beat, as well as a few beats which have no onset occurring simultaneously. The local window in this sample would have a higher complexity value due to these factors.

Figure 13. Low complexity value on the left due to an even beat:onset ratio with all onsets occurring on the beat. The wave on the right has a higher complexity value due to the number of onsets which occur off the beat; some beats also have no onsets associated with them.

3.6 Drum Beat Generation

Artificial creativity is a rather abstract concept in which a model of human creativity is represented algorithmically. Collins [7] employs a method of finding empty spaces within a musician's rhythm and creating counter-melodies within those spaces. This provides an interestingly layered and complex melodic structure that is meant to inspire the human musician into exploring new musical ideas. This results in a constant trade of ideas between human and computer rather than having the computer restricted to a supporting role. Another approach makes use of evolutionary computation and genetic algorithms to generate artificial creativity [22]. This method creates a number of randomly generated musical segments, picks from the most suitable segments, and uses them to seed new segments. This process is repeated until an acceptable segment is found.

This system uses a similar technique, with drum beats in the form of matrices as the musical segments. In the first generation, all of the potential drum beat candidates are randomly generated. From there, the fitness function determines the validity of each drum beat to see what will pass on to the next generation, what will undergo mutation and crossover, and what will be purged and replaced with new, randomly generated drum beats.

3.6.1 Scoring

Drum beats receive an overall fitness score which is determined by many independent factors. If a low complexity value has been detected (between 0 and .5 for most input signals which feature straightforward rhythms), each individual drum track will receive harsh penalties for extraneous hits and for not having the kick and snare follow simple patterns. The hi hat pattern must also stay on the beat for time keeping purposes.

Figure 14. A simplified look at drum beat candidate evaluation.

The above image (figure 14) shows the factors that are taken into consideration when scoring a drum beat for a simple piece of music. The kick drum is scored in the following way:

score = 10 * Kd / Kt

where Kd is the number of kick drum hits occurring on the down beats (beats 1 and 3) and Kt is the total number of kick drum hits. This method pushes forward kick drum tracks which keep the pulse of the beat, while an extra hit or two will be allowed without too much penalty. The maximum score is always capped at 10 to provide an evenly rounded score across all tracks; 10 was chosen for no other reason than its simple relation to percentages and to avoid over-complicating the fitness functions. The snare drum is scored in the same way but with an emphasis on the upbeats (2 and 4).

The hi hat, being the primary time keeper in this system, is scored differently:

score = 10 * Hb / T

where Hb is the total number of hi hat hits occurring on the beat and T is the time signature numerator (commonly 4). This is done so the hi hats stay on the beat to help keep time for the musician while still allowing for some variation on the off beats.
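Under these definitions the low-complexity scoring reduces to a few lines. A sketch for the kick drum, assuming an eighth-note bar stored as a 1x8 hit vector (the names and the exact penalty shape are illustrative; the real fitness function weighs several additional factors):

    function s = kickScore(kick)
        % Score a 1x8 kick track: reward hits on the downbeats (beats 1
        % and 3, i.e. eighth-note slots 1 and 5), with the score capped at 10.
        Kd = sum(kick([1 5]));        % hits on the downbeats
        Kt = max(sum(kick), 1);       % total hits (guard against 0/0)
        s = min(10 * Kd / Kt, 10);
    end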
When the input signal has been determined to be of moderate complexity (.5 to 2.0 for most input signals which are neither completely straightforward nor overly intricate), the same equation is applied but with (+2) appended to the end. This is done so the penalties attributed to extra hits are less severe, allowing for greater freedom of expression. This number ensures that more complex drum beats are possible while still ensuring that unacceptable beats are not passed on.

For complexity values greater than 2.0, many restrictions are removed to allow for a wide range of potential drum beats of varying difficulty. Instead of the scoring system present in the lower complexity ranges, a set of hard coded penalties has been put in place to prevent very sparse drum beats as well as drum beat matrices which are overly full of hits; these beats are given maximum scores of 2 out of 10. Simple beats which only exhibit kick hits on 1 and 3 and snare hits on 2 and 4 are given a maximum score of 5. Otherwise, an acceptable beat will receive a score between 8 and 10.

3.6.2 Evolution

Once the scoring process has completed, the transition to the next generation begins. Drum beats scoring in the top 25% are carried on to the next generation without modification. Another 25% of the next generation is made up of high scoring individual instrument tracks from different drum beats. The first is a hybrid of the top scoring instruments, while the rest are randomly comprised of the top 25% scoring instrument tracks, with at least one of them being randomly generated. This hybrid set of drum beats is purely experimental and does not always lead to an optimal solution, but it does create a unique set of drum beats which can potentially rise to the top. A third 25% of drum beats is subject to vertical crossover, in which the first section of a drum beat is combined with the complementary end section of another drum beat. Drum beats scoring in the top 50% are potentially subject to this. The dividing line is randomly decided, with checks in place to prevent the same two drum beats from being split in the same place more than once. The following image (figure 15) graphically demonstrates these concepts.

Figure 15. A look at generational transitions associated with the genetic algorithm.

The final quarter of the next generation is randomly generated, ensuring a fresh supply of new drum beats and instrument tracks is available. Currently, the evolutionary process is run through only ten iterations, which has been shown to allow passable drum beats to be created without having the process converge on the same drum beat each run through.
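One generational step can be sketched as follows, using a population of 3x8 kick/snare/hat matrices. This is a simplified illustration under assumed sizes and helpers; it omits the guaranteed random track in the hybrids and the duplicate-split check described above:

    % Simplified sketch of one generational transition: 25% elitism,
    % 25% hybrid instrument tracks, 25% vertical crossover, 25% random.
    randomBeat = @() randi([0 1], 3, 8);               % 3 instruments x 8 eighths
    pop = arrayfun(@(k) randomBeat(), 1:20, 'UniformOutput', false);
    scores = rand(1, 20);                              % stand-in fitness scores
    [~, order] = sort(scores, 'descend');
    pop = pop(order);                                  % best first
    n = numel(pop); q = n/4;
    next = cell(1, n);
    next(1:q) = pop(1:q);                              % carried over unchanged
    for i = q+1:2*q                                    % hybrids of top tracks
        d = pop(randi(q, 1, 3));                       % donors from the top 25%
        next{i} = [d{1}(1,:); d{2}(2,:); d{3}(3,:)];   % one track per donor
    end
    for i = 2*q+1:3*q                                  % vertical crossover
        a = pop{randi(2*q)}; b = pop{randi(2*q)};      % parents from the top 50%
        cut = randi(7);                                % random dividing line
        next{i} = [a(:,1:cut), b(:,cut+1:end)];
    end
    for i = 3*q+1:n
        next{i} = randomBeat();                        % fresh random beats
    end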
3.6.3 Introducing Cymbals and Toms

As there are no strictly absolute rules for cymbal and tom hits within a drum beat, and some may argue that going against convention can help push new creative boundaries, the rules defined by the system are fairly loose while still maintaining some degree of restraint. One restraining aspect regarding the cymbals is that the downbeat of every fourth drum beat iteration will feature a cymbal hit. There is no scientific justification for this other than that, in the author's own experience as a drummer, cues such as this can help keep a group of performers aware of their position within a song. Many patterns in rock and pop will feature repeats in multiples of four, which is why four loops was chosen, though this number can easily be changed. Anchors like this will help to reassure the user of where they are in a piece of music as well as assuring them that the program is in the correct location.

Cymbals are also introduced to the drum beat mainly as accents which will replace hi hat hits, depending on the complexity of the input signal. Medium complexity inputs will have a 15% chance of a cymbal hit replacing a hi hat hit, providing that a kick or snare drum hit is present, as accented cymbals sound relatively weak without bottom end support. For higher complexity inputs, this number is increased to 20%. These numbers allow the cymbals to still be thought of as accents while not risking overuse. If the sustain and volume values of a signal are high enough, there is also a 50% chance that a cymbal will completely replace the hi hat track as a time keeper. The extra sustain associated with cymbals can often help strengthen and support louder, sustained chords. The 50% value is chosen as open hi hats and sustained cymbal hits are equally valid options. For higher complexity values, toms may be added in as filler between snare drum hits, or even to replace certain snare and hi hat hits if there is a conflict (a constraint intended to simulate a human drummer, who typically has only two arms available).

3.7 Drum Beat Enhancement

Table 2 shows a sample of the drum beat template. This template contains the basic elements of a standard rock or jazz drum kit. A volume parameter has also been included for each instrument. An overall volume will be derived from the relative amplitude of the incoming waveform. This volume value can be adjusted for individual hits in order to allow for accents and ghost notes, common techniques used in the composition of a dynamic drum beat. The volume parameter ranges from 0 to 1.

A tone parameter has also been introduced to the tom and cymbal instrument tracks. A lower value corresponds to a lower pitch and vice versa. This parameter thus allows the template to model a drum kit containing multiple toms and cymbals, as is common for most standard kits. The tone parameter also ranges from 0 to 1.

The position value for the hi hat represents the distance between the two cymbals that make up a pair of hi hats. A value of 0 indicates tightly closed hi hats, which produce a short, sharp sound. Completely open hi hats, a value of 1, occur when there is no contact between the two cymbals; this is rarely used by most human drummers. Values between 0 and 1 cover the remaining distance and allow for interesting crescendo effects, accents, and setting an overall style. Figure 16 displays a representation of the drum beat styling parameters.

[Table 2 / Figure 16: the drum beat template over one bar of eighth notes (beats 1 & 2 & 3 & 4 &), with per-hit probability and volume rows for the kick, snare, tom, cymbal, and hi hat tracks, tone rows for the toms and cymbals, and a position row for the hi hats.]

Figure 16. Representation of drum beat styling parameters.

The values contained in the drum beat template are assigned by the drum beat styling aspect.
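In code the template is naturally a small set of per-instrument vectors. A sketch of one bar as a MATLAB struct, with illustrative field names and values (the actual model stores the same information as matrices):

    % One bar of the drum beat template in eighth notes; each vector has
    % one entry per slot of "1 & 2 & 3 & 4 &". Values are illustrative.
    beat.kick.probability   = [1.0 0 0 0 1.0 0 0 0];
    beat.kick.volume        = 0.8 * ones(1, 8);
    beat.snare.probability  = [0 0 1.0 0 0 0 1.0 0];
    beat.snare.volume       = 0.8 * ones(1, 8);
    beat.hihat.probability  = ones(1, 8);
    beat.hihat.volume       = [0.8 0.8 0.8 0.8 0.8 0.8 0.8 1.0];
    beat.hihat.position     = [0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.7];  % 0 closed, 1 open
    beat.tom.probability    = zeros(1, 8);
    beat.tom.tone           = 0.6 * ones(1, 8);   % lower value = lower pitch
    beat.cymbal.probability = zeros(1, 8);
    beat.cymbal.tone        = 0.3 * ones(1, 8);
    % Rolling the probabilities into concrete hits, once per loop of the bar:
    kickHits = rand(1, 8) < beat.kick.probability;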
The probability values are rolled before the final drum beat is sent out, so that each potential hit becomes a 0 or a 1. This is done each time the beat is looped so that there is a chance of variation on each iteration.

3.7.1 Probability Values

Bilmes [3] claims that the key to machine expressivity is in variation and deviation, without which any computer generated music will sound mechanical. To combat this, a set of probability values is introduced so that repeated drum beats are not played exactly the same way every time. These probability values allow for variation within a drum beat, without having to generate a completely new beat that merely retains some similarity to the old one. The probability values assigned to each track determine how likely a given hit is to occur each time the beat is looped. Some rhythms require less rigidity in a drum beat, so the probability values can help to keep things interesting as a particular beat is repeated.

Each instrument's set of probability values is determined randomly but must remain above a minimum value, which varies by instrument and beat location, in order to help preserve a more natural feeling drum beat. The following plot (figure 17) demonstrates how the lowest possible probability value for the kick drum is determined for a given location. This chart only relates to drum beats where the complexity value has been determined to be greater than .5, as inputs which score above .5 are typically of a high enough complexity to warrant these additional drumming techniques.

Figure 17. Top - minimum possible probability values by location for the kick drum, with an emphasis on higher values for beat one. Bottom - the snare is weighted as a counter to the bass drum in terms of possible probability values.

The area above the curve is the range of possible probability values at each beat location, randomly generated within the acceptable bounds. In this case, the main pulse of the beat (on beats 1 and 3) has a higher minimum than on beats 2 and 4; this is done to avoid too much degradation of the normally important hits of the kick drum. The minimum probability function is simply a cosine wave ranging from 66% to 100%, with clipping on the peaks occurring after beat 1. The downbeat remains important while the offbeats are subject to a greater degree of variability.
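A sketch of this minimum curve, and the roll above it, for one bar of eighth notes; the exact phase, clip level, and grid are illustrative readings of figure 17 rather than the model's actual parameters:

    % Minimum-probability curve for the kick: a cosine from 0.66 to 1.00
    % peaking on beats 1 and 3, with peaks after beat 1 clipped, then a
    % random probability drawn from the region above the curve.
    t = 0:7;                                 % eighth-note slots 1 & 2 & 3 & 4 &
    minP = 0.83 + 0.17 * cos(2*pi*t/4);      % 1.00 on beats 1/3, 0.66 on 2/4
    minP(2:end) = min(minP(2:end), 0.95);    % clip every peak after beat 1
    p = minP + (1 - minP) .* rand(1, 8);     % kick probabilities for this bar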
3.7.2 Volume Values

The next stage is to set the volume levels for each hit. A base volume level is set based on the relative amplitude of the input waveform. This base value will adjust to whatever the current dynamic level of the input signal may be, much in the same way a human drummer will respond to another musician playing louder or softer.

In drumming, an accent is a hit which is played with increased power compared to the hits surrounding it. This is a commonly used technique by many drummers to increase the dynamic range of their drum beats (e.g. John Bonham's introduction on Rock and Roll by Led Zeppelin). Ghost notes are hits played at a reduced volume, allowing for greater improvisational potential without overloading the drum beat with full volume hits. For the snare drum, accents and ghost notes are randomly decided when the complexity level is high enough. The volume level for ghost notes ranges anywhere between BaseVolume/2.5 and BaseVolume/2. Accents are played at double volume, with a maximum of 1 so as to avoid overly loud accents; a human drummer already playing at full volume will also have a hard time distinguishing his or her accents by volume. Tom, cymbal, and kick hits are currently kept at a steady volume but are still adjusted overall during loud or quiet inputs.

3.7.3 Tone and Position Values

The tom tone level will be set to a low value if the tom is being used as the primary beat/time keeper, essentially when it is being struck in an even, repeating pattern. Otherwise, isolated tom hits can have completely random tone levels. Fast, repeated tom hits which make up a drum fill will commonly move from high to low, but other methods and combinations are just as valid and should not be discouraged. Cymbals are treated in much the same way; a lower value will be assigned if the cymbal is being used as a time keeper. This is to represent a ride cymbal, which is typically larger than other cymbals.

The position value largely relies on the input waveform (specifically, the sustain rating). A waveform with longer, high amplitude trails following the onsets is most likely representing an instrument that is being played with sustained chords. If the hi hat is being used as the time keeper in this section, the position value will be somewhere near the middle. If the waveform features quick drops in amplitude after an onset, then this section of the song will be more suited to a closed hi hat featuring a lower position value. The higher position values are used for accents, which are indicated by a higher volume level for the hi hat at that moment in time.

3.8 Output

Due to constraints present in MATLAB and Simulink, audio output options are extremely limited. Instead, a visual representation of the drum beat is displayed for evaluation purposes, along with an audible click to indicate beat location. A separate vector array, located underneath the drum beat display, gives a visual cue of where the system is currently located in regard to the drum beat output. The output in figure 18 is for a simple kick/snare/hi hat drum beat.

Figure 18. Simulink visual representation of output. The top matrix (titled drumbeat) represents the current drum beat, with rows for the kick, snare, and hi hats. The middle array (titled timing) shows the current location of the beat, via a beat location indicator. The bottom matrix (titled beatfx) displays the styling parameters associated with the drum beat: the kick, snare, and hi hat probabilities and volumes, plus the hi hat position.

3.9 Simulink Model

The system features discussed above are all contained in MATLAB and mostly handled through Simulink. Simulink allows for the use of Embedded MATLAB functions, which are similar to standard MATLAB functions but with real time functionality and C code generation options. Unfortunately, Embedded MATLAB is hindered by many restrictions, including restrictions on audio output and variable sized matrices and the exclusion of some commonly used built in MATLAB functions. The images in figure 19 display the Simulink architecture and explain some of the components within.
Figure 19. Simulink workspace. The model comprises four main blocks: Input Processing (volume, complexity, and sustain analysis), Timing Issues (synchronisation and time signature), Drum Beat Generation (the genetic algorithm, creating the un-styled beat), and Output Processing (drum beat styling and drum beat output timing). The input signal is received from JACK at 8000 Hz; ProgramStart is set to 1 when the input amplitude threshold is reached, and StartProcess is set to 1 once two bars have been tapped out.

The three information flows being sent from the Output Processing block are directed to the drum beat visualisation matrices of figure 18 in the previous section (3.8). The input signal is buffered into a two second window for analysis by the Input Processing block, as well as being sent in 0.003125 second samples to the Timing Issues block. Input Processing handles all the waveform analysis needed to create and style the drum beat, namely the volume, complexity, and sustain values. Timing Issues is responsible for syncing the output of the drum beat with the beat pattern being predicted by aubio. Drum Beat Generation is the block which calls the genetic algorithm for drum beat creation. When a drum beat has been chosen, it is sent to the Output Processing block, which adds the final styling features such as accents, ghost notes, and hi hat position.

Chapter 4: Results

Below is a set of example drum beats produced by the program. Each drum beat is in 4/4 timing, divided into eighth notes. A brief description of each beat and the input signal it was accompanying accompanies each chart. These results were taken from the user evaluation stage, in which only 8-count kick/snare/hi hat beats were used in order to simplify the output display for the user.

Sample: 4/4 snapping (vol 0.500, comp 0.000, sustain 1.000)

beat    1  &  2  &  3  &  4  &
kick    1  0  0  0  1  0  0  1
snare   0  0  1  0  0  1  1  0
hat     1  1  1  0  1  0  1  0

The input for this drum beat consisted only of consistently timed finger snapping into a microphone. The drum beat itself is fairly straightforward, but the offbeat kick and snare hits in the second half of the bar help to keep it interesting. The styling table is locked with 100% probability, a fixed volume of 0.500 for all tracks, and a fixed hi hat position of 1.0, due to the low complexity value.

Figure 20. Drum beat to accompany consistent snapping.

Sample: Just (vol 1.072, comp 0.5695, sustain 2.429)

beat         1      &      2      &      3      &      4      &
kick         1      0      1      0      1      0      0      0
snare        0      0      1      0      0      1      1      0
hat          1      0      1      0      1      1      1      1
kick prob.   1      0      .8     0      .98    0      0      0
kick vol.    1.072  1.072  1.072  1.072  1.072  1.072  1.072  1.072
snare prob.  0      0      .99    0      0      .96    .85    0
snare vol.   0      0      1.072  0      0      .356   1.072  0
hat prob.    .8     0      .91    0      .89    .87    .83    .82
hat vol.     1.072  0      1.072  0      1.072  1.072  1.072  1.072
hat pos.     2.4    0.0    2.4    0.0    2.4    2.4    2.4    2.4

The guitar input here was played at full volume and is somewhat more complex, but still with full chord strumming. The styling directions can be observed above: having the complexity value above .5 allows for the inclusion of probability values as well as the potential of ghost notes and accents.

Figure 21. Drum beat to accompany a strummed guitar part.
Sample: 1400 jam (vol 1.013, comp 1.750, sustain 1.920)

beat         1      &      2      &      3      &      4      &
kick         1      0      1      1      0      0      0      0
snare        0      0      1      0      0      0      1      1
hat          1      1      1      1      1      1      0      1
kick prob.   1      0      .8     .85    0      0      0      0
kick vol.    1.013  0      1.013  1.013  0      0      0      0
snare prob.  0      0      .92    0      0      0      .87    .72
snare vol.   0      0      1.013  0      0      0      1.3    0
hat prob.    .83    .85    .91    .99    .89    .95    0      .92
hat vol.     1.9    1.9    1.9    0      1.9    1.9    0      1.9
hat pos.     1.9    1.9    1.9    1.9    1.9    3      1.9    1.9

The guitar part used in the creation of this drum beat was quite complex. It was played slightly muted but with some chords still left sustained.

Figure 22. Drum beat to accompany a complex guitar part.

Sample: paradise city (vol 1.117, comp 1.750, sustain 2.290)

beat         1      &      2      &      3      &      4      &
kick         1      0      0      0      1      0      0      1
snare        1      0      1      0      0      0      1      0
hat          1      0      1      0      1      1      1      1
kick prob.   .98    0      0      0      .92    0      0      .80
kick vol.    1.117  0      0      0      1.117  0      0      1.117
snare prob.  0      0      .92    0      0      0      .87    0
snare vol.   1.117  0      1.117  0      0      0      1.117  0
hat prob.    .83    0      .91    0      .89    .95    0      .92
hat vol.     1.117  0      1.117  0      1.117  1.117  1.117  1.117
hat pos.     2.3    0      2.3    0      2.3    2.3    2.3    2.3

In this guitar part, the notes of the chords are struck individually in a way that features many off-beat onsets. Every note typically rings out.

Figure 23. Drum beat to accompany a chord based guitar part where each note is struck individually.

The complexity value can also be heavily influenced by how quickly the tap lead-in is performed. If the user taps out the beat in eighth notes, the complexity value will be lower than if it is tapped out in quarter notes. This can be used as a way to give the user slightly more control over the drum beat generation process.

Chapter 5: Evaluation

5.1 Overview

This chapter will observe the performance of the system in terms of accuracy and style. The beat tracking and onset detection methods will be compared with their respective input waveforms in order to determine accuracy. Consistency within the beat tracking system is another aspect which will be evaluated. The beat locations will be compared to the beat locations determined by a human musician. Any delay in the output will also be noted, as this can negatively affect the human musician and potentially render the system unusable. The evaluation performed looked at the average latency value between the input and output at different stages, as well as any jitter that may be present. A consistent latency value would allow the program to accurately jump ahead and predict beat locations, while a high amount of jitter would make accurate accompaniment impossible.

As for the actual performance of the program, a selection of human musicians was asked to try out the system and answer a series of questions regarding the quality of the experience. User feedback can help identify key problem areas as well as provide useful information on potential future implementations.

5.2 Quantitative Evaluation

5.2.1 Beat Tracking

The two beat tracking systems tested here are the LabROSA beat tracker [11] and the aubio beat tracker [10]. Each system was tested and compared with the human perceived beat. The analysis looks at the tempo derived from each method, represented as an average interval between each detected beat (in seconds). The standard deviation is another important factor to consider as it shows how consistent each method is. Some degree of variance is to be expected as the test audio files are played by humans, so a natural, human deviation in tempo is present. The minimum and maximum time intervals are also shown to demonstrate the potential severity of the variance. Two song samples were chosen for analysis in this section as they are representative of the beat tracking capabilities of each system.
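As a sketch of how these figures are obtained, assuming beatTimes is a vector of detected beat locations in seconds for one beat location file (the "Variance" rows below are described as the average deviation of the intervals from the mean, which the standard deviation approximates here):

    % Interval statistics for one beat-location file: mean interval
    % (i.e. the beat period), its spread, and the extreme intervals.
    iv = diff(beatTimes);
    fprintf('avg %.4f  dev %.4f  max %.4f  min %.4f\n', ...
            mean(iv), std(iv), max(iv), min(iv));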
The first song is an excerpt from the second movement of Paranoid Android, originally performed by Radiohead [14]. The song sample is played on a solo acoustic guitar. This sample features a rhythm with many offbeat notes and accents as well as some slight syncopation, making it ideal for testing the limitations of the beat tracking systems. The second song is a sample of the chorus from Blitzkrieg Bop by the Ramones [23]. The sample is slower than the original and played on an acoustic guitar. It was chosen because of its relatively straightforward rhythm, featuring accents only on the beats.

5.2.1.1 Paranoid Android

The charts below (figure 24) show the beat locations in relation to the song sample for each method. The first chart was produced by the LabROSA beat tracker [11] and looks to be fairly consistent and accurate (confirmed by listening to the audio click track along with the original sample). The second chart was produced by aubio [10] and appears to be much less consistent, deriving a much slower tempo than what is present at the beginning. It should be noted that the tempo here is not simply being displayed in half time, which would be acceptable; many of the beat locations range from slightly to greatly offset from some of the obvious onsets. The third chart shows the beat locations determined by a human tapping along with the song sample. It should also be noted that the aubio beat tracker does not show any beat locations for the first few beats, as it has an initial warm up period when initialised.

Figure 24. Waveform with slightly complex rhythm. Top - beat locations as determined by the LabROSA Cover Song ID beat tracker [11]. Middle - beat locations as determined by the aubio beat tracker [10]. Bottom - beat locations as determined by a human musician.

The table below (figure 25) displays a numerical representation of what was stated above. The average interval time between beat locations as determined by the LabROSA beat tracker is very close to the human determined interval time. Both of these also have a low variance value, which is seen as a positive due to the relatively consistent tempo of the input guitar part. The aubio beat tracker performed poorly on this sample, with a much higher variance and an extreme outlying maximum interval value.

Paranoid Android    LabROSA   aubio    human
Average Interval    0.3355    0.4995   0.3385
Variance            0.0183    0.1818   0.0144
Max. Interval       0.3720    0.9056   0.3680
Min. Interval       0.2920    0.3019   0.3080

Figure 25. Accuracy information for the different beat tracking methods. The Average Interval row displays the mean interval time for the entire beat location file (essentially giving the tempo of the song). Variance shows how far the intervals deviate from the mean on average. Max. Interval shows the longest interval time within the beat location file, while Min. Interval shows the shortest. All values are in seconds.

The aubio package has a number of alternative onset detection methods which can be used with the beat tracker; however, as can be seen in figure 26, they make little difference in terms of accuracy. One can conclude that the beat detection algorithm included in aubio would need improvements in order to handle more complex rhythms.

Figure 26. Aubio beat detection utilising each onset detection method included in the program. While each method produces noticeably different results, none of them stands out as being particularly accurate or consistent.

5.2.1.2 Blitzkrieg Bop

The beat locations in the charts below (figure 27) appear to be consistent across all three methods. This is most likely due to the simpler rhythmic pattern of this song sample compared to the previous one.
Figure 27. Waveform with a simple rhythm. Top - beat locations as determined by the LabROSA Cover Song ID beat tracker [11]. Middle - beat locations as determined by the aubio beat tracker [10]. Bottom - beat locations as determined by a human musician.

The table below (figure 28) also shows that both beat detection methods worked as well as the human tapping method.

Blitzkrieg Bop      LabROSA   aubio    human
Average Interval    0.5163    0.5167   0.5163
Variance            0.0159    0.0124   0.0150
Max. Interval       0.5921    0.5480   0.5640
Min. Interval       0.4760    0.4920   0.4800

Figure 28. Accuracy information for the different beat tracking methods. The Average Interval row displays the mean interval time for the entire beat location file (essentially giving the tempo of the song). Variance shows how far the intervals deviate from the mean on average. Max. Interval shows the longest interval time within the beat location file, while Min. Interval shows the shortest. All values are in seconds.

5.2.2 Latency

Latency was observed at two stages within the program. The first measurement was the difference between the real input time and the output from JACK. The output latency here is what the system on the second computer receives as its input. Figure 29 demonstrates this.

input - JACK delay (s)
Average delay time   0.0278
Variance             0.0022
Max. delay time      0.0360
Min. delay time      0.0240

Figure 29. Latency measurements from the initial signal input to the signal after beat tracking has been performed.

The delay time after running the signal through the computer running aubio appears to be fairly predictable. The delay time itself is relatively short and the amount of jitter (variance) is also quite low. Users reported that there was no real noticeable lag present at this stage in the system. The plot below (figure 30) shows the distribution of delay times.

Figure 30. Distribution of delay times from the initial signal input to the signal after beat tracking has been performed.

The second point observed was at the final output of the program. A signal was sent through each section of the chain to determine the overall latency of the system.

input - program delay (s)
Average delay time   0.1015
Variance             0.0317
Max. delay time      0.1720
Min. delay time      0.0440

Figure 31. Latency measurements from the signal input to the final output signal (drum beat output).

The delay time here is noticeably higher (figure 31). Taking the delay from JACK into consideration, the average attributable to this stage would be about 0.0737 seconds. The variance here is also quite a bit higher, sitting on the border of being noticeable to users. The image below shows the distribution of delay intervals (figure 32).

Figure 32. Distribution of delay times from the initial signal input to the final output signal (drum beat output).

When compared to the latency distribution from JACK, the difference is much more noticeable, as shown in figure 33.

Figure 33. Scale comparison of figures 30 and 32 (delay from JACK vs. delay from the final output).

The latency of the aubio and JACK systems is tolerable, but the jitter associated with the program running in MATLAB is of questionable reliability. It does not seem that MATLAB itself is an environment that should be used for time dependent tasks such as those required for this project.
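The latency bookkeeping itself is a small calculation. A sketch, assuming tIn and tOut are vectors of matched input and output timestamps in seconds (the variable names are illustrative), with the MATLAB-stage share recovered by subtracting the mean JACK delay from figure 29:

    % Per-event end-to-end delays, their spread, and the share of the
    % delay attributable to the MATLAB/Simulink stage.
    delays   = tOut - tIn;
    jackMean = 0.0278;                       % mean JACK delay (figure 29)
    fprintf('avg %.4f  dev %.4f  max %.4f  min %.4f\n', ...
            mean(delays), std(delays), max(delays), min(delays));
    fprintf('MATLAB-stage average: %.4f s\n', mean(delays) - jackMean);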
5.3 Qualitative Evaluation

Qualitative evaluation is very important to a project such as this, as it can be hard to measure musicality and entertainment mathematically, especially if the factors are largely based upon the personal preference of the user. Users were asked to test out the system using an acoustic guitar, playing a range of different styles, and were then asked a series of questions regarding the performance of the program. Users were asked how accurate they felt the system was, what level of interaction was experienced, and how they felt about the decisions made regarding the output drum beat, as well as some other general questions.

The main complaint received from users was the lack of an audio representation of the drum beats presented to them. The visual cue was not nearly intuitive enough and proved to be difficult to interpret for users not familiar with drum tabs, though multiple users commented that this method could benefit those learning drums. The program would then essentially provide a number of drum beats to the guitar player, who could ask a drummer to play the beat on an actual drum kit.

Another major complaint shared by users was the accuracy of the beat tracker when attempting songs without straightforward rhythms. One user felt that the drum beats did not take the actual rhythm of the input into consideration as much as they could have. The user felt the drum beats were too simple for some of the complex rhythms which he was playing and should have been based more on what was being played rather than just increasing in complexity. Other users commented on the fact that drum beats accompanying verse-chorus-verse structured songs were not reprised when melody patterns were repeated, so the playing was not as cohesive as it could have been.

As for the overall quality of the drum beats, reviews were mixed. Some felt that they provided a very suitable accompaniment to their guitar playing, while others felt that the beats were relatively mediocre. One user felt that none of the generated beats would be considered acceptable for the style of music which was being played. No users felt that the program had an influence on what they were playing, saying it felt more like they were playing to the system and not with it. Users felt there was great potential in the system and were not expecting a perfect accompaniment from a program created in this time span. The general consensus was that this system could be very useful as a practice tool as well as a potential method for song writing. Users were very receptive to the concept of the system, and a few were quite interested in how the beats were randomly generated and scored, though it was stated that the fitness functions should be more receptive to different styles of drumming than just basic rock beats. Some users said they would prefer to manually choose a genre style for the drum beat rather than having it derived from the input. Users also commented on the impracticality of requiring two computers to be able to fully run the program.

An observation noted by the author is that the sustain value calculations were tailored to the playing style of the author. The different playing techniques of other users resulted in unexpected sustain values, which generated hi hat position values that did not necessarily fit with the part being played. In order to accurately and consistently derive this value, waveforms created by a wide range of musicians of varying skill levels playing the same set of parts would need to be observed.

Chapter 6: Conclusions and Further Work

6.1 Conclusions

This report presents the design and development of a virtual drum accompanist for musical composition and practice purposes.
The ultimate goal is to have the virtual drummer seamlessly provide accurate and appropriate accompaniments to pieces of music as they are played. The system is to model a human drummer who has no prior knowledge of a song but can still create a suitable and dynamic drum beat to complement whatever is being played. Through detailed waveform analysis and accurate beat tracking, the virtual drummer can quickly compose and execute drum beats while modelling basic forms of creativity and expression commonly found in human musicians. This is an ongoing project, with a prototype system currently being validated with a series of song samples meant to determine the overall adaptability of the system.

There is much that can be done to optimise this system as it is still in the early stages of development. Future enhancements will help to increase the virtual drummer's versatility as well as its ability to interact with human musicians. The variables associated with the drum beat styling aspect are also undergoing constant analysis due to their weighted influence on drum beat candidates. When an optimal configuration has been found, after extensive testing with many users from a wide range of styles and abilities, the system may be put through an initial training phase to reduce the number of invalid drum beats created during the probability assignment phase.

The system did, however, help to provide an interesting perspective on the interaction between human and virtual musicians. The way music and maths are related provides a motivation for modelling musical concepts with computers, and the best way to evaluate the system is to have human musicians interact with it. While the system is by no means a replacement for a human drummer, it provides musicians with something to experiment with and push their creative boundaries against. The process of designing, implementing, and evaluating this system has shown that while a number of the techniques were valid and could be expanded upon, different approaches should be taken in regard to the completely random generation of drum beats. While an element of randomness and deviation is important to a system like this, perhaps it should not be completely reliant on it. The system presented here can serve as a springboard for future projects attempting to emulate the expressivity of human drummers. The potential for use as a practice tool could be seen as the primary focus and help give a greater sense of direction to further developments.

6.2 Further Work

Some key improvements needed in the system are in the beat detection portion and the audio output section. A beat tracker which is not as easily influenced by rhythm changes within the same tempo is of the greatest importance. The audio output aspect of the system also needs improvement, which may require converting the source code to another language due to Simulink's restrictions on various standard MATLAB operations. Additional drum beat options, such as syncopation, swing beats, and drum fills, could also greatly benefit the system by increasing its flexibility. Other future developments may include upgrades to the artificial expression simulator and the drum beat template. Additional parameters may include hit locations for particular instruments, which can greatly expand the range of sound and create a more natural feel. Musical recognition techniques could also be very beneficial to the system.
If a musician performs a piece with a recurring theme, the system should recognise it and reprise the drum beat that was previously associated with that theme. This feature would give the system the ability to participate in very structured songs in a more cohesive manner. This would also allow musicians to teach the system to play a song if it is able to pick up on these musical cues. The system could also be greatly expanded to include additional accompanists, such as piano, bass, guitar, horns, and stringed instruments. However, an extension like that would require a much higher level of audio analysis to extract the tones of individual notes. Another possibility is the implementation of a vision system which allows the user to give visual cues to the system to indicate movement changes, pauses, and other cues which are normally visually communicated between band members.

References:

[1] Bello JP, Daudet L, Abdallah S, Duxbury C, Davies M, Sandler M, (2005), A tutorial on onset detection in music signals, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5: pp. 1035-1047.
[2] Bello JP, Duxbury C, Davies M, Sandler M, (2004), On the Use of Phase and Energy for Musical Onset Detection in the Complex Domain, IEEE Signal Processing Letters, vol. 11, no. 6: pp. 553-556.
[3] Bilmes J, (1993), Techniques to foster drum machine expressivity, ICMC 93, pp. 276-283.
[4] Cockburn A, (2008), Using both incremental and iterative development, STSC Cross Talk, 21 (5): pp. 27-30.
[5] Collins N, (2001), Algorithmic composition methods for breakbeat science, Proceedings of Music Without Walls.
[6] Collins N, (2007), Musical robots and listening machines, Cambridge Companion to Electronic Music, pp. 171-184.
[7] Collins N, (2010), Contrary Motion: An oppositional interactive music system, NIME Conference, pp. 125-129.
[8] Collins N, (2011), LL: Listening and Learning in an Interactive Improvisation System, University of Sussex, unpublished.
[9] Danby E, Ng K, (2011), Virtual Drum Accompanist: Interactive Multimedia System to Model Expression of Human Drummers, Conference on Distributed Multimedia Systems, vol. 17: pp. 110-113.
[10] Davies M, Brossier P, Plumbley M, (2005), Beat Tracking Towards Automatic Musical Accompaniment, Audio Engineering Society Convention, 118.
[11] Ellis D, (2007), Beat tracking by dynamic programming, Journal of New Music Research, 36:1: pp. 51-60.
[12] Goto M, Muraoka Y, (1994), A Beat Tracking System for Acoustic Signals of Music, ACM Multimedia 94 Proceedings, pp. 365-372.
[13] Goto M, Muraoka Y, (1999), Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions, Speech Communication, vol. 27: pp. 311-355.
[14] Greenwood C, Greenwood J, O'Brien E, Selway P, Yorke T, (1997), "Paranoid Android", OK Computer, Parlophone.
[15] Larman C, Basili V, (2003), Iterative and Incremental Development: A Brief History, IEEE Computer Society, 36 (6): pp. 47-56.
[16] Puckette M, Apel T, Zicarelli D, (1998), Real-time audio analysis tools for Pd and MSP, ICMC 98.
[17] Puckette M, Brown J, (1998), Accuracy of Frequency Estimates from the Phase Vocoder, IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2: pp. 166-176.
[18] Ramirez R, Hazan A, (2005), Understanding Expressive Music Performance Using Genetic Algorithms, European Workshop on Evolutionary Music and Art, Berlin: Springer, pp. 508-516.
[19] Ramirez R, Hazan A, Maestre E, Pertusa A, Gomez E, Serra X, (2007), Performance Based Interpreter Identification in Saxophone Audio Recordings, IEEE Transactions on Circuits and Systems for Video Technology, 17(3): pp. 356-364.
[20] Ramirez R, Hazan A, Maestre E, Serra X, (2005), Understanding Expressive Transformations in Saxophone Jazz Performances, Journal of New Music Research, 34(4): pp. 319-330.
[21] Ramirez R, Hazan A, Maestre E, Serra X, (2006), A Data Mining Approach to Expressive Music Performance Modelling, Multimedia Data Mining and Knowledge Discovery, Berlin: Springer, pp. 362-380.
[22] Ramirez R, Hazan A, Maestre E, Serra X, (2008), A genetic rule-based model of expressive performance for jazz saxophone, Computer Music Journal, 32:1: pp. 38-50.
[23] Ramone D, Ramone T, (1976), "Blitzkrieg Bop", Ramones, Sire Records.
[24] Robertson A, Plumbley M, (2007), B-Keeper: A Beat-Tracker for Live Performance, NIME07, pp. 234-237.
[25] Sánchez R, Quintero G, (2009), "Hanuman", 11:11, Rubyworks.
[26] Schloss A, (1985), On the Automatic Transcription of Percussive Music - From Acoustic Signal to High-Level Analysis, Stanford University Ph.D. dissertation, Tech. Rep. STAN-M-27.
[27] Toiviainen P, (1998), An Interactive MIDI Accompanist, Computer Music Journal, vol. 22, no. 4: pp. 63-75.
[28] Weinberg G, Blosser B, Mallikarjuna T, Raman A, (2009), The creation of a multi-human, multi-robot interactive jam session, NIME Conference, pp. 70-73.
[29] Weinberg G, Driscoll S, Parry M, (2005), Musical Interactions with a Perceptual Robotic Percussionist, IEEE International Workshop on Robots and Human Interactive Communication, pp. 456-461.
[30] Weinberg G, Driscoll S, (2006), Human Interaction with an Anthropomorphic Percussionist, CHI 2006 Proceedings, pp. 1229-1232.
[31] Weinberg G, Driscoll S, (2006), Towards Robotic Musicianship, Computer Music Journal, 30:4: pp. 28-45.
[32] Weinberg G, Driscoll S, (2007), The Design of a Perceptual and Improvisational Robotic Marimba Player, IEEE International Conference on Robot & Human Interactive Communication, 15: pp. 769-774.
[33] Weinberg G, Godfrey M, Rae A, Rhoads J, (2007), A real-time genetic algorithm in human-robot musical improvisation, CMMR, pp. 351-359.
[34] Wright R, Torry C, (1973), "The Great Gig in the Sky", The Dark Side of the Moon, Harvest/Capitol.
[35] Zhe J, Wang Y, (2008), Complexity-Scalable Beat Detection with MP3 Audio Bitstreams, Computer Music Journal, 32:1: pp. 71-8.

Appendix A: Personal Reflection

Many unforeseen roadblocks were encountered during the course of this project. The primary impeding factor was the limitations associated with Simulink in regard to the MATLAB programming language. Initial testing and design was done in the standard MATLAB environment, and it was somewhat surprising to see that many of the techniques already implemented had to be rewritten or adjusted to increase compatibility with Simulink. Looking back, MATLAB was probably not the best choice of software for a project which relied heavily on timing, as it is not always accurate in that regard. There were a variety of methods for implementing MIDI output and processing in the standard MATLAB package, so when the decision to use MATLAB was made I had assumed that everything would work out. So much time was spent adjusting everything for use in Simulink that important timing and audio output issues were left unaddressed or unresolved.
On another note, the DMS conference (http://www.ksi.edu/seke/dms11.html) provided a great opportunity to see what others in the field had been working on, especially in regard to computing in music. The conference deadline pushed many things forward and greatly helped with the writing of this final report. Without the extra pressure from the conference's submission and presentation deadlines, the system architecture and some of the methods would not have been developed as early and may have been rushed at the last minute.

Appendix B: Interim Report

Appendix C: Operation Manual

Setup:
The aubio library (http://aubio.org/download) and JACK (jackaudio.org) need to be downloaded and installed on a Linux machine. A computer microphone should be connected to the microphone input on this computer. Another computer must have an up to date version of MATLAB with Simulink. A license for the Signal Processing Blockset within Simulink is also required. A male-to-male 1/8" stereo headphone/speaker cable must be connected to the microphone input on this computer, with the other end plugged into the headphone/speaker output of the first machine (the one running aubio and JACK). Download the compressed folder containing all of the necessary MATLAB code.

Set up software:
On the aubio machine, start up the JACK software and click start. Open up a command prompt and enter:

    aubiotrack -j -t .5

where '-j' tells aubio to use JACK and '-t .5' sets the input threshold amplitude at .5. This value can be raised or lowered depending on the user's playing style. On the JACK interface, click the 'Connect' button. Open the 'system' and 'aubio' drop down menus. Drag out_1 to playback_1, and capture_1 to playback_2 and in_1. The image below demonstrates how this will look.

Open MATLAB and navigate to the location which contains drummer.mdl as well as all the other code from the compressed folder. Open drummer.mdl.

Running:
Simply click the run button on the drummer.mdl model window, tap out 8 counts to properly align the beat tracker, and begin playing right on count 9.

Appendix D: Resources Used

Aubio - http://aubio.org/download
Real time beat tracking software

JACK - jackaudio.org
Audio interfacing/routing between applications

LabROSA Cover Song ID - http://labrosa.ee.columbia.edu/projects/coversongs/
Beat detection and onset detection

MATLAB/Simulink
Programming environment, user interface

Audacity - http://www.download-audacity.com/
Recording and evaluation for aubio processes