Journal of New Music Research 2002, Vol. 31, No. 2, pp. 119–129
© Swets & Zeitlinger

Gestural Control of Digital Audio Environments

Sylviane Sapir
Electronic Music Department, Conservatory of Cagliari, Italy

Accepted: 8 December 2001
Correspondence: Via Cimate 92, 03018 Paliano (FR), Italy. E-mail: [email protected]

Abstract

In a traditional musical situation based strictly on acoustic instruments, human gesture usually produces sound. The relationship between gesture and sound is unique: it is a cause-and-effect link. In computer music, gesture can be uncoupled from sound because the computer can carry out every aspect of sound production, from composition up to interpretation and performance. Real-time computing technology and the development of human gesture tracking systems allow gesture to be reintroduced into the practice of computer music, but with a completely renewed approach. There is no longer any need to create direct cause-and-effect relationships for sound production, and gesture may be seen as another musical parameter to play with in the context of interactive musical performances.

1. Introduction

Musical performances in the context of computer music involve an interaction between human beings and digital instruments that results in effects audible, and possibly visible, to an audience. Performers interact with their body by manipulating physical objects or by moving freely in space. The forces applied by body parts to the objects, or the dynamic characteristics of the movements such as trajectory or velocity, have to be captured by sensors, converted into data that represent the movements, and then turned into sound after processing. The terms "movement" and "gesture" may be ambiguous, and an unconditional definition of gesture is still problematic. In this paper, gesture and movement both refer to the motions of parts of the human body, but gesture involves manipulative or communicative movements founded on an intent, in our case a musical intent.

The study and analysis of human gesture in music for the control of digital instruments, along with standardization requirements coming from the industry, led to the birth of MIDI devices and their communication protocol. This protocol has often been judged insufficient because of its narrow bandwidth and its limitations linked to the reproduction of pianistic gesture (Moore, 1988; McMillen, 1994; Wright, 1994). This is even more true today, as the relationship between composer and technology is no longer exclusive but extends to other forms of artistic project in which multimedia interacts with other forms of expression such as dance, theatre or video art, the favoured fields of body communication. However, and perhaps in spite of all its defects, MIDI remains the most popular way of controlling digital audio synthesis. One explanation may be found in the MIDI approach itself, which detaches the interface from the sound source and thus enables the development of innovative controllers. Another explanation may, more prosaically, be found in the low cost and easy implementation of MIDI technology.

The development of interactive audio environments that capture and analyze the movements of one or more performers in real time, in order to arbitrarily control sound processing systems, allows the traditional aspects of composition and sound synthesis in computer music to be connected with the less codified aspects of body expression.
In this way it becomes possible to establish new gestural interactions, in the perspective of an extended musical creative process that involves the spectacular, affective and emotive aspects of human gesture. This field of investigation is at the core of promising research in many different disciplines, such as computer science, psychology, linguistics and music. A very detailed bibliography, with interesting theoretical articles and case studies, may be found in IRCAM's electronic publication "Trends in Gestural Control of Music" (Wanderley & Battier, eds., 2000).

2. Sound and gesture

The possibility of producing music in a place and at a moment different from those in which it is listened to has incredibly expanded the possibilities of artistic and creative processes. Furthermore, the possibility of creating musical material in all its details, by physically describing the sound event and by taking into account the limits of our perception, has freed us from the mechanical constraints of traditional musical instruments. Detaching the control of sound from its production, in space or in time, opens new horizons to composers and performers.

2.1. Control and automation

The progress of digital audio technology over the last 20 years is so impressive that we may wonder how it will continue and what its impact could be on music production and other forms of creativity. J.A. Moorer (2000), in an AES lecture, foresees that the main problem facing digital audio engineers will not be how to perform a particular manipulation of sound, but how the amount of available processing power can possibly be controlled. Indeed, the growth of processor speed, the development of new processor architectures and networking capacity, the reduction of memory access time, the increasing volume of hard-disk storage, and so on, lead Moorer to assume that in 20 years we should be able to process 9000 channels of sound in real time and compute million-point FFTs, and that processes once considered unfeasible because they were too complex will become matters of fact. But what will happen if a mixing console has 2000 or more controls on it, or if a digital audio workstation has two million virtual sliders and buttons? The only viable solution, according to Moorer, will be the rise of "intelligent assistants."

This extreme example, which belongs more to the audio engineering domain than to the musical or artistic one, invites us to reflect on human control requirements with respect to the increasing computing power available for sound processing. Intelligent assistance is a crucial point of computer music, and it is fundamental for the control of real-time performances. Nowadays it is possible to interact with a digital system through its user interface, which can convey information through many communication channels such as hearing, sight and touch. But as seen above, the number of parameters to be managed simultaneously may be very large, which makes real-time control a complicated task for performers. To overcome this problem, several solutions have been adopted, such as grouping parameters or introducing automation into performances. A software layer is always necessary between gesture acquisition and sound production: gesture management software and performance software.
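As an illustration of this two-layer split, here is a minimal Python sketch; the class names, the rule format and the set_parameter call on the synthesis engine are assumptions made for the example, not part of any system discussed in this paper. The performance-software side is discussed next.

```python
class GestureManager:
    """Gesture management layer: turns raw sensor readings into named,
    normalized gestural parameters (assumed 0..1 range)."""

    def __init__(self, channel_ranges):
        self.ranges = channel_ranges            # e.g. {"pad_1": (0, 1023)}

    def normalize(self, channel, raw):
        lo, hi = self.ranges[channel]
        return min(max((raw - lo) / (hi - lo), 0.0), 1.0)


class PerformanceLayer:
    """Performance layer: applies musical rules that decide which sound
    parameters are touched, relieving the performer of low-level control."""

    def __init__(self, synth, rules):
        self.synth = synth
        self.rules = rules                      # {gesture name: (parameter, scale)}

    def handle(self, channel, value):
        parameter, scale = self.rules[channel]
        self.synth.set_parameter(parameter, value * scale)   # hypothetical synth API
```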
Performance software includes sequencers, arrangers, score followers and other rule-based applications that allow computers to be charged with specific tasks in order to relieve performers of some low-level controls. Dividing control between computer and performers allows one to go beyond traditional performance modalities. When gestural data follows the MIDI format, it is easy to find programs that can manage it, such as MIDI sequencers and more general development environments like Max (Puckette, 1991). Other environments, such as PLAY (Chadabe & Meyers, 1978), HMSL (Polansky et al., 1987), ARCTIC (Dannenberg et al., 1986), and more recently HARP (Camurri, 1995) or SICIB (Morales-Manzanares & Morales, 1997), have been specifically developed for interactive music performance systems. These environments support dedicated languages that allow one to define and handle musical control flow, i.e., a sequence of instructions that manage musical and gestural events in a formal way.

2.2. New meanings for gesture

Acoustic musical instruments, including the voice, are the most natural and traditional interfaces used by musicians to perform music. But technology has brought to light new instruments that can transduce human motion into electrical signals and then pilot any kind of sound processing. As described by T. Winkler (1995), pioneering work on gestural input devices was carried out during the analog electronic music period. The invention of the Theremin anticipated the development of new expressive instruments. It is noticeable that this first electronic musical instrument, a simple oscillator performed without physical touch, could produce very subtle and varied sounds, because the sound reflected the expressive quality of human movement more than its own timbral quality.

A main question in using new input devices is whether to use controllers based on existing instruments or innovative ones. The main advantage of the former is that musicians are already familiar with them and can easily exploit performance skills learnt over years of practice. But new themes of reflection arise when gesture is no longer linked to sound production and when traditional expressions of virtuosity hardly find a place in the music being performed. Incidentally, we may wonder whether virtuosity is still needed. Virtuosity deals with notions of skill development, effort and performance boundaries. Many artists believe that the ease of use of gestural controllers leads to a loss of expressive power. According to David Wessel (Wessel & Wright, 2000), easy-to-use controllers proposed for musical purposes have a toy-like character that does not invite musical exploration and continued musical evolution. Virtuosity seems to be strongly linked to expression and musicality, and is thus strongly required in the context of the performing arts. On the other hand, new models of controllers may be simpler or more intuitive to use for non-skilled performers and well adapted to "getting started" situations. They also allow the introduction of new gestures that better fit control features when precision is required. Ergonomics, mechanical and temporal precision, flexibility and programmability are the characteristics most often needed for new controllers. At the same time, innovative input devices make performances more spectacular and may extend the expressive possibilities of computer-based interactive music systems.
Finally, the selection of the gestural input device, and consequently of the category of gesture to be used for the performance, belongs more to a musical or compositional choice than to a technological one: gesture becomes one of the new musical parameters.

3. Gesture tracking technology

The explosion of new devices that enable capturing and tracking movement is symptomatic of a great deal of interest from many fields. The development of new interactive systems based on gesture concerns both industrial and artistic applications, as do virtual reality and multimedia applications.

3.1. Requirements for a tracking system

As described by A. Mulder (1994), human movement tracking systems can be classified as inside-in, inside-out and outside-in systems. Inside-in and inside-out systems employ sensors on the body, while outside-in systems employ external sensors. Both types have strengths and weaknesses; for example, the former are generally considered more obtrusive than the latter. But the category of gesture that has to be captured will determine the kind of system to use (Depalle et al., 1997). In the case of instrumental gesture, it is possible to capture the gesture directly through sensors; MIDI devices belong to this category. It is also possible to capture gesture indirectly by analyzing the sound produced by an instrument and deducing information about the gestures applied by the performer to generate it. In this case, it is necessary to provide very sophisticated programs that extract significant gestural/audio information in real time. Figure 1 describes a very generic overall tracking system.

Fig. 1. [Gesture -> Sensors -> Gestural Data -> Computer application] Scheme for a human gesture tracking system. Gesture is captured by sensors, which transduce physical energy into an electrical signal that is then converted into a digital format. Dedicated computer applications analyze these data in order to extract significant gestural information, which is used for the control of sound processing or other musical applications.

3.2. Some hardware technology

Any technology that can transduce human gesture or movement into an electrical signal can be used to build sensors. Regardless of sensor type, it is important to recognize not only what is being measured, but how it is being measured: resolution and sampling rate are fundamental aspects to take into account. Commonly used technologies include infrared, ultrasonic, Hall effect, electromagnetic and video sensing.

Keyboards, switches, pushbuttons, sliders, joysticks, mice, graphic tablets, touch-sensitive pads, etc. belong to standard computer hardware. Although such devices have been used intensively as gestural input devices for musical purposes, they hardly fit musical gesturality requirements such as the sense of tactile feedback, and they lack natural expressive power. On the other hand, musical keyboards, foot pedals, drum pads, ribbon controllers, breath controllers and other instrument-like controllers belonging to the MIDI world have been developed exclusively for musical aims. They capture musical gesture according to the scheme of the MIDI protocol, but they are sometimes used for other purposes, lighting control for example.

Many other technologies have been developed to sense and track a performer's movements in space, and not only for musical performances. On-body sensors measure the angle, force or position of various parts of the body. For example, piezo-resistive and piezo-electric technologies are very common for tracking small body parts such as fingers through the use of gloves. Spatial sensors detect location in space by proximity to hardware sensors or by location within a projected grid. The Radio Drum, for example, measures the location of two batons in three dimensions in relation to a radio receiver (Boie et al., 1989).
Videocamera-based technologies are used to track larger, marked body parts and are often used to integrate music with movement languages such as dance or theater (Camurri, 1995). The force-feedback devices developed at ACROE for interactive systems using physical modeling synthesis are very interesting attempts to offer tactile-kinesthetic feedback to performers (Cadoz et al., 1984). By feeding back the sensation of effort, the performer can exert finer control over his or her movements. Many musicians have built their own input devices as prototypes, using microphones, accelerometers and other types of sensors combined with electronic circuitry. But all this hardware only makes sense if it is connected to the computer and managed by some software: the performance software.

3.3. What about software?

As described by Roads (1996) in his Computer Music Tutorial, at the core of every performance program is a software representation of the music. When working with DSPs to process audio data, we have to deal with many different types of parameters at a low level. Some parameters have a discrete representation, others are continuous; some have time constraints, some do not; and so on. On the other hand, gestural data also has its own internal representation. Performance software has to support all these representations and to map sensor data onto algorithm parameters in a meaningful musical way, i.e., adapted to the styles of music and movement.

Timing is another focal point of performance software, as for any kind of reactive computing application. These applications require reliable real-time behavior compatible with intimate human interaction. Software research on the integration of robust temporal scheduling into operating systems, drivers, programming languages and environments is fundamental to provide good software tools that allow musical control flow description and interactive music applications without latency problems. In our case, latency is the delayed audio reaction of a system with respect to the gesture actuation time. The main causes of latency may be found in weak operating system scheduling strategies and non-optimized software applications, but also in time-consuming operations such as intensive disk access. Wessel and Wright (2000) state that the acceptable upper bound on a computer's audible reaction to gesture is 10 milliseconds. Furthermore, they argue that latency variation should not exceed 1 millisecond in order to be compatible with psychoacoustic measurements of temporal auditory acuity. They suggest solving the latency problem by using time tags or by treating gesture as continuous signals tightly synchronized with the audio I/O stream, a solution which excludes the use of MIDI equipment and suggests the development of new devices and protocols.
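A minimal sketch of the time-tag idea, assuming a monotonic software clock and a hypothetical set_parameter call on the synthesis engine; the 5 ms scheduling offset is an assumed value chosen to stay within the 10 ms bound quoted above.

```python
import time
from collections import deque

LOOKAHEAD = 0.005  # assumed 5 ms scheduling offset, within the 10 ms bound


class TimeTaggedGestures:
    """Tag every gesture sample with its capture time so that the audio
    engine can apply it at a fixed offset, keeping latency jitter low."""

    def __init__(self):
        self.queue = deque()

    def on_sensor_value(self, name, value):
        # Tag the sample when it is captured, not when it is processed.
        self.queue.append((time.monotonic() + LOOKAHEAD, name, value))

    def due_updates(self, now):
        # Return every update whose scheduled time has been reached.
        due = []
        while self.queue and self.queue[0][0] <= now:
            _, name, value = self.queue.popleft()
            due.append((name, value))
        return due


# Hypothetical use, once per block inside the audio callback:
# for name, value in gestures.due_updates(time.monotonic()):
#     synth.set_parameter(name, value)
```

Because every update is applied at its capture time plus a fixed offset, the jitter of the audible reaction depends on the audio block size rather than on the variable processing time of the gestural chain.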
4. The mapping concept

The second step, after capturing some gesture, is to decide how the gestural data will be related to sound processing. This mapping problem is central to the effectiveness of an interactive environment. To a certain extent, as described by Winkler (1995), we can consider human movement in space as a musical instrument if we allow the physicality of movement to impact musical material and processes, and if we are able to map physical parameters of movement onto high-level musical parameters down to low-level sound parameters. Adding some musical intelligence inside the control system implies thinking about the range of operations to deal with, the conversion mechanisms involved, the temporal independence of parameters, the grouping or splitting of gestural channels, the addition of causal or random transformations of gestural data, and the design flexibility and programming facilities of the interactive environment.

Gesture may be used to produce immediate sound by triggering it or by directly controlling parameters of synthesis or other sound processing algorithms. But gesture may also be used to control higher-level processes that automatically generate variables for sound processing, introducing the concept of performance variations. Gestural data may be tracked in order to compare the actual performance with an ideal one stored in a score; the objective is to follow and interpret the performance so as to produce performance-dependent accompaniment with the digital audio system (Dannenberg, 1984; Vercoe, 1984; Pennycook, 1991). Gesture may also have a conducting meaning, which helps the control system take decisions and transform the state of the running processes during the performance (Sundberg et al., 2001). This includes compositional processes and the management of musical structure.

4.1. Gesture analysis

First of all, human gesture data has to be noise-filtered, calibrated and quantized in order to obtain a normalized representation. Such data can then be analyzed to extract pertinent information. The calibration process does not necessarily lead to data reduction, with an implicit impoverishment of the gestural information, but helps to obtain homogeneous information from heterogeneous input channels. Studies of behavior from a kinetic approach point out that human beings can hardly reproduce their gestures with precision (skilled and trained subjects are not considered here). Furthermore, when dealing with gesture it is hard to separate the sign or gesture from the signal it conveys and then to extract its significance; there is no general formalization of movement that may help us recognize its meaning. As detailed by Cadoz (1994), gesture allows humans to interact with their environment in order to modify it, to get information and to communicate: the ergotic, epistemic and semiotic functions associated with the gestural channel of communication. Gesture applied to a musical context may follow the same objectives: modifying musical or sound material, getting information on the performance, and communicating information to other processes. Segmentation, statistical analysis, pattern matching, neural networks and information retrieval are all examples of analysis techniques that can be used to extract meaningful information about gesture. These techniques have to be implemented in real time and have to produce stable and reliable outputs on stage for the specific musical purpose of the performance.
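As a sketch of the calibration and filtering step described above, the following Python class maps heterogeneous raw sensor readings onto a common 0..1 range and smooths them with a one-pole low-pass filter; the class name, the value ranges and the smoothing factor are assumptions made for the example.

```python
class SensorChannel:
    """Normalize and smooth one raw sensor stream (assumed integer ADC values)."""

    def __init__(self, raw_min, raw_max, smoothing=0.9):
        self.raw_min, self.raw_max = raw_min, raw_max
        self.smoothing = smoothing      # one-pole low-pass factor, 0..1
        self.state = 0.0

    def calibrate(self, raw):
        # Clip to the measured range, then map to 0..1.
        raw = min(max(raw, self.raw_min), self.raw_max)
        return (raw - self.raw_min) / (self.raw_max - self.raw_min)

    def process(self, raw):
        # The low-pass filter removes sensor noise while keeping the
        # response fast enough for performance use.
        x = self.calibrate(raw)
        self.state = self.smoothing * self.state + (1.0 - self.smoothing) * x
        return self.state


# Heterogeneous inputs (e.g., a 10-bit pressure pad and a 7-bit MIDI slider)
# end up in the same homogeneous 0..1 representation:
pad = SensorChannel(0, 1023)
slider = SensorChannel(0, 127)
```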
But as in other domains, such as sound analysis/synthesis, it is perhaps within a dynamic relationship between gesture analysis and some kind of synthesis feedback (auditory, tactile or visual) that we can best imagine exploring and studying gesture, with a particular focus on signification and emotional content.

4.2. Mapping strategies

Interactive musical performances require the simultaneous manipulation of inter-dependent parameters. A good mapping metaphor will help performers and the audience understand the effects of gesture on sound. A good mapping strategy will fulfill all the requirements of the interactive environment:

• Hardware and software robustness with reliable timing response;
• Data reduction and perceptually significant results according to movement changes;
• Practice facilities for the performers to learn the idiosyncrasies of the system;
• Quick prototyping facilities for rehearsals;
• Different abstraction levels of data representation and mapping for visualizing and editing both gestural and musical parameters.

We can consider five different ways to map or connect gestural data to sound parameters.

One-to-one connections are typically used for selecting formatted or preset sound parameter values, or for dynamically updating sound parameters. In the former case, the current value of the gestural parameter is used as a branching condition to select materials. In the latter, gestural parameter values need to be converted in order to be adapted to the range of the sound parameters. This can be done in an absolute mode, as for an amplitude, or in a relative one, as for a vibrato variation.

Convergent connections are used to get more information about the movement or to add control dimensions to a higher-level sound parameter. Pitch, for example, may depend on a fundamental frequency driven by key pressure, on frequency deviations coming from some gliding device, and on an intensity controlled through yet another device. In this case too, conversion rules are necessary to fit the gestural parameter ranges to those of the sound.

Divergent connections are used to reduce the number of input devices and to optimize the control process. This mapping strategy is often used in commercial digital music instruments. A typical example is the scaling mechanism over the keyboard range used to manipulate envelope profiles, to balance amplitudes or to adjust the spectral bandwidths of the generated sounds according to the selected pitch.

Metaphors may be a good approach for solving the multidimensional control aspects of a sound. Metaphors may help in grasping abstract concepts and may inspire the development of original controllers (Wessel & Wright, 2000). Mulder demonstrates the validity of introducing virtual objects or surfaces, like spheres or sheets, manipulated with the hands through instrumented gloves that measure hand shapes (Mulder et al., 1997). The metaphor used in this case is sound sculpting, and it has been studied to help sound designers in their difficult job of adjusting and editing sound algorithm parameters. The implementation of such a mapping is computationally very expensive and requires a sophisticated programming environment; in this case MAX/FTS was chosen.

Finally, the rules that link gesture to sound or musical parameters may also be chosen arbitrarily. They may be considered part of the compositional material of a musical piece.
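A minimal Python sketch of the first three connection types discussed above (one-to-one, convergent, divergent), using the MIDI-keyboard examples listed in Table 1 below; the scaling constants and conversion formulas are assumptions made for the illustration, not values taken from any of the systems described in this paper.

```python
def one_to_one(key_index):
    # One-to-one: a single gestural value drives a single parameter
    # (here, a MIDI key number mapped to an oscillator frequency in Hz).
    return 440.0 * 2.0 ** ((key_index - 69) / 12.0)


def convergent(key_index, pitch_bend, pressure):
    # Convergent: several gestural inputs are combined into one sound
    # parameter, as in the pitch example above.
    base = one_to_one(key_index)
    bend = 2.0 ** (pitch_bend / 12.0)          # pitch_bend assumed in semitones
    pressure_dev = 0.01 * pressure / 127.0     # key pressure adds up to 1% deviation
    return base * bend * (1.0 + pressure_dev)


def divergent(key_index):
    # Divergent: one gestural input is spread over several parameters,
    # e.g., keyboard scaling of envelope attack and filter bandwidth.
    attack = max(0.2 - 0.0012 * key_index, 0.005)   # higher keys: shorter attacks (s)
    bandwidth = 200.0 + 15.0 * key_index            # higher keys: wider filters (Hz)
    return {"attack_time": attack, "filter_bandwidth": bandwidth}
```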
The following table illustrates three of the five mapping strategies mentioned above, as they can easily be implemented using a MIDI keyboard:

Table 1. Mapping examples considering gestural data coming from a simple MIDI keyboard.

Mapping      Gestural data                               Synthesis parameter
One to one   Key index                                   Oscillator frequency
Convergent   Key index, Key pressure, Pitch-bend index   Oscillator frequency
Divergent    Key index                                   Envelope attack slope, Filter bandwidth

4.3. Data synchronization

Real-time processing does not mean updating sound parameters at the very moment gestural data becomes available. On the contrary, in a musical system the performers' actions may introduce delayed reactions. They may also be analyzed over longer spans of time to extract information about the direction of the movement, its repetition, and other kinds of patterns. Immediate actions are generally used to trigger preset parameters or to dynamically update parameter values using the one-to-one mapping strategy. Delayed actions may be used as a compositional or musical choice, such as delayed sounds in live electronics applications. Other delayed actions may be necessary when waiting for other events, in order to introduce synchronization mechanisms between processes.

5. Controlling performance environments

Most of the literature on gestural control of music tends to consider gesture as physical playing or performing, and thus to privilege the instrumental gestural control approach. Traditional instrumental gesture is analyzed and categorized, and gesture tracking devices and mapping solutions for digital synthesis systems are developed for virtual instrument performance. However, gesture can be applied to digital sound processing in many ways, especially when thinking of the new possible meanings of gesture in computer music. The following sections describe some situations in which the gestural control of performance environments goes beyond the model of traditional instrumental control.

5.1. Computer score performance

During the author's working period at the Centro di Sonologia Computazionale (C.S.C.) of Padua University, the problems deriving from the use of gesture to control the real-time performance of a computer musical score, such as a Music V or Csound score, were studied (Azzolini & Sapir, 1984). This led to the description of a theoretical frame in which gestures and sounds may be organized into two independent and structured spaces that interact according to arbitrary relationships (Sapir, 1989). The problem is not to play an instrument, but to interpret a score with different levels of interaction, from completely automatic performance (press a button and the sequence runs by itself) up to improvisation (gesture builds musical phrases). Something has to be said about the significance of the time data: gesture time (start, end or duration) may influence note or silence timings, and vice versa.
Real time, i.e., loading a value or a set of sound parameters at the moment a gesture occurs, is only one possibility among others. By considering the computer sound score (notes, timings, parameters) and the gesture score (gestures, timings, parameters) as two grids of events, as shown in Tables 2 and 3, it is possible to mix elements of these grids in order to produce a performance that is musically significant.

Table 2. Musical score as a grid of macro (rows) and micro (columns) events.

Event   Time    Dur.    Ins.   P5     P6    P7
Not     0       1000    1      100    x     440
Not     0       750     1      50     y     131
...     ...     ...     ...    ...    ...   ...
Not     5000    3000    1      100    z     220

Table 3. Gestural score as a grid of macro (rows) and micro (columns) events.

Event   Time    Dev.      P4     P5    P6
Gest    0       sensor    100    a     30
Gest    0       sensor    80     b     60
...     ...     ...       ...    ...   ...
Gest    3000    sensor    100    c     25

Macroevents are defined as groups of logically correlated parameters that intervene at the same instant in the elaboration of a musical or gestural event, while the individual variations of a single parameter are defined as microevents. A set of control primitives on microevents has been defined in order to combine their time-value elements in diversified ways, and general treatment modalities of the time parameter have been identified to allow control of the temporal organization of the different macroevents (see Tables 4 and 5).

Table 4. Description of the four microevent control primitives.

Primitive id.   Time comes from   Value comes from
No Control      Score             Score
Direct Access   Gesture           Gesture
Sample & Hold   Score             Gesture
Sequencer       Gesture           Score

Table 5. Description of the three macroevent timing controls.

Time control id.   Action time comes from   Duration comes from
No Control         Score                    Score
Trigger            Gesture                  Score
Trigger & Gate     Gesture                  Gesture

A symbolic language describing the way gesture interacts with sound, together with real-time software allowing score performance, were designed and implemented for the 4i digital sound processor. This processor was physically controlled by only two sliders and the host computer keyboard, a DEC PDP-11 running the RTI4I proprietary real-time operating system. Even with such a simple prototype, numerous musical pieces were produced, such as the digital part of the first two versions of Prometeo by L. Nono (Sapir & Vidolin, 1985).
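A short Python sketch of the four microevent control primitives of Table 4, showing how score and gesture data are combined; this is a paraphrase of the idea described above, not the original 4i implementation.

```python
def microevent(primitive, score_time, score_value, gesture_time, gesture_value):
    """Combine score and gesture according to the four microevent control
    primitives of Table 4, returning the (time, value) pair actually used."""
    if primitive == "no_control":       # fully automatic: time and value from the score
        return score_time, score_value
    if primitive == "direct_access":    # fully gestural: time and value from the gesture
        return gesture_time, gesture_value
    if primitive == "sample_and_hold":  # the score decides when, the gesture decides what
        return score_time, gesture_value
    if primitive == "sequencer":        # the gesture decides when, the score decides what
        return gesture_time, score_value
    raise ValueError(primitive)
```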
5.2. Live electronic performance

Live electronic performances may be considered a challenge for the creative, gestural control of sound processing. Composers and performers are often fascinated by the combination of different expressive means, and interactions between real and synthetic worlds stimulate invention and creativity. The possibility of instantly processing sounds produced by acoustic instruments on stage makes it possible to overcome some natural limitations of traditional music performers and adds new dimensions to the performance of computer music works. When dealing with existing sound and recognizable instrumental gesture, we have to think about how sound processing and non-instrumental gesture may be musically congruent with the performance. This is done without any direct control over the sound emission, but by transforming audio data according to external events that are not always explicit on stage and that depend on states of the performance itself. Furthermore, live electronics is often combined with sound projection for spatialization and/or sound reinforcement, thus including gestures that belong more to the sound engineer's practice than to instrumental ones.

Digital audio environments for live electronics may require heterogeneous equipment such as audio processors, MIDI devices, digital mixers, and computer networks with backup systems in case of crash. Very complex situations on stage and at the mixing console therefore have to be managed in order to allow good and safe interaction between performers via different communication channels. Such interactive environments must be designed to fit the requirements of the final musical performance but also of the critical rehearsal phase, and each piece or each new performance environment requires new developments and new solutions.

With such complex setups, it is generally more comfortable, in the case of the final performance, to work with cue lists of pre-defined musical material such as sets of sound or transformation parameters, or audio samples. This strategy lowers risk levels and is well suited when the live electronics have been encoded in the score. Performers at the computer, or performers on stage, only have to trigger the cues at a given time through keyboards, pedals, buttons or other devices. During rehearsals, on the other hand, the environment must be designed to allow experimentation by means of direct gestural control of the sound processing parameters. Usually, one-to-one mapping solutions, together with the possibility of taking snapshots of running configurations, are good strategies for preparing the pre-defined material to be cued during the final performance.

Sound processing algorithms may be considered new instruments to play. Such instruments require immediate dynamic control, which can be exerted on stage with remote controllers or by some hidden performer close to the mixing console. Gesture is intimately linked to sound through a bidirectional path that may initially disorient, but that certainly opens new horizons for music creation. By being able to select the quality of gesture and its relationship to the resulting sound, the composer enlarges his or her musical language, letting gestural meaning grow through the interaction between the performers and the way we experience the performance.

Figure 2 illustrates a general situation during live electronic performances. One or more performers play acoustic instruments whose sound may be transformed by processes. These processes may be controlled by other performers who can use specific control devices or traditional acoustic instruments. For example, the intensity of a sound produced by a clarinet player may be dynamically controlled by the intensity of a sound produced at the same time by a flute player: a simple envelope follower extracts the dynamic profile of the flute's intensity and applies it to the sound of the clarinet. A similar control could be realized by moving a slider and mapping the output data coming from the slider onto a modulator input. The two different typologies of gesture would lead to two different audible and visible results, and might have two different musical meanings. Finally, the feedback path that appears in Figure 2 indicates that processed sounds may be added to new incoming ones. Such a situation may create unstable systems that are complex to manage but have a great potential for expressiveness.

Fig. 2. [Instrumental gesture -> Sound emission -> Sound processing, with a control-gesture input to the processing and a feedback path from its output to its input] Scheme of the relationship between gesture and sound during live electronic performance.

Traditional music performance implies the actuation of instrumental gestures on acoustic instruments. In such a case, the relationship "one gesture to one acoustic event" is dominant and clear to the audience. The introduction of sound processing during the performance transforms the acoustic event, thus decreasing the strength of the relationship between gesture and sound.
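A minimal sketch of the envelope-follower control described above, in which the dynamic profile of one signal (the flute) scales another (the clarinet); the block-based processing, the sample rate and the attack/release time constants are assumed values chosen for the example.

```python
import numpy as np


def envelope_follower(block, previous_env, attack=0.01, release=0.2, sr=44100):
    """Track the dynamic profile (intensity envelope) of one audio block."""
    env = previous_env
    out = np.empty(len(block))
    for i, x in enumerate(np.abs(block)):
        tau = attack if x > env else release      # faster rise, slower fall
        alpha = np.exp(-1.0 / (tau * sr))         # one-pole smoothing coefficient
        env = alpha * env + (1.0 - alpha) * x
        out[i] = env
    return out, env


# Hypothetical per-block use: the flute envelope scales the clarinet signal.
# flute_env, env_state = envelope_follower(flute_block, env_state)
# processed_clarinet = clarinet_block * flute_env
```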
By physically controlling such processes and adding new performers on stage, it is possible to rebalance the link between gesture and sound, and to play with the theatrical aspects of gesturality.

5.3. Real-time control of a music workstation

Real-time synthesis and sound processing require large computing power. Furthermore, when the processing depends on the musical situation or on logic conditions applied to audio or gesture signals, or when parameter extraction from sound analysis is necessary to trigger effects or to pilot automatic score following, a fast, powerful, programmable integrated system becomes a must. The MARS musical workstation, developed at the IRIS research center of the Bontempi-Farfisa group during the nineties, satisfies most of the requirements of such a system. This musical workstation is a totally programmable digital machine dedicated to real-time digital sound processing. In its latest version, the workstation includes a sound board, NERGAL, to be plugged into the host computer, and the graphic user interface ARES, which runs under the Windows platform (Andrenacci et al., 1992). ARES is a set of integrated graphic editors that allow the interactive development of hierarchized sound objects. It is possible to graphically design algorithms for sound processing and to control them by associating any type of MIDI device with their parameters through arbitrary relationships. The real-time control is handled by the embedded RT20 operating system of NERGAL (Andrenacci et al., 1997). MARS does not provide any compositional environment, but its compatibility with the MIDI world allows users to include it in a studio with MIDI facilities. To overcome the MIDI limitations, MARS provides an application toolkit that allows its environment to be extended and user applications to be developed easily.

MARS allows users to develop single sound objects called tones. Tones are specific configurations of all the input variables of an algorithm, such as parameters, envelopes or LFOs. Tones are related to their algorithms and are associated with a MIDI channel; they may be dynamically loaded with Program Change messages. Parameters have a symbolic name, a default representation unit (such as dB, Hz or ms) and a state that defines the way their values will be updated:

• Static parameters are initialized at the beginning of a note event with a constant value.
• Dynamic parameters are initialized at the beginning of a note and are continuously updated during the lifetime of the note, according to a conversion formula that solves most of the main mapping problems.

The setup of a dynamic parameter is made in two phases:

1. Formula selection – a set of 17 predefined formulas is available as combinations of a maximum of three functional blocks called M. For example, in Figure 3 the formula associated with the pitch parameter is:

Pitch = M1 * [1 + M2]

2. Block definition – each M block is defined by a conversion function (an array), a MIDI index, and a scaling factor for output values. For example:

M1 = HZTEMP(Key) * 1.0
M2 = BEND12HT(Pitch-bend) * 1.0

Fig. 3. The MARS dynamic parameter editor.

The mapping features provided by the MARS workstation allow for different solutions that fit different control situations. One-to-one mapping strategies have been used intensively for DSP research and quick sound prototyping. Convergent mapping strategies have been used to add performance possibilities and to allow subtle live control mechanisms, as illustrated in the previous example on pitch. Divergent mapping has been commonly used to simulate the behavior of commercial instruments and to optimize the ergonomic aspects of their control interface.
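A sketch of how the dynamic-parameter formula above can be evaluated. The names HZTEMP and BEND12HT come from the article, but their actual contents (an equal-tempered key-to-Hz table and a pitch-bend table spanning ±12 half-tones) are assumptions made for this example, not the data stored in the MARS conversion arrays.

```python
def HZTEMP(key):
    # Assumed content of the HZTEMP conversion array: equal-tempered
    # frequency in Hz for a MIDI key number (key 69 -> 440 Hz).
    return 440.0 * 2.0 ** ((key - 69) / 12.0)


def BEND12HT(bend):
    # Assumed content of the BEND12HT conversion array: a 14-bit pitch-bend
    # value (0..16383, centre 8192) mapped to a relative frequency deviation
    # over a +/- 12 half-tone range.
    semitones = 12.0 * (bend - 8192) / 8192.0
    return 2.0 ** (semitones / 12.0) - 1.0


def pitch(key, bend, m1_scale=1.0, m2_scale=1.0):
    # Pitch = M1 * [1 + M2], with M1 = HZTEMP(Key) * 1.0 and
    # M2 = BEND12HT(Pitch-bend) * 1.0, as in the example above.
    m1 = HZTEMP(key) * m1_scale
    m2 = BEND12HT(bend) * m2_scale
    return m1 * (1.0 + m2)
```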
The MARS workstation, combined with the MAX environment, has been used intensively for real-time synthesis and for live electronic pieces. Its interface provides four important features related to the sound-gesture relationship, which have usually met the requirements of interactive digital environments:

1. Gestural data conversion facilities;
2. Gestural data grouping rules to define "meta-gestures";
3. Sound parameter extraction and logic facilities on the audio signal;
4. Time control and synchronization features for parameter updating.

5.4. Gesture and music education

At school, it is currently more common to find multimedia computers than musical instruments. It is always possible to learn about music with a computer, but using the computer keyboard or the mouse to interact with materials in a musical context is far from ideal. Similarly, the movements belonging to the traditional world of music frequently require sophisticated motor skills that are beyond the capacity of children who do not study a musical instrument. Numerous studies have shown that methods involving the voice and the whole body in an exploration of motor coordination and of synchronicity with the development of the musical experience are helpful for young children's music education, but also in situations of rehabilitation or of motor or psychological disadvantage (Aucouturier, Darrault, & Empinet, 1984; Bachmann, 1991; Le Boulch, 1995; Vayer & Roncin, 1988).

An interactive digital audio environment called "Il Teatro dei Suoni"¹ (The Theater of Sounds) has been developed by a team of music teachers, engineers and musicians under the author's coordination, for the expressive, musical and motor education of children. The Theater of Sounds transforms the classroom into a live, interactive sound-creating space through a system of mats/sensors which capture each movement and transform it into sound through the computer. The Theater of Sounds is made up of:

• A "daisy" with 12 mat-sensor petals which the children can walk and jump on, connected via a wire-free radio link to the computer which generates sounds, music and sound effects;
• A sound-generating software system which interacts with the audio card of the multimedia computer;
• An authoring program which allows interactive audio environments to be launched or new ones to be created.

¹ The Teatro dei Suoni has been developed by Soundcage s.r.l. (http://www.geureka.com); it is a registered trademark of GARAMOND S.r.l. (Rome, Italy), http://www.garamond.it/

Fig. 4. The Theater of Sounds: the "daisy" with the mats/sensors and the graphical user interface.

The Theater of Sounds uses a great variety of audio items. These can be single notes played on instruments from a vast palette of tones, but can also be musical phrases or sound effects. As well as providing a rich bank of sounds and sequences, the system also permits the creation of new interactive audio environments and allows musical sequences and samples to be imported from audio and MIDI-compatible external files.
The educational aims of the system are:

• The exploration of sound, tone, melody, rhythm and form, by manipulating the parameters and values which define them;
• The development of perceptive-motor skills;
• The coordination of movements directed at a particular musical aim;
• The elaboration of narrative plots from an evocative audio stimulus, and the development of more mature creative and communicative skills through movement.

With such an environment, we faced two problems: mapping simple gestural data onto audio and musical materials at different hierarchical levels, and allowing several performers (up to 12) to play together in order to perform structured musical material. We adopted a very simple technology to track gesture: analog sensors hidden in the mats, which leave the children free to move on the daisy. The radio transmission of the gestural data to the host computer allows the "daisy" to be placed far away from the computer, so that the children focus their attention on their movements and not on what happens on the screen. Gesture is split into two categories, single steps (like walking) and high-rate steps (like running), which correspond to two different control types, discrete and continuous. Discrete controls allow events to be triggered or sequenced, while continuous controls allow musical or audio parameters to be transformed gradually. MIDI-based or audio material may be mapped onto the sensors by means of a set of pre-defined programmable rules. This material is then dynamically re-organized and transformed according to the selected rules of the current environment and the gestural inputs of the children. The Theater of Sounds aims to be a "collective musical instrument" for developing communication and expression through sound, music and movement. The metaphor of a theater helps the children to put together a performance in which each of them plays a part as composer, performer and even director.

6. Conclusions

This paper has been concerned with the study of new relationships between sound and gesture. After drawing a general framework of gesture tracking technology, it addressed the problem of mapping gestural data onto audio data. Some applications have been described to illustrate specific situations in which gesture may control interactive audio environments without direct reference to the conventional instrumental model. Human gesture has always been at the center of the musical creation process, whether in the written form of a score or in instrumental and conducting performance. Conversely, listening to music often leads to emotions which reverberate at the body level, a phenomenon which is at the basis of dance and choreography. Music and gesture are closely linked by a dynamic relationship between acting and listening, cause and effect. Today's technology allows one to imagine and build, in an arbitrary way, new relationships between gesture and sound. Besides new hints for creativity, these new possibilities allow the development of innovations for education, help for the handicapped and, above all, communication. Composers and performers have an important role to assume in helping to develop interactive systems with meaningful relationships between music and human gesture.

References

Andrenacci, P., Armani, F., Paladin, A., Prestigiacomo, A., Rosati, C., Sapir, S., & Vetuschi, M. (1997). MARS = NERGAL + ARES. In Proceedings of the Journées d'Informatique Musicale 1997 (pp. 145–152). Lyon: SFIM (French Computer Music Society).
Andrenacci, P., Favreau, E., Larosa, N., Prestigiacomo, A., Rosati, C., & Sapir, S. (1992). MARS: RT20M/EDIT20 – Development tools and graphical user interface for a sound generation board. In Proceedings of the International Computer Music Conference 1992 (pp. 340–344). San Jose, California: International Computer Music Association.

Aucouturier, B., Darrault, I., & Empinet, J.L. (1984). La pratique psychomotrice: rééducation et thérapie. Paris: Doin.

Azzolini, F. & Sapir, S. (1984). Score and/or gesture: A real-time system control for the digital sound processor 4i. In Proceedings of the International Computer Music Conference 1984 (pp. 25–34). Paris: International Computer Music Association.

Bachmann, M.L. (1991). Dalcroze Today: An Education Through and Into Music. Oxford: Clarendon Press.

Boie, R., Mathews, M., & Schloss, A. (1989). The radio drum as a synthesizer controller. In Proceedings of the International Computer Music Conference 1989 (pp. 42–45). Columbus, Ohio: International Computer Music Association.

Cadoz, C., Luciani, A., & Florens, J.L. (1984). Responsive input devices and sound synthesis by simulation of instrumental mechanisms: The CORDIS system. Computer Music Journal, 8(3), 60–73.

Cadoz, C. (1994). Le geste, canal de communication homme-machine. La communication "instrumentale". Numéro spécial: Interface homme-machine, 31–61.

Camurri, A. (1995). Interactive dance/music systems. In Proceedings of the International Computer Music Conference 1995 (pp. 245–252). Banff, Alberta, Canada: International Computer Music Association.

Chadabe, J. & Meyers, R. (1978). An introduction to the PLAY program. Computer Music Journal, 2(1), 12–18.

Dannenberg, R. (1984). An on-line algorithm for real-time accompaniment. In Proceedings of the International Computer Music Conference 1984 (pp. 193–198). Paris, France: International Computer Music Association.

Dannenberg, R., McAvinney, P., & Rubine, D. (1986). Arctic: A functional language for real-time systems. Computer Music Journal, 10(4), 67–78.

Depalle, P., Tassart, S., & Wanderley, M. (1997). Instruments virtuels. Résonance, 5–8. Paris: IRCAM.

Le Boulch, J. (1995). Mouvement et développement de la personne. Vigot.

Morales-Manzanares, R. & Morales, E.F. (1997). Music composition, improvisation, and performance through body movements. In Proceedings of the International Workshop on KANSEI – The Technology of Emotion (pp. 92–97). Genova: AIMI (Italian Computer Music Association) and DIST, University of Genova, Italy.

Moore, F.R. (1988). The dysfunctions of MIDI. Computer Music Journal, 12(1), 19–28.

McMillen, K. (1994). ZIPI: Origins and motivations. Computer Music Journal, 18(4), 47–51.

Moorer, J.A. (2000). Audio in the new millennium. R.C. Heyser memorial lecture series, Audio Engineering Society. http://www.aes.org/technical/Heyser.html

Mulder, A. (1994). Human movement tracking technology. Technical report. Simon Fraser University, Canada.

Mulder, A., Fels, S., & Mase, K. (1997). Empty-handed gesture analysis in Max/FTS. In Proceedings of the International Workshop on KANSEI – The Technology of Emotion (pp. 87–91). Genova: AIMI (Italian Computer Music Association) and DIST, University of Genova, Italy.

Pennycook, B. (1991). The PRAESIO series: Composition-driven interactive software. Computer Music Journal, 15(3), 267–289.

Polansky, L., Rosenboom, D., & Burk, P. (1987). HMSL: Overview and notes on intelligent instrument design. In Proceedings of the International Computer Music Conference 1987 (pp. 256–263).
Urbana, Illinois: International Computer Music Association.

Puckette, M. (1991). Combining event and signal processing in the MAX graphical programming environment. Computer Music Journal, 15(3), 68–77.

Roads, C. (1996). The Computer Music Tutorial. Cambridge, MA: MIT Press.

Sapir, S. (1989). Informatique musicale et temps réel: geste – structure – matériaux. In Proceedings of the International Conference on Musical Structures and Computer Aid (Structures Musicales et Assistance Informatique) (pp. 5–19). Marseille: MIM (Marseille Computer Music Laboratory), CRSM – Université de Provence, Sud Musique and CCSTI (Cultural Center on Science, Technology and Industry of Provence), France.

Sapir, S. & Vidolin, A. (1985). Interazioni fra tempo e gesto – Note tecniche sulla realizzazione informatica del Prometeo. Quaderno 5 del LIMB (pp. 25–33). Venezia: La Biennale di Venezia, Italy.

Sundberg, J., Friberg, A., Mathews, M.V., & Bennett, G. (2001). Experiences of combining the Radio Baton with the Director Musices performance grammar. In Proceedings of the MOSART Workshop on Current Research Directions in Computer Music (pp. 70–74). Barcelona, Spain. http://www.iua.upf.es/mtg/mosart/papers/p34.pdf

Vayer, P. & Roncin, C. (1988). Les activités corporelles chez le jeune enfant. Presses Universitaires de France.

Vercoe, B. (1984). The synthetic performer in the context of live performance. In Proceedings of the International Computer Music Conference 1984 (pp. 199–200). Paris, France: International Computer Music Association.

Wanderley, M.M. & Battier, M. (Eds.) (2000). Trends in Gestural Control of Music. Ircam – Centre Pompidou.

Wessel, D. & Wright, M. (2000). Problems and prospects for intimate musical control of computers. In ACM SIGCHI, CHI '01 Workshop on New Interfaces for Musical Expression (NIME'01). http://cnmat.cnmat.berkeley.edu/Research/CHI2000/wessel.pdf

Winkler, T. (1995). Making motion musical: Gestural mapping strategies for interactive computer music. In Proceedings of the International Computer Music Conference 1995 (pp. 261–264). Banff, Alberta, Canada: International Computer Music Association.

Wright, M. (1994). A comparison of MIDI and ZIPI. Computer Music Journal, 18(4), 86–91.