
Journal of New Music Research
2002, Vol. 31, No. 2, pp. 119–129
© Swets & Zeitlinger
Gestural Control of Digital Audio Environments
Sylviane Sapir
Electronic Music Department, Conservatory of Cagliari, Italy
Accepted: 8 December, 2001. Correspondence: Via Cimate 92, 03018 Paliano (FR), Italy. E-mail: [email protected]
Abstract
In a traditional musical situation based strictly on acoustic instruments, human gesture usually produces sound. The relationship between gesture and sound is unique: it is a cause-and-effect link. In computer music, gesture can be uncoupled from sound because the computer can carry out all aspects of sound production, from composition up to interpretation and performance.
Real-time computing technology and the development of human gesture tracking systems may enable gesture to be reintroduced into the practice of computer music, but with a completely renewed approach. There is no longer any need to create direct cause-and-effect relationships for sound production, and gesture may be seen as another musical parameter to play with in the context of interactive musical performances.
1. Introduction
Musical performances in the context of computer music involve an interaction between human beings and digital instruments whose results are audible, and possibly visible, to an audience. Performers interact with their bodies, by manipulating physical objects or by moving freely in space. The forces applied by body parts to the objects, or the dynamic characteristics of the movements such as trajectory or velocity, have to be captured by sensors, converted into data that represent the movements, and then turned into sound after processing.
The terms movement and gesture may be ambiguous, and an unconditional definition of gesture is still problematic. In this paper gesture and movement both refer to motions of parts of the human body, but gesture involves manipulative or communicative movements founded on an intent, in our case a musical intent.
The study and analysis of human gesture in the musical domain for the control of digital instruments, along with standardization requirements coming from industry, led to the birth of MIDI devices and their communication protocol. This protocol has often been judged insufficient because of its narrow bandwidth and its limitations deriving from the reproduction of pianistic gesture (Moore, 1987; McMillen, 1994; Wright, 1994). This is even more true today, as the relationship between composer and technology is no longer exclusive but extends to other forms of artistic projects in which multimedia interacts with other forms of expression such as dance, theatre or video art – privileged fields of body communication. However, and perhaps in spite of all its defects, MIDI remains the most popular way of controlling digital audio synthesis. One explanation may be found in the MIDI approach itself, which detaches the interface from the sound source, thus enabling the proliferation of innovative controllers. Another explanation may be found, more prosaically, in the low cost and easy implementation of MIDI technology.
The development of interactive audio environments which capture and analyze the movements of one or more performers in real time in order to control arbitrary sound processing systems allows the traditional aspects of composition and sound synthesis in computer music to be connected with the less codified aspects of bodily expression. In this way it becomes possible to establish new gestural interactions in the perspective of an extended musical creative process which involves the spectacular, affective and emotive aspects of human gesture. This field of investigation is at the core of promising research from many different disciplines such as computer science, psychology, linguistics and music. A very detailed bibliography, with interesting theoretical articles and case studies, may be found in IRCAM's electronic publication "Trends in Gestural Control of Music" (Wanderley & Battier, eds., 2000).
2. Sound and gesture
The possibility of producing music in a place and at a time different from those in which it is listened to has greatly
expanded the possibilities of artistic and creative processes.
Furthermore, the possibility of creating musical material in all its details, by physically describing the sound event and by taking into account the limits of our perception, has freed us from the mechanical constraints of traditional musical instruments. Detaching the control of sound from its production, in space or in time, opens new horizons to composers and performers.
2.1. Control and automation
The progress of digital audio technology over the last 20 years is so impressive that we may wonder how it will continue and what its impact could be on music production and other forms of creativity.
In an AES lecture, J.A. Moorer (2000) foresees that the main problem facing digital audio engineers will not be how to perform a particular manipulation of sound, but how the amount of processing power that will be available can possibly be controlled.
In effect, the growth of processor speed, the development of new processor architectures and networking capacity, the reduction of memory access time, the increasing volume of hard-disk storage, etc., lead Moorer to assume that in 20 years we should be able to process 9000 channels of sound in real time, computing million-point FFTs, and that processes once considered unfeasible because they were too complex will become matters of fact.
But what will happen if a mixing console has 2000 or more controls on it, or if a digital audio workstation has two million virtual sliders and buttons? The only viable solution, according to Moorer, will be the rise of "intelligent assistants." This extreme example, which belongs more to the audio engineering domain than to the musical/artistic one, invites us to reflect on human control requirements with respect to the increasing demands of computing power for sound processing. Intelligent assistance is a crucial point of computer music, and it is fundamental for the control of real-time performances.
Nowadays it is possible to interact with a digital system through its user interface, which can convey information through many communication channels such as hearing, sight and touch. But as seen before, the number of parameters to be managed simultaneously may be very large, which makes real-time control a complicated task for performers.
To overcome this problem several solutions have been found, such as grouping parameters or introducing automation into performances. A software layer is always necessary between gesture acquisition and sound production: gesture management software and performance software.
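As a minimal illustration of the parameter-grouping idea (the parameter names, ranges and scaling below are hypothetical, not taken from any particular system), a single macro control can drive a whole group of low-level parameters at once:

    # Sketch: one macro parameter drives a group of low-level parameters.
    # The parameter names and ranges are invented for illustration only.

    def make_group(mapping):
        """mapping: {parameter_name: (min_value, max_value)}"""
        def apply(macro_value):                       # macro_value expected in [0, 1]
            macro_value = max(0.0, min(1.0, macro_value))
            return {name: lo + macro_value * (hi - lo)
                    for name, (lo, hi) in mapping.items()}
        return apply

    brightness = make_group({
        "filter_cutoff_hz": (200.0, 8000.0),
        "filter_resonance": (0.1, 0.7),
        "partial_tilt_db":  (-12.0, 0.0),
    })

    print(brightness(0.25))   # one gestural value updates three parameters at once

Automation then amounts to driving such macro values from stored curves rather than from live gesture.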
Performance software includes sequencers, arrangers, score followers and other rule-based applications which allow computers to be charged with specific tasks in order to relieve performers of some low-level controls. Dividing control between computer and performers allows one to go beyond traditional performance modalities.
When gestural data follows the MIDI format, it is easy to find programs that can manage it, such as MIDI sequencers or more general development environments such as Max (Puckette, 1991). But other environments like PLAY (Chadabe & Meyers, 1978), HMSL (Polansky et al., 1987) or ARCTIC (Dannenberg et al., 1986), and more recently HARP (Camurri, 1995) or SICIB (Morales-Manzaneres & Morales, 1997), have been specifically developed for interactive music performance systems. These environments support dedicated languages that allow one to define and handle musical control flow, i.e., a sequence of instructions that manage musical and gestural events in a formal way.
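Without reproducing the syntax of any of these environments, the kind of formal control flow they support can be sketched as a set of rules that dispatch incoming musical and gestural events to handlers; every name below is hypothetical:

    # Sketch of a rule-based musical control flow: incoming events, whether
    # musical or gestural, are matched against registered rules and dispatched.

    rules = []

    def when(event_type, condition=lambda e: True):
        """Register a handler for events of a given type satisfying a condition."""
        def register(handler):
            rules.append((event_type, condition, handler))
            return handler
        return register

    @when("note")
    def play_note(event):
        print("schedule note", event["pitch"], "for", event["dur"], "s")

    @when("gesture", condition=lambda e: e["value"] > 0.8)
    def strong_gesture(event):
        print("gesture above threshold: trigger the next musical section")

    def dispatch(event):
        for event_type, condition, handler in rules:
            if event["type"] == event_type and condition(event):
                handler(event)

    dispatch({"type": "note", "pitch": 60, "dur": 0.5})
    dispatch({"type": "gesture", "value": 0.93})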
2.2. New meanings for gesture
Acoustic musical instruments, including the voice, are the most natural and traditional interfaces used by musicians to perform music. But technology has brought to light new instruments that can transduce human motion into electrical signals and then pilot any kind of sound processing.
As described by T. Winkler (1995), pioneering work on gestural input devices was carried out during the analog electronic music period. The invention of the Theremin anticipated the development of new expressive instruments. It is remarkable that the first electronic musical instrument, a simple oscillator performed without physical touch, could produce very subtle and varied sounds, because the sound reflected the expressive quality of human movement more than its own timbral quality.
A main question in using new input devices is whether to use controllers based on existing instruments or innovative ones. The main advantage of the former is that musicians are already familiar with them and can easily exploit performance skills learnt over years of practice. But new themes of reflection arise when gesture is no longer linked to sound production and when traditional expressions of virtuosity hardly find a place in the music being performed. We may then wonder whether virtuosity is still needed. Virtuosity has to do with notions of skill development, effort and performance boundaries. Many artists believe that the ease of use of gestural controllers leads to a loss of expressive power. According to David Wessel (2000), easy-to-use controllers proposed for musical purposes have a toy-like character which does not invite musical exploration and continued musical evolution. Virtuosity seems to be strongly linked to expression and musicality, and is thus strongly required in the context of the performing arts.
On the other hand, new models of controllers may be simpler or more intuitive to use for non-skilled performers and well adapted to "getting started" situations. They also allow the introduction of new gestures which better fit control requirements when precision is needed.
Ergonomics, mechanical and temporal precision, flexibility and programmability are the characteristics most often required of new controllers. At the same time, innovative input devices make performances more spectacular and may extend the expressive possibilities of computer-based interactive music systems.
Finally, the selection of the gestural input device, and consequently of the category of gesture to be used for the performance, belongs more to a musical or compositional choice than to a technological one – gesture becomes one of the new musical parameters.
3. Gesture tracking technology
The explosion of new devices that enable movement to be captured and tracked is symptomatic of a great deal of interest from many fields. The development of new interactive systems based on gesture concerns both industrial and artistic applications, just like virtual reality and multimedia applications.
3.1. Requirements for a tracking system
As described by A. Mulder (1994), human movement tracking systems can be classified as inside-in, inside-out and outside-in systems.
Inside-in and inside-out systems employ sensors on the body, while outside-in systems employ external sensors. Both types have strengths and weaknesses; for example, the former are generally considered more obtrusive than the latter.
But the category of gesture that has to be captured will determine the kind of system to use (Depalle et al., 1997). In the case of instrumental gesture, it is possible to capture the gesture directly through sensors; MIDI devices belong to this category. It is also possible to capture gesture indirectly, by analyzing the sound produced by an instrument and deducing information about the gestures applied by the performer to generate it. In this case, it is necessary to provide very sophisticated programs that extract significant gestural/audio information in real time.
Figure 1 describes a very generic overall tracking system.

Fig. 1. Scheme for a human gesture tracking system: gesture, sensors, gestural data, computer application.

Gesture is captured by sensors, which transduce physical energy into an electrical signal that is then converted into a digital format. Dedicated computer applications analyze these data in order to extract significant gestural information to be used for the control of sound processing or other musical applications.
3.2. Some hardware technology
Any technology that can transduce human gesture or movement into an electrical signal can be used to build sensors. Regardless of sensor type, it is important to recognize not only what is being measured but how it is being measured: resolution and sampling rate are fundamental aspects to take into account. The commonly used technologies include infrared, ultrasonic, Hall-effect, electromagnetic and video sensing.
Keyboards, switches, pushbuttons, sliders, joysticks, mice, graphic tablets, touch-sensitive pads, etc. belong to computer hardware. Although such devices have been used intensively as gestural input devices for musical purposes, they hardly fit the requirements of musical gesturality, such as the sense of tactile feedback, and they lack natural expressive power.
On the other hand, musical keyboards, foot pedals, drum pads, ribbon controllers, breath controllers and other instrument-like controllers belonging to the MIDI world have been developed exclusively for musical aims. They capture musical gesture according to the scheme of the MIDI protocol, but they are sometimes used for other purposes, lighting control for example.
Many other technologies have been developed to sense and track a performer's movements in space, and not only for musical performances. On-body sensors measure the angle, force or position of various parts of the body. For example, piezo-resistive and piezo-electric technologies are very common for tracking small body parts such as fingers through the use of gloves. Spatial sensors detect location in space by proximity to hardware sensors or by location within a projected grid. The Radio Drum, for example, measures the location of two batons in three dimensions in relation to a radio receiver (Boie et al., 1989). Video-camera-based technologies are used to track larger, marked body parts and are often used to integrate music and movement languages such as dance or theater (Camurri, 1995).
Force-feedback devices developed at ACROE for interactive systems using physical modeling synthesis are very interesting attempts to offer tactile-kinesthetic feedback to performers (Cadoz et al., 1984). By feeding back the sensation of effort, the performer can achieve finer control over their movements.
Many musicians have built their own input devices as prototypes, using microphones, accelerometers and other types of sensors combined with electronic circuitry. But all this hardware makes sense only if it is connected to the computer and managed by some software – the performance software.
3.3. What about software?
As described by Roads (1996) in his Computer Music Tutorial, at the core of every performance program is a software representation of the music. When working with DSPs to process audio data, we have to deal with many different types of parameters at a low level.
Some parameters have a discrete representation, others are continuous; some have time constraints, some do not, and so on. On the other hand, gestural data also has its own internal representation. Performance software has to support all these representations and to map sensor data onto algorithm parameters in a musically meaningful way, i.e., adapted to the styles of music and movement.
Timing is another focal point of performance software, as for any kind of reactive computing application. These applications require reliable, real-time performance compatible with intimate human interaction. Software research on the integration of robust temporal scheduling into operating systems, drivers, programming languages and environments is fundamental to providing good software tools that allow musical control flow description and interactive music applications without latency problems.
In our case, latency is the delayed audio reaction of a system with respect to the time a gesture is actuated. The main causes of latency may be found in weak operating system scheduling strategies and non-optimized software applications, but also in time-consuming operations such as intensive disk access. Wessel and Wright (2000) state that the acceptable upper bound on a computer's audible reaction to gesture is 10 milliseconds. Furthermore, they argue that latency variation should not exceed 1 millisecond in order to be compatible with measurements obtained through psychoacoustic experiments on temporal auditory acuity. They suggest solving the latency problem by using time tags or by treating gesture as continuous signals tightly synchronized with the audio I/O stream, a solution which excludes the use of MIDI equipment and suggests the development of new devices and protocols.
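A minimal sketch of the time-tag idea (this is not Wessel and Wright's implementation, and the block size and timings are arbitrary): each gestural sample carries a timestamp, and the audio loop consumes only the events whose tags fall inside the current audio block, so that jitter in the delivery of gestural data does not become jitter in the audible reaction:

    # Sketch: time-tagged gestural events consumed block by block by an audio loop.
    # Sample rate, block size and event values are arbitrary illustrative numbers.

    import heapq

    SR = 48000
    BLOCK = 256                  # samples per audio block (about 5.3 ms at 48 kHz)

    event_queue = []             # heap of (time_in_samples, value) pairs

    def push_gesture(time_sec, value):
        heapq.heappush(event_queue, (int(time_sec * SR), value))

    def process_block(block_start):
        """Apply every gestural event whose time tag falls inside this block."""
        block_end = block_start + BLOCK
        while event_queue and event_queue[0][0] < block_end:
            t, value = heapq.heappop(event_queue)
            offset = max(0, t - block_start)   # sample-accurate position in the block
            print(f"block @{block_start}: apply value {value} at sample offset {offset}")

    push_gesture(0.002, 0.4)
    push_gesture(0.006, 0.9)
    for start in range(0, 2 * BLOCK, BLOCK):
        process_block(start)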
4. The mapping concept
The second step, after having captured some gesture, is to decide how the gestural data will be related to sound processing. The mapping problem is at the center of the effectiveness of an interactive environment. To a certain extent, as described by Winkler (1995), we can consider human movement in space as a musical instrument if we allow the physicality of movement to impact on musical material and processes, and if we are able to map the physical parameters of movement onto high-level musical parameters and down to low-level sound parameters.
Adding some musical intelligence inside the control system implies thinking about the range of operations to deal with, the conversion mechanisms involved, the temporal independence of parameters, the grouping or splitting of gestural channels, the addition of causal or random transformations of gestural data, and the design flexibility and programming facilities offered by the interactive environment.
Gesture may be used to produce immediate sound, by triggering it or by directly controlling the parameters of synthesis or other sound processing algorithms. But gesture may also be used to control higher-level processes that automatically generate variables for sound processing, introducing the concept of performance variations.
Gestural data may be tracked in order to compare the actual performance with an ideal one stored in a score. The objective here is to follow and interpret the performance so as to produce some performance-dependent accompaniment with the digital audio system (Dannenberg, 1984; Vercoe, 1984; Pennycook, 1991).
Gesture may also have a conducting meaning, which helps the control system to take decisions and transform the state of the running processes during the performance (Sundberg et al., 2000). This includes compositional processes and musical structure management.
4.1. Gesture analysis
First of all, human gesture data has to be noise-filtered, calibrated and quantized in order to obtain a normalized representation. Such data can then be analyzed in order to extract pertinent information. The calibration process does not necessarily lead to data reduction with an implicit impoverishment of the gestural information, but helps to obtain homogeneous information from heterogeneous input channels.
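As a minimal sketch of this calibration step (channel names, raw ranges and the smoothing coefficient are invented), each heterogeneous input channel can be brought to a common 0–1 range and lightly low-pass filtered before any further analysis:

    # Sketch: calibrate heterogeneous sensor channels to a homogeneous [0, 1] range
    # and smooth them with a one-pole low-pass filter to reduce noise.

    class Channel:
        def __init__(self, raw_min, raw_max, smoothing=0.8):
            self.raw_min, self.raw_max = raw_min, raw_max
            self.smoothing = smoothing      # 0 = no smoothing, close to 1 = heavy
            self.state = 0.0

        def process(self, raw):
            span = self.raw_max - self.raw_min
            x = (raw - self.raw_min) / span if span else 0.0
            x = max(0.0, min(1.0, x))                                # clip to [0, 1]
            self.state += (1.0 - self.smoothing) * (x - self.state)  # smooth
            return self.state

    # Hypothetical channels with very different raw ranges.
    bend_sensor = Channel(raw_min=120, raw_max=900)
    pressure_pad = Channel(raw_min=0.0, raw_max=5.0)

    for raw in (130, 400, 880):
        print("bend ->", round(bend_sensor.process(raw), 3))
    print("pressure ->", round(pressure_pad.process(3.2), 3))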
Studies of behavior from a kinetic approach point out that human beings can hardly reproduce their gestures with precision (skilled and trained subjects are not considered here). Furthermore, when dealing with gesture it is hard to separate the sign or gesture from the signal it conveys and then to extract its significance. There is no general formalization of movement that may help us to recognize its meaning.
As detailed by Cadoz (1994), gesture allows humans to interact with their environment in order to modify it, to get information and to communicate – the ergotic, epistemic and semiotic functions associated with the gestural channel of communication. Gesture applied to a musical context may follow the same objectives: modifying musical or sound material, getting information on the performance, and communicating information to other processes.
Segmentation, statistical analysis, pattern matching, neural networks and information retrieval are all examples of analysis techniques which can be used to extract meaningful information about gesture. These techniques have to be implemented in real time and have to produce stable and reliable outputs on stage for the specific musical purpose of the performance.
But as in other domains, such as sound analysis-synthesis, it is perhaps within a dynamic relationship between gesture analysis and some kind of synthesis feedback – auditive, tactile or visual – that we can imagine better exploring and studying gesture, with a particular focus on signification and emotional content.
4.2. Mapping strategies
Interactive musical performances require the simultaneous
manipulation of inter-dependent parameters. A good
mapping metaphor will help performers and the audience
understand the effects of gesture on sound. A good mapping
strategy will fulfill all the requirements of the interactive
environment:
• Hardware and software robustness with reliable timing
response;
• Data reduction and significant perceptive results according
to movement changes;
• Practice facilities for the performers to learn the idiosyncrasies of the system;
• Quick prototyping facilities for rehearsals;
• Different abstraction levels of data representation and
mapping for visualizing and editing both gestural and
musical parameters.
We can consider five different ways to map or connect gestural data to sound parameters.
One-to-one connections are typically used for selecting formatted or preset sound parameter values, or for dynamically updating sound parameters. In the former case, the current value of the gestural parameter is used as a branching condition to select materials. In the latter, gestural parameter values need to be converted in order to be adapted to the range of the sound parameters. This can be done in an absolute mode, as for an amplitude, or in a relative one, as for a vibrato variation.
Convergent connections are used to get more information about a movement or to add control dimensions to a higher-level sound parameter. Pitch, for example, may depend on a fundamental frequency driven by a key press, on frequency deviations coming from some gliding device, and on an intensity controlled through some other device. In this case too, conversion rules are necessary to fit the gestural parameter ranges to those of the sound.
Divergent connections are used to reduce the number of input devices and to optimize the control process. This mapping strategy is often used in commercial digital music instruments. A typical example is the scaling mechanism over the keyboard range used to manipulate envelope profiles, to balance amplitudes or to adjust the spectral bandwidths of the generated sounds according to the selected pitch.
Metaphors may be a good approach to solving the multidimensional control aspects of a sound. Metaphors may help to grasp abstract concepts and may inspire the development of original controllers (Wessel & Wright, 2000). Mulder demonstrates the validity of introducing virtual objects or surfaces, like spheres or sheets, manipulated with the hands through instrumented gloves that measure hand shapes (Mulder et al., 1997). The metaphor used in this case is sound sculpting, and it has been studied to help sound designers in their difficult job of adjusting and editing sound algorithm parameters. The implementation of such a mapping is computationally very expensive and requires a sophisticated programming environment – in this case MAX/FTS was chosen.
Rules which link gesture to sound or musical parameters may also be chosen arbitrarily. They may be considered part of the compositional material of a musical piece.
Table 1 illustrates three of the five mapping strategies mentioned above, which can easily be implemented when using a MIDI keyboard.
Table 1. Mapping examples considering gestural data coming from a simple MIDI keyboard.

    Mapping      Gestural data                                Synthesis parameter
    One to one   Key index                                    Oscillator frequency
    Convergent   Key index, key pressure, pitch-bend index    Oscillator frequency
    Divergent    Key index                                    Envelope attack slope, filter bandwidth
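A compact sketch of the three strategies of Table 1 follows; the conversion formulas (equal-tempered key-to-frequency, bend range, keyboard scaling factors) are generic illustrative choices, not values taken from the paper:

    # Sketch of the three mapping strategies of Table 1 for a MIDI-like keyboard.
    # One-to-one: key index -> oscillator frequency.
    # Convergent: key index + key pressure + pitch-bend -> oscillator frequency.
    # Divergent:  key index -> envelope attack slope and filter bandwidth.

    def key_to_hz(key):                      # equal temperament, A4 = key 69 = 440 Hz
        return 440.0 * 2.0 ** ((key - 69) / 12.0)

    def one_to_one(key):
        return {"osc_freq": key_to_hz(key)}

    def convergent(key, pressure, bend, bend_range_semitones=2.0):
        # pressure in [0, 1] adds a slight detuning, bend in [-1, 1] adds semitones
        semitones = bend * bend_range_semitones + 0.1 * pressure
        return {"osc_freq": key_to_hz(key) * 2.0 ** (semitones / 12.0)}

    def divergent(key):
        # one gestural datum scaled onto two synthesis parameters (keyboard scaling)
        return {"attack_slope": 0.5 + key / 254.0,    # arbitrary scaling
                "filter_bw_hz": 50.0 + 10.0 * key}

    print(one_to_one(60))
    print(convergent(60, pressure=0.5, bend=0.25))
    print(divergent(60))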
4.3. Data synchronization
Real-time processing does not mean updating sound parameters at the very moment gestural data become available. On the contrary, in a musical system the performers' actions may introduce delayed reactions. They may also be analyzed over longer spans of time to extract information about the direction of a movement, its repetition, and other kinds of patterns.
Immediate actions are generally used to trigger preset parameters or to dynamically update parameter values using the one-to-one mapping strategy.
Delayed actions may be used as a compositional or musical choice, such as delayed sounds in live electronics applications. Other delayed actions may be necessary when waiting for other events, in order to introduce synchronization mechanisms between processes.
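A minimal sketch of the distinction between immediate and delayed actions (the delay values and the printed parameter are arbitrary):

    # Sketch: a gesture either updates a parameter immediately or schedules a
    # delayed action (e.g., a delayed sound, or a wait for a synchronization event).

    import heapq

    now = 0.0
    pending = []   # heap of (due_time, description)

    def on_gesture(value, delay=0.0):
        if delay == 0.0:
            print(f"t={now:.2f}s immediate update: amplitude <- {value}")
        else:
            heapq.heappush(pending, (now + delay, f"delayed action value={value}"))

    def advance_time(new_now):
        global now
        now = new_now
        while pending and pending[0][0] <= now:
            due, action = heapq.heappop(pending)
            print(f"t={now:.2f}s run {action} (scheduled for {due:.2f}s)")

    on_gesture(0.7)             # one-to-one, immediate
    on_gesture(0.2, delay=1.5)  # compositional choice: react later
    advance_time(2.0)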
5. Controlling performance environments
Most of the literature on gestural control of music tends to consider gesture as physical playing or performing and thus to privilege the instrumental gestural control approach: traditional instrumental gesture is analyzed and categorized, and gesture tracking devices and mapping solutions for digital synthesis systems are developed for virtual instrument performance.
Gesture applied to digital sound processing may be approached in many ways, especially when thinking of the new possible meanings of gesture in computer music. The following sections describe some situations in which the gestural control of performance environments goes beyond the model of traditional instrumental control.
Table 2. Musical score as a grid of macro (rows) and micro (columns) events.

    Event   Time   Dur.   Ins.   P5    P6    P7
    Not     0      1000   1      100   x     440
    Not     0      750    1      50    y     131
    ...     ...    ...    ...    ...   ...   ...
    Not     5000   3000   1      100   z     220

Table 3. Gestural score as a grid of macro (rows) and micro (columns) events.

    Event   Time   Dev.     P4    P5    P6
    Gest    0      sensor   100   a     30
    Gest    0      sensor   80    b     60
    ...     ...    ...      ...   ...   ...
    Gest    3000   sensor   100   c     25
5.1. Computer score performance
During the author's working period at the Centro di Sonologia Computazionale (C.S.C.) of Padua University, the problems deriving from the use of gesture to control the real-time performance of a computer music score, such as a Music V or Csound score, were studied (Azzolini & Sapir, 1984). This led to the description of a theoretical framework in which gesture and sounds may be organized into two independent and structured spaces that interact according to arbitrary relationships (Sapir, 1989).
The problem is not to play an instrument, but to interpret a score with different levels of interaction, from completely automatic performance (press a button and the sequence runs by itself) up to improvisation (gesture builds musical phrases).
Something has to be said about the significance of the time data: gesture time (start, end or duration) may influence note or silence timings and vice versa. Real time – loading a value or a set of sound parameters at the moment a gesture occurs – is only one possibility among others.
By considering the computer sound score (notes, timings, parameters) and the gesture score (gestures, timings, parameters) as two grids of events, as shown in Tables 2 and 3, it is possible to mix elements of these grids in order to produce a performance which is musically significant.
Macroevents are defined as groups of logically correlated parameters that intervene at the same instant in the elaboration of a musical or gestural event, while the individual variations of a single parameter are defined as microevents. A set of control primitives on microevents has been defined in order to combine their time-value elements in diversified ways, and general treatment modalities of the time parameter have been identified to allow control of the temporal organization of the different macroevents (see Tables 4 and 5).
Table 4. Description of the four microevent control primitives.

    Primitive id.    Time comes from    Value comes from
    No Control       Score              Score
    Direct Access    Gesture            Gesture
    Sample & Hold    Score              Gesture
    Sequencer        Gesture            Score

Table 5. Description of the three macroevent timing controls.

    Time control id.   Action time comes from   Duration comes from
    No Control         Score                    Score
    Trigger            Gesture                  Score
    Trigger & Gate     Gesture                  Gesture
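The four primitives of Table 4 can be sketched as follows, assuming the score is a list of (time, value) pairs and the gesture is sampled as a function of time; the data are toy values:

    # Sketch of the four microevent control primitives of Table 4.
    # The score is a list of (time, value) pairs; the gesture is a function of time.

    score = [(0.0, 100), (1.0, 50), (2.0, 75)]            # toy score microevents
    def gesture(t): return round(0.5 + 0.25 * t, 2)       # toy gestural curve

    def no_control():                    # time and value both come from the score
        return list(score)

    def direct_access(gesture_times):    # time and value both come from the gesture
        return [(t, gesture(t)) for t in gesture_times]

    def sample_and_hold():               # score gives the times, gesture gives the values
        return [(t, gesture(t)) for t, _ in score]

    def sequencer(gesture_times):        # gesture gives the times, score gives the values
        return [(t, score[i % len(score)][1]) for i, t in enumerate(gesture_times)]

    gesture_times = [0.3, 1.2, 1.9, 2.4]
    print("No Control   ", no_control())
    print("Direct Access", direct_access(gesture_times))
    print("Sample & Hold", sample_and_hold())
    print("Sequencer    ", sequencer(gesture_times))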
A symbolic language to describe the way gesture interacts with sound, along with real-time software that allows score performance, were designed and implemented for the digital sound processor 4i. This processor was physically controlled by only two sliders and the host computer keyboard – a DEC PDP-11 running the RTI4I proprietary real-time operating system. Even with such a simple prototype, numerous musical pieces were produced, such as the digital part of the first two versions of Prometeo by L. Nono (Sapir & Vidolin, 1985).
5.2. Live electronic performance
Live electronic performances may be considered a challenge for creative gesture in the control of sound processing. Composers and performers are often fascinated by the combination of different expressive means, and interactions between the real and synthetic worlds stimulate invention and creativity. The possibility of instantly processing sounds produced by acoustic instruments on stage allows some natural limitations of traditional music performers to be overcome, and adds new dimensions to the performance of computer music works.
When dealing with existing sound and recognizable instrumental gesture, we have to think about how sound processing and non-instrumental gesture may be musically congruent with the performance. This is done without any direct control over the sound emission, but by transforming audio data according to external events which are not always explicit on stage and which depend on states of the performance itself. Furthermore, live electronics is often combined with sound projection for spatialization and/or sound reinforcement, thus including gestures which belong more to sound engineering practice than to instrumental practice.
Digital audio environments for live electronics may
require heterogeneous equipment such as audio processors,
MIDI devices, digital mixers, and computer networks with possible backup systems in case of a crash. Thus very complex situations on stage and at the mixing console have to be managed in order to allow good and safe interaction between performers via different communication channels. Such interactive environments must be designed to fit the requirements of the final musical performance but also of the critical rehearsal phase, and each piece or each new performance environment requires new developments and new solutions.
With such complex setups, it is generally more comfortable, in the case of the final performance, to work with cue lists of pre-defined musical material such as sets of sound or transformation parameters, or audio samples. This strategy lowers the risk level and is well suited when the live electronics have been encoded in the score. Performers at the computer, or performers on stage, then only have to trigger the cues at a given time through keyboards, pedals, buttons or other devices.
In the case of a rehearsal, the environment must be designed to allow experimentation by means of direct gestural control over the sound processing parameters. Usually, one-to-one mapping solutions, along with the possibility of taking snapshots of running configurations, are good strategies for preparing the pre-defined material that will be cued during the final performance.
Sound processing algorithms may be considered as new instruments to play. Such instruments require immediate dynamic control, which can be exercised on stage with remote controllers or by some hidden performer close to the mixing console. Gesture is intimately linked to sound through a bidirectional path that may initially disorient, but that certainly opens new horizons for music creation. By giving the possibility of selecting the quality of gesture and its relationship to the resulting sound, composers enlarge their musical language, letting gestural meaning grow through the interaction between the performers and the way we experience the performance.
Figure 2 illustrates a general situation during live electronic performances. One or more performers play acoustic instruments whose sound may be transformed by processes. These processes may be controlled by other performers, who can use specific control devices or traditional acoustic instruments. For example, the intensity of a sound produced by a clarinet player may be dynamically controlled by the intensity of a sound produced at the same time by a flute player: a simple envelope follower extracts the dynamic profile of the flute's intensity and applies it to the sound of the clarinet. A similar control could be realized by moving a slider and mapping the output data coming from the slider onto a modulator input. The two different typologies of gesture would lead to two different audible and visible results, and might have two different musical meanings.
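A minimal sketch of such an envelope-follower control (the attack and release coefficients and the toy input signals are invented): the amplitude envelope of one signal is extracted and used to scale another:

    # Sketch: an envelope follower extracts the dynamic profile of one signal
    # (the "flute") and applies it as an amplitude modulator to another
    # (the "clarinet"). Attack/release coefficients are illustrative only.

    import math

    def envelope_follower(signal, attack=0.9, release=0.999):
        env, out = 0.0, []
        for x in signal:
            level = abs(x)
            coeff = attack if level > env else release   # fast rise, slow decay
            env = coeff * env + (1.0 - coeff) * level
            out.append(env)
        return out

    # Toy input signals (a few hundred samples of sine-like material).
    flute    = [math.sin(0.05 * n) * (1.0 if n < 200 else 0.2) for n in range(400)]
    clarinet = [math.sin(0.02 * n) for n in range(400)]

    env = envelope_follower(flute)
    processed_clarinet = [c * e for c, e in zip(clarinet, env)]
    print(round(processed_clarinet[100], 3), round(processed_clarinet[350], 3))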
Finally, the feedback path that appears in Figure 2 indicates that processed sounds may be added to new incoming ones. Such a situation may create unstable systems which are complex to manage but which have a great potential for expressiveness.

Fig. 2. Scheme of the relationship between gesture and sound during live electronic performance: instrumental gesture produces the sound emission, control gesture acts on the sound processing, and the processed sound may be fed back into the processing chain.
Traditional music performance implies the actuation of instrumental gestures on acoustic instruments. In such a case, the relationship "one gesture to one acoustic event" is dominant and clear to the audience. The introduction of sound processing during the performance makes it possible to transform the acoustic event, thus decreasing the strength of the relationship between gesture and sound. By physically controlling such processes and adding new performers on stage, it is possible to rebalance the link between gesture and sound, and to play with the theatrical aspects of gesturality.
5.3. Real-time control of a music workstation
Real-time synthesis and sound processing require large computing power. Furthermore, when processing depends on the musical situation, on logical conditions based on audio or gesture signals, or when parameter extraction from sound analysis is necessary to trigger effects or to pilot automatic score following, a fast, powerful, programmable integrated system becomes a must.
The MARS musical workstation, developed during the nineties at IRIS, the research center of the Bontempi-Farfisa group, satisfies most of the requirements of such a system.
This musical workstation is a totally programmable digital machine dedicated to real-time digital sound processing. In its latest version, the workstation includes a sound board, NERGAL, to be plugged into the host computer, and the graphical user interface ARES, which runs under the Windows platform (Andrenacci et al., 1992).
ARES is a set of integrated graphic editors which allow the interactive development of hierarchized sound objects. It is possible to graphically design algorithms for sound processing and to control them by associating any type of MIDI device with their parameters through arbitrary relationships. Real-time control is performed by the embedded RT20 operating system of NERGAL (Andrenacci et al., 1997).
MARS does not provide any compositional environment, but its compatibility with the MIDI world allows users to include it in a studio with MIDI facilities. To overcome the MIDI limitations, MARS provides an application toolkit which allows its environment to be extended and user applications to be developed easily.
MARS allows users to develop single sound objects called tones. Tones are specific configurations of all the algorithm input variables, such as parameters, envelopes or LFOs. Tones are related to their algorithms and are associated with a MIDI channel; they may be dynamically loaded with Program Change messages.
Parameters have a symbolic name, a default representation unit (such as dB, Hz or ms) and a state that defines the way their values are updated:
• Static parameters are initialized at the beginning of a note event with a constant value.
• Dynamic parameters are initialized at the beginning of a note and are continuously updated during the lifetime of the note, according to a conversion formula which solves most of the main mapping problems.
The setup of a dynamic parameter is made in two phases:
1. Formula selection – a set of 17 predefined formulas is available, built as combinations of at most three functional blocks called M. For example, in Figure 3 the formula associated with the pitch parameter is:
Pitch = M1 * [1 + M2]
2. Block definition – each M block is defined by a conversion function (an array), a MIDI index and a scaling factor for the output values. For example (see the sketch below):
M1 = HZTEMP(Key) * 1.0
M2 = BEND12HT(Pitch-bend) * 1.0
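The following sketch reproduces the spirit of this two-phase setup for the pitch example; the HZTEMP and BEND12HT conversion tables are approximated here by closed formulas (equal temperament and a ±2-semitone bend range), since their actual contents belong to the MARS environment:

    # Sketch of the dynamic-parameter formula Pitch = M1 * (1 + M2), where each
    # M block is (conversion function, MIDI controller index, scaling factor).
    # hztemp and bend12ht below are approximations, not the real MARS arrays.

    def hztemp(key):                 # tempered scale: MIDI key -> frequency in Hz
        return 440.0 * 2.0 ** ((key - 69) / 12.0)

    def bend12ht(bend):              # 14-bit pitch-bend -> fraction of +/- 2 semitones
        semitones = 2.0 * (bend - 8192) / 8192.0
        return 2.0 ** (semitones / 12.0) - 1.0

    class MBlock:
        def __init__(self, convert, midi_index, scale=1.0):
            self.convert, self.midi_index, self.scale = convert, midi_index, scale
        def value(self, midi_state):
            return self.convert(midi_state[self.midi_index]) * self.scale

    M1 = MBlock(hztemp, "key", 1.0)
    M2 = MBlock(bend12ht, "pitch_bend", 1.0)

    def pitch(midi_state):           # the selected formula: M1 * (1 + M2)
        return M1.value(midi_state) * (1.0 + M2.value(midi_state))

    print(round(pitch({"key": 69, "pitch_bend": 8192}), 2))   # no bend -> 440.0 Hz
    print(round(pitch({"key": 69, "pitch_bend": 12288}), 2))  # bent about one semitone up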
The mapping features provided by the MARS workstation allow for different solutions which fit different control situations.
One-to-one mapping strategies have been used intensively for DSP research and quick sound prototyping.
Convergent mapping strategies have been used to add performance possibilities and to allow subtle live control mechanisms, as illustrated in the previous example on pitch.
Divergent mapping has commonly been used to simulate the behavior of commercial instruments and to optimize the ergonomic aspects of their control part.
The MARS workstation combined with the MAX environment has been used intensively for real-time synthesis and for live electronic pieces. Its interface provides four important features related to the sound-gesture relationship, which have usually fitted the requirements of interactive digital environments:
1. Gestural data conversion facilities;
2. Gestural data grouping rules to define “meta-gesture”;
3. Sound parameter extraction and logic facilities on audio signals;
4. Time control and synchronization features for parameter
updating.
Fig. 3. The MARS dynamic parameter editor.
5.4. Gesture and music education
At school, it is nowadays more common to find multimedia computers than musical instruments. It is always possible to learn about music with a computer, but using the computer keyboard or the mouse to interact with materials in a musical context is far from ideal. Similarly, the movements belonging to the traditional world of music frequently require sophisticated motor skills that are beyond the capacity of children who do not study a musical instrument.
Numerous studies have shown that methods which involve the voice and the whole body in an exploration of motor coordination and of synchronicity with the development of the musical experience are helpful for young children's music education, but also in situations of rehabilitation or of motor or psychological disadvantage (Aucouturier, Darrault, & Empinet, 1984; Bachmann, 1991; Le Boulch, 1991; Vayer & Roncin, 1988).
An interactive digital audio environment called "Il Teatro dei Suoni"¹ (The Theater of Sounds) has been developed by a team of music teachers, engineers and musicians, under the author's coordination, for the expressive, musical and motor education of children. The Theater of Sounds transforms the classroom into a live interactive sound-creating space through a system of mat sensors which capture each movement and transform it into sound through the computer.
The Theater of Sounds is made up of:
• A “daisy” with 12 mat-sensor petals which the children can
walk and jump on. This is connected via a wire-free radio
link to the computer which generates sounds, music and
sound effects;
• A sound-generating software system which interacts with
the audio card of the multimedia computer;
• An authoring program which allows interactive audio environments to be launched or new ones to be created.

Fig. 4. The Theater of Sounds: the "daisy" with the mat sensors and the graphical user interface.

¹ Il Teatro dei Suoni has been developed by Soundcage s.r.l., http://www.geureka.com; it is a registered trademark of GARAMOND S.r.l. (Rome, Italy), http://www.garamond.it/
The Theater of Sounds uses a great variety of audio items. These can be single notes played on instruments from a vast palette of tones, but also musical phrases or sound effects. As well as providing a rich bank of sounds and sequences, the system also permits the creation of new interactive audio environments and allows musical sequences and samples to be imported from external audio and MIDI-compatible files.
The educational aims of the system are:
• The exploration of sound, tone, melody, rhythm and form,
by manipulating the parameters and values which define
them;
• The development of perceptive-motor skills;
• The co-ordination of movements directed at a particular
musical aim;
• The elaboration of narrative plots from an evocative audio
stimulus, and the development of more mature creative and
communicative skills through movement.
With such an environment, we faced two problems: mapping simple gestural data onto audio and musical materials at different hierarchical levels, and allowing several performers (up to 12) to play together in order to perform structured musical material.
We adopted a very simple technology to track gesture – analog sensors hidden in the mats that leave the children free to move on the daisy. The radio transmission of the gestural data to the host computer allows the "daisy" to be placed far away from the computer, so that the children focus their attention on their movements and not on what happens on the screen.
Gesture is split into two categories – single steps, as in walking, and high-rate steps, as in running – which correspond to two different control types: discrete and continuous. Discrete controls allow events to be triggered or sequenced, while continuous controls allow musical or audio parameters to be transformed gradually.
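A sketch of this split (the 0.5-second threshold and the rate scaling are invented): isolated steps are treated as discrete triggers, while rapid sequences of steps are converted into a continuous value derived from the step rate:

    # Sketch: step events from a mat sensor are classified as discrete triggers
    # (slow, isolated steps) or turned into a continuous control value (fast,
    # running-like steps). The 0.5 s threshold and the rate scaling are invented.

    RUN_THRESHOLD = 0.5   # seconds between steps below which we consider it "running"

    last_step_time = None

    def on_step(t, mat_id):
        """Called whenever mat `mat_id` is pressed at time t (in seconds)."""
        global last_step_time
        interval = None if last_step_time is None else t - last_step_time
        last_step_time = t
        if interval is None or interval >= RUN_THRESHOLD:
            print(f"t={t:.2f}s discrete trigger: play the event assigned to mat {mat_id}")
        else:
            rate = 1.0 / interval                   # steps per second
            control = min(1.0, rate / 8.0)          # map the step rate onto [0, 1]
            print(f"t={t:.2f}s continuous control = {control:.2f} (tempo, volume, ...)")

    for t, mat in [(0.0, 3), (1.2, 5), (1.4, 6), (1.55, 7), (3.0, 1)]:
        on_step(t, mat)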
MIDI-based material or audio material may be mapped onto the sensors by means of a set of pre-defined programmable rules. This material is then dynamically re-organized and transformed according to the selected rules of the current environment and the gestural inputs of the children.
The Theater of Sounds aims to be a "collective musical instrument" for developing communication and expression through sound, music and movement. The metaphor of a theatre helps the children to put together a performance in which each of them plays a part as composer, performer and even director.
6. Conclusions
This paper was concerned with the study of new relationships between sound and gesture. After drawing a general framework of gesture tracking technology, it faced the problem of mapping gestural data onto audio data. Some applications were described to illustrate specific situations in which gesture may control interactive audio environments without direct reference to the conventional instrumental model.
Human gesture has always been at the center of the musical creation process, whether in the written form of a score or in instrumental and conducting performance. Conversely, listening to music often leads to emotions which reverberate at the body level, a phenomenon which is at the basis of dance and choreography.
Music and gesture are closely linked in a dynamic relationship between acting and listening, cause and effect. Today's technology allows one to imagine and build, in an arbitrary way, new relationships between gesture and sound. Besides new hints for creativity, these new possibilities allow the development of innovations for education, help for the handicapped, and communication in general.
Composers and performers have an important role to play in helping to develop interactive systems with meaningful relationships between music and human gesture.
References
Andrenacci, P., Armani, F., Paladin, A., Prestigiacomo, A.,
Rosati, C., Sapir, S., & Vetuschi, M. (1997). MARS =
NERGAL + ARES. In Proceedings of The Journées d’Informatique Musicale 1997 (pp. 145–152). Lyon: SFIM (French
Computer Music Society).
Andrenacci, P., Favreau, E., Larosa, N., Prestigiacomo, A.,
Rosati, C., & Sapir, S. (1992). MARS: RT20M/EDIT20 –
Development tools and graphical user interface for a sound
generation board. In Proceedings of the International Computer Music Conference 1992 (pp. 340–344). San Jose,
California: International Computer Music Association.
Aucouturier, B., Darrault, I., & Empinet, J.L. (1984). La pratique psychomotrice: Rééducation et thérapie. Paris: Doin.
Azzolini, F. & Sapir, S. (1984). Score and/or gesture: a real time
system control for the digital sound processor 4i. In Proceedings of the International Computer Music Conference
1984 (pp. 25–34). Paris: International Computer Music
Association.
Bachmann, M.L. (1991). Dalcroze Today: An Education Through and Into Music. Oxford: Clarendon Press.
Boie, R., Mathews, M., & Schloss, A. (1989). The radio drum
as a synthesizer controller. In Proceedings of the International Computer Music Conference 1989 (pp. 42–45).
Columbus, Ohio: International Computer Music Association.
Cadoz, C., Luciani, A., & Florens, J.L. (1984). Responsive input
devices and sound synthesis by simulation of instrumental
mechanisms: The CORDIS system. Computer Music
Journal, 8(3), 60–73.
Cadoz, C. (1994). Le geste canal de communication homme-machine. La communication “instrumentale” – Numéro Spécial: Interface Homme-machine, 31–61.
Camurri, A. (1995). Interactive dance/music systems. In: Proceedings of the International Computer Music Conference
1995 (pp. 245–252). Banff, Alberta, Canada: International
Computer Music Association.
Chadabe, J. & Meyers, R. (1978). An introduction to the PLAY
program. Computer Music Journal, 2(1), 12–18.
Dannenberg, R. (1984). An on-line algorithm for real-time
accompaniment. In Proceedings of the International Computer Music Conference 1984 (pp. 193–198). Paris, France:
International Computer Music Association.
Dannenberg, R., McAvinney, P., & Rubine, D. (1986). Arctic: A functional language for real-time systems. Computer Music Journal, 10(4), 67–78.
Depalle, P., Tassart, S., & Wanderley, M. (1997). Instruments
Virtuels. Résonance, 5–8. IRCAM. Paris, France.
Le Boulch, J. (1995). Mouvement et développement de la personne. Vigot.
Morales-Manzaneres, R. & Morales, E.F. (1997). Music Composition, Improvisation, and Performance Through Body
Movements. In: Proceedings of the International Workshop
on KANSEI – The Technology of Emotion (pp. 92–97).
Genova: AIMI (Italian Computer Music Association) and DIST –
University of Genova, Italy.
Moore, F.R. (1988). The dysfunction of MIDI. Computer Music
Journal, 18(4), 19–28.
McMillen, K. (1994). ZIPI – Origins and Motivations. Computer Music Journal, 18(4), 47–51.
Moorer, J.A. (2000). Audio in the new millennium. R.C.
Heyser memorial lecture series, Audio Engineering Society.
http://www.aes.org/technical/Heyser.html.
Mulder, A. (1994). Human movement tracking technology.
Technical Report. Simon Fraser University, Canada.
Mulder, A., Fels, S., & Mase, K. (1997). Empty-handed gesture
analysis in Max/FTS. In Proceedings of the International
Workshop on Kansei: The technology of Emotion (pp.
87–91). Genova: AIMI (Italian Computer Music Association)
and DIST – University of Genova, Italy.
Pennycook, B. (1991). The PRAESIO Series – Composition-driven Interactive Software. Computer Music Journal, 15(3), 267–289.
Polansky, L., Rosenboom, D., & Burk, P. (1987). HMSL: Overview and notes on intelligent instrument design. In Proceedings of the International Computer Music Conference 1987 (pp. 256–263). Urbana, Illinois: International Computer Music Association.
Puckette, M. (1991). Combining event and signal processing in
the MAX graphical programming environment. Computer
Music Journal, 15(3), 68–77.
Roads, C. (1996). The Computer Music Tutorial. Cambridge,
MA: MIT Press.
Sapir, S. (1989). Informatique musicale et temps réel: geste –
structure – matériaux. In Proceedings of the International
Conference on Musical Structures and Computer Aid (Structures Musicales et Assistance Informatique) (pp. 5–19).
Marseille: MIM (Marseille Computer Music Laboratory),
CRSM – Université de Provence, Sud Musique and CCSTI
(Cultural Center on Science, Technology and Industry of
Provence), France.
Sapir, S. & Vidolin, A. (1985). Interazioni fra tempo e gesto –
Note tecniche sulla realizzazione informatica del Prometeo.
Quaderno 5 del LIMB (pp. 25–33). Venezia: La Biennale di
Venezia, Italy.
Sundberg, J., Friberg, A., Mathews, M.V., & Bennett, G. (2001). Experiences of Combining the Radio Baton with the Director Musices Performance Grammar. In Proceedings of the MOSART Workshop on Current Research Directions in Computer Music (pp. 70–74). Barcelona, Spain. http://www.iua.upf.es/mtg/mosart/papers/p34.pdf
Vayer, P. & Roncin, C. (1988). Les Activités corporelles chez le
jeune enfant. Presses Universitaires de France.
Vercoe, B. (1984). The synthetic performer in the context of live
performance. In Proceedings of the International Computer
Music Conference 1984 (pp. 199–200). Paris, France: International Computer Music Association.
Wanderley, M.M. & Battier, M., eds. (2000). Trends in Gestural
Control of Music. Ircam – Centre Pompidou.
Winkler, T. (1995). Making motion musical: gestural mapping
strategies for interactive computer music. In Proceedings of
the International Computer Music Conference 1995 (pp.
261–264). Banff, Alberta, Canada: International Computer
Music Association.
Wessel, D. & Wright, M. (2000). Problems and Prospects for Intimate Musical Control of Computers. In ACM SIGCHI, CHI ’01 Workshop New Interfaces for Musical Expression (NIME’01). http://cnmat.cnmat.berkeley.edu/Research/CHI2000/wessel.pdf
Wright, M. (1994). A comparison of MIDI and ZIPI. Computer
Music Journal, 18(4), 86–91.