Alberto Novello
Juan Sebastián Lach
Introduction
"To establish the basis for a theory of music, one
would want to explain why cerlain conceivable
construcfs are utilized and others not. l...) Where,
then, do the constructs and relationships described by music theory reside? [...] One commonly
speaks of musical structure for which there is no
direct correlate in the score or in the sound waves
produced in performance. One speaks of music
as segmented into units of all sizes, of patterns of
strong and weak beats, of thematic relationships,
of pitches as ornamental or structurally important,
of tension and repose, and so forth. lnsofar as one
wlshes to ascribe sorne sod of 'reality' to these
kinds of structure, one must ultimately treat them
as mental products imposed on or inferred from
the physical signal. ln our view, the central task of
music theory should be to explicate this mentally
produced organization," (Lehrdal & Jackendoff,
1983, pp. 2-3)
Auditory scene analysis (ASA) (Bregman, 1990) is one of the most successful theories describing how our auditory system is able to build a mental picture of space from a complex sound environment. The categorization processes at the core of our listening involve segregation in order to separate the auditory field into meaningful audio streams that can be followed independently. During listening, the auditory system extracts relevant audio descriptors, called cues, that can then be related and interpreted in the segregation processes. They serve to distinguish different acoustic aspects of the surroundings and to selectively isolate and position the different sound sources. Inspired by Gestalt theories (for an overview of their application in music research see Leman and Schneider, 1997; for a compositional approach see Tenney, 1988), which explain parts of our visual organization of space, ASA has defined some laws that allow the computational analysis of complex sound scenes. From an audio recording, these laws are used to extract the salient audio cues necessary to imitate the human segregation processes. A cue can be the onset time: when two sounds occur within a relatively short time interval, they are perceived as belonging to the same source. Another dimension is pitch: two sounds occupying adjacent spectral zones are perceived as coming from the same physical source. Other possible audio cues are location (e.g., spatial position, in the case of stereo recordings) as well as amplitude or frequency modulation (in the sense of similar variation in intensity or in pitch).
We can apply ASA laws to music to investigate how music is fragmented into parallel audio streams by psychological processes, and thus, for example, to separate different instrumental parts from an audio file, or to understand under which temporal conditions a set of unorganized sounds is perceived as a rhythm.
Despite their intrinsic interest in the analysis of complex auditory scenes for possible applications (e.g., automatic detection of accidents, speech decoding, etc.), and their efficiency in reproducing/predicting human perception, ASA laws have never been applied to music synthesis for composition. Their power in explaining the stream segregation of our auditory system can serve in the synthesis process to artificially control a sense of sound fusion in the perception of the listener. In this paper, we present some music composition algorithms inspired by the ASA laws to control the degree of fusion of different audio streams. Our goal is to create a perceptually relevant effect of increase and decrease of cohesion between several instrumental parts or sounds, using time cues in the form of rhythm, frequency cues in the form of pitch, and spatial localization cues.
Perceptual research has had a limited influence on music composition, a few notable exceptions being composers in the circle around Max Mathews and computer music in the 1960s, like James Tenney, Jean-Claude Risset and John Chowning. Their theoretical and aesthetic output was deeply influenced by both psychoacoustics and Gestalt theory as applied to their music. Conversely, compositional insights and questions have not always been taken up by music perception research. Both disciplines share similar concerns but have different aims, and each should keep hold of its own strengths, positions of practice and necessities. Additionally, each of them comprises a multitude of fields, styles, and approaches. In this sense, it is not so much a question of bridging a divide, but more of acknowledging their productive and discursive spaces in ways that can enhance each field while producing an interchange between them.
Approaching composition via scientific (and particularly perceptual) research requires more than just an 'application' of its models; it instead requires a translation of the underlying ideas behind the models in a way that is pertinent to a musical world, where aesthetics and performance are central. Musical hypotheses must be formulated from the consequences of these theories. Taking ideas that result from analytic theories in order to synthesize musical structures leads the lines of thought into directions unrelated to the premises used to derive those theories in the first place.
Music psychology explains some phenomena of listening by establishing correlations between the physical properties of sound and a 'mental image' of music. In an inverse trend, these correlations can serve as starting points for the composition of musical forms and materials. The music that is produced is not a demonstration of these perceptual laws but an artifact motivated by them. Other factors come into play when listening to the structures and processes that are put into motion, as the actual sound material used already implies its own context, far removed from purely sensory, passive 'laboratory' material such as the test tones used in experimental setups. The ideas are abstracted into another domain that renders them with new properties. The questions posed are of a different nature than the ones needed in order to investigate perceptual correlations. These questions are conditioned by the mappings that translate these ideas, modeled either mathematically or metaphorically, into compositional parameters. The results vary enormously within a chosen material and model depending on the particular mapping used, and there are usually many ways to map. This is a crucial step in a creative process that cycles through generating mappings and materials, listening to their deployment (for which computer aid is mostly but not always useful), evaluating them and modifying/generating again. The process is simultaneously top-down (from general, formal conceptions towards elementary materials) and bottom-up (building from elements towards aggregates), and at every new turn the cycle is led by way of musical questions having to do with narrative, form, texture, timbre and so on.
These experiences brought by music making could serve to better pose and contextualize perceptual research questions. A lot of perceptual research is done on traditional music premises, as for example in Lerdahl and Jackendoff's Generative Theory of Tonal Music, where they provide cognitive correlations for already existing music-theoretical concepts. They assume a certain compositional practice (Western common-practice tonal music) that is codified as 'natural', restraining them from deducing consequences from the theory that are not in accordance with this practice and ignoring current trends in the problematization of music. There is always a gap between the scientific and the artistic questioning, and there is no simple translation that can conciliate both sides. It sometimes leads into a situation of mutual triviality: important matters in science are artistically trivial and vice versa. We feel that once a scientifically influenced idea trespasses into the musical domain, musicality must take over; but on the other hand the results can also lead to conclusions that raise new insights into the premises on which the scientific ideas were based. A good case of this happens in the section 'Musical Scene Analysis' in chapter 5 of Bregman's Auditory Scene Analysis, where he takes up many challenges posed by contemporary composers and approaches them through his ASA theory.
With this article, we present three algorithms inspired by perceptual laws to synthesize instrumental parts with different degrees of fusion/separation in three music dimensions: rhythm, pitch and spatialization. We use these algorithms to evaluate the effect that perceptual laws of auditory segregation/fusion may have in an aesthetic domain such as music composition. On the other hand, through this process, we also try to bring the interest of the music listener toward perceptual questions about music: how much regularity and order between several instrumental parts is necessary to start perceiving a rhythmic/metric structure? What is the aesthetic/emotional impact of different degrees of fusion between musical parts? How is a progressive fragmentation of music in different dimensions perceived by the listener, and what is the effect of a subsequent reintegration of all its parts together?
Clinamen, a piece on fusion and segregation of parts
Experimentation with the ASA laws in music synthesis inspired the composition of an electroacoustic piece, Clinamen, which became the goal of our collaboration. Its structure has a symmetrical form.
The piece builds up from an initial sound excerpt which is decomposed into several spectral streams (pitch segregation) depending on the dissonance and harmonicity between its spectral components. Each spectral stream is assigned to an independent track which plays cyclically with an individual period and phase. At the beginning all tracks play simultaneously to form the original sound excerpt; then a periodic rhythm is progressively created by using different accelerations, and thus independent tempos, for each track (rhythmic segregation). During the acceleration, the streams progressively separate spectrally and spatially, while passing through rhythmically complex stages of phasing against each other. When the separate stream accelerations finally converge into the periodic rhythm, the original sound excerpt is not recognizable anymore: it has been transformed into a rhythm. This corresponds to the second part of the piece, while the third part, coming after this periodic rhythm has finished its cycle, is a mirror of the first: the rhythm decomposes through separate decelerations, and the streams spatially collapse to a single point and spectrally converge to reconstitute the original sound excerpt.
One can think of a sound texture or an audio excerpt that is decomposed into its component streams in order to assemble a cyclic rhythm. The process can also be reversed:
the rhythm is the source material, progressively disassembled toward the moment when all its parts coincide in time after having been decelerated independently. Both approaches lead from a state of recognizability of the material into one of unrecognizability, albeit inversely. In the case of the sound object, it is when the components coincide in time that the object is identifiable and the rhythm is absent; when the rhythm is completed, the parts of the sound object are not recognizable anymore. On the other hand, in the case of the rhythm as the point of arrival, the procedure is less obvious, as it arrives after the construction process and is less easy to identify as a musical object or goal; the beginning and ending are moments where the rhythm's elements are piled up vertically. Between these two extremes, we experiment with continuous degrees of rhythmic and pitch fragmentation. Is the rhythmic part the result of the sound transformation, or is the sound object the result of the rhythmic transformation? This question reminds us of the duality of frequency and rhythm as two interdependent components of sound.
The sound segregation can be done in different perceptual domains; in Clinamen we concentrate on three dimensions: rhythm, pitch and space. Each one can be applied independently or in combination with the others, giving many compositional possibilities.
The Algorithm for Rhythmic Segregation controls the rhythmical/temporal aspects of the piece. It builds a transition from the coincidence in time of all the components of a sound towards a specified periodic rhythm. As ASA is interested in which temporal separation between two sounds lets the listener perceive either one or two sound sources, we are interested in exploring at which point in time an apparently random cascade of sounds, progressively organizing itself, is perceived as a regular rhythm, and what the aesthetic effect on the listener is. The algorithm repeatedly plays each sound with an individual tempo. The tempo is accelerated for each sound sample independently to interpolate from time 0 (when all sound elements are played together) to the moment when all sounds are put into the rhythmic pattern. It is through the independent acceleration of the parts that each sound gradually falls into an element of the specified rhythm. This is why the rhythm is defined in spectral terms of amplitudes and phases of partials, so that the tempo of each part can follow its own acceleration in such a way as to fall in synchronicity with the rhythm; conversely, it will also allow the passage from rhythm into simultaneity at the point of coincidence. This notation is inspired by the early works of Steve Reich, in which the phases of repeating instrument patterns are progressively shifted to create several intricate rhythms from simple initial patterns (Reich, 2002). There are also many similarities with work done in granular synthesis, although our project is focused on what is there called the meso time scale instead of the micro (Roads, 2001).
The Dissonance Curve Analysis controls the segregation in the frequency domain. The main purpose is to be able to separate a sound in harmonic and timbral terms instead of just in terms of high and low frequency bands. In this way, the material used for the rhythmic segregation can be divided (or transposed, or accompanied) in terms of its consonance/dissonance or harmonic properties and then transitioned, along with the rhythmic process, between different harmonic or timbral zones. This is done by implementing a dissonance curve analysis in parallel to the rhythmic algorithm, yielding information relating to the timbral and harmonic properties of the pitch content of the sound material. This information can be used for harmonic transitioning, synthesis, filtering, etc.
Segregation in the rhythmic domain: Implementation of the algorithm for rhythmic segregation
The Algorithm for Rhythmic Segregation is composed of three main functions: 'calcUp', 'calcSteady', and 'calcDown'. These functions calculate the onset times (e.g., note attack times) for three tempo sections: when the tracks (e.g., playing instruments) in the music follow the same tempo and the rhythm is stable (calcSteady), and when each track follows an independent tempo, with acceleration (calcUp) or deceleration (calcDown). The user first inputs the number of tracks by assigning a single sound sample to each track. The user then chooses an onset phase and a period for each track, which are used by the three functions to calculate the onset times (i.e., the points in time at which the sound samples are to be played). The onset phase and period are chosen by the user to define the rhythm of the stable section; in the acceleration and deceleration sections the rhythmic position of the sound samples is interpolated by the functions.
calcSteady calculates the onset time of each sound sample when the rhythm is stable. The onset phase of each track represents the temporal shift from the beginning of the rhythmic pattern; the onset period of each track represents the temporal separation between two onset times of the sound sample. Given M tracks indexed by i = 0, 1, ..., M-1, we call Δ(i) the onset period and ph(i) the onset phase in the steady rhythm section relative to track i. The period of the whole rhythmic pattern, i.e., the time span before it repeats, is equal to the period of the first track, Δ(0), which also has no phase: ph(0) = 0. Because of this definition, the period of the first track has to be the largest among all tracks.
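As an illustration of these definitions (a minimal Python sketch, not the authors' SuperCollider PolyRhythmia code), the stable-rhythm onsets of a calcSteady-like function could be computed as follows; the track values at the end are hypothetical:

# Illustrative sketch of a calcSteady-like computation.
# Each track i has an onset period delta[i] and an onset phase ph[i];
# the pattern period is delta[0], assumed largest, with ph[0] = 0.

def calc_steady(delta, ph, n_cycles=1):
    """Return, per track, the onset times of the stable rhythm over n_cycles patterns."""
    pattern_period = delta[0]          # period of the whole rhythmic pattern
    onsets = []
    for d, p in zip(delta, ph):
        track_onsets = []
        t = p                          # first onset shifted by the track's phase
        while t < n_cycles * pattern_period:
            track_onsets.append(t)
            t += d                     # later onsets separated by the track's period
        onsets.append(track_onsets)
    return onsets

# Hypothetical four-track example (values in seconds, not taken from the piece):
print(calc_steady([16, 8, 8, 2], [0, 2, 4, 0]))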
Figure 1: Schematic representation of the onset phase and period of 4 tracks in the stable rhythm section (each track labelled with its delta and ph values).

calcUp and calcDown treat the tempo of each track independently to create a smooth temporal transition between the stable rhythm section and the rhythm created by all tracks playing synchronously with an onset period equal to the period of the rhythmic pattern, Δ(0), and no onset phase. calcUp is a nonlinear function calculating the acceleration from the single beat of the synchronized tracks to the stable rhythm. The user can provide the desired temporal length of the acceleration interval (in multiples of the period Δ(0)). Because the function accelerates each track from the synchronous beat with Δ(0) and ph(0) = 0 (equal for all tracks) to the stable rhythm section, in which the onset period is Δ(i) and the phase is ph(i) (different from track to track), the acceleration must be different for each track. Individually different accelerations for every track are necessary to create a smooth and progressive transition. calcDown is a nonlinear function, analogous and inverse to calcUp, calculating the deceleration from the stable rhythm to the single synchronous beat.

Implementation of the rhythmic construction (calcUp)

Because the period of the initial synchronous beat is Δ(0), which is also the largest period among all tracks in the stable rhythmic section, every track must accelerate to come to its characteristic period Δ(i) at the beginning of the stable rhythm section. The acceleration can be thought of as a progressive compression of time. Because every track reaches a different period, the acceleration curve is different from one track to the other. An easy way of calculating the time onsets for accelerated events is to project a number of equidistant events onto a compression function y. For each track i we have a group of ndiv(i) equidistant onset events (k = 1, 2, ..., ndiv(i)) that represent the time carried by a stable watch in the concert room. We can calculate the times of the accelerated onset events by projecting each of the ndiv(i) equidistant events onto the compression function y, which represents a virtual accelerating clock. This is a rapid way of mapping linear time in a nonlinear way, as shown in Figure 2.
The smoothness of transition provided by these two functions is necessary for the progressive "construction" and "deconstruction" of the stable rhythm from and to the synchronous beat. During the rhythmic construction, the listener notices at first the smearing of the single beat into an apparently chaotic set of events; then sound progressions emerge; then regular temporal structures progressively form, finally clearing up into an unvarying rhythm. From an experimental point of view, depending on the type and complexity of the target rhythm, it can be interesting to note at which point in time the listener starts perceiving a rhythm, e.g., when the individual tracks segregate and form an integrated rhythm. This process may be used to extend the knowledge of the previous literature on rhythm perception.
During the rhythmic deconstruction section the listener rapidly becomes aware that the opposite of the rhythmic construction section is happening: the smearing of the rhythm, the smooth progressive congregation of the different sounds into a synchronous single beat. This predictability is not an aesthetic problem; we found instead that the listener gets interested in noticing how his or her mind works, becoming a self-experimenter testing his/her perception to find when he/she misses the stable rhythm. For the experimenter it may be interesting to measure whether the perceptual dissolution of the rhythm is time-wise symmetrical in the construction and deconstruction intervals, as suggested by intuition, or whether the attention to the stable rhythm and its memory may retard the point of dissolution in the rhythmic deconstruction interval.
Figure 2: Compression function mapping equidistant real-time onsets onto accelerated onsets.
The optimal compression function
A linear function provides too rapid an increase of onset rate at the beginning of the piece for samples with small periods, causing the rhythm to build too fast at the beginning of the section and to show little variation in the rest. On the other hand, an exponential curve brings in the rhythmical structures too late, showing a rapid construction of rhythm only at the end of the accelerating time.
Following the criterion of having a balanced rhythmic construction throughout the whole acceleration length, we found the most perceptually and aesthetically satisfying effects using the function defined as:

y(k) = a · (RealTime(k))^b

where the variable k = 1, 2, ..., ndiv(i) represents the different onset events, while a and b are constants that have to be determined from two side conditions. Because each track has a different acceleration function, a and b are in fact a = a(i) and b = b(i), and y = y(i) must be calculated for each track.
In the simple case of a track having 5 onsets (with distance = 1) positioned at 1, 2, 3, 4, 5, and defining the y curve as having a = 1 and b = 1/2, the equidistant onsets would turn into: 1.0, 1.4, 1.7, 2.0, 2.2.
For more detailed information about the implementation of the algorithm please refer to the SuperCollider programming language class PolyRhythmia and its help file, downloadable at http://dindisalvadi.free.fr/RhythmSegregation.zip.
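To make the worked example above concrete, here is a minimal Python sketch of the compression mapping y(k) = a · (RealTime(k))^b (our illustration, not the published PolyRhythmia class); with a = 1 and b = 1/2 it reproduces the values 1.0, 1.4, 1.7, 2.0, 2.2:

import numpy as np

# Illustrative sketch of the compression mapping y(k) = a * RealTime(k)**b,
# a simplified stand-in for the calcUp projection described above.
def compress(real_times, a, b):
    """Project equidistant 'concert clock' onsets onto the accelerating virtual clock."""
    return a * np.asarray(real_times, dtype=float) ** b

# The worked example from the text: 5 equidistant onsets, a = 1, b = 1/2.
equidistant = [1, 2, 3, 4, 5]
print(np.round(compress(equidistant, a=1.0, b=0.5), 1))   # -> [1.  1.4 1.7 2.  2.2]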
Adding the onset phase

Once the accelerated onsets y(k) are calculated from the compression curve, we have to progressively shift their onsets by a fraction of the onset-phase value ph(i) in order to connect the events to the stable rhythm, which includes the whole phase shift in its calculation.

Figure 3: Curve used to calculate the phase shift for every accelerated onset.

Final Onset = Accelerated Onset + (k/ndiv) · Phase Shift

Implementation of the rhythmic deconstruction (calcDown)

From the last onset event of the stable rhythm of each track, calcDown calculates the decompression curve, to decelerate the tempo and bring all tracks to play their samples synchronously in the single beat of period Δ(0). This procedure follows a similar but inverse calculation compared to what we explained in the implementation of the calcUp function.
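A minimal Python sketch of one reading of the phase-shift formula, together with a mirrored deceleration (a simplification of the idea, not the authors' calcDown implementation), could be:

import numpy as np

# Sketch of adding the onset phase to the accelerated onsets, i.e.
# Final Onset = Accelerated Onset + (k/ndiv) * Phase Shift,
# plus a time-mirrored deceleration as a stand-in for calcDown.
def add_phase(accelerated, phase_shift):
    accelerated = np.asarray(accelerated, dtype=float)
    k = np.arange(1, len(accelerated) + 1)
    return accelerated + (k / len(accelerated)) * phase_shift

def calc_down(accelerated_onsets, total_length):
    """Decelerate by playing the compression curve backwards in time."""
    acc = np.asarray(accelerated_onsets, dtype=float)
    return total_length - acc[::-1]

accelerated = [1.0, 1.4, 1.7, 2.0, 2.2]
print(add_phase(accelerated, phase_shift=0.5))       # shifted accelerated onsets
print(calc_down(accelerated, total_length=2.2))      # mirrored, decelerating onsets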
Segregation in the pitch domain: Implementation of the dissonance curve analysis

The approach we have taken in order to combine rhythmic segregation with pitch segregation is to use the information provided by dissonance curves. Dissonance curves give a roughness profile of a tone with respect to intervallic transposition, yielding the best intervals at which the timbre of that tone is minimally rough (that is, most consonant) with itself. (For more information on dissonance curves, refer to Carlos, 1987, and Sethares, 1999.)

Psychoacoustic roughness, the model of consonance on which dissonance curves are based, is the sensation caused by amplitude fluctuations occurring due to the constructive and destructive interference between the vibrations of two overlapping sound waves or between the partial components of a single sonority. It is analogous to a textural, tactile sensation, in the sense of irregularities in the perception of sounds. Amplitude fluctuations have several registers: beating, from around 1/16th to 4 Hz; tremolo, from 4 to 10 Hz; and roughness, varying from around 10 to 40 Hz depending on the register of the pitches.

When two tones are sounded together at some interval between their perceived pitches, it is probable that at least some of the partials from each tone will overlap. The sensation caused by continuous transposition, when two partials approach each other, is that they begin to beat, tremolate, become rough and then fuse into one, before repeating the sequence in reverse order as they continue to move apart. The amount of roughness for each pair of partials can be quantified according to the critical bandwidth, which models the auditory filters of the basilar membrane in the cochlea (roughness being maximal at about a quarter of a band). Dissonance curves are assembled by summing the roughness of all pairs of partials between two tones for every transposition step, leading to an estimation of roughness as a function of the transposition interval. The points in the curve at which roughness reaches a local minimum are collected into interval sets that provide compositionally useful harmonic and timbral information about the sources (see Fig. 4). When both timbres come from the same source (that is, when a tone is transposed against itself, which is the case we are most interested in), the interval sets correspond to the pitches at which the timbre is most consonant with itself. In the case of sounds with harmonic spectra, the corresponding sets include extended just-intonation intervals. Not all the intervals coincide with the partials of the sound's spectrum; new pitches are produced, their number depending also on the intervallic range over which the analysis is performed.

Figure 4: Example of a dissonance curve for an 8-partial harmonic spectrum over a range of a little more than an octave (x-axis: transposition interval from 0.9 to 2.1). Notice how the rationalized intervals form extended just-intoned sets. A non-harmonic spectrum produces some intervals which are timbral, i.e., their ratios are made of higher, more complex numbers which are not conducive to harmonic treatment but are nonetheless compositionally useful in other ways.
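For readers who want to experiment, the following Python sketch assembles a dissonance curve in the spirit of Sethares (1999) and collects its local minima into an interval set; the roughness parameterization follows Sethares' published model, and the 8-partial spectrum is hypothetical:

import numpy as np

# Sketch of a Sethares-style dissonance curve: sum pairwise roughness of the
# partials of a tone against its transposed copy over a range of intervals.
def pair_roughness(f1, f2, a1, a2):
    lo, hi = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * lo + 19)              # scales the curve to the critical band
    d = hi - lo
    return a1 * a2 * (np.exp(-3.5 * s * d) - np.exp(-5.75 * s * d))

def dissonance_curve(freqs, amps, intervals):
    curve = []
    for r in intervals:
        f_all = np.concatenate([freqs, freqs * r])
        a_all = np.concatenate([amps, amps])
        total = sum(pair_roughness(f_all[i], f_all[j], a_all[i], a_all[j])
                    for i in range(len(f_all)) for j in range(i + 1, len(f_all)))
        curve.append(total)
    return np.array(curve)

# Hypothetical 8-partial harmonic tone, analysed over a bit more than an octave.
freqs = 262.0 * np.arange(1, 9)
amps = 0.88 ** np.arange(8)
intervals = np.linspace(0.9, 2.1, 500)
curve = dissonance_curve(freqs, amps, intervals)

# Local minima of the curve give the interval set usable as a pitch grid.
minima = intervals[1:-1][(curve[1:-1] < curve[:-2]) & (curve[1:-1] < curve[2:])]
print(np.round(minima, 3))

With a harmonic spectrum the minima cluster around simple just-intonation ratios (octave, fifth, fourth, thirds), which is the behaviour the text describes for a tone transposed against itself.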
It is important to remark that consonance and harmonicity are two separate dimensions of hearing. The first is related to high-low hearing (tone height), phonemes, gesture and tone color. It is useful in perception for stream segregation, as in speech listening. The second dimension, on the other hand, pertains to stability with respect to a system of relationships, as in intervallic listening. When listening harmonically, timbre is of secondary importance; harmonic meaning is not lost when re-instrumenting or transposing. Harmonic hearing allows a certain tolerance, which brings about the possibility of temperaments: approximations by small deviations from just intervals do not alter their harmonic meaning. Consonance, on the other hand, refers to the 'smoothness' of a perceived sound. In the lowest register of hearing, and depending on the timbre of an instrument, harmonic intervals are in some cases rougher than more inharmonic ones. Although the two dimensions are intertwined, they can be treated independently of each other, and transitions between the saliences of the two modes of listening can be composed. Transposition changes the timbral (and thus dissonant) context but not the harmonic one. Conversely, a change in one interval of a passage (like a chromatic or foreign note) does not change the context of dissonance, only the harmony.
The use of dissonance curves allows us to segregate a sound object in the frequency domain based not only on low and high attributes of pitch but also with respect to the two-dimensional perceptual attributes of harmonicity-inharmonicity and consonance-dissonance (roughness-smoothness). (See Barlow, 2006, for more on harmonic fields.) Pitch sets with harmonic and sensory-dissonance properties are compounded into the Algorithm for Rhythmic Segregation so that the tracks (auditory streams) are made independent in high-level music-perceptual terms.

Several strategies can be envisioned that allow the traversal of the harmonic-timbral plane obtained by this kind of pitch segregation:

- Dividing the sound object with a filter bank tuned to the pitches of the dissonance analysis, so that it can be reconstructed via harmonic means. When all the parts coincide in time, the original object is obtained; the rest of the time, different chordal balances can be produced, with control over the gradualness of the harmonic development.
- Dividing the intervals from the analysis into harmonic and timbral sets and treating them differently, for example by rendering harmonic pitches on a slower time scale (the metric or 'harmonic-rhythm' scale) than timbral ones, which would enjoy a more ephemeral role.
- Accompanying the sonority with synthetic tones at pitches taken from the analysis. They can reinforce different harmonic and timbral areas of that sound, modulating continuously through them. Different articulatory possibilities for these notes can aid in providing a sensation of fusion or segregation.
- When segregating a musical passage into a periodic rhythm, using the dissonance analysis as the pitch grid onto which each stream is transposed, so that the streams arrive at chordal 'goals' at the points of coincidence.
- Using the pitch information for pitch-shifting the original samples and traversing between different harmonic configurations.

Figure 5: Schematic representation of the timbral-harmonic plane (with axes labelled 'harmonic-timbral plane', 'antitonality' and 'dissonance'). Traversing it depends on many factors, the main ones being harmonic strength (the probability of choosing intervals based on a harmonic measure) as well as speed and articulation in the presentation of the note streams.

One property of the Algorithm for Rhythm Segregation is that rhythm is specified in spectral terms. This aspect makes possible the generation of rhythms from the spectral data of the sound to be segregated. Even though this presents the compositional challenge of how to make the relationship musically tangible, the fact that both rhythm and timbre can be represented in the same manner sets the ground for sonifying rhythms harmonically and vice versa, as well as for crossing perceptual boundaries such as the pulse-pitch frequency that Stockhausen sonified in Kontakte, but in a polyphonic, multi-layered way. Auditory Scene Analysis treats frequency and time segregation in the same way (Bregman, 1990, p. 19).
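As a sketch of this rhythm/spectrum duality (our illustration, not the piece's code), the same per-track period and phase data can be rendered either as onset times at the meso time scale or, sped up, as partials of a complex tone; all numerical values below are hypothetical:

import numpy as np

# Sketch of the rhythm/spectrum duality: the per-track (period, phase) data are
# read either as click times of a rhythm or, scaled into the audio range, as
# partials of a complex tone.
def rhythm_onsets(deltas, phases, total=32.0):
    """Read (delta, ph) as onset times in seconds."""
    return [np.arange(p, total, d) for d, p in zip(deltas, phases)]

def spectrum(deltas, phases, speedup=800.0, duration=2.0, sr=44100):
    """Read the same data as partials with frequency speedup/delta and phase ph/delta."""
    t = np.arange(int(duration * sr)) / sr
    tone = sum(np.sin(2 * np.pi * ((speedup / d) * t + p / d))
               for d, p in zip(deltas, phases))
    return tone / len(deltas)

deltas, phases = [16.0, 8.0, 8.0, 2.0], [0.0, 2.0, 4.0, 0.0]
print(rhythm_onsets(deltas, phases))        # rhythmic (meso-scale) reading
audio = spectrum(deltas, phases)            # timbral (audio-rate) reading of the same data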
Segregation in space
Behind the temporal and frequency aspects of sound segregation, we can use a third dimension that our auditory
system uses in the everyday life to analyze sound scenes:
spatial localization. We notice how impoftant this dimension
is if we block one ear (and inter-fere with spatial localization):
L
x
t
i)
f
.s
d
I
IJ?
GI
the task of following one sound source becomes much
harder. Spatial segregation happens when two sounds
perceived to be separated by a large spatial distance oc-
central to the aesthetic results of the sonification of models,
the actual choice of materials is also crucial. ln the case of
the present study, material refers to the specific sound ob-
cur within a small temporal interval. Our brain interprets this
interval imagining two sound sources at drfferent positions
in space. Our lived experience probably influences this judgement: two sound events can be produced by the same
source, only if the distance between their spatial position
is small compared to their temporal separation (Bregman,
1990, pp. 73-74).ln Clinamen, we will use this observation
to create a varying degree of spatial segregation between
the sound sources. We use a simple modification of the
Ambisonic functions (see for example http://www.ambisonics.net) to progressively increase and decrease the spatial
distance between positions of virlual sound sources in a 3d
space. The user provides for each track the 3d coordinates
of the final position of the sound source. When all tracks
play simultaneously the sources are situated in the center of
a sound sphere. During the acceleration, the sources progressively spread toward the surface of the sphere; they
are positioned at the point defined by the user when the
rhythm is formed. The inverse process happens during the
rhythmic deconstruction, in the deceleration phase.
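A minimal Python sketch of this spatial spreading idea, using a first-order Ambisonic (B-format) encoding in which a spread factor scales the directional components, could look as follows; it is our simplification, not the piece's actual Ambisonic code, and all parameter values are hypothetical:

import numpy as np

# Sketch: each track has a user-defined final direction on the sphere; a spread
# factor (0 = source at the centre, 1 = fully spread to its final position)
# scales the directional part of a first-order B-format encoding.
def encode_b_format(signal, azimuth, elevation, spread):
    w = signal / np.sqrt(2.0)                               # omnidirectional component
    x = spread * signal * np.cos(azimuth) * np.cos(elevation)
    y = spread * signal * np.sin(azimuth) * np.cos(elevation)
    z = spread * signal * np.sin(elevation)
    return np.stack([w, x, y, z])

sr = 44100
t = np.arange(sr * 4) / sr
track = np.sin(2 * np.pi * 220 * t)                          # hypothetical track signal
spread = np.clip(t / 4.0, 0.0, 1.0)                          # move outward over 4 seconds
bformat = encode_b_format(track, azimuth=np.pi / 4, elevation=0.1, spread=spread)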
Conclusions and future developments
The motivation for this research is to translate insights from auditory scene analysis into music composition. In particular, our focus was on developing three algorithms to control the degree of perceived fusion/segregation of different audio streams in three dimensions: rhythm (Algorithm for Rhythm Segregation), pitch (Dissonance Curve Analysis) and space. During this process the collaborative piece Clinamen was conceived to try out the aesthetic results of the algorithmic work.
Although the piece is not completed and still in preparation, preliminary tests of the algorithms showed some interesting results. The use of perceptual ideas makes the fusion and segregation effect quite strong for the listener. Some unexpected effects also appeared: when the rhythm is completely built and its decomposition begins, the listener clearly foresees what is going to happen. This predictability is however not an aesthetic problem: we found instead that the listener gets curious about how smoothly the process happens and is interested in where he/she loses the rhythmic structure.
Clinamen is still a work in progress; it is still undecided regarding the kind of sound materials it may be based upon. This is an interesting aspect of algorithmic composition, briefly touched upon above, namely that it is made up of model and material. This is related to the more general question of form and content. As much as mapping is central to the aesthetic results of the sonification of models, the actual choice of materials is also crucial. In the case of the present study, material refers to the specific sound objects, rhythms, spatial paths and pitch configurations that are used to feed the segregation algorithms. We could also call it the model's mask. This phase of composition determines most of the results that follow, governing many aspects of what the music is 'about'.
The Algorithm for Rhythm Segregation, along with the Dissonance Curve Analysis, has already been used in the composition of two pieces by Juan Sebastián Lach. Blank Space, for clarinet, piano and soundtrack, was composed in part by transcribing the rhythmic process for the instruments. The resulting piece sounds very processual and directional, notwithstanding its highly discontinuous melodic material; there is a sense of a procedure taking place even though it is very difficult to tell what exactly is happening. The rhythmic process resembles minimalist phase music but in a more abstract way, mainly because of its discontinuity and because it is not based on a repetitive cell. The process from verticality to rhythm and back has an extra layer of continuous transition from harmonic to inharmonic pitch material and vice versa (the pitches are derived from the dissonance analysis of the electroacoustic soundtrack which sounds at the same time as the instruments). It traverses different poles in the harmonic fields during the rhythmic transitions, which also gives it a 'spectral' flavor, as normal harmonic tension is usually based on contrast and not on gradual transformation; the feeling of ongoing movement is noticeable all along. Noteworthy in both the rhythmic and the harmonic aspects is that the arrivals at the rhythmic or harmonic poles are not felt as arrivals into goals, but rather as zones of relative stability, not lasting long before the teleology continues once more.
There is another set of pieces made with these algorithms and with the use of the musical automats at the Logos Foundation in Ghent, Belgium. The ability to control acoustic instruments by computer has the advantage of making possible the navigation of the harmonic-timbral plane by generating notes at various time scales, from slow to very fast. These algorithmic improvisations have been great experimental setups in which to try different combinations of parameters (both pitch- and time-related) and to hear and judge the resulting musical textures.
Finally, we noticed the complexity of using different dimensions of segregation simultaneously. One example pointed out by Bregman involves using spatial segregation on a melody: if we spatially separate sources playing different notes of a melody (without altering the duration or pitch of the notes), the perception of the melody as a whole breaks down,
until the listener cannot perceive the melody anymore. This rupture may become even more complex if we also alter the rhythm and the spectral content of the melody. Because in Clinamen we plan to use the three types of segregation simultaneously, the degree of segregation used for each one has to be carefully chosen. It takes several trials and careful listening to find the optimal balance between the degrees of segregation in all three dimensions.
Future work could define strategies that allow the composer to balance the three types of segregation, as well as develop algorithms for other segregation dimensions such as amplitude and frequency modulation, as exemplified in Bregman's Auditory Scene Analysis.
Bibliography

Barlow, C. 2006. On Musiquantics, The Hague: Royal Conservatory.

Bregman, A. 1990. Auditory Scene Analysis, Boston: MIT Press.

Carlos, W. 1987. 'Tuning: At the Crossroads', pp. 29-43, Computer Music Journal, 11/1.

Erickson, R. 1975. Sound Structure in Music, Berkeley: University of California Press.

Leman, M. and Schneider, A. 1997. 'Origin and Nature of Cognitive and Systematic Musicology', pp. 13-29, in: Leman, M. (Ed.), Music, Gestalt and Computing, Berlin: Springer-Verlag.

Lerdahl, F. and Jackendoff, R. 1983. A Generative Theory of Tonal Music, Boston: MIT Press.

Reich, S. 2002. 'Music as a Gradual Process', pp. 34-36, in: Writings on Music, 1965-2000, Oxford: Oxford University Press.

Roads, C. 2001. Microsound, Cambridge, Mass.: MIT Press.

Sethares, W. 1999. Tuning, Timbre, Spectrum, Scale, London: Springer-Verlag.

Tenney, J. 1982. Meta Hodos and META Meta Hodos, Lebanon, NH: Frog Peak.