Copyright 1998 by the American Psychological Association, Inc.
0278-7393/98/$3.00
Journal of Experimental Psychology:
Learning, Memory, and Cognition
1998, Vol. 24, No. 3, 659-671
Role of Habituation in the Irrelevant Sound Effect: Evidence
From the Effects of Token Set Size and Rate of Transition
Sébastien Tremblay and Dylan M. Jones
Cardiff University
The disruption of serial recall by irrelevant sound was explored by examining the effect of the
number of different tokens (token set size) and by varying the transition rate between different
tokens. Two sets of predictions were contrasted. One, based on changing state, posited that a
mismatch in physical composition between immediately successive stimuli was the important
factor, leading to the prediction that disruption would increase as the token set size increased
from 1 to 2 but would show no increase above that. Another, based on habituation, predicted
that increasing the set size would increase disruption monotonically, on the grounds that for a
given exposure each token would be relatively less habituated. Generally, the results showed
that the most marked increase in disruption occurred when the token set increased from 1 to 2,
giving some support to the changing state hypothesis.
The disruptive effect of irrelevant sound on serial recall is
well established (Colle & Welsh, 1976; Jones, Madden, &
Miles, 1992; Salamé & Baddeley, 1982). Even though
participants are instructed to ignore them, the presence of
spoken words or other sound tokens produces appreciable
disruption of serial recall. Recent work has attempted to
establish the characteristics of the sound and those of the
task that augment or diminish the effect (see Jones, 1995, for
a review). The purpose of the present series of studies was to
establish the degree to which the effects of irrelevant sounds
are mediated either by habituation or by changes in state.
Interest in the effect of background sound on performance
is longstanding (e.g., Broadbent, 1958). Emphasis at a
practical level has been placed on the likely impact of sound
on the efficiency at the place of work or learning (e.g., Jones
& Broadbent, 1991). Early work concentrated on meaningless sound, the main variable of interest being the effects of
intensity, rather than meaning or spectral composition. The
main explanatory construct was, therefore, behavioral arousal.
However, the discovery that irrelevant verbal material could
bring about losses of efficiency on the order of 30% on serial
recall at relatively low levels of sound intensity (Colle,
1980; Colle & Welsh, 1976) meant that the impact on
efficiency was likely to be more widespread than was
thought hitherto. Moreover, the fact that early studies
indicated that the effect was confined to speech meant that
emphasis was placed on constructs involving cognitive
representation within memory (Cowan, 1988, 1995; Jones &
Morris, 1992; Massaro & Cowan, 1993). Latterly, the
finding that nonspeech sounds also produce the effect (Jones
& Macken, 1993, 1997a; Jones, Macken, & Murray, 1993;
Klatte, Kilcher, & Hellbrück, 1995) has shifted the emphasis
toward the linkage between organizational processes in
audition (based on the auditory scene analysis framework of
Bregman, 1990) and memory (see Jones, 1993; Jones,
Beaman, & Macken, 1996). Indeed, it has been argued that
the irrelevant sound paradigm may be used to address key
issues in auditory attention that had been explored by the
classical variants of the dichotic listening paradigm (Beaman & Jones, 1997). The research has implications, therefore, for a number of domains within psychology, particularly as they relate to the interplay of auditory attention and
short-term memory.
Several key characteristics of the irrelevant sound effect
can be asserted with a fair degree of confidence: First,
and as already noted, intensity does not have an effect, at
least within the range of 40-75 dB(A) (Colle, 1980;
Ellermeier & Zimmer, 1997; Jones, Miles, & Page, 1990);
second, there is little effect of meaning--that is, meaningless
speech is usually as disruptive as meaningful speech (Buchner, Irmen, & Erdfelder, 1996; Colle, 1980; Jones et al.,
1990; Salamé & Baddeley, 1982); third, nonspeech sounds
can produce as much disruption as speech sounds (Jones &
Macken, 1993; Jones & Macken, 1997a; Jones, Macken, &
Murray, 1993; Klatte & Hellbrück, 1993; Klatte et al.,
1995); fourth, disruption occurs not at encoding but at a later
stage within memory, and therefore the effect is not one on
the registration of stimuli but on their maintenance in
memory (Baddeley & Salamé, 1986; Jones, 1994; Miles,
Jones, & Madden, 1991; Salamé & Baddeley, 1989); and
This research was supported by an Overseas Research Studentship (United Kingdom), by the Québec government via the funding
body Formation de Chercheurs et Aide à la Recherche, by the
Defence Evaluation and Research Agency (Air Systems Sector),
and by the Economic and Social Research Council of the United
Kingdom.
We thank Robyn Boyle, Karen Anderson Howes, Bill Macken,
Phil Beaman, Jean St-Aubin, and David Alford for critical readings
of a draft of this article. We also thank Romin Tafarodi, Jackie
Boivin, and Gordon Harold for statistical suggestions. Nelson
Cowan suggested that we undertake Experiment 4.
Correspondence concerning this article should be addressed to
Sébastien Tremblay or Dylan M. Jones, School of Psychology,
Cardiff University, P.O. Box 901, Cardiff CF1 3YG, United
Kingdom. Electronic mail may be sent to [email protected]
or to [email protected].
fifth, tasks involving rote rehearsal are particularly susceptible to disruption (Beaman & Jones, 1997, in press; Boyle &
Coltheart, 1996; Salamé & Baddeley, 1990; but see
LeCompte, 1994, 1996). From the point of view of the
current article, a further key characteristic--the changing
state effect--is of particular interest. This refers to the
well-established phenomenon that a sound sequence made
by repeating a token produces small effects that are usually
nonsignificant (see below for further discussion), whereas a
sequence consisting of different tokens produces markedly
greater disruption (Jones et al., 1992, 1993; LeCompte,
1994, 1995, 1996). These findings are embodied within the
changing state hypothesis, which rests on two assumptions.
The first is that the disruption of serial recall is the product of
a conflict of cues to serial order between the seriation of
items in memory and the automatic and preattentive seriation of irrelevant auditory events. The second is that
information about seriation of auditory events arises from a
mismatch in the physical characteristics between successive
stimuli. The greater the mismatch, the stronger the cue to
seriation, and in any given period the more frequently the
mismatch occurs, the greater the number of cues to seriation
(see Beaman & Jones, 1997, in press; Jones & Macken,
1995c), both of which increase the disruption of serial recall.
A potentially significant weakness in this approach is that
some changing state phenomena can be incorporated satisfactorily within an entirely different conceptual framework, that
of habituation to the orienting response (OR; see Sokolov,
1963). Classically, the OR is a response to novel or
significant stimuli and is associated with a panoply of
behaviors, including orientation of the head, behavioral
quieting, parasympathetic activity, and preferential processing of the eliciting stimulus. Cowan (1995) proposed that the
OR can serve to recruit attention away from the performance
of a short-term memory task. Abstractly, the OR framework
supposes that, with repeated presentations of a stimulus, a
neural model is built up (see Sokolov, 1963). An OR may be
elicited when the representation of the new stimulus fails to
match a previously established neural model of elements
within the stimulus sequence. As the neural model builds up,
the discrepancy between the model and a repeated stimulus
decreases, and therefore the OR habituates. In this way, the
habituation hypothesis could explain the changing state
phenomenon; that is, repeated sounds are habituated to very
quickly, but those that change are habituated to more slowly,
and hence continue to disrupt performance.
Within the current experimental series, we make the
assumption that there is a lawful relation between the
magnitude of the OR and the number of different tokens in
the stimulus sequence. If, as Sokolov (1963) proposed, each
event contributes to the fabrication of a neural model, then
for a given rate of presentation the number of tokens in the
irrelevant stream should dictate the rate of habituation. That
is, if one event is repeated, then only one neural model will
be current, with successive events in the stream having
progressively diminished ORs in view of their increasingly
good fit to the model. An additional characteristic of the
habituation hypothesis as it is used in the current article is
that the fabrication of the neural model takes time and,
hence, the degree of disruption should diminish with repeated presentation; specifically, the disruption of serial
recall by irrelevant sound should under some conditions
show a relation to the degree of prior exposure. If one further
assumes that several models may be simultaneously in the
process of construction, over a given period of exposure
each will receive relatively few tokens that give a good fit to
the model, and hence ORs will be evoked more frequently.
Even if there is only one model, the complexity of which
varies as the number of tokens in the irrelevant sequence
increases, then the same prediction holds; in this case, with
larger numbers of tokens the rate of habituation will be slow
because each token will be a relatively poor fit to the neural
model. Perforce, this will mean that the recruitment of
attentional resources that is the behavioral manifestation of
the OR will endure.
Experiment 1
In Experiment 1, some of the predictions of the habituation and changing state hypotheses are tested. Each theory
may be couched in both strong and weak forms. A strong
form of the habituation hypothesis may be taken to predict
that disruption of serial recall will be a linearly increasing
function of the token set size. That is, the degree of
disruption increases by the same degree every time the token
set size increases by one. Weak forms might not specify a
linear increase, only a monotonic one. A strong version of
the changing state hypothesis predicts that disruption will
only increase appreciably as the number of tokens increases
from one to two, with no increase in disruption as the
number in the token set enlarges beyond that. However, the
same pattern of results could be predicted by adding an
auxiliary assumption to the habituation hypothesis, namely,
that the transition from one to two mental models carries
with it a disproportionate load for attention. A weaker
version of the changing state hypothesis might predict a
strong rise for the increase from one to two tokens and a
more modest rise thereafter on the grounds that stimulus
mismatch is derived from several successive stimuli. Another possibility is that the processing system may look for
patterns in the random repetition of irrelevant sounds. Until
such a pattern is found, it could be argued, habituation could
not fully take place. This carries with it the assumption that it
would be much harder to find such a repetition of two sounds
than in repetitions of more than two sounds. However, the
particular difficulty with two sounds seems rather arbitrary
and, moreover, there is some evidence that fixed and random
sequences of irrelevant sound have roughly a similar effect
(Jones, Madden, & Miles, 1992), a result that suggests that
the irrelevant sound effect is insensitive to suprasegmental
features.
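The strong forms of the two hypotheses can be written as simple functions of token set size. The following is a minimal sketch, not the authors' model: disruption is in arbitrary units, and the parameter names and values (`base`, `step`, `steady`, `changing`) are hypothetical and chosen only to show the shape of each prediction.

```python
SET_SIZES = [1, 2, 3, 5, 7]  # token set sizes used in Experiment 1


def strong_habituation(set_size, base=1.0, step=1.0):
    """Strong habituation form: disruption rises by the same amount
    for every token added to the set (linear in set size).
    `base` and `step` are hypothetical, arbitrary-unit parameters."""
    return base + step * (set_size - 1)


def strong_changing_state(set_size, steady=1.0, changing=3.0):
    """Strong changing state form: disruption rises from one to two
    tokens and shows no further increase beyond that.
    `steady` and `changing` are hypothetical, arbitrary-unit parameters."""
    return steady if set_size == 1 else changing


if __name__ == "__main__":
    for n in SET_SIZES:
        print(n, strong_habituation(n), strong_changing_state(n))
```

The weak forms relax these shapes: weak habituation requires only that the function be monotonically increasing, and weak changing state allows a modest rise above two tokens.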
Evidence pivotal to differentiating the two hypotheses
should also be provided by the nature of the changes over the
experimental session. It seems reasonable from the habituation hypothesis to expect both that the degree of disruption
will diminish with repeated exposure and that the rate at
which the disruption diminishes depends on the token set
size, lower set sizes being associated with faster rates of
decline. By contrast, neither the decline nor its interaction
with set size is a logical consequence from the standpoint of
the changing state hypothesis, because the rate of change
from one stimulus to another is all that counts according to
the changing state hypothesis.
A large pool of irrelevant material was used in this
experiment, permitting the variation of items from trial to
trial within one condition, to heighten the influence of
novelty. The usual method of presenting auditory conditions
is to randomize them on a trial-by-trial basis. This is a
widely used methodology in studies of the irrelevant sound
effect; its adoption in Experiment 1 was intended to safeguard against possible effects of habituation. Because the
difference between steady state and changing state conditions was originally observed using these designs, it is then
fitting that the framework be tested with these designs also.
In the case of Experiment 1, this means that the predicted
interaction between token set size and exposure might be
attenuated relative to a setting in which trials were blocked
by conditions.
Method
Participants. Twenty-four students, each a native English
speaker, volunteered to participate in the experiment in return for
course credit. All participants reported normal hearing and normal
or corrected-to-normal vision.
Apparatus and materials. The memory task took the same
general form as in earlier studies (e.g., Jones & Macken, 1993),
with sequences of to-be-remembered items--the nine consonants,
f, k, l, m, q, r, s, t, and v--presented on the screen of an Apple
Macintosh Performa 475. The sequences were constructed from
random orderings of items, with the constraints that a consonant
could not be presented in the same serial position in two consecutive lists and that recognizable words or acronyms would be
excluded. Following presentation of the list, there was a delay
during which participants were expected to rote rehearse the
sequences; at the end of the interval they were instructed to write
down the sequences in correct serial order.
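The positional constraint on consecutive lists can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses rejection sampling, the function names and the `seed` parameter are hypothetical, and the exclusion of recognizable words or acronyms is not implemented because it requires a human judgment or a lexicon.

```python
import random

CONSONANTS = list("fklmqrstv")  # the nine to-be-remembered consonants


def next_list(previous, rng):
    """Return a random ordering in which no consonant occupies the
    serial position it held in `previous` (rejection sampling)."""
    while True:
        candidate = CONSONANTS[:]
        rng.shuffle(candidate)
        if previous is None or all(a != b for a, b in zip(candidate, previous)):
            return candidate


def make_lists(n_lists, seed=0):
    """Generate `n_lists` consecutive lists that respect the constraint.
    The word/acronym screening described in the text is NOT implemented."""
    rng = random.Random(seed)
    lists, previous = [], None
    for _ in range(n_lists):
        previous = next_list(previous, rng)
        lists.append(previous)
    return lists
```

Roughly one random ordering in three satisfies the constraint, so the rejection loop terminates quickly in practice.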
Irrelevant sound was played during some of the trials. The
main variable of interest was the number of tokens presented
during a sequence--the token set size--with five levels. Set size
was systematically varied between one, two, three, five, and seven
tokens. For each of these conditions the tokens were drawn without
replacement from a pool of 270 monosyllabic words (picked at
random from the word frequency list of Gilhooly & Logie, 1980).
All sequences of monosyllabic words were checked to ensure that
there were no rhymes, no meaningful sentences, and no tokens
starting with the same letter within a trial. In the two-token case, the
tokens were alternated.
The tokens were spoken in a male voice and recorded digitally to
8-bit resolution at a sampling rate of 22.5 kHz using Sound Designer II
software (Digidesign, Menlo Park, CA), and all trials were stored
as sound resources within SuperCard (Version 2.5, Allegiant, San
Diego, CA). All words were edited using digital signal-processing
techniques to last 400 ms, and the length of the silent gap between
them was 100 ms. Thus, for every level of the token set, the tokens
were played at a rate of two per second.
Experimental design. In addition to five levels of the token set
variable, a quiet control condition was included. A repeated-measures design was used in which the ordering of trials was
prearranged quasirandomly such that they were presented randomly over a block of 6 successive trials and in 6-trial blocks
thereafter, each with a different random sequence. Participants
received 15 trials in each condition, for a total of 90 trials.
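One way to realize this quasirandom ordering is to shuffle the six conditions independently within each of 15 blocks. The sketch below assumes that reading of the design; the condition labels, the function name, and the `seed` parameter are hypothetical.

```python
import random

# Five token set sizes plus the quiet control (labels are illustrative).
CONDITIONS = ["quiet", "1 token", "2 tokens", "3 tokens", "5 tokens", "7 tokens"]


def trial_order(n_blocks=15, seed=0):
    """Return a 90-trial order made of 15 blocks of 6 trials, with each
    condition appearing exactly once per block in a shuffled order."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = CONDITIONS[:]
        rng.shuffle(block)
        order.extend(block)
    return order
```

Independent shuffling does not strictly guarantee that every block has a different sequence; with 720 possible orders per block a repeat is unlikely but possible, so a stricter version would reject duplicate blocks.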
Procedure. Participants were tested individually, seated in a
soundproof laboratory, at a distance of approximately 0.5 m from
the computer screen. At the outset, participants were given standard
instructions on the computer screen, informing them of the nature
of the recall task and instructing them to ignore any sounds they
might hear; participants were also informed that the sounds would
not contain any messages and that they would not be tested on the
sounds' contents. The experimental trials were preceded by a short
practice session of three trials with no sound. Participants used a
mouse to initiate the presentation of each list by clicking on a
SuperCard button on the screen, and the consonants were displayed
individually at a rate of one per second (on for 0.8 s, off for 0.2 s).
When nine consonants had been presented, the word wait flashed
for 10 s, during which participants were expected to rehearse
covertly. The word recall was then displayed to prompt participants
to write down the consonants on a response sheet. The irrelevant
sound was played over headphones throughout presentation and
rehearsal, and was switched off automatically during recall. The
experiment took 60 min.
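Combining the token timing given under Apparatus and materials (400 ms per token, 100 ms gap) with the trial timing above (nine items at one per second, then a 10-s retention interval) gives the number of irrelevant tokens heard per trial. The sketch assumes that sound onset coincides with the first consonant and that sound stops exactly at the end of the retention interval. Cycling through the token set in a fixed order is also an assumption, because the text specifies the ordering only for the two-token case (alternation).

```python
TOKEN_MS = 400         # duration of each irrelevant token
GAP_MS = 100           # silent gap between tokens
LIST_ITEMS = 9         # consonants per list
ITEM_MS = 1000         # 0.8 s on + 0.2 s off per consonant
RETENTION_MS = 10_000  # "wait" interval before recall


def token_onsets_ms():
    """Onset times (ms from trial start) of the irrelevant tokens,
    ASSUMING sound spans presentation plus retention exactly."""
    soa = TOKEN_MS + GAP_MS                      # 500 ms, i.e., two per second
    sound_ms = LIST_ITEMS * ITEM_MS + RETENTION_MS
    return list(range(0, sound_ms, soa))


def token_sequence(tokens, n):
    """ASSUMPTION: tokens repeat cyclically in a fixed order; for two
    tokens this reduces to the alternation described in the text."""
    return [tokens[i % len(tokens)] for i in range(n)]
```

Under these assumptions each sound trial contains 19 s of sound at two tokens per second, that is, 38 tokens.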
Results
The results were scored by a strict serial order criterion:
That is, if a response was not in its correct serial position, it
was counted as an error. A three-way repeated-measures
analysis of variance (ANOVA) was carried out on the error
data, with auditory condition (six levels), serial position
(nine levels), and trial block (three levels) as variables. The
15 trials of each condition were grouped per block of 5 trials
according to their order of presentation. It should be noted
that because of the trial-by-trial randomization of the
presentation of the auditory conditions, the 5 trials of any
one condition were not contiguous.
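The strict serial order criterion described at the start of this section can be stated compactly in code. This is a minimal sketch; the function name is hypothetical, and treating a missing response as an error is an assumption about how omissions were handled.

```python
def serial_position_errors(presented, recalled):
    """Score recall by the strict serial order criterion: an item is
    correct only if it appears in the same serial position in which
    it was presented. Returns one 0/1 error flag per serial position.
    ASSUMPTION: positions with no response are scored as errors."""
    return [
        0 if i < len(recalled) and recalled[i] == item else 1
        for i, item in enumerate(presented)
    ]
```

For example, transposing two adjacent consonants yields two errors under this criterion, even though both items were recalled.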
There were significant main effects of the three variables:
block, F(2, 46) = 4.70, MSE = 5.44, p < .05; auditory
condition, F(5, 115) = 20.30, MSE = 2.51, p < .0001; and
serial position, F(8, 184) = 48.86, MSE = 6.70, p < .0001.
The only interaction to reach significance was that between
condition and serial position, F(40, 920) = 2.29, MSE =
0.86, p < .0001 (see Figure 1). No functional significance is
attached to this interaction; it more than likely reflects a
scalar effect, and furthermore no theory attaches a particular
functional role to it.
Post hoc pairwise comparisons of means (Newman-Keuls
procedure) showed that there were fewer errors in the quiet
condition than in steady state and that there were fewer
errors with one token than with two tokens, but above two
tokens the number of errors did not increase significantly
with the number of tokens. Generally the pattern in terms of
errors is as follows: seven tokens ≈ five tokens ≈ three
tokens ≈ two tokens > one token > quiet (see Figure 2, in
which Cohen's d values are given for adjacent contrasts;
also, as elsewhere in the series, error bars represent within-subjects confidence intervals (α = .05), calculated using the
method of Loftus & Masson, 1994).
This pattern of results does not give clear support to the
strong form of the habituation hypothesis; clearly, the rise
was not linear. Arguably, the weak form was not supported
either; although the rise in error above two tokens was
evident, it was not of sufficient reliability to achieve
significance, unlike the rise from one to two tokens. The
strong form of the changing state hypothesis was not
supported unequivocally; there was a weak trend for tokens
above two, but a weaker form--predicting the largest
increase as the number of tokens increased from one to
two--seemed to be broadly supported by the findings. One
possibility seems to be that there was a ceiling effect, but this
seems remote given that the overall level of error was
approximately 50%.
Figure 1. The interaction of token set size and serial position in
Experiment 1. (Plot not reproduced; x-axis: serial position; one curve per token condition.)
The effect of trial block was examined to test whether the
strength of disruption by any of the auditory conditions
diminished during the duration of the experiment, that is, to
examine whether trial block and auditory condition interacted. This interaction term was not significant, F(2, 230) =
1.08, MSE = 2.43, p > .05. As mentioned above, the
randomization of trials precludes a test of this feature of the
habituation hypothesis, namely, the interaction of trial block
with token set size.
Figure 2. Percentage of serial recall errors under different
conditions of token set size in Experiments 1 and 2, pooled over
serial positions and blocks. Cohen's d is given for adjacent
contrasts within the body of the graph. Exp = experiment.
(Plot not reproduced; series: Speech, Exp 1, n = 24, and Tones, Exp 2, n = 48; x-axis: token set size.)
Discussion
The form of the function relating set size to disruption of
serial recall supported the changing state hypothesis, albeit a
weak version of it. Evidence from the way in which
disruption stayed stable over successive periods of the task
also suggests that habituation-like processes were not at
work.
In Experiment 1, the order of the irrelevant items was
fixed, but this should not have influenced the outcome.
Suprasegmental features, such as meaning or the order of the
tokens, appeared to have little influence on the degree of
disruption. For example, Jones et al. (1992) showed that
fixed changing state sequences of syllables and randomized
sequences of syllables disrupted serial recall performance to
a similar degree. Moreover, a change in the order of stimulus
presentation failed to evoke an OR, according to one study
(Furedy, 1968).
One initially puzzling feature of the results is that the
effect of one token was significantly greater than quiet. If
changing state is the primary determinant, why should a
condition in which the same item was repeated--a steady
state condition--produce a significant degree of deterioration? Part of the reason for this effect may be that the single
token condition was not completely free of variation. Two
sources of such variation can be suggested. The first is that at
the beginning of each trial there is, of course, change from
quiet. This, arguably, may be a change sufficiently large to
disrupt performance, and may be especially important in the
present case because the last item heard in the previous trial
will be different from the first items of the current trial. The
other possibility is that the information derived from stimulus mismatch was inherently noisy. However, the effect of a
single token was sometimes significant and sometimes not
and was typically very small in magnitude, which makes it
rather difficult to study (see also LeCompte, 1995). The
important issue for the changing state hypothesis is that the
effect of more than one token is always appreciably greater
than the effect of one token.
The disruption of serial recall, independent of the number
of tokens in the stream, did not show any attenuation over
the three blocks of irrelevant sound trials. Throughout the
experimental session, the difference between multiple tokens and either a single token or the quiet condition was
altogether marked and sustained. In terms of the action both
of the token and of the pattern of interference over trial
blocks, the evidence seems to favor the changing state
hypothesis and is in line with previous work (Hellbrück,
Kuwano, & Namba, 1996; Jones, Macken, & Mosdell,
1997). However, it is possible to argue that using a large
pool of irrelevant stimuli could have concealed an attenuation of the irrelevant sound effect over trials, and that it is
unlikely that a participant could habituate to a stimulus
sequence when its content varies at every trial. If the mental
models take appreciable time to build up, say an interval
beyond the time for each trial, there would be little
opportunity for each token to contribute to a unique model,
and therefore the process would not be susceptible to
habituation. Additionally, this effect may have been compounded by the complexity of the stimuli. As Sokolov
(1963) suggested, neural models for simple representations,
such as those for relatively simple tones, may be constructed
more readily than those for complex words. Presumably, the
faster a model is constructed, the sooner the process of
habituation can get under way.
Experiment 2
Experiment 2 used a small fixed set of tone stimuli with a
view to extending the scope of the results of Experiment 1.
Here, the simpler stimuli and the smaller pool of tokens
should contribute to faster habituation. There are a number
of studies that have shown an effect of tones and nonverbal
stimuli on serial recall (Jones & Macken, 1993; Jones et al.,
1993; Klatte et al., 1995); hence, the predictions are the
same as for Experiment 1.
Method
Participants. Forty-eight undergraduate students, each reporting normal hearing and normal or corrected-to-normal vision,
received either course credit or a small honorarium for participating
in the experiment.
Apparatus and materials. The memory task was the same as
that for Experiment 1.
Tones were used instead of words as auditory irrelevant material;
18 different tones were selected quasirandomly from a range of
eight octaves. Token set size was varied, using one, two, three, five,
or seven tokens. Within each sequence, the tones adjacent in pitch
were different by an octave. In the case of more than two tokens,
ascending and descending sequences were avoided by randomizing
the order of the tones (once randomized, the order remained fixed
for the experiment). Five different irrelevant sequences (given here
in their randomized order) were contrasted: one token (294 Hz),
two tokens (220, 440 Hz), three tokens (349, 698, 175 Hz), five
tokens (196, 784, 392, 1568, 98 Hz), and seven tokens (660, 330,
165, 2640, 83, 41, 1320 Hz). Each particular tone sequence was
fixed for each condition. Thus, there was no variation of the
irrelevant material within a condition.
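The statement that tones adjacent in pitch differed by an octave can be checked against the listed frequencies: sorting each set and dividing each frequency by its lower neighbor should give a ratio close to 2. The sketch below does this; the 3% tolerance is an assumption introduced to absorb the rounding of the reported values to whole hertz.

```python
TONE_SETS_HZ = {
    1: [294],
    2: [220, 440],
    3: [349, 698, 175],
    5: [196, 784, 392, 1568, 98],
    7: [660, 330, 165, 2640, 83, 41, 1320],
}


def octave_ratios(freqs_hz):
    """Ratios between tones adjacent in pitch (not in presentation order)."""
    ordered = sorted(freqs_hz)
    return [hi / lo for lo, hi in zip(ordered, ordered[1:])]


def is_octave_spaced(freqs_hz, tolerance=0.03):
    """True if every adjacent-pitch ratio is within `tolerance` of 2.
    The tolerance is an assumption that allows for rounded frequencies."""
    return all(abs(r - 2.0) <= 2.0 * tolerance for r in octave_ratios(freqs_hz))
```

The seven-token set spans 41 Hz to 2640 Hz, which is six octaves; the "range of eight octaves" in the text refers to the pool from which the tones were selected.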
Square wave tones were generated to 8-bit resolution at a
sampling rate of 22.5 kHz using Sound Edit Pro (Version 1.0,
Macromind and Paracomp, San Francisco, CA) software. Square
waves include the fundamental tone frequency and all of its odd
harmonics (Rosen & Howell, 1991) and tend to produce a sound of
more strident timbre than pure tones (Handel, 1989). Every tone
was edited to last exactly 400 ms, and the length of the silent gap
between them was 100 ms. Each tone had a rise and fall time of 20
ms. Thus, as for Experiment 1, the recorded tokens were played at a
rate of two per second. A quiet condition was used as a control. The
experimental design and procedure were identical to those used in
Experiment 1.
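The tone parameters above are enough to sketch how one such stimulus could be synthesized. This is an illustration only, not the Sound Edit Pro procedure: the sampling rate is taken as exactly 22,500 Hz, the rise and fall are assumed to be linear ramps (the envelope shape is not stated), and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 22_500  # "22.5 kHz" in the text, taken as exact here
TONE_MS = 400            # tone duration
RAMP_MS = 20             # rise and fall time


def square_wave_tone(freq_hz, sample_rate=SAMPLE_RATE_HZ,
                     dur_ms=TONE_MS, ramp_ms=RAMP_MS):
    """Return unsigned 8-bit samples (0..255) of a square wave tone.
    ASSUMPTION: linear onset/offset ramps of `ramp_ms` each."""
    n = sample_rate * dur_ms // 1000      # 9000 samples for 400 ms
    ramp = sample_rate * ramp_ms // 1000  # 450 samples for 20 ms
    samples = []
    for i in range(n):
        phase = (i * freq_hz / sample_rate) % 1.0
        value = 1.0 if phase < 0.5 else -1.0           # square wave in [-1, 1]
        gain = min(1.0, i / ramp, (n - 1 - i) / ramp)  # linear rise and fall
        level = value * gain
        samples.append(int(127.5 * (level + 1.0) + 0.5))  # quantize to 8 bits
    return samples
```

Appending 100 ms of silence (2,250 samples at the midpoint value 128) after each tone gives the two-per-second presentation rate described above.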
Results
As in Experiment 1, a 6 (auditory condition) × 9 (serial
position) × 3 (block) repeated-measures ANOVA was
undertaken on the serial recall error data. As before, there
were significant main effects of the three variables: block,
F(2, 94) = 10.76, MSE = 4.70, p < .001; auditory
condition, F(5, 235) = 9.75, MSE = 3.15, p < .0001; and
serial position, F(8, 376) = 52.95, MSE = 8.03, p < .0001.
There were no significant interactions (Fs ≈ 1). The Geisser-Greenhouse procedure, which provides a conservative F test
(Winer, 1971), was used for this ANOVA because of the low
homogeneity of covariance (these stricter Fs were also used
in Experiments 4 and 5 for the same reason). The serial
position effect took the usual form, with 15.2% errors at the
first serial position, with errors peaking at 56.1%, and with
errors decreasing to 46.8% at the last serial position.
Orthogonal planned comparisons (using Helmert transformations) were performed on the error scores. This procedure
compares one level of a repeated-measures variable to the
mean of subsequent levels and is a useful device for
discovering at which level recall error performance ceases to
change. The analysis for Experiment 2 revealed that recall
performance in the quiet condition, F(1, 235) = 30.65, p <
.001, and in the one-token condition, F(1, 235) = 13.25, p <
.001, was significantly different from that of the subsequent
token conditions. All subsequent comparisons--within the
two-, three-, five-, and seven-token conditions--were not
significant (p > .1). The pattern of significance resembled
that in Experiment 1 (seven tokens ≈ five tokens ≈ three
tokens ≈ two tokens > one token > quiet; see Figure 2).
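The contrast coding behind these comparisons can be made explicit. The sketch below builds the coefficients for contrasts that compare each level with the mean of all subsequent levels, as described above; it illustrates the coding only, not the authors' analysis software, and the function name is hypothetical. With the six auditory conditions ordered quiet, one, two, three, five, and seven tokens, the first contrast is quiet versus the mean of all sound conditions and the second is one token versus the mean of the larger set sizes.

```python
from fractions import Fraction


def helmert_contrasts(k):
    """Contrast coefficients for k ordered levels: row j compares
    level j with the mean of levels j+1 .. k-1. Exact fractions are
    used so that orthogonality can be checked without rounding."""
    rows = []
    for j in range(k - 1):
        later = k - j - 1  # number of subsequent levels
        rows.append([Fraction(0)] * j
                    + [Fraction(1)]
                    + [Fraction(-1, later)] * later)
    return rows


if __name__ == "__main__":
    for row in helmert_contrasts(6):
        print([str(c) for c in row])
```

Each row sums to zero and every pair of rows has a zero dot product, which is what makes the set of comparisons orthogonal.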
For Experiments 1 and 2, trend analyses (orthogonal
polynomials) were carried out on the data under the different
auditory conditions of the token set size. The function
relating token set size and degree of disruption appears to be
better described by a composite of trends (see Keppel,
1988). Both experiments revealed a significant departure
from linearity (see Kirk, 1982): Experiment 1, F(1, 92) =
6.84, p < .05, and Experiment 2, F(1, 188) = 3.95, p < .05.
As indicated by curve fitting (performed on each curve in
Figure 1), the best estimate of the underlying relationship is
an inverse (Δy = 1/x) function (Experiment 1: R² = .95;
Experiment 2: R² = .98).
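A curve fit of this kind can be sketched as an ordinary least-squares regression of error on the reciprocal of set size. The parameterization y = a + b/x is an assumption, because the text does not give the fitted equation in full, and the function name is hypothetical. The data in the usage note are synthetic placeholders generated from an exact inverse function, not the experimental means.

```python
def fit_inverse(xs, ys):
    """Least-squares fit of y = a + b * (1 / x).
    Returns (a, b, r_squared). ASSUMED functional form."""
    us = [1.0 / x for x in xs]  # regress y on u = 1/x
    n = len(xs)
    mean_u, mean_y = sum(us) / n, sum(ys) / n
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    s_uu = sum((u - mean_u) ** 2 for u in us)
    b = s_uy / s_uu
    a = mean_y - b * mean_u
    ss_res = sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    sizes = [1, 2, 3, 5, 7]
    synthetic = [60.0 - 20.0 / x for x in sizes]  # placeholder data, NOT results
    print(fit_inverse(sizes, synthetic))
```

A negative b with a positive a describes a curve that rises steeply from one to two tokens and then flattens, which is the qualitative shape reported for both experiments.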
Discussion
The pattern found with words drawn from a large corpus
(Experiment 1) was repeated with tones drawn from a small
corpus. Granted, the effect of stimulus type (tone vs. word)
and corpus size are confounded, but the key characteristic of
Experiment 2 is the similarity of the shape of the relationship
between token set size and the degree of disruption. This
suggests a functional similarity in the action of tones and
speech (see Jones & Macken, 1993; but see LeCompte,
Neely, & Wilson, 1997).
Just as in Experiment 1, the difference between the effect
of the number of tokens remained constant over successive
blocks of trials, which can be taken as not supporting the
habituation hypothesis. However, as before, it is possible to
argue that this comparison is a somewhat unfair test of
habituation because auditory conditions were randomized,
and hence the opportunity to build up a stable representation
of the auditory stimuli was rather limited. Experiment 3
addressed this issue by blocking the auditory conditions.
It is important to acknowledge at this point that the design
of Experiment 3 represents a departure from the concerns
that were the initial motivating force for the current series.
One of the main purposes of the series was to understand the
body of data that consistently showed a changing state
effect. These data were derived from designs in which
conditions were randomized as a means of minimizing any
possible effect of habituation. A blocked design diverges
from this tradition, and the relevance of the result to the
original purpose of the series is potentially doubtful. One
previous study had examined the effect of blocking auditory
conditions and found no evidence of habituation-like diminution in response over 20 trials (Jones et al., 1997). The main
purpose of Experiment 3, therefore, was to contrast critical
conditions of Experiments 1 and 2 with an extended
exposure to the token set but with a design in which
conditions are blocked.
Experiment 3
In this experiment, three critical auditory conditions from
the previous two experiments were examined in relation to
error in serial recall: The effects of one, two, and seven
tokens were compared with quiet. Given that Jones et al.
(1997) found no evidence of habituation with speech over
exposure to 20 consecutive trials, the number of trials was
increased to 30. Further, to avoid the possible contaminating
effects of other conditions in which competing neural
models are built up, a between-subjects design was used.
The main prediction from the habituation hypothesis is that
the degree of disruption should diminish as the exposure
increases, and further, that this should interact with token set
size so that the rate of decline in the level of disruption will
be greater for two tokens than for seven tokens over the 30
trials. Unlike in Experiment 1, but like in Experiment 2, the
identity of the tokens remained fixed in each condition. The
rationale for this was the same as for Experiment 2, namely
that according to the habituation hypothesis it should
promote the fabrication of few, relatively enduring neural
models.
Method
Participants. Eighty undergraduate students were given either
course credit or a small honorarium for their participation in the
experiment (20 participants were assigned randomly to each of four
conditions). All reported normal hearing and normal or corrected-to-normal vision.
Apparatus and materials. The memory task was identical to
that used in the other experiments of the series.
Three conditions containing irrelevant sound, this time speech,
were contrasted: a sequence of one token (the word cent), a
sequence of two different tokens (the words car and verb), and a
sequence of seven different tokens (the words turn, kilt, band, jaw,
fruit, rod, and porch). As before, the recorded tokens were played at
a rate of two per second.
Experimental design. A mixed design was used in which
auditory condition represented a between-subjects variable, and
block of trials and serial position both represented within-subject
variables. Participants received five blocks of six trials of one
condition. Four conditions were contrasted: three in which sound
was presented (one, two, and seven tokens), and a quiet control.
Procedure. This was identical to that used in Experiments 1
and 2, except that participants undertook 30 trials of only one of the
four conditions. As in previous experiments, the sound was not
played continuously over trials. The procedure took 30 min.
Results
In this analysis, 30 trials were divided into five blocks of
six trials to examine changes over each block. Serial order
error data were submitted to a three-variable (4 × 5 × 9)
ANOVA with one between-subjects variable (auditory condition, four levels) and two within-subject variables (block,
five levels; serial position, nine levels). Auditory condition
produced a significant effect, F(3, 76) = 11.25, MSE =
38.15, p < .0001, as did serial position, F(8, 608) = 87.51,
MSE = 2.71, p < .0001, and block, F(4, 304) = 10.07,
MSE = 3.71, p < .0001. There were no significant
interactions (all Fs ≈ 1). The effect of serial position took
roughly the same form as others in the current series, with
21.7% errors at the first serial position, rising to 60.0% and
eventually falling to 47.4% at the last serial position.
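The degrees of freedom reported above can be checked against the design (four groups of 20 participants, five blocks, nine serial positions). The following is a minimal sketch for main effects only; the function name and dictionary keys are illustrative and not part of the original analysis.

```python
def mixed_design_df(n_per_group, n_groups, within_levels):
    """Main-effect degrees of freedom for a mixed ANOVA with one
    between-subjects factor and any number of within-subject factors.

    Returns {factor: (df_effect, df_error)}.
    """
    n_subjects = n_per_group * n_groups
    # Error term for the between-subjects factor: subjects within groups.
    error_between = n_subjects - n_groups
    df = {"auditory_condition": (n_groups - 1, error_between)}
    for name, levels in within_levels.items():
        # Each within-subject main effect is tested against its
        # factor-by-subjects-within-groups interaction.
        df[name] = (levels - 1, (levels - 1) * error_between)
    return df


exp3_df = mixed_design_df(20, 4, {"block": 5, "serial_position": 9})
for factor, (effect, error) in exp3_df.items():
    print(f"{factor}: F({effect}, {error})")
```

Under these assumptions the sketch yields F(3, 76), F(4, 304), and F(8, 608), which agree with the values reported for auditory condition, block, and serial position.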
An orthogonal set of planned comparisons indicated that
performance in the two-token condition was worse than in
the one-token condition, F(1, 76) = 9.09, p < .01, Cohen's
d = 1.02; but comparisons between quiet and one token,
F(1, 76) = 0.07, p > .05, Cohen's d = .07, and also between
two tokens and seven tokens, were not significant, F(1, 76) =
2.54, p > .05, Cohen's d = .63 (where significance is
reached at d = .77). Although the contrast between two
tokens and seven tokens presents a medium effect size, the
difference is modest relative to the large effect size of the
critical contrast between one token and two tokens. This
pattern of results (seven tokens ≈ two tokens > one
token ≈ quiet) supports previous results (with the exception
that the difference between one token and quiet is nonsignificant). Because there were only three levels of token set size,
and because these levels were not equally spaced, no
analyses of trend were performed (see Keppel, 1988).
A greater exposure to the irrelevant material (30 trials
instead of 20), and the fact that conditions were blocked by
groups, should have provided more opportunity for habituation, but there was no evidence of an habituation-like effect.
Following predictions related to the framework of Sokolov
(1963), an interaction between block and condition was
expected, but this interaction did not even approach significance (F < 1). The significant effect of block could be
explained by a practice effect or an improvement in doing
the recall task whatever the condition. Figure 3 shows the
performance over time, with no evidence of an habituation
effect: Differences between two tokens and quiet and
between seven tokens and quiet remained numerically
steady over the five blocks of trials.
Discussion
In terms of the effect over blocks, the results of Experiment 3 converge with the results of Experiments 1 and 2 of
the current series and also with results found elsewhere.
Hellbrück et al. (1996) found changing state speech disruption within- and between-experimental sessions, thus suggesting a long-term stability of the detrimental effect. Jones et al.
(1997) modified a preexposure method previously used by
Morris and Jones (1990) and found that after a 20-min
exposure to changing state material, the same material still
disrupted serial recall performance.
Figure 3. Percentage of serial recall errors in relation to successive blocks of trials under auditory conditions in Experiment 3. [Plot not reproduced: x-axis, Block (6 trials); series, Quiet, 1 Token, 2 Tokens, 7 Tokens.]
The first three experiments of the series show that the
most marked increase in disruption occurred as the number
of tokens increased from one to two. Although this is
encouraging for the changing state hypothesis, though by no
means wholly supportive of it given the small numerical
increase in error as the token set rises above two, it seems
reasonable at this stage to seek further evidence from a
different experimental manipulation. In Experiment 4, therefore, a slightly different approach was adopted. The action of
two variables was investigated. The first relates to the rate of
transition between different tokens and the second relates to
the predictability of the token transitions. Two rates of
transition were compared: The token (word) was changed
either at a low rate, every three stimuli (e.g., AAABBBCCC
etc.) or at a high rate, after every stimulus (e.g., ABC etc.).
The changing state hypothesis predicts that as the rate of
transition increases so does the degree of disruption; on
average the degree of mismatch between two successive
stimuli is greater in the case of a high transition rate, which
will therefore produce a greater degree of disruption. The
predictability of token transitions was studied by comparing
sequences of three tokens that appeared either in a fixed
order (ABCABCABC etc.) or in a random order (ACBCABABC etc.). According to the changing state hypothesis,
two conditions sharing the same rate of transition, irrespective of the overall sequence of the tokens, should produce
broadly similar effects on the grounds that the crucial
variable is the contrast between successive tokens, not the
repetition of the supratoken pattern (a result that has some
precedent: Jones, Madden, & Miles, 1992).
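The three sequence types contrasted above can be sketched as follows. This is an illustration only, not the procedure used to build the recorded stimuli: the function names are hypothetical, the letters A, B, and C stand in for the three words, and the random sequence is assumed to exclude immediate repetitions (as the example ACBCABABC suggests).

```python
import random


def low_rate(tokens, repeats=3, length=9):
    """Low transition rate: each token repeated before a change (AAABBBCCC...)."""
    seq = []
    i = 0
    while len(seq) < length:
        seq.extend([tokens[i % len(tokens)]] * repeats)
        i += 1
    return seq[:length]


def high_rate_fixed(tokens, length=9):
    """High transition rate, predictable order (ABCABCABC...)."""
    return [tokens[i % len(tokens)] for i in range(length)]


def high_rate_random(tokens, length=9, rng=None):
    """High transition rate, unpredictable order, with no immediate repeats."""
    rng = rng or random.Random()
    seq = [rng.choice(tokens)]
    while len(seq) < length:
        # Draw only from tokens that differ from the previous one.
        seq.append(rng.choice([t for t in tokens if t != seq[-1]]))
    return seq


def count_transitions(seq):
    """Number of positions at which a token differs from its predecessor."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


tokens = list("ABC")
for name, seq in [
    ("low", low_rate(tokens)),
    ("high, fixed", high_rate_fixed(tokens)),
    ("high, random", high_rate_random(tokens, rng=random.Random(0))),
]:
    print(name, "".join(seq), count_transitions(seq))
```

For a nine-token stretch, the low-rate sequence contains two transitions, whereas both high-rate sequences contain eight. The fixed and random high-rate sequences therefore differ only in the predictability of the order, which is the comparison on which the changing state hypothesis predicts broadly similar effects.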
Experiment 4
In Experiment 4, the effects of rate of transition (contrasting two conditions, high and low rate) and predictability
(two conditions, fixed vs. random) were compared. In each
case, the auditory sequences were based on a common token
set (in terms of both set size and rate of presentation). In
addition, two control conditions were included, a steady
state condition, in which the same token was repeated, and a
quiet condition.
Method
Participants. Forty undergraduates who received either course
credit or a small honorarium participated in the study. They were all
native English speakers and reported normal hearing and normal or
corrected-to-normal vision.
Apparatus and materials. The memory task was the same as
that for the three previous experiments.
Five conditions were used in total. In three of the conditions,
auditory sequences were constructed with a different order of the
same three monosyllabic words (turn, kilt, and band). The words
were chosen to be phonologically distinct. Rate of transition was
manipulated by varying the degree of repetition of words; they
were either repeated three times before a change or changed from
token to token. The predictability of the sequence was changed by
having either a fixed sequence of words or by having a quasirandom sequence (the sequence was constrained so that no two
successive tokens were identical). Two control conditions were
used: a steady state sequence with one word (jaw) repeated and a
quiet condition. Auditory stimuli were recorded and presented as in
Experiment 1.
Experimental design. A repeated measures design was used, in
which the presentation of conditions was blocked in groups of 15
trials and the order of conditions was dictated by the order given in
a Latin square. Participants received 15 trials in each condition, for
a total of 75 trials.
Procedure. This was identical to that used in Experiments 1
and 2, except that there were five conditions and trials were
blocked. For each participant, the experiment took 50 min.
Results and Discussion
Table 1
Mean Percentage of Errors Under Conditions of Experiment 4

Condition                   M      SE
Quiet control               38.0   3.0
Steady state                41.3   2.7
Low transition              45.2   2.5
High transition (fixed)     46.9   2.6
High transition (random)    46.4   2.6

The means of number recalled correct as a function of the
auditory conditions are shown in Table 1. In numerical
terms, the experiment resulted in the following pattern: high
transition > low transition > steady state. However, some
critical tests of these differences were not statistically
significant. As revealed by a 5 (auditory condition) × 9
(serial position) × 3 (blocks of trials) repeated-measures
ANOVA, all three main effects were significant: auditory
conditions, F(4, 156) = 7.03, MSE = 5.43, p < .001; serial
position, F(8, 312) = 71.19, MSE = 4.96, p < .0001; and
block, F(2, 78) = 4.36, MSE = 3.25, p < .05, but no
interaction was significant (Fs ≈ 1). The serial position
effect was similar to that found elsewhere, with 16.2% errors
in the first serial position, rising to 61.7% errors later in the
list and falling to 51.1% at the final serial position.
According to planned comparisons within the auditory
conditions variable, the effect of predictability was small
and nonsignificant. Indeed, the effect of fixed and random
sequences on errors appears to be nearly identical, F(1,
156) = 0.72, p = .79, Cohen's d = .04. Both the fixed and
random conditions were significantly different from the
steady state condition, F(1, 156) = 10.52, p < .001, Cohen's
d = .52. As in many irrelevant speech experiments, the
contrast between quiet and steady state conditions failed to
produce a significant difference, F(1, 156) = 3.30, p = .07,
Cohen's d = .29.
The effect of rate of transition, pivotal to the predictions
of the changing state hypothesis, was revealed by comparing
the high transition conditions (fixed or random) or steady
state sequence conditions with the low transition condition.
The order of means shows an increasing degree of error
going from steady to low transition to high transition.
However, only the contrast between low transition versus
steady state was significant, F(1, 156) = 4.20, p < .05; the
contrast of low transition with high transition (random and
fixed pooled scores) was not significant, F(1, 156) = 0.82,
p = .36.
The overall ordering of conditions in Experiment 4 was as
predicted by the changing state hypothesis: A small effect of
steady state compared to quiet, an increase in errors for
conditions in which the rate of transition of tokens was low,
and errors increasing still further for high transition sequences, but increasing the unpredictability of these high
transition sequences did not further augment the disruption.
However, critical tests of significance--between low and
high transition sequences--did not lend sufficient weight for
this pattern of results to be taken as support for the changing
state hypothesis. Clearly, the results of Experiment 4 are as
equivocal with respect to the changing state hypothesis as
they are to the habituation hypothesis.
In Experiment 5, further convergent evidence was sought
for the idea that transitions within the auditory stream are the
important determinant of disruption by irrelevant sound.
Instead of manipulating the number of transitions by modifying the auditory sequence, the number of tokens per unit of
time was manipulated.
Experiment 5
For an auditory sequence in which there is change, the
degree of disruption is dependent on the word dose, in other
words, on the number of tokens presented during the course
of a trial (Bridges & Jones, 1996). According to the changing
state hypothesis, the more occasions on which there is a
change between two successive tokens, the greater the
disruptive effect of irrelevant sound; that is, more cues to
seriation with high dose results in greater disruption of serial
recall. However, the opposite result can be predicted from
the standpoint of habituation. Simply, the more times a
stimulus is repeated, the quicker the OR should habituate
and, hence, the disruption of serial recall should be reduced
in high dose conditions.
To test this prediction, we examined in Experiment 5 the
interaction of word dose and the token set size in a sequence
of sound. In essence, the 2 × 2 interaction of dose (high vs.
low) and token set (two or six tokens) is a pivotal element of
the analysis. It is possible to suggest that the habituation
hypothesis will predict a significant interaction. It might be
that the disruptive effect of irrelevant sound will be greater
for the low dose at the slowest rate and, correspondingly, it
will be lower for the highest dose at the fastest rate. By
contrast, the changing state hypothesis can be taken to
predict no significant interaction: There will be a main effect
of dose, and no effect of token set size (remembering the
precedent from earlier experiments in the series contrasting
two- and six-token sets).
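The two predictions differ in the interaction term of the 2 × 2 design, which can be expressed as a difference of differences. The sketch below uses purely hypothetical cell values (percentage of errors) chosen to illustrate each pattern; they are not data from the experiment, and the function name is illustrative.

```python
def interaction_contrast(cells):
    """Difference of differences for a 2 (token set) x 2 (dose) design.

    `cells` maps (token_set, dose) to a mean error percentage.
    A value of zero means the dose effect is the same for both set sizes.
    """
    dose_effect_two = cells[("two", "high")] - cells[("two", "low")]
    dose_effect_six = cells[("six", "high")] - cells[("six", "low")]
    return dose_effect_two - dose_effect_six


# HYPOTHETICAL values: main effect of dose only (changing state pattern).
changing_state = {
    ("two", "low"): 40, ("two", "high"): 48,
    ("six", "low"): 41, ("six", "high"): 49,
}

# HYPOTHETICAL values: high dose reduces disruption mainly for the
# small token set (habituation pattern).
habituation = {
    ("two", "low"): 46, ("two", "high"): 42,
    ("six", "low"): 47, ("six", "high"): 47,
}

print(interaction_contrast(changing_state))
print(interaction_contrast(habituation))
```

With these illustrative values the contrast is zero under the changing state pattern and nonzero under the habituation pattern, which is the sense in which the interaction is pivotal.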
In addition to the conditions comprising the critical 2 × 2
interaction, two control conditions (quiet and single token,
but only at high dose to reduce the overall numbers of
treatments to comfortable levels for participants) were also
used.
Method
Participants. Twenty-four native English speakers volunteered
to take part in the study in return for a small honorarium. All
participants reported normal hearing and normal or corrected-to-normal vision.
Apparatus and materials. The memory task was the same as
that in previous experiments.
Two changing state speech sequences were used, namely, a
sequence of two different tokens (the words car and verb) and a
sequence of six different tokens (the words turn, kilt, band, jaw,
fruit, and rod). The variable, number of tokens in a stream (two or
six tokens), was combined with the word dose (high or low)
variable. All words were edited, using digital signal-processing
techniques, to last 400 ms, but the length of the gap between them
was varied: 100 ms for the high-dose sequences and 600 ms for the
low-dose sequences. Thus, for the two high-dose conditions the
recorded tokens (two and six different tokens) were played at a rate
of two per second, and for the two low-dose conditions the
recorded tokens were played at a rate of one per second. The
control conditions consisted of a quiet condition and also a steady
state condition with one word (cent) repeatedly played at a rate of
two tokens per second (the high-dose rate).
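The presentation rates follow directly from the token duration and the gap between tokens. A minimal check, with an illustrative function name:

```python
def tokens_per_second(token_ms, gap_ms):
    """Presentation rate when each token is followed by a silent gap."""
    return 1000 / (token_ms + gap_ms)


high_dose_rate = tokens_per_second(400, 100)  # 400-ms word + 100-ms gap
low_dose_rate = tokens_per_second(400, 600)   # 400-ms word + 600-ms gap
print(high_dose_rate, low_dose_rate)
```

The onset-to-onset interval is 500 ms at high dose and 1,000 ms at low dose, giving two tokens and one token per second, respectively.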
Experimental design. A repeated-measures design was used, in
which the presentation of trials was blocked. Six auditory conditions of 15 blocked trials were contrasted: quiet, one token at high
dose, two tokens at low dose, two tokens at high dose, six tokens at
low dose, and six tokens at high dose. The order of conditions was
randomized between subjects in such a way that each condition was
presented at each position the same number of times. Participants
undertook 90 trials in all.
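One way to satisfy the constraint that each condition appears at each ordinal position equally often is a cyclic Latin square. The text does not state which scheme was actually used, so the following is a sketch under that assumption; the condition labels and function names are illustrative.

```python
from collections import Counter

CONDITIONS = [
    "quiet", "one token, high dose",
    "two tokens, low dose", "two tokens, high dose",
    "six tokens, low dose", "six tokens, high dose",
]


def cyclic_latin_square(items):
    """Each row is the previous row rotated by one position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_participants, items):
    """Cycle through the rows of the square, one row per participant."""
    square = cyclic_latin_square(items)
    return [square[p % len(square)] for p in range(n_participants)]


orders = assign_orders(24, CONDITIONS)

# Count how often each condition occupies each ordinal position.
position_counts = Counter(
    (position, condition)
    for order in orders
    for position, condition in enumerate(order)
)
print(set(position_counts.values()))
```

With 24 participants and six conditions, every condition occupies every position four times. A cyclic square does not balance which condition precedes which, so a different scheme would be needed if carryover between specific conditions were a concern.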
Procedure. This was identical to that used in Experiments 1
and 2, except that there were six conditions and trials were blocked.
The experiment lasted 60 min.
Results
As in previous experiments, participants' responses were
scored according to the strict serial recall criterion. An
overall analysis incorporating all of the auditory conditions
was computed first. This involved a 3 (block) × 6 (auditory
condition) × 9 (serial position) repeated-measures ANOVA.
There were significant effects of auditory condition, F(5,
115) = 13.23, MSE = 5.25, p < .0001, and serial position,
F(8, 184) = 35.91, MSE = 3.62, p < .0001, and a modestly
significant effect of block, F(2, 46) = 4.99, MSE = 2.29,
p < .05. Except for the significant interaction between
auditory condition and serial position, F(40, 920) = 1.98,
MSE = 0.85, p < .05 (see Figure 4), there were no
significant interactions (Fs ≈ 1). As revealed by a test of
contrast with quiet, the one-token condition showed a
numerically very small and nonsignificant disruptive effect
on serial recall, F(1, 23) = 1.72, MSE = 2.59, p > .05,
Cohen's d = .26.
Figure 5. Percentage of serial recall errors under dose and token set size conditions in Experiment 5, pooled over serial position and blocks. [Plot not reproduced: x-axis, Auditory Conditions (Low Dose, High Dose).]
The data were then submitted to a 2 (number of tokens:
two and six) × 2 (word dose: low and high) repeated-measures ANOVA. A significant effect of word dose was
obtained, F(1, 23) = 13.62, MSE = 51.41, p < .005,
Cohen's d = .75, and the effect of the number of tokens was
nonsignificant, F(1, 23) = 0.68, MSE = 58.77, p > .05,
Cohen's d = .17. Of particular importance, there was no
interaction between the number of tokens and the word dose
(F < 1; see Figure 5).
Discussion
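Figure 4. The interaction of serial position and auditory conditions in Experiment 5. [Plot not reproduced: x-axis, Serial Position; series, Quiet, 1 Token, and 2 Tokens and 6 Tokens at low and high dose.]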
The results of Experiment 5 reinforce the conclusion that
word dose is an influential determinant of the irrelevant
sound effect (Bridges & Jones, 1996). This in itself constitutes evidence that is difficult to explain by the version of the
habituation hypothesis used here; as the rate goes up, the
fabrication of the neural model should be accelerated and,
hence, the OR should become progressively diminished.
This result can be predicted by the changing state hypothesis, however: More frequent changes between tokens provide more cues to seriation and, hence, stronger interference
with serial recall (see Bridges & Jones, 1996, for a discussion). A second feature of the data--the failure of word dose
to interact with set size--is also difficult to explain with the
habituation hypothesis. It was expected that the rate of
habituation would be greater when there were fewer different tokens (when the set size was small) and that this trend
would be more marked when the dose was higher simply on
the basis that there were more tokens of a particular class at
the higher dose to help fashion neural models.
Therefore, these findings generally give some support to
the changing state hypothesis: A change between two
successive sound units appears to be the main precondition
for the disruptive effect of sound on serial recall; moreover,
the more changes that occur during the time course of a trial,
the higher the degree of disruption, these effects being
independent of each other.
As in Experiment 3, the contrast between one-token and
quiet conditions was numerically very small and statistically
nonsignificant. These results might be taken to suggest (at
least tentatively) that significant differences between the
one-token and the quiet conditions are characteristic of
randomized designs, and not of blocked designs. What this
means in more abstract terms is not clear, but one common
denominator for a randomized design in contrast to a
blocked design is that the auditory stimuli at the end of one
trial are different than those at the beginning of the next. So,
for a one-token condition in a randomized design, the first
few stimuli represent a contrast to the last auditory stimuli
that were presented, and in a blocked design they do not. In
future studies, it may be useful to pursue these ideas beyond
the level of conjecture. At present they serve as interesting
observations that are somewhat tangential to the main thrust
of this article.
General Discussion
We may summarize the key features of the experimental
series as follows. Experiment 1 showed that disruption of
serial recall increased as the number of tokens in the
irrelevant stream increased from one to two, but as the
number of tokens increased above that, to three, then to five
and to seven, there was a small numerical increase in
disruption that was nonsignificant. When performance was
charted in terms of the successive trials in each condition,
this effect of tokens in relation to the quiet and the one-token
condition remained stable. Experiment 2 used a small set of
tones to test the generality of the effect found in Experiment
1, and the effects in form (but not in magnitude) were almost
identical. The results of Experiments 1 and 2 are generally,
but not completely, in line with the changing state hypothesis. The largest increase in disruption occurred when the set
size went from one to two; there was a small rise in
disruption thereafter, as token set size increased, but the rise
was nonsignificant. Even though the pattern of statistical
significance is in line with the changing state hypothesis in
interpreting the outcome of Experiments 1 and 2, we place
greater emphasis on the shape of the function. The fact that
we obtained the same shape of function for tones and for
speech we take as an illustration of the functional equivalence of these two types of stimuli (Jones & Macken, 1993).
In addition, we draw reassurance from the fact that the tones
shared the same shape of function, while being at a lower
overall level than the speech tokens, insofar as they logically
could not both be a result of a ceiling effect. Experiment 3 used a
design in which the experimental conditions were blocked; this
was designed to increase the likelihood that habituation
would occur, but again in its general form the outcome was
the same as in Experiments 1 and 2.
In contrast, Experiment 4 offered little support for the
changing state hypothesis. As the rate of transition was
varied, the degree of disruption did not vary significantly.
Only the prediction that fixed and random sequences yield
identical degrees of disruption is borne out by the data, a
finding to which the habituation approach does not directly
speak. Much further work is required to reconcile these data
with the others in the series, possibly by using the parametric
approach to construct some kind of function following the
general approach of Experiments 1 and 2 of the current
series. As they stand, the results of Experiment 4 detract
from the overall coherence of the current series, but they
constitute a salutary sign that further work is required on
refining the analysis of the rates of transition.
Finally, in Experiment 5, the interplay of two variables--the dose and the token set size within the auditory sequence--was investigated. It was suggested that from the habituation
viewpoint, the two variables should interact, with less
disruption of serial recall when the dose was high and the
token set was small. By contrast, the changing state hypothesis predicted that there should be no interaction, only a
main effect of dose and no effect of token set (remembering
that in the crucial test of the hypothesis in Experiment 5, two
tokens were contrasted with six tokens). The results were in
line with the changing state hypothesis.
In none of the experiments in the current series is there
any evidence of a change in the magnitude of the difference
between the conditions as a function of the number of trials
undertaken. This is true whether conditions were blocked or
randomized. Generally, therefore, there seems to be no
evidence of diminution of the irrelevant sound response over
time. When the effect of irrelevant sound is charted on a
trial-by-trial basis, a similar picture emerges (Hellbrück et
al., 1996), which suggests that the process of pooling the
results of trials, which has been the common procedure in
the current series, has not concealed a very short-lived
habituation effect. Whether diminution over time is a critical
prediction of the habituation hypothesis is difficult to judge
given that classic experiments on habituation of the OR use
stimuli that occur far more infrequently than is typical of the
irrelevant sound paradigm. If nothing else, however, the
results over blocks show that the effect is not evanescent,
which in turn means that the irrelevant sound effect may not
be an epiphenomenon of laboratory life and therefore will
have implications for practical settings.
The main impact of the current findings is to provide some
support for the changing state hypothesis, but not unequivocal support in view of the outcome of Experiment 4. Just
why the effect of rate of transition failed to produce the expected outcome is not clear, and as it stands the result can be
regarded as being moot, in view of the fact that it also failed
to give unequivocal support for the habituation hypothesis.
On the basis of the current series, further work in
clarifying the effect of token set size and particularly
transition rate seems warranted. In addition, further refinement of habituation theory is also needed. Jones et al. (1997)
have already discussed the main difficulties with the application
of habituation-based theories to the irrelevant sound
paradigm; they are, therefore, only summarized here. Apart
from the problems with the clarity of articulation of the
theory (see Gati & Ben Shakhar, 1990, for a critique), there
are doubts based on empirical evidence as to whether
habituation of the OR occurs when the participant is
undertaking a demanding information-processing task. Most
observations of the OR have been undertaken in settings in
which the participants have been listening passively. Studies
of the OR in settings in which attention is directed away
from the sound to some information-processing task have
yielded equivocal conclusions. Indeed, Öhman (1979) concluded that "habituation to the OR . . . requires central
processing capacity [so that] little habituation of the OR
would occur in situations involving heavy processing demands because of subsidiary tasks" (p. 466). The controversy over habituation of unattended stimuli is particularly
acute in studies of evoked potentials. Typically, participants
engage in a reading task while recordings are made of the
evoked responses of the brain to irrelevant auditory stimuli.
Research has been focused on the component known as
"mismatch negativity," which is evoked to a stimulus
distinct in some acoustic feature from a stimulus or stimulus
sequence that preceded it (Lyytinen, Blomberg, & Näätänen,
1992; Näätänen, 1990, 1991, 1992), usually a few seconds
before (Mäntysalo & Näätänen, 1987). The similarities of
this domain of research to that of irrelevant sound are
striking, both empirically and conceptually, but for the
purposes of the present discussion it is important to note that
here too it has been suggested that habituation without
attention does not occur (Trejo, Ryan-Jones, & Kramer,
1992; Woldorff et al., 1993), although it should be noted also
that this issue is controversial, at least for pitch (Alho,
Woods, & Algazi, 1994).
The impact of the current findings on theory is not
confined to habituation and to the changing state hypothesis,
however; the results also run counter to the application of
temporal distinctiveness theory to the irrelevant sound effect
(Jones & Macken, 1997b; LeCompte, 1996). The temporal
distinctiveness account is based on the idea that retrieval of
an item depends on the search set within which it is
contained (see Glenberg, 1987; Glenberg & Swanson,
1986). If the item shares the search set with others, such as
might result from irrelevant sounds at the time of encoding
of a to-be-remembered item, then it becomes more difficult
to retrieve. However, to explain the greater disruption by
changing as opposed to steady state sequences, the theory
has to incorporate the auxiliary assumption that "repetitions
of the same item (or different items that share many
characteristics) will overload the temporal search set, but
this overload will be limited by the large redundancy
inherent in that repetition" (LeCompte, 1996, p. 1163). One
prediction that should follow from the redundancy assumption is that of a monotonic increase in disruption with an
increase in set size, an expectation that is not strongly
supported by the results of the current series.
Little has yet been said about the mechanisms relating
the stimulus mismatch to the resulting disruption of serial
recall. Essentially, the disruption of serial recall is argued to
arise from a conflict of order cues. These cues are derived
from two sources: one stemming from the stimulus mismatch of sounds in the auditory domain, and the other from
the linkages between items in the to-be-remembered stream.
The effect stems not from a similarity of content of the
irrelevant and to-be-remembered information (as Salamé &
Baddeley, 1982, 1989, have suggested); rather, it stems from
similarities in process, related to the representation of order.
That the effect is not due to the similarity of content has been
demonstrated in a variety of ways; for example, the similarity explanation is contraindicated by the mere fact that
significant degrees of disruption are produced by an array of
nonspeech sounds such as tones (Macken & Jones, 1995),
pitch glides (Jones et al., 1993), or bursts of noise (Jones &
Macken, 1997a) with minimal similarity to the syllables of
the memory task. The same conclusion can be reached if the
similarity of the content of the auditory sequence is manipulated systematically so that it is either similar to the to-be-remembered sequence or to other tokens in the auditory sequence; in this case, similarity within the auditory sequence
seems to dictate the disruption (Jones & Macken, 1995c).
The strong effect of within-stream similarity as opposed to
between-stream similarity resonates well with the results of
the current series, given that we conclude that similarity
within the token set is the primary determinant of disruption.
Although it seems plausible to talk about linkages between items in terms of deliberate rehearsal of verbal item
sequences in a short-term memory paradigm--indeed, they
form a central plank of several models of memory (see, e.g.,
Murdock, 1993, 1996)--proposing a kindred mechanism for
generating linkages amongst unattended auditory stimuli
seems at first to be rather curious. The key to understanding the
role of a linkage mechanism for the auditory material lies
in the realization that, even when unattended, sounds are
organized into streams on the basis of their physical properties,
just as when they are the subject of directed attention (a
phenomenon referred to as auditory streaming; see Bregman, 1978, 1990). Several studies have now demonstrated
that the degree of disruption by irrelevant sound can be
modulated by organizational factors such as the physical
location of the sound (see Jones & Macken, 1995a, 1995b).
Therefore, part of the action of information processing of the
unattended sound is its organization into streams; our
speculation is that the process of stimulus mismatch contributes to the formation of auditory streams. The fine-grained
detail of the action of such a mechanism has not been
worked out, but early speculations suggest that a process of
correlation between successive tokens may be responsible.
It seems plausible that these specific effects within the
irrelevant sound paradigm inform the general understanding
of cognitive architecture. Results from the irrelevant sound
paradigm are in line with the idea that short-term memory
phenomena are best regarded as an activated subset of
long-term memory, rather than as a distinct repository to and
from which representations are transported (see also Cowan,
1988, 1995). The demonstration that the damage due to
irrelevant sound occurs in a postencoding stage testifies as
much (Miles et al., 1991). If sounds within the irrelevant
sound paradigm are not deliberately attended to, then in
some sense they are simply registered, and this could
conventionally be regarded as a function restricted to the
perceptual level of representation. By contrast, the processing undertaken in a serial recall task is under volitional
control; moreover, the material is represented at the postcategorical level, and this would be regarded as a memory
effect. If we accept this characterization of each factor, then
we have evidence of interference between two activities that
apparently occur at different stages in the processing chain.
One of the main ways in which apparent conflict could be
reconciled is by supposing that in fact the distinction
between perception and memory is a false one, and that
irrelevant sound interferes with the process involved in the
activation of representations. Stage theories (e.g., the modal
model of memory; see Baddeley, 1986, 1990) cope with this
shift in theoretical emphasis much less well than procedural
theories (e.g., Crowder, 1989, 1993; Kolers & Roediger,
1979). Arguably, therefore, the results from irrelevant sound
paradigms have contributed to the debate about the appropriate general descriptions of phenomena of memory and their
relation to attention. Further resolution of the role of
habituation needs to be made, however, before a complete
account of the relation of attention to memory is realized.
References
Alho, K., Woods, D. L., & Algazi, A. (1994). Processing of
auditory stimuli during auditory and visual attention as revealed
by event-related potentials. Psychophysiology, 31, 469-479.
Baddeley, A. D. (1986). Working memory. Oxford, England:
Clarendon Press.
Baddeley, A. D. (1990). Human memory: Theory and practice.
Hillsdale, NJ: Erlbaum.
Baddeley, A. D., & Salamé, P. (1986). The unattended speech
effect: Perception or memory? Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 525-529.
Beaman, C. P., & Jones, D. M. (1997). The role of serial order in the
irrelevant speech effect: Tests of the changing state hypothesis.
Journal of Experimental Psychology: Learning, Memory, and
Cognition, 23, 459-471.
Beaman, C. P., & Jones, D. M. (in press). Irrelevant sound disrupts
order information in free as in serial recall. Quarterly Journal of
Experimental Psychology.
Boyle, R., & Coltheart, V. (1996). Effects of irrelevant sounds on
phonological coding in reading comprehension and short-term
memory. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 49(A), 398-416.
Bregman, A. S. (1978). Auditory streaming is cumulative. Journal
of Experimental Psychology: Human Perception and Performance, 4, 380-387.
Bregman, A. S. (1990). Auditory scene analysis: The perceptual
organization of sound. Cambridge, MA: MIT Press.
Bridges, A. M., & Jones, D. M. (1996). Word dose in the disruption
of serial recall by irrelevant speech: Phonological confusions or
changing state? Quarterly Journal of Experimental Psychology:
Human Experimental Psychology, 49(A), 919-939.
Broadbent, D. E. (1958). Perception and communication. Oxford,
England: Pergamon Press.
Buchner, A., Irmen, L., & Erdfelder, E. (1996). On the irrelevance
of semantic information for the "irrelevant speech" effect.
Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 49(A), 765-779.
Colle, H. A. (1980). Auditory encoding in visual short-term recall:
Effects of noise intensity and spatial location. Journal of Verbal
Learning and Verbal Behavior, 19, 722-735.
Colle, H. A., & Welsh, A. (1976). Acoustic masking in primary
memory. Journal of Verbal Learning and Verbal Behavior, 15,
17-31.
Cowan, N. (1988). Evolving conceptions of memory storage,
selective attention, and their mutual constraints within the
human information processing system. Psychological Bulletin,
104, 163-191.
Cowan, N. (1995). Attention and memory: An integrated framework. Oxford, England: Oxford University Press.
Crowder, R. G. (1989). Imagery for musical timbre. Journal of
Experimental Psychology: Human Perception and Performance,
15, 472-478.
Crowder, R. G. (1993). Auditory memory. In S. McAdams & E.
Bigand (Eds.), Thinking in sound (pp. 113-145). Oxford,
England: Oxford University Press.
Ellermeier, W., & Zimmer, K. (1997). Individual differences in
susceptibility to the irrelevant speech effect. Journal of the
Acoustical Society of America, 102, 2191-2199.
Furedy, J. J. (1968). Human orienting reactions as a function of
electrodermal versus plethysmographic response modes and
single versus alternating stimulus series. Journal of Experimental Psychology, 77, 70-78.
Gati, I., & Ben-Shakhar, G. (1990). Novelty and significance in
orientation and habituation: A feature-matching approach. Journal of Experimental Psychology: General, 119, 251-263.
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition,
imagery, concreteness, familiarity, and ambiguity measures for
1944 words. Behavior Research Methods and Instrumentation,
12, 395-427.
Glenberg, A. M. (1987). Temporal context and memory. In D. S.
Gorfein & R. R. Hoffman (Eds.), Memory and learning: The
Ebbinghaus centennial conference (pp. 173-190). Hillsdale, NJ:
Erlbaum.
Glenberg, A. M., & Swanson, N. C. (1986). A temporal distinctiveness theory of recency and modality effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 3-15.
Handel, S. (1989). Listening. Cambridge, MA: MIT Press.
Hellbrück, J., Kuwano, S., & Namba, S. (1996). Irrelevant speech
and human performance: Is there long-term habituation? Journal
of Acoustical Society of Japan, 17, 239-247.
Jones, D. M. (1993). Objects, streams, and threads of auditory
attention. In A. Baddeley & L. Weiskrantz (Eds.), Attention:
Selection, awareness, and control (pp. 87-104). Oxford, England: Clarendon Press.
Jones, D. M. (1994). Disruption of memory for lipread lists by
irrelevant speech: Further support for the changing state hypothesis. Quarterly Journal of Experimental Psychology: Human
Experimental Psychology, 47(A), 143-160.
Jones, D. M. (1995). The fate of the unattended stimulus: Irrelevant
speech and cognition. Applied Cognitive Psychology, 9, 23-38.
Jones, D. M., Beaman, C. P., & Macken, W. J. (1996). The
object-oriented episodic record model. In S. E. Gathercole (Ed.),
Models of short-term memory (pp. 209-238). London: Erlbaum.
Jones, D. M., & Broadbent, D. E. (1991). Human performance and
noise. In C. S. Harris (Ed.), Handbook of noise control (pp.
24.1-24.24). New York: McGraw-Hill.
Jones, D. M., & Macken, W. J. (1993). Irrelevant tones produce an
irrelevant speech effect: Implications for phonological coding in
working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 369-381.
Jones, D. M., & Macken, W. J. (1995a). Auditory babble and
cognitive efficiency: The role of number of voices and their
location. Journal of Experimental Psychology: Applied, 1, 216-226.
Jones, D. M., & Macken, W. J. (1995b). Organizational factors in
the effect of irrelevant speech: The role of spatial location and
timing. Memory & Cognition, 23, 192-200.
Jones, D. M., & Macken, W. J. (1995c). Phonological similarity in
the irrelevant speech effect: Within- or between-stream similarity? Journal of Experimental Psychology: Learning, Memory,
and Cognition, 21, 103-115.
Jones, D. M., & Macken, W. J. (1997a). Irrelevant noise and the
disruption of serial recall: A definitive test of the changing state
hypothesis? Manuscript submitted for publication.
Jones, D. M., & Macken, W. J. (1997b). Temporal distinctiveness
theory applied to the irrelevant sound effect: A critique. Manuscript submitted for publication.
Jones, D. M., Macken, W. J., & Mosdell, N. (1997). The role of
habituation in the disruption of recall performance by irrelevant
sound. British Journal of Psychology, 88, 549-564.
Jones, D. M., Macken, W. J., & Murray, A. C. (1993). Disruption of
visual short-term memory by changing-state auditory stimuli:
The role of segmentation. Memory & Cognition, 21, 318-328.
Jones, D. M., Madden, C., & Miles, C. (1992). Privileged access by
irrelevant speech to short-term memory: The role of changing
state. Quarterly Journal of Experimental Psychology: Human
Experimental Psychology, 44(A), 645-669.
Jones, D. M., Miles, C., & Page, J. (1990). Disruption of
proof-reading by irrelevant speech: Effects of attention, arousal,
or memory? Applied Cognitive Psychology, 4, 89-108.
Jones, D. M., & Morris, N. (1992). Irrelevant speech and serial
recall: Implications for theories of attention and working memory.
Scandinavian Journal of Psychology, 33, 212-229.
Keppel, G. (1988). Design and analysis: A researcher's handbook.
New York: Prentice Hall.
Kirk, R. E. (1982). Experimental design: Procedures for the
behavioral sciences. Monterey, CA: Brooks/Cole.
Klatte, M., & Hellbrück, J. (1993). Der "irrelevant speech effekt":
Wirkungen von Hintergrundschall auf das Arbeitsgedächtnis
[The "irrelevant speech effect": Effects of background noise on
working memory]. Zeitschrift für Lärmbekämpfung, 40, 91-98.
Klatte, M., Kilcher, H., & Hellbrück, J. (1995). Wirkungen der
zeitlichen Struktur von Hintergrundschall auf das Arbeitsgedächtnis und ihre theoretischen und praktischen Implikationen [The
effects of temporal structure of background noise on working
memory: Theoretical and practical implications]. Zeitschrift für
Experimentelle Psychologie, 17, 517-544.
Kolers, P. A., & Roediger, H. L., III. (1979). Procedures of mind.
Journal of Verbal Learning and Verbal Behavior, 23, 425-449.
LeCompte, D. C. (1994). Extending the irrelevant speech effect
beyond serial recall. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 20, 1396-1408.
LeCompte, D. C. (1995). An irrelevant speech effect with repeated
and continuous background speech. Psychonomic Bulletin and
Review, 2, 391-397.
LeCompte, D. C. (1996). Irrelevant speech, serial rehearsal, and
temporal distinctiveness: A new approach to the irrelevant
speech effect. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 22, 1154-1165.
LeCompte, D. C., Neely, C. B., & Wilson, J. R. (1997). Irrelevant
speech and irrelevant tones: The relative importance of speech to
the irrelevant speech effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 472-483.
Loftus, G. R., & Masson, M. E. J. (1994). Using confidence
intervals in within-subject designs. Psychonomic Bulletin &
Review, 1, 476-490.
Lyytinen, H., Blomberg, A. P., & Näätänen, R. (1992). Event-related
potentials and autonomic responses to a change in
unattended auditory stimuli. Psychophysiology, 29, 523-534.
Macken, W. J., & Jones, D. M. (1995). Functional characteristics of
the inner voice and the inner ear: Single or double agency?
Journal of Experimental Psychology: Learning, Memory, and
Cognition, 21, 436-448.
Mäntysalo, S., & Näätänen, R. (1987). The duration of a neuronal
trace of auditory stimulus as indicated by event-related potentials. Biological Psychology, 24, 183-195.
Massaro, D. W., & Cowan, N. (1993). Information processing
models: Microscopes of the mind. Annual Review of Psychology,
44, 383-425.
Miles, C., Jones, D. M., & Madden, C. A. (1991). Locus of the
irrelevant speech effect in short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 578-584.
Morris, N., & Jones, D. M. (1990). Habituation to irrelevant
speech: Effects on a visual short-term memory task. Perception
& Psychophysics, 47, 291-297.
Murdock, B. B., Jr. (1993). TODAM2: A model for the storage and
retrieval of item, associative, and serial-order information.
Psychological Review, 100, 183-203.
Murdock, B. B., Jr. (1996). Item, associative, and serial order
information in TODAM. In S. E. Gathercole (Ed.), Models of
short-term memory (pp. 239-266). London: Erlbaum.
Näätänen, R. (1990). The role of attention in auditory information
processing as revealed by event-related potentials and brain
measures of cognitive function. Behavioral and Brain Sciences,
14, 201-288.
Näätänen, R. (1991). Mismatch and processing negativities in
auditory stimulus processing and selection. Behavioral and
Brain Sciences, 14, 764-768.
Näätänen, R. (1992). Attention and brain function. Hillsdale, NJ:
Erlbaum.
Öhman, A. (1979). The orienting response, attention, and learning:
An information processing perspective. In H. D. Kimmel,
E. H. van Olst, & J. F. Orlebeke (Eds.), The orienting reflex in
humans (pp. 443-471). Hillsdale, NJ: Erlbaum.
Rosen, S., & Howell, P. (1991). Signals and systems for speech and
hearing. London: Academic Press.
Salamé, P., & Baddeley, A. (1982). Disruption of short-term
memory by unattended speech: Implications for the structure of
working memory. Journal of Verbal Learning and Verbal
Behavior, 21, 150-164.
Salamé, P., & Baddeley, A. D. (1989). Effects of background music
on phonological short-term memory.
Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 41(A), 107-122.
Salamé, P., & Baddeley, A. D. (1990). The effects of irrelevant
speech on immediate free recall. Bulletin of the Psychonomic
Society, 28, 540-542.
Sokolov, E. N. (1963). Perception and the conditioned reflex.
London: Pergamon Press.
Trejo, L. J., Ryan-Jones, D., & Kramer, A. F. (1992, October).
Attentional modulation of the pitch-change mismatch negativity.
Paper presented at the Annual Meeting of the Society for
Psychophysiological Research, San Diego, CA.
Winer, B. J. (1971). Statistical principles in experimental design.
New York: McGraw-Hill.
Woldorff, M. G., Gallen, C. C., Hampson, S. A., Hillyard, S. A.,
Pantev, C., Sobel, D., & Bloom, F. E. (1993). Modulation of
early sensory processing in human auditory cortex. Proceedings
of the National Academy of Sciences, USA, 90, 8722-8726.
Received June 24, 1996
Revision received July 8, 1997
Accepted July 23, 1997