Interindividual variability in auditory scene analysis

Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
rstb.royalsocietypublishing.org
Interindividual variability in auditory
scene analysis revealed by confidence
judgements
C. Pelofi1,4, V. de Gardelle2, P. Egré3,4 and D. Pressnitzer1,4
Research
Cite this article: Pelofi C, de Gardelle V, Egré
P, Pressnitzer D. 2017 Interindividual variability
in auditory scene analysis revealed by
confidence judgements. Phil. Trans. R. Soc. B
372: 20160107.
http://dx.doi.org/10.1098/rstb.2016.0107
Accepted: 30 September 2016
One contribution of 15 to a theme issue
‘Auditory and visual scene analysis’.
Subject Areas:
cognition
Keywords:
hearing, ambiguity, vagueness,
musical training, Shepard tones
Author for correspondence:
D. Pressnitzer
e-mail: [email protected]
1
Laboratoire des systèmes perceptifs, CNRS UMR 8248, 2Paris School of Economics & CNRS, 3Institut Jean Nicod,
CNRS UMR 8129, and 4Institut d’étude de la cognition, École normale supérieure - PSL Research University,
75005 Paris, France
DP, 0000-0003-4744-5165
Because musicians are trained to discern sounds within complex acoustic
scenes, such as an orchestra playing, it has been hypothesized that musicianship improves general auditory scene analysis abilities. Here, we compared
musicians and non-musicians in a behavioural paradigm using ambiguous
stimuli, combining performance, reaction times and confidence measures.
We used ‘Shepard tones’, for which listeners may report either an upward
or a downward pitch shift for the same ambiguous tone pair. Musicians and
non-musicians performed similarly on the pitch-shift direction task. In particular, both groups were at chance for the ambiguous case. However, groups
differed in their reaction times and judgements of confidence. Musicians
responded to the ambiguous case with long reaction times and low confidence,
whereas non-musicians responded with fast reaction times and maximal
confidence. In a subsequent experiment, non-musicians displayed reduced
confidence for the ambiguous case when pure-tone components of the
Shepard complex were made easier to discern. The results suggest an effect
of musical training on scene analysis: we speculate that musicians were
more likely to discern components within complex auditory scenes, perhaps
because of enhanced attentional resolution, and thus discovered the ambiguity. For untrained listeners, stimulus ambiguity was not available to
perceptual awareness.
This article is part of the themed issue ‘Auditory and visual scene analysis’.
1. Introduction
Electronic supplementary material is available
online at https://dx.doi.org/10.6084/m9.figshare.c.3583496.
Two observers of the same stimulus may sometimes report drastically different
perceptual experiences [1,2]. When such interindividual differences are stable
over time, they provide a powerful tool for uncovering the neural bases of perception [3]. Here, we investigated interindividual differences for auditory scene
analysis, the fundamental ability to focus on target sounds amidst background
sounds. We used an ambiguous stimulus [4], as inconclusive sensory evidence
should enhance the contribution of idiosyncratic processes [5]. We further
compared listeners with varying degrees of formal musical training, as musicianship has been argued to affect generic auditory abilities [6]. Finally, we combined
standard performance measures with introspective judgements of confidence [7].
This method addressed a basic but unresolved question about ambiguous stimuli:
are observers aware of the physical ambiguity, or not [8,9]?
Ambiguous stimuli have been used to uncover robust interindividual
differences in perception. For vision, colour [1] or motion direction [10] can be
modulated by strong and unexplained idiosyncratic biases, that are surprisingly stable over time. For audition, reports of pitch-shift direction between
ambiguous sounds have also revealed stable biases [2] that are correlated with
the language experience of listeners [11], although this has been debated [12].
Stable interindividual differences also extend to the so-called metacognitive
abilities, such as the introspective judgement of the accuracy of percepts [7,13].
& 2017 The Author(s) Published by the Royal Society. All rights reserved.
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
(a)
2
amplitude
rstb.royalsocietypublishing.org
frequency (octave)
(b)
(i) 9
2
3 4 5 6 7 8
frequency (octave)
T1
T2
(ii) 9
9
T1
T2
(iii) 9
8
8
8
7
7
7
6
6
6
5
5
5
4
4
4
3
3
3
2
2
2
1
1
1
0
0
0
T1
T2
time
Figure 1. Illustration of the ambiguous and non-ambiguous stimuli. (a) Shepard tones were generated by adding octave-related pure tones, weighted by a bellshaped amplitude envelope. (b) Pairs of Shepard tones, T1 and T2, were used in the experiment. The panels show frequency intervals of 3 st (i), 6 st (ii) and 9 st
(iii) between T1 and T2. When reporting the pitch shift between T1 and T2, participants tend to choose the shortest log-frequency distance: ‘up’ for 3 st and ‘down’
for 9 st. For 6 st, there is no shortest log-frequency distance. The stimulus is ambiguous and judgements tend to be equally split between ‘up’ and ‘down’.
Such reliable interindividual differences can provide useful
methodological tools, to reveal underlying associations
between different behavioural measures [14], or to relate perceptual measures to clinical symptoms [15]. Variations in
behaviour can also be correlated with variations in physical
characteristics, to probe the neurophysiological [16,17], neuroanatomical [3] or genetic [18] bases of perceptual processing.
For auditory perception, one long-recognized source of
interindividual variability is musical training [19]. Musical
training provides established benefits for music-related
tasks, such as fine-grained pitch discrimination [6,20]. Generalization to basic auditory processes is still under scrutiny
however. In particular, a number of studies have investigated
whether musicianship improved auditory scene analysis.
Musicians were initially shown to have improved intelligibility for speech in noise, which was correlated to enhanced
neural encoding of pitch [21]. However, attempts to replicate
and generalize these findings have been unsuccessful [22]. An
advantage for musicians was subsequently found by emphasizing non-auditory aspects of the task, by using intelligible
speech as a masker instead of noise [23,24], but null findings
also exist with intelligible speech as the masker [25]. For auditory scene analysis tasks not involving speech, musicians were
better at extracting a melody [26] or a repeated tone [27,28]
from an interfering background. Musicians were also more
likely to discern a mistuned partial within a complex tone
[29] or within an inharmonic chord [30].
Here, we take the comparison between musicians and
non-musicians to a different setting, using ambiguous stimuli.
We used Shepard tones [4], which are chords of many simultaneous pure tones, all with an octave relationship to each
other (figure 1a). When two Shepard tones are played in
rapid succession, listeners report a subjective pitch shift,
usually corresponding to the smallest log-frequency distance
between successive component tones (figure 1b(i) or (iii)).
When two successive Shepard tones are separated by a frequency distance of half-an-octave, however, an essential
ambiguity occurs: there is no shortest log-frequency distance
to favour either up or down pitch shifts (figure 1b(ii)). In this
case, listeners tend to report either one or the other pitch-shift
direction, with equal probability on average across trials and
listeners [4]. Interindividual differences have been observed
for the direction of the reported shift, with listeners displaying
strong biases for one direction of pitch shift for some stimuli,
but without any systematic effect of musicianship [2,11].
We did not investigate interindividual differences in
pitch-shift direction bias, but rather, differences in the introspective experience of Shepard tones: are listeners aware of
the ambiguity, or not? The experimental evidence for ambiguity so far has been an equal split between ‘up’ and
Phil. Trans. R. Soc. B 372: 20160107
1
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
(a) Methods
(i) Participants
Sixteen self-reported normal hearing participants (age in years
M ¼ 26, s.d. ¼ 5) were tested. Eight were musicians (four men
and four women, age M ¼ 27, s.d. ¼ 7) and eight were nonmusicians (four men and four women, age M ¼ 26, s.d. ¼ 2).
Musicians had more than 5 years of musical practice in an academic institution. Among them, three were professionals
(a clarinet player, a pianist, and a composer and pianist, all
with self-reported absolute pitch). Among the non-musicians,
(ii) Stimuli
Shepard tones were generated as in [33]. Briefly, nine pure
tones all with an octave relationship to a base frequency, Fb,
were added together to cover the audible range. They were
amplitude-weighted by a fixed spectral envelope (Gaussian
in log-frequency and linear amplitude, with M ¼ 960 Hz and
s.d. ¼ 1 octave; figure 1a). A trial consisted of two successive
tones, T1 and T2, with tone duration of 125 ms and no silence
between tones. The Fb for T1 was randomly drawn for each
trial, uniformly between 60 and 120 Hz, to counterbalance
possible idiosyncratic biases in pitch-shift direction preference
[2,33]. The Fb interval between T1 and T2 was randomly drawn
from a uniform distribution between 0 semitones (st) and 11 st
in steps of 1 st. The interval of 6 st corresponds to a half octave,
the ambiguous case. Each interval was presented 40 times with
presentation order randomly shuffled. To minimize context
effects [33], participants were played an intertrial sequence of
five tones between trials, with tone duration of 125 and
125 ms of silence between tones. The intertrial tones were similar to Shepard tones but with a half-octave relationship
between tones. The Fb for intertrial tones was randomly
drawn, uniformly between 60 and 120 Hz. The experiment
lasted for about 90 min, in a single session split over four
blocks.
(iii) Apparatus and procedure
Participants were tested individually in a double-walled soundinsulated booth (Industrial Acoustics). Stimuli were played
diotically through an RME Fireface 800 soundcard, at 16-bit
resolution and a 44.1 kHz sampling rate. They were presented
through Sennheiser HD 250 Linear II headphones. The presentation level was 65 dB SPL, A-weighted. Participants
provided an ‘up’ or ‘down’ response through a custom-made
response box, which recorded reaction times with submillisecond accuracy. The right/left attribution of responses
was counterbalanced across participants. Trial presentation
was self-paced. A trial was initiated by participants depressing
both response buttons. This started a random silent interval of
50–850 ms followed by the stimulus pair T1–T2. Participants
were instructed to release as quickly as possible the response
button corresponding to the pitch-shift direction they wished
to report. Then, participants used a computer keyboard to
rate their confidence in the pitch-shift direction report. They
used a scale from 1 (very unsure) to 7 (very sure). The intertrial
sequence was then played and the next trial was ready to
be initiated.
(iv) Screening procedure
We asked participants to report pitch-shift direction, a task
for which variability in performance is well documented [37].
A screening test was therefore applied, as in [33], to select
participants who could report pitch shifts reliably. Briefly,
3
Phil. Trans. R. Soc. B 372: 20160107
2. Main experiment: the perception of
ambiguous pitch shifts by musicians and
non-musicians
four reported having no musical training whatsoever, the
other four having practised an instrument for less than 4
years in a non-academic institution. Such a binary distinction
between musicians and non-musicians was arbitrary and
based on the sample of participants who volunteered for the
experiment. A finer sampling is presented later with the
online experiment. The two groups did not differ with respect
to age (two-samples t-test, t14 ¼ 0.39, p ¼ 0.69). All were paid
for participation in the experiment.
rstb.royalsocietypublishing.org
‘down’ pitch shift reports, for the same stimulus. However,
there are three possible reasons for such an outcome. First, listeners may hear neither an upward nor a downward shift,
and respond at chance. Second, listeners may hear upward
and downward shifts simultaneously, and randomly choose
between the two. Third, listeners may clearly hear one direction of shift, and report it unhesitatingly, but this direction
may change over trials. The question bears upon current
debates on the nature of perception with under-determined
information, which is the general case. The first two options,
hearing neither or both pitch shifts, would be compatible
with what has been termed vagueness: response categories
are fuzzy and non-exclusive, and observers are aware of
their uncertainty when selecting a response [31,32]. The
third option would be more akin to what is assumed for bistable stimuli [9]. We will designate this third option as a
‘polar’ percept: observers are sure of what they perceive
and unaware of the alternatives, but they differ in which percept they are attracted to. Based on informal observations,
Shepard described ambiguous tone pairs as polar [4].
Deutsch further argued that the stable individual biases
observed for such sounds pointed to polar percepts [2]. Interestingly, however, the idiosyncratic biases can be overcome
by context effects [33] or cross-modal influence [34],
suggesting that both percepts may be available to the listener.
Here we assessed the listeners’ introspective confidence in
their judgement of Shepard tone pairs.
In three behavioural experiments, we investigated the
perception of pitch shifts between Shepard tone pairs, comparing ambiguous and non-ambiguous cases. We collected
pitch-shift direction choices, but also reaction times and judgements of confidence. We hypothesized that, if listeners
were aware of the perceptual ambiguity in physically
ambiguous stimuli, this would translate into longer reaction
times [8,35] and lower confidence judgements. We further
compared musicians and non-musicians, hypothesizing that
musicians may be better able to discern individual components within complex Shepard tones [29,30] and thus be
more likely to report the ambiguity. We finally assessed
whether non-musicians could report the ambiguity when
the component tones were made easier to discern, through
acoustic manipulation [36]. Results suggested that the
Shepard tones were polar for the naive non-musicians listeners, but that musicianship or acoustic manipulation could
reveal the ambiguity.
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
For each participant and interval, we computed the proportion
of ‘up’ responses, P(up), for the pitch-shift direction choices.
Psychometric curves were fitted to the data for individual participants over the 1–11 st interval range, using cumulative
Gaussians and estimating their parameters using the psignifit
software [40]. The fitting procedure returned the point of
subjective equality corresponding to P(up) ¼ 0.5; a noise parameter corresponding to the standard deviation of the
cumulative Gaussian, s, inversely related to the slope of
the psychometric function; and the two higher and lower
asymptotes. The response times (RTs) were defined relative
to the onset of T2, the first opportunity to provide a meaningful
response. RTs faster than 100 ms were discarded as anticipations. Because of the long-tailed distribution typical of RTs,
the natural logarithms of RTs were used for all analyses.
(vi) Data analysis for metacognitive performance
We used an extension of the signal detection theory framework
to quantify the use of the confidence scale by each participant
[41]. In this framework, a stimulus elicits a value of an internal
variable, and this value is used to predict both perceptual
decisions and metacognitive judgements of confidence. For
perceptual decisions, as is standard with signal detection
theory, the internal variable is compared to a fixed criterion
value. For confidence judgements, it is the distance between
the internal variable and the criterion that is used: values
closer to the criterion should correspond to lower confidence.
Assuming no loss of information, the metacognitive performance of each participant can then be mathematically derived
from the perceptual performance. As shown in [41], this
performance can be expressed as ‘meta-d0 ’, which measures
the perceptual information (in d0 units) that is translated
into the empirical metacognitive judgements. Under those
assumptions, meta-d0 equals d0 for participants with ideal
metacognition. A complete formulation of the method is available in [41,42] and comparisons with other techniques are
reviewed in [43]. For each participant, we computed d0 , metad0 , and the ratio meta-d0 /d0 , which quantifies the efficacy
of metacognitive judgements [43]. Intervals from 1 to 5 st,
which had an expected correct response of ‘up’, were treated
as signal trials. Intervals from 7 to 11 st, which had an expected
correct response of ‘down’, were treated as noise trials. Intervals of 0 and 6 st, for which there was no expected correct or
incorrect response, were discarded from this analysis. For the
confidence judgements, we summarized confidence levels
using a median-split for each participant and added onefourth trial to responses for each condition (stimulus (b) Results
(i) Pitch-shift direction responses
As expected, both musicians and non-musicians reported
mostly ‘up’ for small intervals and ‘down’ for large intervals
(figure 2a), corresponding to the shortest log-frequency distances between successive components. Performance was
around the chance level of P(up) ¼ 0.5 for both the 0-st interval, corresponding to no physical difference between T1 and
T2, and the 6-st interval, corresponding to the ambiguous
case with no shortest log-frequency distance. Overall accuracy, as measured by s, was statistically indistinguishable
between musicians and non-musicians (two-samples t-test,
t14 ¼ 0.45, p ¼ 0.65). The trend for higher performance for
musicians at the extreme of the raw values (figure 2a) was
not confirmed in the fitted psychometric functions (upper
asymptote: t14 ¼ 1.46, p ¼ 0.16; lower asymptote: t14 ¼ 1.28,
p ¼ 0.22). The point of subjective equality corresponding to
P(up) ¼ 0.5 was not different from 6 st for both groups (musicians: M ¼ 6.0, t7 ¼ 20.26, p ¼ 0.80; non-musicians: M ¼ 6.0,
t7 ¼ 0.37, p ¼ 0.72) and not different across groups (t14 ¼ 0.45,
p ¼ 0.66).
(ii) Response times
Perhaps predictably, the longest RTs for judging the pitch-shift
direction were observed for 0 st (figure 2b), corresponding to
identical sounds for T1 and T2. For the other intervals, the pattern of responses differed across groups. Musicians were
slower for the ambiguous 6-st interval than for the less ambiguous intervals, whereas non-musicians showed a trend to be
faster for the ambiguous interval. We tested the statistical
reliability of this observation by averaging the log-RTs for
non-ambiguous intervals (all intervals in the 1–11 st range
except 6 st) and comparing this value with the log-RT at 6 st.
The analysis confirmed that musicians were slower for the
ambiguous interval compared with non-ambiguous intervals
(paired t-test, t7 ¼ 23.72, p ¼ 0.007). For non-musicians, the
difference was not significant (t7 ¼ 0.93, p ¼ 0.38). Finally, to
test the interaction between response pattern and group, we
computed the difference between the averaged non-ambiguous
cases and the ambiguous case, what we term the ‘ambiguity
effect’. The ambiguity effect was then contrasted across
groups. The contrast was significant (t14 ¼ 3.07, p ¼ 0.008), confirming that musicians and non-musicians displayed different
ambiguity effects in terms of log RT.
(iii) Confidence ratings
Confidence was lowest for 0 st, and lower for musicians
than non-musicians at this interval (figure 2c). For the other
intervals, the patterns of responses differed across groups.
Musicians, but not non-musicians, displayed a dip in confidence for the ambiguous case. We used the same analysis
method as above to quantify this observation. For musicians,
confidence was higher for the non-ambiguous intervals than
for the ambiguous interval (t7 ¼ 3.38, p ¼ 0.012). For nonmusicians, the pattern was reversed, with higher confidence
for the ambiguous interval (t7 ¼ 23.34, p ¼ 0.012). Note
that this ambiguous interval also contained the largest logfrequency distance between successive components of T1
4
Phil. Trans. R. Soc. B 372: 20160107
(v) Data analysis for perceptual performance
response confidence) to ensure that there were no empty
cells in the analysis.
rstb.royalsocietypublishing.org
prospective participants were asked to report ‘up’ or ‘down’
shifts for pure tone or Shepard tone pairs, and a minimum accuracy of 80% for a 1-st interval was required. Broadly consistent
with previous reports [33,37], 10 participants out of 26 failed the
screening test and were not invited to proceed to the main
experiment. All of those who failed the screening test were
non-musicians. The relatively large proportion of failure may
have been caused by our use of frequency roving in a pitchshift identification task [38]. Also, we did not provide any
training prior to the brief screening procedure, which could
have affected performance on the pitch direction task for
naive participants [39].
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
(a) 1.00
(b)
5
rstb.royalsocietypublishing.org
non-musicians
musicians
response time (ms)
P (up)
0.75
0.50
1000
600
0
(c)
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
(d)
7
1400
response time (ms)
confidence rating
6
5
4
3
1000
600
2
400
1
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
3
4
5
6
confidence rating
7
Figure 2. Results of the main experiment. (a) The proportion of ‘up’ responses, P(up), is displayed for each interval and averaged within groups (non-musicians,
circles; musicians, crosses). For all panels, shaded areas indicate +1 s.e. about the mean. The interval of 6 st corresponds to the ambiguous case. (b) Response
times for each interval, averaged within groups. The natural logarithms of RTs were used to compute means and standard errors; y-labels have been converted to
milliseconds for display purposes. (c) Confidence ratings on a scale from 1 (very unsure) to 7 (very sure). (d ) Correlation between confidence and log-RT. Each point
represents an interval condition for an individual participant. Solid lines are fitted linear regressions over all intervals for each participant. (Online version in colour.)
and T2. The interaction between group and ambiguity, tested
with the ambiguity effect, was significant (t14 ¼ 24.43, p ¼
0.001). Musicians and non-musicians displayed different
ambiguity effects in terms of confidence.
(iv) Metacognitive performance
We investigated whether the difference in confidence judgements between non-musicians and musicians could be
explained by a different use of the confidence scale. We estimated
a meta-d0 for each participant, which measures the perceptual
information (in d0 units) needed to explain the empirical metacognitive data, and expressed the results as the meta-ratio
of meta-d0 /d0 (see §2a and [41]). We observed relatively low
values of meta-ratio overall, and no difference between groups
(musicians: M ¼ 0.4, s.d. ¼ 0.14; non-musicians: M ¼ 0.3,
s.d. ¼ 0.39; t14 ¼ 20.68, p ¼ 0.5). Moreover, there was no
correlation between meta-ratio and ambiguity effect (Pearson
correlation coefficient r14 ¼ 0.24, p ¼ 0.36). The metacognitive
analysis thus suggests that not all of the perceptual evidence
available for pitch-shift direction judgements was used for confidence judgements, but that, importantly, the efficacy of each
individual participant in using the confidence scale was not
related to the ambiguity effect.
(v) Correlation of response times and confidence
We hypothesized that RTs and confidence judgments would
be negatively correlated. To test this hypothesis, we calculated
the correlation between the log-RTs and confidence values for
each participant separately, over the 1–11 st range. We found
negative correlations for almost all participants (figure 2d).
The relationship between the two variables was strong
(Pearson correlation coefficient r averaged across participants,
M ¼ 20.7, s.d. ¼ 0.23).
3. Online experiment: replication on a
larger cohort
(a) Rationale
In the main experiment, we observed different ambiguity
effects for non-musicians and musicians. However, the comparison rested on a relatively small sample size (eight in each
Phil. Trans. R. Soc. B 372: 20160107
0.25
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
(i) Participants
Participants were recruited through an email call sent to a selfregistration mailing list, provided by the ‘Relais d’information
sur les Sciences Cognitives’, Paris, France. Participation was
anonymous and participants were not paid. In total, 359 participants started the experiment after reading the instructions
(age M ¼ 30, s.d. ¼ 11). Among them, 173 participants completed the Shepard tones experiment, which took about
30 min. We only considered the data of those 173 participants
(age M ¼ 30, s.d. ¼ 10).
(c) Results
(i) Data-based screening
Data were expected to be noisy, especially as we did not control
the equipment used for playback nor the attentiveness of
participants. Therefore, we chose to perform a data-based
screening of participants, using their performance on the
pitch-shift direction task (and not the confidence judgements).
Psychometric functions were fitted to the data for each participant (see §2a). There was a large spread in the parameter
representing accuracy, s (M ¼ 6.2, s.d. ¼ 43). After visual
inspection of individual functions, we excluded participants
with s . 5 (larger values indicate shallower slopes of the psychometric function and thus poorer performance). This left 134
participants for subsequent analyses. The new sample comprised 86 musicians and 48 non-musicians. The unequal
balance may reflect higher performance or higher motivation
of musicians in the online experiment. Performance on catch
trials after the data-based screening was high overall, but marginally differed between groups (per cent correct; musicians
M ¼ 98, s.d. ¼ 0.06; non-musicians M ¼ 96, s.d. ¼ 0.09;
t132 ¼ 22.05, p ¼ 0.04).
(ii) Questionnaire
(ii) Pitch-shift direction responses
We collected information about years of formal musical training, current practice and type of instrument played. For
simplicity, and to be consistent with the main experiment,
we only used the duration of formal musical training as a
shorthand for musicianship. In the self-selected sample of
the online experiment, mean number of years of musical
training was M ¼ 8.7 years (s.d. ¼ 8.5). For group contrasts,
we used the same criterion as for the main experiments:
musicians were participants with 5 years or more of formal
musical training. This resulted in a sample comprising 90
musicians and 83 non-musicians.
Participants responded in the expected way for unambiguous
cases and responses were close to P(up) ¼ 0.5 for the ambiguous case (figure 3a). No difference was observed across groups
when comparing accuracy, as measured by the slopes of the
psychometric functions (s, t132 ¼ 0.83, p ¼ 0.4). Groups differed for extreme interval values, corresponding to small
log-frequency distances between T1 and T2 (upper asymptote:
t132 ¼ 2.72, p ¼ 0.007; lower asymptote: t132 ¼ 3.60, p , 0.001),
consistent with a better fine frequency discrimination for musicians [20]. The point of subjective equality was equivalent for
the two groups (t132 ¼ 20.69, p ¼ 0.5). This point of subjective
equality was not different from 6 st for both groups (musicians:
M ¼ 5.9, s.d. ¼ 0.81; t85 ¼ 20.84, p ¼ 0.4; non-musicians: M ¼
5.8, s.d. ¼ 1.0; t47 ¼ 21.24, p ¼ 0.22).
(iii) Stimuli and procedure
Test pairs of T1–T2 Shepard tones were generated prior to the
experiment, as in the main experiment. To keep the experiment
at a reasonable length, we only presented intervals of 0, 2, 4, 6, 8
and 10 st. Each interval was presented 12 times and the presentation order was shuffled across participants. Participants were
instructed to listen over headphones or loudspeaker, preferably in a quiet environment, but no attempt was made to
control the sound presentation conditions. After each trial,
they provided a pitch-shift direction response by means of
the up or down arrows on the keyboard. They were then
asked for their confidence ratings on a scale ranging from 1
to 7 (1 ¼ very unsure, 7 ¼ very sure) using number keys on
the keyboard. Between each trial, an intertrial sequence
of three tones was played. Randomly interspersed with experimental trials were four catch trials, containing two successive
harmonic complex tones (first six harmonics with flat spectral
envelope) at an interval of 12 st. These catch trials were
intended to contain large and unambiguous pitch shifts,
which attentive participants should have no difficulty in
reporting correctly. The online experiment contained a final
(iii) Confidence ratings
As for the main experiment, musicians showed a dip in confidence for the ambiguous case, whereas non-musicians showed
a broad peak in confidence for the same stimulus (figure 3b).
However, the contrast between intervals was less pronounced
than in the main experiment. Moreover, musicians gave higher
confidence values than non-musicians overall. The same statistical analysis as for the main experiment was applied:
confidence for non-ambiguous intervals of 2, 4, 8 and 10 st
was averaged and compared to confidence for the ambiguous
interval of 6 st. Musicians were less confident for the ambiguous interval (t85 ¼ 24.22, p , 0.001), whereas non-musicians
showed equivalent confidence for the two types of intervals
(t47 ¼ 1.21, p ¼ 0.23). The interaction between group and confidence was significant, as estimated by the ambiguity effect
(t132 ¼ 3.45, p ¼ 0.001). To illustrate the difference between
groups, histograms of the ambiguity effect for individual participants are displayed in figure 3c. Thus, even though the
6
Phil. Trans. R. Soc. B 372: 20160107
(b) Methods
part, where we attempted to measure performance for the
detection of a mistuned harmonic within a harmonic complex
[29,44]. However, because a large proportion of participants
failed to complete this last phase, results were too noisy to be
meaningfully analysed and will not be reported here.
rstb.royalsocietypublishing.org
group). Moreover, the binary definition of musicianship,
which was required to analyse a small sample, could not
reflect the spread of musical abilities between participants.
We performed an online experiment to try to replicate our
main findings on a larger cohort. Participants first completed
a questionnaire about their music background. Then, they
performed the pitch-shift direction task, followed by the
confidence task, for a reduced set of intervals containing
non-ambiguous and ambiguous cases. Response times were
not collected for technical reasons and also because they
were correlated with confidence in the main experiment.
Thanks to the number of online participants with useable
data (n ¼ 134), we could then compare the effect of ambiguity
with years of musical training.
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
(a) 1.00
(b)
7
7
rstb.royalsocietypublishing.org
non-musicians
musicians
6
confidence rating
P (up)
0.75
0.50
5
4
3
2
1
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
(c)
(d)
2
ambiguity effect (confidence)
0
1
proportion participants
0.5
0.4
0.3
0.2
0.1
0
–3
–2
–1
0
1
2
ambiguity effect (confidence)
3
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
0
–1
–2
–3
0
10
20
30
musical practice (years)
40
Figure 3. Results of the online experiment. (a) The proportion of ‘up’ responses, P(up), is displayed for each interval tested in the online experiment. Format as in
figure 2a. (b) Confidence ratings. Format as in figure 2b. (c) Histograms of the ambiguity effect, in confidence units, for non-musicians (dotted line) and musicians
(solid line). The ambiguity effect was defined as the average confidence for all intervals excluding 0 and 6 st, minus confidence for 6 st, the ambiguous case.
Negative values correspond to participants being less confident for the ambiguous case. Bin width is 0.4 confidence units (d ) Correlation between the ambiguity
effect and years of formal musical training. Blue stars indicate participants aged 35 years or less. Solid lines indicate the linear regression (straight line) and 95% CI
(curved lines) fitted to the data for these younger participants. (Online version in colour.)
differences across groups were less pronounced on average
than in the main experiment, presumably because of the
noisy nature of online data, the main findings were replicated.
(iv) Metacognitive performance
The meta-ratio analysis revealed a difference between musicians
and non-musicians (musicians: M ¼ 0.69, s.d. ¼ 0.43; nonmusicians: M ¼ 0.36, s.d. ¼ 0.62; t132 ¼ 23.58, p , 0.001).
However, the meta-ratio was not correlated to the ambiguity
effect (r132 ¼ 20.053, p ¼ 0.55). Also, we observed a sizeable
proportion of participants with negative meta-ratio values,
denoting a higher confidence for incorrect perceptual judgements, or meta-ratio values greater than unity, suggesting that
confidence was not entirely based on perceptual information
within a signal detection theory framework [41]. To test whether
such unexpected response patterns affected the results, we performed an additional metacognitive analysis, retaining only
those participants with a meta-ratio of between zero and
unity. In this subgroup of 52 musicians and 30 non-musicians,
there was no difference in meta-ratio (musicians: M ¼ 0.57,
s.d. ¼ 0.26; non-musicians: M ¼ 0.50, s.d. ¼ 0.31; t80 ¼ 21.13,
p ¼ 0.26). We also confirmed that the ambiguity effect was
maintained for this subgroup (t80 ¼ 2.67, p ¼ 0.009). As in the
main experiment, there was therefore no link between the use
of the confidence scale and the ambiguity effect.
(v) Correlation with musical expertise
The ambiguity effect became more pronounced (more
negative) with years of musical training (figure 3d). The
rank-order correlation between the two variables was negative and significant (r132 ¼ 20.25, p ¼ 0.004). Inevitably, the
measure ‘years of musical training’ was partly confounded
by ‘age’. Also, one outlier participant with 25 years of musical training displayed an especially strong ambiguity effect,
which could in part have driven the correlation. We performed an additional correlation restricting the analysis to
participants younger than 35 years, retaining 73 musicians
and 41 non-musicians (and thus excluding the outlier, who
was over 35 years). Age did not differ across the two subgroups (t112 ¼ 20.97, p ¼ 0.3). Even when matching age
Phil. Trans. R. Soc. B 372: 20160107
0.25
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
4. Control experiment: discerning component
tones
(a) Rationale
(c) Results
(i) Pitch direction response
Psychometric curves were fitted to P(up) for each participant.
For clarity, because results were visually similar, we only display the results for baseline and the maximal jitter value of
100 ms (figure 4a). The accuracy of participants, as measured
by s for the fitted psychometric functions, was equivalent for
baseline and 100-ms jitter (t15 ¼ 21.01, p ¼ 0.33). There was
also no difference in the point of subjective equality (t15 ¼
0.40, p ¼ 0.7), which was not different from 6 st in both cases
(baseline: M ¼ 6.0, s.d. ¼ 0.28; t15 ¼ 20.46, p , 0.65; 100 ms:
M ¼ 6.0, s.d. ¼ 0.52; t15 ¼ 0.30, p , 0.76). In summary, temporal
jitter had no measurable effect on the pitch-shift direction task.
(ii) Confidence ratings
Three values of temporal jitter were used. For 0-ms jitter,
stimuli were T1–T2 test pairs generated as in the main experiment. For 50-ms jitter, a time jitter was introduced on the onset
of individual component tones. Nine different onset-time
values, linearly spaced between 0 and 50 ms, were assigned,
at random on each trial, to the components of T1 and T2 independently. For 100-ms jitter, the jitter values were linearly
spaced between 0 and 100 ms. We tested intervals of 0, 2, 4,
6, 8 and 10 st between T1 and T2, as in the online experiment.
An intertrial sequence of five tones, of the same type of those in
previous experiments and thus with no jitter, was presented
after each trial.
Baseline results replicated the main and online experiments
for non-musicians, but a dip in confidence appeared for
these non-musicians at a jitter of 100 ms (figure 4b). As
before, we compared confidence for the non-ambiguous
cases (2, 4, and 10 st) to confidence for the ambiguous case
(6 st). For the baseline condition, we found no ambiguity
effect (t15 ¼ 20.10, p ¼ 0.92). To quantify the effect of jitter
on a participant by participant basis, we normalized the
ambiguity effect observed for each jitter condition by subtracting the ambiguity effect observed for the baseline
condition. Figure 4c displays the individual data for such
normalized ambiguity effects. A negative ambiguity effect
(a decrease in confidence for the ambiguous case) would
correspond to the behaviour of musicians in the main experiment. Not all participants exhibited such a negative
ambiguity effect, but for some of them the effect was as large
as for musicians. The ambiguity effect differed from the baseline value only for the 100-ms condition (0 ms: t15 ¼ 21.40,
p ¼ 0.18; 50 ms: t15 ¼ 21.18, p ¼ 0.26; 100 ms: t15 ¼ 22.52,
p ¼ 0.024). However, when contrasting the three jitter conditions against each other, there was no significant difference
(all pairwise comparisons p . 0.17).
(iii) Apparatus and procedure
(iii) Metacognitive performance
Participants were tested individually in a sound-insulated
booth, as in the main experiment. They performed six blocks,
with a short rest after each block. The total duration of the
experiment including the rests was approximately 3 h, performed in a single session. During the first two blocks, only
stimuli with 0-ms jitter were presented, with 40 repeats per
interval. This was intended as a replication of the main experiment for this group of naive participants. We term these initial
two blocks the ‘baseline’ condition. For the following four
blocks, all intervals and jitter values (0, 50 and 100 ms) were
There was only one group of participants in this experiment,
but we still tested whether the ambiguity effect might have
been related to a different use of the confidence scale across
conditions. This was not the case, as variations in the ambiguity effect between baseline and 100-ms jitter were not
correlated to variations in meta-ratio (r14 ¼ 0.24, p ¼ 0.38).
(b) Method
(i) Participants
Sixteen naive participants were tested (age M ¼ 24, s.d. ¼ 0.8,
6 men and 10 women). All had less than 5 years of formal
musical training and were thus labelled non-musicians by
our criterion. They were paid for their participation.
(ii) Stimuli
(iv) Interim discussion
Non-musicians exhibited a significant confidence dip for the
non-ambiguous interval when a large temporal jitter was
8
Phil. Trans. R. Soc. B 372: 20160107
The main experiment and its online replication suggest that
musicians were aware of the ambiguity at 6 st, whereas
non-musicians were not. Following our initial hypothesis,
this would be consistent with a perceptual difference between
musicians and non-musicians: musicians were able to discern
the component tones of the stimuli, which enabled them to
discover the ambiguity. But it is also possible that the difference was in the report itself. Possibly, non-musician
participants also heard out the different component tones,
but were unable to discover the ambiguity. We assessed
this possibility by making component tones easier to discern
for non-musicians, through acoustic manipulation. We compared conditions where all component tones started in
synchrony, as in the previous two experiments, with conditions where the onset of each tone was jittered in time.
Successive components of complex chords that would be normally fused can be explicitly compared when onset jitter is
applied [36]. We hypothesized that, if non-musicians could
discern component tones thanks to onset jitter, they would
behave as the musicians of the main experiment.
presented at random, with 40 repeats per interval and jitter
value. We designate these conditions by their jitter values.
Note that the baseline and 0-ms jitter conditions used the
same stimuli, with no temporal jitter. However, the conditions
differed because baseline trials were presented in blocks that
only contained sounds without jitter, to replicate the main
experiment, whereas 0-ms jitter trials were interleaved with
trials for the 50- and 100-ms jitter conditions. Also, the baseline
condition blocks were performed first. Participants responded
through a computer keyboard. Response times were not
recorded. All other details of the apparatus were as in the
main experiment.
rstb.royalsocietypublishing.org
and removing outliers, the correlation between years of
musical practice and the ambiguity effect was maintained
(r112 ¼ 20.37, p , 0.001).
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
(b) 7
(a) 1.00
9
rstb.royalsocietypublishing.org
baseline
100 ms
6
confidence rating
P (up)
0.75
0.50
5
4
3
2
0
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
ambiguity effect re:baseline
(c)
1
0 1 2 3 4 5 6 7 8 9 10 11
T1–T2 interval (st)
0.4
0.2
0
–0.2
–0.4
–0.6
–0.8
–1.0
0 ms
50 ms
100 ms
Figure 4. Results of the control experiment. (a) The proportion of ‘up’ responses, P(up), is displayed for each interval tested in the control experiment. Format as in
figure 2a. Only results for the baseline condition (no jitter) and 100 ms condition (maximum jitter) are displayed, for clarity. (b) Confidence ratings. Format as in figure
2b. (c) For each participant, we normalized the ambiguity effect observed in each condition by subtracting the ambiguity effect observed for the baseline. The panel
displays individual means and standard errors about the means. Negative values signal a dip in confidence for the ambiguous case, when jitter was applied. (Online
version in colour.)
introduced, while for the baseline condition no dip was
observed. However, there was some variability across participants. Furthermore, results were statistically less clear cut
when all jitter conditions were interleaved (note also that we
did not correct for multiple comparisons). The lack of difference across jitter conditions is likely due to a trend towards
an ambiguity effect for 0 ms (figure 4c), even though this condition corresponded to the same stimulus as for the baseline
but presented in a different context. A possible interpretation
of this contextual effect is that non-musicians were able to partially discern the component tones, even without jitter, but
only when their attention had been drawn to the structure of
the stimulus by other trials that contained jitter. A further
point to consider is that we did not test whether the jitter
values were sufficient for all participants to discern component
tones, which may also account for part of the variability across
participants.
5. Discussion
Results from the three experiments can be summarized as follows. In the main experiment, all participants responded with
‘up’ and ‘down’ responses in equal measure when judging the
pitch direction of an ambiguous stimulus. The same pattern of
chance result was, predictably, observed when they compared
two physically identical stimuli. However, confidence differed
between the two cases: participants were more confident in
their judgement for the ambiguous tones than for identical
tones. Non-musician participants were even more confident
for ambiguous tones, for which they were at chance, than for
non-ambiguous tones, for which they could perform the task
accurately. As the ambiguous tones also contained the largest
interval between tones, results for non-musician participants
are consistent with confidence mirroring the size of the
perceived pitch shift, irrespective of stimulus ambiguity.
Musicians provided the same pattern of responses for ‘up’
and ‘down’ judgements as non-musicians, but, importantly,
response patterns differed markedly for confidence ratings.
For musicians, confidence was lower for the ambiguous tones
than for the non-ambiguous tones. In both groups of participants, response times mirrored the confidence ratings, with
higher confidence corresponding to faster responses. The
online experiment replicated those findings on a larger
cohort (n ¼ 134), further demonstrating that the ambiguity
effect was correlated with years of musical training. The control
Phil. Trans. R. Soc. B 372: 20160107
0.25
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
10
Phil. Trans. R. Soc. B 372: 20160107
segregation, because component tones had synchronous
onsets and were all spaced by octaves [55]. Scene analysis
processes related to informational masking may thus have
caused non-musicians to hear Shepard tones as perceptual
units, while reduced informational masking may have helped
musicians to hear out component tones. Moreover, in the
control experiment, asynchronous onsets between components
were introduced, a manipulation that has been shown to
reduce informational masking [56]. This led non-musicians to
behave qualitatively like musicians, again consistent with an
implication of informational masking in our task.
We now come back to the distinction, drawn in §1, between
vague and polar percepts. The results suggest that, for nonmusicians, perception was polar: the intrinsic ambiguity of
the stimuli was unavailable to awareness. This is consistent
with recent observations using bistable visual stimuli [8]. In
this visual study, the degree of ambiguity of bistable stimuli
(defined as the proximity to an equal probability of reporting
either percept) was varied, and reaction times were collected
for the report of the first percept. No increase in reaction time
was observed for more ambiguous stimuli, consistent with
an unawareness of ambiguity, as in our results. Why would
ambiguity, a potentially useful cue, not be registered by observers? One hypothesis is that ambiguity is not a valid property
of perceptual organization at any given moment: either an
acoustic feature belongs to one source, or it does not. This
has been formulated as a principle of exclusive allocation for
auditory scene analysis [55,57]. However, there are exceptions
to this principle, as for instance when a mistuned harmonic
within a harmonic complex is heard out from the complex,
but still contributes to the overall pitch of the complex
[44,46,58]. Another hypothesis, perhaps more specific to our
experimental set-up, involves perceptual binding over time.
Shepard tones are made up of several frequency components,
which have to be paired over time to define a pitch shift. If
musicians heard Shepard tones as perceptual units, this may
have biased the binding processes towards a single and unambiguous direction of shift. By contrast, if musicians were able to
discern components, they may have experienced contradictory
directions of pitch shift and thus ambiguity.
To conclude, we must point out that these interpretations remain speculative, especially as we have no direct
evidence that musicians heard out component tones. Further
tests could include counterparts to our control experiment,
increasing informational masking to prevent musicians from
discerning component tones. This could be achieved, for
instance, by presenting shorter duration sounds. Context effects
could also be used to bias perceptual binding in a consistent
manner across components [33], again predicting higher
confidence for musicians. Irrespective of the underlying mechanisms, the mere existence of polar percepts demonstrates that a
seemingly obvious feature of sensory information, ambiguity, is
sometimes unavailable to perceptual awareness. The precise
conditions required for polar perception remain to be explored
experimentally. Ambiguity in the stimulus often causes
multistable perception [9], but it is still unknown whether
all multistable stimuli elicit polar perception, or whether all
polar percepts are associated with spontaneous perceptual
alternations over time. By analogy to categorical perception, it
could also be tested whether polar perception is a feature of conscious processing and absent from subliminal processing [59].
Finally, we showed that ambiguity was not experienced in
the same way by different observers, so polar perception
rstb.royalsocietypublishing.org
experiment showed that at least some naive non-musicians,
initially unaware of the ambiguity, could exhibit an ambiguity
effect when component tones were easier to discern. This confirms that the difference between musicians and non-musicians
was perceptual and not purely decisional.
What was the nature of the perceptual difference between musicians and non-musicians? Based on prior evidence
[29,30] combined with the control experiment, we hypothesize
that musicians were better able to discern acoustic components
within Shepard tones. Discerning the components would have
revealed that two opposite pitch-shift directions were available,
leading to low confidence in the forced-choice task. The advantage previously demonstrated for musicians when discerning
tones within a chord [30] or a complex tone background [28]
did not seem to be based on enhanced peripheral frequency
selectivity (although see [45]). In our case, the octave spacing
of Shepard tones was also greater than the frequency separation
required to discern partials [46], so we can rule out a difference
in the accuracy of peripheral representations.
Another possible difference between listeners, perhaps due
to more central processes, is the distinction first introduced
by Helmholtz between ‘analytic’ and ‘holistic’ listeners [47].
Stimuli containing frequency shifts changing in one direction
if one focuses on individual components, or the opposite direction if one focuses on the (missing) fundamental frequency,
have been used to characterize such a difference. Analytic
listeners focus on individual components, whereas holistic listeners focus on the missing fundamental. In one study, the
analytic/holistic pattern of response was found to be correlated
with brain anatomy, but not musicianship [48]. A later behavioural investigation suggested, in contrast, that musicians were
overall more holistic [49]. Recent data indicate that listeners
may change their listening style according to the task or stimulus parameters [50]. In our case, musicians would be classified
as analytic if they spontaneously heard out component tones
within ambiguous stimuli. Non-musicians were able to
switch from holistic to analytic when the component tones
were easier to discern. This confirms that the holistic/analytic
distinction is task dependent [50].
We would argue that a more general characteristic is relevant
to the interindividual differences we observed: the ability to
focus on sub-parts of a perceptual scene. In the visual modality,
experts such as drawing artists [51] or video-game enthusiasts
[52] have been shown to better resist visual crowding (the
inability to distinguish items presented simultaneously in the
visual periphery even though each item would be easily recognized on its own). This has been interpreted as enhanced
‘attentional resolution’ for experts. The notion of attentional
resolution can be transposed to auditory perception, through
what has been termed ‘informational masking’ [53,54]. Informational masking is defined as the impairment in detecting a target
caused by irrelevant background sounds, in the absence of any
overlap between targets and maskers at a peripheral level of
representation [53]. An effect of auditory expertise has been
shown for informational masking. Musicians are less susceptible
to informational masking than non-musicians when using
complex maskers such as random tone clouds with a large
degree of uncertainty [28], environmental sounds [20] or simultaneous speech [23]. This has been interpreted as better
attentional resolution for musicians [28]. Importantly, informational masking can also occur in simpler situations, through a
failure to perceptually segregate target and background [53].
The Shepard tones we used certainly challenged perceptual
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
may be another useful trait to consider when investigating
interindividual differences.
may thus provide a useful tool for probing the neural bases of
interindividual differences in perception.
6. Conclusion
IRB
Data accessibility. Data archives are provided as the electronic supplementary material. Each file provides data for one figure panel,
identified by the file name. Data are provided for musicians (m in
the file name) and non-musicians (nm in the file name).
Authors’ contributions. D.P. and P.E. initiated the research. C.P., D.P. and
V.d.G. designed the experiments. C.P. collected the data. C.P. and
V.d.G. analysed the data. All authors interpreted the data and drafted
the manuscript.
Competing interests. We have no competing interest.
Funding. C.P. and D.P. were supported by grant ERC ADAM nos
295603, EU H2020 COCOHA 644732. C.P., P.E. and D.P. were supported by grant nos ANR-10-LABX-0087 IEC and ANR-10-IDEX0001-02 PSL. P.E. was partly supported by a fellowship from the
Swedish Collegium for Advanced Study.
Acknowledgements. We thank Maxime Sirbu for implementing the online
experiment. We also thank Brian C. J. Moore and two anonymous
reviewers for useful comments and suggestions on a previous version
of this manuscript.
References
1.
Brainard DH, Hurlbert AC. 2015 Colour vision:
understanding #thedress. Curr. Biol. 25,
R551 –R554. (doi:10.1016/j.cub.2015.05.020)
2. Deutsch D. 1986 A musical paradox. Music Percept.
3, 275–280. (doi:10.2307/40285337)
3. Kanai R, Rees G. 2011 The structural basis of interindividual differences in human behaviour and
cognition. Nat. Rev. Neurosci. 12, 231– 242. (doi:10.
1038/nrn3000)
4. Shepard RN. 1964 Circularity in judgments of
relative pitch. J. Acoust. Soc. Am. 36, 2346. (doi:10.
1121/1.1919362)
5. Kleinschmidt A, Sterzer P, Rees G. 2012 Variability
of perceptual multistability: from brain state to
individual trait. Phil. Trans. R. Soc. B 367,
988–1000. (doi:10.1098/rstb.2011.0367)
6. Moreno S, Bidelman GM. 2014 Examining neural
plasticity and cognitive benefit through the unique
lens of musical training. Hear. Res. 308, 84 –97.
(doi:10.1016/j.heares.2013.09.012)
7. de Gardelle V, Mamassian P. 2015 Weighting mean
and variability during confidence judgments. PLoS
ONE 10, e0120870. (doi:10.1371/journal.pone.
0120870)
8. Takei S. 2010 Perceptual ambiguity of bistable
visual stimuli causes no or little increase in
perceptual latency. J. Vis. 10, 1– 15. (doi:10.1167/
10.4.23)
9. Schwartz JL, Grimault N, Hupe JM, Moore BCJ,
Pressnitzer D. 2012 Multistability in perception:
binding sensory modalities, an overview. Phil.
Trans. R. Soc. B 367, 896 –905. (doi:10.1098/rstb.
2011.0254)
10. Wexler M, Duyck M, Mamassian P. 2015 Persistent
states in vision break universality and time
invariance. Proc. Natl Acad. Sci. USA 112, 14 990–
14 995. (doi:10.1073/pnas.1508847112)
11. Deutsch D. 1991 The tritone paradox: an influence
of language on music perception. Music Percep. 8,
335 –347. (doi:10.2307/40285517)
12. Repp BH. 1994 The tritone paradox and the pitch
range of the speaking voice: a dubious connection.
Music Percept. 12, 227–255. (doi:10.2307/
40285653)
13. Fleming SM, Weil RS, Nagy Z, Dolan RJ, Rees G.
2010 Relating introspective accuracy to individual
differences in brain structure. Science 329,
1541 –1543. (doi:10.1126/science.1191883)
14. Vogel EK, Awh E. 2008 How to exploit diversity for
scientific gain: using individual differences to
constrain cognitive theory. Curr. Dir. Psychol. Sci. 17,
171 –176. (doi:10.1111/j.1467-8721.2008.00569.x)
15. Lin I-F, Shirama A, Kato N, Kashino M. 2017 The
singular nature of auditory and visual scene analysis
in autism. Phil. Trans. R. Soc. B 372, 20160115.
(doi:10.1098/rstb.2016.0115)
16. Kondo HM, van Loon AM, Kawahara J-I, Moore BCJ.
2017 Auditory and visual scene analysis: an
overview. Phil. Trans. R. Soc. B 372, 20160099.
(doi:10.1098/rstb.2016.0099)
17. Takeuchi T, Yoshimoto S, Shimada Y, Kochiyama T,
Kondo HM. 2017 Individual differences in visual
motion perception and neurotransmitter
concentrations in the human brain. Phil.
Trans. R. Soc. B 372, 20160111. (doi:10.1098/rstb.
2016.0111)
18. Kashino M, Kondo HM. 2012 Functional brain
networks underlying perceptual switching: auditory
streaming and verbal transformations. Phil.
Trans. R. Soc. B 367, 977–987. (doi:10.1098/rstb.
2011.0370)
19. Zatorre R, McGill J. 2005 Music, the food of
neuroscience? Nature 434, 312–315. (doi:10.1038/
434312a)
20. Carey D, Rosen S, Krishnan S, Pearce MT, Shepherd
A, Aydelott J, Dick F. 2015 Generality and specificity
in the effects of musical expertise on perception
and cognition. Cognition 137, 81– 105. (doi:10.
1016/j.cognition.2014.12.005)
21. Parbery-Clark A, Skoe E, Lam C, Kraus N. 2009
Musician enhancement for speech-in-noise. Ear
Hear. 30, 653 –661. (doi:10.1097/AUD.
0b013e3181b412e9)
22. Ruggles DR, Freyman RL, Oxenham AJ. 2014
Influence of musical training on understanding
voiced and whispered speech in noise. PLoS ONE 9,
e86980. (doi:10.1371/journal.pone.0086980)
23. Swaminathan J, Mason CR, Streeter TM, Best V,
Gerald Kidd J, Patel AD. 2015 Musical training,
individual differences and the cocktail party
problem. Sci. Rep. 5, 11628. (doi:10.1038/
srep11628)
24. Başkent D, Gaudrain E. 2016 Musician advantage for
speech-on-speech perception. J. Acoust. Soc. Am.
139, EL51– EL56. (doi:10.1121/1.4942628)
25. Boebinger D, Evans S, Rosen S, Lima CF, Manly T, Scott
SK. 2015 Musicians and non-musicians are equally
adept at perceiving masked speech. J. Acoust. Soc.
Am. 137, 378–387. (doi:10.1121/1.4904537)
26. Bey C, McAdams S. 2002 Schema-based processing
in auditory scene analysis. Percept. Psychophys. 64,
844–854. (doi:10.3758/BF03194750)
27. Beauvois MW, Meddis R. 1997 Time decay of
auditory stream biasing. Percept. Psychophys. 59,
81– 86. (doi:10.3758/BF03206850)
28. Oxenham AJ, Fligor BJ, Mason CR, Kidd G. 2003
Informational masking and musical training.
J. Acoust. Soc. Am. 114, 1543–1547. (doi:10.1121/
1.1598197)
29. Zendel BR, Alain C. 2009 Concurrent sound
segregation is enhanced in musicians. J. Cogn.
Phil. Trans. R. Soc. B 372: 20160107
Ambiguous auditory stimuli [2,4,33] were judged with high
confidence and fast reaction times by naive non-musician
listeners, showing that those listeners were unaware of the
physical ambiguity of the sounds. This confirms an untested
assumption in previous reports about those stimuli [2,4,33,60].
By contrast, musicians judged ambiguous stimuli with less confidence and slower reaction times, suggesting that they did
perceive the ambiguity. We interpreted this interindividual
difference as an enhanced attentional resolution in crowded
perceptual scenes for musicians [28–30]. From a methodological perspective, we showed robust effects in confidence
judgements for matched performance within participants
(identical sounds versus ambiguous sounds), and robust interindividual differences for matched performance and matched
stimuli (musicians versus non-musicians). Ambiguous stimuli
rstb.royalsocietypublishing.org
Ethics. Ethical approval was provided by the CERES
#20142500001072 (Université Paris Descartes, France).
11
Downloaded from http://rstb.royalsocietypublishing.org/ on January 5, 2017
31.
32.
34.
35.
36.
37.
38.
39.
40.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
2013 Patterns of individual differences in the
perception of missing-fundamental tones. J. Exp.
Psychol. Hum. Percept. Perform. 39, 1386–1397.
(doi:10.1037/a0031261)
Perdreau F, Cavanagh P. 2014 Drawing skill is
related to the efficiency of encoding object
structure. i-Perception 5, 101 –119. (doi:10.1068/
i0635)
Green CS, Bavelier D. 2007 Action-video-game
experience alters the spatial resolution of vision.
Psychol. Sci. 18, 88 –94. (doi:10.1111/j.1467-9280.
2007.01853.x)
Kidd G, Mason CR, Richards VM, Gallun FJ, Durlach
NI. 2007 Informational masking. Boston, MA:
Springer.
Durlach NI, Mason CR, Kidd G, Arbogast TL, Colburn
HS, Shinn-Cunningham BG. 2003 Note on
informational masking (L). J. Acoust. Soc. Am. 113,
2984. (doi:10.1121/1.1570435)
Bregman AS. 1990 Auditory scene analysis. The
perceptual organization of sound. Cambridge, MA:
MIT Press.
Neff DL. 1995 Signal properties that reduce masking
by simultaneous, random-frequency maskers.
J. Acoust. Soc. Am. 98, 1909– 1920. (doi:10.1121/
1.414458)
Shinn-Cunningham BG, Lee AKC, Oxenham AJ. 2007
A sound element gets lost in perceptual
competition. Proc. Natl Acad. Sci. USA 104,
12 223 –12 227. (doi:10.1073/pnas.0704641104)
Moore BCJ. 1985 Relative dominance of individual
partials in determining the pitch of complex tones.
J. Acoust. Soc. Am. 77, 1853. (doi:10.1121/1.391936)
de Gardelle V, Charles L, Kouider S. 2011 Perceptual
awareness and categorical representation of faces:
evidence from masked priming. Conscious. Cogn. 20,
1272– 1281. (doi:10.1016/j.concog.2011.02.001)
Repp BH, Thompson JM. 2009 Context sensitivity
and invariance in perception of octave-ambiguous
tones. Psychol. Res. 74, 437– 456. (doi:10.1007/
s00426-009-0264-9)
12
Phil. Trans. R. Soc. B 372: 20160107
33.
41. Maniscalco B, Lau H. 2012 A signal detection
theoretic approach for estimating metacognitive
sensitivity from confidence ratings. Conscious. Cogn.
21, 422– 430. (doi:10.1016/j.concog.2011.09.021)
42. Maniscalco B, Lau H. 2014 Signal detection theory
analysis of type 1 and type 2 data: meta-d0 ,
response-specific meta-d0 , and the unequal variance
SDT model. In The cognitive neuroscience of
metacognition, pp. 25 –66. Berlin, Germany:
Springer.
43. Fleming SM, Lau HC. 2014 How to measure
metacognition. Front. Hum. Neurosci. 8, 443.
(doi:10.3389/fnhum.2014.00443)
44. Moore B, Peters RW, Glasberg BR. 1985 Thresholds for
the detection of inharmonicity in complex tones.
J. Acoust. Soc. Am. 77, 1861. (doi:10.1121/1.391937)
45. Bidelman GM, Schug JM, Jennings SG, Bhagat SP.
2014 Psychophysical auditory filter estimates reveal
sharper cochlear tuning in musicians. J. Acoust. Soc.
Am. 136, EL33– EL39. (doi:10.1121/1.4885484)
46. Moore BC, Glasberg BR, Peters RW. 1986 Thresholds
for hearing mistuned partials as separate tones
in harmonic complexes. J. Acoust. Soc. Am. 80,
479 –483. (doi:10.1121/1.394043)
47. Schneider P, Wengenroth M. 2009 The neural basis
of individual holistic and spectral sound perception.
Contemp. Music Rev. 28, 315 –328. (doi:10.1080/
07494460903404402)
48. Schneider P et al. 2005 Structural and functional
asymmetry of lateral Heschl’s gyrus reflects pitch
perception preference. Nat. Neurosci. 8, 1241–1247.
(doi:10.1038/nn1530)
49. Seither-Preisler A, Johnson L, Krumbholz K, Nobbe
A, Patterson R, Seither S, Lütkenhöner B. 2007 Tone
sequences with conflicting fundamental pitch and
timbre changes are heard differently by musicians
and nonmusicians. J. Exp. Psychol. Hum. Percept.
Perform. 33, 743–751. (doi:10.1037/0096-1523.
33.3.743)
50. Ladd DR, Turnbull R, Browne C, Caldwell-Harris C,
Ganushchak L, Swoboda K, Woodfield V, Dediu D.
rstb.royalsocietypublishing.org
30.
Neurosci. 21, 1488–1498. (doi:10.1162/jocn.
2009.21140)
Fine PA, Moore BCJ. 1993 Frequency analysis
and musical ability. Music Percept. 11, 39 –53.
(doi:10.2307/40285598)
Raffman D. 2014 Unruly words: a study of vague
language. Oxford, UK: Oxford University Press.
Egré P, de Gardelle V, Ripley D. 2013 Vagueness and
order effects in color categorization. J. Logic Lang.
Inform. 22, 391– 420. (doi:10.1007/s10849-0139183-7)
Chambers C, Pressnitzer D. 2014 Perceptual
hysteresis in the judgment of auditory pitch shift.
Atten. Percept. Psychophys. 76, 1271 –1279. (doi:10.
3758/s13414-014-0676-5)
Repp BH, Knoblich G. 2007 Action can affect
auditory perception. Psychol. Sci. 18, 6–7. (doi:10.
1111/j.1467-9280.2007.01839.x)
Kalisvaart JP, Klaver I, Goossens J. 2011 Motion
discrimination under uncertainty and ambiguity.
J. Vis. 11, 20. (doi:10.1167/11.1.20)
Demany L, Semal C, Pressnitzer D. 2011 Implicit
versus explicit frequency comparisons: two
mechanisms of auditory change detection. J. Exp.
Psychol. Hum. Percept. Perform. 37, 597– 605.
(doi:10.1037/a0020368)
Semal C, Demany L. 2006 Individual differences in
the sensitivity to pitch direction. J. Acoust. Soc. Am.
120, 3907 –3909. (doi:10.1121/1.2357708)
Mathias SR, Micheyl C, Bailey PJ. 2010 Stimulus
uncertainty and insensitivity to pitch-change
direction. J. Acoust. Soc. Am. 127, 3026. (doi:10.
1121/1.3365252)
Foxton JM, Brown ACB, Chambers S, Griffiths TD.
2004 Training improves acoustic pattern perception.
Curr. Biol. 14, 322– 325. (doi:10.1016/j.cub.2004.
02.001)
Wichmann FA, Hill NJ. 2001 The psychometric
function: I. Fitting, sampling, and goodness of fit.
Percept. Psychophys. 63, 1293– 1313. (doi:10.3758/
BF03194544)