Rapid pitch correction in choir singers

Anke Grell(a)
Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), SE-10044 Stockholm,
Sweden and Institute for Music Physiology and Musicians’ Medicine (IMMM), Hannover University of Music
and Drama, Hohenzollernstrasse 47, DE-30161 Hannover, Germany
Johan Sundberg and Sten Ternström
Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), SE-10044 Stockholm,
Sweden
Martin Ptok
Department of Phoniatry and Paediatric Audiology, Hannover Medical School, Carl-Neuberg-Strasse 1, DE-30625 Hannover, Germany
Eckart Altenmüller
Institute for Music Physiology and Musicians’ Medicine (IMMM), Hannover University of Music and
Drama, Hohenzollernstrasse 47, DE-30161 Hannover, Germany
(Received 3 June 2008; revised 6 May 2009; accepted 11 May 2009)
Highly and moderately skilled choral singers listened to a perfect fifth reference, with the instruction
to complement the fifth such that a major triad resulted. The fifth was suddenly and unexpectedly
shifted in pitch, and the singers’ task was to shift the fundamental frequency of the sung tone
accordingly. The F0 curves during the transitions often showed two phases, an initial quick and large
change followed by a slower and smaller change, apparently intended to fine-tune voice F0 to
complement the fifth. Anesthetizing the vocal folds of moderately skilled singers tended to delay the
reaction. The mean response times ranged from 197 to 259 ms, depending on the direction and size of the pitch shifts, as well as on skill and anesthetization.
© 2009 Acoustical Society of America. [DOI: 10.1121/1.3147508]
PACS number(s): 43.75.Rs [DD]
Pages: 407–413
I. INTRODUCTION
The voice is probably the most important tool for interhuman communication. Concerning the acoustic properties
of the voice, several aspects contribute to the information
transfer. Among them pitch may be one of the most prominent, being used to code emotional arousal, stress, prosodic
and grammatical sentence structure. A deviation from the
target pitch contour of an utterance therefore might affect or
even destroy the meaning of a spoken sentence.
With respect to pitch control, singers are faced with a
specific problem. Not only do they have to reach and preserve the required pitch in a melodic phrase with high precision, but also they need to rapidly adapt their intonation to
accompanying instruments or to fellow singers in an ensemble. Professional singers can be expected to be highly
skilled in this task, and even semiprofessional choir singers
should be able to quickly adapt their pitch, since this is a
mandatory prerequisite for the joyful experience of harmonious group singing.
The mechanisms allowing such precise pitch control have been the subject of several investigations. Wyke (1967) stressed the relevance of a combined acoustico-laryngeal reflex system. According to Burnett et al. (1998), auditory information from the spiral ganglion in the inner ear reaches, via the ventral and dorsal cochlear nuclei, a collection of nuclei constituting the superior olive. From the superior olive, a fiber tract termed the lateral lemniscus projects to the inferior colliculus. The brainstem pathway then projects directly from the inferior colliculus to the periaqueductal gray and then, via the nucleus retroambiguus and nucleus ambiguus, to the motoneurons of the respiratory system in the spinal cord and to the laryngeal motor system in the nucleus ambiguus.
(a) Author to whom correspondence should be addressed. Present address: Kleiner Schäferkamp 16E, DE-20357 Hamburg, Germany. Electronic mail: [email protected]
J. Acoust. Soc. Am. 126 (1), July 2009
The long cortical auditory-motor pathway follows the ascending auditory path from the inferior colliculus to the medial geniculate nucleus. The ventral region of the medial geniculate nucleus projects to the primary auditory cortex (area 41), the dorsal region of the medial geniculate nucleus to the secondary auditory cortex (areas 42 and 22). From there, a pitch control system could project from the anterior cingulate gyrus and from the dorsolateral prefrontal motor region via the limbic system to the periaqueductal gray and finally follow the same structures as the reflex-like pathway to reach the vocal folds and the respiratory system (Duus, 1995; Deetjen et al., 2005; Friedrich et al., 2004; Schönweiler and Ptok, 2004).
Most evidence for these pathways comes from animal research (e.g., Jürgens, 2006). However, empirical findings in humans are also available. Sapir et al. (1982), for example, asked their subjects to vocalize with constant pitch and intensity while receiving auditory stimulation with clicks of different intensities. The electromyogram (EMG) of the cricothyroid muscle and the vocal output were recorded and averaged, using the respective clicks as triggers. The authors found short-latency responses in fundamental frequency (F0) (50 ms) and EMG (11 ms) to the auditory stimulation, which supports the brainstem pathway outlined above.
Brown et al. (2008) found two sets of activation peaks in their functional magnetic resonance imaging study while subjects were phonating and performing glottal stops. One ventromedial peak was located deep in the central sulcus and one dorsolateral peak in area 6, which is more superficial. These two peaks outline a kind of wedge, extending from a sulcal position ventrally to a gyral position dorsally and anteriorly. The authors referred to this area as the “larynx/phonation area” and claimed that it is the major region for vocal control in the human motor cortex. They further suggested a connection between this area and the nucleus ambiguus, which is involved in activating the intrinsic laryngeal muscles.
Over the past decades several investigations have studied reactions to suddenly occurring pitch shifts in the auditory feedback. Burnett et al. (1997, 1998) studied the relevance of the feedback of one’s own voice to the reaction time for pitch corrections in trained singers. Subjects were instructed to phonate at a constant pitch. Without warning, the auditory feedback was manipulated such that the pitch of their own voices, as heard by the subjects, suddenly shifted by about a semitone. Analyzing the F0 change curves, Burnett et al. (1997, 1998) identified two components in most reactions: an early one after about 160 ms on average and a longer-latency response after approximately 300 ms.
Kestler et al. (1999) used the same experimental approach and analyzed the influence of the singers’ expertise by comparing opera singers to laymen. When measuring the reaction times, i.e., the time between the change in the auditory feedback pitch and the singer’s pitch change, they found two peaks in the distribution for professional singers, an early one at about 113 ms and a late one at 261 ms, while non-singers showed only one peak, at 135 ms.
In both these investigations the results were interpreted as evidence for two pathways involved in pitch control in singing: a rapid mechanism that may operate via the brainstem and a slower mechanism that may operate via the cerebral cortex. Kestler et al. (1999) proposed that the slower mechanism includes analysis, e.g., of the precise pitch change required to stay in tune.
Zarate and Zatorre (2008) recently found neurophysiological support for the idea that singers and non-singers recruit different brain regions for pitch control. When subjects were asked to compensate for a pitch shift in the auditory feedback of their singing, the singers recruited bilateral auditory areas and the left putamen, while non-musicians recruited the left supramarginal gyrus and primary motor cortex.
The technique of pitch-shifted auditory feedback has also been applied in experiments on speech production. Xu et al. (2004) analyzed the effect of such shifts on the production of Mandarin tone sequences and found that the majority of the compensatory pitch changes occurred at 143 ms. Applying the same technique, Chen et al. (2007) studied the effects in the production of English speech. They found a mean latency of 122 ms in syllable production and a somewhat longer latency in vowel production.
Singers are likely to rely not only on auditory but also on proprioceptive feedback to reach this level of pitch control. Wyke (1974) found three different types of laryngeal mechanoreceptors: stretch-sensitive myotatic receptors in the intrinsic muscles of the larynx, mucosal receptors in the subglottic mucosa, and articular receptors in the fibrous capsules of the intercartilaginous joints. At any change in tension, subglottic pressure, or posture, the information reaches the brainstem via Ia afferents, and motor impulses then return to the larynx as a reflex via efferent alpha-motoneurons. From a longitudinal study of pitch accuracy under different conditions, Mürbe et al. (2004) concluded that kinesthetic feedback substantially contributes to singers’ pitch control.
Larson et al. (2008) also demonstrated the influence of kinesthetic feedback on vocal pitch control by anesthetizing singers’ vocal folds. They found longer response latencies to pitch-shifted auditory feedback of the voice in the anesthetized condition than in the pre-anesthetic condition. Additionally, they developed a mathematical model suggesting that “early in response, kinesthesia alone provides feedback control, but after about 100 ms, auditory feedback also participates.”
However, the role of the kinesthetic system in vocal pitch control is still incompletely understood.
The aim of the present investigation was to elucidate the physiological mechanisms underlying pitch control. To this end, the authors measured the distribution of reaction times observed when subjects were required to adjust their F0 in response to a sudden shift of an external tuning reference. Measurements were made under three conditions. In experiments 1 and 2, the effect of training was studied by comparing the response times of highly and moderately skilled singers, respectively. Experiment 3 aimed at studying the importance of the laryngeal kinesthetic system by analyzing the effect of anesthetizing the vocal folds.
II. MATERIAL AND METHODS
The subjects in experiment 1 consisted of a group of 13
highly skilled female choir singers, mean age 28.9 years
共range 23–39 years兲. All had extensive musical experience.
The mean choral experience was 19.3 years, range 5–30
years, and all had been taking solo singing lessons for 10.2
years on average, range 4–17 years.
Eleven moderately skilled female singers constituted the subject group in experiment 2, aged 14–24 years (mean 19.9 years). On average, they had 10 years of experience of choral singing.
Five moderately skilled female choral singers participated as subjects in experiment 3. They were all experienced singers from different choirs in Hannover, aged 23–27 years (mean 24.8 years). Two of them also participated in experiment 2.
At the time of the experiments none of the subjects reported any voice disease, hearing loss, or other medical problems.
Grell et al.: Rapid pitch correction in choir singers
The procedure was basically the same in all three experiments, although they were run in different sound-insulated rooms. Experiment 1 was run at the Department of Speech, Music and Hearing, KTH, Stockholm, experiment 2 at the Institute for Music Physiology and Musicians’ Medicine, Hannover, Germany, and experiment 3 at the Department of Phoniatry and Paediatric Audiology, Hannover Medical School, Germany.
The subjects were presented with reference dyad stimuli
representing the sound of two fellow choral singers singing a
fifth interval. Their task was to complement this dyad by
singing the missing major third such that a complete major
triad resulted.
The two tones constituting the reference stimuli were prepared in the following way. The tones were sung by a female singer on the vowel /u:/ and recorded digitally one by one on a Yakumo 166 MHz computer. The F0 values were adjusted in COOL EDIT (1996) so that they produced a perfect fifth dyad (D4 = 293.7 Hz and A4 = 440 Hz). The two tones were then mixed and edited to 5 s duration. Using the COOL EDIT program, the F0 values of the two tones were shifted up or down by either a quarter tone or a semitone at a point in time randomly selected to occur at t = 1.5, 2.0, 2.5, 3.0, or 3.5 s after stimulus onset. The resulting 20 stimuli (4 shifts × 5 times) were recorded in random order and separated by 10 s of silence. The set of 20 stimuli was presented three times to each subject, making a total of 60 stimuli for every participant in each of the three experiments.
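The stimulus design can be made concrete with a short Python sketch, which is not the authors' code: COOL EDIT performed the actual pitch shifting, and this only computes the target frequencies of the shifted dyad and one possible presentation order. The names `shift_hz` and `build_session` are hypothetical, and shuffling the 20 combinations once and replaying that same order three times is an assumption about how the recorded set was reused.

```python
from __future__ import annotations

import random

# Reference dyad as given in the text (Hz): D4 and A4 form the perfect fifth.
D4_HZ = 293.7
A4_HZ = 440.0

# Shift sizes in cents (quarter tone = 50 cents, semitone = 100 cents), down and up,
# and the five possible shift times after stimulus onset (s).
SHIFTS_CENTS = (-100, -50, 50, 100)
SHIFT_TIMES_S = (1.5, 2.0, 2.5, 3.0, 3.5)


def shift_hz(freq_hz: float, cents: float) -> float:
    """Transpose a frequency by a number of cents (1200 cents = one octave)."""
    return freq_hz * 2.0 ** (cents / 1200.0)


def build_session(repetitions: int = 3, seed: int | None = None) -> list[tuple[int, float]]:
    """Return the presentation order as (shift in cents, shift time in s) pairs.

    The 4 x 5 = 20 combinations are shuffled once and the same order is
    repeated `repetitions` times (an assumption, see the lead-in).
    """
    rng = random.Random(seed)
    stimulus_set = [(c, t) for c in SHIFTS_CENTS for t in SHIFT_TIMES_S]
    rng.shuffle(stimulus_set)
    return stimulus_set * repetitions


if __name__ == "__main__":
    session = build_session(seed=1)
    print(len(session))  # 60 stimuli per participant
    cents, t = session[0]
    print(f"shift {cents:+d} cents at t = {t} s:",
          round(shift_hz(D4_HZ, cents), 1), "Hz and",
          round(shift_hz(A4_HZ, cents), 1), "Hz")
```

For example, a semitone upward shift moves the dyad from 293.7/440 Hz to about 311.2/466.2 Hz.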
The initial pitch of the missing third was always F#4 (370 Hz). When the reference pitch shifted, the singers’ task was to adapt to this shift by changing their F0 accordingly. They were asked to perform this correction as quickly and accurately as possible and to phonate continuously until they reached the new target F0.
The procedures were identical in experiments 1 and 2; only the subjects differed, those in experiment 1 being more experienced singers than those in experiment 2.
The anesthesia used in experiment 3 was applied by an experienced phoniatrician. Before local anesthesia, an allergy to lidocaine was ruled out. The subjects were then asked to phonate a neutral vowel /e/. During phonation, and while the tongue was gently held by the examiner, lidocaine was sprayed onto the vocal folds using a commercially available spraying device (Xylocain Pumpspray, AstraZeneca, Wedel, Germany). Two sprays of 20 mg lidocaine 5% were applied. Owing to the position of the tip of the spray device, not only the vocal folds but also the whole aditus ad laryngis was anesthetized. The recording was started when the subjects reported throat numbness and swallowing difficulties. According to clinical experience, these effects are reliable signs of an anesthesia sufficient for performing phonosurgery.
The anesthesia was administered in a room close to the
experimental setup, so that the subjects could start the experiment immediately after the anesthesia was applied. The
experimental procedure was approved by the local ethical
committee.
As the subjects’ ability to perceive frequency differences was crucial to the outcome of the main experiment, two additional tests were run to determine their just noticeable difference (JND) for F0. These tests were conducted after the first and after the second set of 20 stimuli. However, as the anesthetization effect in experiment 3 lasted for no longer than about 20 min, in that experiment all three blocks of singing were performed in one sequence and the JND tests were carried out afterwards.
In the first JND test, the vowel /u:/ sung at an F0 of 293.8 Hz was presented, followed by the same vowel sound with an F0 increased by n × 1.73 cents, 0 ≤ n ≤ 20. Thus, the smallest stimulus difference was 0 cents, the second smallest was 1.73 cents, the third was 2 × 1.73 cents, and so on, and the largest was 20 × 1.73 = 34.6 cents, corresponding to 299.7 Hz. In the second test, the vowel sounds were replaced with sine tones. The subjects sat in front of a computer screen listening to two consecutive tones, each 2 s long and separated by a 1.16 s pause. The tones were presented monophonically over two loudspeakers. In a forced-choice task, the singers were asked to decide whether or not they had heard the same stimulus twice and to click accordingly on a yes or a no button on the screen. If three consecutive answers were correct, the frequency difference between the following two tones was decreased, and if an answer was wrong, the frequency difference in the following stimulus was increased. After about 10 min the test was stopped, and the subject’s JND was calculated automatically, thus specifying the subject’s ability to discriminate very small pitch differences.
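The first JND test's frequency ladder and the adaptive rule described above can be sketched in Python as follows. This is a hypothetical reconstruction, not the test software used in the study: `comparison_hz`, `run_staircase`, the starting step, the trial count, and the use of the mean reversal point as the JND estimate are all assumptions, since the text does not say how the JND was computed. A rule that lowers the difference after three consecutive correct answers and raises it after one error is a 3-down/1-up staircase, which converges near 79% correct.

```python
from __future__ import annotations

from typing import Callable

BASE_HZ = 293.8    # standard tone of the first JND test (Hz)
STEP_CENTS = 1.73  # ladder step (cents)
MAX_STEP = 20      # n runs from 0 to 20


def comparison_hz(n: int) -> float:
    """Frequency of the comparison tone n ladder steps above the standard."""
    return BASE_HZ * 2.0 ** (n * STEP_CENTS / 1200.0)


def run_staircase(respond: Callable[[int], bool], start: int = MAX_STEP,
                  trials: int = 60) -> float:
    """Hypothetical 3-down/1-up staircase over the ladder index n.

    respond(n) returns True when the listener answers correctly at step n.
    Three consecutive correct answers lower n by one step; one error raises it.
    The returned JND estimate (mean ladder index at the reversals, in cents)
    is an assumption.
    """
    n = start
    correct_run = 0
    last_direction = 0
    reversals: list[int] = []
    for _ in range(trials):
        if respond(n):
            correct_run += 1
            if correct_run < 3:
                continue          # stay at this step until three in a row
            correct_run = 0
            direction = -1        # make the task harder
        else:
            correct_run = 0
            direction = +1        # make the task easier
        if last_direction and direction != last_direction:
            reversals.append(n)   # the track turned at this step
        last_direction = direction
        n = min(MAX_STEP, max(0, n + direction))
    if not reversals:
        return n * STEP_CENTS
    return sum(reversals) / len(reversals) * STEP_CENTS


if __name__ == "__main__":
    print(round(comparison_hz(MAX_STEP), 1))   # top of the ladder, in Hz
    # Simulated listener (hypothetical): correct only when n >= 6 steps.
    print(round(run_staircase(lambda n: n >= 6), 2), "cents")
```

With a simulated listener who answers correctly only when n ≥ 6, the estimate settles between 5 and 6 steps (about 8.7–10.4 cents).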
In all three experiments the two stimulus loudspeakers and the microphone were located at the corners of an equilateral triangle with sides of 1 m. The distance between the microphone and the subject’s mouth was 5 cm. However, the equipment used in experiment 1 differed from that used in experiments 2 and 3. In experiment 1 the stimuli were played from a Dell Optiplex GX 240 computer and presented monophonically over two loudspeakers (Fostex Personal Monitor 6301B). The subjects’ responses were picked up by an omnidirectional microphone (TCM 110) and recorded on one channel of a digital audio tape (DAT) recorder, while the computer stimulus was recorded on the other channel. The stimuli and responses were then digitized and transferred to a laptop (Targa Visionary N251C2) in COOL EDIT by means of a custom-made program (DINODAT, S. Granqvist). Conversions between sound file formats were performed by another custom-made program (AUDIOFIL, S. Granqvist). The F0 analysis was performed in PRAAT (version 3.8.27).
In experiments 2 and 3 a Yakumo computer (166 MHz) and Typhoon PS 56 loudspeakers were used. The responses were picked up by a Sennheiser Black Line microphone connected to a Viscount professional (MM 8) mixing console and an Aiwa HD-S200 DAT recorder. The responses were digitally transferred from the DAT recorder directly into the COOL EDIT program on the Yakumo computer.
FIG. 1. Examples of frequently observed patterns of pitch shift corrections: (a) slow reaction after 723 ms, (b) quick reaction after 50 ms, (c) double pitch change after 220 and 1290 ms, and (d) overshoot reaction after 220 ms. These examples were taken from the professional and the moderately skilled singers’ groups.
For the analysis it was important to select a method suitable for the material collected. A common way to identify the onset of an F0 reaction is to estimate the variance of the F0 signal prior to the change and then to identify the point in time when the signal leaves this average variation window. However, several subjects sang with a wide vibrato extent, which generated a very large average variation of F0, whereas other subjects sang with no or quite small vibrato, producing a narrow average variation window. Applying the above definition of the onset of F0 change would therefore produce a systematic delay of the detected reaction in subjects singing with a wide vibrato. For this reason, a different definition of the onset of the F0 shift had to be applied.
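The vibrato problem can be illustrated with synthetic data. The Python sketch below is not taken from the study: it builds an F0 contour (in cents) with sinusoidal vibrato and a response that starts exactly 200 ms after the reference shift, then applies a variance-window onset rule, i.e., the first sample after the shift that deviates from the pre-shift mean by more than k standard deviations. The window width k = 2, the vibrato rate, the linear ramp, the sampling rate, and all function names are illustrative assumptions.

```python
from __future__ import annotations

import math
import statistics


def synth_contour(vibrato_cents: float, latency_s: float = 0.2, shift_time_s: float = 1.0,
                  target_cents: float = -50.0, ramp_s: float = 0.15,
                  vibrato_hz: float = 5.5, fs_hz: float = 200.0, dur_s: float = 2.0):
    """Synthetic F0 contour (cents re: initial pitch): sinusoidal vibrato plus a
    linear ramp to target_cents starting latency_s after the reference shift."""
    onset = shift_time_s + latency_s
    t = [i / fs_hz for i in range(int(dur_s * fs_hz))]
    f0 = []
    for ti in t:
        vibrato = vibrato_cents * math.sin(2 * math.pi * vibrato_hz * ti)
        progress = min(max((ti - onset) / ramp_s, 0.0), 1.0)
        f0.append(vibrato + target_cents * progress)
    return t, f0


def variance_window_onset(t, f0, shift_time_s: float, k: float = 2.0):
    """Latency of the first post-shift sample deviating more than k SDs
    from the pre-shift mean, or None if the window is never left."""
    baseline = [x for ti, x in zip(t, f0) if ti < shift_time_s]
    mean = statistics.fmean(baseline)
    window = k * statistics.pstdev(baseline)
    for ti, x in zip(t, f0):
        if ti >= shift_time_s and abs(x - mean) > window:
            return ti - shift_time_s
    return None


if __name__ == "__main__":
    for extent in (5.0, 50.0):
        t, f0 = synth_contour(extent)
        print(f"vibrato extent {extent:4.0f} cents -> detected latency "
              f"{variance_window_onset(t, f0, 1.0):.3f} s (true 0.200 s)")
```

With these settings the narrow-vibrato contour (5 cents) is detected about 10 ms after the true onset, whereas the wide-vibrato contour (50 cents) is detected well over 100 ms late, because the window can only be exceeded once response and vibrato trough coincide.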
The reaction time, defined as the interval between the F0 shift of the reference and the onset of the subject’s response, was measured using the ORIGIN 6 program. This program displayed the subject’s F0 contour together with the target F0 in the same graph [see Figs. 1(a)–1(d)]. The onset of the response was identified manually in these graphs. The criterion used for this onset was an F0 shift that approached the new target without interruption, disregarding occasional vibrato undulations during the F0 change. The time coordinates of the onsets of the F0 shifts were then collected in an Excel file. To check data reliability, a different experimenter (co-author JS) measured 75 randomly chosen reaction times. The mean difference between the two experimenters’ measurements amounted to 41 ms [standard deviation (SD) 45 ms].
The choice of statistical analysis was complicated by the fact that there were different numbers of subjects in the different experiments. Also, some subjects participated in both experiments 2 and 3. Moreover, the data were not normally distributed according to a Kolmogorov–Smirnov test. This suggested the application of non-parametric tests. A Kruskal–Wallis test showed that the ranks differed significantly. Therefore, the averages of the different conditions were tested pairwise by means of Mann–Whitney tests.
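For readers who want to reproduce this kind of analysis, the two rank tests can be written with the Python standard library alone. This is a minimal sketch, not the software used by the authors (which the text does not name): it omits the tie corrections that statistical packages apply, uses the large-sample normal approximation for the Mann–Whitney p-value, and restricts the Kruskal–Wallis test to three groups so that the chi-square tail with 2 degrees of freedom reduces to exp(−H/2). The function names are hypothetical; in practice, `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` provide tie-corrected versions.

```python
from __future__ import annotations

import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_three(groups: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Kruskal-Wallis H for exactly three groups, without tie correction.

    With three groups, H is referred to a chi-square distribution with
    2 degrees of freedom, whose survival function is exp(-H/2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, math.exp(-h / 2)


def mann_whitney(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """U statistic of sample a and a two-sided normal-approximation p-value
    (no tie or continuity correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

The normal approximation is reasonable for samples as large as those in Table I (hundreds of trials per group); for very small samples, exact tables should be used instead.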
As the observed values showed considerable scatter, particularly for the highly skilled subjects, it was necessary to
exclude outliers. All values lying above three times the interquartile range, counted from the median, were considered
as outliers. The minimum latency value for outliers was just
above 740 ms. To eliminate outliers in all groups from the
computation of means, all values greater than 740 ms were
excluded from the calculation of means. This implied that a
total of 17 values were excluded, 11 for the highly skilled, 5
for the moderately skilled, and 1 for the anesthetized subjects. The total number of observations thus became 1381.
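The outlier rule can be expressed as two small functions. In this Python sketch (function names hypothetical), the cutoff is the median plus three times the interquartile range. Quartile conventions differ between packages; `statistics.quantiles` uses its default "exclusive" method here, which may not match the software used in the study. The second function applies a single fixed cutoff to every group, as was done with the 740 ms value.

```python
from __future__ import annotations

import statistics
from typing import Sequence


def outlier_cutoff(latencies_ms: Sequence[float], factor: float = 3.0) -> float:
    """Median plus `factor` times the interquartile range."""
    q1, _, q3 = statistics.quantiles(latencies_ms, n=4)
    return statistics.median(latencies_ms) + factor * (q3 - q1)


def split_at(latencies_ms: Sequence[float], cutoff_ms: float = 740.0):
    """Split latencies into (kept, excluded) using one fixed cutoff for all groups."""
    kept = [x for x in latencies_ms if x <= cutoff_ms]
    excluded = [x for x in latencies_ms if x > cutoff_ms]
    return kept, excluded
```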
III. RESULTS
The JND for frequency, averaged across all subjects in all three experiments, amounted to 10.5 cents (SD 4.4) for the voice stimulus and to 14.2 cents (SD 5.5) for the sine tone. Thus the subjects were able to hear much smaller pitch differences than the quarter tone (50 cents), the smaller of the two stimulus shifts used in the main experiment.
FIG. 2. (Color online) Distribution of reaction times for the three subject groups: (a) highly skilled singers, (b) moderately skilled singers, and (c) moderately skilled singers with anesthetized vocal folds.
In the singing task, subjects performed in accordance with the instructions in most cases, i.e., they changed their F0 quickly and continuously until the new target was reached. In some cases, however, they interrupted their singing when the reference stimulus changed, and in other cases the F0 change began so gradually that its onset could not be determined. This problem was particularly frequent in the group of highly skilled singers. These cases were eliminated from the subsequent analysis (253, 82, and 7 trials in experiments 1, 2, and 3, respectively).
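The trial counts reported in Secs. II and III can be cross-checked with simple bookkeeping: subjects × 60 stimuli, minus trials without a measurable onset, minus outliers above 740 ms. The Python sketch below (names hypothetical, counts taken from the text) reproduces the N values of Table I: 516, 573, and 292 trials, 1381 in total.

```python
from __future__ import annotations

# Counts taken from the text: subjects per group, 60 stimuli per subject,
# trials without a measurable onset, and outliers above 740 ms.
STIMULI_PER_SUBJECT = 60

GROUPS = {
    "highly skilled (exp. 1)": {"subjects": 13, "unmeasurable": 253, "outliers": 11},
    "moderately skilled (exp. 2)": {"subjects": 11, "unmeasurable": 82, "outliers": 5},
    "anesthetized (exp. 3)": {"subjects": 5, "unmeasurable": 7, "outliers": 1},
}


def analyzed_trials(counts: dict[str, int]) -> int:
    """Trials presented minus trials dropped at each stage."""
    presented = counts["subjects"] * STIMULI_PER_SUBJECT
    return presented - counts["unmeasurable"] - counts["outliers"]


if __name__ == "__main__":
    per_group = {name: analyzed_trials(c) for name, c in GROUPS.items()}
    for name, n in per_group.items():
        print(f"{name}: {n}")
    print("total:", sum(per_group.values()))
```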
The curves in Figs. 1(a)–1(d) show typical examples of different types of responses, all observed for a quarter-tone shift of the stimulus. Figures 1(a) and 1(b) illustrate a slow (723 ms) and a quick (50 ms) reaction time. Interestingly, in both these examples of a descending F0 shift, the rising phase of the vibrato cycle was interrupted by the F0 drop. Figure 1(c) presents an example of a pitch change containing two parts. The first one, appearing after 220 ms, is large (+152 cents peak-to-peak, three times as large as the stimulus shift). The second part is small (−32 cents) and occurs after 1290 ms. Figure 1(d) shows an example of a marked overshoot, amounting to +134 cents.
The reactions of the two subjects who participated in both experiments 2 and 3 did not differ notably from those of the rest of the respective groups. Experiment 3 was run several weeks after experiment 2. Thus, there was no indication of a learning effect.
The distribution of reaction times for the different groups is shown in Fig. 2 and, in terms of box plots, in Fig. 3. The means and standard deviations are listed in Table I. As the Kruskal–Wallis test indicated that the ranks were significantly different, the means of the groups’ latencies were compared pairwise. A Mann–Whitney test for two independent samples showed that the difference between the means for the moderately skilled and the anesthetized groups was significant (p < 0.001). The same was true for the difference between the anesthetized and highly skilled groups (p < 0.001) and for the difference between the moderately and highly skilled groups (p < 0.001). The mean values showed that anesthesia slowed down the reaction time in all conditions. Moreover, high skill tended to be associated with slightly longer latencies than moderate skill in all conditions. The Mann–Whitney test further revealed that the direction of the pitch change did not have a significant effect on the latency (p > 0.1) when the mean latencies of the three subject groups were pooled. Looking at the subject groups separately, the moderately trained subjects, with and without anesthesia, showed the same nonsignificant result as the pooled group (both p > 0.5), while highly skilled singers showed significantly shorter latencies in performing descending intervals (p < 0.01). With regard to interval size, the quarter-tone shifts were associated with a significantly slower reaction in all subject groups pooled (p < 0.01), as well as in each group separately (all p < 0.05).
FIG. 3. Box plots of the results for the three indicated subject groups. The boxes represent the interquartile range, the heavy horizontal lines show the medians, and the bars ±1 standard deviation. Unfilled circles show values exceeding the standard deviation.
TABLE I. Means in milliseconds and standard deviations for all subject groups and all conditions. N is the number of trials. Each cell gives mean in ms (SD), N.

Condition       | Highly skilled singers | Moderately skilled singers | Anesthetized vocal folds | Groups pooled
----------------|------------------------|----------------------------|--------------------------|----------------
All conditions  | 227 (120), 516         | 206 (135), 573             | 251 (122), 292           | 223 (128), 1381
Semitones       | 216 (113), 265         | 197 (135), 283             | 243 (129), 147           | 214 (127), 695
Quarter tones   | 238 (126), 251         | 216 (136), 290             | 259 (114), 145           | 233 (129), 686
Descending      | 236 (112), 246         | 199 (127), 285             | 246 (110), 147           | 222 (120), 678
Ascending       | 219 (126), 270         | 214 (143), 288             | 257 (133), 145           | 224 (136), 703

IV. DISCUSSION
The authors studied how quickly choir singers adapted their intonation to a change in a reference pitch. Burnett et al. (1997) carried out a somewhat related experiment in which the pitch of singers’ auditory feedback was suddenly shifted while the subjects were instructed to sustain a tone and to keep the pitch constant. Under these conditions they observed a mean latency of 159 ms (range 104–223 ms) between the onset of the pitch shift of the auditory feedback and the onset of the subject’s attempt to correct F0. This value is smaller than the mean values of 227 and 206 ms observed in experiments 1 and 2. The responses reported by Burnett et al. (1997) were probably unconscious, whereas the present study analyzed deliberate F0 changes. In their experiment, the subjects were frequently even unaware of the pitch changes that they were performing. Our subjects were performing a far more complex task that required analysis of both the direction and the magnitude of the pitch shift.
The highly skilled group showed greater mean latency values than the moderately skilled group. It is tempting to speculate that this may be related to the neurophysiologic difference between singers and non-singers observed by Zarate and Zatorre (2008). Quoting these authors: “Through years of training and experience, singers have learned that they need to monitor their auditory feedback closely to ensure that their notes are produced correctly.” Moderately skilled singers may be less perfectionistic than highly skilled singers in their attempts to produce intended pitches. Such perfectionism may require recruiting the motor cortex and hence take more time. Anesthetizing the vocal folds slowed down the reaction time, showing that kinesthetic feedback is an important part of singers’ pitch control system. Mürbe et al. (2004) found experimental evidence for the same conclusion.
The semitone shifts were associated with shorter latencies than the quarter-tone shifts. This may simply be an effect of training, since singers are more accustomed to semitone than to quarter-tone intervals.
Burnett et al. (1997) and Kestler et al. (1999) suggested the involvement of a double pathway in the control of vocal pitch, and other investigations have also suggested the existence of such a double pathway. Furthermore, as mentioned, Zarate and Zatorre (2008) observed that different subjects used different brain areas for pitch control. One possible manifestation of a double pathway in our results would be that some reactions were early and some late, i.e., a double-peaked distribution. The histograms in Fig. 2, however, do not show any distributions of this type.
On the other hand, a double pathway could also manifest itself differently in our data: many individual F0 curves showed two phases. A typical example of such a curve was shown in Fig. 1(c).
Constructing models of the human pitch control system seems a promising avenue for further elucidating the pathways involved. For example, Guenther et al. (2006) presented a neural network model whose components correspond to regions of the cerebral cortex and the cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Another attempt was presented by Xu et al. (2004).
Several mechanisms are likely to have been involved in the subjects’ behavior, some conscious and others unconscious. Reflexes would cause quick, though imprecise, reactions. One may speculate that very quick F0 shifts such as the one illustrated in Fig. 1(b) (50 ms) are produced by such an imprecise reflex system, while slow reactions such as in Fig. 1(a) (723 ms) and the second F0 change in Fig. 1(c) (1290 ms) rely on conscious control.
Hain et al. (2000) also observed an early and a late component in subjects’ F0 changes. Interestingly, subjects showed such double responses more often when they were instructed to produce specific voluntary responses. The first response was often incorrect, whereas the second, later one almost always followed the instruction. This provides further support for the idea of an early, subcortical, unconscious reaction and a consciously controlled cortical mechanism.
Our results also seem relevant to choral singing practice. They showed that most singers react rather quickly to a shift in an external auditory pitch reference. In a choir, the fellow singers generally provide this reference, and it is important that the entire ensemble synchronize their pitch changes. At least in many amateur choirs, one individual in each voice section tends to act as a leader, and the fellow singers mainly follow this leader when they sing. Indeed, this structure of the choral group was systematized in the Baroque period, when the leaders were called “concertisten” and the followers “ripienisten,” as described by the German music performance expert Wilhelm Ehmann (Ehmann, 1961). Incidentally, this structure is also systematically implemented in today’s orchestras in the form of a section leader for each instrument group.
Xu et al. (2004) found in Mandarin speakers that the mean latency of 164 ms was comparable to the mean syllable duration. The situation in singing seems comparable. In coloratura singing, typical note durations lie in the vicinity of 125 ms (Huron, 2001; Lindblom and Sundberg, 2007). This is shorter than the mean reaction times of 197–259 ms observed here, leaving too little time for a slow correction of pitch within the note. This implies that the strategy of ripienisten simply shadowing concertisten in choral ensembles is not appropriate in fast music. Instead, singers need to hit the target closely enough on the first attempt; they must know the entire sequence of pitches before they start to sing it.
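The timing argument can be made explicit with a little arithmetic. Assuming, for illustration only, sixteenth notes at 120 beats per minute, each note lasts 60 000/120/4 = 125 ms, the duration cited above; the Python sketch below (function name hypothetical) compares this with the mean reaction times of the "All conditions" row of Table I.

```python
from __future__ import annotations

# Mean reaction times (ms) from Table I, "All conditions" row.
MEAN_LATENCY_MS = {"highly skilled": 227, "moderately skilled": 206, "anesthetized": 251}


def note_duration_ms(bpm: float, notes_per_beat: int) -> float:
    """Duration of one note when each beat is divided into notes_per_beat equal notes."""
    return 60_000.0 / bpm / notes_per_beat


if __name__ == "__main__":
    # Illustrative tempo (an assumption): sixteenth notes at 120 beats per minute.
    note_ms = note_duration_ms(120, 4)
    for group, latency in MEAN_LATENCY_MS.items():
        late_by = latency - note_ms
        print(f"{group}: reaction starts {late_by:.0f} ms after a {note_ms:.0f} ms note has ended")
```

In every group the mean reaction begins only after such a note is already over, which is the point made in the text.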
V. CONCLUSION
Our investigation of the reaction times in different
groups of singers has shown that highly skilled
choir singers typically reacted to a change in a pitch reference after
227 ms, while moderately skilled choir singers reacted after 206 ms. Anesthetization of the moderately skilled
singers' vocal folds tended to slow down the reaction. The
reaction time was typically shorter when the reference was
shifted by a semitone than when it was shifted by a quarter tone. The distribution of reaction times gave no indication of a double pathway for F0 control, but many singers' F0
curves during the transitions showed two phases: an initial
quick and large change followed by a slower and smaller
change, apparently aimed at fine-tuning the pitch.
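The quantities summarized above can be illustrated with a short Python sketch. A semitone corresponds to 100 cents (frequency ratio 2^(1/12)) and a quarter tone to 50 cents (ratio 2^(1/24)). The sketch also builds a synthetic two-phase F0 track and measures a reaction time on it. This is not the procedure used in this study: the onset criterion (a 10-cent departure from the pre-shift mean), the synthetic track, and all names and parameter values (`synthetic_two_phase_f0`, `response_latency_ms`, 220 Hz start, 1 sample per ms) are hypothetical.

```python
import math

def cents_to_ratio(cents: float) -> float:
    """Frequency ratio of an interval given in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def synthetic_two_phase_f0(f_start=220.0, target_cents=100.0, onset=700,
                           fast_n=100, fast_frac=0.8, slow_n=300, total_n=1500):
    """Hypothetical F0 track in Hz, one value per ms: flat until `onset`, then a quick
    change covering `fast_frac` of the shift, then a slower change to the target."""
    f0 = []
    for i in range(total_n):
        k = i - onset  # samples since the synthetic response onset
        if k < 0:
            c = 0.0
        elif k < fast_n:
            c = fast_frac * target_cents * k / fast_n  # quick, large phase
        elif k < fast_n + slow_n:
            # slower, smaller fine-tuning phase
            c = target_cents * (fast_frac + (1.0 - fast_frac) * (k - fast_n) / slow_n)
        else:
            c = target_cents
        f0.append(f_start * cents_to_ratio(c))
    return f0

def response_latency_ms(f0_hz, shift_index, fs=1000.0, threshold_cents=10.0):
    """Hypothetical onset criterion: time from the reference shift until F0 first
    departs from its pre-shift mean by more than `threshold_cents`.
    Returns None if F0 never crosses the threshold."""
    baseline = sum(f0_hz[:shift_index]) / shift_index
    for i in range(shift_index, len(f0_hz)):
        if abs(1200.0 * math.log2(f0_hz[i] / baseline)) > threshold_cents:
            return 1000.0 * (i - shift_index) / fs
    return None

if __name__ == "__main__":
    print(f"semitone ratio:     {cents_to_ratio(100):.4f}")
    print(f"quarter-tone ratio: {cents_to_ratio(50):.4f}")
    track = synthetic_two_phase_f0()
    # Reference shifted at 500 ms; the synthetic response begins at 700 ms.
    print(f"detected latency: {response_latency_ms(track, shift_index=500)} ms")
```

With these illustrative settings the detector reports 213 ms, although the synthetic response starts 200 ms after the shift: the quick phase needs 13 ms to exceed 10 cents. Any threshold-based reaction-time measure carries an offset of this kind, and the offset depends on the threshold and on how steep the initial change is.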
ACKNOWLEDGMENTS
Mikael Bohman kindly assisted with some of the MATLAB processing. Henrik Jansson kindly discussed some
control-theory aspects of this investigation with us. Friederike
Lipka helped supplement the statistical analysis. Kathrin Lüerßen
performed the anesthesia of the vocal folds in the Department of
Phoniatry and Paediatric Audiology at Hannover Medical
School. Dietrich Parlitz supported the initial idea of the
study and the pilot experiment. The work on experiment 1
was carried out in Stockholm within the Marie Curie Fellowship Program, sponsored by the European Commission.
J. Acoust. Soc. Am., Vol. 126, No. 1, July 2009
Brown, S., Ngan, E., and Liotti, M. (2008). "A larynx area in the human motor cortex," Cereb. Cortex 18, 837–845.
Burnett, T. A., Freedland, M. B., Larson, C. R., and Hain, T. C. (1998). "Voice F0 responses to manipulations in pitch feedback," J. Acoust. Soc. Am. 103, 3153–3161.
Burnett, T. A., Senner, J. E., and Larson, C. R. (1997). "Voice F0 responses to pitch-shifted auditory feedback: A preliminary study," J. Voice 11, 202–211.
Chen, S., Liu, H., Xu, Y., and Larson, C. (2007). "Voice F0 responses to pitch-shifted voice feedback during English speech," J. Acoust. Soc. Am. 121, 1157–1163.
Deetjen, P., Speckmann, E. J., and Hescheler, J. (2005). Physiologie (Physiology) (Elsevier Urban Fischer, Amsterdam).
Duus, P. (1995). Neurologisch-Topische Diagnostik (Neurologic-Topical Diagnostics) (Thieme, Stuttgart).
Ehmann, W. (1962). Concertisten und Ripienisten in der h-moll Messe Johann Sebastian Bachs (Concertists and Ripienists in the Mass in B Minor by Johann Sebastian Bach) (Bärenreiter, Kassel, Germany).
Friedrich, G., Bigenzahn, W., and Zorowka, P. (2004). Phoniatrie und Pädaudiologie (Phoniatrics and Paedaudiology) (Huber, Bern).
Guenther, F. H., Ghosh, S. S., and Tourville, J. A. (2006). "Neural modeling and imaging of the cortical interactions underlying syllable production," Brain Lang. 96, 280–301.
Hage, S. R., Jürgens, U., and Ehret, G. (2006). "Audio-vocal interaction in the pontine brainstem during self-initiated vocalization in the squirrel monkey," Eur. J. Neurosci. 23, 3297–3308.
Hain, T. C., Burnett, T. A., Kiran, S., Larson, C. R., Singh, S., and Kenney, M. K. (2000). "Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex," Exp. Brain Res. 130, 133–141.
Huron, D. (2001). "Tone and voice: A derivation of the rules of voice-leading from perceptual principles," Music Percept. 19, 1–64.
Kestler, C., Parlitz, D., and Altenmüller, E. (1999). "Experimentelle Studie zur schnellen Tonhöhenkorrektur bei Sängern und Nichtsängern (An experimental study in rapid pitch correction of professional singers and nonsingers)," Diploma thesis, Hannover University of Music and Drama, Hannover, Germany.
Larson, C. R., Altman, K. W., Liu, H., and Hain, T. C. (2008). "Interactions between auditory and somatosensory feedback for voice F0 control," Exp. Brain Res. 187, 613–621.
Lindblom, B., and Sundberg, J. (2007). "The human voice in speech and singing," in Springer Handbook of Acoustics, edited by T. Rossing (Springer, Heidelberg, Germany), Chap. 16, pp. 669–712.
Mürbe, D., Pabst, F., Hofmann, G., and Sundberg, J. (2004). "Effects of a professional solo singer education on auditory and kinesthetic feedback: A longitudinal study of singers' pitch control," J. Voice 18, 236–241.
Sapir, S., McClean, M. D., and Larson, C. R. (1983). "Human laryngeal responses to auditory stimulation," J. Acoust. Soc. Am. 73, 315–321.
Schönweiler, R., and Ptok, M. (2004). Phoniatrie und Pädaudiologie (Phoniatrics and Paedaudiology) (Hannover Medical School, Hannover, Germany).
Wyke, B. D. (1967). "Advances in the neurology of phonation: Phonatory reflex mechanisms in the larynx," British J. Communic. 2, 2–14.
Wyke, B. D. (1974). "Laryngeal neuromuscular control systems in singing. A review of current concepts," Folia Phoniatr. Logop. 26, 295–306.
Xu, Y., Larson, C., Bauer, J., and Hain, T. (2004). "Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences," J. Acoust. Soc. Am. 116, 1168–1178.
Zarate, J. M., and Zatorre, R. J. (2008). "Experience-dependent neural substrates involved in vocal pitch regulation during singing," NeuroImage 40, 1871–1887.